This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
f strongly" means f I -› 0. In this case, we write f = s-lim fn . For operators on X, strong convergence T := s-limT„ means T f = s-urn T„ f for all f E X. We shall assume that the t.p.f. P defines a linear operator on X into itself given by
Ilf, -
(P f)(x) := f P(x,dy)f(y) Vf E X, x E X. x As usual, Pn denotes the n-step transition function, which is given recursively by Pn(x, B) = f P x
1 (x, dy)P(y, B),
n = 1, 2, .. . ,
where P°(x, •) is the Dirac measure at x; we also write P° := I, the identity operator. For n = 1, 2, ... , let
sn := I ± p + . . . ± P 1 ,
(8.3.1)
so that we can write (8.2.2) as P( n ) = n —l Sn . The following definition is an adaptation of the concept of canonical triplet introduced by Yushkevich [136] (see also [35] or [58]) for Markov control processes.
Definition 8.3.1. Let f be a given function in X. A pair (g, h) of functions g and h in X is said to be an f-canonical pair if S„ f + Pnh = ng + h
Vn = 1, 2, • • • •
(8.3.2)
106
Chapter 8. The Poisson Equation
It turns out that (8.3.2) is equivalent to the multichain P.E. (8.2.1) in the following sense.
Theorem 8.3.2. (g, h) is an f -canonical pair if and only if (g, h) is a solution to the multichain P.E. with charge f. Proof. (-) Let (g, h) be an f-canonical pair. Then, with n = 1, (8.3.2) yields (8.2.1)(b). Now, to obtain (8.2.1)(a), apply P to both sides of (8.2.1)(b) to get P2 h = Pg + Ph— Pf, and, on the other hand, note that (8.3.2) with n = 2 yields P2 h = 2g -Fh—f—Pf. The last two equations give (8.2.1)(a) since they imply Pg—g=g+h—Ph—f= 0, where the latter equality comes from (8.2.1)(b). (-) Conversely, suppose that (g, h) satisfies (8.2.1). Then g = P g implies g = Pk g for all k = 0, 1, ... , and, therefore, n-1 pk g
ng =-
_ sng
Vn = 1,2,... .
(8.3.3)
k=0
Now write (8.2.1)(b) as h = (f — g)+ Ph and iterate to obtain h = Sn ( f — g) + Ph = Sn f — ng + Ph
[by (8.3.3)],
which is the same as (8.3.2).
El
Although Theorem 8.3.2 is quite straightforward, it has important consequences. In particular, we will derive from it additional necessary and/or sufficient conditions for the existence of solutions to the multichain P.E. Recall that the norm of an operator T from X into itself is defined as
IITII := sup IIIT f II I f c x,11/11
1 1,
and that T is said to be power-bounded if there is a constant Al > 0 such that IIT1 < M for all n = 0, 1, ... .
Corollary 8.3.3. Let (g, h) be an f -canonical pair. Then: (a) g = s-lim P(n) g n
(b) If Pm hln —> 0 (poin,twise or strongly), then limP (n) f =limP(n) g = g (pointwise or strongly, respectively). (c) If P is power-bounded, then supn llSn(f — 9)11 <00.
8.3. Canonical Pairs
107
Proof. Part (a) follows from (8.3.3). Moreover, from (8.3.3) and (8.3.2) we get Sn (f — g) = h — Pnh.
(8.3.4)
This yields (b) [using (a)], and also (c) since 1lS ri (f — g)11 5_ (1+ M)111111, where M is such that 11P n ii < M for all n = 0, 1, ... . Ill Remark 8.3.4. (a) Observe that if (8.2.1)(b) holds, so that f — g = (I — P)h, then we can also obtain (8.3.4) from the general expression:
Pk (I — P) = I — P n Vn = 1,2 . .. .
Sn (I — P)
(8.3.5)
k=0
(b) The hypotheses in parts (b) and (c) of Corollary 8.3.3 obviously hold if P is a contraction operator, i.e., IIPII < 1. This is the case if, for instance, X = B(X) is the Banach space of bounded measurable functions on (X, B) with the sup norm, or if X = L p (p) with 1 < p < oo and it a P-invariant p.m., i.e., it is a (not necessarily unique) p.m. such that it(B) = f it(dx)P(x, B) x
VB E B.
(8.3.6)
(Recall (2.2.12).) The following theorem gives another characterization of a solution to (8.2.1).
Theorem 8.3.5. Let f,g and h be functions in X, and suppose that: (a) P is bounded (i.e., 11-13 11 < M for some constant M), and (b) Pa /n —* 0 strongly. Then the two following assertions are equivalent. (i) (g, h) is the unique solution of the P.E. (8.2.1) for which s-lim P (n) h = 0.
(8.3.7)
(ii) g = s-lim P(n) g = s-lim P(n) f , and N n-1
h = s-lim Tv-
E E Pk (f — g) = s-lim n=1 k=0
N
Sn(f
—
g).
(8.3.8)
n=1
Proof. (i) = (ii). If (i) holds, then the first condition in (ii) follows from Corollary 8.3.3(b). On the other hand, by (8.3.4), 1
N n=1
1
N
VN = 1,2,.... n=1
Hence, (8.3.8) follows from (8.3.9) and (8.3.7).
(8.3.9)
Chapter 8. The Poisson Equation
108
(ii) = (i). If g = s-lim P(n) g = s-lim P(n) f , then (8.2.1)(a) holds [since, by assumption (a), we can interchange P and s-lim], and also s-lim P(n ) (f - g) = 0.
(8.3.10)
To prove (8.2.1)(b) first note that, by (8.3.5), n-1
(I - P)
E Pk (f - g) = (I -
Pn )( f
- g).
(8.3.11)
k=0
Therefore, applying (I - P) to both sides of (8.3.8) and using assumption (a) again, we get (I - P)h =
n-1
1 S-111-11 —
N
(I -
P)
n=1
E pk (f - g) k=0
( f - g) - s- lim —
N
N ‘-d n=1
Pn (f - g)
[by (8.3.10)],
= f - g
i.e., (8.2.1)(b) holds. Hence, the pair (g, h) is a solution to (8.2.1); it only remains to show that it is unique. Before doing this, let us note that (8.3.8) and (8.3.9) together imply (8.3.7). Now let (gi, h1) and (g2, h2) be two f-canonical pairs satisfying the conditions in (ii). Then gi = s-lim P(n) f = g2 ,
i.e., gi = g2 =: g.
(8.3.12)
Furthermore, since (I - P)h, = f -g for i = 1, 2, the function u = h i - h2 satisfies (I - P)u = 0, and, therefore, u = Pk u for all k = 0, 1,..., which implies u = s-lim P(n) u = s-lim P(n ) hi - s-lim P( n) h2 = 0 [by ( 8 . 3 . 7)], LI
i.e., hi = h2. This completes the proof.
In the following section we show that the results in Theorem 8.3.5, as well as those mentioned in the following Remark 8.3.6 are valid in a more general context. Remark 8.3.6. If the state space X is a finite set, in which case the t.p.f. P is a square matrix, it is well-known [11, 35, 73, 101, 111] that the limiting matrix n-1
iim P(n) = 11111n-
E
Pk (componentwise)
(8.3.13)
k=0
exists (compare (8.3.13) with (2.3.7), (2.3.8), and (5.2.3), for instance), and that / - P + H is nonsingular; its inverse Z := (I - P +11) -1
(8.3.14)
8.3. Canonical Pairs
109
is called the fundamental matrix associated to P. Moreover, the matrix
(P — 11) k (I — H)
(8.3.15)
H = (I — P + 11)-1 (I — H) = Z(/ — H)
(8.3.16)
satisfies and is called the deviation matrix associated to P (or the Drazin inverse of ./— P); P — II is sometimes called the approach matrix [127]. The above facts are also true if X is a countable set. What we wish to remark is that the solution pair (g, h) in Theorem 8.3.5(i), (ii) is precisely
g =11f,
and
h = H f.
(8.3.17)
In Theorem 8.4.7 we show that (8.3.16) and (8.3.17) hold in a much more general setting.
Remark 8.3.7. The choice of the underlying space X is important. For instance, consider the countable set X = {1, 2, ... } with the discrete topology, and let X be the Banach space of bounded functions on X with the supremum norm 7.4 := sup x 1u(x)j. Further, let {g(x), x E X}, be a probability distribution on X, that is, q(x) > 0 for all x and Ex q(x) = 1, which is assumed to have a finite "mean value" q := x q(x) < Do, x
E
and let P(x, y) __ P(x, {y}) be the t.p.f. given by
P(x,x — 1) := 1 Vx > 2, and P(1, y) := q(y) Vy > 1. Finally, consider the Poisson Equation (8.2.1) with charge f E X defined by f(1) := 1-4, and f(x) := 1 Vx > 2. Then one can easily check that (8.2.1) has a solution (h, g) with g(.) -_T., 0 and
h(x) = f(1) + x — 1 Vx E X.
(8.3.18)
In fact, except for an additive constant, any solution h to (8.2.1) is of the form (8.3.18), which is not a bounded function. In other words, the charge f is in X and the P.E. is "solvable", but the solution is not in X. This kind of situation can often be remedied by suitably enlarging the space X. For example, consider the weighted norm Ilul w := Ilu/w = suplu(x)lw(x) -1 , x where w(x) = x for all x E X, and let Xw be the Banach space of functions u on X with finite w-norm, i.e., Iluilw < oo. It is clear that Xw contains X (in fact,
110
Chapter 8. The Poisson Equation
since w > 1, we have IluIl u, < Hull < co if u is bounded) and, moreover, the function h in (8.3.18) belongs to Xli, That is, the P.E. does not have a solution in X, but it does in XiD . Moreover, it is straighforward to check that P is still a bounded linear operator on X„) . Under some additional assumption on the distribution q (for instance, if q has a finite "moment generating function"), one may show that P is in fact power-bounded.
8.4 The Cesaro-Averages Approach It follows from the results in §8.3 that the existence of solutions to the multichain P.E. (or, equivalently, the existence of canonical pairs) is closely connected with the limiting behavior of the Cesar° averages P(') := n -1 ,572 . In this section we obtain necessary and/or sufficient conditions for the existence of such solutions by identifying the limits of P(n). To do this we shall use the mean ergodic theorem of Yosida and Kakutani [135] (see also [133] or [34]), which requires the following assumption.
Assumption 8.4.1. X is a Banach space and P maps X into itself. Moreover, (a) Pr' In -- 0 strongly, and (b) sup.11P (n) i <00. Note that (a) and (b) trivially hold if P is power-bounded, in particular, if P is a contraction [see Remark 8.3.4(b)]. Now let A(P) be the set of functions whose Cesar° averages converge, i.e., A(P) := {f E XI P ( n) f
converges strongly as n -4
The set A(P) is nonempty [it contains (at least) the constant functions] and the following mean ergodic theorem (for a proof see the above-mentioned references) provides a description of it. We use the notation Ker := kernel (or null) space and Ran := range. Moreover, Y denotes the closure of a set Y.
Theorem 8.4.2. Suppose that Assumption 8.4.1 holds. Then A(P) is the closed linear manifold given by A(P) = Ker(/ - P) @ Ran(/ - P).
(8.4.1)
Furthermore, the operator H that maps f 1--> fif := s-lim P ( h') f is a projection on A(P) (i.e., ii 2 = H) with range and kernel Ran(II) = Ker(/ - P), and
Ker(II) = Ran(/ - P),
(8.4.2)
and satisfies HP = PH = 112 = H. If, in addition, X is reflexive, then P is mean ergodic, i.e., A(P) = X.
(8.4.3)
8.4. The Cesaro-Averages Approach
111
Remark 8.4.3. We have already seen special cases of Theorem 8.4.2. For instance, if we take the "projection" (or idempotent) operator H as the linear mapping H in Lemma 5.2.4, where X = Li (p), then (5.2.5) follows from (8.4.3). See also Theorem 5.2.2(c),(d), and Theorem 2.3.5. On the other hand, concerning the last statement in Theorem 8.4.2, the condition that X be reflexive is sufficient but not necessary for P to be mean ergodic. For instance, suppose that p, is a P-invariant p.m., and let X = L i (p). Then X is not reflexive, but as shown by the MET, Theorem 2.3.5, A(P) = X. We shall now derive necessary conditions for the existence of solutions to (8.2.1); sufficient conditions are considered in the second half of this section. Let (g, h) be a solution of the multichain P.E. with charge f, and suppose that Assumption 8.4.1 holds. Then, by (8.4.1) and (8.2.1), g and f are both in A(P), and in fact, by Corollary 8.3.3(a), (b),
g = II f .
(8.4.4)
Hence, in particular, we may rewrite (8.2.1)(b) as (/ — P)h = (/ — II)f.
(8.4.5)
On the other hand, by (8.4.4), g is necessarily unique but this needs not be the case for h because (g,h+ Hit') is also a solution of the multichain P.E. for any h' in A(P); indeed, by (8.4.3), we have (I — P)1111' = 0, and so
(I — P)(h + IIh') = (I — P)h = (I — H)/ . For h to be unique it suffices, for instance, to add the constraint IIh = 0.
(8.4.6)
In other words, as in the last part of the proof of Theorem 8.3.5, we have:
Proposition 8.4.4. If (g, h 1 ) and (g, h2) are two solutions of the multichain P.E. and h1 , h2 satisfy (8.4.6), then h1 = h2. Proof. From (8.4.5), (/ — P)(hi — h2) = 0, i.e. u := h1 — h2 is in Ker(/ — P) = Ran(H) [by (8.4.2)]. This implies u = Hu, so that, by (8.4.6), u = h1 — h2 = 0. 0 Finally, we shall use (8.3.5) to re-state Corollary 8.3.3 in the context of this section. Actually, the following proposition almost amounts to a trivial remark but it is important because it gives an idea of the rate of convergence of P(n) f to IIf .
Proposition 8.4.5. Suppose that Assumption 8.4.1 holds. If f and h satisfy (8.4.5), with f in A(P), then P(n) f — Ilf = —1 (I — Pn)h --4 0
strongly.
(8.4.7)
112
Chapter 8. The Poisson Equation
If in addition P is power-bounded, then 1P(n) .f – [If II
(8.4.8)
MIlhIl/n
for some constant M. Proof. From (8.4.5) and (8.3.5), S1 n (I – II) f = Sn (I – P)h = (I – Pn)h.
Since S, (/ – H) = Sn – nil, (8.4.7) — hence (8.4.8)
follows.
111
Remark 8.4.6. The convergence in (8.4.8) can be greatly improved by imposing suitable assumptions on the t.p.f. P. In particular, there is a large variety of
conditions ensuring a geometric rate of convergence (see §4.3.3), that is, there exists a constant 0 <8 < 1 such that Pri i – it(i)1 < cfin V f E X, and n = 0, 1, . . . ,
(8.4.9)
where j( f)) :=ff dp = II f for some P-invariant probability measure it, and c is a constant (that may depend on f). See [22, 46, 51, 64, 70, 103]. Note that if (8.4.9) holds, then the operator H o introduced below is defined for all f E X. To state sufficient conditions for the existence of solutions to the multichain P.E., let us consider two operators H o and H defined as Hof := s-
lim
7j -4
n-1
y-: ( p k - IT) f =
Do
E(Pk - 11 )f,
k=0
k=0
00
and
N n-1
1 H f := s-I\Ern Kr E cx)
(pk
_ ii)f.
(8.4.10)
n=1 k=0
The domain of H is Dom(H) := { f E A(P)I the limit in (8.4 .10) exists}, and similarly for Ho . If a sequence {hn } in X converges strongly to h then so does the sequence of averages n -1 E nk=01 hk. Thus, taking ,
n-1
hn :=
E(pk
_ II)f,
k=0
we see that H is an extension of Ho , that is, Dom(H0 ) c Dom(H) and H f = Ho f
V f E Dom(1/0 ).
In fact, these remarks were intended mainly to illustrate the relation between (8.4.9) and Ho , whence between (8.4.9) and H. But what we are really interested in is the following result, which in particular gives the precise domain and range of H. [Compare Theorem 8.4.7 and Remark 8.3.6, noting that (P – II) k (I – II) = Pk – H for all k = 0, 1, ... , by (8.4.3).]
113
8.4. The Cesaro-Averages Approach
Theorem 8.4.7. Under Assumption 8.4.1 we have: (a) f is in Dom(H) if and only if the pair (g, h) given by g = II/ and N E E( pk _ ri) f h :-= H f = s- lim N-1 n-1
N —,oc
(8.4.11)
n=1 k=0
is the unique solution of (8.4.4)-(8.4.6). (b) Dom(H) = Ran(H) e (I - P) Ker(II) [= Ker(/ - P) ED (I - P)Ran(/ - P), by
(8.4.2)]. (c) Ran(H) = Ker(II) [= Ran(/ - P), by (8.4.2)]. (d) The restriction of H to Ran(H) = Ker(II), call it Z, is the inverse of I-P+II, i.e., Z f = (I - P + II) -l f
Vf E Ran(H) 1= Ker(II) by (c)j;
(8.4.12)
hence, the function h in (8.4.11) can be written as h = H f = Z(I - II)f V f E Dom(H).
(8.4.13)
(a) (). Suppose that f is in Dom(H), and let g := II f and h := H f. Then observing that [by (8.4.3)] Proof.
pk (1- ____ ll) f
( 1-' 7-4
ii) f Vk = 0,1,...,
we see that the function h = H f in (8.4.11) is the same as the function h in (8.3.8) with g = Ilf. Hence the implication "" in (a) follows from the implication "(ii)(i)" in Theorem 8.3.5. Similarly, the converse follows from "(i)(ii)" in Theorem 8.3.5. (b) Let f be in Dom(H) and let g := II f and h := H f. Then, by part (a), (8.4.5) and (8.4.6) yield f = II f + (I - P)h,
with h
E
Ker(II);
hence f
is in Ran(H) e (i- - P) Ker(II).
(8.4.14)
Now suppose that f satisfies (8.4.14). Then there are functions f i in A(P) and f2 in Ker(H) such that f =I-Ifi+ (I - P)f2. Obviously [by (8.4.3)], Hfi is in Dom(H) and HIIf i = 0. Moreover [by (8.4.3) again] (p k
_ "i)(I _ p) _
pk ____ pk-1-1 ,
and so H(I - P)f2 -= f2. Summarizing, if f satisfies (8.4.14), then f is in Dom(H) and H f = f2.
114
Chapter 8. The Poisson Equation
(c) Suppose h = H f is in Ran(H). Then since H is bounded (by Assumption 8.4.1), we can interchange H and s-urn in (8.4.10), which combined with (8.4.3) yields 1-11/ = IIHf = 0, i.e., h E Ker(H), (8.4.15) so that Ran(H) c Ker(II). Now to prove that Ker(H) c Ran(H), let h be in Ker(II) and let f := (I - P)h. Then, by (8.4.3) and (8.3.5), n-1
n-1
E(pk _ {)f k=0
.
E (pk _ 11) (I — P)h = (I - Pn)h.
(8.4.16)
k=0
Thus, (8.4.10) yields H f = h - Ilh = h, i.e., h is in Ran(H). (d) Suppose that f is in Ran(H) = Ker(II) and let h =- H f . Then, by (a), from (8.4.5)-(8.4.6) we get (I - P)h = (I - II) f = f and Hh = 0, so that (I - P + II)h = (I - P)h + Hit = f,
i.e., (I - P +II)H = I on Ker(I1). A similar argument shows that H(/-P+II) = I on Ker(H). Finally, to prove (8.4.13), let f be any function in Dom(H). Then part (a) yields that h = H f satisfies (8.4.5)-(8.4.6), so that (I - P + II)Hf = (I - P)h +Hh = (I - II) f ,
and (8.4.13) follows.
n
Remark 8.4.8. (a) Arguing as in (8.4.16) it can be shown that H(I - P)f = (I - II) f
Vf E Dom(H),
so that in addition to (8.4.13) we have H (I - P + II) f = H (I - P) f = (I - II) f .
(b) The operator Ho defined above is sometimes called the ergodic potential of P, and H1 := Ez° 0 Pk is called the potential [108, 112, 127]. In the following section we study the a-potential (or resolvent) R,„ in (8.5.1).
8.5 The Abelian Approach For every 0
Ra := (1. — aP)
—
E akpk k=0
.
(8.5.1)
8.5. The Abelian Approach
115
The close connection between the limits of the Cesar() averages P(n ) [see (8.3.1)] as n -- oo and the limits of the "Abelian means" (1 — as aI 1 has been widely exploited in a variety of contexts. In this section we use that connection to study the multichain P.E. (8.2.1). First, to ensure that, among other things, R, is well defined we let X be as in §8.4 [i.e., X is a Banach space] and suppose:
Assumption 8.5.1. P is power-bounded, i.e., there is a constant M such that liPnii < M for all n = 0,1, .... Assumption 8.5.1 obviously holds, in particular, if P is a contraction [see Remark 8.3.4(b)]. On the other hand, note that Assumption 8.5.1 implies Assumption 8.4.1 and, therefore, all the results of §8.4 are valid. Moreover, (8.5.2)
sup 11( 1 — a)R, 11 5_ M (< oo),
0
which, since P is positive (f > 0 = P f > 0), is in fact equivalent to the condition supn IIP (n)Il < oo in Assumption 8.4.1(b) (see, for instance, [36]). In addition, a well known result [19, 117] yields that, under Assumption 8.5.1, the set A(P) in (8.4.1) is the same as the set of all f E X for which the strong limit s-lim„-ti (1 — ce)R,f exists, and in fact coincides with TIf:
s- lim P (n ) f = IV = s-lim(1 — a).1=1„f n—too
at1
V f E A(P).
(8.5.3)
(See Remark 8.5.4, below.) We next extend to our present context a result in [20], which turns out to be related to Proposition 8.4.5.
Proposition 8.5.2. Suppose that Assumption 8.5.1 holds, and let f be a function in X . Then n-1
n-1
(a) an(Pn—/)Rf =
Ea k )
(1—a)R,f
( k =0 (b) For every n = 1, 2, . . . , the limit
—
E
a
k pk r J
Va E (0, 1), n= 1, 2, . . .
k=0
G n f := s-lim(Pn — .1)Ra f all
(8.5.4)
exists if and only if f is in A(P), in which case [by (a) and (8.5.3)] Gnf = n • IV
—
Snf
Vn = 1, 2 , • " ,
(8.5.5)
so that s- lim G n f In = 0; Ti —+00
(c) Given f in A(P), a function h satisfies the P.E. (8.4.5) if and only if Gn f = Pnh — h
Vn = 1,2,... .
(8.5.6)
Chapter 8. The Poisson Equation
116
Proof. (a) From (8.5.1), we get R, = I + aPR,, which iterated yields n-1
R, =
E
Vn = 1,2,..., 0 < a < 1.
a k Pk ± an Pn Ra k=0
Substracting an Ra to both sides of this equation and recalling that n-1 1 — an = (1 k=0
we obtain (a). (b) In (a) take both lim inf and lirn sup as a i 1 to get s-lim inf (Pn — I)R, f = n • s-lim inf (1 - a)R, f - S n f all
all
< n • s-lim sup(1 - a)R, f - Sn f all
=
s-lim sup(Pn - I)R, f . all
Thus, in view of (8.5.3), we conclude that (8.5.4)-(8.5.5) hold if and only if f is in A(P). Finally, (c) follows from (8.5.5) and the equality (8.4.7) [or (8.3.2) with g = 0 HP. With n = 1, Proposition 8.5.2(a) gives the following expression for (/ - P)Ra = R,(I - P):
a R,(I - P) f = f - (1 - a)R, f
Vf E X.
(8.5.7)
f E A(P).
(8.5.8)
Therefore, from (8.5.3) [or from (8.5.5) with Ti = 1], s-lim R,(I - P) f = (I - II) f all
if
We will now use (8.5.7) and (8.5.8) to obtain a result similar to Theorem 8.4.7 but using R, instead of P(n) . First, following [19] (see also [118]) let Po denote the restriction of P to A(P) and define J f := s-lim R, f , all
whenever the limit exists.
117
8.5. The Abelian Approach
Theorem 8.5.3. Under Assumption 8.5.1:
(a) Dom(J) = Ran(/ — Po ), and Ran(J) = Ker(II). (b) f is in Dom(J) if and only if (8.5.9)
h = s-lim Ra f [= J f] all
is the unique solution of the Poisson equation (g = 0 and) (I — P)h = f
IIh ----- O.
with
(8.5.10)
(c) f is in Dom(J) if and only if the pair (g, h) with g = IIf and h as in (8.5.9) is the unique solution of the multichain P.E. (8.2.1) satisfying IIh = 0. (d) The restriction of J to Ran(J) = Ker(II) is the inverse of (I — P o ), i.e.,
Vf E Ker(H). (8.5.11)
(I — P)J f = f = J(I — P)f
Proof. (a) To show that Ran(/—Po) c Dom(J), let f be a function in Ran(/ — Po). Then there is a function h in A(P) such that f = (I — P)h, and (8.5.8) gives that f is Dom(J) and J f = s-lim Ra (I — P)h = (I — II)h. (8.5.12)
Suppose now that f is in Dom(J), and let J f = h, i.e., s-lim R a f = h.
(8.5.13)
all
Then multiplying the latter equality by (1 — a) we obtain, by (8.5.3), II f = s-lim(1 — a) R a f = s-lim(1 — a) • h = all
all
0.
This in turn yields [by (8.5.8) — also recall that P is bounded] f = s-lim R a j — P)f = (I — P) - J f = (I — P)h.
(8.5.14)
all
Finally, by (8.5.13), h = s-urn R, f all
= s-iiM R a j — P)h all
[by (8.5.14)],
which, by (8.5.12), yields h = h — IIh; hence, IIh = 0.
(8.5.15)
Thus, from (8.5.14) and (8.5.15) we conclude that f is in Ran(/ — P0 ). This completes the proof of Dom(J) = Ran(/ — 130).
Chapter 8. The Poisson Equation
118
On the other hand, note that (8.5.13) and (8.5.15) imply Ran(J) c Ker(II). Now let h be in Ker(H), and let f := (I — P)h. Then (8.5.8) yields J f = s-lim Ra (I — P)h = (I — II)h = h, ail
i.e., h is in Ran(J). In other words, Ker(H) c Ran(J). (b) This follows from (a). Namely, the implication "" follows from (8.5.14) follows from (8.5.12). The uniqueness comes and (8.5.15), and the converse from Proposition 8.4.4. (c) () If f is in Dom(J), it follows from (8.5.5)—(8.5.6) and (8.3.2) that (IIf, h) is an f-canonical pair. Thus, by Theorem 8.3.2 and Proposition 8.4.4, is the unique solution of (8.2.1) satisfying (8.4.6). (IIf, , () The converse follows from part (b): In (8.5.9) and (8.5.10), replace f by f - Ilf• (d) The first equality in (8.5.11) follows from (8.5.9)—(8.5.10), and the second follows from (8.5.8) [or (8.5.12)]. Remark 8.5.4. An informal way of arriving at the second equality in (8.5.3) as well as at (8.5.16) J f = s-lim Ra f ail
in (8.5.9) is as follows. In (8.5.1) replace Pk by H + (Pk — H) to get 00
Ra =
k (pk H) 1—
a
(8.5.17)
k=0
This immediately suggest the second equality in (8.5.3) if Pk — H converges to zero sufficiently fast, for example, as in (8.4.9). In particular, using k-1
for k = 1, 2, . . j=o elementary calculations on (8.5.17) give H
Ra =
+
1/0 — (1 —
1— a
a)
00
00
E
E
k=0
n=-k+1
_
(8.5.18)
E(Pk _ n).
(8.5.19)
where, as in §8.4, n-1
OC
Ho
(pk k=0
- n) =
s- lirn
n—>oo
k=0
Thus, if the sequence in (8.5.19) converges, (8.5.18) yields the second equality in (8.5.3) as well as (8.5.16) with J f = Ho f for f in Ker(II). These calculations can be made precise even in the uniform (instead of the strong) operator topology; see e.g. [77, 95, 100, 118]. Finally, note that expressions such as (8.5.18) can be used to obtain, for instance, rates of convergence of (1 — a)Ra to H as a T 1.
119
8.6. Notes
8.6 Notes We have presented a detailed analysis of the problem of existence of solutions to the multichain P.E. (8.2.1) in a very general context, using the concept of canonical pairs (§8.3), Cesar° averages P( n) (§8.4), and a-potentials R, (§8.5). There remains, however, a lot to be done. In particular, some important open problems are the following. 1. To develop approximation schemes to solve (8.2.1), perhaps iteratively. If the state space X is finite, there are available some computational algorithms — see the references in Remark 8.3.6. For the case of general X, some of the techniques in [19, 97, 118, 120] and their references might be useful. 2. Theorem 8.4.2 provides an "ergodic decomposition" of the domain X of P, expressing X as the (disjoint) union of A(P) and its complement X\A(P). This, in turn, is used to obtain the domain and range of the operators H, H, J, and so on. It would be interesting to obtain a more precise form of these sets for the particular classes of Banach spaces X (for instance, X an L p space) used in probability applications. On the other hand, as shown in §4.5 and §5.3, for instance, there are well-known ergodic decompositions of the MC's state space X. These typically give more information on the MC's probabilistic behaviour and, therefore, it would be important to relate them to the different sets appearing in Theorems 8.4.2, 8.4.7 and 8.5.3. In other words, the basic question is to find the relation (if any) between an ergodic decomposition of X and one of X. 3. An important issue in some Markovian control problems is to determine the validity of a Laurent expansion of the form [cf. (8.5.18)] 00
/7„, = 111(1 — a) + E(—arHn,
(8.6.1)
n=0
where H is the "deviation operator" in (8.4.10) [see also (8.4.13) and (8.3.15)]. To our knowledge, (8.6.1) has only been studied under very strong assumptions [111, 129, 137], and it turns out to be related to a sequence of "nested" Poisson equations. 4. From Theorem 5.2.2 we know that, under appropriate hypotheses, the projection operator H in Theorem 8.4.2 is determined by a t.p.f. I1(x, B), which has nice implications. It would be useful to obtain a similar result for (1 — a)R, [see (8.5.3)]. 5. In the unichain case, there is a well-known relation between the existence of solutions to the Poisson equation and probabilistic conditions such as the Doeblin and Harris conditions [96, 112]. What can be said about this relation in the multichain case?
120
Chapter 8. The Poisson Equation
6. The results in this paper are for the "discrete-time" Poisson equation (8.2.1), in the strong topology. What are the corresponding results for the continuous-time case (when P — I is replaced by the generator of a continuous-time Markov semigroup Pt , t > 0) and/or in the uniform or weak (operator) topologies? What can be said about the "adjoint" Poisson equation v = vP, it(I — P) = 0— v, for a given "charge" 0 in X*? (See [19, 95, 97, 108, 117, 118, 119, 120, 100].) Most of the results in this chapter are fom Hernandez-Lerma and Lasserre [68].
Chapter 9
Strong and Uniform Ergodicity 9.1 Introduction In this chapter we introduce the notions of strong ergodicity and uniform ergodicity of MCs. We study how these notions relate to the concept of "stability" of a transition kernel and to the solvability of the Poisson equation (8.2.1). For a MC with a t.p.f. P that admits an invariant p.m. ii, the original notion of ergodicity refers to the property (2.4.5). This property occurs if ii is ergodic (Definition 2.4.1) and states a memoryless property of the MC in the long-run, in the sense that the limit of P(n) does not depend on the initial state x, for p-a.a. x E X. However, this memoryless property is true only in a proper subset of X. For the strong and the uniform ergodicity studied in this chapter, the memoryless property holds for any initial state x E X. Moreover, uniformly ergodic MCs have an interesting "stability" property under sufficiently small perturbations of the t.p.f. P. We will see that the strong ergodicity and the uniform ergodicity properties are indeed very strong. For instance, strong ergodicity implies (and in fact is equivalent to saying) that the MC is positive Harris recurrent. Some authors state strong and uniform ergodicity in terms of the asymptotic property of the sequence {Pm } of n-step transition probabilities (see, e.g., Meyn and Tweedie [103], Nummelin [1081), whereas some others (e.g., Kartashov [78], or Revuz [112]) use the same definition but in terms of the sequence of Cesaro averages {PM} rather than {Pn}. We choose the latter definition mainly because the former, in terms of {Pn}, implicitly assumes an additional "aperiodicity" property of the MC, not easy to check in general, whereas the strong ergodicity property does not require this aperiodicity assumption. Moreover, the definition of uniform ergodicity in terms of {Pri} is not really appropriate because, with this definition, it turns out that a uniformly ergodic MC in fact inherits (with no further assumption) the stronger geometric ergodicity property introduced in Definition 4.3.5.
122
Chapter 9. Strong and Uniform Ergodicity
Also in this chapter we introduce the notions of weak ergodicity and weak uniform ergodicity for a MC on a LCS metric space. These definitions are essentially the same as those of strong and uniform ergodicity, except that we look at the t.p.f. P as a linear operator acting on Cb(X) only (rather than on B(X)). The conditions for weak and weak uniform ergodicity are, of course, easier to verify. It turns out, however, that the latter weaker notions yield basically the same properties. For instance, compare Propositions 9.2.4 and 9.3.4, and Theorems 9.2.7 and 9.3.5.
9.2 Strong and Uniform Ergodicity 9.2.1 Notation Given a Banach space (X, I1.11), we denote by L(X) the Banach space of bounded linear operators on X into itself with the induced operator norm 11.11, which is defined as for Q E L(X). (9.2.1) IIQII := sup 11Qx11 11x115_1 We shall consider, in particular, the following two cases, in which (X, B) is a given measurable space: • X = M(X), the Banach space of finite signed measures on (X, B), with the total variation norm (1.4.3). Moreover, we shall denote by 11.11 the total variation norm or any norm equivalent to it, as in (1.4.4), for instance. • X --= B(X) the Banach space of bounded measurable functions on X, with the supremum (or sup) norm VII = supx If(x)1. In these cases, it will be clear from the context whether 11.11 refers to the total variation norm or to the sup norm. Consider now a MC „ = { t } on a measurable space (X, B) with t.p.f. P. We may look at P as a linear operator on M(X), that is P E L(M(X)), defining v '—f vP as in (2.2.12), i.e., v P (B) = f P(x, B) v(dx)
VI/
E M(X).
(9.2.2)
Similarly, defining f i— P f as in (2.3.6), i.e., P f (x) = f f (y)P(x, dy)
V f E B(X),
(9.2.3)
we may view P as a linear operator in L(B(X)). For notational ease, we write P for both the operator P E L(M(X)) in (9.2.2) and P E L(B(X)) in (9.2.3). Hence, by (9.2.1),
IIPII = sup{Ilv-P11 114 = supaPP1111/11
1 , v E M(X)} 1, f E B(X)I-
(9.2.4)
9.2. Strong and Uniform Ergodicity
123
From (9.2.2) we have IvP(B)I
VB E B,
f P(x,B) Iv
which yields that 11vPii operator. Similarly, by (9.2.3),
or IIP _< 1. Thus P E L(M(X)) is a contraction
IPi(x)I
11f1
f li(Y)i P(x,dY)
and so P E L(B(X)) is a contraction operator, i.e., 11PM< 1. Hence, by the recursive definition of the n-step transition probabilities Pn (see §2.2), we can see that < 1 Vn > 0. (9.2.5) This means that in either case (9.2.2) or (9.2.3), P is power-bounded, and so the techniques used in Chapter 8 are applicable in our present context. Also note that Assumption 8.4.1 holds for both X = M(X) and X = B(X).
9.2.2 Ergodicity Throughout this chapter we assume that (X, 13) is a separable measurable space and that P has an invariant p.m. tt (that is, p = AP). As usual, P(n) denotes the n-step expected occupation measures in (2.3.6).
Definition 9.2.1. Let
be a MC on X with t.p.f. P, and assume that P has an invariant p.m. p. The MC e. is said to be (a) strongly ergodic if ex E X;
11 P(n) (X1')
(9.2.6)
(b) uniformly ergodic if sup 11P ( n ) (x, .) —M TEx
0.
(9.2.7)
Let H := 1 0 p, be the linear operator on M(X) defined by v
v11 = v(1 0 tt) := v(X) 1u V v E M(X),
so that H E L(M(X)). We also define H := 1 0 tt on B(X) as f
Hf = (1
it)f(x) := f fd,u Vx E X,
so that H E L(B(X)). Hence, as for P, we shall use the same notation for H as an operator in L(M(X)) and in L(B(X)).
124
Chapter 9. Strong and Uniform Ergodicity
In either case, that is, viewing H as an operator in L(M(X)) or in L(B(X)), (9.2.7) is equivalent to (9.2.8) lirn 11 P(n) - rut = 0 7/ -p00
for the corresponding operator norm on B(X) or on M(X). Note that if . is strongly ergodic, then p, is necessarily the unique invariant p.m. for P. Therefore, by Proposition 2.4.3, p is ergodic (Definition 2.4.1), and so Proposition 2.4.2 yields the p-a.e. ergodicity property (2.4.5). However, (9.2.6) (and a fortiori (9.2.7) and (9.2.8)) is stronger than (2.4.5) in the sense that it yields
P(n) f (x) - f f dp
for all x E X and f E B(X).
(9.2.9)
We also have the following
Proposition 9.2.2. A MC is strongly ergodic if and only if it is P.H.R. Proof. The proof is a direct consequence of Theorem 4.3.2.
0
On the other hand, we have the following characterization of uniform ergodicity.
Proposition 9.2.3. (Kartashov [78, Theor. 1.3]). The MC , is uniformly ergodic if and only if the operator (I - P + H) E L(M(X)) has a bounded inverse in L(M(X)), where I E L(M(X)) is the identity operator. Moreover, in Proposition 9.2.3 we may replace M(X) with B(X), which yields the following.
Proposition 9.2.4. The MC „ is uniformly ergodic if and only if the operator (I - P + H) E L(B(X)) has a bounded inverse in L(B(X)). Proof. We proceed as in Kartashov [78, p. 8]. For each n > 1, the bounded linear operator Qn : B(X) -> B(X) defined by n
Q n := H + n -1 E(k(P (k) - H)) lc=-1
satisfies
Qn(i - P -EH) = (I - P ±H)Qn
I _ p(n+1) ± H ± n -i i.
(9.2.10)
9.2. Strong and Uniform Ergodicity
125
Therefore, if . is uniformly ergodic, choosing n such that 1 P(n+ 1) - Hil < 1 ensures that the operator (/ - P ( n+ 1) + H + n -1 /) has a bounded inverse and commutes with Q„ and (/ - P + H). Multiplying both sides of (9.2.10) by Zn := (/ - P(n+1) ± H + n -11) -1 yields that the linear operator Z E L(B(X)), defined by Z := (I - P +11)-1 = Z„Q„ = is well-defined and bounded. Conversely, let (/ - P + H) : B(X) -+ B(X) have a bounded inverse Z E L(B(X)). Observe that for all n = 1, 2, ... , (I _ p ± ri) (p(n) — H)
Hence, by (9.2.5), 11(/ - P + II)(P (n) — 11)11 follows that
< 2n -1 for all n = 1, 2, ... , and so it
11P (n) — 11 11 = 11Z (I — P + II) (P (n) - 11 )11
—11)11 2 n -1
1141.
Thus 11 P(n) — fill —÷ 0 as n -4 oo. By Definition 9.2.1(b), t. is uniformly ergodic. 0
9.2.3 Strong Stability Now consider Kartashov's [78] notion of a strongly stable MC.
Definition 9.2.5. A MC . with t.p.f. P is said to be strongly stable in the norm M.M if there is some € > 0 and a neighborhood N(P,€) := {(2 I IIQ — PM < €} of P in L(M(X)) such that (a) Every stochastic kernel Q E N(P, c) has a unique invariant p.m. v = vQ. (b) For every sequence {Qn } of stochastic kernels in N(P, e) for which MQ n — PH ---* 0, one has Ilv, - liM -4 0, where lin is the invariant p.m. for Q„, and p, is the invariant p.m. for P. In fact, this definition is also valid for other norms than (1.4.3), provided that they satisfy some conditions (see Kartashov [78, page 5]). It turns out that the uniformly ergodic MCs satisfy the strong stability property, and conversely.
Theorem 9.2.6. (Kartashov [78, Theor. 1.6]). A MC is strongly stable in the norm 11.11 if and only if it is uniformly ergodic. Thus, uniform ergodicity is equivalent to a strong form of stability, that is, under a sufficiently small perturbation of its t.p.f., a uniformly ergodic MC preserves its "ergodic" properties.
126
Chapter 9. Strong and Uniform Ergodicity
9.2.4 The Link with the Poisson Equation We may use Theorem 9.2.6 to relate strong and uniform ergodicity to the solvability of the Poisson Equation (P.E.) in (8.2.1) with ga- constant, i.e., (9.2.11)
g (I — P)h = f. Indeed, we have the following.
Theorem 9.2.7. (a) The MC e, is strongly ergodic if for every B
E B, there exist
a scalar gB and a measurable function hB that solve the P.E. (9.2.11) with charge i.e., f (9.2.12) g (I — P)h = 11B, and such that PnhB/n ---+ 0 pointwise. (b) The MC e, is uniformly ergodic if and only if for every "charge" f B(X), the P.E. (9.2.11) has a solution (gf,hf) ER x B(X).
E
Proof. (a) Choose an arbitrary B c B, and let (gB , h B) be a solution of the P.E. (9.2.12) such that PnhB/n —+ 0 pointwise as n —+ oo. From (8.3.2) with f = "LB we obtain gB + PnhB (x)In = hB(x)/n+ P(n) (X, B) Taking limit in (9.2.13) as n
(9.2.13)
co yields
P( n ) (x,B)
gB
From Theorem 4.3.1 it follows that
II P(n) (X1
Vx E X.
A(*)
V x E X.
is P.H.R., and thus, by Theorem 4.3.2, —+0
Vx E
X,
where p, is the unique invariant p.m. for P. This proves (9.2.6), and so is strongly ergodic. (b) The only if part. From Proposition 9.2.4 the operator I— P +II : B(X) B(X) has a bounded inverse Z E L(B(X)). It is easy to check that for any given f E B(X), the pair (gf,hf) E R x B(X) with gf := f fdp, and hf := Zf solves the P.E. (9.2.11). The if part. Choose an arbitrary charge f e B(X), and assume that the P.E. (9.2.11) has a solution (gf ,hf ) E R X B(X). As in the proof of (a), it follows that is P.H.R. with a unique invariant p.m. p. Next, the solution hf is unique in B(X) up to a constant because if h'f is another solution in B(X), then (I — P)(h f — hif ) = 0. Therefore, by Theorem 4.2.14, the bounded P-harmonic function hf — hif is constant. Let Bo(X) := {f E B(X)I f fdp, = 0} c B(X).
9.3. Weak and Weak Uniform Ergodicity
127
We thus have (I - P)B(X) = Bo(X) and I - P: Bo(X) -> Bo(X) is one-to-one and onto. By the Banach Theorem (see e.g. [34, Theor. 2, §II 2.1.2]), I - P has a bounded inverse Q: Bo(X) Bo(X). Proceeding as in Revuz [112, P. 204], n-1
n-1
pifflfII
11nl
= 11 71-1
i=0
E Piu - rif)11 i=0 n-1
= lin - 1 E PT/ - P)Q(f
11 /)
i=0
<
4n' 1Q11
Hence sup IIP (n) / Tin -3 0 or, equivalently, II P(n)
-4 0 1
which is also equivalent to uniform ergodicity (see (9.2.8)).
El
Remark 9.2.8. (a) Theorem 9.2.7(b) is slightly different from Theorem 3.5(ii) in Revuz [112] where it is stated that uniform ergodicity is equivalent to : (al) there is an invariant p.m. u, (a2) the bounded harmonic functions are constant, and (a3) (I - P)B(X) = B o (X). (b) Given a weight function w : X -> [1, oo), one can also get a weighted version of Theorem 9.2.7 with the spaces B„(X), M ill (X) defined as B(X) = {f
sup IA4
xEX
in lieu of B(X), M(X), with
Mph,
as in (4.3.6).
9.3 Weak and Weak Uniform Ergodicity Strong ergodicity is indeed a strong property because, as we already noted in Proposition 9.2.2, it is equivalent to P.H.R. Of course, uniform ergodicity is an even stronger property. However, as many MCs with a unique invariant p.m. are not P.H.R., one may wonder whether there exists a notion weaker than strong ergodicity. It turns out that when X is a LCS metric space, replacing the convergence in total variation norm in (9.2.6) with weak convergence leads to the following concept.
128
Chapter 9. Strong and Uniform Ergodicity
9.3.1 Weak Ergodicity Definition 9.3.1. Let , be a MC on a LCS metric space X, with t.p.f. P. Assume that P has an invariant p.m. it. Then e. is said to be weakly ergodic if Vx E X, where "" denotes the weak convergence of p.m. 's (see Definition
(9.3.1) 1.4.10).
Clearly, (9.3.1) is a lot weaker than (9.2.6). (See (1.4.10).) Of course, if (9.3.1) holds, the "time average = space average" ergodicity property (2.4.5) is still true for arbitrary f E Li (p), since it is the unique invariant p.m. (Recall that a weakly convergent sequence of p.m.'s has a unique limiting measure.) However, instead of the p-a.e. convergence in (2.4.5), under (9.3.1) we have for all x E X, and f E Cb(X).
Pen) f(x) --4 f f dp
(9.3.2)
We have seen in §5.2 and §5.3 that for a MC . on a LCS metric space, with a t.p.f. P that admits an invariant p.m., there is a Yosida decomposition of X into ergodic classes such that in each ergodic class E there is a unique (ergodic) invariant p.m. co and a set A c E of full cp-measure, such that p(n)( x,
.)
ep
Vx E A.
In other words, in a LCS metric space, the restriction of a MC to its ergodic classes is weakly ergodic, with no further assumption than the existence of an invariant p.m. The next result provides a sufficient condition for weak ergodicity.
Proposition 9.3.2. Let e. be a MC on a LCS metric space X, with t.p.f. P. Assume that (a) P is weak-Feller. (b) P admits a unique invariant p.m. p. (c) There exist a measurable function V : X ---4 IR+ and a moment function f : X —> 111 + (Definition 1.4.14) such that PV (x) < V (x) — f (x) -I- 1 Then . is weakly ergodic.
Vx E X.
(9.3.3)
9.3. Weak and Weak Uniform Ergodicity
129
Proof. Iterating (9.3.3) we obtain
n-1
> pnv +
Pk
f — n,
k=0
which gives
n-1 V>
E Pk f —
n Vn > 1,
k=0
because V is nonnegative. Hence, sup f P (n ) (x, dy) f (y) < 1 + V(x)
Vx E X.
As f is a moment, from Proposition 1.4.15 we have that the sequence IP(n ) (x,.)} is tight for every x E X [see Definition 1.4.14 Now choose an arbitrary x E X. By Prohorov's Theorem 1.4.12, there is a p.m. px E M(X)+ and a subsequence {nk} of {n}, such that /3(71 )(x, .) = ,a x. The latter convergence and (a), together with Proposition 7.2.2, yield that fix P = lix. Thus, by the uniqueness of the invariant p.m. p in (b), px = p for all x E X. Therefore, as x E X was arbitrary, the whole sequence {PM (x, .)} converges to p, that is, (9.3.1) holds. 0 We next introduce the notion of weak uniform ergodicity.
9.3.2 Weak Uniform Ergodicity Definition 9.3.3. Let e. be a MC on a LCS metric space X, with a weak-Feller t.p.f. P. Assume that P has an invariant p.m. p. Then e. is said to be weakly uniformly ergodic if sup
sup I P( n ) f(x) —
0
as n
oo.
(9.3.4)
fEcb(x);11f11<1 xEx
In other words, for a weakly uniformly ergodic MC, the weak convergence in (9.3.1) is uniform in x E X and f in the unit ball of Cb(X). Recall that Cb (X) equipped with the sup norm is a Banach space. Thus, as P is weak-Feller, we can consider P and H = 1 0 p as linear operators on Ch(X) (that is, P and H are in L(Cb (X))) with the corresponding operator norm 11.11b on L(Cb(X)), 11 (211b :=
sup
sup
{fEcb(x)difil
Kg(x)1
for Q E L(Cb(X)).
This norm is denoted by111Q1 b to avoid confusion with the norm Then (9.3.4) is equivalent to
11 -P(n)
rqb —>
0
as n
oo.
We have the following analogue of Proposition 9.2.4
1Q11
in (9.2.4). (9.3.5)
Chapter 9. Strong and Uniform Ergodicity
130
Proposition 9.3.4. The MC . is weakly uniformly ergodic if and only if the operator (I P + H) E L(Cb(X)) has a bounded inverse. —
Proof. The proof is a verbatim copy of the proof of Proposition 9.2.4 with the 0 norm 11.11b in lieu of 11.11. We also have an analogue of Theorem 9.2.7.
Theorem 9.3.5. Let e. be a MC on a LCS metric space, with a weak-Feller t.p.f. P. Assume that P has a unique invariant p.m. p. (a) e, is weakly ergodic if for every f E Cb(X), there is a scalar g f and a measurable function hf such that (g f, h f) solves the P.E. (9.2.11), and in addition Pnh f In —> 0 pointwise as n —+ oo. (b) e, is weakly uniformly ergodic if and only if for every f E Cb (X), the P.E. (9.2.11) has a solution (g f, h f) in R X Cb(X). Proof. (a) The proof is similar to that of Proposition 9.2.7(a). Fix an arbitrary f E Cb(X). Iterating (9.2.11) yields PO') f (x) —* g1
Vx E
X.
(9.3.6)
As f E Cb(X) was arbitrary, it follows that P(n) (x, .) = v for some v E M(X). Hence, as P is weak-Feller, by Proposition 7.2.2 it follows that v is an invariant p.m. for P. Moreover, from the uniqueness of p,, we have v = p. Thus, P(n ) (x, .) = p for p all x E X, that is, e, is weakly ergodic. Moreover, from (9.3.6) and P(n) (x, .) for all x E X, it follows that gf = f fdp= fIf. (b) Only if part. From Proposition 9.3.4, the operator I—P+II has a bounded inverse Z E L(Cb(X)). Given f E Cb(X), the pair (gf,hf) with gf := Hf and h f := Z f E L(Cb(X)) solves the P.E. (9.2.11). If part. Choose an arbitrary f E Cb(X) and let (g f,h f) E R x Cb(X) be a solution to (9.2.11). By (a), e, is weakly ergodic and g f = fIf . It also follows that the continuous bounded harmonic functions are constant. Indeed, let fo E Cb(X) be such that Pfo = fo (and hence, P(n)f0 = fo for all n > 0). As , is weakly ergodic, we have fo(X) = nlinI Pen) fo (x) = fdp, fo
Vx E X.
Thus, let (g f , h f) and (g"f , W.f.) be two solutions of the Poisson equation, h'f ) = 0, that is, the function with h f, lef E Cb(X). It follows that (I P)(h f (h f — h'f ) E Cb(X) is harmonic, hence constant. Let Coo (X) C Cb(X) be the subspace of Ch(X) defined by —
—
Coo (X) := If E Cb(X) I Ilf = 01. Then, using the weak-Feller property, we have (I — P)Cb(X) = Coo (X), and the linear operator (I P) E L(Cb(X)) is one-to-one and onto from Coo (X) into —
9.4. Notes
131
Coo (X). By the Banach Theorem (see, e.g., [34, Theor. 2, §11.2.1.2]), / — P has a bounded inverse Q : Coo (X) —) Coo(X). The rest of the proof mimics that of the if part of Proposition 9.2.7, with the norm M.Mb in lieu of 11.11. LII
Example 9.3.6. Consider the following example from Borovkov [15]. Let X := [0, 1] and let e. be the MC
=
+ 'I't
(mod 1),
t = 0, 1,
,
(9.3.7)
where {V} is a sequence of i.i.d. random variables with Prob(O t = a) = p =1 — Prob(6 = 0), with a > 0 irrational, say a = -VI This MC can be interpreted as a summation process on a circle of unit length, and we have P(x,.)
ji
Vx E X,
(9.3.8)
where i is the Lebesgue measure on [0,1], and it is the unique invariant p.m. for P (see Borovkov [15, Example 1, p. 544]). Of course, (9.3.8) yields that
P(n) (x,.) =
Vx E X,
(9.3.9)
that is, the MC is weakly ergodic (see Definition 9.3.1). In addition, the support of P( ') (x,.) is finite for every n > 0 and x E X. Hence, as in (6.2.4)—(6.2.7), the convergence in (9.3.9) can only be weak and not in total variation, which implies that the MC is not strongly ergodic.
9.4 Notes Some definitions and results of this chapter are from Kartashov [78]. On the other hand, the link with the Poisson equation and the notions of weak ergodicity and weak uniform ergodicity are from the authors. As already mentioned, several authors (see, e.g., Meyn and Tweedie [103], Nummelin [108]) define the strong ergodicity and the uniform ergodicity of a MC, using the convergence in the total variation norm of the n-step probabilities /31 (x,.) (rather than the Cesar° averages P(n)(x,.)) to /..t. Revuz [112] considers both definitions with Pn and P("), respectively. For the former, with Pm, he requires P to be Harris recurrent, aperiodic and quasi-compact, and the latter P is Harris recurrent and quasi-compact (see [112, Theor. 3.4, 3.5, pp. 203-204]), where P is said to be quasi-compact if there exists a sequence {U n } of compact operators on B(X) such that —+ 0 as n oo. Moreover, P is quasicompact if and only if there exists a compact operator U and an integer n o such that Pn° — U M < 1 (see Revuz [112, Prop. 3.3, p. 202]).
Part III
Existence and Approximation of Invariant Probability Measures
Chapter 10
Existence and Uniqueness of Invariant Probability Measures 10.1 Introduction and Statement of the Problems A critical assumption for many results of previous chapters is that a MC admits at least one invariant p.m. In this chapter we investigate the issue of existence of those invariant p.m.'s. Namely, we consider a MC on a measurable space (X, B), with t.p.f. P and we present necessary and sufficient conditions for •
P. Existence of invariant p.m.'s for P;
•
P;. Existence of strictly positive invariant p.m.'s; and
•
P. Existence and uniqueness of strictly positive invariant p.m.'s.
By a strictly positive invariant p.m. in problems 13'; and P;, we mean an invariant p.m. it on B such that p(G) > 0 for any open set G E B, and, therefore, we shall consider PI and 7-31; in the case in which X is a topological space with Borel a-algebra B. In fact, some of our results require X to be a metric space. Finding an invariant p.m. for P is essentially equivalent to finding a solution it to the linear system
1.113 = it with
1i(X) = 1 and tt E M(X) +
in the Banach space M(X) of finite signed measures on X (see §1.2.1).
(10.1.1)
Chapter 10. Existence of Invariant Probability Measures
136
In addition to (10.1.1) we also consider the constraint ii
^ (Po
(10.1.2)
5_ IN,
(10.1.3)
and/or
where (po , /Po are nontrivial measures in M(X)±. When v;i 0 in (10.1.2) is strictly positive, that is, p0 (G) > 0 for every open set G E B, we obtain conditions for problems 71); and P. On the other hand, finding conditions for existence of an invariant p.m. that satisfies (10.1.3) might be useful to determine the existence of a suitably majorized invariant p.m., for instance, with a "tail" property. Finally, we also consider the problems Pi* for MCs in a restricted setting, namely when X is a LCS metric space and the t.p.f. P satisfies the weak-Feller property (see Definition 4.4.2). In this case, we can derive existence conditions without invoking constraints of the form (10.1.2), as we do in the general case. These results are presented in §10.4. The approach. Our approach to problems P7 is in fact very straightforward. The main idea is to write the problems P: as "linear equations" in appropriate Banach spaces, and then use a Generalized Farkas Theorem of Craven and Koliha [25, Theor. 2] to obtain necessary and sufficient conditions for the corresponding linear equations to have a solution. The main technical difficulty is to check whether a certain set is weakly closed, for which we use the Alaoglu (or Banach-Alaoglu-Bourbaki) theorem (see Lemma 1.3.2). We could in principle use alternative approaches to linear equations (see [25, 56]), but the present approach has the advantage that the resulting existence and uniqueness criteria resemble Lyapunov (or Foster-Lyapunov) criteria, usually found in related results. The results for problems P i* are presented in §10.3, §10.4 and §10.5. As some of the proofs are rather technical, for ease of exposition they are postponed to §10.7, after some technical preliminaries in §10.6.
10.2 Notation and Definitions Let (X, B) be a measurable space, and e, a MC on X with t.p.f. P. Let B(X) be the Banach space of real-valued bounded measurable functions on X, with the supremum norm f II = supx 1 f (x) I. As was already noted in §1.3, the spaces M(X) and B(X) form a dual pair (M(X), B(X)) of vector spaces with the bilinear form (IL u):= f u dL
V,u, E M(X), u E B(X).
(10.2.1)
10.2. Notation and Definitions
137
As in §9.2, we use the same "P" to denote the mapping f 1-> P f on B(X) defined by (9.2.3) (or (2.3.6)) and the mapping v i- vP on M(X) defined by (9.2.2) (or (2.2.12)). From Remark 2.3.7(13) (see also §9.2), P is a positive contraction and also a Markov operator. A signed measure p, in M(X) is said to be a fixed point of P if it is Pinvariant, i.e., ,u,P = p. The problem of finding an invariant p.m. for P is basically equivalent to that of finding a nontrivial fixed point of P. Indeed, let p E M(X) be a fixed point of P, and let {D, DC} be the Hahn-Jordan decomposition of p. Denote by p+ (B) := p,(B n D) and p - (B) := p(B n Dc) the positive and negative parts of ,u,, respectively. It is then clear that p, = p+ + p,- is a fixed point of P if and only if p+ and p- are invariant p.m.'s for P (after renormalisation to a p.m. if necessary). Let ba(X) be the Banach space of finitely-additive measures on B, equipped with the total variation norm. Then, ba(X):-_- B(X)* [see (1.2.4) or [34, Theor. IV.5.1]]. Let P* : ba(X) --4 ba(X) be the adjoint (or dual) of the operator P : B(X) -> B(X), that is, P* is such that: u) = (p, Pu),
Vic E ba(X), u E B(X).
(10.2.2)
As M(X) c ba(X), and in view of (2.2.12), when restricted to M(X), P* coincides with P, that is, P* p(B) = pP(B) = f p(dx) P(x, B) x
VB E B, eu, E M(X),
(10.2.3)
so that P* maps M(X) into itself, that is, P*(M(X)) c M(X). If p and v are in M(X), we denote by p, V v (resp. p, A v) the maximum (resp. minimum) of p, and v. That is, p,V v := and
1
1
2 A family K C M(X) is order-bounded from above (resp. from below) if there is some v E M(X) with p, < II (resp. v < p) for all p E K. (See [44, 69, 70, 71] for results, examples and applications of order-boundedness of measures). Now let fp, I be a bounded sequence in M(X)+, and define the order lim inf of {11,} as -
0 - lim inf pr, := n —■ DO
VA m>1 n>m
in 1
(10.2.4)
which is a (possibly trivial or unbounded) measure. Indeed, for every m > 1, the sequence {A n , n > ml C M(X) + is order-bounded from below by the trivial measure 0. The measures 71m := An>m An are well-defined, and {wi } is a nondecreasing sequence as m -> oo. Therefore, by the Vitali-Hahn-Saks theorem [see
Chapter 10. Existence of Invariant Probability Measures
138
Proposition 1.4.21, rini I (p for some, not necessarily finite, measure (p. Similarly, the order-lim sup of bin } is defined as 0 — lim sup it, := n co
A V An , m>1 n>m
(10.2.5)
which is a possibly unbounded measure, that is, not necessarily in M(X)+.
10.3 Existence Results We first consider results for problem Pr in a general measurable space (X, B) , and for problems '1::' and P;, X is a metric space. Let {P() (x, .)} be the sequence of expected occupation measures with initial state x e X, defined in (2.3.4).
10.3.1 Problem Theorem 10.3.1. Let . be a MC on X with t.p.f. P. Then the following statements are equivalent: (a) There is a measure ft E M (X) that satisfies (10.1.1) and (10.1.2) for some measure cp o E M(X)+. (b) The condition
(P — I)u < —v + s with u,v E B(X) + and s E R+
(10.3.1)
implies —
(40 o, v) + s > 0
(10.3.2)
for some measure (po e M(X)+. (c) There is a measure 7 in M(X) ± such that "ii := 0 — lim inf -yP (n) is in rt -■ 00
M(X)± , and it is nontrivial. We also have the following "majorization" version of Theorem 10.3.1.
Theorem 10.3.2. Let „ be a MC on X with t.p.f. P. Then the following statements are equivalent: (a) There is a measure au, E M (X) that satisfies (10.1.1) and (10.1.3) for some measure 00 c m(x)±. (b) The condition
(P — I)u
(10.3.3)
implies (00, v) — s > 0
(10.3.4)
for some measure O o E M(X) +. (c) There is a measure 7 in M(X) + such that -3 := 0 — iirnsup7P(n) is in M(X)+.
10.3. Existence Results
139
For a proof of Theorem 10.3.1 and Theorem 10.3.2 see §10.7 Remark 10.3.3. (a) Theorem 10.3.1 improves many previous results in that it gives necessary and sufficient conditions for the existence of invariant p.m.'s for P under no hypotheses; most of the results in the literature give only sufficient conditions for the existence of an invariant p.m. or require additional hypotheses. For instance, one usually assumes that X is a LCS metric space, and that P satisfies the weak-Feller property (see Definition 4.4.2), or even, instead of the Feller property, quite often it is required the stronger hypothesis P maps Co (X) into itself,
(10.3.5)
where Co (X) is the Banach space of bounded continuous functions that vanish at infinity, equipped with the sup-norm. [See, for instance, [7, 85, 87, 99, 103] and their references.] Some authors write (10.3.5) in the equivalent form: P satisfies the Feller property and, in addition, for every compact K C X, the function x 1-4 P(x, K) vanishes at infinity. (b) The references mentioned in (a) can be related to Theorem 10.3.1 by noting the similarity between (10.3.1) and the Lyapunov (or Foster-Lyapunov) criteria in the literature; see also (10.4.4) below. In particular, the above two theorems permit to derive a simple sufficient condition for existence of an invariant p.m. (For examples of MCs that satisfy the hypotheses in parts (a) and (b) of the following corollary, see, e.g., [70, 71] and the references therein.) Corollary 10.3.4. Let l. be a MC on X with t.p.f. P. (a) If P (n) (x, .) > coo for some x E X, some nontrivial measure (po E M(X)+ , and all n = 1, 2, . . . , then P admits an invariant p.m. (b) If P(n)(x,.) < 00 for some x E X, some measure 00 E M(X)± , and all n = 1, 2, ... , then P admits an invariant p.m. Proof. (a) We will show that the condition (10.3.1) implies (10.3.2) for the measure co o , which by Theorem 10.3.1 yields the desired conclusion. Rewrite (10.3.1) as u > Pu + v - s and iterate n times to get n-1
u > Pn u ±
E Pt v - ns. t=0
Therefore, rearranging terms and multiplying by n -1 , we obtain, for all x E X, n-i r1,-1 (Pn - pu(x) <
- n -1
E Pt v (x) + s
t=0
=
- (P(n) (x , .), v) + s,
so that with x and coo as in (a) we obtain n -l (Pn - /)u(x) < -((po,v) +
S.
140
Chapter 10. Existence of Invariant Probability Measures
Taking limit as n -4 oo and using the boundedness of u we get ((po, v) < s, that is, (10.3.2) holds. Thus the conclusion in (a) follows from Theorem 10.3.1. (b) Rewriting (10.3.3) as u > Pu — v ± s and arguing as in the proof of (a) we obtain that for all x E X n-1
n -1 (Pn — I)u(x) <
_i E pt, (x) _ s n t=0
,
(p(n)(x,
.), v) —
s.
Hence, with x and 00 as in (b), we get n -1 (Pn I)u(x) 5_ (00 ,v) — s. Taking limit as n —> oo yields (0 0 , v) — s > 0, and so (10.3.3) implies (10.3.4). Thus, the desired conclusion follows from Theorem 10.3.2. CI
10.3.2 Problem pI We now turn our attention to the existence of strictly positive invariant p.m.'s, assuming that X is a metric space. In fact, as an immediate consequence of Theorem 10.3.1 we obtain the following.
Corollary 10.3.5. Assume that X is a metric space with Borel a-algebra B, and let . be a MC with t.p.f. P. Then the following conditions are equivalent: (a) There is an invariant p.m. which is strictly positive; that is, there is a measure p E M(X) that satisfies (10.1.1) and (10.1.2) for some strictly positive measure (Po. (b) The condition (10.3.1) implies (10.3.2) for some strictly positive measure o. (c) There is a measure 7 in M(X) such that '')-, := 0— lim inf ,.),p(n) is in TI -4 00
M(X)+ , and it is strictly positive. (d) There is a measure -y in M(X) ± such that 7P(n) (B) -- aB V B E B, as n —> oo, with sup BEB aB < oo, and lim -yP( n ) (G) > 0 for every open set C E B. n-+(x)
Proof. The equivalence of (a), (b) and (c) follows directly from the proof of Theorem 10.3.1 (see §10.7.1). On the other hand, to get that (a) = (d) it suffices to take 7 := p, and aB := ,u,(B). Finally, to prove that (d) = (a), let -y, be the measure defined by 7n (B) := 7P(B)
VB E B, n = 1, 2, .. . .
10.3. Existence Results
141
If (d) holds, i.e., -y(B) aB for every B E B, by the Vitali-Hahn-Saks theorem (Proposition 1.4.2) there is some measure it E M(X)+ such that 'y71 (B)
,a(B)
VB E B,
(10.3.6)
that is, 'Yn —>u setwise, and p, is strictly positive. In addition, 'Yn/3 = 'Yn + n -1 [21 /3n -Y] •
(10.3.7)
Therefore, as P maps B(X) into itself, for every B E B we have ,a(B) = lim y(B) = n oo
lim00(7?-1, P1B) [by (10.3.7)]
fl
=
(P,P1B) [by (10.3.6)]
= so that ,aP = it, that is, it is a strictly positive invariant p.m.
El
10.3.3 Problem P1' We now turn our attention to problem 'P:3", that is, the existence of a unique strictly positive invariant p.m. We first use Theorem 10.3.1 to obtain the existence of an invariant p.m. it which is not strictly positive, that is, p,(G) = 0 for some given (fixed) open set G c B.
Corollary 10.3.6. Assume that X is a metric space with Borel o - -algebra B, and let G X be a nonempty open set. Then the following conditions are equivalent: (a) There is an invariant p.m. ,u, with p(G) = 0. (b) The condition (10.3.3) implies (10.3.4) for some measure '00 E M(X)+ such that 00(G) = 0. (c) There is a measure -y in M(X) + such that 5% := 0 — lirn sup -yP ( n) is in M(X)+ , and 5%(G) = 0. (d) There is a nontrivial measure -y in M(X) ± such that lim 7P(B) = aB V B E B, 71,-)• 00
with sup BEB aB <00, and -yP( n) (G)
0 as n
oo.
Proof. To prove the equivalence of (a), (b) and (c), it suffices to observe that the existence of an invariant p.m. ji with p,(G) = 0 is equivalent to the problem of existence of an invariant p.m. it majorized by some measure zi)c, E M(X) + with Oo(G) = 0. Then apply Theorem 10.3.2. The proof of (a) <=> (d) mimics the proof of the equivalence of (a) and (d) in Corollary 10.3.5. LI We now consider problem 1:%'; again but assuming that P satisfies the strongFeller property (Definition 4.4.2), that is, P(B(X)) c
cb (x).
(10.3.8)
(In the following two sections we replace (10.3.8) with the weak-Feller property.)
142
Chapter 10. Existence of Invariant Probability Measures
Theorem 10.3.7. Assume that X is a metric space with Borel a-algebra B, and P is strong-Feller. Then the following conditions are equivalent: (a) P has a unique invariant p.m. and it is strictly positive, or P does not have an invariant p.m. (b) For every nonempty open set G X and for every Oci E M(X) + such that Oo(G) = 0, there are functions G, VG E B(X)+ , and a scalar SG such that (P - I))11c < VG - sG and (00 ,vG )— s G <0. (c) For every nonempty open set G X and every measure -y E M(X)+, the measure 5% := 0- lim sup -yP ( n) is either unbounded or such that "y'(G) > 0. n —>
Proof. (b) (c) follows from the equivalence of (b) and (c) in Corollary 10.3.6. Moreover, from Corollary 10.3.6 again, (b) is equivalent to saying that for every open set G X, there is no invariant p.m. with u(C) = 0. Equivalently, either there is a strictly positive invariant p.m. or there is no invariant p.m.. Hence (a) (b). To complete the proof let us suppose that (b) holds. Now if (a) does not hold, then P has an invariant p.m. which is not strictly positive, which contradicts (b), or P has more than one invariant p.m. Let 4a 1 and 112 be two invariant p.m.'s with p2. Then the finite signed measure v := pi - /12 is a fixed point of P, and so are the nonnegative measures v+ and v - [see Lemma 2.2.3 or the paragraph following (10.2.1)]. Moreover, v+ and v - are both nontrivial fixed points of P. To see this, suppose, for instance, that v+ (X) = 0. Then < ,a2 , and so Remark 2.2.4 yields ,u,i = p2, which contradicts our assumption p i ,a2 . A similar argument shows that v - (X) > 0. Now, as v+ and v - are nontrivial, we may assume that they are invariant p.m.'s for P. Let {D, DC} be the Hahn-Jordan decomposition of v = v+ - v - , so that, in particular, v+(D) =1 and v - (D) = 0. Then (i) P(x,D) = 1 v+-a.e.,
and (ii) P(x,D) = 0 v - -a.e.
(10.3.9)
Indeed, by the invariance of v+ , v+(D) = f v+(dx)P(x,D), ,c which combined with v+(D) = v+(X) = 1 yields (10.3.9)(i); a similar argument gives (10.3.9)(ii). Furthermore, by (10.3.9)(i), D is v+-invariant, and thus (as in (2.4.1)), there exists an invariant set B C D such that v+ (B) = 1 and P(x, B) = 1 for all x E B; in fact, by the strong-Feller property, P(x, D) = 1 for all x in the closure B of B. Note that the complement BC of B is nonempty because otherwise B = X would imply P(x,D) =1 for all x in X, contradicting (10.3.9)(ii). Finally, we observe that v+(B c ) = 0 since BC is contained in BC and v +(BC) = 0. In other words, assuming that (a) does not hold, we have found that v+ is an invariant p.m. that vanishes on the nonempty open set BC X, which contradicts (b). Hence (b) implies (a).
10.4. Markov Chains in Locally Compact Separable Metric Spaces
143
10.4 Markov Chains in Locally Compact Separable Metric Spaces Some results in this section require Assumption 10.4.1, below, in which we use the weak-Feller property (see Definition 4.4.2).
Assumption 10.4.1. (a) X is a LCS metric space; (b) P is weak-Feller. We first state an analogue of Theorem 10.3.1(a),(b) in the present context of Assumption 10.4.1. In fact, the only thing we do is that we replace B(X) + in (10.3.1) with Cb(X) +, see (10.4.1).
Theorem 10.4.2. Let . be a MC on X with t.p.f. P. Under Assumption 10.4.1, the following statements are equivalent: (a) There is a measure it E M(X) that satisfies (10.1.1) and (10.1.2) for some measure (po E M(X)+ . (b) The condition (P — I)u < —v + s with u,v E Cb(X) + and s E111+
(10.4.1)
implies —
((P o, v ) + s > 0
(10.4.2)
for some measure (p o E M(X) + . We shall now present specialized versions of Corollary 10.3.6 and Theorem 10.3.7 for strictly positive invariant p.m.'s. As in §10.3.3, we first consider an invariant p.m. which is not strictly positive. That is, there is a measure iu, and a nonempty open set G X such that (i) p, (/ — P) = 0,
(ii) (p, 1) < 1,
(iv) (ii, fo) ?
E,
(iii) (p, 1lG) < 0, and
with it in M(X)+,
(10.4.3)
for some number E > 0 and some strictly positive function fo in Co (X)+. The reason for stating the existence of such an invariant p.m. in the form (10.4.3) will be apparent from the proof of Theorem 10.4.3. In particular, (10.4.3)(iv) ensures that p is nontrivial, that is, p(X) > 0, which combined with (10.4.3)(i)—(iii) yields that it [multiplied by 11 p(X) if necessary] is an invariant p.m. that vanishes on G. The following theorem gives necessary and sufficient conditions for the existence of a solution it in M(X) + to (10.4.3).
X be a Theorem 10.4.3. Suppose that Assumption 10.4.1 holds, and let G nonempty open set. Then the following conditions are equivalent: (a) There is a measure it E M(X) ± that satisfies (10.4.3) for some € > 0.
144
Chapter 10. Existence of Invariant Probability Measures (b) There exists some E > 0 such that the condition (P — I)u < a + (31G — -y fo , with u E Cb(X)+ and a,I3,-y > 0,
(10.4.4)
implies a > E-y.
(10.4.5)
(c) There exists x E X such that lim inf P(n) (x , G) = 0 n— ■ oo
and
lim inf P( n) fo(x) > 0. n.— oo
For a proof of Theorem 10.4.3 see §10.7. Moreover, from Theorem 10.4.3 we get the following criterion for the existence of a unique strictly positive invariant p.m.. Corollary 10.4.4. Suppose that Assumption 10.4.1 holds. Let „ be a MC on X with t.p.f. P and let f o be a strictly positive function in C o (X). If X is a LCS metric space and P is strong-Feller, the following conditions are equivalent: (a) There is no p, E M(X) ± that satisfies (10.4.3); that is, either P does not admit an invariant p.m. or, if it does, it is (unique and) strictly positive. (b) For every nonempty open set G X and every E > 0, there is a function u E Cb(X)+ and constants a,13,-y > 0 such that (P — I)u < a + 01G — -yfo
and a < E7,
(10.4.6)
or, equivalently, (P — I)u < a + 01G — fo and a < E.
(10.4.7)
(c) For every nonempty open set G X and every x E X lim inf P(n) (x , G) > 0 n--c>o
or
lim inf P(n) MX) = 0. n--4co
(10.4.8)
Proof. To obtain (10.4.7) from (10.4.6) it suffices to multiply the inequalities in (10.4.6) by 1/-y and relabel u/-y, a/-y, and 131-y as u, a and 0, respectively. The relations (a) = (b) <=> (c) follow from Theorem 10.4.3. To complete the proof let us suppose that (b) holds. Now if (a) does not hold, then either P has an invariant p.m. which is not strictly positive, which contradicts (b), or P has more than one invariant p.m.. As in the proof of Corollary 10.3.7, one uses the strong-Feller property to prove that P cannot admit two distinct invariant p.m.'s. El Remark 10.4.5. Uniqueness of invariant p.m.'s, strictly positive or not, is a tough question and, in particular, the strong-Feller property required in Corollary 10.4.4 seems to be unavoidable in our present context: Skorokhod [122] gives an example of a MC in which X is compact, P satisfies the weak-Feller property and it is topologically connected [meaning that EkP k (x, G) > 0 for any x E X and any nonempty open set G], and still it does not have a unique invariant p.m. He does obtain uniqueness results (see, for instance, [122, Theorem 2]), but under hypotheses much more restrictive than those of Corollary 10.4.4.
10.5. Other Existence Results in Locally Compact Separable Metric Spaces 145 For examples of MCs that satisfy the strong-Feller property see after Definition 4.4.2.
10.5 Other Existence Results in Locally Compact Separable Metric Spaces The existence results in §10.3 and §10.4 are stated in terms of minorizing or majorizing measures as in (10.1.2) and (10.1.3). In this section we consider a different type of existence results, using a moment function (Definition 1.4.14) and the Banach space Co(X) of bounded continuous functions on X that vanish at infinity.
10.5.1 Necessary and Sufficient Conditions We first present a set of necessary and sufficient conditions.
Theorem 10.5.1. Let be a MC on X with t.p.f. P. Let Assumption 10.4.1 hold and let fo be a fixed, strictly positive function in C o (X). Then the following four statements are equivalent: (a) P admits an invariant p.m. p E M(X). (b) For some initial distribution v E M(X) ± and some moment function f, 0 < liminf n -1 EEf(et) n—*oo
<
00.
(10.5.1)
t =o
(c) For some initial state x E X, lim inf P(n ) fo (x) > 0.
(10.5.2)
11-> (DO
(d) For some initial state x E X and some compact set K E B, lim sup P(n ) (x, K) > 0.
(10.5.3)
n oo
Theorem 10.5.1 is proved in §10.7.3. Observe that, in contrast to the moment function f in (b), the function fo in (c) is arbitrarily fixed.
10.5.2 Lyapunov Sufficient Conditions The conditions in Theorem 10.5.1 are necessary and sufficient, but they are all stated in terms of asymptotic properties of P(n) . However, we have seen in §7 a sufficient condition for existence of an invariant p.m. for a weak-Feller MC (see Theorem 7.2.4). This condition is only sufficient, but it involves only the one-step t.p.f. P and a Lyapunov function V to guess in (7.2.5).
Existence of Invariant Probability Measures
146
Namely, given an arbitrarily fixed strictly positive function fo E Co(X), one has to guess a measurable function V : X --4 R+ (not identically zero) and a scalar b> 0 such that PV (x) < V(x) - 1 + bf o (x)
Vx E X.
(10.5.4)
The Lyapunov condition (10.5.4) can also be used for other purposes. For instance, consider a MC on the real line X := R. Suppose we want to check whether there exists an invariant p.m. with an exponential tail, that is, an invariant p.m. such that
L for some positive scalar such an invariant p.m.
T.
erixi,a(dx) < oo,
The following Lyapunov condition permits to detect
Corollary 10.5.2. Let . be a weak-Feller MC on X = R. Assume that there exist positive scalars b, T and a nonn,egative measurable function V: X -- R+ such that PV(x) < V (x) - eTlxl + b Vx E X.
(10.5.5)
Then there exists an invariant p.m., and every invariant p.m. has an exponential tail. Proof. The proof mimics that of Theorem 7.2.4. Iterate (10.5.5) n times and multiply by n -1 to obtain
ro L. p(n)( x, dy ) e r1Y1 _< b + [V(x) - PnV(x)]/n,
(10.5.6)
for all x E X and n = 1, 2, .... The function x '-f eTlx 1 is a moment and, therefore, (10.5.6) and Proposition 1.4.15 yield that the sequence {P(n)(x, is tight for every x E X. Fix an arbitrary x E X. By Prohorov's Theorem 1.4.12, there is a subsequence nk and a p.m. px such that P(n ) (x, .) = ,ux and, by Proposition 7.2.2, px P = px . In addition, as x 1-4 erlxl is continuous and nonnegative, letting k -p oo, Proposition 1.5.6(a) yields f 00 00 f p(n x, dy ) elm < b, Loo e TIYI ttx(dy) < liminf k)( k-000 -00 .)}
which proves that ,u,x is an invariant p.m. with an exponential tail. Let us now consider an arbitrary invariant p.m. it for P. By Lemma 5.2.4(a), p, can be written as p(B) = f aux (B) p(dx),
B E 8,
(10.5.7)
147
10.6. Technical Preliminaries whereas, by Theorem 5.2.2(a),(b), there is a t.p.f. H such that it' = II(x,.) f elY1 p,x id-y\) and f II(x, dy)erlYI are equal Therefore, the functions x and so, by (10.5.7), f
erlYI ic(dy)
=
[f cc II(x, dy) eTlY] ,u(dx) f—00 — 00 itx (do] p(dx)
f[f 00
f
oo
bdit = b,
and it follows that has an exponential tail.
LI
Of course, Corollary 10.5.2 can be translated to MCs on a "one-dimensional" countable space, that is, X := l• • • — 2, —1, 0, 1, 2, . . . 1.
10.6 Technical Preliminaries 10.6.1 On Finitely Additive Measures We first recall some notation. Let (X, 8) be a measurable space. Let B(X) be the Banach space of bounded measurable functions on X, equipped with the sup-norm whose topological dual B(X)* ba(X) is the Banach space of finitely additive measures on B (called charges in Rao and Rao [13] and means in Revuz [112]), equipped with the total variation norm. The space M(X) of finite signed measures on X is a subspace of ba(X), which is also a Banach space with the total variation norm. A pure finitely additive measure (or a pure charge) ,u E ba(X) is an element of the space M(X) -- defined as
MPO-L :=
{A E ba(X)1A I ea
for every
E M(X)},
(10.6.1)
where A 1,u if and only if 1A1 A ji = 0 (see, e.g., Rao and Rao [13, Theor. 10.1.2, p. 240]). Moreover, every i E ba(X)+ can be decomposed as with cp E M(X) + and
E ba(X)±
n m(x)±.
(10.6.2)
(See Rao and Rao [13, Theor. 10.2.1, p. 241] .) Let P be the t.p.f. of a MC on X, viewed as an operator f Pf on B(X) (see (2.3.6)) with dual (or adjoint operator) P* : ba(X) ba(X) whose restriction to M(X) coincides with the operator v vP on M(X) defined in (2.2.12) [see (10.2.2)—(10.2.3)].
148
Existence of Invariant Probability Measures
Lemma 10.6.1. Let . be a MC on X with t.p.f. P. Let (i00,00 be nontrivial measures in M(X)+ , and let ea E ba(X)+ . (a) If it < 00, then ii is countably additive, that is, ii, is in M(X)+ . (b) If it > (po , then il is not purely finitely-additive, and pt = cp + IP with cp E M(X) + and 0 a pure nonnegative finitely-additive measure. In addition, if P* ea = iu, then c,oP = (p. Proof. (a) Let {B n } be a sequence in B such that Bn j 0. As tt(Bn) < 00(B n ) for all n = 1, 2, ... , and 00 is countably additive, it follows that ,a(B7 ) 1 0, and thus p is countably additive. (b) Let F(p) := Iv E M(X) + I v < O. Then F(p) is nonempty since (Po E 1704. From Yosida and Hewitt [134, Theor 1.23], F(p) contains a maximal element (to E M(X)+. Moreover, from (10.6.2), iu can be written as
for some nonnegative and pure finitely-additive measure V) and a maximal element (p E M(X)± of ['(p). As co is a maximal element of F(p,) it follows that co o < co. This gives the first statement in (b). Suppose now that P* it = it. Hence, as P* is a positive operator, we get Ii = P* A > P* (P = co-P. Hence, we conclude that (pP E F(p) and thus (pP < cp. The latter fact and the CI boundedness of (p imply that (pP = (p (see Remark 2.2.4).
10.6.2 A Generalized Farkas Lemma Let (X, Y) and (2, W) be two dual pairs of vector spaces (see §1.3), and A a linear map from X to Z. The adjoint of A, written A*, is defined by the relation (Ax, w) = (x, A* w) for each x E X and w E W. Further, A* maps W into 3) [i.e., A*(W) C Y] if and only if A is weakly continuous, (10.6.3) [i.e., continuous with respect to the weak topologies o- (X, y) and o- (Z, W)]; see, for instance, [25, p.984]. Lemma 10.6.2. (Craven and Koliha [25, Theor. 2]). Let (X , y) and (2,1N) be two dual pairs of (real) vector spaces, and let K be a convex cone in X with dual cone K* := {y c yl(x,y) > 0
Vx E K}
c Y.
Let A : X —> Z be a weakly continuous linear map with adjoint A* :1 / 1 1 —> y. If A(K) is weakly closed, then the following conditions are equivalent for b E Z: (a) The equation Ax = b has a solution x in K. (b) A*w E K* = (b,w) > 0.
10.7. Proofs
149
10.7 Proofs 10.7.1 Proof of Theorem 10.3.1 For technical reasons that will become apparent below, instead of trying to solve PI' in M(X) we will first consider Pi' in the larger space ba(X) D M(X) of finitely-additive measures. We begin by noting that, by (10.2.3), the condition (a) in Theorem 10.3.1 is equivalent to: There exists p E ba(X) such that (I — P*)p = 0,
u> wo ,
(it, 1) = 1,
and p E ba(X)+ .
(10.7.1)
Indeed, it is obvious that (10.1.1) and (10.1.2) imply (10.7.1) since P* ,u coincides with p,P whenever ti E M(X). Conversely, if (10.7.1) holds, then P* ,u, = p. In addition, p, can be decomposed as it = yo + 0 with yo E M(X) + and lp > 0 a pure nonnegative finitely-additive measure, that is, 7,b E M(X)' (see (10.6.2) and 10.6.1)). By Lemma 10.6.1(b), (pP = P*(p = cp and yo > yoo , so that co is a nontrivial invariant measure in M(X)+. Therefore, the p.m. (p(.)1(p(X) satisfies (10.1.1) and (10.1.2). On the other hand, introducing the "slack variable" v in M(X)+, we can see that the existence of a solution ,u to (10.7.1) is equivalent to: There is a solution (p,, v) in the convex cone K := ba(X)+ x ba(X)+ to
( i ) ( I — P* )/-1 = 0 ,
and (iii) (ii, 1) = 1.
(ii) A — v = (po,
(10.7.2)
In view of this remark, we will prove Theorem 10.3.1 by showing that (10.7.2) has a solution in K if and only if part (b) in Theorem 10.3.1 is true. To do this, we use Lemma 10.6.2 as follows. To put (10.7.2) in the context of Lemma 10.6.2 consider the dual pairs (X ,Y) and (Z, IN) with X := ba(X) x ba(X),
Y := B(X) x B(X),
Z := ba(X) x ba(X) x R,
W := B(X) x B(X)
x R.
Let A : X —> Z be the linear map A(p, v) := ((I — P*),u,, p — v, (p, 1)), and let K be the positive cone in X, i.e., K := (ba(X) x ba(X))+ = ba(X)+ x ba(X)+ . Thus, we can rewrite (10.7.2) as in part (a) of Lemma 10.6.2, i.e., A(p,v) = (0,(P0, 1) has a solution
(p, v) in
K.
(10.7.3)
Existence of Invariant Probability Measures
150
Now note that by (10.2.2), the adjoint A* of A is A* (u, v, s) = ((/ — P)u + v + s,—v), and that A* maps W into 3); hence A is weakly continuous [see (10.6.3)]. Thus, since the dual cone of K is K* = (B(X) x B(X))+ = B(X) + x B(X)+, in the present context we can write (b) in Lemma 10.6.2 as A*(u, v, s)
in
K* = ((0, (Po , 1) , (u, v, s)) = ((Po, v) + s ? 0.
(10.7.4)
Finally, replacing v by —v in (10.7.4), we see that (10.7.4) is precisely the statement "(10.3.1) implies (10.3.2)" in part (b) of Theorem 10.3.1 (when s is negative, the statement is automatically satisfied). Therefore, since we already know that A is weakly continuous, the proof of Theorem 10.3.1 will follow from Lemma 10.6.2 if we show that A(K) is weakly closed.
(10.7.5)
To prove (10.7.5), let (D,>) be a directed set, and let {(p,, v,), a E D} be a net in K such that A(u,, v,) converges to (a, b, c) E Z in the weak topology o- (Z, W), i.e.,
(i) (I _ P*) tta setwise
a,
(ii) tic, _ va set,4se b,
(10.7.6) and (iii) (L c , 1) —> C. We wish to show that (a, b, c) is in A(K), that is, there exists a pair (p , 0) in K such that (i) (/ — P* ) IP = a,
( ii ) isio _ 110
b, (10.7.7)
and (iii) (it ° , 1) = c. To prove (10.7.7), first note that the real number c in (10.7.6)(iii) is nonnegative since so is (p,, 1) = t(X) , IlI1 c,11 Tv . We shall consider two cases, c = 0 and c > 0. If c = 0, then (I-La, 1 ) = 1112 0 Iliv
0,
and so
I P*Pa 1 1T v =
Thus, (10.7.6) yields that (110,110) = ( 0, _ b) satisfies (10.7.7). Let us now consider the case c> 0.
I I[ta IIT v --+ O.
10.7. Proofs
151
If c > 0, the condition (10.7.6)(iii) implies the existence of a' E D and m' > 0 such that (p,,„, 1) = 1,1 1.14Tv < m' for all a > a', and the same holds for IIP * Pa 11Tv = iitia 11Tv. Combining this fact with (10.7.6)(i) and (ii) we see that there exists ao E D and mo > 0 such that illiallTv, IlvallTv < mo Va > ao. Therefore, as ba(X) f_`-_-' B(X)* (see (1.2.4)), the weak topology o- (ba(X), B (X)) is the weak* topology of ba(X). It follows from the Banach-Alaoglu theorem (Lemma 1.3.2), that the unit ball of ba(X) is weak* compact, so that, by the boundedness of the net {} , there is a subnet {AA in K and an element it ° E ba(X) such that ,u,(3 -- it° setwise. From this and (10.7.6)(ii) we deduce that vo -- v° setwise for some v ° E ba(X) and p° — v° = b. As K is weak* closed, it follows that 40 , v° E K. Moreover, as P* is weakly continuous, it also follows that (I — P* )1.10 —> (I — P*)tt ° = a. Hence, ( tto , vo )\ E K satisfies (10.7.7), which implies that (a, b, c) is in A(K). This completes the proof of (10.7.5), and so the equivalence of (a) and (b) in Theorem 10.3.1. To prove that (a) = (c), it suffices to take -y := ,u because then (c) follows from the invariance of ea. To prove the reverse implication, we will show that (c) implies (b). To see this, let -y be as in (c), and suppose that (10.3.1) holds; we will next show that (10.3.2) is satisfied with c,0 0 (•) := 5%(-)/-y(X), which [by (c)] is a nontrivial measure in M(X)+. Now, let us write the inequality in (10.3.1) as
u > Pu + v — s and iterate. This gives n-1
u> pnu+ E pkv_ns
Vn > 1,
k=0
and integration with respect to y [noting that (-yP k ,u) = (y,P k u)] yields n-1 (7, IL) >
(713n , u) + (E -y Pk , v) — ns-y(X), k=0
or, dividing both sides by rry(X),
s + (-y, u)/n-y(X) ..> (713n , u)/n-y(X) + eyn, v)/y(X), where
n-1 1 V 7pk = 7 ,s(n) ,yn := _ /-' n z—i k=0
/
Observe that 0 < (7 P7 1 ) U) and that (7(),v)
n = 1,2, ... .
1147 Pn (X) = 1147 (X) < 00 for all n = 0,1, ... ,
?_ ( A 7 k, v) k> m
(10.7.8)
Vn > m
(m = 1, 2, ... ).
Existence of Invariant Probability Measures
152
Thus, since Ak>,n 71 (m = 1,2, ... ) is a monotone nondecreasing sequence of measures that converges setwise to ;1, (see (10.2.4)), taking the liminf r, in (10.7.8), we get s > ( , v)/-y(X), which is the same as (10.3.2) with (p o := -3/-y(X). El The proof of Theorem 10.3.2 and of Theorem 10.4.2 is the same as that of Theorem 10.3.1 with obvious changes.
10.7.2 Proof of Theorem 10.4.3 (a) <#. (b). We may replace (10.4.3)(i) by p,(I — P) < 0 with tt in M(X)+. Taking this into account, we introduce "slack variables" I/ in M(X)+ and r, s, and t in R± to rewrite (10.4.3) as
(i) it(I — P) +v = 0, (ii) (p, 1) +r = 1, (iii) (,a, 1G) +8 = 0, and (iv) (A, fo ) — t = s,
with (p, v, r, s, t) E K,
(10.7.9)
where K is the convex cone
K := (M(X)+) 2 x (R +) 3 . Having (10.4.3) in the form (10.7.9), we may prove Theorem 10.4.3 using Lemma 10.6.2 again, as follows. Let (X, y) and (Z, W) be the dual pairs of vector spaces with X := M(X) 2 x R3 ,
Y := B(X) 2 x R3
Z := M(X) x R3 ,
W := Cb(X) x R3 .
Then defining a linear map A : X —> Z as A(p, v, r, s, t) := (p,(/ — P) + v, (ii, 1 ) + r, (I-t, 1G) + s, (II, f0) — t), (10.7.9) becomes A(p, v, r, s, t) = (0, 1, 0, E)
with
(p, v, r, s, t) E K,
(10.7.10)
which corresponds to part (a) in Lemma 10.6.2. On the other hand, the adjoint A*:W-->YofAis
A * (u,a,0, -Y) = ((/ — P)u + a + 01G + 7,io,u,oe,i3 , -1'), which indeed maps W into
y, and so A is weakly continuous (see (10.6.3)). Then
10.7. Proofs
153
part (b) in Lemma 10.6.2 can be written as:
A*(u, a„ (3,-y) E K* = (B(X)+) 2 x (11k +) 3 implies ((0, 1, 0, E), (u, a, 0,-y)) = a + E'y > 0; that is,
(P - I)u < a + MG + Vo, with u E Cb(X)+ , a > 0,
0 > 0, and - -y > 0,
implies a+ E-y > 0. Finally, replacing -y by -y, we see that the latter statement is precisely part (b) in Theorem 10.4.3. Therefore, by Lemma 10.6.2, the proof of Theorem 10.4.3 will be complete if we show that A(K) is weakly closed. (10.7.11) We shall omit, however, the proof of (10.7.11) because it is essentially the same as the proof of (10.7.5), using the Banach-Alaoglu theorem and the fact that if {an} C M(X) is a sequence that converges to it E M(X) in the weak* topology o- (M(X), C o (X)) for M(X), then for every nonnegative 1.s.c. function f on X,
liminff f n—■ ao
ffd dpn >. p,
(see Proposition 1.4.18). In particular, for an open set G we obtain lim inf itn (G) > p(G). n—■ oo
(a) = (c). Suppose that fi satisfies (10.4.3) so that, in particular, p is an invariant p.m.. Then, by the Individual Ergodic Theorem 2.3.4, for every f in L i (p,), there is a function f* in L1(1) such that
1 nx -1 ---. f*(x) = lim - 2 Pk f (x) n-,3o n
p,-a.e.,
(10.7.12)
k=0
and
Ix f dp, = I r dp. (10.7.13) x and f = fo we see that (c) follows from (10.7.12) and
In particular, taking f = 1G (10.7.13). (c) = (b) [<=>. (a)]. Suppose that (c) holds and let u, a, 0, and -y be as (10.4.4), so that u > Pu + 7 fo - 01G — a. Iterate this inequality and rearrange terms to obtain, for n = 1, 2, ... , U 4-
na + 0
n-1
n-1
k=0
k=0
E pki G > Pnu + 7E pkfo.
154
Existence of Invariant Probability Measures
Finally, divide by n and take lim inf as n -- oo to get (10.4.5) with E := lim inf n—>oo
i n— —
n
Pk fo ( x )•
k=0
That is, (c) implies (b), which completes the proof of Theorem 10.4.3.
El
10.7.3 Proof of Theorem 10.5.1 The proof of (a) implies both (c) and (d) follows from the Birkhoff Individual Ergodic Theorem 2.3.4. Indeed, let p be a invariant p.m. for P. As both g := fo and g := 1K belong to Li(p), by Theorem 2.3.4 we get
n-1 lim inf P(n)g(x) = lirn n -1 EEx g (6) = g*(x) n—*oo
p-a.e.
t=0
for some measurable function g* E Li (,u), and, in addition, f g dp = f g*dp. With g := fo and using that f fo du > 0, it follows that g*(x) > 0 on some set B with p(B) > 0. On the other hand, with g := 1K we have f g*(x)dp, = p(K) > 0 for some K E B. Summarizing, (a) = (c) and (a) = (d). (c) (a). Assume (c) is true for some x E X, and write (10.5.2) as lim inf f P n—■ ao
(n) (X, dy) fo (y) > 0.
Recall that 113(n) (X, .)} is in the unit ball of M(X), which is sequentially compact in the weak* topology a(M(X), Co(X)) of M(X) (see Lemma 1.3.2(b)). Now, let px be a weak* accumulation point of the sequence {P(n)(x,.)}, hence a limit point for some subsequence {P (nk ) (x,.)}. By Proposition 7.2.2, p x is an invariant measure for P. Moreover, as fo E Co(X) satisfies (10.5.2) we have 0 < lim inf P(n) MX) < liM P (nk) MX) n—*oo
k—*co
=
f
fo dllx
which proves that px is a nontrivial invariant measure for P, hence an invariant p.m. for P (after normalization if necessary). (d) = (a). Consider a subsequence {nk} of {n} for which 0 < lim sup P(n) (x,K) = lim P (nk ) (x, K),
(10.7.14)
k—>oo
with x and K as in (d). As in the proof of (c) = (a), there is an invariant measure px E M(X)± and a subsequence (again denoted {n k } for convenience) such that
155
10.8. Notes
[ix is a weak* accumulation point of the subsequence {P ( nk ) (x, .)}. To prove that px is nontrivial, simply use the fact that, by Theorem 1.4.9(a) and (10.7.14), p(K) > limsupP (nk ) (x,K) > 0. k—■ oo
(b) .;=> (a). Suppose that (b) holds, and let {nk} c {n} be the subsequence for which the "lim inf" in (10.5.1) is attained, that is, 0 < lirn f f(y)(vP (nk ) )(dy) < oo. k— ■ oo
From this, we conclude that sup f f(y)(vP ( nk ) )(dy) < oo, k
which in turn, by Proposition 1.4.15 (as f is a moment), implies that the sequence v { p(nk)} is tight. Therefore, by Prohorov's Theorem 1.4.12, there is a measure and a subsequence (again denoted {nk}) such that vP ( nk )
pv .
Using that P is weak-Feller and it is a p.m., it follows from Proposition 7.2.2 that is an invariant p.m. (a) = (b). Conversely, let ii be an invariant p.m. Let Kn be a nondecreasing sequence of compact sets such that Kn I X, and p(Kn+i — Km ) < n -3 , n = 1,2, ... , where we have used that every p.m. on a o- -compact space is tight (see, e.g., [14]). Let v : X -- X be the measurable function such that v := n for every X E Kn+1 — K n , n> 1. Then, v is obviously a moment, and n-1
0 < lim sup n Ti
—>oo
00
Ei,f(W = f v dp, 5_
—
t=0
n=1
10.8 Notes Most of the results in this chapter are from [57]. There is an extensive literature on the problem Pi', e.g. [24, 29, 53, 70, 75, 85, 87, 88, 103, 94, 99, 105, 112, 122], ... , and the references therein. We know of no references for P; and 1- 7. . There are, of course, many previous works on existence and uniqueness of invariant p.m.'s, but not for strictly positive invariant p.m.'s. In addition, most results in the literature concern weak-Feller t.p.f.'s on a LCS metric space.
Chapter 11
Existence and Uniqueness of Fixed Points for Markov Operators 11.1 Introduction and Statement of the Problems In Chapter 10 we have considered the existence of invariant p.m.'s for a t.p.f. P, viewing P as a Markov operator on M(X) — see §10.2. Thus, an invariant p.m. p, turns out to be a fixed point of P, that is, p,P = p. In this section, we study essentially the same problem but from a very different perspective. To motivate the material in this chapter, consider the logistic map defined in (6.2.5) as x 1-> 8(x) = 4x(1 - x) for x E [0,1]. This gives a MC e, = with et+ i = S() for t = 0,1, ... , with some given initial distribution and the t.p.f. P(x, B) coincides with the Dirac measure concentrated at 8(x). Hence, as 8(x) = x if and only if x = 0 or x = 3/4, it follows that the Dirac measures 6 0 and 63/4 at the points 0 and 3/4 are "trivial" invariant p.m.'s, or fixed points of P. In fact, there are countably many other invariant p.m.'s of P associated with the cycles of all lengths j = 1,2, ... (see Holmgren [72] and the discussion in §6.3.2). However, one may wish to determine whether there exists an invariant p.m. that has a density w.r.t. the Lebesgue measure A on [0,1] (and, in fact, Ulam and von Neumann proved that there exists an invariant p.m. with density (7r,/x(1 - x)) -1 ). For the latter problem, the results of Chapter 10 are "useless", in the sense that they answer the question of existence of invariant p.m.'s but they do not give information about the existence of invariant p.m.'s with a density. When P maps the space of measures with a density w.r.t. A into itself, an alternative is then to consider the operator T in (2.3.10) defined on L 1 (A). Thus,
eo,
{et}
Markov Operators
158
T maps the density f E Li (A) of a measure vf < A to the density T f of (vfP) w.r.t. A. As noted in Remark 2.3.7(b), T is a Markov operator, and so, we are led to investigate the existence of nontrivial fixed points of T in L i (A). This is a topic of fundamental importance in many fields, including ergodic theory, probability, dynamical systems, and their applications, and there is, therefore, an extensive literature on the subject (some related references are given in the Notes section at the end of this chapter). Here we present necessary and sufficient conditions for the following problems to have a solution: • Pi . Existence of invariant probability densities (IPDs) for a Markov operator T on a space Li -a- Li (X, 13, A) (for a more precise statement see (11.2.3)); • P2. Existence of strictly positive IPDs; and • P3. Existence and uniqueness of strictly positive IPDs. The approach. The approach is similar in spirit to that in Chapter 10. Again, the main idea is to write the problems P i , i = 1, 2, 3, in terms of "linear equations" in appropriate Banach spaces, and then use a generalized Farkas theorem of Craven and Koliha [25, Theor. 2] (see Lemma 10.6.2), to obtain necessary and sufficient conditions for the linear equations to have a solution. The resulting existence and uniqueness criteria have a nice interpretation as Lyapunov (or Foster-Lyapunov) criteria, which, in some cases, allows us to compare our results with related works. After some preliminary material in §11.2, the existence results are presented in §11.3. For ease of exposition, all proofs are postponed to §11.4.
11.2 Notation and Definitions Let (X, B, A) be a a-finite measure space. We will assume that this space is complete, which means that B contains all the subsets of A-null sets (i.e., if B E B is such that A(B) = 0 and B' C B, then B' is in B). Let L i Li (X, 13, A) be the Banach space of A-integrable functions f on X with the Li norm
:= f IfIdA. We denote by Lt the "positive cone" in L i , i.e., LiF := If E Lil f > 01. In this chapter, A is a fixed measure, and "a.e." means "A-a.e.". As in Remark 2.3.7(b), a linear map T : L 1 L, is said to be a Markov operator if it is positive, that is,
Tf EL
if f E LiE ,
(11.2.1)
and, in addition, T is norm-preserving, that is,
ITf Iii
=
1111k if
f E L.
As T fl < TIf I it follows that T is also a contraction, i.e.,
f
1
IlfIli
Vf E
.
(11.2.2)
159
11.2. Notation and Definitions
A function f E Li is called a fixed point of T (or invariant w.r.t. T) if T f = f, and, on the other hand, f is called a probability density w.r.t. A if f > 0 and III f = 1. Thus, problem Pi addresses the question of finding functions f that satisfy T f = f and 11/111= 1, with f E Lt. (11.2.3) This is essentially equivalent to the problem of existence of nontrivial (IlfIli > 0) fixed points of T [see Remark 11.2.1(b)]. In fact, in addition to (11.2.3), we shall consider the condition (11.2.4) f fo where fo E LiE is a function which is positive on a set B e B of positive A-measure. Hence, specializing to the case B = X, so that (11.2.5)
fo > 0 a.e.,
we obtain conditions for P2 and 13, where strictly positive means f> 0 a.e. The results for Pi, P2, and P3 are presented in §11.3 and proved in §11.4. These results are applicable, for example, to the case in which the t.p.f. P is A-continuous, that is, P(x, B) is of the form P(x, B) = f p(x , y)A(dy),
(11.2.6)
where p(x, y) : X x X R+ is a given nonnegative, measurable function. Then, introducing the measures as in (2.3.9) and (2.2.12), that is, vf(B) := f f (x)A(dx)
for every
f E
(11.2.7)
and (v fP)(B) := f vf(dx)P(x, B),
(11.2.8)
the corresponding Markov operator T (2.3.10) maps f E Li into the RadonNikoqm derivative of v1/3 (which, of course, is absolutely continuous with respect to A), i.e., T f := d(vfP)/ dA,
or T f (y) = f vf(dx)p(x, y) a.e.
(11.2.9)
Thus, a function f that satisfies (11.2.3) is an 1PD for P in the usual sense, i.e., f (•) = I f (x)P(x, .)A(dx),
11/111= 1, f E Lt.
(11.2.10)
Moreover, the measure v1 in (11.2.7) is an invariant p.m. for P, i.e., vf P = vf .
(11.2.11)
160
Markov Operators
Another case of interest is when P corresponds to a "deterministic" (or "noiseless") MC as in (2.2.6), i.e., 6 = F(et-i) = Ft (G), t = 0, 1, .. . ,
(11.2.12)
where F : X -> X is a measurable function. The logistic MC et+ i = S(e t ) in the previous section is an example of such a MC. In this case, the t.p.f. P (x , B) is given by P(x, B) = 6 F(x) (B), which can be written as P(x, B) = 5x [F-1 (B)]
(with Sx := Dirac measure at x),
and all of the results for the operator T in (11.2.9) remain valid provided that F is A-nonsingular in the sense that A[F+ 1 (B)] = 0 if A(B) = 0. In this case, T is known as the Frobenius-Perron operatorcorresponding to F (see e.g. [85], p.40). Remark 11.2.1. (a) All sets and functions introduced below are supposed to be (B ) measurable. In §§11.3 and 11.4, we identify sets and functions which differ only on a A-null set. If X is a vector space, we denote by X+ := {x E Xlx> 0} the convex cone of nonnegative elements of X. The indicator function of a set B is denoted by 11B• (b) A function f E Li is a fixed point of T if and only if f+ := max( f, 0) and f - := - min(f, , 0) are both fixed points of T (see, for instance, [48] Lemma 2). Moreover, if f E Lt is a fixed point of T and III f Iii > 0, then fillf11, is an IPD for T. Thus, the problem of existence of nontrivial fixed points of T basically reduces to P. (c) Problem Pi is obviously equivalent to the problem of existence of functions that satisfy (11.2.3)-(11.2.4). In fact, we pose Ti in the form (11.2.3)-(11.2.4) because it naturally leads to the strictly positive case in which fo satisfies (11.2.5). However, if one is only interested in Ti, and not in P2 and P3, one can replace (11.2.4) with a "majorization" condition f < fo , which is useful to do in some applications, and our present approach is still applicable (as can be seen from the proofs in §11.4, and also in Remark 11.3.6). Several authors have studied the "majorization" problem but under very restrictive assumptions. For instance, [29] deals with a specific type of Markov chains in X = Rd ; [94] considers a Markov chain in X = {1, 2, ... }; and in [54] the underlying measure space (X, B, A) is as in the present paper, namely, -
(X, B, A)
is a a-finite complete measure space,
(11.2.13)
but it requires additional hypotheses that are not needed for our results in §11.3.
Incidentally, we need (11.2.13) because we wish to use the relation (11.3.3) below.
11.3 Existence Results In this section we first consider problem P i , in the form (11.2.3), (11.2.4); then P2, as in (11.2.3)-(11.2.5); and, finally, the uniqueness problem P3.
161
11.3. Existence Results
We begin by rewriting (11.2.3), (11.2.4) in an equivalent form, for which we need to introduce some notation.
11.3.1 The Space ba(X, A) Let M(X) be the Banach space of bounded signed measures on B with the total variation norm TV, and let MA (X) be the subspace of measures in M(X) which are absolutely continuous w.r.t. A, i.e., MA(X) :=
Itt c M(X) p, < AI.
(11.3.1)
By the Radon-NikoOm theorem, MA(X) is isometrically isomorphic to Li = Li(X,B, A), which we write as Li
(11.3.2)
MA(X).
Now let ba(X, A) _= ba(X, B, A) be the Banach space of bounded finitely-additive set functions it on B such that ,u(B) = 0 if A(B) = 0, with the total variation norm. Then ba(X, A) is isometrically isomorphic to the (topological) dual L of L,„ [34, Theor. IV.8.16], i.e., .14,0 c ba(X, A).
(11.3.3)
Hence, since LI L oo , the second dual LI* of Li is isometrically isomorphic to ba(X, A), that is, LI* ba(X, A), so we have L1 c ba(X, A)
L.
(11.3.4)
Finally, let T* : L c, —> L c, be the adjoint of the Markov operator T : L i —> Li in (11.2.1), (11.2.2). (In the "deterministic" case (11.2.12), the adjoint T* is called the Koopm,an operator with respect to F; see [85], p. 47. In the A-continuous case (11.2.6)–(11.2.9), T* g(x) = f p(x,y)g(y)A(dy).) Then T* is also a Markov operator (on L oc,), so that, in particular (by Lemma VI.2.2 in [34]),
1 7'11 =
T II = 1,
(11.3.5)
and the second adjoint T** : —> LI* of T is an extension of T [34, Lemma VI.2.6]. To simplify the notation we shall write T** as T and, in view of (11.3.4), we also write T: ba(X, A) —> ba(X, A). (11.3.6)
11.3.2 Problem Pi We now state our first result (proved in §11.4), where we use the notation (f, u) :=
f(
fu)dA for every f in Li and u in L.
162
Markov Operators
Theorem 11.3.1. The following statements are equivalent: (a) There is a function f E Li that satisfies (11.2.3), (11.2.4) for some fo E LiF . (b) There is a pair (p,,v) in ba(X, A) -1- x ba(X, A) -1- such that
(i)
(I — T)p, = 0,
(ii) it — v = (po ,
and
(iii)
p(X) = 1,
(11.3.7)
where c,00 E MA(X) ± is the measure defined by B E
Wo(B) = f fo dA,
B.
(11.3.8)
B
(c) The condition with u, v E L+ 0° and r E le
(T* — I)u < —v + r
(11.3.9)
implies (.f 0,v)
(11.3.10)
7''.
For a proof of Theorem 11.3.1 see §11.4. We can also derive a necessary and sufficient condition using asymptotic properties of the Cesaro sequence T(n) f o :
n-1 n _i E T k fo
for n = 1, 2, ...
(11.3.11)
k=0
(with T° := identity) for some fixed arbitrary strictly positive function fo E L. Theorem 11.3.2. Let fo E LiF be any fixed, strictly positive function. Then T admits a fixed point f* E Lif if and only if
lim inf n--- oo
T (n) fp
0.
(11.3.12)
For a proof of Theorem 11.3.2 see §11.4.
11.3.3 Problem P2 Theorem 11.3.1 refers to problem P1 in the form (11.2.3), (11.2.4), for a given function fo in L: t , which is positive on a set B E B with A(B) > 0. The following theorem, on the other hand, considers a function fo as in (11.2.5), so that a solution f E L1 to (11.2.3)—(11.2.5), if one exists, is a strictly positive IPD, as required in P2.
Theorem 11.3.3. The following statements are equivalent. (a) There is a function f E L i that satisfies (11.2.3) and (11.2.4) for some function fo as in (11.2.5).
163
11.3. Existence Results
(b) There is a pair (pc, v) in ba(X , A)+ x ba(X , ))± that satisfies (11.3.7), with cp o as in (11.3.8) for some function f o as in (11.2.5). (c) The condition (11.3.9) implies (11.3.10) for some function f o as in (11.2.5). (d) There is a function g o E Lt, with g o > 0 a.e., such that
lim inf Tn go > 0 a. e.
(11.3.13)
fl -*00
(e) There is a function g o e Lt, with go > 0 a.e., such that
lim inf T (n ) go > 0 a. e.,
(11.3.14)
with T (n) as in (11.3.11).
For a proof of Theorem 11.3.3 see §11.4. Remark 11.3.4. (a) As was already noted in §11.1, there are many previous works related to the problem of existence of an IPD, in particular, for strictly positive IPDs. For instance, for a general positive contration in L 1 , Neveu [105, Theor. 1] shows that part (a) in Theorem 11.3.3 is equivalent to: For any u E L, the equality lim infn (Tngo, u) = 0 implies that u = 0 a. e., where go is an arbitrary but fixed function in Lt with g o > 0 a.e.
(11.3.15)
Condition (11.3.15), which is written in still another equivalent form in [18], is, at least in appearance, more complicated that our condition (11.3.13) for Markov operators. On the other hand, by the general inequality h = ago + (h — ago ) > ago — (h — ag o ) , a E R,
arguments similar to those used by Neveu [105, p. 465] show that if (11.3.13) holds for a given strictly positive function g o in LiF , then it holds for every strictly positive function h in Li 1- .
(b) The existence of a strictly positive IPD does not imply, of course, that it is the unique IPD. For a simple counterexample take T as the 2 x 2 identity matrix on R 2 .
11.3.4 Problem P3 We now consider the question of uniqueness of strictly positive IPDs. To motivate our approach let us first consider the problem of finding an IPD which is not strictly positive. In other words, let G E B be a set such that G X and A(G) > 0, and consider the problem of finding an IPD f that vanishes on G. More explicitly, the problem is to find f in L -t such that (I — T) f = 0, III f Iii = 1, and
f = 0 a.e. on
C.
This is expressed in other equivalent forms in the following theorem.
(11.3.16)
164
Markov Operators
Theorem 11.3.5. Let G E B be a set such that G X and A(G) > 0. Then the following statements are equivalent: (a) (11.3.16) has a solution f in LiF . (b) There exists ,u in ba(X, A)± such that
(I — T)it = 0,
it < v., and (11,1) = 1,
(11.3.17)
for some nontrivial measure v. in M(X)± with v(G) = 0. (c) For any u, v in L+ and 13 E IR, the condition (I — T*)u + v + 0 > 0
(11.3.18)
(v., v) + 0 > 0,
(11.3.19)
implies for some nontrivial measure v. in M(X)± with v(C) = 0. (d) There exist go, h E 14- with Tr' go
lirn sup T ( n ) go 0 0 and lirn sup T (n) go = 0 a.e. on G. n ---*oo
(11.3.20)
n—oo
Remark 11.3.6. Observe that Theorem 11.3.5 also permits to derive a necessary and sufficient condition for the existence of a fixed point f of T, majorized by some given function fo E LI - , that is, the existence of a function f E Lil- that
satisfies (11.2.3) and the majorization condition f
(11.3.21)
fo
for some fo E L. As a consequence of Theorem 11.3.5 we obtain the following result on the uniqueness of a strictly positive IPD. Corollary 11.3.7. The following statements are equivalent: (a) Either T has a unique IPD which is strictly positive, or T does not have an IPD. (b) There is no set G E B with G X and A(G) > 0, for which (11.3.16) has a solution f in LiE . (c) For every set G E B with G L X and A(G) > 0, and every nontrivial measure v. in M(X)± with v(C) = 0, there exist u,v E LI, and 0 E R such that
(I — T* )u + v +13 > 0
and (v. , v) + 0 < 0.
(11.3.22)
(d) For every set G E B with G X and A(G) > 0, and for every go, h E LiF 0 withTngo
lim sup T ( n) go = 0 a. e. or lim sup Ten ) go n---oo
n--+ cc
0 on G.
(11.3.23)
165
11.3. Existence Results
Proof. It is clear that (a) implies (b). We next prove that (b) implies (a). Suppose that (b) holds. Now if (a) does not hold, then T has at least one
IPD which is not strictly positive, or it is not unique. As (b) holds, the former case is not possible. In the latter case, let fi f2 be two IPDs. Then f := fi — f2 is a nontrivial fixed point of T and so are f+ and f - [see Remark 11.2.1(b)]. Moreover, denoting by B E B the support of f+, there is a set G E B with A(G) > 0 such that G C B and f+ > 0 on G. Then f - = 0 on G and, therefore, f - illilli satisfies (11.3.16); hence, (b) does not hold. Thus, (b) implies (a). Finally, the equivalence of (a), (c) and (d) follows from Theorem 11.3.5. 1=1 Remark 11.3.8. If T* denotes the adjoint of a Markov operator T, then
is sometimes called the drift operator associated to T. In terms of A we may rewrite (11.3.18) in the form Au
< v +0,
which is of the same form as the Lyapunov (or Foster-Lyapunov) criteria that are widely used in studies on the "stability" of Markov processes and MCs; see, for instance, [7, 29, 53, 84, 103, 130].
11.3.5 Example Let P and T be as in (11.2.6) and (11.2.9), respectively. As an example of how one can derive conditions for the existence of IPDs, for f in LI- let us write Tn f = d(v fPn)I 0% as (11.3.24)
Tn f (y) = f vf(dx)pn(x,y), x
where IP (x, y) denotes the n-step transition density. Moreover, define p* (x, y) := lim inf pn (x, y), n
co
and f* (y) := f v f (dx)p* (x , y). X
Observe that f* is in L-IF since, by Fatou's lemma and (11.2.2), inf f (TThpdA Ilf*Ili = f f* clA < lim 7-1-400
IlfIli.
(11.3.25)
166
Markov Operators
Now, from (11.3.24) and using Fatou's lemma again, lim inf 7' f (y) > f* (y), so that a sufficient condition for (11.3.13) is: There is a function f e .14.- such that a.e.,
(11.3.26)
with f* as in (11.3.25). For instance, let A be the Lebesgue measure on R d , and let A be adxd matrix with spectral radius < 1. Moreover, let {o t ,t = 0,1, ... } be a sequence of i.i.d. (independent and identically distributed) Gaussian d-vectors with zero mean and a symmetric positive-definite covariance matrix. Then it is well known (and easy to show) that for the Gauss-Markov MC (2.2.11) = .24-t + Ot,
t = 0, 1, ...;
x o = x given,
the limiting density p* in (11.3.25) is again Gaussian. Therefore (11.3.26) always holds for any strictly positive function f in L 1 . Similarly, suppose that, for every y e X =R d , A y (B) := fB A(dx)p(x , y)
VB E B
defines a finite measure on B. Then
is also a finite measure on B, and if f E Lt is a convex function we obtain, by Jensen's inequality (see, e.g., Perlman [110, Prop. 1.1]), Tn f (y) ? f [fin (y)], where
11(y) := f xpn (x , y)A(dx). x
(11.3.27)
Hence if, in addition, f is strictly positive and if there is a constant M such that Illn( y ) 1 < Al
Vy E X, n = 1, 2, ... ,
(11.3.28)
then (11.3.27) yields lim inf Tn f (y) > 0
Vy E X.
71 ---+ 00
Thus, another sufficient condition for (11.3.13) is that every y and (11.3.28) holds.
Ay (*)
is a finite measure for
167
11.4. Proofs
11.4 Proofs Before proving Theorems 11.3.1, 11.3.2 and 11.3.3, we need the following auxiliary result. Let eu, be in ba(X, A))± [9 MA(X)+] and f in Lt. Then the notation f < p, (used in the following lemma) means that vf(B) < p,(B), where vf E MA(X)± is the measure defined in (11.2.7). Lemma 11.4.1. (See [39, p.34, Lemma A] or [105, Lemma 1]). Let p, be in ba(X, ))±. Then (a) The set P(jI) := If E 4 f < pl contains a maximal element f*; that f* is in r(p), and f* f for all f E r(p); (b) Let B := {xl f*(x) = 0. There exists a function u E Lto such that u > 0 on B and (p,u) = 0. (c) If, in addition, p, is a fixed point of a linear operator T : ba(X, )¼) —> ba(X, )) (i.e., T p = p), then so is f*.
11.4.1 Proof of Theorem 11.3.1 We shall prove that (a)-(b)(a) and (b).<=>(c). (a) = (b). Clearly, part (a) in Theorem 11.3.1 is equivalent to: There is pair of functions (f, g) in Lif x L -1- such that (I —T)f = 0,
f — g = fo ,
(f, 1) = 1.
(11.4.1)
To see the equivalence, simply take g := f — fo. Now, in turn, recalling that L 1 --' MA(X) and MA(X) c ba(X, A) [see (11.3.2) and (11.3.4)], we see that (11.4.1) implies part (b) in the theorem by identifying f and g in (11.4.1) with the measures p(B) := I fdA and v(B) := fB gc/A B
in MA(X) ± c ba(X, A)E. (b) = (a). Let (it, v) E ba(X, A)± x ba(X, A)± be a pair that satisfies (11.3.7). Then, by (11.3.7)(i) and Lemma 11.4.1, the set r(p) in Lemma 11.4.1(a) contains a maximal element f* e L, which is a fixed point of T. Moreover, by (11.3.8) and (11.3.7)(ii), fo is in r(t ) and, therefore, f* fo• Hence, f := f*/11f*Ili is an IPD with f > f* > fo, where the first inequality comes from
11/11 = I f*dA < (X) =1, x
and so the pair (f,g) c 1,1" x L -1- , with g := f — fo , satisfies (11.4.1). (b) .#› (c). We wish to use Lemma 10.6.2, for which we shall first rewrite (11.3.7) as a "linear equation".
168
Markov Operators
Consider the dual pairs (X, Y) and (Z, W), where X := ba(X, A) x ba(X, A),
y :=
L ao x 1,, Z := X x IR, W :=
y x IR.
A) and u E L oo , we write (it, u) := f u dit. In particular (tt, 1) = it(X). Now let A : X —> Z be the linear map For it
E ba(X,
A(1 1 , v) := ((I — niL, 1- 1 — v, (1- 1 , 1 )).
Furthermore, let K C X be the convex cone K := ba(X,A)+ x ba(X, A)+, for which the dual cone K* C 3) is K* := 40 x 40 , and let b E Z be the vector b := (0, (po , 1). Then we can write (11.3.7) as A(i,t, v) = b, so that part (b) in Theorem 11.3.1 [which corresponds to (a) in Lemma 10.6.2] becomes: the equation A(//, v) = (0, (Po, 1) has a solution (a, v) in
K.
(11.4.2)
Let us now consider the adjoint A* : W —> 3) of A, which is easily seen to be given as A* (u, v, r) = ((/ — T*)u + v + r, —v) V(u,v,r) E W. Thus, observing that ((0, (Po, 1), (u, v, r)) = (vo, v) + r, the statement corresponding to (b) in Lemma 10.6.2 is: (/ — T*)u + v + r > 0 and — v > 0
implies (vo, 0 + r > 0,
which, replacing v by —v and using (11.3.8), can be restated as (T* — I)u < —v + r and v E L±
implies (fo , v) < r.
(11.4.3)
Now note that if v E LI) , then for the inequality (fo , v) < r to be true we must necessarily have r > 0 (recall that f o is in Lt); thus, in (11.4.3) we may take r > 0. Moreover, without loss of generality, we may also take u > 0 because, for any u E L oo , the function u' := u+ Iluiloo is in 40 , and (T* — I)u = (T* — I)u'
(11.4.4)
(since T* c -2--- c • T*1 = c for any constant c). With these changes, (11.4.3) becomes exactly the same as statement (c) in Theorem 11.3.1. Therefore, as it is obvious that the map A in (11.4.2) is weakly continuous (see (10.6.3)) we can use Lemma 10.6.2 to conclude that (b) and (c) in Theorem 11.3.1 are equivalent provided that A(K) is weakly closed.
(11.4.5)
We shall now proceed to prove (11.4.5) using the fact that (ba(X, A), L oo ) is a dual pair of vector spaces, and that, by (11.3.3), the weak topology a(ba(X, A), L) on ba(X, A) is the weak* topology (see Remark 1.3.1). Convergence in this topology will be denoted by '''>.
169
11.4. Proofs
Let (D,>) be a directed set, and let {(p a , v,), a E DI be a net in K = ba(X , A)+ x ba(X , ))+ such that
A(, v) 14 (a, b, c) for some
(a, b, c) E W = L c, x L oo x IR;
that is (i) (I — T)p, 14 a,
(ii) p,, — I}, u-4 b,
and (iii) (p,a , 1) —4 c.
(11.4.6)
We wish to show that (a, b, c) is in A(K); that is, there is a pair (p, v) in K such that (i) (I — T)p, = a,
(ii) p, — v = b,
and (iii) (p, 1) = c.
(11.4.7)
By (11.4.6)(iii), we must have c > 0, so we shall consider two cases, c = 0 and c> 0. Case 1: c = 0. In this case, (11.4.6)(iii) implies that p, converges strongly to 0, since iii-tallTv = (Pa, 1) ---> 0, and so does T p c,. Hence, a = 0, and so (11.4.7) holds with (p, v) = (0, —b). Case 2: c> 0. By (11.4.6)(iii), there is a constant Mo (for instance, take Mo = 2c) and ao E D such that iiliallTv <M0 for all a > ao . This in turn, combined with (11.4.6)(ii), yields that there is ai E D and a constant M1 such that IlvallTv < Mi
for all a > a i . Therefore, the Banach-Alaoglu-Bourbaki theorem (see Lemma 1.3.2) implies the existence of finitely additive set functions p, v in ba(X , )YE' and a subnet {0} of {a} such that (i) PO 14 it,
and
w
(ii) vo —> v.
(11.4.8)
w* Moreover, (11.4.8)(i) yields that Tpo —> Tp, since (T,u,o, u) = (po,T* u) —) (p, T* u) = (T p, u)
Vu E L.
(11.4.9)
Finally, from (11.4.6), (11.4.8) and (11.4.9) it is easily deduced that the pair (p,, v) in (11.4.8) satisfies (11.4.7). This completes the proof of (11.4.5), and, therefore, the proof of Theorem 11.3.1.
11.4.2 Proof of Theorem 11.3.2 The "if" part. Let y(n)j0 be as in (11.3.11). Observe that supn
1 T(n) /0 1 < I fob.
By the Banach-Alaoglu theorem (see Lemma 1.3.2), the norm-bounded subsets of ba(X, A) are weak* compact (that is, in the weak* topology cr(ba(X , A), L) of ba(X, A)). Hence, there is a finitely additive measure p E ba(X , A)± and a subnet {n,} of the sequence {n}, such that (h,T (na ) fo ) —> (h, p)
Vh E L a c.
Markov Operators
170 It follows that T p = p, because T(T (na ) fo) = T (na ) fo + nV [Tna fo — fo]
Vn,,
and
(T* h, p,)
= lim (T* h, Ten' ) fo) a
= 1411 (h,T(T ( nce ) fo ))
a = (1.1„ u,) V h E L,,, where we have used that lirp ri 1 (h,Tn'io — fo)1
n ,---, 1 011. [ITn /011 + 11/01I]
27c i llhildifoll t O.
Hence, T ,u, = t. Now, let :I.; : = lim infn T (n) fo E LiE . By (11.3.12), addition, Fatou's lemma yields lim inf (h,T (n) fo) > (h, :10 )
10
0. In
Vh E 4;0 .
71,-- 00
Therefore, for every h E Lit,
( 11 , ii)
(h, T(na) f0) = liM a >
lim inf (h,T (n) fo) ri—oo
> (11 ,L), so that p, > fo . Invoking Lemma 11.4.1, we conclude that L E r(i,) :. If E Lill f
lirn
-1 n -1 \--01 L jk=0 T k fo
- 71 —1 k *. n—*(Do n _ i \-% Lik=0 T j
= lim
Snfo
Ti —+00 Sn f* '
with
n-1 Sn f :=
E Tk f
k=0
for f E Li.
(11.4.10)
171
11.4. Proofs
By Theorem 2.3.2, the limit in (11.4.10) is finite a.e. on C* := C ∩ {f* > 0}. Moreover, suppose for the moment that λ is finite. Let 𝒞 be the sub-σ-algebra of invariant sets (where an invariant set A ∈ B is a set such that T1_A = 1_A). For any B ∈ B and f ∈ L_1 we define (1_B f)(x) := 1_B(x) f(x) for all x ∈ X. Further, let D be the dissipative part of X in Hopf's decomposition, and C := X \ D the conservative part. Then, from Revuz [112, p. 150, Theor. 5.6], the limit in (11.4.10) can be written as

    lim_{n→∞} T^(n) f_0 / T^(n) f* = lim_{n→∞} S_n f_0 / S_n f* = E[H_C f_0 | 𝒞] / E[H_C f* | 𝒞],   (11.4.11)

with H_C := 1_C Σ_{n=0}^∞ (T 1_D)^n. Obviously, as f_0 > 0, we have H_C f_0 ≥ 1_C f_0, and thus, from the strict positivity of f_0, E[H_C f_0 | 𝒞] ≥ E[1_C f_0 | 𝒞] > 0. Moreover, writing f* = 1_C f* and using f* in lieu of f_0 in (11.4.11), we have H_C f* = f*, so that the ratio in (11.4.11) is bounded below by E[1_C f_0 | 𝒞]/E[f* | 𝒞] on C*. This yields that lim_{n→∞} T^(n) f_0 > 0 on C*.

Finally, if λ is not finite, then, as f_0 ∈ L_1⁺ is strictly positive, let μ be the finite measure μ(B) := ∫_B f_0 dλ, which is equivalent to λ. A function f is in L_1(μ) if and only if f_0 f is in L_1 (= L_1(λ)). Therefore, we replace T with the new operator T′f := f_0^{−1} T(f_0 f). It turns out that T′ is a positive contraction on L_1(μ) and, in addition, L_∞(μ) = L_∞(λ) and (T′)* = T*, so that the σ-algebra of invariant sets is the same (see Revuz [112, p. 133]). The ratio in (11.4.11) becomes

    E[H′_C 1 | 𝒞] / E[f*/f_0 | 𝒞],

and we conclude in the same manner as we did for the case of finite λ. □
11.4.3 Proof of Theorem 11.3.3

The equivalence of (a), (b) and (c) follows, of course, from Theorem 11.3.1 applied to the particular case of a function f_0 > 0 a.e. To complete the proof of Theorem 11.3.3, we will show that (a)⇒(d)⇒(e)⇒(c). In fact, the first two implications are straightforward: if f ∈ L_1 is as in (a), then g_0 := f satisfies (d) (since T^n f = f for all n, we get lim inf T^(n) f = f > 0), whereas (d)⇒(e) follows from general properties of sequences. Hence, it only remains to prove that (e) implies (c). Let g_0 ∈ L_1⁺ be as in (e), and suppose that (11.3.9) holds, that is,

    u ≥ T*u + v − r,   (11.4.12)

where u, v ∈ L_∞⁺ and r > 0. We will show that (11.4.12) implies (11.3.10) with f_0 given by

    f_0 := c · lim inf_{n→∞} T^(n) g_0,  where c := 1/‖g_0‖_1.   (11.4.13)
To do this, observe first that f_0 indeed satisfies (11.2.5) [by (11.3.14)], and it belongs to L_1⁺ because [by (11.2.2)]

    ∫ (T^k g_0) dλ = ‖T^k g_0‖_1 = ‖g_0‖_1,

so that Fatou's lemma yields

    ‖f_0‖_1 = ∫ f_0 dλ ≤ c · lim inf_n n^{−1} Σ_{k=0}^{n−1} ∫ (T^k g_0) dλ = c · ‖g_0‖_1 = 1.
Now, to see that f_0 satisfies (11.3.10), iterate (11.4.12) to obtain (recalling that T*1 = 1)

    u ≥ T*^n u + Σ_{k=0}^{n−1} T*^k v − nr  ∀n = 1, 2, ....

Then multiply by g_0 and integrate with respect to λ to see that

    ⟨g_0, u⟩ ≥ ⟨T^n g_0, u⟩ + ⟨Σ_{k=0}^{n−1} T^k g_0, v⟩ − nr‖g_0‖_1.
Finally, in the latter inequality, multiply by (n‖g_0‖_1)^{−1}, and then take lim inf_n to obtain ⟨f_0, v⟩ ≤ r from (11.4.13) and Fatou's lemma. Therefore, (e) implies (c), which completes the proof of Theorem 11.3.3. □

Remark 11.4.2. In the proof of Theorem 11.3.5 we use the following fact, which is a restatement of Lemma 10.6.1(a): if μ is in ba(X, B)⁺, ν is in M(X)⁺ and μ ≤ ν, then μ is in fact a (countably additive) measure in M(X)⁺.
11.4.4 Proof of Theorem 11.3.5

We shall prove first that (a) and (b) are equivalent.

(a) ⇒ (b). Let f be as in (a) and define ν_f(dx) := f(x)λ(dx) as in (11.2.7). Then μ = ν* := ν_f satisfy (b).

(b) ⇒ (a). With μ and ν* as in (b), Lemma 11.4.1 yields that the maximal element f* in Γ(μ) is a fixed point of T such that f* = 0 a.e. on G. Moreover, by Remark 11.4.2, the condition μ ≤ ν* implies that in fact μ is a measure in M(X)⁺ and, therefore, f* is nontrivial [otherwise, the set B in Lemma 11.4.1(b) would be all of X, in which case μ(X) = 0, a contradiction]. Hence f := f*/‖f*‖_1 satisfies (11.3.16).

(b) ⇔ (c). We shall use Lemma 10.6.2, for which we introduce the dual pairs (𝒳, 𝒴) and (𝒵, 𝒲) of vector spaces

    𝒳 := ba(X, B) × ba(X, B),  𝒴 := L_∞ × L_∞,  𝒵 := 𝒳 × ℝ,  𝒲 := 𝒴 × ℝ.
Let A: 𝒳 → 𝒵 and its adjoint A*: 𝒲 → 𝒴 be defined as

    A(μ, ν) := ((I − T)μ, μ + ν, ⟨μ, 1⟩)  and  A*(u, v, θ) := ((I − T*)u + v + θ·1, v).
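Indeed, (A, A*) is an adjoint pair with respect to these dualities, as the following one-line worked computation (using only ⟨Tμ, u⟩ = ⟨μ, T*u⟩) shows:

    ⟨A(μ, ν), (u, v, θ)⟩ = ⟨(I − T)μ, u⟩ + ⟨μ + ν, v⟩ + θ⟨μ, 1⟩
                         = ⟨μ, (I − T*)u + v + θ·1⟩ + ⟨ν, v⟩ = ⟨(μ, ν), A*(u, v, θ)⟩.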
With this notation, part (b) in Theorem 11.3.5 is equivalent to:

    the linear equation A(μ, ν) = (0, ν*, 1) has a solution (μ, ν) in K,   (11.4.14)

where K := ba(X, B)⁺ × ba(X, B)⁺ is the positive cone in 𝒳. Similarly, part (c) in Theorem 11.3.5 can be written as:

    A*(u, v, β) ∈ K*  ⟹  ⟨(0, ν*, 1), (u, v, β)⟩ = ⟨ν*, v⟩ + β ≥ 0,   (11.4.15)

where K* denotes the dual cone of K.
Therefore, by Lemma 10.6.2, (11.4.14) and (11.4.15) are equivalent if (i) A is weakly continuous, and (ii) A(K) is weakly closed. Condition (i) is obvious (see (10.6.2)), and (ii) can be proved exactly as (11.4.5).

(a) ⇒ (d). Choose h := g_0 := f, where f is an IPD as in (a).

(d) ⇒ (c). Consider u, v ∈ L_∞⁺ and β ∈ ℝ such that (11.3.18) holds. Rewrite (11.3.18) as u ≥ T*u − v − β. Then, iterating n times and dividing by n, one gets (after rearranging terms)

    n^{−1} Σ_{k=0}^{n−1} T*^k v + β ≥ n^{−1} (T*^n − I) u  ∀n = 1, 2, ....

Hence, multiplying by g_0 and integrating w.r.t. λ,

    ⟨n^{−1} Σ_{k=0}^{n−1} T^k g_0, v⟩ + β‖g_0‖_1 ≥ n^{−1} ⟨g_0, (T*^n − I)u⟩   (11.4.16)
for all n = 1, 2, .... Taking lim sup in (11.4.16) and invoking Fatou's lemma, we obtain

    ⟨lim sup_{n→∞} n^{−1} Σ_{k=0}^{n−1} T^k g_0, v⟩ + β‖g_0‖_1 ≥ 0,

because the right-hand side of (11.4.16) vanishes as n → ∞. Therefore, letting
    μ*(B) := ‖g_0‖_1^{−1} ∫_B (lim sup_{n→∞} n^{−1} Σ_{k=0}^{n−1} T^k g_0) dλ  for B ∈ B,

(11.3.19) follows. □
11.5 Notes

Problems P1 and P2 have been studied by many authors. For instance, [80] and [124] deal with a problem of the form (11.2.3), (11.2.4), whereas [18, 39, 48, 75, 105, 112] study the strictly positive case P2. The latter references also show that if P2 has a solution, then T is a conservative operator; on the other hand, P2 turns out to be the same as the problem of the existence of a probability measure μ equivalent to λ (in the usual sense that μ ≪ λ and λ ≪ μ). In addition to these references, the case of a λ-continuous Markov process, as in (11.2.6), is dealt with in [5, 85, 86], and the deterministic system (11.2.12) is studied in [4, 85, 126]. For most practical purposes, the problems Pi (i = 1, 2, 3) can be restricted to situations as in (11.2.6) or (11.2.12) because, under mild assumptions on the measure space (X, B, λ), every positive contraction T on L_1 is of the form (11.2.9) for at least one t.p.f. P (see [112], pp. 120–121). Finally, we should remark that we know of no previous work on the uniqueness problem P3.
Chapter 12
Approximation Procedures for Invariant Probability Measures

12.1 Introduction

In this chapter we consider a MC ξ• on a LCS metric space X with t.p.f. P. Suppose for the time being that μ ∈ M(X) is an ergodic invariant p.m. for P (see Definition 2.4.1). We address the following issue. Given f ∈ L_1(μ), we want to evaluate ∫ f dμ knowing only that the ergodic invariant p.m. μ exists, but μ itself is not known. One way to proceed is to simulate the MC. This makes sense because, as μ is ergodic, for all initial states x in a set A_f of full μ-measure we have
    lim_{n→∞} n^{−1} Σ_{k=0}^{n−1} f(ξ_k) = ∫ f dμ  P_x-a.s.   (12.1.1)
(see Corollary 2.5.2 and the comment after it). Hence, if we simulate the MC from an initial point ξ_0 ∈ A_f, that is, if we simulate a realization ω ∈ Ω to obtain a sample path (ξ_0(ω), ξ_1(ω), ...) (with ξ_0 ∈ A_f), then we obtain almost surely a good estimate of ∫ f dμ for n sufficiently large.
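As a concrete illustration, here is a minimal sketch of the estimate (12.1.1) in Python for the deterministic logistic-map chain used later in Example 12.3.5; the function name and parameter values are illustrative choices, not from the text.

```python
# A minimal sketch of the simulation estimate (12.1.1) for the logistic-map
# chain of Example 12.3.5 below: X = [0, 1] and xi_{t+1} = 4 xi_t (1 - xi_t).
def ergodic_average(f, x0, n):
    """Return n^{-1} sum_{k=0}^{n-1} f(xi_k) along the sample path from x0."""
    x, total = x0, 0.0
    for _ in range(n):
        total += f(x)
        x = 4.0 * x * (1.0 - x)
    return total / n

# x0 = 3/4 is a fixed point of the map (it supports the ergodic p.m. delta_{3/4}),
# so the estimate equals f(3/4) exactly:
print(ergodic_average(lambda x: x ** 2, 0.75, 10_000))   # 0.5625
```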
However, except for special cases, simulating a MC on a space X such as, say, ℝ^m, is not an easy task in general, even for small m. Moreover, even if we can simulate the MC ξ•, what we obtain in (12.1.1) is only an estimate of ∫ f dμ. Instead of looking for an estimate of ∫ f dμ, an alternative approach is to approximate the numerical value ∫ f dμ "directly". In fact, if X is finite and μ is unique, then μ can even be computed exactly by solving the finite-dimensional linear system

    μ(I − P) = 0;  μ(X) = 1;  μ ≥ 0   (12.1.2)

(see (10.1.1)).
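For a finite state space, (12.1.2) can be solved directly; the following minimal sketch (Python with numpy; the 3-state matrix is an illustrative stand-in, not from the text) computes μ as a normalized left eigenvector of P for the eigenvalue 1.

```python
import numpy as np

# A minimal sketch of (12.1.2): mu (I - P) = 0 means P^T mu^T = mu^T, so mu is
# a left eigenvector of P for the eigenvalue 1, normalized so that mu(X) = 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])   # illustrative t.p.f., rows sum to one

w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1.0))])
mu = mu / mu.sum()                # enforce mu(X) = 1 (fixes the sign as well)
print(mu, mu @ P - mu)            # invariant p.m. and residual ~ 0
```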
However, solving (12.1.2) exactly becomes virtually impossible if X is countably infinite or any other "large" set. One may thus try to "approximate" μ (or ∫ f dμ) by solving finite-dimensional linear systems that approximate in some sense the original (large or infinite-dimensional) linear system (12.1.2). Note that in the latter approximation approach, one obtains a numerical approximation of the value ∫ f dμ rather than a statistical estimate as in the former simulation approach. If the MC has several (unknown) invariant p.m.'s, one may wish to obtain upper and/or lower bounds on sup_μ ∫ f dμ, where the "sup" is taken over the set of all invariant p.m.'s for P. In this case, the simulation approach is not quite satisfactory because it only provides an estimate of ∫ f dμ_0 for some invariant p.m. μ_0 which depends on the initial state ξ_0 of the simulated sample path. In contrast, we shall see that the proposed approximation schemes do provide such bounds.

This chapter is organized as follows. In §12.2 we state the problem and provide some intermediate results needed in later sections. In §12.3 we propose a numerical approximation scheme based on a sequence of larger and larger finite-dimensional linear programs. Under some conditions, the resulting sequence converges to the desired value. In §12.4 we provide an alternative approximation scheme for MCs with a weak-Feller t.p.f. P that maps polynomials into polynomials. This latter approach is based on the moments of the unknown p.m. μ, and so, instead of a sequence of linear programs, we obtain a sequence of semidefinite programs whose solution does not require discretizing the state space as in the former approach.
12.2 Statement of the Problem and Preliminaries

Let X be a LCS metric space with Borel σ-algebra B, and consider a MC on X with t.p.f. P. Let M(X), B(X), C_b(X), C_0(X) be the Banach spaces defined in §§1.2.1 and 1.2.2. (In particular, recall (1.2.3).) Given a measurable function f: X → ℝ, suppose that we want to compute ∫ f dμ for some invariant p.m. μ for P, with f ∈ L_1(μ). Then we may first solve a linear system of the form (12.1.2), i.e.,

    μ(I − P) = 0,  μ(X) = 1,  μ ∈ M(X)⁺,   (12.2.1)

to obtain one invariant p.m. μ, and then compute, or at least approximate, the integral ∫ f dμ. In fact, one may wish to solve the following related optimization problem, in which, as usual, ⟨μ, f⟩ := ∫ f dμ, and we use the notation "s.t." for "subject to":

    P:  minimize ⟨μ, f⟩
        s.t.  μ(I − P) = 0,  μ(X) = 1,  μ ∈ M(X)⁺.   (12.2.2)
This will provide:
—the desired value inf P = ⟨μ, f⟩ if μ is unique,
—or inf_μ ⟨μ, f⟩, where the infimum is taken over all the invariant p.m.'s μ for P for which f is in L_1(μ) and ⟨μ, f⟩ is bounded below.

Note that P is an infinite-dimensional linear program; its optimal value is denoted inf P. A linear program is said to be solvable if there exists an optimal solution, say μ̂. In this case, P is said to be solvable and we write its value as inf P = min P = ⟨μ̂, f⟩ = ∫ f dμ̂.

The underlying idea. To solve P, the idea is to use an aggregation of constraints as well as an inner approximation of μ ∈ M(X), as follows.
12.2.1 Constraint-Aggregation

When X is a LCS metric space, the Banach space C_0(X) is separable, that is, it contains a countable dense subset

    H := {h_1, h_2, ...} ⊂ C_0(X).   (12.2.3)

By the denseness of H in C_0(X), for any two measures μ, ν in M(X) we have

    μ = ν  ⟺  ⟨μ, h⟩ = ⟨ν, h⟩ ∀h ∈ C_0(X)  ⟺  ⟨μ, h⟩ = ⟨ν, h⟩ ∀h ∈ H.

Hence, solving (12.2.2) is equivalent to solving the linear program

    minimize ⟨μ, f⟩
    s.t.  ⟨μ(I − P), h⟩ = 0 ∀h ∈ H,  ⟨μ, 1⟩ = 1,  μ ∈ M(X)⁺.   (12.2.4)
A key difference between (12.2.2) and (12.2.4) is that the latter is an optimization problem with countably many constraints. On the other hand, if in (12.2.4) we consider only a finite subset H_k ⊂ H in lieu of H, i.e., we only consider the constraints

    ⟨μ(I − P), h⟩ = 0  ∀h ∈ H_k ⊂ H,

then we obtain a linear program with finitely many constraints. This approximation to (12.2.4) is an aggregation scheme because we "aggregate" the infinitely many constraints μ(I − P) = 0 into finitely many of them. Further, one may even decide to relax the constraints ⟨μ(I − P), h⟩ = 0 for h ∈ H_k, to obtain the weaker constraints

    |⟨μ(I − P), h⟩| ≤ ε  ∀h ∈ H_k,   (12.2.5)

for some ε > 0. To approximate (12.2.4), we will use the aggregation-relaxation scheme (12.2.5) together with ⟨μ, 1⟩ = 1 and μ ∈ M(X)⁺, which defines a linear program with finitely many constraints.
12.2.2 Inner Approximation in M(X)

The space M(X) also has an important, well-known property (see, e.g., Billingsley [14]). Namely, its subset P(X) of p.m.'s on X is such that every p.m. μ ∈ P(X) can be "approximated" by p.m.'s with finite support in the following sense. As X is a LCS metric space, it contains a countable subset

    X̂ := {x_1, x_2, ...}   (12.2.6)

dense in X.

Proposition 12.2.1. Let μ ∈ P(X) be an arbitrary, fixed p.m. on X, and let X̂ = {x_i} be as in (12.2.6), i.e., a countable dense subset of X. Then there exists a sequence {μ_n} in P(X) such that:
(a) For every n = 1, 2, ..., μ_n has finite support, that is, μ_n is of the form μ_n = Σ_{i=1}^n λ_i^n δ_{x_i} with λ_i^n ≥ 0 for all i = 1, ..., n and Σ_{i=1}^n λ_i^n = 1; and
(b) The sequence {μ_n} converges weakly to μ, i.e., μ_n ⇒ μ (see Definition 1.4.10).

We will also use this fact to build an inner approximation of (12.2.4) by approximating μ in (12.2.4) by a p.m. with finite support, which will yield another linear program with finitely many variables, namely the coefficients {λ_i^n} in Proposition 12.2.1. Of course, the larger the support, the better the approximation. This is an inner approximation scheme (in M(X)) because we look for solutions μ in a proper subset of the initial set of p.m.'s P(X); a small numerical illustration is given below. Finally, to approximate (12.2.4), we will combine the aggregation-relaxation scheme (12.2.5) with the just mentioned inner approximation scheme to finally obtain finite linear programs, i.e., linear programs with finitely many constraints and variables, which in principle are solvable even for a very large number of constraints and variables. We next show that under appropriate assumptions on the function f, the above approximation scheme indeed works.
12.3 An Approximation Scheme

Throughout this section, f: X → ℝ denotes a given nonnegative measurable function; additional assumptions on f are introduced below. We first examine an aggregation scheme which replaces the original infinite-dimensional constraint μ(I − P) = 0 in (12.2.2) with finitely many constraints.
12.3.1 Aggregation

Consider the following aggregation P_k of P:

    P_k:  minimize ⟨μ, f⟩
          s.t.  ⟨μ(I − P), h⟩ = 0 ∀h ∈ H_k,  ⟨μ, 1⟩ = 1,  μ ≥ 0,   (12.3.1)

where H_k := {h_1, h_2, ..., h_k} ⊂ H and H ⊂ C_0(X) is as in (12.2.3). From Definition 1.4.14(b), recall the following.
Definition 12.3.1. A function f: X → ℝ is said to be inf-compact if the set

    K_r := {x ∈ X | f(x) ≤ r}

is compact for every scalar r.

As noted after Definition 1.4.14, an inf-compact function is also a moment.
Proposition 12.3.2. Assume that f is inf-compact and nonnegative, and, moreover, that P is weak-Feller and admits an invariant p.m. μ with ∫ f dμ < ∞. Then:
(a) P_k is solvable for every k = 1, 2, ..., and so is P. Further,

    inf P_k = min P_k ↑ min P  as k → ∞.   (12.3.2)

(b) Let μ_k be an arbitrary optimal solution of P_k. Then every weak accumulation point of the sequence {μ_k} is an optimal solution of P.
(b) Let p,k be art arbitrary optimal solution of Pk. Then every weak accumulation point of the sequence fp,k1 is an optimal solution of P. Proof. (a) Let us first note the following. By the assumption on p, the linear program P and its aggregation Pk each has at least one feasible solution, and 0 < inf Pk < inf P < (,u,, f) < Moreover, as Hk C Hk+i for all k, the values inf Pk form a nondecreasing sequence bounded above by inf P, and, therefore, there exists p* > 0 such that inf Pk I p < inf P.
(12.3.3)
We next prove (a). Fix an arbitrary k > 1. Let lin E P(X) be a minimizing sequence for Pk that is, pn is feasible and f fdp n j, inf Pk. By (12.3.3), there exist no, mo such that (An , f) < mo for all n > no. Hence, as f is a moment, from Proposition 1.4.15 it follows that the sequence {p m } is tight. Therefore, by Prohorov's Theorem 1.4.12, there is a p.m. p, and a subsequence In3 1 such that tin,= p. By the weak-Feller property, (I — Cb(X) as hEHC Co(X), so that for every h E Hk (ittri3
(I — P), h) = (pn, , (I — P)h) --> (, (I — P)h) = ((I — P), h),
which proves that p, is feasible for Pk. In addition, as f is 1.s.c. and nonnegative, from Proposition 1.4.18 we get inf Pk = lim inf (1413 1 f) 3
f).
As μ is feasible, it follows that μ is an optimal solution for P_k. This proves the first statement in (a).

To prove that P is solvable and (12.3.2), let μ_k be an arbitrary optimal solution of P_k and consider the sequence {μ_k}. By (12.3.3), ⟨μ_k, f⟩ ≤ inf P for all k ≥ 1, and so the sequence {μ_k} is tight. Hence, by the same argument used in the previous paragraph, there is a subsequence {k_i} and a p.m. μ such that μ_{k_i} ⇒ μ. Fix an arbitrary j ≥ 1. Then there is some index i_j such that h_j ∈ H_{k_i} for all i ≥ i_j. Therefore, by the weak-Feller property, as i → ∞ we get

    ⟨μ_{k_i}(I − P), h_j⟩ = ⟨μ_{k_i}, (I − P)h_j⟩ → ⟨μ, (I − P)h_j⟩ = ⟨μ(I − P), h_j⟩.

As j ≥ 1 was arbitrary, it follows that ⟨μ(I − P), h_j⟩ = 0 for every j, and thus μ is feasible for P. Again, as f is nonnegative and l.s.c., we have

    inf P ≥ ρ* := lim inf_i min P_{k_i} = lim inf_i ⟨μ_{k_i}, f⟩ ≥ ⟨μ, f⟩.

The latter inequality and the feasibility of μ yield ⟨μ, f⟩ = inf P = ρ*, that is, μ is an optimal solution of P, and, finally, (12.3.2) follows from (12.3.3).

(b) That every weak accumulation point of {μ_k} is an optimal solution of P follows from the same arguments used in the proof of the second part of (a). □

The linear program P_k is still infinite-dimensional because the "decision variable" μ is in M(X). To obtain a finite-dimensional linear program, we next combine the above aggregation scheme P_k with a "relaxation" and an "inner approximation" in M(X).
12.3.2 Aggregation-Relaxation-Inner Approximation

Let X̂ be as in (12.2.6). Instead of P_k, we next consider the optimization problem P_{kn}(ε_k) (with ε_k > 0) defined by:

    P_{kn}(ε_k):  minimize Σ_{i=1}^n λ_i f(x_i)
                  s.t.  |Σ_{i=1}^n λ_i [h(x_i) − Ph(x_i)]| ≤ ε_k ∀h ∈ H_k,
                        Σ_{i=1}^n λ_i = 1,  λ_i ≥ 0 ∀i,   (12.3.4)

where x_1, x_2, ..., x_n ∈ X̂. Observe that this is a finite-dimensional linear program because there are finitely many constraints and decision variables λ_1, ..., λ_n (a numerical sketch is given after the proof of Proposition 12.3.3 below). We have the following.

Proposition 12.3.3. Let f and P be as in Proposition 12.3.2, and let in addition f be continuous. Then:
(a) For every ε_k > 0 there is an integer n(ε_k) such that P_{kn}(ε_k) is solvable, and min P_{kn}(ε_k) ≤ min P + ε_k for all n ≥ n(ε_k).
(b) Suppose that ε_k ↓ 0, and let μ_{kn} be an optimal solution of P_{kn}(ε_k) for each fixed n ≥ n(ε_k). Then every weak accumulation point of the sequence {μ_{kn}, k = 1, 2, ...} is an optimal solution of P, and lim_k min P_{kn}(ε_k) = min P for each n ≥ n(ε_k).
Proof. (a) Let μ* be an optimal solution of P, so that ⟨μ*, f⟩ = min P. Let K_j := {x ∈ X | f(x) ≤ j}, which is a compact set as f is inf-compact. Moreover, as j → ∞, K_j ↑ X, and so μ*(K_j) > 0 for all j sufficiently large. In fact, to keep matters simple, we shall assume that μ*(K_j) > 0 for all j ≥ 1. The p.m. μ_j(B) := μ*(B ∩ K_j)/μ*(K_j) for B ∈ B obviously converges weakly to μ* (in fact, μ_j even converges setwise to μ*). Hence ⟨μ_j, f⟩ → ⟨μ*, f⟩ and, in addition,

    ⟨μ_j, f⟩ ≤ μ*(K_j)^{−1} ⟨μ*, f⟩  ∀j = 1, 2, ....   (12.3.5)

Hence, given ε_k > 0, there is some j(k) such that ⟨μ_j, f⟩ ≤ min P + ε_k/2 for all j ≥ j(k). From the weak convergence of μ_j to μ* and the continuity of the functions h_i and Ph_i, i = 1, 2, ..., there is an index j_1 such that for all j ≥ j_1,

    |⟨μ_j, (I − P)h⟩ − ⟨μ*, (I − P)h⟩| ≤ ε_k/2  ∀h ∈ H_k.

Fix j_0 ≥ max[j(k), j_1]. From Proposition 12.2.1, there is a sequence {ν_n} of p.m.'s with finite support in X̂ ∩ K_{j_0} that converges weakly to μ_{j_0}. As the restriction of f to K_{j_0} is continuous, and the h_i are also continuous, it follows that there is some n(ε_k, j_0) such that for all n ≥ n(ε_k, j_0) one has

    |⟨ν_n, f⟩ − ⟨μ_{j_0}, f⟩| ≤ ε_k/2  and  |⟨ν_n, (I − P)h⟩ − ⟨μ_{j_0}, (I − P)h⟩| ≤ ε_k/2  ∀h ∈ H_k.

Therefore,

    ⟨ν_n, f⟩ ≤ min P + ε_k  and  |⟨ν_n, (I − P)h⟩| ≤ ε_k  ∀h ∈ H_k,

which proves that ν_n is feasible for P_{kn}(ε_k), and min P_{kn}(ε_k) ≤ min P + ε_k.

(b) Choose an arbitrary n ≥ n(ε_k, j_0), and let μ_k be an optimal solution of P_{kn}(ε_k). Since ⟨μ_k, f⟩ ≤ min P + ε_k, we conclude that the sequence {μ_k} is tight. Consider an arbitrary converging subsequence μ_{k_i} ⇒ μ. Using again that f is continuous and nonnegative, from Proposition 1.4.18 it follows that

    ⟨μ, f⟩ ≤ lim inf_{i→∞} ⟨μ_{k_i}, f⟩ ≤ min P,

because ⟨μ_k, f⟩ ≤ min P + ε_k for every k and ε_k ↓ 0. Now pick an arbitrary index j, and note that h_j ∈ H_k for all k ≥ j. Therefore,

    |⟨μ_{k_i}(I − P), h_j⟩| ≤ ε_{k_i}  ∀k_i ≥ j,

and so the convergence μ_{k_i} ⇒ μ yields ⟨μ(I − P), h_j⟩ = 0. As j was arbitrary, μ is feasible for P. This fact and ⟨μ, f⟩ ≤ min P imply that μ is an optimal solution for P. □
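To make the scheme concrete, here is a minimal sketch of the finite linear program (12.3.4) for the logistic map of Example 12.3.5 below, written in Python with scipy; the grid, k and ε values are illustrative choices. As in the example, we maximize ⟨μ, f⟩, i.e., we minimize −f over the weights λ_i.

```python
import numpy as np
from scipy.optimize import linprog

# A minimal sketch of (12.3.4) assuming X = [0,1], Ph(x) = h(4x(1-x)) and
# H_k = {x, x^2, ..., x^k}, as in Example 12.3.5 (where eps_k = 0 is used).
n, k, eps = 200, 10, 0.0
x = np.linspace(0.0, 1.0, n + 1)      # grid: n subintervals of equal length
Tx = 4.0 * x * (1.0 - x)
f = x ** 2                            # f_1(x) = x^2

G = np.array([x ** j - Tx ** j for j in range(1, k + 1)])   # h(x_i) - Ph(x_i)
A_ub = np.vstack([G, -G])                                   # |G @ lam| <= eps
b_ub = np.full(2 * k, eps)
A_eq, b_eq = np.ones((1, n + 1)), [1.0]                     # sum_i lam_i = 1

res = linprog(-f, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * (n + 1))
print(-res.fun)   # upper bound on sup_mu <mu, x^2>; about 0.5625 (Table 12.1)
```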
Remark 12.3.4. (a) A crucial hypothesis in Propositions 12.3.2 and 12.3.3 is the inf-compactness condition on f. It was used in combination with the fact that a set Π of p.m.'s satisfying sup_{μ∈Π} ∫ f dμ < ∞ is relatively compact (see Proposition 1.4.15 and Theorem 1.4.12(a)). On the other hand, let f be a given inf-compact function, and suppose that P is weak-Feller. Assume that there exist a nonnegative scalar b and a measurable nonnegative function V: X → ℝ⁺ such that

    PV(x) ≤ V(x) − f(x) + b  ∀x ∈ X.   (12.3.6)

Then, by Proposition 9.3.2, every invariant p.m. μ satisfies ⟨μ, f⟩ ≤ b. Therefore, instead of the hypothesis ⟨μ, f⟩ < ∞ in Proposition 12.3.2, one may try to find a suitable "Lyapunov function" V and a scalar b for which (12.3.6) holds.

(b) For a given measurable function f on X, one may also wish to maximize ⟨μ, f⟩ over the set of all invariant p.m.'s μ for P. The latter problem is, of course, equivalent to minimizing −⟨μ, f⟩. In particular, we may wish to both maximize and minimize ⟨μ, f⟩ over the set of invariant p.m.'s μ for P so as to obtain upper and lower bounds on ⟨μ, f⟩. If f is continuous and X is compact, then, by Proposition 12.3.3, the approximation scheme permits one to obtain sharp upper and lower bounds, as both max P and min P can be approximated as closely as desired.
Example 12.3.5. Consider the MC associated with the deterministic logistic map in (6.2.5); see also §11.1. Its t.p.f. is P(x, B) = 1_B(4x(1 − x)) for x ∈ X = [0, 1] and B ∈ B. Recall that P admits countably many invariant ergodic p.m.'s; see §6.3.2. Let us partition X into n subintervals of equal length. The dense subset H in (12.2.3) is taken to be the basis of the polynomials on [0, 1], that is, H = {1, x, x², ...}, so that the set H_k in (12.2.5) consists of the polynomials {1, x, x², ..., x^k}. In (12.2.2) consider three functions, namely f_1(x) := x², f_2(x) := x⁸ and f_3(x) := (x − 0.2)(x − 0.3)(x − 0.4). Notice that the Dirac p.m. δ_0 at x = 0 is an invariant p.m. for P, and, therefore, for the first two nonnegative functions f_1 and f_2, we trivially have min P = 0. Hence, we have chosen to "maximize" ⟨μ, f⟩ in the linear program P_{kn}(ε_k), and we have even taken ε_k = 0 for all k ≥ 1. As X is compact and f is continuous, Proposition 12.3.3 is also valid replacing inf with sup and min with max, that is, when we wish to maximize (instead of minimize) ⟨μ, f⟩. In this case, sup P_{kn}(ε_k) provides an upper bound on sup_μ ⟨μ, f⟩, where the "sup" is taken over all the invariant p.m.'s μ for P (see Remark 12.3.4(b)). The results are displayed in Table 12.1.
    Parameters      f_1(x)   f_2(x)   f_3(x)
    n=50;  k=5      0.5603   0.2600   0.1069
    n=50;  k=10     0.5548   0.2567   0.1066
    n=50;  k=20     0.5546   0.2566   0.1065
    n=100; k=5      0.5625   0.2611   0.1071
    n=100; k=10     0.5625   0.2610   0.1071
    n=100; k=20     0.5625   0.2606   0.1070
    n=200; k=5      0.5625   0.2612   0.1072
    n=200; k=10     0.5625   0.2611   0.1072
    n=200; k=20     0.5625   0.2611   0.1072

    Table 12.1: Upper bounds on sup_μ ⟨μ, f_i⟩, i = 1, 2, 3

From the results in Table 12.1, one may observe that with a fine grid (say, n = 100 or n = 200) one obtains very good upper bounds on sup_μ ⟨μ, f_i⟩ even with a small set H_k. For f_1(x) := x², the values show that the Dirac p.m. δ_{3/4} at the point x := 3/4 is optimal, and it is obtained exactly because 3/4 is a grid point when n = 100 or n = 200.
12.4 A Moment Approach for a Special Class of Markov Chains

In this section we consider a special class of MCs on X = ℝ^n whose t.p.f. P maps polynomials into polynomials. This class contains, for instance, the MCs associated with the deterministic systems (2.2.6) and the iterated function systems (2.2.7), for which the functions F in (2.2.6) and F_p in (2.2.7) are real-valued polynomials. We will see that in this case, for a given real-valued polynomial p: ℝ^n → ℝ, we can obtain upper and lower bounds on ⟨μ, p⟩ by solving semidefinite programs, that is, convex optimization problems on the set of positive semidefinite matrices, for which efficient solution procedures are available (see, e.g., Vandenberghe and Boyd [132]). The polynomial p plays the role of the function f in (12.2.2).
12.4.1 Upper and Lower Bounds

Let p: ℝ^n → ℝ be a given polynomial, and suppose that one wishes to determine upper and lower bounds on ⟨μ, p⟩ over the set P_inv(X) ⊂ P(X) of invariant p.m.'s for P that have all their moments finite; that is, μ ∈ P_inv(X) if

    μ(I − P) = 0;  μ(X) = 1;  μ ∈ M(X)⁺;  ⟨μ, x^α⟩ < ∞ ∀α ∈ ℕ^n,   (12.4.1)

where ℕ := {0, 1, ...}, and for α = (α_1, ..., α_n) ∈ ℕ^n the notation x^α stands for x_1^{α_1} ··· x_n^{α_n}. The integral ⟨μ, x^α⟩ := ∫ x^α dμ in (12.4.1) is called a moment of μ. Observe that when P maps polynomials into polynomials and μ satisfies (12.4.1), we have

    ⟨μ(I − P), x^α⟩ = ⟨μ, (I − P)x^α⟩  ∀α ∈ ℕ^n.   (12.4.2)

Moreover, (12.4.2) defines linear constraints on the moments

    y_α := ⟨μ, x^α⟩,  α ∈ ℕ^n.   (12.4.3)
Let

    1, x_1, ..., x_n, x_1², x_1x_2, ..., x_1^m, x_1^{m−1}x_2, ..., x_n^m   (12.4.4)

be a basis for the vector space A_m of real-valued polynomials of degree at most m; the dimension of this basis is denoted by s(m). Let |α| := Σ_i α_i. A real-valued polynomial p ∈ A_m is written as

    p(x) = Σ_{|α|≤m} p_α x^α,

with {p_α} ∈ ℝ^{s(m)} its vector of coefficients. Let p be a real polynomial of degree s, that is, p ∈ A_s. Consider the optimization problems

    P:  maximize {⟨μ, p⟩ | μ ∈ P_inv(X)},   (12.4.5)
    Q:  minimize {⟨μ, p⟩ | μ ∈ P_inv(X)},   (12.4.6)
and, for fixed m ∈ ℕ with 2m ≥ s, the corresponding approximations

    P_m:  maximize Σ_{|α|≤2m} p_α y_α
          s.t.  y_α = ⟨μ, x^α⟩ ∀|α| ≤ 2m,  μ ∈ P_inv(X),   (12.4.7)

and

    Q_m:  minimize Σ_{|α|≤2m} p_α y_α
          s.t.  y_α = ⟨μ, x^α⟩ ∀|α| ≤ 2m,  μ ∈ P_inv(X).   (12.4.8)

We immediately have

    inf Q_m ≤ inf Q_{m+1} ≤ inf Q  ∀m = 1, 2, ...,

and

    sup P_m ≥ sup P_{m+1} ≥ sup P  ∀m = 1, 2, ....

Hence

    inf Q_m ≤ inf Q ≤ sup P ≤ sup P_m  ∀m = 1, 2, ....   (12.4.9)
Therefore, sup P_m (resp. inf Q_m) provides monotonically improving upper (resp. lower) bounds for sup P (resp. inf Q) as m increases. However, the problems P_m and Q_m are not directly solvable because of the constraints

    y_α = ⟨μ, x^α⟩  ∀α ∈ ℕ^n with |α| ≤ 2m,  for some μ ∈ P_inv(X).   (12.4.10)

We next see how to approximate P_m and Q_m by two related problems with explicit conditions on y that are necessary for (12.4.10) to hold.
12.4.2 An Approximation Scheme

As P maps polynomials into polynomials, let d(β) denote the degree of the polynomial Px^β for β ∈ ℕ^n. Let y = {y_α} ∈ ℝ^{s(2m)}, so that |α| ≤ 2m, and let β ∈ ℕ^n with d(β) ≤ 2m. From (12.4.10) and (12.4.2),

    ⟨μ(I − P), x^β⟩ = 0  ⟹  A_β y = b_β,

for some scalar b_β and some row vector A_β ∈ ℝ^{s(2m)}. Therefore, the constraint μ ∈ P_inv(X) is replaced with the invariance condition

    A_β y = b_β  ∀β s.t. d(β) ≤ 2m
on the moments. It remains to give necessary conditions for y to be the vector of moments up to order 2m of some p.m. μ ∈ P(X). These conditions require the moment matrix defined below to be positive semidefinite.

Moment matrix. Given an s(2m)-sequence {y_α}, let M_m(y) be the moment matrix of dimension s(m), with rows and columns labelled by the basis (12.4.4). For instance, for illustration purposes and clarity of exposition, consider the 2-dimensional case. The moment matrix M_m(y) is the block matrix {M_{i,j}(y)}_{0≤i,j≤m} with

    M_{i,j}(y) =
        [ y_{i+j,0}    y_{i+j-1,1}  ...  y_{i,j}     ]
        [ y_{i+j-1,1}  y_{i+j-2,2}  ...  y_{i-1,j+1} ]
        [    ...           ...      ...     ...      ]
        [ y_{j,i}      y_{j-1,i+1}  ...  y_{0,i+j}   ]   (12.4.11)

To fix ideas, with n = 2 and m = 2 one obtains

    M_2(y) =
        [ 1      y_10   y_01   y_20   y_11   y_02 ]
        [ y_10   y_20   y_11   y_30   y_21   y_12 ]
        [ y_01   y_11   y_02   y_21   y_12   y_03 ]
        [ y_20   y_30   y_21   y_40   y_31   y_22 ]
        [ y_11   y_21   y_12   y_31   y_22   y_13 ]
        [ y_02   y_12   y_03   y_22   y_13   y_04 ]
Another, more intuitive, way of constructing M_m(y) is as follows: if M_m(y)(1, i) = y_α and M_m(y)(j, 1) = y_β, then

    M_m(y)(i, j) = y_{α+β},  with α + β = (α_1 + β_1, ..., α_n + β_n).   (12.4.12)
The moment matrix M_m(y) defines a bilinear form ⟨·,·⟩_y on A_m by

    ⟨q(x), v(x)⟩_y := ⟨q, M_m(y) v⟩,  q(x), v(x) ∈ A_m,

where the first scalar product ⟨·,·⟩_y is for polynomials in A_m, whereas the second is the usual scalar product of (coefficient) vectors in ℝ^{s(m)}. Thus, if y is a sequence of moments of some measure μ_y, then

    ⟨q, M_m(y) q⟩ = ∫ q(x)² μ_y(dx) ≥ 0,   (12.4.13)
so that M_m(y) is positive semidefinite, denoted M_m(y) ⪰ 0.

Finally, P_m is replaced with the semidefinite program

    P̃_m:  maximize Σ_{|α|≤2m} p_α y_α
          s.t.  A_β y = b_β ∀β s.t. d(β) ≤ 2m,
                M_m(y) ⪰ 0,   (12.4.14)

whereas Q_m is replaced with the semidefinite program

    Q̃_m:  minimize Σ_{|α|≤2m} p_α y_α
          s.t.  A_β y = b_β ∀β s.t. d(β) ≤ 2m,
                M_m(y) ⪰ 0.   (12.4.15)
The semidefinite program P̃_m (resp. Q̃_m) provides monotonically improving upper (resp. lower) bounds on P (resp. Q). Under some additional assumptions one may get sharp bounds.
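As an illustration, here is a minimal sketch of the semidefinite program (12.4.14) for the logistic map of Example 12.3.5, written in Python assuming the cvxpy package and an installed SDP solver. For Px^j = (4x(1−x))^j, the invariance constraints A_β y = b_β take the explicit form y_j = 4^j Σ_{i=0}^j (−1)^i C(j, i) y_{j+i}; the moment matrix is Hankel in this univariate case.

```python
import cvxpy as cp
from math import comb

# A sketch of (12.4.14) for the logistic map (an illustration, not the text's
# code): pseudo-moments y_0, ..., y_{2m}, invariance constraints for j <= m
# (so that d(beta) = 2j <= 2m), and the Hankel moment matrix M_m(y) PSD.
m = 3
y = cp.Variable(2 * m + 1)
cons = [y[0] == 1]
for j in range(1, m + 1):
    cons.append(y[j] == 4 ** j *
                sum((-1) ** i * comb(j, i) * y[j + i] for i in range(j + 1)))
M = cp.bmat([[y[i + j] for j in range(m + 1)] for i in range(m + 1)])
cons.append(M >> 0)                         # M_m(y) positive semidefinite

prob = cp.Problem(cp.Maximize(y[2]), cons)  # upper bound on sup <mu, x^2>
prob.solve()
print(prob.value)                           # about 0.5625 = <delta_{3/4}, x^2>
```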
12.4.3 Sharp Upper and Lower Bounds

In this section we prove that for a certain class of MCs with weak-Feller t.p.f. P, the upper and lower bounds provided by sup P̃_m and inf Q̃_m are "sharp" in the sense that

    inf Q̃_m ↑ inf Q  and  sup P̃_m ↓ sup P

as m → ∞. Let X = ℝ^n, and suppose that we are interested in the invariant p.m.'s μ for P that have all their moments finite. For instance, consider the function f: ℝ^n → ℝ⁺ defined by

    x ↦ f(x) := Σ_{i=1}^n e^{γ x_i²},  x ∈ ℝ^n,   (12.4.16)
for some γ > 0. If (12.3.6) holds for the function f in (12.4.16) and for some Lyapunov function V: X → ℝ⁺, then every invariant p.m. μ for P satisfies

    ∫ f dμ ≤ b   (12.4.17)

for some b > 0 (see Remark 12.3.4), and, therefore, every invariant p.m. for P has all its moments finite. Let μ be an arbitrary invariant p.m. for P, and let y_i^(2k)(μ) := ∫ x_i^{2k} dμ for all i = 1, ..., n and k = 1, 2, .... Then (12.4.17) imposes a restriction on the growth of the absolute moments of μ; in particular, (12.4.17) implies

    Σ_{k=1}^∞ [y_i^(2k)(μ)]^{−1/2k} = ∞   (12.4.18)

for all i = 1, ..., n, which is a multivariate analogue of Carleman's condition for univariate moment sequences (see Berg [9, p. 117]). We will use (12.4.18) in the proof of Theorem 12.4.1 below, because it implies, in particular, that the p.m. μ is completely determined by its moments (see, e.g., Feller [38, pp. 514–515] and Berg [9, p. 117]).
Let y_i^(2k) denote the variable y_α corresponding to the monomial x_i^{2k} in the basis (12.4.4). Then define the new versions of the semidefinite programs (12.4.14) and (12.4.15) as

    P′_m:  maximize Σ_{|α|≤2m} p_α y_α
           s.t.  A_β y = b_β ∀β s.t. d(β) ≤ 2m,
                 M_m(y) ⪰ 0,
                 Σ_{i=1}^n Σ_{k=1}^m (γ^k/k!) y_i^(2k) ≤ b,   (12.4.19)

and

    Q′_m:  minimize Σ_{|α|≤2m} p_α y_α
           s.t.  A_β y = b_β ∀β s.t. d(β) ≤ 2m,
                 M_m(y) ⪰ 0,
                 Σ_{i=1}^n Σ_{k=1}^m (γ^k/k!) y_i^(2k) ≤ b,   (12.4.20)

respectively.
Theorem 12.4.1. Let ξ• be a MC on X = ℝ^n with a weak-Feller t.p.f. P that maps polynomials into polynomials. Let p: X → ℝ be a real-valued polynomial of degree s. Assume that the Lyapunov condition (12.3.6) holds with f as in (12.4.16), and let P′_m and Q′_m be as in (12.4.19) and (12.4.20), respectively. Then:

(a) P′_m and Q′_m are solvable for all 2m ≥ s.
(b) P and Q are solvable, and min Q′_m ↑ min Q and max P′_m ↓ max P as m → ∞.

(c) If P admits a unique invariant p.m., then min Q = max P =: ρ*, and min Q′_m ↑ ρ* and max P′_m ↓ ρ* as m → ∞.
Proof. (a) As P is weak-Feller, from the Lyapunov condition (12.3.6) it follows that all the invariant p.m.'s for P satisfy (12.4.17) (see Remark 12.3.4). To prove that P′_m is feasible, take an arbitrary invariant p.m. μ and let y := {y_α(μ)} be the infinite vector of all its moments. From (12.4.17) we have

    Σ_{i=1}^n Σ_{k=0}^∞ (γ^k/k!) y_i^(2k)(μ) ≤ b.   (12.4.21)

Therefore, the truncated vector of moments up to order 2m is feasible for both P′_m and Q′_m. Next, let y ∈ ℝ^{s(2m)} be a feasible solution of P′_m. The set of feasible solutions of P′_m is compact because of the third constraint in (12.4.19), which implies that for every i = 1, ..., n

    y_i^(2k) ≤ b k!/γ^k  ∀k = 1, 2, ..., m.   (12.4.22)
From this fact, together with M_m(y) ⪰ 0 and the definition of M_m(y), it follows that the whole diagonal of M_m(y) is nonnegative and bounded. By M_m(y) ⪰ 0 again, we conclude that all the entries of M_m(y) are bounded, and thus the set of feasible solutions of P′_m is compact. Hence P′_m is solvable and, with similar arguments, so is Q′_m. This proves (a).

(b) For each 2m ≥ s, let y^m ∈ ℝ^{s(2m)} be an optimal solution of P′_m, and extend the vector y^m to an infinite vector y^m = {y_α^m} ∈ ℝ^∞ by completing with zeros. Consider the sequence {y^m} ⊂ ℝ^∞. In (12.4.22) we have already seen that all the entries of M_m(y^m) are bounded. Fix an arbitrary q ∈ ℕ. As M_q(y^m) is a submatrix of M_m(y^m) for all m ≥ q, it follows that the first s(2q) coordinates of y^m are bounded uniformly in m. Hence, by a standard diagonal argument, there is a subsequence {m_k} and a vector y* ∈ ℝ^∞ such that

    y_i^{m_k} → y_i*  ∀i = 1, 2, ....   (12.4.23)

Let y*(m) ∈ ℝ^{s(2m)} be the vector of the first s(2m) coordinates of y*. From (12.4.23) and M_m(y^m) ⪰ 0 it follows that

    M_m(y*(m)) ⪰ 0  ∀2m ≥ s.   (12.4.24)
Moreover, (12.4.22) implies that Carleman's condition (12.4.18) holds for y* ∈ ℝ^∞, which in turn implies that y* is the infinite vector of moments of a unique (or determinate) p.m. μ on ℝ^n (see Berg [9, Theor. 5, p. 117]). In addition, fix an arbitrary β ∈ ℕ^n. As y^{m_k} is an optimal solution of P′_{m_k}, it follows that A_β y^{m_k} = b_β for k sufficiently large. Hence, from (12.4.23) again, we obtain A_β y* = b_β, that is,

    ⟨μ(I − P), x^β⟩ = 0.   (12.4.25)

As β in (12.4.25) was arbitrary, it follows that μP and μ have the same moments. But from Carleman's condition (12.4.18), μ (and hence μP) is completely determined by its moments (see again Berg [9, p. 117]), and thus μP = μ, which proves that μ is an invariant p.m. for P. Finally, as p is a polynomial, by (12.4.23) we also have

    max P′_{m_k} = Σ_{|α|≤s} p_α y_α^{m_k} → Σ_{|α|≤s} p_α y_α* = ∫ p dμ = ⟨μ, p⟩,

and thus, as max P′_m ≥ sup P for all 2m ≥ s and μ is feasible for P, it follows that ⟨μ, p⟩ = sup P = max P. This proves that μ is an optimal solution of P and that max P′_m ↓ max P. The same arguments also apply to min Q′_m with obvious changes. Finally, (c) follows from (b) when μ is unique. □

Observe that the approximation scheme in this section requires
—the function f in (12.2.2) to be a polynomial, and
—the t.p.f. P to map polynomials into polynomials,
whereas the approximation scheme in §12.3 can handle functions f in a larger class, and P is only assumed to be weak-Feller. However, for convergence of the latter approximation scheme one must impose restrictions on f (e.g., inf-compactness in Proposition 12.3.3 for the minimization case). On the other hand, for the moment approach, and under the assumptions of Theorem 12.4.1, one obtains sharp upper and lower bounds.

Example 12.4.2. Consider the logistic map in Example 12.3.5. As X is the compact space [0, 1] ⊂ ℝ, the semidefinite program P′_m is the same as P̃_m in (12.4.14), i.e., we do not need to introduce the (additional) third constraint on the variables y_i^(2k) in (12.4.19). It suffices to impose that all the variables y_i are in [0, 1]. In fact, one can instead impose the additional constraints B_m(y) ⪰ 0 and M_{m−1}(y) − B_m(y) ⪰ 0, where the matrix B_m is deduced from the moment matrix M_m by
    B_m(y) :=
        [ y_1      y_2      ...  y_m      ]
        [ y_2      y_3      ...  y_{m+1}  ]
        [ ...      ...      ...  ...      ]
        [ y_m      y_{m+1}  ...  y_{2m-1} ]   (12.4.26)
Indeed, (12.4.26) will ensure that {y_1, ..., y_{2m−1}} are the moments of a p.m. supported in [0, 1] (see, e.g., Curto and Fialkow [27, p. 622]). To compare with the methodology of §12.3, we consider the same functions f_1(x) := x², f_2(x) := x⁸ and f_3(x) := (x − 0.2)(x − 0.3)(x − 0.4) for the criterion ∫ f dμ to maximize in P̃_m. We may solve P̃_m for several values of the parameter m (that is, the number of moments considered is 2m) and several values of the number k of moment constraints A_β y = b_β (or ⟨μ(I − P), x^j⟩ = 0, j = 1, ..., k). The results are displayed in Table 12.2.

    Parameters    f_1(x)   f_2(x)   f_3(x)
    k=4; m=4      0.5625   0.2613   0.10875
    k=4; m=5      0.5625   0.2613   0.10875
    k=5; m=5      0.5625   0.2612   0.10875

    Table 12.2: Upper bounds on sup_μ ⟨μ, f_i⟩, i = 1, 2, 3

We can observe that we obtain very good bounds with few moment constraints (only 4 or 5). The results are consistent with those obtained with the linear programs in Example 12.3.5; see Table 12.1. For the function f_1, the invariant p.m. that maximizes ⟨μ, f_1⟩ is the Dirac measure δ_{3/4} at the "fixed point" x = 3/4, and the exact value is obtained with k = 4 moment constraints, whereas this was not so for the linear programs even with n = 50 because the point x = 3/4 is not a grid point when n = 50 (see Table 12.1).
12.5 Notes

The results in §12.3 are derived from the more general framework in [63], which deals with general linear programs in infinite-dimensional spaces (see also Hernandez-Lerma and Lasserre [64, 62] for approximation schemes for controlled Markov processes in metric spaces). The results in §12.4 are new. On the other hand, the moment approach in §12.4 was first developed for certain types of continuous-time Markov control processes (controlled diffusions, exit time distribution, optimal stopping) on compact metric spaces by Helmes [49] and Helmes, Rohl and Stockbridge [50], but with different moment conditions. For instance, on X = [0, 1] they use the so-called Hausdorff moment conditions, which state that a sequence y = (y_0, y_1, ...) is a "moment sequence" of some measure on [0, 1] if and only if

    Σ_{j=0}^k (−1)^j C(k, j) y_{n+j} ≥ 0,  n, k = 0, 1, ...,   (12.5.1)

where C(k, j) denotes the binomial coefficient (see, e.g., Feller [38]), and analogous conditions for the multidimensional case. The resulting approximation scheme is a sequence of linear programs (LPs), whereas
in §12.4 we obtain a sequence of semidefinite programs. Both schemes have advantages and drawbacks. For instance, it is to be noted that the Hausdorff moment conditions (12.5.1) are numerically ill-conditioned because of the presence of the binomial coefficients in (12.5.1). Moreover, the Hausdorff moment conditions are valid only for measures on compact boxes [a, b]^n (with generalizations to convex polytopes), whereas the semidefinite constraints are valid for arbitrary measures. On the other hand, many LP software packages can handle very large LPs, which is not yet the case for semidefinite packages.
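As a small numerical check of (12.5.1) (a sketch in Python; the moment sequence used is illustrative), the moments y_n = 1/(n+1) of Lebesgue measure on [0, 1] satisfy all the Hausdorff conditions, since Σ_j (−1)^j C(k, j) y_{n+j} = ∫_0^1 x^n (1−x)^k dx > 0; the near-cancellation of the large alternating binomial terms also hints at the ill-conditioning noted above.

```python
from math import comb

y = [1.0 / (n + 1) for n in range(25)]   # moments of Lebesgue measure on [0,1]
ok = all(sum((-1) ** j * comb(k, j) * y[n + j] for j in range(k + 1)) >= 0.0
         for k in range(12) for n in range(12))
print(ok)   # True: (12.5.1) holds for this genuine moment sequence
```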
Bibliography

[1] E. Akin, The General Topology of Dynamical Systems, Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 1993.
[2] E.J. Anderson and P. Nash, Linear Programming in Infinite Dimensional Spaces, John Wiley & Sons, 1987.
[3] R. Ash, Real Analysis and Probability, Academic Press, San Diego, 1972.
[4] I. Assani and J. Wos, An equivalent measure for some nonsingular transformations and application, Studia Math. 97 (1990), 1-12.
[5] K. Baron and A. Lasota, Asymptotic properties of Markov operators defined by Volterra type integrals, Ann. Polon. Math. 58 (1993), 161-175.
[6] A. Barvinok, Convexity, Duality and Optimization, Lecture Notes, Department of Mathematics, University of Michigan, 1998.
[7] V.E. Beneš, Finite regular invariant measures for Feller processes, J. Appl. Prob. 5 (1967), 203-209.
[8] A. Ben-Israel, Linear equalities and inequalities on finite dimensional, real or complex vector spaces: a unified theory, J. Math. Anal. Appl. 27 (1969), 376-389.
[9] C. Berg, The multidimensional moment problem and semigroups, Proc. of Symp. in Appl. Math. 37 (1987), 110-124.
[10] A. Berman and A. Ben-Israel, More on linear inequalities with applications to matrix theory, J. Math. Anal. Appl. 33 (1971), 482-496.
[11] D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[12] D.P. Bertsekas and S.E. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.
[13] K.P.S. Bhaskara Rao and M. Bhaskara Rao, Theory of Charges: A Study of Finitely Additive Measures, Academic Press Inc., London, 1983. [14] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968. [15] A.A. Borovkov, Conditions for ergodicity of Markov chains which are not associated with Harris irreducibility, Siberian Math. J. 32 (1992), 543-554. [16] J. Borwein, Weak tangent cones and optimization in a Banach space, SIAM J. Contr. Optim. 16 (1978), 512-522. [17] H. Brezis, Analyse Fonctionnelle: Theorie et Applications, 4eme tirage, Masson, Paris, 1993. [18] A. Brunel, New conditions for existence of invariant measures in ergodic theory, Lecture Notes Math. 160 (1970), 7-17. [19] P.L. Butzer and U. Westphal, The mean ergodic theorem and saturation, Indiana Univ. Math. J. 20 (1971), 1163-1174. [20] R. Cavazos-Cadena, A note on the vanishing interest rate approach in average Markov decision chains with continuous and bounded costs, Syst. Control Lett. 24 (1995), 373-383. [21] R.V. Chacon, Identification of the limit of operator averages, J. Math. Mech. 11 (1962), 961-968. [22] K.S. Chan, A review of some limit theorems of Markov chains and their applications, in: H. Tong, ed., Dimension, Estimation and Models (World Scientific Pub., Singapore, 1993), pp. 108-135. [23] K.L. Chung, Markov Chains with Stationary Transition Probabilities, 2nd ed., Springer-Verlag, Berlin, 1967. [24] O.L.V. Costa and F. Dufour, Invariant probability measures for a class of Feller Markov chains, Stat. Prob. Lett. 50 (2000), 13-21. [25] B.D. Craven and J.J. Koliha, Generalizations of Farkas' theorem, SIAM J. Math. Anal. 8 (1977), 983-997. [26] B.D. Craven, Mathematical Programming and Control Theory, Chapman Hall, London, 1978. [27] R.E. Curto and L.A. Fialkow, Recursiveness, positivity, and truncated moment problems, Houston J. Math. 17 (1991), 603-635. [28] R.E. Curto and L.A. Fialkow, Flat extensions of positive moment matrices: recursively generated relations, Memoirs of the Amer. Math. Soc. 136, no. 648, November 1998.
[29] J. Diebolt and D. Guegan, Probabilistic properties of the general nonlinear Markovian process of order one and applications to time series modelling, Rapport Technique #125, LSTA, CNRS-URA 1321, Universite Paris VI, 1990. [30] J. Dieudonne, Sur la convergence des suites de measures de Radon, An. Acad. Brasil. Ci. 23 (1951), 21-38. [31] J.L. Doob, Measure Theory, Springer-Verlag, New York, 1994. [32] M. Duflo, Methodes Recursives Aleatoires, Masson, Paris, 1990. [33] R.M. Dudley, Real Analysis and Probability, Chapman & Hall, New York, 1989. [34] N. Dunford and J.T. Schwartz, Linear Operators, Part I, Wiley, New York, 1957. [35] E.B. Dynkin and A.A. Yushkevich, Controlled Markov Processes, SpringerVerlag, New York, 1979. [36] R. Emilion, Mean-bounded operators and mean ergodic theorems, J. Funct. Anal. 61 (1985), 1-14. [37] G. Emmanuele, Existence of solutions to a functional-integral equation in infinite dimensional Banach spaces, Czech. Math. J. 44 (1994), 603-609. [38] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed., John Wiley & Sons, 1966. [39] S.R. Foguel, The Ergodic Theory of Markov Processes, Van Nostrand, New York, 1969. [40] A.G. Gibson, A discrete Hille-Yosida-Phillips theorem, J. Math. Anal. Appl. 39 (1972), 761-770. [41] LI. Gihman and A.V. Skorohod, Controlled Stochastic Processes, SpringerVerlag, New York, 1979. [42] B.M. Glover, A generalized Farkas lemma with applications to quasidifferentiable programming, Zeit. Oper. Res. 26 (1982), 125-141. [43] B.M. Glover, Differentiable programming in Banach spaces, Optimization 14 (1983), 499-508. [44] J. Gonzalez-Hernandez and 0. Hernandez-Lerma, Envelopes of sets of measures, tightness, and Markov control processes, Appl. Math. Optim. 40 (1999), 377-392.
[45] P. Glynn, A GSMP formalism for discrete-event systems, Proc. of the IEEE 77 (1989), 14-23.
[46] P. Glynn and S.P. Meyn, A Lyapunov bound for solutions of Poisson's equation, Ann. Probab. 24 (1996), 916-931.
[47] S. Grigorescu, Ergodic decomposition for continuous Markov chains, Rev. Roum. Math. Pures Appl. XXI (1976), 683-698.
[48] A.B. Hajian and Y. Ito, Conservative positive contractions in L_1, Proc. 5th Berkeley Symp. on Math. Statist. and Prob., Vol. II, Part 2 (1967), 361-374.
[49] K. Helmes, Numerical comparison of controls and verification of optimality for stochastic control problems, J. Optim. Theor. Appl. 106 (2000), 107-127.
[50] K. Helmes, S. Rohl and R. Stockbridge, Computing moments of the exit time distribution for Markov processes by linear programming, J. Oper. Res. 49 (2001), No. 4.
[51] O. Hernandez-Lerma, Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
[52] O. Hernandez-Lerma and J. Gonzalez-Hernandez, Infinite linear programming and multichain Markov control processes in uncountable spaces, SIAM J. Control Optim. 36 (1998), 313-335.
[53] O. Hernandez-Lerma and J.B. Lasserre, Invariant probabilities for Feller-Markov chains, J. Appl. Math. and Stoch. Anal. 8 (1995), 341-345.
[54] O. Hernandez-Lerma and J.B. Lasserre, Existence of bounded invariant probability densities for Markov chains, Statist. Probab. Lett. 28 (1997), 359-366.
[55] O. Hernandez-Lerma and J.B. Lasserre, An extension of the Vitali-Hahn-Saks theorem, Proc. Amer. Math. Soc. 124 (1996), 3673-3676; correction ibid. 126 (1998), p. 849.
[56] O. Hernandez-Lerma and J.B. Lasserre, Cone-constrained linear equations in Banach spaces, J. Convex Anal. 4 (1996), 149-164.
[57] O. Hernandez-Lerma and J.B. Lasserre, Existence and uniqueness of fixed points for Markov operators and Markov processes, Proc. London Math. Soc. (3) 76 (1998), 711-736.
[58] O. Hernandez-Lerma and J.B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, 1996.
[59] O. Hernandez-Lerma and J.B. Lasserre, Ergodic theorems and ergodic decomposition for Markov chains, Acta Appl. Math. 54 (1998), 99-119.
[60] 0. Hernandez-Lerma and J.B. Lasserre, Existence of solutions to the Poisson equation in L p spaces, Proceedings of the 35th IEEE CDC conference, Kobe (Japan) (1996). [61] 0. Hernandez-Lerma and J.B. Lasserre, Policy iteration for average-cost Markov control processes on Borel spaces, Acta. Appl. Math. 47 (1997), 125-154. [62] 0. Hernandez-Lerma and J.B. Lasserre, Linear programming approximations for Markov control processes in metric spaces, Acta Appl. Math. 51 (1998), 123-139. [63] 0. Hernandez-Lerma and J.B. Lasserre, Approximation schemes for infinite linear programs, SIAM J. Optim. 8 (1998), 973-988. [64] 0. Hernandez-Lerma and J.B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999. [65] 0. Hernandez-Lerma and J.B. Lasserre, Further criteria of positive Harris recurrence for Markov chains, Proc. Amer. Math. Soc. 129 (2000), 15211524. [66] 0. Hernandez-Lerma and J.B. Lasserre, Fatou's and Lebesgue's convergence theorems for measures, J. Appl. Math. Stoch. Anal. 13 (2000), 137-146. [67] 0. Hernandez-Lerma and J.B. Lasserre, On the classification of Markov chains via occupation measures, Appl. Math. (Warsaw) 27 (2000), 489-498. [68] 0. Hernandez-Lerma and J.B. Lasserre, On the probabilistic multichain Poisson equation, Appl. Math. (Warsaw) 28 (2001), 225-243. [69] 0. Hernandez-Lerma and J.B. Lasserre, Order-bounded sequences of measures, internal report, LAAS-CNRS, 1996 (unpublished). [70] 0. Hernandez, R. Montes-de-Oca and R. Cavazos-Cadena, Recurrence conditions for Markov decision processes with Borel state space: A survey, Ann. Oper. Res. 28 (1991), 29-46. [71] 0. Hernandez-Lerma and R. Romera, Limiting discounted-cost control of partially observable stochastic systems, SIAM J. Control Optim. 40 (2001), 348-369. [72] R.A. Holmgren, A First Course in Discrete Dynamical Systems, 2nd ed., Springer, New York, 1996. [73] A. Hordijk and F. Spieksma, A new formula for the deviation matrix, in: F.P. Kelly, ed., Probability, Statistics and Optimization (Wiley, New York, 1994), pp. 497-507.
[74] M. Iosifescu, A basic tool in mathematical chaos theory: Doeblin and Fortet's ergodic theorem and Ionescu Tulcea and Marinescu's generalization, Contemp. Math. 149 (1993), 111-124.
[75] Y. Ito, Invariant measures for Markov processes, Trans. Amer. Math. Soc. 110 (1964), 152-184.
[76] R.P. Kanwal, Linear Integral Equations: Theory and Techniques, Academic Press, San Diego, 1971.
[77] S. Karlin, Positive operators, J. Math. Mech. 8 (1959), 907-937.
[78] N.V. Kartashov, Strong Stable Markov Chains, VSP, Utrecht, The Netherlands, 1996.
[79] J. Kemeny and L.J. Snell, Denumerable Markov Chains, Springer-Verlag, New York, 1966.
[80] T. Komorowski, Asymptotic periodicity of stochastically perturbed dynamical systems, Ann. Inst. H. Poincare 28 (1992), 165-178.
[81] S.G. Krein, Linear Equations in Banach Spaces, Birkhauser, Boston, 1982.
[82] U. Krengel, Ergodic Theorems, Walter de Gruyter, Berlin, 1985.
[83] N. Krylov and N. Bogolioubov, La theorie generale de la mesure dans son application a l'etude des systemes de la mecanique non lineaire, Ann. Math. 38 (1937), 65-113.
[84] H.J. Kushner, Stochastic Stability and Control, Academic Press, New York, 1967.
[85] A. Lasota and M.C. Mackey, Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed., Springer-Verlag, New York, 1994.
[86] A. Lasota and J.A. Yorke, Lower bound technique for Markov operators and iterated function systems, Random and Comput. Dynamics 2 (1994), 41-77.
[87] J.B. Lasserre, Existence and uniqueness of an invariant probability measure for a class of Feller-Markov chains, J. Theoret. Prob. 9 (1996), 595-612.
[88] J.B. Lasserre, Invariant probabilities for Markov chains on a metric space, Statist. Probab. Lett. 34 (1997), 259-265.
[89] J.B. Lasserre, A new Farkas lemma without a closure condition, SIAM J. Contr. Optim. 35 (1997), 265-272.
[90] J.B. Lasserre, A new Farkas lemma for positive semidefinite matrices, IEEE Trans. Aut. Contr. 40 (1995), 1131-1133.
[91] J.B. Lasserre, A theorem of the alternative in Banach Lattices, Proc. Amer. Math. Soc. 126 (1998), 189-194. [92] J.B. Lasserre, Weak convergences of probability measures: a uniform principle, Proc. Amer. Math. Soc. 126 (1998), 3089-3096. [93] J.B. Lasserre, Quasi-Feller Markov chains, J. Appl. Math. Stoch. Anal. 13 (2000), 15-24. [94] J.B. Lasserre and H.C. Tijms, Invariant probabilities with geometric tail, Prob. Eng. Inform. Sci. 10 (1996), 213-221. [95] M. Lin, On the uniform ergodic theorem, II, Proc. Amer. Math. Soc. 46 (1974), 217-225. [96] M. Lin, Quasi-compactness and uniform ergodicity of Markov operators, Ann. Inst. H. Poincare, Sect. B, 11 (1975), 345-354. [97] M. Lin and R. Sine, Ergodic theory and the functional equation (I —T)x = y, J. Operator Theory 10 (1983), 153-166. [98] J. Lindenstrauss and L. Tzafriri, Classical Banach Spaces I and II, SpringerVerlag, Berlin, 1996. [99] G. Lu and A. Mukherjea, Invariant measures and Markov chains with random transition probabilities, Tech. Rept., Dept. of Mathematics, University of South Florida, 1994. [100] Y. Lyubich and J. Zemanek, Precompactness in the uniform ergodic theory, Studia Math. 112 (1994), 89-97. [101] A.M. Makowski and A. Shwartz, On the Poisson equation for countable Markov chains: existence of solutions and parameter dependence by probabilistic methods, Preprint: Electr. Eng. Dept., University of Maryland, College Park, 1994. [102] M. Metivier and P. Priouret, Theoremes de convergence presque sure pour une classe d'algorithmes stochastiques a pas decroissant, Probab. Th. Rel. Fields 74 (1987), 403-428. [103] S.P. Meyn and R.L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993. [104] S.P. Meyn and R.L. Tweedie, The Doeblin decomposition, Contemporary Math. 149 (1993), 211-225. [105] J. Neveu, Existence of bounded invariant measures in ergodic theory, Proc. 5th Berkeley Symp. on Math. Stat. and Prob., Vol. II, Part 2, 1967, pp. 461-472.
[106] J. Neveu, Sur l'irreducibilite des chaines de Markov, Ann. Inst. Henri Poincare VIII (1972), 249-254. [107] J.R. Norris, Markov Chains, Cambridge University Press, Cambridge, 1997. [108] E. Nummelin, General Irreducible Markov Chains and Non-Negative Operators, Cambridge University Press, Cambridge, 1984. [109] T.V. Panchapagesan, Baire and a -Borel characterizations of weakly compact sets in M (T), Trans. Amer. Math. Soc. 350 (1998), 4839-4847. [110] M.D. Perlman, Jensen's inequality for a convex vector-valued function on an infinite-dimensional space, J. Multivar. Anal. 4 (1974), 52-65. [111] M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994. [112] D. Revuz, Markov Chains, revised ed., North Holland, Amsterdam, 1984. [113] H.L. Royden, Real Analysis, Macmillan, New York, 1968.
[114] H.L. Royden, Real Analysis, 3rd Edition, Macmillan, New York, 1988. [115] W. Rudin, Real and Complex Analysis, 3rd edition, McGraw-Hill, New York, 1986. [116] R. Serfozo, Convergence of Lebesgue integrals with varying measures, Sankya: The Indian J. of Statist. 44 (1982), 380-402. [117] S.-Y. Shaw, Ergodic projections on continuous and discrete semigroups, Proc. Amer. Math. Soc. 78 (1980), 69-76. [118] S.-Y. Shaw, Mean ergodic theorems and linear functional equations, J. Funct. Anal. 87 (1989), 428-441. [119] S.-Y. Shaw, Uniform convergence of ergodic limits and approximate solutions, Proc. Amer. Math. Soc. 114 (1992), 405-411. [120] S.-Y. Shaw, Convergence rates of ergodic limits and approximate solutions, J. Approx. Theory 75 (1993), 157-166. [121] B. Simon, The classical moment problem as a self-adjoint finite difference operator, Adv. Math. 137 (1998), 82-203.
[122] A.V. Skorokhod, Topologically recurrent Markov chains: Ergodic properties, Theory Prob. Appl. 31 (1986), 563-571.
[123] A.V. Skorokhod, Lectures on the Theory of Stochastic Processes, VSP, Utrecht, The Netherlands, 1996.
[124] J. Socala, On the existence of invariant densities for Markov operators, Ann. Polon. Math. 48 (1988), 51-56. [125] L. Stettner, On the Poisson equation and optimal stopping of Ergodic Markov processes, Stochastics 18 (1986), 25-48. [126] E. Straube, On the existence of invariant, absolutely continuous measures, Comm. Math. Phys. 81 (1981), 27-30. [127] R. Syski, Ergodic potential, Stoch. Proc. Appl. 7 (1978), 311-336. [128] W. Szczechla, On ergodic averages and absorbing sets for positive contractions in L 1 , J. Math. Anal. Appl. 194 (1995), 560-568. [129] H.M. Taylor, A Laurent series for the resolvent of a strongly continuous stochastic semigroup, Math. Programm. Study 6 (1976), 258-263 [130] R.L. Tweedie, Invariant measures for Markov chains with no irreducibility assumptions, J. Appl. Prob. 25A (1988), 275-285. [131] R.L. Tweedie, Drift conditions and invariant measures for Markov chains, Stoch. Proc. and Appl. 92 (2001), 345-354.
[132] L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review 38 (1996), 49-95.
[133] K. Yosida, Functional Analysis, 6th edition, Springer-Verlag, Berlin, 1980.
[134] K. Yosida and E. Hewitt, Finitely additive measures, Trans. Amer. Math. Soc. 72 (1952), 46-66.
[135] K. Yosida and S. Kakutani, Operator-theoretical treatment of Markoff processes and mean ergodic theorems, Ann. Math. 42 (1941), 188-228.
[136] A.A. Yushkevich, On a class of strategies in general Markov decision models, Theory Prob. Appl. 18 (1973), 777-779.
[137] A.A. Yushkevich, Blackwell optimal policies in Markov decision process with a Borel state space, Zeit. Oper. Res. 40 (1994), 253-288.
[138] R. Zaharopol, Attractive probability measures and their supports, submitted.
[139] C. Zalinescu, A generalization of Farkas lemma and applications to convex programming, J. Math. Anal. Appl. 66 (1978), 651-678.
[140] C. Zalinescu, Solvability results for sublinear functions and operators, Zeit. Oper. Res. 31 (1987), 79-101.
[141] X.-D. Zhang, On weak compactness in spaces of measures, J. Funct. Anal. 143 (1997), 1-9.
Index

The numbers in this index refer to sections.
absorbing set, 2.2, 2.3 additive-noise system, 2.2, 5.3 Akin, 5.6 Alaoglu Theorem, 1.3 aperiodic MC, 4.2 Ash, 1.4, 1.6
deterministic system, 2.2 Dieudonne, 1.5 Doeblin decomposition, 4.5 Doob, 1.3, 1.4, 1.6 dual ergodic theorem (DET), 2.3 dual pair of vector spaces, 1.3 Dunford, 1.4, 1.6
Banach-Alaoglu-Bourbaki theorem, see Alaoglu Theorem Berg, 12.4 Bertsekas, 1.4 Billingsley, 1.4, 12.2 Birkhoff IET, 2.3, 6.3 Bogolioubov, 5.6, 9.3 Borel space, 1.5 Borovkov, 5.2, 7.4
ergodic measure, 2.4 ergodic theorem Chacon-Ornstein, 2.3 dual, 2.3 individual, 2.3 mean, 2.3, 8.2 pathwise, 2.5 ergodicity property, 2.4 expected occupation measure, 2.3
canonical pair, 8.3 Carleman's condition, 12.4 Cesaro sum, 2.3 Chung, 3.1, 3.6 convergence of functions dominated, 1.5 monotone, 1.5 convergence of measures setwise, 1.3, 1.4 vague, 1.4 weak, 1.3, 1.4 countably generated σ-algebra, 4.1 Craven, 10.1, 10.6, 11.1
Fatou's lemma, 1.5 Generalized, 1.5 Feller, 12.4, 12.5 Foguel, 2.3, 2.6, 6.3 fundamental matrix, 8.3 generalized Farkas theorem, 10.6 geometric ergodicity, 4.3 Gihman, 2.2 Glynn, 7.1 Grigorescu, 7.2 harmonic function, see invariant function Harris decomposition, 4.5
Harris recurrence, 4.2 null, 4.2 positive, 4.2 Hausdorff moment condition, 12.5 Hernandez-Lerma, 4.6, 5.6, 6.4, 8.6, 12.5 Helmes, 12.5 Hewitt, 10.6 Holmgren, 6.3, 11.1 Hopf's decomposition, 2.3 individual ergodic theorem (IET), 2.3 inf-compact function, 1.4 vs. tightness, 1.4 invariant function, 4.2 invariant probability measure, 2.2 approximation of, 12.3, 12.4 strictly positive, 10.1 invariant set, 2.2, 2.3 Ionescu-Tulcea, 2.2 iterated function system, 2.2, 6.2 Kakutani, 8.2, 8.4 Kartashov, 9.1, 9.2, 9.4 Kemeny, 3.1, 3.6 Krengel, 2.3, 2.6 Koliha, 10.1, 10.6, 11.1 Krylov, 5.6 Lasota, 2.2, 6.2, 7.2 Lasserre, 1.5, 1.6, 4.6, 5.6, 6.4, 7.4, 8.6, 12.5 Laurent expansion, 8.6 logistic map, 6.2, 6.3, 11.1 Mackey, 2.2, 6.2, 7.2 Markov chain (MC), 2.2 countable state, 3.1 indecomposable, 3.2 irreducible, 3.2 φ-irreducible, 4.2 φ-essentially irreducible, 4.2 periodic, 4.2 recurrent, 4.2 Harris, 4.2 positive Harris, 4.2 quasi Feller, 7.3
strongly ergodic, 9.2 strong-Feller, 4.2, 7.2 strongly stable, 9.2 uniformly ergodic, 9.2 weakly, 9.3 weakly ergodic, 9.3 weak-Feller, 4.2, 7.2 mean ergodic theorem (MET), 2.3 Yosida and Kakutani, 8.4 measure, 1.2 ergodic, 2.4 finitely-additive, 1.2, 6.3, 10.6 signed, 1.3 support of, 5.3 total variation, 1.2 measure-preserving transformation, 2.3 Meyn, 2.6, 4.2, 4.3, 4.5, 4.6, 6.2, 9.1, 9.4 moment function, 1.4 vs tightness, 1.4 μ-invariant set, 2.4 Neveu, 4.2, 6.2, 6.4, 11.3 Norris, 3.1, 3.6 Nummelin, 4.2, 4.6, 9.1, 9.4 occupation measure expected, 2.3 pathwise, 2.3 operator drift, 11.3 contraction, 2.3 Frobenius-Perron, 2.3, 11.2 Koopman, 11.3 Markov, 2.3 positive, 2.3 power-bounded, 8.3 shift, 2.5 order-boundedness of measures, 1.5 order-lim inf, 10.2 order-lim sup, 10.2 Panchapagesan, 1.4 pathwise ergodic theorem, 2.5 Perlman, 11.3 petite set, 4.3
Poisson equation (P.E.), 8.2, 9.2 unichain, 8.2 multichain, 8.2 Polish space, 1.4 Portmanteau theorem, 1.4 positive Harris recurrence (P.H.R.), 4.2 characterization of, 4.2, 4.3 probability density, 2.3 invariant, 11.1 strictly positive, 11.2 Prohorov's theorem, 1.4 Rao, 6.3, 10.6 relative compactness, 1.4 resolvent, 8.2, 8.4 Revuz, 2.3, 2.6, 4.2, 4.6, 6.3, 6.4, 9.1, 9.2, 9.4, 10.6, 11.4 Royden, 1.6, 6.3 Rohl, 12.5 Schwartz, 1.4, 1.6 separable measurable space, 4.1 Serfozo, 1.6 Shreve, 1.4 Skorohod, 2.2, 10.4 small set, 4.2 Snell, 3.1, 3.6 stochastic kernel, 2.2 Stockbridge, 12.5 strong law of large numbers, 2.5 tightness, 1.4 total variation norm, 1.4 transient set, 4.5 uniformly, 4.5 transition probability function (t.p.f.), 2.2 mean ergodic, 8.4 Tweedie, 2.6, 4.2, 4.3, 4.5, 4.6, 6.2, 9.1, 9.4 Vitali-Hahn-Saks theorem, 1.3 w-geometric ergodicity, 4.3 weak convergence, 1.3 of measures, 1.3, 1.4 weak topology, 1.3
weak* topology, 1.3 Yosida, 2.6, 5.2, 5.3, 5.6, 8.2, 8.4 Yosida's ergodic decomposition, 5.3 Yosida and Kakutani mean ergodic theorem, 8.4 Yushkevich, 8.3 Zaharopol, 4.6 Zhang, 1.4