Probability Methods for Approximations in Stochastic Control and for Elliptic Equations Harold J. Kushner DEPARTMENTS OF APPLIED MATHEMATICS A N D ENGINEERING
BROWN UNIVERSITY PROVIDENCE, RHODE ISLAND
ACADEMIC PRESS New York San Francisco London A subsidiary of Harcourt Brace Jovanovich, Publishers
1977
BY ACADEMIC PRESS,INC. ALL RIGHTS RESERVED. N O PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
COPYRIGHT 0 1977,
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New' York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. 24/28 Oval Road. London NWl
Library of Congress Cataloging in Publication Data Kushner, Harold Joseph, Date Probability methods for approximations in stochastic control and for elliptic equations. (Mathematics in science and engineering ; v. 129) 1. Control theory. 2. Differential equations, 3. Approximation theory. 4. Probabilities. Elliptic. I. Title. 11. Series. QA402.3.K79 519.2 76-13942 ISBN 0-12-430140-1 AMS (MOS) 1970 Subject Classifications: 93E20,93E25, 65M10, 60B10, and 90C40 PRINTED IN THE UNITED STATES OF AMERICA
To Linda, Diana, and Nina
Contents
xi
Preface Acknowledgmtnb
xvii
2
1.1 The Wimer Ptoc:euc$ 1.2 Martinples
s 7
1.3 Markov ProccNCs 1.4 Stochastie lntqrals I.S Stoclla51ic Ditrerential Equations
9 17
Chapter 2. Weak C..vergence of Prollebility Measures 2.1 Probability Measures on the Real Line. Real-Valued Random Variables 2.2 ProbabiiHy Measures on Metric Spaces 2.3 The Spaces C"[o. ~) of Continuous Functions 2.4 The Space D"'(o. II] l.S Weak Convcreencc o n Other Spaces rii
2S 26
28
29 34
viii
CONTENTS
Chapter 3.
Markov Quoins •nd Control Problems wirh Markov Quoin Models
3. 1 Equations Satisfied by func1ionals ol M arkov Cha1ns
l.2 J.J 3.4 J .S J .6 3.1 3.8 39
O.,.imal Stopp•n& Problems Con~ro iJcd M arkov Chains: families of Controlled StratcJICS Op4imal Conuol until a Boundary Is Reached Optimal Oi~counted Co~ t Opcimal Stopping and Conuol lmpubive Control Systems Contro l over a Fixed T1mc: Interval Linear Programmina Formulation of the Markov Chain Control Problems
Chapter 4.
37
41 44 46 48 49 SO
52 53
EUipcic and P•rabolic Eqll81iOM •nd FWICtionals of Oltruslons
4..1 4..2 0
As.sumptioru and Uniqueness Results: No Control funcuon,als ol Uncontrolled OitJusions Partial 01.1fem'1tial Equations Associated with Functionals of Diffusions. II(· ) UniJormly Posilive Orinite
4.4
•( · )
o.,.,.....
58 S9 60
63
4.5 Partial Olfferenlial Equations formally Satis:ficd by Path Func:tionals
66
4,6 The Charactcristk: O perator o( 1he Diffusion 4,7 Opclmal Control Problems and Nonlinear Partial Oifferent i~t~l Equactons
69
Chapter 5. S.l S2
68
A S imple Application of rhe lnvarillnce Theorems
A Functional Limil Theorem An Application to Numerical AnaJys:is
19 8t
Chapter 6. Elliptic Eqll8tions •Del Uncontrolled Oitruslons 6. l Problem fotmulacioo 6.2 The F1n1tc Odl'erencc Mechod and an Appro1imatina Markov Chain 6.1 Converacocc of the Approximations to a Diffusion Proeeu 6.4 Converacncc o( the Cost FunctionaJs R"( ·) 6.5 The Discounted Cost Problem 6.6 An Allcmalive Reprcsentarion ror If.. and W( ·) 6. 7 Mo nte Carlo 6.S Approximacion or Invariant Measures 6.9 Remarks and E:.;tensions 6. lO Numerical Data
89
91 % 100 102 t04
107 108 lt Z 113
CONTeNTS
Chapter 7. 1.1 1.2 7.3 7.4 1.5 7.6
8.1 8.2 8.3 8.4 8.S 8,6
Approximations for Parabolic Equations and Nonlinear Filtering Problems
Problem Statement The Finite Difference Approximation a nd Weak Convergence Implicit Approximations Discounted Cost : Explicit Method Nonlinea r Filtering Numerical Data: Estimation of an lm•ariant Measure
Chapter 8.
ix
117 117 124 127 128 134
Optima l Stopping and Impulsive Control Problems
Discretization of the Optimal Stopping Problem Optimality or the Limiting Stopping Time p
138
Constrained Optimal Stopping Problems Discretization or tht. Impulsive Control Problem Optimality of the limits {p, , D1} and R(x. {p,. v1}) Numerical Data for the Optimal Stopping Problem
154 158 163 166
140
Chapter 9. Approximations to Optimal Controls and Nonlinear Partial Dill'erential Equations 9.1
Optimal Slopping and Continuously Acting Control: Formulatio n and Approximations 9.2 The L imit ls a Controlled. Stopped OiA"us)on 9.3 Optimality of the Limit 9.4 Discounted Cost
9.S 9.6 9.1 9.8
Control until a Boundary Is Reached: Discounted Case Control u ntil a Boundary Is Reached: No Discounting The l mpulsh·e Control Problem Numerical Results
171
JBJ 184 192 193
200 205
206
Chapter 10. Approximations to Stochastic Delay Equations and to Dill'usions and Partial Dlft'erential Equations with Reflecting Boundaries IO.J Approximations to Stochasttc Differential Delay Equations, Introduction 10.2 Approximations to Elliptic and Parabolic Equations with Neuma nn Bound ary Conditions. Formulation 10.3 The Finite Difference Approximations 10.4 Co ntinuous Time Interpolations and Convergence 10.5 Extensions of the Reflectio n Problem
2 11 214 219 221 223
CONTENT S
X
Chapter II. The Separation Theorem of Optimal Stodlastic Control Theory I I. I 11.2
Assumptions and the Sys.tem Model The Separa1ion and Optimality Theo rems
227 231
References
237
Index of Selected Symbols
240
Index
241
Preface
This book deals with a number of problems concerning approximations, convergence, and numerical methods for stochastic control problems, and also for degenerate elliptic and parabolic equations. The techniques that are developed seem to have a broader applicability in stochastic control theory. In order to illustrate this, in Chapter 11 we give a rather natural approach to the formulation and proof of the separation theorem of stochastic control theory, which is more general than the current approaches in several respects. The ideas of the book concern a number of interesting techniques for approximating (cost or performance) functionals of diffusions and optimally controlled diffusions and for approximating the actual diffusion process, defined by stochastic differential equations of the It6 type, both controlled and uncontrolled. Since many of the functionals that we seek to compute or approximate are actually weak solutions of the partial differential equations (i.e., the weak solution can be represented as a functional of an associated diffusion), the techniques for approximating the weak solutions are closely related to the techniques for approximating the diffusions and their functionals. Also, the form of the partial differential equation that is xi
xii
PREFACE
(at least formally) satisfied by a functional of interest actually suggests numerical methods for the probabilistic or control problem. We develop numerical methods for optimal stochastic control theory and prove the required convergence theorems. Neither for this nor for any of the other problems d o we require that the cost or optimal cost functions be smooth or satisfy any particular partial differential equation in any particular sense. Nor d o we require, a priori, that the optimal control exist. Existence is a by-product of our method. The numerical techniques are intuitively reasonable, admit of many variations and extensions, and seem to yield good numerical results. The main mathematical techniques are those related to the use of results in the theory of weak convergence of a sequence of probability measures. The technique seems to provide a point of view that not only suggests numerical methods but also unites diverse problems in approximation theory and in stochastic control theory. The ideas of weak convergence theory are being used more and more frequently in various areas of applications. But this book and previous papers by the author and some of his students seem to be the only currently available works dealing with applications to stochastic control theory or to numerical analysis. The proofs are purely probabilistic. Even when dealing with numerical methods for partial differential equations, we make no explicit smoothness assumptions and use only probabilistic methods and assumptions. Chapter 1 discusses some of the necessary probabilistic background, including such topics as the Wiener process, Markov processes, martingales, stochastic integrals, ItSs lemma, and stochastic differential equations. It is assumed, however, that the reader has some familiarity with the measure theoretic foundations of probability. In Chapter 2, we describe the basic ideas and results in weak convergence theory, at least insofar as they are needed in the rest of the book. The computational methods of the book are all equivalent to methods for computing functionals of finite Markov chains or for computing optimal control policies for control problems with Markov chain models. Many efficient computational techniques are available for these problems. In particular, the functionals for the uncontrolled Markov chains are all solutions to finite linear algebraic equations. The Markov chain can arise roughly as follows. We start with the partial differential equation that, at least formally, is satisfied by a functional of the diffusion, and we apply a particular finite difference approximation to it. If the approximation is chosen carefully (but in a rather natural way), then the finite difference equation is actually the equation that is satisfied by a functional of a particular Markov chain, and we can immediately get the transition probabilities for the chain from the coefficients in the finite difference
PREFACE
xiii
equation. The local properties of this chain are very close to the local properties of the diffusion, in the sense that there is a natural time scaling with which we interpolate the chain into a continuous parameter process, and the local properties of the interpolation and diffusion are close in certain important respects. Also, the functional of the Markov chain, which is the solution to the approximating equation, is similar in form to a Riemann sum” approximation to the original functional of the diffusion. At this point, the theory of weak convergence comes in, and we show that the functional of the chain does indeed converge to the desired functional of the diffusion as the difference intervals go to zero. Similarly, the approximation to the weak sense solution to the partial differential equation converges to the weak sense solution. The interpolation of the chain also converges (in a suitable sense) to a solution to the stochastic differential equation. Of course, the finite difference algorithm is classical. But neither the convergence proofs nor the conditions for convergence are classical. Also, the method can handle a much broader class of functionals than those that may possibly solve some partial differential equation. It is not necessary that we use finite difference methods; their use does, however, yield an automatic way of generating a family of approximating chains, whether or not the functional is smooth. However, many types of approximations are usable, provided only that they yield the correct limiting properties. Indeed, this versatility is one of the strong points of the approach. Approximating with Markov chains (whether or not we use classical finite difference techniques) allows us to use our physical intuition--to guide us in the choice of a chain, or in the selection of a computational procedure for solving the equation for the functional of the chain. Our sense of the “dynamics” of the process plays a useful role and can assist us in the selection of procedures that converge faster. In the case of the optimal control problem, we start by approximating the nonlinear (Bellman) partial differential equation, which is formally satisfied by the minimal cost function. With a suitable choice of the approximation, the discrete equations are just the dynamic programmihg equations for the minimal cost function for the optimal control of a certain Markov chain. Again, there are many types of useful approximating chains. This nonlinear partial differential equation, or optimal control, case is much more difficult than the uncontrolled or linear partial differential equation case. However, the ideas of weak convergence theory again play a very useful role. Under broad conditions, we can show that the sequence of optimal costs for the controlled chain converge to the optimal cost for the controlled diffusion. Indeed, it can even be shown that the (suitably interpolated) chains converge, in a particular sense, to an optimally controlled diffusion. “
xiv
PREFACE
In Chapter 3, we give the required background concerning the equations satisfied by various functionals of Markov chains, both controlled and uncontrolled. Our method is able to treat optimal control problems with various types of state space constraints. However, this often requires a linear programming (rather than a dynamic programming) formulation, and this is also discussed in Chapter 3. Chapter 4 discusses the relations between diffusion processes and elliptic and parabolic partial differential equations, both nondegenerate and degenerate and linear and nonlinear. Proofs are not given. The representation of the solutions of the linear equations in terms of path functionals of the diffusion is discussed, as well as the relation between certain nonlinear equations and optimal stochastic control problems. Chapter 5 is an introduction to the techniques and results of the sequel. In order to illustrate some of the simpler ideas, the techniques of weak convergence theory are applied to a simple two-point boundary value problem for a second order differential equation. In Chapter 6, we begin the systematic exploitation and development of the ideas. The motivation for the types of approximations is given, and the approximation of a variety of functionals of uncontrolled diffusion and linear elliptic equations is treated. We also show how to approximate an invariant measure of the diffusion by an invariant measure of an approximating chain, discuss the use of the approximations for Monte Carlo, and give some numerical data. The approximations that are explicitly discussed are derived by starting with finite difference techniques; all of them yield Markov chain approximations to the diffusion. However, it should be clear from the development that many other methods of approximation can be handled by the same basic techniques. The general approach taken here should motivate and suggest other methods with perhaps preferable properties for specific problems. Chapter 7 deals with the parabolic equation and with the probabilistic approach to approximation and convergence for explicit and implicit (and combined) methods. Furthermore, approximations to a (currently much studied) class of nonlinear filtering problems are discussed. Some numerical data concerning approximations to an invariant measure are given. In Chapter 8, we begin the study of nonlinear partial differential equations and approximations to optimal control problems, in particular to the optimal stopping and impulsive control problems. The discretizations of the optimization problems for the diffusion yield similar optimization problems on the approximating Markov chains. We are able to prove that the approximations to the optimal processes and cost functions actually converge to the optimal processes and cost functions, respectively. The study of nonlinear partial differential equations and optimal control problems continues in
PREFACE
xu
Chapter 9, where a variety of approximations and control problems are discussed. In order to show that the limiting cost functionals are truly minimal (over some specified class of control policies) and that the limiting processes have the probabilistic properties of the optimally controlled diffusion, a number of techniques are developed for approximating arbitrary controls and for proving admissibility or existence. It is expected that many aspects of the general approach will be quite useful in other areas of stochastic control theory. Additional numerical data appear in Chapters 8 and 9. Again, it must be emphasized that much more work needs to be done (investigating various types of approximations) in order to understand fully which types of approximations are preferable and why. In Chapter 10, we treat two types of extensions of the ideas in Chapters 6 and 7. First, approximations to stochastic differential difference equations and to path functionals of such processes are developed. Then, we discuss the problem of diffusions that are reflected from a boundary and the corresponding partial differential equations with mixed Neumann and Dirichlet boundary conditions. It is hoped that the book will help open wider the door to an interesting direction of research in stochastic control theory. Similar techniques can be applied to the problem where the stochastic differential equation has a “jump term” and the partial differential equations are replaced by partial differential integral equations. See, e.g., [K12] or the forthcoming work of Kushner and DiMasi [K16]. Numbering. Example. Theorem 3 of Section 2 of Chapter 5 will always be referred to as Theorem 5.2.3. Equation (3) of Section 2 of Chapter 5 will be referred to as Eq. (2.3) in Chapter 5 and as Eq. (5.2.3) in other chapters.
Acknowledgments
A number of people have helped the author with various parts of the manuscript. He would especially like to thank Giovanni DiMasi for his reading of some of the more critical parts, and Sandra Spinacci and Katrina Avery for their excellent typing of the several drafts. It is also a pleasure to acknowledge the support of many research projects, over several years, by the Office of .Naval Research, the Air Force Office of Scientific Research, and the National Science Foundation, which led directly to the ideas published in this book.
xuii
CHAPTER 1
Probability Background
This chapter contains a brief survey of several of the more important concepts and results in probability theory that will be used in the rest of the book. Most of the concepts are by now fairly standard in work in stochastic control theory and in many other areas of applications, for they are basic in the construction and study of continuous parameter models. But it is worthwhile to include them here in the interest of self-containment and in order to introduce them in the form in which they will be used. The chapter is not intended to be an introduction to probability or stochastic process theory. It is generally assumed in this book that the reader has some familiarity with probability theory, especially with the concepts of Markov chains, stochastic convergence, conditional expectation, and the separability and measurability of processes. The requisite background is contained in many excellent books, e.g., Breiman [B4], Gikhman and Skorokhod [G2], Neveu “11, or Wong [W2] (the first sixty or so pages in the latter reference). Section 1.1 deals with the Wiener process, which plays a fundamental role in the study of diffusion Markov processes and continuous martingales. I
2
1 PROBABILITY BACKGROUND
Section 1.2 gives a few results on martingales. Section 1.3 deals with Markov processes and Section 1.4 with the basic definitions and properties of stochastic integrals (integrals with respect to a martingale process). Finally, in Section 1.5, the topic of stochastic differential equations is developed. 1.1 The Wiener Processes
1.1.1 Definitions Let (Q 9, P) denote a probability space; i.e., the set R is the space of “elementary” outcomes, 9 is a a-algebra of subsets of R, and P is a probability measure on 9. Let {w(t), co > t 2 0} denote a family of real-valued random variables defined on the space. Generally, the generic variable o of the set Q will not be explicitly written. However, when there is a greater possibility of ambiguity, the o will be included. Also, w(w,t) will denote the value of the random variable w(t) at o,and w(o,.) will denote the path corresponding to o.The entire process {w(t), co > t 2 0} will be denoted by w(.). The process w(- ) is said to be a standard Wiener process ifit satisjes (a) and (b): (a) it is a separable process?; (b) it has independent, stationary, and normally distributed increments with zero mean value and w(0) = 0. By (b) we mean: for each integer n and set of real numbers 0 I t, I t2 I ... I t,, the random variables { ~ ( t ~-+w(ti), ~ ) 1 I i < n}
are mutually independent with zero mean and w ( t i +,) - w(ti)has a normal distribution with variance t i + - t i and w ( 0 ) = 0. The term “standard” will not be used. A “nonstandard” Wiener process satisfies (a) and (b), except that the variance can depend on time. Any process satisfying properties (a) and (b) has continuous paths w.p. 1. Let g ( R )and 9 ( R + )denote the Bore1 fields over the real line R and over the positive part of the real line R + = [0, m), resp. By virtue of the continuity w.p. 1, there is a measuruble version of w( ) in the sense that there is a version for which, for each set A E B ( R ) ,we have
-
{o, t : w(w, t ) E A } E 9 x B ( R + ) .
If we are interested only in a Wiener process or in functionals of a Wiener process, then the sample space R can be chosen to be C[O, co), the space of
t All processes used in the book will be assumed to be separable.
3
1.1 THE WIENER PROCESSES
continuous functions on [0, co). Then each w corresponds to a continuous function on [0, co) and conversely, and B is the smallest a-algebra on Q which contains all the open sets in the topology of uniform convergence on finite intervals. A vector-valued Wiener process is defined to be a vector of mutually independent real-valued Wiener processes. There are several equivalent definitions of the Wiener process. One (in terms of martingale concepts) that is particularly convenient in applications will be given in Section 4.3. The normality of the distribution of the increments is implied by the independence and stationarity of the increments if we add the condition: for each t 2 0 and E > 0, lim
I w(t + s) - w ( t )I 2 4 - 0 S
S l O
(see Breiman [B4, Chapter 121). If Y ( t ) ,t 2 0 is a sequence of random variables on (a,&,P) and for each t 2 0, { Y(s),s I t} is independent of W ( U ) - w(t), u 2 t, we say that Y ( - )is nonanticipatiue with respect to w( ), or simply nonanticipatiue if the Wiener process is obvious. Suppose that Bf is the minimal a-algebra over which w(s), s I t, is measurable, and B = Bf,and that P is the measure induced on B by w( .). Then (Q i3, P) may be called a Wiener process space.
-
ut,
1.1.2 Stopping Times
Let (Q B, P) denote a probability space, and let Bf and Y(t), DC, > t 2 0, denote a nondecreasing sequence of sub-a-algebras of B (Bf+s 2 B f ,s 2 0) and a sequence of random variables defined on the probability space. We say and process Y( - ) are adapted [or that Y( is adapted that the sequence (af} to {B,}]if Y ( t ) is Bf measurable for each t 2 0. Bf may be larger than the minimal a-algebra over which { Y(s), s I t} is measurable. Let { B f }denote a nondecreasing sequence of sub-a-algebras of B. The nonnegative real-valued function z is called a stopping time with respect to { B f }if it is defined on a set of positive probability and {z I t} E B ffor each t. (As a convention, if such a random variable z is not defined at some o,we set it equal to co there. Later, we will deal with stopping times which may actually take infinite values.) There is a natural set ofevents Bt c i3 which is associated with a stopping time T. By definition, a set A is in Br if and only if a )
A n {t I t } E Bf
4
1 PROBABILITY B A C K G R O U N D
for each t 2 0. The events in gr“occur” prior to (or at) time z. Indeed, gris a a-algebra. If Bt is the minimal o-algebra over which a process Y(s), s < t, is measurable, then (loosely speaking) whether or not any particular subset of R is in Br can be determined by simply observing { Y(s),s 5 z}. Stopping times arise frequently in stochastic process theory. They are often the moments at which specific events occur. Often we are concerned with studying the quantities Y(z), or the process (or some functional of it) Y( - ) only up until some event has occurred. In addition, the stopping times are frequently introduced as purely (and useful) technical devices to be used in the course of a proof. Suppose that the process Y ( * )has right continuous paths w.p. 1 and is adapted to {at}, and z is a stopping time with respect to {gt}. Then Y ( z )is a random variable and is Br measurable (Neveu “1, Chapter 111.61). Examples of Stopping Times
Let Y ( - )take values in a Euclidean space M and have right continuous paths w.p. 1, and let 99,denote the smallest a-algebra over which Y(s),s 5 t, is measurable.? Let A denote a Bore1 set in M, and y[s, t] the range of Y(u), s I u I t, and define the function of w t=
inf{t : Y ( t )E A} = sup{t : y[O, t ] E M - A }
(the first entrance time into A ; if Y ( t )# A, all t < 00, define the inf{ } to be co). Then (Dynkin [D4, Chapter IV.11) z is a stopping time if A is closed and either A or M - A is compact, or if A is open and M - A is compact.
1.1.3 Properties of the Wiener Process Owing to its fundamental role in the construction and study of Markov processes with continuous paths, the Wiener process and several of its properties will be used frequently in the sequel. (a) (Breiman [B4, Chapter 121) Almost every path is nowhere differentiable. (b) For each T > 0, let T. denote an arbitrary partition of the interval [0, T ] of the form Ti = {tl, i 5 n : 0 = t; ... 5 t: = T } . Define I T, I = sup, 1 tl+ I - tl I. Then we have the “quadratic” variation formula (Doob [D2, Chapter 81) lim
lTnl 1 0
C I w(tl+ i
1)
1’
- ~ ( t j ” )=
T
W.P. 1,
t We will often use the notation &( Y(s),s It) for such a u-algebra.
(1.1)
1.2
5
MARTINGALES
,
provided that T,, includes the points { t l } of T,. In any case (1.1) holds in mean square. Both (a) and (b) suggest that the path is rather wild locally. This is corroborated by the law of the iterated logarithm (c) (Breiman [B4, Chapter 121): (c)
lim
[ l o (2t
w(t)
log log( l/t))1’2
=
1
w.p. 1.
Property (c) implies that [recall that w(0) = 01 in each interval [0, s], s > 0, w(.)takes positive and negative values infinitely often. It is also true that the location of a zero of w( .) is the limit from the right of an infinite sequence of zeroes and, generally, if t is such a zero, then the shifted function w,( defined by w,(t) = w(t + t) - w ( t ) , takes positive and negative values each infinitely often in any interval [0, s], s > 0. To see this, let to denote any positive number and define a),
t=
inf{t 2 to : w ( t ) = 0},
Bt = @(w(s), s I t).
Then t is a stopping time relative to {a[}, and it can be shown (Breiman [B4, Chapter 121) that wr(.) is a standard Wiener process. The assertion follows, since (c) holds for wr(*).
1.2 Martingales 1.2.1 Definition and Martingale Inequalities Let (Q B, P) be a probability space and {at} a nondecreasing family of sub-o-algebras of %?.For any o-algebra s? c %? and vector-valued random variable Y, we use either notation E , Y or E[ Y Is?] for the conditional expectation of Y given s? (assuming that it exists). Suppose that the realvalued process Y ( . ) is adapted to {%?,}.Then { Y ( t ) , %?[, co > t 2 0} is a martingale, if for each real t 2 0, s 2 0,
E 1 Y ( t )I < 00,
+ I.%?[] = Y ( t )
(2.11 We may simply say that Y ( .) is a martingale if the {Bt}sequence is obvious. Martingale processes play a very basic role in the subjects dealt with in this book, but will not be used in any particularly deep way. They will be used mainly to obtain upper bounds for certain probabilities, and to show that some of our derived processes are related in a specific way to a Wiener process and, ultimately, to a stochastic differential equation. We will merely state some of the properties which will be used. E[I’(t
S)
W.P. 1.
1
6
PROBABILITY BACKGROUND
For any T 2 0, E 2 0 (Doob [D2, Chapter 71)
E sup
TtitO
I Y(t)I2 1 4 E I Y(T)I2
Both (2.2) and (2.3) hold on [0, co) if E l Y(T)I2 is replaced by limT-,mE l Y(T)12. A martingale {Y(t), a,,co > t 2 0} is said to be square integrable if E I Y(t) 1’ < co for all c 2 0. If the martingale is square integrable and continuous, then there is a unique continuous nondecreasing process A( - ) with A(0) = 0 so that (Wong [W2, p. 1661; Kunita and Watanabe [K14]; Meyer [M3, Chapter 71) if we define M ( t ) = Y2(t) - A ( t ) then {M(t),a,, 00 > t 2 O }
is a martingale. The function A( arbitrary T > 0, let Then
a )
may be written as (Y), with values ( Y), . For an
T. denote a partition of the interval [0, TI, where T. = { t l :0 = t; I t; I ... I t i = T}. n- 1
lim
lTnl+O
1 I ~ ( t l +-~ )Y(t!)I2
i=l
(2.4)
= A(T),
where the limit is in probability. The function A ( - ) is constant over an interval only if the martingale is constant over that interval. If {Y(t), B,, co > t 2 0} is a vector-valued martingale (i.e., each component of Y ( . ) is a martingale with respect to {Br}), which is continuous and square integrable, then there is a unique continuous nonnegative definite matrix-valued process A( so that A(0) = 0, A( * ) is nondecreasing (in the sense of nonnegative definite matrices) and { M ( t ) , B r , co > t 2 0) is a matrix-valued martingale, where a),
M ( t ) = Y(t)Y(t) - A(t).
x,
We may use the symbols ( Y) and ( Y), for A( - ) and A(t), resp. Let ( rj) denote the i, jth component of the matrix (Y). Then x ( t ) q ( t ) q), is a martingale with respect to {B,}. Also, (2.4) holds if the square 1 Y(t:+l) - Y(t:)l’ is replaced by the outer product
(x,
( Y(tl+ 1 ) - Y(tl))( Y(tl+ 1 ) - Wl))’ (Kunita and Watanabe [K14, Section 13). A ( . ) is known as the quadratic uariatfon of Y(-), owing to the property (2.4). If E{[Y,(t + s) - x ( t ) ] x [ T ( t+ s) - $(t)]la,} = 0 w.p. 1, for all s 2 0, t 2 0, then 5) = 0.
(x,
1.3
7
MARKOV PROCESSES
If, for each t 2 0, Bt I> B ( w ( s ) ,s It) and { ~ ( u-) w(t), u 2 t} is independent of B t ,then { w ( t ) , B t , co > t > 0) is a martingale, and (2.4) is a generalization of (1.1). Indeed, if { Y ( t ) ,Bf, co > t 2 0) is a continuous vector-valued martingale with quadratic variation A ( t ) = t I , where I is the identity matrix, then Y ( - )is a Wiener process. This important property will be proved in Section 1.4.3, which also contains additional material on martingales. 1.3 Markov Processes 1.3.1 Definition. Homogeneous Markov Process Suppose that { X ( t ) , co > t 2 0} is a sequence of vector-valued random variables which is defined on some probability space. Roughly speaking, the process is a Markov process if, knowing the present state, then knowledge of how the path evolved up to its present value yields no additional information concerning the future evolution of the path. Intuitively, for any set l- in the state space of the process we would have
qx(t + s) E r 1 x(s),s I t>= p { x ( t + s) E r I x ( t ) }
W.P. 1.
The following more exact definition is specialized to our needs. A less restrictive definition and a rather thorough study of Markov processes appears in Dynkin [D4]. Let (Q @) denote a measure space and S a Borel.set in some Euclidean space. The set S will be the range (or state space) of the process, and @ ( S ) will denote the Bore1 field over S. Let P,{ *}, x E S, denote a family of probability measures on (Q B), let {Bf , co > f 2 0} denote a family of nondecreasing sub-a-algebras of %?, and let X ( . ) be an S-valued random process on (Q @, Px), for each x E S. Let P(-,-,- ) be a function from S x [0, m) x B(S) to [0, 13, such that P(*,t, r) is B(S) measurable for each t 2 0 and r E B(S), P(x, t, - ) is a probability measure on B(S) for each t 2 0 and x E S, and P(x, 0, r)= I,.(x), where I,( .) is the indicator function of the set r [ I r ( x )= 1 if x E r and = 0 otherwise]. The collection (Q B, B t ,P,, X ( . ) , x E S) [which we often refer to merely as X ( .)I is a homogeneous Markov process, with transition function P( -,., *), if X ( t ) is Bt measurable for each t 2 0 and (3.1) holds for each x E S, t 2 0 and r E B(S).
P , { x ( ~+ s) E r I BJ = ~ ( x ( ts,)r) ,
W.P.
1.
(3.1) Equation (3.1) is called the Markov property. It is a precise statement of the intuitive notion of the first paragraph, which said roughly that it is of no value to know the values of X(u),u c t, to predict the value of X(t s), if the
+
1
8
PROBABILITY BACKGROUND
value of X ( t ) is known. According to (3.1), the left-hand side of (3.1) equals (w.P. 1 ) P,(,,{X(s) E r}.If (3.1) holds, then P ( - , ., - ) must satisfy the Chapman-Kolmogorov equation (3.2) P(x, t
+ s, r)=
5 P(x, 4 dy)P(y, S
s, r)
(3.2)
for all x E S, s, t 2 0, and r E B ( S ) . The quantity P(x, t, r)is the probability that if X ( 0 ) = x, then X ( t ) E r. Let B , ( x ) = B ( X ( s ) ,s I t) denote the smallest a-algebra over which X ( s ) , s I t, is measurable. Bj the definition of a Markov process, B J x ) c B,. In many cases B, will actually be B,(x).Certainly, (3.1) must hold if B,(x) replaces B, there. For theoretical purposes, it is often desirable to choose a B, which strictly includes a’,(.). A common case is where B, measures variables X ( s ) , Y(s),s I t, where Y ( - )is some process which is independent of X ( .), under each measure P, . The inclusion of the Y ( .) enables us to treat randomizations of strategies, for example, in various decision problems. Our definition of a Markov process is not quite complete, but is adequate for our needs. In order to facilitate the mathematical exploitation of the Markov property, there is an additional requirement: for each s > 0 and w E Q, there is an w’ E C2 such that X(w, t + s) = X(w’, t) for all t 2 0. If all the other conditions of the definition are met, it is often not hard to either augment the space Q or to find a new probability space on which all the previous conditions hold, but which also contain points corresponding to all “left shifts ” as required by the last condition. 1.3.2 Nonhomogeneous Markov Processes Almost everything remains the same as in the previous section except that we now have a transition function P( ., * ; ., ) which depends on four quantities rather than on three: it has values P(x, t ; r, t + s) for t 2 0, s 2 0,x E S, and r E %?(S). Also P(x, t ; r, t) = Z,(x) and the nonhomogeneous Markov property (3.3) holds:
-
P , { X ( t +s)
E
r IB,}
= P ( X ( t ) ,t ;
r, t + S)
W.P.
1.
(3.3)
The quantity P ( x , t ; r, t + s) is the probability that if X ( t ) = x, then X ( t + s) E r. So, in the nonhomogeneous case,’the transition probabilities are not stationary. The Chapman-Kolmogorou equation, equivalent to (3.3), now takes the form ~ ( x t,;
for each t 2 0, x
r, t + s) = E
S,
5
S
~ ( x t ,; dy, t
+ U)P(~,t + u, r, t + s)
r E B(s)and s 2 u 2 0.
(3.4)
1.4
9
STOCHASTIC INTEGRALS
Let Bl = Bl(x).Then a property that is equivalent to ( 3 . 3 ) but is perhaps more intuitive, is: for each integer n, t 2 0, s 2 0, F B ( S ) and real t , , . . ., t, It, and each x E S, P,{x(t
+
S) E
+ s) E r I x ( t ) ) = q x ( t ) , t, r, t + s) W.P. 1.
r Ix ( t , ) ,i I
x ( t ) }= p , { x ( t
(3.5)
1.3.3 Strong Markov Processes There is a subclass of Markov processes with some additional properties that are of great importance in applications. Suppose, first, that A is a set in S so that the entrance time 7 defined by z = inf{t : X ( t ) E A }
is a stopping time. Let 7 < co w.p. 1 with respect to some P , . We would usually expect that a relation like (3.6) would hold in physical applications.
P , { x ( ~+ s) E r laT} = P , { x ( ~+ s) E = P ( X ( ~ )7 ,;
r lx(7))
r, + s)
W.P. 1.
(3.6)
Equation (3.6) says that if we know the values of X ( t ) at the moment of hitting A, then the prior history is of no additional value in predicting the evolution of the path beyond time 7 . As far as the author is aware, if a physical process can be modeled by a Markov process, then that Markov process has property (3.6). A Markov process is said to be a strong Markov process if (3.6) holds for all random variables 7 which are stopping times relative to {al}. A Markov process is said to be a Feller (resp., strong Feller) process if for each t > 0 and each bounded real-valued continuous (resp., measurable) function g( .), the function of x, with values E , g ( X ( t ) ) , is continuous in x on S. Every Feller process whose paths are right continuous w.p. 1 is a strong Markov process (Dynkin [D4, p. 993). Verification of the Feller property is a convenient way of verifying whether a process is a strong Markov process. 1.4 Stochastic Integrals 1.4.1 Integrals with Respect to a Martingale In order to discuss Markov processes of the diffusion type and the relationship between them and elliptic and parabolic equations, it is necessary to define the integral of a function with respect to a continuous martingale. Let { Y ( t ) , a,, co > t 2 0} be a continuous square integrable scalar-valued
1
10
PROBABILITY BACKGROUND
So
martingale. We need to define the object f(o, s) d Y ( s ) for a ciass of functionsf( -,-). If (Y), f 0, then Y(.) is not of bounded variation (with some positive probability) and the Lebesgue-Stieltjes definition of the integral does not make sense. Nevertheless, by first defining the integral with respect to a class of step functions f(-,-), we can then extend the definition to a class of nonstep functions via a suitable probabilistic limiting procedure. We will only state some definitions and results. For a fuller discussion see Doob [D2], Gikhman and Skorokhod [G2], Kunita and Watanabe [K14] or Wong [W21* Let A’” denote the class of real-valued measurable random functions f (-, - ) on R x [0, co) which are adapted to {a,}. The following functions and classes of functions are all elements of or subclasses of A’”. W e say that f(*)E L,(
JOT
I f ( t )1’
or
d( Y ) , < co,
fn(‘-),f(*)
and
E
JOT
I f,(t) - f ( t ) 1’
d( Y),
E
each
T < co
LZ((Y)) as n + co, each
+0
T < 00,
resp. Let ;FZs denote the class of random step functions in L2( Y). Iff( - ) E L,(( Y)) and there is a sequence f,(.) E *2s such that f,(.) -f(.) in L,(( Y)), then we say that f(*)E 2z. Iff(.) E .TZs, there is a sequence of stopping times 0 = t,, . . . , such that (by assumption, f(.)E .%”2s is right continuous) f ( t ) =f(ti),
f
E [ti 9 ti+ 1 ) .
We say that f(.)E N 2 if there is a sequence of stopping times 71, . . ., tending to co w.p. 1 such that, for each n, the function f,(.) defined by f n ( t )= f ( t ) , t < 7 , , f n ( t ) = 0, t 2 7 , is in 2 2 . Definition of the Stochastic Integral of a Step Function f ( * )
then we define the stochastic integral +(-) o f f ( - ) with Iff(-) E respect to Y ( - ) by the formula (4.1). Let f(-)be constant on the intervals [ti, t i + l ) , where the {ti} are an increasing sequence of stopping times with respect to {B,}: +(t)= l 1 f ( s )dy(s) = ncf(ti)[y(ti+l) - y(ti)l 0
i= 1
+f(t,”(t)
- Y(t,)l
for t E
[tn7
t,+ 1).
(4.1)
1.4
11
STOCHASTIC INTEGRALS
The pair { + ( t ) , 00 > t 2 0} is a continuous square integrable martingal; and
Let u 2 t 2 z be finite stopping times with respect to {Br}, and letf(.) and g( .) be in .P2,.Then we can also verify (4.3)-(4.7) by simple calculations. f(s) dY(s) =
r
j f(s) d Y ( s )
w.p. 1,
(4.3)
0
E
sd
f(s) d Y ( s ) = 0,
(4.4)
Dejnition of the Stochastic Integral off ( - ) E 3'
For any f(.) E .Pz,there is a sequencef,(
',bJ
E
I fn(s)-f(s) 1'
d( Y ) , --* 0
0
)
E
2zs for which
for each
T < 00
(4.8)
as n --* co. If ( Y) is absolutely continuous with respect to Lebesgue measure, and has a uniformly bounded derivative (as with the case of a Wiener process), then for any f(.)satisfying E J,'1f(s)1' ds < 00, each T < co, a sequence fn( .) E .Pzs which satisfies (4.8) can easily be constructed. Thus such f(.) are in .P2.See Doob [D2, Chapter IX.51; or Wong [W2, pp. 142-1441. Let be a sequence in .PZs which satisfies (4.8), and define
vn(.)}
sl/n(t)=
BY (4.71,
j'fn(s) d ~ ( s ) . 0
12
1
PROBABILITY BACKGROUND
Equations (4.8) and (4.9) imply that {+,(t)} is a Cauchy sequence for each t. Hence, for each t, there is a random variable + ( t )which is the limit of $,(t) in mean square, and + ( t ) is adapted to W,. The limit $(t), corresponding to
two different {fn(.)} sequences in 2',,each satisfying (4.8), are the same w.p. 1. All the processes are assumed to be separable. Hence, if we can find one particular limit function +(.) that is continuous w.p. 1, then any separable version must be continuous w.p. l. Let T be an arbitrary positive number. Select fn( .) E 22s such that E
joT1
f(S)
-fn(s)
I2
4 y > , 2-",
and set E = 2-"14and rn > n in (4.10). Then the right-hand side of (4.10) is bounded above by 4.2-n/2.The summability of this sequence and the BorelCantelli lemma imply that the series m
+(t) =
1
l(t)
[+n+
n= 1
- + A t ) ] + +1(t)
converges uniformly on [0, TI. Since each +,,(.) is continuous, so is +( .). We write + ( t ) = yo f(s) dY(s). It can also be verified that (4.3)-(4.7) hold for anyf(*), g ( * ) E 22, and u 2 t 2 z, and that +(*) is a continuous square integrable martingale on [0, 03) with respect to (93,). Let { Y ( t ) ,W,, 03 > t 2 0) be an R'-valued continuous square integrable martingale, f (.) a q x r matrix-valued function which is adapted to {W,}. Suppose that (4.11) holds for each T c 03. (4.11)
Then we can define the stochastic integral +(t)= yo f(s) d Y ( s )term by term. 03 > t 2 0} is a continuous square integrable Rq-valued The pair { + ( t ) , W,, martingale, the ith component of +(*) is
@dt) =
5
r
1 fik(S)
O k
dK(s)
1.4 STOCHASTIC INTEGRALS
and the function ($) = {($i, IC/,), i, j
=
1,
..., q} is given by
f
($i
$j)f
3
=
13
fik(s)fjds)
d(
5
3
(4.12)
x)s
0 k.1
or, more compactly, by
j f
($)f =
Vsf’(s)l.
(4.13)
0
Let { Y(t),af,co > t 2 0) be a scalar-valued martingale and let f( E 22 (with respect to (Y)). Define $(t) = fi f(s) dY(s). Let g(*) E 22 (with respect to ($)). Then a )
T
E
jof2(S)SZ(44 y>s < co
-=
for each T co. We can now define the stochastic integral of g(-) with respect to $(*) by r
4(t) =
Furthermore,
(4>, =
j g(s) W s ) . 0
jf 9 2 ( s ) d(*), = jf s 2 ( s ) f 2 ( s4) 0
4(t)= jfs(s)f(s) ~ 0 1.4.2
V
S
9
0
s
)W.P. 1.
Local Martingales
There is a useful definition of the stochastic integral of functions E N z . Let {f.( -)} and {z,} denote the functions and stopping times used in the definition of M 2 . The stochastic integral
f(
a )
r
=
jof n b ) .dY(s)
is defined as in Section 1.4.1. We have that P{T, I T}+ 0 as n + 00, for each T ICO. Using this fact, we can show that there is a continuous function $(*) such that $,(i) + $ ( t ) in probability for each t > 0. By selecting a suitable subsequence, the convergence is uniform on each finite interval w.p. 1. We define the integral by r
W )= j0 f ( s ) W s ) .
1
14
PROBABILITY BACKGROUND
The vector case is treated analogously. The integral +(. ) is well defined since if {fk(*)}, i = 1, 2 are two approximating sequences, then the limits of the stochastic integrals +k(t) agree w.p. 1. An adapted pair { X ( t ) , gr,co > t 2 0} is said to be a local martingale if there is a sequence of increasing stopping times {z,} so that, for each t, X ( t )= X(t n z,) and, for each n, the process X , ( * ) with values X , ( t ) = X(t n T,) is a martingale. If each X , ( - ) is square integrable, then X ( ) is said to be a locally square integrable martingale. Let X , ( . ) be square integrable. Then, the increasing function (X,) is well defined. Also, ( X n + l ) r2 (X,), for each t, and we define (X) by (X), = limn (X,),. The limit is (w.P. 1) independent of the sequence {z,}. I f f ( . ) E N 2 , then its stochastic integral is a local martingale.
-
1.4.3
Itb's Lemma
In order to deal effectively with functions of stochastic integrals, we need a calculus for differentiating and integrating functions of the integrals. This need is answered by It6's lemma (see Gikhman 2nd Skorokhod [G2, Chapter 8.21, for the case where Y ( . )is a Wiener prcl-ss, and Kunita and Watanabe [K14, Theorem 2.21, otherwise). Let r denote a given integer, T a positive number and let g ( - , be a real-valued function in? C ' . 2([0,TI x R'). Let { Y(t),B,, 03 > t 2 0}denote a continuous R'-valued locally square integrable martingale, and let f( be an R'-valued random function which is adapted to @,}, and satisfies 1 f(s) Ids < co w.p. 1. Define the process X ( .) by a )
a )
Jt
X(t)=
5
r
0
f(s) ds
+ Y(t)
and suppose that ( Y ) is absolutely continuous with respect to Lebesgue measure. Define the operator D :
Then It6's Lemma states that
t C'.'([O, TI
whose first
t,
x R') denotes the space of real-valued continuous functions on [0, TI x R', x derivatives and second partial x derivatives are continuous.
1.4 STOCHASTIC INTEGRALS
15
The differential (4.15) is interpreted to mean
(4.16)
w.p. 1, where the last integral on the right-hand side is a stochastic integral (and also a continuous locally square integrable martingale). If, for each i, the function with values [dg(s, X ( s ) ) / d x i ]is in 2 2(with respect to ( q ) ) , then the stochastic integral in (4.16) is a square integrable martingale with zero expectation. If ( Y ) is not absolutely continuous with respect to Lebesgue measure, then d( q)[replaces (d( q ) , / d t ) dt in (4.15) and (4.16). Suppose that {Y(t), at,03 > t 2 0 ) is a continuous vector-valued square integrable martingale with (Y), = t l and Y(0) = EY(t) = 0. Then Y(.) is a standard Wiener proieess. We will only show that Y(t) is normally distributed. Applying It6's lemma to the function g ( x ) = exp il'x yields
x,
exp il'Y(t) = 1
x,
+ 1'il'[exp 0
iA' Y ( s ) ]dY(s) - f
I,I 1
I2[exp il' Y(s)] ds.
Denoting the characteristic function of Y(t) by Z , ( t ) = E exp iAY(t) and taking expectations yields I
Z,(t) = 1 -
j ( 11I2/2)Z,(s) ds, 0
which has the unique solution z,(t)= exp - Il I2t/2,
which is the characteristic function of a vector of independent normally distributed random variables with zero mean and covariance t l . 1.4.4 Representation of a Continuous Square Integrable Martingale Y (-) as a Stochastic Integral with Respect to a Wiener Process 03 > t 2 0 ) be a continuous vector-valued square integrable Let { Y(t), martingale on the probability space (Q 99, P). In all of the applications in this book, the increasing process A(.) = ( Y ) will be absolutely continuous with respect to Lebesgue measure; i.e., there is some nonnegative
1 PROBABILITY BACKGROUND
16
ro
matrix-valued Lebesgue measurable (w.P. 1) function C(.), adapted to {W,} and such that (Y), = C(s) ds. In such cases, Y ( . ) can be written as a stochastic integral with respect to Wiener process w(*),although we may have to enlarge the probability space to do so. Let us enlarge the probability space by adding a process a R‘-valued standard Wiener process which is independent of Y ( * ) Suppose . that @(*) is defined on the Wiener process space (4 P), and let 3, denote the smallest a-algebra which measures @(s), s I t . Define the space (a, &, = (R x a, W x 3,P x P), and let &, denote the smallest sub-a-algebra of W x 3 which contains W,x a, and Rx Then { Y ( t ) , co > t 2 0) is still a continuous square integrable martingale. There are measurable (r x I ) matrix-valued functions P( -, .) and D( -, * ) which are adapted to and such that for each o,t, D ( o , t) is diagonal with nonnegative elements P ( o , t) is orthonormal, and @(a),
a,
a,.
a)
a,,
{a,}
1
(0, t ) =
P(o,t)D*(o, t ) P ( o ,t).
Denote the diagonal elements of D(w, t) by d i ( o , t). Define the (r x r) diagonal matrix D + ( o , t) with elements d + ( ~t), by d+(w, t ) = {:r1(o, t),
if di(w, t) > 0, otherwise.
Define ~(o t ),= P(o,t ) D ( o , t )
and define a process w(.) by w ( t ) = ] ‘ D + ( w ,s)P(o, s) d Y ( s ) 0
+ j’[I - D + ( o , s)D(o, s)] d@(s).
(4.17)
0
The stochastic integrals in (4.17) are well defined and they are orthogonal since ( Y, @) = 0 (by the mutual independence of Y( -), A direct calculation using (Y, a),. = 0 and (4.12) or (4.13) yields that @(a)).
(w), = t I , which implies that w( is a Wiener process. Using the definitions of D, D’, P, and of the differential of w(.) as given by (4.17), we get that (w.P. 1) a )
Y ( t )= j t P ( w ,s ) D ( o , s) dw(s) = jla(w, s) dw(s), 0
(4.18)
0
the desired representation. If Z(o,t) is positive definite for almost all o,t, then we can “recover” a Wiener process by “inverting” Y ( - ) via the definition dw(t) = Z-’’z(o, t) d Y ( t ) . Otherwise, C(*, is degenerate on a nonnull o,t set, a )
1.5
17
STOCHASTIC DIFFERENTIAL EQUATIONS
and even if Y( .) were defined by (4.18),we could not recover w( from Y ( .) since, due to the degeneracy, the integration in (4.18)"loses" information on w(*)* a )
1.5 Stochastic Differential Equations Let w ( - ) denote a standard R'-valued Wiener process and = 9?(w(s),s It ) . A large class of Markov processes are defined by the nonanticipative (with respect to w( .)) solution of the equation x ( t )= x
+ j o ' f ( x ( s ) , s) ds + Jb;@W, s) dw(s).
(5.1)
Equation (5.1)is often written in the symbolic differential form dx = f ( x , S) ds
+ O(X,
dw(s).
(5.2) Such processes are widely used in current stochastic control theory and in other applications in engineering, physics and economics, and encompass a wide variety of processes of practical usefulness. Functionals associated with these processes are intimately related to solutions of second-order partial differential elliptic and parabolic equations, and the study of the properties of these processes yields much valuable insight into the properties of those partial differential operators. In fact, the relationships between the process (5.1) and the partial differential equations will be heavily exploited in our study of approximations to the processes and to the partial differential equations. We will give the classical I t b D o o b [D2] proof of existence and uniqueness of the solution to (5.1)under assumption A1.5.1. S)
., .
A1.5.1 f(-, and CT( ) are R'-valued and (r x r ) matrix-oalued measurable functions on R' x [0, co) and there is a real K such that a )
I f (x1 s) - f ( Y , 4I + 14x9 s) - d Y , s) I IK I x - Y I I f ( x , s) l2 + 1 s) l2 IK2(1 + Ix 12)r,
for each s < co, and y, x
E
R'.
Theorem 1.5.1 (see Doob [D2, Chapter VI]; Gikhman and Skorokhod [G2, Chapter 81) Under A1.5.1 for each x E R', there is one and only one process X ( - ) which is adapted t o {Btfand satisjes (5.1) and (for each T < GO) E max 1 X ( s )l2 < 00. T2sZO
(5.3)
1
18
PROBABILITY BACKGROUND
The process is continuous W.P. 1.
PROOFThe proof depends heavily on the use of the Lipschitz and growth conditions A1.5.1, Schwarz’s inequality, and the martingale inequalities (2.2), (2.3), (4.9) and (4.10). The type of calculations that are used in the proof are used frequently in the analysis of stochastic differential equations and in the rest of this book. Note that if X ( is nonanticipative (with respect to w( -)), with I X ( s ) 1’ ds < 00 w.p. 1 for each T < co,then X ( * ) must be continuous w.p. 1, since the stochastic integral can be constructed as a local martingale as in Section 1.4.2. Let Y ( - )denote an R‘-valued process which satisfies
Jt
a )
(a) Y ( . ) is continuous w.p. 1. (b) Y ( . ) is adapted to {gf} (c) E maxTtrpO I Y ( s ) / ’ < co for each T < 00. Define P(-) by a(t)= x
+ j:f(
Y ( s ) ,s ) ds
+ j:c(
Y(s), s ) dw(s).
(5.4)
The integrals are well defined and a(-)satisfies (a) and (b) also. The inequalities (5.5), (5.6) imply that (c) holds for a(-).The derivation of inequality (5.5) requires the use of the growth condition and Schwarz’s inequality. The derivation of (5.6) uses the growth condition, the fact that the stochastic integral is a martingale and (2.3). E max
TtftO
l,df(
Y ( s ) ,s) ds
1
2
IE max t Ttf2O
T
:j I f(Y ( s ) ,s ) 1’
ds
T
I E T J ~I f ( Y ( s ) , s ) 1 2 1 E T K 2 1 (1 + I Y ( s ) I ’ ) d s < c o .
E max
Tzf2O
1 jo‘
(5.5)
0
c( Y(s),s ) d w ( s )
< 4K2 j:E(
1 + I Y ( s )1)’
ds.
Now, select a process X , ( . ) which satisfies (a)-(c), and define X , + , ( * ) (n = 0, 1, . . .) iteratively by setting (recursively) X,( .) = Y( .) and defining = F( in (5.4). Then all X,( - ) satisfy (a)-(c). X,, We now define a )
M,(t) = E max 1 X , ( u ) 1’ rzuto
and sharpen the estimate (c), for { X , , ( - ) }We . can write (again using the
1.5
19
STOCHASTIC DIFFERENTIAL EQUATIONS
growth condition, Schwarz’s inequality, and the martingale estimate) Mn+l(t) I4 1 x 1’
Ij:J(X,,(s), s) ds
1
f>U>O
+ 4E max lJ:o(Xn(s),
s) dw(s)
12 I42 0
I l2 + 4tK2
I4 x
I
Jo
I
2
+ 4E max
E( 1
2
+ I X,(s) 1’)
ds
+ 4K2-4Ej‘(1 + / X , ( S ) ~ds,~ ) 0
(5.7)
which yields, for some real number K O , I
M,+,(t) 5 K O
+ K O joMn(s)ds,
t IT.
(5.8)
Iterating (5.8) yields that
-
lim E max I X,(t) 1’ 2 K o e K o T .
n-tm
T>f>O
(5.9)
Define A n + , ( - ) by A n + l ( t ) = X,+,(t) - X,(t). We will show that the sequence m
X(t) =
1 An(t) + XO(4
(5.10)
n= 1
converges uniformly w.p. 1 to a process satisfying (a)-(c) and (5.1). Clearly, if the limit exists, it must satisfy (b) and, by Fatou’s lemma and (5.9), (c). Also, any limit X ( . ) which satisfies (5.1) and has a bounded mean square value on each [0, TI, must be continuous. First, we prove that the limit exists on an interval [0, TI. We have I
An+
l(t) =
j[S(Xfl+
1(s),
0
s) -f(Xn(s), s)l ds
I
+ j o [ 4 X n + l ~ ss)) . - 4Xn(s), s)I dw(s), and (again using the Lipschitz condition, Schwarz’s inequality and the martingale estimate) there is a real number K, such that
Rn+l(t)
= E max I An+,(s) 1’ I>S>O
I2K’T
J:
E 1 An(s)1’ ds
+ 8K2 j:E
lA,,(s)1’ ds IK , J;R,,(s) ds
1
20
PROBABILITY BACKGROUND
for all t < 00. Hence, (5.11)
R,,+ l(t) I K2(Kl t)./n!,
where
K , = K, . E max IX,(t) - X,(t)(’.T. T t 12 0
By Chebychev’s inequality and (5.11),
1
1
P sup [ A n +l(t) I 2 2-”12 I K22”(K1T)”/n!. Tlr2O
(5.12)
By the summability of the right-hand side of (5.12) over n and the BorelCantelli lemma we can have sup I A,+ l(t) I 2 2-”12
T t f t O
only finitely often w.p. 1. Hence (5.10) converges uniformly to a continuous function X(-) w.p. 1. The convergence is also in mean square. To see this, we use the inequality [which is a consequence of (a + b)’ I 2a2 + 2b2]
[A,, + A,+
+ + A,,+k-
+
I2A,” 4A,”+
+ + 2kA,”+k1.
From this inequality and (5.11) we can readily compute that TtftO
Ixn+k(t)
- xn(t)
1’
as n, k + co, which in turn implies the mean square convergence. By the w.p. 1 convergence, (5.13a)
l;t~(X,,(s),s) dw(s)
-+
{ofo(X(s),s ) dw(s)
as n -, 00,
(5.1%)
in mean square. Equations (5.13a) and (5.13b) and the fact that X,(t) -,X ( t ) w.p. 1 uniformly on finite intervals imply that X ( . ) satisfies (5.1).
1.5
21
STOCHASTIC DIFFERENTIAL EQUATIONS
Only the uniqueness remains to be shown. Let X i ( * ) ,i = 1, 2, be two nonanticipative solutions to (5.1), each satisfying (5.3). Then r
+ j o f ( X i ( s ) ,s) ds +
X i ( t )= x
n(Xi(s),s ) dw(s),
i = 1, 2.
Define 6 ( t )= I X 1 ( t )- X 2 ( t )12. Then
[o(X'(S), S) - ~ ( X ' ( S )s,) ] dw(s)
The use of the triple of the Lipschitz condition, Schwarz's inequality, and the martingale estimate for the stochastic integral on the last inequality yields E6(t) I2tK2 jo1E6(s)ds
+ 2KZj:EG(s)
ds,
which implies that E6(t) = 0 since E6(t) is finite for all t . Q.E.D. Properties of the Solution to (5.1) under A 1 5 1
The process X ( . ) is a Feller process and a strong Markov process. The process is of the form of X ( - )above (4.14) and the operator D of (4.14) takes in the particular the form (5.14). The operator is denoted by the symbol 9, case of a stochastic differential equation:
a
9, =-+Y at
a + 1 f;:(x, s)- a + 1 U i j ( X , t ) - a 2 = aaxi i , j axi axj t
9
(5.14)
where a( * ) .) =
For g(-,
a )
E
.(a,
*)d(-,-)/2.
C'. 2([0,TI x Rr), 9,g(0, X) = 1im~-0[ E x g ( 4 XA) - g(0, x)]/A,
where Ex denotes the expectation, given that X ( 0 ) = x. Proofs of the local properties (5.15), (5.16) are found in Doob [D2, Chapter VI.31. For each finite interval [0, TI, there is a function e ( * )such that O(h)+Oas h+Oand
1
22 E[(X(t
PROBABILITY BACKGROUND
+ h ) - X ( t ) ) ( X ( t+ h ) - X(t))' I X ( t ) ]- 2
I(1
f+h
a ( X ( t ) ,s ) ds f
+ I x ( t )i2)he(h),
1
(5.16)
where the O ( - )functions are uniform in t and X ( t ) .
REMARKS Assumption A1.5.1 makes the study of the properties of the solution to (5.1) relatively simple. We will give two examples of the simple analysis that it allows.
-
Let E,( ) denote a nonanticipa-
Example Z Sensitivity to perturbations tive process such that
joIe,(t)12 dt +O, 1
E
each
T c co, as n + co.
For each n, we can construct a solution to X n ( t )= x
+ JOf(Xn(s),
S)
ds + j;a(Xn(s), S) dw(s) +
['En(S) '0
ds (5.17)
in the same manner as we constructed one to (5.1). In addition, the solution is unique. Define 6 ( t )= 1 X,(t) - X ( t )1 2 . Then [ f ( x n ( s ) ,S) - f ( x ( s ) , s)l ds
and there is a real K, such that E6(t) I(1 + t ) K , j ' E b ( s ) ds 0
+ 4t :j I E,(s) l2
ds.
(5.18)
Equation (5.18) implies that there is a real K , such that f
E6(t) IK 2 e K z T j ol&,(s)I2 ds. Hence X n ( t )+ X ( t ) w.p. 1 as n + co. Example 2 Finite difference approximations For each A > 0, define the sequence { X t }by X t = x and, for n 2 0,
X t + l = X t + f ( X t , nA)A
+ a(X;, nA)[w(nA+ A ) - w(nA)],
and suppose A1.5.1, and thatf(., .), a(-,- ) are continuous in both x and t.
1.5
STOCHASTIC DIFFERENTIAL EQUATIONS
23
Define the process XA(.) by
XA(t)=Xt
on [nA, nA+A),
n2O.
We will not go through the detailed calculations, but simply by use of some of the devices; namely, the Lipschitz condition, Schwarz's inequality and the martingale estimate, it can be proved that XA(.) tends to the solution of (5.1) in mean square, as A + 0. If there is a real /? E (0, 1) such that the successive values of A are related by A(new)/A(last) I/?,then the convergence is uniform on any finite time interval, w.p. 1. See Kushner [K5, Chapter lo]. In much of Chapters 6-10, problems of perturbations (Example 1) or discrete time (Example 2) or other approximations will arise, and we will not usually assume that there is a Lipschitz condition. We will always assume that a type of uniqueness holds. The uniqueness and various techniques involving weak convergence theory (Chapter 2) will have to be used rather heavily, owing to the lack of a Lipschitz condition. Solutions to (5.1) exist under very weak assumptions, and the powerful tools of weak convergence theory can be used to construct those solutions. The main problem is uniqueness. If the K in A 1 5 1 depends on x, y but is bounded on finite sets, then there will be a unique solution up to a (random) moment of escape to infinity. Often, concepts in stability theory can be used to show that there is no (w.P. 1) finite escape time. In Theorem 1.5.l,f(., .) and a(-, can depend nonanticipatively on w, provided that A 1 5 1 still holds. Uniqueness also holds under a certain Holder condition (Yamada and Watanabe [Y13). Iff( -, .) and a( -, - ) satisfy a uniform Holder condition and are bounded and a(., .) can depend nonanticipatively on w, provided that A 1 5 1 still to (5.1) and the family of solutions over x = X ( 0 )is a strong Markov, strong Feller process (Dynkin [D4, Chapter 111). According to Strook and Varadhan [S3], if a(*, .) is strictly positive definite and continuous then there is one and only one solution for each bounded and measurablef( -,-), and the family X ( ), over all x = X(O), is also a strong Feller strong Markov process. It should be pointed out, however, that the references of this paragraph do not show that for each w ( . ) process there is a solution for each initial condition X ( 0 ) = x. In fact, the w( * ) process is constructed rather indirectly from the putative solution X ( .) corresponding to a specific X ( 0 ) = x and the uniqueness is in the sense of the probability law. a )
-
CHAPTER 2
Weak Convergence of Probability Measures
This chapter introduces the basic ideas of the theory of weak convergence of a sequence of probability measures on complete and separable metric spaces, or equivalently, of convergence in distribution of a sequence of random values with values in such a space. The generality is not only of theoretical interest, for it brings powerful and useful tools to bear on problems in the convergence of a sequence of real-valued stochastic processes. In the cases in this book, this sequence will be an “approximating” sequence to a random process which we wish to study-or to estimate functions of. The development starts with a discussion of convergence in distribution for the simple case of a sequence of real-valued random variables, and then moves on to the more complex cases. No proofs are given, but the results that are needed in the sequel are stated and discussed. 24
2.1
PROBABILITY MEASURES ON THE REAL LINE
25
2.1 Probability Measures on the Real Line. Real-Valued Random Variables The ideas of weak convergence of probability measures find useful applications in many areas of probability and statistics, as well as in many areas of application of those subjects, particularly in operations research and control theory. They frequently provide fundamental tools in the study of approximations and they will be used in just this way in this book. Most of the book is devoted to approximations to diffusion processes, to elliptic and parabolic equations, and to optimal stochastic control problems. The approximations will involve simpler processes-particularly, interpolated Markov chainst-and the concepts of weak convergence theory will be used to prove that the probability measures of the approximating sequences converge to the probability measure of the desired limit process, and then to prove that various functionals of the simpler processes converge to the appropriate functionals of the limit. Part of this chapter consists of a survey of the ideas as presented in the comprehensive book of Billingsley [B3]. The book of Gikhman and Skorokhod [G2, Chapter 91 also contains much useful information. See also Iglehart [Ill for a discussion of other applications. Suppose that PI, P,, . . .and P are probability measures on the real line R. The sequence {P,} is said to conoerge weakly to P (written P,* P) if P,( - co, x] --* P(- 00, x] at all points of continuity x of the distribution function P(- 00, -3. If X, and X are random variables with distributions P, and P,resp., then weak convergence is equivalent to convergence of {X,)to X in distribution. Let C ( R )denote the class of bounded and continuous real-valued functions on R. Then an equivalent definition of weak convergence P, P is (1.1) jf ( y ) P n ( d y ) j~ ( Y P('Y) for allf(.) E C ( R ) .We will take (1.1) as the definition, since it generalizes to abstract-valued random variables. Let g( - ) be a bounded measurable real--*
valued function on R with discontinuity set D,, and let C,(R) denote the set
t Let {XJ and {A,,) denote a discrete parameter process and a sequence of strictly positive random variables, resp. Define To = 0, Tn= A i . Then the process X ( * ) defined by X ( t ) = X , on [T,, T,, is said to be a (piecewise constant) interpolation of {X"}.The process X(-)defined by
1;:;
is the linear interpolation.
26
2
WEAK CONVERGENCE OF PROBABILITY MEASURES
of such g ( - ) that also satisfy P{D,} = 0. Then, if (1.1) holds for a l l f ( - ) in C(R),it holds forf( * ) in C,(R) also. Weak convergence P, P is equivalent to P,{A} + P { A } for each Borel set A witht P{dA} = 0 and to & P,{A} I P{A} for each closed A or to l& P,{F} 2 P{F} for each open set F (Gikhman and Skorokhod [G2, p. 4471; Billingsley [B3, Theorems 2.1 and 5.11). A sequence of measures {P,}is said to be tight if for each E > 0, there is a real number N, such that
P,{IXI IN,} 2 1 - E, all n. (1.2) If {P,} is tight, then for every subsequence of {P,},there is a further subsequence (say, denoted by {P’,}) and a probability measure P so that P,=-P. In the real variable case, this result is known as the Helly selection theorem (Loeve [L3, Section 11.2, where the tightness is termed complete compactness]). 2.2 Probability Measures on Metric Spaces The continuous parameter stochastic processes dealt with in this book will be defined either on some finite time interval [0, TI or on the semiinfinite or infinite intervals [0, a),(- 00, a), resp. Instead of viewing the processes as being a sequence of random variables whose parameter lies in a “time” set, it will sometimes be convenient to view them as abstract-valued random variables. We will not lose sight of the actual processes for very long, but the study of convergence will be greatly facilitated by the abstraction, since the powerful results concerning weak convergence of sequences of probability measures on abstract spaces can then be used. Let S denote a complete separable metric space, and let P denote a probability measure on the Borel sets of S.Define C(S) (resp., C p ( S ) )to be the sets of real-valued bounded functions on S which are continuous (resp., continuous almost everywhere with respect to P ) . Let {P,} denote a sequence of probability measures on the Borel subsets of S.The sequence {P,} converges weakly to a probability measure P, if (1.1) holds for all f (.) E C ( S ) .If (1.1) holds on C(S), it holds on C,(S). The statements below (1.1) all continue to hold if R is replaced by S and X 1 IN,} is replaced by {X E K,, some compact set in S}(Billingsley [B3, Chapter 13; Gikhman and Skorokhod [G2, Chapter 91). A sequence of probability measures {P,,} on S is said to be relatively compact if each subsequence has a further subsequence which converges weakly to some probability measure on S. (Strictly speaking, we should say on the Borel sets of S,but omitting the qualifier should not cause confusion.)
{I
t The boundary a A of a set A
is defined to be the closure minus the interior of A.
2.2
PROBABILITY MEASURES ON METRIC SPACES
27
Theorem 2.2.1 If {P,} is a sequence of probability measures on the complete separable metric space X,then tightness is a necessary and suficient condition for {P,} to be relatively compact. Furthermore, every probability measure on S is tight (Billingsley [B3, Chapter 11). Suppose that P , is a probability measure on X induced by some vectorvalued process t,,(*). The most frequently used tests for tightness of {P,} involve conditions on the processes -)}. These conditions will be readily verifiable in our cases. One of the main problems of the book is to prove that the “ limiting measures P correspond to some particular desired process (say, a controlled diffusion process). Let X = X l x X,, where X iare complete separable metric spaces, and let {P,} denote a sequence of probability measures on S,where P , is the measure induced by a pair of random variables X ; , X ; ,with X l taking values in Ti. Suppose that Pi is the measure induced by X l on X i . Then the sequence {P,} is tight if each sequence {Pi}, i = 1 , 2, is tight separately. This is a simple consequence of the fact that
{en(
”
pn((X1, X , ) E A1 x A,}
P,!{Xl E A , } + P.’{X, E A , }
for any Borel Ai E Xi. If P , is the measure induced by an X-valued random variable X , , and P , P, where P corresponds to an S-valued random variable X , we say that X , + X in distribution. It is usually more difficult to work with convergence in distribution, than with convergence w.p. 1. It turns out that by using a particular choice of the probability space, we can assume that all the random variables { X , } and X are defined on the same probability space, and that there is convergence w.p. 1, if there is convergence in distribution. First consider a special simple case. Let X , denote a random variable satisfying? P { X , = 0 } = 1 - l/n,
P { X , = l} = l/n.
The sequence X , converges to X = 0 in distribution. If the { X , } are mutually independent, the convergence is not w.p. 1 . Generally speaking, in our convergence studies, we are concerned mainly with properties of the “limit” X and with convergence of distributions of functionals of the { X , } to those of X . In these cases, we can alter the probability space in any way at all, as long as the distributions of each X , and of X are not changed-it would not matter if we modified the joint distributions of the ({X,}, X ) . Suppose that 6 = [0, 13, 3 is the a-algebra consisting of the Borel sets in [0, I ] , and P is
t The subscript n is sometimes put on the random variable, and sometimes on the measure; notation is often abused by interchanging P { X , E A} with Pm(XE A}, but the interchange should cause no confusion even without an identification of the probability space.
28
2
WEAK CONVERGENCE OF PROBABILITY MEASURES
(a
the Lebesgue measure. Define random variables 2,and 8 on 3,P ) by [l/n, 11, X , = 1 on [0, l/n). Then 8, -,2 w.p. 1 but for any Borel set A,
8 z 0, 8, = 0 on
P{X,
A} = P{X,
A},
B{X E A} = P{X E A}.
(2.1) There is a very useful theorem due to Skorokhod [S2, Theorem 3.1.11, which generalizes this result. E
E
Theorem 2.2.2 Let X be a complete separable metric space, let P, and P be induced by X-valued random variables X , and X , resp., and let P, P. There 3,P) with X-valued random variables {X,} and 8 is a probability space dejined on it, such that (2.1) holds for each Borel set A E S.Also 3, + 8 w.p. 1 in the topology of X.
(a,
Theorem 2.1.2 has been extended in various ways. See Dudley [D3]. If X , -,X in distribution, we will often assume, without notice, that the convergence is also w.p. 1 in the topology of X,and not use the tilde (") notation. The construction 3,B) is known as Skorokhod imbedding.
(a
2.3 The Spaces Cm[a,83 of Continuous Functions
Let Cm[a,83 (resp., Cm[O, co), Cm(- 00, co)) denote the space of R"-valued continuous functions on the finite interval [a, S] (resp. infinite intervals [0, co), (-a, a)). If m = 1, we drop the superscript. The topology on all spaces is the topology of uniform convergence on finite intervals. In this topology all three spaces are complete, separable, and metric. These spaces appear in numerous applications of weak convergence theory, some of which will be seen in the rest of the book. The proofs of the following results are in Billingsley [B3, Section 81.
Theorem 2.3.1 Let { X , ( - ) }denote a sequence of processes with paths in
C"[a, B], w.p. 1. The sequence is tight? ifand only if
For each q > 0, there is an N , < 00 such that P{ lX,(a) I 2 N , } Iq,
all n,
(3.la)
for each E > 0, q > 0, there is a 6 E (0, 1 ) and an no < co such that P(
sup
If-sJ
i
IX,(t) - X,(S)) 2 E I q
for
n 2 n o . (3.lb)
t By the statement "{X,}is tight," we mean that the corresponding sequence of probability measures induced by {Xm( on C"'[a, B] is tight. a)}
2.4
THE SPACE Dm[a,@]
29
If for each E > 0, t,~> 0, there is a 6 E (0, 1) and no such that sup
Ix,(t)- X,,(s)( 2 E
SSi<S+d
i
I 16, for
n 2 no and
each s, a I s, s
+ 6 I @,
(3.2)
then (3.lb) holds. A sufJicient condition for (3.2) is the existence of a real K and an a > 0, b > 0 such that
E 1 X,(t) - X,(s) 1. I K It - s all n. (3.3) If the finite dimensional distributions of {X,,(-)} converge to those of X ( . ) , where X ( *) has continuous paths w.p. 1, and if{X,,(-)} is tight, then {X,,} converges to X(.) in distribution. To any measure P on Cm[a,@], there corresponds a continuous stochastic process X ( ) whose finite dimensional distributions (say at times t,, . . ., t,,) are given by the expressions P { x ( - ) :x ( t l ) E A,, .. ., x(t,,) E A,,}, where x( is the generic element of C"[a, @I.
-
a )
Criterion (3.2) is simpler to verify than (3.lb), since s is fixed inside the P{ 1. A supremum over a fixed interval must be estimated, rather than the more complicated supremum in (3.lb). The theorem holds for Cm[O,co) and Cm(- co, co) if we set a equal to zero (or to any other fixed time in the appropriate interval) and if (3.lb), (3.2), (3.3) all hold on each interval [0, TI or [ - T , TI, where 6, K and no can depend on T. 2.4
The Space Dm[a,@]
Let Dm[a,@] denote the space of R"-valued functions on the finite interval @I,which are right continuous on [a, @), have left-hand limits on (a, @I, and are continuous at @. Let D"[O, co)and D"( - co, 0 0 ) denote the spaces of R"-valued functions on [0, co) and ( - co, co), resp., which are right continuous and have left-hand limits. If m = 1, we drop the superscript. Our basic approximating processes will be derived from a family of Markov chains where h indexes the family. The parameter h may be, for example, a finite difference interval. It will be necessary to interpolate the chains to obtain a family (also indexed by h ) of continuous time processes {th( .)}. These processes will be piecewise constant and continuous from the right, although the limitst (which will be diffusions) will have continuous [a,
{r!},
t The word limit is used loosely to mean the process which is the limit in distribution of
{tA-)I.
30
2
WEAK CONVERGENCE OF PROBABILITY MEASURES
paths w.p. 1. It will be convenient to treat these processes as random variables with values in the spaces Dm[O,TI, Dm[O,0 0 ) or D"( - co, co). First we consider Dm[O,TI, under a particular topology. To see which type of topology we need, consider the following example. Let f,(-)denote the member of D[O, TI, which is defined byf,(t) = 1 for t E [T/2 + l/n, TI andf,(t) = 0 elsewhere, and l e t f ( * ) denote the function whose values equal 1 on [T/2, T] and are zero elsewhere. Convergence in distribution of a sequence of probability measures {P,}on R to a probability measure P on R only requires convergence of the distribution functions P,(- co, -3 at the points of continuity of P( - co, -1. Here, in order to get reasonably useful results, we need to put a topology on D[O, TI under which f,(- ) -f( .) as n co. The space D[O, TI will be endowed with the Skorokhod topology (Billingsley [B3, Section 141) which is defined by the following metric: Let A denote the class of strictly increasing continuous maps from [0, T] onto [0, TI. For arbitrary elements x ( - ) , y ( - ) of D[O, TI, define the metric dT(X(99 Y ( * ) )= inf E : sup 1 t - A(t) I I E, sup I x ( t ) - y(A(t))I I E,
-
1
T>r> 0
TZrzO
for some
A(-)E A).
The time scale transformation A( .) allows a distortion of the time scale of one of the functions x( -), y( -), so that it can better match the other of the two functions in a neighborhood of a discontinuity. The sequence {f,(. )} defined previously converges to the abovef(.) in this topology. If { x , ( * ) }and x ( . ) are members of D[O, T] and x,( .) converges to x( .) in the metric d,, then x , ( t ) converges to x ( t ) at all continuity points t of x ( - ) . If we delete an arbitrary neighborhood of the points of discontinuity of x( .), the convergence is uniform. As shown in Billingsley [B3], D[O, TI is separable but not complete under this metric. There is another metric dT (called do in Billingsley [B3, Section 14]), which generates the same topology and under which the space is complete. For each A(.) E A, define
The metric d , is defined by
We will always assume that the metric is dT,which implies in particular that D[O, T ] is complete and separable. The product topology is used on the product spaces D"[O, TI.
2.4
THE SPACE
Dm[a,83
31
Tightness on D"'[O, T ]
Let { X , ( . ), t I T }denote a sequence of random processes whose paths are elements of D"'[O, TI w.p. 1 for each n, and which induce the sequence of probability measures {P,} on (the Bore1 sets of) D"'[O, TI.Generally, in applications, the properties of the measures (such as those required to prove tightness) are deduced from the properties of the processes-properties such as moment values, stochastic continuity, or path continuity. The development of criteria for the tightness of {P,}involves first a characterization of the compact sets in D"'[O, TI, then the selection of a particular sequence of compact sets { K E E, --* 0}, and finally the development of conditions on the paths of the family { X , ( -)} of processes which guarantee that the paths lie in K , with a probability at least 1 - E for all n. The complete story is given in Billingsley [B3], and we will only state the results. (induced by the Theorem 2.4.1 ([B3, Theorem 15.31) The sequence {P,,}, sequence ofprocesses X,(-)), on D"'[O, TI is tight ifand only if(4.1)-(4.4)hold. For each real q > 0, there is a real N , < co such that
(
i
P sup IX,(t)I 2 N , I q, T > tZ 0
all n 2 1. (4.1)
For each real E > 0, q > 0, there is a real 6 E (0,1) and an no < co such that
I
SUP
P \ T L t 2 L t L t ~ L 0. l t 2 - 1 1 1 6 6
min[ I x n ( t ) - xn(t1)
I
I x , ( t , ) - X , ( t ) I] 2 E I q,
PI
P
i
sup
n 2 no
I
I x , ( t ) - X , ( s ) ( 2 E I q, n 2 n o ,
d>t>SLO
sup
19
i
l x , ( t ) - x , ( s ) 1 2 E 2 q, n 2 n o .
TZths2 T-6
(44 (4.3) (4.4)
Equations (4.1),(4.3),and (4.4)are usually easy to verify. The moment condition (4.5)implies (4.2),and will often be used in the sequel (see Billingsley [B3, proof of Theorem 15.61). There is a real K for which E I x n ( t , ) - xn(t)I' Ixn(t2) - X A t ) I'
for all
K I t 2 - '1 l2
T 2 t 2 2 t 2 t , 2 0.
(4.5)
32
2
WEAK CONVERGENCE OF PROBABILITY MEASURES
Conditions (4.3) and (4.4) guarantee that the probability of a discontinuity on [0,6) or on ( T - 6, TI goes to zero, uniformly in n as 6 -,0. They assure us that the "limits" are continuous (w.P. 1) at t = 0 and t = T. Equation (4.2) is a type of equicontinuity condition-basically neglecting a finite number of discontinuities: it requires a type of equicontinuity (with a probability arbitrarily close to unity, uniformly in n ) on either one side or another of each t E (0, T). Frequently, the processes corresponding to a limiting measure will have continuous paths w.p. 1. There is a criterion for this-in terms of properties of the {X,( .)}.
Theorem 2.4.2 (Billingsley [B3, Theorems 15.5 and 8.31) The sequence {P,) on D"[O, T] is tight iJ(3.la) and either.(3.lb)or (3.2)or (3.3) hold. I f P is a weak limit of{P,} and X ( .) is a process inducing P on D"[O, TI, then X ( . ) has continuous paths w.p. 1. Let X ( . ) denote a process with paths in D"[O, TI w.p. 1 and define T, as the set o f t E (0, TI for which P { X ( t - ) = X ( t ) }= 1. In particular, t = T is in T,.
Theorem 2.4.3 (Billingsley [B3, Theorems 15.4 and 15.61) Let P, be induced on D"[O, TI by a process X,( whose paths are in D"[O, TI W.P. 1. Zf (4.2) or (4.5) hold and iJfor each integer m and set ( t l , . . ., t,) E T, a),
in distribution, then X,( * ) -,X ( * ) in distribution. We also have: I f X,(*) + X ( *) in distribution and ( t l , . . ., t,) (Xn(tl), xn(tm))-* (x(t1), x ( t m ) )as n+ 00.
E
T,, then
Define the finite dimensional projections as follows: For t l , . .., t, E [0, T] and x(.) E D"[O, TI, define the projection (mapping D"[O, TI to Rmn)n(tl, t n ; X) = ((x(tl),..., x(tn))* Suppose that P is the weak limit of a tight sequence {P,,) on D"[O, TI. Since the finite dimensional projections are measurable [B3, p. 1211, the measures induced by the projections can be used to construct a separable stochastic process X ( . ) on some sample space. By Billingsley [B3, Theorem 15.81, the sample paths of the process X ( .) are right continuous at t = 0, have left-hand limits at t = T and are continuous on (0, T), except for simple jumps. If (3.la,b) holds for X ( *), then X ( will be continuous w.p. 1, and will obviously have its paths in D"[O, 171 w.p. 1.. Also, if P { X ( t )= X ( t + ) }= 1, t E (0, T), then we can define the paths to be right continuous, without altering the finite dimensional distributions. In both of these cases, a e . 9
a )
2.4
THE SPACE
Dm[a,P]
33
we can assume that the process X ( . ) has paths in D"[O, T] w.p. 1, and, indeed, induces the measure P { - } on D"[O, TI. If X,( .) + X ( -) in distribution, then we can assume, via the Skorokhod imbedding, that the probability space is chosen so that X,( ) + X ( - ) w.p. 1 in the topology of D"[O, TI. In particular, there is (w.P. 1) pointwise convergence at all t where X ( .) is continuous. If X ( - ) is continuous w.p. 1, then the convergence is uniform w.p. 1. The space Dm[a, P] for arbitrary finite a, j?,P > u, is defined exactly as D"[O, TI was defined. We simply shift the time origin and set T = P - u. In the special case where a = - P = - T, denote the metric by d'i.
-
The Space Dm[O, co) There are several natural extensions of the metric d;. to the space P [ O , 00). The details of one of them appears in Lindvall [Ll] and we will describe it roughly. Let x(.) denote a generic element of D"[O, co) and, for each integer j , define the function g j ( - ) on [0, 00) by: g j ( t ) = 1 for t S j , g j ( t ) = 0 for t 2 j + 1 and g j ( t ) = j 1 - t on b, j 13. Define xj(-) = x( -)gj( Considering xj( - ) as a function on [0, j 11, define the metric
+
a).
+
+
Under this metric, Dm[O, co) is a complete separable metric space. (Actually, Lindvall considers each xj(. ) as a function on [0, 00) and maps [0, co) into [0, 11 via the transformation & ( t )= -log(l - t).) A sequence { X , ( * ) }is tight on D"[O, 00) if and only if { X i ( . ) } is tight on D"[O, j + 13 for each integer j . Then, we need not verify (4.4). The metric on the space Dm(-co, co) is defined similarly to that on D"[O, co). Define gj( - ) on (- 00, 00) by g j ( t ) = gj( - t) = gj(t), and xj( .) = x( *)gj(.), for an arbitrary x( .) E D"( - co,a). Considering xj( - ) to be an element of D"[ - j - 1, j + 11, define the metric
where d; is the Skorokhod metric on D"[ - j - 1, j + 11. All the statements concerning Dm[O,co) are true here [except that neither (4.3) nor (4.4)plays a role anymore], provided that we replace all terms of the form [0, u] (a > 0) by [ - a , u]. The D"( - 00, co) case is only alluded to and not dealt with in detail in [Ll], but the development would be almost identical to that given there for D"[O, 00).
2
34
WEAK CONVERGENCE OF PROBABILITY MEASURES
2.5 Weak Convergence on Other Spaces We will sometimes use the spaces R + = [0, a)with the usual Euclidean topology, and its one point compactification R’ = [0, a]. The space LT of R”-valued square integrable functions on [0, TI will also be used, but in a special way that does not require many of the concepts of weak convergence. Since LT[O, TI is complete and separable, each single probability measure on it is tight. As will be seen in Chapters 8 and 9, the Lp space is useful in problems arising in control theory, since the Lp spaces are usually more natural homes for the actual control than are the C or D spaces. We just note the following. Let w(-,.) (say Rr-valued) denote a Wiener process, 4 a compact convex set (in some Euclidean space R”), and u ( . ) a 4valued separable and measurable process on [0, T ] which is nonanticipative with respect to w(*,-). Then u ( o , - ) lies in L’;[O, TI for almost all o since u ( o , * ) is Lebesgue measurable for almost all o.The processes u ( * , *), w( -, induce a measure on (the Bore1 sets of) L’;[O, T ] x Cr[O, TI = X.Let u ( . ), x(. ) denote generic elements of LT[O, 7’1 and C[O, TI, resp. Under the induced measure, any measurable functional of u( - ) which depends only on u(s), s I t, is independent of any measurable function of x(. ) which depends only on x(z) - x ( t ) , z 2 t for each f . Let (U,P‘,wl)denote a probability space with generic element o‘and let U(.), W ( . ) denote random variables with values in Ly[O, TI, C[O, TI, resp. That is, for each o’E U, U ( o ’ )is an equivalence class in Ly[O. T ]and W(w’) is an element of C‘[O, 7’3. Let N((o’) and U,(o’) (for a fixed member of the equivalence class) denote the values at time t. Suppose that the marginal distribution induced on C‘[O, T ] is a Wiener measure and that for almost all a’, and any element of the equivalence class of U(o’), V,(o’) is %-valued for almost all t, where % is a compact convex set. Let V ( . ) be nonanticipative with respect to W( in the sense that any measureable function of V (* ) that depends only on the part of U ( - )on [0, t] is independent of - K(-), T 2 z 2 t } for each t E [0, TI. Then there is a probability space with 42- and R‘-valued separable and measurable processes u ( * , .) and w(-,-), resp., defined on it, where w(*,.) is a Wiener process, u(*, - ) is nonanticipative with respect to w( and they induce the measure of U ( .), W ( on I. In particular, for each w’ not in some null set, we can choose a member of the equivalence class of U(o’) such that the process with values V,(w’) is separable, measurable, and @-valued, where U,(o’) is the value at t of the chosen member of the equivalence class of U(o’). The purpose of these remarks is simply that it will occasionally be convenient to view a control as an abstract element of an L , space and sometimes a )
0
{w(.)
)
.,
a),
a)
2.5
35
WEAK CONVERGENCE ON OTHER SPACES
as a @'-valued random process; thus it is desirable to know that the two views are consistent. Sometimes it is convenient to set T = co.We then use L;, A function is in L.7, i f its restriction to [O, TI is in LT[O, TI,for each T < co. Let u, u be generic elements of I,;, If ILzTn denotes the norm of L;[O, TI, then L;, is separable and complete under the metric
,:
,
I
,,
,
and when using LT, we can and will suppose that the space is separable and complete.
CHAPTER 3
Markov Chains and Control Problems with Markov Chain Models
In this chapter we present a brief discussion of the equations that are satisfied by various functionals of Markov chains and by the cost functionals associated with optimal control problems on Markov chain models. Since the discretization methods of Chapters 5-10 yield equations that are satisfied by functionals of Markov chains and since such chains will be used to approximate diffusions, we present an outline of the relevant concepts in the theory of Markov chains in this chapter. More detail appears in Kushner [K5]. Our notation differs somewhat from the notation used in that reference. Since we are concerned with computational techniques, the state spaces of the Markov chains will be assumed to be finite. This finiteness will simplify the discussion in this chapter, but most of the results of the following chapters d o not require finiteness o€ the approximating chains. Notation is introduced and the dynamic programming equations for the more standard types of control problems on Markov chains are given. There are some brief remarks on computational methods and the linear programming for36
3.1
EQUATIONS SATISFIED BY FUNCTIONALS OF MARKOV CHAINS
37
mulation of controlled Markov chains under side constraints is discussed. This will be useful when we treat control problems on diffusion models with side constraints. 3.1 Equations Satisfied by Functionals of Markov Chains 3.1.1 Notation Let { X , } denote a Markov chain on the state space S. Let aS denote a jixed subset of S (a “boundary”), and define the stopping time N = min{n: X , E as}. If X , E S - 3.9 for all n, then following our usual convention with stopping times, we set N = co.If the chain is homogeneous, we denote the transition probability P { X , + , = y l X , = x } by p‘.“y), for x, y E S and all integers m, n 2 0, with pL.) written as pxy . If the chain is nonhomogeneous, we use the terminology p$;)(n) = P { X , + , = y I X , = x}, x, y E S, for all integers m, n 2 0, with p$,)(n) written as p,,(n). I f f ( * ) is a real-valued function on S, then we write the expectation as (assuming that it exists)
I
XI = 1 f ( y ) Y = E[f(Xn+rn)IXn = XI = 1 &;)(n)f(y)
Ex f ( x m ) = E[f(Xm) Xo = Ex,. f(xn+rn)
Y
-
(homogeneous case),
(nonhomogeneous case).
-
b( ), bT( ), r l ( ., -), b,( ., .) denote bounded real-valuedfunctions on - as, ( S - as) x [0, 1, ...) and 8.9 x [0, 1, .. .). The functions I ( - ) and r , ( - , .) will be “running” costs or “cost rates” and b(.), b T ( . ) , b , ( * , - ) will be boundary costs. Assume that the states are ordered in some definite way, so that quantities such as the transition matrix P = {p,,, x, y E S} are well defined.
Let I (
a),
S - as, as, S
3.1.2 Cost Accumulated until a S is Reached. Homogeneous Case Suppose that E x N < co
for all X E S - ~ S
and define the quantity R ( x ) by TN- 1
38
3
MARKOV CHAIN CONTROL PROBLEMS
The R ( x ) is well defined by (1.1). Equation (1.2)can be rewritten in the form
Under (l.l), the right-hand side of (1.2)is the unique bounded solution to Eq. (1.3). Suppose now that I ( * ) = 0; drop (1.1)and consider the functional
Then R(
a )
satisfies
and the right-hand side of (1.4)is a solution to (1.5).It is not necessarily the unique solution since any function A(.) that satisfies the boundary condition, and such that { A ( X n ) } is a martingale, also satisfies (1.5). If P,{N < co}= 1 each x E S - dS, then the solution is unique (this is not necessarily true if S is not finite).
3.1.3 Cost Accumulated until a Fixed Terminal Time Let M denote a fixed integer (a “terminal” time) and define the functional? ~ ( x n) v =~
[
xn ,
(MnN)-1
1 r (xi i ) +
i=n
1
I
+ b l ( X N , N ) I { M , N ),
b T ( x M v(M
X E
S - as, n < M ,
The boundary conditions (1.7)are implicit in (1.6). t The symbol 1“ is the indicator function of the set A.
(1.6)
3.1
39
EQUATIONS SATISFIED BY FUNCTIONALS OF MARKOV CHAINS
Equation (1.6) can be rewritten in the form
+ bi(xN br
w ~ { M ~ N 1) x n + 1 ]
R(x, n) = ~ I ( xn), + EX,"R(Xn+1, n
+ l),
n < M, x E S - as, (1.8)
with boundary conditions (1.7). The right-hand side of (1.6) is the unique solution to Eq. (1.8).
3.1.4 Discounted Cost Let j? denote a real number in (0,l) and assume that ( X , } is homogeneous. Define the function R ( * )by R ( x )= Ex
r1
i = O B'r(xi)
+ @Nb(xN)]*
(1.9)
Equation (1.9) can be rewritten in the form N- 1
i= 1
Equivalently, r ( x ) + j?ExR(Xl), R ( x ) = (b(x),
x E S - as x E as.
(1.10)
The right-hand side of (1.9) is the unique bounded solution to (1.10). 3.15 Average Cost per Unit Time
Suppose that the chain is homogeneous and the state space consists of a single positive recurrent, a-periodic class, with invariant measure (which we always write as a row vector) (p(x), X E S}. Thus p = p P , where P = {pxy,x, y E S). Define y by or by =
lim (l/N)Ex N+CU
N- 1
1 I(Xi).
i=O
3
40
MARKOV CHAIN CONTROL PROBLEMS
-
Suppose that there is a bounded function R( ) on S and a real number 7, such that (1.1 1) R(x) jj = r(x) E,R(X,).
+
+
Then, by premultiplying each term of (1.11) by p ( x ) and summing over x, and using ,u = p P , we have j j = y. Conversely, there is a bounded function R ( * )and a unique value 7 (which must then equal y ) which solves (1.11). The function R ( . ) in (1.11) is not unique, but j j is. The chains arising from the discretization methods used in Chapters 5-10 will often be periodic.? Let {X,}denote a homogeneous Markov chain which is a single positive recurrent class of period u 2 1. The following remarks are taken from Chung [Cl, Sections 1.7,1.8, and 1.151. The limit p ( y ) defined by n
p ( y ) = lim ( l h )1 P$; , i=O
n+w
x, y E S,
always exists, and does not depend on x. Letf(-), g ( - ) be real-valued functions on S. Define S'(f)= P ( Y ) f ( Y ) 3 E, f ( X ) .
c Y
Then, if S'(f) and F ( g ) are bounded and S ' ( g ) # 0, then
i
i=O
f(xi)/
fg(xi) S ' ( j ) / p ' ( g ) +
i=O
w.p. 1 as n + a.
The row vector p = { p ( x ) ,x E S} is the unique solution (up to a proportionality constant) to ji = pP. Sometimes it is more convenient (faster) to compute p a s follows. The system with transition matrix P" is a-periodic, and has an invariant measure which solves ji = jiP".
It is readily verified that the p given by 1 p =-[F + p p + U
* * *
+jp-'1
satisfies p = ,up. Of course, if p is used to compute an expectation, it must be normalized so that Cy,u(y)= 1, p ( y ) 2 0 all y. Note, for example, that the random walk, which is the usual approximationto the Wiener process, has period 2. In particular, X,, = X, k 1, each with probability f .
3.2
41
OPTIMAL STOPPING PROBLEMS
3.1.6 Methods of Solutions Equations (1.3), (1.5), (1.8), and (1.10) are all linear. Under the stated assumptions, (1.3), (lS), and (1.10) are of the vector form
R=Q~?+A, where r? = {R(x),x E S - dS} is the quantity to be found. In particular, for (1.3), A = { r ( x ) x Y b ( y ) p x y x, E S - as},and Q = {p,,,, x, y E S - as}.If E,N < 00, x E S - dS, then the spectral radius of Q is strictly less than unity, and any of the methods for solving such linear equations can be used. See Varga [Vl] for more details. See also the remarks at the end of section 3.4. The obvious way to solve (1.8) is via "backward iteration," starting with n = M. To get y = in (l.ll), we need to calculate the invariant measure p. This can be done by letting uo denote any probability measure on S and using the iteration u"+' = u"P (a-periodic) or via any method that can calculate the eigenvector of P corresponding to eigenvalue unity. (There is only one such eigenvalue unity under the ergodicity assumption.) If we wish to calculate E, f(X)(Section 3.1.5) when {X,} has period u > 1 and the chain is a single recurrent class, we can calculate p via the iteration u"+ = u"P", UP"-']. where uo is a probability on S and then use p = (l/u)[n +
+
+
3.2 Optimal Stopping Problems
32.1 Introduction One of the simplest of optimal control problems with a Markov chain model is the optimal stopping problem. Only one decision needs to be made: the process evolves until we decide to stop it. The cost is the sum of a cost associated with the evolution of the system until the stopping moment and a cost associated with the point at which we stop. For a simple example, let X, = accumulated number of heads minus tails in a coin tossing game with independent trials. Suppose that there is a cost c to be paid for each toss. If by stopping at time n when X, = y we get a terminal payoff of y, we may wish to find the stopping time m at which (x = initial state) E&m - Xml is minimized. The random stopping time m would usually depend only on the observed data; i.e., if m 2 n, then whether or not m = n would depend only on the observed values of X , , . . ., X,. Actually, we can enlarge the class of allowable stopping times without decreasing the minimal average
42
3
MARKOV CHAIN CONTROL PROBLEMS
return, provided that the larger class does not “secretly” give more information about X , , l, . . . than is already contained in X , , . .., X , for each n. For example, we can allow “ randomized ” stopping times. The extension will be formalized, and is quite useful for mathematical purposes. The gambling example may seem frivolous, but it is a prototype of a large class of important problems in sequential hypothesis testing, control theory, or economics. There are numerous decision problems where only one decision (or action) is possible-whether it is termination or initiation of an investment, the firing of a weapon, or the timing of a single thrust of a motor, etc. An excellent reference is Shiryaev [Sl], where the reader will find proofs of all the unproved assertions in this section. In the simple context of the finite Markov chain, the results are relatively simple to prove as special cases of Markov chain optimization problems (Kushner [K5]).
3.2.2
Problem Formulation
First, we need to introduce several classes of stopping times. We need to consider these several classes because, when we treat the stopping time problem for the diffision, we will be able to show that the sequence of approximations (in Chapter 8) to the optimal (minimal) cost converges to the cost for a stopping problem for the diffusion, corresponding to a possibly randomized stopping time. In order to show that the particular limit is also the minimal cost, over the class of “path dependent nonrandomized” stopping times, we need to consider the larger randomized classes. Let the homogeneous Markov chain { X , } be defined on the probability space (a,a, a,, P,), where {a,} is a sequence of nondecreasing sub-oalgebras of 8,X , is B, measurable and P,{X,+, = y 1 a,}= Px,{Xm = Y} w.p. 1. Equivalently,
Equation (2.1) is just the defining Markov property for the chain. However, 93, can measure other random variables besides X , , ..., X , ; it may, for example, measure $, , . .., $, , where {$,} is a sequence which is independent of { X , } under each P , . Let B,,(x)denote the minimal o-algebra over which X , , ..., X , is measurable. Let 9-gMdenote the class ofpure Markov stopping times m,possibly taking infinite values. If m E 9 g M ,then there is a set A c S for which m = min{n: X , =
00,
E
A)
if X , .$ A, all n < co.
3.2
43
OPTIMAL STOPPING PROBLEMS
Conversely such sets A correspond to stopping times m E S,",. For each x E S, F P M ( x )and S(,;x) denote the subclasses of F;,(X) for which Px{m < a}= 1 and E,m < 00, resp. A stopping time m relative to {B,} is in the randomized class 5:if { m = n} E B, for each n. For each x E S, the subclasses F R ( x )and F;(x) of F:(x) denote the sets of stopping times for which P,{m < m} = 1 and E,m < 00, resp. If d ( u ) is a class of stopping times, strategies, etc., for each u in some set [e.g., d ( u ) = S R ( x ) ,where u = x], we write A? = d ( u ) ; i.e., we drop the argument if the defining property holds for all values of the argument. Whether or not m = n for a stopping time m E F,",(x) depends only on the value of X,, but whether or not m = n for a stopping time m E F ; ( x ) can depend on the values of all of X, , . . . ,X, as well as, perhaps, additional variables (which, however, do not give more " information" on X , + ... than is already contained in X o , .. ., X,). Let b( ) and r( denote real-valued bounded functions on S. Suppose that there is a ro > 0 such that inf, r(x) 2 r , > 0. For each m E Y:(x), define the function (which is infinite unless m E F : ( x ) )
n
,,
-
a)
and define V ( . ) by
V(x)=
m
inf
E
R(x, m).
(2.3)
~PM+(x)
At each instant, we must either stop or continue. Hence, the appropriate dynamic programming equation is V ( x ) = min[b(x), Ex V ( X , )
Under our conditions on
I(
0
)
-
+ r(x)].
(2.4)
and b( ), an optimum m exists. Since we only
' need m such that R(x, m) Ib(x), the optimum m must satisfyt E,m I 2 max I b(x) l/ro .
(2.5)
X
Equation (2.4) has a unique solution which is precisely the right-hand side of (2.3). It can be solved via the iteration
V"+'(X)= min[b(x), E x T/"(X,)+ r(x)],
V"(x) arbitrary,
(2.6)
for which it can be shown that P ( x ) + V(x) as n + co. Actually, (2.2) is a special case of a cost equation for a more general control problem (Section 3.4) and there are algorithms for the computation of V ( . ) which converge
t (2.5) follows from
--axy Ib(y)I
+ (E,m)r, 5 V ( x ) 5 b(x).
3
44
MARKOV CHAIN CONTROL PROBLEMS
much faster than does (2.6) (see Kushner and Kleinman [K9] and Kushner [K5], and the remarks at the end of Section 3.4). A sufficient condition for the stopping problem to have an optimal solution is that there be a solution to (2.4), and, in that case, an optimal stopping rule is given by the minimizing rule in (2.4). Then, our optimal policy consists in stopping at the first n for which V ( X , ) = b ( X , ) ; i.e., on first hitting the stopping set B
= {x
: V ( X ) = b(x)}.
For each E > 0, define the set B,
If a stopping time
in
= {X
: V ( X )2 b(x) - E } .
is such that X ,
E
B e , then
R(x, in) IV ( x )+ E.
Also (Shiryaev [Sl, Chapter 11.71) V ( x )=
inf
m E YR+(x)
(2.71
R(x, in).
The proof of (2.7) is not hard but will be omitted. It depends on the fact that the existence of a solution to (2.4) guarantees that there is an optimal solution in* in the class S,+,, with value V ( x ) ,and then uses the minimization property in (2.4) and the fact that R(x, m*)= V ( x ) , to compare in* to any in E
S,+.
3.3 Controlled Markov Chains: Families of Controlled Strategies For the simple stopping problem of Section 3.2, there was no need to formally introduce the notion of a controlled Markov chain (of which the stopping problem is actually a special case). In a controlled Markov chain, we have some choice over the transition probabilities, and can select them to minimize some cost function or to satisfy some constraint. The controlled model has numerous applications-in many different areas. See Derman [Dl], Howard, [H2], Kushner [K5], Wagner [Wl], and Hillier and Lieberman [Hl]. Most of the results in Sections 3.3-3.8 can be found in Kushner
~~51.
Let @(x) denote the action or control set associated with the state x; @(x) will either be a finite set of discrete points or a compact set in some Euclidean space. The one-step transition probabilities pXy( ) are functions of the control. If @(x) is not discrete, then we will assume that it is independent
-
3.3
45
FAMILIES OF CONTROLLED STRATEGIES
of x and that the p x y ( - ) are continuous functions on %! = %!(x).For any a E %(x), pxy(a)= 1 and pxy(a)2 0. The initial time is n = 0 unless otherwise specified. A quantity
cy
a
= (u,
, u1, ...)
is a pure Markov control law or policy if each ui is a function on S and if the value ui(x)at x lies in %!(x). With the use of the law a,the (possibly nonstationary) transition probabilities are defined by pxy(u,(x))= P { X , +
=y
1 X , = x, law a used}.
The terms E; and P; denote expectation and probability, given initial condition x and that law a is used. If ui( - ) = u ( . ) for some function u( - ) and all i, then the law is said to be stationary and we may write E",, Pxfor E;, P;, resp. We define u,(X,) = U,, the actual sample control action used at time n. The class of pure Markov laws is denoted by WPM. We need to define a randomized class V, of control laws and will do it in a way that is suited to our later needs. Let (Q, P, a)denote a probability space. Let the sequences of random variables {$,, X , } be defined on the space and let W,denote the minimal a-algebra over which { i j i ,X i , i I n} are measurable. Each $,, takes values in a set A, and X , in S. Let u,,(-) be a measurable function of { $ i , X i , i I n} with values in %(x) if X , = x. Let U , = u , , ( $ ~ ., . ., $,, X , , . . ., X , ) denote the actual sample value. Suppose that the distribution function of @, given X , , .. ., X , ; $, . . ., $ n - l , is given. The sequence a = (u, , u1, . . .) is said to be in WR ifthe law ofevolution o f { X , } is given by (3.1) (where there is an obvious abuse of notation) for each initial condition z E S. According to our terminology, the sample space may depend on the policy n since {@,} depends on n.
P t { X 0 = z } = 1, P : { x l = Y I $0
9
XO} = Pzy(UO(II/O
9 z))9
(3.1)
C{Xn+1= y ( $ O , . . * ) $ n ; X o , . - - ,X n = x } =pxy(un($o,
$,;
Z,
XI,
Xn-11
a ) ) .
In applications in the later chapters, the random variables {$,} will either be explicitly given or will be easily constructable from the given randomized control law. For an example, suppose that we desire a law satisfying P { U , = - + a J X o ,...) X , } =I. 1
46
3
MARKOV CHAIN CONTROL PROBLEMS
+,
Then we only need let {+,} be independent and have take values k 1, each with probability to generate U,, we need only toss a fair coin and let U,= f a i f + , = f1. The class V, and the classes of controls and stopping times defined in the following are introduced owing to the nature of proofs and approximations in Chapter 9. In Chapter 9, the approximation to the optimal control problem for the diffusion will be an optimal control problem for a Markov chain and the sequence of minimum cost functions corresponding to the sequence of approximations will converge to a cost functional for a controlled diffusion with a possibly randomized control. In order to show that the limiting function is indeed minimal, we will actually need to approximate arbitrary controls so that they can be applied to the chain. The various classes of controls appear in this context. Let us introduce some additional terminology. Let G I denote an arbitary set and define N = min{n : X, .$ G , } . With the subscript c standing for either PM or R, let WC(x, G,) and V:(x, G , ) , resp., denote the sets of policies n in Wc for which P:{N < 00) = 1 and E t N co = 1, resp. As usual, Vc(Gl) = () VC(x,G,),etc. Let Y : ( n ) ,YC(x,n), and Y:(x, a),resp. (and the intersections Y , ( n ) = () Y c ( x , n), etc.) denote the classes of stopping times m for the controlled problem when the control policy is fixed at n E V c and either P",{m < co} 5 1, P:{m < co} = 1 or E:m < co, resp. (and where the properties hold for all x, etc.).
4;
-=
3.4 Optimal Control until a Boundary Is Reached We return to the cost structure of Section 3.1.2. Suppose that we wish to control the chain until the first time that the set as is reached. Let i ( - ,- ) denote a bounded real-valued function on {x, u : x E S - as, u E %(x)}. If @(x) is not discrete andfinite, then suppose that i ( x , - ) is continuous on a(.). Assume that either (a) or (b) holds. (Recall that N = min{n : X, .$ S as = C , } . ) Case (a) minx.. i ( x , u ) > 0, and there is at least one? n E VgM(S- as). Case (b)
sup E ; N < co. nsVR X
t V&(S - 8s) = V:M(G,) plies that E:N < 03, all x.
=
UIM(x, G,), where G, = S - as; i.e., II E &% ;'
- as) im-
3.4
OPTIMAL CONTROL UNTIL A BOUNDARY IS REACHED
47
Case (a) or (b) can often be verified from the data of the problem, as we shall see in some of the examples in this volume. For each K E V, ,define the cost R(x, ). = EE
[1
N- 1
i=O
i(xn
9
vn)
+ b(xN)]
(4.1)
and define V ( x )= inf R(x, K). z E WPM
Under (a), there is at least one K E WpM under which (4.1) is finite for each x and under (b), (4.1) is finite for all K E VR, x E S - 8s. Hence under (a) or (b), V ( x ) is finite for each x . By a formal dynamic programming argument, V( .) satisfies (4.3) V ( x ) = inf [E", V ( X , ) u E Q(x)
+ F(x, a)],
= b(x),
1,
x
E
S - as,
E
as,
(4.3)
where for a functionf( -), E: f ( X , ) is defined by p,,(a)f(y). Under either (a) or (b), (4.3) has one and only one solution, which is given by the righthand side of (4.2) and there is an optimal policy in VpMwhich is stationary. Also, under both (a) and (b), inf R(x, K ) = V ( x ) .
ZEOR
(4.4)
Computational techniques for solving (4.3) are given in Kushner [K5] and Kushner and Kleinman [K9]. Remark on Computational Methods
The methods for solving equations such as (4.3) are very similar to those for the purely linear problem, where there is no infimum operation. A comparison of various techniques appears in Kushner and Kleinman [K9] and we will merely list some alternatives. Let V"( denote an arbitrary bounded vector and assume either Case (a) or (b) (the methods also work-for thz discounted problem). Suppose V(-) is given. In the Jacobi method, V"+'(.) is given by a )
The boundary conditions are always V"(x) = b(x),
x
E
8s.
3
48
MARKOV CHAIN CONTROL PROBLEMS
Suppose that there are K points in S - d S and order them in some way. The Gauss-Seidel iteration is ~ + 1 ( x=) inf
a € W(x)
r i l p x y ( a ) ~ + l (+y ) y= 1
K
1p,,(a)V(y) + ~ ( xa,) ] ,
y=x
x
=
1, ..., K.
(4.7)
Choose a parameter w 2 1. The accelerated Gauss-Seidel iteration is
+ t ( x , a)
1+
V"+l(X) = O V " + 1 ( X )
x
=
1, ..., K ,
(4.8)
(1 - w ) V " ( x )
Iterations (4.5) and (4.7) always converge to the correct solution, while (4.8) does so only for o in an interval [l, a), a > 1. Method (4.7) never converges more slowly than method (4.5),and (4.8) is preferable to (4.7) for suitably chosen o.Usually, the convergence is considerably faster for (4.8) for suitable w . The ordering of the points of S - as plays an important role in determining the convergence rates of both (4.7) and (4.8). The use of probabilistic intuition can be quite helpful in selecting a good ordering; a clever choice can accelerate the convergence considerably. See [K5,Chapter 41. Some further remarks appear in Chapter 6. 3.5
Define, for
K E
Optimal Discounted Cost
gRand /? E (0, l),
and define
V ( x )= inf R(x, n). n
E UPM
The costs are well defined and finite without Case (a) or (b). A straightforward, formal dynamic programming calculation yields that V ( x ) satisfies V ( X )= inf [BE: V ( X , ) aE W(X)
+ i ( x , a)],
x
E
S - as,
(5.3)
x E as. Equation (5.3) has a unique bounded solution, which is given by the righthand side of (5.2). There is an optimal policy in gPMwhich is stationary and (4.4) holds here also. = b(x),
3.6
49
OPTIMAL STOPPING AND CONTROL
3.6 Optimal Stopping and Control The optimal stopping problem of Section 3.2 is actually a special case of the problem of Section 3.4. To see the correspondence, add a fictitious state {0}to S and let 8s = (0). Let % ( x ) = (0, 1) and set the terminal cost b(0) in (4.1) equal to zero. For x # 0 define the controlled transition probabilities P,,(*) and costs f ( x , a ) by p,,(O) = pXy, f(x, 0 ) = ~ ( x ) px0(1) , = 1, and 7(x, 1) = b ( x ) ,where this b(.) is the stopping cost term of the stopping problem. With r(x) 2 ro > 0, we have Case (a) of Section 3.4, where m 1 = N = min{n: X , = 0). In many examples, we are able to control the process up until the stopping time. This is, obviously, still a problem of the type dealt with in Section 3.4, and we will now formulate it in those terms. The set %(x) will contain the action 1 (stopping) and the various possible control actions other than the stopping action. Set %,,(x) = % ( x ) - { l}. Again, 8s = {0} and m 1 = N = min{n : X , = 0). For each n E W R and m E S : ( x , n),define the cost
+
+
[1
1
m- 1
R(x, m, n) = E t
and define
V ( x )=
i=O
? ( X i , Vi)+ b ( X , ) , inf
x # 0,
R(x, m, a).
n E QPM, rn E FPM+(x. a)
(6.1) (6.2)
It follows from Section 3.4 that V ( - )is the unique solution to the dynamic programming equation
1
+
V ( x ) = min b(x), min [E", V ( X , ) i ( x , a ) ] / ,
V ( 0 )= 0.
a E WOW)
x # 0, (6.3)
There are optimal a and m in the classes of pure Markov stationary policies and stopping times, resp., and also V ( x )=
inf
R(x, m, n).
n E 5 t ~m . E FR+ (x, II)
(6.4)
The optimal stopping set is B = { X : V ( X )= b ( x ) } .
(6.5) If V ( x ) < b(x), then we do not stop and an optimal action is that which attains the minimum over 4P0(x) in (6.3). Define the .+stopping set and
B, = { x : V ( x ) 2 b ( x ) - E ) me = min{n : X ,
E
BE}.
3
50
MARKOV CHAIN CONTROL PROBLEMS
If a denotes an optimal control policy, then R(x, m, , a) I V ( x )
+
(6.6) We can easily alter the problem to add the requirement of compulsory stopping on first reaching a set A, with associated stopping cost b( *). Equation (6.3) then holds for x $ A u (0) = d S and we need to impose the additional boundary conditions V ( x )= b(x) for x E A. E.
3.7 Impulsive Control Systems There is a generalization of the optimal stopping problem which is of some importance in applications where a sequence of decisions must be made and where one must also determine the optimal timing and “magnitude” of these decisions or actions as they will be called. The Markov chain version of this problem is just a special case of the discounted problem of Section 3.5. However, since the special case will arise when we treat the continuous time version of the problem (which is not a special case of the continuous time version of the discounted problem) we will describe it briefly here. Let (0) represent the decision “no action.” Let Q(x) denote the set of actual control actions, which is assumed finite for each x. Define Q(x) = Q(x) (0). Let x y E S for each x E S, y E Q(x). With Q(x), q i , and Qi replacing %(x), ui,and U i , resp., we define the control law a = (q,,, q l , .. .); Qi denotes the actual sample value of the control action at time i. Let B E (0, 1) and let p ( . , .) be a bounded function on Define the transition probabilities
+
+
if
Pxy(Y) = 1 PXJO)
= Pxy
.
Again, let VpMand V, denote the sets of pure Markov and randomized control laws, resp. The control law is called impulsive owing to the “sudden” and deterministic change in the state which is implied by (7.1) when there is an action with value in Q(x). For each a E V, define the cost functional
and
V(x) = inf R(x, a). Z E YPM
(7.3)
3.7
51
IMPULSIVE CONTROL SYSTEMS
The function defined by the right-hand side of (7.3) is the unique solution to the dynamic programming equation
1
~ ( x=) min P E : I / ( X , )
+ r(x),
+
inf [BE: v(x,) p(x, y ) Y
E
Q(x)
+ r(x)]).
(7.4)
The E: implies that the transition probabilities are p x y . There is a pure Markov stationary optimal control, and we can also replace V ,, by VR in (7.3) without changing the value. An alternative notation, which is useful in the continuous time version of the model, involves viewing the problem as one in which there is a sequence of action times ml, m 2 , .. . and a sequence of corresponding actions Q1,Q 2 , . .., where Q i= Q m iand Q i E Q ( X m J Q. iis the ith actual (nonzero) action. We then rewrite R(x, K ) in the form R(x, { m i , Qi}). The {mi} is a sequence of nondecreasing stopping times with respect to an appropriate sequence of nondecreasing o-algebras. In this sense the model is an extension of the stopping problem; instead of one action or transaction taking place, there is a sequence of actions or transactions and we must determine their timing and value. The functions p ( -, and I ( .) represent the transactions cost and the “running” rate, resp. The action set is defined by a )
i
B = x : ~ ( x=) inf
Y EQ W
+
V ( X , ) p(x, y )
+ r(x)))
(7.5)
and any minimizing function y ( x ) yields an optimal policy. It is not necessary that Px{rni c co} = 1;for, indeed, we may possibly never have an action. The problem can be generalized by allowing a continuously acting control. Reintroduce the control set %(x) and suppose that at each state x and time n we have a choice of selecting a control from Q ( x )or from %(x):If we select a control value y from Q(x),then the transition probabilities are of the impulsive form (7.1); if we select a control value a from @(x), then the transition probabilities take the value p,,(a) as discussed in Section 3.3. Then we assume that % ( x ) is either discrete and finite or compact and in the latter case, pij(-) is assumed continuous. The control policy takes the form K = (u,, q l , u 2 , q 2 , . . .) and the cost functional is (7.2) with ? ( X i , Vi) replacing r ( X i ) there. The dynamic programming equation is still (7.4) but with BE: V ( X , ) I ( X ) replaced by
+
inf [BE”, V ( X , ) a E ¶!(x)
+ ?(x, a)]
and all the previous conclusions concerning the optimal policy continue to hold. In order to emphasize the impulsive control aspect of the problem, we
3
52
MARKOV CHAIN CONTROL PROBLEMS
may write R(x, K ) as R ( x , {mi, U i , Qi}), where Qi is the ith actual impulsive action as previously defined. A stopping boundary dS and boundary cost b ( . ) can clearly be added to the formulation.
3.8 Control over a Fixed Time Interval There is a natural “control” analogy for the chain with cost structure (1.6) and the reader is referred to Sections 3.1.3 and 3.3 for the terminology. Let Fl(-, -, .) be a real-valued function, continuous in its third argument and defined on { x , n , a : x ~ S - d S , n = l ..., , M,~E@(x)).
For K
E
qR,define the cost
x $ dS,
n < M , (8.1)
with boundary condition
Define V(x, n) = inf R(x, n, K ) .
(8.3)
nE~ P M
Then the right-hand side of (8.3) is the unique solution to the dynamic programming equation (8.4) with boundary conditions (8.2): V ( X ,n) = inf [E”,, n v(Xn+1, n + 1) + ?t(x, Q a)], a E Q(X)
n < M,
x$dS.
(8.4)
There is an optimal pure Markov control law and (8.3) continues to hold if WpM is replaced by WR.
3.9
LINEAR PROGRAMMING FORMULATION
53
3.9 Linear Programming Formulation of the Markov Chain Control Problems The control problems on Markov chain models with which we previously dealt can be formulated as linear programming problems (see Derman [Dl], Kushner and Kleinman [KlO], Kushner and Chen [KS]. Most of the following results can be found in the second reference). There does not seem to be any numerical advantage to doing so if we stick to the problem formulations of Sections 3.2-3.8. However, in many cases, there are additional constraints on the state occupancies or on the control usage, which cannot easily be accounted for in the usual dynamic programming formulation but where a linear programming formulation will still be valid. For convenience, we will assume that %(x) contains exactly Lelements, although we can also treat the case of convex %(x) via a generalized programming approach. Let K be the number of states in S and set S = { 1, .. ., K}. There is no loss of generality in supposing that %(x) = (1, 2, . .., L}, which we shall do. If there are constraints of the type that we will be concerned with, then there may not be an optimal control for the constrained problem in the class of pure Markov controls and it may be necessary to consider randomized controls. Generally, we will need to consider only the class of stationary random Markov controls gSRM. Each K = (u, u, . . .) in gSRM is characterized by a collection of real numbers { y i j , i = 1, . .., M; j = 1, . .., L}, where Yij
2 0,
Yi = { Y i l ,
--.,Y i L } ,
C Y i j = 1. i
Thus, for each i E S, y i is a distribution over the control actions %(i). such a control law IL, we have the transition probabilities
For
Pn{x,+l = j l x O.r. . , X , = i } = P f f { X , +=l j J X , = i }
Thus the chain {X,} is a Markov chain under the law K even though K is not pure Markov. The usefulness of the introduction of randomized Markov laws can be seen by a simple example. Suppose that there is one constraint and only two possible pure strategies, n1 and z 2 . Suppose that n1 satisfies the constraint but gives a large cost, while n2 does not satisfy the constraint but gives a small cost. It is not hard to see that randomizing “fills in the gaps” and yields a strategy which satisfies the constraint and at a smaller cost than given by nl.
3
54
MARKOV CHAIN CONTROL PROBLEMS
Control until the Boundary Is Reached
We consider the problem of Section 3.4. The sets S and as are defined as previously but we modify the problem b y adding a Jictitious rest or absorbing state {0},and extend p i j ( * ) ,and F(i, - ) by defining pio(a)= 1
for i
E
F(i, a ) = b ( i ) for i
dS, all a E % ( i ) ; E
as, all a ;
por(a) = 1,
F(0, a ) = 0.
(9.1)
Let M i j denote the average number of times that X, = i and control action CJ,, = j E % ( i ) occur simultaneously and let (pl,. .., p K ) denote the initial probability distribution; namely, P { X o = if = p i , i = 1, . . . , K. Of course, { M i j }depends on the control law. We shall see how to determine the optimum { M i j }over all laws in WsSRM.The optimum { M i j }will determine the optimum control law. In the Markov dynamic programming formulation, it is not necessary to specify p ; the solution is optimal for all p. In the linear programming formulation, if p i > 0 for all i, then the resulting law is optimal for all other initial p if there are no additional constraints of the type (9.4). Under any n E CesRM, the linear system (9.2) of relations among the M i j
1Mil = M i = pi + x p j i ( k ) M j k , k, j
I
all i, j # 0, (9.2)
Mij2 0
is a direct consequence of the Chapman-Kolmogorov conservation of probability” equation for the steady state probabilities under the law n. If { M i j }solves (9.2), then yij = M i j / M ican be interpreted to be the probability that the control actionj E % ( i ) is used when the state is i. If EYN < 00, then (4.1) is well defined for x = i, and the cost equation (4.1) takes the form [recall the modifications (9.1)] “
x m
R(x, n) = E:
n=O
F(X,, V,)
=
x M i j F ( i ,j ) = z,
i. i
px = 1, x # 0.
(9.3)
The dynamic programming problem is equivalent to minimizing z under the constraints (9.2) and conversely. The simplex procedure will yield a pure Markov control (where yij = 1 or 0) in this case. Under (a) or (b) of Section 3.4, z has a finite minimum under (9.2). In fact, the dual system to (9.2) and (9.3) is precisely the dynamic programming equation (4.3). State Space Constraints
For any set of real numbers {aij, i = 1, . . ., K ;j = 1, . .., L; s = 1, . . .}, { b , , s = 1, ...}, the linear constraints (9.4) can be added to (9.2) i. i
3.9
LINEAR PROGRAMMING FORMULATION
55
If 1 such constraints are added, then the simplex procedure will yield an optimal policy (if one exists) with which the controls for as many as 1 states are randomized. Under Case (a) or (b) of Section 3.4, there is an optimal policy in %sRM with finite cost if there is one policy in WSRMwhich guarantees satisfaction of (9.4) with a corresponding finite cost (9.3). Examples Suppose that we want to constrain the use of an action a o ,so that a. is used no more than b , times on the average, until the boundary dS is reached. Then 1 = 1 in (9.4), and (9.4) has the form
1 M,, i
Ib,.
For another example, suppose that we want to constrain the average number of times that a set S, c S - dS is hit before dS is hit to be no more than b , . Then (9.4) takes the form
1
i s . 5 1 ,j
Mil Ib,.
The Optimal Stopping Problems
To put the optimal stopping into the framework of (9.2b(9.3),set @ ( x ) = (1, 0}, 1 = stop, 0 EE continue, dS = empty. We will retain the fictitious absorbing or rest state (0) and use (9.5) in lieu of (9.1). 7(i, 0 ) = r(x), Pij(0) = p i j
b ( 0 ) = r(0) = 0,
7(i, 1 ) = b(x), p i @ (1)
= 1;
i, j
# 0.
(9.5)
The following types of constraints may be of interest. If we wish (E: denotes the expectation under a, p denotes the probability measure on X o and m the stopping time) (a) EEm Ib , (bound on the average stopping time), then set
1MioIb, . i
If we wish (b) P ; ( X , then set
E
S,} I b 3 (bound on probability of stopping in the set Sl),
C Mi, Ib,.
ieSi
Discounted Control until a Boundary Is Reached
If state space constraints are added to the problem of Section 3.5, we again may not have an optimum policy in the class of pure Markov controls but there is one in % s R M .
3
56
MARKOV CHAIN CONTROL PROBLEMS
Define the discounted occupancy vector
Aij= EE
m
1 It/?’,
n=O
where 1; = 1 if X, = i and U, = j . The
{Aii}satisfy constraints [KlO]
m
1 A, = Ai = pi + /? k.1i p j i ( k ) A j k , 1
all i # 0,
A, 2 0.
(9-6)
State space and control usage constraints of the discounted form
1 a t M i i I b , , ..., s = 1, ... *
i. j
(9.7)
can be considered and cost (5.1) can be written as [notation (9.1) for F(-,
-)I
m
R ( p , n) = E,”
1/?“r(X,, U,) = 1i fiijF(i, j ) E z.
i=O
i,
(9-8)
If the constraints (9.6) and (9.7) are feasible, then there is a feasible set {Aij} which minimizes z and the minimizing {aij} yields the optimal policy via yij = f i i j / A iwhere , feasibility is simply a question of whether there is a n E gSRM which guarantees (9.7). There are also linear programming versions of the optimal impulsive control and finite time optimal control problems.
CHAPTER 4
Elliptic and Parabolic Equations and Functionals of Diffusions
One aim of this chapter is to describe some further properties of the solution to stochastic differential equations when there is either a control or when the Lipschitz condition is replaced by a weaker condition. We shall be concerned with properties such as causality and uniqueness; also, several classes of control hnctions will be defined. These results and definitions will be used in later chapters. The techniques introduced from Chapter 5 onward will deal with methods for approximating values of a large class of functionals of diffusions and of optimal costs and controls in optimal control problems with diffusion models. Since these functionals are often weak solutions of degenerate or nondegenerate elliptic or parabolic partial differential equations, we will also be able to calculate approximate values to the solutions to these equations. We will state some of the hnctionals of interest and briefly describe some of their relations to partial differential equations. Heavy use will be made of Strook and Varadhan [S5] and Dynkin [D4, Chapter 131. See also 57
58
4
ELLIPTIC EQUATIONS AND DIFFUSIONS
Fleming and Rishel [Fl], Friedman [F3], and Karoui [Kl], where a number of results concerning the partial differential equations satisfied by functionals of diffusions are discussed. The relations between diffusions and linear elliptic and parabolic equations and between optimally controlled diffusions and nonlinear elliptic and parabolic equations will he treated in a partly formal manner. We aim to cite some of the main relationships and to set up part of the structure of equations, processes, and terminology for later use. 4.1 Assumptions and Uniqueness Results: No Control Assume A4.1.1 In the homogeneous case, let f and o( .) denote bounded continuous R’ and r x r matrix-valued functions, resp., on R’. In the nonhomogeneous case, let f(.,.), a(-,* ) be bounded and continuous on R‘ x [0, a). Suppose that, for each X ( 0 ) = x, (1.5.1) has a unique (nonunticipative) solution in the sense of probability distributions; i.e., any two nonanticipative solutions (perhapscorresponding to diflerent Wiener processes) induce the same distributions on C‘[O, co). Assumption A4.1.1 (homogeneous or nonhomogeneous case) will be used throughout the book whenever there is no control function explicitly introduced. (
0
)
The uniqueness property has some important implications. For simplicity of notation, only the homogeneous case will be treated. In Theorem 4.1.1, define (R’,8, B;, P,)as follows: R’ = C[O, a), 8 = Bore1 algebra on C‘[O, co),Bi is the smallest sub-o-algebra of 93’containing the “projections” ( x ( - ) :x ( s ) < a} for all s I t and all a E R‘, and P,, is the regular conditional probability of X ( * )given w ( - ) when X ( 0 ) = x. Let P denote Wiener measure on R‘. Define $2 = R‘ x R’,P, = P,, x P, and let B and Brdenote the smallest o-algebras on $2 containing 8 x 8 and B; x B;,resp. The first R’ will “carry” X ( .) and the second w( and we can assume that X ( ), w( -), X ( 0 ) = x, are defined on (Q B, Br,P,) for each x . Let P,denote the measure that X ( - )induces on C‘[O, co) when X ( 0 ) = x. a )
-
Theorem 4.1.1 Assume A4.1.1. Then X ( - )is a strong Markov and a Feller process on (Q 99, B,, P,, x E Rr).
PROOFFor any bounded set A, the family {P,,x E A } is tight on C‘[O, co). Let (xn}denote a bounded sequence which tends to x, as n + 00, and let P,.be weakly convergent with limit P. The limit process solves (1.5.1) with X ( 0 ) = x and some Wiener process w ( * )(although we may have to
4.2
FUNCTIONALS OF UNCONTROLLED DIFFUSIONS
59
augment the probability space by adding an independent Wiener process). By the uniqueness, I‘ = P,.Thus, the subsequence is irrelevant. Now, let g ( * ) denote a bounded and continuous real-valued function on R‘. This convergence implies that E,”g(X(t))+ E , g ( X ( t ) ) as n + 0 0 ; i.e., the process is Feller. Since the Feller property together with the Markov property imply the strong Markov property, we need only show the Markov property. Let A be an open set in R‘, and g,( a sequence of continuous functions which converges to I ” ( * ) . Since E,g,(X(t)) is continuous in x, and E,g,(X(t)) + P(x, t, A), where P(x, t, A) = P x ( X ( t )E A}, the function P( t, A) is measurable for each open A and each t 2 0. This implies that it is measurable for each t and Borel set A. We have a )
.,
X(t
+
S)
=X(t)
+ 5r
t+s
f ( X ( u ) )du
+5
t+s
1
.(X(U)) dw(u),
where W ( U ) - w(t), u 2 t, is independent of X ( t ) for each t 2 0. By uniqueness, the distribution of X ( t + s), given X(u), u I t (or, given X(u), w(u), u It ) with X ( t ) = y , must be P ( y , s, .). Thus the uniqueness and the measurability of each P ( - , t, A) imply that P,{X(t
+ s ) E A I Bt}= P ( X ( t ) ,s, A )
w.p. 1 for each s 2 0, t 2 0, and Borel set A. Q.E.D.
4.2 Functionals of Uncontrolled Diffusions Let G be a bounded open set with compact closure G and boundary aG. The following process X ( -) is the diffusion (1.5.1). For all functionals except (2.4), we suppose thatf(.), o(-)d o not depend on t . Define t = min{t
: X ( t ) 4 G},
t’= inqt
: X ( t )4 G } .
As usual, if X ( t ) E G (resp. G),all t < 00, then t = 00 (resp., t’= 0 0 ) for that path. Then (Dynkin [D4, Chapter IV.l]), t and t’ are stopping times (perhaps taking infinite values). Let k( - ) and b( - ) denote bounded real-valued continuous functions on G and aG, resp. Next, several functionals of X ( * ) will be defined. In subsequent sections, we will discuss equations which these functionals satisfy. For each x E G for which E,t < 00, define
4
60
ELLIPTIC EQUATIONS A N D DIFFUSIONS
Let Lo be a positive number and L( be a real-valued bounded continuous function on G satisfying min L ( x ) 2 Lo > 0. a )
XE
Define
c
1’ A ( r ) k ( X ( t ) )dr + E , A ( r ) b ( X ( ? ) ) ,
R ( x )= Ex
‘0
where
A(0, t ) = A ( t ) ,
A(u, t ) = exp - [ ‘ y X ( s ) ) ds.
(2.3)
.U
Let T > 0 denote a given real number, and b , ( . , .), b T ( - )real-valued bounded continuous functions on dG x [0, TI and C, resp., and define the functional [here k( .), f(.), o(- ) can depend on t if we wish] R ( x , t ) = Ex,‘ i T n r k ( X ( s )ds ) -1
+ E,.,b,(X(T n T), T n T
)
I
~
~
(2.4a) The discounted version of (2.4a) is
R(x, t ) = E x . ,
\
T c>r
[ A ( t , s ) ] k ( X ( s ) )ds
*I
+ E , t [ A ( t , T n t ) ] b , ( X ( T n T), T n T ) I ~ ~ T I + E x , t[A(t, T ) l b d X ( V ) I l r >
(2.4b)
Finally, let p(-) be an invariant measure for the process X ( - ) . Let P(x, t, - ) define the measure with values P x { X ( t )E A } = P(x, t, A ) . Suppose that P(x, t, .) converges weakly to p(-),as t + co,and define y by y = lim I’m
1 ‘ ~
t
i,E , k ( X ( s ) )
If in (2.1), (2.2) and (2.4), T’ replaces tional by R’.
T,
(is =
1 k(y)p(dy).
(2.5)
we denote the corresponding func-
4.3 Partial Differential Equations Associated with Functionals of Diffusions. u(*) Uniformly Positive Definite If u ( - ) is uniformly positive definite [i.e., there is a real K > 0 such that x’u(y)x 2 K I x I 2 for all x, y ] then there exists a unique solution to (1.5.1) under weaker conditions on f(.), o(- ) than the Lipschitz condition.
~
~
~
4.3
EQUATIONS SATISFIED BY FUNCTIONALS OF DIFFUSIONS
61
Theorem 4.3.1 (Dynkin [D4, Theorem 5.113) Let aij(.),l;(.) be bounded and satisfy a uniform Holder condition and let a(.) be uniformly positive definite. Then there exists a homogeneous diffusion process with differential generator 9 (see 1.5.14). I t s transition density p(x, t , y ) [the density of the Markov transitionfunction P(x, t, dy)] is the fundamental solution of the differential equation (in variables x, t, where x = initial state) (du/dt)(x,t ) = Y u ( x , t ) , (3.1)
with initial condition u(x, t ) -,6 ( x - y )
as t -,0.
The process is strong Feller and Markov, and is a unique solution to (1.5.4). Actually, Dynkin's definition of a diffusion process is not quite the same as merely satisfying (1.5.1) and he does not actually prove that (1.5.1) is satisfied by what he calls a diffusion process, in the sense that there is a solution to (1.5.1) for each x = X(O),for the same w(.) process. However, the function P(*, -, .) can be used to define a family of measures { P x }(on, say, C[O, a))which generate a homogeneous strong Markov strong Feller process X ( . ) (on the sample space C'[O, a),with the appropriate calgebras) and it can be shown that E,[X(t A ) - X j t ) I X ( S ) ,s I t ] = f ( X ( t ) ) A+ .(A),
+
cov,[X(t
+ A ) - X ( t )I X ( s ) ,s I t ] = Za(X(t))A+ o(A), P,{ I X ( A ) - x 1 > E } = o(A),
where o ( A ) is uniform in (w, t). These last estimates implyt that there is a Wiener process w ( - ) ,such that X ( .) is nonanticipative with respect to w( * ) and (w.P. 1) Eq. (1.5.1) is satisfied. The process w ( - )may depend on x. A similar construction exists for the nonhomogeneous case. A proof of the fact that (1.5.1) has a unique solution (which is a strong Markov, strong Feller process) under weaker conditions than those of Theorem 4.3.1 [still with a(.) uniformly positive definite] appears in Strook and Varadhan [S3]. The proof of uniqueness is much more difficult than that given in Theorem 1.5.1 since we cannot use the easy and explicit calculations associated with the Lipschitz condition. Equation (3.1) is known as Kolmogorov's backward equation [since it is an equation in the initial condition x of the transition density p(x. t, y)].
ro
t The implication is not entirely obvious. The equations imply that X ( t ) - x - J ( X ( s ) ) ds is a continuous martingale with quadratic variation Yo 2 a ( X ( s ) )ds. The result then follows from Section 1.4.4 (see also Doob [D2, pp. 286-2911), provided that we augment the probability space by adding an independent Wiener process.
4
62
ELLIPTIC EQUATIONS A N D DIFFUSIONS
Suppose that the derivatives af;,(x)/ax,,aajj(x)/ax,,a*aij(x)/ax,axj are bounded and satisfy a Holder condition on Rr. Then (Dynkin [D4, Vol. I, p. 1681) for each x, p(x, ., .) satisfies the adjoint equation to (3.1) (in the current variables t, y ) :
In Chapter 7, we shall study numerical approximations to the (weak sense) transition density function p( -, -, .), without the positivity assumption on a ( . ) , provided only that A4.1.1 holds. A boundary point x is said to be regular
lim P,{T’ 2 E }
=0
for all
E
(T’)
if
> 0 and all such sequences {y}. (3.3)
Y+X,YEG
The boundary aG is regular ( T ’ ) i f each x E aG is regular (T’). There are equivalent dejinitions for regular ( T ) . Clearly, regular (T‘) implies regular ( T ) since T 5 7 ‘ .
Lemma 4.3.1
The condition (3.4) implies (3.3) P,{T’
If x
E
>o} = o
at
X E
aG.
(3.4)
aG is not an absorbing point of the process, then (3.3) implies (3.4).
PROOF We shall give only an outline of the first statement and not prove y E A} the second one. For each bounded set A, the family of measures {P,,, induced by Eq. (1.5.1) is tight on C[O, 00). If, say, y , + x and {P,J converges weakly to a measure P, then there is a Wiener process w ( - )and a corresponding solution X(.) to (1.5.1) with X ( 0 ) = x, such that X(* ) induces the probability P on C‘[O, 00). By uniqueness of the measures induced by (1.5.1) for each fixed initial condition (Assumption A4.1.1), P, = P. Now let y, + x, y, E G, x E aG and assume (3.4). The set {T’ 2 E } is a closed set in C[O, co), and by the result of the last paragraph P,,” -,P, weakly. Hence (Theorem 2.1)
-
lirnPYn{T’ 2 E } IP,{z’ 2 E }
= 0,
y.-x
which proves the assertion. Another useful fact is that P,{T’ > 0} = 0 or 1 by the zero-one law for Markoo processes (see Dynkin [D4, Corollary 1, p. 84 and Theorem 3.3, p. 871, and note that {T’ > 0)E N o +and N o +c .&Yo+, in Dynkin’s terminology). Now let us continue with the case where a( - ) is uniformly positive definite. We say that a point y E aG can be touched by an open conefrom the outside if
4.4 a ( - ) DEGENERATE there is an open cone C and an E > 0 such that
63
(4 = empty set)
<e}nG=4.
{ x : x - y ~ C ,I x - y l
If y E aG and there is a neighborhood N, of y and a function which is differentiable at y and such that +,(y) # 0, and if
4 ( * )on N,,
G n N, = { x : 4 ( x ) > 0 } n N , ,
then JG is said to be differentiable at y. If dG is differentiable at y, then it can be touched by an open cone from the outside at y (Dynkin [D4, Lemma 13.41). All points on aG which can be touched by an open cone from the outside are regular ( 7 ’ ) (Dynkin [D4, Theorem 13.81). Actually Dynkin uses (3.4) as the definition of regularity (7‘). By Lemma 4.3.1, that definition is equivalent to the one used here, provided that the point is not absorbing (which is true if a( .) is uniformly positive definite). The next theorem gives (strong) conditions under which (2.2) actually satisfies a partial differential equation.
Theorem 4.3.2 (Dynkin [D4, Theorem 13.161) Let A ( * ) , k(-) satisfy a Holder condition and A(x) 2 0 (rather than > 0 ) and let dG be regular (t). Then, under the additional conditions of Theorem 4.3.1, the functional (2.2) is twice continuously differentiable and satis3es Y V ( X )- A ( x ) V ( X )
V ( y )+ b(x),
+ k ( x ) = 0,
as
y
-P
x E G,
x E dG.
(3.5)
The positive definiteness condition on a(.) seems to be satisfied only rarely in applications to stochastic control theory. In many cases, the R( are not even continuous. There are important cases where a( * ) is not strictly positive definite, but where the solution to (1.5.1)is still a strong Markov and strong Feller process; in particular, where a continuous (in x, y) transition density p(x, t, y) exists: For an example, consider the case a )
d X , = X , dt,
d X , = g ( X ) dt
+ dw,
(3.6) where g ( * ) is a suitably smooth function. A number of results concerning smoothness of functionals of processes of the type (3.6) and related partial differential equations appear in Fleming and Rishel [Fl]. We will now go directly to the general degenerate case. 4.4
a( - ) Degenerate
By R( - ) (resp., R( -)), we mean the functional given by (2.2) (resp., (2.2) with T’ replacing t). In the degenerate case, neither R ( . ) nor R(.)may be continuous at all x E G. If R(.)is to be continuous at x E dG, we usually
4
64
ELLIPTIC EQUATIONS A N D DIFFUSIONS
need that x be a regular (7’) point and similarly for R( .) (with 7 replacing 7‘ in the definition of regularity). We will next cite some results of Strook and Varadhan [MI. Assume that f (.) E C1(R‘), a ( - )E C2(R‘),and that there is a twice continuously differentiable real-valued function 4( ) defned on some neighborhood N of aG such that
-
aG = {x : + ( x ) = 0 } n N ,
G
n N = { x : 4(x)
I~,(x))
>O
> 0)n N,
on dG.
The boundary function 4( - ) can be defined “locally,” but we will not do so. Define the sets
C, = { X : x E dG, & ( x ) u ( x ) ~ ~ (> x )0},
Xz = {X : x E aG - X3 , ( f (x) - Z ( X ) ) ’ ~ ~ ( X<) 0}, where the ith component of the vector ii(x) is defined by
C (a/axj)aij(x), i
and define
c = c , u c,, Xz = {X : x E aG - X3, Y ~ ( x< )0},
c*= c:
u
c, .
[Note that the a( .) defined in Strook and Varadhan [S5] is twice our a( *).I Define
r = set of regular ( 7 ‘ ) points on aG.
Then [S5, Section 71
cuc*crc~=z*,
(4.1)
P,{z = 7 ‘ ) = 1, a.e. x in G, a.e. x in G, P x { X ( r )E X n Z* < co} = 1,
(4.2)
a.e. x in G,
(4.4)
IT
P x { X ( ~ E’ )C n X* 17’ < co} = 1,
(4.3)
px{x(X‘)E r 1 7’ < = I. (4.5) Equation (4.2) implies that R ( x ) = R’(x) for a.e. x in G and that both functions R( ) and R’( are well defined if
.
a )
s u p E , ~ ’< 00 XEG
or
infA(x) > 0.
xcG
(4.6)
4.4
65
U ( ’ ) DEGENERATE
Equations (4.3) and (4.4) imply that it is enough to specify the boundary function b( - ) on C n C* if we are only concerned with the solutions R( .) or R ’ ( . ) for almost all x E G. Define a weak solution u ( * ) to 2 u
= L(x)u - k ( x ) ,
(4.7) with boundary condition b( -), to be a bounded measurable function u( -), which satisfies J”
I
u ( x ) ~ * ( / ( xdx ) = [ L ( x ) u ( x ) - ~ ( x ) ] ( / ( xdx )
for all? $(-) E C;(G) and for which lim u ( y ) = b ( x ) y-tx. Y
E
for
X E
X, u C,.
G
Then [S5, Corollary 8.21 under (4.6), R’(.)is a weak solution to (4.7), and any other weak solution to (4.7) equals R ’ ( - )at almost all x E G. In general in the sequel we will not use all the conditions on u( .),f( * ) and dG of this section, but will seek ways of approximating the functionals (2.1)-(2.5) directly. Of course, since these functionals are intimately associated with certain elliptic or parabolic equations, there will still be a relationship between our conditions and some of those required to treat the equations. Example It is not always easy to determine the set r of regular (z’) boundary points when u( -) is not strictly positive definite. However, often the dynamics of the problem yield this information without much trouble. Let us consider the two-dimensional process
dX,
=X
, dt,
d X , = f 2 ( X ) dt
+ a dw,
in the set G = { x : I x , 1 Ia, 1x2 I Ia}. See Figure 4.1. The sets (c, d ) and (1; a ) are inaccessible since X , points strictly inward there. Similarly, [b, c) and [ e , f ) are in r since X , points strictly outward there. The sets [a, b] and [d, el are also in r via a law of the iterated logarithm and the 0-1 law for Markov processes. The basic idea is the following. The function X , ( . ) satisfies X Z b ) = X,(O) +
J fZ(X(S))ds + aw(t)* f
0
t C:(G) is the class of R‘-valued infinitely differentiable functions on G, which take the value zero on dG.
4
66
ELLIPTIC EQUATIONS A N D DIFFUSIONS
I
dG
FIG.4.1.
Since w( obeys the law of the iterated logarithm, so does X 2 ( Thus if X 2 ( t )= + a (or -a), then X 2 ( t s) crosses the line + a (or -a) infinitely often in any interval [t, t + so), so > 0. Hence, the upper and lower boundaries are regular. The same conclusion would hold if the constant a were replaced by a function a(x), provided that inf, a 2 ( x )> 0. In this case, a simple time change transforms fo o(X(s))dw(s) into a Wiener process. The only remaining points arefand c. There are several ways of showing that these points are in r. The simplest is to use a result of Strook and Varadhan [S5, Section 61, where a number of criteria for regularity are given. Consider c, where the local boundary function is d ( x ) = (a - x l ) . The following result is Theorem 6.1 of Strook and Varadhan [MI. a )
+
a).
L.et a(*) be infinitely differentiable and suppose that f(-),+(-) are suficiently smooth so that the derivatives below exist. Suppose that there is an integer s such that .~?'c#I~(x) # 0 at x = y E dG and let n be thefirst such s. I f IPs+(x) = 0 at x = y , f m s _< n/2, then P,,{z' > 0} = 0. In our case, n = 3 and the criterion holds at both points y = (a, 0 ) and (--a, 0).
4.5 Partial Differential Equations Formally Satisfied by Path Functionals If we arbitrarily assume that the R ( . ) (resp., R ( * , .)) are in C2(G) (ClS2([O,T] x G), resp.), then It6's lemma can be applied to obtain the relevant differential equations. We will only do some formal calculations, under the previous smoothness assumption. See Kushner [K5,Chapter 111 for still formal but more detailed derivations. Similar calculations appear in Fleming and Rishel [Fl]. Let ' denote transpose.
4.5
67
EQUATIONS SATISFIED BY PATH FUNCTIONALS
Consider the functional (2.1) and let p be a stopping time such that E,p < co. Then, assuming that R( .) is smooth, It& lemma yields R ( X ( p n 7 ) ) - R(x) =
Rx(X(s))a(X(s))dw(s).
Taking expectations and using the definition of R ( .), we get Y R ( X ( s ) )ds = - E x
E,R(X(p n 7 ) ) - R(x) = Ex
Now let N , ( x ) denote the set {y : Ix - y l < E } and assume that x E G and N,(x) E G. Define p = inqt : X ( t ) $ N,(x)}. Then p I7 . If we divide the two right-hand sides of the last equation by E , p and let E + 0, we formally get YR(x) + k(x) = 0,
x E G.
(5.la)
In order for the functional R ( . ) in (2.1) and for the formal derivation to make sense, we require that SUP,,^ E x 7 < 00 if k(.) f 0. If k ( * )= 0, we require that Px{7 a}= 1, x E G. These conditions imply that
-=
lim R(x) + b(y), x-y.
XE
y regular
(7)
on 8G.
(5.lb)
C
A similar formal derivation yields that (2.2) formally satisfies ~ ? R ( x-) L(x)R(x)
+ k(x) = 0,
limR(x) = b(y),
x
E
G,
(5.2)
y regular ( r ) on dG.
X-Y
It can also be shown formally that if R ( . ) satisfies (5.1), (5.2), or (5.4), then it is the desired cost functional.? We will check the calculation for (5.2). Let F ( t ) = A(t)R(X(t)),t I7 , and assume that (5.2) holds. Then
+
dF(t) = [ - A ( X ( t ) ) F ( t ) A ( t ) Y R ( X ( t ) ) ]dt
+ A ( t ) R x ( X ( t ) ) a ( X ( tdw(t), ))
t <7.
Substituting A(x)R(x) - k(x) for YR(x) yields dF(t) = - [ A ( t ) k ( X ( t ) )dt]
+ A ( t ) R x ( X ( t ) ) a ( X ( tdw(t), ))
t < 7.
Integrating the above differential, using the fact that A(0) = 1, and taking expectations, yields us I nr
E,F(t n 7 ) = R(x) - E x
0
A(s)k(X(s))ds.
(5.3)
t Since the calculation is formal, to avoid problems with the way that the boundary condition is realized, assume that the set of points X ( T )is regular (5) w.p. 1, under P,, x E G. This holds if P,{T = T'} = 1, x E G.
68
4
ELLIPTIC EQUATIONS AND DIFFUSIONS
Since R ( X ( t ) )+ b ( X ( z ) ) as t + T [by the boundary condition in (5.2)], we have that E x F ( t n T ) + E , A ( z ) b ( X ( r ) ) as t + a.This together with (5.3) implies that R ( x ) equals the right-hand side of (2.2). By a formal derivation similar to that which leads to (5.1) from (2.1), we can show that (2.4a) satisfies (a/at
+ ~ ' ) R ( xt ), + k(x) = 0,
lim s-1,
x-y,
XE
G
x
E
G, t < T ,
R(x, T ) = bT(x), x E G , y regular (z) on dG, t IT . R(x, s ) + b,(y, t),
(5.4)
The discounted functional (2.4b) also formally satisfies (5.4), but with k ( x ) replaced by k(x) - I(x)R(x,t). The y of (2.5) is formally associated with the equation ~ ' V ( X-)y
+ k(x) = 0,
(5.5)
in the sense that if E x V ( X ( t ) ) / t+ 0 as t + co, then y = E , k ( X ) . (See also
[K5,p. 3321.) To see this note that, by Itb's lemma, (5.5) implies [ExT W )- V X ) l / t = Ex
J (7 - k ( X ( s ) ) )dslt. t
0
4.6 The Characteristic Operator of the Diffusion The Weak Infinitesimal Operator
Let X denote the space of bounded measurable real-valued functions on R'. If f o r f ( - ) E X , [Ex f W))-f(x)l/t,
t
+
09
converges boundedly and pointwise to a function g ( - )E S and if E , g ( X ( t ) ) + g(x) as t + 0, then we write f E 9(2), 2f=g, where 2 is the weak injinitesimal operator of the Markov process X ( *). On C2(R') n 2, ( a )
22 = 9.
The characteristic operator 42 is, in a sense, an extension of 2.We will discuss it briefly because in a way we will be approximating 4Y rather than 2'.Equations (5.1), (5.2), and (5.4) frequently d o not hold as stated in the strong sense (or even in a weak sense). But even in those cases, they often do hold if we replace 2 ' by the operator 42. Let x denote a point in R' and let U , denote a sequence of neighborhoods
4.7
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
of x such that U , 1 x and E,T, < co, where limit in (6.1) exists
T, = min{t : X ( t ) 4
69 U,}. If the
we write % f ( x ) = g ( x ) . If %f(x) = g ( x ) for all x in a set B and the ratio on the left-hand side of (6.1) is bounded uniformly in n and x E B, then we say that f(-) E 9(%) in B. If E,T, = co for all n, set % f ( x ) = 0. For a more precise definition, see Dynkin [D4, Section 5.31. In particular, the function f(.) defined by f ( x ) = E,b(X(r)) is in 9(%)in G and %f(x) = 0. If supxE Ex T < co,then the functionf( .) defined byf(x) = E , & k ( X ( s ) )ds is in 9(%) in G and % f ( x ) + k ( x ) = 0. The operators % and 9’ are local. But the action of 9 is more reflective of the path properties than is 9, which is a differential operator and its properties are related to path properties only under some “smoothing” assumption. Some further comments on the relationship between % and our numerical procedures appear in Section 5.2.
4.7 Optimal Control Problems and Nonlinear Partial Differential Equations In this section we will give the nonlinear partial differential equations that are formally satisfied by certain minimal cost functionals, define several classes of control strategies, and state some assumptions which will be used in Chapters 8 and 9. A more detailed treatment (under some additional assumptions) of the relationships between partial differential equations and optimal stochastic control theory appears in Fleming and Rishel [Fl]. Several of the classes of control laws and stopping times are introduced owing to the nature of the approximation procedures used in Chapters 8 and 9. Our sequence of approximations will converge to a cost functional for a controlled or stopped diffusion. The introduction of the various classes of strategies is required in order to prove the optimality (or minimality) of the limit of the sequence of approximations to the optimal cost functional. 47.1 The Optimal Stopping Problem
Let (Q B, B t , P,, X(.),x E R‘) be a strong Markov diffusion process, where for each x, there is a w( .) with the properties: X(.),w( .) solves (1.5.1) under P,; w(s), s 5 t, is B, measurable; w ( s ) - w ( t ) , s 2 t, is independent of any B, measurable function under P, . Assume A4.1.1 and
70
4
-
ELLIPTIC EQUATIONS A N D DIFFUSIONS
A4.7.1 b( ) and k( are continuous and bounded real-valued functions on R' and there is a real ko > 0 such that infxER,k(x) 2 ko . a )
Now we define several classes of stopping times analogous to those used in Section 3.2.2. A stopping time p is said to be pure Markov ifthere is a Bore1 set A such that p = inqt : X ( t ) E A}.
Let F&, denote the class of pure Markov stopping times p. In general (unless otherwise defined) a bar over a symbol (such as .For U ) indicates that the underlying process is a continuous parameter process. The symbol without the bar denotes that the underlying process has a discrete F:&), and S& are all defined analogously parameter. Thus, T p M ( x )FpM, , to the symbol without the bar in the Markov chain case. A stopping time p is' said to be randomized fi it is adapted to {a,}; i.e., if{p 5 t} E a,for all t 2 0. Let 9:denote the class of randomized stopping times, and define F R ( x ) ,FR, S ; ( x ) and 9 : in the obvious way, analogously to the definitions in the discrete parameter case of Chapter 3. For each x and each p E S ; ( x ) , define the cost
(7.1) and define It is also true that (Shiryaev [ S l ] ) ~ ( x=) inf R(x, p). (7.3) p E FK+ In fact, Shiryaev [Sl] proves that it is enough to minimize (7.2) over p which are defined by first entry into closed sets. The proof of (7.3) involves showing that we can approximate the cost arbitrarily closely by restricting attention to stopping times with values {k2-", k = 0, 1, ...} for some n, and then showing that for the discrete parameter process {Xk}, defined by X , = X(k2-"), we cannot reduce the cost by using randomized rather than pure Markov strategies. Owing to the lower bound k o , it is sufficient to consider stopping times p satisfying
The arguments given in connection with the stopping problem for the Markov chain suggest that for small A, V(.) should approximately satisfy
+
V ( x )= min[b(x),Ex V ( X ( A ) ) k(x)A].
(7.5)
4.7
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
71
Let B denote the stopping set { x : V ( x )= b(x)}, and assume that V ( - )is sufficiently smooth in the interior of R' - B. Then, in R' - B, where V ( x )< b(x), (7.5) yields A+O
n
hence (by It63 lemma) 9V(x)
+ k ( x ) = 0, V ( X )= b(x),
x
E
R' - B,
x
E
B.
V ( X )Ib(x),
(74
Of course (7.6) is formal and we will not seek to justify it. Our approximate solutions to (7.6) will converge to the right-hand side of (7.2). Define BE= { x : V ( x )2 b ( x ) - E } ; and let p = inqt : X ( t ) E BE}. Then R(x, p ) IV ( x ) E and Be is the &-optimalstopping set. The main reason for introducing the various classes of stopping times as well as the various classes of controls to be introduced next is that they appear naturally in the computational problem, and we must relate the infima of the costs over the various classes.
+
4.7.2 Optimal Control until a Boundary Is Reached Let G denote a bounded open set in R' with boundary aG and let 9 be a compact set in some Euclidean space, say Rm. Assume
A4.7.2 k( .,-),f (., .), and a(- )are R, R',and ( r x r ) matrix-valued continuous and bounded functions on R' x Q, R' x 9, and R', resp. We shall discuss functionals associated with the controlled stochastic process
First we introduce several classes of control functions. Whether the control function is of a form with values u(x(t)),u ( o , t), u(w(.), t), or u(x(t), t), we usually denote the value simply by u(t). In what follows, we denote by (Q 93,W,, P!) a probability space on which there are defined random processes X ( t ) , u(t),w(t), t < co,which satisfy (7.7) fort t < co and X ( 0 ) = x E G . The sequence of a-algebras {a,} is nondecreasing and 9, c A?. The function w( a )
t It is possible to continue, if the processes were defined only up to T = min{t : X ( t ) $ C). It is (notationally) somewhat more convenient to define them for all t .
72
4
ELLIPTIC EQUATIONS A N D DIFFUSIONS
is a Wiener process under P,,and the stochastic integral in (7.7) is constructed using the measure P,. The X ( -), u( - ) are assumed to be nonanticipative with respect to w ( - ) , and w(s), X ( s ) , s It, are Bt measurable, and w(s) - w(t),s 2 t, is independent of Bt for each t under P,. The objects R, and Bt can also depend on u and x. We are concerned mainly with the measures, and with expectations of functionals of X(- ) and u( -), and not with the probability space per se. The use of a possibly different sample space for each x, u is not too common in stochastic control theory. But it suits the purposes of Chapter 9 quite well. Suppose that there is a probability space of the previously described type, where u( *, - ) is a %-valued measurable w, t function, u( .,t ) is B, measurable and (7.7) has a nonanticipative solution. Zf the solution is unique (in the sense dejned below), then we say that u E @,(x). W e say that the solution is unique if the probability law? of { X ( s ) , u(s), s < co} is uniquely determined b y that of {u(s), w(s), s < co}, irrespective of the probability space. The subclasses of @,(x)for which PU,{T c co} and E:T < co are denoted by @,(x, G), @‘R+(x, G), resp. Let g w ( x ) denote the subclass of %?,(x) which consists of measurable ) any con( w ( - ) ,t)functions and for which u ( w l ( * ) t, ) = u(w2(*),t ) i f w i ( * are tinuous R”-valued functions on [0, co) that are equal on [0,t]. Similarly, dejne gw(x,G ) and @G(x, G). Let @ , , denote the class of %-valued Bore1 measurable “feedback” functions with values u(x, t ) ,for which (7.7) has a unique (in the sense of A4.1.1) nonanticipative solution which is a strong Markov and a Feller process. Dejne the classes q p M ( x G), , gPM(G),b3pM(x,G ) and @gM(G) in the obvious way. In particular, u E qJM(G) if u E qPM and E:T < co,x E G. Iff (., and o( - ) satisfy a uniform Lipschitz condition in x , then for any measurable nonanticipative and %-valued (w, t ) function u ( . ), there is a unique solution to (7.7). Uniqueness seems to be a critical property. With it we can approximate or perturb the dynamics and show that the distributions converge to those for the unperturbed problem as the perturbations go to zero. Such a property is rather essential in applications. a )
REMARKON NOTATION Sometimes the notation U E @ ~ ( X )(or with W replacing R or with a superscript) is used loosely in the sense that it implies the following. There is a probability space with u ( . ), w( * ) defined on it, with u ( . ) being nonanticipative, another probability space with * ), ii( -), it( .) (solving 7.7) defined on it, where ii( .) are nonanticipative and (ii( *), it( .)) has the same law as does (u( .), w( -)), and there is uniqueness of the
z(
x(-),
t By “probability law,” we mean either the multivariate distributions (for almost all s) or the measure induced by X ( u( .) on C‘[O, a)x L;. . a),
,
4.7
OPTIMAL CONTROL AND NONLINEAR EQUATIONS
73
x(-)
solution to (7.7) in the sense described above. The solution X ( . ) or need not be defined on the original probability space. This looseness should cause no problems in Chapter 9. In the stopping and control problem, stopping times i may be added to the above formulation. Functionals of the Controlled Diffusion
For u( .) in @: (x, G) or in @)p+M(x, G), define the cost r
R(x, U ) = E: Jo k ( X ( s ) , u ( s ) )ds
+ E:b(X(z))
(7.8)
and define? V ( x )=
inf
u E @;(x,
G)
R(x, u).
(7.9)
Suppose that an optimal control exists in @;M(G), assume that V ( * )is sufficiently smooth, and let us apply a formal dynamic programming argument. Let u( * ) denote the optimal pure Markov control and u( * ) any other control in g&,,(G). Let i denote the solution and escape time corresponding to the use of u ( . ) on [t n 5, i) and u ( - ) before. Then, writing
z(-),
u(s) = u ( X ( s ) ) ,
ii(s) = U ( X ( S ) ) ,
u(s) = u(X(s)),
we have (7.10)
-
where X ( . ) is the solution corresponding to u( .) and (by optimality of u( ))
Jb
1 ni
V ( x )I E!
k(.f(s), u ( s ) ) ds
Hence I ni
V ( x )I Ef:
f,
k(%(s), u(s)) ds
or equivalently Ef:V(X(t n T ) ) - V ( x ) E!Jt n i)
+ Ef:V(%(t n ?)),
+ Ef: for"k(x(s), u ( s ) ) o.
t If k ( . ) = 0, we only need that u E @(x, G).
E;(t n i)
(7.11)
4
74
ELLIPTIC EQUATIONS AND DIFFUSIONS
A formal limit yields YV(x)
+ k(x, u ) 2 0
for any value u E 42,
where we define
Y U= ~ a i i ( x ) ( a 2 / a xax,) i i, i
(7.12)
+ 1 f;:(x, U)(a/ax,). i
Equations (7.11) and (7.12)yield the formal dynamic programming equation
+
inf [ Y " V ( x ) k(x, u)] = 0, UE
91
x
E
G,
(7.13)
on 8G.
V ( x )= b ( x )
Again, we will not rigorously justify (7.13), but use it to approximate the infima of the costs given by the right-hand side of (7.9). The approximations will converge under broad conditions, whether or not we know the smoothness of V ( - ) .We define the solution of (7.13) to be the right-hand side of (7.9) It will be useful in the sequel to be able to define u( - ) for t 2 T in such a way that T' can be defined. This cannot always be done. If u E gR(x)/and PI,{T = $17' < co} = 1, then we say that u is in @'p(x,G ) . Similarly, define the subclasses @,(x, G), @ i ( x , C ) to be the subclasses of @'p(x, G ) for which PI,{,' < co} = 1 and E",T' < co. The subclasses @&(x, G), gW(x, G), and @$(x, G ) are defined in the obuious way. If we introduce the continuous bounded discount factor A(*) such that there is a real number 1, satisfying inf, 1(x) 2 1, > 0, and replace (7.8) by R ( x , ). = E",
where
[j;
]
A(s)k(X(s),4s)) ds + A(T)b(X(T))
9
(7.14)
A ( t ) = exp - j ' A ( X ( s ) )ds, 0
then we only need that u E G?,(X). In this case, a formal dynamic programming argument yields that (7.15) replaces (7.13) inf [ Y u V ( x )+ k(x, u ) - A(x)V(x)]= 0,
V O W
V ( x )= b ( x )
x
E
G,
on aG.
(7.15)
4.7.3 Optimal Stopping and Control until a Boundary Is Reached In some problems we may desire to stop the process before 8G is reached, with an associated penalty. From a formal point of view, this problem is a simple combination of the problems of Sections 4.7.1 and 4.7.2, except that
4.7
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
75
the classes of stopping times depend on the control u. We will introduce the relevant notation.
gR(x),dejne the sets of stopping times (all with respect t o {ar}), FR(u,x), and F;(u, x) [and similarly F&(u, x), s&,(u), etc.] in the obvious way: the measure PXsimply replaces P, in the dejnitions of the same If
uE
S g ( u , x),
terms when u is deleted.
For each pair (p, u ) in (7.18), define the cost
and define the infimum over (p, u ) in (7.18) by
Let B denote the stopping set {x : V ( x )= b(x)},and note that dG is in B. Then under appropriate smoothness assumptions on V ( * ) a, formal dynamic programming argument shows that min[Y"V(x) + k(x, v)] = 0,
xE
G - B,
UE'V
4.7.4
in B,
V ( x ) Ib(x)
in G .
(7.19)
Fixed Time of Control
Let k ( . ,
u(*)E
V ( x )= b ( x )
a,
- ) be a bounded continuous function on R' x 4 x [0, TI. For
q R ( x ) ,define the cost functional
and define V(x, t ) = inf R(x, t, u). u E Zit@)
(7.21)
4
76
ELLIPTIC EQUATIONS A N D DIFFUSIONS
The formal dynamic programming equation is inf [(a/& 1' E
4
+ Y ' ) V ( X , t ) + k(x, U , t ) ] = 0,
t < T, x
X E ~ G t,
V(x,t ) =
? ):);;;[
X E
I
T
E
G. (7.22)
G.
A more rigorous approach for several classes of problems appears in Fleming and Rishel [Fl].
4.7.5 The Impulsive Control Problem Assume A4.1.1 (homogeneous case). Let 1 and po denote positive real ) a continuous real-valued bounded numbers, ?. a compact set in R', and p ( function on R' x 3 satisfying p(x, u ) 2 p o , x E R', u E 3 .. We continue to use the terminology of Sections 4.7.1 and 4.7.2 but the continuously acting control will be replaced by a sequence of actions {ui}and the single stopping time will be replaced by a nondecreasing sequence {ti}of stopping times with respect to {d,}. Some or all of the t I may take infinite values, the controlled diffusion is replaced by the jump process (7.23), and u i is dr,measurable and 3 '-valued. t, and X ( s ) , s < t, where X ( - ) is We require that d,measure w(s), s I defined by (7.23), and that w ( s ) - w(t), s 2 t, be independent of any d, measurable function for each t 2 0. If there is jump at t, then X ( t ) = X ( t - ) (a d,measurable quantity). Thus, d,also measures X ( t ) . The right continuity of X ( .) is purely a convention. There is no a priori reason why it could not be defined to be left continuous. a,
-
+
.apM
Let a R ( x ) ,. d w ( x ) , and denote the sets of such sequences { t i ,ui}according to the case. I n the pure Markov case, there are Bore1 sets {Ai}such that t i = inf{t : t 2 ti- X ( t ) E Ai} and each ui(-)is a measurable function of x . By definition, if {ti,ui}is in one of the or classes, then there is a probability space (a,d,P), a sequence of nondecreasing sub-a-algebras 98, of 9?,and a set {ti,ui},X ( * ) ,w ( - ) satisfying the above properties, and the solution of (7.23) is unique in the sense that the law of ((ti,ui},w(.)) implies the law of ({ti,ui},X ( - ) ) .
,,
c & ?
a(.)
4.7
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
77
Define the costst R ( x , {zi, ui))= E x
i= 1 ir
+.I,(
(exp - A T ~ ) P ( X ( T ; )Ui) ,
inf
V ( x )=
11,. Utl E
I I
exp - A s ) k ( X ( s ) ) ds .
R ( x , {zi, ui)).
(7.24) (7.25)
ZPM
The problem is a generalization of the stopping problem-there is a sequence of discrete decision times rather than just one single decision time. A formal dynamic programming argument gives the following result.
Define B = {x : V ( x ) = minae, [ p ( x , a ) + V ( x + a)]); B is the action set. If x E B,we take the action which yields the minimum. Otherwise we d o not act. For x 4 B, YV(X)
+ k(x)
-
A V ( X ) = 0.
(7.26)
A continuously acting control can also be added. Under certain additional assumptions, the impulsive control problem has been discussed by Bensoussan and Lions [Bl, B2], from a point of view that is rather different than the one which we will take in Chapter 8.
t If there are several simultaneous actions, then there is an ambiguity in the definitions (7.23) , X ( T , )= and (7.24). We will follow a procedure illustrated by the example: Let 7 , = T ~ Define X(T;)
+
U1
= x ( T y ) , X ( T z )= X(T;)
+ Uz.
CHAPTER 5
A Simple Application of the Invariance Theorems
The purpose of this chapter is simply to illustrate on a simple example the approximation ideas (as well as some of the needed assumptions) which we will develop further in the subsequent chapters. With the exception of one result at the end of the chapter, no proofs are given. The chapter begins with an example showing how the concepts of weak convergence of measures can give us very useful generalizations of the classical limit theorems for sequences of vector-valued random variables. We then apply this result to a problem arising in the numerical analysis of a simple two-point boundary value problem for a second-order ordinary differential equation. We take a finite difference approximation to the equation. If the approximation is chosen with a little care, then the resulting finite difference equation has a probabilistic interpretation, which we can exploit via weak convergence methods to get the convergence of the finite difference solution to the true solution. 78
5.1
79
A FUNCTIONAL LIMIT THEOREM
5.1 A Functional Limit Theorem We will try to illustrate the power of the ideas of weak convergence of measures in applications to problems in the numerical analysis of elliptic equations. First, we consider a special and simple convergence problem for a sequence of random variables. For each n, let {I&, k = 0, 1, . . .} denote a sequence of independent, identically distributed random variables which satisfy, for some A E (0, l),
P{II/; = l/n} = P{$; =
4,
- 1/n} = A / 2 I
P{*; = 0) = 1 - A, and define the random walk {<;} by
G=O.
<;+1=<;+*;,
Let n -,03, but keep k/n fixed at a constant t. Then the classical central limit theorem immediately yields that n112<; converges in distribution to a normally distributed random variable, with zero mean and variance At. As a consequenceofthis convergence,we know that Ef(n'12<;)-, E f ( X ) as n -, 03, k/n = t, where X is normally distributed with mean 0 and variance At and f(.) is any bounded almost everywhere continuous function. The value of E f ( X )can be approximated by Ef(n112<;) if desired, although in this case, the limit will often be easier to calculate. Now let us consider a strengthening of the convergence result. Define t; = k/n2 and At; = l/n2, and define the piecewise constant process (with paths in D[O, 03) <"(.) by 5"(t) = <;
on
[at ; + d
By a straightforward calculation, it can be shown that there is a real number K for which
+ A ) - rn(t114 IE l
~le.(t
c
k:t < t k +
1% t +
*; 1
4
A
IK [ A
+ o(i/n2)12.
(1.1)
By Theorem 2.4.2, {?(.)} is tight? on D[O, 03) and if <(.) is a separable process to which a subsequence converges in distribution, then <(. ) is continuous w.p. 1. The limit is actually a Wiener process with covariance At. There are several ways to show this. For the first method, we start by calculating the limit of the characteristic functions of a weakly convergent subsequence.
t It is only necessary to check the conditions of the theorem for A 2 1/2nZ.
80
5
A SIMPLE APPLICATION OF THE INVARIANCE THEOREMS
For any integer m and any real 0 I t l < t 2 .. . < t, and real I , ,. . ., I , , we can get (via the weak convergence)
C I j [ t n ( t j )- ("(ti- 1)]
j= 1
-,exp
- 1 A I f ( t j - ti- 1)/2. j= 1
The limit does not depend on the subsequence. Thus the multivariate distributions converge to those of said Wiener process. This and tightness implies the asserted convergence (Theorem 2.4.3). Often it is difficult to show u priori that the finite dimensional distributions coverge to those of a specified process, and other methods must be used. A second method does not require the fact that the multidimensional distributions converge, but uses some results in martingale theory. Note that (1.1) implies that {<"(t)}and { I t n ( t )12} are uniformly integrable for any t 2 0 since E I ( " ( t ) 1" is bounded uniformly in n for each t 2 0. Fix a particular weakly convergent subsequence and index it by n. Let <( .) denote the limit process; it is continuous w.p. 1. We will argue that the limit is a continuous martingale with quadratic variation At, hence a Wiener process with covariance A t by Section 1.4.4. To see this, let q denote an arbitrary integer, t , , . . ., t , I t be arbitrary real numbers, and g( - ) be a bounded real-valued continuous function on Rq. By a direct evaluation, we see that
Qn E E g ( t n ( t i ) , .. .) t " ( t q ) ) ( t n ( t+ A) - t n ( t ) )= 0. (1.2) The function on D[O, 0 0 ) with values g ( x ( t l ) , . . .,x ( t q ) ). ( x ( t + A) - x(t))" at x ( . ) E D[O, a),a = 1, 2, is not bounded, but owing to the uniform integrability of { 1t"(s)la),a = 1, 2, each s 2 0, and the weak convergence, we have
Q,
+
Eg(t(ti),
...
t(t,))(t(t
+ A) - t(t)),
which equals zero by (1.2). Equation (1.3) and the arbitrariness of q, t c,, and g( imply that
(1.3)
. . .,
a )
E[t(t
+ A) - t(t)I t(s), s I tl = 0,
and hence that t(- ) is a continuous martingale with respect to some familly of a-algebras. Similarly, by the uniform integrability and weak convergence,
+ A) - t"(t))'
E g ( C ( t i ) , .-.)t n ( t q ) ) ( t n ( t
Rn
-Eg(t(t,),
Rn
+
Eg(t(t,),
..
..., t(t,))AA; .9
t(t,))[t(t+ A) - t(t)l'.
Equations (1.4) and (1.5) and the arbitrariness of q, imply that
+
tl,
E [ ( t ( t A) - ((t))' I {(s), s I t ] = AA.
(1.4) (1.5)
..., t,, and of g ( * )
5.2
AN APPLICATIONTO NUMERICAL ANALYSIS
81
Hence the quadratic variation is A t and ((-) is a Wiener process. Since the limit of any convergent subsequence is a Wiener process, the entire original sequence {(”( -)} tends to a Wiener process in distribution. A third method depends on the fact that there is a Wiener process w ( . ) and for each n an increasing sequence of random variables {z;} so that $1 has the distribution of w(z;+ 1) - w(z;) (see Breiman [B4]). The convergence of the process (”( .) to ((. ) in distribution is considerably stronger and more useful than the convergence of (“(t)to ( ( t )in distribution for each t 2 0. In particular, if g( .) : D[O, 0 0 ) + R is bounded and almost everywhere continuous with respect to the measure that ((-) induces on D[O, a), then E g ( P ( .)) + Eg((( -)). For example, let g( .) be the minimum of T and the first hitting time of a point X. The next section deals with a useful application to numerical analysis. 5.2 An Application to Numerical Analysis To provide a simple introduction to some of the ideas and techniques of succeeding chapters, we consider a relatively simple problem of numerical analysis; namely, solving the differential equation
a(.) 2 0,
a( .),
bounded, continuous
b( -), k( .)
inf ( l 4 x ) l
OSXSl
+
If(x)I)>O.
The last line is clearly necessary (but not sufficient) if (2.1) is to have a bounded solution for each continuous k( *). For any real-valued function g ( * )define g + ( x ) = max[O,
&)I,
g - ( x ) = ma@,
-&)I.
Let h be a positive real number such that h- is an integer, and suppose that the stochastic differential equation (2.2) has a unique solution in the sense of probability distributions. d X =f(X)dt
+ (2a(X))”’ dw
(2.2)
Let z = inf{t : X ( t ) $ (0, 1)). By Section 4.5, (2.1) is the equation formally satisfied by (if SUP,^(^, 1) E,z < a)
Jb
T
R ( x ) = Ex
k ( X ( s ) )ds + aP,{X(r) = 0 }
+ BP,(X(T) = l},
x E (0, 1).
82
5
A SIMPLE APPLICATION OF THE INVARIANCE THEOREMS
We will try to approximate the solution to (2.1) by a special finite difference method (finite difference interval h), which will have a very nice probabilistic interpretation. The solution is dejned to be (2.3), so we will want to show that the approximate solutions tend to (2.3) as h + 0 . Use the approximations
+ h) - 2 V ( x ) + V ( x - h)]/h2 [ V ( x + h) - V ( x ) ] / h if f ( x ) 2 0 [ V ( x )- V ( x - h ) ] / h if f ( x ) -= 0.
Vxx(x)+ [ V ( x Vx(x)+ +
(2.4) (2.5)
The choice in (2.5) is very important for reasons which will soon be clear. Let V " ( * )denote the solution to the finite difference equations and define Ph(%
hf * ( X ) ] / Q h ( X ) , X = 0, f h, Ath(x)= h2/&(X),
X
f h) = [a(.)
... .
Define ph(x, y) = 0 for y # x _+ h, where y, x are multiples of h. Next, substitute (2.4) and (2.5) into (2.1), collect terms, divide by Q,,(x), the negative of the coefficient of V"(x),and get
+
V"(X)= V"(X h)p"(x, x V"(0)= u,
+ h) + V"(X- h)p"(x, x - h) + k ( x ) A ~ " ( x ) , x
Vh(1) = p.
= h,
..., 1 - h, (2.6)
Equation (2.6) has an interesting probabilistic interpretation. The p"(x, y) are 2 0 and sum over y to unity for each x . Thus {ph(x,y)} is the transition probability for a Markov chain on the state space 0, f h, . . . . Let {t:, n = 0, 1, . ..} denote the random variables of the chain, and define N,,, the escape time of the chain from (0, l), by Nh = min{n : q! (0, 1)). Equation (2.6) can be rewritten in the form
<:
+
V " ( X )= E x V h ( < t ) k ( x ) At"(.) Vh(1) = p.
V"(0)= u,
Let b ( . ) be any continuous function such that b(1) = /3, b(0) = u. If E x N h < co, then by Section 3.1.2, (2.7) has the unique solution
[ 1 k(t3 Nh-
V"X)
= Ex
i=O
1
At!
+b ( t 4
(2.8)
where we use At: = At"(<:).If k ( x ) = 0 and P x { N h< co} = 1, then (2.7) has the unique solution V"X) = E,b(
5.2
AN APPLICATION TO NUMERICAL ANALYSIS
83
Equation (2.8) has a superficial similarly to (2.3); if we interpret Ar: as a time interval, then the sum in (2.8) resembles a Riemann sum approximation to the integral in (2.3). In fact, the resemblence is very close as we shall see and under certain conditions (\ hich are not stringent in applications), V h ( x )+ R ( x ) as h 0. First, let us note that
<: <:
E[<:+l cov[tl+, - t l
= X ] = f ( x ) Ath(x) = x ] = 2a(x) Ath(x)
+ ( hI f ( x ) 1
(2.9)
- f 2 ( x ) Ath((.)) A r h ( x ) = 2a(x) Ath(x)
+ o(Ath(x)).
The terms in (2.10) are just what we would get [see Eqs. (1.5.15), and (1.5.16)] for the conditional increments of the solution to (2.2) over an interval? A. These calculations suggest that there are limit theorems connecting {<"-)} and the diffusion X ( . ) . This is, indeed, just where the ideas of weak convergence enter. But first we must interpolate {ti}into a continuous parameter process. The A t h ( x ) factors in (2.8) and (2.9) suggest the following interpolation. Define t: and rh(-)by n..
t: =
<"O
1 .
1 At!,
tk = 0,
i=O
= 5,
h
on [ t i , t l + 1). So t h ( *is) held constant on random intervals and is our approximation to X(*). In Chapter 6, it will be shown that {lh( -)} is tight on D[O, co) and that if h indexes a subsequence which converges in distribution to a limit X ( .), then X( must satisfy (2.2). As a further check on the scaling, consider the special case where a ( . ) = 0. Then i= f ( x ) replaces (2.2) and A t h ( x )= h/J f ( x ) l = h/(velocity at x ) . This is the correct time interval since we must hold the process at a fixed value until enough time passes so that it can jump h. This time is just h times the inverse of the velocity at x . Define ph = rk,, , the escape time of th() from (0, 1). Thus (2.8) becomes a )
-
(2.10) Assume, for the moment: ph < co w.p. 1, each h and { p h } is uniformly integrable, Px{z = z'} = 1,
x E (0, 1).
(2.11) (2.12)
t A Lipschitz condition was assumed in Section 1.5. But the estimates are valid for any solution to the stochastic differential equation iff( .), cr( - ) are uniformly continuous.
84
5
A SIMPLE APPLICATION OF THE INVARIANCE THEOREMS
We shall return to these assumptions later. Under (2.11), we can “essentially” replace ph in (2.10) by p h n T for large T and all h. Under (2.12), the functions on D[O, co) given by
where 7 = escape time of the path x ( - ) from (0, l), are continuous w.p. 1 relative to the X ( .) measure. Then weak convergence (Section 2.2) implies ) ) E , g , ( X ( - ) )and similarly for g 2 ( . ) . Owing to the uniquethat E x g 1 ( C h ( *-+ ness of the solution to (2.2) (in the sense of distributions), the distribution of each limit is the same; it does not depend on the particular convergent subsequence that we choose. Thus E x g i ( X ( is well defined by the approximation sequence. Condition (2.12)holds if 0 and 1 are regular (T’).A sufficient condition for this is a))
(a(0)> 0
or f(0) < 0)
and
( a ( 1 )> 0
or f(1) > 0).
If a(0) > 0 and X ( r ) = 0, then the “wildness of the function of t defined by [ 2 a ( X ( ~ ) ) ]dw(s) ” ~ forces x(7 + t ) to be less than 0 infinitely often on any interval [0, s), s > 0. If a(0) = 0 and f(0) < 0, the drift forces the process to the left. If k( .) = 0, then (2.11) can be replaced by
c+‘
”
Px{r < co} = 1.
(2.13)
If (2.13) does not hold at some x E (0,l), then the approximations-indeed any finite difference approximation-may not converge as h + 0 for that value of x, although the approximations will still converge for the x values for which (2.12) and (2.13) do hold if k(.) = 0 (Chapter 6 ) . If (2.13) and (2.12) hold then we will have ph + T w.p. 1 (using the Skorokhod imbedding) and so t;’((Ph)+X(7) w.p. 1. If (2.13) does not hold, then there is a nonnull set on which X ( t ) E (0, l), all t < 00. But for each h, the corresponding paths th(t)may still hit the boundary at a finite time (although those times will go to 00 as h + 0); hence Exb(Ch((Ph)) will not necessarily converge to the correct limit Ex b(X(r))Z,,< . It will be shown below that (2.14) implies (2.11). In general, for a problem in R‘, uniform positive definiteness of the matrix a( - ) guarantees (2.1 1). In the degenerate case, we must check whether the problem is well defined-in particular, whether escape is possible-for otherwise the functionals may not be well defined.
5.2
85
AN APPLICATION TO NUMERICAL ANALYSIS
Consider the special case where there is a real o z > 0 such that
a(.) = a2/2 and wheref(x) = 0. Then X ( - ) is a Wiener process with covariance a2t and Ath(x)= h2/a2,a constant, and It!} a random walk. Then by
Section 5.1, Th( .) converges in distribution to a Wiener process with covariance &, which we will also denote by X ( *). Also, (2.12) holds and so does (2.11) (see below). Hence V h ( x ) R ( x ) . Let us use the Skorokhod imbedding (Theorem 2.2.2). Thus, we can assume that the probability space is chosen so that (w.P. 1 ) -+
t“d -+ X ( t ) uniformly on bounded t sets. Then since 0 and 1 are regular and (w.P. l), b(th(ph)) ~
Jo
Ph
+
(T’),
p h + T w.p. 1
b(X(T)), r
w.p. 1.
k ( X ( s ) )ds,
k ( t h ( s ) )ds -+ 10
The expectations of the b( .) terms converge also. By (2.11), the expectations of the integral terms converge. For this simple case, more classical (and simpler) methods also work. But our intention was to illustrate the weak convergence ideas on a simple example, where the ideas are useful to prove convergence of approximations to an interesting unbounded path functional, which is the solution of a differential equation. The probabilistic approach required that we work with random variables that are elements of the abstract space D[O, co)or that we treat convergence of processes rather than of vector-valued variables. We could not have easily treated the problem by using convergence of multivariate distributions only. Characteristic Operator
The definition of the domain of 2,the weak infinitesimal operator, involves a nonlocal calculation in the sense that X ( t ) may not be arbitrarily close to X ( 0 ) = x, uniformly in w, for small t, and we must calculate limt+o [ E , F ( X ( t ) ) - F ( x ) ] / t . Even if I X ( t ) - X I were suitably and uniformly (in w) bounded for small t, the limit may not exist. But the characteristic operator is a local operator. Define T h = inf{t : X ( t ) = x & h} and let X ( 0 ) = x. It is not hard to check that iff(.) = 0, then E x T h = hz/a2(x) o(h2) and if f(x) # 0 but d(.) = 0, then E x T h = h/ I f(x) I o(h). Using (2.7), we write
+
+
+ k(X) = 0,
[ExV h ( t h ( T h ) ) - Vh(x)]/Ath(x)
which is just an approximation to the “local ” characteristic operator. This
86
5
A SIMPLE APPLICATION OF THE INVARIANCE THEOREMS
suggests that the probabilistic approach taken here is quite natural. Indeed, the functionals of concern (in the uncontrolled case) will usually be in the domain of the characteristic operator. PROOF THAT (2.12) AND infxeIo,1, P,{z < T } 2 2c > 0 IMPLY (2.11) FOR h, WHERE T AND c ARE POSITIVE REAL NUMBERS Suppose that there are sequences h 0,x h + x E [0, 13, such that
SMALL
Pxh{ph< T}+O
as
(2.15)
h+0.
The sequence {th( *), (“0)= xh} can be shown to be tight in D[O, 0 0 ) and all limits have the form (2.2) for X ( 0 ) = x. The measure induced by each solution (for perhaps different Wiener processes) of (2.2) is the same by our uniqueness assumption. Since, by (2.12), the escape time z is continuous w.p. 1, and g h ( t ) + X ( t ) uniformly on finite intervals (using Skorokhod imbedding), we have that p h + z w.p. 1 as h + 0. The set of paths in D[O, a), for which? z’ T is open. Thus (Chapter 2) limh+oP,,{ph < T } 2 P,{z T } 2 2c > 0. This contradicts (2.15). Hence
-=
-=
inf PX{ph< T } 2 c, XEIO.
all small h.
(2.16)
11
denote the stopped processes th(tn p,,) and t: n N h , resp. Let $(*), Then p h 2 2 T if and only if t h ( T )E (0,l), and the escape time is 2 T when the initial condition is zh(T).Thus (for small h)
> 2T} = E ~ z { < h ( T(0. ) ~l))z{th(ZT) Using the Markov property of {tt}, Px{ph
Pxbh
2 2T} = Ex
E
(0,1))
*
I(th(T)~(O 1 ),) P t h ( T ) { p h 2
5 E X I { p ( T ) € ( o,))Q * - c ) 5 (1 - Cl2.
In general (for small h ) P,{ph 2 nT} 5 (1 - c)”, which implies that there are numbers M , < 00 for which (for small h) E x p i IM ,
uniformly in x
E
(0,l),
which implies the uniform integrability. On the Finite Diference Method and the Choice (2.5)
The purpose of the choice between forward and backward differences in the approximation (2.5) is to obtain a finite difference equation whose coefficients are nonnegative. With either a forward or backward difference
t In a more formal notation, let r ( x ( * ) denote ) the escape time of x( .) from (0, l), where x( -) is an arbitrary element of D[O, w). In this terminology, P,(o)= i(th((o.-)).
5.2
87
AN APPLICATION TO NUMERICAL ANALYSIS
approximation, the coefficients would sum to unity, but they could be negative. With our choice, the dynamics help “push the particle to the right if f(x) > 0 and to the left iff(.) < 0. Any approximation method which provides a computationally convenient chain {ti}and whose interpolations converge to X ( . ) in distribution would be satisfactory. However, the finite difference procedure provides an automatic method of obtaining such processes. Neitherf( - ) nor u( need be continuous (Chapter 6) if the proper approximations are used. ”
a )
CHAPTER 6
Elliptic Equations and Uncontrolled D$I usions
In this chapter, we develop in detail the basic techniques for approximation of a diffusion by a Markov chain and for showing convergence of functionals of the chain to functionals of diffusions and to weak solutions of degenerate elliptic equations. Section 6.1 gives some assumptions and formulates part of the problem. Section 6.2 discusses a class of approximating chains. The techniques involve using certain finite difference approximations to the elliptic equations. This approach has the advantage of yielding a simple way to get the approximating chains; but it is not the only way. Section 6.3 discusses the natural interpolations for the chain, proves the necessary tightness, and shows that the limit of the interpolated chains is a diffusion. In Section 6.4, we discuss the convergence of functionals of the chain to weak solutions of the partial differential equation, and the problems that arise due to the presence of a boundary. Section 6.5 deals with various approximations for the discounted problem. 88
6.1
PROBLEM FORMULATION
89
In Section 6.6, we treat a technical question concerning representing a certain term /?: in the form o(t:) SW: + “small” error, where the natural interpolations of the indefinite sum of the 6 W: converge to a Wiener process in distribution. The representation will be very useful in the Chapters 8 and 9, which deal with approximations to nonlinear partial differential equations and to optimal control problems. Section 6.7 discusses the use of the chains for Monte Carlo approximations. In Section 6.8, we show how to approximate invariant measures of the diffusion by a naturally weighted invariant measure for the approximating chain. Here, the space D(- co, 00) is used as it is very useful in dealing with stationarity properties. Various extensions are briefly discussed in Section 6.9, and in Section 6.10 some numerical results are presented. A Lipschitz condition is not used. In its place, we require only that the solution of the stochastic differential equation is unique in the sense of probability law. The techniques developed here and in the following chapters are readily applicable for the approximation of many types of path functionals with which we do not explicitly deal. For example, we can approximate functionals such as Ex sup F ( X ( t ) )exp - A t ,
1 > 0.
m>rt0
In the control problems of Chapters 8 and 9, owing to the nature of weak convergence, the approximating Markov chains yield information on a great variety of functionals of the optimal process. 6.1
Problem Formulation
This chapter consists partly of a generalization of the ideas and results of Chapter 5 together with proofs. We also treat the problem of calculating invariant measures, a “discounted problem,” and several related results. We will deal with the equation X ( t )= x
+ joff( X ( S ) )ds + joro(x(.I)) Ms)
(1.1)
and again (as in Chapter 4) assume A6.1.lt f (.) and o(-)are continuous and bounded R’ and r x r matrixvalued functions on R’, resp., and the solution to (1.1) is unique in the sense of probability distributions.
t These conditions can be weakened. See the remarks at the end of the chapter.
90
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
A6.1.2t k( - ) and b( .) are bounded continuous real-valued functions on R' and G is a bounded open set.
The functional (Section 4.5)R( - ) defined by
1 k ( X ( s ) )ds + E x b ( X ( T ) ) I
R ( x )= Ex
' 0
is a weak solution of
~ V ( X+ )k ( x ) = 0,
x
E
G,
with boundary condition
(1.3)
lim V ( y )= b(x), y
E
G, x regular
(T)
on dG.
Y-x
As in Chapter 5, the technique is the following. For each value of an approximation parameter h (perhaps vector-valued), we seek a finite state Markov chain approximation to X ( - ) . In particular, the chain should be convenient to compute the discrete analog of functionals of the form (1.2). Then we use the weak convergence theory to show that the suitably interpolated chains and the functionals converge to X ( .) and R( .), resp., as h -,0. There are many ways of getting such an approximating chain (5;). We require at least that the first- and second-order moments of (!+ - (!, conditioned on (!, be consistent with those of the diffusion and that the approximation not have large jumps; i.e.,
for some function (depending on 11) Ath((y)which goes to zero as h -,0 [unless, perhaps, a ( y ) = f (y) = 01. Finding chains whose transition probabilities give these properties is not, in general, a simple task, unless one has a rather systematic approach, for otherwise it involves solving a ''large'' nonlinear system of equations in the unknown transition probabilities. Fortunately, a very convenient " formula " is available. Following the idea in Chapter 5, it turns out that if we approximate the operator 2 by a carefully selected finite difference approximation, then the coefficients of the resulting finite difference equation will be the
t These conditions can be weakened.
See the remarks at the end of the chapter.
6.2
AN APPROXIMATING MARKOV CHAIN
91
transition function of the desired Markov chain, where h will be the difference interval, and the Ath(- ) will be immediately apparent from the formulas. The technique suggests other types of approximations. Some of the ideas of this chapter appeared in Kushner and Yu [Kll].
6.2 The Finite Difference Method and an Approximating Markov Chain Let h denote the finite difference interval, in each coordinate direction, and ei the unit vector in the ith coordinate direction. The approximations that we need to use for V J x ) and Vxix,(x),i # j , will depend on the signs ofh(x)and aij(x),resp. Let R; denote the finite difference grid on R'; namely, x E R; if there are integers (positive, negative, or zero) n,, ..., n, such that x = I i e i h n i . Define G, = R j , n G and dG, = {set of points on Rj, - G, which are only one node away from G, along a coordinate direction or diagonal}. Use the approximations V J x ) -,[ V ( x
+ eih) - V ( x ) ] / h
if h ( x ) 2 0,
(2.la)
- V ( x - eih)]/h
if h ( x ) < 0,
(2.lb)
-,[ V ( x )
v,~,~(x) -,[ ~ (+x eih) + V ( X - eih) - 2V(x)]/h2,
(2.2)
and for i # j and aij(x)2 0 use
+ + e j h )+ V ( x - eih - ejh)]/2h2 - [ V ( x + eih) + V ( x - eih) + V ( x + e j h )
vXixj(x) -, [ ~ v ( x ) V ( X + eih
+ V ( X - ejh)]/2h2,
(2.3a)
and for i # j , aij(x)< 0, use
+ + e i h - e j h )+ V ( x - eih + ejh)]/2h2 + [ V ( x+ e i h ) + V ( x - e i h )+ V ( x + e j h ) + V ( x - ejh)]/2h2. (2.3b)
V,,~(X) -, - [ ~ v ( x ) V ( X
Define
and assume that aii(x)-
1 laij(x)I 2 0,
j#i.j
all i = 1, . . ., r.
(2.4)
92
6
ELLIPTICEQUATIONS A N D UNCONTROLLED DIFFUSIONS
Assumption (2.4) is not as restrictive as it may seem. We usually have a choice of coordinate system when doing the discretization and a transformation of coordinates often can be applied to assure (2.4) in the new system. If u ( x ) does not depend on x, such a transformation (to the principal vectors of u ) is always possible. Define the functions Ath(*), p h ( - . .), and Q h ( - ,- ) on R;, R; x R;, and R; x R;I by (i Z j ) Ath(x)= h2/Qh(.x),
p h ( x , y ) = 0,
all other x, y
E
R; .
The ph(x, y) are 2 0, sum (over y) to unity for each x, hence are one step transition probabilities for a Markov chain, whose random variables we will denote by {tl}. The ph(x, y) will be coefficients in the finite difference approximation. If (2.la) were always used to approximate Vxi(x),then J(x) would replace I f;(x) I in Qh(x)andf;(x) or 0, resp., would replacef’(x) andf;(x), resp., in Qh(x,x + e, h ) and Qh(x,x - e, h), resp. Then we could not always guarantee that the ph(x, y) would be 2 0. A similar problem would occur if (2.lb) were always used. Similarly, we allow the choice in (2.3a) and (2.3b) in order to guarantee that the diagonal transition terms ph(x, x f eih f e j h ) are nonnegative. If lA(x)l is dominated from above by the other terms in the numerator of ph(x, x f eih), then either a forward or backward difference scheme could be used at x for that i.
A REMARK ON THE CHAIN Observe that, as uij(x)(i # j ) increases with all other un8(x)held fixed, the probability of moving to the diagonal corners increases until some ph(x, x _+ ei h ) is zero. A further increase in aij(x)would cause that “probability” to become negative, and we must transform the coordinates to preserve the probabilistic interpretation. Figure 6.1 illustrates some two-dimensional examples of the form of the transition function. Substituting (2.1) to (2.3) into (1.3), collecting coefficients of the terms V(x), V(x f ei h), V(x _+ ei h f ej h ) and multiplying each term by h2 and
6.2
93
AN APPROXIMATINGMARKOV C H A I N
xte,h
x - e,h
(b)
FIG.6.1 Examples d x , = f , ( x ) dt
of
+ ~ ( xd)w , f , ( x )
transition 2 0.
probabilities
for {:.
d x , =f,(x) dt, f t ( x ) 2 0 ;
using Vh(- ) to denote the solution to the finite difference equation, yields the equation
0=
-Qh(X)vh(X)
+
Qh(&
y ) v h ( y )-k h2k(X),
X E Gh
Y
Y
with boundary condition
v h ( x )= b(x),
x E dG,
.
The index y ranges over G, u dG,. Equation (2.6) can be rewritten as
Define N ,
= min{n
:
<: $ G,}, and for the moment assume that
6
94
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
Then (2.7) has a unique solution (Section 3.1.2) R h ( * ) defined , by
c k(5l:) All: + E,Wk,),
Nh- 1
R h b )= E x
(2.9)
n=O
where At: = Ath((r;)is used. We will prove that Rh(x)+ R(x), under some regularity conditions, as h + 0. Properties of the Chain
{r:)
Straightforward calculations yield
E,[tl:+
cov,[r:+
1
-
1
-
rl: 1 rl: = Yl = f ( y ) At“Y)
(2.10)
rl: I rl: = Yl = 2 4 Y ) At“Y) + h At”Y)P(Y) - (Ath(Y))2f(Y)f’(Y)
where
Equations (2.10) and (2.11) meet the requirements of the first two lines of (1.4) [{(k} also has a “no jump” property corresponding to the 3rd line of (1.4)] and are consistent with the local properties of X ( - ) given in (1.5.15) and (1.5.16). A convenient representation for {l:}is
rl:+ = rl: +f (53 All: + Bl: 1
=
el: + ml:+ 1 - cl: 1 <;I + (C+1 - rl: - Wl:+1 - rl: I r3),
(2.12)
where {B:} is an orthogonal sequence for each h. In particular,
E.r[(B:)(p:)’ I E,[Bl: A useful representation for
An Interpolation of }!;C{
?< !
5
= ch(
I r!, i I n] = 0.
3
(2.13)
pn is given in Section 6.6.
to a Continuous Parameter Process
So far the development has consisted of a sequence of manipulations and we have only shown that a suitably selected finite difference approximation has a probabilistic interpretation. The interpretation is quite suggestive, however. Equation (2.9) looks like a Riemann sum approximation to (1.2), and the “differential” properties of (2) resemble those ofX( The relationships will be made precise in the sequel. The forms of the expressions (2.9), a).
6.2
95
A N APPROXIMATING MARKOV CHAIN
(2.10), and (2.11) suggest a natural continuous time interpolation. Define the ) time sequence { t i } and the interpolated continuous parameter process th(. by n-
tf: =
1
1 At!,
i=O
tk = 0,
th(t)= t: on [t!, c ! + ~ ) . Define ph = ti,,= escape time of th(- ) from G,. Then (2.9) can be written as (2.14) The paths of th( are in D‘[O, distribution as h + 0. a )
00).
It will turn out that
th(*) + X(- ) in
A REMARK ON THE INTERPOLATION Let us examine the special case of Fig. 6.lb in more detail, whenfl(x) = c1,f2(x) = c2, where the ci are positive constants. Let X ( 0 ) = 0 and let F, denote an &-neighborhood(8 > 0) about the path x,(t) = c1 t, x 2 ( t ) = c2 t. The random path of {ti}moves upward and to the right. By (2.10), the average movement (given ti) of ti+ - is c Ath(y), where Ath(y) = h/(cl c2), and cov(ti+ - ti 1 ti = y) = O(h2)= 0 ( A ~ ~ ( y )Also, ~ ) . Px{supttsL 1 th(s)- cs I > E } + 1 as t + 00, so eventually the interpolated path wanders away from the nominal mean path ct. However, on any finite interval [0, TI, we can invoke a law of large numbers to show that
,
+
I
P, sup
TtstO
I < h ( s ) - cs 1 > &
I
+
0
as h + 0. So the random path becomes a very good approximation to the nominal path ct on each finite interval [0, ?‘‘I as h + 0. These considerations show that for this simple case, the values (2.9) or (2.14) are actually approximations to the integral of k ( X ( t ) )along the curve X ( t ) until exit from G, plus a boundary term; the process t h ( -actually ) approximates the limit path on each fixed finite interval and the values of any functional (which is continuous w.p. 1, relative to the limit process) of the approximations actually converges to the correct limit as h + 0. If the functional is discontinuous on one path (say xo + ct), then the approximations will not necessarily converge if X ( 0 ) = xo ,but will converge for initial values y which yield paths y + cc on which the functional is continuous. There are extensions of these ideas to the more general cases, but this simple example should illustrate that the method is a very natural one, and why it should be able to handle “reasonably well” cases where R( is discontinuous. 0 )
96
6
ELLIPTIC EQUATIONS A N D UNCONTROLLED DIFFUSIONS
6.3 Convergence of the Approximations to a Diffusion Process Let R' denote the one point compactification of [0, initial condition x and define Bh(t)=
c p!,
F h ( t )=
fi+ Ih
00)
= R+. Fix
the
C f(
k(
K h ( t )= fi+ I h S I
Then <'(t) = x
+ F h ( t )+ Bh(t),
Rh(X) = ExKh(ph)
+ E.xb(
(3.1)
-), Fh(.), Bh(-), Kh(-)). Let Oh(* ) denote the quadruple (th( Theorem 6.3.1 If a ( . ) ,f(.),and k ( - ) are bounded and continuous, then {@'(-), p h , h > 0) is tight on D3r+1[0,co) x R'. l f { Q h ( - ) }denotes any subsequence which is conoergent in distribution to a process @( * ) = (<( .), F( -), B( .), K ( .)), then @( - ) has continuous paths w.p. I .
PROOFSince R + is compact, {phJis obviously tight. Let yh( .) denote an arbitrary scalar component of Oh( If, for each such component and real T, there is a real K such that, for all T 2 s + 6 2 s 2 0, and some u > 0, a).
lim E xs +sup 1 y h ( f ) - yh(s)l a drrrs
h-0
I Kd2,
(3.2)
then (2.3.2) holds. Since (2.3.la) always holds, Theorem 2.4.2 then implies -)} will be tight if the tightness and continuity assertions. Obviously, {th( {Fh(-), Bh( is tight. So we need only verify (3.2) for {Fh(*)} and {Bh(.)}. Fix s > 0 and 6 > 0. Define n, by no = 0 and (delete the h for notational simplicity) a)}
n, = max{i : t3 It},
t > 0.
-= t!+ Is + 6. Then we can write Fh(s + 6) - Fh(s)= 1 f(
Let Ii = 1 if s
ns+6-
1
m
i=O
i=n,
Write the first sum as
c
n,+n- 1
f ( < 3 + i = n , + 1 f(t3At!.
6.3
CONVERGENCE OF THE APPROXIMATIONS
97
The first term tends to zero uniformly (in s, w ) as h 4 0. Then, since
C
n , + r 1
i=n,+
At; I 6,
1
(3.2) holds for { F h ( . ) } with 01 = 2. A similar argument works for { K h ( . ) } . The sequence C;=l pi is an R‘-valued martingale. By the discrete parameter version of the inequality (1.2.3),there is a real K O such that
We will show that there is a real K such that for all s, h,
(3.3)
which, together with the fact that E I l4 + 0 as h 4 0 (uniformly in s, w ) and Chebychevs’ inequality, implies (3.2) for {Bh(.)}. For notational convenience (with no loss in generality), suppose that 8; is scalar-valued. Define 1; to be unity if p; is in the sum on the left-hand side of (3.3),and 1: = 0 otherwise. Then ( 3 . 3 )equals F1
where
F2
= 6E
+ F , + F , + F4,
(p;)2pBj”/$li.l)li, i>j.k
F,
= 4 E x(p;))”$l:.l;, i>j
F,
F , = 0 since E[p:l<;, i I n] = 0. Let = /; $ I : . Then 2 sup,. I &,(x) I. Write i? m
/i-1
=
E
K, \
2
1 (/?;))“I: . i be any real number
6
98
ELLIPTIC EQUATIONS A N D UNCONTROLLLED DIFFUSIONS
which is dominated in absolute value by
+ ->.
+ ( p y [ ( p y+ (B! + ... + p:-
+
4E{(B:)Z[(B:)Z @",>'I + . * * In turn, the latter quantity equals 4Ef
i=O
(B!)" + 4Ef (&)'(Bhg + . * . + Bf-l)2. i= 1
The first term is of the form of F,, and for the second term we get the bound (using 1 t!, i I n] = &(<;) At!)
~[(p;)'
03
4E 1&,(t:)AtlZ@hg i= 1
+ ... + B!-
( 1&') 1 n
1)2
I4K1
6 E Sup
m>nt0
\ 2
46Hh.
i=O
Since Y! = Cy=o8: is a martingale, we can use the inequality (discrete parameter version of) (1.2.3) to get
Since I pl I = 1 tl, dominate F , by
-
I
IK ,
m
1 (B:)'
h for some real K,, we can
I K : K l h26.
i=O
Our computations imply (3.3). Q.E.D. Let {Qh( p h } denote a subsequence which converges in distribution and let 7 ) denote the limit process, where Q ( * ) = F ( * ) ,B ( - ) ,K ( . ) ) has continuous paths w.p. 1. The following lemma will be very useful when we need to characterize conditional expectations by " weak limiting" operations. The proof follows from the Radon-Nikodym definition of conditional expectation and from the fact that (see the lemma for notation) g t ( y )is the minimal a-algebra over which all g,(y(s,), .. ., y(s,)) (si I t, n, g,,, varying) are measurable. a),
(@(a),
(<(a),
Lemma 6.3.1 Let y ( - ) and z denote a vector-valued separable random process and a random vector, resp. Let W,(y)denote the smallest a-algebra over which y(s), s It is measurable. Suppose that there is a W,(y) measurable function z such that, for each integer n and bounded continuous real-valued function gn( and numbers 0 I s1 I s, * I s, I t, we have a )
Then, w.p. 1,
6.3
CONVERGENCE OF THE APPROXIMATIONS
99
Theorem 6.3.2 Assume A6.1.1 and A6.1.2. There is a Wiener process W ( such that <(.) is nonanticipative with respect to W ( . ) and <(t)= x =x
and
id +
+ f (<(s)) F(t)
l;.(<(s))
ds +
+ B(t)
a )
dW(s)
K ( t ) = j t k ( < ( s ) )ds. 0
The multivariate distributions of O ( .) do not depend on the particular convergent subsequence.
PROOF W e always assume that the Skorokhod imbedding is used. By Theorem 6.3.1, any subsequence of {Oh(.)} has a further subsequence which is convergent. Fix the subsequence and index it by h. Recall that the limit @( .) is continuous w.p. 1 and by Skorokhod imbedding we can suppose that the convergence is uniform on finite time intervals w.p. 1. Thus, (3.1) yields
+
+
<(t)= x F ( t ) B(t) (3.4) and it is obvious that F( ) and K ( ) have the asserted forms. If we can prove that B( .) can be represented by the asserted stochastic integral, then the assumed uniqueness in the sense of distributions A6.1.1 implies that the subsequence is unimportant since it implies that any convergent subsequence of (hence the entire sequence) {th( .)} converges in distribution to a solution of (1.1). To show that B ( . ) has the desired representation, we will show that it is a continuous martingale with quadratic variation 2a(<(s))ds (Section
-
.
fi
1.4.4).
Equation (3.4) implies that B(t) is B1(<) measurable.? Define Bh(t,A) = Bh(t + A ) - Bh(t) and B(t, A ) = B(t + A) - B(t). By the proof of Theorem 6.3.1, E I Bh(t, A ) is uniformly bounded in h for each t, A 2 0. Hence
l4
{Bh(t,A)}, {Bh(t,'A)(Bh(t,A)F} are both uniformly (in h) integrable. Let n, gn(*),sl, . . .,snI t be defined as in Lemma 6.3.1. Define ch, D, and C, D by
A), C = Egn(<(Sl),* <(sn))B(s,A), Dh = & n ( t h ( S 1 ) , * . S h ( S n ) ) B h ( S , A)(Bh(s,A)!, D = Egn(<(sl), .--,t(sn))B(s,A)(B(s, A))', Ch = '%n(
9
rh(Sn))Bh(S,
* 3
- 9
D O = Egn(t(sl), * . * t(sn)) J
t+A
9
2a(t(s)) ds*
I
ii?,(r)denotes the minimal u-algebra over which +),s
t,
is measurable.
I00
6
ELLIPTIC EQUATIONS AND UNCONTROLLLED DIFFUSIONS
The uniform integrability of the integrands in Ch and Dh and the weak convergence imply that Ch + C and Dh + D as h + 0. But, using the facts that E[Bh(t,A) I th(s),s _< t] = 0, and the conditional covariance properties of Bh(t,A), yields that c h = 0 (hence C = 0) and Dh -+Do as h 0. Hence by the Lemma (w.P. 1) -+
and
E[B(t, A) I t(s), s I t ] = 0,
which gives the result. Q.E.D.
desired
martingale
and
quadratic
covariation
6.4 Convergence ofthe Cost Functionals R h ( .) The following theorem is a consequence of the weak convergence.
Theorem 6.4.1 Under the conditions of Theorem 6.3.2, i f F( is any realvalued bounded and measurable function on D‘[O, m), which is a.e. continuous with respect to the measure induced on D‘[O, 03) b y <(*), then a )
E,F(th(-))+E,F(t(*))
as h + 0 .
The theorem holds i f we replace boundedness by uniform integrability of { F ( t h (.))}. The convergence is uniform on compact x sets.
We will comment on the last sentence only. Suppose that the convergence is not uniform on compact sets. Then there exists an E > 0 and a bounded sequence {xh} such that I E , , F ( t h ( * ) ) - E,,F(t(-))I2
all h.
Select a subsequence such that xh + x. The sequences {t(.), ((0) = x h } and {th(-), th(0)= X h } are both tight on D‘[O, m), and any convergent subsequence of either sequence converges to a solution of (1.1). This, together with the uniqueness of the probability law of the solution to (l.l), implies that E cannot be positive. The remaining technical difficulties in proving the desired convergence Rh(x)+ R ( x ) , concern the continuity? of the random time 7. The proof of
t By “continuity ofr,” we mean that we are concerned with the continuity ofa [0, a]-valued function on D’[O, a).The function evaluated at x ( - ) E D‘[O, a),is the first exit time of x(.) from G. Recall that T and T‘ are the escape times of c( * ) (or X( from G and G, resp. a))
6.4
CONVERGENCE OF THE COST FUNCTIONALS R h ( .)
101
the following theorem follows from the discussion ofexit times in Chapter 4, and from Theorems 6.3.1, 6.3.2, and 6.4.1. Theorem 6.4.2
Assume the conditions of Theorem 6.3.2. If
P,{z = z' < co} = 1, then z is continuous w.p. 1 with respect to
c(.) measure and
E x b ( r h ( p h ) )+ E x
Ex
joT
"Ph
k ( t h ( S ) ) ds
+
b(c(z)),
jo=k ( W ) ds
Ex
for each real T > 0 as h + 0. If; in addition, is uniformly integrable,
{ph}
then
Ph
Ex
as T -+
I,
(4.4)
03,
p w h ( S ) )
ds + 0
uniformly in h, and we can set T
=
co in (4.3).
Unfortunately, (4.1) and E,z < 00 do not imply (4.4), because the approximations <"( .) may get stuck at points which are inaccessible to the process t(.). [Of course, (4.1) and sup,, E,T' < co do imply (4.4).] To see this, consider the example of Fig. 6.2.
FIG.6.2 Example where Ex ph = co and Ex 7 < I .
102
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
Let l/h be an integer. Starting at X(O), the trajectory exits from G at time
7
-= 1. Since ph(x, x + e2 h ) = 3 # 0 for x 2 # 1, there is a nonzero proba-
<:
bility that some will hit the line x 2 = 1 (hence get stuck on that line since Ath(x)= co on that line) before exiting from G. This probability goes to zero as h + 0, but it is never exactly zero. Hence E x P h = 03, despite the fact that P h + 7 < 1 w.p. 1. Thus we must somehow guarantee that the uniform integrability (4.4) holds.
Uniform Integrability A proof similar to that used at the end of Chapter 5 shows that uniformly integrable if there are real positive T, c such that
infP,{z' IT } 2 c. XE
{Ph}
is
(4.5)
c
Property (4.5) is guaranteed if inf a i i ( x )> 0
XE
for some i .
c
(44
A property implying (4.5) under the required smoothness condition [and implied by (4.6)] is (see Pinsky [PI]): Let o ( * ) E C Z ( R ' ) , f ( -E) C2(R') and suppose that there is a 6 > 0 and an 2 E R' such that
1
(x - ?)'u(x)(x - k) 2 6 x - 2 1'
all x
E
G.
(4.7)
6.5 The Discounted Cost Problem Consider the discounted cost functional (4.2.2)with I , > 0 and associated differential equation (4.5.2). The finite difference model of Section 2 when applied to (4.5.2) yields
0=
-[Qh(X)
4x)h2IVh(x)
+ k(X)h2 = 0,
X E
1
Qh(%
Y ) Vh(Y)
YERlh
Gh ,
which can be rewritten as
= b(x),
X E
(5.lb)
8Gh.
Equation (5.1) has a unique solution. Suppose that Ath(x)+O as h + 0 ,
all x E G,
(5.2)
6.5
103
THE DISCOUNTED COST PROBLEM
or that there are no absorbing points in G. Then it can easily be shown that the limit, as h + 0, of the solutions to (5.1) is also the limit of the sequence of solutions to (5.1) with Qh(x)/(Qh(x) + A(x)h2) replaced by the first-order approximation [l - A(x) Ath(x)]or the limit of the unique solutions to
+
V h ( X )= (eXp - A(x) Ath(X))[E,V h ( ( : ) k ( x ) Ath((.)],
X
E
Ghr
(5.3)
or of
V h ( x )= exp - A(x) Ath(x)ExV h ( t : )
+ k ( x ) [ l - (eXp - A(x) At'(x))]/A(x),
X E Gh,
(5.4)
with boundary condition (5.lb). The limit of the solutions to (5.4) as h -P 0 is the correct R(x), whether or not (5.2) holds. Note that, in (5.4), we merely replaced [exp - A(x) Ath(x)]k(x)A f h ( x )by
k ( x ) !oArh(x)[exp- A(x)s] ds. Let N h = min{n : t i # Gh}.Then a slight extension of the result in Section there depend on x ) yields that the unique solution to (5.3) 3.1.4 (letting the /I is Nh- 1
Rh(x)= Ex
Ah,k((i) At!
n=O
where A:
n (exp -A(t!) At!) n
=
+ E,Akh-
.
= exp
b(tih),
x
E Gh ,
(5.5)
-
i=O
The unique solution to (5.4) is
which can be rewritten as Rh(x)= E x
\
Ph
'0
[exp -
ds]k(th(t))dt
The equation (5.3) discounts the cost over the interval [ t i , t i + 1) by using the discount factor exp - yoA(th(s))ds evaluated at t = r:+ (namely A:), while the form (5.6) uses the average discount factor
104
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
We can conclude that under the conditions of Theorem 6.3.2, and ifl( is continuous and there is a real lo> 0 such that A(y) 2 l o ,all y E G, and if 0
)
(whether t = co or not w.p. > 0),
P,{t = t’}= 1
+
and using (5.5) or (5.6) [only (5.6), if sup,,G Ath(x) 0 as h -,01, then Rh(x)+ R ( x ) as h -,0. The uniform integrability condition is not required when lo> 0.
6.6 An Alternative Representation for /?: and W(.) There is an alternative representation for /?: which allows us to explicitly construct the W( process, and is also very convenient to use in some of the control problems of the following chapters. The main difficulties in the proof of the representation and limit are caused by the possibility that u( - ) may not be positive definite. Hence, we need to introduce an auxiliary process, and to use a somewhat indirect approach. In the chapters dealing with controlled diffusions, it will be necessary to treat a class of controls which depends on the “driving” Wiener process w ( * ) .When we “discretize” these controls-in order to adapt them to the chains-the approximation W“( which we will develop below will be very helpful. W“( ) is an approximation to a Wiener process and we will be able to let the controls (for the chain) depend on the process W h ( - )in , the same way that they depend on w( - ) in the continuous parameter case. The comparisons obtained via this method will allow us to prove various theorems concerning convergence of minimal cost functionals for a sequence of Markov chains to the minimal cost functional for the diffusion. Augment the probability space (whichever one we are using) by adding a R‘ vector-valued Wiener process +( .), which is independent of the {t:}. Factor & ( X ) into a )
-
a )
ch(x)
= ph(x)Di(x)ph(x),
and define o h ( ‘ ) by gh(X) = Ph(X)Dh(X),
where P h ( - )and I),,(-)are Bore1 measurable, and are orthonormal and diagonal matrices, resp., for each x . Define P h . = P h ( t : ) and Dh, n
= diag{dl(t:),
’ * * )
dn(t!)}
= Dh(t:),
6.6
A N ALTERNATIVE REPRESENTATION FOR
/?: AND w(')
10.5
where the {di(x)}are the diagonal elements of Dh(x).We can select P h ( ' ) , Dh( * ) such that &(X) + 2a(x), o h ( X ) + a(x), as h + 0, uniformly in x. Define 6$: = $ ( t : + , ) - $(t!). Let a E (0, 1) and let Din, D;:, D f . denote the diagonal matrices with ith diagonal elements d L t I { d i , n > O ) > d L t z { d i . m z W ) 9 d i , nz{di.&? if)9
resp. Define
6wl:=D;,'Ph,n/?2+ ( I -DInD;:)6$t
= yl: + PI: and W"(t) =
(6.1)
1 7: + 1
t.+1hSt
pl: =
rh(t) + ph((t).
tn+lh
In Section 1.4.4, we derived a Wiener process from a continuous square integrable martingale (and, perhaps, a Wiener process independent of the martingale) and showed that we could represent the martingale as a stochastic integral with respect to the derived Wiener process. We are trying to d o something similar here and to mimic as closely as possible the derivation in Chapter 1.4.4. We are motivated by the fact that Bh(.) is "close" to being a square integrable martingale (Theorem 6.3.2). It will turn out that W"( converges in distribution to a Wiener process, with respect to which we can write 0
)
Theorem 6.6.1 If sup, Ath(x)+ 0 as h + 0 and i f f ( * ) and o(*)are bounded and continuous, then { W h ( * )is} tight on D'[O, 0 0 ) and rends to an R'valued Wiener process W (.), in distribution as h + 0. PROOF First, tightness and continuity of the limits will be shown by using the criterion of Theorem 2.4.2. Owing to the definition of 6$: and to the fact that its coefficient in (6.1) is a nonnegative diagonal matrix dominated by I, it is clear that the interpolation p h ( * )of the { p i } satisfies the criterion of Theorem 2.4.2. Criterion (2.3.3) will be verified for rh(. ), as it was for the Bh(- ) in Theorem 6.3.1. We can suppose, in this paragraph, that the 7: are scalar-valued. Let Gi be defined like the Fi were defined in Theorem 6.3.1, but with y! replacing /?:. Since
E[yl:
1 t!,
i I n] = 0,
GI = 0. Let K denote a real number, whose value may vary from usage to
106
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
usage but which does not depend on either s, 6 or h. The calculation which showed that F, I K6' can also be used to show that G, I K6' since Also, we can show that G3 IK(6' + G4). Using the fact that I yf: 1 I K h l - " , we get that G, I K h 2 - Z " E ~ , P D = o I y ~I1 2Kh2-'"6 1~ + 0 as h -+ 0. Thus (3.3) holds for rh( - ) with hZ replaced by h2- ", and {rh( ph( are tight, and all limits are continuous w.p. 1. Observe that { 6 q } is an orthogonal sequence and, in particular, satisfies a),
E[6W: I g, 6$!cov[GWf:I (!,
a$!-
a)}
1,
i I n] = 0
1,
i _< n] = At:I.
(6.3)
Let { Wh( denote any subsequence which converges in distribution. A proof like that used in Theorem 6.3.2 [to show that B ( . ) was a continuous martingale with a certain quadratic variation can be used to show that W ( * ) , the limit of { Wh(-)}, is a continuous martingale with quadratic variation It. Hence it is a Wiener process. Q.E.D. a)}
The only place where the assumption sup, Ath(x) + 0 as h + 0 is required, is in showing that the quadratic variation is It; i.e., that there are no gaps-as there would be if Ath(x) = 00, say. If the assumption does not hold, then we need to modify W " ( - ) and {6Wf:} slightly. Let K be a large real and ki(-)by number. Define 6Wi as in (6.1), and define
mi = xyro' 6 w ; 6%
=
6 W: I
{ ~ ~ , % ~ ~Wh(t) ~ ) ,=
Wf:
on
[tf:, rf:+ l).
Let Ih( - ) denote the function which has the value 1 on [t!, tl+ and is zero otherwise. Define P(.) by P(t)=
W(r)+ J'P(S)
if At! > Kh"
d$(s).
0
{ Wh(.)} is also tight and all limits are Wiener processes, but we will not go into the details. The process P(.) can be used instead of W " ( * )below. Henceforth, we assume that sup, Ath(x) + 0 as h + 0. All the results can be extended to the general case at the cost of extra detail. A direct calculation (in which we use the fact that ph, ,,Dh, D l p h , ,,Sf:= pf:w.p. 1) show that we can write pf:as
pf:= Ph,.D),, 6w:+ &:
where
gh(t:)
6 q + &f:,
W.P. 1,
(6.5)
6.7 MONTE CARLO
107
The continuous parameter process E"( *), obtained by interpolating {E:}, can be verified to converge to the identically zero process in distribution. We can write <:+ 1 = +f(<:) At: + ah(<:) SW: + E : . (6.6) Let {<"(*), W " ( - ) ,E " ( * ) } denote a convergent subsequence whose limit is denoted by {<( .), W(*), 0). By using the technique of the proof of Theorem 8.2.1 and Theorem 8.2.2, it can be verified that <(.) is nonanticipative with respect to the Wiener process W ( * and ) that
<:
Only an outline of the proof of the latter assertion will be given. For each A > 0, we can rewrite <"(.) in the form
+ A) = Sh(mA)+ 5 f ( t h ( s ) ) ds + O(h) + a,,(<'(mA))[V(mA + A) - w"(mA)] mA+A
<"(mA
mA
+ [E"(mA + A) - E " ( ~ A ) ] + [E", '(mA + A) - '(mA)], Ehv
where 2". '(mA
(6.8)
+ A) - 2"' "(mA) is an error due to the approximation of nmA+A-
1
1
Oh(
SW:,
by the 3rd term on the right, which term equals n m A + A- 1
Uh(<"(mA))
1
i= nmA
SW: = a,(<"(mA))[Wh(mA
+ A) - Wh(mA)].
Now, rh(- ) + <( -), a continuous process, uniformly on finite intervals, the ( s as ) ) h + 0, the third term tends to second term tends to ~ ~ ~ + " f (
+
6.7
Monte Carlo
The chain {<:} can be used for Monte Carlo type estimations of any functional for which the weak convergence Theorems 6.3.1 and 6.3.2 hold. To generate a sample path of the chain, we simply need to generate a
108
6
ELLIPTIC EQUATIONS A N D UNCONTROLLED DIFFUSIONS
sequence of random variables, each drawn from a simple discrete distribution; e.g., if tl: = x, to get a sample value of tl:+l,we need only to draw a sample according to the distribution {ph(x,y)}. Suppose we wish to calculate Ex Sb k ( X ( s ) )ds. Using {ph(x,y)}, generate, say, N sequences of sample paths {tih, n = 1, 2, .. .}, i = 1, . .., N , with <$ = x and each sequence being generated independently. The ith such sequence need be generated only until Nb, the time that {ti n = 1, 2, . . .} first leaves G h .Then use
’,
1
N
Nhi-
1
for the estimate. We d o not know how the efficiency compares with other approaches, e.g., with that which uses a discrete time approximation to X ( * ) .For the latter method, we choose a A 0, generate a sample sequence {awn, n = 1, 2, . . .} of independent “almost Gaussian random vectors with covariance AZ and use the iteration
=-
”
Xf+ = X t + f ( X f ) A + o(Xf) 6~:. Perhaps the natural time scaling will work in favor of our method. Also, it is somewhat easier to generate a sample from our discrete distribution, than to generate a good approximation to the Gaussian random vector.
6.8 Approximation of Invariant Measures The development will require the following limit theorem for Markov chains (Chung [Cl, Section 1.15, Theorems 1, 2, 31). Let { Y,} denote a recurrent finite state Markov chain with stationary transition probabilities { p i j } , and whose states form a connected class. Define
Theorem 6.8.1 For any real-valued funcrion F( -), 1
n-1
w.p. 1 and in expectation. In order to d o computations with the chain {ti},the state space must be finite for each h > 0. The state space will be finite if the process varies over a
6.8
APPROXIMATION OF INVARIANT MEASURES
109
finite region with boundary reflections. Although we will not emphasize such problems in this book (except for the material in Chapter lo), much the same development would be used-with the proper model for a reflected diffusion (Strook and Varadhan [S4], Kushner [K7]). The range will also be bounded if the diffusion is on a manifold such as a hypersphere. On the other hand, the dynamics for large values of the state are often somewhat arbitrary, and do not-or should not with a proper model-seriously affect the behavior for moderate values of x . Consider the special two-dimensional case.
dXl= X 2 dt, dX2=f2(X) dt + .(X) dw,
X
= (Xl, X2).
If the model is to be used to represent some type of phenomenon arising in a physical situation, it will usually be approximate in some sense. In particular, the values off2(- ) and o( - ) for large values of x are somewhat arbitrary. If the process is of interest over a very long time, one would usually try modelling the functions so that values such as E,k(X(t)) do not differ much from their true (or experimental) values for large t. With this idea in mind, there are several ways by which the range space of X(- ) can be made finite. We only need to assume that there is a bounded open set G so that the range of the process X(.)is contained in G, when X(0)E G. In order to avoid details concerned with the specification of the finite difference procedure in an arbitrary region G, we assume that G is a hyperrectangle, whme sides are all integral multiples of h, for each h which we actually use as a finite difference interval. Except for the difficulty in specifying the approximation, the treatment of more general regions will be the same as for the hyperrectangle. Let G, = G n RL denote the state space. Under our assumptions on C,{t:}E G if ($ E G. Formally, if p is an invariant measure for (l.l), then y = E, k(X)satisfies Eq. (4.5.5),for some bounded function V ( . ) . Let us proceed formally and discretize (4.5.5) using the approximations of Section 6.2. Denoting the finite difference solution by ( V h ( - ) y’), , we have V h ( x )= Ex V h ( < ! )+ k ( x ) Ath(X)- yh Ath(X).
(8.2) If (8.2) has a solution yh which does not depend on x, then (8.2) implies that n- 1
yh = lim Ex n-m
1 k(5:)
i=O
n- 1
At:/E,
i=O
At:.
(8.3)
If G, is an ergodic class, then there is a unique constant yh which satisfies (8.2), for some function V h ( - )The . function will not be unique. In general, there will be transient states, perhaps several ergodic classes [it is not clear
11 0
6
ELLIPTICEQUATIONS AND UNCONTROLLED DIFFUSIONS
whether even uniqueness of invariant measure of X ( - ) implies a similar uniqueness for the chain, for small h] and perhaps periodicity. But there must be at least one recurrent state for each h > 0. Suppose that x is a recurrent state on G h : Define ch(x) to be the recurrence class which communicates with x, and define nh(y)= lirn n-m
1 -
n
C P { t ! = y I tg = x}.
n-1
i=O
Then the term defined by the right-hand side of (8.3) takes the value (by Theorem 6.8.1)
where ,uh(y)is defined by P"Y) =
i1
Ath(y)nh(y)
2 E
10,
Gh(x)
Ath(z)nh(z),
y Y
ch(x),
#
ch(x).
c,}is an invariant measure for {t;:},
Note that nh = {nh(y),Y E it has support on c h ( x ) only. Since the relative time that Ath(y),we have
th(*) spends in a state y
P { t h ( t )= y 1 initial measure for the chain, at n
(8-51
and that
is proportional to = - 03,
is either nh or tLm = x} = ph(y). Thus for t
E
(-co,
03)
E [ k ( t h ( t ) 151 ) = x or has measure nh]= E&X)
= jk ( y ) dph(y).
(8.6)
By Theorem 6.8.1 and the definition of ph we have the following relationships (initial condition, at n = t = - 03, is x)
lim
=
n-r m. w.p. 1
jot"' k ( t h ( s ) )ds/t; =
lim
r-m.w.p.
= lim j ; E x k ( t h ( s ) )ds/t. r-m
These calculations suggest the following theorem.
5 ' k ( t h ( s ) ds/t )
1 0
6.8 APPROXIMATION OF
INVARIANT MEASURES
111
Theorem 6.8.2 Assume A6.1.1 and that X ( - ) has a unique invariant measure p. Let x denote a recurrent state in (which can vary with h). Then the measure p " ( - ) given b y (8.5) is a weak sense approximation to p in the sense that
c),
YE
c
CMX)
k(Y)P"(Y)-, E , W
as h+O,
(8.8)
-
for any bounded and continuous function k( ). PROOF Let {ti}and th( - ) denote the chain and interpolated process, both defined on ( - 0 0 , 00), with initial measure (at t = n = -00)z". Via arguments like those used in Section 3 of this chapter, it can be shown that is tight on D ' ( - ~ o , co) and that any limit t(-)satisfies: there is a Wiener process W ( . ) with respect to which <(.) is nonanticipative and such that
{c"(
a)}
Suppose that we choose a subsequence for which both {t"( *)} and {p"( .)} converge (in distribution and weakly, resp.). Let po denote the weak limit of {P"( *
)I.
By (8.6) and the weak convergence, is a stationary process with measure po . Hence po is an invariant measure for (1.1) and, by uniqueness, p o = p. The theorem follows since {(a)
REMARKS By uniqueness of p, all convergent subsequences of {p"} have the same weak limit, namely p. Also, the invariant measure need not be unique. Let B be a subset of G such that there is a unique invariant measure p with support in B. If E B} = 1 for x E B, then we can use B in place of G in the theorem. REMARKON THE PARABOLIC CASE OF CHAPTER 7 There iS a natural analog of Theorem 6.8.2 for the chain {t: "} of Chapter 7. We only need to let At! be replaced by A, and to replace a"(y) by (for y E G , "(x), the recurrence set associated with a recurrent state x E
c,,)
lim n-rm
1 n-1 C~{~~"=y(<~"=x}=z"*"(y). n i=O
-
11 2
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
6.9 Remarks and Extensions
We have concentrated on diffusion processes on R’. The same ideas work when the diffusion is confined to any smooth manifold, provided that an appropriate discretization procedure is used. Some special cases of current importance deal with diffusions on circles or hyperspheres (see Khazminskii [K2] or Mitchell and Kozin [M4] for an example in stability theory, and Lo and Willsky [L2] for an example in estimation theory). This type of example is particularly easy to discretize since the state variable is the “angle” and the state space is inherently finite. The conditions o n f ( o( .), k( .), and b( .) can be weakened. We assumed that they were continuous only for convenience since otherwise we would have had to take somewhat greater care with the details in the specification of the discretizations and in the proofs. For example, let D ( g ) denote the closure of the discontinuity set of a bounded R‘-valued measurable function g(*). Then the continuity off(.) in A6.1.1 can be replaced by a),
P x { X ( t )E D(f)} = 0,
almost all t > 0.
In lieu of using f (- ) or o(.), we could use some local average of these functions in the construction of p h ( - , -). For example, let N , ( y ) denote an &-neighborhood of y and let &(h)+ 0 as h + 0. Then use fh( .) defined by
.,
in place off( in ph( .). It is likely that some such alteration would yield faster convergence but the question has not been investigated. Indeed, there are probably better ways than the ones used here of approximating the transition probabilities-or even of constructing the state space for the approximating chain (it need not be Gh or even a linear transformation of Gh)-and more work needs to be done on the question. We have used the discretization of the partial differential equations by finite differences largely because it yields a suitably convergent approximating process with relatively little work. Yet, once the basic ideas of the approximation method and convergence proofs are clear, one can exercise considerable freedom in altering the transition functions and either experimentally or theoretically checking rates of convergence and other properties. Suppose that we solve for the Rh(x), x E Gh u d G h , given by (2.14). An area needing much work concerns the problem of smoothing R h ( .) in some way to obtain a “better ” approximation to R( .). Almost nothing is known 0
)
6.10
I13
NUMERICAL DATA
about this. Yet in several simple numerical experiments for furctionals of the form (2.14) and for other functionals as well, simple types of smoothings seem to yield improved approximations of the limit.
6.10
Numerical Data
Some numerical data of interest in stochastic stability will be presented. Let tl( *) and <,( * ) denote two independent wide band zero mean ergodic processes, with spectral densities of heights 0: and a:, resp., at 0. The equation (10.1)arises frequently in applications and, as the band width goes to infinity, the solution processes converge weakly [K2, M2, W3] to the solution of (10.2) in C2[0, co).
dX,
= X, dt
dX2 = - a l X l dt - (a2 - 0:/2)X, dt
+ 01x1 dw, + 0 2 X,
dw,,
X(0) = X,
(10.2)
w l ( .), w,( - ) independent Wiener processes.
One way to get information on the stability properties of (10.1) is to calculate various quantities for (10.2),say the average time required for the path to leave some bounded set or similar quantities. Consider the set
G = { x : ( x i I < 3, i = 1, 2} - {0}and quantity R ( x ) = Ex exp -IT,
T = inqt : X(t)!p
G}.
The values of R ( . ) yield some “stability” information. Indeed, if it can be calculated for several (small) values of A, then the characteristic function of the distribution of T can be estimated. Denote the solution to the finite difference approximation (5.3) by Vh(.), where we set k( = 0 and b( - ) = 1 (b(0)= 0). Set a, = a, = 1, 0: = 0: = 0’. A necessary and sufficient condition for the mean value (resp., second moment) of (10.2) to go to zero as t -,co is o2 < 4 (resp., a2 < 4) [K2, M4]. Refer to Fig. 6.3, where I = 0.1, c2 = 3, and h = 0.3 and 0.15. The level curves are close, suggesting that the approximations are “reasonably ” good. a )
11 4
6
ELLIPTIC EQUATIONS AND UNCONTROLLED DIFFUSIONS
1-24
FIG.6.3 Level surfaces for discrete approximation to E x exp - IT.u2 = 3, I = 0.1; - or h = 0.3; or - x - , h = 0.15.
-0-,
-0-
-2
-1.5
-I
-0.5
0
0.5
0.1
1.5
2
FIG. 6.4 Level surfaces for discrete approximation to E exp - IT. I = 0.05, h = 0.3, = 1, (- x -) 8 2 = f. = 3, (-0-)
(-o-) 01
6.10
NUMERICAL DATA
115
Similar results hold for 1 = 0.05. Refer to Figure 6.4, where 1 = 0.05, h = 0.3, and several level curves for several values of o2 are plotted. The results for h = 0.15 are fairly close to the plotted results. The curves give information on the path excursions of the unstopped process (10.2). In particular, they illustrate how rapidly the path excursions grow as 0 increases, even when it is well below the upper limit for mean square stability.
CHAPTER 7
Approximations f a r Parabolic Equations and Nonlinear Filtering Problems
The chapter treats the nonhomogeneous finite time version of the problem of Chapter 6. A degenerate parabolic equation replaces the degenerate elliptic equation. As in Chapter 6, a suitable finite difference approximation to the partial differential equation yields a Markov chain approximation to the diffusion. The interpolations of the Markov chains converge to the diffusion in distribution, as the finite difference intervals go to zero. With the use of the chain, many functionals of the diffusion can be approximated. Equivalently, we can approximate the solution to the degenerate parabolic equation by simple calculations with the chain. The explicit, implicit, and explicitimplicit methods are discussed, together with their probabilistic interpretations. An iterative method is discussed for solving for the weak-sense solution to the Fokker-Planck equation; i.e., to approximate the evolution of the weaksense density of the diffusion as time increases. A standard problem in nonlinear filtering is also treated; we give a method analogous to that for 116
7.2
117
THE FINITE DIFFERENCE APPROXIMATION
(approximately) solving the Fokker-Planck equation for approximating the conditional density of the diffusion, given " noise-corrupted " data. In the last section, there is some numerical data concerning the approximation of the invariant measure for a simple problem. 7.1 Problem Statement We will treat computational problems associated with the nonhomogeneous Eq. (1.1) and the cost functional (4.2.4). X ( s )= x
+ lfSf(X(u),u ) du + j f s ~ ( X ( uu)),dw(u),
X ( t ) = x. (1.1)
Assume A7.1.1 and A7.1.2. A7.1.1 f(-,.), c(*,.) are bounded and continuous, and (1.1) has a unique solution (in the sense of probability law) on t h e j n i t e interval [t, TI,for each 0 < t < T , and each x E R'. The solution is a strong Markou and a Feller process. (Again, as in Chapter 4, the Markov and Feller properties actually are consequences of the uniqueness.) A7.1.2 k( *, * ), b,( -,.), and bT( are bounded and continuous functions on R' x [0, TI, R' x [0, T ] and R' resp. a )
In the next section we will apply a finite difference technique to the parabolic partial differential equation (4.5.4) which, formally, is satisfied by the functional (4.2.4). The finite difference method will yield a Markov chain, whose interpolations will converge to X ( in distribution as the finite difference intervals A, h + 0. Again, the interpolation will be piecewise constant, but over nonrandom time intervals A. 9 )
7.2 The Finite Difference Approximation and Weak Convergence In the last chapter it was necessary to choose the finite difference approximations with some care if we wanted the coefficients in the finite difference equations to be one-step transition probabilities for some Markov chain. We must make analogous choices for the current problem. Let A and h denote the finite difference intervals to be used to approximate the derivatives with respect to time and with respect to each of the spatial directions, resp. The
118
7
PARABOLIC EQUATIONS
quantity h could vary with the coordinate direction but the formulas then become much longer. The approximations (2.1)-(2.4) are for the explicit method of approximation for parabolic equations. The implicit method is treated in the next section. We will use the finite difference approximations (2.1)-(2.4). In these formulas, we do let h depend on the direction, although in the subsequent development we will set h = h i . Y ( x 7t )
+
[ V(X, t + A) - V(X, t)]/A,
(2.1)
+ e i h i , t + A) - V ( x , t + A)]/hi, if J ( x , t) 2 0, [V(x, t + A) - V ( x - e i h i , t + A)]/hi, if J ( x , t) < 0, (2.2) Vxixi(x,t ) -,[V(x + e i h i , t + A) + V(x - e i h i , t + A) - 2V(x, t + A)]/h:, Vxi(x,t ) + [V(x +
(2.3)
+ A) + V ( x + eihi + e j h j , t + A) + V(x - eihi - e j h j , t + A)]/2hihj - [V(x + e i h i , t + A) + V ( x - e i h i , f + A) + V(x + e j h j , t + A) + V ( x - e j h j , t + A)]/2hihj, aij(x,t) 2 0, i # j , (2.4 - [2V(x, t + A) + V(x + eihi - e j h j , t + A) + V(x - eihi + e j h j , t + A)]/2hihj + [V(x + e i h i , t + A) + V ( x - e i h i , t + A) + V(x + e j h j , t + A) (2.4 + V ( x - ejhj , t + A)]/2hihj , aij(x, t) < 0, i # j .
Vxixj(xrt) + [2V(x, t
Vxixj(x,t)
Analogously to (6.2.4), we assume that (for x E R', t < T ) aii(x, t) -
1 laij(x, t) 1 2 0,
i = 1,
and that the quantity defined below by phvA(x,x) is nonnegative:
"[
p y x , x) = 1 - h' h
(2.51
. . ., r,
i#i,j
1i 1 J ( x , t) I + 2 1 Uii(X, t) - 1 1 U i j ( X , t ) I i
i#j, i . j
1
2 0.
(2.6a)
Both (2.5) and (2.6a) are needed to assure that the finite difference coefficients will be nonnegative. The conditions (2.5) and (2.6a) can be considerably weakened if we allow h to depend on the coordinate direction or by use of a coordinate rotation before the substitution of (2.1)-(2.4) into the differential equations. Before proceeding, we briefly examine a simple twodimensional case where hi is allowed to depend on i.
7.2
119
THE FINITE DIFFERENCE APPROXIMATION
Example Consider
Y ( x 9t ) + 9VxIxl(x,t ) + 6VxIx2(x,t ) + Vx2x2(x, t ) + k(x, t ) = 0, XEG, t l T . Then a,, = 9, a l , = a,, = 3, and a,, = 1. A rotation of coordinates can clearly be used to transform the problem into one in which the a( ) matrix is diagonal. For numerical purposes this may be useful. But if we restrict ourselves to using (2.1)-(2.4) without a prior rotation and allow the possibility that h, # h,, then (2.5) must be replaced by
-
a,,/h~-a,,/h,h,~O,
a22/h~-a,,lhlh,20.
Any pair (h,, h,) satisfying h, /h, = 3 will satisfy these two inequalities. The Approximating Markov Chain Define MA by M A A = T , where we assume (an unimportant assumption, for notational convenience only) that M A is an integer. For x, y E R i , t = 0, A, ..., A ( M A - l), define the function p:*"(x, y ) by (2.6a) for y = x and otherwise by p:, ' ( x , x f e, h ) =
ifj.
i
1 aij(x,t ) I + hf'(x)
1
b
+ eih - e j h ) = p:vA(x, x - eih + e j h ) = (A/h2)a,7(x,t), ' ( x , x + e, h + ejh ) = p:* ' ( x , x - eih - e jh ) = (A/h2)a;(x, t),
p:"(x, p:
a,,(x, t ) -
x
pFA(x,y ) = 0,
(2.6b) (2.6~) (2.6d)
for all x , y pairs not covered in (2.6a-c).
Note that the p?'(x, y) are 2 0 and sum (over y) to unity for each x, t. Hence, *)} is a one-step transition function for a Markov chain, whose random variables we denote by {
V",' ( x , nA) =
1 P;~'(X, Y
y)Vh*"(y, nA
+ A ) + k(x, d ) A ,
n < MA, x
with the above boundary conditions. In terms of the random process
E
Gh ,
(2.7a)
{tt-"},
7
120
PARABOLIC EQUATIONS
we can write (2.7a) as Vh.A(x,nA) = E x ,
Vh*A(ti;"l , nA + A ) + k(x, nA)A,
n < MA, x E G h , (2.7b)
<:
where we use t = nA and A = x . The notation Ex, , denotes the expectation given that t;:. A = x . Let N ( h , A ) = min{n : 4: A # G,]. Since we have defined the chain only up to time M A , set N(h, A ) = oc) if <;A E G, for all n < M A . According to Section 3.1.3, the unique solution to (2.7b) is
[
Vh.' ( x , nA) = Ex.
(MA n N ( h , A ) ) - 1
1
i=O
+ bl(t$(t,A)
9
k(
N(h, A ) A ) ' { M ~ t N ( h .
A))
., .
1
(2.8)
.
) satisfies the required boundary conditions. Fix x E R; and t = nA. Define the interpolated process th*A(*) on [t, T ] by = x and thsA(s)= t i Aon [mA, mA + A), m 2 n, and let p(h, A ) = N ( h , A ) A denote the escape time of th*A(-) from G,. Then (2.8) can be rewritten as Vh*"(
Vh'' ( x , t ) = Ex,
[ 'r
T n p(h. A)
k(th' '(S),
S)
ds
+ E x , rbT(th' A(T))z(T
+ E x , r bl(th* " ( p ( % A))?p(h, A ) ) z ( T 2 p ( h .
A))
(2.9)
A)).
The notation E x , r denotes the expectation under the condition that 5,. ' ( t ) = x . The chain has the .local properties ( M A > m 2 n )
It iA =Y ] =f(y)A, .[titi - t i A I t i A = Y ] = 2a(y)A + h @ ( y ) - A 2 f ( y ) f ' ( y )
E x , .[tit1 - t i A COV,,
(2.10) (2.11)
= 'h(Y)A9
which are precisely the same expressions as (6.2.10) and (6.2.11) with A replacing Ath(y).The term 3( ) is defined below (6.2.11). We can write
-
tit1 = t i A+f(t, )
+ B:",
m 2 n,
= x,
(2.12)
where ( p i " } is an orthogonal sequence whose conditional covariance is given by the right-hand side of (2.11). From here on, the development is almost exactly like that in Chapter 6. It is a little simpler since we need only consider the processes on a finite
7.2
121
THE FINITE DIFFERENCE APPROXIMATION
interval [0, T ] and since the interpolation interval is a constant, not a random variable. Define (where i 2 n )
Then, analogously to (6.3.1), we have
th*'(s)
=x
+ Fh' '(s) + Bh*'(s),
sE
[t, TI.
The proofs of Theorems 6.3.1 and 6.3.2 immediately yield
.), o( ., - ) be bounded and continuous. Theorem 7.2.1 Fix x, t and let f Then { t h s A ( - ) , B h * A ( - t) , T , p(h, A) n T } is tight on D2'[0, T ] x [0, TI. If h, A indexes any convergent subsequence with limjt <(-), B ( * ) ,p n T , then t(.), B( .) have continuous paths w.p. 1. There is a Wiener process W (*), with respect to which t(.) is nonanticipative and ( a ,
=,+ItS f
( t ( v ) , v ) dv
+ B(s),
s
E
[t, TI.
Assume A7.1.1. Then the multivariate distributions of depend on the particular subsequence.
(2.13)
t(.),B ( . ) do
not
Define 7 and 7' as in Section 4.2. Then Theorems 7.2.1, 6.4.1, and 6.4.2 immediately yield
Theorem 7.2.2
Assume A7.1.1, A7.1.2, andfix x , t and suppose that P x , r= { ~T } = 0
Px,,{tn T
= 7' n
T } = 1.
(2.14a) (2.14b)
Then Vh'A(X,t ) + R ( x )
(2.15)
as h, A -+ 0 [always assuming that (2.5) and (2.6a) hold as h, A -+ 01. I f g ( * ) is any bounded measurable function: D'[O, T ] + R', which is continuous w.p. 1 with respect to the measure induced by (1.1) on D'[O, TI, then ~ x , r 9 ( t h ~ A ( . ) ) + E x , r 9 ( X as ( . ) h, ) A -+a
REMARKS For each x , t (2.14a) must hold except for a countable set of T. In fact (2.14) holds except for a set of ( x , T )ofzero measure. The problem of showing (2.14b) [which is more difficult than showing (2.14a)l is precisely
7
122
PARABOLIC EQUATIONS
the problem of showing the analogous result P,{z = T'} = 1 for the problem of Chapter 6. The condition (2.14a) is not required if the boundary function has no discontinuity at t = T, x E dG. Consider the classical problem of the r-dimensional heat equation, where f (- ) = 0, aij = 0 for i # j and uii(x)= a2/2, where a > 0 is a real number. Then the nonnegativity condition (2.6a) reduces to 1 - rAo2/h2 2 0 pr rA I h2/az,which is precisely the classical criterion of von Neumann for stability of the finite difference approximation [using (2.1) and (2.3)] to the heat equation (Garabedian [Gl, p. 4761, for r = 1). The chain can be used for Monte Carlo simulation of the X(.) process (1.1). Interpolations over random time intervals are not required and owing to the weak convergence Theorem 7.2.2, the distributions of the values of a large class [the g ( - ) functions of the theorem statement] of functions of X ( .) can be approximated using simulations of the chain {ti"}. In this context we should note that piecewise constant interpolations are not required. The interpolations can be taken piecewise linear, for example, without losing the convergence property; perhaps this would yield "better " estimates of the functions. The best form for the interpolation has not been investigated. Whether it is possible to (or how one may) interpolate between several points is not at all clear. In certain cases, one must exercise great care in doing this since we may get convergence of the resulting {ths"( .)} to some process other than (1.1). See Wong and Zakai [W3] or McShane [M2] for a discussion of limits corresponding to various naive or not "approximations" to diffusions. Let $(-) be a Rr-valued Wiener process which is defined on the same probability space on which {c:"} is defined, and let the two processes be independent of each other. Then, using {t: "} instead of in the constructions of Section 6.6, we define the random variables {6W:"} and process Wh* A( .), exactly as {6W;}and Wh(* ) were defined. Then, as in Section 6.6, we have that Wh7""()-,W ( . ) ,as h, A -,0, where W ( * )is the Wiener process used in (2.13), and we can also write (see (6.6.6))
{e:"}
{e;}
=
<: " +f((:
")A -I-ah({: ") SW:
"+
E;
"9
where the interpolation eh* "(. ) of {E: "} tends to the zero process as h, A -,0. The Fokker-Planck and Kolmogorov Equations Under various assumptions (which we will not be concerned with heresee Section 4.3), the density p(x, t ; y, s) of the transition function P ( x , t ; r, s) (s > t) satisfies the backward p.d. Kolmogorov equation in the initial variables (x, t), and the forward Kolmogorov or Fokker-Planck equation (4.3.2) in the forward variables (y, s). There are many problems that are
7.2
123
THE FINITE DIFFERENCE APPROXIMATION
associated with the usual approximations to the solution of the backward equation, even iff( -, -), c( * ) are smooth and a( -, - ) is positive definite. It is not uncommon that " numerical solutions " actually take negative values or have other pathologies. Furthermore, even if a "good" solution can be obtained, if it is only an approximation to the fundamental solution of (4.3.2) then it may be of use for approximating expectations of functionals of X( at some fixed time s, but may not be useful for approximating expectations of more general functionals of X(.). It should be clear that we can get an approximation to the solution of' both the forward and backward equations directly from the p$ "( .). We will develop the details for the nonhomogeneous case. Suppose that T = 00, and order the points on R; and? Gh u dGh in some way. Define the matrices
.,
a )
.,
where we let nA = t and NA = s. The empty product is defined to be the identity matrix. The matrix 02 i can be considered to be a function of four variables, the row and column entries x, y and the initial and final times n, N. Considered as such, it is a weak-sense approximation to the weaksense density p(x, t; y, s ) (as a function of x, t, y, s ) in the following sense. Let g(-) denote a bounded and continuous real-valued function on R'. As h, A + 0, but with nA and NA fixed at t and s, resp., Theorem 7.2.2 yields that
1 g(yP2 i ( x , Y ) Y E Rh'
+
I
(2.17)
g(y)P(x, t ; dY, s).
The approximation D: i is on the infinite state space R ; . We can stop the process on leaving a bounded open set G.Let P ( x , t ; r, s ) denote the transition function for the stopped process, and $;AA( -,- ) the one-step transition function for the chain, stopped on first hitting dG,. Define P:kA = {$:*"(x, y), X, y E Gh U dGh}, and define 3
eA *"
&I!
G {$i(X,
y), X, y
E
Gh
U
dGh).
(2.18)
Assume that P , ,{T = T'} = 1 and P , ,{T = s} = 0. Theorem 7.2.2 yields
1
Y E Gh u
s(y)$
i(x, y) +
JGg(y)@,
t ; dy, s)
as h, A -b 0. (2.19)
The limit in Eq. (2.19) is valid if we sum and integrate only over dGhand dG, resp., or over Gh and G, resp. Using (2.18), the expectation can be iterated
t Recall that aG, is the set of points in R, - G,
which are only one node away from G,.
7
124
PARABOLIC EQUATIONS
forward or backward in time, yielding weak-sense approximations to the weak-sense density j(x, t ; y, s) as s or t vary, resp. If we are only interested in approximating (weakly) the weak-sense solution to the forward equation (for the stopped diffusion) for some particular fixed initial value x E G,, then the iteration in (2.18) can be simplified by proceeding as follows. Let @$(x) denote the xth row of 62;.Define @p(x) = (0,0, ..., 1, 0, ...) where the only nonzero element is in the xth place. Define, recursively
O"!$+,(X)
= &$(x)P$",
N = n,
... .
(2.20)
The ;(x) constructed in (2.20) is precisely the xth row of D!:; . The amount of computation required to calculate (2.20) is of the same order as that required with the use of any finite difference approach to the forward equation. Our approximation is a transition density for a process {t;:"} which is very closely related to the diffusion X ( . ) .
7.3 Implicit Approximations The approximations (2.1t(2.4) constitute an explicit type of approximation, in the sense that they yield explicit recursive equations for the evaluation of the solution, in the classical sense of numerical analysis (Forsythe and Wasow [F2]). This is made evident by the iteration (2.7a or b), which starts at t = T = N A (where the boundary condition is given), and moves backward one A step at a time, and Vh*"(x, nA) is computed as an explicit linear combination of the Vh*"(y, nA A), y E G, u JG,. To get the standard type of implicit approximation (Forsythe and Wasow [F2]), we need to approximate the parabolic equation by considering it as a degenerate elliptic equation and use the method of Chapter 6. To do this, we make time into a state variable by defining a new variable X o ( . ) , with X o ( t ) = 1. Then we have (4.5.4) with a/dxo and xo, resp., replacing a p t and c, resp- The set G is replaced by G x [0, T), from which the escape time is always bounded above by T. Let T and T' be defined as in Chapter 4.2 and assume (2.14). Then we can apply the various theorems in Chapter 6. Since there are no second derivatives in t appearing in (4.5.4) even if we use a difference interval A for the approximation of the derivative a/& and h for the approximation of the spatial derivativatives, where h is not necessarily equal to A, we still get tractable formulas. The method is straightforward. Here we will give an illustration of the idea, by means of a simple onedimensional problem.
+
7.3
125
IMPLICIT APPROXIMATIONS
We consider the example K(x9t ) + a(x, t)Vxx(x,t ) + f ( x , t)Vx(x,t ) + k(x, t ) = 0, (x, t ) E G x [O, T ) , V ( X ,T ) = bT(x) V ( x , t ) = b,(x, t ) ,
x
E
(3.1)
G,
x E dG, t I T .
In Chapter 6, we required that the boundary function b( - ) be continuous. But, if P,, t(r = T } = 0, then the discontinuity at the corner dG x {T} does not affect the convergence since the function with values (at arbitrary x ( - )E D[O, TI, and where z(x(.)) = r is the escape time of x ( * ) from G x [O, T ) ) bT(X(f))l{r>T)
+ bl(x(r),
r)z{r
is still continuous w.p. 1 with respect to the measure induced by (1.1). Define 1 I f ( X , t ) lA/h, Qh, A(& t ) = 2a(& t)A/hz Ath(%t ) = W Q h , A ( & t). since X , namely
=
1 > 0, we use the forward difference approximation for K(x, t),
V(X, t ) + [ V ( X t, + A ) - V ( X ,t)]/A.
(34 Substituting (6.2.1), (6.2.2), and (3.2) into (3.1) and rearranging terms yields the finite difference approximation Vh.' ( x , t ) = Q [ i ( x , t){Vh*' ( x , t
+ A ) + Vh(x+ h, t)[a(x,t)(A/hz)
+ f + ( x ,t)(A/h)]+ Vh(x- h, t)[a(x,t)(A/h2)+ f - ( x , t)(A/h)l}
+ k(x, t)Ath.' ( x , t),
x E Gh , t C T. (3.3) Define the function phpA(x,t ; y, s ) by letting ph*'(x,t ; x, t A ) and ph*'(x, t ; x f h, t ) be the coefficients of Vh*'(x,t + A ) and P v A ( xk h, t), resp., in (3.3). Define ph*'(x, t ; y, s ) = 0 for all values of (y, s) besides (x, t A), ( x k h, t ) . The phsA(z;z') ( z = (x, t ) pair) are 2 0 and sum to unity over z' for each z. They are one-step transition probabilities for a two-dimensional Markov chain whose random variables we denote by We can rewrite (3.3) in the form
+
+
{s:."}.
VhsA(x,t ) = Vh(x,t + A)phvA(x,t ; x, t
+ A) + vh(x+ h, t)phVA(x,t ; x + h, t ) + Vh(x- h, t)phVA(x,t ; x - h, t ) + k(x, t)Ath*A(x,t), x E Gh, t < T.
(3.4a)
126
7 PARABOLIC EQUATIONS
If we write z = (x, t), then (3.4a) can be rewritten using the notation for the homogeneous case; namely, as
z? "
Vh*"(z) = E, Vh."(t? ")
+ k(Z)Ath."(z),
(3.4b)
zh*
is the successor state to z . Define the process "( to be the where interpolation of '}, in the usual fashion, with interpolation intervals At:. " = Ath. "). Then the results of Chapter 6 imply that Vh*"(z) -, R ( z ) = R(x, t) as h, A + 0. The flow of time is implicit in (3.4). Strictly speaking, (3.4a) cannot be solved in the same simple backwards iteration fashion as can (2.7). In a
{z:
a )
"(t:
(C)
FIG.7.1 State flows for the discretization of a parabolic equation in one state variable. (a) Explicit method. (b) Implicit method. (c) Implicit-explicit method.
7.4
127
DISCOUNTED COST; EXPLICIT METHOD
sense, "real" time (the zeroth component of ") advances in any step only with probability ph."(x, t ; x, t + A). When the zeroth component advances, the "real" state, namely x, remains the same. Figures 7.la and 7.lb illustrate the state flows for {<$"} and for {t:"}. To get the implicit-explicit method, we simply randomize between the implicit and the explicit methods. For any a E (0, l), choose an explicit step with probability a and an implicit step probability 1 - a (transitions illustrated in Fig. 7.1~). The implicit, or implicit-explicit, method can also be used for Monte Carlo (see Section 6.7). Real time increases at a rate of A per step of {<$"}. For the implicit scheme, A can take greater values than it can take for the explicit scheme-in fact, it is usually of the order of h rather than of h2. But the average increase in real time, namely At"*" ( z )= A/Qh, "(z), is of the order of h2 also. The time increment Ath. "(z) is "adaptive " to local dynamical conditions; perhaps this helps to explain the frequently observed numerical superiority of the implicit method.
7.4 Discounted Cost: Explicit Method Consider the cost functional (4.2.4b) with associated differential equation (4.5.4), with k ( x ) replaced by k ( x ) - A(x)R(x, t). Applying the approximations (2.1)-(2.4) to this equation yields the finite difference equation (compare with the elliptic case of Section 6.5)
instead of (2.7). Of course the boundary conditions remain the same. The limit of VhvA(x,t) as h, A + 0 can be shown to be the same as the limit (as h, A + 0) of both the solutions to (4.2) or (4.3) [compare with (6.5.3) or (6.5.4) in the elliptic case]
+ + k(x, t)A],
VhSA(x, t ) = (exp -A(x, t)A)[E, Vh9"(<~;",, t A)
Vh*"(x, t) = [exp - A(x, t)A]E, V".
+ [l -exp
-
t)
t k(X,
t),
+ A) (X,
t ) E Gh
X
[o, T ) .
(4.2) (4.3)
128
7
PARABOLIC EQUATIONS
The unique solution to (4.3), with the correct boundary conditions, is V",'(x, t ) = Ex,
np(h'A)[exp- j:L(th(v)) dv]k(r*.*(s),s ) ds
t
+ 4, t
[
p(h, A)
exp -
jt
n(t"s)) ds] .
(4.4) It is not hard to see that Theorem 7.2.2 holds for the above Vh*"( * ) and the functional R(., given by (4.2.4b). Of course, the same convergence result holds if we discretize by either the implicit or the implicit-explicit method. bl(
A(p(h?A)), p(h,
A ) ) z { T > p ( h , A))
0 ,
0
)
7.5 Nonlinear Filtering One of the basic and difficult problems of current stochastic control and communication theory is the nonlinear filtering problem for diffusion models. Let X ( - )be defined by (1.1) assume A7.1.1, and suppose that for some integer u, z( ) is a R"-valued Wiener process which is independent of w( ). Let g( ) and b( denote bounded continuous R"-valued and R-valued, resp., functions on R' x [0, CQ) and R', resp. At each time T 2 t ( t = initial time), let the observations Y(s),s E [t, TI, be available, where we define Y ( * )by a,
-
-
.
a )
dY(s) = g(X(s),S) ds
+ dz(s),
Y ( t )= 0.
In many applications, we wish to calculate or approximate either the conditional probability px, t { X ( T )E l- I Y(S),s E [t, TI},
or the conditional expectation
I
E x , t{b(X(T)) Y(S)9 s E
[t, TI19
where x = X ( t ) is either a constant or a random variable. a recursive solution to the or Using the approximations approximation problem can be obtained (which will be similar to the recursive solution to the approximation problem for the Fokker-Planck equation). For computational feasibility, it is necessary that the state space Gh u dG, be finite. So we will assume that X ( . ) stops on reaching the
rh*A(*),
7.5
129
NONLINEAR FILTERING
boundary 8G.Next, a precise formulation of the problem, which is convenient for our purposes (and also very convenient for other theoretical purposes-such as the derivation of It6-type equations for the evolution of conditional moments) will be given. Let R( .), f ( be solutions of the It6 equations a )
X ( S )= x +
[
s n?
f ( X ( s ) , s) ds +
['" 'o (X (s ))dw(s),
(5.h)
where (w( .), R( -)) and (w(.), X ( .)) are independent processes and T and 5 are the escape times of X(*) and resp., from G. By A7.1.1, both R(-) and f ( - have ) the same multivariate distributions. Let z ( - ) be a Wiener process which is independent of w( ) and ii( * ) and define the process p( ) by %(a),
-
dF(s) = g ( X ( s ) , s) ds + ~ z ( s ) ,
Define R ( - ,
a )
by
R(t, T ) = exp
-
s > t,
F(t) = 0.
[j,g'(x(s), s) d p ( s ) - 4 1 Ig(X(s),s) 1' T
T
ds],
(5.2)
and define
3: = minimal o-algebra over which and the term
a ( s ) , s E [t, TI, is measurable,
a:].
&(x, t , T , 0)= Ex,,[R(t, T ) b ( X ( T )1)
Note that the conditioning serves to fix the p(*)process; the expectation above is only over the R( .) process. One of the fundamental results of nonlinear filtering theory is the represen tation
where V, uses the function which is identically unity for.b(.) (Kushner [K4], Zakai [Zl]). In (5.3) the initial condition is X ( t ) = x, a constant. However, if X ( S )is a random variable with distribution P , ( - ) supported on G, we have (w.P. 1) E[b(%(T)Ia:]=
j V,(x, t, T, o)p,(dx)/j Vl(x, t, T, o ) P , ( d x ) .
Thus, to compute the conditional expectation, we must compute V, for arbitrary b. The continuity of b(.) is not required for the validity of (5.3),
7
130
PARABOLIC EQUATIONS
where b( - ) may be an arbitrary Bore1 function with finite expectation but it is used in the proof of convergence of the approximation procedure. suggests that an approximaThe exponential in the definition of R( tion to V, can be computed in a manner similar to the way we computed an approximation to the discounted cost problem in Section 4, where the exponent in (5.2) would be related to a discount factor such as A( .). In fact, if we note that (d, = It6 differential) 9,
d,[exp - j f T A ( s ) d s ]
=
9 )
[exp -jtTA(s) d s ] A ( r )dt
and that (use It6’s lemma) - d,R(t, T ) = R(t,
T ) g ’ ( X ( t ) ,t) df(t),
we might guess that the proper replacement for A(x) dt in the differential equation (4.5.4) (with k ( x ) - A(x)R(x, t) replacing k ( x ) there) would be -g’(x, t) d v ( t ) . The subscript t denotes the variable with respect to which we are calculating the differential. In fact, it can be shown that the unnormalized conditional expectation V,,(x, t, T, o)formally satisfies the backward equation (in the initial variables x , t) (5.4), although we will not go through the formal calculations; our only aim is the development of a computational procedure for approximation of (5.3) via the use of the chain {c: ’}, and observed data y( -).
d, V,(X,t , T,o)+ Y V , ( x , t, T, o)dt
+ g’(x, t ) d f ( t ) V , ( x ,t, T, o)= 0,
( x , t ) E G x [O, T), (5.4) V,(X,T, T , o)= b(x),
x
E
G,
t i T , xEdG. Equation (5.4) is a backward equation. Equations of the forward Kolmogorov or Fokker-Plank type for the conditional density have been discussed by Kushner [K3] and Mortenson [M5]. The backward equation for the unnormalized conditional expectation is better behaved and will serve, as it did for the ordinary Fokker-Plank equation, to yield a recursive procedure for approximating the conditional expectation or weak-sense conditional density, as T increases, and new observational data become available. However, we start the development by actually doing an iteration in the direction of decreasing t. In what follows, is the signal process, and P(.) is the actual process of observations taken on the diffusion .). We can assume, with no loss of
x(
a)
x(
7.5
131
NONLINEAR FILTERING
x(
generality, that *), ?( .) are the original diffusion and observation process, respectively. Let {Ef"} be a Markov chain on the state space G h u dGhr which is stopped on first exit from G h and which has the same transition function that {<:A} has in the set Gh. The process is taken to be independent of the given process p( -). Also, assume thatt
{r;"}
-=
P x , , { t = t'} = 1,
all x E G, 0 It oc). Let T = M A A , where MA is an integer, and define 6, p ( t ) = ?(t p(t), and define V i * A ( xt,, T, o)by the iteration
I
1
(5.5)
+A)-
V!*'(x, nA, T, o)= exp g'(x, nA)6, a ( n A ) - f I g(x, nA) I2A
X E , , , V ,h . A ( -h. < , A+ l , n A + A , T , w ) ,
XEGh,
MA, (5.6)
with the same boundary conditions as in (5.4). Note that in (5.6) we never take expectations over the p( .) process, only over the "} process. Equation (5.6) has a unique solution, which can be written in the form
{c:
where
V;. '(x, nA, T, o)= Ex, .[Rh*'(nA, T)b(zh* "( T ) )1
n exp[g'(cF ', iA)6, p ( i A )
MA- 1
Rh*'(nA, T ) =
i=n
-f
I g(?F ',
(5.7)
iA) 12A].
The conditioning serves to remind us that we d o not take expectations over the p(*)process. Skorokhod imbedding will be used where useful. Now let us fix x, t and use the weak convergence results of Section 7.2. Under the conditions (5.5) and A7.1.1, Theorems 7.2.1 and 7.2.2 imply that P h s A ( s )
+
qs),
s 2 t,
r(
uniformly in each finite time interval, as h, A + 0, where is a solution to (5.la) [for some Wiener process W ( - ) ]with c ( t )= x. This implies that, w.p. 1, as h, A , 0 ' a )
Since g( ', - ) is bounded and (x, t) are arbitrary, there is also convergence in the mean for each (x, t). Hence, w.p. 1,
t The condition can be weakened. We only need that it holds for the value oft which is the initial value in the original filtering problem. Similarly, if the initial distribution is concentrated on x E all G,, it need only hold for x.
7
132
PARABOLIC EQUATIONS
as h, A + 0. Thus, via the finite iteration (5.6), we can calculate V$ ‘(x, t, T, w ) / V t ‘(x, t, T, o), an approximation to the conditional expectation (5.3). In the paper (Kushner [K6]), a similar convergence is shown if the exponential in (5.6) is replaced simply by the first-order approximation [l g’(x, nA)ijA P(nA)].
+
Recursive Equation for the Approximate Conditional Density
We can derive a recursive formula for an approximation to the weak-sense conditional density from (5.6), (5.7). (Compare with the development for the Fokker-Planck equation in Section 7.2.) Fix h, A and let H denote the number of points on the grid Gh u 8Gh. Order the points in some way, and recall the definition of the one-step transition matrix pgs A from Section 7.2. For arbitrary H vectors V , X and an H x H matrix F , define the operator by VoF=
Vl F11
“p’”],
...
[ :
vex=
[
VHFHH
vHFHl
1.
0
Vl ;Xl
VHXH
Henceforth, the variables x, y range over G h u dGh and are ordered as the points of Gh w aGh are ordered. Let B, E m , and V,(m, o) denote the vectors {b(y)}, {exp[g’(y, mA)dA?(,A) - I g(y, m A ) I’A]}, and {V$ “(y, mA, T , w)}, resp. W e will suppress some of the h, A notation. Equation (5.6) can now be rewritten in vector form as
4
where
&(n, o)= En
0
p;’V,(n
+ 1, o),
n <M,
(5-9)
V,(MA, O ) = B.
Now, for each pair of integers n, in, with m 2 n, define the H x H matrix Q(n, m, o)with components qx,,,(n,m, o)by Q(n, m, w ) = I, the identity,
for m
= n,
and, for m > n, by the recursive formula Q(n, m, o)= En 0 p;AEn+l0 = En 0
P;21 ... Em-l lP!: 0
P,”.‘Q(n + 1, m, w )
= Q(n, m -
1, w)Em-
0
P“hi -A 1.
Now let nA = t, M,A = T. Then by iterating (5.9), (5.10)
7.5
133
NONLINEAR FILTERING
In terms of components, (5.10)can be written as V'd*'(X, t, T, w )=
1 qXY(4MA,w)b(y)-
(5.11)
Y
By (54, (5.1l), and the weak convergence,
+
Ex., [ b ( z ( T )1)%), s E [ t , TI],
w.p. 1,
(5.12)
as h, A + 0. As we will see next, only one row of (5.9) needs to be iterated. Suppose that t is the initial time and that x ( t )is not concentrated at x but has a weak-sense density p,( -). Let the row vector {p:(x)} denote an approximation to p , ( . ) , with support on Gh u 8Ghrand which converges weakly to p , ( * ) , as h + 0. Let Q(n, m, w,x) denote the xth row of Q(n, m, 0).Then Q(n, m, w,x) = Q(n, m - 1, w,x ) E , - ,
0
(5.13)
P'-hA A- , .
For each pair of integers n, m, m 2 n, define the row vector @(n, m, w) with components ij,(n, m, w) by o(n,m, w) = {p:(y)} for m = n, and for m > n, by the recursive formula e ( n , m, w )= Q(n, m - 1, w)E,1
Then we have that
1 Y
iy(%
MA,
I,1
w)b(y)
MA?
+
,
0
'-h A PA-
E [ b ( x ( T ) )I p(s)?
(5.14)
[t,
w.p. 1,
as h, A + 0. Thus the vector with components
!.1
ijY(n,m, 0)
i.b, m, w )= FYhm, 0)
(5.15)
is simply a weak-sense approximation to the weak-sense density of X(mA), conditioned on F(s), nA I s ImA, in the sense that it converges weakly w.p. 1, as h, A -+ 0. Furthermore, it can be computed recursively via (5.14). The quantity j ( n , m, w) in Eq. (5.15) is simply the Bayes-rule conditional density f o r the chain namely (with an obvious abuse of notation) fly(n, m, w )= P{cA = y 16, p(i) = g ( e . ', iA)A + 6 , z(iA), n I i c m},where '} is independent of z( .) and has the same law that {f: "} has. We can see this as follows. It is true for m = n. Suppose that it is true for m = 1 - 1 2 n. By formal manipulations with Bayes's rule, we show it to be true for m = 1.
{e
{c"},
7
134
PARABOLIC EQUATIONS
Let E,(y) = yth component of the vector E,. We have, by the induction hypothesis,
P{z:iq = y I current observation
=
+ 6,z(lA
aA?(lA
- A) = g(z:it,
IA - A)A
- A); 6, ?(iA - A), n + 1 Ii < I } = P{current observation has value dA ?(IA - A) 1 e:it .P(z:i4=y(6,?(iA-A),n+ = (exp
-4
=y}
lIi
16, F(IA - A) - g(y, IA - A)A 1)’ . Qy(qI - 1, o ) K ;
= El- ,(Y)QY(&
I - 1, a),K;,
where K , ,K ; , K ; are normalizing factors which d o not depend on y. The last equation implies that, modulo normalizing factors,
~{z:,
= ~ 1 6 ?, ( i ~ ) , n Ii =
1 4- I(Y)9,(4 Y
< I}
I - 1, o)P{z:g A
=x X
I z:i4
= y},
E Gh U d G h ,
(5.16)
which equals Qx(n, 1, o)by the definition (5.14). Thus, the normalized value (5.15) is the conditional probability, as asserted. If we used the first-order approximation { 1 + g’(y, mA)6, ?(mA)} in lieu of Em, we would not have the Bayes-rule interpretation and, indeed, it is possible that some Q,(q m, o)would be negative even though we would still get the correct conditional expectation as h, A + 0.
7.6 Numerical Data: Estimation of an Invariant Measure Tables 7.1-7.3 give data on a numerical approximation to the invariant measure for the simple scalar case dx = - ( ~ / 2 )dt
+ J2
dw,
using both the elliptic and parabolic approximations. The unique invariant measure is normal with mean zero and variance 2. Since its support is the entire real line, the state space was truncated to either the interval [ - 3, 31 (Tables 7.1 and 7.2) or to the interval [ - 19, 191 (Table 7.3). Let x = left endpoint of the truncation interval. In order to account for the truncation, we selected a number q E (0, 1) and used ph(x, x + h ) = 1 - q, ph(x, x ) = q. Similarly, ph(x, x - h) = 1 - ph(x, x ) = 1 - q when x = right endpoint. The tables give the cumulatioes, i.e., the calculated distribution functions. Since these are antisymmetric about x = zero, only half the points are plotted.
7.6
135
ESTIMATION OF A N INVARIANT MEASURE
TABLE 7.1 ESTIMATE OF A N INVARIANT MEASURE. SPACETRUNCATED TO [ - 3, 31, h = f
x
Gaussian
Approximate Gaussian
E q =4
.2 .4 .6 .8 1.o 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
,556 ,610 .665 ,713 .761 ,805 341 .873 .900 ,921 .940 .962 ,967 ,976 .983
3 8 .614 .669 ,720 .767 .8 10 346 .88 1 .910 ,933 .953 .968 ,980 ,989 ,997
3 9 .616 ,670 .72 1 .768 .8 10 A47
P
B=
4, 4 = I 2
.559 617 .672 .723 ,770 ,812
350
,880 ,910 ,934 ,950 ,970 .982 992 ,998
380
.907 .93 1 .950 ,966 .978 .988 .996
E q=.75 .558
.614 .668 ,718 ,764 .805
,842 374 .901 .924 .943 ,959 ,971 .98 1 .993
Refer to Table 7.1. Column 1 gives the value of the state, and column 2 the cumulative Gaussian (variance 2). We calculated the Gaussian density at each point 0, fh, ..., on [-3, 31 or [- 14, 1-51, interpolated this by use of a piecewise linear interpolation, and then used the interpolation to calculate an approximation to the Gaussian distribution. This distribution is tabulated in column 3. The value at x = 3 is not unity; the error is the density at x = 3, which mass we assigned to (3, a).The other columns give the numerical data using either the elliptic (E) or parabolic (P) schemes. In all cases (and all tables), we took a piecewise linear interpolation between the value obtained at 0, & h, . . ., and plotted the cumulative from that. The
*
TABLE 7.2 ESTIMATE OF A N INVARIANTMEASURE. SPACETRUNCATED TO [ - 3, 31, h = 4
P
Gaussian
Approximate Gaussian
Table 7.1 column 4
q = .3
E
P B = 4, q = 2
P B = 4, q = .3
.6 1.2 1.8 2.4 3.0
,665 ,805 .900 .962 .983
.665 304 903 .96 1 .99 1
.670 .8 10 .907 ,966 ,996
.669 .804 .889 .957 .789
,667 ,799 .982 .949 .985
.673 ,811 ,907 .967 .994
136
7
PARABOLIC EQUATIONS
TABLE 7.3
BTWATE OF AN INVARIANT MEASURE. SPACE TRUNCATED TO [ - 13, 131, h = Approximate Gaussian .25 .5 .75 1 1.25 1.5
,593 .684 ,769 .846 ,915 .973
B
=
P 19 q = I2
P B =f, q = f
P B = f, q = .3
.601 .697 ,786 366 .937 ,985
.602 ,700 .79 1 .873 .945 .989
,595 .686 ,770 ,846 .913 ,972
a.
errors were smaller with the use of the interpolation. The value of h is given in the table; to calculate A (for the parabolic case), we chose a number B E [0, 11 and [see (2.6a)l selected A by the formula B = (A/h2)max[h I x/2 1
+ a’],
X
where I x I I 3 or 1x1 I 1i according to the case and a’ = 2. It is not at all clear whether column 2 or 3 should be taken as the basis of comparison. The values with the parabolic method and B = 1 were about identical to the values with the elliptic method. The approximations calculated by both the elliptic and parabolic methods seem to be rather close to the true values (columns 1 or 2). The main problem is the choice of q. The value should reflect the average amount of real time that the original process spends outside the truncation region.
CHAPTER 8
Optimal Stopping and Impulsive Control Problems
In Section 8.1, we discuss the discretization of the optimal stopping problem by use of the finite difference methods of Chapter 6 and show that the discretization is an optimal stopping problem for a Markov chain. In Section 8.2, we discuss the optimality of the limits of the costs and of the “ interpolated optimal stopping times for the approximating chains. With a fixed initial condition (x), the interpolated Markov chain converges to a solution <(*) of (1.5.1) with respect to some Wiener process W ( * and ) initial condition x. Also p, the limit of the discretized stopping times, is nonanticipative with respect to W(.). We then define a Markov process [solution to (1.5.1)] (a,A3, Bf,X(-), P , , y E R‘) such that a stopping time p (relative to {gf}) is defined on Q and under P,, (X(-), p ) has the same law and cost as has p). Thus we can suppose that p E S;(x). This theorem (8.2.4) is proved under a Lipschitz condition. If there is uniqueness (A6.1.1) but not necessarily a Lipschitz condition then P(x) = limit of discretized costs I V ( x ) .The method of proof involves a “discretization” of stopping times T E Subsequently, the Lipschitz condition is dropped and the results strengthened. ”
(<(a),
rw.
137
138
8
OPTIMAL STOPPING
In Theorem 8.2.5 we show that if V ( . )is continuous, then P(x)= V ( x )and in Corollary 8.2.2, the continuity is removed. Here, we approximate and discretize the pure Markov stopping rules. The methods of Theorems 8.2.4 and 8.2.5 are of independent interest, for they illustrate different approximation techniques which are useful in other problems also. Then various extensions to the discounted problem or to the problem where we are forced to stop on first hitting a set are discussed. The Lipschitz condition is helpful for two reasons. It implies that the yw class is as “good as” the yPM or yRclass, because the path can be represented as an explicit w ( . ) function in a convenient way. Second, it facilitates the proof that certain approximations to diffusions converge as the approximation parameter goes to zero. In fact, these proofs use standard techniques (requiring the Lipschitz condition) and most of the details are left out. In Section 8.3, we discuss the optimal stopping problem with side constraints, a topic on which (as with all stochastic control problems with side constraints) there is relatively little known. In Sections 8.4 and 8.5, the impulse control problem is discretized and it is shown that the discretization is an impulse control problem for a Markov chain. Again, under a Lipschitz condition, it is shown that the limiting sequence of action times and values is minimal. The removal of the Lipschitz condition is also discussed in Section 8.5, and in Section 8.6, some numerical results for the optimal stopping problem are given. In Chapters 5, 6, and 7, linear partial differential equations were dealt with. The optimal stopping and impulsive control problems of this chapter relate to a class of nonlinear partial differential equations.
8.1 Discretization of the Optimal Stopping Problem The treatment of the optimal stopping problem of Section 4.7.1 is a relatively simple extension of the discretization and convergence ideas in Chapter 6. We use the notation of Section 4.7.1 and of Chapter 6. The main problem concerns verification of the minimality of the limiting cost function. Assumptions A6.1.1 and A6.1.2 are assumed t o hold throughout Sections 8.18.3. W e also suppose (unless otherwise mentioned), that there is used a real ko > 0 such that inf, k ( x ) 2 ko . According to the formal optimality Eq. (4.7.6), if we find a smooth function V ( . ) which satisfies
8.1
DISCRETIZATION FOR OPTIMAL STOPPING
139
where B = { x : V ( x ) = b(x)}, then B is an optimal stopping set and V ( x ) is the minimal cost. We will use the finite difference approximations of Section 6.2, and let Vh(- ) denote the finite difference solution. Substituting those finite difference approximations into (1.1) and keeping in mind that V ( x )I b(x), we get that (6.2.7) holds if Vh(x) b(x). Thus since we must use the condition V h ( x )I b(x), the correct discretized equation is
-=
+
Vh(x)= min[b(x), E x Vh(<:) k ( x ) Ath((.)].
(1.2)
Equation (1.2) is the dynamic programming equation for an optimal stopping problem on {
-
Note that inf, k ( x ) Ath(x)> 0. Hence by Section 3.2, (1.2) always has a unique solution, which is the minimal cost, namely, V h ( x )=
inf
m E FPM(h)
Rh(x, m ) =
inf Rh(x, m )
m E Fn(h)
(1.4)
and a minimizing pure Markov stopping rule exists. Let mh denote the optimal stopping time (in FpM(h)or S,(h))for the chain, At;, the optimal stopping time for the interpolated and dejne P h = ) we will sometimes use process, <'(.). When dealing with the t h ( .process, the notation Rh(x,p h ) for Rh(x,mh).Also, a bar over FpM(h),etc., implies that the interpolation is referred to. If p is a stopping time for th(.), define Rh(x, p ) by
I:!;'
Rh(x,p ) obviously makes sense for p of the form m- 1
p =
I At!
i=O
= t h, ,
8
140
OPTIMAL STOPPING
where m E YR(h);but since t h ( -is) a continuous parameter process, the values of p need not be restricted to t:, i = 0, 1, . .. . Observe that? E,ph I 2 maX I b ( x ) I / k o
.
X
(1.5)
The upper bound in (1.5) also holds for the optimal stopping time for the diffusion. Let us use the representation of Section 6.6 for S! and W : . Define
&,"(.) =
(('(a),
Bh(*),F h ( * ) , W ( ' ) K , h ( * ) th(O)= ; x).
It follows from Theorems 6.3.1 and 6.6.1 that the sequence {@(.), ph} is tight on D4'+l[0, co) x R'. The limit (&(*), p ) = (((-), B ( . ) , F ( - ) , W ( . ) ,K ( . ) , p ; ( ( 0 ) = x ) of any convergent subsequence satisfies f
Henceforth, fix a convergent subsequence. By ( 1 4 , the { p h } are also tight on R + = [0, co),and E , p < 00. Ifwealsoassume that the Skorokhod imbedding is used, then ph + p w.p. 1. The multivariate distributions of ((.) d o not depend on the subsequence (uniqueness) but the distribution of p may possibly depend on the subsequence. Also,
8.2 Optimality of the Limiting Stopping Time p We must show that p is a legitimate stopping time in 9; ( x )in some sense and that R ( x , p ) = V ( x ) .So far, we know very little about p. In order for p to be a stopping time in T i , it is necessary to show at least that the function is nonanticipative with respect to W ( - ) [Then . we say that with values t Equation (1.5) follows from I b ( x )I 2 V"(x)2 - max, I b ( x )1 + k, Ex ph .
8.2
141
OPTIMALITY OF THE LIMIT
the stopping time p is nonunticipative with respect to W(.).I Our formulation of the optimal stopping problem actually requires more: namely that p be a stopping time with respect to a sequence of nondecreasing a-algebras on a space which supports a Markov process. This is not necessarily the case with p. However, we will find a Markov diffusion process X ( . ) and an appropriate stopping time p such that ( X ( - ) ,p, X ( 0 ) = x) and (<(*), p ) have the same probability law. Hence R(x, p) = R(x, p). Since we are mainly concerned with the cost functionals, this will eventually allow us to show that R(x, p ) is minimal. The next theorem will be helpful in showing the nonanticipative property of p.
Theorem 8.2.1 Let w( * ) denote an R'-valued Wiener process, and Y a realvalued random variable, and s a nonnegative real number. Suppose that for each integer n and each pair of bounded and continuous functions F ( - ) : R + R, F,( : R" + R, and each set of real numbers (t, s,, . . .,sl) satisfying t 2 s 2 s, 2 s,- 2 ... 2 sl, we have a )
EF(Y)Fn(w(s,),-.-,w(sn))(w(t)- ~ ( s )=) 0.
(2.1)
Then Y is independent of {w(t)- w(s), t 2 s}.
PROOF Let q( - ) denote an arbitrary bounded continuous R'-valued funcand let 1 denote an arbitrary real number. By It6's lemma, tion on [0, a), '(exp i1Y) [exp i jsuq'(p)dw(p)] iq'(a) dw(a) - ___ 2 Note that E exp i jsrq'(a)dw(a) = exp
-4 [' I q(a)1'
do.
.S
Taking expectations in the first equation above and using (2.1) and the second equation and the definition
yields
c, = c, - I"cu .S
8
142
OPTIMAL STOPPING
which has the unique solution C , = E(exp iAY)E(exp i jsrq’(o)d w ( o ) ) .
The last equation yields the cited independence since it implies that the characteristic functional of { Y, w ( t ) - w(s), t 2 s} is the product of the characteristic functionals of Y and of { w ( t )- w(s), t 2 s}, resp. Q.E.D.
Theorem 8.2.2 Let p denote the limit Of{Ph} and let F( * ). F,(. ) and s, t, sl, t o is such that to 5 s and P,{p = t o } = 0 and Y = ZIP I then
. . ., s, be as in Theorem 8.2.1. Zf
EF( Y)F,( W(s,),. . ., W(S,))(W(t)- W ( s ) )= 0.
(2.2) PROOFWe use the Skorokhod imbedding. By a direct evaluation, with y h = 4 P h S 1 0 )’
. . .,
(2.3) since E [ W h ( t )- Wh(s)I lh(a),u I s] = 0 and Y his a functional of lh(o),o 5 s. Recall that (proof of Theorem 6.3.1.) { W h ( t )- Wh(s)}is uniformly (in h ) integrable and note that Z l p h s l o ) + Y w.p. 1 since P,{p = t o } = 0. Thus by the weak convergence, the left-hand side of (2.3) tends to the left-hand side of (2.2) as h -0. Hence (2.2) holds. Q.E.D.
,&(
q Z { P h 5
Wh(sl),
Wh(s,))(Wh(t)- Wh(s))= 0
REMARKIf P,{p = t o } > 0, then we would not necessarily have the conro) + Y as h -,0, w.p. 1. vergence
Corollary 8.2.1 The random variable p is nonanticipative with respect to W ( - ) ,in the sense that ZlP5s1 is independent o f { W ( t )- W(s),t 2 s} for all s 2 0. PROOFLet s be fixed. By Theorems 8.2.1 and 8.2.2, is independent of { W ( t )- W(s), t 2 s} if P,{p = t o ) = 0 and t o 5 s. There are at most a countable number of values of to for which P,{p = t o } > 0. Let t , J s with P,{p = t,) = 0. Let E > 0 denote a positive real number. Note that is independent of S, = { W ( t )- W ( s + E), t 2 s + E } for large n by Theorems 8.2.1 and 8.2.2. Thus I { p s s l is also independent of S, since I(pSsl= limn I(pSIn) w.p. 1. Since the independence is for each E > 0, and W ( s + )= W(s),the proof is concluded. Q.E.D. Next, we discuss the sense in which p is an optimal stopping time, and R(x, p ) is minimal [ = V ( x ) ] .Indeed, R(x, p ) always equals V ( x ) . There are several different types of approximafion techniques which can be applied to prove the result. These techniques seem to be of some independent interest, and can be applied to the study of other problems in stochastic control
8.2
OPTIMALITY OF THE LIMIT
143
theory. In order to illustrate them, several different proofs, under different conditions, will be given.
Theorem 8.2.3 Fix x = X(O), and let f ( * ) and a(.) satisfy a uniform Lipschitz condition. Then p E
s:(~).
REMARK In order to assure that the main idea is apparent, the theorem is worded loosely. A more formal statement (which is the actual assertion to be proved) is the following. Lat f ( - ) and a ( - )satisfy a uniform Lipschitz condition. There is a strong Markov process (Q, A?, &,, P , , y E R', X ( . ) ) , and a Wiener process w ( . ) dejined on each (Q, d,P,), where w ( .) is adapted to {& andIfor ,}each y E R', ( w ( * )X, ( .)) solve (1.5.1) (with X ( 0 ) = y ) w.p. 1 under P , . There is a random variable p dejined on each (Q, J, P,), which is a stopping time relative to {d,}. Also, ((( -), p ) and ( X ( . ) , p ) (under P , ) have the same distributions. I n addition p E S i ( x ) . The costs R ( x , p ) and R ( x , p ) = lim,,+o R"(x,P h ) = lim,,+o V"(x) are the same.
PROOF We have shown that satisfies (1.6) and that p is nonanticipative with respect to the W ( . ) in (1.6). Owing to the Lipschitz condition, we can define a solution to (1.6) for any initial condition y E R' on the same sample space on which W(.) is defined since W(* ) can be used to constructively generate the solution for any initial condition (Section 1.5). In fact, the to (1.6) which we generate from W ( . ) by the solution [denoted by iterative method of Section 1.5 equals (( - ) w.p. 1, when 2(0)= x . However, we will now change the sample space. Define R' = C'[O, a),J [ O , t ] = Borel algebra over [O, t ] , J' = Borel algebra over a', 9;= smallest sub-a-algebra of A?' which contains x ( s ) < a}, all s I t and all c1 E R'. Define Q = Q' x R' x R + , and let d and d,denote the product a-algebras A?' x A?' x J [ O , co) and &; x di x (B[O,t ] u [ t , a)), resp. Let P , denote the measure which the triple W ( . ) , p ; X ( 0 ) = y ) induces on (Q, a), and let the triple ( X ( . ) , w ( - ) , p ) denote the correspondingt random processes on (R, 99).Clearly, X ( * ) and the solution to (1.6) or to (1.5.1) all have the same probability law, if their initial conditions are the same. The random variable p is obviously a stopping time with respect to {d,} since { p It} = R' x R' x [0, t ] E ;8, for each t. Since p was nonanticipative with respect to W(-), and since x ( s ) ,s 2 t, was a functional of {x(t),W(u)W(t),u 2 t } , we have that { I ( p sw(u), r ) , u It}, is conditionally independent of X ( s ) , s 2 t, given X ( t ) , under each P , . Thus we still have the Markov ((a)
x(.)]
{.(a):
(x(.),
t That is, we identify the processes X ( . ) , w ( . ) and random variable p with the three components of o.
8
144
OPTIMAL STOPPING
property [i.e., the “presence” of w ( .) and p and their measure theoretic supporting structure does not ruin the Markov property of X ( .): namely, for each r E B ( R ‘ ) and Y E R’ and t , s 2 0, P , { X ( t + S ) E rig,} = P,,,,{X(s) E r}w.p. 11. Since X ( . ) is a Feller process under the P , , (Q, d, d t , P , , y E R‘, X(.))is a strong Markov process. Hence p E Ti(.).The costs R(x, p ) and R ( x , p) are the same since ( X ( .), p) under P , and (<( ), p ) have the same probability law. Q.E.D.
-
The proof that R ( x , p ) is the minimal cost requires that we compare R ( x , p ) to R(x, t) for a class of stopping times 7. To d o this, we must somehow use the properties of p. The most important property is its definition as a limit of optimal times for the th(). Hence we must somehow “discretize” 7 to obtain a policy which can be compared to ph, and then get the comparison between R(x, p ) and R ( x , z), by taking limits of the costs for the discretized policies in a suitable way. There are several ways in which the discretization can be done. Theorem 8.2.4 illustrates a technique for discretizing stopping times which are functionals of w( - ) and Theorem 8.2.5 and Corollary 8.2.2 illustrate a method for a discretization based on approximating the optimal stopping sets B or B , . In Chapter 9, some different methods are developed. Those methods are actually more general than the discretization techniques of this chapter.
-
Theorem 8.2.4 Under the conditions of Theorem 8.2.3, R ( x , p ) = V ( x ) . Hence, since the limit is independent of the subsequence, V h ( y )converges t o the minimal cost as h + 0, for each y E R’. If we assume only uniqueness of the solution to (1.5.1) in the sense of distributions (and not a Lipschitz condition), then R(y, p ) I R(y, z) for every z E Fwand y E R’.
PROOF We will only prove the first sentence. The second and third sentences are actually implied by the proof. Under the Lipschitz condition, X ( - ) can be represented as a functional of w( - ) for each x ( 0 ) = x. Thus, by Section 4.7.1, inf R(x, 5 ) = inf R ( x , t ) = inf R(x, t) = V ( x ) . TE
FPM
r
E
Fw
IE T R
So by Theorem 8.2.3, we only need to show that R(x, p ) I R(x, t )
for all
tE
yw.
(2.4)
The proof is divided into several steps. Owing to (1,5), (2.4) will hold if, for each T < co, it holds for all 7 E Twfor which 7 I T . We approximate each such t by a stopping time which takes only finitely many values and which depends on the values of w( .) at only a finite number of times. We then use a similar stopping rule for the chain, and show that the cost is no less than V h ( x )= Rh(x, mh). A suitable limiting procedure then yields the desired result.
8.2
145
OPTIMALITY OF THE LIMIT
PART1 Let 0 < 6 4 A < T denote real numbers such that TIA and A16 are integers, and let &,(w) denote the minimal o-algebra over which {w(i6), all i6 InA} is measurable. Let z E Fw,z I T. All i6 or nA terms are assumed to be no larger than T. Define the sequence pi(& A)
=
Ex[r l&(w)],
i
=
1, 2, . . .,
and the stopping time
If ~ ( 6A) , or any other stopping time is not defined at some o,set it equal to T there. As 6 -,0, z(6, A) converges w.p. 1 to a stopping time [which we denote by z(A)] which equals (w.P. 1) either iA or iA + A on {iA - A < z 5 iA}. As 6 -,0 and A -,0, ~ ( 6A) , -,z w.p. 1. Define q = A/6, Q = T/A. For each A, 6, i with i IQ, there is a Lebesgue iA} E A,(& A). Let set A,(& A) E Riqr such that z(6, A ) = iA if {w( jh), j 6 I Ii(6, A ; .) denote the indicator function of A,(& A). Note, for use below, that for any integer ci and any Lebesgue set A E R", there is a sequence of continuous functions f,(.): R" + [0, 13, which converges to I A ( *almost ) everywhere as n -, co. Also, if 4 is a vector of a independent nondegenerate scalar-valued Gaussian random variables, then, since the Gaussian measure on R" is absolutely continuous with respect to Lebesgue measure on R", we have thatf,(+) -,1,(4) w.p. 1 (4 measure) as n -, 03. Let us now fix 6, A and approximate the stopping time ~ ( 6A) , by a , follows. Let {f;(.)}denote a continuous sequence stopping time ~ ~ (A)6 as of functions with values in [0, I] which converges to Ii(6, A ; - ) almost everywhere (Lebesgue, hence Gaussian measure) as n ---* 03. Now, let us drop the 6, A indices where no ambiguity will result. Define the set BY
and, recursively, for i BY
= 2,
={U E
Rqr : f ; ( ~ > ) f}
. .., Q,
= { u E Riqr :fY(u)
> i}-
u B;
i- 1
~
x
Rqr(i-').
I= 1
We have I B I n ( - ) + Z ( - ) almost everywhere as n-, 00. The {BY x Rq(Q-i)r, ?I i = 1, 2, ...} are disjoint and open. Hence, there is at most one value of i for which {w(j6),j6 IiA} E BY. Finally, let zfl(6, A) denote the stopping rule: zn(6, A) =
iA
if {w(jd),j6 IiA} E BY,
T
if not otherwise defined.
Let zn(& A ; w ( - ) ) denote the value of the stopping time.
(2.5)
8
146
OPTIMAL STOPPING
By construction, r,,(S, A ) -,~ ( 6A, ) as n + 00 (w.P. 1). Hence since T < cc, for each E > 0, there is an nE< cc such that n 2 nE implies that R ( x , r,,(6, A ) ) I R ( x , ~ ( 8A, ) )
+ E.
(2.6)
Furthermore, as 6, A -,0, R ( x , ~ ( 6A, ) )
+
R ( x , r).
(2.7)
PART2 Now, we must find a stopping rule, for the {th( .)} process, which “corresponds” to ~ ~ (A6) . ,Let us denote this stopping time by tl(S, A ) and define it by [see (2.5)]
rl(6, A) =
( iA
if { Wh(j6),jS I iA} E BY
I
if not otherwise defined.
(
T
Let rk(6, A ; W h ( * )denote ) the value of the stopping time. This rule may yield stopping times which take values outside of the set {t!, i = 0, 1, . . .}, but that is unimportant. By the optimality of ph, Rh(X, tl(8, A ) ) 2 Rh(X,ph)
Fix x
=
(i = X ( 0 ) and
+
R(x, P).
(2.9)
let h index a weakly convergent subsequence of
{th( .), Bh(-), Wh(.), ph}, and suppose that the Skorokhod imbedding is used.
Since the boundaries of the sets BY have zero probability [ W(- ) measure], the weak convergence yields that r;(S, A ; W h ( * )+) tfl(6, A ; W ( - ) )w.p. 1, as h + 0, where W ( - )denotes the weak limit of { W h ( * ) }Also, . ch(-)+ t(*) uniformly on [0, TI w.p. 1, and by virtue of the Lipschitz condition [which allows us to write t(* ) and X ( - ) as explicit W(- ) and w( * ) functions, resp.], , (-))) and of ( X (*), rfl(S,A ; w( -))) are the same, if the laws of (((-), ~ ~ (A6; W X ( 0 ) = x . Hence, (2.10) Rh(x,z l ( 4 A ) ) -,R ( x , t,(4 A ) ) as h -,0. Equations (2.6), (2.7), (2.9),and (2.10) imply that R ( x , p ) _< R ( x , t ) for all 7 E Fw.Q.E.D. Theorem 8.2.5 k t V ( . )be continuous, and suppose that A6.1.1 holds. Fix x = X ( 0 ) . Let p denote the limit of {ph} corresponding to a convergent subThe value of R ( x , p ) does not sequence of {(’(-), B h ( * ) ,ph}. Thent p E -?:. depend on the subsequence and R ( x , p ) = V ( x ) .
REMARK In the proof, we d o not assume that X ( . ) can be constructed from w(*). The approach is different from that in the previous theorem. PROOF Since A6.1.1 holds, we can suppose that there is an R’, a-algebras
&, J‘ and measures Pysuch that {U,d‘,d;, X ( - ) , PI, y E R‘} is a strong t In the sense of the remark under Theorem 8.2.3.
8.2
147
OPTIMALITYOF THE LIMIT
Markov and Feller process, where under Pi,, X ( .) has the probability law of the diffusion (1.5.1) with initial condition y. The proof will be divided into several parts. First, we enlarge the above probability space to add p . and show that the process is still a strong Markov and Feller process. Then we show that p is as "good" as any c-optimal stopping time for X ( . ) . We di, and P; are the same as those used in Theorem suppose that the R', d', 8.2.3.
PART1 Let h index a convergent subsequence of {th( .), Bh(-), p h ) with B ( . ) , p). We use the construction B ( t ) = & o(t(s)) dW(s) of limit t and Theorem 6.3.2, where W ( t )is given directly as a functional of t(s), s I $(s), s I t , where $(-) is a Wiener process which is independent of t(-). Define p , = I,,,,, . We will first show that (for any Borel set r, and t 2 0, s 2 0, and x E R') (((a),
p,{t(t
+ $1E r 1 t ( u ) , p. , u It i = P , { t ( t + .$) E r 1 t(u), u It )
W.P. 1. (2.1 1 )
We will actually prove (2.12a) [the proof of (2.12b) is similar to that of (2.12a)]
+ A) - t ( t ) I t(.b p . u 5 t l = f ( t ( t ) ) A + 4 A ) = E,[t(t + A ) - t(0I t0)I + 4 A ) , cov,[t(t + A) - t(0I t(u), P. , u I t l = 2a(t(t))A + 44 = cov,[t(t + A) - t(f)1 t(f)I+ 4 A ) ? E,[t(r
?
(2.12a) (2.12b)
where all o ( * )terms are uniform in r. First, we will show that (2.12) implies t . Let F ( - ) be a bounded (2.11). Let t i , i = 1, .. ., n, be real numbers I , , , , ( * )be realinfinitely differentiable function from R' to R and G , , , ( * )G valued bounded continuous functions on R" and R", resp. Then (2.12a) and (2.12b) imply that
E,[F(t(t
+ s ) ) - F(t(t))]Gln(t(t,), . . ., t ( t n ) P 2 n ( p t i ? .. .( p t n )
= E.xG,n(S(t,),
.* .
7
t(tn))G.&tp
. . .* ptn) (2.13)
Equation (2.13)and the arbitrariness ofn, t , , ..., t , , F(.)and G , , ( - ) ,G z n ( - ) imply that
E,[F(t(r
+ s)) It(u), p.
3
u I tl = E,[F(t(t
+ s)) I t(u), u 5 f3
w.p. 1 for all bounded infinitely differentiable F( -), hence for all Borel F( -), from which (2.11) follows for each Borel set r.
148
8
OPTIMAL STOPPING
Although we will not explicitly use it, we note that Eq. (2.11) implies that and { p u , u I t } are conditionally independent, given {(u), u I t . Thus (Loeve [L3, p. 3511) for all Bore1 sets B, E R’ and B2 E R‘, o I t , and s > 0, we have w.p. 1 (T 2 r )
{t(t+ s), s 2 0)
P,{p,E B , lt(s), s I T } = P,{P,E B , lt(u), u I t},
P,{P,,E B , , t ( t
+ s) E B,It(u), u It } = P,{P,,E *
B , 1t(u), u It }
P x { t ( t + s) E B , I t(u), u I t}.
Next, we will prove (2.12a). In (2.13), let s = A and suppose that ..., n, and that F ( t ) = 5, and note that (where
P,{p = t i } = 0, i = 1, P t = 4Ph5fi,) ExGln(t”(tl),
...) t h ( t n ) ) G 2 n ( d l 9
. . . 1
p:J[t”(t
+ A) - t’’(t)l
(2.14)
converges to the left-hand side of (2.13) by weak convergence and uniform (in h ) integrability of {{‘‘(t + A) - (’’(t)}. Also, by using the weak convergence of t”(-) to t(.) and the continuity of the limit t(.), we can show that the value of (2.14) changes by at most o(A) (uniformly in h, t ) if [t’’(t A) <’’(t)]is replaced by j”(t”(t))A. These arguments yield
+
0 = ExG,n(t(t,), -
...?
t ( l n ) ) G ~ n b t i ?. . . ? p t J [ t ( t
+ A) - t(t)
f ( < ( t ) ) A+ o(A)], small h, and t in a bounded interval.
, t , , ..., This last equation, the arbitrariness of n, GI,(-), G Z n ( - ) and (excluding a countable set) implies (2.12a).
tn
PART2 Let B‘, 99; and B[O,t] be defined as in Theorem 8.2.3. Define R = 0’x R’, B = @’ x B[O,00) and B,= B;x (B[O, t] u [ t , 00)). For each y E R‘, select a convergent subsequence of {th( * ), ph, th(o)= y} with limit (((-), p, t(0) = y), and let P , denote the measure that the limit 98). The correspondingt processes on (R, 9) will be denoted induces on (0, by X(*), p. Under P,, the pair X(.),p has the same law as does (t(*), p), when t(0) = x. By uniqueness A6.1.1, the measure P, reduces to P; on B’x [0, 00); that is, if B E @’, then P,{B x [0, 00)) = PI{B}. The last two sentences and the conditional independence (2.11) imply that (Q B,@,, P,, y E R‘, X(-)) is a strong Markov diffusion and a Feller process. [Obviously, the laws of X(.)and X ( . ) are the same (under each P, and P,,resp.). It is only necessary to check the Markov property P,{X(t + s) E r Id,} = P ( R ( t ) ,s, r)w.p. 1.1 Also, P E r:(x). of
t That 0.
is, we identify the process
R(.)and
random variabl: p with the two coordinates
8.2
149
OPTIMALITY OF THE LIMIT
To prove the theorem, we only need to show that for each
E
(2.15)
R(x, p ) I R(x, T(E)), where
T(E)
> 0,
is the &-optimalstopping time defined by T(E) =
min{t : X ( t ) E Be},
where B, = { X : V ( X )2 b ( x ) - E } . This will be shown next. Define
T’(E) =
min{r : X ( t ) E B, - a&}.
NOTE It is always possible to define {a,J t ,J, P , , y E R‘} as we have process is Markov on that space, done, whether or not the resulting X(-) and by construction, p will always be a stopping time relative to {JJand R(x, p ) = R(x, p). However, unless X(- ) is Markov on that space [or at least unless the conditional independence (2.11) holds], the basic stopping problem is altered and, in fact, R(x, p ) < V ( x )is possible. See the example in the remark after Corollary 8.2.2.
PART3 Fix x and let ph(&) denote the stopping time PA(&) = min{t : {‘(t) E Be}.
Assume that dB, satisfies the regularity condition P,{T’(E)= T ( E ) } = 1. Otherwise, we can perturb B, slightly to get a set with that property without invalidating the following argument. The fact that inf P?{T(E)I T } 2 c > 0 Y
for a real T , c, implies (see Chapter 5 ) that the { p , , ( ~ )are } uniformly (in h) integrable. Let h index a convergent subsequence of {{”( ), Bh( ), ph(&),p h } .By the weak convergence and Skorokhod imbedding, {’(-) + { ( * ) w.p. 1 uniformly on finite intervals and (using the regularity assumption) p),(&)+ p(&) w.p. 1, where P ( E ) = inf{t : { ( t ) E B,} = inf{t : { ( t ) E B, - dB,} w.p. 1. The T ( E ) ) and ({(*), p ( ~ )have ) the same probability law [under P , pairs (X(-), and with {(O) = x , resp.]. By the optimality of p h and by weak convergence and uniform integrability,
-
Rh(& P h ) I Rh(X,p h ( E ) ) Rh(X, P h )
+
+
R(x, P(E))
=
-
R(x, T ( E ) ) ,
R ( x , p),
which proves (2.15). Q.E.D. I t is not necessary that V( - ) be continuous, and Theorem 8.2.5 holds without that assumption. We made the assumption to simplify the details. Continuity
8
150
OPTIMAL STOPPING
of V ( .) was used only to assure that B: = B, - dB, 3 B for E > 0. We needed this fact in order to assure that <'(-) entered B: for small h. This latter fact allowed us to compare the costs for the limit p ( & ) = limh+oph(&) and T(&), and ultimately, to get the optimality of p. We can approximate V ( . ) by a continuous function, even in the absence of the Lipschitz condition, and accomplish the same comparison. This can be done as follows. Let X( * ) denote the process given by (1.5.1). Fix A and n and define c ( x ) = b ( x ) and for 0 < i In, b(x), E x V;-'(X(A))
+ Ex
A 0
I
k(X(s))ds .
(2.16)
Since X ( . ) is a Feller process, each V i ( * )is a continuous function. Let A) denote the set of stopping times which take values (0, A, . . ., nA} and where whether or not a stopping time is I iA can depend only on {X(O), . . ., X(iA)}. V i ( x ) is the minimal cost for the stopping problem when the stopping time is allowed to vary only over FpM(i,A). Also, V i ( x ) 2 V ( x )and V i ( x )+ V ( x ) as i + co, A + 0 (Shiryaev [Sl]). Define the sets ypM(n,
B(i, A)
={x:
V i ( x ) = b(x)},
BJi, A) = { x : V i ( x )2 b ( x ) - E } . Within the class if
F",(n, A), the optimal rule is to stop at the time iA, n 2 i, id
=
min{jA : X(jA) E B(n - j , A)}.
An &-optimal policy [stopping time denoted by T(E, n, A)], within the class F p M ( n ,A) of stop rules is given by: stop at iA if iA = min{jA : X(jA) E B,(n - , j , A)}.
Define ph(&, n, A) as follows:
<:
iA
ph(&.
n, A) = id if
= min{jA : t h ( j A ) E
BE(n- j , A)}.
Let x = = X(0). By the optimality of p h , Vh(X)= Rh(X, ph) I Rh(X, PI(&, n, A)). B h ( * )ph, , P h ( & , n, A)). Then, Let h index a convergent subsequence of {th(.), under a regularity assumption,? ph(c, n, A) + P(E, n, A) as h + 0, where p(&, n, A) = min{jA: < ( j A ) E B,(n - j , A)}. The pairs (<(.), P(E, n, A)) and t As in the proof of the previous theorem, if the regularity assumption does not hold. we need only perturb the dB,(i, A) slightly.
8.2
151
OPTIMALITY OF THE LIMIT
(X(-), T ( E , n, A)) have the same probability law. We also have that Rh(x, ph(&,n, A)) + R ( x , p(e, n, A))
and
(7 =
=
R(x, T ( E , n, A))
as h + 0 3
optimal for X ( . ) ) R ( x , ~ ( en,, A))
+
R(x, 7 )
as E -+ 0,A + 0, n + co. A formalization of these arguments yields
Corollary 8.2.2 Theorem 8.2.5 is true ifwe drop the continuity condition on V(.)* REMARK ON THE IMPORTANCEOF (2.1 1) Consider the two-stage Markov process ( Y o , Y,), where r;. takes values a, or a,, and we are to select a stopping rule p = o or 1. Let k ( x ) = 1, b(a,) = 1, b(a2)= -2, pa,., = all i, j . Let Yo = a, and let the stopping rule be (the cheating rule) p = 0 if Y, = a, and p = 1 if Y, = a,. The corresponding cost (for Yo = a , ) is 0, while the minimal cost over all Markov stopping rules is f. Analogously to the construction of (0,g,, P,) in the proof of the last theorem, define Q = { ( a i ,a j , 0),( a i ,a j , 1); i,j = 1,2}. Let d obe the a-algebra over the sets {(ai, (a,, a2),0), ( a i , (a,, a,), 1); i = 1, 2}, and let a,be the a-algebra containing “all” subsets of R. We can easily define measures P, ( x = a,, u 2 ) and random variables { Y o ,Y,,p) on this new space which are consistent with the random triple Yo, Y,, p } and its measures. The random variable p will be a stopping time relative to d i ,i = 0, 1. Also R(a,. p ) = R(a,, p ) = 0 by construction. But even though p is a stopping time, the process { Y o ,F,} is not Markov on the new space. In this example, Pa,{p= 1I Y o = a,) = f. But we cannot select p = 0 or 1 “randomly,” each with probability f (which would give a randomized Markov, hence a legitimate strategy), and then expect the process to evolve from Y o to Y ,according to the law of the process. If p = 1, then we must have 7, = a,. The property (2.11) eliminates such possibilities.
4,
Counterexample to Convergence if k ( . ) Is Not Strictly Positiue
Consider the two-dimensional deterministic example where x
x, = x, = 1, k ( x ) = 0 and
b ( x )=
Let us restrict our attention to A
+ +, 1) 1 + 1x1 - 4
min(x,
= { x : x, 2
ph(x + e l h) = ph(x + e , h) = f,
= (x,, x,),
,
0, x2 2 0). We have x
E
RL n A .
152
8
OPTIMAL STOPPING
The point x = 0 is in B, the stopping set. But if (!, = 0, the chain {ti)still wanders infinitely often to points x where b ( x ) is arbitrarily small ( 1 x 1 - x 2 1 arbitrarily large), as n + co. In fact no finite point will be in the stopping set for the chain. The difficulty arises from the fact that the time horizon is infinite. The th(- ) will approximate X ( .) arbitrarily closely over any finite time interval as h + 0 [if Ch(0)= X(O)].But, for any fixed h, the approximations eventually “wander away” from the X ( .) path as t + co.Thus, we would expect good results only when the “essential” behavior of the system-as far as the costs are concerned-takes place on a perhaps large, but finite, time interval. When k ( x ) 2 k, > 0, the stopping times that we need to consider satisfy E x T I2 supx I b ( x )1 / k , , and we have the “finiteness condition. ”
Discounted Cost
An alternative way to guarantee the “finiteness is to introduce a discount factor. Assume that k ( - )is bounded and continuous, but not necessarily positive. Let A(.), l o ,A ( t ) and A; be defined as in Section 6.5, and define ”
R ( x , T) = E . [ j ; A ( t ) k ( X ( s ) ) ds
+ A(r)b(X(r))],
7 E
Ti,
(2.17)
and V ( x )=
inf R(x, 7 ) . r E .TPMO
As for the nondiscounted case (Shiryaev [ S l ] ) , V( .) also satisfies V ( x ) = inf R(x, 7 ) . r E.FRO
The optimal stopping set is still B V ( .) formally satisfies
+
= { x : V ( x )= b ( x ) } ,and
~ ’ V ( X ) k ( x ) - A(x)V(X)= 0,
V ( x )Ib(x),
the function
x $ B,
V ( x )= b ( x ) on B.
There need not be a finite optimal time and B could be empty. For simplicity, suppose that Ath(x)+ 0 as h + 0 uniformly in x . Applying the discretization method of Section 6.5 [which led to (6.5.3)]to the stopping problem yields the equation V h ( x )= min{b(x), exp - l ( x ) A t h ( x ) [ E xV h ( t : )+ k(x)Ath(x)]}.(2.18)
8.2
153
OPTIMALITY OF THE LIMIT
For each m E Y : ( h ) , the family ofrandomized stopping times for I<:}, define the cost (A! = 0) 1
rm- 1
Then there is a unique solution to (2.18) which is Vh(x) =
inf m
Rh(x, m);
also
inf
V h ( x )= m
E .FPMo(h)
Rh(x, m),
E TRo(h)
where F&,(h) is the class of pure Markov stopping times for the chain {<:}. It can be proved that Vh(x)--+ V ( x )as h -+ 0. We omit the details, which are very similar to those for the undiscounted case, except to mention that the {ph} will always (obviously) be tight on R + , and that limits of {ph} which are infinite with a nonzero probability are acceptable; even if ph or its limit is infinite for some path, the discounting guarantees that the corresponding cost for that path is finite. Forced Stopping o n Leaving a Given Set Let A denote a given open set, and t = inf{t : X ( t ) # A}, T' = inf{t : X ( t ) # A}. We can consider the optimal stopping problem, where X(.)is forced to
, appropriate cost stop at 7,if it has not been stopped before. For p E Y Rthe is and define
V(x) = inf R(x, p ) . p
E
TPM
The above infimum remains the same if p varies over satisfies
+
V(.)
x E A - B,
~ ' V ( X )k ( x ) = 0,
B
9,. Formally,
= {x :
V ( x ) = b(x)} n A,
V ( x )I b ( x ) = b(x)
(2.19)
in A on B u 8A.
Define A, = A n R i . Then by discretizing (2.19) analogously to the way that (1.1) was discretized, we get the equation
+
Vh(x)= min{b(x), E x V h ( t : ) k(x) Ath((.)}, V h ( x )= b ( x )
on
R; - Ah.
x E Ah,
(2.20)
8
154
OPTIMAL STOPPING
We will not go into the details of the convergence proof, which involves a combination of the concepts of Chapter 6 and of the previous sections. Only a few comments will be made. For each m E Y i ( h ) ,define the cost
I
c
(Nh n m ) - 1
~ h ( xm, ) =
E,
k(t:)~t:
i=O
+ b(ti,
m)],
where Nh = min{n : t: r# A,,}. If either k ( x ) 2 ko > 0 or E x N h < 00 or k ( x ) = 0 and P x { N h< co} = 1, then there is a unique solution to (2.20). This unique solution is given by inf
V h ( x )= m
E
inf
Rh(x,m) = m
%PM+ ( h )
E
P ( x , m).
Fn+(h)
(2.21)
Assume that P,{T = T’} = 1 and that either k ( x ) 2 k, > 0 or supysA, E y r < co,or k ( x ) = 0 and Px{r < co} = 1. In each of these cases, the discretized problem is well formulated for small h and V h ( x )-, V ( x )as h -+ 0.
8.3 Constrained Optimal Stopping Problems Very little is presently known concerning the solution to the optimal stopping problem (or any stochastic control problem) under side constraints. However, once the problem is discretized, many types of constraints can be readily handled by the linear programming methods of Section 3.9. We continue to use the discretization ((3 and interpolation <”( - ) of Section 6.1 Let M y , and Myl denote the average number of times that t! = y and the process does not stop, and does stop, resp. Thus M y , = P{stop on reaching state y}. Let M y = M y , + M y l , and fix the initial state to be x E R;; thus px = 1, P y = 0, y # x. As was done in Section 3.9, we add ajctitious absorbing state (to the state space) which we call the zero or rest stare (0). Define the controlled transition probabilities { p y l ( i ) ) by p y z ( 0 )= ph(y,z ) , p y o ( l )= 1, p o o ( i ) = 1, y, z E RL. Set b(0) = k ( 0 ) = 0, Ath(0)= co.Then the “dynamical” constraints are given by (3.9.2), which we rewrite here:
M y = pY +
1p l y ( j ) M z j ,
z.j
Mzj
2 0.
(3.1)
The cost to be minimized, namely (1.3), takes the form
c Myok(y)Ath(y)+ c MY1bb)
4,
= Rh(X,
Y
Y
(34
where the stopping time in is determined by the { M i l } . If there are constraints on the path up to the stopping time, besides the constraint (3.1), then the optimal control policy for the chain may not be
8.3
CONSTRAINED OPTIMAL STOPPING PROBLEMS
155
pure Markov, but will be randomized Markov, as discussed in Section 3.9. If I additional constraints are added to (3.1) and (3.2), then the simplex procedure yields an optimal policy with which the control (or stopping policy) for at most I states will be randomized; i.e., there are at most 1 values of y E Ri for which both M y , and M y l are positive. If t: = y , and the state occupancies are given by { M z j } , then the probability of stopping at time n will be M y l / M y . Let F R M ( hdenote ) the class of such randomized Markov strategies for the chain {t:}. If c denotes a constraint, let FRM(h, C), F R ( h rC), Y p M ( hC), , SpM(C), S R ( C ) ,S , ( C ) denote the appropriate class of strategies for the chain or for X(.),for which C is satisfied. Define inf
Vh(x,C ) = m
E
/RM(h.
Rh(x,m).
c)
(3.3)
For the constraints with which we will deal, it will be true that inf
Vh(x,C ) = m
E
YR(h,
Rh(x, m).
c)
(3.4)
For notational convenience, let us suppose that h always indexes a sequence for which all the relevant random variables converge to their limits w.p. 1 (Skorokhod imbedding) in the appropriate topology. We now give several examples of constraints which can be handled, although only Example 1 will be developed in any detail. Example I Let K denote a real number and for stopping times 7, let C denote the constraint E , z I K . In the discrete model, the appropriate constraint is
1 MYoAth(y)I K . Y
(3.5)
Equation (3.5) assures the existence [whether or not k ( . ) > 01 of an optimal policy and then the minimum of (3.2) under constraints (3.1) and (3.5) is precisely Vh(x,C ) . Let mh denote the optimal constrained stopping time for {ti},and P h the “interpolation” [the optimal time for {‘(*)I. Since E,ph I K and ph p w.p. 1, for some random variable p, Fatou’s lemma implies that -+
E , p IK . Also, Rh(x,ph) -+ R(x, p ) as h -+ 0. The probability of stopping at a point
y # 0, given the entire past, is no longer 1 or 0, depending on the value of y as in the unconstrained case, but it is still a function of y since P,{mh = .It,”,j < n,
t:= y # O} = P,{mh = n I C = y # O} = M y l / M y ,
where { M y , }solve the linear programming problem. Using this property, we
8
156
OPTIMAL STOPPING
can duplicate the proof of Theorem 8.2.3, and show under a Lipschitz condi(x,C). We can also show that tion on f(.) and o( that p E 9; 0
)
R(x, p ) I R(x, T), z E S,(C) (3.6) [still assuming that the space on which X ( - ) is defined also contains the Wiener process w( .)I. The proof is roughly the same as the proof of Theorem 8.2.4 and we will only make a few comments concerning it. Take T E s,(C), and suppose that z IT for an arbitrarily large but fixed T. All subsequent terms which are not defined here are defined in the proof of Theorem 8.2.4. Approximate this z by z(6, A) and the latter by z,(6, A). We have
lim E,z:(6, A) I K . 6,A-0
Owing to the method of approximating z, it is possible that E,z:(6, A) > K for each nonzero 6, A. [In that case Rh(x, ph) would not necessarily be no greater than Rh(x,z:(6, A)).] However, by a slight modification of the approximate stop rule, this problem can be avoided. For each E > 0, there are c0 > 0, a0 > 0, A. > 0, and measurable sets Ai(6, A), i = 1, 2, ..., Q, which are slight modifications of the A,(& A) and such that the following properties hold. The sets (Al(6, A) x R4(Q-i)', i = 1, ..., Q } are disjoint. If t ( 6 , A) denotes the stopping rule obtained by replacing Ai(6, A) by &6, A) and if ?,(a, A) is obtained from ?(a, A) exactly as z,(6, A) was obtained from z(6, A), then R(x, ?(a, A)) I R(x, z(6, A)) R(x, ?,,(6, A)) I R(x, ?(6, A))
Ex?(6, A) I K - EO ,
+
E,
+ E,
(3.7)
Ex?,(6, A) I K - EO ,
for large
n, 6 < 60, A < A. .
(3.8)
Thus it is enough to prove (3.6) if ?,,(6, A) replaces z there and E,?(6, A) I K - EO is assumed. Under this assumption, for small h, 6, A and large n, z:(6, A) satisfies the constraint Exr:(6, A) I K. By (3.9) and the optimality of ph in YR(h,C), Rh(X,
ph)
Rh(X,
(3.9)
r:(6, A))?
from which (3.6) follows. Not much more can be said since little is known concerning the nature of the optimal policy for X ( .). The optimal strategy for the discretized problem
8.3
157
CONSTRAINED OPTIMAL STOPPING PROBLEMS
involves at most one state in which the control is randomized. This suggests (but does not prove) that the optimal strategy for X ( - )will not be random. If, in fact, there is a pure Markov optimal strategy for X ( * )with stopping set B and &-optimalstopping set B, , and if B: 3 B for all E > 0, we can proceed as in the proof of Theorem 8.2.5 and its Corollary, to show that R(x, p ) = V(X, C ) . Example 2 Let A denote an open set and t a stopping time. The discretization of the constraint
(3.10)
is
1 M Y o A t h ( yI ) E.
ycRh'-A
The limit p of the optimal times ph for the discretized constrained problem satisfies the constraint. Example 3 Let A and t be as in the previous example. The handling of some constraints requires an augmentation of the state space. Consider the constraint
+
P x { X ( s ) 2, some s I
X ( 0 ) = x E A.
t) IE ,
(3.1 1)
Let X , ( - ) denote the process with initial value zero and which is constant, except for a jump of unity at the first time that X ( .) leaves A. We define the new Markov process ( X , ( . ), X ( )), with discretization {tho,,,t:}, with the obvious transition function for the first component. Let M o y o ,MlyO,MOyl, and M , , , denote the average number of times that (tho,,, = 0,tl: = y, do not stop at (0,y)), . . ., average number of times that (tho,,, = 1, t: = y, stop at (1, y)), resp. The discrete form of the constraint is
-
c
yeRV
W1,o
+ MlJ
5
E.
The objective function (3.2) remains the same and the system (3.1) must be augmented to account for the new component {tho,,,}. The constraint (3.11) will be satisfied by the limit p of the optimal ph . The first component X o ( is actually a jump process and, as such, is somewhat outside the main scope of the book. However, much the same techniques can be applied to the jump case. See Kushner and Yu [K12] and the forthcoming work of Kushner and DiMasi [K16]. a )
8
158
OPTIMAL STOPPING
8.4 Discretization of the Impulsive Control Problem The treatment of the impulsive control problem of Section 4.7.5 has much in common with the treatment of the optimal stopping problem, and we will use the results of Sections 8.1 and 8.2 wherever possible. The approximating process will still be a controlled chain {t:}on the state space R;. If there is no action at time nand 5; = x then the transition probabilities ph(x, y ) are used. If there is an action of value v at time n, then the transition probabilities are pt,(v) with P ~ , ~ + ~ = ( V1.) We need to alter Y slightly since x + v is not always in R; for x E RL and v E Y . A loss of the property that all 5: E RL would greatly complicate the numerical computations-although it does not affect the theoretical development. We will alter Y for each h, x as follows. To each x E RL and v E Y for which x + v $ RL, associate a unique point in RL by increasing each coordinate component of x + v which is not equal to an integral multiple of h (positive, negative or zero), until it equals an integral multiple of h. This procedure alters the value of the action by at most h in each direction. The modified action space does not depend on x and will be referred to simply by the symbol Y hWith . no loss of generality, we can suppose that p(x, .) is defined on V hand is 2 p o there for each x E RL. Throughout the discussion on impulsive controls we assume A6.1.1 and the following A8.4.1.
-
A8.4.1 k ( . ) and p ( ) are continuous real-valued bounded functions dejined on R' and R' x Y ,resp. There is a real p o > 0 such that p(x, v ) 2 p o for all x E R' and v E Y .Let I denote a given positive number. a,
For the sake of notational simplicity only, let us suppose that sup, Ath(x)-,0 as h + 0; the general case is treated in precisely the same way but will use the approximation (6.5.4), rather than (6.5.3) for (4.7.26). We now use the notation of Section 4.7.5, and let (0, a,B,, P,, x E R') denote the probability space on which the controlled process X ( - ) and control times and actions {7i,ui}are defined. Let us use the finite difference approximations (6.2.1b(6.2.3) and approximate the discounted equation (4.7.26) by the form (6.5.3). Then, using the facts that V ( x )s min[p(x, v ) ueP
+ V(x + v)]
(4.1)
and that when there is equality in (4.1) (which equality defines the action set B), we act and use the act which minimizes in (4.1), we get the finite difference equation (see also Section 3.7)
+
Vh(x)= min{exp -IAth(x)[k(x)Ath(x) E x Vh(5:)],
min[p(x, v ) + exp -IAth(x)[k(x)Ath(x)+ Vh(x+ v ) ] ] } .
V E
P*
(4.2)
8.4
159
THE IMPULSIVE CONTROL PROBLEM
Let VR(h)and VPM(h)denote the classes of randomized and pure Markov, resp., control laws for {t:}.For an impulsively controlled chain with action times and values { m i , ui} E VR(h),define the cost Rh(x,{mi
9
r
c ( ~ X P-ntki)~(tki
ui}) =
7
i= 1
1
m
c;:
ui)
where t: = At!. With the interpolated process, we may use the notation pi = tki and write Rh(x,{mi
9
ui})
f
Rh(x,{Pi
9
ui})
The term &h tends to zero as h + 0; it is an error due to approximating the piecewise constant function which takes the value exp -At)+ on [ t ) , t)+ 1 ) by the exponential Equation (4.2) is the dynamic programming equation for the minimal value of Rh(x, { m i , ui}). At each time n, there is a choice of no action or of action. In the former case, the (discounted) cost of (exp -AAth(x))k(x)Ath(x) is incurred. In the latter case, if the action is u E V h then , the next state is x + u, and an immediate cost of p(x, u ) + (exp -AAth(x))k(x)Ath(x)is incurred. Note that if ui # 0, then the actual jump occurs at time mi 1, for the chain, and at pi + At) for t h ( - ) . There is a unique solution to (4.2) and it is the right-hand side of (4.4)
+
V h ( x )= {mi,
inf
od
Rh(x, { m i , ui}).
(4.4)
E~PM(X)
The class V,,(h) can be replaced by VR(h) in (4.4), with no change in the infimum. Let {m), u:} (or { p ) , u)} for the interpolated process) denote the optimum Markov policy for the chain ((2). Some Useful Bounds
The following bounds will be derived for the continuous parameter system but they also hold and will be used for the Markov chain. Define k , = sup, 1 k ( x )I . An upper bound K , to the optimal cost is obtained by never acting and setting k ( x ) = k , : k , Jome-” ds
= k,/L
=K,.
(4.5)
8
160
OPTIMAL STOPPING
Suppose that there are N, jumps on [0, TI for the optimal process. Then E,p,N,(exp
-AT) - K , I optimal cost I K , .
We would obtain the left-hand side if k ( x ) = - k l , all acts occurred at t and p ( x , v ) = po, the minimum. Thus, for the optimal policy E,N, I 2K, exp AT/po .
=
T,
(4.6)
In fact, (4.6) gives an upper bound for the (optimal) number of acts on any interval of width T . For each E > 0, there is a real T, < co, such that confining all acts to [0, T,] increases the minimum cost by no more than ~ / 2 . To see this, we note that the minimal cost attainable over [T, m) is in [K,e-", -K,e-"], and that not acting on [T, co] achieves a cost in that interval. For each E > 0, there is an integer N, < co,such that by restricting control actions to no more than N, acts on the interval [0, T,] and no acts on (T,, co), we increase the minimum cost by no more than E . To see this, note that any set of acts on [0, T,] can reduce the average cost by no more than 2K1 (over no acts) and that 2K, P,{ 2 N acts required on [0, T,] by the optimal policy} I 4 K : exp AT,/poN.
Choose N so that the right-hand side is less than 4 2 . Tightness and Weak Convergence Throughout the discussion on weak convergence, the initial condition X ( 0 ) =
1
-
exactly as in Chapter 6. If there is no action at time n, define PI: and 6 W i as in Chapter 6. For notational convenience, we define PI: =f(<;) = 0 if there is an action at time n. We can write m
(4.7)
Define Fh( B h ( .), W h (). as in Sections 6.3 and 6.6. The proofs of Theorems 6.3.1 and 6.6.1 imply that the sequence { F h ( . ) , B h ( - ) , W h ( * )is} tight on O3'[O, co) and that, if F ( - ) , B ( - ) , W ( . ) is a limit of any convergent subsequence, then W ( .) is a Wiener process, B( .) is a continuous martingale, and F ( . ) is an absolutely continuous process. a),
8.4
THE IMPULSIVE CONTROL PROBLEM
161
Since each uf takes values in a set which is at most an rh neighborhood of the compact set *L,' there is some E > 0 such that the sequence (in h) {uf, i = 1, .. .; h + 0) is tight on (VJm,where V eis an &-neighborhoodof V .If {ui}is a limit of a convergent subsequence of {uf, i = 1, . ..; h + 0}, then each ui has support on V . Obviously { p l , i = 1, ...; h -0) is tight on (R')". Define the jump process J h ( - )by J h ( t )= ~ t i + , h s t ~ : . Then ("t) = x
+ J"r) + F h ( t ) + Bh(t).
(4.8)
The bound (4.6) The sequence {Jh(-)} is not necessarily tight on D'[O, a). implies that there can be no accumulation points of jumps of J h ( - ) as h -0. However, it is possible that p:+l - p : -+O (use Skorokhod imbedding) on a set A, of probability > 0 for one or more values of n for which {p:} converges to a finite p, on A,. This implies that the limit J ( . ) (if it exists) would have multiple simultaneous jumps, at some random time [and SO would the limit of ch(.)I. Actually, this possibility causes no real difficulty. It only complicates the notation a bit. First, we will proceed under the assumption that the limits of { p i } for different values of n do not coincide (except perhaps at infinity), and then we will describe the necessary modifications, if some limits do coincide w.p. > 0. Thus, by our temporary assumption, the bound (4.6) implies that { J h ( - ) }is tightt on D'[O, 00). Since each term on the right-hand side of (4.8) is an element of a tight is tight on D'[O, a).Henceforth, let h index a convergent sequence, , uf}} with limit ( F ( - ) , B ( . ) , W ( . ) , subsequence of {F"(.), B h ( - ) , W " ( * ) {pf, { p i , ui}). Then Jh(- ) + J ( -), a pure jump process with (distinct) jump times { p i } and jump values {ui}.The tightness of { J " ( - ) )guarantees that there is a separable process J ( * ) with only discontinuities of the first kind and whose multivariate distributions (except at times at which P{jump} > 0) are the but it does not guarantee that J ( - ) is right continlimits of those of {.Ih(-)), uous (Billingsley [B3, p. 1361). However, we can assume that both J ( . ) and [(.) are right continuous w.p. 1. Thus the results of Section 6.3 imply that [even though the limit (( - ) may be discontinuous here] {("(a))
(4.9)
t A slight problem arises if pt + 0 (w.P. > 0 )as h + 0. Then, owing to the behavior of {Jh(*)} at f = 0, we would not have tightness; but the cost functionals converge in any case. We can define the processes on D'[ - 1, m), and hold them constant on [ - 1.0) if we wish to avoid the problem.
162
8
OPTIMAL STOPPING
and? (4.11) Let us now consider the case where there are multiple simultaneous jumps. Let h index a convergent subsequence (use the Skorokhod imbedding again) of {Fh(.), Bh(-), Wh(-), {pf, uf}}. By the construction (4.8), th(t) converges for each t which is not a limit point of {pf} for some i. On each interval [0, 7’1, there are (w.P. 1) only a finite number of such “jump” points. More generally, there is a function t(-)which is continuous on each interval { t : pi < t < pi+1}, such that th(- ) converges to t(- ) on each { t : pi < t < pi+l}. Of course, if pi = pi+ then the interval is empty. We also have the following convergences. If pi < pi+ then as h 0, -+
(kih+
1
=tkih
+ of = (’((Pf) + 03
(;,I,-+
<(p;)
+
((p:).
If pi- < p i , then
as h + O .
Now, let u be an integer such that pi-1 < pi
= pi+1 = ... -
Pi+a < P i + n +
Then h tmih
-+
t(p;
)?
tki+1h
+
t(p;)
1.
+ uil
t Strictly speaking, ((0) = x should be replaced by t(O-) = x, in case there is ajump at f = 0. This possibility should be kept in mind, but we will ignore it in the notation henceforth.
8.5
163
OPTlMALlTY OF THE LIMIT
We make the following conventions. Suppose that t(-)is right continuous. But, if there is a multiple jump (say, of multiplicity c1 1) at some p i , define (for a 2 p 2 1)
+
t(pi+p)
t(p;)
+ ui + ... + u i + p -
1
or, equivalently, t ( ~ i + p )= t ( p i + p -
1)
+ t’i+p-
1.
Then (4.10) [and the obvious form of (4.11)] continue to hold.
8.5 Optimality of the Limits { p i , ui} and R(x, {pi, ui}) We must show, as we did in Section 8.2 for the optimal stopping problem, that {pi, ui} have certain optimality properties. We will prove the results whenf(.) and o(-)satisfy a Lipschitz condition. However, the techniques of Theorem 8.2.5 and its Corollary (and of Chapter 9) can be used to show that V h ( x )+ V ( x ) in general.
Theorem 8.5.1 Let h index a weakly conuergent subsequence o f { F h ( * ) , B h ( * ) ,W h ( * ){,p ! , u!}} with limit { F ( - ) , B(.),W(.), { p i , ui}}. Then { U ~ I , ~ , ~ , , , I{p8s ,), i = 1, 2, . . .} is independent of{ W(s)- W(t),s 2 t},for each t 2 0. Also, for each i , { u j , p j , j I i } is independent o f { W ( t p i ) - W ( p i ) t, 2 0).
+
PROOF The proof is almost identical to that of Corollary 8.2.1, and will be omitted. Q.E.D. Theorem 8.5.2 Let f(.)and o(*)satisfy a uniform Lipschitz condition. Then {pi, ui} E
a,(.).
PROOF By Theorem 8.5.1 and the Lipschitz condition, the uniqueness of the solution to (4.11) is obvious, in that if Ei}, W ( * ) is } any sequence with the same distributions as {{pi, ui}, W(.)},then the distributions of the E i } ) and (((-), {pi, ui}), resp., are the same. corresponding sets (?(.), Let (Q 39, P) denote the space used in the Skorokhod imbedding, and let atdenote the minimal o-algebra which measures {W(s),((s), s < t, I l p , 5 s , , u i I l p i s s ) ,all i and s It}. Clearly each pi is a stopping time with respect to {at}and each ui is BTimeasurable. The uniqueness, Theorem 8.5.1, and the last two sentences imply that { p i , ui} E a , ( x ) . Q.E.D.
{{ai,
{ai,
Theorem 8.5.3 Under the conditions of Theorem 8.5.2, R(x, {pi, ui}) = V ( x )for each x . Hence V h ( x )converges to the minimal cost as h + 0for each X . Also, R ( x , {pi, ui}) I R ( x , {ri, ui})for any {ri, ui} E g w ( x ) , euen in the absence of a Lipschitz condition.
164
8
OPTIMAL STOPPING
PROOFThe second sentence is implied by the first, since the limit V(x)is independent of the particular convergent subsequence. Many of the details are similar to those of the proof of Theorem 8.2.4 and we will only outline the ideas. For each A > 0, define the discrete parameter impulse controlled process { X t } by Xt+1 =X t +f(Xp)A
+ a ( X t ) [ w ( n A+ A ) - w(nA)],
if there is no action at time n and X t + 1 = Xi + f ( X t ) A
+ a ( X t ) [ w ( n A+ A ) - ~ ( d ) ] ,
if there is an action Q,, at time n,
where
(5.la)
2: = X t + Q,,,
(5.lb)
Q,, E V ,
and the control policy with realizations {Q,,} is in V,. (W, = randomized control policies for the discrete parameter process; each Q,, must be V valued.) Let the cost corresponding to the sequence of realizations {Q,,} be defined by RA(X*
m
[
{QJ)= E x nC= O (exP --InA)P(Xt, Q n ) l { Q n + O ]
+ n1 = O (exp --InA)k(Xt)A]. W
If each Qn is a functional of w(s),s InA, then we say that {Q,,}E Ww . We will abuse notation and refer to {Q,,} also as the control policy. It can be shown that (under the Lipschitz condition) inf RA(x,{Q,,}) = (Qnt E
YR
=
inf RA(x,{Q,,})
(Qd E '8 w
inf RA(x,{Q,,})= VA(x).
(Qn) E ~
P
(5.3)
M
Un'der the Lipschitz condition we have: Let X ( .), w( .), {zi,ui}satisfying (4.7.23) be defined on some probability space, where {7i,ui}E a R ( x )Then, . for each E > 0, there is a A , 0 such that, for each A < A,, there are seE WR, defined on the same probability space that quences { X t } and X ( w( *), {71,u,} is defined an, and such that
{Qt}
=-
a),
1 R(x, {Ti
E
ui}) - RA(x,{QP})
I
(5.4) Now suppose that { X f , Q,,}, w ( - )satisfy (5.1), with {Q,,}E Ww. For each > 0, there is a A. > 0 such that A < A , implies that there are X ( -), w( -), 9
E.
8.5
165
OPTIMALITY OF THE LIMIT
{oi, ui} satisfying (4.7.23), and defined in the same probability space that {Xp,Qn}, w ( - ) are defined on, {zi,ui}E a , ( x ) , and such that
I R(x, {Ti
ui}) - RA(x,{Qi}) Equations (5.3)-(5.5) imply that
lim and that
A-0
7
1
E.
(5.5)
inf RA(x,{Qi}) = V(x), (Qnl~'+f~
inf
(rip ui) E
R(x, { T ~ ui}) , = V(x).
ZW(X)
(5.6)
Thus if then the theorem holds. For each integer N and real T of strategies 2% 'by
-= co,define the class
2% '= { { T i , ui , i I N} E a W ( x ) :either zi= co or 'si5 T}. By the bounds in Section 8.4, Eq. (5.7) holds if it holds with 2% 'replacing
a w ( x ) for each N and T. For each A > 0, integers N a n d s, real T > 0, and values is,, . . ., iss in *Y-, let 3% '(A, Dl, . . ., iss) denote the subset of sequences in ~ 2',%in which each control action can take only the values isl, ..., ijs, and each action time must be an integral multiple of A and NA = T. It is enough to show (5.7) for each such subset. '(A, vl, .. ., us). We can approximate zi by a ri(S, A) of Let {ri, ui} E 2% the type dealt with in Theorem 8.2.4, where ri(S, A) depends only on {w(jS), j S I iA}. We thus assume that each T~ is in that class. There are measurable sets Am,i r in = 1, . . ., N , i = 1, . . ., s, such that the {zi,ui}strategy implies an action of value iii at time inA if and only if {w(jS),j S
mA} E Am, .
We now proceed as in the proof of Theorem 8.2.4 by smoothing the indicator functions ZAmJ-), obtaining a family of open sets {A:, i} whose boundaries, aA:, i , have zero measure and where ZAm,im-+ ZAm.i(-) almost everywhere, as n -,co. Next, define the approximation to {zi, ui}, which uses action isi at time mA if and only if {w(jS), j S I m} E A;, i.The cost corresponding to this approximation converges to the cost for {ri,ui}as n -,co. We then use the sets {A;, i} to derive a policy for the chain {
8
166
OPTIMAL STOPPING
8.6 Numerical Data for the Optimal Stopping Problem Results of some numerical calculations are illustrated in Figs. 8.1-8.4 for the system
d X 1 = X 2 dt, dX2
= (-XI -
3x2)dt
+
0
dw,
b(x) = xf + xi, k(x) = 1, with forced stopping on first leaving the set G = {x: Ixl 1 < 3, Ix2 I < 3). Only the “left half” of the data is plotted since the results were antisymmetric about the x2 axis. Figure 8.1 plots the stopping sets for h = 0.3,0.15, and 0.075. The curves
for the latter two values are very close and from them we can obtain a “reasonable” idea of the appearance of the optimal stopping set for the diffusion. At first glance, the stopping set for h = 0.3 does not seem to be too close to the other stopping sets but it is better than it may appear. We can see this from Fig. 8.2, where the E = 0.2 stopping set is plotted for h = 0.15, along with the values of V”(x)at some selected points on the boundary.
FIG. 8.1 Optimal stopping sets for the chain h = 0.075.
<:.
(--)
h
= 0.3,
(---)
h
= 0.15,
(---)
8.6
167
NUMERICAL DATA
~ 0 . = 2c stop set /----
,
-0
I
---I
I
/
/
/
/
/
boundary of opt stop set
- 2
origin -
0.09
I-
-- -
- - - -- - - - - - \
FIG.8.2
E
\
\
\/
/
/
--
= 0.2. Optimal stopping set, h = 0.15.
Generally, we need a stopping set for which the cost is “reasonably close” to the minimum cost. Here, even the set for h = 0.3 gives this to us, if “reasonably close ” is 0.2. Several types of iterative procedures were used to solve (1.2). Let us order the points as in Fig. 8.5. We selected that ordering, since it is preferable to order “against the flow” whenever possible for the various versions of the
FIG.8.3 Convergence for the Gauss-Seidel iteration, h = 0.3. (--) at 10 iterations, (---) at 20 iterations.
at 5 iterations, (---)
8
168
OPTIMAL STOPPING
origin
FIG.8.4 Convergence for the Jacobi iteration, h = 0.3. (--) iterations, (- - -) at 20 iterations.
at 5 iterations, (---) at 10
Gauss-Seidel procedure. The points on the boundary are absorbing. The Jacobi iteration is defined by V!(x) = b ( x ) and N2
b(i), C p“(i, j ) V ! ( j ) j= 1
I
+ Ath(i) ,
i $ aG.
The Gauss-Seidel procedure is defined by the same initial value and boundary condition and
I
N2
V!+ l ( i ) = min b(i),
ji =- l 1
p”(i, j)V:+ , ( j )
1
+ 1ph(i, j)V!(j)+ At”(i) , j=i
i cj! aG.
The progress of the convergence of the stopping sets is illustrated in Fig. 8.3 for the Gauss-Seidel and in Fig. 8.4 for the Jacobi. With h = 0.3, convergence of the stopping sets occurred at 30 iterations for the Gauss-Seidel and at 60 for the Jacobi. Convergence of the V : ( - )(as n + co) itself was slower, depending on the exact convergence criterion used. But in any case, the Gauss-Seidel was always preferable. An alternating direction Gauss-Seidel was somewhat (but not significantly) faster (alternating the ordering in Fig. 8.5 with the alternative ordering in that figure). The accelerated procedures were even faster.
8.6
2
I
3
x2
3
b
-3
N'
FIG.8.5
169
NUMERICAL DATA
The ordering of the grid points.
J !N
J(N-l)t I
CHAPTER 9
Approximations to Optimal Controls and Nonlinear Partial Dgerential Equations
In Chapter 8, we dealt with the case where there was either one control action or at most a countable set of “impulsive” actions. In this chapter, we deal with the case where there may also be a control which acts continuously in time. In Section 9.1 we formulate the optimal stopping problem and the relevant discretizations when there is also a continuously acting control. Several methods for getting an approximating chain are discussed. The discretizations actually lead to optimal control problems for the chain. In Section 9.2, it is shown, via weak convergence arguments and an implicit function theorem, that the sequence of interpolations for the optimal approximating chains actually converges to a controlled and stopped diffusion. In Section 9.3, we turn our attention to an examination of the optimality of the limiting cost. In order to show that the limit is optimal, it is necessary to compare the limiting cost to the cost corresponding to arbitrary controls and stopping times. To do this (in Section 9.3), we first give several ways of approximating the arbitrary controls by simpler ones. Then we adapt the simpler ones for use in controlling the approximating chain and 170
9.1
171
OPTIMAL STOPPING : FORMULATION
compare the corresponding cost with the cost for control for the chain. The types of approximations that are developed should be of wider use in stochastic control theory in proving approximation, representation, or existence results. Some of the approximation theorems in Section 9.3 are rather technical, but they seem to be necessary to show that “general” control policies can be approximated by somewhat simpler control policies. A basic problem causing the complexity is that we d o not usually assume a Lipschitz condition. Then in order to approximate a control and still have a corresponding solution to the controlled stochastic differential equation, it is often necessary to emphasize measures rather than paths. Also, we assume that to certain types of very simple controls, there correspond solutions to the controlled It6 equation. The controls are given, basically, in terms of their joint measures with a Wiener process. Thus we frequently have to make statements such as: given a probability space on which is defined a pair ( u ( * ) , w( * )), there is another probability space on which processes *), ii(. ), G( )) are defined, where the triple solves the controlled stochastic differential equation and (ii(-), G(.)) has the same probability law as ( u ( * ) , w(-)).The assumption of a Lipschitz condition would considerably simplify the the development. In Section 9.4, the results are extended to the discounted case. Sections 9.5 and 9.6 treat the case where the control stops when a target set is first reached. The discretized problem is an optimal control problem for a Markov chain, which stops on reaching a target set. As in Section 9.3, in order to show that the limiting discrete costs are optimal, we must approximate a class of controls by simpler ones and then adapt the simpler ones for use on the chain. Section 9.7 gives some results for the impulsive control problem, where there is also a continuously acting control. Some numerical results and comments appear in Section 9.8. Owing to the relationship between the optimal control problem and the nonlinear partial differential equations of Section 4.7, the discretization methods of this chapter are also approximation methods for weak solutions of these equations.
(x(
9.1
-
Optimal Stopping and Continuously Acting Control: Formulation and Approximations
An interesting and very useful generalization of the stopping problem of Chapter 8 arises when we are allowed to control the process up to the stopping time. For example, we may be given a control problem with a
172
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
“running” cost k(x, u ) and a stopping cost b(x), where b ( - )denotes a penalty for stopping at a point x which is not in a given desired stopping set G (in G, we would set b ( x ) = 0). If x = X ( 0 ) # G, then this example yields a mechanism for controlling the process up to “almost” when it hits a(?, the boundary of the desired stopping set, without explicitly introducing hard boundaries. The generalization also serves as an introduction to the problem, considered in later sections, where we control up to the actual time at which we reach a desired target set. We will use the following assumptions.
.
A9.1.1 f (., ) and o( * ) are bounded continuous R‘ and r x r matrix-valued functions on R’ x 42 and R’, resp., where 42 is a compact convex set in some Euclidean space, say R”. A9.1.2 k(., - ) and b(.) are bounded continuous real-valued functions on R’ x $2 and R’, resp. Up to Section 9.3, assume that inf,, o1 k(x, a ) 2 k , > 0. k ( - , *)). The sets g(x, %) = {g(x, a ) : R’, are convex and compact.
A9.1.3 Define g ( - , - ) = (f
(a,
a
E %}, x E
a),
The next assumption says only that certain very simple controls are meaningful, in that there are controlled processes [solutions to (l.l)] corresponding to such controls. The assumption is used for certain technical arguments in the approximation theorems. It seems to be rather unrestrictive. Of course, under a Lipschitz condition on f (-, .) and o(-),the condition always holds.
A9.1.4 Suppose that there is a probability space on which a Wiener process %(-) and a pair (ii(-), ?) are defined, where the latter pair is nonanticipativet with respect to %(-), where ? takes onlyfinitely many values, and where ii(-) takes only finitely many values in % and there is a real A > 0 such that ii( - ) is constant on each interval [iA, iA + A). Then, for the initial condition x of interest, there is a probability space (Q B, at,P ) supporting X ( . ), u ( * ) , w(. ), 7,where ( w ( * ), u( * ), 7)has the same probability law as does (%( ), ii(. ), ?), 7 is a stopping time relative to {Bt},the triple X ( .), u( -), w( - ) solves (l.l), and for each t > 0 the functions X ( s ) , u(s), w(s), s 5 t, are Bf-measurable and w(t s ) - w(t),s 2 0, is independent of Bf. The solution is unique in the sense x). that u E g R ( x ) ,z E
-
+
Ti(.,
In Section 8.5 on, where we refer to A9.1.4 but do not treat the stopping problem, we delete the reference to gRin A9.1.4.
t When we say that i is nonanticipative, we always mean that the function I ( - ) with values I ( t ) = I f i s , , is nonanticipative.
9.1
OPTIMAL
STOPPING: FORMULATION
173
The terminology of Section 4.7 (in particular, that of Section 4.7.3) will be used, and we will deal with the controlled diflusion model
x(t)= x + jotf(X(s), u ( s ) ) ds + j 0' o ( X ( s ) )dw(s).
(1.1)
If u ( - ) is a control in any of the classes qR(x),etc, it will be considered as either a random element of LT* or as a sequence {u(t), t 2 0 ) of @-valued random variables (which forms a separable process). As pointed out at the end of Chapter 2, the two views are mutually consistent. Some additional terminology will now be introduced.
,
NOTE When considering the controlled stopping problem, the random stopping time 7 should properly be considered as part of the control. So we modify the definition of uniqueness to read that if (w(-), u( ), 7)have the same law as (ii)(. ), ii( ), 7) and if X ( . ) and ) are two nonanticipative solutions to (1.1) corresponding to the two triples above, then ( X ( ), u( .), z) and ), ii( -), 5 ) have the same probability law. To be consistent with this definition, we say that u E gR(x),7 E Ti(u,x), if there is a probability space (0, 33, gf, P ) supporting X ( .), u( ), w( ), 7, where 7 is a stopping time relative to {Bf} and X ( . ) , u ( * ) , w ( . ) solve (1.1); X ( s ) ,u(s), w(s),s I t, are 33'-rneasurable, and X ( - ) , u ( - ) , 7 are nonanticipative with respect to w(-)and w(t + s) - w(t), s 2 0, is independent of Bf;if (a,3,B,, P) is another such probability space supporting -), ii( .), ii)( -), 5, where the last three elements have the same distribution as does ( u ( - ) , w ( . ) , 7), then the laws of ( X ( * ) ,u ( * ) , 7) and of ii(-), 5 ) are the same. A similar definition is used when @ or T have affixes W, G, 0, or +, etc.
-
-
x(.
-
-
(x(.
-
x(
(z(-),
TERMINOLOGY The symbol 92D is used to denote an arbitrary but finite set {tii},where Ui E 92.The set will be fixed in each usage, but may vary from usage to usage. For each A > 0, let @t(x)denote the subclass of controls in qR(x)which take values in some eD and which are constant on each interval [iA, iA + A), i = 0, 1, . .. . Let u ( - ) E @t(x).Suppose that there are nonnegative numbers { t j } and Bore1 sets (of the appropriate dimension, say di) A , such that { A i , , 1 = 1, 2, ...} are disjoint for each i and P { ( w ( t j ) ,rj I iA) E d A i l }= 0,
u 1
A , = Rdi,
and suppose that u(iA) is a function of w(tj),t j I iA, and is constant on each set A , . Then we say that u E @&(x). The reason for the requirement (whenever such a condition occurs) that the probability be zero that (w(t,), t j IiA) be on the boundary of the decision set Ail is that the indicator function I & ( * ) ) of the set 2, = E C [ O , a): ( x ( t j ) , t j I iA) E A,,} is then a continuous function {.(a)
174
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
on C'[O, a)w.p. 1 (Wiener measure). This latter fact is very useful in the weak convergence analysis since we will have W " ( - ) W( a Wiener process w.p. 1 (assuming Skorokhod imbedding) and the condition then implies that the control, as a function of W " ( * )instead of w ( - ) , converges w.p. 1 to the same control, but where the Wiener process is the limit W ( - ) . If p E Fp(u,x) and p takes values iA, i = 0, 1, ..., we say that p E Ft(u,x). If p E F;(u,x) and there are real ti 2 0 and Borel-measurable Ai (of, say, dimension c i ) which satisfy a),
P{(w(rj),ti 5 iA) E aAi} = 0,
and if p = iA if and only if {w(t,), ti IiA} E A i , then we say that
pE
F&(u,x).
The Cost Functional and Finite Difference Assumptions
For each u
E
gR(x)and
p
E
F R ( u ,x), define the cost functional
R(x, p, u ) = E", j P k ( X ( S ) ,4s)) ds 0
+W(X(p)).
(1.2)
Equation (1.2) is a special case of the functional defined in Section 4.7.3, when we put G = R'. We will take the liberty of defining u( * ) for t > p in any convenient manner. Let V ( .) denote the function defined by V ( x )= inf R(x, p, u), P.
where the infimum is taken over the defining class above (1.2) Define the stopping set B = {x : V ( x )= b(x)}and suppose that V ( . ) is "smooth."Then (see Section 4.7) V( .) satisfies the differential equation inf [ Y V ( x ) + k(x, a)] = 0,
x
#B
US*
V ( x )I b(x),
V ( x )= b ( x )
on B,
(1.3)
where the operator 9" is defined by
9" =
1i aij(x)-axi axj + c a2
i,
fi(X9
a)
a
and is just the differential generator of the controlled process when the control takes the value a. We will approximate (1.3) by finite differences. If the finite difference approximations to the derivatives are carefully chosen, then for each finite difference interval, the finite difference approximation to (1.3) will be the dynamic programming equation for the minimal cost of a certain controlled Markov chain. The limit (as the finite difference interval goes to zero) of any
9.1
175
OPTIMAL STOPPING : FORMULATION
convergent subsequence of the interpolations of the chains will be a controlled and optimally stopped diffusion. The costs (finite difference solutions) will converge to V ( x )as the finite difference interval goes to zero. It is immaterial whether (1.3) holds in the strong sense or not. The finite difference method of getting an approximating chain is used for convenience, but any sequence of approximating chains which is consistent with the controlled diffusion can be used. The general technique of approximation is that used in the previous chapters, and we will use (6.2.2)and (6.2.3)for the second derivative terms. There are several possibilities for the approximations of the first derivatives VXi( and we will describe three of them. The methods differ according to whether or not we are willing to let Ath(x)depend on the control value. Other types of approximations are also possible. In fact, although the techniques to be discussed are useful and natural, the entire subject of the technique of the discretization needs further work. Define a),
QAX,
a) = 2
Qh(X)
1 aid.) i
-
1 I aij(x)I
i. j , i P j
h
1I i
a)
I,
= max Q h ( X , a ) a€@
Ath(X)= hZ/Qh(x),
Ath(X,a ) = h2/Qh(X,a),
E
@.
Approximation I For each x , let I, denote the set of indices i for which $(x, a ) actually depends on the argument a. If i c$ I,, then use the finite difference approximation (6.2.1) for Vxi(x).If i E I,, then use the central difference approximation
V,(x)
+[V(x
+ e i h ) - V ( x - eih)]/2h.
(1.4)
An advantage of (1.4) is that the values of y for which Vh(y)actually enters into the finite difference equations do not depend on the actual value of the chosen control. But in order to guarantee that the coefficients of the Vh(y) terms in the finite difference equations are 2 0 and hence can serve as transition probabilities for a controlled Markov chain, we need to assume that
The requirement (1.5) can be rather stringent in applications. It may force us to use (numerically) a very small value of h. A technique for weakening (1.5) will be discussed bt!ow.
176
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
-
Define &( ) and A?( - ) by
AP(X) =
hZ/Qh(x).
Define the controlled transition function parameter a E 92,by
=
+
I
a,,(.) -
0 ,
*
I .)
on R;I, with control
1
j#i,j
~ ' ( x x, e,h f ej h 1 a ) = a: (x)/&(x)) x - e,h e j h l a ) = a,;(x)/Q",(x)
i #j,
~'((x,
Ph(x,y I a ) = 0,
Ph(
all other x, y E RL .
Let Vh(.) denote the solution to the finite difference equation. Substituting (6.2.2), (6.2.3) and either (1.4) or (6.2.1) (depending on whether i E I, or I $ I,) into (1.3) and collecting terms yields the finite difference equation
o = min aeQ
[-
+ 1 (jh(x)Vh((y)p'(x, y I a )
vh(x)Qh(X)
Y
+ h2k(x, a ) ] , Vh(x)= b(x)
Equivalently,
[
if
Vh(x)< b(x),
otherwise.
.A,1
Vh(x)= min b ( x ) , min
p'(x, y Ia)Vh(y)+ k(x, a)&'((,)
11
.
(1.7)
The {jh(x,y I a)} are the one-step controlled transition probabilities for a Markoo chain whose random variables we denote by {ti}and (1.7) is just the dynamic programming equation for an optimal control and stopping problem on this chain. For each a, we have Ex"4l:+ 1 - fl: 1'41= Y , a usedl
Cov,[;sh,+
-
= f (Y? a P t h b )
'4: 1'41: = y, a used] = 2a(y)Ath(y)+ O(hhth(y)),
(1.8)
which is consistent with the infinitesimal coefficients of the controlled diffusion X(*). We will now describe a slightly different method of approximation, which allows a weakening of (1.5). Let lidenote a fixed element of 42, and define
9.1
177
OPTIMAL STOPPING : FORMULATION
J(x) =f;(x, a) andx(x, a ) =f;(x, a ) -f;(x, where
a). Rewrite 9'in the form o f p ,
Now, use the approximation (1.4) for thex(x, a)(a/axi) term and (6.2.1) for the J(x)(a/ax,) term. The finite difference equation takes the form (1.7), where Fh( -, I .) is replaced by a slightly different set of transition probabilities and we can replace (1.5) by the weaker requirement
-
aii(x)-
1 1 aij(x)I 2 h lx(x, a ) 1,
a
42,
E
all i, x.
i#j. j
The denominator of the equation for the new transition probabilities is (replacing &&)) 2
14 x 1 - C
i f j , i. j
i
laij(x) I
+ h Ci I J(x) I.
Our limited experience with numerical calculations indicates that the alternative approximation is preferable, in that larger values of h can be used with the same degree of numerical accuracy. Note thatx(x, a ) = 0 if i $ I , . Approximation 2 A second type of approximating chain is obtained by letting the finite difference approximation for Vxi( ) depend on the unknown control function. Let us use VXi(4
+
1
-
+
if h(x, a ) 2 0, if f,(x, a ) < 0.
[V(x eih) - V(x)]/h [V(x) - V(x - e i h ) ] / h
(1.9)
Substituting (1.9), (6.2.2), and (6.2.3) into (1.3) yields the finite difference equation
o = min
aeBl
Vh(x)= b(x)
[
- Vh(x)Qh(x,a )
if
+
Y
1
Qh(x,a)jh(x, y la)Vh(y)+ k(x, a)h2
(1.10)
Vh(x)< b(x),
otherwise,
where we define j ( - ,
- 1 .) by
+
)
j(x, x e i h f e j h ( a )= a$(x)/Q,(x, a ) j ( x , x - e i h f e j h l a ) = a;(x)/Q,(x, a ) j(x, y I a ) = 0,
all other x, y
E
i#i,
R; , a E 42.
(1.llb)
178
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
The j(*,- 1 . ) is a transition function for a controlled Markov chain whose random variables we denote by {f:}.Since the coefficient Qh( -, ) depends on the control, it is not immediately clear whether we can simply divide each term inside the brackets of (1.10) by Qh(x, a ) and obtain the dynamic programming equation (1.12) for the optimally controlled and stopped chain.
-
[ ...1,C
V h ( x )= min b(x), min
11
+
j h ( x , y ( a ) V h ( y ) k ( x , a)Ath(x, a )
.
(1.12)
In fact, doing this yields an approximation V h ( x )which will converge to V ( x ) under the same conditions as do approximations 1 and 3. In particular, we can verify that
E,[t!+ - t! 1 t: = y, a used] =f (y, a)Ath(y,a), COV,[t~+ 1
- t! 1 g:
= y, a
(1.13)
used] = Ch(y)Ath((y, a),
where I,,(*) is given by (6.2.11). Consistency of (1.12) with a finite difference equation for (1.3) is not important. It is important that we obtain an approximating controlled process with the correct weak convergence properties and that is all that needs to be ascertained when deriving the approximation. With {t:},the “current ” interpolation interval depends explicitly on the “current control (they always depend implicitly on past controls). This dependence is natural, for if a particular control action implies a faster drift (bigger I f i ( x , a ) I )or a greater diffusion, then the interpolation intervals should be smaller. The interpolation times Ath(x, a ) have this property. A drawback to Approximation 2 is that the { } term in (1.12) is more difficult to (numerically) minimize than the equivalent term for the other approximations. ”
Approximation 3 A third method, which is derived from Approximation 2 and in which the interpolation intervals d o not explicitly depend on the current control, is sometimes simpler for computation. In particular, the inner minimization in (1.12) may become simpler since the replacement for the j ( x , x k ei h I a ) will have control-dependent terms only in its numerator, and the replacements for the j ( x , x k e, h k e j h 1 a ) will not depend on a at all. To get this approximation, define p h ( - , I - ) by
-
ph(& x I a ) = [ Q h ( X ) - Qh(X, a)]/Qh(x)
for x = Y .
(1.14)
For all other x , y, we define ph(x, y I a ) by the right-hand sides of (1.11) but with &(X, a ) replaced by Q h ( X ) . The p h ( * , I * ) are a one-step transition function of a controlled Markov chain ((3, but now each state x may possibly communicate with itself. To motivate the use of (1.14), add Q h ( x ) V h ( x )to each side of (l.lO), divide each
9.1 side by
&(X)
179
OPTIMAL STOPPING FORMULATION
and rewrite the resulting equation to yield
I
]
V h ( x )= min b(x), min {E", V h ( t : )+ k(x, a)Ath(x)}. a€*
(1.15)
The symbol E", denotes the expectation, using {ph(x,y I a)}. The inner minimization in (1.15) is usually easier to do than the inner minimization in (1.12). In the sequel only the controlled chain {rf}will be dealt with. We do this purely for definiteness; the techniques are the same for all the approximations 1-3. Note that
&[tt:+1 - ti I rt: = y, a usedl = f (Y, a)Ath(y), covx[t:+
1
- 5; 1 tf:= y,
(1.16)
= Ch(y)Ath((y).
A variant of method 3, which is analogous to the variant of method 1, is often preferable for numerical calculations. Define p , A ( x 2 a ) andJ(x) as in method 1. Define
The transition probabilities and Ath for the variant are obtained by simply replacing Qh(X) by & ( X ) and using (6.2.1) separately for J(x)a/axi and J ( x , a)d/dxi. In particular, we have the new values
[
ph(x, x f eih 1 a ) = aii(x)-
[
ph(x, x I a ) = h max a€*
1 1 aij I + h ! * ( x , a ) + h ~ (x)]/Qh(x), *
j#i;j
1i I A(x7a ) I - Ci I A(x9a ) I
1
/Qh(x)*
The other ph(x,y 1 a ) and Ath(x)change only in the denominator. Optimal Policy for
{ti}
Let V(h )(with subscripts PM or R for pure Markov or randomized, resp.) denote the class of control policies for {t:}.For each policy n E %'(/I), let F(h,n) (with additional superscripts 0 or +, subscripts PM or R, and additional argument x where appropriate) denote the class of stopping times m for which PI{m < oo} = 1, each x . For each n E V,(h) and m E FR(h, n) and corresponding chain {2}, define the cost nt-
1
(1.17)
180
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
where At! = At"(
R"(X, m, R ) = V"(x).
f f ) ; ff E
QpM(h)
The infimum over the class of randomized stopping times and control policies is also Vh(x).If a policy R is pure Markov and stationary, then there is a function u( such that the control action at time i is u(
-
0
)
9 )
f ( Y , 4 A t h ( Y )= Ex[<:+1 -
we can write <:+
1
=
as in earlier chapters, where
a: = <:+
1
-
<: +f(
7
<:/elY? u, =
w t :
@I7
+ a:
<:- Ex[<:+ - <: 1 1
=
u:1.
{a:}
When SW; or W " ( - )is referred to, it is constructed from the and an independent Wiener process I)(. ) as in Chapter 6. Sometimes, we will apply Then the SW: and Wh(.) will be computed from a nonoptimal control to the corresponding and $(-) but the actual process used should be clear from the context.
{a:}
<:.
Interpolations and Tightness
Henceforth, the initial state will b e j x e d and denoted b y x . Let (5:) denote the chain with the optimal control applied, and let mh denote the optimal stopping time for the chain. Let ph = 174; At! be the interpolated optimal stopping time, where At! = At"(c!), and define the interpolations C g h ( - ) = (<"(.), F h ( * ) ,K h ( * ) ,B h ( - ) , W " ( * )exactly ) as was done in Chapter 6. The process ah() only needs to be defined on the time interval [0, ph). However, it is convenient to define it on [0, 00). To d o this, we choose a ii E 42, set U: = ii for n 2 mh,and define the chain {<:} accordingly for n > mh. The extension of the optimal control policy to [0, 0 0 ) will still be denoted by u " ( * ) . The bound (8.1.5) holds for {ph} and we need only consider stopping times (for either the diffusion or the interpolated chain) which satisfy (8.1.5). Hence { p h } is tight on [O, 00).
-
9.2
THE LIMIT IS A CONTROLLED, STOPPED DIFFUSION
181
The tightness of { ( D h ( * ) } on D4'+I[0, co)as well as the continuity w.p. 1 of the limit of any convergent subsequence follow from the results of Chapter 6. Henceforth let h index a subsequence which converges in distribution (or w.p. 1 by use of the Skorokhod imbedding) and denote the limit by @(*) = ({(.), F ( - ) , K ( - ) , B ( . ) , W(.)),and let p h -,p as h +O. The arguments of Chapters 6 and 8 yield that W ( . ) is a Wiener process, B ( - )is a continuous martingale with representation yo a(t(s)) dW(s), <(t)= x
+ F ( t ) + B(t),
(1.18)
and F(.), K(.), B ( . ) , and the function I ( . ) with value I ( t ) = are all nonanticipative with respect to W(.).Also, if we write uh(s)= uh(th(s)), then Vh(x)has the representation <(a),
So far, we have said nothing about the nature of the limits; i.e., whether is a controlled diffusion, p a stopping time, etc. This is the subject of the next two sections. {(a)
9.2 The Limit Is a Controlled, Stopped Diffusion
We must show that
t(.)is indeed a controlled and stopped process. Since
{p,,} satisfy (8.1.5), so does p by Fatou's Lemma. Also, there is no problem in showing that p is a stopping time with respect to an appropriate sequence of
a-algebras. Generally, we can show only that t(.) is a controlled process, with a control ii(-, * ) which is a @-valued function of o,t, with values B(o, t), and which is nonanticipative with respect to W ( .). However, under a Lipschitz condition, ii E gR(x),p E f:(u, x), and V h ( x )-, V ( x ) as h -,0. The problem is, simply, that we do not know in general whether the control a(.) yields a unique solution, in the sense that the law of the triple (t(-), ii(.), p ) is uniquely determined by that of (a(.), p, W ( .)). In any case, ii( - ) will turn out to be as good as any control which does yield a unique solution and it is optimal in that sense. The optimality is dealt with in the next section. Several preparatory results are needed.
9
182
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
Under (A9.1.1) to (A9.1.3),there are measurable w, tfunc-
Theorem 9.2.1
tions f ( -, which
k( *, .), which are nonanticipative with respect to W( and for
a),
a )
(f(o,t), yw,t ) )E (f(t(t),a),k(t(t), a)), K ( t ) = l ' E ( o , s) ds,
F ( t ) = l'f(w,s) ds, 0
(2.1)
all t 2 0.
0
(2.2)
PROOF We suppose that the Skorokhod imbedding is used, so that all the processes are defined on the same probability space. Define g(*, .) = {f( k( -,.)}, Gh(t)= {Fh(t),Kh(t)}and G ( t ) = {F(t),K ( t ) } .Since g( -, is bounded, there is a constant K such that (w.P. 1) limh+oIGh(t s) Gh(t)1IKs. Hence (w.P. 1) IG(t s) - G(t)l I Ks and G(.) is absolutely continuous w.p. 1. Thus there are measurable o, t functions g(*, = { f ( . , E(-, -)} which satisfy (2.2) (w.P. l), and we can suppose that I g(o, t) I I K for all o,t and that g( ) is nonanticipative with respect to W ( - ) because , G(*) is. The sequence {gh( )} defined by gh(o, t) = g ( t h ( o , t), uh(w,t)) converges to g(*,.) weakly in Ll(O x [0, TI) for each T < co. To see this, first note that (by the w.p. 1 convergence of Gh( to G( .)) if (2.3) holds for a monotone increasing sequence of measurable (w,t) sets A,,
.,
+
a),
+
a )
a)
., -
a),
., .
a )
5
E
I A n g h ( o , S) ds + E
1
I A J ( o , S) ds
(as h +O),
(2.3)
u,
then it holds for the limit A = A,. The relation (2.3) holds for any set A, which is of the form A, = Bi x C i ,where Bi are disjoint intervals in [0, TI and Ci are measurable o sets. These facts imply that (2.3) holds for all converges weakly in L1(O x [0, TI) measurable (0, t) sets A,. Hence as stated. Let 1 denote an arbitrary vector in R'". Then (using an argument like that used in Roxin [Rl]), it follows from the L,-weak convergence that [for almost all (0,s)]
ci
$(a,
a )
lim l'g(t(s), U h ( S ) ) = lim l'g(
(2.4)
with the reverse inequalities holding for & l replacing G. The left-hand equality in (2.4) is obvious. To prove the right-hand inequality of (2.4), we first suppose that it does not hold on a measurable (0, s) set A,, of nonzero measure. Then Fatou's lemma yields -
lim E h
5
l'gh(o, s) ds I E AI
'AI
(h h
l'gh(o, s) ds
<E
l'g(o, s) ds, '4
9.2
THE LIMIT IS A CONTROLLED, STOPPED DIFFUSION
183
which contradicts the convergence in (2.3) for A, = A,. Thus the measure of A, is zero. Now, 1 is arbitrary, and g(T(t),@) is convex and the limb l'gh(o, t ) and h h l'gh(w, t ) lie in l'g(<(t),42) for (0, t ) $ A,. Define A = A,, where 1 ranges over a countable dense set of vectors in R" '. Since the measure of A is zero, the previous assertions of the paragraph imply that (2.1) holds for almost all (w, t). We can define g(-, .) on the exceptional set so that (2.1) holds everywhere. Q.E.D.
u,
We will need the following implicit function theorem of McShane and Warfield [Ml]. The theorem will help answer the question: given the inclusion relation (2.1) can we actually choose a nonantizipative @-valued measurable function n(-, .) for which
f(w, t ) =f ( ( ( t ) ,a(@, t)), E(o,t ) = k(<(t),n(w, t)),
for all w, t.
(2.5)
It is clearly possible to select a vector u ( o , t ) E @ for each o,t such that (2.5) holds for ii = u. But the measurability is not so obvious.
Theorem 9.2.2 Let ( M , A ) be a measure space, A a separable metric space, and B a compact metric space. Let q : M x B + A be continuous in the second argument for each value of the first, and & measurable in the$rst for each value of the second. Let y : M -,A be &-measurable with y ( z ) E q(z, B )
all z E M .
Then there is an A-measurable function u : M
+B
such that
y ( z ) = q(z, ~ ( z ) ) all z E M . Theorems 9.2.1 and 9.2.2 combine to yield the next theorem, which states that <(*) is a controlled process of the form (1.1).
Theorem 9.2.3 Under the conditions of Theorem 9.2.1, there is a nonanticipative %-valued measurable function a(*, * ) f o r which (2.5) holds and lim Vh(x)+ R(x, p, ti) h-. 0
= P(x)
as h + 0.
(2-6)
PROOF The relation (2.6) follows from (2.5), (1.19), and (1.20). Denote be the minimal a-algebra over the probability space by (a, &, p), and let s), s), s It , are measurable. Let 93' be the minimal subwhich a-algebra oft & x B[O, 00) over which <(-, - ) and g(*,.) are measurable. Let M = fi x [0, a),A =go,A = R ' + l , B = 42, y ( - , - ) = g(*,.), and q( - ) = g( -, Then an application of Theorem 9.2.2. yields that there is a
a(*, c(.,
a,
a).
t 4[0,co) is the Lebesgue field over [O, m).
184
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
dO-measurable“&-valued function ii( ., -)-such that (2.5) holds. Since the fixed s section of d o is contained in J , , for each 0 < s I t , any domeasurable function is nonanticipative with respect to W (.) since any .%),,measurable random variable is independent of W(t s) - W(t), s 2 0. Q.E.D.
+
9.3 Optimality of the Limit In Theorems 9.3.1 and 9.3.2, we show that we can approximate the classes F L ( u , x) and gR(x)by much simpler classes ofstopping times and controls. Theorem 9.3.3 uses these approximations to get the optimality properties of a(., .) and p and the minimality of V(x). The proofs ofboth Theorems 9.3.1 and 9.3.2 involve a number ofdetailed estimates and approximations but the results are useful.
Theorem 9.3.1 Assume A9.1.1, A9.1.2, and A9.1.4. Suppose that u E gR(x) and that T E F R ( u , x ) and 7 < T . For each E > 0, there is a A > 0 and a set UU, and a uA E % i ( x ) and a 7 A E F t ( u A ,X) with 7 A I T , such that R(X, ? A ,
UA)
< R(X, T, U ) -k E.
(3.1)
The above statement holds with W replacing R in gi(x) and
F i ( u A , x).
REMARK Statements such as that of Theorem 9.3.1 will be made frequently in the rest of this chapter. We start with a Wiener process w(*) and a control u( - ) and stopping time T, all defined on the same probability space. We then, for example, approximate u( - ) by some uA(- ) and claim that uA(.) E @i(x), without explicitly mentioning which Wiener process is involved. The notation can become awkward, because sometimes the probability space on which uAis defined is the original one, and the Wiener process is w(.), and sometimes it is the space used in the Skorokhod imbedding, and the Wiener process may be different. All that we are usually after is the construction of a controlled diffusion whose corresponding cost of control is close to the cost for some other process [say, for t;(-), u(.),p ] . In the present theorem, uA E gi(x), 7 A E Fi(u,x) means the following. Given (u( .), 7 , w( *)), we construct uA(.), 7 A on the same probability space. The values of uA(-), 7 A satisfy the restr.ictions put on values of members of @i(x) and Ti(uA, x). There is some probability space, which supports a Wiener process $(-), control iiA(-),stopping time f A ,and where GA(. ), T A have the appropriate nonanticipativeness properties, iiA(.) and T,, take values in the appropriate sets, (*( .), iiA(-), G( .)) solve ( l . l ) , and (iiA(*),
x(-), a(.),
9.3
185
OPTIMALITY OF THE LIMIT
fA,, G(.)) has
the same law as (uA(-),7 A , w(*)).Also, there is the ‘‘usual” uniqueness. We can (and will) write R ( x , iA, Ed) as R ( x , 7 A , u ~ ) ,without ambiguity. We do things in this fashion partly because we are concerned primarily with costs and not paths. Sometimes the Wiener process will not be mentioned explicitly.
PROOF By hypothesis, we can suppose that there is a Wiener process w( .) and a process X ( * ) such that ( X ( u( .), w( solve (l.l), and that u E gR(x),7 E F R ( u , x), 7 I T . Clearly, for each E > 0, there is a A > 0, such that the stopping time 7A = min{iA : iA 2 7 ) satisfies 7 A I T and a),
a ) )
R(X, T A , U ) 5 R(X, 5, U )
+ E/2.
Thus to prove (3.1), we can (and, henceforth, will) assume that 7 E Fi(u,x) whatever u is. Define a sequence { u n ( . ) }as follows. If u(w, .) is continuous (for almost all w ) on [0, m), set u,( .) = u( .), all n. Otherwise, select an arbitrary Uo E %?,set u ( t ) = U, for t < 0, and define .a
u,(t) =
1
e-nSu(t - s)
ds =
‘0
(.
00
e-su(t
- s / n ) ds.
‘0
For each t and n, u,,(t) is Ja-valued since J& is convex. For each n and 6 > 0, define the function un6(- ) by on
u d ( t ) = u,(i6)
[ib, i6
+ 6).
uf=,
Let J@O = {ul, . . .} be dense in J& and define W = { U I 1 . . . , U,] for each integer c. Let E, > 0 denote a real number such thatt N J U , ) 3 J// and E, + 0 as c + m. For each n, 6 > 0, and integer c, define the function uJc( - ) by unac(t) =
on [is, is on [ih, i6
(u~
,Ui
+ 6)
+
if 6) if
j = 1, ..., 1 - 1 and
1 u,(iS) - u1 1 I E, , 1 un6(i6)- U, 1 > E, , ( u d ( i 8 ) - U, I I E,.
By A9.1.4, each udc( - ) is in$ %,(x). There is a subsequence of {udc( .)) which t ) . We can select a subsequence, denoted converges to u( for almost all (0, by { u p ( - ) , p = 1, 2, ...}, such that a )
E
io I u p ( t ) P
- u(t)
1’
dt I 2 - p .
t NC(iit)denotes the closed sphere with center ii, and radius E . 3
See remark before the proof.
186
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
By A9.1.4, for each p , there is a probability space which supports processes z") has the same law as (."(.), w ( - ) , T), and where . f p ( . ) is a nonanticipative solution to
xp(.),P ( * ) ,G p ( * ) fp, , where (P(*), +'(-), P ( t )= x
+ j:j(fp((.bGp(s)) ds + j'o(%'(s))
dGp(s).
0
We can augment the probability space by adding? a separable and measurable process t i p ( . ) , where (I?'(*), ~P(.), f p , CP(-)) has the same law as ( u ( - ) , w(*),z, u p ( . ) ) . Then, of course, we can suppose that $'(-) is obtained from P(.)in the same way that IF(.) was obtained from u ( - ) . Each Lip(.) has the sume measure. Thus {P(.)} is tight on L y , l . The sequence { . f p ( . ) , G p ( - ) t, i p ( - ) ,fp} is tight on C2'[0, a)x L;, I x [0, co). Let p index a convergent subsequence. Using the Skorokhod imbedding we can suppose that all the processes are defined on the same probability space, i C ( * ) , ii(*), 5 is w.p. 1, in the approand that convergence to the limit priate topology. The triple @(-), iit(.), 5) has the same law as either (iip(*), +'( -), f p ) or (u( -), w ( ), T). Thus ii(w,t) E 42 for almost all o,t. Furthermore,
x(-),
-
E j o p l c p ( t ) - iip(r)12 dt I2 - P .
Thus ijp(o, t) + ii(o,t) for almost all o,t. By either construction or definition, ( i j p ( . ), 5.) and (P(-), iip( SP) are nonanticipative with respect to iCp( * ). By weak convergence methods, , 5) is such as those used in Section 8.2, it can be shown that ( g ( - )ii(*), nonanticipative with respect to $(*), and that %( - ) satisfies *"(a),
x ( t )= x
+ I,'f(.f(s),
ii(s))ds
a),
+ I'o(R(s)) dit(s).
eR(x),
0
Since u E T E F R ( u , x), and (ii(-), i C ( * ) , 5) has the same law as ( u ( - ) , w( ), T), the triple (X(. ), u( -), T ) has the same law as ii( ), 5) and hence
-
(x( a),
R(x, ?, u ) = R(x, 5, 6 ) . By the weak convergence, R(x, ?P,
'UP)+ R(x, 5,
ii).
Thus for each E > 0, we can find a p for which the left- and right-hand sides of the above equation differ by no more than E, from which (3.1) follows. t To construct the probability measure on the enlarged space, use the formula P(d2' . dBp . d P . diVP . d?')
=
Pep, ~~, i r ( d 2 p ) P 6 pir, .
ir
(dP)P(dt'P . d3P . d?'),
where the subscripted terms are regular conditional probabilities.
9.3
187
OPTIMALITY OF T H E LIMIT
Next, we shall prove the last assertion of the theorem. Let u E qw(x), g w ( u , x), T I T. According to the first part of the proof, we can suppose that u( .) takes values {a1,. .., ac},where aiE 42, and that there is a A > 0 such that u ( . ) is constant on the intervals [iA, iA + A), i = 0, 1, .... If T E F w ( u , x), T I T (whatever u ( - ) is), then by the method of the proof of Theorem 8.2.4, for each E > 0, we can find a 6 > 0 and a E g & ( u , x) such that I T. R(x, T ~ u,) I R(x, T, u ) 42, T E
+
Thus we can suppose that
4.1.
T E
F&(u,x), and turn to the approximation of
Let x( - ) denote the generic element of C'[O, 0 0 ) and P the Wiener measure and let B, and (0 < 6 < A < T) denote the minimal subon C'[O, a), a-algebras of (the Bore1 algebra over) C[O, a)over which {x(s),s I t} and {x(jS), j S I iA}, resp., are measurable. Since u ( . ) is a w ( - )function and for each w there is an x( - ) such that w(w, .) = x( .), let us write the value u(w, t) in the form u(x(*),t). For each i = 0, 1, . . . and 1 = 1, .. ., c define the subset of C[O, 0 0 ) : BiI = {x(-): u(x(*), iA) = GI}. For each v > 0, there is a 6 > 0 and sets GIE BfAsuch that
C P{GIA&}
I v.
i. I
cI
There are finite-dimensional sets C;, such that x(.) E is equivalent to { x ( j S ) ,j S I iA} E C;,, and there are open sets D;, 2 C; such that P{(w(jS),j S I iA) E 0; AC;,} < v. For each i and v > 0, define the control function u " ( - )as follows: u"(t) is constant on each interval [iA, iA A) and
+
u'(iA)
= iil
if {w(j6),j S
= ii,,
1 > 1, if {w(jS),j 6 IiA} E Dpl -
iA} E
u DYj.
I- 1
j= 1
Each u " ( . ) is in @&(x) and T E F&(u", x) by A9.1.4. The sequence { u " ( * ) } converges to u ( . ) for almost all w ( * ) , t if u + O fast enough. Also, a repetition of the embedding argument of the first part of the proof yields that R(x, t, u") -,R(x, T, u )
as v + O .
Hence for each E > 0, there is a v > 0 such that R ( x , 5, u') IR(x, T, u ) which proves the assertion. Q.E.D.
+ E,
188
9
OPTIMAL CONTROL AND NONLINEAR EQUATIONS
Theorem 9.3.2 Assume A9.1.1, A9.1.2, and A9.1.4. Let u E gR(x), FR(u,x), 7 I T . Then for each E > 0, there is a set eD, a A > 0, and a UA E @&(X) and a 7 A E F & ( U A , X), 7 A 5 T, such that
7 E
+ E.
(34 PROOFBy Theorem 9.3.1, (3.2) will hold if it holds for all u E @i(x)and 7 E F t ( u , x), 7 I T, and we will assume that u ( . ) and 7 are in this class for some fixed A and Q D . Thus, we can suppose that there is a probability space supporting the Wiener process w ( . ) and the nonanticipative X ( . ) , u ( . ) and stopping time z, where ( X ( . ) ,u ( - ) , w ( . ) ) solve (1.1). Let 6 < A, where A/6 = integer. Define the processes { X i } and X d ( . )by X t = X'(0) = x and R(X, 7 A
7
UA)
I
R(X,
7 9
U)
+ f ( X l , u(n6))S + o(X",[w(n6 + 6) - w(n6)], t E [n6, n6 + 6). P ( t ) = xfl,
Xi+ =X;
(3.3)
(3.4) Suppose that [say, where w ( - ) is the Wiener process] zi is an arbitrary control in G?i(x),where the set eD is the same one used for u ( . ) and Z is an arbitrary stopping time in S%(zi, x). Let N , = ?/A6 and let { X f } be the solution to (3.3) using (zi(*), w ( * ) ) Then, . for each such zi, 5, define the cost
The sequence { X ' ( - ) ,w( -), u ( * ) ,7 ; 6 > 0) is tight on D'[O, T ] x C'[O, T ] x L;[O, T ] x [0, T] [note that the measures of (w( .), u( .), 7 ) d o not depend on 61. Let 6 index a convergent subsequence. Let us suppose that the SkorokIt(-), Li(*), hod imbedding is used, and that the limit is denoted by ?(*)). Then .), ii(-), ?) is nonanticipative with respect to It(.) and (w.P. 1)
(x(.),
(x(
x(t)= x
+ joff (z(s),G(s))ds + I f o ( z ( s )dIt(s), ) 0
-
where (It(. ), ii( *), ?) has the same probability law as does (w( ), u( ), 7). Thus by A9.1.4 the law of (f(.), ii(*), ?) does not depend on the subsequence. Hence Rd(x,7, 1.4) + R(x, 7, u ) as 6 + 0. (34 The cost (3.5) is a well-defined functional of the control zi(*), the stopping time Z and corresponding process {X:} for each 6 > 0, iC E < I ~ ( X )and Z E Fi(u,x), Z I T. Let us optimize (3.5) over this class, for each 6 > 0. Since h( can change values only at (and .Z take values) 0, A, 2A, .. ., the problem a )
9.3
189
OPTIMALITY OF THE LIMIT
is actually an optimization problem on a Markov chain. Let u'(iA), i = 0, 1, . . ., and 'T denote the optimal pair, and *'(. ) the optimal interpolated [as in (3.4)] process. Then the value of u'((iA), and whether or not 'T = iA (if td 2 iA), depends only on the value of X'(iA). Thus, u'((iA) and Iba=iA) are functions of w(j6), j S IiA. By optimality, R'(X,
8,U')
IR'(X, 7,U).
(3.7)
+
Define u ' ( - ) by ~ ' ( t=) u'(iA) on [iA, iA A ) and suppose (as we may) that u ' ( - ) is a w ( - ) function. Owing to the properties of u ' ( - ) and ,'T by A9.1.4 there is a probability
-
space supporting a Wiener process $( ) and nonanticipative R'( * ), if( *), z", where (if(.), a'( .), z") has the same law as (u'(*), w( ), )'T and (Xi"( -), if( -), solve (1.1). The function $(!A) has (w.P. 1) the same dependence on ($(jS),jS IiA) that u'(iA) has on (w(jS),j 6 IiA). The sequences {X'(*), w ( * ) , T'; d ( n A ) , n = 0, 1, ...} and ?; if(nA), n = 0, 1, . ..f are each tight on D'[O, 7'1 x C'[O, 7'3 x [0,TI x *;IA. Let 6 index a convergent subsequence of both sequences, use the Skorokhod embedding, and denote the limits by (*(-), i V ( - ) , 3 ; G(nA), n = 0, ...) and (R(-),W(*),T ; ~ ( n d ) ,n = 0, ...). The laws of ( i V ( * ) , 3, L(*)) and of (W(-),T, iT(.)) are the same, and (*(-), 3, ii(.)) is nonanticipative with respect to iV( *), and similarly for the second set of limits. Each set satisfies (1.1). By the weak convergence,
-
$(a))
{x'(*), a(.),
R'(X, ,'T
+ R(X,
U')
R(x, ?, a')
-+
3, i)
as 6
R(x, T, a) = R(x, 3, i)
-+
0.
(3.8)
Equations (3.6), (3.7), and (3.8) imply that R(X, T, U ) 2 R(X, t, i).
and that, for each
E
> 0, there is a R(X,
z", a')
a0 > 0 such that, for 6 < 6 , , IR(X, T, U )
+ E/2.
(3.9) (3.10)
Fix 6 I S o . Let A , and B, denote the decision setst for if(.) and z" [i.e., i f ( i A ) = if and only if {w(j6),jS IiA} E A,,]. If the boundaries of the A, and B, have zero probability, then if E @&(x), z" E .."&(if, x), and we are done. In general, we need to show that there are uA E @&(x), T A E .."&(uA, x) such that R(X, T A , U A ) IR(X, z",
if)
t The decision sets are the same as those for ud(*)and 'T
E/2.
(3.11)
190
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
For each v > 0, approximate the decision sets Ail and Bi by sets A; and By, as was done in the proof of Theorem 9.3.1, and then dejne the corresponding control u'( ) and stopping time z', as in that theorem. Then, a proof similar to the one used in Theorem 9.3.1 yields that R(x, z', u') + R(x, z4,u4) as v .-+ 0. Thus, for sufficiently small v, set uA = u", zA = z', and (3.11) will hold. Q.E.D.
-
We will assume, henceforth, in the proofs, that sup, Ath(x)+ 0 as h -,0. Otherwise, with minor modifications in the definition of the uh( .), the proofs still hold.
Theorem 9.3.3 Assume A9.1.1 through A9.1.4, and let u E g R ( x ) , z E YR(u,x ) . Then P ( x ) = R(x, p, a) I R(x, z, u).
(3.12)
I f f ( ., .) and o( satisfy a uniform Lipschitz condition in x, then E E qR(x)and p E FR(u,x ) and P ( x ) = V ( x ) (i.e., we have the uniqueness property). a )
PROOFBy Theorem 9.3.2, and the fact that we need only consider T satisfying E",z 5 2 max, I b ( x ) l / k , , (3.12) will hold as stated if, for each T < 03, 9,, and A > 0, it holds for all u E @&(x)and z E Y & ( u , x ) , z I T. Henceforth let (u(.), z) be a fixed element in this class, with decision sets {Ail},{Bi},and where w ( * )denotes the Wiener process. We can suppose that there is a 6 > 0 such that u( * ) and z depend only on the values of the Wiener process at time {jS,j S I T}. Let X ( . ) denote the solution to (1.1) corresponding to w ( * ) ,u ( * ) , and z. Define a control u h ( - )and stopping time zh for the interpolated chain t h ( - ) as follows. If {W"(jS), j S I iA} E A i l , set? uh(s) =
on [iA, iA + A).
If {Wh(jS),jS I iA} E B , , set zh = id. Let th(*) denote the process under control u h ( * ) . Of course, the W h ( - )used to calculate t h ( -is) obtained (nonanticipatively) from th() and an independent Wiener process $( -). By the optimality of u h ( - )and p h ,
-
Rh(x,p h , uh) I Rh(x,zh, u').
(3.13)
The sequence {th(-), W h ( * )oh; , uh(iA), iA < T } is tight on D2'[0, TI x [0, TI x Let h index a convergent subsequence with limit (X(*), W(.), i;E(iA), iA < T), and suppose that the convergence is w.p. 1 in the appropriate topology. Then X(.), '5 and E( -) are nonanticipative with respect Actually, to be consistent with the definition of (! or the control may change at one of the t! only. When we write that the control changes at iA, we in fact suppose that it changes at the t! for which n = min{i : C: 2 iA}. ('(ex
9.3
191
OPTIMALITY OF THE LIMIT
to W(-), where we define B(.) by B(s) = Z(iA) on [iA, iA limits satisfy X ( t )= x
+ A),
and the
+ j: f ( X ( s ) ,Z(s)) ds + 1' a(X(s))d W ( s ) .
(3.14)
' 0
Since
P{( w ( j S ) , j S I iA) E dA,, u dBi} = 0, for all i, l, the weak convergence implies that the probability law of {a(*), T } is the same as that of { u ( - ) , w ( * ) , z}. Hence, by A9.1.4, the distributions of {R(.),V(*), 7) are the same as those of { X ( . ) ,u ( * ) , z}. Thus
w(.),
R"(x,7", v") + R(x, ?,
G ) = R(x, Z, u).
This and (3.13) prove (3.12). The last assertion of the theorem is proved by showing that, if u ( . ) is any @-valued measurable random function which is nonanticipative with respect to w ( . ), then the solution to X(t)=x
+
I' 0
f(X(s), 4s)) ds + j ' G ( X ( S ) ) dw(s)
(3.15)
0
is pathwise unique under the Lipschitz condition. Suppose that t1( and t2(- ) are both nonanticipative solutions. Define a(.) = Itl(-)- t2(-)l.Then for any Tl I T < co, a )
E max I~ ( s 1) 2 0 s 1s T1
2 E max
OstsT1
+2 E
1 j ' ~ ( t ~ ( s ) ~ +))I 1 1 j".(t,(s)) 4t,(s))I 1
2
ds
- f(t2(S),
0
2
max
OiliT
0
-
dw(s)
Let K i denote real numbers whose value depends on T. Using the Lipschitz condition and Schwarz's inequality, the first term in the right-hand side is bounded above by KlE~oT11S(s)12 ds.
By the martingale estimate (1.2.3), the second term is bounded above by
hence by
192
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
Putting theseestimates together anddefiningA(t) = E maxg,,,,16(s)(2 yields A(t) I K , /'A(s) ds. 0
-=
Then, since 0 I A(t) 00, we have A(t) = 0. Let T denote a nonnegative random variable which is nonanticipative with respect to w( ). By using the Picard iteration technique of the proof of Theorem 1.5.1, we can construct a solution to (3.15). By the method ofconstruction, there is a CEO, TI-valued measurable function F ( . , .)on Cr[O, TI x L';[O, TI, such that X ( . ) = F ( w ( . ) , solves (3.15) w.p. 1, and X ( - ) is nonanticipative with respect to w ( * ) . By path uniqueness F does not (w.P. 1)depend on the probability space. Thus, the distributions of ( w ( - ) , u ( - ) , T ) imply the distributions of ( X ( - ) ,u ( * ) , T ) . Hence, ij E VR(x)and p E FR(ii, x). Q.E.D.
-
.(a))
9.4 Discounted Cost
In this section, we will briefly outline the changes required (from the development in Sections 9.1-9.3) when the cost function is of the discounted type. Assume A9.1.1 through A9.1.4, but drop the assumption concerning the positivity of k ( * , For each u E gR(x),T E Fi(u,x), and corresponding controlled process X ( *), define the cost a).
where A(*) is a bounded continuous real-valued function On R' satisfying A(x) 2 A. > 0 and A ( t ) = exp - fo A(X(s))ds. Define V(x) =
inf
u E PR(x). r E YR(u. x)
R(x, T, u).
(44
Suppose that Ath(x)-,Oas h + 0, uniformly in x. Then one type of discretizatioh for the discounted problem leads to (see Chapters 6.5 and 8.2)
1
Vh(x)= min b(x), min[(exp - Ath(x)A(x))(E",Vh((t:) aeqt
+ k(x, a)Ath(x))]},
(4.3) which is just the dynamic programming equation for the minimum cost for a discounted controlled optimally stopped Markov chain.
9.5
For each n
CONTROL T O A B O U N D A R Y : DISCOUNTED CASE
E
%& ' I),
{ti},define the cost
193
m E Fg(h,n), and corresponding controlled chain
where A! is defined below (6.5.5), and U iis the actual realization of the control at time i, and the infimum V"X) = ff
E
inf
Rh(x,m, n).
QR(h). m E F k o ( h , z )
(4.5)
Then (4.3) has a unique solution which is just the right-hand side of (4.5). The optimal policy n is stationary pure Markou and the optimal stopping time is pure Markou. Let uh(.), mh (with sample values { U!} and interpolation p h resp.) denote the optimizing pair in (4.5). In this discounted case, define K h ( .) to be the interpolation of {C;:: A:k((:, U:)At!}. The (optimal) sequence {th(*), W h ( - )F, h ( * ) ,K h ( * )p, h } is tight on D3'+l[0, 0 0 ) x R'. Let h index a convergent subsequence, with limit W(*)F , ( * ) , K ( . ) , p). Then t(-) satisfies (((a),
t ( t )= x
+ F ( t ) + J".(t(s)) n
dW(s),
t 2 0,
and viaan argument exactly like that in Section 9.2, there is a nonanticipative %-valued control a(.) and a stopping time p, such that
j f(t(s),ii(s))ds f
F(t)=
0
and Rh(x,p h , uh)+ R(x, p, ii)
as h + 0.
Furthermore, Theorem 9.3.3 continues to hold, with TIR(u,x) replaced by
Z q ( u , x).
If Ath(x)f, 0 as h + 0, uniformly in x, but one makes the replacements indicated in connection with (6.5.4), then the above convergence results remain valid.
9.5 Control until a Boundary Is Reached: Discounted Case In this section, optimization of the cost (4.7.14) will be treated. The development is similar to that in the previous sections-but with the added complication that the nature of the process at the time that it first reaches
I94
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
the boundary must be considered. The infimum of the costs, if sufficiently smooth, satisfies (4.7.15). Any one of the approximations 1,2, or 3 of Section 9.1 can be used, but we will stay with approximation 3, for definiteness. Let A(x) 2 Lo > 0, and assume A9.1.1 through A9.1.4, except for the positivity condition on k ( . , *), and the assumptions concerning the stopping time. (We now stop only on first hitting the boundary dG.) Let {
1
m- 1
and let V h ( x )denote the infimum of the costs over V,(h). The dynamic programming equation for Vh(x)is V h ( x )= min[(exp - A(x)Ath(x))(Fx Vh(<:)+ k(x, cr)Ath(x))],
x
E
Gh,
U.541
= b(x),
X
# Gh.
(5.1)
The equation (5.1) has a unique solution and there is a pure Markou minimizing control uh( For the chain under control uh(*), let mh denote the hitting time of aGh(it may be infinite). Select an arbitary Uo E % and, for n 2 m h , set uh(<:)= U,"= U o . Define the usual interpolations t h ( - )W , h(*), Fh( .), Kh(-), and let p h be the interpolation of mhrnamely t i h .The interpolation K h ( - )is defined as in the last section. W h ( * )F, h ( * ) K , h ( * )p,,} , is tight on D3r+1[0,co) x R'. The sequence {th(*), Let h index a convergent subsequence, with limit (<(. ). W(* ), F ( . ), K ( .), p). Suppose that the Skorokhod imbedding is used. The result of Theorem 9.2.3 remains valid and there is a measurable %-valued function a(.), which is nonanticipative with respect to the Wiener process, such that a).
F(t)=
jt-r(<".qs)) ds 0
j
K ( t ) = 0* (exp - j O S L ( < ( U ) ) du) k(<(s),U(s)) ds, < ( t )= x
+ F(t)+
5 a m ) ) dW(s), f
0
and the random variable p is also nonanticipative with respect to W(-). On the set { p c co}, < ( p ) E aG since p h -,p w.p. I and -,<(.) w.p. 1, uni-
rh(*)
9.5
195
CONTROL TO A BOUNDARY: DISCOUNTED CASE
formly on finite intervals. But p is not necessarily thejrst time that ( ( t ) E dG. If it were, then the w.p. 1 convergence would immediately yield that as h + 0.
Rh(x, uh) -+ R(x, u )
(5.2)
For each control u l - ) , let s ( u ) and z'(u) denote the escape times (of the corresponding controlled process) from G and -6, resp., when the initial condition is x . The initial condition will be fixed at x throughout this section. In general, there seems to be little that we can do at present about the problem of whether (w.P. 1) p = T(U) = inqt : t(t)E dG}. In fact, one of the more serious deficiencies of the method concerns the lack of adequate information on ii(.). There are two main problems. The first concerns the uniqueness of solutions under the control ;(. ). The second concerns the escape times. If E E @g(x, G ) , we will be able to show that Rh(x,u h )+ V ( x )as h -,0. To do this, some more results are needed on the approximations of controls. First, let us assume A9.5.1.
A9.5.1
p = T(@) w.p. 1
There are many cases where A9.5.1 holds. For example, it holds in both the controlled versions of the example in Section 4.4 and in the case where a( ) is continuous and positive definite at all x E dG and JG satisfies a cone condition. In the absence of general information concerning z(U) and p, it seems that we must consider special cases. Perhaps a detailed investigation of the nature ofthe discrete and continuous optimization near the boundary will shed more light on the problem. But it is clear that A 9 5 1 holds in very many cases of interest. The exceptional cases are harder to find. Under A9.5.1, Eq. (5.2) holds. If k(y, a ) 2 0 and b ( - )= 0, then Theorems 9.5.1 and 9.5.2 hold with V ( x ) replacing R ( x , ti), where V ( x ) = limh-o V h ( x )and is no less than R ( x , ti) and A 9 5 1 is not needed. We will now prove an optimality theorem under A9.5.1. Let u E @ , ( x , G ) and let { u " ( . ) } denote the sequence constructed from u ( - ) and w ( - ) in the proof of Theorem 9.3.1. In particular, u " ( - ) E @$(x) and (let p = l/v)
-
E
jOpI u ' ( o )
- U(W)
ds I2-p.
Since u " ( * )E @$(x), there is a probability space supporting a Wiener process G"(- ) and nonanticipative processes *"( * ), ii"( ) such that
-
t
f v ( t )= x
+ j o f ( f V ( s ) , E"(s))
ds
+
r
J-0
a(8"(s))diV(s),
and (ij"(-), Gv(-)) has the same law as ( u " ( - ) ,w ( - ) ) . Let us enlarge the latter probability space to add a nonanticipative process 2(-), where (G"(*), C"(*),
9
196
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
ii"(.)) has the same law as (u( .), u'( *), w( .)). Thus, we can suppose that ij"( .)
is obtained from ('i - ) in the same way that u'( .) is obtained from u( -). The sequence ii'( ), ii"(. )} is tight on C2'[0, co) x LT, I . Let v index a convergent subsequence, with limit .), G(-), G( )), and suppose that the , has the same law as (w(-), Skorokhod imbedding is used. Then ( i i ( - )ii(-)) u( )), and (X(*), ii(.), ii(. )) satisfy (1.1). Furthermore, by uniqueness (u E @$(x, G ) ) , ii(-)) has the same law as ( X ( . ) , u ( . ) ) . Thus ~ ( i i = ) ~ ' ( i iw.p. ) 1 and
-
{x'(
a),
-
-
(x(
(z(.),
R ( x , ii) = R(x, u).
Furthermore, since
z'(-)+ d ( . )as u + 0, and ~ ( i i =) ~ ' ( i i w.p. ) 1, w.p. 1,
T(?) + T ( i i ) ,
P{T(?') # ~ ' ( 2 1 ~ )( i i <) co} + 0 Thus, for each
E
as
(5.3)
v + 0.
> 0, there is a v ( . ) in some @&(x) such that
R(x, u ) I R(x, u )
+ E,
P{T(U)
{r:}
# t'(U)
lT(U)
< co} 5 E .
(5.4)
Let u h ( . ) (a control for the chain and interpolated process t h ( - )be) obtained from u( * ) in exactly the same way as uh( ) was obtained from u( in the proof of Theorem 9.3.3. Then there is a 6 > 0, which goes to zero as E +O, such that
-
~ ( xu ,) - 6 < lim R ~ ( xuh) , 5 lim R ~ ( xuh) , I R(X,u ) h-. 0
h+ 0
a )
+ 6.
(5.5)
We use 6 > 0 and !& and G,since ~ ( u #) ~ ' ( uw.p. ) 1, although (5.4) holds. Thus, the sample path costs (for rh(-))may not converge to the sample path } h -+ 0. But the probability of this costs for the limit on the set {t(u) # ~ ' ( u )as set can be made as small as desired. By the optimality of u h ( - ) . Rh(x, uh) 2 Rh(x, uh).
(54
Thus we have proved
Theorem 9.5.1
Assume A9.1.1 t o A9.1.4 and A9.5.1. Then
lim Rh(x, u h ) = R(x, U) IR(x, u), h- 0
I f k ( y , a ) 2 0 and b( * ) A9.5.1.
for all u E @k(x,G ) .
(5.7)
= 0, then the right-hand inequality of(5.7) holds without
If u E @i(x,G ) but not in @g(x,G ) , then, owing to the presence of the boundary aG, we cannot use the method of approximation of Theorem 9.3.2, where a w( -)-dependent control (which was at least as good as u( -)) was found. A problem arises, since the degree to which z = z' for the new "approx-
9.5
CONTROL TO A BOUNDARY: DISCOUNTED CASE
197
imating control is not clear. Another, more direct, type of approximation will now be described. It will allow us to circumvent the above difficulty. ”
An Approximation to Controls in @g(x, G ) Let u E @(x, G ) . Then, we can suppose that there is a probability space supporting a Wiener process w ( . ) and nonanticipative process X ( . ), u( *), where ( X ( * ) ,u ( * ) ,w ( . ) ) solve (1.1). Let { u p ( * ) } denote the sequence of functions which were constructed from u ( * ) in the first part of the proof of Theorem 9.3.1. Then u p ( . ) E @ t ( x ) ,and we can suppose that E
1,’ I up(s)-
u(s)
l2
ds
2-p.
Since u p E @;(x), for each p there is a probability space supporting a ijp( -)), where ( i j p ( * ) , Gp(*)) Wiener process G p ( . ) and nonanticipative has the same law as ( u p ( .), w( -)), and ( x P ( .), f i p ( .), Gp(-)) solves (1.1). Let us enlarge the probability space by adding iP(. ), where (iip( ), GP( ), Gp(* )) has u p ( . ) , w ( - ) ) . We can (and will) suppose that i j p ( - ) is the same law as obtained from i i P ( . ) in the same way that u p ( - ) is obtained from u ( - ) .The GP(*),i i P ( - ) } is tight on C2‘[0,co) x LT,,. Let p index a sequence convergent subsequence, use Skorokhod imbedding, and denote the limit by G(.), ii(*)). By uniqueness, the limit has the same law as ( X ( . ) ,w ( - ) , u ( - ) ) . Hence (as in the proof of Theorem 9.5.1) for each E > 0, there is a eD, a A > 0, and a u E g t ( x ) such that (5.4)holds. Fix E and the corresponding u( -). Let w( and X ( .) denote the associated Wiener process and controlled diffusion, resp. We will approximate u ( . ) by a “simpler” control in a way that allows us to define a “similar” control u h ( . ) for use on th(). Let A/S = q, an integer. Let A : , . .., denote a collection of disjoint sets in R’, each of diameter I l/q, whose boundaries have Lebesgue measure zero, and such that R‘ = A;. Let BTAdenote the minimal a-algebra which and k) contained all the sets (for all j S I iA, n < i, Ur E eD,
(xp(-),
-
@(a),
{xp(.),
(%(a),
a )
uj
{o: w(jS) E At}, {o: u(nA) = n,}, and let P;, ,{u(iA) = n,} denote the conditional probability, given W%. For each iil and i, P t s“{u(iA)= u ~ }is a function of w(jS),j S I iA, and of u(nA), n < i. For each set {kl, ..., kiq; m,, ..., m , - l } , it is constant on the product set A,, x ... x Akipx {u(nA)= Urn”, n < i } . As q -,co, the function converges (w.P. 1) to the conditional probability, given w(s), s I i, and u(s), s I id - A.
9
198
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
For each q, there is a probability space on which are defined a Wiener process wq( ) and a nonanticipative @',-valued function uq( where v q ( . ) is constant on each interval [iA, iA + A ) , and where P,.,{u4(iA) = is,) is the conditional probability that uq(iA) = is,, given w4(s),s < co,u4(nA),n < i. In fact, by the definition, P{uq(iA) = is, I w4(s), s < 00, u4(nA), n < i } = P{uq(iA)= is, 1 w4(jS),j S I iA, u4(nA), n < i } . Since u4 E @i(x), we can suppose by A9.1.4 that there is a nonanticipative.X4(.) also defined on the space, such that (X4(-),u q ( - ) , wq(.)) solve (1.1). The sequence {X4(*), wq( uq((iA),i = 0, 1, . . .} is tight on C2'[0, co) x @ .; Let q index a convergent subsequence with limit (*(*), k(*),'u(iA),i = 0, 1, . . .) and suppose that Skorokhod imbedding is used. The limit has the same w ( - ) ;u(iA), i = 0, 1, . . .). Thus we can assert that law as (X(.),
-
a),
0 ) ;
R(x, 'u) = R(x, u),
+
lim P{z(uq) ~('u)} I E, 4
lim P{z(uq)# t'(u4)} I E,
lim R(x, uq) I R(x, u ) 4
4
+ E.
(5.8)
It follows from (5.8) that, for purposes of comparing u ( * ) to a(-),we can suppose that u( * ) is of the form of uq( * ) for some finite, but arbitrary, q. The sets A: will be denoted by Ai.Suppose this henceforth. Now, let us define uh( * ), the adaptation of u( * ) to the interpolated process th( Choose u"(. ) as follows. Select u h ( 0 ) randomly according to a).
P{uh(O) = a,}
= P{u(O) = a,},
and let the <"(-) evolve until time A . (Actually, we must let the system evolve until the "discrete" time t2, where n = min{i : t: 2 A } before changing controls. Here and in the sequel, it should be understood that the controls should change only at such discrete times.) In general, at t = iA, choose uh(iA)at random according to P{u"(iA)= is, I Wh(s),s I iA, th(s), s 5 iA, uh(nA),n < i } = =
P{uh(iA)= is, 1 Wh(jS),j S I iA, uh(nA),n < i )
P{u(iA)= is, 1 w(jS) = Wh(jS),j S I iA, u(nA) = uh(nA),n < i},
where, by the last expression, we mean that we simply substitute Wh(j S ) and uh(nA)for w(jS) and u(nA) in the P:,,{u(iA) = is,}, which we consider as a function of w(jS),j S I iA and of u(nA), n < i. The sequence Wh(-); uh(iA),i = 0, 1, . ..} is tight on Dz'[O, co) x a;. Let h index a convergent subsequence, and use Skorokhod imbedding. Denote the limit by (<(.), W ( - ) ;a(iA), i = 0, 1, . . .). For arbitrary n, t < s, and t i I t, i I n, and an arbitrary bounded and continuous function g ( * ) {<'(a),
9.5
199
CONTROL TO A BOUNDARY: DISCOUNTED CASE
defined on the appropriate space, we have
G, = Eg(th(ti),Wh(ri),vh(ti),i I n)(wh(s)- W h ( t )+ ) 0, G,
+
Eg(t(ti), W(ti), ti), i I n)(W(s)- W ( t ) ) ,
En = E g ( t h ( t i ) ,Wh(ti),uh(ti),i +
- Wh(t))' n)(Wh(s)- Wh((t))(Wh(s)
&(<(ti), W(ti), ij(ti), i I n)(s- t ) ,
En Eg(t(ti), w(ti),D ( t i ) , i I n) ( W ( s )+
W(t))(W(s) - W(t))'
where U( - ) is the piecewise constant interpolation of the function with values B(iA)at iA. Thus, t(-)and a(*)are nonanticipative with respect to the Wiener process W ( . )(see, e.g., Theorems 8.2.1 and 6.3.1 and Section 6.6). Also, t(t) = x
+ j;f
(t(s), fib))ds +
jo'4t(s))d W ) .
Next, let us compute the law of (u( *), W(.)). Let C, and B, denote measurable subsets of the A, and R', resp., each with a boundary of zero measure. Let t,, . . ., t , I iA and t, #fa, all j , k, and let each ijjsE aD. Then (assuming that the denominator of the second term is > O )
P{U(iA)= is, I W(t,) E B, , 4 I m, W(jS)E C j ,jS I iA, D(pA)= isj,, p < i} -
PI
\ij(iA)= E , , W ( f , )E B , , q I m, W(jS)E C j ,jS I iA, ij(pA)= isj,, p < i W(t,) E B , , q I m, W ( j S )E C j , jS I iA, @A) = isj,, p < i
4
= lim h-o
=
lim
h-. 0
1'
i 1
uh(iA)= U, , Wh((t,)E B, , q I m, Wh(jS)E C j ,jS I iA, vh(pA)= nj,, p < i Wh((t,)E B, , q I m, W " ( j S ) E C j , jS I iA, vh(pA)= isj,, p < i
i
I
P{oh( iA) = is, , Wh(jS)E C j ,jS I iA, vh(pA)= iij, , p < i} P{ Wh(jS)E C j ,jS I iA, uh(pA)= Uj,, p < i}
-
iA, ij(pA)= Uj,, p < i} P{D(iA)= a,, W(jS)E C j ,jS I P{ W(jS)E C j ,jS I id, B(pA) = Uj,, p < i}
=
P{ij(iA)= is, I W(jS)E C j ,jS I iA, ij(pA)= isj,, p < i}.
(5.9)
By the construction of u h ( . ) , the last term equals (using an obvious abuse of notation) P{u(iA)= is, I w ( j 6 ) E C j ,j S IiA, u(pA)= Uj,,p < i}. (5.10)
9
200
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
The equality between (5.9) and (5.10) and the nonanticipative property of u(-)imply that W ( . ) )has the same probability law as ( u ( * ) ,w(*)).By the uniqueness assumption A9.1.4, the law of (((*), U(*)) is uniquely determined by the law of (U(-), I#’(*), and U(-)) has the same law as ( X ( - ) , u ( * ) ) . Thus, by the foregoing argument, Eqs. (5.5) and (5.6), and the arbitrariness of E > 0, we can conclude that @(a),
(((e),
Theorem 9.5.2 Assume A9.1.1 to A9.1.4 and A9.5.1. Then R(x, a ) IR(x, u ) for all u E @g(x, G ) . REMARK Let k ( y , u ) 2 0, all y, u, and let b ( . ) 3 0. Then, whether or not
p = z(ii), we have p 2 z(ii) and hence
R(x, ii) I!& Rh(x,u’). h+ 0
In this case, we can drop condition A9.5.1 from Theorems 9.5.1 and 9.5.2. Note also that Assumption A 9 5 1 is implied by
P,{z(n) = ?’(a))= 1. The Optimality of ii and R ( x , ii)
-
Iff ) and Q( satisfy a uniform Lipschitz condition in x, then according to Theorem 9.3.3, E E q R ( x ) ;i.e., the solutions corresponding to the law of the limiting (ii( * ), W(* )) are unique in probability law. We insist, at least for practical reasons, that the solutions be unique in law. Thus under a Lipschitz condition, ii( * ) is a legitimate” control and is optimal with respect to each u E @g(x, G ) . We selected the class @g(x, G) because it was possible to “discretize” its members; i.e., for each u( in that class, we can define a u’(*) which we can compare to u h ( - ) ,and prove the optimality of ii(*) from the optimality of u h ( - )and weak convergence arguments. Any other class of “discretizable” controls would do as well. But it is difficult to discretize or to approximate an arbitrary control u( - ) if control stops at the boundary dG, unless z‘(u) = z ( u ) w.p. 1. More work needs to be done on this point. (a,
a )
“
a )
9.6 Control until a Boundary Is Reached: No Discounting When the discount factor of Section 9.5 is dropped, a new problem arises. It is not always true that the optimization problem is well formulated for either the chain or for the diffusion. For example, there may not be any control under which the boundary 8G is attainable, or the structure of the cost may be such that it is “cheaper” never to reach the boundary. We will
9.6
201
CONTROL TO A B O U N D A R Y : N O DISCOUNTING
treat some aspects of this problem. It should be emphasized that the problem does not arise because of the method of approximation, but because the original optimization problem may not be well formulated. Assume A9.1.1 through A9.1.4 but drop the positivity assumption on k ( * , *). We will treat the cost R ( x , u ) as defined by (4.7.9). The infimum of R ( x , u ) over controls in @L(x, G), when considered to be a function of x, formally satisfies the partial differential equation (4.7.13). Again, any of the approximation methods 1-3 of Section 9.1 can be used, but for definiteness we will continue to use approximation 3 (although the only place where the actual properties of the particular approximation would appear is in the use of Ath(x),Ath(x,u), or A?'(x)).The discretization of (4.7.13) is
16(x),
min [.Ex ~~
Vh((.)=
( 5+: k(x, ) u)Ath(x)],
x E Gh,
(6.1)
as*
#
Gh'
Let %i(h, x , G) denote the class of randomized policies for the chain {ti} with a finite average escape time from Ghrwhen
where Ui is the actual sample control realization at time i, and define inf
V h ( x )= IE
%R
+
Rh(x,n).
(h, x, C)
Equation (6.1) is only a formal representation for Vh(.). Unlike the situation for the stopping or discounted problems, it is not always guaranteed that (6.1) has a unique solution, which is the right-hand side of (6.2). For example, there may not be a control in %i(h, x, G) even if there is one in @)R+(x,C) or in @ i ( x , G). Furthermore, even if such a control u( .) exists, it is conceivable that a lower cost (than achieved with the use of u ( . )) can be obtained by use of another control, which always keeps the process in? G , . The problem is complicated because it involves both the properties of k( .) and b( ) and the structure of the transition probabilities. We will show that the discretized problem is well formulated in two particular, but broad, cases. Intermediate cases can certainly be handled also provided that there is some condition which guarantees that the boundary will be reached under the optimal control. The following cases are illustrative of the possibilities and techniques. a,
-
t Recall that part of the problem statement is that we must reach the boundary or target in finite time.
9
202
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
CASE1 There is a To < co such that, for each @-valued nonanticipative measurable control u ( . ), sup E ~ T 5' To . Y€.G
[The sup should be taken over all y E G for which there is a solution X ( . ) , unique or not, and over all nonanticipative solutions.] CASE2 Fix the initial condition to be x. There is a real ko > 0 such that k(y, a)2 k,, all y E R", a E @, and there is at least one u E @ i ( x , G). In case 2, there may not be a minimizing policy for {(:} for all initial conditions, but (for small h) where will be one when (! = x. The optimal policy can, of course, be assumed to be pure Markov and stationary. Then, the optimal pure Markov and stationary policy u h ( * )will be defined for x and for all states which communicate with x under Case 1 includes all problems with positive definite a ( . ) and various generalizations of the example in Section 4.4. In particular, it holds if there is an a, > 0 and an i such that aii(x)2 a , , x E G. For t > ~ ( u )the , control u ( - ) can be defined arbitrarily. But the cases imply the assumption that, for each relevant u ( * ) , there is an extension of u ( . ) to [ ~ ( u ) co) , such that T ' ( u ) is defined, and the above properties hold. If k ( . , .) = 0, then the discretized problem has a solution under weaker conditions, which will be remarked upon at the end of the section. .'(a).
Theorem 9.6.1 Assume A9.1.1-A9.1.3. Then for each small h, there is an optimal stationary pure Markou control u h ( * )and a unique solution to (6.1) which is the right-hand side of (6.2). Also lim sup E:hp, < co, h-0
where
Ph
is the interpolation of the hitting time mh (under uh).
PROOFCASE 1 Let z h ( - denote ) an arbitrary policy in WR(h)for the )). Let q h denote the corresponding chain {ti}(and interpolated process th(. interpolated escape time from G,. Suppose that there is a y E G and a sequence x h + y as h + 0 such that Pt:{qh 5 T } +o
as h - 0 ,
xh E
Gh,
(6.5)
for each T < co. The sequence {th( -), W"( .), F h ( - ) ,K h ( - )q, h } is tight on D3'+'[0, co) x R'. Let h index a convergent subsequence with limit ((( * ), W ( .), F( .), K ( .), q).
9.6
CONTROL TO A
BOUNDARY: NO
DISCOUNTING
203
Then (by use of the results of Section 9.2) there is a @-valued nonanticipative measurable process B(. ) such that t ( t )= Y
+ J'f(t(s), 0 B(s)) ds + jo'.(t(s))
dW(s),
K ( t ) = ' k ( t ( s ) ,~(s))ds. 0
BY (6.3), E:T' ITo.
Suppose that Skorokhod imbedding is used. This implies that there is an > 0 and a Tl < oc) for which?
ctl
P;{T' I TI} 2 a l .
(6.6)
Also, we have limh+oq h = q I ~ ' ( v w.p. ) 1, which, together with (6.6), contradicts (6.5). Thus, there is a T < GO and an LY > 0 for which P:h{qh I T } 2 LY, for all small h, y
E
Gh
and zhE ~ , ( h ) . (6.7)
Let f h ( * ) denote the stopped process t h ( t n qh).By (6.7) and an induction argument, P",{qh > nT)= E?z(tYnT-
T ) EG ) I { E h ( n T ) E G )
I E:hl(tyy,T- T ) E G ) P : h { q h > nT 1 eh(nT - T ) E G } I E:*l,tyn~- T ) G ) ( 1 - a ) I (1 - LY)". [We used the fact that (6.7) is uniform in z h ( . and ) y.] Thus
Jb
co
E:hqh I
(6.8)
Pzh(qh> t } dt I T 4- Tfa,
which implies the theorem for Case 1.
CASE2 Let u ( . ) denote an element of @i(x, C). By the assumption of Case 2, such an element exists. Consider the discounted problem with constant discount factor I > 0 (replacing the function I ( * ) )and denote the discrete and continuous discounted costs (with policy z and control u, resp.) by Rh,(x, z) and R,(x, u), resp. Let uh,( * ) denote the pure Markov stationary control which optimizes Rh,(x, z). By the results of the last section lim R,(x, uh,) I R,(x, u).
h-0
t T ' ( u ) is the escape time from G, under drop the argument of T'.
,)a(.
(6.9)
If u ( - ) is obvious, say from the operator E:, we
204
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
Also, R,(x, u ) _< R(x, u ) < 0O
(6.10)
for all I > 0. The inequalities (6.9) and (6.10), the positivity of k ( . , -), and the arbitrariness of I > 0, imply that there is a I > 0 with corresponding optimal discrete controls u t ( - ) and (interpolated) hitting times p: and a constant K such that EYhp; < K ,
all small h.
(6.11)
Equation (6.1 1) implies that the nondiscounted discrete problem is well formulated under Case 2. Q.E.D. Theorem 9.6.2 Assume A9.1.1 through A9.1.3 (except for the positivity condition on k(., .)). Fix x , and let the optimal u " ( - ) exist for each small h. -), Wh(-), Fh( -), Kh(-)} is tight Then the sequence (corresponding to u"( .)) {th( on D3'+ "0, co).Let h index a weakly convergent subsequence with limit ((( -), W ( . ) , F(.), K ( . ) ) . Then ( ( t )= x F ( t ) a(((s)) dW(s). There is a %-valued,nonanticipative and (0, t ) measurablefunction n( for which (w.P. 1)
+
+ ro
a )
The proof is exactly the same as for the corresponding result for the controlled stopping problem or for the controlled discounted problem. Theorem 9.6.3 Fix x and assume the conditions of Theorem 9.6.2, and let h index any convergent subsequence. Let a ( - ) be the control referred to in Theorem 9.6.2. Assume Case I or 11. Then
lim sup Exph < 00,
(6.12)
h
where p h is the Jirst time that ch(.) leaves Gh under the optimal control uh(.). Then E,r(ii) < co. Assume that (6.13)
Px{Z(U) = T'(U)} = 1
(or assume A9.5.1). Then
lim V"(x)= R(x, ii)
h-0
< R(x, u )
all u
E
@'R+(x, C).
(6.14)
Iff (-, - ) and a(*)satisfy a uniform Lipschitz condition in x, then a E @',(x).
PROOF The result Exr(ii)< 00 follows from (6.12) and Fatou's lemma. Let R,(x, u ) denote the discounted cost with constant discount factor I ( x ) = I and for the process X(- ) under a control u( * ) and let u:( .) denote
9.7
T H E IMPULSIVE CONTROL PROBLEM
205
the optimal pure Markov stationary control for the discounted problem on the chain {t:}.Equation (6.12) follows from the argument in Theorem 9.6.1, which also implies that lim sup E:p: < co. h-10
1-0
=-
For each E 0, (6.12) and the last estimate imply that there is a AC > 0 such that, for 1I I , ,
+ E.
(6.15)
lim R:(x, u:) IR,(x, u).
(6.16)
Rh(X, uh) I R:(x, u:)
Now, let u E @ i ( x , G). By Theorem 9.5.2 (when A 9 5 1 is dropped from Theorem 9.5.2, R(x, U) is replaced by K hV h ( x ) ) .
-
Also, as 1-,0,
h+O
R,(x, ).
+
R(x, .).
(6.17)
Equations (6.15)-(6.17), Theorem 9.6.2, and the arbitrariness of E imply the theorem. Q.E.D. REMARK Suppose that k(., - ) - 0 and b ( . ) I0. Let the cost be R(x, u ) = E",(X(r))l,,,,, , Then the discrete problem is always well formulated, and Theorem 9.6.2 continues to hold. Theorem 9.6.3 holds without (6.12), and if the @;(x, G ) in (6.14) is replaced by QR(x, G). Cases I and I1
are not needed.
9.7 The Impulsive Control Problem We are able to treat the impulsive control problem when there is also a continuously acting control. Only a sketch will be given. First we define a if u ( - ) is class of controls J R ( x ) .We say that (u(*),{ T ~ ,ui}) E aR(x), a 4Y-valued measurable process, if the ui are V"-valued random variables and (all i) T ~ + 2 ti2 0, and if there is a Wiener process w ( - ) such that the functions u ( * ) ,Ti(*),u l ( ' ) , (where ri(t)= IgiSr), u i ( t ) = uiI(risrJare nonanticipative with respect to w(*),and the equation
has a unique (and nonanticipative) solution, in the sense that the law of (X(.), u ( - ) , { T ~ ui}) , depends only on the law of ( w ( * ) u, ( - ) , { T ~ ui}). ,
9
206
OPTIMAL CONTROL AND NONLINEAR EQUATIONS
Define the cost
with
V ( x )= inf R(x, u, { 7 i , ui}),
where the inf is over the class a R ( x ) .There is an optimal pure Markov stationary policy for the discretized problem. Let {th(), Fh( ), K h ( .), Wh(*), {7!, u:}} denote the interpolated process and optimal jump times and jumps, and let ({(-), F ( - ) , K(.), W(.), { p i , iii}) denote the limit of a convergent subsequence. Define p i ( . ) , ni( by p i ( t ) = I{pisf), Zi(t) = ZiI { p i s f ) .Using the techniques of Sections 9.1-9.5, we can show that { p i (.), iii( .)} are nonanticipative and that there is a measurable nonanticipative %-valued function ii( .) such that
-
-
a),
t(t)= t(p; + ai +
r
'0
f(t(s), o(s)) ds +
J a(t(s))d ~ ( s ) , 0
(74
K ( t ) = j'(exp - I s ) k ( t ( s ) ,n(s)) ds. 0
Under the assumptions A8.4.1, A9.1.1-A9.1.3, and a natural analog of A9.1.4, we can also show that R(x, i7, { p i a,}) 9
(7.3)
R(x, u, {pi ui}) 3
for any u, {pi,ui}E a R ( x ) .Under a Lipschitz condition onf(., - ) and a(-), we have the uniqueness under U, {pi,Pi};hence R(x, ii, { p i , tii})= V ( x ) . The problem of multiple simultaneous jumps of X ( . ) or t(-)can be handled as indicated in Chapter 8. 9.8
Numerical Results
Figures 9.1-9.3 illustrate some numerical data for an optimal stopping and control problem for the system dX,
=X
, dt,
dX,
=
-(Xl
+ 3 X 2 ) dt + u dt + J2
dw,
where r
[ 1 + 0.25 I ~ ( tI]) d t
+ E:(X:(7) + X $ ( T ) ) .
9.8
L
-3.0
I
I
-2.7
-2.4
I
1
-2.1
20 7
NUMERICAL RESULTS
-1.8
I
-1.5
l
l
-1.2
1
-0.9
I
I
-0.6
-0.3
' 0 0
FIG.9.1 Optimal stopping and control, decision sets, h = 0.3.
2.1
1.8
1.5 u(x)=-l u (x) = 1
1.2
3.9 3.6
3.3 I
-3.0
-2.7
-2.4
-2.1
-1.8
-1.5
-1.2 -0.9
-0.6
-0.3
FIG.9.2 Optimal stopping and control, decision sets, h = 0.15.
0
208
9
OPTIMAL CONTROL A N D NONLINEAR EQUATIONS
The figures give data for the revised version of method 1 (Section 9.1) where 3x2), andf2(x) = u. There was enforced Ti(.) =fl(x) = x 2 ,34.) = -(xl stopping on first exist from G = {x : I x1 I < 3, 1 x2 1 < 3}, unless the process was stopped earlier. The sets where U(X) = 0, f 1, or where we stop, are plotted for the upper left-hand quadrant only. Since the algorithm yields values of u(x) for x on the grid G, only, we used the following naive rule to interpolate the decision sets from Ghto G:if the value of u(x + e, h ) was different from that of u(x), we assumed that the control for the continuous process would change values at x + e, h/2 and similarly for -% replacing h. The scheme is too simple. It is quite possible that improved decision sets would be obtained by first interpolating the Vh(*),x E G h ,into a smooth function on G and then (assuming that the interpolation is optimal for the continuous time procedure or for a fixed time interval discretization of it) use the principle of optimality for the diffusion (or for a fixed time interval discretization of the diffusion) to get the decision sets. Also, a reasonable “smoothing” of the decision sets in the figures would bring them even closer to each other. In fact, such a smoothing would be used for an actual continuous time control.
+
9.8
NUMERICAL RESULTS
209
= 0.3,
but that set is within the
Note that the stopping set is largest for h
= 0.2 optimal stopping set for the h = 0.075 problem. In fact, changing the value u ( . ) on the boundary (from its value on one side of the boundary to its
E
value on the other side) of the decision sets has relatively little effect on the cost, and such changes bring the decision sets quite close for the various values of h. The simulations raised more questions than they answered. As h decreased, the values of Vh( decreased. The revised version of method 3 was also simulated. The decision sets were close to those in the figures, but the values of Vhwere higher (about 20% or so), although the absolute difference decreased as h decreased. It is far from clear how to choose an approximating chain. For example, which of the methods is preferable (although the revised version of method 1 seems to work best) and why? Shouldf(-, .) and o(-)be averaged locally before discretization? How should the grid be chosen? The numerical results in our runs suggest that we are obtaining upper bounds to V ( . ) , but we have not been able to prove that. Here, as in Chapter 8, physical intuition is helpful in selecting useful orderings of the states in the computations. This intuition can probably be used to a much greater extent in connection with the choice of algorithm. The data indicate that the procedure is useful and reliable but much remains to be done before a reasonably good understanding of the numerical properties and best choices is available. It should be pointed out, however, that (at least at present) such a warning should be attached to any of the known numerical techniques, whether or not there are associated convergence theorems. a )
C H A P T E R 10
Approximations to Stochastic Delay Equations and to Dgusions and Partial Dgerential Equations with Reflecting Boundaries
The chapter treats extensions of the concepts of Chapters 6 and 7 to two interesting classes of processes. The first class consists of processes which are solutions to stochastic differential delay equations. The discretizations and approximations of functionals are discussed. The second class of processes may be considered to be diffusions which are reflected from a boundary. These processes are defined to be the solutions to the submartingale problem of Strook and Varadhan [S4], and are related to solutions of elliptic and parabolic equations with Neumann or mixed boundary conditions in the same way that the stochastic differential equation of Chapter 6 was related to solutions of elliptic and parabolic equations with Dirichlet and Cauchy boundary conditions. The discretizations are Markov chains which are reflected from a boundary. The solutions to the discretized equations are approximations to the solutions of the partial differential equations. The ideas are outlined and a convergence theorem 210
10.1 STOCHASTIC DELAY
EQUATIONS
211
stated. Details of the proof are not given but can be found in Kushner [K7]. Finally, several extensions to the approximation problem when there are boundary reflections are discussed. 10.1 Approximations to Stochastic Differential Delay Equations. Introduction
The ideas of Chapter 7 carry over to the case where the functional dependenceofthe f ( X ( r ) ,t)and o(X(r),t) ofA7.1.1 on X ( t ) is replaced by a smooth dependence on the solution over an interval of time, say over [ t - 1, t]. We will treat a simple one-dimensional problem. The higher-dimensional forms are treated similarly, but the state space becomes so large that the method will probably be impractical. For any a,p, CL < p, let C[a, p] denote C’[a, 83. Let ,u denote a Jinite measure on the interval [ - 1, 01. Assume that A1O.l.l f(.,.),a(., .), k ( - , .), b,(., .),and b,(*)areboundedcontinuous real-ualuedfunctions on C[ - 1,0] x [0, TI and (for b T ( . )only) on C [ - 1, 01, resp. Assume I f ( x ( . ) .t ) - f ( Y ( . h t ) l
inf
+
I 4 x ( * ) ,t ) - O(Y(.X t)l
a ( x ( - ) ,t ) > 0,
.x(.).OSf< T
where x ( - ) ,y ( - )range over C [- 1, 01 and x ( t ) is the value ofthefunction x ( . ) at time t E [ - 1, 01.
r,
For any continuous function Y ( . )on [ - 1, TI, let denote the element of C[- 1, 01 with values X(s) = Y(t + s), s E [ - 1, 01, t 2 0. The equation X ( t ) = xo(0)+ [ ‘ f ( X , , s) ds ‘0
+1
.f
‘0
a(X,, s) dw(s)
(1.2)
has a unique, continuous, nonanticipative solution for each w ( . ) process and initial condition xo E C[ - 1, 01. [The condition (1.1) plays the same role that the uniform Lipschitz condition plays in the ordinary case.] See It6 and Nisio [I2]. Let b < c denote real numbers and let G denote an open set in C [ - 1, 01 such that G c {y(s) : y(O)e(b, c)}.Henceforth let xo befixed. If 7 and 7’ denote the first exit times from G and C,resp., suppose that
212
10
DELAY EQUATIONS AND REFLECTING BOUNDARIES
Equation (1.3) is satisfied, for example, by any set of the form G
= {Y( .)
;Y ( - a i ) E A , ,
. . ., Y ( - a s ) E A},
where ai E [ - 1 , 01 and the A iare finite unions of open intervals. Define the functional, for some T > 0,
The solution X ( - ) to (1.2) is not a Markov process but the sequence { X z , t 2 0 } is when we define the probability space appropriately. Since the state space .of X, is infinite-dimensional, we cannot expect to be able to approximate the solution to (1.2) by an interpolated Markov chain whose variables are real-valued. The dimensionality of the approximating chain must grow as the finite difference intervals go to zero. There is an abstract operator which is related to functionals of (1.2) in the same way that the operator (a/& + 9) was related to the functionals of (7.1.1).However, we will proceed in a simpler way by using the technique of approximation which was developed in Chapter 7, with finite difference intervals h and A. Definitions Assume that l / A is an integer and define LA = 1
+ l / A . Let
xo denote the fixed initial condition for (1.2).Define = (xo(0),x o ( - A ) , . . . , xo(- 1 ) ) = "(O), - l ) , ..., "( 1 - LA)), where the equality
(th* thqA(
ths
defines th* " ( i ) , i s 0. The number LAis the dimension of the state space of the approximating Markov chain. We will shortly define the sequence thvA(n)for all integers n > 0. With this in mind, define "( ) to be the piecewise linear '(n), n 2 1 - L A } , with constant interpolation intervals A. interpolation of {th* Define the vector ti." = (th*"(n),. .., th*"(n- LA + 1)). We use a piecewise linear (rather than piecewise constant) interpolation since f ( t ) , etc., are only defined on C [ - 1,0] and it seems to be the simplest procedure in the present case. The interpolations are elements of C [- 1 , TI. z h s
-
.,
The Process
{th* "(n)}
For n > 0, define the random variable th*"(n)via the transition probability (7.2.6a, b) as follows:
+
~ { t ~ , " ( 1n) = t""(n)J&"}
+
=
- (A/h2)[hIf(t?AA,
p { t h * " ( n 1 ) = (""(n) f hlt:hA} =
(A/h2)[02(t:hA, nA)/2
nA)l +
2 -h A (;A
9
nA)]
(1.5)
+ hf* @LA,.A)].
We assume, as in Chapter 7, that the first term of (1.5) is nonnegative.
10.1
213
STOCHASTIC DELAY EQUATIONS
If N A = T and N(h, A) denotes the first value of n for which ztiA@ G and r(h, A) = N(h, A)A, then we define the approximation
The Markov chain A approximates X , and the law of the chain can be used to compute (1.6)recursively. FurtherFor each value of <:*A, there are at most three values of - 0, then true for n = -LA + 2, . . ., 0 also, then there are at most [(b - c)/h - 1]31'A
states in the state space of {<:*A} which need to be used. Tightness and Convergence
We can write
+ bT(Z?
' 9
T)I{r(h, A)>
T)
1
+ z(h*
where B(h, A) is an error due to the approximation of the sum by the integral and goes to zero as h, A + 0. Write, in obvious notation, ZhqA(iA) = xo(0)
+ c f(?!iA, jA)A + cj?h*A(j), j
=~
~ (+0Fh* ) '(t)
(1.7)
j
+ Bh*" ( t )+ E ~ ' *( t ) ,
(1.8)
where BhvA(t)is a linear interpolation of the function oft which is defined by c j A < f /3h*A(j) and the error E ~ * ~ ( is- )due to the use of the interpolation and to the approximation of the sum of thefterms by the integral and goes to zero as h, A + 0. The sequence B h * A ( - )c,h * ' ( . ) } is tight on C 3 [ - 1 , TI. Let h, A index a convergent subsequence with limit (<( ), B ( . ), 0). Note that, if "( .) converges to <(.) w.p. 1 on [ - 1 , TI, then A + tf,w.p. 1, t E [0, T ] as
{zh.A(-),
-
z?
zh*
214
10
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
-
h, A + 0. There is a Wiener process W( such that 9( .) and B( ) are nonanticipative with respect to W ( . )and (1.2) holds for 4 ( * ) = X ( - )and W ( - )= w( .). By (1.3) and the uniqueness of the solution to (1.2), a )
Rh-' ( x 0 )+ R ( x o ) ,
as h, A + 0.
The technique of discretization of Chapter 6 is more difficult (than that of Chapter 7) for the delay problem since the interpolation intervals At! would be random and the minimal number of terms of { c h ( n ) }which are needed to represent the process on each interval [ t - 1, t ] would be random and would vary with t .
Approximations to Elliptic and Parabolic Equations with Neumann Boundary Conditions. Formulation
10.2
The Martingale Problem An alternative formulation of the problem of existence and uniqueness of the solution of the stochastic differential equation X ( t )= x
+
. f
.jO
+I
. f
f ( X ( s ) )ds
'0
o ( X ( S ) )dw(s)
(2.1)
is the following, where we suppose thatf( * ) and a(. ) do not depend on time, purely for notational simplicity. Let q ( - , .) be a bounded function in CZ*'[R' x [0, a)]and define the functional Fq( *, * ) on C[O,co) x [0, co)by Fq(x(.),t ) = q(x(t), t ) - q(x(O),0 ) - \'[d/as '0
+ ,rP]q(x(s), s) ds.
(2.2)
Suppose that, for each x E R', there is a measure P , on C[O,co) such that P,{x(O) = x } = 1 and, for each q ( * ,*), the process with values? M,(o, t ) = F , ( x ( - ) , t ) is a martingale (with respect to the sequence 9If,0 2 t < 03, of sub-a-algebras of C'[O, co) which are induced by the projections x(s),s I t ) . Then { P , , x E R') is said to be a solution to the martingale problem. If u( - ) is bounded, continuous, and strictly positive definite and f(.) bounded and measurable, then there is a unique solution to the martingale problem (Strook and Varadhan [S3]). There is a unique solution to the martingale problem if and only if there is a unique solution (in the sense of probability law) to (2.1) for each x E R' [Kl]. Thus, the martingale problem has a unique solution iff( ) and o ( . ) satisfy a uniform Lipschitz condition. Note that, if X ( .) is a solution to (2.1) with X ( 0 ) = x, then by It6's Lemma
-
J- q:(X(s), s)a(X(s))W s ) f
Fq(X( *
)?
t)=
0
t Here o = x(.) is the generic variable of the probability space C [ O , to).
(2.3)
10.2
215
APPROXIMATIONS WITH NEUMANN OR REFLECTING CONDITIONS
is a martingale under P,, the measure induced by the solution (2.1), and the appropriate sequence of a-algebras. Thus, the solution to the martingale problem is intimately related to solutions of elliptic and parabolic equations with Cauchy or Dirichlet types of boundary conditions and to diffusion processes for which Y is the differential generator. There is a very nice extension of the idea of “martingale problem,” due to Strook and Varadhan [S4], which relates a class of reflecting diffusions to solutions of the partial differential equations with Neumann boundary conditions.
The Submartingale Problem Let G denote a bounded open set, y( - ) a bounded continuous real-valued, and &( ) a twice continuously differentiable real-valued function such that G = {x : # ( x ) > 0}, aG = {x : & ( x ) = 0}, I &,(x) I 2 1 on dG, and inf,.,, #,(x)y(x) > 0. Let p = 0 or 1. Suppose that, for each x E G, there is a measure P , on PGIO,a )= Cr[O, a)n {x(.) : x ( t ) E G all t < co},such that P x { x ( 0 )= x } = 1 and, for each q( ., .) E C 2 -‘[G x [0, a)],for which
-
pqr(x, t )
+ y’(x)qx(x,t ) 2 O
on aG,
t 2 0,
(2.4)
the process defined by r
Fq(X(.),t ) = q(x(t), t ) - q(x(O),0)-
J’ [alas + -4”lq(x(s),s)I,(x(s)) 0
ds
(2.51 is a submartingale, with respect to {Br}, where 4, is the o-algebra induced by the projections x(s), s It . Then { P , , x E G} is said to be a solution ofthe submartingale problem (Strook and Varadhan [S4]). Let a( ) be strictly positive definite and u( .) andf(. ) bounded and continuous. Then [S4] there is a unique solution to the submartingale problem. Furthermore [S4], there is a unique nondecreasing scalar-valued process p( *), which increases only when x ( t ) E aG, such that
-
Fq(X(.
13
J-
t ) - r(pqs(x(s),s) + Y’(x(s))qx(X(s), s)) 0
44s)
(2.6)
is a martingale (with respect to P , and {dr}). The measure P , induces a process which satisfies
216
10
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
for some Wiener process w ( * ) . The last integral in (2.7) accounts for the reflection into G from dG. It is the term which “pushes” X(r)away from dG in the direction y ( X ( t ) )when X(t) E dG. This will be made clear when we develop the Markov chain approximation. Let A( .) 2 0, /?(-) 2 0, g( *), b T ( . ) ,k( *), and b( ) be real-valued, bounded continuous functions on some neighborhood N , of G, with 1(x) 2 1, > 0, where 1, is a real number, and let b,( *, * ) b ea bounded continuous function on N , x [0, co). Consider the two sets of equations (2.8) and (2.9).
-
~ ? V ( X+) k ( x ) - ~ ( x ) V ( X=)0,
+
y’(x)Vx(x)- P(X)V(X) g ( x ) = 0,
x
E
G,
(2.8a)
x
E
dG,
(2.8b)
The functions (2.10) and (2.11) have the same relationship to Eqs. (2.8) and (2.9) as Eqs. (4.2.2) and (4.2.4b) had to (4.5.2) and discounted (4.5.4), resp. Boundary costs b ( . )and b , ( . , .)and hitting time z do not appear since the boundary is reflecting. The functions defined by (2.10) and (2.11)are weak solutions to (2.8) and (2.9), where the underlying process X ( - )is the unique solution to the appropriate submartingale problem.
10.2
21 7
APPROXIMATIONS WITH NEUMANN OR REFLECTING CONDITIONS
Weakened Conditions ; a ( . ) Nor Necessarily Positive Dejnite
The positive definiteness of a( .) was used in Strook [S4] in several ways. Indeed, it was basic to the proof of uniqueness. More important for our purposes, it implied the estimate (2.12). Let N , denote the &-neighborhoodof3G. Then [S4, Lemma 3.31 for each real T < 03, there is a real K , such that T
(2.12)
The estimate (2.12) says that X ( . ) spends “little” time near dG (unless it is on dG). It is particularly useful in the approximation problem. For example, if t h ( .is) the “approximating” process, then to show convergence of the approximations to the functionals (2.10) or (2.11) (or even to the paths (2.7)), we need to show that (for an appropriate definition of G,, the finite difference grid on G)
r,,(gh(s))
-+
lG(X(s)),
w.p. 1, almost all s
(2.13)
(using Skorokhod imbedding) as h -+0. If the paths X ( - ) lie in a compact set contained in G, then (2.13) is obvious. The estimate (2.12) (and a similar estimate for th(.)) allows us to act almost as if the paths X ( . ) were contained in such a compact set, except for the actual time spent on the boundary. So even if we weaken the conditions, we will still need some condition which guarantees (2.12). In Chapters 6 and 7 the assumptions were of two types. One type concerned the continuity of various functions and the second type concerned uniqueness (in the sense of the probability law) of the solution to (2.1) for each initial condition. We will do the same thing here. The properties of the functions listed above (2.8) will always be assumed to hold. Let us split dG into two disjoint parts, dGR and dG,, where dG, is relatively open, the process x( ) being reflected on the first and absorbed on the second. Replace the boundary condition (2.8b) by
-
y’(x)V;(x) - B ( x ) V ( x )
+ g ( x ) = 0, V ( X )= b ( x ) ,
x E dGR xE
SG
(2.8b‘)
.(,
and (2.9b) by ~ I / ; ( x ,t )
+ y’(x)V,(x) - B ( x ) V ( x ) + g ( x ) = 0, V ( X ,t ) = b , ( x , t ) , V ( x , T )= b*(x),
x E d G R x [O, T),
x E SG, x [0, T),
x
E
(2.9b‘)
G.
We make the following assumptions on the process X ( . ) . There is a r , where r = first (unique) solution to the submartingale problem for t I
218
10
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
hitting time of dG,. This assumption holds if a ( - ) is positive definite on a neighborhood of dGR, and the stochastic differential equation (2.1) has a unique solution (in the sense of probability law) on [0, co) for any initial condition which is independent of w( Assume that there is a real K > 0 such that a).
1
~ ~ ( X ) ~ ( X ) 2 ~ X ( ~ $ x) ( ~ )
1’
On
dGR
.
(2.14)
The estimate (2.14) is used to prove (2.12) (with T replaced by T n T) and the related estimate (2.15) [K7], where ph is the first time that <‘(-)hits the discretization of dG, and <’(.) is the interpolation of the approximating chain. Ex
loT
nph
IN,(<’(s))ICh(<’(s)) ds IK T E
for small h.
(2.15)
Assumption (2.14) states, essentially, that if the diffusion is degenerate at some x E dG,, then it cannot be (locally) concentrated in the tangent plane to dG, at x. Indeed, the assumption (2.14) together with y’(xWx(x) > 0 insure that the reflection is well defined; i.e., that the process does not “chatter” near the boundary but eventually drifts away, even though it may hit the boundary infinitely often in some small interval of time. In order to avoid an ambiguity in the boundary condition on aG, n aG,, we assume
P,.x(T) E
ac, n aG,}= 0.
(2.16)
Also, suppose that z is continuous w.p. 1 relative to the measure on
C‘[O, co) or on D‘[O, co) induced by X(.),
(2.17)
and, in the case of the parabolic equation, that P , t{z = T } = 0.
(2.18)
We now define (2.19) to be the weak-sense solution to (2.8a) and (2.8b‘), and (2.20) to be the weak-sense solution to (2.9a) and (2.9b’). Actually, there do not seem to be specific references which we can cite to justify the definition. Yet the development in [S4] strongly suggests that (2.19) and (2.20) are the weak solutions. In any case, the approximations converge to (2.19) and (2.20). r
(2.19)
10.3
219
THE FINITE DIFFERENCE APPROXIMATIONS
(2.21)
10.3 The Finite Difference Approximations
We will concentrate on the elliptic case (2.8a) and (2.8b) and its solution (2.19). The problem for the parabolic case involves only minor changes (see Kushner [K7]). By applying a finite difference method similar to the one used in Chapter 6, the process (2.21) will be approximated by a “reflected and absorbed” chain. The procedure is much the same as that which we used previously; the only new difficulties concern the behavior near aGR . Define the set aGi in (7 n R; as follows: x E aGk if x E G n R; and either x E aG, or the line connecting x to one of the neighboring grid points x f e i h or x + e i h + e j h or x + e i h T e j h touches a G , . The set aG; approximates dG, from the “inside.” Define G, = G n R; - d G i , the discretized interior of G, and aG: = R; - G, - a G i , the discretized stopping set. To discretize (2.8a), use (6.2.1)-(6.2.3) and assume (6.2.4). Then (6.2.5) gives the transition probability of the approximating chain ( 5 3 in Gh. Suppose that the approximating chain stops on first contact with aG:. To complete the construction of the approximating chain, we need to discretize the first line of (2.8b’). To d o this we use VXi(X)
+
Define I y ( x ) 1 = by Note that
1
+
[V(x eih)- V(x)]/h [ V ( x )- V ( X - eih)]/h
if yi(x)2 0, if yi(x) 0,x E
-=
.
(3.1)
I:= I yi(x)1 and define the transition probabilities on ph(x, x
Ex(51:+1 -
*
ei h) = 7’ ( x ) /I ~ ( x1. )
r1: I 51: = Y
E
I
aGk) = Y(Y)h/ Y(Y)
(34 19
(3.3)
220
10
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
which is consistent with the fact that the reflection from a pointy E aGR is in the direction y(y). In order for (3.2) to make sense, we must require that x
+ eih sign yi(x) E G
for small h and x
E
aG;(sign 0 = 0). (3.4)
Condition (3.4) is not essential. It can be dropped if we modify (3.1) and such modifications (still yielding a Markov chain approximation) will always exist. We stay with (3.1) and (3.4) purely for notational convenience. Thus we have constructed a Markov chain which is reflected from dG;, and stopped on dG:. Now, let us return to the discretization of (2.8a). The discretization (for x E Gh) is done exactly as it was done for the discounted case in Section 6.5. For notational definiteness, assume that supx Ath(x) + 0 as h -+ 0 and use (6.5.3). Thus, letting V h ( * denote ) the solution to the finite difference equat ion, we have
+
Vh(x) = exp -J(X) Ath(X)[ExV h ( { : ) k(x) Ath(x)],
X E
Gh . (3.5)
Constructions similar to those used in Section 6.5 can be used at x A direct approximation of the first line of (2.8b') yields
E
aG;.
(3.7c) If d p h ( x )+ 0 as h + 0 uniformly for x E dGR,then (3.7a-c) can all be used. (This is the case here since inf 1 y(x) I > 0 on dG, .) If not, then we will have to use (3.7c), as we had to use (6.5.4) when Ath(x)+ 0 uniformly in x as 17 + 0. I t is possible to handle both absorbing and reflecting boundaries simultaneously if we set b(x) = g(x)/B(x), let y(x) = 0 on dG,, and use (3.7~). But, for > 0. simplicity, we use (3.7b), and continue to use the fact infx,FCRIy(x)I
10.4
INTERPOLATIONS AND CONVERGENCE
221
Finally, we must have the boundary condition V h ( x )= b(x),
x
E
dG:
.
(3.8)
Equations (3.5), (3.7b), and (3.8) will constitute the discretization of (2.8a), (2.8b’). The form (3.7b) is selected for simplicity of notation. Define
4= n (exp -A(
i=O
n (exp -p(
Cl: =
i=O
dpf),
0: = A;C,h.
Then the unique solution to (3.5), (3.7b), (3.8) and the discrete approximation to (2.19) is Nh- 1
V h ( x )= E x
1 D!k(t!) At!I,,(t!) + E,D!,-
i=0
Nh- 1
b(t!,)
+ E x 1 D!g(t!) i=O
where N, = min{n : t!: E dG:}.
dp!,
(3-9)
10.4 Continuous Time Interpolations and Convergence In order to be able to use weak convergence theory to relate (3.9) to (2.19), we must interpolate {<:} into a continuous parameter process ch(*). We will again take the interpolations to be piecewise constant. If
Wl:+1 - t,hlrl: = Y E aci) = Y(Y) dPh(Y).
For scaling 0, set A t h ( x )= h2/ I Y ( X ) 1 = h dph(x).
For scaling 1, set A t h ( x )= h/ I Y ( X ) I = tlph(.u).
(4.1)
222
10
Define At:
= Ath(t:)
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
for
t : dG; ~ also. Define ph = tkh and n-
ti =
1
1 At:, 0
and let th(* ) denote the usual piecewise constant interpolation of {ti)with interpolation intervals {At:}. Define n, = max{n : ti I t} and Dh(t)= D:,, Dh(O-) = I. Then Vh(x)= E x
Ph
0
Dh(s)k(rh(s))rGh(th((s))
ds
+ Ex
Dh(p,)b(th(ph))
(44
where d,uh(s)= ds
IaGRh(th(S))/h,
scaling 0,
dph(s)= ds IaCRh(th(s)), scaling 1, ph(0) = 0.
(4-3)
Define = '(tihe
Gh. i C N h )
9
I!I , R = I ( C i h ~~ G R *i <, N h ) .
Then
where /?: is defined, as usual, to be [<:+ - t: -f(t:) At:]I;. Let Fh( -), Bh(.), rh(. ), and H h ( .) denote the piecewise constant interpolations (with interpolation intervals {At:}) of the series defined by the sums on the right-hand side of (4.4), each taking the value 0 at t = 0. The following result is proved in Kushner [K7].
Hh(*), Theorem 10.4.1 Fix x . The sequence {<'(-), F h ( - ) ,B h ( - ) ,rh(*), p h ( * ) }is tight in D5r+1[0,0 0 ) under each scaling. Let h denote a convergent H ( . ) , /A(.)). The process t(*) subsequence with limit (<(-), F ( . ) , B ( . ) , r(*), satisfies, for some Wiener process? W(. ),
t As in Chapter 6, if a ( * ) is not strictly positive definite, then to get W ( * )we , may need to augment the probability space by adding an independent Wiener process.
10.5
223
EXTENSIONS OF THE REFLECTION PROBLEM
t(t)= x + F ( t ) + q t ) + q t ) =x
lo
t nr
+
+
J”;
f(g(s))rG(t(s))ds
4t(S))IG(t(S))
d
W
t nr
+ jo
Y ( t ( 4 ) I a G R ( t ( 4 dP(S),
H ( t ) = 0,
(4.5)
where 7 = inf{t : ( ( t )E aG,}. Thefunctions p ( - ) and t(-)are nonanticipatioe with respect to W(.). Furthermore, if q( .) in C2* ‘[R‘ x [0, a)]satis-es (2.4) on d G R , where p = 0 under scaling 0 and p = 1 under scaling 1, then
.,
q(t(t), t ) - 4x9 0)-
J”
t nr
0
[alas + =%(t(S),
S)IG(t(S))
ds
(4.6)
is a submartingale. Also
Vh(x)+ R ( x )
(given by (2.19)) as h + 0.
(4.7) We will not give the details of the proof. Several long sequences of estimates are involved, and the details appear in Kushner [K7]. It is interesting that the discrete approximations I(:} are the same irrespective of the value of p but, owing to the different time scalings on d G i , the limiting processes are different-however, they are only different insofar as their time scale is concerned.
10.5 Extensions of the Reflection Problem The problem of calculating invariant measures can be handled as in Section 6.8. There are also extensions to the various optimal control problems-optimal stopping, impulsive control, and continuously acting control. In the last case, we can have an internally acting control (in G), a boundary control (affecting the cost for time spent on the boundary, and the direction of reflection) or both. The appropriate dynamic programming equations (for minimizing the controlled analogue of (2.19)) are min[YV(x) a e 4
+ k(x, a ) - A(x)V(x)] = 0,
min[K(x)r(x, a ) + g(x, a ) - B ( x ) v ( x ) ] = 0,
aeQi
where % and
V(X) = b(x), are appropriate action sets.
x E G,
x E dGR, x E dG, ,
(5.1)
224
10
DELAY EQUATIONS A N D REFLECTING BOUNDARIES
It is far from clear whether the submartingale model, as discussed in Section 10.2, is useful or not for problems in control theory. Of course, there are undoubtedly mathematical problems of interest in controlled reflections, but it is hard to find realistic applications to control theory at the present time. For this reason, we will not develop the discretization of (5.1) here. We note, however, that the pair p( -), w( - ) has the same role that w( .) would have in the discussion of classes of control strategies and comparison of “discretizable” controls in Chapters 8 and 9. There are physical problems where there are “hard reflections. An example is when the velocity of a particle changes sign when the particle hits a wall. Such problems are simpler to treat than the submartingale model since the change of state is instantaneous and the particle does not linger near the boundary from which it is reflected, owing to the nature of the reflection. We will now give a simple illustration of a “chattering” type of problem, on which more work is required. Define ”
XI = x , = 1,
X ( 0 ) = 0,
G = { X : X , < l},
dGR = {X : X , = I},
and refer to Figure 10.1. Let h = l/integer, and dGk = ((0, l), (h, l), ...}. Define the boundary scaling by A r h ( x ) = hp for x E dGk, p 2 0. For x E G , , ph(x
+ e l h ) = ph(x + e , h ) = f,
Ath(.) = h/2.
Define the direction of reflection on dG, by y , ( x ) = 0, y 2 ( x ) = - 1 (the reflection is “straight down ”). After hitting line 1 for the first time, the path {tf}spends an average of3 of the steps on line 0 and the rest on line 1. The interpolation intervals on lines 0 and 1 are p h and h/2, resp. If x is on line 1, then the average velocity of th(. ) to the right can be calculated from the facts that ph(x, x + el h ) = and A t h ( x ) = h/2 and is 1. Thus, the average velocity of th(.) to the right is 1/( 1 + p), after first reaching line 1. The sequence {th(-)} is tight on D‘[O, 00) and the limit c( - ) satisfies < , ( t ) = < , ( t ) = 1, =
1/(1
+ p),
t < 1, < , ( t ) = 0,
t,(t) = 1,
t 2 1.
Eventually, the chain {ti}bounces back and forth between line 0 and line 1 and it is not true that L&h(S))
+
Mc(4
as h
+
0
w.p. 1. Approximations to the cost functionals d o not necessarily converge.
10.5
EXTENSIONS OF THE REFLECTION PROBLEM
225
line 0 line I
FIG. 10.1 An example where the discretization chatters near 3Gk.
Of course, this particular problem can easily be handled by slightly modifying the method of calculating the cost functionals near the boundary. In particular, if there is chattering and if it is clear that part of the limit process will be on the boundary, then calculate the approximate cost as if the approximation were on the boundary. But it is not so easy to see how to treat the general case. This type of model sometimes appears in applications when we artificially truncate a state space by introducing a reflecting boundary.
CHAPTER 11
The Separation Theorem
.f Optimal Stochastic Control Theory
The stochastic control problem, where the system dynamics are linear in the state variable [i.e.,f(x, u ) = Ax + d ( u ) for a matrix A and function d( -), and a(x) = constant] is one of the most useful and important models in control theory. Frequently in applications, one observes not the state variable itself, but only some linear function of the state plus additive “white ” noise. Thus the control function can depend only on the state via the noise corrupted observations. The separation principle is a fundamental idea in the control of such systems. Loosely speaking, it states that the optimal control can be obtained in two steps. The first step is the derivation of a dynamical (Itd-like) equation for the conditional mean and the second step is the computation of the optimal control-assuming that the dynamical equation which is satisfied by the conditional mean is the true dynamical equation of interest. The exact idea will be made clear below. In the discrete time problem, the proof is rather obvious and almost classical [J 1, G3, K5]. The continuous time parameter problem has been treated by a number of authors, a few afnong them being Kushner [K15], 226
11.1
ASSUMPTIONS A N D THE SYSTEM MODEL
227
Wonham [W4], Balakrishnan [B5], and Fleming and Rishel [Fl]. This case is much more subtle and many questions center around the choice of an admissible control and even about an admissible probability space. Wonham's results [W4] depend heavily on the strict positive definiteness of crcr' or on the explicit solvability of the Bellman equation by a smooth (C'. ') function. The first condition is rarely met in practice and the Bellman equation is usually only solvable by numerical methods. Balakrishnan [B5] treats a narrow class of cost functionals and it is not entirely clear whether there always exist solutions to the equations corresponding to the arbitrary observation dependent controls. Here, we take an approach that is somewhat different from those used previously. The point of view toward the admissible control is different and is more natural in certain respects. In a sense, a large part of the solution lies in that definition. We can treat problems where control stops at a random (stopping or hitting) time or where there are state space constraints (dependent on the observations or conditional means or covariance). The problem is defined and the class of admissible controls is described in Section 11.1 and an existence theorem for an optimal observation dependent control appears in Section 11.2. The proof of the theorem uses many of the ideas of weak convergence that we have developed in the preceding chapters. Even the idea for the class of admissible controls arose from the work of Chapters 8 and 9. The point of view taken here (deliberately) avoids a number of problems. Although our class of admissible controls seems to be rather natural, it is still important to understand the filtering and control problem, when the admissible controls are defined as suitable functions of the observations only, and directly. Unfortunately, except for the rare case where E ( - ) is invertable for all s, the problems of uniqueness and meaning have not yet been resolved. This is one of the motivations for our point of view.
11.1 Assumptions and the System Model
First, we will collect some assumptions, then the basic control model will be defined, and the optimization problem stated. Let w(.) and z( denote independent Wiener processes with values in R' and R", resp., and let A ( . ) , E ( . ) , and C ( . ) denote bounded measurable matrix-valued functions, defined on [0, a), with C(-) and E ( * )being continuous. In the theorem, the control values will be constrained to lie in a compact convex set 42 c R". However, as mentioned after the proof, the general case can also be treated. Theorem 11.2.1 will deal with a control a)
11
228
T H E SEPARATION THEOREM
problem defined on a finite interval. It should be clear that the formulation is readily extended to problems on unbounded or random time intervals. We and suppose until mentioned otherwise, that the control is defined on [0, a), we will consider it either as a sequence of random variables (which form a separable and measurable process), or as an element of one of the L, spaces L;. or L;, depending on the situation, as we have done in past chapters. In Theorem 11.2.1, we will use the assumptions
All.l.1 k ( - , - ) and b ( . ) are bounded continuous real-ualued functions on R' x OU and R', resp. A11.1.2 d ( . ) is an K-valued continuous bounded function on %. The set {d(a),k(x, a), a E a}= (d(%), k(x, %)) is conuexfor each x E R'. The boundedness assumptions on k( *, -), b ( . ) , d( *), and on 42 can be weakened; they have nothing to d o with the separation theorem per se but only with the question of existence of an optimal control. The class of admissible controls will be defined below. If u( .) is admissible, then the uncontrolled and controlled state equations are (1.1) and (1.2), resp., where X ( 0 ) is a normally distributed random variable with mean m(0) and covariance P(O), and which is independent of w ( . ) and z(.). X 0 ( t ) = X ( 0 ) + j r A ( s ) X o ( s )ds 0
+ j t E ( s ) dw(s),
(1.1.)
0
+ 1' ( A ( s ) X ( s +) d(u(s)))ds + 50 E ( s ) dw(s). t
X(t)= X(0)
0
(1.2)
Let P( .) denote the solution to the Ricatti equation (1.3), with initial condition P(O), define Yt)= P ( t ) C ( t ) , and let @(-, - ) denote the fundamental matrix of x = Ax.
P
=
AP
+ PA' - PC'CP + EE'.
(1.3)
The uncontrolled and controlled observation processes are defined by (1.4) and (1.9, resp. t
Y o ( t )=
0
C ( s ) X 0 ( s )ds
Y ( t )= I ' C ( s ) X ( s )ds 0
=
+
+ z(t)
+ z(t)
P(t) j'C(s) ds 0
(1.4)
U)
d(u(u))du.
(1.5)
11.1
ASSUMPTIONS A N D THE SYSTEM MODEL
229
Define the innovations process v( - ) and processes m( .) and mo( .) by
+ dz = C ( t ) ( X o ( t) mo(t))dt + dz, dmO(t)= A ( t ) m o ( t )dt + L(t) dv(t), mO(0) = m(0) = EX(O), dm(t) = (A(t)m(t)+ d(u(t)))dt + L(t) dv(t). dv(t) = C ( t ) ( X ( t )- m(t)) dt
(1.6)
(1.7)
It is well known (see, e.g., Fleming and Rishel [Fl]) that rno(t) = E I X o ( t ) (P(s), s I t],
w.p. 1.
Furthermore, if u ( - ) is a separable and measurable control for which the corresponding process X ( - ) exists and E 1 X ( t ) I < co for each t, and where each u ( t ) is measurable over 9,, the smallest a-algebra which measures v(s), sI t, then it is also true that m(t) = E [ X ( t )I Y(s),u(s), s I t] = mo(t)
+ j'@(t, u ) d(u(v))dv,
w.p. 1
0
I v(s), u(s), s I t].
= E[X(t)
(1.8)
The choice of the admissible class of controls is one of the crucial points in formulating the separation theorem problem. An admissible control cannot depend directly on X ( . ) , but must depend on it only via the noise corrupted observations Y( -). Wonham [W4] supposed that u ( t ) was a function of the solution to (1.7) over [0, t] and that it satisfied a type of Lipschitz condition. The Lipschitz condition guaranteed that u ( t ) was 9, measurable. Balakrishnan [B5] and Fleming and Rishel [Fl] take the admissible controls to be the functions with values in the desired set and where u ( t ) is 9, measurable for each t 2 0. In general, of course, we cannot let u ( t ) be an arbitrary function of Y(s),s I t, for it may then make no sense. There may not be a solution to (1.2) for such a general control and so the Y (.) process may not exist. So we must consider controls for which, at least, (1.2) and (1.7) have well-defined solutions. Also, we would like (1.8) to continue to hold. The iollowing definition of admissibility uses a somewhat broader class of control functions, but it is a very natural class, and is well suited for use in a general proof of an existence or separation theorem. The control u ( . ) is said to be admissible if it is separable, measurable, %-valued for almost all w, t, nonanticipatiue with respect to v ( . ) , and if for each t, u ( t ) is conditionally independent of w( and z( ), giuen v(s),s I t . The conditional independence and the fact that v( is nonanticipative with respect to w( ) and z( - ) implies
-
a )
a )
-
230
11 THE SEPARATION THEOREM
that u( .) is nonanticipative with respect to w(*),z( -). Again, where useful, we suppose that the paths of u( * ) are points in an appropriate L2 space. Our class of admissible controls includes the classes of controls considered by the other cited authors. It includes (nonanticipative) controls which are v ( - ) functions, and also feedback functions (of m(.))which yield unique nonanticipative solutions to (1.7). The conditional independence guarantees that, if v(s), s It, is available, then we can learn nothing more about u ( t ) by knowing z ( s ) or w(s), s < co.Thus, the condition truly implies that the control does not use any more information about X ( .), w(-), or z ( - ) than is contained in the observations and in past values of the control. Of course, it does allow randomizations. For u( ) admissible, the corresponding solutions to (1.2) and (1.7) are unique, both pathwise and in the sense of probability law. Also, v ( - ) is a Wiener process and (1.8) holds. Theorem 11.2.1 will treat the cost functional
-
REMARKSON OUR POINT OF VIEW The probability space on which the processes are defined is not important. Only the probability law is important. We do not need to distinguish between strong sense and weak sense solutions to the differential equations (1.2) or (1.7). For convenience in what follows, suppose that (whatever the probability space) the control is an element of an L;. space (for the simple reason that the space, being separable and complete, will allow us to use a regular conditional probability). Suppose that there is a probability space (a, 3,P) supporting processes m(.), F(-),i(-),ii(.), 'v(.), ii(*), -fo(.),ko(-), P(.)):where (go(*), *(* )), * ),G( * ), *(* I), (fi"( ), c( )), (fi( 1, ii( -), c( )), ( y o ( ), . X o ( .), z'( )), and ( y( * ), -f(* ), i( )) solve (1. l), (1.2), (1.6), (1.7), (1.4), and (1.9, resp., and ii(t) is "&-valued. Let all the processes (other than ii~(), i( ), 'v( )) be nonanticipative with respect to i i ( - ) ,i(*),'v(-), let G(.)and Z(.) be independent Wiener processes, and let any measurable function of h(s), s t , be conditionally - ) , G(s), s I t . Let P;(du') denote the regular independent of Z(.), i ~ ( given conditional distribution function of ii( -), given 'v( -). For each measurable set A E L;. we can suppose that there is a measurable function Q(A, .) on c'[O, GO) such that?
,
(z(-), (z(
-
- -
-
- - -
-
-=
,,
Q(4 ~ ( ' 1 )= P , ( A ) for all (7, such that 'v(&,
.) = y ( . ) E C'[O, co).
t Strictly speaking, we should delete the y ( - ) in a C [ O , co) measurable set N, where F{t(G, * ) E N } = 0. Fory(.) E N,set Q ( A , y ( * ) )= Q ( A ) ,whereQ(*)isanarbitrarydistribution.
1 1.2 THE SEPARATION A N D OPTlMALlTY THEOREMS
23 1
Now, let (Q, d,P) denote the probability space on which are defined the actual uncontrolled processes of concern in the application, namely, z ( . ), w ( * ) , mo(.), X o , Y“(.), and hence the innovation v(*). The measure P is that determined by z( * ), w( * ) but we write it as P(dv dz dw). We allow ourselves to control the process in any way at all, provided only that we do not use “unavailable” information, and that the control take values in JI/ for almost all w, t. There is no reason why we cannot “randomize” or let u ( . ) depend also on quantities that have nothing at all to d o with the processes as defined on (Q, d,P). In this sense, the probability space itself could also depend on the control u ( * ) . We will define u ( . ) via an augmentation procedure. Define Q’ = Q x L;, ,and let d’denote the associated a-algebra of subsets ofQ’. A measure P‘ will be defined on (Q’, &’) via a conditional independence assumption: Let w’ denote the generic variable of R’,but continue to write v( .), z( -), w( - ) as w functions. Define P ( d u dv dw d z ) = P(dv dz dw)Q(du,
V(O,
*))
(1.10)
(with a slight abuse of notation regarding the conditional probability). The marginal measures of z ( . ), w( ), v( ), X o ( . ), Y“( ) are unaffected. But we can now generate a control according to the new law and define the processes X(.), m(.), and Y( The process u ( . ) is nonanticipative with respect to v( ), w ( *), and z( .) and is an admissible control. The cost functionals (say, of the form (1.9)) are the same for (f( -), fi(-)) and (X(-), u ( . ) ) since their laws are the same. Thus it makes no difference on which space we do our optimization. The main problem with our formulation is that it is not always clear how to generate the values of u ( t ) recursively from the past values and the innovations for an arbitrary admissible u( ). This is not a serious difficulty. Note the remarks that we make below in connection with Theorem 11.2.1.
- -
-
a).
-
-
11.2 The Separation and Optimality Theorems
Define the functions
k ( - , *, .) and 6(*)by
where N,( - ) is the distribution of Xo((t)- rno(t)(normal with mean zero and covariance P(t)). Both functions are continuous in their arguments and (except for the t-dependence of k, which is continuous in all its arguments)
232
11
THE SEPARATION THEOREM
satisfy All.l.1 and A11.1.2. If u ( . ) is admissible with corresponding solutions X ( -), m( - ) then E"b(X(T ) )= E " [ F b ( X (T)) I v(s), u(s), s I T ] =
E"
5 b ( m ( T )+ C ) dNT(C)
=E % ( ~ ( T ) ) ,
and a similar relation holds for k ( - , -). Thus T
R(u)= F
k(m(s),u(s), s ) ds
+ 6(m(T))].
(2.1)
Theorem 11.2.1 Assume All.l.1 and A11.1.2, then there is an admissible control which minimizes (2.1).
REMARK To get the optimal control, we only need look for the control which minimizes (2.1) with system (1.7) over all %!-valued controls nonanticipative with respect tov(-),where v(.)is assumed to be an arbitrary Wiener process with the correct covariance function. The regular conditional probability of the optimal control, given v(-), yields the optimal law for the original problem, when the actual innovations are substituted for the arbitrary Wiener process v(. ). If the optimal control is representable as a v(.) function, then there is a rule which maps the observed values of the innovations directly into values of the control. This is true whether 9, k ( . , .), b ( - ) ,or d ( . ) are bounded or not. Theorem 11.2.1 is an existence theorem; it does not provide a rule for getting the values of u ( t ) from the values of u(s), s < t, and v(s), s I t, in a dynamic manner. It is not always clear how we can generate a measurable process u ( - ) (with the correct joint law with v(-)) recursively from the v(*) data as v(.) evolves. It says simply that there i s an optimal admissible control. It is not satisfactory that the existence theorem does not provide a rule. But, under an additional convexity condition, u ( - ) can be assumed to be a v(. ) function. See the remarks after the proof. In any case, it follows from the implicit function theorem used in the proof that the minimizing u ( t ) is a measurable function of v(s), s I t, and the corresponding solution m(t). But, when the control is used in this feedback form, we cannot guarantee that the solution to (1.7) is unique. By the proof of Theorem 9.3.2, for any admissible control u( * ) and E > 0, there is an admissible v(.) dependent control ii( *), which depends on v( - ) at only finitely many time points, which is piecewise constant, and R(ii) I R(u) + E .
11.2
THE SEPARATION A N D OPTIMALITY THEOREMS
233
Thus, the optimum in the class of admissible v(.) dependent controls (if it exists) is as good as the optimum admissible control. The technique of proof of Theorem 9.5.2 yields a constructive procedure for getting a (possibly randomized) admissible control ic( .) which takes finitely many values and is piecewise constant and is such that R ( i ) IR(u)
+ E.
The theorem to be proved says, basically, that if we have a sequence of admissible controls, each for a filtering problem, then in the limit we get an optimal control for a filtering problem. The "filtering" structure is not lost, as we go to the limit.
PROOFThere is at least one admissible control. Let { u " ( . ) ) denote a minimizing sequence. If we wish, we can suppose that the probability space is different for each n. Our notation will take this possibility into account. Let ( w " ( * ) , z " ( * ) ,Xo*"(.), mo*"(*),I"-'*"(*), v"(-), X"(.), m"(-),I"'(-)) = (So,"( .), X"( ), m"( ), I"'(* )) = S"( ) denote the relevant quantities, where {u"(. )} is the minimizing sequence of admissible controls. Define
-
-
-
K"(t)= Jfk^(m"(s), ~ " ( s ) , s) ds
c'
0
D"(t) =
d(u"(s)) ds.
0
The set {S"(-),K"(.), D " ( . ) } is tight on C [ O , 7 ' 1 for some u. Let n index a convergent subsequence, suppose that the Skorokhod imbedding is used and denote the limit by (S(* ), R( )). From the weak convergence results of Chapters 6 and 8, we can show that @(*), Z ( . ) are independent Wiener processes, V( .) is a Wiener process which is nonanticipative with respect to @( and Z( ), that all the other processes in S( *), K( ), 6(.) are nonanticipative with respect to W( ), Z( .), V( ), and (2.2t(2.8) hold, where X(0)and X ( 0 ) have the same distribution. The covariance of ti(-) is that of v " ( - ) . a),
0
)
-
-
-
+ j 0' A ( s ) X ( s ) ds + D ( t ) + [ ' E ( s ) d@(s),
(2.2)
+ I f A ( s ) m ( s )+ D ( t ) + 5
(2.3)
X ( t )= X(0) m ( t ) = m(0)
a(.
0
0
f
0
as)dV(s),
+
Y ( t )= j ' C ( s ) X ( s ) z(t), 0
X0(r) = X ( 0 ) + I0f A ( s ) X o ( s )ds + JO'E(s) d@(s),
(2.5)
11
234
m"(t) = m(0)
THE SEPARATION THEOREM
+1
+ ("Ys)
.I
' 0
A(s)mo(s)d s
(2.6)
dv(s),
' 0
P(r)= \'C(s)X0(s)d s + f ( t ) , ' 0
V(t) =
1
.I
C ( s ) [ X 0 ( s) mo(s)] ds
' 0
It is clear that (w.P. 1) rio(t) = E I X o ( t )
+ f(t).
I P(s),s I t ] = E I X o ( t )I B(s),
b'(m"(T))-,& ~ I ( T ) ) ,
K " ( t ) -,K ( t ) ,
(2.8)
sI t]
tI T as
(2.9)
n -, CO. (2.10)
We need to represent K( * ) and D(.) in terms of an admissible control. But first we will show (for each t ) that K ( s ) and D(s), s I t, are conditionally independent of W( -), Z( -), given V(s), s 5 t . Let q be an arbitrary integer, let t , , ..., t, be arbitrary positive scalars and I t , let sl, ..., s, be arbitrary positive scalars, and let a j , pi, p j , yj be arbitrary vectors. If (2.1 1) holds for all such sequences, then we have the conditional independence
E exp i
c ( a J b ( t i )+ p j K ( t j )+ /?jf(sj)+ yjW(sj)) i
( a j o ( t j )+ p j K ( t j ) ) l v ( s ) , s 5 (j?jZ(sj)
t
+ y > i i ( s j ) ) I V(s), s It
By the admissibility of u"(.). (2.11) holds if the bar is replaced by the superscript n. Let us examine the factor E exp i I
I
1 (pjz"(sj) + yjww"(sj)) I v"(s), s I t . i
(2.12)
The conditional expectation in (2.12) is independent of n. In fact, there are bounded continuous functions hi(.), e j ( * ) , Q(.) such that (2.12) can be written in the form (w.P. 1)
where d(s) =
c i
(&dj(s)
+ y>Ej(s)).
The function Q(- ) depends only on the conditional covariances of w"(.), z"(.), given v"( -), and is a nonrandom quantity and the integral fo dj(s) dv"(s)
11.2
THE SEPARATION A N D OPTIMALITY THEOREMS
235
is the conditional mean of z"(sj) given v"(s), s I t , etc. This representation of (2.12) and the weak convergence imply (2.11). By the conditional independence, M(t) = E [ X ( t )I v(s), D(s), S I r]
I D(s), s I t] = E [ X ( t )I Y(s), D(s), s I t ] , = E[R(r) P"s),
w.p. 1.
(2.13)
By the results in Section 9.2, there is an admissible control u ( . ) such that D ( t ) = [ ' d ( u ( s ) ) ds, '0
K(r) =
['&i(s),
(2.14)
~ ( s ) ,S) ds,
'0
and (2.13) holds with u(s) replacing D(s). By minimality and admissibility, u ( - ) is an optimal admissible control. Q.E.D.
REMARKSAND EXTENSIONS The method can be extended to problems where T is replaced by a random stopping or hitting time, provided that these times have the appropriate conditional independence properties. The followihg extensions can also be handled. Let k(x, a ) = k o ( x )+ k,(a), where ko(x)2 0. Let b ( x ) 2 0, let ( k , ( @ ) , d ( @ ) )be bounded and convex as before and suppose that b( ) and ko( * ) are convex. Both ko( ) and b ( . ) may be unbounded, but suppose that they grow no faster than a polynomial as Ix 1 + co. All the above functions are assumed to be continuous. Then, the proof of Theorem 11.2.1 implies that there is an admissible control u ( . ) such that (2.2)-(2.9) and (2.13) hold and that
.
-
+
Iim E""[K"(T)+ b ^ ( m " ( ~ )2) ]F [ R ( T ) &I(T))]
n-.
00
F [jo&(m(t),u ( t ) , t ) dt T
=
+ &i(T))].
(2.15)
Since { u " ( - ) } is minimizing, there must be equality in (2.15). Now, note that there is a measurable nonanticipative and V ( - ) dependent T, process ii(.) such that, for each r I
i
E[d(u(r))lv(s),s It ] = d ( f i ( t ) ) w.p. 1. E[k,(u(t))lV(s),s ll = k , ( q t ) )
In fact ii(*) can be supposed to be admissible.
11
236
THE SEPARATION THEOREM
Define h(t)= E[iii(t)I V(s), s I t]. Then
+ J"A(s)h(s)ds + j ' d ( i i ( s ) )ds + Ys) dV(s) I
h(t)= m(0)
0
0
0
(2.16)
and h(- ) is the conditional expectation, given ii(s), s I t, 5(s), s I t. Also, by convexity and Jensens' inequality EL,(rn(t), t ) 2 El,(r)z(t), t), Eh(M(t)) 2 Eh(h(T)).
(2.17)
These calculations imply that the optimal control can be assumed to be a function of the innovations. Thus, there is an explicit rule for calculating it. The only problem in allowing k , ( - ) , d ( * ) , or %! to be unbounded is in showing that {K"(t)}and {D"(t)}are tight and converge to absolutely continuous functions. Various additional assumptions can be used to guarantee this but we will not pursue the question further.
References
Bensoussan, A., and Lions, J. L., Nouvelle formulation de problems de control impulsionnel et applications, C. R. Acad. Sci. Paris. Ser. A 276, 1189-1192.1279-1284, 13331338 (1973). Bensoussah, A, and Lions, J. L.,Sur le contrble impulsionnel et les inequations quasivariationnelles d’evolution, C . R. Acad. Sci. Paris, S i r . A 280. 1049-1053 (1975). Billingsley, P.,Conuergence of Probability Measures. Wiley, New York, 1968. Breiman, L., Probability. Addison-Wesley, Reading, Massachusetts, 1968. Balakrishnan, A. V., A note on the structure ofoptimal stochastic controls, Appl. Math. Optimization 1, 87-94 (1974). Chung, K. L., Markou Chains with Starlonary Transition Probabilities. Springer-Verlag. Berlin, 1960. Derman, C., Finite State Markouian Decision Processes. Academic Press, New York, 1970. Doob, J. L., Stochastic Processes. Wiley, New York, 1953. Dudley, R. M., Distances of probability measures and random variables, Ann. Math. Statist. 39, 1563-1572 (1968). Dynkin, E. B., Markou Processes. Springer-Verlag, Berlin, 1965 (English translation of Russian original). Fleming, W. H.,and Rishel, R. W., Deterministic and Stochastic Optimal Control. Springer-Verlag, Berlin, 1975. 23 7
REFERENCES
Forsythe, G., and Wasow, W., Finite Difference Methods f o r Partial Differential Equations. Wiley, New York, 1960. Friedman, A,, Stochastic Differential Equations and Applications. Academic Press, New York, 1975. Garabedian, P. R., Partial Differential Equations. Wiley, New York, 1964. Gikhman, I. I., and Skorokhod, A. V., Introduction to the Theory of Random Processes. Saunders, Philadelphia, 1969 (Translation of Russian original). Gunckel, T. L., and Franklin, G . F., “A generalized solution for linear sampled data control,” J. Basic Engrg. 85, 197-201 (1963). Hillier, F. S., and Lieberman, G. J., Operations Research (2nd ed.). Holden-Day, San Francisco, 1974. Howard, R. A,, Dynamic Programming and Markov Chains. Technology Press, M.I.T., Cambridge, Massachusetts, 1960. Iglehart, D. E., Diffusion approximations in applied probability, Mathematics of the Decision Sciences, Part 11, Lectures in Applied M a t h Vol. 12. Amer. Math. SOC.,Providence, Rhode Island, 1968. ItB, K., and Nisio, M., On stationary solutions of a stochastic differential equation, Math. J. K y o t o Univ. 15, 777-794 (1966). Joseph, P. D.. and Tou, J. F., On linear control theory, Trans. AIEE (part 11) 80, 193- 196 (1961). Karoui, N., and Reinhard, H., Processus de diffusion dans R”, Lecture Notes in Mathematics, No. 321. Springer-Verlag. Berlin, 95-1 16 (1973). Khazminskii, R. Z., Necessary and sufficient conditions for the asymptotic stability of linear stochastic systems, Theor. Probability Appl. 1, 144- 147 (1967). Kushner, H. J., “ O n the differential equations satisfied by conditional probability densities of Markov processes, S I A M J. Control 2, 106-1 19 (1964). Kushner, H. J., Dynamical equations for optimal nonlinear filtering, J . Diflerential Equations 2, 179- 190 (1967). Kushner, H. J., Introduction t o Stochastic Control Theory. Holt, New York, 1971. Kushner, H. J., Finite difference methods for the weak solutions of the Kolmogorov equations for the density of both diffusion and conditional diffusion processes, J . Math. Anal. Appl. 52, 25 1- 265 (1976). Kushner, H. J.: Probabilistic methods for finite difference approximations to degenerate elliptic and parabolic equations with Neumann and Dirichlet boundary conditions, J . Math. Anal. Appl. 53, 644-668 (1976). Kushner, H. J.. and Chen, C. H., Decomposition of systems governed by Markov chains, IEEE Trans. Automatic Control. AC-19, 501-507 (1974). Kushner, H. J., and Kleinman, A. J., “Accelerated procedures for the solution of discrete Markov control problems,’’ IEEE Trans. Automatic Control AC-16, 147-152 (1971). Kushner, H. J., and Kleinman, A. J., Mathematical programming and the control of Markov chains, Internat. J. Control 13, 801-820 (1971). Kushner, H. J., and Yu,C. F., Probability methods for the convergence of finite difference approximations to partial differential equations, J . Math. Analysis Appl. 43, 603625 (1973). Kushner, H. J., and Yu,C. F., Probability methods for the convergence of finite difference approximations to partial differential integral equations, J . Math. Analysis Appl. 45, 54-72 (1974). Kushner, H. J., and Yu, C. F., Approximations, existence and numerical procedures for optimal stochastic controls, J . Math. Analysis Appl. 45, 563-587 (1974).
REFERENCES
239
Kunita, H., and Watanabe, S., On square integrable martingales, Nagoya Math. J. 30, 209-245 (1967). Kushner, H. J.. Stochastic Stability and Control. Academic Press, New York, 1967. Kushner, H. J., “Approximations for Functionals and Optimal Control Problems on Jump-Diffusion Processes,” to appear. Lindvall, T., Weak convergence of probability measures and random functions in the function space D[O, a), J. Appl. Probability 10, 109-121 (1973). Lo, J. T. H., and Willsky, A. S., Estimation for rotational processes with one degree of freedom, Part I, IEEE Trans. Automatic Control AC-20, 10-21 (1975). Loeve, M., Probability Theory, 3rd ed. Van Nostrand-Reinhold, Princeton, New Jersey, 1963. McShane, E. J., and Warfield, R . B., On Fillipov’s implicit function lemma, Proc. Amer. Math. SOC.18, 41-47 (1967). McShane, E. J., Stochastic Calculus and Stochastic Models. Academic Press, New York, 1974. Meyer, P. A,, Probability and Potentials. Blaisdell, Waltham, Massachusetts, 1966. Mitchell, R . R., and Kozin, F., Sample stability of second-order linear differential equations with wide band noise coefficients, SIAM J . Appl. Math. 27, 571-605 (1974). Mortenson, R. E., Optimal control ofcontinuous time stochastic systems, Ph.D. Thesis. Dept. Electrical Engrg., Univ. California, Berkeley, California, 1966. Neveu, J., Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965. Pinsky, M., A note on degenerate diffusion processes, Theor. Probability Appl. 14, 502-506 (1969). Roxin, E., The existence of optimal controls, Michigan Math. J. 9, 109-1 19 (1962). Shiryaev, A. N., Statistical Sequential Analysis. Translations of mathematical monographs, Vol. 38. Amer. Math. SOC., Providence, Rhode Island. 1973. Skorokhod, A. V., Limit theorems for stochastic processes, Theor. Probability Appl. 1, 262-290 (1956). Sfrook, D. W., and Varadhan, S. R. S.,Diffusion processes with continuous coefficients, I, 11, Comm. Pure Appl. Math. 22, 345-400; 479-530 (1969). Strook, D. W., and Varadhan, S. R. S., Diffusion process with boundary conditions, Comm. Pure Appl. Math. 24, 1971. Strook, D. W., and Varadhan, S. R. S., On degenerate elliptic and parabolic operators of second order and their associated diffusions, Comm. Pure Appl. Math. 25, 651-713 (1972). Varga, R. S., Matrix Iteratiue Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, 1962. Wagner, H. M., Principles of Operations Research, 2nd ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1975. Wong, E., Stochastic Processes in Information and Dynamical Systems. McGraw-Hill, New York, 1971. Wong, E., and Zakai M., On the convergence of ordinary integrals to stochastic integrals, Ann. Math. Statist. 36, 1560-1564 (1965). Wonham, W. M., On the separation theorem of stochastic control, SIAM J. Control6, 312-326 (1968). Yamada, T., and Watanabe, S., On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Uniu. 11, 155-167 (1971). Zakai, M., On the optimal filtering of diffusion processes, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 11, 230-243 (1969).
Index of Selected Symbols
Index
A

Adapted process, 3
Approximations to diffusions, requirements for, 90

C

Chapman-Kolmogorov equation, 8, 54
Characteristic operator, 69, 85

D

Delay equations, stochastic, 211-214
    approximations to, 212-214
Differential generator, 21
Discounted cost, approximations for, 102-104, 152, 192-200
Dynamic programming equations for diffusions, see Nonlinear partial differential equations

F

Feller process, 9, 58
    strong, 9
Finite difference approximations, 82, 86, 91, 118, 217
    implicit-explicit method, 127
    implicit method, 125-126
    Neumann boundary conditions, 219-221
    optimal controls, 139, 152, 153, 175-179, 194, 201
    probabilistic interpretation, 82, 92, 119, 127, 139, 152, 158, 179, 192, 212, 220-221
Fokker-Planck equation, 62, 122-124
Functional limit theorem, 27, 79-81

I

Impulsive control for diffusions, 76-77
    action set, 77
    approximations to, 158-165, 205-206
    dynamic programming equation, 77
Impulsive control for Markov chains, 50-52
    action set, 50
    dynamic programming equation, 50
Interpolations of approximating chains, 94-96, 129, 139, 180-181
    convergence to a controlled diffusion, 182, 190, 196, 200
    convergence to a diffusion, 96-100, 221-223
Invariant measure, approximation of, 108-111
Iterated logarithm, law of, 5
Itô's lemma, 14-15, 66, 67, 71, 73

K

Kolmogorov equations, 61, 62, 122-124

L

Linear filtering and control, 227-235
Linear programming, optimization under constraints, 53-56, 154-158

M

McShane-Warfield implicit function theorem, 183
Markov chains, 37-41
    average cost per unit time, 39
    definition, 37
    functionals of, 37-41
    invariant measure, 40, 108-111
    Neumann or reflecting boundaries, 219-221
    numerical methods for, 41
Markov chains, controlled, 44-56
    discounted cost, 48
    impulsive control, 50-52
    optimal stopping, 49-50
    over [0, T], 52
    with side constraints, 53-56
    strategies for, 44-45
    to a target, 46-47
Markov process, 7-9
    property, 7
    strong, 9, 58
    transition function, 7
        approximation of, 122-124
Martingale, 5
    approximations, 99-100
    associated increasing process, 6
    integral with respect to, 9
    local, 13
    probability inequalities, 6
    problem, 214
    properties of, 5-7
    representation as a stochastic integral, 15
    square integrable, 6
Measurable stochastic process, 2
Monte Carlo, 107-108, 122, 127

N

Neumann boundary condition, 214-219
Nonanticipative random function, 3
Nonlinear filtering
    conditional expectation formula, 129
    approximations to conditional moments, 131-134
Nonlinear partial differential equations, 69-77
    for control on [0, T], 75
    for control to a target, 74
    for impulsive control, 77
    with Neumann boundary condition, 223
    for optimal stopping, 71
    for stopping and control, 75, 174
Numerical data
    for stability problem, 113-115
    invariant set approximation, 134-136
    optimal control, 206-209
    optimal stopping, 166-169

O

Optimal stopping problem for diffusion, 69-71, 74-75
    approximations, 138-140
    approximations, admissibility of limit, 142-143
    approximations, optimality of limit, 144, 146, 151
    and control, 171-192
    discounted cost, 152-153
    with forced stopping, 153
    with side constraints, 154-158
Optimal stopping problem for Markov chain, 41-44
    with control, 49-50
    dynamic programming equation, 43
    ε-optimal policy, 44

P

Partial differential equations for functionals of diffusions, 66-68
    with boundary reflections, 215-219
    of optimal diffusions, 69-77, 223

Q

Quadratic variation, 4, 6, 16, 80, 100, 106

R

Regular boundary point, 62-64, 66, 84

S

Separation theorem, 231-235
    admissible control, 229
Skorokhod imbedding, 28
Skorokhod topology, 30
Stochastic differential equation, controlled, 69-77
Stochastic differential (Itô) equation, 17-23, 66-68
    with delays, 211
    differential generator, 21
    discrete time approximations, 22
    existence and uniqueness, 17
    perturbations, 22
    properties, 21-22
    with reflections, 213-219
    relation to elliptic equations, 66-68
Stochastic integral, 9-17
    calculus for, 14-15
Stopping time, 3
    examples of, 4, 5, 59
Submartingale problem, 215-217
    and reflected diffusion, 216

T

Tightness, of a sequence of measures, 26, 27
    for the approximations, 96-100, 160, 180, 213
    criteria for, 28, 29, 31
Transition density, approximation of, 122-124

W

Weak convergence of probability measures, 24-35, 79-81
    on C, 28-29
    on D, 29-33
    on L₂, 34
    necessary and sufficient conditions, 27
Weak infinitesimal operator, 68
Weak solution to elliptic equation, 65, 90
Wiener process, 2-5, 7, 15-17, 81, 104-107