Stochastic Optimal Control: The Discrete-Time Case
Dimitri P. Bertsekas and Steven E. Shreve
WWW site for book information and orders http://world.std.com/~athenasc/
Athena Scientific, Belmont, Massachusetts
Contents
Preface
Acknowledgments
Chapter 1 Introduction
1.1 Structure of Sequential Decision Models
1.2 Discrete-Time Stochastic Optimal Control Problems-Measurability Questions
1.3 The Present Work Related to the Literature
Part I ANALYSIS OF DYNAMIC PROGRAMMING MODELS
Chapter 2 Monotone Mappings Underlying Dynamic Programming Models
2.1 Notation and Assumptions
2.2 Problem Formulation
2.3 Application to Specific Models
2.3.1 Deterministic Optimal Control
2.3.2 Stochastic Optimal Control-Countable Disturbance Space
2.3.3 Stochastic Optimal Control-Outer Integral Formulation
2.3.4 Stochastic Optimal Control-Multiplicative Cost Functional
2.3.5 Minimax Control
Chapter 3 Finite Horizon Models
3.1 General Remarks and Assumptions
3.2 Main Results
3.3 Application to Specific Models
Chapter 4 Infinite Horizon Models under a Contraction Assumption
4.1 General Remarks and Assumptions
4.2 Convergence and Existence Results
4.3 Computational Methods
4.3.1 Successive Approximation
4.3.2 Policy Iteration
4.3.3 Mathematical Programming
4.4 Application to Specific Models
Chapter 5 Infinite Horizon Models under Monotonicity Assumptions
5.1 General Remarks and Assumptions
5.2 The Optimality Equation
5.3 Characterization of Optimal Policies
5.4 Convergence of the Dynamic Programming Algorithm-Existence of Stationary Optimal Policies
5.5 Application to Specific Models
Chapter 6 A Generalized Abstract Dynamic Programming Model
6.1 General Remarks and Assumptions
6.2 Analysis of Finite Horizon Models
6.3 Analysis of Infinite Horizon Models under a Contraction Assumption
Part II STOCHASTIC OPTIMAL CONTROL THEORY
Chapter 7 Borel Spaces and Their Probability Measures
7.1 Notation
7.2 Metrizable Spaces
7.3 Borel Spaces
7.4 Probability Measures on Borel Spaces
7.4.1 Characterization of Probability Measures
7.4.2 The Weak Topology
7.4.3 Stochastic Kernels
7.4.4 Integration
7.5 Semicontinuous Functions and Borel-Measurable Selection
7.6 Analytic Sets
7.6.1 Equivalent Definitions of Analytic Sets
7.6.2 Measurability Properties of Analytic Sets
7.6.3 An Analytic Set of Probability Measures
7.7 Lower Semianalytic Functions and Universally Measurable Selection
Chapter 8 The Finite Horizon Borel Model
8.1 The Model
8.2 The Dynamic Programming Algorithm-Existence of Optimal and ε-Optimal Policies
8.3 The Semicontinuous Models
Chapter 9 The Infinite Horizon Borel Models
9.1 The Stochastic Model
9.2 The Deterministic Model
9.3 Relations between the Models
9.4 The Optimality Equation-Characterization of Optimal Policies
9.5 Convergence of the Dynamic Programming Algorithm-Existence of Stationary Optimal Policies
9.6 Existence of ε-Optimal Policies
Chapter 10 The Imperfect State Information Model
10.1 Reduction of the Nonstationary Model-State Augmentation
10.2 Reduction of the Imperfect State Information Model-Sufficient Statistics
10.3 Existence of Statistics Sufficient for Control
10.3.1 Filtering and the Conditional Distributions of the States
10.3.2 The Identity Mappings
Chapter 11 Miscellaneous
11.1 Limit-Measurable Policies
11.2 Analytically Measurable Policies
11.3 Models with Multiplicative Cost
Appendix A The Outer Integral
Appendix B Additional Measurability Properties of Borel Spaces
B.1 Proof of Proposition 7.35(e)
B.2 Proof of Proposition 7.16
B.3 An Analytic Set Which Is Not Borel-Measurable
B.4 The Limit σ-Algebra
B.5 Set Theoretic Aspects of Borel Spaces
Appendix C The Hausdorff Metric and the Exponential Topology
References
Table of Propositions, Lemmas, Definitions, and Assumptions
Index
Athena Scientific
Post Office Box 391
Belmont, Mass. 02178-9998
U.S.A.
WWW information and orders: http://world.std.com/~athenasc/
Cover Design: Ann Gallager
© 1996 Dimitri P. Bertsekas and Steven E. Shreve
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
Originally published by Academic Press, Inc., in 1978
OPTIMIZATION AND NEURAL COMPUTATION SERIES
1. Dynamic Programming and Optimal Control, Vols. I and II, by Dimitri P. Bertsekas, 1995
2. Nonlinear Programming, by Dimitri P. Bertsekas, 1995
3. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
4. Constrained Optimization and Lagrange Multiplier Methods, by Dimitri P. Bertsekas, 1996
5. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri P. Bertsekas and Steven E. Shreve, 1996
Publisher's Cataloging-in-Publication Data
Bertsekas, Dimitri P.
Stochastic Optimal Control: The Discrete-Time Case
Includes bibliographical references and index
1. Dynamic Programming. 2. Stochastic Processes. 3. Measure Theory. I. Shreve, Steven E., joint author. II. Title.
T57.83.B49 1996 519.7'03 96-80191
ISBN 1-886529-03-5
To Joanna and Steve's Mom and Dad
Preface
This monograph is the outgrowth of research carried out at the University of Illinois over a three-year period beginning in the latter half of 1974. The objective of the monograph is to provide a unifying and mathematically rigorous theory for a broad class of dynamic programming and discrete-time stochastic optimal control problems. It is divided into two parts, which can be read independently. Part I provides an analysis of dynamic programming models in a unified framework applicable to deterministic optimal control, stochastic optimal control, minimax control, sequential games, and other areas. It resolves the structural questions associated with such problems, i.e., it provides results that draw their validity exclusively from the sequential nature of the problem. Such results hold for models where measurability of various objects is of no essential concern, for example, in deterministic problems and stochastic problems defined over a countable probability space. The starting point for the analysis is the mapping defining the dynamic programming algorithm. A single abstract problem is formulated in terms of this mapping, and counterparts of nearly all results known for deterministic optimal control problems are derived. A new stochastic optimal control model based on outer integration is also introduced in this part. It is a broadly applicable model and requires no topological assumptions. We show that all the results of Part I hold for this model.
Part II resolves the measurability questions associated with stochastic optimal control problems with perfect and imperfect state information. These questions have been studied over the past fifteen years by several researchers in statistics and control theory. As we explain in Chapter 1, the approaches that have been used are either limited by restrictive assumptions such as compactness and continuity or else they are not sufficiently powerful to yield results that are as strong as their structural counterparts. These deficiencies can be traced to the fact that the class of policies considered is not sufficiently rich to ensure the existence of everywhere optimal or ε-optimal policies except under restrictive assumptions. In our work we have appropriately enlarged the space of admissible policies to include universally measurable policies. This guarantees the existence of ε-optimal policies and allows, for the first time, the development of a general and comprehensive theory which is as powerful as its deterministic counterpart. We mention, however, that the class of universally measurable policies is not the smallest class of policies for which these results are valid. The smallest such class is the class of limit measurable policies discussed in Section 11.1. The σ-algebra of limit measurable sets (or C-sets) is defined in a constructive manner involving transfinite induction that, from a set theoretic point of view, is more satisfying than the definition of the universal σ-algebra. We believe, however, that the majority of readers will find the universal σ-algebra and the methods of proof associated with it more understandable, and so we devote the main body of Part II to models with universally measurable policies.
Parts I and II are related and complement each other. Part II makes extensive use of the results of Part I. However, the special forms in which these results are needed are also available in other sources (e.g., the textbook by Bertsekas [B4]). Each time we make use of such a result, we refer to both Part I and the Bertsekas textbook, so that Part II can be read independently of Part I. The developments in Part II also show that stochastic optimal control problems with measurability restrictions on the admissible policies can be embedded within the framework of Part I, thus demonstrating the broad scope of the formulation given there.
The monograph is intended for applied mathematicians, statisticians, and mathematically oriented analysts in engineering, operations research, and related fields. We have assumed throughout that the reader is familiar with the basic notions of measure theory and topology. In other respects, the monograph is self-contained. In particular, we have provided all necessary background related to Borel spaces and analytic sets.
Acknowledgments
This research was begun while we were with the Coordinated Science Laboratory of the University of Illinois and concluded while Shreve was with the Departments of Mathematics and Statistics of the University of California at Berkeley. We are grateful to these institutions for providing support and an atmosphere conducive to our work, and we are also grateful to the National Science Foundation for funding the research. We wish to acknowledge the aid of Joseph Doob, who guided us into the literature on analytic sets, and of John Addison, who pointed out the existing work on the limit σ-algebra. We are particularly indebted to David Blackwell, who inspired us by his pioneering work on dynamic programming in Borel spaces, who encouraged us as our own investigation was proceeding, and who showed us Example 9.2. Chapter 9 is an expanded version of our paper "Universally Measurable Policies in Dynamic Programming" published in Mathematics of Operations Research. The permission of The Institute of Management Sciences to include this material is gratefully acknowledged. Finally we wish to thank Rose Harris and Dee Wrather for their excellent typing of the manuscript.
Chapter 1
Introduction
1.1 Structure of Sequential Decision Models
Sequential decision models are mathematical abstractions of situations in which decisions must be made in several stages while incurring a certain cost at each stage. Each decision may influence the circumstances under which future decisions will be made, so that if total cost is to be minimized, one must balance his desire to minimize the cost of the present decision against his desire to avoid future situations where high cost is inevitable. A classical example of this situation, in which we treat profit as negative cost, is portfolio management. An investor must balance his desire to achieve immediate return, possibly in the form of dividends, against a desire to avoid investments in areas where low long-run yield is probable. Other examples can be drawn from inventory management, reservoir control, sequential analysis, hypothesis testing, and, by discretizing a continuous problem, from control of a large variety of physical systems subject to random disturbances. For an extensive set of sequential decision models, see Bellman [B1], Bertsekas [B4], Dynkin and Juškevič [DS], Howard [H7], Wald [W2], and the references contained therein.
Dynamic programming (DP for short) has served for many years as the principal method for analysis of a large and diverse group of sequential decision problems. Examples are deterministic and stochastic optimal control problems, Markov and semi-Markov decision problems, minimax control problems, and sequential games. While the nature of these problems may vary widely, their underlying structures turn out to be very similar. In all cases, the cost corresponding to a policy and the basic iteration of the DP algorithm may be described by means of a certain mapping which differs from one problem to another in details which to a large extent are inessential. Typically, this mapping summarizes all the data of the problem and determines all quantities of interest to the analyst. Thus, in problems with a finite number of stages, this mapping may be used to obtain the optimal cost function for the problem as well as to compute an optimal or ε-optimal policy through a finite number of steps of the DP algorithm. In problems with an infinite number of stages, one hopes that the sequence of functions generated by successive application of the DP iteration converges in some sense to the optimal cost function for the problem. Furthermore, all basic results of an analytical and computational nature can be expressed in terms of the underlying mapping defining the DP algorithm. Thus by taking this mapping as a starting point one can provide powerful analytical results which are applicable to a large collection of sequential decision problems.
To illustrate our viewpoint, let us consider formally a deterministic optimal control problem. We have a discrete-time system described by the system equation
$$x_{k+1} = f(x_k, u_k), \tag{1}$$
where $x_k$ and $x_{k+1}$ represent a state and its succeeding state and will be assumed to belong to some state space $S$; $u_k$ represents a control variable chosen by the decisionmaker in some constraint set $U(x_k)$, which is in turn a subset of some control space $C$. The cost incurred at the $k$th stage is given by a function $g(x_k, u_k)$. We seek a finite sequence of control functions $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ (also referred to as a policy) which minimizes the total cost over $N$ stages. The functions $\mu_k$ map $S$ into $C$ and must satisfy $\mu_k(x) \in U(x)$ for all $x \in S$. Each function $\mu_k$ specifies the control $u_k = \mu_k(x_k)$ that will be chosen when at the $k$th stage the state is $x_k$. Thus the total cost corresponding to a policy $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ and initial state $x_0$ is given by
$$J_{N,\pi}(x_0) = \sum_{k=0}^{N-1} g[x_k, \mu_k(x_k)], \tag{2}$$
where the states $x_1, x_2, \ldots, x_{N-1}$ are generated from $x_0$ and $\pi$ via the system equation
$$x_{k+1} = f[x_k, \mu_k(x_k)], \qquad k = 0, \ldots, N-2. \tag{3}$$
Corresponding to each initial state $x_0$ and policy $\pi$, there is a sequence of control variables $u_0, u_1, \ldots, u_{N-1}$, where $u_k = \mu_k(x_k)$ and $x_k$ is generated by (3). Thus an alternative formulation of the problem would be to select a sequence of control variables minimizing $\sum_{k=0}^{N-1} g(x_k, u_k)$ rather than a policy $\pi$ minimizing $J_{N,\pi}(x_0)$. The formulation we have given here, however, is more consistent with the DP framework we wish to adopt. As is well known, the DP algorithm for the preceding problem is given by
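$$J_0(x) = 0 \qquad \forall x \in S, \tag{4}$$
$$J_{k+1}(x) = \inf_{u \in U(x)} \{ g(x, u) + J_k[f(x, u)] \}, \qquad k = 0, \ldots, N-1, \tag{5}$$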
and the optimal cost $J^*(x_0)$ for the problem is obtained at the $N$th step, i.e.,
$$J^*(x_0) = \inf_{\pi} J_{N,\pi}(x_0) = J_N(x_0).$$
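One may also obtain the value $J_{N,\pi}(x_0)$ corresponding to any $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ at the $N$th step of the algorithm
$$J_{0,\pi}(x) = 0 \qquad \forall x \in S, \tag{6}$$
$$J_{k+1,\pi}(x) = g[x, \mu_{N-k-1}(x)] + J_{k,\pi}\{f[x, \mu_{N-k-1}(x)]\}, \qquad k = 0, \ldots, N-1. \tag{7}$$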
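The recursion (4)-(5) can be run directly when $S$ and each $U(x)$ are finite. The Python sketch below is an illustration only: it uses a hypothetical five-state instance that is not from the text (states $-2, \ldots, 2$, $f(x,u) = x + u$, $g(x,u) = x^2 + u^2$, and controls $u \in \{-1, 0, 1\}$ that keep the state inside $S$), and the names `U`, `f`, `g`, `dp`, and `brute_force` are our own. It checks that $J_N(x_0)$ agrees with a brute-force minimization of the $N$-stage cost over all feasible control sequences, as the statement that the optimal cost is obtained at the $N$th step asserts.

```python
from itertools import product

# Hypothetical toy instance (an assumption, not taken from the text).
S = range(-2, 3)          # state space S = {-2, -1, 0, 1, 2}
CONTROLS = (-1, 0, 1)     # candidate controls before applying U(x)


def U(x):
    """Constraint set U(x): controls that keep f(x, u) inside S."""
    return [u for u in CONTROLS if x + u in S]


def f(x, u):
    """System equation (1): x_{k+1} = f(x_k, u_k)."""
    return x + u


def g(x, u):
    """Cost per stage g(x_k, u_k)."""
    return x * x + u * u


def dp(N):
    """Run (4)-(5): J_0 = 0, J_{k+1}(x) = min_u {g(x, u) + J_k[f(x, u)]}.

    Returns [J_0, ..., J_N] and a policy whose k-th map is the control
    used at time k (the minimizer in the step producing J_{N-k}).
    """
    J = [{x: 0 for x in S}]
    minimizers = []
    for _ in range(N):
        Jk = J[-1]
        mu = {x: min(U(x), key=lambda u: g(x, u) + Jk[f(x, u)]) for x in S}
        J.append({x: g(x, mu[x]) + Jk[f(x, mu[x])] for x in S})
        minimizers.append(mu)
    return J, minimizers[::-1]


def brute_force(x0, N):
    """Minimize sum_k g(x_k, u_k) over all feasible control sequences."""
    best = float("inf")
    for us in product(CONTROLS, repeat=N):
        x, cost = x0, 0
        for u in us:
            if u not in U(x):
                break
            cost += g(x, u)
            x = f(x, u)
        else:  # reached only if every control in the sequence was feasible
            best = min(best, cost)
    return best


N = 3
J, policy = dp(N)
for x0 in S:
    assert J[N][x0] == brute_force(x0, N)
```

Ties in `min` are broken by the order of `CONTROLS`, so the returned policy is one of possibly several optimal ones; the DP values themselves do not depend on that choice.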
Now it is possible to formulate the previous problem as well as to describe the DP algorithm (4)-(5) by means of the mapping $H$ given by
$$H(x, u, J) = g(x, u) + J[f(x, u)]. \tag{8}$$
Let us define the mapping $T$ by
$$T(J)(x) = \inf_{u \in U(x)} H(x, u, J) \tag{9}$$
and, for any function $\mu: S \to C$, define the mapping $T_\mu$ by
$$T_\mu(J)(x) = H[x, \mu(x), J]. \tag{10}$$
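The mappings $H$, $T$, and $T_\mu$ can be written directly as higher-order functions. The sketch below is a hedged illustration on a hypothetical finite instance ($S = \{-2, \ldots, 2\}$, $f(x,u) = x + u$, $g(x,u) = x^2 + u^2$), not the book's construction; `H`, `T`, `T_mu`, `compose_policy`, `simulate`, and `toward_zero` are illustrative names. It checks that applying $T_{\mu_{N-1}}$ first and $T_{\mu_0}$ last to the zero function reproduces the cost (2) of a policy accumulated along the trajectory (3), and that $N$ applications of $T$ never give a larger value.

```python
# Hypothetical toy instance (an assumption, not taken from the text).
S = range(-2, 3)


def U(x):
    """Controls in {-1, 0, 1} that keep the next state inside S."""
    return [u for u in (-1, 0, 1) if x + u in S]


def f(x, u):
    return x + u


def g(x, u):
    return x * x + u * u


def H(x, u, J):
    """Mapping (8): H(x, u, J) = g(x, u) + J[f(x, u)]."""
    return g(x, u) + J[f(x, u)]


def T(J):
    """Mapping T: T(J)(x) = min over u in U(x) of H(x, u, J)."""
    return {x: min(H(x, u, J) for u in U(x)) for x in S}


def T_mu(mu):
    """Mapping (10) for a fixed mu: S -> C (mu(x) is assumed to lie in U(x))."""
    return lambda J: {x: H(x, mu(x), J) for x in S}


def compose_policy(pi, J):
    """(T_{mu_0} o ... o T_{mu_{N-1}})(J): T_{mu_{N-1}} is applied first."""
    for mu in reversed(pi):
        J = T_mu(mu)(J)
    return J


def simulate(pi, x0):
    """Cost (2) accumulated along the trajectory generated by (3)."""
    x, cost = x0, 0
    for mu in pi:
        cost += g(x, mu(x))
        x = f(x, mu(x))
    return cost


def toward_zero(x):
    """A hypothetical feasible control map: step one unit toward 0."""
    return -1 if x > 0 else (1 if x < 0 else 0)


J0 = {x: 0 for x in S}
pi = [toward_zero, toward_zero, lambda x: 0]
JN_pi = compose_policy(pi, J0)
JN = J0
for _ in range(len(pi)):
    JN = T(JN)                       # JN ends as T^N(J_0)

assert all(JN_pi[x] == simulate(pi, x) for x in S)
assert all(JN[x] <= JN_pi[x] for x in S)
```

Representing $T_\mu$ as a function that returns a function keeps the composition in the same order as it is written mathematically, which is the point of expressing both the problem and the algorithm through these mappings.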
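Both $T$ and $T_\mu$ map the set of real-valued (or perhaps extended real-valued) functions on $S$ into itself. Then in view of (6)-(7), we may write the cost functional $J_{N,\pi}(x_0)$ of (2) as
$$J_{N,\pi}(x_0) = (T_{\mu_0} \circ T_{\mu_1} \circ \cdots \circ T_{\mu_{N-1}})(J_0)(x_0),$$
where $J_0$ is the zero function on $S$ [$J_0(x) = 0$ for all $x \in S$] and $(T_{\mu_0} \circ T_{\mu_1} \circ \cdots \circ T_{\mu_{N-1}})$ denotes the composition of the mappings $T_{\mu_0}, T_{\mu_1}, \ldots, T_{\mu_{N-1}}$. Similarly the DP algorithm (4)-(5) may be described by
$$J_{k+1} = T(J_k), \qquad k = 0, \ldots, N-1,$$
and we have
$$J^*(x_0) = J_N(x_0) = T^N(J_0)(x_0),$$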
where $T^N$ is the composition of $T$ with itself $N$ times. Thus both the problem and the major algorithmic procedure relating to it can be expressed in terms of the mappings $T$ and $T_\mu$. One may also consider an infinite horizon version of the problem whereby we seek a sequence $\pi = (\mu_0, \mu_1, \ldots)$ that minimizes
$$J_\pi(x_0) = \lim_{N \to \infty} \sum_{k=0}^{N-1} g[x_k, \mu_k(x_k)] \tag{13}$$
subject to the system equation constraint (3). In this case one needs, of course, to make assumptions which ensure that the limit in (13) is well defined for each $\pi$ and $x_0$. Under appropriate assumptions, the optimal cost function defined by
$$J^*(x) = \inf_{\pi} J_\pi(x)$$
can be shown to satisfy Bellman's functional equation given by
$$J^*(x) = \inf_{u \in U(x)} \{ g(x, u) + J^*[f(x, u)] \}.$$
Equivalently
$$J^* = T(J^*),$$
i.e., $J^*$ is a fixed point of the mapping $T$. Most of the infinite horizon results of analytical interest center around this equation. Other questions relate to the existence and characterization of optimal policies or nearly optimal policies and to the validity of the equation
$$J^*(x) = \lim_{N \to \infty} T^N(J_0)(x) \qquad \forall x \in S,$$
which says that the DP algorithm yields in the limit the optimal cost function for the problem. Again the problem and the basic analytical and computational results relating to it can be expressed in terms of the mappings $T$ and $T_\mu$. The deterministic optimal control problem just described is representative of a plethora of sequential optimization problems of practical interest which may be formulated in terms of mappings similar to the mapping $H$ of (8). As shall be described in Chapter 2, one can formulate in the same manner stochastic optimal control problems, minimax control problems, and others. The objective of Part I is to provide a common analytical framework for all these problems and derive in a broadly applicable form all the results which draw their validity exclusively from the basic sequential structure of the decision-making process. This is accomplished by taking as a starting point a mapping $H$ such as the one of (8) and deriving all major analytical and computational results within a generalized setting. The results are subsequently specialized to five particular models described in Section 2.3: deterministic optimal control problems, three types of stochastic optimal control problems (countable disturbance space, outer integral formulation, and multiplicative cost functional), and minimax control problems.
1.2 Discrete-Time Stochastic Optimal Control Problems-Measurability Questions
The theory of Part I is not adequate by itself to provide a complete analysis of stochastic optimal control problems, the treatment of which is the major objective of this book. The reason is that when such problems are formulated over uncountable probability spaces, nontrivial measurability restrictions must be placed on the admissible policies unless we resort to an outer integration framework. A discrete-time stochastic optimal control problem is obtained from the deterministic problem of the previous section when the system includes a stochastic disturbance $w_k$ in its description. Thus (1) is replaced by
$$x_{k+1} = f(x_k, u_k, w_k), \tag{15}$$
and the cost per stage becomes $g(x_k, u_k, w_k)$. The disturbance $w_k$ is a member of some probability space $(W, \mathcal{F})$ and has distribution $p(dw_k \mid x_k, u_k)$. Thus the control variable $u_k$ exercises influence over the transition from $x_k$ to $x_{k+1}$ in two places, once in the system equation (15) and again as a parameter in the distribution of the disturbance $w_k$. Likewise, the control $u_k$ influences the cost at two points. This is a redundancy in the system equation model given above which will be eliminated in Chapter 8 when we introduce the transition kernel and reduced one-stage cost function and thereby convert to a model frequently adopted in the statistics literature (see, e.g., Blackwell [B9]; Strauch [S14]). The system equation model is more common in engineering literature and generally more convenient in applications, so we are taking it as our starting point. The transition kernel and reduced one-stage cost function are technical devices which eliminate the disturbance space $(W, \mathcal{F})$ from consideration and make the model more suitable for analysis. We take pains initially to point out how properties of the original system carry over into properties of the transition kernel and reduced one-stage cost function (see the remarks following Definitions 8.1 and 8.7).
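When the disturbance space $W$ is finite, the redundancy described above can be removed by direct computation: push $p(dw \mid x, u)$ forward through $f(x, u, \cdot)$ to get a distribution over next states, and average $g(x, u, \cdot)$ to get a cost that depends only on $(x, u)$. The Python sketch below assumes this standard construction purely as an illustration of what the transition kernel and reduced one-stage cost are meant to capture; the precise definitions are those of Chapter 8, and the system, cost, and disturbance law in the example (as well as the names `reduce_model`, `q`, and `g_bar`) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def reduce_model(f, g, p, x, u):
    """Return (q, g_bar) at state x and control u.

    q[x_next] = P{w : f(x, u, w) = x_next} under p(. | x, u), and
    g_bar = sum_w g(x, u, w) p(w | x, u).  Assumes p(x, u) returns a
    dict {w: probability} with finite support.
    """
    q = defaultdict(Fraction)          # Fraction() == 0
    g_bar = Fraction(0)
    for w, prob in p(x, u).items():
        q[f(x, u, w)] += prob
        g_bar += prob * g(x, u, w)
    return dict(q), g_bar


# Hypothetical example: x_{k+1} = x_k + u_k + w_k, w_k in {-1, 0, 1},
# with a disturbance law that depends on the control u_k.
def f(x, u, w):
    return x + u + w


def g(x, u, w):
    return abs(x + w) + u * u


def p(x, u):
    if u == 0:
        return {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
    return {0: Fraction(1, 2), 1: Fraction(1, 2)}


q, g_bar = reduce_model(f, g, p, x=0, u=1)
```

The control enters both the next-state map and the disturbance law, yet after the reduction only the pair `(q, g_bar)` indexed by `(x, u)` remains, which is exactly the two-place influence the text describes being folded into a single object.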
Stochastic optimal control is distinguished from its deterministic counterpart by the concern with when information becomes available. In deterministic control, to each initial state and policy there corresponds a sequence of control variables $(u_0, \ldots, u_{N-1})$ which can be specified beforehand, and the resulting states of the system are determined by (1). In contrast, if the control variables are specified beforehand for a stochastic system, the decisionmaker may realize in the course of the system evolution that unexpected states have appeared and the specified control variables are no longer appropriate. Thus it is essential to consider policies $\pi = (\mu_0, \ldots, \mu_{N-1})$, where $\mu_k$ is a function from history to control. If $x_0$ is the initial state, $u_0 = \mu_0(x_0)$ is taken to be the first control. If the states and controls $(x_0, u_0, \ldots, u_{k-1}, x_k)$ have occurred, the control
$$u_k = \mu_k(x_0, u_0, \ldots, u_{k-1}, x_k)$$
is chosen. We require that the control constraint be satisfied for every $(x_0, u_0, \ldots, u_{k-1}, x_k)$ and $k$. In this way the decisionmaker utilizes the full information available to him at each stage. Rather than choosing a sequence of control variables, the decisionmaker attempts to choose a policy which minimizes the total expected cost of the system operation. Actually, we will show that for most cases it is sufficient to consider only Markov policies, those for which the corresponding controls $u_k$ depend only on the current state $x_k$ rather than the entire history $(x_0, u_0, \ldots, u_{k-1}, x_k)$. This is the type of policy encountered in Section 1.1.
The analysis of the stochastic decision model outlined here can be fairly well divided into two categories: structural considerations and measurability considerations. Structural analysis consists of all those results which can be obtained if measurability of all functions and sets arising in the problem is of no real concern; for example, if the model is deterministic or, more generally, if the disturbance space $W$ is countable. In Part I structural results are derived using mappings $H$, $T_\mu$, and $T$ of the kind considered in the previous section. Measurability analysis consists of showing that the structural results remain valid even when one places nontrivial measurability restrictions on the set of admissible policies. The work in Part II consists primarily of measurability analysis relying heavily on structural results developed in Part I as well as in other sources (e.g., Bertsekas [B4]). One can best illustrate this dichotomy of analysis by the finite horizon DP algorithm considered by Bellman [B1]:
$$J_0(x) = 0 \qquad \forall x \in S, \tag{17}$$
$$J_{k+1}(x) = \inf_{u \in U(x)} E\{ g(x, u, w) + J_k[f(x, u, w)] \}, \qquad k = 0, \ldots, N-1, \tag{18}$$
where the expectation is with respect to $p(dw \mid x, u)$. This is the stochastic counterpart of the deterministic DP algorithm (4)-(5). It is reasonable to expect that $J_k(x)$ is the optimal cost of operating the system over $k$ stages when the initial state is $x$, and that if $\mu_k(x)$ achieves the infimum in (18) for every $x$ and $k = 0, \ldots, N-1$, then $\pi = (\mu_{N-1}, \ldots, \mu_0)$ is an optimal policy for every initial state $x$. If there are no measurability considerations, this is indeed the case under very mild assumptions, as shall be shown in Chapter 3. Yet it is a major task to properly formulate the stochastic control problem and demonstrate that the DP algorithm (17)-(18) makes sense in a measure-theoretic framework. One of the difficulties lies in showing that the expression in curly braces in (18) is measurable in some sense. Thus we must establish measurability properties for the functions $J_k$. Related to this is the need to balance the measurability of policies (necessary so the expected cost corresponding to a policy can be defined) against a desire to be able to select at or near the infimum in (18). We illustrate these difficulties by means of a simple two-stage example.
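In the structural setting where $S$, $U(x)$, and $W$ are all finite, none of these measurability difficulties arise and (17)-(18) can be computed exactly. The following Python sketch uses a hypothetical three-state instance (all data, and the names `expect`, `stochastic_dp`, and `evaluate`, are assumptions chosen for illustration). It returns the Markov policy of minimizers, reordered so that the map used at time $k$ is the one attaining the infimum for $J_{N-k}$, and checks by enumeration that no Markov policy has a lower expected cost. History-dependent policies are not enumerated here.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite model (an assumption, not taken from the text).
S = (0, 1, 2)
W = (0, 1)
CONTROLS = (0, 1)


def U(x):
    return CONTROLS


def p(w, x, u):
    """p(w | x, u): choosing u = 1 makes the disturbance w = 1 less likely."""
    p1 = Fraction(1, 5) if u == 1 else Fraction(3, 5)
    return p1 if w == 1 else 1 - p1


def f(x, u, w):
    return min(2, max(0, x - u + w))


def g(x, u, w):
    return x + u


def expect(x, u, J):
    """E{g(x, u, w) + J[f(x, u, w)]} with respect to p(dw | x, u)."""
    return sum(p(w, x, u) * (g(x, u, w) + J[f(x, u, w)]) for w in W)


def stochastic_dp(N):
    """Recursion (17)-(18); returns J_N and the policy (mu_0, ..., mu_{N-1})."""
    J = {x: Fraction(0) for x in S}
    maps = []
    for _ in range(N):
        mu = {x: min(U(x), key=lambda u: expect(x, u, J)) for x in S}
        J = {x: expect(x, mu[x], J) for x in S}
        maps.append(mu)
    return J, maps[::-1]          # element k is the map used at time k


def evaluate(policy):
    """Expected N-stage cost of a Markov policy, computed backward in time."""
    J = {x: Fraction(0) for x in S}
    for mu in reversed(policy):
        J = {x: expect(x, mu[x], J) for x in S}
    return J


N = 2
J_opt, policy = stochastic_dp(N)
all_maps = [dict(zip(S, c)) for c in product(CONTROLS, repeat=len(S))]
for x in S:
    best = min(evaluate(pi)[x] for pi in product(all_maps, repeat=N))
    assert J_opt[x] == best
assert evaluate(policy) == J_opt
```

Exact rational arithmetic via `Fraction` is used so that the comparison with the enumerated policies is an equality test rather than a floating-point tolerance check.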
TWO-STAGE PROBLEM Consider the following sequence of events:
(a) An initial state $x_0 \in R$ is generated ($R$ is the real line).
(b) Knowing $x_0$, the decisionmaker selects a control $u_0 \in R$.
(c) A state $x_1 \in R$ is generated according to a known probability measure $p(dx_1 \mid x_0, u_0)$ on $\mathcal{B}_R$, the Borel subsets of $R$, depending on $x_0, u_0$. [In terms of our earlier model, this corresponds to a system equation of the form $x_1 = w_0$ and $p(dw_0 \mid x_0, u_0) = p(dx_1 \mid x_0, u_0)$.]
(d) Knowing $x_1$, the decisionmaker selects a control $u_1 \in R$.
Given $p(dx_1 \mid x_0, u_0)$ for every $(x_0, u_0) \in R^2$ and a function $g: R^2 \to R$, the problem is to find a policy $\pi = (\mu_0, \mu_1)$ consisting of two functions $\mu_0: R \to R$ and $\mu_1: R \to R$ that minimizes
$$J_\pi(x_0) = \int g[x_1, \mu_1(x_1)]\, p(dx_1 \mid x_0, \mu_0(x_0)). \tag{19}$$
We temporarily postpone a discussion of restrictions (if any) that must be placed on $g$, $\mu_0$, and $\mu_1$ in order for the integral in (19) to be well defined. In terms of our earlier model, the function $g$ gives the cost for the second stage while we assume no cost for the first stage. The DP algorithm associated with the problem is
$$J_1(x_1) = \inf_{u_1 \in R} g(x_1, u_1), \tag{20}$$
$$J_0(x_0) = \inf_{u_0 \in R} \int J_1(x_1)\, p(dx_1 \mid x_0, u_0), \tag{21}$$
and, assuming that $J_0(x_0) > -\infty$ and $J_1(x_1) > -\infty$ for all $x_0 \in R$, $x_1 \in R$, the results one expects to be true are:
R.1 There holds $J_0(x_0) = \inf_\pi J_\pi(x_0)$ for all $x_0 \in R$.
R.2 Given $\varepsilon > 0$, there is an (everywhere) $\varepsilon$-optimal policy, i.e., a policy $\pi_\varepsilon$ such that $J_{\pi_\varepsilon}(x_0) \le \inf_\pi J_\pi(x_0) + \varepsilon$ for all $x_0 \in R$.
R.3 If the infimum in (20) and (21) is attained for all $x_1 \in R$ and $x_0 \in R$, then there exists a policy that is optimal for every $x_0 \in R$.
R.4 If $\mu_1^*(x_1)$ and $\mu_0^*(x_0)$, respectively, attain the infimum in (20) and (21) for all $x_1 \in R$ and $x_0 \in R$, then $\pi^* = (\mu_0^*, \mu_1^*)$ is optimal for every $x_0 \in R$, i.e., $J_{\pi^*}(x_0) = \inf_\pi J_\pi(x_0)$ for all $x_0 \in R$.
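When $x_0$, $u_0$, $x_1$, and $u_1$ range over finite sets, the integrals in (19)-(21) become finite sums and every function is trivially measurable, so R.1 and R.4 can be checked by exhaustive search over all policies. The Python sketch below does this for a hypothetical instance with three states and two controls per stage (the grids, the transition law, $g$, and the helper names are assumptions). It is a finite analogue of the countable-support case treated in the text, not a treatment of the case where $x_0, u_0, x_1, u_1 \in R$.

```python
from fractions import Fraction
from itertools import product

X0 = X1 = (0, 1, 2)      # hypothetical finite state grids
U0 = U1 = (0, 1)         # hypothetical finite control grids


def p(x1, x0, u0):
    """p(x1 | x0, u0): x1 = (x0 + u0 + w) mod 3 with w uniform on {0, 1}."""
    return sum(Fraction(1, 2) for w in (0, 1) if (x0 + u0 + w) % 3 == x1)


def g(x1, u1):
    return (x1 - u1) ** 2


def q(x0, u0, J):
    """Sum over x1 of J(x1) p(x1 | x0, u0): the integral in (21)."""
    return sum(J[x1] * p(x1, x0, u0) for x1 in X1)


# DP algorithm (20)-(21) together with minimizers mu1*, mu0*.
J1 = {x1: min(g(x1, u1) for u1 in U1) for x1 in X1}
mu1_star = {x1: min(U1, key=lambda u1: g(x1, u1)) for x1 in X1}
J0 = {x0: min(q(x0, u0, J1) for u0 in U0) for x0 in X0}
mu0_star = {x0: min(U0, key=lambda u0: q(x0, u0, J1)) for x0 in X0}


def cost(mu0, mu1, x0):
    """J_pi(x0) from (19), with the integral written as a finite sum."""
    return sum(g(x1, mu1[x1]) * p(x1, x0, mu0[x0]) for x1 in X1)


all_mu0 = [dict(zip(X0, c)) for c in product(U0, repeat=len(X0))]
all_mu1 = [dict(zip(X1, c)) for c in product(U1, repeat=len(X1))]
for x0 in X0:
    best = min(cost(m0, m1, x0) for m0 in all_mu0 for m1 in all_mu1)
    assert J0[x0] == best                          # finite analogue of R.1
    assert cost(mu0_star, mu1_star, x0) == best    # finite analogue of R.4
```

Because the sums are finite, the interchange of infimization and summation that underlies R.1 holds automatically here; the measurability questions discussed in the text only appear once the sets become uncountable.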
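A formal derivation of R.1 consists of the following steps:
$$\inf_\pi J_\pi(x_0) = \inf_{\mu_0} \inf_{\mu_1} \int g[x_1, \mu_1(x_1)]\, p(dx_1 \mid x_0, \mu_0(x_0)) \tag{22a}$$
$$= \inf_{\mu_0} \int \inf_{u_1} g(x_1, u_1)\, p(dx_1 \mid x_0, \mu_0(x_0)) \tag{22b}$$
$$= \inf_{\mu_0} \int J_1(x_1)\, p(dx_1 \mid x_0, \mu_0(x_0)) = \inf_{u_0} \int J_1(x_1)\, p(dx_1 \mid x_0, u_0) = J_0(x_0).$$
Similar formal derivations can be given for R.2, R.3, and R.4. The following points need to be justified in order to make the preceding derivation meaningful and mathematically rigorous.
(a) In (22a), $g$ and $\mu_1$ must be such that $g[x_1, \mu_1(x_1)]$ can be integrated in a well-defined manner.
(b) In (22b), the interchange of infimization and integration must be legitimate. Furthermore, $g$ must be such that $J_1(x_1)$ [$= \inf_{u_1} g(x_1, u_1)$] can be integrated in a well-defined manner.
We first observe that if, for each $(x_0, u_0)$, $p(dx_1 \mid x_0, u_0)$ has countable support, i.e., is concentrated on a countable number of points, then integration in (22a) and (22b) reduces to infinite summation. Thus there is no need to impose measurability restrictions on $g$, $\mu_0$, and $\mu_1$, and the interchange of infimization and integration in (22b) is justified in view of the assumption $\inf_{u_1} g(x_1, u_1) > -\infty$ for all $x_1 \in R$. (For $\varepsilon > 0$, take $\mu_1: R \to R$ such that
$$g[x_1, \mu_1(x_1)] \le \inf_{u_1} g(x_1, u_1) + \varepsilon \qquad \forall x_1 \in R. \tag{23}$$
Then
$$\inf_{\bar\mu_1} \int g[x_1, \bar\mu_1(x_1)]\, p(dx_1 \mid x_0, \mu_0(x_0)) \le \int g[x_1, \mu_1(x_1)]\, p(dx_1 \mid x_0, \mu_0(x_0)) \le \int \inf_{u_1} g(x_1, u_1)\, p(dx_1 \mid x_0, \mu_0(x_0)) + \varepsilon. \tag{24}$$
Since $\varepsilon > 0$ is arbitrary, it follows that
$$\inf_{\bar\mu_1} \int g[x_1, \bar\mu_1(x_1)]\, p(dx_1 \mid x_0, \mu_0(x_0)) \le \int \inf_{u_1} g(x_1, u_1)\, p(dx_1 \mid x_0, \mu_0(x_0)).$$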
The reverse inequality is clear, and the result follows.) A similar argument proves R.2, while R.3 and R.4 are trivial in view of the fact that there are no measurability restrictions on p, and p,. If p(dx,lxo,u,) does not have countable support, there are two main approaches. The first is to expand the notion of integration, and the second is to restrict g, yo, and p, to be appropriately measurable. Expanding the notion of integration can be achieved by interpreting the integrals in (22a) and (22b) as outer integrals (see Appendix A). Since the outer integral can be defined for any function, measurable or not, there is no need to require that g, p,, and p, are measurable in any sense. As a result, (22a) and (22b) make sense and an argument such as the one beginning with (23) goes through. This approach is discussed in detail in Part I, where we show that all the basic results for finite and infinite horizon problems of perfect state information carry through within an outer integration framework. However, there are inherent limitations in this approach centering around the pathologies of outer integration. Difficulties also occur in the treatment of imperfect information problems using sufficient statistics. The major alternative approach was initiated in more general form by Blackwell [B9] in 1965. Here we assume at the outset that g is Borel- mea, , (98, is the Borel G-algebra on R), surable, and furthermore, for each B E@ the function p(Blxo, uo) is Borel-measurable in (x,, uo). In the initial treatment of the problem, the functions p, and p l were restricted to be Borelmeasurable. With these assumptions, g[x,,,ul(xl)] is Borel-measurable in .ul when p1 is Borel-measurable, and the integral in (22a) is well defined. A major difficulty occurs in (22b) since it is not necessarily true that J l ( x , ) = inf,, g ( s l , u , ) is Borel-measurable, even if g is. The reason can be traced to the fact that the orthogonal projection of a Borel set in R2 on one
of the axes need not be Borel-measurable (see Section 7.6). Since we have for $c \in R$
$$\{x_1 \mid J_1(x_1) < c\} = \operatorname{proj}_{x_1}\{(x_1,u_1) \mid g(x_1,u_1) < c\},$$
where $\operatorname{proj}_{x_1}$ denotes projection on the $x_1$-axis, it can be seen that $\{x_1 \mid J_1(x_1) < c\}$ need not be Borel, even though $\{(x_1,u_1) \mid g(x_1,u_1) < c\}$ is. The difficulty can be overcome in part by showing that $J_1$ is a lower semianalytic and hence also universally measurable function (see Section 7.7). Thus $J_1$ can be integrated with respect to any probability measure on $\mathscr{B}_R$. Another difficulty stems from the fact that one cannot in general find a Borel-measurable $\varepsilon$-optimal selector $\mu_1$ satisfying (23), although a weaker result is available whereby, given a probability measure $p$ on $\mathscr{B}_R$, the existence of a Borel-measurable selector $\mu_1$ satisfying
$$g[x_1,\mu_1(x_1)] \le \inf_{u_1} g(x_1,u_1) + \varepsilon$$
for $p$ almost every $x_1 \in R$ can be ascertained. This result is sufficient to justify (24) and thus prove result R.1 ($J_0 = \inf_\pi J_\pi$). However, results R.2 and R.3 cannot be proved when $\mu_0$ and $\mu_1$ are restricted to be Borel-measurable except in a weaker form involving the notion of $p$-optimality (see [S14]; [H4]). The objective of Part II is to resolve the measurability questions in stochastic optimal control in such a way that almost every result can be proved in a form as strong as its structural counterpart. This is accomplished by enlarging the set of admissible policies to include all universally measurable policies. In particular, we show the existence of policies within this class that are optimal or nearly optimal for every initial state. A great many authors have dealt with measurability in stochastic optimal control theory. We describe three approaches taken and how their aims and results relate to our own. A fourth approach, due to Blackwell et al. [B12] and based on analytically measurable policies, is discussed in the next section and in Section 11.2.
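With countable (here finite) support, the integrals in the two-stage problem reduce to sums and no measurability question arises, so the $\varepsilon$-selection step behind result R.1 can be checked directly. The sketch below uses a hypothetical model (the kernel `p`, the cost `g`, and the control sets are assumptions for illustration, not taken from the text): the second-stage infimum $\inf_{u_1} g(x_1,u_1)$ is never attained, so an everywhere $\varepsilon$-optimal selector $\mu_1$ is used instead, and the resulting policy cost lies within $\varepsilon$ of $J_0$.

```python
from fractions import Fraction

# Hypothetical two-stage model (illustration only, not from the text).
U0 = [0, 1]  # finite set of first-stage controls


def p(x0, u0):
    """Kernel p(dx1 | x0, u0) with finite support, returned as {x1: probability}."""
    if u0 == 0:
        return {x0: Fraction(1, 2), x0 + 2: Fraction(1, 2)}
    return {x0 + 2: Fraction(1)}


def g(x1, u1):
    """Cost g(x1, u1) for u1 in (0, 1]; its infimum over u1 is x1 and is never attained."""
    return x1 + u1


def J1(x1):
    """J_1(x1) = inf over 0 < u1 <= 1 of g(x1, u1), known in closed form for this g."""
    return Fraction(x1)


def expected_J1(x0, u0):
    """Integral of J_1 with respect to p(dx1 | x0, u0)."""
    return sum(q * J1(x1) for x1, q in p(x0, u0).items())


def J0(x0):
    """J_0(x0) = inf over u0 of the integral of J_1 with respect to p(dx1 | x0, u0)."""
    return min(expected_J1(x0, u0) for u0 in U0)


def policy_cost(x0, mu0, mu1):
    """J_pi(x0): integral of g[x1, mu1(x1)] with respect to p(dx1 | x0, mu0(x0))."""
    return sum(q * g(x1, mu1(x1)) for x1, q in p(x0, mu0(x0)).items())


def eps_optimal_policy(eps):
    """Return (mu0, mu1) with g[x1, mu1(x1)] <= J_1(x1) + eps for every x1 (eps > 0)."""
    def mu1(x1):
        return min(Fraction(1), eps)  # any control in (0, eps] is eps-optimal here

    def mu0(x0):
        return min(U0, key=lambda u0: expected_J1(x0, u0))  # attains the finite min

    return mu0, mu1


eps = Fraction(1, 100)
mu0, mu1 = eps_optimal_policy(eps)
gap = policy_cost(3, mu0, mu1) - J0(3)
```

Every admissible $u_1$ is strictly positive, so no policy attains $J_0$ in this model; the $\varepsilon$-argument is needed even without any measurability difficulty, and shrinking `eps` shrinks `gap` proportionally.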
I
The General Model
If the state, control, and disturbance spaces are arbitrary measure spaces, very little can be done. One attempt in this direction is the work of Striebel [S16] involving $p$-essential infima. Geared toward giving meaning to the dynamic programming algorithm, this work replaces (18) by
$$J_{k+1}(x) = p_k\text{-}\operatorname*{ess\,inf}_{\mu}\; E_w\bigl\{g[x,\mu(x),w] + J_k[f(x,\mu(x),w)]\bigr\}, \qquad (25)$$
$k = 0,\dots,N-1$, where the $p_k$-essential infimum is over all measurable $\mu$ from state space $S$ to control space $C$ satisfying any constraints which may have been imposed. The functions $J_k$ are measurable, and if the probability measures $p_0,\dots,p_{N-1}$ are properly chosen and the so-called countable $\varepsilon$-lattice property holds, this modified dynamic programming algorithm generates the optimal cost function and can be used to obtain policies which are optimal or nearly optimal for $p_{N-1}$ almost all initial states. The selection of the proper probability measures $p_0,\dots,p_{N-1}$, however, is at least as difficult as executing the dynamic programming algorithm, and the verification of the countable $\varepsilon$-lattice property is equivalent to proving the existence of an $\varepsilon$-optimal policy.
II
The Semicontinuous Models
Considerable attention has been directed toward models in which the state and control spaces are Borel spaces or even $R^n$ and the reduced cost function
has semicontinuity and/or convexity properties. A companion assumption is that the mapping $x \mapsto U(x)$
is a measurable closed-valued multifunction [R2]. In this case there exists a Borel-measurable selector $\mu: S \to C$ such that $\mu(x) \in U(x)$ for every state $x$ (Kuratowski and Ryll-Nardzewski [K5]). This is of course necessary if any Borel-measurable policy is to exist at all. The main fact regarding models of this type is that under various combinations of semicontinuity and compactness assumptions, the functions $J_k$ defined by (17) and (18) are semicontinuous. In addition, it is often possible to show that the infimum in (18) is achieved for every $x$ and $k$, and that there are Borel-measurable selectors $\mu_0,\dots,\mu_{N-1}$ such that $\mu_k(x)$ achieves this infimum (see Freedman [F1], Furukawa [F3], Himmelberg et al. [H3], Maitra [M2], Schäl [S3], and the references contained therein). Such a policy $(\mu_0,\dots,\mu_{N-1})$ is optimal, and its existence is an additional benefit of imposing topological conditions to ensure that the problem is well defined. In Section 9.5 we show that lower semicontinuity and compactness conditions guarantee convergence of the dynamic programming algorithm over an infinite horizon to the optimal cost function, and that this algorithm can be used to generate an optimal stationary policy. Continuity and compactness assumptions are integral to much of the work that has been done in stochastic programming. This work differs from
our own in both its aims and its framework. First, in the usual stochastic programming model, the controls cannot influence the distribution of future states (see Olsen [O1-O3], Rockafellar and Wets [R3-R4], and the references contained therein). As a result, the model does not include as special cases many important problems such as, for example, the classical linear quadratic stochastic control problem [B4, Section 3.1]. Second, assumptions of convexity, lower semicontinuity, or both are made on the cost function, the model is tailored to the Kuratowski-Ryll-Nardzewski selection theorem, and the analysis is carried out in a finite-dimensional Euclidean state space. All of this is for the purpose of overcoming measurability problems. Results are not readily generalizable beyond Euclidean spaces (Rockafellar [R2]). The thrust of the work is toward convex programming type results, i.e., duality and Kuhn-Tucker conditions for optimality, and so a narrow class of problems is considered and powerful results are obtained.
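When the infimum in (18) is attained for every $x$ and $k$, the dynamic programming algorithm produces minimizing selectors that together form an optimal policy. On finite spaces every map is trivially measurable and every infimum over a finite $U(x)$ is a minimum, so this can be sketched directly. The Python below is a minimal illustration under stated assumptions: the dynamics `f`, cost `g`, disturbance distribution `W`, and constraint sets are hypothetical, and the recursion is started from $J_0 \equiv 0$ as an assumed reading of (17). It records the minimizing selectors and checks the result against brute force over all nonrandomized Markov policies.

```python
import itertools
from fractions import Fraction

# Hypothetical finite model (illustration only, not from the text).
S = [0, 1, 2]                               # state space
U = {x: [0, 1] for x in S}                  # control constraint sets U(x)
W = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # disturbance distribution


def f(x, u, w):
    """System function: next state as a function of state, control, disturbance."""
    return (x + u + w) % 3


def g(x, u, w):
    """One-stage cost g(x, u, w)."""
    return abs(x - 1) + Fraction(u, 2) + w


def q_value(x, u, J):
    """E_w{ g(x, u, w) + J[f(x, u, w)] } for a cost-to-go table J."""
    return sum(p * (g(x, u, w) + J[f(x, u, w)]) for w, p in W.items())


def dp(N):
    """Run J_{k+1}(x) = min over u in U(x) of E{g + J_k[f]}, starting from J_0 = 0."""
    J = {x: Fraction(0) for x in S}
    selectors = []  # selectors[k] attains the minimum that defines J_{k+1}
    for _ in range(N):
        mu = {x: min(U[x], key=lambda u: q_value(x, u, J)) for x in S}
        J = {x: q_value(x, mu[x], J) for x in S}
        selectors.append(mu)
    # At time t there are N - t stages to go, so time t uses selectors[N - 1 - t].
    return J, selectors[::-1]


def evaluate(policy, x0):
    """Expected N-stage cost from x0 of a Markov policy [mu_0, ..., mu_{N-1}]."""
    V = {x: Fraction(0) for x in S}
    for mu in reversed(policy):
        V = {x: q_value(x, mu[x], V) for x in S}
    return V[x0]


def brute_force(N, x0):
    """Minimum of evaluate() over all nonrandomized Markov policies."""
    rules = [dict(zip(S, us)) for us in itertools.product(*(U[x] for x in S))]
    return min(evaluate(list(pol), x0) for pol in itertools.product(rules, repeat=N))


J3, policy = dp(3)
```

Because every $U(x)$ here is finite, `min` always attains the infimum; with infinite constraint sets one needs conditions such as the compactness and semicontinuity assumptions discussed above for the selectors to exist.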
III
The Borel Models
The Borel space framework was introduced by Blackwell [B9] and further refined by Strauch, Dynkin, Juškevič, Hinderer, and others. The state and control spaces $S$ and $C$ were assumed to be Borel spaces, and the functions defining the model were assumed to be Borel-measurable. Initial efforts were directed toward proving the existence of "nice" optimal or nearly optimal policies in this framework. Policies were required to be Borel-measurable. For this model it is possible to prove the universal measurability of the optimal cost function and the existence, for every $\varepsilon > 0$ and probability measure $p$ on $S$, of a $p$-$\varepsilon$-optimal policy (Strauch [S14, Theorems 7.1 and 8.1]). A $p$-$\varepsilon$-optimal policy is one which leads to a cost differing from the optimal cost by less than $\varepsilon$ for $p$ almost every initial state. As discussed earlier, even over a finite horizon the optimal cost function need not be Borel-measurable and there need not exist an everywhere $\varepsilon$-optimal policy (Blackwell [B9, Example 2]). The difficulty arises from the inability to choose a Borel-measurable function $\mu_k: S \to C$ which nearly achieves the infimum in (18) uniformly in $x$. The nonexistence of such a function interferes with the construction of optimal policies via the dynamic programming algorithm (17) and (18), since one must first determine at each stage the measure $p$ with respect to which it is satisfactory to nearly achieve the infimum in (18) for $p$ almost every $x$. This is essentially the same problem encountered with (25). The difficulties in constructing nearly optimal policies over an infinite horizon are more acute. Furthermore, from an applications point of view, a $p$-$\varepsilon$-optimal policy, even if it can be constructed, is a much less appealing object than an everywhere $\varepsilon$-optimal policy, since in many situations the distribution $p$ is unknown or may change when the system is
operated repeatedly, in which case a new $p$-$\varepsilon$-optimal policy must be computed. In our formulation, the class of admissible policies in the Borel model is enlarged to include all universally measurable policies. We show in Part II that this class is sufficiently rich to ensure that there exist everywhere $\varepsilon$-optimal policies and, if the infimum in the DP algorithm (18) is attained for every $x$ and $k$, that an everywhere optimal policy exists. Thus the notion of $p$-optimality can be dispensed with. The basic reason why optimal and nearly optimal policies can be found within the class of universally measurable policies may be traced to the selection theorem of Section 7.7. Another advantage of working with the class of universally measurable functions is that this class is closed under certain basic operations such as integration with respect to a universally measurable stochastic kernel and composition. Our method of proof of infinite horizon results is based on an equivalence of stochastic and deterministic decision models which is worked out in Sections 9.1-9.3. The conversion is carried through only for the infinite horizon model, as it is not necessary for the development in Chapter 8. It is also done only under assumptions (P), (N), or (D) of Definition 9.1, although the models make sense under conditions similar to the $(F^+)$ and $(F^-)$ assumptions of Section 8.1. The relationship between the stochastic and the deterministic models is utilized extensively in Sections 9.4-9.6, where structural results proved in Part I are applied to the deterministic model and then transferred to the stochastic model. The analysis shows how results for stochastic models with measurability restrictions on the set of admissible policies can be obtained from the general results on abstract dynamic programming models given in Part I and provides the connecting link between the two parts of this work.

1.3 The Present Work Related to the Literature
This section briefly summarizes the contents of each chapter and points out relations with existing literature. During the course of our research, many of our results were reported in various forms (Bertsekas [B3-B5]; Shreve [S7-S8]; Shreve and Bertsekas [S9-S12]). Since the present monograph is the culmination of our joint work, we report particular results as being new even though they may be contained in one or more of the preceding references.

Part I
The objective of Part I is to provide a unifying framework for finite and infinite horizon dynamic programming models. We restrict our attention to
three types of infinite horizon models, which are patterned after the discounted and positive models of Blackwell [B8-B9] and the negative model of Strauch [S14]. It is an open question whether the framework of Part I can be effectively extended to cover other types of infinite horizon models such as the average cost model of Howard [H7] or convergent dynamic programming models of the type considered by Dynkin and Juškevič [D8] and Hordijk [H6]. The problem formulation of Part I is new. The work most closely related to our framework is that of Denardo [D2], who considered an abstract dynamic programming model under contraction assumptions. Most of Denardo's results have been incorporated in slightly modified form in Chapter 4. Denardo's problem formulation is predicated on his contraction assumptions and is thus unsuitable for finite horizon models such as the one in Chapter 3 and infinite horizon models such as the ones in Chapter 5. This fact provided the impetus for our different formulation. Most of the results of Part I constitute generalizations of results known for specific classes of problems such as, for example, deterministic and stochastic optimal control problems. We make an effort to identify the original sources, even though in some cases this is quite difficult. Some of the results of Part I have not been reported earlier even for a specific class of problems, and they will be indicated as new. Chapter 2 Here we formulate the basic abstract sequential optimization problem which is the subject of Part I. Several classes of problems of practical interest are described in Section 2.3 and are shown to be special cases of the abstract problem. All these problems have received a great deal of attention in the literature with the exception of the stochastic optimal control model based on outer integration (Section 2.3.3). This model, as well as the results in subsequent chapters relating to it, is new.
A stochastic model based on outer integration has also been considered by Denardo [D2], who used a different definition of outer integration. His definition works well under contraction assumptions such as the one in Chapter 4. However, many of the results of Chapters 3 and 5 do not hold if Denardo's definition of outer integral is adopted. By contrast, all the basic results of Part I are valid when specialized to the model of Section 2.3.3. Chapter 3 This chapter deals with the finite horizon version of our abstract problem. The central results here relate to the validity of the dynamic programming algorithm, i.e., the equation $J_N^* = T^N(J_0)$. The validity of this equation is often accepted without scrutiny in the engineering literature, while in mathematical works it is usually proved under assumptions that are stronger than necessary. While we have been unable to locate an appropriate source, we feel certain that the results of Proposition 3.1 are known
for stochastic optimal control problems. The notion of a sequence of policies exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality and the corresponding existence result (Proposition 3.2) are new. Chapter 4 Here we treat the infinite horizon version of our abstract problem under a contraction assumption. The developments in this chapter overlap considerably with Denardo's work [D2]. Our contraction assumption C is only slightly different from that of Denardo. Propositions 4.1, 4.2, 4.3(a), and 4.3(c) are due to Denardo [D2], while Proposition 4.3(b) has been shown by Blackwell [B9] for stochastic optimal control problems. Proposition 4.4 is new. Related compactness conditions for existence of a stationary optimal policy in stochastic optimal control problems were given by Maitra [M2], Kushner [K6], and Schäl [S5]. Propositions 4.6 and 4.7 improve on corresponding results by Denardo [D2] and McQueen [M3]. The modified policy iteration algorithm and the corresponding convergence result (Proposition 4.9) are new in the form given here. Denardo [D2] gives a somewhat less general form of policy iteration. The idea of policy iteration for deterministic and stochastic optimal control problems dates, of course, to the early days of dynamic programming (Bellman [B1]; Howard [H7]). The mathematical programming formulation of Section 4.3.3 is due to Denardo [D2]. Chapter 5 Here we consider infinite horizon versions of our abstract model patterned after the positive and negative models of Blackwell [B8, B9] and Strauch [S14]. When specialized to stochastic optimal control problems, most of the results of this chapter have either been shown by these authors or can be trivially deduced from their work. The part of Proposition 5.1 dealing with existence of an $\varepsilon$-optimal stationary policy is new, as is the last part of Proposition 5.2.
Forms of Propositions 5.3 and 5.5 specialized to certain gambling problems have been shown by Dubins and Savage [D6], whose monograph provided the impetus for much of the subsequent work on dynamic programming. Propositions 5.9-5.11 are new. Results similar to those of Proposition 5.10 have been given by Schäl [S5] for stochastic optimal control problems under semicontinuity and compactness assumptions. Chapter 6 The analysis in this chapter is new. It is motivated by the fact that the framework and the results of Chapters 2-5 are primarily applicable to problems where measurability issues are of no essential concern. While it is possible to apply the results to problems where policies are subject to measurability restrictions, this can be done only after a fairly elaborate reformulation (see Chapter 9). Here we generalize our framework so that problems in which measurability issues introduce genuine complications can be dealt with directly. However, only a portion of our earlier results carry
through within the generalized framework, primarily those associated with finite horizon models and infinite horizon models under contraction assumptions.

Part II
The objective of Part II is to develop in some detail the discrete-time stochastic optimal control problem (additive cost) in Borel spaces. The measurability questions are addressed explicitly. This model was selected from among the specialized models of Part I because it is often encountered and also because it can serve as a guide in the resolution of measurability difficulties in a great many other decision models. In Chapter 7 we present the relevant topological properties of Borel spaces and their probability measures. In particular, the properties of analytic sets are developed. Chapter 8 treats the finite horizon stochastic optimal control problem, and Chapter 9 is devoted to the infinite horizon version. Chapter 10 deals with the stochastic optimal control problem when only a "noisy" measurement of the state of the system is possible. Various extensions of the theory of Chapters 8 and 9 are given in Chapter 11. Chapter 7 The properties presented for metrizable spaces are well known. The material on Borel spaces can be found in Chapter 1 of Parthasarathy [P1] and is also available in Kuratowski [K2-K3]. A discussion of the weak topology can be found in Parthasarathy [P1]. Propositions 7.20, 7.21, and 7.23 are due to Prohorov [P2], but their presentation here follows Varadarajan [V1]. Part of Proposition 7.21 also appears in Billingsley [B7]. Proposition 7.25 is an extension of a result for compact $X$ found in Dubins and Freedman [D5]. Versions of Proposition 7.25 have been used in the literature for noncompact $X$ (Strauch [S14]; Blackwell et al. [B12]), the authors evidently intending an extension of the compact result by using Urysohn's theorem to embed $X$ in a compact metric space. Proposition 7.27 is reported by Rhenius [R1], Juškevič [J3], and Striebel [S16]. We give Striebel's proof. Propositions 7.28 and 7.29 appear in some form in several texts on probability theory. A frequently cited reference is Loève [L1].
Propositions 7.30 and 7.31 are easily deduced from Maitra [M2] or Schäl [S4], and much of the rest of the discussion of semicontinuous functions is found in Hausdorff [H2]. Proposition 7.33 is due to Dubins and Savage [D6]. Proposition 7.34 is taken from Freedman [F1]. The investigation of analytic sets in Borel spaces began several years ago, but has been given additional impetus recently by the discovery of their applications to stochastic processes. Suslin schemes and analytic sets first appear in a paper by M. Suslin (or Souslin) in 1917 [S17], although the idea is generally attributed to Alexandroff. Suslin pointed out that every Borel
subset of the real line could be obtained as the nucleus of a Suslin scheme for the closed intervals, and non-Borel sets could be obtained this way as well. He also noted that the analytic subsets of $R$ were just the projections on an axis of the Borel subsets of $R^2$. The universal measurability of analytic sets (Corollary 7.42.1) was proved by Lusin and Sierpinski [L3] in 1918. (See also Lusin [L2].) Our proof of this fact is taken from Saks [S1]. We have also taken material on analytic sets from Kuratowski [K2], Dellacherie [D1], Meyer [M4], Bourbaki [B13], Parthasarathy [P1], and Bressler and Sion [B14]. Proposition 7.43 is due to Meyer and Traki [M5], but our proof is original. The proofs given here of Propositions 7.47 and 7.49 are very similar to those found in Blackwell et al. [B12]. The basic result of Proposition 7.49 is due to Jankov [J1], but was also worked out about the same time and published later by von Neumann [N1, Lemma 5, p. 448]. The Jankov-von Neumann result was strengthened by Mackey [M1, Theorem 6.3]. The history of this theorem is related by Wagner [W1, pp. 900-901]. Proposition 7.50(a) is due to Blackwell et al. [B12]. Proposition 7.50(b) together with its strengthened version Proposition 11.4 generalize a result by Brown and Purves [B15], who proved existence of a universally measurable $\varphi$ for the case where $f$ is Borel-measurable. Chapter 8 The finite horizon stochastic optimal control model of Chapter 8 is essentially a finite horizon version of the models considered by Blackwell [B8, B9], Strauch [S14], Hinderer [H4], Dynkin and Juškevič [D8], Blackwell et al. [B12], and others. With the exception of [B12], all these works consider Borel-measurable policies and obtain existence results of a $p$-$\varepsilon$-optimal nature (see the discussion of the previous section). We allow universally measurable policies and thereby obtain everywhere $\varepsilon$-optimal existence results.
While in Chapters 8 and 9 we concentrate on proving results that hold everywhere, the previously available results which allow only Borel-measurable policies and hold p almost everywhere can be readily obtained as corollaries. This follows from the following fact, whose proof we sketch shortly:
(F) If $X$ and $Y$ are Borel spaces, $p_0, p_1, \dots$ is a sequence of probability measures on $X$, and $\mu$ is a universally measurable map from $X$ to $Y$, then there is a Borel-measurable map $\mu'$ from $X$ to $Y$ such that, for $p_k$ almost every $x$,
$$\mu(x) = \mu'(x), \qquad k = 0,1,\dots.$$
As an example of how this observation can be used to obtain $p$ almost everywhere existence results from ours, consider Proposition 9.19. It states in part that if $\varepsilon > 0$ and the discount factor $\alpha$ is less than one, then an $\varepsilon$-optimal nonrandomized stationary policy exists, i.e., a policy $\pi = (\mu, \mu, \dots)$,
where $\mu$ is a universally measurable mapping from $S$ to $C$. Given $p_0$ on $S$, this policy generates a sequence of measures $p_1, p_2, \dots$ on $S$, where $p_k$ is the distribution of the $k$th state when the initial state has distribution $p_0$ and the policy $\pi$ is used. Let $\mu': S \to C$ be Borel-measurable and equal to $\mu$ for $p_k$ almost every $x$, $k = 0,1,\dots$. Let $\pi' = (\mu', \mu', \dots)$. Then it can be shown that for $p_0$ almost every initial state, the cost corresponding to $\pi'$ equals the cost corresponding to $\pi$, so $\pi'$ is a $p_0$-$\varepsilon$-optimal nonrandomized stationary Borel-measurable policy. The existence of such a $\pi'$ is a new result. This type of argument can be applied to all the existence results of Chapters 8 and 9. We now sketch a proof of (F). Assume first that $Y$ is a Borel subset of $[0,1]$. Then for $r \in [0,1]$, $r$ rational, the set
$$U(r) = \{x \in X \mid \mu(x) \le r\}$$
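is universally measurable. For every $k$, let $p_k^*[U(r)]$ be the outer measure of $U(r)$ with respect to $p_k$, and let $B_{k1}, B_{k2}, \dots$ be a decreasing sequence of Borel sets containing $U(r)$ such that
$$p_k^*[U(r)] = \lim_{j \to \infty} p_k(B_{kj}).$$
Let $B(r) = \bigcap_{k=0}^{\infty} \bigcap_{j=1}^{\infty} B_{kj}$. Then
$$p_k^*[U(r)] = p_k[B(r)], \qquad k = 0,1,\dots,$$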
and the argument of Lemma 7.27 applies. If $Y$ is an arbitrary Borel space, it is Borel isomorphic to a Borel subset of $[0,1]$ (Corollary 7.16.1), and (F) follows. Proposition 8.1 is due to Strauch [S14], and Proposition 8.2 is contained in Theorem 14.4 of Hinderer [H4]. Example 8.1 is taken from Blackwell [B9]. Proposition 8.3 is new, the strongest previous result along these lines being the existence of an analytically measurable $\varepsilon$-optimal policy when the one-stage cost function is nonpositive [B12]. Propositions 8.4 and 8.5 are new, as are the corollaries to Proposition 8.5. Lower semicontinuous models have received much attention in the literature (Maitra [M2]; Furukawa [F3]; Schäl [S3-S5]; Freedman [F1]; Himmelberg et al. [H3]). Our lower semicontinuous model differs somewhat from those in the literature, primarily in the form of the control constraint. Proposition 8.6 is closely related to the analysis in several of the previously mentioned references. Proposition 8.7 is due to Freedman [F1]. Chapter 9 Example 9.1 is a modification of Example 6.1 of Strauch [S14], and Proposition 9.1 is taken from Strauch [S14]. The conversion of the stochastic optimal control problem to the deterministic one was suggested
by Witsenhausen [W3] in a different context and carried out systematically for the first time here. This results in a simple proof of the lower semianalyticity of the infinite horizon optimal cost function (cf. Corollary 9.4.1 and Strauch [S14, Theorem 7.1]). Propositions 9.8 and 9.9 are due to Strauch [S14], as are the (D) and (N) parts of Proposition 9.10. The (P) part of Proposition 9.10 is new. Proposition 9.12 appears as Theorem 5.2.2 of Schäl [S5], but Corollary 9.12.1 is new. Proposition 9.14 is a special case of Theorem 14.5 of Hinderer [H4]. Propositions 9.15-9.17 and the corollaries to Proposition 9.17 are new, although Corollary 9.17.2 is very close to Theorem 13.3 of Schäl [S5]. Propositions 9.18-9.20 are new. Proposition 9.21 is an infinite horizon version of a finite horizon result due to Freedman [F1], except that the nonrandomized $\varepsilon$-optimal policy Freedman constructs may not be semi-Markov. Chapter 10 The use of the conditional distribution of the state given the available information as a basis for controlling systems with imperfect state information has been explored by several authors under various assumptions (see, for example, Åström [A2], Striebel [S15], and Sawaragi and Yoshikawa [S2]). The treatment of imperfect state information models with uncountable Borel state and action spaces, however, requires the existence of a regular conditional distribution with a measurable dependence on a parameter (Proposition 7.27), and this result is quite recent (Rhenius [R1]; Juškevič [J3]; Striebel [S16]). Chapter 10 is related to Chapter 3 of Striebel [S16] in that the general concept of a statistic sufficient for control is defined. We use such a statistic to construct a perfect state information model which is equivalent in the sense of Propositions 10.2 and 10.3 to the original imperfect state information model.
From this equivalence the validity of the dynamic programming algorithm and the existence of $\varepsilon$-optimal policies under the mild conditions of Chapters 8 and 9 follow. Striebel justifies use of a statistic sufficient for control by showing that under a very strong hypothesis [S16, Theorem 5.5.1] the dynamic programming algorithm is valid and an $\varepsilon$-optimal policy can be based on the sufficient statistic. The strong hypothesis arises from the need to specify the null sets in the range spaces of the statistic in such a way that this specification is independent of the policy employed. This need results from the inability to deal with the pointwise partial infima of multivariate functions without the machinery of universally measurable policies and lower semianalytic functions. Like Striebel, we show that the conditional distributions of the states based on the available information constitute a statistic sufficient for control (Proposition 10.5), as do the vectors of available information themselves (Proposition 10.6). The treatments of Rhenius [R1] and Juškevič [J3] are like our own in that perfect state information models which are equivalent to the original
one are defined. In his perfect state information model, Rhenius bases control on the observations and conditional distributions of the states, i.e., these objects are the states of his perfect state information model. It is necessary in Rhenius' framework for the controller to know the most recent observation, since this tells him which controls are admissible. We show in Proposition 10.5 that if there are no control constraints, then there is nothing to be gained by remembering the observations. In the model of Juškevič [J3], there are no control constraints and control is based on the past controls and conditional distributions. In this case, $\varepsilon$-optimal control is possible without reference to the past controls (Propositions 10.5, 8.3, 9.19, and 9.20), so our formulation is somewhat simpler and just as effective. Chapter 10 differs from all the previously mentioned works in that simple conditions which guarantee the existence of a statistic sufficient for control are given, and once this existence is established, all the results of Chapters 8 and 9 can be brought to bear on the imperfect state information model. Chapter 11 The use in Section 11.1 of limit measurability in dynamic programming is new. In particular, Proposition 11.3 is new, and as discussed earlier in regard to Proposition 7.50(b), a result by Brown and Purves [B15] is generalized in Proposition 11.4. Analytically measurable policies were introduced by Blackwell et al. [B12], whose work is referenced in Section 11.2. Borel space models with multiplicative cost fall within the framework of Furukawa and Iwamoto [F4-F5], and in [F5] the dynamic programming algorithm and a characterization of uniformly N-stage optimal policies are given. The remainder of Proposition 11.7 is new. Appendix A Outer integration has been used by several authors, but we have been unable to find a systematic development.
Appendix B Proposition B.6 was first reported by Suslin [S17], but the proof given here is taken from Kuratowski [K2, Section 38 VII]. According to Kuratowski and Mostowski [K4, p. 455], the limit $\sigma$-algebra $\mathscr{L}_X$ was introduced by Lusin, who called its members the "C-sets." A detailed discussion of this $\sigma$-algebra was given by Selivanovskij [S6] in 1928. Propositions B.9 and B.10 are fairly well known among set theorists, but we have been unable to find an accessible treatment. Proposition B.11 is new. Cenzer and Mauldin [C1] have also shown independently that $\mathscr{L}_X$ is closed under composition of functions, which is part of the result of Proposition B.11. Proposition B.12 is new. It seems plausible that there are an infinity of distinct $\sigma$-algebras between the limit $\sigma$-algebra and the universal $\sigma$-algebra that are suitable for dynamic programming. One promising method of constructing such $\sigma$-algebras involves the R-operator of descriptive set theory (see Kantorovitch and
Livenson [K1]). In a recent paper [B11], Blackwell has employed a different method to define the "Borel-programmable" $\sigma$-algebra and has shown it to have many of the same properties we establish in Appendix B for the limit $\sigma$-algebra. It is not known, however, whether the Borel-programmable $\sigma$-algebra satisfies a condition like Proposition B.12 and is thereby suitable for dynamic programming. It is easily seen that the limit $\sigma$-algebra is contained in Blackwell's Borel-programmable $\sigma$-algebra, but whether the two coincide is also unknown. Appendix C A detailed discussion of the exponential topology on the set of closed subsets of a topological space can be found in Kuratowski [K2-K3]. Properties of semicontinuous (K) functions are also proved there, primarily in Section 43 of [K3]. The Hausdorff metric is discussed in Section 38 of [H2].
Part I
Analysis of Dynamic Programming Models
Chapter 2
Monotone Mappings Underlying Dynamic Programming Models
This chapter formulates the basic abstract sequential optimization problem which is the subject of Part I. It also provides examples of special cases which include wide classes of problems of practical interest.
2.1 Notation and Assumptions

Our usage of mathematical notation is fairly standard. For the reader's convenience we mention here that we use $R$ to denote the real line and $R^*$ to denote the extended real line, i.e., $R^* = R \cup \{-\infty, \infty\}$. The sets $(-\infty, \infty] = R \cup \{\infty\}$ and $[-\infty, \infty) = R \cup \{-\infty\}$ will be written out explicitly. We will assume throughout that $R$ is equipped with the usual topology generated by the open intervals $(\alpha, \beta)$, $\alpha, \beta \in R$, and with the (Borel) $\sigma$-algebra generated by this topology. Similarly $R^*$ is equipped with the topology generated by the open intervals $(\alpha, \beta)$, $\alpha, \beta \in R$, together with the sets $(\gamma, \infty]$, $[-\infty, \gamma)$, $\gamma \in R$, and with the $\sigma$-algebra generated by this topology. The Cartesian product of sets $X_1, X_2, \dots, X_n$ is denoted $X_1X_2\cdots X_n$.

Parts I and II can be read independently. The reader may proceed directly to Part II if he so wishes.
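Part I works throughout in $R^*$, and item (6) of the conventions below fixes $\infty - \infty = -\infty + \infty = \infty$ and $0 \cdot (\pm\infty) = 0$, together with $\inf \varnothing = +\infty$ and $\sup \varnothing = -\infty$. A minimal Python sketch of those rules, assuming IEEE floats with `math.inf` as a stand-in for $\pm\infty$ (plain float arithmetic would instead return `nan` for both $\infty - \infty$ and $0 \cdot \infty$); the helper names are hypothetical. The last lines show why the distributive law stated there needs its side condition when $\alpha < 0$.

```python
import math

INF = math.inf  # floats stand in for elements of R*; math.inf plays the role of +infinity


def ext_add(a, b):
    """Sum in R*, with the convention inf - inf = -inf + inf = +inf."""
    if math.isinf(a) and math.isinf(b) and a != b:
        return INF
    return a + b


def ext_mul(a, b):
    """Product in R*, with the convention that zero times +-inf is zero."""
    if a == 0 or b == 0:
        return 0.0
    return a * b


def ext_inf(values):
    """Infimum of a finite collection; the infimum of the empty set is +inf."""
    return min(values, default=INF)


def ext_sup(values):
    """Supremum of a finite collection; the supremum of the empty set is -inf."""
    return max(values, default=-INF)


# Distributivity can fail when alpha < 0 and alpha_1 + alpha_2 has the form inf - inf.
alpha, a1, a2 = -1.0, INF, -INF
lhs = ext_mul(alpha, ext_add(a1, a2))                  # -1 * (inf - inf) = -1 * inf = -inf
rhs = ext_add(ext_mul(alpha, a1), ext_mul(alpha, a2))  # (-inf) + inf = +inf
```

The special cases are checked before falling back to ordinary float arithmetic, which already agrees with the remaining rules (for example $\alpha\infty = -\infty$ for $\alpha < 0$).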
The following definitions and conventions will apply throughout Part I.
(1) $S$ and $C$ are two given sets referred to as the state space and control space, respectively.
(2) For each $x \in S$, there is given a nonempty subset $U(x)$ of $C$ referred to as the control constraint set at $x$.
(3) We denote by $M$ the set of all functions $\mu: S \to C$ such that $\mu(x) \in U(x)$ for all $x \in S$. We denote by $\Pi$ the set of all sequences $\pi = (\mu_0, \mu_1, \ldots)$ such that $\mu_k \in M$ for all $k$. Elements of $\Pi$ are referred to as policies. Elements of $\Pi$ of the form $\pi = (\mu, \mu, \ldots)$, where $\mu \in M$, are referred to as stationary policies.
(4) We denote:
$F$: the set of all extended real-valued functions $J: S \to R^*$;
$B$: the Banach space of all bounded real-valued functions $J: S \to R$ with the supremum norm $\|\cdot\|$ defined by
$$\|J\| = \sup_{x \in S} |J(x)| \qquad \forall J \in B.$$
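(5) For all $J, J' \in F$ we write $J = J'$ if $J(x) = J'(x)$ for all $x \in S$, and $J \le J'$ if $J(x) \le J'(x)$ for all $x \in S$. For all $J \in F$ and $\varepsilon \in R$, we denote by $J + \varepsilon$ the function taking the value $J(x) + \varepsilon$ at each $x \in S$, i.e.,
$$(J + \varepsilon)(x) = J(x) + \varepsilon \qquad \forall x \in S.$$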
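(6) Throughout Part I the analysis is carried out within the set of extended real numbers $R^*$. We adopt the usual conventions regarding ordering, addition, and multiplication in $R^*$, except that we take
$$\infty - \infty = -\infty + \infty = \infty,$$
and we take the product of zero and infinity to be zero. In this way the sum and the product of any two extended real numbers are well defined. Division by zero or $\infty$ does not appear in our analysis. In particular, we adopt the following rules in calculations involving $\infty$ and $-\infty$:
$$\begin{aligned}
&\alpha + \infty = \infty + \alpha = \infty && \text{for } -\infty \le \alpha \le \infty,\\
&\alpha - \infty = -\infty + \alpha = -\infty && \text{for } -\infty \le \alpha < \infty,\\
&\alpha\infty = \infty\alpha = \infty, \quad \alpha(-\infty) = (-\infty)\alpha = -\infty && \text{for } 0 < \alpha \le \infty,\\
&\alpha\infty = \infty\alpha = -\infty, \quad \alpha(-\infty) = (-\infty)\alpha = \infty && \text{for } -\infty \le \alpha < 0,\\
&0\infty = \infty 0 = 0 = 0(-\infty) = (-\infty)0, \quad -(-\infty) = \infty, &&\\
&\inf \varnothing = +\infty, \quad \sup \varnothing = -\infty, && \text{where } \varnothing \text{ is the empty set.}
\end{aligned}$$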
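IEEE 754 floating point does not follow these conventions: in Python both `math.inf - math.inf` and `0 * math.inf` evaluate to NaN. A minimal Python sketch of the addition and multiplication rules above, assuming `math.inf` stands for $\infty$; the helper names `ext_add`, `ext_mul`, `ext_inf`, and `ext_sup` are hypothetical and introduced only for illustration:

```python
import math

INF = math.inf

def ext_add(a: float, b: float) -> float:
    """Addition in R* with the convention inf - inf = -inf + inf = inf."""
    if a == INF or b == INF:
        return INF          # alpha + inf = inf for every alpha, including -inf
    return a + b            # ordinary sum; a remaining -inf term gives -inf

def ext_mul(a: float, b: float) -> float:
    """Multiplication in R* with the convention 0 * (+-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0          # IEEE 754 would give nan for 0 * inf
    return a * b            # for nonzero factors the IEEE signs match the rules

def ext_inf(values) -> float:
    return min(values, default=INF)     # inf of the empty set is +inf

def ext_sup(values) -> float:
    return max(values, default=-INF)    # sup of the empty set is -inf

# The commutative and associative laws of addition survive these conventions.
samples = [-INF, -2.0, 0.0, 3.5, INF]
assert all(ext_add(a, b) == ext_add(b, a) for a in samples for b in samples)
assert all(ext_add(ext_add(a, b), c) == ext_add(a, ext_add(b, c))
           for a in samples for b in samples for c in samples)
```

Handling the two exceptional cases explicitly, instead of relying on the language's infinity arithmetic, keeps every operation well defined, which is exactly what the conventions are meant to ensure.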
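Under these rules the following laws of arithmetic are still valid:
$$\alpha_1 + \alpha_2 = \alpha_2 + \alpha_1, \qquad (\alpha_1 + \alpha_2) + \alpha_3 = \alpha_1 + (\alpha_2 + \alpha_3),$$
$$\alpha_1\alpha_2 = \alpha_2\alpha_1, \qquad (\alpha_1\alpha_2)\alpha_3 = \alpha_1(\alpha_2\alpha_3).$$
We also have
$$\alpha(\alpha_1 + \alpha_2) = \alpha\alpha_1 + \alpha\alpha_2$$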
if either $\alpha \ge 0$ or else $(\alpha_1 + \alpha_2)$ is not of the form $\infty - \infty$.
(7) For any sequence $\{J_k\}$ with $J_k \in F$ for all $k$, we denote by $\lim_{k\to\infty} J_k$ the pointwise limit of $\{J_k\}$ (assuming it is well defined as an extended real-valued function) and by $\limsup_{k\to\infty} J_k$ ($\liminf_{k\to\infty} J_k$) the pointwise limit superior (inferior) of $\{J_k\}$. For any collection $\{J_\alpha \mid \alpha \in A\} \subset F$ parameterized by the elements of a set $A$, we denote by $\inf_{\alpha \in A} J_\alpha$ the function taking the value $\inf_{\alpha \in A} J_\alpha(x)$ at each $x \in S$.

The Basic Mapping

We are given a function $H$ which maps $SCF$ (the Cartesian product of $S$, $C$, and $F$) into $R^*$, and we define for each $\mu \in M$ the mapping $T_\mu: F \to F$ by
$$T_\mu(J)(x) = H[x, \mu(x), J] \qquad \forall x \in S. \tag{1}$$
We define also the mapping $T: F \to F$ by
$$T(J)(x) = \inf_{u \in U(x)} H(x, u, J) \qquad \forall x \in S. \tag{2}$$
We denote by $T^k$, $k = 1, 2, \ldots$, the composition of $T$ with itself $k$ times. For convenience we also define $T^0(J) = J$ for all $J \in F$. For any $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ we denote by $(T_{\mu_0} T_{\mu_1} \cdots T_{\mu_k})$ the composition of the mappings $T_{\mu_0}, T_{\mu_1}, \ldots, T_{\mu_k}$, $k = 0, 1, \ldots$. The following assumption will be in effect throughout Part I.

Monotonicity Assumption For every $x \in S$, $u \in U(x)$, and $J, J' \in F$, we have
$$H(x, u, J) \le H(x, u, J') \qquad \text{if } J \le J'. \tag{3}$$
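When $S$ and every $U(x)$ are finite, the mappings (1) and (2) can be evaluated directly. The Python sketch below assumes such a finite model and stores each $J \in F$ as a dictionary. The deterministic example mapping $H(x, u, J) = g(x, u) + \alpha J[f(x, u)]$ and all states, controls, and numbers are hypothetical choices, not part of the text. The sketch also spot-checks the monotonicity assumption (3), as inherited by $T$, on one pair $J \le J'$:

```python
from typing import Callable, Dict, Hashable, List

State = Hashable
Control = Hashable
CostFunction = Dict[State, float]      # an element J of F, stored pointwise
DecisionRule = Dict[State, Control]    # an element mu of M

def T_mu(H: Callable, S: List[State], mu: DecisionRule, J: CostFunction) -> CostFunction:
    """Mapping (1): T_mu(J)(x) = H[x, mu(x), J]."""
    return {x: H(x, mu[x], J) for x in S}

def T(H: Callable, S: List[State], U: Dict[State, List[Control]], J: CostFunction) -> CostFunction:
    """Mapping (2): T(J)(x) = min of H(x, u, J) over the finite, nonempty set U(x)."""
    return {x: min(H(x, u, J) for u in U[x]) for x in S}

def T_power(H, S, U, J: CostFunction, k: int) -> CostFunction:
    """T^k(J), with T^0(J) = J."""
    for _ in range(k):
        J = T(H, S, U, J)
    return J

# Hypothetical two-state deterministic example.
ALPHA = 0.9
S = [0, 1]
U = {0: ["stay", "move"], 1: ["stay"]}
g = {(0, "stay"): 2.0, (0, "move"): 1.0, (1, "stay"): 0.0}
f = {(0, "stay"): 0, (0, "move"): 1, (1, "stay"): 1}

def H(x, u, J):
    return g[(x, u)] + ALPHA * J[f[(x, u)]]

J0 = {0: 0.0, 1: 0.0}
J_small, J_large = {0: 0.0, 1: 0.0}, {0: 1.0, 1: 5.0}   # J_small <= J_large
TJs, TJl = T(H, S, U, J_small), T(H, S, U, J_large)
assert all(TJs[x] <= TJl[x] for x in S)   # monotonicity carried over to T
```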
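The monotonicity assumption implies the following relations:
$$J \le J' \implies T(J) \le T(J') \qquad \forall J, J' \in F,$$
$$J \le J' \implies T_\mu(J) \le T_\mu(J') \qquad \forall J, J' \in F,\ \mu \in M.$$
These relations in turn imply the following facts for all $J \in F$:
$$J \le T(J) \implies T^k(J) \le T^{k+1}(J), \qquad k = 0, 1, \ldots,$$
$$J \ge T(J) \implies T^k(J) \ge T^{k+1}(J), \qquad k = 0, 1, \ldots,$$
$$J \le T_\mu(J)\ \forall \mu \in M \implies (T_{\mu_0} \cdots T_{\mu_k})(J) \le (T_{\mu_0} \cdots T_{\mu_{k+1}})(J), \qquad k = 0, 1, \ldots,\ \pi = (\mu_0, \mu_1, \ldots) \in \Pi,$$
$$J \ge T_\mu(J)\ \forall \mu \in M \implies (T_{\mu_0} \cdots T_{\mu_k})(J) \ge (T_{\mu_0} \cdots T_{\mu_{k+1}})(J), \qquad k = 0, 1, \ldots,\ \pi = (\mu_0, \mu_1, \ldots) \in \Pi.$$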
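Another fact that we shall be using frequently is that for each $J \in F$ and $\varepsilon > 0$, there exists a $\mu_\varepsilon \in M$ such that
$$T_{\mu_\varepsilon}(J)(x) \le \begin{cases} T(J)(x) + \varepsilon & \text{if } T(J)(x) > -\infty,\\ -1/\varepsilon & \text{if } T(J)(x) = -\infty. \end{cases}$$
This follows from the definition of the infimum in (2): at each $x \in S$ separately, choose $\mu_\varepsilon(x) \in U(x)$ satisfying the required inequality, which is possible because $M$ contains every function $\mu$ with $\mu(x) \in U(x)$ for all $x$. In particular, if $J$ is such that $T(J)(x) > -\infty$ for all $x \in S$, then for each $\varepsilon > 0$, there exists a $\mu_\varepsilon \in M$ such that
$$T_{\mu_\varepsilon}(J) \le T(J) + \varepsilon.$$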
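In general the existence of $\mu_\varepsilon$ is a pointwise consequence of the definition of the infimum, and no construction is required. When, however, one can enumerate controls whose costs approach the infimum and the value $T(J)(x)$ is known, the choice at a single $x$ can be sketched as follows. The function `select_eps_control` is hypothetical, and so is the example, which takes $U(0) = (-1, 0]$ with cost $u$, so that the infimum $-1$ is approached but never attained:

```python
import itertools
import math
from typing import Callable, Iterable, TypeVar

Ctrl = TypeVar("Ctrl")

def select_eps_control(candidates: Iterable[Ctrl], cost: Callable[[Ctrl], float],
                       inf_value: float, eps: float) -> Ctrl:
    """Return the first candidate u with cost(u) <= inf_value + eps, or with
    cost(u) <= -1/eps when inf_value = -inf (the two cases in the text).
    With an infinite iterable whose costs never reach the target, this loops forever."""
    target = inf_value + eps if inf_value > -math.inf else -1.0 / eps
    for u in candidates:
        if cost(u) <= target:
            return u
    raise ValueError("no candidate met the target")

# Infimum -1 over U(0) = (-1, 0] with cost u: approached, never attained.
candidates = (-1.0 + 2.0 ** -n for n in itertools.count(1))
u_eps = select_eps_control(candidates, lambda u: u, inf_value=-1.0, eps=0.01)
```

Here `u_eps` is the first control of the form $-1 + 2^{-n}$ lying within $0.01$ of the infimum.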
2.2 Problem Formulation

We are given a function $J_0 \in F$ satisfying …, and we consider for every policy $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ and positive integer $N$ the functions $J_{N,\pi} \in F$ and $J_\pi \in F$ defined by
$$J_{N,\pi}(x) = (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{N-1}})(J_0)(x) \qquad \forall x \in S,$$
$$J_\pi(x) = \lim_{N \to \infty} (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{N-1}})(J_0)(x) \qquad \forall x \in S. \tag{6}$$
For every result to be shown, appropriate assumptions will be in effect which guarantee that the function $J_\pi$ is well defined (i.e., the limit in (6) exists for all $x \in S$). We refer to $J_{N,\pi}$ as the N-stage cost function for $\pi$ and to $J_\pi$ as the cost function for $\pi$. Note that $J_{N,\pi}$ depends only on the first $N$ functions in $\pi$, while the remaining functions are superfluous. Thus we could have considered policies consisting of finite sequences of functions in connection with the N-stage problem, and this is in fact done in Chapter 8. However, there are notational advantages in using a common type of policy in finite and infinite horizon problems, and for this reason we have adopted such a notation for Part I. Throughout Part I we will be concerned with the N-stage optimization problem
$$\text{minimize } J_{N,\pi}(x) \quad \text{subject to } \pi \in \Pi, \tag{F}$$
and its infinite horizon version
$$\text{minimize } J_\pi(x) \quad \text{subject to } \pi \in \Pi. \tag{I}$$
We refer to problem (F) as the N-stage finite horizon problem and to problem (I) as the infinite horizon problem.
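In the definition of $J_{N,\pi}$ the operator $T_{\mu_{N-1}}$ is applied to $J_0$ first and $T_{\mu_0}$ last, and only $\mu_0, \ldots, \mu_{N-1}$ enter. A short Python sketch, assuming a hypothetical finite deterministic model (states, controls, costs, and $\alpha$ are illustrative), makes the order explicit:

```python
from typing import Dict, List

ALPHA = 0.9
S = [0, 1]
g = {(0, "stay"): 2.0, (0, "move"): 1.0, (1, "stay"): 0.0}
f = {(0, "stay"): 0, (0, "move"): 1, (1, "stay"): 1}

def H(x, u, J: Dict[int, float]) -> float:
    return g[(x, u)] + ALPHA * J[f[(x, u)]]

def T_mu(mu: Dict[int, str], J: Dict[int, float]) -> Dict[int, float]:
    return {x: H(x, mu[x], J) for x in S}

def n_stage_cost(policy: List[Dict[int, str]], N: int, J0: Dict[int, float]) -> Dict[int, float]:
    """J_{N,pi} = (T_{mu_0} T_{mu_1} ... T_{mu_{N-1}})(J0); entries past index N-1 are ignored."""
    J = dict(J0)
    for mu in reversed(policy[:N]):   # the innermost operator T_{mu_{N-1}} acts first
        J = T_mu(mu, J)
    return J

J0 = {0: 0.0, 1: 0.0}
stay = {0: "stay", 1: "stay"}
move = {0: "move", 1: "stay"}
pi = [stay, move, stay]               # mu_0 = stay, mu_1 = move, mu_2 = stay
```

Swapping $\mu_0$ and $\mu_1$ changes $J_{2,\pi}(0)$ from $2.9$ to $1.0$, while appending further functions to the policy leaves $J_{2,\pi}$ unchanged.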
For a fixed $x \in S$, we denote by $J_N^*(x)$ and $J^*(x)$ the optimal costs for these problems, i.e.,
$$J_N^*(x) = \inf_{\pi \in \Pi} J_{N,\pi}(x) \qquad \forall x \in S, \tag{7}$$
$$J^*(x) = \inf_{\pi \in \Pi} J_\pi(x) \qquad \forall x \in S.$$
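We refer to the function $J_N^*$ as the N-stage optimal cost function and to the function $J^*$ as the optimal cost function. We say that a policy $\pi^* \in \Pi$ is N-stage optimal at $x \in S$ if $J_{N,\pi^*}(x) = J_N^*(x)$ and optimal at $x \in S$ if $J_{\pi^*}(x) = J^*(x)$. We say that $\pi^* \in \Pi$ is N-stage optimal (respectively optimal) if $J_{N,\pi^*} = J_N^*$ (respectively $J_{\pi^*} = J^*$). A policy $\pi^* = (\mu_0^*, \mu_1^*, \ldots)$ will be called uniformly N-stage optimal if the policy $(\mu_i^*, \mu_{i+1}^*, \ldots)$ is $(N-i)$-stage optimal for all $i = 0, 1, \ldots, N-1$. Thus if a policy is uniformly N-stage optimal, it is also N-stage optimal, but not conversely. For a stationary policy $\pi = (\mu, \mu, \ldots) \in \Pi$, we write $J_\pi = J_\mu$. Thus a stationary policy $\pi^* = (\mu^*, \mu^*, \ldots)$ is optimal if $J^* = J_{\mu^*}$. Given $\varepsilon > 0$, we say that a policy $\pi_\varepsilon \in \Pi$ is N-stage ε-optimal if
$$J_{N,\pi_\varepsilon}(x) \le \begin{cases} J_N^*(x) + \varepsilon & \text{if } J_N^*(x) > -\infty,\\ -1/\varepsilon & \text{if } J_N^*(x) = -\infty. \end{cases}$$
We say that $\pi_\varepsilon \in \Pi$ is ε-optimal if
$$J_{\pi_\varepsilon}(x) \le \begin{cases} J^*(x) + \varepsilon & \text{if } J^*(x) > -\infty,\\ -1/\varepsilon & \text{if } J^*(x) = -\infty. \end{cases}$$
If $\{\varepsilon_n\}$ is a sequence of positive numbers with $\varepsilon_n \downarrow 0$, we say that a sequence of policies $\{\pi_n\}$ exhibits $\{\varepsilon_n\}$-dominated convergence to optimality if
$$\lim_{n \to \infty} J_{N,\pi_n} = J_N^*,$$
and, for $n = 2, 3, \ldots$,
$$J_{N,\pi_n}(x) \le \begin{cases} J_N^*(x) + \varepsilon_n & \text{if } J_N^*(x) > -\infty,\\ J_{N,\pi_{n-1}}(x) + \varepsilon_n & \text{if } J_N^*(x) = -\infty. \end{cases}$$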
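These definitions are pointwise, so for a finite $S$ and a known $J_N^*$ they can be checked mechanically. A hypothetical Python sketch follows; the function names are illustrative. The limit requirement $\lim_{n} J_{N,\pi_n} = J_N^*$ of dominated convergence cannot be decided from finitely many terms and is not covered:

```python
import math
from typing import Dict, Hashable

Costs = Dict[Hashable, float]

def is_n_stage_eps_optimal(J_pi: Costs, J_star: Costs, eps: float) -> bool:
    """J_{N,pi}(x) <= J*_N(x) + eps where J*_N(x) > -inf,
    and J_{N,pi}(x) <= -1/eps where J*_N(x) = -inf, at every state x."""
    for x, opt in J_star.items():
        bound = opt + eps if opt > -math.inf else -1.0 / eps
        if not J_pi[x] <= bound:
            return False
    return True

def dominated_step_ok(J_curr: Costs, J_prev: Costs, J_star: Costs, eps_n: float) -> bool:
    """One step (n >= 2) of {eps_n}-dominated convergence: J_curr = J_{N,pi_n}, J_prev = J_{N,pi_{n-1}}."""
    for x, opt in J_star.items():
        bound = opt + eps_n if opt > -math.inf else J_prev[x] + eps_n
        if not J_curr[x] <= bound:
            return False
    return True
```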
2.3 Application to Specific Models

A large number of sequential optimization problems of practical interest may be viewed as special cases of the abstract problems (F) and (I). In this section we shall describe several such problems that will be of continuing interest to us throughout Part I. Detailed treatments of some of these problems can be found in DPSC.†

† We denote by DPSC the textbook by Bertsekas, "Dynamic Programming and Stochastic Control," Academic Press, New York, 1976.
2.3.1 Deterministic Optimal Control

Consider the mapping $H: SCF \to R^*$ defined by
$$H(x, u, J) = g(x, u) + \alpha J[f(x, u)]. \tag{9}$$
Our standing assumptions throughout Part I relating to this mapping are:
(1) The functions $g$ and $f$ map $SC$ into $[-\infty, \infty]$ and $S$, respectively.
(2) The scalar $\alpha$ is positive.
The mapping $H$ clearly satisfies the monotonicity assumption. Let $J_0$ be identically zero, i.e.,
$$J_0(x) = 0 \qquad \forall x \in S.$$
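Then the corresponding N-stage optimization problem (F) can be written as
$$\begin{aligned}
&\text{minimize } J_{N,\pi}(x_0) = \sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k)]\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k)], \quad \mu_k \in M, \quad k = 0, \ldots, N-1.
\end{aligned} \tag{10}$$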
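This is a finite horizon deterministic optimal control problem. The scalar $\alpha$ is known as the discount factor. The infinite horizon problem (I) can be written as
$$\begin{aligned}
&\text{minimize } J_\pi(x_0) = \lim_{N \to \infty} \sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k)]\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k)], \quad \mu_k \in M, \quad k = 0, 1, \ldots.
\end{aligned} \tag{11}$$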
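This limit exists if any one of the following three conditions is satisfied:
$$g(x, u) \ge 0 \qquad \forall x \in S,\ u \in U(x), \tag{12}$$
$$g(x, u) \le 0 \qquad \forall x \in S,\ u \in U(x), \tag{13}$$
$$\alpha < 1 \ \text{ and } \ 0 \le g(x, u) \le b \quad \text{for some } b \in (0, \infty) \text{ and all } x \in S,\ u \in U(x). \tag{14}$$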
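Under (12) the partial sums in (11) are nondecreasing in $N$, and under (13) they are nonincreasing. Under (14) they are nondecreasing and bounded by $b/(1-\alpha)$. In each case the limit exists in $R^*$. A small Python sketch follows one trajectory of a hypothetical three-state cyclic system (the functions `f`, `g`, `mu` and the value of $\alpha$ are illustrative assumptions). It shows the monotone, bounded partial sums under (14), and that adding a constant $c$ to $g$ adds $c/(1-\alpha)$ to the infinite horizon cost:

```python
from typing import Callable, List

def discounted_partial_sums(x0, f: Callable, g: Callable, mu: Callable,
                            alpha: float, N: int) -> List[float]:
    """Partial sums sum_{k<n} alpha^k g[x_k, mu(x_k)] for n = 1..N,
    along x_{k+1} = f[x_k, mu(x_k)] under the stationary policy (mu, mu, ...)."""
    sums, total, weight, x = [], 0.0, 1.0, x0
    for _ in range(N):
        u = mu(x)
        total += weight * g(x, u)
        sums.append(total)
        weight *= alpha
        x = f(x, u)
    return sums

alpha, b = 0.5, 2.0
f = lambda x, u: (x + 1) % 3       # hypothetical cyclic system on S = {0, 1, 2}
g = lambda x, u: float(x)          # 0 <= g <= b, so condition (14) holds
mu = lambda x: 0                   # the only available control
sums = discounted_partial_sums(0, f, g, mu, alpha, 200)
shifted = discounted_partial_sums(0, f, lambda x, u: g(x, u) + 1.0, mu, alpha, 200)
```

With $c = 1$ and $\alpha = 0.5$ the two truncated costs differ by essentially $1/(1 - 0.5) = 2$.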
Every result to be shown for problem (11) will explicitly assume one of these three conditions. Note that the requirement $0 \le g(x, u) \le b$ in (14) is no more strict than the usual requirement $|g(x, u)| \le b/2$. This is true because adding the constant $b/2$ to $g$ increases the cost corresponding to every policy by $b/[2(1-\alpha)]$, and the problem remains essentially unaffected. Deterministic optimal control problems such as (10) and (11) and their stochastic counterparts under the countability assumption of the next subsection have been studied extensively in DPSC (Chapters 2, 6, and 7). They are given here in their stationary form in the sense that the state and control spaces $S$ and $C$, the control constraint $U(\cdot)$, the system function $f$, and the
cost per stage $g$ do not change from one stage to the next. When this is not the case, we are faced with a nonstationary problem. Such a problem, however, may be converted to a stationary problem by using a procedure described in Section 10.1 and in DPSC (Section 6.7). For this reason, we will not consider nonstationary problems further in Part I. Notice that within our formulation it is possible to handle state constraints of the form $x_k \in X$, $k = 0, 1, \ldots$, by defining $g(x, u) = \infty$ whenever $x \notin X$. This is our reason for allowing $g$ to take the value $\infty$. Generalized versions of problems (10) and (11) are obtained if the scalar $\alpha$ is replaced by a function $\alpha: SC \to R^*$ with $0 \le \alpha(x, u)$ for all $x \in S$, $u \in U(x)$, so that the discount factor depends on the current state and control. It will become evident to the reader that our general results for problems (F) and (I) are applicable to these more general deterministic problems.

2.3.2 Stochastic Optimal Control-Countable Disturbance Space
Consider the mapping $H: SCF \to R^*$ defined by
$$H(x, u, J) = E\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\}, \tag{15}$$
where the following are assumed:
(1) The parameter $w$ takes values in a countable set $W$ with given probability distribution $p(dw \mid x, u)$ depending on $x$ and $u$, and $E\{\cdot \mid x, u\}$ denotes expected value with respect to this distribution. (See a detailed definition below.)
(2) The functions $g$ and $f$ map $SCW$ into $[-\infty, \infty]$ and $S$, respectively.
(3) The scalar $\alpha$ is positive.
Our usage of expected value in (15) is consistent with the definition of the usual integral (Section 7.4.4) and the outer integral (Appendix A), where the σ-algebra on $W$ is taken to be the set of all subsets of $W$. Thus if $w^i$, $i = 1, 2, \ldots$, are the elements of $W$, $(p^1, p^2, \ldots)$ is any probability distribution on $W$, and $z: W \to R^*$ is a function, we define
$$E\{z(w)\} = \sum_{i=1}^\infty p^i z^+(w^i) - \sum_{i=1}^\infty p^i z^-(w^i),$$
where
$$z^+(w^i) = \max\{0, z(w^i)\}, \qquad z^-(w^i) = \max\{0, -z(w^i)\}, \qquad i = 1, 2, \ldots.$$
In view of our convention $\infty - \infty = \infty$, the expected value $E\{z(w)\}$ is well defined for every function $z: W \to R^*$ and every probability distribution $(p^1, p^2, \ldots)$ on $W$. In particular, if we denote by $(p^1(x, u), p^2(x, u), \ldots)$ the
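probability distribution $p(dw \mid x, u)$ on $W = \{w^1, w^2, \ldots\}$, then (15) can be written as
$$H(x, u, J) = \sum_{i=1}^\infty p^i(x, u) \max\{0, g(x, u, w^i) + \alpha J[f(x, u, w^i)]\} - \sum_{i=1}^\infty p^i(x, u) \max\{0, -(g(x, u, w^i) + \alpha J[f(x, u, w^i)])\}.$$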
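A point where caution is necessary in the use of expected value defined this way is that for two functions $z_1: W \to R^*$ and $z_2: W \to R^*$, the equality
$$E\{z_1(w) + z_2(w)\} = E\{z_1(w)\} + E\{z_2(w)\}$$
need not always hold. It is guaranteed to hold if (a) $E\{z_1^+(w)\} < \infty$ and $E\{z_2^+(w)\} < \infty$, or (b) $E\{z_1^-(w)\} < \infty$ and $E\{z_2^-(w)\} < \infty$, or (c) $E\{z_1^+(w)\} < \infty$ and $E\{z_1^-(w)\} < \infty$ (see Lemma 7.11). We always have, however,
$$E\{z_1(w) + z_2(w)\} \le E\{z_1(w)\} + E\{z_2(w)\}.$$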
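For a distribution with finite support (a simplification of the countable case, in which the two sums would be series), the definition of $E\{z(w)\}$ under the conventions $\infty - \infty = \infty$ and $0 \cdot \infty = 0$ can be sketched in Python as follows. The helpers `ext_expectation` and `ext_sum` are hypothetical and do not come from the text:

```python
import math
from typing import Sequence

def ext_expectation(probs: Sequence[float], values: Sequence[float]) -> float:
    """E{z(w)} = sum_i p^i z+(w^i) - sum_i p^i z-(w^i) for a finitely supported distribution."""
    # Terms with p^i = 0 are skipped, which implements 0 * (+-inf) = 0.
    pos = sum(p * max(0.0, z) for p, z in zip(probs, values) if p > 0)
    neg = sum(p * max(0.0, -z) for p, z in zip(probs, values) if p > 0)
    if pos == math.inf:
        return math.inf      # covers inf - inf = inf
    return pos - neg         # finite - finite, or finite - inf = -inf

def ext_sum(a: float, b: float) -> float:
    """Pointwise sum in R* with inf - inf = inf."""
    return math.inf if math.inf in (a, b) else a + b
```

With finite support the two sides of the inequality above always coincide. A strict inequality requires an infinite $W$: for example, if $z_1 \ge 0$ is real-valued with $E\{z_1^+(w)\} = \infty$ and $z_2 = -z_1$, the left side is $E\{0\} = 0$ while the right side is $\infty - \infty = \infty$.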
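It is clear that the mapping $H$ of (15) satisfies the monotonicity assumption. Let $J_0$ be identically zero, i.e.,
$$J_0(x) = 0 \qquad \forall x \in S.$$
Then if $g(x, u, w) > -\infty$ for all $x, u, w$, the N-stage cost function can be written as
$$J_{N,\pi}(x_0) = E\Big\{E\Big\{\cdots E\Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k] \,\Big|\, x_{N-1}, \mu_{N-1}(x_{N-1})\Big\} \cdots \,\Big|\, x_1, \mu_1(x_1)\Big\} \,\Big|\, x_0, \mu_0(x_0)\Big\}, \tag{17}$$
where the states $x_1, x_2, \ldots, x_{N-1}$ satisfy
$$x_{k+1} = f[x_k, \mu_k(x_k), w_k], \qquad k = 0, 1, \ldots, N-2. \tag{18}$$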
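The interchange of expectation and summation in (17) is valid, since $g(x, u, w) > -\infty$ for all $x, u, w$, and we have, for any probability space $(\Omega, \mathcal{F}, \nu)$, measurable $h: \Omega \to R^*$, and $\lambda \in (-\infty, +\infty]$,
$$\int_\Omega (\lambda + h)\, d\nu = \lambda + \int_\Omega h\, d\nu.$$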
When Eq. (18) is used successively to express the states $x_1, x_2, \ldots, x_{N-1}$ exclusively in terms of $w_0, w_1, \ldots, w_{N-2}$ and $x_0$, one can see from (17) that $J_{N,\pi}(x_0)$ is given in terms of successive iterated integration over $w_{N-1}, \ldots, w_0$. For each $x_0 \in S$ and $\pi \in \Pi$ the probability distributions $p^i(x_0, \mu_0(x_0)), \ldots, p^i(x_{N-1}, \mu_{N-1}(x_{N-1}))$, $i = 1, 2, \ldots$, over $W$ specify, by the product measure theorem [A1, Theorem 2.6.2], a unique product measure on the cross product $W^N$ of $N$ copies of $W$. If Fubini's theorem [A1, Theorem 2.6.4] is applicable, then from (17) the N-stage cost function $J_{N,\pi}(x_0)$ can be alternatively expressed as
$$J_{N,\pi}(x_0) = E\Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}, \tag{19}$$
where this expectation is taken with respect to the product measure on $W^N$ and the states $x_1, x_2, \ldots, x_{N-1}$ are expressed in terms of $w_0, w_1, \ldots, w_{N-2}$ and $x_0$ via (18). Fubini's theorem can be applied if the expected value in (19) is not of the form $\infty - \infty$, i.e., if either
$$E\Big\{\max\Big(0, \sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big)\Big\} < \infty \quad \text{or} \quad E\Big\{\max\Big(0, -\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big)\Big\} < \infty.$$
In particular, this is true if either
$$E\{\max(0, -g[x_k, \mu_k(x_k), w_k])\} < \infty, \qquad k = 0, \ldots, N-1,$$
or if $g$ is uniformly bounded above or below by a real number. If $J_{N,\pi}(x_0)$ can be expressed as in (19) for each $x_0 \in S$ and $\pi \in \Pi$, then the N-stage problem can be written as
$$\begin{aligned}
&\text{minimize } J_{N,\pi}(x_0) = E\Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, \ldots, N-1,
\end{aligned}$$
which is the traditional form of an N-stage stochastic optimal control problem and is also the starting point for the N-stage model of Part II (Definition 8.3).
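For a finite $W$ and a finite horizon, the iterated form (17) and the product-measure form (19) can be compared numerically. The Python sketch below assumes a hypothetical integer-state model with $W = \{-1, 1\}$, a control-dependent distribution, and a nonnegative cost, so that Fubini's theorem applies. All model data are illustrative:

```python
import itertools
from typing import Callable, List

ALPHA = 0.9
W = (-1, 1)

def f(x: int, u: int, w: int) -> int:
    return x + u + w                         # system equation (18)

def g(x: int, u: int, w: int) -> float:
    return float(x * x + u + (w == 1))       # nonnegative under the policy below

def p(w: int, x: int, u: int) -> float:
    return 0.7 if w == (1 if u == 0 else -1) else 0.3    # p(w | x, u)

policy: List[Callable[[int], int]] = [lambda x: 0 if x <= 0 else -1] * 3   # mu_0, mu_1, mu_2

def iterated_cost(x: int, k: int, N: int) -> float:
    """Form (17): nested conditional expectations, one disturbance at a time."""
    if k == N:
        return 0.0
    u = policy[k](x)
    return sum(p(w, x, u) * (g(x, u, w) + ALPHA * iterated_cost(f(x, u, w), k + 1, N))
               for w in W)

def product_measure_cost(x0: int, N: int) -> float:
    """Form (19): a single expectation over W^N of the total discounted cost."""
    total = 0.0
    for ws in itertools.product(W, repeat=N):
        x, prob, cost = x0, 1.0, 0.0
        for k, w in enumerate(ws):
            u = policy[k](x)
            prob *= p(w, x, u)
            cost += ALPHA ** k * g(x, u, w)
            x = f(x, u, w)
        total += prob * cost
    return total
```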
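The corresponding infinite horizon problem is (cf. Definition 9.3)
$$\begin{aligned}
&\text{minimize } J_\pi(x_0) = \lim_{N \to \infty} E\Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, 1, \ldots.
\end{aligned} \tag{20}$$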
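This limit exists under any one of the conditions:
$$g(x, u, w) \ge 0 \qquad \forall x \in S,\ u \in U(x),\ w \in W, \tag{21}$$
$$g(x, u, w) \le 0 \qquad \forall x \in S,\ u \in U(x),\ w \in W, \tag{22}$$
$$\alpha < 1 \ \text{ and } \ 0 \le g(x, u, w) \le b \quad \text{for some } b \in (0, \infty) \text{ and all } x \in S,\ u \in U(x),\ w \in W. \tag{23}$$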
Every result to be shown for problem (20) will explicitly assume one of these three conditions. Similarly as for the deterministic problem, a generalized version of the stochastic problem is obtained if the scalar $\alpha$ is replaced by a function $\alpha: SCW \to R^*$ satisfying $0 \le \alpha(x, u, w)$ for all $(x, u, w)$. The mapping $H$ takes the form
$$H(x, u, J) = E\{g(x, u, w) + \alpha(x, u, w) J[f(x, u, w)] \mid x, u\}.$$
This case covers certain semi-Markov decision problems (see [J2]). We will not be further concerned with this mapping and will leave it to the interested reader to obtain specific results relating to the corresponding problems (F) and (I) by specializing abstract results obtained subsequently in Part I. Also, nonstationary versions of the problem may be treated by reduction to the stationary case (see Section 10.1 or DPSC, Section 6.7).

The countability assumption on $W$ is satisfied for many problems of interest. For example, it is satisfied in stochastic control problems involving Markov chains with a finite or countable number of states (see, e.g., [D3], [K6]). When the set $W$ is not countable, then matters are complicated by the need to define the expected value
$$E\{g[x, \mu(x), w] + \alpha J[f(x, \mu(x), w)] \mid x, \mu(x)\}$$
for every $\mu \in M$. There are two approaches that one can employ to overcome this difficulty. One possibility is to define the expected value as an outer integral, as we do in the next subsection. The other approach is the subject of Part II, where we impose an appropriate measurable space structure on $S$, $C$, and $W$ and require that the functions $\mu \in M$ be measurable. Under these circumstances a reformulation of the stochastic optimal control problem into the form of the abstract problems (F) or (I) is not straightforward. Nonetheless, such a reformulation is possible as well as useful, as we will demonstrate in Chapter 9.
2.3.3 Stochastic Optimal Control-Outer Integral Formulation

Consider the mapping $H: SCF \to R^*$ defined by
$$H(x, u, J) = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\}, \tag{24}$$
where the following are assumed:
(1) The parameter $w$ takes values in a measurable space $(W, \mathcal{F})$. For each fixed $(x, u) \in SC$, a probability measure $p(dw \mid x, u)$ on $(W, \mathcal{F})$ is given and $E^*\{\cdot \mid x, u\}$ in (24) denotes the outer integral (see Appendix A) with respect to that measure. Thus we may write, in the notation of Appendix A,
$$H(x, u, J) = \int^* \{g(x, u, w) + \alpha J[f(x, u, w)]\}\, p(dw \mid x, u).$$
(2) The functions $g$ and $f$ map $SCW$ into $[-\infty, \infty]$ and $S$, respectively.
(3) The scalar $\alpha$ is positive.
We note that mappings (9) and (15) of the previous two subsections are special cases of the mapping $H$ of (24). The mapping (9) (deterministic problem) is obtained from (24) when the set $W$ consists of a single element. The mapping (15) (stochastic problem with countable disturbance space) is the special case of (24) where $W$ is a countable set and $\mathcal{F}$ is the σ-algebra consisting of all subsets of $W$. For this reason, in our subsequent analysis we will not further consider the mappings (9) and (15), but will focus attention on the mapping (24). Clearly $H$ as defined by (24) satisfies the monotonicity assumption. Just as for the models of the previous two subsections, we take
$$J_0(x) = 0 \qquad \forall x \in S$$
and consider the corresponding N-stage and infinite horizon problems (F) and (I). If appropriate measurability assumptions are placed on $S$, $C$, $f$, $g$, and $p$, then the N-stage cost $J_{N,\pi}(x) = (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(x)$ can be rewritten in terms of ordinary integration for every policy $\pi = (\mu_0, \mu_1, \ldots)$ for which $\mu_k$, $k = 0, 1, \ldots$, is appropriately measurable. To see this, suppose that $S$ has a σ-algebra $\mathcal{S}$, $C$ has a σ-algebra $\mathcal{C}$, and $\mathcal{B}$ is the Borel σ-algebra on $R^*$. Suppose $f$ is $(\mathcal{S}\mathcal{C}\mathcal{F}, \mathcal{S})$-measurable and $g$ is $(\mathcal{S}\mathcal{C}\mathcal{F}, \mathcal{B})$-measurable, where $\mathcal{S}\mathcal{C}\mathcal{F}$ denotes the product σ-algebra on $SCW$. Assume that for each fixed $B \in \mathcal{F}$, $p(B \mid x, u)$ is $\mathcal{S}\mathcal{C}$-measurable in $(x, u)$, and consider a policy $\pi = (\mu_0, \mu_1, \ldots)$, where $\mu_k$ is $(\mathcal{S}, \mathcal{C})$-measurable for all $k$. These conditions guarantee that $T_{\mu_k}(J)$, given by
$$T_{\mu_k}(J)(x) = \int \{g[x, \mu_k(x), w] + \alpha J(f[x, \mu_k(x), w])\}\, p(dw \mid x, \mu_k(x)),$$
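is $\mathcal{S}$-measurable for all $k$ and all $J \in F$ that are $\mathcal{S}$-measurable. Just as in the previous subsection, for a fixed $x_0 \in S$ and $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$, the probability measures $p(\cdot \mid x_0, \mu_0(x_0)), \ldots, p(\cdot \mid x_{N-1}, \mu_{N-1}(x_{N-1}))$ together with the system equation
$$x_{k+1} = f[x_k, \mu_k(x_k), w_k], \qquad k = 0, 1, \ldots, N-2, \tag{25}$$
define a unique product measure $p(d(w_0, \ldots, w_{N-1}) \mid x_0, \pi)$ on the cross product $W^N$ of $N$ copies of $W$. [Note that $x_k$, $k = 0, 1, \ldots, N-1$, can be expressed as a measurable function of $(w_0, \ldots, w_{N-1})$ via (25).] Using the calculation of the previous subsection, we have that if $g(x, u, w) > -\infty$ for all $x, u, w$, and Fubini's theorem is applicable, then
$$J_{N,\pi}(x_0) = \int_{W^N} \Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}\, p(d(w_0, \ldots, w_{N-1}) \mid x_0, \pi),$$
where $x_1, x_2, \ldots, x_{N-1}$ are expressed in terms of $w_0, w_1, \ldots, w_{N-2}$ and $x_0$ via (25). Also, as in the previous subsection, Fubini's theorem applies if either
$$\int_{W^N} \max\Big(0, \sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big)\, p(d(w_0, \ldots, w_{N-1}) \mid x_0, \pi) < \infty$$
or the same integral with the sum replaced by its negative is finite.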
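Thus if appropriate measurability conditions are placed on $S$, $C$, $W$, $f$, $g$, and $p(dw \mid x, u)$ and Fubini's theorem applies, then the N-stage cost $J_{N,\pi}$ corresponding to measurable $\pi$ reduces to the traditional form
$$J_{N,\pi}(x_0) = E\Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}.$$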
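This observation is significant in view of the fact that
$$\inf_{\pi \in \Pi} J_{N,\pi}(x) \le \inf_{\pi \in \bar\Pi} J_{N,\pi}(x) \qquad \forall x \in S,$$
where $\bar\Pi$ denotes the set of all policies $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ such that $\mu_k$ is $(\mathcal{S}, \mathcal{C})$-measurable for every $k$; the inequality holds simply because $\bar\Pi \subset \Pi$.
Thus, if an optimal (ε-optimal) policy $\pi^*$ can be found for problem (F) and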
$\pi^* \in \bar\Pi$ (i.e., is measurable), then $\pi^*$ is optimal (ε-optimal) for the problem
$$\text{minimize } J_{N,\pi}(x) \quad \text{subject to } \pi \in \bar\Pi,$$
which is a traditional stochastic optimal control problem. These remarks illustrate how one can utilize the outer integration framework in an initial formulation of a particular problem and subsequently show via further (and hopefully simple) analysis that attention can be restricted to the class of measurable policies $\bar\Pi$ for which the cost function admits a traditional interpretation. The main advantage that the outer integral formulation offers is simplicity. One does not need to introduce an elaborate topological and measure-theoretic structure such as the one of Part II in an initial formulation of the problem. In addition, the policy iteration algorithm of Chapter 4 is applicable to the problem of this section but cannot be justified for the corresponding model of Part II. The outer integral formulation has, however, important limitations, which become apparent in the treatment of problems with imperfect state information by means of sufficient statistics (Chapter 10).

2.3.4 Stochastic Optimal Control-Multiplicative Cost Functional
Consider the mapping $H: SCF \to R^*$ defined by
$$H(x, u, J) = E\{g(x, u, w) J[f(x, u, w)] \mid x, u\}. \tag{26}$$
We make the same assumptions on $w$, $g$, and $f$ as in Section 2.3.2, i.e., $w$ takes values in a countable set $W$ with a given probability distribution depending on $x$ and $u$. We assume further that
$$0 \le g(x, u, w) \qquad \forall x \in S,\ u \in U(x),\ w \in W. \tag{27}$$
In view of (27), the mapping $H$ of (26) satisfies the monotonicity assumption. We take
$$J_0(x) = 1 \qquad \forall x \in S$$
and consider the problems (F) and (I). Problem (F) corresponds to the stochastic optimal control problem
$$\begin{aligned}
&\text{minimize } J_{N,\pi}(x_0) = E\{g[x_0, \mu_0(x_0), w_0]\, g[x_1, \mu_1(x_1), w_1] \cdots g[x_{N-1}, \mu_{N-1}(x_{N-1}), w_{N-1}]\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, 1, \ldots, N-1,
\end{aligned} \tag{28}$$
and problem (I) corresponds to the infinite horizon version of (28). The limit as $N \to \infty$ in (28) exists if $g(x, u, w) \ge 1$ for every $x, u, w$ or $0 \le g(x, u, w) \le 1$ for every $x, u, w$. A special case of (28) is the exponential cost functional problem
$$\begin{aligned}
&\text{minimize } E\Big\{\exp\Big[\sum_{k=0}^{N-1} g'[x_k, \mu_k(x_k), w_k]\Big]\Big\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, 1, \ldots, N-1,
\end{aligned}$$
where $g'$ is some function mapping $SCW$ into $(-\infty, \infty]$.

2.3.5 Minimax Control

Consider the mapping $H: SCF \to R^*$ defined by
$$H(x, u, J) = \sup_{w \in W(x, u)} \{g(x, u, w) + \alpha J[f(x, u, w)]\}, \tag{29}$$
where the following are assumed:
(1) The parameter $w$ takes values in a set $W$, and $W(x, u)$ is a nonempty subset of $W$ for each $x \in S$, $u \in U(x)$.
(2) The functions $g$ and $f$ map $SCW$ into $[-\infty, \infty]$ and $S$, respectively.
(3) The scalar $\alpha$ is positive.
Clearly the monotonicity assumption is satisfied. We take
$$J_0(x) = 0 \qquad \forall x \in S.$$
If $g(x, u, w) > -\infty$ for all $x, u, w$, the corresponding N-stage problem (F) can also be written as
$$\begin{aligned}
&\text{minimize } J_{N,\pi}(x_0) = \sup_{\substack{w_k \in W[x_k, \mu_k(x_k)]\\ k = 0, \ldots, N-1}} \Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, 1, \ldots, N-1,
\end{aligned} \tag{30}$$
and this is an N-stage minimax control problem. The infinite horizon version is
$$\begin{aligned}
&\text{minimize } J_\pi(x_0) = \lim_{N \to \infty} \sup_{\substack{w_k \in W[x_k, \mu_k(x_k)]\\ k = 0, \ldots, N-1}} \Big\{\sum_{k=0}^{N-1} \alpha^k g[x_k, \mu_k(x_k), w_k]\Big\}\\
&\text{subject to } x_{k+1} = f[x_k, \mu_k(x_k), w_k], \quad \mu_k \in M, \quad k = 0, 1, \ldots.
\end{aligned} \tag{31}$$
The limit in (31) exists under any one of the conditions (21), (22), or (23). This problem contains as a special case the problem of infinite time reachability examined in Bertsekas [B2]. Problems (30) and (31) arise also in the analysis of sequential zero-sum games.
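Since the models of this section differ only in the mapping $H$, a single routine computing $T^N(J_0)$ serves all of them once the spaces are finite. The Python sketch below assumes a hypothetical two-state, two-control model with $W = \{0, 1\}$ and $W(x, u) = W$. It instantiates the mappings (9), (15), (26), and (29). The deterministic case is obtained by fixing $w = 0$, in line with the remark that (9) is the one-element-$W$ case of (24). All numbers are illustrative:

```python
from typing import Callable, Dict

ALPHA = 0.9
S = [0, 1]
U = {0: [0, 1], 1: [0, 1]}
W = [0, 1]

def f(x: int, u: int, w: int) -> int:
    return (x + u + w) % 2

def g(x: int, u: int, w: int) -> float:
    return 1.0 + x + 0.5 * u - 0.25 * w            # positive, as (27) requires

def p(w: int, x: int, u: int) -> float:
    return 0.8 if w == 0 else 0.2

def H_det(x, u, J):        # mapping (9), disturbance fixed at w = 0
    return g(x, u, 0) + ALPHA * J[f(x, u, 0)]

def H_stoch(x, u, J):      # mapping (15) with a finite W
    return sum(p(w, x, u) * (g(x, u, w) + ALPHA * J[f(x, u, w)]) for w in W)

def H_mult(x, u, J):       # mapping (26), multiplicative cost
    return sum(p(w, x, u) * g(x, u, w) * J[f(x, u, w)] for w in W)

def H_minimax(x, u, J):    # mapping (29) with W(x, u) = W
    return max(g(x, u, w) + ALPHA * J[f(x, u, w)] for w in W)

def dp(H: Callable, J0: Dict[int, float], N: int) -> Dict[int, float]:
    """Return T^N(J0), where T(J)(x) = min over u in U(x) of H(x, u, J)."""
    J = dict(J0)
    for _ in range(N):
        J = {x: min(H(x, u, J) for u in U[x]) for x in S}
    return J

zero, one = {x: 0.0 for x in S}, {x: 1.0 for x in S}   # J0 = 0, and J0 = 1 for (26)
```

Because a maximum over $w$ dominates any average over $w$, monotonicity of $H$ implies that the minimax values are never below the stochastic ones, which the test checks up to rounding.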
Chapter 3

Finite Horizon Models

3.1 General Remarks and Assumptions

Consider the N-stage optimization problem
$$\text{minimize } J_{N,\pi}(x) = (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(x) \quad \text{subject to } \pi = (\mu_0, \mu_1, \ldots) \in \Pi,$$
where for every $\mu \in M$, $J \in F$, and $x \in S$ we have
$$T_\mu(J)(x) = H[x, \mu(x), J], \qquad T(J)(x) = \inf_{u \in U(x)} H(x, u, J).$$
Experience with a large variety of sequential optimization problems suggests that the N-stage optimal cost function $J_N^*$ satisfies
$$J_N^* = \inf_{\pi \in \Pi} J_{N,\pi} = T^N(J_0),$$
and hence is obtained after N steps of the DP algorithm. In our more general setting, however, we shall need to place additional conditions on $H$ in order to guarantee this equality. Consider the following two assumptions.

Assumption F.1 If $\{J_k\} \subset F$ is a sequence satisfying $J_{k+1} \le J_k$ for all $k$ and $H(x, u, J_1) < \infty$ for all $x \in S$, $u \in U(x)$, then
$$\lim_{k \to \infty} H(x, u, J_k) = H\Big(x, u, \lim_{k \to \infty} J_k\Big) \qquad \forall x \in S,\ u \in U(x).$$
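Assumption F.2 There exists a scalar $\alpha \in (0, \infty)$ such that for all scalars $r \in (0, \infty)$ and functions $J \in F$, we have
$$H(x, u, J) \le H(x, u, J + r) \le H(x, u, J) + \alpha r \qquad \forall x \in S,\ u \in U(x).$$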
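For the mapping (15) with a finite $W$ and real-valued $J$, adding $r$ to $J$ raises $H(x, u, J)$ by exactly $\alpha r$, so F.2 holds with the scalar equal to the discount factor. A Python sketch of a numerical spot-check on a hypothetical model follows. The helper `satisfies_F2` and all model data are illustrative, and a check on a few $J$ and $r$ is evidence, not a proof:

```python
from typing import Callable, Dict

ALPHA = 0.9
S = [0, 1, 2]
U = {x: [0, 1] for x in S}
W = [0, 1, 2]
PROBS = [0.5, 0.3, 0.2]

def f(x: int, u: int, w: int) -> int:
    return (x + u + w) % 3

def g(x: int, u: int, w: int) -> float:
    return float(u - w)

def H(x: int, u: int, J: Dict[int, float]) -> float:
    """Mapping (15) with the finite distribution PROBS on W."""
    return sum(pw * (g(x, u, w) + ALPHA * J[f(x, u, w)]) for pw, w in zip(PROBS, W))

def satisfies_F2(H: Callable, J: Dict[int, float], r: float, a: float, tol: float = 1e-12) -> bool:
    """Check H(x,u,J) <= H(x,u,J+r) <= H(x,u,J) + a*r at every (x, u), up to rounding tol."""
    J_shift = {x: v + r for x, v in J.items()}
    for x in S:
        for u in U[x]:
            base, shifted = H(x, u, J), H(x, u, J_shift)
            if not (base <= shifted + tol and shifted <= base + a * r + tol):
                return False
    return True
```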
We will also consider the following assumption, which is admittedly somewhat complicated. It will enable us to obtain a stronger result on the existence of nearly optimal policies (Proposition 3.2) than can be obtained under F.2. The assumption is satisfied for the stochastic optimal control problem of Section 2.3.3, as we show in the last section of this chapter.

Assumption F.3 There is a scalar $\beta \in (0, \infty)$ such that if $J \in F$, $\{J_n\} \subset F$, and $\{\varepsilon_n\} \subset R$ satisfy
$$J = \lim_{n \to \infty} J_n, \qquad J \le J_n, \quad n = 1, 2, \ldots,$$
$$J_n(x) \le \begin{cases} J(x) + \varepsilon_n, & n = 1, 2, \ldots \text{ and } x \in S \text{ with } J(x) > -\infty,\\ J_{n-1}(x) + \varepsilon_n, & n = 2, 3, \ldots \text{ and } x \in S \text{ with } J(x) = -\infty, \end{cases}$$
then there exists a sequence $\{\mu_n\} \subset M$ such that
$$\lim_{n \to \infty} T_{\mu_n}(J_n) = T(J),$$
$$T_{\mu_n}(J_n)(x) \le \begin{cases} T(J)(x) + \beta\varepsilon_n, & n = 1, 2, \ldots,\ x \in S \text{ with } T(J)(x) > -\infty,\\ T_{\mu_{n-1}}(J_{n-1})(x) + \beta\varepsilon_n, & n = 2, 3, \ldots,\ x \in S \text{ with } T(J)(x) = -\infty. \end{cases}$$
Each of our results will require at most one of the preceding assumptions. As we show in Section 3.3, at least one of these assumptions is satisfied by every specific model considered in Section 2.3.

3.2 Main Results

The central question regarding the finite horizon problem is whether $J_N^* = T^N(J_0)$, in which case the N-stage optimal cost function $J_N^*$ can be obtained via the DP algorithm that successively computes $T(J_0), T^2(J_0), \ldots$. A related question is whether optimal or nearly optimal policies exist. The results of this section provide conditions under which the answer to these questions is affirmative.

Proposition 3.1 (a) Let F.1 hold and assume that $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \Pi$, and $k = 1, 2, \ldots, N$. Then
$$J_N^* = T^N(J_0).$$
(b) Let F.2 hold and assume that $J_k^*(x) > -\infty$ for all $x \in S$ and $k = 1, 2, \ldots, N$. Then
$$J_N^* = T^N(J_0),$$
and for every $\varepsilon > 0$, there exists an N-stage ε-optimal policy, i.e., a $\pi_\varepsilon \in \Pi$ such that
$$J_N^* \le J_{N,\pi_\varepsilon} \le J_N^* + \varepsilon.$$
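Proof (a) For each $k = 0, 1, \ldots, N-1$, consider a sequence $\{\mu_k^i\} \subset M$ such that
$$\lim_{i \to \infty} T_{\mu_k^i}[T^{N-k-1}(J_0)] = T^{N-k}(J_0), \qquad T_{\mu_k^{i+1}}[T^{N-k-1}(J_0)] \le T_{\mu_k^i}[T^{N-k-1}(J_0)], \qquad i = 0, 1, \ldots.$$
(Such a sequence exists because the choice of $\mu_k^i(x)$ can be made separately at each $x \in S$.) By using F.1 and the assumption that $J_{k,\pi}(x) < \infty$, we have
$$\begin{aligned}
J_N^* &\le \inf_{i_0} \cdots \inf_{i_{N-1}} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-1}^{i_{N-1}}})(J_0)\\
&= \inf_{i_0} \cdots \inf_{i_{N-2}} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-2}^{i_{N-2}}})\Big[\inf_{i_{N-1}} T_{\mu_{N-1}^{i_{N-1}}}(J_0)\Big]\\
&= \inf_{i_0} \cdots \inf_{i_{N-2}} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-2}^{i_{N-2}}})[T(J_0)]\\
&\;\;\vdots\\
&= T^N(J_0),
\end{aligned}$$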
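where the last equality is obtained by repeating the process used to obtain the previous equalities. On the other hand, it is clear from the definitions of Chapter 2 that $T^N(J_0) \le J_N^*$, and hence $J_N^* = T^N(J_0)$.

(b) We use induction. The result clearly holds for N = 1. Assume that it holds for N = k, i.e., $J_k^* = T^k(J_0)$ and for a given $\varepsilon > 0$ there is a $\pi_\varepsilon \in \Pi$ with $J_{k,\pi_\varepsilon} \le J_k^* + \varepsilon$. Using F.2 we have, for all $\mu \in M$,
$$J_{k+1}^* \le T_\mu(J_{k,\pi_\varepsilon}) \le T_\mu(J_k^* + \varepsilon) \le T_\mu(J_k^*) + \alpha\varepsilon.$$
Taking the infimum over $\mu \in M$ and letting $\varepsilon \downarrow 0$, we obtain $J_{k+1}^* \le T(J_k^*)$, and by using the induction hypothesis we obtain $J_{k+1}^* \le T^{k+1}(J_0)$. On the other hand, we clearly have $T^{k+1}(J_0) \le J_{k+1}^*$, and hence $T^{k+1}(J_0) = J_{k+1}^*$.
For any $\bar\varepsilon > 0$, let $\bar\pi = (\mu_0, \mu_1, \ldots)$ be such that $J_{k,\bar\pi} \le J_k^* + \bar\varepsilon/(2\alpha)$, and let $\bar\mu \in M$ be such that $T_{\bar\mu}(J_k^*) \le T(J_k^*) + \bar\varepsilon/2$ (possible since $T(J_k^*) = J_{k+1}^* > -\infty$). Consider the policy $\pi_{\bar\varepsilon} = (\bar\mu, \mu_0, \mu_1, \ldots)$. Then
$$J_{k+1,\pi_{\bar\varepsilon}} = T_{\bar\mu}(J_{k,\bar\pi}) \le T_{\bar\mu}(J_k^*) + \bar\varepsilon/2 \le T(J_k^*) + \bar\varepsilon = J_{k+1}^* + \bar\varepsilon.$$
The induction is complete.

Q.E.D.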
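When $S$ and all $U(x)$ are finite and $H$ is real-valued, the infimum in (2) is a minimum over finitely many controls, and $J_N^* = T^N(J_0)$ can be confirmed by brute force. The following Python sketch enumerates every N-stage policy of a hypothetical three-state deterministic model and compares the result with N steps of the DP algorithm. The model data are illustrative:

```python
import itertools
from typing import Dict, Iterator

ALPHA = 0.8
S = [0, 1, 2]
U = {0: [0, 1], 1: [0, 1], 2: [0]}

def f(x: int, u: int) -> int:
    return min(x + u, 2)

def g(x: int, u: int) -> float:
    return [3.0, 1.0, 0.5][x] + 0.2 * u

def H(x: int, u: int, J: Dict[int, float]) -> float:
    return g(x, u) + ALPHA * J[f(x, u)]

def T(J: Dict[int, float]) -> Dict[int, float]:
    return {x: min(H(x, u, J) for u in U[x]) for x in S}

def T_mu(mu: Dict[int, int], J: Dict[int, float]) -> Dict[int, float]:
    return {x: H(x, mu[x], J) for x in S}

def decision_rules() -> Iterator[Dict[int, int]]:
    """All mu in M, i.e., all choices mu(x) in U(x)."""
    for choice in itertools.product(*(U[x] for x in S)):
        yield dict(zip(S, choice))

def optimal_by_enumeration(N: int, J0: Dict[int, float]) -> Dict[int, float]:
    """J*_N(x) = min over all (mu_0, ..., mu_{N-1}) of J_{N,pi}(x)."""
    rules = list(decision_rules())
    best = {x: float("inf") for x in S}
    for policy in itertools.product(rules, repeat=N):
        J = dict(J0)
        for mu in reversed(policy):
            J = T_mu(mu, J)
        best = {x: min(best[x], J[x]) for x in S}
    return best

def dp(N: int, J0: Dict[int, float]) -> Dict[int, float]:
    J = dict(J0)
    for _ in range(N):
        J = T(J)
    return J
```

The enumeration grows like $|M|^N$, which is why it serves only as a check on the DP recursion and not as an algorithm.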
Proposition 3.1(a) may be strengthened by using the following assumption in place of F.1.

Assumption F.1′ The function $J_0$ satisfies
$$J_0 \ge T(J_0),$$
and if $\{J_k\} \subset F$ is a sequence satisfying $J_{k+1} \le J_k \le J_0$ for all $k$, then
$$\lim_{k \to \infty} H(x, u, J_k) = H\Big(x, u, \lim_{k \to \infty} J_k\Big) \qquad \forall x \in S,\ u \in U(x).$$
The following corollary is obtained by verbatim repetition of the proof of Proposition 3.1(a).

Corollary 3.1.1 Let F.1′ hold. Then
$$J_N^* = T^N(J_0).$$
Proposition 3.1 and Corollary 3.1.1 may fail to hold if their assumptions are slightly relaxed.
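COUNTEREXAMPLE 1 Take $S = \{0\}$, $C = U(0) = (-1, 0]$, $J_0(0) = 0$, $H(0, u, J) = u$ if $-1 < J(0)$, and $H(0, u, J) = J(0) + u$ if $J(0) \le -1$. Then $(T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(0) = \mu_0(0)$ and $J_N^*(0) = -1$, while $T^N(J_0)(0) = -N$ for every N. (Each $T_{\mu_k}$ maps a value in $(-1, 0]$ to $\mu_k(0) \in (-1, 0]$, so the second branch of $H$ is never used along a policy, whereas $T(J_0)(0) = -1$ reaches it.) Here the assumptions $J_{k,\pi}(0) < \infty$ and $J_k^*(0) > -\infty$ are satisfied, but F.1, F.1′, and F.2 are violated.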
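Counterexample 1 is small enough to trace exactly. In the Python sketch below, the single state 0 is left implicit. `T` evaluates the infimum over $(-1, 0]$ in closed form: $H$ is increasing in $u$, so the infimum is the limit $u \to -1$, which no actual control attains. `policy_cost` applies the mappings $T_{\mu_k}$. The function names are hypothetical:

```python
def H(u: float, J_value: float) -> float:
    """H(0, u, J) of Counterexample 1, as a function of u and J(0)."""
    return u if J_value > -1 else J_value + u

def T(J_value: float) -> float:
    """T(J)(0) = inf of H(0, u, J) over u in (-1, 0], computed in closed form."""
    return -1.0 if J_value > -1 else J_value - 1.0

def dp_value(N: int, J0: float = 0.0) -> float:
    """T^N(J0)(0)."""
    J = J0
    for _ in range(N):
        J = T(J)
    return J

def policy_cost(controls: list, J0: float = 0.0) -> float:
    """(T_{mu_0} ... T_{mu_{N-1}})(J0)(0) for controls = [mu_0(0), ..., mu_{N-1}(0)]."""
    J = J0
    for u in reversed(controls):
        if not -1 < u <= 0:
            raise ValueError("controls must lie in U(0) = (-1, 0]")
        J = H(u, J)
    return J
```

Every policy cost stays above $-1$, so $J_4^*(0) = -1$, while `dp_value(4)` returns $-4$: the DP recursion enters the second branch of $H$, which no actual control ever reaches.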
COUNTEREXAMPLE 2 Take $S = \{0, 1\}$, $C = U(0) = U(1) = (-\infty, 0]$, $J_0(0) = J_0(1) = 0$, $H(0, u, J) = u$ if $J(1) = -\infty$, $H(0, u, J) = 0$ if $J(1) > -\infty$, and $H(1, u, J) = u$. Then $(T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(0) = 0$ and $(T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(1) = \mu_0(1)$ for all $N \ge 1$. Hence $J_N^*(0) = 0$ and $J_N^*(1) = -\infty$. On the other hand, we have $T^N(J_0)(0) = T^N(J_0)(1) = -\infty$ for all $N \ge 2$. Here F.2 and the assumption $J_{k,\pi}(x) < \infty$ for all $x \in S$ are satisfied, but F.1, F.1′, and the assumption $J_k^*(x) > -\infty$ for all $x \in S$ are violated.
The following counterexample is a stochastic optimal control problem with countable disturbance space as discussed in Section 2.3.2. We use the notation introduced there.
COUNTEREXAMPLE 3 Let $N = 2$, $S = \{0, 1\}$, $C = U(0) = U(1) = R$, $W = \{2, 3, \ldots\}$,
$$p(w = k \mid x, u) = k^{-2}\Big(\sum_{n=2}^\infty n^{-2}\Big)^{-1} \quad \text{for } k = 2, 3, \ldots,\ x \in S,\ u \in C,$$
$f(0, u, w) = f(1, u, w) = 1$ for all $u \in C$, $w \in W$, and $g(0, u, w) = w$, $g(1, u, w) = u$ for all $u \in C$, $w \in W$. Then a straightforward calculation shows that $J_2^*(0) = \infty$ and $J_2^*(1) = -\infty$, while $T^2(J_0)(0) = -\infty$ and $T^2(J_0)(1) = -\infty$ (the key facts are that $E\{w\} = \infty$, since $\sum_k k \cdot k^{-2}$ diverges, and that $T(J_0)(1) = \inf_{u \in R} u = -\infty$). Here F.1 and F.2 are satisfied, but F.1′ and the assumptions $J_{k,\pi}(x) < \infty$ for all $x, \pi, k$, and $J_k^*(x) > -\infty$ for all $x$ and $k$ are all violated.
The next counterexample is a deterministic optimal control problem as discussed in Section 2.3.1. We use the notation introduced there.
COUNTEREXAMPLE 4 Let $N = 2$, $S = \{0, 1, \ldots\}$, $C = U(x) = (0, \infty)$ for all $x \in S$, $f(x, u) = 0$ for all $x \in S$, $u \in C$, $g(0, u) = -u$ for all $u \in U(0)$, and $g(x, u) = x$ for all $u \in U(x)$ if $x \ne 0$. Then for $\pi \in \Pi$ and $x \ne 0$, we have $J_{2,\pi}(x) = x - \alpha\mu_1(0)$, so that $J_2^*(x) = -\infty$ for all $x \in S$. On the other hand, clearly there is no two-stage ε-optimal policy for any $\varepsilon > 0$: such a policy would need $x - \alpha\mu_1(0) \le -1/\varepsilon$ for every $x = 1, 2, \ldots$, which is impossible because $\mu_1(0)$ is a single real number. Here F.1, F.2, and the assumption $J_{k,\pi}(x) < \infty$ for all $x, \pi, k$ are satisfied, and indeed we have $J_2^*(x) = T^2(J_0)(x) = -\infty$ for all $x \in S$. However, the assumption $J_k^*(x) > -\infty$ for all $x$ and $k$ is violated. As Counterexample 4 shows, there may not exist an N-stage ε-optimal policy if we have $J_k^*(x) = -\infty$ for some $k$ and $x \in S$. The following proposition establishes, under appropriate assumptions, the existence of a sequence of nearly optimal policies whose cost functions converge to the optimal cost function.

Proposition 3.2 Let F.3 hold and assume $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \Pi$, and $k = 1, 2, \ldots, N$. Then
$$J_N^* = T^N(J_0).$$
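Furthermore, if $\{\varepsilon_n\}$ is a sequence of positive numbers with $\varepsilon_n \downarrow 0$, then there exists a sequence of policies $\{\pi_n\}$ exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality. In particular, if in addition $J_N^*(x) > -\infty$ for all $x \in S$, then for every $\varepsilon > 0$ there exists an N-stage ε-optimal policy.

Proof We will prove by induction that for $K \le N$ we have $J_K^* = T^K(J_0)$, and furthermore, given $K$ and $\{\varepsilon_n\}$ with $\varepsilon_n \downarrow 0$, $\varepsilon_n > 0$ for all $n$, there exists a sequence $\{\pi_n\} \subset \Pi$ such that for all $n$,
$$\lim_{n \to \infty} J_{K,\pi_n} = J_K^*, \tag{1}$$
$$J_{K,\pi_n}(x) \le J_K^*(x) + \varepsilon_n \qquad \forall x \in S \text{ with } J_K^*(x) > -\infty, \tag{2}$$
$$J_{K,\pi_n}(x) \le J_{K,\pi_{n-1}}(x) + \varepsilon_n \qquad \forall x \in S \text{ with } J_K^*(x) = -\infty. \tag{3}$$
We show that this holds for $K = 1$. We have
$$J_1^*(x) = \inf_{\pi \in \Pi} J_{1,\pi}(x) = \inf_{\mu \in M} H[x, \mu(x), J_0] = T(J_0)(x) \qquad \forall x \in S.$$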
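It is also clear that, given $\{\varepsilon_n\}$, there exists a sequence $\{\pi_n\} \subset \Pi$ satisfying (1)-(3) for $K = 1$. Assume that the result is true for $K = N - 1$. Let $\beta$ be the scalar specified in F.3. Consider a sequence $\{\varepsilon_n\} \subset R$ with $\varepsilon_n > 0$ for all $n$ and $\lim_{n \to \infty} \varepsilon_n = 0$, and let $\{\bar\pi_n\} \subset \Pi$, $\bar\pi_n = (\bar\mu_0^n, \bar\mu_1^n, \ldots)$, be such that
$$\lim_{n \to \infty} J_{N-1,\bar\pi_n} = J_{N-1}^*, \tag{4}$$
$$J_{N-1,\bar\pi_n}(x) \le J_{N-1}^*(x) + \beta^{-1}\varepsilon_n \qquad \forall x \in S \text{ with } J_{N-1}^*(x) > -\infty, \tag{5}$$
$$J_{N-1,\bar\pi_n}(x) \le J_{N-1,\bar\pi_{n-1}}(x) + \beta^{-1}\varepsilon_n \qquad \forall x \in S \text{ with } J_{N-1}^*(x) = -\infty. \tag{6}$$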
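The assumption $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \Pi$, $k = 1, 2, \ldots, N$, guarantees that we have …

Without loss of generality we assume that $\sum_{n=1}^\infty \varepsilon_n < \infty$. Then Assumption F.3 together with (4)-(6) implies that there exists a sequence $\{\bar\mu_n\} \subset M$ such that, for all $n$,
$$\lim_{n \to \infty} T_{\bar\mu_n}(J_{N-1,\bar\pi_n}) = T(J_{N-1}^*), \tag{8}$$
$$T_{\bar\mu_n}(J_{N-1,\bar\pi_n})(x) \le T(J_{N-1}^*)(x) + \varepsilon_n \qquad \text{if } T(J_{N-1}^*)(x) > -\infty, \tag{9}$$
$$T_{\bar\mu_n}(J_{N-1,\bar\pi_n})(x) \le T_{\bar\mu_{n-1}}(J_{N-1,\bar\pi_{n-1}})(x) + \varepsilon_n \qquad \text{if } T(J_{N-1}^*)(x) = -\infty. \tag{10}$$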
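We have by the induction hypothesis $J_{N-1}^* = T^{N-1}(J_0)$, and it is clear that $T^N(J_0) \le J_N^*$. Hence,
$$T(J_{N-1}^*) = T^N(J_0) \le J_N^*. \tag{11}$$
We also have
$$J_N^* \le \lim_{n \to \infty} T_{\bar\mu_n}(J_{N-1,\bar\pi_n}), \tag{12}$$
since $T_{\bar\mu_n}(J_{N-1,\bar\pi_n})$ is the N-stage cost of the policy $(\bar\mu_n, \bar\mu_0^n, \bar\mu_1^n, \ldots)$ and is therefore bounded below by $J_N^*$ for every $n$. Combining (8), (11), and (12), we obtain
$$J_N^* = T^N(J_0) = \lim_{n \to \infty} T_{\bar\mu_n}(J_{N-1,\bar\pi_n}). \tag{13}$$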
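Let $\pi_n = (\bar\mu_n, \bar\mu_0^n, \bar\mu_1^n, \ldots)$. Then from (8)-(10) and (13), we obtain, for all $n$,
$$\lim_{n \to \infty} J_{N,\pi_n} = J_N^*,$$
$$J_{N,\pi_n}(x) \le J_N^*(x) + \varepsilon_n \qquad \forall x \in S \text{ with } J_N^*(x) > -\infty,$$
$$J_{N,\pi_n}(x) \le J_{N,\pi_{n-1}}(x) + \varepsilon_n \qquad \forall x \in S \text{ with } J_N^*(x) = -\infty,$$
and the induction argument is complete.

Q.E.D.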
Despite the need for various assumptions in order to guarantee $J_N^* = T^N(J_0)$, the following result, which establishes the validity of the DP algorithm as a means for constructing optimal policies, requires no assumption other than monotonicity of $H$.

Proposition 3.3 A policy $\pi^* = (\mu_0^*, \mu_1^*, \ldots)$ is uniformly N-stage optimal if and only if
$$(T_{\mu_k^*} T^{N-k-1})(J_0) = T^{N-k}(J_0), \qquad k = 0, 1, \ldots, N-1. \tag{14}$$

Proof
Let (14)hold. Then we have, for k = 0,1,. . . ,N - 1,
On the other hand, we have J g - , I (T,:. . . T,;- l)(Jo),while T N - k ( J oI) J g - k . Hence, Jg = (T,;. . . T,; - ,)(Jo)and z*is uniformly N-stage optimal. Conversely, let z*be uniformly N-stage optimal. Then
-,
by definition. We also have for every EM, (T,T)(Jo)= (T,T,;_,)(J0), which implies that
T 2 ( ~= o )inf (T,T)(Jo)= inf (T,T,;_ $ J o ) 2 J $ = (T,;_ 2TP;-, ) ( J o ) 2 T2(Jo).
Therefore T ~ ( J=~J $) = (Tpr;_2TP;,)(Jo)= ( T P ; - ~ T ) ( J o ) . Proceeding similarly, we show all the equations in (14).
Q.E.D.
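The DP algorithm behind Proposition 3.3 can be sketched for a problem with finitely many states and controls. This is a minimal sketch. It assumes finite control sets, so that each infimum in the DP recursion is attained (the situation of Corollary 3.3.1(a)). The example dynamics f, cost g, discount ALPHA, and the helper names dp_with_policy and policy_cost are hypothetical and do not come from the text. The control used at stage k is the minimizer computed from T^{N-k-1}(J_0), as required by relation (14).

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Cost = Dict[Hashable, float]        # a function J: S -> R, stored as a dict
Policy = Dict[Hashable, Hashable]   # a function mu: S -> C


def dp_with_policy(states: Iterable[Hashable],
                   controls: Callable[[Hashable], Iterable[Hashable]],
                   H: Callable[[Hashable, Hashable, Cost], float],
                   J0: Cost, N: int) -> Tuple[List[Cost], List[Policy]]:
    """Return [T^0(J0), ..., T^N(J0)] and (mu_0, ..., mu_{N-1}), where mu_k attains
    the infimum of H[x, u, T^(N-k-1)(J0)] over the finite control set."""
    states = list(states)
    iterates: List[Cost] = [dict(J0)]     # iterates[j] = T^j(J0)
    minimizers: List[Policy] = []         # minimizers[j] attains the inf in H[., ., T^j(J0)]
    for _ in range(N):
        J = iterates[-1]
        mu = {x: min(controls(x), key=lambda u, x=x: H(x, u, J)) for x in states}
        iterates.append({x: H(x, mu[x], J) for x in states})
        minimizers.append(mu)
    # Stage k has N - k stages to go, so it uses the minimizer computed from T^(N-k-1)(J0).
    return iterates, [minimizers[N - k - 1] for k in range(N)]


def policy_cost(policy: List[Policy], states: Iterable[Hashable],
                H: Callable[[Hashable, Hashable, Cost], float], J0: Cost) -> Cost:
    """(T_{mu_0} ... T_{mu_{len-1}})(J0); the last mapping is applied to J0 first."""
    states = list(states)
    J = dict(J0)
    for mu in reversed(policy):
        J = {x: H(x, mu[x], J) for x in states}
    return J


# Made-up deterministic example in the form of Section 2.3.1.
ALPHA = 0.9
S = [0, 1, 2]


def U(x): return [-1, 0, 1]
def f(x, u): return min(2, max(0, x + u))
def g(x, u): return (x - 1) ** 2 + abs(u)
def H(x, u, J): return g(x, u) + ALPHA * J[f(x, u)]


J0 = {x: 0.0 for x in S}
iterates, policy = dp_with_policy(S, U, H, J0, N=4)
```

Every tail (T_{mu_k} ... T_{mu_{N-1}})(J_0) computed by policy_cost should coincide with T^{N-k}(J_0), which is what uniform N-stage optimality asks for.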
As a corollary of Proposition 3.3, we have the following.

Corollary 3.3.1 (a) There exists a uniformly N-stage optimal policy if and only if the infimum in the relation

$$T^{k+1}(J_0)(x) = \inf_{u\in U(x)} H[x, u, T^k(J_0)] \tag{15}$$

is attained for each $x \in S$ and $k = 0, 1, \ldots, N-1$. (b) If there exists a uniformly N-stage optimal policy, then $J_N^* = T^N(J_0)$.

We now turn to establishing conditions for existence of a uniformly N-stage optimal policy. For this we need compactness assumptions. If $C$ is a Hausdorff topological space, we say that a subset $U$ of $C$ is compact if every collection of open sets that covers $U$ has a finite subcollection that covers $U$. The empty set in particular is considered to be compact. Any sequence $\{u_n\}$ belonging to a compact set $U \subset C$ has at least one accumulation point $\bar u \in U$. This is a point $\bar u \in U$ such that every (open) neighborhood of $\bar u$ contains an infinite number of elements of $\{u_n\}$. Furthermore, all accumulation points of $\{u_n\}$ belong to $U$. If $\{U_n\}$ is a sequence of nonempty compact subsets of $C$ and $U_n \supset U_{n+1}$ for all $n$, then the intersection $\bigcap_{n=1}^\infty U_n$ is nonempty and compact. This yields the following lemma, which will be useful in what follows.

Lemma 3.1 Let $C$ be a Hausdorff space, $f: C \to R^*$ a function, and $U$ a subset of $C$. Assume that the set $U(\lambda)$ defined by

$$U(\lambda) = \{u \in U \mid f(u) \le \lambda\}$$

is compact for each $\lambda \in R$. Then $f$ attains a minimum over $U$.
Proof If $f(u) = \infty$ for all $u \in U$, then every $u \in U$ attains the minimum. Otherwise $f^* = \inf\{f(u) \mid u \in U\} < \infty$. Let $\{\lambda_n\}$ be a scalar sequence such that $\lambda_n > \lambda_{n+1}$ for all $n$ and $\lambda_n \to f^*$. Then the sets $U(\lambda_n)$ are nonempty and compact, and they satisfy $U(\lambda_n) \supset U(\lambda_{n+1})$ for all $n$. Hence, the intersection $\bigcap_{n=1}^\infty U(\lambda_n)$ is nonempty and compact. Let $u^*$ be any point in the intersection. Then $u^* \in U$ and $f(u^*) \le \lambda_n$ for all $n$, and it follows that $f(u^*) \le f^*$. Hence, $f$ attains its minimum over $U$ at $u^*$. Q.E.D.

Direct application of Corollary 3.3.1 and Lemma 3.1 yields the following proposition.

Proposition 3.4 Let the control space $C$ be a Hausdorff space and assume that for each $x \in S$, $\lambda \in R$, and $k = 0, 1, \ldots, N-1$, the set

$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J_0)] \le \lambda\} \tag{16}$$

is compact. Then $J_N^* = T^N(J_0)$,
and there exists a uniformly N-stage optimal policy.

The compactness of the sets $U_k(x, \lambda)$ of (16) may be verified in a number of important special cases. As an illustration, we state two sets of assumptions which guarantee compactness of $U_k(x, \lambda)$ in the case of the mapping

$$H(x, u, J) = g(x, u) + \alpha(x, u) J[f(x, u)]$$

corresponding to a deterministic optimal control problem (Section 2.3.1). Assume that $0 \le \alpha(x, u)$ and $b \le g(x, u) < \infty$ for some $b \in R$ and all $x \in S$, $u \in U(x)$, and take $J_0 \equiv 0$. Then compactness of $U_k(x, \lambda)$ is guaranteed if:

(a) $S = R^n$ (n-dimensional Euclidean space), $C = R^m$, $U(x) \equiv C$, $f$, $g$, and $\alpha$ are continuous in $(x, u)$, and $g$ satisfies $\lim_{k\to\infty} g(x_k, u_k) = \infty$ for every bounded sequence $\{x_k\}$ and every sequence $\{u_k\}$ for which $|u_k| \to \infty$ ($|\cdot|$ is a norm on $R^m$);

(b) $S = R^n$, $C = R^m$, $f$, $g$, and $\alpha$ are continuous, $U(x)$ is compact and nonempty for each $x \in R^n$, and $U(\cdot)$ is a continuous point-to-set mapping from $R^n$ to the space of all nonempty compact subsets of $R^m$. The metric on this space is given by (3) of Appendix C.

The proof consists of verifying that the functions $T^k(J_0)$, $k = 0, 1, \ldots, N-1$, are continuous, which in turn implies compactness of the sets $U_k(x, \lambda)$ of (16). Additional results along the lines of Proposition 3.4 will be given in Part II (cf. Corollary 8.5.2 and Proposition 8.6).
3.3 Application to Specific Models

We will now apply the results of the previous section to the models described in Section 2.3.

Stochastic Optimal Control-Outer Integral Formulation
Proposition 3.5 The mapping

$$H(x, u, J) = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\} \tag{17}$$

of Section 2.3.3 satisfies Assumptions F.2 and F.3.

Proof We have

$$H(x, u, J) = \int^* \{g(x, u, w) + \alpha J[f(x, u, w)]\}\, p(dw \mid x, u),$$

where $\int^*$ denotes the outer integral as in Appendix A. From Lemma A.3(b) we obtain for all $x \in S$, $u \in C$, $J \in F$, $r > 0$,

$$H(x, u, J + r) = H(x, u, J) + \alpha r.$$

Hence, F.2 is satisfied. We now show F.3. Let $J \in F$, $\{J_n\} \subset F$, $\{\epsilon_n\} \subset R$ satisfy $\sum_{n=1}^\infty \epsilon_n < \infty$, $\epsilon_n > 0$, and for all $n$,

$$J = \lim_{n\to\infty} J_n, \qquad J \le J_n.$$

Let $\{\bar\mu_n\} \subset M$ be such that for all $n$,

$$T_{\bar\mu_n}(J)(x) \le T(J)(x) + \epsilon_n \qquad \text{if } T(J)(x) > -\infty, \tag{22}$$
$$T_{\bar\mu_n}(J)(x) \le -1/\epsilon_n \qquad \text{if } T(J)(x) = -\infty, \tag{23}$$
$$T_{\bar\mu_n}(J) \le T_{\bar\mu_{n-1}}(J). \tag{24}$$

Consider the set

$$A(J) = \{x \in S \mid \text{there exists } u \in U(x) \text{ with } p^*(\{w \mid J[f(x, u, w)] = -\infty\} \mid x, u) > 0\},$$

where $p^*$ denotes $p$-outer measure (see Appendix A). Let $\mu \in M$ be such that

$$p^*(\{w \mid J[f(x, \mu(x), w)] = -\infty\} \mid x, \mu(x)) > 0 \qquad \forall x \in A(J). \tag{25}$$
Define for all $n$

$$\mu_n(x) = \begin{cases} \mu(x) & \text{if } x \in A(J), \\ \bar\mu_n(x) & \text{if } x \notin A(J). \end{cases}$$

We will show that $\{\mu_n\}$ thus defined satisfies the requirement of F.3 with $\beta = 1 + 2\alpha$. For $x \in A(J)$, we have, from Corollary A.1.1 and (18)-(21),

$$\limsup_{n\to\infty} T_{\mu_n}(J_n)(x) = \limsup_{n\to\infty} T_\mu(J_n)(x) = \limsup_{n\to\infty} \int^* \{g[x, \mu(x), w] + \alpha J_n[f(x, \mu(x), w)]\}\, p(dw \mid x, \mu(x)).$$

It follows from Lemma A.3(g) and the fact that $T_\mu(J)(x) < \infty$ [cf. (18) and (21)] that

$$\limsup_{n\to\infty} T_{\mu_n}(J_n)(x) = -\infty \le T(J)(x). \tag{27}$$

For $x \notin A(J)$, we have, for all $n$,

Take $B_n \in F$ to contain $\{w \mid J[f(x, \mu_n(x), w)] = -\infty\}$ and satisfy

Using Lemma A.3(e) and (b) and (19), we have

Hence, for $x \notin A(J)$ we have from (28), (22), and (23) that

$$\limsup_{n\to\infty} T_{\mu_n}(J_n)(x) \le \limsup_{n\to\infty} T_{\mu_n}(J)(x) = T(J)(x).$$

Combining (27) and this relation we obtain

$$\limsup_{n\to\infty} T_{\mu_n}(J_n)(x) \le T(J)(x) \qquad \forall x \in S.$$

Since $T_{\mu_n}(J_n) \ge T(J)$ for all $n$, it follows that

$$\lim_{n\to\infty} T_{\mu_n}(J_n) = T(J). \tag{29}$$
If $x$ is such that $T(J)(x) > -\infty$, it follows from (27) and (29) that we must have $x \notin A(J)$. Hence, from (28), (22), and Lemma A.3(b),

$$T_{\mu_n}(J_n)(x) \le T_{\mu_n}(J)(x) + 2\alpha\epsilon_n \le T(J)(x) + (1 + 2\alpha)\epsilon_n \qquad \text{if } T(J)(x) > -\infty. \tag{30}$$

If $x$ is such that $T(J)(x) = -\infty$, there are two possibilities: (a) $x \notin A(J)$ and (b) $x \in A(J)$. If $x \notin A(J)$, it follows from (28), (24), and (18) that

$$T_{\mu_n}(J_n)(x) \le T_{\mu_n}(J)(x) + 2\alpha\epsilon_n \le T_{\mu_{n-1}}(J)(x) + 2\alpha\epsilon_n \le T_{\mu_{n-1}}(J_{n-1})(x) + 2\alpha\epsilon_n. \tag{31}$$

If $x \in A(J)$, then by (18)-(20) and Lemma A.3(b),

$$T_{\mu_n}(J_n)(x) \le T_{\mu_{n-1}}(J_{n-1})(x) + 2\alpha\epsilon_n. \tag{32}$$

It follows now from (29)-(32) that $\{\mu_n\}$ satisfies the requirement of F.3 with $\beta = 1 + 2\alpha$. Q.E.D.

As mentioned earlier, mapping (17) contains as special cases the mappings of Sections 2.3.1 and 2.3.2. In fact, F.1 is also satisfied for those mappings. The reader may easily verify this by using the monotone convergence theorem for ordinary integration. Direct application of the results of the previous section and Proposition 3.5 yields the following.

Corollary 3.5.1 Let $H$ be mapping (17) and let $J_0(x) = 0$ for all $x \in S$.

(a) If $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \Pi$, and $k = 1, 2, \ldots, N$, then $J_N^* = T^N(J_0)$. Moreover, for each sequence $\{\epsilon_n\}$ with $\epsilon_n \downarrow 0$ and $\epsilon_n > 0$ for all $n$, there exists a sequence of policies $\{\pi_n\}$ exhibiting $\{\epsilon_n\}$-dominated convergence to optimality. In particular, if in addition $J_N^*(x) > -\infty$ for all $x \in S$, then for every $\epsilon > 0$ there exists an $\epsilon$-optimal policy.
(b) If $J_k^*(x) > -\infty$ for all $x \in S$, $k = 1, 2, \ldots, N$, then $J_N^* = T^N(J_0)$ and for each $\epsilon > 0$ there exists an N-stage $\epsilon$-optimal policy.
(c) Propositions 3.3 and 3.4 and Corollary 3.3.1 apply.
As Counterexample 3 in the previous section shows, it is possible to have $J_N^* \ne T^N(J_0)$ in the stochastic optimal control problem if the assumptions of parts (a) and (b) of Corollary 3.5.1 are not satisfied. Naturally, for special classes of problems it may be possible to guarantee the equality $J_N^* = T^N(J_0)$ in other ways. For example, if the problem is such that existence of a uniformly N-stage optimal policy is assured, then we obtain $J_N^* = T^N(J_0)$ via Corollary 3.3.1(b). An important special case where we have $J_N^* = T^N(J_0)$ without any further assumptions is the deterministic optimal control problem of Section 2.3.1. The reader can easily verify this fact by using essentially the same argument as the one used to prove Proposition 3.1(a). However, if $J_N^*(x) = -\infty$ for some $x \in S$, even in the deterministic problem there may not exist an N-stage $\epsilon$-optimal policy for a given $\epsilon$ (see Counterexample 4).

Stochastic Optimal Control-Multiplicative Cost Functional
Proposition 3.6 The mapping

$$H(x, u, J) = E\{g(x, u, w) J[f(x, u, w)] \mid x, u\} \tag{33}$$

of Section 2.3.4 satisfies F.1. If there exists a $b \in R$ such that $0 \le g(x, u, w) \le b$ for all $x \in S$, $u \in U(x)$, $w \in W$, then $H$ satisfies F.2.

Proof Assumption F.1 is satisfied by virtue of the monotone convergence theorem for ordinary integration (recall that $W$ is countable). Also, if $0 \le g(x, u, w) \le b$, we have for every $J \in F$ and $r > 0$,

$$H(x, u, J + r) = E\{g(x, u, w)(J[f(x, u, w)] + r) \mid x, u\} = E\{g(x, u, w) J[f(x, u, w)] \mid x, u\} + rE\{g(x, u, w) \mid x, u\}.$$

Thus F.2 is satisfied with $\alpha = b$. Q.E.D.

By combining Propositions 3.6 and 3.1, we obtain the following.

Corollary 3.6.1 Let $H$ be the mapping (33) and $J_0(x) = 1$ for all $x \in S$.

(a) If $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \Pi$, $k = 1, 2, \ldots, N$, then $J_N^* = T^N(J_0)$.
(b) If there exists a $b \in R$ such that $0 \le g(x, u, w) \le b$ for all $x \in S$, $u \in U(x)$, $w \in W$, then $J_N^* = T^N(J_0)$ and there exists an N-stage $\epsilon$-optimal policy.
(c) Propositions 3.3 and 3.4 and Corollary 3.3.1 apply.

We now provide two counterexamples showing that the conclusions of parts (a) and (b) of Corollary 3.6.1 may fail to hold if the corresponding assumptions are relaxed.

COUNTEREXAMPLE 5 Let everything be as in Counterexample 3 except that $C = (0, \infty)$ instead of $C = R$ (and, of course, $J_0(0) = J_0(1) = 1$ instead
of $J_0(0) = J_0(1) = 0$). Then a straightforward calculation shows that $J_2^*(0) = \infty$ and $J_2^*(1) = 0$, while $T^2(J_0)(0) = T^2(J_0)(1) = 0$. Here the assumption that $J_{k,\pi}(x) < \infty$ for all $x$, $\pi$, $k$ is violated, and $g$ is unbounded above.

COUNTEREXAMPLE 6 Let everything be as in Counterexample 4 except for the definition of $g$. Take $g(0, u) = u$ for all $u \in U(0)$ and $g(x, u) = x$ for all $u \in U(x)$ if $x \ne 0$. Then for every $\pi \in \Pi$ we have $J_{2,\pi}(x) = x\mu_1(0)$ for every $x \ne 0$, and $J_2^*(x) = 0$ for all $x \in S$. On the other hand, there is no two-stage $\epsilon$-optimal policy for any $\epsilon > 0$. Here the assumption $J_{k,\pi}(x) < \infty$ for all $x$, $\pi$, $k$ is satisfied, and indeed we have $J_2^*(x) = T^2(J_0)(x) = 0$ for all $x \in S$. However, $g$ is unbounded above.
Minimax Control

Proposition 3.7 The mapping

$$H(x, u, J) = \sup_{w \in W(x, u)} \{g(x, u, w) + \alpha J[f(x, u, w)]\} \tag{34}$$

of Section 2.3.5 satisfies F.2.

Proof We have for $r > 0$ and $J \in F$,

$$H(x, u, J + r) = \sup_{w \in W(x, u)} \{g(x, u, w) + \alpha J[f(x, u, w)] + \alpha r\} = H(x, u, J) + \alpha r.$$

Q.E.D.

Corollary 3.7.1 Let $H$ be mapping (34) and $J_0(x) = 0$ for all $x \in S$.
(a) If $J_k^*(x) > -\infty$ for all $x \in S$, $k = 1, 2, \ldots, N$, then $J_N^* = T^N(J_0)$, and for each $\epsilon > 0$ there exists an N-stage $\epsilon$-optimal policy.
(b) Propositions 3.3 and 3.4 and Corollary 3.3.1 apply.

Suppose $J_N^*(x) = -\infty$ for some $x \in S$. Then it is clearly possible that no N-stage $\epsilon$-optimal policy exists for a given $\epsilon > 0$, since this is true even for deterministic optimal control problems (Counterexample 4). One can also construct examples very similar to Counterexample 3 showing that $J_N^* \ne T^N(J_0)$ is possible if $J_k^*(x) = -\infty$ for some $x$ and $k$.
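A finite minimax problem makes the mapping (34) concrete. The sketch below uses hypothetical state, control, and disturbance sets and made-up functions f and g. Because every W(x, u) is finite here, the supremum in (34) is a maximum. The function H also lets one check numerically the identity H(x, u, J + r) = H(x, u, J) + alpha*r used in the proof of Proposition 3.7.

```python
from typing import Dict, List

# Hypothetical finite minimax problem in the form of mapping (34); all data are made up.
ALPHA = 0.8
STATES = [0, 1, 2]


def controls(x: int) -> List[int]:
    return [0, 1] if x < 2 else [0]


def disturbances(x: int, u: int) -> List[int]:
    """W(x, u): the set over which the adversary maximizes."""
    return [-1, 0, 1]


def f(x: int, u: int, w: int) -> int:
    return min(2, max(0, x + u + w))


def g(x: int, u: int, w: int) -> float:
    return float(x) + 0.5 * u - 0.25 * w


def H(x: int, u: int, J: Dict[int, float]) -> float:
    # Worst case over W(x, u), as in (34); W(x, u) is finite here, so sup = max.
    return max(g(x, u, w) + ALPHA * J[f(x, u, w)] for w in disturbances(x, u))


def T(J: Dict[int, float]) -> Dict[int, float]:
    return {x: min(H(x, u, J) for u in controls(x)) for x in STATES}


def minimax_dp(N: int) -> Dict[int, float]:
    """T^N(J_0) with J_0 = 0, as in Corollary 3.7.1."""
    J = {x: 0.0 for x in STATES}
    for _ in range(N):
        J = T(J)
    return J
```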
Chapter 4
Infinite Horizon Models under a Contraction Assumption

4.1 General Remarks and Assumptions

Consider the infinite horizon problem

$$\text{minimize } J_\pi(x) = \lim_{N\to\infty} (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{N-1}})(J_0)(x)$$

subject to $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$.

The following assumption is motivated by the contraction property of the mapping associated with discounted stochastic optimal control problems with bounded cost per stage (cf. DPSC, Chapter 6).

Assumption C (Contraction Assumption) There is a closed subset $\mathcal{B}$ of the space $B$ (the Banach space of all bounded real-valued functions on $S$ with the supremum norm) such that $J_0 \in \mathcal{B}$, and for all $J \in \mathcal{B}$, $\mu \in M$, the functions $T(J)$ and $T_\mu(J)$ belong to $\mathcal{B}$. Furthermore, for every $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$, the limit

$$\lim_{N\to\infty} (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{N-1}})(J_0)(x)$$

exists and is a real number for each $x \in S$. In addition, there exists a positive integer $m$ and scalars $\rho$, $\alpha$, with $0 < \rho < 1$, $0 < \alpha$, such that

$$\|T_\mu(J) - T_\mu(J')\| \le \alpha \|J - J'\| \qquad \forall J, J' \in B, \ \mu \in M, \tag{2}$$
$$\|(T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{m-1}})(J) - (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{m-1}})(J')\| \le \rho \|J - J'\| \qquad \forall J, J' \in \mathcal{B}, \ \mu_0, \ldots, \mu_{m-1} \in M. \tag{3}$$

Condition (3) implies that the mapping $(T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{m-1}})$ is a contraction mapping in $\mathcal{B}$ for all $\mu_k \in M$, $k = 0, 1, \ldots, m-1$. When $m = 1$, the mapping $T_\mu$ is a contraction mapping for each $\mu \in M$. Note that (2) is required to hold on a possibly larger set of functions than (3). It is often convenient to take $\mathcal{B} = B$. This is the case for the problems of Sections 2.3.1, 2.3.2, and 2.3.5 assuming that $\alpha < 1$ and $g$ is uniformly bounded above and below. We will demonstrate this fact in Section 4.4. In other problems, for example the one of Section 2.3.3, the contraction property (3) can be verified only on a strict subset $\mathcal{B}$ of $B$.

4.2 Convergence and Existence Results

We first provide some preliminary results in the following proposition.

Proposition 4.1 Let Assumption C hold. Then:
(a) For every $J \in \mathcal{B}$ and $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$, we have
$$\lim_{N\to\infty} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0) = \lim_{N\to\infty} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J).$$
(b) For each positive integer $N$ and each $J \in \mathcal{B}$, we have
$$\inf_{\pi\in\Pi} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J) = T^N(J)$$
and, in particular,
$$J_N^* = \inf_{\pi\in\Pi} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0) = T^N(J_0).$$
(c) The mappings $T^m$ and $T_\mu^m$, $\mu \in M$, are contraction mappings in $\mathcal{B}$ with modulus $\rho$, i.e.,
$$\|T^m(J) - T^m(J')\| \le \rho \|J - J'\| \qquad \forall J, J' \in \mathcal{B},$$
$$\|T_\mu^m(J) - T_\mu^m(J')\| \le \rho \|J - J'\| \qquad \forall J, J' \in \mathcal{B}, \ \mu \in M.$$
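Proof (a) For any integer $k \ge 0$, write $k = nm + q$, where $q$, $n$ are nonnegative integers and $0 \le q < m$. Then for any $J, J' \in \mathcal{B}$, using (2) and (3), we obtain

$$\|(T_{\mu_0} \cdots T_{\mu_{k-1}})(J) - (T_{\mu_0} \cdots T_{\mu_{k-1}})(J')\| \le \rho^n \alpha^q \|J - J'\|,$$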
54
4.
INFINITE HORIZON MODELS UNDER A CONTRACTION ASSUMPTION
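from which, by taking the limit as $k$ (and hence also $n$) tends to infinity, we have

$$\lim_{k\to\infty} (T_{\mu_0} \cdots T_{\mu_{k-1}})(J_0) = \lim_{k\to\infty} (T_{\mu_0} \cdots T_{\mu_{k-1}})(J) \qquad \forall J \in \mathcal{B}.$$

(b) Since $T^k(J) \in \mathcal{B}$ for all $k$ by assumption, we have $T^k(J)(x) > -\infty$ for all $x \in S$ and $k$. For any $\epsilon > 0$, let $\mu_k \in M$, $k = 0, 1, \ldots, N-1$, be such that

$$T_{\mu_k}[T^{N-k-1}(J)] \le T^{N-k}(J) + \epsilon.$$

Using (2) we obtain

$$T^N(J) \ge (T_{\mu_0} \cdots T_{\mu_{N-1}})(J) - (1 + \alpha + \cdots + \alpha^{N-1})\epsilon \ge \inf_{\pi\in\Pi} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J) - (1 + \alpha + \cdots + \alpha^{N-1})\epsilon.$$

Since $\epsilon > 0$ is arbitrary, it follows that $T^N(J) \ge \inf_{\pi\in\Pi} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J)$. The reverse inequality clearly holds and the result follows.

(c) The fact that $T_\mu^m$ is a contraction mapping is immediate from (3). From (3) we also have, for all $\mu_k \in M$, $k = 0, \ldots, m-1$, and $J, J' \in \mathcal{B}$,

$$(T_{\mu_0} \cdots T_{\mu_{m-1}})(J) \le (T_{\mu_0} \cdots T_{\mu_{m-1}})(J') + \rho \|J - J'\|.$$

Taking the infimum of both sides over $\mu_k \in M$, $k = 0, 1, \ldots, m-1$, and using part (b), we obtain

$$T^m(J) \le T^m(J') + \rho \|J - J'\|.$$

A symmetric argument yields

$$T^m(J') \le T^m(J) + \rho \|J - J'\|.$$

Combining the two inequalities, we obtain $\|T^m(J) - T^m(J')\| \le \rho \|J - J'\|$. Q.E.D.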
In what follows we shall make use of the following fixed point theorem. (See [O5, p. 383]; the proof found there can be generalized to Banach spaces.)

Fixed Point Theorem Let $\mathcal{B}$ be a closed subset of a Banach space with norm denoted by $\|\cdot\|$. Let $L: \mathcal{B} \to \mathcal{B}$ be a mapping such that, for some positive integer $m$ and scalar $\rho \in (0, 1)$, $\|L^m(z) - L^m(z')\| \le \rho \|z - z'\|$ for all $z, z' \in \mathcal{B}$. Then $L$ has a unique fixed point in $\mathcal{B}$, i.e., there exists a unique vector $z^* \in \mathcal{B}$ such that $L(z^*) = z^*$. Furthermore, for every $z \in \mathcal{B}$, we have $\lim_{N\to\infty} \|L^N(z) - z^*\| = 0$.
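The fixed point theorem only asks that some power L^m be a contraction. As an illustration, the sketch below uses a hypothetical affine map L(z) = Az + b on R^2 with the sup norm. The matrix A is chosen so that L expands some distances by a factor 2, while L^2 scales every distance by exactly 0.2 (that is, m = 2 and rho = 0.2). Plain iteration of L still converges to the unique fixed point, as the theorem asserts.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]

# Hypothetical affine map L(z) = A z + b on R^2 with the sup norm.
# L itself is not a contraction (||A|| = 2), but A @ A = 0.2 * I, so L^2 is one (m = 2).
A = ((0.0, 2.0), (0.1, 0.0))
b = (1.0, 1.0)


def L(z: Vec) -> Vec:
    return (A[0][0] * z[0] + A[0][1] * z[1] + b[0],
            A[1][0] * z[0] + A[1][1] * z[1] + b[1])


def sup_dist(z: Vec, w: Vec) -> float:
    return max(abs(z[0] - w[0]), abs(z[1] - w[1]))


def iterate_to_fixed_point(L: Callable[[Vec], Vec], z: Vec,
                           tol: float = 1e-12, max_iter: int = 10_000) -> Vec:
    """Compute L(z), L^2(z), ... until successive iterates are within tol."""
    for _ in range(max_iter):
        z_next = L(z)
        if sup_dist(z_next, z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("no convergence within max_iter iterations")


z_star = iterate_to_fixed_point(L, (0.0, 0.0))
# The exact fixed point solves (I - A) z = b, i.e. z = (3.75, 1.375).
```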
The following proposition characterizes the optimal cost function $J^*$ and the cost function $J_\mu$ corresponding to any stationary policy $(\mu, \mu, \ldots) \in \Pi$. It also shows that these functions can be obtained in the limit via successive application of $T$ and $T_\mu$ on any $J \in \mathcal{B}$.

Proposition 4.2 Let Assumption C hold. Then:
(a) The optimal cost function $J^*$ belongs to $\mathcal{B}$ and is the unique fixed point of $T$ within $\mathcal{B}$, i.e., $J^* = T(J^*)$, and if $J' \in \mathcal{B}$ and $J' = T(J')$, then $J' = J^*$. Furthermore, if $J' \in \mathcal{B}$ is such that $T(J') \le J'$, then $J^* \le J'$, while if $J' \le T(J')$, then $J' \le J^*$.
(b) For every $\mu \in M$, the function $J_\mu$ belongs to $\mathcal{B}$ and is the unique fixed point of $T_\mu$ within $\mathcal{B}$.
(c) There holds
$$\lim_{N\to\infty} \|T^N(J) - J^*\| = 0, \qquad \lim_{N\to\infty} \|T_\mu^N(J) - J_\mu\| = 0 \qquad \forall J \in \mathcal{B}, \ \mu \in M.$$

Proof From part (c) of Proposition 4.1 and the fixed point theorem, $T$ and $T_\mu$ have unique fixed points in $\mathcal{B}$. The fixed point of $T_\mu$ is clearly $J_\mu$, and hence part (b) is proved. Let $\tilde J^*$ be the fixed point of $T$. We have $\tilde J^* = T(\tilde J^*)$. For any $\epsilon > 0$, take $\mu \in M$ such that

$$T_\mu(\tilde J^*) \le \tilde J^* + \epsilon.$$

From (2) it follows that $T_\mu^2(\tilde J^*) \le T_\mu(\tilde J^* + \epsilon) \le T_\mu(\tilde J^*) + \alpha\epsilon \le \tilde J^* + (1 + \alpha)\epsilon$. Continuing in the same manner, we obtain

$$T_\mu^m(\tilde J^*) \le \tilde J^* + (1 + \alpha + \cdots + \alpha^{m-1})\epsilon.$$

Using (3) we have

$$T_\mu^{2m}(\tilde J^*) \le T_\mu^m(\tilde J^*) + \rho(1 + \alpha + \cdots + \alpha^{m-1})\epsilon \le \tilde J^* + (1 + \rho)(1 + \alpha + \cdots + \alpha^{m-1})\epsilon.$$
Proceeding similarly, we obtain, for all $k \ge 1$,

$$T_\mu^{km}(\tilde J^*) \le \tilde J^* + (1 + \rho + \cdots + \rho^{k-1})(1 + \alpha + \cdots + \alpha^{m-1})\epsilon.$$

Taking the limit as $k \to \infty$ and using the fact that $J_\mu = \lim_{k\to\infty} T_\mu^{km}(\tilde J^*)$, we have

$$J_\mu \le \tilde J^* + (1 - \rho)^{-1}(1 + \alpha + \cdots + \alpha^{m-1})\epsilon. \tag{4}$$

Taking $\epsilon = (1 - \rho)(1 + \alpha + \cdots + \alpha^{m-1})^{-1}\bar\epsilon$, we obtain $J_\mu \le \tilde J^* + \bar\epsilon$. Since $J^* \le J_\mu$ and $\bar\epsilon > 0$ is arbitrary, we see that $J^* \le \tilde J^*$. We also have

$$J^* = \inf_{\pi\in\Pi} \lim_{N\to\infty} (T_{\mu_0} \cdots T_{\mu_{N-1}})(\tilde J^*) \ge \lim_{N\to\infty} T^N(\tilde J^*) = \tilde J^*.$$

Hence $J^* = \tilde J^*$ and $J^*$ is the unique fixed point of $T$. Part (c) follows immediately from the fixed point theorem. The remaining part of (a) follows easily from part (c) and the monotonicity of the mapping $T$. Q.E.D.

The next proposition relates to the existence and characterization of stationary optimal policies.

Proposition 4.3 Let Assumption C hold. Then:
(a) A stationary policy $\pi^* = (\mu^*, \mu^*, \ldots) \in \Pi$ is optimal if and only if $T_{\mu^*}(J^*) = T(J^*)$. Equivalently, $\pi^*$ is optimal if and only if $T_{\mu^*}(J_{\mu^*}) = T(J_{\mu^*})$.
(b) If for each $x \in S$ there exists a policy which is optimal at $x$, then there exists a stationary optimal policy.
(c) For any $\epsilon > 0$, there exists a stationary $\epsilon$-optimal policy, i.e., a $\pi_\epsilon = (\mu_\epsilon, \mu_\epsilon, \ldots) \in \Pi$ such that $\|J^* - J_{\mu_\epsilon}\| \le \epsilon$.

Proof (a) If $\pi^*$ is optimal, then $J_{\mu^*} = J^*$ and the result follows from parts (a) and (b) of Proposition 4.2. If $T_{\mu^*}(J^*) = T(J^*)$, then $T_{\mu^*}(J^*) = J^*$, and hence $J_{\mu^*} = J^*$ by part (b) of Proposition 4.2. If $T_{\mu^*}(J_{\mu^*}) = T(J_{\mu^*})$, then $J_{\mu^*} = T(J_{\mu^*})$ and $J_{\mu^*} = J^*$ by part (a) of Proposition 4.2.
(b) Let $\pi_x^* = (\mu_{0,x}^*, \mu_{1,x}^*, \ldots)$ be a policy which is optimal at $x \in S$. Then using part (a) of Proposition 4.1 and part (a) of Proposition 4.2, we have

$$J^*(x) = \lim_{N\to\infty} (T_{\mu_{0,x}^*} \cdots T_{\mu_{N-1,x}^*})(J^*)(x) \ge \lim_{N\to\infty} (T_{\mu_{0,x}^*} T^{N-1})(J^*)(x) = T_{\mu_{0,x}^*}(J^*)(x) \ge T(J^*)(x) = J^*(x).$$

Hence $T_{\mu_{0,x}^*}(J^*)(x) = T(J^*)(x)$ for each $x$. Define $\mu^* \in M$ by means of $\mu^*(x) = \mu_{0,x}^*(x)$. Then $T_{\mu^*}(J^*) = T(J^*)$ and the stationary policy $(\mu^*, \mu^*, \ldots)$ is optimal by part (a).
(c) This part was proved earlier in the proof of part (a) of Proposition 4.2 [cf. (4)]. Q.E.D.

Part (a) of Proposition 4.3 shows that there exists a stationary optimal policy if and only if the infimum is attained for every $x \in S$ in the optimality equation

$$J^*(x) = T(J^*)(x) = \inf_{u\in U(x)} H(x, u, J^*).$$
Thus if the set $U(x)$ is a finite set for each $x \in S$, then there exists a stationary optimal policy. The following proposition strengthens this result. It also shows that stationary optimal policies may be obtained in the limit from finite horizon optimal policies via the DP algorithm, which for any given $J \in \mathcal{B}$ successively computes $T(J), T^2(J), \ldots$.

Proposition 4.4 Let Assumption C hold and assume that the control space $C$ is a Hausdorff space. Assume further that for some $J \in \mathcal{B}$ and some positive integer $\bar k$, the sets

$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J)] \le \lambda\} \tag{5}$$

are compact for all $x \in S$, $\lambda \in R$, and $k \ge \bar k$. Then:
(a) There exists a policy $\pi^* = (\mu_0^*, \mu_1^*, \ldots) \in \Pi$ attaining the infimum for all $x \in S$ and $k \ge \bar k$ in the DP algorithm with initial function $J$, i.e.,
$$(T_{\mu_k^*} T^k)(J) = T^{k+1}(J) \qquad \forall k \ge \bar k. \tag{6}$$
(b) There exists a stationary optimal policy.
(c) For every policy $\pi^*$ satisfying (6), the sequence $\{\mu_k^*(x)\}$ has at least one accumulation point for each $x \in S$.
(d) If $\mu^*: S \to C$ is such that $\mu^*(x)$ is an accumulation point of $\{\mu_k^*(x)\}$ for each $x \in S$, then the stationary policy $(\mu^*, \mu^*, \ldots)$ is optimal.

Proof (a) We have

$$T^{k+1}(J)(x) = \inf_{u\in U(x)} H[x, u, T^k(J)],$$

and the result follows from compactness of sets (5) and Lemma 3.1.
(b) This part will follow immediately once we prove (c) and (d).
(c) Let $\pi^* = (\mu_0^*, \mu_1^*, \ldots)$ satisfy (6) and define
We have from (2), (6), and the fact that $T(J^*) = J^*$,

From these two relations we obtain

It follows that $\mu_n^*(x) \in U_k[x, J^*(x) + 3\alpha\epsilon_k]$ for all $n \ge k$ and $k \ge \bar k$. Hence $\{\mu_k^*(x)\}$ has an accumulation point by the compactness of $U_k[x, J^*(x) + 3\alpha\epsilon_k]$.
(d) If $\mu^*(x)$ is an accumulation point of $\{\mu_k^*(x)\}$, then $\mu^*(x) \in U_k[x, J^*(x) + 3\alpha\epsilon_k]$ for all $k \ge \bar k$, or equivalently,

$$(T_{\mu^*} T^k)(J) \le J^* + 3\alpha\epsilon_k \qquad \forall k \ge \bar k.$$

By using (2), we have, for all $k$,

$$\|(T_{\mu^*} T^k)(J) - T_{\mu^*}(J^*)\| \le \alpha \|T^k(J) - J^*\| \le \alpha\epsilon_k.$$

Combining the preceding two inequalities, we obtain

$$T_{\mu^*}(J^*) \le J^* + 4\alpha\epsilon_k \qquad \forall k \ge \bar k.$$

Since $\epsilon_k \to 0$ [cf. Proposition 4.2(c)], we obtain $T_{\mu^*}(J^*) \le J^*$. Using the fact that $J^* = T(J^*) \le T_{\mu^*}(J^*)$, we obtain $T_{\mu^*}(J^*) = J^*$. By Proposition 4.3, this implies that the stationary policy $(\mu^*, \mu^*, \ldots)$ is optimal. Q.E.D.

Examples where compactness of sets (5) can be verified were given at the end of Section 3.2. Another example is the lower semicontinuous stochastic optimal control model of Section 8.3.

4.3 Computational Methods
There are a number of computational methods which can be used to obtain the optimal cost function J* and optimal or nearly optimal stationary policies. Naturally, these methods will be useful in practice only if they require a finite number of arithmetic operations. Thus, while "theoretical" algorithms which require an infinite number of arithmetic operations are of
4.3
59
COMPUTATIONAL METHODS
some interest, in practice we must modify these algorithms so that they become computationally implementable. In the algorithms we provide, we assume the following: for any $J \in \mathcal{B}$ and $\epsilon > 0$, there is a computational method which determines, in a finite number of arithmetic operations, functions $J_\epsilon \in \mathcal{B}$ and $\mu_\epsilon \in M$ such that

For many problems of interest, $S$ is a compact subset of a Euclidean space. There, such procedures may be based on discretization of the state space or the control space (or both) and on piecewise constant approximations of various functions (see, e.g., DPSC, Section 5.2). Based on this assumption (the limitations of which we fully realize), we shall provide computationally implementable versions of all "theoretical" algorithms we consider.

4.3.1 Successive Approximation

The successive approximation method consists of choosing a starting function $J \in \mathcal{B}$ and computing successively $T(J), T^2(J), \ldots, T^k(J), \ldots$. By part (c) of Proposition 4.2, we have $\lim_{k\to\infty} \|T^k(J) - J^*\| = 0$, and hence we obtain in the limit the optimal cost function $J^*$. Subsequently, stationary optimal policies (if any exist) may be obtained by minimization for each $x \in S$ in the optimality equation

$$J^*(x) = \inf_{u\in U(x)} H(x, u, J^*).$$
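A minimal sketch of the successive approximation method follows, for a problem with two states and two controls. The data g, P, and ALPHA are made up. The stopping rule (stop when two successive iterates are within tol in sup norm) is a common practical choice rather than part of the text, which works with the exact limit. The stationary policy is read off by minimizing in the optimality equation, with the final iterate in place of J*.

```python
from typing import List, Tuple

# Hypothetical discounted problem with finitely many states and controls
# (the countable-disturbance model of Section 2.3.2 with made-up data):
# g[x][u] is the expected cost per stage and P[x][u][y] the probability of moving to y.
ALPHA = 0.9
g = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
STATES, CONTROLS = range(2), range(2)


def H(x: int, u: int, J: List[float]) -> float:
    return g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(P[x][u]))


def T(J: List[float]) -> List[float]:
    return [min(H(x, u, J) for u in CONTROLS) for x in STATES]


def sup_norm(a: List[float], b: List[float]) -> float:
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def successive_approximation(J: List[float], tol: float = 1e-10,
                             max_iter: int = 100_000) -> Tuple[List[float], List[int]]:
    """Compute T(J), T^2(J), ... until two iterates differ by less than tol, then
    obtain a stationary policy by minimizing in the optimality equation."""
    for _ in range(max_iter):
        J_next = T(J)
        done = sup_norm(J_next, J) < tol
        J = J_next
        if done:
            break
    else:
        raise RuntimeError("no convergence within max_iter iterations")
    mu = [min(CONTROLS, key=lambda u, x=x: H(x, u, J)) for x in STATES]
    return J, mu
```

In this example T is a contraction with modulus ALPHA (m = 1). The final iterate is therefore within ALPHA*tol/(1 - ALPHA) of J*, and Proposition 4.5 then bounds how far the cost of mu can be from J*.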
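If this minimization cannot be carried out exactly, or if only an approximation to $J^*$ is available, nearly optimal stationary policies can still be obtained, as the following proposition shows.

Proposition 4.5 Let Assumption C hold and assume that $\tilde J^* \in \mathcal{B}$ and $\mu \in M$ are such that

$$\|\tilde J^* - J^*\| \le \epsilon_1, \qquad \|T_\mu(\tilde J^*) - T(\tilde J^*)\| \le \epsilon_2,$$

where $\epsilon_1 \ge 0$, $\epsilon_2 \ge 0$ are scalars. Then

$$J^* \le J_\mu \le J^* + \frac{(2\alpha\epsilon_1 + \epsilon_2)(1 + \alpha + \cdots + \alpha^{m-1})}{1 - \rho}.$$

Proof Using (2) we obtain

$$T_\mu(J^*) \le T_\mu(\tilde J^*) + \alpha\epsilon_1 \le T(\tilde J^*) + \alpha\epsilon_1 + \epsilon_2 \le T(J^*) + 2\alpha\epsilon_1 + \epsilon_2,$$

and it follows that

$$T_\mu(J^*) \le J^* + 2\alpha\epsilon_1 + \epsilon_2.$$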
Using this inequality and an argument identical to the one used to prove (4) in Proposition 4.2, we obtain our result. Q.E.D.

An interesting corollary of this proposition is the following.

Corollary 4.5.1 Let Assumption C hold and assume that $S$ is a finite set and $U(x)$ is a finite set for each $x \in S$. Then the successive approximation method yields an optimal stationary policy after a finite number of iterations, in the following sense. For a given $J \in \mathcal{B}$, let $\pi^* = (\mu_0^*, \mu_1^*, \ldots) \in \Pi$ be such that

$$(T_{\mu_k^*} T^k)(J) = T^{k+1}(J), \qquad k = 0, 1, \ldots.$$

Then there exists an integer $\bar k$ such that the stationary policy $(\mu_k^*, \mu_k^*, \ldots)$ is optimal for every $k \ge \bar k$.

Proof Under our finiteness assumptions, the set $M$ is a finite set. Hence there exists a scalar $\epsilon^* > 0$ such that $J_\mu \le J^* + \epsilon^*$ implies that $(\mu, \mu, \ldots)$ is optimal. Let $\bar\epsilon$ satisfy $2\alpha\bar\epsilon(1 + \alpha + \cdots + \alpha^{m-1})(1 - \rho)^{-1} \le \epsilon^*$. Take $\bar k$ sufficiently large so that $\|T^k(J) - J^*\| \le \bar\epsilon$ for all $k \ge \bar k$, and use Proposition 4.5. Q.E.D.
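Corollary 4.5.1 can be observed on the same kind of small example. The sketch below uses made-up data with m = 1 and rho = alpha, so Proposition 4.5 with eps_2 = 0 gives J_mu <= J* + 2*alpha*eps_1/(1 - alpha). At each step it records the policy mu_k attaining the minimum at T^k(J), together with eps_1 = ||T^k(J) - J*|| and the actual cost gap of mu_k. J* and J_mu are approximated by long runs of T and T_mu; that approximation is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Made-up discounted problem with finitely many states (m = 1, rho = ALPHA).
ALPHA = 0.9
g = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
X, U = range(2), range(2)


def H(x: int, u: int, J: List[float]) -> float:
    return g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(P[x][u]))


def T(J: List[float]) -> List[float]:
    return [min(H(x, u, J) for u in U) for x in X]


def greedy(J: List[float]) -> Tuple[int, ...]:
    """A mu with T_mu(J) = T(J); ties go to the smallest control index."""
    return tuple(min(U, key=lambda u, x=x: H(x, u, J)) for x in X)


_cache: Dict[Tuple[int, ...], List[float]] = {}


def cost_of(mu: Tuple[int, ...], n_iter: int = 1000) -> List[float]:
    """J_mu approximated by T_mu^n(0); the error is of order ALPHA**n_iter."""
    if mu not in _cache:
        J = [0.0 for _ in X]
        for _ in range(n_iter):
            J = [H(x, mu[x], J) for x in X]
        _cache[mu] = J
    return _cache[mu]


def sup_norm(a: List[float], b: List[float]) -> float:
    return max(abs(ai - bi) for ai, bi in zip(a, b))


J_star = [0.0, 0.0]
for _ in range(1000):          # reference value of J*, accurate to about machine precision
    J_star = T(J_star)

J = [0.0, 0.0]
history = []                   # (k, mu_k, eps_1 = ||T^k(J) - J*||, ||J_{mu_k} - J*||)
for k in range(60):
    mu_k = greedy(J)           # T_{mu_k}(T^k J) = T^{k+1}(J), so eps_2 = 0 in Prop. 4.5
    history.append((k, mu_k, sup_norm(J, J_star), sup_norm(cost_of(mu_k), J_star)))
    J = T(J)
```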
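The successive approximation scheme can be sharpened considerably by making use of the monotonic error bounds of the following proposition.

Proposition 4.6 Let Assumption C hold and assume that for all scalars $r \ne 0$, $J \in \mathcal{B}$, and $x \in S$, we have

$$a_1 \le \frac{T^m(J + r)(x) - T^m(J)(x)}{r} \le a_2, \tag{7}$$

where $a_1, a_2$ are two scalars satisfying $0 \le a_1 \le a_2 < 1$. Then for all $J \in \mathcal{B}$, $x \in S$, and $k = 1, 2, \ldots$, we have

$$T^{km}(J)(x) + b_k \le T^{(k+1)m}(J)(x) + b_{k+1} \le J^*(x) \le T^{(k+1)m}(J)(x) + \bar b_{k+1} \le T^{km}(J)(x) + \bar b_k, \tag{8}$$

where

$$b_k = \min\left[\frac{a_1}{1 - a_1} d_k, \frac{a_2}{1 - a_2} d_k\right], \qquad \bar b_k = \max\left[\frac{a_1}{1 - a_1} \bar d_k, \frac{a_2}{1 - a_2} \bar d_k\right],$$
$$d_k = \inf_{x\in S} [T^{km}(J)(x) - T^{(k-1)m}(J)(x)], \qquad \bar d_k = \sup_{x\in S} [T^{km}(J)(x) - T^{(k-1)m}(J)(x)].$$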
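The bounds (8) cost almost nothing to compute, since d_k and d_bar_k are just the smallest and largest entries of T^{km}(J) - T^{(k-1)m}(J). The sketch below uses a made-up discounted problem with finitely many states in which every transition row sums to one. Under that assumption T(J + r) = T(J) + alpha*r, so (7) holds with m = 1 and a_1 = a_2 = alpha. Then b_k and b_bar_k reduce to alpha/(1 - alpha) times d_k and d_bar_k. The generator name bounded_successive_approximation is hypothetical.

```python
from typing import Iterator, List, Tuple

# Made-up discounted problem; every P[x][u] sums to one, so T(J + r) = T(J) + ALPHA * r
# and (7) holds with m = 1 and a1 = a2 = ALPHA.
ALPHA = 0.9
g = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
X, U = range(2), range(2)


def T(J: List[float]) -> List[float]:
    return [min(g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(P[x][u])) for u in U)
            for x in X]


def bounded_successive_approximation(
        J: List[float], iterations: int
) -> Iterator[Tuple[List[float], List[float], List[float]]]:
    """Yield (T^k(J), T^k(J) + b_k, T^k(J) + b_bar_k) for k = 1, 2, ..., where, with
    a1 = a2 = ALPHA, b_k = ALPHA/(1-ALPHA) * d_k and b_bar_k = ALPHA/(1-ALPHA) * d_bar_k."""
    c = ALPHA / (1.0 - ALPHA)
    prev = J
    for _ in range(iterations):
        cur = T(prev)
        diffs = [cur[x] - prev[x] for x in X]
        d_k, d_bar_k = min(diffs), max(diffs)
        yield cur, [v + c * d_k for v in cur], [v + c * d_bar_k for v in cur]
        prev = cur
```

By (8), the lower bounds produced this way should never decrease and the upper bounds should never increase from one iteration to the next.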
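Note If $\mathcal{B} = B$ we can always take $a_1 = 0$, $a_2 = \rho$, but sharper bounds are obtained if scalars $a_1$ and $a_2$ with $0 < a_1$ and/or $a_2 < \rho$ are available.

Proof It is sufficient to prove (8) for $k = 1$, since the result for $k > 1$ then follows by replacing $J$ by $T^{(k-1)m}(J)$. To simplify the notation, we assume $m = 1$. To prove the result for the general case, simply replace $T$ by $T^m$ in the following arguments. We also use the notation

$$d = d_1, \qquad \bar d = \bar d_1, \qquad d' = d_2, \qquad \bar d' = \bar d_2.$$

Relation (7) may also be written (for $m = 1$) as

$$T(J) + \min[a_1 r, a_2 r] \le T(J + r) \le T(J) + \max[a_1 r, a_2 r]. \tag{9}$$

We have for all $x \in S$,

$$J(x) + d \le T(J)(x). \tag{10}$$

Applying $T$ on both sides of (10) and using (9) and (10), we obtain

$$J(x) + d + \min[a_1 d, a_2 d] \le T(J)(x) + \min[a_1 d, a_2 d] \le T^2(J)(x). \tag{11}$$

We add $\min[a_1^2 d, a_2^2 d]$ to each side of these inequalities. Then we use (9) (with $J$ replaced by $T(J)$ and $r = \min[a_1 d, a_2 d]$), and then again (11). We obtain

$$J(x) + d + \min[a_1 d, a_2 d] + \min[a_1^2 d, a_2^2 d] \le T(J)(x) + \min[a_1 d, a_2 d] + \min[a_1^2 d, a_2^2 d] \le T^2(J)(x) + \min[a_1^2 d, a_2^2 d] \le T^3(J)(x).$$

Proceeding similarly, we have for every $k = 1, 2, \ldots$,

$$J(x) + d + \sum_{i=1}^{k} \min[a_1^i d, a_2^i d] \le T(J)(x) + \sum_{i=1}^{k} \min[a_1^i d, a_2^i d] \le T^2(J)(x) + \sum_{i=2}^{k} \min[a_1^i d, a_2^i d] \le T^{k+1}(J)(x).$$

Taking the limit as $k \to \infty$, we have

$$J(x) + \min\left[\frac{d}{1 - a_1}, \frac{d}{1 - a_2}\right] \le T(J)(x) + \min\left[\frac{a_1 d}{1 - a_1}, \frac{a_2 d}{1 - a_2}\right] \le T^2(J)(x) + \min\left[\frac{a_1^2 d}{1 - a_1}, \frac{a_2^2 d}{1 - a_2}\right] \le J^*(x). \tag{12}$$

Also, we have from (11) that

$$T^2(J)(x) - T(J)(x) \ge \min[a_1 d, a_2 d],$$

and by taking the infimum over $x \in S$, we see that

$$d' \ge \min[a_1 d, a_2 d].$$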
It is easy to see that this relation implies

$$\min\left[\frac{a_1^2}{1 - a_1} d, \frac{a_2^2}{1 - a_2} d\right] \le \min\left[\frac{a_1}{1 - a_1} d', \frac{a_2}{1 - a_2} d'\right]. \tag{13}$$

Combining (12) and (13) and using the definition of $b_1$ and $b_2$, we obtain

$$T(J)(x) + b_1 \le T^2(J)(x) + b_2.$$

Also from (12) we have $T(J)(x) + b_1 \le J^*(x)$, and an identical argument shows that $T^2(J)(x) + b_2 \le J^*(x)$. Hence the left part of (8) is proved for $k = 1$, $m = 1$. The right part follows by an entirely similar argument. Q.E.D.

Notice that the scalars $b_k$ and $\bar b_k$ in (8) are readily available as a byproduct of the computation. Computational examples and further discussion of the error bounds of Proposition 4.6 may be found in DPSC, Section 6.2. Using these error bounds, we can obtain $J^*$ to an arbitrary prespecified degree of accuracy in a finite number of iterations of the successive approximation method. However, we still do not have an implementable algorithm, since Proposition 4.6 requires the exact values of the functions $T^k(J)$. Approximations to $T^k(J)$ may, however, be obtained in a computationally implementable manner, as the following proposition shows. It also yields error bounds similar to those of Proposition 4.6.

Proposition 4.7 Let Assumption C hold. For a given $J \in \mathcal{B}$ and $\epsilon > 0$, consider a sequence $\{J_k\} \subset \mathcal{B}$ satisfying

$$T(J) \le J_1 \le T(J) + \epsilon, \qquad T(J_k) \le J_{k+1} \le T(J_k) + \epsilon, \qquad k = 1, 2, \ldots.$$

Then

$$\|T^{km}(J) - J_{km}\| \le \bar\epsilon, \qquad k = 0, 1, \ldots, \tag{14}$$

where $\bar\epsilon = \epsilon(1 + \alpha + \cdots + \alpha^{m-1})/(1 - \rho)$. Furthermore, if the assumptions of Proposition 4.6 hold, then for all $x \in S$ and $k = 1, 2, \ldots$
Proof We have

$$\|J_m - T^m(J)\| \le \epsilon(1 + \alpha + \cdots + \alpha^{m-1}).$$

An identical argument yields

$$\|J_{2m} - T^m(J_m)\| \le \epsilon(1 + \alpha + \cdots + \alpha^{m-1}),$$

and we also have

$$\|T^m(J_m) - T^{2m}(J)\| \le \rho \|J_m - T^m(J)\|.$$

Using the preceding three inequalities we obtain

$$\|J_{2m} - T^{2m}(J)\| \le \|J_{2m} - T^m(J_m)\| + \|T^m(J_m) - T^{2m}(J)\| \le (1 + \rho)\epsilon(1 + \alpha + \cdots + \alpha^{m-1}).$$

Proceeding similarly we obtain, for $k = 1, 2, \ldots$,

$$\|J_{km} - T^{km}(J)\| \le (1 + \rho + \cdots + \rho^{k-1})\epsilon(1 + \alpha + \cdots + \alpha^{m-1}),$$

and (14) follows. The remaining part of the proposition follows by using (14) and the error bounds of Proposition 4.6. Q.E.D.

Proposition 4.7 provides the basis for a computationally feasible algorithm to determine $J^*$ to an arbitrary degree of accuracy. Nearly optimal stationary policies can then be obtained using the result of Proposition 4.5.
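Proposition 4.7 can be checked numerically by perturbing each step of the successive approximation method. The hypothetical sketch below uses made-up data with m = 1 and rho = alpha, so (14) reads ||T^k(J) - J_k|| <= eps/(1 - alpha). Each J_{k+1} is T(J_k) plus a random one-sided error in [0, eps]. That random error stands in for whatever finite computation (discretization, rounding up) would produce J_{k+1} in practice.

```python
import random
from typing import List

# Made-up discounted problem with m = 1 and rho = ALPHA, so the bound (14) reads
# ||T^k(J) - J_k|| <= EPS / (1 - ALPHA).
ALPHA, EPS = 0.9, 0.05
g = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
X, U = range(2), range(2)


def T(J: List[float]) -> List[float]:
    return [min(g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(P[x][u])) for u in U)
            for x in X]


def inexact_successive_approximation(J: List[float], iterations: int, eps: float,
                                     rng: random.Random) -> List[List[float]]:
    """Return [J, J_1, J_2, ...] with T(J_k) <= J_{k+1} <= T(J_k) + eps.
    The one-sided random perturbation is a hypothetical stand-in for the
    finite-precision procedure that would produce J_{k+1}."""
    seq = [list(J)]
    for _ in range(iterations):
        seq.append([v + rng.uniform(0.0, eps) for v in T(seq[-1])])
    return seq


approx = inexact_successive_approximation([0.0, 0.0], 100, EPS, random.Random(0))
exact = [[0.0, 0.0]]
for _ in range(100):
    exact.append(T(exact[-1]))
errors = [max(abs(a - e) for a, e in zip(A, E)) for A, E in zip(approx, exact)]
```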
4.3.2 Policy Iteration

The policy iteration algorithm in its theoretical form proceeds as follows. An initial function $\mu_0 \in M$ is chosen, and the corresponding cost function $J_{\mu_0}$ is computed. Then a new function $\mu_1 \in M$ satisfying $T_{\mu_1}(J_{\mu_0}) = T(J_{\mu_0})$ is obtained. More generally, given $\mu_k \in M$, one computes $J_{\mu_k}$ and a function $\mu_{k+1} \in M$ satisfying $T_{\mu_{k+1}}(J_{\mu_k}) = T(J_{\mu_k})$, and the process is repeated. Suppose $S$ is a finite set and $U(x)$ is a finite set for each $x \in S$. Then one can often compute $J_{\mu_k}$ in a finite number of arithmetic operations, and the algorithm can be carried out in a computationally implementable manner. Under these circumstances, one obtains an optimal stationary policy in a finite number of iterations, as the following proposition shows.
Proposition 4.8 Let Assumption C hold and assume that $S$ is a finite set and $U(x)$ is a finite set for each $x \in S$. Then for any starting function $\mu_0 \in M$, the policy iteration algorithm yields a stationary optimal policy after a finite number of iterations. That is, if $\{\mu_k\}$ is the generated sequence, there exists an integer $\bar k$ such that $(\mu_k, \mu_k, \ldots)$ is optimal for all $k \ge \bar k$.

Proof We have, for all $k$,

$$T_{\mu_{k+1}}(J_{\mu_k}) = T(J_{\mu_k}) \le T_{\mu_k}(J_{\mu_k}) = J_{\mu_k}. \tag{15}$$

Applying $T_{\mu_{k+1}}$ repeatedly on both sides, we obtain

$$T_{\mu_{k+1}}^N(J_{\mu_k}) \le T_{\mu_{k+1}}^{N-1}(J_{\mu_k}) \le \cdots \le T_{\mu_{k+1}}(J_{\mu_k}) \le J_{\mu_k}, \qquad N = 1, 2, \ldots.$$

By Proposition 4.2, $J_{\mu_{k+1}} = \lim_{N\to\infty} T_{\mu_{k+1}}^N(J_{\mu_k})$, so

$$J_{\mu_{k+1}} \le T_{\mu_{k+1}}(J_{\mu_k}) \le J_{\mu_k}. \tag{16}$$

If $(\mu_k, \mu_k, \ldots)$ is an optimal policy, then $J_{\mu_{k+1}} = J_{\mu_k} = J^*$ and $(\mu_{k+1}, \mu_{k+1}, \ldots)$ is also optimal. Otherwise, we must have $J_{\mu_{k+1}}(x) < J_{\mu_k}(x)$ for some $x \in S$. Indeed, if $J_{\mu_{k+1}} = J_{\mu_k}$, then from (15) and (16) we have $T(J_{\mu_k}) = J_{\mu_k}$, which implies the optimality of $(\mu_k, \mu_k, \ldots)$. Hence, either $(\mu_k, \mu_k, \ldots)$ is optimal or else $(\mu_{k+1}, \mu_{k+1}, \ldots)$ is a strictly better policy. Since the set $M$ is finite under our assumptions, the result follows. Q.E.D.
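When S and U(x) are finite, the theoretical policy iteration algorithm can be run exactly. The sketch below uses made-up data. It computes J_mu by solving the linear system (I - alpha P_mu) J = g_mu, which for this discounted model is the fixed point equation J = T_mu(J) of Proposition 4.2(b). It stops when the improved policy coincides with the current one. Keeping the current control on ties is a safeguard added here, not part of the text's statement.

```python
from typing import List, Tuple

# Made-up finite discounted problem (S and U(x) finite, as in Proposition 4.8).
ALPHA = 0.9
g = [[2.0, 0.5], [1.0, 3.0]]
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.9, 0.1]]]
X, U = range(2), range(2)


def H(x: int, u: int, J: List[float]) -> float:
    return g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(P[x][u]))


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (adequate for tiny systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= factor * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(mu: Tuple[int, ...]) -> List[float]:
    """J_mu, the unique fixed point of T_mu: solve (I - ALPHA * P_mu) J = g_mu."""
    A = [[(1.0 if x == y else 0.0) - ALPHA * P[x][mu[x]][y] for y in X] for x in X]
    return solve_linear(A, [g[x][mu[x]] for x in X])


def policy_iteration(mu0: Tuple[int, ...]):
    mu, history = tuple(mu0), []
    while True:
        J = evaluate(mu)                       # policy evaluation
        history.append((mu, J))
        # policy improvement: T_{mu_next}(J_mu) = T(J_mu); on ties keep mu[x]
        mu_next = tuple(min(U, key=lambda u, x=x: (H(x, u, J), u != mu[x])) for x in X)
        if mu_next == mu:
            return mu, J, history
        mu = mu_next
```

By (15) and (16), the cost functions stored in history should be nonincreasing from one iteration to the next.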
When $S$ and $U(x)$ are not finite sets, the policy iteration algorithm must be modified for a number of reasons. First, given $\mu_k$, there may not exist a $\mu_{k+1}$ such that $T_{\mu_{k+1}}(J_{\mu_k}) = T(J_{\mu_k})$. Second, even if such a $\mu_{k+1}$ exists, it may not be possible to obtain $T_{\mu_{k+1}}(J_{\mu_k})$ and $J_{\mu_{k+1}}$ in a computationally implementable manner. For these reasons we consider the following modified policy iteration algorithm.

Step 1 Choose a function $\mu_0 \in M$ and positive scalars $\gamma$, $\delta$, and $\epsilon$.
Step 2 Given $\mu_k \in M$, find $\tilde J_{\mu_k} \in \mathcal{B}$ such that $\|\tilde J_{\mu_k} - J_{\mu_k}\| \le \gamma\rho^k$.
Step 3 Find $\mu_{k+1} \in M$ such that $\|T_{\mu_{k+1}}(\tilde J_{\mu_k}) - T(\tilde J_{\mu_k})\| \le \delta\rho^k$. If $\|T_{\mu_{k+1}}(\tilde J_{\mu_k}) - \tilde J_{\mu_k}\| \le \epsilon$, stop. Otherwise, replace $\mu_k$ by $\mu_{k+1}$ and return to Step 2.

Notice that Steps 2 and 3 of the algorithm are computationally implementable. The next proposition establishes the validity of the algorithm.

Proposition 4.9 Let Assumption C hold. Then the modified policy iteration algorithm terminates in a finite number of iterations, say $\bar k$, and
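the final function $\mu_{\bar k}$ satisfies

Proof We first show that if the algorithm terminates at the $\bar k$th iteration, then (17) holds. Indeed we have

For any positive integer $n$, we have

$$\|\tilde J_{\mu_{\bar k}} - J^*\| \le \|\tilde J_{\mu_{\bar k}} - T^m(\tilde J_{\mu_{\bar k}})\| + \|T^m(\tilde J_{\mu_{\bar k}}) - T^{2m}(\tilde J_{\mu_{\bar k}})\| + \cdots + \|T^{(n-1)m}(\tilde J_{\mu_{\bar k}}) - T^{nm}(\tilde J_{\mu_{\bar k}})\| + \|T^{nm}(\tilde J_{\mu_{\bar k}}) - J^*\|.$$

From this relation we obtain, for all $n \ge 1$,

$$\|\tilde J_{\mu_{\bar k}} - J^*\| \le (1 + \rho + \cdots + \rho^{n-1})\|\tilde J_{\mu_{\bar k}} - T^m(\tilde J_{\mu_{\bar k}})\| + \|T^{nm}(\tilde J_{\mu_{\bar k}}) - J^*\|. \tag{21}$$

We also have

$$\|\tilde J_{\mu_{\bar k}} - T^m(\tilde J_{\mu_{\bar k}})\| \le \|\tilde J_{\mu_{\bar k}} - T(\tilde J_{\mu_{\bar k}})\| + \|T(\tilde J_{\mu_{\bar k}}) - T^2(\tilde J_{\mu_{\bar k}})\| + \cdots + \|T^{m-1}(\tilde J_{\mu_{\bar k}}) - T^m(\tilde J_{\mu_{\bar k}})\|,$$

from which we obtain, by using (2),

$$\|\tilde J_{\mu_{\bar k}} - T^m(\tilde J_{\mu_{\bar k}})\| \le (1 + \alpha + \cdots + \alpha^{m-1})\|\tilde J_{\mu_{\bar k}} - T(\tilde J_{\mu_{\bar k}})\|. \tag{22}$$

Combining (21) and (22), we obtain, for all $n \ge 1$,

$$\|\tilde J_{\mu_{\bar k}} - J^*\| \le (1 + \rho + \cdots + \rho^{n-1})(1 + \alpha + \cdots + \alpha^{m-1})\|\tilde J_{\mu_{\bar k}} - T(\tilde J_{\mu_{\bar k}})\| + \|T^{nm}(\tilde J_{\mu_{\bar k}}) - J^*\|.$$

Taking the limit as $n \to \infty$, we obtain

$$\|\tilde J_{\mu_{\bar k}} - J^*\| \le (1 + \alpha + \cdots + \alpha^{m-1})\|\tilde J_{\mu_{\bar k}} - T(\tilde J_{\mu_{\bar k}})\|/(1 - \rho). \tag{23}$$

Using (18), we also have

$$\|J_{\mu_{\bar k}} - J^*\| \le \|J_{\mu_{\bar k}} - \tilde J_{\mu_{\bar k}}\| + \|\tilde J_{\mu_{\bar k}} - J^*\| \le \gamma\rho^{\bar k} + \|\tilde J_{\mu_{\bar k}} - J^*\|. \tag{24}$$

From (19) and (20) we obtain

$$\|\tilde J_{\mu_{\bar k}} - T(\tilde J_{\mu_{\bar k}})\| \le \epsilon + \delta\rho^{\bar k}. \tag{25}$$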
By combining (23)-(25), we obtain (17). To show that the algorithm will terminate in a finite - number of iterations, assume the contrary, i.e., assume we have IIT,,+,(J,,) - J,,ll > E for all k,
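and the algorithm generates an infinite sequence $\{\mu_k\} \subset M$. We have, for all $k$,
$$\|T_{\mu_{k+1}}(J_{\mu_k}) - T(J_{\mu_k})\| \le \|T_{\mu_{k+1}}(J_{\mu_k}) - T_{\mu_{k+1}}(\tilde J_{\mu_k})\| + \|T_{\mu_{k+1}}(\tilde J_{\mu_k}) - T(\tilde J_{\mu_k})\| + \|T(\tilde J_{\mu_k}) - T(J_{\mu_k})\| \le (\delta + 2\alpha\gamma)\rho^k.$$
This relation yields, for all $k$,
$$T_{\mu_{k+1}}(J_{\mu_k}) \le T(J_{\mu_k}) + (\delta + 2\alpha\gamma)\rho^k \le T_{\mu_k}(J_{\mu_k}) + (\delta + 2\alpha\gamma)\rho^k = J_{\mu_k} + (\delta + 2\alpha\gamma)\rho^k. \tag{26}$$
Applying $T_{\mu_{k+1}}$ to both sides of (26) and using (26) again, we obtain
$$T^2_{\mu_{k+1}}(J_{\mu_k}) \le T_{\mu_{k+1}}(J_{\mu_k}) + \alpha(\delta + 2\alpha\gamma)\rho^k \le T(J_{\mu_k}) + (1 + \alpha)(\delta + 2\alpha\gamma)\rho^k \le J_{\mu_k} + (1 + \alpha)(\delta + 2\alpha\gamma)\rho^k.$$
Proceeding similarly, we obtain, for all $k$,
$$T^m_{\mu_{k+1}}(J_{\mu_k}) \le T(J_{\mu_k}) + (1 + \alpha + \cdots + \alpha^{m-1})(\delta + 2\alpha\gamma)\rho^k \le J_{\mu_k} + (1 + \alpha + \cdots + \alpha^{m-1})(\delta + 2\alpha\gamma)\rho^k.$$
Applying $T^m_{\mu_{k+1}}$ repeatedly to both sides, we obtain, for all $n$ and $k$,
$$T^{nm}_{\mu_{k+1}}(J_{\mu_k}) \le T(J_{\mu_k}) + (1 + \rho + \cdots + \rho^{n-1})(1 + \alpha + \cdots + \alpha^{m-1})(\delta + 2\alpha\gamma)\rho^k. \tag{27}$$
Denote
$$\lambda = (1 + \alpha + \cdots + \alpha^{m-1})(\delta + 2\alpha\gamma)/(1 - \rho).$$
Then by taking the limit in (27) as $n \to \infty$, we obtain
$$J_{\mu_{k+1}} \le T(J_{\mu_k}) + \lambda\rho^k, \qquad k = 0, 1, \ldots.$$
By repeatedly applying $T$ to both sides, we obtain
$$J_{\mu_{nm}} \le T^m(J_{\mu_{(n-1)m}}) + \lambda(\alpha^{m-1} + \alpha^{m-2}\rho + \cdots + \rho^{m-1})\rho^{(n-1)m}. \tag{28}$$
Let $\bar\lambda = \lambda(\alpha^{m-1} + \alpha^{m-2}\rho + \cdots + \rho^{m-1})$. Then (28) can be written as
$$J_{\mu_{nm}} \le T^m(J_{\mu_{(n-1)m}}) + \bar\lambda\rho^{(n-1)m}, \qquad n = 1, 2, \ldots. \tag{29}$$
Using (29) repeatedly, we have, for all $n$,
$$J_{\mu_{nm}} \le T^m(J_{\mu_{(n-1)m}}) + \bar\lambda\rho^{(n-1)m} \le T^m[T^m(J_{\mu_{(n-2)m}}) + \bar\lambda\rho^{(n-2)m}] + \bar\lambda\rho^{(n-1)m} \le T^{2m}(J_{\mu_{(n-2)m}}) + \bar\lambda\rho\,\rho^{(n-2)m} + \bar\lambda\rho^{(n-1)m} \le \cdots \le T^{nm}(J_{\mu_0}) + \bar\lambda\sum_{k=0}^{n-1}\rho^k\rho^{(n-k-1)m}.$$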
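Since $\rho^k\rho^{(n-k-1)m} \le \rho^{n-1}$ for all $k = 0, 1, \ldots, n - 1$, this inequality yields
$$J^* \le J_{\mu_{nm}} \le T^{nm}(J_{\mu_0}) + n\rho^{n-1}\bar\lambda, \qquad n = 1, 2, \ldots.$$
Since $\lim_{n\to\infty}(n\rho^{n-1}) = 0$ and $\lim_{n\to\infty}\|T^{nm}(J_{\mu_0}) - J^*\| = 0$, the right side tends to $J^*$ as $n \to \infty$, and it follows that
$$\lim_{n\to\infty}\|J_{\mu_{nm}} - J^*\| = 0.$$
Since by construction
$$\|T_{\mu_{nm+1}}(\tilde J_{\mu_{nm}}) - \tilde J_{\mu_{nm}}\| \le \|T_{\mu_{nm+1}}(\tilde J_{\mu_{nm}}) - T(\tilde J_{\mu_{nm}})\| + \|T(\tilde J_{\mu_{nm}}) - T(J_{\mu_{nm}})\| + \|T(J_{\mu_{nm}}) - T(J^*)\| + \|J^* - J_{\mu_{nm}}\| + \|J_{\mu_{nm}} - \tilde J_{\mu_{nm}}\| \le (\delta + \alpha\gamma + \gamma)\rho^{nm} + (1 + \alpha)\|J_{\mu_{nm}} - J^*\|,$$
we conclude that
$$\lim_{n\to\infty}\|T_{\mu_{nm+1}}(\tilde J_{\mu_{nm}}) - \tilde J_{\mu_{nm}}\| = 0.$$
This contradicts our assumption that
$$\|T_{\mu_{k+1}}(\tilde J_{\mu_k}) - \tilde J_{\mu_k}\| > \varepsilon$$
for every $k$. Q.E.D.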
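The following is a minimal sketch of the modified policy iteration algorithm, under assumptions that are not in the text. The model is a finite discounted one with $H(x, u, J) = g(x, u) + \alpha\sum_y p_{xy}(u)J(y)$ and $0 < \alpha < 1$, so that Assumption C holds with $m = 1$ and $\rho = \alpha$. The helper names (`q_value`, `apply_T_mu`, `approx_evaluate`, `greedy`, `modified_policy_iteration`) and the data layout `P[u][x][y]`, `g[x][u]` are hypothetical. Step 2 iterates $T_\mu$ until the standard contraction bound $\|J - J_\mu\| \le \|T_\mu(J) - J\|/(1 - \alpha)$ certifies $\|\tilde J_{\mu_k} - J_{\mu_k}\| \le \gamma\rho^k$. Step 3 uses an exact minimizer, so its $\delta$ condition holds trivially.

```python
# Sketch of modified policy iteration for a finite model (hypothetical layout):
# P[u][x][y] = transition probability under control u, g[x][u] = expected
# one-stage cost, alpha in (0, 1) the discount factor (so m = 1, rho = alpha).

def q_value(J, x, u, P, g, alpha):
    """H(x, u, J) = g(x, u) + alpha * sum_y P[u][x][y] * J(y)."""
    return g[x][u] + alpha * sum(P[u][x][y] * J[y] for y in range(len(J)))


def apply_T_mu(J, mu, P, g, alpha):
    """T_mu(J)(x) = H(x, mu(x), J)."""
    return [q_value(J, x, mu[x], P, g, alpha) for x in range(len(J))]


def sup_norm(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def approx_evaluate(mu, J_start, tol, P, g, alpha):
    """Step 2: return J with ||J - J_mu|| <= tol, certified by the bound
    ||J - J_mu|| <= ||T_mu(J) - J|| / (1 - alpha)."""
    J = list(J_start)
    while True:
        TJ = apply_T_mu(J, mu, P, g, alpha)
        if sup_norm(TJ, J) / (1.0 - alpha) <= tol:
            return J
        J = TJ


def greedy(J, P, g, alpha):
    """Step 3 with an exact minimizer, so that T_mu(J) = T(J) for the result."""
    mu = []
    for x in range(len(J)):
        q = [q_value(J, x, u, P, g, alpha) for u in range(len(g[x]))]
        mu.append(q.index(min(q)))
    return mu


def modified_policy_iteration(P, g, alpha, gamma, eps, mu0):
    """Return (mu_kbar, kbar), the final function and iteration count."""
    rho = alpha
    mu, J_tilde, k = list(mu0), [0.0] * len(g), 0
    while True:
        # Warm-start the evaluation from the previous approximation.
        J_tilde = approx_evaluate(mu, J_tilde, gamma * rho ** k, P, g, alpha)
        mu_next = greedy(J_tilde, P, g, alpha)
        # Stopping test of Step 3: ||T_{mu_{k+1}}(J~) - J~|| <= eps.
        if sup_norm(apply_T_mu(J_tilde, mu_next, P, g, alpha), J_tilde) <= eps:
            return mu, k
        mu, k = mu_next, k + 1
```

As an illustration, consider a hypothetical two-state example in which control 0 keeps the state and control 1 switches it: `P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]`, `g = [[2.0, 0.5], [1.0, 3.0]]`, `alpha = 0.9`. Starting from `mu0 = [0, 1]`, the sketch should return the policy `[1, 0]`, whose cost is $(9.5, 10)$. That cost is the optimal cost for this example, which is consistent with the bound (17).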
4.3.3 Mathematical Programming
Let the state space $S$ be a finite set denoted by
$$S = \{x_1, x_2, \ldots, x_n\},$$
and assume $\tilde B = B$. From part (a) of Proposition 4.2, we have that whenever $J \in B$ and $J \le T(J)$, then $J \le J^*$. Hence the values $J^*(x_1), \ldots, J^*(x_n)$ solve the mathematical programming problem
$$\text{maximize} \quad \sum_{i=1}^n \lambda_i$$
$$\text{subject to} \quad \lambda_i \le H(x_i, u, J_\lambda), \qquad i = 1, \ldots, n, \quad u \in U(x_i),$$
where $J_\lambda$ is the function taking values $J_\lambda(x_i) = \lambda_i$, $i = 1, \ldots, n$. If $U(x_i)$ is a finite set for each $i$, then this problem is a finite-dimensional (possibly nonlinear) programming problem having a finite number of inequality constraints. In fact, for the stochastic optimal control problem of Section 2.3.2, this problem is a linear programming problem, as the reader can easily verify (see also DPSC, Section 6.2). This linear program can be solved in a finite number of arithmetic operations.
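As a sketch of the linear program mentioned above, assume (hypothetically) a finite version of the model of Section 2.3.2 with $H(x, u, J) = g(x, u) + \alpha\sum_j p_{x_i x_j}(u) J(x_j)$ and $\alpha < 1$. Each pair $(x_i, u)$ then contributes one linear constraint, $\lambda_i - \alpha\sum_j p_{x_i x_j}(u)\lambda_j \le g(x_i, u)$. The function name `lp_optimal_cost` and the array layout are illustrative. The solver is SciPy's `scipy.optimize.linprog`, which minimizes, so the objective is negated.

```python
import numpy as np
from scipy.optimize import linprog


def lp_optimal_cost(P, g, alpha):
    """maximize sum_i lam_i subject to
    lam_i <= g[i][u] + alpha * sum_j P[u][i][j] * lam_j for all states i and controls u.
    Assumes finite S and U and alpha < 1 (hypothetical finite model)."""
    P = np.asarray(P, dtype=float)   # shape (num_controls, n, n)
    g = np.asarray(g, dtype=float)   # shape (n, num_controls)
    n, num_u = g.shape
    rows, rhs = [], []
    for i in range(n):
        for u in range(num_u):
            row = -alpha * P[u, i, :]          # new array, safe to modify
            row[i] += 1.0                      # lam_i - alpha * P[u, i, :] @ lam
            rows.append(row)
            rhs.append(g[i, u])
    res = linprog(c=-np.ones(n), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * n, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x
```

Take a hypothetical two-state example in which control 0 keeps the state and control 1 switches it: `P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]`, `g = [[2.0, 0.5], [1.0, 3.0]]`, `alpha = 0.9`. There the constraints reduce to $\lambda_2 \le 10$ and $\lambda_1 \le 0.5 + 0.9\lambda_2$, so the solution should be approximately $(9.5, 10)$.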
4.4 Application to Specific Models
The results of the preceding sections apply in their entirety to the problems of Sections 2.3.3 and 2.3.5 if $\alpha < 1$ and $g$ is a nonnegative bounded function. Under these circumstances Assumption C is satisfied, as we now show.
Stochastic Optimal Control-Outer Integral Formulation
Proposition 4.10 Consider the mapping
$$H(x, u, J) = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\} \tag{30}$$
of Section 2.3.3 and let $J_0(x) = 0$ for all $x \in S$. Assume that $\alpha < 1$ and for some $b \in R$ there holds
$$0 \le g(x, u, w) \le b \qquad \forall x \in S,\ u \in U(x),\ w \in W. \tag{31}$$
Then Assumption C is satisfied with $\tilde B$ equal to the set of all nonnegative functions $J \in B$, the scalars in (2) and (3) equal to $2\alpha$ and $\alpha$, respectively, and $m = 1$.
Note If the special cases of the mappings of Sections 2.3.1 and 2.3.2 are considered, then $\tilde B$ can be taken equal to $B$, and the scalars in (2) and (3) can both be taken equal to $\alpha$.
Proof Clearly $J_0 \in \tilde B$ and $T(J), T_\mu(J) \in \tilde B$ for all $J \in \tilde B$ and $\mu \in M$. We also have, for any $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$,
$$J_0 \le T_{\mu_0}(J_0) \le (T_{\mu_0}T_{\mu_1})(J_0) \le \cdots \le (T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0) \le \cdots,$$
and hence $\lim_{N\to\infty}(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(x)$ exists for all $x \in S$. It is also easy to verify inductively using Lemma A.2 that
$$(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(x) \le \sum_{k=0}^{N-1}\alpha^k b \le b/(1 - \alpha) \qquad \forall x \in S,\ N = 1, 2, \ldots.$$
Hence $\lim_{N\to\infty}(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(x)$ is a real number for every $x$. We have for all $x \in S$, $J, J' \in B$, $\mu \in M$, and $w \in W$,
Hence, using Lemma A.3(b),
Hence
A symmetric argument yields $T_\mu(J')(x) - T_\mu(J)(x) \le 2\alpha\|J - J'\|$. Therefore,
Taking the supremum of the left side over $x \in S$, we have
$$\|T_\mu(J) - T_\mu(J')\| \le 2\alpha\|J - J'\| \qquad \forall \mu \in M,\ J, J' \in B, \tag{33}$$
which shows that (2) holds. If $J, J' \in \tilde B$, then from (31), Lemma A.2, and Lemma A.3(a), we obtain in place of (32)
$$E^*\{g[x, \mu(x), w] + \alpha J[f(x, \mu(x), w)] \mid x, \mu(x)\} \le E^*\{g[x, \mu(x), w] + \alpha J'[f(x, \mu(x), w)] \mid x, \mu(x)\} + \alpha\|J - J'\|,$$
and proceeding as before, we obtain in place of (33)
$$\|T_\mu(J) - T_\mu(J')\| \le \alpha\|J - J'\| \qquad \forall \mu \in M,\ J, J' \in \tilde B.$$
This shows that (3) holds with $\rho = \alpha$. Q.E.D.
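The contraction property can be checked numerically in the simplest special case. The sketch below assumes, hypothetically, a finite instance of the model of Section 2.3.2 with a fixed $\mu$ already substituted. The disturbance $w$ takes finitely many values with probabilities `p[x][w]`, the next state is `f[x][w]`, and the cost satisfies $0 \le$ `g[x][w]` $\le b$. In this case the expectation is an ordinary finite sum, so by the Note after Proposition 4.10 one expects $\|T_\mu(J) - T_\mu(J')\| \le \alpha\|J - J'\|$ for arbitrary bounded $J, J'$. One also expects the bound $T_\mu^N(J_0) \le b/(1 - \alpha)$ from the proof. The helper names are illustrative.

```python
import random


def make_model(n_states, n_dist, b, seed=0):
    """Random finite model with mu fixed (hypothetical): p, f, g indexed by (x, w)."""
    rng = random.Random(seed)
    p = []
    for _ in range(n_states):
        raw = [rng.random() + 1e-3 for _ in range(n_dist)]
        total = sum(raw)
        p.append([r / total for r in raw])
    f = [[rng.randrange(n_states) for _ in range(n_dist)] for _ in range(n_states)]
    g = [[rng.uniform(0.0, b) for _ in range(n_dist)] for _ in range(n_states)]
    return p, f, g


def T_mu(J, p, f, g, alpha):
    """T_mu(J)(x) = E{ g + alpha * J[f] | x, mu(x) }, a finite sum over w."""
    return [sum(p[x][w] * (g[x][w] + alpha * J[f[x][w]]) for w in range(len(p[x])))
            for x in range(len(J))]


def sup_dist(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def check_contraction(alpha=0.8, b=2.0, n_states=5, n_dist=3, trials=200, seed=1):
    """Largest observed ratio ||T_mu(J) - T_mu(J')|| / ||J - J'|| over random pairs."""
    p, f, g = make_model(n_states, n_dist, b, seed)
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        J = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]
        Jp = [rng.uniform(-5.0, 5.0) for _ in range(n_states)]
        ratio = sup_dist(T_mu(J, p, f, g, alpha), T_mu(Jp, p, f, g, alpha)) / sup_dist(J, Jp)
        worst = max(worst, ratio)
    return worst
```

Because the $J$ values enter only through the averaged term $\alpha\sum_w p(w)J[f(x,w)]$, the observed ratio should never exceed $\alpha$.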
Minimax Control
Proposition 4.11 Consider the mapping of Section 2.3.5 and let $J_0(x) = 0$ for all $x \in S$. Assume that $\alpha < 1$ and for some $b \in R$ there holds
$$0 \le g(x, u, w) \le b \qquad \forall x \in S,\ u \in U(x),\ w \in W.$$
Then Assumption C is satisfied with $\tilde B$ equal to $B$, $m = 1$, and the scalars in (2) and (3) both equal to $\alpha$.
Proof The proof is entirely similar to the one of Proposition 4.10. Q.E.D.
For additional problems where the theory of this chapter is applicable, we refer the reader to DPSC. An example of an interesting problem where Assumption C is satisfied with m > 1 is the first passage problem described in Section 7.4 of DPSC.
Chapter 5
Infinite Horizon Models under Monotonicity Assumptions
5.1 General Remarks and Assumptions
Consider the infinite horizon problem
$$\text{minimize} \quad J_\pi(x) = \lim_{N\to\infty}(T_{\mu_0}T_{\mu_1}\cdots T_{\mu_{N-1}})(J_0)(x) \tag{1}$$
$$\text{subject to} \quad \pi = (\mu_0, \mu_1, \ldots) \in \Pi.$$
In this chapter we impose monotonicity assumptions on the function $J_0$, which guarantee that $J_\pi$ is well defined for all $\pi \in \Pi$. For every result to be shown in this chapter, one of the following two assumptions will be in effect.
Assumption I (Uniform Increase Assumption) There holds
$$J_0(x) \le H(x, u, J_0) \qquad \forall x \in S,\ u \in U(x). \tag{2}$$
Assumption D (Uniform Decrease Assumption) There holds
$$J_0(x) \ge H(x, u, J_0) \qquad \forall x \in S,\ u \in U(x). \tag{3}$$
It is easy to see that under each of these assumptions the limit in (1) is well defined as a real number or $\pm\infty$. Indeed, in the case of Assumption I we have from (2) that
$$J_0 \le T_{\mu_0}(J_0) \le (T_{\mu_0}T_{\mu_1})(J_0) \le \cdots \le (T_{\mu_0}T_{\mu_1}\cdots T_{\mu_{N-1}})(J_0) \le \cdots,$$
while in the case of Assumption D we have from (3) that
$$J_0 \ge T_{\mu_0}(J_0) \ge (T_{\mu_0}T_{\mu_1})(J_0) \ge \cdots \ge (T_{\mu_0}T_{\mu_1}\cdots T_{\mu_{N-1}})(J_0) \ge \cdots.$$
In both cases, the limit in (1) clearly exists in the extended real numbers for each $x \in S$. In our analysis under Assumptions I or D we will occasionally need to assume one or more of the following continuity properties for the mapping $H$. Assumptions I.1 and I.2 will be used in conjunction with Assumption I, while Assumptions D.1 and D.2 will be used in conjunction with Assumption D.
Assumption I.1 If $\{J_k\} \subset F$ is a sequence satisfying $J_0 \le J_k \le J_{k+1}$ for all $k$, then
$$\lim_{k\to\infty} H(x, u, J_k) = H\Big(x, u, \lim_{k\to\infty} J_k\Big) \qquad \forall x \in S,\ u \in U(x). \tag{4}$$
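Assumption I.2 There exists a scalar $\alpha > 0$ such that for all scalars $r > 0$ and functions $J \in F$ with $J_0 \le J$, we have
$$H(x, u, J + r) \le H(x, u, J) + \alpha r \qquad \forall x \in S,\ u \in U(x). \tag{5}$$
Assumption D.1 If $\{J_k\} \subset F$ is a sequence satisfying $J_{k+1} \le J_k \le J_0$ for all $k$, then
$$\lim_{k\to\infty} H(x, u, J_k) = H\Big(x, u, \lim_{k\to\infty} J_k\Big) \qquad \forall x \in S,\ u \in U(x). \tag{6}$$
Assumption D.2 There exists a scalar $\alpha > 0$ such that for all scalars $r > 0$ and functions $J \in F$ with $J \le J_0$, we have
$$H(x, u, J - r) \ge H(x, u, J) - \alpha r \qquad \forall x \in S,\ u \in U(x). \tag{7}$$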
5.2 The Optimality Equation
We first consider the question whether the optimality equation $J^* = T(J^*)$ holds. As a preliminary step we prove the following result, which is of independent interest.
Proposition 5.1 Let Assumptions I, I.1, and I.2 hold. Then given any $\varepsilon > 0$, there exists an $\varepsilon$-optimal policy, i.e., a $\pi_\varepsilon \in \Pi$ such that
$$J^* \le J_{\pi_\varepsilon} \le J^* + \varepsilon.$$
Furthermore, if the scalar $\alpha$ in I.2 satisfies $\alpha < 1$, the policy $\pi_\varepsilon$ can be taken to be stationary.
Proof Let $\{\varepsilon_k\}$ be a sequence such that $\varepsilon_k > 0$ for all $k$ and
$$\sum_{k=0}^\infty \alpha^k\varepsilon_k = \varepsilon. \tag{9}$$
For each $x \in S$, consider a sequence of policies $\{\pi_k[x]\} \subset \Pi$ of the form $\pi_k[x] = (\mu_0^k[x], \mu_1^k[x], \ldots)$ such that for $k = 0, 1, \ldots$
$$J_{\pi_k[x]}(x) \le J^*(x) + \varepsilon_k. \tag{10}$$
Such a sequence exists, since we have $J^*(x) > -\infty$ under our assumptions. The (admittedly confusing) notation used here and later in the proof should be interpreted as follows. The policy $\pi_k[x] = (\mu_0^k[x], \mu_1^k[x], \ldots)$ is associated with $x$. Thus $\mu_i^k[x]$ denotes, for each $x \in S$ and $k$, a function in $M$, while $\mu_i^k[x](z)$ denotes the value of $\mu_i^k[x]$ at an element $z \in S$. In particular, $\mu_i^k[x](x)$ denotes the value of $\mu_i^k[x]$ at $x$.
Consider the functions $\bar\mu_k \in M$ defined by
$$\bar\mu_k(x) = \mu_0^k[x](x) \qquad \forall x \in S, \tag{11}$$
and the functions $\bar J_k$ defined by
By using (10), (11), I, and I.1, we obtain
We have from (12), (13), and I.2 for all $k = 1, 2, \ldots$ and $x \in S$
$$T_{\bar\mu_{k-1}}(\bar J_k)(x) = H[x, \bar\mu_{k-1}(x), \bar J_k] \le H[x, \bar\mu_{k-1}(x), J^* + \varepsilon_k] \le H[x, \bar\mu_{k-1}(x), J^*] + \alpha\varepsilon_k,$$
and finally,
Using this inequality and I.2, we obtain
Continuing in the same manner, we obtain for $k = 1, 2, \ldots$
Since $J_0 \le \bar J_k$, it follows that
Denote $\pi_\varepsilon = (\bar\mu_0, \bar\mu_1, \ldots)$. Then by taking the limit in the preceding inequality and using (9), we obtain
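If $\alpha < 1$, we take $\varepsilon_k = \varepsilon(1 - \alpha)$ for all $k$ and $\pi_k[x] = (\mu_0[x], \mu_1[x], \ldots)$ in (10). The stationary policy $\pi_\varepsilon = (\mu, \mu, \ldots)$, where $\mu(x) = \mu_0[x](x)$ for all $x \in S$, satisfies $J_{\pi_\varepsilon} \le J^* + \varepsilon$. Q.E.D.
It is easy to see that the assumption $\alpha < 1$ is essential in order to be able to take $\pi_\varepsilon$ stationary in the preceding proposition. As an example, take $S = \{0\}$, $U(0) = (0, \infty)$, $J_0(0) = 0$, $H(0, u, J) = u + J(0)$. Then $J^*(0) = 0$, but for any $\mu \in M$, we have $J_\mu(0) = \infty$. (Here I.2 holds with $\alpha = 1$. A nonstationary policy applying the controls $\varepsilon/2^{k+1}$, $k = 0, 1, \ldots$, has cost $\varepsilon$, whereas a stationary policy pays the same positive cost $\mu(0)$ at every stage.) By using Proposition 5.1 we can prove the optimality equation under I, I.1, and I.2.
Proposition 5.2 Let I, I.1, and I.2 hold. Then
$$J^* = T(J^*).$$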
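Furthermore, if $J' \in F$ is such that $J' \ge J_0$ and $J' \ge T(J')$, then $J' \ge J^*$.
Proof For every $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ and $x \in S$, we have, from I.1,
$$J_\pi(x) = \lim_{k\to\infty}(T_{\mu_0}T_{\mu_1}\cdots T_{\mu_k})(J_0)(x) = T_{\mu_0}\Big[\lim_{k\to\infty}(T_{\mu_1}\cdots T_{\mu_k})(J_0)\Big](x) \ge T_{\mu_0}(J^*)(x) \ge T(J^*)(x).$$
By taking the infimum of the left-hand side over $\pi \in \Pi$, we obtain
$$J^* \ge T(J^*).$$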
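To prove the reverse inequality, let $\varepsilon_1$ and $\varepsilon_2$ be any positive scalars and let $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots)$ be such that
$$T_{\bar\mu_0}(J^*) \le T(J^*) + \varepsilon_1, \qquad J_{\bar\pi_1} \le J^* + \varepsilon_2,$$
where $\bar\pi_1 = (\bar\mu_1, \bar\mu_2, \ldots)$. Such a policy exists by Proposition 5.1. We have
$$J_{\bar\pi} = \lim_{k\to\infty}(T_{\bar\mu_0}T_{\bar\mu_1}\cdots T_{\bar\mu_k})(J_0) = T_{\bar\mu_0}\Big[\lim_{k\to\infty}(T_{\bar\mu_1}\cdots T_{\bar\mu_k})(J_0)\Big] = T_{\bar\mu_0}(J_{\bar\pi_1}) \le T_{\bar\mu_0}(J^* + \varepsilon_2) \le T_{\bar\mu_0}(J^*) + \alpha\varepsilon_2 \le T(J^*) + \varepsilon_1 + \alpha\varepsilon_2.$$
Since $J^* \le J_{\bar\pi}$ and $\varepsilon_1$ and $\varepsilon_2$ can be taken arbitrarily small, it follows that
$$J^* \le T(J^*).$$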
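Hence $J^* = T(J^*)$. Assume that $J' \in F$ satisfies $J' \ge J_0$ and $J' \ge T(J')$. Let $\{\varepsilon_k\}$ be any sequence with $\varepsilon_k > 0$ and consider a policy $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots) \in \Pi$ such that
$$T_{\bar\mu_k}(J') \le T(J') + \varepsilon_k, \qquad k = 0, 1, \ldots.$$
We have, from I.2,
$$\begin{aligned} J^* &= \inf_{\pi\in\Pi}\lim_{k\to\infty}(T_{\mu_0}\cdots T_{\mu_k})(J_0) \le \inf_{\pi\in\Pi}\liminf_{k\to\infty}(T_{\mu_0}\cdots T_{\mu_k})(J') \\ &\le \liminf_{k\to\infty}(T_{\bar\mu_0}\cdots T_{\bar\mu_k})(J') \le \liminf_{k\to\infty}(T_{\bar\mu_0}\cdots T_{\bar\mu_{k-1}})[T(J') + \varepsilon_k] \\ &\le \liminf_{k\to\infty}(T_{\bar\mu_0}\cdots T_{\bar\mu_{k-1}})(J' + \varepsilon_k) \le \cdots \le J' + \sum_{i=0}^\infty \alpha^i\varepsilon_i. \end{aligned}$$
Since we may choose $\sum_{i=0}^\infty \alpha^i\varepsilon_i$ as small as desired, it follows that $J^* \le J'$. Q.E.D.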
The following counterexamples show that I.1 and I.2 are essential in order for Proposition 5.2 to hold.
COUNTEREXAMPLE 1 Take $S = \{0, 1\}$, $C = U(0) = U(1) = (-1, 0]$, $J_0(0) = J_0(1) = -1$, $H(0, u, J) = u$ if $J(1) \le -1$, $H(0, u, J) = 0$ if $J(1) > -1$, and $H(1, u, J) = u$. Then $(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(0) = 0$ for $N \ge 2$ and $(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(1) = \mu_0(1)$ for $N \ge 1$. Thus $J^*(0) = 0$, $J^*(1) = -1$, while $T(J^*)(0) = -1$, $T(J^*)(1) = -1$, and hence $J^* \ne T(J^*)$. Notice also that $J_0$ is a fixed point of $T$, while $J_0 \le J^*$ and $J_0 \ne J^*$, so the second part of Proposition 5.2 fails when $J' = J_0$. Here I and I.1 are satisfied, but I.2 is violated.
COUNTEREXAMPLE 2 Take $S = \{0, 1\}$, $C = U(0) = U(1) = \{0\}$, $J_0(0) = J_0(1) = 0$, $H(0, 0, J) = 0$ if $J(1) < \infty$, $H(0, 0, J) = \infty$ if $J(1) = \infty$, and $H(1, 0, J) = J(1) + 1$. Then $(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(0) = 0$ and $(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0)(1) = N$. Thus $J^*(0) = 0$, $J^*(1) = \infty$. On the other hand, we have $T(J^*)(0) = T(J^*)(1) = \infty$ and $J^* \ne T(J^*)$. Here I and I.2 are satisfied, but I.1 is violated.
As a corollary to Proposition 5.2 we obtain the following.
Corollary 5.2.1 Let I, I.1, and I.2 hold. Then for every stationary policy $\pi = (\mu, \mu, \ldots)$, we have
$$J_\mu = T_\mu(J_\mu).$$
Furthermore, if $J' \in F$ is such that $J' \ge J_0$ and $J' \ge T_\mu(J')$, then $J' \ge J_\mu$.
Proof Consider the variation of our problem where the control constraint set is $U_\mu(x) = \{\mu(x)\}$ rather than $U(x)$ for all $x \in S$. Application of Proposition 5.2 yields the result. Q.E.D.
We now provide the counterpart of Proposition 5.2 under Assumption D.
Proposition 5.3 Let D and D.1 hold. Then
$$J^* = T(J^*).$$
Furthermore, if $J' \in F$ is such that $J' \le J_0$ and $J' \le T(J')$, then $J' \le J^*$.
Proof We first show the following lemma.
Lemma 5.1 Let D hold. Then
$$J^* = \lim_{N\to\infty} J_N^*, \tag{15}$$
where $J_N^*$ is the optimal cost function for the $N$-stage problem.
Proof Clearly we have $J^* \le J_N^*$ for all $N$, and hence $J^* \le \lim_{N\to\infty} J_N^*$. Also, for all $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ we have
$$(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0) \ge J_N^*,$$
and by taking the limit on both sides we obtain $J_\pi \ge \lim_{N\to\infty} J_N^*$, and hence $J^* \ge \lim_{N\to\infty} J_N^*$. Q.E.D.
Proof (continued) We return now to the proof of Proposition 5.3. An argument entirely similar to the one of the proof of Lemma 5.1 shows that under D we have for all $x \in S$
$$\lim_{N\to\infty}\inf_{u\in U(x)} H(x, u, J_N^*) = \inf_{u\in U(x)}\lim_{N\to\infty} H(x, u, J_N^*). \tag{16}$$
Using D.1, this equation yields
$$\lim_{N\to\infty} T(J_N^*) = T\Big(\lim_{N\to\infty} J_N^*\Big). \tag{17}$$
Since D and D.1 are equivalent to Assumption F.1' of Chapter 3, by Corollary 3.1.1 we have $J_N^* = T^N(J_0)$, from which we conclude that $T(J_N^*) = T^{N+1}(J_0) = J_{N+1}^*$. Combining this relation with (15) and (17), we obtain $J^* = T(J^*)$. To complete the proof, let $J' \in F$ be such that $J' \le J_0$ and $J' \le T(J')$. Then we have
$$J^* = \inf_{\pi\in\Pi}\lim_{N\to\infty}(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0) \ge \lim_{N\to\infty}\inf_{\pi\in\Pi}(T_{\mu_0}\cdots T_{\mu_{N-1}})(J_0) \ge \lim_{N\to\infty}\inf_{\pi\in\Pi}(T_{\mu_0}\cdots T_{\mu_{N-1}})(J') \ge \lim_{N\to\infty} T^N(J') \ge J'.$$
Hence $J^* \ge J'$. Q.E.D.
In Counterexamples 1 and 2 of Section 3.2, D is satisfied but D.1 is violated. In both cases we have $J^* \ne T(J^*)$, as the reader can easily verify. A cursory examination of the proof of Proposition 5.3 reveals that the only point where we used D.1 was in establishing the relations
$$\lim_{N\to\infty} T(J_N^*) = T\Big(\lim_{N\to\infty} J_N^*\Big)$$
and $J_N^* = T^N(J_0)$. If these relations can be established independently, then the result of Proposition 5.3 follows. In this manner we obtain the following corollary.
Corollary 5.3.1 Let D hold and assume that D.2 holds, $S$ is a finite set, and $J^*(x) > -\infty$ for all $x \in S$. Then $J^* = T(J^*)$. Furthermore, if $J' \in F$ is such that $J' \le J_0$ and $J' \le T(J')$, then $J' \le J^*$.
Proof A nearly verbatim repetition of the proof of Proposition 3.1(b) shows that, under D, D.2, and the assumption that $J^*(x) > -\infty$ for all $x \in S$, we have $J_N^* = T^N(J_0)$ for all $N = 1, 2, \ldots$. We will show that
$$\lim_{N\to\infty} H(x, u, J_N^*) = H(x, u, J^*) \qquad \forall x \in S,\ u \in U(x).$$
Then using (16) we obtain (17), and the result follows as in the proof of Proposition 5.3. Assume the contrary, i.e., that for some $\tilde x \in S$, $\tilde u \in U(\tilde x)$, and $\varepsilon > 0$, there holds
$$H(\tilde x, \tilde u, J_k^*) - \varepsilon > H(\tilde x, \tilde u, J^*), \qquad k = 1, 2, \ldots.$$
From the finiteness of $S$ and the fact that $J^*(x) = \lim_{N\to\infty} J_N^*(x) > -\infty$ for all $x$, we know that for some positive integer $\bar k$
$$J_k^* - (\varepsilon/\alpha) \le J^* \qquad \forall k \ge \bar k.$$
By using D.2 we obtain for all $k \ge \bar k$
$$H(\tilde x, \tilde u, J_k^*) - \varepsilon \le H(\tilde x, \tilde u, J_k^* - (\varepsilon/\alpha)) \le H(\tilde x, \tilde u, J^*),$$
which contradicts the earlier inequality. Q.E.D.
Similarly, as under Assumption I, we have the following corollary.
Corollary 5.3.2 Let D and D.1 hold. Then for every stationary policy $\pi = (\mu, \mu, \ldots)$, we have
$$J_\mu = T_\mu(J_\mu).$$
Furthermore, if $J' \in F$ is such that $J' \le J_0$ and $J' \le T_\mu(J')$, then $J' \le J_\mu$.
It is worth noting that Propositions 5.2 and 5.3 can form the basis for computation of $J^*$ when the state space $S$ is a finite set with $n$ elements denoted by $x_1, x_2, \ldots, x_n$. It follows from Proposition 5.2 that, under I, I.1, and I.2, $J^*(x_1), \ldots, J^*(x_n)$ solve the problem
$$\text{minimize} \quad \sum_{i=1}^n \lambda_i$$
$$\text{subject to} \quad \lambda_i \ge \inf_{u\in U(x_i)} H(x_i, u, J_\lambda), \quad \lambda_i \ge J_0(x_i), \qquad i = 1, \ldots, n,$$
where $J_\lambda$ is the function taking values $J_\lambda(x_i) = \lambda_i$, $i = 1, \ldots, n$. Under D and D.1, or D, D.2, and the assumption that $J^*(x) > -\infty$ for all $x \in S$, the corresponding problem is
$$\text{maximize} \quad \sum_{i=1}^n \lambda_i$$
$$\text{subject to} \quad \lambda_i \le H(x_i, u, J_\lambda), \quad \lambda_i \le J_0(x_i), \qquad i = 1, \ldots, n,\ u \in U(x_i).$$
When $U(x_i)$ is also a finite set for all $i$, then the preceding problem becomes a finite-dimensional (possibly nonlinear) programming problem.
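For a finite instance of the stochastic model under Assumption D, the maximization problem above is again a linear program. The sketch below makes several hypothetical assumptions: the model is undiscounted ($\alpha = 1$), the costs `g[x][u]` are nonpositive, the transition probabilities are `P[u][x][y]`, $J_0 = 0$, and $J^*(x) > -\infty$. The constraint $\lambda_i \le J_0(x_i)$ is passed to the solver as an upper bound on each variable. The function name is illustrative, and SciPy's `linprog` is used as the solver.

```python
import numpy as np
from scipy.optimize import linprog


def lp_optimal_cost_D(P, g, J0):
    """maximize sum(lam) s.t. lam_i <= g[i][u] + sum_j P[u][i][j] * lam_j
    and lam_i <= J0[i]  (hypothetical finite model, alpha = 1, g <= 0)."""
    P = np.asarray(P, dtype=float)      # shape (num_controls, n, n)
    g = np.asarray(g, dtype=float)      # shape (n, num_controls)
    n, num_u = g.shape
    A_ub, b_ub = [], []
    for i in range(n):
        for u in range(num_u):
            row = -P[u, i, :].copy()
            row[i] += 1.0               # lam_i - sum_j P[u][i][j] * lam_j <= g[i][u]
            A_ub.append(row)
            b_ub.append(g[i, u])
    res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, J0[i]) for i in range(n)], method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x
```

As a hypothetical example, let state 0 be absorbing and cost-free. From state 1 one may move to state 0 at cost $-1$ or stay at cost 0. From state 2 one may move to state 1 at cost $-0.5$ or stay at cost 0. The linear program should then return approximately $(0, -1, -1.5)$, which is $J^*$ for that example.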
5.3 Characterization of Optimal Policies
We have the following necessary and sufficient conditions for optimality of a stationary policy.
Proposition 5.4 Let I, I.1, and I.2 hold. Then a stationary policy $\pi^* = (\mu^*, \mu^*, \ldots)$ is optimal if and only if
$$T_{\mu^*}(J^*) = T(J^*). \tag{18}$$
Furthermore, if for each $x \in S$ there exists a policy which is optimal at $x$, then there exists a stationary optimal policy.
Proof If $\pi^*$ is optimal, then $J_{\mu^*} = J^*$ and the result follows from Proposition 5.2 and Corollary 5.2.1. Conversely, if $T_{\mu^*}(J^*) = T(J^*)$, then since $J^* = T(J^*)$ (Proposition 5.2), it follows that $T_{\mu^*}(J^*) = J^*$. Hence by Corollary 5.2.1, $J_{\mu^*} \le J^*$ and it follows that $\pi^*$ is optimal. If $\pi_x^* = (\mu_{0,x}^*, \mu_{1,x}^*, \ldots)$ is optimal at $x \in S$, we have, from I.1,
$$J^*(x) = J_{\pi_x^*}(x) = \lim_{k\to\infty}(T_{\mu_{0,x}^*}\cdots T_{\mu_{k,x}^*})(J_0)(x) = T_{\mu_{0,x}^*}\Big[\lim_{k\to\infty}(T_{\mu_{1,x}^*}\cdots T_{\mu_{k,x}^*})(J_0)\Big](x) \ge T_{\mu_{0,x}^*}(J^*)(x) \ge T(J^*)(x) = J^*(x).$$
Hence $T_{\mu_{0,x}^*}(J^*)(x) = T(J^*)(x)$ for all $x \in S$. Define $\mu^* \in M$ by $\mu^*(x) = \mu_{0,x}^*(x)$. Then $T_{\mu^*}(J^*) = T(J^*)$ and, by the result just proved, the stationary policy $(\mu^*, \mu^*, \ldots)$ is optimal. Q.E.D.
Proposition 5.5 Let D and D.1 hold. Then a stationary policy $\pi^* = (\mu^*, \mu^*, \ldots)$ is optimal if and only if
$$T_{\mu^*}(J_{\mu^*}) = T(J_{\mu^*}). \tag{19}$$
Proof If $\pi^*$ is optimal, then $J_{\mu^*} = J^*$, and the result follows from Proposition 5.3 and Corollary 5.3.2. Conversely, if $T_{\mu^*}(J_{\mu^*}) = T(J_{\mu^*})$, then we obtain, from Corollary 5.3.2, that $J_{\mu^*} = T(J_{\mu^*})$, and Proposition 5.3 yields $J_{\mu^*} \le J^*$. Hence $\pi^*$ is optimal. Q.E.D.
Examples where $\pi^*$ satisfies (18) or (19) but is not optimal under D or I, respectively, are given in DPSC, Section 6.4.
Proposition 5.4 shows that there exists a stationary optimal policy if and only if the infimum in the optimality equation
$$J^*(x) = \inf_{u\in U(x)} H(x, u, J^*)$$
is attained for every $x \in S$. When the infimum is not attained for some $x \in S$, the optimality equation can still be used to yield an $\varepsilon$-optimal policy, which can be taken to be stationary whenever the scalar $\alpha$ in I.2 is strictly less than one. This is shown in the following proposition.
Proposition 5.6 Let I, I.1, and I.2 hold. Then:
(a) If $\varepsilon > 0$, $\{\varepsilon_i\}$ satisfies $\sum_{i=0}^\infty \alpha^i\varepsilon_i = \varepsilon$, $\varepsilon_i > 0$, $i = 0, 1, \ldots$, and $\pi^* = (\mu_0^*, \mu_1^*, \ldots) \in \Pi$ is such that
$$T_{\mu_k^*}(J^*) \le T(J^*) + \varepsilon_k, \qquad k = 0, 1, \ldots,$$
then
$$J^* \le J_{\pi^*} \le J^* + \varepsilon.$$
(b) If $\varepsilon > 0$, the scalar $\alpha$ in I.2 is strictly less than one, and $\mu^* \in M$ is such that
$$T_{\mu^*}(J^*) \le T(J^*) + \varepsilon(1 - \alpha),$$
then
$$J^* \le J_{\mu^*} \le J^* + \varepsilon.$$
Proof (a) Since $T(J^*) = J^*$, we have $T_{\mu_k^*}(J^*) \le J^* + \varepsilon_k$, and applying $T_{\mu_{k-1}^*}$ to both sides we obtain
$$(T_{\mu_{k-1}^*}T_{\mu_k^*})(J^*) \le T_{\mu_{k-1}^*}(J^*) + \alpha\varepsilon_k \le J^* + \varepsilon_{k-1} + \alpha\varepsilon_k.$$
Applying $T_{\mu_{k-2}^*}$ throughout and repeating the process, we obtain, for every $k = 1, 2, \ldots$,
$$(T_{\mu_0^*}T_{\mu_1^*}\cdots T_{\mu_k^*})(J^*) \le J^* + \sum_{i=0}^k \alpha^i\varepsilon_i.$$
Since $J_0 \le J^*$, it follows that
$$(T_{\mu_0^*}T_{\mu_1^*}\cdots T_{\mu_k^*})(J_0) \le J^* + \sum_{i=0}^k \alpha^i\varepsilon_i.$$
By taking the limit as $k \to \infty$, we obtain $J_{\pi^*} \le J^* + \varepsilon$.
(b) This part is proved by taking $\varepsilon_k = \varepsilon(1 - \alpha)$ and $\mu_k^* = \mu^*$ for all $k$ in the preceding proof. Q.E.D.
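Part (b) can be illustrated on a small example. The sketch below makes hypothetical assumptions: a finite model with $H(x, u, J) = g(x, u) + \alpha\sum_y p_{xy}(u)J(y)$, $g \ge 0$, $J_0 = 0$, and $\alpha < 1$. Under these assumptions I, I.1, and I.2 hold with scalar $\alpha$, as in Proposition 5.12. The sketch computes $J^*$ by iterating $T$. It then deliberately picks, at each state, the lowest-index control whose value is within $\varepsilon(1 - \alpha)$ of the minimum in $T(J^*)$. Finally it checks the conclusion $J_{\mu^*} \le J^* + \varepsilon$. All names are illustrative.

```python
def q_values(J, x, P, g, alpha):
    """[H(x, u, J) for each control u], with H(x, u, J) = g(x, u) + alpha * E[J(next)]."""
    return [g[x][u] + alpha * sum(P[u][x][y] * J[y] for y in range(len(J)))
            for u in range(len(g[x]))]


def value_iteration(P, g, alpha, iters=1000):
    J = [0.0] * len(g)                       # J_0 = 0
    for _ in range(iters):
        J = [min(q_values(J, x, P, g, alpha)) for x in range(len(J))]
    return J


def policy_cost(mu, P, g, alpha, iters=1000):
    J = [0.0] * len(g)
    for _ in range(iters):
        J = [q_values(J, x, P, g, alpha)[mu[x]] for x in range(len(J))]
    return J


def near_greedy(J, P, g, alpha, slack):
    """Some mu with T_mu(J) <= T(J) + slack: the lowest-index control within slack."""
    mu = []
    for x in range(len(J)):
        q = q_values(J, x, P, g, alpha)
        best = min(q)
        mu.append(next(u for u, v in enumerate(q) if v <= best + slack))
    return mu


# Hypothetical two-state example: control 0 keeps the state, control 1 switches it.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
g = [[2.0, 0.5], [1.0, 3.0]]
alpha, eps = 0.9, 12.0
J_star = value_iteration(P, g, alpha)
mu = near_greedy(J_star, P, g, alpha, eps * (1 - alpha))
J_mu = policy_cost(mu, P, g, alpha)
```

In this example $J^* = (9.5, 10)$ and the chosen stationary policy `[0, 0]` is not optimal. Its cost is about $(20, 10)$, which still lies within $\varepsilon = 12$ of $J^*$, as part (b) guarantees.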
A weak counterpart of part (a) of Proposition 5.6 under D is given in Corollary 5.7.1. We have been unable to obtain a counterpart of part (b) or conditions for existence of a stationary optimal policy under D.
5.4 Convergence of the Dynamic Programming Algorithm-Existence of Stationary Optimal Policies
The DP algorithm consists of successive generation of the functions $T(J_0), T^2(J_0), \ldots$. Under Assumption I we have $T^k(J_0) \le T^{k+1}(J_0)$ for all $k$, while under Assumption D we have $T^{k+1}(J_0) \le T^k(J_0)$ for all $k$. Thus we can define a function $J_\infty \in F$ by
$$J_\infty(x) = \lim_{N\to\infty} T^N(J_0)(x) \qquad \forall x \in S. \tag{20}$$
We would like to investigate the question whether $J_\infty = J^*$. When Assumption D holds, the following proposition shows that we have $J_\infty = J^*$ under mild assumptions.
Proposition 5.7 Let D hold and assume that either D.1 holds or else $J_N^* = T^N(J_0)$ for all $N$, where $J_N^*$ is the optimal cost function of the $N$-stage problem. Then $J_\infty = J^*$.
Proof By Lemma 5.1 we have that D implies $J^* = \lim_{N\to\infty} J_N^*$. Corollary 3.1.1 shows also that under our assumptions $J_N^* = T^N(J_0)$. Hence $J^* = \lim_{N\to\infty} T^N(J_0) = J_\infty$. Q.E.D.
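The monotonicity of the DP iterates $T^k(J_0)$ under Assumptions I and D can be seen on small deterministic examples. The sketch below makes hypothetical assumptions: a finite deterministic model with $\alpha = 1$, $J_0 = 0$, and $H(x, u, J) = c(x, u) + J(y)$, where control $u$ leads from $x$ to $y$ at cost $c(x, u)$. Each `succ[x]` lists the available `(y, c)` pairs. With nonnegative costs Assumption I holds and the iterates increase. With nonpositive costs Assumption D holds and they decrease. The control sets are finite, and in both toy examples the iterates reach $J^*$ after finitely many steps. The function names are illustrative.

```python
def T(J, succ):
    """T(J)(x) = min over (y, c) in succ[x] of c + J[y]  (deterministic, alpha = 1)."""
    return [min(c + J[y] for y, c in succ[x]) for x in range(len(J))]


def dp_iterates(succ, num_iters):
    """Return [J_0, T(J_0), ..., T^num_iters(J_0)] with J_0 = 0."""
    J = [0.0] * len(succ)
    out = [J]
    for _ in range(num_iters):
        J = T(J, succ)
        out.append(J)
    return out


def is_monotone(iterates, increasing=True):
    """Check T^k(J_0) <= T^{k+1}(J_0) (or >=) componentwise for all k."""
    pairs = zip(iterates, iterates[1:])
    if increasing:
        return all(a <= b for prev, nxt in pairs for a, b in zip(prev, nxt))
    return all(a >= b for prev, nxt in pairs for a, b in zip(prev, nxt))


# Assumption I: nonnegative costs; state 0 is absorbing and cost-free.
succ_I = [[(0, 0.0)], [(0, 2.0), (2, 1.0)], [(0, 0.5), (1, 1.0)]]
# Assumption D: nonpositive costs; from state 1, move to state 0 at cost -1 or stay at cost 0.
succ_D = [[(0, 0.0)], [(0, -1.0), (1, 0.0)]]
```

For `succ_I` the iterates increase to $(0, 1.5, 0.5)$, where state 1 goes through state 2 rather than directly to state 0. For `succ_D` they decrease to $(0, -1)$.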
The following corollary is a weak counterpart of Proposition 5.1 and part (a) of Proposition 5.6 under D.
Corollary 5.7.1 Let D hold and assume that D.2 holds, $S$ is a finite set, and $J^*(x) > -\infty$ for all $x \in S$. Then for any $\varepsilon > 0$, there exists an $\varepsilon$-optimal policy, i.e., a $\pi_\varepsilon \in \Pi$ such that
$$J^* \le J_{\pi_\varepsilon} \le J^* + \varepsilon.$$
Proof For each $N$, denote $\varepsilon_N = \varepsilon/[2(1 + \alpha + \cdots + \alpha^{N-1})]$ and let $\pi_N = (\mu_0^N, \mu_1^N, \ldots, \mu_{N-1}^N, \mu, \mu, \ldots)$ be such that $\mu \in M$, and for $k = 0, 1, \ldots, N - 1$, $\mu_k^N \in M$ and
$$(T_{\mu_k^N}T^{N-k-1})(J_0) \le T^{N-k}(J_0) + \varepsilon_N.$$
We have $T_{\mu_{N-1}^N}(J_0) \le T(J_0) + \varepsilon_N$, and applying $T_{\mu_{N-2}^N}$ to both sides, we obtain
$$(T_{\mu_{N-2}^N}T_{\mu_{N-1}^N})(J_0) \le (T_{\mu_{N-2}^N}T)(J_0) + \alpha\varepsilon_N \le T^2(J_0) + (1 + \alpha)\varepsilon_N.$$
Continuing in the same manner, we have
$$(T_{\mu_0^N}\cdots T_{\mu_{N-1}^N})(J_0) \le T^N(J_0) + (1 + \alpha + \cdots + \alpha^{N-1})\varepsilon_N,$$
from which we obtain, for $N = 1, 2, \ldots$,
$$J_{\pi_N} \le T^N(J_0) + (\varepsilon/2).$$
As in the proof of Corollary 5.3.1 our assumptions imply that $J_N^* = T^N(J_0)$ for all $N$, so by Proposition 5.7, $\lim_{N\to\infty} T^N(J_0) = J^*$. Let $\bar N$ be such that $T^{\bar N}(J_0) \le J^* + (\varepsilon/2)$. Such an $\bar N$ exists by the finiteness of $S$ and the fact that $J^*(x) > -\infty$ for all $x \in S$. Then we obtain $J_{\pi_{\bar N}} \le J^* + \varepsilon$, and $\pi_{\bar N}$ is the desired policy. Q.E.D.
Under Assumptions I, I.1, and I.2, the equality $J_\infty = J^*$ may fail to hold even in simple deterministic optimal control problems, as shown in the following counterexample.
COUNTEREXAMPLE 3 Let $S = [0, \infty)$, $C = U(x) = (0, \infty)$ for all $x \in S$, $J_0(x) = 0$ for all $x \in S$, and
Then it is easy to verify that
$$J^*(x) = \inf_{\pi\in\Pi} J_\pi(x) = \infty \qquad \forall x \in S,$$
while
Hence $J_\infty(0) = 0$ and $J_\infty(0) < J^*(0)$.
In this example, we have $J^*(x) = \infty$ for all $x \in S$. Other examples exist where $J^* \ne J_\infty$ and $J^*(x) < \infty$ for all $x \in S$ (see [S14, p. 880]). The following preliminary result shows that in order to have $J_\infty = J^*$, it is necessary and sufficient to have $J_\infty = T(J_\infty)$.
Proposition 5.8 Let I, I.1, and I.2 hold. Then
$$J_\infty \le T(J_\infty) \le T(J^*) = J^*. \tag{21}$$
Furthermore, the equalities
$$J_\infty = T(J_\infty) = T(J^*) = J^* \tag{22}$$
hold if and only if
$$J_\infty = T(J_\infty). \tag{23}$$
Proof Clearly we have $J_\infty \le J_\pi$ for all $\pi \in \Pi$, and it follows that $J_\infty \le J^*$. Furthermore, by Proposition 5.2 we have $T(J^*) = J^*$. Also, we have, for all $k \ge 0$,
$$T(J_\infty)(x) = \inf_{u\in U(x)} H(x, u, J_\infty) \ge \inf_{u\in U(x)} H[x, u, T^k(J_0)] = T^{k+1}(J_0)(x).$$
Taking the limit of the right side, we obtain $T(J_\infty) \ge J_\infty$, which, combined with $J_\infty \le J^*$ and $T(J^*) = J^*$, proves (21). If (22) holds, then (23) also holds.
Conversely, let (23) hold. Then since we have $J_\infty \ge J_0$, we see from Proposition 5.2 that $J_\infty \ge J^*$, which combined with (21) proves (22). Q.E.D.
In what follows we provide a necessary and sufficient condition for $J_\infty = T(J_\infty)$ [and hence also (22)] to hold under Assumptions I, I.1, and I.2. We subsequently obtain a useful sufficient condition for $J_\infty = T(J_\infty)$ to hold, which at the same time guarantees the existence of a stationary optimal policy.
For any $J \in F$, we denote by $E(J)$ the epigraph of $J$, i.e., the subset of $SR$ given by
$$E(J) = \{(x, \lambda) \mid J(x) \le \lambda\}.$$
Under I we have $T^k(J_0) \le T^{k+1}(J_0)$ for all $k$ and also
$$J_\infty = \lim_{k\to\infty} T^k(J_0), \tag{24}$$
so it follows easily that
$$E(J_\infty) = \bigcap_{k=1}^\infty E[T^k(J_0)]. \tag{25}$$
Consider for each $k \ge 1$ the subset $C_k$ of $SCR$ given by
$$C_k = \{(x, u, \lambda) \mid H[x, u, T^{k-1}(J_0)] \le \lambda,\ x \in S,\ u \in U(x)\}. \tag{26}$$
Denote by $P(C_k)$ the projection of $C_k$ on $SR$, i.e.,†
$$P(C_k) = \{(x, \lambda) \mid \exists\, u \in U(x) \text{ s.t. } (x, u, \lambda) \in C_k\}. \tag{27}$$
Consider also the set $\bar P(C_k)$, defined in (28).
The set $\bar P(C_k)$ is obtained from $P(C_k)$ by adding for each $x$ the point $[x, \bar\lambda(x)]$, where $\bar\lambda(x)$ is the perhaps missing end point of the half-line $\{\lambda \mid (x, \lambda) \in P(C_k)\}$. We have the following lemma.
Lemma 5.2 Let I hold. Then for all $k \ge 1$
$$E[T^k(J_0)] = \bar P(C_k). \tag{29}$$
Furthermore, we have
$$P(C_k) = \bar P(C_k) = E[T^k(J_0)] \tag{30}$$
if and only if the infimum is attained for each $x \in S$ in the equation
$$T^k(J_0)(x) = \inf_{u\in U(x)} H[x, u, T^{k-1}(J_0)]. \tag{31}$$
† The symbol ∃ means "there exists" and the initials "s.t." stand for "such that."
Proof If $(x, \lambda) \in E[T^k(J_0)]$, we have
$$T^k(J_0)(x) = \inf_{u\in U(x)} H[x, u, T^{k-1}(J_0)] \le \lambda.$$
Let $\{\varepsilon_n\}$ be a sequence such that $\varepsilon_n > 0$, $\varepsilon_n \to 0$, and let $\{u_n\} \subset U(x)$ be a sequence such that
$$H[x, u_n, T^{k-1}(J_0)] \le T^k(J_0)(x) + \varepsilon_n \le \lambda + \varepsilon_n.$$
Then $(x, u_n, \lambda + \varepsilon_n) \in C_k$ and $(x, \lambda + \varepsilon_n) \in P(C_k)$ for all $n$. Since $\lambda + \varepsilon_n \to \lambda$, by (28) we obtain $(x, \lambda) \in \bar P(C_k)$. Hence
$$E[T^k(J_0)] \subset \bar P(C_k). \tag{32}$$
Conversely, let $(x, \lambda) \in \bar P(C_k)$. Then by (26)-(28) there exists a sequence $\{\lambda_n\}$ with $\lambda_n \to \lambda$ and a corresponding sequence $\{u_n\} \subset U(x)$ such that
$$T^k(J_0)(x) \le H[x, u_n, T^{k-1}(J_0)] \le \lambda_n.$$
Taking the limit as $n \to \infty$, we obtain $T^k(J_0)(x) \le \lambda$ and $(x, \lambda) \in E[T^k(J_0)]$. Hence $\bar P(C_k) \subset E[T^k(J_0)]$, and using (32) we obtain (29).
To prove that (30) is equivalent to the attainment of the infimum in (31), assume first that the infimum is attained by $\mu_{k-1}^*(x)$ for each $x \in S$. Then for each $(x, \lambda) \in E[T^k(J_0)]$ we have $H[x, \mu_{k-1}^*(x), T^{k-1}(J_0)] = T^k(J_0)(x) \le \lambda$, which implies by (27) that $(x, \lambda) \in P(C_k)$. Hence $E[T^k(J_0)] \subset P(C_k)$ and, in view of (29), we obtain (30). Conversely, if (30) holds, we have $[x, T^k(J_0)(x)] \in P(C_k)$ for every $x$ for which $T^k(J_0)(x) < \infty$. Then by (26) and (27), there exists a $\mu_{k-1}^*(x) \in U(x)$ such that
$$H[x, \mu_{k-1}^*(x), T^{k-1}(J_0)] \le T^k(J_0)(x) = \inf_{u\in U(x)} H[x, u, T^{k-1}(J_0)].$$
Hence the infimum in (31) is attained for all $x$ for which $T^k(J_0)(x) < \infty$. It is also trivially attained by all $u \in U(x)$ whenever $T^k(J_0)(x) = \infty$, and the proof is complete. Q.E.D.
Consider now the set $\bigcap_{k=1}^\infty C_k$ and define similarly as in (27) and (28) the sets
$$P\Big(\bigcap_{k=1}^\infty C_k\Big) = \Big\{(x, \lambda) \,\Big|\, \exists\, u \in U(x) \text{ s.t. } (x, u, \lambda) \in \bigcap_{k=1}^\infty C_k\Big\}, \tag{33}$$
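Using (25) and Lemma 5.2, it is easy to see that
We have the following proposition.
Proposition 5.9 Let I, I.1, and I.2 hold. Then:
(a) We have $J_\infty = T(J_\infty)$ (equivalently $J_\infty = J^*$) if and only if
(b) We have $J_\infty = T(J_\infty)$ (equivalently $J_\infty = J^*$), and the infimum in
$$J_\infty(x) = \inf_{u\in U(x)} H(x, u, J_\infty) \tag{38}$$
is attained for each $x \in S$ (equivalently there exists a stationary optimal policy) if and only if
Proof (a) Assume $J_\infty = T(J_\infty)$ and let $(x, \lambda)$ be in $E(J_\infty)$, i.e.,
$$\inf_{u\in U(x)} H(x, u, J_\infty) = J_\infty(x) \le \lambda.$$
Let $\{\varepsilon_n\}$ be any sequence with $\varepsilon_n > 0$, $\varepsilon_n \to 0$. Then there exists a sequence $\{u_n\}$ such that
$$H(x, u_n, J_\infty) \le \lambda + \varepsilon_n,$$
and so
$$H[x, u_n, T^{k-1}(J_0)] \le \lambda + \varepsilon_n, \qquad k, n = 1, 2, \ldots.$$
It follows that $(x, u_n, \lambda + \varepsilon_n) \in C_k$ for all $k, n$ and $(x, u_n, \lambda + \varepsilon_n) \in \bigcap_{k=1}^\infty C_k$ for all $n$. Hence $(x, \lambda + \varepsilon_n) \in P(\bigcap_{k=1}^\infty C_k)$ for all $n$, and since $\lambda + \varepsilon_n \to \lambda$, we obtain $(x, \lambda) \in \bar P(\bigcap_{k=1}^\infty C_k)$. Therefore
$$E(J_\infty) \subset \bar P\Big(\bigcap_{k=1}^\infty C_k\Big),$$
and using (36) we obtain (37). Conversely, let (37) hold. Then by (36) we have $\bar P(\bigcap_{k=1}^\infty C_k) = E(J_\infty)$. Let $x \in S$ be such that $J_\infty(x) < \infty$. Then $[x, J_\infty(x)] \in \bar P(\bigcap_{k=1}^\infty C_k)$, and there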
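exists a sequence $\{\lambda_n\}$ with $\lambda_n \to J_\infty(x)$ and a sequence $\{u_n\} \subset U(x)$ such that
$$H[x, u_n, T^{k-1}(J_0)] \le \lambda_n, \qquad k, n = 1, 2, \ldots.$$
Taking the limit with respect to $k$ and using I.1, we obtain
$$H(x, u_n, J_\infty) \le \lambda_n,$$
and since $T(J_\infty)(x) \le H(x, u_n, J_\infty)$, it follows that
$$T(J_\infty)(x) \le \lambda_n.$$
Taking the limit as $n \to \infty$, we obtain
$$T(J_\infty)(x) \le J_\infty(x)$$
for all $x \in S$ such that $J_\infty(x) < \infty$. Since the preceding inequality holds also for all $x \in S$ with $J_\infty(x) = \infty$, we have $T(J_\infty) \le J_\infty$. On the other hand, by Proposition 5.8, we have
$$J_\infty \le T(J_\infty).$$
Combining the two inequalities, we obtain $J_\infty = T(J_\infty)$.
(b) Assume $J_\infty = T(J_\infty)$ and that the infimum in (38) is attained for each $x \in S$. Then there exists a function $\mu^* \in M$ such that for each $(x, \lambda) \in E(J_\infty)$
$$H[x, \mu^*(x), J_\infty] \le \lambda.$$
Hence $H[x, \mu^*(x), T^{k-1}(J_0)] \le \lambda$ for $k = 1, 2, \ldots$, and we have $[x, \mu^*(x), \lambda] \in \bigcap_{k=1}^\infty C_k$. As a result, $(x, \lambda) \in P(\bigcap_{k=1}^\infty C_k)$. Hence
$$E(J_\infty) \subset P\Big(\bigcap_{k=1}^\infty C_k\Big),$$
and (39) follows from (35). Conversely, let (39) hold. We have for all $x \in S$ with $J_\infty(x) < \infty$ that $[x, J_\infty(x)] \in E(J_\infty) = P(\bigcap_{k=1}^\infty C_k)$. Thus there exists a $\mu^*(x) \in U(x)$ such that
$$[x, \mu^*(x), J_\infty(x)] \in \bigcap_{k=1}^\infty C_k,$$
from which we conclude that
$$H[x, \mu^*(x), T^{k-1}(J_0)] \le J_\infty(x), \qquad k = 1, 2, \ldots.$$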
Taking the limit and using I.1, we see that $T(J_\infty)(x) \le H[x, \mu^*(x), J_\infty] \le J_\infty(x)$. It follows that $T(J_\infty) \le J_\infty$, and since $J_\infty \le T(J_\infty)$ by Proposition 5.8, we finally obtain $J_\infty = T(J_\infty)$. Furthermore, the last inequality shows that $\mu^*(x)$ attains the infimum in (38) when $J_\infty(x) < \infty$. When $J_\infty(x) = \infty$, every $u \in U(x)$ attains the infimum, and the proof is complete. Q.E.D.
In view of Proposition 5.8, the equality $J_\infty = T(J_\infty)$ is equivalent to the validity of interchanging infimum and limit as follows:
$$J_\infty = \lim_{k\to\infty}\inf_{\pi\in\Pi}(T_{\mu_0}\cdots T_{\mu_k})(J_0) = \inf_{\pi\in\Pi}\lim_{k\to\infty}(T_{\mu_0}\cdots T_{\mu_k})(J_0) = J^*.$$
Thus Proposition 5.9 states that interchanging infimum and limit is in fact equivalent to the validity of interchanging projection and intersection in the manner of (37) or (39). The following proposition provides a compactness assumption which guarantees that (39) holds.
Proposition 5.10 Let I, I.1, and I.2 hold and let the control space $C$ be a Hausdorff space. Assume that there exists a nonnegative integer $\bar k$ such that for each $x \in S$, $\lambda \in R$, and $k \ge \bar k$, the set
$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J_0)] \le \lambda\} \tag{40}$$
is compact. Then
and (by Propositions 5.8 and 5.9) $J_\infty = T(J_\infty) = T(J^*) = J^*$. Furthermore, there exists a stationary optimal policy.
Proof By (35) it will be sufficient to show that
$$P\Big(\bigcap_{k=1}^\infty C_k\Big) = \bigcap_{k=1}^\infty P(C_k) = \bigcap_{k=1}^\infty \bar P(C_k). \tag{42}$$
Let $(x, \lambda)$ be in $\bigcap_{k=1}^\infty P(C_k)$. Then there exists a sequence $\{u_n\} \subset U(x)$ such that
$$H[x, u_n, T^k(J_0)] \le H[x, u_n, T^n(J_0)] \le \lambda \qquad \forall n \ge k,$$
or equivalently
$$u_n \in U_k(x, \lambda) \qquad \forall n \ge k.$$
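Since $U_k(x, \lambda)$ is compact for $k \ge \bar k$, it follows that the sequence $\{u_n\}$ has an accumulation point $\bar u$ and
$$\bar u \in U_k(x, \lambda) \qquad \forall k \ge \bar k.$$
But $U_0(x, \lambda) \supset U_1(x, \lambda) \supset \cdots$, so $\bar u \in U_k(x, \lambda)$ for $k = 0, 1, \ldots$. Hence
$$H[x, \bar u, T^k(J_0)] \le \lambda, \qquad k = 0, 1, \ldots,$$
and $(x, \bar u, \lambda) \in \bigcap_{k=1}^\infty C_k$. It follows that $(x, \lambda) \in P(\bigcap_{k=1}^\infty C_k)$ and
$$\bigcap_{k=1}^\infty P(C_k) \subset P\Big(\bigcap_{k=1}^\infty C_k\Big).$$
Also, from the compactness of $U_k(x, \lambda)$ and the result of Lemma 3.1, it follows that the infimum in (31) is attained for every $x \in S$ and $k > \bar k$. By Lemma 5.2, $P(C_k) = \bar P(C_k)$ for $k > \bar k$, and since $P(C_1) \supset P(C_2) \supset \cdots$ and $\bar P(C_1) \supset \bar P(C_2) \supset \cdots$, we have
$$\bigcap_{k=1}^\infty P(C_k) = \bigcap_{k=1}^\infty \bar P(C_k).$$
Thus (42) is proved. Q.E.D.
The following proposition shows also that a stationary optimal policy may be obtained in the limit by means of the DP algorithm.
Proposition 5.11 Let the assumptions of Proposition 5.10 hold. Then:
(a) There exists a policy $\pi^* = (\mu_0^*, \mu_1^*, \ldots) \in \Pi$ attaining the infimum in the DP algorithm for all $k \ge \bar k$, i.e.,
$$(T_{\mu_k^*}T^k)(J_0) = T^{k+1}(J_0) \qquad \forall k \ge \bar k. \tag{43}$$
(b) For every policy $\pi^*$ satisfying (43), the sequence $\{\mu_k^*(x)\}$ has at least one accumulation point for each $x \in S$ with $J^*(x) < \infty$.
(c) If $\mu^*: S \to C$ is such that $\mu^*(x)$ is an accumulation point of $\{\mu_k^*(x)\}$ for all $x \in S$ with $J^*(x) < \infty$ and $\mu^*(x) \in U(x)$ for all $x \in S$ with $J^*(x) = \infty$, then the stationary policy $(\mu^*, \mu^*, \ldots)$ is optimal.
Proof (a) This follows from Lemma 3.1.
(b) For any $\pi^* = (\mu_0^*, \mu_1^*, \ldots)$ satisfying (43) and $x \in S$ such that $J^*(x) < \infty$, we have
$$H[x, \mu_n^*(x), T^k(J_0)] \le H[x, \mu_n^*(x), T^n(J_0)] \le J^*(x) \qquad \forall k \ge \bar k,\ n \ge k.$$
Hence,
$$\mu_n^*(x) \in U_k[x, J^*(x)] \qquad \forall k \ge \bar k,\ n \ge k.$$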
Since $U_k[x, J^*(x)]$ is compact, $\{\mu_n^*(x)\}$ has at least one accumulation point. Furthermore, each accumulation point $\mu^*(x)$ of $\{\mu_n^*(x)\}$ belongs to $U(x)$ and satisfies
$$H[x, \mu^*(x), T^k(J_0)] \le J^*(x) \qquad \forall k \ge \bar k. \tag{44}$$
(c) By taking the limit in (44) and using I.1, we obtain
$$H[x, \mu^*(x), J_\infty] = H[x, \mu^*(x), J^*] \le J^*(x)$$
for all $x \in S$ with $J^*(x) < \infty$. This relation holds trivially for all $x \in S$ with $J^*(x) = \infty$. Hence $T_{\mu^*}(J^*) \le J^* = T(J^*)$, which implies that $T_{\mu^*}(J^*) = T(J^*)$. It follows from Proposition 5.4 that $(\mu^*, \mu^*, \ldots)$ is optimal. Q.E.D.
The compactness of the sets $U_k(x, \lambda)$ of (40) may be verified in a number of special cases, some examples of which are given at the end of Section 3.2. Another example is the lower semicontinuous model of Section 8.3, whose infinite horizon version is treated in Corollary 9.17.2.
5.5 Application to Specific Models
We now show that all the results of this chapter apply to the stochastic optimal control problems of Sections 2.3.3 and 2.3.4. However, only a portion of the results apply to the minimax control problem of Section 2.3.5, since D.1 is not satisfied in the absence of additional assumptions.
Stochastic Optimal Control-Outer Integral Formulation
Proposition 5.12 Consider the mapping
$$H(x, u, J) = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\} \tag{45}$$
of Section 2.3.3 and let $J_0(x) = 0$ for all $x \in S$. If
$$0 \le g(x, u, w) \qquad \forall x \in S,\ u \in U(x),\ w \in W, \tag{46}$$
then Assumptions I, I.1, and I.2 are satisfied with the scalar in I.2 equal to $\alpha$. If
$$g(x, u, w) \le 0 \qquad \forall x \in S,\ u \in U(x),\ w \in W, \tag{47}$$
then Assumptions D, D.1, and D.2 are satisfied with the scalar in D.2 equal to $\alpha$.
Proof Assumptions I and D are trivially satisfied in view of (46) or (47), respectively, and the fact that $J_0(x) = 0$ for all $x \in S$. Assumptions I.1 and D.1 are satisfied in view of the monotone convergence theorem for outer integration (Proposition A.1). From Lemma A.2 we have under (46), for all $r > 0$ and $J \in F$ with $J_0 \le J$,
$$H(x, u, J + r) = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] + \alpha r \mid x, u\} = E^*\{g(x, u, w) + \alpha J[f(x, u, w)] \mid x, u\} + \alpha r = H(x, u, J) + \alpha r.$$
Hence I.2 is satisfied as stated in the proposition. Under (47), we have from Lemmas A.2 and A.3(c) that for all $r > 0$ and $J \in F$ with $J \le J_0$,
$$H(x, u, J - r) = H(x, u, J) - \alpha r,$$
and D.2 is satisfied. Q.E.D.
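When $W$ is finite and all of its subsets are measurable, the outer integral $E^*$ in (45) reduces to an ordinary expectation, so the identity $H(x, u, J + r) = H(x, u, J) + \alpha r$ behind I.2 can be checked exactly. The sketch below does this with exact rational arithmetic; the disturbance distribution `P_W`, the functions `g` and `f`, and $\alpha = 9/10$ are hypothetical choices that satisfy (46).

```python
from fractions import Fraction

ALPHA = Fraction(9, 10)
STATES = [0, 1, 2]
CONTROLS = [0, 1]
P_W = {0: Fraction(1, 4), 1: Fraction(3, 4)}  # hypothetical p(w | x, u), same for all (x, u)


def g(x, u, w):
    return Fraction(x + u + w)  # nonnegative, so (46) holds


def f(x, u, w):
    return (x + u + w) % 3      # next state in STATES


def H(x, u, J):
    """H(x, u, J) = E{ g(x,u,w) + alpha * J[f(x,u,w)] | x, u } for a finite W."""
    return sum(P_W[w] * (g(x, u, w) + ALPHA * J[f(x, u, w)]) for w in P_W)


def check_I2(J, r):
    """True if H(x, u, J + r) == H(x, u, J) + alpha * r for every (x, u)."""
    J_plus_r = {s: J[s] + r for s in J}
    return all(H(x, u, J_plus_r) == H(x, u, J) + ALPHA * r
               for x in STATES for u in CONTROLS)


J = {0: Fraction(0), 1: Fraction(2), 2: Fraction(5)}  # J >= J_0 = 0
print(check_I2(J, Fraction(7, 3)))
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding; the identity holds because the probabilities in `P_W` sum to one.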
Thus all the results of the previous sections apply to stochastic optimal control problems with additive cost functionals. In fact, under additional countability assumptions it is possible to exploit the additive structure of these problems and obtain results relating to the existence of optimal or nearly optimal stationary policies under Assumption D. These results are stated in the following proposition. A proof of part (a) may be found in Blackwell [B10]. Proofs of parts (b) and (c) may be found in Ornstein [O4] and Frid [F2].
Proposition 5.13 Consider the mapping
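$$H(x, u, J) = E\{g(x, u, w) + J[f(x, u, w)] \mid x, u\}$$
of Section 2.3.2 ($W$ is countable), and let $J_0(x) = 0$ for all $x \in S$. Assume that $S$ is countable, $J^*(x) > -\infty$ for all $x \in S$, and $g$ satisfies
$$b \le g(x, u, w) \le 0 \qquad \forall x \in S,\ u \in U(x),\ w \in W,$$
where $b \in (-\infty, 0)$ is some scalar. Then:
(a) If for each $x \in S$ there exists a policy which is optimal at $x$, then there exists a stationary optimal policy.
(b) For every $\varepsilon > 0$ there exists a $\mu_\varepsilon \in M$ such that
$$J_{\mu_\varepsilon}(x) \le (1 - \varepsilon) J^*(x) \qquad \forall x \in S.$$
(c) If there exists a scalar $\beta \in (-\infty, 0)$ such that $\beta \le J^*(x)$ for all $x \in S$, then for every $\varepsilon > 0$ there exists a stationary $\varepsilon$-optimal policy, i.e., a $\mu_\varepsilon \in M$ such that
$$J_{\mu_\varepsilon}(x) \le J^*(x) + \varepsilon \qquad \forall x \in S.$$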
We note that the conclusion of part (a) may fail to hold if we have $J^*(x) = -\infty$ for some $x \in S$, even if $S$ is finite, as shown by a counterexample found in Blackwell [B10]. The conclusions of parts (b) and (c) may fail to hold if $S$ is uncountable, as shown by a counterexample due to Ornstein [O4]. The
conclusion of part (c) may fail to hold if $J^*$ is unbounded below, as shown by a counterexample due to Blackwell [B8]. We also note that the results of Proposition 5.13 can be strengthened considerably in the special case of a deterministic optimal control problem (cf. the mapping of Section 2.3.1). These results are given in Bertsekas and Shreve [B6].
Stochastic Optimal Control-Multiplicative Cost Functional
Proposition 5.14 Consider the mapping
$$H(x, u, J) = E\{g(x, u, w) J[f(x, u, w)] \mid x, u\}$$
of Section 2.3.4 and let $J_0(x) = 1$ for all $x \in S$.
(a) If there exists a $b \in R$ such that
$$1 \le g(x, u, w) \le b \qquad \forall x \in S,\ u \in U(x),\ w \in W,$$
then Assumptions I, I.1, and I.2 are satisfied with the scalar in I.2 equal to $b$.
(b) If
$$0 \le g(x, u, w) \le 1 \qquad \forall x \in S,\ u \in U(x),\ w \in W,$$
then Assumptions D, D.1, and D.2 are satisfied with the scalar in D.2 equal to unity.
Proof This follows easily from the assumptions and the monotone convergence theorem for ordinary integration. Q.E.D.
Minimax Control
Proposition 5.15 Consider the mapping
$$H(x, u, J) = \sup_{w \in W(x, u)} \{g(x, u, w) + \alpha J[f(x, u, w)]\}$$
of Section 2.3.5 and let $J_0(x) = 0$ for all $x \in S$.
(a) If
$$0 \le g(x, u, w) \qquad \forall x \in S,\ u \in U(x),\ w \in W,$$
then Assumptions I, I.1, and I.2 are satisfied with the scalar in I.2 equal to $\alpha$.
(b) If
$$g(x, u, w) \le 0 \qquad \forall x \in S,\ u \in U(x),\ w \in W,$$
then Assumptions D and D.2 are satisfied with the scalar in D.2 equal to $\alpha$.
Proof The proof is entirely similar to the one of Proposition 5.12. Q.E.D.
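Why D.1 can fail for the minimax mapping is easy to see on a small hypothetical instance (not from the text): take $W(x, u) = \{0, 1, 2, \ldots\}$, $g = 0$, $\alpha = 1$, and $f(x, u, w) = w$, so that $H(x, u, J) = \sup_w J(w)$. The functions $J_k(s) = 1$ for $s \ge k$ and $0$ otherwise decrease pointwise to $0$, yet $H(x, u, J_k) = 1$ for every $k$, because the disturbance $w = k$ attains the supremum. The sketch checks both facts on finite ranges; it illustrates the counterexample rather than computing a supremum over an infinite set.

```python
def J_k(k, s):
    """J_k(s) = 1 if s >= k, else 0; for each fixed s, J_k(s) decreases to 0."""
    return 1 if s >= k else 0


def sup_over_truncation(k, K):
    """Supremum of J_k(w) over w in {0, ..., K-1}. This is a lower bound for
    H(x, u, J_k) = sup over all w >= 0 of J_k(w), and J_k <= 1 gives an upper bound."""
    return max(J_k(k, w) for w in range(K))


# H(x, u, J_k) = 1 for every k: the truncation {0, ..., k} already contains w = k.
H_values = [sup_over_truncation(k, k + 1) for k in range(1, 100)]

# The pointwise limit is 0: J_k(s) = 0 as soon as k > s.
limits = [J_k(s + 1, s) for s in range(100)]

print(set(H_values), set(limits))
```

So $\lim_k H(x, u, J_k) = 1 \neq 0 = H(x, u, \lim_k J_k)$; increasing limits, by contrast, do commute with the supremum, which is why I.1 survives.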
Chapter 6
A Generalized Abstract Dynamic Programming Model
As we discussed in Section 2.3.2, there are certain difficulties associated with the treatment of stochastic control problems in which the space $W$ of the stochastic parameter is uncountable. For this reason we resorted to outer integration in the model of Section 2.3.3. The alternative explored in this chapter is to modify the entire framework so that policies $\pi = (\mu_0, \mu_1, \ldots)$ consist of functions $\mu_k$ from a strict subset of $M$, for example, those functions which are appropriately measurable. This approach is related to the one we employ in Part II. Unfortunately, however, many of our earlier results, and particularly those of Chapter 5, cannot be proved within the generalized framework to be introduced. The results we provide are sufficient, however, for a satisfactory treatment of finite horizon models and infinite horizon models under contraction assumptions. Some of the results of Chapter 5 proved under Assumption D also have counterparts within the generalized framework. The reader, aided by our subsequent discussion, should be able to recognize these results easily. Certain aspects of the framework of this chapter may seem somewhat artificial to the reader at this point. The motivation for our line of analysis stems primarily from ideas that are developed in Part II, and the reader may wish to return to this chapter after gaining some familiarity with Part II.
The results provided in the following sections are applied to a stochastic optimal control problem with multiplicative cost functional in Section 11.3. The analysis given there clearly illustrates the ideas underlying our development in this chapter.
6.1 General Remarks and Assumptions
Consider the sets $S$, $C$, $U(x)$, $M$, $\Pi$, and $F$ introduced in Section 2.1. We consider in addition two subsets $F^*$ and $\tilde{F}$ of the set $F$ of extended real-valued functions on $S$ satisfying
$$F^* \subset \tilde{F} \subset F,$$
and a subset $\tilde{M}$ of the set $M$ of functions $\mu: S \to C$ satisfying $\mu(x) \in U(x)$ for all $x \in S$. The subset of policies in $\Pi$ corresponding to $\tilde{M}$ is denoted by $\tilde{\Pi}$, i.e.,
$$\tilde{\Pi} = \{(\mu_0, \mu_1, \ldots) \in \Pi \mid \mu_k \in \tilde{M},\ k = 0, 1, \ldots\}.$$
In place of the mapping $H$ of Section 2.1, we consider a mapping $H: SC\tilde{F} \to R^*$ satisfying, for all $x \in S$, $u \in U(x)$, and $J, J' \in \tilde{F}$, the monotonicity assumption
$$H(x, u, J) \le H(x, u, J') \qquad \text{if } J \le J'.$$
Thus the mapping $H$ in this chapter is of the same nature as the one of Chapters 2-5, the only difference being that $H$ is defined on $SC\tilde{F}$ rather than on $SCF$. In particular, if $\tilde{F}$ consists of appropriately measurable functions and $H$ corresponds to a stochastic optimal control problem such as the one of Section 2.3.3 (with $g$ measurable), then $H$ can be defined in terms of ordinary integration rather than outer integration. For $\mu \in \tilde{M}$ we consider the mapping $T_\mu: \tilde{F} \to F$ defined by
$$T_\mu(J)(x) = H[x, \mu(x), J] \qquad \forall x \in S.$$
Consider also the mapping $T: \tilde{F} \to F$ defined by
$$T(J)(x) = \inf_{u \in U(x)} H(x, u, J) \qquad \forall x \in S.$$
We are given a function $J_0: S \to R^*$ satisfying
$$J_0 \in F^*,$$
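and we are interested in the $N$-stage problem
$$\text{minimize } J_{N,\pi}(x) = (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(x) \quad \text{subject to } \pi \in \tilde{\Pi} \tag{1}$$
and its infinite horizon counterpart
$$\text{minimize } J_\pi(x) = \lim_{N \to \infty} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(x) \quad \text{subject to } \pi \in \tilde{\Pi}. \tag{2}$$
We use the notation
$$J_N^*(x) = \inf_{\pi \in \tilde{\Pi}} J_{N,\pi}(x), \qquad J^*(x) = \inf_{\pi \in \tilde{\Pi}} J_\pi(x),$$
and employ the terminology of Chapter 2 regarding optimal, $\varepsilon$-optimal, and stationary policies, as well as sequences of policies exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality. The following conditions regarding the sets $F^*$, $\tilde{F}$, and $\tilde{M}$ will be assumed in every result of this chapter.
A.1 For each $x \in S$ and $u \in U(x)$, there exists a $\mu \in \tilde{M}$ such that $\mu(x) = u$. This implies, in particular, that for every $J \in \tilde{F}$ and $x \in S$,
$$T(J)(x) = \inf_{u \in U(x)} H(x, u, J) = \inf_{\mu \in \tilde{M}} H[x, \mu(x), J].$$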
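A.2 For all $J \in F^*$ and $r \in R$, we have
$$T(J) \in F^*, \qquad (J + r) \in F^*.$$
A.3 For all $J \in \tilde{F}$, $\mu \in \tilde{M}$, and $r \in R$, we have
$$T_\mu(J) \in \tilde{F}, \qquad (J + r) \in \tilde{F}.$$
A.4 For each $J \in F^*$ and $\varepsilon > 0$, there exists a $\mu_\varepsilon \in \tilde{M}$ such that for all $x \in S$
$$T_{\mu_\varepsilon}(J)(x) \le \begin{cases} T(J)(x) + \varepsilon & \text{if } T(J)(x) > -\infty, \\ -1/\varepsilon & \text{if } T(J)(x) = -\infty. \end{cases}$$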
In Section 6.3 the following assumption will also be used.
A.5 For every sequence $\{J_k\} \subset \tilde{F}$ that converges pointwise, we have $\lim_{k \to \infty} J_k \in \tilde{F}$. If, in addition, $\{J_k\} \subset F^*$, then $\lim_{k \to \infty} J_k \in F^*$.
Note that in the special case where $F^* = \tilde{F} = F$ and $\tilde{M} = M$, we obtain the framework of Chapters 2-5, and all the preceding assumptions are satisfied. Thus this chapter deals with an extension of the framework of Chapters 2-5. We now provide some examples of sets $F^*$, $\tilde{F}$, and $\tilde{M}$ which are useful in connection with the mapping
$$H(x, u, J) = \int_W \{g(x, u, w) + \alpha J[f(x, u, w)]\}\, p(dw \mid x, u)$$
associated with the stochastic optimal control problem of Section 2.3.3. We take $J_0(x) = 0$ for all $x \in S$. The terminology employed is explained in Chapter 7.
EXAMPLE 1 Let $S$, $C$, and $W$ be Borel spaces, $\mathscr{F}$ the Borel σ-algebra on $W$, $f$ a Borel-measurable function mapping $SCW$ into $S$, $g$ a lower semianalytic function mapping $SCW$ into $R^*$, $p(dw \mid x, u)$ a Borel-measurable stochastic kernel on $W$ given $SC$, and $\alpha$ a positive scalar. Let the set
$$\Gamma = \{(x, u) \in SC \mid u \in U(x)\}$$
be analytic. Take $F^*$ to be the set of extended real-valued, lower semianalytic functions on $S$, $\tilde{F}$ the set of extended real-valued, universally measurable functions on $S$, and $\tilde{M}$ the set of universally measurable mappings from $S$ to $C$ with graph in $\Gamma$ (i.e., $\mu \in \tilde{M}$ if $\mu$ is universally measurable and $(x, \mu(x)) \in \Gamma$ for all $x \in S$). This example is the subject of Chapters 8 and 9.
EXAMPLE 2 Same as Example 1 except that $\tilde{M}$ is the set of all analytically measurable mappings from $S$ to $C$ with graph in $\Gamma$. This example is treated in Section 11.2.
EXAMPLE 3 Same as Example 1 except for the following: $p(dw \mid x, u)$ and $f$ are continuous; $g$ is real-valued, upper semicontinuous, and bounded above; $\Gamma$ is an open subset of $SC$; $\tilde{F}$ is the set of extended real-valued, Borel-measurable functions on $S$ which are bounded above; $F^*$ is the set of extended real-valued, upper semicontinuous functions on $S$ which are bounded above; and $\tilde{M}$ is the set of Borel-measurable mappings from $S$ to $C$ with graph in $\Gamma$. This is the upper semicontinuous model of Definition 8.8.
EXAMPLE 4 Same as Example 3 except for the following: $C$ is in addition compact; $g$ is real-valued, lower semicontinuous, and bounded below; $\Gamma$ is a closed subset of $SC$; $\tilde{F}$ is the set of extended real-valued, Borel-measurable functions on $S$ which are bounded below; and $F^*$ is the set of extended real-valued, lower semicontinuous functions on $S$ which are bounded below. This is a special case of the lower semicontinuous model of Definition 8.7.
All these examples satisfy Assumptions A.1-A.4 stated earlier (see also Sections 7.5 and 7.7). The first two satisfy Assumption A.5 as well.
6.2 Analysis of Finite Horizon Models
Simple modifications of some of the assumptions and proofs in Chapter 3 provide a satisfactory analysis of the finite horizon problem (1). We first modify appropriately some of the assumptions of Section 3.1.
Assumption $\tilde{F}$.2 Same in statement as Assumption F.2 of Section 3.1 except that $F$ is replaced by $\tilde{F}$.
Assumption $\tilde{F}$.3 Same in statement as Assumption F.3 of Section 3.1 except that we require $J \in F^*$, $\{J_n\} \subset \tilde{F}$, and $\{\mu_n\} \subset \tilde{M}$ instead of $J \in F$, $\{J_n\} \subset F$, and $\{\mu_n\} \subset M$.
It can be easily seen that $\tilde{F}$.2 is satisfied in Examples 1-4 of the previous section. It is also possible to show (see the proof of Proposition 8.4) that $\tilde{F}$.3 is satisfied in Example 1, where universally measurable policies are employed. By nearly verbatim repetition of the proofs of Proposition 3.1(b) and Proposition 3.2 we obtain the following.
Proposition 6.1 (a) Let Assumptions A.1-A.4 and $\tilde{F}$.2 hold and assume that $J_k^*(x) > -\infty$ for all $x \in S$ and $k = 1, 2, \ldots, N$. Then
$$J_N^* = T^N(J_0),$$
and for every $\varepsilon > 0$ there exists an $N$-stage $\varepsilon$-optimal policy, i.e., a $\pi_\varepsilon \in \tilde{\Pi}$ such that
$$J_N^* \le J_{N,\pi_\varepsilon} \le J_N^* + \varepsilon.$$
(b) Let Assumptions A.1-A.4 and $\tilde{F}$.3 hold and assume that $J_{k,\pi}(x) < \infty$ for all $x \in S$, $\pi \in \tilde{\Pi}$, and $k = 1, 2, \ldots, N$. Then
$$J_N^* = T^N(J_0).$$
Furthermore, given any sequence $\{\varepsilon_n\}$ with $\varepsilon_n \downarrow 0$, $\varepsilon_n > 0$ for all $n$, there exists a sequence of policies exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality. In particular, if in addition $J_N^*(x) > -\infty$ for all $x \in S$, then for every $\varepsilon > 0$ there exists an $\varepsilon$-optimal policy.
Similarly, by modifying the proofs of Proposition 3.3 and Corollary 3.3.1(b), we obtain the following.
Proposition 6.2 Let Assumptions A.1-A.4 hold.
(a) A policy $\pi^* = (\mu_0^*, \mu_1^*, \ldots) \in \tilde{\Pi}$ is uniformly $N$-stage optimal if and only if
$$(T_{\mu_k^*} T^{N-k-1})(J_0) = T^{N-k}(J_0), \qquad k = 0, 1, \ldots, N - 1.$$
(b) If there exists a uniformly $N$-stage optimal policy, then
$$J_N^* = T^N(J_0).$$
Analogs of Corollary 3.3.1(a) and Proposition 3.4 can be proved if $\tilde{M}$ is rich enough so that the following assumption holds.
Exact Selection Assumption For every $J \in F^*$, if the infimum in
$$T(J)(x) = \inf_{u \in U(x)} H(x, u, J)$$
is attained for every $x \in S$, then there exists a $\mu^* \in \tilde{M}$ such that
$$T_{\mu^*}(J) = T(J).$$
In Examples 1 and 4 of the previous section the exact selection assumption is satisfied (see Propositions 7.50 and 7.33). The following proposition is proved similarly to Corollary 3.3.1(a) and Proposition 3.4.
Proposition 6.3 Let Assumptions A.1-A.4 and the exact selection assumption hold.
(a) There exists a uniformly $N$-stage optimal policy if and only if the infimum in the relation
$$T^{k+1}(J_0)(x) = \inf_{u \in U(x)} H[x, u, T^k(J_0)]$$
is attained for each $x \in S$ and $k = 0, 1, \ldots, N - 1$.
(b) Let the control space $C$ be a Hausdorff space and assume that for each $x \in S$, $\lambda \in R$, and $k = 0, 1, \ldots, N - 1$, the set
$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J_0)] \le \lambda\}$$
is compact. Then
$$J_N^* = T^N(J_0),$$
and there exists a uniformly $N$-stage optimal policy.
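For a model with finitely many controls the exact selection assumption holds trivially, and Propositions 6.2 and 6.3 reduce to the familiar backward DP recursion. The sketch below uses a hypothetical three-state stochastic model with exact rational arithmetic (the data `CONTROLS`, `P_W`, `g`, and `f` are illustrative, not from the text). It computes $T^k(J_0)$ for $k = 0, \ldots, N$, selects $\mu_k^*$ attaining the infimum in $T^{N-k}(J_0) = T[T^{N-k-1}(J_0)]$, and verifies the characterization of a uniformly $N$-stage optimal policy in Proposition 6.2(a).

```python
from fractions import Fraction

STATES = [0, 1, 2]
CONTROLS = {0: [0, 1], 1: [0, 1], 2: [0]}       # U(x), finite
P_W = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # p(w | x, u), same for all (x, u)


def g(x, u, w):
    return Fraction(x + 2 * u + w)


def f(x, u, w):
    return min(2, max(0, x - u + w))


def H(x, u, J):
    return sum(P_W[w] * (g(x, u, w) + J[f(x, u, w)]) for w in P_W)


def T(J):
    return {x: min(H(x, u, J) for u in CONTROLS[x]) for x in STATES}


def select(J):
    """Exact selection: a mu with T_mu(J) = T(J) (finite U(x), so the min is attained)."""
    return {x: min(CONTROLS[x], key=lambda u: H(x, u, J)) for x in STATES}


def finite_horizon(N):
    iterates = [{x: Fraction(0) for x in STATES}]  # T^0(J_0) = J_0 = 0
    for _ in range(N):
        iterates.append(T(iterates[-1]))
    # Stage k of the N-stage problem has N - k stages to go.
    policy = [select(iterates[N - k - 1]) for k in range(N)]
    return iterates, policy


N = 4
iterates, policy = finite_horizon(N)
# Proposition 6.2(a): (T_{mu_k} T^{N-k-1})(J_0) = T^{N-k}(J_0), k = 0, ..., N-1.
uniformly_optimal = all(
    H(x, policy[k][x], iterates[N - k - 1]) == iterates[N - k][x]
    for k in range(N) for x in STATES
)
print(uniformly_optimal)
print(iterates[N])  # J_N^* = T^N(J_0) for this model
```

The check holds by construction here; its point is to show which iterate each stage's selector is computed from.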
6.3 Analysis of Infinite Horizon Models under a Contraction Assumption
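We consider the following modified version of Assumption C of Section 4.1.
Assumption $\tilde{C}$ There is a closed subset $\tilde{B}$ of the space $B$ such that:
(a) $J_0 \in \tilde{B} \cap F^*$.
(b) For all $J \in \tilde{B} \cap F^*$, the function $T(J)$ belongs to $\tilde{B} \cap F^*$.
(c) For all $J \in \tilde{B} \cap \tilde{F}$ and $\mu \in \tilde{M}$, the function $T_\mu(J)$ belongs to $\tilde{B} \cap \tilde{F}$.
Furthermore, for every $\pi = (\mu_0, \mu_1, \ldots) \in \tilde{\Pi}$, the limit
$$\lim_{N \to \infty} (T_{\mu_0} \cdots T_{\mu_{N-1}})(J_0)(x)$$
exists and is a real number for each $x \in S$. In addition, there exist a positive integer $m$ and scalars $\rho$ and $\alpha$ with $0 < \rho < 1$, $0 < \alpha$, such that
$$\|T_\mu(J) - T_\mu(J')\| \le \alpha \|J - J'\| \qquad \forall \mu \in \tilde{M},\ J, J' \in \tilde{B} \cap \tilde{F},$$
$$\|(T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{m-1}})(J) - (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_{m-1}})(J')\| \le \rho \|J - J'\| \qquad \forall \mu_0, \ldots, \mu_{m-1} \in \tilde{M},\ J, J' \in \tilde{B} \cap \tilde{F}.$$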
If Assumptions A.1-A.5 and $\tilde{C}$ are made, then almost all the results of Chapter 4 have counterparts within our extended framework. The key fact is that, since $\tilde{F}$ and $F^*$ are closed under pointwise limits (Assumption A.5), it follows that $B \cap \tilde{F}$, $\tilde{B} \cap \tilde{F}$, $B \cap F^*$, and $\tilde{B} \cap F^*$ are closed subsets of $B$. This is true in view of the fact that convergence of a sequence in $B$ (i.e., in sup norm) implies pointwise convergence. As a result the contraction mapping fixed point theorem can be used in exactly the same manner as in Chapter 4 to establish that, for each $\mu \in \tilde{M}$, $J_\mu$ is the unique fixed point of $T_\mu$ in $\tilde{B} \cap \tilde{F}$ and $J^*$ is the unique fixed point of $T$ in $\tilde{B} \cap F^*$. Only the modified policy iteration algorithm and the associated Proposition 4.9 have no counterparts in this extended framework. The reason is that our assumptions do not guarantee that Step 3 of the policy iteration algorithm can be carried out. Rather than provide a complete list of the analogs of all propositions in Chapter 4, we state selectively and without proof some of the main results that can be obtained within the extended framework.
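The role of the contraction assumption can be seen in the simplest setting where it holds with $m = 1$ and $\rho = \alpha < 1$: a finite, discounted model with bounded costs, for which every function is measurable, so $\tilde{F} = F^* = F$ and the measurability issues of this chapter do not arise. The hypothetical sketch below checks the sup-norm contraction inequality for $T$ and shows successive approximation reaching the same fixed point $J^*$ from two different starting functions, as the fixed point theorem predicts. The transition probabilities and costs are randomly generated and are not from the text.

```python
import random

random.seed(0)
STATES, CONTROLS, ALPHA = range(4), range(3), 0.8

# Hypothetical data: P[x, u] is a list of (probability, next_state); G[x, u] in [0, 1].
P, G = {}, {}
for x in STATES:
    for u in CONTROLS:
        weights = [random.random() for _ in STATES]
        total = sum(weights)
        P[x, u] = [(w / total, y) for w, y in zip(weights, STATES)]
        G[x, u] = random.uniform(0.0, 1.0)


def H(x, u, J):
    return G[x, u] + ALPHA * sum(prob * J[y] for prob, y in P[x, u])


def T(J):
    return [min(H(x, u, J) for u in CONTROLS) for x in STATES]


def sup_norm_dist(J1, J2):
    return max(abs(a - b) for a, b in zip(J1, J2))


def successive_approximation(J, tol=1e-12, max_iter=10_000):
    """Iterate J <- T(J) until successive iterates differ by less than tol."""
    for _ in range(max_iter):
        J_next = T(J)
        if sup_norm_dist(J_next, J) < tol:
            return J_next
        J = J_next
    raise RuntimeError("no convergence")


# Contraction check: ||T(J1) - T(J2)|| <= alpha * ||J1 - J2|| (up to rounding).
for _ in range(100):
    J1 = [random.uniform(-10, 10) for _ in STATES]
    J2 = [random.uniform(-10, 10) for _ in STATES]
    assert sup_norm_dist(T(J1), T(J2)) <= ALPHA * sup_norm_dist(J1, J2) + 1e-9

J_from_zero = successive_approximation([0.0] * len(STATES))
J_from_hundred = successive_approximation([100.0] * len(STATES))
print(sup_norm_dist(J_from_zero, J_from_hundred))  # tiny: both approximate J*
```

The stopping rule is a practical choice: with modulus $\alpha$, a step of size below `tol` places the iterate within $\alpha\,\mathrm{tol}/(1-\alpha)$ of $J^*$.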
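Proposition 6.4 Let Assumptions A.1-A.5 and $\tilde{C}$ hold.
(a) The function $J^*$ belongs to $\tilde{B} \cap F^*$ and is the unique fixed point of $T$ within $\tilde{B} \cap F^*$. Furthermore, if $J' \in \tilde{B} \cap F^*$ is such that $T(J') \le J'$, then $J^* \le J'$, while if $J' \le T(J')$, then $J' \le J^*$.
(b) For every $\mu \in \tilde{M}$, the function $J_\mu$ belongs to $\tilde{B} \cap \tilde{F}$ and is the unique fixed point of $T_\mu$ within $\tilde{B} \cap \tilde{F}$.
(c) There holds
$$\lim_{N \to \infty} \|T^N(J) - J^*\| = 0 \qquad \forall J \in \tilde{B} \cap F^*.$$
(d) A stationary policy $\pi^* = (\mu^*, \mu^*, \ldots) \in \tilde{\Pi}$ is optimal if and only if
$$T_{\mu^*}(J^*) = T(J^*).$$
Equivalently, $\pi^*$ is optimal if and only if $J_{\mu^*} \in \tilde{B} \cap F^*$ and
$$T(J_{\mu^*}) = T_{\mu^*}(J_{\mu^*}).$$
(e) For any $\varepsilon > 0$, there exists a stationary $\varepsilon$-optimal policy, i.e., a $\pi_\varepsilon = (\mu_\varepsilon, \mu_\varepsilon, \ldots) \in \tilde{\Pi}$ such that
$$\|J^* - J_{\mu_\varepsilon}\| \le \varepsilon.$$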
Proposition 6.5 Let Assumptions A.1-A.5 and $\tilde{C}$ hold. Assume further that the exact selection assumption of the previous section holds.
(a) If for each $x \in S$ there exists a policy which is optimal at $x$, then there exists an optimal stationary policy.
(b) Let $C$ be a Hausdorff space. If for some $J \in \tilde{B} \cap F^*$ and for some positive integer $\bar{k}$ the set
$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J)] \le \lambda\}$$
is compact for all $x \in S$, $\lambda \in R$, and $k \ge \bar{k}$, then there exists an optimal stationary policy.
Part II
Stochastic Optimal Control Theory
Chapter 7
Borel Spaces and Their Probability Measures
This chapter provides the mathematical background required for analysis of the dynamic programming models of the subsequent chapters. The key concept, which is developed in Section 7.3 with the aid of the topological concepts discussed in Section 7.2, is that of a Borel space. In Section 7.4 the set of probability measures on a Borel space is shown to be itself a Borel space, and the relationships between these two spaces are explored. Our general framework for dynamic programming hinges on the properties of analytic sets collected in Section 7.6 and used in Section 7.7 to define and characterize lower semianalytic functions. These functions result from executing the dynamic programming algorithm, so we will want to measurably select at or near their infima to construct optimal or nearly optimal policies. The possibilities for this are also discussed in Section 7.7. A similar analysis in a more specialized case is contained in Section 7.5, which is presented first for pedagogical reasons. Our presentation is aimed at the reader who is acquainted with the basic notions of topology and measure theory, but is unfamiliar with some of the specialized results relating to separable metric spaces and probability measures on their Borel σ-algebras.
7.
BOREL SPACES AND THEIR PROBABILITY MEASURES
7.1 Notation
We collect here for easy reference many of the symbols used in Part II.
Operations on Sets Let $A$ and $B$ be subsets of a space $X$. The complement of $A$ in $X$ is denoted by $A^c$. The set-theoretic difference $A - B$ is $A \cap B^c$. We will sometimes write $X - A$ in place of $A^c$. The symmetric difference $A \,\Delta\, B$ is $(A - B) \cup (B - A)$. If $X$ is a topological space, $\bar{A}$ will denote the closure of $A$. If $A_1, A_2, \ldots$ is a sequence of sets such that $A_1 \subset A_2 \subset \cdots$ and $A = \bigcup_{n=1}^\infty A_n$, we write $A_n \uparrow A$. If $A_1 \supset A_2 \supset \cdots$ and $A = \bigcap_{n=1}^\infty A_n$, we write $A_n \downarrow A$. If $X_1, X_2, \ldots$ is a sequence of spaces, the Cartesian products of $X_1, X_2, \ldots, X_n$ and of $X_1, X_2, \ldots$ are denoted by $X_1X_2\cdots X_n$ and $X_1X_2\cdots$, respectively. If the given spaces have topologies, the product space will have the product topology. Under this topology, convergence in the product space is componentwise convergence in the factor spaces. If the given spaces have σ-algebras $\mathscr{F}_1, \mathscr{F}_2, \ldots$, the product σ-algebras are denoted by $\mathscr{F}_1\mathscr{F}_2\cdots\mathscr{F}_n$ and $\mathscr{F}_1\mathscr{F}_2\cdots$, respectively. If $X$ and $Y$ are arbitrary spaces and $E \subset XY$, then for each $x \in X$, the $x$-section of $E$ is
$$E_x = \{y \in Y \mid (x, y) \in E\}.$$
If $\mathscr{F}$ is a class of subsets of a space $X$, we denote by $\sigma(\mathscr{F})$ the smallest σ-algebra containing $\mathscr{F}$. We denote by $\mathscr{F}_\sigma$ or $\mathscr{F}_\delta$ the class of all subsets which can be obtained by union or intersection, respectively, of countably many sets in $\mathscr{F}$. If $\mathscr{F}$ is the collection of closed subsets of a topological space $X$, the members of $\mathscr{F}_\sigma$ are called the $F_\sigma$-subsets of $X$. If $\mathscr{G}$ is the collection of open subsets of $X$, the members of $\mathscr{G}_\delta$ are called the $G_\delta$-subsets of $X$. If $(X, \mathscr{P})$ is a paved space, i.e., $\mathscr{P}$ is a nonempty collection of subsets of $X$, and $S$ is a Suslin scheme for $\mathscr{P}$ (Definition 7.15), then $N(S)$ is the nucleus of $S$. The collection of all nuclei of Suslin schemes for $\mathscr{P}$ is denoted $\mathscr{S}(\mathscr{P})$.
Special Sets The symbol $R$ represents the real numbers with the usual topology. We use $R^*$ to denote the extended real numbers $[-\infty, +\infty]$ with the topology discussed following Definition 7.7 in Section 7.3. Similarly, $Q$ is the set of rational numbers and $Q^*$ is the set of extended rational numbers $Q \cup \{\pm\infty\}$. If $X$ and $Y$ are sets and $f: X \to Y$, the graph of $f$ is
$$\mathrm{Gr}(f) = \{(x, f(x)) \mid x \in X\}.$$
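If $A \subset X$ and $\mathscr{C}$ is a collection of subsets of $X$, we define $f(A) = \{f(x) \mid x \in A\}$ and
$$f(\mathscr{C}) = \{f(A) \mid A \in \mathscr{C}\}.$$
If $B \subset Y$ and $\mathscr{D}$ is a collection of subsets of $Y$, we define $f^{-1}(B) = \{x \in X \mid f(x) \in B\}$ and
$$f^{-1}(\mathscr{D}) = \{f^{-1}(B) \mid B \in \mathscr{D}\}.$$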
If $X$ is a topological space, $\mathscr{F}_X$ is the collection of closed subsets of $X$ and $\mathscr{B}_X$ the Borel σ-algebra on $X$ (Definition 7.6). The space of probability measures on $(X, \mathscr{B}_X)$ is denoted by $P(X)$; $C(X)$ is the Banach space of bounded, real-valued, continuous functions on $X$ with the supremum norm
$$\|f\| = \sup_{x \in X} |f(x)|;$$
for any metric $d$ on $X$ which is consistent with its topology, $U_d(X)$ is the space of bounded real-valued functions on $X$ which are uniformly continuous with respect to $d$. If $X$ is a Borel space (Definition 7.7), $\mathscr{A}_X$ is its analytic σ-algebra (Definition 7.19) and $\mathscr{U}_X$ its universal σ-algebra (Definition 7.18). We let $N$ denote the set of positive integers with the discrete topology. The Baire null space $\mathscr{N}$ is the product of countably many copies of $N$. The Hilbert cube $\mathscr{H}$ is the product of countably many copies of $[0, 1]$. We will denote by $\Sigma$ the collection of finite sequences of positive integers. We impose no topology on $\Sigma$. If $s \in \Sigma$ and $z = (\zeta_1, \zeta_2, \ldots) \in \mathscr{N}$, we write $s < z$ to mean $s = (\zeta_1, \zeta_2, \ldots, \zeta_k)$ for some $k$.
Mappings If $X$ and $Y$ are spaces, $\mathrm{proj}_X$ is the projection mapping from $XY$ onto $X$. If $E$ is a subset of $X$, the indicator function of $E$ is given by
$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$
If $f: X \to [-\infty, +\infty]$, the positive and negative parts of $f$ are the functions
$$f^+(x) = \max\{0, f(x)\}, \qquad f^-(x) = \max\{0, -f(x)\}.$$
If $f_n: X \to Y$ is a sequence of functions, $Y$ is a topological space, and $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in X$, then we write $f_n \to f$. If, in addition, $Y = [-\infty, +\infty]$ and $f_1(x) \le f_2(x) \le \cdots$ for all $x \in X$, we write $f_n \uparrow f$, while if $f_1(x) \ge f_2(x) \ge \cdots$ for all $x \in X$, we write $f_n \downarrow f$. In general, when the arguments of extended real-valued functions are omitted, the statements are to be
interpreted pointwise. For example, $(\sup_n f_n)(x) = \sup_n f_n(x)$ for all $x \in X$, $f_1 \le f_2$ if and only if $f_1(x) \le f_2(x)$ for all $x \in X$, and $f + \varepsilon$ is the function $(f + \varepsilon)(x) = f(x) + \varepsilon$ for all $x \in X$.
Miscellaneous If $(X, d)$ is a nonempty metric space, $x \in X$, and $Y$ is a nonempty subset of $X$, we define the distance from $x$ to $Y$ by
$$d(x, Y) = \inf_{y \in Y} d(x, y). \tag{8}$$
We define the diameter of $Y$ by
$$\mathrm{diam}(Y) = \sup_{x, y \in Y} d(x, y).$$
If $(X, \mathscr{F})$ is a measurable space and $\mathscr{F}$ contains all singleton sets, then for $x \in X$ we denote by $p_x$ the probability measure on $(X, \mathscr{F})$ which assigns mass one to the set $\{x\}$.
7.2 Metrizable Spaces
Definition 7.1 Let $(X, \mathscr{T})$ be a topological space. A metric $d$ on $X$ is consistent with $\mathscr{T}$ if every set of the form $\{y \in X \mid d(x, y) < c\}$, $x \in X$, $c > 0$, is in $\mathscr{T}$, and every nonempty set in $\mathscr{T}$ is the union of sets of this form. The space $(X, \mathscr{T})$ is metrizable if such a metric exists.
The distinction between metric and metrizable spaces is a fine one: In a metric space we have settled on a metric, while in a metrizable space the choice is still open. If one metric consistent with the given topology exists, then a multitude of them can be found. For example, if $d$ is a metric on $X$ consistent with $\mathscr{T}$, the metric $\rho$ defined by
$$\rho(x, y) = \frac{d(x, y)}{1 + d(x, y)}$$
is also consistent with $\mathscr{T}$. In what follows, we abbreviate the notation for metrizable spaces, writing $X$ in place of $(X, \mathscr{T})$. If $(X, \mathscr{T})$ is a topological space and $Y \subset X$, unless otherwise specified, we will understand $Y$ to be a topological space with open sets $G \cap Y$, where $G$ ranges over $\mathscr{T}$. This is called the relative topology. If $(Z, \mathscr{S})$ is another topological space, $\varphi: Z \to X$ is one-to-one and continuous, and $\varphi^{-1}$ is continuous on $\varphi(Z)$ with the relative topology, we say that $\varphi$ is a homeomorphism and $Z$ is homeomorphic to $\varphi(Z)$. When there exists a homeomorphism from $Z$ into $X$, we also say that $Z$ can be homeomorphically embedded in $X$. Given a metric $d$ on $X$ consistent with its topology and a homeomorphism $\varphi: Z \to X$ as just described, we may define a metric $d_1$ on $Z$ by
$$d_1(z_1, z_2) = d[\varphi(z_1), \varphi(z_2)]. \tag{10}$$
It can be easily verified that the metric $d_1$ is consistent with the topology $\mathscr{S}$. This implies that every topological space homeomorphic to a metrizable space (or subset of a metrizable space) is itself metrizable. Our attention will be focused on metrizable spaces and their Borel σ-algebras. The presence of a metric in such spaces permits simple proofs of facts whose proofs are quite complicated or even impossible in more general topological spaces. We give two of these as lemmas for later reference.
Lemma 7.1 (Urysohn's lemma) Let $X$ be a metrizable space and $A$ and $B$ disjoint, nonempty, closed subsets of $X$. Then there exists a continuous function $f: X \to [0, 1]$ such that $f(a) = 0$ for every $a \in A$, $f(b) = 1$ for every $b \in B$, and $0 < f(x) < 1$ for every $x \notin A \cup B$. If $d$ is a metric consistent with the topology on $X$ and $\inf_{a \in A, b \in B} d(a, b) > 0$, then $f$ can be chosen to be uniformly continuous with respect to the metric $d$.
Proof Let $d$ be a metric on $X$ consistent with its topology and define
$$f(x) = \frac{d(x, A)}{d(x, A) + d(x, B)},$$
where the distance from a point to a nonempty closed set is defined by (8). This distance is zero if and only if the point is in the set, and the mapping of (8) is Lipschitz-continuous by (6) of Appendix C. Since $A$ and $B$ are disjoint, the denominator never vanishes, and this $f$ has the required properties. If $\inf_{a \in A, b \in B} d(a, b) > 0$, then $d(x, A) + d(x, B)$ is bounded away from zero, and the uniform continuity of $f$ follows. Q.E.D.
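The function in the proof is explicit enough to evaluate directly. The sketch below takes $X = R$ with $d(x, y) = |x - y|$ and, as a hypothetical example, the disjoint closed sets $A = \{0, 1\}$ and $B = \{3\}$ (finite sets are closed). It evaluates $f(x) = d(x, A)/[d(x, A) + d(x, B)]$ and checks the three properties asserted in Lemma 7.1. Here $\inf_{a \in A, b \in B} d(a, b) = 2 > 0$, so this $f$ is also uniformly continuous.

```python
def dist(x, Y):
    """d(x, Y) = inf over y in Y of |x - y|; Y is finite here, so the inf is a min."""
    return min(abs(x - y) for y in Y)


def urysohn_function(A, B):
    """f(x) = d(x, A) / [d(x, A) + d(x, B)] for disjoint nonempty closed A, B."""
    def f(x):
        dA, dB = dist(x, A), dist(x, B)
        return dA / (dA + dB)  # dA + dB > 0 because A and B are disjoint and closed
    return f


A, B = [0.0, 1.0], [3.0]  # hypothetical disjoint closed subsets of R
f = urysohn_function(A, B)

assert all(f(a) == 0.0 for a in A)  # f = 0 on A
assert all(f(b) == 1.0 for b in B)  # f = 1 on B
assert all(0.0 < f(x) < 1.0 for x in [-1.0, 0.5, 2.0, 2.9, 10.0])  # 0 < f < 1 elsewhere
```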
Lemma 7.2 Let $X$ be a metrizable space. Every closed subset of $X$ is a $G_\delta$ and every open subset is an $F_\sigma$.
Proof We prove the first statement; the second follows by complementation. Let $F$ be closed. We may assume without loss of generality that $F$ is nonempty. Let $d$ be a metric on $X$ consistent with its topology. The continuity of the function $x \mapsto d(x, F)$ implies that
$$G_n = \{x \in X \mid d(x, F) < 1/n\}$$
is open. But $F = \bigcap_{n=1}^\infty G_n$. Q.E.D.
Definition 7.2 Let X be a metrizable topological space. The space X is separable if it contains a countable dense set.
It is easily verified that any subspace of a separable metrizable space is separable and metrizable. A collection of subsets of a topological space $(X, \mathscr{T})$ is a base for the topology if every open set can be written as a union of sets from the collection. It is a subbase if a base can be obtained by taking
finite intersections of sets from the collection. If $\mathscr{T}$ has a countable base, $(X, \mathscr{T})$ is said to be second countable. A topological space is Lindelöf if every collection of open sets which covers the space contains a countable subcollection which also covers the space. It is a standard result that in metrizable spaces, separability, second countability, and the Lindelöf property are equivalent. The following proposition is a direct consequence of this fact.
Proposition 7.1 Let $(X, \mathscr{T})$ be a separable, metrizable, topological space and $\mathscr{B}$ a base for the topology $\mathscr{T}$. Then $\mathscr{B}$ contains a countable subcollection $\mathscr{B}_0$ which is also a base for $\mathscr{T}$.
Proof Let $\mathscr{C}$ be a countable base for the topology $\mathscr{T}$. Every set $C \in \mathscr{C}$ has the form $C = \bigcup_{\alpha \in I(C)} B_\alpha$, where $I(C)$ is an index set and $B_\alpha \in \mathscr{B}$ for every $\alpha \in I(C)$. Since $C$ is Lindelöf, we may assume $I(C)$ is countable. Let $\mathscr{B}_0 = \bigcup_{C \in \mathscr{C}} \{B_\alpha \mid \alpha \in I(C)\}$. Q.E.D.
The Hilbert cube $\mathscr{H}$ is the product of countably many copies of the unit interval (with the product topology). The unit interval is separable and metrizable, and, as we will show later (Proposition 7.4), these properties carry over to the Hilbert cube. In a sense, $\mathscr{H}$ is the canonical separable metrizable space, as the following proposition shows.
Proposition 7.2 (Urysohn's theorem) Every separable metrizable space is homeomorphic to a subset of the Hilbert cube $\mathscr{H}$.
Proof Let $(X, d)$ be a separable metric space with a countable dense set $\{x_k\}$. Define functions
$$\varphi_k(x) = \min\{1, d(x, x_k)\}, \qquad k = 1, 2, \ldots,$$
and $\varphi: X \to \mathscr{H}$ by
$$\varphi(x) = (\varphi_1(x), \varphi_2(x), \ldots).$$
Each $\varphi_k$ is continuous, so $\varphi$ is continuous. (Convergence in $\mathscr{H}$ is componentwise.) If $\varphi(x) = \varphi(y)$, then letting $x_{k_n} \to x$, we see that $\lim_{n \to \infty} d(y, x_{k_n}) = 0$, so $x = y$ and $\varphi$ is one-to-one. It remains to show that $\varphi^{-1}$ is continuous, i.e., $\varphi(y_n) \to \varphi(y)$ implies $y_n \to y$. But if $\varphi(y_n) \to \varphi(y)$, choose $\varepsilon \in (0, 1)$ and $x_k$ such that $d(y, x_k) < \varepsilon$. Since $\varphi_k(y_n) = \min\{1, d(y_n, x_k)\} \to \min\{1, d(y, x_k)\} = d(y, x_k)$ as $n \to \infty$, for $n$ sufficiently large $d(y_n, x_k) < \varepsilon$. Then $d(y, y_n) < 2\varepsilon$. Q.E.D.
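The embedding $\varphi$ in the proof can be explored numerically by keeping finitely many coordinates. In the hypothetical sketch below, $X = [0, 1]$ with $d(x, y) = |x - y|$, and the countable dense set $\{x_k\}$ is truncated to the dyadic rationals of level at most 10, so only the first 1025 coordinates of $\varphi(x)$ are computed. The function `recover` mirrors the continuity argument for $\varphi^{-1}$: a coordinate $\varphi_k(y)$ that is small identifies a point $x_k$ close to $y$.

```python
from fractions import Fraction


def dyadic_points(levels):
    """Enumerate the dyadic rationals j / 2**n in [0, 1], n <= levels, without repeats."""
    seen, points = set(), []
    for n in range(levels + 1):
        for j in range(2 ** n + 1):
            q = Fraction(j, 2 ** n)
            if q not in seen:
                seen.add(q)
                points.append(float(q))
    return points


DENSE = dyadic_points(10)  # truncation of the dense sequence {x_k}


def phi(x):
    """First len(DENSE) coordinates of phi(x) = (min{1, d(x, x_k)})_k."""
    return [min(1.0, abs(x - xk)) for xk in DENSE]


def recover(coords):
    """Approximate phi^{-1}: return the x_k whose coordinate is smallest."""
    k = min(range(len(coords)), key=coords.__getitem__)
    return DENSE[k]


for y in [0.1, 1 / 3, 0.70710678]:
    # The nearest dyadic point of level 10 is within 2**-11 of y.
    assert abs(recover(phi(y)) - y) <= 2 ** -11 + 1e-12
    # Each coordinate is 1-Lipschitz in y, so phi is continuous coordinatewise.
    delta = 1e-4
    assert max(abs(a - b) for a, b in zip(phi(y + delta), phi(y))) <= delta + 1e-12
```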
If $X$ is a separable metrizable space and $\varphi: X \to \mathscr{H}$ is the homeomorphism whose existence is guaranteed by Proposition 7.2, then by identifying $x \in X$ with $\varphi(x) \in \mathscr{H}$, we can regard $X$ as a subset of $\mathscr{H}$. Indeed, we can regard $X$ as a topological subspace of $\mathscr{H}$, since the images of open sets in $X$ under the mapping $\varphi$ are just the relatively open subsets of $\varphi(X)$ considered as a subspace of $\mathscr{H}$. Note, however, that although $X$ is both open and closed in itself,
$\varphi(X)$ may be neither open nor closed in $\mathscr{H}$. In fact, it may have no topological characterization at all. Likewise, a set with special structure in $X$, say a $G_\delta$, may not have this structure when considered as a subset of $\mathscr{H}$. The next definition and proposition shed some light on this issue.
Definition 7.3 Let $X$ be a topological space. The space $X$ is topologically complete if there is a metric $d$ on $X$ consistent with its topology such that the metric space $(X, d)$ is complete, i.e., if $\{x_n\} \subset X$ is a $d$-Cauchy sequence [$d(x_n, x_m) \to 0$ as $n, m \to \infty$], then $\{x_n\}$ converges to an element of $X$.
Proposition 7.3 (Alexandroff's theorem) Let $X$ be a topologically complete space, $Z$ a metrizable space, and $\varphi: X \to Z$ a homeomorphism. Then $\varphi(X)$ is a $G_\delta$-subset of $Z$. Conversely, if $Y$ is a $G_\delta$-subset of $Z$ and $Z$ is topologically complete, then $Y$ is topologically complete.
Proof For the proof of the first part of the proposition, we treat $X$ as a subset of $Z$. There are two metrics to consider, a metric $d$ on $Z$ consistent with its topology and a metric $d_1$ on $X$ which makes it complete. Define
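$$U_n = \{z \in Z \mid d(z, X) < 1/n \text{ and there exists an open neighborhood } V(z) \text{ of } z \text{ such that } \mathrm{diam}_1[V(z) \cap X] < 1/n\},$$
where $\mathrm{diam}_1$ denotes the diameter with respect to $d_1$. For $n = 1, 2, \ldots$, given $z \in U_n$ and $V(z)$ as just defined, we have
$$\{z' \in V(z) \mid d(z', X) < 1/n\} \subset U_n,$$
so $U_n$ is open. We show $X = \bigcap_{n=1}^\infty U_n$. For $z \in X$, define
$$W(z) = \{x \in X \mid d_1(x, z) < 1/(3n)\}.$$
Then $W(z)$ is relatively open in $X$, thus of the form $W(z) = V(z) \cap X$, where $V(z)$ is an open neighborhood in $Z$ of $z$. Also,
$$d(z, X) = 0 < 1/n, \qquad \mathrm{diam}_1[V(z) \cap X] = \mathrm{diam}_1[W(z)] \le 2/(3n) < 1/n,$$
so $z \in U_n$. Therefore $X \subset \bigcap_{n=1}^\infty U_n$. Now suppose $z \in \bigcap_{n=1}^\infty U_n$. Then $d(z, X) = 0$, so $z$ belongs to the closure of $X$, and there is a sequence $\{x_k\} \subset X$ such that $x_k \to z$. Let $V_n(z)$ be an open neighborhood in $Z$ of $z$ for which
$$\mathrm{diam}_1[V_n(z) \cap X] < 1/n. \tag{11}$$
For each $n$, there is an index $k_n$ such that $x_k \in V_n(z)$ for $k \ge k_n$. From (11) we see that $d_1(x_i, x_j) < 1/n$ for $i, j \ge k_n$, so $\{x_k\}$ is Cauchy in the complete space $(X, d_1)$ and hence has a limit in $X$. But the limit is $z$ by assumption, so $z \in X$, and hence $X = \bigcap_{n=1}^\infty U_n$.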
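For the converse part of the theorem, suppose $(Z, d)$ is a complete metric space and $Y = \bigcap_{n=1}^\infty U_n$, where each $U_n$ is open in $Z$. Define a metric $d_1$ on $Y$ by
$$d_1(y, \hat{y}) = d(y, \hat{y}) + \sum_{n=1}^\infty 2^{-n} \min\left\{1, \left|\frac{1}{d(y, Z - U_n)} - \frac{1}{d(\hat{y}, Z - U_n)}\right|\right\}.$$
If $\{y_k\}$ is Cauchy in $(Y, d_1)$, then it is also Cauchy in $(Z, d)$, and thus has a limit $y \in Z$. For each $n$,
$$\left|\frac{1}{d(y_i, Z - U_n)} - \frac{1}{d(y_j, Z - U_n)}\right| \to 0$$
as $i, j \to \infty$, so $[1/d(y_k, Z - U_n)]$ remains bounded as $k \to \infty$. Thus $d(y_k, Z - U_n)$ is bounded away from zero, and since $d(\cdot, Z - U_n)$ is continuous, $d(y, Z - U_n) > 0$. It follows that $y \in U_n$ for every $n$, hence $y \in Y$. Q.E.D.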
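To see why the converse part of Proposition 7.3 needs a new metric, take the hypothetical example $Z = R$ with $d(y, \hat{y}) = |y - \hat{y}|$ and $Y = (0, 1)$, a $G_\delta$ (indeed open) subset written with the single open set $U_1 = (0, 1)$. The sequence $y_k = 1/k$ is $d$-Cauchy but has no limit in $Y$. The sketch assumes a metric of the standard form $d_1(y, \hat{y}) = d(y, \hat{y}) + \min\{1, |1/d(y, Z - U_1) - 1/d(\hat{y}, Z - U_1)|\}$ (a single open set, with the weight $2^{-1}$ dropped since it does not affect which sequences are Cauchy). The extra term blows up near $Z - U_1$ and makes the sequence fail to be Cauchy.

```python
def dist_to_complement(y):
    """d(y, Z - U_1) for Z = R and U_1 = (0, 1); valid for y in (0, 1)."""
    return min(y, 1.0 - y)


def d(y, z):
    return abs(y - z)


def d1(y, z):
    """Assumed metric on Y = (0, 1): d plus a term penalizing nearness to Z - U_1."""
    return d(y, z) + min(1.0, abs(1.0 / dist_to_complement(y) - 1.0 / dist_to_complement(z)))


for k in [10, 100, 1000, 10000]:
    y, z = 1 / k, 1 / (2 * k)
    # d(y, z) = 1/(2k) -> 0, yet d1(y, z) >= 1 because 1/d(., Z - U_1) jumps from k to 2k.
    print(k, d(y, z), d1(y, z))
```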
As we remarked earlier without proof, the Hilbert cube inherits metrizability and separability from the unit interval. It also inherits topological completeness. This is a special case of the fact, which we now prove, that completeness and separability of metrizable spaces are preserved when taking countable products.
Proposition 7.4 Let $X_1, X_2, \ldots$ be a sequence of metrizable spaces and
$$Y_n = X_1X_2\cdots X_n, \qquad Y = X_1X_2\cdots.$$
Then $Y$ and each $Y_n$ is metrizable. If each $X_n$ is separable or topologically complete, then $Y$ and each $Y_n$ is separable or topologically complete, respectively.
Proof If $d_n$ is a metric on $X_n$ consistent with its topology, then
$$d(y, \hat{y}) = \sum_{n=1}^\infty 2^{-n} \min\{1, d_n(y_n, \hat{y}_n)\},$$
where $y = (y_1, y_2, \ldots)$, $\hat{y} = (\hat{y}_1, \hat{y}_2, \ldots)$, is a metric on $Y$ consistent with the product topology. If each $(X_n, d_n)$ is complete, clearly $(Y, d)$ is complete. If $\mathscr{G}_n$ is a countable base for the topology on $X_n$, the collection of sets of the form $G_1G_2\cdots G_nX_{n+1}X_{n+2}\cdots$, where $G_k$ ranges over $\mathscr{G}_k$ and $n$ ranges over the positive integers, is a countable base for the product topology on $Y$. The arguments for the product spaces $Y_n$ are similar. Q.E.D.
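A metric of the form $\sum_n 2^{-n} \min\{1, d_n(y_n, \hat{y}_n)\}$ (a standard choice for metrizing a countable product) weights the $n$th coordinate by $2^{-n}$, so points that differ only in far-out coordinates are close. The hypothetical sketch below takes every $X_n = R$ with $d_n(s, t) = |s - t|$ and represents a point of $Y$ by its first 60 coordinates, which determines such a metric to within $2^{-60}$. The unit vectors $e_k$ (1 in coordinate $k$, 0 elsewhere) converge to $0$ componentwise, and their distance to $0$ is $2^{-k}$, even though every $e_k$ is at distance 1 from $0$ in the sup metric.

```python
NUM_COORDS = 60  # truncation; the omitted tail contributes at most 2**-60


def product_metric(y, y_hat):
    """d(y, y_hat) = sum over n >= 1 of 2**-n * min{1, |y_n - y_hat_n|} (truncated)."""
    return sum(2.0 ** -(n + 1) * min(1.0, abs(a - b))
               for n, (a, b) in enumerate(zip(y, y_hat)))


def unit_vector(k):
    """e_k: coordinate k (1-based) equal to 1, all other coordinates 0."""
    return [1.0 if n == k - 1 else 0.0 for n in range(NUM_COORDS)]


zero = [0.0] * NUM_COORDS
for k in range(1, 6):
    # Only coordinate k differs, so d(e_k, 0) = 2**-k, which tends to 0.
    print(k, product_metric(unit_vector(k), zero))
```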
Combining Propositions 7.2-7.4, we see that every separable, topologically complete space is homeomorphic to a G,-subset of the Hilbert cube, and conversely, every G,-subset of the Hilbert cube is separable and topologically complete. We state a second consequence of these propositions as a corollary. Corollary 7.4.1 Every separable, topologically complete space can be homeomorphically embedded as a dense G,-set in a compact metric space.
Proof Let $X$ be separable and topologically complete and let $\varphi$ be a homeomorphism from $X$ into the Hilbert cube $\mathscr{H}$ (Proposition 7.2). Since $\mathscr{H}$ is metrizable, $\varphi(X)$ is a $G_\delta$-subset of $\mathscr{H}$ (Proposition 7.3) and thus a dense $G_\delta$-subset of $\overline{\varphi(X)}$. Tychonoff's theorem implies that $\mathscr{H}$ is compact, so the closed subset $\overline{\varphi(X)}$ is compact. Q.E.D.

If $X$ and $Z$ are topological spaces, $\varphi$ a homeomorphism from $Z$ onto $X$, and $d$ a metric on $X$ consistent with its topology such that $(X, d)$ is complete, then $d_1$ defined by (10) is a metric on $Z$ consistent with its topology, and $(Z, d_1)$ is also complete. Thus topological completeness is preserved under homeomorphisms. The same is true for separability, as is well known. Topological completeness is somewhat different from separability, however, in that one must produce a metric to verify it. It is quite possible that a space has two metrics consistent with its topology, is a complete metric space with one, but is not a complete metric space with the other. For example, let $X = \{1, \frac12, \frac13, \ldots\}$ have the discrete topology,
$$d_1(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \neq y, \end{cases}$$
and
$$d_2(x, y) = |x - y|.$$
Then $(X, d_1)$ is complete, but $(X, d_2)$ is not, since the $d_2$-Cauchy sequence $\{1/n\}$ has no limit in $X$. A more surprising example is that the set $\mathscr{N}_0$ of irrational numbers between 0 and 1 with the usual topology is topologically complete. To see this, write $\mathscr{N}_0 = \bigcap_{r \in Q} ([0,1] - \{r\})$, where $Q$ is the set of rational numbers. It follows that $\mathscr{N}_0$ is a $G_\delta$-subset of $[0,1]$ and is thus topologically complete by Proposition 7.3. Another proof is obtained as follows. Let $N$ be the set of positive integers with the discrete topology and $\mathscr{N}$ the product of countably many copies of $N$. The space $\mathscr{N}$ is called the Baire null space and is topologically complete (Proposition 7.4). The topological completeness of $\mathscr{N}_0$ follows from the fact that $\mathscr{N}$ and $\mathscr{N}_0$ are homeomorphic. We give the rather lengthy proof of this because it is not readily available elsewhere. This homeomorphism will be used only to construct a counterexample (Example 1 in Chapter 8), so it may be skipped by the reader without loss of continuity.

Proposition 7.5 The topological spaces $\mathscr{N}_0$ and $\mathscr{N}$ are homeomorphic.
Proof Let $\Sigma$ be the set of finite sequences of positive integers. If $z \in \Sigma \cup \mathscr{N}$, we will represent its components by $\zeta_k$. Similarly, $\hat\zeta_k$ will represent the components of an element $\hat z$ of $\Sigma \cup \mathscr{N}$. The length of $z \in \Sigma \cup \mathscr{N}$ is defined to be the number of its components. If $z$ has length greater than or equal to $k$, we define $z_k = (\zeta_k, \zeta_{k+1}, \ldots)$ or $z_k = (\zeta_k, \ldots, \zeta_m)$, depending on whether $z$ has infinite length or length $m < \infty$.
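For $z \in \Sigma \cup \mathscr{N}$, define a sequence whose initial terms are
$$x_1(z) = \frac{1}{\zeta_1}, \qquad x_2(z) = \cfrac{1}{\zeta_1 + \cfrac{1}{\zeta_2}}, \qquad x_3(z) = \cfrac{1}{\zeta_1 + \cfrac{1}{\zeta_2 + \cfrac{1}{\zeta_3}}}, \qquad \ldots. \tag{12}$$
If $z$ has length $k < \infty$, we define $x_1(z), x_2(z), \ldots, x_k(z)$ as shown, and $x_{k+j}(z) = x_k(z)$, $j = 1, 2, \ldots$.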
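The terms $x_k(z)$ are the convergents of the continued fraction $1/(\zeta_1 + 1/(\zeta_2 + \cdots))$, and the reduction used later in Claim 3 is the Euclidean algorithm. The sketch below is a minimal illustration with hypothetical function names, using exact rational arithmetic from Python's `fractions.Fraction`: `x_k` evaluates the $k$th term for a finite prefix of $z$ (freezing at the last term when $z$ is shorter, as in the text), and `expand_rational` produces a finite $\hat z \in \Sigma$ whose limit is a given rational in $(0,1]$.

```python
from fractions import Fraction
from typing import List, Sequence


def x_k(z: Sequence[int], k: int) -> Fraction:
    """k-th term 1/(z1 + 1/(z2 + ... + 1/zk)); frozen at x_m if len(z) = m < k."""
    k = min(k, len(z))
    value = Fraction(0)
    for zeta in reversed(z[:k]):           # build the fraction from the inside out
        value = 1 / (zeta + value)
    return value


def expand_rational(r: Fraction) -> List[int]:
    """Finite sequence z_hat with phi(z_hat) = r, for rational r in (0, 1]."""
    if not 0 < r <= 1:
        raise ValueError("r must lie in (0, 1]")
    digits: List[int] = []
    while True:
        inv = 1 / r                        # q / r_1 in the notation of Claim 3
        q = inv.numerator // inv.denominator
        digits.append(q)
        rest = inv - q                     # r_2 / r_1, which lies in [0, 1)
        if rest == 0:
            return digits
        r = rest


z = [2] * 8                                # prefix of the expansion of sqrt(2) - 1
xs = [x_k(z, k) for k in range(1, len(z) + 1)]
odd, even = xs[0::2], xs[1::2]             # x_1, x_3, ... and x_2, x_4, ...
print(all(a > b for a, b in zip(odd, odd[1:])))     # True: x_1 > x_3 > ...
print(all(a < b for a, b in zip(even, even[1:])))   # True: x_2 < x_4 < ...
print(expand_rational(Fraction(2, 5)))              # [2, 2]
```

Feeding the output of `expand_rational` back into `x_k` returns the original rational; for an irrational number the expansion never terminates.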
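Claim 1 The sequence $\{x_k(z)\}$ converges to an element of $(0,1]$ for every $z \in \Sigma \cup \mathscr{N}$.
If $z$ has finite length, the claim is trivial. If $z$ has infinite length, then
$$x_2(z) < x_4(z) < \cdots < x_3(z) < x_1(z),$$
so for every $n$
$$x_{2n}(z) \le \liminf_{k \to \infty} x_k(z) \le \limsup_{k \to \infty} x_k(z) \le x_{2n-1}(z).$$
Now we have
$$0 < x_{2n-1}(z) - x_{2n}(z) \le \frac{1}{4^{n-1}}, \tag{13}$$
and Claim 1 follows. Define $\varphi : \Sigma \cup \mathscr{N} \to (0,1]$ by
$$\varphi(z) = \lim_{k \to \infty} x_k(z).$$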
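Note that if $z \in \mathscr{N}$, then $0 < \varphi(z) < 1$. Also, if $z$ has length at least $k$, then
$$\varphi(z) = \cfrac{1}{\zeta_1 + \cfrac{1}{\zeta_2 + \cdots + \cfrac{1}{\zeta_{k-1} + \varphi(z_k)}}}. \tag{14}$$
Claim 2 If $z \in \mathscr{N}$, $\hat z \in \Sigma \cup \mathscr{N}$, and $\varphi(z) = \varphi(\hat z)$, then $z = \hat z$.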
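Suppose $\varphi(z) = \varphi(\hat z)$ and $z \neq \hat z$. We can use (14) to assume without loss of generality that $\hat\zeta_1 \neq \zeta_1$, or else $\hat z$ has length one and $\hat\zeta_1 = \zeta_1$. In the latter case, (12) implies
$$\varphi(\hat z) = \frac{1}{\hat\zeta_1} = \frac{1}{\zeta_1} > \frac{1}{\zeta_1 + \varphi(z_2)} = \varphi(z),$$
and a contradiction is reached. In the former case, if $\hat z$ has length one, then from (14)
$$\frac{1}{\hat\zeta_1} = \varphi(\hat z) = \varphi(z) = \frac{1}{\zeta_1 + \varphi(z_2)},$$
so
$$\hat\zeta_1 = \zeta_1 + \varphi(z_2),$$
which is impossible, since $0 < \varphi(z_2) < 1$. If $\hat z$ has length greater than one, then
$$\frac{1}{\hat\zeta_1 + \varphi(\hat z_2)} = \frac{1}{\zeta_1 + \varphi(z_2)}$$
and
$$\hat\zeta_1 + \varphi(\hat z_2) = \zeta_1 + \varphi(z_2).$$
This is also impossible, since $0 < \varphi(\hat z_2) \le 1$ and $0 < \varphi(z_2) < 1$, so the nonzero integer $\hat\zeta_1 - \zeta_1 = \varphi(z_2) - \varphi(\hat z_2)$ would lie in $(-1, 1)$.

Claim 3 Every rational number in $(0,1]$ has the form $\varphi(\hat z)$, where $\hat z \in \Sigma$.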
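Let $r_1/q$ be a rational number in $(0,1]$ reduced to lowest terms, $r_1$ and $q$ positive integers. If $r_1 = 1$, then $r_1/q = \varphi((q))$. Otherwise
$$\frac{r_1}{q} = \left(\frac{q}{r_1}\right)^{-1} = \left[q_1 + \frac{r_2}{r_1}\right]^{-1},$$
where $q_1$ and $r_2$ are positive integers and $r_2 < r_1$. Likewise,
$$\frac{r_2}{r_1} = \left(\frac{r_1}{r_2}\right)^{-1} = \left[q_2 + \frac{r_3}{r_2}\right]^{-1},$$
where $q_2$ and $r_3$ are positive integers and $r_3 < r_2$. Continuing, we eventually obtain $r_n = 1$ and have
$$\frac{r_1}{q} = \cfrac{1}{q_1 + \cfrac{1}{q_2 + \cdots + \cfrac{1}{q_{n-1} + \cfrac{1}{r_{n-1}}}}} = \varphi((q_1, \ldots, q_{n-1}, r_{n-1})).$$
Claims 2 and 3 imply that if $z \in \mathscr{N}$, then $\varphi(z)$ is irrational. Put another way, $\varphi$ maps $\mathscr{N}$ into $\mathscr{N}_0$. But given $y \in \mathscr{N}_0$, it is possible to choose positive integers $\zeta_1, \zeta_2, \ldots$ such that
$$y = \frac{1}{\zeta_1 + y_1}, \qquad y_1 = \frac{1}{\zeta_2 + y_2}, \qquad y_2 = \frac{1}{\zeta_3 + y_3},$$
etc., where each $y_k \in \mathscr{N}_0$ (take $\zeta_1 = \lfloor 1/y \rfloor$, $y_1 = 1/y - \zeta_1$, and so on), so that defining $z = (\zeta_1, \zeta_2, \ldots)$, we have
$$x_{2n}(z) < y < x_{2n-1}(z), \qquad n = 1, 2, \ldots.$$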
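It follows that $\varphi(z) = y$, so $\varphi$ maps $\mathscr{N}$ onto $\mathscr{N}_0$, and, by Claim 2, is one-to-one on $\mathscr{N}$. We show that $\varphi$ restricted to $\mathscr{N}$ is open and continuous. Let $V \subset \mathscr{N}$ be open. We may assume without loss of generality that
$$V = \{z \in \mathscr{N} \mid (\zeta_1, \ldots, \zeta_n) = (\bar\zeta_1, \ldots, \bar\zeta_n)\}$$
for some positive integers $\bar\zeta_1, \ldots, \bar\zeta_n$. Then
$$\varphi(V) = \left\{ \left(\bar\zeta_1 + \left(\bar\zeta_2 + \cdots + \left(\bar\zeta_n + \varphi(z)\right)^{-1} \cdots\right)^{-1}\right)^{-1} \,\middle|\, z \in \mathscr{N} \right\},$$
and since $\{\varphi(z) \mid z \in \mathscr{N}\} = \mathscr{N}_0$, $\varphi(V)$ is open in $\mathscr{N}_0$. Since convergence in $\mathscr{N}$ is componentwise and $x_n(z)$ depends only on the first $n$ components of $z \in \mathscr{N}$, continuity of $\varphi$ on $\mathscr{N}$ follows from (13). Q.E.D.

We now examine properties of metrizable spaces related to the notion of total boundedness.

Definition 7.4 A metric space $(X, d)$ is totally bounded if, given $\varepsilon > 0$, there exists a finite subset $F_\varepsilon$ of $X$ for which
$$X = \bigcup_{x \in F_\varepsilon} \{y \in X \mid d(x, y) < \varepsilon\}.$$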
A totally bounded metric space is necessarily separable, since $\bigcup_{n=1}^{\infty} F_{1/n}$ is a countable dense subset. Total boundedness depends on the metric, however, and a space which is totally bounded (and separable) with one metric may not be totally bounded with another. Like separability, total boundedness is preserved under passage to subspaces, i.e., if $(X, d)$ is totally bounded and $Y \subset X$, then $(Y, d)$ is totally bounded. To see this, take $\varepsilon > 0$ and let $F_{\varepsilon/2}$ be a finite subset of $X$ such that
$$X = \bigcup_{x \in F_{\varepsilon/2}} \{y \in X \mid d(x, y) < \varepsilon/2\}.$$
Choose a point, if possible, in each of the sets
$$Y \cap \{y \in X \mid d(x, y) < \varepsilon/2\}, \qquad x \in F_{\varepsilon/2},$$
and call the collection of these points $G_\varepsilon$. Then
$$Y = \bigcup_{x \in G_\varepsilon} \{y \in Y \mid d(x, y) < \varepsilon\}.$$
We use this fact to prove the following classical result relating completeness, compactness, and total boundedness. Proposition 7.6 A metric space is compact if and only if it is complete and totally bounded.
Proof If $(X, d)$ is a compact metric space, then every Cauchy sequence has an accumulation point. The Cauchy property implies that the sequence converges to this point, and completeness follows. Also, for $\varepsilon > 0$, the collection of sets
$$\{\{y \in X \mid d(x, y) < \varepsilon\} \mid x \in X\}$$
contains a finite cover of $X$. Hence, $(X, d)$ is totally bounded. If $(X, d)$ is complete and totally bounded and $S = \{s_j\}$ is a sequence in $(X, d)$, then an infinite subsequence $S_1 \subset S$ must lie in some set $B_1 = \{y \in X \mid d(x_1, y) < 1\}$. Since $B_1$ is totally bounded, an infinite subsequence $S_2 \subset S_1$ must lie in some set $B_2 = \{y \in B_1 \mid d(x_2, y) < 1/2\}$. Continuing in this manner, we have for each $n$ an infinite subsequence $S_{n+1} \subset S_n$ lying in $B_{n+1} = \{y \in B_n \mid d(x_{n+1}, y) < 1/(n+1)\}$. Let $j_1 < j_2 < \cdots$ be such that $s_{j_n} \in S_n$. Then $\{s_{j_n}\}$ is Cauchy and thus convergent. Therefore $S$ has an accumulation point, and the compactness of $(X, d)$ follows. Q.E.D.
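The argument given before Proposition 7.6, that subspaces inherit total boundedness, is constructive and can be run on a finite sample. The sketch below is only an illustration under stated assumptions: points of the unit square with the Euclidean metric, a greedy rule (an assumption, not from the text) for producing the finite set $F_{\varepsilon/2}$, and hypothetical helper names. It then keeps one point of $Y$ in each ball of radius $\varepsilon/2$ to form $G_\varepsilon$ and checks that the $\varepsilon$-balls around $G_\varepsilon$ cover $Y$.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean metric on the plane (assumed metric d)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_net(points: Sequence[Point], eps: float) -> List[Point]:
    """Finite F with every point at distance < eps from some member of F."""
    centers: List[Point] = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)          # p is not yet covered, so it becomes a center
    return centers


def subspace_net(net_x: Sequence[Point], y_pts: Sequence[Point],
                 eps: float) -> List[Point]:
    """Given an eps/2-net of X, pick (if possible) one point of Y in each
    ball {y : d(x, y) < eps/2}; by the triangle inequality the picks form
    an eps-net of Y."""
    g_eps: List[Point] = []
    for x in net_x:
        for y in y_pts:
            if dist(x, y) < eps / 2:
                g_eps.append(y)
                break
    return g_eps


random.seed(0)
X = [(random.random(), random.random()) for _ in range(2000)]
Y = [p for p in X if p[0] + p[1] < 0.7]           # a subset of X
eps = 0.2
F = greedy_net(X, eps / 2)                         # plays the role of F_{eps/2}
G = subspace_net(F, Y, eps)                        # plays the role of G_eps
print(len(F), len(G))
print(all(any(dist(y, g) < eps for g in G) for y in Y))
```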
Corollary 7.6.1 The Hilbert cube is totally bounded under any metric consistent with its topology, and every separable metrizable space has a totally bounded metrization.
Proof The Hilbert cube is compact by Tychonoff's theorem. Urysohn's theorem (Proposition 7.2) can be used to homeomorphically embed a given separable metrizable space into the Hilbert cube. Q.E.D.
As mentioned previously, total boundedness implies separability. By combining this fact with Proposition 7.6, we obtain the following corollary. Corollary 7.6.2 A compact metric space is complete and separable.
If $X$ is a metrizable space, the set of all bounded, continuous, real-valued functions on $X$ is denoted $C(X)$. As is well known, $C(X)$ is a Banach space under the norm
$$\|f\| = \sup_{x \in X} |f(x)|,$$
and we will always take $C(X)$ to have the metric and topology corresponding to this norm. If $d$ is a metric on $X$ consistent with its topology, we denote by $U_d(X)$ the collection of functions in $C(X)$ which are uniformly continuous with respect to $d$. We take $U_d(X)$ to have the relative topology of $C(X)$. We conclude this section with a discussion of the properties $C(X)$ and $U_d(X)$ inherit from $X$.

Proposition 7.7 If $X$ is a compact metrizable space, then $C(X)$ is separable.
Proof The space $X$ is separable (Corollary 7.6.2). Let $d$ be a metric on $X$ consistent with its topology, let $\{x_k\}$ be a countable dense subset of $X$, and let $F_1, F_2, \ldots$ be an enumeration of the collection of sets of the form $\{y \in X \mid d(x_k, y) \le 1/n\}$, where $k$ and $n$ range over the positive
integers. For any disjoint pair $F_i$ and $F_j$, let $f_{ij}$ be a continuous function taking values in $[0,1]$ such that $f_{ij}(x) = 0$ for $x \in F_i$ and $f_{ij}(x) = 1$ for $x \in F_j$. If $F_i$ and $F_j$ are not disjoint, let $f_{ij}$ be identically one. Let $\mathscr{C}$ consist of the functions $f_{ij}$ as $i$ and $j$ range over the positive integers. The collection $\mathscr{C}$ clearly separates points in $X$, i.e., given $x \neq y$, there exists $f \in \mathscr{C}$ for which $f(x) \neq f(y)$. Let $\mathscr{P}$ be the collection of finite-degree polynomials over $\mathscr{C}$, i.e., a typical element in $\mathscr{P}$ has the form
$$\sum a(i_1, \ldots, i_n; j_1, \ldots, j_n)\, f_{i_1 j_1} f_{i_2 j_2} \cdots f_{i_n j_n},$$
where $a(i_1, \ldots, i_n; j_1, \ldots, j_n) \in R$, $f_{i_1 j_1}, \ldots, f_{i_n j_n} \in \mathscr{C}$, and the summation is finite. Then $\mathscr{P}$ is a vector space under addition, and the product of two elements in $\mathscr{P}$ is again in $\mathscr{P}$. With these operations $\mathscr{P}$ is an algebra, and by the Stone-Weierstrass theorem, $\mathscr{P}$ is dense in $C(X)$. Let $\mathscr{P}_0$ be the collection of finite-degree polynomials over $\mathscr{C}$ with rational coefficients. An easy approximation argument shows that $\mathscr{P}_0$ is dense in $\mathscr{P}$, and thus dense in $C(X)$ as well. Since $\mathscr{P}_0$ is countable, $C(X)$ is separable. Q.E.D.
Definition 7.5 Let $(X, d_1)$ and $(Y, d_2)$ be metric spaces. A mapping $\varphi: X \to Y$ is an isometry if
$$d_2(\varphi(x), \varphi(x')) = d_1(x, x') \qquad \forall\, x, x' \in X.$$
In this case we say that $(X, d_1)$ and $(\varphi(X), d_2)$ are isometric spaces. If $(X, d_1)$ and $(Y, d_2)$ are as in Definition 7.5, we may regard the former as a subspace of the latter, and the distances between points in $X$ are unaffected by this embedding. Thus an isometry is a metric-preserving homeomorphism onto its image.

Proposition 7.8 Let $(X, d)$ be a metric space. There exists a complete metric space $(X_d, d_1)$, called the completion of $(X, d)$, and an isometry $\varphi: X \to X_d$ such that $\varphi(X)$ is dense in $X_d$.

Proof The construction of the completion of a metric space is standard, so we content ourselves with a sketch of it. Given the metric space $(X, d)$, define an equivalence relation $\sim$ on the set of Cauchy sequences in $(X, d)$ by
$$\{x_n\} \sim \{x_n'\} \iff \lim_{n \to \infty} d(x_n, x_n') = 0.$$
Let $X_d$ be the set of equivalence classes of Cauchy sequences in $(X, d)$ under this relation and let $d_1$ be defined on $X_d X_d$ by
$$d_1(x, y) = \lim_{n \to \infty} d(x_n, y_n), \tag{15}$$
where $\{x_n\}$ and $\{y_n\}$ are chosen to represent the equivalence classes $x$ and $y$. It is straightforward to verify that the limit in (15) exists for every pair of Cauchy sequences $\{x_n\}$ and $\{y_n\}$, and it is independent of the particular sequences chosen to represent the equivalence classes $x$ and $y$. Furthermore, $(X_d, d_1)$ can be shown to be a complete metric space, and the mapping $\varphi$ which takes $x \in X$ into the equivalence class in $X_d$ containing the Cauchy sequence $(x, x, \ldots)$ is an isometry. The image of $X$ under $\varphi$ is dense in $X_d$. Q.E.D.

We can regard $X_d$ as consisting of $X$ together with limits of all Cauchy sequences in $X$. We are really interested in the case in which $(X, d)$ is totally bounded, for which we have the following result.

Corollary 7.8.1 Let $(X, d)$ be a totally bounded metric space. There exists a compact metric space $(X_d, d_1)$ and an isometry $\varphi: X \to X_d$ such that $\varphi(X)$ is dense in $X_d$.
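Proof In light of Propositions 7.6 and 7.8, it suffices to prove that the completion $(X_d, d_1)$ of $(X, d)$ is totally bounded. Choose $\varepsilon > 0$. Regarding $(X, d)$ as a subspace of $(X_d, d_1)$, choose a finite set $F_{\varepsilon/2}$ of $X$ for which
$$X = \bigcup_{x \in F_{\varepsilon/2}} \{y \in X \mid d(x, y) < \varepsilon/2\}.$$
Since $X$ is dense in $X_d$, every $y \in X_d$ lies within $\varepsilon/2$ of some point of $X$, and we have
$$X_d = \bigcup_{x \in F_{\varepsilon/2}} \{y \in X_d \mid d_1(x, y) < \varepsilon\}.$$
Q.E.D.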
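The distance (15) on the completion can be illustrated with $X = Q$ and $d(x, y) = |x - y|$, whose completion is (isometric to) $R$. In the sketch below, two different Cauchy sequences of rationals, Newton iterates and decimal truncations (both chosen here purely for illustration), represent the same missing point $\sqrt 2$, so their distances $d(x_n, y_n)$ shrink to $0$ and they lie in one equivalence class; against the constant sequence $(1, 1, \ldots)$, i.e., against $\varphi(1)$, the limit in (15) is $\sqrt 2 - 1$. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def newton_sqrt2(n: int) -> Fraction:
    """n-th Newton iterate x <- (x + 2/x)/2 started at 1; every iterate is rational."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x


def decimal_sqrt2(n: int) -> Fraction:
    """sqrt(2) truncated to n decimal places, as an exact rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)


# d(x_n, y_n) for the two representatives of sqrt(2): tends to 0.
for n in range(1, 7):
    gap = abs(newton_sqrt2(n) - decimal_sqrt2(n))
    print(n, float(gap))

# Distance from the class of sqrt(2) to phi(1), the class of (1, 1, ...).
print(float(abs(newton_sqrt2(6) - 1)))   # approximately sqrt(2) - 1 = 0.41421356...
```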
If $X$ is a separable metrizable space, it is not necessarily true that $C(X)$ is separable (unless $X$ is compact, in which case we have Proposition 7.7). For example, let $f: R \to [0,1]$ be a continuous function with $f(0) = 1$ and $f(x) = 0$ for $|x| \ge 1/2$, such as
$$f(x) = \max\{0, 1 - 2|x|\},$$
and given an infinite sequence $\beta = (\beta_1, \beta_2, \ldots)$ of zeroes and ones, define
$$f_\beta(x) = \sum_{k=1}^{\infty} \beta_k f(x - k), \qquad x \in R.$$
We have constructed an uncountable collection of functions $f_\beta$ in $C(R)$ such that if $\beta \neq \beta'$, then $\|f_\beta - f_{\beta'}\| = 1$. Therefore, $C(R)$ cannot be separable, since the open balls of radius $1/2$ centered at the $f_\beta$ are disjoint and each would have to contain a point of a countable dense set. It is true, however, that given a separable metrizable space $X$, there is a metric $d$ on $X$ consistent with its topology such that $U_d(X)$ is separable. This is a consequence of the next proposition and the fact that separability implies the existence of a totally bounded metrization (Corollary 7.6.1). We prove this proposition with the aid of the following lemma.
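Before turning to the lemma, the non-separability example can be checked on a grid. The sketch below assumes the particular bump $f(x) = \max\{0, 1 - 2|x|\}$ (any continuous $f$ with $f(0) = 1$ vanishing for $|x| \ge 1/2$ would serve), uses $f_\beta(x) = \sum_k \beta_k f(x - k)$ with $\beta$ truncated to finitely many terms, and approximates the sup norm by a maximum over grid points that include every integer. The names are hypothetical.

```python
from typing import Sequence


def bump(x: float) -> float:
    """Assumed bump f: equals 1 at 0 and vanishes for |x| >= 1/2."""
    return max(0.0, 1.0 - 2.0 * abs(x))


def f_beta(beta: Sequence[int], x: float) -> float:
    """f_beta(x) = sum_k beta_k f(x - k), k = 1, 2, ...; the bumps have disjoint supports."""
    return sum(b * bump(x - k) for k, b in enumerate(beta, start=1))


def sup_distance(beta1: Sequence[int], beta2: Sequence[int],
                 points_per_unit: int = 100) -> float:
    """Approximate ||f_beta1 - f_beta2|| on the grid {i / points_per_unit} in [0, n + 1].

    Both functions vanish outside this interval, and the grid contains
    every integer, where the bumps reach their peak value 1.
    """
    n = max(len(beta1), len(beta2))
    grid = (i / points_per_unit for i in range((n + 1) * points_per_unit + 1))
    return max(abs(f_beta(beta1, x) - f_beta(beta2, x)) for x in grid)


print(sup_distance([1, 0, 1, 1], [1, 1, 1, 1]))   # 1.0: the sequences differ at k = 2
print(sup_distance([0, 1, 0], [0, 1, 0]))         # 0.0: identical sequences
```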
Lemma 7.3 Let $Y$ be a metrizable space, $d$ a metric on $Y$ consistent with its topology, and $X \subset Y$. If $g \in U_d(X)$, then $g$ has a continuous extension to $Y$, i.e., there exists $\hat g \in C(Y)$ such that $g(x) = \hat g(x)$ for every $x \in X$, and the extension $\hat g$ can be chosen to satisfy $\|\hat g\| = \|g\|$. If $X$ is dense in $Y$, $\hat g$ is unique.
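Proof Since $g$ is uniformly continuous on $X$, given $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that if $x_1, x_2 \in X$ and $d(x_1, x_2) < \delta(\varepsilon)$, then $|g(x_1) - g(x_2)| < \varepsilon$. Suppose $y \in \bar X$. Then there exists a sequence $\{x_n\} \subset X$ for which $x_n \to y$. Given $\varepsilon > 0$, there exists $N(\varepsilon)$ such that $d(x_n, x_m) < \delta(\varepsilon)$ for all $n, m \ge N(\varepsilon)$, so $\{g(x_n)\}$ is Cauchy in $R$. Define $\hat g(y) = \lim_{n \to \infty} g(x_n)$. Note that $n \ge N(\varepsilon)$ implies $|g(x_n) - \hat g(y)| \le \varepsilon$. Suppose now that $x \in X$ and $d(x, y) \le \delta(\varepsilon)/2$. Choose $n \ge N(\varepsilon)$ so that $d(x_n, y) < \delta(\varepsilon)/2$. Then $d(x, x_n) < \delta(\varepsilon)$ and
$$|g(x) - \hat g(y)| \le |g(x) - g(x_n)| + |g(x_n) - \hat g(y)| \le 2\varepsilon. \tag{17}$$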
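This shows that for any sequence $\{x_n'\} \subset X$ with $x_n' \to y$, we have $\hat g(y) = \lim_{n \to \infty} g(x_n')$, so the definition of $\hat g(y)$ is independent of the particular sequence $\{x_n\}$ chosen. If $y \in X$, we can take $x_n = y$, $n = 1, 2, \ldots$, and obtain $g(y) = \hat g(y)$, so $\hat g$ is an extension of $g$. If $\{y_m\}$ is a sequence in $\bar X$ which converges to $y \in \bar X$, then there exist sequences $\{x_{mn}\}$ in $X$ with $y_m = \lim_{n \to \infty} x_{mn}$, $m = 1, 2, \ldots$. Choose $n_1 < n_2 < \cdots$ so that $\lim_{m \to \infty} x_{m n_m} = y$ and $d(x_{m n_m}, y_m) < \delta(1/m)/2$. Then, by (17),
$$|\hat g(y) - \hat g(y_m)| \le |\hat g(y) - g(x_{m n_m})| + |g(x_{m n_m}) - \hat g(y_m)| \le |\hat g(y) - g(x_{m n_m})| + \frac{2}{m}. \tag{18}$$
Letting $m \to \infty$ in (18) and using (17), we conclude that $\hat g(y) = \lim_{m \to \infty} \hat g(y_m)$, and $\hat g$ is continuous on $\bar X$. It is clear that
$$\sup_{y \in \bar X} |\hat g(y)| = \sup_{x \in X} |g(x)|.$$
If $\bar X = Y$, $\hat g$ is clearly unique and we are done. If $\bar X$ is a proper subset of $Y$, use the Tietze extension theorem (see, e.g., Ash [A1] or Dugundji [D7]) to extend $\hat g$ to all of $Y$ so that
$$\sup_{y \in Y} |\hat g(y)| = \sup_{y \in \bar X} |\hat g(y)|.$$
Q.E.D.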
Proposition 7.9 If $(X, d)$ is a totally bounded metric space, then $U_d(X)$ is separable.
Proof Corollary 7.8.1 tells us that $(X, d)$ can be isometrically embedded as a dense subset of a compact metric space $(X_d, d_1)$. We regard $X$ as a subspace of $X_d$. Given any $g \in U_d(X)$, by Lemma 7.3, $g$ has a unique extension $\hat g \in C(X_d)$ such that $\|g\| = \|\hat g\|$. The mapping $g \to \hat g$ is linear and norm-preserving, thus an isometry from $U_d(X)$ to $C(X_d)$. The latter space is separable by Proposition 7.7, and the separability of $U_d(X)$ follows. Q.E.D.
7.3 Borel Spaces
The constructions necessary for the subsequent theory of dynamic programming are impossible when the state space and control space are arbitrary sets or even when they are arbitrary measurable spaces. For this reason, we introduce the concept of a Borel space, and in this and subsequent sections we develop the properties of Borel spaces which permit these constructions.

Definition 7.6 If $X$ is a topological space, the smallest $\sigma$-algebra of subsets of $X$ which contains all open subsets of $X$ is called the Borel $\sigma$-algebra and is denoted by $\mathscr{B}_X$. The members of $\mathscr{B}_X$ are called the Borel subsets of $X$.
If $X$ is separable and metrizable and $\mathscr{F}$ is a $\sigma$-algebra on $X$ containing a subbase $\mathscr{S}$ for its topology, then $\mathscr{F}$ contains $\mathscr{B}_X$. This is because, from Proposition 7.1, any open set in $X$ can be written as a countable union of finite intersections of sets in $\mathscr{S}$. Thus we have $\mathscr{B}_X = \sigma(\mathscr{S})$ for any subbase $\mathscr{S}$. We will often refer to the smallest $\sigma$-algebra containing a class of subsets as the $\sigma$-algebra generated by the class. Thus, $\mathscr{B}_X$ is the $\sigma$-algebra generated by the class of open subsets of $X$. Note that $\mathscr{B}_R$ is the class of Borel subsets of the real numbers in the usual sense, i.e., the $\sigma$-algebra generated by the intervals. Given a class of real-valued functions on a topological space $X$, it is common to speak of the weakest topology with respect to which all functions in the class are continuous. In a similar vein, one can speak of the smallest $\sigma$-algebra with respect to which all functions in the class are measurable. If $X$ is a metrizable space, it is easy to show that its topology is the weakest with respect to which all functions in $C(X)$ are continuous. The following proposition is the analogous result for $\mathscr{B}_X$. In the proof and in subsequent proofs, we will use the fact that for any two sets $\Omega$, $\Omega'$, any collection $\mathscr{C}$ of subsets of $\Omega'$, and any function $f: \Omega \to \Omega'$, we have
$$\sigma[f^{-1}(\mathscr{C})] = f^{-1}[\sigma(\mathscr{C})].$$
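On a finite set $\Omega$ a $\sigma$-algebra is simply a family of subsets closed under complements and unions, so the identity $\sigma[f^{-1}(\mathscr{C})] = f^{-1}[\sigma(\mathscr{C})]$ can be checked by brute force. The sketch below is a minimal illustration with hypothetical helper names and an arbitrarily chosen $f$ and $\mathscr{C}$; it is a sanity check on one small example, not a proof.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Set

Subset = FrozenSet[Hashable]


def sigma(omega: Subset, gens: Iterable[Subset]) -> Set[Subset]:
    """Smallest family containing gens that is closed under complement and union.

    On a finite set omega this is exactly the sigma-algebra generated by gens.
    """
    family: Set[Subset] = {frozenset(), omega} | {frozenset(g) for g in gens}
    changed = True
    while changed:
        changed = False
        current = list(family)
        new_sets = {omega - a for a in current}
        new_sets |= {a | b for a, b in combinations(current, 2)}
        if not new_sets <= family:
            family |= new_sets
            changed = True
    return family


def preimage(f: Callable[[Hashable], Hashable], omega: Subset,
             subset: Subset) -> Subset:
    """f^{-1}(subset) as a subset of omega."""
    return frozenset(w for w in omega if f(w) in subset)


omega = frozenset(range(6))                       # the set Omega
omega_prime = frozenset("abcd")                   # the set Omega'
f = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c", 5: "d"}.__getitem__
C = [frozenset("ab"), frozenset("bc")]            # a collection of subsets of Omega'

lhs = sigma(omega, [preimage(f, omega, c) for c in C])          # sigma[f^{-1}(C)]
rhs = {preimage(f, omega, s) for s in sigma(omega_prime, C)}    # f^{-1}[sigma(C)]
print(lhs == rhs, len(lhs))   # True 16
```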
Proposition 7.10 Let $X$ be a metrizable space. Then $\mathscr{B}_X$ is the smallest $\sigma$-algebra with respect to which every function in $C(X)$ is measurable, i.e.,
$$\mathscr{B}_X = \sigma\Big[\bigcup_{f \in C(X)} f^{-1}(\mathscr{B}_R)\Big].$$
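Proof Denote $\mathscr{G} = \sigma[\bigcup_{f \in C(X)} f^{-1}(\mathscr{B}_R)]$ and let $\mathscr{T}_R$ be the topology of $R$. We have
$$\mathscr{G} = \sigma\Big[\bigcup_{f \in C(X)} f^{-1}[\sigma(\mathscr{T}_R)]\Big] = \sigma\Big[\bigcup_{f \in C(X)} \sigma[f^{-1}(\mathscr{T}_R)]\Big] = \sigma\Big[\bigcup_{f \in C(X)} f^{-1}(\mathscr{T}_R)\Big] \subset \mathscr{B}_X,$$
the last containment holding because each $f \in C(X)$ is continuous. To prove the reverse containment $\mathscr{B}_X \subset \mathscr{G}$ we need only establish that $\mathscr{G}$ contains every nonempty open set. By Lemma 7.2, it suffices to show that $\mathscr{G}$ contains every nonempty closed set. Let $A$ be such a set. We may assume without loss of generality that $A \neq X$, so there exists $x \in X - A$. Let $B = \{x\}$, and let $f$ be given by Lemma 7.1. Then $A = f^{-1}(\{0\}) \in \mathscr{G}$. Q.E.D.

We use Lemma 7.2 to prove another useful characterization of the Borel $\sigma$-algebra in a metrizable space.

Proposition 7.11 Let $X$ be a metrizable space. Then $\mathscr{B}_X$ is the smallest class of sets which is closed under countable unions and intersections and contains every closed (open) set.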
Proof Let $\mathscr{D}$ be the smallest class of sets which contains every closed set and is closed under countable unions and intersections, i.e., $\mathscr{D}$ is the intersection of all such classes. Then $\mathscr{D} \subset \mathscr{B}_X$, and it suffices to prove that $\mathscr{D}$ is closed under complementation. Let $\mathscr{D}'$ be the class of complements of sets in $\mathscr{D}$. Then $\mathscr{D}'$ is also closed under countable unions and intersections. Lemma 7.2 implies that $\mathscr{D}$ contains every open set, so $\mathscr{D}'$ contains every closed set, and consequently $\mathscr{D} \subset \mathscr{D}'$. Given $D \in \mathscr{D}$, we have $D \in \mathscr{D}'$, so $D^c \in \mathscr{D}$. Q.E.D.

Definition 7.7 Let $X$ be a topological space. If there exists a complete separable metric space $Y$ and a Borel subset $B \in \mathscr{B}_Y$ such that $X$ is homeomorphic to $B$, then $X$ is said to be a Borel space. The empty set will also be regarded as a Borel space.
Note that every Borel space is metrizable and separable. Also, every complete separable metrizable space is a Borel space. Examples of Borel spaces are $R$, $R^n$, and $R^*$ with the weakest topology containing the intervals $[-\infty, \alpha)$, $(\beta, \infty]$, $(\alpha, \beta)$, $\alpha, \beta \in R$. (This is also the topology that makes the function $\varphi$ defined by
$$\varphi(x) = \begin{cases} -1 & \text{if } x = -\infty, \\ x/(1 + |x|) & \text{if } x \in R, \\ 1 & \text{if } x = \infty, \end{cases}$$
a homeomorphism from $R^*$ onto $[-1, 1]$.) Any countable set $X$ with the discrete topology (i.e., the topology consisting of all subsets of $X$) is also a Borel space. We will show that every Borel subset of a Borel space is itself a Borel space. For this we shall need the following two lemmas. The proof of the first is elementary and is left to the reader.

Lemma 7.4 If $Y$ is a topological space and $E \subset Y$, then the $\sigma$-algebra $\mathscr{B}_E$ generated by the relative topology coincides with the relative $\sigma$-algebra, i.e., the collection $\{E \cap C \mid C \in \mathscr{B}_Y\}$. In particular, if $E \in \mathscr{B}_Y$, then $\mathscr{B}_E$ consists of the Borel subsets of $Y$ contained in $E$.

Lemma 7.5 If $X$ and $Y$ are topological spaces and $\varphi$ is a homeomorphism of $X$ into $Y$, then $\varphi(\mathscr{B}_X) = \mathscr{B}_{\varphi(X)}$.

Proof If $\mathscr{T}_X$ is the topology of $X$, then $\varphi(\mathscr{T}_X)$ is the topology of $\varphi(X)$. Since $\varphi$ is one-to-one, we have that $\varphi$ is the inverse of the mapping $\varphi^{-1}: \varphi(X) \to X$, and
$$\varphi(\mathscr{B}_X) = \varphi[\sigma(\mathscr{T}_X)] = \sigma[\varphi(\mathscr{T}_X)] = \mathscr{B}_{\varphi(X)}.$$
Q.E.D.
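Proposition 7.12 If $X$ is a Borel space and $B \in \mathscr{B}_X$, then $B$ is a Borel space.
Proof Let $\varphi$ be a homeomorphism of $X$ into some complete separable metric space $Y$ such that $\varphi(X) \in \mathscr{B}_Y$. From Lemma 7.5 and the fact that $B \in \mathscr{B}_X$, we obtain $\varphi(B) \in \mathscr{B}_{\varphi(X)}$. It follows from Lemma 7.4 that $\varphi(B) \in \mathscr{B}_Y$. Since $\varphi$ maps $B$ homeomorphically onto $\varphi(B)$, $B$ is a Borel space. Q.E.D.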
Like separability and completeness, the property of being a Borel space is preserved when taking countable Cartesian products.

Proposition 7.13 Let $X_1, X_2, \ldots$ be a sequence of Borel spaces and $Y_n = X_1 X_2 \cdots X_n$, $Y = X_1 X_2 \cdots$. Then $Y$ and each $Y_n$ with the product topology is a Borel space and the Borel $\sigma$-algebras coincide with the product $\sigma$-algebras, i.e., $\mathscr{B}_{Y_n} = \mathscr{B}_{X_1} \mathscr{B}_{X_2} \cdots \mathscr{B}_{X_n}$ and $\mathscr{B}_Y = \mathscr{B}_{X_1} \mathscr{B}_{X_2} \cdots$.
Proof As in Proposition 7.4, we focus our attention on the more difficult infinite product. Consider the last statement of the proposition. Each $X_k$ has a countable base $\mathscr{G}_k$ for its topology, and the collection of sets of the form $G_1 G_2 \cdots G_n X_{n+1} X_{n+2} \cdots$, where $G_k$ ranges over $\mathscr{G}_k$ and $n$ ranges over the positive integers, is a base for the product topology on $Y$. The $\sigma$-algebra generated by this topology is $\mathscr{B}_Y$. Recall that the product $\sigma$-algebra $\mathscr{B}_{X_1} \mathscr{B}_{X_2} \cdots$ is the smallest $\sigma$-algebra containing all finite-dimensional measurable rectangles, i.e., all sets of the form $B_1 B_2 \cdots B_n X_{n+1} X_{n+2} \cdots$, where $B_k \in \mathscr{B}_{X_k}$, $k = 1, \ldots, n$. It is clear that each basic set of the product topology on $Y$ is a finite-dimensional measurable rectangle, and since each open subset of $Y$ is a countable union of these basic open sets, every open subset of $Y$ is $\mathscr{B}_{X_1} \mathscr{B}_{X_2} \cdots$-measurable. We conclude that $\mathscr{B}_Y \subset \mathscr{B}_{X_1} \mathscr{B}_{X_2} \cdots$. (Note that
this argument relies only on the separability of the spaces $X_1, X_2, \ldots$. Without this separability assumption, the argument fails and the conclusion is false in general.) The reverse set containment follows from the observation that for each $k$ and $B_k \in \mathscr{B}_{X_k}$, $X_1 X_2 \cdots X_{k-1} B_k X_{k+1} \cdots \in \mathscr{B}_Y$, since this set is the inverse image of $B_k$ under the continuous projection of $Y$ onto $X_k$. To prove that $Y$ is a Borel space, note that $X_k$ can be mapped by a homeomorphism $\varphi_k$ onto a Borel subset of a separable topologically complete space $Z_k$. The product $Z = Z_1 Z_2 \cdots$ is separable and topologically complete, and $\varphi: Y \to Z$ defined by
$$\varphi(x_1, x_2, \ldots) = (\varphi_1(x_1), \varphi_2(x_2), \ldots)$$
is a homeomorphism from $Y$ onto $\varphi_1(X_1) \varphi_2(X_2) \cdots$. This last set is in $\mathscr{B}_{Z_1} \mathscr{B}_{Z_2} \cdots = \mathscr{B}_Z$, and the conclusion follows. Q.E.D.

Definition 7.8 Let $X$ and $Y$ be topological spaces. A function $f: X \to Y$ is Borel-measurable if $f^{-1}(B) \in \mathscr{B}_X$ for every $B \in \mathscr{B}_Y$.

In many respects, Borel-measurable functions relate to Borel $\sigma$-algebras as continuous functions relate to topologies. We have already used the fact, for example, that if $f_k: X \to Y_k$ is continuous from a topological space $X$ to a topological space $Y_k$, $k = 1, 2, \ldots$, then $F: X \to Y_1 Y_2 \cdots$ defined by $F(x) = (f_1(x), f_2(x), \ldots)$ is also continuous. This follows from the componentwise nature of convergence in product spaces. There is an analogous fact for Borel-measurable functions and Borel spaces.

Proposition 7.14 Let $X$ be a Borel space, $Y_1, Y_2, \ldots$ a sequence of Borel spaces, and $f_k: X \to Y_k$ a sequence of functions. If each $f_k$ is Borel-measurable, $k = 1, 2, \ldots$, then the function $F: X \to Y_1 Y_2 \cdots$ defined by
$$F(x) = (f_1(x), f_2(x), \ldots)$$
and the functions $F_n: X \to Y_1 Y_2 \cdots Y_n$ defined by
$$F_n(x) = (f_1(x), f_2(x), \ldots, f_n(x))$$
are Borel-measurable. Conversely, if $F$ is Borel-measurable, then each $f_k$ is Borel-measurable, $k = 1, 2, \ldots$, and if some $F_n$ is Borel-measurable, then $f_1, f_2, \ldots, f_n$ are Borel-measurable.

Proof Again we consider only the infinite product. The Borel $\sigma$-algebra in $Y_1 Y_2 \cdots$ is generated by sets of the form $B_1 B_2 \cdots$, where $B_k \in \mathscr{B}_{Y_k}$, $k = 1, 2, \ldots$. Now
$$F^{-1}(B_1 B_2 \cdots) = f_1^{-1}(B_1) \cap f_2^{-1}(B_2) \cap \cdots. \tag{19}$$
The left side of (19) is in $\mathscr{B}_X$ for each $B_k \in \mathscr{B}_{Y_k}$, $k = 1, 2, \ldots$, if and only if the sets $f_k^{-1}(B_k)$ are in $\mathscr{B}_X$ for each $B_k \in \mathscr{B}_{Y_k}$, $k = 1, 2, \ldots$, and the result follows. Q.E.D.
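Corollary 7.14.1 Let $X$ and $Y$ be Borel spaces, $D$ a Borel subset of $X$, and $f: D \to Y$ Borel-measurable. Then
$$\mathrm{Gr}(f) = \{(x, f(x)) \mid x \in D\} \in \mathscr{B}_{XY}.$$
Proof The mappings $(x, y) \to f(x)$ and $(x, y) \to y$ are Borel-measurable from $DY$ to $Y$, so the mapping $F(x, y) = (f(x), y)$ is Borel-measurable from $DY$ to $YY$. Then
$$\mathrm{Gr}(f) = F^{-1}(\{(y, y) \mid y \in Y\}).$$
Since $\{(y, y) \mid y \in Y\}$ is closed in $YY$, $\mathrm{Gr}(f)$ is a Borel subset of $DY$, and hence of $XY$ (Lemma 7.4).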
Q.E.D.
The concept of homeomorphism is instrumental in classifying topological spaces, since it allows us to identify those which are "topologically equivalent." We can also classify measurable spaces by identifying those which, when regarded only as sets with $\sigma$-algebras, are indistinguishable. We specialize this concept to Borel spaces.

Definition 7.9 Let $X$ and $Y$ be Borel spaces and $\varphi: X \to Y$ a Borel-measurable, one-to-one function such that $\varphi^{-1}$ is Borel-measurable on $\varphi(X)$. Then $\varphi$ is called a Borel isomorphism, and we say that $X$ and $\varphi(X)$ are Borel-isomorphic (or simply isomorphic).

If $X$ and $Y$ are Borel spaces and $\varphi: X \to Y$ is a Borel isomorphism, it is tempting to think of $X$ and $\varphi(X)$ as identical measurable spaces. The difficulty with this is that $X$ is a Borel space, but $\varphi(X)$ is not required to be. This discrepancy is eliminated by the following intuitively plausible proposition, the rather lengthy proof of which can be found in Chapter I, Section 3 of Parthasarathy [P1]. We will not have occasion to use this result.

Proposition 7.15 (Kuratowski's theorem) Let $X$ be a Borel space, $Y$ a separable metrizable space, and $\varphi: X \to Y$ one-to-one and Borel-measurable. Then $\varphi(X)$ is a Borel subset of $Y$ and $\varphi^{-1}$ is Borel-measurable. In particular, if $Y$ is a Borel space, then $X$ and $\varphi(X)$ are isomorphic Borel spaces.

The advantage of classifying spaces by means of Borel isomorphisms is illustrated by the following result. We need this proposition for the subsequent development, but the proof is rather lengthy and is relegated to Appendix B, Section 2.

Proposition 7.16 Let $X$ and $Y$ be Borel spaces. Then $X$ and $Y$ are isomorphic if and only if they have the same cardinality.

Proposition 7.16 leads to a consideration of the possible cardinalities of Borel spaces. Of course, Borel spaces which are countably infinite are
possible, as are Borel spaces which consist of a given finite number of elements. In both these cases, the Borel $\sigma$-algebra is the power set and the conclusion of Proposition 7.16 is trivial. Because every Borel space can be homeomorphically embedded in the Hilbert cube, every Borel space has cardinality less than or equal to $c$. Even if one were to admit the possibility of an uncountable cardinality strictly less than $c$, the proof of Proposition 7.16 as given in Appendix B shows that every uncountable Borel space has cardinality $c$. By combining this fact with Proposition 7.16, we obtain the following corollary.

Corollary 7.16.1 Every uncountable Borel space is Borel-isomorphic to every other uncountable Borel space. In particular, every uncountable Borel space is isomorphic to the unit interval $[0,1]$ and the Baire null space $\mathscr{N}$.

7.4 Probability Measures on Borel Spaces
If $X$ is a metrizable space, we shall refer to a probability measure $p$ on the measurable space $(X, \mathscr{B}_X)$ as simply a probability measure on $X$. The set of all probability measures on $X$ will be denoted by $P(X)$. A probability measure $p \in P(X)$ determines a linear functional $l_p: C(X) \to R$ defined by
$$l_p(f) = \int f \, dp. \tag{20}$$
On the other hand, a function $f \in C(X)$ determines a real-valued function $\theta_f: P(X) \to R$ defined by
$$\theta_f(p) = \int f \, dp. \tag{21}$$
These relationships and the metrizability of the underlying space $X$ allow us to show several properties of $P(X)$. In particular, we will prove that there is a natural topology on $P(X)$, the weakest topology with respect to which every mapping of the form of (21) is continuous, under which $P(X)$ is a Borel space whenever $X$ is a Borel space.

7.4.1 Characterization of Probability Measures

Definition 7.10 Let $X$ be a metrizable space. A probability measure $p \in P(X)$ is said to be regular if for every $B \in \mathscr{B}_X$,
$$p(B) = \sup\{p(F) \mid F \subset B,\ F \text{ closed}\} = \inf\{p(G) \mid B \subset G,\ G \text{ open}\}. \tag{22}$$
Proposition 7.17 Let X be a metrizable space. Every probability measure in P(X) is regular.
Proof Let $p \in P(X)$ be given and let $\mathscr{E}$ be the collection of $B \in \mathscr{B}_X$ for which (22) holds. If $H \subset X$ is open, then $H = \bigcup_{n=1}^{\infty} F_n$, where $\{F_n\}$ is an increasing sequence of closed sets (Lemma 7.2), so
$$\inf\{p(G) \mid H \subset G,\ G \text{ open}\} = p(H) = \lim_{n \to \infty} p(F_n) \le \sup\{p(F) \mid F \subset H,\ F \text{ closed}\} \le p(H).$$
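Therefore $\mathscr{E}$ contains every open subset of $X$. We show that $\mathscr{E}$ is a $\sigma$-algebra and conclude that $\mathscr{E} = \mathscr{B}_X$. If $B \in \mathscr{E}$, then
$$p(B^c) = 1 - p(B) = 1 - \sup\{p(F) \mid F \subset B,\ F \text{ closed}\} = \inf\{p(G) \mid B^c \subset G,\ G \text{ open}\},$$
and similarly, $p(B^c) = \sup\{p(F) \mid F \subset B^c,\ F \text{ closed}\}$, so $\mathscr{E}$ is closed under complementation. Now suppose $\{B_n\}$ is a sequence of sets in $\mathscr{E}$. Choose $\varepsilon > 0$ and $F_n \subset B_n \subset G_n$ such that $F_n$ is closed, $G_n$ is open, and $p(G_n - F_n) \le \varepsilon/2^n$. Then
$$p\Big(\bigcup_{n=1}^{\infty} G_n - \bigcup_{n=1}^{\infty} F_n\Big) \le \sum_{n=1}^{\infty} p(G_n - F_n) \le \varepsilon, \tag{23}$$
and since $\varepsilon$ is arbitrary,
$$p\Big(\bigcup_{n=1}^{\infty} B_n\Big) = \inf\Big\{p(G) \,\Big|\, \bigcup_{n=1}^{\infty} B_n \subset G,\ G \text{ open}\Big\}.$$
It is also apparent from (23) that
$$p\Big(\bigcup_{n=1}^{\infty} B_n - \bigcup_{n=1}^{\infty} F_n\Big) \le \varepsilon,$$
so for $N$ sufficiently large,
$$p\Big(\bigcup_{n=1}^{\infty} B_n - \bigcup_{n=1}^{N} F_n\Big) \le 2\varepsilon.$$
The finite union $\bigcup_{n=1}^{N} F_n$ is a closed subset of $\bigcup_{n=1}^{\infty} B_n$ and $\varepsilon$ is arbitrary, so
$$p\Big(\bigcup_{n=1}^{\infty} B_n\Big) = \sup\Big\{p(F) \,\Big|\, F \subset \bigcup_{n=1}^{\infty} B_n,\ F \text{ closed}\Big\}.$$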
This shows that $\mathscr{E}$ is closed under countable unions and completes the proof. Q.E.D.

From Proposition 7.17 we conclude that a probability measure on a metrizable space is completely determined by its values on the open or closed sets. The following proposition is a similar result. It states that a probability measure $p$ on a metric space $(X, d)$ is completely determined by the values $\int g \, dp$, where $g$ ranges over $U_d(X)$.

Proposition 7.18 Let $X$ be a metrizable space and $d$ a metric on $X$ consistent with its topology. If $p_1, p_2 \in P(X)$ and
$$\int g \, dp_1 = \int g \, dp_2 \qquad \forall\, g \in U_d(X),$$
then $p_1 = p_2$.
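Proof Let $F$ be any closed proper subset of $X$ and let $G_n = \{x \in X \mid d(x, F) < 1/n\}$. For sufficiently large $n$, $F$ and $G_n^c$ are disjoint nonempty closed sets for which $\inf_{x \in F,\, y \in G_n^c} d(x, y) > 0$, so by Lemma 7.1, there exist functions $f_n \in U_d(X)$ such that $f_n(x) = 0$ for $x \in G_n^c$, $f_n(x) = 1$ for $x \in F$, and $0 \le f_n(x) \le 1$ for every $x \in X$. Then
$$p_1(F) \le \int f_n \, dp_1 = \int f_n \, dp_2 \le p_2(G_n),$$
and so
$$p_1(F) \le \lim_{n \to \infty} p_2(G_n) = p_2(F),$$
since the open sets $G_n$ decrease to the closed set $F$. Reversing the roles of $p_1$ and $p_2$, we obtain $p_1(F) = p_2(F)$. Proposition 7.17 implies $p_1(B) = p_2(B)$ for every $B \in \mathscr{B}_X$. Q.E.D.

7.4.2 The Weak Topology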
We turn now to a discussion of topologies on $P(X)$, where $X$ is a metrizable space. Given $\varepsilon > 0$, $p \in P(X)$, and $f \in C(X)$, define the subset of $P(X)$:
$$V_\varepsilon(p; f) = \Big\{q \in P(X) \,\Big|\, \Big|\int f \, dq - \int f \, dp\Big| < \varepsilon\Big\}.$$
If $D \subset C(X)$, consider the collection of subsets of $P(X)$:
$$\mathscr{V}(D) = \{V_\varepsilon(p; f) \mid \varepsilon > 0,\ p \in P(X),\ f \in D\}.$$
Let $\mathscr{T}(D)$ be the weakest topology on $P(X)$ which contains the collection $\mathscr{V}(D)$, i.e., the topology for which $\mathscr{V}(D)$ is a subbase.

Lemma 7.6 Let $X$ be a metrizable space and $D \subset C(X)$. Let $\{p_\alpha\}$ be a net in $P(X)$ and $p \in P(X)$. Then $p_\alpha \to p$ relative to the topology $\mathscr{T}(D)$ if and only if $\int f \, dp_\alpha \to \int f \, dp$ for every $f \in D$.

Proof Suppose $p_\alpha \to p$ and $f \in D$. Then, given $\varepsilon > 0$, there exists $\beta$ such that $\alpha \ge \beta$ implies $p_\alpha \in V_\varepsilon(p; f)$. Hence $\int f \, dp_\alpha \to \int f \, dp$. Conversely, if $\int f \, dp_\alpha \to \int f \, dp$ for every $f \in D$, and $G \in \mathscr{T}(D)$ contains $p$, then $p$ is also contained in some basic open set $\bigcap_{k=1}^{n} V_{\varepsilon_k}(p; f_k) \subset G$, where $\varepsilon_k > 0$ and $f_k \in D$, $k = 1, \ldots, n$. Choose $\beta$ such that for all $\alpha \ge \beta$ we have $|\int f_k \, dp_\alpha - \int f_k \, dp| < \varepsilon_k$, $k = 1, \ldots, n$. Then $p_\alpha \in G$ for $\alpha \ge \beta$, so $p_\alpha \to p$. Q.E.D.
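Lemma 7.6 reduces convergence in $\mathscr{T}(D)$ to convergence of the numbers $\int f\,dp_\alpha$, $f \in D$. As a minimal numerical illustration, assume $X = [0,1]$, let $p$ be the uniform (Lebesgue) probability measure, and let $p_n$ (a choice made only for this sketch) put mass $1/n$ at each of the points $1/n, 2/n, \ldots, 1$. Then $\int f\,dp_n$ is a Riemann sum; for every $f \in C([0,1])$ these sums converge to $\int f\,dp$, so $p_n \to p$ in the weak topology. The code only spot-checks three such $f$, and the helper name is hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

TestFunction = Tuple[Callable[[float], float], float]


def integral_pn(f: Callable[[float], float], n: int) -> float:
    """Integral of f against p_n = (1/n) * sum of unit masses at k/n, k = 1, ..., n."""
    return sum(f(k / n) for k in range(1, n + 1)) / n


# Continuous test functions on [0, 1] with their exact integrals against Lebesgue measure.
tests: Dict[str, TestFunction] = {
    "x^2": (lambda x: x * x, 1.0 / 3.0),
    "cos(pi x)": (lambda x: math.cos(math.pi * x), 0.0),
    "exp(x)": (math.exp, math.e - 1.0),
}

for name, (f, exact) in tests.items():
    errors = [abs(integral_pn(f, n) - exact) for n in (10, 100, 1000)]
    print(name, errors)    # errors shrink roughly like 1/n
```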
We are really interested in $\mathscr{T}[C(X)]$, the so-called weak topology on $P(X)$. The space $C(X)$ is too large to be manipulated easily, so we will need a countable set $D \subset C(X)$ such that $\mathscr{T}(D) = \mathscr{T}[C(X)]$. Such a set $D$ is produced by the next three lemmas.

Lemma 7.7 Let $X$ be a metrizable space and $d$ a metric on $X$ consistent with its topology. If $f \in C(X)$, then there exist sequences $\{g_n\}$ and $\{h_n\}$ in $U_d(X)$ such that $g_n \uparrow f$ and $h_n \downarrow f$.

Proof We need only produce the sequence $\{g_n\}$, since the other case follows by considering $-f$. In Lemma 7.14 under weaker assumptions we will have occasion to utilize the construction about to be described, so we are careful to point out which assumptions are being used. If $f \in C(X)$, then $f$ is bounded below by some $b \in R$, and for at least one $x_0 \in X$, $f(x_0) < \infty$. Define
$$g_n(x) = \inf_{y \in X} \{f(y) + n\, d(x, y)\}, \qquad n = 1, 2, \ldots.$$
Note that for every $x \in X$,
$$b \le g_n(x) \le f(x) + n\, d(x, x) = f(x),$$
and
$$b \le g_n(x) \le f(x_0) + n\, d(x, x_0) < \infty.$$
Thus
Lemma 7.9 Let $X$ be a metrizable space and $d$ a metric on $X$ consistent with its topology. If $D$ is dense in $U_d(X)$, then $\mathscr{T}[U_d(X)] = \mathscr{T}(D)$.
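Proof It is clear that $\mathscr{T}(D) \subset \mathscr{T}[U_d(X)]$. To prove the reverse set containment, we choose a set $V_\varepsilon(p; g) \in \mathscr{V}[U_d(X)]$, select a point $p_0$ in this set, and construct a set in $\mathscr{T}(D)$ containing $p_0$ and contained in $V_\varepsilon(p; g)$. Let
$$\varepsilon_0 = \varepsilon - \Big|\int g \, dp_0 - \int g \, dp\Big| > 0.$$
Let $h \in D$ be such that $\|g - h\| < \varepsilon_0/3$. Then for any $q \in V_{\varepsilon_0/3}(p_0; h)$, we have
$$\Big|\int g \, dq - \int g \, dp\Big| \le \Big|\int (g - h) \, dq\Big| + \Big|\int h \, dq - \int h \, dp_0\Big| + \Big|\int (h - g) \, dp_0\Big| + \Big|\int g \, dp_0 - \int g \, dp\Big| < \varepsilon_0 + (\varepsilon - \varepsilon_0) = \varepsilon.$$
Q.E.D.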
Proposition 7.19 Let $X$ be a separable metrizable space. There is a metric $d$ on $X$ consistent with its topology and a countable dense subset $D$ of $U_d(X)$ such that $\mathcal{T}(D)$ is the weak topology $\mathcal{T}[C(X)]$ on $P(X)$.
Proof Corollary 7.6.1 states that the separable metrizable space $X$ has a totally bounded metrization $d$. By Proposition 7.9, there exists a countable dense set $D$ in $U_d(X)$. The conclusion follows from Lemmas 7.8 and 7.9. Q.E.D. From this point on, whenever $X$ is a metrizable space, we will understand $P(X)$ to be a topological space with the weak topology $\mathcal{T}[C(X)]$. We will show that when $X$ is separable and metrizable, $P(X)$ is separable and metrizable; when $X$ is compact and metrizable, $P(X)$ is compact and metrizable; when $X$ is separable and topologically complete, $P(X)$ is separable and topologically complete; and when $X$ is a Borel space, $P(X)$ is a Borel space. Proposition 7.20 If $X$ is a separable metrizable space, then $P(X)$ is separable and metrizable.
7.
128
BOREL SPACES AND THEIR PROBABILITY MEASURES
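Proof Let $d$ be a metric on $X$ consistent with its topology and $D$ a countable dense subset of $U_d(X)$ such that $\mathcal{T}(D)$ is the weak topology on $P(X)$ (Proposition 7.19). Let $R^\omega$ be the product of countably many copies of the real line and let $\varphi: P(X) \to R^\omega$ be defined by
$$\varphi(p) = \left(\int g_1\,dp, \int g_2\,dp, \ldots\right),$$
where $\{g_1, g_2, \ldots\}$ is an enumeration of $D$. We will show that $\varphi$ is a homeomorphism, and since $R^\omega$ is metrizable and separable (Proposition 7.4), these properties for $P(X)$ will follow. Suppose that $\varphi(p_1) = \varphi(p_2)$, so that $\int g\,dp_1 = \int g\,dp_2$ for every $g \in D$. If $g \in U_d(X)$, then there exists a sequence $\{g_{k_j}\} \subset D$ such that $\|g_{k_j} - g\| \to 0$ as $j \to \infty$. Then
$$\left|\int g\,dp_1 - \int g\,dp_2\right| \le \limsup_{j \to \infty}\left|\int g\,dp_1 - \int g_{k_j}\,dp_1\right| + \limsup_{j \to \infty}\left|\int g_{k_j}\,dp_2 - \int g\,dp_2\right| \le 2\limsup_{j \to \infty}\|g_{k_j} - g\| = 0,$$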
so $\int g\,dp_1 = \int g\,dp_2$. Proposition 7.18 implies that $p_1 = p_2$, so $\varphi$ is one-to-one. For each $g_k \in D$, the mapping $p \mapsto \int g_k\,dp$ is continuous by Lemma 7.6, so $\varphi$ is continuous. To show that $\varphi^{-1}$ is continuous, let $\{p_\alpha\}$ be a net in $P(X)$ such that $\varphi(p_\alpha) \to \varphi(p)$ for some $p \in P(X)$. Then $\int g_k\,dp_\alpha \to \int g_k\,dp$ for every $g_k \in D$, and by Lemma 7.6, $p_\alpha \to p$. Q.E.D. Proposition 7.20 guarantees that when $X$ is separable and metrizable, the topology on $P(X)$ can be characterized in terms of convergent sequences rather than nets. We give several conditions which are equivalent to convergence in $P(X)$. Proposition 7.21 Let $X$ be a separable metrizable space and let $d$ be a metric on $X$ consistent with its topology. Let $\{p_n\}$ be a sequence in $P(X)$ and $p \in P(X)$. The following statements are equivalent:
(a) $p_n \to p$;
(b) $\int f\,dp_n \to \int f\,dp$ for every $f \in C(X)$;
(c) $\int g\,dp_n \to \int g\,dp$ for every $g \in U_d(X)$;
(d) $\limsup_{n\to\infty} p_n(F) \le p(F)$ for every closed set $F \subset X$;
(e) $\liminf_{n\to\infty} p_n(G) \ge p(G)$ for every open set $G \subset X$.
Proof The equivalence of (a), (b), and (c) follows from Lemmas 7.6 and 7.8. The equivalence of (d) and (e) follows by complementation. To show that (b) implies (d), let $F$ be a closed proper nonempty subset of $X$ and let $G_k = \{x \in X \mid d(x,F) < 1/k\}$. For $k$ sufficiently large, $F$ and $G_k^c$ are disjoint nonempty closed sets, and there exist functions $f_k \in C(X)$ such that $f_k(x) = 1$ for $x \in F$, $f_k(x) = 0$ for $x \in G_k^c$, and $0 \le f_k(x) \le 1$ for every $x \in X$. Using (b) we have
$$\limsup_{n \to \infty} p_n(F) \le \lim_{n \to \infty}\int f_k\,dp_n = \int f_k\,dp \le p(G_k),$$
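and letting $k \to \infty$, we obtain (d), since $G_k \downarrow F$. To show that (d) implies (b), choose $f \in C(X)$ and assume without loss of generality that $0 \le f \le 1$. Choose a positive integer $K$ and define closed sets
$$F_k = \{x \in X \mid k/K \le f(x)\}, \qquad k = 0, 1, \ldots, K.$$
Define $\varphi: X \to [0,1]$ by
$$\varphi(x) = \sum_{k=0}^{K} \frac{k}{K}\,\chi_{F_k - F_{k+1}}(x),$$
where $F_{K+1} = \varnothing$. Then $f - (1/K) \le \varphi \le f$, and, for any $q \in P(X)$,
$$\int \varphi\,dq = \frac{1}{K}\sum_{k=1}^{K} q(F_k).$$
Using (d) we have
$$\limsup_{n \to \infty}\int f\,dp_n - \frac{1}{K} \le \limsup_{n \to \infty}\int \varphi\,dp_n = \limsup_{n \to \infty}\frac{1}{K}\sum_{k=1}^{K} p_n(F_k) \le \frac{1}{K}\sum_{k=1}^{K} p(F_k) = \int \varphi\,dp \le \int f\,dp,$$
and since $K$ is arbitrary, we obtain
$$\limsup_{n \to \infty}\int f\,dp_n \le \int f\,dp \tag{29}$$
for every $f \in C(X)$. In particular, (29) holds for $-f$, so
$$\liminf_{n \to \infty}\int f\,dp_n = -\limsup_{n \to \infty}\int (-f)\,dp_n \ge -\int (-f)\,dp = \int f\,dp. \tag{30}$$
Combine (29) and (30) to conclude (b). Q.E.D.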
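The step from (d) to (b) rests on a staircase built from the closed level sets $F_k = \{x \mid k/K \le f(x)\}$, namely $\varphi = (1/K)\sum_{k=1}^{K} \chi_{F_k}$, which takes the value $k/K$ on $F_k - F_{k+1}$. The sketch below is our own illustration: the function $f(x) = x^2$ and the three-atom measure $q$ are arbitrary choices. Using exact rational arithmetic, it checks the two facts the proof uses, $f - 1/K \le \varphi \le f$ and $\int \varphi\,dq = (1/K)\sum_{k=1}^{K} q(F_k)$.

```python
from fractions import Fraction


def phi(fx, K):
    """Staircase value at a point where f(x) = fx, assuming 0 <= f <= 1.

    phi(x) = k/K on F_k - F_{k+1}, with F_k = {x : k/K <= f(x)}; equivalently
    phi(x) = (1/K) * #{k in 1..K : k/K <= f(x)}.
    """
    count = sum(1 for k in range(1, K + 1) if Fraction(k, K) <= fx)
    return Fraction(count, K)


def integral(g, atoms):
    """Integral of g against a discrete probability measure [(x, weight), ...]."""
    return sum((w * g(x) for x, w in atoms), Fraction(0))


def level_set_sum(f, atoms, K):
    """(1/K) * sum_{k=1}^{K} q(F_k) for the same closed level sets F_k."""
    total = sum(
        (w for k in range(1, K + 1) for x, w in atoms if Fraction(k, K) <= f(x)),
        Fraction(0),
    )
    return total / K


def f(x):
    """Hypothetical continuous function with values in [0, 1] on [0, 1]."""
    return x * x


# Hypothetical probability measure q with three atoms.
atoms = [
    (Fraction(1, 3), Fraction(1, 2)),
    (Fraction(3, 4), Fraction(1, 4)),
    (Fraction(1), Fraction(1, 4)),
]

for K in (1, 4, 16):
    print(K, integral(lambda x: phi(f(x), K), atoms), integral(f, atoms))
```

As $K$ grows, $\int \varphi\,dq$ climbs toward $\int f\,dq$ and never trails it by more than $1/K$, which is exactly the slack the proof absorbs.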
When $X$ is a metrizable space, we denote by $p_x$ the probability measure in $P(X)$ which assigns unit point mass to $x$, i.e., $p_x(B) = 1$ if and only if $x \in B$. Corollary 7.21.1 Let $X$ be a metrizable space. The mapping $\delta: X \to P(X)$ defined by $\delta(x) = p_x$ is a homeomorphism.
Proof It is clear that $\delta$ is one-to-one. Suppose $\{x_n\}$ is a sequence in $X$ and $x \in X$. If $x_n \to x$ and $G$ is an open subset of $X$, then there are two possibilities. Either $x \in G$, in which case $x_n \in G$ for sufficiently large $n$, so $p_{x_n}(G) = 1 = p_x(G)$ for sufficiently large $n$, or else $x \notin G$, in which case $\liminf_{n\to\infty} p_{x_n}(G) \ge 0 = p_x(G)$. Proposition 7.21 implies $p_{x_n} \to p_x$, so $\delta$ is continuous. On the other hand, if $p_{x_n} \to p_x$ and $G$ is an open neighborhood of $x$, then since $\liminf_{n\to\infty} p_{x_n}(G) \ge p_x(G) = 1$, we must have $x_n \in G$ for sufficiently large $n$, i.e., $x_n \to x$. This shows that $\delta$ is a homeomorphism. Q.E.D. From Corollary 7.21.1 we see that $p_n$ can converge to $p$ in such a way that strict inequality holds in (d) and (e) of Proposition 7.21. For example, let $G \subset X$ be open, let $x$ be on the boundary of $G$, and let $x_n$ converge to $x$ through $G$. Then $p_{x_n}(G) = 1$ for every $n$, but $p_x(G) = 0$. We now show that compactness of $X$ is inherited by $P(X)$. Proposition 7.22 If $X$ is a compact metrizable space, then $P(X)$ is a compact metrizable space.
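Proof If $X$ is a compact metrizable space, it is separable (Corollary 7.6.2) and $C(X)$ is separable (Proposition 7.7). Let $\{f_k\}$ be a countable set in $C(X)$ such that $f_1 \equiv 1$, $\|f_k\| \le 1$ for every $k$, and $\{f_k\}$ is dense in the unit sphere $\{f \in C(X) \mid \|f\| \le 1\}$. Let $[-1,1]^\omega$ be the product of countably many copies of $[-1,1]$ and define $\varphi: P(X) \to [-1,1]^\omega$ by
$$\varphi(p) = \left(\int f_1\,dp, \int f_2\,dp, \ldots\right).$$
A trivial modification of the proof of Proposition 7.20 shows $\varphi$ is a homeomorphism. We will show that $\varphi[P(X)]$ is closed in the compact space $[-1,1]^\omega$, and the compactness of $P(X)$ will follow. Suppose $\{p_n\}$ is a sequence in $P(X)$ and $\varphi(p_n) \to (a_1, a_2, \ldots) \in [-1,1]^\omega$. Given $\varepsilon > 0$ and $f \in C(X)$ with $\|f\| \le 1$, there is a function $f_k$ with $\|f - f_k\| < \varepsilon/3$. There is a positive integer $N$ such that $n, m \ge N$ implies $\left|\int f_k\,dp_n - \int f_k\,dp_m\right| < \varepsilon/3$. Then
$$\left|\int f\,dp_n - \int f\,dp_m\right| \le \left|\int (f - f_k)\,dp_n\right| + \left|\int f_k\,dp_n - \int f_k\,dp_m\right| + \left|\int (f_k - f)\,dp_m\right| < \varepsilon, \qquad n, m \ge N,$$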
so $\{\int f\,dp_n\}$ is Cauchy in $[-1,1]$. Denote its limit by $E(f)$. If $\|f\| > 1$, define
$$E(f) = \|f\|\,E(f/\|f\|).$$
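It is easily verified that $E$ is a linear functional on $C(X)$, that $E(f) \ge 0$ whenever $f \ge 0$, $|E(f)| \le \|f\|$ for every $f \in C(X)$, and $E(f_1) = 1$. Suppose $\{h_n\}$ is a sequence in $C(X)$ and $h_n(x) \downarrow 0$ for every $x \in X$. Then for each $\varepsilon > 0$, the set $K_n(\varepsilon) = \{x \mid h_n(x) \ge \varepsilon\}$ is compact, $K_1(\varepsilon) \supset K_2(\varepsilon) \supset \cdots$, and $\bigcap_{n=1}^{\infty} K_n(\varepsilon) = \varnothing$. Therefore, for $n$ sufficiently large, $K_n(\varepsilon) = \varnothing$, i.e., $\|h_n\| < \varepsilon$; since $\varepsilon$ is arbitrary, $\|h_n\| \downarrow 0$. Consequently, $E(h_n) \downarrow 0$. This shows that the functional $E$ is a Daniell integral, and by a classical theorem (see, e.g., Royden [R5, p. 299, Proposition 21]) there exists a unique probability measure $p$ on $\sigma\left[\bigcup_{f \in C(X)} f^{-1}(\mathcal{B}_R)\right]$ which satisfies $E(f) = \int f\,dp$ for every $f \in C(X)$. Proposition 7.10 implies $p \in P(X)$. We have
$$\varphi(p) = \left(\int f_1\,dp, \int f_2\,dp, \ldots\right) = \left(E(f_1), E(f_2), \ldots\right) = (a_1, a_2, \ldots),$$
so $\varphi(p_n) \to \varphi(p)$. This proves $\varphi[P(X)]$ is closed. Q.E.D.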
In order to show that topological completeness and separability of $X$ imply the same properties for $P(X)$, we need the following lemma. Lemma 7.10 Let $X$ and $Y$ be separable metrizable spaces and $\varphi: X \to Y$ a homeomorphism. The mapping $\psi: P(X) \to P(Y)$ defined by
$$\psi(p)(B) = p[\varphi^{-1}(B)] \qquad \forall B \in \mathcal{B}_Y$$
is a homeomorphism. Proof Suppose $p_1, p_2 \in P(X)$ and $p_1 \ne p_2$. Since $p_1$ and $p_2$ are regular, there is an open set $G \subset X$ for which $p_1(G) \ne p_2(G)$. The image $\varphi(G)$ is relatively open in $\varphi(X)$, so $\varphi(G) = \varphi(X) \cap B$, where $B$ is open in $Y$. It is clear that
$$\psi(p_1)(B) = p_1(G) \ne p_2(G) = \psi(p_2)(B),$$
so $\psi$ is one-to-one. Let $\{p_n\}$ be a sequence in $P(X)$ and $p \in P(X)$. If $p_n \to p$, then since $\varphi^{-1}(H)$ is open in $X$ for every open set $H \subset Y$, Proposition 7.21 implies
$$\liminf_{n \to \infty}\psi(p_n)(H) = \liminf_{n \to \infty} p_n[\varphi^{-1}(H)] \ge p[\varphi^{-1}(H)] = \psi(p)(H),$$
so $\psi(p_n) \to \psi(p)$ and $\psi$ is continuous. If we are given $\{p_n\}$ and $p$ such that $\psi(p_n) \to \psi(p)$, a reversal of this argument shows that $p_n \to p$ and $\psi^{-1}$ is continuous. Q.E.D. Proposition 7.23 If $X$ is a topologically complete separable space, then $P(X)$ is topologically complete and separable.
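Proof By Urysohn's theorem (Proposition 7.2) there is a homeomorphism $\varphi: X \to \mathcal{H}$, and the mapping $\psi$ obtained by replacing $Y$ by $\mathcal{H}$ in Lemma 7.10 is a homeomorphism from $P(X)$ to $P(\mathcal{H})$. Alexandroff's theorem (Proposition 7.3) implies $\varphi(X)$ is a $G_\delta$-subset of $\mathcal{H}$, and we see that
$$\psi[P(X)] = \{p \in P(\mathcal{H}) \mid p[\varphi(X)] = 1\}. \tag{31}$$
We will show $\psi[P(X)]$ is a $G_\delta$-subset of the compact space $P(\mathcal{H})$ (Proposition 7.22) and use Alexandroff's theorem again to conclude that $P(X)$ is topologically complete. Since $\varphi(X)$ is a $G_\delta$-subset of $\mathcal{H}$, we can find open sets $G_1 \supset G_2 \supset \cdots$ such that $\varphi(X) = \bigcap_{n=1}^{\infty} G_n$. It is clear from (31) that
$$\psi[P(X)] = \bigcap_{n=1}^{\infty}\bigcap_{k=1}^{\infty}\{p \in P(\mathcal{H}) \mid p(\mathcal{H} - G_n) < 1/k\}.$$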
But for any closed set $F$ and real number $c$, the set $\{p \in P(\mathcal{H}) \mid p(F) \ge c\}$ is closed by Proposition 7.21(d), and $\{p \in P(\mathcal{H}) \mid p(\mathcal{H} - G_n) < 1/k\}$ is the complement of such a set. Q.E.D. We turn now to characterizing the σ-algebra $\mathcal{B}_{P(X)}$ when $X$ is metrizable and separable. From Lemma 7.6, we have that the mapping $\theta_f: P(X) \to R$ given by
$$\theta_f(p) = \int f\,dp$$
is continuous for every $f \in C(X)$. One can easily verify from Proposition 7.21 that the mapping $\theta_B: P(X) \to [0,1]$ defined by†
$$\theta_B(p) = p(B)$$
is Borel-measurable when $B$ is a closed subset of $X$. (Indeed, in the final stage of the proof of Proposition 7.23, we used the fact that when $B$ is closed the upper level sets $\{p \in P(X) \mid \theta_B(p) \ge c\}$ are closed.) Likewise, when $B$ is open, $\theta_B$ is Borel-measurable. It is natural to ask if $\theta_B$ is also Borel-measurable when $B$ is an arbitrary Borel set. The answer to this is yes, and in fact, $\mathcal{B}_{P(X)}$ is the smallest σ-algebra with respect to which $\theta_B$ is measurable for every $B \in \mathcal{B}_X$. A useful aid in proving this and several subsequent results is the concept of a Dynkin system.
† The use of the symbol $\theta_B$ here is a slight abuse of notation. In keeping with the definition of $\theta_f$, the technically correct symbol would be $\theta_{\chi_B}$.
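Definition 7.11 Let $X$ be a set and $\mathcal{D}$ a class of subsets of $X$. We say $\mathcal{D}$ is a Dynkin system if the following conditions hold:
(a) $X \in \mathcal{D}$.
(b) If $A, B \in \mathcal{D}$ and $B \subset A$, then $A - B \in \mathcal{D}$.
(c) If $A_1, A_2, \ldots \in \mathcal{D}$ and $A_1 \subset A_2 \subset \cdots$, then $\bigcup_{k=1}^{\infty} A_k \in \mathcal{D}$.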
Proposition 7.24 (Dynkin system theorem) Let $\mathcal{P}$ be a class of subsets of a set $X$, and assume $\mathcal{P}$ is closed under finite intersections. If $\mathcal{D}$ is a Dynkin system containing $\mathcal{P}$, then $\mathcal{D}$ also contains $\sigma(\mathcal{P})$.
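Proof This is a standard result in measure theory. See, for example, Ash [A1, p. 169]. Q.E.D. Proposition 7.25 Let $X$ be a separable metrizable space and $\mathcal{E}$ a collection of subsets of $X$ which generates $\mathcal{B}_X$ and is closed under finite intersections. Then $\mathcal{B}_{P(X)}$ is the smallest σ-algebra with respect to which all functions of the form
$$\theta_E(p) = p(E), \qquad E \in \mathcal{E},$$
are measurable from $P(X)$ to $[0,1]$, i.e.,
$$\mathcal{B}_{P(X)} = \sigma\left[\bigcup_{E \in \mathcal{E}} \theta_E^{-1}(\mathcal{B}_{[0,1]})\right].$$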
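Proof Let $\mathcal{F}$ be the smallest σ-algebra with respect to which $\theta_E$ is measurable for every $E \in \mathcal{E}$. To show $\mathcal{F} \subset \mathcal{B}_{P(X)}$, we show that $\theta_B$ is $\mathcal{B}_{P(X)}$-measurable for every $B \in \mathcal{B}_X$. Let $\mathcal{D} = \{B \in \mathcal{B}_X \mid \theta_B \text{ is } \mathcal{B}_{P(X)}\text{-measurable}\}$. It is easily verified that $\mathcal{D}$ is a Dynkin system. We have already seen that $\mathcal{D}$ contains every closed set, so the Dynkin system theorem (Proposition 7.24) implies $\mathcal{D} = \mathcal{B}_X$. It remains to show that $\mathcal{B}_{P(X)} \subset \mathcal{F}$. Let $\mathcal{D}' = \{B \in \mathcal{B}_X \mid \theta_B \text{ is } \mathcal{F}\text{-measurable}\}$. As before, $\mathcal{D}'$ is a Dynkin system, and since $\mathcal{E} \subset \mathcal{D}'$, we have $\mathcal{D}' = \mathcal{B}_X$. Thus the function $\theta_f(p) = \int f\,dp$ is $\mathcal{F}$-measurable when $f$ is the indicator of a Borel set. Therefore $\theta_f$ is $\mathcal{F}$-measurable when $f$ is a Borel-measurable simple function. If $f \in C(X)$, then there is a sequence of simple functions $f_n$ which are uniformly bounded below such that $f_n \uparrow f$. The monotone convergence theorem implies $\theta_{f_n} \uparrow \theta_f$, so $\theta_f$ is $\mathcal{F}$-measurable. It follows that for $\varepsilon > 0$, $p \in P(X)$, and $f \in C(X)$, the subbasic open set
$$V_\varepsilon(p; f) = \left\{q \in P(X) \;\Big|\; \left|\int f\,dq - \int f\,dp\right| < \varepsilon\right\}$$
is $\mathcal{F}$-measurable. It follows that $\mathcal{B}_{P(X)} = \mathcal{F}$ (see the remark following Definition 7.6). Q.E.D.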
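The proofs of Propositions 7.25 and 7.26 follow a common pattern: verify a property on a class closed under finite intersections, observe that the sets with the property form a Dynkin system, and invoke Proposition 7.24. On a finite set, both the smallest Dynkin system and the smallest σ-algebra containing a given class can be computed by brute force, which makes the role of the intersection hypothesis concrete. The sketch below is our own illustration (the function names are ours). On a finite set, condition (c) of Definition 7.11 holds automatically, because an increasing sequence of subsets eventually stabilizes, so it suffices to add $X$ and close under proper differences.

```python
def dynkin_closure(X, gens):
    """Smallest Dynkin system on the finite set X containing the sets in gens.

    Condition (c) of Definition 7.11 is automatic on a finite set, so we only
    add X and close under differences A - B with B a subset of A.
    """
    X = frozenset(X)
    D = {X} | {frozenset(g) for g in gens}
    while True:
        new = {A - B for A in D for B in D if B <= A} - D
        if not new:
            return D
        D |= new


def sigma_closure(X, gens):
    """Smallest sigma-algebra on the finite set X containing the sets in gens."""
    X = frozenset(X)
    S = {frozenset(), X} | {frozenset(g) for g in gens}
    while True:
        new = ({X - A for A in S} | {A | B for A in S for B in S}) - S
        if not new:
            return S
        S |= new


X = {1, 2, 3, 4}
pi_system = [{1, 2}, {2, 3}, {2}]  # closed under finite intersections
not_pi = [{1, 2}, {2, 3}]          # {1, 2} & {2, 3} = {2} is missing

print(len(dynkin_closure(X, pi_system)), len(sigma_closure(X, pi_system)))  # 16 16
print(len(dynkin_closure(X, not_pi)), len(sigma_closure(X, not_pi)))        # 6 16
```

For the class closed under intersections, the two closures coincide, as Proposition 7.24 predicts. Dropping the single intersection $\{2\}$ leaves a Dynkin system of only six sets, even though the generated σ-algebra is still the full power set.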
Corollary 7.25.1 If $X$ is a Borel space, then $P(X)$ is a Borel space.
Proof Let $\varphi$ be a homeomorphism mapping $X$ onto a Borel subset of a topologically complete separable space $Y$. Then, by Lemma 7.10, $P(X)$ is homeomorphic to the Borel set $\{p \in P(Y) \mid p[\varphi(X)] = 1\}$. Since $P(Y)$ is topologically complete and separable (Proposition 7.23), the result follows. Q.E.D.
7.4.3 Stochastic Kernels
We now consider probability measures on a separable metrizable space parameterized by the elements of another separable metrizable space. Definition 7.12 Let $X$ and $Y$ be separable metrizable spaces. A stochastic kernel $q(dy|x)$ on $Y$ given $X$ is a collection of probability measures in $P(Y)$ parameterized by $x \in X$. If $\mathcal{F}$ is a σ-algebra on $X$ and $\gamma^{-1}[\mathcal{B}_{P(Y)}] \subset \mathcal{F}$, where $\gamma: X \to P(Y)$ is defined by
$$\gamma(x) = q(dy|x), \tag{32}$$
then $q(dy|x)$ is said to be $\mathcal{F}$-measurable. If $\gamma$ is continuous, $q(dy|x)$ is said to be continuous.
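Proposition 7.26 Let $X$ and $Y$ be Borel spaces, $\mathcal{E}$ a collection of subsets of $Y$ which generates $\mathcal{B}_Y$ and is closed under finite intersections, and $q(dy|x)$ a stochastic kernel on $Y$ given $X$. Then $q(dy|x)$ is Borel-measurable if and only if the mapping $\lambda_E: X \to [0,1]$ defined by
$$\lambda_E(x) = q(E|x)$$
is Borel-measurable for every $E \in \mathcal{E}$. Proof Let $\gamma: X \to P(Y)$ be defined by $\gamma(x) = q(dy|x)$. Then for $E \in \mathcal{E}$, we have $\lambda_E = \theta_E \circ \gamma$. If $q(dy|x)$ is Borel-measurable (i.e., $\gamma$ is Borel-measurable), then Proposition 7.25 implies $\lambda_E$ is Borel-measurable for every $E \in \mathcal{E}$. Conversely, if $\lambda_E$ is Borel-measurable for every $E \in \mathcal{E}$, then $\gamma^{-1}[\theta_E^{-1}(\mathcal{B}_{[0,1]})] = \lambda_E^{-1}(\mathcal{B}_{[0,1]}) \subset \mathcal{B}_X$ for every $E \in \mathcal{E}$, and Proposition 7.25 implies
$$\gamma^{-1}[\mathcal{B}_{P(Y)}] = \gamma^{-1}\left[\sigma\left(\bigcup_{E \in \mathcal{E}} \theta_E^{-1}(\mathcal{B}_{[0,1]})\right)\right] = \sigma\left[\bigcup_{E \in \mathcal{E}} \lambda_E^{-1}(\mathcal{B}_{[0,1]})\right] \subset \mathcal{B}_X,$$
so $q(dy|x)$ is Borel-measurable.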
Q.E.D.
Corollary 7.26.1 Let $X$ and $Y$ be Borel spaces and $q(dy|x)$ a Borel-measurable stochastic kernel on $Y$ given $X$. If $B \in \mathcal{B}_{XY}$, then the mapping $\lambda_B: X \to [0,1]$ defined by
$$\lambda_B(x) = q(B_x|x),$$
where $B_x = \{y \in Y \mid (x,y) \in B\}$, is Borel-measurable.
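Proof If $B \in \mathcal{B}_{XY}$ and $x \in X$, then $B_x \subset Y$ is homeomorphic to $B \cap [\{x\}Y] \in \mathcal{B}_{XY}$. It follows that $B_x \in \mathcal{B}_Y$, so $q(B_x|x)$ is defined. It is easy to show that the collection $\mathcal{D} = \{B \in \mathcal{B}_{XY} \mid \lambda_B \text{ is Borel-measurable}\}$ is a Dynkin system. Proposition 7.26 implies that $\mathcal{D}$ contains the measurable rectangles, so $\mathcal{D} = \mathcal{B}_{XY}$. Q.E.D.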
We now show that one can decompose a probability measure on a product of Borel spaces into a marginal and a Borel-measurable stochastic kernel. This decomposition is possible even when a measurable dependence on a parameter is admitted, and, as we shall see in Chapter 10, this result is essential to the filtering algorithm for imperfect state information dynamic programming models. As a notational convenience, we use $\underline{X}$ to denote a typical Borel subset of a Borel space $X$. Proposition 7.27 Let $(X, \mathcal{F})$ be a measurable space, let $Y$ and $Z$ be Borel spaces, and let $q(d(y,z)|x)$ be a stochastic kernel on $YZ$ given $X$. Assume that $q(B|x)$ is $\mathcal{F}$-measurable in $x$ for every $B \in \mathcal{B}_{YZ}$. Then there exist a stochastic kernel $r(dz|x,y)$ on $Z$ given $XY$ and a stochastic kernel $s(dy|x)$ on $Y$ given $X$ such that $r(\underline{Z}|x,y)$ is $\mathcal{F}\mathcal{B}_Y$-measurable in $(x,y)$ for every $\underline{Z} \in \mathcal{B}_Z$, $s(\underline{Y}|x)$ is $\mathcal{F}$-measurable in $x$ for every $\underline{Y} \in \mathcal{B}_Y$, and
$$q(\underline{Y}\underline{Z}|x) = \int_{\underline{Y}} r(\underline{Z}|x,y)\,s(dy|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y,\ \underline{Z} \in \mathcal{B}_Z,\ x \in X. \tag{34}$$
Proof We prove this proposition under the assumption that $Y$ and $Z$ are uncountable. If either $Y$ or $Z$ or both are countable, slight modifications (actually simplifications) of this proof are necessary. From Corollary 7.16.1, we may assume without loss of generality that $Y = Z = (0,1]$. Let $s(dy|x)$ be the marginal of $q(d(y,z)|x)$ on $Y$, i.e., $s(\underline{Y}|x) = q(\underline{Y}Z|x)$ for every $\underline{Y} \in \mathcal{B}_Y$. For each positive integer $n$, define subsets of $Y$
$$M(j,n) = \left((j-1)/2^n,\ j/2^n\right], \qquad j = 1, \ldots, 2^n.$$
Then each $M(j,n+1)$ is a subset of some $M(k,n)$, and the collection $\{M(j,n) \mid n = 1, 2, \ldots;\ j = 1, \ldots, 2^n\}$ generates $\mathcal{B}_Y$. For $z \in Q \cap Z$, define $q(dy(0,z]|x)$ to be the measure on $Y$ whose value at $\underline{Y} \in \mathcal{B}_Y$ is $q(\underline{Y}(0,z]|x)$. Then $q(dy(0,z]|x)$ is absolutely continuous with respect to $s(dy|x)$ for every $z \in Q \cap Z$ and $x \in X$. Define, for $z \in Q \cap Z$,
$$G_n(z|x,y) = \begin{cases} q[M(j,n)(0,z]\,|\,x] \,/\, s[M(j,n)\,|\,x] & \text{if } y \in M(j,n) \text{ and } s[M(j,n)|x] > 0, \\ 0 & \text{if } y \in M(j,n) \text{ and } s[M(j,n)|x] = 0. \end{cases}$$
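The functions $G_n(z|x,y)$ can be regarded as generalized difference quotients of $q(dy(0,z]|x)$ relative to $s(dy|x)$. For each $z$, the set
$$B(z) = \left\{(x,y) \in XY \;\Big|\; \lim_{n \to \infty} G_n(z|x,y) \text{ exists in } R\right\}$$
is $\mathcal{F}\mathcal{B}_Y$-measurable. Theorem 2.5, page 612 of Doob [D4] states that
$$s[B(z)_x|x] = 1 \qquad \forall x \in X,$$
and if we define
$$G(z|x,y) = \begin{cases} \lim_{n \to \infty} G_n(z|x,y) & \text{if } (x,y) \in B(z), \\ 0 & \text{otherwise}, \end{cases}$$
then
$$\int_{\underline{Y}} G(z|x,y)\,s(dy|x) = q(\underline{Y}(0,z]|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y,\ x \in X. \tag{35}$$
It is clear that for any $z$, $G(z|x,y)$ is $\mathcal{F}\mathcal{B}_Y$-measurable in $(x,y)$.† A comparison of (34) and (35) suggests that we should try to extend $G(z|x,y)$ in such a way that for fixed $(x,y)$, $G(\cdot|x,y)$ is a distribution function.
† For the reader familiar with martingales, we give the proof of the theorem just referenced. Fix $x$ and $z$ and observe that for $m \ge n$,
$$\int_{M(j,n)} G_m(z|x,y)\,s(dy|x) = q[M(j,n)(0,z]|x] = \int_{M(j,n)} G_n(z|x,y)\,s(dy|x), \qquad j = 1, \ldots, 2^n. \tag{*}$$
Since $G_n(z|x,y)$, regarded as a function of $y$, is measurable with respect to the σ-algebra generated by $\{M(j,n) \mid j = 1, \ldots, 2^n\}$, and these σ-algebras increase with $n$, we conclude that $G_n(z|x,y)$, $n = 1, 2, \ldots$, is a martingale on $Y$ under the measure $s(dy|x)$. Each $G_n(z|x,y)$ is nonnegative and bounded above by 1, so by the martingale convergence theorem (see, e.g., Ash [A1, p. 292]) $G_n(z|x,y)$ converges for $s(dy|x)$ almost every $y$. Thus $s[B(z)_x|x] = 1$ and the definition of $G(z|x,y)$ given above is possible. Let $m \to \infty$ in (*) to see that (35) holds whenever $\underline{Y} = M(j,n)$ for some $j$ and $n$. The collection of sets $\underline{Y}$ for which (35) holds is a Dynkin system, and it follows from Proposition 7.24 that (35) holds for every $\underline{Y} \in \mathcal{B}_Y$.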
Toward this end, for each $z_0 \in Q \cap Z$, we define
$$C(z_0) = \{(x,y) \in XY \mid \exists\, z \in Q \cap Z \text{ with } z \le z_0 \text{ and } G(z|x,y) > G(z_0|x,y)\},$$
$$D(z_0) = \{(x,y) \in XY \mid G(\cdot|x,y) \text{ is not right-continuous at } z_0\},$$
$$E = \{(x,y) \in XY \mid G(z|x,y) \text{ does not converge to zero as } z \downarrow 0\},$$
and
$$F = \{(x,y) \in XY \mid G(1|x,y) \ne 1\}.$$
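For fixed $x \in X$ and $z_0 \in Q \cap Z$, (35) implies that whenever $z \in Q \cap Z$, $z \le z_0$, then
$$\int_{\underline{Y}} G(z|x,y)\,s(dy|x) = q(\underline{Y}(0,z]|x) \le q(\underline{Y}(0,z_0]|x) = \int_{\underline{Y}} G(z_0|x,y)\,s(dy|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y.$$
Therefore $G(z|x,y) \le G(z_0|x,y)$ for $s(dy|x)$ almost all $y$, so $s[C(z_0)_x|x] = 0$ and
$$s(C_x|x) = 0, \qquad \text{where } C = \bigcup_{z_0 \in Q \cap Z} C(z_0). \tag{36}$$
Equation (36) implies that $G(z|x,y)$ is nondecreasing in $z$ for $s(dy|x)$ almost all $y$. This fact and (35) imply that if $z \downarrow z_0$ ($z \in Q \cap Z$), then
$$\int_{\underline{Y}} \lim_{z \downarrow z_0} G(z|x,y)\,s(dy|x) = \lim_{z \downarrow z_0} q(\underline{Y}(0,z]|x) = q(\underline{Y}(0,z_0]|x) = \int_{\underline{Y}} G(z_0|x,y)\,s(dy|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y$$
and
$$\lim_{z \downarrow z_0} G(z|x,y) \ge G(z_0|x,y)$$
for $s(dy|x)$ almost all $y$. Therefore $s[D(z_0)_x|x] = 0$ and
$$s(D_x|x) = 0, \qquad \text{where } D = \bigcup_{z_0 \in Q \cap Z} D(z_0). \tag{37}$$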
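Equation (35) also implies that as $z \downarrow 0$ ($z \in Q \cap Z$),
$$\int_Y G(z|x,y)\,s(dy|x) = q(Y(0,z]|x) \downarrow 0.$$
Since $G(z|x,y)$ is nondecreasing in $z$ for $s(dy|x)$ almost all $y$, we must have $G(z|x,y) \downarrow 0$ for $s(dy|x)$ almost all $y$, i.e.,
$$s(E_x|x) = 0. \tag{38}$$
Substituting $z = 1$ in (35), we see that
$$\int_{\underline{Y}} G(1|x,y)\,s(dy|x) = q(\underline{Y}Z|x) = s(\underline{Y}|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y,$$
so $G(1|x,y) = 1$ for $s(dy|x)$ almost all $y$, i.e.,
$$s(F_x|x) = 0. \tag{39}$$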
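For $z \in Z$, let $\{z_n\}$ be a sequence in $Q \cap Z$ such that $z_n \downarrow z$ and define, for every $x \in X$, $y \in Y$,
$$F(z|x,y) = \begin{cases} \lim_{n \to \infty} G(z_n|x,y) & \text{if } (x,y) \in XY - (C \cup D \cup E \cup F), \\ z & \text{otherwise}. \end{cases} \tag{40}$$
For $(x,y) \in XY - (C \cup D \cup E \cup F)$, $G(z|x,y)$ is a nondecreasing right-continuous function of $z \in Q \cap Z$, so $F(z|x,y)$ is well defined, nondecreasing, and right-continuous. It also satisfies, for every $(x,y) \in XY$,
$$F(1|x,y) = 1$$
and $\lim_{z \downarrow 0} F(z|x,y) = 0$. It is a standard result of probability theory (Ash [A1, p. 24]) that for each $(x,y)$ there is a probability measure $r(dz|x,y)$ on $Z$ such that
$$r((0,z]|x,y) = F(z|x,y) \qquad \forall z \in Z.$$
The collection of subsets $\underline{Z} \in \mathcal{B}_Z$ for which $r(\underline{Z}|x,y)$ is $\mathcal{F}\mathcal{B}_Y$-measurable in $(x,y)$ forms a Dynkin system which contains $\{(0,z] \mid z \in Z\}$, so $r(\underline{Z}|x,y)$ is $\mathcal{F}\mathcal{B}_Y$-measurable for every $\underline{Z} \in \mathcal{B}_Z$. Relations (35)-(40) and the monotone convergence theorem imply
$$\int_{\underline{Y}} r((0,z]|x,y)\,s(dy|x) = \int_{\underline{Y}} F(z|x,y)\,s(dy|x) = q(\underline{Y}(0,z]|x) \qquad \forall \underline{Y} \in \mathcal{B}_Y,\ z \in Z,\ x \in X.$$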
The collection of subsets $\underline{Z} \in \mathcal{B}_Z$ for which (34) holds forms a Dynkin system which contains $\{(0,z] \mid z \in Z\}$, so (34) holds for every $\underline{Z} \in \mathcal{B}_Z$. Q.E.D. If $\mathcal{F} = \mathcal{B}_X$, an application of Proposition 7.26 reduces Proposition 7.27 to the following form. Corollary 7.27.1 Let $X$, $Y$, and $Z$ be Borel spaces and let $q(d(y,z)|x)$ be a Borel-measurable stochastic kernel on $YZ$ given $X$. Then there exist Borel-measurable stochastic kernels $r(dz|x,y)$ and $s(dy|x)$ on $Z$ given $XY$ and on $Y$ given $X$, respectively, such that (34) holds. If there is no dependence on the parameter $x$ in Corollary 7.27.1, we have the following well-known result for Borel spaces. Corollary 7.27.2 Let $Y$ and $Z$ be Borel spaces and $q \in P(YZ)$. Then there exists a Borel-measurable stochastic kernel $r(dz|y)$ on $Z$ given $Y$ such that
$$q(\underline{Y}\underline{Z}) = \int_{\underline{Y}} r(\underline{Z}|y)\,s(dy) \qquad \forall \underline{Y} \in \mathcal{B}_Y,\ \underline{Z} \in \mathcal{B}_Z,$$
where $s$ is the marginal of $q$ on $Y$.
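The difference-quotient construction in the proof of Proposition 7.27 can be tried out on a simple example with no parameter $x$. The sketch below is ours and assumes, purely for illustration, that $Y = Z = (0,1]$, that $y$ is uniformly distributed, and that $z$ given $y$ is uniformly distributed on $(0, y]$, so the true conditional distribution function is $\min\{1, z/y\}$. It evaluates $G_n(z|y) = q[M(j,n)(0,z]] / s[M(j,n)]$ in closed form, checks the averaging identity (*) on one dyadic interval, and shows $G_n(z|y)$ approaching $\min\{1, z/y\}$.

```python
import math


def q_rect(a, b, z):
    """q((a, b] x (0, z]) when Y ~ Uniform(0, 1] and Z | Y = y ~ Uniform(0, y].

    This equals the integral over y in (a, b] of min(1, z / y).
    """
    if z >= b:
        return b - a
    if z <= a:
        return z * math.log(b / a)
    return (z - a) + z * math.log(b / z)  # split the integral at y = z


def G(n, z, y):
    """Dyadic difference quotient G_n(z | y) = q[M(j, n)(0, z]] / s[M(j, n)].

    y in (0, 1] lies in M(j, n) = ((j - 1)/2^n, j/2^n]. Here s is uniform, so
    s[M(j, n)] = 2^(-n) > 0 and the 'otherwise 0' case never arises.
    """
    j = math.ceil(y * 2**n)
    a, b = (j - 1) / 2**n, j / 2**n
    return q_rect(a, b, z) / (b - a)


def true_cdf(z, y):
    """Conditional distribution function of Z given Y = y in this example."""
    return min(1.0, z / y)


for n in (2, 6, 12, 20):
    print(n, G(n, 0.35, 0.7), true_cdf(0.35, 0.7))
```

Because $s$ is uniform in this example, the two halves of a dyadic interval have equal mass, so identity (*) reduces to the statement that $G_n$ on an interval is the average of $G_{n+1}$ on its two halves.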
7.4.4 Integration
As in Section 2.1, we adopt the convention
$$\infty - \infty = -\infty + \infty = \infty. \tag{42}$$
With this convention, for $a, b, c \in R^*$ the associative law
$$(a + b) + c = a + (b + c)$$
still holds, since if either $a$, $b$, or $c$ is $\infty$, then both sides are $\infty$, while if neither $a$, $b$, nor $c$ is $\infty$, the usual arithmetic involving finite numbers and $-\infty$ applies. Also, if $a, b, c \in R^*$ and $a + b = c$, then $a = c - b$, provided $b \ne \pm\infty$. It is always true, however, that if $a + b \le c$, then $a \le c - b$. We use convention (42) to extend the definition of the integral. If $X$ is a metrizable space, $p \in P(X)$, and $f: X \to R^*$ is Borel-measurable, we define
$$\int f\,dp = \int f^+\,dp - \int f^-\,dp. \tag{43}$$
Note that if $\int f^+\,dp < \infty$ or if $\int f^-\,dp < \infty$, (43) reduces to the classical definition of $\int f\,dp$. We collect some of the properties of integration in this extended sense in the following lemma.
Lemma 7.11 Let $X$ be a metrizable space and let $p \in P(X)$ be given. Let $f$, $g$, and $f_n$, $n = 1, 2, \ldots$, be Borel-measurable, extended real-valued functions on $X$.
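(a) Using (42) to define $f + g$, we have
$$\int (f + g)\,dp \le \int f\,dp + \int g\,dp. \tag{44}$$
(b) If either (b1) $\int f^+\,dp < \infty$ and $\int g^+\,dp < \infty$, or (b2) $\int f^-\,dp < \infty$ and $\int g^-\,dp < \infty$, or (b3) $\int g^+\,dp < \infty$ and $\int g^-\,dp < \infty$, then
$$\int (f + g)\,dp = \int f\,dp + \int g\,dp. \tag{45}$$
(c) If $0 < \alpha < \infty$, then $\int (\alpha f)\,dp = \alpha\int f\,dp$.
(d) If $f \le g$, then $\int f\,dp \le \int g\,dp$.
(e) If $f_n \uparrow f$ and $\int f_1\,dp > -\infty$, then $\int f_n\,dp \uparrow \int f\,dp$.
(f) If $f_n \downarrow f$ and $\int f_1\,dp < \infty$, then $\int f_n\,dp \downarrow \int f\,dp$.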
Proof We prove (b) first and then return to (a). Under assumption (b1), we have $f(x) < \infty$ and $g(x) < \infty$ for $p$ almost every $x$, so the sum $f(x) + g(x)$ can be defined without resort to the convention (42) for $p$ almost every $x$. Furthermore, $\int f^+\,dp < \infty$ and $\int g^+\,dp < \infty$, so (45) follows from the additivity theorem of classical integration theory (Ash [A1, p. 45]). The proof of (45) under assumption (b2) is similar. Under assumption (b3), either $\int f^+\,dp = \infty$, in which case both sides of (45) are $\infty$, or else $\int f^+\,dp < \infty$, in which case assumption (b1) holds. Returning to (a), we note that if assumption (b1) holds, then (45) implies (44). If assumption (b1) fails to hold, then
$$\int f\,dp + \int g\,dp = \infty,$$
so (44) is still valid. Statements (c) and (d) are simple consequences of (42) and (43). Statement (e) follows from the extended monotone convergence theorem (Ash [A1, p. 47]) if $\int f_1^-\,dp < \infty$. If $\int f_1^-\,dp = \infty$, then $\int f_1\,dp > -\infty$ implies $\int f_1^+\,dp = \infty$, so $\int f_1\,dp = \infty$, and the conclusion follows from (d). Statement (f) follows from the extended monotone convergence theorem. Q.E.D.
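Convention (42) and definition (43) are easy to mechanize for measures with finitely many atoms. The sketch below is our own illustration, not the book's. It uses IEEE floating-point infinities to stand for $\pm\infty$ and represents a probability measure as a list of (point, weight) pairs. It checks the associative law, the rule "if $a + b \le c$ then $a \le c - b$", and a case in which (43) yields $+\infty$ because both $\int f^+\,dp$ and $\int f^-\,dp$ are infinite.

```python
import math

INF = math.inf


def ext_add(a, b):
    """Addition on R* under convention (42): inf - inf = -inf + inf = inf."""
    if a == INF or b == INF:
        return INF
    return a + b  # no +inf involved, so the ordinary float sum is well defined


def ext_integral(f, atoms):
    """Extended integral (43) of f against a discrete probability measure
    given as [(point, weight), ...]: int f+ dp - int f- dp, combined with ext_add."""
    pos = sum(w * max(f(x), 0.0) for x, w in atoms if w > 0)
    neg = sum(w * max(-f(x), 0.0) for x, w in atoms if w > 0)
    return ext_add(pos, -neg)


atoms = [("a", 0.5), ("b", 0.5)]
print(ext_integral(lambda x: {"a": INF, "b": -INF}[x], atoms))  # inf
print(ext_integral(lambda x: {"a": 1.0, "b": -3.0}[x], atoms))  # -1.0
```

Atoms with zero weight are skipped, which avoids the undefined float product $0 \cdot \infty$ and matches the measure-theoretic convention $0 \cdot \infty = 0$.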
We saw in Corollary 7.27.2 that a probability measure on a product of Borel spaces can be decomposed into a stochastic kernel and a marginal. This process can be reversed, that is, given a probability measure and one or more Borel-measurable stochastic kernels on Borel spaces, a unique probability measure on the product space can be constructed. Proposition 7.28 Let $X_1, X_2, \ldots$ be a sequence of Borel spaces, $Y_n = X_1X_2\cdots X_n$ and $Y = X_1X_2\cdots$. Let $p \in P(X_1)$ be given, and, for $n = 1, 2, \ldots$, let $q_n(dx_{n+1}|y_n)$ be a Borel-measurable stochastic kernel on $X_{n+1}$ given $Y_n$.
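Then for $n = 2, 3, \ldots$, there exist unique probability measures $r_n \in P(Y_n)$ such that
$$r_n(\underline{X}_1\underline{X}_2\cdots\underline{X}_n) = \int_{\underline{X}_1}\int_{\underline{X}_2}\cdots\int_{\underline{X}_{n-1}} q_{n-1}(\underline{X}_n|x_1,x_2,\ldots,x_{n-1})\,q_{n-2}(dx_{n-1}|x_1,x_2,\ldots,x_{n-2})\cdots q_1(dx_2|x_1)\,p(dx_1) \qquad \forall \underline{X}_1 \in \mathcal{B}_{X_1}, \ldots, \underline{X}_n \in \mathcal{B}_{X_n}. \tag{46}$$
If $f: Y_n \to R^*$ is Borel-measurable and either $\int f^+\,dr_n < \infty$ or $\int f^-\,dr_n < \infty$, then
$$\int_{Y_n} f\,dr_n = \int_{X_1}\int_{X_2}\cdots\int_{X_n} f(x_1,\ldots,x_n)\,q_{n-1}(dx_n|x_1,\ldots,x_{n-1})\cdots q_1(dx_2|x_1)\,p(dx_1). \tag{47}$$
Furthermore, there exists a unique probability measure $r$ on $Y = X_1X_2\cdots$ such that for each $n$ the marginal of $r$ on $Y_n$ is $r_n$.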
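Proof The spaces $Y_n$, $n = 2, 3, \ldots$, and $Y$ are Borel by Proposition 7.13. If there exists $r_n \in P(Y_n)$ satisfying (46), it must be unique. To see this, suppose $r_n' \in P(Y_n)$ also satisfies (46). The collection $\mathcal{D} = \{B \in \mathcal{B}_{Y_n} \mid r_n(B) = r_n'(B)\}$ is a Dynkin system containing the measurable rectangles, so $\mathcal{D} = \mathcal{B}_{Y_n}$ and $r_n = r_n'$. We establish the existence of $r_n$ by induction, considering first the case $n = 2$. For $B \in \mathcal{B}_{Y_2}$, use Corollary 7.26.1 to define
$$r_2(B) = \int_{X_1} q_1(B_{x_1}|x_1)\,p(dx_1). \tag{48}$$
It is easily verified that $r_2 \in P(Y_2)$ and $r_2$ satisfies (46). If $f$ is the indicator of $B \in \mathcal{B}_{Y_2}$, then $\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)$ is Borel-measurable and, by (48),
$$\int_{Y_2} f\,dr_2 = \int_{X_1}\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1). \tag{49}$$
Linearity of the integral implies that (49) holds for Borel-measurable simple functions as well. If $f: Y_2 \to [0,\infty]$ is Borel-measurable, then there exists an increasing sequence of simple functions such that $f_n \uparrow f$. By the monotone convergence theorem,
$$\lim_{n \to \infty}\int_{X_2} f_n(x_1,x_2)\,q_1(dx_2|x_1) = \int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1) \qquad \forall x_1 \in X_1,$$
so $\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)$ is Borel-measurable and
$$\int_{X_1}\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1) = \lim_{n \to \infty}\int_{X_1}\int_{X_2} f_n(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1) = \lim_{n \to \infty}\int_{Y_2} f_n\,dr_2.$$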
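But $\int_{Y_2} f_n\,dr_2 \uparrow \int_{Y_2} f\,dr_2$, so (49) holds for any Borel-measurable nonnegative $f$. For a Borel-measurable $f: Y_2 \to R^*$ satisfying $\int f^+\,dr_2 < \infty$ or $\int f^-\,dr_2 < \infty$, we have
$$\int_{Y_2} f^+\,dr_2 = \int_{X_1}\int_{X_2} f^+(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1)$$
and
$$\int_{Y_2} f^-\,dr_2 = \int_{X_1}\int_{X_2} f^-(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1).$$
Assume for specificity that $\int_{Y_2} f^-\,dr_2 < \infty$. Then the functions
$$x_1 \mapsto \int_{X_2} f^+(x_1,x_2)\,q_1(dx_2|x_1) \qquad\text{and}\qquad x_1 \mapsto -\int_{X_2} f^-(x_1,x_2)\,q_1(dx_2|x_1)$$
satisfy condition (b2) of Lemma 7.11, so
$$\int_{Y_2} f\,dr_2 = \int_{X_1}\left[\int_{X_2} f^+(x_1,x_2)\,q_1(dx_2|x_1) - \int_{X_2} f^-(x_1,x_2)\,q_1(dx_2|x_1)\right]p(dx_1) = \int_{X_1}\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1),$$
where the last step is a direct result of the definition of $\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)$. Assume now that $r_k \in P(Y_k)$ exists for which (46) and (47) hold when $n = k$. For $B \in \mathcal{B}_{Y_{k+1}}$, let
$$r_{k+1}(B) = \int_{Y_k} q_k(B_{y_k}|y_k)\,dr_k. \tag{50}$$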
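Then $r_{k+1} \in P(Y_{k+1})$. If $B = \underline{X}_1\underline{X}_2\cdots\underline{X}_k\underline{X}_{k+1}$, where $\underline{X}_j \in \mathcal{B}_{X_j}$, then
$$r_{k+1}(B) = \int_{\underline{X}_1\cdots\underline{X}_k} q_k(\underline{X}_{k+1}|x_1,\ldots,x_k)\,dr_k = \int_{\underline{X}_1}\cdots\int_{\underline{X}_k} q_k(\underline{X}_{k+1}|x_1,\ldots,x_k)\,q_{k-1}(dx_k|x_1,\ldots,x_{k-1})\cdots q_1(dx_2|x_1)\,p(dx_1)$$
by (47) when $n = k$. This proves (46) for $n = k + 1$. Now use (50) to prove (47) when $n = k + 1$ and $f$ is an indicator function. As before, extend this to the case of $f: Y_{k+1} \to [0,\infty]$. If $f: Y_{k+1} \to R^*$ is Borel-measurable and either $\int f^+\,dr_{k+1} < \infty$ or $\int f^-\,dr_{k+1} < \infty$, then the validity of (47) for nonnegative functions and the induction hypothesis imply
$$\int_{Y_{k+1}} f^+\,dr_{k+1} = \int_{X_1}\cdots\int_{X_{k+1}} f^+(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)\cdots q_1(dx_2|x_1)\,p(dx_1) = \int_{X_1\cdots X_k}\int_{X_{k+1}} f^+(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)\,dr_k$$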
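and likewise
$$\int_{Y_{k+1}} f^-\,dr_{k+1} = \int_{X_1\cdots X_k}\int_{X_{k+1}} f^-(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)\,dr_k.$$
Assume for specificity that $\int_{Y_{k+1}} f^-\,dr_{k+1} < \infty$. Then the functions
$$(x_1,\ldots,x_k) \mapsto \int_{X_{k+1}} f^+(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k) \qquad\text{and}\qquad (x_1,\ldots,x_k) \mapsto -\int_{X_{k+1}} f^-(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)$$
satisfy condition (b2) of Lemma 7.11, so as before
$$\int_{Y_{k+1}} f\,dr_{k+1} = \int_{X_1\cdots X_k}\int_{X_{k+1}} f(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)\,dr_k. \tag{51}$$
Since
$$\int_{X_1\cdots X_k}\left[\int_{X_{k+1}} f(x_1,\ldots,x_{k+1})\,q_k(dx_{k+1}|x_1,\ldots,x_k)\right]^- dr_k \le \int_{Y_{k+1}} f^-\,dr_{k+1} < \infty,$$
we can apply the induction hypothesis to the right-hand side of (51) to conclude that (47) holds in the generality stated in the proposition. To establish the existence of a unique probability measure $r \in P(Y)$ whose marginal on $Y_n$ is $r_n$, $n = 2, 3, \ldots$, we note that the measures $r_n$ are consistent, i.e., if $m \ge n$, then the marginal of $r_m$ on $Y_n$ is $r_n$. If each $X_k$ is complete, the Kolmogorov extension theorem (see, e.g., Ash [A1, p. 191]) guarantees the existence of a unique $r \in P(Y)$ whose marginal on each $Y_n$ is $r_n$. If $X_k$ is not complete, it can be homeomorphically embedded as a Borel subset in a complete separable metric space $\bar{X}_k$. As in Proposition 7.13, each $Y_n$ is homeomorphic to a Borel subset of the complete separable metric space $\bar{Y}_n = \bar{X}_1\bar{X}_2\cdots\bar{X}_n$, and $Y$ is homeomorphic to a Borel subset of the complete separable metric space $\bar{Y} = \bar{X}_1\bar{X}_2\cdots$. Each $r_n \in P(Y_n)$ can be identified with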
$\bar{r}_n \in P(\bar{Y}_n)$ in the manner of Lemma 7.10, and, invoking the Kolmogorov extension theorem, we establish the existence of a unique $\bar{r} \in P(\bar{Y})$ whose marginal on each $\bar{Y}_n$ is $\bar{r}_n$. It is straightforward to show that $\bar{r}$ assigns probability one to the image of $Y$ in $\bar{Y}$, so $\bar{r}$ corresponds to some $r \in P(Y)$ whose marginal on each $Y_n$ is $r_n$. The uniqueness of $\bar{r}$ implies the uniqueness of $r$. Q.E.D. In the course of proving Proposition 7.28, we have also proved the following. Proposition 7.29 Let $X$ and $Y$ be Borel spaces and $q(dy|x)$ a Borel-measurable stochastic kernel on $Y$ given $X$. If $f: XY \to R^*$ is Borel-measurable, then the function $\lambda: X \to R^*$ defined by
$$\lambda(x) = \int_Y f(x,y)\,q(dy|x) \tag{52}$$
is Borel-measurable.
Corollary 7.29.1 Let $X$ be a Borel space and let $f: X \to R^*$ be Borel-measurable. Then the function $\theta_f: P(X) \to R^*$ defined by
$$\theta_f(p) = \int f\,dp$$
is Borel-measurable.
Proof Define a Borel-measurable stochastic kernel on $X$ given $P(X)$ by $q(dx|p) = p(dx)$. Define $\hat{f}: P(X)X \to R^*$ by $\hat{f}(p, x) = f(x)$. Then
$$\theta_f(p) = \int_X \hat{f}(p,x)\,q(dx|p)$$
is Borel-measurable by Proposition 7.29.
Q.E.D.
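When every $X_n$ is finite, the measures $r_n$ of Proposition 7.28 and the function $\lambda$ of (52) reduce to finite sums, which makes the construction easy to check. The sketch below is our own illustration with hypothetical two-point spaces $X_n = \{0, 1\}$. Kernels are represented as Python functions that return dictionaries of probabilities, and exact fractions are used so that the identities hold with equality. It builds $r_2$ and $r_3$ as in (46) and (50), checks that the marginal of $r_3$ on $Y_2$ is $r_2$ (the consistency used in the Kolmogorov extension step), and checks one level of the iterated integral formula (47).

```python
from fractions import Fraction


def extend(r, q):
    """Build r_{k+1} from r_k and a kernel q, as in (50) for finite spaces.

    r: dict mapping y_k = (x_1, ..., x_k) to its mass.
    q: function mapping y_k to a dict {x_{k+1}: probability}.
    """
    out = {}
    for y, w in r.items():
        for x, m in q(y).items():
            out[y + (x,)] = out.get(y + (x,), 0) + w * m
    return out


def marginal(r, k):
    """Marginal of r on the first k coordinates."""
    out = {}
    for y, w in r.items():
        out[y[:k]] = out.get(y[:k], 0) + w
    return out


def integrate_kernel(f, q, x):
    """lambda(x) = sum over y of f(x, y) q(y | x): the finite analogue of (52)."""
    return sum(f(x, y) * m for y, m in q(x).items())


# Hypothetical initial distribution p on X_1 = {0, 1}.
p = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def q1(y):
    """Hypothetical kernel q_1(dx_2 | x_1): keep x_1 with probability 3/4."""
    (x1,) = y
    return {x1: Fraction(3, 4), 1 - x1: Fraction(1, 4)}


def q2(y):
    """Hypothetical kernel q_2(dx_3 | x_1, x_2): x_3 = 1 with probability (x_1 + x_2 + 1)/4."""
    x1, x2 = y
    a = Fraction(x1 + x2 + 1, 4)
    return {1: a, 0: 1 - a}


r1 = {(x,): w for x, w in p.items()}
r2 = extend(r1, q1)
r3 = extend(r2, q2)
print(r2)
```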
If $f \in C(XY)$ and $q(dy|x)$ is continuous, then the mapping $\lambda$ of (52) is also continuous. We prove this with the aid of the following lemma. Lemma 7.12 Let $X$ and $Y$ be separable metrizable spaces. Then the mapping $\sigma: P(X)P(Y) \to P(XY)$ defined by
$$\sigma(p, q) = pq,$$
where $pq$ is the product of the measures $p$ and $q$, is continuous. Proof We use Urysohn's theorem (Proposition 7.2) to homeomorphically embed $X$ and $Y$ into the Hilbert cube $\mathcal{H}$, and, for simplicity of notation, we treat $X$ and $Y$ as subsets of $\mathcal{H}$. Let $d$ be a metric on $\mathcal{H}\mathcal{H}$ consistent with its topology. If $g \in U_d(XY)$, then Lemma 7.3 implies that $g$ can be extended to a function $\bar{g} \in C(\mathcal{H}\mathcal{H})$. The set of finite linear combinations of the form $\sum_{j=1}^{k} \bar{f}_j(x)\bar{h}_j(y)$, where $\bar{f}_j$ and $\bar{h}_j$ range over $C(\mathcal{H})$ and $k$ ranges over the
7.5
BOREL-MEASURABLE SELECTION
145
positive integers, is an algebra which separates points in $\mathcal{H}\mathcal{H}$, so given $\varepsilon > 0$, the Stone-Weierstrass theorem implies that such a linear combination can be found satisfying $\left\|\sum_{j=1}^{k} \bar{f}_j\bar{h}_j - \bar{g}\right\| < \varepsilon$. If $\{p_n\}$ is a sequence in $P(X)$ converging to $p \in P(X)$, $\{q_n\}$ a sequence in $P(Y)$ converging to $q \in P(Y)$, and $f_j$ and $h_j$ the restrictions of $\bar{f}_j$ and $\bar{h}_j$ to $X$ and $Y$, respectively, then
$$\limsup_{n \to \infty}\left|\int g\,d(p_nq_n) - \int g\,d(pq)\right| \le 2\varepsilon + \lim_{n \to \infty}\sum_{j=1}^{k}\left|\int f_j\,dp_n\int h_j\,dq_n - \int f_j\,dp\int h_j\,dq\right| = 2\varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, $\int g\,d(p_nq_n) \to \int g\,d(pq)$ for every $g \in U_d(XY)$. The continuity of $\sigma$ follows from the equivalence of (a) and (c) of Proposition 7.21. Q.E.D. Proposition 7.30 Let $X$ and $Y$ be separable metrizable spaces and let $q(dy|x)$ be a continuous stochastic kernel on $Y$ given $X$. If $f \in C(XY)$, then the function $\lambda: X \to R$ defined by
$$\lambda(x) = \int_Y f(x,y)\,q(dy|x)$$
is continuous. Proof The mapping $\nu: X \to P(XY)$ defined by $\nu(x) = p_x q(dy|x)$ is continuous by Corollary 7.21.1 and Lemma 7.12. We have $\lambda(x) = (\theta_f \circ \nu)(x)$, where $\theta_f: P(XY) \to R$ is defined by $\theta_f(r) = \int f\,dr$. By Proposition 7.21, $\theta_f$ is continuous. Hence, $\lambda$ is continuous. Q.E.D.
7.5 Semicontinuous Functions and Borel-Measurable Selection
In the dynamic programming algorithm given by (17) and (18) of Chapter 1, three operations are performed repetitively. First, there is the evaluation of a conditional expectation. Second, an extended real-valued function in two variables (state and control) is infimized over one of these variables (control). Finally, if an optimal or nearly optimal policy is to be constructed, a "selector" which maps each state to a control which achieves
or nearly achieves the infimum in the second step must be chosen. In this section, we give results which will enable us to show that, under certain conditions, the extended real-valued functions involved are semicontinuous and the selectors can be chosen to be Borel-measurable. The results are applied to dynamic programming in Propositions 8.6-8.7 and Corollaries 9.17.2-9.17.3. Definition 7.13 Let $X$ be a metrizable space and $f$ an extended real-valued function on $X$. If $\{x \in X \mid f(x) \le c\}$ is closed for every $c \in R$, $f$ is said to be lower semicontinuous. If $\{x \in X \mid f(x) \ge c\}$ is closed for every $c \in R$, $f$ is said to be upper semicontinuous.
Note that $f$ is lower semicontinuous if and only if $-f$ is upper semicontinuous. We will use this duality in the proofs of the following propositions to assert facts about upper semicontinuous functions given analogous facts about lower semicontinuous functions. Note also that if $f$ is lower semicontinuous, the sets $\{x \in X \mid f(x) = -\infty\}$ and $\{x \in X \mid f(x) \le \infty\}$ are closed, since the former is equal to $\bigcap_{n=1}^{\infty}\{x \in X \mid f(x) \le -n\}$ and the latter is $X$. There is a similar result for upper semicontinuous functions. The following lemma provides an alternative characterization of lower and upper semicontinuous functions.
Lemma 7.13 Let $X$ be a metrizable space and $f: X \to R^*$.
(a) The function $f$ is lower semicontinuous if and only if for each sequence $\{x_n\} \subset X$ converging to $x \in X$
$$\liminf_{n \to \infty} f(x_n) \ge f(x). \tag{53}$$
(b) The function $f$ is upper semicontinuous if and only if for each sequence $\{x_n\} \subset X$ converging to $x \in X$
$$\limsup_{n \to \infty} f(x_n) \le f(x). \tag{54}$$
Proof Suppose f is lower semicontinuous and x, a subsequence {x,,) such that x,, -+ x as k -+ co and
lim f (x,,) k+ m
= lim inf f
-+
x. We can extract
(x,).
n+m
Given E > 0, define lim inf f (x,) + E
if lim inf f (x,) > - a, ,-.a
otherwise.
7.5
147
BOREL-MEASURABLE SELECTION
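There exists a positive integer $k(\varepsilon)$ such that $f(x_{n_k}) \le \delta(\varepsilon)$ for all $k \ge k(\varepsilon)$. The set $\{y \in X \mid f(y) \le \delta(\varepsilon)\}$ is closed and contains $x_{n_k}$ for all $k \ge k(\varepsilon)$, and hence it contains $x$, i.e., $f(x) \le \delta(\varepsilon)$. Since $\varepsilon > 0$ is arbitrary, inequality (53) follows. Conversely, if (53) holds and, for some $c \in R$, $\{x_n\}$ is a sequence in $\{y \in X \mid f(y) \le c\}$ converging to $x$, then $f(x) \le c$, so $f$ is lower semicontinuous. Part (b) of the lemma follows from part (a) by the duality mentioned earlier. Q.E.D.

If $f$ and $g$ are lower semicontinuous and bounded below on $X$ and if $x_n \to x$, then
$$\liminf_{n\to\infty} [f(x_n) + g(x_n)] \ge \liminf_{n\to\infty} f(x_n) + \liminf_{n\to\infty} g(x_n) \ge f(x) + g(x),$$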
so $f + g$ is lower semicontinuous. If $f$ is lower semicontinuous and $\alpha > 0$, then $\alpha f$ is lower semicontinuous as well. Upper semicontinuous functions have similar properties. It is clear from (53) and (54) that $f: X \to R^*$ is continuous if and only if it is both lower and upper semicontinuous. We can often infer properties of semicontinuous functions from properties of continuous functions by means of the next lemma.

Lemma 7.14 Let $X$ be a metrizable space and $f: X \to R^*$.
(a) The function $f$ is lower semicontinuous and bounded below if and only if there exists a sequence $\{f_n\} \subset C(X)$ such that $f_n \uparrow f$.
(b) The function $f$ is upper semicontinuous and bounded above if and only if there exists a sequence $\{f_n\} \subset C(X)$ such that $f_n \downarrow f$.

Proof We prove only part (a) of the lemma and appeal to duality for part (b). Assume $f$ is lower semicontinuous and bounded below by $b \in R$, and let $d$ be a metric on $X$ consistent with its topology. We may assume without loss of generality that $f(x_0) < \infty$ for some $x_0 \in X$, since the result is trivial otherwise. (Take $f_n(x) = n$ for every $x \in X$.) As in Lemma 7.7, define
$$g_n(x) = \inf_{y \in X} [f(y) + n\, d(x, y)], \qquad n = 1, 2, \ldots.$$
Exactly as in the proof of Lemma 7.7, we show that $\{g_n\}$ is an increasing sequence of continuous functions bounded below by $b$ and above by $f$. The characterization (53) of lower semicontinuous functions can be used in place of continuity to prove $g_n \uparrow f$. In particular, (28) becomes
$$f(x) \le \liminf_{n\to\infty} f(y_n) \le \lim_{n\to\infty} g_n(x) + \varepsilon.$$
Now define $f_n = \min\{n, g_n\}$. Then each $f_n$ is continuous and bounded and $f_n \uparrow f$. This concludes the proof of the direct part of the lemma. For
7.
148
BOREL SPACES AND THEIR PROBABILITY MEASURES
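the converse part, suppose $\{f_n\} \subset C(X)$ and $f_n \uparrow f$. Then $f \ge f_1$ is bounded below and, for $c \in R$,
$$\{x \in X \mid f(x) \le c\} = \bigcap_{n=1}^{\infty} \{x \in X \mid f_n(x) \le c\}$$
is closed. Q.E.D.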
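The construction in the proof of Lemma 7.14 can be tried out numerically. The sketch below is only an illustration under stated assumptions: it takes $X = R$ with $d(x, y) = |x - y|$, the lower semicontinuous step function $f = 0$ on $(-\infty, 0]$ and $f = 1$ on $(0, \infty)$, and forms $g_n(x) = \inf_y [f(y) + n|x - y|]$ and $f_n = \min\{n, g_n\}$. Replacing the infimum over all of $R$ by a minimum over a finite grid is an approximation made for this example, and the function names are hypothetical.

```python
def f(x: float) -> float:
    """Lower semicontinuous step function: 0 on (-inf, 0], 1 on (0, inf)."""
    return 0.0 if x <= 0 else 1.0


# Finite grid standing in for the infimum over all y in R (an approximation).
GRID = [i / 1000 for i in range(-2000, 2001)]


def g(n: int, x: float) -> float:
    """Grid version of g_n(x) = inf_y [f(y) + n * |x - y|]."""
    return min(f(y) + n * abs(x - y) for y in GRID)


def f_n(n: int, x: float) -> float:
    """Truncation f_n = min{n, g_n}, which is bounded and continuous."""
    return min(n, g(n, x))


if __name__ == "__main__":
    x = 0.01  # just right of the jump, where f(x) = 1
    print([f_n(n, x) for n in (1, 10, 100, 1000)])  # [0.01, 0.1, 1.0, 1.0]
```

The values at $x = 0.01$ increase and reach $f(0.01) = 1$ once $n$ is large enough to make the jump at $0$ "too far away", which is exactly the mechanism behind $g_n \uparrow f$.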
The following proposition shows that the semicontinuity of a function of two variables is preserved when one of the variables is integrated out via a continuous stochastic kernel.

Proposition 7.31 Let $X$ and $Y$ be separable metrizable spaces, let $q(dy \mid x)$ be a continuous stochastic kernel on $Y$ given $X$, and let $f: XY \to R^*$ be Borel-measurable. Define
$$\lambda(x) = \int f(x, y)\, q(dy \mid x).$$
(a) If $f$ is lower semicontinuous and bounded below, then $\lambda$ is lower semicontinuous and bounded below.
(b) If $f$ is upper semicontinuous and bounded above, then $\lambda$ is upper semicontinuous and bounded above.

Proof We prove part (a) of the proposition and appeal to duality for part (b). If $f: XY \to R^*$ is lower semicontinuous and bounded below, then by Lemma 7.14 there exists a sequence $\{f_n\} \subset C(XY)$ such that $f_n \uparrow f$. Define $\lambda_n(x) = \int f_n(x, y)\, q(dy \mid x)$. By Proposition 7.30, each $\lambda_n$ is continuous, and by the monotone convergence theorem $\lambda_n \uparrow \lambda$. By Lemma 7.14, $\lambda$ is lower semicontinuous. Q.E.D.

An important operation in the execution of the dynamic programming algorithm is the infimization over one of the variables of a bivariate function. In the context of semicontinuity, we have the following result related to this operation.

Proposition 7.32 Let $X$ and $Y$ be metrizable spaces and let $f: XY \to R^*$ be given. Define
$$f^*(x) = \inf_{y \in Y} f(x, y). \qquad (55)$$
(a) If $f$ is lower semicontinuous and $Y$ is compact, then $f^*$ is lower semicontinuous and for every $x \in X$ the infimum in (55) is attained by some $y \in Y$.
(b) If $f$ is upper semicontinuous, then $f^*$ is upper semicontinuous.

Proof (a) Fix $x$ and let $\{y_n\} \subset Y$ be such that $f(x, y_n) \downarrow f^*(x)$. Then $\{y_n\}$ accumulates at some $y_0 \in Y$, and part (a) of Lemma 7.13 implies that $f(x, y_0) = f^*(x)$. To show that $f^*$ is lower semicontinuous, let $\{x_n\} \subset X$ be
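such that $x_n \to x_0$. Choose a sequence $\{y_n\} \subset Y$ such that
$$f(x_n, y_n) = f^*(x_n), \qquad n = 1, 2, \ldots.$$
There is a subsequence of $\{(x_n, y_n)\}$, call it $\{(x_{n_k}, y_{n_k})\}$, such that $\liminf_{n\to\infty} f(x_n, y_n) = \lim_{k\to\infty} f(x_{n_k}, y_{n_k})$. The sequence $\{y_{n_k}\}$ accumulates at some $y_0 \in Y$, and, by Lemma 7.13(a),
$$\liminf_{n\to\infty} f^*(x_n) = \liminf_{n\to\infty} f(x_n, y_n) = \lim_{k\to\infty} f(x_{n_k}, y_{n_k}) \ge f(x_0, y_0) \ge f^*(x_0),$$
so $f^*$ is lower semicontinuous.
(b) Let $d_1$ be a metric on $X$ and $d_2$ a metric on $Y$ consistent with their topologies. If $G \subset XY$ is open and $x_0 \in \operatorname{proj}_X(G)$, then there is some $y_0 \in Y$ for which $(x_0, y_0) \in G$, and there is some $\varepsilon > 0$ such that
$$N_\varepsilon(x_0, y_0) = \{(x, y) \in XY \mid d_1(x, x_0) < \varepsilon,\ d_2(y, y_0) < \varepsilon\} \subset G.$$
Then
$$\{x \in X \mid d_1(x, x_0) < \varepsilon\} = \operatorname{proj}_X[N_\varepsilon(x_0, y_0)] \subset \operatorname{proj}_X(G),$$
so $\operatorname{proj}_X(G)$ is open in $X$. For $c \in R$,
$$\{x \in X \mid f^*(x) < c\} = \operatorname{proj}_X\big(\{(x, y) \in XY \mid f(x, y) < c\}\big).$$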
The upper semicontinuity of $f$ implies that $\{(x, y) \mid f(x, y) < c\}$ is open, so $\{x \in X \mid f^*(x) < c\}$ is open and $f^*$ is upper semicontinuous. Q.E.D.

Another important operation in the dynamic programming algorithm is the choice of a measurable "selector" which assigns to each $x \in X$ a $y \in Y$ which attains or nearly attains the infimum in (55). We first discuss Borel-measurable selection in case (a) of Proposition 7.32. For this we will need the Hausdorff metric and the corresponding topology on the set $2^Y$ of closed subsets of a compact metric space $Y$ (Appendix C). The space $2^Y$ under this topology is compact (Proposition C.2) and, therefore, complete and separable. Several preliminary lemmas are required.

Lemma 7.15 Let $Y$ be a compact metrizable space and let $g: Y \to R^*$ be lower semicontinuous. Define $g^*: 2^Y \to R^*$ by
$$g^*(A) = \begin{cases} \min_{y \in A} g(y) & \text{if } A \ne \emptyset, \\ \infty & \text{if } A = \emptyset. \end{cases} \qquad (56)$$
Then $g^*$ is lower semicontinuous.

Proof Since the empty set is an isolated point in $2^Y$, we need only prove that $g^*$ is lower semicontinuous on $2^Y - \{\emptyset\}$. We have already shown [Proposition 7.32(a)] that, given a nonempty set $A \in 2^Y$, there exists $y \in A$ such that $g^*(A) = g(y)$. Let $\{A_n\} \subset 2^Y$ be a sequence of nonempty sets with limit $A \in 2^Y$, and let $y_n \in A_n$ be such that $g^*(A_n) = g(y_n)$, $n = 1, 2, \ldots$. Choose a subsequence $\{y_{n_k}\}$ such that
$$\lim_{k\to\infty} g(y_{n_k}) = \liminf_{n\to\infty} g(y_n) = \liminf_{n\to\infty} g^*(A_n).$$
The subsequence $\{y_{n_k}\}$ accumulates at some $y_0 \in Y$, and, by Lemma 7.13(a),
$$g(y_0) \le \lim_{k\to\infty} g(y_{n_k}) = \liminf_{n\to\infty} g^*(A_n).$$
From (14) of Appendix C and from Proposition C.3, we have (in the notation of Appendix C)
$$y_0 \in \overline{\lim}_{n\to\infty} A_n = A,$$
so $g^*(A) \le g(y_0) \le \liminf_{n\to\infty} g^*(A_n)$. The result follows from Lemma 7.13(a). Q.E.D.
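Lemma 7.15 can be checked on a small example in which $g^*$ jumps. The sketch assumes $Y = [0, 1]$, the lower semicontinuous function $g$ equal to $0$ at $y = 0$ and $1$ elsewhere, and finite sets only (so that the Hausdorff distance is a finite computation); the helper names are illustrative. The sets $A_n = \{1/n\}$ converge to $A = \{0\}$ in the Hausdorff metric, while $g^*(A) = 0 < 1 = \liminf g^*(A_n)$, so $g^*$ is lower but not upper semicontinuous.

```python
import math


def hausdorff(A, B):
    """Hausdorff distance between two nonempty finite subsets of the real line."""
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)


def g(y):
    """Lower semicontinuous on [0, 1]: {g <= c} is {0} for 0 <= c < 1."""
    return 0.0 if y == 0 else 1.0


def g_star(A):
    """g*(A) = min of g over A, with g*(empty set) = +infinity, as in (56)."""
    return min((g(y) for y in A), default=math.inf)


if __name__ == "__main__":
    for n in (1, 10, 100):
        A_n = [1 / n]
        print(n, hausdorff(A_n, [0.0]), g_star(A_n))
    print(g_star([0.0]), g_star([]))
```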
Lemma 7.16 Let $Y$ be a compact metrizable space and let $g: Y \to R^*$ be lower semicontinuous. Define $G: 2^Y R^* \to 2^Y$ by
$$G(A, c) = A \cap \{y \in Y \mid g(y) \le c\}. \qquad (57)$$
Then $G$ is Borel-measurable.

Proof We show that $G$ is upper semicontinuous (K) (Definition C.2) and apply Proposition C.4. Let $\{(A_n, c_n)\} \subset 2^Y R^*$ be a sequence with limit $(A, c)$. If $\overline{\lim}_{n\to\infty} G(A_n, c_n) = \emptyset$, then
$$\overline{\lim}_{n\to\infty} G(A_n, c_n) \subset G(A, c). \qquad (58)$$
Otherwise, choose $y \in \overline{\lim}_{n\to\infty} G(A_n, c_n)$. There is a sequence $n_1 < n_2 < \cdots$ of positive integers and a sequence $y_{n_k} \in G(A_{n_k}, c_{n_k})$, $k = 1, 2, \ldots$, such that $y_{n_k} \to y$. By definition, $y_{n_k} \in A_{n_k}$ for every $k$, so $y \in \overline{\lim}_{n\to\infty} A_n = A$. We also have $g(y_{n_k}) \le c_{n_k}$, $k = 1, 2, \ldots$, and using the lower semicontinuity of $g$, we obtain
$$g(y) \le \liminf_{k\to\infty} g(y_{n_k}) \le \lim_{k\to\infty} c_{n_k} = c.$$
Therefore $y \in G(A, c)$, (58) holds, and $G$ is upper semicontinuous (K). Q.E.D.
Lemma 7.17 Let $Y$ be a compact metrizable space and let $g: Y \to R^*$ be lower semicontinuous. Let $g^*: 2^Y \to R^*$ be defined by (56) and define $G^*: 2^Y \to 2^Y$ by
$$G^*(A) = A \cap \{y \in Y \mid g(y) \le g^*(A)\}. \qquad (59)$$
Then $G^*$ is Borel-measurable.

Proof Let $G$ be the Borel-measurable function given by (57). Lemma 7.15 implies that $g^*$ is Borel-measurable. A comparison of (57) and (59) shows that
$$G^*(A) = G[A, g^*(A)] \qquad \forall A \in 2^Y.$$
It follows that $G^*$ is also Borel-measurable. Q.E.D.
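Lemma 7.18 Let $Y$ be a compact metrizable space. There is a Borel-measurable function $\sigma: 2^Y - \{\emptyset\} \to Y$ such that $\sigma(A) \in A$ for every $A \in 2^Y - \{\emptyset\}$.

Proof Let $\{g_n \mid n = 1, 2, \ldots\}$ be a subset of $C(Y)$ which separates points in $Y$ (for example, the one constructed in the proof of Proposition 7.7). As in Lemma 7.15, define $g_n^*: 2^Y \to R^*$ by
$$g_n^*(A) = \begin{cases} \min_{y \in A} g_n(y) & \text{if } A \ne \emptyset, \\ \infty & \text{if } A = \emptyset, \end{cases}$$
and, as in Lemma 7.17, define $G_n^*: 2^Y \to 2^Y$ by
$$G_n^*(A) = A \cap \{y \in Y \mid g_n(y) \le g_n^*(A)\}.$$
Let $H_n: 2^Y \to 2^Y$ be defined recursively by
$$H_0(A) = A, \qquad H_n(A) = G_n^*[H_{n-1}(A)], \quad n = 1, 2, \ldots.$$
Then for $A \ne \emptyset$, each $H_n(A)$ is nonempty and compact, and
$$H_0(A) \supset H_1(A) \supset H_2(A) \supset \cdots.$$
Therefore, $\bigcap_{n=0}^{\infty} H_n(A) \ne \emptyset$. If $y, y' \in \bigcap_{n=0}^{\infty} H_n(A)$, then for $n = 1, 2, \ldots$, we have
$$g_n(y) = g_n(y') = g_n^*[H_{n-1}(A)].$$
Since $\{g_n \mid n = 1, 2, \ldots\}$ separates points in $Y$, we have $y = y'$, and $\bigcap_{n=0}^{\infty} H_n(A)$ must consist of a single point, which we denote by $\sigma(A)$.

We show that for $A \ne \emptyset$
$$\overline{\lim}_{n\to\infty} H_n(A) = \bigcap_{n=0}^{\infty} H_n(A) = \{\sigma(A)\}. \qquad (60)$$
Since the sequence $\{H_n(A)\}$ is nonincreasing, we have from (14) and (15) of Appendix C that
$$\bigcap_{n=0}^{\infty} H_n(A) \subset \underline{\lim}_{n\to\infty} H_n(A) \subset \overline{\lim}_{n\to\infty} H_n(A). \qquad (61)$$
If $y \in \overline{\lim}_{n\to\infty} H_n(A)$, then there exist positive integers $n_1 < n_2 < \cdots$ and a sequence $y_{n_k} \in H_{n_k}(A)$, $k = 1, 2, \ldots$, such that $y_{n_k} \to y$. For fixed $k$, $y_{n_j} \in H_{n_k}(A)$ for all $j \ge k$, and since $H_{n_k}(A)$ is closed, we have $y \in H_{n_k}(A)$. Therefore, $y \in \bigcap_{k=1}^{\infty} H_{n_k}(A) = \bigcap_{n=0}^{\infty} H_n(A)$ and
$$\overline{\lim}_{n\to\infty} H_n(A) \subset \bigcap_{n=0}^{\infty} H_n(A). \qquad (62)$$
From relations (61) and (62), we obtain (60); in view of Proposition C.3, this means that $H_n(A) \to \{\sigma(A)\}$ in $2^Y$. Since $G_n^*$ and $H_n$ are Borel-measurable for every $n$, the mapping $\nu: 2^Y - \{\emptyset\} \to 2^Y$ defined by $\nu(A) = \{\sigma(A)\}$ is Borel-measurable. It is easily seen that the mapping $\iota: Y \to 2^Y$ defined by $\iota(y) = \{y\}$ is a homeomorphism onto its image. Since $Y$ is compact, $\iota(Y)$ is compact, thus closed in $2^Y$, and $\iota^{-1}: \iota(Y) \to Y$ is Borel-measurable. Since $\sigma = \iota^{-1} \circ \nu$, it follows that $\sigma$ is Borel-measurable. Q.E.D.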
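The selector of Lemma 7.18 is a lexicographic minimization: minimize $g_1$ over $A$, then $g_2$ over the surviving minimizers, and so on, until a single point remains. The sketch below mimics the recursion $H_n(A) = G_n^*[H_{n-1}(A)]$ for a finite set $A \subset R^2$, assuming the two coordinate functions as $g_1, g_2$ (they separate points of $R^2$); a finite $A$ replaces a general compact set, and the names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


def lexicographic_select(A: Sequence[Point],
                         gs: Sequence[Callable[[Point], float]]) -> List[Point]:
    """Return H_n(A) after applying G_1^*, ..., G_n^* in turn, starting from H_0(A) = A."""
    H = list(A)
    for g in gs:
        m = min(g(y) for y in H)          # g^*(H): the minimum of g over H
        H = [y for y in H if g(y) <= m]   # G^*(H) = H intersected with {g <= g^*(H)}
    return H


def g1(y: Point) -> float:
    return y[0]  # first coordinate


def g2(y: Point) -> float:
    return y[1]  # second coordinate


A = [(1, 2), (0, 3), (0, 1), (2, 0)]
print(lexicographic_select(A, [g1]))      # two minimizers of g1 remain
print(lexicographic_select(A, [g1, g2]))  # a single point, playing the role of sigma(A)
```

Using only $g_1$ leaves two points, which shows why the family $\{g_n\}$ must separate points for $\bigcap_n H_n(A)$ to be a singleton.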
Lemma 7.19 Let $X$ be a metrizable space, $Y$ a compact metrizable space, and let $f: XY \to R^*$ be lower semicontinuous. Define $F: XR^* \to 2^Y$ by
$$F(x, c) = \{y \in Y \mid f(x, y) \le c\}. \qquad (63)$$
Then $F$ is Borel-measurable.

Proof The proof is very similar to that of Lemma 7.16. We show that $F$ is upper semicontinuous (K) and apply Proposition C.4. Let $(x_n, c_n) \to (x, c)$ in $XR^*$ and let $y$ be an element of $\overline{\lim}_{n\to\infty} F(x_n, c_n)$, provided this set is nonempty. There exist positive integers $n_1 < n_2 < \cdots$ and $y_{n_k} \in F(x_{n_k}, c_{n_k})$ such that $y_{n_k} \to y$. Since $f(x_{n_k}, y_{n_k}) \le c_{n_k}$ and $f$ is lower semicontinuous, we conclude that $f(x, y) \le c$, so that $\overline{\lim}_{n\to\infty} F(x_n, c_n) \subset F(x, c)$. The result follows. Q.E.D.
Lemma 7.20 Let $X$ be a metrizable space, $Y$ a compact metrizable space, and let $f: XY \to R^*$ be lower semicontinuous. Let $f^*: X \to R^*$ be given by $f^*(x) = \min_{y \in Y} f(x, y)$, and define $F^*: X \to 2^Y$ by
$$F^*(x) = \{y \in Y \mid f(x, y) \le f^*(x)\}. \qquad (64)$$
Then $F^*$ is Borel-measurable.

Proof Let $F$ be the Borel-measurable function defined by (63). Proposition 7.32(a) implies that $f^*$ is lower semicontinuous, hence Borel-measurable. From (63) and (64) we have $F^*(x) = F[x, f^*(x)]$. It follows that $F^*$ is also Borel-measurable. Q.E.D.
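We are now ready to prove the selection theorem for lower semicontinuous functions.

Proposition 7.33 Let $X$ be a metrizable space, $Y$ a compact metrizable space, $D$ a closed subset of $XY$, and let $f: D \to R^*$ be lower semicontinuous. Let $f^*: \operatorname{proj}_X(D) \to R^*$ be given by
$$f^*(x) = \min_{y \in D_x} f(x, y), \qquad (65)$$
where $D_x = \{y \in Y \mid (x, y) \in D\}$. Then $\operatorname{proj}_X(D)$ is closed in $X$, $f^*$ is lower semicontinuous, and there exists a Borel-measurable function $\varphi: \operatorname{proj}_X(D) \to Y$ such that $\operatorname{Gr}(\varphi) \subset D$ and
$$f[x, \varphi(x)] = f^*(x) \qquad \forall x \in \operatorname{proj}_X(D). \qquad (66)$$

Proof We first prove the result for the case where $D = XY$. As in Lemma 7.18, let $\sigma: 2^Y - \{\emptyset\} \to Y$ be a Borel-measurable function satisfying $\sigma(A) \in A$ for every $A \in 2^Y - \{\emptyset\}$. As in Lemma 7.20, let $F^*: X \to 2^Y$ be the Borel-measurable function defined by
$$F^*(x) = \{y \in Y \mid f(x, y) \le f^*(x)\}.$$
Proposition 7.32(a) implies that $f^*$ is lower semicontinuous and $F^*(x) \ne \emptyset$ for every $x \in X$. The composition $\varphi = \sigma \circ F^*$ satisfies (66).

Suppose now that $D$ is not necessarily $XY$. To see that $\operatorname{proj}_X(D)$ is closed, note that the function $g = -\chi_D$, where $\chi_D$ is the indicator function of $D$, is lower semicontinuous and
$$\operatorname{proj}_X(D) = \{x \in X \mid g^*(x) \le -1\},$$
where $g^*(x) = \min_{y \in Y} g(x, y)$. By the special case of the proposition already proved, $g^*$ is lower semicontinuous, $\operatorname{proj}_X(D)$ is closed, and there is a Borel-measurable function $\varphi_1: X \to Y$ such that $g[x, \varphi_1(x)] = g^*(x)$ for every $x \in X$ or, equivalently, $(x, \varphi_1(x)) \in D$ for every $x \in \operatorname{proj}_X(D)$. Define now the lower semicontinuous function $\bar f: XY \to R^*$ by
$$\bar f(x, y) = \begin{cases} f(x, y) & \text{if } (x, y) \in D, \\ \infty & \text{otherwise.} \end{cases}$$
For all $c \in R$, the set $\{(x, y) \in XY \mid \bar f(x, y) \le c\} = \{(x, y) \in D \mid f(x, y) \le c\}$ is closed in $XY$, because $D$ is closed and $f$ is lower semicontinuous on $D$. Since $\min_{y \in Y} \bar f(x, y)$ is lower semicontinuous and equals $f^*(x)$ for $x \in \operatorname{proj}_X(D)$, it follows that $f^*$ is also lower semicontinuous. Let $\varphi_2: X \to Y$ be a Borel-measurable function satisfying
$$\bar f[x, \varphi_2(x)] = \min_{y \in Y} \bar f(x, y) \qquad \forall x \in X.$$
Clearly $(x, \varphi_2(x)) \in D$ for all $x$ in the Borel set
$$\{x \in X \mid \min_{y \in Y} \bar f(x, y) < \infty\}.$$
Define $\varphi: \operatorname{proj}_X(D) \to Y$ by
$$\varphi(x) = \begin{cases} \varphi_1(x) & \text{if } \min_{y \in Y} \bar f(x, y) = \infty, \\ \varphi_2(x) & \text{otherwise.} \end{cases}$$
The function $\varphi$ is Borel-measurable and satisfies (66). Q.E.D.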
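The device used at the end of the proof of Proposition 7.33, extending $f$ by $+\infty$ off $D$ and then minimizing over all of $Y$, is easy to imitate on a grid. The sketch assumes $Y = [0, 1]$ replaced by the grid $\{0, 0.1, \ldots, 1\}$, the closed set $D = \{(x, y) \mid y \ge x\}$, and $f(x, y) = (y - 1/2)^2$; all names are hypothetical. Ties are broken by taking the first grid point, which plays the role of the measurable selector, and an empty section $D_x$ shows up as the value $+\infty$.

```python
import math

GRID = [i / 10 for i in range(11)]  # finite stand-in for the compact space Y = [0, 1]


def in_D(x: float, y: float) -> bool:
    """Membership in the closed set D = {(x, y) : y >= x}."""
    return y >= x


def f(x: float, y: float) -> float:
    return (y - 0.5) ** 2


def f_bar(x: float, y: float) -> float:
    """f on D and +infinity off D."""
    return f(x, y) if in_D(x, y) else math.inf


def f_star(x: float) -> float:
    """Minimum of f_bar(x, .) over the grid; +infinity when no grid point lies in D_x."""
    return min(f_bar(x, y) for y in GRID)


def selector(x: float) -> float:
    """First grid point attaining the minimum of f_bar(x, .); meaningful when f_star(x) < inf."""
    return min(GRID, key=lambda y: f_bar(x, y))


if __name__ == "__main__":
    for x in (0.2, 0.7, 1.5):
        print(x, f_star(x), selector(x) if f_star(x) < math.inf else None)
```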
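We turn our attention to selection in the case of an upper semicontinuous function. The analysis is considerably simpler, but in contrast to the "exact selector" of (66) we will obtain only an approximate selector for this case.

Lemma 7.21 Let $X$ be a metrizable space, $Y$ a separable metrizable space, and $G$ an open subset of $XY$. Then $\operatorname{proj}_X(G)$ is open and there exists a Borel-measurable function $\varphi: \operatorname{proj}_X(G) \to Y$ such that $\operatorname{Gr}(\varphi) \subset G$.

Proof Let $\{y_n \mid n = 1, 2, \ldots\}$ be a countable dense subset of $Y$. For fixed $y$, the mapping $x \mapsto (x, y)$ is continuous, so $\{x \in X \mid (x, y) \in G\}$ is open. Let $G_n = \{x \in X \mid (x, y_n) \in G\}$, and note that, since $G$ is open and $\{y_n\}$ is dense, $\operatorname{proj}_X(G) = \bigcup_{n=1}^{\infty} G_n$; hence $\operatorname{proj}_X(G)$ is open. Define $\varphi: \operatorname{proj}_X(G) \to Y$ by
$$\varphi(x) = y_n \qquad \text{if } x \in G_n - \bigcup_{k=1}^{n-1} G_k.$$
Then $\varphi$ is Borel-measurable and $\operatorname{Gr}(\varphi) \subset G$. Q.E.D.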
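The selector of Lemma 7.21 returns the first point of a countable dense set that lands in the section of $G$; it is constant on each Borel set $G_n - \bigcup_{k<n} G_k$. The sketch assumes $Y = [0, 1]$, with the rationals enumerated by increasing denominator as the dense sequence $y_1, y_2, \ldots$, and the open set $G = \{(x, y) \mid |y - x^2| < 0.05\}$; both are illustrative choices, and `max_terms` is a practical cutoff that does not appear in the lemma.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Iterator


def dense_sequence() -> Iterator[Fraction]:
    """Enumerate the rationals in [0, 1] without repetition: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    seen = set()
    for q in count(1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                yield r


def in_G(x: float, y: Fraction) -> bool:
    """Membership in the open set G = {(x, y) : |y - x**2| < 0.05}."""
    return abs(float(y) - x * x) < 0.05


def phi(x: float, max_terms: int = 10_000) -> Fraction:
    """Return y_n for the smallest n with (x, y_n) in G, i.e. the first n with x in G_n."""
    for y in islice(dense_sequence(), max_terms):
        if in_G(x, y):
            return y
    raise ValueError("x is not in proj_X(G), or max_terms is too small")


if __name__ == "__main__":
    print([phi(i / 10) for i in range(11)])
```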
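Proposition 7.34 Let $X$ be a metrizable space, $Y$ a separable metrizable space, $D$ an open subset of $XY$, and let $f: D \to R^*$ be upper semicontinuous. Let $f^*: \operatorname{proj}_X(D) \to R^*$ be given by
$$f^*(x) = \inf_{y \in D_x} f(x, y). \qquad (67)$$
Then $\operatorname{proj}_X(D)$ is open in $X$, $f^*$ is upper semicontinuous, and for every $\varepsilon > 0$, there exists a Borel-measurable function $\varphi_\varepsilon: \operatorname{proj}_X(D) \to Y$ such that $\operatorname{Gr}(\varphi_\varepsilon) \subset D$ and for all $x \in \operatorname{proj}_X(D)$
$$f[x, \varphi_\varepsilon(x)] \le \begin{cases} f^*(x) + \varepsilon & \text{if } f^*(x) > -\infty, \\ -1/\varepsilon & \text{if } f^*(x) = -\infty. \end{cases} \qquad (68)$$

Proof The set $\operatorname{proj}_X(D)$ is open in $X$ by Lemma 7.21. To show that $f^*$ is upper semicontinuous, define an upper semicontinuous function $\bar f: XY \to R^*$ by
$$\bar f(x, y) = \begin{cases} f(x, y) & \text{if } (x, y) \in D, \\ \infty & \text{otherwise.} \end{cases}$$
For $c \in R$, we have
$$\{x \in \operatorname{proj}_X(D) \mid f^*(x) < c\} = \{x \in X \mid \inf_{y \in Y} \bar f(x, y) < c\},$$
and this set is open by Proposition 7.32(b). Let $\varepsilon > 0$ be given. For $k = 0, \pm 1, \pm 2, \ldots$, define (see Fig. 7.1)
$$A(k) = \{(x, y) \in D \mid f(x, y) < k\varepsilon\},$$
$$B(k) = \{x \in \operatorname{proj}_X(D) \mid (k-1)\varepsilon \le f^*(x) < k\varepsilon\},$$
$$B(-\infty) = \{x \in \operatorname{proj}_X(D) \mid f^*(x) = -\infty\},$$
$$B(\infty) = \{x \in \operatorname{proj}_X(D) \mid f^*(x) = \infty\}.$$
The sets $A(k)$, $k = 0, \pm 1, \pm 2, \ldots$, are open, while the sets $B(k)$, $k = 0, \pm 1, \pm 2, \ldots$, $B(-\infty)$, and $B(\infty)$ are Borel-measurable.

FIGURE 7.1

By Lemma 7.21, there exists for each $k = 0, \pm 1, \pm 2, \ldots$ a Borel-measurable $\varphi_k: \operatorname{proj}_X[A(k)] \to Y$ such that $\operatorname{Gr}(\varphi_k) \subset A(k)$, and there exists a Borel-measurable $\bar\varphi: \operatorname{proj}_X(D) \to Y$ such that $\operatorname{Gr}(\bar\varphi) \subset D$. Let $k^*$ be an integer such that $k^* \le -1/\varepsilon^2$. Define $\varphi_\varepsilon: \operatorname{proj}_X(D) \to Y$ by
$$\varphi_\varepsilon(x) = \begin{cases} \varphi_k(x) & \text{if } x \in B(k),\ k = 0, \pm 1, \pm 2, \ldots, \\ \bar\varphi(x) & \text{if } x \in B(\infty), \\ \varphi_{k^*}(x) & \text{if } x \in B(-\infty). \end{cases}$$
Since $B(k) \subset \operatorname{proj}_X[A(k)]$ and $B(-\infty) \subset \operatorname{proj}_X[A(k)]$ for all $k$, this definition is possible. It is clear that $\varphi_\varepsilon$ is Borel-measurable and $\operatorname{Gr}(\varphi_\varepsilon) \subset D$. If $x \in B(k)$, then, since $(x, \varphi_k(x)) \in A(k)$, we have
$$f[x, \varphi_\varepsilon(x)] = f[x, \varphi_k(x)] < k\varepsilon \le f^*(x) + \varepsilon.$$
If $x \in B(\infty)$, then $f(x, y) = \infty$ for all $y \in D_x$, and $f[x, \varphi_\varepsilon(x)] = \infty = f^*(x)$. If $x \in B(-\infty)$, we have
$$f[x, \varphi_\varepsilon(x)] = f[x, \varphi_{k^*}(x)] < k^*\varepsilon \le -1/\varepsilon.$$
Hence $\varphi_\varepsilon$ has the required properties. Q.E.D.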
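The bookkeeping in the proof of Proposition 7.34 reduces to two integer choices: the band index $k$ with $(k-1)\varepsilon \le f^*(x) < k\varepsilon$, and an integer $k^* \le -1/\varepsilon^2$ for the case $f^*(x) = -\infty$. The sketch below computes both and applies them to the assumed example $D = R \times (0, 1)$, $f(x, y) = y$, for which $f^*(x) = 0$ is not attained. The candidate points $1/m$ stand in for the dense sequence of Lemma 7.21, and all names are hypothetical.

```python
import math
from typing import Callable, Iterable


def band_index(f_star_value: float, eps: float) -> int:
    """The integer k with (k - 1) * eps <= f*(x) < k * eps, for finite f*(x)."""
    return math.floor(f_star_value / eps) + 1


def k_star(eps: float) -> int:
    """An integer k* <= -1 / eps**2, so that k* * eps <= -1 / eps."""
    return math.floor(-1.0 / eps ** 2)


def select_eps(x: float, eps: float, candidates: Iterable[float],
               f: Callable[[float, float], float], f_star_value: float) -> float:
    """First candidate y with (x, y) in A(k), i.e. f(x, y) < k * eps (finite f*(x) only)."""
    k = band_index(f_star_value, eps)
    for y in candidates:
        if f(x, y) < k * eps:
            return y
    raise ValueError("no candidate in A(k); use a larger candidate set")


def f(x: float, y: float) -> float:
    return y  # continuous, hence usc, on the open set D = R x (0, 1); inf 0 is not attained


candidates = [1 / m for m in range(2, 10_000)]
y = select_eps(0.3, 0.01, candidates, f, f_star_value=0.0)
print(y, f(0.3, y) <= 0.0 + 0.01)
```

Here $k = 1$, so the selector returns the first candidate below $\varepsilon = 0.01$, namely $1/101$, and the bound $f[x, \varphi_\varepsilon(x)] \le f^*(x) + \varepsilon$ of (68) holds although no exact minimizer exists.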
7.6 Analytic Sets
The dynamic programming algorithm is centered around infimization of functions, and this is intimately connected with projections of sets. More specifically, if $f: XY \to R^*$ is given and $f^*: X \to R^*$ is defined by
$$f^*(x) = \inf_{y \in Y} f(x, y),$$
then for each $c \in R$
$$\{x \in X \mid f^*(x) < c\} = \operatorname{proj}_X\big(\{(x, y) \in XY \mid f(x, y) < c\}\big).$$
If $f$ is a Borel-measurable function, then $\{(x, y) \mid f(x, y) < c\}$ is a Borel-measurable set. Unfortunately, the projection of a Borel-measurable set need not be Borel-measurable. In Borel spaces, however, the projection of a Borel set is an analytic set. This section is devoted to the development of properties of analytic sets.

7.6.1 Equivalent Definitions of Analytic Sets
There are a number of ways to define the class of analytic sets in a Borel space $X$. One possibility is to define them as the projections on $X$ of the Borel subsets of $XY$, where $Y$ is some uncountable Borel space. Another possibility is to define them as the images of the Baire null space $\mathcal{N}$ under continuous functions from $\mathcal{N}$ into $X$. Still another possibility is to define them as all sets of the form
$$\bigcup_{(\sigma_1, \sigma_2, \ldots) \in \mathcal{N}} \ \bigcap_{n=1}^{\infty} S(\sigma_1, \sigma_2, \ldots, \sigma_n),$$
where $\mathcal{N}$ is the set of all sequences of positive integers (the Baire null space) and the sets $S(\sigma_1, \sigma_2, \ldots, \sigma_n)$ are closed in $X$. All these definitions are equivalent, as we show in Proposition 7.41. We will take the third definition as our starting point, since this is the most convenient analytically. We first formalize the set operation just given in terms of the notion of a Suslin scheme in a paved space.

Definition 7.14 Let $X$ be a set. A paving $\mathcal{P}$ of $X$ is a nonempty collection of subsets of $X$. The pair $(X, \mathcal{P})$ is called a paved space.

If $(X, \mathcal{P})$ is a paved space, we denote by $\sigma(\mathcal{P})$ the $\sigma$-algebra generated by $\mathcal{P}$, we denote by $\mathcal{P}_\delta$ the collection of all intersections of countably many members of $\mathcal{P}$, and we denote by $\mathcal{P}_\sigma$ the collection of all unions of countably many members of $\mathcal{P}$. Recall that $\mathbb{N}$ is the set of positive integers, $\mathcal{N}$ is the set of all infinite sequences of positive integers, and $\Sigma$ is the set of all finite sequences of positive integers.

Definition 7.15 Let $(X, \mathcal{P})$ be a paved space. A Suslin scheme for $\mathcal{P}$ is a mapping from $\Sigma$ into $\mathcal{P}$. The nucleus of a Suslin scheme $S: \Sigma \to \mathcal{P}$ is
$$N(S) = \bigcup_{(\sigma_1, \sigma_2, \ldots) \in \mathcal{N}} \ \bigcap_{n=1}^{\infty} S(\sigma_1, \sigma_2, \ldots, \sigma_n). \qquad (69)$$
The set of all nuclei of Suslin schemes for a paving $\mathcal{P}$ will be denoted by $\mathcal{S}(\mathcal{P})$. In order to simplify notation, we write, for $s = (\sigma_1, \sigma_2, \ldots, \sigma_n) \in \Sigma$ and $z = (\zeta_1, \zeta_2, \ldots) \in \mathcal{N}$,
$$s < z \quad \text{if } \sigma_k = \zeta_k, \ k = 1, \ldots, n.$$
With this notation, (69) can also be written as
$$N(S) = \bigcup_{z \in \mathcal{N}} \ \bigcap_{s < z} S(s).$$
We will use both expressions interchangeably. Note that the union in (69) is uncountable, so if $\mathcal{P}$ is a $\sigma$-algebra and $S$ is a Suslin scheme for $\mathcal{P}$, $N(S)$ may be outside $\mathcal{P}$. Several properties of $\mathcal{S}(\mathcal{P})$ are given below.
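Proposition 7.35 Let $X$ be a space with pavings $\mathcal{P}$ and $\mathcal{Q}$ such that $\mathcal{P} \subset \mathcal{Q}$. Then
(a) $\mathcal{S}(\mathcal{P}) \subset \mathcal{S}(\mathcal{Q})$;
(b) $\mathcal{S}(\mathcal{P})_\delta = \mathcal{S}(\mathcal{P})$;
(c) $\mathcal{S}(\mathcal{P})_\sigma = \mathcal{S}(\mathcal{P})$;
(d) $\mathcal{P} \subset \mathcal{S}(\mathcal{P})$;
(e) $\mathcal{S}[\mathcal{S}(\mathcal{P})] = \mathcal{S}(\mathcal{P})$.

Proof (a) Obvious.
(b) It is clear that $\mathcal{S}(\mathcal{P})_\delta \supset \mathcal{S}(\mathcal{P})$. Now choose $\bigcap_{k=1}^{\infty} N(S_k) \in \mathcal{S}(\mathcal{P})_\delta$, where $S_k$ is a Suslin scheme for $\mathcal{P}$, $k = 1, 2, \ldots$. It suffices to construct a Suslin scheme $S$ for $\mathcal{P}$ such that
$$N(S) = \bigcap_{k=1}^{\infty} N(S_k). \qquad (70)$$
For $k = 1, 2, \ldots$, let $\Pi_k = \{(2j - 1)2^{k-1} \mid j = 1, 2, \ldots\}$. Then $\Pi_1, \Pi_2, \ldots$ is a partition of $\mathbb{N}$ into infinitely many infinite sets. For each positive integer $k$, let $\varphi_k: \mathcal{N} \to \mathcal{N}$ be defined by
$$\varphi_k(\zeta_1, \zeta_2, \ldots) = (\zeta_{2^{k-1}}, \zeta_{3 \cdot 2^{k-1}}, \zeta_{5 \cdot 2^{k-1}}, \ldots),$$
i.e., $\varphi_k$ picks out the components of $(\zeta_1, \zeta_2, \ldots)$ with indices in $\Pi_k$. We want to construct a Suslin scheme $S$ for which
$$\bigcap_{s < z} S(s) = \bigcap_{k=1}^{\infty} \ \bigcap_{s < \varphi_k(z)} S_k(s) \qquad \forall z \in \mathcal{N}. \qquad (71)$$
We may rewrite (71) as
$$\bigcap_{n=1}^{\infty} S(\zeta_1, \ldots, \zeta_n) = \bigcap_{k=1}^{\infty} \ \bigcap_{j=1}^{\infty} S_k(\zeta_{2^{k-1}}, \zeta_{3 \cdot 2^{k-1}}, \ldots, \zeta_{(2j-1)2^{k-1}}). \qquad (72)$$
Given $(\zeta_1, \zeta_2, \ldots, \zeta_n) \in \Sigma$, we have $n = (2j - 1)2^{k-1}$ for exactly one pair of positive integers $j$ and $k$. Define
$$S(\zeta_1, \zeta_2, \ldots, \zeta_n) = S_k(\zeta_{2^{k-1}}, \zeta_{3 \cdot 2^{k-1}}, \ldots, \zeta_{(2j-1)2^{k-1}}).$$
This defines a Suslin scheme $S$ for which (72), and hence (71), is easily verified. We now use (71) to prove (70). Choose
$$x \in N(S) = \bigcup_{z \in \mathcal{N}} \ \bigcap_{s < z} S(s).$$
For some $\bar z \in \mathcal{N}$, we have
$$x \in \bigcap_{s < \bar z} S(s) = \bigcap_{k=1}^{\infty} \ \bigcap_{s < \varphi_k(\bar z)} S_k(s). \qquad (73)$$
Thus, for every $k$,
$$x \in \bigcap_{s < \varphi_k(\bar z)} S_k(s) \subset N(S_k).$$
It follows that $x \in \bigcap_{k=1}^{\infty} N(S_k)$ and
$$N(S) \subset \bigcap_{k=1}^{\infty} N(S_k). \qquad (74)$$
If we are given $x \in \bigcap_{k=1}^{\infty} N(S_k)$, then, since $x \in N(S_k)$ for every $k$, there exists $z_k \in \mathcal{N}$ such that $x \in \bigcap_{s < z_k} S_k(s)$. Let $z_0 \in \mathcal{N}$ be such that $\varphi_k(z_0) = z_k$, $k = 1, 2, \ldots$. Then $x \in \bigcap_{k=1}^{\infty} \bigcap_{s < \varphi_k(z_0)} S_k(s)$. An application of (71) shows that $x \in \bigcap_{s < z_0} S(s) \subset N(S)$ and
$$\bigcap_{k=1}^{\infty} N(S_k) \subset N(S). \qquad (75)$$
Relation (70) follows from (74) and (75).
(c) It is clear that $\mathcal{S}(\mathcal{P})_\sigma \supset \mathcal{S}(\mathcal{P})$. Choose $\bigcup_{k=1}^{\infty} N(S_k) \in \mathcal{S}(\mathcal{P})_\sigma$, where $S_k$ is a Suslin scheme for $\mathcal{P}$, $k = 1, 2, \ldots$. It suffices to construct a Suslin scheme $S$ for $\mathcal{P}$ for which
$$N(S) = \bigcup_{k=1}^{\infty} N(S_k). \qquad (76)$$
Given $(\zeta_1, \zeta_2, \ldots, \zeta_n) \in \Sigma$, we have $\zeta_1 = (2j - 1)2^{k-1}$ for exactly one pair of positive integers $j$ and $k$. Define
$$S(\zeta_1, \zeta_2, \ldots, \zeta_n) = S_k(j, \zeta_2, \ldots, \zeta_n).$$
This defines a Suslin scheme $S$ for which
$$\bigcap_{n=1}^{\infty} S(\zeta_1, \zeta_2, \ldots, \zeta_n) = \bigcap_{n=1}^{\infty} S_k(j, \zeta_2, \ldots, \zeta_n) \quad \text{if } \zeta_1 = (2j-1)2^{k-1}. \qquad (77)$$
Returning to (76), we choose $x \in N(S) = \bigcup_{z \in \mathcal{N}} \bigcap_{s < z} S(s)$. For some $(\zeta_1, \zeta_2, \ldots) \in \mathcal{N}$, we have $x \in \bigcap_{n=1}^{\infty} S(\zeta_1, \zeta_2, \ldots, \zeta_n)$, and choosing $j$ and $k$ so that $\zeta_1 = (2j - 1)2^{k-1}$, we have, from (77),
$$x \in \bigcap_{n=1}^{\infty} S_k(j, \zeta_2, \ldots, \zeta_n) \subset N(S_k), \quad \text{so} \quad N(S) \subset \bigcup_{k=1}^{\infty} N(S_k). \qquad (78)$$
If, on the other hand, we choose $x \in \bigcup_{k=1}^{\infty} N(S_k)$, then for some $k \in \mathbb{N}$ and $(j, \zeta_2, \ldots) \in \mathcal{N}$, we have $x \in \bigcap_{n=1}^{\infty} S_k(j, \zeta_2, \ldots, \zeta_n)$. Equation (77) implies
$$x \in \bigcap_{n=1}^{\infty} S\big((2j-1)2^{k-1}, \zeta_2, \ldots, \zeta_n\big) \subset N(S), \quad \text{so} \quad \bigcup_{k=1}^{\infty} N(S_k) \subset N(S). \qquad (79)$$
Relation (76) follows from (78) and (79).
(d) For $P \in \mathcal{P}$, define $S(s) = P$ for every $s \in \Sigma$. Then $N(S) = P$.
(e) The proof of this takes us somewhat far afield, so it is given as Proposition B.2 of Appendix B. Q.E.D.

It is not in general true that $\mathcal{S}(\mathcal{P})$ is closed under complementation, so $\mathcal{S}(\mathcal{P})$ is generally not a $\sigma$-algebra. In order for $\mathcal{S}(\mathcal{P})$ to contain $\sigma(\mathcal{P})$, we need one additional assumption.

Corollary 7.35.1 Let $(X, \mathcal{P})$ be a paved space and assume that the complement of each set in $\mathcal{P}$ is in $\mathcal{S}(\mathcal{P})$. Then $\sigma(\mathcal{P}) \subset \mathcal{S}(\mathcal{P})$.

Proof The smallest algebra containing $\mathcal{P}$ consists of the finite intersections of finite unions of sets in $\mathcal{P}$ and complements of sets in $\mathcal{P}$. By Proposition 7.35, these sets are contained in $\mathcal{S}[\mathcal{S}[\mathcal{S}(\mathcal{P})]] = \mathcal{S}(\mathcal{P})$. Since $\mathcal{S}(\mathcal{P})$ is a monotone class, it contains the $\sigma$-algebra generated by $\mathcal{P}$ as well (Ash [A1, p. 19]). Q.E.D.

Definition 7.16 Let $X$ be a Borel space. Denote by $\mathcal{F}_X$ the collection of closed subsets of $X$. The analytic subsets of $X$ are the members of $\mathcal{S}(\mathcal{F}_X)$.

Corollary 7.35.2 Let $X$ be a Borel space. The countable intersections and unions of analytic subsets of $X$ are analytic.

Proof This follows from Proposition 7.35(b) and (c). Q.E.D.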
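The proofs of Proposition 7.35(b) and (c) rest on the fact that every positive integer $n$ can be written uniquely as $n = (2j-1)2^{k-1}$, so that the sets $\Pi_k = \{(2j-1)2^{k-1} \mid j \ge 1\}$ partition $\mathbb{N}$ into infinitely many infinite sets (a partition of this kind is also what Lemma 7.25 requires). The sketch below computes this pairing and the map $\varphi_k$ on a finite prefix of a sequence in $\mathcal{N}$; working with finite prefixes and the function names are assumptions made for the illustration.

```python
from typing import List, Tuple


def decode(n: int) -> Tuple[int, int]:
    """Return the unique (j, k) with n == (2*j - 1) * 2**(k - 1), for n >= 1."""
    k = 1
    while n % 2 == 0:  # strip factors of 2; k - 1 counts them
        n //= 2
        k += 1
    return (n + 1) // 2, k


def encode(j: int, k: int) -> int:
    return (2 * j - 1) * 2 ** (k - 1)


def partition_class(k: int, how_many: int) -> List[int]:
    """The first how_many elements of Pi_k = {(2j - 1) 2^(k-1) : j >= 1}."""
    return [encode(j, k) for j in range(1, how_many + 1)]


def phi_k(k: int, prefix: List[int]) -> List[int]:
    """Components of a finite prefix of z whose (1-based) indices lie in Pi_k, in order."""
    return [prefix[n - 1] for n in range(1, len(prefix) + 1) if decode(n)[1] == k]


z = [10, 20, 30, 40, 50, 60, 70, 80]
print(phi_k(1, z), phi_k(2, z), phi_k(3, z))
```

In part (b) the scheme value $S(\zeta_1, \ldots, \zeta_n)$ is read off from `decode(n)`, and in part (c) from `decode(zeta_1)`.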
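Proposition 7.36 Let $X$ be a Borel space. Then every Borel subset of $X$ is analytic. Indeed, the class of analytic sets $\mathcal{S}(\mathcal{F}_X)$ is equal to $\mathcal{S}(\mathcal{B}_X)$.

Proof Every open subset of $X$ is an $F_\sigma$ (Lemma 7.2), so every open set is analytic by Proposition 7.35(c) and (d). Corollary 7.35.1 implies $\mathcal{B}_X \subset \mathcal{S}(\mathcal{F}_X)$. Proposition 7.35(a) and (e) implies
$$\mathcal{S}(\mathcal{F}_X) \subset \mathcal{S}(\mathcal{B}_X) \subset \mathcal{S}[\mathcal{S}(\mathcal{F}_X)] = \mathcal{S}(\mathcal{F}_X).$$
Q.E.D.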
If the Borel space $X$ is countable, then every subset of $X$ is both analytic and Borel-measurable. If $X$ is uncountable, however, the class of analytic subsets of $X$ is strictly larger than $\mathcal{B}_X$. This is shown in Appendix B, where we prove the existence of an analytic set whose complement is not analytic. Note that an immediate consequence of Proposition 7.36 is that if $Y$ is a Borel subset of the Borel space $X$, then the analytic subsets of $Y$ are the analytic subsets of $X$ contained in $Y$. A generalization of this fact is the following.

Corollary 7.36.1 Let $X$ and $Y$ be Borel spaces and $\varphi: X \to Y$ a Borel isomorphism. Then $A \subset X$ is analytic if and only if $\varphi(A) \subset Y$ is analytic.

Proof If $\varphi: X \to Y$ is a Borel isomorphism and $A \subset X$ is analytic, then $A = N(S)$, where $S$ is a Suslin scheme for $\mathcal{F}_X$. Since $\varphi$ is one-to-one, it is easily seen that $\varphi(A) = N(\varphi \circ S)$, where $\varphi \circ S$ is the Suslin scheme for $\mathcal{B}_Y$ defined by $(\varphi \circ S)(s) = \varphi[S(s)]$, so $\varphi(A)$ is analytic by Proposition 7.36. If $\varphi(A) \subset Y$ is analytic, then $A \subset X$ is analytic by a similar argument. Q.E.D.
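We proceed to the development of several equivalent characterizations of analytic sets. The general definition of a Suslin scheme is unrestrictive with respect to the form of the mapping $S: \Sigma \to \mathcal{P}$. In the event that $X$ is a separable metric space and $\mathcal{P} = \mathcal{F}_X$, one can assume without loss of generality that $S$ has more structure.

Definition 7.17 Let $(X, \mathcal{P})$ be a paved space and $S$ a Suslin scheme for $\mathcal{P}$. The Suslin scheme $S$ is regular if for each $n \in \mathbb{N}$ and $(\sigma_1, \sigma_2, \ldots, \sigma_{n+1}) \in \Sigma$, we have
$$S(\sigma_1, \sigma_2, \ldots, \sigma_n, \sigma_{n+1}) \subset S(\sigma_1, \sigma_2, \ldots, \sigma_n).$$

Lemma 7.22 Let $(X, d)$ be a separable metric space and $S$ a Suslin scheme for $\mathcal{F}_X$. Then there exists a regular Suslin scheme $R$ for $\mathcal{F}_X$ such that $N(R) = N(S)$ and, for every $z = (\zeta_1, \zeta_2, \ldots) \in \mathcal{N}$,
$$\lim_{n\to\infty} \operatorname{diam} R(\zeta_1, \zeta_2, \ldots, \zeta_n) = 0 \quad \text{if } R(\zeta_1, \zeta_2, \ldots, \zeta_n) \ne \emptyset \ \ \forall n. \qquad (80)$$

Proof By the Lindelöf property, for each positive integer $k$, $X$ can be covered by a countable collection of open spheres of the form $B_{kj} = \{x \in X \mid d(x, x_{kj}) < 1/k\}$, $j = 1, 2, \ldots$. For $(\zeta_1, \tau_1, \zeta_2, \tau_2, \ldots) \in \mathcal{N}$, define
$$R(\zeta_1) = S(\zeta_1),$$
$$R(\zeta_1, \tau_1) = S(\zeta_1) \cap \bar B_{1\tau_1},$$
$$R(\zeta_1, \tau_1, \zeta_2) = S(\zeta_1) \cap \bar B_{1\tau_1} \cap S(\zeta_1, \zeta_2),$$
$$R(\zeta_1, \tau_1, \zeta_2, \tau_2) = S(\zeta_1) \cap \bar B_{1\tau_1} \cap S(\zeta_1, \zeta_2) \cap \bar B_{2\tau_2},$$
etc., where $\bar B_{kj}$ denotes the closure of $B_{kj}$. Thus
$$\bigcap_{s < (\zeta_1, \tau_1, \zeta_2, \tau_2, \ldots)} R(s) = \Big[\bigcap_{s < z} S(s)\Big] \cap \Big[\bigcap_{k=1}^{\infty} \bar B_{k\tau_k}\Big], \qquad (81)$$
where $z = (\zeta_1, \zeta_2, \ldots)$. It is clear that $R$ is a regular Suslin scheme for $\mathcal{F}_X$ and (80) holds. If $x \in N(R)$, then there exists $(\zeta_1, \tau_1, \zeta_2, \tau_2, \ldots) \in \mathcal{N}$ such that $x \in \bigcap_{s < (\zeta_1, \tau_1, \zeta_2, \tau_2, \ldots)} R(s)$, so by (81) $x \in \bigcap_{s < (\zeta_1, \zeta_2, \ldots)} S(s) \subset N(S)$, and therefore $N(R) \subset N(S)$. If $x \in N(S)$, then there exists $(\zeta_1, \zeta_2, \ldots) \in \mathcal{N}$ such that $x \in \bigcap_{s < (\zeta_1, \zeta_2, \ldots)} S(s)$. Since for each positive integer $k$, the collection $\{B_{kj} \mid j = 1, 2, \ldots\}$ covers $X$, there exists for each $k$ a positive integer $\tau_k$ for which $x \in B_{k\tau_k}$. Then $x \in \bigcap_{k=1}^{\infty} \bar B_{k\tau_k}$, and, by (81), $x \in \bigcap_{s < (\zeta_1, \tau_1, \zeta_2, \tau_2, \ldots)} R(s) \subset N(R)$, so $N(R) \supset N(S)$. It follows that $N(R) = N(S)$. Q.E.D.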
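Note that if a regular Suslin scheme $R$ satisfies (80), then for all $z$ in the set $\{z \in \mathcal{N} \mid \bigcap_{s < z} R(s) \ne \emptyset\}$, the set $\bigcap_{s < z} R(s)$ consists of a single point, say $f(z)$. Thus we have
$$N(R) = \Big\{ f(z) \ \Big|\ z \in \mathcal{N},\ \bigcap_{s < z} R(s) \ne \emptyset \Big\},$$
and this relation provides the basis for an alternative way of characterizing analytic sets. We have the following lemma.

Lemma 7.23 Let $(X, d)$ be a complete separable metric space. If $A \subset X$ is a nonempty analytic set, then there exist a closed subset $\mathcal{N}_1$ of $\mathcal{N}$ and a continuous function $f: \mathcal{N}_1 \to X$ such that $A = f(\mathcal{N}_1)$. Conversely, if $\mathcal{N}_1 \subset \mathcal{N}$ is closed and $f: \mathcal{N}_1 \to X$ is continuous, then $f(\mathcal{N}_1)$ is analytic.

Proof Let $A = N(R)$ be nonempty, where $R$ is a regular Suslin scheme for $\mathcal{F}_X$ satisfying (80). Define
$$\mathcal{N}_1 = \{z \in \mathcal{N} \mid R(s) \ne \emptyset \ \ \forall s < z\}.$$
Let $z = (\zeta_1, \zeta_2, \ldots)$ be in $\mathcal{N}_1$. Since for each $n$ we have $R(\zeta_1, \zeta_2, \ldots, \zeta_n) \ne \emptyset$, it is possible to choose $x_n \in R(\zeta_1, \zeta_2, \ldots, \zeta_n)$. The sequence $\{x_n\}$ is Cauchy by (80) and the regularity of $R$, and since $(X, d)$ is complete, $\{x_n\}$ has a limit $x \in X$. But for each $n$ the regularity of $R$ implies $\{x_m \mid m \ge n\} \subset R(\zeta_1, \zeta_2, \ldots, \zeta_n)$, so $x \in R(\zeta_1, \zeta_2, \ldots, \zeta_n)$ because this set is closed. Therefore $x \in \bigcap_{s < z} R(s)$. Now suppose $z \in \mathcal{N} - \mathcal{N}_1$. Then for some $s_0 < z$, we have $R(s_0) = \emptyset$. The open neighborhood $\{w \in \mathcal{N} \mid s_0 < w\}$ contains $z$ and is contained in $\mathcal{N} - \mathcal{N}_1$, so $\mathcal{N} - \mathcal{N}_1$ is open and $\mathcal{N}_1$ is closed. For $z \in \mathcal{N}_1$, define $f(z)$ to be the unique point in $\bigcap_{s < z} R(s)$. If $\{z_k\}$ is a sequence in $\mathcal{N}_1$ converging to $z_0 = (\zeta_1, \zeta_2, \ldots) \in \mathcal{N}_1$, then given $\varepsilon > 0$, (80) implies that there exists $s_n < z_0$ for which $\operatorname{diam} R(s_n) < \varepsilon$. For $k$ sufficiently large, $z_k \in \{z \in \mathcal{N} \mid s_n < z\}$, so $f(z_k) \in R(s_n)$. Therefore $d(f(z_k), f(z_0)) \le \operatorname{diam} R(s_n) < \varepsilon$, which shows that $f$ is continuous. Since $N(R) = f(\mathcal{N}_1)$ by the remark preceding the lemma, we have $A = f(\mathcal{N}_1)$.

For the converse, suppose $\mathcal{N}_1 \subset \mathcal{N}$ is closed and $f: \mathcal{N}_1 \to X$ is continuous. Define a regular Suslin scheme $R$ for $\mathcal{F}_X$ by
$$R(s) = \overline{f(\{z \in \mathcal{N}_1 \mid s < z\})},$$
where $R(s) = \emptyset$ if $\{z \in \mathcal{N}_1 \mid s < z\} = \emptyset$. If $z \in \mathcal{N}_1$, then $f(z) \in \bigcap_{s < z} R(s) \subset N(R)$, so $f(\mathcal{N}_1) \subset N(R)$. If $x \in N(R)$, then for some $z_0 = (\zeta_1, \zeta_2, \ldots) \in \mathcal{N}$ we have $x \in \bigcap_{s < z_0} R(s)$. Then for each $n$,
$$x \in \overline{f(\{z \in \mathcal{N}_1 \mid (\zeta_1, \zeta_2, \ldots, \zeta_n) < z\})},$$
so given $\varepsilon > 0$, there exists a $z_n \in \mathcal{N}_1$ with $(\zeta_1, \zeta_2, \ldots, \zeta_n) < z_n$ and $d(x, f(z_n)) < \varepsilon$. But as $n \to \infty$, $z_n$ must converge to $z_0$. The closedness of $\mathcal{N}_1$ implies $z_0 \in \mathcal{N}_1$, and the continuity of $f$ implies $d(x, f(z_0)) \le \varepsilon$. Since $\varepsilon > 0$ is arbitrary, we have $f(z_0) = x$, $x \in f(\mathcal{N}_1)$, and $N(R) \subset f(\mathcal{N}_1)$. Q.E.D.

We have thus characterized analytic sets as the continuous images of closed subsets of $\mathcal{N}$. We will obtain an even sharper characterization, for which we need the following lemma.

Lemma 7.24 If $\mathcal{N}_1$ is a nonempty closed subset of $\mathcal{N}$, then there exists a continuous function $g: \mathcal{N} \to \mathcal{N}$ such that $\mathcal{N}_1 = g(\mathcal{N})$.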
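Proof Use the Lindelöf property to cover $\mathcal{N}_1$ with a countable collection of nonempty closed sets $\{S(\zeta_1) \mid \zeta_1 \in \mathbb{N}\}$ which satisfy
$$\mathcal{N}_1 = \bigcup_{\zeta_1=1}^{\infty} S(\zeta_1), \qquad \operatorname{diam} S(\zeta_1) \le 1, \quad \zeta_1 = 1, 2, \ldots, \qquad (82)$$
where $d$ is a metric on $\mathcal{N}$ consistent with its topology and $\operatorname{diam} S(\zeta_1)$ is given by (9). Cover each $S(\zeta_1)$ with a countable collection of nonempty closed sets $\{S(\zeta_1, \zeta_2) \mid \zeta_2 \in \mathbb{N}\}$ which satisfy
$$S(\zeta_1) = \bigcup_{\zeta_2=1}^{\infty} S(\zeta_1, \zeta_2), \qquad S(\zeta_1, \zeta_2) \subset S(\zeta_1), \qquad \operatorname{diam} S(\zeta_1, \zeta_2) \le 1/2, \quad \zeta_2 = 1, 2, \ldots.$$
Continue in this manner so that, for any $(\zeta_1, \zeta_2, \ldots, \zeta_{n-1})$,
$$S(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}) = \bigcup_{\zeta_n=1}^{\infty} S(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}, \zeta_n), \qquad (83)$$
$$S(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}, \zeta_n) \subset S(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}), \quad \zeta_n = 1, 2, \ldots, \qquad (84)$$
$$\operatorname{diam} S(\zeta_1, \zeta_2, \ldots, \zeta_n) \le 1/n, \quad \zeta_n = 1, 2, \ldots. \qquad (85)$$
The completeness of $\mathcal{N}$ and (82)-(85) imply that for each $z \in \mathcal{N}$, $\bigcap_{s < z} S(s)$ consists of a single point. Define $g(z)$ to be this point. Then
$$g(\mathcal{N}) = \bigcup_{z \in \mathcal{N}} \ \bigcap_{s < z} S(s) = \mathcal{N}_1.$$
The continuity of $g$ follows by an argument similar to the one used in the proof of Lemma 7.23. Q.E.D.

Proposition 7.37 Let $X$ be a Borel space. A nonempty set $A \subset X$ is analytic if and only if $A = f(\mathcal{N})$ for some continuous function $f: \mathcal{N} \to X$.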
Proof If $X$ is complete (i.e., a complete separable metric space), the proposition follows from Lemmas 7.23 and 7.24. If $X$ is not complete, it is still homeomorphic to a Borel subset of a complete separable metric space, and the result follows from Corollary 7.36.1. Q.E.D.

Proposition 7.37 gives a very useful characterization of nonempty analytic sets in terms of continuous functions and the Baire null space $\mathcal{N}$. The Baire null space has a simple description and its topology allows considerable flexibility. We have already shown, for example, that it is homeomorphic to the space of irrationals in $[0, 1]$. Another important homeomorphism is the following.

Lemma 7.25 The space $\mathcal{N}$ is homeomorphic to any finite or countably infinite product of copies of itself.

Proof We prove the lemma for the case of a countably infinite product. Let $\Pi_1, \Pi_2, \ldots$ be a partition of $\mathbb{N}$, the set of positive integers, into infinitely many infinite sets. Define $\varphi: \mathcal{N} \to \mathcal{N}\mathcal{N}\cdots$ by
$$\varphi(z) = (z_1, z_2, \ldots), \qquad (86)$$
where $z_k$ consists of the components of $z$ with indices in $\Pi_k$. Then $\varphi$ is one-to-one and onto and, because convergence in a product space is componentwise, $\varphi$ is a homeomorphism. Q.E.D.

Combination of Lemma 7.25 with Proposition 7.37 gives the following.
Proposition 7.38 Let $X_1, X_2, \ldots$ be a sequence of Borel spaces and $A_k$ an analytic subset of $X_k$, $k = 1, 2, \ldots$. Then the sets $A_1 A_2 \cdots$ and $A_1 A_2 \cdots A_n$, $n = 1, 2, \ldots$, are analytic subsets of $X_1 X_2 \cdots$ and $X_1 X_2 \cdots X_n$, respectively.

Proof If some $A_k$ is empty, the products are empty and trivially analytic, so we may assume each $A_k$ is nonempty. Let $f_k: \mathcal{N} \to X_k$ be continuous and such that $A_k = f_k(\mathcal{N})$, $k = 1, 2, \ldots$. Let $\varphi$ be given by (86) and $F: \mathcal{N}\mathcal{N}\cdots \to X_1 X_2 \cdots$ be given by
$$F(z_1, z_2, \ldots) = (f_1(z_1), f_2(z_2), \ldots).$$
Then $F \circ \varphi$ is continuous and maps $\mathcal{N}$ onto $A_1 A_2 \cdots$. The finite products are handled similarly. Q.E.D.

Another consequence of Proposition 7.37 is that the continuous image of an analytic set, in particular, the projection of an analytic set, is analytic. As discussed at the beginning of this section, this property motivated our inquiry into analytic sets. We formalize this and a related fact to obtain another characterization of analytic sets.

Proposition 7.39 Let $X$ and $Y$ be Borel spaces and $A$ an analytic subset of $XY$. Then $\operatorname{proj}_X(A)$ is analytic. Conversely, given any analytic set $C \subset X$ and any uncountable Borel space $Y$, there is a Borel set $B \subset XY$ such that $C = \operatorname{proj}_X(B)$. If $Y = \mathcal{N}$, $B$ can be chosen to be closed.
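Proof If $A = f(\mathcal{N}) \subset XY$ is analytic, where $f$ is continuous, then $\operatorname{proj}_X(A) = (\operatorname{proj}_X \circ f)(\mathcal{N})$ is analytic by Proposition 7.37. If $C = f(\mathcal{N}) \subset X$ is nonempty and analytic, then
$$C = \operatorname{proj}_X[\operatorname{Gr}(f)],$$
where $\operatorname{Gr}(f) = \{(f(z), z) \in X\mathcal{N} \mid z \in \mathcal{N}\}$ is closed because $f$ is continuous. If $Y$ is any uncountable Borel space, then there exists a Borel isomorphism $\varphi$ from $\mathcal{N}$ onto $Y$ (Corollary 7.16.1). The mapping $\Phi$ defined by
$$\Phi(x, z) = (x, \varphi(z))$$
is a Borel isomorphism from $X\mathcal{N}$ onto $XY$, and $C = \operatorname{proj}_X(\Phi[\operatorname{Gr}(f)])$. Q.E.D.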
So far we have treated only the continuous images of analytic sets. With the aid of Proposition 7.39, we can consider their images under Borel-measurable functions as well.

Proposition 7.40 Let $X$ and $Y$ be Borel spaces and $f: X \to Y$ a Borel-measurable function. If $A \subset X$ is analytic, then $f(A)$ is analytic. If $B \subset Y$ is analytic, then $f^{-1}(B)$ is analytic.

Proof Suppose $A \subset X$ is analytic. By Proposition 7.39, there exists a Borel set $B \subset X\mathcal{N}$ such that $A = \operatorname{proj}_X(B)$. Define $\psi: B \to Y$ by $\psi(x, z) = f(x)$. Then $\psi$ is Borel-measurable, and Corollary 7.14.1 implies that $\operatorname{Gr}(\psi) \in \mathcal{B}_{X\mathcal{N}Y}$. Finally, $f(A) = \operatorname{proj}_Y[\operatorname{Gr}(\psi)]$ is analytic by Proposition 7.39. If $B \subset Y$ is analytic, then $B = N(S)$, where $S$ is some Suslin scheme for $\mathcal{F}_Y$. Then $f^{-1}(B) = N(f^{-1} \circ S)$, where $f^{-1} \circ S$ is the Suslin scheme for $\mathcal{B}_X$ defined by
$$(f^{-1} \circ S)(s) = f^{-1}[S(s)].$$
The analyticity of $f^{-1}(B)$ follows from Proposition 7.36. Q.E.D.
We summarize the equivalent definitions of analytic sets in Borel spaces.

Proposition 7.41 Let $X$ be a Borel space. The following definitions of the collection of analytic subsets of $X$ are equivalent:
(a) $\mathcal{S}(\mathcal{F}_X)$;
(b) $\mathcal{S}(\mathcal{B}_X)$;
(c) the empty set and the images of $\mathcal{N}$ under continuous functions from $\mathcal{N}$ into $X$;
(d) the projections into $X$ of the closed subsets of $X\mathcal{N}$;
(e) the projections into $X$ of the Borel subsets of $XY$, where $Y$ is an uncountable Borel space;
(f) the images of Borel subsets of $Y$ under Borel-measurable functions from $Y$ into $X$, where $Y$ is an uncountable Borel space.

Proof The only new characterization here is (f). If $Y$ is an uncountable Borel space and $f: Y \to X$ is Borel-measurable, then for every $B \in \mathcal{B}_Y$, $f(B)$ is analytic in $X$ by Proposition 7.40. To show that every nonempty analytic set $A \subset X$ can be obtained this way, let $\varphi$ be a Borel isomorphism from $Y$ onto $X\mathcal{N}$ and let $F \subset X\mathcal{N}$ be closed and satisfy $\operatorname{proj}_X(F) = A$. Define $B = \varphi^{-1}(F) \in \mathcal{B}_Y$. Then $(\operatorname{proj}_X \circ \varphi)(B) = A$. If $A = \emptyset$, then $f(\emptyset) = A$ for any Borel-measurable $f: Y \to X$. Q.E.D.
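As a concrete continuous map from $\mathcal{N}$ into $X = [0, 1]$, in the spirit of Proposition 7.41(c), one can use continued fractions, $(\zeta_1, \zeta_2, \ldots) \mapsto 1/(\zeta_1 + 1/(\zeta_2 + \cdots))$, which is the standard way to identify $\mathcal{N}$ with the irrationals in $[0, 1]$. The text does not say which homeomorphism it had in mind, so this is an assumed illustration. Sequences that agree in their first $n$ terms are mapped to nearby points, which is the kind of continuity used throughout this subsection. The sketch evaluates finite prefixes exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from typing import Sequence


def convergent(prefix: Sequence[int]) -> Fraction:
    """Exact value of the finite continued fraction 1/(z1 + 1/(z2 + ... + 1/zn))."""
    value = Fraction(0)
    for a in reversed(prefix):
        value = 1 / (a + value)
    return value


golden = (5 ** 0.5 - 1) / 2  # the image of z = (1, 1, 1, ...)

# Two sequences sharing their first 10 terms are mapped close together.
u = convergent([1] * 10 + [5, 2])
v = convergent([1] * 10 + [7, 3])

if __name__ == "__main__":
    print(float(convergent([1] * 30)), golden)
    print(float(abs(u - v)))
```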
7.6.2 Measurability Properties of Analytic Sets
At the beginning of this section we indicated that extended real-valued functions on a Borel space $X$ whose lower level sets are analytic arise naturally via partial infimization. Because the collection of analytic subsets of an uncountable Borel space is strictly larger than the Borel $\sigma$-algebra (Appendix B), such functions need not be Borel-measurable. Nonetheless, they can be integrated with respect to any probability measure on $(X, \mathcal{B}_X)$. To show this, we must discuss the measurability properties of analytic sets. If $X$ is a Borel space and $p \in P(X)$, we define $p$-outer measure, denoted by $p^*$, on the set of all subsets of $X$ by
$$p^*(E) = \inf\{p(B) \mid E \subset B,\ B \in \mathcal{B}_X\}. \qquad (87)$$
Outer measure on an increasing sequence of sets has a convergence property, namely, $p^*(E_n)\uparrow p^*\big(\bigcup_{n=1}^\infty E_n\big)$ if $E_1\subset E_2\subset\cdots$. This is easy to verify from (87) and also follows from Eq. (5) and Proposition A.1 of Appendix A (see also Ash [A1, Lemma 1.3.3(d)]). The collection of sets $\mathcal B_X(p)$ defined by
$$\mathcal B_X(p) = \{E\subset X\mid p^*(E) + p^*(X - E) = 1\}$$
is a σ-algebra (Ash [A1, Theorem 1.3.5]), called the completion of $\mathcal B_X$ with respect to $p$. It can be described as the class of sets of the form $B\cup N$ as $B$ ranges over $\mathcal B_X$ and $N$ ranges over all subsets of sets of $p$-measure zero in $\mathcal B_X$ (Ash [A1, p. 18]), and we have
$$p^*(B\cup N) = p(B).$$
Furthermore, $p^*$ restricted to $\mathcal B_X(p)$ is a probability measure, and is the only extension of $p$ to $\mathcal B_X(p)$ that is a probability measure. In what follows, we denote this measure also by $p$ and write $p(E)$ in place of $p^*(E)$ for all $E\in\mathcal B_X(p)$.

Definition 7.18 Let $X$ be a Borel space. The universal σ-algebra $\mathcal U_X$ is defined by
$$\mathcal U_X = \bigcap_{p\in P(X)}\mathcal B_X(p).$$
If $E\in\mathcal U_X$, we say $E$ is universally measurable.
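As a finite sanity check of (87) and of the definition of $\mathcal B_X(p)$, the Python sketch below computes $p^*$ by brute force over measurable covers and tests $p^*(E) + p^*(X - E) = 1$. It is a toy rather than a Borel space: by assumption, the role of $\mathcal B_X$ is played by the σ-algebra generated by a finite partition, and all function names are hypothetical. Subsets of a null block land in the completion; a proper piece of a block with positive mass does not.

```python
from itertools import chain, combinations


def generated_sigma_algebra(blocks):
    """All unions of blocks of a finite partition: the sigma-algebra it generates."""
    return [frozenset(chain.from_iterable(combo))
            for r in range(len(blocks) + 1)
            for combo in combinations(blocks, r)]


def make_measure(blocks, masses):
    """A probability measure known only on the generated sigma-algebra."""
    def p(B):
        # B is a union of blocks, so its mass is the total mass of the blocks inside it.
        return sum(m for b, m in zip(blocks, masses) if b <= B)
    return p


def outer_measure(E, sigma, p):
    """p*(E) = inf{p(B) : E is contained in B, B measurable}, cf. (87)."""
    E = frozenset(E)
    return min(p(B) for B in sigma if E <= B)


def in_completion(E, X, sigma, p):
    """E belongs to the completion B_X(p) iff p*(E) + p*(X - E) == 1."""
    E = frozenset(E)
    return outer_measure(E, sigma, p) + outer_measure(X - E, sigma, p) == 1


X = frozenset({0, 1, 2, 3})
blocks = [frozenset({0, 1}), frozenset({2, 3})]
sigma = generated_sigma_algebra(blocks)
p = make_measure(blocks, [0, 1])          # the block {0, 1} is a p-null set

print(in_completion({0}, X, sigma, p))    # True: {0} lies inside a null set
print(in_completion({2}, X, sigma, p))    # False: p*({2}) = p*({0, 1, 3}) = 1
```

On a genuine uncountable Borel space the infimum in (87) is of course not computable this way; the sketch only shows what the two defining formulas say.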
The usefulness of analytic sets in measure theory is in large degree derived from the following proposition.

Proposition 7.42 (Lusin's theorem) Let $X$ be a Borel space and $S$ a Suslin scheme for $\mathcal U_X$. Then $N(S)$ is universally measurable. In other words, $\mathcal S(\mathcal U_X) = \mathcal U_X$.
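Proof Denote $A = N(S)$, where $S$ is a Suslin scheme for $\mathcal U_X$. For $(\sigma_1,\dots,\sigma_k)\in\Sigma$, define
$$N(\sigma_1,\dots,\sigma_k) = \{(\zeta_1,\zeta_2,\dots)\in\mathcal N\mid \zeta_1 = \sigma_1,\dots,\zeta_k = \sigma_k\} \tag{88}$$
and
$$M(\sigma_1,\dots,\sigma_k) = \{(\zeta_1,\zeta_2,\dots)\in\mathcal N\mid \zeta_1\le\sigma_1,\dots,\zeta_k\le\sigma_k\}. \tag{89}$$
Define also
$$R(\sigma_1,\dots,\sigma_k) = \bigcup_{(\zeta_1,\zeta_2,\dots)\in M(\sigma_1,\dots,\sigma_k)}\ \bigcap_{j=1}^\infty S(\zeta_1,\dots,\zeta_j). \tag{90}$$
Then
$$R(\sigma_1,\dots,\sigma_k)\subset K(\sigma_1,\dots,\sigma_k), \tag{91}$$
where
$$K(\sigma_1,\dots,\sigma_k) = \bigcup_{\zeta_1\le\sigma_1,\dots,\zeta_k\le\sigma_k}\ \bigcap_{j=1}^k S(\zeta_1,\dots,\zeta_j). \tag{92}$$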
7.
BOREL SPACES AND THEIR PROBABILITY MEASURES
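As $\sigma_1\to\infty$, $M(\sigma_1)\uparrow\mathcal N$, so $R(\sigma_1)\uparrow A$. Likewise, as $\sigma_k\to\infty$, $M(\sigma_1,\dots,\sigma_{k-1},\sigma_k)\uparrow M(\sigma_1,\dots,\sigma_{k-1})$, so $R(\sigma_1,\dots,\sigma_{k-1},\sigma_k)\uparrow R(\sigma_1,\dots,\sigma_{k-1})$. Given $p\in P(X)$ and $\varepsilon > 0$, choose $\tau_1,\tau_2,\dots$ such that
$$p^*[R(\tau_1)]\ge p^*(A) - \varepsilon/2, \qquad p^*[R(\tau_1,\dots,\tau_k)]\ge p^*[R(\tau_1,\dots,\tau_{k-1})] - \varepsilon/2^k, \quad k = 2,3,\dots$$
Then
$$p^*[R(\tau_1,\dots,\tau_k)]\ge p^*(A) - \varepsilon, \qquad k = 1,2,\dots \tag{93}$$
The set $K(\tau_1,\dots,\tau_k)$ is universally measurable, so (91) and (93) imply
$$1 = p[K(\tau_1,\dots,\tau_k)] + p[X - K(\tau_1,\dots,\tau_k)]\ge p^*(A) - \varepsilon + p^*[X - K(\tau_1,\dots,\tau_k)], \qquad k = 1,2,\dots \tag{94}$$
We show that
$$\bigcap_{k=1}^\infty K(\tau_1,\dots,\tau_k)\subset A. \tag{95}$$
Let $x\in\bigcap_{k=1}^\infty K(\tau_1,\dots,\tau_k)$, so that
$$x\in\bigcup_{\zeta_1\le\tau_1,\dots,\zeta_k\le\tau_k}\ \bigcap_{j=1}^k S(\zeta_1,\dots,\zeta_j), \qquad k = 1,2,\dots \tag{96}$$
An argument by contradiction will be used to show that for some $\zeta_1\le\tau_1$ we have
$$x\in\bigcup_{\zeta_2\le\tau_2,\dots,\zeta_k\le\tau_k}\ \bigcap_{j=1}^k S(\zeta_1,\dots,\zeta_j), \qquad k = 1,2,\dots \tag{97}$$
If no such $\zeta_1$ existed, then for every $\zeta_1\le\tau_1$ there would exist a positive integer $k(\zeta_1)$ such that
$$x\notin\bigcup_{\zeta_2\le\tau_2,\dots,\zeta_{k(\zeta_1)}\le\tau_{k(\zeta_1)}}\ \bigcap_{j=1}^{k(\zeta_1)} S(\zeta_1,\dots,\zeta_j).$$
If $\bar k = \max_{\zeta_1\le\tau_1} k(\zeta_1)$, then, since these sets decrease as the number of indices grows,
$$x\notin\bigcup_{\zeta_1\le\tau_1,\dots,\zeta_{\bar k}\le\tau_{\bar k}}\ \bigcap_{j=1}^{\bar k} S(\zeta_1,\dots,\zeta_j) = K(\tau_1,\dots,\tau_{\bar k}),$$
and a contradiction is reached. Replace (96) by (97) and apply the same argument to establish the existence of $\zeta_2\le\tau_2$ such that
$$x\in\bigcup_{\zeta_3\le\tau_3,\dots,\zeta_k\le\tau_k}\ \bigcap_{j=1}^k S(\zeta_1,\dots,\zeta_j), \qquad k = 2,3,\dots$$
Continuing this process, construct a sequence $\zeta_1\le\tau_1,\ \zeta_2\le\tau_2,\dots$ such that
$$x\in\bigcap_{j=1}^\infty S(\zeta_1,\dots,\zeta_j)\subset N(S) = A.$$
This proves (95), i.e., as $k\to\infty$, $K(\tau_1,\dots,\tau_k)$ decreases to a set contained in $A$, and $X - K(\tau_1,\dots,\tau_k)$ increases to a set containing $X - A$. Letting $k\to\infty$ in (94), we obtain
$$1\ge p^*(A) - \varepsilon + p^*(X - A).$$
Since $\varepsilon > 0$ is arbitrary, this implies that
$$1\ge p^*(A) + p^*(X - A).$$
It is true for any $E\subset X$ that $p^*(E) + p^*(X - E)\ge 1$, so $1 = p^*(A) + p^*(X - A)$ and $A$ is measurable with respect to $p$. Since $p\in P(X)$ is arbitrary, $A\in\mathcal U_X$.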
Q.E.D.
Corollary 7.42.1 Let X be a Borel space. Every analytic subset of X is universally measurable.
Proof The closed subsets of $X$ are universally measurable, so $\mathcal S(\mathcal F_X)\subset\mathcal U_X$ by Proposition 7.42. Q.E.D.

As remarked earlier, the class of analytic subsets of an uncountable Borel space is not a σ-algebra, so there are universally measurable sets which are not analytic. In fact, we show in Appendix B that in any uncountable Borel space, the universal σ-algebra is strictly larger than the σ-algebra generated by the analytic subsets.

7.6.3 An Analytic Set of Probability Measures
In Proposition 7.25 we saw that when $X$ is a Borel space, the function $\theta_A: P(X)\to[0,1]$ defined by $\theta_A(p) = p(A)$ is Borel-measurable for every Borel-measurable $A\subset X$. We now investigate the properties of this function when $A$ is analytic. The main result is that the set $\{p\in P(X)\mid p(A)\ge c\}$ is analytic for each real $c$.

Proposition 7.43 Let $X$ be a Borel space and $A$ an analytic subset of $X$. For each $c\in R$, the set $\{p\in P(X)\mid p(A)\ge c\}$ is analytic.
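Proof Let $S$ be a Suslin scheme for $\mathcal F_X$, the class of closed subsets of $X$, such that $A = N(S)$. For $s\in\Sigma$, let $N(s)$, $M(s)$, $R(s)$, and $K(s)$ be defined by (88)-(90) and (92). Then (91) and (95) hold and each $K(s)$ is closed. We show that for $c\in R$
$$\{p\in P(X)\mid p(A)\ge c\} = \bigcap_{n=1}^\infty\ \bigcup_{(\zeta_1,\zeta_2,\dots)\in\mathcal N}\ \bigcap_{k=1}^\infty\{p\in P(X)\mid p[K(\zeta_1,\dots,\zeta_k)]\ge c - (1/n)\}. \tag{98}$$
If $p(A)\ge c$, then for any $n\ge 1$, there exists $(\tau_1,\tau_2,\dots)\in\mathcal N$ such that (93) is satisfied with $\varepsilon = 1/n$. Then by (91), for $k = 1,2,\dots$
$$p[K(\tau_1,\dots,\tau_k)]\ge p^*[R(\tau_1,\dots,\tau_k)]\ge p(A) - (1/n)\ge c - (1/n).$$
On the other hand, given $p$ in the right-hand side of (98), we have that for each $n$ there exists $(\zeta_1,\zeta_2,\dots)\in\mathcal N$ for which
$$p\Big[\bigcap_{k=1}^\infty K(\zeta_1,\dots,\zeta_k)\Big] = \lim_{k\to\infty} p[K(\zeta_1,\dots,\zeta_k)]\ge c - (1/n).$$
We have then from (95) that
$$p(A)\ge c - (1/n), \qquad n = 1,2,\dots,$$
so $p(A)\ge c$, and (98) is proved. Proposition 7.25 guarantees that for each $n\ge 1$ and $s\in\Sigma$, the set
$$S_n(s) = \{p\in P(X)\mid p[K(s)]\ge c - (1/n)\}$$
is Borel-measurable in $P(X)$. We have from (98) that
$$\{p\in P(X)\mid p(A)\ge c\} = \bigcap_{n=1}^\infty N(S_n),$$
where each $S_n$ is a Suslin scheme for $\mathcal B_{P(X)}$, and the proposition follows from Proposition 7.36 and Corollary 7.35.2. Q.E.D.

Corollary 7.43.1 Let $X$ be a Borel space and $A$ an analytic subset of $X$. For each $c\in R$, the set $\{p\in P(X)\mid p(A) > c\}$ is analytic.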
7.7
UNIVERSALLY MEASURABLE SELECTION
Proof For each $c\in R$,
$$\{p\in P(X)\mid p(A) > c\} = \bigcup_{n=1}^\infty\{p\in P(X)\mid p(A)\ge c + (1/n)\}.$$
The result follows from Corollary 7.35.2 and Proposition 7.43.
Q.E.D.
7.7 Lower Semianalytic Functions and Universally Measurable Selection
In a Borel space $X$, there are at least three σ-algebras which arise naturally†: the Borel σ-algebra $\mathcal B_X$ of Definition 7.6, the universal σ-algebra $\mathcal U_X$ of Definition 7.18, and the analytic σ-algebra $\mathcal A_X$, which we define now.

Definition 7.19 Let $X$ be a Borel space. The analytic σ-algebra $\mathcal A_X$ is the smallest σ-algebra containing the analytic subsets of $X$. In symbols, $\mathcal A_X = \sigma[\mathcal S(\mathcal F_X)]$. If $E\in\mathcal A_X$, we say that $E$ is analytically measurable.
From Proposition 7.36 and Lusin's theorem (Proposition 7.42), we have that for any Borel space $X$
$$\mathcal B_X\subset\mathcal S(\mathcal F_X)\subset\mathcal A_X\subset\mathcal U_X.$$
If $X$ is countable, each of these collections of sets is equal to the power set of $X$ (the collection of all subsets of $X$). We show in Appendix B that if $X$ is uncountable, each set containment above is strict. This fact will not be used in the constructive part of the theory, but only to give examples showing that results cannot be strengthened. Corresponding to the three σ-algebras just discussed, we will treat three types of measurability of functions. Borel-measurable functions were defined in Definition 7.8. The other two types are defined next.

Definition 7.20 Let $X$ and $Y$ be Borel spaces and $f$ a function mapping $D\subset X$ into $Y$. If $D\in\mathcal A_X$ and $f^{-1}(B)\in\mathcal A_X$ for every $B\in\mathcal B_Y$, $f$ is said to be analytically measurable. If $D\in\mathcal U_X$ and $f^{-1}(B)\in\mathcal U_X$ for every $B\in\mathcal B_Y$, $f$ is said to be universally measurable.
From the preceding discussion, we see that every Borel-measurable function is analytically measurable, and every analytically measurable function is universally measurable. The converses of these statements are false. We begin by stating for future reference the following characterization of the universal σ-algebra.
† A fourth σ-algebra, the limit σ-algebra $\mathcal L_X$, which lies between $\mathcal A_X$ and $\mathcal U_X$, is defined in Appendix B and treated there and in Section 11.1.
Lemma 7.26 Let $X$ be a Borel space and $E\subset X$. Then $E\in\mathcal U_X$ if and only if, given any $p\in P(X)$, there exists $B\in\mathcal B_X$ such that $p(E\,\Delta\,B) = 0$.
We turn now to the question of composition of measurable functions. If Borel-measurable functions are composed, the result is again Borel-measurable. Unfortunately, the composition of analytically measurable functions need not be analytically measurable (Appendix B). We have the following result for universally measurable functions.

Proposition 7.44 Let $X$, $Y$, and $Z$ be Borel spaces, $D\in\mathcal U_X$, and $E\in\mathcal U_Y$. Suppose $f: D\to Y$ and $g: E\to Z$ are universally measurable and $f(D)\subset E$. Then the composition $g\circ f$ is universally measurable.
Proof We must show that given $B\in\mathcal B_Z$, the set $f^{-1}[g^{-1}(B)]$ is universally measurable. Since $g^{-1}(B)\in\mathcal U_Y$, it suffices to prove that $f^{-1}(U)\in\mathcal U_X$ for every $U\in\mathcal U_Y$. For $p\in P(X)$, define $p'\in P(Y)$ by
$$p'(C) = p[f^{-1}(C)] \qquad \forall C\in\mathcal B_Y.$$
Let $V\in\mathcal B_Y$ be such that $p[f^{-1}(V)\,\Delta\,f^{-1}(U)] = p'(V\,\Delta\,U) = 0$. The set $f^{-1}(V)$ is in $\mathcal U_X$, so there exists $W\in\mathcal B_X$ for which $p[W\,\Delta\,f^{-1}(V)] = 0$. Then $p[W\,\Delta\,f^{-1}(U)] = 0$. The result follows from Lemma 7.26. Q.E.D.

The proof of Proposition 7.44 also establishes the following fact.

Corollary 7.44.1 Let $X$ and $Y$ be Borel spaces, $D\in\mathcal U_X$, and $f: D\to Y$ a universally measurable function. If $U\in\mathcal U_Y$, then $f^{-1}(U)\in\mathcal U_X$.
Since $\mathcal A_X\subset\mathcal U_X$, we can specialize these results to analytically measurable sets and functions.

Corollary 7.44.2 Let $X$, $Y$, and $Z$ be Borel spaces, $D\in\mathcal A_X$, and $E\in\mathcal A_Y$. Suppose $f: D\to Y$ and $g: E\to Z$ are analytically measurable and $f(D)\subset E$. Then the composition $g\circ f$ is universally measurable. If $A\in\mathcal A_Y$, then $f^{-1}(A)\in\mathcal U_X$.
We remind the reader that if $X$ and $Y$ are Borel spaces, a stochastic kernel $q(dy|x)$ on $Y$ given $X$ is said to be universally measurable if the mapping $\gamma(x) = q(dy|x)$ is universally measurable from $X$ to $P(Y)$ (Definition 7.12).

Corollary 7.44.3 Let $X$ and $Y$ be Borel spaces, let $f: X\to Y$ be a function, and let $q(dy|x)$ be a stochastic kernel on $Y$ given $X$ such that, for each $x\in X$, $q(dy|x)$ assigns probability one to the point $f(x)\in Y$. Then $q(dy|x)$ is universally measurable if and only if $f$ is universally measurable.
Proof Let $\delta: Y\to P(Y)$ be the homeomorphism defined by $\delta(y) = p_y$ (Corollary 7.21.1). Let $\gamma: X\to P(Y)$ be the mapping $\gamma(x) = q(dy|x)$. Then $\gamma = \delta\circ f$ and $f = \delta^{-1}\circ\gamma$. The result follows from Proposition 7.44. Q.E.D.
If $X$ is a Borel space and $f: X\to R^*$ is universally measurable, then given any $p\in P(X)$, $f$ is measurable with respect to the completed Borel σ-algebra $\mathcal B_X(p)$, and $\int f\,dp$ is defined by
$$\int f\,dp = \int f^+\,dp - \int f^-\,dp,$$
where the convention $\infty - \infty = \infty$ is used and the integrations are performed on the measure space $(X,\mathcal B_X(p),p)$. If $D\in\mathcal U_X$, the integral $\int_D f\,dp$ is defined similarly. Having thus defined $\int f\,dp$ without resort to $p$-outer measure, we have all the classical integration theorems at our disposal, provided that we take care with the addition of infinities. We proceed now to show that universally measurable stochastic kernels can be used to define probability measures on product spaces in the manner of Proposition 7.28. For this we need some preparatory lemmas.

Lemma 7.27 Let $X$ be a Borel space and $f: X\to R^*$. The function $f$ is universally measurable if and only if, for every $p\in P(X)$, there is a Borel-measurable function $f_p: X\to R^*$ such that $f(x) = f_p(x)$ for $p$ almost every $x$.

Proof Suppose $f$ is universally measurable and let $p\in P(X)$ be given. For $r\in Q^*$, let $U(r) = \{x\mid f(x)\le r\}$. Then $f(x) = \inf\{r\in Q^*\mid x\in U(r)\}$. Let $B(r)\in\mathcal B_X$ be such that $p[B(r)\,\Delta\,U(r)] = 0$. Define
$$f_p(x) = \inf\{r\in Q^*\mid x\in B(r)\} = \inf_{r\in Q^*}\psi_r(x),$$
where $\psi_r(x) = r$ if $x\in B(r)$ and $\psi_r(x) = \infty$ otherwise. Then $f_p: X\to R^*$ is Borel-measurable, and
$$\{x\mid f(x)\ne f_p(x)\}\subset\bigcup_{r\in Q^*}[B(r)\,\Delta\,U(r)]$$
has $p$-measure zero. Conversely, if, given $p\in P(X)$, there is a Borel-measurable $f_p$ such that $f(x) = f_p(x)$ for $p$ almost every $x$, then
$$\{x\mid f(x)\le c\}\in\mathcal B_X(p)$$
for every $c\in R^*$, since this set differs from the Borel set $\{x\mid f_p(x)\le c\}$ by a subset of a $p$-null set, and the universal measurability of $f$ follows.
Q.E.D.
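The construction of $f_p$ in the proof of Lemma 7.27 can be illustrated on a finite set. In the sketch below a finite grid of values stands in for $Q^*$ (an assumption that is harmless here only because $f$ takes its values in the grid), `rebuild` forms $\inf_r\psi_r$, and changing one level set on a $p$-null point changes the rebuilt function only at that point, in line with the inclusion $\{f\ne f_p\}\subset\bigcup_r[B(r)\,\Delta\,U(r)]$. The function names are illustrative only.

```python
import math


def level_sets(f, grid):
    """U(r) = {x : f(x) <= r} for every r in the grid (the grid stands in for Q*)."""
    return {r: frozenset(x for x in f if f[x] <= r) for r in grid}


def rebuild(sets, points):
    """f_p(x) = inf{r : x in B(r)} = inf_r psi_r(x), with psi_r = r on B(r), +inf off B(r)."""
    return {x: min((r for r, B in sets.items() if x in B), default=math.inf)
            for x in points}


f = {0: 0.0, 1: 1.5, 2: -2.0, 3: math.inf, 4: 1.5}
p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.0}   # the point 4 is p-null
grid = sorted(set(f.values()))                      # assumption: f takes values in the grid

U = level_sets(f, grid)
assert rebuild(U, f) == f                           # exact level sets give back f

# Perturb one level set on the null point 4: B(0) = U(0) together with {4}.
B = dict(U)
B[0.0] = U[0.0] | {4}
f_p = rebuild(B, f)

disagree = {x for x in f if f_p[x] != f[x]}
changed = set().union(*(B[r] ^ U[r] for r in grid))
print(disagree, changed, sum(p[x] for x in disagree))
```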
Lemma 7.27 can be used to give an alternative definition of $\int f\,dp$ when $f$ is a universally measurable, extended real-valued function on a Borel space $X$ and $p\in P(X)$. Letting $f_p$ be as in the proof of that lemma, we can define $\int f\,dp = \int f_p\,dp$. It is easy to show that this definition is equivalent to the one which precedes Lemma 7.27.
Lemma 7.28 Let $X$ and $Y$ be Borel spaces and let $q(dy|x)$ be a stochastic kernel on $Y$ given $X$. The following statements are equivalent:
(a) The stochastic kernel $q(dy|x)$ is universally measurable.
(b) For any $B\in\mathcal B_Y$, the mapping $\lambda_B: X\to R$ defined by $\lambda_B(x) = q(B|x)$ is universally measurable.
(c) For any $p\in P(X)$, there exists a Borel-measurable stochastic kernel $q_p(dy|x)$ on $Y$ given $X$ such that $q(dy|x) = q_p(dy|x)$ for $p$ almost every $x$.

Proof We show (a) ⇒ (b) ⇒ (c) ⇒ (a). Assume (a) holds. Then the function $\gamma: X\to P(Y)$ given by $\gamma(x) = q(dy|x)$ is universally measurable. If $B\in\mathcal B_Y$, $\lambda_B$ is defined as in (b), and $\theta_B: P(Y)\to R$ is given by $\theta_B(p) = p(B)$, then $\lambda_B = \theta_B\circ\gamma$, which is universally measurable by Propositions 7.25 and 7.44. Therefore (a) ⇒ (b).

Assume (b) holds and choose $p\in P(X)$. Since $Y$ is separable and metrizable, there exists a countable base $\mathcal G$ for the topology in $Y$. Let $\mathcal F$ be the collection of sets in $\mathcal G$ and their finite intersections. For $F\in\mathcal F$, let $f_F$ be a Borel-measurable function for which
$$f_F(x) = q(F|x) \qquad \forall x\in B_F,$$
where $B_F\in\mathcal B_X$ and $p(B_F) = 1$. Such an $f_F$ and $B_F$ exist by assumption (b) and Lemma 7.27. For $x\in\bigcap_{F\in\mathcal F}B_F$, let $q_p(dy|x) = q(dy|x)$. For $x\notin\bigcap_{F\in\mathcal F}B_F$, let $q_p(dy|x)$ be some fixed probability measure in $P(Y)$. Then $q(dy|x) = q_p(dy|x)$ for $p$ almost every $x$. The class of sets $G$ in $\mathcal B_Y$ for which $q_p(G|x)$ is Borel-measurable in $x$ is a Dynkin system containing $\mathcal F$. The class $\mathcal F$ is closed under finite intersections and generates $\mathcal B_Y$, so statement (c) follows from the Dynkin system theorem (Proposition 7.24). Therefore (b) ⇒ (c).

Assume (c) holds and choose $p\in P(X)$. Let $q_p(dy|x)$ be as in assumption (c) and define $\gamma,\gamma_p: X\to P(Y)$ by $\gamma(x) = q(dy|x)$, $\gamma_p(x) = q_p(dy|x)$. If $B\in\mathcal B_{P(Y)}$, then $p[\gamma^{-1}(B)\,\Delta\,\gamma_p^{-1}(B)] = 0$. Lemma 7.26 implies that $\gamma^{-1}(B)$ is universally measurable. Therefore (c) ⇒ (a). Q.E.D.
Lemma 7.29 Let $X$, $Y$, and $Z$ be Borel spaces and let $f: XY\to Z$ be a universally measurable function. For fixed $x\in X$, define $g_x: Y\to Z$ by
$$g_x(y) = f(x,y).$$
Then $g_x$ is universally measurable for every $x\in X$.

Proof For fixed $x_0\in X$, let $\varphi: Y\to XY$ be the continuous function defined by $\varphi(y) = (x_0,y)$. For $B\in\mathcal B_Z$,
$$g_{x_0}^{-1}(B) = \varphi^{-1}[f^{-1}(B)],$$
and this set is universally measurable by Corollary 7.44.1.
Q.E.D.
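It is worth noting that if $(\Omega_1,\mathcal F_1,p)$ and $(\Omega_2,\mathcal F_2,q)$ are probability spaces, then there are two natural σ-algebras on $\Omega_1\Omega_2$, namely, $\mathcal F_1\mathcal F_2$ and the completion $\overline{\mathcal F_1\mathcal F_2}$ of $\mathcal F_1\mathcal F_2$ with respect to $pq$. If $f: \Omega_1\Omega_2\to R$ is $\mathcal F_1\mathcal F_2$-measurable, then for every $\omega_1\in\Omega_1$, the function $g_{\omega_1}(\omega_2) = f(\omega_1,\omega_2)$ is $\mathcal F_2$-measurable. However, if $f$ is only $\overline{\mathcal F_1\mathcal F_2}$-measurable, then $g_{\omega_1}$ can be guaranteed to be measurable with respect to the completion of $\mathcal F_2$ (with respect to $q$) only for $p$ almost all $\omega_1$. The case treated by Lemma 7.29 is intermediate to these two, since $\mathcal U_X\mathcal U_Y\subset\mathcal U_{XY}$, and if $p\in P(X)$, $q\in P(Y)$, and $\overline{\mathcal B_X\mathcal B_Y}$ denotes the completion of $\mathcal B_X\mathcal B_Y$ with respect to $pq$, then $\mathcal U_{XY}\subset\overline{\mathcal B_X\mathcal B_Y}$. Note that the stronger result that $g_x(y)$ is $\mathcal U_Y$-measurable for every $x\in X$ holds, although the assumption that $f$ is $\mathcal U_{XY}$-measurable may be weaker than the assumption that $f$ is $\mathcal U_X\mathcal U_Y$-measurable. We now use the properties of universally measurable functions and stochastic kernels to extend Proposition 7.28.

Proposition 7.45 Let $X_1,X_2,\dots$ be a sequence of Borel spaces, $Y_n = X_1X_2\cdots X_n$ and $Y = X_1X_2\cdots$. Let $p\in P(X_1)$ be given and, for $n = 1,2,\dots$, let $q_n(dx_{n+1}|y_n)$ be a universally measurable stochastic kernel on $X_{n+1}$ given $Y_n$. Then for $n = 2,3,\dots$, there exist unique probability measures $r_n\in P(Y_n)$ such that
$$r_n(\underline X_1\underline X_2\cdots\underline X_n) = \int_{\underline X_1}\int_{\underline X_2}\cdots\int_{\underline X_{n-1}} q_{n-1}(\underline X_n|x_1,\dots,x_{n-1})\,q_{n-2}(dx_{n-1}|x_1,\dots,x_{n-2})\cdots q_1(dx_2|x_1)\,p(dx_1)$$
$$\forall\,\underline X_1\in\mathcal B_{X_1},\dots,\underline X_n\in\mathcal B_{X_n}. \tag{99}$$
If $f: Y_n\to R^*$ is universally measurable and either $\int f^+\,dr_n < \infty$ or $\int f^-\,dr_n < \infty$, then
$$\int_{Y_n} f\,dr_n = \int_{X_1}\int_{X_2}\cdots\int_{X_n} f(x_1,\dots,x_n)\,q_{n-1}(dx_n|x_1,\dots,x_{n-1})\cdots q_1(dx_2|x_1)\,p(dx_1). \tag{100}$$
Furthermore, there exists a unique probability measure $r\in P(Y)$ such that for each $n$ the marginal of $r$ on $Y_n$ is $r_n$.

Proof There is a Borel-measurable stochastic kernel $\bar q_1(dx_2|x_1)$ which agrees with $q_1(dx_2|x_1)$ for $p$ almost every $x_1$. Define $r_2\in P(Y_2)$ by specifying it on measurable rectangles to be (Proposition 7.28)
$$r_2(\underline X_1\underline X_2) = \int_{\underline X_1}\bar q_1(\underline X_2|x_1)\,p(dx_1).$$
Assume $f: Y_2\to[0,\infty]$ is universally measurable and let $\bar f: Y_2\to[0,\infty]$ be Borel-measurable and agree with $f$ on $Y_2 - N$, where $N\in\mathcal B_{Y_2}$ and $r_2(N) = 0$.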
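By Proposition 7.28,
$$0 = r_2(N) = \int_{X_1}\int_{X_2}\chi_N(x_1,x_2)\,\bar q_1(dx_2|x_1)\,p(dx_1) = \int_{X_1}\bar q_1(N_{x_1}|x_1)\,p(dx_1),$$
so $\bar q_1(N_{x_1}|x_1) = 0$ for $p$ almost every $x_1$. Now $f(x_1,x_2) = \bar f(x_1,x_2)$ for $x_2\notin N_{x_1}$, so
$$\int_{X_2} f(x_1,x_2)\,\bar q_1(dx_2|x_1) = \int_{X_2}\bar f(x_1,x_2)\,\bar q_1(dx_2|x_1)$$
for $p$ almost every $x_1$. It follows that
$$\int_{X_2}\bar f(x_1,x_2)\,\bar q_1(dx_2|x_1) = \int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)$$
for $p$ almost every $x_1$. The left-hand side is Borel-measurable by Proposition 7.29, so the right-hand side is universally measurable by Lemma 7.27. Furthermore,
$$\int_{Y_2} f\,dr_2 = \int_{Y_2}\bar f\,dr_2 = \int_{X_1}\int_{X_2}\bar f(x_1,x_2)\,\bar q_1(dx_2|x_1)\,p(dx_1) = \int_{X_1}\int_{X_2} f(x_1,x_2)\,q_1(dx_2|x_1)\,p(dx_1).$$
This proves (100) for $n = 2$ and $f\ge 0$. If $f: Y_2\to R^*$ is universally measurable and satisfies $\int f^+\,dr_2 < \infty$ or $\int f^-\,dr_2 < \infty$, then (100) holds for $f^+$ and $f^-$, so it holds for $f$ as well. Take $f = \chi_{\underline X_1\underline X_2}$ to obtain (99). Now assume the proposition holds for $n = k$. Let $\bar q_k(dx_{k+1}|y_k)$ be a Borel-measurable stochastic kernel which agrees with $q_k(dx_{k+1}|y_k)$ for $r_k$ almost every $y_k$. Define $r_{k+1}$ by specifying it on measurable rectangles to be
$$r_{k+1}(\underline X_1\cdots\underline X_k\underline X_{k+1}) = \int_{\underline X_1\cdots\underline X_k}\bar q_k(\underline X_{k+1}|y_k)\,r_k(dy_k).$$
Proceed as in the case of $n = 2$ to prove the proposition for $n = k+1$. (See also the proof of Proposition 7.28.) The existence of $r\in P(Y)$ such that the marginal of $r$ on $Y_n$ is $r_n$, $n = 2,3,\dots$, is proved exactly as in Proposition 7.28. Q.E.D.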
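On finite spaces the measures $r_n$ of Proposition 7.45 are products of point masses, and (100) is a rearrangement of a finite sum. The Python sketch below assumes $X_1 = X_2 = X_3 = \{0,1\}$ and hypothetical kernels $q_1$, $q_2$, uses exact `Fraction` arithmetic, builds $r_2$ and $r_3$, compares $\int f\,dr_3$ with the iterated integral on the right of (100), and checks that the marginal of $r_3$ on $Y_2$ is $r_2$. None of the measurability questions the proposition is really about arise in this setting.

```python
from fractions import Fraction as F


def compose(p, kernels):
    """r_n on Y_n = X_1 ... X_n from p and kernels q_1, ..., q_{n-1} (finite spaces).

    kernels[k](y) returns {x_{k+2}: probability} given the history y = (x_1, ..., x_{k+1}).
    """
    r = {(x1,): m for x1, m in p.items()}
    for q in kernels:
        r = {y + (x,): m * prob for y, m in r.items() for x, prob in q(y).items()}
    return r


def iterated(f, p, kernels):
    """Right-hand side of (100): integrate out x_n first, then x_{n-1}, ..., then x_1."""
    def inner(y, k):
        if k == len(kernels):
            return f(y)
        return sum(prob * inner(y + (x,), k + 1) for x, prob in kernels[k](y).items())
    return sum(m * inner((x1,), 0) for x1, m in p.items())


def q1(y):
    """Hypothetical kernel for x_2 given x_1."""
    return {0: F(1, 2), 1: F(1, 2)} if y[0] == 0 else {0: F(1, 4), 1: F(3, 4)}


def q2(y):
    """Hypothetical kernel for x_3 given (x_1, x_2)."""
    return {0: F(1), 1: F(0)} if y[0] == y[1] else {0: F(1, 5), 1: F(4, 5)}


def f(y):
    """A test function on Y_3."""
    return y[0] + 2 * y[1] + 4 * y[2]


p = {0: F(1, 3), 1: F(2, 3)}
r2 = compose(p, [q1])
r3 = compose(p, [q1, q2])

lhs = sum(f(y) * m for y, m in r3.items())     # integral of f with respect to r_3
rhs = iterated(f, p, [q1, q2])                 # iterated integral in (100)

marginal = {}
for (x1, x2, x3), m in r3.items():
    marginal[(x1, x2)] = marginal.get((x1, x2), 0) + m
print(sum(r3.values()), lhs == rhs, marginal == r2)
```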
In the course of proving Proposition 7.45, we have also established the following fact.

Proposition 7.46 Let $X$ and $Y$ be Borel spaces and let $f: XY\to R^*$ be universally measurable. Let $q(dy|x)$ be a universally measurable stochastic kernel on $Y$ given $X$. Then the mapping $\lambda: X\to R^*$ defined by
$$\lambda(x) = \int_Y f(x,y)\,q(dy|x)$$
is universally measurable.

Corollary 7.46.1 Let $X$ be a Borel space and let $f: X\to R^*$ be universally measurable. Then the function $\theta_f: P(X)\to R^*$ defined by
$$\theta_f(p) = \int_X f\,dp$$
is universally measurable.

Proof Define a universally measurable stochastic kernel on $X$ given $P(X)$ by $q(dx|p) = p(dx)$. Apply Proposition 7.46. Q.E.D.

As mentioned previously, the functions obtained by infimizing bivariate, extended real-valued, Borel-measurable functions over one of their variables have analytic lower level sets. We give these functions a name.

Definition 7.21 Let $X$ be a Borel space, $D\subset X$, and $f: D\to R^*$. If $D$ is analytic and the set $\{x\in D\mid f(x) < c\}$ is analytic for every $c\in R$, then $f$ is said to be lower semianalytic.
It is apparent from the definition that a lower semianalytic function is analytically measurable. We state some characterizations and basic properties of lower semianalytic functions as a lemma.

Lemma 7.30 (1) Let $X$ be a Borel space, $D$ an analytic subset of $X$, and $f: D\to R^*$. The following statements are equivalent.
(a) The function $f$ is lower semianalytic, i.e., the set
$$\{x\in D\mid f(x) < c\} \tag{101}$$
is analytic for every $c\in R$.
(b) The set (101) is analytic for every $c\in R^*$.
(c) The set
$$\{x\in D\mid f(x)\le c\} \tag{102}$$
is analytic for every $c\in R$.
(d) The set (102) is analytic for every $c\in R^*$.
(2) Let $X$ be a Borel space, $D$ an analytic subset of $X$, and $f_n: D\to R^*$, $n = 1,2,\dots$, a sequence of lower semianalytic functions. Then the functions $\inf_n f_n$, $\sup_n f_n$, $\liminf_{n\to\infty} f_n$, and $\limsup_{n\to\infty} f_n$ are lower semianalytic. In particular, if $f_n\to f$, then $f$ is lower semianalytic.
(3) Let $X$ and $Y$ be Borel spaces, $g: X\to Y$, and $f: g(X)\to R^*$. If $g$ is Borel-measurable and $f$ is lower semianalytic, then $f\circ g$ is lower semianalytic.
(4) Let $X$ be a Borel space, $D$ an analytic subset of $X$, and $f,g: D\to R^*$. If $f$ and $g$ are lower semianalytic, then $f + g$ is lower semianalytic. If, in addition, $g$ is Borel-measurable and $g\ge 0$, or if $f\ge 0$ and $g\ge 0$, then $fg$ is lower semianalytic, where we define $0\cdot\infty = \infty\cdot 0 = 0\cdot(-\infty) = (-\infty)\cdot 0 = 0$.

Proof (1) We show (b) ⇒ (a) ⇒ (d) ⇒ (c) ⇒ (b). It is clear that (b) ⇒ (a). If (a) holds, then
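$\{x\in D\mid f(x)\le\infty\} = D$ is analytic by definition, while the sets
$$\{x\in D\mid f(x)\le -\infty\} = \bigcap_{n=1}^\infty\{x\in D\mid f(x) < -n\}, \qquad \{x\in D\mid f(x)\le c\} = \bigcap_{n=1}^\infty\{x\in D\mid f(x) < c + (1/n)\}, \quad c\in R,$$
are analytic by Corollary 7.35.2. Therefore (a) ⇒ (d). It is clear that (d) ⇒ (c). If (c) holds, then the sets
$$\{x\in D\mid f(x) < \infty\} = \bigcup_{n=1}^\infty\{x\in D\mid f(x)\le n\}, \qquad \{x\in D\mid f(x) < c\} = \bigcup_{n=1}^\infty\{x\in D\mid f(x)\le c - (1/n)\}, \quad c\in R,$$
are analytic by Corollary 7.35.2, while $\{x\in D\mid f(x) < -\infty\} = \varnothing$. Therefore (c) ⇒ (b).
(2) For $c\in R$,
$$\{x\in D\mid \inf_n f_n(x) < c\} = \bigcup_{n=1}^\infty\{x\in D\mid f_n(x) < c\}, \qquad \{x\in D\mid \sup_n f_n(x)\le c\} = \bigcap_{n=1}^\infty\{x\in D\mid f_n(x)\le c\},$$
so $\inf_n f_n$ and $\sup_n f_n$ are lower semianalytic by Corollary 7.35.2 and part (1). Then
$$\liminf_{n\to\infty} f_n = \sup_{n\ge 1}\inf_{k\ge n} f_k \qquad\text{and}\qquad \limsup_{n\to\infty} f_n = \inf_{n\ge 1}\sup_{k\ge n} f_k$$
are lower semianalytic as well.
(3) The domain $g(X)$ of $f$ is analytic by Proposition 7.40. For $c\in R$,
$$\{x\in X\mid f[g(x)] < c\} = g^{-1}(\{y\in g(X)\mid f(y) < c\})$$
is analytic by the same proposition.
(4) For $c\in R$,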
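$$\{x\in D\mid f(x) + g(x) < c\} = \bigcup_{r\in Q}\big[\{x\in D\mid f(x) < r\}\cap\{x\in D\mid g(x) < c - r\}\big],$$
and this is true even if $f(x) + g(x) = \infty - \infty = \infty$ for some $x\in D$. From Corollary 7.35.2 it follows that $f + g$ is lower semianalytic whenever $f$ and $g$ are. Now suppose $g$ is Borel-measurable and $g\ge 0$. For $c > 0$, we have
$$\{x\in D\mid f(x)g(x) < c\} = \{x\in D\mid g(x) = 0\}\cup\big[\{x\in D\mid f(x)\le 0\}\cap\{x\in D\mid g(x) = \infty\}\big]\cup\bigcup_{r\in Q}\big[\{x\in D\mid f(x) < r\}\cap\{x\in D\mid 0 < g(x) < \infty,\ rg(x) < c\}\big],$$
while if $c\le 0$, we have
$$\{x\in D\mid f(x)g(x) < c\} = \big[\{x\in D\mid f(x) < 0\}\cap\{x\in D\mid g(x) = \infty\}\big]\cup\bigcup_{r\in Q}\big[\{x\in D\mid f(x) < r\}\cap\{x\in D\mid 0 < g(x) < \infty,\ rg(x) < c\}\big].$$
In both cases, the set $\{x\in D\mid f(x)g(x) < c\}$ is analytic by part (1) and Corollary 7.35.2. Suppose $f$ and $g$ are both lower semianalytic and nonnegative. For $c > 0$,
$$\{x\in D\mid f(x)g(x) < c\} = \{x\in D\mid f(x)\le 0\}\cup\{x\in D\mid g(x)\le 0\}\cup\bigcup_{r,s\in Q,\ r,s > 0,\ rs\le c}\big[\{x\in D\mid f(x) < r\}\cap\{x\in D\mid g(x) < s\}\big]$$
is analytic as before, and for $c\le 0$, this set is empty. It follows that $fg$ is lower semianalytic under either set of assumptions on $f$ and $g$. Q.E.D.

Note in connection with Lemma 7.30(3) that the composition of a Borel-measurable function with a lower semianalytic function can be guaranteed to be lower semianalytic only when the composition is in the order specified. To see this, let $X$ be a Borel space and $A\subset X$ be an analytic set whose complement is not analytic (see Appendix B). Define $f(x) = -\chi_A(x)$, which is lower semianalytic, because $\{x\in X\mid f(x) < c\}$ is either $\varnothing$, $A$, or $X$, depending on the value of $c$. Let $g: R^*\to R^*$ be given by $g(c) = -c$. Then $\chi_A = g\circ f$, and this function is not lower semianalytic, since $\{x\in X\mid\chi_A(x) < 1/2\} = A^c$. This also provides us with an example of an analytically measurable function which is not lower semianalytic.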
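The level-set identity used for $f + g$ in the proof of Lemma 7.30(4) can be spot-checked numerically. In the sketch below a grid of half-integers replaces $Q$; that is enough only because the finite values of $f$, $g$, and $c$ are integers, which is an assumption of the example. Infinite values are included to exercise the convention $\infty - \infty = \infty$; the function names are hypothetical.

```python
import math


def sum_sublevel(f, g, c):
    """{x : f(x) + g(x) < c}. In Python inf + (-inf) is nan and nan < c is False,
    which agrees with the convention inf - inf = inf for this test."""
    return {x for x in f if f[x] + g[x] < c}


def union_over_rationals(f, g, c, grid):
    """Union over r in the grid of {f < r} intersected with {g < c - r}."""
    return set().union(*({x for x in f if f[x] < r and g[x] < c - r} for r in grid))


f = {0: 1, 1: -3, 2: 0, 3: math.inf, 4: -math.inf, 5: 2}
g = {0: 0, 1: 3, 2: -1, 3: -math.inf, 4: 2, 5: math.inf}
grid = [k + 0.5 for k in range(-10, 10)]   # half-integers suffice for integer data

print(all(sum_sublevel(f, g, c) == union_over_rationals(f, g, c, grid)
          for c in range(-6, 7)))
```

For real-valued data the grid would have to be all of $Q$, which is exactly why the proof takes a countable union.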
Proposition 7.47 Let $X$ and $Y$ be Borel spaces, let $D$ be an analytic subset of $XY$, and let $f: D\to R^*$ be lower semianalytic. Then the function $f^*: \mathrm{proj}_X(D)\to R^*$ defined by
$$f^*(x) = \inf_{y\in D_x} f(x,y) \tag{103}$$
is lower semianalytic. Conversely, if $f^*: X\to R^*$ is a given lower semianalytic function and $Y$ is an uncountable Borel space, then there exists a Borel-measurable function $f: XY\to R^*$ which satisfies (103) with $D = XY$.

Proof For the first part of the theorem, observe that if $f: D\to R^*$ is lower semianalytic and $c\in R$, the set
$$\{x\in\mathrm{proj}_X(D)\mid f^*(x) < c\} = \mathrm{proj}_X(\{(x,y)\in D\mid f(x,y) < c\})$$
is analytic by Proposition 7.39. For the converse part of the theorem, let $f^*: X\to R^*$ be lower semianalytic and let $Y$ be an uncountable Borel space. For $r\in Q$, let $A(r) = \{x\in X\mid f^*(x) < r\}$. Then $A(r)$ is analytic and, by Proposition 7.39, there exists $B(r)\in\mathcal B_{XY}$ such that $A(r) = \mathrm{proj}_X[B(r)]$. Define $G(r) = \bigcup_{s\in Q,\ s\le r} B(s)$ and $f: XY\to R^*$ by
$$f(x,y) = \inf_{r\in Q}\psi_r(x,y),$$
where $\psi_r(x,y) = r$ if $(x,y)\in G(r)$ and $\psi_r(x,y) = \infty$ otherwise. Then $f$ is Borel-measurable. Let $g$ be defined by $g(x) = \inf_{y\in Y} f(x,y)$. We show that $f^*(x) = g(x)$ for every $x\in X$. If $f^*(x) < c$ for some $c\in R$, then there exists $r\in Q$ for which $f^*(x) < r < c$, and so $x\in A(r)$. There exists $y\in Y$ such that $(x,y)\in G(r)$, and, consequently, $f(x,y)\le r$ and $g(x)\le r < c$. Therefore $g(x)$ cannot be greater than $f^*(x)$. If $g(x) < c$ for some $c\in R$, then there exist $r\in Q$ and $y\in Y$ for which $g(x) < r < c$ and $(x,y)\in G(r)$. Thus for some $s\in Q$, $s\le r$, we have $(x,y)\in B(s)$ and $x\in A(s)$. This implies $f^*(x) < s\le r < c$, which shows that $f^*(x)$ cannot be greater than $g(x)$. Q.E.D.

Proposition 7.48 Let $X$ and $Y$ be Borel spaces, $f: XY\to R^*$ lower semianalytic, and $q(dy|x)$ a Borel-measurable stochastic kernel on $Y$ given $X$. Then the function $\lambda: X\to R^*$ defined by
$$\lambda(x) = \int_Y f(x,y)\,q(dy|x)$$
is lower semianalytic.

Proof Suppose $f\ge 0$. Let $f_n(x,y) = \min\{n, f(x,y)\}$. Then each $f_n$ is lower semianalytic and $f_n\uparrow f$. The set
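$$E_n = \{(x,y,b)\in XYR\mid f_n(x,y)\le b\le n\}$$
is analytic in $XYR$ by Corollary 7.35.2 and Proposition 7.38. Let $\mu$ be Lebesgue measure on $R$, $p\in P(XY)$, and $p\mu$ the product measure on $XYR$. By Fubini's theorem,
$$(p\mu)(E_n) = \int_{XY}\int_R\chi_{E_n}\,d\mu\,dp = n - \int_{XY} f_n(x,y)\,dp.$$
For $c\in R$ we have, by the monotone convergence theorem,
$$\Big\{p\in P(XY)\ \Big|\ \int_{XY} f\,dp\le c\Big\} = \bigcap_{n=1}^\infty\Big\{p\in P(XY)\ \Big|\ \int_{XY} f_n\,dp\le c\Big\} = \bigcap_{n=1}^\infty\{p\in P(XY)\mid (p\mu)(E_n)\ge n - c\}.$$
Hence, by Proposition 7.43 and the fact that the mapping $p\to p\mu$ is continuous (Lemma 7.12), the function $\theta_f: P(XY)\to R^*$ defined by $\theta_f(p) = \int f(x,y)\,dp$ is lower semianalytic. We have $\lambda(x) = \theta_f[p_x\,q(dy|x)]$. Since the mapping $x\to q(dy|x)$ is Borel-measurable from $X$ to $P(Y)$ and the mappings $x\to p_x$ and $(p_x, q(dy|x))\to p_x\,q(dy|x)$ are continuous from $X$ to $P(X)$ and $P(X)P(Y)$ to $P(XY)$, respectively (Corollary 7.21.1 and Lemma 7.12), it follows from Lemma 7.30(3) that $\lambda$ is lower semianalytic.

Suppose $f\le 0$. Let $f_n(x,y) = \max\{-n, f(x,y)\}$. Then each $f_n$ is lower semianalytic and $f_n\downarrow f$. The sets $E_n = \{(x,y,b)\in XYR\mid f_n(x,y)\le b\le 0\}$ are analytic and
$$(p\mu)(E_n) = \int_{XY}\int_R\chi_{E_n}\,d\mu\,dp = -\int_{XY} f_n(x,y)\,dp.$$
For $c\in R$,
$$\Big\{p\in P(XY)\ \Big|\ \int_{XY} f\,dp < c\Big\} = \bigcup_{n=1}^\infty\Big\{p\in P(XY)\ \Big|\ \int_{XY} f_n\,dp < c\Big\} = \bigcup_{n=1}^\infty\{p\in P(XY)\mid (p\mu)(E_n) > -c\}.$$
Proceed as before. In the general case,
$$\lambda(x) = \int_Y f^+(x,y)\,q(dy|x) + \int_Y[-f^-(x,y)]\,q(dy|x).$$
The functions $f^+$ and $-f^-$ are lower semianalytic, so by the preceding arguments each of the summands on the right is lower semianalytic. The result follows from Lemma 7.30(4). Q.E.D.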
Corollary 7.48.1 Let $X$ be a Borel space and let $f: X\to R^*$ be lower semianalytic. Then the function $\theta_f: P(X)\to R^*$ defined by
$$\theta_f(p) = \int_X f\,dp$$
is lower semianalytic.

Proof Define a Borel-measurable stochastic kernel on $X$ given $P(X)$ by $q(dx|p) = p(dx)$. Apply Proposition 7.48. Q.E.D.

As an aid in proving the selection theorem for lower semianalytic functions, we give a result concerning selection in an analytic subset of a product of Borel spaces. The reader will notice a strong resemblance between this result and Lemma 7.21, which was instrumental in proving the selection theorem for upper semicontinuous functions.

Proposition 7.49 (Jankov-von Neumann theorem) Let $X$ and $Y$ be Borel spaces and $A$ an analytic subset of $XY$. There exists an analytically measurable function $\varphi: \mathrm{proj}_X(A)\to Y$ such that $\mathrm{Gr}(\varphi)\subset A$.

Proof (See Fig. 7.2.) Let $f: \mathcal N\to XY$ be continuous such that $A = f(\mathcal N)$. Let $g = \mathrm{proj}_X\circ f$. Then $g: \mathcal N\to X$ is continuous from $\mathcal N$ onto $\mathrm{proj}_X(A)$. For $x\in\mathrm{proj}_X(A)$, $g^{-1}(\{x\})$ is a closed nonempty subset of $\mathcal N$. Let $\zeta_1(x)$ be the smallest integer which is the first component of an element $z_1\in g^{-1}(\{x\})$. Let $\zeta_2(x)$ be the smallest integer which is the second component of an element $z_2\in g^{-1}(\{x\})$ whose first component is $\zeta_1(x)$. In general, let $\zeta_k(x)$ be the smallest integer which is the $k$th component of an element $z_k\in g^{-1}(\{x\})$ whose first $k-1$ components are $\zeta_1(x),\dots,\zeta_{k-1}(x)$. Let $\psi(x) = (\zeta_1(x),\zeta_2(x),\dots)$. Since $z_k\to\psi(x)$ and $g^{-1}(\{x\})$ is closed, we have
$$\psi(x)\in g^{-1}(\{x\}). \tag{104}$$
Define $\varphi: \mathrm{proj}_X(A)\to Y$ by $\varphi = \mathrm{proj}_Y\circ f\circ\psi$, so that $\mathrm{Gr}(\varphi)\subset A$.
FIGURE 7.2
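The rule used to build $\psi$ above picks, coordinate by coordinate, the least admissible integer, which amounts to taking the lexicographically smallest point of the fiber $g^{-1}(\{x\})$. The Python sketch below checks this on a finite truncation, assuming sequences of length 3 with entries in $\{1,2,3\}$ as a stand-in for $\mathcal N$ and a hypothetical map in place of the continuous $f$. It involves no topology, so it illustrates only the selection rule and the inclusion $\mathrm{Gr}(\varphi)\subset A$, not the measurability argument that follows.

```python
from itertools import product


def least_section(fiber):
    """Greedy rule from the proof: zeta_k is the least k-th component among fiber
    points whose first k - 1 components are zeta_1, ..., zeta_{k-1}."""
    candidates = list(fiber)
    chosen = []
    for k in range(len(candidates[0])):
        best = min(z[k] for z in candidates)
        chosen.append(best)
        candidates = [z for z in candidates if z[k] == best]
    return tuple(chosen)


def selector(points, f):
    """Return psi and phi = proj_Y o f o psi on proj_X(A), where A = f(points)."""
    fibers = {}
    for z in points:
        x, _ = f(z)                           # g = proj_X o f
        fibers.setdefault(x, []).append(z)
    psi = {x: least_section(zs) for x, zs in fibers.items()}
    phi = {x: f(z)[1] for x, z in psi.items()}
    return psi, phi


def f(z):
    """Hypothetical map from truncated sequences into X x Y."""
    return sum(z) % 4, 10 * z[0] + z[2]


points = list(product(range(1, 4), repeat=3))   # finite stand-in for N
A = {f(z) for z in points}
psi, phi = selector(points, f)
for x in sorted(psi):
    print(x, psi[x], phi[x])
```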
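We show that $\varphi$ is analytically measurable. As in the proof of Proposition 7.42, for $(\sigma_1,\dots,\sigma_k)\in\Sigma$ let
$$M(\sigma_1,\dots,\sigma_k) = \{(\zeta_1,\zeta_2,\dots)\in\mathcal N\mid \zeta_1\le\sigma_1,\dots,\zeta_k\le\sigma_k\}.$$
We first show that $\psi$ is analytically measurable, i.e., $\psi^{-1}(\mathcal B_{\mathcal N})\subset\mathcal A_X$. Since $\{N(s)\mid s\in\Sigma\}$ is a base for the topology on $\mathcal N$, by the remark following Definition 7.6, we have $\sigma(\{N(s)\mid s\in\Sigma\}) = \mathcal B_{\mathcal N}$, and it suffices to prove
$$\psi^{-1}[N(s)]\in\mathcal A_X \qquad \forall s\in\Sigma. \tag{105}$$
We claim that for $s = (\sigma_1,\sigma_2,\dots,\sigma_k)\in\Sigma$
$$\psi^{-1}[N(s)] = g[M(s)] - \bigcup_{j=1}^k g[M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1)], \tag{106}$$
where $M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1) = \varnothing$ if $\sigma_j - 1 = 0$. We show this by proving that $\psi^{-1}[N(s)]$ is a subset of the set on the right-hand side of (106) and vice versa. Suppose $x\in\psi^{-1}[N(s)]$. Let $\psi(x) = (\zeta_1(x),\zeta_2(x),\dots)$. Then
$$\psi(x)\in N(s)\subset M(s), \tag{107}$$
so (104) implies
$$x = g[\psi(x)]\in g[M(s)]. \tag{108}$$
Relation (107) also implies $\zeta_1(x) = \sigma_1,\dots,\zeta_k(x) = \sigma_k$. By the construction of $\psi$, we have that $\sigma_1$ is the smallest integer which is the first component of an element of $g^{-1}(\{x\})$, and for $j = 2,\dots,k$, $\sigma_j$ is the smallest integer which is the $j$th component of an element of $g^{-1}(\{x\})$ whose first $j-1$ components are $\sigma_1,\dots,\sigma_{j-1}$. In other words,
$$g^{-1}(\{x\})\cap M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1) = \varnothing, \qquad j = 1,\dots,k.$$
It follows that
$$x\notin g[M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1)], \qquad j = 1,\dots,k. \tag{109}$$
Relations (108) and (109) imply
$$\psi^{-1}[N(s)]\subset g[M(s)] - \bigcup_{j=1}^k g[M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1)]. \tag{110}$$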
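To prove the reverse set containment, suppose
$$x\in g[M(s)] - \bigcup_{j=1}^k g[M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1)]. \tag{111}$$
Since $x\in g[M(s)]$, there must exist $y = (y_1,y_2,\dots)\in g^{-1}(\{x\})$ such that
$$y_1\le\sigma_1,\ \dots,\ y_k\le\sigma_k. \tag{112}$$
Clearly, $x\in\mathrm{proj}_X(A) = g(\mathcal N)$, so $\psi(x)$ is defined. Let $\psi(x) = (\zeta_1(x),\zeta_2(x),\dots)$. By (104), we have $g[\psi(x)] = x$, so (111) implies
$$\psi(x)\notin M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1), \qquad j = 1,\dots,k.$$
Since $\psi(x)\notin M(\sigma_1 - 1)$, we know that $\zeta_1(x)\ge\sigma_1$. But $\zeta_1(x)$ is the smallest integer which is the first component of an element of $g^{-1}(\{x\})$, so (112) implies $\zeta_1(x)\le y_1\le\sigma_1$. Therefore $\zeta_1(x) = \sigma_1$, and hence also $y_1 = \sigma_1$. Similarly, since $\psi(x)\notin M(\zeta_1(x),\sigma_2 - 1)$, we have $\zeta_2(x)\ge\sigma_2$. Again from (112) we see that $\zeta_2(x)\le y_2\le\sigma_2$, so $\zeta_2(x) = \sigma_2$. Continuing in this manner, we show that $\psi(x)\in N(s)$, i.e., $x\in\psi^{-1}[N(s)]$ and
$$\psi^{-1}[N(s)]\supset g[M(s)] - \bigcup_{j=1}^k g[M(\sigma_1,\dots,\sigma_{j-1},\sigma_j - 1)]. \tag{113}$$
Relations (110) and (113) imply (106). We note now that $M(t)$ is open in $\mathcal N$ for every $t\in\Sigma$, so $g[M(t)]$ is analytic by Proposition 7.40. Relation (105) now follows from (106), so $\psi$ is analytically measurable. By the definition of $\varphi$ and the Borel-measurability of $f$ and $\mathrm{proj}_Y$, we have
$$\varphi^{-1}(\mathcal B_Y) = \psi^{-1}\big[f^{-1}\big(\mathrm{proj}_Y^{-1}(\mathcal B_Y)\big)\big]\subset\psi^{-1}(\mathcal B_{\mathcal N}).$$
We have just proved $\psi^{-1}(\mathcal B_{\mathcal N})\subset\mathcal A_X$, and the analytic measurability of $\varphi$ follows. Q.E.D.

This brings us to the selection theorem for lower semianalytic functions.

Proposition 7.50 Let $X$ and $Y$ be Borel spaces, $D\subset XY$ an analytic set, and $f: D\to R^*$ a lower semianalytic function. Define $f^*: \mathrm{proj}_X(D)\to R^*$ by
$$f^*(x) = \inf_{y\in D_x} f(x,y).$$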
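(a) For every $\varepsilon > 0$, there exists an analytically measurable function $\varphi: \mathrm{proj}_X(D)\to Y$ such that $\mathrm{Gr}(\varphi)\subset D$ and for all $x\in\mathrm{proj}_X(D)$
$$f[x,\varphi(x)]\le\begin{cases} f^*(x) + \varepsilon & \text{if } f^*(x) > -\infty,\\ -1/\varepsilon & \text{if } f^*(x) = -\infty.\end{cases}$$
(b) The set
$$I = \{x\in\mathrm{proj}_X(D)\mid \text{for some } y_x\in D_x,\ f(x,y_x) = f^*(x)\}$$
is universally measurable, and for every $\varepsilon > 0$ there exists a universally measurable function $\varphi: \mathrm{proj}_X(D)\to Y$ such that $\mathrm{Gr}(\varphi)\subset D$ and for all $x\in\mathrm{proj}_X(D)$
$$f[x,\varphi(x)] = f^*(x) \qquad \text{if } x\in I, \tag{115}$$
$$f[x,\varphi(x)]\le\begin{cases} f^*(x) + \varepsilon & \text{if } x\notin I,\ f^*(x) > -\infty,\\ -1/\varepsilon & \text{if } x\notin I,\ f^*(x) = -\infty.\end{cases}$$

Proof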
(a) (Cf. proof of Proposition 7.34 and Fig. 7.1.) The function $f^*$ is lower semianalytic by Proposition 7.47. For $k = 0,\pm1,\pm2,\dots$, define
$$A(k) = \{(x,y)\in D\mid f(x,y) < k\varepsilon\},$$
$$B(k) = \{x\in\mathrm{proj}_X(D)\mid (k-1)\varepsilon\le f^*(x) < k\varepsilon\},$$
$$B(-\infty) = \{x\in\mathrm{proj}_X(D)\mid f^*(x) = -\infty\}, \qquad B(\infty) = \{x\in\mathrm{proj}_X(D)\mid f^*(x) = \infty\}.$$
The sets $A(k)$, $k = 0,\pm1,\pm2,\dots$, and $B(-\infty)$ are analytic, while the sets $B(k)$, $k = 0,\pm1,\pm2,\dots$, and $B(\infty)$ are analytically measurable. By the Jankov-von Neumann theorem (Proposition 7.49) there exists, for each $k = 0,\pm1,\pm2,\dots$, an analytically measurable $\varphi_k: \mathrm{proj}_X[A(k)]\to Y$ with $(x,\varphi_k(x))\in A(k)$ for all $x\in\mathrm{proj}_X[A(k)]$, and an analytically measurable $\bar\varphi: \mathrm{proj}_X(D)\to Y$ such that $(x,\bar\varphi(x))\in D$ for all $x\in\mathrm{proj}_X(D)$. Let $k^*$ be an integer such that $k^*\le -1/\varepsilon^2$. Define $\varphi: \mathrm{proj}_X(D)\to Y$ by
$$\varphi(x) = \begin{cases}\varphi_k(x) & \text{if } x\in B(k),\ k = 0,\pm1,\pm2,\dots,\\ \bar\varphi(x) & \text{if } x\in B(\infty),\\ \varphi_{k^*}(x) & \text{if } x\in B(-\infty).\end{cases}$$
Since $B(k)\subset\mathrm{proj}_X[A(k)]$ and $B(-\infty)\subset\mathrm{proj}_X[A(k)]$ for all $k$, this definition is possible. It is clear that $\varphi$ is analytically measurable and $\mathrm{Gr}(\varphi)\subset D$. If $x\in B(k)$, then $(x,\varphi_k(x))\in A(k)$ and we have
$$f[x,\varphi(x)] = f[x,\varphi_k(x)] < k\varepsilon\le f^*(x) + \varepsilon.$$
If $x\in B(\infty)$, then $f(x,y) = \infty$ for all $y\in D_x$, and $f[x,\varphi(x)] = \infty = f^*(x)$. If $x\in B(-\infty)$, we have
$$f[x,\varphi(x)] = f[x,\varphi_{k^*}(x)] < k^*\varepsilon\le -1/\varepsilon.$$
Hence $\varphi$ has the required properties.
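The band construction in part (a) can be written out directly when $D$ is finite. In the sketch below (hypothetical names; $D$ is given as a table of values of $f$ over each section $D_x$) the first $y$ found with $f(x,y) < k\varepsilon$ plays the role of $\varphi_k(x)$, so the selector is $\varepsilon$-optimal but need not attain $f^*(x)$. Exact `Fraction` arithmetic is used, as an assumption of the sketch, to avoid rounding at band edges.

```python
import math
from fractions import Fraction as F


def eps_selector(D, eps):
    """Finite sketch of the construction in part (a).

    D maps each x to {y: f(x, y)} over the section D_x. For finite f*(x), x lies in
    the band B(k) with (k - 1) eps <= f*(x) < k eps, and we take the first y (in dict
    order, not necessarily a minimizer) with f(x, y) < k eps, i.e. a point of A(k).
    For f*(x) = -inf we use k* <= -1/eps**2; for f*(x) = +inf any y will do.
    """
    k_star = math.floor(-1 / eps**2)
    phi = {}
    for x, row in D.items():
        f_star = min(row.values())
        if f_star == math.inf:
            phi[x] = next(iter(row))          # role of phi-bar on B(inf)
            continue
        k = k_star if f_star == -math.inf else math.floor(f_star / eps) + 1
        phi[x] = next(y for y, v in row.items() if v < k * eps)
    return phi


D = {
    "a": {1: F(3, 2), 3: F(11, 10), 2: F(1)},   # f*(a) = 1, attained only at y = 2
    "b": {1: math.inf, 2: math.inf},            # f*(b) = +inf
    "c": {1: F(0), 2: -math.inf},               # f*(c) = -inf
}
eps = F(1, 4)
phi = eps_selector(D, eps)
print(phi)                                      # {'a': 3, 'b': 1, 'c': 2}
```

For $x = $ "a" the selector returns $y = 3$ with $f = 11/10\le f^*(x) + \varepsilon$, not the exact minimizer $y = 2$; recovering exact minimizers on the set $I$ is what part (b) adds.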
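(b) Consider the set $E\subset XYR^*$ defined by
$$E = \{(x,y,b)\in XYR^*\mid (x,y)\in D,\ f(x,y)\le b\}.$$
Since
$$E = \bigcap_{r\in Q}\Big(\big[D\times\{b\in R^*\mid b\ge r\}\big]\cup\big[\{(x,y)\in D\mid f(x,y) < r\}\times R^*\big]\Big),$$
it follows from Corollary 7.35.2 and Proposition 7.38 that $E$ is analytic in $XYR^*$, and hence the set
$$A = \{(x,b)\in XR^*\mid (x,y,b)\in E \text{ for some } y\in Y\}$$
is analytic in $XR^*$. The mapping $T: \mathrm{proj}_X(D)\to XR^*$ defined by
$$T(x) = (x, f^*(x))$$
is analytically measurable, and
$$I = T^{-1}(A).$$
Hence $I$ is universally measurable by Corollary 7.44.2. Since $E$ is analytic, there is, by the Jankov-von Neumann theorem, an analytically measurable $\rho: A\to Y$ such that $(x,\rho(x,b),b)\in E$ for every $(x,b)\in A$. Define $\psi: I\to Y$ by
$$\psi(x) = \rho[T(x)] = \rho[x, f^*(x)].$$
Then $\psi$ is universally measurable by Corollary 7.44.2, and by construction $f[x,\psi(x)]\le f^*(x)$ for $x\in I$. Hence
$$f[x,\psi(x)] = f^*(x) \qquad \forall x\in I. \tag{117}$$
By part (a) there exists an analytically measurable $\psi_\varepsilon: \mathrm{proj}_X(D)\to Y$ such that $\mathrm{Gr}(\psi_\varepsilon)\subset D$ and
$$f[x,\psi_\varepsilon(x)]\le\begin{cases} f^*(x) + \varepsilon & \text{if } f^*(x) > -\infty,\\ -1/\varepsilon & \text{if } f^*(x) = -\infty.\end{cases} \tag{118}$$
Define $\varphi: \mathrm{proj}_X(D)\to Y$ by
$$\varphi(x) = \begin{cases}\psi(x) & \text{if } x\in I,\\ \psi_\varepsilon(x) & \text{if } x\notin I.\end{cases}$$
Then $\varphi$ is universally measurable and, by (117) and (118), it has the required properties. Q.E.D.

Since the composition of analytically measurable functions can fail to be analytically measurable (Appendix B), the selector obtained in the proof of Proposition 7.50(b) can fail to be analytically measurable. The composition of universally measurable functions is universally measurable, and so we obtained a selector which is universally measurable. However, there is a σ-algebra, which we call the limit σ-algebra, lying between $\mathcal A_X$ and $\mathcal U_X$, such that the composition of limit-measurable functions is again limit-measurable. We discuss this σ-algebra in Appendix B and state a strengthened version of Proposition 7.50 in Section 11.1.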
Chapter 8
The Finite Horizon Borel Model
In Chapters 8-10 we will treat a model very similar to that of Section 2.3.2. An applications-oriented treatment of that model can be found in "Dynamic Programming and Stochastic Control" by Bertsekas [B4], hereafter referred to as DPSC. The model of Section 2.3.2 and DPSC has a countable disturbance space and arbitrary state and control spaces, whereas the model treated here will have Borel state, control, and disturbance spaces.

8.1 The Model

Definition 8.1 A finite horizon stochastic optimal control model is a nine-tuple $(S, C, U, W, p, f, \alpha, g, N)$ as described here. The letters $x$ and $u$ are used to denote elements of $S$ and $C$, respectively.
$S$ State space. A nonempty Borel space.
$C$ Control space. A nonempty Borel space.
$U$ Control constraint. A function from $S$ to the set of nonempty subsets of $C$. The set
$$\Gamma = \{(x,u)\mid x\in S,\ u\in U(x)\} \tag{1}$$
is assumed to be analytic in $SC$.
$W$ Disturbance space. A nonempty Borel space.
$p(dw|x,u)$ Disturbance kernel. A Borel-measurable stochastic kernel on $W$ given $SC$.
$f$ System function. A Borel-measurable function from $SCW$ to $S$.
$\alpha$ Discount factor. A positive real number.
$g$ One-stage cost function. A lower semianalytic function from $\Gamma$ to $R^*$.
$N$ Horizon. A positive integer.
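We envision a system moving from state $x_k$ to state $x_{k+1}$ via the system equation
$$x_{k+1} = f(x_k, u_k, w_k), \qquad k = 0, 1, \ldots, N-1,$$
and incurring cost at each stage of $g(x_k, u_k)$. The disturbances $w_k$ are random objects with probability distributions $p(dw_k \mid x_k, u_k)$. The goal is to choose $u_k$ dependent on the history $(x_0, u_0, \ldots, x_{k-1}, u_{k-1}, x_k)$ so as to minimize
$$E\left\{\sum_{k=0}^{N-1} \alpha^k g(x_k, u_k)\right\}. \qquad (2)$$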
The meaning of this statement will be made precise shortly. We have the constraint that when $x_k$ is the kth state, the kth control $u_k$ must be chosen to lie in $U(x_k)$. In the models in Section 2.3.2 and DPSC, the one-stage cost $g$ is also a function of the disturbance, i.e., has the form $g(x,u,w)$. If this is the case, then $g(x,u,w)$ can be replaced by
$$g(x,u) = \int_W g(x,u,w)\, p(dw \mid x,u).$$
If $g(x,u,w)$ is lower semianalytic, so is $g(x,u)$ (Proposition 7.48). If $p(dw \mid x,u)$ is continuous and $g(x,u,w)$ is lower semicontinuous and bounded below or upper semicontinuous and bounded above, then $g(x,u)$ is lower semicontinuous and bounded below or upper semicontinuous and bounded above, respectively (Proposition 7.31). Since these are the three cases we deal with, there is no loss of generality in considering a one-stage cost function which is independent of the disturbance.

The model posed in Definition 8.1 is stationary, i.e., the data do not vary from stage to stage. A reduction of the nonstationary model to this form is discussed in Section 10.1. A notational device which simplifies the presentation is the state transition stochastic kernel on $S$ given $SC$ defined by
$$t(B \mid x,u) = p(\{w \in W \mid f(x,u,w) \in B\} \mid x,u) = \int_W \chi_B[f(x,u,w)]\, p(dw \mid x,u), \qquad B \in \mathscr{B}_S. \qquad (3)$$
Thus $t(B \mid x,u)$ is the probability that the $(k+1)$st state is in $B$ given that the $k$th state is $x$ and the $k$th control is $u$. Proposition 7.26 and Corollary 7.26.1 imply that $t(dx' \mid x,u)$ is Borel-measurable.

Definition 8.2 A policy for the model of Definition 8.1 is a sequence $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ such that, for each $k$, $\mu_k(du_k \mid x_0, u_0, \ldots, u_{k-1}, x_k)$ is a universally measurable stochastic kernel on $C$ given $SC \cdots CS$ satisfying
$$\mu_k(U(x_k) \mid x_0, u_0, \ldots, u_{k-1}, x_k) = 1$$
for every $(x_0, u_0, \ldots, u_{k-1}, x_k)$. If, for each $k$, $\mu_k$ is parameterized only by $(x_0, x_k)$, $\pi$ is a semi-Markov policy. If $\mu_k$ is parameterized only by $x_k$, $\pi$ is a Markov policy. If, for each $k$ and $(x_0, u_0, \ldots, u_{k-1}, x_k)$, $\mu_k(du_k \mid x_0, u_0, \ldots, u_{k-1}, x_k)$ assigns mass one to some point in $C$, $\pi$ is nonrandomized. In this case, by a slight abuse of notation, $\pi$ can be considered to be a sequence of universally measurable (Corollary 7.44.3) mappings $\mu_k: SC \cdots CS \to C$ such that
$$\mu_k(x_0, u_0, \ldots, u_{k-1}, x_k) \in U(x_k)$$
for every $(x_0, u_0, \ldots, u_{k-1}, x_k)$. If $\mathscr{F}$ is a type of σ-algebra on Borel spaces and all the stochastic kernel components of a policy are $\mathscr{F}$-measurable, we say the policy is $\mathscr{F}$-measurable. (For example, $\mathscr{F}$ could represent the Borel σ-algebras or the analytic σ-algebras.) We denote by $\Pi'$ the set of all policies for the model of Definition 8.1 and by $\Pi$ the set of all Markov policies. We will show that in many cases it is not necessary to go outside $\Pi$ to find the "best" available policy. In most cases, this "best" policy can be taken to be nonrandomized. Since $\Gamma$ is analytic, the Jankov-von Neumann theorem (Proposition 7.49) guarantees that there exists at least one nonrandomized Markov policy, so $\Pi$ and $\Pi'$ are nonempty. If $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ is a nonrandomized Markov policy, then $\pi$ is a finite horizon version of a policy in the sense of Section 2.1. The notion of policy as set forth in Definition 8.2 is wider than the concept of Section 2.1 in that randomized non-Markov policies are permitted. It is narrower in that universal measurability is required.

We are now in a position to make precise expression (2). In this and subsequent discussions, we often index the state and control spaces for clarity. However, except in Chapter 10 when the nonstationary model is treated, we will always understand $S_k$ to be a copy of $S$ and $C_k$ to be a copy of $C$. Suppose $p \in P(S)$ and $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ is a policy for the model of Definition 8.1. By Proposition 7.45, there is a unique probability measure $r_N(\pi,p)$ on $S_0C_0 \cdots S_{N-1}C_{N-1}$ such that for any universally measurable function $h: S_0C_0 \cdots S_{N-1}C_{N-1} \to R^*$ which satisfies either $\int h^+\, dr_N(\pi,p) < \infty$ or $\int h^-\, dr_N(\pi,p) < \infty$, we have
$$\int h\, dr_N(\pi,p) = \int_{S_0}\int_{C_0}\cdots\int_{S_{N-1}}\int_{C_{N-1}} h(x_0,u_0,\ldots,x_{N-1},u_{N-1})\, \mu_{N-1}(du_{N-1} \mid x_0,u_0,\ldots,u_{N-2},x_{N-1})\, t(dx_{N-1} \mid x_{N-2},u_{N-2}) \cdots t(dx_1 \mid x_0,u_0)\, \mu_0(du_0 \mid x_0)\, p(dx_0), \qquad (4)$$
where $t(dx' \mid x,u)$ is the Borel-measurable stochastic kernel defined by (3). Furthermore, writing $p_x \in P(S)$ for the measure assigning mass one to $x$, we have from (4) that $\int h\, dr_N(\pi,p_x)$ is a universally measurable function of $x$ (Proposition 7.46), and if $h$ and $\pi$ are Borel-measurable, then $\int h\, dr_N(\pi,p_x)$ is a Borel-measurable function of $x$ (Proposition 7.29).

Definition 8.3 Suppose $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ is a policy for the model of Definition 8.1. For $K \le N$, the K-stage cost corresponding to $\pi$ at $x \in S$ is
$$J_{K,\pi}(x) = \int \left[\sum_{k=0}^{K-1} \alpha^k g(x_k,u_k)\right] dr_N(\pi,p_x), \qquad (5)$$
where, for each $\pi \in \Pi'$ and $p \in P(S)$, $r_N(\pi,p)$ is the unique probability measure satisfying (4). The K-stage optimal cost at $x$ is
$$J_K^*(x) = \inf_{\pi \in \Pi'} J_{K,\pi}(x).$$
If $\varepsilon > 0$, the policy $\pi$ is K-stage ε-optimal at $x$ provided
$$J_{K,\pi}(x) \le \begin{cases} J_K^*(x) + \varepsilon & \text{if } J_K^*(x) > -\infty,\\ -1/\varepsilon & \text{if } J_K^*(x) = -\infty.\end{cases}$$
If $J_{K,\pi}(x) = J_K^*(x)$, then $\pi$ is K-stage optimal at $x$. If $\pi$ is K-stage ε-optimal or K-stage optimal at every $x \in S$, it is said to be K-stage ε-optimal or K-stage optimal, respectively. If $\{\varepsilon_n\}$ is a sequence of positive numbers with $\varepsilon_n \downarrow 0$, a sequence of policies $\{\pi_n\}$ exhibits $\{\varepsilon_n\}$-dominated convergence to K-stage optimality provided
$$\lim_{n\to\infty} J_{K,\pi_n} = J_K^*,$$
and for $n = 2, 3, \ldots$
$$J_{K,\pi_n}(x) \le \begin{cases} J_K^*(x) + \varepsilon_n & \text{if } J_K^*(x) > -\infty,\\ J_{K,\pi_{n-1}}(x) + \varepsilon_n & \text{if } J_K^*(x) = -\infty.\end{cases} \qquad (6)$$
If $K = N$, we suppress the qualifier "K-stage" in the preceding terms.
Note that $J_K^*$ is independent of the horizon $N$ as long as $K \le N$. Note also that $J_{K,\pi}(x)$ is universally measurable in $x$. If $\pi$ is a Borel-measurable policy and $g$ is Borel-measurable, then $J_{K,\pi}(x)$ is Borel-measurable in $x$.
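For $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1}) \in \Pi'$ and $p \in P(S)$, let $q_k(\pi,p)$ be the marginal of $r_N(\pi,p)$ on $S_kC_k$. If we take $h = \chi_{S_0C_0 \cdots S_{k-1}C_{k-1}\underline{S}_k\underline{C}_kS_{k+1}C_{k+1} \cdots S_{N-1}C_{N-1}}$ in (4), we obtain
$$q_k(\pi,p)(\underline{S}_k\underline{C}_k) = \int_{S_0}\int_{C_0}\cdots\int_{S_{k-1}}\int_{C_{k-1}}\int_{\underline{S}_k} \mu_k(\underline{C}_k \mid x_0,u_0,\ldots,u_{k-1},x_k)\, t(dx_k \mid x_{k-1},u_{k-1})\, \mu_{k-1}(du_{k-1} \mid x_0,u_0,\ldots,u_{k-2},x_{k-1}) \cdots t(dx_1 \mid x_0,u_0)\, \mu_0(du_0 \mid x_0)\, p(dx_0) \qquad \forall\, \underline{S}_k \in \mathscr{B}_S,\ \underline{C}_k \in \mathscr{B}_C. \qquad (7)$$
From (1) and (7), we see that $q_k(\pi,p)(\Gamma) = 1$. If $\pi$ is Markov, (7) becomes
$$q_k(\pi,p)(\underline{S}_k\underline{C}_k) = \int_{S_{k-1}C_{k-1}}\int_{\underline{S}_k} \mu_k(\underline{C}_k \mid x_k)\, t(dx_k \mid x_{k-1},u_{k-1})\, dq_{k-1}(\pi,p) \qquad \forall\, \underline{S}_k \in \mathscr{B}_S,\ \underline{C}_k \in \mathscr{B}_C. \qquad (8)$$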
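As a sanity check on (7) and (8), here is a minimal Python sketch for a model with finitely many states, controls, and disturbances, where every integral becomes a finite sum. The model data (`p`, `f`, `g`, the three states, `alpha = 0.9`) are hypothetical and chosen only for illustration: `t` implements the transition kernel (3), `marginals` runs the Markov recursion (8), and `k_stage_cost` sums $\alpha^k \int g\, dq_k$, which on a finite model is the K-stage cost of Definition 8.3.

```python
# Hypothetical finite model used only for illustration.
S = [0, 1, 2]          # states
C = [0, 1]             # controls (all admissible)
W = [0, 1]             # disturbances
alpha = 0.9            # discount factor


def p(w, x, u):
    """Disturbance kernel p(w | x, u)."""
    return 0.7 if w == u else 0.3


def f(x, u, w):
    """System function."""
    return (x + u + w) % 3


def g(x, u):
    """One-stage cost."""
    return float(x) + 0.5 * u


def t(y, x, u):
    """Transition kernel t({y} | x, u) = p({w : f(x, u, w) = y} | x, u), cf. (3)."""
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def marginals(policy, p0):
    """Marginals q_k(x, u) of a Markov policy, computed with the recursion (8).

    policy[k](u, x) is the probability mu_k({u} | x); p0 maps each state to its
    initial probability."""
    q = [{(x, u): p0[x] * policy[0](u, x) for x in S for u in C}]
    for k in range(1, len(policy)):
        prev = q[-1]
        qk = {}
        for y in S:
            # probability that x_k = y: integrate t({y} | x, u) against q_{k-1}
            py = sum(t(y, x, u) * prev[(x, u)] for x in S for u in C)
            for u in C:
                qk[(y, u)] = policy[k](u, y) * py
        q.append(qk)
    return q


def k_stage_cost(policy, x0, K):
    """sum_{k<K} alpha^k * sum_{x,u} g(x, u) q_k(x, u), starting from the state x0."""
    p0 = {x: 1.0 if x == x0 else 0.0 for x in S}
    q = marginals(policy, p0)
    return sum(alpha**k * sum(g(x, u) * q[k][(x, u)] for x in S for u in C)
               for k in range(K))
```

For instance, a policy given as a list of functions `policy[k](u, x)` is evaluated at $x_0$ by `k_stage_cost(policy, x0, K)`. Measurability questions are vacuous on finite spaces, so the sketch only illustrates the bookkeeping of (8).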
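If either
(F+) $\int g^-\, dq_k(\pi,p_x) < \infty$ for every $\pi \in \Pi'$, $x \in S$, $k = 0, \ldots, N-1$, or
(F-) $\int g^+\, dq_k(\pi,p_x) < \infty$ for every $\pi \in \Pi'$, $x \in S$, $k = 0, \ldots, N-1$,
then Lemma 7.11(b) implies that for every $\pi \in \Pi'$ and $x \in S$
$$J_{K,\pi}(x) = \sum_{k=0}^{K-1} \alpha^k \int_{S_kC_k} g\, dq_k(\pi,p_x), \qquad K = 1, \ldots, N. \qquad (9)$$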
If (F+) [respectively (F-)] appears preceding the statement of a proposition, then (F+) [respectively (F-)] is understood to be a part of the hypotheses of the proposition. If both (F+) and (F-) appear, then the proposition is valid when either (F+) or (F-) is included among the hypotheses.

If $\pi' \in \Pi'$ is a given policy, there may not exist a Markov policy which does at least as well as $\pi'$ for every $x \in S$, i.e., a policy $\pi \in \Pi$ for which
$$J_{N,\pi}(x) \le J_{N,\pi'}(x) \qquad (10)$$
for every $x \in S$. However, if $x$ is held fixed, then a Markov policy $\pi$ can be found for which (10) holds.

Proposition 8.1 (F+)(F-) If $x \in S$ and $\pi' \in \Pi'$, then there is a Markov policy $\pi$ such that
$$J_{K,\pi}(x) = J_{K,\pi'}(x), \qquad K = 1, \ldots, N. \qquad (11)$$
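Proof Let $\pi' = (\mu_0', \mu_1', \ldots, \mu_{N-1}')$ be a policy and let $x \in S$ be given. For $k = 0, 1, \ldots, N-1$, let $\mu_k(du_k \mid x_k)$ be the Borel-measurable stochastic kernel obtained by decomposing $q_k(\pi',p_x)$ (Corollary 7.27.2), i.e.,
$$q_k(\pi',p_x)(\underline{S}_k\underline{C}_k) = \int_{\underline{S}_k} \mu_k(\underline{C}_k \mid x_k)\, p_k(\pi',p_x)(dx_k) \qquad \forall\, \underline{S}_k \in \mathscr{B}_S,\ \underline{C}_k \in \mathscr{B}_C, \qquad (12)$$
where $p_k(\pi',p_x)$ is the marginal of $q_k(\pi',p_x)$ on $S_k$. From (12) we see that
$$1 = q_k(\pi',p_x)(\Gamma) = \int_{S_k} \mu_k(U(x_k) \mid x_k)\, p_k(\pi',p_x)(dx_k),$$
so we must have $\mu_k(U(x_k) \mid x_k) = 1$ for $p_k(\pi',p_x)$ almost every $x_k$. By altering $\mu_k(\cdot \mid x_k)$ on a set of $p_k(\pi',p_x)$ measure zero if necessary, we may assume that (12) holds and $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ is a policy as set forth in Definition 8.2. In light of (9), (11) will follow if we show that $q_k(\pi',p_x) = q_k(\pi,p_x)$ for $k = 0, 1, \ldots, N-1$. For this, it suffices to show that, for $k = 0, 1, \ldots, N-1$,
$$q_k(\pi',p_x)(\underline{S}_k\underline{C}_k) = q_k(\pi,p_x)(\underline{S}_k\underline{C}_k) \qquad \forall\, \underline{S}_k \in \mathscr{B}_S,\ \underline{C}_k \in \mathscr{B}_C. \qquad (13)$$
We prove (13) by induction. For $k = 0$, $\underline{S}_0 \in \mathscr{B}_S$, and $\underline{C}_0 \in \mathscr{B}_C$, we have, from (12),
$$q_0(\pi',p_x)(\underline{S}_0\underline{C}_0) = \int_{\underline{S}_0} \mu_0(\underline{C}_0 \mid x_0)\, p_x(dx_0) = q_0(\pi,p_x)(\underline{S}_0\underline{C}_0).$$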
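If $q_k(\pi',p_x) = q_k(\pi,p_x)$, then for $\underline{S}_{k+1} \in \mathscr{B}_S$, $\underline{C}_{k+1} \in \mathscr{B}_C$, we have, from (12),
$$q_{k+1}(\pi',p_x)(\underline{S}_{k+1}\underline{C}_{k+1}) = \int_{\underline{S}_{k+1}} \mu_{k+1}(\underline{C}_{k+1} \mid x_{k+1})\, p_{k+1}(\pi',p_x)(dx_{k+1}). \qquad (14)$$
From (7) we see that
$$p_{k+1}(\pi',p_x)(\underline{S}_{k+1}) = \int_{S_kC_k} t(\underline{S}_{k+1} \mid x_k,u_k)\, dq_k(\pi',p_x),$$
so if $h: S_{k+1} \to [0,\infty]$ is a Borel-measurable indicator function, then
$$\int_{\underline{S}_{k+1}} h(x_{k+1})\, p_{k+1}(\pi',p_x)(dx_{k+1}) = \int_{S_kC_k}\int_{\underline{S}_{k+1}} h(x_{k+1})\, t(dx_{k+1} \mid x_k,u_k)\, dq_k(\pi',p_x). \qquad (15)$$
Then (15) holds for Borel-measurable simple functions, and finally, for all Borel-measurable functions $h: S_{k+1} \to [0,\infty]$. Letting $h(x_{k+1})$ in (15) be $\mu_{k+1}(\underline{C}_{k+1} \mid x_{k+1})$, we obtain from (14), the induction hypothesis, and (8)
$$q_{k+1}(\pi',p_x)(\underline{S}_{k+1}\underline{C}_{k+1}) = \int_{S_kC_k}\int_{\underline{S}_{k+1}} \mu_{k+1}(\underline{C}_{k+1} \mid x_{k+1})\, t(dx_{k+1} \mid x_k,u_k)\, dq_k(\pi,p_x) = q_{k+1}(\pi,p_x)(\underline{S}_{k+1}\underline{C}_{k+1}),$$
which proves (13) for $k+1$. Q.E.D.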
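The construction in the proof of Proposition 8.1 can be tried on a finite model. In this hedged sketch (hypothetical model data; `history_policy` is an invented non-Markov randomized policy), we enumerate all histories from a fixed $x_0$, form the marginals $q_k(\pi', p_{x_0})$, decompose them as in (12) into Markov kernels $\mu_k(u \mid x)$, and compare the N-stage costs of the two policies from that $x_0$. The resulting Markov policy depends on $x_0$, which is why the proposition is stated for a fixed $x$.

```python
# Hypothetical finite model (illustrative only).
S, C, W = [0, 1, 2], [0, 1], [0, 1]
alpha, N = 0.9, 3


def p(w, x, u):
    return 0.7 if w == u else 0.3


def f(x, u, w):
    return (x + u + w) % 3


def g(x, u):
    return float(x) + 0.5 * u


def t(y, x, u):
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def history_policy(k, hist, u):
    """Hypothetical non-Markov randomized policy mu'_k(u | x_0, u_0, ..., x_k):
    the probability of u = 1 depends on the initial state and the current state."""
    x0, xk = hist[0], hist[-1]
    prob_one = 0.2 + 0.3 * ((x0 + xk) % 2)
    return prob_one if u == 1 else 1.0 - prob_one


def path_measure(x0):
    """Probabilities of full histories (x_0, u_0, ..., x_{N-1}, u_{N-1}), cf. (4)."""
    paths = {(x0,): 1.0}
    for k in range(N):
        nxt = {}
        for hist, pr in paths.items():
            for u in C:
                pu = pr * history_policy(k, hist, u)
                if k == N - 1:
                    nxt[hist + (u,)] = pu
                else:
                    for y in S:
                        nxt[hist + (u, y)] = pu * t(y, hist[-1], u)
        paths = nxt
    return paths


def history_cost(x0):
    """J_{N,pi'}(x0) for the history-dependent policy."""
    return sum(pr * sum(alpha**k * g(h[2 * k], h[2 * k + 1]) for k in range(N))
               for h, pr in path_measure(x0).items())


def markov_from_marginals(x0):
    """Decompose each q_k(pi', p_x0) as in (12): mu_k(u | x) = q_k(x, u) / p_k(x)."""
    paths = path_measure(x0)
    kernels = []
    for k in range(N):
        q = {(x, u): 0.0 for x in S for u in C}
        for h, pr in paths.items():
            q[(h[2 * k], h[2 * k + 1])] += pr
        mu = {}
        for x in S:
            mass = sum(q[(x, u)] for u in C)
            for u in C:
                # off the support of p_k any admissible kernel will do
                mu[(u, x)] = q[(x, u)] / mass if mass > 0 else 1.0 / len(C)
        kernels.append(mu)
    return kernels


def markov_cost(kernels, x0):
    """J_{N,pi}(x0) for the Markov policy, via the forward recursion (8)."""
    px = {x: 1.0 if x == x0 else 0.0 for x in S}
    total = 0.0
    for k, mu in enumerate(kernels):
        q = {(x, u): px[x] * mu[(u, x)] for x in S for u in C}
        total += alpha**k * sum(g(x, u) * q[(x, u)] for x in S for u in C)
        px = {y: sum(t(y, x, u) * q[(x, u)] for x in S for u in C) for y in S}
    return total
```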
Corollary 8.1.1 (F+)(F-) For $K = 1, 2, \ldots, N$, we have
$$J_K^*(x) = \inf_{\pi \in \Pi} J_{K,\pi}(x) \qquad \forall\, x \in S,$$
where $\Pi$ is the set of all Markov policies.

Corollary 8.1.1 shows that the admission of non-Markov policies to our discussion has not resulted in a reduction of the optimal cost function. The advantage of allowing non-Markov policies is that an ε-optimal nonrandomized policy can then be guaranteed to exist (Proposition 8.3), whereas one may not exist within the class of Markov policies (Example 2).

8.2 The Dynamic Programming Algorithm-Existence of Optimal and ε-Optimal Policies
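Let $U(C|S)$ denote the set of universally measurable stochastic kernels $\mu$ on $C$ given $S$ which satisfy $\mu(U(x) \mid x) = 1$ for every $x \in S$. Thus the set of Markov policies is $\Pi = U(C|S)U(C|S) \cdots U(C|S)$, where there are $N$ factors.

Definition 8.4 Let $J: S \to R^*$ be universally measurable and $\mu \in U(C|S)$. The operator $T_\mu$ mapping $J$ into $T_\mu(J): S \to R^*$ is defined by
$$T_\mu(J)(x) = \int_C \left[g(x,u) + \alpha \int_S J(x')\, t(dx' \mid x,u)\right] \mu(du \mid x)$$
for every $x \in S$. The operator $T_\mu$ can also be written in terms of the system function $f$ and the disturbance kernel $p(dw \mid x,u)$ as [cf. (3)]
$$T_\mu(J)(x) = \int_C \left[g(x,u) + \alpha \int_W J[f(x,u,w)]\, p(dw \mid x,u)\right] \mu(du \mid x).$$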
By Proposition 7.46, $T_\mu(J)$ is universally measurable. We show that under (F+) or (F-), the cost corresponding to a policy $\pi = (\mu_0, \ldots, \mu_{N-1})$ can be defined in terms of the composition of the operators $T_{\mu_0}, T_{\mu_1}, \ldots, T_{\mu_{N-1}}$.

Lemma 8.1 (F+)(F-) Let $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ be a Markov policy and let $J_0: S \to R^*$ be identically zero. Then for $K = 1, 2, \ldots, N$ we have
$$J_{K,\pi} = (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0), \qquad (16)$$
where $T_{\mu_0} \cdots T_{\mu_{K-1}}$ denotes the composition of $T_{\mu_0}, \ldots, T_{\mu_{K-1}}$.

Proof We proceed by induction. For $x \in S$,
$$J_{1,\pi}(x) = \int_C g(x,u)\, \mu_0(du \mid x) = T_{\mu_0}(J_0)(x).$$
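Suppose the lemma holds for $K-1$. Let $\bar\pi = (\mu_1, \mu_2, \ldots, \mu_{N-1}, \mu)$, where $\mu$ is some element of $U(C|S)$. Then for any $x \in S$, the (F+) or (F-) assumption along with Lemma 7.11(b) implies that (5) can be rewritten as
$$J_{K,\pi}(x) = \int_C \left[g(x,u) + \alpha \int_S J_{K-1,\bar\pi}(x')\, t(dx' \mid x,u)\right] \mu_0(du \mid x). \qquad (17)$$
Under (F-) the integrals of the positive parts of the integrands in (17) are finite, while under (F+) a similar condition holds for the negative parts, so Lemma 7.11(b) and the induction hypothesis can be applied to the right-hand side of (17) to conclude
$$J_{K,\pi}(x) = T_{\mu_0}(J_{K-1,\bar\pi})(x) = (T_{\mu_0}T_{\mu_1} \cdots T_{\mu_{K-1}})(J_0)(x).$$
Q.E.D.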
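Lemma 8.1 says the cost of a Markov policy can be computed backward by composing the operators $T_{\mu_k}$. A minimal sketch on a hypothetical finite model (all model data invented for illustration) compares that backward composition with the forward computation through the marginals $q_k$; on a finite model with finite costs both (F+) and (F-) hold, so the two computations should agree.

```python
# Hypothetical finite model (illustrative only).
S, C, W = [0, 1, 2], [0, 1], [0, 1]
alpha = 0.9


def p(w, x, u):
    return 0.7 if w == u else 0.3


def f(x, u, w):
    return (x + u + w) % 3


def g(x, u):
    return float(x) + 0.5 * u


def t(y, x, u):
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def T_mu(J, mu):
    """Definition 8.4 on a finite model; mu(u, x) is the probability of control u at x."""
    return {x: sum(mu(u, x) * (g(x, u) + alpha * sum(J[y] * t(y, x, u) for y in S))
                   for u in C)
            for x in S}


def backward_cost(policy, K):
    """(T_{mu_0} ... T_{mu_{K-1}})(J_0): T_{mu_{K-1}} acts first, T_{mu_0} last."""
    J = {x: 0.0 for x in S}
    for mu in reversed(policy[:K]):
        J = T_mu(J, mu)
    return J


def forward_cost(policy, x0, K):
    """Expected discounted K-stage cost from x0, propagated forward as in (8)."""
    px = {x: 1.0 if x == x0 else 0.0 for x in S}
    total = 0.0
    for k in range(K):
        q = {(x, u): px[x] * policy[k](u, x) for x in S for u in C}
        total += alpha**k * sum(g(x, u) * q[(x, u)] for x in S for u in C)
        px = {y: sum(t(y, x, u) * q[(x, u)] for x in S for u in C) for y in S}
    return total


policy = [
    lambda u, x: 0.5,                            # mu_0: fair coin
    lambda u, x: 1.0 if u == x % 2 else 0.0,     # mu_1: nonrandomized
    lambda u, x: 0.25 if u == 0 else 0.75,       # mu_2: biased coin
]
```

The backward form is what the dynamic programming algorithm exploits: each application of $T_{\mu_k}$ touches only one stage.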
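Definition 8.5 Let $J: S \to R^*$ be universally measurable. The operator $T$ mapping $J$ into $T(J): S \to R^*$ is defined by
$$T(J)(x) = \inf_{u \in U(x)} \left\{g(x,u) + \alpha \int_S J(x')\, t(dx' \mid x,u)\right\}$$
for every $x \in S$. Similarly as for $T_\mu$, the operator $T$ may be written in terms of $f$ and $p(dw \mid x,u)$ as
$$T(J)(x) = \inf_{u \in U(x)} \left\{g(x,u) + \alpha \int_W J[f(x,u,w)]\, p(dw \mid x,u)\right\}.$$
If $\mu$ is nonrandomized, the operators $T_\mu$ and $T$ of Definitions 8.4 and 8.5 are, except for measurability restrictions on $J$ and $\mu$, special cases of those defined in Section 2.1. In the present case, the mapping $H$ of Section 2.1 is
$$H(x,u,J) = g(x,u) + \alpha \int_S J(x')\, t(dx' \mid x,u).$$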
We will state and prove versions of Assumptions F.1 and F.3 of Section 3.1 for this function $H$. Assumption F.2 is clearly true. Furthermore, if $\mu \in U(C|S)$, $J_1, J_2: S \to R^*$ are universally measurable, and $J_1 \le J_2$, then $T_\mu(J_1) \le T_\mu(J_2)$ and $T(J_1) \le T(J_2)$. If $r \in (0,\infty)$, then $T_\mu(J_1 + r) = T_\mu(J_1) + \alpha r$ and $T(J_1 + r) = T(J_1) + \alpha r$. We will make frequent use of these properties. The reader should not be led to believe, however, that the model of this chapter is a special case of the model of Chapters 2 and 3. The earlier model does not admit measurability restrictions on policies.

By Lemma 7.30(4) and Propositions 7.47 and 7.48, $T(J)$ is lower semianalytic whenever $J$ is. The composition of $T$ with itself $k$ times is denoted by $T^k$, i.e., $T^k(J) = T[T^{k-1}(J)]$, where $T^0(J) = J$. We show in Proposition 8.2 that under (F+) or (F-) the optimal cost can be defined in terms of $T^N$. Three preparatory lemmas are required.

Lemma 8.2 Let $J: S \to R^*$ be lower semianalytic. Then for $\varepsilon > 0$, there exists $\mu \in U(C|S)$ such that
$$T_\mu(J) \le T(J) + \varepsilon,$$
where $T(J)(x) + \varepsilon$ may be $-\infty$.

Proof By Proposition 7.50, there are universally measurable selectors $\mu_m: S \to C$ such that for $m = 1, 2, \ldots$ and $x \in S$, we have $\mu_m(x) \in U(x)$ and
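$$g[x,\mu_m(x)] + \alpha \int_S J(x')\, t(dx' \mid x,\mu_m(x)) \le \begin{cases} T(J)(x) + \varepsilon & \text{if } T(J)(x) > -\infty,\\ -2^m & \text{if } T(J)(x) = -\infty.\end{cases}$$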
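Let $\mu(du \mid x)$ assign mass one to $\mu_1(x)$ if $T(J)(x) > -\infty$ and assign mass $1/2^m$ to $\mu_m(x)$, $m = 1, 2, \ldots$, if $T(J)(x) = -\infty$. For each $\underline{C} \in \mathscr{B}_C$,
$$\mu(\underline{C} \mid x) = \begin{cases} \chi_{\underline{C}}[\mu_1(x)] & \text{if } T(J)(x) > -\infty,\\ \sum_{m=1}^\infty 2^{-m}\chi_{\underline{C}}[\mu_m(x)] & \text{if } T(J)(x) = -\infty,\end{cases}$$
is a universally measurable function of $x$, and therefore $\mu$ is a universally measurable stochastic kernel [Lemma 7.28(a),(b)]. This $\mu$ has the desired properties. Q.E.D.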
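Lemma 8.3 (F+) If $J_0: S \to R^*$ is identically zero, then
$$T^K(J_0)(x) > -\infty$$
for every $x \in S$, $K = 1, \ldots, N$, where $T^K$ denotes the composition of $T$ with itself $K$ times.

Proof Suppose for some $K \le N$ and $\bar x \in S$ that $T^j(J_0)(x) > -\infty$ for every $x \in S$ and $j = 1, \ldots, K-1$, and $T^K(J_0)(\bar x) = -\infty$.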
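By Proposition 7.50, there are universally measurable selectors $\mu_j: S \to C$, $j = 1, \ldots, K-1$, such that $\mu_j(x) \in U(x)$ and
$$(T_{\mu_{K-j}}T^{j-1})(J_0)(x) \le T^j(J_0)(x) + 1, \qquad j = 1, \ldots, K-1,$$
for every $x \in S$. Then
$$(T_{\mu_1} \cdots T_{\mu_{K-1}})(J_0)(x) \le (T_{\mu_1} \cdots T_{\mu_{K-2}}T)(J_0)(x) + \alpha^{K-2} \le (T_{\mu_1} \cdots T_{\mu_{K-3}}T^2)(J_0)(x) + \alpha^{K-3} + \alpha^{K-2} \le \cdots \le T^{K-1}(J_0)(x) + 1 + \alpha + \cdots + \alpha^{K-2},$$
where the last inequality is obtained by repeating the process used to obtain the first two inequalities. By Lemma 8.2, there is a stochastic kernel $\mu_0 \in U(C|S)$ such that $(T_{\mu_0}T^{K-1})(J_0)(\bar x) = -\infty$. Then
$$(T_{\mu_0}T_{\mu_1} \cdots T_{\mu_{K-1}})(J_0)(\bar x) \le T_{\mu_0}[T^{K-1}(J_0) + 1 + \alpha + \cdots + \alpha^{K-2}](\bar x) = -\infty.$$
Choose any $\mu \in U(C|S)$ and let $\pi = (\mu_0, \ldots, \mu_{K-1}, \mu, \ldots, \mu)$, so that $\pi \in \Pi$. By Lemma 8.1,
$$J_{K,\pi}(\bar x) = (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(\bar x) = -\infty,$$
so, by (9), for some $k \le K-1$, $\int g^-\, dq_k(\pi,p_{\bar x}) = \infty$. This contradicts the (F+) assumption. Q.E.D.

Lemma 8.4 Let $\{J_k\}$ be a sequence of extended real-valued, universally measurable functions on $S$ and let $\mu$ be an element of $U(C|S)$.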
(a) If $T_\mu(J_1)(x) < \infty$ for every $x \in S$ and $J_k \downarrow J$, then $T_\mu(J_k) \downarrow T_\mu(J)$.
(b) If $T_\mu(J_1^-)(x) < \infty$ for every $x \in S$, $g \ge 0$, and $J_k \uparrow J$, then $T_\mu(J_k) \uparrow T_\mu(J)$.
(c) If $\{J_k\}$ is uniformly bounded, $g$ is bounded, and $J_k \to J$, then $T_\mu(J_k) \to T_\mu(J)$.

Proof Assume first that $T_\mu(J_1) < \infty$ and $J_k \downarrow J$. Fix $x$. Since
$$T_\mu(J_1)(x) = \int_C \left[g(x,u) + \alpha \int_S J_1(x')\, t(dx' \mid x,u)\right] \mu(du \mid x) < \infty,$$
we have
$$g(x,u) + \alpha \int_S J_1(x')\, t(dx' \mid x,u) < \infty$$
for $\mu(du \mid x)$ almost all $u$. By the monotone convergence theorem [Lemma 7.11(f)],
$$\int_S J_k(x')\, t(dx' \mid x,u) \downarrow \int_S J(x')\, t(dx' \mid x,u)$$
for $\mu(du \mid x)$ almost all $u$. Apply the monotone convergence theorem again to conclude $T_\mu(J_k)(x) \downarrow T_\mu(J)(x)$. If $T_\mu(J_1^-) < \infty$, $g \ge 0$, and $J_k \uparrow J$, the same type of argument applies. If $\{J_k\}$ is uniformly bounded, $g$ bounded, and $J_k \to J$, a similar argument using the bounded convergence theorem applies. Q.E.D.

The dynamic programming algorithm over a finite horizon is executed by beginning with the identically zero function on $S$ and applying the operator $T$ successively $N$ times. The next theorem says that this procedure generates the optimal cost function. In Proposition 8.3, we show how ε-optimal policies can also be obtained from this algorithm.

Proposition 8.2 (F+)(F-) Let $J_0$ be the identically zero function on $S$. Then
$$J_K^* = T^K(J_0), \qquad K = 1, \ldots, N. \qquad (18)$$

Proof It suffices to prove (18) for $K = N$, since the horizon $N$ can be chosen to be any positive integer. For any $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$ and $K \le N$, we have
$$J_{K,\pi} = (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0) \ge (T_{\mu_0} \cdots T_{\mu_{K-2}}T)(J_0) \ge \cdots \ge T^K(J_0), \qquad (19)$$
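where the last inequality is obtained by repeating the process used to obtain the first inequality. Infimizing over $\pi \in \Pi$ when $K = N$ and using Corollary 8.1.1, we obtain
$$J_N^* \ge T^N(J_0). \qquad (20)$$
If (F+) holds, then, by Lemma 8.3, $T^k(J_0) > -\infty$, $k = 1, \ldots, N$. For $\varepsilon > 0$, there are universally measurable selectors $\bar\mu_k: S \to C$, $k = 0, \ldots, N-1$, with $\bar\mu_k(x) \in U(x)$ and
$$(T_{\bar\mu_k}T^{N-k-1})(J_0)(x) \le T^{N-k}(J_0)(x) + \varepsilon/(1 + \alpha + \cdots + \alpha^{N-1})$$
for every $x \in S$ (Proposition 7.50). Then, with $\bar\pi = (\bar\mu_0, \ldots, \bar\mu_{N-1})$,
$$J_{N,\bar\pi} = (T_{\bar\mu_0} \cdots T_{\bar\mu_{N-1}})(J_0) \le (T_{\bar\mu_0} \cdots T_{\bar\mu_{N-2}}T)(J_0) + \frac{\alpha^{N-1}\varepsilon}{1 + \alpha + \cdots + \alpha^{N-1}} \le (T_{\bar\mu_0} \cdots T_{\bar\mu_{N-3}}T^2)(J_0) + \frac{(\alpha^{N-2} + \alpha^{N-1})\varepsilon}{1 + \alpha + \cdots + \alpha^{N-1}} \le \cdots \le T^N(J_0) + \varepsilon, \qquad (21)$$
where the last inequality is obtained by repeating the process used to obtain the first two inequalities. It follows that
$$J_N^* \le T^N(J_0). \qquad (22)$$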
Combining (20) and (22), we see that the proposition holds under the (F+) assumption. If (F-) holds, then $J_{K,\pi}(x) < \infty$ for every $x \in S$, $\pi \in \Pi$, $K = 1, \ldots, N$. Use Proposition 7.50 to choose nonrandomized policies $\pi^i = (\mu_0^i, \ldots, \mu_{N-1}^i) \in \Pi$ such that
$$(T_{\mu_k^i}T^{N-k-1})(J_0) \downarrow T^{N-k}(J_0), \qquad k = 0, \ldots, N-1,$$
as $i \to \infty$. By (19) and Lemma 8.4(a),
$$J_N^* \le \inf_{(i_0,\ldots,i_{N-1})} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-1}^{i_{N-1}}})(J_0) = \inf_{i_0} \cdots \inf_{i_{N-1}} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-1}^{i_{N-1}}})(J_0) = \inf_{i_0} \cdots \inf_{i_{N-2}} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-2}^{i_{N-2}}}T)(J_0) = \cdots = T^N(J_0), \qquad (23)$$
where the last equality is obtained by repeating the process used to obtain the previous equality. Combining (20) and (23), we see that the proposition holds under the (F-) assumption. Q.E.D.

When the state, control, and disturbance spaces are countable, the model of Definition 8.1 falls within the framework of Part I. Consider such a model, and, as in Part I, let $M$ be the set of mappings $\mu: S \to C$ for which $\mu(x) \in U(x)$ for every $x \in S$. In Section 3.2, it was often assumed that for every $x \in S$ and $\mu_j \in M$, $j = 0, \ldots, K-1$, we have
$$(T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(x) < \infty, \qquad K = 1, \ldots, N, \qquad (24)$$
or else for every $x \in S$
$$\inf_{\mu_j \in M,\ 0 \le j \le K-1} (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(x) > -\infty, \qquad K = 1, \ldots, N. \qquad (25)$$
Under the (F+) assumption, Lemma 8.3 implies that
$$-\infty < T^K(J_0) \le \inf_{\mu_j \in M,\ 0 \le j \le K-1} (T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0),$$
so (25) is satisfied. Under (F-), we have from Lemma 8.1 that
$$(T_{\mu_0} \cdots T_{\mu_{K-1}})(J_0)(x) = J_{K,\pi}(x) < \infty,$$
where $\pi = (\mu_0, \ldots, \mu_{K-1})$, so (24) holds. The primary reason for introducing the stronger (F+) and (F-) assumptions is to enable us to prove Lemma 8.1. If one chooses instead to take (16) as the definition of $J_{K,\pi}$ (as is done in
Section 3.1), then (24) or (25) suffices to prove Proposition 8.2 along the lines of the proof of Proposition 3.1 of Part I. Proposition 8.2 implies the following property of the optimal cost function.

Corollary 8.2.1 (F+)(F-) For $K = 1, 2, \ldots, N$, the function $J_K^*$ is lower semianalytic.

Proof As observed following Definition 8.5, $T(J)$ is lower semianalytic whenever $J$ is. Since $J_K^* = T^K(J_0)$ and $J_0 \equiv 0$ is lower semianalytic, the result follows. Q.E.D.
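On a finite model (a special case of the countable models discussed after Proposition 8.2), the identity $J_K^* = T^K(J_0)$ can be checked by brute force. The sketch below uses hypothetical model data and a hypothetical constraint $U(2) = \{0\}$; it runs the dynamic programming algorithm and compares the result with the pointwise minimum of $J_{K,\pi}$ over all nonrandomized Markov policies. Restricting to nonrandomized policies is harmless here because every infimum over the finite set $U(x)$ is attained.

```python
from itertools import product

# Hypothetical finite model (illustrative only).
S, C, W = [0, 1, 2], [0, 1], [0, 1]
alpha = 0.9


def U(x):
    """Control constraint: state 2 admits only control 0."""
    return [0] if x == 2 else C


def p(w, x, u):
    return 0.7 if w == u else 0.3


def f(x, u, w):
    return (x + u + w) % 3


def g(x, u):
    return float(x) - 0.8 * u


def t(y, x, u):
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def H(x, u, J):
    """g(x, u) + alpha * sum_y J(y) t({y} | x, u)."""
    return g(x, u) + alpha * sum(J[y] * t(y, x, u) for y in S)


def T(J):
    """Definition 8.5: minimize over the admissible controls."""
    return {x: min(H(x, u, J) for u in U(x)) for x in S}


def dp(K):
    """T^K(J_0): start from the zero function and apply T K times."""
    J = {x: 0.0 for x in S}
    for _ in range(K):
        J = T(J)
    return J


def brute_force(K):
    """Pointwise minimum of J_{K,pi} over all nonrandomized Markov policies."""
    selectors = [dict(zip(S, choice)) for choice in product(*(U(x) for x in S))]
    best = {x: float("inf") for x in S}
    for pi in product(selectors, repeat=K):
        J = {x: 0.0 for x in S}
        for mu in reversed(pi):   # Lemma 8.1: J_{K,pi} = T_{mu_0} ... T_{mu_{K-1}}(J_0)
            J = {x: H(x, mu[x], J) for x in S}
        best = {x: min(best[x], J[x]) for x in S}
    return best
```

The brute-force search grows like $|M|^K$, whereas the dynamic programming algorithm needs only $K$ applications of $T$.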
We give an example to show that even when $\Gamma = SC$ and the one-stage cost $g: SC \to R^*$ is Borel-measurable, $J_1^*$ can fail to be Borel-measurable.

EXAMPLE 1 Let $A$ be an analytic subset of $[0,1]$ which is not Borel-measurable (Appendix B). By Proposition 7.39, there is a closed set $F \subset [0,1]\mathscr{N}$ such that $A = \mathrm{proj}_{[0,1]}(F)$. Let $S = [0,1]$, $C = \mathscr{N}$, $\Gamma = SC$, and $g = \chi_{F^c}$. Then
$$J_1^*(x) = \inf_{u \in C} g(x,u) = \chi_{A^c}(x) \qquad \forall\, x \in S,$$
which is a lower semianalytic but not Borel-measurable function. We could also choose $C = [0,1]$, $\Gamma = SC$, $B$ a $G_\delta$-subset of the unit square $SC$, and $g = \chi_{B^c}$. This is because $\mathscr{N}$ and $\mathscr{N}_0$, the space of irrational numbers in $[0,1]$, are homeomorphic (Proposition 7.5). But
$$\mathscr{N}_0 = \bigcap_{r \in Q} ([0,1] - \{r\})$$
is a $G_\delta$-subset of $[0,1]$, so there is a homeomorphism $\varphi$ of $\mathscr{N}$ into $[0,1]$ such that $\varphi(\mathscr{N})$ is a $G_\delta$-subset of $[0,1]$. Let $\Phi: [0,1]\mathscr{N} \to [0,1][0,1]$ be the homeomorphism (onto its image) defined by
$$\Phi(x,z) = (x, \varphi(z)).$$
Then $\Phi([0,1]\mathscr{N}) = [0,1]\varphi(\mathscr{N})$ is a $G_\delta$-subset of $SC = [0,1][0,1]$, and since $F$ is a $G_\delta$-set in $[0,1]\mathscr{N}$, $B = \Phi(F)$ is a $G_\delta$-subset of $SC$ which satisfies $\mathrm{proj}_S(B) = A$. If $g = \chi_{B^c}$, then again $J_1^* = \chi_{A^c}$.

We now use Proposition 8.2 to establish existence of ε-optimal policies.

Proposition 8.3 (F+) For each $\varepsilon > 0$, there exists a nonrandomized Markov ε-optimal policy. (F-) For each $\varepsilon > 0$, there exists a nonrandomized semi-Markov ε-optimal policy and a (randomized) Markov ε-optimal policy.

Proof If (F+) holds, then the policy $(\bar\mu_0, \ldots, \bar\mu_{N-1})$ constructed in the proof of Proposition 8.2 is ε-optimal, nonrandomized, and Markov.
Assume (F-) holds. We show first the existence of an ε-optimal, nonrandomized, semi-Markov policy. Let $\pi^i = (\mu_0^i, \ldots, \mu_{N-1}^i)$ be as in the proof of Proposition 8.2. Then
$$J_N^* = T^N(J_0) = \inf_{(i_0,\ldots,i_{N-1})} (T_{\mu_0^{i_0}} \cdots T_{\mu_{N-1}^{i_{N-1}}})(J_0) = \inf_{(i_0,\ldots,i_{N-1})} J_{N,\pi^{(i_0,\ldots,i_{N-1})}},$$
where $\pi^{(i_0,\ldots,i_{N-1})} = (\mu_0^{i_0}, \ldots, \mu_{N-1}^{i_{N-1}})$. Choose $\varepsilon > 0$ and define
$$J_N^\varepsilon(x) = \begin{cases} J_N^*(x) + \varepsilon & \text{if } J_N^*(x) > -\infty,\\ -1/\varepsilon & \text{if } J_N^*(x) = -\infty.\end{cases}$$
Order linearly the countable set $\{\pi^{(i_0,\ldots,i_{N-1})} \mid i_0, \ldots, i_{N-1} \text{ are positive integers}\}$ and define $\pi(x)$ to be the first $\pi^{(i_0,\ldots,i_{N-1})}$ such that
$$J_{N,\pi^{(i_0,\ldots,i_{N-1})}}(x) \le J_N^\varepsilon(x).$$
Let the components of $\pi(x)$ be $(\hat\mu_0^x, \hat\mu_1^x, \ldots, \hat\mu_{N-1}^x)$. The set $\{x \mid \pi(x) = \pi^{(i_0,\ldots,i_{N-1})}\}$ is universally measurable for each $(i_0, \ldots, i_{N-1})$, so
$$(\mu_0(x_0), \mu_1(x_0,x_1), \ldots, \mu_{N-1}(x_0,x_{N-1})),$$
where $\mu_0(x_0) = \hat\mu_0^{x_0}(x_0)$ and $\mu_k(x_0,x_k) = \hat\mu_k^{x_0}(x_k)$, $k = 1, \ldots, N-1$, is an ε-optimal nonrandomized semi-Markov policy.

We now show the existence of an ε-optimal (randomized) Markov policy. By Lemma 8.2, there exist $\mu_{N-k} \in U(C|S)$ such that for $k = 1, \ldots, N$
$$(T_{\mu_{N-k}}T^{k-1})(J_0) \le T^k(J_0) + \varepsilon/(1 + \alpha + \alpha^2 + \cdots + \alpha^{N-1}).$$
Proceed as in (21). Q.E.D.
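The (F+) construction used in Proposition 8.3 (the selectors behind (21)) only requires each stage to come within $\varepsilon/(1 + \alpha + \cdots + \alpha^{N-1})$ of the infimum. In this hedged sketch on a hypothetical finite model (all data invented), we deliberately pick a non-minimizing control whenever one lies within that tolerance, and then check that the resulting Markov policy is still ε-optimal.

```python
# Hypothetical finite model (illustrative only); every control is admissible.
S, C, W = [0, 1, 2], [0, 1, 2], [0, 1]
alpha, N, eps = 0.9, 4, 0.3


def p(w, x, u):
    return 0.6 if w == 0 else 0.4


def f(x, u, w):
    return (x + u * w) % 3


def g(x, u):
    return 0.3 * x + [0.0, 0.05, 0.1][u]


def t(y, x, u):
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def H(x, u, J):
    return g(x, u) + alpha * sum(J[y] * t(y, x, u) for y in S)


def T(J):
    return {x: min(H(x, u, J) for u in C) for x in S}


def eps_optimal_markov_policy():
    """Choose mu_k with (T_{mu_k} T^{N-k-1})(J_0) <= T^{N-k}(J_0) + delta, where
    delta = eps / (1 + alpha + ... + alpha^{N-1}).  Among the controls within delta
    of the minimum we take the largest index, so mu_k need not be a minimizer."""
    delta = eps / sum(alpha**j for j in range(N))
    iterates = [{x: 0.0 for x in S}]          # iterates[k] = T^k(J_0)
    for _ in range(N):
        iterates.append(T(iterates[-1]))
    policy = []
    for k in range(N):
        J = iterates[N - k - 1]
        mu = {}
        for x in S:
            target = min(H(x, u, J) for u in C) + delta
            mu[x] = max(u for u in C if H(x, u, J) <= target)
        policy.append(mu)
    return policy, iterates[N]


def policy_cost(policy):
    """J_{N,pi} = (T_{mu_0} ... T_{mu_{N-1}})(J_0), by Lemma 8.1."""
    J = {x: 0.0 for x in S}
    for mu in reversed(policy):
        J = {x: H(x, mu[x], J) for x in S}
    return J
```

The per-stage tolerance is scaled by $1 + \alpha + \cdots + \alpha^{N-1}$ because an error made at stage $k$ is discounted by $\alpha^k$ when it propagates to the N-stage cost, exactly as in the chain of inequalities (21).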
If the (F-) assumption holds and $\varepsilon > 0$, it may not be possible to find a nonrandomized Markov ε-optimal policy, as the following example demonstrates.

EXAMPLE 2 Let $S = \{0, 1, 2, \ldots\}$, $C = \{1, 2, \ldots\}$, $W = \{w_1, w_2\}$, $\Gamma = SC$, $N = 2$, and define
The (F-) assumption is satisfied. Let $\pi = (\mu_0, \mu_1)$ be a nonrandomized Markov policy. If the initial state $x_0$ is neither zero nor one, then regardless of the policy employed, $x_1 = 0$ with probability $1 - (1/x_0)$, and $x_1 = 1$ with probability $1/x_0$. Once the system reaches zero, it remains there at no further cost. If the system reaches one, it moves to $x_2 = 0$ at a cost of $-\mu_1(1)$. Thus $J_{N,\pi}(x_0) = -\mu_1(1)/x_0$ if $x_0 \ne 0$, $x_0 \ne 1$, and $J_N^*(x_0) = -\infty$ if $x_0 \ne 0$, $x_0 \ne 1$. For any $\varepsilon > 0$, $\pi$ cannot be ε-optimal, since $-\mu_1(1)/x_0 > -1/\varepsilon$ whenever $x_0 > \varepsilon\mu_1(1)$.

In Example 2, it is possible to find a sequence of nonrandomized Markov policies $\{\pi_n\}$ such that $J_{N,\pi_n} \downarrow J_N^*$. This example motivates the idea of policies exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality (Definition 8.3) and Proposition 8.4, which we prove with the aid of the next lemma.

Lemma 8.5 Let $\{J_k\}$ be a sequence of universally measurable functions from $S$ to $R^*$ and $\mu$ a universally measurable function from $S$ to $C$ whose graph lies in $\Gamma$. Suppose for some sequence $\{\varepsilon_k\}$ of positive numbers with $\sum_{k=1}^\infty \varepsilon_k < \infty$, we have, for every $x \in S$,
$$\int_S J_1^+(x')\, t(dx' \mid x,\mu(x)) < \infty, \qquad \lim_{k\to\infty} J_k(x) = J(x),$$
and for $k = 2, 3, \ldots$
$$\begin{cases} J(x) \le J_k(x) \le J(x) + \varepsilon_k & \text{if } J(x) > -\infty,\\ J_k(x) \le J_{k-1}(x) + \varepsilon_k & \text{if } J(x) = -\infty.\end{cases}$$
Then
$$\lim_{k\to\infty} T_\mu(J_k) = T_\mu(J). \qquad (26)$$
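Proof Since $J \le J_k$ for every $k$, it is clear that
$$T_\mu(J) \le \liminf_{k\to\infty} T_\mu(J_k). \qquad (27)$$
For $x \in S$,
$$T_\mu(J_k)(x) = g[x,\mu(x)] + \alpha \int_{\{x' \mid J(x') > -\infty\}} J_k(x')\, t(dx' \mid x,\mu(x)) + \alpha \int_{\{x' \mid J(x') = -\infty\}} J_k(x')\, t(dx' \mid x,\mu(x)).$$
Now
$$\limsup_{k\to\infty} \int_{\{x' \mid J(x') > -\infty\}} J_k(x')\, t(dx' \mid x,\mu(x)) \le \int_{\{x' \mid J(x') > -\infty\}} J(x')\, t(dx' \mid x,\mu(x)) + \lim_{k\to\infty} \varepsilon_k = \int_{\{x' \mid J(x') > -\infty\}} J(x')\, t(dx' \mid x,\mu(x)).$$
If $J(x') = -\infty$, then
$$J_k(x') \le J_1^+(x') + \sum_{j=1}^\infty \varepsilon_j, \qquad k = 1, 2, \ldots,$$
and since $\int [J_1^+(x') + \sum_{j=1}^\infty \varepsilon_j]\, t(dx' \mid x,\mu(x)) < \infty$, we have (Fatou's lemma)
$$\limsup_{k\to\infty} \int_{\{x' \mid J(x') = -\infty\}} J_k(x')\, t(dx' \mid x,\mu(x)) \le \int_{\{x' \mid J(x') = -\infty\}} J(x')\, t(dx' \mid x,\mu(x)).$$
It follows that
$$\limsup_{k\to\infty} T_\mu(J_k) \le T_\mu(J). \qquad (28)$$
Combine (27) and (28) to conclude (26). Q.E.D.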
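Returning to Example 2, the conclusions drawn there follow from the formula $J_{N,\pi}(x_0) = -\mu_1(1)/x_0$, $x_0 \ge 2$, so they can be checked with exact rational arithmetic without the remaining model data. In the sketch below, the policies $\pi_n$ with $\mu_1^n(1) = n$ are a hypothetical choice: `violating_state` exhibits, for any fixed nonrandomized Markov policy, a state where ε-optimality fails, and `dominated_along` verifies the $J_N^* = -\infty$ branch of (6) for $\{\pi_n\}$ at a given state.

```python
import math
from fractions import Fraction


def example2_cost(mu1_at_one, x0):
    """J_{N,pi}(x0) = -mu_1(1)/x0 for x0 not in {0, 1}, as computed in Example 2."""
    if x0 in (0, 1):
        raise ValueError("the formula is stated only for x0 >= 2")
    return Fraction(-mu1_at_one, x0)


def violating_state(mu1_at_one, eps):
    """A state x0 >= 2 with J_{N,pi}(x0) > -1/eps, i.e. any x0 > eps * mu_1(1)."""
    return max(2, math.floor(eps * mu1_at_one) + 1)


def dominated_along(x0, eps_seq):
    """Check J_{N,pi_n}(x0) <= J_{N,pi_{n-1}}(x0) + eps_n for the hypothetical
    policies pi_n with mu_1^n(1) = n (the J* = -infinity branch of (6))."""
    costs = [example2_cost(n, x0) for n in range(1, len(eps_seq) + 1)]
    return all(costs[n] <= costs[n - 1] + eps_seq[n] for n in range(1, len(costs)))
```

Since $J_{N,\pi_n}(x_0) = -n/x_0 \to -\infty$ at every $x_0 \ge 2$, no single policy in the sequence is ε-optimal, yet the sequence converges to $J_N^*$ in the dominated sense, which is what Proposition 8.4 guarantees in general.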
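Proposition 8.4 (F-) Let $\{\varepsilon_n\}$ be a sequence of positive numbers with $\varepsilon_n \downarrow 0$. There exists a sequence of nonrandomized Markov policies $\{\pi_n\}$ exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality. In particular, if $J_N^*(x) > -\infty$ for all $x \in S$, then for every $\varepsilon > 0$ there exists an ε-optimal nonrandomized Markov policy.

Proof For $N = 1$, by Proposition 7.50 there exists a sequence of nonrandomized Markov policies $\pi_n = (\mu_0^n)$ such that for all $n$
$$T_{\mu_0^n}(J_0)(x) \le \begin{cases} T(J_0)(x) + \varepsilon_n & \text{if } T(J_0)(x) > -\infty,\\ -1/\varepsilon_n & \text{if } T(J_0)(x) = -\infty.\end{cases}$$
We may assume without loss of generality that
$$T_{\mu_0^n}(J_0) \le T_{\mu_0^{n-1}}(J_0), \qquad n = 2, 3, \ldots.$$
Therefore $\{\pi_n\}$ exhibits $\{\varepsilon_n\}$-dominated convergence to one-stage optimality. Suppose the result holds for $N-1$. Let $\pi_n = (\mu_1^n, \ldots, \mu_{N-1}^n)$ be a sequence of $(N-1)$-stage nonrandomized Markov policies exhibiting $\{\varepsilon_n/2\alpha\}$-dominated convergence to $(N-1)$-stage optimality, i.e.,
$$\lim_{n\to\infty} J_{N-1,\pi_n} = J_{N-1}^*,$$
and, for $n = 2, 3, \ldots$,
$$J_{N-1,\pi_n}(x) \le J_{N-1}^*(x) + \varepsilon_n/2\alpha \qquad \text{if } J_{N-1}^*(x) > -\infty, \qquad (29)$$
$$J_{N-1,\pi_n}(x) \le J_{N-1,\pi_{n-1}}(x) + \varepsilon_n/2\alpha \qquad \text{if } J_{N-1}^*(x) = -\infty. \qquad (30)$$
We assume without loss of generality that $\sum_{n=1}^\infty \varepsilon_n < \infty$. By Proposition 7.50, there exists a sequence $\{\mu^n\}$ of universally measurable functions from $S$ to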
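$C$ whose graphs lie in $\Gamma$ such that
$$T_{\mu^n}(J_{N-1}^*)(x) \le \begin{cases} T(J_{N-1}^*)(x) + \varepsilon_n/2 & \text{if } T(J_{N-1}^*)(x) > -\infty,\\ -1/\varepsilon_n & \text{if } T(J_{N-1}^*)(x) = -\infty.\end{cases} \qquad (31)$$
We may assume without loss of generality that
$$T_{\mu^n}(J_{N-1}^*) \le T_{\mu^{n-1}}(J_{N-1}^*), \qquad n = 2, 3, \ldots. \qquad (32)$$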
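By Proposition 7.48, the set
$$A(J_{N-1}^*) = \{(x,u) \in \Gamma \mid t(\{x' \mid J_{N-1}^*(x') = -\infty\} \mid x,u) > 0\}$$
is analytic in $SC$, and the Jankov-von Neumann theorem (Proposition 7.49) implies the existence of a universally measurable $\varphi: \mathrm{proj}_S[A(J_{N-1}^*)] \to C$ whose graph lies in $A(J_{N-1}^*)$. Define
$$\tilde\mu^n(x) = \begin{cases} \varphi(x) & \text{if } x \in \mathrm{proj}_S[A(J_{N-1}^*)],\\ \mu^n(x) & \text{otherwise.}\end{cases}$$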
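Then $\tilde\pi_n = (\tilde\mu^n, \pi_n)$ is an N-stage nonrandomized Markov policy which will be shown to exhibit $\{\varepsilon_n\}$-dominated convergence to optimality. For $x \in \mathrm{proj}_S[A(J_{N-1}^*)]$, we have, from Lemma 8.5 and the choice of $\varphi$,
$$\limsup_{n\to\infty} J_{N,\tilde\pi_n}(x) = \limsup_{n\to\infty} T_\varphi(J_{N-1,\pi_n})(x) = T_\varphi(J_{N-1}^*)(x) = -\infty. \qquad (33)$$
For $x \notin \mathrm{proj}_S[A(J_{N-1}^*)]$, we have $t(\{x' \mid J_{N-1}^*(x') = -\infty\} \mid x,u) = 0$ for every $u \in U(x)$, so by (29)
$$J_{N,\tilde\pi_n}(x) = T_{\mu^n}(J_{N-1,\pi_n})(x) \le T_{\mu^n}(J_{N-1}^*)(x) + \varepsilon_n/2, \qquad (34)$$
and
$$\limsup_{n\to\infty} J_{N,\tilde\pi_n}(x) \le \limsup_{n\to\infty} T_{\mu^n}(J_{N-1}^*)(x) \le J_N^*(x)$$
by (31). It follows that $\lim_{n\to\infty} J_{N,\tilde\pi_n} = J_N^*$.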
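Suppose for fixed $x \in S$ we have $J_N^*(x) > -\infty$. Then $x \notin \mathrm{proj}_S[A(J_{N-1}^*)]$, and we have, from (31) and (34),
$$J_{N,\tilde\pi_n}(x) \le T(J_{N-1}^*)(x) + \varepsilon_n = J_N^*(x) + \varepsilon_n. \qquad (35)$$
Suppose now that $J_N^*(x) = -\infty$. If $x \notin \mathrm{proj}_S[A(J_{N-1}^*)]$, then (32) and (34) imply, for $n \ge 2$,
$$J_{N,\tilde\pi_n}(x) \le T_{\mu^n}(J_{N-1}^*)(x) + \varepsilon_n/2 \le T_{\mu^{n-1}}(J_{N-1}^*)(x) + \varepsilon_n/2 \le T_{\mu^{n-1}}(J_{N-1,\pi_{n-1}})(x) + \varepsilon_n/2 = J_{N,\tilde\pi_{n-1}}(x) + \varepsilon_n/2, \qquad (36)$$
while if $x \in \mathrm{proj}_S[A(J_{N-1}^*)]$, we have, from (29) and (30),
$$J_{N,\tilde\pi_n}(x) = T_\varphi(J_{N-1,\pi_n})(x) \le T_\varphi(J_{N-1,\pi_{n-1}})(x) + \varepsilon_n/2 = J_{N,\tilde\pi_{n-1}}(x) + \varepsilon_n/2.$$
In either case,
$$J_{N,\tilde\pi_n}(x) \le J_{N,\tilde\pi_{n-1}}(x) + \varepsilon_n. \qquad (37)$$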
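From (35)-(37) we see that $\{\tilde\pi_n\}$ exhibits $\{\varepsilon_n\}$-dominated convergence to optimality. Q.E.D.

We conclude our discussion of the ramifications of Proposition 8.2 with a technical result needed for the development in Chapter 10.

Lemma 8.6 (F+)(F-) For every $p \in P(S)$,
$$\int_S J_N^*(x)\, p(dx) = \inf_{\pi \in \Pi'} \int_S J_{N,\pi}(x)\, p(dx).$$

Proof For $p \in P(S)$ and $\pi \in \Pi'$,
$$\int_S J_N^*(x)\, p(dx) \le \int_S J_{N,\pi}(x)\, p(dx),$$
which implies
$$\int_S J_N^*(x)\, p(dx) \le \inf_{\pi \in \Pi'} \int_S J_{N,\pi}(x)\, p(dx). \qquad (38)$$
Choose $\varepsilon > 0$ and let $\bar\pi \in \Pi'$ be ε-optimal. If $p(\{x \mid J_N^*(x) = -\infty\}) = 0$, then
$$\int_S J_{N,\bar\pi}(x)\, p(dx) \le \int_S J_N^*(x)\, p(dx) + \varepsilon,$$
and it follows that
$$\inf_{\pi \in \Pi'} \int_S J_{N,\pi}(x)\, p(dx) \le \int_S J_N^*(x)\, p(dx). \qquad (39)$$
If $p(\{x \mid J_N^*(x) = -\infty\}) > 0$, then
$$\int_S J_{N,\bar\pi}(x)\, p(dx) \le \int_{\{x \mid J_N^*(x) > -\infty\}} J_N^*(x)\, p(dx) + \varepsilon - \frac{1}{\varepsilon}\, p(\{x \mid J_N^*(x) = -\infty\}). \qquad (40)$$
If $\int_{\{x \mid J_N^*(x) > -\infty\}} J_N^*(x)\, p(dx) = \infty$, then $\int_S J_N^*(x)\, p(dx) = \infty$ and (39) follows. Otherwise, the right-hand side of (40) can be made arbitrarily small by letting ε approach zero, so $\inf_{\pi \in \Pi'} \int_S J_{N,\pi}(x)\, p(dx) = -\infty$ and (39) is again valid. The lemma follows from (38) and (39). Q.E.D.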
We now consider the question of constructing an optimal policy, if this is at all possible. When the dynamic programming algorithm can be used to construct an optimal policy, this policy usually satisfies a condition stronger than mere optimality. This condition is given in the next definition.

Definition 8.6 Let $\pi = (\mu_0, \ldots, \mu_{N-1})$ be a Markov policy and $\pi^{N-k} = (\mu_k, \ldots, \mu_{N-1})$, $k = 0, \ldots, N-1$. The policy $\pi$ is uniformly N-stage optimal if
$$J_{N-k,\pi^{N-k}} = J_{N-k}^*, \qquad k = 0, \ldots, N-1.$$

Lemma 8.7 (F+)(F-) The policy $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$ is uniformly N-stage optimal if and only if
$$(T_{\mu_k}T^{N-k-1})(J_0) = T^{N-k}(J_0), \qquad k = 0, \ldots, N-1.$$

Proof If $\pi = (\mu_0, \ldots, \mu_{N-1})$ is uniformly N-stage optimal, then for all $k = 0, \ldots, N-1$,
$$(T_{\mu_k}T^{N-k-1})(J_0) = T_{\mu_k}(J_{N-k-1}^*) = T_{\mu_k}(J_{N-k-1,\pi^{N-k-1}}) = J_{N-k,\pi^{N-k}} = J_{N-k}^* = T^{N-k}(J_0),$$
where $J_{0,\pi^0} = J_0$. If $(T_{\mu_k}T^{N-k-1})(J_0) = T^{N-k}(J_0)$, $k = 0, \ldots, N-1$, then for all $k = 0, \ldots, N-1$,
$$J_{N-k,\pi^{N-k}} = (T_{\mu_k} \cdots T_{\mu_{N-1}})(J_0) = (T_{\mu_k} \cdots T_{\mu_{N-2}}T)(J_0) = \cdots = T^{N-k}(J_0) = J_{N-k}^*,$$
where the next to last equality is obtained by continuing the process used to obtain the previous equalities. Q.E.D.

Lemma 8.7 is the analog for the Borel model of Proposition 3.3 for the model of Part I. Because (F+) or (F-) is a required assumption in Lemma 8.1, one of them is also required in Lemma 8.7, as the following example shows. If we take (16) as the definition of $J_{K,\pi}$, then Lemma 8.7, Proposition 8.5, and Corollaries 8.5.1 and 8.5.2 hold without the (F+) and (F-) assumptions. The proofs are similar to those of Section 3.2.
EXAMPLE 3 Let $S = \{s, t\} \cup \{(k,j) \mid k = 1, 2, \ldots;\ j = 1, 2\}$, $C = \{a, b\}$, $U(s) = \{a, b\}$, $U(t) = U(k,j) = \{b\}$, $k = 1, 2, \ldots$, $j = 1, 2$, $W = S$, and $\alpha = 1$. Let the disturbance kernel be given by $p(s \mid s, a) = 1$, $p[t \mid (k,2), b] = 1$, $k = 1, 2, \ldots$, and $p(t \mid t, b) = 1$. Let the system function be $f(x,u,w) = w$. Thus if the system begins at state $s$, we can hold it at $s$ or allow it to move to some state $(k,1)$, from which it subsequently moves to some $(k,2)$ and then to $t$. Having reached $t$, the system remains there. The relevant costs are $g(s,a) = g(s,b) = g(t,b) = 0$, $g[(k,1),b] = k$, $g[(k,2),b] = -k$, $k = 1, 2, \ldots$. Let $\pi = (\mu_0, \mu_1, \mu_2)$ be a policy with $\mu_0(s) = b$, $\mu_1(s) = \mu_2(s) = a$. Then
$$T(J_0)(x_2) = T_{\mu_2}(J_0)(x_2) = \begin{cases} 0 & \text{if } x_2 = s,\\ k & \text{if } x_2 = (k,1),\\ -k & \text{if } x_2 = (k,2),\\ 0 & \text{if } x_2 = t,\end{cases}$$
$$T^2(J_0)(x_1) = (T_{\mu_1}T_{\mu_2})(J_0)(x_1) = \begin{cases} 0 & \text{if } x_1 = s,\\ -\infty & \text{if } x_1 = (k,1),\\ -k & \text{if } x_1 = (k,2),\\ 0 & \text{if } x_1 = t,\end{cases}$$
$$T^3(J_0)(x_0) = (T_{\mu_0}T_{\mu_1}T_{\mu_2})(J_0)(x_0) = \begin{cases} -\infty & \text{if } x_0 = s,\\ -\infty & \text{if } x_0 = (k,1),\\ -k & \text{if } x_0 = (k,2),\\ 0 & \text{if } x_0 = t.\end{cases}$$
However, $J_{3,\pi}(s) = \infty > J_{3,\bar\pi}(s) = 0$, where $\bar\pi = (\bar\mu_0, \bar\mu_1, \bar\mu_2)$ and $\bar\mu_0(s) = \bar\mu_1(s) = \bar\mu_2(s) = a$, so $\pi$ is not optimal and $T^3(J_0) \ne J_3^*$. It is easily verified that $\bar\pi$ is a uniformly three-stage optimal policy, so Corollary 3.3.1(b) also fails to hold for the Borel model of this chapter. Here both assumptions (F+) and (F-) are violated.

Proposition 8.5 (F+)(F-) If the infimum in
$$\inf_{u \in U(x)} \left\{g(x,u) + \alpha \int_S J_k^*(x')\, t(dx' \mid x,u)\right\} \qquad (41)$$
is achieved for each $x \in S$ and $k = 0, \ldots, N-1$, where $J_0^*$ is identically zero, then a uniformly N-stage optimal (and hence optimal) nonrandomized Markov policy exists. This policy is generated by the dynamic programming algorithm, i.e., by measurably selecting for each $x$ a control $u$ which achieves the infimum.

Proof Let $\pi = (\mu_0, \ldots, \mu_{N-1})$, where $\mu_{N-k-1}: S \to C$ achieves the infimum in (41) and satisfies $\mu_{N-k-1}(x) \in U(x)$ for every $x \in S$, $k = 0, \ldots, N-1$ (Proposition 7.50). Apply Lemma 8.7. Q.E.D.
Corollary 8.5.1 (F+)(F-) If $U(x)$ is a finite set for each $x \in S$, then a uniformly N-stage optimal nonrandomized Markov policy exists.

Corollary 8.5.2 (F+)(F-) If for each $x \in S$, $\lambda \in R$, and $k = 0, \ldots, N-1$, the set
$$\left\{u \in U(x) \;\middle|\; g(x,u) + \alpha \int_S J_k^*(x')\, t(dx' \mid x,u) \le \lambda\right\}$$
is compact, then there exists a uniformly N-stage optimal nonrandomized Markov policy.

Proof Apply Lemma 3.1 to Proposition 8.5. Q.E.D.
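Proposition 8.5 and Corollary 8.5.1 describe the policy produced by the dynamic programming algorithm. For a hypothetical finite model (invented data, with finite $U(x)$ as in Corollary 8.5.1), the sketch below selects $\mu_{N-k-1}(x)$ attaining the infimum in (41) and then checks the condition of Lemma 8.7 and Definition 8.6 directly: every tail policy $\pi^{N-k} = (\mu_k, \ldots, \mu_{N-1})$ attains $J_{N-k}^*$.

```python
# Hypothetical finite model (illustrative only): U(x) is finite.
S, C, W = [0, 1, 2], [0, 1], [0, 1]
alpha, N = 0.9, 3


def U(x):
    return [0] if x == 2 else C


def p(w, x, u):
    return 0.7 if w == u else 0.3


def f(x, u, w):
    return (x + u + w) % 3


def g(x, u):
    return float(x) - 0.8 * u


def t(y, x, u):
    return sum(p(w, x, u) for w in W if f(x, u, w) == y)


def H(x, u, J):
    return g(x, u) + alpha * sum(J[y] * t(y, x, u) for y in S)


def greedy_policy():
    """Run the DP algorithm; mu_{N-k-1}(x) attains the infimum in (41) for J_k^*."""
    J_star = [{x: 0.0 for x in S}]                # J_star[k] = J_k^* = T^k(J_0)
    policy = [None] * N
    for k in range(N):
        J = J_star[k]
        mu = {x: min(U(x), key=lambda u: H(x, u, J)) for x in S}
        policy[N - k - 1] = mu
        J_star.append({x: H(x, mu[x], J) for x in S})
    return policy, J_star


def tail_cost(policy, k):
    """J_{N-k, pi^{N-k}} for pi^{N-k} = (mu_k, ..., mu_{N-1}), via Lemma 8.1."""
    J = {x: 0.0 for x in S}
    for mu in reversed(policy[k:]):
        J = {x: H(x, mu[x], J) for x in S}
    return J
```

Storing every iterate $J_k^*$ is what makes the selection at stage $N-k-1$ possible; the resulting policy is optimal not only from stage 0 but from every intermediate stage, which is the content of uniform N-stage optimality.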
8.3 The Semicontinuous Models

Along the lines of our development of lower and upper semicontinuous functions in Section 7.5, we can consider lower and upper semicontinuous decision models. Our models will be designed to take advantage of the possibility for Borel-measurable selection (Propositions 7.33 and 7.34), and in the case of lower semicontinuity, the attainment of the infimum in (55) of Chapter 7. We discuss the lower semicontinuous model first.

Definition 8.7 The lower semicontinuous, finite horizon, stochastic, optimal control model is a nine-tuple $(S, C, U, W, p, f, \alpha, g, N)$ as given in Definition 8.1 which has the following additional properties:
(a) The control space $C$ is compact.
(b) The set $\Gamma$ defined by (1) has the form $\Gamma = \bigcup_{j=1}^\infty \Gamma^j$, where $\Gamma^1 \subset \Gamma^2 \subset \cdots$, each $\Gamma^j$ is a closed subset of $SC$, and
$$\lim_{j\to\infty}\ \inf_{(x,u) \in \Gamma^j - \Gamma^{j-1}} g(x,u) = \infty.^\dagger$$
(c) The disturbance kernel $p(dw \mid x,u)$ is continuous on $\Gamma$.
(d) The system function $f$ is continuous on $\Gamma W$.
(e) The one-stage cost function $g$ is lower semicontinuous and bounded below on $\Gamma$.
† By convention, the infimum over the empty set is $\infty$, so this condition is satisfied if the $\Gamma^j$ are all identical for $j$ larger than some index $k$.

Conditions (c) and (d) of Definition 8.7 and Proposition 7.30 imply that $t(dx' \mid x,u)$ defined by (3) is continuous on $\Gamma$, since for any $h \in C(S)$ we have
$$\int_S h(x')\, t(dx' \mid x,u) = \int_W h[f(x,u,w)]\, p(dw \mid x,u).$$
Condition (e) implies that the (F+) assumption holds.

Proposition 8.6 Consider the lower semicontinuous finite horizon model of Definition 8.7. For $k = 1, 2, \ldots, N$, the k-stage optimal cost function $J_k^*$ is lower semicontinuous and bounded below, and $J_k^* = T^k(J_0)$. Furthermore, a Borel-measurable, uniformly N-stage optimal, nonrandomized Markov policy exists.

Proof Suppose $J: S \to R^*$ is lower semicontinuous and bounded below, and define $K: \Gamma \to R^*$ by
$$K(x,u) = g(x,u) + \alpha \int_S J(x')\, t(dx' \mid x,u). \qquad (42)$$
Extend $K$ to all of $SC$ by defining
$$\bar K(x,u) = \begin{cases} K(x,u) & \text{if } (x,u) \in \Gamma,\\ \infty & \text{if } (x,u) \notin \Gamma.\end{cases}$$
By Proposition 7.31(a) and the remarks following Lemma 7.13, the function $K$ is lower semicontinuous on $\Gamma$. For $c \in R$, the set $\{(x,u) \in SC \mid \bar K(x,u) \le c\}$ must be contained in some $\Gamma^k$ by Definition 8.7(b), so the set
$$\{(x,u) \in SC \mid \bar K(x,u) \le c\} = \{(x,u) \in \Gamma^k \mid K(x,u) \le c\}$$
is closed in $\Gamma^k$ and thus closed in $SC$ as well. It follows that $\bar K$ is lower semicontinuous and bounded below on $SC$ and, by Proposition 7.32, the function
$$T(J)(x) = \inf_{u \in C} \bar K(x,u) \qquad (43)$$
is as well. In fact, Proposition 7.33 states that the infimum in (43) is achieved for every x E S, and there exists a Borel-measurable q : S -+ C such that For j = 1,2, . . . , let qj: proj,(rj) + C be a Borel-measurable function with graph in Tj.(Set D = T j in Proposition 7.33 to establish the existence of such p(x) = ql(x) a function.) Define p: S + C so that p(x) = ~ ( xif) T(J)(x) < a, if T(J)(x)= E and x ~ p r o j , ( r ' ) ;and for j = 2,3, . . . , define p(x) = cpj(x) if
T(J)(x) = co and xEprojS(rj)- proj,(Tj-I). Then p is Borel-measurable, p(x) E U(x) for every x E S, and T,(J) = T(J). Since J, r 0 is lower semicontinuous and bounded below, the above argument shows that J,*= Tk(J,) has these properties also, and furthermore, for each k = 0, . . . ,N - 1, there exists a Borel-measurable p,: S -+ C such that pk(x)E U(x) for every x E S and (T,*T~-~-')(J,)= TN-k(JO).The Q.E.D. proposition follows from Lemma 8.7. We note that although condition (a) of Definition 8.7 requires the compactness of C, the conclusion of Proposition 8.6 still holds if C is not compact but can be homeomorphically embedded in a compact space in such a way that the image of Tj, j = 1,2, . . . , is closed in sC. That is to say, the conclusion holds if there is a compact space C and a homeomorphism cp: C -, such that for j = 1,2, . . . , @(Tj)is closed in sC, where
c
The continuity of $f$ and $p(dw\mid x,u)$ and the lower semicontinuity of $g$ are unaffected by this embedding. In particular, if $\Gamma_j$ is compact for each $j$, we can take $\hat C = \mathscr{H}$, the Hilbert cube, and use Urysohn's theorem (Proposition 7.2) and the fact that the continuous image of a compact set is compact to accomplish this transformation. We state this last result as a corollary.
Corollary 8.6.1 The conclusions of Proposition 8.6 hold if, instead of assuming that $C$ is compact and each $\Gamma_j$ is closed in Definition 8.7, we assume that each $\Gamma_j$ is compact.

Definition 8.8 The upper semicontinuous, finite horizon, stochastic, optimal control model is a nine-tuple $(S, C, U, W, p, f, \alpha, g, N)$ as given in Definition 8.1 which has the following additional properties:
(a) The set $\Gamma$ defined by (1) is open in $SC$.
(b) The disturbance kernel $p(dw\mid x,u)$ is continuous on $\Gamma$.
(c) The system function $f$ is continuous on $\Gamma W$.
(d) The one-stage cost $g$ is upper semicontinuous and bounded above on $\Gamma$.

As in the lower semicontinuous model, the stochastic kernel $t(dx'\mid x,u)$ is continuous in the upper semicontinuous model. In the upper semicontinuous model, the (F-) assumption holds. If $J: S \to R^*$ is upper semicontinuous and bounded above, then $K: \Gamma \to R^*$ defined by (42) is upper semicontinuous and bounded above. By Proposition 7.34, the function
$$T(J)(x) = \inf_{u \in U(x)} K(x,u)$$
is upper semicontinuous, and for every $\varepsilon > 0$ there exists a Borel-measurable $\varphi: S \to C$ such that $\varphi(x) \in U(x)$ for every $x \in S$, and
$$K[x, \varphi(x)] \le \begin{cases} T(J)(x) + \varepsilon & \text{if } T(J)(x) > -\infty,\\ -1/\varepsilon & \text{if } T(J)(x) = -\infty. \end{cases}$$
Since $J_0 \equiv 0$ is upper semicontinuous and bounded above, so is $J_k^* = T^k(J_0)$, $k = 1, 2, \ldots, N$. The following proposition is obtained by using these facts to parallel the proof of the (F-) part of Proposition 8.3.

Proposition 8.7 Consider the upper semicontinuous finite horizon model of Definition 8.8. For $k = 1, 2, \ldots, N$, the $k$-stage optimal cost function $J_k^*$ is upper semicontinuous and bounded above, and $J_k^* = T^k(J_0)$. For each $\varepsilon > 0$, there exists a Borel-measurable, nonrandomized, semi-Markov, $\varepsilon$-optimal policy and a Borel-measurable, (randomized) Markov, $\varepsilon$-optimal policy.
Actually, it is not necessary that $S$ and $C$ be Borel spaces for Proposition 8.7 to hold. Assuming only that $S$ and $C$ are separable metrizable spaces, one can use the results on upper semicontinuity of Section 7.5 and the other assumptions of the upper semicontinuous model to prove the conclusion of Proposition 8.7. It is not possible, however, to parallel the proof of Proposition 8.4 to show for the upper semicontinuous model that, given a sequence of positive numbers $\{\varepsilon_n\}$ with $\varepsilon_n \downarrow 0$, there exists a sequence of Borel-measurable, nonrandomized, Markov policies exhibiting $\{\varepsilon_n\}$-dominated convergence to optimality. The set $A(J_{k-1}^*)$ defined by (33) may not be open, so the proof breaks down when one is restricted to Borel-measurable policies.

We conclude this section by pointing out one important case in which the disturbance kernel $p(dw\mid x,u)$ is continuous. If $W$ is $n$-dimensional Euclidean space and the distribution of $w$ is given by a density $d(w\mid x,u)$ which is jointly continuous in $(x,u)$ for fixed $w$, then $p(dw\mid x,u)$ is continuous. To see this, let $G$ be an open set in $W$ and let $(x_k, u_k) \to (x,u)$ in $SC$. Then
$$\liminf_{k\to\infty} p(G\mid x_k,u_k) = \liminf_{k\to\infty} \int_G d(w\mid x_k,u_k)\,dw \ge \int_G d(w\mid x,u)\,dw = p(G\mid x,u)$$
by Fatou's lemma. The continuity of $p(dw\mid x,u)$ follows from Proposition 7.21.†

† Note that the same argument applies to $W - G$, so that $\liminf_{k\to\infty} p(W - G\mid x_k,u_k) \ge p(W - G\mid x,u)$, i.e., $\limsup_{k\to\infty} p(G\mid x_k,u_k) \le p(G\mid x,u)$; hence $p(G\mid x_k,u_k) \to p(G\mid x,u)$. Under this condition, the assumption that the system function is continuous in the state (Definitions 8.7(d) and 8.8(c)) can be weakened. See [H3] and [S5].
In fact, it is not necessary that $d$ be continuous in $(x,u)$ for each $w$, but only that $(x_k,u_k) \to (x,u)$ imply $d(w\mid x_k,u_k) \to d(w\mid x,u)$ for Lebesgue almost all $w$. For example, if $W = R$, the exponential density
$$d(w\mid x,u) = \begin{cases} \exp[-(w - m(x,u))] & \text{if } w \ge m(x,u),\\ 0 & \text{if } w < m(x,u), \end{cases}$$
where $m: SC \to R$ is continuous, has this property, but need not be continuous in $(x,u)$ for any $w \in R$.
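A quick numerical check of this remark, offered only as a sketch: the code below assumes the hypothetical location function $m(x,u) = x + u$ (chosen purely for illustration) and a sequence $(x_k,u_k) \to (0.5, 0.5)$ approaching from above. At $w = 1 = m(0.5, 0.5)$ the density along the sequence stays at 0 while the limiting density equals 1, yet $p(G\mid x_k,u_k) \to p(G\mid x,u)$ for $G = (0,2)$, because the densities do converge at every $w \ne 1$.

```python
import math

def m(x, u):
    # Hypothetical continuous location function m: SC -> R (illustration only).
    return x + u

def density(w, x, u):
    """Shifted exponential density d(w | x, u) from the text."""
    mu = m(x, u)
    return math.exp(-(w - mu)) if w >= mu else 0.0

def prob_interval(a, b, x, u):
    """p((a, b) | x, u), computed in closed form from the density."""
    mu = m(x, u)
    lo = max(a, mu)
    if lo >= b:
        return 0.0
    # Integral of exp(-(w - mu)) over (lo, b).
    return math.exp(-(lo - mu)) - math.exp(-(b - mu))

# (x_k, u_k) -> (x, u) = (0.5, 0.5) from above, so m(x_k, u_k) decreases to 1.
x, u = 0.5, 0.5
seq = [(x + 1.0 / k, u) for k in range(1, 2001)]

# The density at w = 1 does not converge: 0 along the sequence, 1 at the limit.
print(density(1.0, *seq[-1]), density(1.0, x, u))

# The probability of G = (0, 2) does converge to p(G | x, u) = 1 - e^{-1}.
probs = [prob_interval(0.0, 2.0, xk, uk) for xk, uk in seq]
limit = prob_interval(0.0, 2.0, x, u)
print(probs[-1], limit)
```

The failure of convergence on a single point $w$ is invisible to the integral, which is exactly why almost-everywhere convergence suffices in the Fatou argument.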
Chapter 9
The Infinite Horizon Borel Models
A first approach to the analysis of the infinite horizon decision model is to treat it as the limit of the finite horizon model as the horizon tends to infinity. In the case (N) of a nonpositive cost per stage and the case (D) of bounded cost per stage and discount factor less than one, this procedure has merit. However, in the case (P) of nonnegative cost per stage, the finite horizon optimal cost functions can fail to converge to the infinite horizon optimal cost function (Example 1 in this chapter), and this failure to converge can occur in such a way that each finite horizon optimal cost function is Borel-measurable, while the infinite horizon optimal cost function is not (Example 2). We must therefore develop an independent line of analysis for the infinite horizon model.

Our strategy is to define two models, a stochastic one and its deterministic equivalent. There are no measurability restrictions on policies in the deterministic model, and the theory of Part I or of Bertsekas [B4], hereafter abbreviated DPSC, can be applied to it directly. We then transfer this theory to the stochastic model. Sections 9.1-9.3 set up the two models and establish their relationship. Sections 9.4-9.6 analyze the stochastic model via its deterministic counterpart.

9.1 The Stochastic Model

Definition 9.1 An infinite horizon stochastic optimal control model, denoted by (SM), is an eight-tuple $(S, C, U, W, p, f, \alpha, g)$ as described in
Definition 8.1. We consider three cases, where $\Gamma$ is defined by (1) of Chapter 8:
(P) $0 \le g(x,u)$ for every $(x,u) \in \Gamma$.
(N) $g(x,u) \le 0$ for every $(x,u) \in \Gamma$.
(D) $0 < \alpha < 1$, and for some $b \in R$, $-b \le g(x,u) \le b$ for every $(x,u) \in \Gamma$.
Thus we are really treating three models: (P), (N), and (D). If a result is applicable to one of these models, the corresponding symbol will appear. The assumptions (P), (N), and (D) replace the (F+) and (F-) conditions of Chapter 8.

Definition 9.2 A policy for (SM) is a sequence $\pi = (\mu_0, \mu_1, \ldots)$ such that for each $k$, $\mu_k(du_k\mid x_0,u_0,\ldots,u_{k-1},x_k)$ is a universally measurable stochastic kernel on $C$ given $SC\cdots CS$ satisfying
$$\mu_k(U(x_k)\mid x_0,u_0,\ldots,u_{k-1},x_k) = 1$$
for every $(x_0,u_0,\ldots,u_{k-1},x_k)$. The concepts of semi-Markov, Markov, nonrandomized, and F-measurable policies are the same as in Definition 8.2. We denote by $\Pi'$ the set of all policies for (SM) and by $\Pi$ the set of all Markov policies. If $\pi$ is a Markov policy of the form $\pi = (\mu, \mu, \ldots)$, it is said to be stationary. As in Chapter 8, we often index $S$ and $C$ for clarity, understanding $S_k$ to be a copy of $S$ and $C_k$ to be a copy of $C$.

Suppose $p \in P(S)$ and $\pi = (\mu_0, \mu_1, \ldots)$ is a policy for (SM). By Proposition 7.45, there is a sequence of unique probability measures $r_N(\pi,p)$ on $S_0C_0\cdots S_{N-1}C_{N-1}$, $N = 1, 2, \ldots$, such that for any $N$ and any universally measurable function $h: S_0C_0\cdots S_{N-1}C_{N-1} \to R^*$ which satisfies either $\int h^+\,dr_N(\pi,p) < \infty$ or $\int h^-\,dr_N(\pi,p) < \infty$, (4) of Chapter 8 is satisfied. Furthermore, there exists a unique probability measure $r(\pi,p)$ on $S_0C_0S_1C_1\cdots$ such that for each $N$ the marginal of $r(\pi,p)$ on $S_0C_0\cdots S_{N-1}C_{N-1}$ is $r_N(\pi,p)$. With $r_N(\pi,p)$ and $r(\pi,p)$ determined in this manner, we are ready to define the cost corresponding to a policy.

Definition 9.3 Suppose $\pi$ is a policy for (SM). The (infinite horizon) cost corresponding to $\pi$ at $x \in S$ is
$$J_\pi(x) = \int \Big[\sum_{k=0}^\infty \alpha^k g(x_k,u_k)\Big]\,dr(\pi,p_x) = \sum_{k=0}^\infty \alpha^k \int g(x_k,u_k)\,dr(\pi,p_x).^{\dagger} \tag{1}$$

† The interchange of integration and summation is justified by appeal to the monotone convergence theorem under (P) and (N), and the bounded convergence theorem under (D).
If $\pi = (\mu, \mu, \ldots)$ is stationary, we sometimes write $J_\mu$ in place of $J_\pi$. The (infinite horizon) optimal cost at $x$ is
$$J^*(x) = \inf_{\pi \in \Pi'} J_\pi(x).$$
If $\varepsilon > 0$, the policy $\pi$ is $\varepsilon$-optimal at $x$ provided
$$J_\pi(x) \le \begin{cases} J^*(x) + \varepsilon & \text{if } J^*(x) > -\infty,\\ -1/\varepsilon & \text{if } J^*(x) = -\infty. \end{cases} \tag{2}$$
If $J_\pi(x) = J^*(x)$, then $\pi$ is optimal at $x$. If $\pi$ is $\varepsilon$-optimal or optimal at every $x \in S$, it is said to be $\varepsilon$-optimal or optimal, respectively.

It is easy to see, using Propositions 7.45 and 7.46, that, for any policy $\pi$, $J_\pi(x)$ is universally measurable in $x$. In fact, if $\pi = (\mu_0, \mu_1, \ldots)$ and $\pi_k = (\mu_0, \ldots, \mu_{k-1})$, then $J_{k,\pi_k}(x)$ defined by (5) of Chapter 8 is universally measurable in $x$ and
$$\lim_{k\to\infty} J_{k,\pi_k}(x) = J_\pi(x) \qquad \forall x \in S. \tag{3}$$
If $\pi$ is Markov, then (3) can be rewritten in terms of the operators $T_{\mu_k}$ of Definition 8.4 as
$$\lim_{k\to\infty} (T_{\mu_0}\cdots T_{\mu_{k-1}})(J_0)(x) = J_\pi(x) \qquad \forall x \in S, \tag{4}$$
which is the infinite horizon analog of Lemma 8.1. If $\pi$ is a Borel-measurable policy and $g$ is Borel-measurable, then $J_\pi(x)$ is Borel-measurable in $x$ (Proposition 7.29). It may occur under (P), however, that $\lim_{k\to\infty} J_k^*(x) \ne J^*(x)$, where $J_k^*(x)$ is the optimal $k$-stage cost defined by (6) of Chapter 8. We offer an example of this.
EXAMPLE 1 Let $S = \{0, 1, 2, \ldots\}$, $C = \{1, 2, \ldots\}$, $U(x) = C$ for every $x \in S$, $\alpha = 1$,
$$f(x,u,w) = \begin{cases} u & \text{if } x = 0,\\ x - 1 & \text{if } x \ne 0, \end{cases} \qquad g(x,u) = \begin{cases} 1 & \text{if } x = 1,\\ 0 & \text{if } x \ne 1. \end{cases}$$
The problem is deterministic, so the choice of $W$ and $p(dw\mid x,u)$ is irrelevant. Beginning at $x_0 = 0$, the system moves to some positive integer $u_0$ at no cost. It then successively moves to $u_0 - 1, u_0 - 2, \ldots$, until it returns to zero and the process begins again. The only transition which incurs a nonzero cost is the transition from one to zero. If the horizon $k$ is finite and $u_0$ is chosen larger than $k$, then no cost is incurred before termination, so $J_k^*(0) = 0$. Over the infinite horizon, the transition from one to zero will be made infinitely often, regardless of the policy employed, so $J^*(0) = \infty$.
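The sketch below simulates Example 1 to make the gap between $\lim_k J_k^*(0) = 0$ and $J^*(0) = \infty$ concrete. The helper `cost_over_horizon` and the particular control rules passed to it are illustrative choices, not part of the text: choosing $u_0 = k$ for horizon $k$ costs nothing, while any rule used over longer and longer horizons keeps paying one unit per return to zero.

```python
def cost_over_horizon(choose_u, horizon):
    """Total cost of Example 1 over `horizon` stages, starting from x_0 = 0.

    choose_u(k, x) gives the control at stage k in state x; the control only
    matters at x = 0, since f(x, u, w) = x - 1 for x != 0.
    """
    x, total = 0, 0
    for k in range(horizon):
        u = choose_u(k, x)
        total += 1 if x == 1 else 0   # g(x, u) = 1 only at x = 1
        x = u if x == 0 else x - 1    # f(x, u, w) = u at 0, x - 1 otherwise
    return total

# Finite horizon k: choosing u_0 = k postpones the costly step past the horizon.
finite = [cost_over_horizon(lambda j, x, k=k: k, k) for k in range(1, 40)]

# A fixed rule keeps paying as the horizon grows; always choosing u = 5
# cycles 0 -> 5 -> 4 -> 3 -> 2 -> 1 -> 0 and pays once every 6 stages.
growing = [cost_over_horizon(lambda j, x: 5, n) for n in (60, 600, 6000)]

# Even ever-larger excursions (u = stage index + 1) pay again on each return,
# so the accumulated cost still grows without bound, only more slowly.
slow = cost_over_horizon(lambda j, x: j + 1, 1000)
print(finite[:5], growing, slow)
```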
For $\pi = (\mu_0, \mu_1, \ldots) \in \Pi'$ and $p \in P(S)$, let $q_k(\pi,p)$ be the marginal of $r(\pi,p)$ on $S_kC_k$, $k = 0, 1, \ldots$. Then (7) of Chapter 8 holds, and if $\pi$ is Markov, (8) holds as well. Furthermore, from (1) we have
$$\int_S J_\pi(x)\,p(dx) = \sum_{k=0}^\infty \alpha^k \int_{S_kC_k} g\,dq_k(\pi,p), \tag{5}$$
which is the infinite horizon analog of (9) of Chapter 8. Using these facts to parallel the proof of Proposition 8.1, we obtain the following infinite horizon version.

Proposition 9.1 (P)(N)(D) If $x \in S$ and $\pi' \in \Pi'$, then there is a Markov policy $\pi$ such that
$$J_\pi(x) = J_{\pi'}(x).$$

Corollary 9.1.1 (P)(N)(D) We have
$$J^*(x) = \inf_{\pi \in \Pi} J_\pi(x) \qquad \forall x \in S,$$
where $\Pi$ is the set of all Markov policies for (SM).
9.2 The Deterministic Model

Definition 9.4 Let $(S, C, U, W, p, f, \alpha, g)$ be an infinite horizon stochastic optimal control model as given by Definition 9.1. The corresponding infinite horizon deterministic optimal control model, denoted by (DM), consists of the following:

$P(S)$ State space.
$P(SC)$ Control space.
$\bar U$ Control constraint. A function from $P(S)$ to the set of nonempty subsets of $P(SC)$ defined for each $p \in P(S)$ by
$$\bar U(p) = \{q \in P(SC) \mid q(\Gamma) = 1 \text{ and the marginal of } q \text{ on } S \text{ is } p\}, \tag{6}$$
where $\Gamma$ is given by (1) of Chapter 8.
$\bar f$ System function. The function from $P(SC)$ to $P(S)$ defined by
$$\bar f(q)(\underline S) = \int_{SC} t(\underline S\mid x,u)\,q(d(x,u)) \qquad \forall \underline S \in \mathscr{B}_S, \tag{7}$$
where $t(dx'\mid x,u)$ is given by (3) of Chapter 8.
$\alpha$ Discount factor.
$\bar g$ One-stage cost function. The function from $P(SC)$ to $R^*$ given by
$$\bar g(q) = \int_{SC} g(x,u)\,q(d(x,u)). \tag{8}$$
The model (DM) inherits considerable regularity from (SM). Its state and control spaces $P(S)$ and $P(SC)$ are Borel spaces (Corollary 7.25.1). The system function $\bar f$ is Borel-measurable (Proposition 7.26 and Corollary 7.29.1), and the one-stage cost function $\bar g$ is lower semianalytic (Corollary 7.48.1). Furthermore, under assumption (P) in (SM), we have $\bar g \ge 0$, while under (N), $\bar g \le 0$, and under (D), $0 < \alpha < 1$ and $-b \le \bar g \le b$.

Definition 9.5 A policy for (DM) is a sequence of mappings $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots)$ such that for each $k$, $\bar\mu_k: P(S) \to P(SC)$ and $\bar\mu_k(p) \in \bar U(p)$ for every $p \in P(S)$. The set of all policies in (DM) will be denoted by $\bar\Pi$. We place no measurability requirements on these mappings. A policy $\bar\pi$ of the form $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ is said to be stationary.

Definition 9.6 Given $p_0 \in P(S)$ and a policy $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots)$ for (DM), the cost corresponding to $\bar\pi$ at $p_0$ is
$$\bar J_{\bar\pi}(p_0) = \sum_{k=0}^\infty \alpha^k \bar g(q_k), \tag{9}$$
where the control sequence $\{q_k\}$ is generated recursively by means of the equation
$$q_k = \bar\mu_k(p_k), \qquad k = 0, 1, \ldots, \tag{10}$$
and the system equation
$$p_{k+1} = \bar f(q_k), \qquad k = 0, 1, \ldots. \tag{11}$$
If $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ is stationary, we write $\bar J_{\bar\mu}$ in place of $\bar J_{\bar\pi}$. The optimal cost at $p_0$ is
$$\bar J^*(p_0) = \inf_{\bar\pi \in \bar\Pi} \bar J_{\bar\pi}(p_0).$$
The concepts of $\varepsilon$-optimal and optimal policies for (DM) are the same as those given in Definition 9.3 for (SM).

Definition 9.7 A sequence $(p_0, q_0, q_1, \ldots) \in P(S)P(SC)P(SC)\cdots$ is admissible in (DM) if $q_0 \in \bar U(p_0)$ and $q_{k+1} \in \bar U[\bar f(q_k)]$, $k = 0, 1, \ldots$. The set of all admissible sequences will be denoted by $A$.

The admissible sequences are just the sequences of controls $q_0, q_1, \ldots$, together with the initial state $p_0$, which can be generated by some policy for (DM) via (10) and (11). Except for $p_0$, the measures $p_k$ are not included in the sequence, but they can be recovered as the marginals of the measures $q_k$ on $S$ [cf. (6)].

Definition 9.8 Let $\bar J: P(S) \to R^*$ be given and let $\bar\mu: P(S) \to P(SC)$ be such that $\bar\mu(p) \in \bar U(p)$ for every $p \in P(S)$. The operator $\bar T_{\bar\mu}$ mapping $\bar J$ into $\bar T_{\bar\mu}(\bar J): P(S) \to R^*$ is defined by
$$\bar T_{\bar\mu}(\bar J)(p) = \bar g[\bar\mu(p)] + \alpha \bar J\{\bar f[\bar\mu(p)]\} \qquad \forall p \in P(S).$$
The operator $\bar T$ mapping $\bar J$ into $\bar T(\bar J): P(S) \to R^*$ is defined by
$$\bar T(\bar J)(p) = \inf_{q \in \bar U(p)} \{\bar g(q) + \alpha \bar J[\bar f(q)]\} \qquad \forall p \in P(S).$$
Because (DM) is deterministic, it can be studied using results from Part I, Chapters 4 and 5, or from DPSC, since there is no need to place measurability restrictions on policies in a deterministic model. The operators $\bar T_{\bar\mu}$ and $\bar T$ of Definition 9.8 are special cases of those defined in Section 2.1. In the present case, we take $H(p, q, \bar J)$ to be
$$H(p, q, \bar J) = \bar g(q) + \alpha \bar J[\bar f(q)].$$
The monotonicity assumption of Section 2.1 is satisfied by this choice of $H$. The cost corresponding to a policy $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots)$ as given by (9) is easily seen to be of the form (cf. Section 2.2)
$$\bar J_{\bar\pi} = \lim_{N\to\infty} (\bar T_{\bar\mu_0}\cdots \bar T_{\bar\mu_{N-1}})(\bar J_0),$$
where $\bar J_0(p) = 0$ for every $p \in P(S)$. It is a straightforward matter to verify that under (D) the contraction assumption of Section 4.1 is satisfied when $B$ is taken to be the set of bounded real-valued functions on $P(S)$, $m$ is taken to be one, and $\rho = \alpha$. Under (P), Assumptions I, I.1, and I.2 of Section 5.1 are satisfied, while under (N), Assumptions D, D.1, and D.2 of the same section are in force.

9.3 Relations between the Models
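Definition 9.9 Let $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ be a Markov policy for (SM) and $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots) \in \bar\Pi$ a policy for (DM). Let $p_0 \in P(S)$ be given. If for all $k$
$$\bar\mu_k(p_k)(\underline S\,\underline C) = \int_{\underline S} \mu_k(\underline C\mid x)\,p_k(dx) \qquad \forall \underline S \in \mathscr{B}_S,\ \underline C \in \mathscr{B}_C, \tag{12}$$
where $p_k$ is generated from $p_0$ by $\bar\pi$ via (10) and (11), then $\pi$ and $\bar\pi$ are said to correspond at $p_0$. If $\pi$ and $\bar\pi$ correspond at every $p \in P(S)$, then $\pi$ and $\bar\pi$ are said to correspond. If $\pi$ and $\bar\pi$ correspond at $p_0$, then the sequence of measures $[q_0(\pi,p_0), q_1(\pi,p_0), \ldots]$ generated from $p_0$ by $\pi$ via (8) of Chapter 8 is the same as the sequence $(q_0, q_1, \ldots)$ generated from $p_0$ by $\bar\pi$ via (10) and (11). If $\pi$ and $\bar\pi$ correspond, then they generate the same sequence $(q_0, q_1, \ldots)$ for any initial $p_0$.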
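For finite $S$ and $C$ the passage from (SM) to (DM) reduces to bookkeeping with probability vectors. The sketch below is illustrative only: the two-state, two-control data `t` and `g`, the discount factor 0.9, and the randomized stationary policy `mu` are invented. It builds the corresponding controls $q_k(x,u) = p_k(x)\mu(u\mid x)$ and states $p_{k+1} = \bar f(q_k)$ as in (10) and (11), sums $\alpha^k \bar g(q_k)$ as in (9), and compares the result with $\int J_\mu\,dp$ computed directly in (SM), which is what Proposition 9.3 asserts for corresponding policies.

```python
# Hypothetical finite instance: S = {0, 1}, C = U(x) = {0, 1}, alpha = 0.9.
S, C, ALPHA = (0, 1), (0, 1), 0.9
t = {  # t[x, u][y] = t({y} | x, u)
    (0, 0): [0.8, 0.2], (0, 1): [0.1, 0.9],
    (1, 0): [0.5, 0.5], (1, 1): [0.0, 1.0],
}
g = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.0, (1, 1): 3.0}
mu = {0: [0.3, 0.7], 1: [0.6, 0.4]}  # randomized stationary policy mu(u | x)

def lift(p):
    """Corresponding (DM) control: q(x, u) = p(x) mu(u | x)."""
    return {(x, u): p[x] * mu[x][u] for x in S for u in C}

def f_bar(q):
    """(DM) system function: next state measure p'(y) = sum_{x,u} t(y|x,u) q(x,u)."""
    return [sum(q[x, u] * t[x, u][y] for x in S for u in C) for y in S]

def g_bar(q):
    """(DM) one-stage cost: integral of g with respect to q."""
    return sum(q[x, u] * g[x, u] for x in S for u in C)

def dm_cost(p0, stages=400):
    """Truncated (DM) cost: sum_k alpha^k g_bar(q_k) along q_k = lift(p_k)."""
    total, p = 0.0, p0
    for k in range(stages):
        q = lift(p)
        total += ALPHA ** k * g_bar(q)
        p = f_bar(q)
    return total

def sm_cost(iterations=400):
    """J_mu for (SM), by iterating J <- T_mu(J) from J_0 = 0."""
    J = [0.0 for _ in S]
    for _ in range(iterations):
        J = [sum(mu[x][u] * (g[x, u] + ALPHA * sum(t[x, u][y] * J[y] for y in S))
                 for u in C) for x in S]
    return J

p0 = [0.25, 0.75]
J = sm_cost()
print(dm_cost(p0), sum(p0[x] * J[x] for x in S))  # the two numbers agree
```

Starting instead from a point mass $p_0 = p_x$ gives $\bar J_{\bar\mu}(p_x) = J_\mu(x)$, which is the content of Corollary 9.3.1.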
Proposition 9.2 (P)(N)(D) Given a Markov policy $\pi \in \Pi$, there is a corresponding $\bar\pi \in \bar\Pi$. If $\bar\pi \in \bar\Pi$ and $p_0 \in P(S)$ are given, then there is a Markov policy $\pi \in \Pi$ corresponding to $\bar\pi$ at $p_0$.

Proof If $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ is given, then for each $k$ and any $p_k \in P(S)$, there is a unique probability measure on $SC$, which we denote by $\bar\mu_k(p_k)$, satisfying (12) (Proposition 7.45). Furthermore,
$$\bar\mu_k(p_k)(\Gamma) = \int_S \mu_k(U(x)\mid x)\,p_k(dx) = 1, \tag{13}$$
so $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots)$ is in $\bar\Pi$ and corresponds to $\pi$. If $\bar\pi = (\bar\mu_0, \bar\mu_1, \ldots) \in \bar\Pi$ and $p_0 \in P(S)$ are given, let $(p_0, p_1, p_2, \ldots)$ be generated from $p_0$ by $\bar\pi$ via (10) and (11). For each $k$, choose a Borel-measurable stochastic kernel $\mu_k(du\mid x)$ which satisfies (12) for this particular $p_k$ (Corollary 7.27.2). Then (13) holds, so
$$\mu_k(U(x)\mid x) = 1 \tag{14}$$
for $p_k$ almost every $x$. Altering $\mu_k(du\mid x)$ on a set of $p_k$-measure zero if necessary, we may assume that (14) holds for every $x \in S$ and (12) is satisfied. Then $\pi = (\mu_0, \mu_1, \ldots) \in \Pi$ corresponds to $\bar\pi$ at $p_0$. Q.E.D.

Proposition 9.3 (P)(N)(D) Let $p \in P(S)$, $\pi \in \Pi$, and $\bar\pi \in \bar\Pi$ be given. If $\pi$ and $\bar\pi$ correspond at $p$, then
$$\bar J_{\bar\pi}(p) = \int_S J_\pi(x)\,p(dx).$$

Proof We have from (7) of Chapter 8, (5), (8), (9), and the monotone or bounded convergence theorems
$$\int_S J_\pi(x)\,p(dx) = \sum_{k=0}^\infty \alpha^k \int_{SC} g\,dq_k(\pi,p) = \sum_{k=0}^\infty \alpha^k \bar g(q_k) = \bar J_{\bar\pi}(p).$$
Q.E.D.

Corollary 9.3.1 (P)(N)(D) Let $x \in S$, $\pi \in \Pi$, and $\bar\pi \in \bar\Pi$ be given. If $\pi$ and $\bar\pi$ correspond at $p_x$, then
$$\bar J_{\bar\pi}(p_x) = J_\pi(x).$$
Corollary 9.3.2 (P)(N)(D) For every $x \in S$,
$$\bar J^*(p_x) = J^*(x).$$

Proof Corollaries 9.1.1 and 9.3.1 and Proposition 9.2 imply that, for every $x \in S$,
$$\bar J^*(p_x) = \inf_{\bar\pi \in \bar\Pi} \bar J_{\bar\pi}(p_x) = \inf_{\pi \in \Pi} J_\pi(x) = J^*(x).$$
Q.E.D.
Corollary 9.3.2 shows that $J^*$ and $\bar J^*$ are related, but in a rather weak way that involves $\bar J^*$ only on $\bar S = \{p_x \in P(S) \mid x \in S\}$. In Proposition 9.5 we strengthen this relationship, but in order to state that proposition we must show a measurability property of $J^*$. This is the subject of Proposition 9.4, which we prove with the aid of the following lemma.
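Lemma 9.1 The set $A$ of admissible sequences in (DM) is an analytic subset of $P(S)P(SC)P(SC)\cdots$.

Proof The set $A$ is equal to $A_0 \cap \big[\bigcap_{k=0}^\infty B_k\big]$, where
$$A_0 = \{(p_0, q_0, q_1, \ldots) \mid q_0 \in \bar U(p_0)\}, \qquad B_k = \{(p_0, q_0, q_1, \ldots) \mid q_{k+1} \in \bar U[\bar f(q_k)]\}.$$
By Corollary 7.35.2, it suffices to show that $A_0$ and $B_k$, $k = 0, 1, \ldots$, are analytic. Using the result of Proposition 7.38, this will follow if we show that
$$\Delta = \{(p, q) \in P(S)P(SC) \mid q \in \bar U(p)\}, \qquad B = \{(q, q') \in P(SC)P(SC) \mid q' \in \bar U[\bar f(q)]\}$$
are analytic. Let $P(\Gamma) = \{q \in P(SC) \mid q(\Gamma) = 1\}$, where $\Gamma$ is given by (1) of Chapter 8. Then $P(\Gamma)$ is analytic (Proposition 7.43). Equation (6) implies that $\Delta$ is the intersection of the analytic set $P(S)P(\Gamma)$ (Proposition 7.38) with the graph of the function $\sigma: P(SC) \to P(S)$ which maps $q$ into its marginal on $S$. It is easily verified that $\sigma$ is continuous (Proposition 7.21(a) and (b)), so $\operatorname{Gr}(\sigma)$ is Borel (Corollary 7.14.1). Therefore, $\Delta$ is analytic. The set $B$ is the inverse image of $\Delta$ under the Borel-measurable mapping $(q_0, q_1) \mapsto [\bar f(q_0), q_1]$, so it is also analytic (Proposition 7.40). Q.E.D.

Proposition 9.4 (P)(N)(D) The function $\bar J^*: P(S) \to R^*$ is lower semianalytic.

Proof Define $G: A \to R^*$ by
$$G(p_0, q_0, q_1, \ldots) = \sum_{k=0}^\infty \alpha^k \bar g(q_k), \tag{15}$$
where $A$ is the set of admissible sequences (Definition 9.7). Then $G$ is lower semianalytic by Lemma 7.30(2), (4) and Lemma 9.1. By the definition of $\bar J^*$ and $A$, we have
$$\bar J^*(p_0) = \inf_{(q_0, q_1, \ldots) \in A_{p_0}} G(p_0, q_0, q_1, \ldots) \qquad \forall p_0 \in P(S), \tag{16}$$
where $A_{p_0} = \{(q_0, q_1, \ldots) \mid (p_0, q_0, q_1, \ldots) \in A\}$, so $\bar J^*$ is lower semianalytic by Proposition 7.47. Q.E.D.

Corollary 9.4.1 (P)(N)(D) The function $J^*: S \to R^*$ is lower semianalytic.

Proof By Corollary 9.3.2,
$$J^*(x) = \bar J^*[\delta(x)] \qquad \forall x \in S,$$
where $\delta(x) = p_x$ is the homeomorphism defined in Corollary 7.21.1. Apply Lemma 7.30(3) and Proposition 9.4 to conclude that $J^*$ is lower semianalytic. Q.E.D.

Lemma 9.2 (P)(N)(D) Given $p \in P(S)$ and $\varepsilon > 0$, there exists a policy $\bar\pi$ for (DM) such that
$$\bar J_{\bar\pi}(p) \le \begin{cases} \int_S J^*(x)\,p(dx) + \varepsilon & \text{if } \int_S J^*(x)\,p(dx) > -\infty,\\ -1/\varepsilon & \text{if } \int_S J^*(x)\,p(dx) = -\infty. \end{cases}$$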
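Proof As a consequence of Corollary 9.4.1, $\int J^*(x)\,p(dx)$ is well defined. Let $p \in P(S)$ and $\varepsilon > 0$ be given. Let $G: A \to R^*$ be defined by (15). Proposition 7.50 guarantees that under (P) and (D) there exists a universally measurable selector $\varphi: P(S) \to P(SC)P(SC)\cdots$ such that $(p, \varphi(p)) \in A$ for every $p \in P(S)$ and
$$G[p, \varphi(p)] \le \bar J^*(p) + \varepsilon \qquad \forall p \in P(S).$$
Let $\sigma: S \to P(SC)P(SC)\cdots$ be defined by $\sigma(x) = \varphi(p_x)$. Then $\sigma$ is universally measurable (Proposition 7.44) and, since $\bar J^*(p_x) = J^*(x)$ by Corollary 9.3.2,
$$G[p_x, \sigma(x)] \le J^*(x) + \varepsilon \qquad \forall x \in S. \tag{17}$$
Under (N), there exists a universally measurable $\sigma: S \to P(SC)P(SC)\cdots$ such that for every $x \in S$, $(p_x, \sigma(x)) \in A$ and
$$G[p_x, \sigma(x)] \le \begin{cases} J^*(x) + \varepsilon & \text{if } J^*(x) > -\infty,\\ -(1 + \varepsilon^2)/[\varepsilon\,\hat p(J^*)] & \text{if } J^*(x) = -\infty, \end{cases} \tag{18}$$
where $\hat p(J^*) = p(\{x \mid J^*(x) = -\infty\})$ if $p(\{x \mid J^*(x) = -\infty\}) > 0$ and $\hat p(J^*) = 1$ otherwise.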
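Denote $\sigma(x) = [q_0(d(x_0,u_0)\mid x), q_1(d(x_1,u_1)\mid x), \ldots]$. Each $q_k(d(x_k,u_k)\mid x)$ is a universally measurable stochastic kernel on $S_kC_k$ given $S$. Furthermore,
$$q_0(d(x_0,u_0)\mid x) \in \bar U(p_x) \qquad \forall x \in S,$$
and, for $k = 0, 1, \ldots$,
$$q_{k+1}(d(x_{k+1},u_{k+1})\mid x) \in \bar U\big(\bar f[q_k(d(x_k,u_k)\mid x)]\big) \qquad \forall x \in S.$$
For $k = 0, 1, \ldots$, define $q_k \in P(SC)$ by
$$q_k(\underline S\,\underline C) = \int_S q_k(\underline S\,\underline C\mid x)\,p(dx) \qquad \forall \underline S \in \mathscr{B}_S,\ \underline C \in \mathscr{B}_C.$$
Then $q_k(\Gamma) = 1$, $k = 0, 1, \ldots$. We show that $(p, q_0, q_1, \ldots) \in A$. Since the marginal of $q_0(d(x_0,u_0)\mid x)$ on $S_0$ is $p_x$, we have
$$q_0(\underline S C) = \int_S q_0(\underline S C\mid x)\,p(dx) = \int_S p_x(\underline S)\,p(dx) = p(\underline S) \qquad \forall \underline S \in \mathscr{B}_S,$$
so $q_0 \in \bar U(p)$. For $k = 0, 1, \ldots$, we have
$$q_{k+1}(\underline S C) = \int_S \bar f[q_k(\cdot\mid x)](\underline S)\,p(dx) = \int_S \int_{SC} t(\underline S\mid x',u')\,q_k(d(x',u')\mid x)\,p(dx) = \bar f(q_k)(\underline S) \qquad \forall \underline S \in \mathscr{B}_S.$$
Therefore $q_{k+1} \in \bar U[\bar f(q_k)]$ and $(p, q_0, q_1, \ldots) \in A$. Let $\bar\pi$ be any policy for (DM) which generates the admissible sequence $(p, q_0, q_1, \ldots)$. Then under (P) and (D), we have from (17) and the monotone or bounded convergence theorem
$$\bar J_{\bar\pi}(p) = \sum_{k=0}^\infty \alpha^k \bar g(q_k) = \int_S G[p_x, \sigma(x)]\,p(dx) \le \int_S J^*(x)\,p(dx) + \varepsilon.$$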
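Under (N), we have from the monotone convergence theorem
$$\bar J_{\bar\pi}(p) = \int_S G[p_x, \sigma(x)]\,p(dx). \tag{19}$$
If $p(\{x \mid J^*(x) = -\infty\}) = 0$, (18) and (19) imply
$$\bar J_{\bar\pi}(p) \le \int_S J^*(x)\,p(dx) + \varepsilon,$$
where both sides may be $-\infty$. If $p(\{x \mid J^*(x) = -\infty\}) > 0$, then $\int J^*(x)\,p(dx) = -\infty$ and we have, from (18) and (19),
$$\bar J_{\bar\pi}(p) \le \int_{\{x \mid J^*(x) > -\infty\}} [J^*(x) + \varepsilon]\,p(dx) - (1 + \varepsilon^2)/\varepsilon < \varepsilon - (1 + \varepsilon^2)/\varepsilon = -1/\varepsilon.$$
Q.E.D.

Proposition 9.5 (P)(N)(D) For every $p \in P(S)$,
$$\bar J^*(p) = \int_S J^*(x)\,p(dx).$$

Proof Lemma 9.2 shows that
$$\bar J^*(p) \le \int_S J^*(x)\,p(dx).$$
For the reverse inequality, let $p$ be in $P(S)$ and let $\bar\pi$ be a policy for (DM). There exists a policy $\pi \in \Pi$ corresponding to $\bar\pi$ at $p$ (Proposition 9.2), and, by Proposition 9.3,
$$\bar J_{\bar\pi}(p) = \int_S J_\pi(x)\,p(dx) \ge \int_S J^*(x)\,p(dx).$$
By taking the infimum of the left-hand side over $\bar\pi \in \bar\Pi$, we obtain the desired result. Q.E.D.

Propositions 9.3 and 9.5 are the key relationships between (SM) and (DM). As an example of their implications, consider the following corollary.

Corollary 9.5.1 (P)(N)(D) Suppose $\pi \in \Pi$ and $\bar\pi \in \bar\Pi$ are corresponding policies for (SM) and (DM). Then $\pi$ is optimal if and only if $\bar\pi$ is optimal.

Proof If $\pi$ is optimal, then
$$\bar J_{\bar\pi}(p) = \int_S J_\pi(x)\,p(dx) = \int_S J^*(x)\,p(dx) = \bar J^*(p) \qquad \forall p \in P(S).$$
If $\bar\pi$ is optimal, then
$$J_\pi(x) = \bar J_{\bar\pi}(p_x) = \bar J^*(p_x) = J^*(x) \qquad \forall x \in S.$$
Q.E.D.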
The next corollary is a technical result needed for Chapter 10.
Corollary 9.5.2 (P)(N)(D) For every $p \in P(S)$,
$$\int_S J^*(x)\,p(dx) = \inf_{\pi \in \Pi} \int_S J_\pi(x)\,p(dx).$$

Proof By Propositions 9.2 and 9.3,
$$\bar J^*(p) = \inf_{\pi \in \Pi} \int_S J_\pi(x)\,p(dx) \qquad \forall p \in P(S).$$
Apply Proposition 9.5. Q.E.D.
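We now explore the connections between the operators $\bar T_{\bar\mu}$ and $\bar T$ and the operators $T_\mu$ and $T$. The first proposition is a direct consequence of the definitions; we leave the verification to the reader.

Proposition 9.6 (P)(N)(D) Let $J: S \to R^*$ be universally measurable and satisfy $J \ge 0$, $J \le 0$, or $-c \le J \le c$, $c < \infty$, according as (P), (N), or (D) is in force. Let $\bar J: P(S) \to R^*$ be defined by
$$\bar J(p) = \int_S J(x)\,p(dx) \qquad \forall p \in P(S), \tag{20}$$
and suppose $\bar\mu: P(S) \to P(SC)$ is of the form
$$\bar\mu(p)(\underline S\,\underline C) = \int_{\underline S} \mu(\underline C\mid x)\,p(dx) \qquad \forall \underline S \in \mathscr{B}_S,\ \underline C \in \mathscr{B}_C,$$
for some $\mu \in U(C\mid S)$.† Then $\bar\mu(p) \in \bar U(p)$ for every $p \in P(S)$, and
$$\bar T_{\bar\mu}(\bar J)(p) = \int_S T_\mu(J)(x)\,p(dx) \qquad \forall p \in P(S).$$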
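Proposition 9.7 (P)(N)(D) Let $J: S \to R^*$ be lower semianalytic and satisfy $J \ge 0$, $J \le 0$, or $-c \le J \le c$, $c < \infty$, according as (P), (N), or (D) is in force. Let $\bar J: P(S) \to R^*$ be defined by
$$\bar J(p) = \int_S J(x)\,p(dx) \qquad \forall p \in P(S).$$
Then
$$\bar T(\bar J)(p) = \int_S T(J)(x)\,p(dx) \qquad \forall p \in P(S).$$

Proof For $p \in P(S)$ and $q \in \bar U(p)$ we have
$$\bar g(q) + \alpha \bar J[\bar f(q)] = \int_{SC}\Big[g(x,u) + \alpha\int_S J(x')\,t(dx'\mid x,u)\Big]\,q(d(x,u)) \ge \int_S T(J)(x)\,p(dx),$$
which implies
$$\bar T(\bar J)(p) \ge \int_S T(J)(x)\,p(dx).$$
Given $\varepsilon > 0$, Lemma 8.2 implies that there exists $\mu \in U(C\mid S)$ such that
$$T_\mu(J)(x) \le T(J)(x) + \varepsilon \qquad \forall x \in S.$$
Let $q \in \bar U(p)$ be such that
$$q(\underline S\,\underline C) = \int_{\underline S} \mu(\underline C\mid x)\,p(dx) \qquad \forall \underline S \in \mathscr{B}_S,\ \underline C \in \mathscr{B}_C.$$
Then, by Proposition 9.6,
$$\bar T(\bar J)(p) \le \bar g(q) + \alpha \bar J[\bar f(q)] = \int_S T_\mu(J)(x)\,p(dx) \le \int_S T(J)(x)\,p(dx) + \varepsilon,$$
where $\int T(J)(x)\,p(dx) + \varepsilon$ may be $-\infty$. Therefore,
$$\bar T(\bar J)(p) \le \int_S T(J)(x)\,p(dx).$$
Q.E.D.

† The set $U(C\mid S)$, defined in Section 8.2, is the collection of universally measurable stochastic kernels $\mu$ on $C$ given $S$ which satisfy $\mu(U(x)\mid x) = 1$ for every $x \in S$.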
9.4 The Optimality Equation-Characterization of Optimal Policies
As noted following Definition 9.8, the model (DM) is a special case of that considered in Part I and in DPSC.† This allows us to easily obtain many results for both (SM) and (DM). A prime example of this is the next proposition.

Proposition 9.8 (P)(N)(D) We have
$$\bar J^* = \bar T(\bar J^*), \tag{21}$$
$$J^* = T(J^*). \tag{22}$$

Proof The optimality equation (21) for (DM) follows from Propositions 4.2(a), 5.2, and 5.3 or from DPSC, Chapter 6, Proposition 2 and Chapter 7, Proposition 1. We have then, for any $x \in S$,
$$J^*(x) = \bar J^*(p_x) = \bar T(\bar J^*)(p_x) = \int_S T(J^*)(x')\,p_x(dx') = T(J^*)(x)$$
by Propositions 9.5 and 9.7, so (22) holds as well. Q.E.D.

† Whereas we allow $\bar g$ to be extended real-valued, in Chapter 7 of DPSC the one-stage cost function is assumed to be real-valued. This more restrictive assumption is not essential to any of the results we quote from DPSC.
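Proposition 9.9 (P)(N)(D) If $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ is a stationary policy for (DM), then $\bar J_{\bar\mu} = \bar T_{\bar\mu}(\bar J_{\bar\mu})$. If $\pi = (\mu, \mu, \ldots)$ is a stationary policy for (SM), then $J_\mu = T_\mu(J_\mu)$.

Proof For (DM) this result follows from Proposition 4.2(b), Corollary 5.2.1, and Corollary 5.3.2 or from DPSC, Chapter 6, Corollary 2.1 and Chapter 7, Corollary 1.1. Let $\pi = (\mu, \mu, \ldots)$ be a stationary policy for (SM) and let $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ be a policy for (DM) corresponding to $\pi$. Then for each $x \in S$,
$$J_\mu(x) = \bar J_{\bar\mu}(p_x) = \bar T_{\bar\mu}(\bar J_{\bar\mu})(p_x) = \int_S T_\mu(J_\mu)(x')\,p_x(dx') = T_\mu(J_\mu)(x)$$
by Propositions 9.3 and 9.6. Q.E.D.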
Note that Proposition 9.9 for (SM) cannot be deduced from Proposition 9.8 by considering a modified (SM) with control constraint of the form
$$U_\mu(x) = \{\mu(x)\} \qquad \forall x \in S, \tag{23}$$
as was done in the proof of Corollary 5.2.1. Even if $\mu$ is nonrandomized so that (23) makes sense, the set
$$\Gamma_\mu = \{(x,u) \in SC \mid u = \mu(x)\}$$
may not be analytic, so $U_\mu$ is not an acceptable control constraint.

The optimality equations are necessary conditions for the optimal cost functions, but except in case (D) they are by no means sufficient. We have the following partial sufficiency results.

Proposition 9.10
(P) If $\bar J: P(S) \to [0,\infty]$ and $\bar J \ge \bar T(\bar J)$, then $\bar J \ge \bar J^*$. If $J: S \to [0,\infty]$ is lower semianalytic and $J \ge T(J)$, then $J \ge J^*$.
(N) If $\bar J: P(S) \to [-\infty,0]$ and $\bar J \le \bar T(\bar J)$, then $\bar J \le \bar J^*$. If $J: S \to [-\infty,0]$ is lower semianalytic and $J \le T(J)$, then $J \le J^*$.
(D) If $\bar J: P(S) \to [-c,c]$, $c < \infty$, and $\bar J = \bar T(\bar J)$, then $\bar J = \bar J^*$. If $J: S \to [-c,c]$, $c < \infty$, is lower semianalytic and $J = T(J)$, then $J = J^*$.

Proof We consider first the statements for (DM). The result under (P) follows from Proposition 5.2, the result under (N) from Proposition 5.3, and the result under (D) from Proposition 4.2(a). These results for (DM) also follow from Proposition 2 and trivial modifications of the proof of Proposition 9 of DPSC, Chapter 6. We now establish the (SM) part of the proposition under (P); cases (N) and (D) are handled in the same manner. Given a lower semianalytic function $J: S \to [0,\infty]$ satisfying $J \ge T(J)$, define $\bar J: P(S) \to [0,\infty]$ by (20). Then
$$\bar J(p) = \int_S J(x)\,p(dx) \ge \int_S T(J)(x)\,p(dx) = \bar T(\bar J)(p) \qquad \forall p \in P(S)$$
by Proposition 9.7. By the result for (DM), $\bar J \ge \bar J^*$. In particular,
$$J(x) = \bar J(p_x) \ge \bar J^*(p_x) = J^*(x) \qquad \forall x \in S.$$
Q.E.D.

Proposition 9.11 Let $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ and $\pi = (\mu, \mu, \ldots)$ be stationary policies in (DM) and (SM), respectively.
(P) If $\bar J: P(S) \to [0,\infty]$ and $\bar J \ge \bar T_{\bar\mu}(\bar J)$, then $\bar J \ge \bar J_{\bar\mu}$. If $J: S \to [0,\infty]$ is universally measurable and $J \ge T_\mu(J)$, then $J \ge J_\mu$.
(N) If $\bar J: P(S) \to [-\infty,0]$ and $\bar J \le \bar T_{\bar\mu}(\bar J)$, then $\bar J \le \bar J_{\bar\mu}$. If $J: S \to [-\infty,0]$ is universally measurable and $J \le T_\mu(J)$, then $J \le J_\mu$.
(D) If $\bar J: P(S) \to [-c,c]$, $c < \infty$, and $\bar J = \bar T_{\bar\mu}(\bar J)$, then $\bar J = \bar J_{\bar\mu}$. If $J: S \to [-c,c]$, $c < \infty$, is universally measurable and $J = T_\mu(J)$, then $J = J_\mu$.

Proof The (DM) results follow from Proposition 4.2(b) and Corollaries 5.2.1 and 5.3.2 or from DPSC, Corollary 2.1 and trivial modifications of Corollary 9.1 of Chapter 6. The (SM) results follow from the (DM) results and Proposition 9.6 in a manner similar to the proof of Proposition 9.10. Q.E.D.

Proposition 9.11 implies that under (P), $J_\mu$ is the smallest nonnegative universally measurable solution to the functional equation
$$J = T_\mu(J).$$
Under (D), $J_\mu$ is the only bounded universally measurable solution to this equation. This provides us with a simple necessary and sufficient condition for a stationary policy to be optimal under (P) and (D).

Proposition 9.12 (P)(D) Let $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ and $\pi = (\mu, \mu, \ldots)$ be stationary policies in (DM) and (SM), respectively. The policy $\bar\pi$ is optimal if and only if $\bar J^* = \bar T_{\bar\mu}(\bar J^*)$. The policy $\pi$ is optimal if and only if $J^* = T_\mu(J^*)$.
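Proof If $\bar\pi$ is optimal, then $\bar J_{\bar\mu} = \bar J^*$. By Proposition 9.9, $\bar J^* = \bar T_{\bar\mu}(\bar J^*)$. Conversely, if $\bar J^* = \bar T_{\bar\mu}(\bar J^*)$, then, by Proposition 9.11, $\bar J^* \ge \bar J_{\bar\mu}$ and $\bar\pi$ is optimal. The proof for (SM) follows from the (SM) parts of the same propositions. Q.E.D.

Corollary 9.12.1 (P)(D) There is an optimal nonrandomized stationary policy for (SM) if and only if for each $x \in S$ the infimum in
$$\inf_{u \in U(x)} \Big\{g(x,u) + \alpha \int_S J^*(x')\,t(dx'\mid x,u)\Big\} \tag{24}$$
is achieved.

Proof If the infimum in (24) is achieved for every $x \in S$, then by Proposition 7.50 there is a universally measurable selector $\mu: S \to C$ whose graph lies in $\Gamma$ and for which
$$g[x, \mu(x)] + \alpha \int_S J^*(x')\,t(dx'\mid x, \mu(x)) = \inf_{u \in U(x)} \Big\{g(x,u) + \alpha \int_S J^*(x')\,t(dx'\mid x,u)\Big\} \qquad \forall x \in S.$$
Then by Proposition 9.8,
$$T_\mu(J^*) = T(J^*) = J^*,$$
so $\pi = (\mu, \mu, \ldots)$ is optimal by Proposition 9.12. If $\pi = (\mu, \mu, \ldots)$ is an optimal nonrandomized stationary policy for (SM), then by Propositions 9.8 and 9.9
$$T_\mu(J^*) = T_\mu(J_\mu) = J_\mu = J^* = T(J^*),$$
so $\mu(x)$ achieves the infimum in (24) for every $x \in S$. Q.E.D.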
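For finite $S$ and $C$ under (D), the recipe of Corollary 9.12.1 can be carried out numerically. The sketch below uses a hypothetical three-state, two-control instance (all numbers invented). It approximates $J^*$ by iterating $T$ from $J_0 = 0$ (valid here because $T$ is a sup-norm contraction with modulus $\alpha$ in this finite discounted setting), picks at each state a control attaining the infimum in (24), and checks that $T_\mu(J^*) = J^*$ and that the stationary policy $(\mu, \mu, \ldots)$ has cost $J^*$. With finitely many states every selector is trivially measurable, so none of the measurability issues of the text arise.

```python
# Hypothetical finite (D) instance: S = {0, 1, 2}, C = U(x) = {0, 1}, alpha = 0.8.
S, C, ALPHA = (0, 1, 2), (0, 1), 0.8
t = {  # t[x, u][y] = t({y} | x, u); each row sums to one
    (0, 0): [0.9, 0.1, 0.0], (0, 1): [0.2, 0.5, 0.3],
    (1, 0): [0.0, 0.6, 0.4], (1, 1): [0.5, 0.0, 0.5],
    (2, 0): [0.3, 0.3, 0.4], (2, 1): [1.0, 0.0, 0.0],
}
g = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0, (2, 0): 4.0, (2, 1): 1.5}

def q_value(J, x, u):
    """g(x, u) + alpha * sum_y J(y) t(y | x, u): the expression minimized in (24)."""
    return g[x, u] + ALPHA * sum(t[x, u][y] * J[y] for y in S)

def T(J):
    return [min(q_value(J, x, u) for u in C) for x in S]

def T_mu(J, mu):
    """T_mu for a nonrandomized stationary policy mu: S -> C."""
    return [q_value(J, x, mu[x]) for x in S]

def iterate(op, n=300):
    J = [0.0 for _ in S]
    for _ in range(n):
        J = op(J)
    return J

J_star = iterate(T)  # approximates J* to within floating-point accuracy
mu = [min(C, key=lambda u, x=x: q_value(J_star, x, u)) for x in S]  # attains (24)
J_mu = iterate(lambda J: T_mu(J, mu))  # cost of the stationary policy (mu, mu, ...)
print(mu, J_star, J_mu)
```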
In Proposition 9.19, we show that under (P) or (D), the existence of any optimal policy at all implies the existence of an optimal policy that is nonrandomized and stationary. This means that Corollary 9.12.1 actually gives a necessary and sufficient condition for the existence of an optimal policy. Under (N) we can use Proposition 9.10 to obtain a necessary and sufficient condition for a stationary policy to be optimal. This condition is not as useful as that of Proposition 9.12, however, since it cannot be used to construct a stationary optimal policy in the manner of Corollary 9.12.1.

Proposition 9.13 (N)(D) Let $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ and $\pi = (\mu, \mu, \ldots)$ be stationary policies in (DM) and (SM), respectively. The policy $\bar\pi$ is optimal if and only if $\bar J_{\bar\mu} = \bar T(\bar J_{\bar\mu})$. The policy $\pi$ is optimal if and only if $J_\mu = T(J_\mu)$.

Proof If $\bar\pi$ is optimal, then $\bar J_{\bar\mu} = \bar J^*$. By Proposition 9.8,
$$\bar J_{\bar\mu} = \bar J^* = \bar T(\bar J^*) = \bar T(\bar J_{\bar\mu}).$$
Conversely, if $\bar J_{\bar\mu} = \bar T(\bar J_{\bar\mu})$, then Proposition 9.10 implies that $\bar J_{\bar\mu} \le \bar J^*$ and $\bar\pi$ is optimal. If $\pi$ is optimal, $J_\mu = T(J_\mu)$ by the (SM) part of Proposition 9.8. The converse is more difficult, since the (SM) part of Proposition 9.10 cannot be invoked without knowing that $J_\mu$ is lower semianalytic. Let $\bar\pi = (\bar\mu, \bar\mu, \ldots)$ be a policy for (DM) corresponding to $\pi = (\mu, \mu, \ldots)$, so that $\bar J_{\bar\mu}(p) = \int J_\mu(x)\,p(dx)$ for every $p \in P(S)$. Then for fixed $p \in P(S)$ and $q \in \bar U(p)$,
$$\bar g(q) + \alpha \bar J_{\bar\mu}[\bar f(q)] = \int_{SC}\Big[g(x,u) + \alpha \int_S J_\mu(x')\,t(dx'\mid x,u)\Big]\,q(d(x,u)) \ge \int_S \inf_{u \in U(x)}\Big[g(x,u) + \alpha \int_S J_\mu(x')\,t(dx'\mid x,u)\Big]\,p(dx),$$
provided the integrand
$$\inf_{u \in U(x)}\Big[g(x,u) + \alpha \int_S J_\mu(x')\,t(dx'\mid x,u)\Big] = T(J_\mu)(x)$$
is universally measurable in $x$. But $T(J_\mu) = J_\mu$ by assumption, which is universally measurable, so
$$\bar g(q) + \alpha \bar J_{\bar\mu}[\bar f(q)] \ge \int_S J_\mu(x)\,p(dx) = \bar J_{\bar\mu}(p).$$
By taking the infimum of the left-hand side over $q \in \bar U(p)$ and using Proposition 9.9, we see that $\bar T(\bar J_{\bar\mu}) \ge \bar J_{\bar\mu} = \bar T_{\bar\mu}(\bar J_{\bar\mu})$. The reverse inequality always holds, and by the result already proved for (DM), $\bar\pi$ is optimal. The optimality of $\pi$ follows from Corollary 9.5.1. Q.E.D.

9.5 Convergence of the Dynamic Programming Algorithm-Existence of Stationary Optimal Policies
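Definition 9.10 The dynamic programming algorithm is defined recursively for (DM) and (SM) by
$$\bar J_0(p) = 0 \quad \forall p \in P(S), \qquad \bar J_{k+1} = \bar T(\bar J_k), \quad k = 0, 1, \ldots,$$
$$J_0(x) = 0 \quad \forall x \in S, \qquad J_{k+1} = T(J_k), \quad k = 0, 1, \ldots.$$

We know from Proposition 8.2 that this algorithm generates the $k$-stage optimal cost functions $J_k^*$. For simplicity of notation, we suppress the asterisk here. At present we are concerned with the infinite horizon case and the possibility that $J_k$ may converge to $J^*$ as $k \to \infty$.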
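Under (P), $\bar J_0 \le \bar J_1$, and so $\bar J_1 = \bar T(\bar J_0) \le \bar T(\bar J_1) = \bar J_2$. Continuing, we see that $\{\bar J_k\}$ is an increasing sequence of functions, and so $\bar J_\infty = \lim_{k\to\infty} \bar J_k$ exists and takes values in $[0, +\infty]$. Under (N), $\{\bar J_k\}$ is a decreasing sequence of functions and $\bar J_\infty$ exists, taking values in $[-\infty, 0]$. Under (D), we have
$$\bar T(\bar J_0) \ge \bar J_0 - b, \qquad \bar T^2(\bar J_0) \ge \bar T(\bar J_0 - b) = \bar T(\bar J_0) - \alpha b,$$
and, in general,
$$\bar T^{k+1}(\bar J_0) \ge \bar T^k(\bar J_0) - \alpha^k b.$$
Hence, as $k \to \infty$, the quantity $-b\sum_{j=k}^\infty \alpha^j + \bar T^k(\bar J_0)$ increases to a limit. But $b\sum_{j=k}^\infty \alpha^j \to 0$, so $\bar J_\infty = \lim_{k\to\infty} \bar T^k(\bar J_0)$ exists and satisfies $-b/(1-\alpha) \le \bar J_\infty$. Similarly, we have $\bar J_\infty \le b/(1-\alpha)$.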
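Now if $\bar J: P(S) \to [-c, c]$, $c < \infty$, then
$$\bar J_0 \le \bar J + c, \qquad \bar T(\bar J_0) \le \bar T(\bar J + c) = \alpha c + \bar T(\bar J),$$
and, in general,
$$\bar T^k(\bar J_0) \le \alpha^k c + \bar T^k(\bar J).$$
It follows that
$$\bar J_\infty \le \liminf_{k\to\infty} \bar T^k(\bar J),$$
and by a similar argument beginning with $\bar J - c \le \bar J_0$, we can show that $\limsup_{k\to\infty} \bar T^k(\bar J) \le \bar J_\infty$. This shows that under (D), if $\bar J$ is any bounded real-valued function on $P(S)$, then $\bar J_\infty = \lim_{k\to\infty} \bar T^k(\bar J)$. The same arguments can be used to establish the existence of $J_\infty = \lim_{k\to\infty} J_k$. Under (P), $J_\infty: S \to [0, \infty]$; under (N), $J_\infty: S \to [-\infty, 0]$; and under (D), $J_\infty = \lim_{k\to\infty} T^k(J)$ takes values in $[-b/(1-\alpha), b/(1-\alpha)]$, where $J: S \to [-c, c]$, $c < \infty$, is lower semianalytic. Note that in every case, $J_\infty$ is lower semianalytic by Lemma 7.30(2).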
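The sandwich argument above can be checked on a toy finite model, as a sketch only: the two-state instance below is hypothetical, with $|g| \le 1$ and $\alpha = 0.7$, and the starting function is an arbitrary vector bounded by $c = 5$. The code confirms that $\sup_x |T^k(J)(x) - T^k(J_0)(x)| \le \alpha^k c$, which is exactly what the two inequalities $T^k(J_0) \le \alpha^k c + T^k(J)$ and $T^k(J) \le \alpha^k c + T^k(J_0)$ give together.

```python
# Hypothetical finite instance under (D): two states, two controls, |g| <= 1.
S, C, ALPHA = (0, 1), (0, 1), 0.7
t = {(0, 0): [0.6, 0.4], (0, 1): [0.1, 0.9],
     (1, 0): [0.3, 0.7], (1, 1): [1.0, 0.0]}
g = {(0, 0): 1.0, (0, 1): -0.5, (1, 0): 0.2, (1, 1): 0.8}

def T(J):
    """One step of the DP algorithm: T(J)(x) = min_u [g(x, u) + alpha * E J(next)]."""
    return [min(g[x, u] + ALPHA * sum(t[x, u][y] * J[y] for y in S) for u in C)
            for x in S]

c = 5.0
J_0 = [0.0, 0.0]
J = [5.0, -3.0]  # an arbitrary starting function with -c <= J <= c

gaps = []
A, B = J_0, J
for k in range(30):
    # A = T^k(J_0) and B = T^k(J); the sandwich bounds their sup-distance by alpha^k c.
    gaps.append(max(abs(a - b) for a, b in zip(A, B)))
    A, B = T(A), T(B)
print(gaps[:3], gaps[-1])
```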
Lemma 9.3 (P)(N)(D) For every $p \in P(S)$,
$$\bar J_k(p) = \int_S J_k(x)\,p(dx), \qquad k = 0, 1, \ldots, \infty.$$

Proof For $k = 0, 1, \ldots$, the lemma follows from Proposition 9.7 by induction. When $k = \infty$, the lemma follows from the monotone convergence theorem under (P) and (N) and the bounded convergence theorem under (D). Q.E.D.

Proposition 9.14 (N)(D) We have
$$\bar J_\infty = \bar J^*, \tag{25}$$
$$J_\infty = J^*. \tag{26}$$
Indeed, under (D) the dynamic programming algorithm can be initiated from any 7: P(S) [- c, c], c < co, or lower semianalytic J : S -+ [ - c, c], c < co,and converges uniformly, i.e., -+
lim sup ITk(J)(p)- J*(p)l = 0, k + w p E P(S)
lim suplTk(J)(x)- J*(x)l = 0. k+m XES
(27) (28)
Proof The result for (DM) follows from Propositions 4.2(c) and 5.7 or from DPSC, Chapter 6, Proposition 3 and Chapter 7, Proposition 4. By Lemma 9.3, $J_\infty(x) = \bar J_\infty(p_x)$ for every $x \in S$, and $J^*(x) = \bar J^*(p_x)$ by Proposition 9.5, so (25) implies (26). Under (D), if a lower semianalytic function $J: S \to [-c, c]$, $c < \infty$, is given, then define $\bar J: P(S) \to [-c, c]$ by (20). Equation (28) now follows from (27) and Propositions 9.5 and 9.7. Q.E.D.
Case (D) is the best suited for computational procedures. The machinery developed thus far can be applied to Proposition 4.6 or to DPSC, Chapter 6, Proposition 4, to show the validity for (SM) of the error bounds given there. We provide the theorem for (SM). The analogous result is of course true for (DM).
Proposition 9.15 (D) Let $J: S \to [-c, c]$, $c < \infty$, be lower semianalytic. Then for all $x \in S$ and $k = 0, 1, \ldots$,
where
$$b_k = \frac{\alpha}{1-\alpha} \inf_{x \in S} [T^k(J)(x) - T^{k-1}(J)(x)], \tag{30}$$
$$\bar b_k = \frac{\alpha}{1-\alpha} \sup_{x \in S} [T^k(J)(x) - T^{k-1}(J)(x)]. \tag{31}$$
9.
232
THE INFINITE HORIZON BOREL MODELS
Proof Given a lower semianalytic function $J: S \to [-c, c]$, $c < \infty$, define $\bar J: P(S) \to [-c, c]$ by (20). By Proposition 9.7, $T^k(\bar J)(p) = \int_S T^k(J)(x)\, p(dx)$ for every $p \in P(S)$ and $k = 0, 1, \ldots$. Therefore
$$b_k = \frac{\alpha}{1-\alpha} \inf_{p \in P(S)} [T^k(\bar J)(p) - T^{k-1}(\bar J)(p)],$$
$$\bar b_k = \frac{\alpha}{1-\alpha} \sup_{p \in P(S)} [T^k(\bar J)(p) - T^{k-1}(\bar J)(p)],$$
where $b_k$ and $\bar b_k$ are defined by (30) and (31). Taking $\alpha_1 = \alpha_2 = \alpha$ in Proposition 4.6 or using the proof of Proposition 4, Chapter 6 of DPSC, we obtain
Substituting $p = p_x$ in this equation, we obtain (29).
Q.E.D.
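The quantities $b_k$ and $\bar b_k$ of (30)-(31) are cheap to compute from two successive iterates. The sketch below, on the same kind of hypothetical finite (D) model as before (all data invented for illustration), checks the bracket $T^k(J) + b_k \le J^* \le T^k(J) + \bar b_k$. We assume this is the outer part of (29); it follows from monotonicity of $T$ together with $T(J + c) = T(J) + \alpha c$: if $T^k(J) - T^{k-1}(J) \ge d$, then $T^{k+m}(J) \ge T^k(J) + d(\alpha + \cdots + \alpha^m)$, and letting $m \to \infty$ gives the lower bound (the upper bound is symmetric). Here $J^*$ is approximated by many DP iterations.

```python
# Hypothetical finite model satisfying (D), as in the previous sketch.
ALPHA = 0.9
STATES = [0, 1, 2]
t = {
    0: {0: [0.5, 0.5, 0.0], 1: [0.0, 0.2, 0.8]},
    1: {0: [0.1, 0.9, 0.0], 1: [0.6, 0.0, 0.4]},
    2: {0: [0.0, 0.0, 1.0], 1: [0.3, 0.3, 0.4]},
}
g = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: -1.0}, 2: {0: 0.0, 1: 1.5}}


def T(J):
    """One DP step on the finite model."""
    return [
        min(g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(t[x][u])) for u in t[x])
        for x in STATES
    ]


def error_bounds(J, k):
    """For k >= 1 return (T^k(J), T^k(J) + b_k, T^k(J) + bbar_k), b_k and bbar_k as in (30)-(31)."""
    prev, cur = J, T(J)
    for _ in range(k - 1):
        prev, cur = cur, T(cur)
    diffs = [c - q for c, q in zip(cur, prev)]  # T^k(J)(x) - T^{k-1}(J)(x)
    b_k = ALPHA / (1 - ALPHA) * min(diffs)
    bbar_k = ALPHA / (1 - ALPHA) * max(diffs)
    return cur, [c + b_k for c in cur], [c + bbar_k for c in cur]


# Reference value: J* approximated by many DP iterations.
J_star = [0.0, 0.0, 0.0]
for _ in range(2000):
    J_star = T(J_star)

for k in (1, 5, 20):
    _, lower, upper = error_bounds([0.0, 0.0, 0.0], k)
    print(k, lower, J_star, upper)
```

In practice the bracket width $\bar b_k - b_k$ is a natural stopping criterion for the algorithm.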
It is not possible to develop a policy iteration algorithm for (SM) along the lines of Proposition 4.8 or 4.9. One difficulty is this. If at the kth iteration we have constructed a policy $(\mu_k, \mu_k, \ldots)$, where $\mu_k \in U(C|S)$, then $J_{\mu_k}$ is universally measurable but not necessarily lower semianalytic. We would like to find $\mu_{k+1} \in U(C|S)$ such that $T_{\mu_{k+1}}(J_{\mu_k}) \le T(J_{\mu_k}) + \varepsilon$, where $\varepsilon > 0$ is some prescribed small number, but Proposition 7.50 does not apply to this case. We turn now to the question of convergence of the dynamic programming algorithm under (P). Without additional assumptions, we have only the following result.
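Proposition 9.16 (P) We have
$$\bar J_\infty \le \bar J^*, \tag{32}$$
$$J_\infty \le J^*. \tag{33}$$
Furthermore, the following statements are equivalent:
(a) $\bar J_\infty = T(\bar J_\infty)$,
(b) $\bar J_\infty = \bar J^*$,
(c) $J_\infty = T(J_\infty)$,
(d) $J_\infty = J^*$.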
Proof It is clear that (32) holds and, by Proposition 9.10, implies the equivalence of (a) and (b). Lemma 9.3, Proposition 9.5, and (32) imply (33). Conditions (a) and (c) are equivalent by Lemma 9.3 and Proposition 9.7. Conditions (b) and (d) are equivalent by Lemma 9.3 and Proposition 9.5. Q.E.D.
In Example 1, we have $J_\infty(0) = 0$ and $J^*(0) = \infty$, so strict inequality in (32) and (33) is possible. We present now an example in which not only is $J_\infty$ different from $J^*$, but $J_\infty$ is Borel-measurable while $J^*$ is not.
EXAMPLE 2 (Blackwell) Let $\Sigma$ be the set of finite sequences of positive integers and $H$ the set of functions $h$ from $\Sigma$ into $\{0, 1\}$. Then $H$ can be regarded as the countable Cartesian product of copies of $\{0, 1\}$ indexed by $\Sigma$. Let $\{0, 1\}$ have the discrete topology and $H$ the product topology, so $H$ is a complete separable metrizable space (Proposition 7.4). A typical basic open set in $H$ is $\{h \in H \mid h(s) = 1\ \forall s \in \Sigma_1,\ h(s) = 0\ \forall s \in \Sigma_2\}$, where $\Sigma_1$ and $\Sigma_2$ are finite subsets of $\Sigma$. Consider a Suslin scheme $R: \Sigma \to \mathscr{B}_H$ defined by
$$R(s) = \{h \in H \mid h(s) = 1\} \qquad \forall s \in \Sigma.$$
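Then $N(R) = \{h \in H \mid \exists (\zeta_1, \zeta_2, \ldots) \in \mathscr{N} \text{ such that } h(\zeta_1, \zeta_2, \ldots, \zeta_n) = 1\ \forall n\}$ is an analytic subset of $H$ (Proposition 7.36). We show with the aid of Appendix B that $N(R)$ is not Borel-measurable. Let $Y$ be an uncountable Borel space and $Q: \Sigma \to \mathscr{B}_Y$ a Suslin scheme such that $N(Q)$ is not Borel-measurable (Proposition B.6). Define $\psi: Y \to H$ by
$$\psi(y)(s) = \begin{cases} 1 & \text{if } y \in Q(s), \\ 0 & \text{if } y \notin Q(s), \end{cases} \qquad s \in \Sigma.$$
If $\Sigma_1$ and $\Sigma_2$ are finite subsets of $\Sigma$, then
$$\psi^{-1}(\{h \in H \mid h(s) = 1\ \forall s \in \Sigma_1,\ h(s) = 0\ \forall s \in \Sigma_2\}) = \Big[\bigcap_{s \in \Sigma_1} Q(s)\Big] \cap \Big[\bigcap_{s \in \Sigma_2} (Y - Q(s))\Big]$$
is in $\mathscr{B}_Y$. The collection $\mathscr{E}$ of subsets $E$ of $H$ for which $\psi^{-1}(E) \in \mathscr{B}_Y$ is a $\sigma$-algebra containing a base for the topology on $H$, so, by the remark following Definition 7.6, $\mathscr{E}$ contains $\mathscr{B}_H$, and $\psi$ is Borel-measurable. For each $s \in \Sigma$, we have $Q(s) = \psi^{-1}[R(s)]$, so
$$N(Q) = \psi^{-1}[N(R)].$$
Since $N(Q)$ is not Borel-measurable, $N(R)$ is also not Borel-measurable. Define the decision model by taking $S = H\Sigma^*$, where $\Sigma^* = \Sigma \cup \{\varnothing\}$, $C = \{1, 2, \ldots\}$, $U(x) = C$ for every $x \in S$, and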
The system transition is deterministic, so the choice of $W$ and $p(dw|x, u)$ is irrelevant. Choose $\alpha = 1$ and
If the system begins at $x_0 = [h, \varnothing]$ and the horizon is infinite, a positive cost can be avoided if and only if there exists $(\zeta_1, \zeta_2, \ldots)$ such that $h(\zeta_1, \zeta_2, \ldots, \zeta_n) = 1$ for every $n$, i.e., $J^*([h, \varnothing]) = 0$ if and only if $h \in N(R)$. Therefore, $J^*$ is not Borel-measurable. Over the finite horizon, we have
and since $C$ is countable and $f$, $g$, and $J_k$ are Borel-measurable, $J_{k+1}$ is Borel-measurable for $k = 0, 1, 2, \ldots$. It follows that $J_\infty$ is Borel-measurable.
The equivalent conditions of Proposition 9.16 are not easily verified in practice. We give here some more readily verifiable conditions which imply that $J_\infty = J^*$.
Proposition 9.17 (P)(D) Assume that there exists a nonnegative integer $\bar k$ such that for each $x \in S$, $\lambda \in R$, and $k \ge \bar k$, the set
$$U_k(x, \lambda) = \Big\{u \in U(x) \ \Big|\ g(x, u) + \alpha \int_S J_k(x')\, t(dx'|x, u) \le \lambda\Big\} \tag{34}$$
is compact in $C$. Then $\bar J_\infty = \bar J^*$, $J_\infty = J^*$, and there exists an optimal nonrandomized stationary policy for (SM).
Proof Under (P), we have, for each $k$, $J_k \le J_\infty$, so $J_{k+1} = T(J_k) \le T(J_\infty)$, and letting $k \to \infty$ we obtain
$$J_\infty \le T(J_\infty). \tag{35}$$
Let $x \in S$ be such that $J_\infty(x) < \infty$. By Lemma 3.1, for $k \ge \bar k$ there exists $u_k \in U(x)$ such that
$$J_{k+1}(x) = g(x, u_k) + \alpha \int_S J_k(x')\, t(dx'|x, u_k).$$
Since $J_k \le J_{k+1} \le \cdots \le J_\infty$, it follows that for $k \ge \bar k$
$$g(x, u_i) + \alpha \int_S J_k(x')\, t(dx'|x, u_i) \le g(x, u_i) + \alpha \int_S J_i(x')\, t(dx'|x, u_i) = J_{i+1}(x) \le J_\infty(x) \qquad \forall i \ge k.$$
Therefore, $\{u_i \mid i \ge k\} \subset U_k[x, J_\infty(x)]$ for every $k \ge \bar k$. Since $U_k[x, J_\infty(x)]$ is
compact, all limit points of the sequence $\{u_i \mid i \ge k\}$ belong to $U_k[x, J_\infty(x)]$, and at least one such limit point exists. It follows that if $\bar u$ is a limit point of the sequence $\{u_i \mid i \ge \bar k\}$, then
$$\bar u \in \bigcap_{k = \bar k}^{\infty} U_k[x, J_\infty(x)].$$
Therefore, for all $k \ge \bar k$,
$$g(x, \bar u) + \alpha \int_S J_k(x')\, t(dx'|x, \bar u) \le J_\infty(x).$$
Letting $k \to \infty$ and using the monotone convergence theorem, we obtain
$$J_\infty(x) \ge g(x, \bar u) + \alpha \int_S J_\infty(x')\, t(dx'|x, \bar u) \ge T(J_\infty)(x) \tag{36}$$
for all $x \in S$ such that $J_\infty(x) < \infty$. We also have that (36) holds if $J_\infty(x) = \infty$, and thus it holds for all $x \in S$. From (35) and (36) we see that $J_\infty = T(J_\infty)$ and conditions (a)-(d) of Proposition 9.16 must hold. In particular, we have from (35) and (36) that for every $x \in S$, there exists $\bar u \in U(x)$ such that
The existence of an optimal nonrandomized stationary policy for (SM) follows from Corollary 9.12.1. Under (D), conditions (a)-(d) of Proposition 9.16 hold by Proposition 9.14. If we replace $g$ by $g + b$, we obtain a model satisfying (P). This new model also satisfies the hypotheses of the proposition, so there exists an optimal nonrandomized stationary policy for it. This policy is optimal for the original (D) model as well. Q.E.D.
Corollary 9.17.1 (P)(D) Assume that the set $U(x)$ is finite for each $x \in S$. Then $\bar J_\infty = \bar J^*$, $J_\infty = J^*$, and there exists an optimal nonrandomized stationary policy for (SM). In fact, if $C$ is finite and $g$ and $\Gamma$ are Borel-measurable, then $J^*$ is Borel-measurable and there exists a Borel-measurable optimal nonrandomized stationary policy for (SM).
Corollary 9.17.2 (P)(D) Suppose conditions (a)-(e) of Definition 8.7 (the lower semicontinuous model) are satisfied. Then $\bar J_\infty = \bar J^*$, $J_\infty = J^*$, $J^*$ is lower semicontinuous, and there exists a Borel-measurable optimal nonrandomized stationary policy for (SM).
Proof From the proof of Proposition 8.6, we see that $J_k$ is lower semicontinuous for $k = 1, 2, \ldots$, as are the functions
For $\lambda \in R$ and $k$ fixed, the lower level set
is closed, so for each fixed $x \in S$
is compact. Proposition 9.17 can now be invoked, and it remains only to prove that the optimal nonrandomized stationary policy whose existence is guaranteed by that proposition can be chosen to be Borel-measurable. This will follow from Proposition 9.12 and the proof of Proposition 8.6 once we show that $J_\infty = J^*$ is lower semicontinuous. Under (P), $J_k \uparrow J^*$, so
$$\{x \in S \mid J^*(x) \le \lambda\} = \bigcap_{k=1}^{\infty} \{x \in S \mid J_k(x) \le \lambda\}$$
is closed, and $J^*$ is lower semicontinuous. Under (D),
$$J_k - b \sum_{j=k}^{\infty} \alpha^j \uparrow J^*,$$
so a similar argument can be used to show that $J^*$ is lower semicontinuous. Q.E.D.
By using the argument used to prove Corollary 8.6.1, we also have the following.
Corollary 9.17.3 The conclusions of Corollary 9.17.2 hold if instead of assuming that $C$ is compact and each $\Gamma_j$ is closed in Definition 8.7, we assume that each $\Gamma_j$ is compact.
Proposition 9.17 and its corollaries provide conditions under which the dynamic programming algorithm can be used in the (P) and (D) models to generate $J^*$. It is also possible to use the dynamic programming algorithm to generate an optimal stationary policy, as is indicated by the next proposition.
Proposition 9.18 (P)(D) Suppose that either $U(x)$ is finite for each $x \in S$ or else conditions (a)-(e) of Definition 8.7 hold. Then for each $k \ge 0$ there exists a universally measurable $\mu_k: S \to C$ such that $\mu_k(x) \in U(x)$ for every $x \in S$ and
$$T_{\mu_k}(J_k) = T(J_k). \tag{38}$$
If $\{\mu_k\}$ is a sequence of such functions, then for each $x \in S$ the sequence $\{\mu_k(x)\}$ has at least one accumulation point. If $\mu: S \to C$ is universally measurable, $\mu(x)$ is an accumulation point of $\{\mu_k(x)\}$ for each $x \in S$ such that $J^*(x) < \infty$, and $\mu(x) \in U(x)$ for each $x \in S$ such that $J^*(x) = \infty$, then $\pi = (\mu, \mu, \ldots)$ is an optimal stationary policy for (SM).
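As a rough illustration of Proposition 9.18 in the simplest setting (finite S and C, so every $U(x)$ is finite and measurability is automatic), the sketch below runs the DP algorithm from $J_0 = 0$ on a hypothetical model satisfying (P) with $\alpha = 1$, records a minimizing control as in (38), and checks that $J_k$ increases and that the stationary policy built from the minimizers attains the limit. The model data, the tie-breaking rule (smallest control), and the shortcut of taking the minimizer at a single large k instead of an accumulation point are all assumptions of the sketch.

```python
# Hypothetical model satisfying (P): nonnegative costs, ALPHA = 1.
# State 0 is absorbing and cost-free; U(x) is finite for every x.
ALPHA = 1.0
STATES = [0, 1, 2]
t = {  # t[x][u] = list of t(x' | x, u) for x' = 0, 1, 2
    0: {0: [1.0, 0.0, 0.0]},
    1: {0: [1.0, 0.0, 0.0], 1: [0.5, 0.0, 0.5]},
    2: {0: [0.5, 0.0, 0.5], 1: [1.0, 0.0, 0.0]},
}
g = {0: {0: 0.0}, 1: {0: 2.0, 1: 0.5}, 2: {0: 1.0, 1: 3.0}}


def q_factor(J, x, u):
    """g(x,u) + ALPHA * sum_x' t(x'|x,u) J(x')."""
    return g[x][u] + ALPHA * sum(p * J[y] for y, p in enumerate(t[x][u]))


def T(J):
    return [min(q_factor(J, x, u) for u in t[x]) for x in STATES]


def greedy(J):
    """A selector mu with T_mu(J) = T(J); ties broken by the smallest control."""
    return {x: min(t[x], key=lambda u: (q_factor(J, x, u), u)) for x in STATES}


def evaluate(mu, iters=500):
    """Approximate J_mu = lim T_mu^k(0) by iterating T_mu from 0."""
    J = [0.0] * len(STATES)
    for _ in range(iters):
        J = [q_factor(J, x, mu[x]) for x in STATES]
    return J


Js = [[0.0, 0.0, 0.0]]  # J_0 = 0, J_{k+1} = T(J_k)
for _ in range(200):
    Js.append(T(Js[-1]))
mu = greedy(Js[-1])
print(Js[-1], mu, evaluate(mu))
```

Under (P) the iterates increase from $J_0 = 0$, which is why the limit exists without any contraction argument.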
Proof If $U(x)$ is finite for each $x \in S$, then the sets $U_k(x, \lambda)$ of (34) are compact for all $k \ge 0$, $x \in S$, and $\lambda \in R$. The proof of Corollary 9.17.2 shows that these sets are also compact under conditions (a)-(e) of Definition 8.7. The existence of functions $\mu_k: S \to C$ satisfying (38) such that $\mu_k(x) \in U(x)$ for every $x \in S$ is a consequence of Lemma 3.1 and Proposition 7.50. Under (P) we see from the proof of Proposition 9.17 that $\{\mu_k(x)\}$ has at least one accumulation point for each $x \in S$ such that $J^*(x) < \infty$ and every accumulation point of $\{\mu_k(x)\}$ is in $U(x)$. If $\mu: S \to C$ is universally measurable and $\mu(x)$ is an accumulation point of $\{\mu_k(x)\}$ for each $x \in S$ such that $J^*(x) < \infty$, then from (35), (36), and Proposition 9.17 we have
$$J^*(x) = g[x, \mu(x)] + \alpha \int_S J^*(x')\, t(dx'|x, \mu(x)) \tag{39}$$
for all $x \in S$ such that $J^*(x) < \infty$. If $\mu(x) \in U(x)$ for all $x \in S$ such that $J^*(x) = \infty$, then
for all $x \in S$ such that $J^*(x) = \infty$. From (39) and (40) we have $J^* = T_\mu(J^*)$, and the policy $\pi = (\mu, \mu, \ldots)$ is optimal by Proposition 9.12. Under (D) we can replace $g$ by $g + b$ to obtain a model satisfying (P) and the hypotheses of the proposition. The conclusions of the proposition are valid for this new model, so they are valid for the original (D) model as well. Q.E.D.
A slightly stronger version of Proposition 9.18 can be found in [S12].
Corollary 9.18.1 If conditions (b)-(e) of Definition 8.7 hold and if each $\Gamma_j$ of condition (b) is compact, then the conclusions of Proposition 9.18 hold.

9.6 Existence of ε-Optimal Policies
We have characterized stationary optimal policies and given conditions under which optimal policies exist. We turn now to the existence of ε-optimal policies. For fixed $x \in S$, by definition there is a policy which is ε-optimal at $x$. We would like to know how this collection of policies, each of which is ε-optimal at a single point, can be pieced together to form a single policy which is ε-optimal at every point. There is a related question concerning optimal policies. If at each point there is a policy which is optimal at that point, is it possible to find an optimal policy? Answers to these questions are provided by the next two propositions.
Proposition 9.19 (P)(D) For each $\varepsilon > 0$, there exists an ε-optimal nonrandomized Markov policy for (SM), and if $\alpha < 1$, it can be taken to be
stationary. If for each $x \in S$ there exists a policy for (SM) which is optimal at $x$, then there exists an optimal nonrandomized stationary policy.
Proof Choose $\varepsilon > 0$ and $\varepsilon_k > 0$ such that $\sum_{k=0}^{\infty} \alpha^k \varepsilon_k = \varepsilon$. If $\alpha < 1$, let $\varepsilon_k = (1 - \alpha)\varepsilon$ for every $k$. By Proposition 7.50, there are universally measurable functions $\mu_k: S \to C$, $k = 0, 1, \ldots$, such that $\mu_k(x) \in U(x)$ for every $x \in S$ and
If $\alpha < 1$, we choose all the $\mu_k$ to be identical. Then
Continuing this process, we have
and, letting $k \to \infty$, we obtain
$$\lim_{k\to\infty} (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_k})(J^*) \le J^* + \varepsilon.$$
Under (P) we have
so $\pi = (\mu_0, \mu_1, \ldots)$ is ε-optimal. Under (D), $J_0 \le J^* + [b/(1-\alpha)]$,
$$(T_{\mu_0} T_{\mu_1} \cdots T_{\mu_k})(J_0) \le (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_k})\big(J^* + [b/(1-\alpha)]\big) = [\alpha^{k+1} b/(1-\alpha)] + (T_{\mu_0} T_{\mu_1} \cdots T_{\mu_k})(J^*),$$
so (41) is valid and $\pi = (\mu_0, \mu_1, \ldots)$ is ε-optimal. This proves the first part of the proposition. Suppose that for each $x \in S$ there is a policy for (SM) which is optimal at $x$. Fix $x$ and let $\pi = (\mu_0, \mu_1, \ldots)$ be a policy which is optimal at $x$. By Proposition 9.1, we may assume without loss of generality that $\pi$ is Markov. By Lemma 8.4(b) and (c), we have
Consequently, $T_{\mu_0}(J^*)(x) = T(J^*)(x)$. This implies that the infimum in the expression
is achieved. Since $x$ is arbitrary, Corollary 9.12.1 implies the existence of an optimal nonrandomized stationary policy. Q.E.D.
Proposition 9.20 (N) For each $\varepsilon > 0$, there exists an ε-optimal nonrandomized semi-Markov policy for (SM). If for each $x \in S$ there exists a policy for (SM) which is optimal at $x$, then there exists a semi-Markov (randomized) optimal policy.
Proof Under (N) we have $J_k \downarrow J^*$ (Proposition 9.14), so, given $\varepsilon > 0$, the analytically measurable sets
converge up to $S$ as $k \to \infty$. By Proposition 8.3, for each $k$ there exists a $k$-stage nonrandomized semi-Markov policy $\pi_k$ such that for every $x \in S$
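$$J_{k,\pi_k}(x) \le \begin{cases} J_k(x) + (\varepsilon/2) & \text{if } J_k(x) > -\infty, \\ -1/\varepsilon & \text{if } J_k(x) = -\infty. \end{cases}$$
Then for $x \in A_k$, we have either $J^*(x) > -\infty$ and
$$J_{k,\pi_k}(x) \le J_k(x) + (\varepsilon/2) \le J^*(x) + \varepsilon,$$
or else $J^*(x) = -\infty$. If $J^*(x) = -\infty$, then either $J_k(x) = -\infty$ and
$$J_{k,\pi_k}(x) \le -1/\varepsilon,$$
or else $J_k(x) > -\infty$ and
Choose any $\mu \in U(C|S)$ and define $\hat\pi_k = (\mu_0^k, \ldots, \mu_{k-1}^k, \mu, \mu, \ldots)$, where $\pi_k = (\mu_0^k, \ldots, \mu_{k-1}^k)$. For every $x \in A_k$, we have
$$J_{\hat\pi_k}(x) \le J_{k,\pi_k}(x) \le \begin{cases} J^*(x) + \varepsilon & \text{if } J^*(x) > -\infty, \\ -1/\varepsilon & \text{if } J^*(x) = -\infty, \end{cases}$$
so $\hat\pi_k$ is a nonrandomized semi-Markov policy which is ε-optimal for every $x \in A_k$. The policy $\pi$ defined to be $\hat\pi_k$ when the initial state is in $A_k$ but not in $A_j$ for any $j < k$ is semi-Markov, nonrandomized, and ε-optimal at every $x \in \bigcup_{k=1}^{\infty} A_k = S$.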
Suppose now that for each $x \in S$ there exists a policy $\pi^x$ for (SM) which is optimal at $x$. Let $\bar\pi^x$ be a policy for (DM) which corresponds to $\pi^x$, and let $(p_x, q_0^x, q_1^x, \ldots)$ be the sequence generated from $p_x$ by $\bar\pi^x$ via (10) and (11). If $G: \Delta \to [-\infty, 0]$ is defined by (15), then we have from Proposition 9.3 that
$$J^*(x) = J_{\pi^x}(x) = \bar J_{\bar\pi^x}(p_x) = G(p_x, q_0^x, q_1^x, \ldots). \tag{42}$$
We have from Proposition 9.5 and (16) that
$$J^*(x) = \bar J^*(p_x) = \inf_{(q_0, q_1, \ldots) \in \Delta_{p_x}} G(p_x, q_0, q_1, \ldots). \tag{43}$$
Therefore the infimum in (43) is attained for every $p_x \in \bar S$, where $\bar S = \{p_y \mid y \in S\}$, so by Proposition 7.50, there exists a universally measurable selector $\psi: \bar S \to P(SC)P(SC)\cdots$ such that $\psi(p_x) \in \Delta_{p_x}$ and
Let $\delta: S \to \bar S$ be the homeomorphism $\delta(x) = p_x$ and let $\varphi(x) = \psi[\delta(x)]$. Then $\varphi$ is universally measurable, $\varphi(x) \in \Delta_{p_x}$, and
$$J^*(x) = G[p_x, \varphi(x)] \qquad \forall x \in S.$$
Denote
$$\varphi(x) = [q_0(d(x_0, u_0)|x),\ q_1(d(x_1, u_1)|x), \ldots].$$
For each $k \ge 0$, $q_k(d(x_k, u_k)|x)$ is a universally measurable stochastic kernel on $SC$ given $S$, and by Proposition 7.27 and Lemma 7.28(a),(b), $q_k(d(x_k, u_k)|x)$ can be decomposed into its marginal $p_k(dx_k|x)$, which is a universally measurable stochastic kernel on $S$ given $S$, and a universally measurable stochastic kernel $\mu_k(du_k|x, x_k)$ on $C$ given $SS$. Since $p_0(dx_0|x) = p_x(dx_0)$, the stochastic kernel $\mu_0(du_0|x, x_0)$ is arbitrary except when $x = x_0$. Set
The sequence $\pi = (\bar\mu_0, \bar\mu_1, \bar\mu_2, \ldots)$ is a randomized semi-Markov policy for (SM). From (7) of Chapter 8, we see that
$$q_k(\pi, p_x) = q_k(d(x_k, u_k)|x) \qquad \forall x \in S,\ k = 0, 1, \ldots.$$
From (5), (15), and (44), we have
so $\pi$ is optimal. Q.E.D.
Although randomized policies may be considered inferior and are avoided in practice, under (N) as posed here they cannot be disregarded even in deterministic problems, as the following example demonstrates.
EXAMPLE 3 (St. Petersburg paradox) Let $S = \{0, 1, 2, \ldots\}$, $C = \{0, 1\}$, $U(x) = C$ for every $x \in S$, $\alpha = 1$,
$$f(x, u) = \begin{cases} x + 1 & \text{if } u = 1,\ x \ne 0, \\ 0 & \text{otherwise}, \end{cases} \qquad g(x, u) = \begin{cases} -2^x & \text{if } x \ne 0,\ u = 0, \\ 0 & \text{otherwise}. \end{cases}$$
Beginning in state one, any nonrandomized policy either increases the state by one indefinitely and incurs no nonzero cost or else, after $k$ increases, jumps the system to zero at a cost of $-2^{k+1}$, where it remains at no further cost. Thus $J^*(1) = -\infty$, but this cost is not achieved by any nonrandomized policy. On the other hand, the randomized stationary policy which jumps the system to zero with probability $\frac{1}{2}$ when the state $x$ is nonzero yields an expected cost of $-\infty$ and is optimal at every $x \in S$. The one-stage cost $g$ in Example 3 is unbounded, but by a slight modification an example can be constructed in which $g$ is bounded and the only optimal policies are randomized. If one stipulates that $J^*$ must be finite, it may be possible to restrict attention to nonrandomized policies in Proposition 9.20. This is an unsolved problem. If (SM) is lower semicontinuous, then Proposition 9.19 can be strengthened, as Corollary 9.17.2 shows. Similarly, if (SM) is upper semicontinuous, a stronger version of Proposition 9.20 can be proved.
Proposition 9.21 Assume (SM) satisfies conditions (a)-(d) of Definition 8.8 (the upper semicontinuous model).
(D) For each $\varepsilon > 0$, there exists a Borel-measurable, ε-optimal, nonrandomized, stationary policy.
(N) For each $\varepsilon > 0$, there exists a Borel-measurable, ε-optimal, nonrandomized, semi-Markov policy.
Under both (D) and (N), $J^*$ is upper semicontinuous.
Proof Under (D) and (N) we have $\lim_{k\to\infty} J_k = J^*$ (Proposition 9.14), and each $J_k$ is upper semicontinuous (Proposition 8.7). By an argument similar to that used in the proof of Corollary 9.17.2, $J^*$ is upper semicontinuous. By using Proposition 7.34 in place of Proposition 7.50, the proof of Proposition 9.19 can be modified to show the existence of a Borel-measurable, ε-optimal, nonrandomized, stationary policy under (D). By using Proposition 8.7 in place of Proposition 8.3, the proof of Proposition 9.20 can be modified to show the existence of a Borel-measurable, ε-optimal, nonrandomized, semi-Markov policy under (N). Q.E.D.
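The arithmetic behind Example 3 can be checked exactly with rational numbers. In the sketch below (the function names are ours, not the text's), `jump_after(k)` is the cost from state 1 of the nonrandomized policy that increases the state $k$ times and then jumps, and `randomized_cost(n, q)` is the expected cost accumulated during the first $n$ stages from state 1 under the stationary policy that jumps with probability $q$ at every nonzero state. With $q = \frac{1}{2}$ each stage contributes exactly $-1$, so the $n$-stage cost is $-n$ and the infinite-horizon expected cost is $-\infty$, whereas every nonrandomized policy has a finite cost.

```python
from fractions import Fraction


def jump_after(k):
    """Cost from state 1 of the nonrandomized policy that increases the state
    k times (reaching state k + 1) and then jumps to 0: g(k + 1, 0) = -2**(k + 1)."""
    return -(2 ** (k + 1))


def randomized_cost(n, q=Fraction(1, 2)):
    """Expected cost over the first n stages from state 1 when, at each nonzero
    state x, the policy uses u = 0 (jump, cost -2**x) with probability q and
    u = 1 (move to x + 1, cost 0) with probability 1 - q."""
    total = Fraction(0)
    not_jumped = Fraction(1)  # probability the state is still nonzero
    x = 1
    for _ in range(n):
        total += not_jumped * q * -(2 ** x)
        not_jumped *= 1 - q
        x += 1
    return total


print([jump_after(k) for k in range(5)])
print([randomized_cost(n) for n in (1, 10, 100)])
```

Exact `Fraction` arithmetic avoids any doubt that the per-stage contribution is exactly $-1$, which a floating-point sum of rapidly growing terms could obscure.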
Chapter 10
The Imperfect State Information Model
In the models of Chapters 8 and 9 the current state of the system is known to the controller at each stage. In many problems of practical interest, however, the controller instead has access only to imperfect measurements of the system state. This chapter is devoted to the study of models relating to such situations. In our analysis we will encounter nonstationary versions of the models of Chapters 8 and 9. We will show in the next section that nonstationary models can be reduced to stationary ones by appropriate reformulation. We will thus be able to obtain nonstationary counterparts to the results of Chapters 8 and 9.

10.1 Reduction of the Nonstationary Model-State Augmentation

The finite horizon stochastic optimal control model of Definition 8.1 and the infinite horizon stochastic optimal control model of Definition 9.1 are said to be stationary, i.e., the data defining the model do not vary from stage to stage. In this section we define a nonstationary model and show how it can be reduced to a stationary one by augmenting the state with the time index.
10.1
REDUCTION OF THE NONSTATIONARY MODEL
243
We combine the treatments of the finite and infinite horizon models. Thus when $N = \infty$ and notation of the form $S_0, S_1, \ldots, S_{N-1}$ or $k = 0, \ldots, N-1$ appears, we take this to mean $S_0, S_1, \ldots$ and $k = 0, 1, \ldots$, respectively.
Definition 10.1 A nonstationary stochastic optimal control model, denoted by (NSM), consists of the following objects:
$N$ Horizon. A positive integer or $\infty$.
$S_k$, $k = 0, \ldots, N-1$ State spaces. For each $k$, $S_k$ is a nonempty Borel space.
$C_k$, $k = 0, \ldots, N-1$ Control spaces. For each $k$, $C_k$ is a nonempty Borel space.
$U_k$, $k = 0, \ldots, N-1$ Control constraints. For each $k$, $U_k$ is a function from $S_k$ to the set of nonempty subsets of $C_k$, and the set
$$\Gamma_k = \{(x_k, u_k) \mid x_k \in S_k,\ u_k \in U_k(x_k)\} \tag{1}$$
is analytic in $S_k C_k$.
$W_k$, $k = 0, \ldots, N-1$ Disturbance spaces. For each $k$, $W_k$ is a nonempty Borel space.
$p_k(dw_k|x_k, u_k)$, $k = 0, \ldots, N-1$ Disturbance kernels. For each $k$, $p_k(dw_k|x_k, u_k)$ is a Borel-measurable stochastic kernel on $W_k$ given $S_k C_k$.
$f_k$, $k = 0, \ldots, N-2$ System functions. For each $k$, $f_k$ is a Borel-measurable function from $S_k C_k W_k$ to $S_{k+1}$.
$\alpha$ Discount factor. A positive real number.
$g_k$, $k = 0, \ldots, N-1$ One-stage cost functions. For each $k$, $g_k$ is a lower semianalytic function from $\Gamma_k$ to $R^*$.
We envision a system which begins at some $x_k \in S_k$ and moves successively through state spaces $S_k, S_{k+1}, \ldots$ and, if $N < \infty$, finally terminates in $S_{N-1}$. A policy governing such a system evolution is a sequence $\pi^k = (\mu_k, \mu_{k+1}, \ldots, \mu_{N-1})$, where each $\mu_j$ is a universally measurable stochastic kernel on $C_j$ given $S_k C_k \cdots C_{j-1} S_j$ satisfying
$$\mu_j(U_j(x_j) \mid x_k, u_k, \ldots, u_{j-1}, x_j) = 1$$
for every $(x_k, u_k, \ldots, u_{j-1}, x_j)$. Such a policy is called a $k$-originating policy and the collection of all $k$-originating policies will be denoted by $\Pi^k$. The concepts of semi-Markov, Markov, nonrandomized, and 9-measurable policies are analogous to those of Definitions 8.2 and 9.2. The set $\Pi^0$ is also written as $\Pi'$, and the subset of $\Pi'$ consisting of all Markov policies is denoted by $\Pi$. Define the Borel-measurable state transition stochastic kernels by
244
10.
THE IMPERFECT STATE INFORMATION MODEL
Given a probability measure $p_k \in P(S_k)$ and a policy $\pi^k = (\mu_k, \ldots, \mu_{N-1}) \in \Pi^k$, define for $j = k, k+1, \ldots, N-1$
$$q_j(\pi^k, p_k)(\underline S_j \underline C_j) = \int_{S_k} \int_{C_k} \cdots \int_{C_{j-1}} \int_{\underline S_j} \mu_j(\underline C_j \mid x_k, u_k, \ldots, u_{j-1}, x_j)\, t_{j-1}(dx_j \mid x_{j-1}, u_{j-1})\, \mu_{j-1}(du_{j-1} \mid x_k, \ldots, u_{j-2}, x_{j-1}) \cdots \mu_k(du_k \mid x_k)\, p_k(dx_k) \qquad \forall \underline S_j \in \mathscr{B}_{S_j},\ \underline C_j \in \mathscr{B}_{C_j}. \tag{2}$$
There is a unique probability measure $q_j(\pi^k, p_k) \in P(S_j C_j)$ satisfying (2). If the horizon $N$ is finite, we treat (NSM) only under one of the following assumptions:
If $N = \infty$, we treat (NSM) only under one of the assumptions:
(P) $0 \le g_k(x_k, u_k)$ for every $(x_k, u_k) \in \Gamma_k$, $k = 0, \ldots, N-1$.
(N) $g_k(x_k, u_k) \le 0$ for every $(x_k, u_k) \in \Gamma_k$, $k = 0, \ldots, N-1$.
(D) $0 < \alpha < 1$, and for some $b \in R$, $-b \le g_k(x_k, u_k) \le b$ for every $(x_k, u_k) \in \Gamma_k$, $k = 0, \ldots, N-1$.
As in Chapters 8 and 9, the symbols (F+), (F-), (P), (N), and (D) will be used to indicate when a result is valid under the appropriate assumption. We define the $k$-originating cost corresponding to $\pi^k$ at $x_k \in S_k$ to be
and the $k$-originating optimal cost at $x_k \in S_k$ to be
$$J^*(x_k, k) = \inf_{\pi^k \in \Pi^k} J_{\pi^k}(x_k, k). \tag{3}$$
A policy $\pi \in \Pi^0$ is ε-optimal at $x_0 \in S_0$ if
The policy $\pi$ is optimal at $x_0$ if $J_\pi(x_0, 0) = J^*(x_0, 0)$. We say $\pi \in \Pi^0$ is ε-optimal (optimal) if it is ε-optimal (optimal) at every $x_0 \in S_0$. Let $\{\varepsilon_n\}$ be a sequence of positive numbers with $\varepsilon_n \downarrow 0$. A sequence of policies $\{\pi_n\} \subset \Pi^0$ is said to
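exhibit $\{\varepsilon_n\}$-dominated convergence to optimality if
$$\lim_{n\to\infty} J_{\pi_n}(x_0, 0) = J^*(x_0, 0) \qquad \forall x_0 \in S_0,$$
and for $n = 2, 3, \ldots$
$$J_{\pi_n}(x_0, 0) \le \begin{cases} J^*(x_0, 0) + \varepsilon_n & \text{if } J^*(x_0, 0) > -\infty, \\ J_{\pi_{n-1}}(x_0, 0) + \varepsilon_n & \text{if } J^*(x_0, 0) = -\infty. \end{cases}$$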
Definition 10.2 Let a nonstationary stochastic optimal control model as defined by Definition 10.1 be given. The corresponding stationary stochastic optimal control model, denoted by (SSM), consists of the following objects ($T$ is both a terminal state and the only control available at that state; if $N = \infty$, the introduction of $T$ is unnecessary):
$S = \bigcup_{k=0}^{N-1} \{(x_k, k) \mid x_k \in S_k\} \cup \{T\}$ State space.
$C = \bigcup_{k=0}^{N-1} \{(u_k, k) \mid u_k \in C_k\} \cup \{T\}$ Control space.
$U$ Control constraint. A function from $S$ to the set of nonempty subsets of $C$ defined by $U(x_k, k) = \{(u_k, k) \mid u_k \in U_k(x_k)\}$, $U(T) = \{T\}$.
$W = \bigcup_{k=0}^{N-1} \{(w_k, k) \mid w_k \in W_k\}$ Disturbance space.
$p(dw|x, u)$ Disturbance kernel. If $\underline W_k \in \mathscr{B}_{W_k}$, we define
$$p[\{(w_k, k) \mid w_k \in \underline W_k\} \mid (x_k, k), (u_k, k)] = p_k(\underline W_k \mid x_k, u_k). \tag{4}$$
$f$ System function. We define for $k = 0, \ldots, N-2$
$$f[(x_k, k), (u_k, k), (w_k, k)] = (f_k(x_k, u_k, w_k),\ k + 1), \tag{5}$$
and for the remaining two stages
$$f[(x_{N-1}, N-1), (u_{N-1}, N-1), (w_{N-1}, N-1)] = T, \tag{6}$$
$$f(T, T, w) = T. \tag{7}$$
$\alpha$ Discount factor.
$g$ One-stage cost function. We define
$$g[(x_k, k), (u_k, k)] = g_k(x_k, u_k), \tag{8}$$
$$g(T, T) = 0. \tag{9}$$
$N$ Horizon.
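The construction of Definition 10.2 is mechanical, and a few lines of code make the bookkeeping in (5)-(9) explicit. The sketch below covers only a finite horizon $N$ and ignores all measurability questions; `augment`, `TERMINAL`, and the sample stage data `f_list`, `g_list` are hypothetical names introduced for illustration. Augmented states, controls, and disturbances are pairs `(value, k)` or the terminal symbol $T$, as in Definition 10.2.

```python
TERMINAL = "T"  # plays the role of the terminal state/control T


def augment(N, f_list, g_list):
    """Build the stationary f and g of (SSM) from the stage-wise f_k, g_k of (NSM).

    f_list[k](x, u, w) is f_k for k = 0, ..., N-2; g_list[k](x, u) is g_k for
    k = 0, ..., N-1. Augmented states, controls, and disturbances are (value, k).
    """
    def f(state, control, w):
        if state == TERMINAL:                # f(T, T, w) = T
            return TERMINAL
        (x, k), (u, _), (wk, _) = state, control, w
        if k < N - 1:                        # f[(x,k),(u,k),(w,k)] = (f_k(x,u,w), k+1)
            return (f_list[k](x, u, wk), k + 1)
        return TERMINAL                      # the last stage moves to T

    def g(state, control):
        if state == TERMINAL:                # g(T, T) = 0
            return 0.0
        (x, k), (u, _) = state, control
        return g_list[k](x, u)               # g[(x,k),(u,k)] = g_k(x,u)

    return f, g


# Hypothetical 3-stage example with stage-dependent data.
N = 3
f_list = [lambda x, u, w: x + u + w, lambda x, u, w: 2 * x - u + w]
g_list = [lambda x, u: u * u, lambda x, u: x + u, lambda x, u: 10.0 * abs(x)]
f, g = augment(N, f_list, g_list)

state, total = (1, 0), 0.0
controls = [1, 0, 2]
for k in range(N):
    u = (controls[k], k)
    total += g(state, u)
    state = f(state, u, (0, k))
print(state, total)
```

Because the time index travels with the state, a single stationary `f` and `g` reproduce the stage-dependent data, which is exactly why results for (SSM) transfer to (NSM).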
Consider the mapping $\varphi_k: S_k \to S$ given by $\varphi_k(x_k) = (x_k, k)$. We endow $S$ with the topology that makes each $\varphi_k$ a homeomorphism, and we endow $C$ and $W$ with similar topologies. The spaces $S$, $C$, and $W$ are Borel. The set
is analytic, and $g$ defined on $\Gamma$ by (8) and (9) is lower semianalytic. The disturbance kernel $p(dw|x, u)$ is not defined on all of $SC$ by (4), but it is defined on a Borel subset of $SC$ containing $\Gamma - \{(T, T)\}$, which is all that is necessary. Likewise, the system function $f$ is not defined on all of $SCW$ by (5)-(7), but the set of points where it is not defined has probability zero under any policy governing the system evolution. Both $p(dw|x, u)$ and $f$ are Borel-measurable on their domains. Thus (SSM) is a special case of the stochastic optimal control model of Definition 8.1 ($N < \infty$) or Definition 9.1 ($N = \infty$). If $N < \infty$, the (F+) and (F-) assumptions on (SSM) are given in Section 8.1. These are equivalent to the respective (F+) and (F-) assumptions on (NSM) given earlier in this section. If $N = \infty$, the (P), (N), and (D) assumptions on (SSM) of Definition 9.1 are equivalent to the respective (P), (N), and (D) assumptions on (NSM) given earlier in this section. The reader can verify that there is a correspondence of policies between (NSM) and (SSM), and the optimal cost at $(x_k, k) \in S$ for (SSM) is $J^*(x_k, k)$ given by (3). Because of these facts, results already proved for (SSM) with either a finite or infinite horizon have immediate counterparts for (NSM). An illustration of this is the nonstationary optimality equation.
Proposition 10.1 (P)(N)(D) Let $J^*(x_k, k)$ be defined by (3). For fixed $k$, $J^*(x_k, k)$ is lower semianalytic on $S_k$, and
We do not list all the results for (NSM) that can be obtained from (SSM). The reader may verify, for example, that the existence results of Propositions 8.3 and 8.4 are valid for (NSM) in exactly the form stated. From Propositions 9.19 and 9.20 we conclude that, under (P) and (D), an ε-optimal nonrandomized Markov policy exists for (NSM), while under (N), an ε-optimal nonrandomized semi-Markov policy exists. In what follows, we make use of these results and reference only their stationary versions.

10.2 Reduction of the Imperfect State Information Model-Sufficient Statistics
Before defining the imperfect state information model, we give without proof some of the standard properties of conditional expectations and probabilities we will be using. For a detailed treatment, see Ash [A1]. Throughout this discussion, $(\Omega, \mathscr{F}, P)$ is a probability space and $X$ is an extended real-valued random variable on $\Omega$ for which either $E[X^+]$ or $E[X^-]$ is finite. If $\mathscr{G} \subset \mathscr{F}$ is a $\sigma$-algebra on $\Omega$, then the expectation of $X$ conditioned on $\mathscr{G}$ is any $\mathscr{G}$-measurable, extended real-valued random variable $E[X|\mathscr{G}](\cdot)$
10.2
REDUCTION OF THE IMPERFECT STATE INFORMATION MODEL
247
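on $\Omega$ which satisfies
$$\int_D E[X|\mathscr{G}](\omega)\, P(d\omega) = \int_D X(\omega)\, P(d\omega) \qquad \forall D \in \mathscr{G}. \tag{10}$$
It can be shown that at least one such random variable exists. Any such random variable will be called a version of $E[X|\mathscr{G}]$. If $X(\omega) \ge b$ for some $b \in R$ and every $\omega \in \Omega$, then it can be shown that for any version $E[X|\mathscr{G}](\cdot)$ the random variable $\tilde E[X|\mathscr{G}](\cdot)$ defined by
$$\tilde E[X|\mathscr{G}](\omega) = \max\{E[X|\mathscr{G}](\omega),\ b\}$$
is also a version of $E[X|\mathscr{G}]$. If $\mathscr{E} \subset \mathscr{G}$ is a collection of sets which is closed under finite intersections and generates the $\sigma$-algebra $\mathscr{G}$ and if $Y$ is an extended real-valued, $\mathscr{G}$-measurable random variable satisfying
then $Y$ satisfies (10) for every $D \in \mathscr{G}$, and $Y$ is a version of $E[X|\mathscr{G}]$. If $\mathscr{H} \subset \mathscr{G}$ is a $\sigma$-algebra, then
$$E\{E[X|\mathscr{G}] \mid \mathscr{H}\}(\omega) = E[X|\mathscr{H}](\omega) \tag{11}$$
for $P$ almost every $\omega$. Suppose now that $(\Omega_1, \mathscr{F}_1)$ and $(\Omega_2, \mathscr{F}_2)$ are measurable spaces and $Y_1: \Omega \to \Omega_1$ and $Y_2: \Omega \to \Omega_2$ are measurable. Let $g: \Omega_1 \Omega_2 \to R^*$ be measurable and satisfy either $E[g^+(Y_1, Y_2)] < \infty$ or $E[g^-(Y_1, Y_2)] < \infty$. We define $E[X|Y_1](\omega) = E[X|\mathscr{F}(Y_1)](\omega)$, where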
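$$\mathscr{F}(Y_1) = \{Y_1^{-1}(F) \mid F \in \mathscr{F}_1\}.$$
We define for $y_1 \in \Omega_1$
$$E[X \mid Y_1 = y_1] = E[X|Y_1](\omega(y_1)),$$
where $\omega(y_1)$ is any element of $Y_1^{-1}(\{y_1\})$. Since $E[X|Y_1]$ is $\mathscr{F}(Y_1)$-measurable, it is constant on $Y_1^{-1}(\{y_1\})$, and this definition makes sense. Note that $E[X \mid Y_1 = y_1]$ is a function of $y_1$, not of $\omega$. We have, for $y_1 \in Y_1(\Omega)$,
$$E[g(Y_1, Y_2) \mid Y_1 = y_1] = E[g(y_1, Y_2) \mid Y_1 = y_1] \tag{12}$$
for $P$ almost every $y_1$. We use the phrase "for $P$ almost every $y_1$" to indicate that, in this case, $P(\{\omega \in \Omega \mid (12) \text{ fails when } y_1 = Y_1(\omega)\}) = 0$. For $F \in \mathscr{F}_2$, define
$$P[Y_2 \in F \mid Y_1 = y_1] = E[\chi_F(Y_2) \mid Y_1 = y_1].$$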
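Suppose $t(dy_2|y_1)$ is a stochastic kernel on $(\Omega_2, \mathscr{F}_2)$ given $\Omega_1$ such that for every $F \in \mathscr{F}_2$
$$P[Y_2 \in F \mid Y_1 = y_1] = t(F|y_1)$$
for $P$ almost every $y_1$. Then (12) can be extended to
$$E[g(Y_1, Y_2) \mid Y_1 = y_1] = \int_{\Omega_2} g(y_1, y_2)\, t(dy_2|y_1) \tag{13}$$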
for $P$ almost every $y_1$. We will find (11) and (13) particularly useful in our treatment of the imperfect state information model. They will be used without reference to this discussion.
Definition 10.3 The imperfect state information stochastic optimal control model (ISI) is the ten-tuple $(S, C, (U_0, \ldots, U_{N-1}), Z, \alpha, g, t, s_0, s, N)$ described as follows:
$S$, $C$, $\alpha$, $g$, $t$ State space, control space, discount factor, one-stage cost function, and state transition kernel as given in Definition 8.1 and (3) of Chapter 8. We assume that $g$ is defined on all of $SC$.
$Z$ Observation space. A nonempty Borel space.
$U_k$, $k = 0, \ldots, N-1$ Control constraints. Define for $k = 0, \ldots, N-1$,
$$I_k = ZC \cdots CZ \qquad (k + 1 \text{ copies of } Z \text{ and } k \text{ copies of } C).$$
An element of $I_k$ is called a $k$th information vector. For each $k$, $U_k$ is a mapping from $I_k$ to the set of nonempty subsets of $C$ such that
$$\Gamma_k = \{(i_k, u_k) \mid i_k \in I_k,\ u_k \in U_k(i_k)\} \tag{15}$$
is analytic.
$s_0$ Initial observation kernel. A Borel-measurable stochastic kernel on $Z$ given $S$.
$s$ Observation kernel. A Borel-measurable stochastic kernel on $Z$ given $CS$.
$N$ Horizon. A positive integer or $\infty$.
For the sake of simplicity, we have eliminated the system function, disturbance space, and disturbance kernel from the model definition. In what follows, our notation will generally indicate a finite $N$. If $N = \infty$, the appropriate interpretation is required. The system moves stochastically from state $x_k$ to state $x_{k+1}$ via the state transition kernel $t(dx_{k+1}|x_k, u_k)$ and generates cost at each stage of $g(x_k, u_k)$. The observation $z_{k+1}$ is stochastically generated via the observation kernel $s(dz_{k+1}|u_k, x_{k+1})$ and added to the past observations and controls $(z_0, u_0, \ldots, z_k, u_k)$ to form the $(k+1)$st information vector $i_{k+1} = (z_0, u_0, \ldots, z_k, u_k, z_{k+1})$. The first information vector $i_0 = (z_0)$ is generated by the initial observation
kernel $s_0(dz_0|x_0)$, and the initial state $x_0$ has some given initial distribution $p$. The goal is to choose $u_k$ dependent on the $k$th information vector $i_k$ so as to minimize
Definition 10.4 A policy for (ISI) is a sequence $\pi = (\mu_0, \ldots, \mu_{N-1})$ such that, for each $k$, $\mu_k(du_k|p; i_k)$ is a universally measurable stochastic kernel on $C$ given $P(S) I_k$ satisfying
If for each $p$, $k$, and $i_k$, $\mu_k(du_k|p; i_k)$ assigns mass one to some point in $C$, $\pi$ is nonrandomized. The concepts of Markov and semi-Markov policies are of no use in (ISI), since the initial distribution, past observations, and past controls are of genuine value in estimating the current state. Thus we expect policies to depend on the initial distribution $p$ and the total information vector. In the remainder of this chapter, $\Pi$ will denote the set of all policies in (ISI). Just as we denote the set of all sequences of the form $(z_0, u_0, \ldots, u_{k-1}, z_k) \in ZC \cdots CZ$ by $I_k$ and call these sequences the $k$th information vectors, we find it notationally convenient to denote the set of all sequences of the form $(x_0, z_0, u_0, \ldots, x_k, z_k, u_k) \in SZC \cdots SZC$ by $H_k$ and call these sequences the $k$th history vectors. Except for $u_k$, the $k$th information vector is that portion of the $k$th history vector known to the controller at the $k$th stage. Given $p \in P(S)$ and $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$, by Proposition 7.45 there is a sequence of consistent probability measures $P_k(\pi, p)$ on $H_k$, $k = 0, \ldots, N-1$, defined on measurable rectangles by
Definition 10.5 Given $p \in P(S)$, a policy $\pi = (\mu_0, \ldots, \mu_{N-1}) \in \Pi$, and a positive integer $K \le N$, the $K$-stage cost corresponding to $\pi$ at $p$ is
If $N < \infty$, the cost corresponding to $\pi$ is $J_\pi = J_{N,\pi}$, and we assume either
If $N = \infty$, the cost corresponding to $\pi$ is $J_\pi = \lim_{K\to\infty} J_{K,\pi}$, and to ensure that this limit is a well-defined extended real number, we impose one of the following conditions:
(P) $0 \le g(x, u)$ for every $(x, u) \in SC$.
(N) $g(x, u) \le 0$ for every $(x, u) \in SC$.
(D) $0 < \alpha < 1$, and for some $b \in R$, $-b \le g(x, u) \le b$ for every $(x, u) \in SC$.
The optimal cost at $p$ is
$$J^*(p) = \inf_{\pi \in \Pi} J_\pi(p).$$
The concepts of optimality at $p$, optimality, ε-optimality at $p$, and ε-optimality of policies are analogous to those given in Definition 8.3. If $N < \infty$ and (F+) or (F-) holds, then by Lemma 7.11(b)
If $N = \infty$ and (P), (N), or (D) holds, then
To aid in the analysis of (ISI), we introduce the idea of a statistic sufficient for control. This statistic is defined in such a way that knowledge of its values is sufficient to control the model.
Definition 10.6 A statistic for the model (ISI) is a sequence (η_0, ..., η_{N−1}) of Borel-measurable functions η_k: P(S)I_k → Y_k, where Y_k is a nonempty Borel space, k = 0, ..., N−1. The statistic (η_0, ..., η_{N−1}) is sufficient for control provided:
(a) For each k, there exists an analytic set Γ̂_k ⊂ Y_kC such that proj_{Y_k}(Γ̂_k) = Y_k and for every p ∈ P(S)
$$\Gamma_k = \{(i_k,u_k) \in I_kC \mid [\eta_k(p;i_k),u_k] \in \hat\Gamma_k\}, \tag{20}$$
where Γ_k is defined by (15). We define Û_k(y_k) to be the y_k-section of Γ̂_k:
$$\hat U_k(y_k) = (\hat\Gamma_k)_{y_k}. \tag{21}$$
(b) There exist Borel-measurable stochastic kernels t̂_k(dy_{k+1} | y_k, u_k) on Y_{k+1} given Y_kC such that for every p ∈ P(S), π ∈ Π, Y̲_{k+1} ∈ ℬ_{Y_{k+1}}, k = 0, ...,
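N−2, we have
$$P_{k+1}(\pi,p)\big[\eta_{k+1}(p;i_{k+1}) \in \underline Y_{k+1} \,\big|\, i_k, u_k\big] = \hat t_k\big(\underline Y_{k+1} \mid \eta_k(p;i_k), u_k\big) \tag{22}$$
for P_k(π, p) almost every (i_k, u_k).†
(c) There exist lower semianalytic functions ĝ_k: Γ̂_k → [−∞, ∞] satisfying, for every p ∈ P(S), π ∈ Π, k = 0, ..., N−1,
$$E\{g(x_k,u_k) \mid i_k, u_k\} = \hat g_k[\eta_k(p;i_k), u_k] \tag{23}$$
for P_k(π, p) almost every (i_k, u_k), where the expectation is with respect to P_k(π, p).
Condition (a) of Definition 10.6 guarantees that the control constraint set U_k(i_k) can be recovered from η_k(p; i_k). Indeed, from (15), (20), and (21), we have for any p ∈ P(S), i_k ∈ I_k, k = 0, ..., N−1,
$$U_k(i_k) = \hat U_k[\eta_k(p;i_k)].$$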
If U_k(i_k) = C for every i_k ∈ I_k, k = 0, ..., N−1, then condition (a) is satisfied with Γ̂_k = Y_kC. This is the case of no control constraint. Condition (b) guarantees that the distribution of y_{k+1} depends only on the values of y_k and u_k. This is necessary in order for the variables y_k to form the states of a stochastic optimal control model of the type considered in Section 10.1. Condition (c) guarantees that the cost corresponding to a policy can be computed from the distributions induced on the (y_k, u_k) pairs. We temporarily postpone discussion of the existence and the nature of particular statistics sufficient for control, and consider first a perfect state information model corresponding to model (ISI) and a given sufficient statistic.
Definition 10.7 Let the model (ISI) and a statistic sufficient for control (η_0, ..., η_{N−1}) be given. The perfect state information stochastic optimal control model, denoted by (PSI), consists of the following (we use the notation of Definitions 10.3 and 10.6):
Y_k, k = 0, ..., N−1: State spaces.
C: Control space.
Γ̂_k, k = 0, ..., N−1: Control constraints.
ĝ_k, k = 0, ..., N−1: One-stage cost functions.
t̂_k, k = 0, ..., N−2: State transition kernels.
α: Discount factor.
N: Horizon.
† In this context "for P_k(π, p) almost every (i_k, u_k)" means that the set of history vectors (x_0, z_0, u_0, ..., x_k, z_k, u_k) ∈ H_k for which (22) holds with i_k = (z_0, u_0, ..., u_{k−1}, z_k) and the given u_k has P_k(π, p)-measure one.
Thus defined, (PSI) is a nonstationary stochastic optimal control model in the sense of Definition 10.1.† The definitions of policies and cost functions for (PSI) are given in Section 10.1. We will use (^) to denote these objects in (PSI). For example, Π̂′ is the set of all (0-originating) policies and Π̂ is the set of all Markov (0-originating) policies for (PSI). If π̂ = (μ̂_0, ..., μ̂_{N−1}) is a policy for (PSI), then by (24) and Proposition 7.44 the sequence
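$$\hat\mu_k\big(du_k \mid \eta_0(p;i_0), u_0, \ldots, u_{k-1}, \eta_k(p;i_k)\big), \qquad k = 0, \ldots, N-1,$$
where
$$i_k = (z_0, u_0, \ldots, u_{k-1}, z_k), \qquad k = 0, \ldots, N-1, \tag{25}$$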
is a policy for (ISI). We call this policy π̂ also, and can regard Π̂′ as a subset of Π in this sense. If π̂ is a nonrandomized policy for (PSI), then it is also nonrandomized when considered as a policy for (ISI). We will see in Proposition 10.2 that π̂ results in the same cost for both (PSI) and (ISI). Define φ: P(S) → P(Y_0) by
$$\varphi(p)(\underline Y_0) = \int_S s_0\big(\{z_0 \in Z \mid \eta_0(p;z_0) \in \underline Y_0\} \mid x_0\big)\, p(dx_0), \qquad \underline Y_0 \in \mathcal B_{Y_0}. \tag{26}$$
Thus defined, φ(p) is the distribution of the initial state y_0 in (PSI) when the initial state x_0 in (ISI) has distribution p. By Corollary 7.26.1, for every Y̲_0 ∈ ℬ_{Y_0}, the mapping
$$(x_0, p) \mapsto s_0\big(\{z_0 \in Z \mid \eta_0(p;z_0) \in \underline Y_0\} \mid x_0\big)$$
is Borel-measurable. Define a Borel-measurable stochastic kernel on S given P(S) by q(dx_0 | p) = p(dx_0). Then (26) can be written as
$$\varphi(p)(\underline Y_0) = \int_S s_0\big(\{z_0 \in Z \mid \eta_0(p;z_0) \in \underline Y_0\} \mid x_0\big)\, q(dx_0 \mid p).$$
It follows from Propositions 7.26 and 7.29 that φ is Borel-measurable. For p ∈ P(S), define the mapping L_p: H_k → Y_0C_0⋯Y_kC_k by
$$L_p(x_0, z_0, u_0, \ldots, x_k, z_k, u_k) = \big(\eta_0(p;i_0), u_0, \ldots, \eta_k(p;i_k), u_k\big),$$
where (25) holds. For q ∈ P(Y_0) and π̂ = (μ̂_0, ..., μ̂_{N−1}) ∈ Π̂′, there is a sequence of consistent probability measures P̂_k(π̂, q) generated on Y_0C_0⋯Y_kC_k, k = 0, ..., N−1, defined on measurable rectangles by
† The disturbance spaces, disturbance kernels, and system functions in (PSI) can be taken to be W_k = Y_{k+1}, p_k(dw_k | y_k, u_k) = t̂_k(dw_k | y_k, u_k), and f_k(y_k, u_k, w_k) = w_k, respectively.
For a Markov policy π̂ ∈ Π̂, these objects are related to the probability measures P_k(π̂, p) defined by (16) in the following manner.
Lemma 10.1 Suppose p ∈ P(S) and π̂ ∈ Π̂. Then for k = 0, ..., N−1 and for every Borel set B ⊂ Y_0C_0⋯Y_kC_k, we have
$$P_k(\hat\pi, p)[L_p^{-1}(B)] = \hat P_k[\hat\pi, \varphi(p)](B). \tag{29}$$
Proof It suffices to prove that if Y̲_0 ∈ ℬ_{Y_0}, C̲_0 ∈ ℬ_C, ..., Y̲_k ∈ ℬ_{Y_k}, C̲_k ∈ ℬ_C, then
$$P_k(\hat\pi,p)(\{\eta_0(p;i_0)\in\underline Y_0,\ u_0\in\underline C_0,\ \ldots,\ \eta_k(p;i_k)\in\underline Y_k,\ u_k\in\underline C_k\}) = \hat P_k[\hat\pi,\varphi(p)](\underline Y_0\underline C_0\cdots\underline Y_k\underline C_k).^{\dagger} \tag{30}$$
For k = 0, (30) follows from (16), (26), and (28). If (30) holds for some k < N−1, then using (16), (22), (28), and (29), we obtain
As noted earlier, (PSI) is a model of the type considered in Section 10.1. The (F+) and (F−) conditions of Section 10.1, when specialized to the (PSI) model, will be denoted by (F̂+) and (F̂−), respectively. These conditions are not to be confused with the (F+) and (F−) conditions for the (ISI) model given in this section. In a particular problem it is often possible to see the relationship between these finiteness conditions on the two models. In the general case, the relationship is unclear. We point out, however, that if g is bounded below or above, then (F+) or (F−), respectively, is satisfied for (ISI), and given any statistic sufficient for control, the corresponding ĝ_k can be chosen so as to be bounded below or above, respectively. If a particular result holds when we assume (F+) on the (ISI) model and (F̂+) on the (PSI) model, the notation (F+, F̂+) will appear. The notation (F−, F̂−) has a similar meaning. In this context, we define
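† Here {η_0(p; i_0) ∈ Y̲_0, u_0 ∈ C̲_0, ..., η_k(p; i_k) ∈ Y̲_k, u_k ∈ C̲_k} denotes the set of history vectors (x_0, z_0, u_0, ..., x_k, z_k, u_k) ∈ H_k satisfying the indicated conditions, where i_j = (z_0, u_0, ..., u_{j−1}, z_j). We will often use this notation to indicate a set which depends on functions of some or all of the components of a Cartesian product.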
If N = ∞, we consider conditions (P), (N), and (D) for (ISI) and the corresponding conditions (P̂), (N̂), and (D̂) for (PSI). In this case, however, if (P) holds for (ISI) and lower semianalytic functions ĝ_k: Γ̂_k → [−∞, ∞] satisfying (23) exist, there is no loss of generality in assuming that ĝ_k ≥ 0 for every k, i.e., (P̂) holds for (PSI). Likewise, if (N) or (D) holds for (ISI), we may assume without loss of generality that (N̂) or (D̂), respectively, holds for (PSI). As in the finite horizon case, we adopt the notation (P, P̂), (N, N̂), and (D, D̂) to indicate which assumptions are sufficient for a result to hold. From Section 10.1, we have that when (F̂+), (F̂−), (P̂), (N̂), or (D̂) holds, the (0-originating) cost corresponding to a policy π̂ for (PSI) at y ∈ Y_0 is
where N may be infinite. The (0-originating) optimal cost for (PSI) at y ∈ Y_0 is
$$\hat J_N^*(y) = \inf_{\hat\pi\in\hat\Pi'} \hat J_{N,\hat\pi}(y). \tag{32}$$
The remainder of this section is devoted to establishing relations between costs, optimal costs, and optimal and nearly optimal policies for the (ISI) and (PSI) models.
Proposition 10.2 (F+, F̂+)(F−, F̂−)(P, P̂)(N, N̂)(D, D̂) For every p ∈ P(S) and π̂ ∈ Π̂, we have
$$J_{N,\hat\pi}(p) = \int_{Y_0} \hat J_{N,\hat\pi}(y_0)\, \varphi(p)(dy_0). \tag{33}$$
Proof From (31), (28), (23), (18), (19), and Lemma 10.1, we have
where the (F+) or (F−) assumption is used to interchange integration and summation when N < ∞, and the monotone or bounded convergence theorem is used when N = ∞. Q.E.D.
Corollary 10.2.1 (F+, F̂+)(F−, F̂−)(P, P̂)(N, N̂)(D, D̂) For every p ∈ P(S), we have
$$J_N^*(p) \le \int_{Y_0} \hat J_N^*(y_0)\, \varphi(p)(dy_0). \tag{34}$$
Proof The function Ĵ*_N(y_0) is lower semianalytic, so the integral in (34) is defined. From Proposition 10.2, we have
so it suffices to show that
This follows from Lemma 8.6 and Corollary 9.5.2.
Q.E.D.
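Proposition 10.2 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: S, C, and Z are finite (so every integral becomes a finite sum), the statistic is the conditional state distribution later constructed in Section 10.3, and the model numbers and the belief-threshold policy `mu_hat` are hypothetical, not taken from the text. It computes the two-stage cost of the same Markov policy twice: once in (ISI), by summing over histories (x_0, z_0, x_1, z_1), and once in (PSI), using beliefs only and weighting each initial belief by the mass φ(p) assigns to it.

```python
from itertools import product

# Hypothetical finite model (not from the text): S = C = Z = {0, 1}.
P0 = [0.3, 0.7]                                            # initial distribution p
S0 = [[0.7, 0.3], [0.4, 0.6]]                              # S0[x][z] = s0(z | x)
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.6, 0.4]]]   # T[x][u][x2] = t(x2 | x, u)
S = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]   # S[u][x2][z] = s(z | u, x2)
G = [[1.0, 2.0], [3.0, 0.5]]                               # G[x][u] = g(x, u)
ALPHA = 0.9


def update(prior, likelihood):
    """Bayes step: posterior proportional to likelihood * prior."""
    weights = [lk * q for lk, q in zip(likelihood, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def predict(y, u):
    """One-stage prediction of the next-state distribution."""
    return [sum(T[x][u][x2] * y[x] for x in range(2)) for x2 in range(2)]


def mu_hat(y):
    """Hypothetical Markov policy for (PSI): the control depends on the belief only."""
    return 0 if y[0] >= 0.5 else 1


def cost_isi():
    """Two-stage cost in (ISI): expectation over histories (x0, z0, x1, z1)."""
    total = 0.0
    for x0, z0, x1, z1 in product(range(2), repeat=4):
        y0 = update(P0, [S0[x][z0] for x in range(2)])
        u0 = mu_hat(y0)
        y1 = update(predict(y0, u0), [S[u0][x][z1] for x in range(2)])
        u1 = mu_hat(y1)
        prob = P0[x0] * S0[x0][z0] * T[x0][u0][x1] * S[u0][x1][z1]
        total += prob * (G[x0][u0] + ALPHA * G[x1][u1])
    return total


def cost_psi():
    """Same cost computed in (PSI): integrate over phi(p) using beliefs only."""
    def g_hat(y, u):
        return sum(G[x][u] * y[x] for x in range(2))

    total = 0.0
    for z0 in range(2):
        prob0 = sum(P0[x] * S0[x][z0] for x in range(2))  # mass phi(p) gives to y0
        y0 = update(P0, [S0[x][z0] for x in range(2)])
        u0 = mu_hat(y0)
        prior = predict(y0, u0)
        future = 0.0
        for z1 in range(2):
            prob1 = sum(S[u0][x][z1] * prior[x] for x in range(2))
            y1 = update(prior, [S[u0][x][z1] for x in range(2)])
            future += prob1 * g_hat(y1, mu_hat(y1))
        total += prob0 * (g_hat(y0, u0) + ALPHA * future)
    return total
```

The two functions should return the same number up to floating-point rounding, which is what (33) asserts for a Markov policy of (PSI) in this finite setting.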
We wish now to establish a relationship similar to (33) between the optimal cost functions for (ISI) and (PSI). In light of Corollary 10.2.1, it suffices to show that given any policy for (ISI), a policy for (PSI) can be found which does at least as well. This is formalized in the next lemma, and the analog of (33) is given as part of Proposition 10.3.
Lemma 10.2 (F+, F̂+)(F−, F̂−)(P, P̂)(N, N̂)(D, D̂) Given p ∈ P(S) and π ∈ Π, there exists π̂ ∈ Π̂ such that
Proof Let p ∈ P(S) and π = (μ_0, ..., μ_{N−1}) ∈ Π be given. For k = 0, ..., N−1, let Q_k(π, p) be the probability measure on Y_kC defined on measurable rectangles to be
$$Q_k(\pi,p)(\underline Y_k\underline C_k) = P_k(\pi,p)(\{\eta_k(p;i_k)\in\underline Y_k,\ u_k\in\underline C_k\}). \tag{37}$$
There exists a Borel-measurable stochastic kernel μ̂_k(du_k | y_k) on C given Y_k such that for every Borel set B ⊂ Y_kC, we have
In particular, 1 = P_k(π, p)({(i_k, u_k) ∈ Γ_k}) = Q_k(π, p)(Γ̂_k), so, altering μ̂_k(du_k | y_k) on a set of measure zero if necessary, we may assume that (38) holds and μ̂_k(Û_k(y_k) | y_k) = 1 for every y_k ∈ Y_k. Let π̂ = (μ̂_0, ..., μ̂_{N−1}). Then π̂ is a Markov policy for (PSI). We show by induction that for Y̲_k ∈ ℬ_{Y_k}, C̲_k ∈ ℬ_C, k = 0, ..., N−1,
We see from (26) and (37) that the marginal of Q_0(π, p) on Y_0 is φ(p). Equation (39) for k = 0 follows from (28) and (38). Assume that (39) holds for k. From (38), (37), (22), and the induction hypothesis, we have
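$$\cdots = \int_{Y_0C_0\cdots Y_kC_k} \int_{\underline Y_{k+1}} \hat\mu_{k+1}(\underline C_{k+1}\mid y_{k+1})\, \hat t_k(dy_{k+1}\mid y_k,u_k)\, d\hat P_k[\hat\pi,\varphi(p)] = \hat P_{k+1}[\hat\pi,\varphi(p)](\{y_{k+1}\in\underline Y_{k+1},\ u_{k+1}\in\underline C_{k+1}\}).$$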
Taken together, (37) and (39) imply that for Y̲_k ∈ ℬ_{Y_k}, C̲_k ∈ ℬ_C, k = 0, ..., N−1, we have
If (40) is used in place of Lemma 10.1, the proof of Proposition 10.2 can now be used to prove (36). Q.E.D.
Definition 10.8 Given q ∈ P(Y_0) and ε > 0, a policy π̂ ∈ Π̂′ is said to be weakly q-ε-optimal if
The policy π̂ is said to be q-optimal if q({y_0 ∈ Y_0 | Ĵ_{N,π̂}(y_0) = Ĵ*_N(y_0)}) = 1. Equation (35) shows that given any p ∈ P(S) and ε > 0, a weakly φ(p)-ε-optimal Markov policy exists. The next proposition shows that such a policy is ε-optimal at p when considered as a policy in (ISI).
Proposition 10.3 (F+, F̂+)(F−, F̂−)(P, P̂)(N, N̂)(D, D̂) We have
$$J_N^*(p) = \int_{Y_0} \hat J_N^*(y_0)\, \varphi(p)(dy_0) \qquad \forall p \in P(S). \tag{41}$$
Furthermore, if π̂ is optimal, φ(p)-optimal, or weakly φ(p)-ε-optimal for (PSI), then it is optimal, optimal at p, or ε-optimal at p, respectively, for (ISI). If π̂ is ε-optimal for (PSI) and (F+, F̂+), (P, P̂), or (D, D̂) holds, then it is also ε-optimal for (ISI).
Proof Equation (41) follows from Corollary 10.2.1 and Lemma 10.2. Let π̂ be ε-optimal for (PSI). It is clear that under (P, P̂) and (D, D̂), we have
$$\hat J_N^*(y_0) > -\infty \qquad \forall y_0 \in Y_0, \tag{42}$$
and hence
$$\hat J_{N,\hat\pi}(y_0) \le \hat J_N^*(y_0) + \varepsilon \qquad \forall y_0 \in Y_0. \tag{43}$$
Under (F+, F̂+), (42) follows from Lemma 8.3 and Proposition 8.2, so again (43) holds. We have from (41) and Proposition 10.2 that
so π̂ is ε-optimal for (ISI). The remainder of the proposition follows from (41) and Proposition 10.2. Q.E.D.
We shall show shortly that a statistic sufficient for control always exists, and indeed, in many cases it can be chosen so that (PSI) is stationary. The existence of such a statistic for (ISI) and the consequent existence of the corresponding model (PSI) enable us to utilize the results of Chapters 8 and 9. For example, we have the following proposition.
Proposition 10.4 (F+, F̂+)(F−, F̂−)(P, P̂)(N, N̂)(D, D̂) If (η_0, ..., η_{N−1}) is a statistic sufficient for control for (ISI), then for every ε > 0, there exists an ε-optimal nonrandomized policy for (ISI) which depends on i_k = (z_0, u_0, ..., u_{k−1}, z_k) only through η_k(p; i_k), i.e., has the form
$$\pi = \big(\mu_0[p; \eta_0(p;i_0)], \ldots, \mu_{N-1}[p; \eta_{N-1}(p;i_{N-1})]\big). \tag{44}$$
Under (F+, F̂+), (P, P̂), or (D, D̂), we may choose this ε-optimal policy to have the simpler form
Proof Under (F+, F̂+), (P, P̂), or (D, D̂), there exists an ε-optimal, nonrandomized, Markov policy π̂ = (μ̂_0, ..., μ̂_{N−1}) for (PSI) (Propositions 8.3 and 9.19). This policy π̂ is ε-optimal for (ISI) by Proposition 10.3, and the second part of the proposition is proved. Assume (F−, F̂−) holds and let {ε_n} be a sequence of positive numbers with Σ_{n=1}^∞ ε_n < ∞ and ε_n ↓ 0. Let π̂_n = (μ̂_0^n, ..., μ̂_{N−1}^n) be a sequence of nonrandomized Markov policies for (PSI) exhibiting {ε_n}-dominated convergence to optimality (Proposition 8.4). By Proposition 10.2 and the (F−, F̂−)
assumption, we have
Since
we have
Let ε > 0 be given and let n(p) be the smallest positive integer n for which
Define μ_k(p; y_k) = μ̂_k^{n(p)}(y_k), k = 0, ..., N−1. Then by Propositions 10.2 and 10.3, π given by (44) is an ε-optimal nonrandomized policy for (ISI). Assume (N, N̂) holds. Consider the nonstationary stochastic optimal control model (NPSI) for which the initial state space is P(Y_0), the initial control space is a singleton set {ū}, the initial cost function is zero at (q, ū) for every q ∈ P(Y_0), and the initial transition kernel assigns to (q, ū) the distribution q(dy_0) for every q ∈ P(Y_0). For k ≥ 0, the (k+1)st state and control spaces, control constraint, cost function, and transition kernel are Y_k, C, Û_k, ĝ_k, and t̂_k(dy_{k+1} | y_k, u_k) of (PSI), respectively. The discount factor is α and the horizon is infinite. By definition, the optimal cost for (NPSI) at q ∈ P(Y_0) is
which, by Corollaries 9.1.1 and 9.5.2, is the same as ∫_{Y_0} Ĵ*(y_0) q(dy_0).
Now (NPSI) has a nonpositive one-stage cost function, so, by Proposition 9.20, for each ε > 0 there exists an ε-optimal, nonrandomized, semi-Markov policy π̄ = (μ̄(q), μ̄_0(q; y_0), μ̄_1(q; y_1), ...). For fixed q ∈ P(Y_0), let π̂(q) be the policy for (PSI) given by
Then
i.e., π̂(q) is weakly q-ε-optimal for (PSI). By Proposition 10.3, the policy π defined by (44), where μ_k(p; y_k) = μ̄_k(φ(p); y_k), is ε-optimal for (ISI). Q.E.D.
The other specific results which can be derived for (ISI) from Chapters 8 and 9 are obvious and shall not be exhaustively listed. We content ourselves with describing the dynamic programming algorithm over a finite horizon. By Proposition 8.2, the dynamic programming algorithm has the following form under (F+, F̂+) or (F−, F̂−), where we assume for notational simplicity that (PSI) is stationary:
If the infimum in (47) is achieved for every y and k = 0, ..., N−1, then there exist universally measurable functions μ̂_k: Y → C such that for every y and k = 0, ..., N−1, μ̂_k(y) ∈ Û(y) and μ̂_k(y) achieves the infimum in (47). Then π̂ = (μ̂_0, ..., μ̂_{N−1}) is optimal in (PSI) (Proposition 8.5), so it is optimal in (ISI) as well (Proposition 10.3). If (F+, F̂+) holds and the infimum in (47) is not achieved for every y and k = 0, ..., N−1, the dynamic programming algorithm (46) and (47) can still be used in the manner of Proposition 8.3 to construct an ε-optimal, nonrandomized, Markov policy π̂ for (PSI). We see from Proposition 10.3 that π̂ is an ε-optimal policy for (ISI) as well. In many cases, η_{k+1}(p; i_{k+1}) is a function of η_k(p; i_k), u_k, and z_{k+1}. The computational procedure in such a case is to first construct (μ̂_0, ..., μ̂_{N−1}) via (46) and (47), then compute y_0 = η_0(p; i_0) from the initial distribution and the initial observation, and apply control u_0 = μ̂_0(y_0). Given y_k, u_k, and z_{k+1}, compute y_{k+1} and apply control u_{k+1} = μ̂_{k+1}(y_{k+1}), k = 0, ..., N−2. In this way the information contained in (p; i_k) has been condensed into y_k. This condensation of information is the historical motivation for statistics sufficient for control.
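The computational procedure just described can be sketched as a loop in which the controller never stores the information vector i_k, only the current value y_k. This is a minimal sketch under assumptions that are ours, not the text's: S, C, and Z are finite; the statistic is the conditional state distribution constructed in Section 10.3, updated by Bayes' rule after a one-stage prediction; observations are replayed from a list rather than generated by the system; and `mu_hat(k, y)` is a hypothetical stand-in for any Markov policy for (PSI), such as one produced by (46)-(47). The array layouts `t[x][u][x2]`, `s[u][x2][z]`, and `s0[x][z]` are likewise our own choice.

```python
def bayes_step(prior, likelihood):
    """Posterior proportional to likelihood(x) * prior(x)."""
    weights = [lk * q for lk, q in zip(likelihood, prior)]
    total = sum(weights)
    if total == 0.0:
        # Zero-probability observation: any choice is allowed; keep the prior.
        return list(prior)
    return [w / total for w in weights]


def run_controller(p, observations, mu_hat, t, s, s0):
    """Replay z_0, ..., z_{N-1}; return the (y_k, u_k) pairs the controller used.

    Only the current belief y is carried from stage to stage.
    """
    n = len(p)
    y = bayes_step(p, [s0[x][observations[0]] for x in range(n)])  # y_0 from p and z_0
    trace = []
    for k, z_next in enumerate(list(observations[1:]) + [None]):
        u = mu_hat(k, y)  # u_k depends on (p; i_k) only through y_k
        trace.append((y, u))
        if z_next is None:
            break
        prior = [sum(t[x][u][x2] * y[x] for x in range(n)) for x2 in range(n)]
        y = bayes_step(prior, [s[u][x2][z_next] for x2 in range(n)])  # y_{k+1} from y_k, u_k, z_{k+1}
    return trace
```

For example, with two states, a single control, `t = [[[0.9, 0.1]], [[0.2, 0.8]]]`, `s = [[[0.8, 0.2], [0.3, 0.7]]]`, `s0 = s[0]`, `p = [0.5, 0.5]`, and observations `[0, 1]`, the controller sees the beliefs (8/11, 3/11) and then approximately (0.411, 0.589).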
10.3 Existence of Statistics Sufficient for Control
Turning to the question of the existence of a statistic sufficient for control, it is not surprising to discover that the sequence of identity mappings on P(S)I_k, k = 0, ..., N−1, is such an object (Proposition 10.6). Although this
represents no condensation of information, it is sufficient to justify our analysis thus far. We will show that if the constraint sets Γ_k are equal to I_kC, k = 0, ..., N−1, then the functions mapping P(S)I_k into the distribution of x_k conditioned on (p; i_k), k = 0, ..., N−1, constitute a statistic sufficient for control (Proposition 10.5). This statistic has the property that its value at the (k+1)st stage is a function of its value at the kth stage, u_k, and z_{k+1} [see (52)], so it represents a genuine condensation of information. It also results in a stationary perfect state information model and, if the conditional distributions can be characterized by a finite set of parameters, it may result in significant computational simplification. This is the case, for example, if it is possible to show beforehand that all these distributions are Gaussian.
10.3.1 Filtering and the Conditional Distributions of the States
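We discuss filtering with the aid of the following basic lemma.
Lemma 10.3 Consider the (ISI) model. There exist Borel-measurable stochastic kernels r_0(dx_0 | p; z_0) on S given P(S)Z and r(dx | p; u, z) on S given P(S)CZ which satisfy
$$\int_{\underline S_0} s_0(\underline Z_0 \mid x_0)\, p(dx_0) = \int_S \int_{\underline Z_0} r_0(\underline S_0 \mid p; z_0)\, s_0(dz_0 \mid x_0)\, p(dx_0), \tag{48}$$
$$\int_{\underline S} s(\underline Z \mid u, x)\, p(dx) = \int_S \int_{\underline Z} r(\underline S \mid p; u, z)\, s(dz \mid u, x)\, p(dx), \tag{49}$$
for every p ∈ P(S), u ∈ C, S̲_0, S̲ ∈ ℬ_S, and Z̲_0, Z̲ ∈ ℬ_Z.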
Proof For fixed (p, u) ∈ P(S)C, define a probability measure q on SZ by specifying its values on measurable rectangles to be (Proposition 7.28)
$$q(\underline S\,\underline Z \mid p; u) = \int_{\underline S} s(\underline Z \mid u, x)\, p(dx).$$
By Propositions 7.26 and 7.29, q(d(x, z) | p; u) is a Borel-measurable stochastic kernel on SZ given P(S)C. By Corollary 7.27.1, this stochastic kernel can be decomposed into its marginal on Z given P(S)C and a Borel-measurable stochastic kernel r(dx | p; u, z) on S given P(S)CZ such that (49) holds. The existence of r_0(dx_0 | p; z_0) is proved in a similar manner. Q.E.D.
It is customary to call p, the given distribution of x_0, the a priori distribution of the initial state. After z_0 is observed, the distribution is "updated," i.e., the distribution of x_0 conditioned on z_0 is computed. The updated distribution is called the a posteriori distribution and, as we will show in Lemma 10.4, is just r_0(dx_0 | p; z_0). At the kth stage, k ≥ 1, we will have some a priori distribution p′_k of x_k based on i_{k−1} = (z_0, u_0, ..., u_{k−2}, z_{k−1}). Control u_{k−1} is applied, some z_k is observed, and an a posteriori distribution of x_k conditioned on (i_{k−1}, u_{k−1}, z_k) is computed. We will show that this distribution is just r(dx | p′_k; u_{k−1}, z_k). The process of passing from an a priori to an a posteriori distribution in this manner is called filtering, and it is formalized next. Consider the function t̄: P(S)C → P(S) defined by
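$$\bar t(p,u)(\underline S) = \int_S t(\underline S \mid x, u)\, p(dx), \qquad \underline S \in \mathcal B_S. \tag{50}$$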
Equation (50) is called the one-stage prediction equation. If x_k has an a posteriori distribution p_k and the control u_k is chosen, then the a priori distribution of x_{k+1} is t̄(p_k, u_k). The mapping t̄ is Borel-measurable (Propositions 7.26 and 7.29). Given information vectors i_k ∈ I_k, k = 0, ..., N−1, such that i_{k+1} = (i_k, u_k, z_{k+1}), k = 0, ..., N−2, and given p ∈ P(S), define recursively
$$p_0(p; i_0) = r_0(dx_0 \mid p; z_0), \tag{51}$$
$$p_{k+1}(p; i_{k+1}) = r\big(dx \mid \bar t[p_k(p;i_k), u_k];\, u_k, z_{k+1}\big), \qquad k = 0, \ldots, N-2. \tag{52}$$
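When S, C, and Z are finite, every kernel in (48)-(52) is a table, and the filtering recursion reduces to a one-stage prediction (50) followed by Bayes' rule. The sketch below assumes that finite setting and a hypothetical array layout of our own (`t[x][u][x2]` for t(x2 | x, u), `s[u][x2][z]` for s(z | u, x2), `s0[x][z]` for s_0(z | x)); it returns p_0(p; i_0), ..., p_k(p; i_k) for an information vector i_k = (z_0, u_0, ..., u_{k−1}, z_k) given as a flat tuple.

```python
def predict(p, u, t):
    """One-stage prediction (50): bar_t(p, u)(x2) = sum_x t(x2 | x, u) p(x)."""
    n = len(p)
    return [sum(t[x][u][x2] * p[x] for x in range(n)) for x2 in range(n)]


def bayes_update(prior, likelihood):
    """Posterior proportional to likelihood(x) * prior(x).

    For an observation of prior probability zero Bayes' rule says nothing;
    returning the prior is an arbitrary choice, harmless because such
    observations occur with probability zero.
    """
    weights = [lk * q for lk, q in zip(likelihood, prior)]
    total = sum(weights)
    if total == 0.0:
        return list(prior)
    return [w / total for w in weights]


def filter_beliefs(p, info, t, s, s0):
    """Return [p_0(p; i_0), ..., p_k(p; i_k)] for info = (z0, u0, z1, ..., u_{k-1}, z_k)."""
    n = len(p)
    beliefs = [bayes_update(p, [s0[x][info[0]] for x in range(n)])]           # (51)
    for j in range(1, len(info), 2):
        u, z = info[j], info[j + 1]
        prior = predict(beliefs[-1], u, t)                                     # (50)
        beliefs.append(bayes_update(prior, [s[u][x][z] for x in range(n)]))  # (52)
    return beliefs
```

For instance, with two states, one control, `t = [[[0.9, 0.1]], [[0.2, 0.8]]]`, `s = [[[0.8, 0.2], [0.3, 0.7]]]`, `s0 = s[0]`, and `p = [0.5, 0.5]`, the information vector (0, 0, 1) gives p_0 = (8/11, 3/11) and p_1 ≈ (0.411, 0.589).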
Note that for each k, p_k: P(S)I_k → P(S) is Borel-measurable. Equations (48)-(52) are called the filtering equations corresponding to the (ISI) model. For a given initial distribution and policy, they generate the conditional distribution of the state given the current information, as the following lemma shows.
Lemma 10.4 Let the model (ISI) be given. For any p ∈ P(S), π = (μ_0, ..., μ_{N−1}) ∈ Π, and S̲_k ∈ ℬ_S, we have
$$P_k(\pi,p)[x_k \in \underline S_k \mid i_k] = p_k(p;i_k)(\underline S_k) \tag{53}$$
for P_k(π, p) almost every i_k, k = 0, ..., N−1.
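Proof† We proceed by induction. For any S̲_0 ∈ ℬ_S and Z̲_0 ∈ ℬ_Z, we have from (51), (16), and (48) that
$$\int_{\{z_0\in\underline Z_0\}} p_0(p;i_0)(\underline S_0)\, dP_0(\pi,p) = \int_S \int_{\underline Z_0} r_0(\underline S_0 \mid p; z_0)\, s_0(dz_0 \mid x_0)\, p(dx_0) = P_0(\pi,p)(\{x_0\in\underline S_0,\ z_0\in\underline Z_0\}). \tag{54}$$
Equation (53) for k = 0 follows from (54) and the definition of conditional probability.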
† In this and subsequent proofs, the reader may find the discussion concerning conditional expectations and probabilities at the beginning of Section 10.2 helpful.
Assume now that (53) holds for k. For any I̲_k ∈ ℬ_{I_k}, C̲_k ∈ ℬ_C, Z̲_{k+1} ∈ ℬ_Z, and S̲_{k+1} ∈ ℬ_S, we have from (16), the induction hypothesis, Fubini's theorem, (50), (52), and (49) that
$$\begin{aligned}
\int_{\{i_k\in\underline I_k,\,u_k\in\underline C_k,\,z_{k+1}\in\underline Z_{k+1}\}} p_{k+1}(p; i_k, u_k, z_{k+1})(\underline S_{k+1})\, dP_{k+1}(\pi,p)
&= \int_{\{i_k\in\underline I_k,\,u_k\in\underline C_k\}} \int_{\underline S_{k+1}} s(\underline Z_{k+1}\mid u_k, x_{k+1})\, t(dx_{k+1}\mid x_k,u_k)\, dP_k(\pi,p)\\
&= P_{k+1}(\pi,p)(\{i_k\in\underline I_k,\ u_k\in\underline C_k,\ x_{k+1}\in\underline S_{k+1},\ z_{k+1}\in\underline Z_{k+1}\}).
\end{aligned} \tag{55}$$
It follows from (55) and the definition of conditional probability that
$$P_{k+1}(\pi,p)[x_{k+1}\in\underline S_{k+1} \mid i_{k+1}] = p_{k+1}(p;i_{k+1})(\underline S_{k+1})$$
for P_{k+1}(π, p) almost every i_{k+1}, and the induction step is completed. Q.E.D.
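Lemma 10.4 can be checked by brute force in a finite model for k = 1: compute the conditional law of x_1 given i_1 = (z_0, u_0, z_1) directly from the joint law of (x_0, z_0, u_0, x_1, z_1) induced by (16), and compare it with the output of (51), (50), and (52). The sketch below is an illustration, not part of the proof; it assumes finite spaces, a hypothetical two-state model, and a hypothetical nonrandomized first-stage policy u_0 = z_0, none of which come from the text.

```python
from itertools import product

# Hypothetical finite model (not from the text): S = C = Z = {0, 1}.
P0 = [0.3, 0.7]                                            # p(x0)
S0 = [[0.7, 0.3], [0.4, 0.6]]                              # S0[x][z] = s0(z | x)
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.6, 0.4]]]   # T[x][u][x2] = t(x2 | x, u)
S = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]   # S[u][x2][z] = s(z | u, x2)


def policy0(z0):
    """Hypothetical nonrandomized first-stage policy: u0 = z0."""
    return z0


def posterior_by_enumeration(z0, z1):
    """P(x1 = . | z0, u0, z1) from the joint law of (x0, z0, u0, x1, z1)."""
    u0 = policy0(z0)
    joint = [0.0, 0.0]
    for x0, x1 in product(range(2), range(2)):
        joint[x1] += P0[x0] * S0[x0][z0] * T[x0][u0][x1] * S[u0][x1][z1]
    total = sum(joint)
    return [v / total for v in joint]


def posterior_by_filter(z0, z1):
    """p_1(p; z0, u0, z1) computed with (51), (50), and (52)."""
    w = [S0[x][z0] * P0[x] for x in range(2)]
    b0 = [v / sum(w) for v in w]                                              # (51)
    u0 = policy0(z0)
    prior = [sum(T[x][u0][x1] * b0[x] for x in range(2)) for x1 in range(2)]  # (50)
    w = [S[u0][x1][z1] * prior[x1] for x1 in range(2)]
    return [v / sum(w) for v in w]                                            # (52)


def max_discrepancy():
    """Largest gap between the two computations over all (z0, z1)."""
    worst = 0.0
    for z0, z1 in product(range(2), range(2)):
        a = posterior_by_enumeration(z0, z1)
        b = posterior_by_filter(z0, z1)
        worst = max(worst, max(abs(x - y) for x, y in zip(a, b)))
    return worst
```

All observation pairs have positive probability in this model, so both computations are well defined, and `max_discrepancy()` should be at the level of floating-point rounding.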
Proposition 10.5 Consider the (ISI) model and assume that U_k(i_k) = C for every i_k ∈ I_k and k = 0, ..., N−1. Then the sequence [p_0(p; i_0), ..., p_{N−1}(p; i_{N−1})] defined by (51) and (52) is a statistic sufficient for control, and the resulting perfect state information model is stationary.
Proof Let Y_k in Definition 10.6 be P(S), k = 0, ..., N−1. We have already seen that the mappings p_k: P(S)I_k → P(S) are Borel-measurable, so (p_0, ..., p_{N−1}) is a statistic. Condition (a) of Definition 10.6 is satisfied with Γ̂_k = P(S)C, k = 0, ..., N−1. For y ∈ P(S), u ∈ C, and Y̲ ∈ ℬ_{P(S)}, define
$$Z(y,u,\underline Y) = \{z \in Z \mid r[dx \mid \bar t(y,u); u, z] \in \underline Y\}.$$
Note that Z(y, u, Y̲) is the (y, u)-section of the inverse image of Y̲ under a Borel-measurable function. The stochastic kernel
is Borel-measurable by Propositions 7.26 and 7.29, so the stochastic kernel
is Borel-measurable by the same propositions. It follows from Proposition 7.26 and Corollary 7.26.1 that t̂(dy | y, u) is a Borel-measurable stochastic kernel on P(S) given P(S)C. For π ∈ Π, p ∈ P(S), Y̲ ∈ ℬ_{P(S)}, and k = 0, ..., N−2, we have from Lemma 10.4
$$\cdots = \hat t\big(\underline Y \mid p_k(p;i_k), u_k\big)$$
for P_k(π, p) almost every (i_k, u_k), where the expectations are with respect to P_{k+1}(π, p). Thus (22) is satisfied. For π ∈ Π, p ∈ P(S), and k = 0, ..., N−1, we have from Lemma 10.4
for P_k(π, p) almost every (i_k, u_k), where the expectations are with respect to P_k(π, p). The function ĝ: P(S)C → R* defined by
$$\hat g(y,u) = \int_S g(x,u)\, y(dx) \tag{58}$$
is lower semianalytic (Proposition 7.48), and, by (57), ĝ satisfies (23).
Q.E.D.
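In the finite case the kernel (56) is a discrete measure on beliefs: for each observation z it puts mass Σ_x s(z | u, x) t̄(y, u)(x) on the posterior r(dx | t̄(y, u); u, z), and (58) is a finite sum. The sketch below assumes finite S, C, and Z, no control constraint, and a hypothetical array layout of our own choosing (`t[x][u][x2]`, `s[u][x2][z]`, `g[x][u]`). It substitutes these objects into a backward recursion of the form J_0 = 0, J_{j+1}(y) = min_u [ĝ(y, u) + α ∫ J_j dt̂(· | y, u)], which is our reading of (46)-(47) for a stationary (PSI), not a transcription of those displays. The recursion enumerates every control and observation at every stage, so it is only practical for very short horizons.

```python
def g_hat(y, u, g):
    """(58), finite case: expected one-stage cost sum_x g(x, u) y(x)."""
    return sum(g[x][u] * y[x] for x in range(len(y)))


def t_hat(y, u, t, s):
    """(56), finite case: list of (probability, next belief) pairs.

    Pairs with zero probability are dropped; equal beliefs reached from
    different observations are simply listed twice.
    """
    n = len(y)
    prior = [sum(t[x][u][x2] * y[x] for x in range(n)) for x2 in range(n)]  # bar_t(y, u)
    pairs = []
    for z in range(len(s[u][0])):
        weights = [s[u][x2][z] * prior[x2] for x2 in range(n)]
        prob = sum(weights)
        if prob > 0.0:
            pairs.append((prob, [w / prob for w in weights]))
    return pairs


def q_value(y, u, stages, alpha, t, s, g):
    """Cost of using u now and acting optimally for the remaining stages - 1 stages."""
    future = sum(prob * optimal_cost(y2, stages - 1, alpha, t, s, g)
                 for prob, y2 in t_hat(y, u, t, s))
    return g_hat(y, u, g) + alpha * future


def optimal_cost(y, stages, alpha, t, s, g):
    """Backward recursion with J_0 = 0 (exhaustive; exponential in `stages`)."""
    if stages == 0:
        return 0.0
    return min(q_value(y, u, stages, alpha, t, s, g) for u in range(len(g[0])))


def best_control(y, stages, alpha, t, s, g):
    """A control attaining the minimum when `stages` stages remain."""
    return min(range(len(g[0])), key=lambda u: q_value(y, u, stages, alpha, t, s, g))
```

As a sanity check, if the state never moves and observations carry no information, the belief never changes, so with `g = [[1.0, 2.0], [3.0, 0.5]]`, y = (0.5, 0.5), and α = 0.9 the two-stage optimal cost is 1.25(1 + α) = 2.375, attained by always using control 1.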
If the horizon is finite, then the transition kernel t̂ and the one-stage cost function ĝ defined by (56) and (58) can be substituted in the dynamic programming algorithm (46)-(47) to compute the optimal cost function Ĵ*_N for (PSI). The optimal cost function J*_N for (ISI) can then be determined from (41). If the horizon is infinite, in the limit the dynamic programming algorithm (46)-(47) yields Ĵ* under (N̂) and (D̂) and under (P̂) in some cases (Propositions 9.14 and 9.17). The determination of J* from Ĵ* is again accomplished by using (41).
10.3.2 The Identity Mappings
Proposition 10.6 Let the model (ISI) be given. The sequence of identity mappings on P(S)I_k, k = 0, ..., N−1, is a statistic sufficient for control.
Proof Let Y_k in Definition 10.6 be P(S)I_k, k = 0, ..., N−1, and let η_k be the identity mapping on P(S)I_k. Then (η_0, ..., η_{N−1}) is a statistic. Condition (a) of Definition 10.6 is satisfied with Γ̂_k = P(S)Γ_k, k = 0, ..., N−1. If Y̲_{k+1} ∈ ℬ_{P(S)I_{k+1}}, y_k ∈ P(S)I_k, and u_k ∈ C, we adopt the notation
$$(\underline Y_{k+1})(y_k,u_k) = \{z_{k+1} \in Z \mid (p; z_0, u_0, \ldots, u_{k-1}, z_k, u_k, z_{k+1}) \in \underline Y_{k+1}\},$$
where y_k = (p; z_0, u_0, ..., u_{k−1}, z_k). Using this notation, we define for k = 0, ..., N−2 the stochastic kernel t̂_k(dy_{k+1} | y_k, u_k) on P(S)I_{k+1} given P(S)I_kC by
$$\hat t_k(\underline Y_{k+1} \mid y_k, u_k) = \int_S \int_S s[(\underline Y_{k+1})(y_k,u_k) \mid u_k, x_{k+1}]\, t(dx_{k+1} \mid x_k, u_k)\, p_k(y_k)(dx_k), \tag{59}$$
where p_k(y_k) is given by (51) and (52). By an argument similar to that used in Proposition 10.5, it can be shown that t̂_k is Borel-measurable. For p ∈ P(S), π ∈ Π, Y̲_{k+1} ∈ ℬ_{P(S)I_{k+1}}, k = 0, ..., N−2, we have from Lemma 10.4
$$\cdots = \hat t_k\big(\underline Y_{k+1} \mid (p; i_k), u_k\big)$$
for P_k(π, p) almost every (i_k, u_k), so (22) is satisfied.
For k = 0, ..., N−1, define ĝ_k: P(S)I_kC → R* by
$$\hat g_k(y_k, u_k) = \int_S g(x_k, u_k)\, p_k(y_k)(dx_k). \tag{60}$$
By Proposition 7.48, ĝ_k is lower semianalytic for each k. For p ∈ P(S), π ∈ Π, and k = 0, ..., N−1, we have from Lemma 10.4
for P_k(π, p) almost every (i_k, u_k), where the expectation is with respect to P_k(π, p), so (23) is satisfied. Q.E.D.
The transition kernels t̂_k and one-stage cost functions ĝ_k defined by (59) and (60) can be used in the nonstationary version of the dynamic programming algorithm (46)-(47). See the discussion following Proposition 10.5.
Chapter 11
Miscellaneous
11.1 Limit-Measurable Policies
In this section we strengthen the results of Section 7.7 concerning universally measurable functions. In particular, we show that these results are still valid if limit-measurable functions (Definitions B.2 and B.3) are used in place of universally measurable functions. This allows us to replace all the results on the existence of universally measurable policies in Chapters 8 and 9 by stronger results on the existence of limit-measurable policies. We now rework the main results of Section 7.7 with the aid of the concepts and results of Appendix B.
Proposition 11.1 Let X, Y, and Z be Borel spaces, D ∈ ℒ_X, and E ∈ ℒ_Y. Suppose f: D → Y and g: E → Z are limit-measurable and f(D) ⊂ E. Then the composition g∘f is limit-measurable.
Proof This follows from Corollary B.11.1. Q.E.D.
Corollary 11.1.1 Let X and Y be Borel spaces, let f: X → Y be a function, and let q(dy | x) be a stochastic kernel on Y given X such that, for each x, q(dy | x) assigns probability one to the point f(x) ∈ Y. Then q(dy | x) is limit-measurable if and only if f is limit-measurable.
Proof See the proof of Corollary 7.44.3. Q.E.D.
Proposition 11.2 Let X and Y be Borel spaces and let q(dy | x) be a stochastic kernel on Y given X. The following statements are equivalent:
(a) The stochastic kernel q(dy | x) is limit-measurable.
(b) For every B ∈ ℬ_Y, the mapping λ_B: X → R defined by
$$\lambda_B(x) = q(B \mid x) \tag{1}$$
is limit-measurable.
(c) For every Q ∈ ℒ_Y, the mapping λ_Q of (1) is limit-measurable.
Proof We prove (a) ⇒ (c) ⇒ (b) ⇒ (a). Suppose (a) holds and Q ∈ ℒ_Y. Now λ_Q = θ_Q ∘ γ, where γ: X → P(Y) is given by
$$\gamma(x) = q(dy \mid x) \tag{2}$$
and θ_Q: P(Y) → R is given by
$$\theta_Q(p) = p(Q). \tag{3}$$
We have assumed that γ is limit-measurable, and θ_Q is limit-measurable by Proposition B.12. Therefore (c) holds. It is clear that (c) ⇒ (b). Suppose now that (b) holds. Then
Letting γ and θ_B be defined by (2) and (3), we have from Proposition 7.25
so q(dy | x) is limit-measurable. Q.E.D.
Proposition 11.3 Let X and Y be Borel spaces and let f: XY → R* be limit-measurable. Let q(dy | x) be a limit-measurable stochastic kernel on Y given X. Then the mapping λ: X → R* defined by
$$\lambda(x) = \int_Y f(x,y)\, q(dy \mid x)$$
is limit-measurable.
Proof The mapping δ(x) = p_x is continuous (Corollary 7.21.1), as is the mapping σ: P(X)P(Y) → P(XY) defined by σ(p, q) = pq, where pq is the product measure (Lemma 7.12). Suppose Q ∈ ℒ_{XY} and f = χ_Q. For every x ∈ X,
$$\lambda(x) = \theta_Q\big(\sigma[\delta(x), \gamma(x)]\big), \tag{4}$$
where γ and θ_Q are given by (2) and (3). Since all the functions on the right-hand side of (4) are limit-measurable, λ is limit-measurable. It follows that λ is limit-measurable when f is a limit-measurable simple function. The extension to the general limit-measurable, extended real-valued function f is straightforward. Q.E.D.
Corollary 11.3.1 Let X be a Borel space and let f: X → R* be limit-measurable. Then the function θ_f: P(X) → R* defined by
$$\theta_f(p) = \int f\, dp$$
is limit-measurable.
We have the following sharpened version of the selection theorem for lower semianalytic functions.
Proposition 11.4 Let X and Y be Borel spaces, D ⊂ XY an analytic set, and f: D → R* a lower semianalytic function. Define f*: proj_X(D) → R* by
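$$f^*(x) = \inf_{y \in D_x} f(x,y).$$
The set
$$I = \{x \in \mathrm{proj}_X(D) \mid \text{for some } y_x \in D_x,\ f(x,y_x) = f^*(x)\}$$
is limit-measurable, and for every ε > 0 there exists a limit-measurable function φ: proj_X(D) → Y such that Gr(φ) ⊂ D and for all x ∈ proj_X(D)
$$f[x,\varphi(x)] = f^*(x) \text{ if } x \in I, \qquad f[x,\varphi(x)] \le \begin{cases} f^*(x)+\varepsilon & \text{if } x \notin I,\ f^*(x) > -\infty,\\ -1/\varepsilon & \text{if } x \notin I,\ f^*(x) = -\infty.\end{cases}$$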
Proof The proof is the same as in Proposition 7.50(b), except that at the points where Corollary 7.44.2 is invoked to say that the composition of analytically measurable functions is universally measurable, we use Proposition 11.1 to say that the composition is limit-measurable. Q.E.D.
By the remark following Corollary B.11.1, we see that I and the selector obtained in Proposition 11.4 are in fact 8:-measurable. This remark further suggests that the constructions in Chapters 8 and 9 of optimal and ε-optimal
policies can be done more carefully by keeping track of the minimal 9'; with respect to which policies and costs are measurable. We do this to some extent in the next section, but do not pursue this matter to any great length. Propositions 11.1-11.4 are sufficient to allow us to replace every reference to a "(universally measurable) policy" in Chapters 8 and 9 by the words "limit-measurable policy." It does not matter which class of policies is considered when defining J*_N and J*; the proof of Proposition 8.1 together with Proposition 11.5 given below can be used to show that these functions are determined by the analytically measurable Markov policies alone. Corollary 11.1.1 tells us that the nonrandomized limit-measurable policies are just the set of sequences of limit-measurable functions from state to control which satisfy the control constraint (cf. Definition 8.2). This fact and Proposition 11.2 are needed for the proof of the limit-measurable counterpart of Lemma 8.2. From Proposition 11.3 we can deduce that the cost corresponding to a limit-measurable policy is limit-measurable (cf. Definitions 8.3 and 9.3). This fact was used, for example, in proving that under (F−) a nonrandomized, semi-Markov, ε-optimal policy exists (Proposition 8.3). Proposition 11.4 allows limit-measurable ε-optimal and optimal selection. The ε-optimal selection property for universally measurable functions is used in practically every proof in Chapters 8 and 9. The exact selection property is used in showing the existence under certain conditions of optimal policies (Propositions 8.5, 9.19, and 9.20).
11.2 Analytically Measurable Policies
Some of the existence results of Chapters 8 and 9 can be sharpened to state the existence of ε-optimal analytically measurable policies. This is due to Proposition 7.50(a) and the following propositions. Proposition 11.5 is the analog of Corollary 7.44.3 for universally measurable policies and of Corollary 11.1.1 for limit-measurable policies.
Proposition 11.5 Let X and Y be Borel spaces, let f: X → Y be a function, and let q(dy | x) be a stochastic kernel on Y given X such that, for each x, q(dy | x) assigns probability one to the point f(x) ∈ Y. Then q(dy | x) is analytically measurable if and only if f is analytically measurable.
Proof We sharpen the proof of Corollary 7.44.3. Let γ(x) = q(dy | x) and δ(y) = p_y, so that γ = δ∘f and f = δ^{−1}∘γ. Now δ is a homeomorphism from Y to Ỹ = {p_y | y ∈ Y}, so δ and δ^{−1}: Ỹ → Y are both Borel-measurable. If f is analytically measurable and C ∈ ℬ_{P(Y)}, then
$$\gamma^{-1}(C) = f^{-1}[\delta^{-1}(C)] \in \mathcal A_X$$
because δ^{−1}(C) ∈ ℬ_Y. If γ is analytically measurable and B ∈ ℬ_Y, then
$$f^{-1}(B) = \gamma^{-1}[\delta(B)] \in \mathcal A_X$$
because δ(B) ∈ ℬ_{P(Y)}. Q.E.D.
Proposition 11.6 Let X and Y be Borel spaces and let q(dy | x) be a stochastic kernel on Y given X. The following statements are equivalent:
(a) The stochastic kernel q(dy | x) is analytically measurable.
(b) For every B ∈ ℬ_Y, the mapping λ_B: X → R defined by (1) is analytically measurable.
Proof Assume (a) holds and define γ(x) = q(dy | x). Then for B ∈ ℬ_Y, C ∈ ℬ_R, and θ_B: P(Y) → R defined by (3), we have
$$\lambda_B^{-1}(C) = \gamma^{-1}[\theta_B^{-1}(C)] \in \mathcal A_X$$
because θ_B^{−1}(C) ∈ ℬ_{P(Y)} (Proposition 7.25). Therefore (b) holds. If (b) holds, we can show that (a) holds by the same argument used in the proof of Proposition 11.2. Q.E.D.
We know from Corollary B.11.1 that the composition of analytically measurable functions need not be analytically measurable, so the cost corresponding to an analytically measurable policy for a stochastic optimal control model may not be analytically measurable. To see this, just write out explicitly the cost corresponding to a two-stage, nonrandomized, Markov, analytically measurable policy (cf. Definition 8.3). A review of Chapters 8 and 9 shows the following. Proposition 8.3 is still valid if the word "policy" is replaced by "analytically measurable policy," except that under (F−) an analytically measurable, nonrandomized, semi-Markov, ε-optimal policy is not guaranteed to exist. However, an analytically measurable nonrandomized ε-optimal policy can be shown to exist if g ≤ 0 [B12]. The proof of the existence of a sequence of nonrandomized Markov policies exhibiting {ε_n}-dominated convergence to optimality (Proposition 8.4) breaks down at the point where we assume that a sequence of one-stage policies {μ^n} exists for which
This occurs because T,--l(J,) may not be analytically measurable. In the first sentence of Proposition 9.19, the word "policy" can be replaced by "analytically measurable policy." The $\varepsilon$-optimal part of Proposition 9.20 depends on the (F−) part of Proposition 8.3, so it cannot be strengthened in this way. Under assumption (N), an analytically measurable, nonrandomized, $\varepsilon$-optimal policy can be shown to exist [B12], but it is unknown whether this policy can be taken to be semi-Markov. The results of Chapters 8 and 9 relating to existence of universally measurable optimal policies depend on the exact selection property of Proposition 7.50(b). Since this property is not available for analytically measurable functions, we cannot use the same arguments to infer existence of optimal analytically measurable policies.
11.3 Models with Multiplicative Cost

In this section we revisit the stochastic optimal control model with a multiplicative cost functional first encountered in Section 2.3.4. We pose the finite horizon model in Borel spaces and state the results which are obtainable by casting this Borel space model in the generalized framework of Chapter 6. This does not permit a thorough treatment of the type already given to the model with additive cost in Chapters 8 and 9, but it does yield some useful results and illustrates how the generalized abstract model of Chapter 6 can be applied. The reader can, of course, use the mathematical theory of Chapter 7 to analyze the model with multiplicative cost directly under conditions more general than those given here.

We set up the Borel model with multiplicative cost. Let the state space $S$, the control space $C$, and the disturbance space $W$ be Borel spaces. Let the control constraint $U$ mapping $S$ into the set of nonempty subsets of $C$ be such that
$$\Gamma = \{(x, u) \mid x \in S,\ u \in U(x)\}$$
is analytic. Let the disturbance kernel $p(dw \mid x, u)$ and the system function $f: SCW \to S$ be Borel-measurable. Let the one-stage cost function $g$ be Borel-measurable, and assume that there exists a $b \in R$ such that $0 \le g(x, u, w) \le b$ for all $x \in S$, $u \in U(x)$, $w \in W$. Let the horizon $N$ be a positive integer. In the framework of Section 6.1, we define $F$ to be the set of extended real-valued, universally measurable functions on $S$ and $F^*$ to be the set of functions in $F$ which are lower semianalytic. We let $M$ be the set of universally measurable functions from $S$ to $C$ with graph in $\Gamma$. Define $H: SCF \to [0, \infty]$ by
$$H(x, u, J) = \int_W g(x, u, w)\, J[f(x, u, w)]\, p(dw \mid x, u),$$
where we define $0 \cdot \infty = \infty \cdot 0 = 0 \cdot (-\infty) = (-\infty) \cdot 0 = 0$. We take $J_0: S \to R^*$ to be identically one. Then Assumptions A.1–A.4, F.2, and the Exact Selection Assumption of Section 6.1 hold. (Assumption A.2 follows from Lemma 7.30(4) and Propositions 7.47 and 7.48. Assumption A.4 follows from Proposition 7.50.) From Propositions 6.1(a), 6.2(a), and 6.3 we have the following results, where the notation of Section 6.1 is used.

Proposition 11.7 In the finite horizon Borel model with multiplicative cost, we have
$$J_N^* = T^N(J_0),$$
and for every $\varepsilon > 0$ there exists an $N$-stage $\varepsilon$-optimal (Markov) policy. A policy $\pi^* = (\mu_0^*, \ldots, \mu_{N-1}^*)$ is uniformly $N$-stage optimal if and only if
$$(T_{\mu_k^*} T^{N-k-1})(J_0) = T^{N-k}(J_0), \qquad k = 0, \ldots, N-1,$$
and such a policy exists if and only if the infimum in the relation
$$T^{k+1}(J_0)(x) = \inf_{u \in U(x)} H[x, u, T^k(J_0)]$$
is attained for each $x \in S$ and $k = 0, \ldots, N-1$. A sufficient condition for this infimum to be attained is for the set
$$U_k(x, \lambda) = \{u \in U(x) \mid H[x, u, T^k(J_0)] \le \lambda\}$$
to be compact for each $x \in S$, $\lambda \in R$, and $k = 0, \ldots, N-1$.
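When the state, control, and disturbance spaces are finite (and therefore trivially Borel spaces), the recursion $T^{k+1}(J_0)(x) = \inf_{u \in U(x)} H[x, u, T^k(J_0)]$ of Proposition 11.7 can be carried out exactly. The following is a minimal sketch under that assumption, not the book's method. The names `H` and `multiplicative_dp` and the two-state model at the end are hypothetical. The integral defining $H$ becomes a finite sum, and exact `Fraction` arithmetic is used. Because each $U(x)$ is finite (hence compact), the infimum is always attained. The sketch therefore also returns a uniformly $N$-stage optimal policy $(\mu_0^*, \ldots, \mu_{N-1}^*)$, where $\mu_k^*$ attains the infimum defining $T^{N-k}(J_0)$.

```python
from fractions import Fraction


def H(x, u, J, W, f, g, p):
    """H(x, u, J) = sum over w of p(w | x, u) * g(x, u, w) * J[f(x, u, w)] (finite W)."""
    return sum(p(w, x, u) * g(x, u, w) * J[f(x, u, w)] for w in W)


def multiplicative_dp(S, U, W, f, g, p, N):
    """Compute T^k(J_0) for k = 0..N with J_0 identically one.

    Returns (J, policy): J[k][x] = T^k(J_0)(x), and policy[k][x] = mu_k*(x),
    where mu_k* attains the minimum in T^{N-k}(J_0) = min_u H[x, u, T^{N-k-1}(J_0)].
    """
    J = [{x: Fraction(1) for x in S}]  # J_0 = 1
    argmins = []  # argmins[k] attains the minimum defining T^{k+1}(J_0)
    for _ in range(N):
        prev = J[-1]
        nxt, mu = {}, {}
        for x in S:
            # U(x) is finite here, so the minimum is always attained
            u_best = min(U(x), key=lambda u: H(x, u, prev, W, f, g, p))
            mu[x] = u_best
            nxt[x] = H(x, u_best, prev, W, f, g, p)
        J.append(nxt)
        argmins.append(mu)
    # stage k of the policy uses the minimizer computed for T^{N-k}(J_0)
    policy = [argmins[N - 1 - k] for k in range(N)]
    return J, policy


# Hypothetical two-state example: the next state equals the disturbance.
S, W = [0, 1], [0, 1]


def U(x):
    return ["a", "b"]


def f(x, u, w):
    return w


def p(w, x, u):
    if u == "a":  # control "a" leads to state 0 with certainty
        return Fraction(1) if w == 0 else Fraction(0)
    return Fraction(1, 2)  # control "b" picks the next state uniformly


def g(x, u, w):  # 0 <= g <= 1, as the model requires
    if u == "a":
        return Fraction(1, 2) if x == 0 else Fraction(9, 10)
    return Fraction(1, 4) if w == 0 else Fraction(1)


J, policy = multiplicative_dp(S, U, W, f, g, p, N=2)
print(J[2], policy)
```

Working the recursion by hand for these data gives $T(J_0) = (1/2, 5/8)$ and $T^2(J_0) = (1/4, 3/8)$ on states $(0, 1)$. Control "a" is optimal at state 0 and control "b" at state 1 in both stages, which agrees with the output of the sketch.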
Appendix A
The Outer Integral
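Throughout this appendix, $(X, \mathcal{B}, \mu)$ is a probability space. Unless otherwise specified, $f$, $g$, and $h$ are functions from $X$ to $[-\infty, \infty]$.

Definition A.1 If $f \ge 0$, the outer integral of $f$ with respect to $\mu$ is defined by
$$\int^* f\,d\mu = \inf\Big\{\int g\,d\mu \;\Big|\; g \ge f,\ g \text{ is } \mathcal{B}\text{-measurable}\Big\}. \tag{1}$$
If $f$ is arbitrary, define
$$\int^* f\,d\mu = \int^* f^+\,d\mu - \int^* f^-\,d\mu, \tag{2}$$
where $f^+(x) = \max\{0, f(x)\}$, $f^-(x) = \max\{0, -f(x)\}$, and we set $\infty - \infty = \infty$.

Lemma A.1 If $f \ge 0$, then there exists a $\mathcal{B}$-measurable $g$ with $g \ge f$ such that
$$\int^* f\,d\mu = \int g\,d\mu. \tag{3}$$

Proof Choose $g_n \ge f$, $g_n$ $\mathcal{B}$-measurable, so that
$$\int g_n\,d\mu \le \int^* f\,d\mu + \frac{1}{n}, \qquad n = 1, 2, \ldots.$$
We assume without loss of generality that $g_1 \ge g_2 \ge \cdots$ (otherwise replace $g_n$ by $\min\{g_1, \ldots, g_n\}$). Let $g = \lim_{n\to\infty} g_n$. Then $g \ge f$, $g$ is $\mathcal{B}$-measurable, and $\int^* f\,d\mu \le \int g\,d\mu \le \int g_n\,d\mu$ for every $n$, so (3) holds. Q.E.D.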
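Lemma A.2 If $f \ge 0$ and $h \ge 0$, then
$$\int^*(f + h)\,d\mu \le \int^* f\,d\mu + \int^* h\,d\mu. \tag{4}$$
If either $f$ or $h$ is $\mathcal{B}$-measurable, then equality holds in (4).

Proof Suppose $g_1 \ge f$, $g_2 \ge h$, $g_1$ and $g_2$ are $\mathcal{B}$-measurable, and $\int^* f\,d\mu = \int g_1\,d\mu$, $\int^* h\,d\mu = \int g_2\,d\mu$. Then $g_1 + g_2 \ge f + h$ and (4) follows from (1). Suppose $h$ is $\mathcal{B}$-measurable and $\int h\,d\mu < \infty$. [If $\int h\,d\mu = \infty$, equality is easily seen to hold in (4).] Suppose $f + h \le g$, where $g$ is $\mathcal{B}$-measurable and $\int^*(f + h)\,d\mu = \int g\,d\mu$. Then $f \le g - h$ and $g - h$ is $\mathcal{B}$-measurable, so
$$\int^* f\,d\mu \le \int (g - h)\,d\mu = \int^*(f + h)\,d\mu - \int h\,d\mu,$$
which implies
$$\int^* f\,d\mu + \int^* h\,d\mu \le \int^*(f + h)\,d\mu.$$
Therefore equality holds in (4). Q.E.D.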
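We provide an example to show that strict inequality can occur in (4), even if $f + h$ is $\mathcal{B}$-measurable. For this and subsequent examples we will need the following observation: For any $E \subset X$,
$$\int^* \chi_E\,d\mu = \mu^*(E), \tag{5}$$
where $\mu^*(E)$ is $\mu$-outer measure defined by
$$\mu^*(E) = \inf\{\mu(B) \mid B \in \mathcal{B},\ E \subset B\},$$
and $\chi_E$ is the indicator function of $E$ defined by
$$\chi_E(x) = \begin{cases} 1 & \text{if } x \in E, \\ 0 & \text{if } x \notin E. \end{cases}$$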
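To verify (5), note that if $\chi_E \le g$ and $g$ is $\mathcal{B}$-measurable, then $\{x \mid g(x) \ge 1\}$ is a $\mathcal{B}$-measurable set containing $E$ and consequently
$$\mu^*(E) \le \mu(\{x \mid g(x) \ge 1\}) \le \int g\,d\mu.$$
Definition A.1 implies
$$\mu^*(E) \le \int^* \chi_E\,d\mu. \tag{6}$$
On the other hand, if $\{B_n\}$ is a sequence of $\mathcal{B}$-measurable sets with $E \subset B_n$ and $\mu(B_n) \to \mu^*(E)$, then $\mu(\bigcap_{n=1}^\infty B_n) = \mu^*(E)$. By construction, $\chi_{\bigcap_{n=1}^\infty B_n} \ge \chi_E$. But $\chi_{\bigcap_{n=1}^\infty B_n}$ is $\mathcal{B}$-measurable, and
$$\int^* \chi_E\,d\mu \le \int \chi_{\bigcap_{n=1}^\infty B_n}\,d\mu = \mu^*(E).$$
The reverse of inequality (6) follows. Note that the preceding argument shows that for any set $E$, there exists a set $B \in \mathcal{B}$ such that $E \subset B$ and $\mu(B) = \mu^*(E)$.

EXAMPLE 1 Let $X = [0, 1]$, let $\mathcal{B}$ be the Borel σ-algebra, and let $\mu$ be Lebesgue measure restricted to $\mathcal{B}$. Let $E \subset X$ be a set for which $\mu^*(E) = \mu^*(X - E) = 1$ (see [H1, Section 16, Theorem E]). Then, taking $f = \chi_E$ and $h = \chi_{X-E}$,
$$\int^*(\chi_E + \chi_{X-E})\,d\mu = \int 1\,d\mu = 1 < 2 = \int^* \chi_E\,d\mu + \int^* \chi_{X-E}\,d\mu,$$
and strict inequality holds in (4).

Lemma A.2 cannot be extended to (possibly negative) bounded functions, even if $h$ is $\mathcal{B}$-measurable, as the following example demonstrates.

EXAMPLE 2 Let $(X, \mathcal{B}, \mu)$ and $E$ be as before. Let
$$f = \chi_E - \chi_{X-E}, \qquad h = 1.$$
Then $\int^*(f + h)\,d\mu = \int^* 2\chi_E\,d\mu = 2$, while $\int^* f\,d\mu + \int^* h\,d\mu = (1 - 1) + 1 = 1$, so the inequality in (4) fails.

Lemma A.3 (a) If $f \le g$, then $\int^* f\,d\mu \le \int^* g\,d\mu$.
(b) If $\varepsilon > 0$ and $f \le g \le f + \varepsilon$, then
$$\int^* f\,d\mu \le \int^* g\,d\mu \le \int^* f\,d\mu + 2\varepsilon. \tag{7}$$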
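(c) If $\int^* f^+\,d\mu < \infty$ or $\int^* f^-\,d\mu < \infty$, then
$$\int^*(-f)\,d\mu = -\int^* f\,d\mu. \tag{8}$$
(d) If $A, B \in \mathcal{B}$ are disjoint, then for any $f$
$$\int^* \chi_{A \cup B} f\,d\mu = \int^* \chi_A f\,d\mu + \int^* \chi_B f\,d\mu. \tag{9}$$
(e) If $E \subset X$ satisfies $\mu^*(E) = 0$, then for any $f$, $\int^* f\,d\mu = \int^* \chi_{X-E} f\,d\mu$.
(f) If $\mu^*(\{x \mid f(x) = \infty\}) > 0$, then for every $g$, $\int^*(g + f)\,d\mu = \infty$.
(g) If $\mu^*(\{x \mid f(x) = -\infty\}) > 0$, then for every $g$, either $\int^*(g + f)\,d\mu = \infty$ or $\int^*(g + f)\,d\mu = -\infty$.

Proof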
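(a) If $f \le g$, then $f^+ \le g^+$ and $f^- \ge g^-$. By (1),
$$\int^* f^+\,d\mu \le \int^* g^+\,d\mu, \qquad \int^* f^-\,d\mu \ge \int^* g^-\,d\mu.$$
The result follows from (2).

(b) In light of (a), it remains only to show that
$$\int^*(f + \varepsilon)\,d\mu \le \int^* f\,d\mu + 2\varepsilon. \tag{10}$$
For $g_1 \ge f^+$, $g_1$ $\mathcal{B}$-measurable, and $\int^* f^+\,d\mu = \int g_1\,d\mu$, we have $(f + \varepsilon)^+ \le g_1 + \varepsilon$, so
$$\int^*(f + \varepsilon)^+\,d\mu \le \int g_1\,d\mu + \varepsilon = \int^* f^+\,d\mu + \varepsilon. \tag{11}$$
For $g_2 \ge (f + \varepsilon)^-$, $g_2$ $\mathcal{B}$-measurable, and $\int^*(f + \varepsilon)^-\,d\mu = \int g_2\,d\mu$, we have
$$g_2 + \varepsilon \ge (f + \varepsilon)^- + \varepsilon = \max\{f^- - \varepsilon, 0\} + \varepsilon \ge f^-,$$
so
$$\varepsilon + \int^*(f + \varepsilon)^-\,d\mu = \varepsilon + \int g_2\,d\mu = \int (g_2 + \varepsilon)\,d\mu \ge \int^* f^-\,d\mu. \tag{12}$$
Combine (11) and (12) to conclude (10).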
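(c) We have
$$\int^*(-f)\,d\mu = \int^*(-f)^+\,d\mu - \int^*(-f)^-\,d\mu = \int^* f^-\,d\mu - \int^* f^+\,d\mu = -\Big[\int^* f^+\,d\mu - \int^* f^-\,d\mu\Big] = -\int^* f\,d\mu,$$
where the assumption that $\int^* f^+\,d\mu < \infty$ or $\int^* f^-\,d\mu < \infty$ is necessary for the next to last equality.

(d) Suppose $f \ge 0$. Let $g$ be a $\mathcal{B}$-measurable function with $g \ge \chi_{A \cup B} f$ and $\int^* \chi_{A \cup B} f\,d\mu = \int g\,d\mu$. Then $\chi_A g \ge \chi_A f$ and $\chi_B g \ge \chi_B f$, so
$$\int^* \chi_{A \cup B} f\,d\mu = \int g\,d\mu \ge \int \chi_A g\,d\mu + \int \chi_B g\,d\mu \ge \int^* \chi_A f\,d\mu + \int^* \chi_B f\,d\mu. \tag{13}$$
Now suppose $g_1 \ge \chi_A f$, $g_2 \ge \chi_B f$ are $\mathcal{B}$-measurable and
$$\int^* \chi_A f\,d\mu = \int g_1\,d\mu, \qquad \int^* \chi_B f\,d\mu = \int g_2\,d\mu.$$
Then $\chi_A g_1 + \chi_B g_2 \ge \chi_{A \cup B} f$, so
$$\int^* \chi_{A \cup B} f\,d\mu \le \int \chi_A g_1\,d\mu + \int \chi_B g_2\,d\mu \le \int^* \chi_A f\,d\mu + \int^* \chi_B f\,d\mu. \tag{14}$$
Combine (13) and (14) to conclude (9) for $f \ge 0$. The extension to arbitrary $f$ is straightforward.

(e) Suppose $f \ge 0$. Choose $B \in \mathcal{B}$ with $\mu(B) = \mu^*(E) = 0$, $B \supset E$. The function equal to $\infty$ on $B$ and to $0$ elsewhere is a $\mathcal{B}$-measurable majorant of $\chi_B f$ with integral zero, so $\int^* \chi_B f\,d\mu = 0$. By (d) and (a),
$$\int^* f\,d\mu = \int^* \chi_{X-B} f\,d\mu + \int^* \chi_B f\,d\mu = \int^* \chi_{X-B} f\,d\mu \le \int^* \chi_{X-E} f\,d\mu \le \int^* f\,d\mu.$$
Hence $\int^* f\,d\mu = \int^* \chi_{X-E} f\,d\mu$. The extension to arbitrary $f$ is straightforward.

(f) We have $(g + f)^+(x) = \infty$ if $f(x) = \infty$, so that
$$\{x \mid f(x) = \infty\} \subset \{x \mid (g + f)^+(x) = \infty\}.$$
Every $\mathcal{B}$-measurable function dominating $(g + f)^+$ therefore equals $\infty$ on a $\mathcal{B}$-measurable set of measure at least $\mu^*(\{x \mid f(x) = \infty\}) > 0$. Hence $\int^*(g + f)^+\,d\mu = \infty$, and it follows that $\int^*(g + f)\,d\mu = \infty$.

(g) Consider the sets $E = \{x \mid f(x) = -\infty\}$ and $E_1 = \{x \mid f(x) = -\infty,\ g(x) < \infty\}$. If $\mu^*(E_1) = 0$, then
$$\mu^*(E - E_1) = \mu^*(E - E_1) + \mu^*(E_1) \ge \mu^*(E) > 0.$$
Since we have $f(x) + g(x) = \infty$ for $x \in E - E_1$, it follows from (f) that $\int^*(g + f)\,d\mu = \infty$. If $\mu^*(E_1) > 0$, then $\mu^*(\{x \mid (g + f)^-(x) = \infty\}) \ge \mu^*(E_1) > 0$ and hence, by (f), $\int^*(g + f)^-\,d\mu = \infty$. Hence, if $\int^*(g + f)^+\,d\mu = \infty$, then $\int^*(g + f)\,d\mu = \infty$, while if $\int^*(g + f)^+\,d\mu < \infty$, then $\int^*(g + f)\,d\mu = -\infty$. Q.E.D.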
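When $\mathcal{B}$ is generated by a finite partition of a finite set $X$, every $\mathcal{B}$-measurable function is constant on each atom. The infimum in (1) is then attained by the function equal to $\max f$ on each atom of positive measure. The sketch below is a hypothetical finite analogue, not taken from the text: the helper names and the two-point space are assumptions. Take $X = \{a, b\}$, $\mathcal{B} = \{\varnothing, X\}$, and $E = \{a\}$. Then $\mu^*(E) = \mu^*(X - E) = 1$, as in Example 1, so Examples 1 and 2 and the constant $2\varepsilon$ in (7) can be checked by direct computation.

```python
import math
from fractions import Fraction

INF = math.inf


def outer_integral_nonneg(f, atoms, mu):
    """Outer integral (1) of f >= 0 when B is generated by a finite partition.

    A B-measurable g is constant on each atom, so the smallest g >= f takes
    the value max f on each atom; atoms of measure zero contribute nothing.
    """
    total = Fraction(0)
    for atom, m in zip(atoms, mu):
        if m == 0:
            continue
        top = max(f(x) for x in atom)
        if top == INF:
            return INF
        total += m * top
    return total


def outer_integral(f, atoms, mu):
    """Definition (2): int* f+ - int* f-, with the convention inf - inf = inf."""
    pos = outer_integral_nonneg(lambda x: max(0, f(x)), atoms, mu)
    neg = outer_integral_nonneg(lambda x: max(0, -f(x)), atoms, mu)
    if pos == INF:
        return INF
    if neg == INF:
        return -INF
    return pos - neg


# Two-point analogue of Example 1: B = {empty set, X}, mu(X) = 1, E = {"a"}.
atoms, mu = [["a", "b"]], [Fraction(1)]


def chi_E(x):
    return 1 if x == "a" else 0


def chi_notE(x):
    return 1 - chi_E(x)


def f(x):  # the function of Example 2
    return chi_E(x) - chi_notE(x)


def h(x):
    return 1


lhs1 = outer_integral(lambda x: chi_E(x) + chi_notE(x), atoms, mu)
rhs1 = outer_integral(chi_E, atoms, mu) + outer_integral(chi_notE, atoms, mu)
lhs2 = outer_integral(lambda x: f(x) + h(x), atoms, mu)
rhs2 = outer_integral(f, atoms, mu) + outer_integral(h, atoms, mu)
# Constant in (7) with g = f + 1 and eps = 1
gap = outer_integral(lambda x: f(x) + 1, atoms, mu) - outer_integral(f, atoms, mu)
print(lhs1, rhs1, lhs2, rhs2, gap)
```

Here `lhs1 < rhs1` is the strict inequality in (4) from Example 1. `lhs2 > rhs2` shows that (4) fails for the signed functions of Example 2. `gap` equals $2\varepsilon$ with $\varepsilon = 1$.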
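The bound given in (7) is the sharpest possible. To see this, let $f$ be as defined in Example 2, $g = f + 1$, and $\varepsilon = 1$. Then $\int^* f\,d\mu = 0$ and $\int^* g\,d\mu = \int^* 2\chi_E\,d\mu = 2 = \int^* f\,d\mu + 2\varepsilon$.

Despite these pathologies of outer integration, there is a monotone convergence theorem, which we now prove.

Proposition A.1 If $\{f_n\}$ is a sequence of nonnegative functions and $f_n \uparrow f$, then
$$\lim_{n\to\infty} \int^* f_n\,d\mu = \int^* f\,d\mu. \tag{15}$$
If $\{f_n\}$ is a sequence of nonpositive functions and $f_n \downarrow f$, then
$$\lim_{n\to\infty} \int^* f_n\,d\mu = \int^* f\,d\mu.$$

Proof We prove the first statement of the proposition. The second follows from the first and Lemma A.3(c). Assume $f_n \ge 0$ and $f_n \uparrow f$. Let $\{g_n\}$ be a sequence of $\mathcal{B}$-measurable functions such that $g_n \ge f_n$ and
$$\int^* f_n\,d\mu = \int g_n\,d\mu. \tag{16}$$
If, for some $n$, $\int g_n\,d\mu = \int^* f_n\,d\mu = \infty$, then (15) is assured. If not, then for every $n$,
$$\int g_n\,d\mu < \infty. \tag{17}$$
Suppose (17) holds for every $n$ and for some $n$,
$$\mu(\{x \mid g_n(x) > g_{n+1}(x)\}) > 0.$$
Then since $g_{n+1} \ge f_{n+1} \ge f_n$, we have that $\hat{g}$ defined by
$$\hat{g}(x) = \min\{g_n(x), g_{n+1}(x)\}$$
satisfies $g_n \ge \hat{g} \ge f_n$ everywhere and $\hat{g} < g_n$ on a set of positive measure. This contradicts (16). We may therefore assume without loss of generality that (17) holds and $g_1 \le g_2 \le \cdots$. Let $g = \lim_{n\to\infty} g_n$. Then $g \ge f$ and
$$\lim_{n\to\infty} \int^* f_n\,d\mu = \lim_{n\to\infty} \int g_n\,d\mu = \int g\,d\mu \ge \int^* f\,d\mu.$$
But $f_n \le f$ for every $n$, so the reverse inequality holds as well. Q.E.D.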
One might hope that if $\{f_n\}$ is a sequence of functions which are bounded below and $f_n \uparrow f$, then (15) remains valid. This is not the case, as the following example shows.

EXAMPLE 3 Let $X = [0, 1)$, $\mathcal{B}$ be the Borel σ-algebra, and $\mu$ be Lebesgue measure restricted to $\mathcal{B}$. Define an equivalence relation $\sim$ on $X$ by
$$x \sim y \iff x - y \text{ is rational}.$$
Let $F_0$ be constructed by choosing one representative from each equivalence class. Let $Q = \{q_0, q_1, \ldots\}$ be an enumeration of the rationals in $[0, 1)$ with $q_0 = 0$ and define
$$F_k = F_0 + q_k \ [\mathrm{mod}\ 1], \qquad k = 1, 2, \ldots.$$
Then $F_0, F_1, \ldots$ is a sequence of disjoint sets with
$$\bigcup_{k=0}^{\infty} F_k = [0, 1). \tag{18}$$
If for some $n < \infty$ we have $\mu^*(\bigcup_{k=n}^\infty F_k) < 1$, then $E = \bigcup_{k=0}^{n-1} F_k$ contains a $\mathcal{B}$-measurable set with measure $\delta > 0$. For $k = 1, \ldots, n-1$, let $q_k = r_k/s_k$, where $r_k$ and $s_k$ are integers and $r_k/s_k$ is reduced to lowest terms. Let $\{p_1, p_2, \ldots\}$ be a sequence of prime numbers such that
$$\max_{1 \le k \le n-1} s_k < p_1 < p_2 < \cdots.$$
Then the sets $E$, $E + p_1^{-1}\ [\mathrm{mod}\ 1]$, $E + p_2^{-1}\ [\mathrm{mod}\ 1]$, ... are disjoint, and by the translation invariance of $\mu$, each contains a $\mathcal{B}$-measurable set with measure $\delta > 0$. It follows that $[0, 1)$ must contain a $\mathcal{B}$-measurable set of infinite measure. This contradiction implies
$$\mu^*\Big(\bigcup_{k=n}^{\infty} F_k\Big) = 1 \tag{19}$$
for every $n$. Define
$$f_n = -\chi_{\bigcup_{k=n}^{\infty} F_k}, \qquad n = 1, 2, \ldots.$$
Then $f_n \uparrow 0$, but (5) and (19) imply that for every $n$
$$\int^* f_n\,d\mu = -\int^* \chi_{\bigcup_{k=n}^{\infty} F_k}\,d\mu = -1 < 0 = \int^* 0\,d\mu.$$

By a change of sign in Example 3, we see that the second part of Proposition A.1 cannot be extended to functions which are bounded above unless additional conditions are imposed. We impose such conditions in order to prove a corollary.
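Corollary A.1.1 Let $\{\varepsilon_n\}$ be a sequence of positive numbers with $\sum_{n=1}^\infty \varepsilon_n < \infty$. Let $\{f_n\}$ be a sequence with
$$\lim_{n\to\infty} f_n = f \tag{20}$$
and satisfying conditions (21)–(24). Then
$$\lim_{n\to\infty} \int^* f_n\,d\mu = \int^* f\,d\mu. \tag{25}$$

Proof From (20) we have $\lim_{n\to\infty} f_n^+ = f^+$ and $\lim_{n\to\infty} f_n^- = f^-$. Now $\inf_{k \ge n} f_k^- \le f_n^- \le f^-$ and $\inf_{k \ge n} f_k^- \uparrow f^-$ as $n \to \infty$. By Proposition A.1,
$$\int^* f^-\,d\mu = \lim_{n\to\infty} \int^* \inf_{k \ge n} f_k^-\,d\mu \le \liminf_{n\to\infty} \int^* f_n^-\,d\mu \le \limsup_{n\to\infty} \int^* f_n^-\,d\mu \le \int^* f^-\,d\mu,$$
so
$$\lim_{n\to\infty} \int^* f_n^-\,d\mu = \int^* f^-\,d\mu. \tag{26}$$
Let $A = \{x \mid f(x) = -\infty\}$. If $\mu^*(A) = 0$, then (21), (22), (24), and Lemmas A.3(b) and (e) imply
$$\lim_{n\to\infty} \int^* f_n^+\,d\mu = \int^* f^+\,d\mu. \tag{27}$$
Combine (26) and (27) to conclude (25). If $\mu^*(A) > 0$, then $\int^* f^-\,d\mu = \infty$ and (26) will imply (25) provided that
$$\int^* f^+\,d\mu < \infty \tag{28}$$
and
$$\limsup_{n\to\infty} \int^* f_n^+\,d\mu < \infty. \tag{29}$$
Conditions (21) and (24) imply (28). Conditions (21)–(23) imply bounds on $f_n^+(x)$ for every $x \in X$, and hence on $\int^* f_n^+\,d\mu$; the finiteness of $\sum_{k=1}^\infty \varepsilon_k$ and (24) then imply (29). Q.E.D.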
Appendix B
Additional Measurability Properties of Borel Spaces
This appendix supplements Section 7.6. The notation and terminology used here is the same as in that section and, in most cases, is defined in Section 7.1.
B.1 Proof of Proposition 7.35(e)
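Our first task is to give a proof of Proposition 7.35(e). To do this, we introduce the space $N^* = \{1, 2, \ldots\} \cup \{\infty\}$ with the topology induced by the metric
$$d(a, b) = \Big|\frac{1}{a} - \frac{1}{b}\Big|,$$
where we define $1/\infty = 0$. Let $\mathcal{N}^* = N^* N^* \cdots$ with the product topology. The space $\mathcal{N}$ of sequences of positive integers is a topological subspace of $\mathcal{N}^*$. The space $\mathcal{N}^*$ is compact by Tychonoff's theorem, while $\mathcal{N}$ is not. If $(X, \mathcal{P})$ and $(Y, \mathcal{Q})$ are paved spaces, we denote by $\mathcal{P}\mathcal{Q}$ the paving of $XY$:
$$\mathcal{P}\mathcal{Q} = \{PQ \mid P \in \mathcal{P},\ Q \in \mathcal{Q}\}.$$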
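The metric on $N^*$ can be evaluated directly, which makes the contrast between $\mathcal{N}$ and $\mathcal{N}^*$ concrete. The sketch below rests on three assumptions. It uses the usual product metric $\sum_k 2^{-k} d(\zeta_k, \zeta_k')$ for the countable product; the text itself only says "product topology". It represents the point $\infty$ by `math.inf`. It evaluates only finitely many coordinates. The sequence $(n, 1, 1, \ldots)$ has no convergent subsequence in $\mathcal{N}$, because its first coordinate is never eventually constant. In $\mathcal{N}^*$, however, it converges to $(\infty, 1, 1, \ldots)$.

```python
import math
from fractions import Fraction

INF = math.inf  # stands for the point "infinity" adjoined to N


def recip(a):
    """1/a on N* = {1, 2, ...} together with inf, where 1/inf = 0."""
    return Fraction(0) if a == INF else Fraction(1, a)


def d(a, b):
    """The metric |1/a - 1/b| on N*; it is bounded by 1."""
    return abs(recip(a) - recip(b))


def d_product(z, y):
    """Hypothetical product metric sum_k 2^-k d(z_k, y_k), evaluated on the
    coordinates supplied; the omitted tail adds at most 2^-len(z)."""
    return sum(Fraction(1, 2 ** k) * d(a, b)
               for k, (a, b) in enumerate(zip(z, y), start=1))


# z_n = (n, 1, 1, ...) has no limit in N, but converges to (inf, 1, 1, ...) in N*.
K = 10
limit = [INF] + [1] * (K - 1)
dists = [d_product([n] + [1] * (K - 1), limit) for n in (1, 10, 100, 1000)]
print(dists)
```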
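Proposition B.1 Let $(X, \mathcal{P})$ be a paved space and $\mathcal{K}$ the collection of compact subsets of $\mathcal{N}^*$. Then the projection on $X$ of a set in $\mathcal{S}(\mathcal{P}\mathcal{K})$ is in $\mathcal{S}(\mathcal{P})$. Conversely, every set in $\mathcal{S}(\mathcal{P})$ is the projection on $X$ of some set in $[(\mathcal{P}\mathcal{K})_\sigma]_\delta$.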
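Proof Let $S$ be a Suslin scheme for $\mathcal{P}\mathcal{K}$. Then for every $s \in \Sigma$, $S(s)$ has the form $S(s) = S_1(s)S_2(s)$, where $S_1(s) \in \mathcal{P}$ and $S_2(s) \in \mathcal{K}$. Now
$$N(S) = \bigcup_{z \in \mathcal{N}} \bigcap_{s < z} S_1(s)S_2(s) = \bigcup_{z \in \mathcal{N}} \Big[\bigcap_{s < z} S_1(s)\Big]\Big[\bigcap_{s < z} S_2(s)\Big],$$
so
$$\mathrm{proj}_X[N(S)] = \bigcup_{z \in Z} \bigcap_{s < z} S_1(s),$$
where
$$Z = \Big\{z \in \mathcal{N} \;\Big|\; \bigcap_{s < z} S_2(s) \ne \varnothing\Big\}.$$
Since each $S_2(s)$ is compact, we have
$$Z = \Big\{z \in \mathcal{N} \;\Big|\; \bigcap_{t \le s} S_2(t) \ne \varnothing \text{ for every } s < z\Big\},$$
where $\bigcap_{t \le s}$ denotes the intersection over all initial segments $t$ of $s$, $s$ itself included. Define a Suslin scheme $R$ for $\mathcal{P}$ by
$$R(s) = \begin{cases} S_1(s) & \text{if } \bigcap_{t \le s} S_2(t) \ne \varnothing, \\ \varnothing & \text{otherwise}. \end{cases}$$
Then
$$\mathrm{proj}_X[N(S)] = N(R),$$
so $\mathrm{proj}_X[N(S)] \in \mathcal{S}(\mathcal{P})$.

For the second part of the proposition, suppose $S$ is a Suslin scheme for $\mathcal{P}$. Define a Suslin scheme $R$ for $\mathcal{K}$ by
$$R(\zeta_1, \ldots, \zeta_k) = \{(\zeta_1', \zeta_2', \ldots) \in \mathcal{N}^* \mid \zeta_1' = \zeta_1, \ldots, \zeta_k' = \zeta_k\}.$$
For fixed $z_0 \in \mathcal{N}$, we have
$$\bigcap_{s < z_0} R(s) = \{z_0\},$$
so
$$\bigcap_{s < z_0} [S(s)R(s)] = \Big[\bigcap_{s < z_0} S(s)\Big]\{z_0\}. \tag{2}$$
Therefore,
$$N(S) = \mathrm{proj}_X\Big(\bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)]\Big),$$
and it remains only to show that
$$\bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)] \in [(\mathcal{P}\mathcal{K})_\sigma]_\delta. \tag{3}$$
If we can show that
$$\bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)] = \bigcap_{k=1}^{\infty} \bigcup_{s \in \Sigma_k} [S(s)R(s)], \tag{4}$$
where $\Sigma_k$ is the set of elements in $\Sigma$ having $k$ components, then (3) will follow. Let $x \in X$ and $z_0 = (\zeta_1^0, \zeta_2^0, \ldots) \in \mathcal{N}^*$ be given. Suppose
$$(x, z_0) \in \bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)].$$
We see from (2) that $z_0 \in \mathcal{N}$ and $(x, z_0) \in \bigcap_{s < z_0} [S(s)R(s)]$, so for every $k \ge 1$, $(x, z_0) \in S(\zeta_1^0, \ldots, \zeta_k^0)R(\zeta_1^0, \ldots, \zeta_k^0)$. This implies $(x, z_0) \in \bigcap_{k=1}^\infty \bigcup_{s \in \Sigma_k} [S(s)R(s)]$, and
$$\bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)] \subset \bigcap_{k=1}^{\infty} \bigcup_{s \in \Sigma_k} [S(s)R(s)]. \tag{5}$$
On the other hand, if $(x, z_0) \in \bigcap_{k=1}^\infty \bigcup_{s \in \Sigma_k} [S(s)R(s)]$, then for each $k \ge 1$, $(x, z_0) \in \bigcup_{s \in \Sigma_k} [S(s)R(s)]$. This can happen only if $z_0 \in \mathcal{N}$ and $(x, z_0) \in S(\zeta_1^0, \ldots, \zeta_k^0)R(\zeta_1^0, \ldots, \zeta_k^0)$. Therefore,
$$(x, z_0) \in \bigcap_{s < z_0} [S(s)R(s)] \subset \bigcup_{z \in \mathcal{N}} \bigcap_{s < z} [S(s)R(s)],$$
which proves the reverse of set containment (5). Equality (4) follows. Q.E.D.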
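If $(X, \mathcal{P})$ is a paved space, $Y$ is another space, and $Q \subset Y$, we define a paving of $XY$ by
$$\mathcal{P}Q = \{PQ \mid P \in \mathcal{P}\}.$$

Lemma B.1 Let $(X, \mathcal{P})$ and $(Y, \mathcal{Q})$ be paved spaces. Then:
(a) $\mathcal{S}(\mathcal{P})Q = \mathcal{S}(\mathcal{P}Q)$ for every $Q \subset Y$;
(b) $\mathcal{S}(\mathcal{P})\mathcal{Q} \subset \mathcal{S}(\mathcal{P}\mathcal{Q})$.

Proof Part (a) is trivial and part (b) follows from (a). Q.E.D.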
We are now in a position to prove part (e) of Proposition 7.35.

Proposition B.2 Let $(X, \mathcal{P})$ be a paved space. Then $\mathcal{S}(\mathcal{P}) = \mathcal{S}[\mathcal{S}(\mathcal{P})]$.

Proof In light of Proposition 7.35(d), we need only prove
$$\mathcal{S}[\mathcal{S}(\mathcal{P})] \subset \mathcal{S}(\mathcal{P}). \tag{6}$$
Let $\mathcal{N}^*$ and $\mathcal{K}$ be as in Proposition B.1. If $A \in \mathcal{S}[\mathcal{S}(\mathcal{P})]$, then by the second part of Proposition B.1, $A = \mathrm{proj}_X(B)$ for some set $B \in [(\mathcal{S}(\mathcal{P})\mathcal{K})_\sigma]_\delta$. By Lemma B.1(b) and Proposition 7.35(b) and (c), we have
$$B \in [(\mathcal{S}(\mathcal{P})\mathcal{K})_\sigma]_\delta \subset [(\mathcal{S}(\mathcal{P}\mathcal{K}))_\sigma]_\delta = \mathcal{S}(\mathcal{P}\mathcal{K}).$$
The first part of Proposition B.1 implies that $A = \mathrm{proj}_X(B) \in \mathcal{S}(\mathcal{P})$ and (6) follows. Q.E.D.

B.2 Proof of Proposition 7.16

In Proposition 7.16 we stated that Borel spaces $X$ and $Y$ are Borel-isomorphic if and only if they have the same cardinality. A related result is that every uncountable Borel space is Borel-isomorphic to every other uncountable Borel space. We used the latter fact in Proposition 7.27 to assume without loss of generality that the Borel spaces under consideration were actually copies of $(0, 1]$. We used it in Proposition 7.39 to transfer a statement about $\mathcal{N}$ to a statement about any uncountable Borel space. We will use it again in Proposition B.7 to allow our treatment of the limit σ-algebra to center on the space $\mathcal{N}$. The proofs of Proposition 7.16 and Corollary 7.16.1 depend on the following lemma, which is an immediate consequence of Propositions 7.36 and 7.37. The reader may wish to verify that these propositions depend only on Propositions 7.35, B.1, and B.2, so no circularity is present in the arguments.

Lemma B.2 Let $X$ be a nonempty Borel space. There is a continuous function $f$ from $\mathcal{N}$ onto $X$.
Define $\mathcal{M}$ to be the set of infinite sequences of zeroes and ones. We can regard $\mathcal{M}$ as the countable product of copies of $\{0, 1\}$ and endow it with the product topology, where $\{0, 1\}$ has the discrete topology. By Tychonoff's theorem, $\mathcal{M}$ is compact with this topology. It is also metrizable as a complete separable space. Our proof of Proposition 7.16 consists of three parts. We show first that every uncountable Borel space contains a Borel subset homeomorphic to $\mathcal{M}$. We show second that every uncountable Borel space is isomorphic to a Borel subset of $\mathcal{M}$. We show finally that these first two facts imply that every uncountable Borel space is isomorphic to $\mathcal{M}$.
Lemma B.3 Let $X$ be an uncountable Borel space. There exists a compact set $K \subset X$ such that $\mathcal{M}$ and $K$ are homeomorphic.

Proof Let $f: \mathcal{N} \to X$ be the continuous, onto function of Lemma B.2. For each $x \in X$, choose an element $z_x \in \mathcal{N}$ such that $x = f(z_x)$. Let $S = \{z_x \mid x \in X\}$, so that $f$ is a one-to-one function from $S$ onto $X$. For $z \in S$, if possible choose an open neighborhood $T(z)$ of $z$ such that $S \cap T(z)$ is countable. Let $R$ be the set of all $z \in S$ for which such a $T(z)$ can be found. Since separable metrizable spaces have the Lindelöf property, there exists a countable subset $R'$ of $R$ such that $\bigcup_{z \in R'} T(z) = \bigcup_{z \in R} T(z)$, so
$$R \subset \bigcup_{z \in R'} [S \cap T(z)],$$
and $R$ is countable. Since $S$ is uncountable, $S - R$ must be infinite. Furthermore, if $z \in S - R$, then every open neighborhood of $z$ contains infinitely many points of $S - R$ (it contains uncountably many points of $S$, and only countably many of them lie in $R$).

Let $d$ be a metric on $\mathcal{N}$ consistent with its topology for which $(\mathcal{N}, d)$ is complete. For $\bar{z} \in \mathcal{N}$, the closed sphere of radius $r$ centered at $\bar{z}$ is the set $\{z \in \mathcal{N} \mid d(z, \bar{z}) \le r\}$. By the interior of this sphere, denoted $\mathrm{Int}\{z \in \mathcal{N} \mid d(z, \bar{z}) \le r\}$, we mean the open sphere $\{z \in \mathcal{N} \mid d(z, \bar{z}) < r\}$.

Let $z(0)$ and $z(1)$ be distinct points in $S - R$. Then $f[z(0)] \ne f[z(1)]$, so there exist disjoint open neighborhoods $U$ and $V$ of $f[z(0)]$ and $f[z(1)]$, respectively. Let $S(0)$ and $S(1)$ be disjoint closed spheres of radius no greater than one centered at $z(0)$ and $z(1)$ and contained in $f^{-1}(U)$ and $f^{-1}(V)$, respectively. We have that $f[S(0)]$ and $f[S(1)]$ are disjoint. Note also that for every $z \in (S - R) \cap \mathrm{Int}\, S(0)$, every open neighborhood of $z$ contains infinitely many points of $(S - R) \cap \mathrm{Int}\, S(0)$, and the same is true of $S(1)$. By the same procedure we can choose distinct points $z(0,0)$ and $z(0,1)$ in $(S - R) \cap \mathrm{Int}\, S(0)$ and distinct points $z(1,0)$ and $z(1,1)$ in $(S - R) \cap \mathrm{Int}\, S(1)$. We can also choose disjoint closed spheres $S(0,0)$, $S(0,1)$, $S(1,0)$, and $S(1,1)$ of radius no greater than $1/2$ centered at $z(0,0)$, $z(0,1)$, $z(1,0)$, and $z(1,1)$, respectively, so that $f[S(0,0)]$, $f[S(0,1)]$, $f[S(1,0)]$, and $f[S(1,1)]$ are all disjoint. We can choose these spheres so that $S(0,0)$ and $S(0,1)$ are contained in $S(0)$, while $S(1,0)$ and $S(1,1)$ are contained in $S(1)$. At the $k$th step of this process, we choose a collection of disjoint closed spheres $S(\mu_1, \ldots, \mu_k)$ of radius no greater than $1/k$ centered at distinct points $z(\mu_1, \ldots, \mu_k)$ in $S - R$, where each $\mu_j$ is either zero or one. Furthermore, we can choose the spheres so that for each $(\mu_1, \ldots, \mu_{k-1})$
(i) $f[S(\mu_1, \ldots, \mu_{k-1}, 0)] \cap f[S(\mu_1, \ldots, \mu_{k-1}, 1)] = \varnothing$,
(ii) $S(\mu_1, \ldots, \mu_{k-1}, \mu_k) \subset S(\mu_1, \ldots, \mu_{k-1})$, $\mu_k = 0, 1$.

For fixed $m = (\mu_1, \mu_2, \ldots) \in \mathcal{M}$, the sets $\{S(\mu_1, \ldots, \mu_k)\}$ form a decreasing sequence of closed sets with radius converging to zero. Hence $\{z(\mu_1, \ldots, \mu_k)\}$ is Cauchy and thus has a limit $\varphi(m) \in \bigcap_{k=1}^\infty S(\mu_1, \ldots, \mu_k)$. We show that $\varphi: \mathcal{M} \to \mathcal{N}$ is a homeomorphism onto its image. If $(\mu_1, \mu_2, \ldots)$ and $(\nu_1, \nu_2, \ldots)$ are distinct elements of $\mathcal{M}$, then for some integer $k$, we have $\mu_k \ne \nu_k$. Since $\varphi(\mu_1, \mu_2, \ldots) \in S(\mu_1, \ldots, \mu_k)$, $\varphi(\nu_1, \nu_2, \ldots) \in S(\nu_1, \ldots, \nu_k)$, and $S(\mu_1, \ldots, \mu_k)$ is disjoint from $S(\nu_1, \ldots, \nu_k)$, we see that $\varphi(\mu_1, \mu_2, \ldots) \ne \varphi(\nu_1, \nu_2, \ldots)$, so $\varphi$ is one-to-one. To show $\varphi$ is continuous, let $\{m_n\}$ be a sequence converging to $m \in \mathcal{M}$. Choose $\varepsilon > 0$ and let $k$ be a positive integer such that $2/k < \varepsilon$. There exists an $\bar{n}$ such that whenever $n \ge \bar{n}$, the elements $m_n$ and $m = (\mu_1, \mu_2, \ldots)$ agree in the first $k$ components, so both $\varphi(m_n)$ and $\varphi(m)$ are in $S(\mu_1, \ldots, \mu_k)$. This implies $d(\varphi(m_n), \varphi(m)) \le 2/k < \varepsilon$, so $\varphi$ is continuous. To show that $\varphi^{-1}$ is continuous, it suffices to show that $\varphi(F)$ is closed in $\varphi(\mathcal{M})$ whenever $F$ is closed in $\mathcal{M}$. This follows from the fact that $\mathcal{M}$ is compact and $\varphi$ is continuous. Define $\mathcal{N}_1 \subset \mathcal{N}$ to be the compact homeomorphic image of $\mathcal{M}$ under $\varphi$.

We now show that $f: \mathcal{N}_1 \to X$ is a homeomorphism onto its image. To see that $f$ is one-to-one, choose distinct points $z$ and $\hat{z}$ in $\mathcal{N}_1$. Then there exist distinct points $m = (\mu_1, \mu_2, \ldots)$ and $\hat{m} = (\hat{\mu}_1, \hat{\mu}_2, \ldots)$ in $\mathcal{M}$ such that $z = \varphi(m)$ and $\hat{z} = \varphi(\hat{m})$. Let $k$ be the smallest index with $\mu_k \ne \hat{\mu}_k$. Then by (i), $f[S(\mu_1, \ldots, \mu_k)] \cap f[S(\hat{\mu}_1, \ldots, \hat{\mu}_k)] = \varnothing$. Since $z \in S(\mu_1, \ldots, \mu_k)$ and $\hat{z} \in S(\hat{\mu}_1, \ldots, \hat{\mu}_k)$, we see that $f(z) \ne f(\hat{z})$, so $f$ is one-to-one. Just as in the case of $\varphi$, the continuity of $f^{-1}$ follows from the fact that $f$ is continuous and has a compact domain. The set $K = f(\mathcal{N}_1)$ is a compact subset of $X$ homeomorphic to $\mathcal{M}$. Q.E.D.
Lemma B.4 Let $X$ be an uncountable Borel space. There exists a Borel subset $L$ of $\mathcal{M}$ such that $X$ and $L$ are Borel-isomorphic.

Proof By definition, $X$ is homeomorphic to a Borel subset $B$ of a complete separable metric space $Y$. By Urysohn's and Alexandroff's theorems (Propositions 7.2 and 7.3), $Y$ is homeomorphic to a $G_\delta$-subset of the Hilbert cube $\mathcal{H}$, so $B$ and hence $X$ are homeomorphic to a Borel subset of $\mathcal{H}$. It suffices then to show that $\mathcal{H}$ is Borel-isomorphic to a Borel subset of $\mathcal{M}$. The idea of the proof is this. Each element in $\mathcal{H}$ is a sequence of real numbers in $[0, 1]$. Each of these numbers has a binary expansion, and by mixing all these expansions, we obtain an element in $\mathcal{M}$. Let us first define $\psi: [0, 1] \to \mathcal{M}$, which maps a real number into a sequence of zeroes and ones which is its binary expansion. It is easier to define $\psi^{-1}$, which we define on $\mathcal{M}_1 \cup \{(0, 0, 0, \ldots)\}$, where
$$\mathcal{M}_1 = \{(\mu_1, \mu_2, \ldots) \in \mathcal{M} \mid \mu_k = 1 \text{ for infinitely many } k\}.$$
It is given by
$$\psi^{-1}(\mu_1, \mu_2, \ldots) = \sum_{k=1}^{\infty} \frac{\mu_k}{2^k},$$
and it is easily verified that $\psi^{-1}$ is one-to-one, continuous, and maps onto $[0, 1]$. Since $\mathcal{M} - \mathcal{M}_1$ is countable, the domain of $\psi^{-1}$ is a Borel subset of $\mathcal{M}$, and Proposition 7.15 tells us that $\psi$ is a Borel isomorphism. Since we have not proved Proposition 7.15, we show directly that $\psi$ is Borel-measurable. Consider the collection of sets
$$R(k) = \{(\mu_1, \mu_2, \ldots) \in \mathcal{M} \mid \mu_k = 0\}, \qquad k = 1, 2, \ldots,$$
$$\tilde{R}(k) = \{(\mu_1, \mu_2, \ldots) \in \mathcal{M} \mid \mu_k = 1\}, \qquad k = 1, 2, \ldots.$$
These sets form a subbase for the topology of $\mathcal{M}$, so by the remark following Definition 7.6, we need only prove that $\psi^{-1}[R(k)]$ and $\psi^{-1}[\tilde{R}(k)]$ are Borel-measurable to conclude that $\psi$ is. Since one of these sets is the complement of the other, we may restrict attention to $\psi^{-1}[R(k)]$. Remembering that the domain of $\psi^{-1}$ is $\mathcal{M}_1 \cup \{(0, 0, 0, \ldots)\}$, we have
$$\psi^{-1}[R(k)] = \{0\} \cup \{\psi^{-1}(\mu_1, \mu_2, \ldots) \mid (\mu_1, \mu_2, \ldots) \in \mathcal{M}_1,\ \mu_k = 0\}$$
and
$$\{\psi^{-1}(\mu_1, \mu_2, \ldots) \mid (\mu_1, \mu_2, \ldots) \in \mathcal{M}_1,\ \mu_k = 0\} = \bigcup_{(\mu_1, \ldots, \mu_{k-1})} \Big\{x + \sum_{j=1}^{k-1} \frac{\mu_j}{2^j} \;\Big|\; 0 < x \le \frac{1}{2^k}\Big\},$$
which is a finite union of Borel sets. The proof that $\mathcal{M}\mathcal{M}\cdots$ and $\mathcal{M}$ are homeomorphic is essentially the same one given in Lemma 7.25, and we do not repeat it here. Let $\delta$ mapping $\mathcal{M}\mathcal{M}\cdots$ onto $\mathcal{M}$ be a homeomorphism and define $\varphi: \mathcal{H} \to \mathcal{M}$ by
$$\varphi(y_1, y_2, \ldots) = \delta(\psi(y_1), \psi(y_2), \ldots).$$
Then $\varphi$ is the required Borel-isomorphism. Q.E.D.
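The maps $\psi$ and $\psi^{-1}$ can be checked on rational points. So can the description of $\psi^{-1}[R(k)]$ as $\{0\}$ together with a finite union of half-open dyadic intervals. The sketch below is illustrative only, and its function names are hypothetical. `psi_digits` produces the first $K$ digits of $\psi(x)$. For $x > 0$ it chooses the expansion with infinitely many ones, which is what restricting the domain of $\psi^{-1}$ to $\mathcal{M}_1 \cup \{(0, 0, \ldots)\}$ forces. `in_preimage_R` tests membership in the finite union of intervals $(a, a + 2^{-k}]$ described in the proof.

```python
from fractions import Fraction
from itertools import product


def psi_digits(x, K):
    """First K digits of psi(x) for x in [0, 1]: psi(0) is all zeros, and for
    x > 0 the binary expansion chosen has infinitely many ones."""
    digits, r = [], Fraction(x)
    for k in range(1, K + 1):
        step = Fraction(1, 2 ** k)
        if r > step:  # strict '>' selects the non-terminating expansion
            digits.append(1)
            r -= step
        else:
            digits.append(0)
    return digits


def psi_inverse_partial(digits):
    """Partial sum of psi^{-1}(mu_1, mu_2, ...) = sum_k mu_k / 2^k."""
    return sum(Fraction(m, 2 ** k) for k, m in enumerate(digits, start=1))


def in_preimage_R(x, k):
    """Is x in {0} united with the sets {a + y | 0 < y <= 1/2^k}, where
    a = sum_{j<k} mu_j / 2^j ranges over all prefixes (mu_1, ..., mu_{k-1})?"""
    x = Fraction(x)
    if x == 0:
        return True
    for prefix in product((0, 1), repeat=k - 1):
        a = psi_inverse_partial(prefix)
        if 0 < x - a <= Fraction(1, 2 ** k):
            return True
    return False


# Usage: the k-th digit of psi(x) is zero exactly when x lies in psi^{-1}[R(k)].
samples = [Fraction(n, 24) for n in range(25)] + [Fraction(1, 3), Fraction(5, 7)]
print(all((psi_digits(x, k)[k - 1] == 0) == in_preimage_R(x, k)
          for x in samples for k in range(1, 6)))
```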
Lemma B.5 If $K_1$ and $L$ are Borel subsets of $\mathcal{M}$, $K_1 \subset L$, and $K_1$ is Borel-isomorphic to $\mathcal{M}$, then $L$ is Borel-isomorphic to $\mathcal{M}$.

Proof For Borel subsets $A$ and $B$ of $\mathcal{M}$, we write $A \cong B$ to indicate that $A$ and $B$ are Borel-isomorphic. Note that $A \cong B$ and $B \cong C$ implies $A \cong C$. Also, suppose $A_1, A_2, \ldots$ and $B_1, B_2, \ldots$ are two sequences of disjoint Borel sets with $A_i \cong B_i$ for every $i$. Then $\bigcup_{i=1}^\infty A_i \cong \bigcup_{i=1}^\infty B_i$. We note finally that if $A = A_1 \cup A_2$ and $A \cong B$, then $B = B_1 \cup B_2$, where $A_1 \cong B_1$ and $A_2 \cong B_2$. If $A_1$ and $A_2$ are disjoint, then $B_1$ and $B_2$ can be taken to be disjoint.

Under the hypotheses of the lemma, let $D_1 = \mathcal{M} - K_1$. Since $\mathcal{M} \cong K_1$ and $\mathcal{M} = K_1 \cup D_1$, there exist disjoint Borel sets $K_2$ and $D_2$ such that $K_1 = K_2 \cup D_2$, $K_1 \cong K_2$, and $D_1 \cong D_2$. Since $K_1 \cong K_2$ and $K_1 = K_2 \cup D_2$, there exist disjoint Borel sets $K_3$ and $D_3$ such that $K_2 = K_3 \cup D_3$, $K_2 \cong K_3$, and $D_2 \cong D_3$. Continuing in this manner, at the $n$th step we construct disjoint Borel sets $K_n$ and $D_n$ such that $K_{n-1} = K_n \cup D_n$, $K_{n-1} \cong K_n$, and $D_{n-1} \cong D_n$. Let $K_\infty = \bigcap_{n=1}^\infty K_n$. Then $\mathcal{M} = K_\infty \cup [\bigcup_{n=1}^\infty D_n]$, and all the sets on the right side of this equation are disjoint. Let $A_1 = \mathcal{M} - L$ and $B_1 = L - K_1$. Then $A_1$ and $B_1$ are disjoint and $D_1 = A_1 \cup B_1$. For each $n$, $D_1 \cong D_n$, so $D_n = A_n \cup B_n$, where $A_n$ and $B_n$ are disjoint Borel sets and $A_1 \cong A_n$, $B_1 \cong B_n$. In particular, $A_n \cong A_{n+1}$ for $n = 1, 2, \ldots$, and we have
$$\mathcal{M} = K_\infty \cup \Big[\bigcup_{n=1}^{\infty} A_n\Big] \cup \Big[\bigcup_{n=1}^{\infty} B_n\Big] \cong K_\infty \cup \Big[\bigcup_{n=2}^{\infty} A_n\Big] \cup \Big[\bigcup_{n=1}^{\infty} B_n\Big] = \Big\{K_\infty \cup \Big[\bigcup_{n=1}^{\infty} D_n\Big]\Big\} - A_1 = \mathcal{M} - A_1 = L.$$
Q.E.D.
We can now prove Proposition 7.16, and the proof clearly shows that Corollary 7.16.1 is also true.

Proposition B.3 Let $X$ and $Y$ be Borel spaces. Then $X$ and $Y$ are isomorphic if and only if they have the same cardinality.

Proof If $X$ and $Y$ are isomorphic, then clearly they must have the same cardinality. If $X$ and $Y$ both have the same finite or countably infinite cardinality, then their Borel σ-algebras are their power sets and any one-to-one onto mapping from one to the other is a Borel-isomorphism. If $X$ is uncountable, then by Lemma B.4 there exists a Borel isomorphism $\varphi: X \to \mathcal{M}$ such that $L = \varphi(X)$ is a Borel subset of $\mathcal{M}$. By Lemma B.3, $X$ contains a compact set $K$ which is homeomorphic to $\mathcal{M}$, so $\varphi(K)$ is Borel-isomorphic to $\mathcal{M}$ and $\varphi(K) \subset L$. Set $K_1 = \varphi(K)$ and use Lemma B.5 to conclude that $L$ and $\mathcal{M}$ are isomorphic. It follows that $X$ and $\mathcal{M}$ are isomorphic. If $Y$ is uncountable, the same argument shows that $Y$ and $\mathcal{M}$ are isomorphic, so $X$ and $Y$ are isomorphic. Q.E.D.
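B.3 An Analytic Set Which Is Not Borel-Measurable

Suslin schemes can be used to generate a strictly increasing sequence of σ-algebras on any given uncountable Borel space $X$. The first σ-algebra in this sequence is the Borel σ-algebra $\mathcal{B}_X$ and the second is the analytic σ-algebra $\mathcal{A}_X$. As a result of the following discussion, we will see that $\mathcal{A}_X$ is strictly larger than $\mathcal{B}_X$. The proof of this depends on a contradiction involving universal functions, which we now introduce. Let $\mathcal{M}_1$ be the set of sequences of zeroes and ones for which one occurs infinitely many times. If the nonzero components of $m \in \mathcal{M}_1$ are in positions $m_1, m_2, \ldots$, then we can think of $m$ as a mapping from $\mathcal{N}$ to $\mathcal{N}$ defined by
$$m(\zeta_1, \zeta_2, \ldots) = (\zeta_{m_1}, \zeta_{m_2}, \ldots).$$

Definition B.1 Let $\mathcal{P}$ be a paving of $\mathcal{N}$. A universal function $L$ for $\mathcal{P}$ is a mapping from $\mathcal{N}$ onto $\mathcal{P}$. If $\mathcal{Q}$ is another paving of $\mathcal{N}$ and
$$\{z \in \mathcal{N} \mid z \in L[m(z)]\} \in \mathcal{Q} \quad \text{for every } m \in \mathcal{M}_1, \tag{7}$$
we say $L$ is consistent with $\mathcal{Q}$.

Proposition B.4 Let $\mathcal{G}$ be the collection of open subsets of $\mathcal{N}$. There exists a universal function for $\mathcal{G}$ consistent with $\mathcal{G}$.

Proof The space $\mathcal{N}$ is separable, so its topology has a countable base $\{G(1), G(2), \ldots\}$, where the empty set is included among these basic open sets. Define $L: \mathcal{N} \to \mathcal{G}$ by
$$L(\zeta_1, \zeta_2, \ldots) = \bigcup_{k=1}^{\infty} G(\zeta_k).$$
It is clear that $L$ is a universal function for $\mathcal{G}$. Now choose $m \in \mathcal{M}_1$ and suppose the nonzero components of $m$ are in positions $m_1, m_2, \ldots$. Choose $z_0 = (\zeta_1^0, \zeta_2^0, \ldots)$ in the set
$$\{z \in \mathcal{N} \mid z \in L[m(z)]\} = \Big\{z \in \mathcal{N} \;\Big|\; z \in \bigcup_{k=1}^{\infty} G(\zeta_{m_k})\Big\}.$$
Then for some $\bar{k}$, we have $z_0 \in G(\zeta^0_{m_{\bar{k}}})$. Let
$$U_{\bar{k}}(z_0) = \{(\zeta_1, \zeta_2, \ldots) \in \mathcal{N} \mid \zeta_{m_{\bar{k}}} = \zeta^0_{m_{\bar{k}}}\}.$$
Then $G(\zeta^0_{m_{\bar{k}}}) \subset L[m(z)]$ for every $z \in U_{\bar{k}}(z_0)$, so $z \in L[m(z)]$ for every $z \in U_{\bar{k}}(z_0) \cap G(\zeta^0_{m_{\bar{k}}})$. Therefore $U_{\bar{k}}(z_0) \cap G(\zeta^0_{m_{\bar{k}}})$ is an open neighborhood of $z_0$ contained in $\{z \in \mathcal{N} \mid z \in L[m(z)]\}$, so this set is open. Q.E.D.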
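Several objects can be sketched with sequences represented as Python functions of a 1-based index. These are the selection maps $z \mapsto m(z)$ for $m \in \mathcal{M}_1$, the product $mn$ with $(mn)(z) = m[n(z)]$, and the partition $\{P_s \mid s \in \Sigma\}$ used in the proof of Proposition B.5. The partition below, $P_i = \{2^{i-1}(2j - 1) \mid j \ge 1\}$, is one concrete choice and an assumption. The proof only needs some partition into infinite sets, indexed through an enumeration of $\Sigma$ that is left implicit here. All function names are hypothetical. `merge` builds a single $z_0$ from countably many sequences $z_s$ with $m_s(z_0) = z_s$, which is the step that makes $K$ onto.

```python
def block_position(i, j):
    """j-th element (j >= 1) of the block P_i = {2^(i-1) * (2j - 1) : j >= 1}.
    The blocks P_1, P_2, ... partition the positive integers."""
    return 2 ** (i - 1) * (2 * j - 1)


def block_index(n):
    """Inverse of block_position: write n = 2^(i-1) * (2j - 1), return (i, j)."""
    i = 1
    while n % 2 == 0:
        n //= 2
        i += 1
    return i, (n + 1) // 2


def select(m, z):
    """m(z) = (zeta_{m_1}, zeta_{m_2}, ...); m maps k to the position m_k of
    the k-th one of m, and z maps j to zeta_j."""
    return lambda k: z(m(k))


def compose(m, n):
    """Positions of the ones of mn, where (mn)(z) = m[n(z)]: k -> n_{m_k}."""
    return lambda k: n(m(k))


def merge(zs):
    """Build z0 whose selection through block P_i equals zs(i) for every i."""
    def z0(n):
        i, j = block_index(n)
        return zs(i)(j)
    return z0


# Usage: recover one of the merged sequences through its block selector.
zs = lambda i: (lambda j: 100 * i + j)  # hypothetical sequences z_1, z_2, ...
z0 = merge(zs)
m_3 = lambda k: block_position(3, k)    # selector whose ones lie in P_3
print([select(m_3, z0)(k) for k in range(1, 5)])
```

Selecting through the block $P_3$ returns the first terms $301, 302, 303, 304$ of the third merged sequence. This is because `select(m_3, z0)` reads only the coordinates of `z0` with indices in $P_3$.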
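Given a paved space and a universal function for the paving which satisfies a condition like (7), it is possible to construct similar universal functions for larger pavings. We show first how this is done when the given paving is extended by the use of Suslin schemes.

Proposition B.5 Let $\mathcal{P}$ be a paving for $\mathcal{N}$ and suppose that there exists a universal function for $\mathcal{P}$ consistent with $\mathcal{S}(\mathcal{P})$. Then there exists a universal function for $\mathcal{S}(\mathcal{P})$ consistent with $\mathcal{S}(\mathcal{P})$.

Proof Fix a partition $\{P_s \mid s \in \Sigma\}$ of the positive integers into countably many countably infinite sets, and define for each $s \in \Sigma$ a corresponding $m_s = (\mu_1(s), \mu_2(s), \ldots) \in \mathcal{M}_1$ by
$$\mu_k(s) = \begin{cases} 1 & \text{if } k \in P_s, \\ 0 & \text{if } k \notin P_s. \end{cases} \tag{8}$$
Let $L$ be a universal function for $\mathcal{P}$ consistent with $\mathcal{S}(\mathcal{P})$. Define $K: \mathcal{N} \to \mathcal{S}(\mathcal{P})$ by
$$K(z_0) = \bigcup_{z \in \mathcal{N}} \bigcap_{s < z} L[m_s(z_0)]. \tag{9}$$
To show that $K$ is onto, we must show that given any Suslin scheme $S$ for $\mathcal{P}$, there exists $z_0 \in \mathcal{N}$ such that
$$N(S) = K(z_0). \tag{10}$$
If $S: \Sigma \to \mathcal{P}$ is given and $s \in \Sigma$, then $S(s) \in \mathcal{P}$. Since $L$ is a universal function for $\mathcal{P}$, there exists $z_s \in \mathcal{N}$ for which $S(s) = L(z_s)$. If $z_0$ is chosen so that $m_s(z_0) = z_s$ for every $s \in \Sigma$, then (10) is satisfied. Such a choice of $z_0$ is possible because $m_s(z_0)$ depends only on the components of $z_0$ with indices in $P_s$. Therefore $K$ is a universal function for $\mathcal{S}(\mathcal{P})$. If $m, n \in \mathcal{M}_1$, then there is an element in $\mathcal{M}_1$, which we denote by $mn$, such that $(mn)(z) = m[n(z)]$ for every $z \in \mathcal{N}$. In fact, if the nonzero elements of $m$ are in positions $(m_1, m_2, \ldots)$ and the nonzero elements of $n$ are in positions $(n_1, n_2, \ldots)$, then the nonzero elements of $mn$ are in positions $(n_{m_1}, n_{m_2}, \ldots)$. Now suppose $m \in \mathcal{M}_1$. We have
$$\{z \in \mathcal{N} \mid z \in K[m(z)]\} = \bigcup_{z' \in \mathcal{N}} \bigcap_{s < z'} \{z \in \mathcal{N} \mid z \in L[(m_s m)(z)]\},$$
which, since $L$ is consistent with $\mathcal{S}(\mathcal{P})$, is the nucleus of a Suslin scheme for $\mathcal{S}(\mathcal{P})$. It follows from Proposition B.2 that $K$ is consistent with $\mathcal{S}(\mathcal{P})$. Q.E.D.

Corollary B.5.1 There is a universal function for $\mathcal{S}(\mathcal{F}_\mathcal{N})$ consistent with $\mathcal{S}(\mathcal{F}_\mathcal{N})$.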
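Proof Let $\mathcal{G}$ be the collection of open subsets of $\mathcal{N}$. By Propositions B.4 and B.5, there is a universal function for $\mathcal{S}(\mathcal{G})$ consistent with $\mathcal{S}(\mathcal{G})$, and it remains only to show that $\mathcal{S}(\mathcal{G}) = \mathcal{S}(\mathcal{F}_\mathcal{N})$. Since $\mathcal{G} \subset \mathcal{B}_\mathcal{N}$, it follows from Proposition 7.36 that $\mathcal{S}(\mathcal{G}) \subset \mathcal{S}(\mathcal{F}_\mathcal{N})$. Every closed subset of $\mathcal{N}$ is a $G_\delta$-set and, by Proposition 7.35, $\mathcal{G}_\delta \subset \mathcal{S}(\mathcal{G})_\delta = \mathcal{S}(\mathcal{G})$, so we see that $\mathcal{F}_\mathcal{N} \subset \mathcal{S}(\mathcal{G})$. Proposition B.2 implies that $\mathcal{S}(\mathcal{F}_\mathcal{N}) \subset \mathcal{S}[\mathcal{S}(\mathcal{G})] = \mathcal{S}(\mathcal{G})$. Q.E.D.

Corollary B.5.2 Let $L$ be a universal function for $\mathcal{S}(\mathcal{F}_\mathcal{N})$ consistent with $\mathcal{S}(\mathcal{F}_\mathcal{N})$. The set
$$A_0 = \{z \in \mathcal{N} \mid z \in L(z)\}$$
is analytic but not Borel-measurable, and $\mathcal{N} - A_0$ is not analytic.

Proof The set $A_0$ is analytic because $L$ is consistent with $\mathcal{S}(\mathcal{F}_\mathcal{N})$. We have
$$\mathcal{N} - A_0 = \{z \in \mathcal{N} \mid z \notin L(z)\}, \tag{11}$$
and if this set is analytic, then there exists $z_0 \in \mathcal{N}$ such that
$$\mathcal{N} - A_0 = L(z_0). \tag{12}$$
If $z_0 \in A_0$, then $z_0 \notin L(z_0)$ by (12), and (11) is contradicted. If $z_0 \in \mathcal{N} - A_0$, then $z_0 \in L(z_0)$ by (11), and (12) is contradicted. Therefore $\mathcal{N} - A_0$ is not analytic, thus not Borel-measurable, so $A_0$ is also not Borel-measurable. Q.E.D.

Proposition B.6 Let $X$ be an uncountable Borel space. There exists an analytic subset $A$ of $X$ such that $A$ is not Borel-measurable and $X - A$ is not analytic.

Proof Let $\varphi: \mathcal{N} \to X$ be a Borel isomorphism from $\mathcal{N}$ onto $X$ (Corollary 7.16.1), and let $A_0 \subset \mathcal{N}$ be as in Corollary B.5.2. Then $A = \varphi(A_0)$ is analytic, but since $\mathcal{N} - A_0 = \varphi^{-1}(X - A)$ is not analytic, neither is $X - A$. It follows that $A$ is not Borel-measurable. Q.E.D.

B.4 The Limit σ-algebra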
We construct a collection of σ-algebras indexed by the countable ordinals, and at the end of this process we arrive at the limit σ-algebra, denoted by $\mathcal{L}_X$. The proofs of many of the properties of $\mathcal{L}_X$, and indeed the definition of $\mathcal{L}_X$, proceed by transfinite induction. We also make frequent use of the fact that if $\{\alpha_n\}$ is a sequence of countable ordinals, then there exists a countable ordinal $\bar{\alpha}$ such that $\alpha_n < \bar{\alpha}$ for every $n$. In keeping with standard convention, we denote by $\Omega$ the first uncountable ordinal.
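Definition B.2 Let $X$ be a Borel space and $\mathcal{G}_X$ the collection of open subsets of $X$. For each countable ordinal $\alpha$, we define
$$\mathcal{L}_X^0 = \sigma(\mathcal{G}_X), \tag{13}$$
$$\mathcal{L}_X^\alpha = \sigma\Big[\mathcal{S}\Big(\bigcup_{\beta < \alpha} \mathcal{L}_X^\beta\Big)\Big], \qquad 0 < \alpha < \Omega. \tag{14}$$
The limit σ-algebra is
$$\mathcal{L}_X = \bigcup_{\alpha < \Omega} \mathcal{L}_X^\alpha. \tag{15}$$

We prove later (Proposition B.10) that $\mathcal{L}_X$ is in fact a σ-algebra. Note that $\mathcal{L}_X^0 = \mathcal{B}_X$ and $\mathcal{L}_X^1 = \mathcal{A}_X$. When $X$ is countable, $\mathcal{L}_X^\alpha = \mathcal{B}_X$ for every $\alpha < \Omega$. If $X$ is uncountable, there is no loss of generality in assuming $X = \mathcal{N}$ when dealing with the σ-algebras $\mathcal{L}_X^\alpha$ and $\mathcal{L}_X$. This is the subject of the next proposition.

Proposition B.7 Let $X$ be an uncountable Borel space and let $\varphi: \mathcal{N} \to X$ be a Borel isomorphism from $\mathcal{N}$ onto $X$. (Such an isomorphism exists by Corollary 7.16.1.) Then for every $\alpha < \Omega$,
$$\mathcal{L}_X^\alpha = \varphi(\mathcal{L}_\mathcal{N}^\alpha), \tag{16}$$
and
$$\mathcal{L}_X = \varphi(\mathcal{L}_\mathcal{N}), \tag{17}$$
where for a collection $\mathcal{C}$ of subsets of $\mathcal{N}$ we write $\varphi(\mathcal{C}) = \{\varphi(C) \mid C \in \mathcal{C}\}$.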
Proof We prove (16) by transfinite induction. For a = 0, (16) clearly holds. If (16) holds for all P < a, where a < R, then we have I,J~%=P,=P-~(UYR). D
Let S be a Suslin scheme for
B
UB<.2$. Then
where
Since cp S is a Suslin scheme for 0
Up,, 9$,we see that
On the other hand, if R is a Suslin scheme for
Up,, Ti, then
APPENDIX B
where (q-'oR)(s)=q-'[R(s)]
VSEC.
This shows that N ( R ) ~ c p [ Y ( u ~ < ~ . l ~ !which , ) l , proves the reverse of set containment (18). Therefore,
Since q is one-to-one, we also have
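Now by (19), $\varphi(\mathcal{L}^\alpha_{\mathcal{N}})$ is a σ-algebra containing $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X)$, so
$$\varphi(\mathcal{L}^\alpha_{\mathcal{N}}) \supset \mathcal{L}^\alpha_X. \tag{21}$$
By (20), $\varphi^{-1}(\mathcal{L}^\alpha_X)$ is a σ-algebra containing $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_{\mathcal{N}})$, so
$$\varphi^{-1}(\mathcal{L}^\alpha_X) \supset \mathcal{L}^\alpha_{\mathcal{N}}. \tag{22}$$
Since φ is one-to-one, (21) implies
$$\mathcal{L}^\alpha_{\mathcal{N}} \supset \varphi^{-1}(\mathcal{L}^\alpha_X), \tag{23}$$
and (22) implies
$$\varphi(\mathcal{L}^\alpha_{\mathcal{N}}) \subset \mathcal{L}^\alpha_X. \tag{24}$$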
Relations (21)-(24) imply (16). Relation (17) follows from (15) and (16). Q.E.D.

We have already seen that in an uncountable Borel space X, $\mathcal{L}^0_X$ is properly contained in $\mathcal{L}^1_X$ (Proposition B.6). We would like to show more generally that if β < α < Ω, then $\mathcal{L}^\beta_X$ is properly contained in $\mathcal{L}^\alpha_X$. Our method for doing this is to generalize Corollary B.5.1 and then generalize Corollary B.5.2. The following lemmas are a step in this direction. If 𝒫 is a paving for a space X, we denote by $\tilde{\mathcal{P}}$ the paving
$$\tilde{\mathcal{P}} = \mathcal{P} \cup \{X - P \mid P \in \mathcal{P}\}.$$

Lemma B.6 Let 𝒫 be a paving for $\mathcal{N}$ which contains the open subsets of $\mathcal{N}$, and suppose there exists a universal function for 𝒫 consistent with 𝒫. Then there exists a universal function for $\tilde{\mathcal{P}}$ consistent with σ(𝒫).

Proof Let L be a universal function for 𝒫 consistent with 𝒫. Define $K: \mathcal{N} \to \tilde{\mathcal{P}}$ by
$$K(\zeta_1,\zeta_2,\ldots) = \begin{cases} L(\zeta_2,\zeta_3,\zeta_4,\ldots) & \text{if } \zeta_1 \text{ is odd},\\ \mathcal{N} - L(\zeta_2,\zeta_3,\zeta_4,\ldots) & \text{if } \zeta_1 \text{ is even}.\end{cases}$$
It is clear that K is a universal function for $\tilde{\mathcal{P}}$. As in the proof of Proposition B.4, choose $m \in \mathcal{N}$, and suppose that the nonzero components of m are in positions $m_1, m_2, \ldots$. Then
Since L is consistent with 𝒫 and 𝒫 contains every open set, we have that every set in (26) is in σ(𝒫). It follows that K is consistent with σ(𝒫). Q.E.D.

Lemma B.7 Let α be a countable ordinal. For each β < α, let $\mathcal{P}_\beta$ be a paving for $\mathcal{N}$ which contains the collection 𝒢 of open sets, and assume that there exists a universal function $L_\beta$ for $\mathcal{P}_\beta$ consistent with $\mathcal{P}_\beta$. Then there exists a universal function for $\bigcup_{\beta<\alpha}\mathcal{P}_\beta$ consistent with $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{P}_\beta)$.
Proof The set of ordinals $\{\beta \mid \beta < \alpha\}$ is countable whenever α < Ω, so there exists a partition $\{P(\beta) \mid \beta < \alpha\}$ of the positive integers such that P(β) is nonempty for each β < α. Define a universal function for $\bigcup_{\beta<\alpha}\mathcal{P}_\beta$ by
$$L(\zeta_1,\zeta_2,\ldots) = L_\beta(\zeta_2,\zeta_3,\ldots) \quad \text{if } \zeta_1 \in P(\beta).$$
Let $m \in \mathcal{N}$ have nonzero components $m_1, m_2, \ldots$. Then
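and this set is in $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{P}_\beta)$ by Proposition 7.35(b), (c), and the fact that each $L_\beta$ is consistent with $\mathcal{P}_\beta$. Q.E.D.

Proposition B.8 For each α < Ω, there is a universal function for $\mathcal{S}(\mathcal{L}^\alpha_{\mathcal{N}})$ consistent with $\mathcal{S}(\mathcal{L}^\alpha_{\mathcal{N}})$.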
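Proof For simplicity of notation, we suppress the subscript $\mathcal{N}$. The proof is by transfinite induction. When α = 0, the result follows from Corollary B.5.1. Assume now that the result holds for every β < α, where 0 < α < Ω. We prove it for α. By Lemma B.7 and the induction assumption, there is a universal function for $\bigcup_{\beta<\alpha}\mathcal{S}(\mathcal{L}^\beta)$ consistent with $\mathcal{S}[\bigcup_{\beta<\alpha}\mathcal{S}(\mathcal{L}^\beta)]$. Now
$$\bigcup_{\beta<\alpha}\mathcal{L}^\beta \subset \bigcup_{\beta<\alpha}\mathcal{S}(\mathcal{L}^\beta) \subset \mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big), \tag{27}$$
and applying 𝒮 to both sides of (27) and using Proposition B.2, we obtain
$$\mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big) = \mathcal{S}\Big[\bigcup_{\beta<\alpha}\mathcal{S}(\mathcal{L}^\beta)\Big]. \tag{28}$$
From Proposition B.5 and (28) we have the existence of a universal function for $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta)$ consistent with $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta)$, and Lemma B.6 implies the existence of a universal function for $[\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta)]^\sim$ consistent with
$$\sigma\Big[\mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big)\Big] = \mathcal{L}^\alpha, \tag{29}$$
where the equality follows from (14), since α ≥ 1. But from (29), and since σ(𝒫) ⊂ 𝒮(𝒫) whenever the paving 𝒫 is closed under complementation,
$$\Big[\mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big)\Big]^\sim \subset \mathcal{L}^\alpha \subset \mathcal{S}\Big(\Big[\mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big)\Big]^\sim\Big),$$
and applying 𝒮 throughout and using Proposition B.2, we see that
$$\mathcal{S}(\mathcal{L}^\alpha) = \mathcal{S}\Big(\Big[\mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta\Big)\Big]^\sim\Big). \tag{30}$$
From Proposition B.5 and (30) we have the existence of a universal function for $\mathcal{S}(\mathcal{L}^\alpha)$ consistent with $\mathcal{S}(\mathcal{L}^\alpha)$. Q.E.D.

Proposition B.9 Let X be an uncountable Borel space. If β < α < Ω, then $\mathcal{L}^\beta_X$ is properly contained in $\mathcal{L}^\alpha_X$.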
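Proof We assume without loss of generality that $X = \mathcal{N}$ (Proposition B.7) and suppress the subscript $\mathcal{N}$. It is clear that for β < α we have $\mathcal{L}^\beta \subset \mathcal{L}^\alpha$. Let L be a universal function for $\mathcal{S}(\mathcal{L}^\beta)$ consistent with $\mathcal{S}(\mathcal{L}^\beta)$ (Proposition B.8), and define
$$A = \{z \in \mathcal{N} \mid z \in L(z)\}.$$
Then $A \in \mathcal{S}(\mathcal{L}^\beta)$. If $\mathcal{N} - A \in \mathcal{S}(\mathcal{L}^\beta)$, then for some $z_0 \in \mathcal{N}$ we have
$$\mathcal{N} - A = L(z_0).$$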
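If $z_0 \in A$, then $z_0 \notin L(z_0)$, contradicting the definition of A. If $z_0 \in \mathcal{N} - A$, then $z_0 \in L(z_0)$, and again the definition of A is contradicted. It follows that $\mathcal{N} - A \notin \mathcal{S}(\mathcal{L}^\beta)$. But $\mathcal{N} - A \in \mathcal{L}^\alpha$, since $A \in \mathcal{S}(\mathcal{L}^\beta) \subset \mathcal{L}^\alpha$ and $\mathcal{L}^\alpha$ is a σ-algebra, so $\mathcal{L}^\beta$ is properly contained in $\mathcal{L}^\alpha$. Q.E.D.

Proposition B.10 Let X be a Borel space. The limit σ-algebra $\mathcal{L}_X$ is contained in $\mathcal{U}_X$, and
$$\mathcal{L}_X = \mathcal{S}(\mathcal{L}_X). \tag{31}$$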
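Indeed, $\mathcal{L}_X$ is the smallest σ-algebra containing the open subsets of X which satisfies (31).

Proof The result is trivial if X is countable, so assume that X is uncountable. It is clear that $\varnothing \in \mathcal{L}_X$ and $\mathcal{L}_X$ is closed under complementation, so we need only verify that $\mathcal{L}_X$ is closed under countable unions in order to show that it is a σ-algebra. If $Q_1, Q_2, \ldots$ is a sequence of sets in $\mathcal{L}_X$, then $Q_k \in \mathcal{L}^{\alpha_k}_X$ for some $\alpha_k < \Omega$, $k = 1, 2, \ldots$. Then for some α < Ω, we have $Q_k \in \mathcal{L}^\alpha_X$ for every k, so $\bigcup_{k=1}^\infty Q_k \in \mathcal{L}^\alpha_X \subset \mathcal{L}_X$. We prove by transfinite induction that $\mathcal{L}^\alpha_X \subset \mathcal{U}_X$ for every α < Ω. This is clearly the case if α = 0. If $\mathcal{L}^\beta_X \subset \mathcal{U}_X$ for every β < α, where α < Ω, then by Lusin's theorem (Proposition 7.42), $\mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X) \subset \mathcal{U}_X$. It follows that $\mathcal{L}^\alpha_X \subset \mathcal{U}_X$. Therefore $\mathcal{L}_X \subset \mathcal{U}_X$.

We now prove (31). As a result of Proposition 7.35(d), it suffices to prove that $\mathcal{L}_X \supset \mathcal{S}(\mathcal{L}_X)$. Let S be a Suslin scheme for $\mathcal{L}_X$. Since Σ is countable, there exists α < Ω such that $S(s) \in \mathcal{L}^\alpha_X$ for every $s \in \Sigma$. Then $N(S) \in \mathcal{L}^{\alpha+1}_X \subset \mathcal{L}_X$, and (31) is proved.

Suppose 𝒟 is a σ-algebra containing the open subsets of X which satisfies 𝒟 = 𝒮(𝒟). Clearly, $\mathcal{B}_X = \mathcal{L}^0_X \subset \mathcal{D}$. If $\mathcal{L}^\beta_X \subset \mathcal{D}$ for every β < α, where α < Ω, then (14) implies that $\mathcal{L}^\alpha_X \subset \mathcal{D}$. Therefore 𝒟 contains $\mathcal{L}_X$, which must be the smallest σ-algebra containing the open subsets of X and satisfying (31). Q.E.D.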
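The diagonal construction in the proof of Proposition B.9 (and in Corollary B.5.2) is Cantor's argument: whatever map L is used to list sets, the complement of the "diagonal" set A = {z | z ∈ L(z)} is never of the form L(z0). The sketch below checks only this combinatorial core, exhaustively, on a hypothetical three-point stand-in for $\mathcal{N}$ and every map from it into its power set; it says nothing about the measurability or consistency properties that the proof also needs.

```python
from itertools import combinations, product

def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(universe, L):
    """A = {z | z in L(z)}, the set built in the proof of Proposition B.9."""
    return frozenset(z for z in universe if z in L[z])

def complement_is_listed(universe, L):
    """True if the complement of A equals L(z0) for some index z0."""
    complement = frozenset(universe) - diagonal_set(universe, L)
    return any(L[z0] == complement for z0 in universe)

# Toy index set (an assumption for illustration only).
U = (0, 1, 2)
subsets = powerset(U)
# Try every map L: U -> P(U), 8**3 = 512 maps in all.
counterexamples = 0
for images in product(subsets, repeat=len(U)):
    L = dict(zip(U, images))
    if complement_is_listed(U, L):
        counterexamples += 1
print(counterexamples)  # 0
```

No map survives, for the same two-case reason as in the text: z0 would have to lie both in and out of L(z0).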
A major shortcoming of the analytic σ-algebra is that the composition of analytically measurable functions is not necessarily analytically measurable (cf. remarks following Proposition 7.50). However, the composition of limit-measurable functions is limit-measurable. We first give a formal definition of these terms and then prove the preceding statements.
Definition B.3 Let X and Y be Borel spaces, D ⊂ X, and 𝒟 a σ-algebra on X. A function f: D → Y is said to be 𝒟-measurable iff $f^{-1}(B) \in \mathcal{D}$ for every $B \in \mathcal{B}_Y$. If $\mathcal{D} = \mathcal{L}_X$, we say that f is limit-measurable. The σ-algebra 𝒟 is said to be closed under composition of functions if, whenever f: X → X is 𝒟-measurable and P ∈ 𝒟, then $f^{-1}(P) \in \mathcal{D}$.
In Definition B.3 there is no mention of a 𝒟-measurable function g mapping X into a Borel space Y with which to compose f. If there were such a g, then to check that g ∘ f: X → Y is 𝒟-measurable, we would check that $f^{-1}[g^{-1}(B)] \in \mathcal{D}$ for every $B \in \mathcal{B}_Y$. Since $g^{-1}(B) \in \mathcal{D}$, it suffices to check that $f^{-1}(P) \in \mathcal{D}$ for every P ∈ 𝒟, which is the condition stated in Definition B.3. The stipulation in Definition B.3 that f have the same domain and range space is inconsequential as long as $\mathcal{D} = \mathcal{L}^\alpha_X$ for some α < Ω or $\mathcal{D} = \mathcal{L}_X$ (see Proposition B.7). These are the only cases we consider. The closure of a σ-algebra under composition of functions and the satisfaction of an equation like (31) are intimately related, as the following lemma shows.
Lemma B.8 Let X be a Borel space and let 𝒟 be a σ-algebra on X. If 𝒟 contains the analytic subsets of X and is closed under composition of functions, then
$$\mathcal{D} = \mathcal{S}(\mathcal{D}).$$

Proof If X is countable, the result is trivial, so we assume that X is uncountable. In light of Proposition 7.35(d), we need only prove that under the assumptions of the lemma we have 𝒟 ⊃ 𝒮(𝒟). To do this, for an arbitrary Suslin scheme S for 𝒟 we construct a 𝒟-measurable function f: X → X and an analytic set P ⊂ X such that
$$N(S) = f^{-1}(P). \tag{32}$$
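Let $\varphi: \mathcal{N} \to X$ be a Borel isomorphism from $\mathcal{N}$ onto X (Corollary 7.16.1), and let ψ be a one-to-one function from the set of positive integers onto Σ. For k = 1, 2, …, define $\chi_k: \mathcal{N} \to \{1, 2\}$ by
$$\chi_k(z) = \begin{cases} 1 & \text{if } \varphi(z) \in S[\psi(k)],\\ 2 & \text{otherwise},\end{cases}$$
and define $\chi: \mathcal{N} \to \mathcal{N}$ by $\chi(z) = (\chi_1(z), \chi_2(z), \ldots)$. Finally, let f: X → X be given by $f = \varphi \circ \chi \circ \varphi^{-1}$. We show that f is 𝒟-measurable. This is equivalent to showing that $\chi \circ \varphi^{-1}: X \to \mathcal{N}$ is 𝒟-measurable. But $\chi \circ \varphi^{-1}$ takes values in $\{(\zeta_1, \zeta_2, \ldots) \in \mathcal{N} \mid \zeta_n \le 2\ \forall n\}$, which has as a subbase the collection of open sets $\{R(k), \bar{R}(k) \mid k = 1, 2, \ldots\}$, where
$$R(k) = \{(\zeta_1,\zeta_2,\ldots) \mid \zeta_n \le 2\ \forall n \text{ and } \zeta_k = 1\},\qquad \bar{R}(k) = \{(\zeta_1,\zeta_2,\ldots) \mid \zeta_n \le 2\ \forall n \text{ and } \zeta_k = 2\}. \tag{33}$$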
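By the remark following Definition 7.6, the 𝒟-measurability of the sets
$$(\chi\circ\varphi^{-1})^{-1}[R(k)] = S[\psi(k)], \qquad (\chi\circ\varphi^{-1})^{-1}[\bar{R}(k)] = X - S[\psi(k)], \qquad k = 1, 2, \ldots,$$
implies the 𝒟-measurability of $\chi \circ \varphi^{-1}$. It follows that f is 𝒟-measurable. Define P ⊂ X by
$$P = \bigcup_{z\in\mathcal{N}} \bigcap_{s<z} \varphi\big(R[\psi^{-1}(s)]\big),$$
where R(k) is given by (33). Then P is an analytic subset of X, so P ∈ 𝒟. We have
$$f^{-1}(P) = \bigcup_{z\in\mathcal{N}} \bigcap_{s<z} \varphi\big(\chi^{-1}\big(R[\psi^{-1}(s)]\big)\big) = \bigcup_{z\in\mathcal{N}} \bigcap_{s<z} S(s) = N(S),$$
so (32) holds.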
Q.E.D.
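The proof of Lemma B.8 only needs some one-to-one function ψ from the positive integers onto Σ. One concrete choice, offered as an illustrative assumption rather than the construction intended in the text, and assuming Σ is the set of nonempty finite sequences of positive integers: write k in binary, cut the string before every 1, and record the length of each block "10…0". Every binary string beginning with 1 splits into such blocks in exactly one way, which is why the map is a bijection. The names `psi` and `psi_inverse` are hypothetical.

```python
from __future__ import annotations

def psi(k: int) -> tuple[int, ...]:
    """Map a positive integer k to a nonempty finite sequence of positive integers.

    The binary expansion of k starts with '1'; cutting it before every '1'
    gives blocks '1', '10', '100', ..., and a block of length m becomes m.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq, length = [], 0
    for bit in bin(k)[2:]:
        if bit == "1" and length > 0:
            seq.append(length)  # close the previous block
            length = 0
        length += 1
    seq.append(length)
    return tuple(seq)

def psi_inverse(s: tuple[int, ...]) -> int:
    """Inverse of psi: rebuild the binary string block by block."""
    if not s or any(m < 1 for m in s):
        raise ValueError("s must be a nonempty sequence of positive integers")
    return int("".join("1" + "0" * (m - 1) for m in s), 2)

print([psi(k) for k in range(1, 8)])
# [(1,), (2,), (1, 1), (3,), (2, 1), (1, 2), (1, 1, 1)]
```

Any other bijection would serve equally well; the proof uses only that ψ is one-to-one and onto.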
Proposition B.11 Let X be a Borel space. The limit σ-algebra $\mathcal{L}_X$ is the smallest σ-algebra containing the analytic subsets of X which is closed under composition of functions.
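Proof We show first that $\mathcal{L}_X$ is closed under composition of functions. It suffices to show that if f: X → X is $\mathcal{L}_X$-measurable, α < Ω, and $Q \in \mathcal{L}^\alpha_X$, then $f^{-1}(Q) \in \mathcal{L}_X$. If α = 0, this is true by definition. Suppose that for some α < Ω and for every β < α and $C \in \mathcal{L}^\beta_X$ we have $f^{-1}(C) \in \mathcal{L}_X$. We show that $f^{-1}(Q) \in \mathcal{L}_X$ for every $Q \in \mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X)$, and this implies that $f^{-1}(Q) \in \mathcal{L}_X$ for every $Q \in \mathcal{L}^\alpha_X$. Choose $Q \in \mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X)$ and let S be a Suslin scheme for $\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X$ such that Q = N(S). Then
$$f^{-1}(Q) = N(f^{-1}\circ S), \tag{34}$$
where $f^{-1}\circ S$ is the Suslin scheme defined by
$$(f^{-1}\circ S)(s) = f^{-1}[S(s)] \quad \forall s \in \Sigma.$$
By the induction hypothesis, $f^{-1}\circ S$ is a Suslin scheme for $\mathcal{L}_X$, and we have from Proposition B.10 and (34) that $f^{-1}(Q) \in \mathcal{L}_X$. The fact that $\mathcal{L}_X$ is the smallest σ-algebra containing the analytic subsets of X which is closed under composition of functions follows from Proposition B.10 and Lemma B.8. Q.E.D.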
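Identity (34) holds for any function f because inverse images commute with arbitrary unions and intersections. The sketch below checks this on a finite toy model. To make everything computable it truncates the Suslin operation to sequences of a fixed length over a finite alphabet. That truncation is an assumption of the sketch and is not the operation N(S), which ranges over all of $\mathcal{N}$. The helper names are hypothetical.

```python
import random
from itertools import product

def truncated_suslin(S, alphabet, depth, universe):
    """Finite analogue of N(S): union over z in alphabet**depth of the
    intersection of S(z_1, ..., z_n) for n = 1, ..., depth."""
    result = set()
    for z in product(alphabet, repeat=depth):
        block = set(universe)
        for n in range(1, depth + 1):
            block &= S[z[:n]]
        result |= block
    return result

def preimage(f, domain, target):
    """f^{-1}(target) for a function given as a dict on a finite domain."""
    return {x for x in domain if f[x] in target}

rng = random.Random(0)
X = range(12)
alphabet, depth = (1, 2), 3
# A random "Suslin scheme": one subset of X for every prefix s.
S = {s: {x for x in X if rng.random() < 0.6}
     for n in range(1, depth + 1) for s in product(alphabet, repeat=n)}
f = {x: rng.randrange(12) for x in X}          # an arbitrary map X -> X
f_inv_S = {s: preimage(f, X, S[s]) for s in S}  # the scheme f^{-1} o S

lhs = preimage(f, X, truncated_suslin(S, alphabet, depth, X))
rhs = truncated_suslin(f_inv_S, alphabet, depth, X)
print(lhs == rhs)  # True
```

No property of f is used, which matches the proof: the measurability hypotheses enter only through the induction hypothesis on the sets $f^{-1}[S(s)]$. For the image version used in the proof of Proposition B.7, φ must in addition be one-to-one, since images commute with intersections only for injective maps.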
Corollary B.11.1 Let X, Y, and Z be uncountable Borel spaces. If f: X → Y and g: Y → Z are limit-measurable, then g ∘ f: X → Z is limit-measurable. In particular, if f and g are analytically measurable, then g ∘ f is limit-measurable. It is possible to choose f and g to be analytically measurable so that g ∘ f is not analytically measurable.

Proof Proposition B.9 implies that $\mathcal{A}_X$, $\mathcal{A}_Y$, and $\mathcal{A}_Z$ are properly contained in $\mathcal{L}_X$, $\mathcal{L}_Y$, and $\mathcal{L}_Z$, respectively. Apply Proposition B.7 to the results of Proposition B.11. Q.E.D.

Using an argument similar to the first part of the proof of Proposition B.11, the reader may verify that if f: X → Y and g: Y → Z are analytically measurable, then g ∘ f is in fact $\mathcal{L}^2_X$-measurable. Indeed, one can show by induction that if f is $\mathcal{L}^m_X$-measurable and g is $\mathcal{L}^n_Y$-measurable, where m and n are integers, then g ∘ f is $\mathcal{L}^{m+n}_X$-measurable.

Let X be a Borel space, and for $Q \in \mathcal{U}_X$ define $\theta_Q: P(X) \to [0,1]$ by
$$\theta_Q(p) = p(Q). \tag{35}$$
Then $\theta_Q$ is universally measurable (Corollary 7.46.1). If Q is Borel-measurable, then $\theta_Q$ is Borel-measurable (Proposition 7.25), and if Q is analytically measurable, then $\theta_Q$ is analytically measurable (Proposition 7.43). We consider the case when $Q \in \mathcal{L}_X$.

Proposition B.12 Let X be a Borel space. If $Q \in \mathcal{L}_X$, then $\theta_Q$ defined by (35) is $\mathcal{L}_{P(X)}$-measurable. In fact, if α < Ω and $Q \in \mathcal{L}^\alpha_X$, then $\theta_Q$ is $\mathcal{L}^\alpha_{P(X)}$-measurable.

Proof The last statement is true when α = 0. If it is true for every β < α, where α < Ω, and S is a Suslin scheme for $\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X$, then for any c ∈ R, (98) of Chapter 7 holds, where A = N(S) and K(s) is defined by (92) of Chapter 7. For each s ∈ Σ, $K(s) \in \bigcup_{\beta<\alpha}\mathcal{L}^\beta_X$, so by the induction hypothesis, the set $\{p \in P(X) \mid p[K(s)] \ge c - (1/n)\}$ is in $\bigcup_{\beta<\alpha}\mathcal{L}^\beta_{P(X)}$. It follows from (98) of Chapter 7 and Proposition 7.35(b) that
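$$\{p \in P(X) \mid p(A) \ge c\} \in \mathcal{S}\Big(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_{P(X)}\Big).$$
Thus, if $Q \in \mathcal{S}(\bigcup_{\beta<\alpha}\mathcal{L}^\beta_X)$, then $\theta_Q$ is $\mathcal{L}^\alpha_{P(X)}$-measurable. The collection of sets Q for which $\theta_Q$ is $\mathcal{L}^\alpha_{P(X)}$-measurable forms a Dynkin system, so by the Dynkin system theorem (Proposition 7.24), $\theta_Q$ is $\mathcal{L}^\alpha_{P(X)}$-measurable for every $Q \in \mathcal{L}^\alpha_X$. This completes the induction step. If $Q \in \mathcal{L}_X$, then for some α < Ω, $Q \in \mathcal{L}^\alpha_X$, so $\theta_Q$ is $\mathcal{L}^\alpha_{P(X)}$-measurable, and therefore $\theta_Q$ is $\mathcal{L}_{P(X)}$-measurable. Q.E.D.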
B.5 Set Theoretic Aspects of Borel Spaces

The measurability properties of Borel spaces are closely linked to several issues in set theory which we have for the most part skirted. These issues are presented briefly here. There is some controversy concerning the propriety of the axiom of choice and Cantor's continuum hypothesis in applied mathematics. The former is generally accepted and the latter is regarded with suspicion. The general axiom of choice says that given any index set A and a collection of nonempty sets $\{S_a \mid a \in A\}$, there is a function $f: A \to \bigcup_{a\in A} S_a$ such that $f(a) \in S_a$ for every a ∈ A. We have used this axiom in Appendix A to construct examples. In particular, the set E of Example 1 of that appendix, for which both E and $E^c$ have p-outer measure one, is constructed by means of the axiom of choice. We have also used this axiom to construct the set S in the proof of Lemma B.3, and this lemma was instrumental in proving that every uncountable Borel space is Borel-isomorphic to every other uncountable Borel space (Proposition B.3 and Corollary 7.16.1). However, an alternative proof of Lemma B.3 which does not require the axiom of choice is possible, but it is quite lengthy and will not be given. The countable axiom of choice is the same as the general axiom except that the index set A is required to be countable. A paraphrase of this axiom is that given any countable collection of nonempty sets, one element can be chosen from each set. We have made extensive use of this axiom, such as in the choice, for each k, of a selector $\varphi_k$ in the proof of Proposition 7.50(a). Indeed, much of real analysis and topology rests on the countable axiom of choice. Solovay [S13] has shown that if the general axiom of choice is replaced by the weaker "principle of dependent choice," which is still stronger than the countable axiom of choice, then every subset of the real line may be assumed to be Lebesgue-measurable.
A slight extension of this result shows that under these conditions every subset of any Borel space may be assumed to be universally measurable. Therefore, by choice of the proper axiom system, the measurability difficulties which are the subject of Part II can be made to disappear. It is possible to show without the use of the axiom of choice that every uncountable Borel space X contains universally measurable sets which are not limit measurable. An unpublished proof of this is due to Richard Lockhart. If both the axiom of choice and the continuum hypothesis are adopted, then it follows that $\mathcal{U}_X$ has a larger cardinality than $\mathcal{L}_X$. Since $\mathcal{B}_X \subset \mathcal{L}^\alpha_X$ for each α < Ω and $\mathcal{B}_X$ has cardinality at least c, so does $\mathcal{L}^\alpha_X$. On the other hand, $\mathcal{L}^\alpha_X$ is contained in $\mathcal{S}(\mathcal{L}^\alpha_X)$ and there is a universal function for $\mathcal{S}(\mathcal{L}^\alpha_X)$, so the cardinality of $\mathcal{L}^\alpha_X$ is c. Now $\mathcal{L}_X = \bigcup_{\alpha<\Omega}\mathcal{L}^\alpha_X$, and the cardinality of the set of countable ordinals is less than or equal to c, so $\mathcal{L}_X$ has cardinality c. In contrast, under the assumption of the axiom of choice and Cantor's continuum hypothesis, $\mathcal{U}_X$ contains a set F of cardinality c which has measure zero with respect to every nonatomic probability measure [H5, Chapter III, Section 14]. Thus every subset of F is also in $\mathcal{U}_X$, and the cardinality of $\mathcal{U}_X$ is at least $2^c$. It follows that $\mathcal{L}_X$ must be properly contained in $\mathcal{U}_X$. Another relevant set theoretic work is that of Gödel [G1], who showed that it is consistent with the usual axioms of set theory to assume the existence of the complement of an analytic set in the unit square whose projection on an axis is not Lebesgue-measurable. This means that it is consistent with the usual axioms to assume the existence of an analytically measurable function $f: [0,1] \times [0,1] \to R$ such that $f^*(x) = \inf_y f(x,y)$ is not Lebesgue-measurable. This places a severe constraint on the types of strengthened versions of Proposition 7.47 which might be possible.
Appendix C
The Hausdorff Metric and the Exponential Topology
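This appendix develops a metric topology on the collection of closed subsets (including the empty set ∅) of a compact metric space (X, d). We denote this collection of sets by $2^X$. For $A \in 2^X$ and x ∈ X, define
$$d(x,A) = \min_{a\in A} d(x,a) \quad \text{if } A \ne \varnothing, \tag{1}$$
$$d(x,\varnothing) = \operatorname{diam}(X) = \max_{y,z\in X} d(y,z). \tag{2}$$

Definition C.1 Let (X, d) be a compact metric space. The Hausdorff metric ρ on $2^X$ is defined by
$$\rho(A,B) = \max\Big\{\max_{a\in A} d(a,B),\ \max_{b\in B} d(b,A)\Big\} \quad \text{if } A, B \ne \varnothing, \tag{3}$$
$$\rho(A,\varnothing) = \rho(\varnothing,A) = \operatorname{diam}(X) \quad \text{if } A \ne \varnothing, \tag{4}$$
$$\rho(\varnothing,\varnothing) = 0. \tag{5}$$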
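Definitions (1)-(5) can be evaluated directly when the sets involved are finite, since every finite subset of X is closed. The sketch below assumes, purely for illustration, that X = [-1, 1] with d(x, y) = |x - y| (so diam(X) = 2), and represents members of $2^X$ by finite Python sets. The function names are hypothetical.

```python
def dist_to_set(x, A, diam):
    """d(x, A) from (1)-(2): distance to the nearest point of A, or diam(X) if A is empty."""
    if not A:
        return diam
    return min(abs(x - a) for a in A)

def hausdorff(A, B, diam):
    """Hausdorff metric rho(A, B) from (3)-(5) for finite subsets of X."""
    if not A and not B:
        return 0.0            # case (5)
    if not A or not B:
        return diam           # case (4)
    return max(max(dist_to_set(a, B, diam) for a in A),
               max(dist_to_set(b, A, diam) for b in B))  # case (3)

DIAM = 2.0  # diam([-1, 1])
print(hausdorff({0.0}, {0.0, 0.5}, DIAM))   # 0.5
print(hausdorff({-1.0, 1.0}, {0.0}, DIAM))  # 1.0
print(hausdorff(set(), {0.3}, DIAM))        # 2.0
```

The first example shows why both terms in (3) are needed: every point of {0} lies in {0, 0.5}, yet ρ is 0.5 because 0.5 is far from {0}.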
We have written max in place of sup in (3), since every set in $2^X$ is compact and d(x, A) is a continuous function of x for every $A \in 2^X$. To see this latter property, consider a set $A \in 2^X$. If A = ∅, then the function d(x, A) is
APPENDIX
c
constant and hence continuous. If A ≠ ∅, then for x, y ∈ X and a ∈ A we have $d(x,a) \le d(x,y) + d(y,a)$. By taking the infimum of both sides over a ∈ A, we obtain
$$d(x,A) - d(y,A) \le d(x,y).$$
By reversing the roles of x and y, we have
$$|d(x,A) - d(y,A)| \le d(x,y), \tag{6}$$
which shows that d(x, A) is a Lipschitz continuous function of x. It is a tedious but straightforward task to verify that $(2^X, \rho)$ is a metric space, and this is left to the reader. We will prove that $(2^X, \rho)$ is a compact metric space. We first show some preliminary facts. If A is a (not necessarily closed) subset of X, define $2^A = \{K \in 2^X \mid K \subset A\}$. We define two classes
$$\boldsymbol{\mathcal{G}} = \{2^G \mid G \text{ is an open subset of } X\}, \tag{7}$$
$$\boldsymbol{\mathcal{K}} = \{2^X - 2^K \mid K \text{ is a closed subset of } X\}. \tag{8}$$
To aid the reader, we will continue to denote points of X by lowercase Latin letters and subsets of X by uppercase Latin letters. Uppercase script letters will be used for subsets of $2^X$, except for subsets of the form $2^A$ as defined above. In keeping with this practice, we denote open spheres in the two spaces as follows:
$$S_\varepsilon(x) = \{y \in X \mid d(x,y) < \varepsilon\}, \qquad \mathcal{S}_\varepsilon(A) = \{B \in 2^X \mid \rho(A,B) < \varepsilon\}.$$
Finally, classes of subsets of $2^X$ will be denoted by boldface script letters, as in the case of $\boldsymbol{\mathcal{G}}$ and $\boldsymbol{\mathcal{K}}$ defined above. The topology obtained by taking $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$ as a subbase in $2^X$ is called the exponential topology, and an extensive theory exists for it [K2, K3]. It can be developed for a nonmetrizable topological space X, but we are interested in it only when X is compact metric. In this case, the exponential topology is the topology generated by the Hausdorff metric, as we now show.

Proposition C.1 Let (X, d) be a compact metric space and ρ the Hausdorff metric on $2^X$. The class $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$ as defined by (7) and (8) is a subbase for the topology on $(2^X, \rho)$.

Proof We first prove that when G is open and K is closed in X, then $2^G$ and $2^X - 2^K$ are open in $(2^X, \rho)$. If G or K is empty, then $2^G$ or $2^X - 2^K$,
THE HAUSDORFF METRIC AND THE EXPONENTIAL TOPOLOGY
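respectively, is easily seen to be open, so we assume G and K are nonempty. Suppose A is a nonempty closed subset of X and $A \in 2^G$. (The proof for A = ∅ is trivial.) Since A is compact, is a subset of G, and X - G is closed, there exists ε with 0 < ε < diam(X) such that
$$\min_{a\in A} d(a, X - G) \ge \varepsilon. \tag{9}$$
For $B \in \mathcal{S}_\varepsilon(A)$, we have B ≠ ∅ and
$$\max_{b\in B} d(b,A) < \varepsilon. \tag{10}$$
From inequalities (9) and (10) we have that B ⊂ G. Hence $\mathcal{S}_\varepsilon(A) \subset 2^G$, and $2^G$ must be open. Turning to the case of $2^X - 2^K$ for K closed, we let $A \in 2^X - 2^K$ be nonempty. By definition, $A \notin 2^K$, so A - K contains at least one point $a_0$. Since X - K is open, we can find ε > 0 for which $S_\varepsilon(a_0) \subset X - K$. For $B \in \mathcal{S}_\varepsilon(A)$, we have
$$d(a_0, B) \le \max_{a\in A} d(a,B) < \varepsilon,$$
which implies $B \cap S_\varepsilon(a_0) \ne \varnothing$ and $B \in 2^X - 2^K$. Therefore $\mathcal{S}_\varepsilon(A) \subset 2^X - 2^K$, and $2^X - 2^K$ is open. Having thus shown that the sets $2^G$ and $2^X - 2^K$ are open in $(2^X, \rho)$ when G is open and K is closed, we must now show that given any open subset 𝒪 of $(2^X, \rho)$ and any nonempty A ∈ 𝒪, we can find open sets $G_1, \ldots, G_m$ and closed sets $K_1, \ldots, K_n$ in X for which
$$A \in 2^{G_1} \cap \cdots \cap 2^{G_m} \cap (2^X - 2^{K_1}) \cap \cdots \cap (2^X - 2^{K_n}) \subset \mathcal{O}.$$
Since 𝒪 is open in $(2^X, \rho)$, there exists ε > 0 such that $\mathcal{S}_\varepsilon(A) \subset \mathcal{O}$. Since A is closed in the compact set X, there exist points $\{x_1, \ldots, x_n\}$ in A such that $A \subset \bigcup_{k=1}^n S_{\varepsilon/2}(x_k)$. Let
$$G_1 = \bigcup_{k=1}^n S_{\varepsilon/2}(x_k) \quad\text{and}\quad K_k = X - S_{\varepsilon/2}(x_k), \quad k = 1, \ldots, n.$$
By construction, $A \in 2^{G_1}$ and, since for each k, $A \cap S_{\varepsilon/2}(x_k) \ne \varnothing$, we have $A \in 2^X - 2^{K_k}$. Therefore
$$A \in 2^{G_1} \cap (2^X - 2^{K_1}) \cap \cdots \cap (2^X - 2^{K_n}).$$
Suppose B is another set in $2^{G_1} \cap (2^X - 2^{K_1}) \cap \cdots \cap (2^X - 2^{K_n})$. The fact that $B \in 2^{G_1}$ implies
$$\max_{b\in B} d(b,A) < \varepsilon. \tag{11}$$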
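If for some $a_0 \in A$ we had $d(a_0,B) \ge \varepsilon$, then we would also have $S_\varepsilon(a_0) \subset X - B$. But for some $x_k \in A$, $a_0 \in S_{\varepsilon/2}(x_k)$, and this would imply in succession $S_{\varepsilon/2}(x_k) \subset X - B$, $B \subset K_k$, and $B \notin 2^X - 2^{K_k}$. This contradiction shows that
$$\max_{a\in A} d(a,B) < \varepsilon. \tag{12}$$
Inequalities (11) and (12) establish that ρ(A, B) < ε, and as a consequence
$$2^{G_1} \cap (2^X - 2^{K_1}) \cap \cdots \cap (2^X - 2^{K_n}) \subset \mathcal{S}_\varepsilon(A) \subset \mathcal{O}.$$
Q.E.D.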
If a cover of a space contains no finite subcover, we say the cover is essentially infinite. To show that $(2^X, \rho)$ is compact when X is compact, we must show that no essentially infinite open cover of $2^X$ exists. As a consequence of the following lemma, this will be accomplished if we can show that the subbase $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$ contains no essentially infinite cover. We remind the reader that a topological space in which every open cover has a countable subcover is called Lindelöf, and in metrizable spaces this property is equivalent to separability.

Lemma C.1 Let Ω be a Lindelöf space and let 𝒴 be a subbase for the topology on Ω. If there exists an essentially infinite open cover of Ω, then there exists one which is a subset of 𝒴.

Proof Let ℬ be the base for the topology on Ω constructed by taking finite intersections of sets in 𝒴, and let 𝒞 be an essentially infinite open cover of Ω. Each C ∈ 𝒞 has a representation $C = \bigcup_{a\in A(C)} B_a$, where $B_a \in \mathcal{B}$ for every a ∈ A(C). The collection $\bigcup_{C\in\mathcal{C}}\{B_a \mid a \in A(C)\}$ is an essentially infinite open cover of Ω, and, by the Lindelöf property, it contains a countable, essentially infinite, open subcover $\mathcal{D} = \{B_1, B_2, \ldots\}$. Each $B_k$ has a representation $B_k = \bigcap_{j=1}^{n(k)} S_{kj}$, where $S_{kj} \in \mathcal{Y}$, j = 1, …, n(k). If for each j the cover $\mathcal{D}_j = \{S_{1j}, B_2, B_3, \ldots\}$ is not essentially infinite, then there exists a finite subcollection $\mathcal{E}_j$ of $\mathcal{D}_j$ which also covers Ω. But then
$$\{B_1\} \cup \Big[\bigcup_{j=1}^{n(1)} \big(\mathcal{E}_j - \{S_{1j}\}\big)\Big] \subset \mathcal{D}$$
is a finite subcover of Ω. This contradiction implies that for some index $j_0$, the cover $\mathcal{D}_{j_0}$ is essentially infinite. Denote $R_1 = S_{1j_0}$. In general, given $R_1, R_2, \ldots, R_n$ in 𝒴 such that $B_k \subset R_k$, k = 1, …, n, and $\{R_1, R_2, \ldots, R_n, B_{n+1}, B_{n+2}, \ldots\}$ is an essentially infinite open cover of Ω, we can use the preceding argument to construct $R_{n+1} \in \mathcal{Y}$ for which $B_{n+1} \subset R_{n+1}$ and $\{R_1, R_2, \ldots, R_n, R_{n+1}, B_{n+2}, B_{n+3}, \ldots\}$ is an essentially infinite open cover of Ω. The collection $\{R_1, R_2, \ldots\}$ is an essentially infinite open cover contained in 𝒴. Q.E.D.
Proposition C.2 Let (X, d) be a compact metric space and ρ the Hausdorff metric on $2^X$. The metric space $(2^X, \rho)$ is compact.
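Proof We first show that $(2^X, \rho)$ is separable. Since (X, d) is compact, it is separable. Let D be a countable dense subset of X and let $\mathcal{C} = \{\bar{S}_{1/n}(x) \mid x \in D,\ n = 1, 2, \ldots\}$, where $\bar{S}_{1/n}(x)$ denotes the closure of $S_{1/n}(x)$. Let 𝒟 consist of finite unions of sets in 𝒞. Then 𝒟 is countable and, as we now show, is dense in $(2^X, \rho)$. Given $A \in 2^X$ and ε > 0, choose a positive integer n satisfying 2/n < ε. The collection of sets $\{S_{1/n}(x) \mid x \in D\}$ covers the compact set A, so there is a finite subcollection $\{S_{1/n}(x) \mid x \in F\}$ which also covers A and which satisfies $S_{1/n}(x) \cap A \ne \varnothing$ for every x ∈ F. The set $B = \bigcup_{x\in F} \bar{S}_{1/n}(x)$ is in 𝒟 and satisfies ρ(A, B) < ε. As a result of Proposition C.1, Lemma C.1, and the separability of $(2^X, \rho)$, to show that $(2^X, \rho)$ is compact we need only show that every open cover of $2^X$ which is a subset of $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$ contains a finite subcover of $2^X$. Thus let $\{G_\alpha \mid \alpha \in A\}$ be a collection of open sets and $\{K_\beta \mid \beta \in B\}$ a collection of closed sets in X, and suppose
$$2^X = \Big[\bigcup_{\alpha\in A} 2^{G_\alpha}\Big] \cup \Big[\bigcup_{\beta\in B} (2^X - 2^{K_\beta})\Big].$$
Define the closed set $K_0 = \bigcap_{\beta\in B} K_\beta$. By definition, $K_0 \notin \bigcup_{\beta\in B}(2^X - 2^{K_\beta})$, so $K_0 \in \bigcup_{\alpha\in A} 2^{G_\alpha}$. Thus for some $\alpha_0 \in A$, we have $K_0 \in 2^{G_{\alpha_0}}$, i.e., $K_0 \subset G_{\alpha_0}$. This means that
$$X - G_{\alpha_0} \subset X - K_0 = \bigcup_{\beta\in B}(X - K_\beta),$$
and since $X - G_{\alpha_0}$ is compact, there exists a finite set $\{\beta_1, \beta_2, \ldots, \beta_n\} \subset B$ for which
$$X - G_{\alpha_0} \subset \bigcup_{k=1}^n (X - K_{\beta_k}). \tag{13}$$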
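To complete the proof, we show
$$2^X = 2^{G_{\alpha_0}} \cup \Big[\bigcup_{k=1}^n (2^X - 2^{K_{\beta_k}})\Big].$$
If $C \in 2^X$, then either $C \subset G_{\alpha_0}$, in which case $C \in 2^{G_{\alpha_0}}$, or else $C \cap (X - G_{\alpha_0}) \ne \varnothing$. In the latter case, (13) implies that for some k, $C \cap (X - K_{\beta_k}) \ne \varnothing$, i.e., $C \in 2^X - 2^{K_{\beta_k}}$. Q.E.D.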
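We now develop some convergence notions in $(2^X, \rho)$. Let $\{A_n\}$ be a sequence of sets in $2^X$. Define
$$\overline{\lim}_{n\to\infty} A_n = \{x \in X \mid \liminf_{n\to\infty} d(x,A_n) = 0\}, \tag{14}$$
$$\underline{\lim}_{n\to\infty} A_n = \{x \in X \mid \lim_{n\to\infty} d(x,A_n) = 0\}. \tag{15}$$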
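For example, if X = [-1, 1] and $A_n = \{(-1)^n\}$, we have $\overline{\lim}_{n\to\infty} A_n = \{-1, 1\}$ and $\underline{\lim}_{n\to\infty} A_n = \varnothing$. If X = [-1, 1] and $A_n = [-1/n, 1/n]$, we have
$$\overline{\lim}_{n\to\infty} A_n = \underline{\lim}_{n\to\infty} A_n = \{0\}.$$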
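Both examples can be checked against the Hausdorff metric. For nonempty closed intervals of the real line, ρ([a, b], [c, d]) = max(|a - c|, |b - d|). This standard fact, easy to verify from (3) by comparing endpoints, is taken as an assumption of the sketch below. A point x is treated as the degenerate interval [x, x], and exact rational arithmetic avoids rounding. The computation gives ρ([-1/n, 1/n], {0}) = 1/n → 0, whereas consecutive singletons $\{(-1)^n\}$ always stay at distance 2. This matches the behavior characterized in Proposition C.3.

```python
from fractions import Fraction

def hausdorff_intervals(I, J):
    """rho([a, b], [c, d]) = max(|a - c|, |b - d|) for nonempty closed
    intervals of the real line; a point x is the interval (x, x)."""
    (a, b), (c, d) = I, J
    return max(abs(a - c), abs(b - d))

origin = (Fraction(0), Fraction(0))  # the set {0}
shrinking = [hausdorff_intervals((Fraction(-1, n), Fraction(1, n)), origin)
             for n in range(1, 6)]
print([str(r) for r in shrinking])  # ['1', '1/2', '1/3', '1/4', '1/5']

def alternating(n):
    """A_n = {(-1)^n} as a degenerate interval."""
    x = Fraction((-1) ** n)
    return (x, x)

# Distances between consecutive terms never shrink, so {A_n} is not Cauchy.
jumps = [hausdorff_intervals(alternating(n), alternating(n + 1))
         for n in range(1, 6)]
```

Because consecutive terms of the second sequence remain 2 apart, it has no limit in $(2^X, \rho)$. This is consistent with its upper and lower limits being different.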
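Clearly we have $\underline{\lim}_{n\to\infty} A_n \subset \overline{\lim}_{n\to\infty} A_n$. It is also true that $\underline{\lim}_{n\to\infty} A_n$ and $\overline{\lim}_{n\to\infty} A_n$ are closed. To see this for $\overline{\lim}_{n\to\infty} A_n$, let $\{x_m\}$ be a sequence in $\overline{\lim}_{n\to\infty} A_n$ converging to x. Then from (6) we have for each m
$$\liminf_{n\to\infty} d(x,A_n) \le d(x,x_m) + \liminf_{n\to\infty} d(x_m,A_n) = d(x,x_m),$$
and since $d(x,x_m)$ can be made arbitrarily small by choosing m sufficiently large, we conclude that $x \in \overline{\lim}_{n\to\infty} A_n$. Replace $\liminf_{n\to\infty}$ by $\limsup_{n\to\infty}$ in the preceding argument to show that $\underline{\lim}_{n\to\infty} A_n$ is closed. If $\overline{\lim}_{n\to\infty} A_n = \underline{\lim}_{n\to\infty} A_n$, we denote their common value by $\lim_{n\to\infty} A_n$. This notation is justified by the following proposition.

Proposition C.3 Let (X, d) be a compact metric space and ρ the Hausdorff metric on $2^X$. Let $\{A_n\}$ be a sequence in $2^X$. Then
$$\overline{\lim}_{n\to\infty} A_n = \underline{\lim}_{n\to\infty} A_n = A \tag{16}$$
if and only if
$$\lim_{n\to\infty} \rho(A_n, A) = 0. \tag{17}$$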
Proof Assume for the moment that A ≠ ∅ and suppose (16) holds. Then for each x in the compact set A, $d(x,A_n) \to 0$ as $n \to \infty$. Given ε > 0, let $\{x_1, \ldots, x_k\}$ be points of A such that the open spheres $S_{\varepsilon/2}(x_j)$, j = 1, …, k, cover A. Choose N large enough so that for all n ≥ N
$$d(x_j, A_n) \le \varepsilon/2, \quad j = 1, \ldots, k.$$
Now use the Lipschitz continuity [cf. (6)] of the function $x \mapsto d(x,A_n)$ to conclude that
$$d(x,A_n) \le \varepsilon \quad \forall x \in A.$$
This implies that
$$\lim_{n\to\infty} \max_{x\in A} d(x,A_n) = 0.$$
This equation and (3) imply that (17) will follow if we can show
$$\lim_{n\to\infty} \max_{y\in A_n} d(y,A) = 0. \tag{18}$$
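If (18) fails to hold, then for some ε > 0 there exists a sequence $y_k \in A_{n_k}$ such that $n_1 < n_2 < \cdots$ and
$$d(y_k, A) \ge \varepsilon, \quad k = 1, 2, \ldots. \tag{19}$$
The compactness of X implies that $\{y_k\}$ accumulates at some $y_0 \in X$ which, by (19) and the continuity of $x \mapsto d(x,A)$, must satisfy $d(y_0,A) \ge \varepsilon$. But $y_0 \in \overline{\lim}_{n\to\infty} A_n$ by (14), and this contradicts (16). Hence (18) holds. Still assuming A ≠ ∅, we turn to the reverse implication of the proposition. If (17) holds, then
$$\lim_{n\to\infty} d(x,A_n) = 0 \quad \forall x \in A, \tag{20}$$
and
$$\lim_{n\to\infty} \max_{y\in A_n} d(y,A) = 0. \tag{21}$$
Equation (20) implies that
$$A \subset \underline{\lim}_{n\to\infty} A_n. \tag{22}$$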
If $x \in \overline{\lim}_{n\to\infty} A_n$, then by definition there exists a sequence $y_k \in A_{n_k}$ such that $n_1 < n_2 < \cdots$ and
$$\lim_{k\to\infty} d(x,y_k) = 0. \tag{23}$$
We have from (6) that
$$d(x,A) \le d(x,y_k) + d(y_k,A) \le d(x,y_k) + \max_{y\in A_{n_k}} d(y,A),$$
and, letting k → ∞ and using (21) and (23), we conclude d(x, A) = 0. Since A is closed, this proves x ∈ A and
$$\overline{\lim}_{n\to\infty} A_n \subset A. \tag{24}$$
Combine (22) and (24) to obtain (16). Assume finally that A = ∅. If (16) holds, then all but finitely many of the sets $A_n$ must be empty, for otherwise one could find $y_k \in A_{n_k}$, $n_1 < n_2 < \cdots$, and $\{y_k\}$ would accumulate at some $y_0 \in \overline{\lim}_{n\to\infty} A_n$. If all but finitely many of the sets $A_n$ are empty, then (5) implies that (17) holds. Conversely, if (17) holds and A = ∅, then (4) implies that all but finitely many of the sets $A_n$ are empty. Equation (16) follows from (2), (14), and (15). Q.E.D.

For the proof of Proposition 7.33 in Section 7.5 we need the concept of a function which is upper semicontinuous in the sense of Kuratowski, or, in abbreviation, upper semicontinuous (K).
Definition C.2 Let Y be a metric space and X a compact metric space. A function $F: Y \to 2^X$ is upper semicontinuous (K) if for every convergent sequence $\{y_n\}$ in Y with limit y, we have $\overline{\lim}_{n\to\infty} F(y_n) \subset F(y)$.

The similarity of Definition C.2 to the idea of an upper semicontinuous real or extended real-valued function is apparent [Lemma 7.13(b)]. Although we will not discuss functions which are lower semicontinuous (K), it is interesting to note that such a concept exists and has the obvious definition, namely, that the function $F: Y \to 2^X$ is lower semicontinuous (K) if for every convergent sequence $\{y_n\}$ in Y with limit y, we have $\underline{\lim}_{n\to\infty} F(y_n) \supset F(y)$. It can be seen from Proposition C.3 that a function $F: Y \to 2^X$ is continuous in the usual sense (where $2^X$ has the exponential topology) if and only if it is both upper and lower semicontinuous (K). We carry the analogy with real-valued functions even further by showing that an upper semicontinuous (K) function is Borel-measurable, and the remainder of the appendix is devoted to this.

Lemma C.2 Let Y be a metric space and X a compact metric space. If $F: Y \to 2^X$ is upper semicontinuous (K), then for each open set G ⊂ X, the set
$$F^{-1}(2^G) = \{y \in Y \mid F(y) \subset G\}$$
is open.

Proof The openness of $F^{-1}(2^G)$ for every open G is in fact equivalent to upper semicontinuity (K), but we need only the weaker result stated. To prove it, we show that for G open, the set $F^{-1}(2^X - 2^G)$ is closed. If $\{y_n\}$ is a sequence in this set with limit y ∈ Y, then
$$F(y_n) \cap (X - G) \ne \varnothing, \quad n = 1, 2, \ldots,$$
and so there exists a sequence $\{x_n\}$ in the compact set X - G such that $x_n \in F(y_n)$, n = 1, 2, …. This sequence has an accumulation point x ∈ X - G, and, by (14), $x \in \overline{\lim}_{n\to\infty} F(y_n)$. The upper semicontinuity (K) of F implies x ∈ F(y), and so $F(y) \cap (X - G) \ne \varnothing$, i.e., $y \in F^{-1}(2^X - 2^G)$. Q.E.D.

Proposition C.4 Let Y be a metric space, (X, d) a compact metric space, and let $2^X$ have the exponential topology. Let $F: Y \to 2^X$ be upper semicontinuous (K). Then F is Borel-measurable.

Proof If $F: Y \to 2^X$ is upper semicontinuous (K) and G is an open subset of X, then $F^{-1}(2^G)$ is Borel-measurable in Y by Lemma C.2. If K is a closed subset of X, define open sets $G_n = \{x \mid d(x,K) < 1/n\}$. We have $K = \bigcap_{n=1}^\infty G_n$, and so a closed set A is a subset of K if and only if $A \subset G_n$, n = 1, 2, ….
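This implies $2^K = \bigcap_{n=1}^\infty 2^{G_n}$, and
$$F^{-1}(2^K) = \bigcap_{n=1}^\infty F^{-1}(2^{G_n})$$
is a $G_\delta$-set, thus Borel-measurable in Y; hence so is $F^{-1}(2^X - 2^K) = Y - F^{-1}(2^K)$. It follows that for any set 𝒮 in the subbase $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$ for the exponential topology on $2^X$, $F^{-1}(\mathcal{S})$ is Borel-measurable in Y. By Proposition 7.1, any open set in $2^X$ can be represented as a countable union of finite intersections of sets in $\boldsymbol{\mathcal{G}} \cup \boldsymbol{\mathcal{K}}$, and so its inverse image under F is Borel-measurable. Q.E.D.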
References
R. Ash, "Real Analysis and Probability." Academic Press, New York, 1972.
K. J. Åström, Optimal control of Markov processes with incomplete state information, J. Math. Anal. Appl. 10 (1965), 174-205.
R. Bellman, "Dynamic Programming." Princeton Univ. Press, Princeton, New Jersey, 1957.
D. P. Bertsekas, Infinite-time reachability of state-space regions by using feedback control, IEEE Trans. Automatic Control AC-17 (1972), 604-613.
D. P. Bertsekas, On error bounds for successive approximation methods, IEEE Trans. Automatic Control AC-21 (1976), 394-396.
D. P. Bertsekas, "Dynamic Programming and Stochastic Control." Academic Press, New York, 1976.
D. P. Bertsekas, Monotone mappings with application in dynamic programming, SIAM J. Control Optimization 15 (1977), 438-464.
D. P. Bertsekas and S. Shreve, Existence of optimal stationary policies in deterministic optimal control, J. Math. Anal. Appl. (to appear).
P. Billingsley, Invariance principle for dependent random variables, Trans. Amer. Math. Soc. 83 (1956), 250-282.
D. Blackwell, Positive dynamic programming, Proc. Fifth Berkeley Sympos. Math. Statist. and Probability, 1965, 415-418.
D. Blackwell, Discounted dynamic programming, Ann. Math. Statist. 36 (1965), 226-235.
D. Blackwell, On stationary policies, J. Roy. Statist. Soc. 133A (1970), 33-37.
D. Blackwell, Borel-programmable functions, Ann. Probability 6 (1978), 321-324.
D. Blackwell, D. Freedman, and M. Orkin, The optimal reward operator in dynamic programming, Ann. Probability 2 (1974), 926-941.
N. Bourbaki, "General Topology." Addison-Wesley, Reading, Massachusetts, 1966.
D. W. Bressler and M. Sion, The current theory of analytic sets, Canad. J. Math. 16 (1964), 207-230.
REFERENCES
L. D. Brown and R. Purves, Measurable selections of extrema, Ann. Statist. 1 (1973), 902-912.
D. Cenzer and R. D. Mauldin, Measurable parameterizations and selections, Trans. Amer. Math. Soc. (to appear).
C. Dellacherie, "Ensembles Analytiques, Capacités, Mesures de Hausdorff." Springer-Verlag, Berlin and New York, 1972.
E. V. Denardo, Contraction mappings in the theory underlying dynamic programming, SIAM Rev. 9 (1967), 165-177.
C. Derman, "Finite State Markovian Decision Processes." Academic Press, New York, 1970.
J. L. Doob, "Stochastic Processes." Wiley, New York, 1953.
L. Dubins and D. Freedman, Measurable sets of measures, Pacific J. Math. 14 (1964), 1211-1222.
L. Dubins and L. Savage, "Inequalities for Stochastic Processes (How to Gamble if you Must)." McGraw-Hill, New York, 1965. (Republished by Dover, New York, 1976.)
J. Dugundji, "Topology." Allyn & Bacon, Rockleigh, New Jersey, 1966.
E. B. Dynkin and A. A. Juskevic, "Controlled Markov Processes and their Applications." Moscow, 1975. (English translation to be published by Springer-Verlag.)
D. Freedman, The optimal reward operator in special classes of dynamic programming problems, Ann. Probability 2 (1974), 942-949.
E. B. Frid, On a problem of D. Blackwell from the theory of dynamic programming, Theor. Probability Appl. 15 (1970), 719-722.
N. Furukawa, Markovian decision processes with compact action spaces, Ann. Math. Statist. 43 (1972), 1612-1622.
N. Furukawa and S. Iwamoto, Markovian decision processes and recursive reward functions, Bull. Math. Statist. 15 (1973), 79-91.
N. Furukawa and S. Iwamoto, Dynamic programming on recursive reward systems, Bull. Math. Statist. 17 (1976), 103-126.
K. Gödel, The consistency of the axiom of choice and of the generalized continuum hypothesis, Proc. Nat. Acad. Sci. U.S.A. 24 (1938), 556-557.
P. R. Halmos, "Measure Theory." Van Nostrand-Reinhold, Princeton, New Jersey, 1950.
F. Hausdorff, "Set Theory." Chelsea, Bronx, New York, 1957.
C. J. Himmelberg, T. Parthasarathy, and F. S. Van Vleck, Optimal plans for dynamic programming problems, Math. Operations Res. 1 (1976), 390-394.
K. Hinderer, "Foundations of Nonstationary Dynamic Programming with Discrete Time Parameter." Springer-Verlag, Berlin and New York, 1970.
J. Hoffman-Jorgensen, "The Theory of Analytic Spaces." Aarhus Universitet, Aarhus, Denmark, 1970.
A. Hordijk, "Dynamic Programming and Markov Potential Theory." Mathematical Centre Tracts, Amsterdam, 1974.
R. Howard, "Dynamic Programming and Markov Processes." MIT Press, Cambridge, Massachusetts, 1960.
B. Jankov, On the uniformisation of A-sets, Dokl. Akad. Nauk SSSR 30 (1941), 591-592 (in Russian).
W. Jewell, Markov renewal programming I and II, Operations Res. 11 (1963), 938-971.
A. A. Juskevic (Yushkevich), Reduction of a controlled Markov model with incomplete data to a problem with complete information in the case of Borel state and control spaces, Theor. Probability Appl. 21 (1976), 153-158.
L. Kantorovich and B. Livenson, Memoir on analytical operations and projective sets, Fund. Math. 18 (1932), 214-279.
K. Kuratowski, "Topology I." Academic Press, New York, 1966.
K. Kuratowski, "Topology II." Academic Press, New York, 1968.
K. Kuratowski and A. Mostowski, "Set Theory." North-Holland, Amsterdam, 1976.
K. Kuratowski and C. Ryll-Nardzewski, A general theorem on selectors, Bull. Polish Acad. Sci. 13 (1965), 397-411.
H. Kushner, "Introduction to Stochastic Control." Holt, New York, 1971.
M. Loève, "Probability Theory." Van Nostrand-Reinhold, Princeton, New Jersey, 1963.
N. Lusin, Sur les ensembles analytiques, Fund. Math. 10 (1927), 1-95.
N. Lusin and W. Sierpinski, Sur quelques propriétés des ensembles (A), Bull. Acad. Sci. Cracovie (1918), 35-48.
G. Mackey, Borel structure in groups and their duals, Trans. Amer. Math. Soc. 85 (1957), 134-165.
A. Maitra, Discounted dynamic programming on compact metric spaces, Sankhya 30A (1968), 211-216.
J. McQueen, A modified dynamic programming method for Markovian decision problems, J. Math. Anal. Appl. 14 (1966), 38-43.
P. A. Meyer, "Probability and Potentials." Ginn (Blaisdell), Boston, Massachusetts, 1966.
P. A. Meyer and M. Traki, Réduites et jeux de hasard (Séminaire de Probabilités VII, Université de Strasbourg, in "Lecture Notes in Mathematics," Vol. 321), pp. 155-171. Springer, Berlin, 1973.
J. von Neumann, On rings of operators. Reduction theory, Ann. of Math. 50 (1949), 401-485.
P. Olsen, Multistage stochastic programming with recourse: The equivalent deterministic problem, SIAM J. Control Optimization 14 (1976), 495-517.
P. Olsen, When is a multistage stochastic programming problem well-defined?, SIAM J. Control Optimization 14 (1976), 518-527.
P. Olsen, Multistage stochastic programming with recourse as mathematical programming in an L_p space, SIAM J. Control Optimization 14 (1976), 528-537.
D. Ornstein, On the existence of stationary optimal strategies, Proc. Amer. Math. Soc. 20 (1969), 563-569.
J. M. Ortega and W. C. Rheinboldt, "Iterative Solutions of Nonlinear Equations in Several Variables." Academic Press, New York, 1970.
K. Parthasarathy, "Probability Measures on Metric Spaces." Academic Press, New York, 1967.
Yu. V. Prohorov, Convergence of random processes and limit theorems in probability theory, Theor. Probability Appl. 1 (1956), 157-214.
D. Rhenius, Incomplete information in Markovian decision models, Ann. Statist. 2 (1974), 1327-1334.
R. T. Rockafellar, Integral functionals, normal integrands and measurable selections, in "Nonlinear Operators and the Calculus of Variations." Springer-Verlag, Berlin and New York, 1976.
R. T. Rockafellar and R. Wets, Stochastic convex programming: relatively complete recourse and induced feasibility, SIAM J. Control Optimization 14 (1976), 574-589.
R. T. Rockafellar and R. Wets, Stochastic convex programming: basic duality, Pacific J. Math. 62 (1976), 173-195.
H. L. Royden, "Real Analysis." Macmillan, New York, 1968.
S. Saks, "Theory of the Integral." Stechert, New York, 1937.
Y. Sawaragi and T. Yoshikawa, Discrete-time Markovian decision processes with incomplete state information, Ann. Math. Statist. 41 (1970), 78-86.
M. Schäl, On continuous dynamic programming with discrete time parameter, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 21 (1972), 279-288.
M. Schäl, On dynamic programming: Compactness of the space of policies, Stochastic Processes Appl. 3 (1975), 345-364.
M. Schäl, Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 32 (1975), 179-196.
E. Selivanovskij, Ob odnom klasse effektivnyh mnozestv (mnozestva C), Mat. Sb. 35 (1928), 379-413.
S. Shreve, A General Framework for Dynamic Programming with Specializations, M.S. thesis (1977), Dept. of Elec. Eng., Univ. of Illinois, Urbana.
S. Shreve, Dynamic Programming in Complete Separable Spaces, Ph.D. thesis (1977), Dept. of Math., Univ. of Illinois, Urbana.
S. Shreve and D. P. Bertsekas, A new theoretical framework for finite horizon stochastic control, Proc. Fourteenth Annual Allerton Conf. Circuit and System Theory, Allerton Park, Illinois, October, 1976, 336-343.
S. Shreve and D. P. Bertsekas, Equivalent stochastic and deterministic optimal control problems, Proc. 1976 IEEE Conf. Decision and Control, Clearwater Beach, Florida, 705-709.
S. Shreve and D. P. Bertsekas, Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control, SIAM J. Control Optimization 16 (1978).
S. Shreve and D. P. Bertsekas, Universally measurable policies in dynamic programming, Mathematics of Operations Research (to appear).
R. Solovay, A model of set-theory in which every set of reals is Lebesgue measurable, Ann. Math. 92 (1970), 1-56.
R. E. Strauch, Negative dynamic programming, Ann. Math. Statist. 37 (1966), 871-890.
C. Striebel, Sufficient statistics in the optimal control of stochastic systems, J. Math. Anal. Appl. 12 (1965), 576-592.
C. Striebel, "Optimal Control of Discrete Time Stochastic Systems." Springer-Verlag, Berlin and New York, 1975.
M. Suslin (Souslin), Sur une définition des ensembles mesurables B sans nombres transfinis, C. R. Acad. Sci. Paris 164 (1917), 88-91.
V. S. Varadarajan, Weak convergence of measures on separable metric spaces, Sankhya 19 (1958), 15-22.
D. H. Wagner, Survey of measurable selection theorems, SIAM J. Control Optimization 15 (1977), 859-903.
A. Wald, "Statistical Decision Functions." Wiley, New York, 1950.
H. S. Witsenhausen, A standard form for sequential stochastic control, Math. Systems Theory 7 (1973), 5-11.
Table of Propositions, Lemmas, Definitions, and Assumptions
Chapter 2
Monotonicity Assumption
Chapter 3
Propositions 3.1-3.7
Lemma 3.1
Assumptions F.1, F.2, F.3
Chapter 4
Propositions 4.1-4.11
Assumption C (Contraction Assumption)
Fixed Point Theorem
Chapter 5
Propositions 5.1-5.15
Lemmas 5.1, 5.2
Assumption I (Uniform Increase Assumption)
Assumption D (Uniform Decrease Assumption)
Assumptions I.1, I.2, D.1, D.2
Chapter 6
Propositions 6.1-6.5
Assumptions A.1, A.2, A.3, A.4, A.5, F.2, F.3
Exact Selection Assumption
Assumption C
Chapter 7
Propositions 7.1-7.50, among them 7.2 (Urysohn's Theorem), 7.3 (Alexandroff's Theorem), 7.15 (Kuratowski's Theorem), 7.24 (Dynkin System Theorem), 7.42 (Lusin's Theorem), and 7.49 (Jankov-von Neumann Theorem)
Lemmas 7.1-7.30, among them 7.1 (Urysohn's Lemma)
Definitions 7.1-7.21
Chapter 8
Propositions 8.1-8.7
Lemmas 8.1-8.7
Definitions 8.1-8.8
Chapter 9
Propositions 9.1-9.21
Lemmas 9.1, 9.2, 9.3
Definitions 9.1-9.10
Chapter 10
Propositions 10.1-10.6
Lemmas 10.1-10.4
Definitions 10.1-10.8
Chapter 11
Propositions 11.1-11.7
Appendix A
Proposition A.1
Lemmas A.1, A.2, A.3
Definition A.1
Appendix B
Propositions B.1-B.12
Lemmas B.1-B.8
Definitions B.1, B.2, B.3
Appendix C
Propositions C.1-C.4
Lemmas C.1, C.2
Definitions C.1, C.2
Index
A
Alexandroff's theorem, 107
Analytic measurability of a function, 171
Analytic set, 160
Analytic σ-algebra, 171
A posteriori distribution, 260ff
A priori distribution, 260ff
Axiom of choice, 301
B
Baire null space, 103, 109
Borel isomorphism, 121
Borel measurability of a function, 120
Borel programmable, 21
Borel σ-algebra, 117
Borel space, 118
C
Cantor's continuum hypothesis, 301
Completion
  of a metric space, 114
  of a σ-algebra, 167
Composition of measurable functions, 298
Contraction assumption, 52
Control
  constraint, 2, 26, 188, 216, 243, 245, 248, 251, 271
  space, 2, 26, 188, 216, 243, 245, 248, 251, 271
Cost
  corresponding to a policy, 2, 28, 191, 217, 244, 249, 254
  one-stage, 2, 189, 216, 243, 245, 248, 251, 271
  optimal, 2, 29, 191, 217, 244, 246, 250, 254
C-sets, 20
D
Disturbance kernel, 189, 243, 245, 271
Disturbance space, 189, 243, 245, 271
Dynamic programming (DP) algorithm, 3, 6, 39, 57, 80, 198, 229, 259
Dynkin system, 133
Dynkin system theorem, 133
E
Epigraph, 82
Exact selection assumption, 95
Exponential topology, 304
F
Filtering, 261
Fixed point theorem (Banach), 55
Fσ-set, 102
H
Hausdorff metric, 303
Hilbert cube, 103
Homeomorphism, 104
Horizon
  finite, 28, 189, 243, 245, 248, 251, 271
  infinite, 70, 213, 216, 243, 245, 248, 251
I
Imperfect state information model, 248
Indicator function, 103
Information vector, 248
Isometry, 144
J
Jankov-von Neumann theorem, 182
K
Kuratowski's theorem, 121
L
Limit measurability, 298
Limit σ-algebra, 293
Lindelöf space, 106
Lower semianalytic function, 177
Lower semicontinuous function, 146
Lower semicontinuous model, 208
Lusin's theorem, 167
M
Metrizable space, 104
Monotonicity assumption, 27
N
Nonstationary model, 243
O
Observation kernel, 248
Observation space, 248
Optimality equation, 4, 57, 71, 73, 78ff, 225
  nonstationary, 246
Outer integral, 273
  monotone convergence theorem for, 278
P
Paved space, 157
Policy, 2, 6, 26, 91, 190, 214, 217, 243, 249
  analytically measurable, 190, 269ff
  Borel-measurable, 190
  ε-optimal, 29, 191, 215, 244
  {ε_n}-dominated convergence to optimality, 29, 191, 245
  k-originating, 243
  limit-measurable, 190, 266ff
  Markov, 6, 190
  nonrandomized, 190, 249
  optimal, 29, 191, 215, 244
  p-ε-optimal, 12
  q-optimal, 256
  semi-Markov, 190
  stationary, 214
  uniformly N-stage optimal, 29, 206
  universally measurable, 190
  weakly q-ε-optimal, 256
p-outer measure, 166, 274
Projection mapping, 103
R
Regular probability measure, 122
Relative topology, 104
R-operator, 215
S
Second countable space, 106
Semi-Markov decision problems, 34
Separable space, 105
State space, 2, 26, 188, 216, 243, 245, 248, 251, 271
State transition kernel, 189, 243, 248, 251
Statistic sufficient for control, 250
  existence of, 259ff
Stochastic kernel, 134
Stochastic programming, 11ff
Suslin scheme, 157
  nucleus of, 157
  regular, 161
System function, 189, 216, 243, 245, 271
T
Topologically complete space, 107
Totally bounded space, 112
U
Uniform decrease assumption, 70
Uniform increase assumption, 70
Universal function, 290
Universal measurability of a function, 171
Universal σ-algebra, 167
Upper semicontinuous function, 146
Upper semicontinuous model, 210
Upper semicontinuous (K) function, 310
Urysohn's lemma, 105
Urysohn's theorem, 106
W
Weak topology on space of probability measures, 125