An Introduction to Stochastic Filtering Theory
OXFORD GRADUATE TEXTS IN MATHEMATICS Books in the series 1. Keith Hannabuss: An introduction to quantum theory 2. Reinhold Meise and Dietmar Vogt: Introduction to functional analysis 3. James G. Oxley: Matroid theory 4. N.J. Hitchin, G.B. Segal, and R.S. Ward: Integrable systems: twistors, loop groups, and Riemann surfaces 5. Wulf Rossmann: Lie groups: An introduction through linear groups 6. Qing Liu: Algebraic geometry and arithmetic curves 7. Martin R. Bridson and Simon M. Salamon (eds): Invitations to geometry and topology 8. Shmuel Kantorovitz: Introduction to modern analysis 9. Terry Lawson: Topology: A geometric approach 10. Meinolf Geck: An introduction to algebraic geometry and algebraic groups 11. Alastair Fletcher and Vladimir Markovic: Quasiconformal maps and Teichmüller theory 12. Dominic Joyce: Riemannian holonomy groups and calibrated geometry 13. Fernando Villegas: Experimental Number Theory 14. Péter Medvegyev: Stochastic Integration Theory 15. Martin Guest: From Quantum Cohomology to Integrable Systems 16. Alan Rendall: Partial Differential Equations in General Relativity 17. Yves Félix, John Oprea and Daniel Tanré: Algebraic Models in Geometry 18. Jie Xiong: An Introduction to Stochastic Filtering Theory
An Introduction to Stochastic Filtering Theory Jie Xiong Department of Mathematics University of Tennessee Knoxville, TN 37996-1300, USA
Great Clarendon Street, Oxford OX2 6DP Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries Published in the United States by Oxford University Press Inc., New York © Jie Xiong 2008 The moral rights of the author have been asserted Database right Oxford University Press (maker) First published 2008 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer British Library Cataloguing in Publication Data Data available Library of Congress Cataloging in Publication Data Data available Typeset by Newgen Imaging Systems (P) Ltd., Chennai, India Printed in Great Britain on acid-free paper by Biddles Ltd., King’s Lynn, Norfolk ISBN 978–0–19–921970–4 10 9 8 7 6 5 4 3 2 1
To Jingli, Jerry and Michael
Preface
The object of stochastic filtering is to use probability tools to estimate unobservable stochastic processes that arise in many applied fields, including communication, target tracking, and mathematical finance. Stochastic filtering theory has seen rapid development in recent years. First, the (branching) particle-system representation of the optimal filter has been studied by many authors seeking more effective numerical approximations of the optimal filter. It turns out that such a representation can also be used to prove the uniqueness of the solution to the filtering equation itself and, hence, to broaden the scope of the tractable class of models. Secondly, the stability of the filter with “incorrect” initial state, as well as the long-time behavior of the optimal filter, has attracted the attention of many researchers. This direction of research has become extremely challenging since a gap in a widely cited paper was discovered. Finally, many problems in mathematical finance, for example the stochastic volatility model, lead to singular filtering models. More specifically, the magnitude of the observation noise may depend on the signal, which makes the optimal filter singular. Some progress in this direction was made recently. It is the belief of this author that the time is ripe for a new textbook to reflect these recent developments. The main theme of this book is to recapitulate these advances in a succinct and efficient manner. The book can serve as a text for mathematics as well as engineering graduate and inspired undergraduate students. It can also serve as a reference for practitioners in various fields of application. As noted, the aim of this book is to take the students to this exciting field of research through the shortest route possible. To achieve this goal, we completely avoid the chaos decomposition used in the classical filtering theory (e.g. Kallianpur [81]).
The main approach of this book is based on the particle representation for stochastic partial differential equations developed by Kurtz and Xiong ([97–99]). The methods used here can be applied to more general stochastic partial differential equations. Therefore, it provides a bridge to the readers who are interested in studying the general theory of stochastic partial differential equations. We should mention that the propagation of chaos decomposition and the multiple stochastic integral methods have provided another type of
numerical scheme in the approximation of the optimal filter. The advantage of this approach is that most of the computations are done “offline” in advance. This important development is not covered in this book because we want to limit the pre-requisite of this book. We only assume that the reader has basic knowledge of probability theory, for example, the material in the book of Billingsley [11].
Acknowledgements
The author hopes that the readers find this book enjoyable and informative as they seek to enter the research field of stochastic filtering. He had much assistance in making the book a reality and wishes to thank all of those who helped along the way. In particular, he would like to note that his friend and colleague Balram Rajput at the University of Tennessee has read the entire manuscript and has helped the author to correct many grammatical and presentational problems. Tom Kurtz from the University of Wisconsin, Wei Sun from Concordia University at Montreal, Ofer Zeitouni from the University of Minnesota and Yong Zeng from the University of Missouri at Kansas City have read the manuscript and made many constructive suggestions. Don Dawson from Carleton University and Leonid Mytnik from Israel Institute of Technology have discussed the material with this author and made very important observations that helped to improve this book substantially. Zhiqiang Li, his graduate student, has read the entire manuscript and asked questions that helped him to clarify many points from a student’s viewpoint. This book is based on a course taught by the author in the Fall semester of 2005 at the University of Tennessee. After the book was almost finished, the author also presented it in a short course at the Summer School in Beijing Normal University in 2007. The author wishes to thank the audience in both classes, who raised many interesting questions. He also wishes to thank Zenghu Li for the invitation to visit Beijing Normal University and to give the noted course in the Summer School there. As much of the material is based on the author’s collaborative research in this field, he would like to thank his collaborators on stochastic filtering for this enjoyable co-operation and for allowing him to include the joint research in this book. These collaborators include Dan Crisan (Imperial College in London), Mike Kouritzin (the University of Alberta at Edmonton), Tom Kurtz (the University of Wisconsin at Madison), Wei Sun (Concordia University at Montreal), and Xunyu Zhou (the Chinese University of Hong Kong). He also would like to thank the National Security Agency for the support of his research during the last five years. Finally, the author would like to thank the staff of the Oxford University Press: Editor Alison Jones and her assistant Dewi Jackson, as well as her former assistant Jessica Churchman, for their co-operation and help.
Contents

1 Introduction
  1.1 Examples
  1.2 Basic definitions and the filtering equation
  1.3 An overview
2 Brownian motion and martingales
  2.1 Martingales
  2.2 Doob–Meyer decomposition
  2.3 Meyer’s processes
  2.4 Brownian motion
3 Stochastic integrals and Itô’s formula
  3.1 Predictable processes
  3.2 Stochastic integral
  3.3 Itô’s formula
  3.4 Martingale representation in terms of Brownian motion
  3.5 Change of measures
  3.6 Stratonovich integral
4 Stochastic differential equations
  4.1 Basic definitions
  4.2 Existence and uniqueness of a solution
  4.3 Martingale problem
  4.4 A stochastic flow
  4.5 Markov property
5 Filtering model and Kallianpur–Striebel formula
  5.1 The filtering model
  5.2 The optimal filter
  5.3 Filtering equation
  5.4 Particle-system representation
  5.5 Notes
6 Uniqueness of the solution for Zakai’s equation
  6.1 Hilbert space
  6.2 Transformation to a Hilbert space
  6.3 Some useful inequalities
  6.4 Uniqueness for Zakai’s equation
  6.5 A duality representation
  6.6 Notes
7 Uniqueness of the solution for the filtering equation
  7.1 An interacting particle system
  7.2 The uniqueness of the system
  7.3 Uniqueness for the filtering equation
  7.4 Notes
8 Numerical methods
  8.1 Monte-Carlo method
  8.2 A branching particle system
  8.3 Convergence of $V_t^n$
  8.4 Convergence of $V^n$
  8.5 Notes
9 Linear filtering
  9.1 Gaussian system
  9.2 Kalman–Bucy filtering
  9.3 Discrete-time approximation of the Kalman–Bucy filtering
  9.4 Some basic facts for a related deterministic control problem
  9.5 Stability for Kalman–Bucy filtering
  9.6 Notes
10 Stability of non-linear filtering
  10.1 Markov property of the optimal filter
  10.2 Ergodicity of the optimal filter
  10.3 Finite memory property
  10.4 Asymptotic stability for non-linear filtering with compact state space
  10.5 Exchangeability of union intersection for σ-fields
  10.6 Notes
11 Singular filtering
  11.1 A special example
  11.2 A general singular filtering model
  11.3 Optimal filter with discrete support
  11.4 Optimal filter supported on manifolds
  11.5 Filtering model with Ornstein–Uhlenbeck noise
  11.6 Notes
Bibliography
List of Notations
Index
1
Introduction
In this chapter, we first give a few motivating examples for stochastic filtering. Then, we introduce some basic definitions and state the main equations that arise in non-linear filtering theory. Finally, we give an overview of the topics to be covered in this book.
1.1
Examples
In this section, we study four examples arising from different fields of application. The first example comes from wireless communication, which was in fact the main motivation for filtering theory in its early stage. The second example comes from mathematical finance, where the random factors affecting the stock prices are not completely observed; instead, only the stock prices themselves are observed. The selection of a portfolio must be based on the available information provided by the movement of the stock prices. The third example comes from the field of environment protection. In this example, we estimate the distribution of the undesired chemicals in a river using the data obtained from a few observation stations along the river. Finally, in the last example, we study the filtering problem when the observation noise is given by an Ornstein–Uhlenbeck process, which approximates white noise (white noise itself exists only in the sense of generalized functions).
1.1.1
Wireless communication
A signal process $X_t$ taking values in a space $S$ is to be transmitted to a receiver. Because of the random noise, this signal is not directly observable. Instead, a function $h(X_t)$ (taking values in $\mathbb{R}^m$) of this signal plus an $m$-dimensional white noise $n_t$ is observed. The original observation model is then

$$y_t = h(X_t) + n_t, \tag{1.1}$$

where $y_t$ is called the observation process. Note that the white noise exists only in the sense of generalized functions: it is the derivative (again, in the sense of generalized functions) of a Brownian motion, which exists in the ordinary sense (we refer the interested reader to the book of Kuo [96] for an introduction to the white noise theory). Therefore, it is natural for us to consider the accumulated observation process

$$Y_t = \int_0^t y_s\,ds$$

as the source of our information. The observation equation is then written as

$$Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{1.2}$$

where $W_t$ is an $m$-dimensional Brownian motion. The aim of the filtering theory is to estimate the signal based on the observation σ-field

$$\mathcal{G}_t \equiv \sigma(Y_s : 0 \le s \le t)$$

generated by the (accumulated) observation process $\{Y_s : s \le t\}$.
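The accumulated observation (1.2) can be simulated on a time grid. The following is a minimal sketch, assuming $m = 1$ and a left-endpoint (Euler) discretisation; the sine-wave signal path, the choice $h = \tanh$, and the helper name `simulate_observation` are illustrative assumptions and do not come from the text.

```python
import numpy as np

def simulate_observation(signal, h, dt, rng):
    """Left-endpoint discretisation of Y_t = int_0^t h(X_s) ds + W_t (m = 1)."""
    n_steps = len(signal) - 1
    dW = np.sqrt(dt) * rng.standard_normal(n_steps)   # Brownian increments
    dY = h(signal[:-1]) * dt + dW                     # observation increments
    return np.concatenate(([0.0], np.cumsum(dY)))     # Y_0 = 0

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
signal = np.sin(2.0 * np.pi * t)      # hypothetical signal path (assumption)
Y = simulate_observation(signal, np.tanh, dt, rng)
```

In this discretised picture, the information in $\mathcal{G}_t$ at grid time $t_k$ corresponds to knowing the array `Y[:k + 1]`.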
1.1.2
Portfolio optimization
We consider a market consisting of a bond and $d$ stocks whose prices are stochastic processes $S_t^i$, $i = 0, 1, \ldots, d$, governed by the following stochastic differential equations (SDEs):

$$\begin{cases} dS_t^i = S_t^i\Big(X_t^i\,dt + \sum_{j=1}^m \tilde\sigma_t^{ij}\,d\tilde W_t^j\Big), & i = 1, 2, \ldots, d,\\[4pt] dS_t^0 = S_t^0 X_t^0\,dt, & t \ge 0, \end{cases} \tag{1.3}$$

where $\tilde W := (\tilde W^1, \ldots, \tilde W^m)^*$ is a standard Brownian motion defined on a stochastic basis $(\Omega, \mathcal{F}, P; \{\mathcal{F}_t\}_{t \ge 0})$ satisfying the usual conditions, $X_t^i$, $i = 1, 2, \ldots, d$, are the appreciation rate processes of the stocks, $X_t^0$ is the interest rate process, and the $d \times m$ matrix-valued process $\tilde\Sigma_t := (\tilde\sigma_t^{ij})$ is the volatility process. Here and throughout this book $A^*$ denotes the transpose of a matrix $A$. Let

$$\mathcal{G}_t := \sigma(S_s^i : s \le t,\ i = 0, 1, 2, \ldots, d), \quad t \ge 0.$$
In our model $\mathcal{G}_t$, rather than $\mathcal{F}_t^{\tilde W}$ (the filtration generated by $\tilde W$), is the only information available to the investors at time $t$. One of the objectives of mathematical finance is to study how to choose a suitable portfolio such that the terminal wealth is optimized. Let $u_t^i$ be the worth of an agent’s wealth (dollar amount) in the $i$th stock, $i = 1, 2, \ldots, d$. Our decision must be based on the available information. Let $L^2_{\mathcal{G}}(0, T; \mathbb{R}^d)$ be the collection of square-integrable processes that are predictable with respect to the σ-fields $\mathcal{G}_t$.
Definition 1.1 A $d$-dimensional process $u_t \equiv (u_t^1, \ldots, u_t^d)^*$ is an admissible portfolio if $u_t \in L^2_{\mathcal{G}}(0, T; \mathbb{R}^d)$.

For the portfolio to be self-financed, the change in the wealth should be equal to the value change due to that of the stocks and the bond. Let $W_t$ be the wealth process. Then

$$\begin{aligned} dW_t &= \Big(W_t - \sum_{i=1}^d u_t^i\Big)\frac{dS_t^0}{S_t^0} + \sum_{i=1}^d u_t^i\,\frac{dS_t^i}{S_t^i}\\ &= \Big(X_t^0 W_t + \sum_{i=1}^d (X_t^i - X_t^0)\,u_t^i\Big)dt + \sum_{i=1}^d \sum_{j=1}^m \tilde\sigma_t^{ij} u_t^i\,d\tilde W_t^j. \end{aligned} \tag{1.4}$$

Applying Itô’s formula to equation (1.3), we have

$$d\log S_t^i = \Big(X_t^i - \frac{1}{2}a_t^{ii}\Big)dt + \sum_{j=1}^m \tilde\sigma_t^{ij}\,d\tilde W_t^j, \quad i = 1, 2, \ldots, d, \tag{1.5}$$

where

$$a_t^{ij} := \sum_{k=1}^m \tilde\sigma_t^{ik}\tilde\sigma_t^{jk}, \quad i, j = 1, 2, \ldots, d.$$

It is easy to show that the quadratic covariation process (which coincides with Meyer’s process in this case) between $\log S_t^i$ and $\log S_t^j$ is given by $\int_0^t a_s^{ij}\,ds$. Therefore, the matrix-valued process $A_t \equiv (a_t^{ij})$ is $\mathcal{G}_t$-adapted. Let $\Sigma_t \equiv (\sigma_t^{ij})$ be the square root of $A_t$. We will prove in Chapter 11 that $\sigma_t^{ij}$ is $\mathcal{G}_t$-adapted, i.e. it is completely observable. As we shall see in equation (1.8) below, the stock price $S_t^i$ satisfies an equivalent stochastic differential equation that depends on $(\sigma_t^{ij})$ instead of $(\tilde\sigma_t^{ij})$. Moreover, $X_t^0 = \frac{d\log S_t^0}{dt}$ is also $\mathcal{G}_t$-adapted.
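The claim that $A_t$ is observable from prices can be illustrated numerically: the sum of squared increments of $\log S_t$ over a fine grid approximates $\int_0^t a_s\,ds$ whatever the unobserved appreciation rate is. The sketch below assumes $d = m = 1$, a constant volatility $\tilde\sigma = 0.3$, and a hypothetical appreciation rate path; all numerical values and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

sigma_tilde = 0.3                                  # assumed constant volatility
X = 0.05 + 0.10 * np.sin(2.0 * np.pi * t[:-1])     # hypothetical appreciation rate (not observed)

# Increments of log S from (1.5) with d = m = 1, so that a_t = sigma_tilde**2.
dW = np.sqrt(dt) * rng.standard_normal(n)
dlogS = (X - 0.5 * sigma_tilde**2) * dt + sigma_tilde * dW

# Realized quadratic variation of log S on [0, T]; approximates int_0^T a_s ds.
realized_qv = np.sum(dlogS**2)
```

Here `realized_qv` should be close to `sigma_tilde**2 * T`, while the drift `X` contributes only terms of order `dt`. This is the discrete counterpart of the statement that the volatility is adapted to the price filtration whereas the appreciation rate is not.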
However, the stochastic process $X_t := (X_t^1, \ldots, X_t^d)^*$ is not necessarily $\mathcal{G}_t$-adapted and hence, its value is not available to the investors. We need to estimate $X_t$ based on the available information $\mathcal{G}_t$. From equation (1.5), we see that

$$\log S_t^i - \log S_0^i - \int_0^t \Big(X_s^i - \frac{1}{2}a_s^{ii}\Big)ds = \sum_{j=1}^m \int_0^t \tilde\sigma_s^{ij}\,d\tilde W_s^j, \quad i = 1, 2, \ldots, d,$$

are martingales with Meyer’s process $\int_0^t A_s\,ds = \int_0^t \Sigma_s^2\,ds$. By the martingale representation theorem, there exists a $d$-dimensional standard Brownian motion $W \equiv (W^1, \ldots, W^d)$ on $(\Omega, \mathcal{F}, P)$ such that

$$\sum_{j=1}^m \tilde\sigma_t^{ij}\,d\tilde W_t^j = \sum_{j=1}^d \sigma_t^{ij}\,dW_t^j, \quad i = 1, \ldots, d. \tag{1.6}$$

Thus,

$$d\log S_t^i = \Big(X_t^i - \frac{1}{2}a_t^{ii}\Big)dt + \sum_{j=1}^d \sigma_t^{ij}\,dW_t^j, \quad i = 1, \ldots, d. \tag{1.7}$$

Equivalently, the stock prices satisfy the following modified stochastic differential equations:

$$dS_t^i = S_t^i\Big(X_t^i\,dt + \sum_{j=1}^d \sigma_t^{ij}\,dW_t^j\Big), \quad i = 1, \ldots, d. \tag{1.8}$$

We assume that $\Sigma_t$ is invertible. Let $\tilde S_t$ be defined by

$$d\tilde S_t := \Sigma_t^{-1}\,d\log S_t.$$

We can write the observation equation (1.7) as

$$\tilde S_t = \tilde S_0 + \int_0^t \Sigma_s^{-1}\Big(X_s - \frac{1}{2}\tilde A_s\Big)ds + W_t, \tag{1.9}$$

where

$$\tilde A_s = \big(a_s^{11}, \ldots, a_s^{dd}\big)^*.$$

If $\Sigma_t$ is non-random, then $\mathcal{F}_t^{\tilde S} = \mathcal{G}_t$. Let $Y_t = \tilde S_t - \tilde S_0$. The observation model can be written as

$$Y_t = \int_0^t h_s(X_s)\,ds + W_t,$$

where

$$h_s(x) = \Sigma_s^{-1}\Big(x - \frac{1}{2}\tilde A_s\Big).$$

1.1.3
Environment pollution
Suppose that there is a source of pollution at a location $\theta$ at which undesired chemicals are dumped into the river $[0, \ell]$. We assume that the dumping times follow a Poisson process with parameter $\lambda$ and the amounts are independent $\mathbb{R}_+$-valued random variables $\xi_1, \xi_2, \ldots$ with the same distribution. Denote the dumping times by $\tau_1 < \tau_2 < \cdots$. Then, the chemical distribution $X_t$ in the river at time $t$ is an $M_F([0, \ell])$-valued stochastic process, where $M_F([0, \ell])$ denotes the collection of finite measures on $[0, \ell]$. For $\tau_j \le t < \tau_{j+1}$, the process $X_t$ satisfies the following partial differential equation

$$\frac{d}{dt}\langle X_t, f\rangle = \langle X_t, Lf\rangle,$$

where $\langle\mu, f\rangle$ represents the integral of a function $f$ with respect to a measure $\mu$,

$$Lf(x) = D f''(x) + V f'(x) - \alpha f(x),$$

$D$ is the dispersion coefficient, $V$ is the river velocity, and $\alpha$ is the leakage rate. At time $t = \tau_j$, there is a random jump for $X$ given by

$$X_t - X_{t-} = \xi_j\,\delta_\theta,$$

where $\delta_\theta$ is the Dirac measure at $\theta$. Suppose that $m$ observation stations $x_1, \ldots, x_m$ are set up along the river. The chemical concentrations near these stations are observed subject to random error:

$$Y_t^i = \int_0^t \frac{1}{2\epsilon}\,X_s([x_i - \epsilon, x_i + \epsilon])\,ds + W_t^i, \quad i = 1, 2, \ldots, m.$$

Let $\mathcal{G}_t = \sigma(Y_s : 0 \le s \le t)$. Then $\mathcal{G}_t$ is the information available and we need to estimate $X_t$ based on $\mathcal{G}_t$. It is also desirable to estimate the parameters $\theta$ and $\lambda$.
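One consequence of the model can be checked by hand. If the constant function $f \equiv 1$ is an admissible test function (an assumption here; it amounts to no mass leaving through the ends of the river), then $Lf = -\alpha$, so the total mass decays at rate $\alpha$ between dumps and jumps by $\xi_j$ at $\tau_j$. The total mass is then $\sum_{\tau_j \le t} \xi_j e^{-\alpha(t - \tau_j)}$. The sketch below simulates this; the exponential law for the amounts, the parameter values, and the helper name `total_mass` are illustrative assumptions.

```python
import numpy as np

def total_mass(t, dump_times, dump_amounts, alpha):
    """Total mass <X_t, 1> = sum over tau_j <= t of xi_j * exp(-alpha * (t - tau_j))."""
    active = dump_times <= t
    decay = np.exp(-alpha * (t - dump_times[active]))
    return float(np.sum(dump_amounts[active] * decay))

rng = np.random.default_rng(0)
lam, alpha, T = 2.0, 0.5, 10.0          # assumed rate, leakage, horizon

# Given the number of Poisson events on [0, T], the event times are i.i.d. uniform.
n_dumps = rng.poisson(lam * T)
dump_times = np.sort(rng.uniform(0.0, T, n_dumps))
dump_amounts = rng.exponential(1.0, n_dumps)   # assumed law of the xi_j

mass_at_T = total_mass(T, dump_times, dump_amounts, alpha)
```

The full measure-valued process also requires solving the dispersion equation in space, which this sketch does not attempt.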
1.1.4
Filtering with OU process as noise
As we indicated in Section 1.1.1, white noise does not exist in the ordinary sense. We will demonstrate below that the Ornstein–Uhlenbeck process provides a natural approximation of white noise.
Let $\beta > 0$ and consider the process $O_t^\beta$ governed by the following SDE:

$$dO_t^\beta = -\beta O_t^\beta\,dt + \beta\,dW_t,$$

where $W_t$ is an $m$-dimensional Brownian motion. $O_t^\beta$ is called the Ornstein–Uhlenbeck process with parameter $\beta$. Applying Itô’s formula (to be given in Chapter 3), we get

$$d\big(e^{\beta t} O_t^\beta\big) = \beta e^{\beta t}\,dW_t,$$

and hence,

$$O_t^\beta = O_0 e^{-\beta t} + \beta e^{-\beta t}\int_0^t e^{\beta s}\,dW_s.$$

It follows from Theorem 3.6 that for $t \ge s \ge 0$,

$$\mathrm{Cov}(O_t^\beta, O_s^\beta) = \frac{\beta}{2}\Big(e^{-\beta(t-s)} - e^{-\beta(t+s)}\Big) \to \begin{cases} \infty & \text{if } t = s,\\ 0 & \text{if } t \ne s. \end{cases}$$

Thus, $O_t^\beta$ approximates white noise as $\beta \to \infty$. More precisely, we can prove that its integral converges to the Brownian motion $W_t$. In fact, as

$$\int_0^t O_r^\beta\,dr = W_t - \frac{1}{\beta}\big(O_t^\beta - O_0\big),$$

it is easy to see that

$$\int_0^t O_r^\beta\,dr \to W_t, \quad \text{as } \beta \to \infty.$$

For simplicity of notation, we take $\beta = 1$ and denote $O_t^1$ by $O_t$. We will consider the filtering problem with the following observation model:

$$y_t = h(X_t) + O_t. \tag{1.10}$$

Since the law of $y$ is not absolutely continuous with respect to the law of the OU-process $O$, the filtering problem with equation (1.10) as the observation model is singular. We will study a general singular filtering model with equation (1.10) as a special case in Chapter 11.
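The approximation of Brownian motion by the integrated Ornstein–Uhlenbeck process can be explored numerically. Integrating the SDE for $O_t^\beta$ gives $\int_0^t O_r^\beta\,dr = W_t - (O_t^\beta - O_0)/\beta$, and an Euler scheme satisfies the discrete analogue of this identity exactly, up to floating-point rounding. The sketch below assumes $m = 1$; the function name `simulate_ou`, the step count, and the values of $\beta$ are illustrative assumptions.

```python
import numpy as np

def simulate_ou(beta, T=1.0, n=20_000, o0=0.0, seed=0):
    """Euler scheme for dO = -beta * O dt + beta dW on [0, T] (scalar case)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = np.sqrt(dt) * rng.standard_normal(n)
    O = np.empty(n + 1)
    O[0] = o0
    for k in range(n):
        O[k + 1] = O[k] - beta * O[k] * dt + beta * dW[k]
    W = np.concatenate(([0.0], np.cumsum(dW)))                   # driving Brownian path
    integral = np.concatenate(([0.0], np.cumsum(O[:-1]) * dt))   # left Riemann sums of O
    return O, W, integral

# Largest gap between the integrated OU path and W, for increasing beta.
gaps = {}
for beta in (1.0, 10.0, 100.0):
    O, W, integral = simulate_ou(beta)
    gaps[beta] = np.max(np.abs(integral - W))
```

By the identity, the gap at time $t$ is $|O_t^\beta - O_0|/\beta$. With $O_0 = 0$, the covariance formula gives this quantity a variance of at most $1/(2\beta)$, so the gap is of order $\beta^{-1/2}$ for large $\beta$.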
1.2
Basic definitions and the filtering equation
As we have seen from the examples in the previous section, the filtering problem consists of two processes: the signal process, which is what we want to estimate, and the observation process, which provides the information we can use. In this book, we will model the signal by a $d$-dimensional diffusion process $X_t$ governed by the following stochastic differential equation:

$$dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \sigma(X_t)\,dB_t, \tag{1.11}$$

where $W$ and $B$ are two independent Brownian motions taking values in $\mathbb{R}^m$ and $\mathbb{R}^d$, respectively. The mappings $b : \mathbb{R}^d \to \mathbb{R}^d$, $c : \mathbb{R}^d \to \mathbb{R}^{d \times m}$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ are continuous. The observation process is an $m$-dimensional process satisfying the following stochastic differential equation:

$$Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{1.12}$$

where $h : \mathbb{R}^d \to \mathbb{R}^m$ is a continuous mapping. Let

$$\mathcal{G}_t = \sigma(Y_s : 0 \le s \le t)$$

be the information available to us. Note that such a setup will not cover the model in the pollution example, which needs an infinite-dimensional state space. Its solution is beyond the scope of this book.

As we will demonstrate in Chapter 5, the non-linear optimal filter is a $\mathcal{P}(\mathbb{R}^d)$-valued process $\pi_t$ that is the conditional distribution of $X_t$ given $\mathcal{G}_t$, where $\mathcal{P}(\mathbb{R}^d)$ denotes the collection of all Borel probability measures on $\mathbb{R}^d$. The key in the development of non-linear filtering theory is the Kallianpur–Striebel formula, which represents the optimal filter $\pi_t$ in terms of an unnormalized filter $V_t$:

$$\langle\pi_t, f\rangle = \frac{\langle V_t, f\rangle}{\langle V_t, 1\rangle}, \quad \forall f \in C_b^2(\mathbb{R}^d), \tag{1.13}$$

where

$$\langle V_t, f\rangle = \hat E\big(M_t f(X_t) \,\big|\, \mathcal{G}_t\big),$$

and $\hat E$ refers to the expectation with respect to a probability measure $\hat P$, which is equivalent to $P$ such that

$$\frac{dP}{d\hat P}\bigg|_{\mathcal{F}_t} = M_t.$$

The process $V_t$ takes values in $M_F(\mathbb{R}^d)$, the space of finite Borel measures on $\mathbb{R}^d$.

The main advantage of using $\hat P$ is that $Y$ becomes a Brownian motion that is independent of $B$, and $X_t$ is governed by a stochastic differential equation driven by $B$ and $Y$. Based on this fact, a stochastic differential equation on $M_F(\mathbb{R}^d)$ is derived:

$$\langle V_t, f\rangle = \langle V_0, f\rangle + \int_0^t \langle V_s, Lf\rangle\,ds + \int_0^t \langle V_s, \nabla^* f c + f h^*\rangle\,dY_s, \tag{1.14}$$

where

$$Lf = \frac{1}{2}\,\mathrm{tr}\big(c^*\partial^2 f c + \sigma^*\partial^2 f \sigma\big) + \nabla^* f b = \frac{1}{2}\sum_{i,j=1}^d a_{ij}\,\partial_{ij}^2 f + \sum_{i=1}^d b_i\,\partial_i f,$$

and $a = cc^* + \sigma\sigma^*$. The equation above is called Zakai’s equation for the unnormalized filter. Applying Itô’s formula to the Kallianpur–Striebel formula and Zakai’s equation, we can obtain the filtering equation for $\pi_t$:

$$\langle\pi_t, f\rangle = \langle\pi_0, f\rangle + \int_0^t \langle\pi_s, Lf\rangle\,ds + \int_0^t \Big(\langle\pi_s, \nabla^* f c + f h^*\rangle - \langle\pi_s, f\rangle\langle\pi_s, h^*\rangle\Big)\,d\nu_s, \tag{1.15}$$

where

$$\nu_t = Y_t - \int_0^t \langle\pi_s, h\rangle\,ds$$

is a Brownian motion with respect to the original probability measure $P$. The process $\nu_t$ is called the innovation process and the filtering equation is called the Kushner–Stratonovich equation, or the FKK equation (which stands for Fujisaki–Kallianpur–Kunita).
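The Kallianpur–Striebel formula (1.13) suggests a simple weighted Monte-Carlo approximation: average $M_t f(X_t)$ over independent signal paths and normalise by the average of $M_t$. The sketch below is a minimal illustration under stated assumptions, not the scheme developed later in the book. It takes $d = m = 1$ and $c \equiv 0$, so that under $\hat P$ the signal keeps its law and is independent of $Y$, with $M_t = \exp\big(\int_0^t h(X_s)\,dY_s - \frac{1}{2}\int_0^t h(X_s)^2\,ds\big)$. The coefficients $b(x) = -x$, $\sigma = 1$, $h(x) = x$, the initial law $N(0, 1)$, and all function names are hypothetical choices.

```python
import numpy as np

def b(x):
    return -x          # assumed drift

def h(x):
    return x           # assumed observation function

SIGMA = 1.0            # assumed diffusion coefficient

def weighted_particle_filter(dY, dt, n_particles, rng):
    """Approximate <pi_t, f> for f(x) = x via normalised weights M_t^i (case c = 0)."""
    x = rng.standard_normal(n_particles)      # X_0^i drawn from pi_0 = N(0, 1)
    log_m = np.zeros(n_particles)             # log of the weights M_t^i
    means = []
    for dy in dY:
        hx = h(x)
        log_m += hx * dy - 0.5 * hx**2 * dt   # increment of log M_t
        x = x + b(x) * dt + SIGMA * np.sqrt(dt) * rng.standard_normal(n_particles)
        w = np.exp(log_m - log_m.max())       # subtract the max for numerical stability
        w /= w.sum()                          # normalisation by <V_t, 1>
        means.append(np.sum(w * x))
    return np.array(means), w

# Synthetic data from the same (assumed) model.
rng = np.random.default_rng(0)
dt, n = 1e-2, 200
x_true = rng.standard_normal()
dY = np.empty(n)
for k in range(n):
    dY[k] = h(x_true) * dt + np.sqrt(dt) * rng.standard_normal()
    x_true += b(x_true) * dt + SIGMA * np.sqrt(dt) * rng.standard_normal()

means, w = weighted_particle_filter(dY, dt, 2000, rng)
```

Without resampling, the weights tend to concentrate on a few particles as time grows, which is one motivation for schemes that replace weights by branching.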
1.3
An overview
In this section, we will give an outline of the results that will be studied in this book. In Chapters 2–4, we will introduce the basic material of stochastic analysis that will be used in the rest of this book. We refer the reader who is interested in a more detailed treatment of this topic to the following books (Ikeda and Watanabe [76], Revuz and Yor [135], Protter [134]). Then, in Chapter 5, we derive the Kallianpur–Striebel formula as well as the filtering equation (1.15) and Zakai’s equation (1.14). Now we sketch the results of Chapters 6–11.
In Chapter 6, we study Zakai’s equation (1.14) as a linear stochastic partial differential equation (SPDE). We make a linear transformation from $M_F(\mathbb{R}^d)$ to a Hilbert space $H_0$ such that equation (1.14) is transformed to an equation on $H_0$. We then use Hilbert-space techniques to derive various estimates for equations on $H_0$. As a consequence, we prove that equation (1.14) has a unique solution.

In Chapter 7, we study the filtering equation (1.15) as a non-linear SPDE. To get the uniqueness of the solution, we consider a particle system

$$\begin{cases} X_t^i = X_0^i + \int_0^t \sigma(X_s^i)\,dB_s^i + \int_0^t \tilde b(X_s^i, \mu_s)\,ds + \int_0^t c(X_s^i)\,d\nu_s,\\[4pt] A_t^i = A_0^i + \int_0^t A_s^i\,\beta^*(X_s^i, \mu_s)\,d\nu_s,\\[4pt] \mu_t = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n A_t^i\,\delta_{X_t^i}, \end{cases} \tag{1.16}$$

where $\nu, B^i$, $i = 1, 2, \ldots$, are independent Brownian motions, and

$$\beta(x, \mu) = h(x) - \langle\mu, h\rangle \quad\text{and}\quad \tilde b(x, \mu) = b(x) - c(x)\beta(x, \mu)$$

for $x \in \mathbb{R}^d$ and $\mu \in \mathcal{P}(\mathbb{R}^d)$. It can be proved that $\pi_t$ is a solution to equation (1.16), i.e. equation (1.16) holds with $\mu_t$ replaced by $\pi_t$ for suitable $(X_t^i, A_t^i)$, $i = 1, 2, \ldots$.

Theorem 1.2 Under suitable conditions, the infinite system (1.16) has a unique solution $(X, A, \mu)$.

Next, we proceed to provide an intuitive proof for the uniqueness of the solution to equation (1.15). Let $\{\mu_t\}$ be another solution to equation (1.15). Then $\{\mu_t\}$ is a solution to the following linear SPDE:

$$\langle\eta_t, \phi\rangle = \langle\pi_0, \phi\rangle + \int_0^t \langle\eta_s, L\phi\rangle\,ds + \int_0^t \langle\eta_s, \beta_s^*\phi + \nabla^*\phi c\rangle\,d\nu_s, \tag{1.17}$$

where $\beta_s(x) = \beta(x, \mu_s)$. With $\mu_t$ given, we consider a system of the form (1.16) as follows:

$$\begin{cases} X_t^i = X_0^i + \int_0^t \sigma(X_s^i)\,dB_s^i + \int_0^t \tilde b(X_s^i, \mu_s)\,ds + \int_0^t c(X_s^i)\,d\nu_s,\\[4pt] A_t^i = A_0^i + \int_0^t A_s^i\,\beta^*(X_s^i, \mu_s)\,d\nu_s. \end{cases} \tag{1.18}$$

We define a measure-valued process

$$\tilde\mu_t = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n A_t^i\,\delta_{X_t^i}.$$
Then, $\tilde\mu_t$ is a solution of equation (1.17). Since the linear SPDE has a unique solution, we get that $\tilde\mu_t = \mu_t$, and hence, $\mu_t$ is a solution to the system (1.16). By the uniqueness of the solution for the system (1.16) we see that $\mu_t = \pi_t$. Thus, we have “proved” the following

Theorem 1.3 Under suitable conditions, the filtering equation (1.15) has a unique solution.

Next, in Chapter 8, we study the numerical approximation for the optimal filter using some branching particle systems. Let $\{x_i^n, i = 1, 2, \ldots, n\}$ be i.i.d. random vectors in $\mathbb{R}^d$ with common distribution $\pi_0 \in \mathcal{P}(\mathbb{R}^d)$. Then

$$\pi_0^n \equiv \frac{1}{n}\sum_{i=1}^n \delta_{x_i^n} \to \pi_0 \quad\text{in } \mathcal{P}(\mathbb{R}^d).$$

Let $\delta = n^{-2\alpha}$, $0 < \alpha < 1$. For $j = 0, 1, 2, \ldots$, we suppose that there are $m_j^n$ particles alive at time $t = j\delta$. During the time interval $(j\delta, (j+1)\delta)$, the particles move according to the following diffusions: for $i = 1, 2, \ldots, m_j^n$,

$$X_t^i = X_{j\delta}^i + \int_{j\delta}^t \hat b(X_s^i)\,ds + \int_{j\delta}^t \sigma(X_s^i)\,dB_s^i + \int_{j\delta}^t c(X_s^i)\,dY_s, \tag{1.19}$$

where $\hat b = b - ch$. At the end of the interval, the $i$th particle ($i = 1, 2, \ldots, m_j^n$) branches (independent of others) into a random number $\xi_{j+1}^i$ of offspring satisfying

$$\xi_{j+1}^i = \begin{cases} [\tilde M_{j+1}^n(X^i)] & \text{with probability } 1 - \{\tilde M_{j+1}^n(X^i)\},\\[4pt] [\tilde M_{j+1}^n(X^i)] + 1 & \text{with probability } \{\tilde M_{j+1}^n(X^i)\}, \end{cases}$$

where $\{x\} = x - [x]$ is the fractional part of $x$,

$$\tilde M_{j+1}^n(X^i) = \frac{M_{j+1}^n(X^i)}{\frac{1}{m_j^n}\sum_{\ell=1}^{m_j^n} M_{j+1}^n(X^\ell)},$$

and

$$M_{j+1}^n(X^i) = \exp\bigg(\int_{j\delta}^{(j+1)\delta} h^*(X_t^i)\,dY_t - \frac{1}{2}\int_{j\delta}^{(j+1)\delta} |h(X_t^i)|^2\,dt\bigg). \tag{1.20}$$

Now we define the approximate filter as follows:

$$\pi_t^n = \frac{1}{m_j^n}\sum_{i=1}^{m_j^n} \tilde M_j^n(X^i, t)\,\delta_{X_t^i}, \quad j\delta \le t < (j+1)\delta, \tag{1.21}$$

where

$$M_j^n(X^i, t) = \exp\bigg(\int_{j\delta}^t h^*(X_s^i)\,dY_s - \frac{1}{2}\int_{j\delta}^t |h(X_s^i)|^2\,ds\bigg), \tag{1.22}$$

and

$$\tilde M_j^n(X^i, s) = \frac{M_j^n(X^i, s)}{\frac{1}{m_j^n}\sum_{\ell=1}^{m_j^n} M_j^n(X^\ell, s)}.$$

Namely, the $i$th particle has a time-dependent weight $\tilde M_j^n(X^i, t)$. At the end of the interval, i.e. $t = (j+1)\delta$, this particle dies and gives birth to a random number of offspring whose conditional expectation is equal to the pre-death weight of the particle (note that $\tilde M_{j+1}^n(X^i) = \tilde M_j^n(X^i, (j+1)\delta)$).

Theorem 1.4 Under suitable conditions, there exists a constant $K_1$ such that

$$E\sup_{0 \le t \le T} d(\pi_t^n, \pi_t) \le K_1 n^{-\frac{1-\alpha}{2}}, \tag{1.23}$$

where $d$ is a suitable distance on $\mathcal{P}(\mathbb{R}^d)$.

In Chapter 9, we specialize the filtering problem to the linear case. The optimal filter in this case is called the Kalman–Bucy filter. We assume that $\pi_0$ is a multivariate normal distribution with mean vector $\hat X_0$ and covariance matrix $\gamma_0$. Suppose that the signal $X_t$ and the observation $Y_t$ are given by linear equations

$$\begin{cases} dX_t = \big(\tilde b_t + b_t X_t\big)\,dt + c_t\,dW_t + \sigma_t\,dB_t,\\[4pt] dY_t = \big(\tilde h_t + h_t X_t\big)\,dt + dW_t, \quad Y_0 = 0. \end{cases} \tag{1.24}$$

Theorem 1.5 The optimal filter $\pi_t$ is a multivariate normal (random) distribution with mean vector $\hat X_t$ and covariance matrix $\gamma_t$ characterized by the following equations

$$\hat X_t = \hat X_0 + \int_0^t \big(\tilde b_s + b_s \hat X_s\big)\,ds + \int_0^t \big(c_s + \gamma_s h_s^*\big)\,d\nu_s, \tag{1.25}$$

and

$$\frac{d}{dt}\gamma_t = \gamma_t b_t^* + b_t\gamma_t + a_t - (c_t + \gamma_t h_t^*)(c_t + \gamma_t h_t^*)^*. \tag{1.26}$$
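Equations (1.25) and (1.26) can be integrated directly. The following is a minimal scalar sketch, assuming $d = m = 1$, time-independent coefficients, $\tilde b = \tilde h = 0$, and an explicit Euler discretisation. The function name `kalman_bucy_scalar` and the parameter values are illustrative assumptions.

```python
import numpy as np

def kalman_bucy_scalar(b, c, sigma, h, x0_mean, gamma0, T=10.0, dt=1e-3, seed=0):
    """Euler discretisation of (1.24)-(1.26) in the scalar, time-homogeneous case."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    a = c * c + sigma * sigma                  # a = c c* + sigma sigma*
    x = x0_mean + np.sqrt(gamma0) * rng.standard_normal()   # X_0 drawn from pi_0
    x_hat, gamma = x0_mean, gamma0
    xs, x_hats, gammas = np.empty(n + 1), np.empty(n + 1), np.empty(n + 1)
    xs[0], x_hats[0], gammas[0] = x, x_hat, gamma
    for k in range(n):
        dW = np.sqrt(dt) * rng.standard_normal()
        dB = np.sqrt(dt) * rng.standard_normal()
        dY = h * x * dt + dW                   # observation increment
        dnu = dY - h * x_hat * dt              # innovation increment
        gain = c + gamma * h
        x_hat = x_hat + b * x_hat * dt + gain * dnu                 # mean, (1.25)
        gamma = gamma + (2.0 * b * gamma + a - gain * gain) * dt    # Riccati, (1.26)
        x = x + b * x * dt + c * dW + sigma * dB                    # signal
        xs[k + 1], x_hats[k + 1], gammas[k + 1] = x, x_hat, gamma
    return xs, x_hats, gammas

# Same model, two different initial variances.
_, _, g1 = kalman_bucy_scalar(-1.0, 0.0, 1.0, 1.0, 0.0, 1.0)
_, _, g2 = kalman_bucy_scalar(-1.0, 0.0, 1.0, 1.0, 0.0, 5.0)
```

For $c = 0$ the stationary value of (1.26) solves $2b\gamma + \sigma^2 - h^2\gamma^2 = 0$, so $\gamma_\infty = (b + \sqrt{b^2 + h^2\sigma^2})/h^2$, which equals $\sqrt{2} - 1$ for the values used here. In this example both `g1` and `g2` approach that value, so the variance forgets its initial condition.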
In Chapter 9, we also investigate the stability of the Kalman–Bucy filter. Let $\bar\pi_t$ be the multivariate normal (random) distribution with mean vector $Z_t$ and covariance matrix $P_t$ characterized by equations (1.25) and (1.26) with incorrect initial conditions $Z_0$ and $P_0$, respectively. Then, as time increases, the Kalman–Bucy filter will correct the incorrect initial conditions. We consider the following case: all coefficients in model (1.24) are independent of $t$, $\tilde b_t = 0$ and $\tilde h_t = 0$.

Theorem 1.6 Under suitable conditions on the coefficient matrices $(b - ch, \sigma, h)$, we have

$$\lim_{t \to \infty} P_t = \lim_{t \to \infty} \gamma_t = \gamma_\infty,$$

and, for some $\lambda > 0$,

$$\lim_{t \to \infty} \big|\hat X_t - Z_t\big|\,e^{\lambda t} = 0, \quad \text{a.s. (almost surely)}.$$

As a consequence, we have that

$$\lim_{t \to \infty} \rho(\pi_t, \bar\pi_t) = 0, \quad \text{a.s.},$$

where $\rho$ is the Wasserstein distance in $\mathcal{P}(\mathbb{R}^d)$.

We come back to the non-linear filtering problem and consider the stability and other properties for the optimal filter $\pi_t$ in Chapter 10. Here, we assume that the signal is a general Markov process on a Polish space $S$ with generator $L$ and a unique invariant measure $\mu \in \mathcal{P}(S)$. Suppose that the observation process $Y_t$ is given by

$$Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{1.27}$$

where $h : S \to \mathbb{R}^m$ is a continuous map and $W_t$ is an $m$-dimensional Brownian motion independent of $X$. The next theorem establishes the Markov property for the optimal filter.

Theorem 1.7 Under suitable conditions, the filter $\{\pi_t\}$ is a Feller–Markov process taking values in $\mathcal{P}(S)$.

The ergodicity for the non-linear filter was first studied by Kunita [92]. Subsequently, many authors extended it to various setups. Then, using the results for the ergodicity, many authors studied the stability of the optimal filter. Unfortunately, there is a gap in Kunita’s proof that was found by Baxendale et al. [5].

Let $\{\xi_t, -\infty < t < \infty\}$ be a stationary Markov process on $S$ with generator $L$ and marginal distribution $\mu$. Let $\{\beta_t, -\infty < t < \infty\}$ be a Brownian motion on $\mathbb{R}^m$ and let $\{z_t, t \in \mathbb{R}\}$ be a process satisfying

$$z_t - z_s = \int_s^t h(\xi_u)\,du + \beta_t - \beta_s.$$

Define

$$\mathcal{F}_{s,t}^z = \sigma(z_v - z_u : s \le u \le v \le t), \quad -\infty \le s \le t \le \infty.$$

Let $\bar\pi_t$ be the solution of the filtering equation with incorrect initial condition. The filter is asymptotically stable if

$$\lim_{t \to \infty} d(\pi_t, \bar\pi_t) = 0, \quad \text{a.s.}$$

for a suitable metric $d$ on $\mathcal{P}(S)$. The key equality used by Kunita [92] is that

$$\bigcap_{s=-\infty}^{\infty} \Big(\mathcal{F}_{-\infty,t}^z \vee \mathcal{F}_{-\infty,s}^\xi\Big) = \mathcal{F}_{-\infty,t}^z, \quad \text{a.s.} \tag{1.28}$$

However, this equality does not hold in general, as pointed out by Baxendale et al. [5]. The following theorem was proved by Budhiraja [18].

Theorem 1.8 Under suitable conditions on $L$, the following statements are equivalent:
1. The equality (1.28) holds.
2. The Markov process $\{\pi_t\}$ has a unique invariant measure.
3. The filter $\pi_t$ is asymptotically stable.

Since condition (1.28) is not easy to verify, many authors studied the stability problem using other methods. Below we state a result of Atar and Zeitouni [3], which is proved using the Hilbert metric.

Theorem 1.9 Suppose that $S$ is a compact manifold in $\mathbb{R}^d$. Then

$$\limsup_{t \to \infty} \frac{1}{t}\log d(\pi_t, \bar\pi_t) < 0, \quad \text{a.s.}$$
Finally, in Chapter 11, we consider the filtering model when the observation noise depends on the signal:

$$\begin{cases} dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \tilde c(X_t)\,dB_t,\\[4pt] dY_t = h(X_t)\,dt + \sigma(X_t)\,dW_t, \end{cases} \tag{1.29}$$

where $B$ and $W$ are two independent Brownian motions in $\mathbb{R}^d$ and $\mathbb{R}^m$, respectively, and $b, c, \tilde c, h, \sigma$ are functions on $\mathbb{R}^d$ with values in $\mathbb{R}^d$, $\mathbb{R}^{d \times m}$, $\mathbb{R}^{d \times d}$, $\mathbb{R}^m$, $\mathbb{R}^{m \times m}$, respectively. Without loss of generality, we assume that for each $x \in \mathbb{R}^d$, we have $\sigma(x) \in S_d$, the collection of all symmetric positive-definite $d \times d$-matrices. Let $Z_t = \sigma(X_t)$. It is easy to show that $Z_t$, $t > 0$, is observable, and hence, the optimal filter $\pi_t$ is a probability measure supported on the manifold $M_{Z_t}$, where for $z \in S_d$,

$$M_z = \{x \in \mathbb{R}^d : \sigma(x) = z\}.$$

Thus, the optimal filter is singular. The following decomposition plays a key role in the study of the singular filter $\pi_t$.

Theorem 1.10 Under suitable conditions, there exists an observable flow $\xi_{t,s}$ from $M_{Z_s}$ to $M_{Z_t}$, and a diffusion process $\kappa_t$ on the manifold $M_{Z_0}$ such that

$$X_t(\omega) = \xi_{t,0}(\kappa_t(\omega), \omega), \quad \forall t \ge 0.$$

Let $(\hat Y_t, Z_t)$ be the observation process, where

$$\hat Y_t = \int_0^t \sigma^{-1}h(X_s)\,ds + W_t.$$

We denote the optimal filter of the signal $\kappa_t$ by $U_t$, which is a $\mathcal{P}(M_{Z_0})$-valued process. Note that this is a classical filtering problem, and can be solved by the methods we mentioned above. The singular filter can be represented in terms of $U_t$, as the following theorem points out.

Theorem 1.11 For any $f \in C_b(\mathbb{R}^d)$ and $t > 0$, we have

$$\langle\pi_t, f\rangle = \langle U_t, f \circ \xi_{t,0}\rangle.$$

Throughout this book, we will use $K$ with a subscript to denote a constant. The subscript will be taken consecutively in each theorem and will restart from 1 at the beginning of each new theorem. Thus, for example, $K_1$ in two different theorems may have different values.
2
Brownian motion and martingales
In this chapter, we introduce some basic concepts and properties of martingales that will be needed in the development of the filtering theory introduced in this book. The aim is to prepare the reader with the necessary material for the study of filtering theory through the shortest route possible. Throughout this book, we fix a complete probability space $(\Omega, \mathcal{F}, P)$ and a family of increasing sub-σ-fields $\mathcal{F}_t$ ($t \in T$) satisfying the usual conditions: $\mathcal{F}_0$ contains all $P$-null sets and $\mathcal{F}_t$ is right-continuous. We shall take $T = \mathbb{R}_+ = [0, \infty)$ unless stated otherwise. Occasionally, we take $T = \mathbb{N} = \{0, 1, 2, \ldots\}$ for the discrete case. All the stochastic processes $X_t$ in this book will be adapted to this family of σ-fields, i.e. $\forall\, t$, $X_t$ is $\mathcal{F}_t$-measurable. The quadruple $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ is called a stochastic basis.
2.1
Martingales
Let $X_t$ be a real-valued stochastic process such that $E|X_t| < \infty$, $\forall\, t \in T$.

Definition 2.1 $\{X_t\}_{t\in T}$ is a martingale if $\forall\, s < t$,
$$E(X_t | \mathcal{F}_s) = X_s \quad \text{a.s.} \tag{2.1}$$
It is a supermartingale (resp. submartingale) if the equality in (2.1) is replaced by the inequality
$$E(X_t | \mathcal{F}_s) \le X_s \quad (\text{resp. } \ge) \quad \text{a.s.}$$
We consider the discrete case first. Let $T = \mathbb{N}$ and let $X_n$ be a discrete-time stochastic process. Let $f_n$ be a predictable process (i.e. $f_n$ is $\mathcal{F}_{n-1}$-measurable). We define a transformation
$$(f \cdot X)_n = f_0 X_0 + \sum_{k=1}^{n} f_k (X_k - X_{k-1}).$$
Note that this transformation is the discrete-time counterpart of the stochastic integral that will be introduced in Chapter 3.

Proposition 2.2 If $X_n$ is a martingale (resp. supermartingale) and $f_n$ is a bounded (resp. non-negative and bounded) predictable process, then $(f \cdot X)_n$ is a martingale (resp. supermartingale).

Proof Suppose that $X_n$ is a martingale. As $(f \cdot X)_n = (f \cdot X)_{n-1} + f_n (X_n - X_{n-1})$ and $f_n$ is $\mathcal{F}_{n-1}$-measurable, we have
$$E((f \cdot X)_n | \mathcal{F}_{n-1}) = (f \cdot X)_{n-1} + f_n E(X_n - X_{n-1} | \mathcal{F}_{n-1}) = (f \cdot X)_{n-1}.$$
Thus, $(f \cdot X)_n$ is a martingale. The other case can be verified similarly.
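Proposition 2.2 can be checked by hand on a small example. The following sketch enumerates every path of a fair ±1 random walk started at 0 and computes $E((f \cdot X)_n)$ exactly with rational arithmetic. The random walk, the function names and the doubling strategy are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def transform_paths(n_steps, strategy):
    """Yield (probability, (f.X)_n) for every path of a fair +/-1 walk with X_0 = 0."""
    prob = Fraction(1, 2 ** n_steps)
    for steps in product((-1, 1), repeat=n_steps):
        total = 0  # X_0 = 0, so the f_0 X_0 term vanishes
        for k in range(1, n_steps + 1):
            f_k = strategy(steps[: k - 1])  # predictable: sees only the first k-1 steps
            total += f_k * steps[k - 1]     # X_k - X_{k-1} is the k-th step
        yield prob, total


def doubling(past_steps):
    """Hypothetical strategy: double the stake after each loss, reset after a win."""
    stake = 1
    for step in past_steps:
        stake = stake * 2 if step < 0 else 1
    return stake


def expected_transform(n_steps, strategy):
    """Exact value of E((f.X)_n), which equals E((f.X)_0) = 0 for a martingale."""
    return sum(p * value for p, value in transform_paths(n_steps, strategy))
```

On a finite horizon the doubling stake is bounded, so the proposition applies and `expected_transform(6, doubling)` evaluates to zero: no predictable betting scheme changes the mean of a fair game.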
It is useful to consider a process at a random time $\tau$. Such a time should be "adapted" to the σ-fields $\mathcal{F}_t$: whether $\tau \le t$ or not should be decided using the information available at time $t$. More precisely, we give the following definition.

Definition 2.3 $\tau : \Omega \to T$ is a stopping time if $\forall\, t \in T$, $\{\tau \le t\} \in \mathcal{F}_t$. We define the σ-field at time $\tau$ as
$$\mathcal{F}_\tau = \{A \in \mathcal{F} : \forall\, t \in T,\ A \cap \{\tau \le t\} \in \mathcal{F}_t\}.$$
We denote the collection of all stopping times bounded by $T$ as $\mathcal{S}_T$.

Theorem 2.4 (Optional sampling theorem) Let $X = \{X_n\}_{n\in\mathbb{N}}$ be a martingale (resp. supermartingale). Let $\tau, \sigma \in \mathcal{S}_N$ be such that $\sigma(\omega) \le \tau(\omega)$, $\forall\, \omega \in \Omega$. Then,
$$E(X_\tau | \mathcal{F}_\sigma) = X_\sigma \quad (\text{resp. } \le) \quad \text{a.s.} \tag{2.2}$$
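Proof We assume that $X$ is a martingale. Let $f_n = 1_{\{\sigma < n \le \tau\}}$. Then $f_n$ is a bounded predictable process and $(f \cdot X)_N = X_\tau - X_\sigma$, so Proposition 2.2 gives
$$E(X_\tau) = E(X_\sigma).$$
For any $B \in \mathcal{F}_\sigma$, it is easy to show that $\tau_B \equiv \tau 1_B + N 1_{B^c}$ and $\sigma_B \equiv \sigma 1_B + N 1_{B^c}$ are two stopping times and $\sigma_B(\omega) \le \tau_B(\omega) \le N$. Hence,
$$E(X_{\tau_B}) = E(X_{\sigma_B}).$$
Therefore,
$$E(X_\sigma 1_B) = E(X_{\sigma_B}) - E(X_N 1_{B^c}) = E(X_{\tau_B}) - E(X_N 1_{B^c}) = E(X_\tau 1_B).$$
This proves (2.2). The case for supermartingales can be proved similarly.

Next, we give some estimates on the probabilities related to submartingales. Corollaries of these estimates will be used throughout this book.

Theorem 2.5 (Doob's inequality) Let $\{X_n\}_{n\in\mathbb{N}}$ be a submartingale. Then for every $\lambda > 0$ and $N \in \mathbb{N}$,
$$\lambda P\Big(\max_{n\le N} X_n \ge \lambda\Big) \le E\Big(X_N 1_{\{\max_{n\le N} X_n \ge \lambda\}}\Big) \le E(|X_N|),$$
and
$$\lambda P\Big(\min_{n\le N} X_n \le -\lambda\Big) \le E(|X_0| + |X_N|).$$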
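Proof Let $\sigma = \min\{n \le N : X_n \ge \lambda\}$ with the convention that $\min \emptyset = N$. Then $\sigma \in \mathcal{S}_N$. By (2.2), we have
$$E(X_N) \ge E(X_\sigma) = E\Big(X_\sigma 1_{\{\max_{n\le N} X_n \ge \lambda\}}\Big) + E\Big(X_N 1_{\{\max_{n\le N} X_n < \lambda\}}\Big) \ge \lambda P\Big(\max_{n\le N} X_n \ge \lambda\Big) + E\Big(X_N 1_{\{\max_{n\le N} X_n < \lambda\}}\Big).$$
Thus,
$$\lambda P\Big(\max_{n\le N} X_n \ge \lambda\Big) \le E(X_N) - E\Big(X_N 1_{\{\max_{n\le N} X_n < \lambda\}}\Big) = E\Big(X_N 1_{\{\max_{n\le N} X_n \ge \lambda\}}\Big) \le E(|X_N|).$$
The other inequality can be proved similarly.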
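As a sanity check of the first inequality in Theorem 2.5, the sketch below evaluates its three terms exactly for the submartingale $X_n = |S_n|$, where $S_n$ is a fair ±1 random walk. The choice of process and the function name are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product


def doob_maximal_terms(n_steps, lam):
    """Return (lam * P(max X_n >= lam), E[X_N 1{max >= lam}], E|X_N|) for X_n = |S_n|."""
    prob = Fraction(1, 2 ** n_steps)
    p_exceed = Fraction(0)
    e_on_event = Fraction(0)
    e_abs = Fraction(0)
    for steps in product((-1, 1), repeat=n_steps):
        s = 0
        running_max = 0  # X_0 = |S_0| = 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        x_final = abs(s)
        e_abs += prob * x_final
        if running_max >= lam:
            p_exceed += prob
            e_on_event += prob * x_final
    return lam * p_exceed, e_on_event, e_abs
```

For every level `lam` the returned triple is non-decreasing from left to right, which is exactly the chain of inequalities in the theorem.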
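Corollary 2.6 Let $\{X_n\}_{n\in\mathbb{N}}$ be a submartingale. Then for every $\lambda > 0$ and $N \in \mathbb{N}$,
$$\lambda P\Big(\max_{n\le N} |X_n| \ge \lambda\Big) \le 2E(|X_N|) + E(|X_0|).$$

Proof Combining both inequalities in Theorem 2.5, we get
$$\lambda P\Big(\max_{n\le N} |X_n| \ge \lambda\Big) \le \lambda P\Big(\max_{n\le N} X_n \vee \max_{n\le N}(-X_n) \ge \lambda\Big) \le \lambda P\Big(\max_{n\le N} X_n \ge \lambda\Big) + \lambda P\Big(\min_{n\le N} X_n \le -\lambda\Big) \le 2E(|X_N|) + E(|X_0|).$$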
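Corollary 2.7 (Doob's inequality) Let $\{X_n\}_{n\in\mathbb{N}}$ be a martingale such that for some $p > 1$ we have $E(|X_n|^p) < \infty$, $\forall\, n \in \mathbb{N}$. Then for every $N \in \mathbb{N}$ and $\lambda > 0$,
$$P\Big(\max_{n\le N} |X_n| \ge \lambda\Big) \le \frac{E(|X_N|^p)}{\lambda^p}, \tag{2.3}$$
and
$$E\Big(\max_{n\le N} |X_n|^p\Big) \le \Big(\frac{p}{p-1}\Big)^p E(|X_N|^p). \tag{2.4}$$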
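Proof By Jensen's inequality, $|X_n|^p$ is a submartingale and hence (2.3) follows from Theorem 2.5 directly. Let
$$Y = \max_{n\le N} |X_n|.$$
By Theorem 2.5, applied to the submartingale $|X_n|$, we have
$$\lambda P(Y \ge \lambda) \le E(|X_N| 1_{\{Y \ge \lambda\}}).$$
Hence,
$$\begin{aligned}
E(Y^p) &= E \int_0^\infty p\lambda^{p-1} 1_{\{\lambda \le Y\}}\, d\lambda \\
&= p \int_0^\infty \lambda^{p-1} P(Y \ge \lambda)\, d\lambda \\
&\le p \int_0^\infty \lambda^{p-2} E\big(1_{\{Y \ge \lambda\}} |X_N|\big)\, d\lambda \\
&= p E\Big(\int_0^Y \lambda^{p-2}\, d\lambda\, |X_N|\Big) \\
&= \frac{p}{p-1} E\big(Y^{p-1} |X_N|\big) \\
&\le \frac{p}{p-1} \big(E(|X_N|^p)\big)^{1/p} \big(E(Y^p)\big)^{(p-1)/p},
\end{aligned}$$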
where the last inequality follows from Hölder's inequality. The inequality (2.4) then follows easily.

Next, we consider the limit of submartingales. Let $\{X_n\}_{n\in\mathbb{N}}$ be a submartingale and $a < b$. Define $\tau_0 = \sigma_0 = 0$ and for $n \ge 1$,
$$\tau_n = \min\{m \ge \sigma_{n-1} : X_m \le a\}, \qquad \sigma_n = \min\{m \ge \tau_n : X_m \ge b\}. \tag{2.5}$$
Then, $\tau_n$ and $\sigma_n$ are two sequences of increasing stopping times and the number of upcrossings of $\{X_n : 0 \le n \le N\}$ for the interval $[a, b]$ is
$$U_N^X(a, b) = \max\{n : \sigma_n \le N\}.$$
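The stopping times in (2.5) translate directly into a single pass over a finite path. The following sketch counts the completed upcrossings $U_N^X(a, b)$ of a given list of values; the function name is an illustrative choice, not notation from the text.

```python
def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by path[0..N], following (2.5)."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once the path has visited (-inf, a] and waits to reach [b, inf)
    for x in path:
        if not below:
            if x <= a:
                below = True  # this index plays the role of tau_n
        elif x >= b:
            count += 1        # this index plays the role of sigma_n
            below = False
    return count
```

For example, the path `[0, -1, 2, -1, 3, 0]` completes two upcrossings of `[0, 1]`: it first reaches the level 0 at index 0 and rises to 2, then returns to -1 and rises to 3.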
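Theorem 2.8 Suppose that $\{X_n\}_{n\in\mathbb{N}}$ is a submartingale. Then, $\forall\, N \in \mathbb{N}$ and $a < b$, we have
$$E U_N^X(a, b) \le \frac{1}{b-a} E\{(X_N - a)^+ - (X_0 - a)^+\}.$$

Proof By Jensen's inequality, $Y_n = (X_n - a)^+$ is a submartingale and $U_N^X(a, b) = U_N^Y(0, b-a)$. Let $\tau_n$ and $\sigma_n$ be defined as in (2.5) with $X, a, b$ replaced by $Y, 0, b-a$, respectively. If $\sigma_n > N$, then
$$\begin{aligned}
Y_N - Y_0 &= \sum_{k=1}^{n} (Y_{\sigma_k \wedge N} - Y_{\tau_k \wedge N}) + \sum_{k=1}^{n} (Y_{\tau_k \wedge N} - Y_{\sigma_{k-1} \wedge N}) \\
&\ge (b-a) U_N^X(a, b) + \sum_{k=1}^{n} (Y_{\tau_k \wedge N} - Y_{\sigma_{k-1} \wedge N}).
\end{aligned}$$
Since $Y$ is a submartingale and $\sigma_{k-1} \wedge N \le \tau_k \wedge N$ are bounded stopping times, each term of the last sum has non-negative expectation by optional sampling (Theorem 2.4 applied to $-Y$). Therefore,
$$E(Y_N - Y_0) \ge (b-a) E U_N^X(a, b).$$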
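As a consequence of the upcrossing estimate above, we have the following submartingale convergence theorem.

Theorem 2.9 If $\{X_n\}_{n\in\mathbb{N}}$ is a submartingale such that
$$\sup_n E(X_n^+) < \infty,$$
then $X_\infty = \lim_{n\to\infty} X_n$ exists a.s. and $X_\infty$ is integrable.

Proof For any $r' > r$, we have
$$E U_\infty^X(r, r') = \lim_{N\to\infty} E U_N^X(r, r') \le \frac{1}{r' - r} \lim_{N\to\infty} E\big((X_N - r)^+ - (X_0 - r)^+\big) < \infty.$$
Hence,
$$P\Big(\liminf_{n\to\infty} X_n < \limsup_{n\to\infty} X_n\Big) = P\Big(\bigcup_{r < r',\ r, r' \in \mathbb{Q}} \{U_\infty^X(r, r') = \infty\}\Big) = 0,$$
which proves that $X_\infty$ exists a.s. By Fatou's lemma,
$$E|X_\infty| \le \liminf_{n\to\infty} E|X_n| = \liminf_{n\to\infty} \big(2E(X_n^+) - E(X_n)\big) \le 2 \sup_n E(X_n^+) - E(X_0) < \infty.$$
Hence, $X_\infty$ is integrable.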
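As a corollary of the theorem above, we now prove the following martingale convergence theorem.

Theorem 2.10 Suppose that $Y$ is an integrable random variable and $\{\mathcal{F}_n\}$ is an increasing sequence of sub-σ-fields of $\mathcal{F}$. Let $X_n = E(Y | \mathcal{F}_n)$, $\forall\, n \ge 1$. Then $\{X_n\}$ is a uniformly integrable martingale and
$$\lim_{n\to\infty} X_n = X_\infty \quad \text{a.s. and in } L^1(\Omega).$$
Furthermore,
$$X_\infty = E(Y | \mathcal{F}_\infty), \tag{2.6}$$
where $\mathcal{F}_\infty = \sigma(\cup_n \mathcal{F}_n) \equiv \vee_n \mathcal{F}_n$.

Proof By Jensen's inequality, we have $|X_n| \le E(|Y|\, | \mathcal{F}_n)$, and hence
$$\begin{aligned}
E\big(|X_n| 1_{\{|X_n| > \lambda\}}\big) &\le E\big(E(|Y|\, | \mathcal{F}_n) 1_{\{|X_n| > \lambda\}}\big) = E\big(|Y| 1_{\{|X_n| > \lambda\}}\big) \\
&\le E\big(|Y| 1_{\{|Y| > \lambda'\}}\big) + \lambda' P(|X_n| > \lambda) \\
&\le E\big(|Y| 1_{\{|Y| > \lambda'\}}\big) + \lambda^{-1} \lambda' E(|X_n|) \\
&\le E\big(|Y| 1_{\{|Y| > \lambda'\}}\big) + \lambda^{-1} \lambda' E(|Y|),
\end{aligned}$$
where $\lambda'$ is an arbitrary constant. Then
$$\limsup_{\lambda\to\infty} \sup_n E\big(|X_n| 1_{\{|X_n| > \lambda\}}\big) \le E\big(|Y| 1_{\{|Y| > \lambda'\}}\big).$$
Taking $\lambda' \to \infty$, we get
$$\lim_{\lambda\to\infty} \sup_n E\big(|X_n| 1_{\{|X_n| > \lambda\}}\big) = 0,$$
and hence $\{X_n\}$ is uniformly integrable. By Theorem 2.9, as $n \to \infty$, we have that $X_n \to X_\infty$ a.s. and hence, by uniform integrability, in $L^1(\Omega)$.

In order to prove (2.6), we define
$$\mathcal{C} = \{B \in \mathcal{F} : E(X_\infty 1_B) = E(Y 1_B)\}.$$
For any $B \in \mathcal{F}_n$, we have
$$E(X_n 1_B) = E(E(Y | \mathcal{F}_n) 1_B) = E(Y 1_B).$$
As $B$ is also in $\mathcal{F}_m$ for any $m \ge n$, we get
$$E(Y 1_B) = E(X_m 1_B).$$
Taking $m \to \infty$, we get that $B \in \mathcal{C}$. Thus $\cup_n \mathcal{F}_n \subset \mathcal{C}$. Clearly $\cup_n \mathcal{F}_n$ is closed under finite intersections, and $\mathcal{C}$, which contains $\cup_n \mathcal{F}_n$, is closed under increasing limits and proper differences. Thus, $\mathcal{C}$ contains the σ-field generated by $\cup_n \mathcal{F}_n$, i.e. $\mathcal{F}_\infty \subset \mathcal{C}$. This proves (2.6).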
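A concrete instance of Theorem 2.10 may help fix ideas. The sketch below assumes $\Omega = [0, 1)$ with Lebesgue measure, $Y(\omega) = \omega$, and $\mathcal{F}_n$ generated by the dyadic intervals of length $2^{-n}$; these choices and the function name are illustrative and not taken from the text. In that setting $E(Y | \mathcal{F}_n)$ is the midpoint of the dyadic interval containing $\omega$.

```python
import math
from fractions import Fraction


def dyadic_conditional_mean(omega, n):
    """E(Y | F_n)(omega) for Y(omega) = omega on [0, 1) with Lebesgue measure,
    where F_n is generated by the intervals [k / 2^n, (k + 1) / 2^n)."""
    scale = 2 ** n
    k = math.floor(omega * scale)          # index of the dyadic interval containing omega
    return Fraction(2 * k + 1, 2 * scale)  # midpoint of that interval
```

Here $|X_n(\omega) - Y(\omega)| \le 2^{-(n+1)}$, so $X_n \to Y$ everywhere, in agreement with (2.6) since $Y$ is $\mathcal{F}_\infty$-measurable. The martingale property can be seen from the fact that each midpoint is the average of the midpoints of the two half-intervals.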
We will need to consider martingales in reverse time in $\mathbb{R}_-$. To this end, we only need to study martingales with time parameter in $\mathbb{Z}_-$. Let $\{\mathcal{F}_{-n}, n \ge 0\}$ be a family of increasing σ-fields. Let $\{X_{-n}, n \ge 0\}$ be a sequence of integrable random variables adapted to $\{\mathcal{F}_{-n}, n \ge 0\}$.

Definition 2.11 The sequence $\{X_{-n}, n \ge 0\}$ is a backward martingale if for $n \ge 0$, $X_{-n}$ is $\mathcal{F}_{-n}$-measurable, and for $0 \le n < m$, we have
$$E(X_{-n} | \mathcal{F}_{-m}) = X_{-m} \quad \text{a.s.}$$

Now we state and prove the backward martingale convergence theorem.

Theorem 2.12 Let $\{(X_{-n}, \mathcal{F}_{-n}), n \ge 0\}$ be a backward martingale, and let $\mathcal{F}_{-\infty} = \cap_{n=0}^{\infty} \mathcal{F}_{-n}$. Then the sequence $\{X_{-n}, n \ge 0\}$ converges a.s. and in $L^1$ to $X = E(X_0 | \mathcal{F}_{-\infty})$ as $n \to \infty$.

Proof Let $U_{-n}$ be the number of upcrossings of $\{-X_{-n}, n \ge 0\}$ of $[a, b]$ between times $-n$ and $0$. Then $U_{-n}$ is increasing as $n$ increases, and let
$$U(a, b) = \lim_{n\to\infty} U_{-n}.$$
By the monotone convergence theorem, we get
$$E\, U(a, b) = \lim_{n\to\infty} E\{U_{-n}\} \le \frac{1}{b-a} E(-X_0 - a)^+ < \infty,$$
and hence, $P(U(a, b) < \infty) = 1$. The same upcrossing argument as in the proof of Theorem 2.9 implies that
$$X = \lim_{n\to\infty} X_{-n}$$
exists a.s. As $X_{-n} = E(X_0 | \mathcal{F}_{-n})$, the family $\{X_{-n}, n \ge 0\}$ is uniformly integrable and $X_{-n}$ converges to $X$ in $L^1$. Using the same arguments as in the proof of Theorem 2.10, we can show that $X = E(X_0 | \mathcal{F}_{-\infty})$.

Finally, we consider continuous-time submartingales.

Lemma 2.13 Let $\{X_t\}_{t\ge 0}$ be a submartingale. Then $\forall\, T > 0$,
$$P\Big(\sup_{t \in \mathbb{Q}_+ \cap [0,T]} |X_t| < \infty\Big) = 1, \tag{2.7}$$
and
$$P\Big(\forall\, t \ge 0,\ \lim_{s \in \mathbb{Q}_+,\, s \downarrow t} X_s \text{ and } \lim_{s \in \mathbb{Q}_+,\, s \uparrow t} X_s \text{ exist}\Big) = 1. \tag{2.8}$$
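Proof Let $\{r_1, r_2, \ldots\}$ be an enumeration of $\mathbb{Q} \cap [0, T]$. For each $n$, let $s_1 < s_2 < \cdots < s_n$ be a rearrangement of $\{r_1, \ldots, r_n\}$. Define $Y_0 = X_0$, $Y_{n+1} = X_T$ and $Y_i = X_{s_i}$, $i = 1, 2, \ldots, n$. Then $Y = \{Y_i\}_{i=0,1,\ldots,n+1}$ is a submartingale. Therefore, by Corollary 2.6 and Theorem 2.8,
$$P\Big(\max_{1\le i\le n} |Y_i| > \lambda\Big) \le \frac{1}{\lambda} \big(2E|X_T| + E|X_0|\big),$$
and
$$E U_n^Y(a, b) \le \frac{1}{b-a} E(Y_n - a)^+ \le \frac{1}{b-a} E(X_T - a)^+.$$
Taking $n \to \infty$, we have
$$P\Big(\sup_{t \in \mathbb{Q} \cap [0,T]} |X_t| > \lambda\Big) \le \frac{1}{\lambda} \big(2E|X_T| + E|X_0|\big), \tag{2.9}$$
and
$$E U_\infty^{X|_{\mathbb{Q} \cap [0,T]}}(a, b) \le \frac{1}{b-a} E(X_T - a)^+. \tag{2.10}$$
The identity (2.7) follows from (2.9) by taking $\lambda \to \infty$. By (2.10), we see that
$$P\Big(\bigcup_{a < b,\ a, b \in \mathbb{Q}} \big\{U_\infty^{X|_{\mathbb{Q} \cap [0,T]}}(a, b) = \infty\big\}\Big) = 0.$$
Note that
$$\Big\{\exists\, t \in [0, T] :\ \lim_{s \in \mathbb{Q}_+,\, s \downarrow t} X_s \text{ or } \lim_{s \in \mathbb{Q}_+,\, s \uparrow t} X_s \text{ does not exist}\Big\} \subset \bigcup_{a < b,\ a, b \in \mathbb{Q}} \big\{U_\infty^{X|_{\mathbb{Q} \cap [0,T]}}(a, b) = \infty\big\}. \tag{2.11}$$
Therefore, the set on the left-hand side of (2.11) is of probability 0. Equation (2.8) follows by taking $T \to \infty$.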
Theorem 2.14 Let $\{X_t\}_{t\ge 0}$ be a submartingale. Then
$$\hat{X}_t = \lim_{r \in \mathbb{Q},\, r \downarrow t} X_r$$
exists a.s. and $\hat{X}_t$ is a submartingale that is right-continuous with left limits (càdlàg) a.s. Furthermore, $X_t \le \hat{X}_t$ a.s. for every $t \ge 0$, and
$$P(X_t = \hat{X}_t) = 1, \qquad \forall\, t \ge 0, \tag{2.12}$$
if and only if $E(X_t)$ is right-continuous.

Proof It follows from Lemma 2.13 directly that $\hat{X}_t$ exists a.s. The càdlàg property of $\hat{X}_t$ can be verified easily. Note that $\hat{X}_t$ is $\mathcal{F}_{t+} = \mathcal{F}_t$-measurable. For $s > t$ and $B \in \mathcal{F}_t$, we have
$$E(\hat{X}_t 1_B) = \lim_{r \in \mathbb{Q},\, r \downarrow t} E(X_r 1_B) \le \lim_{r' \in \mathbb{Q},\, r' \downarrow s} E(X_{r'} 1_B) = E(\hat{X}_s 1_B).$$
Hence, $\hat{X}_t$ is a submartingale. Similarly, we have
$$E(X_t 1_B) \le E(\hat{X}_t 1_B), \qquad \forall\, B \in \mathcal{F}_t,$$
and hence $X_t \le \hat{X}_t$ a.s. If $E(X_t)$ is right-continuous, then $E\hat{X}_t = EX_t$ and hence, $X_t = \hat{X}_t$ a.s. On the other hand, if (2.12) holds, then $E(X_t)$ is right-continuous since $E(\hat{X}_t)$ is right-continuous.

If $E(X_t)$ is right-continuous, then $\hat{X}$ in Theorem 2.14 is called a càdlàg modification of $X$. From now on, we always take càdlàg versions for such submartingales.

The following theorem is an immediate consequence of Corollary 2.7.

Theorem 2.15 (Doob's inequality) Let $\{X_t\}_{t\ge 0}$ be a right-continuous martingale such that $E(|X_t|^p) < \infty$, $\forall\, t \ge 0$, for some $p > 1$. Then for every $t \ge 0$ and $\lambda > 0$,
$$P\Big(\max_{s\le t} |X_s| \ge \lambda\Big) \le \frac{E(|X_t|^p)}{\lambda^p}, \tag{2.13}$$
and
$$E\Big(\max_{s\le t} |X_s|^p\Big) \le \Big(\frac{p}{p-1}\Big)^p E(|X_t|^p). \tag{2.14}$$
Next, we consider the continuous-time counterpart of Theorem 2.4. We need to define the class (DL) first.
Definition 2.16 A submartingale $\{X_t\}$ is in the class (DL) if $\forall\, T > 0$, the family of random variables $\{X_\sigma : \sigma \in \mathcal{S}_T\}$ is uniformly integrable.

Remark 2.17 By the same argument as in the proof of Theorem 2.10, we can prove that every right-continuous martingale is of class (DL).

Theorem 2.18 (Optional sampling theorem) Suppose that $\{X_t\}_{t\ge 0}$ is a right-continuous martingale. Let $\tau, \sigma \in \mathcal{S}_N$ be such that $\sigma(\omega) \le \tau(\omega)$, $\forall\, \omega \in \Omega$. Then
$$E(X_\tau | \mathcal{F}_\sigma) = X_\sigma \quad \text{a.s.} \tag{2.15}$$
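Proof Let
$$\tau_n = \frac{k}{2^n} \quad \text{if} \quad \frac{k-1}{2^n} \le \tau < \frac{k}{2^n}.$$
Then $\tau_n \downarrow \tau$ is a sequence of stopping times. Let $\sigma_n$ be defined similarly. For any $A \in \mathcal{F}_\sigma$, we have $A \in \mathcal{F}_{\sigma_n}$ and hence, by Theorem 2.4,
$$E(X_{\sigma_n} 1_A) = E(X_{\tau_n} 1_A).$$
Taking $n \to \infty$, we have
$$E(X_\sigma 1_A) = E(X_\tau 1_A).$$
This implies that $E(X_\tau | \mathcal{F}_\sigma) = X_\sigma$.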
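The dyadic approximation used in this proof is easy to compute for a fixed value of $\tau$. The following sketch (the function name is an illustrative choice) returns $\tau_n$ for a given non-negative number, using exact rational arithmetic.

```python
import math
from fractions import Fraction


def dyadic_upper(tau, n):
    """tau_n = k / 2^n, where k is determined by (k - 1) / 2^n <= tau < k / 2^n."""
    scale = 2 ** n
    k = math.floor(tau * scale) + 1
    return Fraction(k, scale)
```

For $\tau = 1/3$ the first values are $1, 1/2, 1/2, 3/8$. In general $\tau < \tau_n \le \tau + 2^{-n}$ and $\tau_{n+1} \le \tau_n$, which is the monotone convergence $\tau_n \downarrow \tau$ used above. Each $\tau_n$ takes values in the countable grid $\{k/2^n\}$, which is what allows the discrete result, Theorem 2.4, to be applied.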
The following theorem follows from Theorem 2.18 immediately.

Theorem 2.19 Let $\{X_t\}_{t\ge 0}$ be a right-continuous martingale and $(\sigma_t)_{t\ge 0}$ be an increasing family of bounded stopping times. Let $\tilde{X}_t = X_{\sigma_t}$ and $\tilde{\mathcal{F}}_t = \mathcal{F}_{\sigma_t}$, $\forall\, t \ge 0$. Then $(\tilde{X}_t, \tilde{\mathcal{F}}_t)$ is a martingale.
2.2
Doob–Meyer decomposition
Note that a submartingale increases in expectation. Therefore, it should consist of two parts: a martingale part plus an increasing process. The rigorous treatment of this idea is the so-called Doob–Meyer decomposition, which is the subject of this section.

Theorem 2.20 (Doob's decomposition) A submartingale $\{X_n\}_{n\in\mathbb{N}}$ has exactly one decomposition
$$X_n = M_n + A_n, \tag{2.16}$$
where $\{M_n\}_{n\in\mathbb{N}}$ is a martingale, $A_0 = 0$, $A_n$ is $\mathcal{F}_{n-1}$-measurable and $A_n \le A_{n+1}$ a.s. for all $n \in \mathbb{N}$.

Proof Define $M_0 = X_0$ and for $n \ge 1$,
$$M_n = M_{n-1} + X_n - E(X_n | \mathcal{F}_{n-1}).$$
Then $(M_n, \mathcal{F}_n)$ is a martingale. Define $A_n = X_n - M_n$. Then $A_0 = 0$ and
$$A_n = A_{n-1} - X_{n-1} + E(X_n | \mathcal{F}_{n-1}). \tag{2.17}$$
It is clear that $A_n$ is $\mathcal{F}_{n-1}$-measurable. Since $X_n$ is a submartingale, by (2.17), we have $A_n \ge A_{n-1}$ a.s.

Next we prove the uniqueness. Suppose $(M_n, A_n)$ is such a decomposition, then
$$E(X_n | \mathcal{F}_{n-1}) = M_{n-1} + A_n. \tag{2.18}$$
From $A_0 = 0$ and (2.16) we see that $A_0$ and $M_0$ are uniquely determined. Suppose that $A_{n-1}$ and $M_{n-1}$ are uniquely determined. Then $A_n$ is uniquely determined by (2.18), and $M_n$ by (2.16). Thus, the uniqueness follows by induction.

Next, we consider the decomposition of a continuous-time submartingale.

Definition 2.21 $\{A_t\}_{t\ge 0}$ is an integrable increasing process if $A_0 = 0$, $t \mapsto A_t$ is right-continuous and increasing a.s. and
$$E(A_t) < \infty, \qquad \forall\, t \ge 0.$$

An increasing process $A_t$ is natural if it "almost" has no common jumps with any bounded martingale. Namely, for any bounded martingale $m_t$, we have
$$E \sum_{s\le t} \Delta m_s\, \Delta A_s = 0,$$
where $\Delta A_s = A_s - A_{s-}$ is the jump of $A$ at $s$.

Definition 2.22 An integrable increasing process $A_t$ is natural if for every bounded martingale $m_t$,
$$E \int_0^t m_s\, dA_s = E \int_0^t m_{s-}\, dA_s$$
holds for every $t \ge 0$.
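Returning briefly to the discrete case, the recursion (2.17) in the proof of Theorem 2.20 can be run along a single path. The sketch below assumes a fair ±1 coin-tossing filtration, so that $E(X_n | \mathcal{F}_{n-1})$ is the average over the two possible next steps; the example process $X_n = S_n^2$ and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def doob_decomposition(process, steps):
    """Return lists (M_0..M_N, A_0..A_N) along one +/-1 path of a fair coin filtration.

    `process(prefix)` gives X_n as a function of the first n steps.
    """
    m = [Fraction(process(()))]  # M_0 = X_0
    a = [Fraction(0)]            # A_0 = 0
    for n in range(1, len(steps) + 1):
        past = tuple(steps[: n - 1])
        # E(X_n | F_{n-1}) on this path: average over the two equally likely next steps
        cond_mean = Fraction(process(past + (1,)) + process(past + (-1,)), 2)
        a.append(a[-1] - process(past) + cond_mean)  # equation (2.17)
        m.append(process(tuple(steps[:n])) - a[-1])  # M_n = X_n - A_n
    return m, a


def squared_walk(prefix):
    """X_n = S_n^2 for the random walk S_n given by the partial sums of the steps."""
    return sum(prefix) ** 2
```

For the squared walk the increasing part is $A_n = n$ on every path and the martingale part is $M_n = S_n^2 - n$, the discrete analogue of the relation between Brownian motion and its Meyer process discussed later in this chapter.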
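The following proposition gives a useful equivalent definition of the natural increasing process.

Proposition 2.23 An integrable increasing process $A_t$ is natural if and only if for every bounded martingale $\{m_t\}$,
$$E(m_t A_t) = E \int_0^t m_{s-}\, dA_s$$
holds for every $t \ge 0$.

Proof Since $\{m_t\}$ is bounded and right-continuous, and $\{A_t\}$ is integrable, it follows from the dominated convergence theorem that
$$\begin{aligned}
E \int_0^t m_s\, dA_s &= E \lim_{n\to\infty} \sum_{k=0}^{n-1} m_{\frac{(k+1)t}{n}} \Big(A_{\frac{(k+1)t}{n}} - A_{\frac{kt}{n}}\Big) \\
&= \lim_{n\to\infty} \Big( \sum_{k=0}^{n-1} E\Big(m_{\frac{(k+1)t}{n}} A_{\frac{(k+1)t}{n}}\Big) - \sum_{k=0}^{n-1} E\Big(E\Big(m_{\frac{(k+1)t}{n}} A_{\frac{kt}{n}} \Big| \mathcal{F}_{\frac{kt}{n}}\Big)\Big) \Big) \\
&= \lim_{n\to\infty} \Big( \sum_{k=1}^{n} E\Big(m_{\frac{kt}{n}} A_{\frac{kt}{n}}\Big) - \sum_{k=0}^{n-1} E\Big(m_{\frac{kt}{n}} A_{\frac{kt}{n}}\Big) \Big) \\
&= E(m_t A_t);
\end{aligned}$$
the third equality follows from the martingale property of $m_t$. Comparing with Definition 2.22 gives the claim.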
Now we are ready to state the continuous-time counterpart of Theorem 2.20.

Theorem 2.24 (Doob–Meyer decomposition) If $\{X_t\}_{t\ge 0}$ is a submartingale of class (DL), then it is expressible uniquely as $X_t = M_t + A_t$, where $A_t$ is an integrable natural increasing process and $M_t$ is a martingale.

Proof "Uniqueness". Suppose that $X_t = M_t + A_t = M'_t + A'_t$ are two such decompositions. Then
$$A_t - A'_t = M'_t - M_t$$
is a martingale. Therefore, for any bounded martingale $m_t$, we have
$$\begin{aligned}
E(m_t (A_t - A'_t)) &= E \int_0^t m_{s-}\, d(A_s - A'_s) \\
&= \lim_{n\to\infty} E \sum_{k=0}^{n-1} m_{\frac{kt}{n}} \Big( \big(A_{\frac{(k+1)t}{n}} - A'_{\frac{(k+1)t}{n}}\big) - \big(A_{\frac{kt}{n}} - A'_{\frac{kt}{n}}\big) \Big) \\
&= \lim_{n\to\infty} E \sum_{k=0}^{n-1} m_{\frac{kt}{n}} \Big( \big(M'_{\frac{(k+1)t}{n}} - M_{\frac{(k+1)t}{n}}\big) - \big(M'_{\frac{kt}{n}} - M_{\frac{kt}{n}}\big) \Big) \\
&= 0.
\end{aligned}$$
For any bounded random variable $\xi$, let $m_t = E(\xi | \mathcal{F}_t)$. Then
$$E(\xi A_t) = E(E(\xi | \mathcal{F}_t) A_t) = E(E(\xi | \mathcal{F}_t) A'_t) = E(\xi A'_t).$$
Hence, for any fixed $t \ge 0$, $A_t = A'_t$ a.s. By the right-continuity of $A$ and $A'$, we see that $A = A'$ a.s.

"Existence". By the uniqueness, we only need to construct the decomposition on $[0, T]$. Set
$$Y_t = X_t - E(X_T | \mathcal{F}_t).$$
Then $Y_t$ is a non-positive submartingale with $Y_T = 0$. We only need to prove the Doob–Meyer decomposition for $\{Y_t\}_{0\le t\le T}$. Let $t_j^n = \frac{jT}{2^n}$. As $\{Y_{t_j^n}, j = 0, 1, 2, \ldots, 2^n\}$ is an $\mathcal{F}_{t_j^n}$-adapted submartingale with $Y_T = 0$, it follows from Theorem 2.20 that
$$0 = Y_T = M_T^n + A_T^n.$$
Taking conditional expectation, we get
$$0 = E\big(M_T^n + A_T^n \big| \mathcal{F}_{t_j^n}\big) = M^n_{t_j^n} + E\big(A_T^n \big| \mathcal{F}_{t_j^n}\big).$$
This implies that
$$M^n_{t_j^n} = -E\big(A_T^n \big| \mathcal{F}_{t_j^n}\big),$$
and hence,
$$Y_{t_j^n} = -E\big(A_T^n \big| \mathcal{F}_{t_j^n}\big) + A^n_{t_j^n}, \tag{2.19}$$
where $A_0^n = 0$, $A^n_{t_j^n} \le A^n_{t_{j+1}^n}$, and $A^n_{t_j^n}$ is $\mathcal{F}_{t_{j-1}^n}$-measurable. Assume for the moment that the family $\{A_T^n\}_{n\ge 1}$ is uniformly integrable, which will be shown in Lemma 2.25 below. Then there is a subsequence $n_k$ such that $A_T^{n_k}$ converges to a random variable $A_T$ in the weak topology of $L^1(\Omega)$: for any bounded random variable $\xi$, $E(A_T^{n_k} \xi) \to E(A_T \xi)$. Denote by $M_t$ a right-continuous version of the uniformly integrable martingale $(-E(A_T | \mathcal{F}_t))_{0\le t\le T}$ and let $A_t = Y_t - M_t$. Then $\{A_t\}$ is right-continuous. Let $i \le j$. For any $n_0 > 0$ and $n_k \ge n_0$,
$$Y_{t_i^{n_0}} + E\big(A_T^{n_k} \big| \mathcal{F}_{t_i^{n_0}}\big) \le Y_{t_j^{n_0}} + E\big(A_T^{n_k} \big| \mathcal{F}_{t_j^{n_0}}\big),$$
and hence by taking $k \to \infty$, we have $A_{t_i^{n_0}} \le A_{t_j^{n_0}}$. Therefore, $\{A_t\}$ is increasing on $\{t_i^{n_0} : n_0 \ge 1,\ i = 0, 1, \ldots, 2^{n_0}\}$, and thus on all of $[0, T]$.

Finally, we prove that $\{A_t\}$ is natural. Let $m_t$ be a non-negative, bounded, right-continuous martingale. By the dominated convergence theorem,
$$\begin{aligned}
E \int_0^T m_{s-}\, dA_s &= \lim_{n\to\infty} \sum_{i=0}^{2^n - 1} E\Big(m_{t_i^n} \big(A_{t_{i+1}^n} - A_{t_i^n}\big)\Big) \\
&= \lim_{n\to\infty} \sum_{i=0}^{2^n - 1} \Big( E\big(m_{t_i^n} A^n_{t_{i+1}^n}\big) - E\big(m_{t_i^n} A^n_{t_i^n}\big) \Big) \\
&= \lim_{n\to\infty} \sum_{i=0}^{2^n - 1} E\Big(m_{t_{i+1}^n} A^n_{t_{i+1}^n} - m_{t_i^n} A^n_{t_i^n}\Big) \\
&= E(m_T A_T),
\end{aligned}$$
where the penultimate equality follows from the fact that $A^n_{t_{i+1}^n}$ is $\mathcal{F}_{t_i^n}$-measurable. Hence $A_t$ is natural.
Lemma 2.25 The family of random variables $\{A_T^n\}_{n\ge 1}$ is uniformly integrable.

Proof By (2.19) and the predictability of $\{A^n_{t_j^n} : j = 0, 1, \ldots, 2^n\}$, it is easy to show that
$$A^n_{t_k^n} = \sum_{j=0}^{k-1} \Big( E\big(Y_{t_{j+1}^n} \big| \mathcal{F}_{t_j^n}\big) - Y_{t_j^n} \Big).$$
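Let $c > 0$ be fixed and
$$\sigma_c^n = \inf\{t_{k-1}^n : A^n_{t_k^n} > c\}$$
with the convention that the infimum over an empty set is $T$. Then $\sigma_c^n \in \mathcal{S}_T$ and $A^n_{\sigma_c^n} \le c$. By the optional sampling theorem and (2.19), we have
$$Y_{\sigma_c^n} = A^n_{\sigma_c^n} - E\big(A_T^n \big| \mathcal{F}_{\sigma_c^n}\big).$$
Hence, as $\{A_T^n > c\} = \{\sigma_c^n < T\}$,
$$E\big(A_T^n 1_{\{A_T^n > c\}}\big) = -E\big(Y_{\sigma_c^n} 1_{\{\sigma_c^n < T\}}\big) + E\big(A^n_{\sigma_c^n} 1_{\{\sigma_c^n < T\}}\big). \tag{2.20}$$
Note that
$$\begin{aligned}
E\big(A^n_{\sigma_c^n} 1_{\{\sigma_c^n < T\}}\big) &\le c\, P(\sigma_c^n < T) \\
&\le 2E\Big(\big(A_T^n - A^n_{\sigma_{c/2}^n}\big) 1_{\{\sigma_c^n < T\}}\Big) \\
&\le 2E\Big(\big(A_T^n - A^n_{\sigma_{c/2}^n}\big) 1_{\{\sigma_{c/2}^n < T\}}\Big) = -2E\big(Y_{\sigma_{c/2}^n} 1_{\{\sigma_{c/2}^n < T\}}\big).
\end{aligned}$$
It then follows from (2.20) that
$$E\big(A_T^n 1_{\{A_T^n > c\}}\big) \le -E\big(Y_{\sigma_c^n} 1_{\{\sigma_c^n < T\}}\big) - 2E\big(Y_{\sigma_{c/2}^n} 1_{\{\sigma_{c/2}^n < T\}}\big).$$
Note that, for any constant $c' > 0$,
$$\begin{aligned}
\sup_n E\big(|Y_{\sigma_c^n}| 1_{\{\sigma_c^n < T\}}\big) &\le \sup_n E\big(|Y_{\sigma_c^n}| 1_{\{|Y_{\sigma_c^n}| > c'\}}\big) + \sup_n E\big(|Y_{\sigma_c^n}| 1_{\{|Y_{\sigma_c^n}| \le c',\ \sigma_c^n < T\}}\big) \\
&\le \sup_n E\big(|Y_{\sigma_c^n}| 1_{\{|Y_{\sigma_c^n}| > c'\}}\big) + c' \sup_n P(\sigma_c^n < T).
\end{aligned}$$
As $\{Y_{\sigma_{c/2}^n}\}$ and $\{Y_{\sigma_c^n}\}$ are uniformly integrable and
$$P(\sigma_c^n < T) = P(A_T^n > c) \le \frac{1}{c} E A_T^n = \frac{1}{c} E|Y_0| \to 0$$
as $c \to \infty$, uniformly in $n$, we have
$$\lim_{c\to\infty} \sup_n E\big(A_T^n 1_{\{A_T^n > c\}}\big) = 0.$$
This then proves the uniform integrability.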
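Sometimes, we need $A_t$ to be a continuous process. To this end, we need to assume that $EX_\sigma$ is continuous in the stopping time $\sigma$.

Definition 2.26 A submartingale $X_t$ is regular if for any $T > 0$ and for any $\sigma_n \in \mathcal{S}_T$ increasing to $\sigma$, we have $E(X_{\sigma_n}) \to E(X_\sigma)$.

Theorem 2.27 Let $X_t$ be a regular submartingale of class (DL). Then $A_t$ in the Doob–Meyer decomposition is continuous a.s.

Proof Suppose that $\sigma_n \in \mathcal{S}_T$ increases to $\sigma$. Then $E(A_{\sigma_n}) \uparrow E(A_\sigma)$ and hence, $A_{\sigma_n} \uparrow A_\sigma$ a.s. Set $t_j^n = \frac{jT}{2^n}$. For $c > 0$, we define
$$A_t^n = E\big(A_{t_{j+1}^n} \wedge c \,\big|\, \mathcal{F}_t\big), \qquad t \in (t_j^n, t_{j+1}^n].$$
Since $A_t^n$ is a martingale on the interval $(t_j^n, t_{j+1}^n]$ and $A_t$ is a natural increasing process, it is easy to show that
$$E \int_0^t A_s^n\, dA_s = E \int_0^t A_{s-}^n\, dA_s, \qquad \forall\, t \in [0, T]. \tag{2.21}$$
Next we show that there exists a subsequence $n_k$ such that $A_t^{n_k} \to A_t \wedge c$ uniformly in $t \in [0, T]$, so that we can pass to the limit in the above equality. For $\epsilon > 0$, we define
$$\sigma_{n,\epsilon} = \inf\{t \in [0, T] : A_t^n - A_t \wedge c > \epsilon\},$$
with the convention that $\inf \emptyset = T$. Let $\pi_n(t) = t_{j+1}^n$ for $t \in (t_j^n, t_{j+1}^n]$. Then $\sigma_{n,\epsilon}, \pi_n(\sigma_{n,\epsilon}) \in \mathcal{S}_T$. Since $A_t^n$ is decreasing in $n$, $\sigma_{n,\epsilon}$ is increasing in $n$. Let $\sigma_\epsilon = \lim_{n\to\infty} \sigma_{n,\epsilon}$. Then $\sigma_\epsilon \in \mathcal{S}_T$ and $\lim_{n\to\infty} \pi_n(\sigma_{n,\epsilon}) = \sigma_\epsilon$. By the optional sampling theorem,
$$E\big(A^n_{\sigma_{n,\epsilon}}\big) = E\big(A_{\pi_n(\sigma_{n,\epsilon})} \wedge c\big),$$
and hence,
$$P(\sigma_{n,\epsilon} < T) \le \epsilon^{-1} E\big(A^n_{\sigma_{n,\epsilon}} - A_{\sigma_{n,\epsilon}} \wedge c\big) = \epsilon^{-1} E\big(A_{\pi_n(\sigma_{n,\epsilon})} \wedge c - A_{\sigma_{n,\epsilon}} \wedge c\big) \to 0 \quad \text{as } n \to \infty.$$
Therefore,
$$\lim_{n\to\infty} P\Big(\sup_{t\in[0,T]} |A_t^n - A_t \wedge c| > \epsilon\Big) = 0.$$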
Hence, there exists a subsequence $n_k$ such that
$$\lim_{n_k\to\infty} \sup_{t\in[0,T]} |A_t^{n_k} - A_t \wedge c| = 0 \quad \text{a.s.}$$
Thus, by (2.21), we have
$$E \int_0^T A_s \wedge c\, dA_s = E \int_0^T A_{s-} \wedge c\, dA_s.$$
Hence,
$$0 = E \int_0^T (A_s \wedge c - A_{s-} \wedge c)\, dA_s \ge E \sum_{s\le T} (A_s \wedge c - A_{s-} \wedge c)(A_s - A_{s-}).$$
This implies the continuity of $s \mapsto A_s \wedge c$. Since $c$ is arbitrary, we have the continuity of $A_t$.
2.3
Meyer’s processes
In this section, we introduce Meyer's process for each square-integrable martingale. This process will play an important role in the definition of the stochastic integral in the next chapter.

Definition 2.28 A martingale $\{M_t\}_{t\ge 0}$ is a square-integrable martingale (denoted by $M \in \mathcal{M}^2$) if
$$E(M_t^2) < \infty, \qquad \forall\, t \ge 0.$$
If $M$ is continuous, then we write $M \in \mathcal{M}^{2,c}$.

Lemma 2.29 If $\{M_t\}_{t\ge 0}$ is a right-continuous square-integrable martingale, then $M_t^2$ is a right-continuous submartingale of class (DL).

Proof By Jensen's inequality, $M_t^2$ is a submartingale. By Theorem 2.15, we have
$$E\Big(\sup_{0\le t\le T} M_t^2\Big) \le 4E(M_T^2) < \infty.$$
Hence, as $c \to \infty$, we have
$$\sup_{\sigma\in\mathcal{S}_T} E\Big(M_\sigma^2 1_{\{M_\sigma^2 \ge c\}}\Big) \le E\Big(\sup_{0\le t\le T} M_t^2\, 1_{\{\sup_{0\le t\le T} M_t^2 \ge c\}}\Big) \to 0.$$
Hence, $M_t^2$ is in class (DL).
Applying the Doob–Meyer decomposition, there exists a unique natural increasing process $A_t$ such that $M_t^2 - A_t$ is a martingale. We shall denote $A_t$ by $\langle M \rangle_t$, which is called Meyer's process of $M_t$. Finally, we consider Meyer's process between two martingales.

Definition 2.30 For $M, N \in \mathcal{M}^2$, the stochastic process
$$\langle M, N \rangle_t = \frac{1}{4} \big( \langle M + N \rangle_t - \langle M - N \rangle_t \big)$$
is called Meyer's process of $M_t$ and $N_t$.

Sometimes, we need to define Meyer's process for a more general class of stochastic processes.

Definition 2.31 A real-valued process $\{M_t\}_{t\in\mathbb{R}_+}$ is a local martingale if there exists a sequence of stopping times $\tau_n$ increasing to $\infty$ almost surely such that $\forall\, n$, $M_t^n \equiv M_{t\wedge\tau_n}$ is a martingale. We denote the collection of all continuous local martingales by $\mathcal{M}^c_{loc}$, and all continuous locally square-integrable martingales by $\mathcal{M}^{2,c}_{loc}$.

Remark 2.32 Let $M_t$ be a continuous local martingale. Define $\sigma_n(\omega) = \inf\{t : |M_t(\omega)| \ge n\}$, with the convention that $\inf \emptyset = \infty$. Then $\forall\, n$, $M_t^n \equiv M_{t\wedge\sigma_n}$ is a bounded continuous martingale.

Theorem 2.33 Let $M_t$ be a continuous local martingale. Then there exists a unique continuous increasing process $A_t$ with $A_0 = 0$ such that $M_t^2 - A_t$ is a local martingale. We shall denote $A_t$ by $\langle M \rangle_t$.

Proof Let $M_t^n$ be given as in Remark 2.32. Let $A_t^n = \langle M^n \rangle_t$. The continuous martingale $M^{n+1}_{t\wedge\sigma_n}$ has Meyer's process $A^{n+1}_{t\wedge\sigma_n}$. However,
$$M^{n+1}_{t\wedge\sigma_n} = M_{t\wedge\sigma_n\wedge\sigma_{n+1}} = M_{t\wedge\sigma_n} = M_t^n,$$
which has Meyer's process $A_t^n$. Hence,
$$A^{n+1}_{t\wedge\sigma_n} = A_t^n, \qquad \forall\, t.$$
Define
$$A_t = A_t^n, \qquad t \le \sigma_n.$$
Then $A_0 = 0$, $A_t$ is a continuous increasing process and $A_{t\wedge\sigma_n} = A_t^n$.
Since $M^2_{t\wedge\sigma_n} = (M_t^n)^2$, it is clear that $M_t^2 - A_t$ is a local martingale with localizing stopping times $\{\sigma_n\}$. The uniqueness of $A_t$ follows from that of the process $A_t^n$.
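As a consequence of Theorem 2.18, we have

Corollary 2.34 Let $X_t$ be a continuous local martingale and $(\sigma_t)_{t\ge 0}$ be an increasing family of right-continuous bounded stopping times. Let $\tilde{X}_t = X_{\sigma_t}$ and $\tilde{\mathcal{F}}_t = \mathcal{F}_{\sigma_t}$, $\forall\, t \ge 0$. Suppose that $X$ is constant on $[\sigma_{t-}, \sigma_t]$, for any $t$. Then $(\tilde{X}_t, \tilde{\mathcal{F}}_t)$ is a continuous local martingale with
$$\langle \tilde{X} \rangle_t = \langle X \rangle_{\sigma_t}.$$

Proof Let $\tau_n$ be a localizing stopping-time sequence for $X$. Let $\tilde{\tau}_n = \inf\{t : \sigma_t \ge \tau_n\}$. For any $s > 0$, we have
$$\{\tilde{\tau}_n \le t\} \cap \{\sigma_t \le s\} = \{\tau_n \le \sigma_t \le s\} \in \mathcal{F}_s.$$
Then, $\{\tilde{\tau}_n \le t\} \in \mathcal{F}_{\sigma_t} = \tilde{\mathcal{F}}_t$, and hence, $\tilde{\tau}_n$ is an $\tilde{\mathcal{F}}_t$-stopping time. As $\tau_n \in [\sigma_{\tilde{\tau}_n -}, \sigma_{\tilde{\tau}_n}]$, we have that
$$\tilde{X}_{t\wedge\tilde{\tau}_n} = X_{\sigma_t \wedge \sigma_{\tilde{\tau}_n}} = X_{\sigma_t \wedge \tau_n}$$
is an $\tilde{\mathcal{F}}_t$-martingale.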
2.4
Brownian motion
Brownian motion is the simplest and the most useful square-integrable martingale. In a sense, stochastic analysis is a branch of mathematics that studies the functionals of Brownian motions.

Definition 2.35 A $d$-dimensional continuous process $X_t$ is a Brownian motion if $X_0 = 0$ and, for any $t > s$, $X_t - X_s$ is independent of $\mathcal{F}_s$ and has a multivariate normal distribution with mean zero and covariance matrix $(t - s) I_d$, where $I_d$ is the $d \times d$ identity matrix.

The next theorem shows that Meyer's process for Brownian motion is $t I_d$. The converse of this theorem is also true and will be proved in the next chapter.
Theorem 2.36 Suppose that $X_t = (X_t^1, X_t^2, \ldots, X_t^d)$ is a $d$-dimensional Brownian motion. Then $X_t^j$, $j = 1, 2, \ldots, d$, are square-integrable martingales satisfying
$$\langle X^j, X^k \rangle_t = \delta_{jk}\, t. \tag{2.22}$$

Proof It is easy to show that $X_t^j$ is a square-integrable martingale. We only prove (2.22). For $t > s$, we have
$$\begin{aligned}
E(X_t^j X_t^k - \delta_{jk} t \,|\, \mathcal{F}_s) &= E\big((X_t^j - X_s^j)(X_t^k - X_s^k) \,\big|\, \mathcal{F}_s\big) + X_s^j X_s^k - \delta_{jk} t \\
&\quad + E\big(X_s^k (X_t^j - X_s^j) + X_s^j (X_t^k - X_s^k) \,\big|\, \mathcal{F}_s\big) \\
&= \delta_{jk}(t - s) + X_s^j X_s^k - \delta_{jk} t \\
&= X_s^j X_s^k - \delta_{jk} s.
\end{aligned}$$
Therefore, $X_t^j X_t^k - \delta_{jk} t$ is a martingale. This proves (2.22).
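Taking $s = 0$ in the computation above gives $E(X_t^j X_t^k - \delta_{jk} t) = 0$, which can be checked numerically. The following Monte Carlo sketch samples $X_t \sim N(0, t I_2)$ directly for a two-dimensional Brownian motion; the sample size, the seed and the function name are illustrative assumptions, and the estimate is only accurate up to sampling error.

```python
import random


def mean_product_minus_bracket(t, n_samples, seed=0):
    """Monte Carlo estimate of E(X_t^j X_t^k - delta_jk t) for a 2-dimensional Brownian motion."""
    rng = random.Random(seed)
    sd = t ** 0.5
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_samples):
        x = (rng.gauss(0.0, sd), rng.gauss(0.0, sd))  # X_t ~ N(0, t I_2) since X_0 = 0
        for j in range(2):
            for k in range(2):
                sums[j][k] += x[j] * x[k] - (t if j == k else 0.0)
    return [[s / n_samples for s in row] for row in sums]
```

All four entries of the returned matrix should be close to zero, with fluctuations of order $t / \sqrt{n}$ for $n$ samples.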
3
Stochastic integrals and Itô’s formula
In this chapter, we define stochastic integrals and introduce some stochastic analysis results that are essential in the later chapters. This chapter is organized as follows: In Section 3.1, we define predictable processes. Intuitively, the set of predictable processes is the “closure” of the set of left-continuous processes. In Section 3.2, we give the definition of the stochastic integral with respect to a square-integrable martingale. Meyer’s process, defined in Chapter 2, plays a key role. We prove Itô’s formula in Section 3.3. This formula is very important in stochastic analysis, just like the chain rule in the ordinary analysis. In Section 3.4, we show that the Brownian motion is uniquely characterized by its Meyer process. We also give some representation theorems for square-integrable martingales in terms of Brownian motions. In Section 3.5, we present Girsanov’s formula for the change of probability measures. Finally, in Section 3.6, we study the Stratonovich integral. The advantage of this integral is that Itô’s formula based on it coincides with the chain rule in calculus, which is easier to use than Itô’s formula based on Itô’s integral.
3.1
Predictable processes
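Let $\mathcal{L}$ be the collection of all measurable maps $X : (\mathbb{R}_+ \times \Omega, \mathcal{B}(\mathbb{R}_+) \otimes \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\forall\, t \ge 0$, $X_t : \Omega \to \mathbb{R}$ is $\mathcal{F}_t$-measurable and, for each $\omega$, $t \mapsto X_t(\omega)$ is left-continuous. Let
$$\mathcal{P} = \sigma\big(X^{-1}(B) : B \in \mathcal{B}(\mathbb{R}),\ X \in \mathcal{L}\big),$$
where
$$X^{-1}(B) = \{(t, \omega) \in \mathbb{R}_+ \times \Omega : X_t(\omega) \in B\}.$$
Namely, $\mathcal{P}$ is the smallest σ-field on $\mathbb{R}_+ \times \Omega$ such that $\forall\, X \in \mathcal{L}$, $X : (\mathbb{R}_+ \times \Omega, \mathcal{P}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable.

Definition 3.1 A stochastic process $X = (X_t(\omega))$ is predictable if the mapping $X : (\mathbb{R}_+ \times \Omega, \mathcal{P}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable.

Example 3.2 Let $0 = t_0 < t_1 < \cdots < t_n$. Define the simple process
$$X_t(\omega) = X_0(\omega) 1_{\{0\}}(t) + \sum_{j=0}^{n-1} X_j(\omega) 1_{(t_j, t_{j+1}]}(t).$$
If $X_j$ is $\mathcal{F}_{t_j}$-measurable, $j = 0, 1, \ldots, n$, then $X$ is predictable.

The following lemma gives a useful alternative description of the predictable σ-field $\mathcal{P}$.

Lemma 3.3 The σ-field $\mathcal{P}$ is generated by all sets of the form $\Gamma = (u, v] \times B$, where $B \in \mathcal{F}_u$, or $\Gamma = \{0\} \times B$, where $B \in \mathcal{F}_0$.

Proof Let $\mathcal{G}$ be the collection of all sets of the form $\Gamma = (u, v] \times B$, where $B \in \mathcal{F}_u$, or $\Gamma = \{0\} \times B$, where $B \in \mathcal{F}_0$. For every $\Gamma \in \mathcal{G}$, it is easy to see that $1_\Gamma \in \mathcal{L}$ and hence $\mathcal{G} \subset \mathcal{P}$. This implies that $\sigma(\mathcal{G}) \subset \mathcal{P}$, where $\sigma(\mathcal{G})$ is the σ-field generated by $\mathcal{G}$. On the other hand, for each $X \in \mathcal{L}$, we define
$$X_t^n(\omega) = X_0(\omega) 1_{\{0\}}(t) + \sum_{j=0}^{n^2} X_{j/n}(\omega) 1_{(j n^{-1}, (j+1) n^{-1}]}(t).$$
It is clear that $X^n$ is $\sigma(\mathcal{G})$-measurable and $X_t^n(\omega) \to X_t(\omega)$ for each $t \ge 0$ and $\omega \in \Omega$. Hence, $X$ is $\sigma(\mathcal{G})$-measurable. This implies that $\mathcal{P} \subset \sigma(\mathcal{G})$. Therefore, $\mathcal{P} = \sigma(\mathcal{G})$.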
3.2
Stochastic integral
Denote by $\mathcal{L}_0$ the collection of all simple predictable processes $f_t$ of the form
$$f_t(\omega) = \sum_{j=1}^{n-1} f_j(\omega) 1_{(t_j, t_{j+1}]}(t),$$
where $0 \le t_0 < \cdots < t_n$ and $f_j$ is a bounded $\mathcal{F}_{t_j}$-measurable random variable.
Let $M \in \mathcal{M}^{2,c}$ be fixed. For $f \in \mathcal{L}_0$, we define the Itô stochastic integral as
$$I(f) \equiv \int f_s\, dM_s = \sum_{j=1}^{n-1} f_j (M_{t_{j+1}} - M_{t_j}). \tag{3.1}$$
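Proposition 3.4 The stochastic integral satisfies the following identities: for every $f \in \mathcal{L}_0$, we have
$$E\Big(\int f_s\, dM_s\Big) = 0,$$
and
$$E\Big(\int f_s\, dM_s\Big)^2 = E \int f_s^2\, d\langle M \rangle_s.$$

Proof The first equality follows from
$$E\Big(\int f_s\, dM_s\Big) = \sum_{j=1}^{n-1} E\big(f_j (M_{t_{j+1}} - M_{t_j})\big) = \sum_{j=1}^{n-1} E\big(f_j E(M_{t_{j+1}} - M_{t_j} | \mathcal{F}_{t_j})\big) = 0. \tag{3.2}$$
To prove the second equality, we note that
$$\Big(\int f_s\, dM_s\Big)^2 = \sum_{j=1}^{n-1} f_j^2 (M_{t_{j+1}} - M_{t_j})^2 + 2 \sum_{j < k} f_j f_k (M_{t_{j+1}} - M_{t_j})(M_{t_{k+1}} - M_{t_k}) \equiv I_1 + I_2.$$
Similar to (3.2), we have $E(I_2) = 0$. On the other hand,
$$E(I_1) = \sum_{j=1}^{n-1} E\Big(f_j^2 E\big((M_{t_{j+1}} - M_{t_j})^2 \big| \mathcal{F}_{t_j}\big)\Big) = \sum_{j=1}^{n-1} E\Big(f_j^2 \big(\langle M \rangle_{t_{j+1}} - \langle M \rangle_{t_j}\big)\Big) = E \int f_s^2\, d\langle M \rangle_s.$$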
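Both identities of Proposition 3.4 can be illustrated numerically when $M$ is a Brownian motion, for which $\langle M \rangle_t = t$ by Theorem 2.36. The sketch below is a Monte Carlo check under assumed choices that are not from the text: a uniform grid on $[0, 1]$, the bounded integrand $f_j = \max(-1, \min(1, M_{t_j}))$, and hypothetical function and parameter names.

```python
import random


def simple_integral_moments(n_paths, n_intervals=4, horizon=1.0, seed=0):
    """Estimate E[I(f)], E[I(f)^2] and E[int f^2 d<M>] for a Brownian motion M on a grid."""
    rng = random.Random(seed)
    dt = horizon / n_intervals
    sd = dt ** 0.5
    mean_i = mean_i2 = mean_q = 0.0
    for _ in range(n_paths):
        m = 0.0         # M_{t_0} = 0
        integral = 0.0  # running value of sum f_j (M_{t_{j+1}} - M_{t_j})
        quad = 0.0      # running value of sum f_j^2 (t_{j+1} - t_j)
        for _ in range(n_intervals):
            f_j = max(-1.0, min(1.0, m))  # bounded and F_{t_j}-measurable
            dm = rng.gauss(0.0, sd)       # Brownian increment over one interval
            integral += f_j * dm
            quad += f_j * f_j * dt
            m += dm
        mean_i += integral
        mean_i2 += integral * integral
        mean_q += quad
    return mean_i / n_paths, mean_i2 / n_paths, mean_q / n_paths
```

The first returned value should be close to zero and the second and third should agree up to sampling error. The key design point is that `f_j` is computed before the increment `dm` is drawn, mirroring the requirement that $f_j$ be $\mathcal{F}_{t_j}$-measurable.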
To extend the definition of the stochastic integral to more general $f$, for $M \in \mathcal{M}^{2,c}$, we define a measure $\nu_M$ on $(\mathbb{R}_+ \times \Omega, \mathcal{P})$ by
$$\nu_M(A) = E \int 1_A(t, \omega)\, d\langle M \rangle_t.$$
By Lemma 3.3, it is easy to show that $\mathcal{L}_0$ is a dense subspace of $L^2(\nu_M)$. The following theorem follows from Proposition 3.4 directly.

Theorem 3.5 The mapping $I : \mathcal{L}_0 \to L^2(\Omega, \mathcal{F}, P)$ defined by (3.1) is a linear isometry. Namely, for $f, g \in \mathcal{L}_0$ and $\alpha, \beta \in \mathbb{R}$, we have
$$I(\alpha f + \beta g) = \alpha I(f) + \beta I(g) \quad \text{a.s.},$$
and
$$E|I(f)|^2 = \int_{\mathbb{R}_+ \times \Omega} |f(t, \omega)|^2\, \nu_M(dt\, d\omega).$$
As a consequence, it can be extended uniquely to a linear isometry from $L^2(\nu_M)$ into $L^2(\Omega, \mathcal{F}, P)$.

We still denote the extension by $I(f) = \int f_s\, dM_s$. We then define the stochastic integral as a process
$$I_t(f) \equiv \int_0^t f_s\, dM_s \equiv \int f_s 1_{[0,t]}(s)\, dM_s.$$
Theorem 3.6 The stochastic process $I_t(f)$ is a continuous square-integrable martingale with Meyer's process
$$\langle I(f) \rangle_t = \int_0^t f_s^2\, d\langle M \rangle_s.$$

Proof We only need to prove the theorem for $t \le T$ with $T$ being fixed. Let $\{f^n\}$ be a sequence of simple predictable processes such that
$$|f_s^n| \le |f_s|, \tag{3.3}$$
and
$$E \int_0^T (f_s^n - f_s)^2\, d\langle M \rangle_s < 2^{-n}.$$
By the definition of the stochastic integral, we see that $\forall\, t \in [0, T]$,
$$E|I_t(f^n) - I_t(f)|^2 \to 0. \tag{3.4}$$
It is easy to verify that $I_t(f^n)$ and
$$I_t(f^n)^2 - \int_0^t (f_s^n)^2\, d\langle M \rangle_s$$
are martingales. It then follows from (3.4) and (3.3) that $I_t(f)$ and
$$I_t(f)^2 - \int_0^t f_s^2\, d\langle M \rangle_s$$
are martingales. Therefore, $I_t(f)$ is a square-integrable martingale with Meyer's process
$$\langle I(f) \rangle_t = \int_0^t f_s^2\, d\langle M \rangle_s.$$
From Theorem 2.15, we get
$$P\Big(\sup_{0\le t\le T} |I_t(f^n) - I_t(f)| > \frac{1}{n}\Big) \le n^2 E|I_T(f^n) - I_T(f)|^2 = n^2 E \int_0^T (f_s^n - f_s)^2\, d\langle M \rangle_s < n^2 2^{-n},$$
which is summable. By the Borel–Cantelli lemma, we have
$$P\Big(\sup_{0\le t\le T} |I_t(f^n) - I_t(f)| > \frac{1}{n} \text{ infinitely often}\Big) = 0.$$
Hence,
$$\sup_{0\le t\le T} |I_t(f^n) - I_t(f)| \to 0 \quad \text{a.s.}$$
As $I_t(f^n)$ are continuous, $I_t(f)$ is continuous a.s.
Finally, we give the definition of the stochastic integral when $M \in \mathcal{M}^{2,c}_{loc}$.

Definition 3.7 For $M \in \mathcal{M}^{2,c}_{loc}$, let $L^2_{loc}(M)$ be the collection of all real-valued predictable processes $f$ such that there exists a sequence of stopping times $\sigma_n \uparrow \infty$ a.s. and
$$E\Big(\int_0^{T\wedge\sigma_n} f_t^2\, d\langle M \rangle_t\Big) < \infty, \qquad \forall\, T > 0,\ n \in \mathbb{N}. \tag{3.5}$$

It is clear that we may choose $\sigma_n$ in Definition 3.7 such that $\forall\, n \in \mathbb{N}$, $M_t^{\sigma_n} \equiv M_{t\wedge\sigma_n}$ is a square-integrable martingale and (3.5) is satisfied. Define
$$I_t^n(f) = I_t(1_{(0,\sigma_n]} f).$$
For $m < n$, it is easy to verify that
$$I_t^m(f) = I^n_{t\wedge\sigma_m}(f).$$
Therefore, there exists a unique stochastic process $I_t(f)$ such that $I_t^n(f) = I_{t\wedge\sigma_n}(f)$.

Definition 3.8 $I_t(f)$ is called the stochastic integral of $f \in L^2_{loc}(M)$ with respect to $M \in \mathcal{M}^{2,c}_{loc}$. We also write $\int_0^t f_s\, dM_s$ for $I_t(f)$.
3.3
Itô’s formula
In this section, we derive Itô’s formula for a function of a semimartingale. This formula is the counterpart in stochastic analysis of the chain rule in ordinary calculus.
Definition 3.9 A $d$-dimensional process $X_t$ is a continuous semimartingale if $X_t = X_0 + M_t + A_t$, where $M^1, \dots, M^d$ are continuous local martingales and $A^1, \dots, A^d$ are continuous finite-variation processes.
Before we state Itô’s formula, we need to introduce the following notation. Let $C_b^2(\mathbb{R}^d)$ be the collection of all bounded differentiable functions with bounded partial derivatives up to order 2. We denote the partial derivative of a function $F$ with respect to its $i$th variable by $\partial_i F$. Similarly, we denote $\frac{\partial^2 F}{\partial x_i \partial x_j}$ by $\partial^2_{ij} F$.
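Theorem 3.10 (Itô’s formula) Let $X_t$ be a $d$-dimensional continuous semimartingale and let $F \in C_b^2(\mathbb{R}^d)$. Then
\[
F(X_t) = F(X_0) + \sum_{i=1}^d \int_0^t \partial_i F(X_s)\, dM_s^i + \sum_{i=1}^d \int_0^t \partial_i F(X_s)\, dA_s^i + \frac12 \sum_{i,j=1}^d \int_0^t \partial^2_{ij} F(X_s)\, d\langle M^i, M^j\rangle_s . \tag{3.6}
\]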
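Proof For simplicity of notation, we assume that $d = 1$. Let
\[
\tau_n = \begin{cases} 0 & \text{if } |X_0| > n, \\ \inf\{t : |M_t| > n \text{ or } \mathrm{Var}(A)_t > n \text{ or } \langle M\rangle_t > n\} & \text{if } |X_0| \le n, \end{cases}
\]
where $\mathrm{Var}(A)_t$ is the total variation of $A$ on $[0,t]$. It is clear that $\tau_n \uparrow \infty$ a.s. We only need to prove equation (3.6) with $t$ replaced by $t \wedge \tau_n$. In other words, we may assume that $|X_0|$, $|M_t|$, $\mathrm{Var}(A)_t$ and $\langle M\rangle_t$ are all bounded by a constant $K$ and that $F \in C_0^2(\mathbb{R})$. Here, $C_0^2(\mathbb{R})$ stands for the set of all functions in $C_b^2(\mathbb{R})$ with compact support. Let $t_i = it/n$, $i = 0, 1, \dots, n$. Then
\[
\begin{aligned}
F(X_t) - F(X_0) &= \sum_{i=1}^n \big(F(X_{t_i}) - F(X_{t_{i-1}})\big) \\
&= \sum_{i=1}^n F'(X_{t_{i-1}})(X_{t_i} - X_{t_{i-1}}) + \frac12 \sum_{i=1}^n F''(\xi_i)(X_{t_i} - X_{t_{i-1}})^2 \\
&\equiv I_1^n + I_2^n,
\end{aligned} \tag{3.7}
\]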
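where $\xi_i$ is between $X_{t_{i-1}}$ and $X_{t_i}$. Note that as $n \to \infty$,
\[
\begin{aligned}
I_1^n &= \sum_{i=1}^n F'(X_{t_{i-1}})(M_{t_i} - M_{t_{i-1}}) + \sum_{i=1}^n F'(X_{t_{i-1}})(A_{t_i} - A_{t_{i-1}}) \\
&\to \int_0^t F'(X_s)\, dM_s + \int_0^t F'(X_s)\, dA_s .
\end{aligned} \tag{3.8}
\]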
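On the other hand,
\[
\begin{aligned}
2 I_2^n &= \sum_{i=1}^n F''(\xi_i)(M_{t_i} - M_{t_{i-1}})^2 + 2\sum_{i=1}^n F''(\xi_i)(M_{t_i} - M_{t_{i-1}})(A_{t_i} - A_{t_{i-1}}) + \sum_{i=1}^n F''(\xi_i)(A_{t_i} - A_{t_{i-1}})^2 \\
&\equiv I_{21}^n + I_{22}^n + I_{23}^n .
\end{aligned} \tag{3.9}
\]
Since $A_t$ is continuous and of finite variation and $M$ is continuous, it is easy to show that $I_{22}^n \to 0$ and $I_{23}^n \to 0$. Let
\[ V_k^n = \sum_{i=1}^k (M_{t_i} - M_{t_{i-1}})^2, \qquad k = 1, 2, \dots, n. \]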
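Then,
\[
\begin{aligned}
E (V_n^n)^2 &= \sum_{i=1}^n E(M_{t_i} - M_{t_{i-1}})^4 + 2\sum_{1\le i<j\le n} E\Big[ E\big((M_{t_j} - M_{t_{j-1}})^2 \,\big|\, \mathcal{F}_{t_{j-1}}\big)(M_{t_i} - M_{t_{i-1}})^2 \Big] \\
&\le 4K^2 \sum_{i=1}^n E(M_{t_i} - M_{t_{i-1}})^2 + 2\sum_{1\le i<j\le n} E\Big[ \big(\langle M\rangle_{t_j} - \langle M\rangle_{t_{j-1}}\big)(M_{t_i} - M_{t_{i-1}})^2 \Big] \\
&\le 4K^2 E(V_n^n) + 2K \sum_{1\le i<n} E(M_{t_i} - M_{t_{i-1}})^2 \\
&\le (4K^2 + 2K) E(V_n^n) \\
&\le (4K^2 + 2K) \sqrt{E (V_n^n)^2}.
\end{aligned}
\]
Hence, $E (V_n^n)^2 \le (4K^2 + 2K)^2$.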
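Let
\[ I_3^n = \sum_{i=1}^n F''(X_{t_{i-1}})(M_{t_i} - M_{t_{i-1}})^2, \]
and
\[ I_4^n = \sum_{i=1}^n F''(X_{t_{i-1}})\big(\langle M\rangle_{t_i} - \langle M\rangle_{t_{i-1}}\big). \]
Then,
\[
\big(E|I_3^n - I_{21}^n|\big)^2 \le E\Big( \max_{1\le i\le n} |F''(\xi_i) - F''(X_{t_{i-1}})|^2 \Big)\, E (V_n^n)^2 \to 0, \tag{3.10}
\]
and
\[ I_4^n \to \int_0^t F''(X_s)\, d\langle M\rangle_s . \tag{3.11} \]
Finally,
\[
\begin{aligned}
E|I_3^n - I_4^n|^2 &= E \sum_{i=1}^n F''(X_{t_{i-1}})^2 \Big( (M_{t_i} - M_{t_{i-1}})^2 - \big(\langle M\rangle_{t_i} - \langle M\rangle_{t_{i-1}}\big) \Big)^2 \\
&\le \|F''\|_\infty^2\, E \sum_{i=1}^n \Big\{ (M_{t_i} - M_{t_{i-1}})^4 + \big(\langle M\rangle_{t_i} - \langle M\rangle_{t_{i-1}}\big)^2 \Big\} \\
&\to 0.
\end{aligned} \tag{3.12}
\]
Combining equations (3.10), (3.11) and (3.12), we see that
\[ I_{21}^n \to \int_0^t F''(X_s)\, d\langle M\rangle_s . \tag{3.13} \]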
Equation (3.6) then follows from equations (3.7), (3.8), (3.9) and (3.13).
As an application of Itô’s formula, we prove that for continuous square-integrable martingales, Meyer’s processes coincide with the quadratic variation process. This point of view will be useful in Chapter 11.
Theorem 3.11 Suppose that $M \in \mathcal{M}^{2,c}_{loc}$. Let $0 = t_0^n < t_1^n < \cdots < t_n^n = t$ be such that
\[ \max_{1\le j\le n} (t_j^n - t_{j-1}^n) \to 0. \]
Then,
\[ \lim_{n\to\infty} \sum_{j=1}^n (M_{t_j^n} - M_{t_{j-1}^n})^2 = \langle M\rangle_t . \]
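As an illustration of Theorem 3.11, the sum of squared increments can be computed along a simulated path. The sketch below assumes $M$ is a standard Brownian motion, for which $\langle M\rangle_t = t$, and uses a uniform partition; the function name and parameters are hypothetical.

```python
import math
import random


def quadratic_variation(n_steps, t_end=1.0, seed=1):
    """Sum of squared Brownian increments over a uniform partition of [0, t_end].

    Assumption: M is a standard Brownian motion, so the limit in
    Theorem 3.11 should be t_end.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n_steps))
```

For a fine partition, the returned value should be close to `t_end`; its variance is $2\,t_{end}^2/n$ for $n$ equal steps.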
Proof Note that
\[
\begin{aligned}
\sum_{j=1}^n (M_{t_j^n} - M_{t_{j-1}^n})^2 &= \sum_{j=1}^n \Big( \int_{t_{j-1}^n}^{t_j^n} 2(M_s - M_{t_{j-1}^n})\, dM_s + \langle M\rangle_{t_j^n} - \langle M\rangle_{t_{j-1}^n} \Big) \\
&= 2\int_0^t M_s\, dM_s - 2\sum_{j=1}^n M_{t_{j-1}^n}(M_{t_j^n} - M_{t_{j-1}^n}) + \langle M\rangle_t \\
&\to \langle M\rangle_t ,
\end{aligned}
\]
where the first step follows from Itô’s formula, and the last step follows from the definition of the stochastic integral.
As another application of Itô’s formula, we derive the Burkholder–Davis–Gundy inequality in this section. We recall Doob’s inequality (Theorem 2.15):
\[ E\max_{s\le t} |X_s|^p \le \Big(\frac{p}{p-1}\Big)^p E(|X_t|^p). \]
Suppose that $X_0 = 0$. As $X_s^2 - \langle X\rangle_s$ is a martingale, we have $E|X_t|^2 = E(\langle X\rangle_t)$, and Doob’s inequality becomes
\[ E\max_{s\le t} |X_s|^2 \le 4E(\langle X\rangle_t). \tag{3.14} \]
Equation (3.14) is called the Burkholder–Davis–Gundy inequality. It also holds for general $p \ge 1$. Since in this book we will only need the case of $p \ge 2$, whose proof is much easier than the other cases, we will only state this case in the next theorem.
Theorem 3.12 (Burkholder–Davis–Gundy inequality) Suppose that $p \ge 2$ and $X \in \mathcal{M}^{2,c}$ satisfies $X_0 = 0$. Then there exists a constant $K_p$ such that
\[ E\max_{s\le t} |X_s|^p \le K_p E\big(\langle X\rangle_t^{p/2}\big). \tag{3.15} \]
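Proof Since $|x|^p \in C^2$ for $p \ge 2$, it follows from Itô’s formula that
\[ |X_t|^p = \int_0^t p|X_s|^{p-2} X_s\, dX_s + \frac{p(p-1)}{2} \int_0^t |X_s|^{p-2}\, d\langle X\rangle_s . \]
Taking expectation on both sides, it follows from Hölder’s inequality that
\[
\begin{aligned}
E|X_t|^p &= \frac{p(p-1)}{2} E\int_0^t |X_s|^{p-2}\, d\langle X\rangle_s \\
&\le \frac{p(p-1)}{2} E\Big( \max_{s\le t}|X_s|^{p-2} \langle X\rangle_t \Big) \\
&\le \frac{p(p-1)}{2} \Big( E\max_{s\le t}|X_s|^p \Big)^{\frac{p-2}{p}} \Big( E\langle X\rangle_t^{p/2} \Big)^{\frac{2}{p}} .
\end{aligned}
\]
Combining with Doob’s inequality, we get
\[
E\max_{s\le t}|X_s|^p \le \Big(\frac{p}{p-1}\Big)^p \frac{p(p-1)}{2} \Big( E\max_{s\le t}|X_s|^p \Big)^{\frac{p-2}{p}} \Big( E\langle X\rangle_t^{p/2} \Big)^{\frac{2}{p}} .
\]
The inequality (3.15) then follows easily.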
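The last step can be made explicit as follows. This is a sketch under the additional assumption that $A \equiv E\max_{s\le t}|X_s|^p$ is finite and positive (otherwise one would first localize by stopping times); the symbols $A$ and $c_p$ are introduced here only for illustration.

```latex
% Write c_p = (p/(p-1))^p * p(p-1)/2. The combined estimate reads
%   A <= c_p * A^{(p-2)/p} * ( E <X>_t^{p/2} )^{2/p}.
\[
A^{\frac{2}{p}} \le c_p \Big( E\langle X\rangle_t^{p/2} \Big)^{\frac{2}{p}}
\quad\Longrightarrow\quad
A \le c_p^{\,p/2}\, E\langle X\rangle_t^{p/2},
\qquad
c_p = \Big(\frac{p}{p-1}\Big)^{p} \frac{p(p-1)}{2}.
\]
% Hence one admissible constant is K_p = c_p^{p/2}.
% Consistency check at p = 2: c_2 = 4 and K_2 = 4, in agreement with (3.14).
```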
3.4
Martingale representation in terms of Brownian motion
In this section, we make use of Itô’s formula to show that the Brownian motion is characterized by its Meyer process. Then, as consequences of this result, we present some representation theorems for square-integrable martingales in terms of Brownian motions. Recall that $\xi^*$ is the transpose of a vector or a matrix $\xi$.
Theorem 3.13 Suppose that $X_t = (X_t^1, \dots, X_t^d)^*$ is such that $X^j \in \mathcal{M}^{2,c}_{loc}$, $X_0 = 0$ and
\[ \langle X^j, X^k\rangle_t = \delta_{jk} t, \qquad j, k = 1, 2, \dots, d. \]
Then $X_t$ is a $d$-dimensional Brownian motion.
Proof Let $\xi \in \mathbb{R}^d$. Applying Itô’s formula to the function $\exp(i\xi^* x)$, we have
\[
\exp(i\xi^* X_t) = \exp(i\xi^* X_s) + i\int_s^t \exp(i\xi^* X_u)\, \xi^* dX_u - \frac12 \int_s^t |\xi|^2 \exp(i\xi^* X_u)\, du .
\]
Thus,
\[
E\big( \exp(i\xi^* X_t) \,\big|\, \mathcal{F}_s \big) = \exp(i\xi^* X_s) - \frac12 \int_s^t |\xi|^2 E\big( \exp(i\xi^* X_u) \,\big|\, \mathcal{F}_s \big)\, du .
\]
By solving this integral equation, we get
\[ E\big( \exp(i\xi^*(X_t - X_s)) \,\big|\, \mathcal{F}_s \big) = \exp\Big( -\frac12 |\xi|^2 (t-s) \Big). \]
This implies that $X_t - X_s$ is independent of $\mathcal{F}_s$ and that $X_t - X_s$ has a multivariate normal distribution with mean 0 and covariance matrix $(t-s)I_d$. Namely, $X_t$ is a $d$-dimensional Brownian motion.
As an application of Theorem 3.13, we can represent any locally square-integrable martingale as a time change of a Brownian motion.
Theorem 3.14 Suppose that $M \in \mathcal{M}^{2,c}_{loc}$ satisfies $\lim_{t\to\infty} \langle M\rangle_t = \infty$ a.s. Let
\[ \tau_t = \inf\{u : \langle M\rangle_u > t\}, \qquad \tilde{\mathcal{F}}_t = \mathcal{F}_{\tau_t}. \]
Then $B_t = M_{\tau_t}$ is an $(\tilde{\mathcal{F}}_t)$-Brownian motion. As a consequence, $M_t$ has the following representation:
\[ M_t = B_{\langle M\rangle_t}. \]
Proof We first prove that $B_t$ is continuous. Note that the only possible case for $B_t$ not being continuous is that $\tau_t$ has a jump and $M$ is not constant over this jump. Suppose that $\tau_t$ has a jump at $t_0$. Denote $r = \tau_{t_0-}$ and $r' = \tau_{t_0}$. Then $\langle M\rangle_u = t_0$ for all $u \in (r, r')$, and $M$ is not constant on the interval $(r, r')$. Therefore, we only need to show that
\[ P\big( \{\langle M\rangle_r = \langle M\rangle_{r'}\} \setminus \{M_u = M_r,\ \forall\, u \in [r, r']\} \big) = 0. \tag{3.16} \]
Let
\[ \sigma = \inf\{s > r : \langle M\rangle_s > \langle M\rangle_r\}. \]
Then $\sigma$ is a stopping time and hence, by Doob’s optional sampling theorem, $N_s \equiv M_{\sigma\wedge(r+s)} - M_r$ is a local martingale with respect to $\hat{\mathcal{F}}_s \equiv \mathcal{F}_{\sigma\wedge(r+s)}$. Since
\[ \langle N\rangle_s = \langle M\rangle_{\sigma\wedge(r+s)} - \langle M\rangle_r = 0, \]
we have $N = 0$. This implies equation (3.16).
By Corollary 2.34,
\[ \langle B\rangle_t = \langle M\rangle_{\tau_t} = t. \]
Therefore, $B_t$ is a Brownian motion.
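Next, we would like to remove the condition that $\lim_{t\to\infty} \langle M\rangle_t = \infty$ a.s. To this end, we need to define the Brownian motion in an extended probability space.
Definition 3.15 We say that $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t)$ is an extension of a stochastic basis $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ if there exists a mapping $\pi : \tilde\Omega \to \Omega$ that is $\tilde{\mathcal{F}}/\mathcal{F}$-measurable such that i) $\tilde{\mathcal{F}}_t \supset \pi^{-1}(\mathcal{F}_t)$; ii) $P = \tilde P \circ \pi^{-1}$; and iii) for every bounded random variable $X$ on $\Omega$,
\[ \tilde E\big( \tilde X(\tilde\omega) \,\big|\, \tilde{\mathcal{F}}_t \big) = E(X \mid \mathcal{F}_t)(\pi\tilde\omega) \qquad \tilde P\text{-a.s. (almost surely with respect to } \tilde P), \]
where $\tilde X(\tilde\omega) = X(\pi\tilde\omega)$ for $\tilde\omega \in \tilde\Omega$. We shall denote $\tilde X$ by $X$ if its meaning is clear from the context.
The quadruple $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t)$ is called a standard extension of the stochastic basis $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ if we have another stochastic basis $(\Omega', \mathcal{F}', P', \mathcal{F}'_t)$ such that
\[ (\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t) = (\Omega, \mathcal{F}, P, \mathcal{F}_t) \times (\Omega', \mathcal{F}', P', \mathcal{F}'_t), \]
and $\pi\tilde\omega = \omega$ for $\tilde\omega = (\omega, \omega') \in \tilde\Omega$.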
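Theorem 3.16 For $M \in \mathcal{M}^{2,c}_{loc}$, we define
\[ \tau_t = \inf\{u : \langle M\rangle_u > t\}, \]
with the convention that $\inf \emptyset = \infty$. Let
\[ \hat{\mathcal{F}}_t = \sigma\Big( \bigcup_{s>0} \mathcal{F}_{\tau_t\wedge s} \Big). \]
Then, on an extension $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t)$ of $(\Omega, \mathcal{F}, P, \hat{\mathcal{F}}_t)$ there exists an $(\tilde{\mathcal{F}}_t)$-Brownian motion $B_t$ such that $M_t = B_{\langle M\rangle_t}$.
Proof By the optional sampling theorem, for all $s \ge s'$ and $u \ge v$,
\[ E(M_{\tau_u\wedge s} \mid \mathcal{F}_{\tau_v\wedge s'}) = M_{\tau_v\wedge s'}, \]
and
\[ E\big( (M_{\tau_u\wedge s} - M_{\tau_v\wedge s'})^2 \,\big|\, \mathcal{F}_{\tau_v\wedge s'} \big) = E\big( \langle M\rangle_{\tau_u\wedge s} - \langle M\rangle_{\tau_v\wedge s'} \,\big|\, \mathcal{F}_{\tau_v\wedge s'} \big). \]
Therefore, $s \mapsto M_{\tau_u\wedge s}$ is a square-integrable martingale with Meyer’s process $\langle M\rangle_{\tau_u\wedge s}$. By the martingale convergence theorem (Theorem 2.9),
\[ \tilde B_u = \lim_{s\uparrow\infty} M_{\tau_u\wedge s} \]
exists a.s. Further, for all $u \ge v$, we have
\[ E(\tilde B_u \mid \hat{\mathcal{F}}_v) = \tilde B_v, \]
and
\[ E\big( (\tilde B_u - \tilde B_v)^2 \,\big|\, \hat{\mathcal{F}}_v \big) = E\big( \langle M\rangle_{\tau_u} - \langle M\rangle_{\tau_v} \,\big|\, \hat{\mathcal{F}}_v \big). \]
Let $(\Omega', \mathcal{F}', P', \mathcal{F}'_t)$ be a stochastic basis and let $B'_t$ be a Brownian motion on $\Omega'$. Define the standard extension by
\[ (\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t) = (\Omega, \mathcal{F}, P, \hat{\mathcal{F}}_t) \times (\Omega', \mathcal{F}', P', \mathcal{F}'_t). \]
Let
\[ B_t = B'_t - B'_{t\wedge\langle M\rangle_\infty} + \tilde B_{t\wedge\langle M\rangle_\infty}. \]
Then $B_t$ is a continuous $\tilde{\mathcal{F}}_t$-martingale with $\langle B\rangle_t = t$, and hence, a Brownian motion. The rest of the proof is easy.
Next, we represent a square-integrable martingale as a stochastic integral with respect to a Brownian motion.
Theorem 3.17 Let $M^i \in \mathcal{M}^{2,c}$, $i = 1, 2, \dots, d$. Let $\Phi_{ij} : \mathbb{R}_+ \times \Omega \to \mathbb{R}$, $i, j = 1, 2, \dots, d$, be predictable processes such that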
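\[ \langle M^i, M^j\rangle_t = \int_0^t \sum_{k=1}^d \Phi_{ik}(s)\Phi_{jk}(s)\, ds . \]
If
\[ \det(\Phi(s)) \ne 0 \quad \text{a.s.} \quad \forall\, s, \tag{3.17} \]
then there exists a $d$-dimensional Brownian motion $B_t$ (on the original stochastic basis) such that
\[ M_t^i = \sum_{k=1}^d \int_0^t \Phi_{ik}(s)\, dB_s^k . \tag{3.18} \]
Proof For $N > 0$, let
\[ I_N(s) = 1_{\{\max_{1\le i,j\le d} |(\Phi^{-1})_{ij}(s)| \le N\}}, \]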
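where $\Phi^{-1}$ is the inverse matrix of $\Phi$. Define
\[ B_t^{i,N} = \sum_{k=1}^d \int_0^t (\Phi^{-1})_{ik}(s) I_N(s)\, dM_s^k, \qquad i = 1, 2, \dots, d. \]
Then, $B^{i,N} \in \mathcal{M}^{2,c}$ and
\[
\begin{aligned}
\langle B^{i,N}, B^{j,N}\rangle_t &= \sum_{k,\ell=1}^d \int_0^t (\Phi^{-1})_{ik}(s)(\Phi^{-1})_{j\ell}(s) I_N(s) \sum_{m=1}^d \Phi_{km}(s)\Phi_{\ell m}(s)\, ds \\
&= \sum_{m=1}^d \int_0^t \delta_{im}\delta_{jm} I_N(s)\, ds \\
&= \int_0^t I_N(s)\, ds\, \delta_{ij} .
\end{aligned}
\]
So,
\[ E\sup_{0\le t\le T} \big| B_t^{i,N} - B_t^{i,N'} \big|^2 \le 4E\int_0^T |I_N(s) - I_{N'}(s)|^2\, ds \to 0 \]
as $N, N' \to \infty$. Therefore, $B^{i,N}$ converges in $\mathcal{M}^{2,c}$ to some $B^i$ and
\[ \langle B^i, B^j\rangle_t = \delta_{ij} t. \]
By Theorem 3.13, $B_t = (B_t^1, \dots, B_t^d)$ is a $d$-dimensional Brownian motion. Note that
\[ \sum_{k=1}^d \int_0^t \Phi_{ik}(s)\, dB_s^{k,N} = \int_0^t I_N(s)\, dM_s^i . \]
Taking $N \to \infty$, we get the representation (3.18).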
Next, we remove the condition (3.17). In this case, we need to construct the Brownian motion on an extension of the original stochastic basis.
Theorem 3.18 Let $M^i \in \mathcal{M}^{2,c}$, $i = 1, 2, \dots, d$. Let $\Phi_{ik} : \mathbb{R}_+ \times \Omega \to \mathbb{R}$, $i = 1, 2, \dots, d$, $k = 1, 2, \dots, r$, be predictable processes such that
\[ \langle M^i, M^j\rangle_t = \int_0^t \sum_{k=1}^r \Phi_{ik}(s)\Phi_{jk}(s)\, ds, \qquad i, j = 1, 2, \dots, d. \]
Then, on an extension $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t)$ of $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$ there exists an $r$-dimensional Brownian motion $B_t$ such that
\[ M_t^i = \sum_{k=1}^r \int_0^t \Phi_{ik}(s)\, dB_s^k, \qquad i = 1, 2, \dots, d. \tag{3.19} \]
Proof By taking $M_t^i \equiv 0$ or $\Phi_{ik} \equiv 0$ if necessary, we may assume that $d = r$. Let
\[ \Psi_{ij}(s) = \sum_{k=1}^d \Phi_{ik}(s)\Phi_{jk}(s). \]
Then $\Psi(s)$ is a $d \times d$ non-negative definite matrix. Let
\[ \tilde\Psi(s) = \lim_{\epsilon\downarrow 0} \Psi(s)^{\frac12} (\Psi(s) + \epsilon I_d)^{-1}, \]
where $I_d$ is the $d \times d$ identity matrix. By diagonalization of the matrix $\Psi(s)$, it is easy to see that $\tilde\Psi(s)$ above is well defined. Let $E_R(s)$ be the projection matrix onto the range of $\Psi(s)$ and $E_N(s) = I_d - E_R(s)$. Then,
\[ \tilde\Psi(s)\Psi(s)^{\frac12} = \Psi(s)^{\frac12}\tilde\Psi(s) = E_R(s). \]
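The diagonalization argument can be spelled out as follows. This is a sketch in which $\Psi(s)$ denotes the non-negative definite matrix of the proof and $\tilde\Psi(s)$ the limit defined from it; the orthogonal matrix $U$ and the eigenvalues $\lambda_k$ are names introduced here only for illustration.

```latex
% Fix s and diagonalize: Psi = U diag(lambda_1, ..., lambda_d) U^*, lambda_k >= 0.
\[
\Psi^{\frac12}(\Psi + \epsilon I_d)^{-1}
 = U\,\mathrm{diag}\Big(\frac{\sqrt{\lambda_k}}{\lambda_k + \epsilon}\Big)U^{*},
\qquad
\frac{\sqrt{\lambda_k}}{\lambda_k + \epsilon}
 \xrightarrow[\epsilon \downarrow 0]{}
 \begin{cases} \lambda_k^{-1/2}, & \lambda_k > 0, \\ 0, & \lambda_k = 0. \end{cases}
\]
% Hence the limit exists, and multiplying by Psi^{1/2} = U diag(sqrt(lambda_k)) U^*
% on either side gives the projection onto the range of Psi:
\[
\tilde\Psi\,\Psi^{\frac12} = \Psi^{\frac12}\,\tilde\Psi
 = U\,\mathrm{diag}\big(1_{\{\lambda_k > 0\}}\big)U^{*} = E_R .
\]
```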
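First, we assume that $\Phi(s) = \Psi(s)^{\frac12}$. Let $B'_t$ be a $d$-dimensional Brownian motion on a stochastic basis $(\Omega', \mathcal{F}', P', \mathcal{F}'_t)$ and let
\[ (\tilde\Omega, \tilde{\mathcal{F}}, \tilde P, \tilde{\mathcal{F}}_t) = (\Omega, \mathcal{F}, P, \mathcal{F}_t) \times (\Omega', \mathcal{F}', P', \mathcal{F}'_t). \]
Define
\[ B_t^i = \sum_{k=1}^d \int_0^t \tilde\Psi_{ik}(s)\, dM_s^k + \sum_{k=1}^d \int_0^t E_N(s)_{ik}\, dB_s'^{\,k} . \]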
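Then $\langle B^i, B^j\rangle_t = \delta_{ij} t$ and hence, $B_t$ is a $d$-dimensional Brownian motion. Further,
\[
\begin{aligned}
\sum_{k=1}^d \int_0^t \Phi_{ik}(s)\, dB_s^k &= \sum_{k,j=1}^d \int_0^t \Phi_{ik}(s)\tilde\Psi_{kj}(s)\, dM_s^j + \sum_{k,j=1}^d \int_0^t \Phi_{ik}(s) E_N(s)_{kj}\, dB_s'^{\,j} \\
&= \sum_{j=1}^d \int_0^t E_R(s)_{ij}\, dM_s^j \\
&= M_t^i - \sum_{j=1}^d \int_0^t E_N(s)_{ij}\, dM_s^j .
\end{aligned} \tag{3.20}
\]
Note that
\[ \Big\langle \sum_{j=1}^d \int_0^\cdot E_N(s)_{ij}\, dM_s^j \Big\rangle_t = 0. \]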
Combining this with equation (3.20), we see that equation (3.19) holds.
For general $\Phi(s)$, there exists an orthogonal-matrix-valued predictable process $P(s)$ such that $\Psi(s)^{\frac12} = \Phi(s)P(s)$. By the previous step, we have
\[ M_t^i = \sum_{k=1}^r \int_0^t \big(\Psi(s)^{\frac12}\big)_{ik}\, dB_s^k . \]
Let
\[ \tilde B_t^k = \sum_{j=1}^d \int_0^t P_{kj}(s)\, dB_s^j . \]
Then $\tilde B_t$ is a $d$-dimensional Brownian motion and equation (3.19) holds with $B$ replaced by $\tilde B$.
3.5
Change of measures
In this section, we investigate the following question: how do martingales change under equivalent probability measures? First, we consider a family of non-negative local martingales. Under Novikov’s condition (3.23), such a local martingale becomes a martingale and gives the Radon–Nikodym derivative between two probability measures.
For $X \in \mathcal{M}^{2,c}_{loc}$ with $X_0 = 0$, we define
\[ \mathcal{E}(X)_t \equiv \exp\Big( X_t - \frac12 \langle X\rangle_t \Big). \]
Lemma 3.19 The positive-valued process $\mathcal{E}(X)_t$ is a continuous local martingale.
Proof Applying Itô’s formula, we have
\[ \mathcal{E}(X)_t = 1 + \int_0^t \mathcal{E}(X)_s\, dX_s . \]
Hence $\mathcal{E}(X)$ is a continuous local martingale.
Theorem 3.20 (Kazamaki) If $E\exp\big(\frac12 X_t\big) < \infty$ for all $t$, then $\mathcal{E}(X)_t$ is a martingale.
Proof Let $\{\sigma_n\}$ be a sequence of stopping times increasing to infinity such that for each $n$, $\{\mathcal{E}(X)_{t\wedge\sigma_n} : t \ge 0\}$ is a martingale. For any bounded stopping time $\sigma$, it follows from Fatou’s lemma that
\[ E\mathcal{E}(X)_\sigma \le \liminf_{n\to\infty} E\mathcal{E}(X)_{\sigma\wedge\sigma_n} = 1. \tag{3.21} \]
Since $E\exp\big(\frac12 X_t\big) < \infty$ for all $t$, it is easy to show that for any $T \ge 0$, the family $\{X_t : t \le T\}$ is uniformly integrable. Thus, $X_t$ is a martingale. By Jensen’s inequality, $\exp\big(\frac12 X_t\big)$ is a submartingale. Let $a \in (0,1)$. Then
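\[ \mathcal{E}(aX)_t = (\mathcal{E}(X)_t)^{a^2} \big(Z_t^{(a)}\big)^{1-a^2}, \]
where $Z_t^{(a)} = \exp(aX_t/(1+a))$. By the optional sampling theorem for a submartingale, for any $\sigma \in \mathcal{S}_T$, we have
\[ 0 \le Z_\sigma^{(a)} \le E\big( Z_T^{(a)} \,\big|\, \mathcal{F}_\sigma \big), \]
and hence, $\{Z_\sigma^{(a)} : \sigma \in \mathcal{S}_T\}$ is uniformly integrable. Then,
\[
\begin{aligned}
\sup_{\sigma\in\mathcal{S}_T} E\big( \mathcal{E}(aX)_\sigma 1_{\mathcal{E}(aX)_\sigma > c} \big) &\le \sup_{\sigma\in\mathcal{S}_T} \big( E\mathcal{E}(X)_\sigma \big)^{a^2} \Big( E\big( Z_\sigma^{(a)} 1_{\mathcal{E}(aX)_\sigma > c} \big) \Big)^{1-a^2} \\
&\le \sup_{\sigma\in\mathcal{S}_T} \Big( E\big( Z_\sigma^{(a)} 1_{\mathcal{E}(aX)_\sigma > c} \big) \Big)^{1-a^2} \\
&\to 0,
\end{aligned}
\]
as $c \to \infty$, i.e. $\{\mathcal{E}(aX)_\sigma : \sigma \in \mathcal{S}_T\}$ is uniformly integrable. Thus
\[ 1 = E(\mathcal{E}(aX)_\sigma) \le \big( E(\mathcal{E}(X)_\sigma) \big)^{a^2} \big( E(Z_T^{(a)}) \big)^{1-a^2}. \tag{3.22} \]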
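Note that
\[ Z_T^{(a)} \le 1_{X_T \le 0} + \exp\Big( \frac12 X_T \Big) \]
is uniformly integrable for $a \in (0,1)$. Then, as $a \uparrow 1$, we have
\[ E Z_T^{(a)} \to E\exp\Big( \frac12 X_T \Big) \in (0, \infty), \]
and hence,
\[ \big( E Z_T^{(a)} \big)^{1-a^2} \to 1. \]
Therefore, by equation (3.22), we get
\[ E\mathcal{E}(X)_\sigma \ge 1. \]
Combining with equation (3.21), we get $E\mathcal{E}(X)_\sigma = 1$ for any bounded stopping time $\sigma$. Let $s \le t$ and $B \in \mathcal{F}_s$. Define
\[ \sigma = \begin{cases} t & \text{if } \omega \notin B, \\ s & \text{if } \omega \in B. \end{cases} \]
It is easy to show that $\sigma \in \mathcal{S}_t$, and hence
\[ 0 = E\mathcal{E}(X)_\sigma - 1 = E\big( \mathcal{E}(X)_t 1_{B^c} + \mathcal{E}(X)_s 1_B \big) - 1 = E\mathcal{E}(X)_s 1_B - E\mathcal{E}(X)_t 1_B . \]
This implies that $\mathcal{E}(X)_t$ is a martingale.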
The next theorem gives another condition for $\mathcal{E}(X)_t$ to be a martingale that is easier to verify than that in Kazamaki’s theorem.
Theorem 3.21 (Novikov) If the stochastic process $X \in \mathcal{M}^{2,c}_{loc}$ satisfies the following Novikov condition:
\[ E\exp\Big( \frac12 \langle X\rangle_t \Big) < \infty, \qquad \forall\, t \ge 0, \tag{3.23} \]
then $\{\mathcal{E}(X)_t\}_{t\ge0}$ is a continuous martingale.
Proof Note that
\[ \exp\Big( \frac12 X_t \Big) = (\mathcal{E}(X)_t)^{\frac12} \exp\Big( \frac14 \langle X\rangle_t \Big). \]
It follows from the Cauchy–Schwarz inequality that
\[ E\exp\Big( \frac12 X_t \Big) \le \big( E\mathcal{E}(X)_t \big)^{\frac12} \Big( E\exp\Big( \frac12 \langle X\rangle_t \Big) \Big)^{\frac12} < \infty. \]
The conclusion then follows from Kazamaki’s theorem.
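The simplest instance of Novikov’s condition can be checked by simulation. The sketch below assumes $X_t = \theta B_t$ for a constant $\theta$ and a standard Brownian motion $B$, so that $\langle X\rangle_t = \theta^2 t$ is deterministic and (3.23) holds trivially; the function name and parameters are hypothetical.

```python
import math
import random


def mean_stochastic_exponential(theta=0.5, t=1.0, n_paths=50000, seed=2):
    """Monte Carlo estimate of E exp(theta * B_t - theta^2 * t / 2).

    Assumption: X = theta * B with B a standard Brownian motion, so the
    exponential martingale property predicts the value 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        b_t = rng.gauss(0.0, math.sqrt(t))
        total += math.exp(theta * b_t - 0.5 * theta * theta * t)
    return total / n_paths
```

The estimate should be close to 1 for every fixed $t$, consistent with $E\mathcal{E}(X)_t = \mathcal{E}(X)_0 = 1$.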
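Throughout the rest of this section, we assume that $\mathcal{E}(X)_t$ defined above is a martingale with $\mathcal{E}(X)_0 = 1$. We define a probability measure $\hat P_t$ on $(\Omega, \mathcal{F}_t)$ by
\[ \hat P_t(A) = E(\mathcal{E}(X)_t 1_A), \qquad \forall\, A \in \mathcal{F}_t . \]
Then for all $t > s$, we have $\hat P_t|_{\mathcal{F}_s} = \hat P_s$. In fact, for all $A \in \mathcal{F}_s$,
\[ \hat P_t(A) = E\big( E(\mathcal{E}(X)_t 1_A \mid \mathcal{F}_s) \big) = E(\mathcal{E}(X)_s 1_A) = \hat P_s(A). \]
We assume that
\[ \mathcal{F} = \sigma\Big( \bigcup_{t\ge0} \mathcal{F}_t \Big). \]
Then there exists a unique probability measure $\hat P$ on $(\Omega, \mathcal{F})$ such that $\hat P|_{\mathcal{F}_t} = \hat P_t$. Denote $\hat P$ by $\mathcal{E}(X) \cdot P$.
The following theorem gives a formula for the conditional expectation of a random variable under a change of measure.
Theorem 3.22 (Bayes’ formula) Suppose that $\xi$ is an integrable random variable on $(\Omega, \mathcal{F}, P)$ and $\mathcal{G}$ is a sub-$\sigma$-field of $\mathcal{F}$. Let $Q \gg P$ be another probability measure such that $M = \frac{dP}{dQ}$. Then
\[ E(\xi \mid \mathcal{G}) = \frac{E^Q(\xi M \mid \mathcal{G})}{E^Q(M \mid \mathcal{G})}, \tag{3.24} \]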
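where $E^Q$ refers to the expectation with respect to the probability measure $Q$.
Proof For any $A \in \mathcal{G}$, we have
\[
\begin{aligned}
\int_A \frac{E^Q(\xi M \mid \mathcal{G})}{E^Q(M \mid \mathcal{G})}\, dP &= E^Q\Big( 1_A \frac{E^Q(\xi M \mid \mathcal{G})}{E^Q(M \mid \mathcal{G})} M \Big) \\
&= E^Q\Big( 1_A \frac{E^Q(\xi M \mid \mathcal{G})}{E^Q(M \mid \mathcal{G})} E^Q(M \mid \mathcal{G}) \Big) \\
&= E^Q(1_A \xi M).
\end{aligned}
\]
On the other hand,
\[ \int_A E(\xi \mid \mathcal{G})\, dP = E(1_A \xi) = E^Q(1_A \xi M). \]
This proves the identity (3.24).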
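Bayes’ formula can be verified exactly on a finite probability space. The sketch below is an illustration only: the four-point sample space, the measures `P` and `Q`, the random variable `xi` and the partition generating the sub-σ-field are all made-up data, and exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from fractions import Fraction as Fr

# Hypothetical finite example: Omega = {0, 1, 2, 3}.
P = [Fr(1, 8), Fr(1, 8), Fr(1, 4), Fr(1, 2)]   # measure P
Q = [Fr(1, 4)] * 4                              # measure Q, dominating P
M = [p / q for p, q in zip(P, Q)]               # density M = dP/dQ
xi = [Fr(1), Fr(2), Fr(3), Fr(5)]               # random variable xi
blocks = [[0, 1], [2, 3]]                       # partition generating G


def cond_exp(values, prob, block):
    """Conditional expectation of `values` given the block, under `prob`."""
    mass = sum(prob[w] for w in block)
    return sum(values[w] * prob[w] for w in block) / mass


def bayes_sides(block):
    """Left and right sides of the Bayes formula on one block of G."""
    lhs = cond_exp(xi, P, block)
    xi_m = [x * m for x, m in zip(xi, M)]
    rhs = cond_exp(xi_m, Q, block) / cond_exp(M, Q, block)
    return lhs, rhs
```

Calling `bayes_sides(block)` for each block of the partition returns two equal fractions, which is the content of (3.24) on that block.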
Finally, we state and prove the main theorem of this section that will be used extensively in this book. Denote the collection of all $\hat P$-locally square-integrable martingales by $\hat{\mathcal{M}}^{2,c}_{loc}$.
Theorem 3.23 (Girsanov’s transformation) i) If $Y \in \mathcal{M}^{2,c}_{loc}$, then $\tilde Y$ defined by
\[ \tilde Y_t = Y_t - \langle X, Y\rangle_t \tag{3.25} \]
is a $\hat P$-locally square-integrable martingale.
ii) For $Y^1, Y^2 \in \mathcal{M}^{2,c}_{loc}$, let $\tilde Y^1, \tilde Y^2$ be defined by equation (3.25). Then, Meyer’s processes satisfy the following identity:
\[ \langle \tilde Y^1, \tilde Y^2\rangle = \langle Y^1, Y^2\rangle, \quad \text{a.s.} \]
Proof i) First, we assume that $\tilde Y_t$ is bounded. By Itô’s formula,
\[
\begin{aligned}
d(\mathcal{E}(X)_t \tilde Y_t) &= \tilde Y_t\, d\mathcal{E}(X)_t + \mathcal{E}(X)_t\, d\tilde Y_t + d\langle \mathcal{E}(X), Y\rangle_t \\
&= \tilde Y_t\, d\mathcal{E}(X)_t + \mathcal{E}(X)_t\, dY_t .
\end{aligned}
\]
Therefore, $\mathcal{E}(X)_t \tilde Y_t$ is a martingale and hence, by Bayes’ formula,
\[ \hat E(\tilde Y_t \mid \mathcal{F}_s) = E(\mathcal{E}(X)_t \tilde Y_t \mid \mathcal{F}_s)\, \mathcal{E}(X)_s^{-1} = \tilde Y_s, \]
where $\hat E(Y \mid \mathcal{G})$ stands for the conditional expectation, given $\mathcal{G}$, under the probability measure $\hat P$.
In general, we choose a sequence of increasing stopping times $\sigma_n$ such that for all $n$, $\tilde Y_{t\wedge\sigma_n}$ is bounded. Therefore, $\tilde Y_{\sigma_n\wedge\cdot} \in \hat{\mathcal{M}}^{2,c}_{loc}$, and so does $\tilde Y$.
ii) Since Meyer’s processes coincide with the corresponding quadratic variation processes, the proof of (ii) is trivial.
Corollary 3.24 Suppose that
\[ X_t = \int_0^t \Phi_s^*\, dB_s, \]
where $B_t$ is a $d$-dimensional Brownian motion and $\Phi$ is a square-integrable $\mathbb{R}^d$-valued predictable process. We assume that the Novikov condition is satisfied so that $\hat P$ is a probability measure. Then
\[ \tilde B_t = B_t - \int_0^t \Phi_s\, ds \]
is a $d$-dimensional Brownian motion on $(\Omega, \mathcal{F}, \hat P, \mathcal{F}_t)$.
Proof Note that
\[ \langle B^i, X\rangle_t = \int_0^t \Phi_s^i\, ds . \]
Hence, $\tilde B^i \in \hat{\mathcal{M}}^{2,c}_{loc}$. As
\[ \langle \tilde B^i, \tilde B^j\rangle_t = \langle B^i, B^j\rangle_t = \delta_{ij} t, \]
$\tilde B_t$ is a $d$-dimensional Brownian motion under $\hat P$.
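The corollary can be illustrated in the simplest setting by reweighting samples. The sketch below assumes $d = 1$ and a constant integrand $\theta$, so that $X_t = \theta B_t$ and the shifted process is $B_t - \theta t$; expectations under the new measure are computed as $E(\mathcal{E}(X)_t\,\cdot\,)$. The function name and parameters are hypothetical.

```python
import math
import random


def girsanov_moments(theta=0.7, t=1.0, n_paths=50000, seed=3):
    """Weighted first and second moments of B_t - theta * t.

    Assumption: X = theta * B with constant theta. Under the measure with
    density exp(theta * B_t - theta^2 * t / 2), the shifted variable
    B_t - theta * t should have mean 0 and variance t.
    """
    rng = random.Random(seed)
    m1 = 0.0
    m2 = 0.0
    for _ in range(n_paths):
        b = rng.gauss(0.0, math.sqrt(t))
        weight = math.exp(theta * b - 0.5 * theta ** 2 * t)
        shifted = b - theta * t
        m1 += weight * shifted
        m2 += weight * shifted * shifted
    return m1 / n_paths, m2 / n_paths
```

The first returned value estimates the mean of the shifted variable under the new measure and the second its second moment; they should be near 0 and $t$, respectively.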
3.6
Stratonovich integral
In the definition of Itô’s integral, the values of the integrand at the left endpoints of the subintervals of a partition are used in the approximating Riemann sum. Namely,
\[ \int_0^t f_s\, dM_s = \lim_{|\Pi|\to0} \sum_{i=0}^{n-1} f_{t_i}\big( M_{t_{i+1}} - M_{t_i} \big), \]
where $\Pi = \{0 = t_0 < t_1 < \cdots < t_n = t\}$ is a partition of $[0,t]$ and $|\Pi| = \max_{0\le i<n}(t_{i+1} - t_i)$. The advantage of this definition is that, when $M_t$ is a square-integrable martingale and $f \in L^2(\nu_M)$, the stochastic integral $\int_0^t f_s\, dM_s$ is also a martingale. The disadvantage is that Itô’s formula based on this integral differs from the chain rule of ordinary calculus, which is very convenient to use. To overcome this shortcoming of the Itô integral, Stratonovich modified the Riemann sum by taking the average of the values of the integrand at the two endpoints of each small interval.
Theorem 3.25 Let $X$ and $Y$ be two continuous semimartingales. Let $\Pi$ be a partition $0 = t_0 < t_1 < \cdots < t_n = t$ of $[0,t]$ such that $|\Pi| \to 0$. Then, the sum
\[ I_n \equiv \sum_{i=0}^{n-1} \frac12 \big( X_{t_i} + X_{t_{i+1}} \big)\big( Y_{t_{i+1}} - Y_{t_i} \big) \]
converges in probability to a random variable, denoted by $\int_0^t X_s \circ dY_s$. Further,
\[ \int_0^t X_s \circ dY_s = \int_0^t X_s\, dY_s + \frac12 \langle X, Y\rangle_t . \tag{3.26} \]
Proof Note that
\[ I_n = \sum_{i=0}^{n-1} X_{t_i}\big( Y_{t_{i+1}} - Y_{t_i} \big) + \frac12 \sum_{i=0}^{n-1} \big( X_{t_{i+1}} - X_{t_i} \big)\big( Y_{t_{i+1}} - Y_{t_i} \big). \tag{3.27} \]
By the definition of the Itô integral, the first term of equation (3.27) converges to $\int_0^t X_s\, dY_s$. By Theorem 3.11 and Definition 2.30, it is easy to show that the second term of equation (3.27) converges to $\frac12 \langle X, Y\rangle_t$. Thus, the conclusions of the theorem hold.
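The decomposition (3.27) can be observed on a single simulated path. The sketch below assumes $X = Y = B$ is a standard Brownian motion on a uniform partition; the function name is hypothetical. For this choice the midpoint sum telescopes to $\frac12 B_t^2$ exactly, while the left-point sum differs from it by half the sum of squared increments.

```python
import math
import random


def ito_and_stratonovich_sums(n_steps=10000, t_end=1.0, seed=4):
    """Left-point and averaged Riemann sums for the integral of B against B.

    Returns (B_t, Ito sum, Stratonovich sum, sum of squared increments).
    Assumption: X = Y = B, a standard Brownian motion on a uniform grid.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    b = 0.0
    ito = 0.0
    strat = 0.0
    qv = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        ito += b * db                        # integrand at the left endpoint
        strat += 0.5 * (b + (b + db)) * db   # average of the two endpoints
        qv += db * db
        b += db
    return b, ito, strat, qv
```

Up to floating-point rounding, the outputs satisfy `strat == ito + 0.5 * qv` and `strat == 0.5 * b * b`, which mirrors (3.26) with $\langle B, B\rangle_t \approx$ `qv`.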
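Definition 3.26 The stochastic integral $\int_0^t X_s \circ dY_s$ is called the Stratonovich integral.
As a consequence of the theorem above, we have the following useful identities. For simplicity of notation, we shall use $X \circ dY$ for $X_t \circ dY_t$.
Corollary 3.27 Suppose that $X, Y, Z$ are semimartingales. Then
\[ X \circ (dY + dZ) = X \circ dY + X \circ dZ, \]
\[ (X + Y) \circ dZ = X \circ dZ + Y \circ dZ, \]
\[ X \circ (dY \cdot dZ) = (X \circ dY) \cdot dZ = X \cdot (dY \cdot dZ), \]
and
\[ (XY) \circ dZ = X \circ (Y \circ dZ). \]
Proof The first two equalities follow from the definition directly. By equation (3.26), we have
\[ Y \circ dZ = Y\, dZ + \frac12 d\langle Y, Z\rangle . \]
As $\langle Y, Z\rangle$ is of finite variation, its quadratic covariation with $X$ is zero, and hence,
\[ X \circ d\langle Y, Z\rangle = X\, d\langle Y, Z\rangle . \]
As
\[ d\Big\langle X, \int Y\, dZ \Big\rangle = Y\, d\langle X, Z\rangle, \]
we have
\[
\begin{aligned}
X \circ (Y \circ dZ) &= X \circ \Big( Y\, dZ + \frac12 d\langle Y, Z\rangle \Big) \\
&= X \circ (Y\, dZ) + \frac12 X \circ d\langle Y, Z\rangle \\
&= XY\, dZ + \frac12 Y\, d\langle X, Z\rangle + \frac12 X\, d\langle Y, Z\rangle \\
&= XY\, dZ + \frac12 d\langle XY, Z\rangle \\
&= (XY) \circ dZ .
\end{aligned}
\]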
This proves the last equality. The other equalities can be proved similarly.
Finally, we give the counterpart of Itô’s formula in the present setup. This formula has the same form as the chain rule in calculus.
Theorem 3.28 If $X^1, X^2, \dots, X^d$ are continuous semimartingales and $f \in C^3(\mathbb{R}^d)$, then $Y = f(X^1, \dots, X^d)$ is a semimartingale and
\[ dY_t = \sum_{i=1}^d \partial_i f(X_t^1, \dots, X_t^d) \circ dX_t^i . \tag{3.28} \]
Proof Denote $X_t = (X_t^1, \dots, X_t^d)$. By Itô’s formula, we get
\[ dY_t = \sum_{i=1}^d \partial_i f(X_t)\, dX_t^i + \frac12 \sum_{i,j=1}^d \partial^2_{ij} f(X_t)\, d\langle X^i, X^j\rangle_t . \]
By Theorem 3.25, we have
\[ \partial_i f(X_t)\, dX_t^i = \partial_i f(X_t) \circ dX_t^i - \frac12 d\langle \partial_i f(X), X^i\rangle_t . \tag{3.29} \]
Applying Itô’s formula again, we get
\[ d\,\partial_i f(X_t) = \sum_{j=1}^d \partial^2_{ij} f(X_t)\, dX_t^j + \frac12 \sum_{j,k=1}^d \partial^3_{ijk} f(X_t)\, d\langle X^j, X^k\rangle_t . \]
Thus,
\[ d\langle \partial_i f(X), X^i\rangle_t = \sum_{j=1}^d \partial^2_{ij} f(X_t)\, d\langle X^i, X^j\rangle_t . \tag{3.30} \]
The chain rule (3.28) follows from equations (3.29) and (3.30).
4
Stochastic differential equations
In many filtering problems, the signal $X_t$ is a stochastic process governed by a stochastic differential equation (SDE). First, we derive this SDE intuitively. Suppose $X_t$ is a continuous process taking values in $\mathbb{R}^d$. Without noise, it should be governed by an ordinary differential equation of the form
\[ \frac{dX_t}{dt} = b(X_t), \]
where $b : \mathbb{R}^d \to \mathbb{R}^d$ is a continuous map. In many real-world problems, there are (white) noises that perturb the signal. Namely, $X_t$ is governed by the following equation:
\[ \frac{dX_t}{dt} = b(X_t) + \sigma(X_t) n_t, \]
where $n_t$ is an $m$-dimensional white noise and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d\times m}$ is a continuous mapping. It is well known that the white noise exists in the sense of generalized functions only, while the accumulated process
\[ B_t = \int_0^t n_s\, ds \]
is an $m$-dimensional Brownian motion. Then $X_t$ is governed by the following SDE:
\[ dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t, \tag{4.1} \]
which is understood as
\[ X_t = X_0 + \int_0^t b(X_s)\, ds + \int_0^t \sigma(X_s)\, dB_s . \]
In this chapter, we study the existence and uniqueness for the solution to the SDE (4.1).
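To make the integral form of (4.1) concrete, one can replace the two integrals by left-point sums on a grid, which is the Euler–Maruyama scheme. This scheme is not part of the text; the sketch below assumes the scalar case $d = m = 1$, and the function name and arguments are hypothetical.

```python
import math
import random


def euler_maruyama(b, sigma, x0, t_end, n_steps, rng):
    """Approximate a path of dX = b(X) dt + sigma(X) dB on [0, t_end].

    Assumptions: scalar state and scalar Brownian motion; `b` and `sigma`
    are plain Python callables; `rng` is a random.Random instance.
    Returns the list of grid values X_0, X_dt, ..., X_{t_end}.
    """
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over dt
        x = x + b(x) * dt + sigma(x) * db    # left-point update
        path.append(x)
    return path
```

For example, `euler_maruyama(lambda x: -x, lambda x: 0.3, 1.0, 1.0, 1000, random.Random(0))` produces one approximate path of an Ornstein–Uhlenbeck-type equation; with `sigma` identically zero the scheme reduces to the explicit Euler method for the ordinary differential equation.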
4.1
Basic definitions
In this section, we introduce various notions of solution to equation (4.1). We shall introduce the weak solution, the strong solution and the martingale problem solution. For uniqueness, we shall introduce uniqueness in law, pathwise uniqueness and the well-posedness of the martingale problem.
If $X_t$ is a continuous $\mathbb{R}^d$-valued process, then $X : (\Omega, \mathcal{F}) \to (C^d, \mathcal{B}(C^d))$ is a measurable mapping, where $C^d = C(\mathbb{R}_+, \mathbb{R}^d)$ is the collection of the continuous mappings from $\mathbb{R}_+$ to $\mathbb{R}^d$. Then $X$ induces a probability measure on $C^d$. We shall denote this measure by $\mathcal{L}(X)$ or $P \circ X^{-1}$.
Definition 4.1 i) A probability measure $\mu$ on $C^d$ is a weak solution to equation (4.1) if there exist a stochastic process $X_t$ and an $m$-dimensional Brownian motion $B_t$ on a stochastic basis such that equation (4.1) holds and $\mathcal{L}(X) = \mu$.
ii) We say that uniqueness of the weak solution for equation (4.1) holds if whenever $X$ and $X'$ are two weak solutions to equation (4.1) with $\mathcal{L}(X_0) = \mathcal{L}(X'_0)$, then $\mathcal{L}(X) = \mathcal{L}(X')$.
Sometimes, we need uniqueness of the solution at the level of each path.
Definition 4.2 We say that pathwise uniqueness of solutions for equation (4.1) holds if whenever $X$ and $X'$ are two solutions defined on the same stochastic basis with the same Brownian motion $B$ such that $X_0 = X'_0$, then we have $X_t = X'_t$, $\forall\, t \ge 0$, a.s.
Sometimes, we need to construct solutions on a pre-specified stochastic basis with a given Brownian motion.
Definition 4.3 i) A measurable functional $F : \mathbb{R}^d \times C^m \to C^d$ is a strong solution to equation (4.1) if for every $\mathbb{R}^d$-valued random variable $X_0$ and every $m$-dimensional Brownian motion $B$, $X = F(X_0, B)$ satisfies equation (4.1).
ii) We say that the uniqueness of the strong solution for equation (4.1) holds if for any other solution $X'$ with the same initial value $X_0$ and the same Brownian motion $B$, we have $X' = F(X_0, B)$.
To establish the relationship between weak and strong solutions, as well as the relationship among the various versions of uniqueness, we demonstrate how to put two solutions of equation (4.1) on the same probability space, which might not be the case to begin with.
Suppose that $X'$ and $X''$ are two solutions of the SDE (4.1) on stochastic bases $(\Omega', \mathcal{F}', P', (\mathcal{F}'_t))$ and $(\Omega'', \mathcal{F}'', P'', (\mathcal{F}''_t))$ with initial random variables $X'_0$ and $X''_0$ (having the same distribution $\lambda_0$ on $\mathbb{R}^d$) and Brownian motions $B'$ and $B''$, respectively. Let $\lambda'$ and $\lambda''$ be the Borel probability measures on the Cartesian product $C^d \times C^m \times \mathbb{R}^d$ induced by $(X', B', X'_0)$ and $(X'', B'', X''_0)$, respectively. Define a mapping $\pi : C^d \times C^m \times \mathbb{R}^d \to C^m \times \mathbb{R}^d$ by $\pi(w_1, w_2, x) = (w_2, x)$. Then,
\[ \lambda' \circ \pi^{-1} = \lambda'' \circ \pi^{-1} = P_B \otimes \lambda_0, \]
where $P_B$ is the probability measure on $C^m$ induced by a Brownian motion and $P_B \otimes \lambda_0$ is the product measure of $P_B$ and $\lambda_0$ on $C^m \times \mathbb{R}^d$. Let $\lambda'^{\,w_2,x}(dw_1)$ and $\lambda''^{\,w_2,x}(dw_1)$ be the regular conditional probability distributions of $w_1$ given $(w_2, x)$ with respect to $\lambda'$ and $\lambda''$, respectively. This is possible since $C^d$ is a Polish space. On the space
\[ \Omega = C^d \times C^d \times C^m \times \mathbb{R}^d, \]
we define a Borel probability measure $\lambda$ by
\[ \lambda(A) = \int 1_A(w_1, w_2, w_3, x)\, \lambda'^{\,w_3,x}(dw_1)\, \lambda''^{\,w_3,x}(dw_2)\, P_B(dw_3)\, \lambda_0(dx) \tag{4.2} \]
for $A \in \mathcal{B}(\Omega)$. Then, it is easy to show that $(w_1, w_3, x)$ and $(X', B', X'_0)$ have the same distribution, and so do $(w_2, w_3, x)$ and $(X'', B'', X''_0)$.
Let $\iota_t$ be the operator of stopping a process at $t$. Namely, $(\iota_t x)_s = x_{s\wedge t}$, $\forall\, x \in C^d$. Let
\[ \mathcal{B}_t(C^d) = \iota_t^{-1}(\mathcal{B}(C^d)). \]
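Intuitively, for any solution $X_t$ of equation (4.1), $X_t$ should depend on $\{B_s : s \le t\}$ only. Therefore, if $A \in \mathcal{B}_t(C^d)$, then $P(X \in A \mid B, X_0)$ should be a functional of $\{X_0, B_s : s \le t\}$. The following lemma makes this intuition rigorous. Since the proof is quite technical, we suggest that the reader skip it on a first reading.
Lemma 4.4 For any $A \in \mathcal{B}_t(C^d)$, we define two functions $f_1$ and $f_2$ by
\[ f_1(w, x) = \lambda'^{\,w,x}(A) \quad \text{and} \quad f_2(w, x) = \lambda''^{\,w,x}(A). \]
Then $f_1$ and $f_2$ are measurable with respect to the completion of the $\sigma$-field $\mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d)$ relative to the probability measure $P_B \otimes \lambda_0$.
Proof We only prove the result for $f_1$. For fixed $t > 0$ and $A \in \mathcal{B}_t(C^d)$, let $\lambda_t'^{\,w,x}(A)$ be defined as $\lambda'^{\,w,x}(A)$ with $\lambda'$ replaced by its restriction to the sub-$\sigma$-field
\[ \mathcal{B}_t(C^d) \times \mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d). \]
Then, $(w, x) \mapsto \lambda_t'^{\,w,x}(A)$ is measurable with respect to the $\sigma$-field $\mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d)$. Now, we only need to show that
\[ \lambda_t'^{\,w,x}(A) = f_1(w, x) \qquad \text{for } P_B \otimes \lambda_0\text{-a.s. } (w, x), \]
i.e. for any $C \in \mathcal{B}(C^m) \times \mathcal{B}(\mathbb{R}^d)$, we have to show that
\[ \int_C \lambda_t'^{\,w,x}(A)\, P_B(dw)\, \lambda_0(dx) = \lambda'(A \times C). \tag{4.3} \]
Define $\rho : C([0,t], \mathbb{R}^m) \times C^m \to C^m$ by
\[ \rho(w^1, w^2)_s = \begin{cases} w^1_s & \text{if } s < t, \\ w^2_{s-t} - w^2_0 + w^1_t & \text{if } s \ge t. \end{cases} \]
Since $\rho$ is a bijection and $\rho$, $\rho^{-1}$ are continuous, we only need to prove equation (4.3) for $C$ of the form
\[ C = \{w \in C^m : \rho^{-1} w \in A_1 \times A_2\} \times D, \]
where $A_1 \in \mathcal{B}(C([0,t], \mathbb{R}^m))$, $A_2 \in \mathcal{B}(C^m)$ and $D \in \mathcal{B}(\mathbb{R}^d)$. As Brownian motions have independent increments, $P_B \circ \rho = P_1 \otimes P_2$, where $P_1$ and $P_2$ are probability measures on $C([0,t], \mathbb{R}^m)$ and $C^m$, respectively. Furthermore, as $\lambda_t'^{\,w,x}(A)$ is $\mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d)$-measurable, we can find a measurable function $g$ on $C([0,t], \mathbb{R}^m) \times \mathbb{R}^d$ such that
\[ \lambda_t'^{\,w,x}(A) = g(\rho^{-1}(w)_1, x), \]
where $\rho^{-1}(w)_1 \in C([0,t], \mathbb{R}^m)$ is the first component of $\rho^{-1}(w)$ in the product space $C([0,t], \mathbb{R}^m) \times C^m$. Hence
\[
\begin{aligned}
\int_C \lambda_t'^{\,w,x}(A)\, P_B(dw)\, \lambda_0(dx) &= \int_{A_1\times A_2\times D} g(u^1, x)\, P_1(du^1)\, P_2(du^2)\, \lambda_0(dx) \\
&= \int_{A_1\times D} g(u^1, x)\, P_1(du^1)\, \lambda_0(dx)\, P_2(A_2).
\end{aligned}
\]
Let
\[ \tilde C = \{w \in C^m : (\rho^{-1} w)_1 \in A_1\} \times D. \]
Then $\tilde C \in \mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d)$, and hence
\[
\begin{aligned}
\int_{A_1\times D} g(u^1, x)\, P_1(du^1)\, \lambda_0(dx) &= \int_{\tilde C} \lambda_t'^{\,w,x}(A)\, P_B(dw)\, \lambda_0(dx) \\
&= \lambda'(A \times \tilde C) \\
&= P'\big( X' \in A,\ B'|_{[0,t]} \in A_1,\ X'_0 \in D \big).
\end{aligned}
\]
As $P_2(A_2) = P'\big( B'(t + \cdot) - B'(t) \in A_2 \big)$, we have
\[
\begin{aligned}
\int_C \lambda_t'^{\,w,x}(A)\, P_B(dw)\, \lambda_0(dx) &= P'\big( X' \in A,\ B'|_{[0,t]} \in A_1,\ X'_0 \in D,\ B'(t + \cdot) - B'(t) \in A_2 \big) \\
&= P'\big( X' \in A,\ (B', X'_0) \in C \big) \\
&= \lambda'(A \times C).
\end{aligned}
\]
This proves equation (4.3).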
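The next lemma identifies the Brownian motion in the new probability space $(\Omega, \mathcal{B}(\Omega), \lambda)$.
Lemma 4.5 Let $\mathcal{B}_t$ be the completion of
\[ \mathcal{B}_t(C^d) \times \mathcal{B}_t(C^d) \times \mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d) \]
relative to the probability measure $\lambda$. Then $w_3$ is a Brownian motion on the stochastic basis $(\Omega, \mathcal{B}, \lambda, \mathcal{B}_t)$.
Proof First, we prove that $w_3$ has independent increments. To this end, we only need to show that for $t \ge s$,
\[ E^\lambda\big\{ \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big) \,\big|\, \mathcal{B}_s \big\} = E^\lambda\big\{ \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big) \big\}. \]
Let $A_1, A_2 \in \mathcal{B}_s(C^d)$, $A_3 \in \mathcal{B}_s(C^m)$, $A_4 \in \mathcal{B}(\mathbb{R}^d)$ and $a \in \mathbb{R}^m$. Then we have
\[
\begin{aligned}
& E^\lambda\big\{ \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big) 1_{A_1\times A_2\times A_3\times A_4} \big\} \\
&\quad = \int_{A_3\times A_4} \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big)\, \lambda'^{\,w_3,x}(A_1)\, \lambda''^{\,w_3,x}(A_2)\, P_B(dw_3)\, \lambda_0(dx) \\
&\quad = \int_{A_3\times A_4} \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big)\, f_1(w_3, x) f_2(w_3, x)\, P_B(dw_3)\, \lambda_0(dx) \\
&\quad = E^\lambda \exp\big( i\langle a, w_3(t) - w_3(s)\rangle_{\mathbb{R}^m} \big)\, \lambda(A_1 \times A_2 \times A_3 \times A_4),
\end{aligned}
\]
where $f_1, f_2$ are defined in Lemma 4.4 (with $s$ in place of $t$). Hence, $w_3$ has independent increments.
Next, as the law of $(w_3)_t - (w_3)_s$ under $\lambda$ is the same as the law of $B'_t - B'_s$ under $P'$, we see that $(w_3)_t - (w_3)_s$ has a multivariate normal distribution with mean zero and covariance matrix $(t-s)I_m$. Therefore, $w_3$ is a Brownian motion.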
The next lemma says that if a product probability measure $P_1 \otimes P_2$ is supported on the diagonal of the product space, then the marginal probability measures must be the same and be degenerate. In other words, if two independent random variables are equal, then they have to be constant.
Lemma 4.6 Let $P_1$ and $P_2$ be two probability measures on a Polish space $X$. If $(P_1 \times P_2)\{(x_1, x_2) : x_1 = x_2\} = 1$, then there exists a unique $x \in X$ such that $P_1 = P_2 = \delta_{\{x\}}$.
Proof As
\[ 1 = \int P_1(dx) \int 1_{x=y}\, P_2(dy) = \int P_1(dx)\, P_2(\{x\}) \le 1, \tag{4.4} \]
we have $P_2(\{x\}) = 1$ for $P_1$-a.s. $x$. So, there exists a unique $x$ such that $P_2(\{x\}) = 1$ and $P_1 = \delta_x$.
After these preparations, we are now ready to state and to prove the main theorem of this section, which establishes the relationship between the pathwise uniqueness and the existence of a strong solution.
Theorem 4.7 The equation (4.1) has a unique strong solution if and only if for every Borel probability measure $\mu_0$ on $\mathbb{R}^d$, a weak solution $\mu$ of equation (4.1) exists with $\mu \circ X_0^{-1} = \mu_0$ and the pathwise uniqueness of the solution holds. In this case, the weak uniqueness also holds.
Proof If equation (4.1) has a unique strong solution, it is easy to verify that equation (4.1) has a weak solution and the pathwise uniqueness holds.
We now prove the converse. Let $X'$ and $X''$ be two solutions of the SDE (4.1) (we can always take copies if necessary). From the arguments above, we see that $(w_1, w_3, x)$ and $(w_2, w_3, x)$ are two solutions of equation (4.1) on the same stochastic basis $(\Omega, \mathcal{B}, \lambda, \mathcal{B}_t)$. By the pathwise uniqueness, we have that $\lambda(w_2 = w_1) = 1$. By equation (4.2), we get
\[ \lambda(w_2 = w_1) = \int \lambda'^{\,w,x} \otimes \lambda''^{\,w,x}(w_2 = w_1)\, P_B(dw)\, \lambda_0(dx), \]
and hence, for $P_B \otimes \lambda_0$-a.s. $(w, x)$, we have
\[ \lambda'^{\,w,x} \otimes \lambda''^{\,w,x}(w_1 = w_2) = 1. \tag{4.5} \]
By Lemma 4.6 and equation (4.5), there exists a mapping $F : C^m \times \mathbb{R}^d \to C^d$, such that
\[ \lambda'^{\,w,x} = \lambda''^{\,w,x} = \delta_{F(w,x)}. \tag{4.6} \]
For any $A \in \mathcal{B}_t(C^d)$, by equation (4.6), Lemma 4.4 and
\[ 1_{F^{-1}(A)}(w, x) = \lambda'^{\,w,x}(A), \]
it follows that $F^{-1}(A)$ is in the completion of $\mathcal{B}_t(C^m) \times \mathcal{B}(\mathbb{R}^d)$ relative to $P_B \otimes \lambda_0$, and hence, $F(w, x)$ is adapted. Then, for any Brownian motion $B$ and initial random variable $X_0$, $F(B, X_0)$ is a solution of equation (4.1). The uniqueness of the strong solution follows directly from the pathwise uniqueness of equation (4.1).
4.2
Existence and uniqueness of a solution
In this section, we establish the existence of a unique solution to equation (4.1). Existence is established by the Picard approximation, while uniqueness follows from Gronwall's inequality. Suppose that the coefficients $b$ and $\sigma$ satisfy the following Lipschitz condition: there exists a constant $K$ such that
\[
|b(x) - b(y)| + |\sigma(x) - \sigma(y)| \le K|x - y|, \qquad \forall x, y \in \mathbb{R}^d. \tag{4.7}
\]
Theorem 4.8 Under Condition (4.7), equation (4.1) has a unique strong solution.

Proof We only need to prove the theorem for $t \le T$. First, we use Picard iteration to construct a solution. Let $X_t^0 \equiv X_0$ and
\[
X_t^{n+1} = X_0 + \int_0^t b(X_s^n)\,ds + \int_0^t \sigma(X_s^n)\,dB_s, \qquad n \ge 0.
\]
Considering, if necessary, the equivalent probability measure $\tilde{P}$ given by
\[
\frac{d\tilde{P}}{dP} = \frac{e^{-|X_0|}}{E\, e^{-|X_0|}},
\]
we may assume that $E(|X_0|^2) < \infty$. By the Burkholder–Davis–Gundy inequality, we get
\begin{align*}
g_{n+1}(t) &\equiv E \sup_{s \le t} |X_s^{n+1}|^2 \\
&\le 3E(|X_0|^2) + 3T\, E\int_0^t \big(|b(0)| + K|X_s^n|\big)^2 ds + 12\, E\int_0^t \big(|\sigma(0)| + K|X_s^n|\big)^2 ds \\
&\le K_1 + K_2 \int_0^t g_n(s)\,ds,
\end{align*}
where $K_1$, $K_2$ are two constants and $g_0(t) \le K_1$. Using induction, we can prove that
\[
E \sup_{s \le t} |X_s^n|^2 \le K_1 e^{K_2 t}.
\]
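Note that
\[
X_t^{n+1} - X_t^n = \int_0^t \big(b(X_s^n) - b(X_s^{n-1})\big)ds + \int_0^t \big(\sigma(X_s^n) - \sigma(X_s^{n-1})\big)dB_s.
\]
Applying the Burkholder–Davis–Gundy inequality again, we have
\begin{align*}
f_{n+1}(t) &\equiv E \sup_{s \le t} |X_s^{n+1} - X_s^n|^2 \\
&\le 2E\Big(\int_0^t |b(X_s^n) - b(X_s^{n-1})|\,ds\Big)^2 + 2E \sup_{r \le t} \Big|\int_0^r \big(\sigma(X_s^n) - \sigma(X_s^{n-1})\big)dB_s\Big|^2 \\
&\le 2TK^2 \int_0^t E|X_s^n - X_s^{n-1}|^2 ds + 8E\int_0^t |\sigma(X_s^n) - \sigma(X_s^{n-1})|^2 ds \\
&\le (2T + 8)K^2 \int_0^t f_n(s)\,ds. \tag{4.8}
\end{align*}
Set $K_3 = (2T + 8)K^2$. Using induction, it is easy to show that
\[
f_{n+1}(t) \le K_3^n \int_0^t \frac{(t - s)^{n-1}}{(n-1)!} f_1(s)\,ds \to 0. \tag{4.9}
\]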
Therefore, there exists a continuous stochastic process $X_t$ such that
\[
E \sup_{t \le T} |X_t^n - X_t|^2 \to 0.
\]
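The Picard scheme can be tried out numerically on a time grid. The following is only a sketch under stated assumptions: the coefficients $b = \sin$ and $\sigma = \cos$ (Lipschitz, so Condition (4.7) holds), the grid size, and every function name are illustrative choices that do not come from the text. On a grid the stochastic integral becomes a left-point sum, and the iterates then reach their limit (which coincides with the Euler scheme on that grid) after finitely many steps.

```python
import math
import random


def brownian_increments(n_steps, dt, rng):
    """Independent N(0, dt) increments of a scalar Brownian path."""
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def picard_step(path, x0, b, sigma, dt, dB):
    """One Picard iteration on the grid: feed the previous iterate into both integrals."""
    new_path = [x0]
    drift_sum = 0.0
    noise_sum = 0.0
    for k in range(len(dB)):
        drift_sum += b(path[k]) * dt          # left-point Riemann sum for the ds-integral
        noise_sum += sigma(path[k]) * dB[k]   # left-point (Ito) sum for the dB-integral
        new_path.append(x0 + drift_sum + noise_sum)
    return new_path


def picard_iterates(x0, b, sigma, dt, dB, n_iter):
    """Run n_iter Picard steps; return the last iterate and the sup-gaps between iterates."""
    path = [x0] * (len(dB) + 1)               # X^0 is the constant path X_0
    gaps = []
    for _ in range(n_iter):
        new_path = picard_step(path, x0, b, sigma, dt, dB)
        gaps.append(max(abs(u - v) for u, v in zip(new_path, path)))
        path = new_path
    return path, gaps


rng = random.Random(0)
n_steps, T = 50, 1.0
dt = T / n_steps
dB = brownian_increments(n_steps, dt, rng)
limit, gaps = picard_iterates(1.0, math.sin, math.cos, dt, dB, n_iter=n_steps + 1)
```

The list `gaps` plays the role of $\sup_t |X_t^{n+1} - X_t^n|$ along one path; in the continuous-time proof the corresponding mean-square quantity is controlled by (4.9).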
It is easy to show that $X_t$ is a solution to equation (4.1). To prove the uniqueness, we let $X$ and $Y$ be two solutions to equation (4.1) with the same initial value and the same driving Brownian motion. Using arguments similar to those leading to equation (4.8), we have
\[
g(t) \equiv E \sup_{s \le t} |X_s - Y_s|^2 \le K_4 \int_0^t g(s)\,ds.
\]
By Gronwall's inequality, which is proved below, we get $g(t) \equiv 0$. This implies the pathwise uniqueness. It follows from Theorem 4.7 that equation (4.1) has a unique strong solution.

Lemma 4.9 (Gronwall's inequality) If
\[
g(t) \le K_1 + K_2 \int_0^t g(s)\,ds, \qquad \forall t \ge 0,
\]
then
\[
g(t) \le K_1 e^{K_2 t}, \qquad \forall t \ge 0.
\]
Proof Note that
\begin{align*}
g(t) &\le K_1 + K_2 \int_0^t \Big(K_1 + K_2 \int_0^s g(r)\,dr\Big) ds \\
&= K_1 + K_1 K_2 t + K_2^2 \int_0^t (t - r) g(r)\,dr \\
&\le K_1 + K_1 K_2 t + K_2^2 \int_0^t (t - r)\Big(K_1 + K_2 \int_0^r g(s)\,ds\Big) dr \\
&= K_1 \Big(1 + K_2 t + \frac{(K_2 t)^2}{2}\Big) + K_2^3 \int_0^t \frac{(t - s)^2}{2} g(s)\,ds.
\end{align*}
Using induction, we can show that
\[
g(t) \le K_1 \Big(1 + K_2 t + \frac{(K_2 t)^2}{2} + \cdots + \frac{(K_2 t)^n}{n!}\Big) + K_2^{n+1} \int_0^t \frac{(t - s)^n}{n!} g(s)\,ds.
\]
Taking $n \to \infty$, we finish the proof.
Finally, in this section, we prove the continuous dependency of the solution on the coefficients. This theorem will be needed when we derive the duality for stochastic filtering.
Theorem 4.10 Suppose that $\{(b^n, \sigma^n)\}$ is a sequence of functions on $\mathbb{R}^d$ taking values in $\mathbb{R}^d \times \mathbb{R}^{d \times m}$. For each $n$, $(b^n, \sigma^n)$ satisfies Condition (4.7). Further, as $n \to \infty$,
\[
\epsilon_n \equiv \sup_{x \in \mathbb{R}^d} \big(|b^n(x) - b(x)|^2 + |\sigma^n(x) - \sigma(x)|^2\big) \to 0.
\]
Let $X^n$ be the solution to equation (4.1) with $(b, \sigma)$ replaced by $(b^n, \sigma^n)$. Then, for any $T > 0$,
\[
E \sup_{t \le T} |X_t^n - X_t|^2 \to 0.
\]
Proof As
\[
X_t^n - X_t = \int_0^t \big(b^n(X_s^n) - b(X_s)\big)ds + \int_0^t \big(\sigma^n(X_s^n) - \sigma(X_s)\big)dB_s,
\]
it follows from the Cauchy–Schwarz and Burkholder–Davis–Gundy inequalities that, for $t \le T$,
\begin{align*}
E \sup_{s \le t} |X_s^n - X_s|^2 &\le 2T\, E\int_0^t |b^n(X_s^n) - b(X_s)|^2 ds + 8E\int_0^t |\sigma^n(X_s^n) - \sigma(X_s)|^2 ds \\
&\le 4T\, E\int_0^t \big(|b(X_s^n) - b(X_s)|^2 + \epsilon_n\big) ds + 16E\int_0^t \big(|\sigma(X_s^n) - \sigma(X_s)|^2 + \epsilon_n\big) ds \\
&\le 4(T + 4)\epsilon_n + 4(T + 4)K\, E\int_0^t |X_s^n - X_s|^2 ds.
\end{align*}
By Gronwall's inequality, we have
\[
E \sup_{s \le t} |X_s^n - X_s|^2 \le 4(T + 4)\epsilon_n e^{4(T+4)Kt}.
\]
This implies the desired estimate.
4.3
Martingale problem
Let $X_t$ be the unique solution to equation (4.1). Applying Itô's formula, we get, for any $f \in C_b^2(\mathbb{R}^d)$,
\[
df(X_t) = Lf(X_t)\,dt + \nabla^* f \sigma(X_t)\,dB_t,
\]
where
\[
Lf = \frac{1}{2} \sum_{i,j=1}^d a_{ij} \partial_{ij}^2 f + \sum_{i=1}^d b_i \partial_i f
\qquad \text{and} \qquad
a_{ij} = \sum_{k=1}^m \sigma_{ik} \sigma_{jk}.
\]
Therefore,
\[
M_t^f \equiv f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \tag{4.10}
\]
is a square-integrable martingale.

Definition 4.11 We say that $\{X_t\}$ is a solution to the $L$-martingale problem if, for all $f \in C_b^2(\mathbb{R}^d)$, $M_t^f$ defined in equation (4.10) is a locally square-integrable martingale. The $L$-martingale problem is well posed in $C([0, \infty), \mathbb{R}^d)$ if it has at least one solution and its solution is unique in distribution, i.e. if $X$, $Y$ are two solutions, then $\mathcal{L}(X) = \mathcal{L}(Y)$.

From the above, we see that the solution to the SDE (4.1) is a solution to the $L$-martingale problem. The next theorem establishes the uniqueness for the solution of the martingale problem.

Theorem 4.12 Under Condition (4.7), the $L$-martingale problem is well posed in $C([0, \infty), \mathbb{R}^d)$.

Proof Let $X_t$ be a solution to the $L$-martingale problem. First, we prove that
\[
M_t^i \equiv X_t^i - X_0^i - \int_0^t b_i(X_s)\,ds
\]
is a local martingale. Let $f \in C_b^2(\mathbb{R}^d)$ be such that $f(x) = x_i$ for $|x| \le r$. Then,
\[
M_{t \wedge \sigma_r}^f \equiv X_{t \wedge \sigma_r}^i - X_0^i - \int_0^{t \wedge \sigma_r} b_i(X_s)\,ds
\]
is a continuous martingale, where $\sigma_r = \inf\{t : |X_t| > r\}$. As $\sigma_r \uparrow \infty$ and $M_{t \wedge \sigma_r}^i = M_{t \wedge \sigma_r}^f$ is a continuous square-integrable martingale, we see that $M^i \in \mathcal{M}_{loc}^{2,c}$. Taking $f \in C_b^2(\mathbb{R}^d)$ such that $f(x) = x_i x_j$ for $|x| \le r$, it is easy to verify that
\[
X_t^i X_t^j - X_0^i X_0^j - \int_0^t \big(a_{ij}(X_s) + X_s^i b_j(X_s) + X_s^j b_i(X_s)\big) ds
\]
is a local martingale. Note that
\[
dX_t^i = b_i(X_t)\,dt + dM_t^i, \qquad i = 1, 2, \ldots, d.
\]
By Itô's formula, we get that
\[
d(X_t^i X_t^j) = \big(X_t^i b_j(X_t) + X_t^j b_i(X_t)\big) dt + d\langle M^i, M^j \rangle_t + d(\text{local mart.}).
\]
Hence,
\[
\langle M^i, M^j \rangle_t = \int_0^t a_{ij}(X_s)\,ds = \int_0^t \sum_{k=1}^m \sigma_{ik} \sigma_{jk}(X_s)\,ds.
\]
By the martingale representation theorem (Theorem 3.16), we see that there exists a Brownian motion $B_t$ such that
\[
M_t^i = \sum_{k=1}^m \int_0^t \sigma_{ik}(X_s)\,dB_s^k.
\]
Therefore, $X_t$ is a solution to equation (4.1). By the uniqueness of the weak solution, we see that the $L$-martingale problem is well posed.

Remark 4.13 From the proof of the theorem above we see that the solution to the martingale problem and the weak solution of the stochastic differential equation are equivalent.
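As a worked special case (an illustration only, assuming $b \equiv 0$, $m = d$ and $\sigma \equiv I$, which is not a case singled out in the text), the steps of the proof reduce to Lévy's characterization of Brownian motion:

```latex
% Assumed special case: b = 0, sigma = I (so a = I and m = d).
\begin{align*}
Lf &= \tfrac{1}{2} \sum_{i=1}^d \partial_{ii}^2 f = \tfrac{1}{2} \Delta f, \\
M_t^i &= X_t^i - X_0^i, \\
\langle M^i, M^j \rangle_t &= \int_0^t a_{ij}(X_s)\,ds = \delta_{ij}\, t .
\end{align*}
% Each M^i is a continuous local martingale with <M^i, M^j>_t = delta_ij t,
% so X - X_0 is a d-dimensional Brownian motion: the (1/2)Delta-martingale
% problem is solved exactly by Brownian motion started from the law of X_0.
```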
4.4
A stochastic flow
Throughout this section, we assume that Condition (4.7) holds, so that the SDE (4.1) has a unique strong solution. Let $X_t = F(t, x, B)$ be the unique strong solution of the SDE (4.1) with initial value $x$, where $F : \mathbb{R}_+ \times \mathbb{R}^d \times C^d \to \mathbb{R}^d$ is a measurable mapping. We define a shift operator $\theta_t$ from $C^d$ to $C^d$ by
\[
(\theta_t B)_s = B_{t+s} - B_t, \qquad \forall s \ge 0.
\]
By the pathwise uniqueness of the solution, we see that, for $t, s \ge 0$ fixed,
\[
F(t + s, x, B) = F(s, F(t, x, B), \theta_t B) \qquad \text{a.s.}
\]
Namely, $X_t = F(t, x, B)$, as a mapping $x \mapsto F(t, x, B)$, is a stochastic flow. In this section, we consider the differentiability of this mapping. To this end, we study the Euler approximation of the solution. This approximation method will be useful when we study the numerical solution to the filtering equation in Chapter 8.

For $\delta \in (0, 1)$, let $\eta_\delta(t) = j\delta$ for $j\delta \le t < (j + 1)\delta$, $j = 0, 1, 2, \ldots$. Let $X^\delta$ be the solution to
\[
X_t^\delta = x + \int_0^t b\big(X_{\eta_\delta(s)}^\delta\big) ds + \int_0^t \sigma\big(X_{\eta_\delta(s)}^\delta\big) dB_s. \tag{4.11}
\]
Note that for $j\delta \le t < (j + 1)\delta$, equation (4.11) becomes
\[
X_t^\delta = X_{j\delta}^\delta + b(X_{j\delta}^\delta)(t - j\delta) + \sigma(X_{j\delta}^\delta)(B_t - B_{j\delta}).
\]
Therefore, the solution to equation (4.11) is given recursively starting from $j = 0$.

Throughout the rest of this section, we assume that the coefficients $b$ and $\sigma$ are bounded and Lipschitz continuous, i.e. there exists a constant $K$ such that for any $x, y \in \mathbb{R}^d$, we have $|b(x)| + |\sigma(x)| \le K$ and $|b(x) - b(y)| + |\sigma(x) - \sigma(y)| \le K|x - y|$.

Theorem 4.14 For any $p > \frac{1}{2}$, there exists a constant $K_1$ such that
\[
E \sup_{0 \le t \le T} |X_t^\delta - X_t|^{2p} \le K_1 \delta^p.
\]
Proof Let
\[
f_\delta(t) = E \sup_{0 \le s \le t} |X_s^\delta - X_s|^{2p}.
\]
Using the inequality $|a + b|^{2p} \le 2^{2p-1}\big(|a|^{2p} + |b|^{2p}\big)$, we get
\begin{align*}
E\big|X_{\eta_\delta(s)}^\delta - X_s^\delta\big|^{2p} &\le E\big|b\big(X_{\eta_\delta(s)}^\delta\big)(s - \eta_\delta(s)) + \sigma\big(X_{\eta_\delta(s)}^\delta\big)\big(B_{\eta_\delta(s)} - B_s\big)\big|^{2p} \\
&\le 2^{2p-1} K^{2p} \delta^{2p} + 2^{2p-2} K^{2p} K_2 \delta^p \le K_3 \delta^p,
\end{align*}
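where $K_2$ is a constant (the $2p$-moment of a $d$-dimensional standard normal random vector) and $K_3 = 2^{2p-2} K^{2p}(2 + K_2)$. By the Lipschitz continuity of $b$ and $\sigma$, we get
\[
E\big|b\big(X_{\eta_\delta(s)}^\delta\big) - b\big(X_s^\delta\big)\big|^{2p} \le K^{2p} K_3 \delta^p
\qquad \text{and} \qquad
E\big|\sigma\big(X_{\eta_\delta(s)}^\delta\big) - \sigma\big(X_s^\delta\big)\big|^{2p} \le K^{2p} K_3 \delta^p.
\]
As
\[
X_t^\delta - X_t = \int_0^t \big(b\big(X_{\eta_\delta(s)}^\delta\big) - b(X_s)\big) ds + \int_0^t \big(\sigma\big(X_{\eta_\delta(s)}^\delta\big) - \sigma(X_s)\big) dB_s,
\]
by the Cauchy–Schwarz inequality and the Burkholder–Davis–Gundy inequality, we have
\begin{align*}
f_\delta(t) &\le 2^{2p-1} T^{2p-1} E\int_0^t \big|b\big(X_{\eta_\delta(s)}^\delta\big) - b(X_s)\big|^{2p} ds \\
&\quad + 2^{2p-1} \Big(\frac{2p}{2p - 1}\Big)^{2p} T^{p-1} \int_0^t E\big|\sigma\big(X_{\eta_\delta(s)}^\delta\big) - \sigma(X_s)\big|^{2p} ds \\
&\le K_4 \int_0^t \Big(K_3 \delta^p K^{2p} + K^{2p} E\big|X_s^\delta - X_s\big|^{2p}\Big) ds \\
&\le K_5 \delta^p + K_6 \int_0^t f_\delta(s)\,ds.
\end{align*}
By Gronwall's inequality, we get
\[
f_\delta(t) \le K_5 e^{K_6 T} \delta^p \to 0.
\]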
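The rate in Theorem 4.14 can be explored with a small Monte Carlo experiment. The sketch below rests on several assumptions that are not in the text: a scalar equation with the bounded Lipschitz coefficients $b = \sin$ and $\sigma = \cos$, the error measured only at the terminal time rather than as a supremum over $[0, T]$, and a fine-grid Euler path standing in for the exact solution $X$. All names are illustrative.

```python
import math
import random


def euler_path_end(x0, b, sigma, dW_fine, dt_fine, stride):
    """Euler scheme (4.11) with step stride*dt_fine, driven by the given fine increments."""
    x = x0
    for j in range(0, len(dW_fine), stride):
        dB = sum(dW_fine[j:j + stride])       # Brownian increment over one coarse step
        x = x + b(x) * stride * dt_fine + sigma(x) * dB
    return x


def mean_square_error(stride, n_paths=100, n_fine=1024, T=1.0, seed=1):
    """Monte Carlo estimate of E|X^delta_T - X_T|^2, with the finest grid as a proxy for X."""
    rng = random.Random(seed)
    dt = T / n_fine
    total = 0.0
    for _ in range(n_paths):
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_fine)]
        ref = euler_path_end(0.5, math.sin, math.cos, dW, dt, 1)
        approx = euler_path_end(0.5, math.sin, math.cos, dW, dt, stride)
        total += (approx - ref) ** 2
    return total / n_paths


# Step sizes delta = 1/4, 1/16, 1/64 on the same simulated Brownian paths.
errors = {stride: mean_square_error(stride) for stride in (256, 64, 16)}
```

With $p = 1$ the theorem predicts a mean-square error of order $\delta$, so the entries of `errors` should shrink roughly in proportion to the step size; the experiment is a plausibility check, not a proof of the rate.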
Now we consider the differentiability of $F(t, x, B)$ with respect to $x$. For convenience, we rewrite equations (4.1) and (4.11) as
\[
X_t = x + \int_0^t b(X_s)\,ds + \sum_{i=1}^d \int_0^t \sigma_i(X_s)\,dB_s^i,
\]
and
\[
X_t^\delta = x + \int_0^t b\big(X_{\eta_\delta(s)}^\delta\big) ds + \sum_{i=1}^d \int_0^t \sigma_i\big(X_{\eta_\delta(s)}^\delta\big) dB_s^i,
\]
where $\sigma_i$ is the $i$th column of the matrix $\sigma$.
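Suppose that $b$ and $\sigma$ are continuously differentiable in $x$. For $0 < t < \delta$, we have
\[
X_t^\delta = x + b(x)t + \sum_{i=1}^d \sigma_i(x) B_t^i,
\]
which is differentiable in $x$. Let $Y_t^\delta = \nabla^* X_t^\delta$. Then,
\[
Y_t^\delta = I + \nabla^* b(x)\,t + \sum_{i=1}^d \nabla^* \sigma_i(x) B_t^i.
\]
For $\delta \le t < 2\delta$, we have
\[
X_t^\delta = X_\delta^\delta + b\big(X_\delta^\delta\big)(t - \delta) + \sum_{i=1}^d \sigma_i\big(X_\delta^\delta\big)\big(B_t^i - B_\delta^i\big).
\]
Then, $Y_t^\delta$ is an $\mathbb{R}^{d \times d}$-valued process satisfying
\[
Y_t^\delta = Y_\delta^\delta + \nabla^* b\big(X_\delta^\delta\big) Y_\delta^\delta (t - \delta) + \sum_{i=1}^d \nabla^* \sigma_i\big(X_\delta^\delta\big) Y_\delta^\delta \big(B_t^i - B_\delta^i\big).
\]
Using induction, we can prove that $X_t^\delta$ is differentiable in $x$ and its gradient $Y_t^\delta$ is the unique solution to the following SDE on $\mathbb{R}^{d \times d}$:
\[
Y_t^\delta = I + \int_0^t \nabla^* b\big(X_{\eta_\delta(s)}^\delta\big) Y_{\eta_\delta(s)}^\delta\,ds + \sum_{i=1}^d \int_0^t \nabla^* \sigma_i\big(X_{\eta_\delta(s)}^\delta\big) Y_{\eta_\delta(s)}^\delta\,dB_s^i. \tag{4.12}
\]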
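The recursion behind (4.12) is easy to test in dimension one, where $Y^\delta$ should agree with a finite-difference derivative of $X^\delta$ in the initial value along a fixed Brownian path. The following is a minimal sketch, assuming the scalar coefficients $b = \sin$ and $\sigma = \cos$ (so $b' = \cos$ and $\sigma' = -\sin$); these choices and all names are illustrative and not taken from the text.

```python
import math
import random


def euler_with_jacobian(x0, dB, dt):
    """Scalar Euler scheme for dX = sin(X) dt + cos(X) dB, together with Y = dX/dx as in (4.12)."""
    x, y = x0, 1.0
    for inc in dB:
        # Update Y first: it uses the derivatives of the coefficients at the current X.
        y = y + math.cos(x) * y * dt - math.sin(x) * y * inc
        x = x + math.sin(x) * dt + math.cos(x) * inc
    return x, y


rng = random.Random(2)
n_steps, T = 100, 1.0
dt = T / n_steps
dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

x_end, y_end = euler_with_jacobian(0.3, dB, dt)

# Central finite difference in the initial value, along the same Brownian path.
h = 1e-5
x_plus, _ = euler_with_jacobian(0.3 + h, dB, dt)
x_minus, _ = euler_with_jacobian(0.3 - h, dB, dt)
finite_difference = (x_plus - x_minus) / (2 * h)
```

Since the Euler map is a smooth function of the initial value for a fixed path, `y_end` and `finite_difference` should agree up to the finite-difference error.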
To study the limit of $Y^\delta$ as $\delta \to 0$, we need the following discrete-time version of Gronwall's inequality.

Lemma 4.15 (Discrete-time Gronwall inequality) Let $\{a_n : n \in \mathbb{Z}_+\}$ be a non-negative sequence. Suppose that there are two constants $K_1$ and $K_2$ satisfying
\[
a_{n+1} \le K_1 + K_2 \sum_{j=0}^n a_j, \qquad \forall n \ge 0.
\]
Then, for $n \ge 1$,
\[
a_n \le K_1 + (K_1 + a_0 K_2)(1 + K_2)^{n-1}. \tag{4.13}
\]
Proof Let
\[
S_n = \sum_{j=0}^n a_j.
\]
Then, $S_{n+1} \le K_1 + (1 + K_2) S_n$. Using induction, we have
\begin{align*}
S_n &\le K_1 \big(1 + (1 + K_2) + \cdots + (1 + K_2)^{n-1}\big) + (1 + K_2)^n a_0 \\
&= \frac{K_1}{K_2}\big((1 + K_2)^n - 1\big) + (1 + K_2)^n a_0 \\
&\le \Big(a_0 + \frac{K_1}{K_2}\Big)(1 + K_2)^n.
\end{align*}
Inserting this bound for $S_{n-1}$ back into the assumed inequality $a_n \le K_1 + K_2 S_{n-1}$, we obtain (4.13) and finish the proof.
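A quick way to see that (4.13) has the right shape is to run the recursion in the extremal case where the hypothesis holds with equality. The sketch below does this; the constants and the function names are hypothetical choices made for the illustration.

```python
def extremal_sequence(a0, K1, K2, n_terms):
    """Sequence attaining equality: a_{n+1} = K1 + K2 * (a_0 + ... + a_n)."""
    seq = [a0]
    running_sum = a0
    for _ in range(n_terms - 1):
        nxt = K1 + K2 * running_sum
        seq.append(nxt)
        running_sum += nxt
    return seq


def gronwall_bound(a0, K1, K2, n):
    """Right-hand side of (4.13): K1 + (K1 + a0*K2) * (1 + K2)^(n-1)."""
    return K1 + (K1 + a0 * K2) * (1.0 + K2) ** (n - 1)


a0, K1, K2 = 1.0, 0.5, 0.1
seq = extremal_sequence(a0, K1, K2, 30)
```

In this extremal case one finds $a_n = (K_1 + a_0 K_2)(1 + K_2)^{n-1}$ for $n \ge 1$, so the bound (4.13) overshoots only by the additive constant $K_1$.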
As a corollary of the discrete-time Gronwall inequality, we have

Corollary 4.16 Suppose that $\nabla^* b$ and $\nabla^* \sigma_i$, $i = 1, 2, \ldots, d$, are bounded and Lipschitz continuous. Then, for any $p > \frac{1}{2}$, we have
\[
\sup_{t \le T} E|Y_t^\delta|^{2p} < \infty.
\]
Proof Applying the Cauchy–Schwarz and the Burkholder–Davis–Gundy inequalities to equation (4.12), we get
\[
E|Y_t^\delta|^{2p} \le K_1 + K_2 \int_0^t E\big|Y_{\eta_\delta(s)}^\delta\big|^{2p} ds. \tag{4.14}
\]
Let $t = (n + 1)\delta$. Then,
\[
a_{n+1} \equiv E\big|Y_{(n+1)\delta}^\delta\big|^{2p} \le K_1 + K_2 \delta \sum_{j=0}^n a_j.
\]
By Lemma 4.15, we have
\begin{align*}
a_n &\le K_1 + (K_1 + K_1 K_2 \delta)(1 + K_2 \delta)^{n-1} \\
&\le K_1 + (K_1 + K_1 K_2 \delta)(1 + K_2 \delta)^{\frac{T}{\delta} - 1} \\
&\le K_1 + K_1 e^{K_2 T}.
\end{align*}
Inserting this back into equation (4.14), we get the boundedness of $E|Y_t^\delta|^{2p}$.
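To characterize the limit of $Y^\delta$ as $\delta \to 0$, we consider the following linear SDE on $\mathbb{R}^{d \times d}$:
\[
Y_t = I + \int_0^t \nabla^* b(X_s) Y_s\,ds + \sum_{i=1}^d \int_0^t \nabla^* \sigma_i(X_s) Y_s\,dB_s^i. \tag{4.15}
\]
Theorem 4.17 Suppose that $\nabla^* b$ and $\nabla^* \sigma_i$, $i = 1, 2, \ldots, d$, are bounded and Lipschitz continuous. Then equation (4.15) has a unique strong solution $Y_t$ taking values in $\mathbb{R}^{d \times d}$, and, for any $p > \frac{1}{2}$, there exists a constant $K$ such that
\[
E \sup_{0 \le t \le T} |Y_t^\delta - Y_t|^{2p} \le K \delta^p. \tag{4.16}
\]
Proof The existence and uniqueness of $Y_t$ follow from the same arguments as in Section 4.2. Now we prove equation (4.16). Note that
\begin{align*}
Y_t^\delta - Y_t ={}& \int_0^t \Big[\big(\nabla^* b\big(X_{\eta_\delta(s)}^\delta\big) - \nabla^* b(X_s)\big) Y_{\eta_\delta(s)}^\delta + \nabla^* b(X_s)\big(Y_{\eta_\delta(s)}^\delta - Y_s\big)\Big] ds \\
&+ \sum_{i=1}^d \int_0^t \Big[\big(\nabla^* \sigma_i\big(X_{\eta_\delta(s)}^\delta\big) - \nabla^* \sigma_i(X_s)\big) Y_{\eta_\delta(s)}^\delta + \nabla^* \sigma_i(X_s)\big(Y_{\eta_\delta(s)}^\delta - Y_s\big)\Big] dB_s^i. \tag{4.17}
\end{align*}
By the Lipschitz continuity and the Cauchy–Schwarz inequality, we have
\[
E\Big|\big(\nabla^* b\big(X_{\eta_\delta(s)}^\delta\big) - \nabla^* b(X_s)\big) Y_{\eta_\delta(s)}^\delta\Big|^{2p}
\le K^{2p} \Big(E\big|X_{\eta_\delta(s)}^\delta - X_s\big|^{4p}\Big)^{1/2} \Big(E\big|Y_{\eta_\delta(s)}^\delta\big|^{4p}\Big)^{1/2}.
\]
By Corollary 4.16 and the triangle inequality, we can continue with
\[
E\Big|\big(\nabla^* b\big(X_{\eta_\delta(s)}^\delta\big) - \nabla^* b(X_s)\big) Y_{\eta_\delta(s)}^\delta\Big|^{2p}
\le K_1 \Big(\big(E\big|X_{\eta_\delta(s)}^\delta - X_s^\delta\big|^{4p}\big)^{1/4p} + \big(E\big|X_s^\delta - X_s\big|^{4p}\big)^{1/4p}\Big)^{2p}
\le K_2 \delta^p,
\]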
where the last step follows from Theorem 4.14. Similarly, we can get
\[
E\Big|\nabla^* b(X_s)\big(Y_{\eta_\delta(s)}^\delta - Y_s\big)\Big|^{2p} \le K_3 \delta^p,
\]
\[
E\Big|\big(\nabla^* \sigma_i\big(X_{\eta_\delta(s)}^\delta\big) - \nabla^* \sigma_i(X_s)\big) Y_{\eta_\delta(s)}^\delta\Big|^{2p} \le K_4 \delta^p,
\]
and
\[
E\Big|\nabla^* \sigma_i(X_s)\big(Y_{\eta_\delta(s)}^\delta - Y_s\big)\Big|^{2p} \le K_4 \delta^p.
\]
Applying the Cauchy–Schwarz and the Burkholder–Davis–Gundy inequalities to equation (4.17), we obtain equation (4.16).

Finally, we consider the invertibility of the matrix $Y_t$. To define a process and to prove that it is the inverse of $Y_t$, it is more convenient to use the Stratonovich form of the SDE. Recall that
\[
X_t^i = X_0^i + \int_0^t b^i(X_s)\,ds + \sum_{j=1}^d \int_0^t \sigma_{ij}(X_s)\,dB_s^j.
\]
Applying Itô's formula, we have
\[
d\sigma_{ij}(X_t) = L\sigma_{ij}(X_t)\,dt + \sum_{k=1}^d \partial_k \sigma_{ij}(X_t) \sum_{\ell=1}^d \sigma_{k\ell}(X_t)\,dB_t^\ell.
\]
Thus,
\[
d\big\langle \sigma_{ij}(X), B^j \big\rangle_t = \sum_{k=1}^d \sigma_{kj}(X_t) \partial_k \sigma_{ij}(X_t)\,dt.
\]
Using equation (3.26), the basic process $X_t$ is governed by the following Stratonovich form:
\[
X_t = X_0 + \int_0^t \bar{b}(X_s)\,ds + \sum_{i=1}^d \int_0^t \sigma_i(X_s) \circ dB_s^i,
\]
where
\[
\bar{b}(x) = b(x) - \frac{1}{2} \sum_{i,j=1}^d \sigma_{ij}(x) \partial_i \sigma_j(x).
\]
Then, the Jacobian matrix $Y_t$ for the mapping $x \mapsto F(t, x, B)$ satisfies
\[
Y_t = I + \int_0^t \nabla^* \bar{b}(X_s) Y_s\,ds + \sum_{i=1}^d \int_0^t \nabla^* \sigma_i(X_s) Y_s \circ dB_s^i. \tag{4.18}
\]
Theorem 4.18 The process $\{F(t, x, B)\}$ is a stochastic flow and, for almost all $B$, the mapping $x \mapsto F(t, x, B)$ is differentiable, and the Jacobian matrix $Y_t$ is invertible.

Proof Let $Z_t$ be an $\mathbb{R}^{d \times d}$-valued process governed by the following SDE:
\[
Z_t = I - \int_0^t Z_s \nabla^* \bar{b}(X_s)\,ds - \sum_{i=1}^d \int_0^t Z_s \nabla^* \sigma_i(X_s) \circ dB_s^i. \tag{4.19}
\]
The existence of a solution follows from the same arguments as in Theorem 4.8. Applying Itô's formula to equations (4.18) and (4.19), we have
\[
d(Z_t Y_t) = Z_t \circ dY_t + dZ_t \circ Y_t = 0.
\]
Thus, $Z_t Y_t = I$ a.s.
4.5
Markov property
In this section, we consider the Markov property for the solution of the SDE (4.1).

Definition 4.19 A stochastic process $(X_t)$ is said to be a Markov process if, for all $t \ge s \ge 0$ and for any bounded measurable function $f$, we have
\[
E\big(f(X_t)\,|\,\mathcal{F}_s\big) = E\big(f(X_t)\,|\,X_s\big), \qquad \text{a.s.}
\]
We will need the following lemma for the proof of the Markov property.

Lemma 4.20 Given a probability space $(\Omega, \mathcal{G}, P)$ and measurable spaces $(E, \mathcal{E})$ and $(F, \mathcal{F})$, suppose that $\mathcal{A} \subset \mathcal{G}$ is a $\sigma$-field, and $X : \Omega \to E$ and $Y : \Omega \to F$ are random variables such that $X$ is $\mathcal{A}$-measurable and $Y$ is independent of $\mathcal{A}$. For a bounded real-valued measurable function $\Phi$ on $(E \times F, \mathcal{E} \times \mathcal{F})$, we define $\phi(x) = E\Phi(x, Y)$. Then $\phi$ is measurable on $(E, \mathcal{E})$ and
\[
E\big(\Phi(X, Y)\,|\,\mathcal{A}\big) = \phi(X), \qquad \text{a.s.}
\]
Proof Write $P_Y$ for the law of $Y$. Then,
\[
\phi(x) = \int_F \Phi(x, y)\,P_Y(dy).
\]
The measurability of $\phi$ follows from Fubini's theorem. Let $Z$ be a bounded $\mathcal{A}$-measurable random variable. Denote the law of $(X, Z)$ by $P_{X,Z}$. Then
\begin{align*}
E\big(Z\,E(\Phi(X, Y)\,|\,\mathcal{A})\big) &= E\big(Z\,\Phi(X, Y)\big) \\
&= \int_{E \times \mathbb{R}} \int_F \Phi(x, y)\,z\,P_{X,Z}(d(x, z))\,P_Y(dy) \\
&= \int_{E \times \mathbb{R}} \Big(\int_F \Phi(x, y)\,P_Y(dy)\Big) z\,P_{X,Z}(d(x, z)) \\
&= \int_{E \times \mathbb{R}} \phi(x)\,z\,P_{X,Z}(d(x, z)) \\
&= E(\phi(X) Z).
\end{align*}
This implies the desired identity.
Now we are ready to prove the Markov property for the solution to the SDE (4.1).

Theorem 4.21 The solution $(X_t)$ to the SDE (4.1) is a Markov process.

Proof Let $g$ be a bounded real-valued function on $\mathbb{R}^d$. We define a function $\phi$ on $\mathbb{R}_+ \times \mathbb{R}^d$ by $\phi(t, x) = E g(F(t, x, B))$, where $F$ is the stochastic flow given in the last section. Now we prove that, for any $t \ge s \ge 0$,
\[
E\big(g(X_t)\,|\,\mathcal{F}_s\big) = \phi(t - s, X_s). \tag{4.20}
\]
The Markov property of $(X_t)$ follows easily from equation (4.20). Let $Z$ be an $\mathcal{F}_s$-measurable bounded random variable. It follows from Lemma 4.20 that
\[
E\big(g(F(t - s, Z, \theta_s B))\,|\,\mathcal{F}_s\big) = \phi(t - s, Z), \qquad \text{a.s.}
\]
Taking $Z = X_s = F(s, X_0, B)$, we see that
\[
\phi(t - s, X_s) = E\big(g(F(t - s, F(s, X_0, B), \theta_s B))\,|\,\mathcal{F}_s\big) = E\big(g(F(t, X_0, B))\,|\,\mathcal{F}_s\big) = E\big(g(X_t)\,|\,\mathcal{F}_s\big),
\]
where the second equality follows from the flow property of $(X_t)$.
5
Filtering model and Kallianpur–Striebel formula
In this chapter, we introduce the basic setup of the filtering models to be studied in this book. Then we demonstrate that the optimal filter is given by the conditional distribution of the signal. Bayes' formula in the filtering setup is called the Kallianpur–Striebel formula; it is the key to the development of non-linear filtering theory. We will derive this formula in Section 5.2. We establish the filtering equations in Section 5.3. Finally, in Section 5.4, we give a particle-system representation of the optimal filter that will be useful in understanding numerical schemes for the optimal filter.
5.1
The filtering model
As we mentioned in the introductory chapter, the filtering problem involves two processes: the signal process, which is what we want to estimate, and the observation process, which provides the information we can use. For the observation process, we assume that it is governed by the following stochastic differential equation:
\[
Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{5.1}
\]
where $W_t$ is an $m$-dimensional Brownian motion. For the signal process $X_t$, we assume that $X_t$ is an $\mathbb{R}^d$-valued process governed by the following stochastic differential equation:
\[
dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \sigma(X_t)\,dB_t, \tag{5.2}
\]
where $B$ is a $d$-dimensional Brownian motion independent of $W$, and $b : \mathbb{R}^d \to \mathbb{R}^d$, $c : \mathbb{R}^d \to \mathbb{R}^{d \times m}$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ are continuous mappings. We make the following assumption (BC) throughout this book. Note that the boundedness condition (5.3) is not necessary; we make this assumption for simplicity of presentation.

Assumption (BC): The mappings $b$, $c$, $\sigma$, $h$ are bounded and Lipschitz continuous.

We shall denote the bound and the Lipschitz constant in Assumption (BC) by $K$. Namely, for any $x, y \in \mathbb{R}^d$, we assume that
\[
\max\{|b(x)|, |c(x)|, |\sigma(x)|, |h(x)|\} \le K, \tag{5.3}
\]
and
\[
\max\{|b(x) - b(y)|, |c(x) - c(y)|, |\sigma(x) - \sigma(y)|\} \le K|x - y|,
\]
where, for $x = (x_1, x_2, \ldots, x_d)^* \in \mathbb{R}^d$, the Euclidean norm is defined by
\[
|x| = \Big(\sum_{i=1}^d x_i^2\Big)^{1/2},
\]
and, for a matrix $c \in \mathbb{R}^{d \times m}$,
\[
|c| = \Big(\sum_{i=1}^d \sum_{j=1}^m c_{ij}^2\Big)^{1/2}.
\]
Let
\[
\mathcal{G}_t = \sigma(Y_s : 0 \le s \le t)
\]
be the $\sigma$-field generated by the observation up to time $t$; $\mathcal{G}_t$ will be the information available to us at time $t$. The aim of filtering theory is to estimate the signal $X_t$ based on the information $\mathcal{G}_t$.
5.2
The optimal filter
In this section, we proceed to estimate the signal $X_t$ based on the available information $\mathcal{G}_t$. The next lemma concerns the estimation of the exact value of a random variable $\xi$ based on the information represented by a $\sigma$-field $\mathcal{G}$. It says that the conditional expectation $E(\xi|\mathcal{G})$ is the one that has the minimum square error among all $\mathcal{G}$-measurable random variables. We denote the collection of all $\mathcal{G}$-measurable square-integrable random variables by $L^2(\Omega, \mathcal{G}, P)$.

Lemma 5.1 Let $\xi$ be a square-integrable random variable in the probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$. Then
\[
E\big((\xi - E(\xi|\mathcal{G}))^2\big) = \min\big\{E\big((\xi - \eta)^2\big) : \eta \in L^2(\Omega, \mathcal{G}, P)\big\}.
\]
Proof Let $\eta \in L^2(\Omega, \mathcal{G}, P)$. Then
\begin{align*}
E\big((\xi - \eta)^2\big) - E\big((\xi - E(\xi|\mathcal{G}))^2\big) &= E\big((E(\xi|\mathcal{G}) - \eta)(2\xi - \eta - E(\xi|\mathcal{G}))\big) \\
&= E\big(E\{(E(\xi|\mathcal{G}) - \eta)(2\xi - \eta - E(\xi|\mathcal{G}))\,|\,\mathcal{G}\}\big).
\end{align*}
As $E(\xi|\mathcal{G}) - \eta$ is $\mathcal{G}$-measurable, we can continue the calculation above with
\begin{align*}
E\big((\xi - \eta)^2\big) - E\big((\xi - E(\xi|\mathcal{G}))^2\big) &= E\big((E(\xi|\mathcal{G}) - \eta)\,E\{(2\xi - \eta - E(\xi|\mathcal{G}))\,|\,\mathcal{G}\}\big) \\
&= E\big((E(\xi|\mathcal{G}) - \eta)^2\big) \ge 0.
\end{align*}
This finishes the proof of the lemma.
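On a finite sample space the lemma can be verified by direct computation, since a $\sigma$-field is generated by a partition and the conditional expectation is the block-wise weighted average. The sketch below is such a check; the probabilities, the values of $\xi$, the partition and the function names are all hypothetical.

```python
def conditional_expectation(xi, prob, partition):
    """E(xi | G) on a finite sample space, where G is generated by the blocks of `partition`."""
    cond = [0.0] * len(xi)
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(xi[w] * prob[w] for w in block) / mass
        for w in block:
            cond[w] = avg                     # constant on each block, hence G-measurable
    return cond


def mean_square_error(xi, eta, prob):
    """E((xi - eta)^2) under the probability vector `prob`."""
    return sum(p * (x - e) ** 2 for x, e, p in zip(xi, eta, prob))


prob = [0.1, 0.2, 0.3, 0.4]
xi = [1.0, 3.0, -2.0, 5.0]
partition = [[0, 1], [2, 3]]                  # G cannot distinguish 0 from 1, or 2 from 3

best = conditional_expectation(xi, prob, partition)
best_error = mean_square_error(xi, best, prob)
```

For any other $\mathcal{G}$-measurable `eta` (constant on each block), `mean_square_error(xi, eta, prob) - best_error` equals `mean_square_error(best, eta, prob)`, which is the identity obtained at the end of the proof.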
Most of the time, we are interested in some quantities that are functions of the signal instead of the signal itself. Therefore, we want to find a systematic way to estimate $f(X_t)$ for $f$ in a rich enough family of test functions. Although $E(X_t|\mathcal{G}_t)$ is the best estimate for $X_t$, $f(E(X_t|\mathcal{G}_t))$ is not the best estimate for $f(X_t)$ under the least square error criterion if $f$ is not a linear function. Instead, applying the above lemma with $\xi = f(X_t)$ and $\mathcal{G} = \mathcal{G}_t$, we see that $E(f(X_t)|\mathcal{G}_t)$ is the best estimate of $f(X_t)$.

Let $\pi_t(\cdot) \equiv P(X_t \in \cdot\,|\,\mathcal{G}_t)$ be the regular conditional probability distribution of $X_t$ given $\mathcal{G}_t$; i.e. $\pi_t$ is a map from $\mathcal{B}(\mathbb{R}^d) \times \Omega$ to $[0, 1]$ such that
i) For any $\omega \in \Omega$, $\pi_t(\cdot, \omega)$ is a probability measure on $\mathbb{R}^d$.
ii) For any $A \in \mathcal{B}(\mathbb{R}^d)$, $\pi_t(A, \cdot)$ is a $\mathcal{G}_t$-measurable random variable.
iii) For any $A \in \mathcal{B}(\mathbb{R}^d)$, we have
\[
\pi_t(A, \omega) = P(X_t \in A\,|\,\mathcal{G}_t)(\omega), \qquad \text{a.s. } \omega. \tag{5.4}
\]
Now we prove that the conditional expectation is given by the integral of $f$ with respect to the regular conditional probability distribution $\pi_t$. Recall that $C_b(\mathbb{R}^d)$ is the set of bounded continuous functions on $\mathbb{R}^d$. Throughout this book, we will use $\langle \nu, f \rangle$ to denote the integral of a function $f$ with respect to a measure $\nu$.

Lemma 5.2 For any $f \in C_b(\mathbb{R}^d)$ and $t \ge 0$, we have
\[
E(f(X_t)\,|\,\mathcal{G}_t) = \langle \pi_t, f \rangle, \qquad \text{a.s.} \tag{5.5}
\]
Proof By equation (5.4), we see that equation (5.5) holds for $f(x) = 1_A(x)$. It follows from the linearity of the conditional expectation that equation (5.5) holds for simple functions. For $f \ge 0$, we can take an increasing sequence of simple functions converging pointwise to $f$.
Then equation (5.5) follows from the monotone convergence theorem. Finally, the case for general $f$ follows from the linearity and $f = f^+ - f^-$.

Let $\mathcal{P}(\mathbb{R}^d)$ be the collection of all Borel probability measures on $\mathbb{R}^d$. Then, $\pi_t$ is a $\mathcal{P}(\mathbb{R}^d)$-valued stochastic process. Based on the observation above, we call $\pi_t$ the optimal filter of $X_t$.

To calculate $\langle \pi_t, f \rangle = E(f(X_t)\,|\,\mathcal{G}_t)$ effectively, we need to make a transformation of probabilities so that $Y_t$ becomes a Brownian motion under the new probability measure (see equation (5.1) for the definition of $Y_t$). In fact, we can achieve this by a change of probability measures based on Girsanov's theorem. Since $h$ is bounded,
\[
M_t^{-1} \equiv \exp\Big(-\int_0^t \langle h(X_s), dW_s \rangle - \frac{1}{2} \int_0^t |h(X_s)|^2 ds\Big)
\]
is a martingale. Let $\hat{P}$ be the measure on $\Omega$ that is absolutely continuous with respect to $P$ and whose Radon–Nikodym derivative on $(\Omega, \mathcal{F}_t)$ is
\[
\frac{d\hat{P}}{dP}\Big|_{\mathcal{F}_t} = M_t^{-1},
\]
where $\mathcal{F}_t = \mathcal{F}_t^{W,B}$ is the $\sigma$-field generated by $\{W_s, B_s : s \le t\}$, which we shall use throughout this and the next section. Then $\hat{P}$ is a probability measure. By Corollary 3.24, $Y_t$ is a $\hat{P}$-Brownian motion independent of $B$.

The next theorem is Bayes' formula in the filtering setup. It gives a formula for the calculation of the conditional expectation under the new probability measure $\hat{P}$. It will play a very important role in the filtering theory introduced in this book.

Theorem 5.3 (Kallianpur–Striebel formula) The optimal filter $\pi_t$ can be represented as
\[
\langle \pi_t, f \rangle = \frac{\langle V_t, f \rangle}{\langle V_t, 1 \rangle}, \qquad \forall f \in C_b(\mathbb{R}^d), \tag{5.6}
\]
where
\[
\langle V_t, f \rangle = \hat{E}\big(M_t f(X_t)\,|\,\mathcal{G}_t\big),
\]
and $\hat{E}$ refers to the expectation with respect to the measure $\hat{P}$.

Proof Note that
\[
\frac{dP}{d\hat{P}}\Big|_{\mathcal{F}_t} = M_t. \tag{5.7}
\]
Replacing $P$ and $Q$ in Theorem 3.22 by $\hat{P}|_{\mathcal{F}_t}$ and $P|_{\mathcal{F}_t}$, respectively, we have
\[
\langle \pi_t, f \rangle = \frac{\hat{E}\big(M_t f(X_t)\,|\,\mathcal{G}_t\big)}{\hat{E}\big(M_t\,|\,\mathcal{G}_t\big)} = \frac{\langle V_t, f \rangle}{\langle V_t, 1 \rangle}.
\]
Let $\mathcal{M}_F(\mathbb{R}^d)$ be the collection of all finite Borel measures on $\mathbb{R}^d$. The $\mathcal{M}_F(\mathbb{R}^d)$-valued process $V_t$ is called the unnormalized filter.
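A toy version of the formula can be checked by hand. The sketch below assumes a simplified setting that is not the model of the text: the signal is a random variable frozen in time with finitely many values, the observation is scalar, and only the single value $Y_\delta = h(X)\delta + W_\delta$ is observed. In that case the weight is $M_\delta = \exp(h(X) Y_\delta - \frac{1}{2} h(X)^2 \delta)$, and normalizing the weighted prior should reproduce the ordinary Bayes posterior. All names and numbers are hypothetical.

```python
import math


def unnormalized_filter(prior, h_values, y, delta):
    """Weights pi_0(x) * M(x), with M(x) = exp(h(x) y - h(x)^2 delta / 2), for a frozen signal."""
    return [p * math.exp(h * y - 0.5 * h * h * delta) for p, h in zip(prior, h_values)]


def normalize(weights):
    """Divide by the total mass, as in the ratio <V, f> / <V, 1>."""
    total = sum(weights)
    return [w / total for w in weights]


def bayes_posterior(prior, h_values, y, delta):
    """Direct Bayes rule: given X = x, the observation Y_delta is N(h(x) delta, delta)."""
    likelihood = [
        math.exp(-(y - h * delta) ** 2 / (2.0 * delta)) / math.sqrt(2.0 * math.pi * delta)
        for h in h_values
    ]
    return normalize([p * l for p, l in zip(prior, likelihood)])


prior = [0.2, 0.5, 0.3]          # hypothetical prior on three signal states
h_values = [-1.0, 0.0, 2.0]      # hypothetical values h(x) at those states
y, delta = 0.7, 0.5              # one observed value Y_delta and the time horizon

ks_filter = normalize(unnormalized_filter(prior, h_values, y, delta))
posterior = bayes_posterior(prior, h_values, y, delta)
```

The two results agree because the Gaussian likelihood differs from the weight $M_\delta$ only by a factor that does not depend on the signal state and therefore cancels in the ratio.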
5.3
Filtering equation
In this section, we first derive a stochastic differential equation for the unnormalized filter $V_t$. Then we establish the filtering equation for the optimal filter $\pi_t$ by making use of the Kallianpur–Striebel formula.

Recall that $Y_t$ is a $\hat{P}$-Brownian motion independent of $B$. As $dW_t = dY_t - h(X_t)\,dt$, it follows from equation (5.2) that the signal $X_t$ is governed by the following stochastic differential equation:
\[
dX_t = (b - ch)(X_t)\,dt + c(X_t)\,dY_t + \sigma(X_t)\,dB_t.
\]
Note that
\[
M_t = \exp\Big(\int_0^t h^*(X_s)\,dY_s - \frac{1}{2} \int_0^t |h(X_s)|^2 ds\Big), \tag{5.8}
\]
where $h^*$ denotes the transpose of $h \in \mathbb{R}^m$. By Itô's formula, it is easy to show that
\[
dM_t = M_t h^*(X_t)\,dY_t. \tag{5.9}
\]
We will need the following lemma. Recall that we take $\mathcal{F}_t = \mathcal{F}_t^{W,B}$ in this section.

Lemma 5.4 Suppose that $f$ and $g$ are predictable processes on the stochastic basis $(\Omega, \mathcal{F}, \hat{P}, \mathcal{F}_t)$ satisfying
\[
\hat{E}\int_0^T |f_s|\,ds + \hat{E}\int_0^T |g_s|^2 ds < \infty.
\]
Then,
\[
\hat{E}\Big(\int_0^t f_s\,ds\,\Big|\,\mathcal{G}_t\Big) = \int_0^t \hat{E}(f_s\,|\,\mathcal{G}_s)\,ds, \tag{5.10}
\]
\[
\hat{E}\Big(\int_0^t g_s\,dY_s\,\Big|\,\mathcal{G}_t\Big) = \int_0^t \hat{E}(g_s\,|\,\mathcal{G}_s)\,dY_s, \tag{5.11}
\]
and
\[
\hat{E}\Big(\int_0^t g_s\,dB_s\,\Big|\,\mathcal{G}_t\Big) = 0. \tag{5.12}
\]
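Proof Suppose $f$ is simple, i.e.
\[
f_s = \sum_{i=1}^k f_i 1_{(a_i, b_i]}(s),
\]
where $(a_i, b_i]$, $i = 1, 2, \ldots, k$, are disjoint subintervals of $[0, t]$, and $f_i$ is $\mathcal{F}_{a_i}$-measurable, $i = 1, 2, \ldots, k$. Let
\[
\mathcal{G}_{s,t} = \sigma(Y_u - Y_s : s \le u \le t).
\]
Note that $\mathcal{G}_t = \mathcal{G}_{a_i} \vee \mathcal{G}_{a_i,t} \equiv \sigma\big(\mathcal{G}_{a_i} \cup \mathcal{G}_{a_i,t}\big)$. Then,
\begin{align*}
\hat{E}\Big(\int_0^t f_s\,ds\,\Big|\,\mathcal{G}_t\Big) &= \hat{E}\Big(\sum_{i=1}^k f_i (b_i - a_i)\,\Big|\,\mathcal{G}_t\Big) \\
&= \sum_{i=1}^k \hat{E}\big(f_i\,|\,\mathcal{G}_{a_i} \vee \mathcal{G}_{a_i,t}\big)(b_i - a_i) \\
&= \sum_{i=1}^k \hat{E}\big(f_i\,|\,\mathcal{G}_{a_i}\big)(b_i - a_i) \\
&= \int_0^t \hat{E}(f_s\,|\,\mathcal{G}_s)\,ds,
\end{align*}
where the equality prior to the last one follows from the independence of increments of the Brownian motion $Y$. Thus, equation (5.10) holds for simple processes. If $f \ge 0$, we can take an increasing sequence of simple processes converging pointwise to $f$. Then equation (5.10) follows from the monotone convergence theorem. For general $f$, we can write $f = f^+ - f^-$, and then equation (5.10) follows from the linearity.

Now we proceed to prove the second equation. Suppose $g$ is simple, namely
\[
g_s = \sum_{i=1}^k g_i 1_{(a_i, b_i]}(s),
\]
where $g_i$ is $\mathcal{F}_{a_i}$-measurable, $i = 1, 2, \ldots, k$. Once again using the independence of increments for $Y$, we get
\begin{align*}
\hat{E}\Big(\int_0^t g_s\,dY_s\,\Big|\,\mathcal{G}_t\Big) &= \hat{E}\Big(\sum_{i=1}^k g_i (Y_{b_i} - Y_{a_i})\,\Big|\,\mathcal{G}_t\Big) \\
&= \sum_{i=1}^k \hat{E}(g_i\,|\,\mathcal{G}_t)(Y_{b_i} - Y_{a_i}) \\
&= \sum_{i=1}^k \hat{E}\big(g_i\,|\,\mathcal{G}_{a_i}\big)(Y_{b_i} - Y_{a_i}) \\
&= \int_0^t \hat{E}(g_s\,|\,\mathcal{G}_s)\,dY_s.
\end{align*}
For a general $g$, we can approximate it by a sequence of simple processes $g^n$ such that $|g_s^n| \le |g_s|$, a.s. for all $s \le t$. Then
\[
\hat{E}\Big(\int_0^t g_s^n\,dY_s\Big)^2 \le \hat{E}\int_0^t |g_s|^2 ds < \infty,
\]
and hence $\big\{\int_0^t g_s^n\,dY_s : n \ge 1\big\}$ is uniformly integrable. Thus
\begin{align*}
\hat{E}\Big(\int_0^t g_s\,dY_s\,\Big|\,\mathcal{G}_t\Big) &= \lim_{n \to \infty} \hat{E}\Big(\int_0^t g_s^n\,dY_s\,\Big|\,\mathcal{G}_t\Big) \\
&= \lim_{n \to \infty} \int_0^t \hat{E}\big(g_s^n\,|\,\mathcal{G}_s\big)\,dY_s \\
&= \int_0^t \hat{E}\big(g_s\,|\,\mathcal{G}_s\big)\,dY_s.
\end{align*}
Finally, we prove the last equation. Suppose first that $g$ is simple. As $B_{b_i} - B_{a_i}$ is independent of $\mathcal{G}_t \vee \sigma(g_i)$, we have
\begin{align*}
\hat{E}\Big(\int_0^t g_s\,dB_s\,\Big|\,\mathcal{G}_t\Big) &= \hat{E}\Big(\sum_{i=1}^k g_i (B_{b_i} - B_{a_i})\,\Big|\,\mathcal{G}_t\Big) \\
&= \sum_{i=1}^k \hat{E}\Big(\hat{E}\big(g_i (B_{b_i} - B_{a_i})\,|\,\mathcal{G}_t \vee \sigma(g_i)\big)\,\Big|\,\mathcal{G}_t\Big) \\
&= 0.
\end{align*}
For general $g$, equation (5.12) follows from the same approximation argument as that used to derive equation (5.11).

After all these preparations, we are now ready to derive the main equations of filtering theory. First, we consider the linear equation for the unnormalized filter $V_t$.

Theorem 5.5 (Zakai's equation) The unnormalized filter $V_t$ satisfies the following stochastic differential equation: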
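\[
\langle V_t, f \rangle = \langle V_0, f \rangle + \int_0^t \langle V_s, Lf \rangle\,ds + \int_0^t \langle V_s, \nabla^* f c + f h^* \rangle\,dY_s, \tag{5.13}
\]
where
\[
Lf = \frac{1}{2} \sum_{i,j=1}^d a_{ij} \partial_{ij}^2 f + \sum_{i=1}^d b_i \partial_i f
\]
is the generator of the signal process, and the $d \times d$ matrix $a = (a_{ij})$ is given by $a = cc^* + \sigma\sigma^*$.

Proof Applying Itô's formula to equation (5.2), we have
\[
df(X_t) = \tilde{L}f(X_t)\,dt + \nabla^* f c(X_t)\,dY_t + \nabla^* f \sigma(X_t)\,dB_t, \tag{5.14}
\]
where
\[
\tilde{L}f = Lf - \nabla^* f c h.
\]
Applying Itô's formula to equations (5.9) and (5.14), we obtain
\begin{align*}
d(M_t f(X_t)) ={}& M_t \big(\tilde{L}f(X_t)\,dt + \nabla^* f c(X_t)\,dY_t + \nabla^* f \sigma(X_t)\,dB_t\big) \\
&+ M_t f(X_t) h^*(X_t)\,dY_t + M_t \nabla^* f c h(X_t)\,dt. \tag{5.15}
\end{align*}
Namely,
\begin{align*}
M_t f(X_t) ={}& f(X_0) + \int_0^t M_s Lf(X_s)\,ds \\
&+ \int_0^t M_s \big(\nabla^* f c(X_s) + f h^*(X_s)\big)\,dY_s + \int_0^t M_s \nabla^* f \sigma(X_s)\,dB_s.
\end{align*}
Taking conditional expectations on both sides, we get
\begin{align*}
\langle V_t, f \rangle ={}& \langle V_0, f \rangle + \hat{E}\Big(\int_0^t M_s Lf(X_s)\,ds\,\Big|\,\mathcal{G}_t\Big) \\
&+ \hat{E}\Big(\int_0^t M_s \big(\nabla^* f c(X_s) + f h^*(X_s)\big)\,dY_s\,\Big|\,\mathcal{G}_t\Big) + \hat{E}\Big(\int_0^t M_s \nabla^* f \sigma(X_s)\,dB_s\,\Big|\,\mathcal{G}_t\Big) \\
={}& \langle V_0, f \rangle + \int_0^t \langle V_s, Lf \rangle\,ds + \int_0^t \langle V_s, \nabla^* f c + f h^* \rangle\,dY_s,
\end{align*}
where the last equality follows from Lemma 5.4.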
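To establish a stochastic differential equation for the optimal filter $\pi_t$, we first define the innovation process $\nu_t$ by
\[
d\nu_t = dY_t - \langle \pi_t, h \rangle\,dt. \tag{5.16}
\]
Lemma 5.6 The innovation process $\nu_t$ is a $\mathcal{G}_t$-Brownian motion under the original probability measure.

Proof It follows from equation (5.16) that, for $t > s$,
\begin{align*}
E(\nu_t\,|\,\mathcal{G}_s) &= E\Big(Y_t - \int_0^t \langle \pi_r, h \rangle\,dr\,\Big|\,\mathcal{G}_s\Big) \\
&= E\Big(Y_t - Y_s - \int_s^t \langle \pi_r, h \rangle\,dr\,\Big|\,\mathcal{G}_s\Big) + \nu_s.
\end{align*}
Now, recalling the observation equation (5.1), we obtain
\begin{align*}
E(\nu_t\,|\,\mathcal{G}_s) &= E\Big(W_t - W_s + \int_s^t \big(h(X_r) - \langle \pi_r, h \rangle\big)\,dr\,\Big|\,\mathcal{G}_s\Big) + \nu_s \\
&= \int_s^t E\big(h(X_r) - E(h(X_r)\,|\,\mathcal{G}_r)\,\big|\,\mathcal{G}_s\big)\,dr + \nu_s \\
&= \nu_s,
\end{align*}
where the second equality follows from the independent increments of the Brownian motion $W$, the linearity of the conditional expectation and the boundedness of $h$. Therefore, $\nu_t$ is a $\mathcal{G}_t$-martingale.

As $Y_t$ is a Brownian motion under $\hat{P}$, its Meyer process is
\[
\lim_{n \to \infty} \sum_{i=1}^n \big(Y_{it/n}^j - Y_{(i-1)t/n}^j\big)\big(Y_{it/n}^k - Y_{(i-1)t/n}^k\big) = \big\langle Y^j, Y^k \big\rangle_t = \delta_{jk}\,t.
\]
Since
\[
\nu_t = Y_t - \int_0^t \langle \pi_s, h \rangle\,ds,
\]
and the quadratic variation of the second term is 0, it follows that the Meyer process for $\nu$ is
\[
\big\langle \nu^j, \nu^k \big\rangle_t = \big\langle Y^j, Y^k \big\rangle_t = \delta_{jk}\,t.
\]
It now follows from Theorem 3.13 that $\nu_t$ is a $\mathcal{G}_t$-Brownian motion.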
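Finally, we are ready to derive the stochastic differential equation for the optimal filter.

Theorem 5.7 (Kushner–FKK equation) The optimal filter $\pi_t$ satisfies the following stochastic differential equation: for all $f \in C_b^2(\mathbb{R}^d)$,
\[
\langle \pi_t, f \rangle = \langle \pi_0, f \rangle + \int_0^t \langle \pi_s, Lf \rangle\,ds + \int_0^t \big(\langle \pi_s, \nabla^* f c + f h^* \rangle - \langle \pi_s, f \rangle \langle \pi_s, h^* \rangle\big)\,d\nu_s. \tag{5.17}
\]
Proof Applying Itô's formula to equation (5.6) and making use of Zakai's equation (5.13), we get
\begin{align*}
d\langle \pi_t, f \rangle ={}& \frac{1}{\langle V_t, 1 \rangle}\big(\langle V_t, Lf \rangle\,dt + \langle V_t, \nabla^* f c + f h^* \rangle\,dY_t\big) - \frac{\langle V_t, f \rangle}{\langle V_t, 1 \rangle^2} \langle V_t, h^* \rangle\,dY_t \\
&- \frac{1}{\langle V_t, 1 \rangle^2} \langle V_t, \nabla^* f c + f h^* \rangle \langle V_t, h \rangle\,dt + \frac{\langle V_t, f \rangle}{\langle V_t, 1 \rangle^3} \langle V_t, h^* \rangle \langle V_t, h \rangle\,dt \\
={}& \langle \pi_t, Lf \rangle\,dt + \langle \pi_t, \nabla^* f c + f h^* \rangle\,dY_t - \langle \pi_t, f \rangle \langle \pi_t, h^* \rangle\,dY_t \\
&- \langle \pi_t, \nabla^* f c + f h^* \rangle \langle \pi_t, h \rangle\,dt + \langle \pi_t, f \rangle \langle \pi_t, h^* \rangle \langle \pi_t, h \rangle\,dt.
\end{align*}
The equation (5.17) then follows if we replace $Y_t$ with $\nu_t + \int_0^t \langle \pi_s, h \rangle\,ds$ in the above equation.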
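As a consistency check (an illustration, not part of the original argument), one can take $f \equiv 1$ in Zakai's equation (5.13) and in the filtering equation (5.17). Since $L1 = 0$ and $\nabla^* 1 = 0$, the two equations reduce as follows:

```latex
% Sanity check with the constant test function f = 1.
\begin{align*}
d\langle V_t, 1 \rangle &= \langle V_t, h^* \rangle\,dY_t, \\
d\langle \pi_t, 1 \rangle &= \big(\langle \pi_t, h^* \rangle - \langle \pi_t, 1 \rangle \langle \pi_t, h^* \rangle\big)\,d\nu_t = 0 .
\end{align*}
% The first line is the equation for the total mass <V_t, 1> that enters the
% Ito computation for the ratio (5.6); the second line confirms that (5.17)
% preserves the total mass <pi_t, 1> = 1, as it must for a probability measure.
```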
As an application of the innovation process, we now consider the portfolio optimization problem introduced in Chapter 1.

Example 5.8 (Portfolio optimization) We demonstrate that the portfolio optimization problem can be solved in the filtering framework. Note that the signal process is the appreciation rate process $X_t = (X_t^1, \ldots, X_t^d)^*$, which can be modelled by a stochastic differential equation in $\mathbb{R}^d$ as follows:
\[
dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \sigma(X_t)\,dB_t, \tag{5.18}
\]
where $(W, B)$ is an $(m + d)$-dimensional Brownian motion, and the observation process $Y_t$ is the logarithm of the stock price process, which satisfies the following SDE:
\[
Y_t = \int_0^t h_s(X_s)\,ds + W_t, \tag{5.19}
\]
where $h_s(x) = \Sigma_s^{-1}\big(x - \frac{1}{2}\tilde{A}_s\big)$ is a function from $\mathbb{R}^d$ to itself. It is clear that equations (5.18) and (5.19) have a form similar to the filtering model we introduced in this chapter. Although the observation function $h$ here is time dependent, the filtering equation still holds with the obvious modification.

However, the key point in solving the portfolio optimization problem is to represent the wealth process in terms of processes that are $\mathcal{G}_t$-adapted. Note that by equations (1.4) and (1.6), the wealth process $\mathcal{W}_t$ satisfies the following SDE:
\[
d\mathcal{W}_t = \Big(X_t^0 \mathcal{W}_t + \sum_{i=1}^d (X_t^i - X_t^0) u_t^i\Big) dt + \sum_{i,j=1}^d \sigma_t^{ij} u_t^i\,dW_t^j, \tag{5.20}
\]
where $u_t^i$ is the dollar amount in the $i$th stock you decide to own at time $t$. Clearly, your decision has to be based on the available information, and hence the portfolio $u_t = (u_t^1, \ldots, u_t^d)^*$ is $\mathcal{G}_t$-adapted. As we know from Section 1.1.2, $X_t^0$ is $\mathcal{G}_t$-adapted. However, $(X_t^i, W_t^i)$, $i = 1, 2, \ldots, d$, are not $\mathcal{G}_t$-adapted. Note that the innovation process
\[
\nu_t = Y_t - \int_0^t \langle \pi_s, h_s \rangle\,ds
\]
is $\mathcal{G}_t$-adapted. By equation (5.19), we get
\[
W_t = Y_t - \int_0^t h_s(X_s)\,ds = \nu_t + \int_0^t \Sigma_s^{-1}\big(\bar{X}_s - X_s\big)\,ds,
\]
where $\bar{X}_s = E(X_s\,|\,\mathcal{G}_s)$. Inserting this back into equation (5.20), we get
\[
d\mathcal{W}_t = \Big(X_t^0 \mathcal{W}_t + \sum_{i=1}^d (\bar{X}_t^i - X_t^0) u_t^i\Big) dt + \sum_{i,j=1}^d \sigma_t^{ij} u_t^i\,d\nu_t^j. \tag{5.21}
\]
Note that all the processes in equation (5.21) are $\mathcal{G}_t$-adapted. The methods of stochastic control theory can be applied to obtain an optimal portfolio. This is a case where we can separate the filtering problem from the optimal control, namely, the model satisfies the separation principle. Since the theory of stochastic control is beyond the scope of this book, we refer the interested reader to the book of Yong and Zhou [153] for the general theory, and to the paper of Xiong and Zhou [150] for the application in obtaining an optimal portfolio (in some sense) for the model (with a slight generalization) introduced in this example.
5.4
Particle-system representation
In this section, we establish a particle-system representation for the unnormalized filter that will be useful in Chapter 8. We will represent Vt in terms of a conditionally independent system of particles.
93
94
5 : Filtering model and Kallianpur–Striebel formula
Recall that $Y_t$ is a $\hat{P}$-Brownian motion. Note that the signal $X_t$ is given by the following stochastic differential equation:
$$dX_t = (b - ch)(X_t)\,dt + c(X_t)\,dY_t + \sigma(X_t)\,dB_t.$$
Further,
$$dM_t = M_t h^*(X_t)\,dY_t.$$
On the probability space $(\Omega, \mathcal{F}, \hat{P})$, let $B^i$, $i = 1, 2, \ldots$, be independent copies of $B$, and let them be independent of $Y$. Now we consider an interacting particle system: for $i = 1, 2, \ldots$,
$$dX_t^i = (b - ch)(X_t^i)\,dt + c(X_t^i)\,dY_t + \sigma(X_t^i)\,dB_t^i, \tag{5.22}$$
and
$$dM_t^i = M_t^i h^*(X_t^i)\,dY_t, \qquad M_0^i = 1. \tag{5.23}$$
Theorem 5.9 Suppose that $\{X_0^i, i = 1, 2, \ldots\}$ are i.i.d. random vectors with common distribution $\pi_0$ on $\mathbb{R}^d$. Then
$$\langle V_t, f \rangle = \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^k M_t^i f(X_t^i), \tag{5.24}$$
where $\{(M^i, X^i) : i = 1, 2, \ldots\}$ is the unique strong solution to the particle system equations (5.22)–(5.23).
Proof Since the system of equations (5.22)–(5.23) has a unique solution, there is a functional $F_t$ such that $(X_t^i, M_t^i) = F_t(X_0^i, B^i, Y)$. By the independence of the $B^i$, we see that $(X_t^i, M_t^i)$, $i = 1, 2, \ldots$, are conditionally (given $\mathcal{G}_t$) independent with identical conditional distribution. Therefore, by a conditional version of the strong law of large numbers, we have
$$\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^k M_t^i f(X_t^i) = \hat{E}\big(M_t f(X_t) \mid \mathcal{G}_t\big) = \langle V_t, f \rangle, \qquad \text{a.s.} \tag{5.25}$$
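The finite-$k$ average on the right of (5.24) can be sketched numerically. The block below is only an illustration under stated assumptions: it takes $d = m = 1$, replaces (5.22)–(5.23) by an Euler scheme (a discretization that the text does not discuss), and the function name `particle_estimate` and its arguments are hypothetical.

```python
import math
import random


def particle_estimate(f, b, c, sigma, h, sample_x0, dY, dt, k, rng):
    """Euler sketch of (5.22)-(5.23) for d = m = 1 (an assumed discretization).

    Returns (1/k) * sum_i M_t^i f(X_t^i), the finite-k version of (5.24).
    dY holds the increments of one observation path over steps of length dt.
    """
    total = 0.0
    for _ in range(k):
        x = sample_x0(rng)  # X_0^i drawn from pi_0
        m = 1.0             # M_0^i = 1
        for dy in dY:
            db = rng.gauss(0.0, math.sqrt(dt))  # increment of the private noise B^i
            # Tuple assignment: both updates use the pre-step state (Ito convention).
            x, m = (
                x + (b(x) - c(x) * h(x)) * dt + c(x) * dy + sigma(x) * db,
                m + m * h(x) * dy,
            )
        total += m * f(x)
    return total / k
```

All particles share the same increments `dY` but use independent draws `db`, which mirrors the conditional independence given the observation used in the proof of Theorem 5.9.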
5.5
Notes
Since the early work of Stratonovich [142], [143] and Kushner [100], [101], non-linear filtering has been studied by many authors under various setups. Here we mention only a few: Grigelionis [72], Kailath [79], Kailath and Geesey [80], Frost and Kailath [66], Liptser [110], [111], [112], Liptser and Shiryaev [113], [114], [115], [116], Rozovskii [136], Shiryaev [138], [139], Striebel [144], Wentzell [147], Wonham [149] and Yershov [151], [152]. The celebrated paper of Fujisaki et al. [67] brings to a culmination the innovation approach to non-linear filtering of diffusion processes. The filtering equation is usually called the Kushner–Stratonovich equation or the Kushner–FKK equation. The upper and lower bounds for the error of the optimal filter were studied by Bobrovsky and Zakai [13], [14], and by Zakai and Ziv [155]. We omit this topic to limit the length of this book. The work of Kallianpur and Striebel [82], [83] establishes the representation of the optimal filter in terms of the unnormalized filter, which was studied in the pioneering doctoral dissertations of Duncan [61] and Mortensen [127] and the important paper of Zakai [154]. The linear SPDE (5.13) is called the Duncan–Mortensen–Zakai equation, or, simply, Zakai’s equation.
6
Uniqueness of the solution for Zakai’s equation
In this chapter, we prove the uniqueness of the solution to Zakai’s equation by transforming it to an SDE in a Hilbert space and by making use of estimates based on Hilbert-space techniques. Most of the material in Sections 6.2–6.4 is taken from Kurtz and Xiong [97], where a large class of linear stochastic partial differential equations is studied. Although the techniques we introduce here are for Zakai’s equation, they can be applied to other classes of linear SPDEs that include Zakai’s equation as a special example.
6.1
Hilbert space
In this section, we state some useful facts about Hilbert spaces that will be utilized in proving the uniqueness for the solution of Zakai’s equation.
Definition 6.1 A linear space $H$ is an inner product space if there is a symmetric bilinear form $\langle \cdot, \cdot \rangle$ on $H \times H$ such that $\langle x, x \rangle \ge 0$ for all $x \in H$ and $\langle x, x \rangle = 0$ if and only if $x = 0$.
In an inner product space, we define a norm
$$\|x\| = \langle x, x \rangle^{1/2}, \qquad \forall x \in H,$$
and a metric
$$d(x, y) = \|x - y\|, \qquad \forall x, y \in H.$$
The space $H$ is separable if it has a countable dense subset. A sequence $\{x_n : n \ge 1\}$ is a Cauchy sequence if for any $\epsilon > 0$, there exists $N > 0$ such that $\|x_n - x_m\| < \epsilon$ whenever $n, m \ge N$. The space $H$ is complete if every Cauchy sequence is convergent in $H$.
Definition 6.2 An inner product space $H$ is a Hilbert space if it is complete. (We shall assume throughout this chapter that it is separable.)
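The following inequality is the Cauchy–Schwarz inequality in Hilbert space.
Proposition 6.3 Let $H$ be a Hilbert space and $x, y \in H$. Then
$$|\langle x, y \rangle| \le \|x\| \, \|y\|.$$
Proof For any $t \in \mathbb{R}$, we have
$$0 \le \langle tx + y, tx + y \rangle = \|x\|^2 t^2 + 2 \langle x, y \rangle t + \|y\|^2.$$
Thus, the discriminant of this quadratic in $t$ is non-positive:
$$(2 \langle x, y \rangle)^2 - 4 \|x\|^2 \|y\|^2 \le 0.$$
The conclusion of the proposition then follows easily.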
Next we consider the basis of $H$.
Definition 6.4 A countable set $\{h_j : j = 1, 2, \ldots\}$ is a complete orthonormal system (CONS) of $H$ if
i) $\langle h_j, h_k \rangle = \delta_{jk}$, $\forall j, k = 1, 2, \ldots$.
ii) Every $h \in H$ can be represented as a linear combination of $\{h_j : j = 1, 2, \ldots\}$.
Proposition 6.5 If $\{h_j : j = 1, 2, \ldots\}$ is a CONS of $H$, then for any $h \in H$, we have
i)
$$h = \sum_{j=1}^\infty \langle h, h_j \rangle h_j;$$
ii)
$$\sum_{j=1}^\infty |\langle h, h_j \rangle|^2 = \|h\|^2.$$
Proof i) As $h$ can be represented as a linear combination of $\{h_j : j = 1, 2, \ldots\}$,
$$h = \sum_{j=1}^\infty a_j h_j.$$
Then,
$$\langle h, h_k \rangle = \sum_{j=1}^\infty a_j \langle h_j, h_k \rangle = a_k.$$
ii) By the continuity and the linearity of the inner product, we have
$$\|h\|^2 = \langle h, h \rangle = \Big\langle \sum_{j=1}^\infty \langle h, h_j \rangle h_j, \sum_{k=1}^\infty \langle h, h_k \rangle h_k \Big\rangle = \sum_{j,k=1}^\infty \langle h, h_j \rangle \langle h, h_k \rangle \langle h_j, h_k \rangle = \sum_{j=1}^\infty \langle h, h_j \rangle^2.$$
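Proposition 6.5 can be checked by hand in a finite-dimensional space. The sketch below uses $H = \mathbb{R}^2$ with a rotated orthonormal basis; the angle, the vector `h` and the helper names are arbitrary illustrative choices, not taken from the text.

```python
import math


def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def expand(h, basis):
    """Returns the coefficients <h, h_j> and the reconstruction sum_j <h, h_j> h_j."""
    coeffs = [inner(h, e) for e in basis]
    recon = [sum(cj * e[i] for cj, e in zip(coeffs, basis)) for i in range(len(h))]
    return coeffs, recon


theta = 0.7  # any rotation angle gives an orthonormal basis of R^2
basis = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
h = (3.0, -1.5)
coeffs, recon = expand(h, basis)
parseval_gap = abs(sum(cj * cj for cj in coeffs) - inner(h, h))
```

Here `recon` reproduces `h` (part i) and `parseval_gap` is zero up to rounding (part ii).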
Let $H_1$ and $H_2$ be two Hilbert spaces. We denote by $H_1 \otimes H_2$ the completion of the linear span of $\{(h_1, h_2) : h_i \in H_i,\ i = 1, 2\}$ with respect to the norm $\|\cdot\|_{H_1 \otimes H_2}$ defined by
$$\|(h_1, h_2)\|_{H_1 \otimes H_2}^2 = \|h_1\|_{H_1}^2 + \|h_2\|_{H_2}^2.$$
It is easy to show that $H_1 \otimes H_2$ is also a Hilbert space. $H_1 \otimes H_2$ is called the tensor product of the Hilbert spaces $H_1$ and $H_2$.
6.2
Transformation to a Hilbert space
In this section, we consider the uniqueness of the solution to the SPDE (5.13) for the unnormalized filter $V_t$ taking values in $\mathcal{M}_F(\mathbb{R}^d)$. Let $H_0 = L^2(\mathbb{R}^d)$ be the Hilbert space consisting of square-integrable functions on $\mathbb{R}^d$ with the usual $L^2$-norm and the inner product given by
$$\|\phi\|_0^2 = \int_{\mathbb{R}^d} |\phi(x)|^2 \, dx \qquad \text{and} \qquad \langle \phi, \psi \rangle_0 = \int_{\mathbb{R}^d} \phi(x) \psi(x) \, dx, \tag{6.1}$$
respectively. Let $\mathcal{M}_G(\mathbb{R}^d)$ be the space of finite signed measures on $\mathbb{R}^d$. To obtain good estimates and to derive uniqueness for the solution to equation (5.13), we transform an $\mathcal{M}_G(\mathbb{R}^d)$-valued process to an $H_0$-valued process. For any $\nu \in \mathcal{M}_G(\mathbb{R}^d)$ and $\delta > 0$, let
$$(T_\delta \nu)(x) = \int_{\mathbb{R}^d} G_\delta(x - y) \, \nu(dy), \tag{6.2}$$
where $G_\delta$ is the heat kernel given by
$$G_\delta(x) = (2 \pi \delta)^{-d/2} \exp\Big( -\frac{|x|^2}{2\delta} \Big).$$
For $t \ge 0$, we define operators $T_t : H_0 \to H_0$ by
$$T_t \phi(x) = \int_{\mathbb{R}^d} G_t(x - y) \phi(y) \, dy, \qquad \forall \phi \in H_0.$$
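Lemma 6.6 The family of operators $\{T_t : t \ge 0\}$ forms a contraction semigroup on $H_0$, i.e. $\forall\, t, s \ge 0$ and $\phi \in H_0$, we have
$$T_{t+s} = T_t T_s \qquad \text{and} \qquad \|T_t \phi\|_0 \le \|\phi\|_0.$$
Proof For any $\phi \in H_0^+$, by Fubini’s theorem, we have
$$\begin{aligned}
T_t (T_s \phi)(x) &= \int_{\mathbb{R}^d} G_t(x - y) \int_{\mathbb{R}^d} G_s(y - z) \phi(z) \, dz \, dy \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_t(x - y) G_s(y - z) \, dy \, \phi(z) \, dz \\
&= \int_{\mathbb{R}^d} G_{t+s}(x - z) \phi(z) \, dz \\
&= T_{t+s} \phi(x).
\end{aligned}$$
The case for general $\phi \in H_0$ follows from the linearity. By the Cauchy–Schwarz inequality, we have
$$\begin{aligned}
\|T_\delta \phi\|_0^2 &= \int_{\mathbb{R}^d} \Big( \int_{\mathbb{R}^d} G_\delta(x - y) \phi(y) \, dy \Big)^2 dx \\
&\le \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) \phi(y)^2 \, dy \int_{\mathbb{R}^d} G_\delta(x - z) \, dz \, dx \\
&= \|\phi\|_0^2.
\end{aligned}$$
This proves the second property.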
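Both properties in Lemma 6.6 can be observed numerically in one dimension. The following is a rough quadrature sketch, not part of the proof: the trapezoid rule, the truncation of the real line to $[-10, 10]$, the grid sizes and the tent-shaped test function are all assumptions made for illustration.

```python
import math


def heat_kernel(delta, x):
    """One-dimensional G_delta(x) = (2*pi*delta)^(-1/2) * exp(-x^2 / (2*delta))."""
    return math.exp(-x * x / (2.0 * delta)) / math.sqrt(2.0 * math.pi * delta)


def trapezoid(func, lo, hi, n):
    """Composite trapezoid rule with n subintervals."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step


def convolved_kernel(t, s, x, z):
    """Approximates int G_t(x - y) G_s(y - z) dy, to be compared with G_{t+s}(x - z)."""
    return trapezoid(lambda y: heat_kernel(t, x - y) * heat_kernel(s, y - z), -10.0, 10.0, 2000)


def smoothed(delta, phi):
    """Returns the function x -> (T_delta phi)(x), computed by quadrature."""
    return lambda x: trapezoid(lambda y: heat_kernel(delta, x - y) * phi(y), -10.0, 10.0, 400)


def l2_norm_sq(func):
    """Approximates the squared H_0-norm on the truncated domain."""
    return trapezoid(lambda x: func(x) ** 2, -10.0, 10.0, 400)
```

For instance, `convolved_kernel(0.3, 0.5, 0.4, -0.2)` should agree with `heat_kernel(0.8, 0.6)` to many digits, and `l2_norm_sq(smoothed(0.5, phi))` should not exceed `l2_norm_sq(phi)` for a test function such as `phi = lambda x: max(0.0, 1.0 - abs(x))`.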
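We shall need the following facts.
Lemma 6.7
i) If $\nu \in \mathcal{M}_G(\mathbb{R}^d)$ and $\delta > 0$, then $T_\delta \nu \in H_0$.
ii) If $\nu \in \mathcal{M}_G(\mathbb{R}^d)$ and $\delta > 0$, then
$$\|T_{2\delta} |\nu|\|_0 \le \|T_\delta |\nu|\|_0,$$
where $|\nu|$ is the total variation measure of $\nu$.
Proof i) By the Cauchy–Schwarz inequality, we get
$$\begin{aligned}
\int_{\mathbb{R}^d} |T_\delta \nu(x)|^2 \, dx &= \int_{\mathbb{R}^d} \Big| \int_{\mathbb{R}^d} G_\delta(x - y) \, \nu(dy) \Big|^2 dx \\
&\le \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y)^2 \, |\nu|(dy) \, |\nu|(\mathbb{R}^d) \, dx \\
&= G_{2\delta}(0) \, |\nu|(\mathbb{R}^d)^2 < \infty.
\end{aligned}$$
ii) It follows from the semigroup property of $T_t$ that $T_{2\delta} = T_\delta T_\delta$. Thus, by i) and the contraction property, we see that
$$\|T_{2\delta} |\nu|\|_0 = \|T_\delta (T_\delta |\nu|)\|_0 \le \|T_\delta |\nu|\|_0.$$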
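Let $Z_s^\delta = T_\delta V_s$, where $V_s$ is an $\mathcal{M}_G(\mathbb{R}^d)$-valued solution to equation (5.13). To obtain an estimate for the $H_0$-norm of the process $Z^\delta$, we need the following lemma. Recall that $\langle \nu, f \rangle$ represents the integral of the function $f$ with respect to the measure $\nu$.
Lemma 6.8 For any $\delta > 0$, $v \in \mathcal{M}_G(\mathbb{R}^d)$ and $f \in H_0$, we have
i)
$$\langle T_\delta v, f \rangle_0 = \langle v, T_\delta f \rangle. \tag{6.3}$$
ii) If, in addition, $\partial_i f \in H_0$, then
$$\partial_i T_\delta f = T_\delta \partial_i f, \tag{6.4}$$
where $\partial_i f = \frac{\partial f}{\partial x_i}$.
Proof i) First, we prove equation (6.3) for $f \ge 0$. By Fubini’s theorem, we can change the order of the integrals below:
$$\begin{aligned}
\langle T_\delta v, f \rangle_0 &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) \, v(dy) f(x) \, dx \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) f(x) \, dx \, v(dy) \\
&= \int_{\mathbb{R}^d} T_\delta f(y) \, v(dy) = \langle v, T_\delta f \rangle.
\end{aligned}$$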
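The case for general $f$ follows from the linearity.
ii) Recall that $C_0^1(\mathbb{R}^d)$ denotes the collection of functions with compact support and continuous derivatives of order 1. Taking a test function $\psi \in C_0^1(\mathbb{R}^d)$, we have
$$\begin{aligned}
\Big( \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) |f(y)| |\partial_i \psi(x)| \, dx \, dy \Big)^2
&\le \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y)^2 |\partial_i \psi(x)| \, dx \, dy \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(y)|^2 |\partial_i \psi(x)| \, dx \, dy \\
&= G_{2\delta}(0) \Big( \int_{\mathbb{R}^d} |\partial_i \psi(x)| \, dx \Big)^2 \int_{\mathbb{R}^d} |f(y)|^2 \, dy < \infty.
\end{aligned} \tag{6.5}$$
By Fubini’s theorem and the integration-by-parts formula, we have
$$\begin{aligned}
- \langle T_\delta f, \partial_i \psi \rangle_0 &= - \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) f(y) \partial_i \psi(x) \, dx \, dy \\
&= - \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) \partial_i \psi(x) \, dx \, f(y) \, dy \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \frac{\partial}{\partial x_i} G_\delta(x - y) \psi(x) \, dx \, f(y) \, dy.
\end{aligned} \tag{6.6}$$
Similar to equation (6.5), we can prove that
$$\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \Big| \frac{\partial}{\partial x_i} G_\delta(x - y) \psi(x) f(y) \Big| \, dx \, dy < \infty.$$
Thus, by Fubini’s theorem again, we can continue equation (6.6) and obtain
$$\begin{aligned}
- \langle T_\delta f, \partial_i \psi \rangle_0 &= - \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \frac{\partial}{\partial y_i} G_\delta(x - y) f(y) \, dy \, \psi(x) \, dx \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) \frac{\partial}{\partial y_i} f(y) \, dy \, \psi(x) \, dx \\
&= \langle T_\delta \partial_i f, \psi \rangle_0.
\end{aligned}$$
The proof of ii) now follows from duality.
Replacing $f$ by $T_\delta f$ in equation (5.13), we have
$$\langle Z_t^\delta, f \rangle_0 = \langle V_t, T_\delta f \rangle = \langle V_0, T_\delta f \rangle + \int_0^t \langle V_s, L T_\delta f \rangle \, ds + \int_0^t \big\langle V_s, \nabla^* (T_\delta f) c + (T_\delta f) h^* \big\rangle \, dY_s. \tag{6.7}$$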
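Note that for any $\nu \in \mathcal{M}_F(\mathbb{R}^d)$,
$$\begin{aligned}
\langle \nu, L T_\delta f \rangle &= \Big\langle \nu, \frac{1}{2} \sum_{i,j=1}^d a_{ij} \partial_{ij}^2 (T_\delta f) + \sum_{i=1}^d b_i \partial_i (T_\delta f) \Big\rangle \\
&= \frac{1}{2} \sum_{i,j=1}^d \big\langle a_{ij} \nu, T_\delta \partial_{ij}^2 f \big\rangle + \sum_{i=1}^d \langle b_i \nu, T_\delta \partial_i f \rangle,
\end{aligned}$$
where the last equality follows from equation (6.4). Hence, using equation (6.3) and then integrating by parts (twice in the second-order terms, once in the first-order terms, which produces the minus sign), we have
$$\begin{aligned}
\langle \nu, L T_\delta f \rangle &= \frac{1}{2} \sum_{i,j=1}^d \big\langle T_\delta(a_{ij} \nu), \partial_{ij}^2 f \big\rangle_0 + \sum_{i=1}^d \langle T_\delta(b_i \nu), \partial_i f \rangle_0 \\
&= \frac{1}{2} \sum_{i,j=1}^d \big\langle \partial_{ij}^2 T_\delta(a_{ij} \nu), f \big\rangle_0 - \sum_{i=1}^d \langle \partial_i T_\delta(b_i \nu), f \rangle_0.
\end{aligned} \tag{6.8}$$
Similarly, we can prove that
$$\big\langle \nu, \nabla^* (T_\delta f) c + (T_\delta f) h^* \big\rangle = \big\langle - \nabla^* T_\delta(c \nu) + T_\delta(h^* \nu), f \big\rangle_0. \tag{6.9}$$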
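Inserting equations (6.8) and (6.9) back into equation (6.7) we have
$$\begin{aligned}
\langle Z_t^\delta, f \rangle_0 = \langle Z_0^\delta, f \rangle_0 &+ \frac{1}{2} \int_0^t \sum_{i,j=1}^d \big\langle \partial_{ij}^2 T_\delta(a_{ij} V_s), f \big\rangle_0 \, ds - \sum_{i=1}^d \int_0^t \langle \partial_i T_\delta(b_i V_s), f \rangle_0 \, ds \\
&- \int_0^t \big\langle \nabla^* T_\delta(c V_s) - T_\delta(h^* V_s), f \big\rangle_0 \, dY_s.
\end{aligned}$$
By Itô’s formula, we have
$$\begin{aligned}
\langle Z_t^\delta, f \rangle_0^2 = \langle Z_0^\delta, f \rangle_0^2 &+ \sum_{i,j=1}^d \int_0^t \langle Z_s^\delta, f \rangle_0 \big\langle \partial_{ij}^2 T_\delta(a_{ij} V_s), f \big\rangle_0 \, ds \\
&- \sum_{i=1}^d \int_0^t 2 \langle Z_s^\delta, f \rangle_0 \langle \partial_i T_\delta(b_i V_s), f \rangle_0 \, ds \\
&- \int_0^t 2 \langle Z_s^\delta, f \rangle_0 \big\langle \nabla^* T_\delta(c V_s) - T_\delta(h^* V_s), f \big\rangle_0 \, dY_s \\
&+ \int_0^t \big| \big\langle \nabla^* T_\delta(c V_s) - T_\delta(h^* V_s), f \big\rangle_0 \big|^2 \, ds.
\end{aligned}$$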
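Summing over $f$ in a CONS of $H_0$, we get
$$\begin{aligned}
\|Z_t^\delta\|_0^2 = \|Z_0^\delta\|_0^2 &+ \sum_{i,j=1}^d \int_0^t \big\langle Z_s^\delta, \partial_{ij}^2 T_\delta(a_{ij} V_s) \big\rangle_0 \, ds - \sum_{i=1}^d \int_0^t 2 \langle Z_s^\delta, \partial_i T_\delta(b_i V_s) \rangle_0 \, ds \\
&- \int_0^t 2 \big\langle Z_s^\delta, \nabla^* T_\delta(c V_s) - T_\delta(h^* V_s) \big\rangle_0 \, dY_s \\
&+ \int_0^t \big\| \nabla^* T_\delta(c V_s) - T_\delta(h^* V_s) \big\|_{H_0 \otimes \mathbb{R}^m}^2 \, ds.
\end{aligned}$$
Taking expectations, we have
$$\begin{aligned}
\hat{E} \|Z_t^\delta\|_0^2 = \hat{E} \|Z_0^\delta\|_0^2 &+ \sum_{i,j=1}^d \int_0^t \hat{E} \big\langle Z_s^\delta, \partial_{ij}^2 T_\delta((\sigma \sigma^*)_{ij} V_s) \big\rangle_0 \, ds \\
&+ \sum_{i,j=1}^d \int_0^t \hat{E} \big\langle Z_s^\delta, \partial_{ij}^2 T_\delta((c c^*)_{ij} V_s) \big\rangle_0 \, ds \\
&- \sum_{i=1}^d \int_0^t 2 \hat{E} \langle Z_s^\delta, \partial_i T_\delta(b_i V_s) \rangle_0 \, ds \\
&+ \int_0^t \hat{E} \big\| \nabla^* T_\delta(c V_s) \big\|_{H_0 \otimes \mathbb{R}^m}^2 \, ds \\
&- 2 \int_0^t \hat{E} \big\langle \nabla^* T_\delta(c V_s), T_\delta(h^* V_s) \big\rangle_{H_0 \otimes \mathbb{R}^m} \, ds \\
&+ \int_0^t \hat{E} \big\| T_\delta(h^* V_s) \big\|_{H_0 \otimes \mathbb{R}^m}^2 \, ds.
\end{aligned} \tag{6.10}$$
We will show that the integral terms on the right of equation (6.10) are bounded by a constant times the integral of $\hat{E} \|T_\delta(|V_s|)\|_0^2$. To this end, we need some careful estimates that will be derived in the next section.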
6.3
Some useful inequalities
To continue with the estimation in the last section, we now derive some useful inequalities. Throughout this section, we assume that $f_i : \mathbb{R}^d \to \mathbb{R}^m$, $i = 1, 2$, are two bounded Lipschitz continuous functions, namely, there exists a constant $K$ such that
$$|f_i(x) - f_i(y)| \le K |x - y|, \qquad \forall x, y \in \mathbb{R}^d,$$
and
$$|f_i(x)| \le K, \qquad \forall x \in \mathbb{R}^d.$$
Note that $|\cdot|$ denotes the Euclidean norm. We will not indicate the dimension of the space when it is clear from the context. For example, $|\cdot|$ here denotes the norm in $\mathbb{R}^m$. However, it might also denote the norm in $\mathbb{R}^d$ or $\mathbb{R}$ below.
Lemma 6.9 Suppose that $g \in H_0$ is such that $\partial_i g \in H_0$, $i = 1, \ldots, d$. Then,
$$\big| \langle g, f_1 \partial_i g \rangle_0 \big| \le \frac{1}{2} K \|g\|_0^2. \tag{6.11}$$
Proof First, we assume that $f_1$ and $g$ are continuously differentiable and have compact supports. Then, integrating by parts we have
$$\langle g, f_1 \partial_i g \rangle_0 = \frac{1}{2} \int_{\mathbb{R}^d} f_1(x) \partial_i (g^2(x)) \, dx = - \frac{1}{2} \int_{\mathbb{R}^d} g^2(x) \partial_i f_1(x) \, dx,$$
and hence
$$\big| \langle g, f_1 \partial_i g \rangle_0 \big| \le \frac{1}{2} \int_{\mathbb{R}^d} g^2(x) |\partial_i f_1(x)| \, dx \le \frac{1}{2} K \|g\|_0^2.$$
The general result follows by approximation.
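The integration-by-parts identity and the bound (6.11) can be tested on a concrete one-dimensional example. The choices below are assumptions made purely for illustration: $f_1 = \tanh$ (bounded by $K = 1$ and Lipschitz with constant 1), $g(x) = e^{-x^2}$ (rapidly decaying rather than compactly supported), and a trapezoid rule on $[-8, 8]$.

```python
import math


def trapezoid(func, lo, hi, n):
    """Composite trapezoid rule with n subintervals."""
    step = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * step)
    return total * step


K = 1.0
f1 = math.tanh                                # |f1| <= 1, Lipschitz constant 1
df1 = lambda x: 1.0 - math.tanh(x) ** 2       # derivative of f1
g = lambda x: math.exp(-x * x)
dg = lambda x: -2.0 * x * math.exp(-x * x)    # derivative of g

# <g, f1 g'>_0 computed directly and via -(1/2) * int g^2 f1' dx.
lhs = trapezoid(lambda x: g(x) * f1(x) * dg(x), -8.0, 8.0, 4000)
by_parts = -0.5 * trapezoid(lambda x: g(x) ** 2 * df1(x), -8.0, 8.0, 4000)
norm_sq = trapezoid(lambda x: g(x) ** 2, -8.0, 8.0, 4000)
```

The two values `lhs` and `by_parts` agree up to quadrature error, and `abs(lhs)` stays below `0.5 * K * norm_sq`, as (6.11) predicts.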
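Note that for $\zeta \in \mathcal{M}_G(\mathbb{R}^d)$, $f_1 \zeta$ is an $\mathbb{R}^m$-valued signed measure and hence, $T_\delta(f_1 \zeta)$ is in $H_0 \otimes \mathbb{R}^m$. The next lemma gives some estimates for $T_\delta(f_1 \zeta)$. We abuse the notation a little by using $|\zeta|$ to denote the total variation measure of the signed measure $\zeta$, i.e. $|\zeta| = \zeta^+ + \zeta^-$, where $\zeta^+$ (resp. $\zeta^-$) is the positive (resp. negative) part of $\zeta$.
Lemma 6.10 There exist constants $K_1$ and $K_2$ depending on $K$ such that for any $\zeta \in \mathcal{M}_G(\mathbb{R}^d)$,
$$\|T_\delta(f_1 \zeta)\|_{H_0 \otimes \mathbb{R}^m} \le \|T_\delta(|f_1| |\zeta|)\|_0 \le K \|T_\delta(|\zeta|)\|_0, \tag{6.12}$$
$$\|f_1 \partial_i T_\delta(\zeta) - \partial_i T_\delta(f_1 \zeta)\|_{H_0 \otimes \mathbb{R}^m} \le K_1 \|T_{2\delta}(|\zeta|)\|_0, \tag{6.13}$$
and
$$\big| \langle T_\delta(f_2 \zeta), \partial_i T_\delta(f_1 \zeta) \rangle_{H_0 \otimes \mathbb{R}^m} \big| \le K_2 \|T_\delta(|\zeta|)\|_0^2. \tag{6.14}$$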
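Proof The inequalities in equation (6.12) follow from the following calculation:
$$\begin{aligned}
\|T_\delta(f_1 \zeta)\|_{H_0 \otimes \mathbb{R}^m}^2 &= \int_{\mathbb{R}^d} |T_\delta(f_1 \zeta)(x)|^2 \, dx \\
&= \int_{\mathbb{R}^d} \Big\langle \int_{\mathbb{R}^d} G_\delta(x - y) f_1(y) \, \zeta(dy), \int_{\mathbb{R}^d} G_\delta(x - z) f_1(z) \, \zeta(dz) \Big\rangle \, dx \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) G_\delta(x - z) \langle f_1(y), f_1(z) \rangle \, \zeta(dy) \, \zeta(dz) \, dx \\
&\le \int_{\mathbb{R}^d} T_\delta(|f_1| |\zeta|)(x)^2 \, dx \\
&= \|T_\delta(|f_1| |\zeta|)\|_0^2 \le K^2 \|T_\delta(|\zeta|)\|_0^2.
\end{aligned}$$
Note that
$$|f_1(x) \partial_i T_\delta(\zeta)(x) - \partial_i T_\delta(f_1 \zeta)(x)| = \Big| \int_{\mathbb{R}^d} (f_1(x) - f_1(y)) \, \partial_{x_i} G_\delta(x - y) \, \zeta(dy) \Big|. \tag{6.15}$$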
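As
$$\partial_i G_\delta(x) = - \frac{x_i}{\delta} G_\delta(x), \tag{6.16}$$
and
$$G_\delta(x) = \exp\Big( -\frac{|x|^2}{4\delta} \Big) 2^{d/2} G_{2\delta}(x), \tag{6.17}$$
by Lipschitz continuity of $f_1$, we can continue equation (6.15) as follows:
$$\begin{aligned}
|f_1(x) \partial_i T_\delta(\zeta)(x) - \partial_i T_\delta(f_1 \zeta)(x)| &\le \int_{\mathbb{R}^d} K |x - y| \frac{|x_i - y_i|}{\delta} G_\delta(x - y) \, |\zeta|(dy) \\
&\le K \int_{\mathbb{R}^d} \frac{|x - y|^2}{\delta} \exp\Big( -\frac{|x - y|^2}{4\delta} \Big) 2^{d/2} G_{2\delta}(x - y) \, |\zeta|(dy) \\
&\le 2^{d/2 + 2} K \, T_{2\delta}(|\zeta|)(x),
\end{aligned} \tag{6.18}$$
where the last inequality follows from the fact that $u^2 \exp(-u^2/4) \le 4$ for all $u \in \mathbb{R}$. Taking the $H_0$-norm of both sides of equation (6.18) gives equation (6.13) with $K_1 = 2^{d/2 + 2} K$.
Finally, we prove equation (6.14). By the triangle inequality, we have
$$\begin{aligned}
\big| \langle T_\delta(f_2 \zeta), \partial_i T_\delta(f_1 \zeta) \rangle_{H_0 \otimes \mathbb{R}^m} \big| &\le \big| \langle T_\delta(f_2 \zeta), f_1 \partial_i T_\delta \zeta \rangle_{H_0 \otimes \mathbb{R}^m} \big| \\
&\quad + \big| \langle T_\delta(f_2 \zeta), \partial_i T_\delta(f_1 \zeta) - f_1 \partial_i T_\delta \zeta \rangle_{H_0 \otimes \mathbb{R}^m} \big|.
\end{aligned} \tag{6.19}$$
Note that the first term on the right-hand side of equation (6.19) is
$$\begin{aligned}
\big| \langle \partial_i (f_1^* T_\delta(f_2 \zeta)), T_\delta \zeta \rangle_0 \big| &\le \big| \langle \partial_i f_1^* \, T_\delta(f_2 \zeta), T_\delta \zeta \rangle_0 \big| + \big| \langle f_1^* \partial_i (T_\delta(f_2 \zeta)), T_\delta \zeta \rangle_0 \big| \\
&\le K \|T_\delta(f_2 \zeta)\|_{H_0 \otimes \mathbb{R}^m} \|T_\delta \zeta\|_0 + \big| \langle f_2 \partial_i (T_\delta \zeta), f_1 T_\delta \zeta \rangle_{H_0 \otimes \mathbb{R}^m} \big| \\
&\quad + \big| \langle \partial_i (T_\delta(f_2 \zeta)) - f_2 \partial_i (T_\delta \zeta), f_1 T_\delta \zeta \rangle_{H_0 \otimes \mathbb{R}^m} \big|.
\end{aligned}$$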
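By equations (6.11) and (6.13), we then have
$$\begin{aligned}
\big| \langle \partial_i (f_1^* T_\delta(f_2 \zeta)), T_\delta \zeta \rangle_0 \big| &\le K^2 \|T_\delta |\zeta|\|_0^2 + \frac{1}{2} K^2 \|T_\delta \zeta\|_0^2 + K K_1 \|T_{2\delta} |\zeta|\|_0 \|T_\delta \zeta\|_0 \\
&\le \Big( K^2 + \frac{1}{2} K^2 + K K_1 \Big) \|T_\delta |\zeta|\|_0^2,
\end{aligned} \tag{6.20}$$
where the last inequality follows from Lemma 6.7. On the other hand, by equations (6.12) and (6.13), the second term of equation (6.19) is bounded by
$$\|T_\delta(f_2 \zeta)\|_{H_0 \otimes \mathbb{R}^m} \|\partial_i T_\delta(f_1 \zeta) - f_1 \partial_i T_\delta \zeta\|_{H_0 \otimes \mathbb{R}^m} \le K \|T_\delta |\zeta|\|_0 \, K_1 \|T_{2\delta} |\zeta|\|_0 \le K K_1 \|T_\delta |\zeta|\|_0^2, \tag{6.21}$$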
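where the last inequality follows from Lemma 6.7. Therefore, using equations (6.20) and (6.21), we can continue equation (6.19) to finish the proof of equation (6.14).
Lemma 6.11 There exists a constant $K_1$ such that for any $\zeta \in \mathcal{M}_G(\mathbb{R}^d)$, we have
$$\sum_{i,j=1}^d \big\langle T_\delta \zeta, \partial_{ij}^2 T_\delta((c c^*)_{ij} \zeta) \big\rangle_0 + \sum_{k=1}^m \Big\| \sum_{i=1}^d \partial_{x_i} T_\delta(c_{ik} \zeta) \Big\|_0^2 \le K_1 \|T_\delta(|\zeta|)\|_0^2. \tag{6.22}$$
Proof To make clear the variable with respect to which the integral is taken, we use the following convention:
$$\int_{\mathbb{R}^d} \zeta(dx) f(x) \quad \text{for} \quad \int_{\mathbb{R}^d} f(x) \, \zeta(dx)$$
when the expression for $f$ is too long. Note that
$$\sum_{i,j=1}^d \big\langle T_\delta \zeta, \partial_{ij}^2 T_\delta((c c^*)_{ij} \zeta) \big\rangle_0 = \sum_{i,j=1}^d \int_{\mathbb{R}^d} dx \int_{\mathbb{R}^d} \zeta(dy) \, G_\delta(x - y) \int_{\mathbb{R}^d} \zeta(dz) \, \partial_{x_i x_j}^2 G_\delta(x - z) \sum_{k=1}^m c_{ik}(z) c_{jk}(z). \tag{6.23}$$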
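Using the semigroup property of $G_\delta$, we have
$$\int_{\mathbb{R}^d} G_\delta(x - y) G_\delta(x - z) \, dx = G_{2\delta}(y - z).$$
By equation (6.16), we get
$$\partial_{ij}^2 G_\delta(x) = \Big( \frac{x_i x_j}{\delta^2} - \frac{1_{i=j}}{\delta} \Big) G_\delta(x).$$
As
$$\partial_{x_i x_j}^2 G_\delta(x - z) = \partial_{z_i z_j}^2 G_\delta(x - z),$$
we can continue equation (6.23) as follows:
$$\begin{aligned}
\sum_{i,j=1}^d \big\langle T_\delta \zeta, \partial_{ij}^2 T_\delta((c c^*)_{ij} \zeta) \big\rangle_0
&= \sum_{i,j=1}^d \int_{\mathbb{R}^d} \zeta(dy) \int_{\mathbb{R}^d} \zeta(dz) \sum_{k=1}^m c_{ik}(z) c_{jk}(z) \, \partial_{z_i} \partial_{z_j} \int_{\mathbb{R}^d} G_\delta(x - y) G_\delta(x - z) \, dx \\
&= \sum_{i,j=1}^d \int_{\mathbb{R}^d} \zeta(dy) \int_{\mathbb{R}^d} \zeta(dz) \Big( \frac{(z_i - y_i)(z_j - y_j)}{4\delta^2} - \frac{1_{i=j}}{2\delta} \Big) G_{2\delta}(z - y) \sum_{k=1}^m c_{ik}(z) c_{jk}(z).
\end{aligned}$$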
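Interchanging $y$ and $z$ in the above equation and averaging the two expressions, we arrive at
$$\begin{aligned}
\sum_{i,j=1}^d \big\langle T_\delta \zeta, \partial_{ij}^2 T_\delta((c c^*)_{ij} \zeta) \big\rangle_0
= \sum_{i,j=1}^d \int_{\mathbb{R}^d} \zeta(dy) \int_{\mathbb{R}^d} \zeta(dz) & \Big( \frac{(z_i - y_i)(z_j - y_j)}{4\delta^2} - \frac{1_{i=j}}{2\delta} \Big) \\
& \times G_{2\delta}(z - y) \, \frac{1}{2} \sum_{k=1}^m \big( c_{ik}(z) c_{jk}(z) + c_{ik}(y) c_{jk}(y) \big).
\end{aligned} \tag{6.24}$$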
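Similarly, we can prove that
$$\begin{aligned}
\sum_{k=1}^m \Big\| \sum_{i=1}^d \partial_i T_\delta(c_{ik} \zeta) \Big\|_0^2
&= - \sum_{k=1}^m \sum_{i,j=1}^d \big\langle T_\delta(c_{ik} \zeta), \partial_i \partial_j T_\delta(c_{jk} \zeta) \big\rangle_0 \\
&= - \sum_{k=1}^m \sum_{i,j=1}^d \int_{\mathbb{R}^d} \zeta(dy) \int_{\mathbb{R}^d} \zeta(dz) \Big( \frac{(z_i - y_i)(z_j - y_j)}{4\delta^2} - \frac{1_{i=j}}{2\delta} \Big) \\
&\qquad \times G_{2\delta}(z - y) \, \frac{1}{2} \big( c_{ik}(y) c_{jk}(z) + c_{ik}(z) c_{jk}(y) \big).
\end{aligned} \tag{6.25}$$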
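Hence, combining equations (6.24) and (6.25), the left-hand side of equation (6.22) is equal to
$$\sum_{i,j=1}^d \int_{\mathbb{R}^d} \zeta(dy) \int_{\mathbb{R}^d} \zeta(dz) \Big( \frac{(z_i - y_i)(z_j - y_j)}{4\delta^2} - \frac{1_{i=j}}{2\delta} \Big) G_{2\delta}(z - y) \times \frac{1}{2} \sum_{k=1}^m (c_{ik}(y) - c_{ik}(z))(c_{jk}(y) - c_{jk}(z)).$$
Using equation (6.17) and the Lipschitz continuity of $c$, we see that the above quantity is bounded by
$$\begin{aligned}
& \sum_{i,j=1}^d \int_{\mathbb{R}^d} |\zeta|(dy) \int_{\mathbb{R}^d} |\zeta|(dz) \, \frac{1}{2} \Big( \frac{|z - y|^2}{4\delta^2} + \frac{1}{2\delta} \Big) \exp\Big( -\frac{|z - y|^2}{4\delta} \Big) \times 2^{d/2} G_{4\delta}(z - y) K^2 |z - y|^2 \\
&\qquad \le 4 K^2 \sum_{i,j=1}^d \int_{\mathbb{R}^d} |\zeta|(dy) \int_{\mathbb{R}^d} |\zeta|(dz) \, 2^{d/2} G_{4\delta}(z - y) \\
&\qquad = d^2 2^{d/2 + 2} K^2 \|T_{2\delta}(|\zeta|)\|_0^2 \\
&\qquad \le d^2 2^{d/2 + 2} K^2 \|T_\delta(|\zeta|)\|_0^2,
\end{aligned}$$
where the first inequality follows by bounding $(4v^2 + 2v) e^{-v}$. The lemma follows with $K_1 = d^2 2^{d/2 + 2} K^2$.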
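The kernel identities (6.16) and (6.17) and the two scalar bounds invoked in the proofs of Lemmas 6.10 and 6.11 are easy to spot-check numerically. The sketch below is illustrative only: the point `x`, the value of `delta`, the finite-difference step and the grids are arbitrary assumptions, and a grid maximum is of course not a proof of a supremum bound.

```python
import math


def heat_kernel(delta, x):
    """G_delta(x) = (2*pi*delta)^(-d/2) * exp(-|x|^2 / (2*delta)) for a point x in R^d."""
    d = len(x)
    r2 = sum(v * v for v in x)
    return (2.0 * math.pi * delta) ** (-d / 2.0) * math.exp(-r2 / (2.0 * delta))


def partial_derivative(delta, x, i, eps=1e-5):
    """Central finite difference for the i-th partial derivative of G_delta at x."""
    up = list(x)
    up[i] += eps
    dn = list(x)
    dn[i] -= eps
    return (heat_kernel(delta, up) - heat_kernel(delta, dn)) / (2.0 * eps)


def grid_max(func, hi, n):
    """Maximum of func over the grid {0, hi/n, 2*hi/n, ..., hi}."""
    return max(func(hi * j / n) for j in range(n + 1))


delta, x = 0.7, (0.3, -1.1, 0.5)
g = heat_kernel(delta, x)
r2 = sum(v * v for v in x)

# (6.16): the derivative of G_delta equals -(x_i / delta) * G_delta.
err_616 = max(abs(partial_derivative(delta, x, i) + x[i] / delta * g) for i in range(len(x)))
# (6.17): G_delta = exp(-|x|^2 / (4*delta)) * 2^(d/2) * G_{2*delta}.
err_617 = abs(g - math.exp(-r2 / (4.0 * delta)) * 2.0 ** (len(x) / 2.0) * heat_kernel(2.0 * delta, x))
# Scalar bounds: u^2 exp(-u^2/4) <= 4 and the quantity (4v^2 + 2v) exp(-v).
max_u = grid_max(lambda u: u * u * math.exp(-u * u / 4.0), 20.0, 20000)
max_v = grid_max(lambda v: (4.0 * v * v + 2.0 * v) * math.exp(-v), 40.0, 40000)
```

On these grids `max_u` is attained at $u = 2$ with value $4/e$, and `max_v` stays below 4, consistent with the constants used above.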
6.4
Uniqueness for Zakai’s equation
Now we continue the estimation started in Section 6.2 by making use of the inequalities we obtained in Section 6.3.
Theorem 6.12 If $V$ is an $\mathcal{M}_G(\mathbb{R}^d)$-valued solution of equation (5.13) and $Z^\delta = T_\delta V$, then
$$\hat{E} \|Z_t^\delta\|_0^2 \le \|Z_0^\delta\|_0^2 + K_1 \int_0^t \hat{E} \|T_\delta(|V_s|)\|_0^2 \, ds, \tag{6.26}$$
where $K_1$ is a constant.
Proof The last term of equation (6.10) is bounded by a constant times $\|T_\delta(|V_s|)\|_0^2$ by equation (6.12). The bound for the second term of equation (6.10) follows from Lemma 6.11. The bound for the sum of the third and fifth terms of equation (6.10) also follows from Lemma 6.11. The bound for the fourth and sixth terms of equation (6.10) follows by equation (6.14).
Corollary 6.13 If $V$ is an $\mathcal{M}_F(\mathbb{R}^d)$-valued solution of equation (5.13) and $V_0 \in H_0$, then $V_t \in H_0$ a.s. and $\hat{E} \|V_t\|_0^2 < \infty$, $\forall t \ge 0$.
Proof Since $V_t$ is a measure, $|V_t| = V_t$. It follows from equation (6.26) that
$$\hat{E} \|Z_t^\delta\|_0^2 \le \|Z_0^\delta\|_0^2 + K_1 \int_0^t \hat{E} \|Z_s^\delta\|_0^2 \, ds.$$
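By Gronwall’s inequality, we have
$$\hat{E} \|Z_t^\delta\|_0^2 \le \|Z_0^\delta\|_0^2 \, e^{K_1 t}. \tag{6.27}$$
Note that
$$\lim_{\delta \to 0} \langle Z_t^\delta, \phi \rangle_0 = \lim_{\delta \to 0} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} G_\delta(x - y) \phi(x) \, dx \, V_t(dy) = \langle V_t, \phi \rangle.$$
Let $\{\phi_j\}$ be a complete, orthonormal system of $H_0$ such that $\phi_j \in C_b(\mathbb{R}^d)$. Then, by Fatou’s lemma,
$$\hat{E} \Big[ \sum_j \langle V_t, \phi_j \rangle^2 \Big] = \hat{E} \Big[ \sum_j \lim_{\delta \to 0} \langle Z_t^\delta, \phi_j \rangle_0^2 \Big] \le \liminf_{\delta \to 0} \hat{E} \|Z_t^\delta\|_0^2 \le \|V_0\|_0^2 \, e^{K_1 t}.$$
Let
$$\tilde{V}_t = \sum_j \langle V_t, \phi_j \rangle \phi_j.$$
Then, $\tilde{V}_t \in H_0$ and
$$\langle \tilde{V}_t, f \rangle_0 = \sum_j \langle V_t, \phi_j \rangle \langle f, \phi_j \rangle_0 = \langle V_t, f \rangle.$$
Hence, $V_t \in H_0$ and $\hat{E} \|V_t\|_0^2 < \infty$.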
These estimates give uniqueness of $\mathcal{M}_F(\mathbb{R}^d)$-valued solutions with $V_0 \in H_0$.
Theorem 6.14 Suppose that $V_0 \in H_0^+$. Then Zakai’s equation (5.13) has at most one $\mathcal{M}_F(\mathbb{R}^d)$-valued solution.
Proof Let $V_t^1$ and $V_t^2$ be two $\mathcal{M}_F(\mathbb{R}^d)$-valued solutions with the same initial value $V_0$. By Corollary 6.13, $V_t^1, V_t^2 \in H_0$ a.s. Let $V_t = V_t^1 - V_t^2$. Then $V_t \in H_0$ and
$$\hat{E} \|T_\delta V_t\|_0^2 \le K_1 \int_0^t \hat{E} \|T_\delta(|V_s|)\|_0^2 \, ds.$$
Note that for $V_t \in H_0$, we have $|V_t| \in H_0$ and $\| |V_t| \|_0 = \|V_t\|_0$. Taking $\delta \to 0$, we get
$$\hat{E} \|V_t\|_0^2 \le K_1 \int_0^t \hat{E} \| |V_s| \|_0^2 \, ds = K_1 \int_0^t \hat{E} \|V_s\|_0^2 \, ds.$$
By Gronwall’s inequality, we arrive at $V_t \equiv 0$.
By exactly the same argument we can prove the following theorem.
Theorem 6.15 Suppose that $V_0 \in H_0$. Then Zakai’s equation (5.13) has at most one $H_0$-valued solution.
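For reference, the form of Gronwall’s inequality used in Corollary 6.13 and Theorem 6.14 can be written out as follows. This is the standard integral version for a bounded measurable function $u$ and a constant $K \ge 0$; the boundedness of $u$ on $[0, T]$ is an assumption that the text leaves implicit.

```latex
\[
u(t) \le a + K \int_0^t u(s)\,ds \quad (0 \le t \le T)
\qquad\Longrightarrow\qquad
u(t) \le a\, e^{K t} \quad (0 \le t \le T).
\]
% With u(t) = \hat{E}\|Z_t^\delta\|_0^2, a = \|Z_0^\delta\|_0^2 and K = K_1 this gives (6.27).
% With u(t) = \hat{E}\|V_t\|_0^2 and a = 0 it gives u \equiv 0, i.e. V_t \equiv 0.
```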
6.5
A duality representation
In this section, we give a representation of the unnormalized filter in terms of the solution to an SPDE that is the dual of Zakai’s equation. This representation will be useful in proving the convergence of the numerical approximations of the optimal filter. To aid the understanding of this method, we recall the duality used in the proof of uniqueness for the solution of a linear partial differential equation (PDE). Let $u$ be a solution to the following PDE:
$$\frac{\partial u}{\partial s} = L^* u, \tag{6.28}$$
with initial condition $u_0$, where $L$ is a second-order differential operator and $L^*$ is the adjoint operator of $L$. To prove the uniqueness for the solution of equation (6.28), we consider the following backward PDE for $s \in [0, t]$ with $t$ being fixed:
$$\begin{cases} \dfrac{\partial v}{\partial s} = - L v, \\ v_t = g. \end{cases} \tag{6.29}$$
Then,
$$\frac{d}{ds} \langle u_s, v_s \rangle_0 = 0 \qquad \text{and, hence,} \qquad \langle u_t, g \rangle_0 = \langle u_0, v_0 \rangle_0,$$
where the notation $\langle \cdot, \cdot \rangle_0$ is the inner product in $L^2(\mathbb{R}^d)$ introduced in equation (6.1). This implies the uniqueness for the solution to equation (6.28).
Now we imitate equation (6.29) and consider the backward SPDE:
$$\begin{cases} d\psi_s = - L \psi_s \, ds - \big( \nabla^* \psi_s c + h^* \psi_s \big) \, \hat{d} Y_s, & 0 \le s \le t, \\ \psi_t = \phi, \end{cases} \tag{6.30}$$
where $\hat{d}$ denotes the backward Itô integral. Namely, we take the right endpoints in the approximating Riemann sum in defining the stochastic integral.
Remark 6.16 In the ordinary Riemann integral, it does not matter which point we take in each subinterval of a partition to define the Riemann sum. However, it is important to take the left endpoint when defining Itô’s stochastic integral. Therefore, it is crucial here to take the right endpoint in the Riemann sum when we consider the backward SPDE.
Hereafter, we will denote by $C_b^k(\mathbb{R}^d, \mathcal{X})$ the set of all bounded continuous mappings from $\mathbb{R}^d$ to $\mathcal{X}$ with bounded partial derivatives up to order $k$, where $\mathcal{X}$ is a Hilbert space. We endow $C_b^k(\mathbb{R}^d, \mathcal{X})$ with the following norm
$$\|\varphi\|_{k,\infty} = \sum_{|\alpha| \le k} \sup_{x \in \mathbb{R}^d} \|D^\alpha \varphi(x)\|_{\mathcal{X}}, \qquad \varphi \in C_b^k(\mathbb{R}^d, \mathcal{X}),$$
where $\alpha = (\alpha^1, \ldots, \alpha^d)$ is a multi-index, $|\alpha| = \alpha^1 + \cdots + \alpha^d$ and $D^\alpha \varphi = \partial_1^{\alpha^1} \cdots \partial_d^{\alpha^d} \varphi$. Also, let $W_p^k(\mathbb{R}^d, \mathcal{X})$ be the set of all functions with generalized partial derivatives up to order $k$ with both the function and all its partial derivatives being $p$-integrable. We endow $W_p^k(\mathbb{R}^d, \mathcal{X})$ with the following Sobolev norm
$$\|\varphi\|_{k,p} = \Big( \sum_{|\alpha| \le k} \int_{\mathbb{R}^d} \|D^\alpha \varphi(x)\|_{\mathcal{X}}^p \, dx \Big)^{1/p}.$$
When $\mathcal{X}$ is clear from the context or $\mathcal{X} = \mathbb{R}$, we will drop it from the notation for simplicity.
To demonstrate the existence of a solution, we now convert the backward SPDE (6.30) to an ordinary SPDE by reversing the time parameter. Fix $t > 0$. For $0 < s < t$, we define
$$\tilde{Y}_s = Y_t - Y_{t-s} \qquad \text{and} \qquad \tilde{\psi}_s = \psi_{t-s}.$$
Then, $\tilde{\psi}_s$ satisfies the following forward SPDE
$$\begin{cases} d\tilde{\psi}_s = L \tilde{\psi}_s \, ds + \big( \nabla^* \tilde{\psi}_s c + h^* \tilde{\psi}_s \big) \, d\tilde{Y}_s, & 0 \le s \le t, \\ \tilde{\psi}_0 = \phi. \end{cases} \tag{6.31}$$
As equation (6.31) is a Zakai-type equation with $Y$ replaced by $\tilde{Y}$, we get the existence of its solution (as the optimal filter for a suitable filtering model).
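Remark 6.16 and the time reversal $\tilde{Y}_s = Y_t - Y_{t-s}$ can be illustrated on a simulated path. The sketch below is a toy example under stated assumptions: a scalar Brownian path sampled on a uniform grid, the integrand $W$ itself, and hypothetical helper names. It shows that left-endpoint (forward Itô) and right-endpoint (backward Itô) sums differ by the discrete quadratic variation, which is close to $t$ rather than to zero.

```python
import math
import random


def brownian_increments(t, n, rng):
    """n independent N(0, t/n) increments of a Brownian path on [0, t]."""
    return [rng.gauss(0.0, math.sqrt(t / n)) for _ in range(n)]


def endpoint_sums(increments):
    """Left- and right-endpoint Riemann sums for the integral of W against dW."""
    w = 0.0
    left = 0.0
    right = 0.0
    for dw in increments:
        left += w * dw    # forward Ito sum: integrand at the left endpoint
        w += dw
        right += w * dw   # backward Ito sum: integrand at the right endpoint
    return left, right, w


def time_reversed(increments):
    """Increments of Y~_s = Y_t - Y_{t-s}: the same increments in reverse order."""
    return list(reversed(increments))


rng = random.Random(1)
inc = brownian_increments(1.0, 20000, rng)
left, right, w_t = endpoint_sums(inc)
quad_var = sum(dw * dw for dw in inc)
```

Here `right - left` equals `quad_var` exactly (up to rounding), and `left` equals `0.5 * (w_t ** 2 - quad_var)`, the discrete counterpart of the Itô formula for $W^2$.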
Similar to the proof of Theorem 6.14, we can show the uniqueness for the solution to equation (6.31) with $\phi \in H_0^+ = W_2^0(\mathbb{R}^d)^+$. However, we need here the solution of equation (6.30) to be a process with values in $C_b^2(\mathbb{R}^d)$. To achieve this, we show that $\psi_s \in W_2^k(\mathbb{R}^d)$, where $k$ is chosen so that $2(k - 2) > d$, and then use a standard Sobolev imbedding theorem, which we state below (without giving the proof) for the convenience of the reader. We refer the reader to the book of Adams [1] for the proof of a more general version of the theorem.
Theorem 6.17 (Sobolev) If $kp > d + jp$, then $W_p^k(\mathbb{R}^d)$ can be embedded into $C_b^j(\mathbb{R}^d)$, i.e. there is a constant $K_1$ and a linear mapping from $f \in W_p^k(\mathbb{R}^d)$ to $\bar{f} \in C_b^j(\mathbb{R}^d)$ such that $f(x) = \bar{f}(x)$ for almost every $x$ and
$$\|\bar{f}\|_{j,\infty} \le K_1 \|f\|_{k,p}.$$
Now we are ready to prove the existence of a smooth solution to the SPDE (6.31). We shall need the following
Assumption (BD): The mappings $a, b, c, h, \phi$ are in $C_b^k(\mathbb{R}^d, \mathcal{X})$ with $k = \frac{d}{2} + 2$ and $\mathcal{X}$ being $\mathcal{S}^d$, $\mathbb{R}^d$, $\mathbb{R}^{d \times m}$, $\mathbb{R}^m$ and $\mathbb{R}$, respectively. Also, we assume $\phi \in W_2^k(\mathbb{R}^d)$.
Lemma 6.18 Suppose that Assumption (BD) holds. Then there exists a constant $K_1$ independent of $\phi$ and $s \in [0, t]$ such that
$$E\big[ \|\psi_s\|_{k,2}^2 \big] \le K_1 \|\phi\|_{k,2}^2. \tag{6.32}$$
As a consequence, $\psi_s \in C_b^2(\mathbb{R}^d)$ a.s. and there exists a constant $K_2$ independent of $\phi$ and $s \in [0, t]$ such that
$$E\big[ \|\psi_s\|_{2,\infty}^2 \big] \le K_2 \|\phi\|_{k,2}^2. \tag{6.33}$$
Proof It follows from the same arguments as those leading to equation (6.27) that there exists a constant $K_3$ such that
$$\hat{E} \|\psi_s\|_{0,2}^2 \le K_3 \|\phi\|_{0,2}^2.$$
Next, we take derivatives (smoothing out by the Brownian semigroup $T_\delta$ as we did in Section 6.2 if necessary) on both sides of equation (6.31). For simplicity of notation, we assume $d = 1$. Then $\tilde{\psi}_s^1 \equiv \nabla \tilde{\psi}_s$ satisfies the following SPDE
$$\begin{cases} d\tilde{\psi}_s^1 = L_1 \tilde{\psi}_s^1 \, ds + \big( \nabla^* \tilde{\psi}_s^1 c + c_1 \tilde{\psi}_s^1 + c_2 \tilde{\psi}_s \big) \, d\tilde{Y}_s, \\ \tilde{\psi}_0^1 = \nabla \phi, \end{cases}$$
where $L_1$ is a second-order differential operator with bounded coefficients, and $c_i$ ($i = 1, 2$) are bounded functions. With similar arguments as in equation (6.27), we can prove that there is a constant $K_4$ such that
$$\hat{E} \|\tilde{\psi}_s^1\|_{0,2}^2 \le K_4 \|\phi\|_{1,2}^2.$$
The higher-derivative estimates for equation (6.32) follow by induction. The inequality (6.33) follows from Sobolev’s imbedding theorem.
Remark 6.19 Condition (BD) is not sharp for the conclusion of Lemma 6.18 to hold. It can be relaxed by using Krylov’s $L^p$-theory, whose proof is more complicated than the proof we presented above.
For $g \in L^2([0, t], \mathbb{R}^m)$, we define
$$\theta_g^W(r) = \exp\Big( \sqrt{-1} \int_0^r g_s^* \, dW_s + \frac{1}{2} \int_0^r |g_s|^2 \, ds \Big). \tag{6.34}$$
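We will need the following lemma, which implies that the family
$$\big\{ \theta_g^W(t) : g \text{ is bounded on } [0, t] \big\}$$
is dense in $L^2(\Omega, \mathcal{F}_t^W, \hat{P})$.
Lemma 6.20 If $\xi \in L^2(\Omega, \mathcal{F}_t^W, \hat{P})$ satisfies
$$\hat{E}\big( \xi \, \theta_g^W(t) \big) = 0$$
for all bounded functions $g$ on $[0, t]$, then $\xi = 0$ a.s.
Proof Let
$$\mathcal{H}_n = \sigma\big( W_{t_i^n} - W_{t_{i-1}^n} : 1 \le i \le 2^n \big),$$
where $t_i^n = i t 2^{-n}$, $i = 0, 1, 2, \ldots, 2^n$. Then $\{\mathcal{H}_n\}$ is a sequence of $\sigma$-fields increasing to $\mathcal{F}_t^W$. By the martingale convergence theorem (Theorem 2.9), we have
$$\xi_n \equiv \hat{E}(\xi \mid \mathcal{H}_n) \to \xi, \qquad \text{a.s.}$$
Note that $\xi_n$ is a function of $\{ W_{t_i^n} - W_{t_{i-1}^n} : 1 \le i \le 2^n \}$. Let
$$g_s^n = \sum_{i=1}^{2^n} \lambda_i 1_{[t_{i-1}^n, t_i^n)}(s),$$
where the $\lambda_i$ are constants. Then
$$0 = \hat{E}\big( \xi \, \theta_{g^n}^W(t) \big) = \hat{E}\Big( \hat{E}\big( \xi \, \theta_{g^n}^W(t) \mid \mathcal{H}_n \big) \Big) = \hat{E}\big( \xi_n \theta_{g^n}^W(t) \big).$$
Note that $\int_0^t |g_s^n|^2 \, ds$ is non-random. This implies that the Fourier transformation of $\xi_n$ is
$$\hat{E}\Big( \xi_n \exp\Big( \sqrt{-1} \sum_{i=1}^{2^n} \lambda_i^* \big( W_{t_i^n} - W_{t_{i-1}^n} \big) \Big) \Big) = 0.$$
Therefore, $\xi_n = 0$ a.s. and hence, $\xi = 0$ a.s.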
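The following lemma will play a key role in the proof of the convergence of $V_t^n$ to the unnormalized filter $V_t$ in Chapter 8 as well as for the duality representation of the unnormalized filter. Recall that $X_t$ is the signal and
$$dM_t = M_t h^*(X_t) \, dY_t.$$
Note that $\psi$ is $\mathcal{G}_t$-measurable, and $\mathcal{G}_t$ is independent of $\mathcal{F}_r^B$. The stochastic integral $\int_0^r M_s \nabla^* \psi_s \sigma(X_s) \, dB_s$ is well defined on the stochastic basis $(\Omega, \mathcal{F}, \hat{P}, \tilde{\mathcal{F}}_s)$, where $\tilde{\mathcal{F}}_s = \mathcal{F}_s \vee \mathcal{G}_t$, $0 \le s \le t$.
Lemma 6.21 Suppose that Condition (BD) holds. Then, for every $t \ge 0$, we have
$$\psi_t(X_t) M_t - \psi_0(X_0) = \int_0^t M_s \nabla^* \psi_s \sigma(X_s) \, dB_s, \qquad \text{a.s.} \tag{6.35}$$
Proof Let $f$ and $g$ be two bounded smooth functions on $[0, t]$ taking values in $\mathbb{R}^m$ and $\mathbb{R}^d$, respectively. Let $\theta_f^Y(r)$ be defined as in equation (6.34), and let $\theta_g^B(r)$ be defined in a similar fashion. Note that both sides of equation (6.35) are $\mathcal{G}_t \vee \mathcal{F}_t^B$-measurable. It follows from the previous lemma (with $W$ replaced by $(B, Y)$) that in order to prove equation (6.35) it is sufficient to show that for all bounded functions $f$ and $g$, we have
$$\hat{E}\Big[ \big( \psi_t(X_t) M_t - \psi_0(X_0) \big) \theta_f^Y(t) \theta_g^B(t) \Big] = \hat{E}\Big[ \int_0^t M_s \nabla^* \psi_s \sigma(X_s) \, dB_s \, \theta_f^Y(t) \theta_g^B(t) \Big]. \tag{6.36}$$
Let
$$\Psi_r(x) = \hat{E}\big( \psi_r(x) \tilde{\theta}_f^Y(r) \mid \mathcal{F}_r \big), \qquad \forall x \in \mathbb{R}^d,$$
where
$$\tilde{\theta}_f^Y(r) = \theta_f^Y(t) / \theta_f^Y(r) = \exp\Big( \sqrt{-1} \int_r^t f_s^* \, dY_s + \frac{1}{2} \int_r^t |f_s|^2 \, ds \Big).$$
Let $\tilde{\theta}_g^B(r)$ be defined similarly. Since $\psi_r$ and $\tilde{\theta}_f^Y(r)$ are measurable with respect to the $\sigma$-field $\mathcal{G}_{r,t}$, which is independent of $\mathcal{F}_r$, we get that
$$\Psi_r(x) = \hat{E}\big( \psi_r(x) \tilde{\theta}_f^Y(r) \big).$$
As $\tilde{\theta}_g^B(r)$ is independent of $\mathcal{F}_r \vee \mathcal{G}_{r,t}$ and $\theta_g^B(r)$ is a martingale, we have
$$\begin{aligned}
\hat{E}\big( \psi_r(x) \tilde{\theta}_f^Y(r) \tilde{\theta}_g^B(r) \mid \mathcal{F}_r \big) &= \hat{E}\Big( \hat{E}\big( \psi_r(x) \tilde{\theta}_f^Y(r) \tilde{\theta}_g^B(r) \mid \mathcal{F}_r \vee \mathcal{G}_{r,t} \big) \,\Big|\, \mathcal{F}_r \Big) \\
&= \hat{E}\Big( \psi_r(x) \tilde{\theta}_f^Y(r) \hat{E}\big( \tilde{\theta}_g^B(r) \mid \mathcal{F}_r \big) \,\Big|\, \mathcal{F}_r \Big) \\
&= \hat{E}\big( \psi_r(x) \tilde{\theta}_f^Y(r) \mid \mathcal{F}_r \big) \\
&= \Psi_r(x).
\end{aligned}$$
Hence, for $r \in [0, t]$, we have
$$\begin{aligned}
\hat{E}\big( \psi_r(X_r) M_r \theta_f^Y(t) \theta_g^B(t) \mid \mathcal{F}_r \big) &= M_r \theta_f^Y(r) \theta_g^B(r) \hat{E}\big( \psi_r(X_r) \tilde{\theta}_f^Y(r) \tilde{\theta}_g^B(r) \mid \mathcal{F}_r \big) \\
&= \Psi_r(X_r) M_r \theta_f^Y(r) \theta_g^B(r).
\end{aligned} \tag{6.37}$$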
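Note that $\int_r^t f_s^* \, \hat{d} Y_s$ coincides with $\int_r^t f_s^* \, dY_s$ since $f_s$ is deterministic. Thus, by the backward Itô formula, we have
$$\hat{d} \tilde{\theta}_f^Y(r) = - \sqrt{-1} \, \tilde{\theta}_f^Y(r) f_r^* \, \hat{d} Y_r. \tag{6.38}$$
Applying the backward Itô formula to equations (6.30) and (6.38) (in the backward formula the cross-variation term enters with a minus sign), we get
$$\begin{aligned}
\hat{d}\big( \psi_r \tilde{\theta}_f^Y(r) \big) &= - L \psi_r \tilde{\theta}_f^Y(r) \, dr - \big( \nabla^* \psi_r c + h^* \psi_r \big) \tilde{\theta}_f^Y(r) \, \hat{d} Y_r \\
&\quad - \sqrt{-1} \, \psi_r \tilde{\theta}_f^Y(r) f_r^* \, \hat{d} Y_r - \sqrt{-1} \big( \nabla^* \psi_r c + h^* \psi_r \big) f_r \tilde{\theta}_f^Y(r) \, dr \\
&= - \Big( L \psi_r + \sqrt{-1} \big( \nabla^* \psi_r c + h^* \psi_r \big) f_r \Big) \tilde{\theta}_f^Y(r) \, dr \\
&\quad - \Big( \nabla^* \psi_r c + h^* \psi_r + \sqrt{-1} \, \psi_r f_r^* \Big) \tilde{\theta}_f^Y(r) \, \hat{d} Y_r.
\end{aligned}$$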
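Writing this in integral form (recall that $\psi_t = \phi$ and $\tilde{\theta}_f^Y(t) = 1$), we get
$$\begin{aligned}
\phi - \psi_s \tilde{\theta}_f^Y(s) &= - \int_s^t \Big( L \psi_r + \sqrt{-1} \big( \nabla^* \psi_r c + h^* \psi_r \big) f_r \Big) \tilde{\theta}_f^Y(r) \, dr \\
&\quad - \int_s^t \Big( \nabla^* \psi_r c + h^* \psi_r + \sqrt{-1} \, \psi_r f_r^* \Big) \tilde{\theta}_f^Y(r) \, \hat{d} Y_r.
\end{aligned}$$
Taking expectation on both sides, we see that
$$\phi - \Psi_s = - \int_s^t \Big( L \Psi_r + \sqrt{-1} \big( \nabla^* \Psi_r c f_r + h^* f_r \Psi_r \big) \Big) \, dr,$$
and hence, $\Psi_r$ is the solution to the following PDE:
$$\frac{d}{dr} \Psi_r = - L \Psi_r - \sqrt{-1} \big( \nabla^* \Psi_r c f_r + h^* f_r \Psi_r \big).$$
As a consequence, $\Psi$ is differentiable in $r$ and has continuous first- and second-order partial derivatives in $x$. By Itô’s formula, and writing $\tilde{L}$ for the generator of $X$ under $\hat{P}$, so that $\tilde{L} \Psi = L \Psi - \nabla^* \Psi \, c h$ because the drift of $X$ under $\hat{P}$ is $b - ch$, we have
$$\begin{aligned}
d\Psi_r(X_r) &= \Big( - L \Psi_r - \sqrt{-1} \big( \nabla^* \Psi_r c f_r + h^* f_r \Psi_r \big) \Big)(X_r) \, dr \\
&\quad + \tilde{L} \Psi_r(X_r) \, dr + \nabla^* \Psi_r(X_r) \big( \sigma(X_r) \, dB_r + c(X_r) \, dY_r \big) \\
&= - \Big( \sqrt{-1} \big( \nabla^* \Psi_r c f_r + h^* f_r \Psi_r \big) + \nabla^* \Psi_r c h \Big)(X_r) \, dr \\
&\quad + \nabla^* \Psi_r(X_r) \big( \sigma(X_r) \, dB_r + c(X_r) \, dY_r \big).
\end{aligned} \tag{6.39}$$
Note that
$$dM_r = M_r h^*(X_r) \, dY_r, \qquad d\theta_f^Y(r) = \sqrt{-1} \, \theta_f^Y(r) f_r^* \, dY_r,$$
and
$$d\theta_g^B(r) = \sqrt{-1} \, \theta_g^B(r) g_r^* \, dB_r.$$
Applying Itô’s formula to the four equations above, the drift terms involving $f$ and $h$ cancel and we get
$$d\big( \Psi_r(X_r) M_r \theta_f^Y(r) \theta_g^B(r) \big) = \sqrt{-1} \, M_r \theta_f^Y(r) \theta_g^B(r) \nabla^* \Psi_r \sigma(X_r) g_r \, dr + d(\text{mart.}) \tag{6.40}$$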
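Making use of equation (6.37) with $r = t$ and $r = 0$, respectively, we get
$$\begin{aligned}
\hat{E}\Big[ \big( \psi_t(X_t) M_t - \psi_0(X_0) \big) \theta_f^Y(t) \theta_g^B(t) \Big]
&= \hat{E}\Big[ \hat{E}\big( \psi_t(X_t) M_t \theta_f^Y(t) \theta_g^B(t) \mid \mathcal{F}_t \big) \Big] - \hat{E}\Big[ \psi_0(X_0) \theta_f^Y(t) \theta_g^B(t) \Big] \\
&= \hat{E}\Big[ \Psi_t(X_t) M_t \theta_f^Y(t) \theta_g^B(t) - \Psi_0(X_0) \Big] \\
&= \sqrt{-1} \int_0^t \hat{E}\Big[ M_r \theta_f^Y(r) \theta_g^B(r) \nabla^* \Psi_r \sigma(X_r) g_r \Big] \, dr,
\end{aligned} \tag{6.41}$$
where the last equality follows from equation (6.40).
By Itô’s formula on the stochastic basis $(\Omega, \mathcal{F}, \hat{P}, \tilde{\mathcal{F}}_s)$, we get
$$\int_0^r M_s \nabla^* \psi_s \sigma(X_s) \, dB_s \; \theta_g^B(r) = \int_0^r \cdots \, dB_s + \sqrt{-1} \int_0^r M_s \nabla^* \psi_s \sigma(X_s) g_s \theta_g^B(s) \, ds.$$
This implies that
$$\begin{aligned}
\hat{E}\Big[ \int_0^t M_s \nabla^* \psi_s \sigma(X_s) \, dB_s \, \theta_f^Y(t) \theta_g^B(t) \Big]
&= \hat{E}\Big[ \hat{E}\Big( \int_0^t M_s \nabla^* \psi_s \sigma(X_s) \, dB_s \, \theta_f^Y(t) \theta_g^B(t) \,\Big|\, \mathcal{G}_t \Big) \Big] \\
&= \hat{E}\Big[ \sqrt{-1} \int_0^t M_s \nabla^* \psi_s \sigma(X_s) g_s \theta_g^B(s) \, ds \; \theta_f^Y(t) \Big] \\
&= \sqrt{-1} \, \hat{E} \int_0^t \hat{E}\Big( M_s \nabla^* \psi_s \sigma(X_s) g_s \theta_g^B(s) \theta_f^Y(t) \,\Big|\, \mathcal{F}_s \Big) \, ds \\
&= \hat{E}\Big[ \sqrt{-1} \int_0^t M_s \hat{E}\Big( \nabla^* \psi_s(X_s) \tilde{\theta}_f^Y(s) \,\Big|\, \mathcal{F}_s \Big) \sigma(X_s) g_s \theta_g^B(s) \theta_f^Y(s) \, ds \Big] \\
&= \hat{E}\Big[ \sqrt{-1} \int_0^t M_s \nabla^* \Psi_s(X_s) \sigma(X_s) g_s \theta_g^B(s) \theta_f^Y(s) \, ds \Big].
\end{aligned} \tag{6.42}$$
From equations (6.41) and (6.42), we see that equation (6.36) holds, which then implies equation (6.35).
As a corollary, we get the following duality representation of the unnormalized filter.
Corollary 6.22 Suppose that Condition (BC) holds, $\phi \in C_b(\mathbb{R}^d)$ and $\pi_0 \in L^2(\mathbb{R}^d)$. Then
$$\langle V_t, \phi \rangle = \langle \pi_0, \psi_0 \rangle. \tag{6.43}$$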
Proof First, we assume that Condition (BD) holds. By equation (5.12), we see that
$$\hat{E}\Big( \int_0^t M_s \nabla^* \psi_s \sigma(X_s) \, dB_s \,\Big|\, \mathcal{G}_t \Big) = 0.$$
It then follows from equation (6.35) and Theorem 5.3 that
$$\langle V_t, \phi \rangle = \hat{E}\big( \phi(X_t) M_t \mid \mathcal{G}_t \big) = \hat{E}\big( \psi_0(X_0) \mid \mathcal{G}_t \big) = \langle \pi_0, \psi_0 \rangle.$$
Now we remove Condition (BD) but assume that $\phi$ and $\pi_0$ are Lipschitz continuous functions. Let $\{(a^n, b^n, c^n, h^n, \phi^n)\}$ be a sequence of functions such that for each $n$, Condition (BD) is satisfied, and as $n \to \infty$, it converges to $(a, b, c, h, \phi)$ in supremum norm. Let $V_t^n$, $\psi_0^n$ be given as before with $(a, b, c, h, \phi)$ being replaced by $(a^n, b^n, c^n, h^n, \phi^n)$. We now prove the convergence of $V_t^n$ and $\psi_0^n$ to $V_t$ and $\psi_0$ using the representation given by the Kallianpur–Striebel formula. By the proof above, we have that
$$\langle V_t^n, \phi \rangle = \langle \pi_0, \psi_0^n \rangle. \tag{6.44}$$
Although the measure $\hat{P}^n$ depends on $n$, the same process $(Y, B)$ applies to all models and is an $(m + d)$-dimensional Brownian motion under $\hat{P}^n$ for all $n$. As $(X^n, M^n)$ is the unique strong solution to
$$\begin{cases} dX_t^n = (b^n - c^n h^n)(X_t^n) \, dt + c^n(X_t^n) \, dY_t + \sigma^n(X_t^n) \, dB_t, \\ dM_t^n = M_t^n h^n(X_t^n)^* \, dY_t, \end{cases}$$
its distribution is the same under $\hat{P}^n$ and under $\hat{P}$. Thus
$$\langle V_t^n, \phi \rangle = E^{\hat{P}^n}\big( M_t^n \phi(X_t^n) \mid \mathcal{G}_t \big) = \hat{E}\big( M_t^n \phi(X_t^n) \mid \mathcal{G}_t \big).$$
Note that
$$\begin{aligned}
\hat{E}\big| \langle V_t^n, \phi \rangle - \langle V_t, \phi \rangle \big| &\le \hat{E}\big| M_t^n \phi(X_t^n) - M_t \phi(X_t) \big| \\
&\le K \hat{E}\big| M_t^n - M_t \big| + \sqrt{\hat{E} M_t^2} \, K \sqrt{\hat{E} |X_t^n - X_t|^2},
\end{aligned}$$
where we used the Lipschitz continuity and the boundedness of $\phi$ in the last inequality. By Theorem 4.10, we have $\hat{E} |X_t^n - X_t|^2 \to 0$. Similar to the proof of Theorem 4.10, we can show that $\hat{E} |M_t^n - M_t| \to 0$. Therefore, as $n \to \infty$,
$$\hat{E}\big| \langle V_t^n, \phi \rangle - \langle V_t, \phi \rangle \big| \to 0. \tag{6.45}$$
Similarly, we have
$$\hat{E}\big| \langle \pi_0, \psi_0^n \rangle - \langle \pi_0, \psi_0 \rangle \big| \to 0. \tag{6.46}$$
Taking $n \to \infty$ on both sides of equation (6.44), it follows from equations (6.45) and (6.46) that equation (6.43) holds when $\phi$ and $\pi_0$ are Lipschitz continuous functions. Finally, for general $\phi \in C_b(\mathbb{R}^d)$ and $\pi_0 \in L^2(\mathbb{R}^d)$, we can approximate them by Lipschitz functions and obtain equation (6.43) by passing to the limit as we did above.
6.6
Notes
Several theories have been developed that provide the existence and uniqueness for the solutions to linear SPDEs. For example, Krylov and Rozovskii [90], [91], Pardoux [132], Rozovskii [137] for the $L^2$-theory, Da Prato and Zabczyk [47], Krylov [88] for the $L^p$-theory, Krylov [89] for the analytic approach, and Kunita [94] for the semigroup approach. Most of the material in this chapter (except the last section) was taken from Kurtz and Xiong [97]. The method here can be applied to general linear SPDEs with Zakai’s equation as a special case. The last section is based on Crisan [38].
7
Uniqueness of the solution for the filtering equation
In this chapter, we prove the uniqueness for the solution to the filtering equation. To this end, we consider an interacting particle system whose weighted empirical distribution process satisfies the filtering equation. Note that the filtering equation is a non-linear stochastic partial differential equation (SPDE). The main idea in proving the uniqueness in this chapter is to show that the uniqueness of a non-linear SPDE is implied by that of the infinite system of ordinary stochastic differential equations and that of a corresponding linear SPDE, which follows from the same arguments as those in the previous chapter. The uniqueness for the system is obtained by a truncation argument.
7.1
An interacting particle system
In this section, we define an infinite particle system and prove that the weighted empirical measure process of this system is a solution of the filtering equation. This system will help us to prove the uniqueness for the solution to the filtering equation. Let $\beta : \mathbb R^d \times \mathcal P(\mathbb R^d) \to \mathbb R^m$ and $\tilde b : \mathbb R^d \times \mathcal P(\mathbb R^d) \to \mathbb R^d$ be given by
\[ \beta(x, \mu) = \langle \mu, h\rangle - h(x) \quad\text{and}\quad \tilde b(x, \mu) = b(x) - c(x)\beta(x, \mu). \]
We consider an interacting particle system $\{(X^i, A^i),\ i = 1, 2, \dots\}$ governed by the following equations: for $i = 1, 2, \dots$,
\[ X^i_t = X^i_0 + \int_0^t \sigma(X^i_s)\,dB^i_s + \int_0^t \tilde b(X^i_s, \mu_s)\,ds + \int_0^t c(X^i_s)\,d\nu_s, \tag{7.1} \]
\[ A^i_t = A^i_0 + \int_0^t A^i_s\, \beta^*(X^i_s, \mu_s)\,d\nu_s, \tag{7.2} \]
and
\[ \mu_t = \lim_{n\to\infty} \frac1n \sum_{i=1}^n A^i_t\, \delta_{X^i_t}, \tag{7.3} \]
where the $B^i$, $i = 1, 2, \dots$, are independent, standard $\mathbb R^d$-valued Brownian motions; $\nu$ is an $\mathbb R^m$-valued Brownian motion independent of $\{B^i : i = 1, 2, \dots\}$; and the notation $\delta_x$ stands for the Dirac point measure at $x$. More specifically, equation (7.3) means that for any $f \in C_b(\mathbb R^d)$,
\[ \langle \mu_t, f\rangle = \lim_{n\to\infty} \frac1n \sum_{i=1}^n A^i_t f(X^i_t). \]
Definition 7.1 The triple $(X, A, \mu)$ is a solution to the system of equations (7.1)–(7.3) if equations (7.1)–(7.3) are satisfied and, given the process $(\mu, \nu)$, the stochastic processes $(X^i, A^i)$, $i = 1, 2, \dots$, are conditionally independent with identical conditional distribution in the space $C(\mathbb R_+, \mathbb R^d \times \mathbb R_+)$.

To prove the uniqueness of the solution to the system, we have to assume the Lipschitz continuity of the coefficients. To this end, we need to use a metric in the space $\mathcal P(\mathbb R^d)$. We shall take the Wasserstein metric defined below. For $\nu_1, \nu_2 \in \mathcal P(\mathbb R^d)$, the Wasserstein metric is defined by
\[ \rho(\nu_1, \nu_2) = \sup\{ |\langle \nu_1, \phi\rangle - \langle \nu_2, \phi\rangle| : \phi \in B_1 \}, \]
where
\[ B_1 = \{ \phi : |\phi(x) - \phi(y)| \le |x - y|,\ |\phi(x)| \le 1,\ \forall x, y \in \mathbb R^d \}. \]
We note that the topology determined by the Wasserstein metric $\rho$ is equivalent to the topology of weak convergence on $\mathcal P(\mathbb R^d)$. Under Condition (BC), we can verify that the coefficients of the infinite particle system satisfy the following conditions (S1) and (S2):

(S1) There exists a constant $K$ such that for each $x \in \mathbb R^d$, $\nu \in \mathcal P(\mathbb R^d)$,
\[ |\sigma(x)|^2 + |\tilde b(x, \nu)|^2 + |c(x)|^2 + |\beta(x, \nu)|^2 \le K^2. \]

(S2) For each $x_1, x_2 \in \mathbb R^d$ and $\nu_1, \nu_2 \in \mathcal P(\mathbb R^d)$,
\[ |\sigma(x_1) - \sigma(x_2)|^2 + |\tilde b(x_1, \nu_1) - \tilde b(x_2, \nu_2)|^2 + |c(x_1) - c(x_2)|^2 + |\beta(x_1, \nu_1) - \beta(x_2, \nu_2)|^2 \le K^2\big(|x_1 - x_2|^2 + \rho(\nu_1, \nu_2)^2\big). \]
We assume that $(A^i_0, X^i_0)$ are independent and identically distributed random vectors, independent of $\{B^i\}$ and $\nu$. For the moment, we suppose that the system has a solution, the existence of which will be proved later (cf. Theorem 7.7 below). The following proposition establishes the finiteness of the second moments for the locations and the weights of the particles in the system.

Proposition 7.2 Suppose that Assumption (S1) holds and
\[ E(A^1_0)^2 + E|X^1_0|^2 < \infty. \tag{7.4} \]
If $(X, A, \mu)$ is a solution of equations (7.1)–(7.3), then for every $t \ge 0$ and $i \in \mathbb N$,
\[ E \sup_{0\le s\le t} \big( (A^i_s)^2 + |X^i_s|^2 \big) < \infty. \tag{7.5} \]
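Proof Applying the Burkholder–Davis–Gundy inequality to $X^i_t$ in equation (7.1) and using the inequality $(a+b+c+d)^2 \le 4(a^2+b^2+c^2+d^2)$, we have
\begin{align*}
E \sup_{0\le s\le t} |X^i_s|^2
&\le 4E|X^i_0|^2 + 16E\int_0^t |\sigma(X^i_s)|^2\,ds + 4tE\int_0^t |\tilde b(X^i_s, \mu_s)|^2\,ds + 16E\int_0^t |c(X^i_s)|^2\,ds \\
&\le 4E|X^i_0|^2 + 32K^2 t + 4K^2 t^2 < \infty.
\end{align*}
Applying Itô’s formula to $A^i_t$ in equation (7.2), we have
\[ (A^i_t)^2 = (A^i_0)^2 \exp\big( 2N^i_t - \langle N^i\rangle_t \big), \]
where
\[ N^i_t = \int_0^t \beta^*(X^i_s, \mu_s)\,d\nu_s. \tag{7.6} \]
Since $\exp\big(2N^i_t - 2\langle N^i\rangle_t\big)$ is a martingale and
\[ \langle N^i\rangle_t = \int_0^t |\beta(X^i_s, \mu_s)|^2\,ds \le K^2 t, \]
by Doob’s inequality, we get
\begin{align*}
E \sup_{0\le s\le t} (A^i_s)^2 &\le 4E(A^i_t)^2 \\
&= 4E\Big( (A^i_0)^2 \exp\big(2N^i_t - 2\langle N^i\rangle_t\big) \exp\big(\langle N^i\rangle_t\big) \Big) \\
&\le 4e^{K^2 t} E(A^i_0)^2.
\end{align*}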
The next theorem shows that the weighted empirical measure of the system is indeed a solution to the filtering equation.

Theorem 7.3 Let $\mu_t$ be the weighted empirical measure process for the particle system of equations (7.1)–(7.3). Then, for all $\phi \in C_b^2(\mathbb R^d)$,
\[ \langle \mu_t, \phi\rangle = \langle \pi_0, \phi\rangle + \int_0^t \langle \mu_s, L\phi\rangle\,ds + \int_0^t \big\langle \mu_s, \beta^*(\cdot, \mu_s)\phi + \nabla^*\phi\, c \big\rangle\,d\nu_s. \tag{7.7} \]
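Proof Applying Itô’s formula to equations (7.1) and (7.2), for every $\phi \in C_b^2(\mathbb R^d)$, we have
\begin{align*}
A^i_t \phi(X^i_t) = A^i_0 \phi(X^i_0) &+ \int_0^t A^i_s \phi(X^i_s) \beta^*(X^i_s, \mu_s)\,d\nu_s + \int_0^t A^i_s L\phi(X^i_s)\,ds \\
&+ \int_0^t A^i_s \nabla^*\phi(X^i_s)\sigma(X^i_s)\,dB^i_s + \int_0^t A^i_s \nabla^*\phi(X^i_s) c(X^i_s)\,d\nu_s. \tag{7.8}
\end{align*}
As the $B^i$, $i = 1, \dots, n$, are independent Brownian motions, we can apply the Burkholder–Davis–Gundy inequality to get
\begin{align*}
E \sup_{t\le T} \Big| \frac1n \sum_{i=1}^n \int_0^t A^i_s \nabla^*\phi(X^i_s)\sigma(X^i_s)\,dB^i_s \Big|^2
&\le \frac{4}{n^2} E \int_0^T \sum_{i=1}^n (A^i_s)^2 \big|\nabla^*\phi(X^i_s)\sigma(X^i_s)\big|^2\,ds \\
&\le \frac{4}{n} \|\nabla\phi\|_\infty^2 K^2 T\, E \sup_{s\le T} (A^1_s)^2 \to 0.
\end{align*}
Taking $\lim_{n\to\infty} \frac1n \sum_{i=1}^n$ on both sides of equation (7.8), we see that equation (7.7) holds.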
Remark 7.4 By the definition of β, we see that equation (7.7) coincides with the filtering equation (5.17).
7.2
The uniqueness of the system
In this section, we prove the uniqueness for the solution of the infinite system of stochastic differential equations. Although the coefficients are assumed
to be Lipschitz, the product $a\beta^*(x, \mu)$, which appears in the system, is only Lipschitz in $(a, x, \mu)$ in the region where $a$ is bounded. To get the uniqueness for the solution to the system, we need to use a localizing technique. Namely, we first prove the uniqueness when the weights of the particles are bounded, and then extend to the whole space. However, since there are infinitely many particles in the system, it is not possible to make their weights bounded (for example, the Borel–Cantelli lemma tells us that, in the special case when individuals are independent of each other, the probability of their weights being bounded is 0). To overcome this difficulty, we will localize a kind of average of the weights. Here is the main theorem of this section.

Theorem 7.5 Under Assumptions (S1), (S2) and equation (7.4), the system has at most one solution.

Proof Let $(X, A, \mu)$ and $(\tilde X, \tilde A, \tilde\mu)$ be two solutions of equations (7.1)–(7.3) with the same initial conditions. Recall that $(X^i, A^i)$, $i = 1, 2, \dots$, are conditionally independent with identical conditional distribution. It follows from the conditional version of the strong law of large numbers that the limit
\[ \lim_{n\to\infty} \frac1n \sum_{i=1}^n (A^i_t)^2 \]
exists almost surely. For any $m \in \mathbb N$, let
\[ \tau_m = \inf\Big\{ t : \lim_{n\to\infty} \frac1n \sum_{i=1}^n (A^i_t)^2 > m^2 \Big\}. \]
Then, $\tau_m$ is an increasing sequence of stopping times. The stopping time $\tilde\tau_m$ is defined similarly (with $A^i_t$ replaced by $\tilde A^i_t$). Let $\eta_m = \tau_m \wedge \tilde\tau_m$. Then, $\{\eta_m\}$ is again an increasing sequence of stopping times. We denote the limit of this sequence by $\eta_\infty$. By equation (7.1) and the Burkholder–Davis–Gundy inequality, we get
\begin{align*}
E|X^i_{t\wedge\eta_m} - \tilde X^i_{t\wedge\eta_m}|^2
&\le 12E\int_0^t |\sigma(X^i_s) - \sigma(\tilde X^i_s)|^2 1_{s\le\eta_m}\,ds
 + 3tE\int_0^t |\tilde b(X^i_s, \mu_s) - \tilde b(\tilde X^i_s, \tilde\mu_s)|^2 1_{s\le\eta_m}\,ds \\
&\quad + 12E\int_0^t |c(X^i_s) - c(\tilde X^i_s)|^2 1_{s\le\eta_m}\,ds \\
&\le 3K^2(8 + t)\, E\int_0^t \big( |X^i_s - \tilde X^i_s|^2 + \rho(\mu_s, \tilde\mu_s)^2 \big) 1_{s\le\eta_m}\,ds. \tag{7.9}
\end{align*}
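For $s \le \eta_m$, we have
\begin{align*}
\rho(\mu_s, \tilde\mu_s)
&= \sup_{\phi\in B_1} \Big| \lim_{n\to\infty} \frac1n \sum_{i=1}^n \big( A^i_s \phi(X^i_s) - \tilde A^i_s \phi(\tilde X^i_s) \big) \Big| \\
&\le \sup_{\phi\in B_1} \lim_{n\to\infty} \frac1n \sum_{i=1}^n A^i_s \big|\phi(X^i_s) - \phi(\tilde X^i_s)\big|
 + \sup_{\phi\in B_1} \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_s - \tilde A^i_s|\, |\phi(\tilde X^i_s)| \\
&\le \lim_{n\to\infty} \frac1n \sum_{i=1}^n A^i_s |X^i_s - \tilde X^i_s| + \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_s - \tilde A^i_s|.
\end{align*}
Consequently, by the Cauchy–Schwarz inequality, we get
\begin{align*}
\rho(\mu_s, \tilde\mu_s)
&\le \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n (A^i_s)^2 \Big)^{1/2} \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |X^i_s - \tilde X^i_s|^2 \Big)^{1/2}
 + \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_s - \tilde A^i_s| \\
&\le m \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |X^i_s - \tilde X^i_s|^2 \Big)^{1/2}
 + \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_s - \tilde A^i_s|, \tag{7.10}
\end{align*}
where the last inequality follows from $s \le \eta_m$. Let
\[ f_m(t) = E|X^i_{t\wedge\eta_m} - \tilde X^i_{t\wedge\eta_m}|^2, \]
and
\[ g_m(t) = E\Big[ \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_{t\wedge\eta_m} - \tilde A^i_{t\wedge\eta_m}| \Big)^2 \Big]. \]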
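Then,
\begin{align*}
E\rho(\mu_s, \tilde\mu_s)^2 1_{s\le\eta_m}
&\le 2m^2 E \lim_{n\to\infty} \frac1n \sum_{i=1}^n |X^i_{s\wedge\eta_m} - \tilde X^i_{s\wedge\eta_m}|^2
 + 2E\Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_{s\wedge\eta_m} - \tilde A^i_{s\wedge\eta_m}| \Big)^2 \\
&\le 2m^2 f_m(s) + 2g_m(s). \tag{7.11}
\end{align*}
By equations (7.9), (7.10), (7.11) and Fatou’s lemma, we have
\[ f_m(t) \le 3K^2(8 + t) \int_0^t \big( f_m(s) + 2m^2 f_m(s) + 2g_m(s) \big)\,ds. \tag{7.12} \]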
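Next, we derive the estimate for $g_m(t)$. As
\[ A^i_t = A^i_0 \exp\Big( N^i_t - \frac12 \langle N^i\rangle_t \Big), \]
with $N^i$ given by equation (7.6), and making use of the fact that $|e^x - e^y| \le (e^x \vee e^y)|x - y|$, we have
\[ |A^i_t - \tilde A^i_t| \le (|A^i_t| \vee |\tilde A^i_t|) \Big| \int_0^t \big( \beta^*(X^i_s, \mu_s) - \beta^*(\tilde X^i_s, \tilde\mu_s) \big)\,d\nu_s - \frac12 \int_0^t \big( |\beta(X^i_s, \mu_s)|^2 - |\beta(\tilde X^i_s, \tilde\mu_s)|^2 \big)\,ds \Big|. \]
For $t \le \eta_m$, it follows from the Cauchy–Schwarz inequality that
\begin{align*}
\Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_t - \tilde A^i_t| \Big)^2
\le{}& \lim_{n\to\infty} \frac1n \sum_{i=1}^n \big( (A^i_t)^2 \vee (\tilde A^i_t)^2 \big) \\
&\times \lim_{n\to\infty} \frac1n \sum_{i=1}^n \Big| \int_0^t \big( \beta^*(X^i_s, \mu_s) - \beta^*(\tilde X^i_s, \tilde\mu_s) \big)\,d\nu_s - \frac12 \int_0^t \big( |\beta(X^i_s, \mu_s)|^2 - |\beta(\tilde X^i_s, \tilde\mu_s)|^2 \big)\,ds \Big|^2.
\end{align*}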
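By the definition of $\eta_m$ and the inequality $(a + b)^2 \le 2(a^2 + b^2)$, we can continue with
\[ \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_t - \tilde A^i_t| \Big)^2
\le 4m^2 \lim_{n\to\infty} \frac1n \sum_{i=1}^n \Big( \Big| \int_0^t \big( \beta^*(X^i_s, \mu_s) - \beta^*(\tilde X^i_s, \tilde\mu_s) \big)\,d\nu_s \Big|^2 + \frac14\, t\, 4K^2 \int_0^t |\beta(X^i_s, \mu_s) - \beta(\tilde X^i_s, \tilde\mu_s)|^2\,ds \Big). \]
By the Lipschitz continuity of $\beta$, we finally obtain
\[ \Big( \lim_{n\to\infty} \frac1n \sum_{i=1}^n |A^i_t - \tilde A^i_t| \Big)^2
\le 4m^2 \lim_{n\to\infty} \frac1n \sum_{i=1}^n \Big( \Big| \int_0^t \big( \beta^*(X^i_s, \mu_s) - \beta^*(\tilde X^i_s, \tilde\mu_s) \big)\,d\nu_s \Big|^2 + K^4 t \int_0^t \big( |X^i_s - \tilde X^i_s|^2 + \rho(\mu_s, \tilde\mu_s)^2 \big)\,ds \Big). \]
It follows from Fatou’s lemma and the Burkholder–Davis–Gundy inequality that
\begin{align*}
g_m(t) &\le 4m^2 \lim_{n\to\infty} \frac1n \sum_{i=1}^n E\Big( 4\int_0^t |\beta(X^i_s, \mu_s) - \beta(\tilde X^i_s, \tilde\mu_s)|^2 1_{s\le\eta_m}\,ds + K^4 t \int_0^t \big( |X^i_s - \tilde X^i_s|^2 + \rho(\mu_s, \tilde\mu_s)^2 \big) 1_{s\le\eta_m}\,ds \Big) \\
&\le 4m^2 \lim_{n\to\infty} \frac1n \sum_{i=1}^n (4K^2 + K^4 t)\, E\int_0^t \big( |X^i_s - \tilde X^i_s|^2 + \rho(\mu_s, \tilde\mu_s)^2 \big) 1_{s\le\eta_m}\,ds \\
&\le 4m^2 (4K^2 + K^4 t) \int_0^t \big( f_m(s) + 2m^2 f_m(s) + 2g_m(s) \big)\,ds. \tag{7.13}
\end{align*}
Adding equations (7.12) and (7.13), for $t \le T$, we have
\[ f_m(t) + g_m(t) \le K(m, T) \int_0^t \big( f_m(s) + g_m(s) \big)\,ds, \]
where $K(m, T)$ is a constant. By Gronwall’s inequality, we have
\[ f_m(t) + g_m(t) = 0. \tag{7.14} \]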
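Then for $m, i \in \mathbb N$ and $t \in [0, T]$, we have
\[ X^i_{t\wedge\eta_m} = \tilde X^i_{t\wedge\eta_m} \quad\text{and}\quad A^i_{t\wedge\eta_m} = \tilde A^i_{t\wedge\eta_m} \quad\text{a.s.} \]
As $X^i$, $A^i$, $\tilde X^i$ and $\tilde A^i$ are continuous, we obtain almost surely that
\[ X^i_{t\wedge\eta_m} = \tilde X^i_{t\wedge\eta_m} \quad\text{and}\quad A^i_{t\wedge\eta_m} = \tilde A^i_{t\wedge\eta_m}, \qquad \forall\, m, i \in \mathbb N \text{ and } t \in [0, T]. \]
By equation (7.3), we have
\[ \mu_{t\wedge\eta_m} = \tilde\mu_{t\wedge\eta_m}, \quad \forall\, t \in [0, T], \quad\text{a.s.} \]
Hence,
\[ (X_t, A_t, \mu_t) = (\tilde X_t, \tilde A_t, \tilde\mu_t) \quad\text{for } t \le \eta_m \wedge T, \quad\text{a.s.} \]
Taking $T, m \to \infty$, we then get
\[ (X_t, A_t, \mu_t) = (\tilde X_t, \tilde A_t, \tilde\mu_t) \quad\text{for } t \le \eta_\infty, \quad\text{a.s.} \]
By the definition of $\eta_m$,
\begin{align*}
P(\eta_m \le t) &\le P\Big( \sup_{0\le s\le t} \lim_{n\to\infty} \frac1n \sum_{i=1}^n (A^i_s)^2 \ge m^2 \Big) \\
&\le \frac{1}{m^2} E \sup_{0\le s\le t} \lim_{n\to\infty} \frac1n \sum_{i=1}^n (A^i_s)^2 \\
&\le \frac{1}{m^2} \liminf_{n\to\infty} \frac1n \sum_{i=1}^n E \sup_{0\le s\le t} (A^i_s)^2 \\
&= \frac{1}{m^2} E \sup_{0\le s\le t} (A^1_s)^2,
\end{align*}
where the last inequality follows by moving the sup inside the sum and applying Fatou’s lemma, and the last equality follows from the fact that $(X^i, A^i)$, $i = 1, 2, \dots$, are conditionally independent with identical conditional distribution. Hence, by Proposition 7.2, $P(\eta_\infty \le t) = \lim_{m\to\infty} P(\eta_m \le t) = 0$, i.e. $\eta_\infty = \infty$ a.s., and the uniqueness follows.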
7.3
Uniqueness for the filtering equation
In this section, we establish the uniqueness for the solution to the filtering equation (7.7). First, let us summarize the techniques that will be used in this section. Note that the optimal filter πt is a solution of equation (7.7). We assume
the existence of another solution $\mu_t$ and fix the variable in the nonlinear functions in equation (7.7) by $\mu$ (cf. equations (7.15) and (7.19)) to obtain a linear SPDE whose uniqueness follows from arguments similar to those in Section 6.4. The uniqueness for the solution to equation (7.7) is implied by that of the linear SPDE (7.19) and that of the system of equations (7.1)–(7.3) proved in the previous section (cf. the proof of Theorem 7.7 for this argument). Namely, we decompose the difficult uniqueness problem for the non-linear SPDE into two simpler problems: one for a linear SPDE and one for a system of ordinary SDEs.

Now we fix a $\mathcal P(\mathbb R^d)$-valued process $\mu_t$ and consider the linear equation
\[ \langle \eta_t, \phi\rangle = \langle \pi_0, \phi\rangle + \int_0^t \langle \eta_s, L\phi\rangle\,ds + \int_0^t \big\langle \eta_s, \beta_s^*\phi + \nabla^*\phi\, c \big\rangle\,d\nu_s, \tag{7.15} \]
where $\beta_s(x) = \beta(x, \mu_s)$.

Recall that $H_0 = L^2(\mathbb R^d)$ is defined in Section 6.2. By following exactly the same arguments as those used in Section 6.4, we have the following theorem.

Theorem 7.6 Suppose that $\pi_0 \in H_0$. Then equation (7.15) has at most one solution.

Finally, we consider the uniqueness of the solution of the non-linear SPDE (7.7).

Theorem 7.7 Suppose that $\pi_0 \in H_0$. Then there exists a unique $H_0$-valued solution of equation (7.7). Further, $\pi_t$ is the unique solution to the system of equations (7.1)–(7.3).

Proof By Corollary 6.13, $\pi_t$ takes values in $H_0$. Let $\mu_t$ be another $H_0$-valued solution of equation (7.7). Consider the system of stochastic differential equations: for $i = 1, 2, \dots$,
\[ X^i_t = X^i_0 + \int_0^t \tilde b(X^i_s, \mu_s)\,ds + \int_0^t \sigma(X^i_s)\,dB^i_s + \int_0^t c(X^i_s)\,d\nu_s, \tag{7.16} \]
and
\[ A^i_t = A^i_0 + \int_0^t A^i_s\, \beta^*(X^i_s, \mu_s)\,d\nu_s. \tag{7.17} \]
Let $\tilde\mu_t$ be given by
\[ \tilde\mu_t = \lim_{n\to\infty} \frac1n \sum_{i=1}^n A^i_t\, \delta_{X^i_t}. \tag{7.18} \]
As in Theorem 7.3, we can prove that $\tilde\mu$ is a solution of
\[ \langle \eta_t, \phi\rangle = \langle \pi_0, \phi\rangle + \int_0^t \langle \eta_s, L\phi\rangle\,ds + \int_0^t \big\langle \eta_s, \beta^*(\cdot, \mu_s)\phi + \nabla^*\phi\, c \big\rangle\,d\nu_s. \tag{7.19} \]
By Corollary 6.13, $\tilde\mu$ is $H_0$-valued. In particular, $\tilde\mu$ is an $H_0$-valued solution of equation (7.19). Since $\mu$ is also an $H_0$-valued solution of equation (7.19), it follows from Theorem 7.6 that $\tilde\mu = \mu$. Hence, $\tilde\mu$, together with suitable $(X^i, A^i)$, $i = 1, 2, \dots$, is a solution of the system of equations (7.1)–(7.3). Note that we may replace $\mu_t$ by $\pi_t$ in equations (7.16) and (7.17) and then define $\tilde\pi_t$ by the right-hand side of equation (7.18). By an argument similar to the above, we then have $\tilde\pi_t = \pi_t$, and hence $\pi_t$ is a solution to the system of equations (7.1)–(7.3). By the uniqueness of the solution to this system, we see that $\pi_t = \mu_t$.
7.4
Notes
Limits of empirical measure processes for systems of interacting diffusions have been studied by various authors following the pioneering work by McKean [122] (see, for example, Chiang, et al. [28], Graham [71], Hitsuda and Mitoma [73], Kallianpur and Xiong [84], Méléard [123], and Morien [126]). In these papers, the driving processes in the models are assumed to be independent. The limit is then a deterministic, measure-valued function. Florchinger and Le Gland [63] and Del Moral [50] consider particle approximations for Zakai’s equation. Kotelenez [87] introduces a model of n-particles with the same driving process for each particle and studies the empirical process as the solution of an SPDE. In his model, the weights Ai are constants. Dawson and Vaillancourt [49] consider a model that corresponds to taking Ait ≡ 1. Bernard et al. [8] consider a system with time-varying weights and a deterministic limit. Kurtz and Xiong [97] study a general class of non-linear SPDEs whose solutions are represented as the weighted measure process of interacting particles systems. Most of the material in this chapter is taken from Kurtz and Xiong [97].
8
Numerical methods
Explicit solutions to the filtering equations are rarely available. Thus, to solve the filtering problems, we have to resort to numerical approximations. In this chapter, we study the numerical solutions to the optimal filters using certain particle systems. The main idea is to represent the solution to a stochastic partial differential equation through a system of weighted particles whose locations and weights are governed by stochastic differential equations that can be solved numerically. In Section 8.1, we consider a direct Monte-Carlo method based on the weighted particle-system representation. As the error in the Monte-Carlo approximation increases exponentially fast when the time parameter tends to infinity, due to the exponential growth of the variance of the weights of the particles in the system, we will modify the weight of each particle. However, we need to keep the total mass constant for the approximate filter. To this end, the number of particles in the system will be changed from time to time. We use a branching particle system to match the change of the number of particles in the system. In Section 8.2, we introduce this branching particle system to approximate the optimal filter. In Section 8.3, we give a primary estimate on the error bound between the unnormalized approximate filter and the optimal one with t fixed. Finally, in Section 8.4, we prove that the approximate filter converges uniformly in time as the number of particles tends to infinity and the step size between branching times tends to 0.
8.1
Monte-Carlo method
In this section, we will give a numerical scheme based on the weighted particle system representation equations (5.22–5.24) for the unnormalized filter. We first approximate the unnormalized filter by the weighted empirical measure of a finite system. Then, for each stochastic differential equation in that system, we approximate the solution using the Euler scheme.
First, we recall the weighted particle-system representation given in Section 5.4:
\[
\begin{cases}
dX^i_t = \tilde b(X^i_t)\,dt + c(X^i_t)\,dY_t + \sigma(X^i_t)\,dB^i_t, \\
dM^i_t = M^i_t\, h^*(X^i_t)\,dY_t, \qquad M^i_0 = 1, \quad i = 1, 2, \dots, \\
\langle V_t, f\rangle = \lim_{n\to\infty} \frac1n \sum_{i=1}^n M^i_t f(X^i_t),
\end{cases}
\tag{8.1}
\]
where $\tilde b = b - ch$. We recall that $\{X^i_0,\ i = 1, 2, \dots\}$ are i.i.d. random vectors with common distribution $\pi_0$ in $\mathbb R^d$. Let
\[ N^i_t = \int_0^t h^*(X^i_s)\,dY_s. \]
Then $N^i$ is a $\hat P$-martingale with Meyer’s process satisfying
\[ \langle N^i\rangle_t = \int_0^t |h(X^i_s)|^2\,ds \le K^2 t, \]
where the constant $K$ is given in equation (5.3). An application of Itô’s formula shows that $M^i_t$ is given by
\[ M^i_t = \exp\Big( N^i_t - \frac12 \langle N^i\rangle_t \Big). \tag{8.2} \]

Proposition 8.1 Suppose that Condition (BC) holds. Then for each $i \in \mathbb N$,
\[ \hat E \sup_{0\le s\le T} |M^i_s|^2 \le 4e^{K^2 T}. \tag{8.3} \]
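Proof By equation (8.2), $M^i_t$ is a square-integrable martingale with
\begin{align*}
\hat E (M^i_t)^2 &= \hat E \exp\big( 2N^i_t - \langle N^i\rangle_t \big) \\
&= \hat E\Big( \exp\Big( 2N^i_t - \frac12 \langle 2N^i\rangle_t \Big) \exp\big( \langle N^i\rangle_t \big) \Big) \\
&\le \hat E \exp\Big( 2N^i_t - \frac12 \langle 2N^i\rangle_t \Big)\, e^{K^2 t} \\
&= e^{K^2 t}.
\end{align*}
Consequently, equation (8.3) follows from Doob’s inequality (2.14).

Let
\[ V^n_t = \frac1n \sum_{i=1}^n M^i_t\, \delta_{X^i_t}. \tag{8.4} \]
The process $V^n_t$ is called the approximate unnormalized filter.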
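The following corollary gives the convergence of $V^n_t$ as $n \to \infty$.

Corollary 8.2 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Let $f$ be a bounded continuous function. Then, for each $t \ge 0$, there exists a constant $K_1$ such that
\[ \hat E\big| \langle V^n_t, f\rangle - \langle V_t, f\rangle \big|^2 \le \frac{K_1 \|f\|_\infty^2}{n}. \tag{8.5} \]
Proof Note that, given $\mathcal G_t$, the random vectors $\{(M^i_t, X^i_t) : i = 1, 2, \dots\}$ are conditionally independent with the same conditional distribution. For any $i$, we have
\[ \hat E\big( M^i_t f(X^i_t) \,\big|\, \mathcal G_t \big) = \langle V_t, f\rangle. \]
Thus, by equation (8.4), we have
\begin{align*}
\hat E\big| \langle V^n_t, f\rangle - \langle V_t, f\rangle \big|^2
&= \hat E\Big( \hat E\Big( \frac{1}{n^2} \Big| \sum_{i=1}^n \big( M^i_t f(X^i_t) - \hat E( M^i_t f(X^i_t) | \mathcal G_t ) \big) \Big|^2 \,\Big|\, \mathcal G_t \Big) \Big) \\
&= \frac{1}{n^2} \sum_{i=1}^n \hat E\big| M^i_t f(X^i_t) - \hat E( M^i_t f(X^i_t) | \mathcal G_t ) \big|^2 \\
&= \frac1n \hat E\big| M^1_t f(X^1_t) - \hat E( M^1_t f(X^1_t) | \mathcal G_t ) \big|^2 \\
&\le \frac{4\|f\|_\infty^2}{n} \hat E\big( (M^1_t)^2 \big). \tag{8.6}
\end{align*}
Equation (8.5) then follows from Proposition 8.1 and equation (8.6).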
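The second equality in the proof of Corollary 8.2 uses only conditional independence and conditional centring. A short worked version of that step is given here as a reading aid; the shorthand $\zeta_i$ is introduced for illustration and does not appear elsewhere in the text.

```latex
% Shorthand (illustrative): \zeta_i = M^i_t f(X^i_t) - \hat E( M^i_t f(X^i_t) | \mathcal G_t ),
% so that \hat E(\zeta_i | \mathcal G_t) = 0 for every i.
\begin{align*}
\hat E\Big( \Big| \frac1n \sum_{i=1}^n \zeta_i \Big|^2 \,\Big|\, \mathcal G_t \Big)
&= \frac{1}{n^2} \sum_{i=1}^n \hat E\big( \zeta_i^2 \,\big|\, \mathcal G_t \big)
 + \frac{1}{n^2} \sum_{i \ne l} \hat E\big( \zeta_i \,\big|\, \mathcal G_t \big)\,
   \hat E\big( \zeta_l \,\big|\, \mathcal G_t \big) \\
&= \frac{1}{n^2} \sum_{i=1}^n \hat E\big( \zeta_i^2 \,\big|\, \mathcal G_t \big),
\end{align*}
% The cross terms factorize by conditional independence given \mathcal G_t
% and vanish because each factor is zero. Taking \hat E of both sides and
% using the identical conditional distribution gives the factor 1/n.
```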
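Next, we apply the Euler scheme to approximate the solution of the finite system $\{(M^i, X^i) : i = 1, 2, \dots, n\}$. For $\delta > 0$, let
\[ \eta_\delta(t) = j\delta \quad\text{for } j\delta \le t < (j + 1)\delta. \]
Set $Z^i_t = \log M^i_t$. Define the finite system $\{(X^{\delta,i}, Z^{\delta,i}, V^{n,\delta}) : i = 1, 2, \dots, n\}$ as follows:
\[
\begin{cases}
dX^{\delta,i}_t = \tilde b(X^{\delta,i}_{\eta_\delta(t)})\,dt + c(X^{\delta,i}_{\eta_\delta(t)})\,dY_t + \sigma(X^{\delta,i}_{\eta_\delta(t)})\,dB^i_t, \\
dZ^{\delta,i}_t = h^*(X^{\delta,i}_{\eta_\delta(t)})\,dY_t - \frac12 \big| h(X^{\delta,i}_{\eta_\delta(t)}) \big|^2\,dt, \\
V^{n,\delta}_t = \frac1n \sum_{i=1}^n \exp\big( Z^{\delta,i}_t \big)\, \delta_{X^{\delta,i}_t}.
\end{cases}
\tag{8.7}
\]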
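As a reading aid, the Euler scheme (8.7) can be sketched in a few lines of Python for a scalar signal and a scalar observation. This is a minimal sketch under stated assumptions, not the book's implementation: the function names `euler_weighted_particles` and `unnormalized_filter`, the one-dimensional setting, and the convention that the observation is supplied as a list of increments of $Y$ over steps of length `delta` are all choices made here for illustration.

```python
import math
import random


def euler_weighted_particles(x0, dY, delta, b_tilde, c, sigma, h, rng):
    """Euler scheme for (8.7) in one dimension (illustrative sketch).

    x0    : list of initial particle positions
    dY    : list of observation increments Y_{(j+1)delta} - Y_{j delta}
    delta : step size
    Returns final positions X^{delta,i} and log-weights Z^{delta,i}.
    """
    xs = list(x0)
    zs = [0.0] * len(xs)              # Z_0 = log M_0 = 0
    sqrt_delta = math.sqrt(delta)
    for dy in dY:                     # the same dY drives every particle
        for i, x in enumerate(xs):
            dB = rng.gauss(0.0, sqrt_delta)   # independent noise per particle
            hx = h(x)                 # coefficients frozen at the step start
            zs[i] += hx * dy - 0.5 * hx * hx * delta
            xs[i] = x + b_tilde(x) * delta + c(x) * dy + sigma(x) * dB
    return xs, zs


def unnormalized_filter(xs, zs, f):
    """<V^{n,delta}_t, f> = (1/n) * sum_i exp(Z^i) f(X^i)."""
    return sum(math.exp(z) * f(x) for x, z in zip(xs, zs)) / len(xs)
```

With `h` identically zero the log-weights stay at zero, so the approximate unnormalized filter keeps total mass one, which is a convenient sanity check on any implementation of this kind.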
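The next theorem proves the convergence of each particle in the finite system to the corresponding one in the infinite system (8.1) as $\delta \to 0$.

Theorem 8.3 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. For each positive integer $i$ and for each $T > 0$, we have
\[ \hat E\Big( \sup_{t\le T} |X^{\delta,i}_t - X^i_t|^2 + \sup_{t\le T} |Z^{\delta,i}_t - Z^i_t|^2 \Big) \le K_1(T)\delta, \tag{8.8} \]
where $K_1(T)$ is a constant depending on $T$.

Proof By equations (8.1) and (8.7), we have
\begin{align*}
X^{\delta,i}_t - X^i_t = {}& \int_0^t \big( \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big)\,ds
 + \int_0^t \big( c(X^{\delta,i}_{\eta_\delta(s)}) - c(X^i_s) \big)\,dY_s \\
&+ \int_0^t \big( \sigma(X^{\delta,i}_{\eta_\delta(s)}) - \sigma(X^i_s) \big)\,dB^i_s.
\end{align*}
Applying the Burkholder–Davis–Gundy and Hölder inequalities, we obtain
\begin{align*}
\hat E \sup_{s\le t} |X^{\delta,i}_s - X^i_s|^2 \le {}& 3T \int_0^t \hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big|^2\,ds
 + 12 \int_0^t \hat E\big| c(X^{\delta,i}_{\eta_\delta(s)}) - c(X^i_s) \big|^2\,ds \\
&+ 12 \int_0^t \hat E\big| \sigma(X^{\delta,i}_{\eta_\delta(s)}) - \sigma(X^i_s) \big|^2\,ds. \tag{8.9}
\end{align*}
It follows from equation (8.7) that
\begin{align*}
\hat E|X^{\delta,i}_t - X^{\delta,i}_{\eta_\delta(t)}|^2
&\le 3K^2 (t - \eta_\delta(t))^2 + 3K^2 \hat E\big( |Y_t - Y_{\eta_\delta(t)}|^2 \big) + 3K^2 \hat E\big( |B^i_t - B^i_{\eta_\delta(t)}|^2 \big) \\
&\le K_2 \delta,
\end{align*}
and hence,
\begin{align*}
\hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big|^2
&\le 2\hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^{\delta,i}_s) \big|^2 + 2\hat E\big| \tilde b(X^{\delta,i}_s) - \tilde b(X^i_s) \big|^2 \\
&\le 2K^2 K_2 \delta + 2K^2 f_\delta(s),
\end{align*}
where
\[ f_\delta(t) \equiv \hat E \sup_{s\le t} |X^{\delta,i}_s - X^i_s|^2. \]
The other two terms in equation (8.9) can be estimated similarly. Therefore,
\begin{align*}
f_\delta(t) &\le 3T \int_0^t 2K^2 \big( K_2 \delta + f_\delta(s) \big)\,ds + 24 \int_0^t 2K^2 \big( K_2 \delta + f_\delta(s) \big)\,ds \\
&\le K_3(T)\delta + K_4(T) \int_0^t f_\delta(s)\,ds.
\end{align*}
By Gronwall’s inequality, we have
\[ \hat E \sup_{t\le T} |X^{\delta,i}_t - X^i_t|^2 \le K_5(T)\delta. \]
A similar inequality holds for $\hat E \sup_{t\le T} |Z^{\delta,i}_t - Z^i_t|^2$.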
To study the convergence of measure-valued processes, we need a metric in the space of finite measures. Once again, we use the Wasserstein metric introduced in Section 7.1. We recall that for $\nu_1, \nu_2 \in M_F(\mathbb R^d)$, the Wasserstein metric is given by
\[ \rho(\nu_1, \nu_2) = \sup\{ |\langle \nu_1, \phi\rangle - \langle \nu_2, \phi\rangle| : \phi \in B_1 \}, \]
where
\[ B_1 = \{ \phi : |\phi(x) - \phi(y)| \le |x - y|,\ |\phi(x)| \le 1,\ \forall x, y \in \mathbb R^d \}. \]
Under this metric, $M_F(\mathbb R^d)$ becomes a Polish space. We will be dealing with measures of the form
\[ \nu_j = \frac1n \sum_{i=1}^n a^j_i\, \delta_{x^j_i}, \qquad j = 1, 2. \]
It is useful to note that in this case, for all $\phi \in B_1$,
\begin{align*}
|\langle \nu_1, \phi\rangle - \langle \nu_2, \phi\rangle|
&\le \frac1n \sum_{i=1}^n \big( |a^1_i - a^2_i|\, |\phi(x^1_i)| + a^2_i\, |\phi(x^1_i) - \phi(x^2_i)| \big) \\
&\le \frac1n \sum_{i=1}^n (a^1_i \vee a^2_i) \big( |x^1_i - x^2_i| + |\log a^1_i - \log a^2_i| \big),
\end{align*}
where we used $|a - b| \le (a \vee b)\,|\log a - \log b|$ in the last inequality above. Hence
\[ \rho(\nu_1, \nu_2) \le \frac1n \sum_{i=1}^n (a^1_i \vee a^2_i) \big( |x^1_i - x^2_i| + |\log a^1_i - \log a^2_i| \big). \tag{8.10} \]
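The bound (8.10) is easy to check numerically for weighted point measures on the real line. The sketch below is illustrative only: the helper names `pairing` and `bound_8_10`, the restriction to dimension one, and the particular test functions (all bounded by 1 and 1-Lipschitz, hence members of $B_1$) are assumptions made for this example.

```python
import math
import random


def pairing(weights, points, phi):
    """<nu, phi> for nu = (1/n) * sum_i a_i * delta_{x_i}."""
    return sum(a * phi(x) for a, x in zip(weights, points)) / len(points)


def bound_8_10(a1, x1, a2, x2):
    """Right-hand side of (8.10); all weights must be strictly positive."""
    total = 0.0
    for p, q, u, v in zip(a1, a2, x1, x2):
        total += max(p, q) * (abs(u - v) + abs(math.log(p) - math.log(q)))
    return total / len(a1)


# A few members of B_1 on the real line: |phi| <= 1 and 1-Lipschitz.
TEST_FUNCTIONS = [math.sin, math.cos, math.tanh,
                  lambda x: max(-1.0, min(1.0, x))]

rng = random.Random(42)
n = 50
a1 = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(n)]
a2 = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(n)]
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]

bound = bound_8_10(a1, x1, a2, x2)
gaps = [abs(pairing(a1, x1, phi) - pairing(a2, x2, phi))
        for phi in TEST_FUNCTIONS]
```

Each entry of `gaps` is a lower bound for $\rho(\nu_1, \nu_2)$, and every one of them stays below `bound`, as (8.10) predicts.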
Corollary 8.4 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Then there exists a constant $K_1(T)$ such that
\[ \hat E \sup_{t\le T} \rho(V^{n,\delta}_t, V^n_t) \le K_1(T)\sqrt{\delta}. \tag{8.11} \]
Proof Note that by equation (8.10), we have
\begin{align*}
\rho(V^{n,\delta}_t, V^n_t)
&\le \frac1n \sum_{i=1}^n (M^{\delta,i}_t \vee M^i_t) \big( |X^{\delta,i}_t - X^i_t| + |Z^{\delta,i}_t - Z^i_t| \big) \\
&\le \Big( \frac1n \sum_{i=1}^n (M^{\delta,i}_t \vee M^i_t)^2 \Big)^{1/2} \Big( \frac1n \sum_{i=1}^n \big( |X^{\delta,i}_t - X^i_t| + |Z^{\delta,i}_t - Z^i_t| \big)^2 \Big)^{1/2},
\end{align*}
where the last inequality follows from the Cauchy–Schwarz inequality. The conclusion then follows from Proposition 8.1 and Theorem 8.3.

Finally, we combine both approximating procedures ($\delta \to 0$ and $n \to \infty$). The sampling error and the discretization error together give the following overall error estimate.

Theorem 8.5 Let $\bar V^n_t = V^{n,1/n}_t$. Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Then there exists a constant $K_1(t)$ such that
\[ \hat E \rho(\bar V^n_t, V_t) \le \frac{K_1(t)}{\sqrt n}. \]
We note that the weights introduced above have mean 1 and variance growing exponentially fast as $t \to \infty$. Therefore, the error associated with the numerical scheme introduced above grows exponentially fast as $t \to \infty$. To avoid this drawback of the numerical scheme, we consider in the next section a branching particle system to modify the weights of the particles at the time-discretization steps.
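The exponential growth of the weight variance can be seen in the simplest possible case. The sketch below assumes a constant scalar $h$ (an assumption made here for illustration, not a case treated in the text), so that $M_t = \exp(hY_t - h^2 t/2)$ with $Y$ a Brownian motion under $\hat P$; then $\hat E M_t = 1$ and $\hat E M_t^2 = e^{h^2 t}$. The function names are hypothetical.

```python
import math
import random


def weight_second_moment(h, t):
    """E[M_t^2] = exp(h^2 t) when h is constant (closed form)."""
    return math.exp(h * h * t)


def weight_variance(h, t):
    """Var(M_t) = exp(h^2 t) - 1, since E[M_t] = 1."""
    return weight_second_moment(h, t) - 1.0


def simulate_weights(h, t, n, rng):
    """Draw n independent copies of M_t = exp(h * Y_t - h^2 t / 2)."""
    s = math.sqrt(t)
    return [math.exp(h * rng.gauss(0.0, s) - 0.5 * h * h * t)
            for _ in range(n)]


rng = random.Random(1)
weights = simulate_weights(1.0, 1.0, 100_000, rng)
sample_mean = sum(weights) / len(weights)   # close to 1 for every t
```

The mean stays at 1 while `weight_variance(1.0, t)` grows like $e^{t}$, so a fixed number of weighted particles gives a rapidly deteriorating Monte-Carlo estimate as $t$ increases, which is the motivation for the branching correction.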
8.2
A branching particle system
In this section, we introduce the branching particle-system approximation of the optimal filter. The main purpose is to reduce the variance of the
weights of the particles in the system. The idea is to divide the time interval into small subintervals; the weight of each particle at any partition time is then modified to be an exponential martingale that depends on the signal and the noise in the small interval prior to that time, instead of on the whole interval starting from 0.

Now we proceed to the definition of the branching particle system. Initially, there are $n$ particles of weight 1 each at locations $x^n_i$, $i = 1, 2, \dots, n$, satisfying the following initial condition (I): the initial positions $\{x^n_i : i = 1, 2, \dots, n\}$ of the particles are i.i.d. random vectors in $\mathbb R^d$ with the common distribution $\pi_0 \in \mathcal P(\mathbb R^d)$. Note that, under Condition (I), we have that
\[ \pi^n_0 = \frac1n \sum_{i=1}^n \delta_{x^n_i} \to \pi_0 \]
in $\mathcal P(\mathbb R^d)$ almost surely; and for any $\phi \in C_b(\mathbb R^d)$,
\[ \hat E \langle \pi^n_0 - \pi_0, \phi\rangle^2 = \frac1n \big( \langle \pi_0, \phi^2\rangle - \langle \pi_0, \phi\rangle^2 \big) \le 4n^{-1} \|\phi\|_{0,\infty}^2, \]
where $\|\phi\|_{0,\infty}$ denotes the supremum norm of $\phi$.

Let $\delta = \delta_n = n^{-2\alpha}$, $0 < \alpha < 1$. For $j = 0, 1, 2, \dots$, there are $m^n_j$ particles alive at time $t = j\delta$. Note that $m^n_0 = n$. During the time interval $(j\delta, (j + 1)\delta)$, the particles move according to the following diffusions: for $i = 1, 2, \dots, m^n_j$,
\[ X^i_t = X^i_{j\delta} + \int_{j\delta}^t \sigma(X^i_s)\,dB^i_s + \int_{j\delta}^t \tilde b(X^i_s)\,ds + \int_{j\delta}^t c(X^i_s)\,dY_s. \tag{8.12} \]
At the end of the interval, the $i$th particle ($i = 1, 2, \dots, m^n_j$) branches (independently of the others) into a random number $\xi^i_{j+1}$ of offspring such that the conditional expectation and the conditional variance given the information prior to the branching satisfy
\[ \hat E\big( \xi^i_{j+1} \,\big|\, \mathcal F_{(j+1)\delta-} \big) = \tilde M^n_{j+1}(X^i), \]
and
\[ \mathrm{Var}^{\hat P}\big( \xi^i_{j+1} \,\big|\, \mathcal F_{(j+1)\delta-} \big) = \gamma^n_{j+1}(X^i), \tag{8.13} \]
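where $\gamma^n_{j+1}(X^i)$ is arbitrary,
\[ \tilde M^n_{j+1}(X^i) = \frac{M^n_{j+1}(X^i)}{\frac{1}{m^n_j} \sum_{\ell=1}^{m^n_j} M^n_{j+1}(X^\ell)}, \tag{8.14} \]
and
\[ M^n_{j+1}(X^i) = \exp\Big( \int_{j\delta}^{(j+1)\delta} h^*(X^i_t)\,dY_t - \frac12 \int_{j\delta}^{(j+1)\delta} |h(X^i_t)|^2\,dt \Big). \tag{8.15} \]
To minimize $\gamma^n_{j+1}$, we take
\[
\xi^i_{j+1} =
\begin{cases}
[\tilde M^n_{j+1}(X^i)] & \text{with probability } 1 - \{\tilde M^n_{j+1}(X^i)\}, \\
[\tilde M^n_{j+1}(X^i)] + 1 & \text{with probability } \{\tilde M^n_{j+1}(X^i)\},
\end{cases}
\]
where $\{x\} = x - [x]$ is the fractional part of $x$, and $[x]$ is the largest integer that is not greater than $x$. In this case, we have
\[ \gamma^n_{j+1}(X^i) = \{\tilde M^n_{j+1}(X^i)\} \big( 1 - \{\tilde M^n_{j+1}(X^i)\} \big). \]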
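The minimal-variance offspring rule can be written directly from the integer part and the fractional part of the normalized weight. The following is a small sketch with hypothetical function names; `m_tilde` stands for the value $\tilde M^n_{j+1}(X^i)$ of one particle.

```python
import math
import random


def sample_offspring(m_tilde, rng):
    """Return [m] with probability 1 - {m} and [m] + 1 with probability {m}."""
    base = math.floor(m_tilde)            # integer part [m]
    frac = m_tilde - base                 # fractional part {m}
    return base + (1 if rng.random() < frac else 0)


def offspring_variance(m_tilde):
    """Conditional variance {m} * (1 - {m}) of the rule above."""
    frac = m_tilde - math.floor(m_tilde)
    return frac * (1.0 - frac)


rng = random.Random(7)
draws = [sample_offspring(1.3, rng) for _ in range(20_000)]
empirical_mean = sum(draws) / len(draws)  # close to 1.3
```

The offspring number takes only the two integers adjacent to the weight, so its conditional mean is the weight itself and its conditional variance never exceeds $1/4$; when the weight is an integer the rule is deterministic.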
Now we define the approximate filter as follows:
\[ \pi^n_t = \frac{1}{m^n_j} \sum_{i=1}^{m^n_j} \tilde M^n_j(X^i, t)\, \delta_{X^i_t}, \qquad j\delta \le t < (j + 1)\delta, \]
where
\[ M^n_j(X^i, t) = \exp\Big( \int_{j\delta}^t h^*(X^i_s)\,dY_s - \frac12 \int_{j\delta}^t |h(X^i_s)|^2\,ds \Big), \tag{8.16} \]
and
\[ \tilde M^n_j(X^i, s) = \frac{M^n_j(X^i, s)}{\frac{1}{m^n_j} \sum_{\ell=1}^{m^n_j} M^n_j(X^\ell, s)}. \]
Namely, the $i$th particle has a time-dependent weight $\tilde M^n_j(X^i, t)$. At the end of the interval, i.e. at $t = (j + 1)\delta$, this particle dies and gives birth to a random number of offspring, whose conditional expectation is equal to the pre-death weight of the particle (note that $\tilde M^n_{j+1}(X^i) = \tilde M^n_j(X^i, (j + 1)\delta)$). The new particles start from their mother’s position with weight 1 each.
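One branching time of the hybrid filter can be sketched as follows, for particles carried as plain Python lists. The function names and the list representation are assumptions made for this illustration; the raw weights play the role of $M^n_j(X^i, (j+1)\delta)$ and are assumed nonnegative with a positive sum.

```python
import math
import random


def normalized_weights(raw):
    """Divide by the average raw weight, as in (8.14); the result sums to len(raw)."""
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def branching_step(positions, raw_weights, rng):
    """Replace each particle by a random number of offspring at its position.

    The offspring count has conditional mean equal to the normalized weight;
    every offspring restarts with weight 1, so only positions are returned.
    """
    offspring_positions = []
    for x, w in zip(positions, normalized_weights(raw_weights)):
        base = math.floor(w)
        count = base + (1 if rng.random() < w - base else 0)
        offspring_positions.extend([x] * count)
    return offspring_positions
```

Because the normalized weights sum to the current number of particles, the expected population size is unchanged by a branching step, in line with the remark that the total mass of the approximate filter is kept constant.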
The process $\pi^n_t$ is called the hybrid filter since it involves a branching particle system and the empirical measure of these weighted particles. In the earlier stage of the study of particle approximations of the optimal filter, the particle approximation is defined as $\pi^n_t$ without the weight, i.e. the particle filter is
\[ \tilde\pi^n_t = \frac{1}{m^n_j} \sum_{i=1}^{m^n_j} \delta_{X^i_t}, \qquad j\delta \le t < (j + 1)\delta. \tag{8.17} \]
Thus, the current approximate filter $\pi^n_t$ is a combination of the weighted filter introduced in Section 8.1 and the particle filter (8.17). That is why we call it the hybrid filter.

Since Zakai’s equation for the unnormalized filter $V_t$ is much simpler than the Kushner–FKK equation for the optimal filter $\pi_t$, to study the convergence of $\pi^n_t$ to $\pi_t$, it is more convenient to consider an auxiliary process first. Let
\[ \eta^n_k = \prod_{j=1}^{k} \frac{1}{m^n_{j-1}} \sum_{\ell=1}^{m^n_{j-1}} M^n_j(X^\ell). \]
For $k\delta \le t < (k + 1)\delta$, we define
\[ V^n_t = \eta^n_k\, \pi^n_t\, \frac1n \sum_{i=1}^{m^n_k} M^n_k(X^i, t) = \eta^n_k\, \frac1n \sum_{i=1}^{m^n_k} M^n_k(X^i, t)\, \delta_{X^i_t}. \]
We will prove that $V^n_t$ converges to the unnormalized filter $V_t$. In the rest of this section, we derive a few estimates for the branching particle system introduced above.

Lemma 8.6 There exists a constant $K_1$ such that for any $i = 1, 2, \dots, m^n_j$ and $0 \le j \le [T/\delta]$, we have
\[ \hat E\big( |M^n_{j+1}(X^i) - 1|^2 \,\big|\, \mathcal F_{j\delta} \big) \le K_1 \delta. \tag{8.18} \]
Proof By equation (8.16), similar to Proposition 8.1, it is easy to show that for $j\delta \le t \le (j + 1)\delta$,
\[ \hat E\big( (M^n_j(X^i, t))^2 \,\big|\, \mathcal F_{j\delta} \vee \mathcal F^i_{j\delta,(j+1)\delta} \big) \le e^{K^2 \delta}, \tag{8.19} \]
where $\mathcal F^i_{j\delta,(j+1)\delta} = \sigma\{ B^i_s - B^i_{j\delta} : j\delta \le s \le (j + 1)\delta \}$ is the $\sigma$-field generated by the increments of $B^i_t$ for $t \in [j\delta, (j + 1)\delta]$. Applying Itô’s formula to
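equation (8.16), we have
\[ M^n_j(X^i, s) = 1 + \int_{j\delta}^s M^n_j(X^i, r)\, h^*(X^i_r)\,dY_r. \tag{8.20} \]
Thus,
\begin{align*}
\hat E\big( |M^n_{j+1}(X^i) - 1|^2 \,\big|\, \mathcal F_{j\delta} \big)
&= \hat E\Big( \int_{j\delta}^{(j+1)\delta} |M^n_j(X^i, r)|^2\, |h(X^i_r)|^2\,dr \,\Big|\, \mathcal F_{j\delta} \Big) \\
&\le \int_{j\delta}^{(j+1)\delta} e^{K^2 \delta} K^2\,dr \\
&= K^2 e^{K^2 \delta} \delta. \tag{8.21}
\end{align*}
The inequality (8.18) then follows by taking $K_1 = K^2 e^{K^2}$.

Remark 8.7 In the proof of equation (8.21), we have not used the full strength of equation (8.19). Instead, we only need $\hat E\big( (M^n_j(X^i, t))^2 \,\big|\, \mathcal F_{j\delta} \big) \le e^{K^2 \delta}$. The full strength of equation (8.19) will be needed in the proof of Theorem 8.10.

Lemma 8.8 For each $1 \le j \le [T/\delta]$, we have
\[ \hat E\big( m^n_j (\eta^n_j)^2 \big) \le n e^{K^2 T}. \]
Proof As
\[ m^n_j = \sum_{i=1}^{m^n_{j-1}} \xi^i_j, \]
we have
\[ \hat E\big( m^n_j \,\big|\, \mathcal F_{j\delta-} \big) = \sum_{i=1}^{m^n_{j-1}} \hat E\big( \xi^i_j \,\big|\, \mathcal F_{j\delta-} \big) = \sum_{i=1}^{m^n_{j-1}} \tilde M^n_j(X^i) = m^n_{j-1}. \tag{8.22} \]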
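By equation (8.19), for $1 \le i \le m^n_{j-1}$, we have
\[ \hat E\big( M^n_j(X^i)^2 \,\big|\, \mathcal F_{(j-1)\delta} \big) \le e^{K^2 \delta}. \tag{8.23} \]
Thus,
\begin{align*}
\hat E\Big( \big( \eta^n_j / \eta^n_{j-1} \big)^2 \,\Big|\, \mathcal F_{(j-1)\delta} \Big)
&= \hat E\Big( \Big( \frac{1}{m^n_{j-1}} \sum_{k=1}^{m^n_{j-1}} M^n_j(X^k) \Big)^2 \,\Big|\, \mathcal F_{(j-1)\delta} \Big) \\
&\le \frac{1}{m^n_{j-1}} \sum_{k=1}^{m^n_{j-1}} \hat E\big( M^n_j(X^k)^2 \,\big|\, \mathcal F_{(j-1)\delta} \big) \\
&\le e^{K^2 \delta}. \tag{8.24}
\end{align*}
As $\eta^n_j$ is $\mathcal F_{j\delta-}$-measurable and $\eta^n_{j-1}$ is $\mathcal F_{(j-1)\delta}$-measurable, it follows from equations (8.22) and (8.24) that
\begin{align*}
\hat E\big( m^n_j (\eta^n_j)^2 \big)
&= \hat E\Big( \hat E\big( m^n_j (\eta^n_j)^2 \,\big|\, \mathcal F_{j\delta-} \big) \Big) \\
&= \hat E\big( m^n_{j-1} (\eta^n_j)^2 \big) \\
&= \hat E\Big( m^n_{j-1} (\eta^n_{j-1})^2\, \hat E\Big( \big( \eta^n_j / \eta^n_{j-1} \big)^2 \,\Big|\, \mathcal F_{(j-1)\delta} \Big) \Big) \\
&\le e^{K^2 \delta}\, \hat E\big( m^n_{j-1} (\eta^n_{j-1})^2 \big).
\end{align*}
By induction, we get
\[ \hat E\big( m^n_j (\eta^n_j)^2 \big) \le e^{K^2 T} m^n_0 = n e^{K^2 T}, \qquad \text{for } 1 \le j \le [T/\delta]. \]

Finally, we give an estimate of the conditional variance $\gamma^n_j(X^i)$.

Lemma 8.9 There exists a constant $K_1$ such that for any $j \ge 0$ and $i = 1, 2, \dots, m^n_j$, we have
\[ \hat E\Big( \gamma^n_{j+1}(X^i) \big( \eta^n_{j+1} / \eta^n_j \big)^2 \,\Big|\, \mathcal F_{j\delta} \Big) \le K_1 \sqrt{\delta}. \]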
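Proof As
\[ \gamma^n_{j+1}(X^i) \le \big| \tilde M^n_{j+1}(X^i) - 1 \big|, \]
we have
\begin{align*}
\hat E\Big( \gamma^n_{j+1}(X^i) \big( \eta^n_{j+1} / \eta^n_j \big)^2 \,\Big|\, \mathcal F_{j\delta} \Big)
&\le \hat E\Big( \big| \tilde M^n_{j+1}(X^i) - 1 \big| \Big( \frac{1}{m^n_j} \sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \Big)^2 \,\Big|\, \mathcal F_{j\delta} \Big) \\
&\le \hat E\Big( \Big( \big| M^n_{j+1}(X^i) - 1 \big| + \Big| \frac{1}{m^n_j} \sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) - 1 \Big| \Big) \frac{1}{m^n_j} \sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \,\Big|\, \mathcal F_{j\delta} \Big) \\
&\le \Big( \Big\{ \hat E\big( |M^n_{j+1}(X^i) - 1|^2 \,\big|\, \mathcal F_{j\delta} \big) \Big\}^{1/2} + \Big\{ \hat E\Big( \Big| \frac{1}{m^n_j} \sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) - 1 \Big|^2 \,\Big|\, \mathcal F_{j\delta} \Big) \Big\}^{1/2} \Big) \\
&\qquad \times \Big\{ \hat E\Big( \Big| \frac{1}{m^n_j} \sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \Big|^2 \,\Big|\, \mathcal F_{j\delta} \Big) \Big\}^{1/2} \\
&\le 2\sqrt{K_2 \delta}\, e^{K^2/2} = K_1 \sqrt{\delta},
\end{align*}
where $K_1 = 2\sqrt{K_2}\, e^{K^2/2}$.

8.3
Convergence of $V^n_t$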
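In this section, we consider the convergence of $V^n_t$ to $V_t$ with $t$ fixed. We recall that $\{\psi_s,\ 0 \le s \le t\}$ is the solution to the backward SPDE (6.30). Let $k\delta \le t < (k + 1)\delta$. First, we note that $\langle V^n_{k\delta}, \psi_{k\delta}\rangle - \langle V^n_0, \psi_0\rangle$ can be written as a telescopic sum
\[ \sum_{j=1}^{k} \Big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle - \langle V^n_{(j-1)\delta}, \psi_{(j-1)\delta}\rangle \Big). \]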
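As $\psi_t = \phi$, we get
\begin{align*}
\langle V^n_t, \phi\rangle - \langle V^n_0, \psi_0\rangle
={}& \langle V^n_t, \psi_t\rangle - \langle V^n_{k\delta}, \psi_{k\delta}\rangle \\
&+ \sum_{j=1}^{k} \Big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle - \hat E\big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle \,\big|\, \mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t} \big) \Big) \\
&+ \sum_{j=1}^{k} \Big( \hat E\big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle \,\big|\, \mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t} \big) - \langle V^n_{(j-1)\delta}, \psi_{(j-1)\delta}\rangle \Big) \\
\equiv{}& I^n_1 + I^n_2 + I^n_3, \tag{8.25}
\end{align*}
where $\mathcal G_{s,t} = \sigma(Y_u - Y_s : s \le u \le t)$. By the definition of $V^n$, we have
\[ I^n_1 = \eta^n_k\, \frac1n \sum_{i=1}^{m^n_k} \big( M^n_k(X^i, t)\, \psi_t(X^i_t) - \psi_{k\delta}(X^i_{k\delta}) \big). \]
Note that
\begin{align*}
\hat E\big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle \,\big|\, \mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t} \big)
&= \hat E\Big( \eta^n_j\, \frac1n \sum_{i=1}^{m^n_{j-1}} \xi^i_j\, \psi_{j\delta}(X^i_{j\delta}) \,\Big|\, \mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t} \Big) \\
&= \eta^n_j\, \frac1n \sum_{i=1}^{m^n_{j-1}} \psi_{j\delta}(X^i_{j\delta})\, \hat E\big( \xi^i_j \,\big|\, \mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t} \big) \\
&= \eta^n_j\, \frac1n \sum_{i=1}^{m^n_{j-1}} \psi_{j\delta}(X^i_{j\delta})\, \hat E\big( \xi^i_j \,\big|\, \mathcal F_{j\delta-} \big) \\
&= \eta^n_j\, \frac1n \sum_{i=1}^{m^n_{j-1}} \psi_{j\delta}(X^i_{j\delta})\, \tilde M^n_j(X^i), \tag{8.26}
\end{align*}
where the second equality follows from the measurability of $\eta^n_j$, $m^n_{j-1}$ and $\psi_{j\delta}(X^i_{j\delta})$ with respect to $\mathcal F_{j\delta-} \vee \mathcal G_{j\delta,t}$, and the third equality follows from the fact that $Y_t$ has independent increments. Thus,
\[ I^n_2 = \sum_{j=1}^{k} \eta^n_j\, \frac1n \sum_{i=1}^{m^n_{j-1}} \psi_{j\delta}(X^i_{j\delta}) \big( \xi^i_j - \tilde M^n_j(X^i) \big). \]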
By equation (8.26) and the definitions of V n and ηn , we have
I3n =
k j=1
=
k
⎞ mnj−1 mnj−1 1 1 i ˜ n n ⎝ηn ψjδ (Xjδ )Mj (X i ) − ηj−1 ψ( j−1)δ (X(i j−1)δ )⎠ j n n ⎛
i=1
i=1
mnj−1
n ηj−1
j=1
1 i )Mjn (X i ) − ψ( j−1)δ (X(i j−1)δ ) . ψjδ (Xjδ n i=1
Theorem 8.10 Suppose that the conditions (BD) and (I) hold. Then there exists a constant $K_1$ such that
$$\hat E\big|\langle V^n_t,\phi\rangle-\langle V_t,\phi\rangle\big|^2\le K_1n^{-(1-\alpha)}\|\phi\|^2_{k,2},$$
where $k=\frac d2+2$ is given in Condition (BD).
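Proof Denote the term in the summation in $I_3^n$ by $a_j$, i.e. $I_3^n=\sum_{j=1}^ka_j$. By equation (6.35), we get
$$a_{j+1}=\eta^n_j\frac1n\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}M^n_j(X^i,s)\nabla^*\psi_s\sigma(X^i_s)\,dB^i_s.$$
Thus,
$$\hat E\big(a_{j+1}\,\big|\,\mathcal F_{j\delta}\big)=\eta^n_j\frac1n\sum_{i=1}^{m^n_j}\hat E\Big(\int_{j\delta}^{(j+1)\delta}M^n_j(X^i,s)\nabla^*\psi_s\sigma(X^i_s)\,dB^i_s\,\Big|\,\mathcal F_{j\delta}\Big)=0,$$
and hence,
$$\hat E\big((I_3^n)^2\big)=\sum_{j=0}^{k-1}\hat E\Big(\eta^n_j\frac1n\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}M^n_j(X^i,s)\nabla^*\psi_s\sigma(X^i_s)\,dB^i_s\Big)^2.$$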
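Since $\{(B^i_s-B^i_{j\delta},X^i_s): j\delta\le s\le(j+1)\delta\}$, $i=1,2,\dots,m^n_j$, are conditionally (given $\mathcal F_{j\delta}\vee\mathcal G_t$) independent, we can continue with
$$\begin{aligned}
\hat E\big((I_3^n)^2\big)
&=\sum_{j=0}^{k-1}\hat E\Big(\hat E\Big(\Big(\frac1n\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}M^n_j(X^i,s)\nabla^*\psi_s\sigma(X^i_s)\,dB^i_s\Big)^2\,\Big|\,\mathcal F_{j\delta}\vee\mathcal G_t\Big)(\eta^n_j)^2\Big)\\
&=\hat E\sum_{j=0}^{k-1}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}M^n_j(X^i,s)^2\big|\nabla^*\psi_s\sigma(X^i_s)\big|^2(\eta^n_j)^2\,ds.
\end{aligned}$$
Conditioning on $\mathcal F_{j\delta}\vee\mathcal F^i_{j\delta,(j+1)\delta}$, we then get
$$\begin{aligned}
\hat E\big((I_3^n)^2\big)
&=\hat E\sum_{j=0}^{k-1}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}\hat E\Big(M^n_j(X^i,s)^2\big|\nabla^*\psi_s\sigma(X^i_s)\big|^2\,\Big|\,\mathcal F_{j\delta}\vee\mathcal F^i_{j\delta,(j+1)\delta}\Big)(\eta^n_j)^2\,ds\\
&=\hat E\sum_{j=0}^{k-1}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}\hat E\Big(M^n_j(X^i,s)^2\,\Big|\,\mathcal F_{j\delta}\vee\mathcal F^i_{j\delta,(j+1)\delta}\Big)\hat E\Big(\big|\nabla^*\psi_s\sigma(X^i_s)\big|^2\,\Big|\,\mathcal F_{j\delta}\vee\mathcal F^i_{j\delta,(j+1)\delta}\Big)(\eta^n_j)^2\,ds,
\end{aligned}$$
where the last equality follows from the independent increments of $Y$ and the fact that, given $\mathcal F_{j\delta}\vee\mathcal F^i_{j\delta,(j+1)\delta}$, $M^n_j(X^i,s)$ is $\mathcal G_s$-measurable and $\psi_s$ is $\mathcal G_{s,t}$-measurable. Using equation (8.19), we can continue with the above estimate as follows:
$$\hat E\big((I_3^n)^2\big)\le e^{K^2\delta}\,\hat E\sum_{j=0}^{k-1}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{(j+1)\delta}\hat E\Big(\|\nabla^*\psi_s\|^2_{0,\infty}\,\Big|\,\mathcal F_{j\delta}\Big)\|\sigma\|^2_{0,\infty}(\eta^n_j)^2\,ds.$$
Since $Y$ has independent increments and $\psi_s$ is $\mathcal G_{s,t}$-measurable, for $s\ge j\delta$, we have
$$\hat E\Big(\|\nabla^*\psi_s\|^2_{0,\infty}\,\Big|\,\mathcal F_{j\delta}\Big)=\hat E\|\nabla^*\psi_s\|^2_{0,\infty}\le K_2\|\phi\|^2_{k,2}.$$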
Thus, Eˆ ((I3n )2 ) ≤ K3 n−2 φ2k,2 Eˆ mnj (ηjn )2 ≤ K4 n−1 φ2k,2 ,
(8.27)
where the last inequality follows from Lemma 8.8. It follows from the independent increment property of Y that ˆ ξ i − M ˜ n (X i )Fj δ− ∨ Gt = E ˜ n (X i )Fj δ− = 0. Eˆ ξji − M j j j Thus, for j < j , we have mn
mn
j−1 j −1 1 i i n i 1 ˆ ˜ ˜ n (X i ))ηn ηn E ψjδ (Xjδ )(ξj − Mj (X )) ψj δ (Xji δ )(ξji − M j−1 j j n n
i=1
i=1
n
mj −1 1 i i n i 1 ˆ ˜ =E ψjδ (Xjδ )(ξj − Mj (X )) ψj δ (Xji δ ) n n i=1 i=1 i n i n n ˆ ˜ × E(ξj − Mj (X )|Fj δ− ∨ Gt )ηj−1 ηj mnj−1
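$=0$. Therefore,
$$\begin{aligned}
\hat E\big((I_2^n)^2\big)
&=\hat E\Big(\sum_{j=1}^k\eta^n_j\frac1n\sum_{i=1}^{m^n_{j-1}}\psi_{j\delta}(X^i_{j\delta})\big(\xi^i_j-\tilde M^n_j(X^i)\big)\Big)^2\\
&=\sum_{j=1}^k\hat E\Big(\eta^n_j\frac1n\sum_{i=1}^{m^n_{j-1}}\psi_{j\delta}(X^i_{j\delta})\big(\xi^i_j-\tilde M^n_j(X^i)\big)\Big)^2\\
&=\sum_{j=1}^k\hat E\Big(\hat E\Big(\Big(\frac1n\sum_{i=1}^{m^n_{j-1}}\psi_{j\delta}(X^i_{j\delta})\big(\xi^i_j-\tilde M^n_j(X^i)\big)\Big)^2\,\Big|\,\mathcal F_{j\delta-}\vee\mathcal G_t\Big)(\eta^n_j)^2\Big)\\
&=\hat E\sum_{j=1}^k\frac1{n^2}\sum_{i=1}^{m^n_{j-1}}\psi_{j\delta}(X^i_{j\delta})^2\gamma^n_j(X^i)(\eta^n_j)^2.
\end{aligned}$$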
By Lemma 8.9, we can continue the above estimate with n
mj−1 k 1 ˆ n 2 2 n i n 2 ˆ ˆ E I2 E γ (X )(η ) F =E ψ jδ ( j−1)δ 0,∞ j j n2 j=1
i=1 n
mj−1 k √ 1 ˆ 2 n 2 ˆ ≤E ψ E δ jδ 0,∞ K5 (η( j−1)δ ) 2 n j=1
i=1
k √ 1 ˆ n E mj−1 (η(nj−1)δ )2 ≤ K6 δφ2k,2 2 n j=1
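$$\le K_6\sqrt{\delta}\,\|\phi\|^2_{k,2}\frac1{n^2}kne^{K^2T}\le K_7n^{-(1-\alpha)}\|\phi\|^2_{k,2}.$$
The term $I_1^n$ can be estimated in a manner similar to the estimate for $I_3^n$. The conclusion then follows from equation (8.25) and the fact that
$$\hat E\big|\langle V^n_0,\psi_0\rangle-\langle V_t,\phi\rangle\big|^2=\hat E\big|\langle V^n_0-V_0,\psi_0\rangle\big|^2\le4n^{-1}\hat E\|\psi_0\|^2_{0,\infty}\le K_8n^{-1}\|\phi\|^2_{k,2},$$
where the first inequality follows from Condition (I).

Remark 8.11 For the case of $\tilde\pi^n_t$, we define $\tilde V^n_t=\frac{m^n_k}{n}\eta^n_k\tilde\pi^n_t$ for $k\delta\le t<(k+1)\delta$. In that case,
$$\langle\tilde V^n_t,\phi\rangle-\langle V^n_0,\phi\rangle=I_0^n+I_1^n+I_2^n+I_3^n,$$
where $I_j^n$, $j=1,2,3$, are as before and
$$I_0^n=\eta^n_k\frac1n\sum_{i=1}^{m^n_k}\big(1-M^n_k(X^i,t)\big)\phi(X^i_t).$$
It is easy to show that $\hat E|I_0^n|^2\le K_1\|\phi\|^2_{0,2}\,n^{-2\alpha}$.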
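Therefore,
$$\hat E\big|\langle\tilde V^n_t,\phi\rangle-\langle V_t,\phi\rangle\big|^2\le K_2\big(n^{-(1-\alpha)}\vee n^{-2\alpha}\big)\|\phi\|^2_{k,2}.$$
For this case, the optimal $\alpha$ is $\frac13$.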
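The value $\alpha=\frac13$ balances the two rates $n^{-(1-\alpha)}$ and $n^{-2\alpha}$ in the bound. The following Python sketch illustrates this numerically; the function name `rate_exponent` and the grid resolution are illustrative assumptions, not part of the text.

```python
def rate_exponent(alpha):
    """Decay exponent of n^{-(1-alpha)} v n^{-2*alpha}: the slower rate dominates."""
    return min(1.0 - alpha, 2.0 * alpha)

# Scan a grid of alpha values in (0, 1); the grid size is an arbitrary choice.
grid = [k / 3000 for k in range(1, 3000)]
best_alpha = max(grid, key=rate_exponent)
best_rate = rate_exponent(best_alpha)
print(best_alpha, best_rate)
```

The maximiser sits where $1-\alpha=2\alpha$, so the bound decays like $n^{-2/3}$ for this choice.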
8.4
Convergence of Vn
In this section, we study the convergence of $V^n$, regarding $V^n$ as a sequence of stochastic processes taking values in $M_F(\mathbb R^d)$. More specifically, we derive the convergence rate uniformly for $t$ in any finite interval. The main idea of this section is to obtain an equation for the process $V^n_t$ and then to derive a maximal inequality by making use of martingale theory. First, we consider the equation satisfied by $V^n_t$. Let $j\delta\le t<(j+1)\delta$. Applying Itô's formula to equation (8.12), we get
$$df(X^i_t)=\tilde Lf(X^i_t)\,dt+\nabla^*f\sigma(X^i_t)\,dB^i_t+\nabla^*fc(X^i_t)\,dY_t,$$
where $\tilde Lf=Lf-h^*c^*\nabla f$ is defined in Section 5.3. Combining it with equation (8.20), by Itô's formula again, we have
$$d\big(M^n_j(X^i,t)f(X^i_t)\big)=M^n_j(X^i,t)Lf(X^i_t)\,dt+M^n_j(X^i,t)\nabla^*f\sigma(X^i_t)\,dB^i_t+M^n_j(X^i,t)\big(\nabla^*fc+fh\big)(X^i_t)\,dY_t.$$
It follows from the definition of $V^n_t$ that
$$d\langle V^n_t,f\rangle=\langle V^n_t,Lf\rangle\,dt+\eta^n_j\frac1n\sum_{i=1}^{m^n_j}M^n_j(X^i,t)\nabla^*f\sigma(X^i_t)\,dB^i_t+\langle V^n_t,\nabla^*fc+fh\rangle\,dY_t.$$
The jump of $V^n_t$ at $t=(j+1)\delta$ is
$$\eta^n_{j+1}\frac1n\sum_{i=1}^{m^n_j}\xi^i_{j+1}\delta_{X^i_{(j+1)\delta}}-\eta^n_j\frac1n\sum_{i=1}^{m^n_j}M^n_{j+1}(X^i)\delta_{X^i_{(j+1)\delta}}=\eta^n_{j+1}\frac1n\sum_{i=1}^{m^n_j}\big(\xi^i_{j+1}-\tilde M^n_{j+1}(X^i)\big)\delta_{X^i_{(j+1)\delta}}.$$
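Therefore,
$$\langle V^n_t,f\rangle=\langle V^n_0,f\rangle+\int_0^t\langle V^n_s,Lf\rangle\,ds+\int_0^t\langle V^n_s,\nabla^*fc+h^*f\rangle\,dY_s+N^{n,f}_t+\hat N^{n,f}_t,\tag{8.28}$$
where
$$N^{n,f}_t=\sum_{j=0}^{[t/\delta]}\frac1n\sum_{i=1}^{m^n_j}\int_{j\delta}^{((j+1)\delta)\wedge t}\nabla^*f\sigma(X^i_s)\,dB^i_s\,\eta^n_j,$$
and
$$\hat N^{n,f}_t=\sum_{j=1}^{[t/\delta]}\eta^n_j\frac1n\sum_{i=1}^{m^n_{j-1}}\big(\xi^i_j-\tilde M^n_j(X^i)\big)f(X^i_{j\delta}).$$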
We will need the following lemma to help us calculate Meyer's process for discontinuous martingales.

Lemma 8.12 Suppose that $\{\zeta_i, i=1,2,\dots\}$ is a sequence of square-integrable random variables such that $\zeta_i$ is $\mathcal F_i$-measurable and
$$E\big(\zeta_i\,\big|\,\mathcal F_{i-1}\big)=0,\qquad\forall\,i=1,2,\dots.$$
Let
$$M_t=\sum_{i=1}^{[t]}\zeta_i.$$
Then $\{M_t, t\ge0\}$ is a square-integrable martingale with Meyer's process
$$\langle M\rangle_t=\sum_{i=1}^{[t]}E\big(\zeta_i^2\,\big|\,\mathcal F_{i-1}\big).\tag{8.29}$$
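Proof Let $\tilde{\mathcal F}_s=\mathcal F_{[s]}$, $s\ge0$. Then $M_t$ is $\tilde{\mathcal F}_t$-adapted. For $s<t$, we have
$$E\big(M_t-M_s\,\big|\,\tilde{\mathcal F}_s\big)=E\Big(\sum_{i=[s]+1}^{[t]}\zeta_i\,\Big|\,\mathcal F_{[s]}\Big)=\sum_{i=[s]+1}^{[t]}E\big(\zeta_i\,\big|\,\mathcal F_{[s]}\big)=0.$$
This proves that $M_t$ is a martingale. Denote the right-hand side of equation (8.29) by $\gamma_t$. Then $\gamma_t$ is predictable and
$$\begin{aligned}
E\Big(M_t^2-\gamma_t-\big(M_s^2-\gamma_s\big)\,\Big|\,\tilde{\mathcal F}_s\Big)
&=E\Big(\Big(\sum_{i=1}^{[t]}\zeta_i\Big)^2-\Big(\sum_{i=1}^{[s]}\zeta_i\Big)^2-\sum_{i=[s]+1}^{[t]}E\big(\zeta_i^2\,\big|\,\mathcal F_{i-1}\big)\,\Big|\,\mathcal F_{[s]}\Big)\\
&=E\Big(\Big(\sum_{i=[s]+1}^{[t]}\zeta_i\Big)^2+2\sum_{i=[s]+1}^{[t]}\zeta_i\sum_{j=1}^{[s]}\zeta_j-\sum_{i=[s]+1}^{[t]}E\big(\zeta_i^2\,\big|\,\mathcal F_{i-1}\big)\,\Big|\,\mathcal F_{[s]}\Big)\\
&=\sum_{i=[s]+1}^{[t]}E\big(\zeta_i^2\,\big|\,\mathcal F_{[s]}\big)-E\Big(\sum_{i=[s]+1}^{[t]}E\big(\zeta_i^2\,\big|\,\mathcal F_{i-1}\big)\,\Big|\,\mathcal F_{[s]}\Big)\\
&=0.
\end{aligned}$$
Thus, $\langle M\rangle_t=\gamma_t$.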
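A consequence of Lemma 8.12 is that $E(M_t^2)=E\langle M\rangle_t$. The Python sketch below checks this identity exactly on a toy example by enumerating all paths of fair coin flips $\varepsilon_i=\pm1$. The increments $\zeta_i=\varepsilon_i\,s_{i-1}$, with $s_{i-1}$ depending only on the past, and the helper names `scale` and `check` are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def scale(past):
    """Hypothetical F_{i-1}-measurable factor: 1 + number of +1 flips so far."""
    return 1 + sum(1 for e in past if e == 1)

def check(n):
    """Return (E[M_n^2], E[<M>_n]) by enumerating all 2^n fair coin paths."""
    prob = Fraction(1, 2 ** n)
    second_moment = Fraction(0)
    meyer_mean = Fraction(0)
    for path in product((-1, 1), repeat=n):
        m = 0        # M_i = zeta_1 + ... + zeta_i
        bracket = 0  # <M>_i = sum of E[zeta_j^2 | F_{j-1}]
        for i in range(n):
            s = scale(path[:i])
            m += path[i] * s   # zeta_i = eps_i * s has conditional mean 0
            bracket += s * s   # E[zeta_i^2 | F_{i-1}] = s^2
        second_moment += prob * m * m
        meyer_mean += prob * bracket
    return second_moment, meyer_mean

lhs, rhs = check(6)
print(lhs, rhs)
```

Exact rational arithmetic is used so that the two expectations can be compared without rounding error.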
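As an application of the lemma, we have

Corollary 8.13 The processes $N^{n,f}_t$ and $\hat N^{n,f}_t$ are two uncorrelated martingales with Meyer's processes
$$\big\langle N^{n,f}\big\rangle_t=\sum_{j=0}^{[t/\delta]}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{((j+1)\delta)\wedge t}\hat E\Big(\big|\nabla^*f\sigma(X^i_s)\big|^2\,\Big|\,\mathcal F_{j\delta}\Big)ds\,(\eta^n_j)^2,$$
and
$$\big\langle\hat N^{n,f}\big\rangle_t=\sum_{j=1}^{[t/\delta]}\frac1{n^2}\sum_{i=1}^{m^n_{j-1}}\hat E\Big(\gamma^n_j(X^i)f^2(X^i_{j\delta})(\eta^n_j)^2\,\Big|\,\mathcal F_{(j-1)\delta}\Big).$$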
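Proof We prove the second equality only. By Lemma 8.12, we have
$$\begin{aligned}
\big\langle\hat N^{n,f}\big\rangle_t
&=\sum_{j=1}^{[t/\delta]}\frac1{n^2}\hat E\Big(\Big(\sum_{i=1}^{m^n_{j-1}}\big(\xi^i_j-\tilde M^n_j(X^i)\big)f(X^i_{j\delta})\Big)^2(\eta^n_j)^2\,\Big|\,\mathcal F_{(j-1)\delta}\Big)\\
&=\sum_{j=1}^{[t/\delta]}\frac1{n^2}\hat E\Big(\hat E\Big(\Big(\sum_{i=1}^{m^n_{j-1}}\big(\xi^i_j-\tilde M^n_j(X^i)\big)f(X^i_{j\delta})\Big)^2\,\Big|\,\mathcal F_{j\delta-}\Big)(\eta^n_j)^2\,\Big|\,\mathcal F_{(j-1)\delta}\Big)\\
&=\sum_{j=1}^{[t/\delta]}\frac1{n^2}\sum_{i=1}^{m^n_{j-1}}\hat E\Big(\gamma^n_j(X^i)f^2(X^i_{j\delta})(\eta^n_j)^2\,\Big|\,\mathcal F_{(j-1)\delta}\Big).
\end{aligned}\tag{8.30}$$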
Define the usual distance on $M_F(\mathbb R^d)$ by
$$d(\nu_1,\nu_2)=\sum_{i=0}^\infty2^{-i}\big(|\langle\nu_1-\nu_2,f_i\rangle|\wedge1\big),\qquad\forall\,\nu_1,\nu_2\in M_F(\mathbb R^d),\tag{8.31}$$
where $f_0=1$ and, for $i\ge1$, $f_i\in C_b^{k+2}(\mathbb R^d)\cap W_2^{k+2}(\mathbb R^d)$ with $\|f_i\|_{k+2,\infty}\le1$ and also $\|f_i\|_{k+2,2}\le1$, where $k=\frac d2+2$ is given in Condition (BD).

Theorem 8.14 Suppose that the conditions (BD) and (I) hold and, additionally, that $h\in C_b^k(\mathbb R^d)\cap W_2^k(\mathbb R^d)$. Then, there exists a constant $K_1$ such that
$$\hat E\sup_{t\le T}d(V^n_t,V_t)^2\le K_1n^{-(1-\alpha)}.\tag{8.32}$$
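Proof Let $\tilde d(\nu_1,\nu_2)$ be defined as in equation (8.31) with "$\wedge1$" removed. Note that
$$\hat E\sup_{t\le T}\tilde d(V^n_t,V_t)^2\le\sum_{i=1}^\infty2^{-i}\hat E\sup_{t\le T}\big|\langle V^n_t-V_t,f_i\rangle\big|^2+\hat E\sup_{t\le T}\big|\langle V^n_t-V_t,1\rangle\big|^2,\tag{8.33}$$
and
$$\begin{aligned}
\hat E\sup_{t\le T}\big|\langle V^n_t-V_t,f\rangle\big|^2
&\le K_2\hat E\big|\langle V^n_0-V_0,f\rangle\big|^2+K_2\int_0^T\hat E\big|\langle V^n_t-V_t,Lf\rangle\big|^2dt\\
&\quad+K_2\int_0^T\hat E\big|\langle V^n_t-V_t,\nabla^*fc+hf\rangle\big|^2dt\\
&\quad+K_2\hat E\sum_{j=0}^{[T/\delta]}\frac1{n^2}\sum_{i=1}^{m^n_j}\int_{j\delta}^{((j+1)\delta)\wedge T}\big|\nabla^*f\sigma(X^i_s)\big|^2ds\,(\eta^n_j)^2\\
&\quad+K_2\hat E\sum_{j=1}^{[T/\delta]}\frac1{n^2}\sum_{i=1}^{m^n_{j-1}}\gamma^n_j(X^i)f^2(X^i_{j\delta})(\eta^n_j)^2.
\end{aligned}\tag{8.34}$$
By Condition (I) we know that the first term is bounded by $K_3n^{-1}$. By Theorem 8.10, we see that the second and the third terms are bounded by $K_4n^{-(1-\alpha)}$. For the fourth, we note that
$$\text{4th term}\le K_5\sum_{j=0}^{[T/\delta]}\frac{\delta}{n^2}\hat E\big(m^n_j(\eta^n_j)^2\big)\le K_6n^{-1}.$$
By Lemma 8.9, we have
$$\text{5th term}\le K_7\sum_{j=1}^{[T/\delta]}\frac{\sqrt\delta}{n^2}\hat E\big(m^n_{j-1}(\eta^n_{j-1})^2\big)\le K_8n^{-(1-\alpha)}.\tag{8.35}$$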
To complete the proof we consider the last term in equation (8.33). Taking f = 1 in equation (8.34), we get 2 Eˆ sup Vtn − Vt , 1 ≤ K t≤T
T
0
ˆ + KE
2 Eˆ Vtn − Vt , h H dt [T/δ] j=1
mn
j−1 1 n i n 2 γj (X )(ηj ) . n2
(8.36)
i=1
Applying equation (8.35) with f = 1, we see that the second term of equation (8.36) is bounded by K9 n−(1−α) . By Theorem 8.10, we get that the first term of equation (8.36) is bounded by K10 n−(1−α) .
Thus, by taking $K_1=K_3+2K_4+K_6+K_8+K_9+K_{10}$, we get
$$\hat E\sup_{t\le T}\tilde d(V^n_t,V_t)^2\le K_1n^{-(1-\alpha)}.\tag{8.37}$$
The desired inequality follows from $d(\nu_1,\nu_2)\le\tilde d(\nu_1,\nu_2)$.
Remark 8.15 Here, the estimate (8.37) is stronger than (8.32). We will need this stronger version in the proof of Theorem 8.16 below.

Finally, we convert the convergence result to that for $\pi^n_t$.

Theorem 8.16 Suppose that the conditions in Theorem 8.14 are satisfied. Then, there exists a constant $K_1$ such that
$$E\sup_{0\le t\le T}d(\pi^n_t,\pi_t)\le K_1n^{-\frac{1-\alpha}{2}}.\tag{8.38}$$
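Proof Note that for $f$ bounded by 1, we have
$$\begin{aligned}
\big|\langle\pi^n_t-\pi_t,f\rangle\big|
&=\left|\frac{\langle V^n_t,f\rangle\langle V_t-V^n_t,1\rangle+\langle V^n_t,1\rangle\langle V^n_t-V_t,f\rangle}{\langle V^n_t,1\rangle\langle V_t,1\rangle}\right|\\
&\le\frac{|\langle V_t-V^n_t,1\rangle|}{\langle V_t,1\rangle}+\frac{|\langle V^n_t-V_t,f\rangle|}{\langle V_t,1\rangle}.
\end{aligned}$$
Thus,
$$d(\pi^n_t,\pi_t)\le\frac1{\langle V_t,1\rangle}\big|\langle V_t-V^n_t,1\rangle\big|+\frac1{\langle V_t,1\rangle}\tilde d(V^n_t,V_t).$$
Recalling that the Radon–Nikodym derivative of the probability measure $P$ on $\mathcal F_T$ with respect to the probability measure $\hat P$ is $M_T$, we have
$$\begin{aligned}
E\sup_{0\le t\le T}d(\pi^n_t,\pi_t)
&\le\hat E\Big(\sup_{0\le t\le T}\Big(\frac{|\langle V_t-V^n_t,1\rangle|}{\langle V_t,1\rangle}+\frac{\tilde d(V^n_t,V_t)}{\langle V_t,1\rangle}\Big)M_T\Big)\\
&\le\Big(\hat E\sup_{0\le t\le T}\big|\langle V_t-V^n_t,1\rangle\big|^2\Big)^{1/2}\Big(\hat E\sup_{0\le t\le T}\frac{M_T^2}{\langle V_t,1\rangle^2}\Big)^{1/2}\\
&\quad+\Big(\hat E\sup_{0\le t\le T}\tilde d(V^n_t,V_t)^2\Big)^{1/2}\Big(\hat E\sup_{0\le t\le T}\frac{M_T^2}{\langle V_t,1\rangle^2}\Big)^{1/2}.
\end{aligned}\tag{8.39}$$
It follows from the same argument as in equation (8.3) that $\hat EM_T^4<\infty$. As
$$d\log\langle V_t,1\rangle=\langle\pi_t,h^*\rangle\,dY_t-\frac12\big|\langle\pi_t,h^*\rangle\big|^2dt,$$
we have
$$\langle V_t,1\rangle^{-4}=\exp\Big(-4\int_0^t\langle\pi_s,h^*\rangle\,dY_s+2\int_0^t\big|\langle\pi_s,h^*\rangle\big|^2ds\Big).$$
It follows from the same argument as in equation (8.3) again that
$$\hat E\sup_{0\le t\le T}\langle V_t,1\rangle^{-4}<\infty.$$
Thus, by equations (8.37) and (8.39), there is a constant $K_1$ such that equation (8.38) holds.

For the particle filter, we have the following estimate.

Remark 8.17 For the particle filter $\tilde\pi^n_t$, we have
$$E\sup_{0\le t\le T}d(\tilde\pi^n_t,\pi_t)\le K_1\big(n^{-(1-\alpha)}\vee n^{-2\alpha}\big).$$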
8.5
Notes
Particle-system approximations of the optimal filter were studied as heuristic schemes at the beginning of the 1990s by Gordon et al. [70], Gordon et al. [69], Kitagawa [86], Carvalho et al. [26], and Del Moral et al. [55]. Rigorous proofs of the convergence results for the particle filter were published by Del Moral [51] in 1996 and, independently, by Crisan and Lyons [42] in 1997. Since then, many improvements have been made by various authors. Here, we would like to mention only a few: Crisan and Lyons [43], Crisan [38], [36], [35], [37], Crisan et al. [39], [41], Crisan and Doucet [40], Del Moral and Guionnet [53], Del Moral and Miclo [56]. Later, a central-limit-type theorem was proved by Crisan and Xiong [45], on which most of this chapter is based, for a new class of hybrid filter as well as for the original branching particle filters. The material in Section 8.1 is based on Kurtz and Xiong [98]. Other numerical methods for the filtering problems have been studied extensively, although much of the work has been done under the assumption that the observation noise is independent of the signal. Kushner [102, 103, 104] develops approximation methods based on replacing the signal process by a finite-state Markov chain that approximates the signal. In the simplest cases, this method is equivalent to a finite-difference approximation in the filtering equation. Picard [133] considers a time discretization of Zakai's equation involving the replacement of the signal by a discrete-time process and discrete-time approximations of the Radon–Nikodym derivative in
the Kallianpur–Striebel formula. The approximations still involve integrals against process distributions, and Picard suggests a Monte-Carlo scheme to implement the approximation. Di Masi et al. [59] consider a similar time discretization, but they also introduce a signal approximation that reduces the problem to a finite-dimensional computation somewhat similar to the approach taken by Kushner. Lototsky and Rozovskii [117] and Lototsky et al. [118] derive algorithms based on a Wiener chaos decomposition. This point of view is also explored by Budhiraja and Kallianpur [19]. Hu et al. [75] consider a Wong–Zakai-type approximation for the filtering equation directly. Florchinger and Le Gland [64], [65] consider a time discretization of Zakai’s equation for diffusion processes observed in correlated noise based on a split-up approximation and a Trotter-type product formula. In [63], a particle approximation is formulated by Florchinger and Le Gland. Del Moral [50] considers a particle approximation for a model with independent observation noise that discounts past information.
9
Linear filtering
In this chapter, we consider a very special filtering model. Namely, the signal is a Gaussian process and the observation function is linear. The corresponding filtering theory is called Kalman–Bucy filtering. We will derive the Kalman–Bucy filter as a special case of the non-linear filter introduced in the previous chapters. After the filtering equations are established and the discrete-time approximation is studied, we will investigate the long time stability of the linear filter in Sections 9.4 and 9.5. The results in the last two sections can be regarded as a standard to compare with when we study the stability for the non-linear filter in Chapter 10. The material in Section 9.4 is very technical and can be skipped in the first reading.
9.1
Gaussian system
We consider the signal–observation pair given by the following system:
$$\begin{cases}dX_t=(\tilde b_t+b_tX_t)\,dt+c_t\,dW_t+\sigma_t\,dB_t,\\ dY_t=(\tilde h_t+h_tX_t)\,dt+dW_t,\qquad Y_0=0,\end{cases}\tag{9.1}$$
where $X_0$ is a normal random vector with mean $\hat X_0\in\mathbb R^d$ and covariance matrix $\gamma_0\in\mathbb R^{d\times d}$, $(W_t,B_t)$ is an $(m+d)$-dimensional Brownian motion, and the coefficients $\tilde b_t$, $b_t$, $c_t$, $\sigma_t$, $\tilde h_t$, $h_t$ are deterministic matrices (or vectors) of dimensions $d\times1$, $d\times d$, $d\times m$, $d\times d$, $m\times1$, $m\times d$, respectively. To solve the linear stochastic system (9.1), we consider the following ordinary differential equations (9.2) and (9.3) in the space of $d\times d$-matrices, whose solution will be the dual process of $X_t$. Let $p_t$ and $q_t$ be deterministic functions taking values in the space of $d\times d$-matrices. Suppose that $p_t$ is the unique solution to the linear system
$$\frac{d}{dt}p_t=-p_tb_t,\qquad p_0=I,\tag{9.2}$$
and $q_t$ is the unique solution to the linear system
$$\frac{d}{dt}q_t=b_tq_t,\qquad q_0=I,\tag{9.3}$$
where $I$ is the $d\times d$ identity matrix. Then
$$\frac{d}{dt}(p_tq_t)=\frac{dp_t}{dt}q_t+p_t\frac{dq_t}{dt}=-p_tb_tq_t+p_tb_tq_t=0,$$
and hence, $p_tq_t=p_0q_0=I$. Namely, $p_t$ is invertible with $p_t^{-1}=q_t$.

Remark 9.1 If $b_t=b$ does not depend on $t$, then
$$p_t=e^{-bt}\equiv\sum_{n=0}^\infty\frac{(-1)^nt^n}{n!}b^n\quad\text{and}\quad q_t=e^{bt}.$$
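For a constant matrix $b$, the identity $p_tq_t=I$ can be checked numerically by summing the power series of Remark 9.1. The Python sketch below does this with plain lists; the helper names `mat_mul` and `mat_exp`, the truncation at 40 terms, and the particular matrix $b$ are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=40):
    """Truncated power series sum_{n < terms} m^n / n!."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

b = [[0.5, 1.0], [-0.3, 0.2]]  # arbitrary illustrative drift matrix
t = 1.5
p_t = mat_exp([[-t * x for x in row] for row in b])  # p_t = exp(-b t)
q_t = mat_exp([[t * x for x in row] for row in b])   # q_t = exp(b t)
identity_check = mat_mul(p_t, q_t)
print(identity_check)
```

Up to rounding, `identity_check` is the identity matrix, in line with $p_t^{-1}=q_t$.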
Definition 9.2 A stochastic process $X_t$ is a Gaussian process if for any $t_1<t_2<\dots<t_n$ and $\lambda_1,\dots,\lambda_n\in\mathbb R^d$, the random variable $\sum_{i=1}^n\lambda_i^*X_{t_i}$ has a normal distribution.

Theorem 9.3 Suppose that $b$, $h$ are bounded on $[0,T]$, $c$, $\sigma$ are square-integrable on $[0,T]$ and $\tilde b$, $\tilde h$ are integrable on $[0,T]$. Then, the stochastic process $(X_t,Y_t)$ is a Gaussian process.

Proof Applying Itô's formula to equations (9.1) and (9.2), we get
$$d(p_tX_t)=-p_tb_tX_t\,dt+p_t\big((\tilde b_t+b_tX_t)\,dt+c_t\,dW_t+\sigma_t\,dB_t\big)=p_t\big(\tilde b_t\,dt+c_t\,dW_t+\sigma_t\,dB_t\big).$$
Thus,
$$X_t=q_tX_0+q_t\int_0^tp_r\big(\tilde b_r\,dr+c_r\,dW_r+\sigma_r\,dB_r\big),\qquad Y_t=\int_0^t(\tilde h_s+h_sX_s)\,ds+W_t.$$
Therefore, $(X_t,Y_t)$ is obtained by a linear transformation of $\{(X_0,W_r,B_r):0\le r\le t\}$, and hence, it is a Gaussian process.

Recall that $\mathcal G_t$ is the $\sigma$-field generated by the observation process before time $t$. Let $\pi_t$ be the optimal filter, i.e. $\pi_t(\omega)$ is the conditional distribution of $X_t$ given $\mathcal G_t$:
$$\pi_t(A,\omega)=P(X_t\in A\,|\,\mathcal G_t)(\omega),\qquad\forall\,A\in\mathcal B(\mathbb R^d),\ \forall\,\omega\in\Omega.$$
Theorem 9.4 For any $t\ge0$ and for $\omega\in\Omega$ fixed, $\pi_t(\omega)$ is a multivariate normal probability measure on $\mathbb R^d$.

Proof Let $D_N=\{0=s^N_1<\dots<s^N_{a_N}=t\}$ be an increasing sequence of sets whose union is dense in $[0,t]$. Since $(X_t,Y_t)$ is a Gaussian process, the conditional distribution $\pi^N_t\equiv P(X_t\in\cdot\,|\,Y_s,s\in D_N)$ is normal with conditional mean $\hat X^N_t$ and conditional covariance matrix $\gamma^N_t$. We now consider the characteristic function corresponding to $\pi^N_t$. For $\lambda\in\mathbb R^d$, we define
$$\phi_N(\lambda)\equiv\int_{\mathbb R^d}e^{i\lambda^*x}\pi^N_t(dx)=E\big(e^{i\lambda^*X_t}\,\big|\,Y_s,s\in D_N\big).$$
Note that for $\lambda\in\mathbb R^d$ fixed, $\{\phi_N(\lambda):N\ge1\}$ is a martingale with $\phi_\infty(\lambda)=\int_{\mathbb R^d}e^{i\lambda^*x}\pi_t(dx)$. By the martingale convergence theorem (Theorem 2.10), we see that $\phi_N(\lambda)\to\phi_\infty(\lambda)$ a.s. Since the characteristic function of the multivariate normal distribution $\pi^N_t$ is given explicitly by
$$\phi_N(\lambda)=\exp\Big(i\lambda^*\hat X^N_t-\frac12\lambda^*\gamma^N_t\lambda\Big),$$
we get the convergence of $\hat X^N_t$ and $\gamma^N_t$ as $N\to\infty$. Denote the limits by $\hat X_t$ and $\gamma_t$, respectively. Then
$$\phi_\infty(\lambda)=\exp\Big(i\lambda^*\hat X_t-\frac12\lambda^*\gamma_t\lambda\Big).$$
Thus, $\pi_t(\omega)$ is a multivariate normal distribution on $\mathbb R^d$.

As in the proof of the theorem above, we use $\hat X_t$ and $\gamma_t$ to denote the (random) conditional mean vector and the conditional covariance matrix of the multivariate normal distribution corresponding to the random measure $\pi_t$.

Lemma 9.5 The matrix $\gamma_t$ is non-random. In fact, for $1\le i,j\le d$, we have
$$\gamma^{ij}_t=E(X^i_tX^j_t)-E(\hat X^i_t\hat X^j_t).$$

Proof As in the proof of Theorem 9.4, we get
$$\gamma^{ij}_t=\lim_{N\to\infty}E\big((X^i_t-\hat X^i_t)(X^j_t-\hat X^j_t)\,\big|\,Y_s,s\in D_N\big).$$
Since
$$E\big(X^i_t-\hat X^i_t\,\big|\,Y_s,s\in D_N\big)=E\big(E(X^i_t-\hat X^i_t\,|\,\mathcal G_t)\,\big|\,Y_s,s\in D_N\big)=0,$$
$X^i_t-\hat X^i_t$ is uncorrelated with $\{Y_s,s\in D_N\}$. By the properties of normal random vectors, $X^i_t-\hat X^i_t$ is independent of the family $\{Y_s,s\in D_N\}$. Similarly, $X^j_t-\hat X^j_t$ is independent of $\{Y_s,s\in D_N\}$. Thus,
$$\begin{aligned}
\gamma^{ij}_t&=E\big((X^i_t-\hat X^i_t)(X^j_t-\hat X^j_t)\,\big|\,\mathcal G_t\big)\\
&=\lim_{N\to\infty}E\big((X^i_t-\hat X^i_t)(X^j_t-\hat X^j_t)\,\big|\,Y_s,s\in D_N\big)\\
&=\lim_{N\to\infty}E\big((X^i_t-\hat X^i_t)(X^j_t-\hat X^j_t)\big)\\
&=E\big(X^i_tX^j_t-\hat X^i_t\hat X^j_t\big).
\end{aligned}$$
9.2
Kalman–Bucy filtering
In this section, we derive equations for the processes $\hat X_t$ and $\gamma_t$ by making use of the filtering equation. The coefficients in the filtering model of Chapter 5 are assumed to be bounded, which is not satisfied in the current linear model. However, since the processes are Gaussian and hence have moments of all orders, it follows from the same arguments as in the derivation of the filtering equation (5.17) that $\pi_t$ satisfies the following stochastic differential equation on $\mathcal P(\mathbb R^d)$: for any $f\in C^2(\mathbb R^d)$ with at most exponential growth,
$$\langle\pi_t,f\rangle=\langle\pi_0,f\rangle+\int_0^t\langle\pi_s,L_sf\rangle\,ds+\int_0^t\big\langle\pi_s,\nabla^*fc_s+f(\tilde h_s+h_s\iota)^*\big\rangle\,d\nu_s-\int_0^t\langle\pi_s,f\rangle\big\langle\pi_s,(\tilde h_s+h_s\iota)^*\big\rangle\,d\nu_s,\tag{9.4}$$
where
$$\nu_t=Y_t-\int_0^t\big\langle\pi_s,\tilde h_s+h_s\iota\big\rangle\,ds=Y_t-\int_0^t\big(\tilde h_s+h_s\hat X_s\big)\,ds$$
is an $m$-dimensional Brownian motion,
$$L_sf(x)=\nabla^*f(x)(\tilde b_s+b_sx)+\frac12\sum_{i,j=1}^da^{ij}_s\partial^2_{ij}f(x),$$
$a_t=c_tc_t^*+\sigma_t\sigma_t^*$, and $\iota$ is the identity function on $\mathbb R^d$, i.e. $\iota(x)=x$, $\forall x\in\mathbb R^d$.
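We now derive the equation satisfied by $\hat X_t$. Taking $f(x)=x_i$, we have
$$L_sf(x)=\tilde b^i_s+\sum_{j=1}^db^{ij}_sx_j\quad\text{and}\quad\partial_jf=\delta_{ij},\quad j=1,2,\dots,d.$$
Then
$$\langle\pi_s,L_sf\rangle=\tilde b^i_s+\sum_{j=1}^db^{ij}_s\hat X^j_s,$$
$$\begin{aligned}
\int_0^t\big\langle\pi_s,\nabla^*fc_s+f(\tilde h_s+h_s\iota)^*\big\rangle\,d\nu_s
&=\sum_{j=1}^m\int_0^t\Big\langle\pi_s,\sum_{\ell=1}^d\partial_\ell fc^{\ell j}_s+\Big(\tilde h^j_s+\sum_{k=1}^dh^{jk}_sx_k\Big)f\Big\rangle\,d\nu^j_s\\
&=\sum_{j=1}^m\int_0^t\Big\langle\pi_s,c^{ij}_s+x_i\Big(\tilde h^j_s+\sum_{k=1}^dh^{jk}_sx_k\Big)\Big\rangle\,d\nu^j_s\\
&=\sum_{j=1}^m\int_0^t\Big(c^{ij}_s+\hat X^i_s\tilde h^j_s+\sum_{k=1}^dh^{jk}_s\big(\gamma^{ik}_s+\hat X^i_s\hat X^k_s\big)\Big)\,d\nu^j_s,
\end{aligned}$$
and
$$\big\langle\pi_s,(\tilde h_s+h_s\iota)^*\big\rangle\,d\nu_s=\sum_{j=1}^m\Big\langle\pi_s,\tilde h^j_s+\sum_{k=1}^dh^{jk}_sx_k\Big\rangle\,d\nu^j_s=\sum_{j=1}^m\Big(\tilde h^j_s+\sum_{k=1}^dh^{jk}_s\hat X^k_s\Big)\,d\nu^j_s.$$
Thus,
$$\begin{aligned}
\hat X^i_t&=\hat X^i_0+\int_0^t\Big(\tilde b^i_s+\sum_{j=1}^db^{ij}_s\hat X^j_s\Big)ds+\sum_{j=1}^m\int_0^t\Big(c^{ij}_s+\hat X^i_s\tilde h^j_s+\sum_{k=1}^dh^{jk}_s\big(\gamma^{ik}_s+\hat X^i_s\hat X^k_s\big)\Big)\,d\nu^j_s\\
&\quad-\sum_{j=1}^m\int_0^t\hat X^i_s\Big(\tilde h^j_s+\sum_{k=1}^dh^{jk}_s\hat X^k_s\Big)\,d\nu^j_s\\
&=\hat X^i_0+\int_0^t\Big(\tilde b^i_s+\sum_{j=1}^db^{ij}_s\hat X^j_s\Big)ds+\sum_{j=1}^m\int_0^t\Big(c^{ij}_s+\sum_{k=1}^dh^{jk}_s\gamma^{ik}_s\Big)\,d\nu^j_s.
\end{aligned}$$
Writing this in vector form, we have
$$\hat X_t=\hat X_0+\int_0^t\big(\tilde b_s+b_s\hat X_s\big)\,ds+\int_0^t\big(c_s+\gamma_sh_s^*\big)\,d\nu_s.\tag{9.5}$$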
Therefore, we have proved the existence part of the following theorem.

Theorem 9.6 The mean vector process $\hat X$ is the unique solution to the SDE (9.5), with $\nu_t$ being an $m$-dimensional Brownian motion given by
$$\nu_t=Y_t-\int_0^t\big(\tilde h_s+h_s\hat X_s\big)\,ds.$$

Proof We only need to prove the uniqueness. Note that $\nu_t$ depends on the solution $\hat X$. We write the SDE (9.5) as
$$\hat X_t=\hat X_0+\int_0^t\big(c_s+\gamma_sh_s^*\big)\,dY_s+\int_0^t\Big(\tilde b_s-\big(c_s+\gamma_sh_s^*\big)\tilde h_s+\big(b_s-(c_s+\gamma_sh_s^*)h_s\big)\hat X_s\Big)\,ds.$$
Let $\tilde p$, $\tilde q$ be defined as in equations (9.2) and (9.3) with $b_s$ replaced by $b_s-(c_s+\gamma_sh_s^*)h_s$. Similar to the proof of Theorem 9.3, we see that
$$\hat X_t=\tilde q_t\hat X_0+\tilde q_t\int_0^t\tilde p_s\Big(\big(\tilde b_s-(c_s+\gamma_sh_s^*)\tilde h_s\big)\,ds+\big(c_s+\gamma_sh_s^*\big)\,dY_s\Big).$$
This yields the uniqueness.
Next, we derive the equation satisfied by $\gamma_t$. Applying Itô's formula to equation (9.1), for $1\le i,j\le d$, we have
$$d(X^i_tX^j_t)=X^i_t\,dX^j_t+X^j_t\,dX^i_t+a^{ij}_t\,dt.$$
Writing this in the integral form, we have
$$X^i_tX^j_t=X^i_0X^j_0+\int_0^t\big(X^i_s\,dX^j_s+X^j_s\,dX^i_s+a^{ij}_s\,ds\big).$$
Taking expectations and then taking derivatives on both sides, we get
$$\frac{d}{dt}E(X^i_tX^j_t)=E\Big(X^i_t\Big(\tilde b^j_t+\sum_{k=1}^db^{jk}_tX^k_t\Big)\Big)+E\Big(X^j_t\Big(\tilde b^i_t+\sum_{k=1}^db^{ik}_tX^k_t\Big)\Big)+a^{ij}_t.\tag{9.6}$$
Applying Itô's formula to equation (9.5), we get
$$d(\hat X^i_t\hat X^j_t)=\hat X^i_t\,d\hat X^j_t+\hat X^j_t\,d\hat X^i_t+\sum_{k=1}^m(c_t+\gamma_th_t^*)^{ik}(c_t+\gamma_th_t^*)^{jk}\,dt.$$
Similar to equation (9.6), we have
$$\frac{d}{dt}E(\hat X^i_t\hat X^j_t)=E\Big(\hat X^i_t\Big(\tilde b^j_t+\sum_{k=1}^db^{jk}_t\hat X^k_t\Big)\Big)+E\Big(\hat X^j_t\Big(\tilde b^i_t+\sum_{k=1}^db^{ik}_t\hat X^k_t\Big)\Big)+\big((c_t+\gamma_th_t^*)(c_t+\gamma_th_t^*)^*\big)^{ij}.\tag{9.7}$$
Combining equations (9.6) and (9.7) with Lemma 9.5, we have
$$\frac{d}{dt}\gamma^{ij}_t=\sum_{k=1}^d\gamma^{ik}_tb^{jk}_t+\sum_{k=1}^db^{ik}_t\gamma^{kj}_t+a^{ij}_t-\big((c_t+\gamma_th_t^*)(c_t+\gamma_th_t^*)^*\big)^{ij}.$$
It then follows that, in matrix form, $\gamma_t$ satisfies the following Riccati equation:
$$\frac{d}{dt}\gamma_t=\gamma_tb_t^*+b_t\gamma_t+a_t-(c_t+\gamma_th_t^*)(c_t+\gamma_th_t^*)^*.\tag{9.8}$$
Theorem 9.7 The process $(\hat X_t,\gamma_t)$ is the unique solution to equations (9.5) and (9.8).

Proof Suppose that $\gamma^1_t$ and $\gamma^2_t$ are two solutions to equation (9.8) and $\zeta_t$ is their difference. As
$$\big|\gamma^1_th_t^*h_t\gamma^1_t-\gamma^2_th_t^*h_t\gamma^2_t\big|\le\big|\zeta_th_t^*h_t\gamma^1_t\big|+\big|\gamma^2_th_t^*h_t\zeta_t\big|\le|h_t|^2\big(|\gamma^1_t|+|\gamma^2_t|\big)|\zeta_t|,$$
we have
$$|\zeta_t|\le\int_0^t2\Big(|b_s|+|h_s|^2\big(|\gamma^1_s|+|\gamma^2_s|\big)+|c_s||h_s|\Big)|\zeta_s|\,ds.$$
By Gronwall's inequality we see that $\zeta_t=0$.
For $d=m=1$ with time-independent coefficients, we can give an explicit formula for $\gamma_t$. Note that
$$\frac{d}{dt}\gamma_t=2b\gamma_t+a-(c+h\gamma_t)^2=-h^2\gamma_t^2+2(b-ch)\gamma_t+(a-c^2)=-h^2(\gamma_t-c_+)(\gamma_t-c_-),$$
where
$$c_\pm=\frac{b-ch\pm\sqrt{(b-ch)^2+h^2(a-c^2)}}{h^2}.$$
Then,
$$\frac{d\gamma_t}{(\gamma_t-c_+)(\gamma_t-c_-)}=-h^2\,dt.$$
Thus,
$$\frac{\gamma_t-c_+}{\gamma_t-c_-}=\frac{\gamma_0-c_+}{\gamma_0-c_-}\,e^{-h^2(c_+-c_-)t}.$$

Remark 9.8 The process $(\hat X_t,\gamma_t)$ is called the Kalman–Bucy filter of the linear system (9.1).
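The scalar formula can be sanity-checked against a direct numerical integration of $\frac{d}{dt}\gamma_t=2b\gamma_t+a-(c+h\gamma_t)^2$. In the Python sketch below, $c_\pm$ are taken to be the two roots of the quadratic on the right-hand side, obtained by expanding the square; the function names, the parameter values (with $a=c^2+\sigma^2$) and the Euler step count are assumptions made for illustration.

```python
import math

def riccati_rhs(g, a, b, c, h):
    """Right-hand side of the scalar Riccati equation: 2*b*g + a - (c + h*g)^2."""
    return 2.0 * b * g + a - (c + h * g) ** 2

def closed_form(t, g0, a, b, c, h):
    """Solve (g - c_plus)/(g - c_minus) = r0 * exp(-h^2 (c_plus - c_minus) t) for g."""
    disc = math.sqrt((b - c * h) ** 2 + h * h * (a - c * c))
    c_plus = ((b - c * h) + disc) / (h * h)
    c_minus = ((b - c * h) - disc) / (h * h)
    r = (g0 - c_plus) / (g0 - c_minus) * math.exp(-h * h * (c_plus - c_minus) * t)
    return (c_plus - r * c_minus) / (1.0 - r)

def euler(t, g0, a, b, c, h, steps):
    """Explicit Euler integration of the Riccati equation on [0, t]."""
    dt, g = t / steps, g0
    for _ in range(steps):
        g += riccati_rhs(g, a, b, c, h) * dt
    return g

b, c, h, sigma = -0.5, 0.3, 1.2, 0.8   # illustrative coefficients
a = c * c + sigma * sigma
exact = closed_form(2.0, 1.0, a, b, c, h)
approx = euler(2.0, 1.0, a, b, c, h, 20000)
print(exact, approx)
```

As $t\to\infty$ the ratio on the right decays to zero, so $\gamma_t$ approaches the positive root $c_+$.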
9.3
Discrete-time approximation of the Kalman–Bucy filtering
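In this section, we consider the numerical solution of the Kalman–Bucy filter, since in most cases it cannot be given explicitly. Recall that the filter $(\hat X_t,\gamma_t)$ is given by equations (9.5) and (9.8). Using the Euler scheme, we approximate the Kalman–Bucy filter at time points $t=k\delta$, $k=0,1,2,\dots$ by the following recursive formulas:
$$\hat X^\delta_{(k+1)\delta}=\hat X^\delta_{k\delta}+\big(\tilde b_{k\delta}+b_{k\delta}\hat X^\delta_{k\delta}\big)\delta+\big(c_{k\delta}+\gamma^\delta_{k\delta}h^*_{k\delta}\big)\big(\nu^\delta_{(k+1)\delta}-\nu^\delta_{k\delta}\big),\tag{9.9}$$
$$\nu^\delta_{(k+1)\delta}=\nu^\delta_{k\delta}+Y_{(k+1)\delta}-Y_{k\delta}-\big(\tilde h_{k\delta}+h_{k\delta}\hat X^\delta_{k\delta}\big)\delta,\tag{9.10}$$
and
$$\gamma^\delta_{(k+1)\delta}=\gamma^\delta_{k\delta}+\Big(\gamma^\delta_{k\delta}b^*_{k\delta}+b_{k\delta}\gamma^\delta_{k\delta}+a_{k\delta}-\big(c_{k\delta}+\gamma^\delta_{k\delta}h^*_{k\delta}\big)\big(c_{k\delta}+\gamma^\delta_{k\delta}h^*_{k\delta}\big)^*\Big)\delta.\tag{9.11}$$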
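The recursions (9.9)–(9.11) can be sketched in a few lines for the scalar case $d=m=1$ with constant coefficients. The Python code below is a minimal illustration under those assumptions: the function `kalman_bucy_euler`, the simulated signal and observation paths, the random seed and all parameter values are hypothetical choices, not part of the text.

```python
import math
import random

def kalman_bucy_euler(y_increments, delta, x0_hat, gamma0,
                      b_tilde, b, c, sigma, h_tilde, h):
    """Scalar, constant-coefficient version of the Euler recursions (9.9)-(9.11)."""
    a = c * c + sigma * sigma
    x_hat, gamma = x0_hat, gamma0
    path = [(x_hat, gamma)]
    for dy in y_increments:
        d_nu = dy - (h_tilde + h * x_hat) * delta              # innovation increment, (9.10)
        gain = c + gamma * h
        x_next = x_hat + (b_tilde + b * x_hat) * delta + gain * d_nu     # (9.9)
        gamma = gamma + (2.0 * b * gamma + a - gain * gain) * delta      # (9.11)
        x_hat = x_next
        path.append((x_hat, gamma))
    return path

# Simulate the system (9.1) on a grid to produce observation increments.
rng = random.Random(0)
delta, steps = 0.001, 5000
b_tilde, b, c, sigma, h_tilde, h = 0.0, -0.5, 0.3, 0.8, 0.0, 1.2
x = rng.gauss(0.0, 1.0)  # X_0 ~ N(0, 1), so the filter starts from mean 0, variance 1
dys = []
for _ in range(steps):
    dw = rng.gauss(0.0, math.sqrt(delta))
    db = rng.gauss(0.0, math.sqrt(delta))
    dys.append((h_tilde + h * x) * delta + dw)
    x += (b_tilde + b * x) * delta + c * dw + sigma * db

path = kalman_bucy_euler(dys, delta, 0.0, 1.0, b_tilde, b, c, sigma, h_tilde, h)
final_x_hat, final_gamma = path[-1]
print(final_x_hat, final_gamma)
```

Note that the covariance recursion does not depend on the observations, so in practice it can be computed once in advance.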
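Theorem 9.9 Suppose that the coefficients $\tilde b_t$, $b_t$, $c_t$, $\sigma_t$, $\tilde h_t$, $h_t$ are Lipschitz continuous in $t$. Then there exists a constant $K_1$ such that
$$E\max_{k\delta\le T}\big|\hat X^\delta_{k\delta}-\hat X_{k\delta}\big|^2+\max_{k\delta\le T}\big|\gamma^\delta_{k\delta}-\gamma_{k\delta}\big|\le K_1\delta.$$

Proof Consider the process in the finite time interval $[0,T]$. Since $\gamma_t$ and the coefficients $a_t$, $b_t$, $c_t$ and $h_t$ are continuous, they are bounded. It follows from equation (9.8) that $\frac{d}{dt}\gamma_t$ is bounded. Thus, $\gamma_t$ is also Lipschitz continuous. From this, together with the Lipschitz continuity of the coefficients, we see that there exists a constant $K_2$ such that
$$\Big|\int_{k\delta}^{(k+1)\delta}\big(\gamma_sb_s^*+b_s\gamma_s+a_s-(c_s+\gamma_sh_s^*)(c_s+\gamma_sh_s^*)^*\big)\,ds-\Big(\gamma_{k\delta}b^*_{k\delta}+b_{k\delta}\gamma_{k\delta}+a_{k\delta}-\big(c_{k\delta}+\gamma_{k\delta}h^*_{k\delta}\big)\big(c_{k\delta}+\gamma_{k\delta}h^*_{k\delta}\big)^*\Big)\delta\Big|\le K_2\delta^2.$$
As
$$\gamma_{(k+1)\delta}=\gamma_{k\delta}+\int_{k\delta}^{(k+1)\delta}\big(\gamma_sb_s^*+b_s\gamma_s+a_s-(c_s+\gamma_sh_s^*)(c_s+\gamma_sh_s^*)^*\big)\,ds,$$
by equation (9.11), we get
$$f_{k+1}\equiv\big|\gamma^\delta_{(k+1)\delta}-\gamma_{(k+1)\delta}\big|\le f_k+K_2\delta^2+K_3f_k\delta.$$
Using induction, we can prove that
$$f_k\le K_2\delta^2\big(1+(1+K_3\delta)+\dots+(1+K_3\delta)^{k-1}\big)=\frac{K_2}{K_3}\big((1+K_3\delta)^k-1\big)\delta\le K_4\delta.$$
The proof for the estimate of $\hat X^\delta_{k\delta}-\hat X_{k\delta}$ follows from the same arguments as in Sections 4.4 and 8.1.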
9.4
Some basic facts for a related deterministic control problem
In this section, we consider the time-homogeneous linear system, i.e. the case when the matrices b˜ t , bt , ct , σt , h˜ t , ht do not depend on t. This section is long and technical. Its main purpose is to prepare Lemma 9.35
with related definitions and results from the theory of stochastic control. We strongly suggest that the reader skip this section on a first reading. We include this section here for the completeness of the book. We will investigate the limiting behavior of the Riccati equation (9.8). To this end, we consider the following optimal control problem with state equation on $\mathbb R^d$:
$$\dot x_t=Ax_t+Bu_t,\tag{9.12}$$
and the cost functional
$$J(u)=\frac12\int_0^T\big(|Dx_t|^2+|u_t|^2\big)\,dt+\frac12x_T^*\gamma_0x_T,\qquad\forall\,u\in L^2([0,T],\mathbb R^m),$$
where $|\cdot|$ denotes the Euclidean norm, $A$, $B$, $D$ are matrices of dimensions $d\times d$, $d\times m$ and $m\times d$, respectively, and $\dot x_t$ denotes the time derivative $\frac{d}{dt}x_t$. Suppose that the initial state $x_0$ is fixed. The aim of the control problem is to find a control $u\in L^2([0,T],\mathbb R^m)$ such that $J(u)$ is minimized. Here, $u_t$ is called the control variable, and $x_t$ is the state variable.

The problem in the last paragraph is a special case of the LQ control problem. In this section, we shall study some properties for this special case. We refer the reader to Yong and Zhou [153] for a detailed treatment under a more general setting. First, we will derive a Riccati equation for this control problem. Then, we will choose the matrices $A$, $B$, $D$ in terms of the coefficients in the filtering problem such that the Riccati equation for the control problem coincides with equation (9.8), which is the Riccati equation for the Kalman–Bucy filter. The representation for the Riccati equation in terms of the control problem will help us in studying the limit of the solution. The next theorem establishes the optimal control law.

Theorem 9.10 Suppose that $p_t$ is given by the following differential equation on $\mathbb R^d$:
$$\dot p_t=-A^*p_t+D^*Dx_t,\qquad p_T=-\gamma_0x_T.$$
Let $u_t=B^*p_t$. Then for any $\tilde u\in L^2([0,T],\mathbb R^m)$, we have $J(\tilde u)\ge J(u)$. Namely, $u$ is the optimal control law.

Proof Define $v_t=\tilde u_t-u_t$ and let $\tilde x_t$ be the solution to equation (9.12) with $u_t$ replaced by $\tilde u_t$. Let $z_t=\tilde x_t-x_t$. Then
$$\dot z_t=Az_t+Bv_t,\qquad z_0=0.$$
As
$$\frac{d}{dt}(p_t^*z_t)=x_t^*D^*Dz_t+p_t^*Bv_t=x_t^*D^*Dz_t+u_t^*v_t,$$
we have
$$\int_0^T\langle Dx_t,Dz_t\rangle_{\mathbb R^m}\,dt+\int_0^Tu_t^*v_t\,dt=-x_T^*\gamma_0z_T.$$
Then, we have
$$J(\tilde u)-J(u)=\frac12\int_0^T\big(|Dz_t|^2+|v_t|^2\big)\,dt+\frac12z_T^*\gamma_0z_T\ge0,$$
and hence, the minimum of $J$ is attained at $u_t$.
Next, we show that the optimal control law is related to the Riccati equation.

Theorem 9.11 Let $P_t$ be the solution to the following Riccati equation on $S^d_+$:
$$-\dot P_t=A^*P_t+P_tA+D^*D-P_tBB^*P_t,\qquad P_T=\gamma_0,\tag{9.13}$$
where $S^d_+$ is the collection of all $d\times d$ non-negative-definite matrices. Then, $p_t=-P_tx_t$ and
$$J(u)=\frac12\langle P_0x_0,x_0\rangle_{\mathbb R^d}.$$

Proof Let $q_t=-P_tx_t$. Taking derivatives on both sides, we get
$$\begin{aligned}
\dot q_t&=-\dot P_tx_t-P_t\dot x_t\\
&=\big(A^*P_t+P_tA+D^*D-P_tBB^*P_t\big)x_t-P_t\big(Ax_t+BB^*p_t\big)\\
&=-A^*q_t+D^*Dx_t-P_tBB^*(p_t-q_t).
\end{aligned}$$
Note that $p_t$ is also a solution to the above equation. By the uniqueness of the solution to this equation, we see that $p_t=-P_tx_t$. Note that
$$\begin{aligned}
\frac{d}{dt}\langle P_tx_t,x_t\rangle_{\mathbb R^d}&=-\big\langle\big(A^*P_t+P_tA+D^*D-P_tBB^*P_t\big)x_t,x_t\big\rangle+\langle P_t(Ax_t+Bu_t),x_t\rangle+\langle P_tx_t,Ax_t+Bu_t\rangle\\
&=-|Dx_t|^2-|u_t|^2.
\end{aligned}$$
Hence
$$J(u)=\frac12\langle P_0x_0,x_0\rangle_{\mathbb R^d}.\tag{9.14}$$
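The identity $J(u)=\frac12\langle P_0x_0,x_0\rangle$ can be checked numerically in the scalar case. The Python sketch below integrates the Riccati equation (9.13) backward in time with an Euler scheme, applies the feedback $u_t=B^*p_t=-B^*P_tx_t$, and accumulates the cost. The function `lq_cost`, the scalar parameter values and the `gain_scale` argument (used only to compare against a detuned feedback) are illustrative assumptions.

```python
def lq_cost(A, B, D, gamma0, x0, T, steps, gain_scale=1.0):
    """Scalar LQ problem: return (cost of the feedback control, 0.5 * P_0 * x0^2)."""
    dt = T / steps
    # Backward sweep: -dP/dt = 2*A*P + D^2 - B^2*P^2 with P_T = gamma0.
    P = [0.0] * (steps + 1)
    P[steps] = gamma0
    for k in range(steps, 0, -1):
        P[k - 1] = P[k] + (2.0 * A * P[k] + D * D - B * B * P[k] * P[k]) * dt
    # Forward sweep with u = -gain_scale * B * P * x (gain_scale = 1 is the optimal law).
    x, cost = x0, 0.0
    for k in range(steps):
        u = -gain_scale * B * P[k] * x
        cost += 0.5 * ((D * x) ** 2 + u * u) * dt
        x += (A * x + B * u) * dt
    cost += 0.5 * gamma0 * x * x
    return cost, 0.5 * P[0] * x0 * x0

optimal_cost, value = lq_cost(0.3, 1.0, 1.0, 0.5, 2.0, 1.0, 20000)
detuned_cost, _ = lq_cost(0.3, 1.0, 1.0, 0.5, 2.0, 1.0, 20000, gain_scale=0.5)
print(optimal_cost, value, detuned_cost)
```

Up to discretization error the first two printed numbers agree, and the detuned feedback gives a strictly larger cost, in line with Theorem 9.10.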
Remark 9.12 Note that equation (9.8) can be written as
$$\frac{d}{dt}\gamma_t=\gamma_t(b-ch)^*+(b-ch)\gamma_t+\sigma\sigma^*-\gamma_th^*h\gamma_t.$$
If $A=(b-ch)^*$, $B=h^*$ and $D=\sigma^*$, then the Riccati equation (9.13) coincides with equation (9.8) with time reversed.

To study the long-time limit of the (uncontrolled) linear system
$$\frac{dx_t}{dt}=Ax_t,\tag{9.15}$$
which is a special case of the linear SDE (9.5), we need some facts from matrix algebra. We state these facts without giving their proofs, which can be found in any textbook on matrix algebra.

Lemma 9.13 Suppose that the $d\times d$-matrix $A$ has $k$ distinct (complex) eigenvalues $\lambda_i$, $i=1,2,\dots,k$. Let $m_i$ be the multiplicity of $\lambda_i$ in the characteristic polynomial of $A$. Let $N_i$ be the null space of the matrix $M_i\equiv(A-\lambda_iI)^{m_i}$. Then the dimension of $N_i$, denoted by $\dim(N_i)$, is equal to $m_i$, and the $d$-dimensional complex space is equal to the direct sum of the null spaces $N_i$, $i=1,2,\dots,k$.

Based on this decomposition, we get the following Jordan form for the matrix $A$.

Lemma 9.14 There is an invertible $d\times d$-matrix $C$ that can be partitioned as $C=(C_1,C_2,\dots,C_k)$ such that $A=CJC^{-1}$, where $J=\operatorname{diag}(J_1,J_2,\dots,J_k)$, and $C_i$ is a $d\times m_i$-matrix, $i=1,2,\dots,k$. The block $J_i$ is an $m_i\times m_i$-matrix that can be subpartitioned as $J_i=\operatorname{diag}\big(J_{i1},J_{i2},\dots,J_{i\ell_i}\big)$, where each subblock $J_{ij}$ is of the form
$$J_{ij}=\begin{pmatrix}\lambda_i&1&0&\cdots&\cdots\\0&\lambda_i&1&0&\cdots\\\cdots&\cdots&\cdots&\cdots&\cdots\\0&\cdots&0&\lambda_i&1\\0&\cdots&\cdots&0&\lambda_i\end{pmatrix}.$$
$J$ is called the Jordan normal form of $A$.
Now we can get an expression for the solution to equation (9.15).

Theorem 9.15 Suppose that $x_0=\sum_{i=1}^kv_i$ with $v_i\in N_i$, $i=1,2,\dots,k$. Write $C^{-1}=(U_1^*,U_2^*,\dots,U_k^*)^*$ where the partitioning corresponds to that of $C$ in Lemma 9.14, i.e. $U_i$ is an $m_i\times d$-matrix. Then
$$x_t=\sum_{i=1}^kC_i\exp(J_it)U_iv_i,$$
where
$$e^{J_it}=\operatorname{diag}\big(e^{J_{i1}t},e^{J_{i2}t},\dots,e^{J_{i\ell_i}t}\big),$$
and
$$\exp\big(J_{ij}t\big)\equiv\sum_{n=0}^\infty\frac{t^n}{n!}J_{ij}^n=\begin{pmatrix}1&t&\frac{t^2}{2!}&\cdots&\frac{t^{n_{ij}-1}}{(n_{ij}-1)!}\\0&1&t&\cdots&\frac{t^{n_{ij}-2}}{(n_{ij}-2)!}\\\cdots&\cdots&\cdots&\cdots&\cdots\\0&0&0&\cdots&1\end{pmatrix}e^{\lambda_it},\tag{9.16}$$
where $n_{ij}$ is the dimension of $J_{ij}$.

Proof It is easy to verify that $x_t=e^{tA}x_0$. Thus,
$$x_t=\sum_{n=0}^\infty\frac{t^n}{n!}A^nx_0=\sum_{n=0}^\infty\frac{t^n}{n!}CJ^nC^{-1}x_0=\sum_{n=0}^\infty\frac{t^n}{n!}\sum_{i=1}^kC_iJ_i^nU_iv_i=\sum_{i=1}^kC_i\exp(J_it)U_iv_i.$$

With the preparation above, we are ready to study the limiting behavior of the solution of equation (9.15). If all the eigenvalues of $A$ have negative real parts, then
$$\lim_{t\to\infty}x_t=0.$$
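The closed form for the exponential of a Jordan block, and the resulting decay when $\operatorname{Re}(\lambda_i)<0$, can be checked numerically. The Python sketch below compares the truncated power series with the closed form of (9.16) for a single real $3\times3$ block; the helper names, the eigenvalue $-0.7$ and the truncation at 80 terms are illustrative assumptions.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def series_exp(m, terms=80):
    """Truncated power series sum_{n < terms} m^n / n!."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def jordan_block(lam, size):
    """Jordan block with eigenvalue lam on the diagonal and ones above it."""
    return [[lam if s == r else 1.0 if s == r + 1 else 0.0 for s in range(size)]
            for r in range(size)]

def jordan_exp(lam, size, t):
    """Closed form (9.16): entry (r, s) is t^(s-r)/(s-r)! * exp(lam*t) for s >= r."""
    return [[(t ** (s - r) / math.factorial(s - r)) * math.exp(lam * t) if s >= r else 0.0
             for s in range(size)] for r in range(size)]

lam, size, t = -0.7, 3, 2.0
numeric = series_exp([[t * x for x in row] for row in jordan_block(lam, size)])
closed = jordan_exp(lam, size, t)
max_diff = max(abs(numeric[r][s] - closed[r][s]) for r in range(size) for s in range(size))
largest_late_entry = max(abs(x) for row in jordan_exp(lam, size, 40.0) for x in row)
print(max_diff, largest_late_entry)
```

The polynomial factors $t^j/j!$ grow only polynomially, so the factor $e^{\lambda_it}$ dominates and every entry tends to zero as $t\to\infty$.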
Definition 9.16 Let A be a d × d-matrix. We call the direct sum of those subspaces Ni with Re(λi ) < 0 the stable subspace of A. The orthogonal complement of the stable subspace is called the unstable subspace. A is asymptotically stable if all its eigenvalues have negative real parts. Remark 9.17 If A is asymptotically stable, then the real parts of the eigenvalues of A are negative. Let λ0 > 0 be such that Re(λj ) < −λ0 for all
j = 1, 2, · · · , d. It follows from equation (9.16) that there exists a constant K such that |xt | ≤ Ke−λ0 t . Therefore, xt tends to 0 exponentially fast. To prove the convergence of the solution of the Riccati equation as T → ∞, we need to make some assumptions on the coefficient matrices. Definition 9.18 The linear system equation (9.12) is completely controllable if for any a ∈ Rd , there is a control u and a time t1 such that x0 = a and xt1 = 0. Remark 9.19 Using a shift transformation, we see that an equivalent definition for the linear system equation (9.12) being completely controllable is that for any a ∈ Rd , there is a control u and a time t1 such that xt1 = a and x0 = 0. In fact, this is the definition given in [105]. Namely, under suitable control, the system can reach any state starting from 0. Now we consider the system equation (9.15) with output variable yt , namely, we study the system dxt = Axt , dt
yt = Dxt .
(9.17)
Denote the solution $y_t$ by $y_t(a)$ if $x_0=a$.

Definition 9.20 The system equation (9.17) is completely reconstructible if for any $t_1>0$ and $a\in\mathbb{R}^d$,

$$y_t(a)=0,\quad\forall\,t\le t_1$$

implies $a=0$.

Remark 9.21 By the linearity, the system equation (9.17) is completely reconstructible if for any $t_1>0$ and $a_1,a_2\in\mathbb{R}^d$,

$$y_t(a_1)=y_t(a_2),\quad\forall\,t\le t_1$$

implies $a_1=a_2$. This means that the initial state can be reconstructed from the output.

We denote the solution to the Riccati equation (9.13) by $P_t^T$ when we discuss its limit. We may omit $T$ for simplicity of notation when there is no confusion.

Theorem 9.22 Suppose that the system equation (9.12) is completely controllable and $P_T=0$. Then $P_0^T$ converges as $T\to\infty$. Denote the limit by $\bar P$. If, in addition, the system equation (9.17) is completely reconstructible, then $\bar P>0$ and the matrix $A-BB^*\bar P$ is asymptotically stable.
Proof By equation (9.14), we have

$$a_T\equiv x_0^*P_0^Tx_0=\min_{u\in L^2([0,T],\mathbb{R}^m)}\int_0^T\left(|Dx_t|^2+|u_t|^2\right)dt. \tag{9.18}$$

Thus, $a_T$ is non-decreasing in $T$. We now prove that $\{a_T\}$ is bounded from above. Since the system is completely controllable, there exists an input $u$ that transfers state $x_0$ to 0 at some time $t_1$. Clearly, the optimal control after $t_1$ is $u_t=0$, and in this case, $x_t=0$, $\forall\,t\ge t_1$. Then, $a_T$ is bounded from above by

$$\int_0^{t_1}\left(|Dx_t|^2+|u_t|^2\right)dt.$$

Therefore, $a_T$ converges for any $x_0$. This proves the convergence of $P_0^T$ as $T\to\infty$.

Next, we assume that the system equation (9.17) is completely reconstructible in addition to the assumption of equation (9.12) being completely controllable. Note that $\bar P\ge0$. Suppose that $\bar P$ is singular. Then there exists $x_0\ne0$ such that $a_\infty=x_0^*\bar Px_0=0$. Since $a_T$ is non-decreasing, we see that $a_T=0$ for all $T\ge0$. Hence, there exists $u$ such that for any $T\ge0$,

$$\int_0^T\left(|y_t|^2+|u_t|^2\right)dt=0.$$

Thus, $u_t=0$ and $y_t=0$ for all $t\le T$, but $x_0\ne0$. This contradicts the assumption that the system is completely reconstructible. Therefore, $\bar P$ is positive-definite.

Let $P_T=\bar P$. It is clear that $P_t=\bar P$ for all $t\le T$. By Theorem 9.11, we see that

$$\inf_{u\in L^2([0,T],\mathbb{R}^m)}\left\{\int_0^T\left(|y_t|^2+|u_t|^2\right)dt+x_T^*\bar Px_T\right\}=x_0^*\bar Px_0. \tag{9.19}$$

The optimal control is attained at $u_t=-B^*\bar Px_t$. Thus

$$\int_0^\infty\left(|y_t|^2+|u_t|^2\right)dt<\infty. \tag{9.20}$$

Since $\bar P$ is not singular, it follows from equation (9.19) that $\{x_T\}$ is bounded. Under the steady-state control law, the system becomes

$$\dot x_t=\left(A-BB^*\bar P\right)x_t. \tag{9.21}$$
If $A-BB^*\bar P$ is not asymptotically stable, then there is an eigenvalue of $A-BB^*\bar P$ whose real part is non-negative. It follows from equation (9.16) that for a suitably chosen $x_0$ the sequence $\{x_t\}$ cannot be bounded, which contradicts the boundedness of $\{x_t\}$ we obtained earlier. Hence, $A-BB^*\bar P$ is asymptotically stable.

In general, the system need not be completely reconstructible. We now define the subspace of states which cannot be reconstructed.

Definition 9.23 The unreconstructible subspace of the system equation (9.17) is the linear subspace of $\mathbb{R}^d$ consisting of the states $a\in\mathbb{R}^d$ satisfying

$$y_t(a)=0,\quad\forall\,t\ge0.$$

The following theorem will be needed in this section. We state it without giving the proof.

Theorem 9.24 (Cayley–Hamilton theorem) Let

$$f(\lambda)=\det(\lambda I-A)=c_0+c_1\lambda+\cdots+c_{d-1}\lambda^{d-1}+\lambda^d$$

be the characteristic polynomial of the matrix $A$. Then, $f(A)=0$.

The following theorem gives an explicit representation of the unreconstructible subspace.

Theorem 9.25 The unreconstructible subspace of equation (9.17) is equal to the null space of the matrix

$$Q=\begin{pmatrix}D\\DA\\\cdots\\DA^{d-1}\end{pmatrix}.$$

Proof If $a$ is in the unreconstructible subspace, then $De^{At}a=0$ for all $t\ge0$. Thus,

$$\left(D+DAt+DA^2\frac{t^2}{2!}+\cdots\right)a=0,\quad\text{for all }t\ge0. \tag{9.22}$$

Therefore, $Da=DAa=DA^2a=\cdots=0$ and, hence, $Qa=0$.

On the other hand, suppose that $Qa=0$, i.e. $Da=DAa=\cdots=DA^{d-1}a=0$. It follows from the Cayley–Hamilton theorem that $f(A)=0$, where $f$ is the characteristic polynomial of the matrix $A$:

$$c_0I+c_1A+\cdots+c_{d-1}A^{d-1}+A^d=0.$$
Thus, $DA^da=0$. By induction, we can prove that $DA^ka=0$ for all $k$. Hence, equation (9.22) holds, and $a$ is in the unreconstructible subspace.

Definition 9.26 The pair of matrices $(A,D)$ is called detectable if the unreconstructible subspace of the system equation (9.17) is contained in the stable subspace of $A$.

Next, we study the system under state transformation, and investigate how the Riccati equation changes under this transformation. Consider the controlled system with output:

$$\dot x_t=Ax_t+Bu_t,\qquad y_t=Dx_t. \tag{9.23}$$

Let $U$ be an invertible $d\times d$-matrix and $x'_t=Ux_t$. Note that $x'_t$ is just the notation for another state variable. It should not be confused with the derivative of $x_t$ or the transpose of $x_t$. Then

$$\dot x'_t=UAU^{-1}x'_t+UBu_t,\qquad y_t=DU^{-1}x'_t.$$

Let $P'_t=(U^{-1})^*P_tU^{-1}$. Then

$$-\dot P'_t=(UAU^{-1})^*P'_t+P'_tUAU^{-1}+(DU^{-1})^*DU^{-1}-P'_tUB(UB)^*P'_t.$$

Note that for $t\to\infty$, $P'_t$ has a limit if and only if $P_t$ has a limit.

Suppose that the original system is not completely reconstructible. Denote the rank of the matrix $Q$ by $d_1<d$. Let $f_1,f_2,\dots,f_{d_1}$ be a set of linearly independent row vectors of $Q$. Let $f_{d_1+1},\dots,f_d$ be row vectors, orthogonal to $f_1,f_2,\dots,f_{d_1}$, such that $\{f_1,\dots,f_d\}$ form a basis of $\mathbb{R}^d$. Let $U_1$ and $U_2$ be $d_1\times d$ and $(d-d_1)\times d$ matrices formed by the row vectors $f_1,f_2,\dots,f_{d_1}$ and $f_{d_1+1},\dots,f_d$, respectively. Let $U=\begin{pmatrix}U_1\\U_2\end{pmatrix}$. Let $U^{-1}=(T_1,T_2)$ be the corresponding partition, i.e. $T_1$ is $d\times d_1$ and $T_2$ is $d\times(d-d_1)$. Then

$$UU^{-1}=\begin{pmatrix}U_1T_1&U_1T_2\\U_2T_1&U_2T_2\end{pmatrix}=\begin{pmatrix}I&0\\0&I\end{pmatrix}.$$

Thus, $U_1T_2=0$. Since the row space of $U_1$ coincides with the row space of $Q$, for $x\in\mathbb{R}^d$, $U_1x=0$ implies $Qx=0$, and hence, $x$ is in the unreconstructible subspace. Since $U_1T_2=0$, all column vectors of $T_2$ must be in the unreconstructible subspace. There are $d-d_1$ linearly independent column vectors in $T_2$. Hence, these column vectors form a basis for the unreconstructible subspace. Thus, $U_1x=0$ for any $x$ in this subspace. Let $x'=\begin{pmatrix}x'_1\\x'_2\end{pmatrix}$ with $x'_1\in\mathbb{R}^{d_1}$ and $x'_2\in\mathbb{R}^{d-d_1}$. As

$$x=U^{-1}x'=T_1x'_1+T_2x'_2,$$

the unreconstructible subspace is $\{x':x'_1=0\}$ and the reconstructible subspace is $\{x':x'_2=0\}$. Note that

$$UAU^{-1}=\begin{pmatrix}U_1AT_1&U_1AT_2\\U_2AT_1&U_2AT_2\end{pmatrix},$$

and $DU^{-1}=(DT_1,DT_2)$. If $x_0$ is in the unreconstructible subspace, by Theorem 9.25 and the Cayley–Hamilton theorem, it is easy to verify that $Ax_0$ is also in that subspace. Thus, the column vectors of $AT_2$ are also in that subspace, and hence, $U_1AT_2=0$. Since the rows of $D$ are row vectors of the unreconstructibility matrix $Q$, we must have $DT_2=0$. We summarize these observations in the following:

Lemma 9.27 The transformed system is represented in the unreconstructibility canonical form:

$$\begin{cases}\dot x'_t=\begin{pmatrix}A_{11}&0\\A_{21}&A_{22}\end{pmatrix}x'_t+\begin{pmatrix}B_1\\B_2\end{pmatrix}u_t,\\[4pt] y_t=(D_1,0)\,x'_t,\end{cases}$$

where $A_{11}$ is a $d_1\times d_1$-matrix, and the pair $(A_{11},D_1)$ is completely reconstructible. Further, the transformed Riccati equation for $P'_t\equiv(U^{-1})^*P_tU^{-1}$ is

$$-\dot P'_t=\begin{pmatrix}A_{11}&0\\A_{21}&A_{22}\end{pmatrix}^*P'_t+P'_t\begin{pmatrix}A_{11}&0\\A_{21}&A_{22}\end{pmatrix}+\begin{pmatrix}(D_1)^*D_1&0\\0&0\end{pmatrix}-P'_t\begin{pmatrix}B_1(B_1)^*&B_1(B_2)^*\\B_2(B_1)^*&B_2(B_2)^*\end{pmatrix}P'_t.$$

The reconstructible subspace is $\{x'\in\mathbb{R}^d:x'_2=0\}$, where $x'_2\in\mathbb{R}^{d-d_1}$ consists of the last $d-d_1$ components of $x'=Ux$.

Proof We only need to prove that $(A_{11},D_1)$ is completely reconstructible. Let $x\in\mathbb{R}^{d_1}$ be such that

$$D_1x=D_1A_{11}x=D_1(A_{11})^2x=\cdots=0.$$
Then, $DT_1x=0$. As

$$D_1A_{11}x=DT_1U_1AT_1x=D(I-T_2U_2)AT_1x=DAT_1x,$$

we get $DAT_1x=0$. Similarly, we can show that $DA^kT_1x=0$ for all $k$. Thus, $T_1x$ is in the unreconstructible subspace of $(A,D)$. This implies $T_1x=0$, and hence, $x=0$. Therefore, $(A_{11},D_1)$ is completely reconstructible.

We now consider the controlled system equation (9.12) that is not necessarily completely controllable.

Definition 9.28 The controllable subspace of the linear system equation (9.12) consists of the states that can be reached from the zero state within a finite time. Namely, $a\in\mathbb{R}^d$ is in the controllable subspace if and only if there exist $u\in L^2([0,T],\mathbb{R}^m)$ and $t_1>0$ such that $x_0=0$ and $x_{t_1}=a$. The pair $(A,B)$ is stabilizable if the unstable space of $A$ is contained in the controllable subspace of the linear system equation (9.12).

The next theorem gives an explicit characterization of the controllable subspace.

Theorem 9.29 The controllable subspace of the linear system equation (9.12) is equal to the linear subspace spanned by the columns of the controllability matrix

$$P=(B,AB,\dots,A^{d-1}B).$$

Proof Suppose $x_0=0$. Then

$$x_t=\int_0^te^{A(t-s)}Bu_s\,ds=B\int_0^tu_s\,ds+AB\int_0^t(t-s)u_s\,ds+A^2B\int_0^t\frac{(t-s)^2}{2!}u_s\,ds+\cdots.$$

Thus, $x_t$ is in the column space of the matrix $P_\infty=(B,AB,A^2B,\dots)$. By the Cayley–Hamilton theorem, $A^d$ can be represented as a linear combination of $I,A,A^2,\dots,A^{d-1}$. Thus, the column space of $P_\infty$ coincides with that of $P$. Hence, $x_t$ is in the column space of $P$.
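As an aside, the characterization in Theorem 9.29 reduces a controllability check to a rank computation: the system is completely controllable exactly when the controllability matrix has rank $d$. The following is a minimal sketch of that check in Python using exact rational arithmetic. The helper names (`mat_mul`, `controllability_matrix`, `rank`) and the two example systems are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def controllability_matrix(A, B):
    """Return (B, AB, ..., A^(d-1) B) as a list of rows."""
    d = len(A)
    blocks = [B]
    for _ in range(d - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    # Concatenate the blocks horizontally, row by row.
    return [[entry for block in blocks for entry in block[i]] for i in range(d)]

def rank(M):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Hypothetical example 1: double integrator, controllable (rank 2).
A1, B1 = [[0, 1], [0, 0]], [[0], [1]]
# Hypothetical example 2: the control never reaches the second coordinate (rank 1).
A2, B2 = [[1, 0], [0, 2]], [[1], [0]]

rank1 = rank(controllability_matrix(A1, B1))
rank2 = rank(controllability_matrix(A2, B2))
print(rank1, rank2)
```

By Theorem 9.25, the same `rank` routine applied to the stacked matrix $Q$ (rows $D, DA, \dots, DA^{d-1}$) gives the dimension $d_1$ of the reconstructible part, so the unreconstructible subspace has dimension $d-d_1$.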
On the other hand, suppose $a$ is in the column space of $P$. Namely, there are vectors $\alpha_0,\alpha_1,\dots,\alpha_{d-1}\in\mathbb{R}^m$ such that

$$a=B\alpha_0+AB\alpha_1+\cdots+A^{d-1}B\alpha_{d-1}.$$

We can choose a suitable function $u$ such that

$$\int_0^t\frac{(t-s)^i}{i!}u_s\,ds=\alpha_{i-1}1_{1\le i\le d}.$$

Thus, $a$ is in the controllable subspace of the linear system equation (9.12).

Next, we consider the controllability canonical form. Since the proof is similar to that of Lemma 9.27, we will omit it.

Lemma 9.30 Suppose that $T_1$ is a $d\times d_2$-matrix whose column vectors form a basis for the controllable subspace and $T_2$ is a $d\times(d-d_2)$-matrix whose column vectors are orthogonal to those of $T_1$, and, together with those of $T_1$, form a basis for $\mathbb{R}^d$. Let $T=(T_1,T_2)$ and $x'_t=T^{-1}x_t$. Then the system is transformed into the controllability canonical form:

$$\dot x'_t=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}x'_t+\begin{pmatrix}B_1\\0\end{pmatrix}u_t,$$

where $A_{11}$ is a $d_2\times d_2$-matrix, and the pair $(A_{11},B_1)$ is completely controllable. Further, the transformed Riccati equation becomes

$$-\dot P'_t=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}^*P'_t+P'_t\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}+(D')^*D'-P'_t\begin{pmatrix}B_1(B_1)^*&0\\0&0\end{pmatrix}P'_t.$$

Now we are ready to consider the limit of the Riccati equation. This limit will be established in the following four theorems.

Theorem 9.31 If $P_T^T=0$ and the intersection of the unstable, uncontrollable and reconstructible subspaces of the system equation (9.23) is empty, then, as $T\to\infty$, $P_0^T$ converges to $\bar P$, which is a solution to the following algebraic Riccati equation

$$A^*\bar P+\bar PA+D^*D-\bar PBB^*\bar P=0. \tag{9.24}$$
Proof By Lemma 9.27, the system can be written into the reconstructibility canonical form. For simplicity of notation, we denote $x'_t$ and $P'_t$ by $x_t$ and $P_t$, respectively. Then

$$\begin{cases}\dot x_t=\begin{pmatrix}A_{11}&0\\A_{21}&A_{22}\end{pmatrix}x_t+\begin{pmatrix}B_1\\B_2\end{pmatrix}u_t,\\[4pt] y_t=(D_1,0)\,x_t,\end{cases}$$

where $A_{11}$ is a $d_1\times d_1$-matrix, and the pair $(A_{11},D_1)$ is completely reconstructible. Partitioning the solution $P_t$ of the Riccati equation as

$$P_t=\begin{pmatrix}P_t^{11}&P_t^{12}\\(P_t^{12})^*&P_t^{22}\end{pmatrix},$$

then

$$-\dot P_t^{11}=D_1^*D_1-\left(P_t^{11}B_1+P_t^{12}B_2\right)\left(P_t^{11}B_1+P_t^{12}B_2\right)^*+\left(P_t^{11}A_{11}+P_t^{12}A_{21}\right)+\left(P_t^{11}A_{11}+P_t^{12}A_{21}\right)^*,$$

$$-\dot P_t^{12}=-\left(P_t^{11}B_1+P_t^{12}B_2\right)\left((P_t^{12})^*B_1+P_t^{22}B_2\right)^*+P_t^{12}A_{22}+A_{21}^*P_t^{22}+A_{11}^*P_t^{12},$$

$$-\dot P_t^{22}=-\left((P_t^{12})^*B_1+P_t^{22}B_2\right)\left((P_t^{12})^*B_1+P_t^{22}B_2\right)^*+P_t^{22}A_{22}+A_{22}^*P_t^{22},$$

with the terminal conditions $P_T^{11}=0$, $P_T^{12}=0$ and $P_T^{22}=0$. Therefore, $P_t^{22}=0$, $P_t^{12}=0$ and

$$-\dot P_t^{11}=D_1^*D_1-P_t^{11}B_1B_1^*P_t^{11}+P_t^{11}A_{11}+A_{11}^*P_t^{11},\qquad P_T^{11}=0.$$

It follows from this that the unreconstructible subspace does not affect the convergence of $P_t$. Therefore, we may and will assume that the system is completely reconstructible, and hence, the condition of the theorem becomes: “the intersection of the uncontrollable subspace and the unstable subspace is empty”. Thus, the uncontrollable subspace is contained in the stable subspace, i.e. the pair $(A,B)$ is stabilizable.

Now we transform the system equation (9.23) to the controllability canonical form (again, we use $P_t$ instead of $P'_t$ for the simplicity of notation):

$$\dot x_t=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}x_t+\begin{pmatrix}B_1\\0\end{pmatrix}u_t, \tag{9.25}$$

where $A_{11}$ is a $d_2\times d_2$-matrix, and the pair $(A_{11},B_1)$ is completely controllable. The transformed Riccati equation becomes

$$-\dot P_t=\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}^*P_t+P_t\begin{pmatrix}A_{11}&A_{12}\\0&A_{22}\end{pmatrix}+D^*D-P_t\begin{pmatrix}B_1B_1^*&0\\0&0\end{pmatrix}P_t. \tag{9.26}$$

Let $x^1\in\mathbb{R}^{d_2}$ and $x^2\in\mathbb{R}^{d-d_2}$ be such that $x=\begin{pmatrix}x^1\\x^2\end{pmatrix}$. For $x^2=0$, $x$ is in the controllable subspace. Hence, $x_t=\begin{pmatrix}x_t^1\\0\end{pmatrix}$ and there is a control $u$ such that $x_t^1=0$ for large $t$. For $x^1=0$, $x$ is in the uncontrollable subspace, which is contained in the stable subspace. By the stability, $x_t\to0$ as $t\to\infty$. In both cases, we see that $\{a_T\}$ defined in the proof of Theorem 9.22 is bounded. It then follows from the same arguments as in the proof of Theorem 9.22 that the limit of $P_0^T$ exists as $T\to\infty$. It is clear that the limit $\bar P$ is a solution to equation (9.24).
Theorem 9.32 Suppose that the system equation (9.23) is stabilizable and detectable. Then, the steady-state control law $u_t=-B^*\bar Px_t$ is asymptotically stable. Equivalently, we have that $A-BB^*\bar P$ is asymptotically stable.

Proof We have already seen in the proof of Theorem 9.31 that the steady-state control does not affect and is not affected by the unreconstructible part of the system. Since the system is detectable, we may and will omit the unreconstructible part and assume that the system is completely reconstructible. We now use the controllability canonical form. We partition the matrix $P_t$ as

$$P_t=\begin{pmatrix}P_t^{11}&P_t^{12}\\(P_t^{12})^*&P_t^{22}\end{pmatrix}.$$

It is easy to see from equation (9.26) that $P_t^{11}$ is the solution of

$$-\dot P_t^{11}=D_1^*D_1-P_t^{11}B_1B_1^*P_t^{11}+P_t^{11}A_{11}+A_{11}^*P_t^{11},\qquad P_T^{11}=0.$$

By Theorem 9.22, we see that $P_t^{11}$ has a limit $\bar P^{11}$ as $T\to\infty$, and that $A_{11}-B_1B_1^*\bar P^{11}$ is asymptotically stable. As $x=\begin{pmatrix}0\\x^2\end{pmatrix}$ is in the stable subspace, $x_t\to0$, and hence, $x_t^2\to0$. By equation (9.25), we have $\dot x_t^2=A_{22}x_t^2$. Thus, $A_{22}$ is stable.
The control law for the whole system is given by

$$u_t=-\left(B_1^*,0\right)\begin{pmatrix}\bar P^{11}&\bar P^{12}\\(\bar P^{12})^*&\bar P^{22}\end{pmatrix}x_t=-\left(B_1^*\bar P^{11},B_1^*\bar P^{12}\right)x_t.$$

With this control law, the system equation (9.23) becomes

$$\dot x_t=\begin{pmatrix}A_{11}-B_1B_1^*\bar P^{11}&A_{12}-B_1B_1^*\bar P^{12}\\0&A_{22}\end{pmatrix}x_t.$$

Since both $A_{11}-B_1B_1^*\bar P^{11}$ and $A_{22}$ are asymptotically stable, $x_t$ tends to 0 exponentially fast. Therefore, $A-BB^*\bar P$ is asymptotically stable.

Theorem 9.33 Suppose that the system equation (9.23) is stabilizable and detectable. Then, the steady-state control law minimizes

$$\lim_{T\to\infty}\left\{\int_0^T\left(|Dx_t|^2+|u_t|^2\right)dt+x_T^*\gamma_0x_T\right\} \tag{9.27}$$

for all $\gamma_0\ge0$. Further, the minimal value is $x_0^*\bar Px_0$. Here, $\gamma_0\ge0$ means that $\gamma_0$ is a non-negative-definite matrix.

Proof Obviously, the steady-state control law minimizes

$$\int_0^\infty\left(|Dx_t|^2+|u_t|^2\right)dt,$$

and the minimal value is $x_0^*\bar Px_0$. Denote the steady-state control law by $\bar u_t$ and the corresponding state process by $\bar x_t$. By the previous theorem, we see that $\lim_{T\to\infty}\bar x_T=0$. Therefore, the value of equation (9.27) with $(x_t,u_t)$ replaced by $(\bar x_t,\bar u_t)$ is equal to $x_0^*\bar Px_0$. Suppose that there exists another control law $u_t$ that gives a smaller value for equation (9.27). Then

$$\int_0^\infty\left(|Dx_t|^2+|u_t|^2\right)dt+\lim_{T\to\infty}x_T^*\gamma_0x_T<x_0^*\bar Px_0.$$

Since $\gamma_0\ge0$, this would imply that

$$\int_0^\infty\left(|Dx_t|^2+|u_t|^2\right)dt<x_0^*\bar Px_0. \tag{9.28}$$

This is not possible because the minimal value for the left-hand side of equation (9.28) is equal to the right-hand side. Therefore, equation (9.27) is minimized by $\bar u$.
Theorem 9.34 Suppose that the system equation (9.23) is stabilizable and detectable. Then the solution of the Riccati equation with $P_T=\gamma_0$ tends to $\bar P$ as $T\to\infty$. Further, $\bar P$ is the unique solution to the algebraic Riccati equation

$$A^*\bar P+\bar PA+D^*D-\bar PBB^*\bar P=0. \tag{9.29}$$

Proof Note that

$$\min_{u\in L^2([0,T],\mathbb{R}^m)}\left\{\int_0^T\left(|Dx_t|^2+|u_t|^2\right)dt+x_T^*\gamma_0x_T\right\}=x_0^*P_0^Tx_0.$$

By the last theorem, we get $P_0^T\to\bar P$ as $T\to\infty$. It is clear that $\bar P$ solves the algebraic Riccati equation (9.29).

Now we prove the uniqueness. Let $P'$ be any non-negative-definite solution of the algebraic Riccati equation. Consider the Riccati equation (9.13) with terminal condition $P_T=P'$. Obviously, the solution of the Riccati equation is $P_t=P'$ for all $t\le T$. Then, the steady-state solution $\bar P$ must also be given by $P'$. This proves the uniqueness for the solution to equation (9.29).
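The convergence statements of this section can be illustrated in the scalar case $d=m=1$. Writing $A=a$, $B=b$, $D=d$ (lower-case names are assumptions of this sketch), the Riccati equation with $P_T=0$ reads $-\dot P_t=2aP_t+d^2-b^2P_t^2$, and the algebraic Riccati equation (9.29) has the non-negative root $\bar P=(a+\sqrt{a^2+b^2d^2})/b^2$. The sketch below integrates the equation backward from the terminal time with a plain Euler scheme (a numerical choice made here, not a method from the text) and compares $P_0^T$ with $\bar P$.

```python
import math

def riccati_value_at_zero(a, b, d, T=20.0, ds=1e-3):
    """Euler integration of -dP/dt = 2aP + d^2 - b^2 P^2 backward from P_T = 0.

    In the reversed time s = T - t the equation is dP/ds = 2aP + d^2 - b^2 P^2
    with P(s=0) = 0; the value returned approximates P_0^T.
    """
    p = 0.0
    for _ in range(int(round(T / ds))):
        p += ds * (2.0 * a * p + d * d - b * b * p * p)
    return p

def are_root(a, b, d):
    """Non-negative root of the scalar algebraic Riccati equation."""
    return (a + math.sqrt(a * a + b * b * d * d)) / (b * b)

# Hypothetical coefficients: an unstable open-loop system (a > 0).
a, b, d = 1.0, 1.0, 1.0
p_bar = are_root(a, b, d)
p_0T = riccati_value_at_zero(a, b, d)
closed_loop = a - b * b * p_bar   # scalar version of A - B B* P-bar

print(p_0T, p_bar, closed_loop)
```

In this example the closed-loop coefficient equals $-\sqrt{a^2+b^2d^2}$, which is negative even though $a>0$, in line with Theorem 9.32.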
9.5
Stability for Kalman–Bucy filtering
After all the preparations in the last section, we can now discuss the stability of the linear filtering. We consider the filtering model whose signal is given by

$$dX_t=bX_t\,dt+c\,dW_t+\sigma\,dB_t, \tag{9.30}$$

and the observation process is

$$dY_t=hX_t\,dt+dW_t, \tag{9.31}$$

where $X_0$ is a $d$-dimensional normal random vector with mean $\hat X_0\in\mathbb{R}^d$ and covariance matrix $\gamma_0\in S_+^d$, the space of all non-negative-definite symmetric $d\times d$-matrices, $(W_t,B_t)$ is an $(m+d)$-dimensional Brownian motion, and the coefficients $b,c,\sigma,h$ are matrices of dimensions $d\times d$, $d\times m$, $d\times d$ and $m\times d$, respectively. As $dW_t=dY_t-hX_t\,dt$, we can rewrite equation (9.30) as

$$dX_t=(b-ch)X_t\,dt+c\,dY_t+\sigma\,dB_t. \tag{9.32}$$
Recall that $\hat X_t=E(X_t|\mathcal{G}_t)$ and $\gamma_t=E\left[(X_t-\hat X_t)(X_t-\hat X_t)^*\right]$ satisfy the following equations:

$$d\hat X_t=(b-ch-\gamma_th^*h)\hat X_t\,dt+(c+\gamma_th^*)\,dY_t, \tag{9.33}$$

and

$$\dot\gamma_t=\gamma_t(b-ch)^*+(b-ch)\gamma_t+\sigma^*\sigma-\gamma_th^*h\gamma_t.$$

For any $z\in\mathbb{R}^d$ and $R\in S_+^d$, we define the $d$-dimensional stochastic process $Z_t$ and $S_+^d$-valued function $P_t$ by

$$dZ_t=(b-ch-P_th^*h)Z_t\,dt+(c+P_th^*)\,dY_t,\qquad Z_0=z, \tag{9.34}$$

and

$$\dot P_t=P_t(b-ch)^*+(b-ch)P_t+\sigma^*\sigma-P_th^*hP_t,\qquad P_0=R.$$

Note that for $z=\hat X_0$ and $R=\gamma_0$, we have $Z_t=\hat X_t$ and $P_t=\gamma_t$. Thus, $(Z_t,P_t)$ can be regarded as the linear filter with “incorrect” initial. We will study the limit behavior of $\hat X_t-Z_t$ as $t\to\infty$. We need the following

Assumption (A): There exists a matrix $\gamma_\infty\in S_+^d$ such that

$$\gamma_\infty(b-ch)^*+(b-ch)\gamma_\infty+\sigma^*\sigma-\gamma_\infty h^*h\gamma_\infty=0, \tag{9.35}$$

and $b-ch-\gamma_\infty h^*h$ is asymptotically stable.

Lemma 9.35 If $((b-ch)^*,\sigma)$ is detectable and $((b-ch)^*,h^*)$ is stabilizable, then the Assumption (A) holds, $\gamma_\infty$ is the unique solution to equation (9.35), and $P_t\to\gamma_\infty$ exponentially fast for any initial condition $P_0=R\in S_+^d$.

Proof By Remark 9.12 and Theorem 9.34, we see that $\gamma_t$ converges to $\gamma_\infty$, which is the unique solution to the algebraic Riccati equation (9.35). By Theorem 9.32, the matrix $(b-ch)^*-h^*h\gamma_\infty$ is asymptotically stable. This implies that $b-ch-\gamma_\infty h^*h$ is also asymptotically stable. Let

$$0<\lambda_0<\inf\left\{-\mathrm{Re}\,\lambda:\lambda\text{ is an eigenvalue of }b-ch-\gamma_\infty h^*h\right\}. \tag{9.36}$$

Note that

$$\frac{d}{dt}(P_t-\gamma_\infty)=\left(b-ch-\frac12(P_t+\gamma_\infty)h^*h\right)(P_t-\gamma_\infty)+\left[\left(b-ch-\frac12(P_t+\gamma_\infty)h^*h\right)(P_t-\gamma_\infty)\right]^*.$$
Thus, there exists a constant $K_1$ such that

$$|P_t-\gamma_\infty|\le K_1e^{-\lambda_0t}.$$

If $R=\gamma_\infty$, then $Z_t$ is called the steady-state filter.

Corollary 9.36 Suppose that $((b-ch)^*,\sigma)$ is detectable and $((b-ch)^*,h^*)$ is stabilizable. If $R=\gamma_\infty$, then $Z_t$ is asymptotically optimal in the following sense:

$$\lim_{t\to\infty}E|X_t-Z_t|^2=\lim_{t\to\infty}E|X_t-\hat X_t|^2=\mathrm{tr}(\gamma_\infty),$$

where $\mathrm{tr}(\gamma)$ denotes the trace of the matrix $\gamma$.

Proof The second equality follows from the fact that

$$\mathrm{tr}(\gamma_t)=E\,\mathrm{tr}\left((X_t-\hat X_t)(X_t-\hat X_t)^*\right)=E\,\mathrm{tr}\left((X_t-\hat X_t)^*(X_t-\hat X_t)\right)=E|X_t-\hat X_t|^2.$$

To prove the first equality, we only need to show that

$$\lim_{t\to\infty}E|\hat X_t-Z_t|^2=0.$$

By equations (9.33) and (9.34), we get

$$d(\hat X_t-Z_t)=\left[(b-ch-\gamma_\infty h^*h)(\hat X_t-Z_t)+(\gamma_\infty-\gamma_t)h^*h\hat X_t-(\gamma_\infty-P_t)h^*hZ_t\right]dt+(\gamma_t-P_t)h^*\,dY_t.$$

Applying Itô's formula, we have

$$d\left(e^{-(b-ch-\gamma_\infty h^*h)t}(\hat X_t-Z_t)\right)=e^{-(b-ch-\gamma_\infty h^*h)t}\left[(\gamma_\infty-\gamma_t)h^*h\hat X_t-(\gamma_\infty-P_t)h^*hZ_t\right]dt+e^{-(b-ch-\gamma_\infty h^*h)t}(\gamma_t-P_t)h^*\,dY_t. \tag{9.37}$$

Then,

$$\begin{aligned}E|\hat X_t-Z_t|^2\le{}&3|\hat X_0-z|^2\left|e^{2(b-ch-\gamma_\infty h^*h)t}\right|\\&+6t\,E\int_0^t\left|e^{2(b-ch-\gamma_\infty h^*h)(t-s)}\right||\gamma_\infty-\gamma_s|^2|h^*h|^2|\hat X_s|^2\,ds\\&+6t\,E\int_0^t\left|e^{2(b-ch-\gamma_\infty h^*h)(t-s)}\right||\gamma_\infty-P_s|^2|h^*h|^2|Z_s|^2\,ds\\&+3E\left|\int_0^te^{(b-ch-\gamma_\infty h^*h)(t-s)}(\gamma_s-P_s)h^*\,dY_s\right|^2.\end{aligned} \tag{9.38}$$

By Theorem 9.15, it is easy to show that

$$\left|e^{(b-ch-\gamma_\infty h^*h)t}\right|\le e^{-\lambda_0t}.$$

Thus, by Lemma 9.35 and the boundedness of the second moments of $\hat X_t$ and $Z_t$, the fourth term of the right-hand side of equation (9.38) is bounded by

$$\begin{aligned}&6E\left|\int_0^te^{(b-ch-\gamma_\infty h^*h)(t-s)}(\gamma_s-P_s)h^*\,dW_s\right|^2+6E\left|\int_0^te^{(b-ch-\gamma_\infty h^*h)(t-s)}(\gamma_s-P_s)h^*hX_s\,ds\right|^2\\&\quad\le6E\int_0^t\left|e^{2(b-ch-\gamma_\infty h^*h)(t-s)}\right||\gamma_s-P_s|^2|h|^2\,ds+6t\,E\int_0^t\left|e^{2(b-ch-\gamma_\infty h^*h)(t-s)}\right||\gamma_s-P_s|^2|h^*h|^2|X_s|^2\,ds\\&\quad\le K_1\int_0^te^{-2\lambda_0(t-s)}e^{-2\lambda_0s}\,ds=K_2e^{-2\lambda_0t}.\end{aligned}$$

The other terms can be estimated similarly. Therefore, we get

$$E|\hat X_t-Z_t|^2\le K_2e^{-2\lambda_0t}\to0,\quad\text{as }t\to\infty. \tag{9.39}$$
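The forgetting of a wrong initial condition can be observed numerically in the scalar case with $c=0$. The sketch below runs the filter with the correct initial pair $(\hat X_0,\gamma_0)$ and the process $(Z_t,P_t)$ started from a wrong pair $(z,R)$ on the same simulated observation path, using an Euler–Maruyama discretization. All parameter values, the function names and the discretization are assumptions chosen for illustration; in the scalar case the root of (9.35) is $\gamma_\infty=(b+\sqrt{b^2+h^2\sigma^2})/h^2$.

```python
import math
import random

def gamma_infinity(b, h, sigma):
    """Non-negative root of 2*b*g + sigma^2 - h^2 * g^2 = 0 (scalar case, c = 0)."""
    return (b + math.sqrt(b * b + h * h * sigma * sigma)) / (h * h)

def run_filters(b=0.5, h=1.0, sigma=1.0, x_hat0=0.0, gamma0=1.0,
                z0=5.0, R=5.0, T=10.0, dt=1e-3, seed=0):
    """Simulate the signal, then the filters with correct and incorrect initials."""
    rng = random.Random(seed)
    x = rng.gauss(x_hat0, math.sqrt(gamma0))   # X_0 ~ N(x_hat0, gamma0)
    x_hat, gam = x_hat0, gamma0                # correct initial condition
    z, p = z0, R                               # "incorrect" initial condition
    sq = math.sqrt(dt)
    for _ in range(int(round(T / dt))):
        dB = rng.gauss(0.0, sq)
        dW = rng.gauss(0.0, sq)
        dY = h * x * dt + dW                   # observation increment
        x_hat += (b - gam * h * h) * x_hat * dt + gam * h * dY
        z += (b - p * h * h) * z * dt + p * h * dY
        gam += (2.0 * b * gam + sigma ** 2 - h * h * gam * gam) * dt
        p += (2.0 * b * p + sigma ** 2 - h * h * p * p) * dt
        x += b * x * dt + sigma * dB           # signal update
    return x_hat, z, gam, p

x_hat_T, z_T, gamma_T, p_T = run_filters()
g_inf = gamma_infinity(0.5, 1.0, 1.0)
print(abs(x_hat_T - z_T), gamma_T, p_T, g_inf)
```

With these values the signal itself is unstable ($b>0$), yet both covariance functions settle at $\gamma_\infty$ and the two filter means become indistinguishable, which is the behavior described by Lemma 9.35 and estimate (9.39).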
Now we study the a.s. convergence. Denote the right-hand side of equation (9.36) by $\bar\lambda$.

Theorem 9.37 Suppose that Assumption (A) holds. Let $X_0$ be normal with mean $\hat X_0$ and covariance matrix $\gamma_0$. Suppose that

$$\lim_{t\to\infty}\gamma_t=\lim_{t\to\infty}P_t=\gamma_\infty.$$

Then, for any $z\in\mathbb{R}^d$ and any $0<\lambda<\bar\lambda$, we have

$$\lim_{t\to\infty}\left|\hat X_t-Z_t\right|e^{\lambda t}=0,\quad\text{a.s.}$$

Proof First, we modify the proof of Corollary 9.36 to get an estimate that is better than that given in equation (9.39). By equation (9.37) and the Burkholder–Davis–Gundy inequality, it follows from the same arguments as those leading to equation (9.39) that

$$E\sup_{s\ge0}\left|e^{-(b-ch-\gamma_\infty h^*h)s}(\hat X_s-Z_s)\right|^2\le K_1,$$

where $K_1$ is a constant independent of $t$. Denote

$$U_s=e^{-(b-ch-\gamma_\infty h^*h)s}(\hat X_s-Z_s).$$

Then,

$$|\hat X_s-Z_s|\le e^{-\lambda_0s}|U_s|,$$

and hence,

$$E\sup_{s\ge0}|\hat X_s-Z_s|^2e^{2\lambda_0s}<\infty.$$

Thus,

$$|\hat X_t-Z_t|e^{\lambda t}\le e^{-(\lambda_0-\lambda)t}\sup_{s\ge0}|\hat X_s-Z_s|e^{\lambda_0s}\to0,\quad\text{a.s.}$$

Finally, we prove the asymptotic stability of the Kalman–Bucy filter. Note that, given $\mathcal{G}_t$, $\pi_t$ is a Gaussian probability measure with mean $\hat X_t$ and covariance matrix $\gamma_t$. Let $\bar\pi_t$ be, given $\mathcal{G}_t$, a Gaussian probability measure with mean $Z_t$ and covariance matrix $P_t$. Recall that the Wasserstein metric in the space $\mathcal{P}(\mathbb{R}^d)$ of probability measures is defined by

$$\rho(\nu_1,\nu_2)=\sup\left\{|\langle\nu_1,\phi\rangle-\langle\nu_2,\phi\rangle|:\phi\in B_1\right\},\quad\forall\,\nu_1,\nu_2\in\mathcal{P}(\mathbb{R}^d),$$

where

$$B_1=\left\{\phi:|\phi(x)-\phi(y)|\le|x-y|,\ |\phi(x)|\le1,\ \forall x,y\in\mathbb{R}^d\right\}.$$

Corollary 9.38 Under the conditions of Theorem 9.37, we have

$$\lim_{t\to\infty}\rho(\pi_t,\bar\pi_t)=0,\quad\text{a.s.}$$

Proof Let $\varphi(z)$ be the probability density function of the $d$-dimensional standard normal random vector. Then,

$$\int_{\mathbb{R}^d}f(x)\pi_t(dx)=\int_{\mathbb{R}^d}f\left(\hat X_t+\sqrt{\gamma_t}\,z\right)\varphi(z)\,dz.$$

Thus, for $f\in B_1$, we have

$$\begin{aligned}\left|\int_{\mathbb{R}^d}f(x)\pi_t(dx)-\int_{\mathbb{R}^d}f(x)\bar\pi_t(dx)\right|&\le\int_{\mathbb{R}^d}\left|\hat X_t+\sqrt{\gamma_t}\,z-Z_t-\sqrt{P_t}\,z\right|\varphi(z)\,dz\\&\le\left|\hat X_t-Z_t\right|+\left|\sqrt{\gamma_t}-\sqrt{P_t}\right|\int_{\mathbb{R}^d}|z|\varphi(z)\,dz\\&\le\left|\hat X_t-Z_t\right|+\sqrt d\left|\sqrt{\gamma_t}-\sqrt{P_t}\right|\to0,\quad\text{a.s.}\end{aligned}$$
9.6
Notes
The Kalman–Bucy filter and its stability were first studied by Bucy and Kalman [15]. Some other properties and applications have been studied by many authors. Here we refer the reader to Beneš and Karatzas [7], Chow, et al. [33], Delyon and Zeitouni [58], Makowski [119], Makowski and Sowers [120], Miller and Runggaldier [125] and Miller and Rubinovich [124] for these topics. Section 9.5 is based on Ocone and Pardoux [129]. The basic definitions and results for deterministic linear systems in Section 9.4 are taken from the book of Kwakernaak and Sivan [105]. The stability problem for linear filter was studied by Bucy and Kalman [15].
10
Stability of non-linear filtering
In this chapter, we consider the stability of the non-linear filter for the case when the observation and signal noises are independent, i.e. $c=0$ in the filtering model introduced in the previous chapters. In this case, $X_t$ is a time-homogeneous Markov process taking values in $\mathbb{R}^d$. We will consider the filtering problem with a state space that is more general than what we studied in the previous chapters, namely, we assume that $X_t$ is a continuous time-homogeneous Markov process in a Polish space $S$. We denote the generator of $X_t$ by $L$. Suppose that the observation process $Y_t$ is an $m$-dimensional process given by

$$Y_t=\int_0^th(X_s)\,ds+W_t, \tag{10.1}$$

where $h:S\to\mathbb{R}^m$ is a continuous map and $W_t$ is an $m$-dimensional Brownian motion independent of $X$. By the arguments similar to those in Chapters 5 and 7, the optimal filter $\pi_t\equiv P(\cdot|\mathcal{G}_t)$ is the unique solution to the following filtering equation on $\mathcal{P}(S)$: For any $f$ in the domain $\mathcal{D}(L)$ of $L$,

$$\langle\pi_t,f\rangle=\langle\pi_0,f\rangle+\int_0^t\langle\pi_s,Lf\rangle\,ds+\int_0^t\left(\langle\pi_s,fh^*\rangle-\langle\pi_s,f\rangle\langle\pi_s,h^*\rangle\right)d\nu_s, \tag{10.2}$$

where $\pi_0$ is the law of $X_0$, and $\nu_t$ is the innovation process given by

$$d\nu_t=dY_t-\langle\pi_t,h\rangle\,dt.$$

Note that $\nu_t$ is an $m$-dimensional Brownian motion.

In the last chapter, we studied the stability problem for the Kalman–Bucy filter. Namely, when the initial of the filtering equation is normal with “incorrect” mean and variance, the difference between the Kalman–Bucy filter with the correct initial and that with the incorrect one tends to zero as $t\to\infty$. A natural generalization in the non-linear case is the investigation of the following question: Under what conditions does the distance between $\pi_t$ and $\bar\pi_t$ tend to 0 as $t\to\infty$? Here, $\bar\pi_t$ is the solution of equation (10.2) with $\pi_0$ replaced by $\bar\pi_0\in\mathcal{P}(S)$.

Definition 10.1 The filtering model is asymptotically stable if for any $\pi_0,\bar\pi_0\in\mathcal{P}(S)$, we have

$$\lim_{t\to\infty}d(\bar\pi_t,\pi_t)=0$$

in probability, where $d(\cdot,\cdot)$ is a suitable metric in the space of probability measures on $S$.
10.1
Markov property of the optimal filter
As we mentioned above, the signal $X_t$ is a continuous time-homogeneous Markov process taking values in $S$. We denote the transition probability of the Markov process $X_t$ by $p(t,x,A)$, where $t\ge0$, $x\in S$ and $A\in\mathcal{B}(S)$. Then, for any $(s,x)\in\mathbb{R}_+\times S$, there exists a probability measure $P_{s,x}$ on $C(\mathbb{R}_+,S)$ such that for $t>s$ and $A\in\mathcal{B}(S)$,

$$P_{s,x}\left(\xi_t\in A\,\middle|\,\mathcal{F}_s^\xi\right)=p(t-s,x,A),\quad P_{s,x}\text{-a.s.},$$

and

$$P_{s,x}(\xi_u=x,\ 0\le u\le s)=1,$$

where $\xi_t$ is the co-ordinate process on $C(\mathbb{R}_+,S)$, i.e. $\xi_t(\theta)=\theta_t$ for all $\theta\in C(\mathbb{R}_+,S)$.

We assume throughout this chapter that the following condition (F) is satisfied.

Condition (F): The mapping $(s,x)\to P_{s,x}$ from $\mathbb{R}_+\times S$ to $\mathcal{P}(C(\mathbb{R}_+,S))$ is continuous.

Remark 10.2 Under Condition (F), $X_t$ becomes a Feller–Markov process, i.e. for any $f\in C_b(S)$, $T_tf$ defined by

$$T_tf(x)\equiv\int_Sf(y)\,p(t,x,dy)$$

is still in $C_b(S)$.

Remark 10.3 Although the Markov process $X_t$ is assumed to be time-homogeneous and hence, the distribution can be characterized by the family of probability measures $\{P_{0,x}:x\in S\}$, it is more convenient to use $\{P_{s,x}:s\ge0,\ x\in S\}$, which is usually reserved for time-inhomogeneous Markov processes. This will become clear when the two-parameter processes indexed by $(s,t)$ are defined later.
Since we will discuss the filtering problems with different initial distributions, it is convenient to construct all filtering models under the same (standard) stochastic basis. We recall that the observation is an $m$-dimensional process given by equation (10.1). Let $\beta_t$ be the co-ordinate process on $C(\mathbb{R}_+,\mathbb{R}^m)$. Let $Q$ be the probability measure on $C(\mathbb{R}_+,\mathbb{R}^m)$ induced by the Brownian motion $W_t$. Let

$$\hat\Omega=C(\mathbb{R}_+,S)\times C(\mathbb{R}_+,\mathbb{R}^m)\quad\text{and}\quad R_{s,\lambda}=P_{s,\lambda}\otimes Q,$$

where for $\lambda\in\mathcal{P}(S)$, $P_{s,\lambda}\in\mathcal{P}(C(\mathbb{R}_+,S))$ is defined as

$$P_{s,\lambda}(A)=\int_SP_{s,x}(A)\,\lambda(dx),\quad\forall A\in\mathcal{B}(C(\mathbb{R}_+,S)).$$

Namely, $P_{s,\lambda}$ is the distribution of the Markov process $X_t$ with initial distribution $\lambda$, and $R_{s,\lambda}$ is the joint distribution of $(X,W)$. Let $\hat{\mathcal{F}}_t$ be the $\sigma$-field on $\hat\Omega$ generated by the co-ordinate processes $\xi$ and $\beta$ stopped at $t$, i.e. $\hat{\mathcal{F}}_t=\mathcal{F}_t^{\xi,\beta}$. Denote $\hat{\mathcal{F}}_\infty$ by $\hat{\mathcal{F}}$.

To define a common version of the stochastic integral $\int_s^th(\xi_u)^*\,d\beta_u$ with respect to the probability measures $R_{s,\lambda}$, $(s,\lambda)\in\mathbb{R}_+\times\mathcal{P}(S)$, we prove it to be a measurable functional of $\xi$ and $\beta$. We need the following result adapted from Karandikar [85].

Lemma 10.4 Let $W$ be an $m$-dimensional Brownian motion and let $f$ be an adapted (with respect to the original stochastic basis $(\Omega,\mathcal{F},P,\mathcal{F}_t)$) $\mathbb{R}^m$-valued continuous stochastic process. Let $\tau_0^n=0$ and

$$\tau_{i+1}^n=\inf\left\{t\ge\tau_i^n:|f_t-f_{\tau_i^n}|\ge2^{-n}\right\},\quad i\ge0.$$

For $\tau_k^n\le t<\tau_{k+1}^n$, $k\ge0$, we define

$$I_t^n=\sum_{i=0}^{k-1}f_{\tau_i^n}^*\left(W_{\tau_{i+1}^n}-W_{\tau_i^n}\right)+f_{\tau_k^n}^*\left(W_t-W_{\tau_k^n}\right).$$

Then, for all $T<\infty$, when $n\to\infty$, we have

$$\sup_{0\le t\le T}\left|I_t^n-\int_0^tf_s^*\,dW_s\right|\to0,\quad\text{a.s.} \tag{10.3}$$

Proof We denote the left-hand side of equation (10.3) by $U_n$. Let

$$v_t^n=\tau_k^n\quad\text{if }\tau_k^n\le t<\tau_{k+1}^n,\quad k=0,1,2,\dots.$$

Then,

$$U_n=\sup_{0\le t\le T}\left|\int_0^t\left(f_{v_s^n}-f_s\right)^*dW_s\right|.$$

By Doob's inequality we have

$$EU_n^2\le4E\int_0^T|f_{v_s^n}-f_s|^2\,ds\le4T2^{-2n}.$$

Therefore,

$$E\sum_{n=1}^\infty U_n^2<\infty,$$

and hence, $U_n\to0$ a.s.
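The construction in Lemma 10.4 is pathwise: the approximation $I_t^n$ uses only the two sample paths. A minimal sketch on a discrete time grid, for $m=1$ and the integrand $f=W$, is given below. The grid, the seed and the function names are assumptions of this illustration, and the fine-grid left-point sum stands in for the stochastic integral. On the grid, the frozen integrand differs from $f$ by less than $2^{-n}$ at every step, which is the discrete counterpart of the bound used in the proof.

```python
import math
import random

def brownian_path(n_steps, dt, rng):
    """Sample a scalar Brownian path on a uniform grid, starting at 0."""
    w = [0.0]
    s = math.sqrt(dt)
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, s))
    return w

def stopping_time_sum(f, w, n):
    """Grid version of I^n at the final time.

    The integrand is frozen at its value at the last stopping time and is
    refreshed whenever it has moved by at least 2**-n from the frozen value.
    """
    eps = 2.0 ** (-n)
    frozen = f[0]
    total = 0.0
    for k in range(len(w) - 1):
        if abs(f[k] - frozen) >= eps:
            frozen = f[k]          # a new stopping time has been reached
        total += frozen * (w[k + 1] - w[k])
    return total

def left_point_sum(f, w):
    """Fine-grid left-point (Ito) sum, used here as the reference value."""
    return sum(f[k] * (w[k + 1] - w[k]) for k in range(len(w) - 1))

rng = random.Random(1)
w = brownian_path(20000, 5e-5, rng)   # path on [0, 1]
f = w                                 # integrand f = W
reference = left_point_sum(f, w)
errors = {n: abs(stopping_time_sum(f, w, n) - reference) for n in (2, 4, 6, 40)}
print(errors)
```

Since only the paths enter, the same routine can be applied to any pair of continuous paths, which is the point exploited in Corollary 10.5.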
As a consequence, we can prove that the stochastic integral is a functional of the integrand and the driving Brownian motion.

Corollary 10.5 There is a measurable mapping $F$ from $C(\mathbb{R}_+,\mathbb{R}^m)\times C(\mathbb{R}_+,\mathbb{R}^m)$ to $C(\mathbb{R}_+,\mathbb{R})$ such that for any $m$-dimensional Brownian motion $W$ and any $\mathbb{R}^m$-valued continuous stochastic process $f$, we have

$$F(f,W)_t=\int_0^tf_s^*\,dW_s,\quad\forall t\ge0,\ \text{a.s.} \tag{10.4}$$

Proof Fix $\zeta,\eta\in C(\mathbb{R}_+,\mathbb{R}^m)$. Let $a_0^n=0$ and, for $i\ge0$,

$$a_{i+1}^n=\inf\left\{t\ge a_i^n:|\zeta_t-\zeta_{a_i^n}|\ge2^{-n}\right\}.$$

Define a sequence $F_n$ of mappings from $C(\mathbb{R}_+,\mathbb{R}^m)\times C(\mathbb{R}_+,\mathbb{R}^m)$ to $C(\mathbb{R}_+,\mathbb{R})$ as follows: For $a_k^n\le t<a_{k+1}^n$, $k\ge0$,

$$F_n(\zeta,\eta)_t=\sum_{i=0}^{k-1}\zeta_{a_i^n}^*\left(\eta_{a_{i+1}^n}-\eta_{a_i^n}\right)+\zeta_{a_k^n}^*\left(\eta_t-\eta_{a_k^n}\right).$$

We then define

$$F(\zeta,\eta)=\begin{cases}\lim_{n\to\infty}F_n(\zeta,\eta)&\text{if the limit exists,}\\0&\text{otherwise.}\end{cases}$$

Equation (10.4) follows from Lemma 10.4.

Let $Z_t:\hat\Omega\to\mathbb{R}$ be the stochastic process such that

$$Z_t-Z_s=\int_s^th(\xi_u)^*\,d\beta_u,\quad R_{s,\lambda}\text{-a.s.},$$
where the stochastic integral is understood as the functional of h(ξ ) and β ˆ let defined in the corollary above. For 0 ≤ s ≤ t and (θ , η) ∈ , 1 t qst (θ , η) ≡ exp Zt (θ , η) − Zs (θ , η) − |h(ξu (θ ))|2 du . 2 s Note that ξt (θ ) = θt and βt (η) = ηt . Therefore, qst above can also be denoted by qst (ξ , β). We define an MF (S)-valued process st (λ) and a P (S)-valued process !st (λ) on C(R+ , Rm ) as f (ξt (θ))qst (θ , η)Ps,x (dθ )λ(dx), Q-a.s. η, st (λ)(η), f ≡ S C(R+ ,S)
and !st (λ)(η) =
st (λ)(η) .
st (λ)(η), 1
(10.5)
We shall denote q0t , 0t and !0t by qt , t and !t , respectively. By the Kallianpur–Striebel formula, we have πt = !t (π0 )(Y),
a.s., ∀t ≥ 0.
Proposition 10.6 Let π¯ t be defined as the unique solution of equation (10.2) with initial π¯ 0 . Then, for any t ≥ 0, π¯ t = !t (π¯ 0 )(Y),
a.s.
(10.6)
Proof As ξt is a Markov process with generator L, for any f ∈ D(L) with Lf ∈ Cb (S), t f Nt ≡ f (ξt ) − Lf (ξs )ds 0
is a square-integrable martingale independent of βt , under probability measure R0, π¯ 0 . It is easy to show that dqt = qt h(ξt )∗ dβt . By Itô’s formula, we have d(f (ξt )qt ) = Lf (ξt )qt dt + qt dNt + f (ξt )qt h(ξt )∗ dβt . f
As
t (π¯ 0 )(η), f =
C(R+ ,S)
f (ξt (θ))qt (θ , η)Pπ¯ 0 (dθ ),
we get
t (π¯ 0 ), f = π¯ 0 , f +
0
t
s (π¯ 0 ), Lf ds +
t
0
s (π¯ 0 ), fh∗ dβs .
Applying Itô’s formula to equation (10.5), we then obtain d !t (π¯ 0 ), f = !t (π¯ 0 ), Lf dt + !s (π¯ 0 ), fh∗ dβt − !t (π¯ 0 ), f !t (π¯ 0 ), h∗ dβt − !t (π¯ 0 ), fh∗ !t (π¯ 0 ), h dt 2 + !t (π¯ 0 ), f !t (π¯ 0 ), h dt = !t (π¯ 0 ), Lf dt + !s (π¯ 0 ), fh∗ − !t (π¯ 0 ), f !t (π¯ 0 ), h∗ d ν˜ , where
ν˜ t = βt −
0
t
!s (π¯ 0 ), h ds.
Replacing βt by Yt , we see that !t (π¯ 0 )(Y) satisfies equation (10.2) with initial π¯ 0 . The representation equation (10.6) follows from the uniqueness of the solution to equation (10.2). The next lemma establishes the flow property for both processes and !, which is the key in the proof of their Markov property. Lemma 10.7 Fix 0 ≤ s < t < ∞ and λ ∈ P (S). Then t (λ) = st (s (λ)) and !t (λ) = !st (!s (λ)) ,
Q-a.s.
Proof Let D = {(θ , θ) ∈ C(R+ , S) × C(R+ , S) : θs = θs }. We define a mapping (θ , θ) ∈ D → θ˜ ∈ C(R+ , S) as if u ≤ s, θu θ˜u = θu if u ≥ s. Then, qt (θ˜ , η) = qs (θ , η)qst (θ , η), and, by the Markov property of the measure Px , we have ˜ = Px (dθ )Ps,ξs (θ ) (dθ ). Px (d θ)
Therefore, for any η ∈ C(R+ , Rm ), st (s (λ)) (η), f f (ξt (θ))qst (θ , η)Ps,x (dθ) s (λ)(dx) = S
=
C(R+ ,S)
S C(R+ ,S)
=
S C(R+ ,S)
C(R+ ,S)
f (ξt (θ))qst (θ , η)Ps,ξs (θ ) (dθ ) qs (θ , η)Px (dθ )λ(dx)
˜ t (θ, ˜ η)Px (d θ)λ(dx) ˜ f (ξt (θ))q
= t (λ)(η), f . The second equality of the lemma follows from the first one, the definition equation (10.5), and the fact that !st (λ) = !st (λ¯ ), where λ¯ = λ, 1−1 λ is the normalization of λ.
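Note that, for $(s, \lambda) \in \mathbb{R}_+ \times \mathcal{P}(S)$ fixed, the process $\{q_{st} : t \ge s\}$ is a martingale on the stochastic basis $(\hat\Omega, \hat{\mathcal F}, R_{s,\lambda}, \hat{\mathcal F}_t)$.
Lemma 10.8 Let $s \ge 0$ be fixed. On the stochastic basis $(C^m, \mathcal F^\beta_\infty, Q, \mathcal F^\beta_t)$, we define the process $\{\rho_{st} : t \ge 0\}$ as follows: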
st (λ), 1 if t ≥ s, ρst ≡ 1 if t ≤ s. Then, {ρst , t ≥ 0} is a martingale. β
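Proof Suppose that $t > r \ge s$ and $A \in \mathcal F^\beta_r$. Then
\[
\begin{aligned}
E^Q\big(\rho_{st} 1_A(\eta)\big)
&= \int_A \int_{C(\mathbb{R}_+, S)} q_{st}(\theta, \eta)\, P_{s,\lambda}(d\theta)\, Q(d\eta) \\
&= \int_{\hat\Omega} 1_A(\eta)\, q_{st}(\theta, \eta)\, R_{s,\lambda}(d\theta\, d\eta) \\
&= \int_{\hat\Omega} 1_A(\eta)\, q_{sr}(\theta, \eta)\, R_{s,\lambda}(d\theta\, d\eta) \\
&= E^Q\big(\rho_{sr} 1_A(\eta)\big).
\end{aligned}
\]
Thus,
\[
E^Q\big(\rho_{st} \,\big|\, \mathcal F^\beta_r\big) = \rho_{sr}. \tag{10.7}
\]
For $t > s \ge r$, we have
\[
E^Q\big(\rho_{st} \,\big|\, \mathcal F^\beta_r\big) = E^Q\Big(E^Q\big(\rho_{st} \,\big|\, \mathcal F^\beta_s\big) \,\Big|\, \mathcal F^\beta_r\Big) = 1 = \rho_{sr}.
\]
The proof for the case $s \ge t > r$ is trivial. Thus, equation (10.7) holds in all cases. This implies the martingale property of $\rho_{st}$.
Based on Lemma 10.8, we define a probability measure $Q_{s,\lambda}$ on $C^m$ such that
\[
\frac{dQ_{s,\lambda}}{dQ} = \rho_{st} \quad \text{on } \mathcal F^\beta_t.
\]
We denote $Q_{0,\lambda}$ by $Q_\lambda$.
Lemma 10.9 $Q_{\pi_0}$ is the law of the observation process $Y$.
Proof Let $\hat R_{\pi_0}$ be the probability measure on $\hat\Omega$ such that on $\mathcal F_t$,
\[
\frac{d\hat R_{\pi_0}}{d(P_{\pi_0} \otimes Q)}(\theta, \eta) = q_t(\theta, \eta). \tag{10.8}
\]
It follows from Girsanov's theorem that under $\hat R_{\pi_0}$, the process
\[
\beta_t - \int_0^t h(\xi_s)\, ds
\]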
is a Brownian motion, which is independent of ξt . Thus, the law of βt under ˆ π coincides with that of Yt under P. Note that for A ∈ B (Cm ), R 0 ˆ π (C(R+ , S) × A) ˆ π (η : β(η) ∈ A) = R R 0 0 qt (θ , η)Pπ0 (dθ )Q(dη) = C(R+ ,S) A
=
A
t (π0 )(η), 1 Q(dη)
= Qπ0 (dη). This implies that Qπ0 is the law of the observation process Y.
Now, we are ready to prove the Markov property for the optimal filter. β
Theorem 10.10 Let λ ∈ P (S). Then, the stochastic processes (t (λ), Ft ) β Markov processes on the probabiland (!t (λ), Ft ) are time-homogeneous β m ity space C , F∞ , Q .
Proof Fix $0 \le s < t < \infty$, $\lambda \in \mathcal{P}(S)$ and $A \in \mathcal F^\beta_s$. Then, for $f \in C_b(M_F(S))$,
A
f (t (λ))dQ =
A
f (st (s (λ)))dQ
Q
=E
1A E
=
A
β f (st (s (λ)))Fs
Q
f1 (s, t, s (λ))dQ,
(10.9)
where f1 (s, t, µ) = EQ f (st (µ)) for any s < t and µ ∈ MF (S). We note that the law of {(ξs+u , βs+u − βs ) : u ≥ 0} under Rs,λ is the same as that {(ξu , βu ) : u ≥ 0} under Rλ . As a consequence, the law of st (λ) under Rs,λ is the same as that of t−s (λ) under Rλ . Thus, f1 depends on (s, t) only through t − s. Let f2 (t − s, µ) = f1 (s, t, µ). We may continue the calculation of equation (10.9) above with
A
f (t (λ))dQ =
A
f2 (t − s, s (λ))dQ.
This implies that EQ f (t (λ))|Fsβ = f2 (t − s, s (λ)). β β Therefore, on the probability space Cm , F∞ , Q , t (λ), Ft is a timehomogeneous Markov process. The conclusion for !t (λ) can be proved similarly. ξ
β
As Ft and Ft are independent under Rπ0 , we have
(!t (π¯ 0 )(η), ξt (θ )) , Fˆ t ˆ Fˆ , Rπ ). Markov process on the probability space (, Corollary 10.11 The stochastic process
is a
0
ˆπ , The next theorem states the Markov property under the measure R 0 which is defined by equation (10.8). Theorem 10.12 The stochastic process (!t (π¯ 0 )(η), ξt (θ )) is Markovian on ˆ π , Fˆ t ) taking values in P (S) × S. ˆ Fˆ , R the stochastic basis (, 0
Proof Fix 0 < s < t. Let f : P (S) × S → R be a bounded measurable function and let A ∈ Fˆ s . Then A
ˆπ = f (!t (π¯ 0 ), ξt )d R 0
A
=
A
f (!t (π¯ 0 ), ξt )qt dRπ0 ERπ0 f (!st (!s (π¯ 0 )) , ξt ) qs qst Fˆ s dRπ0
=
A
=
A
f1 (!s (π¯ 0 ), ξs )qs dRπ0 ˆπ , f1 (!s (π¯ 0 ), ξs )d R 0
where f1 (λ, x) = ERs,x f (!st (λ), ξt )qst . Hence, ˆ ERπ0 f (!t (π¯ 0 ), ξt )Fˆ s = f1 (!s (π¯ 0 ), ξs ). This implies the desired Markov property.
ˆ π is the same as the law of (Xt , Yt ). Note that the law of (ξt , βt ) under R 0 As π¯ t = !t (π¯ 0 )(Y) and Xt = ξt (X), we get the following: Corollary 10.13 The process {(π¯ t , Xt ), Ft } is a P (S) × S-valued Markov process. Finally, in this section, we study the Feller property of the Markov process (π¯ t , Xt ). Lemma 10.14 Suppose that the sequence {λn } ⊂ P (S) converges to λ ∈ P (S) weakly. Then, for all t ≥ 0, t (λn ) → t (λ) in Q probability. As a consequence, we get that !t (λn ) → !t (λ) in Q probability. Proof By the continuity of Px in x, we see that Pλn converges weakly to Pλ in the space P (C(R+ , S)). By Skorohod’s representation theorem (cf. Theorem 25.6 on page 343 of Billingsley [11]), there exists a probability ˜ a sequence of continuous S-valued processes {ξ˜tn , n ≥ 1} and an space , S-valued process ξ˜t such that ξ˜ n and ξ˜ have distributions Pλn and Pλ on ˜ C(R+ , S), respectively, and ξ˜ n (ω) ˜ → ξ˜ (ω) ˜ in C(R+ , S) for almost all ω˜ ∈ .
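Recall that, by Condition (BC), the mapping $h$ is bounded by the constant $K$. Note that for $\tilde\omega \in \tilde\Omega$, we have
\[
\begin{aligned}
\int_{C^m} q_t(\tilde\xi^n(\tilde\omega), \eta)^2\, Q(d\eta)
&= \int_{C^m} \exp\left( \int_0^t 2h(\tilde\xi^n_s(\tilde\omega))^* d\beta_s(\eta) - \frac12 \int_0^t \big|2h(\tilde\xi^n_s(\tilde\omega))\big|^2 ds \right) \\
&\qquad \times \exp\left( \int_0^t \big|h(\tilde\xi^n_s(\tilde\omega))\big|^2 ds \right) Q(d\eta) \\
&\le e^{K^2 t} \int_{C^m} \exp\left( \int_0^t 2h(\tilde\xi^n_s(\tilde\omega))^* d\beta_s(\eta) - \frac12 \int_0^t \big|2h(\tilde\xi^n_s(\tilde\omega))\big|^2 ds \right) Q(d\eta) \\
&= e^{K^2 t}.
\end{aligned}
\tag{10.10}
\]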
Similarly, we can prove that
\[
\int_{C^m} q_t(\tilde\xi(\tilde\omega), \eta)^2\, Q(d\eta) \le e^{K^2 t}. \tag{10.11}
\]
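Further, we have
\[
\begin{aligned}
&\int_{C^m} \int_{\tilde\Omega} \big| \log q_t(\tilde\xi^n(\tilde\omega), \eta) - \log q_t(\tilde\xi(\tilde\omega), \eta) \big|^2\, \tilde P(d\tilde\omega)\, Q(d\eta) \\
&\quad \le 2 \int_{C^m} \int_{\tilde\Omega} \left| \int_0^t \big( h(\tilde\xi^n_s(\tilde\omega)) - h(\tilde\xi_s(\tilde\omega)) \big)^* d\beta_s(\eta) \right|^2 \tilde P(d\tilde\omega)\, Q(d\eta) \\
&\qquad + 2 \int_{C^m} \int_{\tilde\Omega} \left| \frac12 \int_0^t \big( |h(\tilde\xi^n_s(\tilde\omega))|^2 - |h(\tilde\xi_s(\tilde\omega))|^2 \big)\, ds \right|^2 \tilde P(d\tilde\omega)\, Q(d\eta) \\
&\quad \le \big( 2 + (2K)^2 t \big) \int_{C^m} \int_{\tilde\Omega} \int_0^t \big| h(\tilde\xi^n_s(\tilde\omega)) - h(\tilde\xi_s(\tilde\omega)) \big|^2 ds\, \tilde P(d\tilde\omega)\, Q(d\eta).
\end{aligned}
\tag{10.12}
\]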
Using the inequalities equations (10.10), (10.11), (10.12) and x e − ey ≤ (ex + ey )|x − y|, we get
2 ˜ qt (ξ˜ n (ω), ˜ ˜ η) − qt (ξ (ω), ˜ η) P(d ω) ˜ Q(dη) ˜ Cm 2 t ˜n K2 t 2 ˜ ω)Q(dη). ˜ − h(ξ˜s (ω)) ˜ dsP(d ˜ 2 + (2K) t ≤ 2e h(ξs (ω)) Cm
˜ 0
Then, for any f ∈ Cb (S), | t (λn )(η) − t (λ)(η), f |2 Q(dη) Cm
2 n n Q(dη) ˜ ˜ ˜ ˜ ˜ ( ω))q ˜ ( ξ ( ω), ˜ η) − f ( ξ ( ω))q ˜ ( ξ ( ω), ˜ η) P(d ω) ˜ f ( ξ = t t t t ˜ Cm 2 ˜n ˜ ω) ˜ ˜ − f (ξ˜t (ω)) ˜ P(d ≤ K1 f (ξt (ω))
˜
2 ˜ n ˜ ˜ ˜ η) − qt (ξ (ω), ˜ η) P(d ω) ˜ Q(dη) + K1 qt (ξ (ω), ˜ Cm 2 ˜n ˜ ω) ˜ ˜ − f (ξ˜t (ω)) ˜ P(d ≤ K1 f (ξt (ω))
˜
+ K2
Cm
t 2 ˜n ˜ ω)Q(dη) ˜ − h(ξ˜s (ω)) ˜ dsP(d ˜ h(ξs (ω)) ˜ 0
→ 0,
(10.13)
where K1 , K2 are two constants. The conclusion of the lemma then follows easily. Define two families of operators {Tt : t ≥ 0} and {St : t ≥ 0} on Cb (P (S)) and Cb (P (S) × S), respectively, as follows:
Tt G(λ) = EQλ G(!t (λ)),
∀ G ∈ Cb (P (S)) and λ ∈ P (S),
and ˆ
St F(λ, x) = ERx F(!t (λ), ξt ),
∀ F ∈ Cb (P (S) × S) and (λ, x) ∈ P (S) × S.
Theorem 10.15 The families of operators {Tt } and {St } are Feller semigroups on Cb (P (S)) and Cb (P (S) × S), respectively. Proof By Lemma 10.14, for {λn } ⊂ P (S) converging to λ ∈ P (S), we have
Tt G(λn ) = EQλn G(!t (λn )) = EQ G(!t (λn )) t (λn ), 1 → EQ G(!t (λ)) t (λ), 1 = Tt G(λ). This proves that Tt is a mapping from Cb (P (S)) to Cb (P (S)), and hence, {Tt : t ≥ 0} is a Feller semigroup.
Finally, we prove the Feller property for the semigroup {St : t ≥ 0}. Let (λn , xn ) → (λ, x) in P (S) × S. Note that ˆ
St F(λn , xn ) = ERxn F(!t (λn ), ξt ) F(!t (λn )(η), ξt (θ ))qt (θ , η)Pxn (dθ )Q(dη). = C(R+ ,S) Cm
˜ a sequence Similar to Lemma 10.14, there exist a probability space , of C(R+ , S)-valued random variables {θ˜ n : n ≥ 1} and a C(R+ , S)-valued ˜ with distributions Pxn and Px , respectively, and random variable θ˜ on ˜θ n (ω) ˜ ˜ Then, ˜ → θ (ω) ˜ in C(R+ , S) for almost all ω˜ ∈ . n ˜ ω)Q(dη). St F(λn , xn ) = F(!t (λn )(η), ξt (θ˜ n (ω)))q ˜ ˜ η)P(d ˜ t (θ˜ (ω), ˜ C(R+ ,S)
By estimates similar to those leading to equation (10.13), we can prove that St F(λn , xn ) → St F(λ, x). Thus, {St : t ≥ 0} is a Feller semigroup on Cb (P (S) × S).
10.2
Ergodicity of the optimal filter
In this section, we consider the ergodicity of the Markov process $\pi_t$. Namely, we look for conditions under which the Markov process $\pi_t$ has a unique invariant measure.
Definition 10.16 A probability measure $M$ on $\mathcal{P}(S)$ is an invariant measure of the Markov process $\pi_t$ if for any $F \in C_b(\mathcal{P}(S))$ and $t \ge 0$, we have
\[
\int_{\mathcal{P}(S)} T_t F(\lambda)\, M(d\lambda) = \int_{\mathcal{P}(S)} F(\lambda)\, M(d\lambda).
\]
We make the following assumptions.
Assumption (E1): The signal process $X_t$ has a unique invariant measure $\mu \in \mathcal{P}(S)$.
Assumption (E2): For any $f \in C_b(S)$, we have
\[
\limsup_{t \to \infty} \int_S \big| T_t f(x) - \langle \mu, f \rangle \big|\, \mu(dx) = 0,
\]
where $\{T_t\}$ is the semigroup of the signal process.
To study the invariant measure of the optimal filter, we extend the filtering model to include the whole real line as the set for the time parameter. Let $P^{(1)}_\mu \in \mathcal{P}(C(\mathbb{R}, S))$ be such that the co-ordinate process $\xi_t$, $t \in \mathbb{R}$, is a Markov process with stationary marginal distribution $\mu$, i.e. for any $E_1, \dots, E_n \in \mathcal{B}(S)$ and $-\infty < t_1 < \cdots < t_n < \infty$,
\[
P^{(1)}_\mu\big( \xi_{t_1} \in E_1, \dots, \xi_{t_n} \in E_n \big) = \int_{E_1} \cdots \int_{E_n} \mu(dx_1)\, p(t_2 - t_1, x_1, dx_2) \cdots p(t_n - t_{n-1}, x_{n-1}, dx_n).
\]
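Let $Q^{(1)} \in \mathcal{P}(C(\mathbb{R}, \mathbb{R}^m))$ be the distribution of the $m$-dimensional Brownian motion with time $t \in \mathbb{R}$, i.e. for the co-ordinate process $\beta_t$ on $(C(\mathbb{R}, \mathbb{R}^m), \mathcal F^\beta_\infty, Q^{(1)})$ and $-\infty < t_0 < t_1 < \cdots < t_n < \infty$, the random vectors
\[
\frac{1}{\sqrt{t_1 - t_0}} (\beta_{t_1} - \beta_{t_0}),\ \dots,\ \frac{1}{\sqrt{t_n - t_{n-1}}} (\beta_{t_n} - \beta_{t_{n-1}})
\]
are i.i.d. with common distribution $N(0, I)$ on $\mathbb{R}^m$. Let
\[
\Omega^{(1)} = C(\mathbb{R}, S) \times C(\mathbb{R}, \mathbb{R}^m) \quad \text{and} \quad R^{(1)}_\mu = P^{(1)}_\mu \otimes Q^{(1)}.
\]
Let $\{z_t, t \in \mathbb{R}\}$ be the observation process on $(\Omega^{(1)}, \mathcal{B}(\Omega^{(1)}), R^{(1)}_\mu)$ such that
\[
z_t - z_s = \int_s^t h(\xi_u)\, du + \beta_t - \beta_s.
\]
Denote the observation $\sigma$-fields by
\[
\mathcal F^z_{s,t} = \sigma(z_v - z_u : s \le u \le v \le t), \qquad -\infty \le s \le t \le \infty.
\]
For $f \in C_b(S)$, let
\[
\big\langle \pi^{(0)}_{s,t}, f \big\rangle = E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \big) \tag{10.14}
\]
and
\[
\big\langle \pi^{(1)}_{s,t}, f \big\rangle = E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \vee \sigma(\xi_s) \big).
\]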
! (0)
Note that πs,t is the usual optimal filter with s as the initial time; and
(1)
πs,t is the optimal filter with the complete knowledge of the initial (i.e. at time s) signal as well as the observation process. In the next lemma, we establish some useful alternative expressions for (0) (1) the measure-valued processes πs,t and πs,t . Lemma 10.17 For any s < t, we have (0)
πs,t = !t−s (µ)(zs ),
(1)
πs,t = !t−s (δξs )(zs ),
and
\[
\big\langle \pi^{(1)}_{s,t}, f \big\rangle = E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{-\infty,t} \vee \mathcal F^\xi_{-\infty,s} \big), \tag{10.15}
\]
where $z^s_u = z_{s+u} - z_s$.
Proof Note that
\[
z^s_t = \int_0^t h(\xi^s_u)\, du + \beta^s_t,
\]
where ξus = ξs+u and βus = βs+u − βs . Then, ξts , βts , zts t≥0 has the same structure as the original filtering problem with initial µ. Hence, (0) πs,s+t = !t (µ)(zs ). (1)
Further, the filter πs,t is equivalent to the optimal filter in the original filtering problem with initial δξs , and hence, (1)
πs,t = !t−s (δξs )(zs ). By the Markov property of ξ and the independency of β and ξ , given ξ σ (ξs ), the σ -fields σ (ξu+s , βs+u − βs , u ≥ 0) and F−∞,s are independent. As z ⊂ σ (ξu+s , βs+u − βs , u ≥ 0), σ (ξt ) ∨ Fs,t ξ
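we get its independence, given $\sigma(\xi_s)$, from $\mathcal F^\xi_{-\infty,s}$, and hence,
\[
\begin{aligned}
E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \vee \mathcal F^\xi_{-\infty,s} \big)
&= E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \vee \mathcal F^\xi_{-\infty,s} \vee \sigma(\xi_s) \big) \\
&= E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \vee \sigma(\xi_s) \big) \\
&= \pi^{(1)}_{s,t}(f).
\end{aligned}
\tag{10.16}
\]
Note that
\[
\begin{aligned}
\mathcal F^z_{-\infty,t} \vee \mathcal F^\xi_{-\infty,s}
&= \mathcal F^z_{-\infty,s} \vee \mathcal F^\xi_{-\infty,s} \vee \mathcal F^z_{s,t} \\
&= \mathcal F^\beta_{-\infty,s} \vee \mathcal F^\xi_{-\infty,s} \vee \mathcal F^z_{s,t},
\end{aligned}
\]
and that $\mathcal F^\beta_{-\infty,s}$ is independent of $\sigma(\xi_t) \vee \mathcal F^\xi_{-\infty,s} \vee \mathcal F^z_{s,t}$, so we get
\[
E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{s,t} \vee \mathcal F^\xi_{-\infty,s} \big) = E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{-\infty,t} \vee \mathcal F^\xi_{-\infty,s} \big). \tag{10.17}
\]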
Combining equations (10.16) and (10.17), we get the last expression of the lemma.
Denote the laws of $\pi^{(0)}_{s,s+t}$ and $\pi^{(1)}_{s,s+t}$ under $R^{(1)}_\mu$ by $m^\mu_t$ and $M^\mu_t$, respectively. Recall that $\{T_t\}$ is the semigroup for the Markov process $\pi_t$.
Lemma 10.18 For any $t \ge 0$ and $F \in C_b(\mathcal{P}(S))$, we have
\[
\langle m^\mu_t, F \rangle = T_t F(\mu) \quad \text{and} \quad \langle M^\mu_t, F \rangle = \int_S T_t F(\delta_x)\, \mu(dx).
\]
Proof Note that µ (1) (1) (0) mt , F = ERµ F(πs,s+t ) = ERµ F !t (µ)(zs ) = EQµ F(!t (µ)) = Tt F(µ), and
(1) µ (1) Mt , F = ERµ F(πs,s+t ) (1) F(!s,s+t (δξs )(zs ))Pµ (dξ )Q(1) (dβ) =
C(R+ ,S) Cm
= =
F(!t (δx )(z))Qδx (dz)µ(dx)
S Cm
S
Tt F(δx )µ(dx),
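where the third equality follows from the stationarity and the fact that the conditional law of $z^s = z^s(\xi, \beta)$ under $R^{(1)}_\mu$ given $\xi_s = x$ is equal to $Q_{\delta_x}$.
By equations (10.14) and (10.15) and the backward martingale convergence theorem, we get that as $s \to -\infty$,
\[
\big\langle \pi^{(0)}_{s,t}, f \big\rangle \to \big\langle \pi^{(0)}_t, f \big\rangle = E^{R^{(1)}_\mu}\big( f(\xi_t) \,\big|\, \mathcal F^z_{-\infty,t} \big),
\]
and
\[
\big\langle \pi^{(1)}_{s,t}, f \big\rangle \to \big\langle \pi^{(1)}_t, f \big\rangle = E^{R^{(1)}_\mu}\Big( f(\xi_t) \,\Big|\, \bigcap_{s=-\infty}^{\infty} \big( \mathcal F^z_{-\infty,t} \vee \mathcal F^\xi_{-\infty,s} \big) \Big).
\]
As
\[
\langle m^\mu_{t-s}, F \rangle = E^{R^{(1)}_\mu} F\big( \pi^{(0)}_{s,t} \big),
\]
we see that
\[
\begin{aligned}
\lim_{u \to \infty} \langle m^\mu_u, F \rangle
&= \lim_{s \to -\infty} \langle m^\mu_{t-s}, F \rangle \\
&= \lim_{s \to -\infty} E^{R^{(1)}_\mu} F\big( \pi^{(0)}_{s,t} \big) \\
&= E^{R^{(1)}_\mu} F\big( \pi^{(0)}_t \big).
\end{aligned}
\]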
As a consequence, the distribution of πt does not depend on t. We denote µ it by mµ = limu→∞ mu . Therefore, mµ is an invariant measure of the semiµ group {Tt }. Similarly, we can prove that Mu tends to Mµ , which is also an invariant measure of the semigroup {Tt }. (0) (1) By the expressions of πt and πt , we see that mµ = Mµ , if for some t, ξ z z ∩∞ R(1) (10.18) s=−∞ F−∞,t ∨ F−∞,s = F−∞,t , µ − a.s. In this case, we will prove the uniqueness of the invariant measure for the optimal filter. To this end, we need to introduce the concept of the barycenter for a probability measure on P (S). Note that ν, f (dν) f ∈ Cb (S) → P (S)
is a bounded linear functional on Cb (S), and there exists η ∈ P (S) such that η, f = ν, f (dν). P (S)
We denote η by
η=
P (S)
ν(dν)
and call it the barycenter of . Theorem 10.19 If mµ = Mµ , then the optimal filter has a unique invariant measure. Proof Suppose that is another invariant measure of the optimal filter. Let µ˜ be the barycenter of , i.e. µ˜ = ν(dν). P (S)
For f ∈ Cb (S), let F ∈ Cb (P (S)) be defined by F(ν) = ν, f , ν ∈ P (S). Then, Tt F(ν) = Eν F(πt ) = Eν πt , f = Eν Eν f (Xt )|Gt = Eν f (Xt ) = ν, Tt f ,
and hence, µ, ˜ f =
P (S)
ν, f (dν) =
P (S)
=
P (S)
F(ν)(dν)
Tt F(ν)(dν) =
P (S)
= µ, ˜ Tt f .
ν, Tt f (dν)
This proves that µ˜ is an invariant measure of the semigroup {Tt }. By the uniqueness of the invariant measure for {Tt }, we get µ˜ = µ. Therefore, the barycenter of is equal to µ. (c) (e) Now, we define two probability measures µ and µ on P (S) by !
" ! " (e) (c) , F = F(µ) and , F = F(δx )µ(dx), µ µ S
∀F ∈ Cb (P (S)).
Let F be a continuous convex function on P (S). Then, " (c) , F = F µ
!
P (S)
≤
P (S)
ν(dν)
F(ν)(dν) = , F ,
(10.19)
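where the inequality above follows from Jensen's inequality. On the other hand, we note that for any $\nu \in \mathcal{P}(S)$,
\[
\langle \nu, f \rangle = \int_S f(x)\, \nu(dx) = \int_S \langle \delta_x, f \rangle\, \nu(dx) = \Big\langle \int_S \delta_x\, \nu(dx),\ f \Big\rangle,
\]
and hence,
\[
\nu = \int_S \delta_x\, \nu(dx). \tag{10.20}
\]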
As µ is the barycenter of , we have !
" (e) , F = µ
P (S) S
F(δx )ν(dx)(dν).
(10.21)
It follows from equation (10.20) and Jensen’s inequality that we can continue equation (10.21) with ! " (e) µ , F ≥ δx ν(dx) (dν) F P (S)
S
=
P (S)
F(ν)(dν) = , F .
(10.22)
Combining equations (10.19) and (10.22), we get ! " ! " (c) , F ≤ , F ≤ (e) , F , for any continuous convex function F on P (S). Next, we prove that Tt F is also convex. For α ∈ [0, 1] and λ1 , λ2 ∈ P (S), set λ = αλ1 + (1 − α)λ2 . Let πt = Eλ (·|Gt ) and π˜ t = Eλ (·|Gt ∨ σ (π0 )), where π0 is a random measure with Pλ (π0 = λ1 ) = α and Pλ (π0 = λ2 ) = 1 − α. Then,
Eλ (π˜ t |Gt ) = πt , and hence, by Jensen’s inequality, we have
Tt F(λ) = Eλ F(πt ) ≤ Eλ (Eλ (F(π˜ t )|Gt )) = Eλ F(π˜ t ) = α Tt F(λ1 ) + (1 − α)Tt F(λ2 ). Finally, we prove the uniqueness of the invariant measure for {Tt }. By Lemma 10.18 and the definition of (c) , we get ! " µ mt , F = Tt F(µ) = (c) , T F . (10.23) t µ Since is invariant for Tt , it follows from equations (10.19) and (10.23) that µ mt , F ≤ , Tt F = , F . (10.24) By Lemma 10.18 and the definition of (e) , we get " µ ! (e) Mt , F = µ , Tt F .
(10.25)
Since is invariant for Tt , it follows from equations (10.22) and (10.25) that µ Mt , F ≥ , Tt F = , F . (10.26) Combining equations (10.23) and (10.25), taking t → ∞, we get µ m , F ≤ , F ≤ Mµ , F . Since mµ , F = Mµ , F, we get
, F = mµ , F = Mµ , F
for all convex F.
Thus, mµ = = Mµ . This implies the desired uniqueness.
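The two Jensen bounds used in the proof, equations (10.19) and (10.22), can be illustrated on a finite state space, where a probability measure on $\mathcal{P}(S)$ with finitely many atoms is a weighted list of probability vectors. The following sketch makes several assumptions that are not in the text: $S$ has four points, the measure on the simplex has six randomly chosen atoms, and the convex function is $F(\nu) = \sum_k \nu_k^2$.

```python
import random

random.seed(1)
n, m = 4, 6   # n states, m atoms of the measure on the simplex (both hypothetical)

def random_prob(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    t = sum(w)
    return [x / t for x in w]

atoms = [random_prob(n) for _ in range(m)]   # points nu_i of P(S)
weights = random_prob(m)                     # mass carried by each atom

# Barycenter: mu = sum_i weights[i] * atoms[i].
mu = [sum(weights[i] * atoms[i][k] for i in range(m)) for k in range(n)]

def F(nu):
    # A continuous convex function on the simplex (illustrative choice).
    return sum(x * x for x in nu)

def delta(k):
    """Point mass at state k."""
    return [1.0 if j == k else 0.0 for j in range(n)]

lower = F(mu)                                           # all mass at the barycenter
middle = sum(weights[i] * F(atoms[i]) for i in range(m))
upper = sum(mu[k] * F(delta(k)) for k in range(n))      # mass spread over point masses
print(lower, middle, upper)
```

The three printed values should be non-decreasing, which is the sandwich between the measure concentrated at the barycenter and the measure carried by the point masses.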
10.3
Finite memory property
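In this section, we establish an equivalent condition for the stability of the optimal filter. Let $\{T^*_t\}$ be the dual semigroup on $\mathcal{P}(S)$ of the semigroup $\{T_t\}$ on $C_b(S)$. First, we need the following lemma.
Lemma 10.20 Let $\nu \in \mathcal{P}(S)$ be such that for some $\epsilon > 0$, $T^*_\epsilon \nu$ is absolutely continuous with respect to $\mu$. Then, $T^*_t \nu \to \mu$ in $\mathcal{P}(S)$ as $t \to \infty$.
Proof For any $f \in C_b(S)$ and $t \ge \epsilon$, we have
\[
\langle T^*_t \nu, f \rangle = \langle T^*_\epsilon \nu, T_{t-\epsilon} f \rangle = \int_S T_{t-\epsilon} f(x)\, \frac{dT^*_\epsilon \nu}{d\mu}(x)\, \mu(dx).
\]
Thus,
\[
\big| \langle T^*_t \nu - \mu, f \rangle \big| \le \int_S \big| T_{t-\epsilon} f(x) - \langle \mu, f \rangle \big|\, \frac{dT^*_\epsilon \nu}{d\mu}(x)\, \mu(dx).
\]
For any $K > 0$, we have
\[
\big| \langle T^*_t \nu - \mu, f \rangle \big| \le K \int_S \big| T_{t-\epsilon} f(x) - \langle \mu, f \rangle \big|\, \mu(dx) + 2 \|f\|_{0,\infty} \int_S \frac{dT^*_\epsilon \nu}{d\mu}(x)\, 1_{\left\{ \frac{dT^*_\epsilon \nu}{d\mu}(x) > K \right\}}\, \mu(dx).
\]
It follows from Assumption (E2) that
\[
\limsup_{t \to \infty} \big| \langle T^*_t \nu - \mu, f \rangle \big| \le 2 \|f\|_{0,\infty} \int_S \frac{dT^*_\epsilon \nu}{d\mu}(x)\, 1_{\left\{ \frac{dT^*_\epsilon \nu}{d\mu}(x) > K \right\}}\, \mu(dx).
\]
Taking $K \to \infty$, we get
\[
\limsup_{t \to \infty} \big| \langle T^*_t \nu - \mu, f \rangle \big| \le 0.
\]
This implies the convergence of $T^*_t \nu$ to $\mu$.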
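A discrete-time, finite-state analogue may help to visualize the convergence just proved. In the sketch below the transition matrix `P` is a hypothetical irreducible aperiodic chain, not a model from the text. Its invariant measure charges every state, so the absolute continuity hypothesis holds trivially, and the dual semigroup is simply repeated right multiplication of a row vector by `P`.

```python
# Hypothetical irreducible, aperiodic 3-state chain (an assumption for illustration).
P = [[0.9, 0.1, 0.0],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

def push(nu):
    """One application of the dual operator: nu -> nu P."""
    n = len(nu)
    return [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]

def l1(a, b):
    """l1 distance between two measures on the finite state space."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Approximate the invariant measure mu by iterating from the uniform law.
mu = [1.0 / 3, 1.0 / 3, 1.0 / 3]
for _ in range(5000):
    mu = push(mu)

# Start from a point mass and record the distance to mu along the way.
nu = [0.0, 0.0, 1.0]
dists = []
for t in range(100):
    dists.append(l1(nu, mu))
    nu = push(nu)
print(dists[0], dists[10], dists[-1])
```

The recorded distances are non-increasing and decay geometrically here, which is stronger than the plain convergence asserted by the lemma; the finite-state setting is what makes the rate explicit.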
To apply this lemma, we will need the following Assumption (E3): For any ν1 , ν2 ∈ P (S), there exists t ≥ 0 such that Tt∗ ν1 is absolutely continuous with respect to Tt∗ ν2 . As a consequence, we have the following corollary. Corollary 10.21 Suppose that Assumption (E3) holds. Then for any ν ∈ P (S), we have Tt∗ ν → µ as t → ∞. Proof By (E3), we get Tt∗ ν << Tt∗ µ = µ. The conclusion then follows from Lemma 10.20. In the next two propositions, we establish the absolute continuity relation among {Tt∗ }, {!t } and {Qλ }, which will be used to get the finite memory property of the optimal filter. Proposition 10.22 Let µ1 , µ2 ∈ P (S) and > 0. If T∗ µ1 << T∗ µ2 , then ! (µ1 ) << ! (µ2 ),
Q-a.s.
Proof Recall that Cm = C(R+ , Rm ). Let N ⊂ Cm be the Q-nullset such that for η ∈ / N , we have q (θ , η) > 0 a.s. with respect to both Pµ1 and Pµ2 . c Fix η ∈ N and let A ∈ B (S) be such that ! (µ2 )(η), 1A = 0. Then 1A (ξ (θ))q (θ , η)Pµ2 (dθ ) = 0. Cm
Thus, 1A (ξ (θ)) = 0,
Pµ2 -a.s.
This implies that T∗ µ2 (A) = 0, and hence, T∗ µ1 (A) = 0. Now we reverse the above argument with µ2 replaced by µ1 . Then, ! (µ1 )(η), 1A = 0. Proposition 10.23 Let µ1 , µ2 ∈ P (S). If there exists > 0 such that ! (µ1 ) << !(µ2 ),
Q-a.s.,
then Qµ1 << Qµ2 . β
Proof Denote ! (µi ) by γi , i = 1, 2. Suppose that A ∈ Ft . Then Qµ1 (A) = EQ { t (µ1 ), 1 1A } = EQ { (µ1 ), 1 t (γ1 ), 1 1A } ,
(10.27)
where the last equality follows from the flow property of . Note that
t (γ1 ), 1 =
S Cm
qt (θ , η)P,x (dθ)γ1 (dx)
=
S Cm
qt (θ , η)P,x (dθ)1 dγ1 dγ2
+
S Cm
dγ1 (x)γ2 (dx) (x)≤K dγ2
qt (θ , η)P,x (dθ )1 dγ1 dγ2
≤ K t (γ2 ), 1 +
S
γ (dx) (x)>K 1
t (δx ), 1 1 dγ1 dγ2
γ (dx). (x)>K 1
As Q
E
β Q
t (δx ), 1 F = E
Cm
qt (θ , η)P,x (dθ )Fβ = 1,
we have Q
E
(µ1 ), 1
Q
≤E
=E
t (δx ), 1 1 dγ1
γ (dx)1A (x)>K 1
t (δx ), 1 1 dγ1
γ (dx) (x)>K 1
dγ2
(µ1 ), 1
Q
S
S
dγ2
(µ1 ), 1
S
1 dγ1 dγ2
γ (dx) (x)>K 1
≡ C1 (K). Since S
1 dγ1 dγ2
γ (dx) (x)>K 1
≤ 1,
by the dominated convergence theorem, we have that C1 (K) → 0 as K → ∞. Now we continue the estimate equation (10.27) to arrive at Qµ1 (A) ≤ KEQ { (µ1 ), 1 t (γ2 ), 1 1A } + C1 (K).
(10.28)
The first term of equation (10.28) is dominated by
(µ1 ), 1 Q
t (γ2 ), 1 1A
(µ2 ), 1 KE
(µ2 ), 1 ≤ KLEQ { (µ2 ), 1 t (γ2 ), 1 1A } Q
+ KE
(µ1 ), 1 1 (µ1 ),1
(µ2 ),1 >L
$
t (γ2 ), 1 1A
= KLQµ2 (A) + KC2 (L), where
(10.29)
Q
C2 (L) = E
$
(µ1 ), 1 1 (µ1 ),1
(µ2 ),1 >L
t (γ2 ), 1 1A
and the last equality of equation (10.29) follows from equation (10.27) with µ1 replaced by µ2 . By the dominated convergence theorem again, we have that C2 (L) → 0 as L → ∞. Suppose that Qµ2 (A) = 0. It follows from equations (10.28) and (10.29) that Qµ1 (A) ≤ C1 (K) + KC2 (L). Taking L → ∞, and then taking K → ∞, we get Qµ1 (A) = 0. This proves the absolute continuity of Qµ1 with respect to Qµ2 . We now introduce the property of “finite memory of the filter”, by which we mean that the optimal filter can be approximated by a filter that uses only the observations from the past τ unit of time. More precisely, we have Definition 10.24 The filter has the finite memory property if for any f ∈ Cb (S) and for µ-almost all x ∈ S, we have ∗ lim sup lim sup EQδx !t (δx ) − !t−τ ,t (Tt−τ δx ), f = 0. (10.30) τ →∞
t→∞
∗ δ ) is the filter using the history of the observaRemark 10.25 !t−τ ,t (Tt−τ x tion from time t − τ to time t, i.e. it is a finite memory filter. The statement equation (10.30) says that the optimal filter can be approximated by one with finite memory.
Theorem 10.26 Suppose that Assumptions (E1)–(E3) hold. If the optimal filter is asymptotically stable with π0 = δx and π¯ 0 = µ, ∀ x ∈ S, then it has the finite memory property.
Proof We note that 2 ∗ EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f 2 2 ∗ = EQµ !t (µ), f µ), f + EQµ !t−τ ,t (Tt−τ ∗ µ), f . − 2EQµ !t (µ), f !t−τ ,t (Tt−τ
(10.31)
Note that !t (µ) = !t−τ ,t (!t−τ (µ)). As !t−τ ,t and !t−τ are independent under Qµ , and ∗ EQµ !t−τ (µ) = Tt−τ µ,
we see that 2 ∗ ∗ . EQµ !t (µ), f !t−τ ,t (Tt−τ µ), f = EQµ !t−τ ,t (Tt−τ µ), f Thus, we can continue equation (10.31) with 2 ∗ EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f 2 2 ∗ − EQµ !t−τ ,t (Tt−τ = EQµ !t (µ), f µ), f 2 2 = EQµ !t (µ), f − EQµ !τ (µ), f µ
= mt (Ff ) − mµ τ (Ff ), 2 where Ff (ν) = ν, f for ν ∈ MF (S), the second equality follows from µ being an invariant element of Tt∗ and the fact that !t−τ ,t (µ) and !τ (µ) have the same distribution, and the last equality follows from Lemma 10.18. Taking t → ∞ and then, τ → ∞ in the above equation, we have 2 ∗ µ), f lim sup lim sup EQµ !t (µ) − !t−τ ,t (Tt−τ τ →∞
t→∞ µ
= lim sup m (Ff ) − mµ τ (Ff ) τ →∞
= 0. Thus, ∗ lim sup lim sup EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f = 0. τ →∞
t→∞
Now we need to replace µ in the above equation by δx . By Assumption (E3), it follows from Propositions 10.22 and 10.23 that Qδx << Qµ . Note that !t (µ) − !t−τ ,t (T ∗ µ), f ≤ 2f 0,∞ . t−τ By the dominated convergence theorem, we have ∗ µ), f lim sup lim sup EQδx !t (µ) − !t−τ ,t (Tt−τ τ →∞
t→∞
Qµ
= lim sup lim sup E τ →∞
t→∞
!t (µ) − !t−τ ,t (T ∗ µ), f dQδx t−τ dQµ
= 0.
(10.32)
We note that ∗ EQδx !t (δx ) − !t−τ ,t (Tt−τ δx ), f ≤ EQδx !t (δx ) − !t (µ), f ∗ + EQδx !t (µ) − !t−τ ,t (Tt−τ µ), f ∗ ∗ + EQδx !t−τ ,t (Tt−τ µ) − !t−τ ,t (Tt−τ δx ), f . By the asymptotical stability, the first term tends to 0. The convergence of the second term follows from equation (10.32). Thus, we only need to prove that the third term tends to 0; this term can be rewritten as QT ∗
E
t−τ δx
∗ !τ T µ − !τ T ∗ δx , f . t−τ t−τ
(10.33)
∗ δ → µ as t → ∞. As T ∗ µ = µ, it By Corollary 10.21, we have Tt−τ x t−τ follows from the Feller property of the filter that equation (10.33) tends to 0. Combining the estimates above, we see that the filter has the finite memory property.
The proof of the next theorem, due to Budhiraja, is technically involved. We omit the proof but state the theorem for completeness.
Theorem 10.27 Suppose that Assumptions (E1)–(E3) hold. If the filter has the finite memory property, then it satisfies condition (10.18).
Remark 10.28 Combining the theorems of this section and the previous one, we see that condition (10.18), ergodicity, stability and the finite memory property of the optimal filter are all equivalent, provided that Assumptions (E1)–(E3) hold.
10.4
Asymptotic stability for non-linear filtering with compact state space
In the last section, we gave equivalent conditions under which the optimal filter is stable. In this section, we consider a situation in which all these conditions are satisfied, i.e. the stability of the optimal filter holds. To this end, we make use of the Hilbert metric, which we now introduce. We prove only the facts that are used in this book, and refer the reader interested in more detail on this topic to the books of Birkhoff [12] and Hopf [74].
Definition 10.29 i) For $\lambda, \mu \in M_F(S)$, $\lambda \le \mu$ means that $\lambda(A) \le \mu(A)$ for all $A \in \mathcal{B}(S)$.
ii) Two measures $\lambda, \mu \in M_F(S)$ are comparable if there are two positive constants $K_1$ and $K_2$ such that $K_1 \lambda \le \mu \le K_2 \lambda$. The Hilbert metric $\rho_h$ on $M_F(S)$ is defined as
\[
\rho_h(\lambda, \mu) =
\begin{cases}
\log \dfrac{\sup\{\lambda(A)/\mu(A) : A \in \mathcal{B}(S),\ \mu(A) > 0\}}{\inf\{\lambda(A)/\mu(A) : A \in \mathcal{B}(S),\ \mu(A) > 0\}} & \text{if } \lambda, \mu \text{ are comparable}, \\[2ex]
0 & \text{if } \lambda = \mu = 0, \\
\infty & \text{otherwise}.
\end{cases}
\]
First, let us see how to calculate the Hilbert distance in a special case. Suppose that $S$ consists of $n$ points. Then $M_F(S)$ can be identified with $\mathbb{R}^n_+$. For $x, y \in \mathbb{R}^n_+$, it is easy to show that
\[
\rho_h(x, y) = \log \sup_{1 \le i, j \le n} \frac{x_i y_j}{x_j y_i}.
\]
The following special case will be used later. For $n = 2$, we have
\[
\rho_h(x, y) = \left| \log \frac{x_1 y_2}{x_2 y_1} \right|. \tag{10.34}
\]
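For a finite state space the definition above reduces to a few lines of code. The function below is a minimal sketch for vectors in $\mathbb{R}^n_+$; the name `hilbert_metric` and the treatment of comparability through matching supports are choices made here for illustration.

```python
import math

def hilbert_metric(x, y):
    """Hilbert distance between two nonnegative vectors of equal length."""
    if all(v == 0 for v in x) and all(v == 0 for v in y):
        return 0.0
    # On a finite set, two measures are comparable exactly when they
    # vanish on the same coordinates.
    if any((a > 0) != (b > 0) for a, b in zip(x, y)):
        return math.inf
    ratios = [a / b for a, b in zip(x, y) if b > 0]
    # The sup and inf over sets are attained at single points.
    return math.log(max(ratios) / min(ratios))

print(hilbert_metric([1.0, 2.0], [2.0, 1.0]))   # agrees with (10.34): log 4
print(hilbert_metric([1.0, 2.0], [3.0, 6.0]))   # proportional vectors
print(hilbert_metric([1.0, 0.0], [1.0, 1.0]))   # not comparable
```

Proportional vectors are at distance zero, so on $\mathbb{R}^n_+$ the function only separates rays, in line with Remark 10.30.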
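Remark 10.30 The Hilbert metric $\rho_h$ is only a pseudo-metric on $M_F(S)$. However, it is a metric on $\mathcal{P}(S)$.
Proof It is clear that $\rho_h(\lambda, \mu) = \rho_h(\mu, \lambda)$ and $\rho_h(\lambda, K\lambda) = 0$ for any positive constant $K$ and $\lambda, \mu \in M_F(S)$. Now we prove the triangle inequality. For $\lambda, \mu, \nu \in M_F(S)$, we may and will assume that they are comparable (otherwise the triangle inequality (10.35) is trivial). Then
\[
\begin{aligned}
\rho_h(\lambda, \nu) + \rho_h(\nu, \mu)
&= \log \frac{\sup\{\lambda(A)/\nu(A) : A \in \mathcal{B}(S),\ \nu(A) > 0\}}{\inf\{\lambda(A)/\nu(A) : A \in \mathcal{B}(S),\ \nu(A) > 0\}}
 + \log \frac{\sup\{\nu(A)/\mu(A) : A \in \mathcal{B}(S),\ \mu(A) > 0\}}{\inf\{\nu(A)/\mu(A) : A \in \mathcal{B}(S),\ \mu(A) > 0\}} \\
&= \log \frac{\sup\left\{\frac{\lambda(A)}{\nu(A)} : \nu(A) > 0\right\} \sup\left\{\frac{\nu(A)}{\mu(A)} : \mu(A) > 0\right\}}{\inf\left\{\frac{\lambda(A)}{\nu(A)} : \nu(A) > 0\right\} \inf\left\{\frac{\nu(A)}{\mu(A)} : \mu(A) > 0\right\}} \\
&\ge \log \frac{\sup\left\{\frac{\lambda(A)}{\mu(A)} : \mu(A) > 0\right\}}{\inf\left\{\frac{\lambda(A)}{\mu(A)} : \mu(A) > 0\right\}} \\
&= \rho_h(\lambda, \mu).
\end{aligned}
\tag{10.35}
\]
Thus, $\rho_h$ is a pseudo-metric on $M_F(S)$. However, if $\lambda, \mu \in \mathcal{P}(S)$ satisfy $\rho_h(\lambda, \mu) = 0$, then
\[
\sup\left\{ \frac{\lambda(A)}{\mu(A)} : \mu(A) > 0 \right\} = \inf\left\{ \frac{\lambda(A)}{\mu(A)} : \mu(A) > 0 \right\}.
\]
Thus, there exists a constant $K$ such that
\[
\lambda(A) = K \mu(A), \qquad \forall A \in \mathcal{B}(S).
\]
Taking $A = S$, we get $K = 1$, and hence, $\lambda = \mu$. Namely, $\rho_h$ is a metric on $\mathcal{P}(S)$.
For a linear transformation $T$ on $M_F(S)$, we define the $\rho_h$-diameter of the range $T(M_F(S))$ of $T$ as
\[
H(T) = \sup\{ \rho_h(T\lambda, T\mu) : \lambda, \mu \in M_F(S) \}.
\]
For the special case that $M_F(S) = \mathbb{R}^2_+$ and $T$ is given by a matrix
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad a, b, c, d > 0, \tag{10.36}
\]
we have
\[
\begin{aligned}
H(T) &= \sup\left\{ \left| \log \frac{(a x_1 + b x_2)(c y_1 + d y_2)}{(c x_1 + d x_2)(a y_1 + b y_2)} \right| : x, y \in \mathbb{R}^2_+ \right\} \\
&= \left| \log \frac{bc}{ad} \right|.
\end{aligned}
\tag{10.37}
\]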
Similarly, if for a specific $\lambda \in M_F(S)$, the transformation $T$ has the kernel representation
\[
\langle T\mu, f \rangle = \int_S \int_S G(x, x')\, f(x')\, \mu(dx)\, \lambda(dx'),
\]
where $G(x, x')$ is non-negative, then
\[
H(T) = \log \operatorname{ess\,sup} \frac{G(x, y)\, G(x', y')}{G(x, y')\, G(x', y)}, \tag{10.38}
\]
with the convention $0/0 = 1$ and $1/0 = \infty$. The supremum above is strict over $x, x' \in S$, and is essential over $y, y' \in S$ with respect to $\lambda$.
The next lemma will be useful in the proof of the stability result.
Lemma 10.31 Let $T$ be a linear transformation on $M_F(S)$. Then $T$ is a contraction under the Hilbert metric and
\[
\tau(T) \equiv \sup\left\{ \frac{\rho_h(T\lambda, T\mu)}{\rho_h(\lambda, \mu)} : 0 < \rho_h(\lambda, \mu) < \infty \right\} = \tanh \frac{H(T)}{4}. \tag{10.39}
\]
The function $\tau$ is called Birkhoff's contraction coefficient.
Proof For simplicity, we consider only the case that $M_F(S) = \mathbb{R}^2_+$ and $T$ is given by the matrix (10.36). We refer the reader to Birkhoff [12] (Theorem 3, p. 384) for the arguments relating the general case to the current one.
Let $R(x) = x_1 / x_2$ for $x = (x_1, x_2) \in \mathbb{R}^2_+$. It follows from equation (10.34) that $\rho_h(x, y) = |\log(R(x)/R(y))|$. Denote
\[
x' = \frac{(Tx)_1}{(Tx)_2} = \frac{a R(x) + b}{c R(x) + d},
\]
and $y'$ is defined similarly. Then,
\[
\rho_h(Tx, Ty) = \big| \log(x'/y') \big| = \left| \int_{x'}^{y'} \frac{du'}{u'} \right|. \tag{10.40}
\]
Define a new variable $u$ such that
\[
u' = \frac{a u + b}{c u + d}.
\]
As
\[
\frac{du'}{du} = \frac{ad - bc}{(cu + d)^2},
\]
we may continue equation (10.40) with
\[
\begin{aligned}
\rho_h(Tx, Ty)
&= \left| \int_{R(x)}^{R(y)} \frac{ad - bc}{(au + b)(cu + d)}\, du \right| \\
&\le \frac{|ad - bc|}{2\sqrt{abcd} + ad + bc} \left| \int_{R(x)}^{R(y)} \frac{du}{u} \right| \\
&= \frac{|ad - bc|}{2\sqrt{abcd} + ad + bc}\, \rho_h(x, y).
\end{aligned}
\]
By elementary calculation, it is easy to show that
\[
\frac{|ad - bc|}{2\sqrt{abcd} + ad + bc} = \tanh\left( \frac14 \left| \log \frac{bc}{ad} \right| \right).
\]
We finish the proof of the lemma by making use of equation (10.37).
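The $2 \times 2$ computation in the proof is easy to test numerically. The sketch below picks one hypothetical positive matrix (the entries `a, b, c, d` are arbitrary), evaluates $H(T)$ from (10.37), and compares Birkhoff's coefficient with the observed contraction ratios on randomly sampled pairs of vectors.

```python
import math
import random

def rho(x, y):
    # Hilbert distance on the open quadrant for n = 2, as in (10.34).
    return abs(math.log(x[0] * y[1] / (x[1] * y[0])))

a, b, c, d = 2.0, 1.0, 0.5, 3.0   # a hypothetical positive matrix

def T(x):
    """Apply the matrix with rows (a, b) and (c, d) to the vector x."""
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

H = abs(math.log(b * c / (a * d)))   # diameter of the range, as in (10.37)
tau = math.tanh(H / 4)               # Birkhoff's contraction coefficient
closed_form = abs(a * d - b * c) / (2 * math.sqrt(a * b * c * d) + a * d + b * c)

random.seed(2)
worst = 0.0
for _ in range(10000):
    x = (random.uniform(0.01, 10), random.uniform(0.01, 10))
    y = (random.uniform(0.01, 10), random.uniform(0.01, 10))
    r = rho(x, y)
    if r > 1e-6:   # skip nearly proportional pairs to avoid dividing by ~0
        worst = max(worst, rho(T(x), T(y)) / r)
print(worst, tau, closed_form)
```

The largest sampled ratio stays below `tau`, and `tau` agrees with the algebraic expression from the proof. Random sampling only probes the supremum in (10.39) from below, so this is a consistency check rather than a verification of the lemma.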
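The following technical lemma will be needed in the proof of the exponential stability of the filter.
Lemma 10.32 For any $\lambda, \mu \in \mathcal{P}(S)$, we have
\[
d_{TV}(\lambda, \mu) \equiv 2 \sup_{A \in \mathcal{B}(S)} |\lambda(A) - \mu(A)| \le 2 \wedge \big( e^{\rho_h(\lambda, \mu)} - 1 \big) \le \frac{2}{\log 3}\, \rho_h(\lambda, \mu). \tag{10.41}
\]
$d_{TV}$ is called the total variation metric on $\mathcal{P}(S)$.
Proof If $\lambda$ and $\mu$ are not comparable, then $\rho_h(\lambda, \mu) = \infty$, and hence, equation (10.41) clearly holds. Now, we suppose that $\lambda$ and $\mu$ are comparable. Let $\mathcal{A} \equiv \{A \in \mathcal{B}(S) : \lambda(A) \ge \mu(A)\}$. Then, for $A \in \mathcal{A}$ with $\mu(A) > 0$, we have
\[
1 \le \frac{\lambda(A)}{\mu(A)} = \frac{\lambda(A)/\mu(A)}{\lambda(S)/\mu(S)} \le e^{\rho_h(\lambda, \mu)},
\]
and hence
\[
0 \le \lambda(A) - \mu(A) \le \mu(A) \big( e^{\rho_h(\lambda, \mu)} - 1 \big). \tag{10.42}
\]
It is clear that equation (10.42) holds even if $\mu(A) = 0$, since then $\lambda(A) = 0$ by the comparability.
By symmetry, we can prove that for $A \in \mathcal{A}$,
\[
0 \le \mu(A^c) - \lambda(A^c) \le \lambda(A^c) \big( e^{\rho_h(\lambda, \mu)} - 1 \big).
\]
Therefore,
\[
\begin{aligned}
d_{TV}(\lambda, \mu)
&= \sup_{A \in \mathcal{A}} \big\{ \lambda(A) - \mu(A) + \mu(A^c) - \lambda(A^c) \big\} \\
&\le \sup_{A \in \mathcal{A}} \big( \mu(A) + \lambda(A^c) \big) \big( e^{\rho_h(\lambda, \mu)} - 1 \big) \\
&\le e^{\rho_h(\lambda, \mu)} - 1.
\end{aligned}
\]
It is clear that $d_{TV}(\lambda, \mu)$ is bounded by 2. Hence, the first part of the inequality (10.41) holds. The second part follows from the inequality
\[
2 \wedge (e^x - 1) \le \frac{2x}{\log 3}, \qquad \forall x \ge 0.
\]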
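The chain of inequalities (10.41) can be spot-checked on a finite state space, where $d_{TV}$ is the $\ell^1$ distance between probability vectors. The sketch below samples strictly positive probability vectors on five points; the sampling scheme and the helper names are assumptions made for this illustration.

```python
import math
import random

def hilbert(p, q):
    """Hilbert distance between strictly positive vectors."""
    ratios = [a / b for a, b in zip(p, q)]
    return math.log(max(ratios) / min(ratios))

def d_tv(p, q):
    # 2 * sup_A |p(A) - q(A)| equals the l1 distance on a finite set.
    return sum(abs(a - b) for a, b in zip(p, q))

def random_prob(n):
    """A random strictly positive probability vector of length n."""
    w = [random.uniform(0.05, 1.0) for _ in range(n)]
    t = sum(w)
    return [x / t for x in w]

random.seed(3)
violations = 0
for _ in range(5000):
    p, q = random_prob(5), random_prob(5)
    r = hilbert(p, q)
    mid = min(2.0, math.exp(r) - 1.0)
    first_ok = d_tv(p, q) <= mid + 1e-12
    second_ok = mid <= 2.0 * r / math.log(3.0) + 1e-12
    if not (first_ok and second_ok):
        violations += 1
print(violations)
```

No violations are expected. The constant $2/\log 3$ is sharp at $x = \log 3$, where $e^x - 1$ reaches the trivial bound 2.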
Recall that $X_t$ is a continuous time-homogeneous Markov process taking values in $S$, and that $P_x$ is the probability measure on $C(\mathbb{R}_+, S)$ induced by $X_t$ with $X_0 = x$. Let $(\xi_t, \beta_t)$ be the co-ordinate process on the probability space $(\hat\Omega, \hat{\mathcal F}, R_{\pi_0})$, where $\hat\Omega = C(\mathbb{R}_+, S) \times C(\mathbb{R}_+, \mathbb{R}^m)$ and $R_{\pi_0} = P_{\pi_0} \otimes Q$. Then, the observation process $Y_t$ has the same distribution as the process
\[
y_t = \int_0^t h(\xi_s)\, ds + \beta_t.
\]
Since we are going to study the stability problem with models on the stanˆ Fˆ , Rπ0 ) only, we shall use some of the notations dard probability space (, y that were used in the original filtering model. For example, we take Gt = Ft and πt = !t (π0 )(y). ˆ is a Polish space, there exists a regular conditional probability As distribution ˆ x ·|Gt , ξt = x R ˆ x is the probability measure whose Radon–Nickodym ˆ where R on , derivative with respect to Rx is ˆx dR (θ , η) = qt (θ , η) = exp dRx
0
t
1 h(ξu (θ)) dβu (η) − 2 ∗
0
t
2
|h(ξu (θ ))| du .
As $\mathcal G_t$ and $\xi_t$ are independent under $\hat R_x$, it is easy to show that for any $B \in \hat{\mathcal F}_t$ and $A \in \mathcal{B}(S)$, we have
\[
\hat R_x\big( B \cap \{\xi_t \in A\} \,\big|\, \mathcal G_t \big) = \int_A \hat R_x\big( B \,\big|\, \mathcal G_t,\ \xi_t = x' \big)\, \hat R_x(\xi_t \in dx').
\]
Recall that
\[
q_t(\theta, y) = \exp\left( \int_0^t h(\xi_u(\theta))^* dy_u - \frac12 \int_0^t |h(\xi_u(\theta))|^2 du \right),
\]
and $p(t, x, dx')$ is the transition function of the Markov process $X_t$.
Lemma 10.33 For any y ∈ C(R+ , Rm ) and x, x ∈ S, let ˆ It (x, x ; y) = ERx qt (·, y)|Gt , ξt = x . Then, for any f ∈ Cb (S), we have f (x )p(t, x, dx )It (x, x ; y)λ(dx). t (λ)(y), f =
(10.43)
S S
Proof By the Kallianpur–Striebel formula, we have ˆ t (λ)(y), f = ERλ f (ξt (θ))qt (θ , y)|Gt ˆ = ERx f (ξt (θ))qt (θ , y)|Gt λ(dx) S ˆ = ERx qt (θ , y)|Gt , ξt = x f (x )p(t, x, dx )λ(dx) S S = f (x )It (x, x ; y)p(t, x, dx )λ(dx). S S
The aim of this section is to prove that dTV (!t (π¯ 0 ), !t (π0 )) → 0
as t → ∞.
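Before turning to the proof, the claimed forgetting of the initial condition can be observed in a discrete-time, finite-state analogue. Everything in the sketch below is a hypothetical illustration rather than the model of this section: the transition matrix `P` has strictly positive entries, the observation increments are `h[x]` plus standard Gaussian noise, and two filters driven by the same observations are started from different initial laws.

```python
import math
import random

# Hypothetical 3-state discrete-time model (an assumption for illustration).
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.3, 0.3, 0.4]]
h = [-1.0, 0.0, 1.0]

def filter_step(pi, dy):
    """Predict with P, reweight by the observation likelihood, normalize."""
    n = len(pi)
    pred = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    post = [pred[j] * math.exp(h[j] * dy - 0.5 * h[j] ** 2) for j in range(n)]
    total = sum(post)
    return [p / total for p in post]

def d_tv(p, q):
    """Total variation distance in the normalization of (10.41)."""
    return sum(abs(a - b) for a, b in zip(p, q))

random.seed(4)
x = 0                             # current signal state
first = [0.5, 0.3, 0.2]           # filter started from one initial law
second = [0.1, 0.1, 0.8]          # filter started from another initial law
tv = []
for _ in range(100):
    x = random.choices(range(3), weights=P[x])[0]
    dy = h[x] + random.gauss(0.0, 1.0)
    first = filter_step(first, dy)
    second = filter_step(second, dy)
    tv.append(d_tv(first, second))
print(tv[0], tv[9], tv[-1])
```

Because every entry of `P` is positive, the prediction step contracts the Hilbert metric by a factor below one at each step, while reweighting and normalization leave it unchanged; together with Lemma 10.32 this forces the total variation distance between the two filters to decay geometrically along any observation path.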
More specifically, we shall calculate the asymptotic rate γ (π¯ 0 ) = lim sup t→∞
1 log dTV (!t (π¯ 0 ), !t (π0 )), t
and prove that γ (π¯ 0 ) < 0. Throughout the rest of this section, we assume that the Markov process Xt has a unique invariant measure µ satisfying the following conditions: Assumption (S1): The measure Pµ is ergodic, i.e. the tail σ -field ∩t≥0 σ {Xs : s ≥ t} contains only Pµ -trivial sets (those with Pµ measure 0 or 1).
Assumption (S2): The measures Pπ0 and Pµ are equivalent on the tail σ -field. Let λ ∈ MF (S), t ≥ 0 and y ∈ C(R+ , Rm ) be fixed. We define an operator λ,y It on MF (S) as follows: " ! λ,y f (x )It (x, x ; y)µ(dx)λ(dx ). It (µ), f = S S
For convenience, we denote the transition probability p(t, x, dx ) also by pt (x, dx ). Theorem 10.34 Suppose that Assumptions (S1) and (S2) hold, and let δ > 0. (a) If π¯ 0 ∈ P (S) is comparable with π0 , then γ (π¯ 0 ) ≤ δ −1 Eµ log τ (δ ),
Qπ0 -a.s.
(10.44)
(b) If π¯ 0 ∈ P (S) is comparable with π0 , and there exists λ ∈ MF (S) such that for every x ∈ S, the measure pδ (x, ·) is absolutely continuous with respect to λ, then λ,y γ (π¯ 0 ) ≤ δ −1 Eµ log tanh 4−1 H(Tδ ) + H(Iδ ) , Qπ0 -a.s. (10.45) (c) Suppose that there exists λ ∈ MF (S) such that, for every x ∈ S, pδ (x, ·) is comparable with λ, and Iδ (x, x ; y) can be bounded above and below by positive bounds that do not depend on x and x . Then equations (10.44) and (10.45) hold Qπ0 -a.s. for any π¯ 0 ∈ P (S). Proof (a) Let n = [t/δ]. By Lemma 10.32, we have 2 1 1 1 log dTV (!t (π¯ 0 ), !t (π0 )) − log ≤ log ρh (!t (π¯ 0 ), !t (π0 )). t t log 3 t Note that for any positive constants K1 and K2 , we have ρh (K1 λ, K2 µ) = ρh (λ, µ). Thus, ρh (!t (π¯ 0 ), !t (π0 )) = ρh (t (π¯ 0 ), t (π0 )). By the flow property proved in Lemma 10.7, we have t (π¯ 0 ) = nδ,t (nδ (π¯ 0 )). It follows from Lemma 10.31 and the fact of τ (·) is bounded by 1 that ρh (t (π¯ 0 ), t (π0 )) ≤ ρh (nδ (π¯ 0 ), nδ (π0 )).
Therefore, combining all the estimates above, we get 1 2 1 log dTV (!t (π¯ 0 ), !t (π0 )) − log t t log 3 1 ≤ log ρh (nδ (π¯ 0 ), nδ (π0 )) t ρ ((i+1)δ (π¯ 0 ), (i+1)δ (π0 )) 1 1 + log ρh (π¯ 0 , π0 ) log h t ρh (iδ (π¯ 0 ), iδ (π0 )) t n−1
=
i=0
1 1 log τ (iδ,(i+1)δ ) + log ρh (π¯ 0 , π0 ). t t n−1
≤
i=0
Note that τ (iδ,(i+1)δ ) is a function of {ξt+iδ , yiδ+t − yiδ : 0 ≤ t ≤ δ}. Thus, ˆ µ . It follows {τ (iδ,(i+1)δ ), i = 1, 2, . . .} is an ergodic sequence under R ˆ µ -a.s. By from Birkhoff’s ergodic theorem that equation (10.44) holds R ˆ π are equivalent on the tail σ -field. ˆ µ and R Assumption (S2), we see that R 0 ˆπ As equation (10.44) is a tail event, we get that equation (10.44) holds R 0 a.s. Since equation (10.44) depends on y only and the marginal measure of ˆ π is Qπ , we see that equation (10.44) holds Qπ -a.s. R 0 0 0 (b) By a slightly abuse of the notation, we denote the density of the probability measure pδ (x, ·) with respect to λ by pδ (x, x ). Define Jδ (x, x ) = pδ (x, x )Iδ (x, x ; y). By equation (10.43), we have f (x )Jδ (x, x )π¯ 0 (dx)λ(dx ). δ (π¯ 0 ), f = S S
Thus, δ has a kernel representation, and hence, H(δ ) = log esssup ≤ log esssup
Jδ (x, z)Jδ (x , z ) Jδ (x, z )Jδ (x , z) pδ (x, z)pδ (x , z ) Iδ (x, z; y)Iδ (x , z ; y) + log esssup pδ (x, z )pδ (x , z) Iδ (x, z ; y)Iδ (x , z; y)
= H(Tδ ) + H(Iδλ (·; y)). By equations (10.39) and (10.44), we see that equation (10.45) holds. (c) Note that by Lemma 10.33, we have d(δ (π¯ 0 )) (x ) = pδ (x, x )Iδ (x, x ; y)π¯ 0 (dx). dλ S
By the assumption of the theorem, $p_\delta$ and $I_\delta$ are bounded, and hence the Radon–Nikodym derivative above is bounded away from $0$ and from infinity. Thus the two measures obtained in this way from $\bar\pi_0$ and from $\pi_0$ are comparable. Therefore, the argument used for equation (10.45) holds when $\rho_h(\bar\pi_0, \pi_0)$ is replaced by the $\rho_h$-distance between these two measures.

Finally, we specialize the results above to a diffusion in $S$. We assume that $S$ is a compact manifold in $\mathbb{R}^d$; the reader who is not familiar with differential manifolds may regard $S$ as a compact surface in $\mathbb{R}^d$. Suppose that the Markov process $\xi_t$ is governed by the following SDE on $S$:
\[ d\xi_t = b(\xi_t)\,dt + \sigma(\xi_t)\,d\alpha_t, \tag{10.46} \]
where $\alpha_t$ is a $d$-dimensional Brownian motion. For simplicity of notation, we assume that the observation is a real-valued ($m = 1$) process $y_t$ given by
\[ y_t = \int_0^t h(\xi_s)\,ds + \beta_t. \]
We assume that $b, \sigma, h$ are Lipschitz continuous on $S$. We further assume that $h \in C_b^2(S)$ and that $b, \sigma$ satisfy the uniform ellipticity condition. It is well known that the transition density of the Markov process $\xi_t$ exists and satisfies
\[ K_1 e^{-K_2 t^{-1}} \le p_t(x, x') \le K_3 t^{-d/2}, \qquad \forall\, x, x' \in S, \tag{10.47} \]
where $K_1, K_2, K_3$ are three constants. We refer the reader to Chapter 3 of Davies [48] for this result. We will need the following result in proving the asymptotic stability of the optimal filter.

Lemma 10.35 For $\delta > 0$ and $y \in C(\mathbb{R}_+, \mathbb{R})$ fixed, there exists a constant $K_1 = K_1(\delta, y)$ such that
\[ I_\delta(x, x'; y) \le K_1, \qquad \forall\, x, x' \in S. \]

Proof Let $\lambda$ be the surface measure on $S$. Note that
\[ I_\delta(x, x'; y) = E^{\hat R_x}\!\left[\exp\!\left(\int_0^\delta h(\xi_s)\,dy_s - \frac12\int_0^\delta h^2(\xi_s)\,ds\right) \,\middle|\, \xi_\delta = x',\ \mathcal{G}_\delta\right]. \]
Applying Itô's formula together with equation (10.46), we have
\[ d\big(h(\xi_s)y_s\big) = h(\xi_s)\,dy_s + y_s\nabla^* h(\xi_s)\,d\xi_s + \frac12\sum_{i,j,k=1}^d \sigma_{ik}\sigma_{jk}\,\partial^2_{ij}h(\xi_s)\,y_s\,ds. \]
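Thus,
\[
\begin{aligned}
\int_0^\delta h(\xi_s)\,dy_s - \frac12\int_0^\delta h^2(\xi_s)\,ds
&= h(\xi_\delta)y_\delta - \frac12\int_0^\delta\Big(h^2(\xi_s) + \sum_{i,j,k=1}^d \sigma_{ik}\sigma_{jk}\,\partial^2_{ij}h(\xi_s)\,y_s\Big)ds - \int_0^\delta y_s\nabla^* h(\xi_s)\,d\xi_s \\
&\le K(y,\delta) - \int_0^\delta y_s\nabla^* h(\xi_s)\,d\xi_s,
\end{aligned}
\]
where $K(y, \delta)$ is a random variable depending on $y$ and $\delta$. Therefore,
\[ I_\delta(x, x'; y) \le e^{K(y,\delta)}\, E^{\hat R_x}\!\left[\exp\!\left(-\int_0^\delta y_s\nabla^* h(\xi_s)\,d\xi_s\right) \,\middle|\, \xi_\delta = x',\ \mathcal{G}_\delta\right]. \tag{10.48} \]
Set
\[ F_t = \exp\!\left(-\int_0^t y_s\nabla^* h(\xi_s)\,d\xi_s\right). \]
Let $\delta_n \uparrow \delta$. Applying Fatou's lemma, we get
\[ E^{\hat R_x}\big(F_\delta \,\big|\, \xi_\delta = x', \mathcal{G}_\delta\big) \le \liminf_{n\to\infty} E^{\hat R_x}\big(F_{\delta_n} \,\big|\, \xi_\delta = x', \mathcal{G}_\delta\big). \tag{10.49} \]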
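For any $\mathcal{F}^\xi_{\delta_n}$-measurable random variable $Z$ and Borel measurable function $f$, it follows from the Markov property of $\xi_t$ that
\[
\begin{aligned}
\int_S E^{\hat R_x}\big(Z \,\big|\, \xi_\delta = x'\big)\, p_\delta(x, x') f(x')\,dx'
&= E^{\hat R_x}\big(E^{\hat R_x}(Z \,|\, \xi_\delta)\, f(\xi_\delta)\big) \\
&= E^{\hat R_x}\big(Z f(\xi_\delta)\big) \\
&= E^{\hat R_x}\big(Z\, E^{\hat R_x}\big(f(\xi_\delta) \,\big|\, \mathcal{F}^\xi_{\delta_n}\big)\big) \\
&= E^{\hat R_x}\Big(Z \int_S p_{\delta-\delta_n}(\xi_{\delta_n}, x') f(x')\,dx'\Big) \\
&= \int_S E^{\hat R_x}\big(Z\, p_{\delta-\delta_n}(\xi_{\delta_n}, x')\big) f(x')\,dx'.
\end{aligned}
\]
Thus,
\[ E^{\hat R_x}\big(Z \,\big|\, \xi_\delta = x'\big)\, p_\delta(x, x') = E^{\hat R_x}\big(Z\, p_{\delta-\delta_n}(\xi_{\delta_n}, x')\big) \]
for almost every $x' \in S$.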
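Under $\hat R_x$, the processes $(\xi_t)$ and $(y_t)$ are independent. Thus, when considering the conditional expectation given $\mathcal{G}_\delta$, we may regard $y_0^\delta$, the path $y_t$ for $t \in [0, \delta]$, as fixed (non-random). Namely, for $Z$ being $\mathcal{F}^\xi_{\delta_n} \vee \mathcal{G}_\delta$-measurable, we have
\[ E^{\hat R_x}\big(Z \,\big|\, \xi_\delta = x', \mathcal{G}_\delta\big)\, p_\delta(x, x') = E^{\hat R_x}\big(Z\, p_{\delta-\delta_n}(\xi_{\delta_n}, x') \,\big|\, \mathcal{G}_\delta\big) \quad \text{for a.e. } x' \in S. \]
Taking $Z = F_{\delta_n}$, we get
\[ E^{\hat R_x}\big(F_{\delta_n} \,\big|\, \xi_\delta = x', \mathcal{G}_\delta\big)\, p_\delta(x, x') = E^{\hat R_x}\big(F_{\delta_n}\, p_{\delta-\delta_n}(\xi_{\delta_n}, x') \,\big|\, \mathcal{G}_\delta\big). \tag{10.50} \]
Denote the left-hand side of equation (10.50) by $b_n(x, x', y_0^\delta)$. Since $F_{\delta_{n-1}}$ is $\mathcal{F}^\xi_{\delta_n} \vee \mathcal{G}_\delta$-measurable, it follows from the same argument as in equation (10.50) that
\[ b_{n-1}(x, x', y_0^\delta) = E^{\hat R_x}\big(F_{\delta_{n-1}}\, p_{\delta-\delta_n}(\xi_{\delta_n}, x') \,\big|\, \mathcal{G}_\delta\big). \]
In view of equations (10.48) and (10.49), in order to get a uniform upper bound for $I_\delta(x, x'; y)$, it suffices to bound $b_n$ uniformly in $x$, $x'$ and $n$. Note that, by Hölder's inequality with exponents $d+1$ and $(d+1)/d$,
\[
\begin{aligned}
|b_n(x, x', y_0^\delta) - b_{n-1}(x, x', y_0^\delta)|
&\le E^{\hat R_x}\big(|F_{\delta_n} - F_{\delta_{n-1}}|\, p_{\delta-\delta_n}(\xi_{\delta_n}, x') \,\big|\, \mathcal{G}_\delta\big) \\
&\le \Big(E^{\hat R_x}\big(|F_{\delta_n} - F_{\delta_{n-1}}|^{d+1} \,\big|\, \mathcal{G}_\delta\big)\Big)^{\frac{1}{d+1}} \Big(E^{\hat R_x}\big(p_{\delta-\delta_n}(\xi_{\delta_n}, x')^{\frac{d+1}{d}} \,\big|\, \mathcal{G}_\delta\big)\Big)^{\frac{d}{d+1}} \\
&\equiv Q_1 Q_2.
\end{aligned}
\tag{10.51}
\]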
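By equation (10.47), we get
\[
\begin{aligned}
Q_2 &= \Big(\int_S p_{\delta-\delta_n}(z, x')^{\frac{d+1}{d}}\, p_{\delta_n}(x, z)\,dz\Big)^{\frac{d}{d+1}} \\
&\le K_2 \Big(\int_S p_{\delta-\delta_n}(z, x')\,(\delta - \delta_n)^{-\frac12}\, p_{\delta_n}(x, z)\,dz\Big)^{\frac{d}{d+1}} \\
&= K_2\,(\delta - \delta_n)^{-\frac{d}{2(d+1)}}\, p_\delta(x, x')^{\frac{d}{d+1}} \\
&\le K_3\,(\delta - \delta_n)^{-\frac{d}{2(d+1)}}\, \delta^{-d^2/2(d+1)}.
\end{aligned}
\]
To bound $Q_1$, we first use the inequality
\[ |e^x - e^y| \le |x - y|\,(e^x + e^y), \tag{10.52} \]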
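and the Cauchy–Schwarz inequality. Denoting $m = 2(d + 1)$, we have
\[
\begin{aligned}
Q_1 &\le \left\{ E^{\hat R_x}\!\left[\left|\int_{\delta_{n-1}}^{\delta_n} y_s\nabla^* h(\xi_s)\,d\xi_s\right|^{d+1} \big(F_{\delta_n} + F_{\delta_{n-1}}\big)^{d+1} \,\middle|\, \mathcal{G}_\delta\right]\right\}^{\frac{1}{d+1}} \\
&\le \left\{ E^{\hat R_x}\!\left[\left|\int_{\delta_{n-1}}^{\delta_n} y_s\nabla^* h(\xi_s)\,d\xi_s\right|^{m} \,\middle|\, \mathcal{G}_\delta\right]\right\}^{\frac1m} \left\{ E^{\hat R_x}\!\left[2^m\big(F^m_{\delta_n} + F^m_{\delta_{n-1}}\big) \,\middle|\, \mathcal{G}_\delta\right]\right\}^{\frac1m} \\
&\equiv Q_{11} Q_{12}.
\end{aligned}
\]
By the Burkholder–Davis–Gundy inequality, we get
\[
\begin{aligned}
Q_{11} &\le \left\{ E^{\hat R_x}\!\left[\left|\int_{\delta_{n-1}}^{\delta_n} y_s\nabla^* h(\xi_s) b(\xi_s)\,ds\right|^{m} \,\middle|\, \mathcal{G}_\delta\right]\right\}^{\frac1m} + \left\{ E^{\hat R_x}\!\left[\left|\int_{\delta_{n-1}}^{\delta_n} y_s\nabla^* h(\xi_s)\sigma(\xi_s)\,d\alpha_s\right|^{m} \,\middle|\, \mathcal{G}_\delta\right]\right\}^{\frac1m} \\
&\le K_4\,\|y_0^\delta\|_\infty\,(\delta_n - \delta_{n-1})^{1/2}.
\end{aligned}
\tag{10.53}
\]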
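On the other hand,
\[
\begin{aligned}
E^{\hat R_x}\big(F^m_{\delta_n} \,\big|\, \mathcal{G}_\delta\big)
&= E^{\hat R_x}\!\left[\exp\!\left(-m\int_0^{\delta_n} y_s\nabla^* h(\xi_s) b(\xi_s)\,ds - m\int_0^{\delta_n} y_s\nabla^* h(\xi_s)\sigma(\xi_s)\,d\alpha_s\right) \,\middle|\, \mathcal{G}_\delta\right] \\
&\le e^{K_5 m\|y_0^\delta\|_\infty\delta}\, E^{\hat R_x}\!\Bigg[\exp\!\left(\frac{m^2}{2}\int_0^{\delta_n} |y_s\nabla^* h(\xi_s)\sigma(\xi_s)|^2\,ds\right) \\
&\qquad\qquad \times \exp\!\left(-m\int_0^{\delta_n} y_s\nabla^* h(\xi_s)\sigma(\xi_s)\,d\alpha_s - \frac{m^2}{2}\int_0^{\delta_n} |y_s\nabla^* h(\xi_s)\sigma(\xi_s)|^2\,ds\right) \,\Bigg|\, \mathcal{G}_\delta\Bigg] \\
&\le e^{K_5 m\|y_0^\delta\|_\infty\delta}\, e^{K_6 m^2\|y_0^\delta\|_\infty^2\delta}\, E^{\hat R_x}\!\left[\exp\!\left(-m\int_0^{\delta_n} y_s\nabla^* h(\xi_s)\sigma(\xi_s)\,d\alpha_s - \frac{m^2}{2}\int_0^{\delta_n} |y_s\nabla^* h(\xi_s)\sigma(\xi_s)|^2\,ds\right) \,\middle|\, \mathcal{G}_\delta\right] \\
&= \exp\big(mK(y, \delta)\big),
\end{aligned}
\]
where $K(y, \delta) = K_5\|y_0^\delta\|_\infty\delta + K_6 m\|y_0^\delta\|_\infty^2\delta$.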
Thus,
\[ Q_{12} \le 4\exp\big(K(y, \delta)\big). \tag{10.54} \]
Combining equations (10.53) and (10.54), we conclude that
\[ Q_1 \le 4K_4\,\|y_0^\delta\|_\infty\,(\delta_n - \delta_{n-1})^{1/2}\exp\big(K(y, \delta)\big). \tag{10.55} \]
Choosing $\delta_n = \delta(1 - 2^{-n})$ and combining equations (10.51), (10.52) and (10.55), we get
\[ |b_n(x, x', y_0^\delta) - b_{n-1}(x, x', y_0^\delta)| \le 4K_3K_4\,\|y_0^\delta\|_\infty\,(\delta 2^{-n})^{1/2(d+1)}\exp\big(K(y, \delta)\big)\,\delta^{-d^2/2(d+1)}. \]
Since $b_0 = p_\delta(x, x')$, it follows that
\[ b_n \le K_7\delta^{-d/2} + 4K_3K_4\,\|y_0^\delta\|_\infty\,\delta^{(1-d)/2}\exp\big(K(y, \delta)\big). \tag{10.56} \]
Using equations (10.48), (10.49) and (10.47), we get
\[ I_\delta(x, x'; y_0^\delta) \le e^{K(y,\delta)}\,\frac{b_n}{p_\delta(x, x')}. \]
Making use of equation (10.56), we then get the boundedness of $I_\delta(x, x'; y_0^\delta)$.

Theorem 10.36 There exists a constant $K_1 > 0$ such that
\[ \gamma(\bar\pi_0) \le -K_1, \qquad Q_{\pi_0}\text{-a.s.} \tag{10.57} \]

Proof It follows from Lemma 10.35 and the Cauchy–Schwarz inequality $Ee^X \ge 1/Ee^{-X}$ that there is a constant $K_2 = K_2(\delta, y) > 0$ such that
\[ I_\delta(x, x', y_0^\delta) \ge K_2, \qquad \forall\, x, x' \in S. \tag{10.58} \]
By Lemma 10.35 and equation (10.58), it follows from equation (10.38) that $H(I_\delta^{\lambda,y}) < \infty$. It follows from equations (10.38) and (10.47) that $H(T_\delta) < \infty$. Thus, equation (10.57) follows from equation (10.45).
10.5
Exchangeability of union intersection for σ -fields
As we saw in Section 10.3, the condition equation (10.18) is the key for the stability of the optimal filter. This condition is essentially about the exchangeability of the two operations (union and intersection) of σ -fields. In this section, we study this problem in a general setting.
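Let $\{\mathcal{G}_n, n \ge 1\}$ be a non-increasing sequence of sub-$\sigma$-fields of $\mathcal{F}$ with intersection $\mathcal{G}_\infty$. Let $\mathcal{A}$ be another sub-$\sigma$-field of $\mathcal{F}$. The goal of this section is to find conditions under which the following equality holds
\[ \bigcap_{n=1}^\infty (\mathcal{A} \vee \mathcal{G}_n) = \mathcal{A} \vee \mathcal{G}_\infty \tag{10.59} \]
in some sense we shall define below.

Definition 10.37 For two sub-$\sigma$-fields $\mathcal{A}$ and $\mathcal{B}$ of $\mathcal{F}$, we write
\[ \mathcal{A} = \mathcal{B} \mod P, \]
if $\mathcal{A}$ and $\mathcal{B}$ induce the same collection of $P$-equivalence classes of sets. More precisely, for any $A \in \mathcal{A}$ there exists $B \in \mathcal{B}$ such that $P(A \triangle B) = 0$, and vice versa, where $A \triangle B = (A \setminus B) \cup (B \setminus A)$ is the symmetric difference between $A$ and $B$.

Now we can modify equation (10.59): the goal of this section is to find conditions under which the following equality holds
\[ \bigcap_{n=1}^\infty (\mathcal{A} \vee \mathcal{G}_n) = \mathcal{A} \vee \mathcal{G}_\infty, \mod P. \tag{10.60} \]
For convenience, we introduce some more notation. For a $\sigma$-field $\mathcal{A}$, $\mathcal{A}^b$ denotes the set of all bounded $\mathcal{A}$-measurable random variables. $P^{\mathcal{A}}_\omega$ denotes the regular conditional probability measure on $(\Omega, \mathcal{F}, P)$ given $\mathcal{A}$.

First, we consider measurable spaces $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$. Let $Q$ be a probability measure on $(X \times Y, \mathcal{X} \otimes \mathcal{Y})$. We denote the kernel from $X$ to $Y$ by $Q_x(dy)$, namely,
\[ \int_X \left(\int_Y F(x, y)\,Q_x(dy)\right) Q^1(dx) = \int_{X \times Y} F(x, y)\,dQ(x, y) \tag{10.61} \]
for all $F \in (\mathcal{X} \otimes \mathcal{Y})^b$, where $Q^1$ is the first marginal measure of $Q$. Let $\{\mathcal{Y}_n\}$ be a decreasing sequence of sub-$\sigma$-fields of $\mathcal{Y}$. We now find conditions for the following counterpart of equation (10.60):
\[ \bigcap_{n=1}^\infty (\mathcal{X} \otimes \mathcal{Y}_n) = \mathcal{X} \otimes \mathcal{Y}_\infty, \mod P. \tag{10.62} \]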
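Lemma 10.38 Let $\mathcal{Y}^{b,0}$ be a generating system of $\mathcal{Y}^b$ as a monotone class. The equality equation (10.62) holds if and only if for every $g \in \mathcal{Y}^{b,0}$ and every $\epsilon > 0$ there exists a finite-dimensional subset $R$ of $\mathcal{X}^b$ and a uniformly bounded sequence $\{h_n\}$ of functions on $X \times Y$ such that for each $n$:
i) $h_n \in (\mathcal{X} \otimes \mathcal{Y}_n)^b$.
ii) $h_n(\cdot, y) \in R$ for every $y \in Y$.
iii) $\displaystyle\int_{X \times Y} \big|h_n(x, y) - E^Q(g \,|\, \mathcal{X} \otimes \mathcal{Y}_n)(x, y)\big|\,dQ(x, y) < \epsilon$.

Proof "Only if" Suppose that equation (10.62) holds. Fix $g \in \mathcal{Y}^{b,0}$ and $\epsilon > 0$. Let $\tilde g(x, y) = g(y)$. By the backward martingale convergence theorem, we have
\[ E^Q(\tilde g \,|\, \mathcal{X} \otimes \mathcal{Y}_n) \to E^Q\Big(\tilde g \,\Big|\, \bigcap_{n=1}^\infty (\mathcal{X} \otimes \mathcal{Y}_n)\Big) = E^Q(\tilde g \,|\, \mathcal{X} \otimes \mathcal{Y}_\infty) \]
in $L^1(Q)$. Choose $n_0$ such that for $n > n_0$,
\[ E^Q\big|E^Q(\tilde g \,|\, \mathcal{X} \otimes \mathcal{Y}_n) - E^Q(\tilde g \,|\, \mathcal{X} \otimes \mathcal{Y}_\infty)\big| < \frac{\epsilon}{2}. \]
For $n \le n_0$ or $n = \infty$, by the martingale convergence theorem, we can choose a finite subalgebra $\mathcal{H}_n$ of $\mathcal{X}$ such that
\[ E^Q\big|E^Q(\tilde g \,|\, \mathcal{H}_n \otimes \mathcal{Y}_n) - E^Q(\tilde g \,|\, \mathcal{X} \otimes \mathcal{Y}_n)\big| < \frac{\epsilon}{2}. \]
Let
\[ R = \Big(\bigvee_{n=1}^{n_0} \mathcal{H}_n \vee \mathcal{H}_\infty\Big)^b. \]
Then $R$ is a finite-dimensional space. Denote
\[ h_n = E^Q(\tilde g \,|\, \mathcal{H}_n \otimes \mathcal{Y}_n), \qquad n \le n_0 \text{ or } n = \infty. \]
For $n > n_0$, let $h_n = h_\infty$. It is easy to see that $\{h_n, n \ge 1\}$ satisfies i)–iii).

"If" We only need to show that for any $F \in (\mathcal{X} \otimes \mathcal{Y})^b$,
\[ E^Q(F \,|\, \mathcal{X} \otimes \mathcal{Y}_\infty) = E^Q\Big(F \,\Big|\, \bigcap_{n=1}^\infty (\mathcal{X} \otimes \mathcal{Y}_n)\Big). \tag{10.63} \]
Suppose $F(x, y) = g(y)$ for some $g \in \mathcal{Y}^{b,0}$. For every $\epsilon > 0$, let $\{h_n\}$ and $R$ be given by i)–iii). Let $\{f_1, \ldots, f_d\}$ be a basis of $R$. Then, there exist uniformly bounded functions $g_i^n \in \mathcal{Y}_n^b$, $i = 1, \ldots, d$, such that
\[ h_n(x, y) = \sum_{i=1}^d g_i^n(y) f_i(x), \qquad x \in X,\ y \in Y. \]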
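Since bounded sets of $L^\infty$ are relatively compact in the weak* topology, we may take a (weak) limit point $g_i \in \mathcal{Y}_\infty^b$ of $\{g_i^n\}$, i.e. for every $h \in L^1(Q^2)$, we have
\[ E^{Q^2}\big(g_i^n h\big) \to E^{Q^2}\big(g_i h\big), \qquad i = 1, \ldots, d, \]
as $n \to \infty$. For $H \in L^\infty(Q)$, let
\[ \tilde h_i(y) = \int_X f_i(x) H(x, y)\,Q_y(dx), \qquad i = 1, 2, \ldots, d. \]
Then, for every $H \in L^\infty(Q)$, as $n \to \infty$, we have
\[
\begin{aligned}
E^Q(h_n H) &= \sum_{i=1}^d \int_Y \int_X f_i(x) g_i^n(y) H(x, y)\,Q_y(dx)\,Q^2(dy) \\
&= \sum_{i=1}^d \int_Y g_i^n(y) \tilde h_i(y)\,Q^2(dy) \\
&\to \sum_{i=1}^d \int_Y g_i(y) \tilde h_i(y)\,Q^2(dy) \\
&= E^Q(h_\infty H),
\end{aligned}
\]
where
\[ h_\infty(x, y) = \sum_{i=1}^d f_i(x) g_i(y) \in (\mathcal{X} \otimes \mathcal{Y}_\infty)^b. \]
Taking $n \to \infty$ in iii), by the backward martingale convergence theorem, we get
\[ \int_{X \times Y} \Big|h_\infty(x, y) - E^Q\Big(g \,\Big|\, \bigcap_{n=1}^\infty (\mathcal{X} \otimes \mathcal{Y}_n)\Big)(x, y)\Big|\,dQ(x, y) \le \epsilon. \]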
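Such an $h_\infty$ exists for all $\epsilon > 0$. This implies that $E^Q\big(g \,\big|\, \bigcap_{n=1}^\infty (\mathcal{X} \otimes \mathcal{Y}_n)\big)$ is $\mathcal{X} \otimes \mathcal{Y}_\infty$-measurable, and hence, equals $E^Q(g \,|\, \mathcal{X} \otimes \mathcal{Y}_\infty)$. Next, we take $F(x, y) = f(x)g(y)$ for some $f \in \mathcal{X}^b$ and $g \in \mathcal{Y}^{b,0}$. The same arguments with $f_i$ replaced by $f f_i$ yield equation (10.63) for this $F$. A standard monotone class argument then implies equation (10.63) for general $F \in (\mathcal{X} \otimes \mathcal{Y})^b$.

Next, we transfer the result of Lemma 10.38 to our original probability space by defining a mapping $\gamma$ from $(\Omega, \mathcal{F})$ to $(\Omega \times \Omega, \mathcal{A} \otimes \mathcal{G}_1)$ by $\gamma(\omega) = (\omega, \omega)$. Take $X = Y = \Omega$, $\mathcal{X} = \mathcal{A}$ and $\mathcal{Y} = \mathcal{G}_1$.

Lemma 10.39 For each $n$, we have $\gamma^{-1}(\mathcal{A} \otimes \mathcal{G}_n) = \mathcal{A} \vee \mathcal{G}_n$.

Proof Let
\[ \mathcal{D} = \{B_1 \times B_2 : B_1 \in \mathcal{A},\ B_2 \in \mathcal{G}_n\}, \]
and
\[ \mathcal{H} = \big\{B \in \mathcal{A} \otimes \mathcal{G}_n : \gamma^{-1}(B) \in \mathcal{A} \vee \mathcal{G}_n\big\}. \]
It is easy to show that $\mathcal{D}$ is closed under finite intersections and that $\mathcal{H}$, which contains $\mathcal{D}$, is closed under increasing limits and under proper differences. Thus, $\mathcal{H} = \mathcal{A} \otimes \mathcal{G}_n$. This proves $\gamma^{-1}(\mathcal{A} \otimes \mathcal{G}_n) \subset \mathcal{A} \vee \mathcal{G}_n$. The other direction of the containment can be proved similarly.

As a consequence of Lemma 10.39, we see that the equalities equations (10.60) and (10.62) are equivalent.

Lemma 10.40 Define a probability measure $Q$ on $\Omega \times \Omega$ by $Q = P \circ \gamma^{-1}$. Then, for any random variable $g$ on $(\Omega, \mathcal{F}, P)$, we have
\[ E^Q(\tilde g \,|\, \mathcal{A} \otimes \mathcal{G}_n)(\omega, \omega) = E(g \,|\, \mathcal{A} \vee \mathcal{G}_n)(\omega), \qquad P\text{-a.s. } \omega, \tag{10.64} \]
where $\tilde g(\omega_1, \omega_2) = g(\omega_2)$ is a random variable on $(\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, Q)$.

Proof It follows from Lemma 10.39 that both sides of equation (10.64) are $\mathcal{A} \vee \mathcal{G}_n$-measurable. We only need to show that for $A_1 \in \mathcal{A}$ and $A_2 \in \mathcal{G}_n$,
\[ \int_{A_1 \cap A_2} E^Q(\tilde g \,|\, \mathcal{A} \otimes \mathcal{G}_n)(\omega, \omega)\,P(d\omega) = \int_{A_1 \cap A_2} E(g \,|\, \mathcal{A} \vee \mathcal{G}_n)(\omega)\,P(d\omega). \]
It is clear that the right-hand side equals $E\big(g 1_{A_1 \cap A_2}\big)$. On the other hand, the left-hand side is
\[ \int_{\Omega \times \Omega} E^Q(\tilde g \,|\, \mathcal{A} \otimes \mathcal{G}_n)\,1_{A_1 \times A_2}\,dQ = E^Q\big(\tilde g 1_{A_1 \times A_2}\big) = E\big(g 1_{A_1 \cap A_2}\big). \]

By Lemmas 10.39 and 10.40, we can now transfer the result of Lemma 10.38 to our original setting.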
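Theorem 10.41 Let $\mathcal{G}_1^{b,0}$ be a generating system of $\mathcal{G}_1^b$ as a monotone class. The equality equation (10.60) holds if and only if for every $g \in \mathcal{G}_1^{b,0}$ and every $\epsilon > 0$ there exists a finite-dimensional subset $R$ of $\mathcal{A}^b$ and a uniformly bounded sequence $\{h_n\}$ of functions on $\Omega \times \Omega$ such that for each $n$:
i) $h_n \in (\mathcal{A} \otimes \mathcal{G}_n)^b$.
ii) $h_n(\cdot, \omega) \in R$ for every $\omega \in \Omega$.
iii) $\displaystyle\int_\Omega \big|h_n(\omega, \omega) - E(g \,|\, \mathcal{A} \vee \mathcal{G}_n)(\omega)\big|\,P(d\omega) < \epsilon$.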
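In the rest of this section, we take $X$, $Y$ as those defined just before Lemma 10.39. Recall that $Q^1$ is the marginal measure of $Q$ on $X$, and $P^{\mathcal{A}}_x$ coincides with $Q_x$, the transition kernel from $X$ to $Y$. Below, we will use $P^{\mathcal{A}}_x$ to indicate its measurability with respect to $\mathcal{A}$ as a function of $x$. Now we consider other conditions under which equation (10.60) holds. We need the following.

Definition 10.42 Let $\mathcal{A}$, $\mathcal{G}$ be two sub-$\sigma$-fields of $\mathcal{F}$. The $\sigma$-field $\mathcal{G}$ is $P^{\mathcal{A}}$-separable if there is a countably generated sub-$\sigma$-field $\mathcal{H}$ of $\mathcal{G}$ such that $\mathcal{G} = \mathcal{H} \mod P^{\mathcal{A}}_\omega$ for $P$-a.s. $\omega$.

Throughout the rest of this section, we assume that $\mathcal{G}_1$ is $P^{\mathcal{A}}$-separable.

Lemma 10.43 Let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{G}_1$ and let $\mathcal{G}_1^{b,0}$ be as before. Then, the following statements are equivalent:
i) $\mathcal{G}$ is $P^{\mathcal{A}}$-separable.
ii) For every $g \in \mathcal{G}_1^{b,0}$ there exists $F \in (\mathcal{A} \otimes \mathcal{G})^b$ such that
\[ F(x, \cdot) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{G}) \qquad \text{for } Q^1\text{-a.s. } x, \tag{10.65} \]
where $P^{\mathcal{A}}_x$ is the conditional probability measure given $\mathcal{A}$, and $E^{P^{\mathcal{A}}_x}$ stands for the expectation with respect to the measure $P^{\mathcal{A}}_x$.
iii) For every $g \in \mathcal{G}_1^{b,0}$ and every $F \in (\mathcal{A} \otimes \mathcal{G})^b$ satisfying $F = E^Q(g \,|\, \mathcal{A} \otimes \mathcal{G})$, we have that equation (10.65) holds.

Proof i)⇒iii) Let $\mathcal{H} \subset \mathcal{G}$ be countably generated such that $\mathcal{H} = \mathcal{G} \mod P^{\mathcal{A}}_x$ for $Q^1$-a.s. $x$. Let $F \in (\mathcal{A} \otimes \mathcal{G})^b$ satisfy $F = E^Q(g \,|\, \mathcal{A} \otimes \mathcal{G})$. Then, for all $A \in \mathcal{A}$ and $G \in \mathcal{H}$, we have
\[ \int_A \int_G F(x, y)\,P^{\mathcal{A}}_x(dy)\,Q^1(dx) = \int_A \int_G g(y)\,P^{\mathcal{A}}_x(dy)\,Q^1(dx). \]
Thus, for $G \in \mathcal{H}$,
\[ \int_G F(x, y)\,P^{\mathcal{A}}_x(dy) = \int_G g(y)\,P^{\mathcal{A}}_x(dy) \qquad \text{for } Q^1\text{-a.s. } x. \]
Since $\mathcal{H}$ is countably generated, there exists $N \subset X$ such that $Q^1(N) = 0$ and $F(x, \cdot) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{H})$, $\forall\, x \notin N$. Because $\mathcal{H} = \mathcal{G} \mod P^{\mathcal{A}}_x$, $F$ then satisfies the same relation for $\mathcal{G}$ instead of $\mathcal{H}$.

iii)⇒ii) is obvious.

ii)⇒i) By a monotone class argument we see that the assertion in ii) carries over to all $g \in \mathcal{G}_1^b$. Let $\mathcal{H}_1 \subset \mathcal{G}_1$ be countably generated such that $\mathcal{H}_1 = \mathcal{G}_1 \mod P^{\mathcal{A}}_x$ for $x \notin N_1$, where $Q^1(N_1) = 0$. Fix a countable subset $\{g_m, m \ge 1\}$ of $\mathcal{H}_1^b$ that generates $\mathcal{H}_1^b$ as a monotone class.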
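For each $m$, we choose $F_m \in (\mathcal{A} \otimes \mathcal{G})^b$ satisfying $F_m(x, \cdot) = E^{P^{\mathcal{A}}_x}(g_m \,|\, \mathcal{G})$ for $Q^1$-a.s. $x$. Since every product $\sigma$-field is the union of its countably generated sub-product-$\sigma$-fields, all functions $F_m(x, \cdot)$, $(m \ge 1, x \in X)$, are measurable with respect to a countably generated sub-$\sigma$-field $\mathcal{H}$ of $\mathcal{G}$. Then
\[ E^{P^{\mathcal{A}}_x}(g_m \,|\, \mathcal{G}) = E^{P^{\mathcal{A}}_x}(g_m \,|\, \mathcal{H}), \qquad \forall\, m \ge 1,\ x \notin N_2, \]
where $N_2$ is another $Q^1$-null set. If $x \notin N_1 \cup N_2$, it follows that
\[ E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{G}) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{H}), \qquad \forall\, g \in \mathcal{H}_1^b. \tag{10.66} \]
Then, equation (10.66) holds for all $g \in \mathcal{G}_1^b$. Therefore, $\mathcal{H} = \mathcal{G} \mod P^{\mathcal{A}}_x$ since $\mathcal{G} \subset \mathcal{G}_1$.

Finally, we are ready to state the main theorem of this section.

Theorem 10.44 Suppose that $\mathcal{G}_n$ is $P^{\mathcal{A}}$-separable for all $n$. Then, equation (10.60) holds if and only if $\mathcal{G}_\infty$ is $P^{\mathcal{A}}$-separable.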
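Proof Fix $g \in \mathcal{G}_1^{b,0}$. Let $F_n \in (\mathcal{A} \otimes \mathcal{G}_n)^b$ be such that $F_n = E^Q(g \,|\, \mathcal{A} \otimes \mathcal{G}_n)$. Define $F = \limsup_{n\to\infty} F_n$. From Lemma 10.43 we see that $F_n(x, \cdot) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{G}_n)$. By the martingale convergence theorem, we have
\[ F(x, \cdot) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{G}_\infty) \qquad Q^1\text{-a.s. } x. \]
For $H \in (\mathcal{A} \otimes \mathcal{G}_\infty)^b$, it follows from equation (10.61) that the two statements
\[ H = F \mod Q, \tag{10.67} \]
and
\[ H(x, \cdot) = E^{P^{\mathcal{A}}_x}(g \,|\, \mathcal{G}_\infty) \qquad Q^1\text{-a.s. } x \tag{10.68} \]
are equivalent.

Suppose equation (10.60) holds. Then, equation (10.62) holds, and hence, there exists $H \in (\mathcal{A} \otimes \mathcal{G}_\infty)^b$ with property equation (10.67). Hence, $H$ satisfies equation (10.68), which, by Lemma 10.43, implies that $\mathcal{G}_\infty$ is $P^{\mathcal{A}}$-separable.

On the other hand, suppose that $\mathcal{G}_\infty$ is $P^{\mathcal{A}}$-separable. By Lemma 10.43, we see that there is $H \in (\mathcal{A} \otimes \mathcal{G}_\infty)^b$ satisfying equation (10.68) and hence equation (10.67). Therefore,
\[ F \equiv E^Q\Big(g \,\Big|\, \bigcap_{n=1}^\infty (\mathcal{A} \otimes \mathcal{G}_n)\Big) \]
coincides $Q$-a.s. with an $\mathcal{A} \otimes \mathcal{G}_\infty$-measurable function, i.e.
\[ E^Q(g \,|\, \mathcal{A} \otimes \mathcal{G}_\infty) = E^Q\Big(g \,\Big|\, \bigcap_{n=1}^\infty (\mathcal{A} \otimes \mathcal{G}_n)\Big). \]
Since this is true for every $g \in \mathcal{G}_1^{b,0}$, it follows from the same arguments as in the proof of Theorem 10.41 that equation (10.60) holds.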
10.6
Notes
The problem of invariant measures for filtering processes was first considered by Kunita [92]. Later, the results were extended by Kunita [93], Stettner [140], and Bhatt et al. [9] to more general settings. Using the results of Kunita [92], Ocone and Pardoux [129] studied the asymptotic stability of the filter. Since then, various authors have investigated this problem with various improvements. For the setting with Markov ergodic signals on compact domains, this problem is studied by Atar and Zeitouni [3], [4], Baxendale et al. [5], Chigansky [29], [30], Chigansky and Liptser [31], Da Prato et al. [46], Del Moral and Guionnet [52–54], Del Moral and Miclo [57], and Delyon and Zeitouni [58]. It is pushed to non-compact/non-ergodic settings by Atar [2], Budhiraja and Ocone [24], [25], Le Gland and Mevel [107], Le Gland and Oudjane [108], [109], Di Masi and Stettner [60], Tadić and Doucet [145], Oudjane and Rubenthaler [130], and Papavasiliou [131]. Some other results using Kunita [92] include Budhiraja [16], [17], [18], Stettner [141], and Le Breton and Roubaud [106]. The stability with respect to initial conditions is naturally related to the robustness of the filtering equation with respect to the model parameters. The related results appeared in, for example, a series of papers of Budhiraja and Kushner [20], [21], [22], [23]. Here, we also mention some other papers on this subject: Cérou [27], Chigansky and Liptser [32], Clark et al. [34], Ocone [128]. The key condition for the stability is equation (10.18). This condition was studied by Weizsächer [146]. Counterexamples for this condition can be found in Baxendale et al. [5], Delyon and Zeitouni [58], and Williams [148]. Sections 10.1 and 10.2 are based on the paper of Bhatt et al. [9]. Theorem 10.19 is due to Kunita. Section 10.3 is based on the paper of Budhiraja [18]. The concept of the finite-memory property was introduced by Ocone and Pardoux [129]. Section 10.4 is based on Atar and Zeitouni [3].
Section 10.5 is based on Weizsächer [146].
11
Singular filtering
In this chapter, we consider the filtering problem when the magnitude of the observation noise depends on the signal itself. Such a situation arises from many application problems, such as the stochastic volatility model in mathematical finance and the general filtering problem with the Ornstein–Uhlenbeck process as the observation noise. Since the covariance matrix of the observation noises is completely observable by using the quadratic covariation process, the optimal filter becomes a probability measure supported on the surface (manifold) on which the covariance matrix of the observation noises is constant. Thus, the optimal filter is singular. In this chapter, we will demonstrate how to transform this singular filtering problem to the classical one that we studied in the previous chapters.
11.1
A special example
In this section, we consider an instructive example that can be solved by elementary calculations. Suppose that $(B, W)$ is a $(d + m)$-dimensional standard Brownian motion. Let the signal be given by the following stochastic differential equation:
\[ dX_t^j = b_j(X_t)\,dt + \sqrt{X_t^j}\,dB_t^j, \qquad j = 1, 2, \ldots, d, \tag{11.1} \]
and let the observation process be given by
\[ dY_t = h(X_t)\,dt + \sqrt{Z_t}\,dW_t, \tag{11.2} \]
where $b = (b_1, \ldots, b_d)^* : \mathbb{R}^d \to \mathbb{R}^d$ and $h : \mathbb{R}^d \to \mathbb{R}^m$ are continuous mappings, and $Z_t = \sum_{j=1}^d X_t^j$ is a non-negative-valued process.

Remark 11.1 We now consider the example introduced in Section 1.1.2. Suppose that the volatilities of the stocks are not deterministic. Instead, we assume that they are all equal to the square root of the sum of the appreciation rates of all available stocks, namely
\[ dS_t^j = S_t^j\big(X_t^j\,dt + \sqrt{Z_t}\,dW_t^j\big), \qquad j = 1, 2, \ldots, d. \]
Then, $Y_t^j \equiv \log S_t^j$, $j = 1, 2, \ldots, d$, satisfy the equation (11.2) with $m = d$ and $h_j(x) = x_j - \frac12 z$, where $z = x_1 + \cdots + x_d$. In this case, $X_t^j$ is the appreciation rate of the $j$th stock. If $b_j(x) = \alpha_j(\beta_j - x_j)$, $j = 1, 2, \ldots, d$, with $\alpha_j, \beta_j \ge 0$ being constants, then equation (11.1) is the Cox–Ingersoll–Ross (CIR) model for interest rates. We refer the reader to the book of Baxter and Rennie [6] for more details on this model. We can also use it to model the appreciation rate of the $j$th stock. Note that the volatility of the $j$th stock here is modelled as $\sqrt{Z_t}$ for $j = 1, 2, \ldots, d$. In general, the volatility of the $j$th stock may have a more general form; in particular, it may depend on $j$. We take $\sqrt{Z_t}$ here for the convenience of the calculations. The general case can be solved using the method that will be introduced in other sections of this chapter.

By Theorem 3.11 and Definition 2.30, we see that for $1 \le i, j \le m$, the quadratic covariation process $\langle Y^i, Y^j\rangle_t$ satisfies
\[ \langle Y^i, Y^j\rangle_t = \lim_{n\to\infty} \sum_{k=1}^n \big(Y^i_{t_k^n} - Y^i_{t_{k-1}^n}\big)\big(Y^j_{t_k^n} - Y^j_{t_{k-1}^n}\big), \]
where $0 = t_0^n < t_1^n < \cdots < t_n^n = t$ is a partition of $[0, t]$ with
\[ \max_{1 \le k \le n} |t_k^n - t_{k-1}^n| \to 0. \]
Then $\langle Y^i, Y^j\rangle_t$ is $\mathcal{G}_t$-measurable. By Theorem 3.6, we get that
\[ \langle Y^i, Y^j\rangle_t = \int_0^t Z_s\,ds\,\delta_{ij}. \]
Therefore, $Z_t$, $t > 0$, is an observable process. It feeds back and provides extra information about the signal process $X_t$. For any $z > 0$, we define the hyperplane
\[ M_z := \Big\{x \in \mathbb{R}^d : \sum_{j=1}^d x_j = z\Big\}. \]
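The observability of $Z_t$ through the quadratic variation of $Y$ can be illustrated numerically. The following is a minimal sketch, not part of the text's argument: it uses an Euler scheme for equations (11.1) and (11.2) with $m = 1$, and the CIR-type drift $b_j(x) = \alpha(\beta - x_j)$, the sensor function $h(x) = x_1$, the parameter values and the function name are all assumptions chosen for illustration.

```python
import numpy as np

def realized_qv_vs_integrated_z(T=1.0, n=20_000, d=2, alpha=1.0, beta=1.0, seed=0):
    """Euler scheme for (11.1)-(11.2) with m = 1.

    Returns (sum of squared increments of Y, Riemann sum of Z over [0, T]).
    The drift b_j(x) = alpha * (beta - x_j) and h(x) = x_1 are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    dB = rng.normal(0.0, np.sqrt(dt), size=(n, d))  # signal noise increments
    dW = rng.normal(0.0, np.sqrt(dt), size=n)       # observation noise increments
    x = np.full(d, beta, dtype=float)
    qv = 0.0      # realized quadratic variation of Y
    int_z = 0.0   # approximation of the integral of Z_s ds
    for k in range(n):
        z = x.sum()
        dY = x[0] * dt + np.sqrt(z) * dW[k]
        qv += dY ** 2
        int_z += z * dt
        # Euler step for the signal, truncated at 0 to keep the square root defined
        x = np.maximum(x + alpha * (beta - x) * dt + np.sqrt(x) * dB[k], 0.0)
    return qv, int_z

qv, int_z = realized_qv_vs_integrated_z()
print(qv, int_z)  # the two numbers should be close for a fine partition
```

The realized quadratic variation is computed from the observation increments alone, which is the sense in which $\int_0^t Z_s\,ds$, and hence $Z_t$, is available to the observer.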
As $X_t^1 + \cdots + X_t^d = Z_t$, the optimal filter $\pi_t$ is supported on the hyperplane $M_{Z_t}$, namely, $\pi_t(M_{Z_t}) = 1$. Note that
\[ dZ_t = \bar b(X_t)\,dt + \sqrt{Z_t}\,dW_t^0, \tag{11.3} \]
where
\[ \bar b(x) \equiv \sum_{j=1}^d b_j(x) \quad\text{and}\quad dW_t^0 \equiv \sum_{j=1}^d \sqrt{\frac{X_t^j}{Z_t}}\,dB_t^j. \]
By Theorem 3.13, it is easy to verify that $W_t^0$ is a one-dimensional Brownian motion. Let $e_j$ be the unit vector in the $j$th axis of $\mathbb{R}^d$, $j = 1, 2, \ldots, d$, namely, the $j$th co-ordinate of $e_j$ is 1, and the other co-ordinates are 0. Let
\[ \bar e := (1, 1, \ldots, 1)^*/\sqrt{d}. \]
Note that the vectors $b(x)$ and $e_j$, $1 \le j \le d$, can be decomposed orthogonally as follows:
\[ b(x) = \langle b(x), \bar e\rangle\,\bar e + \tilde b(x), \qquad e_j = \frac{1}{\sqrt d}\,\bar e + \tilde e_j, \]
with $\tilde b(x)$, $\tilde e_j$ being orthogonal to $\bar e$. Then, in vector form, equation (11.1) becomes
\[
\begin{aligned}
dX_t &= b(X_t)\,dt + \sum_{j=1}^d \sqrt{X_t^j}\,e_j\,dB_t^j \\
&= \Big(\frac{1}{\sqrt d}\,\bar e\,\bar b(X_t) + \tilde b(X_t)\Big)dt + \frac{1}{\sqrt d}\,\bar e\,\sqrt{Z_t}\,dW_t^0 + \sum_{j=1}^d \tilde e_j\sqrt{X_t^j}\,dB_t^j.
\end{aligned}
\tag{11.4}
\]
Let $\Sigma(x)$ be a $d \times d$ orthogonal matrix, with entries $\sigma_{jk}(x)$, whose last column is $\big(\sqrt{x_1/z}, \ldots, \sqrt{x_d/z}\big)^*$. Define a $d$-dimensional process $\tilde B_t$ by
\[ d\tilde B_t = \Sigma(X_t)^*\,dB_t. \]
Again, by Theorem 3.13, it is easy to show that $\tilde B_t$ is a $d$-dimensional Brownian motion. Further,
\[ dW_t^0 = \sum_{j=1}^d \sigma_{jd}(X_t)\,dB_t^j = d\tilde B_t^d, \]
and hence, $\tilde B^d = W^0$. As $\Sigma(X_t)^* = \Sigma(X_t)^{-1}$, we can represent the Brownian motions $B_t^j$, $j = 1, 2, \ldots, d$, in terms of the stochastic integrals with respect to the Brownian motions $\tilde B_t^1, \ldots, \tilde B_t^{d-1}$ and $W_t^0$:
\[ dB_t^j = \sum_{k=1}^{d-1} \sigma_{jk}(X_t)\,d\tilde B_t^k + \sqrt{\frac{X_t^j}{Z_t}}\,dW_t^0, \qquad j = 1, 2, \ldots, d. \]
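By equation (11.3), the equation (11.4) is then rewritten as
\[ dX_t = \frac{1}{\sqrt d}\,\bar e\,dZ_t + \tilde b(X_t)\,dt + \sum_{k=1}^{d-1}\sum_{j=1}^d \sigma_{jk}(X_t)\sqrt{X_t^j}\,\tilde e_j\,d\tilde B_t^k + \sum_{j=1}^d \frac{X_t^j}{\sqrt{Z_t}}\,\tilde e_j\,dW_t^0. \tag{11.5} \]
Note that for fixed $s$ and $t$ with $s < t$, the mapping
\[ x \mapsto \xi_{t,s}(x) \equiv x + \frac{1}{\sqrt d}\,\bar e\,(Z_t - Z_s) \]
is a one-to-one affine transformation from $M_{Z_s}$ onto $M_{Z_t}$. Thus, $\xi_{t,0}^{-1}$ is a mapping from $M_{Z_t}$ to $M_{z_0}$. Let $\kappa_t = \xi_{t,0}^{-1} X_t$. Then
\[ \kappa_t = X_t - \frac{1}{\sqrt d}\,\bar e\,(Z_t - z_0) \]
satisfies the following stochastic differential equation on the hyperplane $M_{z_0}$:
\[ d\kappa_t = \tilde b(X_t)\,dt + \sum_{k=1}^{d-1}\sum_{j=1}^d \sigma_{jk}(X_t)\sqrt{X_t^j}\,\tilde e_j\,d\tilde B_t^k + \sum_{j=1}^d \frac{X_t^j}{\sqrt{Z_t}}\,\tilde e_j\,dW_t^0. \tag{11.6} \]
For $\kappa \in M_{z_0}$ and $z \in \mathbb{R}_+$, define
\[ \hat b(\kappa, z) = \tilde b\big(\kappa + d^{-1/2}\bar e\,(z - z_0)\big), \qquad \hat\sigma_0(\kappa, z) = \sum_{j=1}^d \frac{\kappa^j + d^{-1/2}(z - z_0)}{\sqrt z}\,\tilde e_j, \]
and, for $k = 1, 2, \ldots, d - 1$,
\[ \hat\sigma_k(\kappa, z) = \sum_{j=1}^d \sigma_{jk}\big(\kappa + d^{-1/2}\bar e\,(z - z_0)\big)\sqrt{\kappa^j + d^{-1/2}(z - z_0)}\;\tilde e_j. \]
Then, the singular filtering problem becomes the classical one with the signal $\kappa_t$ given by the following SDE on $M_{z_0}$:
\[ d\kappa_t = \hat b(\kappa_t, Z_t)\,dt + \sum_{k=1}^{d-1} \hat\sigma_k(\kappa_t, Z_t)\,d\tilde B_t^k + \hat\sigma_0(\kappa_t, Z_t)\,dW_t^0, \tag{11.7} \]
and the observation process $(Y_t, Z_t)$ given by
\[ dY_t = \hat h(\kappa_t, Z_t)\,dt + \sqrt{Z_t}\,dW_t, \tag{11.8} \]
and
\[ dZ_t = \hat{\bar b}(\kappa_t, Z_t)\,dt + \sqrt{Z_t}\,dW_t^0, \tag{11.9} \]
where
\[ \hat h(\kappa, z) = h\big(\kappa + d^{-1/2}\bar e\,(z - z_0)\big) \quad\text{and}\quad \hat{\bar b}(\kappa, z) = \bar b\big(\kappa + d^{-1/2}\bar e\,(z - z_0)\big). \]
Define the optimal filter for $\kappa_t$ by
\[ \langle U_t, f\rangle = E\big(f(\kappa_t) \,\big|\, \mathcal{F}_t^{Y,Z}\big), \qquad \forall\, f \in C_b(M_{z_0}). \]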
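The change of variables $\kappa_t = \xi_{t,0}^{-1}X_t$ used above rests on two elementary facts: $\tilde e_j = e_j - d^{-1/2}\bar e$ is orthogonal to $\bar e$, and the shift $\xi_{t,s}$ moves the hyperplane $M_{Z_s}$ onto $M_{Z_t}$. The following short sketch checks both numerically; the dimension $d = 3$, the sample point and the function names `xi` and `xi_inv` are illustrative assumptions.

```python
import numpy as np

d = 3
e_bar = np.ones(d) / np.sqrt(d)                 # unit vector (1, ..., 1)^* / sqrt(d)
# Column j of this matrix is e_tilde_j = e_j - <e_j, e_bar> e_bar.
e_tilde = np.eye(d) - np.outer(e_bar, e_bar)

def xi(x, z_t, z_s):
    """Shift x in M_{z_s} along e_bar so that the result lies in M_{z_t}."""
    return x + e_bar * (z_t - z_s) / np.sqrt(d)

def xi_inv(x, z_t, z_s):
    """Inverse shift, mapping M_{z_t} back to M_{z_s}."""
    return x - e_bar * (z_t - z_s) / np.sqrt(d)

x = np.array([0.5, 1.0, 1.5])     # a point on M_3 (coordinates sum to 3)
y = xi(x, 4.2, 3.0)               # should lie on M_{4.2}
print(y.sum())                    # coordinate sum after the shift
print(e_tilde @ e_bar)            # orthogonality of every e_tilde_j to e_bar
print(xi_inv(y, 4.2, 3.0) - x)    # round trip recovers x
```

The shift acts only in the direction $\bar e$, which is why the dynamics of $\kappa_t$ in equation (11.6) contain only the components along the vectors $\tilde e_j$.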
To study the filtering problem with signal equation (11.7) and observations equations (11.8) and (11.9), we need to find the initial probability measure $U_0$ on $M_z$. Suppose that the distribution of $X_0$ has a continuous density $\tilde\pi_0$ on $\mathbb{R}^d_+$.

Theorem 11.2 Let $U_0$ be the conditional probability distribution of $X_0$ given $Z_0 = z$, i.e. $U_0(dx) = P\big(X_0 \in dx \,\big|\, Z_0 = z\big)$. Then, $U_0(dx) = p(x)\lambda_z(dx)$, where $\lambda_z$ is the Lebesgue measure on the hyperplane $M_z$ and
\[ p(x) = \frac{\tilde\pi_0(x)}{\int_{M_z} \tilde\pi_0(y)\,\lambda_z(dy)}. \tag{11.10} \]

Proof For a test function $\phi$ defined on $\mathbb{R}^d_+$, and a Borel set $D$ in $\mathbb{R}_+$, we have
\[
\begin{aligned}
E\big(\phi(X_0) 1_{Z_0 \in D}\big) &= \int_{\mathbb{R}^d_+} \phi(x) 1_D(x_1 + \cdots + x_d)\,\tilde\pi_0(x)\,dx \\
&= \int_D \int_{M_z} \phi(x)\tilde\pi_0(x)\,\lambda_z(dx)\,dz.
\end{aligned}
\tag{11.11}
\]
By taking $\phi(x) \equiv 1$, we get
\[ P(Z_0 \in D) = \int_D \int_{M_z} \tilde\pi_0(x)\,\lambda_z(dx)\,dz. \]
Thus,
\[
\begin{aligned}
E\big(\phi(X_0) 1_{Z_0 \in D}\big) &= E\big(E(\phi(X_0) \,|\, Z_0)\, 1_{Z_0 \in D}\big) \\
&= \int_D E\big(\phi(X_0) \,\big|\, Z_0 = z\big) \int_{M_z} \tilde\pi_0(x)\,\lambda_z(dx)\,dz.
\end{aligned}
\tag{11.12}
\]
Comparing equations (11.11) and (11.12), we get that
\[ E\big(\phi(X_0) \,\big|\, Z_0 = z\big) = \int_{M_z} \phi(x)\tilde\pi_0(x)\,\lambda_z(dx) \Big/ \int_{M_z} \tilde\pi_0(y)\,\lambda_z(dy). \]
Therefore, $U_0$ is absolutely continuous with respect to $\lambda_z$ and the density is given by equation (11.10).

In contrast to the filtering models equations (5.2) and (5.1), the coefficients in the equations (11.7), (11.8) and (11.9) here depend on the observation process $Z_t$. However, since the observation processes are regarded as known (they behave deterministically) in the filtering problem, there is no essential difficulty in deriving the filtering equation and the numerical solutions for $U_t$ using the methods introduced in the previous chapters. We leave the details to the interested reader.

Finally, we note that
\[
\begin{aligned}
\pi_t(\cdot) &= P\big(X_t \in \cdot \,\big|\, \mathcal{F}_t^{Y,Z}\big) \\
&= P\Big(\kappa_t + \frac{1}{\sqrt d}\,\bar e\,(Z_t - Z_0) \in \cdot \,\Big|\, \mathcal{F}_t^{Y,Z}\Big).
\end{aligned}
\]
Thus, the optimal filter for $X_t$ is given by
\[ \pi_t(\cdot) = U_t\big(\cdot - d^{-1/2}\bar e\,(Z_t - Z_0)\big), \qquad t > 0. \]
It is clear that for $t = 0$, the optimal filter $\pi_0$ coincides with the initial distribution $\tilde\pi_0$ of $X$. Note that the optimal filter $\pi_t$ is not continuous at time $t = 0$ since $Z_0$ is not observed at time $t = 0$.
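Theorem 11.2 can be illustrated in dimension $d = 2$, where $M_z \cap \mathbb{R}^2_+$ is the segment $\{(x_1, z - x_1) : 0 \le x_1 \le z\}$. The sketch below is only an illustration under assumed data: the prior density $\tilde\pi_0(x) = x_1 e^{-x_1} e^{-x_2}$ and the function names are hypothetical choices. On $M_z$ this density is proportional to $x_1$, so formula (11.10) gives $E(X_0^1 \mid Z_0 = z) = 2z/3$, which the midpoint-rule computation reproduces.

```python
import numpy as np

def conditional_mean_x1(z, prior_density, n=10_000):
    """E(X_0^1 | Z_0 = z) from the density (11.10), for d = 2.

    M_z is parametrised by x1 in (0, z) with x2 = z - x1; the constant factor
    relating dx1 to the surface measure on M_z cancels in the ratio.
    """
    x1 = (np.arange(n) + 0.5) * z / n          # midpoints of a uniform grid on (0, z)
    w = prior_density(x1, z - x1)              # prior density restricted to M_z
    return float(np.sum(x1 * w) / np.sum(w))   # normalised as in (11.10)

def prior(x1, x2):
    """Hypothetical prior density on the positive quadrant (Gamma(2) x Exp(1))."""
    return x1 * np.exp(-x1) * np.exp(-x2)

z = 3.0
print(conditional_mean_x1(z, prior), 2 * z / 3)  # numerical value vs. closed form
```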
11.2
A general singular filtering model
In this section, we study the following filtering model:
\[
\begin{aligned}
dX_t &= b(X_t)\,dt + c(X_t)\,dW_t + \tilde c(X_t)\,dB_t, \\
dY_t &= h(X_t)\,dt + \sigma(X_t)\,dW_t,
\end{aligned}
\tag{11.13}
\]
where $B$ and $W$ are two independent Brownian motions in $\mathbb{R}^d$ and $\mathbb{R}^m$, respectively, and $b, c, \tilde c, h, \sigma$ are functions on $\mathbb{R}^d$ with values in $\mathbb{R}^d$, $\mathbb{R}^{d\times m}$, $\mathbb{R}^{d\times d}$, $\mathbb{R}^m$, $\mathbb{R}^{m\times m}$, respectively. The analysis can be easily extended to cover the case when all terms in equation (11.13) depend on $X$ as well as on $Y$. Since the coefficient matrix of the observation noise plays an important role in this chapter, we reserve the notation $\sigma$ for it. For the coefficient matrix of the second noise of the signal, we use $\tilde c$ instead of $\sigma$ (contrary to what we did in the previous chapters). To aid the understanding of the material in Sections 11.2–11.4, we suggest that the reader compare the results in these three sections with the corresponding ones in the previous section, which are more explicit. For convenience, we make the following assumption throughout the rest of this chapter.

Condition (ND): For any $x \in \mathbb{R}^d$, the $m \times m$-matrix $\sigma(x)$ is invertible.

Without loss of generality, we can and will assume that $\sigma(x)$ is a symmetric and positive-definite matrix. In fact, we can define the $d \times m$-matrix-valued function $\alpha = c\,\sigma^{-1}\sqrt{\sigma\sigma^*}$ on $\mathbb{R}^d$ and the $m$-dimensional process $\tilde W$ by
\[ d\tilde W_t = \Big(\sqrt{\sigma\sigma^*}\,(X_t)\Big)^{-1}\sigma(X_t)\,dW_t. \]
Then, $\tilde W$ is a martingale with Meyer's processes
\[ \langle \tilde W^i, \tilde W^j\rangle_t = \delta_{ij}\,t, \qquad \forall\, 1 \le i, j \le m, \]
and
\[ \langle \tilde W^i, B^j\rangle_t = 0, \qquad \forall\, 1 \le i \le m,\ 1 \le j \le d. \]
By Theorem 3.13, the process $(\tilde W, B)$ is a standard $(m + d)$-dimensional Brownian motion. Note that the system equation (11.13) is equivalent to the system
\[
\begin{aligned}
dX_t &= b(X_t)\,dt + \alpha(X_t)\,d\tilde W_t + \tilde c(X_t)\,dB_t, \\
dY_t &= h(X_t)\,dt + \sqrt{\sigma\sigma^*}\,(X_t)\,d\tilde W_t.
\end{aligned}
\]
In this case, $\sqrt{\sigma\sigma^*}$ is symmetric and positive-definite.

Let $\langle Y\rangle_t$ be Meyer's process of $Y$. Recall that, since $Y$ is a continuous semimartingale, $\langle Y\rangle_t$ coincides with the quadratic covariation ($\mathbb{R}^{m\times m}$-valued) process of $Y$. From equation (11.13) it follows that
\[ \langle Y\rangle_t = \int_0^t \sigma^2(X_s)\,ds, \]
and hence the process $\sigma^2(X_t) = \frac{d}{dt}\langle Y\rangle_t$ is $\mathcal{G}_t$-measurable for any $t > 0$.

Remark 11.3 The random variable $\sigma^2(X_0)$ is not $\mathcal{G}_0$-measurable; instead, it is $\mathcal{G}_{0+}$-measurable. However, $\sigma^2(X_t)$ is $\mathcal{G}_t$-measurable for $t > 0$.

Denote by $S_m^+$ the set of all symmetric positive-definite $m \times m$-matrices. Then for $x \in \mathbb{R}^d$, we have $\sigma(x)$ and $\sigma^2(x) \in S_m^+$.
Next, we define a mapping $a$ from $S_m^+$ to $\mathbb{R}^{\tilde m}$ with $\tilde m = \frac{m(m+1)}{2}$ as the list of the diagonal entries and those above the diagonal in lexicographical order, i.e. for any $r \in S_m^+$, $a(r)$ is defined as
\[ a(r)_1 = r_{11},\ a(r)_2 = r_{12},\ \ldots,\ a(r)_m = r_{1m},\ a(r)_{m+1} = r_{22},\ \ldots,\ a(r)_{2m-1} = r_{2m},\ a(r)_{2m} = r_{33},\ \ldots,\ a(r)_{\tilde m} = r_{mm}. \]
It is clear that $a$ is one-to-one from $S_m^+$ onto $a(S_m^+) \subset \mathbb{R}^{\tilde m}$. Further, $a$ and $a^{-1}$ are continuous.
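The mapping $a$ and its inverse are easy to write down concretely. The following is a small sketch, assuming NumPy; the names `a_map` and `a_inv` are illustrative and not from the text. It relies on the fact that `np.triu_indices` enumerates the upper triangle row by row, which is exactly the lexicographical order used above.

```python
import numpy as np

def a_map(r):
    """List the diagonal and above-diagonal entries of r in lexicographical order."""
    m = r.shape[0]
    return r[np.triu_indices(m)]

def a_inv(v, m):
    """Rebuild the symmetric m x m matrix whose upper triangle is listed in v."""
    upper = np.zeros((m, m))
    upper[np.triu_indices(m)] = v
    # Mirror the strictly upper part below the diagonal.
    return upper + np.triu(upper, 1).T

r = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = a_map(r)
print(v)            # entries r11, r12, r22
print(a_inv(v, 2))  # recovers r
```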
Lemma 11.4 The set $a(S_m^+)$ is open in $\mathbb{R}^{\tilde m}$.

Proof Suppose that $\sigma \in \mathbb{R}^{m\times m}$ is symmetric. It is well known that $\sigma \in S_m^+$ if and only if $\det(\sigma_k) > 0$, $k = 1, 2, \ldots, m$, where $\sigma_k$ is the $k \times k$ submatrix obtained from $\sigma$ by removing the last $m - k$ rows and $m - k$ columns. Note that $\det(\sigma_k)$ is a polynomial of the entries in $\sigma$. Thus, the image $a(S_m^+)$ consists of points in $\mathbb{R}^{\tilde m}$ such that these polynomials of its co-ordinates are positive. This implies that $a(S_m^+)$ is open.

Now let $s$ be the square-root mapping $s : S_m^+ \to S_m^+$ such that for any $r \in S_m^+$, $s(r)$ is the unique matrix belonging to $S_m^+$ such that $s(r)^2 = r$. To see that $s$ is a continuous and, in particular, a Borel measurable mapping, we prove the following representation.
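Lemma 11.5 Suppose that $\Gamma$ is a smooth, positively oriented closed path in the complex half-plane with positive real part that encloses all the eigenvalues of the positive-definite matrix $r$. Then
\[
s(r) = \frac{1}{2\pi i}\oint_\Gamma \sqrt{z}\,(zI - r)^{-1}\,dz. \tag{11.14}
\]
As a consequence, $s$ is a continuous mapping from $S_m^+$ to itself.

Proof Since $r$ is a positive-definite matrix, there exists an orthogonal matrix $\tilde r$ such that $\tilde r^* r \tilde r = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, where $\mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is the diagonal matrix with diagonal $(\lambda_1, \ldots, \lambda_m)$, and $\lambda_1, \ldots, \lambda_m > 0$ are the eigenvalues of the matrix $r$. Then,
\[
\sqrt{z}\,(zI - r)^{-1} = \tilde r\,\mathrm{diag}\left(\sqrt{z}(z - \lambda_1)^{-1}, \ldots, \sqrt{z}(z - \lambda_m)^{-1}\right)\tilde r^*.
\]
By Cauchy's integral formula, we have
\[
\frac{1}{2\pi i}\oint_\Gamma \sqrt{z}\,(z - \lambda_j)^{-1}\,dz = \sqrt{\lambda_j}, \qquad j = 1, 2, \ldots, m.
\]
Thus,
\[
\begin{aligned}
s(r) &= \tilde r\,\mathrm{diag}\left(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_m}\right)\tilde r^* \\
&= \frac{1}{2\pi i}\oint_\Gamma \sqrt{z}\,\tilde r\,\mathrm{diag}\left((z - \lambda_1)^{-1}, \ldots, (z - \lambda_m)^{-1}\right)\tilde r^*\,dz \\
&= \frac{1}{2\pi i}\oint_\Gamma \sqrt{z}\,(zI - r)^{-1}\,dz.
\end{aligned}
\]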
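The contour representation of the square root can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: $\Gamma$ is taken to be a circle traversed counterclockwise, the resolvent is written as $(zI - r)^{-1}$, the integral is approximated by the trapezoidal rule, and the names `sqrt_by_contour` and `sqrt_by_eig` are hypothetical.

```python
import numpy as np

def sqrt_by_contour(r, center, radius, n_nodes=400):
    """Approximate (1 / (2 pi i)) * contour integral of sqrt(z) (zI - r)^{-1} dz
    over the counterclockwise circle |z - center| = radius."""
    r = np.asarray(r, dtype=float)
    m = r.shape[0]
    total = np.zeros((m, m), dtype=complex)
    for k in range(n_nodes):
        theta = 2.0 * np.pi * k / n_nodes
        z = center + radius * np.exp(1j * theta)
        dz_dtheta = 1j * radius * np.exp(1j * theta)
        # np.sqrt uses the principal branch, analytic in the right half-plane.
        total += np.sqrt(z) * np.linalg.inv(z * np.eye(m) - r) * dz_dtheta
    integral = total * (2.0 * np.pi / n_nodes)
    return (integral / (2.0j * np.pi)).real

def sqrt_by_eig(r):
    """Square root through the orthogonal diagonalisation used in the proof."""
    lam, vecs = np.linalg.eigh(r)
    return vecs @ np.diag(np.sqrt(lam)) @ vecs.T

r = np.array([[2.0, 1.0], [1.0, 3.0]])   # eigenvalues (5 +/- sqrt(5)) / 2
# The circle with center 2.5 and radius 2 lies in Re z > 0 and encloses both eigenvalues.
s_contour = sqrt_by_contour(r, center=2.5, radius=2.0)
s_eig = sqrt_by_eig(r)
```

Both routines should agree up to quadrature and rounding error, and squaring either result should recover `r`.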
Now, since $\sigma^2(X_t)$ is $\mathcal{G}_t$-measurable for any $t > 0$, $\sigma(X_t) = s\left(\sigma^2(X_t)\right)$ is $\mathcal{G}_t$-measurable for any $t > 0$, too. In fact, we have the following

Lemma 11.6 Let $Z$, $\hat Y$ be defined by $Z_t = \sigma(X_t)$, $t > 0$, and
\[
d\hat Y_t = \sigma^{-1}(X_t)\,dY_t = \tilde h(X_t)\,dt + dW_t,
\]
where $\tilde h = \sigma^{-1}h$. Then
\[
\mathcal{G}_t = \mathcal{F}_t^{\hat Y} \vee \mathcal{F}_t^Z, \qquad t > 0.
\]

Proof As we indicated in Remark 11.3, we have $\mathcal{F}_t^Z \subset \mathcal{G}_t$. Since $\sigma^{-1}(X_t)$ is also $\mathcal{F}_t^Z$-measurable, $\hat Y_t$ is $\mathcal{G}_t$-measurable, i.e. $\mathcal{F}_t^{\hat Y} \subset \mathcal{G}_t$. Therefore
\[
\mathcal{F}_t^{\hat Y} \vee \mathcal{F}_t^Z \subset \mathcal{G}_t. \tag{11.15}
\]
On the other hand, as $dY_t = Z_t\,d\hat Y_t$, we have
\[
\mathcal{G}_t \subset \mathcal{F}_t^{\hat Y} \vee \mathcal{F}_t^Z. \tag{11.16}
\]
The conclusion of the lemma follows from equations (11.15) and (11.16).

From Lemma 11.6, we see that the signal–observation pair can be written as
\[
\begin{cases}
dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \tilde c(X_t)\,dB_t, \\
d\hat Y_t = \tilde h(X_t)\,dt + dW_t, \\
Z_t = \sigma(X_t).
\end{cases} \tag{11.17}
\]
We see now that the framework is truly non-classical as part of the observation process is noiseless. It follows that, given the observation, $X_t$ takes values in the set $M_{Z_t}$ where, for any $z \in S_m^+$, $M_z$ is defined as
\[
M_z = \{x \in \mathbb{R}^d : \sigma(x) = z\}.
\]
More precisely, if $\pi_t$ is the optimal filter, that is $\pi_t f = E(f(X_t)\,|\,\mathcal{G}_t)$, $t > 0$, then $\pi_t$ has support in $M_{Z_t}$. Hence, $\pi_t$ will no longer have total support and will be singular with respect to the Lebesgue measure on $\mathbb{R}^d$. To study the process $\pi_t$, we need the following smoothness condition on $\sigma$.

Condition (S): The matrix-valued function $\sigma$ on $\mathbb{R}^d$, together with its partial derivatives up to order 2, are bounded and Lipschitz continuous.

As already observed, only the diagonal entries and those above the diagonal of the process $Z_t$ (in other words, $a(Z_t)$) are required to generate $\mathcal{F}_t^Z$. Hence, we only need to take into account the properties of the mapping
\[
a^\sigma : \mathbb{R}^d \longrightarrow \mathbb{R}^{\tilde m},
\]
defined as $a^\sigma(x) = a(\sigma(x))$ for all $x \in \mathbb{R}^d$. Note that, for all $x \in \mathbb{R}^d$, $(\nabla^* a^\sigma)(x)$ is a linear mapping from $\mathbb{R}^d$ to $\mathbb{R}^{\tilde m}$, i.e. an $\tilde m \times d$ matrix with entries
\[
q_{ij}(x) = \left(\nabla^* a^\sigma(x)\right)_{ij} = \partial_j a_i^\sigma(x), \qquad i = 1, \ldots, \tilde m;\ j = 1, \ldots, d.
\]

Definition 11.7 The vector $x \in \mathbb{R}^d$ is a regular point for $a^\sigma$ if and only if the matrix $q(x)$ has full rank $d \wedge \tilde m$.

We will study the optimal filter $\pi_t$ in the next two sections according to the type of level set $M_z$. For convenience, we will make the following assumption throughout the rest of this chapter.

Condition (R): Every point in $\mathbb{R}^d$ is regular for $a^\sigma$.
11.3 Optimal filter with discrete support

In this section, we consider the case when $d \le \tilde m$. We will show that $M_z$ consists of countably many points and the optimal filter $\pi_t$ is a discrete type probability measure on $M_z$. Note that $x \in \mathbb{R}^d$ is a regular point if and only if
\[
J(x) = \sqrt{\det\left(q^* q(x)\right)} > 0.
\]
Applying Itô's formula to $a^\sigma(X_t)$ with $X_t$ being given by equation (11.13), we get
\[
da(Z_t) = La^\sigma(X_t)\,dt + q(X_t)\big(c(X_t)\,dW_t + \tilde c(X_t)\,dB_t\big), \tag{11.18}
\]
where $L$ is the second-order differential operator given by
\[
Lf = \frac{1}{2}\sum_{i,j=1}^d \tilde a_{ij}\,\partial^2_{ij} f + \sum_{i=1}^d b_i\,\partial_i f,
\]
with $\tilde a = cc^* + \tilde c\tilde c^*$. Hence,
\[
c(X_t)\,dW_t + \tilde c(X_t)\,dB_t = \left(q^* q\right)^{-1} q^*(X_t)\big(da(Z_t) - La^\sigma(X_t)\,dt\big).
\]
Inserting back into equation (11.13), we have
\[
dX_t = b(X_t)\,dt + \left(q^* q\right)^{-1} q^*(X_t)\big(da(Z_t) - La^\sigma(X_t)\,dt\big). \tag{11.19}
\]
Now we proceed to study the optimal filter $\pi_t$. First, we consider the case when $X_0 = x_0$ is constant. By Conditions (S) and (BC), it is easy to show that the coefficients for the SDE (11.19) are Lipschitz continuous. Therefore, equation (11.19) has a unique strong solution $X_t$ that is a functional of $\{a(Z_s) : s \le t\}$ and $x_0$. Thus, $X_t$ is $\mathcal{G}_t$-measurable and hence,
\[
\pi_t f = E_x\left(f(X_t)\,|\,\mathcal{G}_t\right) = f(X_t), \qquad t > 0.
\]
In other words, the noiseless component of the observation uniquely identifies the conditional distribution of the signal, given the observation:
\[
\pi_t = \delta_{X_t}, \qquad t > 0.
\]

Next, we consider the case when $X_0$ is not constant. We assume that the distribution of $X_0$ has a continuous density $\tilde\pi_0$ with respect to the Lebesgue measure in $\mathbb{R}^d$. Now we need to take into account the additional information that may arise from observing the quadratic variation of the process $a(Z_t)$ and the covariation process between $a(Z_t)$ and $Y_t$. This will not influence the trajectory of the process. However, it may reduce the number of possible initial values. By equation (11.18), we see that the quadratic covariation process of $a(Z_t)$ is
\[
\int_0^t q(X_s)\left(cc^*(X_s) + \tilde c\tilde c^*(X_s)\right) q^*(X_s)\,ds.
\]
It follows from equations (11.18) and (11.13) that the quadratic covariation process between $a(Z_t)$ and $Y_t$ is
\[
\int_0^t qc(X_s)\,Z_s^*\,ds.
\]
Therefore,
\[
\tilde Z_0 \equiv \left(q\left(cc^* + \tilde c\tilde c^*\right)q^*(X_0),\ qc(X_0)Z_0^*\right)
\]
is $\mathcal{G}_{0+}$-measurable. We now divide the discussion into two cases.

Case 1 The matrices $q\left(cc^* + \tilde c\tilde c^*\right)q^*$ and $qc$ are functions of $a^\sigma$, in other words there exist two Borel measurable functions $H_1$ and $H_2$ from $a(S_m^+)$ to $\mathbb{R}^{\tilde m\times\tilde m}$ and $\mathbb{R}^{\tilde m\times m}$, respectively, such that
\[
q\left(cc^* + \tilde c\tilde c^*\right)q^* = H_1(a^\sigma) \quad\text{and}\quad qc = H_2(a^\sigma). \tag{11.20}
\]
In this case, $\tilde Z_0 = \left(H_1(a(Z_0)),\ H_2(a(Z_0))Z_0^*\right)$ brings no new knowledge, hence we can ignore it.

Before we can proceed further, we need an area formula and the definition of the Hausdorff measure. We will state them here for the convenience of the reader. We refer the reader who is interested in more details to the book of Evans and Gariepy [62] for the proofs and other related definitions.

Definition 11.8 (i) Let $A \subset \mathbb{R}^d$, $0 \le s < \infty$ and $\delta > 0$. Define
\[
\mathcal{H}^s_\delta(A) = \inf\left\{\sum_{j=1}^\infty \alpha(s)\,r(C_j)^s \ \Big|\ A \subset \bigcup_{j=1}^\infty C_j,\ r(C_j) \le \delta\right\},
\]
where $r(C_j)$ is the radius of the ball $C_j$ and
\[
\alpha(s) = \pi^{s/2}\Big/\int_0^\infty x^{s/2}e^{-x}\,dx.
\]
(ii) Let
\[
\mathcal{H}^s(A) = \sup_{\delta > 0}\mathcal{H}^s_\delta(A).
\]
We call $\mathcal{H}^s$ the $s$-dimensional Hausdorff measure on $\mathbb{R}^d$.

It can be proved that $\mathcal{H}^0$ is the counting measure. If $s = k$ is a positive integer, then $\mathcal{H}^k$ agrees with ordinary "$k$-dimensional surface area" on nice sets; this is the reason we include the normalizing constant $\alpha(s)$ in the definition. Now, we are ready to state the area formula.

Lemma 11.9 (Area formula) For every $g \in L^1(\mathbb{R}^d, J(x)\,dx)$, we have
\[
\int_{\mathbb{R}^d} g(x)J(x)\,dx = \int_{\mathbb{R}^{\tilde m}}\ \sum_{x \in M_{a^{-1}(u)}} g(x)\,\mathcal{H}^d(du).
\]
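The role of the normalizing constant in Definition 11.8 can be illustrated numerically. Since $\int_0^\infty x^{s/2}e^{-x}\,dx = \Gamma(s/2 + 1)$, the constant $\alpha(s)$ is the volume of the unit ball in dimension $s$ when $s$ is an integer. The short sketch below (the function name `alpha` is chosen here for illustration) evaluates it with the standard library.

```python
import math

def alpha(s):
    """Normalizing constant pi^(s/2) / Gamma(s/2 + 1) of the Hausdorff measure."""
    return math.pi ** (s / 2.0) / math.gamma(s / 2.0 + 1.0)

# Volumes of the unit ball in dimensions 0, 1, 2, 3:
# a point (1), the interval [-1, 1] (2), the unit disc (pi), the unit ball (4 pi / 3).
values = [alpha(s) for s in range(4)]
```

With this normalization a ball of radius $r$ in $\mathbb{R}^k$ contributes $\alpha(k)r^k$, its usual $k$-dimensional volume, to the covering sum.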
Now we can state the result for the case of $d \le \tilde m$ when equation (11.20) is satisfied.

Theorem 11.10 Suppose that $Z_0 = z$ and $M_z = \{x_i,\ i \in I\}$. Suppose that $X_0$ has a continuous density $\tilde\pi_0$ with respect to the Lebesgue measure in $\mathbb{R}^d$, and the conditions (R), (S) and (BC) are satisfied. If
\[
\sum_{j \in I}\frac{\tilde\pi_0(x_j)}{J(x_j)} < \infty,
\]
then
\[
\pi_t = \sum_{i \in I} p_i\,\delta_{\hat X_t^i}, \qquad t > 0,
\]
where $\hat X_t^i$ is the solution to equation (11.19) with initial value $x_i$, and
\[
p_i = \frac{\tilde\pi_0(x_i)/J(x_i)}{\sum_{j \in I}\tilde\pi_0(x_j)/J(x_j)}.
\]
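Proof Let $\mu_z$ be the conditional probability distribution of $X_0$ given $Z_0 = z$, i.e.
\[
\mu_z(dx) = P\left(X_0 \in dx\,|\,Z_0 = z\right).
\]
For any $B \in \mathcal{B}(\mathbb{R}^{\tilde m})$ and $\phi \in L^1(\mathbb{R}^d, \tilde\pi_0(x)\,dx)$, let
\[
g(x) = \phi(x)\,1_{a^\sigma(x) \in B}\,\frac{\tilde\pi_0(x)}{J(x)}.
\]
By the area formula, we have
\[
\begin{aligned}
E\left[\phi(X_0)1_{a^\sigma(X_0) \in B}\right]
&= \int_{\mathbb{R}^d}\phi(x)1_{a^\sigma(x) \in B}\,\tilde\pi_0(x)\,dx \\
&= \int_{\mathbb{R}^d} g(x)J(x)\,dx \\
&= \int_{\mathbb{R}^{\tilde m}}\ \sum_{x \in M_{a^{-1}(u)}} g(x)\,\mathcal{H}^d(du) \\
&= \int_B\ \sum_{x \in M_{a^{-1}(u)}}\phi(x)\frac{\tilde\pi_0(x)}{J(x)}\,\mathcal{H}^d(du).
\end{aligned} \tag{11.21}
\]
Taking $\phi = 1$, we get
\[
P\left(a^\sigma(X_0) \in B\right) = \int_B\ \sum_{x \in M_{a^{-1}(u)}}\frac{\tilde\pi_0(x)}{J(x)}\,\mathcal{H}^d(du).
\]
Thus,
\[
\begin{aligned}
E\left[\phi(X_0)1_{a^\sigma(X_0) \in B}\right]
&= E\left[1_{a(Z_0) \in B}\int\phi(x)\,\mu_{Z_0}(dx)\right] \\
&= \int_B\left(\int\phi(x)\,\mu_{a^{-1}(u)}(dx)\right)\sum_{x' \in M_{a^{-1}(u)}}\frac{\tilde\pi_0(x')}{J(x')}\,\mathcal{H}^d(du).
\end{aligned}
\]
Comparing with equation (11.21), we get
\[
\int_B\left(\int\phi(x)\,\mu_{a^{-1}(u)}(dx)\right)\sum_{x' \in M_{a^{-1}(u)}}\frac{\tilde\pi_0(x')}{J(x')}\,\mathcal{H}^d(du)
= \int_B\ \sum_{x \in M_{a^{-1}(u)}}\frac{\tilde\pi_0(x)}{J(x)}\phi(x)\,\mathcal{H}^d(du).
\]
Therefore,
\[
\int\phi(x)\,\mu_z(dx)\sum_{x' \in M_z}\frac{\tilde\pi_0(x')}{J(x')} = \sum_{x \in M_z}\frac{\tilde\pi_0(x)}{J(x)}\phi(x).
\]
Hence, $\mu_z$ has support in the set $M_z$ and
\[
\mu_z = \sum_{i \in I} p_i\,\delta_{x_i}.
\]
Following the case with constant initial value, we then have
\[
\pi_t f = E_{\mu_z}\left(f(X_t)\,|\,\mathcal{G}_t\right) = \sum_{i \in I} p_i f(\hat X_t^i), \qquad \forall\, t > 0.
\]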
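The weights $p_i$ of Theorem 11.10 are simply the values $\tilde\pi_0(x_i)/J(x_i)$ normalized to sum to one. The following sketch computes them for a finite level set; the function name `discrete_filter_weights` and the numerical inputs are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def discrete_filter_weights(prior_density, jacobian):
    """Return p_i proportional to prior_density[i] / jacobian[i].

    prior_density[i] stands for the prior density at x_i and jacobian[i]
    for J(x_i); both are assumed to be supplied by the user.
    """
    prior_density = np.asarray(prior_density, dtype=float)
    jacobian = np.asarray(jacobian, dtype=float)
    if np.any(jacobian <= 0.0):
        raise ValueError("J(x_i) must be positive at regular points")
    ratios = prior_density / jacobian
    total = ratios.sum()
    if not np.isfinite(total) or total <= 0.0:
        raise ValueError("the ratios must have a finite positive sum")
    return ratios / total

# Made-up values at three points of a level set M_z.
weights = discrete_filter_weights([0.2, 0.1, 0.3], [2.0, 1.0, 1.5])
```

A point with a large Jacobian receives less weight than its prior density alone would suggest, because the map $a^\sigma$ spreads the prior mass near that point over a larger set of observed values.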
Remark 11.11 Since $Z_0$ is not observed at time 0, $\pi_0$ is deterministic and is given by the law of $X_0$ for which we used the notation $\tilde\pi_0$. Therefore, the optimal filter $\pi_t$ is not continuous at $t = 0$.

Case 2 If $q\left(cc^* + \tilde c\tilde c^*\right)q^*$ and $qc$ are not functions of $a^\sigma$, then $\mu$ has support in the set
\[
\tilde M_{z, z_1, z_2} = \left\{x \in \mathbb{R}^d : \sigma(x) = z,\ q\left(cc^* + \tilde c\tilde c^*\right)q^*(x) = z_1,\ qc(x) = z_2\right\},
\]
and a similar formula as above is valid under additional smoothness assumptions on $c$ and $\tilde c$. Namely, we replace $\sigma(x)$ in Case 1 by $\tilde\sigma(x) = \left(\sigma(x),\ q\left(cc^* + \tilde c\tilde c^*\right)q^*(x),\ qc(x)\right)$ and continue with the discussion there.
11.4 Optimal filter supported on manifolds
In this section, we consider the case when $d > \tilde m$. We will show that $M_z$ is a surface (manifold) and the optimal filter $\pi_t$ is a probability measure on the surface $M_z$ and is absolutely continuous with respect to the surface measure. Note that $x \in \mathbb{R}^d$ is a regular point if and only if
\[
J(x) = \sqrt{\det\left(qq^*(x)\right)} > 0.
\]
We list some facts about the transformation $a^\sigma$ without giving their proofs. Again, we refer the reader to the book of Evans and Gariepy [62] for more details.

We shall use $T_xM_z$ to denote the space consisting of all the tangent vectors of the manifold (surface) $M_z$ at the point $x \in M_z$. $T_xM_z$ is called the tangent space of $M_z$ at $x$. We will use $N_xM_z$ to denote the orthogonal complement of $T_xM_z$ in $\mathbb{R}^d$. We call it the normal space of $M_z$ at $x$.

Lemma 11.12 (i) For any $u \in \mathbb{R}^{\tilde m}$, $M_{a^{-1}(u)}$ is a $\tilde d$-dimensional manifold where $\tilde d = d - \tilde m$. The Hausdorff measure $\mathcal{H}^{\tilde d}$ on $M_{a^{-1}(u)}$ is the surface measure.
(ii) For any $x \in M_z$, the rows of the matrix $q$ generate the normal space $N_xM_z$ to the manifold $M_z$ at the point $x$.
(iii) Let $\rho(x) = q^*(qq^*)^{-1}(x)$ and $p(x) = \rho(x)q(x)$. Then $p(x)$ is the orthogonal projection matrix from $\mathbb{R}^d$ to the subspace $N_xM_z$.

For simplicity of presentation, we make an assumption that is slightly stronger than equation (11.20).

Condition (IN): There exist two Borel measurable functions $H_1$ and $H_2$ from $a(S_m^+)$ to $\mathbb{R}^{\tilde m\times m}$ and $\mathbb{R}^{\tilde m\times d}$, respectively, such that
\[
qc = H_1(a^\sigma) \quad\text{and}\quad q\tilde c = H_2(a^\sigma). \tag{11.22}
\]
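Part (iii) of Lemma 11.12 is a statement of linear algebra and can be checked on a frozen matrix. In the sketch below a random $\tilde m \times d$ matrix stands in for $q(x)$ at a single regular point (an assumption made purely for illustration); the helper name `normal_projection` is hypothetical.

```python
import numpy as np

def normal_projection(q):
    """Return rho = q^T (q q^T)^{-1} and p = rho q for a full-row-rank q."""
    rho = q.T @ np.linalg.inv(q @ q.T)
    return rho, rho @ q

rng = np.random.default_rng(0)
m_tilde, d = 2, 5
q = rng.standard_normal((m_tilde, d))   # stand-in for q(x); full row rank almost surely
rho, p = normal_projection(q)

w = rng.standard_normal(d)
tangent_part = (np.eye(d) - p) @ w      # component of w in the tangent space
normal_part = p @ w                     # component of w in the span of the rows of q
```

The matrix `p` is a symmetric idempotent $d \times d$ matrix that fixes the rows of `q`, while `q @ rho` is the identity on $\mathbb{R}^{\tilde m}$; the latter identity is what makes $\rho$ a right inverse of $q$ in the later decomposition of the signal.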
Throughout the rest of this section, the assumptions (IN), (R) and (S) will be in force. Now, we proceed to study the "initial" $\pi_{0+}$ of the optimal filter $\pi_t$. Note that the real initial $\pi_0$ coincides with the law of $X_0$, while $\pi_{0+}$ is the initial for the filtering equation satisfied by the optimal filter. Another point of view is that $\pi_{0+}$ is the initial of the optimal filter when both $Z_t$ and $Y_t$, including $Z_0$, are observed. We need a co-area formula whose proof and related definitions can be found in Chapter 3 of [62].

Lemma 11.13 (Co-area formula) For every $g \in L^1(\mathbb{R}^d, J(x)\,dx)$, the restriction of $g$ to the level set $M_{a^{-1}(u)}$ is $\mathcal{H}^{d-\tilde m}$-integrable for almost all $u \in \mathbb{R}^{\tilde m}$, and
\[
\int_{\mathbb{R}^d} g(x)J(x)\,dx = \int_{\mathbb{R}^{\tilde m}}\int_{M_{a^{-1}(u)}} g(x)\,\mathcal{H}^{d-\tilde m}(dx)\,du.
\]

The next theorem gives the "initial" $\pi_{0+}$ of the optimal filter in the case of $d > \tilde m$.

Theorem 11.14 For $u \in \mathbb{R}^{\tilde m}$, let $\lambda_u$ denote the surface measure on the level set $M_{a^{-1}(u)}$. Suppose that $X_0$ has a continuous density $\tilde\pi_0$ that is not identically zero on $M_{a^{-1}(u)}$, and satisfies the following integrability condition:
\[
\int_{M_{a^{-1}(u)}}\frac{\tilde\pi_0(x)}{J(x)}\,\lambda_u(dx) < \infty.
\]
Let $\mu_z$ be the conditional probability distribution of $X_0$ given $Z_0 = z$, i.e. $\mu_z(dx) = P\left(X_0 \in dx\,|\,Z_0 = z\right)$. Then, with $u = a(z)$,
\[
\mu_z(dx) = p(x)\,\lambda_u(dx),
\]
where
\[
p(x) = \frac{\tilde\pi_0(x)/J(x)}{\int_{M_z}\tilde\pi_0(y)/J(y)\,\lambda_u(dy)}.
\]
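Proof For any test function $\phi$ defined on $\mathbb{R}^d$, and any Borel set $D$ in $\mathbb{R}^{\tilde m}$, we define
\[
g(x) = \phi(x)\,1_{a^\sigma(x) \in D}\,\frac{\tilde\pi_0(x)}{J(x)}.
\]
Then, by the co-area formula, we have
\[
\begin{aligned}
E\left[\phi(X_0)1_{a^\sigma(X_0) \in D}\right]
&= \int_{\mathbb{R}^d} g(x)J(x)\,dx \\
&= \int_{\mathbb{R}^{\tilde m}}\int_{M_{a^{-1}(u)}} g(x)\,\mathcal{H}^{d-\tilde m}(dx)\,du \\
&= \int_D\int_{M_{a^{-1}(u)}}\phi(x)\frac{\tilde\pi_0(x)}{J(x)}\,\mathcal{H}^{d-\tilde m}(dx)\,du \\
&= \int_D\int_{M_{a^{-1}(u)}}\phi(x)\frac{\tilde\pi_0(x)}{J(x)}\,\lambda_u(dx)\,du.
\end{aligned}
\]
By taking $\phi(x) \equiv 1$, we get
\[
P\left(a^\sigma(X_0) \in D\right) = \int_D\int_{M_{a^{-1}(u)}}\frac{\tilde\pi_0(x)}{J(x)}\,\lambda_u(dx)\,du.
\]
The result follows from the definition of the conditional expectation.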
As we did in equation (11.5), we now decompose the vector fields in the SDE satisfied by the signal according to their components in the spaces $T_xM_z$ and $N_xM_z$. In contrast to the case in Section 11.1, here the tangent space and the normal space depend on the location of the point on the manifold, so it is more convenient to use the Stratonovich form for the signal process.

Lemma 11.15 The signal $X_t$ satisfies the following SDE in Stratonovich form:
\[
dX_t = \tilde b(X_t)\,dt + c(X_t)\circ dW_t + \tilde c(X_t)\circ dB_t, \tag{11.23}
\]
where for $i = 1, 2, \ldots, d$, the $i$th component of $\tilde b$ is
\[
\tilde b_i = b_i - \frac{1}{2}\sum_{k=1}^d\sum_{j=1}^m c_{kj}\,\partial_k c_{ij} - \frac{1}{2}\sum_{k,j=1}^d \tilde c_{kj}\,\partial_k\tilde c_{ij}.
\]

Proof By equations (11.13) and (3.26), we have
\[
\begin{aligned}
dX_t &= b(X_t)\,dt + c(X_t)\,dW_t + \tilde c(X_t)\,dB_t \\
&= b(X_t)\,dt + c(X_t)\circ dW_t + \tilde c(X_t)\circ dB_t - \frac{1}{2}\,d\langle c(X), W\rangle_t - \frac{1}{2}\,d\langle\tilde c(X), B\rangle_t.
\end{aligned} \tag{11.24}
\]
Applying Itô's formula, we get
\[
dc(X_t) = Lc(X_t)\,dt + \sum_{k=1}^d\partial_k c(X_t)\left(\sum_{j=1}^m c_{kj}(X_t)\,dW_t^j + \sum_{j=1}^d\tilde c_{kj}(X_t)\,dB_t^j\right).
\]
Thus,
\[
\begin{aligned}
\frac{d}{dt}\langle c(X), W\rangle_t
&= \sum_{k=1}^d\sum_{j=1}^m\partial_k c(X_t)\,c_{kj}(X_t)\,\frac{d}{dt}\langle W^j, W\rangle_t \\
&= \sum_{k=1}^d\sum_{j=1}^m c_{kj}\,\partial_k c(X_t)\,e_j \\
&= \sum_{k=1}^d\sum_{j=1}^m c_{kj}\,\partial_k c_{\cdot j}(X_t),
\end{aligned}
\]
where $c_{\cdot j}$ denotes the $j$th column of the matrix $c$. Similarly, we can prove that
\[
\frac{d}{dt}\langle\tilde c(X), B\rangle_t = \sum_{k,j=1}^d\tilde c_{kj}\,\partial_k\tilde c_{\cdot j}(X_t).
\]
Inserting back into equation (11.24), we see that equation (11.23) holds.

Note that $a(Z_t)$ is observable. Recall from equation (11.18) that
\[
da(Z_t) = La^\sigma(X_t)\,dt + q(X_t)\big(c(X_t)\,dW_t + \tilde c(X_t)\,dB_t\big). \tag{11.25}
\]
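The drift correction of Lemma 11.15 can be sketched numerically. The code below is an illustration only: the function names are hypothetical, the derivatives $\partial_k c$ are approximated by central finite differences, and the coefficients in the example (a scalar geometric Brownian motion with no $B$-noise) are chosen for convenience rather than taken from the text.

```python
import numpy as np

def stratonovich_correction(c, x, h=1e-6):
    """Return the vector with components (1/2) sum_{k,j} c_kj d_k c_ij at x.

    c maps a point of R^d to a d x n matrix of diffusion coefficients.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    c0 = np.asarray(c(x), dtype=float)
    correction = np.zeros(d)
    for k in range(d):
        step = np.zeros(d)
        step[k] = h
        # Central difference approximation of the matrix d_k c.
        dk_c = (np.asarray(c(x + step)) - np.asarray(c(x - step))) / (2.0 * h)
        correction += dk_c @ c0[k, :]   # i-th entry: sum_j c_kj d_k c_ij
    return 0.5 * correction

def stratonovich_drift(b, c, c_tilde, x):
    """Drift b_tilde of the Stratonovich form of dX = b dt + c dW + c_tilde dB."""
    return (np.asarray(b(x), dtype=float)
            - stratonovich_correction(c, x)
            - stratonovich_correction(c_tilde, x))

# Example: dX = mu X dt + vol X dW, for which b_tilde(x) = (mu - vol^2 / 2) x.
mu, vol = 0.1, 0.3
b = lambda x: np.array([mu * x[0]])
c = lambda x: np.array([[vol * x[0]]])
c_tilde = lambda x: np.zeros((1, 1))
b_tilde_at_2 = stratonovich_drift(b, c, c_tilde, np.array([2.0]))
```

For this example the exact Stratonovich drift at $x = 2$ is $(0.1 - 0.045)\cdot 2 = 0.11$, which the finite-difference sketch reproduces up to rounding error.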
Define
\[
V_t = \int_0^t qc(X_s)\,dW_s + \int_0^t q\tilde c(X_s)\,dB_s.
\]
Then, $V_t$ is an $\tilde m$-dimensional continuous martingale with quadratic covariation process
\[
\langle V\rangle_t = \int_0^t H(a(Z_s))\,ds,
\]
where $H = H_1H_1^* + H_2H_2^*$.

Lemma 11.16 The observation process $a(Z_t)$ satisfies
\[
da(Z_t) = qc(X_t)\circ dW_t + q\tilde c(X_t)\circ dB_t + q\tilde b(X_t)\,dt. \tag{11.26}
\]
Proof Similar to the proof of Lemma 11.15, we have
\[
\begin{aligned}
dV_t &= qc(X_t)\circ dW_t + q\tilde c(X_t)\circ dB_t
 - \frac{1}{2}\sum_{k,j=1}^d\sum_{\ell=1}^m\partial_k\left(q_{\cdot j}c_{j\ell}\right)c_{k\ell}(X_t)\,dt
 - \frac{1}{2}\sum_{k,j,\ell=1}^d\partial_k\left(q_{\cdot j}\tilde c_{j\ell}\right)\tilde c_{k\ell}(X_t)\,dt \\
&= qc(X_t)\circ dW_t + q\tilde c(X_t)\circ dB_t
 - \frac{1}{2}\sum_{k,j=1}^d (cc^*)_{jk}\,\partial^2_{kj}a^\sigma\,dt
 - \frac{1}{2}\sum_{k=1}^d q\,\partial_k c\,c^*_{\cdot k}\,dt \\
&\qquad - \frac{1}{2}\sum_{k,j=1}^d (\tilde c\tilde c^*)_{jk}\,\partial^2_{kj}a^\sigma\,dt
 - \frac{1}{2}\sum_{k=1}^d q\,\partial_k\tilde c\,\tilde c^*_{\cdot k}\,dt \\
&= qc(X_t)\circ dW_t + q\tilde c(X_t)\circ dB_t - La^\sigma(X_t)\,dt + q\tilde b(X_t)\,dt.
\end{aligned}
\]
Combining with equation (11.25), we see that equation (11.26) holds.

Finally, we arrive at the main decomposition result.

Theorem 11.17 The filtering model can be rewritten as
\[
dX_t = (I - p)\left(\tilde b(X_t)\,dt + c(X_t)\circ dW_t + \tilde c(X_t)\circ dB_t\right) + \rho(X_t)\circ da(Z_t), \tag{11.27}
\]
with observations
\[
da(Z_t) = La^\sigma(X_t)\,dt + dV_t, \tag{11.28}
\]
and
\[
dY_t = h(X_t)\,dt + Z_t\,dW_t. \tag{11.29}
\]

Proof By Lemma 11.15, we have
\[
\begin{aligned}
dX_t &= \tilde b(X_t)\,dt + c(X_t)\circ dW_t + \tilde c(X_t)\circ dB_t \\
&= (I - p)\tilde b(X_t)\,dt + (I - p)c(X_t)\circ dW_t + (I - p)\tilde c(X_t)\circ dB_t \\
&\qquad + p\tilde b(X_t)\,dt + pc(X_t)\circ dW_t + p\tilde c(X_t)\circ dB_t.
\end{aligned} \tag{11.30}
\]
By equation (11.26) and Corollary 3.27, we get
\[
\rho(X_t)\circ da(Z_t) = p\tilde b(X_t)\,dt + pc(X_t)\circ dW_t + p\tilde c(X_t)\circ dB_t.
\]
Inserting back into equation (11.30), we see that equation (11.27) holds. Equation (11.28) is a rewrite of equation (11.25). Equation (11.29) is just the original observation model in equation (11.13).

Let $\{\xi_{t,s} : 0 \le s \le t\}$ be the stochastic flow associated with the SDE:
\[
d\xi_t = \rho(\xi_t)\circ da(Z_t). \tag{11.31}
\]

Lemma 11.18 The flow $\xi_{t,s}$ maps $M_{Z_s}$ to $M_{Z_t}$. Further, the process $\xi_t$ is $\mathcal{F}_t^Z$-adapted.
Proof Applying the Stratonovich form of Itô's formula, we get
\[
da^\sigma(\xi_t) = \nabla^* a^\sigma(\xi_t)\rho(\xi_t)\circ da(Z_t) = da(Z_t).
\]
Thus, for $\sigma(\xi_s) = Z_s$, we get $\sigma(\xi_t) = Z_t$. The second conclusion follows from the uniqueness of the solution to the SDE (11.31).

Denote the column vectors of $(I - p)c$ and $(I - p)\tilde c$ by $g_1, \ldots, g_m$ and $\tilde g_1, \ldots, \tilde g_d$, respectively. Let $b_0 = (I - p)\tilde b$. Then for each $x \in M_z$, the vectors $g_1(x), \ldots, g_m(x), \tilde g_1(x), \ldots, \tilde g_d(x), b_0(x)$ are all in $T_xM_z$. The signal process $X_t$ satisfies
\[
dX_t = \rho(X_t)\circ da(Z_t) + \sum_{i=1}^m g_i(X_t)\circ dW_t^i + \sum_{j=1}^d\tilde g_j(X_t)\circ dB_t^j + b_0(X_t)\,dt. \tag{11.32}
\]
By Theorem 4.18, the Jacobian matrix $\xi'_{t,s}$ of the stochastic flow $\xi_{t,s}$ is invertible. The operator $(\xi_{t,s}^{-1})_*$ defined below pulls a vector in the tangent space of $M_{Z_t}$ at $\xi_{t,s}(x)$ back to a vector in the tangent space of $M_{Z_s}$ at $x$.

Definition 11.19 Let $g$ be a vector field in $\mathbb{R}^d$. The random vector field $(\xi_{t,s}^{-1})_*g$ is defined as
\[
(\xi_{t,s}^{-1})_*g(x) \equiv (\xi'_{t,s})^{-1}g(\xi_{t,s}(x))
\]
for any regular point $x \in \mathbb{R}^d$.

Similar to what we did in Section 11.1, we consider the following SDE on $\mathbb{R}^d$:
\[
d\kappa_t = (\xi_{t,0}^{-1})_*b_0(\kappa_t)\,dt + \sum_{i=1}^m(\xi_{t,0}^{-1})_*g_i(\kappa_t)\circ dW_t^i + \sum_{j=1}^d(\xi_{t,0}^{-1})_*\tilde g_j(\kappa_t)\circ dB_t^j. \tag{11.33}
\]

Lemma 11.20 The SDE (11.33) has a unique strong solution. Further, if $Z_0 = z$, then $\kappa_t \in M_z$ for all $t \ge 0$ a.s.

Proof Note that $(\xi'_{t,0})^{-1}$ satisfies an equation of the form of equation (4.19). By Gronwall's inequality, it is easy to show that
\[
E\sup_{0 \le t \le T}\left|(\xi'_{t,0})^{-1}\right|^p < \infty, \qquad \forall\, p > 1.
\]
Therefore, we can prove that there is a constant $K$ such that
\[
E\left|(\xi_{t,0}^{-1})_*b_0(\kappa_1) - (\xi_{t,0}^{-1})_*b_0(\kappa_2)\right|^2 \le K|\kappa_1 - \kappa_2|^2, \qquad \forall\,\kappa_1, \kappa_2 \in \mathbb{R}^d.
\]
The same inequalities hold with $b_0$ replaced by $g_i$, $1 \le i \le m$, or $\tilde g_j$, $1 \le j \le d$. By the same argument as in the proof of Theorem 4.8, we see that equation (11.33) has a unique strong solution. Applying Itô's formula to equation (11.33), we get
\[
da^\sigma(\kappa_t) = q(\kappa_t)(\xi_{t,0}^{-1})_*b_0(\kappa_t)\,dt + \sum_{i=1}^m q(\kappa_t)(\xi_{t,0}^{-1})_*g_i(\kappa_t)\circ dW_t^i + \sum_{j=1}^d q(\kappa_t)(\xi_{t,0}^{-1})_*\tilde g_j(\kappa_t)\circ dB_t^j.
\]
Since $b_0(\xi_{t,0}(\kappa_t)) \in T_{\xi_{t,0}(\kappa_t)}M_{Z_t}$, we have $(\xi_{t,0}^{-1})_*b_0(\kappa_t) \in T_{\kappa_t}M_{\sigma(\kappa_t)}$. As the row vectors of $q(\kappa_t)$ are in $N_{\kappa_t}M_{\sigma(\kappa_t)}$, we see that
\[
q(\kappa_t)(\xi_{t,0}^{-1})_*b_0(\kappa_t) = 0.
\]
The same equalities hold with $b_0$ replaced by $g_i$, $1 \le i \le m$, or $\tilde g_j$, $1 \le j \le d$. Thus, $da^\sigma(\kappa_t) = 0$, and hence, $a^\sigma(\kappa_t) = a^\sigma(\kappa_0)$, $\forall\, t \ge 0$. This proves that $\sigma(\kappa_t) = z$, and hence $\kappa_t \in M_z$, $\forall\, t \ge 0$, a.s.

The next theorem gives the decomposition of the signal process.

Theorem 11.21 For almost all $\omega \in \Omega$, we have
\[
X_t(\omega) = \xi_{t,0}(\kappa_t(\omega), \omega), \qquad \forall\, t \ge 0. \tag{11.34}
\]
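Proof Denote the right-hand side of equation (11.34) by $\tilde X_t(\omega)$. Applying Itô's formula to $\xi_{t,0}$ and $\kappa_t$ in equations (11.31) and (11.33), we get
\[
\begin{aligned}
d\tilde X_t &= \sum_{j=1}^d\partial_j\xi_{t,0}(\kappa_t)\circ d\kappa_t^j + \rho(\xi_{t,0}(\kappa_t))\circ da(Z_t) \\
&= \xi'_{t,0}(\kappa_t)\left(\xi'_{t,0}(\kappa_t)\right)^{-1}b_0(\xi_{t,0}(\kappa_t))\,dt
 + \sum_{i=1}^m\xi'_{t,0}(\kappa_t)\left(\xi'_{t,0}(\kappa_t)\right)^{-1}g_i(\xi_{t,0}(\kappa_t))\circ dW_t^i \\
&\qquad + \sum_{j=1}^d\xi'_{t,0}(\kappa_t)\left(\xi'_{t,0}(\kappa_t)\right)^{-1}\tilde g_j(\xi_{t,0}(\kappa_t))\circ dB_t^j + \rho(\tilde X_t)\circ da(Z_t) \\
&= \rho(\tilde X_t)\circ da(Z_t) + \sum_{i=1}^m g_i(\tilde X_t)\circ dW_t^i + \sum_{j=1}^d\tilde g_j(\tilde X_t)\circ dB_t^j + b_0(\tilde X_t)\,dt.
\end{aligned}
\]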
By the uniqueness of the solution to equation (11.32), we see that the representation in equation (11.34) holds.

The optimal filter then satisfies
\[
\pi_t f = E\left(f(\xi_{t,0}(\kappa_t))\,\big|\,\mathcal{F}_t^{\hat Y}\vee\mathcal{F}_t^Z\right), \qquad \forall\, t > 0.
\]
Note that $\xi_{t,0}$ is $\mathcal{F}_t^Z$-measurable. Thus, we may regard $\xi_{t,0}$ as known and the singular filtering problem can be transformed to a classical one as follows: For $f \in C_b(M_z)$, let
\[
U_t f \equiv E\left(f(\kappa_t)\,\big|\,\mathcal{F}_t^{\hat Y}\vee\mathcal{F}_t^Z\right).
\]
Then, $U_t$ is the optimal filter with the signal process $\kappa_t$ given by equation (11.33) and the observation $(\hat Y_t, a(Z_t))$ given by
\[
d\hat Y_t = Z_t^{-1}h(\xi_{t,0}(\kappa_t))\,dt + dW_t, \tag{11.35}
\]
and
\[
da(Z_t) = La^\sigma(\xi_{t,0}(\kappa_t))\,dt + H_1(a(Z_t))\,dW_t + H_2(a(Z_t))\,dB_t. \tag{11.36}
\]
Note that the filtering problem with signal equation (11.33) and observation equations (11.35) and (11.36) is classical. We leave the details of how to derive Zakai's equation and the filtering equation for $U_t$ to the reader. Finally, we note that
\[
\pi_t f = U_t\left(f\circ\xi_{t,0}\right), \qquad t > 0. \tag{11.37}
\]
As we studied in Sections 11.1 and 11.3, $\pi_0$ is not given by equation (11.37). In fact, $\pi_0$ coincides with $\tilde\pi_0$, which is the initial distribution of $X$.
11.5 Filtering model with Ornstein–Uhlenbeck noise

In this section, we consider the filtering problem with the OU process as the observation noise. As we indicated in Section 1.1.4, the OU process is an approximation of the white noise that exists in the sense of a generalized function only. Recall that the observation model is
\[
y_t = h(X_t) + O_t,
\]
while $O_t$ is the OU process given by
\[
dO_t = -O_t\,dt + dW_t, \tag{11.38}
\]
$h : \mathbb{R}^d \to \mathbb{R}^m$ is continuous and $W_t$ is an $m$-dimensional Brownian motion. Suppose that the signal process $X_t$ is given by
\[
dX_t = b(X_t)\,dt + c(X_t)\,dB_t,
\]
where $b : \mathbb{R}^d \to \mathbb{R}^d$, $c : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ are continuous mappings, and $B_t$ is a $d$-dimensional Brownian motion independent of $W$.

Now we transform this filtering problem with OU-process noise to a singular filtering problem studied in the previous sections. Let
\[
Y_t = \int_0^t e^{-s}\,d\left(e^s y_s\right).
\]
Then $\mathcal{F}_t^Y = \mathcal{F}_t^y$ and, by Itô's formula,
\[
dY_t = \left(Lh(X_t) + h(X_t)\right)dt + dW_t + \nabla^* hc(X_t)\,dB_t.
\]
Define
\[
dV_t = \left(I + \nabla^* hc\left(\nabla^* hc\right)^*\right)^{-1/2}(X_t)\left(dW_t + \nabla^* hc(X_t)\,dB_t\right).
\]
Then $V_t$ is an $m$-dimensional Brownian motion and
\[
dY_t = \left(Lh(X_t) + h(X_t)\right)dt + \left(I + \nabla^* hc\left(\nabla^* hc\right)^*\right)^{1/2}(X_t)\,dV_t. \tag{11.39}
\]
Let
\[
d\tilde V_t = \left(\left(\nabla^* hc\right)^*\nabla^* hc + I\right)^{-1/2}\left(\left(\nabla^* hc\right)^* dW_t - dB_t\right).
\]
Then $\tilde V$ is a $d$-dimensional Brownian motion independent of $V_t$. It is easy to solve for $dB_t$ to get that
\[
dB_t = \left(\left(\nabla^* hc\right)^*\nabla^* hc + I\right)^{-1}\left(\nabla^* hc\right)^*\left(I + \nabla^* hc\left(\nabla^* hc\right)^*\right)^{1/2}(X_t)\,dV_t - \left(\left(\nabla^* hc\right)^*\nabla^* hc + I\right)^{-1/2}d\tilde V_t.
\]
The signal process is then written as
\[
\begin{aligned}
dX_t &= b(X_t)\,dt - c(X_t)\left(\left(\nabla^* hc\right)^*\nabla^* hc + I\right)^{-1/2}d\tilde V_t \\
&\qquad + c(X_t)\left(\left(\nabla^* hc\right)^*\nabla^* hc + I\right)^{-1}\left(\nabla^* hc\right)^*\left(I + \nabla^* hc\left(\nabla^* hc\right)^*\right)^{1/2}(X_t)\,dV_t.
\end{aligned} \tag{11.40}
\]
It is clear that the filtering model given by equations (11.39) and (11.40) is a special case of the singular filtering model in equation (11.13).
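The change of driving noise from $(W, B)$ to $(V, \tilde V)$ rests on two matrix identities that can be checked on a frozen coefficient. In the sketch below a random $m \times d$ matrix `G` stands in for $\nabla^* hc(X_t)$ at a fixed point (an assumption made only for illustration), and the helper names `sqrt_spd` and `inv_sqrt_spd` are hypothetical.

```python
import numpy as np

def sqrt_spd(sym):
    """Symmetric square root of a symmetric positive-definite matrix."""
    lam, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(np.sqrt(lam)) @ vecs.T

def inv_sqrt_spd(sym):
    """Inverse of the symmetric square root."""
    lam, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(1.0 / np.sqrt(lam)) @ vecs.T

rng = np.random.default_rng(1)
m, d = 2, 3
G = rng.standard_normal((m, d))          # stand-in for the matrix (grad* h) c at a point
I_m, I_d = np.eye(m), np.eye(d)

# Rows express dV and dV_tilde as linear maps of the stacked increment (dW, dB).
A = inv_sqrt_spd(I_m + G @ G.T) @ np.hstack([I_m, G])
A_tilde = inv_sqrt_spd(G.T @ G + I_d) @ np.hstack([G.T, -I_d])

# Joint covariance (per unit time) of (dV, dV_tilde); the identity matrix is expected.
T = np.vstack([A, A_tilde])
joint_cov = T @ T.T

# Recover dB from (dV, dV_tilde) using the formula for dB_t in the text.
dB_map = (np.linalg.inv(G.T @ G + I_d) @ G.T @ sqrt_spd(I_m + G @ G.T) @ A
          - inv_sqrt_spd(G.T @ G + I_d) @ A_tilde)
```

An identity joint covariance corresponds to $V$ and $\tilde V$ being independent standard Brownian motions when the coefficient is frozen, and `dB_map` reducing to the block matrix $[0, I_d]$ confirms the expression obtained by solving for $dB_t$.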
11.6 Notes
Stochastic filtering with the Ornstein–Uhlenbeck process as the observation noise has been studied by Kunita [95], Mandal and Mandrekar [121] and Gawarecki and Mandrekar [68] under the condition that the signal is differentiable in time. Bhatt and Karandikar [10] relax the differentiability condition on the signal by smoothing the observation function. Most of the material in this chapter is taken from the paper of Crisan et al. [44]; the main ideas come from the papers of Joannides and LeGland [77], [78]. Section 11.1 is based on an unpublished manuscript written jointly with Xunyu Zhou.
Bibliography
[1] R. A. Adams (1975). Sobolev spaces. Pure and Applied Mathematics, Vol. 65. Academic Press, New York-London. [2] R. Atar (1998). Exponential stability for nonlinear filtering of diffusion processes in non-compact domain. Ann. Probab. 26, 1552–1574. [3] R. Atar and O. Zeitouni (1997). Exponential stability for nonlinear filtering. Ann. Inst. H. Poincaré Probab. Statist. 33, no. 6, 697–725. [4] R. Atar and O. Zeitouni (1997). Lyapunov exponents for finite state nonlinear filtering. SIAM J. Control Optim. 35, 36–55. [5] P. Baxendale, P. Chigansky and R. Liptser (2004). Asymptotic stability of the Wonham filter: ergodic and nonergodic signals, SIAM J. Opt. Contr. 43, no. 2, 643–669. [6] M. Baxter and A. Rennie (1996). Financial calculus: an introduction to derivative pricing, Cambridge University Press. [7] V. E. Beneš and I. Karatzas (1983). Estimation and control for linear, partially observable systems with non-Gaussian initial distribution. Stochastic Process Appl. 14, 233–248. [8] P. Bernard, D. Talay, and L. Tubaro (1994). Rate of convergence for the Kolmogorov equation with variable coefficients. Math. Comp. 63, 555–587. [9] A. G. Bhatt, A. Budhiraja, and R. L. Karandikar (2000). Markov property and ergodicity of the nonlinear filter. SIAM J. Control Optim. 39, 928–949. [10] A. G. Bhatt and R. L. Karandikar (2003). On filtering with Ornstein– Uhlenbeck process as noise. J. Ind. Statist. Assoc. 41, no. 2, 205–220. [11] P. Billingsley (1986). Probability and measure. Wiley, New York. [12] G. Birkhoff (1967). Lattice theory, Am. Math. Soc. Publ. 25, 3rd edn. [13] B. Z. Bobrovsky and M. Zakai (1975). A lower bound on the estimation error for Markov processes. IEEE Trans. Automat. Control 20, no. 6, 785–788. [14] B. Z. Bobrovsky and M. Zakai (1975). A lower bound on the estimation error for certain diffusion processes. IEEE Trans. Inform. Theory IT-22, no. 1, 45–52.
[15] R. S. Bucy and R. E. Kalman (1961). New results in linear filtering and prediction theory. J. Basic Eng., Trans. ASME 83, 95–108. [16] A. Budhiraja (2001). Ergodic properties of the nonlinear filter. Stochastic Process. Appl. 95, 1–24. [17] A. Budhiraja (2002). On invariant measures of discrete time filters in the correlated signal-noise case. Ann. Appl. Probab. 12, no.3, 1096– 1113. [18] A. Budhiraja (2003). Asymptotic stability, ergodicity and other asymptotic properties of the nonlinear filter. Ann. Inst. H. Poincaré Probab. Statist. 39, no. 6, 919–941. [19] A. Budhiraja and G. Kallianpur (1996). Approximations to the solution of the Zakai equation using multiple Wiener and Stratonovitch integral expansions. Stochastics Stochastics Rep. 56, 271–315. [20] A. Budhiraja and H. J. Kushner (1998). Robustness of nonlinear filters over the infinite time interval. SIAM J. Control Optim. 36, 1618–1637. [21] A. Budhiraja and H. J. Kushner (1999). Approximation and limit results for nonlinear filters over an infinite time interval. SIAM J. Control Optim. 37, no. 6, 1946–1979 [22] A. Budhiraja and H. J. Kushner (2001). Monte Carlo algorithms and asymptotic problems in nonlinear filtering. Stochastics in finite/infinite dimension, in: trends math., Birkhäuser, Boston, 59–87. [23] A. Budhiraja and H. J. Kushner (2000). Approximation and limit results for nonlinear filters over an infinite time interval. II. Random sampling algorithms. SIAM J. Control Optim. 38, no. 6, 1874–1908 [24] A. Budhiraja and D. L. Ocone (1997). Exponential stability of discrete time filters without signal ergodicity. Systems Control Lett. 30, 185–193. [25] A. Budhiraja and D. L. Ocone (1999). Exponential stability in discrete time filtering for non-ergodic signals. Stochastic Process. Appl. 82, 245–257. [26] H. Carvalho, P. Del Moral, A. Monin and G. Salut (1997). Optimal nonlinear filtering in GPS/INS integration. IEEE Trans. Aerosp. Electron. Syst., 33, no. 3, 835–850. [27] F. Cérou (1994). 
Long time asymptotics for some dynamical noise free nonlinear filtering problems. Rapport de Recherche 2446, INRIA. [28] T. Chiang, G. Kallianpur and P. Sundar (1991). Propagation of chaos and McKean-Vlasov equation in duals of nuclear spaces. Appl. Math. Optim. 24, 55–83. [29] P. Chigansky (2006). An ergodic theorem for filtering with applications to stability. Systems Control Lett. 55, no. 11, 908–917.
[30] P. Chigansky (2006). Stability of the nonlinear filter for slowly switching Markov chains. Stochastic Process. Appl. 116, no. 8, 1185–1194. [31] P. Chigansky and R. Liptser (2004). Stability of nonlinear filters in nonmixing case. Ann. Appl. Probab. 14, no. 4, 2038–2056. [32] P. Chigansky and R. Liptser (2006). On a role of predictor in the filtering stability. Electron. Comm. Probab. 11, 129–140 [33] P. L. Chow, R. Z. Khasminskii and R. S. Liptser (1997). Tracking of a signal and its derivatives in Gaussian white noise. Stochastic Process. Appl. 69, no. 2, 259–273. [34] J. M. C. Clark, D. L. Ocone, and C. Coumarbatch (1999). Relative entropy and error bounds for filtering of Markov processes. Math. Control Signals Syst. 12, 346–360. [35] D. Crisan (2001). Particle filters—a theoretical perspective. Sequential Monte Carlo methods in practice, 17–41, Stat. Eng. Inf. Sci., Springer, New York. [36] D. Crisan (2002). Numerical methods for solving the stochastic filtering problem, Numerical methods and stochastics (Toronto, ON, 1999), 1–20, Fields Inst. Commun., 34, Amer. Math. Soc., Providence, RI. [37] D. Crisan (2003). Exact rates of convergence for a branching particle approximation to the solution of the Zakai equation. Ann. Probab. 31, no. 2, 693–718. [38] D. Crisan (2004). Superprocesses in a Brownian environment. Stochastic analysis with applications to mathematical finance. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 460, no. 2041, 243–270. [39] D. Crisan, P. Del Moral and T. Lyons (1999). Interacting particle systems approximations of the Kushner–Stratonovitch equation. Adv. Appl. Probab. 31, no. 3, 819–838. [40] D. Crisan and A. Doucet (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 50, no. 3, pp 736–746. [41] D. Crisan, J. Gaines and T. Lyons (1998). Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math. 58, no. 
5, 1568–1590 (electronic). [42] D. Crisan and T. Lyons (1997). Nonlinear filtering and measurevalued processes. Probab. Theory Related Fields, 109, 217–244. [43] D. Crisan and T. Lyons (1999). A particle approximation of the solution of the Kushner–Stratonovitch equation. Probab. Theory Related Fields 115, no. 4, 549–578. [44] D. Crisan, M. Kouritzin and J. Xiong (2007). Nonlinear filtering with signal depending observation noise. Submitted.
[45] D. Crisan and J. Xiong (2007). A central limit type theorem for particle filter. Comm. Stoch. Analysis 1, no. 1, 103–122. [46] G. Da Prato, M. Fuhrman and P. Malliavin (1995). Asymptotic ergodicity for the Zakai filtering equation. C. R. Acad. Sci. Paris Sér. I Math. I 321, 613–616. [47] G. Da Prato and J. Zabczyk (1992). Stochastic equations in infinite dimensions. University Press, New York. [48] E. B. Davies (1989). Heat kernels and spectral theory. Cambridge University Press. [49] D. A. Dawson and J. Vaillancourt (1995). Stochastic McKean– Vlasov equations. NoDEA Nonlinear Differential Equations Appl. 2, 199–229. [50] P. Del Moral (1995). Non-linear filtering using random particles. Theory Probab. Appl. 40, 690–701. [51] P. Del Moral (1996). Non-linear filtering: interacting particle resolution. Markov Process. Related Fields 2, no. 4, 555–581. [52] P. Del Moral and A. Guionnet (1999). On the stability of measurevalued processes with applications to filtering. C. R. Acad. Sci. Paris Sér. I Math. I 329, 429–434. [53] P. Del Moral and A. Guionnet (1999) Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab. 9, no. 2, 275–297. [54] P. Del Moral and A. Guionnet (2001). On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. Inst. H. Poincaré Probab. Statist. 37, no. 2, 155–194. [55] P. Del Moral, J. C. Noyer and G. Salut (1995). Résolution particulaire et traitement non-linéaire du signal: application radar/sonar. In traitement du signal 12, no. 4, 287–301. [56] P. Del Moral and L. Miclo (2000). Branching and interacting particle systems approximations of Feynman–Kac formulae with applications to non-linear filtering. Séminaire de Probabilités, XXXIV, Lecture Notes in Math., 1729, Springer, Berlin, 1–145. [57] P. Del Moral and L. Miclo (2002). On the stability of nonlinear Feynman–Kac semigroups. Ann. Fac. Sci. Toulouse Math. (6) 11, no. 2, 135–175. [58] B. Delyon and O. 
Zeitouni (1991). Lyapunov exponents for filtering problem. Applied Stochastic Analysis, M. H. A. Davis and R. J. Elliot, ed., Gordon & Breach, New York, 511–521. [59] G. B. Di Masi, M. Pratelli and W. J. Runggaldier (1985). An approximation for the nonlinear filtering problem with error bound. Stochastics. 14, 247–271.
[60] G. B. Di Masi and L. Stettner (2005). Ergodicity of hidden Markov models. Math. Control Signals Systems 17, no. 4, 269–296.
[61] T. E. Duncan (1967). Doctoral Dissertation, Department of Electrical Engineering, Stanford University.
[62] L. C. Evans and R. F. Gariepy (1992). Measure theory and fine properties of functions. CRC Press, Boca Raton, FL.
[63] P. Florchinger and F. Le Gland (1992). Particle approximations for first order stochastic partial differential equations. Applied stochastic analysis (New Brunswick, NJ, 1991), 121–133, Lecture Notes in Control and Inform. Sci., 177, Springer, Berlin.
[64] P. Florchinger and F. Le Gland (1991). Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. Stochastics Stochastics Rep. 35, 233–256.
[65] P. Florchinger and F. Le Gland (1990). Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. In: Analysis and optimization of systems (Antibes, 1990), 228–237, Lecture Notes in Control and Inform. Sci., 144, Springer, Berlin.
[66] P. Frost and T. Kailath (1971). An innovation approach to least-squares estimation, Part III. IEEE Trans. Automat. Control, AC-16, 217–226.
[67] M. Fujisaki, G. Kallianpur and H. Kunita (1972). Stochastic differential equations for the non-linear filtering problem. Osaka J. Math. 9, 19–40.
[68] L. Gawarecki and V. Mandrekar (2000). On the Zakai equation of filtering with Gaussian noise. Stochastics in finite and infinite dimensions, volume in honor of Gopinath Kallianpur, eds. T. Hida et al., 145–151.
[69] N. J. Gordon, D. J. Salmond and A. F. M. Smith (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F, 140, 107–113.
[70] N. J. Gordon, D. J. Salmond and C. Ewing (1995). Bayesian state estimation for tracking and guidance using the bootstrap filter. J. Guidance Control Dyn., 18, no. 6, 1434–1443.
[71] C. Graham (1992). Nonlinear Itô–Skorohod equations and martingale problem with discrete jump sets. Stochastic Process. Appl. 40, 69–82.
[72] B. Grigelionis (1973). On stochastic equations of nonlinear filtering of random processes. Litov. Mat. Sb., 12, no. 4, 37–51.
[73] M. Hitsuda and I. Mitoma (1986). Tightness problem and stochastic evolution equation arising from fluctuation phenomena for interacting diffusions. J. Multivariate Anal. 19, 311–328.
[74] E. Hopf (1963). An inequality for positive linear integral operators. J. Math. Mech. 12, no. 5, 683–692.
[75] Y. Hu, G. Kallianpur and J. Xiong (2002). An approximation for Zakai equation. Appl. Math. Optim. 1, 23–44.
[76] N. Ikeda and S. Watanabe (1989). Stochastic differential equations and diffusions. North Holland Publishing Company, Amsterdam.
[77] M. Joannides and F. LeGland (1997). Nonlinear Filtering with Perfect Discrete Time Observations. Proceedings of the 34th IEEE Conference on Decision and Control, New Orleans, December 13–15, 1995, pp. 4012–4017.
[78] M. Joannides and F. LeGland (1997). Nonlinear Filtering with Continuous Time Perfect Observations and Noninformative Quadratic Variation. Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, December 10–12, 1997, pp. 1645–1650.
[79] T. Kailath (1968). An innovation approach to least-squares estimation, Parts I, II. IEEE Trans. Automat. Control, AC-13, 646–660.
[80] T. Kailath and R. Geesey (1971). An innovation approach to least-squares estimation, Part IV. IEEE Trans. Automat. Control, AC-16, 720–727.
[81] G. Kallianpur (1980). Stochastic filtering theory. Springer-Verlag, New York.
[82] G. Kallianpur and C. Striebel (1968). Estimation of stochastic systems: arbitrary system process with additive noise observation errors. Ann. Math. Statist., 39, 785–801.
[83] G. Kallianpur and C. Striebel (1969). Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes. Teor. Veroyatn. Primen., 14, no. 4, 597–622.
[84] G. Kallianpur and J. Xiong (1994). Asymptotic behavior of a system of interacting nuclear-space-valued stochastic differential equations driven by Poisson random measures. Appl. Math. Optim. 30, 175–201.
[85] R. L. Karandikar (1995). On pathwise stochastic integration. Stochastic Process. Appl. 57, 11–18.
[86] G. Kitagawa (1996). Monte-Carlo filter and smoother for non-Gaussian non-linear state space models. J. Comput. Graphical Stat., 5, no. 1, 1–25.
[87] P. Kotelenez (1995). A class of quasilinear stochastic partial differential equation of McKean-Vlasov type with mass conservation. Probab. Theory Related Fields 102, 159–188.
[88] N. Krylov (1996). On the Lp-theory of stochastic partial differential equations in the whole space. SIAM J. Math. Anal. 27, 313–340.
[89] N. Krylov (1999). An analytic approach to SPDEs. In: Stochastic partial differential equations, six perspectives. Mathematical Surveys and Monographs. AMS, Providence, RI.
[90] N. Krylov and B. L. Rozovskii (1981). Stochastic evolution equations. J. Sov. Math. 16, 1233–1277.
[91] N. Krylov and B. L. Rozovskii (1982). On the characteristics of degenerate second order parabolic Itô equations. Trudy seminara imeni Petrovskago 8, 153–168 in Russian; English translation in J. Sov. Math. 32, no. 4 (1986), 336–348.
[92] H. Kunita (1971). Asymptotic behavior of the nonlinear filtering errors of Markov processes. J. Multivariate Anal. 1, 365–393.
[93] H. Kunita (1991). Ergodic properties of nonlinear filtering processes. Spatial Stochastic Processes, K. C. Alexander and J. C. Watkins (ed.).
[94] H. Kunita (1990). Stochastic flows and stochastic differential equations. Cambridge Studies in Advanced Mathematics 24. Cambridge University Press, Cambridge.
[95] H. Kunita (1993). Representation and stability of nonlinear filters associated with Gaussian noises. Stochastic processes: a festschrift in honor of Gopinath Kallianpur, (ed.) Cambanis et al., Springer-Verlag, New York. 201–210.
[96] H. H. Kuo (1996). White Noise Distribution Theory. CRC Press, Boca Raton.
[97] T. Kurtz and J. Xiong (1999). Particle representations for a class of nonlinear SPDEs. Stochastic Process. Appl. 83, 103–126.
[98] T. Kurtz and J. Xiong (2000). Numerical solutions for a class of SPDEs with application to filtering. Stochastics in finite and infinite dimension: in honor of Gopinath Kallianpur. (ed.) T. Hida, R. Karandikar, H. Kunita, B. Rajput, S. Watanabe and J. Xiong. Trends in mathematics. Birkhäuser, Boston.
[99] T. Kurtz and J. Xiong (2004). A stochastic evolution equation arising from the fluctuation of a class of interacting particle systems. Commun. Math. Sci. 2, 325–358.
[100] H. J. Kushner (1964). On the dynamic equations of conditional probability density functions with applications to optimal stochastic control theory. J. Math. Anal. Appl., 8, 332–344.
[101] H. J. Kushner (1967). Dynamic equations for nonlinear filtering. J. Differ. Equations, 3, 179–190.
[102] H. J. Kushner (1977). Probability methods for approximations in stochastic control and for elliptic equations. Academic Press, New York.
[103] H. J. Kushner (1979). A robust discrete state approximation to the optimal nonlinear filter for a diffusion. Stochastics Stochastics Rep. 3, 75–83.
[104] H. J. Kushner (1997). Robustness and convergence of approximations to nonlinear filters for jump-diffusions. Comput. Appl. Math., V. 16, 153–183.
[105] H. Kwakernaak and R. Sivan (1972). Linear optimal control systems. Wiley-Interscience, New York.
[106] A. Le Breton and M. Roubaud (2000). Asymptotic optimality of approximate filters in stochastic systems with colored noises. SIAM J. Control Optim. 39, no. 3, 917–927.
[107] F. Le Gland and L. Mevel (2000). Exponential forgetting and geometric ergodicity in hidden Markov models. Math. Control Signals Syst. 13, 63–93.
[108] F. Le Gland and N. Oudjane (2003). A robustification approach to stability and to uniform particle approximation of nonlinear filters: the example of pseudo-mixing signals. Stochastic Process. Appl. 106, no. 2, 279–316.
[109] F. Le Gland and N. Oudjane (2004). Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. Ann. Appl. Probab. 14, no. 1, 144–187.
[110] R. S. Liptser (1967). On filtering and extrapolation of the components of diffusion type Markov processes. Teor. Veroyatn. Primen., 12, no. 4, 754–756.
[111] R. S. Liptser (1968). On filtering and extrapolation of some Markov processes, I. Kibernetika (Kiev), 3, 63–70.
[112] R. S. Liptser (1968). On filtering and extrapolation of some Markov processes, II. Kibernetika (Kiev), 6, 70–76.
[113] R. S. Liptser and A. N. Shiryaev (1968). Nonlinear filtering of diffusion type Markov processes. Tr. Mat. Inst. Steklova, 104, 135–180.
[114] R. S. Liptser and A. N. Shiryaev (1968). On the case of effective solution of the problems of optimal nonlinear filtering, interpolation, and extrapolation. Teor. Veroyatn. Primen., 13, no. 3, 570–571.
[115] R. S. Liptser and A. N. Shiryaev (1968). Nonlinear interpolation of the components of diffusion type Markov processes (forward equations, effective formulae). Teor. Veroyatn. Primen., 13, no. 4, 602–620.
[116] R. S. Liptser and A. N. Shiryaev (1969). Interpolation and filtering of the jump components of a Markov process. Izv. Akad. Nauk SSSR, Ser. Mat., 33, no. 4, 901–914.
[117] S. Lototsky and B. L. Rozovskii (1997). Recursive multiple Wiener integral expansion for nonlinear filtering of diffusion processes. In: Stochastic processes and functional analysis (Riverside, CA, 1994), 199–208, Lecture Notes in Pure and Appl. Math., 186, Dekker, New York.
[118] S. Lototsky, R. Mikulevicius and B. L. Rozovskii (1997). Nonlinear filtering revisited: a spectral approach. SIAM J. Control Optim. 35, 435–461.
[119] A. M. Makowski (1986). Filtering formula for partially observed linear systems with non-Gaussian initial conditions. Stochastics 16, 1–24.
[120] A. M. Makowski and R. B. Sowers (1992). Discrete-time filtering for linear systems with non-Gaussian initial conditions: Asymptotic behaviors of the difference between the MMSE and the LMSE estimates. IEEE Trans. Automat. Control 37, 114–120.
[121] V. Mandrekar and P. K. Mandal (2000). A Bayes formula for Gaussian processes and its applications. SIAM J. Control Optim. 39, 852–871.
[122] H. P. McKean (1967). Propagation of Chaos for a Class of Nonlinear Parabolic Equations. Lecture Series in Differential Equations 2, 177–194.
[123] S. Méléard (1996). Asymptotic behavior of some interacting particle systems, McKean-Vlasov and Boltzmann models. Probabilistic models for nonlinear partial differential equations, Lecture Notes in Math., 1627, 42–95.
[124] B. M. Miller and E. Ya. Rubinovich (1995). Regularization of a generalized Kalman filter. Math. Comput. Simul. 39, 87–108.
[125] B. M. Miller and W. J. Runggaldier (1997). Kalman filtering for linear systems with coefficients driven by a hidden Markov jump process. Systems Control Lett. 31, 93–102.
[126] P. L. Morien (1996). Propagation of chaos and fluctuations for a system of weakly interacting white noise driven parabolic SPDE’s. Stochastics Stochastics Reps. 58, 1–43.
[127] R. Mortensen (1966). Doctoral Dissertation, Department of Electrical Engineering, University of California, Berkeley.
[128] D. L. Ocone (1999). Entropy inequalities and entropy dynamics in nonlinear filtering of diffusion processes. Stochastic analysis, control, optimization and Applications, W. McEneaney, G. Yin, and Q. Zhang (ed.).
[129] D. L. Ocone and E. Pardoux (1996). Asymptotic stability of the optimal filter with respect to its initial condition. SIAM J. Control Optim. 34, no. 1, 226–243.
[130] N. Oudjane and S. Rubenthaler (2005). Stability and uniform particle approximation of nonlinear filters in case of non-ergodic signals. Stoch. Anal. Appl. 23, no. 3, 421–448.
[131] A. Papavasiliou (2006). Parameter estimation and asymptotic stability in stochastic filtering. Stochastic Process. Appl. 116, no. 7, 1048–1065.
[132] E. Pardoux (1979). Stochastic partial differential equations and filtering of diffusion processes. Stochastics 3, 127–167.
[133] J. Picard (1984). Approximation of nonlinear filtering problems and order of convergence. Filtering and Control of Random Processes. Lecture Notes Control Inform. Sci. 61, Springer, New York.
[134] P. Protter (1990). Stochastic integration and differential equations: a new approach. Springer, New York.
[135] D. Revuz and M. Yor (1999). Continuous martingales and Brownian motion. Springer, New York.
[136] B. L. Rozovskii (1972). Stochastic partial differential equations arising in nonlinear filtering problems. Usp. Mat. Nauk. SSSR, 27, 3, 213–214.
[137] B. L. Rozovskii (1990). Stochastic evolution systems. Linear theory and applications to nonlinear filtering. Kluwer, Dordrecht.
[138] A. N. Shiryaev (1966). On stochastic equations in the theory of conditional Markov processes. Teor. Veroyatn. Primen., 11, no. 1, 200–206.
[139] A. N. Shiryaev (1966). Stochastic equations of nonlinear filtering of jump Markov processes. Probl. Peredachi. Inf., 2, no. 3, 3–22.
[140] L. Stettner (1989). On invariant measures of filtering processes. Stochastic Differential Systems, Proc. 4th Bad Honnef Conf., K. Helmes, N. Christopeit, and M. Kohlmann (ed.), Lecture Notes in Control and Inform. Sci., 279–292.
[141] L. Stettner (1991). Invariant measures of the pair: state, approximating filtering process. Colloq. Math. 62, no. 2, 347–351.
[142] R. L. Stratonovich (1960). Conditional Markov processes. Teor. Veroyatn. Primen., 5, no. 2, 172–195.
[143] R. L. Stratonovich (1966). Conditional Markov processes and their applications to optimal control theory. Izd. MGU, Moscow.
[144] C. Striebel (1968). Partial differential equations for the conditional distribution of a Markov process given noisy observations. J. Math. Anal. Appl., 11, 151–159.
[145] V. B. Tadić and A. Doucet (2005). Exponential forgetting and geometric ergodicity for optimal filtering in general state-space models. Stochastic Process. Appl. 115, no. 8, 1408–1436.
[146] H. v. Weizsäcker (1983). Exchanging the order of taking suprema and countable intersections of σ-algebras. Ann. Inst. H. Poincaré Probab. Statist. 19, no. 1, 91–100.
[147] A. D. Wentzell (1965). On equations of the conditional Markov processes. Teor. Veroyatn. Primen., 10, no. 2, 390–393.
[148] D. Williams (1991). Probability with martingales. Cambridge University Press, Cambridge, UK.
[149] W. M. Wonham (1965). Some applications of stochastic differential equations to optimal nonlinear filtering. SIAM J. Control Optim., 2, 347–369.
[150] J. Xiong and X. Y. Zhou (2006). Mean-variance portfolio selection under partial information. SIAM J. Control Optim. 46, no. 1, 156–175.
[151] M. P. Yershov (1969). Nonlinear filtering of Markov processes. Teor. Veroyatn. Primen., 14, no. 4, 757–758.
[152] M. P. Yershov (1970). Sequential estimation of diffusion processes. Teor. Veroyatn. Primen., 15, no. 4, 705–717.
[153] J. Yong and X. Y. Zhou (1999). Stochastic control: Hamiltonian systems and HJB equations. Springer, New York.
[154] M. Zakai (1969). On the optimal filtering of diffusion processes. Z. Wahr. Verw. Gebiete 11, 230–243.
[155] M. Zakai and J. Ziv (1972). Lower and upper bounds on the optimal filtering error of certain diffusion processes. IEEE Trans. Inform. Theory IT-18, no. 3, 325–331.
List of Notations
• $\mathcal{A} = \mathcal{B} \bmod P$: $\mathcal{A}$ and $\mathcal{B}$ induce the same sets of $P$-equivalence sets
• $\mathcal{A}_b$: the set of all bounded $\mathcal{A}$-measurable random variables
• $A \triangle B$: the symmetric difference between two sets $A$ and $B$
• $\mathcal{B}(S)$: Borel σ-field of a metric space $S$
• $C_d = C(\mathbb{R}_+, \mathbb{R}^d)$: the collection of the continuous maps from $\mathbb{R}_+$ to $\mathbb{R}^d$
• $C_b^2(\mathbb{R}^d)$: the collection of all bounded differentiable functions with bounded derivatives up to order 2
• $C_0(\mathbb{R})$: the collection of all continuous functions with compact support
• $\delta_\theta$: the Dirac measure at $\theta$
• $\mathcal{D}(L)$: the domain of the operator $L$
• $E^Q$: the expectation with respect to the probability measure $Q$
• $\hat{E}(Y|\mathcal{G})$: the conditional expectation, given $\mathcal{G}$, under the probability measure $\hat{P}$
• $\mathcal{F}_t$: the σ-field generated by the observation up to time $t$
• $\mathcal{G}_1 \vee \mathcal{G}_2$: the σ-field generated by $\mathcal{G}_1 \cup \mathcal{G}_2$
• $H_0 = L^2(\mathbb{R}^d)$: the Hilbert space consisting of square-integrable functions on $\mathbb{R}^d$
  ◦ $\|\phi\|_0^2 = \int_{\mathbb{R}^d} |\phi(x)|^2 \, dx$
  ◦ $\langle \phi, \psi \rangle_0 = \int_{\mathbb{R}^d} \phi(x)\psi(x) \, dx$
• $H_1 \otimes H_2$: the tensor product of the Hilbert spaces $H_1$ and $H_2$
• $\mathcal{L}(X)$ or $P \circ X^{-1}$: the measure induced by $X$
• $L^2_{\mathrm{loc}}(M)$: the collection of all locally square-integrable predictable processes
• $L^2_{\mathcal{G}}(0, T; \mathbb{R}^d)$: the collection of square-integrable processes that are predictable with respect to the σ-fields $\mathcal{G}_t$
• $\mathcal{M}^2$: the collection of all square-integrable martingales
• $\mathcal{M}^{2,c}$: the collection of all continuous square-integrable martingales
• $\mathcal{M}^c_{\mathrm{loc}}$: the collection of all continuous local martingales
• $\mathcal{M}^{2,c}_{\mathrm{loc}}$: the collection of all continuous locally square-integrable martingales
• $\mathcal{M}_G(\mathbb{R}^d)$: the space of finite signed measures on $\mathbb{R}^d$
• $\langle M \rangle_t$: quadratic variation process of $M_t$
• $\langle M, N \rangle_t$: quadratic covariation process of $M_t$ and $N_t$
• $\mathcal{M}_F(\mathbb{R}^d)$: the collection of all finite Borel measures on $\mathbb{R}^d$
• $\mu \ll \nu$: the measure $\mu$ is absolutely continuous with respect to $\nu$
• $|\nu|$: the total variation measure of $\nu$
• $\langle \nu, f \rangle$: the integral of the function $f$ with respect to the measure $\nu$
• $N_x M_z$: the normal space of the manifold $M_z$ at point $x \in M_z$
• $P_\omega^{\mathcal{A}}$: the regular conditional probability measure on $(\Omega, \mathcal{F}, P)$ given the sub-σ-field $\mathcal{A}$
• $\mathcal{P}(\mathbb{R}^d)$: the collection of all Borel probability measures on $\mathbb{R}^d$
• $\partial_i F$: the partial derivative of a function $F$ with respect to its $i$th variable
• $\partial^2_{ij} F$: $\partial_i \partial_j F$
• $\mathcal{S}_T$: the collection of all stopping times bounded by $T$
• $\|\cdot\|_{0,\infty}$: supremum norm of a function
• $S_m^+$: the set of symmetric positive-definite $m \times m$ matrices
• $T_x M_z$: the tangent space of the manifold $M_z$ at point $x \in M_z$
• $\xi^*$: the transpose of a vector or a matrix $\xi$
Index
$P^{\mathcal{A}}$-separable, 228
$\rho_h$-diameter, 212
s-dimensional Hausdorff measure, 242
accumulated observation process, 2
algebraic Riccati equation, 176
Area formula, 242
Assumption (A), 181
Assumption (BC), 83
Assumption (BD), 113
Assumption (E1), 198
Assumption (E2), 198
Assumption (E3), 206
Assumption (S1), 216
Assumption (S2), 217
asymptotically stable matrix, 169
backward Itô integral, 112
backward martingale, 22
backward martingale convergence theorem, 22
backward SPDE, 111
barycenter, 202
Bayes’ formula, 55
Birkhoff’s contraction coefficient, 213
Brownian motion, 34
Burkholder–Davis–Gundy inequality, 45
càdlàg, 24
càdlàg modification, 24
Cauchy sequence, 96
Cayley–Hamilton theorem, 172
characteristic polynomial, 172
class (DL), 25
comparable measures, 211
complete orthonormal basis, 97
completely controllable matrix, 170
completely reconstructible system, 170
Condition (F), 187
Condition (I), 138
Condition (IN), 245
Condition (ND), 237
Condition (R), 240
Condition (S), 240
CONS, 97
controllable subspace of a system, 175
covariance matrix, 159
Cox–Ingersoll–Ross model, 232
detectable pair, 173
discrete-time Gronwall’s inequality, 75
Doob’s decomposition, 25
Doob’s inequality, 17, 18, 24
Doob–Meyer decomposition, 27
Duncan–Mortensen–Zakai equation, 95
Euler approximation, 73
Euler scheme, 134
extension, 48
Feller–Markov process, 187
filter is asymptotically stable, 187
finite memory property, 208
FKK equation, 8
Gaussian process, 158
generator of the signal, 89
Girsanov’s transformation, 56
Gronwall’s inequality, 69
Hilbert metric, 211
Hilbert space, 96
hybrid filter, 140
inner product space, 96
innovation process, 90
integrable increasing process, 26
invariant measure of filter, 198
Itô stochastic integral, 38
Itô’s formula, 42
Jordan form, 168
Jordan normal form, 168
Kallianpur–Striebel formula, 7
Kallianpur–Striebel, 85
Kalman–Bucy filter, 164
Kazamaki’s theorem, 53
Kushner–FKK equation, 91
Kushner–Stratonovich equation, 8, 95
local martingale, 33
Markov process, 79
martingale, 15
martingale convergence theorem, 20
martingale problem, 71
mean vector, 159
Meyer’s process, 33
natural increasing process, 26
normal space, 245
Novikov condition, 54
observation process, 2
optimal control problem, 166
optimal filter, 85
optional sampling theorem, 16, 25
Ornstein–Uhlenbeck process, 6
particle filter, 140
pathwise uniqueness, 62
PDE, 111
portfolio, 3
predictable process, 15, 37
regular conditional probability distribution, 84
regular point for mapping, 240
regular submartingale, 31
Riccati equation, 163, 167
SDE, 2
semimartingale, 41
separable space, 96
separation principle, 93
SPDE, 9
square-integrable martingale, 32
stabilizable pair, 175
stable subspace of matrix A, 169
standard extension, 48
stochastic basis, 15
stochastic flow, 73
stochastic integral, 41
stopping time, 16
Stratonovich integral, 58
strong solution, 62
submartingale, 15
submartingale convergence theorem, 20
supermartingale, 15
tangent space, 245
tensor product of Hilbert spaces, 98
total variation metric on P(S), 214
uniqueness of strong solution, 62
uniqueness of weak solution, 62
unnormalized filter, 86
unreconstructible subspace of a system, 172
unstable subspace of matrix A, 169
upcrossing number, 19
Wasserstein metric, 122
weak solution, 62
weighted unnormalized filter, 133
well-posedness, 71
Zakai’s equation, 8, 89