Numerical Techniques
Thus, each f of the form (14.8) is a distribution belonging to H^{-1}. Now we return to the operator A*. For ψ ∈ H^1, using integration by parts, it follows from (14.7) that

-(A*ψ, ψ) = Σ_{i=1}^n (b̂_i ψ, D_i ψ)_{L^2} + (1/2) Σ_{i,j=1}^n (a_{ij} D_j ψ, D_i ψ)_{L^2},
(14.9)

where

b̂_i = b_i + (1/2) Σ_{j=1}^n D_j a_{ij}.
(14.10)
Let L_∞(R^n) = B(R^n) denote the class of essentially bounded measurable functions on R^n furnished with the usual norm topology ||g||_∞ = ess sup{|g(x)|, x ∈ R^n}.
Suppose there exist three numbers β, γ, δ > 0 satisfying the following conditions:

(A1): b̂_i ∈ B(R^n), i = 1, 2, ..., n, with β = sup{||b̂_i||_∞, i = 1, 2, ..., n} < ∞;

(A2): γ|ξ|^2 ≤ (1/2) Σ_{i,j=1}^n a_{ij} ξ_i ξ_j ≤ δ|ξ|^2, ξ ∈ R^n.
(14.11)

Then it follows from the Schwarz inequality applied to the first term of (14.9) that

|Σ_{i=1}^n (b̂_i ψ, D_i ψ)_{L^2}| ≤ β ||ψ||_{L^2} ||Dψ||_{L^2}.
(14.12)
Similarly, by virtue of the ellipticity assumption (A2), we have

(1/2) Σ_{i,j=1}^n (a_{ij} D_j ψ, D_i ψ)_{L^2} ≥ γ ||Dψ||^2_{L^2}.
(14.13)
Using the Cauchy inequality, ab ≤ (ε/2)a^2 + (1/2ε)b^2, ∀ε > 0, it follows from (14.9), (14.12) and (14.13) that

-(A*ψ, ψ) ≥ (γ - β(ε/2)) ||Dψ||^2_{L^2} - (β/2ε) ||ψ||^2_{L^2}.
(14.14)
Linear and Nonlinear Filtering
Since ε > 0 but otherwise arbitrary, it follows from (14.14) that there exist α > 0 and λ > 0 such that

-(A*ψ, ψ) + λ ||ψ||^2_{L^2} ≥ α ||ψ||^2_{H^1}, ∀ψ ∈ H^1.
(14.15)
This inequality also holds for the operator A. It is clear from the above expression that

lim_{||ψ||_{H^1} → ∞} { (-(Bψ, ψ) + λ ||ψ||^2_{L^2}) / ||ψ||_{H^1} } = +∞,
(14.16)

for B = A or A*. Operators satisfying such properties are called coercive. Following a similar procedure and using the upper bound in (A2), one can easily justify that there exists a constant C > 0, dependent only on β and δ, such that

|(A*φ, ψ)| ≤ β ||ψ||_{L^2} ||Dφ||_{L^2} + δ ||Dφ||_{L^2} ||Dψ||_{L^2} ≤ C ||φ||_{H^1} ||ψ||_{H^1}.
(14.17)

Inequality (14.17) also holds for the operator A. This inequality implies that both the operators A and A* are linear and bounded from H^1 to H^{-1}. Thus we have proved the following result.

Lemma 1 Under the assumptions (A1) and (A2) given by (14.11), both the differential operators A* and A satisfy Garding's inequality (14.15), and hence these operators are coercive with respect to the triple H^1 ↪ L^2 ↪ H^{-1}.
Further, both A and A* are bounded linear operators from H^1 to H^{-1}, and A* is the generator of a C_0 semigroup {S(t), t ≥ 0} in H. In view of the above result, equation (14.5) can be treated as a stochastic differential equation on the Hilbert space H = L^2(R^n). Defining the operator F by F(p) = (R_0^{-1}h)p, we can rewrite equation (14.5) as

dp = A*p dt + (F(p), dy), p(0) = p_0.
(14.18)
Clearly F is a multiplication operator and, for unbounded h, it is an unbounded operator in H with domain a proper subspace of H. If, on the other hand, h is bounded, F is a bounded operator from H to L(R^m, H). Under this assumption, using the Banach fixed point theorem, one can easily prove the existence of a unique solution of the integral equation

p(t) = S(t)p_0 + ∫_0^t (S(t - s)F(p(s)), dy(s)).
(14.19)
Further, one can prove that for any finite time interval I = [0, T], the solution p ∈ L^2(I, H^1) ∩ L^2(I, H) ∩ C(I, H) with probability one. We state this result formally as a theorem. For details see [2], [3].

Theorem 2 Suppose the assumptions of Lemma 1 hold and that R_0^{-1}h is uniformly bounded. Then equation (14.18) has a unique solution p and, for any finite time interval I, p ∈ L^2(I, H^1) ∩ L^2(I, H) ∩ C(I, H), P-a.s.

The uniform boundedness assumption on R_0^{-1}h is rather strong. In fact, for the existence of weak solutions this is not necessary. This can be appreciated as we deal with the Galerkin approximation.

14.3 Galerkin Approximation using Special Basis Functions

For convenience, from now on we use the notation V = H^1, H = L^2(R^n) and V* = H^{-1}. Note that the embeddings V ↪ H ↪ V* are continuous but not necessarily compact. We assume, however, that there exists a sequence {v_i, i ∈ N} ⊂ V which is orthogonal in V, orthonormal in H, and complete in all these spaces. We use this sequence to construct a Galerkin approximation for the solution p of equation (14.18). Since p_0 ∈ H, we can approximate it by the sequence {p_0^k} given by

p_0^k = Σ_{i=1}^k (p_0, v_i) v_i = Σ_{i=1}^k Z_{0i} v_i,
(14.20)
which converges to p_0 in H. Define

p^k(t) = Σ_{i=1}^k Z_i^k(t) v_i,
(14.21)
where the Fourier coefficients {Z_i^k, 1 ≤ i ≤ k} are determined as follows. Substitute (14.21) into equation (14.18) and scalar multiply by v_j, 1 ≤ j ≤ k, giving

dZ_j^k(t) = Σ_{i=1}^k a_{ji} Z_i^k(t) dt + Σ_{i=1}^k Z_i^k(t)(H_{ji}, dy), 1 ≤ j ≤ k,
(14.22)

where the elements of the matrices are given by

a_{ji} = (A*v_i, v_j) and H_{ji} = (R_0^{-1}h v_i, v_j), 1 ≤ i, j ≤ k.
(14.23)
Note that {H_{ji}}, taking values in R^m, are functions of time in case σ_0 is y-dependent. Clearly equation (14.22) can be written as an ordinary k-dimensional stochastic differential equation as follows:

dZ^k(t) = A_k Z^k(t) dt + B_k(t, Z^k(t)) dy, Z^k(0) = Z_0^k ≡ {(p_0, v_1), ..., (p_0, v_k)},
(14.24)
where A_k ∈ M(k × k) and B_k(t, z) ∈ M(k × m) for t ≥ 0 and z ∈ R^k. Since A_k is bounded and B_k is linear in z and hence Lipschitz, it follows from the results of chapter 2 that this equation has a unique solution {Z^k(t), t ≥ 0}, possessing finite second moments, with Z^k ∈ C(I, R^k) P-a.s. Clearly p^k, given by p^k(t) = Σ_{i=1}^k Z_i^k(t) v_i, is also in C(I, H) P-a.s., and it is an approximate solution of equation (14.18). For numerical computation, there are two major difficulties with the sequence {v_i}. First, there is no systematic method available for constructing a sequence {v_i} which is orthogonal in V and orthonormal in H. Secondly, since this sequence is orthonormal in H, there is no guarantee that the approximating sequence {p^k} will preserve positivity. Since any numerical algorithm must be terminated after a finite number of steps, say N, it is possible for p^N to assume negative values over certain regions of the space R^n, which is absurd for probability densities. In order to avoid this negativity, N may have to be taken very large, requiring extensive computation and hence large CPU time. Recently Ahmed and Radaideh [12] proposed a sequence of Gaussian functions {w_i} as the basis functions for numerical solution of the Zakai equation.
These are given by {w_i(x) = W(x, m_i, B_i), x ∈ R^n, i ∈ N}, where

W(x, m_i, B_i) = (1/((2π)^{n/2} √(det B_i))) exp{-(1/2)(B_i^{-1}(x - m_i), x - m_i)},
(14.25)
with m_i ≠ m_j ∈ R^n for i ≠ j, and the matrices B_i ∈ M^+(n × n) are positive definite. For all examples we have considered B_i = γ_i I, γ_i > 0, with complete success. It is evident that for m_i ≠ m_j the sequence {w_i} is linearly independent, but neither orthogonal nor normal, though by the Gram-Schmidt procedure one can always orthonormalize the given sequence. However, for numerical purposes this is very inefficient and not at all necessary. Rather, it is more convenient to use the sequence {w_i} as they are. The following properties of the sequence {w_i} are advantageous over the orthonormal sequence {v_i}:

(a) w_i(x) > 0, x ∈ R^n, i ∈ N.
(b) w_i ∈ D(A) = D(A*), w_i ∈ C^∞.
(c) the sequence {w_i} is complete in the class L^2(R^n) = H.

In (14.21) we replace the sequence {v_i} by the sequence {w_i} and replace the equation (14.24) by

V_k dZ^k(t) = A_k Z^k(t) dt + B_k(t, Z^k(t)) dy, V_k Z^k(0) = z_0^k,
(14.26)
where the elements of the matrices are given by

V_k = {v_{ji} = (w_j, w_i), 1 ≤ i, j ≤ k}; A_k = {a_{ji}^k = (w_j, A*w_i), 1 ≤ i, j ≤ k};

B_k(t, z) = {B_{j,l}^k(t, z) = Σ_{i=1}^k ((R_0^{-1}h)_l w_i, w_j) z_i, 1 ≤ j ≤ k, 1 ≤ l ≤ m}.
(14.27)

The initial state z_0^k is given by the corresponding expression in (14.24) with v_i replaced by w_i. In this case all the Fourier coefficients {Z_{0i}^k} are nonnegative. Note that the matrices V_k are invertible.

Comment 1 Equation (14.26) is equivalent to

d(p^k(t), w_j) = (p^k(t), A w_j) dt + ((p^k(t) R_0^{-1}h, w_j)_H, dy), p^k(0) = p_0^k,
(14.28)
for 1 ≤ j ≤ k. Integrating this we have the weak form

(p^k(t), w_j) = (p_0^k, w_j) + ∫_0^t (p^k(s), A w_j) ds + ∫_0^t ((p^k(s) R_0^{-1}h, w_j)_H, dy(s)),
(14.29)
for every w_j, 1 ≤ j ≤ k. Looking at the integrand of the martingale term in (14.29) more closely, we see that

(p^k(t) R_0^{-1}h, w_j)_H = Σ_{i=1}^k Z_i^k(t) ∫_{R^n} w_i(x)(R_0^{-1}h)(x) w_j(x) dx,
(14.30)

which is well defined even for unbounded h, in particular for h with polynomial growth, |h(x)|_{R^m} ≤ K(1 + |x|^q_{R^n}), 0 ≤ q < ∞.
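The integrals in (14.30) are weighted by Gaussians, which is what makes them finite for polynomially growing h. A related convenience of the Gaussian basis (14.25) is that the Gram matrix V_k = {(w_j, w_i)} of (14.27) needs no quadrature at all: the L^2 inner product of two Gaussian densities is itself a Gaussian evaluation, ∫ W(x, m_i, B_i) W(x, m_j, B_j) dx = W(m_i; m_j, B_i + B_j). A minimal sketch of this (function names are illustrative, not from the text):

```python
import numpy as np

def gauss(x, m, B):
    """Gaussian density W(x, m, B) as in (14.25)."""
    d = np.atleast_1d(x) - np.atleast_1d(m)
    n = d.size
    B = np.atleast_2d(B)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(B))
    return float(np.exp(-0.5 * d @ np.linalg.inv(B) @ d) / norm)

def gram_matrix(means, covs):
    """V_k with entries (w_j, w_i)_{L^2} = W(m_i; m_j, B_i + B_j)."""
    k = len(means)
    V = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            V[j, i] = gauss(means[i], means[j], covs[i] + covs[j])
    return V
```

Since the w_i are not orthogonal, V_k is full, but it is symmetric and positive definite for distinct centers, which is exactly the invertibility of the mass matrix used in (14.26).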
Thus, for weak solutions, the uniform boundedness assumption is unnecessary.

14.4 Spatial Discretization and Computational Algorithm

For numerical computation using Runge-Kutta techniques, the Stratonovich formulation is the correct framework for stochastic systems. Thus a correction term, known as the Wong-Zakai correction [74] (see also chapter 2), must be added to the Zakai equation (14.5) or (14.18), giving

dp_t = A*p_t dt - (1/2)(R_0^{-1}h, h) p_t dt + p_t (R_0^{-1}h, dy) = Ã*p_t dt + p_t (R_0^{-1}h, dy), p(0) = p_0,
(14.31)

where Ã* = A* - (1/2)(R_0^{-1}h, h) I. In the Galerkin scheme discussed in the preceding section, the operator A* must be replaced by Ã*. In this case the elements of the matrix A_k of equation (14.26) are replaced by

Ã_k = {ã_{ji} = (w_j, A*w_i) - (1/2)((R_0^{-1}h, h) w_j, w_i)_H, 1 ≤ i, j ≤ k}.
(14.32)
Using the solution of equation (14.26), with this correction, the approximate conditional mean and covariance matrix of the original process {x(t), t ≥ 0} can be calculated as follows:

x̂(t) ≈ Σ_{i=1}^N Z_i^N(t) m_i / Σ_{i=1}^N Z_i^N(t),
(14.33)

K̂(t) ≈ Σ_{i=1}^N Z_i^N(t) (B_i + (x̂(t) - m_i)(x̂(t) - m_i)') / Σ_{i=1}^N Z_i^N(t).

The first expression is demonstrated as follows:

x̂(t) = E{x(t) | F_t^y} = ∫_{R^n} ξ p_t(ξ) dξ / ∫_{R^n} p_t(ξ) dξ ≈ Σ_{i=1}^N Z_i^N(t) ∫_{R^n} ξ w_i(ξ) dξ / Σ_{i=1}^N Z_i^N(t) = Σ_{i=1}^N Z_i^N(t) m_i / Σ_{i=1}^N Z_i^N(t).
(14.34)
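The mixture formulas (14.33) translate directly into code. In the sketch below, Z holds the current weights and means, covs hold the basis parameters {m_i, B_i}; these names are assumptions for illustration:

```python
import numpy as np

def conditional_moments(Z, means, covs):
    """Approximate conditional mean and covariance via (14.33)."""
    Z = np.asarray(Z, dtype=float)
    M = np.asarray(means, dtype=float)
    w = Z / Z.sum()                      # normalized weights
    xhat = w @ M                         # sum_i w_i m_i
    K = np.zeros((M.shape[1], M.shape[1]))
    for wi, mi, Bi in zip(w, M, covs):
        d = (xhat - mi)[:, None]
        K += wi * (Bi + d @ d.T)
    return xhat, K
```

Note the normalization by Σ Z_i: the weights come from an unnormalized density, so the ratio in (14.33) is essential.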
Similarly one can verify the second expression. In order to compute all the matrix elements given by (14.27) and (14.32) as accurately as possible, and at the same time minimize the computational burden, it is necessary to select a suitable bounded domain in R^n. This can be based on the approximate support of the initial density p_0. Define the vector

m_0 = ∫_{R^n} ξ p_0(ξ) dξ,

and define the cube

C_{2a} = C_{2a}(m_0) = {x ∈ R^n : m_{0i} - a ≤ x_i ≤ m_{0i} + a, 1 ≤ i ≤ n}

around the mean m_0. Since p_0 is a probability density, for any ε > 0 there exists a compact set K ⊂ R^n such that ∫_K p_0(x) dx > 1 - ε. We choose a large enough so that K ⊂ C_{2a}. This domain can be enlarged as desired by adjusting a so as to absorb any integer multiple of K. Once the domain is fixed, each edge is partitioned into d equal intervals, creating d^n = N cells covering the entire domain. They are then indexed in some order, giving C_{2a} = ∪_{r=1}^N D_r. The center of cell D_r is denoted by m_r. Choose the matrix B_r = γ_r I, γ_r > 0. Then use the sequence {w_r = W(x, m_r, B_r)} for the Galerkin approximation. For further details see [12]. One way of judging the performance of the approximate filter is to evaluate the residual error defined by the innovation process
V̂(t) = ∫_0^t σ_0^{-1}(s)(dy(s) - ĥ(s) ds),

where ĥ(t) = E{h(x(t)) | F_t^y}. We know that V̂ is a Brownian motion with mean zero and incremental covariance R. Note that ĥ is computed using the expression

ĥ(t) ≈ ∫_{R^n} h(ξ) p^k(t, ξ) dξ / ∫_{R^n} p^k(t, ξ) dξ,

where p^k is the approximate solution of equation (14.31) obtained by solving equation (14.26) as discussed in the preceding section. If the observed behavior of the measurement residual V̂(t + Δt) - V̂(t) is inconsistent with its theoretical properties, then it must be concluded that the number of terms in the Galerkin approximation is inadequate, or that the chosen domain does not adequately support the probability mass. So the domain decomposition by cells must be further
refined by increasing the number of cells, or the domain must be enlarged, or both, until the error is significantly reduced.

14.5 Basic Computational Steps

In this section we present the basic computational steps.

Step 1 Generate the random processes W and V.
Step 2 Given x_0, solve the system equations (14.1) and (14.2) using the Runge-Kutta method to obtain the values of the observation process y at discrete points of time {y(t_i), i = 1, 2, ..., L}.
Step 3 Solve for Z^k(0) from equation (14.26b).
Step 4 Solve the finite dimensional stochastic differential equation (14.26), using Z^k(0) as the initial state, to obtain {Z^k(t_i), i = 1, 2, ..., L}.
Step 5 Check the residual ΔV̂(t) ≈ σ_0^{-1}(t)(Δy(t) - ĥ(t)Δt). If this is not approximately a random element with mean zero and covariance ΔtR, increase the number of cells N and the size of the cube C_{2a}, if required, and go to Step 3; otherwise stop.

14.6 An Alternative Approach

Before presenting some numerical examples based on the above scheme, we prefer to present here another interesting approach, which may be useful for real-time computing and filtering. This is based on an implicit scheme directly applied to the system equation (14.31) in the Hilbert space H. Let J = [0, T] be any finite time interval partitioned into, say, L equal subintervals of length Δ. Let t ∈ J be a point at any one of the interior nodes, and suppose p(t) is already known and that the new information contained in the increment of y, given by Δy(t) = y(t + Δ) - y(t), is available. The problem is to find p(t + Δ) from the given information. We obtain the solution by approximating the equation (14.31) using the following implicit scheme:

p(t + Δ) ≈ p(t) + (Ã*p(t + Δ)) Δ + p(t)(R_0^{-1}h, Δy(t)),
(14.35)
which can be written as

(I - ΔÃ*) p(t + Δ) ≈ p(t)(1 + (R_0^{-1}h, Δy(t))).
(14.36)
If the operator (I - ΔÃ*) is invertible, the solution is given by

p(t + Δ) ≈ (I - ΔÃ*)^{-1} p(t)(1 + (R_0^{-1}h, Δy(t))).
(14.37)
From (14.37) we can then construct the following algorithm:

p_s = (I - ΔÃ*)^{-1}(1 + (R_0^{-1}h, Δy_s)) p_{s-1},
(14.38)
for s = 1, 2, ..., L, with p_s = p(sΔ) and p_0 the given initial state. For invertibility of the operator (I - ΔÃ*), we note that for every φ ∈ V we have

((I - ΔÃ*)φ, φ) = |φ|^2_H - Δ(Ã*φ, φ) ≥ |φ|^2_H + Δ(α ||φ||^2_V - λ|φ|^2_H + |√((1/2)(R_0^{-1}h, h)) φ|^2_H) ≥ (1 - λΔ)|φ|^2_H + (αΔ)||φ||^2_V,
(14.39)

where the second line follows from the coercivity of -A* (see inequality (14.15)) and the positivity of the matrix R_0. Hence, for Δ sufficiently small, it follows from (14.39) that the operator (I - ΔÃ*) is coercive; that is, there exists a constant c > 0, dependent only on {Δ, α, λ}, so that

((I - ΔÃ*)φ, φ)_{V*,V} ≥ c ||φ||^2_V, ∀φ ∈ V.
(14.40)
From this inequality one can prove that the operator has a bounded inverse from V* to V satisfying

||(I - ΔÃ*)^{-1} ψ||_V ≤ (1/c) ||ψ||_{V*}, ∀ψ ∈ V*.
(14.41)
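On a grid, the recursion (14.38) is one linear solve per time step. The sketch below is for one space dimension with the drift term of A* omitted for brevity, so A*p = (σ^2/2) p_xx and Ã* carries the Wong-Zakai correction; h_vals (the observation function on the grid) and the zero boundary rows are assumptions of this sketch, not prescriptions from the text:

```python
import numpy as np

def implicit_step(p, dx, dt, sigma, h_vals, sigma0, dy):
    """One step p_{s-1} -> p_s of (14.38) for a 1-D Zakai equation.

    Atilde* = (sigma^2/2) d^2/dx^2 - (1/2)(h/sigma0)^2 I,
    discretized by central differences (drift omitted in this sketch).
    """
    n = p.size
    c = sigma ** 2 / (2 * dx ** 2)
    Atil = np.zeros((n, n))
    for i in range(1, n - 1):
        Atil[i, i - 1] = c
        Atil[i, i] = -2 * c
        Atil[i, i + 1] = c
    Atil -= 0.5 * np.diag((h_vals / sigma0) ** 2)
    B = np.eye(n) - dt * Atil                      # I - Delta * Atilde*
    rhs = p * (1.0 + (h_vals / sigma0 ** 2) * dy)  # right side of (14.38)
    return np.linalg.solve(B, rhs)
```

The matrix B is tridiagonal apart from the diagonal correction, so in production one would use a banded solver rather than a dense one; the dense solve is kept here only for clarity.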
For convenience denote B = (I - ΔÃ*) and let Λ : V → V* denote the duality map, or canonical isomorphism of V onto V*, so that for any φ ∈ V, ||φ||_V = ||Λφ||_{V*}, and similarly, for any g ∈ V*, ||g||_{V*} = ||Λ^{-1}g||_V. Since in our case V = H^1, we have Λ = (I - Δ_x), where Δ_x denotes the Laplacian in R^n. Now define the operator G by

Gφ = φ - θ Λ^{-1}(Bφ - f), for f ∈ V* and θ > 0 to be chosen suitably.
Clearly, if this operator has a fixed point, say ψ ∈ V, then ψ = Gψ means that Bψ = f; in other words, ψ is a solution of the equation Bφ = f. Since this is true for any f ∈ V*, we conclude that B is an onto map from V to V*. On the other hand, it is one to one since ||Bφ||_{V*} ≥ c ||φ||_V, ∀φ ∈ V. All this means that B is invertible and has a bounded inverse. Further, recall that for bounded h, the operator B is bounded from V to V*. Let C_b denote this bound. We show that for a suitable choice of θ the operator G has a fixed point in V. By simple computation one can verify that
||Gφ_1 - Gφ_2||^2_V = ||φ_1 - φ_2||^2_V - 2θ(Λ^{-1}(Bφ_1 - Bφ_2), φ_1 - φ_2)_V + θ^2 ||Λ^{-1}(Bφ_1 - Bφ_2)||^2_V = ||φ_1 - φ_2||^2_V - 2θ(Bφ_1 - Bφ_2, φ_1 - φ_2)_{V*,V} + θ^2 ||Bφ_1 - Bφ_2||^2_{V*} ≤ (1 - 2θc + θ^2 C_b^2) ||φ_1 - φ_2||^2_V.

Hence, for 0 < θ < 2c/C_b^2, the map G is a contraction on V and has a unique fixed point, which yields the recursion
p_s = (I - ΔÃ*)^{-1}(1 + (R_0^{-1}h, Δy_s)) p_{s-1}, p_0 = p_0, s = 1, 2, ..., L.

Note that this also proves the existence of a unique solution of the Zakai equation (14.31), and that the solution is pathwise continuous with values in V.

14.7 Examples and Simulation Results

Consider the three dimensional system

dx_1 = g(x)β_1(x) dt + σ dW_1,
dx_2 = g(x)β_2(x) dt + σ dW_2,
dx_3 = g(x)β_3(x) dt + σ dW_3,
(14.43)
and the observation dynamics given by one of the following sets:

dy_1 = x_1 dt + σ_0 dV_1, dy_2 = x_2 dt + σ_0 dV_2, dy_3 = x_3 dt + σ_0 dV_3;
(14.44a)

dy_1 = x_1 dt + σ_0 dV_1, dy_2 = x_2 dt + σ_0 dV_2;
(14.44b)

dy_1 = x_1 dt + σ_0 dV_1.
(14.44c)
The functions g and {β_i, i = 1, 2, 3} are given by the following expressions:

g = tanh(2x_1^2 - x_2^2 - x_3^2 + x_1 x_2 + x_1 x_3 + x_2 x_3),
β_1 = 4x_1 + x_2 + x_3,
β_2 = x_1 - 2x_2 + x_3,
β_3 = x_1 + x_2 - 2x_3.
Since the drift vector b = [gβ_1, gβ_2, gβ_3]' is the gradient of the scalar valued function

F = log cosh[2x_1^2 - x_2^2 - x_3^2 + x_1 x_2 + x_1 x_3 + x_2 x_3],

it satisfies the following Benes condition [25], [12], [79]:

||∇F||^2 + ΔF + (1/σ_0^2)|h(x)|^2 = (Qx, x) + (q, x) + q_0,
(14.45)
where Q is a positive definite matrix, q ∈ R^3 and q_0 ∈ R. An exact solution of the corresponding Zakai equation is given by

p(t, x) = c(t) exp{F(x) - (1/2)(Σ^{-1}(t)(x - m(t)), x - m(t))}.
(14.46)
This equation determines an unnormalized conditional density in terms of the parameters {Σ, m}, which are the solutions of the following equations:

Σ̇(t) = σ^2 I - (1/σ_0^2) Σ^2(t), dm(t) = (1/σ_0^2) Σ(t)(dy(t) - m(t) dt).
(14.47)
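The gradient structure b = ∇F claimed above is easy to confirm numerically. The following check differentiates F = log cosh(S) by central differences and compares the result with g·β, using the expressions for g and β_i as reconstructed here:

```python
import numpy as np

def S(x):
    return (2 * x[0] ** 2 - x[1] ** 2 - x[2] ** 2
            + x[0] * x[1] + x[0] * x[2] + x[1] * x[2])

def F(x):
    return np.log(np.cosh(S(x)))

def drift(x):
    g = np.tanh(S(x))
    beta = np.array([4 * x[0] + x[1] + x[2],
                     x[0] - 2 * x[1] + x[2],
                     x[0] + x[1] - 2 * x[2]])
    return g * beta

x = np.array([0.3, -0.2, 0.1])
eps = 1e-6
num_grad = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
assert np.allclose(num_grad, drift(x), atol=1e-6)
```

Since ∇ log cosh(S) = tanh(S)∇S, the check succeeds exactly when β = ∇S, which is the case for the quadratic form S used above.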
We solve each of the three problems (14.43)-(14.44a), (14.43)-(14.44b) and (14.43)-(14.44c) using the steps given in sections 14.4 and 14.5. The exact results are obtained by solving (14.47) and (14.46), and they are compared with the solutions computed using the numerical scheme described in sections 14.4 and 14.5. The estimated states are presented as functions of time in figures 1, 2, 3. Theoretical results are plotted with unbroken lines and numerical ones with broken lines. It is extremely encouraging to see how close the numerical results are to the analytical results, which are based on exact solutions. The data used for the numerical simulation are σ = 1.0, σ_0 = 0.015, Σ_0 = 0.1 I, m_1(0) = 0.2, m_2(0) = 0.1, m_3(0) = -0.1.

Comment 2 The reader may find it challenging to experiment with the alternative approach proposed in section 14.6.

Comment 3 For online computation, the alternative approach may prove to be very useful. If the operator A is time varying, this requires a powerful computer to evaluate the inverse of elliptic operators at each time step.

Comment 4 Again the reader is encouraged to evaluate the performances of EKF-3 and EKF-4, given by equations (12.50)-(12.51) of chapter 12 and equations (13.55)-(13.56) of chapter 13 respectively. This can be done by comparing their solutions with the best estimates given by the solutions of the associated Zakai equations (13.37B) of chapter 13.
Fig.1
Fig.2
Fig.3
Courtesy of Kluwer Acad. Pub., Dynamics and Control, 7, 1997, 293-308
CHAPTER 15
PARTIALLY OBSERVED CONTROL
15.1 Introduction

In many problems arising in the physical or social sciences, one is required not only to give a best estimate of the unobservable from the observable, but also to exercise controls to change the course of evolution of the unobservable in order to achieve certain objectives. This is partially observed control. For example, the population (or immigration) control for a country may be exercised on the basis of limited survey data in order to maintain a certain growth rate considered to be good for the country socially and economically. The same philosophy applies to government regulations applied to fishing industries, where the objective is to prevent overfishing by prescribing an appropriate net size, determined on the basis of the government's estimate of the available fish stock and its growth characteristics. In fact this concept is universal and applies to any situation where a decision is to be made for "good" on the basis of partial information.

15.2 Linear Systems with Integral Observation

The system is governed by the following controlled stochastic differential equation,

dx(t) = A(t)x(t) dt
+ f(t, u(t)) dt + σ(t) dW(t), x(0) = x_0,
(15.1)
with the measurement dynamics given by

dy(t) = H(t)x(t) dt + σ_0(t) dV(t), t ≥ 0, y(0) = 0.
(15.2)
Here all the parameters satisfy the same set of standard assumptions that we used in chapter 3. The function f is a map from [0, ∞) × U → R^n, measurable in the first variable and continuous in the second. This is the mechanism through which control is exercised on the plant. The set U is a
compact, possibly convex, subset of R^d, prescribing the set of admissible values the control may take. Here we assume that the initial state x_0 is Gaussian, that the random processes {W, V} are Brownian motions in R^p and R^m respectively, and that they are all mutually uncorrelated. The performance of the system over the time period I = [0, T] is measured by the average cost given either by

J(u) = E{ ∫_I ℓ(t, x(t), u(t)) dt },
(15.3a)

or

J(u) = E{ ∫_I ℓ̂(t, x̂(t), u(t)) dt },
(15.3b)
where x̂(t) is the best estimate of x(t) given the history {y(s), 0 ≤ s ≤ t}, or simply the information sigma algebra F_t^y. The functions ℓ and ℓ̂ are measurable in the first variable and continuous in the rest of their variables. Additional assumptions will be introduced as necessary. The basic problem is to find a control, F_t^y measurable, that imparts a minimum to the cost functional. Since f is F_t^y adapted, it follows from the concluding remarks of chapter 4 that the best estimate x̂ is given by the solution of the SDE

dx̂ = Ax̂ dt + f dt + KH'R_0^{-1}(dy - Hx̂ dt), x̂(0) = x̄_0.
(15.4)
The matrix K is given by the solution of the same differential Riccati equation (3.35) of chapter 3, reproduced here:

K̇(t) = A(t)K(t) + K(t)A'(t) + Q(t) - K(t)H'(t)R_0^{-1}(t)H(t)K(t), K(0) = K_0.
(15.5)
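Since (15.5) does not involve the control or the observations, the gain can be precomputed offline. A forward-Euler sketch (step size and the argument names are illustrative assumptions):

```python
import numpy as np

def riccati_path(A, H, Q, R0inv, K0, dt, steps):
    """Integrate Kdot = A K + K A' + Q - K H' R0^{-1} H K, K(0) = K0."""
    K = K0.copy()
    out = [K0.copy()]
    for _ in range(steps):
        Kdot = A @ K + K @ A.T + Q - K @ H.T @ R0inv @ H @ K
        K = K + dt * Kdot
        out.append(K.copy())
    return out
```

For time-invariant scalar data with A = 0 and H = Q = R_0 = 1, the error variance satisfies K̇ = 1 - K^2 and approaches the stationary value K = 1, a convenient sanity check for the integrator.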
Thus, it is important to note that by introducing F_t^y adapted controls the error covariance cannot be modified; in other words, control cannot change the uncertainty in the state estimate. However, the estimate x̂(t) itself can be influenced by the choice of the control forces given by f. At the same time, control actions will also bring about a similar influence on the state itself. This suggests that one may use either of the cost functionals (15.3a) or (15.3b). First we consider the cost functional (15.3b) along with the system (15.4). This is a very realistic problem. Since the estimate is available and F_t^y adapted, this is a fully observed problem. Recall that the process V̄ given by

dV̄ = σ_0^{-1}(dy - Hx̂ dt),
(15.6)
is an innovation process, and it is a Brownian motion having the same incremental covariance, R, as that of the Brownian motion V perturbing the measurement dynamics. Hence equation (15.4) can be written as

dz = Az dt + f dt + G dV̄, z(0) = x̄_0, where G = KH'R_0^{-1}σ_0,
(15.7)
In other words, we have converted our partially observed control problem into a fully observed control problem given by

dz = Az dt + f dt + G(t) dV̄, z(0) = x̄_0,
J(u) = E{ ∫_I ℓ̂(s, z(s), u(s)) ds } → inf,
(15.8)
where the infimum is taken over the class of admissible controls U_ad comprised of F_t^y measurable processes taking values from the set U. Later we show that the infimum is actually attained in a subclass Û of Markovian functionals of the process z. That is, u ∈ Û if it has the representation u(t) = F(t, z(t)) for a suitable map F : [0, T] × R^n → U. We refer to this problem as (P1). It can be solved by using Bellman's principle of optimality. For this purpose we embed this control problem into a family of problems as follows. Let t ∈ (0, T) and z(t) = x, and consider the problem

dξ(s) = A(s)ξ(s) ds + f(s, u(s)) ds + G(s) dV̄, ξ(t) = x, s > t,
C(u, t, x) = E{ ∫_t^T ℓ̂(s, ξ(s), u(s)) ds | F_t^y },
(15.9)
where ξ(s) = ξ^u_{t,x}(s), s ∈ (t, T], is the unique solution, corresponding to the control u ∈ U_ad, of the SDE (15.9a), starting from the state x at time t. Corresponding to the control policy u, the functional C(u, t, x) denotes the average running cost over the interval [t, T] starting from the state x. According to Bellman's principle of optimality, if z has reached the state x at time t, the choice of future control actions should be such that it minimizes the cost for the remaining time period. Denote

φ(t, x) = inf{C(u, t, x), u ∈ U_ad}.
(15.10)
For the moment, suppose the infimum is attained at u° ∈ U_ad, so that φ(t, x) = C(u°, t, x). Clearly then, for t = 0 and x = x̄_0, we have

J(u°) = φ(0, x̄_0),
(15.11)
provided this function is continuous in all its arguments on I × R^n. The function φ is called the value function. We assume that φ ∈ C^{1,2}(I × R^n) and show that it satisfies a semilinear partial differential equation called the Bellman equation. We use the Ekeland variational principle. Let u° ∈ U_ad denote the optimal control and let ξ°_{t,x} denote the unique solution of (15.9a) corresponding to u°. Here {t, x} is fixed but arbitrary. Let v be a U valued random variable measurable with respect to the sigma algebra F_t^y. Define a new control

u(s) = v for s ∈ [t, t + Δt); u(s) = u°(s) for s ∈ [t + Δt, T].
(15.12)
Clearly, by virtue of the optimality principle,

φ(t, x) ≤ E{ ∫_t^{t+Δt} ℓ̂(s, ξ^v_{t,x}(s), v) ds + φ(t + Δt, ξ^v_{t,x}(t + Δt)) | F_t^y },
(15.13)
where ξ^v_{t,x}(t + Δt) denotes the state attained by the solution of equation (15.9a) at time t + Δt due to the control action v. This is given by the approximation

ξ^v_{t,x}(t + Δt) ≈ x + (A(t)x + f(t, v))Δt + G(t)(V̄(t + Δt) - V̄(t))
≡ x + (A(t)x + f(t, v))Δt + G(t)ΔV̄. Then, using the Lagrange formula, one can easily verify that

φ(t + Δt, ξ^v_{t,x}(t + Δt)) = φ(t + Δt, x + (A(t)x + f(t, v))Δt + G(t)ΔV̄) ≈ φ(t + Δt, x) + (φ_x(t + Δt, x), (A(t)x + f(t, v))Δt) + (G'(t)φ_x, ΔV̄) + (1/2)⟨φ_xx G(t)ΔV̄, G(t)ΔV̄⟩ + o(Δt).
(15.14)
Substituting this in (15.13), and using the fact that ΔV̄ has zero mean and covariance RΔt, we obtain

φ(t, x) ≤ φ(t + Δt, x) + E{ ∫_t^{t+Δt} ℓ̂(s, ξ^v_{t,x}(s), v) ds | F_t^y } + (φ_x(t + Δt, x), A(t)x + f)Δt + (1/2)Tr(φ_xx(t + Δt, x)(GRG'))Δt + o(Δt).
(15.15)
For convenience, let Dφ and D^2φ denote the gradient vector and the matrix of second partials of φ, respectively, and let M(F_t^y, U) denote the class of F_t^y measurable U valued random variables. Dividing both sides of the inequality (15.15) by Δt and letting Δt → 0, we have, for all (t, x) ∈ I × R^n,

0 ≤ (L^v φ)(t, x) + ℓ̂(t, x, v), (t, x) ∈ I × R^n, ∀v ∈ M(F_t^y, U),
(15.16)
where

(L^v φ)(t, x) = (∂/∂t)φ + (Dφ, A(t)x + f(t, v)) + (1/2)Tr((D^2φ)GRG') = (∂/∂t)φ + A^v φ,

where A^v is the infinitesimal generator of the Markov process z. This follows from the right continuity of F_t^y, the measurability of f in the first argument and continuity in the second, the almost sure continuity of the process ξ, and the assumption that φ ∈ C^{1,2}(I × R^n). Now suppose there exists a control u° ∈ U_ad at which the inequality (15.16) turns into an equality. That is,

0 = (L^{u°} φ)(t, x) + ℓ̂(t, x, u°(t)), (t, x) ∈ I × R^n.
(15.17)
We show that any such control must be optimal. Consider the solutions z° and ξ°_{t,x} of equations (15.8) and (15.9), respectively, corresponding to the same control u°. Taking the Ito derivative of φ(s, ξ°_{t,x}(s)) along this trajectory and integrating over the interval [t, T], we have

-φ(t, x) = ∫_t^T (L° φ)(s, ξ°_{t,x}(s)) ds + ∫_t^T (G'Dφ(s, ξ°_{t,x}(s)), dV̄(s)).
(15.18)
Since z°(t) = x, it follows from uniqueness of the solution that ξ°_{t,x}(s) = z°(s), s ≥ t, and thus equation (15.18) is identical to the following expression:

-φ(t, z°(t)) = ∫_t^T (L° φ)(s, z°(s)) ds + ∫_t^T (G'Dφ(s, z°(s)), dV̄(s)).
(15.19)
Taking the expectation of either side and letting t → 0, it follows from (15.19) that

-φ(0, x̄_0) = E{ ∫_0^T (L° φ)(s, z°(s)) ds }.
(15.20)
Using (15.17) in (15.20) and recalling equation (15.11), we have

J(u°) = φ(0, x̄_0) = E{ ∫_0^T ℓ̂(t, z°(t), u°(t)) dt }.
(15.21)
Therefore, any admissible control satisfying the equality (15.17) is an optimal control. Thus we have proved the following result.

Theorem 1 Suppose there exists a control u° ∈ U_ad and a function φ ∈ C^{1,2}(I × R^n), satisfying the terminal condition φ(T, x) = 0, x ∈ R^n, so that

0 = (L^{u°} φ)(t, x) + ℓ̂(t, x, u°(t)), (t, x) ∈ [0, T) × R^n,
0 ≤ (L^v φ)(t, x) + ℓ̂(t, x, v), (t, x) ∈ [0, T) × R^n, ∀v ∈ U.
(15.22)
Then u° is optimal.

This result is equivalent to the well known Bellman equation. Indeed, by virtue of the expressions (15.22), the question of existence of a function φ, as stated in the above result, is equivalent to that of existence of a solution of the Hamilton-Jacobi-Bellman (HJB) equation given by

(∂/∂t)φ + inf_{u∈U} {(Dφ, Ax + f(t, u)) + ℓ̂(t, x, u)} + (1/2)Tr((D^2φ)GRG') = 0, φ(T, x) = 0.
(15.23)

Define the Hamiltonian

M(t, x, u, q) = (q, Ax + f(t, u)) + ℓ̂(t, x, u), (t, x, u, q) ∈ I × R^n × U × R^n.
(15.24)
Recall that f is measurable in the first variable and continuous in the second, and ℓ̂ is also measurable in the first variable and continuous in the rest of its arguments. Since U is compact, this implies that for almost all t ∈ I, and for each (x, q) ∈ R^n × R^n, the Hamiltonian, considered as a function of u, is continuous and hence attains its minimum on U. Thus the infimum in the Bellman equation can be replaced by the minimum. Further, if u → M(t, x, u, q) is strictly convex, or equivalently M_uu is a positive definite matrix, that is, M_uu > 0, for almost all t and for all (x, u, q), then there exists a unique point u* = η(t, x, q) ∈ U, usually dependent on the variables indicated, such that

inf_{u∈U} M(t, x, u, q) = M(t, x, η(t, x, q), q).
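When ℓ̂ is quadratic in u and U is a box (coordinate-wise bounds), the selection map η has a closed form: the unconstrained stationary point clipped to the box. A sketch under those assumptions (exact for diagonal N; for a general N the clipping is only a projection heuristic; all names are illustrative):

```python
import numpy as np

def eta(q, B, N, lo, hi):
    """Minimize (q, B u) + (1/2)(N u, u) over the box [lo, hi]^d.

    Unconstrained minimizer is -N^{-1} B' q; clip to the box
    (exact componentwise when N is diagonal).
    """
    u_star = -np.linalg.solve(N, B.T @ q)
    return np.clip(u_star, lo, hi)
```

For a compact non-box U one would instead minimize over a grid or use a projected gradient step; the point is only that η(t, x, q) is an explicit function fed back into the Bellman equation below.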
Using this function η one can rewrite the Bellman equation as a semilinear PDE:

(∂/∂t)φ + (1/2)Tr((D^2φ)GRG') + M(t, x, η(t, x, Dφ), Dφ) = 0, φ(T, x) = 0.
(15.25)
Linear and Nonlinear Filtering
209
of existence of a (classical) solution 0, that is, one that belongs to the class C 1 ' 2 and satisfies equation (15.25) every where, is very crucial. In fact it is very rare that such a solution exists. Very stringent assumptions on the parameters {A,/,cr,H,a 0 l l} are required for the existence of such smooth solutions. For details on the regularity conditions required of the parameters mentioned above see the excellent review article due to Wonham [75]. In fact Wonham imposed all the necessary assumptions on these parameters so that the existence results for semilinear parabolic equations appearing in the PDE literature are applicable [see Ladyzenskaja et al, 60]. In any case, if equation (15.25) has a classical solution 0°, then the optimal control is given by u°(t) =V(t,z0(t),D(j>(t,z0(t))
= fj{t,z°{t)),
(15.26)
where z° is the solution of the estimator equation (15.7) written as dz = Azdt + f(t, rj(t, z))dt + GdV, z(0) = x 0 .
(15.27)
Therefore fj provides the feedback control. Thus theorem 1 can be restated as follows. T h e o r e m 2 Suppose U is compact and convex, and the Hamiltonian M is continuous on U and strictly convex. Then, if the equation (15.25) has a solution (j) e C 1 ' 2 , the optimal control u° is given by u°(t) = r}(t,z(t)), where z is the solution of equation (15.27). With the development of generalized solutions including weak solutions, these assumptions are no longer necessary. In fact now we have the concept of viscosity solution, most appropriate for the Hamilton-Jacobi-Bellman equation see [1],[38] and the references therein. We will have some more comments on the question of solutions of HJB equation later. Now we consider the genuine partially observed control problem and refer to this as (P2). This is the same problem as above with the exception that now the objective functional is given by the expression (15.3a) instead of (15.3b). Our objective here is to show that this partially observed problem can be converted to a fully observed one as treated above. Subtracting equation (15.4) from equation (15.1) and using equation (15.2), one can easily verify that the difference (x — x) = e satisfies the differential equation d(x - x) = (A - TH)(x - x)dt + adW - Ta0dV, x(0) — x(0) =
XQ
—
XQ,
(15.28)
Partially Observed Control
210
where T = KH' RQ1. Recall that XQ is Gaussian with mean x 0 and covariance P0 = K0. Prom equation (15.28) it follows that the conditional probability law of x(t) given x(t) is Gaussian. Precisely, the probability law of x(t), given that x(t) = C, is Gaussian with mean £ and covariance K(t) which is also the covariance of the error e(t). Let 9{t, x, C) =
,„ * exp - {{^(K^it^x U (27r)^VdetX(t) 2A denote the corresponding density. Hence for fixed u yK
W
E{Z(t,x(t),u)\x(t)=C>}
= I
e(t,x,u)g(t,x,C)dx
- C), x - C)} VA
= itt,(;,u).
(15.29)
In view of this, the objective functional (15.3a) can be written as

J(u) = E\left\{\int_0^T \ell(t, x(t), u(t))\,dt\right\} = E\left\{\int_0^T E\{\ell(t, x(t), u(t))\mid \hat x(t)\}\,dt\right\} = E\left\{\int_0^T \tilde\ell(t, \hat x(t), u(t))\,dt\right\}.    (15.30)
This shows that the original partially observed control problem associated with (15.1), (15.2) and (15.3a) has been converted to a fully observed problem described by

d\hat x = A\hat x\,dt + f\,dt + G\,d\nu, \quad \hat x(0) = \bar x_0, \qquad G = KH'R_0^{-1}\sigma_0,    (15.31)
with the objective functional given by

\tilde J(u) = E\int_0^T \tilde\ell(t, \hat x(t), u(t))\,dt.    (15.32)

Thus the solution of this control problem is again given by the solution of the HJB equation (15.25), where in the definition of the Hamiltonian M given in (15.24), \ell is replaced by \tilde\ell. Under a number of very strong assumptions on the parameters, Wonham proved that the optimal control actually lies in a subset \mathcal U \subset U_{ad}. This is the celebrated separation theorem originally proved by Wonham. For further details see [75].

LQGR Problem. As an immediate application of the above result, we consider the so called linear quadratic Gaussian regulator problem. The system is given by

dx(t) = A(t)x(t)\,dt + B(t)u(t)\,dt + \sigma(t)\,dW(t), \quad x(0) = x_0,    (15.33)
with the same measurement dynamics as described by equation (15.2). The cost functional is given by

J(u) = (1/2)\,E\int_0^T \{(N_0 x, x) + (Nu, u)\}\,dt,    (15.34)

where N_0 \in M^+(n \times n) and N \in M^+(d \times d) are positive definite and U = R^d. We call this problem (P3). Note that

E(N_0 x, x) = \mathrm{Tr}(N_0 K) + (N_0 \hat x, \hat x).
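This identity can be checked numerically: for a Gaussian vector x with mean \bar x and covariance K, E(N_0 x, x) = \mathrm{Tr}(N_0 K) + (N_0\bar x, \bar x). The mean, covariance and weight matrix below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: mean, covariance and a positive definite weight N0.
xbar = np.array([1.0, -2.0])
K = np.array([[2.0, 0.5], [0.5, 1.0]])
N0 = np.array([[3.0, 1.0], [1.0, 2.0]])

# Monte Carlo estimate of E (N0 x, x) for x ~ N(xbar, K).
x = rng.multivariate_normal(xbar, K, size=200_000)
mc = np.mean(np.einsum("ij,jk,ik->i", x, N0, x))

# Closed form: Tr(N0 K) + (N0 xbar, xbar).
exact = np.trace(N0 @ K) + xbar @ N0 @ xbar

print(mc, exact)  # the two agree to Monte Carlo accuracy
```

The `einsum` contraction evaluates the quadratic form x'N_0x row by row over the sample.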
Thus the cost functional (15.34) takes the form

J(u) = (1/2)\int_0^T \mathrm{Tr}(N_0 K)\,ds + (1/2)\,E\int_0^T \{(N_0\hat x, \hat x) + (Nu, u)\}\,ds,    (15.35)

and hence

\tilde\ell(t, z, u) = (1/2)\,\mathrm{Tr}(N_0 K) + (1/2)\{(N_0 z, z) + (Nu, u)\}.
The Hamiltonian is given by

M(t, z, u, q) = (q, Az + Bu) + (1/2)\,\mathrm{Tr}(N_0 K) + (1/2)\{(N_0 z, z) + (Nu, u)\}.    (15.36)

Since U = R^d, minimizing the Hamiltonian one obtains u = -N^{-1}B'q. Substituting this in the HJB equation (15.25) we have

(\partial/\partial t)\phi + (1/2)\,\mathrm{Tr}((D^2\phi)GRG') + M(t, x, -N^{-1}B'D\phi, D\phi) = 0, \quad \phi(T, x) = 0.    (15.37)

Since the cost integrand is quadratic, for suitable \{P, p, r\}, the value function \phi has the following form:

\phi(t, z) = (1/2)(P(t)z, z) + (p(t), z) + r(t).    (15.38)
Substituting this in equation (15.37) and equating quadratic, linear and constant terms to zero individually, one obtains for \{P, p, r\} the following set of equations:

\dot P + PA + A'P + N_0 - PBN^{-1}B'P = 0, \quad P(T) = 0,
\dot p + (A' - PBN^{-1}B')p = 0, \quad p(T) = 0,
\dot r + (1/2)\,\mathrm{Tr}(PGRG' + N_0 K) - (1/2)(BN^{-1}B'p, p) = 0, \quad r(T) = 0.    (15.39)
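The first (Riccati) equation of (15.39) can be integrated numerically backward from P(T) = 0. A minimal sketch with explicit Euler stepping; all matrices below are illustrative stand-ins, not taken from the text.

```python
import numpy as np

# Illustrative time-invariant data.
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B  = np.array([[0.0], [1.0]])
N0 = np.eye(2)          # state weight
N  = np.array([[1.0]])  # control weight
T, steps = 5.0, 5000
dt = T / steps

# Sweep backward in time for  Pdot + PA + A'P + N0 - P B N^{-1} B' P = 0, P(T) = 0:
# P(t - dt) = P(t) - dt * Pdot(t), with Pdot = -(PA + A'P + N0 - P B N^{-1} B' P).
P = np.zeros((2, 2))
for _ in range(steps):
    Pdot = -(P @ A + A.T @ P + N0 - P @ B @ np.linalg.inv(N) @ B.T @ P)
    P = P - dt * Pdot

# Feedback gain of Theorem 3: u = -N^{-1} B' P x_hat.
gain = -np.linalg.inv(N) @ B.T @ P
print(P, gain)
```

At t = 0 the sweep yields a symmetric positive definite P, from which the LQG gain follows directly.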
The terminal conditions follow from equation (15.37). Being homogeneous with zero terminal condition, the second equation implies that p(t) = 0 for all t. Finally it follows from the third equation of (15.39) and equation (15.38) that the optimal cost is given by

J(u^0) = \phi(0, \bar x_0) = (1/2)(P(0)\bar x_0, \bar x_0) + (1/2)\int_0^T \mathrm{Tr}\{P(s)GRG'(s) + N_0(s)K(s)\}\,ds.    (15.40)
Thus we have the following result.

Theorem 3 The solution for the linear quadratic Gaussian regulator problem (15.33), (15.2) and (15.34) is given by the following set of equations:

\dot P + PA + A'P + N_0 - PBN^{-1}B'P = 0, \quad P(T) = 0,
u^0 = -N^{-1}B'P\hat x,
J(u^0) = (1/2)(P(0)\bar x_0, \bar x_0) + (1/2)\int_0^T \mathrm{Tr}\{P(s)GRG'(s) + N_0(s)K(s)\}\,ds,    (15.41)
where K is the solution of equation (15.5).

15.3 Linear Systems with Dynamic Observation

The system is governed by the following set of equations

dx = (Ax + Dy)\,dt + f(t, u(t))\,dt + \sigma\,dW, \quad x(0) = x_0,
dy = (Hx + Cy)\,dt + \sigma_0\,dV, \quad y(0) = 0,    (15.42)

with the objective functional given by (15.3a). Let us call this problem (P4). Since the control is \mathcal F_t^y measurable, f is also \mathcal F_t^y adapted. Hence this is similar to the model B of chapter 4. According to theorem 2 of Chapter 4, the optimal filter is given by

d\hat x = A\hat x\,dt + (f + Dy(t))\,dt + \Gamma(dy - Cy\,dt - H\hat x\,dt), \quad \hat x(0) = \bar x_0,    (15.43)

where \Gamma = KH'R_0^{-1}, with K being the solution of equation (15.5). Again, introducing the innovation process \nu given by

d\nu(t) = \sigma_0^{-1}(dy(t) - (Cy(t) + H\hat x(t))\,dt),    (15.44)
we can write the estimator equation as

d\hat x = A\hat x\,dt + (f + Dy(t))\,dt + \Gamma\sigma_0\,d\nu = A\hat x\,dt + (f + Dy)\,dt + G(t)\,d\nu, \quad \hat x(0) = \bar x_0,    (15.45)
where G is defined by (15.7b). Again one can easily verify that \nu is an \mathcal F_t^y Brownian motion with incremental covariance given by R. This problem is very similar to the partially observed problem (P2) treated in the previous section. Define the Hamiltonian M by

M(t, x, y, u, q) = (q, Ax + f(t, u) + D(t)y) + \ell(t, x, u),    (15.46)

for (t, x, y, u, q) \in I \times R^n \times R^m \times U \times R^n, and the mapping \eta: I \times R^n \times R^m \times R^n \to U by

\inf_{u \in U} M(t, x, y, u, q) = M(t, x, y, \eta(t, x, y, q), q).

Thus we have proved the following result.

Theorem 4 The necessary condition of optimality for the problem (P4) is that the HJB equation given by

(\partial/\partial t)\phi + (1/2)\,\mathrm{Tr}((D^2\phi)GRG') + M(t, x, y(t), \eta(t, x, y(t), D\phi), D\phi) = 0, \quad \phi(T, x) = 0,    (15.47)

has a classical solution.

Note that the coefficients of equation (15.47) are dependent on the observed data. In fact one may also allow the cost integrand to depend on the observation, reflecting the measurement cost. The reader is encouraged to construct a feedback regulator as in theorem 3.

15.4 Fully Observed Nonlinear Systems

In this section we consider an optimal control problem for fully observed nonlinear systems. In general the system is governed by the following SDE:

dx = b(x, u)\,dt + \sigma(x, u)\,dW, \quad t \ge 0, \quad x(0) = x_0.    (15.48)
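A sample path of a controlled diffusion of the form (15.48), under a state feedback u(t) = v(t, x(t)), can be generated by the Euler-Maruyama scheme. The scalar drift, diffusion and feedback law below are illustrative stand-ins, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar coefficients and a saturated linear feedback into U = [-1, 1].
b     = lambda x, u: u * x - 0.1 * x**3        # drift b(x, u)
sigma = lambda x, u: 0.3 * np.sqrt(1 + x**2)   # diffusion sigma(x, u)
v     = lambda t, x: np.clip(-x, -1.0, 1.0)    # feedback law v(t, x) with values in U

T, steps = 1.0, 1000
dt = T / steps
x = np.empty(steps + 1)
x[0] = 0.5
for k in range(steps):
    u = v(k * dt, x[k])
    dW = rng.normal(scale=np.sqrt(dt))         # Brownian increment
    x[k + 1] = x[k] + b(x[k], u) * dt + sigma(x[k], u) * dW

print(x[-1])
```

The same loop, repeated over many independent paths, gives Monte Carlo estimates of cost functionals such as (15.49).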
Let U_{ad} denote the class of admissible controls. This consists of the class of measurable functions defined on I, taking values from a compact convex subset U \subset R^d, adapted to at most the current information about x(t). That is, u(t) has the representation u(t) = v(t, x(t)) for some Borel measurable function v: I \times R^n \to U. We assume that for any such admissible control, the system (15.48) has a unique solution \{x(t), t \in I\} having continuous sample paths and bounded second moments (see chapter 2). The objective functional given by

J(u) = E\int_0^T \ell(t, x, u)\,dt    (15.49)

is to be minimized over the class U_{ad}. Let \phi denote the value function given by

\phi(t, x) = \inf\left\{E\left\{\int_t^T \ell(s, x(s), u(s))\,ds \mid x(t) = x\right\}, \; u \in U_{ad}\right\}.

Again following Bellman's principle of optimality as in section 15.2, one can verify (formally) that \phi satisfies the HJB equation

(\partial/\partial t)\phi + \inf_{u \in U}\left\{(D\phi, b(x, u)) + \ell(t, x, u) + (1/2)\,\mathrm{Tr}(D^2\phi\, a(x, u))\right\} = 0, \quad \phi(T, x) = 0,    (15.50)

where a denotes the diffusion matrix given by a(x, u) = \sigma(x, u)Q\sigma'(x, u), with Q being the incremental covariance of the Wiener process W. This is a quasilinear partial differential equation, usually a very difficult problem in the field of PDE. The question of existence, uniqueness and regularity properties of solutions of such equations is far outside the scope of this book. For further discussion see the concluding remarks (comments 4, 5) at the end of this chapter. Instead we shall treat the problem as an optimal control problem of the forward or the backward Kolmogorov equation driven by vector valued controls. Let U_{ad} denote the class of admissible controls. For u \in U_{ad} the backward Kolmogorov operator is given by

A(u)\phi = (1/2)\,\mathrm{Tr}(D^2\phi\, a(x, u)) + (D\phi, b(x, u)).
(15.51)
The forward operator, as defined by equation (14.7) of chapter 14, or equivalently its adjoint, is given by

A^*(u)\phi = (1/2)\sum_{i,j=1}^n D_iD_j(a_{ij}\phi) - \sum_{i=1}^n D_i(b_i\phi),
(15.52)
where \tilde b_i = -b_i + \sum_j D_j(a_{ij}). Note that the last expression is written in the divergence form. Let u \in U_{ad} be a fixed control and P_t^u the probability measure induced by the solution process of equation (15.48) at time t \ge 0, corresponding to the initial probability measure P_0 induced by the initial state x_0. Assuming that P_0 has a density p_0, one can demonstrate, under some suitable assumptions on the coefficients, that P_t^u has a density p_t^u and that it is given by the solution of the forward Kolmogorov equation (see chapter 2),

\frac{\partial p}{\partial t} = A^*(u)p, \quad p(0) = p_0.
(15.53)
Again under certain suitable assumptions, given that p_0 \in H = L^2(R^n), we also have p_t^u \in H. Hence the cost functional given by (15.49) can be written as

J(u) = \int_0^T \ell(t, p_t^u, u)\,dt = \int_0^T \left\{\int_{R^n} \ell(t, \xi, u)\,p_t^u(\xi)\,d\xi\right\}dt = \int_0^T (\ell(t, \cdot, u), p_t^u(\cdot))_H\,dt,    (15.54)
where we have used the standard notation for scalar products in H,
The problem is to find a control that minimizes (15.54) subject to the dynamics (15.53). Thus we have the following result.

Lemma 5 The stochastic control problem (15.48)-(15.49) in R^n is equivalent to the deterministic control problem (15.53)-(15.54) in the infinite dimensional Hilbert space H.

According to this result it suffices to develop necessary conditions of optimality for the problem (15.53)-(15.54). We have seen in chapter 14 that, under the assumptions of Lemma 1 (see (A1) and (A2) of equation 14.11, chapter 14), for each fixed u \in U, the operator A^*(u) is the generator of a C_0-semigroup in H. Here we assume that, for each u \in U_{ad}, A^*(u(t)) generates a strongly continuous evolution operator in H. Thus corresponding to each p_0 \in H, equation (15.53) has a unique solution p^u \in L^2(I, V) \cap C(I, H). In fact this is a special case of the general theory of differential equations involving coercive operators. We present here for easy reference the following abstract result.
Lemma 6 Consider the evolution equation

\dot z = A(t)z + f, \quad z(0) = z_0,    (15.55)

and suppose that there exist constants \{c > 0, \lambda > 0, \alpha > 0\} such that for all t \ge 0 the operator A(t) satisfies the following estimates:

-\langle A(t)v, v\rangle_{V^*,V} + \lambda|v|_H^2 \ge \alpha\|v\|_V^2 \quad \forall v \in V,
|\langle A(t)v, w\rangle_{V^*,V}| \le c\,\|v\|_V\,\|w\|_V \quad \forall v, w \in V.
Then for each z_0 \in H and f \in L^2(I, V^*), equation (15.55) has a unique solution z \in L^2(I, V) \cap C(I, H). Further, the map f \to z is a continuous linear map from L^2(I, V^*) to L^2(I, V). For reference, see [2], [3].

In order to develop necessary conditions of optimality we need some regularity assumptions on the coefficients of the operator A^*(u). We deliberately avoid rigorous justification of all the steps, since this would carry us far from the major objective of this book. Further, we may assume that an optimal control exists. For details on this question the interested reader is referred to the literature [3],[4],[5],[6],[20],[65],[66]. We also assume that the parameters b, \sigma are once continuously differentiable on R^n \times U with the derivatives being bounded. Let u^0, u \in U_{ad} and suppose that u^0 is the optimal control. Since U is closed and convex, u^\varepsilon = u^0 + \varepsilon(u - u^0) \in U_{ad} for 0 \le \varepsilon \le 1, and hence

J(u^0 + \varepsilon(u - u^0)) \ge J(u^0).    (15.57)

Let p^\varepsilon, p^0 denote the solutions of equation (15.53) corresponding to the controls u^\varepsilon and u^0 respectively. Subtracting equation (15.53) corresponding to the control u^0 from that corresponding to control u^\varepsilon, dividing by \varepsilon and letting \varepsilon \to 0, one can verify that

q^\varepsilon = (1/\varepsilon)(p^\varepsilon - p^0) \to q \quad \text{in } L^2(I, V) \cap C(I, H),

where q satisfies the differential equation

\dot q = A^*(u^0)q + dA^*(u^0, u - u^0)p^0, \quad q(0) = 0.    (15.58)

Here dA^*(u^0, u - u^0) denotes the Gateaux differential of the operator valued function u \to A^*(u) at u^0 in the direction u - u^0. Note that, under the differentiability assumptions on the coefficients \{a, b\}, A^* \in C^1(U, \mathcal L(V, V^*)). Define the function f by f(t) = dA^*(u^0(t), u(t) - u^0(t))p^0(t). Hence one can verify
that, for p^0 \in L^2(I, V) \cap C(I, H), we have f \in L^2(I, V^*). Thus it follows from Lemma 6 that equation (15.58) has a unique solution q \in L^2(I, V) \cap C(I, H). From (15.57) and (15.54) it follows that

0 \le dJ(u^0, u - u^0) = \lim_{\varepsilon\downarrow 0}\left\{(1/\varepsilon)\left[J(u^0 + \varepsilon(u - u^0)) - J(u^0)\right]\right\} = \int_0^T \left\{\ell(t, q_t, u^0) + (\ell_u(t, p_t^0, u^0), u - u^0)\right\}dt,    (15.59)

where

\ell(t, q_t, u^0(t)) = \int_{R^n} \ell(t, x, u^0(t))\,q_t(x)\,dx = \langle \ell(t, \cdot, u^0(t)), q_t\rangle_{V^*,V},
\ell_u(t, p_t^0, u^0(t)) = \int_{R^n} \ell_u(t, x, u^0(t))\,p_t^0(x)\,dx = \langle \ell_u(t, \cdot, u^0(t)), p^0(t)\rangle_{V^*,V}.    (15.60)

Our notation above clearly suggests that we have used the assumption that, for each u \in U_{ad}, \ell(\cdot,\cdot,u), \ell_u(\cdot,\cdot,u) \in L^2(I, H). Hence L, given by L(q) = \int_0^T \ell(t, q_t, u^0)\,dt, is a continuous linear functional of q on L^2(I, V). On the other hand it follows from (15.58) and Lemma 6 that dA^*(u^0, u - u^0)p^0 \to q is a continuous linear map from L^2(I, V^*) to L^2(I, V). Hence there exists a \phi \in L^2(I, V) such that

L(q) = \int_0^T \ell(t, q_t, u^0)\,dt = \int_0^T \langle \phi(t), dA^*(u^0(t), u(t) - u^0(t))p^0(t)\rangle_{V,V^*}\,dt.    (15.61)

The function \phi is the adjoint variable, uniquely determined by the solution of the equation

-\frac{d}{dt}\phi = A(u^0)\phi + \ell(t, \cdot, u^0(t)), \quad t \in [0, T), \quad \phi(T) = 0.    (15.62)

Substituting (15.61) into (15.59) we conclude that, for all u \in U_{ad},

\int_0^T \left\{\langle \phi(t), dA^*(u^0(t), u(t) - u^0(t))p^0\rangle_{V,V^*} + (\ell_u(t, p^0, u^0), u - u^0)\right\}dt \ge 0.    (15.63)
Introduce the Hamiltonian

H(t, u, \eta, \zeta) = \langle \eta, A^*(u)\zeta\rangle_{V,V^*} + \langle \ell(t, \cdot, u), \zeta\rangle_{V^*,V} = \langle \eta, A^*(u)\zeta\rangle_{V,V^*} + \ell(t, \zeta, u), \quad (t, u, \eta, \zeta) \in I \times U \times V \times V.    (15.64)
Thus we have proved the following necessary conditions of optimality.

Theorem 7 Suppose the elements of \{a, b\} are C^1(R^n \times U) and further they satisfy the assumptions (A1) and (A2) of Lemma 1 (chapter 14) uniformly on R^n \times U. Then, in order that \{u^0, p^0\} \in U_{ad} \times L^2(I, V) be an optimal control-trajectory pair, it is necessary that there exists a \phi \in L^2(I, V) \cap C(I, H) such that

\int_0^T (H_u(t, u^0(t), \phi(t), p^0(t)), u(t) - u^0(t))\,dt \ge 0 \quad \forall u \in U_{ad},    (15.65)

where \phi is the unique solution of the adjoint equation

-(d/dt)\phi = A(u^0)\phi + \ell(t, \cdot, u^0), \quad \phi(T) = 0,    (15.66)

and p^0 is the solution of the forward Kolmogorov equation

(d/dt)p = A^*(u^0)p, \quad p(0) = p_0.    (15.67)

Comment 1 (Terminal Cost) If a terminal cost E\{F(x(T))\} is added to (15.49), then (15.54) must be modified by adding the term

\int_{R^n} F(x)\,p_T^u(x)\,dx.
This modifies the adjoint equation (15.66) by the nonzero terminal condition \phi(T)(\cdot) = F(\cdot). The integral is well defined for F \in V^*. For necessary conditions of optimality for jump Markov processes see [15].

15.5 Computational Methods

On the basis of the results of Theorem 7, we can develop two algorithms: one for open loop controls and one for feedback controls.

Algorithm A (Open Loop Control).

Step 1 Choose any u_0 \in U_{ad} for n = 0, and suppose at the nth stage u_n \in U_{ad} has been obtained.

Step 2 Use u_n in place of u^0 to solve equations (15.66) and (15.67), giving \phi_n and p_n.

Step 3 Use these to compute H_u(t, u_n(t), \phi_n(t), p_n(t)) = H_n(t) of inequality (15.65).
Step 4 Define u_{n+1} = u_n - \varepsilon H_n for \varepsilon > 0 suitably small, so that u_{n+1}(t) \in U for almost all t \in I.

Step 5 Compute J(u_{n+1}) = J(u_n) - \varepsilon\|H_n\|^2 + o(\varepsilon), for \varepsilon suitably small.

Step 6 Stop if a stopping criterion has been met; otherwise go to step 2 with the new control u_{n+1}.

Comment 2 In case, at any stage n and t \in I, u_n(t) \in \partial U, the boundary of U, and H_n(t) is also outwardly directed, one must set u_{n+1}(t) = u_n(t); otherwise follow step 4.

Algorithm B (Feedback Control) For feedback controls we define

G = G(t, x, u(t, x), \phi(t, x), p(t, x)) = p(t, x)A(u)\phi(t, x) + \ell(t, x, u)\,p(t, x).
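The iteration of Algorithm A can be sketched on a toy finite-dimensional surrogate in which the gradient H_n is available in closed form; the quadratic objective, target profile and admissible interval below are illustrative only, not part of the text.

```python
import numpy as np

# Toy surrogate for Algorithm A: minimize J(u) = (1/2) * sum((u(t) - target(t))^2) * dt
# over controls constrained to U = [-1, 1], on a uniform time grid.
steps, dt = 200, 0.01
t = np.linspace(0.0, 2.0 * np.pi, steps)
target = np.sin(t)                  # profile the iterates should approach

def grad_J(u):
    # Stand-in for H_n(t) = H_u(t, u_n(t), phi_n(t), p_n(t)) of Steps 2-3.
    return (u - target) * dt

u = np.zeros(steps)                 # Step 1: initial control u_0
eps = 0.5                           # step size of Step 4
for _ in range(5000):
    Hn = grad_J(u)                  # Steps 2-3 (closed form in this surrogate)
    u = np.clip(u - eps * Hn, -1.0, 1.0)   # Step 4, projected back onto U
    if np.linalg.norm(Hn) < 1e-10:         # Step 6: stopping criterion
        break

print(np.max(np.abs(u - target)))
```

In the actual algorithm each iteration would require solving the adjoint equation (15.66) and the forward equation (15.67) to evaluate H_n; here that evaluation is replaced by a closed-form gradient so the descent loop itself can be run.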
Using this expression we can rewrite (15.63) explicitly as a space-time integration in place of the abstract form (15.65), giving

\int_{I \times B_r} \langle G_u(t, x, u^0(t, x), p^0(t, x), \phi(t, x)), \; u(t, x) - u^0(t, x)\rangle_{R^d}\,dx\,dt \ge 0,    (15.68)

where B_r = \{x \in R^n : |x| < r\}, r < \infty. We state the necessary modifications of the preceding algorithm. For U_{ad} take all Borel measurable functions from I \times R^n to U. Replace H by G and H_u by G_u. For step 3, define G_n(t, x) = G_u(t, x, u_n(t, x), \phi_n(t, x), p_n(t, x)). For step 4, define u_{n+1}(t, x) = u_n(t, x) - \varepsilon G_n(t, x), keeping in mind comment 2. Before concluding this section we would like to mention that the control problem comprised of (15.53)-(15.54), involving the FKE (forward Kolmogorov equation), is equivalent to the following control problem involving the BKE (backward Kolmogorov equation):

-(\partial/\partial t)\phi^u = A(u)\phi^u + \ell(t, x, u), \quad \phi^u(T, x) = 0,
J(u) = \int_{R^n} \phi^u(0, x)\,p_0(x)\,dx \to \inf.
(15.69)
This follows from the Ito differential formula. The same set of necessary conditions, exactly as stated in Theorem 7, can be derived from the control problem (15.69). We invite the reader to verify this.

15.6 Partially Observed Nonlinear Systems

In this section we consider control of partially observed nonlinear systems.
Unfortunately the separation theory due to Wonham, as discussed in section 2, does not hold for nonlinear systems. Here we force a separation and consider the Zakai equation as the state equation, which is to be controlled optimally so as to minimize a certain cost functional. We consider the system

dx = b(x, u)\,dt + \sigma(x, u)\,dW, \quad x(0) = x_0,
dy = h(x)\,dt + \sigma_0(t)\,dV, \quad y(0) = 0.    (15.70)

Let \mathcal F_t^y = \sigma\{y(s), s \le t\} denote the least sigma algebra (completed) with respect to which y is measurable. Let U be a compact convex subset of R^d and U_{ad} = M(\mathcal F^y, U) the class of U-valued controls \{u(t), t \ge 0\} which are \mathcal F_t^y measurable. This is taken as the class of admissible controls. Let us consider an arbitrary but fixed control from this class, and consider the system (15.70) driven by this control. Then it follows from chapter 13 (see Theorem 2, equation 13.37B) that the unnormalized density of x(t) = x^u(t), relative to the sigma algebra \mathcal F_t^y, is given by the solution of the controlled Zakai equation

dp_t^u = A^*(u)p_t^u\,dt + p_t^u(R_0^{-1}h, dy), \quad p_0^u = p_0.    (15.71)
Since the conditional probability density of the state x^u(t), given the observation \{y(s), s \le t\}, is related to the Zakai density via \hat p_t^u = c(t)p_t^u, with c(t) = (1/\int p_t^u\,dx) being the normalizing factor, it is reasonable to consider the unnormalized cost functional

J(u) = E\int_0^T \ell(t, p_t^u, u(t))\,dt = E\int_0^T \left(\int_{R^n} \ell(t, x, u(t))\,p_t^u(x)\,dx\right)dt    (15.72)

in place of the true cost functional

J_T(u) = E\int_0^T \ell(t, \hat p_t^u, u(t))\,dt = E\int_0^T \left(\int_{R^n} \ell(t, x, u(t))\,\hat p_t^u(x)\,dx\right)dt.    (15.73)
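A one-dimensional finite-difference sketch of propagating an equation of the form (15.71) for a frozen control value: an explicit Euler step for A^* followed by the multiplicative observation update. All coefficients, and the synthetic observation increment, are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Grid and illustrative coefficients: drift b(x), squared diffusion, observation h(x).
L, nx = 6.0, 241
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
b_x  = -x          # drift under a frozen control value
sig2 = 0.5         # sigma(x)^2, constant here
h_x  = x           # observation function h(x)
r0   = 1.0         # R0, scalar observation noise covariance

dt, steps = 1e-3, 500
p = np.exp(-0.5 * (x - 1.0) ** 2)     # unnormalized initial density
p /= p.sum() * dx

for _ in range(steps):
    # A* p = (1/2) sig2 p'' - (b p)'  by central differences (p pinned to 0 at the ends).
    flux = b_x * p
    Astar = np.zeros_like(p)
    Astar[1:-1] = (0.5 * sig2 * (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dx**2
                   - (flux[2:] - flux[:-2]) / (2.0 * dx))
    # Synthetic observation increment dy ~ E[h] dt + noise (stand-in for real data).
    mean_h = h_x @ (p * dx) / (p.sum() * dx)
    dy = mean_h * dt + np.sqrt(r0 * dt) * rng.normal()
    p = p + dt * Astar + p * (h_x / r0) * dy   # Euler step plus multiplicative update
    p = np.maximum(p, 0.0)                     # crude positivity fix for the sketch

mass = p.sum() * dx
print(mass)
```

Normalizing p at any time gives the conditional density \hat p_t^u used in the true cost (15.73); the unnormalized mass itself carries likelihood information.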
Thus the original control problem of minimizing J_T over U_{ad}, subject to the dynamics (15.70), is approximately equivalent to the fully observed control problem of the Zakai equation (15.71) coupled with the cost functional given by (15.72). A necessary condition of optimality is meaningful only if existence of optimal controls is assured. We assume throughout that an optimal control exists. Readers interested in existence theory may refer to the literature [2],[3],[4],[37] for finite dimensional problems, and [6],[7] for infinite dimensional systems. Let L_2^y(I, V), L_2^y(I, H), L_2^y(I, V^*) denote the Hilbert spaces of \mathcal F_t^y-adapted random processes satisfying

E\int_I \|z\|_B^2\,dt < \infty, \quad \text{where } B = \{V, H, V^*\},
(15.74)
and C^y(I, H) \subset L_2^y(I, H) the Banach space of \mathcal F_t^y adapted H-valued processes having continuous sample paths with probability one. Here we are only concerned with the necessary conditions of optimality. Before embarking on this topic we note that, under the given assumptions, for each control u \in U_{ad}, the Zakai equation (15.71) has a unique solution p \in L_2^y(I, V) \cap C^y(I, H). This can be proved on the basis of the a priori estimate

E|p_t|_H^2 + 2\alpha E\int_0^t \|p_s\|_V^2\,ds \le |p_0|_H^2 + (2\lambda + \|h\|_\infty^2)\,E\int_0^t |p_s|_H^2\,ds \le |p_0|_H^2\exp\{(2\lambda + \|h\|_\infty^2)T\}, \quad \forall t \in I,    (15.75)

where \|h\|_\infty = \sup\{|h(x)|_{R^m}, x \in R^n\}. The first inequality follows from Ito's formula applied to the function f(p_t) = (1/2)|p_t|_H^2, and the second follows from the Gronwall inequality.

In section 15.4 we treated the forward Kolmogorov equation (15.53) as the state equation coupled with the cost functional (15.54), whereas here we consider the Zakai equation (15.71), a stochastic PDE, as the state equation and (15.72) as the cost functional. The former is a deterministic infinite dimensional problem obtained from a finite dimensional fully observed stochastic control problem, and the latter is a stochastic infinite dimensional problem obtained from a finite dimensional partially observed stochastic control problem. We follow the same procedure as in section 15.4. Let u^0 \in U_{ad} be the optimal control and u \in U_{ad} any other control. Then by convexity, u^\varepsilon = u^0 + \varepsilon(u - u^0) \in U_{ad}. Let p^\varepsilon and p^0 be the solutions of equation (15.71) corresponding to the controls u^\varepsilon and u^0 respectively. Then following the same steps as in section 15.4, one can verify that q given by
q = \lim_{\varepsilon\downarrow 0}\{(1/\varepsilon)(p^\varepsilon - p^0)\} \quad \text{in } L_2^y(I, V)

satisfies the SPDE

dq_t = A^*(u^0)q_t\,dt + dA^*(u^0, u - u^0)p^0\,dt + q_t(R_0^{-1}h, dy), \quad q_0 = 0,    (15.76)
and that the Gateaux differential of J satisfies the following inequality

dJ(u^0, u - u^0) = E\left\{\int_0^T \left[\ell(t, q_t, u^0) + (\ell_u(t, p_t^0, u^0), u - u^0)\right]dt\right\} \ge 0 \quad \forall u \in U_{ad}.    (15.77)
Again, for the SPDE

d\nu = A^*(u^0)\nu\,dt + g^0(t)\,dt + \nu(R_0^{-1}h, dy), \quad \nu(0) = 0,    (15.78)
one can show that the map g^0 \to \nu is a continuous linear map from L_2^y(I, V^*) to L_2^y(I, V). Hence, taking dA^*(u^0, u - u^0)p_t^0 for g^0(t) and q for \nu, it follows from the continuity argument that there exists a \phi \in L_2^y(I, V) such that

E\int_0^T \ell(t, q_t, u^0)\,dt = E\int_0^T \langle \phi(t), dA^*(u^0, u - u^0)p_t^0\rangle_{V,V^*}\,dt,    (15.79)

where \phi satisfies the backward stochastic PDE

-d\phi_t = A(u^0)\phi_t\,dt + \ell(t, \cdot, u^0)\,dt + \phi_t(R_0^{-1}h, dy), \quad \phi_T = 0.    (15.80)
Thus we have proved the following result.

Theorem 8 Suppose the assumptions of Theorem 7 hold, and that h is continuous and bounded from R^n to R^m. Then, in order that (u^0, p^0) \in U_{ad} \times L_2^y(I, V) be an optimal pair, it is necessary that there exists a \phi \in L_2^y(I, V) such that

E\left\{\int_0^T \langle H_u(t, u^0(t), \phi(t), p_t^0), u(t) - u^0(t)\rangle_{R^d}\,dt\right\} \ge 0 \quad \forall u \in U_{ad},    (15.81)
where \phi is the solution of equation (15.80), p^0 is the solution of equation (15.71) corresponding to the control u^0, and H is given by the expression (15.64). Hence the necessary conditions of optimality are given by (15.81), (15.80) and (15.71). This is a stochastic minimum principle which can be used to solve the LQGR problem, yielding the same set of equations as in theorem 2. For a more extensive study of partially observed control problems see the recent book due to Bensoussan [20], where many variants of the problem have been discussed. The necessary conditions given by theorem 8 are also useful for system identification, considered in the next chapter. However, their usefulness for computing optimal controls online (in real time) seems to be limited because of the adjoint equation, which is a backward stochastic differential equation. For engineering applications, design of suboptimal controls using neural networks may be much easier.

Comment 3 (Terminal Cost) If a terminal cost E\{\langle F, p_T^u\rangle\} is added to the running cost functional (15.72), then the adjoint equation (15.80) must be
modified by adding the nonzero terminal condition \phi_T = F(\cdot).

15.7 Some Examples and Discussion

Example 1 This example illustrates the fully observed problem of section 15.4:

dx = ux\,dt + \sqrt{1 + x^2}\,dw, \quad x(0) = x_0, \quad U = [-1, +1],
J(u) = (1/2)\,E\int_0^T \{\lambda_1(x - m)^2 + \lambda_2 u^2\}\,dt.    (15.82)
/ (xDcj) p + \2pu°)(u
- u°)dxdt > 0 \/u € Uad,
JR
where Uad denotes the class of all Borel measurable functions on I x R with values in U. ^From this one can easily verify that the optimal feedback control has the bang-bang structure given by the implicit relation u° = -sign (xDcj) 4- A 2 u°). If there is no control constraint, the set U = R and in this case u° = -(l/A2)x£>>, and the associated adjoint equation is given by -(d/dt)(/) = (1/2)(1 + x2)D24> - (l/2)x2(Dcf))2 + (l/2)Ai(x - m) 2 ,
teI,xeR
0(T, x) = 0, x e R. Example 2 Here we consider a partially observed problem. Population control by immigration can be modeled as dN = (ciN 4- c2N2 + uN)dt + /3dw, N(o) = N0 dy = h(N)dt 4- rdv, j/(0) = 0,
(15.83)
where u denotes (per capita) immigration rate and w and v are standard Wiener processes. The observable y is, for example, a vector of production levels of m distinct sectors of the economy which are influenced by immigra tion. The problem is to minimize the difference between a desired population
Partially Observed Control
224
level and the one actually attained by a specified time period [0,T], The cost functional is given by J{u) = (1/2)E{
[ \!U2dt + X2(N{T) Jo
Nd)2}
The admissible controls are .T7^-adapted processes taking values from the set U = [0, K] 0 < K < 1. The first term may represent cost of administration. Applying the necessary conditions of optimality given by Theorem 8, one can verify that the optimal control is of the form u° = (/s/2)(l - sign{xD(j))), where <j) must satisfy the equation -dcj) = Ul/2)02D2(t)
+ (cix + c2x2 + u°x)D(f> + (l/2)Ai(u°) 2 left (l/r2)(j)h(x)dy,
+ 0(T,x) = (l/2)A 2 (x-iV < i ) 2 , and p° is the solution of equation
dp = A*(u°)pdt +
(l/r2)phdy.
Comment 4 (HJB equation) Fully observed problems can be solved using the HJB equation (15.50). Including also the terminal cost, one can rewrite this equation in the abstract form

(\partial/\partial t)\phi + \mathcal A(\phi) = 0, \quad t \in [0, T), \qquad \phi(T, x) = \phi_0(x), \quad x \in R^n,    (15.84)
where \phi_0 denotes the terminal penalty. In general the HJB operator \mathcal A is strongly nonlinear and equation (15.84) does not have a classical solution, that is, \phi does not belong to the class C^{1,2}(I \times R^n). In recent years the concept of viscosity solutions, which are merely continuous functions on I \times R^n, has broadened the scope of the HJB equation as a powerful tool for solving fully observed control problems. However, the literature on this topic is mostly devoted to the question of existence of solutions (optimal feedback controls); see [38] and [1] and the references therein. The assumptions imposed on the basic coefficients are rather strong. The computational burden for multidimensional
problems is heavy, though with the development of supercomputers this may become easier in the near future. In the meantime, control theory must be further developed both for fully observed and partially observed problems, relaxing as far as possible many of the stringent assumptions used in this book and the current literature. Fully observed problems are partially satisfactory, though for application one must develop feasible computational techniques for constructing viscosity solutions and state feedback controls. These are interesting doctoral thesis projects. We conclude this comment with an example from finance.

Example 3 (Finance) There are two investment tools, one risk free with small mean growth rate, described by the equation dp_s = a_s p_s\,dt, and the other risky, such as a mutual fund, with large mean growth rate a_r > a_s, described by the equation dp_r = p_r(a_r\,dt + \sigma_r\,dw), where \sigma_r denotes its volatility and w is the standard Brownian motion. This is the price dynamics of the two assets. An individual wants to invest his wealth \xi using these instruments so as to maximize his return. This is a question of optimal portfolio selection. For details see [89]. Letting u \in [0, 1] denote the fraction invested in the risky asset, and c \in [0, \infty) the consumption rate, the growth dynamics of his wealth is given by

d\xi = (1 - u)a_s\xi\,dt + u\xi(a_r\,dt + \sigma_r\,dw) - c\,dt = (a_s\xi + (a_r - a_s)u\xi - c)\,dt + u\xi\sigma_r\,dw, \quad \xi(0) = x.    (15.85)

The return functional is given by the discounted revenue
J(u, c, x) = E_x\left\{\int_0^{\tau_x} e^{-\beta t}U(c(t))\,dt + e^{-\beta\tau_x}P\right\},    (15.86)

where x is the initial wealth, U is the utility function, \beta is the discount rate, P is the bankruptcy value of the assets and \tau_x is the stopping time to bankruptcy. Defining the value function V(x) = \sup\{J(u, c, x), u \in [0, 1], c \ge 0\}, one can verify that the stationary HJB equation, associated with the HJB evolution equation (15.84), is given by

\beta V(x) = \sup\left\{(1/2)\sigma_r^2 x^2 u^2 V_{xx} + (a_s x + (a_r - a_s)xu - c)V_x + U(c)\right\},
V(0) = P. This is a highly nonlinear problem. Ignoring the constraints, one can verify that the optimal policy is given by

u = G_1(x, V_x, V_{xx}) = -\left[\frac{a_r - a_s}{\sigma_r^2}\right]\frac{V_x}{xV_{xx}}, \qquad c = G_2(x, V_x) = (U')^{-1}(V_x),    (15.87)
where U' denotes the gradient of the utility function U. In the presence of constraints the optimal policy is given by

u = \max\{G_1 \wedge 1, 0\}, \qquad c = \max\{G_2, 0\}.    (15.88)
Substituting this in the equation one obtains a highly nonlinear problem.

Comment 5 (HJB equation for Partially Observed Problems) Though partially observed control problems are the natural settings for most of the problems arising in science, engineering and economics, the subject is far from satisfactory. Once the partially observed control problem described by the equations (15.70) and (15.73) is transformed into the control problem described by the equations (15.71) and (15.72), one may again formally write an HJB equation. We rewrite (15.71) and (15.72) as

dp = A^*(u)p\,dt + F(p)\,dz, \quad p_0 = p_0, \qquad J(u) = E\int_0^T (\ell^u(s), p_s)_H\,ds.    (15.89)

In this case, formally, the HJB equation is given by

(\partial/\partial t)V + \mathcal G(V) = 0, \quad t \in [0, T), \quad \nu \in D(A^*) \subset H, \qquad V(T, \nu) = 0,    (15.90)

where the HJB operator \mathcal G is given by

\mathcal G(V) = \inf\left\{\langle A^*(u)\nu, DV\rangle + (1/2)\,\mathrm{Tr}((D^2 V)F(\nu)F^*(\nu)) + (\ell^u, \nu)_H, \; u \in U\right\}.
One of the advantages of the HJB formulation is that it eliminates the necessity of the backward stochastic equation (15.80). The question of existence of viscosity solutions of partial differential equations of the form (15.90) on I \times H is far more difficult and is rarely considered in the literature. Interested readers may see [85],[86] and the references therein. These results often give only existence without uniqueness. In case uniqueness holds, one can construct the optimal feedback control law. The control law so obtained is a function of time and the density. Since a physical control law must be a functional of the naturally observed process y, the practical usefulness of a feedback control law depending on the conditional density is seriously questionable. For applications, one can rigorously formulate partially observed control problems using neural networks placed in the feedback loop, thereby reducing the complexity of designing optimal feedback control laws to the problem of optimum selection of parameters or weights. These are current problems of research interest.
CHAPTER 16
SYSTEM IDENTIFICATION
16.1 Introduction

The problem of system identification is fundamental in all physical and engineering sciences. Very often the governing equations are known but the parameters that enter the coefficients are either not known or known only approximately. For example, in vibration problems the form of the equations for beams, plates and shells is known, but parameters such as the modulus of rigidity, coefficients of elasticity etc. may not be precisely known. In reaction-diffusion problems the reaction rates and the diffusion coefficients may be unknown. Similarly in thermal problems, the heat conduction, convection and radiation parameters may not be known or known only approximately. In quantum mechanics, the potential function in the Schrodinger equation and even the masses of the interacting particles may not be precisely known. In fact very often mathematical models are used to describe the evolution of physical, chemical, biological, or even social systems based on fundamental laws of physics and scientific intuition about the infinitesimal (microscopic) relationships between the various interacting entities that produce the macroscopic behavior. For example, in the modeling of economic and social systems one introduces a large number of parameters representing the infinitesimal interactions between the various entities that contribute to the temporal evolution of the process. Sometimes even the mathematical models are not able to capture all the intricacies of the natural system. In the process many natural parameters are introduced which may be determined either theoretically, if possible, or by experiments and observation. We present in this chapter some general methodologies for identification of such system parameters.

16.2 Fully Observed Linear and Nonlinear Systems

Consider the system

dx = b(t, x, \alpha)\,dt + \sigma(t, x)\,dW, \quad x(0) = x_0,
(16.1)
where \alpha \in R^N is an unknown vector of parameters that determines the dynamics of the system (16.1). Throughout the remainder of this section we assume that W is a standard Brownian motion and that the functions b and \sigma satisfy the standard assumptions (uniformly with respect to \alpha), so that for each fixed \alpha \in R^N the system (16.1) has a unique strong or weak solution which is continuous with probability one, having finite second moments whenever x_0 has a finite second moment. The basic identification problem is to determine the true parameter \alpha^* from the observation \{x(t), t \in I = [0, T]\}. A very classical and powerful technique for the solution of this problem is the maximum likelihood estimate. For the likelihood function, we use the Radon-Nikodym derivative. Let C denote the Banach space of continuous functions on I with values in R^n, B(C) the Borel sigma algebra on C containing all cylinder sets, and \{B_t(C), t \ge 0\} \subset B(C) an increasing family of (completed) subsigma algebras of the sigma algebra B(C). Let \mu^\alpha denote the measure on C induced by the process x^\alpha, the unique solution of system (16.1) corresponding to the parameter \alpha. Let \mu^0 denote the measure induced by the unique solution corresponding to the system given by

dx = \sigma(t, x)\,dW, \quad x(0) = x_0.
(16.2)
Then we know that µ^α ≺ µ⁰, and the corresponding RND is given by

ρ^α(x) = exp{ ∫₀ᵀ < R⁻¹(s, x(s)) b^α, dx(s) >_{Rⁿ} − (1/2) ∫₀ᵀ < R⁻¹(s, x(s)) b^α, b^α >_{Rⁿ} ds },  (16.3)
where R = σσ' and b^α = b^α(s, x(s)) = b(s, x(s), α). That is, dµ^α = ρ^α dµ⁰. A sufficient condition for µ^α to be a probability measure is that σ⁻¹b^α is bounded. Let Γ ∈ B(C) denote any event. Then the probability of this event is given by
µ^α(Γ) = ∫_Γ ρ^α(x) dµ⁰(x).
(16.4)
Suppose that, as a result of an experiment, we have observed the trajectory {ξ(t), t ∈ I} ∈ C. Let Γ ∈ B(C) be a set such that ξ ∈ Γ. Then clearly
it follows from (16.4) that the probability of such a realization is maximal if α coincides with the true parameter α*. Since µ⁰ is independent of α, it follows from the above argument that, given the observation x = ξ, one should find the α ∈ R^N that maximizes the functional α → ρ^α(ξ). This is the justification for using maximum likelihood estimates. It follows from this and the expression (16.3) that we can choose J(α) = ρ^α(ξ) for the likelihood function, or J(α) = log ρ^α(ξ). Hence the best parameter is obtained by maximizing this functional. Thus, for J(α) one may choose

J(α) = ∫₀ᵀ < R⁻¹(s, x(s)) b^α, dx(s) >_{Rⁿ} − (1/2) ∫₀ᵀ < R⁻¹(s, x(s)) b^α, b^α >_{Rⁿ} ds.  (16.5)

In many practical problems the drift b^α may be given by

b^α(t, x) = Σ_{i=1}^N α_i b_i(t, x),  (16.6)
where {b_i = b_i(t, x), i = 1, ..., N} are n-vector valued functions satisfying the standard Lipschitz and growth conditions. Clearly, it follows from this assumption that b^α can be written as

b^α(t, x) = B(t, x)α,  (16.7)

where B : I × Rⁿ → M(n × N). In this case the functional (16.5) can be rewritten as

J(α) = (α, η_T)_{R^N} − (1/2)(Γ_T α, α)_{R^N},  (16.8)
where

η_T = ∫₀ᵀ B'(s, x(s)) R⁻¹(s, x(s)) dx(s) ∈ R^N,

Γ_T = ∫₀ᵀ B'(s, x(s)) R⁻¹(s, x(s)) B(s, x(s)) ds ∈ M(N × N).  (16.9)
Given the history {x(t), t ∈ I}, the variables η_T and Γ_T are known, being determined by equation (16.9). In order that the functional (16.8) have a maximum,
it is required that the matrix Γ_T be positive definite for some finite time T. In general we may consider the functional

J_t(α) = (α, η_t) − (1/2)(Γ_t α, α), t > 0.
(16.10)
Thus if the parameter space is the whole of R^N, and if for a history of length t we have Γ_t > 0, the problem has a solution, given simply by maximizing J_t(α) over R^N. It follows that the maximum likelihood estimate of the parameter α at time t, provided Γ_t > 0, is given by

α°_t = Γ_t⁻¹ η_t.  (16.11)
If t° is the first time for which Γ_{t°} > 0, then Γ_t is positive for all t > t°. Thus the history F_{t°} provides the minimal statistic for estimation of the parameter. Suppose that the true parameter is α*. We show that the estimate (16.11) is consistent in the sense that

α°_t → α* as t → ∞.
(16.12)
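For the scalar model dx = αx dt + dW (a hypothetical instance, not from the text, with B(x) = x and R = 1), the estimate (16.11) reduces to α°_t = ∫₀ᵗ x dx / ∫₀ᵗ x² ds, which is easily exercised on a simulated path:

```python
import math, random

random.seed(0)

def mle_drift(alpha_true=-1.0, T=200.0, dt=1e-3, x0=1.0):
    """Euler-Maruyama path of dx = alpha*x dt + dW, then the maximum
    likelihood estimate (16.11): alpha_hat = eta_t / Gamma_t with
    eta_t = int x dx and Gamma_t = int x^2 ds (here B(x) = x, R = 1)."""
    n = int(T / dt)
    x = x0
    eta = 0.0    # eta_t   = int_0^t x(s) dx(s)
    gam = 0.0    # Gamma_t = int_0^t x(s)^2 ds
    for _ in range(n):
        dw = random.gauss(0.0, math.sqrt(dt))
        dx = alpha_true * x * dt + dw
        eta += x * dx
        gam += x * x * dt
        x += dx
    return eta / gam

est = mle_drift()
print(est)  # settles near the true drift coefficient -1.0
```

With the horizon T = 200 the standard deviation of the estimate is roughly 1/√Γ_T ≈ 0.1, illustrating the consistency statement (16.12); the numbers are illustrative only.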
Note that W̃ defined by

W̃(t) = W(t) − ∫₀ᵗ σ⁻¹(s, x(s)) B(s, x(s)) α* ds, t ≥ 0,  (16.13)

is a Brownian motion on the measure space (C, B(C), µ*), where µ* is the measure corresponding to the true parameter α*. Hence x is a weak solution of the equation dx = Bα* dt + σdW̃. Substituting this in the expression for η_t given by (16.9) for T = t, we have

η_t = Γ_t α* + ∫₀ᵗ (σ⁻¹B)' dW̃.

This combined with (16.11) gives

α°_t − α* = Γ_t⁻¹ ∫₀ᵗ (σ⁻¹B)' dW̃.  (16.14)

Note that the quadratic (matrix) variation process of the martingale

∫₀ᵗ (σ⁻¹B)' dW̃
is given by Γ_t. We can show from this that, for any ε > 0,

lim_{t→∞} P{ |α°_t − α*|_{R^N} > ε } = 0.  (16.15)

This is very similar to the case of scalar Brownian motion w:

lim_{t→∞} P{ |w(t)/t| > ε } = 0 for any ε > 0.
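The scalar analogy is easy to check directly: since w(t)/t is Gaussian with standard deviation 1/√t, the fraction of sample paths with |w(t)/t| > ε vanishes as t grows. A quick Monte Carlo sketch (illustrative numbers only):

```python
import math, random

random.seed(3)

def fraction_exceeding(t, eps, trials=2000):
    """Estimate P{|w(t)/t| > eps} by sampling w(t) ~ N(0, sqrt(t))."""
    hits = sum(1 for _ in range(trials)
               if abs(random.gauss(0.0, math.sqrt(t)) / t) > eps)
    return hits / trials

# Exceedance probability for eps = 0.1 decays as the horizon t increases.
fracs = [fraction_exceeding(t, 0.1) for t in (10.0, 100.0, 1000.0)]
print(fracs)
```

The three fractions decrease monotonically toward zero, mirroring the limit above.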
Using equation (16.11) and the Ito formula, we obtain a stochastic differential equation for α°_t given by

dα°_t = −Γ_t⁻¹(B'R⁻¹B) α°_t dt + Γ_t⁻¹(B'R⁻¹) dx, t > t°,  (16.16)
driven by the observed process x. Under suitable assumptions, such as B'R⁻¹B strictly positive, the asymptotic limit of α°_t is the true parameter. We summarize this result in the following theorem.

Theorem 1 Suppose b^α and σ satisfy the standard assumptions guaranteeing existence and uniqueness of continuous solutions, and that b^α is linear in α, satisfying (16.6). The dispersion matrix σ, taking values from M(n × n), is nonsingular and W is a standard Brownian motion. Then the maximum likelihood estimate of α ∈ R^N at any time t > t°, denoted α°_t, is given by equation (16.11) and satisfies the SDE (16.16). Further, this estimate converges asymptotically to the true parameter α* with probability one.

Comment 1 Equation (16.16) provides a recursive estimate which is evidently adapted to the data field F_t^x.

Example 1 For illustration, we consider the (scalar) population model given by

dx = (α₁x + α₂x²)dt + γdw,  (16.17)

where α₁ is the intrinsic growth rate, α₂ is the competition or cooperation factor, and w is a standard Brownian motion representing uncertainties. If α₂ > 0 the population is cooperative, and if α₂ < 0 it is competitive. The problem is to estimate these rates from historical population data {x(s), s ≥ 0}. In this case B = (x, x²), α = (α₁, α₂)', σ = γ. Using this in (16.9) we obtain, for T = t,
η_t = (1/γ²) ∫₀ᵗ [ x(s) ; x²(s) ] dx(s),   Γ_t = (1/γ²) ∫₀ᵗ [ x²(s)  x³(s) ; x³(s)  x⁴(s) ] ds.
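As a numerical illustration (all rates, the noise level and the horizon below are hypothetical, not taken from the text), the batch estimate (16.11) can be formed by accumulating η_t and Γ_t along an Euler-Maruyama path of (16.17) and solving the 2×2 system Γ_t α = η_t in closed form:

```python
import math, random

random.seed(1)

# Hypothetical true rates: growth a1 = 1, competition a2 = -1, noise 0.1,
# so the path fluctuates around the logistic equilibrium x = 1.
a1_true, a2_true, gamma = 1.0, -1.0, 0.1
T, dt = 200.0, 1e-3

x = 1.0
e1 = e2 = 0.0            # components of eta_t  = int (x, x^2)' dx
g11 = g12 = g22 = 0.0    # entries of Gamma_t   = int [[x^2,x^3],[x^3,x^4]] ds
for _ in range(int(T / dt)):
    dx = (a1_true * x + a2_true * x * x) * dt + gamma * random.gauss(0.0, math.sqrt(dt))
    e1 += x * dx
    e2 += x * x * dx
    g11 += x ** 2 * dt
    g12 += x ** 3 * dt
    g22 += x ** 4 * dt
    x += dx

# alpha_t = Gamma_t^{-1} eta_t by Cramer's rule; the 1/gamma^2 factors cancel.
det = g11 * g22 - g12 * g12
a1_hat = (g22 * e1 - g12 * e2) / det
a2_hat = (g11 * e2 - g12 * e1) / det
print(a1_hat, a2_hat)
```

Over a long horizon the pair settles near the true (1, −1), exactly the estimator (16.11) for B = (x, x²).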
The reader can easily verify that Γ_t is positive. Then α°_t is given by (16.11). If the growth rate α₁ is known precisely but the competition (or cooperation) factor α₂ is not, then we have

α°_t = Γ_t⁻¹ η_t,

where

η_t = (1/γ²) ∫₀ᵗ x²(s) dx(s),   Γ_t = (1/γ²) ∫₀ᵗ x⁴(s) ds.

This example demonstrates clearly that the interaction coefficients α₁ and α₂ can be determined from population data using the technique summarized in Theorem 1. Clearly the result of Theorem 1 also applies to the linear problem

dx = Axdt + σdW = B(x)αdt + σdW,  (16.18)

where α ∈ R^N, N = n², denotes the elements of the matrix A which is to be identified, and B(x) ∈ M(n × N) is linear in x.

Identification of Dispersion Parameters

The preceding results, however, do not apply if σ, or more precisely R = σσ', is also to be identified. The problem arises from the fact that now equation (16.4) is given by

µ^α(Γ) = ∫_Γ ρ^α(x) dν^α(x),
(16.19)
where ρ^α has the same expression as (16.3), now with R dependent on α, and ν^α is the measure induced by equation (16.2) with σ dependent on α. This presents a very difficult problem, which we consider in section 16.4. In any case, if the set of admissible parameters 𝒫 is a compact subset of R^N and σ^α(t, x) is globally Lipschitz in x ∈ Rⁿ, having at most linear growth uniformly with respect to α ∈ 𝒫, and is further measurable in t ∈ I and continuous in α ∈ 𝒫, then we can prove that, for each bounded set Γ ∈ B(C), the functional given by f(α) = µ^α(Γ) is continuous on 𝒫 and hence attains both its maximum and minimum on 𝒫. However, we do not have a simple expression for this as
in the drift identification problem. It must be computed iteratively using a suitable algorithm, as given in section 16.5. A simple practical approach is described as follows. Consider the system (16.18) and suppose W is a standard Brownian motion. For any given set of model parameters {A, σ} we write the equations for the mean and the covariance:

(d/dt)m = Am, m(0) = E x₀,
(d/dt)P = AP + PA' + Q, P(0) = P₀, t ∈ I,  (16.20)

where Q = σσ'. Suppose the observed process over the period I is given by {ξ(t), t ∈ I}, a sample path realization of the process x. Then define the deterministic error covariance by the expression

P̂(t) = (ξ(t) − m(t))(ξ(t) − m(t))'.

This is clearly deterministic, given the observed sample path ξ. Let {P, m} denote the solution of equation (16.20) corresponding to the system parameters {A, Q}. Then one may try to find {A, Q} by minimizing the cost functional

J(T, A, Q) = ∫₀ᵀ Tr((P(t) − P̂(t)) S(t) (P(t) − P̂(t))) dt,  (16.21)

subject to the differential constraints (16.20). The weighting matrix S(t) is any suitable symmetric positive definite matrix valued function, possibly increasing with t ≥ 0 in the sense that (S(t₂)ζ, ζ) ≥ (S(t₁)ζ, ζ) for all ζ (≠ 0) ∈ Rⁿ and t₂ > t₁. It should be chosen so that very little weight is given to small values of t and very large weight to large values of t; this is intended to eliminate the impact of initial uncertainties and transients. Thus the identification problem has been converted into an approximate control and optimization problem. This simple-minded approach has been used successfully even for partially observed linear systems, discussed in the following section. What is crucial here is the availability of sufficiently long historical data. When suitable historical data is not available, one may use Monte Carlo simulation to generate an empirical covariance

P̂(t) = (1/M) Σ_{i=1}^M (ξᵢ(t) − m(t))(ξᵢ(t) − m(t))',

for use in equation (16.21). For numerical results see [14].
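A scalar sketch of this moment-matching idea (all numbers hypothetical): the "observed" covariance P̂ is built from M simulated sample paths, the model covariance P is propagated by (16.20), and the pair {a, Q} is recovered by a crude grid search over the cost (16.21) with weight S(t) = t:

```python
import math, random

random.seed(2)

# Hypothetical scalar system: dx = a*x dt + sqrt(Q) dw, true a = -1, Q = 1,
# x(0) = 0, so m(t) = 0 and the covariance equation of (16.20) is P' = 2aP + Q.
a_true, Q_true = -1.0, 1.0
T, dt, M = 2.0, 0.01, 1000
n = int(T / dt)

# Empirical covariance P_hat(t) from M Monte Carlo sample paths.
paths = []
for _ in range(M):
    x, traj = 0.0, []
    for _ in range(n):
        x += a_true * x * dt + math.sqrt(Q_true) * random.gauss(0.0, math.sqrt(dt))
        traj.append(x)
    paths.append(traj)
P_hat = [sum(p[k] ** 2 for p in paths) / M for k in range(n)]

def cost(a, Q):
    """J(T,a,Q) = int_0^T (P(t) - P_hat(t))^2 S(t) dt with S(t) = t,
    P propagated by Euler steps of P' = 2aP + Q, P(0) = 0."""
    P, J = 0.0, 0.0
    for k in range(n):
        P += (2.0 * a * P + Q) * dt
        J += (P - P_hat[k]) ** 2 * (k * dt) * dt
    return J

# Crude grid search over candidate model parameters {a, Q}.
grid = [(-i / 10.0, j / 10.0) for i in range(1, 21) for j in range(1, 21)]
best = min(grid, key=lambda p: cost(*p))
print(best)
```

The minimizer lands close to the true pair; what the weighted cost pins down most sharply is the stationary variance Q/(2|a|), which is why long records (or many Monte Carlo paths) matter.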
16.3 Partially Observed Linear Systems

Suppose the system and measurement dynamics are given by the following system of equations:

dx = Axdt + σdW, x(0) = x₀,
(16.22)
dy = Hxdt + σ₀dV, y(0) = 0,  (16.23)

and the associated Kalman filter is given by

dx̂ = Ax̂dt + K(π)H'R₀⁻¹dz, x̂(0) = x̂₀ = E x₀,
where the state estimation error covariance K(π) satisfies the differential Riccati equation

(d/dt)K(π) = AK(π) + K(π)A' + Q − K(π)H'R₀⁻¹HK(π),
K(π)(0) = K₀ = P₀, Q ≡ σσ', R₀ ≡ σ₀σ₀'.  (16.24)
Here the process {z(t), t ∈ I} is the innovation process, a Brownian motion with covariance R₀. We know that the covariance of the process x corresponding to the parameter π, denoted P(π), is given by the solution of the equation

(d/dt)P(π) = AP(π) + P(π)A' + Q, P(π)(0) = P₀ = K₀.  (16.25)
Let x̄(π)(t) = x̄(t) denote the mean of the process x, given by the solution of the equation

(d/dt)x̄(t) = Ax̄(t), x̄(0) = x̄₀,  (16.26)
corresponding to the parameter π = {A, σ, H, σ₀}. The two covariances K(π) and P(π) are related as follows. For any ξ ∈ Rⁿ we have

(K(π)(t)ξ, ξ) = E{(x(t) − x̂(t), ξ)²}
 = E{(x(t) − x̄(t), ξ)² + (x̂(t) − x̄(t), ξ)² − 2(x(t) − x̄(t), ξ)(x̂(t) − x̄(t), ξ)}
 = (P(π)(t)ξ, ξ) − E{(x̂(t) − x̄(t), ξ)²}.  (16.27)

Here we have used standard properties of conditional expectations relative to the sigma algebra F_t^y. Define e(π)(t) = x̂(t) − x̄(t). Subtracting (16.26) from the filter equation, it is easy to verify that e(π) satisfies the following SDE:

de(π) = (A − K(π)H'R₀⁻¹H)e(π)dt + K(π)H'R₀⁻¹(dy − Hx̄dt), e(π)(0) = 0.  (16.28)

Thus (16.27) can be rewritten as

K(π)(t) = P(π)(t) − E{e(π)(t)e'(π)(t)} ≡ P(π)(t) − K_e(π)(t).  (16.29)

This identity is correct if the choice of the parameter π coincides with the true system parameter π*, while {y(t), t ∈ I} is the actual measurement (field) data corresponding to the true parameter π*. In other words, if the observed data corresponds to the true parameter π* and not to the trial parameter π, and this data is used as the input to the model system (16.28) giving e(π), the identity (16.29) may not hold. Thus it is natural to choose π so as to minimize any potential mismatch. This can be done by introducing the cost functional
J(T, π) = ∫₀ᵀ Tr{(K(π) + K_e(π) − P(π)) S(t) (K(π) + K_e(π) − P(π))} dt,  (16.30)

where S(t) is a suitable weighting matrix, symmetric and positive definite, possibly increasing with t. It is clear from equations (16.24), (16.25), (16.26) and (16.28) that their solutions are uniquely determined by the parameters {A, Q, H, R₀}, rather than {A, σ, H, σ₀}, the data {x̄₀, P₀ = K₀} and the observation {y(t), t ∈ [0,T]}. So we may redefine π as π = {A, Q, H, R₀} and
accordingly the admissible set 𝒫. Given the data {y(t), t ∈ I} and π ∈ 𝒫, equation (16.28) is deterministic with solution e(π, y), and hence we can use

K_y(π)(t) = e(π, y)(t) e'(π, y)(t)  (16.31)

in place of K_e in the objective functional (16.30). Hence, if we can find a π° ∈ 𝒫 that minimizes the functional J, we have an approximate solution of the parameter identification problem. Thus we have the following result.

Theorem 2 Consider the system (16.22) and suppose it is required to identify the parameters π = {A, Q, H, R₀} ∈ 𝒫 from the data {y(t), t ∈ [0,T]}, x̄₀ and P₀. Let the observed data y be used as the input to the system (16.28). Then the identification problem is equivalent to the control problem: find π° ∈ 𝒫 that minimizes the cost functional (16.30), with K_e replaced by K_y, subject to the dynamic constraints (16.24)-(16.26) and (16.28).

This technique was originally developed and reported in [14], with convincing numerical simulation results; necessary conditions of optimality of Pontryagin type were also given there, although the method of simulated annealing was used for the optimization. For details the reader is referred to [14]. A difficulty faced in applying this technique is that K_e is not available in practice, so one must replace it in (16.30) with K_y given by (16.31). The latter is obtained from the solution of equation (16.28) driven by the measured data {y(t), t ∈ I}, which is just one sample path observed from experiment. This problem was overcome in [14] by taking a long history of the process {y(t), t ∈ [0,T]}; an empirical covariance was also used for the numerical computation.

Example 2 The technique described above is illustrated by the following example. Consider the system and measurement model

dx₁ = (a₁₁x₁ + a₁₂x₂)dt + σ₁₁dw₁ + σ₁₂dw₂,

together with a corresponding equation for x₂ and a scalar observation y. Comparing the actual parameter values with the estimated ones in the table below, it is evident that the method works very well for time invariant systems.

Parameter Identification

Parameter   Starting value   Estimated value   Actual value
a11         -1.0             -1.994406         -2.0
a12          1.0              1.995625          2.0
a21          2.0              0.504037          0.5
a22         -3.0             -2.064103         -2.0
q11          0.1              1.010031          1.01
q12          0.5              0.2000456        0.2
q22          0.1              1.010027          1.01
h11         -0.7             -0.000276          0.0
h12          2.0              1.009440          1.0
r11          0.1              0.992090          1.0
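A quick numerical sanity check on equations (16.24), (16.25) and the identity (16.29), for a scalar model with hypothetical coefficients: since K_e(π) ≥ 0, the filter covariance K can never exceed the process covariance P:

```python
# Scalar sketch (illustrative numbers only) of the Riccati/covariance pair:
# dx = a*x dt + dw, dy = h*x dt + s0*dv, so that
#   K' = 2aK + Q - K^2 h^2 / R0   (Riccati, (16.24))
#   P' = 2aP + Q                  (covariance, (16.25))
a, Q, h, R0, P0 = -1.0, 1.0, 1.0, 0.5, 1.0
dt, T = 1e-3, 5.0

K, P = P0, P0
for _ in range(int(T / dt)):
    K += (2 * a * K + Q - K * K * h * h / R0) * dt
    P += (2 * a * P + Q) * dt
    # Identity (16.29): K = P - K_e with K_e >= 0, hence K <= P throughout.
    assert K <= P + 1e-9

# Stationary values: P -> Q/(2|a|) = 0.5, while K solves 2aK + Q = K^2 h^2/R0.
print(round(P, 3), round(K, 3))
```

Both curves settle at their stationary values, with K strictly below P, which is exactly the gap K_e(π) that the cost (16.30) tries to account for.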
16.4 Partially Observed Nonlinear Systems

Now we consider partially observed nonlinear systems. Suppose the system and measurement dynamics are given by

dx = b(t, x, α)dt + σ(t, x, α)dW, x(0) = x₀,
dy = h(t, x, α)dt + σ₀(t, y)dV, y(0) = 0.  (16.32)

Here the unknown parameter (vector) appears in all the coefficients except in the measurement noise. From chapter 13 we know that the associated Zakai equation (see chapter 13, equation 13.37B) is given by

dp^α(t) = A*(t, α)p^α(t)dt + p^α(t) < R₀⁻¹h^α, dy >, p^α(0) = p₀.  (16.33)

To emphasize the dependence of the coefficients on the unknown parameter α, we use it whenever required as a superscript, as in b^α, σ^α, h^α. Again, from equation (13.9) of chapter 13, we have the Radon-Nikodym derivative given by

q^α = q^α(t) = exp{ ∫₀ᵗ < R₀⁻¹h^α, dy > − (1/2) ∫₀ᵗ < R₀⁻¹h^α, h^α > ds }, t ≥ 0.  (16.34)

For the fully observed problem we argued that the RND given by q_T, or log q_T, can be chosen as the likelihood function. Thus, in the partially observed
situation the natural candidate for the likelihood function is

L_T(α) = E⁰{q^α(T) | F_T^y} = < p^α(T), 1 >.  (16.35)

Hence the identification problem can be stated as follows: given the history {y(t), t ∈ I = [0,T]}, find α that maximizes the functional (16.35) subject to the dynamic constraint (16.33). For easy reference we call this problem (IP). We shall reformulate it as a parameter optimization problem for a deterministic partial differential equation by use of an exponential transformation. In the literature this pathwise formulation is also known as the robust formulation. Define

z(t) = ∫₀ᵗ R₀⁻¹(s, y(s)) dy(s),  (16.36)

and

p̃^α(t) = p^α(t) exp{−h^α · z(t)},  (16.37)
where from now on we use the notation (ξ, ζ)_{R^m} = ξ · ζ. Then by use of the Ito formula one can verify that

dp̃^α(t) = dp^α(t) exp{−h^α · z(t)} + p^α(t) d(exp{−h^α · z(t)}) + < dp^α(t), d(exp{−h^α · z}) >
 = (exp{−h^α · z}) A*(t, α)(p̃^α(t) exp{h^α · z}) dt − (1/2) p̃^α(t)(R₀⁻¹h^α, h^α) dt,  (16.38)

where A*(t, α) denotes the formal adjoint of the generator A(t, α) of the Markov process x given by the first equation of (16.32). Note that the last term in the first identity is the quadratic covariation term. Clearly, given the process y(t), t ∈ I, or equivalently the process z(t), t ∈ I, equation (16.38) is a standard (deterministic) partial differential equation. We rewrite it as

(d/dt)p̃^α(t) = 𝒜*(t, α)p̃^α(t), p̃^α(0) = p₀,  (16.39)

where 𝒜* denotes the differential operator given by

𝒜*(t, α)φ = exp{−h^α · z} A*(t, α)(exp{h^α · z} φ) − (1/2)(R₀⁻¹h^α, h^α)φ.  (16.40)
Note that the coefficients of this differential operator are parameterized by the process y through the process z. Equation (16.39) is a partial differential equation on the domain I × Rⁿ. The objective functional (16.35) takes the form

J_T(α) = L_T(α) = < p^α(T), 1 > = < p̃^α(T) exp{h^α · z(T)}, 1 >.  (16.41)

Thus our identification problem (IP) is equivalent to the optimization problem: find α that maximizes the functional J_T(α) given by (16.41) subject to the dynamic constraint (16.39). The problem as stated may have no solution without further assumptions. Let 𝒫 ⊂ R^N denote the set of admissible parameters and suppose the coefficients satisfy the following assumptions.

Assumptions

(A1) b(t, x, α) is bounded and measurable in t and continuous in {x, α} on I × Rⁿ × 𝒫, satisfying a uniform Lipschitz and linear growth condition in x on Rⁿ.

(A2) All the columns of the diffusion matrix σ(t, x, α) satisfy conditions similar to those of b, and are Hölder continuous in t on I. Further, there exists a constant γ > 0 such that a ≡ σσ' ≥ γI for all (t, x, α) ∈ I × Rⁿ × 𝒫, and the first derivatives in x are continuous and bounded on I × Rⁿ × 𝒫.

(A3) For every (t, x) ∈ I × Rⁿ, the mappings α → b(t, x, α), α → σ(t, x, α) and α → h(x, α) are once Gateaux differentiable on 𝒫.

(A4) For every α ∈ 𝒫, h is C¹ in x with bounded first derivatives. Further, the maps α → h(x, α) and α → (∂/∂xᵢ)h(x, α) are continuous and bounded on 𝒫 for each x ∈ Rⁿ.

(A5) The matrix valued function σ₀(t, y) is measurable in t on I and uniformly Lipschitz on R^m, possessing at most linear growth. Further, it is invertible.

Under these assumptions we can prove that the operator 𝒜*(t, α) satisfies Garding's inequality. Let H denote the Hilbert space L²(Rⁿ), V = H¹ the Sobolev space introduced in chapter 14 (see chapter 14, Lemma 1), and V* = (H¹)* its dual. Then we have the following result.

Lemma 3 Under the assumptions (A1)-(A5), the operator 𝒜*(t, α), and hence 𝒜 as well, is a bounded linear operator from V to V*, and −𝒜* is coercive in the sense that there exist constants {c > 0, β > 0, λ > 0} such that

|(𝒜*(t, α)u, v)_{V*,V}| ≤ c ‖u‖_V ‖v‖_V,   (−𝒜*(t, α)u, u) + λ|u|²_H ≥ β‖u‖²_V,

for all u, v ∈ V and all (t, α) ∈ I × 𝒫.
Proof. Let C₀^∞(Rⁿ) denote the class of infinitely differentiable functions having compact supports, and let u, v ∈ C₀^∞. For convenience of notation we write D_iφ = (∂/∂x_i)φ for the partial derivative of φ with respect to x_i, and Dφ for the gradient vector. Then by integration by parts one can easily verify that

a^α(t, u, v) = (𝒜*(t, α)u, v)
 = −(1/2) ∫_{Rⁿ} Σ_{i,j=1}^n a_{ij}(D_iu)(D_jv) dx + ∫_{Rⁿ} Σ_{j=1}^n β_j(D_ju)v dx
  + ∫_{Rⁿ} Σ_{i=1}^n γ_i u(D_iv) dx + ∫_{Rⁿ} δuv dx,  (16.43)

where

β_j(t, x, α) = (1/2) Σ_{i=1}^n a_{ij} D_i(h^α · z), 1 ≤ j ≤ n,

γ_i(t, x, α) = −(1/2) Σ_{j=1}^n a_{ij} D_j(h^α · z) − (1/2) Σ_{j=1}^n D_j(a_{ij}) + b_i, 1 ≤ i ≤ n,

δ(t, x, α) = (1/2) Σ_{i,j=1}^n a_{ij} D_j(h^α · z) D_i(h^α · z) + (1/2) Σ_{i,j=1}^n D_j(a_{ij}) D_i(h^α · z) − Σ_{i=1}^n b_i D_i(h^α · z) − (1/2)(R₀⁻¹h^α, h^α).

Under the assumptions (A1)-(A3) and (A5) it is easy to verify that these coefficients are bounded measurable functions of their arguments. Thus there exist nonnegative constants {C₁, C₂, C₃, C₄} such that, by the Schwarz inequality,

|(𝒜*(t, α)u, v)| ≤ C₁|Du|_{L²}|Dv|_{L²} + C₂|Du|_{L²}|v|_{L²} + C₃|Dv|_{L²}|u|_{L²} + C₄|u|_{L²}|v|_{L²} ≤ c‖u‖_V‖v‖_V,  (16.44)

where the constant c is determined by C = max{C₁, C₂, C₃, C₄}. Since C₀^∞(Rⁿ) is dense in the Sobolev space H¹(Rⁿ) = V, inequality (16.44) holds for all u, v ∈ V. Thus the operator 𝒜*(t, α) is a bounded linear operator from V to V*. For coercivity we use (16.43) with v replaced by u and the Cauchy inequality

ab ≤ (ε/2)a² + (1/2ε)b², a, b ∈ R, ε > 0,  (16.45)
to arrive at the following inequality:

(−𝒜*(t, α)u, u) ≥ (γ − (ε/2)(C₂ + C₃)) ‖u‖²_V − ((ε/2)(C₂ + C₃) + (1/2ε)(C₂ + C₃) + C₄) |u|²_H, ∀u ∈ V.  (16.46)

Choosing ε = γ/(C₂ + C₃) in (16.46) and defining β = γ/2 and λ = (γ/2) + (1/2γ)(C₂ + C₃)² + C₄, we obtain

(−𝒜*(t, α)u, u) + λ|u|²_H ≥ β‖u‖²_V, ∀u ∈ V, t ∈ I, α ∈ 𝒫.  (16.47)

This proves coercivity. QED
This proves coercivity. QED In view of this result and Lemma 6 of chapter 15 we have the following theorem. T h e o r e m 4 For any a e V and any finite interval I = [0,T] and every initial density p0 e H, equation (16.39) has a unique weak solution pa e L2(I, V) D C(I, H). Further, for every p0 e H satisfying \p0{x)\ < M0 exp - {tf|x| 2 },x G Rn,MQ > 0,# > 0,
(16.48)
there exist constants M\ = Mi(Mo,$) > 0, and >, 0 < 6 < $ possibly depending on {Mo,#,T} such that the following estimate holds. \pa{t,x)\
< Afi exp-{<5|x| 2 },Vxe Rn,0
(16.49)
The proof of the estimate (16.49) follows from the fact that, under the given assumptions, the fundamental solution of the parabolic equation (16.39) satisfies the following estimate r(«,x,T,OI < (ki/(t-r)^2)exp-{k2(\x-C\2)},
(16.50)
for all a e V,x,£ € Rn,0 < r < t < T, where k\ and k2 are positive con stants depending on 7 of assumption (A2) and the bounds of the parameters {aij,/?i,7i,5} as defined in equation (16.43). In fact, the estimate (16.49) can be computed using the expression for the solution pa(t,x)=
[
r(t,x,0,Z)po(£)dZ,
242
System
Identification
and the estimate (16.48) and (16.50). For details on the estimate of the fun damental solution see [43]. In view of theorem 4, we rewrite our objective functional (16.41) as follows: JT(a) (a)
(16.51)
= (qa(T),r, (gQ(T),7 (r))H a(T)) ?QH
where Va(t) = Tfa(t,x) = exp{ha(x) ■ z(t) - (6/2)\x\2}, Va{t) Va(t,x) = , [ u ^ „„, ,21 D nn qa(t) = qa(t,x)=pa{t,x)exp{{6/2)\x\22},x& Rqa{t) = qa{t,x)=pa(t,x)exp{(S/2)\x\ },x 6 Rn .
(16.52) 16"52)
Next we introduce the operator B* as follows:

B*(t, α)φ = exp{−h^α · z(t) + (δ/2)|x|²} A*(t, α)(exp{h^α · z(t) − (δ/2)|x|²} φ) − (1/2)(R₀⁻¹h^α, h^α)φ
 ≡ exp{(δ/2)|x|²} 𝒜*(t, α)(exp{−(δ/2)|x|²} φ).  (16.53)

In view of the expressions (16.52) and (16.53), the evolution equation

(d/dt)q^α = B*(t, α)q^α, q^α(0) = q₀ = p₀ exp{(δ/2)|x|²},  (16.54)
is equivalent to the original equation (16.39). In other words, for an initial condition p₀ satisfying (16.48), if p̃^α ∈ L²(I,V) ∩ C(I,H) is the unique solution of (16.39), then q^α given by (16.52) is the unique solution of equation (16.54), belonging to the same space L²(I,V) ∩ C(I,H). Thus the identification problem (IP) stated earlier is equivalent to the following optimization problem:

(d/dt)q^α = B*(t, α)q^α, q^α(0) = q₀,
J_T(α) = (q^α(T), η^α(T))_H → max.  (16.55)

Now we are prepared to demonstrate that, under one additional assumption, the identification problem (16.39)-(16.41), denoted (IP), or its equivalent (16.55), has a solution.

Theorem 5 (Existence) Consider the identification problem (IP), or equivalently (16.55). Suppose the assumptions (A1)-(A5) hold, p₀ satisfies the estimate (16.48), α → h^α has at most linear growth, and the parameter set 𝒫 ⊂ R^N is compact. Then the mapping α → J_T(α) is continuous on 𝒫 and hence attains its maximum on 𝒫.
Proof. Since a continuous function on a compact set attains both its maximum and minimum, it suffices to verify that α → J_T(α) is continuous on 𝒫. Let αⁿ, α⁰ ∈ 𝒫, let pⁿ, p⁰ be the corresponding solutions of equation (16.39), and qⁿ, q⁰ those of the equivalent system (16.54). By virtue of Theorem 4 these solutions exist and are unique. Suppose αⁿ → α⁰. We prove that qⁿ converges to q⁰ in the topology of L²(I,V) ∩ C(I,H); it suffices to show that pⁿ converges to p⁰ with respect to the same topology. Defining

zⁿ(t) = pⁿ(t) − p⁰(t), t ∈ I,

one can easily check that zⁿ(t), t ∈ I, satisfies the differential equation

(d/dt)zⁿ(t) = 𝒜*(t, αⁿ)zⁿ(t) + (𝒜*(t, αⁿ) − 𝒜*(t, α⁰))p⁰, t ∈ I, zⁿ(0) = 0.  (16.56)

Scalar multiplying either side of (16.56) by zⁿ(t), we obtain

(1/2)(d/dt)|zⁿ(t)|²_H − (𝒜*(t, αⁿ)zⁿ(t), zⁿ(t)) = ((𝒜*(t, αⁿ) − 𝒜*(t, α⁰))p⁰, zⁿ(t)).  (16.57)

Integrating this over the interval [0, t] and using the coercivity property and the Cauchy inequality (16.45), one arrives at the inequality

|zⁿ(t)|²_H + (2β − ε) ∫₀ᵗ ‖zⁿ(s)‖²_V ds ≤ λ ∫₀ᵗ |zⁿ(s)|²_H ds + (1/ε) ∫₀ᵗ ‖(𝒜*(s, αⁿ) − 𝒜*(s, α⁰))p⁰‖²_{V*} ds, ∀t ∈ I, ε > 0.  (16.58)

Using ε = β and applying the Gronwall inequality, it follows that

|zⁿ(t)|²_H + β ∫₀ᵀ ‖zⁿ(s)‖²_V ds ≤ (e^{λT}/β) ∫₀ᵀ ‖(𝒜*(t, αⁿ) − 𝒜*(t, α⁰))p⁰‖²_{V*} dt,  (16.59)

for all t ∈ I. From assumptions (A1) and (A2) one can prove that α → 𝒜*(t, α) is strongly continuous (in the strong operator topology) from L²(I,V) to L²(I,V*). Hence it follows from (16.59), upon letting n → ∞, that zⁿ → 0 in L²(I,V) ∩ C(I,H), and consequently qⁿ → q⁰ in L²(I,V) ∩ C(I,H) as αⁿ → α⁰. This proves sequential continuity, and hence continuity, of the map α → q^α. Under the continuity assumption on h^α(x) in α ∈ 𝒫 and uniform
linear growth in x ∈ Rⁿ, the map α → η^α(T) is continuous from 𝒫 to H. Thus the cost functional J_T(α) given by (16.51) is continuous on 𝒫, and the existence of a maximum follows from the compactness of 𝒫. This completes the proof. QED

The next question is how to find the best parameter α⁰ that maximizes the likelihood functional J_T(α). We present here a set of necessary conditions that the optimal parameter must satisfy. For this we need the Gateaux differentiability of the solution map α → q^α. Suppose the admissible set of parameters 𝒫 is a compact and convex subset of R^N. Let α^ε = α⁰ + ε(α − α⁰), ε ∈ [0,1]. Clearly, by convexity of 𝒫, α^ε ∈ 𝒫 whenever α⁰, α ∈ 𝒫.

Lemma 6 Suppose the assumptions (A1)-(A5) hold and that 𝒫 is a compact convex subset of R^N. Then the map α → q^α is Gateaux differentiable on 𝒫, and its Gateaux differential at α⁰ in the direction α − α⁰, denoted r, is given by the weak solution of the following differential equation:
+ G(t,a°,a
- a o )g o ,r(0) = 0,
(16.60)
where G denotes the strong Gateaux differential of the operator valued function a —► B*(t,a) at a° in the direction a - a° given by lim || {l/e){B*(t,a*)
- B*(t,a°))4> - G(t,a°;a-
a°)0 || v .—> 0,V0 G V. (16.61)
Proof. The proof is quite similar to that of Theorem 5, so we give only an outline. Let q^ε and q⁰ denote the unique solutions of system (16.54) corresponding to the parameters α^ε and α⁰ respectively. Define

r^ε = (1/ε)(q^ε − q⁰).  (16.62)

Then, following the same procedure as in Theorem 5, one can verify that this family is contained in a bounded subset of the space

W = {φ ∈ L²(I,V) : φ̇ ∈ L²(I,V*)}.  (16.63)

Furnished with the norm topology

‖φ‖_W ≡ (‖φ‖²_{L²(I,V)} + ‖φ̇‖²_{L²(I,V*)})^{1/2},

W is a reflexive Banach space. Hence the family {r^ε} has a subsequence which converges weakly to an element r ∈ W as ε ↓ 0 (along this subsequence).
It is known [3] that W is continuously embedded in C(I,H), and hence r ∈ C(I,H) ∩ L²(I,V). From this one can verify that r is a weak solution of (16.60). In fact, by use of arguments related to V*-valued distributions, one arrives at the conclusion that equation (16.60) holds almost everywhere on I. This completes the outline of the proof. QED

For a slightly different but detailed proof see [2] and [27]. Finally, with the help of the above results, we derive the necessary conditions of optimality. Let η̇(α⁰, α − α⁰) denote the Gateaux differential of η with respect to α at α⁰ in the direction α − α⁰.

Theorem 7 (Necessary conditions of optimality) Suppose the assumptions of Lemma 6 hold. Then, in order that α⁰ ∈ 𝒫 be the maximum likelihood estimate of the unknown parameter α ∈ 𝒫, it is necessary that there exist a ψ⁰ ∈ L²(I,V) ∩ C(I,H) such that the following relations hold:

(d/dt)q⁰ = B*(t, α⁰)q⁰, q⁰(0) = q₀,  (16.64)

−(d/dt)ψ⁰ = B(t, α⁰)ψ⁰, ψ⁰(T) = η^{α⁰}(T),  (16.65)

∫₀ᵀ (G(t, α⁰; α − α⁰)q⁰(t), ψ⁰(t))_{V*,V} dt + (q⁰(T), η̇(α⁰, α − α⁰)(T))_H ≤ 0, ∀α ∈ 𝒫.  (16.66)

Proof. Let α⁰ ∈ 𝒫 denote the optimal parameter and α ∈ 𝒫 any arbitrary element. Then the Gateaux differential of J_T at α⁰ in the direction α − α⁰, denoted dJ_T(α⁰, α − α⁰), is given by

dJ_T(α⁰, α − α⁰) = (r_{α⁰}(T), η_{α⁰}(T))_H + (q_{α⁰}(T), η̇(α⁰, α − α⁰)(T))_H,  (16.67)

where q_{α⁰} = q⁰ and r_{α⁰} = r⁰ are the solutions of equations (16.64) and (16.60) respectively. Since α⁰ is optimal, it is clear that

J_T(α⁰ + ε(α − α⁰)) − J_T(α⁰) ≤ 0, ∀ε ∈ (0,1], ∀α ∈ 𝒫.  (16.68)

Dividing this by ε and letting ε → 0, it follows from (16.67) and (16.68) that

(r_{α⁰}(T), η_{α⁰}(T))_H + (q_{α⁰}(T), η̇(α⁰, α − α⁰)(T))_H ≤ 0, ∀α ∈ 𝒫.  (16.69)

Consider the general nonhomogeneous problem associated with (16.60), given by

(d/dt)y = B*(t, α⁰)y + g, y(0) = 0.  (16.70)
One can show that for every g ∈ L²(I,V*) this equation has a unique solution y ∈ W, and that the map g → y(T) is continuous and linear from L²(I,V*) to H. Define g⁰_α = G(t, α⁰; α − α⁰)q⁰, and note that g⁰_α ∈ L²(I,V*). Thus the map

g⁰_α → (r_{α⁰}(T), η_{α⁰}(T))_H

is a continuous linear functional on L²(I,V*), and hence there exists a unique ψ⁰ ∈ L²(I,V) such that

(r_{α⁰}(T), η_{α⁰}(T))_H = ∫₀ᵀ (g⁰_α(t), ψ⁰(t))_{V*,V} dt.  (16.71)

One can verify that ψ⁰ is given by the solution of the adjoint equation (16.65). Substituting (16.71) into (16.69) yields (16.66). This completes the proof. QED
16.5 A Computational Algorithm

Assuming that the Gateaux differentials are linear in the direction α − α⁰ (an assumption that normally holds), inequality (16.66) can be expressed coordinatewise as

Σ_{i=1}^N ( ∫₀ᵀ (G_i(t)q⁰(t), ψ⁰(t))_{V*,V} dt + (q⁰(T), η̇_i(T))_H )(α_i − α⁰_i) ≡ (∇J_T(α⁰), α − α⁰)_{R^N} ≤ 0, ∀α ∈ 𝒫,  (16.72)

where G_i and η̇_i denote the differentials in the i-th coordinate direction. This suggests the following gradient algorithm.

Step 1. At the nth stage, n ≥ 0, αⁿ ∈ 𝒫 is given; solve equation (16.64) for qⁿ.

Step 2. Corresponding to the same αⁿ, solve equation (16.65) for ψⁿ.

Step 3. Using {αⁿ, qⁿ, ψⁿ} and (16.72), compute the gradient ∇J_T(αⁿ).

Step 4. Define αⁿ⁺¹ = αⁿ + ε∇J_T(αⁿ) for ε > 0 sufficiently small so that αⁿ⁺¹ ∈ 𝒫.

Step 5. Compute J_T(αⁿ⁺¹) = J_T(αⁿ) + ε‖∇J_T(αⁿ)‖²_{R^N} + o(ε).

Step 6. If J_T(αⁿ⁺¹) < J_T(αⁿ), reduce ε and repeat Step 4. Otherwise return to Step 1 with the new α = αⁿ⁺¹, and continue until a stopping criterion is satisfied.

Comment 2 The reader will notice that the general assumptions stated in this section are rather strong. They are not at all necessary; however, under these assumptions rigorous proofs are easily obtained.

Comment 3 For further details see [2], [29]. For the general theory of identification and numerical results, the reader is referred to [2].
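The six steps can be sketched as a generic gradient-ascent loop with step-size reduction. Everything below is schematic: the PDE solves of Steps 1-2 are compressed into a hypothetical callable `grad_J`, the membership check αⁿ⁺¹ ∈ 𝒫 of Step 4 is omitted, and a toy concave functional stands in for J_T so the control flow can be exercised end to end:

```python
def ascend(J, grad_J, a0, eps=0.5, tol=1e-8, max_iter=1000):
    """Steps 1-6 in skeleton form: evaluate the gradient (Steps 1-3),
    take a step (Step 4), and halve eps until ascent is achieved (Steps 5-6)."""
    a = list(a0)
    for _ in range(max_iter):
        g = grad_J(a)                       # Steps 1-3: q^n, psi^n -> gradient
        if sum(gi * gi for gi in g) < tol:  # stopping criterion
            break
        while True:                         # Steps 4-6: step-size reduction
            a_new = [ai + eps * gi for ai, gi in zip(a, g)]
            if J(a_new) > J(a):             # ascent achieved: accept the step
                a = a_new
                break
            eps *= 0.5                      # otherwise reduce eps and retry
            if eps < 1e-12:
                return a
    return a

# Toy concave stand-in J(a) = -|a - a*|^2 with maximizer a* = (1, -2).
a_star = (1.0, -2.0)
J = lambda a: -sum((ai - si) ** 2 for ai, si in zip(a, a_star))
grad_J = lambda a: [-2.0 * (ai - si) for ai, si in zip(a, a_star)]
est = ascend(J, grad_J, [0.0, 0.0])
print(est)  # converges to [1.0, -2.0]
```

In the actual problem each call to `grad_J` costs one forward solve of (16.64) and one backward solve of (16.65), which is what makes the adjoint formulation attractive: the cost per iteration is independent of the number of parameters N.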
17. References

[1] N.U. Ahmed, Elements of Finite Dimensional Systems and Control Theory, Pitman Monographs and Surveys in Pure and Applied Mathematics, 37, Longman Scientific and Technical, U.K.; co-publisher: John Wiley, New York, 1988.
[2] N.U. Ahmed, Optimization and Identification of Systems Governed by Evolution Equations on Banach Space, Pitman Research Notes in Mathematics Series, 184, Longman Scientific and Technical, U.K.; co-publisher: John Wiley, New York, 1988.
[3] N.U. Ahmed and K.L. Teo, Optimal Control of Distributed Parameter Systems, Elsevier North Holland, New York, Oxford, 1981.
[4] N.U. Ahmed and K.L. Teo, An Existence Theorem on Optimal Control of Partially Observable Diffusions, SIAM Journal on Control, 12, 3, pp. 351-355, 1974.
[5] N.U. Ahmed, Optimal Control of Stochastic Systems, in Probabilistic Analysis and Related Topics (ed. A.T. Bharucha-Reid), Academic Press Series, 2, pp. 1-68, 1979.
[6] N.U. Ahmed, Optimal Relaxed Controls for Infinite Dimensional Stochastic Systems of Zakai Type, SIAM Journal on Control and Optimization, 34, 5, pp. 1592-1615, 1996.
[7] N.U. Ahmed and J. Zabczyk, Partially Observed Optimal Controls for Nonlinear Infinite Dimensional Stochastic Systems, Dynamic Systems and Applications, 5, pp. 521-538, 1996.
[8] N.U. Ahmed, M. Fuhrman, and J. Zabczyk, On Nonlinear Filtering in Infinite Dimensions, Journal of Functional Analysis, 143, 1, pp. 180-204, 1997.
[9] N.U. Ahmed and T.E. Dabbous, Nonlinear Filtering of Systems Governed by Ito Differential Equations with Jump Parameters, J. Math. Analysis Appl., 115, 1, pp. 76-92, 1986.
[10] N.U. Ahmed, T.E. Dabbous and H.W. Wong, Gradient Method for Computing Optimal Controls for Stochastic Differential Equations, Stochastic Analysis and Applications, 5(2), pp. 121-150, 1987.
[11] N.U. Ahmed and P. Li, Quadratic Regulator Theory and Linear Filtering Under System Constraints, IMA Journal of Mathematical Control and Information, 8(8), pp. 93-107, 1991.
[12] N.U. Ahmed and S.M. Radaideh, A Powerful Numerical Technique Solving Zakai Equation for Nonlinear Filtering, Dynamics and Control, 7, pp. 293-308, 1997.
[13] N.U. Ahmed and S.M. Radaideh, Modified Extended Kalman Filtering, IEEE Transactions on Automatic Control, 39, 6, pp. 1322-1326, 1994.
[14] N.U. Ahmed and S.M. Radaideh, Identification of Linear Stochastic Systems Based on Partial Information, Journal of Applied Mathematics and Stochastic Analysis, 8, 3, pp. 349-360, 1995.
[15] N.U. Ahmed and H.W. Wong, A Minimum Principle for Systems Governed by Ito Differential Equation with Markov Jump Parameters, in Differential Games and Control Theory II (eds. E.O. Roxin, P.T. Liu and R.L. Sternberg), Lecture Notes in Pure and Applied Mathematics, 30, Marcel Dekker, Inc., New York, Basel, 1976.
[16] N.U. Ahmed, Existence and Uniqueness of Measure Valued Solutions for Zakai Equation, Publicationes Mathematicae, Debrecen, 49, 3-4, pp. 251-264, 1996.
[17] L. Arnold, Stochastic Differential Equations: Theory and Applications, John Wiley and Sons, New York, 1974.
[18] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1979.
[19] J.S. Baras, G.L. Blankenship, and S.K. Mitter, Nonlinear Filtering of Diffusion Processes, Proc. IFAC Congr., Kyoto, Japan, Aug. 1981.
[20] A. Bensoussan, Stochastic Control of Partially Observable Systems, Cambridge University Press, 1992.
[21] P. Billingsley, Probability and Measure (2nd Edition), Wiley-Interscience, New York, Chichester, Brisbane, Toronto, Singapore, 1985.
[22] A. Bagchi, Continuous Time Systems Identification with Unknown Noise Covariance, Automatica, 11, pp. 533-536, 1975.
[23] R.S. Bucy, Nonlinear Filtering Theory, IEEE Trans. Aut. Contr., 10, pp. 198-212, 1965.
[24] S.K. Berberian, Measure and Integration, The MacMillan Company, New York; Collier-MacMillan Limited, London, 1965.
[25] V.E. Benes, Exact Finite Dimensional Filter for Certain Diffusion with Nonlinear Drifts, Stochastics, 6, pp. 65-92, 1981.
[26] J.M.C. Clark, The Design of Robust Approximation to the Stochastic Differential Equation of Nonlinear Filtering, in Communication Systems and Random Process Theory, pp. 721-735, 1981.
[27] T.E. Dabbous and N.U. Ahmed, Nonlinear Filtering of Diffusion Processes with Discontinuous Observations, Stoch. Analys. Appl., 2, 1, pp. 87-106, 1984.
[28] T.E. Dabbous, N.U. Ahmed, J.C. McMillan and D.F. Liang, Filtering of Discontinuous Processes Arising in Marine Integrated Navigation Systems, IEEE Transactions on Aerospace and Electronic Systems, 24, 1, pp. 85-102, 1988.
[29] T.E. Dabbous and N.U. Ahmed, Parameter Identification for Partially Observed Diffusion, J. of Optimization Theory and Applications (JOTA), 75, 1, pp. 33-50, 1992.
[30] M.H. Davis and S.I. Marcus, An Introduction to Nonlinear Filtering, in Stochastic Systems: The Mathematics of Filtering and Identification and Applications (NATO Advanced Study Institute Series), Reidel, Dordrecht, pp. 53-75, 1981.
[31] M.H. Davis, New Approach to Filtering for Nonlinear Systems, IEEE Proc., 128, 5, pp. 166-172, 1981.
[32] G.B. Di Masi and W.J. Runggaldier, An Approximation to Optimal Nonlinear Filtering with Discontinuous Observations, in Stochastic Systems: The Mathematics of Filtering and Identification and Applications (NATO Advanced Study Institute Series), Reidel, Dordrecht, pp. 583-590, 1980.
[33] R.J. Elliott, Stochastic Calculus and Applications, Springer-Verlag, Heidelberg, Berlin, New York, 1982.
[34] R.J. Elliott and R. Glowinski, Approximations to Solutions of Zakai Filtering Equation, Stochastic Analysis and Applications, 7(2), pp. 145-168, 1989.
[35] W.H. Fleming, Measure Valued Processes in Control of Partially Observable Stochastic Systems, Appl. Math. Optim., 6, pp. 271-285, 1980.
[36] W.H. Fleming and S.K. Mitter, Optimal Control and Pathwise Nonlinear Filtering of Nondegenerate Diffusions, presented at the 20th IEEE Conf. Decision Contr., San Diego, CA, 1981.
[37] W.H. Fleming and E. Pardoux, Optimal Control for Partially Observed Diffusions, SIAM J. Optim., 20, 2, pp. 261-285, 1982.
[38] W.H. Fleming and H.M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, New York, Berlin, Heidelberg, 1993.
[39] W.H. Fleming and Q. Zhang, Nonlinear Filtering with Small Observation Noise: Piecewise Monotone Observations, in Stochastic Analysis, Academic Press, Inc., pp. 153-168, 1991.
[40] P. Florchinger and F. Le Gland, Time Discretization of the Zakai Equation for Diffusion Processes Observed in Correlated Noise, Stochastics and Stochastics Reports, 35, pp. 233-256, 1991.
[41] A. Friedman, Stochastic Differential Equations and Applications, 1, Academic Press, New York, San Francisco, London, 1975.
[42] A. Friedman, Stochastic Differential Equations and Applications, 2, Academic Press, New York, San Francisco, London, 1976.
[43] A. Friedman, Partial Differential Equations of Parabolic Type, Prentice Hall, Inc., 1964.
[44] A. Germani (ed.), Stochastic Modelling and Filtering, Proc. IFIP-WG 7/1 Working Conference, Rome, Italy, Dec. 10-14, 1984, Lect. Notes in Control and Information Sciences, 91.
[45] J.C. Geromel, Optimal Linear Filtering Under Parameter Uncertainty, manuscript, LAC-DT, School of Electrical Engineering, UNICAMP, CP610, 13081-970 Campinas, SP, Brazil, 1997.
[46] I.I. Gihman and A.V. Skorokhod, Stochastic Differential Equations, Springer-Verlag, New York, Heidelberg, Berlin, 1972.
[47] I.I. Gihman and A.V. Skorokhod, The Theory of Stochastic Processes III, Springer-Verlag, Heidelberg, Berlin, New York, 1979.
[48] F. Le Gland, Systematic Numerical Experiments in Nonlinear Filtering with Automatic Fortran Code Generation, Proceedings of the 25th CDC, Greece, pp. 638-642, 1986.
[49] P.R. Halmos, Measure Theory, D. Van Nostrand Company, Inc., Princeton, New Jersey, Toronto, New York, London, 1964.
[50] M. Hazewinkel and J.C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Proc. NATO Advanced Study Institute, Les Arcs, Savoie, France (June 22-July 5, 1980), D. Reidel Publishing Company, Dordrecht, Boston, London.
[51] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, London, 1970.
[51] G. Kallianpur, Stochastic Filtering Theory, Springer-Verlag, Heidelberg, Berlin, New York, 1980.
[52] R.E. Kalman and R.S. Bucy, New Results in Linear Filtering and Prediction Theory, Trans. ASME, Ser. D: J. Basic Eng., 83, pp. 95-108, 1961.
[53] H. Kunita, The Stability and Approximation Problems in Nonlinear Filtering Theory, in Stochastic Analysis, Academic Press, Inc., pp. 311-329, 1991.
[54] H.J. Kushner, On the Dynamical Equations of Conditional Probability Density Functions with Application to Optimal Stochastic Control Theory, J. Math. Analys. Appl., 8, pp. 332-344, 1964.
[55] H.J. Kushner, On the Differential Equations Satisfied by Conditional Probability Densities of Markov Processes, SIAM J. Contr., 2, pp. 106-119, 1964.
[56] H.J. Kushner, Dynamical Equations for Optimal Nonlinear Filtering, J. Diff. Equations, 3, pp. 179-190, 1967.
[57] R. Katzur, B.Z. Bobrovsky and Z. Schuss, Asymptotic Analysis of the Optimal Filtering Problem for One Dimensional Diffusions Measured in a Low Noise Channel, Part I, SIAM J. Appl. Math., 44, 3, pp. 594-604, 1984.
[58] R. Katzur, B.Z. Bobrovsky and Z. Schuss, Asymptotic Analysis of the Optimal Filtering Problem for One Dimensional Diffusions Measured in a Low Noise Channel, Part II, SIAM J. Appl. Math., 44, 6, pp. 1176-1191, 1984.
[59] P.E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, Springer-Verlag, Heidelberg, Berlin, New York, 1992.
[60] O.A. Ladyzenskaja, V.A. Solonnikov and N.N. Uralceva, Linear and Quasilinear Equations of Parabolic Type, English transl., Translations of Mathematical Monographs, 23, AMS, 1968.
[61] D.F. Liang, Exact and Approximate State Estimation Techniques for Nonlinear Dynamical Systems, in Advances in Contr. and Dynamic Systems, Academic Press, New York, London, 19, pp. 1-71, 1983.
[62] R.S. Liptser and A.N. Shiryayev, Statistics of Random Processes I and II, Springer-Verlag, Heidelberg, Berlin, New York, 1978.
[63] H.J. Larson and B.O. Shubert, Probabilistic Models in Engineering Sciences, I, II, John Wiley and Sons, Inc., 1979.
[64] G.N. Milstein, Approximate Integration of Stochastic Differential Equations, Theory Prob. Appl., 19, 1974.
[65] E. Pardoux, Nonlinear Filtering, Prediction and Smoothing, in Stochastic Systems: The Mathematics of Filtering and Identification and Applications (NATO Advanced Study Institute Series), Reidel, Dordrecht, pp. 529-577, 1980.
[66] E. Pardoux, Stochastic Partial Differential Equations and Filtering of Diffusion Processes, Stochastics, 3, 2, pp. 127-167, 1979.
[67] K.R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, London, 1967.
[68] O. Perez, W. Colmenares, E. Granado and F. Tadeo, Robust Multimodel Control of a Neutralization Process, Proc. ICECS'97, pp. 1335-1340, 1997.
[69] J. Picard, An Estimate of the Error in Time Discretization of Nonlinear Filtering Problems, in Theory and Applications of Nonlinear Control Systems, North-Holland, pp. 401-412, 1986.
[70] A.V. Skorokhod, Studies in the Theory of Random Processes (Eng. trans.), Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, 1965.
[71] D.W. Stroock and S.R.S. Varadhan, Multidimensional Diffusion Processes, Springer-Verlag, Berlin, Heidelberg, New York, 1979.
[72] R. Temam, Infinite Dimensional Systems in Mechanics and Physics, Springer-Verlag, New York, Berlin, Heidelberg, 1980.
[73] K.L. Teo, N.U. Ahmed and M.E. Fisher, Optimal Feedback Control for Linear Stochastic Systems Driven by Counting Processes, Journal of Engineering Optimization, 15, 1, pp. 1-16, 1990.
[74] E. Wong and M. Zakai, On the Convergence of Ordinary Integrals to Stochastic Integrals, Ann. Math. Statist., 36, pp. 1560-1564, 1965.
[75] W.M. Wonham, Random Differential Equations in Control Theory, in Probabilistic Methods in Applied Mathematics, 2 (ed. A.T. Bharucha-Reid), Academic Press, New York, London, 1970.
[76] O. Zeitouni and B.Z. Bobrovsky, On the Reference Probability Approach to the Equations of Nonlinear Filtering, Stochastics, 19, pp. 133-149, 1986.
[77] M. Zakai, On the Optimal Filtering of Diffusion Processes, Z. Wahrsch. Verw. Geb., 11, pp. 230-243, 1969.
[78] G. Da Prato and J. Zabczyk, Stochastic Equations in Infinite Dimensions, Encyclopedia of Mathematics and its Applications, 44, Cambridge University Press, 1992.
[79] P.S. Maybeck, Stochastic Models, Estimation and Control, Vol. 1, Academic Press, New York, San Francisco, London, 1979.
[80] J. Picard, Asymptotic Study of Estimation Problems with Small Observation Noise, in Stochastic Modelling and Filtering, Proc. IFIP-WG 7/1, Rome, Italy, 1984, Springer Lect. Notes in Control and Inf. Sc., Vol. 91, Springer-Verlag, Berlin, Heidelberg, New York, 1987.
[81] J. Golec and G. Ladde, On an Approximation Method for a Class of Stochastic Singularly Perturbed Systems, Dynamic Systems and Applications, Vol. 2, pp. 11-20, 1993.
[82] J. Golec, Sample Path Approximation for a Class of Stochastic Systems, Dynamic Systems and Applications, Vol. 5, pp. 569-581, 1996.
[83] S.K. Biswas and M.B. Subrahmanyam, Worst-Case Estimation of Unknown Sinusoids Contained in Corrupted Measurement Data, American Control Conference, Philadelphia, June 24-26, 1998.
[84] K. Ito, On Stochastic Differential Equations, Memoirs of the American Mathematical Society, No. 4, American Mathematical Society, Providence, Rhode Island, 1951.
[85] M.G. Crandall and P.L. Lions, Viscosity Solutions of Hamilton-Jacobi Equations, Transactions A.M.S., 277, pp. 1-42, 1984.
[86] N.U. Ahmed and X. Ding, Controlled McKean-Vlasov Equations and Viscosity Solutions, Communications in Applied Analysis (to appear, 1998).
[87] T.E. Dabbous, N.U. Ahmed and S.S. Lim, Linear Filtering for a Class of Jump Processes Arising in Navigation Systems, IMA Journal of Mathematical Control and Information, Vol. 7, pp. 269-292, 1990.
[88] J.K. Tugnait, Continuous-Time System Identification on Compact Parameter Sets, IEEE Trans. on Information Theory, IT-31, pp. 652-659, 1985.
[89] S.P. Sethi, Optimal Consumption-Investment Decisions Allowing for Bankruptcy: A Survey, ICOTA-98, Workshop on Optimization: Techniques and Applications, Curtin University of Technology, Western Australia, June 29-30, 1998.
18. Index

Absolute Continuity 5, 7, 51, 66, 125, 169, 171, 190
Adapted 10, 14, 21, 28, 57, 74, 156, 171, 182, 214, 222
Almost sure continuity 169
Borel sets 2, 12, 113
Borel-Cantelli Lemma 8
Brownian motion 10, 25, 33, 48, 66, 92, 118, 151, 167, 213, 236
Backward Kolmogorov equation 39, 41, 46, 214
Cameron-Martin-Girsanov formula 47, 51, 169
Canonical sample space 169
Cauchy inequality 190, 240, 242
Correlated noise 85, 95, 105, 153
Chapman-Kolmogorov equation 40, 44
Conditional expectation 5, 57, 93, 103, 152, 156
Conditional probability 7, 168, 179, 209, 220
Continuity of stochastic processes 9, 18, 44
Continuous martingales 12, 117
Covariance operator 89, 93, 99, 103, 110, 142
Differential Riccati equation 62, 72, 89, 129, 158, 181, 204
Diffusion process 40
Doob's martingale inequality 12
Dynamic programming equation 207, 208, 214
Equivalence of random variables 5; of stochastic processes 9
Error covariance 79, 82
Existence of solutions: SDE 30; filtering problems 125; Zakai equation 199
Feller process 44
Feller semigroup 44
Forward Kolmogorov equation 43, 45
Filtering problem 55, 56, 121, 167
Filter theory 55, 155, 167; linear 55, 69, 77, 85, 95, 105, 113, 121, 141; nonlinear 155, 167, 187
Filtration (filtered probability space) 10
Gaussian random variable 3, 4; random process 11, 210
Gateaux derivative 62, 80, 88, 98
Gain constraints 121
Games theory 131
Girsanov theorem 46
Gronwall inequality (Lemma) 36
HJB equation 208, 214, 225
Identification 227; linear systems 227, 234; nonlinear systems 227, 237
Information sigma algebra 57, 204
Integro-differential equations 64, 145
Ito differential 34
Ito integral 13, 14
Ito-Stratonovich SDE 32, 38
Infinitesimal (differential) generator 42, 43
Innovation process 92, 102, 111
Jump process 9, 13, 113 (see Poisson process)
Kalman filtering 55, 69
Kolmogorov's equations: backward 41; forward 43
Kolmogorov's continuity condition 9
Lebesgue dominated convergence 8, 44
Linear quadratic Gaussian regulator 210
Lp martingale 12
Markov property 10
Markov process 9, 10
Markov semigroup 44
Martingales 12; sub martingale 12; super martingale 12
Discrete Kalman filtering 77
Measurable functions 2, 9
Modes of convergence 4, 5
Measures 2, 4, 45, 48, 51, 113, 169, 176, 187
Measurement process 56, 69, 78
Moments 4, 182
Multiple Wiener integrals 15
Nonanticipative process 14
Nonlinear filtering 155, 167
Observation (measurement) process 56, 69
Optimum filter 55; linear 58, 72, 83, 90, 100, 110, 129, 141; nonlinear 155, 167
Optimal control 203
Optimum gain 80, 83
Partially observed 203; control 203; identification 227
Probability space 1
Poisson process 9, 13, 21, 113
Prediction 68, 186
Probability measure 2
Quadratic variation process 19
Radon-Nikodym derivative (RND) 6, 169
Random variables 2
Riccati differential equations 63, 71, 74, 88
Right continuity 169
Robust filtering 131
Sample path 11, 170
Sample space 1, 169
Semigroups (see Markov, Feller) 44
Stochastic process 9
Stochastic differential equations (SDE) 25, 28, 36
Stochastic integrals 14, 21, 22
Strong solutions 29
Signal process 66
System and measurement dynamics 55, 56, 69, 78, 85, 95, 105, 113, 121, 141, 155
Time optimal control 52
Trajectory 113
Transition kernel 39
Terminal cost 225
Uncorrelated noise 90
Uniqueness of solutions 30
Uniform integrability 53
Value function 206
Viscosity solution 225
Weak solution 51
White noise 11, 150
Wiener process (martingale) 9, 10, 14