MATHEMATICAL PROGRAMMING STUDIES
Editor-in-Chief
M.L. BALINSKI, International Institute for Applied Systems Analysis, Laxenburg, Austria, and City University of New York, N.Y., U.S.A.

Senior Editors
E.M.L. BEALE, Scientific Control Systems, Ltd., London, Great Britain
GEORGE B. DANTZIG, Stanford University, Stanford, Calif., U.S.A.
L. KANTOROVICH, National Academy of Sciences, Moscow, U.S.S.R.
TJALLING C. KOOPMANS, Yale University, New Haven, Conn., U.S.A.
A.W. TUCKER, Princeton University, Princeton, N.J., U.S.A.
PHILIP WOLFE, IBM Research, Yorktown Heights, N.Y., U.S.A.

Associate Editors
PETER BOD, Hungarian Academy of Sciences, Budapest, Hungary
VACLAV CHVATAL, Stanford University, Stanford, Calif., U.S.A.
RICHARD W. COTTLE, Stanford University, Stanford, Calif., U.S.A.
J.E. DENNIS, Jr., Cornell University, Ithaca, N.Y., U.S.A.
B. CURTIS EAVES, Stanford University, Stanford, Calif., U.S.A.
R. FLETCHER, The University, Dundee, Scotland
TERJE HANSEN, Norwegian School of Economics and Business Administration, Bergen, Norway
ELI HELLERMAN, Bureau of the Census, Washington, D.C., U.S.A.
ELLIS L. JOHNSON, IBM Research, Yorktown Heights, N.Y., U.S.A.
C. LEMARECHAL, IRIA-Laboria, Le Chesnay, Yvelines, France
C.E. LEMKE, Rensselaer Polytechnic Institute, Troy, N.Y., U.S.A.
GARTH P. McCORMICK, George Washington University, Washington, D.C., U.S.A.
GEORGE L. NEMHAUSER, Cornell University, Ithaca, N.Y., U.S.A.
WERNER OETTLI, Universität Mannheim, Mannheim, West Germany
MANFRED W. PADBERG, New York University, New York, U.S.A.
L.S. SHAPLEY, The RAND Corporation, Santa Monica, Calif., U.S.A.
K. SPIELBERG, IBM Scientific Center, Philadelphia, Pa., U.S.A.
D.W. WALKUP, Washington University, Saint Louis, Mo., U.S.A.
R. WETS, University of Kentucky, Lexington, Ky., U.S.A.
C. WITZGALL, National Bureau of Standards, Washington, D.C., U.S.A.
MATHEMATICAL PROGRAMMING STUDY 5 Stochastic Systems: Modeling, Identification and Optimization, I Edited by Roger I.-B. WETS
November 1976
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM
© The Mathematical Programming Society, 1976 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
This STUDY is also available to non-subscribers in a book edition.
Printed in The Netherlands
PREFACE
This volume and its companion on stochastic systems are essentially* the proceedings of a symposium held in Lexington, Kentucky, in June 1975. The purpose of the meeting was to bring together researchers involved in one way or another in the description and control of stochastic phenomena.

One sometimes hears that if a field has reached the level where the unsolved problems are well formulated, the language has been unified, and almost everybody agrees on the fundamentals, then - from a researcher's viewpoint - the field is nearly moribund. Taking the opposite criteria as a measure of the vitality of a field, stochastic systems is alive and well. Naturally, the field is more solidly established in modeling than in identification, and we know relatively more about identification than about optimization. However, since there are strong interconnections between the questions raised in optimization, identification and modeling of stochastic processes, it is imperative to approach the problems from a global standpoint. At this juncture, unfortunately, the development of such a general theory seems to be a major undertaking that would also require from its author more than common foresight, since by no means have we even scratched the surface of possible applications of stochastic systems theory. The disparity of background of the researchers - of which those contributing to these volumes are a representative sample - gives us one more hint as to the wide range of applicability of the models. This great variety of applications is also in many ways responsible for the diversity of techniques, terminologies, and so on that are being used. As already indicated, one of the goals of the symposium was to help reduce the distance between the work going on in the different "subfields" of stochastic systems and to start the dialogue that would allow the "translation" of results from one area to another.
In this first volume, we have included those articles that deal with modeling and identification. The second volume is exclusively devoted to

* A few participants were unable to meet the publication's deadlines and we regret that these proceedings are not as complete as they could have been.
models involving - at least partial - control of the stochastic process. (The preceding paragraph indicates that these divisions must remain to some extent arbitrary.) The papers by Çinlar, Elliott and Hida are concerned with the extension of the class of stochastic processes that can be mathematically modeled. Çinlar shows that it is possible to associate to each semi-regenerative process (enjoying the strong Markov property) a Markov additive process. Elliott reviews results on the martingales that can be associated to a jump process. Hida extends the definition of Brownian functional (a function of Brownian motion) to be able to handle certain problems arising in stochastic control, quantum mechanics, and elsewhere.

The possibility of handling stochastic models analytically, often by means of approximations, is the concern of Brockett, Holland, Kurtz, McShane, Pinsky and Zachrisson. Brockett studies a class of Ito equations of the form dx = Ax dt + Σᵢ Bᵢx dwᵢ + Σᵢ gᵢ dvᵢ, where the wᵢ and vᵢ are standard independent Wiener processes. He shows that if the Lie algebra generated by the Bᵢ possesses some natural properties, then many of the features of the purely linear case can be salvaged. Kurtz shows that for a certain class of one-parameter families of Markov chains X_A(t), X_A(t)/A converges in probability as A → ∞ to the solution of a system of ordinary differential equations. He then considers approximation by diffusions. The effect of replacing a stochastic process by an idealized version of this process is studied by McShane. This is done so that convergence and approximation results can be derived. Pinsky considers a system of Ito equations in Rⁿ and studies stability properties, in particular "asymptotic stability in probability". A number of conditions are given that guarantee this property for systems of Ito equations, and the results are applied to the Dirichlet problem Lu = 0 with L the infinitesimal operator of the Ito process.
Zachrisson is also concerned with the "reduction" of practical problems to Gauss-Markov processes. He justifies this procedure by characterizing processes in terms of their first and second order moments and then giving the "equivalent" process generated by a linear Ito equation. Holland uses a probabilistic technique to handle ordinary and boundary layer expansions derived from classes of singularly perturbed semilinear second order elliptic and parabolic partial differential equations. Finally, Beneš settles an equivalence question between an innovation process and a related observation process. This result, a generalization of a result of Clark, exemplifies the difficulties one encounters when passing from the purely modeling stage to modeling with identification.

In which way should we study the passage of information from the
modeler to the controller? In their contribution Ho and Sun view this as a two-person game and study the value of information, which is defined in terms of the relative strength of the players and the nature of the market. Tse shows that there is a tight coupling between the control adopted and the information collected. The probing aspect of the strategy is defined in terms of Shannon's information measure. He then gives an application to the control of a Markov process with imperfect observation.

The papers of Casti, Le Breton, Lindquist and Mehra are more directly concerned with the identifiability of a system and the associated calculations. Le Breton considers the problem of parameter estimation in a linear stochastic differential equation with constant coefficients. He compares continuous and discrete sampling. Mehra suggests the use of Markov models for the estimation, one step ahead, of the values of independent variables in time series. He shows, among other things, that these predicted estimates can be used for computing maximum likelihood estimates of the unknown parameters. Convergence of parameter estimates is the object of the paper of Ljung. This is studied when the parameter set contains a true description of the system that generated the data and also when, more realistically, we cannot assume this a priori nice situation.

In view of the richness of the contributions it has only been possible to give a quick overview of the content of this volume. I hope that nevertheless this volume and its companion will help the process of cross-fertilization that seems so promising. The symposium and the preparation of these volumes would not have been possible without the help of the Mathematics Department of the University of Kentucky and the National Science Foundation. They contributed without hesitation moral and financial support.
I must single out Raymond Cox, Clifford Swanger, Robert Goor and Anders Lindquist (all of the University of Kentucky) and Robert Agins (of N.S.F.). The organizing committee, consisting of A. Balakrishnan, G. Dantzig, H. Kushner, N. Prabhu and R. Rishel, also provided us with invaluable expertise in setting up the program. Finally, we must recognize that the preparation and the running of the symposium would not have been possible without the organizational and secretarial skills of Ms. Linda Newton and Ms. Pat Nichols.

Louvain, 1976
Roger I.-B. Wets
CONTENTS

Preface . . . v
Contents . . . viii

MODELING

(1) Extension of Clark's innovations equivalence theorem to the case of signal z independent of noise, with ∫₀ᵗ z_s² ds < ∞ a.s., V. Beneš . . . 2
(2) Parametrically stochastic linear differential equations, R. Brockett . . . 8
(3) Entrance-exit distributions for Markov additive processes, E. Çinlar . . . 22
(4) Martingales of a jump process and absolutely continuous changes of measure, R. Elliott . . . 39
(5) Analysis of Brownian functionals, T. Hida . . . 53
(6) Probabilistic representations of boundary layer expansions, C. Holland . . . 60
(7) Limit theorems and diffusion approximations for density dependent Markov chains, T. Kurtz . . . 67
(8) The choice of a stochastic model for a noise system, E. McShane . . . 79
(9) Asymptotic stability and angular convergence of stochastic systems, M. Pinsky . . . 93
(10) Remarks on wide sense equivalents of continuous Gauss-Markov processes in Rⁿ, L. Zachrisson . . . 103

IDENTIFICATION

(11) A reduced dimensionality method for the steady-state Kalman filter, J. Casti . . . 116
(12) On continuous and discrete sampling for parameter estimation in diffusion type processes, A. Le Breton . . . 124
(13) On integrals in multi-output discrete-time Kalman-Bucy filtering, A. Lindquist . . . 145
(14) On consistency and identifiability, L. Ljung . . . 169
(15) Identification and estimation of the error-in-variables model (EVM) in structural form, R. Mehra . . . 191
(16) Value of information in zero-sum games, F. Sun and Y. Ho . . . 211
(17) Sequential decision and stochastic control, E. Tse . . . 227
Mathematical Programming Study 5 (1976) 2-7. North-Holland Publishing Company
EXTENSION OF CLARK'S INNOVATIONS EQUIVALENCE THEOREM TO THE CASE OF SIGNAL z INDEPENDENT OF NOISE, WITH ∫₀ᵗ z_s² ds < ∞ a.s.

V.E. BENEŠ
Bell Laboratories, Murray Hill, N.J., U.S.A.

Received 15 September 1975
Revised 20 December 1975

Let z be a process, w a Brownian motion, and y_t = ∫₀ᵗ z_s ds + w_t a noisy observation of z. The innovations process v_t, defined in terms of the estimate ẑ_s = E{z_s | y_r, 0 ≤ r ≤ s} by y_t = ∫₀ᵗ ẑ_s ds + v_t, is also a Brownian motion. The innovations problem is to determine whether y is adapted to v. The positive answer of Clark for bounded z independent of w is extended to z independent of w with ∫₀ᵗ z_s² ds < ∞ a.s.
1. Introduction
It is a classical problem of filtering theory to estimate signals from the past of observations of them corrupted by noise. A standard mathematical idealization of this problem is as follows: the signal z_t is a measurable stochastic process with E|z_t| < ∞, the noise w_t is a Brownian motion, and the observations consist of the process

y_t = ∫₀ᵗ z_s ds + w_t.  (1)
Introduce ẑ_t = E{z_t | y_s, 0 ≤ s ≤ t}, the expected value of z_t given the past of the observations up to t. It can be shown [4] that if ∫₀ᵗ z_s² ds < ∞ a.s., then there is a measurable version of ẑ with ∫₀ᵗ ẑ_s² ds < ∞ a.s. The innovations process for this setup is defined to be

v_t = ∫₀ᵗ (z_s − ẑ_s) ds + w_t,

and it is a basic result of Frost [1] and also of Kailath [2] that under weak conditions v_t is itself a Wiener process with respect to the observations. Thus (1) is equivalent to the integral equation

y_t = ∫₀ᵗ ẑ_s ds + v_t,  (2)
which reduces the general case (1) to that in which z is adapted to y, a special property useful in problems of absolute continuity, filtering, and detection. The innovations problem, first posed by Frost [1], is to determine when the innovations process v contains the same information as the observations y, that is, when y is adapted to v. This problem has been outstanding since about 1968, in both senses of the word, and it has drawn the attention of electrical engineers and probabilists alike. An account of the problem and its background can be found in a paper by Orey [3] and in lecture notes by Meyer [4].

It is now known that the answer to the general innovations problem is in the negative. Cirel'son has given a counterexample [7] for the following special case: suppose that the signal z_t is a causal function a(t, y) of the observations, i.e., the signal is entirely determined by feedback from the observations. Then ẑ = z, w = v, and the problem reduces to asking whether the observations are "well-defined" in the strong sense of being adapted to the noise; for in this vestigial or degenerate case the noise is the only process left. Cirel'son's disturbing example consists of a choice of a(·,·) for which there is a unique weak solution y, nonanticipative in the sense that the future increments of w are independent of the past of y, but such that y cannot be expressed as a causal functional of w.

Prior to this counterexample, several cases of the innovations problem had been settled in the affirmative. Clark [5] showed that if noise and signal are independent and the signal is bounded (uniformly in t and ω), then observations are adapted to innovations. Also, the case of Gaussian observations turns out affirmatively: it follows from results of Hitsuda [8] that in this case ẑ is a linear functional of (the past of) y and the equation (2) is solvable for y by a Neumann series. More generally, the case in which ẑ is a Lipschitz functional of y also turns out affirmatively; in practice, though, it is difficult to find out when this condition is met.

We extend Clark's result to the case where signal and noise are independent and the signal is square-integrable almost surely. The method used is similar to that of Clark, as described by Meyer [4], who also states, without proof, a generalization to sup_t |z_t| < ∞ a.s.
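The affirmative cases have a concrete computational flavor: when ẑ is a Lipschitz functional of y, equation (2) can be solved for y from v by successive substitutions. The following toy sketch (mine, not from the paper) takes a constant N(0,1) signal level independent of the noise, for which ẑ_t = y_t/(1 + t), and recovers the observations from the innovations by Picard iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2000, 1.0
dt = T / n
t = np.arange(n) * dt
# toy case: z is a single N(0,1) level independent of w, so zhat_t = y_t/(1+t)
z = rng.normal()
w = np.concatenate([[0.0], rng.normal(0.0, np.sqrt(dt), n - 1).cumsum()])
y = z * t + w                                   # observations, as in (1)
zhat = y / (1.0 + t)
v = y - np.concatenate([[0.0], (zhat[:-1] * dt).cumsum()])   # innovations
# Picard iteration for y_t = zhat-integral + v_t, started from v alone
u = v.copy()
for _ in range(60):
    u = np.concatenate([[0.0], ((u / (1.0 + t))[:-1] * dt).cumsum()]) + v
print(np.abs(u - y).max())
```

The iteration converges because the map has a Gronwall-type contraction property on [0, T]; the printed error is at roundoff level, illustrating that here y is indeed recoverable from v.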
V.E. Beneg / Extension of Clark's innovations equivalence theorem
2. Preliminaries

The σ-algebra in probability space generated by random variables x_t, t ∈ T, with T some indexing set, is denoted by σ{x_t, t ∈ T}. Since we are assuming that z and w are completely independent, as opposed to just having the future increments of w independent of the past of z, we can represent the conditional expectation ẑ_t by Kallianpur and Striebel's formula [6]. For this and later purposes, it is convenient to introduce, with Orey [3], the functional q defined for pairs of suitable functions f, g by

q(f, g)_t = exp{∫₀ᵗ f(s) dg(s) − ½ ∫₀ᵗ f(s)² ds};

"suitable" pairs f, g will be those for which f is locally square-integrable and g is a Wiener function; for such pairs the stochastic integral in the formula for q is well-defined. With this notation, Kallianpur and Striebel's formula is

ẑ_t = E{z_t | y_s, 0 ≤ s ≤ t} = ∫ Z(dz) z_t q(z, y)_t / ∫ Z(dz) q(z, y)_t = α(t, y),  (3)

where Z is the (independent) measure for the signal process. In addition, the transformation ẑ(·) is defined by ẑ(f)_t = α(t, f). It is easy to see that the denominator Λ_t above is just q(ẑ(y), y)_t. For a quick proof, multiply by Λ_s, integrate both sides from 0 to t with respect to dy, and add one to each side. Then

1 + ∫₀ᵗ ẑ(y)_s Λ_s dy_s = 1 + ∫₀ᵗ ∫ Z(dz) z_s q(z, y)_s dy_s = ∫ Z(dz) [1 + ∫₀ᵗ z_s q(z, y)_s dy_s] = Λ_t,

because dq(z, y)_s = z_s q(z, y)_s dy_s. The unique solution of this equation is just q(ẑ(y), y)_t, and we are done [4].
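Formula (3) also lends itself to direct Monte Carlo evaluation. A minimal sketch (not from the paper) using an assumed toy prior: z is a single N(0,1) random level, constant in time and independent of w, so that q(z, y)_t = exp(z y_t − ½ z² t) and the exact conditional mean is y_t/(1 + t).

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.0
# hypothetical toy prior: z is a single N(0,1) level, constant in time and
# independent of the Brownian noise w, matching the independence assumption
z_true = rng.normal()
y_t = z_true * t + rng.normal(0.0, np.sqrt(t))   # y_t = z*t + w_t
# Monte Carlo version of (3): weight prior samples by q(z, y)_t
zs = rng.normal(size=200_000)                    # samples from the measure Z
logq = zs * y_t - 0.5 * zs**2 * t                # log q(z, y)_t for constant z
wgt = np.exp(logq - logq.max())
zhat = (wgt * zs).sum() / wgt.sum()
exact = y_t / (1.0 + t)                          # closed form for this toy case
print(abs(zhat - exact))
```

The weighted average of prior samples reproduces the closed-form conditional mean up to Monte Carlo error; this is exactly the numerator-over-denominator structure of (3).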
3. Argument

We prove the following result:

Theorem. If z and w are independent, if E|z_t| < ∞ for each t, and if

P{∫₀ᵀ z_s² ds < ∞} = 1,

then σ{y_s, s ≤ t} = σ{v_s, s ≤ t} for 0 ≤ t ≤ T, modulo null sets.

Proof. As in Clark's theorem, described by Meyer [4], we exhibit a sequence of v-adapted processes converging to ẑ; the result then follows from equation (2). Let ||z||² = ∫₀ᵀ z_s² ds. It is apparent that the approximations
ẑ_t^m = ∫_{M(t,m)} Z(dz) q(z, y)_t z_t / ∫_{M(t,m)} Z(dz) q(z, y)_t,  (4)

where M(t, m) = {||z|| ∧ |z_t| ≤ m}, approach ẑ_t as m → ∞, and that each one is adapted to y as a function of (t, ω). Therefore it is enough to prove that each approximation above is adapted to v as a function of (t, ω). Put then, for m = 1, 2, …,

ẑ_t^{m,n+1} = ∫_{M(t,m)} Z(dz) q(z, y^{m,n})_t z_t / ∫_{M(t,m)} Z(dz) q(z, y^{m,n})_t,  m ∧ n ≥ 1,

y_t^{m,n} = ∫₀ᵗ ẑ_s^{m,n} ds + v_t,   y_t^m = ∫₀ᵗ ẑ_s^m ds + v_t,

and note that ẑ^{m,n+1} = ẑ^m(y^{m,n}), d(y^{m,n} − y^m)_s = (ẑ_s^{m,n} − ẑ_s^m) ds, and

q(z, y^{m,n})_t = q(z, y^m)_t exp{∫₀ᵗ z_s (ẑ_s^{m,n} − ẑ_s^m) ds}.

With ψ^{m,n} = ẑ^{m,n} − ẑ^m for short, we find from (4)

ψ_t^{m,n+1} = ∫_{M(t,m)} Z(dz) z_t q(z, y^m)_t exp{∫₀ᵗ z_s ψ_s^{m,n} ds} / ∫_{M(t,m)} Z(dz) q(z, y^m)_t exp{∫₀ᵗ z_s ψ_s^{m,n} ds} − ∫_{M(t,m)} Z(dz) z_t q(z, y^m)_t / ∫_{M(t,m)} Z(dz) q(z, y^m)_t.

Using the inequality |e^A − e^B| ≤ ½(e^A + e^B)|A − B|, and Schwarz's, we find

|ψ_t^{m,n+1}| ≤ ½ ∫∫_{M(t,m)²} Z²(dz × dζ) |z_t| (∫₀ᵗ |z_s − ζ_s| |ψ_s^{m,n}| ds) q(z, y^m)_t q(ζ, y^m)_t [exp{∫₀ᵗ z_s ψ_s^{m,n} ds} + exp{∫₀ᵗ ζ_s ψ_s^{m,n} ds}]
  / [∫_{M(t,m)} Z(dz) q(z, y^m)_t exp{∫₀ᵗ z_s ψ_s^{m,n} ds} ∫_{M(t,m)} Z(dζ) q(ζ, y^m)_t].

It can be verified that (∫₀ᵗ |ψ_s^{m,n}|² ds)^{1/2} is a factor of the right-hand side, and that the value of the integral over M(t, m)² with respect to Z² does not exceed 2√m. This is because on M(t, m)²

∫₀ᵗ |z_s − ζ_s|² ds ≤ 2 ∫₀ᵗ (z_s² + ζ_s²) ds ≤ 4m.

Thus, squaring,

|ψ_t^{m,n+1}|² ≤ m³ ∫₀ᵗ |ψ_s^{m,n}|² ds.

Jensen's inequality and (4) give

∫₀ᵗ |ψ_s^{m,1}|² ds ≤ m² t.

It follows that the ẑ^{m,n} are v-adapted functions converging to ẑ^m; these in turn converge to ẑ; thus ẑ is v-adapted and the theorem is proved. The constraint 0 ≤ t ≤ T is nugatory; the argument works for any T so long as

∫₀ᵀ z_s² ds < ∞  a.s.
Acknowledgment The author is indebted to J.M.C. Clark, P. Frost, T.T. Kadota, H.P. McKean, Jr., and L.A. Shepp for valuable discussions regarding the innovations problem.
References

[1] P.A. Frost, "Estimation in continuous-time nonlinear systems", Dissertation, Dept. of Electrical Engineering, Stanford University, Stanford, Calif. (June 1968).
[2] T. Kailath, "Some extensions of the innovations theorem", The Bell System Technical Journal 50 (1971) 1487-1494.
[3] S. Orey, "Radon-Nikodym derivatives of probability measures: martingale methods", notes from lectures at Tokyo University of Education (Spring 1974).
[4] P.A. Meyer, "Sur un problème de filtration", in: Séminaire de Probabilités VII, Springer Lecture Notes in Mathematics 321 (Springer, Berlin, 1973) pp. 223-247.
[5] J.M.C. Clark, "Conditions for the one-to-one correspondence between an observation process and its innovation", Tech. Rept. 1, Center for Computing and Automation, Imperial College, London (1969).
[6] G. Kallianpur and C. Striebel, "Estimation of stochastic processes. Arbitrary system processes with additive white noise errors", The Annals of Mathematical Statistics 39 (1968) 785-801.
[7] M.P. Ershov, private communication.
[8] M. Hitsuda, "Representation of Gaussian processes equivalent to Wiener process", Osaka Journal of Mathematics 5 (1968) 299-312.
Mathematical Programming Study 5 (1976) 8-21. North-Holland Publishing Company
PARAMETRICALLY STOCHASTIC LINEAR DIFFERENTIAL EQUATIONS*

Roger W. BROCKETT
Harvard University, Cambridge, Mass., U.S.A.

Received 6 June 1975
Revised manuscript received 22 July 1975

In this paper we study the asymptotic properties of a family of Ito equations. Various questions about the existence and uniqueness of invariant measures are resolved by using properties of the moments. In particular, the determinateness of the moment problem is related to a certain compactness condition of a Lie algebra associated with the infinitesimal generator of the process.
1. Introduction

Because of the relationship between controllability and hypoellipticity, the results on existence of smooth diffusions which are based on Hörmander's theory are particularly appealing in a control setting. Some results in this direction are discussed in [1]. In this paper we use this theory and allied methods to study the qualitative and quantitative features of a simple but surprisingly diverse class of Ito equations, namely those of the form

dx = Ax dt + Σᵢ₌₁^m Bᵢx dwᵢ + Σᵢ₌₁^q gᵢ dvᵢ,  (1)

where the wᵢ and vᵢ are standard independent Wiener processes. It may happen, depending on the nature of the Bᵢ terms, that there are many aspects of the linear theory which are not destroyed by the presence of the Bᵢx dwᵢ terms, or that most features of linear theory disappear. For example, the higher order moment properties depend critically on the Lie algebra generated by

* This paper was written while the author held a Guggenheim Fellowship; partial support from the U.S. Office of Naval Research under the Joint Services Electronics Program by Contract N00014-75-C-0648, Division of Engineering and Applied Physics, Harvard University, Cambridge, Mass. is also acknowledged.
the Bᵢ. We show under some natural assumptions that if this algebra leaves invariant a positive definite symmetric form, then for suitable A's the moments of all orders are bounded on [0, ∞) and approach a limit as t goes to infinity, whereas if this condition is not satisfied, such behavior need not occur. We also show that the moments determine the density uniquely (i.e. the moment problem is determinate) if the Lie algebra generated by the {Bᵢ} leaves invariant a positive definite form. We also establish some instances under which one can compute the invariant measure for (1) and in this way motivate some of our results. As in [2] we utilize certain algebraic techniques which arise in connection with the moments of equation (1). These methods permit a fairly complete analysis of the moments, including their behavior for large time and a determination of the extent to which the moments determine the density. We also exploit the connection between (1) and the deterministic control problem

ẋ(t) = (A − ½ Σᵢ₌₁^m Bᵢ²) x(t) + Σᵢ₌₁^m uᵢ(t) Bᵢ x(t) + Σᵢ₌₁^q gᵢ yᵢ(t)  (2)
in which the uᵢ and yᵢ are controls, as well as the aforementioned connection between the controllability of (2) and the hypoellipticity of the diffusion equation corresponding to (1). The probabilistic content of the methods is not great, but there has been a continuing interest in this class of systems [3-9] and our results bring this theory to a new level of completeness. A sample theorem is the following.

Theorem. Let w and v be standard Wiener processes which are independent. Consider the Ito equation in Rⁿ

dx(t) = Ax(t) dt + Bx(t) dw(t) + g dv(t)  (3)

and the associated control equation

ẋ(t) = (A − ½B²) x(t) + u₁(t) Bx(t) + u₂(t) g.  (4)

Then (i) the k-th moment of x(t) exists for all time if it exists initially, (ii) if the variance exists, it satisfies

Σ̇(t) = AΣ(t) + Σ(t)A' + BΣ(t)B' + gg',

and for Σ(0) = 0 controllability of (4) is equivalent to Σ(t) > 0 for t > 0, (iii) if for some Θ = Θ' > 0, ΘB + B'Θ = 0 and ΘA + A'Θ ≤ 0, then all moments which exist are bounded on [0, ∞).
Moreover, if c[·] is any linear functional, then the moment problem for c[x(t)] is determinate for 0 ≤ t ≤ ∞ if all moments exist initially and the moment problem is determinate for all linear functionals on x(0).

A glance at two special cases will serve to indicate the breadth of possibilities embodied in these equations.
Example 1. Consider the scalar Ito equation dx = ax dt + bx dw + c dv. If c is nonzero and a is negative, then we know from general principles that there exists an invariant measure for this system. It is necessarily C^∞ and must satisfy

−(d/dx)(ax p(x)) + ½ (d²/dx²)((b²x² + c²) p(x)) = 0.

For b ≠ 0 one sees, by solving a linear first order differential equation, that p is a generalized Student's density

p(x) = k(α, β)(αx² + 1)^{−β},

where α = b²/c² and β = 1 − a/b². In this case the invariant density does not possess moments of all orders, as is clear. We observe that as b goes to zero not only do additional moments exist but the density approaches the Gaussian density, in accordance with lim_{ε→0} (1 + εk)^{1/ε} = e^k.
Example 2. Consider the Ito equation in Rⁿ

dx = Ax dt + αBx dw + C dv,

where C'C = I, A + A' = 2aI and B = −B', B² = −I. Then one easily sees that m = x'x satisfies the Ito equation

dm = 2am dt + 2x'C dv + α²m dt + tr C'C dt.

Because C is orthogonal, this is equivalent to the Ito equation

dm = (2a + α²)m dt + 2√m dz + n dt,

with z a standard Wiener process. Proceeding formally, for a moment, we solve the steady state Fokker-Planck equation

(d/dm)[2 (d/dm)(mp) − (2a + α²)mp − np] = 0

to get

p = k m^{(n−2)/2} e^{(a+α²/2)m},

and if a + α²/2 is negative, we obtain a candidate for an invariant measure. Transferring this back to x gives a measure which is zero mean and Gaussian with variance −(2a + α²)^{−1} I. One easily sees that this is actually an invariant measure under the given hypothesis. Obviously both these examples admit generalizations of various types. Example 2 will be generalized in Section 3.
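Both conditions that make Example 2 work, the steady state variance equation and the skew symmetry of Q⁻¹B, can be confirmed numerically for a concrete instance (n = 2, a = −1, α = ½, B a rotation generator, C = I; the numbers are my choice, not the paper's):

```python
import numpy as np

a, alpha = -1.0, 0.5
A = np.array([[a, 1.0], [-1.0, a]])       # A + A' = 2aI
B = np.array([[0.0, -1.0], [1.0, 0.0]])   # B = -B', B^2 = -I
C = np.eye(2)                             # orthogonal, C'C = I
Q0 = -1.0 / (2 * a + alpha**2) * np.eye(2)
# steady state variance equation: AQ + QA' + alpha^2 B Q B' + CC' = 0
res_var = A @ Q0 + Q0 @ A.T + alpha**2 * B @ Q0 @ B.T + C @ C.T
# Q0^{-1} B skew symmetric, equivalently B Q0 = -Q0 B'
res_skew = B @ Q0 + Q0 @ B.T
print(np.abs(res_var).max(), np.abs(res_skew).max())
```

Both residuals vanish to machine precision, so the Gaussian candidate with variance −(2a + α²)⁻¹I is indeed stationary for this instance.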
2. Controllability and hypoellipticity

We begin by noting the following result from [9] (page 61, Theorem 4).

Theorem 1. The system in Rⁿ defined by (2) has the property that any state can be driven to any other state in any positive time if and only if {gᵢ} does not belong to a proper subspace of Rⁿ which is invariant under A and {Bᵢ}; alternatively, if {Hᵢ} is a basis for the Lie algebra generated by A and {Bᵢ}, then {Hᵢgⱼ, gⱼ} should contain a basis for Rⁿ.

Remark. Notice that for vector fields in Rⁿ which take the form Ax for some matrix A, the Lie bracket coincides with the ordinary matrix commutator, i.e., [Ax, Bx] = ±(AB − BA)x, depending on the sign convention chosen. Moreover the addition of the ½Bᵢ² terms to A does not affect the controllability condition in this case because of the special form of the equations.

We will say that (2) is controllable if the conditions of Theorem 1 are met. One sees easily that the variance for (1) satisfies the matrix differential equation

Σ̇(t) = AΣ(t) + Σ(t)A' + Σᵢ₌₁^m BᵢΣ(t)B'ᵢ + Σᵢ₌₁^q gᵢg'ᵢ.  (5)
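The rank test of Theorem 1 is easy to mechanize: close {A, B₁, …} under matrix commutators and check that {Hᵢgⱼ, gⱼ} spans Rⁿ. A sketch of such a check (mine, not the paper's) on an assumed 2 × 2 example:

```python
import numpy as np

def add_to_basis(basis, M, tol=1e-9):
    """Add vec(M) to an orthonormal basis if it is not already in the span."""
    v = M.reshape(-1).astype(float).copy()
    for b in basis:
        v -= (v @ b) * b
    nrm = np.linalg.norm(v)
    if nrm > tol:
        basis.append(v / nrm)
        return True
    return False

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # sample matrices, my choice
B = np.array([[0.0, 0.0], [1.0, 0.0]])
g = np.array([[1.0], [0.0]])
n = 2
basis, mats = [], []
for M in (A, B):
    if add_to_basis(basis, M):
        mats.append(M)
changed = True
while changed:                           # close under [X, Y] = XY - YX
    changed = False
    for X in list(mats):
        for Y in list(mats):
            Brk = X @ Y - Y @ X
            if add_to_basis(basis, Brk):
                mats.append(Brk)
                changed = True
cols = np.hstack([g] + [H @ g for H in mats])   # the vectors {H_i g, g}
controllable = np.linalg.matrix_rank(cols) == n
print(controllable)
```

For these matrices the bracket [A, B] supplies the missing direction, so the test reports controllability even though neither A nor B alone moves g everywhere.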
If Σ(0) = 0 then it may happen that Σ is still singular at some future time. Expressing the variance equation with this boundary condition as

Σ(t) = ∫₀ᵗ e^{A(t−σ)} ( Σᵢ₌₁^m Bᵢ Σ(σ) B'ᵢ ) e^{A'(t−σ)} dσ + ∫₀ᵗ e^{A(t−σ)} ( Σᵢ₌₁^q gᵢg'ᵢ ) e^{A'(t−σ)} dσ

makes it clear that as t increases Σ is monotone increasing with respect to the partial order defined by nonnegative definiteness. The following theorem describes the situation precisely (cf. [4]).

Theorem 2. If (2) is not controllable, the solution of (5) with Σ(0) = 0 is singular for all time. If (2) is controllable, then the solution of (5) with Σ(0) = 0 is positive definite for all t > 0 and

L(Q) ≜ AQ + QA' + Σᵢ₌₁^m Bᵢ Q B'ᵢ = −GG'  (6)

admits a positive definite solution if and only if the eigenvalues of L(·) have negative real parts.

Proof. If c'Σ(t₁)c is to vanish, then c'Σ(t)c must vanish for 0 ≤ t ≤ t₁. This can happen only if

(dᵏ/dtᵏ) c' e^{At} gᵢg'ᵢ e^{A't} c ≡ 0,  i = 1, 2, …, q.

But since

Σ(t) ≥ ∫₀ᵗ e^{Aσ} gᵢg'ᵢ e^{A'σ} dσ,

we see from the integral version of (5) that

Σ(t) ≥ Σⱼ₌₁^m ∫₀ᵗ e^{Aσ} Bⱼ ( ∫₀^σ e^{Aρ} gᵢg'ᵢ e^{A'ρ} dρ ) B'ⱼ e^{A'σ} dσ.

Iterating on this argument we see that Aᵏgᵢ, AᵏBⱼAʳgᵢ, AᵏBⱼAʳBⱼ′Aˢgᵢ, etc. must all be orthogonal to c, so that the gᵢ belong to a proper subspace of Rⁿ invariant under A and B₁, B₂, …, B_m. On the other hand, if there is such an invariant subspace which contains all the gᵢ, then clearly Σ is singular.

If the eigenvalues of L(·) have negative real parts, then the solution of (5) is bounded. Since Σ(t) is monotone increasing for Σ(0) = 0, this implies that there is a limit Σ(∞) which satisfies (6). On the other hand, if there exists a positive definite solution P of (6), then for X(0) = X'(0) positive definite we see that the solutions of

Ẋ = A'X + XA + Σᵢ₌₁^m B'ᵢXBᵢ

satisfy X(t) = X'(t) > 0 for all t > 0 and that

(d/dt)⟨X, P⟩ = ⟨AP + PA' + Σᵢ BᵢPB'ᵢ, X(t)⟩ = −⟨GG', X(t)⟩ ≤ 0,

where ⟨P, X⟩ = tr(P'X). Since ⟨X(t), GG'⟩ ≡ 0 together with controllability implies that X(t) ≡ 0, we see from LaSalle's asymptotic stability theorem that all solutions from symmetric initial conditions go to zero. From this it follows that L(·) actually has eigenvalues with negative real parts.

Remark. Let Lᵢ, i = 0, 1, 2, …, m, be a set of first order partial differential operators. The second order operator L = L₀ + L₁² + L₂² + ⋯ + L_m² is said to be hypoelliptic if for any Schwartz distributions φ and ψ such that Lφ = ψ it follows that φ is C^∞ off the support of the singular part of ψ (after a possible modification on a set of zero measure). Hörmander has shown that hypoellipticity is easily checked by computing the Lie algebra of vector fields generated by the Lᵢ. In the present setup we may verify directly that hypoellipticity of the infinitesimal generator of (1) is the same as the controllability condition of the theorem. Thus the Fokker-Planck equation will have C^∞ solutions for t > 0.
3. The diffusion equation and invariant measures

One of our goals will be to establish some properties of the invariant measures of (1). If one could solve for these measures in closed form, then this work would be unnecessary. For this reason we now investigate more fully certain special cases. Associated with the Ito equation (3) is the forward equation

∂p/∂t = −⟨∇ₓ, Axp⟩ + ½⟨∇ₓ∇'ₓ, (Bxx'B' + gg')p⟩.  (7)

One may verify with some effort that

⟨∇ₓ, Axp⟩ = (tr A)p + ⟨∇ₓp, Ax⟩,

⟨∇ₓ∇'ₓ, gg'p⟩ = ⟨g, (∇ₓ∇'ₓp)g⟩,

⟨∇ₓ∇'ₓ, Bxx'B'p⟩ = [tr(B²) + (tr B)²]p + ⟨∇ₓp, [BB' + B'B + tr(B)(B + B')]x⟩ + ⟨Bx, (∇ₓ∇'ₓp)Bx⟩.
Example 2 suggests that we investigate the possibility of a Gaussian invariant measure. For p = ((27r)" det Q)-~e -x'~ we have Vxp = - Q - l x p Vx V'~p = ( - O - ' + O - ' x x ' O - ' ) p .
Using these formulas we see that the steady state part of the Fokker-Planck equation is satisfied if and only if

$$\operatorname{tr}(A) - \langle Ax, Q^{-1}x\rangle - \tfrac{1}{2}\operatorname{tr}(B^2) - \tfrac{1}{2}(\operatorname{tr} B)^2 + \tfrac{1}{2}\langle Q^{-1}x, [BB' + B'B + \operatorname{tr}(B)(B + B')]x\rangle$$
$$+ \tfrac{1}{2}\langle Bx, Q^{-1}Bx\rangle - \tfrac{1}{2}\langle Bx, Q^{-1}xx'Q^{-1}Bx\rangle + \tfrac{1}{2}\operatorname{tr}(gg'Q^{-1}) - \tfrac{1}{2}\langle Q^{-1}x, gg'Q^{-1}x\rangle = 0. \tag{8}$$
In view of the fourth degree terms in $x$ a necessary condition for this equation to hold is that $Q^{-1}B$ be skew symmetric. Since $Q$ would necessarily be a solution of the steady state variance equation

$$QA' + AQ + BQB' + gg' = 0, \tag{9}$$
this gives an easily checked necessary condition. Moreover (9) implies

$$2\operatorname{tr} A + \operatorname{tr}(BQB'Q^{-1}) + \operatorname{tr}(gg'Q^{-1}) = 0.$$

In order to have the zero-th degree terms in (8) cancel it must happen then that $\operatorname{tr}(BQB'Q^{-1}) = -\operatorname{tr}(B^2) - (\operatorname{tr} B)^2$. But this holds if $BQ = -QB'$ since in this case $\operatorname{tr} B = 0$. A short calculation shows that the quadratic terms also sum to zero if $BQ = -QB'$. Putting this together with the stability condition and making the generalization to an arbitrary number of $B$'s gives the following result.

Theorem 3. The Ito equation (1) admits a unique nondegenerate Gaussian invariant measure if and only if the following conditions are satisfied:
(i) equation (2) is controllable;
(ii) $L(\cdot) = A(\cdot) + (\cdot)A' + \sum_{i=1}^m B_i(\cdot)B_i'$ has all its eigenvalues in $\operatorname{Re} s < 0$;
(iii) the solution $Q_0$ of $L(Q) = -GG'$ (which exists and is positive definite by (ii)) satisfies $Q_0B_i' = -B_iQ_0$, $i = 1, 2, \ldots, m$.
In this case the invariant measure is zero mean Gaussian with variance $Q_0$.
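Conditions (ii) and (iii) and the equation $L(Q_0) = -GG'$ can be checked numerically by vectorization, since $\operatorname{vec}(AQ + QA' + \sum_i B_iQB_i') = (I \otimes A + A \otimes I + \sum_i B_i \otimes B_i)\operatorname{vec}(Q)$ for column-major $\operatorname{vec}$. The sketch below is our illustration, not a computation from the paper; the matrices $A$, $B_1$, $GG'$ are hypothetical data chosen so that the hypotheses of Theorem 3 hold.

```python
import numpy as np

def steady_state_variance(A, Bs, GG):
    """Solve L(Q) = AQ + QA' + sum_i B_i Q B_i' = -GG' by vectorization."""
    n = A.shape[0]
    I = np.eye(n)
    op = np.kron(I, A) + np.kron(A, I) + sum(np.kron(B, B) for B in Bs)
    # condition (ii): the operator L(.) has all its eigenvalues in Re s < 0
    assert np.all(np.linalg.eigvals(op).real < 0)
    Q = np.linalg.solve(op, -GG.flatten(order="F")).reshape((n, n), order="F")
    return Q

# illustrative data: A = -I, B skew with BB' = I, GG' = I, so that L(I) = -I
A = -np.eye(2)
B = np.array([[0.0, 1.0], [-1.0, 0.0]])
Q0 = steady_state_variance(A, [B], np.eye(2))
print(np.allclose(Q0, np.eye(2)))        # True: here Q0 = I
print(np.allclose(Q0 @ B.T, -B @ Q0))    # True: condition (iii), Q0 B' = -B Q0
```

With (i)-(iii) verified, Theorem 3 identifies the invariant measure of this toy system as the zero mean Gaussian with variance $Q_0$.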
Notice that it can happen that the system is not hypoelliptic when $B_i = 0$, i.e. it may happen that $(A, G)$ is not controllable, yet the effect of the $B_ix\,\mathrm{d}w_i$ terms is to give hypoellipticity. If we drop the controllability condition in this theorem there still may be an invariant Gaussian measure but it will be
degenerate if (ii) holds, or possibly nondegenerate but nonunique if (i) and (ii) both fail. Thus if "unique" is dropped from the theorem statement, the theorem does not hold.
4. The moment equations

In this section we set up the necessary machinery to investigate the moments of equation (1). This material is classical in a certain sense but I am unaware of a satisfactory reference. Let $N(n, p)$ denote the binomial coefficient $\binom{n+p-1}{p}$. If $A: \mathbf{R}^n \to \mathbf{R}^n$, there is a family of maps $A^{[p]}$, $p = 1, 2, \ldots$, which map $\mathbf{R}^{N(n,p)}$ into $\mathbf{R}^{N(n,p)}$ and which are defined as follows. Given $x \in \mathbf{R}^n$, define $x^{[p]} \in \mathbf{R}^{N(n,p)}$ as

$$x^{[p]} = \operatorname{col}(x_1^p,\; c_1 x_1^{p-1}x_2,\; \ldots,\; x_n^p).$$

That is, the components of $x^{[p]}$ are the (suitably scaled) monomials of degree $p$ in $x_1, \ldots, x_n$, the components of $x$. The scale factors $c_i$ are chosen so as to obtain for the Euclidean norm (see [9])

$$\|x\|^p = \|x^{[p]}\|.$$
The map $A^{[p]}$ is then the unique map such that the following diagram commutes:

$$\begin{array}{ccc} \mathbf{R}^n & \xrightarrow{\;A\;} & \mathbf{R}^n \\ \downarrow{\scriptstyle [p]} & & \downarrow{\scriptstyle [p]} \\ \mathbf{R}^{N(n,p)} & \xrightarrow{\;A^{[p]}\;} & \mathbf{R}^{N(n,p)} \end{array}$$

This process was investigated by I. Schur in connection with the theory of group representations. The entries of the matrix $A^{[p]}$ are homogeneous of degree $p$ in the entries of $A$. Moreover one sees easily that $(AB)^{[p]} = A^{[p]}B^{[p]}$, which more or less explains the significance of the theory for group representations. We are more interested in the infinitesimal version of this process, which plays a corresponding role in the representation of Lie algebras. Suppose $x(t) \in \mathbf{R}^n$ with $x(\cdot)$ differentiable. If $\dot{x}(t) = Ax(t)$, then

$$\frac{\mathrm{d}}{\mathrm{d}t}\,x^{[p]}(t) = \lim_{h\to 0}\frac{1}{h}\bigl[x^{[p]}(t+h) - x^{[p]}(t)\bigr] = \lim_{h\to 0}\frac{1}{h}\bigl[(I + hA)^{[p]}x^{[p]}(t) - x^{[p]}(t)\bigr] \stackrel{\mathrm{def}}{=} A_{[p]}x^{[p]}(t).$$

If $\Phi_A(t, \tau)$ represents the fundamental solution of $\dot{x}(t) = A(t)x(t)$, then of course
$$\Phi_{A_{[p]}}(t, \tau) = [\Phi_A(t, \tau)]^{[p]},$$

which specializes in the constant case to $e^{A_{[p]}t} = [e^{At}]^{[p]}$. We now observe several properties of the "sub $p$" operator:

(a) $[\alpha A + \beta B]_{[p]} = \alpha A_{[p]} + \beta B_{[p]}$.

(b) $(A')_{[p]} = (A_{[p]})'$.

(c) $[I]_{[p]} = pI_{N(n,p)}$; $I$ = the $n$-dimensional identity.

(d) If $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ are the eigenvalues of $A$ (including possible repetitions), the spectrum of $A_{[p]}$ consists of all $\binom{n+p-1}{p}$ formally distinct sums of $p$ eigenvalues of $A$.

(e) If $A$ is replaced by the $(n+1)$-dimensional bordered matrix

$$\hat{A} = \begin{bmatrix} A & g \\ 0 & 0 \end{bmatrix}$$

and $n$ by $n+1$, then, with the degree-$p$ monomials in $(x, 1)$ ordered by increasing degree in $x$ as $\operatorname{col}(1, x, x^{[2]}, \ldots, x^{[p]})$, the matrix $\hat{A}_{[p]}$ is block lower triangular with the blocks $0, A_{[1]}, A_{[2]}, \ldots, A_{[p]}$ along the diagonal.
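The constructions above are easy to experiment with numerically. The sketch below is our illustration, not the paper's computation: it builds $x^{[p]}$ with the multinomial scale factors, recovers $M^{[p]}$ from the defining relation $M^{[p]}x^{[p]} = (Mx)^{[p]}$ by sampling generic points, and checks the norm identity, the multiplicativity $(AB)^{[p]} = A^{[p]}B^{[p]}$, and the multiplicative counterpart of property (d): the spectrum of $A^{[p]}$ consists of the products of $p$ eigenvalues of $A$.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import factorial, sqrt

def lift(x, p):
    """x^{[p]}: the scaled degree-p monomials of x, in lexicographic order."""
    n = len(x)
    out = []
    for idx in combinations_with_replacement(range(n), p):
        counts = [idx.count(i) for i in range(n)]
        c = sqrt(factorial(p) / np.prod([factorial(k) for k in counts]))
        out.append(c * np.prod([x[i] for i in idx]))
    return np.array(out)

def lifted_matrix(M, p):
    """M^{[p]}: the unique matrix with M^{[p]} x^{[p]} = (Mx)^{[p]} for all x."""
    n = M.shape[0]
    N = len(list(combinations_with_replacement(range(n), p)))  # N(n, p)
    X = np.random.default_rng(0).standard_normal((n, N))       # generic sample points
    LX = np.column_stack([lift(X[:, k], p) for k in range(N)])
    LMX = np.column_stack([lift(M @ X[:, k], p) for k in range(N)])
    return LMX @ np.linalg.inv(LX)

x = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.linalg.norm(x) ** 2, np.linalg.norm(lift(x, 2)))  # ||x||^p = ||x^{[p]}||

A = np.array([[2.0, 1.0], [0.0, 3.0]])                  # eigenvalues 2 and 3
C = np.random.default_rng(1).standard_normal((2, 2))
assert np.allclose(lifted_matrix(A @ C, 2), lifted_matrix(A, 2) @ lifted_matrix(C, 2))
# spectrum of A^{[2]}: products of two eigenvalues of A
print(np.round(sorted(np.linalg.eigvals(lifted_matrix(A, 2)).real), 6))  # [4. 6. 9.]
```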
The moment equations for (1) are easily expressed in terms of this notation. It is known [2] that if the $g_i$ are zero in (1) then

$$\frac{\mathrm{d}}{\mathrm{d}t}\,\mathscr{E}x^{[p]} = \Bigl[\Bigl(A - \tfrac{1}{2}\sum_{i=1}^m B_i^2\Bigr)_{[p]} + \tfrac{1}{2}\sum_{i=1}^m (B_{i[p]})^2\Bigr]\,\mathscr{E}x^{[p]}.$$
If the $g_i$ are nonzero we can write (1) as

$$\mathrm{d}\begin{bmatrix} x \\ 1 \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ 1 \end{bmatrix}\mathrm{d}t + \sum_{i=1}^m \begin{bmatrix} B_i & g_i \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ 1 \end{bmatrix}\mathrm{d}w_i.$$
Applying the above construction together with comment (e) above gives

$$\frac{\mathrm{d}}{\mathrm{d}t}\,\mathscr{E}\begin{bmatrix} 1 \\ x \\ x^{[2]} \\ x^{[3]} \\ \vdots \\ x^{[p]} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \tilde{A}_{[1]} + N_1 & 0 & \cdots & 0 & 0 \\ G_2 & 0 & \tilde{A}_{[2]} + N_2 & \cdots & 0 & 0 \\ 0 & G_3 & 0 & \ddots & \vdots & \vdots \\ \vdots & & \ddots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & G_p & 0 & \tilde{A}_{[p]} + N_p \end{bmatrix} \mathscr{E}\begin{bmatrix} 1 \\ x \\ x^{[2]} \\ x^{[3]} \\ \vdots \\ x^{[p]} \end{bmatrix} \tag{10}$$

where $\tilde{A} = A - \tfrac{1}{2}\sum_{i=1}^m B_i^2$, $N_q = \tfrac{1}{2}\sum_{i=1}^m (B_{i[q]})^2$, and the $G_q$ are the coupling matrices defined by the $[q, q-2]$-th block in

$$\tfrac{1}{2}\sum_{i=1}^m \Bigl(\begin{bmatrix} 0 & g_i \\ 0 & 0 \end{bmatrix}_{[p]}\Bigr)^{2}.$$

Thus

$$\frac{\mathrm{d}}{\mathrm{d}t}\,\mathscr{E}x^{[q]} = \Bigl[\Bigl(A - \tfrac{1}{2}\sum_{i=1}^m B_i^2\Bigr)_{[q]} + \tfrac{1}{2}\sum_{i=1}^m (B_{i[q]})^2\Bigr]\mathscr{E}x^{[q]} + G_q\,\mathscr{E}x^{[q-2]}.$$
We note by way of explanation that $\begin{bmatrix} 0 & g \\ 0 & 0 \end{bmatrix}_{[p]}$ has its nonzero entries in blocks along a diagonal immediately below the main diagonal. Thus its square has nonzero entries in blocks along a diagonal two below the main diagonal. The blocks in the square are the product of exactly two nonzero entries. Since $\|A_{[p]}\| \le p\|A\|$ we see that the norms of the $G_p$ terms in (10) are bounded by

$$\|G_p\| \le p^2 \sum_{i=1}^m \|g_i\|^2. \tag{11}$$
As an immediate consequence of this representation of the moment equations we have the following theorem.
Theorem 4. Consider the Ito equation (1). If $p(0, \cdot)$ is such that $\mathscr{E}(c[x(0)])^p$ exists for all linear functionals $c$, then for $p = 1, 2, \ldots$:
(i) $\mathscr{E}(c[x(t)])^p$ exists for all $0 \le t < \infty$ and all linear functionals $c[\cdot]$;
(ii) there exist constants $M_p$ and $\lambda_p$ such that $\mathscr{E}(c[x(t)])^p \le M_p e^{\lambda_p t}(1 + \mathscr{E}\|x(0)\|^p)$;
(iii) there exists an invariant measure if the eigenvalues of $L(\cdot) = A(\cdot) + (\cdot)A' + \sum_{i=1}^m B_i(\cdot)B_i'$ all have negative real parts.
Proof. We see (i) and (ii) from the calculation above. On the other hand, (iii) follows from Theorem 1 of Zakai [11], if we take as a Liapunov function $x'Px$ where $P$ is the steady state variance.
5. Solvability of the moment problem

Since we have a means for solving for the moments of all orders for equation (1) we can, in principle, determine the probability density if the moment problem is determinate. In this section we investigate when this is the case under some natural assumptions on $A$ and $B_1, B_2, \ldots, B_m$.
Theorem 5. Consider the Ito equation (1) in $\mathbf{R}^n$. Suppose that the system is controllable, that $\{B_i\}_{i=1}^m$ leaves invariant a symmetric positive definite bilinear form $x'Qx$, and that $AQ + QA' + \sum_{i=1}^m B_iQB_i' < 0$. Then: (i) equation (1) admits a unique invariant measure; (ii) the moments $\mathscr{E}(c, x)^p$ of the invariant measure satisfy estimates under which the moment problem is determinate; (iii) the invariant measure has a $C^\infty$ density $p(x)$ which is uniquely determined by these moments.
Remark. The estimate (ii) shows more, namely that the characteristic
function for the random variable $(c, x)$ is analytic in a neighborhood of the real axis (see [11], page 487). Thus the Taylor expansion of the characteristic function in a neighborhood of zero determines it on the whole real axis.

Proof. Since $Q$ is positive definite, it can be factored as $Q = R'R$, and if $z = Rx$, then in $z$-coordinates the invariant form is $(z, z)$. Thus in this coordinate system the $\tilde{B}_i$ are skew symmetric and $\tilde{A} + \tilde{A}' + \sum_{i=1}^m \tilde{B}_i\tilde{B}_i' < 0$. Of course $\tilde{B}_i = -\tilde{B}_i'$, so $\tilde{B}_i^2$ is symmetric and negative semidefinite. Thus
$$\tilde{A}_{[p]} + \tfrac{1}{2}\sum_{i=1}^m (\tilde{B}_{i[p]})^2$$

is the sum of a selfadjoint negative semidefinite matrix and a matrix whose symmetric part is negative definite; thus it has its eigenvalues in $\operatorname{Re} s < 0$. From this it follows that the moment equations are all asymptotically stable. Hence there is a unique equilibrium solution of (10). From now on we drop the $\tilde{\ }$ but assume that the above transformation has been made. If $A + A' < 0$, then we know that $\|e^{At}\| \le e^{\mu t}$ where $2\mu$ is the largest eigenvalue of $A + A'$. Then

$$\|A^{-1}\| = \Bigl\|\int_0^\infty e^{At}\,\mathrm{d}t\Bigr\| \le \int_0^\infty e^{\mu t}\,\mathrm{d}t \le \frac{1}{|\mu|},$$

and

$$\|(A_{[p]})^{-1}\| = \Bigl\|\int_0^\infty e^{A_{[p]}t}\,\mathrm{d}t\Bigr\| \le \int_0^\infty e^{p\mu t}\,\mathrm{d}t \le \frac{1}{p|\mu|}.$$
If $B_i = -B_i'$, then $B_{i[p]}$ is also skew symmetric and $(B_{i[p]})^2$ is symmetric and negative semidefinite. Thus the largest eigenvalue of $A_{[p]} + \tfrac{1}{2}\sum_{i=1}^m (B_{i[p]})^2$ is again negative and, following the above steps, we see that

$$\Bigl\|\bigl[A_{[p]} + \tfrac{1}{2}\textstyle\sum_{i=1}^m (B_{i[p]})^2\bigr]^{-1}\Bigr\| \le \frac{1}{p|\mu|}.$$
Thus, at the steady state,

$$\mathscr{E}x^{[q]}(\infty) = -(A_{[q]} + N_q)^{-1}G_q\,\mathscr{E}x^{[q-2]}(\infty),$$

so that for $p$ even

$$\mathscr{E}x^{[p]}(\infty) = (-1)^{p/2}(A_{[p]} + N_p)^{-1}G_p(A_{[p-2]} + N_{p-2})^{-1}G_{p-2}\cdots(A_{[2]} + N_2)^{-1}G_2\cdot 1.$$

Clearly the odd moments vanish. Applying the above estimates (including inequality (11)) we see that for $p$ even

$$\mathscr{E}(c, x)^p \le \|c\|^p K^{p/2}\,p(p-2)(p-4)\cdots 2, \qquad K = \frac{1}{|\mu|}\sum_{i=1}^m \|g_i\|^2.$$
It is well known [12, 13] that this is sufficient for the moment problem to be determinate. Moreover, under the present circumstances the moments of $(c, x)$ determine the density $p(x)$ by standard results ([14], p. 326). Clearly the stability condition implies uniqueness of the invariant measure. The $C^\infty$ part of (iii) comes from hypoellipticity (see [1]).
6. Comments
We should point out some of the difficulties involved in treating these systems from other points of view. First of all, not only are they not fully elliptic but they need not break up as

$$\dot{x}_1 = f(x_1, x_2), \qquad \dot{x}_2 = g(x_1, x_2) + \sigma(x)\dot{w}$$

with $\sigma^{-1}(x)$ uniformly bounded, since we may have $x^{-1}$ terms in $\sigma^{-1}(x)$. Moreover, if such a decomposition is possible we do not have enough a priori information on densities to insure that $p(t, x, U)$ is positive for all open sets $U$ and all $t > 0$, as is required for the Khas'minskii theory. This condition is quite likely to hold but I am unaware of any results in this direction. Even in the circumstances described by Theorem 3, only the invariant measure is known; the transition density will typically not be Gaussian for $t < \infty$ and whether it is everywhere positive or not is unclear. Finally, we point out that the case where the Lie algebra generated by $A$ and $B$ is solvable is rather more like the linear case and has been investigated in detail by Willems [15].
References

[1] D.L. Elliott, "Diffusions on manifolds arising from controllable systems", in: D.Q. Mayne and R.W. Brockett, eds., Geometric methods in system theory (Reidel, Dordrecht, 1973).
[2] R.W. Brockett, "Lie theory and control systems defined on spheres", SIAM Journal on Applied Mathematics 25 (2) (1973) 213-225.
[3] M.B. Nevel'son and R.Z. Khas'minskii, "Stability of a linear system with disturbances of its parameters", Journal of Applied Mathematics and Mechanics 30 (1966) 487-494.
[4] D.L. Kleinman, "On the stability of linear stochastic systems", IEEE Transactions on Automatic Control AC-14 (4) (1969) 429-430.
[5] M. Wonham, "Random differential equations in control theory", in: A.T. Bharucha-Reid, ed., Probabilistic methods in applied mathematics, Vol. 2 (Academic Press, New York, 1974).
[6] J.C. Willems and G. Blankenship, "Frequency domain stability criteria for stochastic systems", IEEE Transactions on Automatic Control AC-15 (4) (1971) 292-299.
[7] U.G. Haussmann, "Stability of linear systems with control dependent noise", SIAM Journal on Control 11 (2) (1973) 382-394.
[8] U.G. Haussmann, "Optimal stationary control with state and control dependent noise", SIAM Journal on Control 9 (1971) 184-198.
[9] U.G. Haussmann, "On the existence of moments of stationary linear systems with multiplicative noise", SIAM Journal on Control 12 (1) (1974) 99-105.
[10] R.W. Brockett, "Lie algebras and Lie groups in control theory", in: D.Q. Mayne and R.W. Brockett, eds., Geometric methods in system theory (Reidel, Dordrecht, 1973).
[11] M. Zakai, "A Liapunov criterion for the existence of stationary probability distributions for systems perturbed by noise", SIAM Journal on Control 7 (1969) 390-397.
[12] W. Feller, An introduction to probability theory and its applications, Vol. 2 (Wiley, New York, 1966).
[13] N.I. Akhiezer, The classical moment problem (Hafner, New York, 1965).
[14] T. Kawata, Fourier series in probability theory (Academic Press, New York, 1972).
[15] J.L. Willems, "Stability of higher order moments for linear stochastic systems", to appear.
Mathematical Programming Study 5 (1976) 22-38, North-Holland Publishing Company
ENTRANCE-EXIT DISTRIBUTIONS FOR MARKOV ADDITIVE PROCESSES*

Erhan ÇINLAR

Northwestern University, Evanston, Ill., U.S.A.
Received 2 June 1975
Revised manuscript received 15 December 1975

Let $X$ be a standard Markov process, and let $S$ be a perfectly additive increasing process with conditionally independent increments given the paths of $X$. Then $(X, S)$ is a Markov additive process. Let $C$ be the random time change associated with $S$, and put $Z_t^- = X(C_t-)$, $Z_t^+ = X(C_t)$, $R_t^- = t - S(C_t-)$, $R_t^+ = S(C_t) - t$. When the state space of $X$ is finite, Getoor [5] has recently obtained the joint distribution of these variables in terms of a triple Laplace transform. Here, the same is obtained explicitly by using renewal theoretic arguments along with the results on Lévy systems of $(X, S)$ given in Çinlar [4]. These results are useful in reliability theory and in the boundary theory of Markov processes.
1. Introduction
Let $(X, S) = (\Omega, \mathscr{M}, \mathscr{M}_t, X_t, S_t, \theta_t, P^x)$ be a Markov additive process. Here $X = (\Omega, \mathscr{M}, \mathscr{M}_t, X_t, \theta_t, P^x)$ is a standard Markov process with state space $(E, \mathscr{E})$ augmented by a point $\Delta$ (see [2] for the definition and other general terminology), and $(S_t)$ is an increasing right-continuous process with $S_0 = 0$ and $S_t = S_{\zeta-}$ for $t \ge \zeta = \inf\{u : X_u = \Delta\}$ which is adapted to $(\mathscr{M}_t)$, is perfectly additive (that is, $S_{t+u}(\omega) = S_t(\omega) + S_u(\theta_t\omega)$ for all $t$, $u$, and $\omega$) and is such that

$$P^x\{X_u \circ \theta_t \in A,\; S_u \circ \theta_t \in B \mid \mathscr{M}_t\} = P^{X(t)}\{X_u \in A,\; S_u \in B\} \tag{1.1}$$

for all $x \in E_\Delta$, $t, u \in \mathbf{R}_+$, $A \in \mathscr{E}_\Delta$, $B \in \mathscr{R}_+$ (see [3] for the precise definition). In general $\mathscr{M}_t$ is larger than $\mathscr{H}_t = \sigma(X_s;\, s \le t)^-$. If $\mathscr{M}_t = \mathscr{H}_t$ for all $t$, then $(S_t)$ becomes an additive functional of $X$. Otherwise, $(S_t)$ has conditionally
* Research supported by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant No. AFOSR-74-2733. The United States Government is authorized to reproduce and distribute reprints for governmental purposes.
independent increments given $\mathscr{H}_\infty$ (this becomes an F-process according to Neveu [15] when $E$ is finite). Define

$$C_s = \inf\{t : S_t > s\}, \tag{1.2}$$

$$Z_s^- = X_{C_s-}, \quad Z_s^+ = X_{C_s}, \quad R_s^- = s - S_{C_s-}, \quad R_s^+ = S_{C_s} - s \tag{1.3}$$

for all $s \ge 0$. We are interested in the joint distribution of these variables. When $E$ is a singleton, $(S_t)$ becomes an increasing Lévy process and then our results reduce to those given by Kingman [11] and Kesten [10]. When $E$ is finite, Getoor [5] computed the triple Laplace transform

$$\int \mathrm{d}t\, e^{-\lambda t}\, P^i\{Z_t^- = j,\; Z_t^+ = k,\; R_t^- \in \mathrm{d}u,\; R_t^+ \in \mathrm{d}v\}\, e^{-pu - qv}.$$
Here we give an explicit derivation of the distribution of (1.3) in general by using renewal theoretic methods along with results on Lévy systems, and a change of variable formula out of Getoor and Sharpe [7]. We will assume throughout that $(S_t)$ is quasi-left-continuous. Then, it was shown in Çinlar [4] that there is a Lévy system $(H, L, a)$ which specifies the jump structure of $(X, S)$ along with the conditional law of $(S_t)$ given $\mathscr{H}_\infty = \sigma(X_s;\, s \ge 0)^-$. Here, $(H_t)$ is a continuous additive functional of $X$, $L$ is a transition kernel from $(E, \mathscr{E})$ into $(E, \mathscr{E}) \times (\mathbf{R}_+, \mathscr{R}_+)$, and $a$ is a positive Borel function on $E$ (see the next section for the precise meaning). Define

$$U(x, f) = E^x \int_0^\infty f(X_t, S_t)\,\mathrm{d}H_t \tag{1.4}$$

for $x \in E$ and $f$ positive Borel on $E \times \mathbf{R}_+$. The following proposition is preparatory to the theorem after it, which is our main result.

(1.5) Proposition. Let $\Lambda = \{a > 0\}$. There exists a transition kernel $(t, x, A) \to u_t(x, A)$ from $(\mathbf{R}_+, \mathscr{R}_+) \times (E, \mathscr{E})$ into $(E, \mathscr{E})$ vanishing on $E \setminus \Lambda$ such that

$$U(x, A \times B) = \int_B u_t(x, A)\,\mathrm{d}t \tag{1.6}$$

for all $x \in E$, $A \subset \Lambda$ Borel, $B \in \mathscr{R}_+$.
(1.7) Main Theorem. For any positive Borel measurable function $f$ on $E \times E \times \mathbf{R}_+ \times \mathbf{R}_+$,

$$E^x[f(Z_t^-, Z_t^+, R_t^-, R_t^+)] = \int_E u_t(x, \mathrm{d}y)\,a(y)\,f(y, y, 0, 0) + \int_{E \times B_t} U(x, \mathrm{d}y, \mathrm{d}s) \int_{E \times \mathbf{R}_+} L(y, \mathrm{d}z, t - s + \mathrm{d}u)\,f(y, z, t - s, u) \tag{1.8}$$

for every $x \in E$ and (Lebesgue) almost every $t \in \mathbf{R}_+$. If $f(\cdot, \cdot, \cdot, 0) = 0$ identically, then the same is true for every $t$.
A number of further results will be given in Section 3 along with the proofs. For the time being we note the interesting deficiency of the preceding theorem: the equality is shown to hold only for almost every $t$. The difficulty lies with computing the probabilities of events contained in $\{R_t^- = 0, R_t^+ = 0\}$. When $E$ is a singleton, this becomes the infamous problem resolved by Kesten [10]. Using his result, we are able to resolve the matter when $X$ is a regular step process (and therefore, in particular, if $E$ is countable). The matter is simpler for the event $\{R_t^- > 0, R_t^+ = 0\}$; in general, the qualifier "almost every $t$" cannot be removed. But again, we do not have a complete solution.

To see the significance of this seemingly small matter, and also to justify the title of this paper, consider a semiregenerative process $(Z_s; M)$ in the sense of Maisonneuve [14]. Here $(Z_s)$ is a process, $M$ is a right closed random set, and $(Z_s)$ enjoys the strong Markov property at all stopping times whose graphs are contained in $M$. This is a slightly different, but equivalent, formulation of the "regenerative systems" of Maisonneuve [12]. Then, $M$ is called the regeneration set, and

$$L_s = \sup\{t \le s : t \in M\}, \qquad N_s = \inf\{t > s : t \in M\} \tag{1.9}$$

are the last time of regeneration before $s$ and the next time of regeneration after $s$.
are the last time of regeneration before t and the next time of regeneration after t. In accordance with the terminology of the boundary theory of Markov processes, the processes (z-,) = ( z L . ) ,
( z ; ) = (z~...)
(1.1o)
are called, respectively, the exit and the entrance processes. Under reasonable conditions on M (see Jacod [9] and also Maisonneuve [15] for the precise results) it can be shown that there exists a Markov additive process (X, S) such that the entities defined by (1.2), (1.3), (1.9), (1.10) are related to each other as follows: M={s:
S,=s
for some
t E R+},
(1.11)
17..~intar / Entrance- exit d~tributions for Markov processes R - , = s - Ls; z~ = zLs-;
R ~ : N~ - s; Z+s = zNs.
25 (1.12) (1.13)
In other words, $R_s^-$ and $R_s^+$ are the "backward and forward recurrence times" and $Z_s^-$ and $Z_s^+$ are the exit and entrance states. So, $R_s^+ = 0$ implies that $s \in M$, that is, that $s$ is a time of regeneration. Conversely, starting with a Markov additive process $(X_t, S_t)$, if $(Z_s)$ is defined by (1.3) and $M$ by (1.11), the pair $(Z_s; M)$ is semiregenerative in the sense of Maisonneuve [14]. It was noted by Maisonneuve [12] that semiregenerative processes may be studied by using the techniques of Neveu [15, 16] and Pyke [17, 19] via Markov additive processes. We are essentially doing just this by bringing in "renewal theoretic" thinking together with the results on Markov additive processes obtained in Çinlar [3, 4]. In fact, our techniques may be used to obtain most of the results given by Maisonneuve [12-14] and by Jacod [9], but we have limited ourselves to results which are extensions of their work. Moreover, these results are related to the last exit-first entrance decompositions for Markov processes of Getoor and Sharpe [6, 7]. We are planning to show the precise connections later; roughly, they were working with the conditional expectations of our additive process $(S_t)$.
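The quantities in (1.2), (1.3) and (1.11)-(1.13) are easy to visualize on a simulated path. The sketch below is a toy illustration, not a construction from the paper: $S$ is taken pure jump and flat between jumps, and $X$ is a hypothetical two-state process that flips at each jump of $S$; at a fixed level $s$ we locate $C_s$ and read off the entrance and exit states and the two recurrence times.

```python
import numpy as np

rng = np.random.default_rng(1)
tau = np.cumsum(rng.exponential(1.0, 50))   # jump times of S
S = np.cumsum(rng.exponential(2.0, 50))     # S at each jump time (S is flat in between)

s = 10.0
k = np.searchsorted(S, s, side="right")     # index of the first jump with S > s
C_s = tau[k]                                # C_s = inf{t : S_t > s}
S_before = S[k - 1] if k > 0 else 0.0       # S(C_s -)
R_minus, R_plus = s - S_before, S[k] - s    # backward / forward recurrence times
Z_minus, Z_plus = k % 2, (k + 1) % 2        # X(C_s -) and X(C_s): X flips at each jump

# s lies in the gap [S(C_s-), S(C_s)) of the range of S, so here s is not
# a regeneration time: both recurrence times are nonzero in general
assert S_before <= s < S[k] and R_minus >= 0 and R_plus > 0
```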
2. Preliminaries
Let $(X, S) = (\Omega, \mathscr{M}, \mathscr{M}_t, X_t, S_t, \theta_t, P^x)$ be a Markov additive process as in the introduction. It was shown in Çinlar [4] that there is a Lévy system $(H', L')$ for $(X, S)$, where $H'$ is a continuous additive functional of $X$ and $L'$ is a transition kernel, such that

$$E^x \sum_{u \le t} f(X_{u-}, X_u, S_u - S_{u-}) = E^x \int_0^t \mathrm{d}H'_u \int_{E \times \mathbf{R}_+} L'(X_u, \mathrm{d}y, \mathrm{d}s)\, f(X_u, y, s) \tag{2.1}$$

(the sum being over the jump times of $(X, S)$) for all $x$, $t$ and all positive $\mathscr{E} \times \mathscr{E} \times \mathscr{R}_+$ measurable functions $f$. This was shown for $X$ Hunt with a reference measure; but the work of Benveniste and Jacod [1] shows that the same is true for arbitrary standard Markov processes $X$. The process $S$ can be decomposed as (see Çinlar [3] for this)
$$S = A + S^d + S^f \tag{2.2}$$

where $A$ is a continuous additive functional of $X$, $S^d$ is a pure jump increasing additive process which is continuous in probability with respect to $P^x(\cdot \mid \mathscr{H}_\infty)$, and $S^f$ is a pure jump increasing additive process whose jumps coincide with those of $X$ and whose jump times are therefore fixed by $X$. We define

$$H_t = H'_t + A_t + t, \tag{2.3}$$

and let the positive $\mathscr{E}$-measurable functions $a$ and $h$ be such that

$$A_t = \int_0^t a(X_s)\,\mathrm{d}H_s; \qquad H'_t = \int_0^t h(X_s)\,\mathrm{d}H_s \tag{2.4}$$

(that this is possible follows from Blumenthal and Getoor [2, Chapter VI]). Define
$$L(x, \cdot) = h(x)\,L'(x, \cdot); \tag{2.5}$$

$$L^d(x, B) = L(x, \{x\} \times B); \qquad L^f(x, A \times B) = L(x, (A \setminus \{x\}) \times B); \tag{2.6}$$

$$K(x, A) = L^f(x, A \times \mathbf{R}_+); \qquad F(x, y, B) = \frac{L^f(x, \mathrm{d}y \times B)}{K(x, \mathrm{d}y)}; \tag{2.7}$$

(in fact one starts from $F$ and $K$ and defines $L^f$; see Çinlar [4] for the derivations). Then, $(H, K)$ is a Lévy system for $X$ alone; $(H, L^f)$ is a Lévy system for the Markov additive process $(X, S^f)$; $(H, a)$ defines $A$ by (2.4); and $(H, L^d)$ defines the conditional law of $S^d$ given $\mathscr{H}_\infty = \sigma(X_s;\, s \ge 0)^-$. Finally, if $\tau$ is a jump time of $X$, then $F(X_{\tau-}, X_\tau, \cdot)$ is the conditional distribution of the magnitude of the jump of $S^f$ at $\tau$ given $\mathscr{H}_\infty$. We call $(H, L, a)$ the Lévy system of $(X, S)$.

The following random time change reduces the complexity of the future computations. Define

$$G_t = \inf\{s : H_s > t\}, \tag{2.9}$$

$$\tilde{X}_t = X(G_t), \qquad \tilde{S}_t = S(G_t), \tag{2.10}$$

and define $\tilde{C}_s$, $\tilde{Z}_s^-$, $\tilde{Z}_s^+$, $\tilde{R}_s^-$, $\tilde{R}_s^+$ by (1.2) and (1.3) but from $(\tilde{X}, \tilde{S})$. Then we have the following
(2.11) Proposition. $(\tilde{X}, \tilde{S})$ is a Markov additive process with a Lévy system $(\tilde{H}, L, a)$ where $\tilde{H}_t = t \wedge \tilde{\zeta}$. Moreover,

$$(\tilde{Z}_s^-, \tilde{Z}_s^+, \tilde{R}_s^-, \tilde{R}_s^+) = (Z_s^-, Z_s^+, R_s^-, R_s^+). \tag{2.12}$$
Proof. This is immediate from the definitions involved since $H$ is strictly increasing and continuous (which makes $G$ continuous and strictly increasing); see also Çinlar [4, Proposition (2.35)].

Note that the potential $U$ defined by (1.4) is related to $(\tilde{X}, \tilde{S})$ by

$$U(x, f) = E^x \int_0^\infty f(X_t, S_t)\,\mathrm{d}H_t = E^x \int_0^\infty f(\tilde{X}_t, \tilde{S}_t)\,\mathrm{d}t. \tag{2.13}$$
In view of (2.11) and (2.13), it is advantageous to work with $(\tilde{X}, \tilde{S})$. We will do this throughout the remainder of this paper, but will also drop the $\tilde{\ }$ from the notation. In other words, we may, without loss of any generality, assume that the Lévy system $(H, L, a)$ is such that $H_t = t \wedge \zeta$.

(2.14) Notations. In addition to the usual notations it will be convenient to introduce the following. For any $t \in \mathbf{R}_+ = [0, \infty)$ we write $R_t = (t, \infty)$ for the set of all real numbers to the right of $t$, and $B_t = [0, t]$ for the set of all numbers before $t$; for $B \subset \mathbf{R}_+$ we write $B - t = \{b - t \ge 0 : b \in B\}$ and $B + t = \{b + t : b \in B\}$. For any topological space $G$ we write $\mathscr{G}$ for the set of all its Borel subsets; we write $f \in p\mathscr{G}$ to mean that $f$ is a positive $\mathscr{G}$-measurable function on $G$. If $N$ is a transition kernel from $(E, \mathscr{E})$ into $(F, \mathscr{F}) \times (G, \mathscr{G})$, we write $N(x, \mathrm{d}y, \mathrm{d}u)$ instead of $N(x, \mathrm{d}(y, u))$, and write $N(x, f, g)$ instead of $N(x, h)$ whenever $h$ has the form $h(y, u) = f(y)g(u)$, that is,

$$N(x, f, g) = \int_{F \times G} N(x, \mathrm{d}y, \mathrm{d}u)\, f(y)\, g(u). \tag{2.15}$$
If $N$ is a transition kernel from $(E, \mathscr{E})$ into $(E, \mathscr{E}) \times (\mathbf{R}_+, \mathscr{R}_+)$, and if $f \in p\mathscr{E} \times \mathscr{R}_+$, we define the "convolution" of $N$ and $f$ by

$$N * f(x, t) = \int_{E \times B_t} N(x, \mathrm{d}y, \mathrm{d}u)\, f(y, t - u). \tag{2.16}$$
If $M$ and $N$ are two transition kernels from $(E, \mathscr{E})$ into $(E, \mathscr{E}) \times (\mathbf{R}_+, \mathscr{R}_+)$, their convolution is defined by

$$M * N(x, h) = \int_{E \times \mathbf{R}_+} M(x, \mathrm{d}y, \mathrm{d}u) \int_{E \times \mathbf{R}_+} N(y, \mathrm{d}z, \mathrm{d}s)\, h(z, u + s), \qquad h \in p\mathscr{E} \times \mathscr{R}_+. \tag{2.17}$$
The convolution operation is associative: $M * (N * f) = (M * N) * f$, but in general not commutative.
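On a finite $E$ with time discretized to a lattice, the convolutions (2.16) and (2.17) become finite sums, and the associativity (and the failure of commutativity) can be checked directly. The sketch below is our illustration, with kernels stored as hypothetical arrays $N[x, y, u]$:

```python
import numpy as np

nE, nT = 2, 6   # finite state space E and time lattice {0, 1, ..., nT-1}

def conv_kf(N, f):
    # (N * f)(x, t) = sum over y and u <= t of N[x, y, u] f[y, t - u]; cf. (2.16)
    out = np.zeros((nE, nT))
    for x in range(nE):
        for t in range(nT):
            for y in range(nE):
                for u in range(t + 1):
                    out[x, t] += N[x, y, u] * f[y, t - u]
    return out

def conv_kk(M, N):
    # (M * N)(x, z, t) = sum over y and u + s = t of M[x, y, u] N[y, z, s]; cf. (2.17),
    # truncated to the lattice
    out = np.zeros((nE, nE, nT))
    for x in range(nE):
        for z in range(nE):
            for t in range(nT):
                for y in range(nE):
                    for u in range(t + 1):
                        out[x, z, t] += M[x, y, u] * N[y, z, t - u]
    return out

rng = np.random.default_rng(0)
M, N = rng.random((nE, nE, nT)), rng.random((nE, nE, nT))
f = rng.random((nE, nT))
print(np.allclose(conv_kf(M, conv_kf(N, f)), conv_kf(conv_kk(M, N), f)))  # True
print(np.allclose(conv_kk(M, N), conv_kk(N, M)))                          # False
```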
3. Proof of the Main Theorem

Let $(X, S)$ be a Markov additive process as in the introduction, and let $(H, L, a)$ be its Lévy system. We may and do assume $H_t = t \wedge \zeta$ without any loss of generality (see the preceding section). Let $b > 0$ be fixed, and define

$$T = \inf\{t : S_t - S_{t-} > b\}. \tag{3.1}$$
The following auxiliary result is of interest in itself. Recall that $R_b = (b, \infty)$.

(3.2) Proposition. For all $x \in E$ and $f \in p\mathscr{E} \times \mathscr{R}_+$,

$$E^x[f(X_{T-}, S_{T-})] = E^x \int_0^T f(X_t, S_t)\, L(X_t, E, R_b)\,\mathrm{d}t.$$
Proof. Define

$$S_t^b = A_t + \sum_{u \le t} (S_u - S_{u-})\,1_{\{S_u - S_{u-} \le b\}}. \tag{3.3}$$

Then $S_t = S_t^b$ on $\{T > t\}$ and $S_{T-} = S_{T-}^b = S_T^b$. Moreover, given $\mathscr{H}_\infty = \sigma(X_s;\, s \ge 0)^-$, $T$ is conditionally independent of $X$ and $S^b$, and

$$M_t = P^x\{T > t \mid \mathscr{H}_\infty\} = \exp\Bigl[-\int_0^t L^d(X_s, R_b)\,\mathrm{d}s\Bigr] \prod_{u \le t} F(X_{u-}, X_u, B_b)$$
(see Çinlar [3]). Hence, with $\mathscr{K}_\infty = \sigma(X_t, S_t^b;\, t \ge 0)^-$, we have

$$E^x[f(X_{T-}, S_{T-}) \mid \mathscr{K}_\infty] = \int_{\mathbf{R}_+} f(X_{t-}, S_{t-}^b)(-\mathrm{d}M_t) \tag{3.4}$$

$$= \int_0^\infty f(X_{t-}, S_{t-}^b)\, M_t\, L^d(X_t, R_b)\,\mathrm{d}t + \sum_{t \in \mathbf{R}_+} f(X_{t-}, S_{t-}^b)\, M_{t-}\, F(X_{t-}, X_t, R_b).$$
Now, the process $W_t = f(X_{t-}, S_{t-}^b)\,M_{t-}$ is predictable; therefore, by theorems on Lévy systems,

$$E^x\Bigl[\sum_t W_t\, F(X_{t-}, X_t, R_b)\Bigr] = E^x \int_0^\infty \mathrm{d}t \int_E K(X_{t-}, \mathrm{d}y)\, F(X_{t-}, y, R_b)\, W_t \tag{3.5}$$

$$= E^x \int_0^\infty W_t\, L^f(X_{t-}, E, R_b)\,\mathrm{d}t.$$
Putting this into (3.4) while taking expectations, and noting that $X_{t-}$, $S_{t-}^b$, $M_{t-}$ can be replaced by $X_t$, $S_t^b$, $M_t$ since their jump times are countable, we get

$$E^x[f(X_{T-}, S_{T-})] = E^x \int_0^\infty f(X_t, S_t^b)\, M_t\,\bigl(L^d(X_t, R_b) + L^f(X_t, E, R_b)\bigr)\,\mathrm{d}t$$

$$= E^x \int_0^\infty f(X_t, S_t^b)\, L(X_t, E, R_b)\,1_{\{T > t\}}\,\mathrm{d}t = E^x \int_0^T f(X_t, S_t)\, L(X_t, E, R_b)\,\mathrm{d}t$$
as desired.

The following is immediate from the strong Markov property for $(X, S)$; see Çinlar [3, p. 103].

(3.6) Proposition. Let $\tau$ be a stopping time of $(\mathscr{M}_t)$ and define $Q(x, f) = E^x[f(X_\tau, S_\tau)]$. Then, for any $f \in p\mathscr{E} \times \mathscr{R}_+$ and $x \in E$,

$$U(x, f) = E^x \int_0^\tau f(X_t, S_t)\,\mathrm{d}t + Q * U(x, f).$$
The next result is essentially the second statement of the main theorem (1.7).

(3.7) Proposition. Let $b > 0$, $A \times B \times C \times D \in \mathscr{E} \times \mathscr{E} \times \mathscr{R}_+ \times \mathscr{R}_+$, and

$$\Gamma_t = \{Z_t^- \in A,\; Z_t^+ \in B,\; R_t^- \in C,\; R_t^+ \in b + D\}. \tag{3.8}$$

Then,

$$P^x(\Gamma_t) = \int_{A \times B_t} U(x, \mathrm{d}y, \mathrm{d}s)\,1_C(t - s)\,L(y, B, t - s + b + D) \tag{3.9}$$

for every $x \in E$ and every $t \in \mathbf{R}_+$ (recall that $B_t = [0, t]$).
Proof. Define $T$ as in (3.1), and put

$$f(x, t) = P^x(\Gamma_t); \qquad g(x, t) = P^x(\Gamma_t;\; S_T > t). \tag{3.10}$$

Then, by the strong Markov property for $(X, S)$ at $T$, and by the additivity of $S$, which implies $S_{T+t} = S_T + S_t \circ \theta_T$,

$$f(x, t) = g(x, t) + P^x(\Gamma_t;\; S_T \le t) = g(x, t) + E^x[f(X_T, t - S_T);\; S_T \le t] = g(x, t) + \int_{E \times B_t} Q(x, \mathrm{d}y, \mathrm{d}u)\, f(y, t - u) \tag{3.11}$$

where

$$Q(x, k) = E^x[k(X_T, S_T)], \qquad k \in p\mathscr{E} \times \mathscr{R}_+. \tag{3.12}$$

Next consider $g(x, t)$. Since $T$ is the time of the first jump with magnitude greater than $b$, on $\{S_T > t\}$ we have $C_t = T$, and $Z_t^- = X_{T-}$, $Z_t^+ = X_T$, $R_t^- = t - S_{T-}$, and $R_t^+ = S_T - t$. So,

$$g(x, t) = P^x\{X_{T-} \in A,\; X_T \in B,\; S_{T-} \in t - C,\; S_T \in t + b + D\}. \tag{3.13}$$

The stopping time $T$ is totally inaccessible since $S$ is quasi-left-continuous. Therefore, by the results of Weil [19] on conditioning on the strict past $\mathscr{M}_{T-}$, we have

$$P^x\{X_T \in B,\; S_T - S_{T-} \in b + D' \mid \mathscr{M}_{T-}\} = \frac{L(X_{T-}, B, b + D')}{L(X_{T-}, E, R_b)} \tag{3.14}$$

for any $B \in \mathscr{E}$ and $D' \in \mathscr{R}_+$; here we used the additivity of $S$, so that $S_T - S_{T-}$ is conditionally independent of $\mathscr{M}_{T-}$ given $X_{T-}$. Putting (3.14) into (3.13) we obtain

$$g(x, t) = E^x[h(X_{T-}, t - S_{T-})/L(X_{T-}, E, R_b)] \tag{3.15}$$

where

$$h(y, u) = 1_A(y)\,1_C(u)\,L(y, B, u + b + D). \tag{3.16}$$

In view of Proposition (3.2), (3.15) implies

$$g = V * h \tag{3.17}$$

where $h$ is as defined by (3.16) and

$$V(x, k) = E^x \int_0^T k(X_t, S_t)\,\mathrm{d}t, \qquad k \in p\mathscr{E} \times \mathscr{R}_+. \tag{3.18}$$

Putting (3.17) into (3.11) we obtain

$$f = V * h + Q * f, \tag{3.19}$$

and by Proposition (3.6), we have

$$U * h = V * h + Q * U * h. \tag{3.20}$$

It is now clear from (3.19) and (3.20) that

$$f = U * h \tag{3.21}$$

is a solution to (3.19). Note that $U * h(x, t)$ is exactly the right-hand side of (3.9). Therefore, the proof will be complete once we show that $U * h$ is the only solution to (3.19). To show this, let $f'$ and $f''$ be two bounded solutions to (3.19); then $k = f' - f''$ satisfies

$$k = Q * k. \tag{3.22}$$

Let $Q_n$ be defined recursively by $Q_1 = Q$, $Q_{n+1} = Q * Q_n$ through the formula (2.17). Then, (3.22) implies $k = Q_n * k$, and

$$|k(x, t)| = |Q_n * k(x, t)| \le c\,Q_n(x, E, B_t) \tag{3.23}$$

for every $n$, where $c$ is the bound for $k$. On the other hand, $T$ is totally inaccessible, and its iterates defined by

$$T_0 = 0, \qquad T_{n+1} = T_n + T \circ \theta_{T_n} \tag{3.24}$$

are all stopping times, and $S_{T_{n+1}} \ge S_{T_n} + b$ by the definition of $T$. Hence,

$$Q_n(x, E, B_t) = P^x\{S_{T_n} \le t,\; X_{T_n} \in E\} \to 0 \tag{3.25}$$

as $n \to \infty$ for any fixed $t \in \mathbf{R}_+$. This implies through (3.23) that $k(x, t) = 0$ for all $x$ and $t$; that is, the only bounded solution of (3.22) is $k = 0$, and therefore (3.19) has exactly one solution.

(3.26) Corollary. For any $f \in p\mathscr{E} \times \mathscr{E} \times \mathscr{R}_+ \times \mathscr{R}_+$ such that $f(\cdot, \cdot, \cdot, 0) = 0$, the equation (1.8) holds for every $x \in E$ and every $t \in \mathbf{R}_+$. Moreover, for all $x$ and $t$ again,

$$E^x[f(Z_t^-, Z_t^+, R_t^-, R_t^+);\; Z_t^- = Z_t^+] = \int_{E \times B_t} U(x, \mathrm{d}y, \mathrm{d}s)\, L^d(y, t - s + \mathrm{d}u)\, f(y, y, t - s, u); \tag{3.27}$$

$$E^x[f(Z_t^-, Z_t^+, R_t^-, R_t^+);\; Z_t^- \ne Z_t^+] = \int_{E \times B_t} U(x, \mathrm{d}y, \mathrm{d}s)\, L^f(y, \mathrm{d}z, t - s + \mathrm{d}u)\, f(y, z, t - s, u). \tag{3.28}$$

Proof. In Proposition (3.7) let $D = R_0$. Then, as $b \downarrow 0$, $L(y, B, t - s + b + D)$ increases to $L(y, B, t - s + D)$; and hence, by the monotone convergence theorem, (3.9) remains true with $b = 0$ in (3.8) and (3.9) both. Now, by the usual monotone class arguments, (3.9) holds true for arbitrary Borel subsets $D$ of $R_0 = (0, \infty)$. This is equivalent to the first statement by the monotone class theorem again. The second statement, namely that concerning (3.27) and (3.28), is immediate from the first statement coupled with the definitions (2.6).

The preceding corollary is the second statement of Theorem (1.7). There remains the problem of computing the probabilities of the subsets of $\{R_t^+ = 0\}$, namely the event that $t$ is a time of regeneration. To that end, we start by computing the distribution of $(Z_t^-, Z_t^+)$.

(3.29) Proposition.
For any $f \in p\mathscr{E} \times \mathscr{E}$, $x \in E$, and $\lambda \in \mathbf{R}_+$,

$$E^x \int_0^\infty \lambda e^{-\lambda t} f(Z_t^-, Z_t^+)\,\mathrm{d}t = \int_{E \times \mathbf{R}_+} U(x, \mathrm{d}y, \mathrm{d}s)\, e^{-\lambda s}\Bigl[\lambda a(y) f(y, y) + \int_{E \times \mathbf{R}_+} L(y, \mathrm{d}z, \mathrm{d}u)\, f(y, z)(1 - e^{-\lambda u})\Bigr]. \tag{3.30}$$
Proof. Let $e_\lambda$ be the mapping $x \to e^{-\lambda x}$ and recall the definition (1.2) of $(C_s)$. By the general change of variable formula given by Getoor and Sharpe [6, p. 551], for any $g \in p\mathscr{R}_+$,

$$\int_{[0,\infty)} g(t)\,\mathrm{d}(1 - e_\lambda(S_t)) = \int_{[0,S_\infty)} g(C_t)\,\lambda e^{-\lambda t}\,\mathrm{d}t. \tag{3.31}$$
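The change of variable formula (3.31) can be verified numerically on a simulated path. The sketch below is our illustration (assuming a pure-jump $S$, flat between jumps): the left-hand side is the Stieltjes sum over the atoms of $\mathrm{d}(1 - e_\lambda(S_t))$ at the jump times of $S$, and the right-hand side is a Riemann sum over $[0, S_\infty)$ using $C_u = \inf\{t : S_t > u\}$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.7
tau = np.cumsum(rng.exponential(1.0, 40))   # jump times of S
S = np.cumsum(rng.exponential(1.5, 40))     # S at each jump (flat in between)
g = np.cos                                   # any bounded Borel function g

# lhs(3.31): d(1 - exp(-lam * S_t)) puts an atom at each jump time of S
S_prev = np.concatenate(([0.0], S[:-1]))
lhs = np.sum(g(tau) * (np.exp(-lam * S_prev) - np.exp(-lam * S)))

# rhs(3.31): integrate g(C_t) * lam * exp(-lam * t) over [0, S_infinity)
u = np.linspace(0.0, S[-1], 400_000, endpoint=False)
C_u = tau[np.searchsorted(S, u, side="right")]   # C_u = inf{t : S_t > u}
rhs = np.sum(g(C_u) * lam * np.exp(-lam * u)) * (u[1] - u[0])

print(abs(lhs - rhs) < 1e-3)   # True: the two sides agree to quadrature accuracy
```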
Taking $g(t) = f(X_{t-}, X_t)$ above, recalling (1.3), and noting that for $t \ge S_\infty$ we have $C_t = \infty$ and $X_\infty = \Delta$ and $f(X_{t-}, X_t) = 0$ by the usual conventions, we get

$$\mathrm{lhs}(3.30) = E^x \int_{[0,\infty)} f(X_{t-}, X_t)\,\mathrm{d}(1 - e_\lambda(S_t)), \tag{3.32}$$

where "lhs($\cdot$)" stands for the "left-hand side of ($\cdot$)". By the generalized version of Fubini's theorem,

$$E^x\Bigl[\int_0^\infty f(X_{t-}, X_t)\,\mathrm{d}(-e_\lambda(S_t)) \Bigm| \mathscr{H}_\infty\Bigr] = -\int_0^\infty f(X_{t-}, X_t)\,\mathrm{d}M_t, \tag{3.33}$$
where

$$M_t = E^x[e_\lambda(S_t) \mid \mathscr{H}_\infty] = \exp\Bigl[-\lambda \int_0^t a(X_s)\,\mathrm{d}s - \int_0^t L^d(X_s, 1 - e_\lambda)\,\mathrm{d}s\Bigr] \prod_{s \le t} F(X_{s-}, X_s, e_\lambda) \tag{3.34}$$
(see Çinlar [3] for this formula). Now (3.32), (3.33), (3.34) yield

$$\mathrm{lhs}(3.30) = E^x \int_0^\infty f(X_{t-}, X_t)\, M_t\,\bigl(\lambda a(X_t) + L^d(X_t, 1 - e_\lambda)\bigr)\,\mathrm{d}t + E^x \sum_t f(X_{t-}, X_t)\, M_{t-}\, F(X_{t-}, X_t, 1 - e_\lambda). \tag{3.35}$$

The process $(M_{t-})$ is predictable; therefore, by theorems on Lévy systems, the second term on the right-hand side is equal to

$$E^x \int_0^\infty \mathrm{d}t\, M_t \int_E K(X_t, \mathrm{d}y)\, f(X_t, y)\, F(X_t, y, 1 - e_\lambda) = E^x \int_0^\infty \mathrm{d}t\, M_t \int_E L^f(X_t, \mathrm{d}z, 1 - e_\lambda)\, f(X_t, z)$$
by the definitions (2.6) and (2.7). Putting this in (3.35) and noting (3.34), we see that

$$\mathrm{lhs}(3.30) = E^x \int_0^\infty g(X_t)\, M_t\,\mathrm{d}t = E^x \int_0^\infty g(X_t)\, e_\lambda(S_t)\,\mathrm{d}t = U(x, g, e_\lambda) \tag{3.36}$$

where

$$g(y) = f(y, y)\bigl(\lambda a(y) + L^d(y, 1 - e_\lambda)\bigr) + \int L^f(y, \mathrm{d}z, 1 - e_\lambda)\, f(y, z) = \lambda a(y) f(y, y) + \int L(y, \mathrm{d}z, 1 - e_\lambda)\, f(y, z). \tag{3.37}$$

With this $g$, $U(x, g, e_\lambda)$ is precisely the right-hand side of (3.30); and thus the proof is complete.

Next we consider the problem of inverting the Laplace transform (3.30). Note that $g$ defined by (3.37) can be written as

$$g(y) = \lambda\Bigl\{a(y) f(y, y) \int \varepsilon_0(\mathrm{d}u)\, e^{-\lambda u} + \int_0^\infty \Bigl[\int_E L(y, \mathrm{d}z, R_u)\, f(y, z)\Bigr] e^{-\lambda u}\,\mathrm{d}u\Bigr\}, \tag{3.38}$$

which has the form $\lambda \int n(y, \mathrm{d}u)\, e^{-\lambda u}$. Putting this in (3.36) we see that
$$U(x, g, e_\lambda) = \lambda \int U * n(x, \mathrm{d}u)\, e^{-\lambda u},$$

and this is equal to the right-hand side of (3.30). Inverting the Laplace transforms on both sides of (3.30), we obtain

$$\int_B E^x[f(Z_t^-, Z_t^+)]\,\mathrm{d}t = \int_E U(x, \mathrm{d}y, B)\, a(y)\, f(y, y) + \int_B \mathrm{d}t \int_{E \times B_t} U(x, \mathrm{d}y, \mathrm{d}s) \int_E L(y, \mathrm{d}z, R_{t-s})\, f(y, z) \tag{3.39}$$

for every $B \in \mathscr{R}_+$. We are now ready to give the proof.

(3.40) Proof of Proposition (1.5). Choose $f$ such that $h(y) = f(y, y)$ is strictly positive, and let $\Lambda = \{a > 0\}$. Now the first term on the right-hand side of (3.39) is $U(x, a \cdot h, B)$, and clearly this is at most equal to the left-hand side. It follows that the measure $B \to U(x, a \cdot h, B)$ is absolutely continuous with respect to the Lebesgue measure. Let $u_t(x, ah)$ be its Radon-Nikodym derivative with respect to the Lebesgue measure. Since $X$ is standard, its state space $(E, \mathscr{E})$ is locally compact with a countable base. Therefore, it is possible to choose this derivative such that $(t, x) \to u_t(x, ah)$ is $\mathscr{R}_+ \times \mathscr{E}$ measurable. Now, for $k \in p\mathscr{E}$, define $\hat{u}_t(x, k)$ to be $u_t(x, a\hat{k})$ where $\hat{k} = k/a$. By the special nature of $(E, \mathscr{E})$ again, by theorems on the existence of regular versions of conditional probabilities, we may take $\hat{u}_t(x, \cdot)$ to be a measure on $\mathscr{E}$ while retaining the measurability of the mapping $(t, x) \to \hat{u}_t(x, A)$ for each $A \in \mathscr{E}$. Finally, let
u,(x,A)= 6,(x,A AA),
A E ~.
(3.41)
The statement (1.5) is true for this transition kernel u. The following is immediate from Proposition (1.5) applied to (3.39).
(3.42) Theorem. For any f ∈ p(ℰ × ℰ) and x ∈ E,
E^x[f(Z_t^-, Z_t^+)] = ∫_E u_t(x, dy) a(y) f(y, y) + ∫_{E×[0,t]} U(x, dy, ds) ∫_E L(y, dz, R_{t−s}) f(y, z)   (3.43)
for (Lebesgue) almost every t ∈ R_+.
In view of Corollary (3.26), the second term on the right-hand side of (3.43) is equal to the expectation of f(Z_t^-, Z_t^+) on {R_t^+ > 0}. Hence, (3.43) implies that
Z_t^- = Z_t^+ ∈ A a.s. on {R_t^+ = 0};
E^x[g(Z_t^+); R_t^+ = 0] = u_t(x, ag)   (3.44)
for any g ∈ pℰ, x ∈ E, and almost every t ∈ R_+.
The method of the proof of Proposition (3.7) goes through to show
P^x{Z_t^- ∈ A, Z_t^+ ∈ B, R_t^- ∈ b + C, R_t^+ ∈ D} = ∫_{A×[0,t]} U(x, dy, ds) I_C(t − s − b) L(y, B, t − s + D)   (3.45)
for all A, B ∈ ℰ, all C, D ∈ ℬ(R_+), all t ∈ R_+, for b > 0. In particular, this yields
P^x{Z_t^- = Z_t^+ ∈ A, R_t^- > 0, R_t^+ = 0} = ∫_{A×[0,t]} U(x, dy, ds) L^d(y, {t − s})   (3.46)
= ∫_A ∫_0^t u_s(x, dy) L^d(y, {t − s}) ds = 0
since the function s → L^d(y, {t − s}) is zero everywhere except on a countable set. Hence, for any t,
R_t^- = 0 a.s. on {Z_t^- = Z_t^+ ∈ A; R_t^+ = 0}.   (3.47)
Conversely, Corollary (3.26) and Theorem (3.42) show that
R_t^+ = 0 a.s. on {Z_t^- = Z_t^+, R_t^- = 0}.   (3.48)
It follows that a.s. on {Z_t^- = Z_t^+}, either R_t^- = R_t^+ = 0, or R_t^- > 0 and R_t^+ > 0.
(3.49) Proof of Theorem (1.7). The proof now follows from these observations put together with Theorem (3.42) and Corollary (3.26).
4. From almost to all
This section is devoted to showing that under certain reasonable conditions, when X is a regular step process, Theorem (1.7) can be strengthened so that (4.8) is true for every t (instead of for almost every t). Unfortunately, our technique does not generalize to arbitrary X. We shall need the following facts concerning the case where X is trivial:
that is, where E is a singleton {x}. Then, writing ũ_t = u_t(x, {x}), U(ds) = U(x, {x}, ds), etc., we obtain from Theorem (1.7) that
E[f(R_t^-, R_t^+)] = ũ_t a f(0, 0) + ∫_{[0,t]} U(ds) ∫ L^d(t − s + du) f(t − s, u)   (4.1)
for almost every t. This is the result which Kingman [11] obtained by inverting a triple Laplace transform in the case where a > 0. If a > 0, then t → ũ_t is continuous and U(ds) = ũ_s ds, and (4.1) holds for every t (this is due to Neveu [15]). If a = 0 and L^d(R_0) = +∞, then ũ_t = 0 and (4.1) holds for every t again (this is due to Kesten [10] essentially). If a = 0 and L^d(R_0) < ∞, then ũ_t does not vanish, but the restriction "almost every t" cannot be removed in the absence of further restrictions on the smoothness of L^d. In fact, if L^d has a countable support and a = 0, L^d(R_0) < ∞, then (4.1) fails to hold at every t belonging to the group generated by that support. In view of this, the only real restriction in the proposition below is on the process X. Define
D = {x ∈ E: a(x) > 0 or L^d(x, R_0) = ∞}.   (4.2)
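The failure mode excluded by D can be pinned down in the simplest data. The following worked instance is our own illustration, not part of the paper; the choice of a unit point mass for the Lévy measure is an assumption made for concreteness.

```latex
% Assumed data: trivial case E = \{x\}, drift a = 0, and a unit point mass
%   L^d(du) = \delta_1(du), \qquad L^d(R_0) = 1 < \infty ,
% so S is a driftless compound Poisson subordinator with unit jumps.
% Its closed range is \{0, 1, 2, \dots\}, whence
% \[
%   R_t^- = t - \lfloor t \rfloor \quad \text{a.s.},
% \]
% a deterministic quantity with an atom at 0 exactly at integer times.
% The group generated by the support \{1\} is \mathbb{Z}, and (4.1)
% holds for almost every t but fails at the integer times, as claimed.
```

This is the smallest example showing why points with a = 0 and finite Lévy mass must be cut out of D.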
(4.3) Proposition. Suppose X is a regular step process. Then we may take t → u_t(x, A) to be continuous, and we have
E^x[g(Z_t^+, R_t^-, R_t^+)] = ∫_D u_t(x, dy) a(y) g(y, 0, 0) + ∫_{D×[0,t]} U(x, dy, ds) ∫ L^d(y, t − s + du) g(y, t − s, u)   (4.4)
for all x ∈ E and all t ∈ R_+.
(4.5) Remark. Suppose X is a regular step process and (S_t) is strictly increasing (which means that the process defined by (1.2) is continuous, which in turn means that the regeneration set M is without isolated points). In particular, Getoor [5] assumes this holds. (S_t) can be strictly increasing only if D = E. Hence, Proposition (4.3) applies to this important case.
Proof of (4.3). The expectation on {R_t^- > 0} is equal to the second term on the right-hand side of (4.4) by the second statement of Theorem (1.7). Hence, we need only show that
P^x{Z_t^+ ∈ A, R_t^- = 0, R_t^+ = 0, Z_t^- = Z_t^+}
(4.6)
is continuous in t for any A ∈ ℰ. (This is equal to u_t(x, a1_A) for almost every t; and therefore the density t → u_t(x, a1_A) can be taken continuous and equal to (4.6) for all t.) Let τ be the time of the first jump for X, and define τ_0 = 0, τ_{n+1} = τ_n + τ∘θ_{τ_n}. Then, τ_n is the time of the nth jump of X, τ_n ↑ ζ almost surely, and X remains constant on each interval [τ_n, τ_{n+1}). Therefore, (4.6) is equal to
P^x{Z_t^- = Z_t^+ ∈ A, R_t^- = R_t^+ = 0, S_{τ_n} ≤ t < S_{τ_{n+1}} for some n}
= Σ_{n=0}^∞ P^x{X_{τ_n} ∈ A, R_t^- = R_t^+ = 0, S_{τ_n} ≤ t < S_{τ_{n+1}}}.   (4.7)
Note that, on {S_{τ_n} = t − u} we have R_t^+ = R_u^+(θ_{τ_n}) and R_t^- = R_u^-(θ_{τ_n}). By the strong Markov property at τ_n,
P^x{R_t^- = R_t^+ = 0, S_{τ_n} ≤ t < S_{τ_{n+1}} | ℱ_{τ_n}} = f(X_{τ_n}, t − S_{τ_n}) I_{[0,t)}(S_{τ_n})   (4.8)
where
f(y, u) = P^y{R_u^- = R_u^+ = 0, τ_1 > u}.   (4.9)
Starting at y, X stays there an exponential time with parameter k(y) = K(y, E); and during that sojourn, S has the law of an increasing Lévy process with drift parameter a(y) and Lévy measure L^d(y, ·). It follows from the results mentioned following (4.1) that
f(y, t) = 0 if a(y) = 0;   f(y, t) = r(y, t) a(y) if a(y) > 0,   (4.10)
for all t, where r(y, ·) is the density (which exists when a(y) > 0) of the potential measure R(y, ·) with
R(y, e_λ) = [λa(y) + L^d(y, 1 − e_λ) + k(y)]^{−1}.   (4.11)
Putting (4.7)-(4.10) together, we see that (4.6) is equal to
Σ_{n=0}^∞ E^x[a(X_{τ_n}) r(X_{τ_n}, t − S_{τ_n}); X_{τ_n} ∈ A ∩ {a > 0}; S_{τ_n} < t] = ∫_{(A∩{a>0})×[0,t)} V(x, dy, ds) a(y) r(y, t − s)   (4.12)
by an obvious definition for V. This is essentially a convolution, and the function t → r(y, t) is continuous (Neveu [15]). Hence, (4.12) is continuous in t, and the proof of Proposition (4.3) is complete.
(4.13) Remark. As mentioned before, the restriction "almost every t" cannot be removed on E \ D without adding other conditions of smoothness. Similarly for equalities concerning expectations on the event {Z_t^- ≠ Z_t^+}.
References
[1] A. Benveniste and J. Jacod, "Systèmes de Lévy des processus de Markov", Inventiones Mathematicae 21 (1973) 183-198.
[2] R.M. Blumenthal and R.K. Getoor, Markov processes and potential theory (Academic Press, New York, 1968).
[3] E. Çinlar, "Markov additive processes II", Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 24 (1972) 94-121.
[4] E. Çinlar, "Lévy systems of Markov additive processes", Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 31 (1975) 175-185.
[5] R.K. Getoor, "Some remarks on a paper of Kingman", Advances in Applied Probability 6 (1974) 757-767.
[6] R.K. Getoor and M.J. Sharpe, "Last exit times and additive functionals", Annals of Probability 1 (1973) 550-569.
[7] R.K. Getoor and M.J. Sharpe, "Last exit decompositions and distributions", Indiana University Mathematics Journal 23 (1973) 377-404.
[8] J. Jacod, "Un théorème de renouvellement et classification pour les chaînes semi-markoviennes", Annales de l'Institut Henri Poincaré, Sec. B, 7 (1971) 83-129.
[9] J. Jacod, "Systèmes régénératifs et processus semi-markoviens", Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 31 (1974) 1-23.
[10] H. Kesten, "Hitting probabilities of single points for processes with stationary independent increments", Memoirs of the American Mathematical Society 93 (1969).
[11] J.F.C. Kingman, "Homecomings of Markov processes", Advances in Applied Probability 5 (1973) 66-102.
[12] B. Maisonneuve, "Systèmes régénératifs", Astérisque 15 (1974), Société Mathématique de France, Paris.
[13] B. Maisonneuve, "Exit systems", Annals of Probability 3 (1975) 399-411.
[14] B. Maisonneuve, "Entrance-exit results for semi-regenerative processes", Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 32 (1975) 81-94.
[15] J. Neveu, "Une généralisation des processus à accroissements positifs indépendants", Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg 25 (1961) 36-61.
[16] J. Neveu, "Lattice methods and submarkovian processes", Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 2 (1961) 347-391.
[17] R. Pyke, "Markov renewal processes: definitions and preliminary properties", The Annals of Mathematical Statistics 32 (1961) 1231-1242.
[18] R. Pyke, "Markov renewal processes with finitely many states", The Annals of Mathematical Statistics 32 (1961) 1243-1259.
[19] M. Weil, "Conditionnement par rapport au passé strict", in: Séminaire de Probabilités V, Springer Lecture Notes in Mathematics 191 (Springer, Berlin, 1971) pp. 362-372.
Mathematical Programming Study 5 (1976) 39-52. North-Holland Publishing Company
MARTINGALES OF A JUMP PROCESS AND ABSOLUTELY CONTINUOUS CHANGES OF MEASURE
Robert J. ELLIOTT
University of Hull, Hull, England
Received 15 April 1975
Revised manuscript received 20 October 1975
A process (x_t)_{t≥0} with values in a standard Borel space (X, 𝒮) is considered. Starting at the fixed position z_0 it has a single jump at the random time S to a random position z ∈ X. The probability space Ω can be taken to be Y = (R_+ ∪ {∞}) × X, and the jump time S and jump position z are described by a probability μ on Ω. Distribution functions F_t^A = μ(]t, ∞] × A) and F_t = F_t^X are introduced, and the countably many discontinuities of F_t are predictable stopping times. Associated with the jump process (x_t) are certain basic martingales q(t, A), A ∈ 𝒮. The jumps of q(t, A) at the discontinuities of F_t can be considered as innovation projections, and the optional [q, q](t, A) and predictable ⟨q, q⟩(t, A) quadratic variations of q(t, A) are determined. Using the innovation projection norm, local martingales on the family of σ-fields generated by (x_t) can be represented as stochastic integrals of the q martingales. Finally it is shown how the Lévy system, or local description, of the (x_t) process changes when the measure μ is replaced by an absolutely continuous measure μ̂.
1. Introduction
Following the classical results of Kunita and Watanabe [12] for martingales on the family of σ-fields generated by a Hunt process, Boel, Varaiya and Wong [2] established a martingale representation theorem for martingales on the family of σ-fields generated by a jump process whose jump times were totally inaccessible and whose jump times had no finite accumulation point. In a second paper [3] they applied their results to detection and filtering problems, and Boel, in his thesis [1], discusses optimal control of jump processes using these results. Extending a technique of Chou and Meyer, a similar representation theorem when the jump process has at most one finite accumulation point, but with no restriction on the nature of the jump times, was proved by Davis in [4]. A related optimal control result using the Lévy system, or local description, is obtained in the paper of Rishel [14].
R.J. Elliott / Martingales of a jump process
In references [7], [8], [9] and [10] we obtain representation theorems for martingales on the family of σ-fields generated by a jump process (x_t) whose jump times may have both an accessible and a totally inaccessible part and whose jump times may have finite accumulation points (indeed, accumulation points of arbitrary order). Below, for simplicity of exposition, we discuss the case when (x_t) has a single jump at a random time S from its initial position z_0 to a random position z. The well measurable and predictable quadratic variation processes of certain basic martingales associated with the jump process are obtained and related to certain innovation projections. Using the innovation projection norms, local martingales on the family of σ-fields are represented as stochastic integrals of the basic martingales. Finally, it is shown how the Lévy system, or local description, of the jump process changes when the measure determining the time and position of the jump is replaced by an absolutely continuous measure. It is indicated how this is related to solving a stochastic differential equation driven by a jump process. Details of the latter result appear in [11] and applications are now being made to filtering and control.
2. The single jump case
Consider a stochastic process (x_t)_{t≥0} with values in a standard Borel space (X, 𝒮), which has a single jump at the random time S from its initial position z_0 to a random position z ∈ X. The underlying probability space can be taken to be Ω = Y = (R_+ ∪ {∞}) × X with the σ-field 𝒴 which is the product σ-field ℬ(R_+) × 𝒮 together with the atom {∞} × X. Here ℬ denotes the Borel σ-field, and the σ-field ℱ_t^0 generated by the process up to time t is ℬ([0, t]) × (X − {z_0}) together with the atoms (]t, ∞] × X) and (R_+ × {z_0}). This last term covers the situation when the process has a 'zero jump' at S, so we cannot observe when it happens. The time S and position z ∈ X of the jump are described by a probability measure μ on Ω, and we suppose that μ(R_+ × {z_0}) = 0, so the probability of a zero jump is zero. ℱ_t is the completion of ℱ_t^0 by adding the null sets of μ. For A ∈ 𝒮 write F_t^A = μ(]t, ∞] × A) for the probability that S > t and z ∈ A. Define F_t = F_t^X, so that F_t is right continuous and monotonic decreasing. Consequently, F_t
has only countably many points of discontinuity {u}, where ΔF_u = F_u − F_{u−} ≠ 0. Each such point u is a constant and so certainly a predictable stopping time. The measure on (R_+, ℬ(R_+)) given by F_t^A is absolutely continuous with respect to that given by F_t, so there is a positive function λ(A, s) such that
F_t^A − F_0^A = ∫_{]0,t]} λ(A, s) dF_s.
Write
Λ(t) = −∫_{]0,t]} dF_s/F_{s−}.
Definition 2.1. (λ, Λ) is called the Lévy system for the basic jump process (x_t).
Write Λ̃(t) = Λ(t ∧ S); then, roughly speaking, dΛ̃(s) is the probability that the jump occurs at s given that it has not happened so far, and λ(A, s) is the probability that the jump value z is in A given that it occurs at time s. Define for A ∈ 𝒮:
p(t, A) = I_{(t≥S)} I_{(z∈A)},
p̃(t, A) = ∫_{]0,t]} λ(A, s) dΛ̃(s) = −∫_{]0,t∧S]} λ(A, s) dF_s/F_{s−}.
The basic martingales are described by the following result ([4, Proposition 3]):
Lemma 2.2. q(t, A) = p(t, A) − p̃(t, A) is an ℱ_t martingale.
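Lemma 2.2 can be sanity-checked numerically. The sketch below is our own illustration, not from the paper: it assumes the simplest data, S exponential with rate c so that F_t = e^{-ct} and dΛ(s) = c ds, and z uniform on [0, 1] independent of S, so that λ(A, s) is the constant P(z ∈ A). The sample mean of q(t, A) = p(t, A) − p̃(t, A) should then be near zero.

```python
import random

def simulate_q(t, c, a_lo, a_hi, n, seed=0):
    """Monte Carlo mean of q(t, A) = p(t, A) - p~(t, A) for the assumed
    exponential single-jump model: S ~ Exp(c), z ~ Uniform[0, 1],
    A = [a_lo, a_hi].  Here F_t = exp(-c t), so dLambda(s) = c ds and the
    compensator is p~(t, A) = P(z in A) * c * min(t, S)."""
    rng = random.Random(seed)
    p_A = a_hi - a_lo                      # lambda(A, s), independent of s
    total = 0.0
    for _ in range(n):
        S = rng.expovariate(c)             # jump time
        z = rng.uniform(0.0, 1.0)          # jump position
        p = 1.0 if (S <= t and a_lo <= z <= a_hi) else 0.0
        p_tilde = p_A * c * min(t, S)      # compensator evaluated at t
        total += p - p_tilde
    return total / n

if __name__ == "__main__":
    print(simulate_q(t=1.0, c=1.0, a_lo=0.0, a_hi=0.5, n=200_000))
```

With 200 000 samples the estimate sits well within Monte Carlo error of zero, which is the martingale property E[q(t, A)] = 0 at a fixed t.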
3. Quadratic variation
Introduce the process
r(t, A) = Σ_{0<u≤t∧S} λ(A, u)² (ΔF_u)²/F_{u−}².
The jump of a basic martingale q(t, A) at a point of discontinuity u is
q(u, A) − q(u−, A) = I_{(S=u)} I_{(z∈A)} + λ(A, u)(ΔF_u/F_{u−}) I_{(S≥u)} = Δq(u, A).
However,
E[I_{(S=u)} I_{(z∈A)} | ℱ_{u−}] = E[I_{(z∈A)} | S = u] P[S = u | ℱ_{u−}] = −λ(A, u)(ΔF_u/F_{u−}) I_{(S≥u)};
so
E[Δq(u, A) | ℱ_{u−}] = 0.
Because u is a predictable stopping time, from Theorem VIII-T28 of Meyer [13] we have that
q^{Δu}_t = Δq(u, A) I_{(t≥u)}
is a square integrable martingale orthogonal to every square integrable martingale which is continuous at u. Furthermore, from VIII-T28 of [13] the predictable quadratic variation of q^{Δu} is:
⟨q^{Δu}, q^{Δu}⟩(t, A) = E[Δq(u, A)² | ℱ_{u−}] I_{(t≥u)} = −λ(A, u)(ΔF_u/F_{u−})(1 + λ(A, u) ΔF_u/F_{u−}) I_{(S≥u)} I_{(t≥u)}.
The distribution function F_t can be decomposed into its continuous part F_t^c and its sum of jumps F_t^d = Σ_{0<u≤t} ΔF_u. The p̃(t, A) process can be similarly written as p̃^c(t, A) + p̃^d(t, A), where
p̃^c(t, A) = −∫_{]0,t∧S]} λ(A, s) dF_s^c/F_{s−}
and
p̃^d(t, A) = −Σ_{0<u≤t∧S} λ(A, u) ΔF_u/F_{u−}.
Similarly, with
p^d(t, A) = Σ_{0<u≤t, ΔF_u≠0} I_{(S=u)} I_{(z∈A)},
write
p^c(t, A) = p(t, A) − p^d(t, A).
Then we can decompose q(t, A) as:
q(t, A) = q^c(t, A) + q^d(t, A),
where
q^c(t, A) = p^c(t, A) − p̃^c(t, A)
and
q^d(t, A) = p^d(t, A) − p̃^d(t, A) = Σ_u q^{Δu}.
Now q^{Δu} is orthogonal to q^{Δu'} if u ≠ u', and q^d(t, A) is orthogonal to q^c(t, A). The quadratic variation of a sum of orthogonal martingales is the sum of the quadratic variations, so
⟨q, q⟩(t, A) = ⟨q^c, q^c⟩(t, A) + ⟨q^d, q^d⟩(t, A)
and
⟨q^d, q^d⟩(t, A) = Σ_u ⟨q^{Δu}, q^{Δu}⟩(t, A) = −Σ_{0<u≤t∧S} λ(A, u)(ΔF_u/F_{u−})(1 + λ(A, u) ΔF_u/F_{u−}) = p̃^d(t, A) − r(t, A).
Because F^c is continuous, as in Lemma 2.4 of [7] we have that
⟨q^c, q^c⟩(t, A) = p̃^c(t, A)
is the predictable quadratic variation of q^c(t, A). Therefore, we have indicated the proof of the following result:
Theorem 3.1. The predictable quadratic variation of q(t, A) is
⟨q, q⟩(t, A) = p̃^c(t, A) + p̃^d(t, A) − r(t, A) = p̃(t, A) − r(t, A).
Recalling that the jump time is S, the well-measurable quadratic variation of q(t, A), as defined in [6], is:
[q, q](t, A) = Σ_{u≤t} Δq(u, A)²
= Σ_{0<u≤t∧S, u<S} λ(A, u)² (ΔF_u)²/F_{u−}² + I_{(t≥S)} (I_{(z∈A)} + λ(A, S) ΔF_S/F_{S−})²
= r(t, A) + p(t, A) + 2 p(t, A) λ(A, S) ΔF_S/F_{S−}.
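In the special case where F_t is continuous, the two variations of Theorem 3.1 separate cleanly. The following worked computation is our own illustration; the exponential law is an assumed example, not data from the paper.

```latex
% Assumed example: F_t = e^{-ct} (continuous), z with law \lambda(\cdot)
% independent of S.  Then \Delta F_u \equiv 0, so F^d = 0 and r \equiv 0, and
% \[
%   \langle q, q\rangle(t, A) = \tilde p(t, A) = \lambda(A)\, c\,(t \wedge S),
%   \qquad
%   [q, q](t, A) = p(t, A) = I_{(t \ge S)}\, I_{(z \in A)} :
% \]
% the predictable variation is continuous while the optional variation is a
% single unit jump at S.  Both have the same expectation, as the martingale
% isometry requires:
% \[
%   E\,\langle q, q\rangle(t, A)
%     = \lambda(A)\, c\, E[\,t \wedge S\,]
%     = \lambda(A)\,(1 - e^{-ct})
%     = E\,[q, q](t, A).
% \]
```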
4. Stochastic integration
Write 𝒥 for the set {g} of measurable real valued functions on Ω such that g(∞, z) = 0. For fixed t and ω, p(t, A), p̃(t, A) and q(t, A) are countably additive, and for suitable g ∈ 𝒥 we have, considering the integrals as Stieltjes integrals:
∫_Ω g(s, x) p(ds, dx) = g(S, z),
∫_Ω g(s, x) p̃(ds, dx) = ∫_{]0,S]} ∫_X g(s, x) λ(dx, s) dΛ̃(s) = −∫_{]0,S]} ∫_X g(s, x) λ(dx, s) dF_s/F_{s−},
and we define
L²(p) = {g ∈ 𝒥: ‖g‖_p < ∞},
where
‖g‖_p² = E[(∫ g dp)²] = E[g(S, z)²].
We can similarly define integrals with respect to the processes p^c, p^d, p̃^c, p̃^d. In particular, for suitable g ∈ 𝒥,
∫_Ω g(s, x) p̃^c(ds, dx) = −∫_{]0,S]} ∫_X g(s, x) λ(dx, s) dF_s^c/F_{s−}
and
∫_Ω g(s, x) p̃^d(ds, dx) = −Σ_{0<u≤S} ∫_X g(u, x) λ(dx, u) ΔF_u/F_{u−}.
The process r(t, A) is not countably additive, but for g ∈ 𝒥 write:
L²(r) = {g ∈ 𝒥: ‖g‖_r < ∞},
where
‖g‖_r² = E[Σ_{0<u≤S} (∫_X g(u, x) λ(dx, u) ΔF_u/F_{u−})²].
Finally define
L²(⟨q, q⟩) = {g ∈ 𝒥: ‖g‖_{⟨q,q⟩} < ∞}, where ‖g‖_{⟨q,q⟩}² = ‖g‖_{q^c}² + ‖g‖_{q^d}².
In [8] the following result is proved.
Theorem 4.1. Suppose M_t is a square integrable ℱ_t martingale. Then there is a g ∈ L²(⟨q, q⟩) such that
M_t = ∫_Ω I_{(s≤t)} g(s, x) q(ds, dx)   a.s.
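For the simplest data the integrand g of Theorem 4.1 can be exhibited by hand. The following computation is our own illustration (exponential F and a jump position independent of S are assumed), not part of the paper.

```latex
% Assumed example: F_t = e^{-ct}, z independent of S with law \lambda(\cdot).
% For the bounded martingale M_t = E[\,I_{(z \in A)} \mid \mathcal F_t\,],
% take g(s, x) = I_A(x) - \lambda(A).  Then
% \[
%   \int_\Omega I_{(s \le t)}\, g \, dq
%   = (I_A(z) - \lambda(A))\, I_{(S \le t)}
%   - \int_0^{t \wedge S}\!\!\int_X (I_A(x) - \lambda(A))\,\lambda(dx)\, c\, ds,
% \]
% and the ds-integral vanishes because \int_X (I_A - \lambda(A))\,d\lambda = 0.
% Hence M_t = \lambda(A) + \int_\Omega I_{(s \le t)}\, g\, dq: the martingale
% equals \lambda(A) before the jump and I_{(z \in A)} after it.
```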
The existence of g ∈ 𝒥 is a consequence of work of Davis [4]; that g ∈ L²(⟨q, q⟩) is established by calculations (see [8]) from the observation that M_∞ = lim_{t→∞} M_t exists and is in L²(Ω, μ). We now indicate another proof of why g ∈ L²(⟨q, q⟩), together with an explanation of the ‖·‖_{⟨q,q⟩} norm. Now
∫_Ω g dq = ∫_Ω g dq^c + ∫_Ω g dq^d
and, by orthogonality,
E[(∫_Ω g dq)²] = E[(∫_Ω g dq^c)²] + E[(∫_Ω g dq^d)²] = ‖g‖_{q^c}² + ‖g‖_{q^d}², say,
from properties of stochastic integrals (see Doléans-Dade, Meyer [6]) and from the characterization of the quadratic variations in Section 3. If ‖g‖_r < ∞ this is
= ‖g‖_{p̃^c}² + ‖g‖_{p̃^d}² − ‖g‖_r² = ‖g‖_{p̃}² − ‖g‖_r² = ‖g‖_{⟨q,q⟩}².
For any point of discontinuity u of F_t,
∫_Ω g dq^{Δu} = g(u, z) I_{(S=u)} + ∫_X g(u, x) λ(dx, u)(ΔF_u/F_{u−}) I_{(S≥u)}.
However,
E[g(u, z) I_{(S=u)} | ℱ_{u−}] = −∫_X g(u, x) λ(dx, u)(ΔF_u/F_{u−}) I_{(S≥u)},
so ∫_Ω g dq^{Δu} is the extra information in g(u, z) I_{(S=u)} that we do not have given only the information in ℱ_{u−}. That is, ∫_Ω g dq^{Δu} is the innovation projection of g(u, z) I_{(S=u)} given ℱ_{u−}. Furthermore, its L² norm is given by
E[(∫_Ω g dq^{Δu})²] = −E[∫_X g(u, x)² λ(dx, u)(ΔF_u/F_{u−}) I_{(S≥u)}] − E[(∫_X g(u, x) λ(dx, u) ΔF_u/F_{u−})² I_{(S≥u)}].
Therefore
∫_Ω g dq^d = Σ_u ∫_Ω g dq^{Δu}
is the sum of the innovation projections at the points of discontinuity of F_t. By orthogonality
E[(∫_Ω g dq^d)²] = Σ_u E[(∫_Ω g dq^{Δu})²],
and from the above, if ‖g‖_r < ∞, this is
= ‖g‖_{p̃^d}² − ‖g‖_r² = ‖g‖_{q^d}².
‖g‖_{q^d}² is, therefore, the sum of the squares of the L² norms of the innovation projections. Having determined the meaning of the L²(⟨q, q⟩) norm we make the following definitions:
Definition 4.2.
L¹(⟨q, q⟩) = {g ∈ 𝒥: ‖g‖₁ < ∞}
where
‖g‖₁ = E[|∫_Ω g dq^c|] + Σ_u E[|∫_Ω g dq^{Δu}|].
L¹_loc(⟨q, q⟩) = {g ∈ 𝒥: g I_{(s≤t)} ∈ L¹(⟨q, q⟩) for all t < c}, where c = inf{t: F_t = 0}.
L^∞(μ) = {g ∈ 𝒥: ‖g‖_∞ < ∞},
where ‖g‖_∞ = ess sup |g(s, x)|. If g ∈ L^∞(μ), then
ess sup Σ_u |∫_Ω I_{(s≤t)} g dq^{Δu}| < ∞ for t < c.
Consider a uniformly integrable ℱ_t martingale M_t, so
M_t = E[M_∞ | ℱ_t]
for some M_∞ ∈ L¹(Ω, μ). Then M_∞ can be approximated in the L¹ norm by a sequence M_∞^k ∈ L²(Ω, μ). The square integrable martingales M_t^k = E[M_∞^k | ℱ_t] have representations by Theorem 4.1 as stochastic integrals
M_t^k = ∫_Ω I_{(s≤t)} g^k(s, x) q(ds, dx)
for integrands g^k ∈ L²(⟨q, q⟩). The g^k form a weak Cauchy sequence in L¹_loc(⟨q, q⟩) and so have a limit g ∈ L¹_loc(⟨q, q⟩). Interpreting the integrals as Stieltjes integrals we have
M_t = ∫_Ω I_{(s≤t)} g(s, x) q(ds, dx).
This representation can be extended to local martingales and the following result can be stated.
Corollary 4.3. If M_t is a local ℱ_t martingale, then M_t has a representation
M_t = ∫_Ω I_{(s≤t)} g(s, x) q(ds, dx)
for a g ∈ L¹_loc(⟨q, q⟩). A detailed proof can be found in [9].
5. Absolutely continuous change of measure
The Lévy system (λ, Λ) of the jump process (x_t) was defined in Section 2. Suppose that the measure μ, which describes the time and position of the jump, is replaced by an absolutely continuous measure μ̂ ≪ μ. Then it is shown in [11] that the Lévy system under μ̂ becomes (λ̂, Λ̂), where
λ̂(A, s) dΛ̂(s) = ∫_A (1 + φ + (ΔF_s/F_s) ∫_X φ dλ) λ(dx, s) dΛ(s)
and
Λ̂(t) = ∫_{]0,t]} ∫_X (1 + φ + (ΔF_s/F_s) ∫_X φ dλ) λ(dx, s) dΛ(s).
Here φ is determined by the Radon–Nikodym derivative L = dμ̂/dμ and, as in Section 2,
F_s = μ(]s, ∞] × X),   F̂_s = μ̂(]s, ∞] × X).
It is of interest to relate the above result to that of van Schuppen and Wong [15] on change of measure for local martingales, and to indicate how to effect a change of measure. Suppose φ ∈ L¹(Ω, μ) is such that if
Z_t = ∫_Ω I_{(s≤t)} φ(s, x) q(ds, dx),
then
(i) L_t = 𝔈(Z)_t is a uniformly integrable positive ℱ_t martingale,
(ii) E[L_∞] = 1.
From the formula of Doléans-Dade [5] we have that
L_t = exp(−∫_Ω I_{(s≤t)} φ dp̃^c) Π_{0<u≤t∧S} (1 + ΔZ_u),
where ΔZ_u = ∫_X φ(u, x) λ(dx, u) ΔF_u/F_{u−} at a discontinuity u < S of F, and ΔZ_S = φ(S, z) + ∫_X φ(S, x) λ(dx, S) ΔF_S/F_{S−}.
Define a new probability measure μ̂ on Ω by taking the Radon–Nikodym derivative dμ̂/dμ to be L_∞. The result of [15] then says that because
q(t, A) = p(t, A) − p̃(t, A)
is an ℱ_t martingale with respect to the measure μ,
q̂(t, A) = q(t, A) − ⟨q, Z⟩_t   (5.1)
is an ℱ_t martingale with respect to the measure μ̂. Now the process
p(t, A) = I_{(t≥S)} I_{(z∈A)}   (5.2)
is the same as the original process; it is just the distribution of the jump time and position that is altered, so with respect to the new Lévy system (λ̂, Λ̂) its compensator is
p̂(t, A) = ∫_{]0,t∧S]} λ̂(A, s) dΛ̂(s).
From (5.1) and (5.2), therefore,
p̂(t, A) = p̃(t, A) + ⟨q, Z⟩_t,
and as all the processes are stopped at S, we can say that for all t:
∫_{]0,t∧S]} λ̂(A, s) dΛ̂(s) = ∫_{]0,t∧S]} λ(A, s) dΛ(s) + ⟨q(·, A), ∫_{]0,·]} ∫_X φ(s, x) q(ds, dx)⟩_t.
The last term above is
∫_{]0,t∧S]} ∫_A φ(s, x) λ(dx, s) dΛ(s) + ∫_{]0,t∧S]} (ΔF_s/F_s) (∫_X φ(s, x) λ(dx, s)) λ(A, s) dΛ(s).
Writing
dΛ̂(s) = −dF̂_s/F̂_{s−},   dΛ(s) = −dF_s/F_{s−},
we have after some manipulation that
λ̂(A, s) dΛ̂(s) = ∫_A (1 + φ + (ΔF_s/F_s) ∫_X φ dλ) λ(dx, s) dΛ(s),
in agreement with the formula above.
Remark 5.1. With the above notation we have observed that
q̂(t, A) = q(t, A) − ⟨q, Z⟩_t
is an ℱ_t martingale under the new measure μ̂. Using the Girsanov measure transformation idea we can, therefore, think of the original martingales q under the new measure as solutions of the 'stochastic differential equation':
dq = d⟨q, Z⟩ + dq̂,
which represents a basic martingale of the jump process plus a drift term d⟨q, Z⟩. Details of the above appear in reference [11] and applications are now being made to optimal control and filtering problems, extending work of Boel, Varaiya and Wong [1], [2] and [3].
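The effect of an absolutely continuous change of the jump-time law can be seen numerically. The sketch below is our own illustration and uses assumed data (exponential rates 1 and 2): samples of S drawn under μ are reweighted by the likelihood ratio dμ̂/dμ, which recovers a μ̂-expectation, the elementary mechanism behind the measure change above.

```python
import math
import random

def reweighted_mean(n=200_000, seed=1):
    """Assumed model: under mu the jump time S ~ Exp(1); under mu-hat,
    S ~ Exp(2).  On {S = s} the Radon-Nikodym derivative is
        L(s) = (2 exp(-2 s)) / exp(-s) = 2 exp(-s).
    Importance sampling identity: E_mu[L(S) f(S)] = E_mu-hat[f(S)].
    With f(s) = s the exact answer is the Exp(2) mean, 1/2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = rng.expovariate(1.0)        # sample the jump time under mu
        weight = 2.0 * math.exp(-s)     # L = d(mu-hat)/d(mu) at S = s
        total += weight * s             # f(S) = S, reweighted
    return total / n

if __name__ == "__main__":
    print(reweighted_mean())            # close to 0.5 = E under mu-hat of S
```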
6. Extensions to multi-jump situations
Most results described in the previous sections have extensions to the situation where we have a right constant stochastic process (x_t) which has a succession of jumps, and ℱ_t is the family of σ-fields generated by (x_t). The situation is considered anew after each jump and the next jump is described by a conditional probability measure. The only novel problem arises at an accumulation time of jump times. If T is such an accumulation time of previous 'ordinary' (or zero order) jump times then, given what has happened at the previous jumps, that is, given ℱ_{T−}, the jump time T is determined. What is not determined, however, is the position z_T of the jump process at time T. Therefore, we suppose that this is described by a conditional probability measure, or Lévy measure, λ(ω; ·, T) on (X, 𝒮). Corresponding to T the basic processes are
p^∞(t, A) = I_{(t≥T)} I_{(z_T∈A)},
p̃^∞(t, A) = λ(ω; A, T) I_{(t≥T)}.
Now
E[I_{(z_T∈A)} | ℱ_{T−}] = λ(ω; A, T),
so
I_{(z_T∈A)} − λ(ω; A, T)
is the innovation projection of I_{(z_T∈A)} given ℱ_{T−}, and from Theorem VIII-T28 of [13]
q^∞(t, A) = p^∞(t, A) − p̃^∞(t, A)
is an ℱ_t martingale.
There are analogues of all the previous processes and results (except the representation of local martingales) for the accumulation jump time situation and, in fact, most of the proofs are simpler. Details can be found in [7], [9], [10], [11].
7. Conclusion
The results of Boel, Varaiya and Wong [2], [3] and Davis [4] on the representation of martingales of a jump process have been extended to general jump processes without requiring total inaccessibility of the jumps or that there is only one finite accumulation point. Even for the basic case of a single jump, interesting expressions for the quadratic variation and Lévy system have been obtained. The results are now being applied.
References
[1] R. Boel, "Optimal control of jump processes", Electronics Research Lab. Memo M-448, University of California, Berkeley, Calif. (July 1974).
[2] R. Boel, P. Varaiya and E. Wong, "Martingales on jump processes, Part I: representation results", Electronics Research Lab. Memo M-407, University of California, Berkeley, Calif. (1973); SIAM Journal on Control, to appear.
[3] R. Boel, P. Varaiya and E. Wong, "Martingales on jump processes, Part II: applications", Electronics Research Lab. Memo M-409, University of California, Berkeley, Calif. (1973); SIAM Journal on Control, to appear.
[4] M.H.A. Davis, "The representation of martingales of jump processes", Research Report 74/78, October 1974, Imperial College, London.
[5] C. Doléans-Dade, "Quelques applications de la formule de changement de variables pour les semimartingales", Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 16 (1970) 181-194.
[6] C. Doléans-Dade and P.A. Meyer, "Intégrales stochastiques par rapport aux martingales locales", in: Séminaire de Probabilités IV, Springer Lecture Notes in Mathematics 124 (Springer, Berlin, 1970).
[7] R.J. Elliott, "Stochastic integrals for martingales of a jump process", Hull University, U.K., preprint, to appear.
[8] R.J. Elliott, "Martingales of a jump process with partially accessible jump times", Hull University, U.K., preprint, to appear.
[9] R.J. Elliott, "The Lévy systems of a jump process and martingale representation", Hull University, U.K., preprint, to appear.
[10] R.J. Elliott, "Lévy functionals and jump process martingales", Hull University, U.K., preprint, to appear.
[11] R.J. Elliott, "Lévy systems and absolutely continuous changes of measure for a jump process", Hull University, U.K., preprint, to appear.
[12] H. Kunita and S. Watanabe, "On square integrable martingales", Nagoya Mathematical Journal 30 (1967) 209-245.
[13] P.A. Meyer, Probability and potentials (Blaisdell, Waltham, Mass., 1966).
[14] R. Rishel, "A minimum principle for controlled jump processes", University of Kentucky, preprint (1974).
[15] J. van Schuppen and E. Wong, "Transformation of local martingales under a change of law", The Annals of Probability 2 (1974) 879-888.
Mathematical Programming Study 5 (1976) 53-59. North-Holland Publishing Company
ANALYSIS OF BROWNIAN FUNCTIONALS
Takeyuki HIDA
Nagoya University, Nagoya, Japan Received 3 June 1975 Revised manuscript received 1 October 1975 We discuss functionals of Brownian motion by expressing them as those of white noise. We then proceed to generalize them to obtain generalized functionals of Brownian motion. Throughout this paper the integral representation of those functionals is a powerful tool.
0. Introduction
We are interested in the analysis of functionals of Brownian motion {B(t)}, call them Brownian functionals, as well as stochastic processes each random variable of which is a Brownian functional. A Brownian functional may be expressed as a functional, nonlinear in general, of white noise, that is, of the derivative of Brownian motion. We shall therefore begin with, in Section 1, the definition of white noise, which is the measure space (𝒮*, μ) given by the characteristic functional
C(ξ) = exp[−½‖ξ‖²],   ξ ∈ 𝒮,   (1)
where 𝒮 is the Schwartz space and 𝒮* is the dual space of 𝒮. Each x ∈ 𝒮* with μ may be thought of as a sample path of {Ḃ(t)}, Ḃ(t) = dB(t)/dt, and φ(x) ∈ (L²) ≡ L²(𝒮*, μ) is a functional of white noise or a Brownian functional with finite variance. One of the basic tools for the analysis of Brownian functionals is the integral representation. We shall give a quick review of this representation theory here. Being inspired by the problems in many fields of applications such as stochastic control (cf. [1]) and quantum mechanics (for instance, Feynman's path integral), we shall introduce, in Section 3, a special class of Brownian functionals which corresponds to the Sobolev space in the integral
T. Hida / Analysis of Brownian functionals
representation. We shall then proceed to extend the (L²)-space, namely we shall introduce a class of generalized functionals of white noise. The last section will be devoted to the product of (L²)-functionals and Ḃ(t), the derivative of Brownian motion B(t). Having introduced such an operation we can carry out a proposed differential and integral calculus based on dB(t).
1. Brownian functionals
Let 𝒮 be the Schwartz space of real valued C^∞-functions, and let 𝒮* be the dual space of 𝒮. Given a characteristic functional C(ξ), ξ ∈ 𝒮, in (1), we can find uniquely a probability measure μ on (𝒮*, ℬ) such that
C(ξ) = ∫_{𝒮*} e^{i⟨x,ξ⟩} dμ(x),   (2)
where ℬ is the σ-field generated by the cylinder subsets of 𝒮* and ⟨x, ξ⟩, x ∈ 𝒮*, ξ ∈ 𝒮, is the canonical bilinear form which links 𝒮 and 𝒮*. The measure space (𝒮*, ℬ, μ) is called white noise. Each x ∈ 𝒮* with μ is viewed as a sample function, in fact a generalized function, of Ḃ(t), and a functional φ(x) on 𝒮* can therefore be thought of as a functional of Brownian motion; it will be called a Brownian functional. We shall first restrict our attention to a class of complex-valued Brownian functionals with finite variance, namely the Hilbert space (L²) ≡ L²(𝒮*, μ). The space (L²) admits a direct sum decomposition into the spaces of multiple Wiener integrals. This can be done in the following manner. First, take a complete orthonormal system {ξ_n; n ≥ 1} in L²(R¹) such that ξ_n ∈ 𝒮 for every n, and form polynomials of the form
Π_k H_{n_k}(⟨x, ξ_k⟩/√2),   finite product, the ξ_k distinct,   (3)
where H_n is the Hermite polynomial of degree n. Such a polynomial in x is often called a Fourier–Hermite polynomial, and Σ n_k is the degree of the polynomial. It can be proved that the collection of all the Fourier–Hermite polynomials forms a base of (L²). Let ℋ_n be the subspace of (L²) spanned by the Fourier–Hermite polynomials of degree n. We are now given a direct sum decomposition (see, for example, [2]):
(L²) = Σ_{n≥0} ⊕ ℋ_n.   (4)
The subspace ℋ_n is called the multiple Wiener integral of degree n. Thus, for any Brownian functional φ(x) in (L²), we have the unique expansion
φ(x) = Σ_{n=0}^∞ φ_n(x),   φ_n ∈ ℋ_n,   (5)
where φ_0(x) is a constant which is the expectation of φ.
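The first two summands of the decomposition can be checked numerically. In the sketch below, which is our own illustration, u stands for ⟨x, ξ⟩ with ‖ξ‖ = 1, hence a standard Gaussian variable; up to normalization conventions for the Hermite polynomials, u and u² − 1 represent the degree-1 and degree-2 chaos components, and Monte Carlo confirms that they are centred and mutually orthogonal in (L²).

```python
import random

def chaos_moments(n=200_000, seed=2):
    """For u ~ N(0, 1), estimate E[u^2 - 1] (centring of the degree-2
    chaos), E[u * (u^2 - 1)] (orthogonality of degrees 1 and 2) and
    E[(u^2 - 1)^2] (its squared (L^2)-norm, exactly 2)."""
    rng = random.Random(seed)
    m_h2, m_cross, m_norm = 0.0, 0.0, 0.0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        h2 = u * u - 1.0          # degree-2 Hermite-type polynomial
        m_h2 += h2                # should average to 0
        m_cross += u * h2         # inner product with the degree-1 term
        m_norm += h2 * h2         # squared norm, should average to 2
    return m_h2 / n, m_cross / n, m_norm / n

if __name__ == "__main__":
    print(chaos_moments())
```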
2. Integral representation
We introduce a transformation 𝒯 which maps (L²) to a space of functionals on 𝒮:
(𝒯φ)(ξ) = ∫_{𝒮*} e^{i⟨x,ξ⟩} φ(x) dμ(x),   φ ∈ (L²), ξ ∈ 𝒮.   (6)
If, in particular, φ(x) is a Brownian functional of the form (3), then we have
(𝒯φ)(ξ) = (√2 i)^n C(ξ) ∫_{R^n} [ξ_1(u_1) ⋯ ξ_1(u_{n_1}) ξ_2(u_{n_1+1}) ⋯] ξ(u_1) ⋯ ξ(u_n) du_1 ⋯ du_n,   n = Σ_k n_k.
In the above integral we may replace the $L^2(\mathbb{R}^n)$-function in $[\cdots]$ by its symmetrization, since the same function $\xi$ appears $n$ times. In addition, the $L^2(\mathbb{R}^n)$-norm of the symmetrized function is $(\prod_k n_k!)^{1/2}/(n!)^{1/2}$, while $\|\varphi\|_{(L^2)} = (\prod_k n_k!)^{1/2}\,2^{n/2}$. Such observations lead us to the following theorem.

Theorem 1. The transformation $\mathscr{T}$ gives a one-to-one linear transformation of $\mathscr{H}_n$ onto $(L^2)^\wedge(\mathbb{R}^n) \equiv \{F;\ F \in L^2(\mathbb{R}^n),\ F\ \text{symmetric}\}$ in such a way that

(i)  $(\mathscr{T}\varphi)(\xi) = (\sqrt{2}\,i)^n C(\xi)\int_{\mathbb{R}^n} F(u_1,\dots,u_n)\,\xi(u_1)\cdots\xi(u_n)\,du_1\cdots du_n$,    $\varphi \in \mathscr{H}_n$, $F \in (L^2)^\wedge(\mathbb{R}^n)$;

(ii) $\|\varphi\|_{(L^2)} = (n!)^{1/2}\,2^{n/2}\,\|F\|_{L^2(\mathbb{R}^n)}$.

For details of the proof we refer to [2]. We can now say that $(L^2)^\wedge(\mathbb{R}^n)$ is an integral representation of $\mathscr{H}_n$ through the formula (i) of Theorem 1, and we are given a formal expression

    $(L^2) \cong \sum_{n=0}^{\infty} \oplus\, (n!)^{1/2}\,2^{n/2}\,(L^2)^\wedge(\mathbb{R}^n)$.    (7)
This expression suggests to us how to generalize the notion of a functional defined on the space $\mathscr{S}^*$ of generalized functions.

3. Generalized Brownian functionals
We shall start with a particular subspace of $\mathscr{H}_n$, $n \ge 1$. Let $F_\varphi$ be the $(L^2)^\wedge(\mathbb{R}^n)$-function associated with $\varphi \in \mathscr{H}_n$ by the integral representation, and let $H^m(\mathbb{R}^n)$ be the Sobolev space of order $m$ on $\mathbb{R}^n$ with the norm $\|\cdot\|_m$. Set

    $\mathscr{H}_n^{(n)} = \{\varphi;\ F_\varphi \in H^{(n+1)/2}(\mathbb{R}^n) \cap (L^2)^\wedge(\mathbb{R}^n)\}$.    (8)

If we introduce a norm $\|\cdot\|_n$ in $\mathscr{H}_n^{(n)}$ through the Sobolev norm of the representing kernel, then $\mathscr{H}_n^{(n)}$ is a Hilbert space and the injection $\mathscr{H}_n^{(n)} \to \mathscr{H}_n$ is continuous. We are now able to define the dual space $\mathscr{H}_n^{(-n)}$ of $\mathscr{H}_n^{(n)}$ in such a way that with any $G$ in $H^{-(n+1)/2}(\mathbb{R}^n)$ we can associate a member $\psi_G$ of $\mathscr{H}_n^{(-n)}$ satisfying

    $(\psi_G, \varphi)_n = n!\,\langle G, F\rangle_n$    (9)

for any $\varphi$ in $\mathscr{H}_n^{(n)}$ with the integral representation $F \in (L^2)^\wedge(\mathbb{R}^n)$, where $(\cdot,\cdot)_n$ (resp. $\langle\cdot,\cdot\rangle_n$) is the canonical bilinear form which links $\mathscr{H}_n^{(n)}$ and $\mathscr{H}_n^{(-n)}$ (resp. $H^{(n+1)/2}(\mathbb{R}^n)$ and $H^{-(n+1)/2}(\mathbb{R}^n)$). We are now given a triple

    $\mathscr{H}_n^{(n)} \subset \mathscr{H}_n \subset \mathscr{H}_n^{(-n)}$,    (10)
where the injections from left to right are both continuous. Let $\|\cdot\|_{-n}$ be the norm in $\mathscr{H}_n^{(-n)}$, and define

    $(L^2)^+ = \sum_{n=0}^{\infty} \oplus\, \mathscr{H}_n^{(n)}$,    $(L^2)^- = $ the dual space of $(L^2)^+$.

Then the following theorem is straightforward.
Theorem 2. Any $\psi$ in $(L^2)^-$ is expressed in the form

    $\psi = \sum_{n=0}^{\infty} \psi_n$,    $\psi_n \in \mathscr{H}_n^{(-n)}$,    (11)

with the property that

    $\sum_{n=0}^{\infty} \|\psi_n\|_{-n}^2 < \infty$.    (12)
A member of $(L^2)^-$ is said to be a generalized Brownian functional. Thus we have been able to generalize Brownian functionals with the help of the integral representation and Sobolev spaces. The following examples will serve to illustrate those functionals in $(L^2)^+$ and $(L^2)^-$.

Example 1.¹ Let $\varphi(x)$ be in $\mathscr{H}_2$ and assume that it is a positive quadratic functional minus its expectation. This implies that the associated $(L^2)^\wedge(\mathbb{R}^2)$-function $F(u,v)$ is expressible, in terms of an eigensystem $\{\lambda_n, \eta_n;\ n \ge 1\}$, as

    $F(u,v) = \sum_n \lambda_n^{-1}\,\eta_n(u)\eta_n(v)$,    $\lambda_n > 0$,    (13)

where $\sum_n 1/\lambda_n < \infty$ and the series (13) converges uniformly (see [6, §8]). The original functional itself can be expressed as

    $\varphi(x) = \sum_n \lambda_n^{-1}\bigl(\langle x, \eta_n\rangle^2 - 1\bigr)$    in $(L^2)$    (14)

(cf. [3]). Writing it in terms of the coordinates $\langle x, \eta_n\rangle$, we see that $\varphi(x)$ is a continuous function on a Hilbert space, since $x \mapsto \{\langle x, \eta_n\rangle;\ n \ge 1\}$ presents a coordinate representation of $x$ (we add, if necessary, some $\eta_n$'s so that $\{\eta_n\}$ becomes a complete orthonormal system in $L^2(\mathbb{R}^1)$).
Example 2. Consider a quadratic functional

    $U(\xi) = \int f(u)\,\xi(u)^2\,du$,    $f \in H^1(\mathbb{R}^1)$,    (15)

defined on $\mathscr{S}$. Such a functional never appears in the integral representation of $\mathscr{H}_2$; however, it does appear when we extend the representation to the class $\mathscr{H}_2^{(-2)}$. Indeed, if we express $U(\xi)$ in the form

    $U(\xi) = \iint f(u)\,\delta(u-v)\,\xi(u)\xi(v)\,du\,dv$,    (15')

then the symmetric function $f(u)\delta(u-v)$, being in $H^{-3/2}(\mathbb{R}^2)$, can be associated with a member, say $\psi(x)$, of $\mathscr{H}_2^{(-2)}$.

Let us make an observation on this functional $\psi(x)$. A system $\{\langle x, \chi_{[t\wedge 0,\,t\vee 0]}\rangle;\ t \in \mathbb{R}^1\}$ in $\mathscr{H}_1$ is taken to be a version of Brownian motion. Consider a functional

    $\psi_n(x) = \sum_i f(u_i)\Bigl[\bigl(\langle x, \chi_{\Delta_i}\rangle/\sqrt{|\Delta_i|}\bigr)^2 - 1\Bigr]$,    $u_i \in \Delta_i$,    (16)

where $\{\Delta_i\}$ is the system of intervals with length $|\Delta_i| = 1/n$ and $\bigcup_i \Delta_i = \mathbb{R}^1$. We can associate with $\psi_n(x)$ the $(L^2)^\wedge(\mathbb{R}^2)$-function $F_n(u,v) = \sum_i n f(u_i)\,\chi_{\Delta_i\times\Delta_i}(u,v)$. The $F_n$ do not converge in $(L^2)^\wedge(\mathbb{R}^2)$, but they do converge in $H^{-3/2}(\mathbb{R}^2)$, with limit $f(u)\delta(u-v)$. Thus we claim that the limit of the $\psi_n(x)$ does not exist in $\mathscr{H}_2$, while the limit can be found in $\mathscr{H}_2^{(-2)}$; the limit must be the same as $\psi(x)$. A formal expression for $\psi(x)$ can now be introduced: suggested by (16), and noting $(dB(t)/\sqrt{dt})^2\,dt = \dot B(t)^2\,dt$, we may write

    $\psi(x) = \int f(u)\bigl\{\dot B(u)^2 - E(\dot B(u)^2)\bigr\}\,du$.    (17)

Finally we note that for $\psi(x)$ as above and $\varphi$ in $\mathscr{H}_2^{(2)}$ with the integral representation $F$ we have

    $(\psi, \varphi)_2 = 2\iint F(u,v)f(u)\,\delta(u-v)\,du\,dv = 2\int F(u,u)f(u)\,du$.

(Use the trace theorem for $F$; see, e.g., [5].)

¹ The author enjoyed discussions with Professor G. Kallianpur.
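The eigen-expansion (13) and the trace pairing above are easy to check numerically. The Python sketch below is illustrative only and not taken from the text: it chooses, for concreteness, the sine eigensystem $\eta_n(u) = \sqrt{2}\sin(n\pi u)$ on $[0,1]$ with eigenvalues $\lambda_n = (n\pi)^2$ (so $\sum 1/\lambda_n < \infty$), builds the kernel diagonal $F(u,u)$, and verifies that quadrature of $2\int F(u,u)f(u)\,du$ agrees with the term-by-term sum.

```python
import numpy as np

N = 200                                   # eigenterms retained
M = 2000                                  # grid intervals on [0, 1]
u = np.linspace(0.0, 1.0, M + 1)
h = 1.0 / M
lam = (np.arange(1, N + 1) * np.pi) ** 2  # eigenvalues; sum(1/lam) < inf
eta = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, N + 1), np.pi * u))

def trap(y):
    """Trapezoidal rule on the uniform grid u."""
    return (y.sum() - 0.5 * (y[0] + y[-1])) * h

f = 1.0 + 0.5 * np.cos(np.pi * u)         # a smooth test function

# Diagonal of the kernel F(u,v) = sum_n eta_n(u) eta_n(v) / lam_n
F_diag = np.einsum('n,nu,nu->u', 1.0 / lam, eta, eta)

# The pairing 2 * int F(u,u) f(u) du computed two ways agrees by linearity.
lhs = 2.0 * trap(F_diag * f)
rhs = 2.0 * sum(trap(eta[n] ** 2 * f) / lam[n] for n in range(N))
assert abs(lhs - rhs) < 1e-9

# int eta_n^2 du = 1, so the trace is the partial sum of 1/(n pi)^2 -> 1/6.
assert abs(trap(F_diag) - sum(1.0 / lam)) < 1e-9
```

With $f \equiv 1$ the pairing reduces to twice the trace $\sum_n 1/\lambda_n$, which for this eigensystem tends to $1/3 \cdot 2 = 2/6$ as $N \to \infty$.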
4. Multiplication by $\dot B(t)$

As we have seen in Section 3, we are able to deal with $\dot B(t)$. In this section we shall show that the multiplication of an $(L^2)^+$-functional by $\dot B(t)$ is possible and that the product is in $(L^2)^-$. Again we shall make use of the integral representation. Since the system $\{e^{i\langle x,\eta\rangle};\ \eta \in \mathscr{S}\}$ of exponential functionals generates the space $(L^2)$, we start with the product

    $\dot B(t)\,e^{i\langle x,\eta\rangle}$,    $\eta \in \mathscr{S}$, $t \in \mathbb{R}^1$ fixed,    (18)

where $B(t)$ is taken to be $\langle x, \chi_{[t\wedge 0,\,t\vee 0]}\rangle$ as in Section 3. The generalized functional given by (18) is approximated in $(L^2)^-$ by

    $\langle x, \chi_\Delta/|\Delta|\rangle\,e^{i\langle x,\eta\rangle}$,    $\Delta$ an interval containing $t$.

Applying the transformation $\mathscr{T}$ to this approximation yields

    $i\bigl(\xi + \eta,\ \chi_\Delta/|\Delta|\bigr)\exp\bigl[-\tfrac{1}{2}\|\eta\|^2 - (\xi,\eta)\bigr]\,C(\xi)$.

As $|\Delta| \to 0$, $\Delta$ containing $t$, this transform tends to

    $i\bigl(\xi(t) + \eta(t)\bigr)\exp\bigl[-\tfrac{1}{2}\|\eta\|^2 - (\xi,\eta)\bigr]\,C(\xi)$.    (19)

The fact that the transform of the approximation tends to the quantity (19) enables us to prove

    $\mathscr{T}\bigl(\dot B(t)H_n(\langle x,\eta\rangle/\sqrt{2})\bigr)(\xi) = i^{n+1}2^{n/2}\bigl\{\xi(t)(\xi,\eta)^n + n\eta(t)(\xi,\eta)^{n-1}\bigr\}C(\xi)$,    $\|\eta\| = 1$.    (20)

This means that with $\dot B(t)H_n(\langle x,\eta\rangle/\sqrt{2})$, $\|\eta\| = 1$, we can associate the pair of kernels

    $2^{n/2}\bigl\{(\eta^{\otimes n}\otimes\delta_t)^\wedge,\ -n\eta(t)\,\eta^{\otimes(n-1)}\bigr\}$,    (21)

where $\otimes$ denotes the tensor product and $(\cdot)^\wedge$ symmetrization; this is the sum of an $H^{-(n+2)/2}(\mathbb{R}^{n+1})$-function and an $(L^2)^\wedge(\mathbb{R}^{n-1})$-function. Namely

    $\dot B(t)H_n(\langle x,\eta\rangle/\sqrt{2}) \in \mathscr{H}_{n+1}^{(-(n+1))} + \mathscr{H}_{n-1}$.

Such a relation can be extended to the product of $\dot B(t)$ and Fourier–Hermite polynomials. Thus we can immediately prove the following theorem.

Theorem 3. We can multiply functionals in $\mathscr{H}_n^{(n)}$ by $\dot B(t)$, and the product is in the space $\mathscr{H}_{n+1}^{(-(n+1))} + \mathscr{H}_{n-1}$.

Finally, we would like to note that this result, together with the expressions (20) and (21), is helpful in solving a certain class of stochastic differential equations.
References

[1] A.V. Balakrishnan, "Stochastic bilinear partial differential equations", Preprint (1974).
[2] T. Hida, Stationary stochastic processes (Princeton University Press, Princeton, N.J., 1970).
[3] T. Hida, "Quadratic functionals of Brownian motion", Journal of Multivariate Analysis 1 (1971) 58-69.
[4] P. Lévy, Problèmes concrets d'analyse fonctionnelle (Gauthier-Villars, Paris, 1951).
[5] J.L. Lions and E. Magenes, Non-homogeneous boundary value problems and applications, I (Springer, Berlin, 1972).
[6] F. Smithies, Integral equations (Cambridge University Press, London, 1970).
Mathematical Programming Study 5 (1976) 60-66. North-Holland Publishing Company
PROBABILISTIC REPRESENTATIONS OF BOUNDARY LAYER EXPANSIONS

Charles J. HOLLAND
Purdue University, West Lafayette, Ind., U.S.A.

Received 27 May 1975
Revised manuscript received 22 July 1975

Using probabilistic methods we treat the problem of asymptotic expansions for a class of semilinear elliptic equations depending upon a small parameter $\epsilon$ which degenerate to parabolic equations when $\epsilon$ becomes zero. The method depends upon the representation of the solution of the partial differential equation as the expected value of a functional of an Ito stochastic differential equation.

In [3, 4, 5] we have used probabilistic methods to derive asymptotic expansions for a class of singularly perturbed second order semilinear elliptic and parabolic partial differential equations in two independent variables. Singularly perturbed indicates that the highest order derivatives are multiplied by a small positive parameter $\epsilon$, so that the equation becomes of first order when $\epsilon$ is zero. Our results include the validity of the regular expansion and the ordinary and parabolic boundary layer expansions in the theory of singular perturbations. Our approach is probabilistic, depending upon the representation of the solution of the partial differential equation as the expected value of a functional of an Ito stochastic differential equation. This approach has been used previously by Fleming [2] to treat the regular expansion.

The work discussed above appears to be the first theoretical treatment of both ordinary and parabolic boundary layer expansions for semilinear equations. These results also have the advantage over previously existing work for the linear case in that we only require local estimates and consequently prove local results.

In this paper we illustrate our probabilistic method on a class of semilinear elliptic equations depending upon a small parameter $\epsilon$ which degenerate to parabolic equations when $\epsilon$ becomes zero. Consider for $\epsilon > 0$ the semilinear elliptic equation
    $C(x)\phi^\epsilon_{x_1x_1} + \epsilon\phi^\epsilon_{x_2x_2} + A(x)\phi^\epsilon_{x_1} - B(x)\phi^\epsilon_{x_2} + F(x, \phi^\epsilon) = 0$    (1)

in the square $S = (0,1)\times(-1,0)$, with Dirichlet boundary data $\phi^\epsilon = \Lambda$ on $\partial S$ and $C > 0$, $B > 0$ on $S \cup \partial S$. Under certain assumptions the regular and ordinary boundary layer expansions are established. In the regular expansion the solution $\phi^\epsilon$ of (1) is approximated for small $\epsilon$ by the solution $\phi^0$ of the parabolic equation (1) with $\epsilon = 0$ taking the boundary data $\phi^0 = \Lambda$ on the lower and lateral sides of the square. Near the top of the square $\phi^0$ is not generally a good approximation to $\phi^\epsilon$, and in that region one determines a boundary layer correction term.

We consider the case where $\phi^0_{x_2x_2}$ is bounded in $S$. This depends partly upon the compatibility condition that the boundary data satisfy the parabolic equation (1) with $\epsilon = 0$ at the corner points $(0,-1)$ and $(1,-1)$. For the linear case, with the additional restriction that $B$ is only a function of $x_2$, Zlamal [6] has considered the case where $\phi^0_{x_2x_2}$ is not assumed bounded. We prove the following theorem.

Theorem 1. Let $0 < m < \tfrac{1}{2}$ and let there exist positive constants $\epsilon_0$, $K^*$ such that the following hold:

(B1) $S = (0,1)\times(-1,0)$.
(B2) $A$, $B$, $C$ are $C^2$ functions on $\bar S$ and $F$ is a $C^2$ function on $\bar S\times(-\infty,\infty)$.
(B3) $C > 0$ and $B > 0$ on $\bar S$.
(B4) There exists a solution $\phi^0$ to (1) with $\epsilon = 0$ on $S$ taking boundary data $\Lambda$ on the bottom and lateral sides of $S$ such that $\phi^0_{x_2x_2}$ is uniformly bounded on $S$.
(B5) For $0 < \epsilon < \epsilon_0$ there exists a solution $\phi^\epsilon$ of class $C^2$ to (1) on $S$, continuous on $S \cup \partial S$, satisfying $|\phi^\epsilon| \le K^*$ on $S$.
(B6) $\delta(\epsilon) = \epsilon^m$.

Define

    $\Phi(x) = \exp\Bigl[-\int_0^{x_1} A(s,x_2)\,(2C(s,x_2))^{-1}\,ds\Bigr]$

and

    $\Psi^0(x) = \bigl[\Phi^{-1}(x_1,0)\Lambda(x_1,0) - V^0(x_1,0)\bigr]\exp[B(x_1,0)x_2]$,

where $V^0 = \Phi^{-1}\phi^0$. Then for each $\alpha$, $0 < \alpha < m$,

    $\phi^\epsilon(x) = \Phi(x)\bigl[V^0(x) + \Psi^0(x_1, x_2/\epsilon)\bigr] + o(\epsilon^\alpha)$    (2)

uniformly on $\bar S$, and for each $\alpha$, $0 < \alpha < 1$,

    $\phi^\epsilon(x) = \Phi(x)V^0(x) + o(\epsilon^\alpha)$    (3)
uniformly on $[0,1]\times[-1,-\delta(\epsilon)]$.

Proof. Define $V^\epsilon = \Phi^{-1}\phi^\epsilon$. Note that there exist positive constants $\alpha_1$, $\alpha_2$ such that $\alpha_1 \le \Phi \le \alpha_2$ on $\bar S$, and that $\Phi$ satisfies for fixed $x_2$ the equation $A\Phi + 2C\Phi_{x_1} = 0$. Hence to establish (2) and (3) it suffices to show that

    $V^\epsilon(x) = V^0(x) + \Psi^0(x_1, x_2/\epsilon) + o(\epsilon^\alpha)$    (4)

uniformly on $[0,1]\times[-1,0]$ for each $\alpha$, $0 < \alpha < m$, and

    $V^\epsilon(x) = V^0(x) + o(\epsilon^\alpha)$    (5)

uniformly on $[0,1]\times[-1,-\delta(\epsilon)]$ for each $\alpha$, $0 < \alpha < 1$. By direct calculation $V^\epsilon$ satisfies

    $V^\epsilon_{x_1x_1} + \epsilon C^{-1}V^\epsilon_{x_2x_2} + [-BC^{-1} + \epsilon D]V^\epsilon_{x_2} + G(x, V^\epsilon) + \epsilon H(x)V^\epsilon = 0$    (6)

with

    $D = 2\Phi_{x_2}(C\Phi)^{-1}$,    $H = \Phi_{x_2x_2}(C\Phi)^{-1}$

and

    $G(x,V) = (C(x)\Phi(x))^{-1}\bigl[A(x)\Phi_{x_1}(x) + C(x)\Phi_{x_1x_1}(x) - B(x)\Phi_{x_2}(x) + F(x, \Phi(x)V)\bigr]$.

For $\epsilon > 0$, $V^\epsilon(x) = \Phi^{-1}(x)\Lambda(x)$ on $\partial S$, while for $\epsilon = 0$ we have $V^0(x) = \Phi^{-1}(x)\Lambda(x)$ on $\partial S - \{(x_1,0):\ 0 < x_1 < 1\}$.

The regular expansion (5) is now considered. Fix $\alpha$, $0 < \alpha < 1$, and define $\theta^\epsilon = (V^\epsilon - V^0)\epsilon^{-\alpha}$. Then for $\epsilon > 0$, $\theta^\epsilon$ satisfies the linear elliptic equation

    $\theta^\epsilon_{x_1x_1} + \epsilon C^{-1}(x)\theta^\epsilon_{x_2x_2} + \bigl(-B(x)C^{-1}(x) + \epsilon D(x)\bigr)\theta^\epsilon_{x_2} + \bigl[\bar G_V(x) + \epsilon H(x)\bigr]\theta^\epsilon + \epsilon^{1-\alpha}\bigl[C^{-1}(x)V^0_{x_2x_2} + D(x)V^0_{x_2} + H(x)V^0\bigr] = 0$,

where $\bar G_V(x) = \int_0^1 G_V\bigl(x, V^0 + \lambda(V^\epsilon - V^0)\bigr)\,d\lambda$, with the boundary data $\theta^\epsilon = 0$ on the bottom and lateral sides of $S$ and $\theta^\epsilon = \epsilon^{-\alpha}[\Phi^{-1}\Lambda - V^0]$ along the top of $S$.

We now make a probabilistic representation of $\theta^\epsilon$. Consider for $x \in \bar S$ and $\epsilon \ge 0$ the Ito stochastic differential equation
    $d\xi^\epsilon = \begin{pmatrix} 0 \\ -B(\xi^\epsilon)C(\xi^\epsilon)^{-1} + \epsilon D(\xi^\epsilon)\end{pmatrix} dt + \begin{pmatrix} \sqrt{2} & 0 \\ 0 & (2\epsilon)^{1/2}(C(\xi^\epsilon))^{-1/2}\end{pmatrix}\begin{pmatrix} dw_1 \\ dw_2\end{pmatrix}$    (8)

with initial condition $\xi^\epsilon(0) = x$. For $x \in (0,1)\times(-1,-\delta(\epsilon))$, let $\tau^\epsilon_x$ be the first time $t \ge 0$ that $\xi^\epsilon(t) \in \partial S$, and let $\rho$ be a constant such that $\rho > 2\tau^0_x$ for all $x \in \bar S$. The existence of such a constant $\rho$ is guaranteed by the assumption $B/C > 0$ on $\bar S$. Let $\gamma^\epsilon_x = \min(\tau^\epsilon_x, \rho)$; then, using the Ito stochastic differential rule, we have for $x \in (0,1)\times(-1,-\delta(\epsilon))$ that

    $\theta^\epsilon(x) = E_x\Bigl\{\int_0^{\gamma^\epsilon_x} Q^\epsilon_x(t)\,\epsilon^{1-\alpha}\bigl[C^{-1}V^0_{x_2x_2} + DV^0_{x_2} + HV^0\bigr](\xi^\epsilon_x(t))\,dt + Q^\epsilon_x(\gamma^\epsilon_x)\,\theta^\epsilon(\xi^\epsilon_x(\gamma^\epsilon_x))\Bigr\}$    (9)

where

    $Q^\epsilon_x(t) = \exp\int_0^t \Bigl[\epsilon H(\xi^\epsilon_x(s)) + \int_0^1 G_V\bigl(\xi^\epsilon_x(s),\ V^0(\xi^\epsilon_x(s)) + \lambda\bigl(V^\epsilon(\xi^\epsilon_x(s)) - V^0(\xi^\epsilon_x(s))\bigr)\bigr)\,d\lambda\Bigr]\,ds$.

We show that $\theta^\epsilon \to 0$ using the representation (9). The proof depends upon the estimates (10), (11), (12) relating trajectories of the $\epsilon$-system (8) to the system with $\epsilon = 0$. Redefine the functions $A$, $B$, $C$, $\Phi$ outside $\bar S$ so that there exists a constant $M$ which is both a bound for $|D|$ and a Lipschitz constant for $BC^{-1}$ on $\mathbb{R}^2$. For functions $h$ defined on $[0,t]$, let $\|h\|_t = \sup_{0\le t'\le t}|h(t')|$, and let $\zeta^\epsilon_x(t) = \epsilon^{1/2}\int_0^t C(\xi^\epsilon_x(s))^{-1/2}\,dw_2(s)$. Since the first component of the vector $\xi^\epsilon_x(t)$ is independent of $\epsilon \ge 0$, an application of Gronwall's inequality yields

    $\|\xi^\epsilon_x - \xi^0_x\|_\rho \le e^{M\rho}\bigl[\epsilon M\rho + \epsilon^{1/2}\sqrt{2}\,\|\zeta^\epsilon_x\|_\rho\bigr]$.

Then from equation (14) in [3] we have that

    $P\bigl\{\|\xi^\epsilon_x - \xi^0_x\|_\rho > \tfrac{1}{2}\delta(\epsilon)\bigr\} \le 4\exp\bigl((M+\beta)\rho\bigr)\exp\bigl(-\delta(\epsilon)\big/\bigl(4\sqrt{2}\,\epsilon^{1/2}\exp(M\rho)\bigr)\bigr)$    (10)

for some positive constant $\beta$ depending on a bound for $|C^{-1}|$ on $\mathbb{R}^2$. From (10) with $\delta(\epsilon) = \epsilon^m$, $0 < m < \tfrac{1}{2}$, we have

    $\|\xi^\epsilon_x - \xi^0_x\|_\rho \to 0$    (11)

in probability uniformly for $x$ on $[0,1]\times[-1,-\delta(\epsilon)]$, and that

    $|\tau^\epsilon_x - \tau^0_x| \to 0$,    $|\xi^\epsilon_x(\gamma^\epsilon_x) - \xi^0_x(\gamma^0_x)| \to 0$    (12)

in probability uniformly for $x$ on $[0,1]\times[-1,-\delta(\epsilon)]$.

Since $\theta^\epsilon = 0$ if $\xi^\epsilon_x(\gamma^\epsilon_x)$ is in the bottom or lateral sides of $S$, and $|\theta^\epsilon| \le K\epsilon^{-\alpha}$ for some constant $K$ otherwise, we obtain from the representation (9) and the estimates (10), (11), (12) that $|\theta^\epsilon| \to 0$ uniformly on
$[0,1]\times[-1,-\delta(\epsilon)]$.

We now consider the boundary layer expansion (4). Note that $\Psi^0$ satisfies on $[0,1]\times(-\infty,0]$ the equation

    $\Psi^0_{x_2x_2} - B(x_1,0)\Psi^0_{x_2} = 0$

with boundary conditions

    $\Psi^0(x_1,0) = \Phi^{-1}(x_1,0)\Lambda(x_1,0) - V^0(x_1,0)$,    $0 \le x_1 \le 1$,
    $\Psi^0(x_1,-\infty) = 0$.

Fix $\alpha$, $0 < \alpha < m$, and define $\Xi^\epsilon(x) = V^\epsilon(x_1,\epsilon x_2) - V^0(x_1,\epsilon x_2)$ and $\chi^\epsilon(x) = (\Xi^\epsilon(x) - \Psi^0(x))\epsilon^{-\alpha}$. To establish (4) on $[0,1]\times[-\delta(\epsilon),0]$ we need to show that $\chi^\epsilon \to 0$ uniformly on $[0,1]\times[-\delta(\epsilon)\epsilon^{-1},0]$. The expansion (4) then follows on $[0,1]\times[-1,0]$, since the function $\Psi^0$ decays exponentially in $x_2$ as $x_2 \to -\infty$.

Define the operator $M^\epsilon$ by

    $M^\epsilon\Xi = \epsilon C(x_1,\epsilon x_2)\Xi_{x_1x_1} + \Xi_{x_2x_2} + \bigl[-B(x_1,\epsilon x_2) + \epsilon C(x_1,\epsilon x_2)D(x_1,\epsilon x_2)\bigr]\Xi_{x_2} + \epsilon^2 C(x_1,\epsilon x_2)H(x_1,\epsilon x_2)\Xi$.

Then $\Xi^\epsilon$ satisfies the equation

    $M^\epsilon\Xi^\epsilon + \epsilon^2 S(x,\epsilon) = 0$    (13)

with

    $S(x,\epsilon) = V^0_{x_2x_2}(x_1,\epsilon x_2) + D(x_1,\epsilon x_2)C(x_1,\epsilon x_2)V^0_{x_2}(x_1,\epsilon x_2) + C(x_1,\epsilon x_2)H(x_1,\epsilon x_2)V^0(x_1,\epsilon x_2)$.

We can rewrite the equation for $\Psi^0$ in the form

    $M^\epsilon\Psi^0 + \epsilon T(x,\epsilon) = 0$    (14)

with

    $T(x,\epsilon) = C(x_1,\epsilon x_2)\Psi^0_{x_1x_1} + C(x_1,\epsilon x_2)D(x_1,\epsilon x_2)\Psi^0_{x_2} + \epsilon C(x_1,\epsilon x_2)H(x_1,\epsilon x_2)\Psi^0 + \epsilon^{-1}\bigl[B(x_1,0) - B(x_1,\epsilon x_2)\bigr]\Psi^0_{x_2}$.

Therefore $\chi^\epsilon$ satisfies the equation

    $M^\epsilon\chi^\epsilon + \epsilon^{1-\alpha}T(x,\epsilon) + \epsilon^{2-\alpha}S(x,\epsilon) = 0$.    (15)
Let us make a probabilistic representation of $\chi^\epsilon$. Consider for $x \in (0,1)\times(-\delta(\epsilon)\epsilon^{-1},0)$ the stochastic differential equation

    $d\eta^\epsilon = \begin{pmatrix} 0 \\ -B(\eta_1,\epsilon\eta_2) + \epsilon C(\eta_1,\epsilon\eta_2)D(\eta_1,\epsilon\eta_2)\end{pmatrix} dt + \begin{pmatrix} (2\epsilon C(\eta_1,\epsilon\eta_2))^{1/2} & 0 \\ 0 & 2^{1/2}\end{pmatrix} dw$    (16)

with initial condition $\eta^\epsilon(0) = x \in (0,1)\times(-\delta(\epsilon)\epsilon^{-1},0)$. For $\epsilon_0 > 0$ sufficiently small there exists a constant $P > 0$ such that

    $B(x_1,\epsilon x_2) - \epsilon C(x_1,\epsilon x_2)D(x_1,\epsilon x_2) > P$    if $(x_1,x_2) \in (0,1)\times(-2\delta(\epsilon_0)\epsilon_0^{-1},0)$.

Let $\tau^\epsilon_x$ be the exit time of $\eta^\epsilon(t)$ from $(0,1)\times(-\delta(\epsilon)\epsilon^{-1},0)$ and let $\gamma^\epsilon_x = \min\bigl[\tau^\epsilon_x,\ 2\delta(\epsilon)(\epsilon P)^{-1}\bigr]$. Then from the Ito stochastic differential rule we have

    $\chi^\epsilon(x) = E_x\Bigl\{\int_0^{\gamma^\epsilon_x} \Pi^\epsilon_x(t)\bigl[\epsilon^{2-\alpha}S(\eta^\epsilon(t),\epsilon) + \epsilon^{1-\alpha}T(\eta^\epsilon(t),\epsilon)\bigr]\,dt\Bigr\} + E_x\bigl\{\Pi^\epsilon_x(\gamma^\epsilon_x)\,\chi^\epsilon(\eta^\epsilon(\gamma^\epsilon_x))\bigr\}$    (17)

where

    $\Pi^\epsilon_x(t) = \exp\int_0^t \epsilon^2 C\bigl(\eta^\epsilon_1(s),\epsilon\eta^\epsilon_2(s)\bigr)\,H\bigl(\eta^\epsilon_1(s),\epsilon\eta^\epsilon_2(s)\bigr)\,ds$.

Let us show that $\chi^\epsilon \to 0$ through use of the representation (17). First, note that $T$, $S$ are uniformly bounded for $(x,\epsilon) \in (0,1)\times(-\delta(\epsilon_0)\epsilon_0^{-1},0)\times(0,\epsilon_0)$. Since $\gamma^\epsilon_x \le 2\delta(\epsilon)(\epsilon P)^{-1}$ and $\delta(\epsilon) = \epsilon^m$, the first term on the right-hand side of (17) tends uniformly to zero for $x \in (0,1)\times(-\delta(\epsilon)\epsilon^{-1},0)$. We need only estimate the second term. Note that $\chi^\epsilon = 0$ along the top and lateral sides of the rectangle $(0,1)\times(-\delta(\epsilon)\epsilon^{-1},0)$. On the bottom $\chi^\epsilon \to 0$ as $\epsilon \to 0$; to conclude this last fact we use the form of $\Psi^0$ and the validity of the regular expansion (5) on the lower boundaries of these varying rectangles. In the interior of the rectangles $|\chi^\epsilon| < C\epsilon^{-\alpha}$ for some constant $C$, which depends in part on the constant $K^*$ in (B5). Hence to complete the proof we only need to show that $P\{\gamma^\epsilon_x \ne \tau^\epsilon_x\} < C\epsilon^{\alpha+\beta}$ for some positive constants $C$ and $\beta$. Now

    $\eta^\epsilon_2\bigl(2\delta(\epsilon)(\epsilon P)^{-1}\bigr) \le x_2 - 2\delta(\epsilon)/\epsilon + \sqrt{2}\,w\bigl(2\delta(\epsilon)(\epsilon P)^{-1}\bigr)$.

Hence

    $P\{\gamma^\epsilon_x \ne \tau^\epsilon_x\} \le P\bigl\{w\bigl(2\delta(\epsilon)(\epsilon P)^{-1}\bigr) > \delta(\epsilon)/\epsilon\bigr\}$.    (18)
From standard estimates we have that

    $P\bigl\{w\bigl(2\delta(\epsilon)(\epsilon P)^{-1}\bigr) > \delta(\epsilon)/\epsilon\bigr\} \le 4\exp\bigl(-\delta(\epsilon)P/(4\epsilon)\bigr)$.

Since $\delta(\epsilon)/\epsilon = \epsilon^{m-1} \to \infty$, the right-hand side is $o(\epsilon^{\alpha+\beta})$. This implies (18) and thus completes the proof of the theorem.
References

[1] W. Eckhaus, "Boundary layers in linear elliptic singular perturbation problems", SIAM Review 14 (1972) 225-270.
[2] W.H. Fleming, "Stochastically perturbed dynamical systems", Rocky Mountain Journal of Mathematics 4 (1974) 407-433.
[3] C. Holland, "Singular perturbations in elliptic boundary value problems", Journal of Differential Equations 20 (1976) 248-265.
[4] C. Holland, "Parabolic boundary layers", Indiana University Mathematics Journal, to appear.
[5] C. Holland, "Singular perturbations in the first boundary value problem for parabolic equations", SIAM Journal on Mathematical Analysis, to appear.
[6] M. Zlamal, "The parabolic equation as a limiting case of a certain elliptic equation", Annali di Matematica Pura ed Applicata 52 (1962) 143-150.
Mathematical Programming Study 5 (1976) 67-78. North-Holland Publishing Company
LIMIT THEOREMS AND DIFFUSION APPROXIMATIONS FOR DENSITY DEPENDENT MARKOV CHAINS

Thomas G. KURTZ
University of Wisconsin, Madison, Wis., U.S.A.

Received 10 June 1975
Revised manuscript received 13 October 1975

One parameter families of Markov chains $X_A(t)$ with infinitesimal parameters given by $q^A_{k,k+l} = Af(A^{-1}k, l)$, $k, l \in Z^r$, $l \ne 0$, are considered. Under appropriate conditions $X_A(t)/A$ converges in probability as $A \to \infty$ to a solution of the system of ordinary differential equations $\dot X = F(X)$, where $F(x) = \sum_l l f(x,l)$. Limit theorems for these families are reviewed, including work of Norman, Barbour and the author. A natural diffusion approximation is discussed. Families of this type include the usual epidemic model, and models in chemistry, genetics and many other areas of application.
1. Introduction

In this paper we will consider families of Markov chains $X_A(t)$ with state spaces in $Z^r$, the $r$-dimensional integer lattice, and infinitesimal parameters given by

    $q^A_{k,k+l} = Af(k/A, l)$,    $l \ne 0$,

where $f(x,l)$ is a nonnegative function defined on a subset of $R^r \times Z^r$. To avoid complication in the statement of the theorems we will assume that the state space is all of $Z^r$ and that $f(x,l)$ is defined on all of $R^r \times Z^r$. This is not naturally true for most of the examples given below (where the state space is usually a subset of the nonnegative integer lattice), but the restatement of the theorems for the particular examples is usually clear. Throughout we will assume

    $\sup_{x\in K}\ \sum_l |l|\,f(x,l) < \infty$    (1.1)

for every bounded set $K$ and define
Th. G. Kurtz / Density dependent Markov chains
    $F(x) = \sum_l l\,f(x,l)$.

Assumption (1.1) insures that $X_A(t)$ is well defined up to $\sup\{t:\ \sup_{s\le t}|X_A(s)| < \infty\}$. In some of the examples $A$ is naturally interpreted as area, $X_A(t)$ gives the population size for each of several species, and hence $Z_A(t) \equiv X_A(t)/A$ is the vector of population densities. With these examples in mind, we will refer to $Z_A(t)$ as the density process. The critical observation about these processes is that under appropriate assumptions the density process satisfies

    $\frac{d}{ds}E(Z_A(s)) \approx E\bigl(F(Z_A(s))\bigr)$    (1.2)

and for $\epsilon > 0$

    $\lim_{A\to\infty} P\bigl\{\sup_{s\le t}|Z_A(s) - Z(s)| > \epsilon\bigr\} = 0$,    (1.3)

where $Z(s)$, $0 \le s \le t$, is a solution of

    $\frac{d}{ds}Z(s) = F(Z(s))$.    (1.4)

The following stochastic models are for the most part quite familiar. We formulate them in such a way as to incorporate the parameter $A$ into the model in a natural way, and we observe that in each case (1.4) gives the familiar deterministic model that is analogous to the stochastic model. Consequently (1.3) demonstrates the relationship between these stochastic and deterministic models.
Epidemics. Let $A$ be the total population size; $X_A(t) = (S_A(t), I_A(t))$ gives the number of susceptibles and the number of infectives. We assume the chance that a particular susceptible individual encounters some infectious individual at time $t$ is proportional to the fraction of the total population that is infectious, $I_A(t)/A$. This leads to the assumption that for $k = (s,i)$

    $q^A_{k,k+l} = \lambda s\,\dfrac{i}{A} = A\lambda\dfrac{s}{A}\dfrac{i}{A} = Af(k/A, l)$,    $l = (-1,1)$,

    $q^A_{k,k+l} = \mu i = A\mu\dfrac{i}{A} = Af(k/A, l)$,    $l = (0,-1)$.

This gives

    $F(x,y) = (-1,1)\lambda xy + (0,-1)\mu y = (-\lambda xy,\ \lambda xy - \mu y)$    (1.5)

and hence (1.4) is the familiar epidemic model. (See [2].)
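The epidemic family above is easy to simulate: jumps $l = (-1,1)$ occur at rate $\lambda si/A$ and jumps $l = (0,-1)$ at rate $\mu i$. The Python sketch below is illustrative only (the rate constants and population size are my choices, not from the text); it runs the standard Gillespie algorithm for this chain and checks structural properties of a path.

```python
import numpy as np

rng = np.random.default_rng(1)

def sir_path(A, s0, i0, lam=2.0, mu=1.0, t_max=20.0):
    """Gillespie simulation of the density dependent epidemic chain:
    infection (s,i) -> (s-1,i+1) at rate lam*s*i/A; removal i -> i-1 at rate mu*i."""
    s, i, t = s0, i0, 0.0
    path = [(t, s, i)]
    while i > 0 and t < t_max:
        r_inf = lam * s * i / A
        r_rem = mu * i
        total = r_inf + r_rem
        t += rng.exponential(1.0 / total)      # exponential holding time
        if rng.random() < r_inf / total:
            s, i = s - 1, i + 1
        else:
            i -= 1
        path.append((t, s, i))
    return path

A = 1000
path = sir_path(A, s0=A - 10, i0=10)
s_traj = [s for _, s, _ in path]
i_traj = [i for _, _, i in path]

assert all(a >= b for a, b in zip(s_traj, s_traj[1:]))   # S never increases
assert min(i_traj) >= 0 and min(s_traj) >= 0
assert s_traj[0] + i_traj[0] >= s_traj[-1] + i_traj[-1]  # removals shrink S+I
```

By (1.3), for large $A$ the scaled path $(s/A, i/A)$ stays close to the deterministic system $\dot x = -\lambda xy$, $\dot y = \lambda xy - \mu y$ given by (1.5).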
Branching processes. Let $A$ denote area. We assume that the offspring distribution depends on the population density, i.e., $p_m(x)$ is the probability that a dying individual leaves $m$ offspring if the population density is $x$. This leads to

    $q^A_{k,k+l} = \lambda k\,p_{l+1}(k/A) = A\lambda(k/A)\,p_{l+1}(k/A) = Af(k/A, l)$,

where $\lambda$ is the parameter of the exponential lifetime distribution.

Chemical reactions. We consider a simple reaction $B + C \to D$ taking place in a solution with volume $A$. Let $X_A(t) = (B_A(t), C_A(t))$ be the numbers of molecules of reactants $B$ and $C$. We assume the chance that a particular molecule of $B$ encounters a molecule of $C$ in such a way as to have a reaction is proportional to the concentration of $C$, $C_A(t)/A$. Consequently for $k = (b,c)$ and $l = (-1,-1)$,

    $q^A_{k,k+l} = \dfrac{bc}{A} = A\dfrac{b}{A}\dfrac{c}{A} = Af(k/A, l)$.

More general chemical reaction models satisfy

    $q^A_{k,k+l} = A\bigl(f(k/A, l) + O(1/A)\bigr)$.

(See [10, 14].)
Prey–predator. Let $A$ be the area of a region and $X_A(t) = (U_A(t), V_A(t))$ be the number of rabbits and foxes in the region. The chance that a particular fox finds a rabbit is proportional to the density of the rabbits, $U_A(t)/A$, and assuming that the chances that a fox produces an offspring are enhanced by being well fed, we have for $k = (u,v)$

    $q^A_{k,k+l} = au = Aa\dfrac{u}{A} = Af(k/A, l)$,    $l = (1,0)$,
    $q^A_{k,k+l} = bu + cv\dfrac{u}{A} = A\Bigl[b\dfrac{u}{A} + c\dfrac{v}{A}\dfrac{u}{A}\Bigr] = Af(k/A, l)$,    $l = (-1,0)$,
    $q^A_{k,k+l} = dv + ev\dfrac{u}{A} = A\Bigl[d\dfrac{v}{A} + e\dfrac{v}{A}\dfrac{u}{A}\Bigr] = Af(k/A, l)$,    $l = (0,1)$,
    $q^A_{k,k+l} = fv = Af\dfrac{v}{A} = Af(k/A, l)$,    $l = (0,-1)$.

Of course, (1.4) is the classical Volterra prey–predator model. In Section 2 we review known limit theorems for density dependent processes. Most of these results are stated more generally in the literature, but we state them here in the form that seems most appropriate for our purposes. Diffusion approximations have been suggested for some of the above and related models (see for example [2, 4, 5]), and in Section 3 we discuss the general problem of diffusion approximations for density dependent families. Using a result of Komlós, Major and Tusnády we construct the Markov chain and the diffusion approximation on the same sample space and give very sharp estimates on the difference between the two.
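In the special case $b = d = 0$ the deterministic limit (1.4) for the prey–predator family reduces to the classical Lotka–Volterra system $\dot u = au - c\,uv$, $\dot v = e\,uv - fv$, which conserves $H(u,v) = e\,u - f\ln u + c\,v - a\ln v$. The sketch below is illustrative only (constants are my choices): it integrates this system with a fourth-order Runge–Kutta step and checks conservation of $H$.

```python
import numpy as np

a, c, e, f = 1.0, 1.0, 1.0, 1.0   # rate constants (with b = d = 0)

def rhs(z):
    u, v = z
    return np.array([a * u - c * u * v, e * u * v - f * v])

def rk4_step(z, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(z)
    k2 = rhs(z + 0.5 * h * k1)
    k3 = rhs(z + 0.5 * h * k2)
    k4 = rhs(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def H(z):
    """First integral of the Lotka-Volterra flow."""
    u, v = z
    return e * u - f * np.log(u) + c * v - a * np.log(v)

z = np.array([2.0, 0.5])
h0 = H(z)
for _ in range(20000):                 # integrate to T = 20
    z = rk4_step(z, 1e-3)

assert z[0] > 0 and z[1] > 0
assert abs(H(z) - h0) < 1e-8           # H is conserved along the flow
```

The conserved quantity makes the deterministic orbits closed curves; the stochastic chain wanders between these curves on the $O(1/\sqrt{A})$ scale quantified in Section 2.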
2. Limit theorems

The limit theorems closely parallel those for sums of independent random variables. The "law of large numbers" relies on the fact that under appropriate conditions $Z_A(t) - Z_A(0) - \int_0^t F(Z_A(s))\,ds$ is a martingale.

Theorem (2.1). Suppose for each bounded set $K$ there is a constant $M_K$ such that $|F(x) - F(y)| \le M_K|x-y|$ for $x, y \in K$, and

    $\lim_{d\to\infty}\ \sup_{x\in K}\ \sum_{|l|>d} |l|\,f(x,l) = 0$.

If $Z(s)$, $0 \le s \le t$, is a solution of $\dot Z = F(Z)$ and $\lim_{A\to\infty} Z_A(0) = Z(0)$, then for $\epsilon > 0$

    $\lim_{A\to\infty} P\bigl\{\sup_{s\le t}|Z_A(s) - Z(s)| > \epsilon\bigr\} = 0$.    (2.2)
This result was given in [8]. Bounds on the probabilities in (2.2) similar to Kolmogorov's inequality are given in [9]. The following central limit theorem is given in [9]. It is closely related to the theorem of Rosen [16] and to results of Norman [12].

Theorem (2.3). Let

    $g_{ij}(x) = \sum_l l_i l_j\,f(x,l)$,

and suppose

    $\lim_{d\to\infty}\ \sup_{x\in K}\ \sum_{|l|>d} |l|^2 f(x,l) = 0$

for each bounded $K$. Suppose $F(x) = (F^1(x), F^2(x), \dots, F^r(x))$ has continuous first partial derivatives. If $Z(s)$, $0 \le s \le t$, is a solution of $\dot Z = F(Z)$ and $\lim_{A\to\infty}\sqrt{A}\,(Z_A(0) - Z(0)) = 0$, then $\sqrt{A}\,(Z_A(s) - Z(s))$ converges weakly in $D(0,t)$ to the Gaussian diffusion process $V(s)$ with characteristic function $\psi(s,\theta)$ satisfying

    $\dfrac{\partial}{\partial s}\psi(s,\theta) = -\tfrac{1}{2}\sum_{j,k}\theta_j\theta_k\,g_{jk}(Z(s))\,\psi(s,\theta) + \sum_{j,k}\theta_j\,\partial_k F^j(Z(s))\,\dfrac{\partial}{\partial\theta_k}\psi(s,\theta)$.    (2.4)
Wang [17, 18] gives theorems analogous to (2.1) and (2.3) for population models involving age as well as density dependence. In that setting the limiting deterministic model is given by a system of integral equations. In [1] Barbour gives a Berry–Esseen type error estimate for the convergence of the distribution of a certain functional of $\sqrt{A}\,(Z_A(s) - Z(s))$.

Theorem (2.5) (Barbour). In addition to the assumptions of Theorem (2.3), suppose $f(x,l)$ is a polynomial in $x$ for each $l$ and is non-zero for only a finite number of $l$'s. Let $h(x,s)$ satisfy

    $\dfrac{\partial h}{\partial s} + \sum_i F^i\,\dfrac{\partial h}{\partial x_i} = 0$

and have continuous second partial derivatives, and suppose

    $\sigma^2(s) \equiv \sum_{i,j} g_{ij}(Z(s))\,\dfrac{\partial}{\partial x_i}h(Z(s),s)\,\dfrac{\partial}{\partial x_j}h(Z(s),s)$

does not vanish on $0 \le s \le t$. Then for $\eta(s) = \bigl(\frac{\partial}{\partial x_1}h(Z(s),s), \dots, \frac{\partial}{\partial x_r}h(Z(s),s)\bigr)$ and $0 \le s \le t$

    $\sup_v\ \bigl|P\{\sqrt{A}\,(Z_A(s) - Z(s))\cdot\eta(s) \le v\} - P\{V(s)\cdot\eta(s) \le v\}\bigr| = O\Bigl(\dfrac{\log A}{\sqrt{A}}\Bigr)$.    (2.6)

The process $V(s)\cdot\eta(s)$ has mean zero, independent increments, and variance $\int_0^s \sigma^2(u)\,du$, i.e., it is a deterministic time change of Brownian motion.
Th. G. Kurtz / Density dependentMarkov chains
The following theorem, which is a continuous parameter analog of a result of Norman [13], gives conditions under which the convergence in Theorem (2.3) is uniform for all time. Theorem (2.7). Let K be a bounded subset of R" such that ZA (0) E K implies
ZA (t) ~ K all t > 0 and Z(O) ~ K implies Z ( t ) E K all t > O. In addition to the assumptions of Theorem (2.3) suppose there is a point Xo such that F(xo) = O, lim,_~ IZ ( t ) - Xo I = 0 for every solution of 2 = F ( Z ) with Z(O)@ K and that the eigenvalues of the matrix H = ((0iFJ(x0))) have negative real parts. Suppose
sup [ 0,F'(x)l < ~,
sup Ig,(x)l < o%
xEK
xEK
limsup ~ d~
xEK
l ll2f(x,l)=O.
]ll>d
Let ~PA(s,O) be the characteristic function lim ~/A (Za (0) - Z(0)) = 0 implies
lira sup I OA(S, O ) - 6(s, O)l = O,
~ / A ( Z z ( s ) - Z(s)).
Then
(2.8)
s
where ~(s,O) satisfies (2.4). In particular, suppose Z A ( s ) converges in distribution to ZA (~) as s --->~z. Then v ' A (ZA (~) - xo) converges in distribu tion to a normal random vector with mean zero and covariance matrix G H -l where G,j = g,(xo)
and
H, = 0,F(Xo).
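For a concrete scalar check of the stationary covariance, take an immigration–death model (my choice of example, not from the text): $f(x,1) = \alpha$, $f(x,-1) = \beta x$, so $F(x) = \alpha - \beta x$, $x_0 = \alpha/\beta$, $H = -\beta$ and $G = g(x_0) = 2\alpha$. Then $\int_0^\infty e^{sH}Ge^{sH}\,ds = G/(2\beta) = \alpha/\beta$, which matches the Poisson($A\alpha/\beta$) stationary law of the chain, whose variance is $A\alpha/\beta$. The sketch below evaluates the integral numerically.

```python
import numpy as np

alpha, beta = 3.0, 1.5
x0 = alpha / beta          # fixed point of F(x) = alpha - beta*x
H = -beta                  # F'(x0)
G = 2.0 * alpha            # g(x0) = f(x0, 1) + f(x0, -1) = alpha + beta*x0

# Stationary covariance: integral of e^{sH} G e^{sH} ds over [0, inf),
# truncated where the integrand is negligible.
s = np.linspace(0.0, 40.0 / beta, 400001)
h = s[1] - s[0]
integrand = np.exp(s * H) * G * np.exp(s * H)
sigma2 = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * h

assert abs(sigma2 - alpha / beta) < 1e-6   # closed form G/(2*beta) = alpha/beta
```

The same integral, with matrix exponentials, gives the limiting covariance in the multidimensional case of Theorem (2.7).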
The primary significance of Theorem (2.7) is that it gives a central limit theorem for limiting distributions. It is applicable to many chemical reaction models and some genetics and population models. Its chief restriction is that it essentially allows only one fixed point; usually it would only be applicable to population models with immigration and genetics models with mutation, and of course it is not applicable to the epidemic model. Nagaev and Startsev [19] have, however, given limit theorems for the limiting distribution of the epidemic model under various assumptions about the relationships among the parameters. We restate one in our language.

Theorem (2.9) (Nagaev and Startsev). Let $X_A(t) = (S_A(t), I_A(t))$ be the epidemic process discussed in Section 1, with $\rho = \mu/\lambda < 1$, and let $Z(t) = (S(t), I(t))$ be a solution of $\dot Z = F(Z)$ with $F$ given by (1.5). Suppose $\lim_{A\to\infty}\sqrt{A}\,(Z_A(0) - Z(0)) = 0$ and let $W_A = \lim_{t\to\infty} S_A(t)/A$ and $W = \lim_{t\to\infty} S(t)$. Then

    $\lim_{A\to\infty} P\{\sqrt{A}\,(W_A - W) < x\} = P\{V < x\}$    (2.10)

where $V$ is normally distributed with mean zero and variance

    $\dfrac{W(S(0) - W)(S(0)W + \rho^2)}{S(0)(\rho - W)^2}$.

3. Diffusion approximations

Formally the infinitesimal generator of $Z_A(t)$ is given by
    $B_A h(x) = \sum_l A\bigl[h(x + l/A) - h(x)\bigr]f(x,l)$.    (3.1)

If we assume $h$ is smooth and expand in a Taylor series we see

    $B_A h(x) \approx \sum_i F^i(x)\,\partial_i h(x) + \dfrac{1}{2A}\sum_{i,j} g_{ij}(x)\,\partial_i\partial_j h(x) \equiv D_A h(x)$.    (3.2)

This suggests that in some sense $Z_A(t)$ can be approximated by a diffusion process $Y_A(t)$ with generator $D_A$. We already know that $Z_A(t) = Z(t) + O(1/\sqrt{A})$. Consequently, if the diffusion approximation is to be worthwhile it must do better than $Z(t)$. Our goal in this section is to give conditions under which

    $Z_A(t) = Y_A(t) + O\Bigl(\dfrac{\log A}{A}\Bigr)$    (3.3)

in a precise sense.

Lemma (3.5). There is a version of $Z_A$ such that

    $Z_A(t) = Z_A(0) + \sum_l \dfrac{l}{A}\,Y_l\Bigl(A\int_0^t f(Z_A(s), l)\,ds\Bigr)$    (3.6)

for $t < \tau_\infty = \sup\{s:\ \sup_{u\le s}|Z_A(u)| < \infty\}$, where the $Y_l$ are independent Poisson processes with parameter 1.

Proof. Consider (3.6) as an equation for $Z_A$, and observe that the time to the first jump is exponentially distributed with parameter $\sum_l A f(Z_A(0), l)$. The finiteness of the sum insures that there is a first jump, and similarly a second jump, etc. Since $\sup_{x\in K}\sum_l f(x,l) < \infty$ for all bounded sets $K$, (3.6) determines $Z_A(t)$ uniquely up to $\tau_\infty$.

Lemma (3.7). Let $E$ be an open set such that
    $\sup_{x\in E}\ \sum_l |l|^2 f(x,l) < \infty$,

    $\sum_l |l|^2\,\bigl|\sqrt{f(x,l)} - \sqrt{f(y,l)}\bigr|^2 \le M|x-y|^2$,

    $|F(x) - F(y)| \le M|x-y|$

for some $M$ and all $x, y \in E$. Then there is a diffusion $Y_A$ with generator $D_A$ such that

    $Y_A(t) = Y_A(0) + \sum_l \dfrac{l}{A}\,B_l\Bigl(A\int_0^t f(Y_A(s), l)\,ds\Bigr)$    (3.8)

up to the first time $Y_A$ leaves $E$, where the $B_l$ are independent Brownian motions with $E(B_l(t)) = t$ and $\mathrm{Var}(B_l(t)) = t$.

Proof. For simplicity assume $E = R^r$. This is not really a restriction, since $f(x,l)$ can be redefined off the set $E$ so that the conditions of the Lemma are satisfied on all of $R^r$. Let $\sigma_l(x) = \sqrt{f(x,l)}$ and consider the stochastic integral equation

    $Y_A(t) = Y_A(0) + \sum_l \dfrac{l}{\sqrt{A}}\int_0^t \sigma_l(Y_A(u))\,dW_l(u) + \int_0^t F(Y_A(u))\,du$,    (3.9)

where the $W_l(u)$ are independent Brownian motions with mean 0 and variance $u$. This, of course, is not the usual equation, but existence, uniqueness and the strong Markov property follow in the usual way, and Ito's Lemma (McKean [11, page 44]) implies that $Y_A$ has generator $D_A$, i.e.

    $\lim_{\epsilon\to 0}\dfrac{1}{\epsilon}\bigl(E(h(Y_A(t+\epsilon)) \mid \mathscr{F}_t) - h(Y_A(t))\bigr) = D_A h(Y_A(t))$.

Let

    $C_l(t) = \sqrt{A}\int_0^t \sigma_l(Y_A(u))\,dW_l(u) + A\int_0^t f(Y_A(u), l)\,du$.

Define stopping times $\tau^l_s$ by

    $s = A\int_0^{\tau^l_s} f(Y_A(u), l)\,du$.

We claim that the $C_l(\tau^l_s)$ are independent Brownian motions with $E(C_l(\tau^l_s)) = \mathrm{Var}(C_l(\tau^l_s)) = s$. Observe that

    $C_l(\tau^l_s) = \sqrt{A}\int_0^{\tau^l_s} \sigma_l(Y_A(u))\,dW_l(u) + s$.    (3.10)

The first term is a zero-mean Brownian motion (see [11, page 29]), and the independence follows since Ito's Lemma ([11, page 44]) implies

    $E\Bigl(\int_0^t \sigma_k(Y_A(u))\,dW_k(u)\int_0^t \sigma_m(Y_A(u))\,dW_m(u)\Bigr) = 0$    for $k \ne m$.    (3.11)

Define

    $B_l(s) = C_l(\tau^l_s)$

and observe that

    $Y_A(t) = Y_A(0) + \sum_l \dfrac{l}{A}\,B_l\Bigl(A\int_0^t f(Y_A(s), l)\,ds\Bigr)$.
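The diffusion (3.9) is straightforward to simulate by the Euler–Maruyama method. The Python sketch below is illustrative only (it uses the immigration–death rates $f(x,1) = \alpha$, $f(x,-1) = \beta x$, my choice of example): it checks that for very large $A$ the $1/\sqrt{A}$ noise terms are negligible and the path tracks the deterministic solution of (1.4), $z(t) = x_0 + (z(0) - x_0)e^{-\beta t}$ with $x_0 = \alpha/\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 1.0, 2.0
x0 = alpha / beta

def simulate_YA(y0, A, T=2.0, n=20000):
    """Euler-Maruyama for (3.9): dY = F(Y) dt + sum_l (l/sqrt(A)) sigma_l(Y) dW_l,
    with jumps l = +1 (rate density alpha) and l = -1 (rate density beta*y)."""
    dt = T / n
    y = y0
    for _ in range(n):
        drift = alpha - beta * y                         # F(y)
        dw1, dw2 = rng.standard_normal(2) * np.sqrt(dt)  # independent W_l increments
        y += drift * dt + (np.sqrt(alpha) * dw1
                           - np.sqrt(beta * max(y, 0.0)) * dw2) / np.sqrt(A)
        y = max(y, 0.0)
    return y

T = 2.0
ode_value = x0 + (3.0 - x0) * np.exp(-beta * T)
y_end = simulate_YA(3.0, A=1e8, T=T)

# With A = 1e8 the 1/sqrt(A) noise is of order 1e-4, so Y_A hugs the ODE.
assert abs(y_end - ode_value) < 0.01
```

For moderate $A$ the same scheme exhibits the $O(1/\sqrt{A})$ Gaussian fluctuations described by Theorem (2.3), which the diffusion shares with the chain to the finer accuracy (3.3).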
Finally we need the following lemma, which can be obtained from results of Komlós, Major and Tusnády [7] on approximation of empirical distribution functions by the Brownian bridge and a lemma of Révész [15] on the normal approximation to the Poisson distribution.

Lemma (3.12). A Poisson process X(t) and a Brownian motion B(t) with

E(X(t)) = Var(X(t)) = E(B(t)) = Var(B(t)) = t

can be constructed on the same probability space in such a way that
sup_t |X(t) − B(t)| / (1 + ln(1 + t)) = K < ∞   a.s.,
and E(exp λK) < ∞ for some λ > 0.

Theorem (3.13). Suppose that f(x, l) satisfies the conditions of Lemma (3.7) and that f(x, l) ≢ 0 for only a finite collection of l's. Let Y_A be the process given by Lemma (3.7), with Y_A(0) = Z_A(0), and let τ_E be the first exit time from E. Then for A > e and T > 0 there exists a version of Z_A and a random variable K with distribution depending only on f(x, l) and T such that

|Y_A(t) − Z_A(t)| ≤ K (ln A)/A   for t ≤ T ∧ τ_E.
Furthermore there is a λ > 0 such that E(exp λK) < ∞. (Note that Y_A(t) and Z_A(t) are not jointly Markov.)

Proof. Again assume E = R^n. Let sup_{x,l} f(x, l) = C, and let M be a common Lipschitz constant for f(x, l) and F(x). Let {B_l} be the independent Brownian motions in (3.8). By Lemma (3.12) there exist independent Poisson processes {Y_l} satisfying

sup_t |B_l(t) − Y_l(t)| / (1 + ln⁺ t) = K_l < ∞   a.s.

Let Z_A be the solution of (3.6) using these Poisson processes and Z_A(0) = Y_A(0). Then
Y_A(t) − Z_A(t) = Σ_l (l/A) [B_l(A ∫_0^t f(Y_A(s), l) ds) − B_l(A ∫_0^t f(Z_A(s), l) ds)]
    + Σ_l (l/A) [B_l(A ∫_0^t f(Z_A(s), l) ds) − Y_l(A ∫_0^t f(Z_A(s), l) ds)].    (3.14)
Writing B_l(u) = W_l(u) + u and observing |∫_0^t f(Y_A(s), l) ds| ≤ Ct, we obtain

|Y_A(t) − Z_A(t)| ≤ Σ_l (|l|/A) |W_l(A ∫_0^t f(Y_A(s), l) ds) − W_l(A ∫_0^t f(Z_A(s), l) ds)|
    + ∫_0^t M |Y_A(s) − Z_A(s)| ds + Σ_l (|l|/A) K_l [1 + ln⁺(ACt)]    (3.15)
A result of Lévy (McKean [11, page 14]) implies

sup_{0≤u,v≤ACT} |W_l(u) − W_l(v)| / √(|u − v| ln(ACT/|u − v|)) = M_l < ∞   a.s.,    (3.16)

where the distribution of M_l does not depend on ACT, and a result of Fernique [6] implies E(exp λM_l) < ∞. Let γ(t) = A|Y_A(t) − Z_A(t)|. Noting that √(x ln(ACT/x)) ≤ √(x(1 + ln⁺(ACT/x))) for 0 ≤ x ≤ ACT and that the right-hand side is increasing in x, we obtain
γ(t) ≤ Σ_l |l| M_l √( (∫_0^t Mγ(s) ds)(1 + ln⁺(ACT / ∫_0^t Mγ(s) ds)) )
    + ∫_0^t Mγ(s) ds + Σ_l |l| K_l (1 + ln⁺(ACT))    (3.17)

for t ≤ T. Letting Γ_1 = Σ_l |l| M_l and Γ_2 = Σ_l |l| K_l, Gronwall's inequality implies

γ(t) ≤ [Γ_1 √( (∫_0^t Mγ(s) ds)(1 + ln⁺(ACT / ∫_0^t Mγ(s) ds)) ) + Γ_2(1 + ln⁺(ACT))] e^{Mt}.    (3.18)
Setting γ̄ = sup_{t≤T} γ(t) we have

γ̄ ≤ [Γ_1 √(TMγ̄ (1 + ln⁺(AC/(Mγ̄)))) + Γ_2(1 + ln⁺(ACT))] e^{MT}.    (3.19)

Either γ̄ ≤ 1 or

γ̄ ≤ e^{MT} Γ_1 √(TM(1 + ln⁺(AC/M))) √γ̄ + e^{MT} Γ_2(1 + ln⁺(ACT))    (3.20)
   = K_1 √γ̄ + K_2.

If (3.20) holds then

γ̄ ≤ K_1² + 2K_2 = e^{2MT} Γ_1² TM(1 + ln⁺(AC/M)) + 2e^{MT} Γ_2(1 + ln⁺(ACT)) ≤ K ln A

for A > e and an appropriately defined K. Since E(exp λM_l) < ∞ and E(exp λK_l) < ∞ for some λ > 0, we have E(exp λK) < ∞ for some λ > 0.
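The elementary step from the quadratic inequality (3.20) to the bound γ̄ ≤ K_1² + 2K_2 can be checked mechanically. The following sketch (not part of the paper; the grid of values is arbitrary) verifies the implication numerically:

```python
import math

# Sketch, not from the paper: verify that g <= K1*sqrt(g) + K2 implies
# g <= K1**2 + 2*K2, the elementary step used after (3.20).

def implication_holds(k1, k2, g):
    return (g > k1 * math.sqrt(g) + k2) or (g <= k1 * k1 + 2 * k2)

ok = all(implication_holds(k1 / 4.0, k2 / 4.0, g / 10.0)
         for k1 in range(21) for k2 in range(21) for g in range(1001))
print(ok)  # True
```

The implication follows from √γ̄ ≤ (K_1 + √(K_1² + 4K_2))/2 together with ab ≤ (a² + b²)/2.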
References

[1] Andrew D. Barbour, "On a functional central limit theorem for Markov population processes", Advances in Applied Probability 6 (1974) 21-39.
[2] M.S. Bartlett, Stochastic population models in ecology and epidemiology (Methuen, London, 1960).
[3] A.T. Bharucha-Reid, Elements of the theory of Markov processes and their application (McGraw-Hill, New York, 1960).
[4] William Feller, "Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrscheinlichkeitstheoretischer Behandlung", Acta Biotheoretica 5 (1939) 1-40.
[5] William Feller, "Diffusion processes in genetics", in: Proceedings of the second Berkeley symposium on mathematical statistics and probability, pp. 227-246.
[6] Xavier Fernique, "Intégrabilité des vecteurs gaussiens", Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, Paris 270 (1970) 1698-1699.
[7] J. Komlós, P. Major and G. Tusnády, "An approximation of partial sums of independent random variables and the sample distribution function, I", Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 32 (1975) 111-131.
[8] Thomas G. Kurtz, "Solutions of ordinary differential equations as limits of pure jump Markov processes", Journal of Applied Probability 7 (1970) 49-58.
[9] Thomas G. Kurtz, "Limit theorems for sequences of jump Markov processes approximating ordinary differential processes", Journal of Applied Probability 8 (1971) 344-356.
[10] Thomas G. Kurtz, "The relationship between stochastic and deterministic models for chemical reactions", The Journal of Chemical Physics 57 (1972) 2976-2978.
[11] H.P. McKean, Jr., Stochastic integrals (Academic Press, New York, 1969).
[12] M. Frank Norman, Markov processes and learning models (Academic Press, New York, 1972).
[13] M. Frank Norman, "A central limit theorem for Markov processes that move by small steps", The Annals of Probability 2 (1974) 1065-1074.
[14] I. Oppenheim, K.E. Shuler and G.H. Weiss, "Stochastic and deterministic formulation of chemical rate equations", The Journal of Chemical Physics 50 (1969) 460-466.
[15] P. Révész, "On strong approximation of the multidimensional empirical process", to appear.
[16] B. Rosén, "On the central limit theorem for sums of dependent random variables", Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 7 (1967) 48-82.
[17] J.S. Frank Wang, "Limit theorems for age and density dependent stochastic population models", The Journal of Mathematical Biology 2 (1975) 373-400.
[18] J.S. Frank Wang, "A central limit theorem for age and density dependent population processes", to appear.
[19] A.V. Nagaev and A.N. Startsev, "The asymptotic analysis of a stochastic model of an epidemic", Theory of Probability and its Applications 15 (1970) 98-107.
Mathematical Programming Study 5 (1976) 79-92. North-Holland Publishing Company

THE CHOICE OF A STOCHASTIC MODEL FOR A NOISE SYSTEM

E.J. McSHANE
University of Virginia, Charlottesville, Va., U.S.A. and Virginia Military Institute, Lexington, Va., U.S.A.
Received 2 June 1975
Revised manuscript received 15 March 1976

Let a system be affected by noises that are sample functions of a process N*; the state-function x* corresponding to each sample function of N* is determined by a differential or difference or functional equation E*. For mathematical tractability we wish to replace N* by an idealization N. To do this we shall also replace the equations E* by other equations E; the solution of E corresponding to N will be a process x. If E is well chosen, the finite-dimensional distributions of x will be satisfactory approximations to those of x*. We here investigate some cases to find when such an idealization can reasonably be expected to exist, and how to choose the system E.
1. Introduction
Suppose that a system is being studied whose state can be specified by a set of n numbers x^1, ..., x^n, and that this system is affected by a set of r noises, the input of the ρ-th noise during the time-interval [s, t) being the increment z*^ρ(t) − z*^ρ(s) of a "noise-function" z*^ρ. In many instances there exists a theory that assures us that when the noise-functions belong to some class, restricted to accord with the nature of the problem, the time-evolution of the system is described by a set of differential or difference or functional equations E*. However, the class of noise-processes that are admissible by the nature of the system is often inconvenient for mathematical study, especially for study of their statistics, and we would like to replace them by an idealized system of greater mathematical convenience without introducing an unacceptably large error into the resulting solution for the state functions. These idealizations are often outside the class of processes for which the underlying scientific theory asserts that equations E* apply, so that we have no reason to hold firmly to the formalism of equations E* even after we have changed the noise-processes z* to some other processes z of a
E.J. McShane / The choice of a stochastic model
different kind. Instead, we shall also attempt to replace equations E* by other equations E in such a way that for suitable and useful choices of points t_1, ..., t_m in the time-interval [a, b] under consideration, the joint distribution of the solutions x(t_1), ..., x(t_m) of the idealized system shall be near to the joint distribution of the solutions x*(t_1), ..., x*(t_m) of the actual system. Whether we can reasonably hope to find such an idealized system E corresponding to some arbitrarily chosen idealization z of the noise-processes z* depends on the structure of the original system and the choice of the idealized noise-system z. We investigate this in a few instances, primarily for equations E* and E in the form of stochastic differential equations.
2. Standing hypotheses and notation

The type of equations E with which we shall be chiefly concerned is

x^i(t) = x^i(a) + ∫_a^t f^i(s, x(s)) ds + Σ_{ρ=1}^r ∫_a^t g_ρ^i(s, x(s)) dz^ρ(s)
       + Σ_ρ Σ_σ ∫_a^t h_{ρσ}^i(s, x(s)) dz^ρ(s) dz^σ(s),    (2.1)
where the integrals are "belated" integrals of first and second order ([2, 3]). Equations E* will have the same form, but we shall write asterisks on the x^i, z^ρ, f^i, g_ρ^i and h_{ρσ}^i. For typographical simplicity we shall usually write the equations without the asterisks; all our statements will apply also to the system E*.

For each noise-process there is a probability triple (Ω, 𝒜, P). (For the process z* this would be denoted by (Ω*, 𝒜*, P*).) All points t and s mentioned will be assumed to lie in an interval [a, b]. For each t, ℱ_t is a complete σ-subalgebra of 𝒜, and if s ≤ t then ℱ_s ⊆ ℱ_t. Every process denoted by z, with or without affixes, is a real-valued process on [a, b] adapted to the ℱ_t, and there exists a constant K such that if a ≤ s ≤ t ≤ b then a.s.

|E(z(t) − z(s) | ℱ_s)| ≤ K|t − s|,   E([z(t) − z(s)]² | ℱ_s) ≤ K|t − s|.

For each positive μ there exists a positive K_4(μ) such that if |t − s| ≤ μ then
E([z(t) − z(s)]⁴) ≤ K_4(μ)|t − s|.

Many interesting processes satisfy these conditions. In particular, if all sample functions of z satisfy a Lipschitz condition of constant L, we can choose K_4(μ) = L⁴μ³. If z is a standard Wiener process we can choose K_4(μ) = 3μ.

If we retained the notation of (2.1), some of the later estimates would be notationally horrible. We therefore introduce some changes in notation. First we define

z^0(t) = x^0(t) = t,
g_ρ^0(t, x) = 0   (ρ = 1, ..., r),   g_0^0(t, x) = 1,
g_0^i(t, x) = f^i(t, x)   (i = 1, ..., n).
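For the Wiener case the choice K_4(μ) = 3μ in the fourth-moment condition above reflects the Gaussian fourth moment E([z(t) − z(s)]⁴) = 3(t − s)². A quick Monte Carlo check (illustrative only; the seed and sample size are arbitrary):

```python
import random

# Monte Carlo check (illustrative; seed and sample size arbitrary) of
# E([z(t) - z(s)]**4) = 3*(t - s)**2 <= K4(mu)*|t - s| with K4(mu) = 3*mu
# for a standard Wiener process, taking |t - s| = h = mu.

rng = random.Random(0)
h = 0.01
n = 200_000
m4 = sum(rng.gauss(0.0, h ** 0.5) ** 4 for _ in range(n)) / n
print(m4, 3 * h * h)  # the two values should nearly agree
```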
We introduce these in (2.1), and discard the second-order terms. Later we shall show that we can get them back in by a notational trick. From this point on affixes i, k and h will have range 0, ..., n, and when repeated in a term are to be summed over that range. Affixes ρ, σ and τ will have range 0, ..., r, and when repeated are to be summed over that range. With this change, equations (2.1) (with the second-order terms omitted) take the form

x^i(t) = x^i(a) + ∫_a^t g_ρ^i(x(s)) dz^ρ(s).    (2.2)
The new z^0 clearly satisfies the standing hypotheses.

If we wish to reintroduce the second-order terms in (2.1), we first define

Z^{ρσ}(t) = ∫_a^t 1 dz^ρ(s) dz^σ(s).    (2.3)

The conclusion of a theorem (Theorem IV-4-1, p. 134) of [2] asserts that

∫_a^t h_{ρσ}^i(s, x(s)) dz^ρ(s) dz^σ(s) = ∫_a^t h_{ρσ}^i(s, x(s)) dZ^{ρσ}(s).    (2.4)

Since we shall assume that the h_{ρσ}^i are bounded, they satisfy the hypotheses of the theorem quoted. The processes z^ρ do not, because in that theorem it was assumed that the sample functions of the z^ρ are a.s. continuous. But in the proof of the theorem no use was made of that hypothesis, so we can use (2.4). Now the second-order terms have taken the appearance of first-order integrals, and can be included among the integrals with respect to the z^ρ. In
[2] and [3] this substitution was avoided, because the mapping of z^ρ and z^σ into the process Z^{ρσ} is unstable. But we shall make use of it only for various fixed noise-processes, and take account of the fact that its value can change greatly under change from z*^ρ to z^ρ.

We shall assume that the g_ρ^i are bounded for all x, together with their partial derivatives of first and second order. This is less drastic than it might seem. In any application there is some limited range of values of the state variables beyond which we have no interest in the solutions of the equations. Within this range it is not at all unreasonable to assume that the g_ρ^i and their first and second order partial derivatives are bounded and continuous. Outside this range we can amend them to produce boundedness without affecting any solution-function that we are willing to be concerned with.

A partition 𝒫 of the interval [a, b] is an increasing sequence of points t_1 = a, t_2, ..., t_m = b. The mesh of the partition is the maximum of the numbers t_{j+1} − t_j, j = 1, ..., m − 1. If y_1 and y_2 are any two r-vector-valued processes on [a, b], we define the pseudo-distance d(y_1, y_2; 𝒫) to be the supremum of

|E(φ(y_1(t_1), ..., y_1(t_m))) − E(φ(y_2(t_1), ..., y_2(t_m)))|

for all functions φ: (R^r)^m → [0, 1] of m r-vectors with values in [0, 1] and satisfying a Lipschitz condition of constant 1. If x_1, x_2 are two n-vector-valued processes on [a, b], the pseudo-distance d(x_1, x_2; 𝒫) is defined similarly, the function φ being now defined for all sets of m n-vectors.
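Any single Lipschitz function φ with values in [0, 1] gives a lower bound for the pseudo-distance. The following sketch (not from the paper) estimates such a bound for two hypothetical one-dimensional processes observed at a single partition point, with φ(v) = min(1, max(0, v)):

```python
import random

# Sketch, not from the paper: a single Lipschitz phi with values in [0, 1]
# gives a lower bound for d(y1, y2; P).  Here m = 1, r = 1: y1(t1) ~ N(0, 1)
# and y2(t1) ~ N(0, 1.5**2) are hypothetical marginals, and
# phi(v) = min(1, max(0, v)) has Lipschitz constant 1.

rng = random.Random(2)
n = 100_000
phi = lambda v: min(1.0, max(0.0, v))
e1 = sum(phi(rng.gauss(0.0, 1.0)) for _ in range(n)) / n
e2 = sum(phi(rng.gauss(0.0, 1.5)) for _ in range(n)) / n
gap = abs(e1 - e2)   # lower bound for the supremum defining d(y1, y2; P)
print(gap)
```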
3. Properties that an idealization should have

To be specific, let us consider idealization by replacing the actual random noise-processes (z*^1, ..., z*^r) by an r-component Wiener process. If each increment z*(t) − z*(s) of the r-vector-process (z*^1, ..., z*^r) is the sum of independent small pulses, homogeneously distributed in time and each of short duration, then whenever t − s remains above some positive number δ_1, we can expect that the increment from s to t of the process (z*^1, ..., z*^r) is nearly Gaussian, and the joint distribution of the m r-vectors (z*^1(t_1), ..., z*^r(t_1)), ..., (z*^1(t_m), ..., z*^r(t_m)) should be nearly that of a Wiener process in r-space if all the t_{j+1} − t_j are above δ_1. That is, we may expect that there is a process z in which z^1, ..., z^r are the components of a Wiener process and d(z*, z; 𝒫) is small provided that all intervals of partition 𝒫 are of length at least δ_1. In other examples too we meet the same
situation; the actual z* and the idealization z are close in the sense that d(z*, z; 𝒫) is small provided that all the intervals of 𝒫 are of length at least δ_1.

Given now the actual noises z* and equations E*, and having chosen the desired idealization z for the noises, it would be pleasant if we could find another set of equations E such that the solutions of E, with noises z, are processes x such that for all partitions 𝒫, the distance d(x, x*; 𝒫) is small. But this is manifestly too much to ask. To solve equations E and E* for all possible t_j would require us to use arbitrarily closely spaced points in estimating the integrals, and for these we have no reason to believe that z and z* have distributions that resemble each other.

Suppose then that 𝒫 is a partition in which all the intervals have length greater than δ_1. By adjoining to 𝒫 the midpoints of any of its intervals that have length more than 2δ_1, and repeating this process, we can find another partition 𝒫′ that includes all the points of 𝒫 and has all its intervals of length greater than δ_1 but not greater than 2δ_1. It is enough to consider such partitions, for d(x, x*; 𝒫) ≤ d(x, x*; 𝒫′). We wish to show that for all such partitions d(x, x*; 𝒫) is small; and in estimating the x(t_j) and x*(t_j) we wish to use no values of the z^ρ(t) and z*^ρ(t) except the values at the t_j, since at closer values we have no adequate control over the distributions of the z and z*. This adds a new element to our problem. We must not only find the equations E. We must also find for E and for E* methods of approximating the solutions x and x* at the t_j that are accurate enough and also use no values of the z and z* except the z(t_j) and z*(t_j).
In detail, given E* and z*, and having chosen the idealization z for the noise-processes, and given a positive number ε that represents the largest acceptable distance d(x, x*; 𝒫) at all partitions 𝒫 with interval-lengths more than δ_1 but not more than 2δ_1, we wish to find a set of equations E, an approximation procedure for solving E* that uses only the z*(t_j), and an approximation procedure for solving E that uses only the z(t_j), such that if y, y* are the approximations obtained for x, x* respectively, then

d(x*, y*; 𝒫) < ε/3;   d(x, y; 𝒫) < ε/3;   d(y, y*; 𝒫) < ε/3.

We shall discuss these types of approximation separately.
4. The closeness in distribution of the approximate solutions It is convenient to assume that there is a uniform bound for the sample-functions of all the noise processes that we consider. This again is not
as drastic an assumption as it appears. The actual noise functions z* are often expressed in units of some quantity, such as energy or currency, of which there is a bounded supply, and in this case the sample-functions of z* are bounded. In the idealization we usually want processes z with continuous sample functions. Given any such z and any positive ε, there is a process z′ with uniformly bounded sample functions such that z′(·, ω) = z(·, ω) for all ω except those in a set of P-measure less than ε. Then d(z, z′; 𝒫) < ε for all partitions 𝒫, and if ε has been chosen small enough z′ will serve in place of z. We shall also assume for the sake of simplicity that the initial value x^i(a) is a sure function, independent of ω.

For approximating solutions of equations E and E* we shall allow only procedures that satisfy the following conditions.

(4.1) The initial approximation y(a) is x(a). For all other j, y(t_{j+1}) is a function of the y(t_h) (h ≤ j) and z(t_h) (h ≤ j + 1) that is a polynomial in the Δ_h z (h ≤ j) (where as usual we write Δ_h z = z(t_{h+1}) − z(t_h)) with coefficients that are products of finitely many of the g_ρ^i and the first and second partial derivatives of the g_ρ^i at t_1, ..., t_j. (We could allow the use of partial derivatives of higher order if we added the hypothesis that they are bounded and have bounded derivatives.)

Under this condition it is easy to prove step by step that the estimates y(t_j) are bounded Lipschitzian functions of the z(t_j). Thus for each i and j there is a bounded Lipschitzian function Φ_j^i of m r-vectors such that

y^i(t_j) = Φ_j^i(z(t_1), ..., z(t_m)).
We treat equations E* similarly. We use an approximation procedure (not necessarily the same as was used for E) that furnishes approximations y*(t_1), ..., y*(t_m) to the values of the solutions x* at the t_j. As before, there are bounded Lipschitzian functions Φ*_j^i of m r-vectors such that y*^i(t_j) = Φ*_j^i(z*(t_1), ..., z*(t_m)). We now assume that even though equations E differ from E* and the formulas expressing the y(t_j) in terms of the g_ρ^i and their derivatives are different from the formulas expressing the y*(t_j) in terms of the g*_ρ^i and their derivatives, the algorithm arrived at is the same; that is, the functions Φ_j^i and Φ*_j^i are the same.

Now let φ be a real-valued function of m n-vectors that has values in [0, 1] and satisfies a Lipschitz condition of constant 1. Then the composite function
(z_1, ..., z_m) ↦ φ(Φ_1(z_1, ..., z_m), ..., Φ_m(z_1, ..., z_m)),   where Φ_j = (Φ_j^1, ..., Φ_j^n),

is Lipschitzian; let L (≥ 1) be a Lipschitz constant for it. If ε > 0 and the noise-processes z, z* satisfy d(z, z*; 𝒫) < ε/3L, then

|E(φ(y(t_1), ..., y(t_m))) − E(φ(y*(t_1), ..., y*(t_m)))| < ε/3,

so d(y, y*; 𝒫) < ε/3.
5. Approximating the solutions of equations

The standard Picard method for showing the existence of solutions of equations (2.2) is to choose as first approximation x_1 the function obtained by replacing x(s) by x(a) in the right member, as second approximation x_2 the function obtained by replacing x(s) by x_1(s) in the right member, and so on. Given a partition 𝒫 = (t_1, ..., t_m), an obvious improvement is to use the first Picard approximation on the interval [t_1, t_2], then to use it again on the next interval but with x_1(t_2) as initial value instead of holding on to x(a), and so on. That is, x_1 has initial value x(a), and is computed successively on the intervals [t_j, t_{j+1}] by the formula
x_1^i(t) = x_1^i(t_j) + ∫_{t_j}^t g_ρ^i(x_1(t_j)) dz^ρ(s)    (5.1)
         = x_1^i(t_j) + g_ρ^i(x_1(t_j))(z^ρ(t) − z^ρ(t_j)).
This is the Cauchy–Maruyama approximation. An evident improvement is to go as far as the second Picard approximation on each interval before proceeding to the next, and using as initial value on each interval the value thus obtained at the end of the preceding interval. That is, for t_j ≤ t ≤ t_{j+1}

x_2^i(t) = x_2^i(t_j) + ∫_{t_j}^t g_ρ^i[x_2(t_j) + g_σ(x_2(t_j))(z^σ(s) − z^σ(t_j))] dz^ρ(s).    (5.2)
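For a concrete sense of the first-order recursion (5.1), here is a sketch (not from the paper) applying it to the hypothetical scalar equation dx = x dz with the Lipschitzian noise z(t) = sin t, whose exact solution is x(0) exp(z(t) − z(0)):

```python
import math

# Sketch, not from the paper: the recursion (5.1),
#   x_{j+1} = x_j + g(x_j) * (z(t_{j+1}) - z(t_j)),
# applied to the hypothetical scalar equation dx = x dz with the Lipschitzian
# noise z(t) = sin t, whose exact solution is x(0) * exp(z(t) - z(0)).

def cauchy_maruyama(g, z, x0, a, b, m):
    h = (b - a) / m
    x = x0
    for j in range(m):
        x += g(x) * (z(a + (j + 1) * h) - z(a + j * h))
    return x

x_num = cauchy_maruyama(lambda x: x, math.sin, 1.0, 0.0, 1.0, 2000)
x_exact = math.exp(math.sin(1.0))
print(abs(x_num - x_exact))  # small for a fine partition
```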
But this gain in accuracy is bought at too high a price. These approximations fail to satisfy the requirements (4.1). We shall therefore modify (5.2), sacrificing some of the accuracy in order to obtain a procedure that satisfies (4.1).

Suppose that the coefficient functions g_ρ^i all have a bound L_0 and a Lipschitz constant L_0. We use the successive Picard approximations to solve equations (5.1). In [2], p. 156 it was shown that if α is an upper bound for
m(t, x_1 − x_0), where

m(t, y) = sup{‖y(s)‖ : a ≤ s ≤ t}

(‖·‖ denoting the L_2-norm), then for k ≥ 1 we have

m(t, x_{k+1} − x_k) ≤ {n^{1/2}(r² + r) C L_0}^k ((t − a)^{k/2}/(k!)^{1/2}) α,

where

C = 2K(b − a)^{1/2} + K^{1/2}.    (5.3)
Therefore the limit of the x_k, which is the solution of the differential equation, differs from x_0 in L_2-norm by less than a constant multiple of α, uniformly on [a, b]. A similar statement, with similar proof, holds for difference equations such as (5.2); in fact, we could reduce this to the case already considered by rewriting the difference equations as differential equations. So if we compute processes x_0 by a set of equations like (5.2), and denote by x the solution of other equations obtained by adding to the right member of (5.2) a process whose L_2-norm is at most α, we can use x_0 as the initial approximation in the Picard solution of the equations for x, and obtain x_1 − x_0 equal to the added terms, and therefore of L_2-norm at most α. By the preceding discussion, ‖x(t) − x_0(t)‖ will remain below a constant multiple of α. In other words, if we amend the right member of (5.2) by a process whose L_2-norm remains at most α, the change in the solution will be of the same order; it will remain below some multiple of α. From this we easily verify that d(x, x_0; 𝒫) also remains below some multiple of α, for all partitions all of whose intervals have lengths between δ_1 and 2δ_1.

By (5.2) and the theorem of the mean, for each ω, each ρ and i, and each s in [t_j, t_{j+1}] there is a number θ(s) in (0, 1) such that if we write

x̃(s) = x_2(t_j) + θ(s) g_σ(x_2(t_j))(z^σ(s) − z^σ(t_j)),    (5.4)

then for these s

g_ρ^i[x_2(t_j) + g_σ(x_2(t_j))(z^σ(s) − z^σ(t_j))] =
  = g_ρ^i(x_2(t_j)) + g_{ρ,x^k}^i(x_2(t_j)) g_σ^k(x_2(t_j))(z^σ(s) − z^σ(t_j))    (5.5)
  + ½ g_{ρ,x^k x^h}^i(x̃(s)) g_σ^k(x_2(t_j))(z^σ(s) − z^σ(t_j)) g_τ^h(x_2(t_j))(z^τ(s) − z^τ(t_j)).
If we had used the theorem of the mean with the integral form of the remainder, it would have been evident that the coefficients evaluated at x̃(s) are adapted to the ℱ_s. To estimate the terms in the approximation we use a theorem proved in [2] (Corollary III-2-11, p. 65):

5.6. Theorem. If z satisfies the standing hypotheses, and f(t) is a process adapted to the ℱ_t for which f(t) is everywhere finite and the belated integral of f with respect to z exists, then

‖∫_a^t f(s) dz(s)‖ ≤ C {∫_a^t E([f(s)]²) ds}^{1/2},

where C is defined as in (5.3).

If we substitute from (5.5) in (5.2), each of the integrals arising from the remainder-terms in (5.5) has the form
∫_{t_j}^t b(s)(z^σ(s) − z^σ(t_j))(z^τ(s) − z^τ(t_j)) dz^ρ(s),    (5.7)

in which b is a bounded process adapted to the ℱ_t. If B is an upper bound for |b(s, ω)|, by (5.6) expression (5.7) cannot have L_2-norm greater than

CB {∫_{t_j}^t E([z^σ(s) − z^σ(t_j)]² [z^τ(s) − z^τ(t_j)]²) ds}^{1/2}.

By the Cauchy–Schwarz inequality and the standing hypotheses, this is at most

CB {K_4(t_{j+1} − t_j)(t − t_j)²/2}^{1/2}.

If partition 𝒫 has all its intervals of length less than 2δ_1, this is at most a constant multiple of δ_1 K_4(2δ_1)^{1/2}. When the z are uniformly Lipschitzian, this is O(δ_1^{5/2}). When the z are Wiener processes it is O(δ_1^{3/2}). In general it is at worst O(δ_1). But even in the other cases that we shall discuss, K_4(2δ_1) is small compared to K, and although the L_2-norm of this term is bounded only by a constant multiple of δ_1, the constant is a small one. We can safely discard this term from the estimation formula. If we define
g_{ρ,σ}^i(t, x) = g_{ρ,x^k}^i(t, x) g_σ^k(t, x),    (5.8)

what is left of (5.2) is

x_2^i(t) = x_2^i(t_j) + g_ρ^i(x_2(t_j))(z^ρ(t) − z^ρ(t_j))    (5.9)
         + g_{ρ,σ}^i(x_2(t_j)) ∫_{t_j}^t (z^σ(s) − z^σ(t_j)) dz^ρ(s).

If ρ or σ is 0, the last integral is O(δ_1^{3/2}), and can be discarded. If we could replace the last r² integrals by good estimates involving only the z^ρ(t_j), we
would have the formula we desire. Unfortunately, we can do this only under a strong hypothesis on the equations E, namely: for all ρ > 0, σ > 0, i and x,

g_{ρ,σ}^i(x) = g_{σ,ρ}^i(x).    (5.10)
If there is only one random noise-process, this is automatically satisfied. But if r ≥ 2, so that there is more than one random noise, (5.10) is a severe restriction on the equations. When (5.10) is satisfied, the last term in (5.9) can be written in the form

½ g_{ρ,σ}^i(x_2(t_j)) ∫_{t_j}^t [(z^σ(s) − z^σ(t_j)) dz^ρ(s) + (z^ρ(s) − z^ρ(t_j)) dz^σ(s)].    (5.11)
To compute the sum of the integrals we subdivide [t_j, t] into q* congruent parts by points t′_q, q = 1, ..., q* + 1. If we write

Δ′_q z^ρ = z^ρ(t′_{q+1}) − z^ρ(t′_q),

and analogously for z^σ, we obtain the estimate

Σ_{q=1}^{q*} [(z^σ(t′_q) − z^σ(t_j)) Δ′_q z^ρ + (z^ρ(t′_q) − z^ρ(t_j)) Δ′_q z^σ] =
  = (z^σ(t) − z^σ(t_j))(z^ρ(t) − z^ρ(t_j)) − Σ_{q=1}^{q*} (Δ′_q z^ρ)(Δ′_q z^σ)    (3.10′ style summation by parts)

for the sum of the integrals in (5.11). As q* increases, this yields for the quantity (5.11) the expression

½ g_{ρ,σ}^i(x_2(t_j)) {(z^σ(t) − z^σ(t_j))(z^ρ(t) − z^ρ(t_j)) − ∫_{t_j}^t dz^ρ(s) dz^σ(s)}.    (5.12)

This we substitute in (5.9) and obtain a procedure that still fails to satisfy requirements (4.1) because of the presence of the last integral in (5.12). But in the cases that interest us most this last integral can be computed without difficulty.
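The summation-by-parts identity used above is purely algebraic and can be confirmed for arbitrary sampled path values; the numbers below are illustrative and not from the paper:

```python
# Sketch, not from the paper: the summation-by-parts identity behind the
# estimate above, checked for arbitrary sampled values.  zr[q] stands for
# z^rho(t'_q) - z^rho(t_j) and zs[q] for z^sigma(t'_q) - z^sigma(t_j),
# both starting at 0 (t'_1 = t_j); the numbers are illustrative.

zr = [0.0, 0.3, -0.1, 0.4, 0.9, 0.7]
zs = [0.0, -0.2, 0.1, 0.05, -0.3, 0.2]

lhs = sum(zs[q] * (zr[q + 1] - zr[q]) + zr[q] * (zs[q + 1] - zs[q])
          for q in range(len(zr) - 1))
rhs = zr[-1] * zs[-1] - sum((zr[q + 1] - zr[q]) * (zs[q + 1] - zs[q])
                            for q in range(len(zr) - 1))
print(abs(lhs - rhs))  # 0 up to rounding
```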
6. Examples

If the sample functions of the z^ρ are Lipschitzian, the second-order integrals in (5.12) vanish, and the approximation formula (5.9) takes the form (with y in place of x_2)

y^i(t_{j+1}) = y^i(t_j) + g_ρ^i(y(t_j)) Δ_j z^ρ + ½ g_{ρ,σ}^i(y(t_j)) Δ_j z^ρ Δ_j z^σ.    (6.1)
When the z^ρ are twice differentiable this agrees to within higher order terms with the simplest of the Runge–Kutta methods, and converges much faster than the Cauchy–Euler method.

Suppose next that z^1, ..., z^r are independent standard Wiener processes. Then it is well known that

∫_{t_j}^{t_{j+1}} dz^ρ(s) dz^σ(s) = Δ_j t   if ρ = σ ≥ 1,    (6.2)
                                  = 0      in all other cases.
With this evaluation (5.9) becomes

y^i(t_{j+1}) = y^i(t_j) + g_ρ^i(y(t_j)) Δ_j z^ρ    (6.3)
    − ½[g_{1,1}^i(y(t_j)) + ... + g_{r,r}^i(y(t_j))] Δ_j t
    + ½ g_{ρ,σ}^i(y(t_j)) Δ_j z^ρ Δ_j z^σ.

If the system satisfies a set of equations E*, which are equations (2.2) with asterisks on the x^i, z^ρ and g_ρ^i, and the noise process z* has sample functions that are uniformly Lipschitzian, and we wish to idealize the system by replacing the z*^ρ by independent Wiener processes, we must choose equations E for the idealization such that the right members of (6.1), with the g_ρ^i replaced by the g*_ρ^i, are the same functions as the right members of (6.3), with the g_ρ^i of equations E. We cannot attain this by simply choosing the g*_ρ^i for the g_ρ^i, because of the term with factor Δ_j t in (6.3). But we recall that z^0(t) ≡ t, so if we define

g_0^i(t, x) = g*_0^i(t, x) + ½[g*_{1,1}^i(t, x) + ... + g*_{r,r}^i(t, x)]   (i = 1, ..., n),
g_ρ^i(t, x) = g*_ρ^i(t, x)   (all other ρ and i),    (6.4)
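The equality of the right members of (6.1) and (6.3) under the correction (6.4) can be checked directly. The following sketch (not from the paper) does so in the scalar case r = 1, with the quadratic terms of index 0 discarded as in the discussion after (5.9); the coefficient functions are illustrative:

```python
import math

# Sketch, not from the paper: in the scalar case r = 1 (so (5.10) is vacuous),
# with the quadratic terms of index 0 discarded as after (5.9), one step of
# (6.1) using the starred coefficients equals one step of (6.3) using the
# coefficients produced by (6.4).  g0s, g1s are illustrative choices.

g0s = lambda x: math.cos(x)              # g*_0 (drift coefficient)
g1s = lambda x: 1.0 + 0.5 * math.sin(x)  # g*_1 (noise coefficient)
dg1s = lambda x: 0.5 * math.cos(x)       # derivative of g1s

def step_61_star(y, dt, dz):             # right member of (6.1) with g*
    return y + g0s(y) * dt + g1s(y) * dz + 0.5 * dg1s(y) * g1s(y) * dz * dz

def step_63(y, dt, dz):                  # right member of (6.3) with g from (6.4)
    corr = 0.5 * dg1s(y) * g1s(y)
    return (y + (g0s(y) + corr) * dt + g1s(y) * dz
            - corr * dt + corr * dz * dz)

diff = max(abs(step_61_star(y / 7.0, 0.01, dz / 9.0) - step_63(y / 7.0, 0.01, dz / 9.0))
           for y in range(-20, 21) for dz in range(-5, 6))
print(diff)  # 0 up to rounding
```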
we find that the right member of (6.1) with the g*_ρ^i in place of the g_ρ^i, and the right member of (6.3), with the g_ρ^i defined by (6.4), are the same function. Therefore the equations E for the idealization are those in which the coefficients are deduced from the g*_ρ^i by equations (6.4).

The procedure espoused in [2] is as follows. Given the system E* for uniformly Lipschitzian noises, replace it by its "canonical extension"

x*^i(t) = x*^i(a) + ∫_a^t g*_ρ^i(x*(s)) dz*^ρ(s) + ½ ∫_a^t g*_{ρ,σ}^i(x*(s)) dz*^ρ(s) dz*^σ(s).    (6.5)

For noise-processes z* with uniformly Lipschitzian sample functions these
equations have the same solutions as E*. If we idealize to any other noise-process z that satisfies the standing hypotheses, we keep the same equations, the g_ρ^i being the same as the g*_ρ^i. When the idealizations z^ρ are independent standard Wiener processes, from the known facts that

∫_a^t f(s) dz^ρ(s) dz^σ(s) = ∫_a^t f(s) ds   (ρ = σ ≥ 1),
                            = 0               (all other ρ and σ),

we see at once that these canonical equations are the same as the ones we have obtained by (6.3). There is, however, a small difference between the approximation procedures in the preceding pages and those in [2]. In that book, the equations were written in form (2.1) rather than in the condensed form (2.2). Accordingly, the Cauchy–Maruyama approximation procedure applied to the canonical extension in [2], which omits all second-order integrals with ρ or σ equal to 0, will lack some small terms that are present when the Cauchy–Maruyama procedure is applied to the condensed form (2.2). The foregoing discussion leads to the suspicion that these terms, though small, might yield some improvement in the approximation. This might be worth some investigation.

Consider next a point-process z^ρ. This is constant between jumps, which occur at the discontinuities of a right-continuous random function N whose sample-functions N(·, ω) are constant between finitely many points, at each of which it increases by 1. Let u be a process (t, ω) → u(t, ω) such that all the random variables u(t, ·) have the same distribution and form an independent set. For each ω in Ω, we define z(·, ω) to be the function such that z(a, ω) = 0 and

z(t, ω) − z(s, ω) = sum of u(τ)[N(τ, ω) − N(τ−, ω)] over all τ in [s, t) at which N(·, ω) is discontinuous.

We assume that increments of N over disjoint intervals are independent and homogeneous in time; the expectation and the variance of ΔN are proportional to Δt, as on p. 87 of [2]. We also assume that E(u(a)) = 0; for simplicity we shall write u for u(a). The conditional distribution of Δz under the condition ΔN = n is the distribution of the sum of n independent random variables u_1, ..., u_n, each with the same distribution as u. By the computation on p. 87 of [2], E(Δz) = 0, and E([Δz]²) is a constant multiple of Δt.
The conditional distribution of [Δz]⁴, given ΔN = n, is that of the fourth power of the sum of n independent random variables, each with the same distribution as u has. Let these be u_1, ..., u_n. The fourth power of the sum contains terms in which some u_j has exponent 1. All these have expectation 0. So
E([Δz]⁴ | ΔN = n) = E(Σ_j u_j⁴) + E(Σ_{j≠k} u_j² u_k²)
  = nE(u⁴) + n(n − 1)(E(u²))²
  = n var u² + n²(E(u²))².

We multiply by P(ΔN = n) and sum over all non-negative integers n, obtaining

E([Δz]⁴) = (var u²)E(ΔN) + (E(u²))² E([ΔN]²).

We now define Δ_j = (Δ_j z)² − E(ΔN). E(ΔN) is proportional to Δt, and for notational simplicity we shall consider that it is equal to Δt. If 𝒫 is a belated partition of [a, b] with evaluation-points τ_1, ..., τ_n and division-points t_1 = a, ..., t_{n+1} = b, we can repeat the steps of the computation at the top of page 84 of [2] and find that the Riemann sums S(𝒫; f; z, z) and S(𝒫; f; t) have a difference whose L_2-norm does not exceed a multiple of [var u² + (E(u²))²]^{1/2}. Since the distribution of jumps, which is that of u, is fixed, we do not have the privilege of letting this tend to 0. Nevertheless, if the distribution of z is to be close enough to that of a Wiener process to make it at all reasonable to try to use a Wiener-process idealization, in time-intervals of the size appropriate to the problem there must be many jumps, each of small size in comparison to the total amount during such a time-interval. In this case, [var u² + (E(u²))²]^{1/2} must be small in comparison with other numbers that occur in the approximation procedure (5.9). We shall therefore discard this term, and thus obtain an approximation to the second-order integral which is the same as the first of equations (6.2). However, with Wiener processes equations (6.2) gave evaluations of the integrals. With our point-processes, the first of these equations merely furnishes an estimate, and we should be a bit more suspicious of an idealization based on (6.2). There is no difficulty in showing that if z^σ and z^τ are two independent point-processes of the type just described, the second of equations (6.2) is exact. Since we are accepting equations (6.2) for our point-processes, and the derivation of (6.3) was based on equations (6.2), we obtain the same procedure (6.3) for the point-process that we obtained for the Wiener processes.
Consequently, if we wish to idealize a system in which the noises are point-processes, replacing them by Wiener processes, the best procedure is to retain the same functions g in the idealized system as appear in the equations E* of the actual system. This is in accord with the conclusions of Barrett and Wright [1].
E.J. McShane / The choice of a stochastic model
Obviously we could consider other examples. For instance, let the system be affected by the noises indirectly, through a smoothing process. That is, when the noise-processes are z^1, …, z^r, let the equations of the system be

x^i(t) = x^i(a) + ∫_a^t g^i_ρ(s, x(s)) dv^ρ(s),    (6.6)

v^ρ(t) = ∫_{−∞}^t φ(t − s) dz^ρ(s),

where φ is a continuously differentiable function on (−∞, ∞) that vanishes outside some interval [0, T]. It is an obvious conjecture, and can be proved correct, that in order to realize this by any other process satisfying the standing hypotheses the best choice of equations E for the idealized system is the original set of equations, unchanged. However, for general functional equations I have not been able to obtain any results of interest.
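The second-order effect of such a smoothing kernel is easy to examine numerically. The sketch below is our own illustration, not from the text: the jump rate, the N(0, 1) jump law and the bump kernel are arbitrary choices. It builds a compound-Poisson noise z, forms v(t) = ∫ φ(t − s) dz(s) by discrete convolution as in the second equation of (6.6), and checks Campbell's formula var v = λ E(u²) ∫ φ(s)² ds, the second-order statistic that a Wiener-process idealization of z would have to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters: jump rate lam, jumps u ~ N(0, 1) (E(u^2) = 1).
lam, dt, T_total, T_kernel = 200.0, 1e-2, 500.0, 1.0
t = np.arange(0.0, T_total, dt)

# A C^1 bump kernel phi vanishing outside [0, T_kernel].
s = np.arange(0.0, T_kernel, dt)
phi = np.sin(np.pi * s / T_kernel) ** 2

# Compound-Poisson increments dz: in each bin, a Poisson number of N(0, 1) jumps.
n_jumps = rng.poisson(lam * dt, size=t.size)
dz = np.sqrt(n_jumps) * rng.standard_normal(t.size)

# Smoothed noise v(t) = sum_s phi(t - s) dz(s), cf. the second equation of (6.6).
v = np.convolve(dz, phi)[: t.size]

# Campbell's formula for the stationary variance of v.
var_pred = lam * 1.0 * np.sum(phi**2) * dt
var_emp = v[len(phi):].var()
print(var_pred, var_emp)
```

The agreement is at Monte Carlo accuracy; only the variance, not the jump structure, survives the idealization.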
References [1] J.F. Barrett and D.J. Wright, "The random nature of stockmarket prices", Operations Research 22 (1974) 175-177. [2] E.J. McShane, Stochastic calculus and stochastic models (Academic Press, New York, 1974). [3] E.J. McShane, "Stochastic differential equations", Journal of Multivariate Analysis 5 (1975) 121-177.
Mathematical Programming Study 5 (1976) 93-102. North-Holland Publishing Company
ASYMPTOTIC STABILITY AND ANGULAR CONVERGENCE OF STOCHASTIC SYSTEMS

Mark A. PINSKY*
Northwestern University, Evanston, Ill., U.S.A.
Received 9 May 1975
Revised manuscript received 23 July 1975

A system of Itô equations is considered in the unit disc. The circumference is assumed to be a stable invariant manifold with a finite number of stable equilibrium points. Under supplementary hypotheses it is proved that the solution converges to a limit and that the angle converges to a limit. This leads to a well-posed Dirichlet problem and to the determination of all L-harmonic functions for the infinitesimal operator of the process.
1. Introduction
In this paper we study the asymptotic behavior when t → ∞ of a Markov process defined by a system of Itô stochastic equations in the (x_1, x_2) plane. This work is a continuation of previous works [4, 5, 8], with an attempt to formulate the results under minimal technical hypotheses. The most basic case of stochastic stability concerns the asymptotic stability of the zero solution of a system of Itô equations. In order to prove asymptotic stability in concrete cases, we have introduced the notion of S-function. This corresponds to the logarithm of the stochastic Lyapunov function [6]. Whenever an S-function exists, we have "asymptotic stability in probability", i.e.,

lim_{x→0} P{lim_{t→∞} X_t^x = 0} = 1.
This is only a local concept, as is the case for deterministic systems [7]. Dual to the notion of asymptotic stability is the question of angular behavior, in the case of stochastic systems in two variables. For oscillatory systems [6] it is natural to prove "spiraling behavior", i.e., that θ(t) → ∞ when t → ∞. These questions have been treated in detail in earlier work [3, 4, 8] and therefore are omitted from the subsequent discussion here. We shall be more interested in the question of angular convergence. This corresponds to the case of an improper node in the classification of ordinary differential equations [1, ch. 15, sec. 1]: every orbit has a limiting direction at (0, 0), but not all limiting directions are possible. At the other extreme is so-called ergodic behavior, where a continuum of directions is attained infinitely often as the orbit converges to (0, 0). We shall give concrete instances of both cases. Once we have determined the possible angular convergence, it is natural to ask whether any other asymptotic characteristics of the stochastic system can be discerned. Mathematically, this corresponds to the classification of all harmonic functions for the given Itô process in the neighborhood of an equilibrium point. In a companion paper [9] we study this question by examining the limiting behavior of f(x_1, x_2) when (x_1, x_2) → (0, 0), where f is an L-harmonic function for the Itô process with an equilibrium point at (0, 0). These results appear in Section 4 of the present work. Sections 2 and 3 of this work contain the statements and proofs of results on a system of Itô equations in the unit disc x_1² + x_2² < 1, where the unit circle is assumed to be an attractive invariant set which contains a finite number of stable equilibrium points. This set-up was already studied in [5, 8] in connection with the Dirichlet problem. In the following sections θ will denote an element of the interval [0, 2π] or the point (cos θ, sin θ) on the perimeter of the unit circle. Thus the notation x → θ is a shorthand for the statement (x_1 → cos θ, x_2 → sin θ).

* Research supported by National Science Foundation grant MPS71-02838-A04.
2. Statement of results
We assume given a system of stochastic equations

dx = σ(x) dw + b(x) dt,    (2.1)

where w(t) = (w¹(t), …, wⁿ(t)) is an n-dimensional Wiener process, x ↦ σ(x) is a C^∞ (2 × n)-matrix, and x ↦ b(x) is a C^∞ 2-vector, which satisfy the following conditions:

(A) Interior condition: for every unit vector ξ,

inf_{|x| ≤ r} |σ*(x)ξ|² = δ_r > 0    (0 ≤ r < 1).    (2.2)
(B) Normal boundary conditions:

α(x) ≡ |σ*(x)x|² = 0,    |x| = 1;    (2.3)

β(x) ≡ −(b(x), x) + ½|σ*(x)x|² − ½ tr{σ(x)σ*(x)} = 0,    |x| = 1;    (2.4)

Q_⊥(θ) ≡ lim_{x→θ} [β(x)/(1 − |x|²) − α(x)/2(1 − |x|²)²] < 0,    0 ≤ θ ≤ 2π.    (2.5)
These signify that the unit circle |x| = 1 is a stable invariant manifold for the system (2.1). Now we introduce polar coordinates x_1 = r cos θ, x_2 = r sin θ and write (2.1) in the form

dr = σ̃(r, θ) dw + b̃(r, θ) dt,    dθ = σ̂(r, θ) dw + b̂(r, θ) dt,    (2.6)

where lim_{r→1} σ̂(r, θ) = σ̂(θ), lim_{r→1} b̂(r, θ) = b̂(θ), and b̃(r, θ) = O(1 − r).

(C) Tangential boundary conditions:

σ̃(r, θ) = O(1 − r); σ̂(θ) has a finite number of zeros, θ_1, …, θ_n, and b̂(θ_i) ≥ 0, 1 ≤ i ≤ n.    (2.7)

If σ̂(θ_i) = 0 and b̂(θ_i) = 0, then

Q_∥(θ_i) ≡ lim_{θ→θ_i} [b̂(θ)/(θ − θ_i) − ½ σ̂²(θ)/(θ − θ_i)²] < 0.    (2.8)
These points will be labeled θ_1, …, θ_m, m ≤ n. In order to state the final set of conditions, we introduce a special coordinate system in the neighborhood of (1, θ_i), where 1 ≤ i ≤ m. Let

ρ_i = [(1 − r)² + (θ − θ_i)²]^{1/2},    φ_i = tan⁻¹[(θ − θ_i)/(1 − r)],

where 0 < ρ_i < ρ_0 and 0 < φ_i < π. The stochastic equations (2.6) can be written in the form

dρ = σ̃_j(ρ, φ) dw^j + b̃(ρ, φ) dt,    dφ = σ̂_j(ρ, φ) dw^j + b̂(ρ, φ) dt,    (2.9)

where the coefficients σ̂_j, b̂ have limits when ρ → 0, which we denote by σ̂_j(φ), b̂(φ) respectively; σ̂(φ)² ≡ Σ_{j=1}^n σ̂_j(φ)².

(D) Angular boundary conditions
We assume that at each θ_i (1 ≤ i ≤ m) we have

σ̂(φ)² > 0    (0 < φ < π),    (2.10)

and either

Q_⊥(θ_i) < Q_∥(θ_i) < 0    (2.11)

or

Q_∥(θ_i) < Q_⊥(θ_i) < 0.    (2.12)
Let X_t^x be the unique solution of (2.1) with X_0^x = x, |x| < 1.

Theorem 2.1. Assume that (A), (B) and (C) hold. Then we have the following asymptotic behavior:

P{lim_{t→∞} |X_t^x| = 1} = 1    (|x| < 1);    (2.13)

P{lim_{t→∞} X_t^x ∈ {θ_1, …, θ_m}} = 1    (|x| < 1);    (2.14)

lim_{x→θ_i} P{lim_{t→∞} X_t^x = θ_i} = 1    (1 ≤ i ≤ m).    (2.15)
Theorem 2.2. Assume that (A), (B), (C) and (D) (2.10)-(2.11) hold at θ_i. Then in addition to (2.13)-(2.15) we have for that value of i:

P{lim_{t→∞} φ_i(X_t^x) ∈ {0, π} | X_t^x → θ_i} = 1    (|x| < 1);    (2.16)

lim_{x→θ_i, φ_i(x)→0} P{lim_{t→∞} φ_i(X_t^x) = 0 | X_t^x → θ_i} = 1.    (2.17)
Theorem 2.3. Assume that (A), (B), (C) and (D) (2.10), (2.12) hold at θ_i. Then in addition to (2.13)-(2.15) we have for that value of i:

P{lim inf_{t→∞} φ_i(X_t^x) = 0, lim sup_{t→∞} φ_i(X_t^x) = π | X_t^x → θ_i} = 1    (|x| < 1).    (2.18)
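Theorems 2.1-2.2 are easy to observe numerically. The following Euler-Maruyama sketch is an illustration with coefficients of our own choosing, not taken from the paper: it simulates a polar-form system of type (2.6) whose tangential field has a stable zero at θ = 0 and an unstable zero at θ = π, with all radial and tangential noise coefficients vanishing on the boundary as in (B)-(C). The paths exhibit (2.13), |X_t| → 1, and (2.14), convergence of the angle to the stable zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative system of type (2.6), coefficients chosen by us:
#   dr     = (1 - r) dt + 0.5 (1 - r) dW1         (radial terms vanish on r = 1)
#   dtheta = -sin(theta) dt + 0.3 sin(theta) dW2  (tangential noise vanishes at 0, pi)
# theta = 0 is the stable zero of the tangential field, theta = pi the unstable one.
dt, n_steps, n_paths = 1e-3, 20000, 200
r = np.full(n_paths, 0.5)
theta = rng.uniform(0.0, 2 * np.pi, n_paths)

for _ in range(n_steps):
    dw1 = np.sqrt(dt) * rng.standard_normal(n_paths)
    dw2 = np.sqrt(dt) * rng.standard_normal(n_paths)
    r = r + (1 - r) * dt + 0.5 * (1 - r) * dw1
    theta = theta + (-np.sin(theta)) * dt + 0.3 * np.sin(theta) * dw2

# (2.13): |X_t| -> 1; (2.14): the angle settles at the stable zero theta = 0 (mod 2pi).
ang = np.mod(theta, 2 * np.pi)
dist = np.minimum(ang, 2 * np.pi - ang)
print(np.abs(r - 1).max(), dist.mean())
```

The ergodic alternative of Theorem 2.3 would instead require a tangential noise bounded away from zero on (0, π) in the φ-variable, which this example deliberately avoids.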
3. Proof of Theorems (2.1)-(2.3)
It was shown in [8, Theorem 4.3] that conditions (A), (B) imply that P{lim_{t→∞} |X_t^x| = 1} = 1 for any |x| < 1. Now by hypothesis (2.10), in the neighborhood of (1, θ_i) there exists a function f_i(ρ, φ) = log ρ + h_i(φ) such that Lf_i ≤ −1. Hence (2.15) follows by the (local) asymptotic stability of the point (1, θ_i). To prove (2.14), it suffices to construct a function F in the open disc D = {r < 1} which satisfies

F ∈ C²(D);    (3.1)

LF ≤ −1    (1 − δ ≤ r < 1);    (3.2)

(1 − r)|∇F| ≤ const.;    (3.3)

lim_{x→θ} F(x) = −∞ for θ ∈ {θ_1, …, θ_m},    > −∞ for θ ∉ {θ_1, …, θ_m}.    (3.4)

Indeed, if F satisfies (3.1)-(3.4), we can write by Itô's formula

F(X_t^x) = F(x) + ∫_0^t σ∇F dw + ∫_0^t LF(X_s^x) ds.    (3.5)
From this, (2.14) follows in standard fashion. In order to construct F, we proceed in several steps.

Lemma 3.1. There exists a function F̂(θ), defined for θ ∉ {θ_1, …, θ_m}, such that F̂ ∈ C²((0, 2π)\{θ_1, …, θ_m}), LF̂ = −1, and

F̂(θ) = Q_∥(θ_i)⁻¹ log|θ − θ_i|⁻¹ + O(1)    (θ → θ_i, 1 ≤ i ≤ m),    (3.6)

where O(1) is bounded together with its first two derivatives.

Proof. Let F̂(θ) be the solution of the ordinary differential equation

½σ̂²(θ)F̂″(θ) + b̂(θ)F̂′(θ) = −1,    θ ∉ {θ_1, …, θ_m},    F̂(θ′_i) = 0 = F̂′(θ′_i),    (3.7)

where θ′_i is a point in the interval (θ_i, θ_{i+1}) where σ̂²(θ′_i) > 0. We have the explicit solution

F̂(θ) = −∫_{θ′_i}^θ (dφ/u_i(φ)) ∫_{θ′_i}^φ 2u_i(x) dx/σ̂²(x),    θ ∈ (θ_i, θ_{i+1}),    (3.8)

where

u_i(x) = exp ∫_{θ′_i}^x [2b̂(x′)/σ̂²(x′)] dx′.    (3.9)

From the explicit solution (3.8), it follows that (3.6) holds when θ → θ_i.
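The explicit solution (3.8)-(3.9) can be verified numerically. In the sketch below the coefficients σ(x) = 1 + x² and b(x) = sin x are test choices of our own (not the paper's σ̂, b̂); F is built by the two nested quadratures of (3.8) and substituted back into ½σ²F″ + bF′ = −1, leaving a residual at discretization level.

```python
import numpy as np

# Check of the explicit solution (3.8)-(3.9) of (1/2) s2(x) F''(x) + b(x) F'(x) = -1
# with F(x0) = F'(x0) = 0.  Test coefficients (our own, not from the paper):
x0, x1, n = 0.0, 2.0, 20001
x = np.linspace(x0, x1, n)
h = x[1] - x[0]
s2 = (1.0 + x**2) ** 2       # sigma(x) = 1 + x^2, so sigma^2 > 0 on the interval
b = np.sin(x)

def cumtrapz(f):
    """Cumulative trapezoidal integral of f from x0."""
    out = np.zeros_like(f)
    out[1:] = np.cumsum((f[1:] + f[:-1]) / 2.0) * h
    return out

u = np.exp(cumtrapz(2.0 * b / s2))     # integrating factor, eq. (3.9)
Fp = -cumtrapz(2.0 * u / s2) / u       # F'(x), the inner quadrature of (3.8)
F = cumtrapz(Fp)                       # F(x), the outer quadrature of (3.8)

# Substitute back: residual of (1/2) s2 F'' + b F' + 1 on interior points.
Fpp = np.gradient(Fp, h)
resid = 0.5 * s2 * Fpp + b * Fp + 1.0
print(np.abs(resid[1:-1]).max())
```

The construction works because (u F′)′ = −2u/σ², which is exactly the integrating-factor form of (3.7).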
Lemma 3.2. In a neighborhood of (1, θ_i), F̂(θ) can be extended to a function F(r, θ) which satisfies LF ≤ −1, F(r, θ) → −∞ when r → 1, θ → θ_i.

Proof. By the invariance of Q_∥, Q_⊥ [2, Problem 13.9], we may assume that the arc r = 1 has been replaced by the line y = 0, and the line θ = θ_i has been replaced by the line x = 0. Thus we have an operator

Lf(x, y) = ½a_{ij}(x, y) ∂²f/∂x_i∂x_j + b_i(x, y) ∂f/∂x_i    (3.10)

with

a_{22}(x, 0) = 0 = b_2(x, 0),    |x| ≤ δ,    (3.11)

Q_⊥ ≡ lim [b_2(x, y)/y − ½a_{22}(x, y)/y²] < 0,    (3.12)

Q_∥ ≡ lim [b_1(x, 0)/x − ½a_{11}(x, 0)/x²] < 0.    (3.13)

With these hypotheses, it was shown in [8, Proposition 2.10] that there exists a function of the form

f(x, y) = Σ_j a_j log(y² + ε_j x²)    (3.14)

satisfying Lf ≤ −1 in the disc x² + y² < δ², where a_j > 0, Σ a_j = 1. In this notation, F̂ is the solution of the ordinary differential equation

½a_{11}(x, 0)F̂″(x) + b_1(x, 0)F̂′(x) = −1.    (3.15)
Now we define

F(x, y) ≡ [F̂(x) − Q_∥⁻¹ log|x|⁻¹] − Q_∥⁻¹[f(x, y) − Σ_j a_j log ε_j].    (3.16)

Clearly F(x, 0) = F̂(x) by construction. By (3.6), we have LF = o(1) − Q_∥⁻¹Lf < 0. Therefore F(x, y) is the desired function.
Lemma 3.3. There exists a function F̃(r, θ), defined in a neighborhood 1 − ε < r < 1, which extends F(r, θ) and satisfies LF̃ ≤ −Λ² < 0.

Proof. Let ζ(θ) be a C^∞ curve such that ζ(θ) = 1 for |θ − θ_i| > δ and 1 > ζ(θ) ≥ 1 − δ for |θ − θ_i| ≤ δ. Consider the domain D̃ = {r ≤ ζ(θ), 0 ≤ θ ≤ 2π}. F is defined on the boundary of D̃ and is of class C^∞ there. Let F̃ be the harmonic extension of F to the interior. Necessarily F̃ is of class C² up to the boundary of D̃, and therefore LF̃(r, θ) → LF(1, θ) when r → 1. But LF ≤ −Λ² < 0 on ∂D̃. Therefore we may choose δ > 0 such that LF̃ ≤ −½Λ² for 1 − δ < r < 1. The proof is complete.

Proof of Theorem 2.2. In the neighborhood of θ_i, we can write the infinitesimal operator L in the form

Lf = ½σ̂²(φ)f_{φφ} + b̂(φ)f_φ + o(1)    (3.17)

with σ̂(φ), b̂(φ) computed from (2.9). Let f̂(φ) be the solution of the equation ½σ̂²f̂″ + b̂f̂′ = −1, f̂(π/2) = 0, f̂′(π/2) = 0. An explicit calculation shows that when φ → 0 or φ → π

lim (b̂′(φ) − ½[σ̂′(φ)]²) = Q_⊥ − Q_∥.    (3.18)
By hypothesis (2.11), this quantity is negative. Therefore by an explicit solution of the equation for f̂ we see that lim f̂(φ) = −∞ whenever φ → 0 or φ → π. Now we apply Itô's formula:

f̂(φ_t) = f̂(φ_0) + ∫_0^t σ̂f̂′ dw + ∫_0^t (Lf̂) ds.    (3.19)

On the set {X_t^x → θ_i} we have

lim_{t→∞} f̂(φ_t)/t ≤ −1.

Therefore φ_i(X_t^x) → 0 or π when t → ∞, which is assertion (2.16). To prove (2.17), let g = exp(λf̂), λ > 0. Then for λ sufficiently small we have Lg ≤ 0, g > 0 in the open half-neighborhood {0 < ρ_i < ρ_0, 0 < φ_i < π/2}. By following the argument of [8, Lemma 2.3], we see that for given ε, η > 0, by choosing ρ_0 sufficiently small we have P{φ_t < η for all t > 0} ≥ 1 − ε. Applying (3.19) on this set proves that P{φ_i(X_t^x) → 0} ≥ 1 − ε, which was to be proved. The same argument applies to φ = π, which completes the proof.

Proof of Theorem 2.3. In this case the equations (3.17)-(3.18) are still valid, but now the limit in (3.18) is positive. We now take f̂(φ) to be the solution of the equation ½σ̂²f̂″ + b̂f̂′ = 0, f̂(π/2) = 0, f̂′(π/2) = 1. Since the limit (3.18) is positive, we must have lim_{φ→π} f̂(φ) = +∞, lim_{φ→0} f̂(φ) = −∞; further f̂′ > 0.

Appealing to (3.19), we see that the final term is bounded, since (Lf̂)(r_t, θ_t) = O(1 − r_t) = O(e^{−kt}), t → ∞. The stochastic integral is a martingale and is equivalent to W_{A_t}, where W is a Wiener process and A_t = ∫_0^t |σ̂f̂′|² ds. But |σ̂f̂′| ≥ β > 0 on (0, π), and therefore A_t → ∞. Thus (3.19) becomes
f̂(φ_t) = W_{A_t} + o(1)    (t → ∞).

But lim sup_{t→∞} W_{A_t} = +∞ and lim inf_{t→∞} W_{A_t} = −∞. Therefore the same is true of f̂(φ_t), and thus (2.18) follows. The proof is complete.
4. Dirichlet Problem; Fatou Theorems

The preceding results allow us to formulate a well-posed boundary value problem for the operator

Lu = ½(σσ*)_{ij} ∂²u/∂x_i∂x_j + b_i ∂u/∂x_i    (4.1)

and to find all bounded solutions of Lu = 0 in the interior of the unit disc.
Theorem 4.1. Assume (A), (B), (C) and (D) (2.10). Assume that for each 1 ≤ i ≤ m_1 we have (2.11), and for m_1 < i ≤ m we have (2.12). Then the problem

Lu = 0,    |x| < 1,    (4.2)

lim_{x→θ_i} u(x) = f_i    (1 ≤ i ≤ m_1),    (4.3)

lim_{x→θ_i, φ_i(x)→0} u(x) = f_i^0    (m_1 < i ≤ m),    (4.4)

lim_{x→θ_i, φ_i(x)→π} u(x) = f_i^π    (m_1 < i ≤ m),    (4.5)

has the unique solution

u(x) = Σ_{i=1}^{m_1} f_i P{X_t^x → θ_i} + Σ_{i=m_1+1}^m f_i^0 P{X_t^x → θ_i, φ_i(X_t^x) → 0} + Σ_{i=m_1+1}^m f_i^π P{X_t^x → θ_i, φ_i(X_t^x) → π}.    (4.6)
Theorem 4.2. Assume (A), (B), (C) and (D) (2.10). Assume that for 1 ≤ i ≤ m_1 we have (2.11) and that for m_1 < i ≤ m we have (2.12). Let u be a function which is bounded and continuous in |x| < 1 together with its partial derivatives of the first and second order, and which satisfies Lu = 0 for |x| < 1. Then the following limits exist:

f_i = lim_{x→θ_i} u(x)    (1 ≤ i ≤ m_1),    (4.7)

f_i^0 = lim_{x→θ_i, φ_i(x)→0} u(x)    (m_1 < i ≤ m),    (4.8)

f_i^π = lim_{x→θ_i, φ_i(x)→π} u(x)    (m_1 < i ≤ m).    (4.9)
If each of the limits (4.7)-(4.9) is zero, then u ≡ 0.

In the language of differential equations, Theorem (4.1) states that the boundary value problem (4.2)-(4.5) is well posed. Theorem (4.2) states in addition that (4.2)-(4.5) is the most general boundary value problem which can be posed under the stated conditions. We have thus found the correct "boundary" for the present problem.
Proof of Theorem 4.1. We must show that u, defined by (4.6), is a solution of Lu = 0. It suffices to examine each term in the sum. Let p_i(x) = P{X_t^x → θ_i}. Clearly E{p_i(X_s^x)} = P{X_{t+s}^x → θ_i} = p_i(x) for s > 0. The same holds true if s is replaced by T_ε = inf{t > 0 : |X_t^x − x| = ε}. But on the disc D_ε = {y : |y − x| < ε} we consider the non-degenerate problem

Lq = 0    (y ∈ D_ε),    lim_{y′→y} q(y′) = p_i(y)    (y ∈ ∂D_ε),

which has a C^∞ solution q_i, for which q_i(x) = E{p_i(X^x_{T_ε})} = p_i(x). Thus p_i(x) is a C^∞ solution of Lu = 0. The same argument applies to the other terms of (4.6). To prove the boundary conditions (4.3)-(4.5), we appeal to Theorems (2.1)-(2.2), statements (2.15), (2.17). From these the conditions follow immediately, and the theorem is proved.
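The representation (4.6) can be illustrated by Monte Carlo. The coefficients below are made up for the illustration (they are not from the paper): the tangential field −sin 2θ has stable zeros θ_1 = 0 and θ_2 = π. Every simulated path's angle settles at one of them, as in (2.14), and u(x) is estimated from the empirical frequencies; being a convex combination, the estimate lies between the boundary values, as a bounded L-harmonic function must.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up coefficients (not the paper's): tangential field -sin(2*theta) with
# stable zeros theta_1 = 0, theta_2 = pi and unstable zeros pi/2, 3*pi/2.
dt, n_steps, n_paths = 1e-3, 20000, 400
f1, f2 = 1.0, -1.0                   # boundary data at theta_1 and theta_2
r = np.full(n_paths, 0.3)
theta = rng.uniform(0.0, 2 * np.pi, n_paths)

for _ in range(n_steps):
    dw1 = np.sqrt(dt) * rng.standard_normal(n_paths)
    dw2 = np.sqrt(dt) * rng.standard_normal(n_paths)
    r = r + (1 - r) * dt + 0.5 * (1 - r) * dw1
    theta = theta + (-np.sin(2 * theta)) * dt + 0.3 * np.sin(2 * theta) * dw2

ang = np.mod(theta, 2 * np.pi)
d0 = np.minimum(ang, 2 * np.pi - ang)      # distance to theta_1 = 0
d1 = np.abs(ang - np.pi)                   # distance to theta_2 = pi
to_first = d0 < d1                         # which stable zero the path chose

# Monte Carlo version of (4.6): u(x) ~ f1 p1(x) + f2 p2(x).
p1 = to_first.mean()
u_est = f1 * p1 + f2 * (1 - p1)
print(p1, u_est)
```

With uniformly spread starting angles the two basins have equal measure, so p1 is near ½ here; the point of the sketch is only that the hitting probabilities, not the boundary data alone, determine u.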
Proof of Theorem 4.2. We appeal to a companion paper [9] on isolated singularities of degenerate elliptic equations. By Theorems (3.2)-(3.3) of [9] we see that each of the limits (4.7)-(4.9) exists. In order to prove the final statement, note that

u(x) = E{u(X_t^x)}.

When t → ∞, we use Theorems (2.1)-(2.3) and the existence of the limits (4.7)-(4.9). Letting t → ∞ we have the representation
u(x) = Σ_{i=1}^{m_1} f_i p_i(x) + Σ_{i=m_1+1}^m {f_i^0 p_i^0(x) + f_i^π p_i^π(x)}.

Clearly f_i = 0 = f_i^0 = f_i^π implies u(x) ≡ 0. The proof is complete.
References
[1] E. Coddington and N. Levinson, Theory of ordinary differential equations (McGraw-Hill, New York, 1956).
[2] A. Friedman, Stochastic differential equations (Academic Press, New York, 1975).
[3] A. Friedman and M. Pinsky, "Asymptotic behavior of solutions of linear stochastic differential systems", Transactions of the American Mathematical Society 181 (1973) 1-22.
[4] A. Friedman and M. Pinsky, "Asymptotic stability and spiraling properties of stochastic equations", Transactions of the American Mathematical Society 186 (1973) 331-358.
[5] A. Friedman and M. Pinsky, "Dirichlet problem for degenerate elliptic equations", Transactions of the American Mathematical Society 186 (1973) 359-383.
[6] H. Kushner, Stochastic stability and control (Academic Press, New York, 1967).
[7] J. LaSalle, "The extent of asymptotic stability", Proceedings of the National Academy of Sciences 46 (1960) 365-368.
[8] M. Pinsky, "Stochastic stability and the Dirichlet problem", Communications on Pure and Applied Mathematics 27 (1974) 311-350.
[9] M. Pinsky, "Isolated singularities of degenerate elliptic equations in R²", to appear.
Mathematical Programming Study 5 (1976) 103-114. North-Holland Publishing Company
REMARKS ON WIDE SENSE EQUIVALENTS OF CONTINUOUS GAUSS-MARKOV PROCESSES IN R^n

L.E. ZACHRISSON
Royal Institute of Technology, Stockholm, Sweden

Received 20 May 1975
Revised manuscript received 8 Nov. 1975

A characterization of the first and second order moments of a stochastic multi-variate process on [0, ∞) ∋ t having a wide sense equivalent process which has the same dimension and is generated by a linear Ito-equation with time-dependent coefficients (called a simple diffusion process in the paper) is given in Proposition 1, Section 2. The rest of the paper is an informal discussion around the result.
0. Introduction

Due to the widely spread use of Gauss-Markov processes in engineering applications, some simple remarks on what a process should look like in order to have at least the same kind of second order characteristics as those belonging to a Gauss-Markov process might be useful. The second order properties, defined as properties which depend only upon first and second moments, are the only ones necessary for the solution of the usual linear-quadratic optimization problems.

In [6] (from which the subject matter of my contribution to the Lexington Symposium on Stochastic Systems, June 1975, was drawn), the second order characteristics of certain Markov processes (having a state space {1, 2, …, I} × R^N) which I called piecewise simple diffusion processes were derived, and it was shown that these could be delivered also from a Gaussian Markov process in R^N generated by a stochastic differential vector equation of the simple type

ds(t) = A(t)s(t)dt + B(t)dv(t).    (0)

Such a process was called a process of simple diffusion type. In the first and shorter part of [6] a theorem giving sufficient conditions that a wide sense equivalent of this simple type should exist was developed as an aid in the specific cases treated.
This note is an elaboration of this first part and gives in Section 2, Proposition 1, a characterization of first and second order moments when such a wide sense equivalence is possible. The theorem in [6] mentioned above is a consequence of this proposition and reduces to the observation in the last part of Section 3. Section 4 indicates that a centered Gaussian Markov process which is absolutely continuous in a certain sense (which could be analyzed further) has to be of simple diffusion type. A more complete identification of the two classes of processes will be possible if we add a deterministic term c(t) to the right-hand side of eq. (0) and omit the centering restriction on the Gaussian process. A number of informal and very short comments will also be given in the sequel in order to indicate connections with other points of view. See also Section 5. I intend to publish the second part of [6] elsewhere.
1. Generalities, definitions

Two processes x(t, ω) and x̃(t, ω̃), where the state variables belong to Rⁿ, are usually said to be wide sense equivalent to each other if they have, for all relevant times, the same vector of first moments,

E x(t, ω) = r(t) = Ẽ x̃(t, ω̃),    (1)

and the same matrix of second moments,

E{x(t, ω)x*(s, ω)} = Q(t, s) = Ẽ{x̃(t, ω̃)x̃*(s, ω̃)},    (2)

where * means transpose. Then also the covariance matrices

Q̃(t, s) = Q(t, s) − r(t)r*(s) = E{(x(t) − r(t))(x(s) − r(s))*}    (3)

agree for the two processes. (We shall always use r, Q and Q̃ according to these definitions.)

Our interest will be on processes where times belong to the positive axis [0, ∞); time t = 0 is called initial time and the stochastic vector x(0, ω) is formed from "initial values". We shall focus on processes in an equivalence class (according to wide sense equivalence) which has a Gaussian representative x̃(t, ω̃) of simple diffusion type (A, B, r(0), Q̃(0, 0)), by which we mean:
(1) x̃(0, ω̃) is Gaussian distributed with mean values r(0) and covariance matrix Q̃(0, 0) (hence second order moment matrix Q(0, 0) = Q̃(0, 0) + r(0)r*(0)).

(2) Evolution in time of x̃ is determined by the Ito-equation

dx̃(t, ω̃) = A(t)x̃(t, ω̃)dt + B(t)dv(t, ω̃).    (4)

Here A(t) is a deterministic (n × n)-matrix which as a function of t belongs locally to L¹:

∫_0^T ||A(t)|| dt < ∞    for any T > 0;

v(t, ω̃) is an m-vector of mutually independent, standard Wiener-processes, also independent of x̃(0, ω̃), and B(t) is a deterministic (n × m)-matrix belonging locally to L²:

∫_0^T Σ_{ij} (b_{ij}(t))² dt < ∞    for any T > 0.
We remark at once that other A(·):s and B(·):s may very well lead to the same probability distribution. For instance B_1 and B_2 are equally good if B_1(t)B_1*(t) = B_2(t)B_2*(t) (a.e.), and A(t) is unique (a.e.) only if Q̃(t, t) has full rank for arbitrarily small t > 0. It should be observed that the handling of (4) does not really need the Ito-integral, as we can define the solution by

x̃(t, ω̃) = Φ(t)x̃(0, ω̃) + Φ(t) ∫_0^t Φ⁻¹(τ)B(τ) dv(τ, ω̃),    (5)

and it is enough to use the Wiener integral (as defined for any process with orthogonal increments). Because Φ⁻¹ is absolutely continuous we can even think of the integral as sample-wise defined via integration by parts if also B is absolutely continuous (cf. [3, pp. 426-433]). Here Φ(t)Φ⁻¹(τ) is the transition matrix belonging to the differential equation dΦ/dt = AΦ. For definiteness we have put Φ(0) = I.
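Formula (5) pins down the first and second moments of x̃. In the scalar sketch below, a(t) and b(t) are test choices of ours, not from the paper; an Euler simulation of (4) is compared with the mean Φ(t)r(0) and the variance Φ(t)² ∫_0^t Φ(τ)⁻² b(τ)² dτ implied by (5), both propagated by the matching discrete recursions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar test case (our own choice): a(t) = -1 - 0.5 sin t, b(t) = 0.5, x(0) = 1.
def a(t):
    return -1.0 - 0.5 * np.sin(t)

def b(t):
    return 0.5

dt, T, n_paths = 2e-3, 2.0, 20000
n_steps = int(T / dt)

x = np.full(n_paths, 1.0)
mean_pred, var_pred = 1.0, 0.0
for k in range(n_steps):
    t = k * dt
    xi = rng.standard_normal(n_paths)
    x = x + a(t) * x * dt + b(t) * np.sqrt(dt) * xi
    # Discrete propagation of the mean Phi(t) r(0) and of the variance
    # Phi(t)^2 int_0^t Phi(tau)^-2 b(tau)^2 dtau implied by formula (5).
    mean_pred = (1 + a(t) * dt) * mean_pred
    var_pred = (1 + a(t) * dt) ** 2 * var_pred + b(t) ** 2 * dt

print(mean_pred, x.mean())
print(var_pred, x.var())
```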
~(t) =
fo
B(t)dv(t),
(6)
then l~{~(t)} = 0 and the covariance matrix of ~ is (~w(t,s) =
fotAsB(r)B*(r)d~"
(=(~w(s,s)ifs~
We note that, in increasing generality, w̃(t) is (a) a Gaussian vector process in Rⁿ with centered, independent increments, (b) a vector martingale, (c) a vector process with centered, orthogonal increments.
2. A proposition

Suppose now that w(t) is a vector process with centered, orthogonal increments and w(0) = 0. Let E w(t)w*(t) = Q_w(t, t) be absolutely continuous, that is, Q_w(t, t) = ∫_0^t W(τ) dτ, where, due to orthogonality, the symmetric matrix W is necessarily positive semidefinite. Suppose the process x satisfies the stochastic differential equation

dx(t) = A(t)x(t)dt + dw(t)    (4′)

with x(0) orthogonal to w(t) for any t and E x(0) = r(0); E(x(0)x*(0)) = Q(0, 0). Suppose further that B(t) is a Lebesgue-measurable solution, locally belonging to L², of the equation

B(t)B*(t) = W(t)    a.e.

Such solutions exist, for instance [W(t)]^{1/2}. Then comparing eq. (4′) with eq. (4) written as

dx̃(t) = A(t)x̃(t)dt + dw̃(t)    (4″)

and taking into account the computation rules

E{x(τ) dw*(τ)} = 0,    E{dw(τ) dw*(τ)} = W(τ) dτ,

it is immediately apparent that x̃ and x are wide sense equivalent and that (use the counterpart of eq. (5))

dr(t)/dt = A(t)r(t);    (7a)

∂Q(t, s)/∂t = A(t)Q(t, s)    (t > s);    (7b)

∂Q(t, t)/∂t = A(t)Q(t, t) + Q(t, t)A*(t) + W(t)    (7c)
almost everywhere. (Q̃ satisfies the same equations as Q.) Now the following lemma gives a key fact to be used in the statement of necessary and sufficient conditions that a process have a wide sense equivalent of simple diffusion type.

Lemma 1. If Q(t, s) is a covariance matrix, abs. cont. in t for t ≥ s, which satisfies (7b) (a.e. t), where A(t) ∈ L¹ locally, and if Q(t, t) is abs. cont. as a function of t, then W(t) defined by (7c) is pos. semidef. almost everywhere and belongs locally to L¹.

Proof. Considering exceptional t:s later, suppose that, in the point t and according to (7b),

lim_{h↘0} (1/h)[Q(t + h, t) − Q(t, t)] = A(t)Q(t, t).    (8)

Then also

Q(t, t)A*(t) = lim_{h↘0} (1/h)[Q*(t + h, t) − Q*(t, t)] = lim_{h↘0} (1/h)[Q(t, t + h) − Q(t, t)].

Suppose also that in this point (d/dt)Q(t, t) exists, equal to lim_{h↘0} (1/h)[Q(t + h, t + h) − Q(t, t)]. Then, putting together according to (7c),

W(t) = lim_{h↘0} (1/h){Q(t + h, t + h) − Q(t + h, t) − Q(t, t + h) + Q(t, t)}
     = lim_{h↘0} (1/h) E{[x(t + h) − x(t)][x(t + h) − x(t)]*},

so p*W(t)p = lim (1/h) E(p, x(t + h) − x(t))² ≥ 0 for any n-vector p. W is pos. semi-definite in this point t.

At which points can the equalities used fail? First, in a null-set N′ where Q(t, t) has no derivative. What concerns eq. (7b), observe that any finite set of solutions of the eq. dξ/dt = Aξ, being a finite linear combination of a fundamental system of solutions (the columns of Φ for instance), satisfy the equations everywhere outside of a common null set N″ defined by Φ. As Q(t, s) is continuous, eq. (8) is therefore satisfied at every t ∉ N″.
So W(t) ≥ 0 (a.e.). As (d/dt)Q(t, t) and A(t) belong locally to L¹ and Q(t, t) is locally bounded, W(t) belongs locally to L¹. The proof is complete.

We also observe that, with given A(t), eq. (7a) and the initial value r(0) determine r(t) uniquely, that (7c) with the initial value Q(0, 0) determines Q(t, t) uniquely, that (7b) with the initial value Q(s, s) determines Q(t, s) (t ≥ s) uniquely, and that, if t ≤ s, Q(t, s) = Q*(s, t) is already determined. Summing up these observations and the lemma, we have

Proposition 1. (I) Necessary and sufficient conditions that the second order process x(t, ω) ∈ Rⁿ has a wide sense equivalent of simple diffusion type are:

(1) r(t), Q(t, s) (for t ≥ s) and Q(t, t) are abs. cont. as functions of t;

(2) there exists a function A : [0, ∞) ∋ t ↦ A(t) ∈ (n × n)-matrices, locally belonging to L¹, such that almost everywhere

dr(t)/dt = A(t)r(t),    (2′)

∂Q(t, s)/∂t = A(t)Q(t, s)    (t > s).    (2″)

(II) In such case the matrix

W(t) ≡ (d/dt)Q(t, t) − A(t)Q(t, t) − Q(t, t)A*(t)

is positive semidefinite (a.e.). Let B(t) be any L²-solution of the equation B(t)B*(t) = W(t). Then the simple diffusion process (A, B, r(0), Q(0, 0) − r(0)r*(0)) is wide sense equivalent to x.

We shall now discuss to what extent the conditions in the proposition determine A(t). According to the proof of the lemma, a minimal condition that A(t) shall fulfil almost everywhere is

A(t)Q(t, t) = lim_{h↘0} (1/h){Q(t + h, t) − Q(t, t)}.    (8′)

So if Q(t, t) is invertible, A(t) is unique at t.
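Proposition 1 can be exercised numerically. With constant test matrices A and B of our own choosing (so W = BB* is constant), the covariance equation (7c) is integrated alongside an Euler simulation of dx = Ax dt + B dv; the deterministic Q(t, t) and the empirical second-moment matrix agree up to Monte Carlo and discretization error.

```python
import numpy as np

rng = np.random.default_rng(4)

# Constant test matrices of our own choosing; W = B B*.
A = np.array([[-1.0, 0.5], [-0.5, -1.0]])
B = np.array([[0.5, 0.0], [0.2, 0.3]])
W = B @ B.T
dt, T, n_paths = 1e-3, 1.0, 20000
n_steps = int(T / dt)

x = np.zeros((n_paths, 2))   # Euler scheme for dx = A x dt + B dv, x(0) = 0
Q = np.zeros((2, 2))         # eq. (7c): dQ/dt = A Q + Q A* + W, Q(0, 0) = 0
for _ in range(n_steps):
    dv = np.sqrt(dt) * rng.standard_normal((n_paths, 2))
    x = x + (x @ A.T) * dt + dv @ B.T
    Q = Q + (A @ Q + Q @ A.T + W) * dt

Q_emp = (x.T @ x) / n_paths  # empirical second-moment matrix E{x(t) x*(t)}
print(Q)
print(Q_emp)
```

Any other factor B with the same BB* would give the same Q, which is the non-uniqueness of B noted after eq. (5).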
If Q(t, t) does not have full rank, it is nevertheless true that the ambiguity set of A is just the ambiguity set of (8′) for different t:s (apart from measurability restrictions). In other words, if A_1(t) is a solution of (2′), (2″) and hence of (8′), then any L-measurable solution A_2(t) of (8′) is also a solution of (2′) and (2″) and so is just as good as A_1(t) for the proposition. This fact is made plausible by the observation that (8′) is an equation which constrains the behavior of A(t) only on Im Q(t, t), which is the smallest linear set in Rⁿ containing all the probability mass of the stochastic vector x(t). Things outside of that set should not influence the development of the process.

To prove our ambiguity statement we first observe that the definition of W(t) amounts to the same for A_1 and A_2, as A_1(t)Q(t, t) = A_2(t)Q(t, t) according to assumption; put A_1 − A_2 = Δ. To prove that (2′) and (2″) with A_2 instead of A_1 are also true, we have to show that Δ(t)Q(t, t) = 0 implies Δ(t)r(t) = 0 and Δ(t)Q(t, s) = 0. The easy proof by Schwarz' inequality is omitted.

Finally, it is good to recognize the following facts. Under the proposition, Q(t, s) has the same rank as Q(s, s) if t ≥ s (trivial according to (2″)). The rank d(t) of Q(t, t) is an integer-valued, monotonically increasing, left continuous function of t, so a set of constancy for d(t) is a left open, right closed interval. The proof is best carried out by considering

P(t, t) ≡ Φ⁻¹(t)Q(t, t)(Φ⁻¹)*(t),

which has the same rank properties as Q. The differential equation for P is

(d/dt)P(t, t) = Φ⁻¹(t)W(t)(Φ⁻¹)*(t),

so P(t, t) − P(s, s) is positive semidef. if t ≥ s. This shows that (N means "null-space of")

N P(t, t) ⊂ N P(s, s)    and    Im P(t, t) ⊃ Im P(s, s),

especially d(t) ≥ d(s). So N P(t, t) and Im P(t, t) do not depend on t within a set where d(t) = constant. Furthermore, the graph of N P(t, t) is closed, due to continuity. From these facts it is easy to see that the function d(·) is left continuous.
3. Further comments It can be shown, that (a version of) the process x in the sufficiency part of proposition can be considered as generated by some process w(t) with centered orthogonal increments via eq. (4'): dx(t) = A (t)x(t)dt + dw(t).
(4')
We define w(t) as
w(t) = - x(0)+ x ( t ) -
f0
A(~')x(r)d~"
(9)
and have to show that w(t) is a process with centered, orthogonal increments, also orthogonal to x(O). To carry out the proof we have to assume that x is measurable T x g2. That w(t) is orthogonal to x(0), for instance, is equivalent to proving E(w(t)x*(O)) = 0. But
E(w(t)x*(O))= - O(0,0)+ O ( t , 0 ) -
A ( r ) O(z,O)dr
according to eq. (7b). We do not try to carry through the measure-theoretic details but have all the same obtained a strong indication that only processes generated via stochastic equations of type (4') with a process w with centered, orthogonal increments can be candidates for having a wide sense equivalent of simple diffusion type. If w in eq. (4') happens to be a martingale, then the decomposition of x(t) into the martingale x(O)+w(t) and the "predictable" process f; A (t)x(t)dt (Meyer) is a simple instance of the Doob decomposition. See
[5]. The martingale case is connected with the following observation (see [6]). Both equations (2') and (2") are consequences of the stronger condition that there exists a version of each conditional expectation E{x(t) | x(s)} = r_s(t) (each t ≥ s; s fixed but arbitrary, versions are with respect to x(s)-space) such that
(d/dt) r_s(t) = A(t) r_s(t);    lim_{t↓s} r_s(t) = x(s)  (a.s.)
The proof is simple:

r_s(t) = Φ(t) Φ⁻¹(s) x(s)  (a.s.),

r(t) = E r_s(t) = Φ(t) Φ⁻¹(s) r(s),  hence (2'),

Q(t, s) = E{r_s(t)x*(s)} = Φ(t) Φ⁻¹(s) Q(s, s),  hence (2").
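The construction of eq. (9) is easy to exercise numerically. The sketch below is a minimal Monte Carlo check for a scalar process with A(t) ≡ a (all constants illustrative): simulate x, recover w via (9), and verify that increments of w over disjoint intervals are uncorrelated with each other and with x(0).

```python
import numpy as np

rng = np.random.default_rng(0)
a, dt, n_steps, n_paths = -0.5, 0.01, 300, 20000

# Scalar instance of eq. (4'): dx = a*x dt + dw, w a Wiener process,
# simulated by Euler-Maruyama from a random x(0).
x = np.empty((n_steps + 1, n_paths))
x[0] = rng.normal(size=n_paths)
dw = rng.normal(scale=np.sqrt(dt), size=(n_steps, n_paths))
for k in range(n_steps):
    x[k + 1] = x[k] + a * x[k] * dt + dw[k]

# Recover w via eq. (9): w(t) = -x(0) + x(t) - int_0^t a*x(tau) dtau,
# using a left-endpoint sum for the integral.
integral = np.vstack([np.zeros((1, n_paths)),
                      np.cumsum(a * x[:-1] * dt, axis=0)])
w = x - x[0] - integral

# Monte Carlo check: increments over disjoint intervals are uncorrelated,
# and increments are orthogonal to x(0).
inc1 = w[100] - w[0]
inc2 = w[200] - w[100]
corr_incs = np.mean(inc1 * inc2)   # near 0
corr_x0 = np.mean(inc2 * x[0])     # near 0
```

Both sample covariances come out near zero (their Monte Carlo standard deviation is about 1/√n_paths here), in line with the orthogonal-increment property.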
4. How special is a simple diffusion process?

This is a natural question. A fairly complete answer for the centered case (r(0) = 0) is delivered by
Proposition 2. A centered Gaussian Markov process

x : [0, ∞) × Ω ∋ (t, ω) ↦ x(t, ω) ∈ Rⁿ

has a version which is a simple diffusion process if
(1) Q(t, s) (for t ≥ s) and Q(t, t) are abs. cont. as functions of t,
(2) there exists a deterministic A(·) ∈ L¹ (locally) such that for almost every t ≥ 0

lim_{h↓0} [Q(t + h, t) − Q(t, t)]/h = A(t)Q(t, t).

Under these conditions the process is generated by

dx(t, ω) = A(t)x(t, ω)dt + dw(t, ω),

where w(t) is a Gaussian process with independent centered increments and

E{w(t)w*(t)} = Q(t, t) − Q(0, 0) − ∫₀ᵗ [A(τ)Q(τ, τ) + Q(τ, τ)A*(τ)]dτ.
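The covariance formula of Proposition 2 can be sanity-checked in the scalar case, where Q(t, t) solves Q' = 2aQ + 1 for unit-intensity increments, so the right-hand side must reduce to t. A small sketch (constants chosen arbitrarily):

```python
import numpy as np

a, q0, t_end, n = -1.0, 0.3, 2.0, 200000
tau = np.linspace(0.0, t_end, n + 1)

# Scalar variance Q(t, t) of dx = a*x dt + dw with unit-intensity increments:
# Q' = 2*a*Q + 1, hence Q(t) = (q0 + 1/(2a)) * exp(2at) - 1/(2a).
Q = (q0 + 1.0 / (2 * a)) * np.exp(2 * a * tau) - 1.0 / (2 * a)

# Right-hand side of the proposition's formula (trapezoid rule for the
# integral); for unit intensity it must equal E{w(t)w*(t)} = t_end.
integrand = 2 * a * Q
integral = np.sum((integrand[:-1] + integrand[1:]) / 2 * np.diff(tau))
Eww = Q[-1] - Q[0] - integral
```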
A rapid sketch of a partial proof will be given. The first thing to be proved is that

lim_{h↓0} [Q(t + h, s) − Q(t, s)]/h = A(t)Q(t, s)
for all s ≤ t (not only for s = t). This is where the Markov property comes in. "Centered Gaussian" implies that there is an A_h(t) such that

E{x(t + h) | x(t)} = x(t) + hA_h(t)x(t)  (a.s.)

and "Markov" shows that this A_h also satisfies

E{x(t + h) | x(t), x(s)} = x(t) + hA_h(t)x(t)  (a.s.)

(The "a.s." says that A_h(t) as an operator in Rⁿ is unique with respect to its action on Im Q(t, t) but can be arbitrary outside that set.) Now E(x(t + h) | x(t), x(s)) − x(t + h) is orthogonal both to x(t) and x(s); so postmultiplication by x*(s) and taking expectations gives
[Q(t + h, s) − Q(t, s)]/h = A_h(t)Q(t, s)    for any s ≤ t.
The assumptions of the theorem imply that

lim_{h↓0} [A_h(t) − A(t)] Q(t, t) = 0.

Proper use of the Schwarz inequality shows that the same relation with Q(t, t) replaced by Q(t, s) is true. So lim_{h↓0} [Q(t + h, s) − Q(t, s)]/h exists also and

∂Q(t, s)/∂t = A(t)Q(t, s).    (8')
Now, invoking Proposition 1, we find that a weak version of Proposition 2 is already proved, viz. one which says that there is a simple diffusion process on some sample space which has the same finite dimensional distributions as the given Gaussian process. We will not carry on to prove the full version of the proposition. An analysis of the last proof shows that it is also effective in establishing that for a "Markov process in the wide sense" ([3, p. 90], see also [2]) satisfying condition (8'), eq. (2") in Proposition 1 is also true. We bring to mind that a "Markov process in the wide sense" is a process where for any t ≥ s ≥ s₁ ≥ s₂ ≥ ...

Ê{x(t) | x(s), x(s₁), ...} = Ê{x(t) | x(s)}.

Here Ê{· | x(s), x(s₁), ...} means the orthogonal projector in the usual Hilbert space on the closed linear subspace spanned by the components of x(s), x(s₁), .... On the other hand condition (2") makes x into a Markov process in the wide sense. So the conditions in Proposition 1 could have been streamlined by using the concept of Markov processes in the wide sense. Such processes are, as has been pointed out in the literature (see, for instance, [4]), intimately connected with wide sense martingales, which is the same as processes with orthogonal increments (see [3, p. 165]).
5. Final comments
This paper treats a rather special problem, as the search for a second order equivalent has been restricted to vector processes of the same
formal dimension as the given one. Nevertheless, in the applications made in [6] a major concern was to find, at the outset, from plausibility reasons or otherwise, the correct dimension to serve as a good starting point in the exploration for the wide sense equivalent. In the cases treated the choice was not difficult, as the processes, in the most advanced case, were Markov processes with a state space {1, 2, ..., l} × Rⁿ, so the dimension used was N = ln. On the other hand we have not made the restriction that the generator of the process be time-independent. Results without this restriction can be a handy tool in many contexts. The bigger and better problem, to give a Gauss-Markovian representation (à la Akaike [1]) in some R^N × T × Ω, N not specified in advance, has been treated mainly in the time independent or even stationary case. However, in discrete time (integers in [0, ∞)), rather general results (possibly of doubtful practical value) are within reach even in the time dependent case. In this case vector processes can be reduced to scalar processes by putting in extra time points, so the scalar process is really the most general one if time dependence is allowed. For a centered scalar Gaussian process y(t, ω) (t = 0, 1, 2, ...) with positive definite (not only semidefinite) covariance function q(t, s) it can be shown without much difficulty that it has a Gauss-Markovian representation in R^N under the following assumptions: (1) There exist N functions h_i(t) (i = 1, 2, ..., N) such that their restrictions to [s, ∞), denoted by h_i^s, are linearly independent for any s ≥ 0 in the real vector space L^s of all functions from integers in [s, ∞) to real numbers. (2) q(·, s), as a function of t restricted to t ≥ s, belongs to the linear space spanned by {h_i^s(·)}. One representation has the simple form:
y(t) = Σ_{i=1}^N h_i(t)x_i(t) = h*(t)x(t),

x(t + 1) = x(t) + b(t + 1)v(t + 1).

Here h and b are deterministic vectors with N components, x is the stochastic state vector with the same number of components, and v(·) is white, centered, scalar Gaussian noise (on the positive integers) defined as the innovation

v(t + 1) = y(t + 1) − E{y(t + 1) | y(t), y(t − 1), ..., y(0)}.

The state component x_i(s) is defined as the coefficient of h_i(t) in the development
E{y(t) | y(s), y(s − 1), ..., y(0)} = Σ_{i=1}^N h_i(t)x_i(s)    (t ≥ s).
The important thing in the proof is to recognize that the assumptions imply the existence and uniqueness of such x_i(s). x turns out to be a vector process with independent increments. The task to prove something similar for continuous time is beyond the limits set out for this contribution but should not be particularly difficult. The hint of most practical value in the preceding pages is possibly the trivial remark that a process generated by a stochastic differential equation

dx = A(t)x dt + dw(t)
with a w(t) that has orthogonal, centered increments has a wide sense equivalent of simple diffusion type, this put together with the equally simple remark that if w(t) is a martingale then it has orthogonal centered increments. As the verification of the martingale property might sometimes require advanced methods, a relatively easy way out in a specific case may be to check, via the conditions in Proposition 1, a hypothetical result obtained for instance from a guess on martingale-ness.
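The discrete-time representation sketched above can be exercised numerically. In the sketch below, the basis h₁(t) = 1, h₂(t) = t and the gain b(t) are illustrative choices, not taken from the paper; the check is that the one-step least-squares prediction error of y reproduces the driving innovation v:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, T = 5000, 6
h = lambda t: np.array([1.0, float(t)])   # basis h_1(t)=1, h_2(t)=t (illustrative)
b = lambda t: np.array([0.5, 1.0])        # deterministic gain (illustrative)

# Simulate y(t) = h*(t) x(t), x(t+1) = x(t) + b(t+1) v(t+1) with white,
# centered, scalar Gaussian v, over many independent paths.
v = rng.normal(size=(T + 1, n_paths))
x = np.zeros((T + 1, 2, n_paths))
y = np.zeros((T + 1, n_paths))
for t in range(T):
    x[t + 1] = x[t] + np.outer(b(t + 1), v[t + 1])
    y[t + 1] = h(t + 1) @ x[t + 1]

# Innovation check: the least-squares one-step prediction error of y(T)
# given y(1), ..., y(T-1) should reproduce h(T)'b(T) v(T).
regressors = y[1:T].T                     # shape (n_paths, T-1)
coef, *_ = np.linalg.lstsq(regressors, y[T], rcond=None)
resid = y[T] - regressors @ coef
corr = np.corrcoef(resid, v[T])[0, 1]     # close to 1
```

The near-unit correlation reflects that, for these choices, the state x(t) is linearly recoverable from y(1), ..., y(t), so the prediction residual is proportional to v(T).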
Acknowledgment I gratefully acknowledge rewarding discussions of [6] with Prof. T. Kailath during a week at Stanford in spring 1975.
References

[1] H. Akaike, "Stochastic theory of minimal realization", IEEE Transactions on Automatic Control AC-19 (6) (1974) 667-674.
[2] F.J. Beutler, "Multivariate wide sense Markov processes and prediction theory", The Annals of Mathematical Statistics 34 (1963) 424-438.
[3] J.L. Doob, Stochastic processes (Wiley, New York, 1953).
[4] H. Kara, V. Mandrekar and G.L. Park, "Wide sense martingale approach to linear optimal estimation", SIAM Journal on Applied Mathematics 27 (2) (1974) 293-302.
[5] A. Segall and T. Kailath, "The modeling of randomly modulated jump processes", IEEE Transactions on Information Theory IT-21 (2) (1975) 135-143.
[6] L.E. Zachrisson, "A wide sense, simple diffusion equivalent of a piecewise simple diffusion process", Rept. TRITA-MAT-1975-16, Dept. of Mathematics, Royal Institute of Technology, Stockholm (April 1975).
Mathematical Programming Study 5 (1976) 116-123. North-Holland Publishing Company
A REDUCED DIMENSIONALITY METHOD FOR THE STEADY-STATE KALMAN FILTER

J. CASTI
International Institute for Applied Systems Analysis, Laxenburg, Austria

Received 23 May 1975
Revised manuscript received 6 November 1975
We consider the standard Kalman filtering problem in which the dimension of the output (measurement) vector is p, while the dimension of the state-space for the process model is n. The usual approach to determination of the steady-state gain matrix involves solving an algebraic Riccati equation consisting of n(n + 1)/2 quadratically nonlinear equations. In this article, we present an alternate equation for the optimal gain matrix itself, containing only np quadratically nonlinear components. Numerical results comparing the efficiency of the new equation with the standard approach are also given.
1. Introduction

A venerable problem in the modern system theory literature is the so-called Kalman filtering question, in which we desire to determine the optimal least-squares estimate of a noisy signal, given a model for the observation process. The central importance of this problem is primarily due to the fact that its solution is recursive, enabling one, in principle, to employ straightforward computational methods for the solution. However, if the dimension of the system state is large, the computational problem is no longer negligible since, in the Kalman formulation, in order to determine the optimal filter gain we must solve O(n²) simultaneous quadratic equations, where n is the dimension of the state. In these cases, many refinements of the standard algorithms have been proposed to take advantage of the structure which the system presents, e.g. sparseness, weak coupling, and so forth. In recent years, mathematical steps have been taken to substantiate the conjecture that, regardless of the dimension of the system state, if the dimensions of the input and output spaces are small, it should be possible to exploit this situation and reduce the computational complexity of the
J. Casti / The steady-state Kalman filter
117
equations needed to calculate the optimal filter gain (or the optimal feedback control law). Beginning with the work [7], and continuing in [4, 6, 10], it has been shown that this conjecture is valid and that indeed it is possible to reduce the number of needed equations to O(n) without increasing the analytic complexity of the problem (still quadratically nonlinear). The ideas which motivated the new filtering equations had their origin in radiative transfer, special cases having been developed by Ambartsumian [1] and Chandrasekhar [9] in the 1940's. As a consequence of the transport terminology, the new filtering functions have been called "generalized X - Y functions" [4] or "Chandrasekhar-type" algorithms [10]. Unfortunately, the finite interval version of the X - Y functions does not readily lend itself to an algebraic equation for the infinite interval case, in contrast to the situation for the standard matrix Riccati equation of optimal filtering. Thus, our objective in this paper is two-fold: to present the appropriate algebraic equation which may be used instead of the algebraic Riccati equation in order to determine the optimal steady-state gain function and to give results of comparative numerical experiments.
2. The Kalman filter
The standard Kalman filtering problem which we study is described in detail in [11]. Our interest in this note is the numerical determination of functionals of the solution of the matrix Riccati equation

dP/dt = GQG' + FP + PF' − PH'HP,    P(0) = Γ,    (1)

where F, Q and Γ are constant matrices with Q ≥ 0, Γ ≥ 0, F is n × n, G is n × m, and H is p × n. For future reference, we note that the Kalman gain function K(t) is given by

K(t) = P(t)H'.    (2)

The steady-state gain K(∞) is obtained by solving the algebraic Riccati equation

GQG' + FP(∞) + P(∞)F' − P(∞)H'HP(∞) = 0,

or

GQG' + FP(∞) + P(∞)F' − K(∞)K'(∞) = 0.    (3)
Our subsequent development will concentrate on deriving a substitute for eq. (3) involving only np unknown quantities, rather than the n(n + 1)/2 unknowns in (3) (p is the dimension of the observation process for the standard filtering problem).
3. Generalized X - Y functions
Since the finite interval versions of the generalized X - Y functions for the Kalman filter are of some interest in their own right, we present them without proof in this section. The results follow the development in [4]. The basic result is

Theorem 1 [4, 10]. Let
(i) rank Z (≜ GQG' + FΓ + ΓF' − ΓH'HΓ) = r,
(ii) rank H'H = p,
and let Z be factored as Z = BB', where B is an n × r matrix (possibly complex). Then the solution to the matrix Riccati equation (1) satisfies the algebraic relation

FP(t) + P(t)F' = L(t)L'(t) + K(t)K'(t) − GQG',    0 ≤ t < ∞,    (4)

where L and K are n × r, n × p matrix functions, respectively, satisfying the equations

dL/dt = (F − K(t)H)L,    L(0) = B,    (5)

dK/dt = L(t)L'(t)H',    K(0) = ΓH'.    (6)
Remarks. (1) Eqs. (5)-(6) represent a system of n(p + r) equations suitable for computing the functions L and K. The importance of this representation is that, by definition, K(t) = P(t)H', the Kalman gain itself. Thus not only are there fewer equations in the L-K system, but they allow a direct computation of the physically relevant quantity K.
(2) The algebraic relation (4) is often useful in determining the error covariance P(t) for selected values of t, provided that the inversion of (I⊗F + F⊗I) can be readily accomplished (here ⊗ denotes the Kronecker product).
(3) Since (5)-(6) represent n(p + r) equations, a computational savings
over the Riccati equation (1) is anticipated whenever p + r < (n + 1)/2. However, from the rank conditions (i) and (ii), we see that p is always equal to the dimension of the observation process, while r is always bounded from above by the number of inputs to the system model. Thus, for many physical processes there is a high likelihood that the condition p + r < (n + 1)/2 will be satisfied.
4. Steady-state solutions

In many instances it is of interest to know the steady-state gain K(∞) so that either it may be used as a sub-optimal, easily implementable filter, or so that one can determine the minimal possible error covariance over an infinite observation horizon. In these cases, the usual approach is to solve the algebraic Riccati equation (3) for its unique positive semi-definite solution P(∞), then form the gain function K(∞). As discussed earlier, this procedure requires the solution of n(n + 1)/2 simultaneous quadratic equations. Our main result, given below, is to show that the optimal gain K(∞) may be directly determined by solving np simultaneous quadratic equations. Thus, when p < (n + 1)/2, we have fewer equations. The result upon which all else follows is the
Steady-State Theorem [3]. Assume F has no purely imaginary characteristic values and no real characteristic values symmetrically placed relative to the origin. Then the steady-state gain K satisfies the algebraic equation

σ(K) = (H⊗I)(I⊗F + F⊗I)⁻¹ σ(KK' − GQG'),    (7)

where ⊗ denotes the Kronecker product and σ is the column stacking operation, i.e. if A = [a_ij], then

σ(A) = (a₁₁ a₂₁ ⋯ a_n1 a₁₂ a₂₂ ⋯ a_n2 ⋯ a_nn)'.
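The Kronecker-product identity underlying (7) is easy to verify numerically: for any symmetric P with K = PH', and GQG' chosen so that (3) holds, the right-hand side of (7) must return σ(K). A sketch with random data (the spectral shift of F enforces the theorem's hypothesis on the characteristic values):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 1
vec = lambda A: A.flatten(order="F")   # column-stacking operation sigma(.)

# Shifting a random F by -5I keeps all pairs of eigenvalues away from
# summing to zero, so I(x)F + F(x)I is invertible.
F = rng.normal(size=(n, n)) - 5.0 * np.eye(n)
H = rng.normal(size=(p, n))

# Take any symmetric P as the steady-state covariance, set K = PH' (eq. (2)),
# and define GQG' so that the algebraic Riccati equation (3) holds exactly.
P = rng.normal(size=(n, n)); P = P @ P.T
K = P @ H.T
GQGt = K @ K.T - F @ P - P @ F.T

# Eq. (7): sigma(K) = (H(x)I)(I(x)F + F(x)I)^{-1} sigma(KK' - GQG').
M = np.kron(np.eye(n), F) + np.kron(F, np.eye(n))
lhs = vec(K)
rhs = np.kron(H, np.eye(n)) @ np.linalg.solve(M, vec(K @ K.T - GQGt))
```

The two sides agree to numerical precision, since σ(FP + PF') = (I⊗F + F⊗I)σ(P) for symmetric P and σ(PH') = (H⊗I)σ(P).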
There are several remarks in order regarding the above Theorem:
(a) The presence of the matrix inverse (I⊗F + F⊗I)⁻¹ seems, at first glance, to present a serious computational obstacle, being an n² × n² matrix. However, the special structure makes it possible to carry out this inversion utilizing the characteristic polynomial of F and the inversion of an n × n matrix (see [2] for details). In addition, for many parametric studies the system dynamic matrix F is reasonably well known and our interest is in exploring the effect of changing
G, H, and/or Q. Thus, the inversion need be done only once and stored for future use;
(b) the form of equation (7) is better suited to standard successive approximation algorithms than the usual algebraic Riccati equation, since the unknown quantity appears by itself on the left side of the equation. While this is a minor point, it does enable us to directly appeal to many standard results from functional and numerical analysis in which a form

u = T(u) + f
is assumed for the equation to be solved in the unknown u;
(c) it is important to note that the number of equations in (7) depends only upon the dimension of the observation process, p, and is totally independent of the number r, which played a role in the finite-interval equations (5)-(6) for L and K. In view of the proof of the Steady-State Theorem, one might conjecture that an equation for K alone could be developed for the finite-interval problem. Further remarks on this situation in the context of control theory, as well as implications of (7) for the Inverse Problem of Control, are given in [3, 5, 8, 12].
5. Numerical results
Interesting as eq. (7) may be on theoretical grounds, the proof is in the program and the final test is its numerical efficacy vis-à-vis the algebraic Riccati equation (3). To investigate this question, comparative numerical experiments were performed. The first involved the matrices

Q = 1,    H = (0 0 0 1),    G = (1.7  4.2  0.85  3.91)',

with six different F matrices. In all cases F has the companion form
      (  0     1     0     0  )
F =   (  0     0     1     0  )
      (  0     0     0     1  )
      ( −a₃   −a₂   −a₁   −a₀ )
Thus, F is completely determined by its characteristic roots. The computing procedure used was the following:
(1) The finite-time Riccati equation (1) was integrated to a value T at which the relative change in its components in going from T − Δ to T was less than 10⁻⁴. This determined an initial approximation P₀ to P(∞). Using P₀ and a built-in subroutine for solving sets of nonlinear algebraic equations (a hybrid procedure combining Newton iteration and steepest descent; see [12] for details and a program), the algebraic Riccati equation was solved to an accuracy in the residuals of 5 × 10⁻⁵.
(2) Using the initial guess K₀ = P₀H', the algebraic equation (7) was solved by the same algorithm as used in step (1) to the same degree of accuracy.
The results of the above experiments, calculated on the CDC Cyber 74 computer by Dr. O. Kirschner, are given in Table 1.
Table 1
Computing times (in seconds) for a four-dimensional problem

Case (roots of F)            Eq. (7)    Algebraic Riccati equation    Approx. time factor
−8, −8, −8, −8               0.073      0.783                         11
−15, −8, −8, −8              0.102      0.706                         7
−8, −1.4, 1.5, 9.5           0.123      0.431                         3.5
−5.5 ± 3.5i, 8.4, −12.4      0.123      0.757                         3
3.3, −7.5, −0.2 ± 9.2i       0.119      0.818                         7
−0.1 ± i, −0.2 ± 9.2i        0.087      0.575                         7
From Table 1, it is evident that eq. (7) results in a dramatic improvement over the computing time required for the algebraic Riccati equation. What is surprising, perhaps, is the large magnitude of the improvement since, on a strictly equation-counting basis, we would expect a factor of between two and three as there are four equations in (7) as opposed to ten equations in the algebraic Riccati equation (3). Additional results for the case n = 8 are given in Table 2.
Table 2
Computing times (in seconds) for n = 8

Roots of F                                     Eq. (7)    Algebraic Riccati equation    Time factor
1, −1.4, 1.5, 3.3, −7.5, −0.2 ± 9.2i, 9.5      2.680      13.447                        5.02
1, 2, 2.5, 2.5, 2.5, 2.5, −3, −4               2.894      22.486                        7.77
1, −1.4, 1.5, −8, −8, −8, −8, 9.5              2.982      12.973                        4.35
0.5, 0.1 ± 3.3i, −3.5, 4, −4.5, −7, 8          2.826      15.280                        5.41
1, −0.1 ± i, −1.4, 1.5, −0.2 ± 9.2i, 9.5       2.711      14.792                        5.46
6. Discussion
We have seen the possibility of substantially reducing the work needed to compute the steady-state gain K(∞) if the rank of the observation matrix is much less than the dimension of the state. Also, it has been seen that the actual computing time required is a function not only of the number of equations, but also depends upon the particular structure of the system. Thus, we pose the following issues as worthy of further study:
(a) Numerical studies. Further experiments on many different system structures are needed to determine what properties of F, G, H, Q give rise to particularly favorable (or unfavorable) computational cases vis-à-vis the algebraic Riccati equation. These experiments should also give insight into what directions analytic investigations might profitably take.
(b) Infinite-dimensional problems. Some work has been done in extending the finite-time case to infinite-dimensional problems [8]. However, it remains to create the proper setting in order that the Steady-State Theorem be valid.
References

[1] V. Ambartsumian, "Diffuse reflection of light by a foggy medium", Doklady Akademii Nauk SSSR 38 (1943) 229.
[2] S. Barnett and C. Storey, Matrix methods in stability theory (Nelson and Sons, London, 1970).
[3] J. Casti, "A new equation for the linear regulator problem", Journal of Optimization Theory and Applications 17 (1975) 169-175.
[4] J. Casti, "Matrix Riccati equations, dimensionality reduction and generalized X - Y functions", Utilitas Mathematica 6 (1974) 95-110.
[5] J. Casti, "New equations for the time-dependent regulator problem", IEEE Transactions on Automatic Control AC-20 (1975) 558.
[6] J. Casti, "Reduction of dimensionality for systems of linear two-point boundary value problems with constant coefficients", Journal of Mathematical Analysis and Applications 45 (1974) 522-531.
[7] J. Casti, R. Kalaba and K. Murphy, "A new initial-value method for on-line filtering and estimation", IEEE Transactions on Information Theory IT-18 (1972) 515-518.
[8] J. Casti and L. Ljung, "Some new analytic and computational results for operator Riccati equations", SIAM Journal on Control 13 (1975) 817-826.
[9] S. Chandrasekhar, Radiative transfer (Dover, New York, 1960).
[10] T. Kailath, "Some new algorithms for recursive linear estimation in constant linear systems", IEEE Transactions on Information Theory IT-19 (1973) 750-760.
[11] R. Kalman and R. Bucy, "New results in linear filtering and prediction", Journal of Basic Engineering, ASME Ser. D, 83 (1961) 95-108.
[12] M.J.D. Powell, "A Fortran subroutine for solving systems of nonlinear algebraic equations", AERE Rept. 5947, Harwell (1968).
Mathematical Programming Study 5 (1976) 124-144. North-Holland Publishing Company
ON CONTINUOUS AND DISCRETE SAMPLING FOR PARAMETER ESTIMATION IN DIFFUSION TYPE PROCESSES

A. LE BRETON
University of Grenoble, Grenoble, France

Received 30 May 1975
Revised manuscript received 22 September 1975

This paper is concerned with the problem of estimation of the drift parameter θ in a linear stochastic differential equation with constant coefficients. When continuous sampling of the solution process is available, the maximum likelihood estimate θ_T, based on observation in [0, T], is defined in terms of stochastic and ordinary integrals. So, in practice, to compute θ_T, one has to approximate these integrals by appropriate finite sums which only depend on some discrete sampling in {t₀, t₁, ..., t_N} ⊂ [0, T]. If θ̂_N,T (resp. θ̃_N,T) denotes the resulting estimate (resp. maximum likelihood estimate based on observations at t₀, ..., t_N), we show that, when Max |t_{i+1} − t_i| = δ_N goes to zero, P-lim θ̂_N,T = P-lim θ̃_N,T = θ_T, with δ_N^{−1/2}(θ̂_N,T − θ_T) and δ_N^{−1/2}(θ̃_N,T − θ_T) both bounded in probability.
1. Introduction

In [1] and [2] we have studied the problem of parameter estimation in a vector linear stochastic differential equation of the form

dX_t = θX_t dt + G dW_t;    t ≥ 0;    X₀ = 0,    (1)

where (W_t; t ≥ 0) is a standard Wiener process in Rⁿ, θ (resp. G) is an unknown n × n matrix (resp. nonsingular n × n matrix), when continuous sampling of the solution process is available. The statistical structure for estimation based on observation in [0, T] is

(C, S_T, {μ_{θ,G}^T; θ ∈ L(Rⁿ), G ∈ L*(Rⁿ)}),

where C is the set of continuous functions from R⁺ to Rⁿ, S_T is the σ-field of subsets of C generated by the family of evaluation functionals on C, {Π_t; t ∈ [0, T]},
A. Le Breton / Continuous and discrete sampling
125
L(Rⁿ) (resp. L*(Rⁿ)) is the set of n × n matrices (resp. nonsingular n × n matrices). For all (θ, G) ∈ L(Rⁿ) × L*(Rⁿ), μ_{θ,G}^T denotes the measure on (C, S_T) induced by every process which is a solution of (1). It has been shown that we can assume that GG' is completely known, with the meaning that, with probability one, it may be computed on every finite interval [0, t], t ∈ ]0, T], by the statistic

Ĝ_t = lim_{N→+∞} (1/t) Σ_{i=1}^{2^N} (Π_{it/2^N} − Π_{(i−1)t/2^N})(Π_{it/2^N} − Π_{(i−1)t/2^N})',

which is such that, for all (θ, G) ∈ L(Rⁿ) × L*(Rⁿ),

Ĝ_t = GG',    μ_{θ,G}^T almost surely.
Here ' denotes transposition. Assuming GG' known and, without loss of generality, equal to E_n, the n × n identity matrix, the statistical structure for the problem of estimation of the drift parameter θ,

(C, S_T, {μ_θ^T; θ ∈ L(Rⁿ)}),    with μ_θ^T ≡ μ_{θ,E_n}^T,

is dominated by the standard Wiener measure μ_W^T on (C, S_T), with the likelihood function

dμ_θ^T/dμ_W^T = exp{ ∫₀ᵀ (θΠ_t, dΠ_t) − ½ ∫₀ᵀ (θΠ_t, θΠ_t) dt },

where the stochastic integral is defined with respect to the measure μ_W^T. The maximization of the log-likelihood function

L_T(θ, ·) = ∫₀ᵀ (θΠ_t, dΠ_t) − ½ ∫₀ᵀ (θΠ_t, θΠ_t) dt

provides the estimate

θ_T = [ ∫₀ᵀ dΠ_t Π_t' ] [ ∫₀ᵀ Π_t Π_t' dt ]⁻¹.

If the true value θ₀ for θ is a stability matrix, the family of estimates {θ_T; T > 0} is consistent and asymptotically normally distributed. In practice, even if continuous sampling is available, to compute θ_T, the stochastic
(resp. ordinary) integral ∫₀ᵀ dΠ_t Π_t' (resp. ∫₀ᵀ Π_t Π_t' dt) shall be approximated by an appropriate finite sum of the form

Σ_{i=1}^N (Π_{t_i} − Π_{t_{i−1}}) Π'_{t_{i−1}}    (resp. Σ_{i=1}^N (t_i − t_{i−1}) Π_{t_{i−1}} Π'_{t_{i−1}}),

where {t₀, t₁, ..., t_N} defines a partition of [0, T]. Then a natural question arises: Is the resulting estimate

θ̂_N,T = [ Σ_{i=1}^N (Π_{t_i} − Π_{t_{i−1}}) Π'_{t_{i−1}} ] [ Σ_{i=1}^N (t_i − t_{i−1}) Π_{t_{i−1}} Π'_{t_{i−1}} ]⁻¹

the best one that could be obtained in view of the discrete sampling at t₀, t₁, ..., t_N? We are then led to study the problem of estimation of the drift parameter θ in the stochastic differential equation

dX_t = θX_t dt + dW_t;    t ≥ 0;    X₀ = 0    (2)
based on discrete observations at t₀, t₁, ..., t_N. The result is that, if θ̃_N,T denotes the maximum likelihood estimate, when δ_N = Max_{i=1,...,N} |t_i − t_{i−1}| goes to zero,

lim θ̂_N,T = lim θ̃_N,T = θ_T

in the sense of convergence in probability, and further

δ_N^{−1/2}(θ̂_N,T − θ_T)    and    δ_N^{−1/2}(θ̃_N,T − θ_T)
are bounded in probability. This means that, for continuous sampling, the error bound is of order δ_N^{1/2} (in probability) for approximation of the maximum likelihood estimate of θ. The significance of this result is that it provides a theoretical basis for approximate computations. Indeed, if the order of the variance (or asymptotic variance as T → ∞) of the exact estimate is known [2], it yields a definite limit on the usefulness of further refinements of the discretization. On the other hand, if only discrete sampling is available, we know that θ̂_N,T is almost the "best estimate", up to the order δ_N^{1/2}, with the desirable property of being easily computable. Moreover the estimation results for diffusion type processes are derived from a general theorem which may be useful in various other situations. The paper is organized as follows: In Section 2, we compute the log-likelihood function (L.L.F.) L̃_N,T for estimation based on discrete sampling and we give an a priori approximation L̂_N,T of L̃_N,T. In Section 3 (resp. 4), we study the properties of the approximation of L̃_N,T (resp. L_T) by
L̂_N,T. In Section 5, we prove a general theorem on approximate maximum likelihood estimation. In Section 6, the application of the general theorem to our particular problem yields the results mentioned above.
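The discretized estimator θ̂_N,T of the introduction is straightforward to exercise in the scalar case; the sketch below (Euler-Maruyama on a uniform grid, all constants illustrative) evaluates the finite-sum quotient and recovers a value near the true drift:

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, T, N = -1.0, 200.0, 200000
dt = T / N

# Scalar version of eq. (2) on a uniform grid (Euler-Maruyama).
X = np.empty(N + 1)
X[0] = 0.0
dW = rng.normal(scale=np.sqrt(dt), size=N)
for k in range(N):
    X[k + 1] = X[k] + theta0 * X[k] * dt + dW[k]

# theta_hat_{N,T} = [sum (X_i - X_{i-1}) X_{i-1}] [sum (t_i - t_{i-1}) X_{i-1}^2]^{-1}
num = np.sum(np.diff(X) * X[:-1])
den = np.sum(dt * X[:-1] ** 2)
theta_hat = num / den
```

With a stable true drift θ₀ the estimate concentrates around θ₀ as T grows, with the asymptotic standard deviation of order (2|θ₀|/T)^{1/2}.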
2. The L.L.F. for estimation based on discrete sampling
The statistical structure for the estimation of θ based on discrete sampling at t₀, t₁, ..., t_N, where 0 = t₀ < t₁ < ... < t_N, is

(C, S_T^N, {μ_θ^{T,N}; θ ∈ L(Rⁿ)}),

where S_T^N is the sub σ-algebra of S_T generated by the family {Π_{t_i}; 0 ≤ i ≤ N}, and, for all θ ∈ L(Rⁿ), μ_θ^{T,N} is the restriction of μ_θ^T to S_T^N. We first compute the likelihood function dμ_θ^{T,N}/dμ_W^{T,N}, where μ_W^{T,N} denotes the restriction of μ_W^T to S_T^N.

Lemma 1. For all θ ∈ L(Rⁿ) the measures μ_θ^{T,N} and μ_W^{T,N} are mutually absolutely continuous, the Radon-Nikodym derivative being given by

dμ_θ^{T,N}/dμ_W^{T,N} = Π_{i=1}^N [(t_i − t_{i−1})^{n/2} (det Σ_{θ,i})^{−1/2}]
    × exp{ −½ Σ_{i=1}^N (Π_{t_i} − e^{(t_i − t_{i−1})θ} Π_{t_{i−1}})' Σ_{θ,i}^{−1} (Π_{t_i} − e^{(t_i − t_{i−1})θ} Π_{t_{i−1}})
    + ½ Σ_{i=1}^N (t_i − t_{i−1})^{−1} (Π_{t_i} − Π_{t_{i−1}})'(Π_{t_i} − Π_{t_{i−1}}) },

where

Σ_{θ,i} = ∫₀^{t_i − t_{i−1}} e^{uθ} e^{uθ'} du,    i = 1, ..., N.
Proof. The equivalence of the measures μ_θ^{T,N} and μ_W^{T,N} follows from that of μ_θ^T and μ_W^T. We compute the density of the probability distribution P_{(X_{t_1}, ..., X_{t_N})} of the random vector (X_{t_1}, ..., X_{t_N}) with respect to that of (W_{t_1}, ..., W_{t_N}), say P_{(W_{t_1}, ..., W_{t_N})}. For this we determine the respective densities of these distributions with respect to the Lebesgue measure on (Rⁿ)^N, say λ_N. We know that the solution of (2) is a gaussian process with mean zero given by

X_t = ∫₀ᵗ e^{(t−u)θ} dW_u,    t ≥ 0,

with covariance

Γ(s, t) = e^{sθ} [ ∫₀ˢ e^{−uθ} e^{−uθ'} du ] e^{tθ'}    for 0 ≤ s ≤ t.
We can write

X_{t_{i+1}} = e^{(t_{i+1} − t_i)θ} X_{t_i} + e^{t_{i+1}θ} ∫_{t_i}^{t_{i+1}} e^{−uθ} dW_u    for i = 0, ..., N − 1.

Set

ε_i = e^{t_iθ} ∫_{t_{i−1}}^{t_i} e^{−uθ} dW_u    for i = 1, ..., N.

Then

ε₁ = X_{t_1},    ε₂ = X_{t_2} − e^{(t_2 − t_1)θ} X_{t_1},    ...,    ε_N = X_{t_N} − e^{(t_N − t_{N−1})θ} X_{t_{N−1}},
where the distribution P_{(ε_1, ..., ε_N)} of the random vector (ε₁, ..., ε_N) is gaussian with uncorrelated elements and

E ε_i = 0,    i = 1, ..., N,

E ε_i ε_i' = e^{t_iθ} [ ∫_{t_{i−1}}^{t_i} e^{−uθ} e^{−uθ'} du ] e^{t_iθ'} = ∫₀^{t_i − t_{i−1}} e^{uθ} e^{uθ'} du = Σ_{θ,i}.
It follows that

dP_{(ε_1, ..., ε_N)}/dλ_N (x₁, ..., x_N) = Π_{i=1}^N f_{θ,i}(x_i),

where f_{θ,i}(x) = (2π)^{−n/2} (det Σ_{θ,i})^{−1/2} exp{ −½ x' Σ_{θ,i}^{−1} x }.
And then

dP_{(X_{t_1}, ..., X_{t_N})}/dλ_N (x_{t_1}, ..., x_{t_N}) = (2π)^{−nN/2} [ Π_{i=1}^N det Σ_{θ,i} ]^{−1/2}
    × exp{ −½ Σ_{i=1}^N (x_{t_i} − e^{(t_i − t_{i−1})θ} x_{t_{i−1}})' Σ_{θ,i}^{−1} (x_{t_i} − e^{(t_i − t_{i−1})θ} x_{t_{i−1}}) },
where x_{t_0} ≡ 0. Further, the random vector (W_{t_1}, W_{t_2} − W_{t_1}, ..., W_{t_N} − W_{t_{N−1}}) has uncorrelated gaussian elements with mean zero and covariances (t_i − t_{i−1})E_n, i = 1, ..., N. Its density with respect to λ_N is

(2π)^{−nN/2} [ Π_{i=1}^N (t_i − t_{i−1})^{−n/2} ] exp{ −½ Σ_{i=1}^N (x_{t_i} − x_{t_{i−1}})'(x_{t_i} − x_{t_{i−1}}) / (t_i − t_{i−1}) }.
Then one has the density of P_{(W_{t_1}, ..., W_{t_N})} with respect to λ_N, and further,

dP_{(X_{t_1}, ..., X_{t_N})}/dP_{(W_{t_1}, ..., W_{t_N})} = [ dP_{(X_{t_1}, ..., X_{t_N})}/dλ_N ] [ dP_{(W_{t_1}, ..., W_{t_N})}/dλ_N ]⁻¹
yields the desired result.

The log-likelihood function for the estimation of θ is then:

L̃_N,T(θ, ·) = −½ Σ_{i=1}^N Log det Σ_{θ,i} + (n/2) Σ_{i=1}^N Log (t_i − t_{i−1})
    − ½ Σ_{i=1}^N (Π_{t_i} − e^{(t_i − t_{i−1})θ} Π_{t_{i−1}})' Σ_{θ,i}^{−1} (Π_{t_i} − e^{(t_i − t_{i−1})θ} Π_{t_{i−1}})
    + ½ Σ_{i=1}^N (t_i − t_{i−1})^{−1} (Π_{t_i} − Π_{t_{i−1}})'(Π_{t_i} − Π_{t_{i−1}}).
Let {Π_N}_{N≥1} be a sequence of subdivisions Π_N = {0 = t_0 < t_1 < … < t_N = T} such that, setting δ_N = Max_{i=1,…,N} (t_i − t_{i−1}), lim_{N→+∞} δ_N = 0. We look for an asymptotic approximation (N → +∞) of L_{N,T}(θ, ·). Let us define

ℓ_{N,T}(θ, ·) = Σ_{i=1}^N H'_{t_{i−1}} θ' (H_{t_i} − H_{t_{i−1}}) − ½ Σ_{i=1}^N (t_i − t_{i−1}) H'_{t_{i−1}} θ'θ H_{t_{i−1}},

which, in fact, appears as a natural approximation of L_T(θ, ·). We look at the asymptotic properties of

L_{N,T}(θ, ·) − ℓ_{N,T}(θ, ·)  and  ℓ_{N,T}(θ, ·) − L_T(θ, ·)

when N goes to infinity. L_{N,T}^{(i)} (resp. ℓ_{N,T}^{(i)}, L_T^{(i)}) shall denote the derivative of order i (i = 0, 1, 2), with respect to θ, of L_{N,T} (resp. ℓ_{N,T}, L_T).

3. The approximate L.L.F. for estimation based on discrete sampling

In the following theorem, we describe the properties of the approximation of L_{N,T} by ℓ_{N,T}.
Theorem 1. For every constant K > 0 and for i = 0, 1, 2, there exists a sequence {ε_N^i(K)}_{N≥1} of positive random variables such that, for all θ* ∈ ℒ(R^n):
(j) {ε_N^i(K); N ≥ 1} is bounded in μ_{θ*}^T probability, i.e., lim_{A→+∞} Sup_{N≥1} P[|ε_N^i(K)| > A] = 0;
(jj) Sup_{θ∈ℒ(R^n): |θ|≤K} |L_{N,T}^{(i)}(θ, ·) − ℓ_{N,T}^{(i)}(θ, ·)| ≤ δ_N^{1/2} ε_N^i(K),  μ_{θ*}^T almost surely,
where for A ∈ ℒ(R^n) (resp. h ∈ R^n), |A| = Max |A_{ij}| (resp. |h| = Max |h_i|).
The proof proceeds through a sequence of lemmas.

Lemma 2. For h > 0, let Σ_{θ,h} be defined by Σ_{θ,h} = ∫_0^h e^{uθ} e^{uθ'} du. Then, for h sufficiently small, one can write
(i) det Σ_{θ,h} = h^n [1 + h tr θ + h² γ_1(h, θ)], where tr θ is the trace of θ,
(ii) Log det Σ_{θ,h} = n Log h + h tr θ + h² γ_2(h, θ),
(iii) Σ_{θ,h}^{−1} = h^{−1} [E_n − ½h(θ + θ') + h² Δ(h, θ)],
where

Sup_{|θ|≤K} |γ_i(h, θ)| ≤ γ̄_i(h, K), i = 1, 2,   Sup_{|θ|≤K} |Δ(h, θ)| ≤ γ̄_3(h, K),

with lim_{h→0} γ̄_i(h, K) = γ_i(K), i = 1, 2, 3.

Proof. One has

Σ_{θ,h} = h E_n + (h²/2!)(θ + θ') + (h³/3!) ((d²/du²)[e^{uθ} e^{uθ'}])_{u=h̄},  0 < h̄ < h,

so that Σ_{θ,h} = h[E_n + ½h(θ + θ' + 2h Δ_0(h, θ))], where Sup_{|θ|≤K} |Δ_0(h, θ)| ≤ γ̄_0(h, K) and lim_{h→0} γ̄_0(h, K) = γ_0(K). So det Σ_{θ,h} = h^n ∏_{i=1}^n (1 + h λ_{i,h}), where {λ_{i,h}; i = 1, …, n} are the eigenvalues of ½(θ + θ' + 2h Δ_0(h, θ)) = Γ(h, θ). Then, expanding the product, we can write det Σ_{θ,h} = h^n[1 + h tr θ + h² γ_1(h, θ)], where γ_1(h, θ) satisfies the required property. Now, for h sufficiently small, we can easily derive assertion (ii). In order to prove assertion (iii), we write:

Σ_{θ,h}^{−1} = {h[E_n + ½h(θ + θ') + h² Δ_0(h, θ)]}^{−1}
  = h^{−1}[E_n + ½h(θ + θ') + h² Δ_0(h, θ)]^{−1}
  = h^{−1}[E_n − ½h(θ + θ') + h² Δ(h, θ)],
where

Δ(h, θ) = −Δ_0(h, θ) + Γ²(h, θ)[E_n − h Γ(h, θ) + h² Γ²(h, θ) − …],
Sup_{|θ|≤K} |Δ(h, θ)| ≤ γ̄_0(h, K) + n(K + h γ̄_0(h, K))² [1 − h(K + h γ̄_0(h, K))]^{−1}.
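The small-h expansions of Lemma 2 can be checked numerically in the scalar case, where Σ_{θ,h} = (e^{2hθ} − 1)/(2θ); a minimal sketch (an illustration under an arbitrarily chosen θ, not code from the paper):

```python
import math

# Scalar (n = 1) check of Lemma 2:
#   det Sigma_{theta,h} = h * [1 + h*theta + O(h^2)],
#   Sigma_{theta,h}^{-1} = h^{-1} * [1 - h*theta + O(h^2)]   (theta + theta' = 2*theta).
theta = 0.7
for h in (1e-2, 1e-3, 1e-4):
    sigma = (math.exp(2.0 * h * theta) - 1.0) / (2.0 * theta)
    # (i): the remainder gamma_1(h, theta) stays bounded as h -> 0
    gamma1 = (sigma / h - 1.0 - h * theta) / h ** 2
    assert abs(gamma1) < 2.0 * theta ** 2
    # (iii): the remainder Delta(h, theta) stays bounded as h -> 0
    delta = (h / sigma - 1.0 + h * theta) / h ** 2
    assert abs(delta) < 2.0 * theta ** 2
```

The limiting values seen numerically ((2/3)θ² and (1/3)θ²) agree with a Taylor expansion of the scalar closed form.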
Lemma 3. One has

|Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}})^{⊗2} − T E_n| ≤ δ_N^{1/2} U_N,

where for h ∈ R^n, h^{⊗2} = hh', and {U_N}_{N≥1} is a sequence of random variables bounded in μ_θ^T-probability, for all θ ∈ ℒ(R^n).

Proof. Set

Ū_N = δ_N^{−1/2} [Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}})^{⊗2} − T E_n],  U_N = |Ū_N|,  H'_t = (H_t^1, …, H_t^n),

and δ_{kl} = 1 if k = l, δ_{kl} = 0 if k ≠ l. We show that every term in Ū_N is bounded in μ_θ^T-probability. First, let us assume that θ = 0, i.e., μ_θ^T = μ_0^T. Then:

E_0 |Σ_{i=1}^N (H_{t_i}^k − H_{t_{i−1}}^k)(H_{t_i}^l − H_{t_{i−1}}^l) − T δ_{kl}|² = 2 Σ_{i=1}^N (t_i − t_{i−1})² if k = l,  = Σ_{i=1}^N (t_i − t_{i−1})² if k ≠ l,

and in both cases this is ≤ 2 T δ_N, which provides the result. Now, for θ ∈ ℒ(R^n), with respect to μ_θ^T, we can write

H_{t_i} − H_{t_{i−1}} = ∫_{t_{i−1}}^{t_i} θ H_s ds + ξ_{t_i} − ξ_{t_{i−1}},

where (ξ_t) is a Wiener process. Then, setting (θH_s)' = ((θH_s)_1, …, (θH_s)_n) and expanding each product of increments accordingly, a computation using the independence of the increments of ξ and the Schwarz inequality gives

E_θ |Σ_{i=1}^N (H_{t_i}^k − H_{t_{i−1}}^k)(H_{t_i}^l − H_{t_{i−1}}^l) − T δ_{kl}|²
  ≤ 4 E_θ |Σ_{i=1}^N (ξ_{t_i}^k − ξ_{t_{i−1}}^k)(ξ_{t_i}^l − ξ_{t_{i−1}}^l) − T δ_{kl}|² + …
  ≤ 8 T δ_N + 4 δ_N E_θ[(∫_0^T (θH_s)_k² ds)(∫_0^T (θH_s)_l² ds)].

With respect to μ_θ^T, (H_t) (resp. (ξ_t)) is a gaussian (resp. Wiener) process; so each term of the form E_θ(·) is finite. This completes the proof.

We are now able to prove the assertion in Theorem 1 for i = 0. One can easily express the difference L_{N,T}(θ, ·) − ℓ_{N,T}(θ, ·) in the following form:
L_{N,T}(θ, ·) − ℓ_{N,T}(θ, ·) =
  −½ [Σ_{i=1}^N Log det Σ_{θ,i} − n Σ_{i=1}^N Log(t_i − t_{i−1})]
  −½ tr[Σ_{i=1}^N {Σ_{θ,i}^{−1} − (t_i − t_{i−1})^{−1} E_n} (H_{t_i} − H_{t_{i−1}})^{⊗2}]
  − tr[Σ_{i=1}^N {(E_n − e^{(t_i−t_{i−1})θ'}) Σ_{θ,i}^{−1} + θ'} (H_{t_i} − H_{t_{i−1}}) H'_{t_{i−1}}]
  −½ [Σ_{i=1}^N H'_{t_{i−1}} {(E_n − e^{(t_i−t_{i−1})θ})' Σ_{θ,i}^{−1} (E_n − e^{(t_i−t_{i−1})θ}) − θ'θ (t_i − t_{i−1})} H_{t_{i−1}}].   (3)

Lemma 4. For all constant K > 0, there exists a sequence {ε_N(K); N ≥ 1} of positive random variables such that, for all θ* ∈ ℒ(R^n):
(i) {ε_N(K); N ≥ 1} is bounded in μ_{θ*}^T-probability;
(ii) Sup_{|θ|≤K} |L_{N,T}(θ, ·) − ℓ_{N,T}(θ, ·)| ≤ δ_N^{1/2} ε_N(K),  μ_{θ*}^T almost surely.
Proof. We denote by E_{N,θ}^i, i = 1, …, 5, the successive parts of the sum in (3), once Σ_{θ,i} = Σ_{θ,t_i−t_{i−1}} has been expanded by means of Lemma 2. From Lemma 2 we derive

E_{N,θ}^1 = −½ Σ_{i=1}^N (t_i − t_{i−1})² γ_2(t_i − t_{i−1}, θ),

so

Sup_{|θ|≤K} |E_{N,θ}^1| ≤ δ_N γ_4(δ_N, K)  with  lim_{N→+∞} γ_4(δ_N, K) = γ_4(K).

Now considering E_{N,θ}^2,

E_{N,θ}^2 = ¼ tr[(θ + θ')(Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}})^{⊗2} − T E_n)].

As

P-lim_{N→+∞} Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}})^{⊗2} = T E_n,

where P-lim means limit in the sense of convergence in probability P, we have:

Sup_{|θ|≤K} |E_{N,θ}^2| ≤ ½ n² K α(N),

where α(N) = |Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}})^{⊗2} − T E_n|, μ_{θ*}^T-lim_{N→+∞} α(N) = 0 and, in view of Lemma 3, {α(N)/δ_N^{1/2}} is bounded in probability. By Lemma 2, we can write

E_{N,θ}^3 = −½ tr[Σ_{i=1}^N (t_i − t_{i−1}) Δ(t_i − t_{i−1}, θ)(H_{t_i} − H_{t_{i−1}})^{⊗2}],

and then

|E_{N,θ}^3| ≤ Σ_{i=1}^N (t_i − t_{i−1}) |Δ(t_i − t_{i−1}, θ)| tr (H_{t_i} − H_{t_{i−1}})^{⊗2},

so that we have

Sup_{|θ|≤K} |E_{N,θ}^3| ≤ δ_N γ_5(δ_N, K) β(N),

where lim_{N→+∞} γ_5(δ_N, K) = γ_5(K) and μ_{θ*}^T-lim_{N→+∞} β(N) = nT. Now, we write

E_{N,θ}^4 = −½ Σ_{i=1}^N (t_i − t_{i−1})² H'_{t_{i−1}} Δ_2(t_i − t_{i−1}, θ) H_{t_{i−1}},

so

Sup_{|θ|≤K} |E_{N,θ}^4| ≤ δ_N γ_7(δ_N, K) π(N),

where lim_{N→+∞} γ_7(δ_N, K) = γ_7(K) and μ_{θ*}^T-lim_{N→+∞} π(N) = tr ∫_0^T H_t^{⊗2} dt. To complete the proof, consider

E_{N,θ}^5 = − tr[Σ_{i=1}^N (t_i − t_{i−1}) Δ_3(t_i − t_{i−1}, θ) H_{t_{i−1}} (H_{t_i} − H_{t_{i−1}})'],

and

|E_{N,θ}^5| ≤ n^{1/2} Σ_{i=1}^N {tr[(t_i − t_{i−1})² Δ_3(t_i − t_{i−1}, θ) H_{t_{i−1}} ‖H_{t_i} − H_{t_{i−1}}‖² H'_{t_{i−1}} Δ_3(t_i − t_{i−1}, θ)']}^{1/2}
  ≤ δ_N^{1/2} γ_8(δ_N, θ) (Σ_{i=1}^N (t_i − t_{i−1}) ‖H_{t_{i−1}}‖²)^{1/2} (Σ_{i=1}^N ‖H_{t_i} − H_{t_{i−1}}‖²)^{1/2},

where for h ∈ R^n, ‖h‖² = (h, h). So

Sup_{|θ|≤K} |E_{N,θ}^5| ≤ δ_N^{1/2} γ̄_8(δ_N, K) ν(N),

where lim_{N→+∞} γ̄_8(δ_N, K) = γ_8(K) and μ_{θ*}^T-lim_{N→+∞} ν(N) = (nT)^{1/2} (∫_0^T ‖H_t‖² dt)^{1/2}.
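The quadratic-variation convergence provided by Lemma 3 can be illustrated by simulation; a minimal sketch for a scalar Brownian motion on a uniform grid (an illustration only, with an arbitrary seed, not the paper's general subdivisions):

```python
import math
import random

# For scalar Brownian motion, |sum_i (W_{t_i} - W_{t_{i-1}})^2 - T| has
# standard deviation sqrt(2*T*delta_N) on a uniform grid of mesh delta_N = T/N,
# in line with the delta_N^{1/2} rate of Lemma 3.
random.seed(12345)
T, N = 1.0, 100_000
delta_N = T / N

qv = 0.0
for _ in range(N):
    dW = random.gauss(0.0, math.sqrt(delta_N))
    qv += dW * dW

# Generous tolerance: about ten standard deviations here.
assert abs(qv - T) < 0.05
```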
Now we investigate the difference L_{N,T}^{(1)}(θ, ·) − ℓ_{N,T}^{(1)}(θ, ·) and prove the assertion in Theorem 1 for i = 1. We can write, differentiating (3) with respect to θ,

L_{N,T}^{(1)}(θ, ·) − ℓ_{N,T}^{(1)}(θ, ·) =
  −½ (d/dθ)[Σ_{i=1}^N Log det Σ_{θ,i}]
  −½ (d/dθ) tr[Σ_{i=1}^N {Σ_{θ,i}^{−1} − (t_i − t_{i−1})^{−1} E_n} (H_{t_i} − H_{t_{i−1}})^{⊗2}]
  − (d/dθ) tr[Σ_{i=1}^N {(E_n − e^{(t_i−t_{i−1})θ'}) Σ_{θ,i}^{−1} + θ'} (H_{t_i} − H_{t_{i−1}}) H'_{t_{i−1}}]
  −½ (d/dθ)[Σ_{i=1}^N H'_{t_{i−1}} {(E_n − e^{(t_i−t_{i−1})θ})' Σ_{θ,i}^{−1} (E_n − e^{(t_i−t_{i−1})θ}) − θ'θ (t_i − t_{i−1})} H_{t_{i−1}}].   (4)
Lemma 5. For all constant K > 0, there exists a sequence {ε'_N(K); N ≥ 1} of positive random variables such that, for all θ* ∈ ℒ(R^n):
(i) {ε'_N(K); N ≥ 1} is bounded in μ_{θ*}^T-probability;
(ii) Sup_{|θ|≤K} |L_{N,T}^{(1)}(θ, ·) − ℓ_{N,T}^{(1)}(θ, ·)| ≤ δ_N^{1/2} ε'_N(K),  μ_{θ*}^T almost surely.
Proof. F_{N,θ}^i, i = 1, …, 4, denote the successive terms of the sum in (4). Let us compute, for k, l = 1, …, n,

(d/dθ_{kl}) Σ_{i=1}^N Log det Σ_{θ,i}
  = Σ_{i=1}^N [δ_{kl}(t_i − t_{i−1}) + (t_i − t_{i−1})² γ'_9{}^{kl}(t_i − t_{i−1}, θ)] [1 + (t_i − t_{i−1}) tr θ + (t_i − t_{i−1})² γ_1(t_i − t_{i−1}, θ)]^{−1}
  = T δ_{kl} + Σ_{i=1}^N (t_i − t_{i−1})² γ_9^{kl}(t_i − t_{i−1}, θ),

where

Sup_{|θ|≤K} |γ_9^{kl}(h, θ)| ≤ γ̄_9^{kl}(h, K)  with  lim_{h→0} γ̄_9^{kl}(h, K) = γ_9^{kl}(K).

So, apart from the term T δ_{kl} (which is compensated by the corresponding term arising in F_{N,θ}^2),

Sup_{|θ|≤K} |F_{N,θ}^1| ≤ δ_N γ_{10}(δ_N, K)  where  lim_{N→+∞} γ_{10}(δ_N, K) = γ_{10}(K).

Now we look at

(d/dθ_{kl}) tr[Σ_{i=1}^N {Σ_{θ,i}^{−1} − (t_i − t_{i−1})^{−1} E_n} (H_{t_i} − H_{t_{i−1}})^{⊗2}]
  = −Σ_{i=1}^N (H_{t_i}^k − H_{t_{i−1}}^k)(H_{t_i}^l − H_{t_{i−1}}^l) + tr[Σ_{i=1}^N (t_i − t_{i−1}) Δ_4^{kl}(t_i − t_{i−1}, θ)(H_{t_i} − H_{t_{i−1}})^{⊗2}],

so that, using Lemma 3,

Sup_{|θ|≤K} |F_{N,θ}^2| ≤ δ_N^{1/2} V_N + δ_N γ_{11}(δ_N, K) β̄(N),

where {V_N} is bounded in μ_{θ*}^T-probability, lim_{N→+∞} γ_{11}(δ_N, K) = γ_{11}(K), and μ_{θ*}^T-lim_{N→+∞} β̄(N) = nT.

To obtain the inequalities for F_{N,θ}^3 and F_{N,θ}^4 we proceed in a similar way as for E_{N,θ}^4 and E_{N,θ}^5 in the proof of Lemma 4. By use of similar arguments, one can easily prove that L_{N,T}^{(2)}(θ, ·) − ℓ_{N,T}^{(2)}(θ, ·) satisfies a condition of the same type, and so complete the proof of Theorem 1.
4. The approximate L.L.F. for estimation based on continuous sampling

Now we look at the difference ℓ_{N,T}(θ, ·) − L_T(θ, ·). Its properties are given in the following theorem.

Theorem 2. For all constant K > 0 and for i = 0, 1, 2, there exists a sequence {Δ_{N,T}^i(K); N ≥ 1} of positive random variables such that, for all θ* ∈ ℒ(R^n):
(j) {Δ_{N,T}^i(K); N ≥ 1} is bounded in μ_{θ*}^T probability;
(jj) Sup_{|θ|≤K} |ℓ_{N,T}^{(i)}(θ, ·) − L_T^{(i)}(θ, ·)| ≤ δ_N^{1/2} Δ_{N,T}^i(K),  μ_{θ*}^T almost surely.

The proof will be quite obvious in view of the following lemma.

Lemma 6. One has
(i) |Σ_{i=1}^N H_{t_{i−1}} (H_{t_i} − H_{t_{i−1}})' − ∫_0^T H_s dH_s'| ≤ δ_N^{1/2} V_N,
(ii) |Σ_{i=1}^N (t_i − t_{i−1}) H_{t_{i−1}}^{⊗2} − ∫_0^T H_s^{⊗2} ds| ≤ δ_N^{1/2} W_N,
where {V_N}_{N≥1} and {W_N}_{N≥1} are sequences of random variables which are bounded in μ_θ^T-probability for all θ ∈ ℒ(R^n).

Proof. Set

Ṽ_N = δ_N^{−1/2} [Σ_{i=1}^N H_{t_{i−1}} (H_{t_i} − H_{t_{i−1}})' − ∫_0^T H_s dH_s'],  V_N = |Ṽ_N|,

and

W̃_N = δ_N^{−1/2} [Σ_{i=1}^N (t_i − t_{i−1}) H_{t_{i−1}}^{⊗2} − ∫_0^T H_s^{⊗2} ds],  W_N = |W̃_N|.
We show that every term in Ṽ_N is bounded in μ_θ^T-probability. First let us assume θ = 0, i.e., μ_θ^T = μ_0^T; then

E_0 |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^k − H_s^k) dH_s^l|² = E_0 [Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^k − H_s^k)² ds]
  = Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (s − t_{i−1}) ds = ½ Σ_{i=1}^N (t_i − t_{i−1})² ≤ ½ δ_N T.

Now, for θ ∈ ℒ(R^n), we can write

dH_t = θ H_t dt + dξ_t,  t ∈ [0, T],  H_0 = 0,
where {ξ_t} is a Wiener process with respect to μ_θ^T. Then, writing H_{t_{i−1}}^k − H_s^k = −∫_{t_{i−1}}^s (θH_u)_k du − (ξ_s^k − ξ_{t_{i−1}}^k) and dH_s^l = (θH_s)_l ds + dξ_s^l, one obtains

E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^k − H_s^k) dH_s^l|²
  ≤ 4 E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (ξ_{t_{i−1}}^k − ξ_s^k) dξ_s^l|²
  + 4 E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (∫_{t_{i−1}}^s (θH_u)_k du) dξ_s^l|²
  + 4 E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (ξ_{t_{i−1}}^k − ξ_s^k)(θH_s)_l ds|²
  + 4 E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (∫_{t_{i−1}}^s (θH_u)_k du)(θH_s)_l ds|²,

where the first term is bounded by 2 δ_N T exactly as in the case θ = 0, and each of the three remaining terms is bounded by δ_N multiplied by an expression of the form E_θ(·) whose integrands involve the gaussian processes (θH_u)_k, (θH_s)_l and the Wiener process (ξ_s) only through moments which are bounded on [0, T].
Using the fact that the terms of the form E_θ(·) are bounded on [0, T], assertion (i) in the lemma can easily be derived. We look now at W̃_N:

E_θ |Σ_{i=1}^N (t_i − t_{i−1}) H_{t_{i−1}}^k H_{t_{i−1}}^l − ∫_0^T H_s^k H_s^l ds| = E_θ |Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^k H_{t_{i−1}}^l − H_s^k H_s^l) ds|
  ≤ Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} E_θ |H_{t_{i−1}}^k H_{t_{i−1}}^l − H_s^k H_s^l| ds.

This provides assertion (ii) because the terms E_θ(·) are bounded (of order (s − t_{i−1})^{1/2} uniformly on [0, T]). The proof of Theorem 2 is then a simple application of that lemma to the study of

ℓ_{N,T}(θ, ·) − L_T(θ, ·) = tr[θ' Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}} − H_s) dH_s'] − ½ tr[θ'θ Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^{⊗2} − H_s^{⊗2}) ds]

and its derivatives, which can be expressed by

(d/dθ)(ℓ_{N,T}(θ, ·) − L_T(θ, ·)) = Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}} − H_s) dH_s' − θ Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^{⊗2} − H_s^{⊗2}) ds
and, for u, v, k, l ∈ {1, 2, …, n},

(∂²/∂θ_{uv} ∂θ_{kl}) (ℓ_{N,T}(θ, ·) − L_T(θ, ·)) = −δ_{uk} Σ_{i=1}^N ∫_{t_{i−1}}^{t_i} (H_{t_{i−1}}^v H_{t_{i−1}}^l − H_s^v H_s^l) ds.

Similar statements for the differences L_{N,T}^{(i)}(θ, ·) − L_T^{(i)}(θ, ·), i = 0, 1, 2, to those of Theorem 2 for ℓ_{N,T}^{(i)}(θ, ·) − L_T^{(i)}(θ, ·), i = 0, 1, 2, are obvious corollaries of Theorems 1 and 2.
5. A general theorem on approximate maximum likelihood estimation
A general problem is that of approximating a statistical structure and related statistical methods. This problem arises, for example, when only partial observation of some random element is available (a typical situation being that of discrete time sampling for a continuous time process) or (and) when one wants to use some approximate statistical tools. Here we investigate the case where the given structure is dominated and the statistical method is a maximum likelihood method for parameter estimation. We have the following theorem.

Theorem 3. Let (Ω, ℳ, {P_θ; θ ∈ R^k}) be a statistical structure dominated by P, with a Log-likelihood function L(θ, ·). Let {ℳ_N; N ≥ 1} be a sequence of sub-σ-algebras of ℳ and, for all N ≥ 1, let L_N(θ, ·) be the Log-likelihood function of the statistical structure (Ω, ℳ_N, {P_θ|ℳ_N; θ ∈ R^k}), or any ℳ_N-measurable function. Let us suppose that the following assumptions are satisfied.
(A1) L and L_N are twice continuously differentiable with derivatives L^{(i)}, L_N^{(i)}, i = 1, 2.
(A2) L^{(2)} does not depend on θ and is P almost surely a strictly negative definite matrix.
(A3) L^{(1)}(θ, ·) = 0 admits P almost surely a unique solution θ̂(·).
(A4) There exists a sequence {γ_N; N ≥ 1} of positive numbers converging to zero and, for i = 1, 2 and all K > 0, a sequence {V_N^i(K); N ≥ 1} of positive random variables such that, for all θ* ∈ R^k:
(a) {V_N^i(K); N ≥ 1} is bounded in P_{θ*} probability;
(b) Sup_{‖θ‖≤K} ‖L_N^{(i)}(θ, ·) − L^{(i)}(θ, ·)‖ ≤ γ_N V_N^i(K),  P_{θ*} almost surely,
where for A ∈ ℒ(R^k), ‖A‖² = Σ A_{ij}². Then there exists a sequence {θ̂_N; N ≥ 1} of random variables satisfying
(i) θ̂_N is ℳ_N-measurable,
and for all θ* ∈ R^k,
(ii) lim_{N→+∞} P_{θ*}[L_N^{(1)}(θ̂_N) = 0] = 1,
(iii) P_{θ*}-lim_{N→+∞} θ̂_N = θ̂.

Further, if {θ̂'_N; N ≥ 1} is another sequence of random variables satisfying conditions (i), (ii) and (iii), then, for all θ* ∈ R^k,

lim_{N→+∞} P_{θ*}[θ̂'_N = θ̂_N] = 1.

Lastly, if {θ̂_N; N ≥ 1} is a sequence satisfying (i), (ii) and (iii), then, for all θ* ∈ R^k, the sequence {γ_N^{−1}(θ̂_N − θ̂); N ≥ 1} is bounded in P_{θ*} probability.

Proof. We write, for ω ∈ Ω, x ∈ R^k,

L_N^{(1)}(θ̂(ω) + x, ω) = L_N^{(1)}(θ̂(ω), ω) + L^{(2)}(ω) x + {L_N^{(2)}(θ'(ω, x), ω) − L^{(2)}(ω)} x,

where

θ'(ω, x) = θ̂(ω) + λ(θ̂(ω), x) x;  λ(θ̂(ω), x) ∈ (0, 1).
(x, C~)(O(0)) + x, to)) = (x, L~)(6(0)), 0))) + x'L2(0))x + x'{C~)(O'(0), x), to) - L (2)(o))}x.
In view of the fact -L(2)(.) is P almost surely strictly positive definite, we have Inf, x, l{-X'L'2)(0))x}=q(0))>O and -x'L(2)(0))x>-q(0))[[x]]L So (x, L~)(O(0))+ x, 0))) shall be strictly negative for fix [[ = 1/K if
IIL%'(&0)),0))II< q(0)) 2K
and
Sup I1~I1-i/~
[IL~)(O'(0),x),0))-L'2'(0))l[<89
Then we consider, for K > 1,
I2N(K) = {0) E gI: y N V ~ ( K - 1 )
(0)) < q(0)-)
2K '
For 0) E D ~ ( K ) we have
Sup IIolI~K-,/K
[[L'~'(O,0))- L"~(0,0))II < q(0)) 2K
and, in view of [[6(0))11 < K - 1/K, we have also
~.v~/c)(0))<~q(0)),
‖L_N^{(1)}(θ̂(ω), ω)‖ < q(ω)/2K.

Further

Sup_{‖θ‖≤K} ‖L_N^{(2)}(θ, ω) − L^{(2)}(ω)‖ < ½ q(ω)

and, in view of ‖θ'(ω, x)‖ ≤ K for ‖x‖ ≤ 1/K, we have

Sup_{‖x‖≤1/K} ‖L_N^{(2)}(θ'(ω, x), ω) − L^{(2)}(ω)‖ < ½ q(ω).

Finally, for ω ∈ Ω_N(K), the function L_N^{(1)}(θ̂(ω) + ·, ω) is continuous on ‖x‖ ≤ 1/K, and for ‖x‖ = 1/K,

(x, L_N^{(1)}(θ̂(ω) + x, ω)) < 0.

The lemma of Aitchison-Silvey [3] asserts that, for ω ∈ Ω_N(K), the equation

L_N^{(1)}(θ̂(ω) + x, ω) = 0

admits a solution X_N(ω) satisfying ‖X_N(ω)‖ < 1/K. Considering

‖L_N^{(2)}(θ̂(ω) + x, ω) − L^{(2)}(ω)‖ < ½ q(ω)

for ω ∈ Ω_N(K) and ‖x‖ ≤ 1/K, we see that L_N^{(2)}(θ̂(ω) + x, ω) is strictly negative definite. Computing, for ‖x‖ ≤ 1/K,

L_N^{(1)}(θ̂(ω) + x, ω) − L_N^{(1)}(θ̂(ω) + X_N(ω), ω) = ∫_0^1 (d/dα)[L_N^{(1)}(θ̂(ω) + X_N(ω) + α(x − X_N(ω)), ω)] dα,

that is,

L_N^{(1)}(θ̂(ω) + x, ω) = ∫_0^1 L_N^{(2)}(θ̂(ω) + X_N(ω) + α(x − X_N(ω)), ω)(x − X_N(ω)) dα,

it follows that

(L_N^{(1)}(θ̂(ω) + x, ω), x − X_N(ω)) = ∫_0^1 (x − X_N(ω))' L_N^{(2)}(θ̂(ω) + X_N(ω) + α(x − X_N(ω)), ω)(x − X_N(ω)) dα < 0  if x ≠ X_N(ω).
So L_N^{(1)}(θ̂(ω) + x, ω) ≠ 0 for ‖x‖ ≤ 1/K and x ≠ X_N(ω). Further, for ‖x‖ ≤ 1/K,

L_N(θ̂(ω) + x, ω) − L_N(θ̂(ω) + X_N(ω), ω) = ½ (x − X_N(ω))' L_N^{(2)}(θ̂(ω) + X_N(ω) + ᾱ(x, ω)(x − X_N(ω)), ω)(x − X_N(ω)),

where ᾱ(x, ω) ∈ (0, 1). Then for x ≠ X_N(ω) one has

L_N(θ̂(ω) + x, ω) < L_N(θ̂(ω) + X_N(ω), ω).
Now we consider

P_{θ*}[Ω ∖ Ω_N(K)] ≤ P_{θ*}[γ_N V_N^1(K) ≥ q/2K] + P_{θ*}[γ_N V_N^2(K) ≥ ½ q] + P_{θ*}[‖θ̂‖ > K − 1/K].

In view of assumption (A4) we have

lim_{N→+∞} P_{θ*}[γ_N V_N^1(K) ≥ q/2K] = lim_{N→+∞} P_{θ*}[γ_N V_N^2(K) ≥ ½ q] = 0.

So we have

lim sup_{N→+∞} P_{θ*}[Ω ∖ Ω_N(K)] ≤ P_{θ*}[‖θ̂‖ > K − 1/K],

and then

lim_{K→+∞} lim sup_{N→+∞} P_{θ*}[Ω ∖ Ω_N(K)] = 0,  or  lim_{K→+∞} lim inf_{N→+∞} P_{θ*}[Ω_N(K)] = 1.
Now we show that there exists a sequence {θ̂_N} satisfying the required conditions. Set

Ω̄_N = ∪_{K>1} Ω_N(K).

If ω ∈ Ω̄_N, there exists a K(ω) ≥ 1 such that the equation L_N^{(1)}(θ, ω) = 0 admits a unique solution in ‖θ − θ̂(ω)‖ ≤ 1/K(ω), say θ̂_N(ω). Let η > 0 be given. For all K ≥ 1 such that 1/K < η we have Ω_N(K) ⊂ [‖θ̂_N − θ̂‖ < η]. But, for all ε > 0, there exists K_0 such that K ≥ K_0 implies

lim inf_{N→+∞} P_{θ*}[Ω_N(K)] > 1 − ε.
Choosing K such that 1/K < η and K > K_0, there exists N_0 such that N > N_0 implies

P_{θ*}[Ω_N(K)] > 1 − ε,  and then  P_{θ*}[‖θ̂_N − θ̂‖ < η] > 1 − ε.

So P_{θ*}-lim_{N→+∞} θ̂_N = θ̂. Now Ω_N(K) ⊂ [L_N^{(1)}(θ̂_N) = 0], so, if N > N_0, P_{θ*}[L_N^{(1)}(θ̂_N) = 0] ≥ P_{θ*}[Ω_N(K)] > 1 − ε, and then

lim_{N→+∞} P_{θ*}[L_N^{(1)}(θ̂_N) = 0] = 1.
Let {θ̂'_N; N ≥ 1} be another sequence of random variables satisfying the conditions. One has, for all K ≥ 1,

{ω ∈ Ω: ω ∈ Ω_N(K), ‖θ̂'_N(ω) − θ̂(ω)‖ ≤ 1/K, L_N^{(1)}(θ̂'_N(ω), ω) = 0} ⊂ [θ̂'_N = θ̂_N],

so

P_{θ*}[θ̂'_N ≠ θ̂_N] ≤ (1 − P_{θ*}(Ω_N(K))) + P_{θ*}[‖θ̂'_N − θ̂‖ > 1/K] + P_{θ*}[L_N^{(1)}(θ̂'_N) ≠ 0];

then

lim sup_{N→+∞} P_{θ*}[θ̂'_N ≠ θ̂_N] ≤ 1 − lim inf_{N→+∞} P_{θ*}(Ω_N(K)).

Using lim_{K→+∞} lim inf_{N→+∞} P_{θ*}(Ω_N(K)) = 1, it yields

lim_{N→+∞} P_{θ*}[θ̂'_N = θ̂_N] = 1.
Let {θ̂_N; N ≥ 1} be a sequence of random variables satisfying the conditions; we can write, for ω ∈ [L_N^{(1)}(θ̂_N) = 0],

L^{(1)}(θ̂_N(ω), ω) − L_N^{(1)}(θ̂_N(ω), ω) = L^{(1)}(θ̂_N(ω), ω) = L^{(1)}(θ̂(ω), ω) + L^{(2)}(ω)(θ̂_N(ω) − θ̂(ω)) = L^{(2)}(ω)(θ̂_N(ω) − θ̂(ω)).

So that

γ_N^{−1}(θ̂_N(ω) − θ̂(ω)) = γ_N^{−1} [L^{(2)}(ω)]^{−1} {L^{(1)}(θ̂_N(ω), ω) − L_N^{(1)}(θ̂_N(ω), ω)}

and, for ω ∈ [L_N^{(1)}(θ̂_N) = 0] ∩ [‖θ̂_N‖ ≤ K] and K > 0,

γ_N^{−1} ‖θ̂_N(ω) − θ̂(ω)‖ ≤ ‖(L^{(2)}(ω))^{−1}‖ V_N^1(K)(ω).

Now

P_{θ*}[γ_N^{−1} ‖θ̂_N − θ̂‖ > A] ≤ P_{θ*}[L_N^{(1)}(θ̂_N) ≠ 0] + P_{θ*}[‖θ̂_N‖ > K] + P_{θ*}[‖L^{(2)}(·)^{−1}‖ · V_N^1(K) > A].

Let ε > 0 be given; then there exists K_0 > 0 such that

P_{θ*}[‖θ̂‖ > K_0] < ε/6,  lim_{N→+∞} P_{θ*}[‖θ̂_N‖ > K_0] = P_{θ*}[‖θ̂‖ > K_0].

We can choose N_1 such that for N ≥ N_1,

|P_{θ*}[‖θ̂_N‖ > K_0] − P_{θ*}[‖θ̂‖ > K_0]| < ε/6,

so that P_{θ*}[‖θ̂_N‖ > K_0] < ε/3. Further, using lim_{N→+∞} P_{θ*}[L_N^{(1)}(θ̂_N) ≠ 0] = 0, we find N_2 such that for N ≥ N_2, P_{θ*}[L_N^{(1)}(θ̂_N) ≠ 0] < ε/3, and, using the fact that {‖L^{(2)}(·)^{−1}‖ V_N^1(K_0); N ≥ 1} is bounded in P_{θ*}-probability, we can find A and N_3 such that N ≥ N_3 implies P_{θ*}[‖L^{(2)}(·)^{−1}‖ · V_N^1(K_0) > A] < ε/3. So, ε being given, there exist N_0 (N_0 = Max(N_1, N_2, N_3)) and A such that

P_{θ*}[γ_N^{−1} ‖θ̂_N − θ̂‖ > A] < ε  for N ≥ N_0.

Then lim_{A→+∞} lim sup_{N→+∞} P_{θ*}[γ_N^{−1} ‖θ̂_N − θ̂‖ > A] = 0. This completes the proof.
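A hypothetical scalar example may help fix ideas: take L(θ) = aθ − ½cθ², whose second derivative −c is constant and negative as required by (A2), and perturb L^{(1)} by γ_N V_N cos θ so that (A4) holds with rate γ_N. All names below (a, c, solve) are illustrative choices, not from the paper:

```python
import math

a, c = 2.0, 4.0
theta_hat = a / c            # unique root of L'(theta) = a - c*theta

def solve(gamma_n, v_n):
    # Root of the perturbed score L_N'(theta) = a - c*theta + gamma_n*v_n*cos(theta),
    # found by bisection (L_N' is strictly decreasing for gamma_n*v_n < c).
    f = lambda th: a - c * th + gamma_n * v_n * math.cos(th)
    lo, hi = -100.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for N in (10, 100, 1000):
    gamma_n = N ** -0.5      # plays the role of delta_N^{1/2}
    theta_n = solve(gamma_n, v_n=1.0)
    # gamma_N^{-1} |theta_N - theta_hat| stays bounded (here by V_N / c = 1/4),
    # as the last assertion of Theorem 3 predicts.
    assert abs(theta_n - theta_hat) / gamma_n <= 1.0 / c + 1e-9
```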
6. The result

Now we come back to our problem about the sequence of statistical structures (C, ℱ_T^N, {μ_θ^{T,N}; θ ∈ ℒ(R^n)}). From Theorems 1, 2, 3 and [2] we immediately deduce the following theorem.

Theorem 4. There exists a sequence {θ̂_{N,T}; N ≥ 1} of estimates of θ such that, for all θ* ∈ ℒ(R^n), one has
(j) θ̂_{N,T} is ℱ_T^N-measurable,
(jj) lim_{N→+∞} μ_{θ*}^T[ℓ_{N,T}^{(1)}(θ̂_{N,T}) = 0] = 1,
(jjj) μ_{θ*}^T-lim_{N→+∞} θ̂_{N,T} = θ̂_T = [∫_0^T dH_t H_t'] [∫_0^T H_t^{⊗2} dt]^{−1}.

Further, the sequences {δ_N^{−1/2}(θ̂_{N,T} − θ̂_T); N ≥ 1} and {δ_N^{−1/2}(θ̂_{N,T} − θ̄_{N,T}); N ≥ 1}, where

θ̄_{N,T} = [Σ_{i=1}^N (H_{t_i} − H_{t_{i−1}}) H'_{t_{i−1}}] [Σ_{i=1}^N (t_i − t_{i−1}) H_{t_{i−1}}^{⊗2}]^{−1},

are bounded in μ_{θ*}^T-probability.

Proof. Using Theorems 1 and 2 and the fact that ∫_0^T H_t^{⊗2} dt is almost surely a strictly positive definite matrix [2], one can easily show that assumptions (A1)-(A4) in Theorem 3 are satisfied for {L_T, L_{N,T}; N ≥ 1} and {L_T, ℓ_{N,T}; N ≥ 1} with γ_N = δ_N^{1/2}. So Theorem 3 applied to {L_T, L_{N,T}; N ≥ 1} provides assertions (j), (jj), (jjj) and the fact that {δ_N^{−1/2}(θ̂_{N,T} − θ̂_T); N ≥ 1} is bounded in μ_{θ*}^T-probability. Further, θ̄_{N,T} is clearly the unique solution of ℓ_{N,T}^{(1)}(θ, ·) = 0, and {θ̄_{N,T}; N ≥ 1} is nothing but the sequence of estimates provided by Theorem 3 for the sequence {ℓ_{N,T}(θ, ·); N ≥ 1}.
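The discrete estimator θ̄_{N,T} can be exercised by simulation in the scalar case n = 1. The sketch below simulates dH_t = θH_t dt + dW_t with an Euler scheme (a modelling shortcut with arbitrarily chosen θ, T and seed, not the paper's exact sampling) and evaluates the estimator on the resulting grid data:

```python
import math
import random

# Scalar case:  theta_bar_{N,T} =
#   [sum (H_{t_i} - H_{t_{i-1}}) H_{t_{i-1}}] / [sum (t_i - t_{i-1}) H_{t_{i-1}}^2].
random.seed(7)
theta, T, N = -0.5, 200.0, 20_000
dt = T / N

H = 0.0
num = den = 0.0
for _ in range(N):
    dW = random.gauss(0.0, math.sqrt(dt))
    dH = theta * H * dt + dW          # Euler step for the linear SDE
    num += dH * H
    den += dt * H * H
    H += dH

theta_bar = num / den
# Asymptotic standard deviation is roughly (2|theta|/T)^{1/2} ~ 0.07 here.
assert abs(theta_bar - theta) < 0.35
```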
References

[1] A. Le Breton, "Estimation des paramètres d'une équation différentielle stochastique vectorielle linéaire", Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, Paris, Série A, t. 279 (1974) 289-292.
[2] A. Le Breton, "Parameter estimation in a linear stochastic differential equation", in: Transactions of the 7th Prague Conference and 1974 E.M.S., to appear.
[3] J. Aitchison and S.D. Silvey, "Maximum likelihood estimation of parameters subject to restraints", The Annals of Mathematical Statistics 29 (1958) 813-828.
Mathematical Programming Study 5 (1976) 145-168. North-Holland Publishing Company
ON INTEGRALS IN MULTI-OUTPUT DISCRETE-TIME KALMAN-BUCY FILTERING*

Anders LINDQUIST
University of Kentucky, Lexington, Ky., U.S.A.

Received 9 May 1975
Revised manuscript received 29 January 1976
The problem of determining the error covariance matrix for a discrete-time Kalman-Bucy filter is equivalent to solving a certain n×n-matrix Riccati equation, which, due to symmetry, contains ½n(n + 1) scalar, coupled first-order difference equations. In this paper we show that, under the extra assumption that the underlying system is stationary, the matrix Riccati equation can be replaced by an algorithm containing mn first-order difference equations, where m is the number of outputs. Hence we have reduced the number of first-order recursions whenever, as is often the case, m < ½(n + 1). This reduction is however bought at the expense of greater algebraic complexity, and therefore, at present, the merit claimed for this result is that of theoretical insight rather than computational efficiency. We hope that this insight will bring about better numerical procedures in the future. The reduction is achieved through the exploitation of certain redundancies in the matrix Riccati equation. Although the main portion of this paper is concerned with stationary systems, our method also works in a more general setting. Hence we investigate what happens when the assumption of stationarity is removed, and, under certain conditions, we find that the matrix Riccati equation can be replaced by νn first-order recursions, where ν ≤ n. We conjecture that this is true in general.
1. Introduction

Consider the discrete-time n×n matrix Riccati equation

P(t + 1) = F[P(t) − P(t)H'(HP(t)H' + S)^{−1}HP(t)]F' + GG',  P(0) = P_0,  t = 0, 1, 2, …,   (1.1)

where the n×n-matrices F and P_0, the m×n-matrix H, the n×p-matrix G, and the m×m-matrix S are all constant. Prime denotes
* This work was supported by the National Science Foundation under grant MPS75-07028.
A. Lindquist / Integrals in discrete-time Kalman-Bucy filtering
146
transpose. The matrices P_0 and S are nonnegative definite and symmetric, and moreover they are defined so as to insure the existence of the inverse in (1.1). (We may for example take S to be positive definite.) To begin with (in Sections 2-7) we shall also assume that F is a stability matrix (i.e. all its eigenvalues are inside the unit circle) and that P_0 is the (unique) solution of the Liapunov equation

P_0 − F P_0 F' − GG' = 0.   (1.2)
The Riccati equation (1.1) is encountered in Kalman-Bucy filtering theory, where the matrix sequences

K(t) = FP(t)H',   (1.3)
R(t) = HP(t)H' + S   (1.4)

are to be determined. Condition (1.2) corresponds to the extra assumption that the underlying stochastic system is stationary. The solution of (1.1), subject to the additional condition (1.2), can also be determined [8, 10, 12, 13, 18] from the matrix recursion
P(t + 1) = P(t) − Q*(t)R*(t)^{−1}Q*(t)',  P(0) = P_0,   (1.5)

where the n×m-matrix sequence Q* is generated by the non-Riccati algorithm

K(t + 1) = K(t) − FQ*(t)R*(t)^{−1}Q*(t)'H',   (1.6)
Q*(t + 1) = [F − K(t)R(t)^{−1}H]Q*(t),   (1.7)
R(t + 1) = R(t) − HQ*(t)R*(t)^{−1}Q*(t)'H',   (1.8)
R*(t + 1) = R*(t) − Q*(t)'H'R(t)^{−1}HQ*(t),   (1.9)

with initial conditions

K(0) = Q*(0) = FP_0H',   (1.10)
R(0) = R*(0) = HP_0H' + S.   (1.11)
This algorithm determines K and R, which are the quantities usually required, directly without resort to (1.5). It can be written in a more compact form [10, 12, 13], but the present formulation is more suitable for later references. In fact, it can be shown that recursion (1.8) is not needed, and therefore (in view of the fact that the m×m-matrix R* is symmetric) the
algorithm actually contains only 2mn + ½m(m + 1) scalar first-order difference equations. Often, and this is the raison d'être for our study, m ≪ n. Then the non-Riccati algorithm contains a much smaller number of equations than the ½n(n + 1) of the matrix Riccati equation (1.1). This is of course provided that only K and R are required, for if we also want P, (1.5) has to be invoked. The non-Riccati algorithm (1.6)-(1.9) was first obtained by the author [10]. The original derivation [10] proceeds from basic results in filtering of stationary time series [1, 5, 9, 22, 23] and does not in any way involve the Riccati equation. However, once the structure of the algorithm is known, it is not hard to derive it directly from the Riccati equation. Indeed, this merely amounts to observing that the factorization (1.5) holds. In doing this, (1.2) has to be applied in the initial stage to insure the correct initial conditions, for without (1.2) the algorithm (1.6)-(1.9) does not hold. The factorization (1.5) is however still valid provided that we account for the non-zero left member of (1.2) in determining the initial conditions. Obviously, the algorithm thus obtained has the same structure as (1.6)-(1.9), although Q* in general has a different number of columns. Whenever this number is small (which depends on the initial factorization), the procedure will result in a useful algorithm. This rather natural extension of our work was first presented by Kailath, Morf and Sidhu [8, 18].¹ A similar factorization technique had previously been used by Kailath [6] in obtaining the continuous-time analog [6, 7, 11, 12] of the non-Riccati algorithm, thereby extending the Chandrasekhar-type results of Casti, Kalaba and Murthy [3]. The results of [6] and [10] were developed independently. Another algorithm, which is similar to (1.6)-(1.9) and to which we shall have reason to return below, was developed independently by Rissanen [19, 20].
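The operation counts above are easy to tabulate; a small sketch (the counting functions are illustrative helpers, not from the paper):

```python
def riccati_count(n):
    # the symmetric n x n matrix Riccati equation (1.1): n(n+1)/2 scalars
    return n * (n + 1) // 2

def non_riccati_count(n, m):
    # K and Q* (m*n scalars each) plus the symmetric m x m matrix R*
    return 2 * m * n + m * (m + 1) // 2

assert riccati_count(20) == 210
assert non_riccati_count(20, 2) == 83                  # far fewer when m << n
assert non_riccati_count(20, 10) > riccati_count(20)   # no gain for large m
```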
The Riccati equation (1.1) and the non-Riccati algorithm (1.6)-(1.9) both have the form

x_i(t + 1) = f_i(x_1(t), x_2(t), …, x_N(t)),  i = 1, 2, …, N,
x_i(0) = a_i,   (1.12)

where f_1, f_2, …, f_N are real-valued functions, and N equals ½n(n + 1) and 2mn + m(m + 1) respectively. In this paper we shall study certain relations of the type

¹ These papers were completed while the authors had privileged access to [10]. We wish to point this out, since this fact is not reflected in the reference list of [8].
148
A. Lindquist / Integrals in discrete-time Kalman-Bucy filtering q~i (xl(t ), x2(t) . . . . . xN (t )) ~ constant
(1.13)
for t = 0, 1, 2 . . . . . where ~,, ~02. . . . . q~M ( M < N ) are real-valued functions. Following the terminology of the classical theory of differential equations, a relation such as (1.13) will be denoted an integral of the system (1.12). We believe that knowledge of these integrals will prove valuable in studying the numerical properties of the algorithms. Using (1.6)-(1.9) in place of (1.1) amounts to a reduction in the number N of first-order difference equations whenever n ~> 4m. This poses an interesting question: Is it possible to reduce this number even further? It is the purpose of this paper to show that this is indeed the case, and we shall use a system of integrals of (1.6)-(1.9) to see this. In fact, under the very natural assumption that (H,F, G ) is a m i n i m a l realization [2], i.e. ( H , F ) is observable and (F, G ) is controllable, we shall find that only m n first-order difference equations are needed to solve the Riccati equation (1.1) subject to the initial-condition constraint (1.2), and under certain special conditions even fewer are required. This reduction is mainly of theoretical interest. Indeed, from a computational point of view, the important problem (which we shall not consider here) is to minimize the number of arithmetic operations. Our basic approach does not rely on the assumption (1.2), and in the end of the paper we shall briefly discuss what happens when this constraint is removed. The outline of the paper goes as follows. In Section 2 the algorithm (1.6)-(1.9) is transformed into a system which is " p a r a m e t e r free" in the sense that the recursions do not depend on (H, F, G), other than through a certain partitioning pattern. In fact, the system parameters enter only in the initial conditions. In Section 3 a set of m n + 8 9 + 1) integrals o f ' t h e parameter-free system is derived. 
These relations are highly nonlinear, but by a suitable transformation a bilinear system results, so that mn + ½m(m + 1) variables can be solved in terms of the remaining mn. The proof of this is found in Section 4. In Section 5, the ½n(n + 1) integrals of the combined system (1.5)-(1.9), obtained by eliminating P(t + 1) between (1.1) and (1.5), are solved for P. As a preliminary of Section 7, in Section 6 a transformation is applied to the parameter-free version, the purpose of which is to eliminate R*. Then, in Section 7, the reduced-order algorithm is presented. Finally, Section 8 is devoted to a brief discussion of the nonstationary case. This work extends the results of our previous paper [14] to the (discrete-time) multi-output case (m > 1). The corresponding continuous-time result
is reported in [15]. Recently a seemingly similar result has been presented by Luo and Bullock [17]. However, it should be noted that the mn equations of [17] are not of first-order and therefore do not constitute a reduced-order system in the sense of this paper.
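A toy example of an integral in the sense of (1.13), separate from the filtering equations themselves: for the planar rotation x(t + 1) = Ax(t), the function φ(x) = x_1² + x_2² is constant along every trajectory. A minimal sketch:

```python
import math

c, s = math.cos(0.3), math.sin(0.3)
x1, x2 = 1.0, 2.0
phi0 = x1 * x1 + x2 * x2        # the integral phi(x) = x1^2 + x2^2
for _ in range(100):
    # x(t+1) = A x(t) with A a rotation by 0.3 radians
    x1, x2 = c * x1 - s * x2, s * x1 + c * x2
    assert abs(x1 * x1 + x2 * x2 - phi0) < 1e-9
```

Knowing such a relation lets one of the state variables be recovered from the others, which is precisely how the integrals of (1.6)-(1.9) will be exploited below.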
2. A generating function formulation of the non-Riccati algorithm

Letting e_i^{(k)} be the ith unit vector in R^k, define the k×k shift-matrix

J_k = (0, e_1^{(k)}, e_2^{(k)}, …, e_{k−1}^{(k)})   (2.1)

and the m×k-matrices

H_{ik} = (e_i^{(m)}, 0, 0, …, 0)   (2.2)

with a unit vector in the first column and zeros in all other positions. Then, assuming that (H, F) is observable and H has full rank, it is no restriction to take F and H to be in the canonical form

F = J − AH,   (2.3)
H = (H_{1n_1}, H_{2n_2}, …, H_{mn_m}),   (2.4)
where n_1, n_2, …, n_m are positive integers such that n_1 + n_2 + ⋯ + n_m = n, J is the n×n block-diagonal matrix

J =
  [ J_{n_1}    0     …     0    ]
  [   0     J_{n_2}  …     0    ]
  [   …       …            …    ]
  [   0       0      …  J_{n_m} ],   (2.5)

and A is a constant n×m-matrix, which we shall partition in the following way:

A =
  [ a^{11}  a^{12}  …  a^{1m} ]
  [ a^{21}  a^{22}  …  a^{2m} ]
  [   …       …          …    ]
  [ a^{m1}  a^{m2}  …  a^{mm} ],   (2.6)

a^{ij} being an n_i-dimensional (column) vector. In fact, (F, H) can always be
transformed into the form (2.3)-(2.4) (see e.g. [16]), and in the sequel we shall assume that this has been done. The non-Riccati algorithm can now be recast into a form which is parameter-free in the sense that the recursions do not depend on the system parameters A:

Lemma 2.1. The matrix sequences K, R, Q* and R*, defined in Section 1, are determined by the system of matrix recursions

Q(t + 1) = Q(t) − JQ*(t)Γ*_t;  Q(0) = (J − AH)P_0H' + AR(0),   (2.7)
Q*(t + 1) = JQ*(t) − Q(t)Γ_t;  Q*(0) = (J − AH)P_0H',   (2.8)
R(t + 1) = R(t)(I − Γ_t Γ*_t);  R(0) = HP_0H' + S,   (2.9)
R*(t + 1) = R*(t)(I − Γ*_t Γ_t);  R*(0) = R(0),   (2.10)

where the m×m-matrix parameter sequences Γ_t and Γ*_t are defined as

Γ_t = R(t)^{−1} HQ*(t),   (2.11)
Γ*_t = R*(t)^{−1} Q*(t)'H',   (2.12)

and the gain K is given by

K(t) = Q(t) − AR(t).   (2.13)
Proof. Relations (2.9) and (2.10) are the same as (1.8) and (1.9). To obtain (2.7), insert (2.13) and (2.3) into (1.6):

Q(t + 1) − AR(t + 1) = Q(t) − AR(t) − (J − AH)Q*(t)Γ*_t.

By (2.11) the last term can be written JQ*(t)Γ*_t − AR(t)Γ_t Γ*_t, and therefore, in view of (2.9), relation (2.7) follows. Similarly, (1.7), (2.3) and (2.11) give us

Q*(t + 1) = JQ*(t) − AR(t)Γ_t − K(t)Γ_t,

which, by (2.13), is the same as (2.8).

For m = 1 this version of the non-Riccati algorithm is essentially identical to Rissanen's algorithm [19]. (See Remark 4.2 of [14], where this is explained in detail.) For m > 1, the relation is more complicated in that Rissanen's algorithm contains more equations.
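Lemma 2.1 can likewise be checked numerically for a small example; in the sketch below (m = 1, n = 2, with arbitrarily chosen A, G and S, not values from the paper), the gain K(t) = Q(t) − AR(t) recovered from (2.7)-(2.13) is compared against (1.6)-(1.9):

```python
import numpy as np

J = np.array([[0.0, 1.0], [0.0, 0.0]])      # shift matrix J_2 of (2.1)
H = np.array([[1.0, 0.0]])                  # H_{12} of (2.2)
A = np.array([[0.3], [0.1]])                # arbitrary illustrative choice
F = J - A @ H                               # canonical form (2.3); stable here
G = np.array([[0.5], [1.0]])
S = np.array([[1.0]])

P0 = np.zeros((2, 2))                       # stationary start (1.2)
for _ in range(300):
    P0 = F @ P0 @ F.T + G @ G.T

# Reference quantities from (1.6)-(1.9).
K = F @ P0 @ H.T
Qs = K.copy()
R = H @ P0 @ H.T + S
Rs = R.copy()
# Parameter-free quantities from (2.7)-(2.10); A enters only at t = 0.
Q2 = (J - A @ H) @ P0 @ H.T + A @ R
Qs2, R2, Rs2 = Qs.copy(), R.copy(), Rs.copy()

for _ in range(25):
    assert np.allclose(Q2 - A @ R2, K)      # (2.13)
    assert np.allclose(R2, R)
    g = np.linalg.solve(R2, H @ Qs2)        # Gamma_t   of (2.11)
    gs = np.linalg.solve(Rs2, Qs2.T @ H.T)  # Gamma*_t  of (2.12)
    Q2, Qs2, R2, Rs2 = (Q2 - J @ Qs2 @ gs,
                        J @ Qs2 - Q2 @ g,
                        R2 @ (np.eye(1) - g @ gs),
                        Rs2 @ (np.eye(1) - gs @ g))
    W = np.linalg.solve(Rs, Qs.T @ H.T)
    K, Qs, R, Rs = (K - F @ Qs @ W,
                    (F - K @ np.linalg.solve(R, H)) @ Qs,
                    R - H @ Qs @ W,
                    Rs - Qs.T @ H.T @ np.linalg.solve(R, H @ Qs))
```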
Note that the algorithm of Lemma 2.1 actually only contains 2mn + m nontrivial scalar (first-order) recursions. In fact, from (1.3), (1.4), (2.3) and (2.13), we have

$$Q(t) = JP(t)H' + AS,$$
(2.14)
$$R(t) = HP(t)H' + S.$$
(2.15)
It follows from (2.4) that row number ν_i in Q (where ν_i = Σ_{k=1}^i n_k) is constant in t and need not be updated. (This can also be seen from (2.7).) Since there are m such rows, each with m elements, m² equations can be removed from the 2mn + m(m+1) recursions of Lemma 2.1. From (2.14) and (2.15) it is also clear that the components of the n × m-matrix P(t)H' from which R(t) is formed are exactly the ones excluded from (2.14). Similarly,
$$R(t+1) = R(t) - HQ^*(t)\Gamma_t^*$$
(2.16)
and (2.7) show that Q and R are updated by the corresponding components of Q*(t)Γ*_t. This fact will be used in the generating function formulation of Lemma 2.1, to which we now proceed. To this end, define the sets of n_i-dimensional vectors

$$\{q^{ij}(t);\ i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, m\}$$
(2.17)
and
$$\{q^{*ij}(t);\ i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, m\}$$
(2.18)
formed by partitioning Q and Q* in the same way as in (2.6). For these vectors we have the following generating functions

$$Q_t^{ij}(z) = \sum_{k=0}^{n_i} q_k^{ij}(t)z^k,$$
(2.19)
$$Q_t^{*ij}(z) = \sum_{k=0}^{n_i} q_k^{*ij}(t)z^k,$$
(2.20)
i, j = 1, 2, …, m, where we have defined the 0th components of the vectors (2.17) and (2.18) as

$$q_0^{ij}(t) = r_{ij}(t) = \text{element } (i,j) \text{ in } R(t), \tag{2.21}$$

$$q_0^{*ij}(t) = 0. \tag{2.22}$$
Then the following lemma is an immediate consequence of (2.7), (2.8) and (2.16):
Lemma 2.2. The m × m-matrix valued polynomials Q_t(z) and Q*_t(z), with components (2.19) and (2.20), satisfy the following recursions:

$$Q_{t+1}(z) = Q_t(z) - \frac{1}{z}Q_t^*(z)\Gamma_t^*, \tag{2.23}$$

$$Q_{t+1}^*(z) = \frac{1}{z}Q_t^*(z) - Q_t(z)\Gamma_t. \tag{2.24}$$
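In coefficient form, multiplication by 1/z in (2.23)-(2.24) is a downward shift of the coefficient arrays, which stays within polynomials because q*_0 = 0 by (2.22). A minimal sketch for m = 1 (the scalars gam and gams stand for Γ_t and Γ*_t; the function name is ours):

```python
import numpy as np

def lemma22_step(q, qs, gam, gams):
    """One step of (2.23)-(2.24) for m = 1; q and qs hold the coefficients
    q_0,...,q_n and q*_0,...,q*_n, and (1/z) Q*_t(z) shifts them down."""
    qs_over_z = np.append(qs[1:], 0.0)     # coefficients of (1/z) Q*_t(z)
    return q - qs_over_z * gams, qs_over_z - q * gam
```

The 0th component of the updated q reproduces (2.16), since in the canonical form HQ*(t) picks out the coefficient q*_1.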
3. Redundancy in the non-Riccati algorithm
The number of recursions in Lemma 2.1 can be further reduced. In fact, there is an extensive redundancy in the algorithm, manifested in the existence of a number of simple integrals. In the following lemma, which generalizes a previously presented result [14] to the case m > 1, we give a generating function formulation of these relations.

Lemma 3.1. Let Q_t(z) and Q*_t(z), t = 0, 1, 2, …, be the m × m-matrix valued polynomials defined in Section 2. Then there is a constant matrix polynomial B(z) such that

$$Q_t(z)R(t)^{-1}Q_t(1/z)' - Q_t^*(z)R^*(t)^{-1}Q_t^*(1/z)' = \tfrac{1}{2}[B(z) + B(1/z) - B(0)]$$
(3.1)
holds for all t.
(Explicit formulas for the coefficients of B(z) in terms of A and G will be given in Section 5.)

Proof. We need the following relation, the proof of which can be found in [10, Lemma 3.2]:

$$\Gamma_t R^*(t+1)^{-1} = R(t+1)^{-1}\Gamma_t^{*\prime}.$$
(3.2)
By using the recursions of Lemma 2.2, we obtain:

$$Q_{t+1}(z)R(t+1)^{-1}Q_{t+1}(1/z)' - Q_{t+1}^*(z)R^*(t+1)^{-1}Q_{t+1}^*(1/z)'$$
$$= Q_t(z)[I - \Gamma_t\Gamma_t^*]R(t+1)^{-1}Q_t(1/z)' - Q_t^*(z)[I - \Gamma_t^*\Gamma_t]R^*(t+1)^{-1}Q_t^*(1/z)', \tag{3.3}$$
which, in view of (2.9) and (2.10), establishes that the left member of (3.1) is constant in t. Clearly, it must have the form exhibited in the right member of
(3.1). To obtain (3.3), we have used (3.2) three times, whereupon the appropriate cancellations have occurred.

Equation (3.1) provides us with m² generating function relations, the ijth of which contains n_i + n_j + 1 scalar relations, i.e. 2mn + m² in total. However, due to symmetry, most of these occur in pairs, so that we only have mn + ½m(m+1) different relations. (We can only use the functions, say, above the diagonal, and the ith diagonal one only provides n_i + 1 scalar relations.) These relations are highly nonlinear, but as in the scalar case [14], we shall transform them into bilinear forms. However, unlike the scalar case, R* ≠ R, and therefore the simple transformation of [14] is no longer directly applicable, but needs to be modified slightly. To this end, we define
$$\tilde{Q}^*(z) = Q^*(z)R^*(t)^{-T/2}R(t)^{T/2},$$
(3.4)
where R = R^{1/2}R^{T/2} is the Cholesky factorization (see e.g. [21]), i.e. R^{1/2} is lower triangular and R^{T/2} is its transpose. (Some other factorization could also be used.) In Section 6 we shall reformulate the non-Riccati algorithm in terms of Q and Q̃*. Equation (3.1) can now be written:
$$Q_t(z)R(t)^{-1}Q_t(1/z)' - \tilde{Q}_t^*(z)R(t)^{-1}\tilde{Q}_t^*(1/z)' = \tfrac{1}{2}[B(z) + B(1/z) - B(0)],$$
(3.5)
to which we can apply the transformation

$$U_t(z) = [Q_t(z) - \tilde{Q}_t^*(z)]R(t)^{-1},$$
(3.6)
$$V_t(z) = Q_t(z) + \tilde{Q}_t^*(z)$$
(3.7)
to obtain

$$U_t(z)V_t(1/z)' + V_t(z)U_t(1/z)' = B(z) + B(1/z) - B(0),$$
(3.8)
which is the required bilinear form. The m × m-matrix polynomials U_t and V_t have components

$$U^{ij}(z) = \sum_k u_k^{ij}(t)z^k,$$
(3.9)
$$V^{ij}(z) = \sum_k v_k^{ij}(t)z^k. \tag{3.10}$$

Here,
$$u_0^{ij} = \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \tag{3.11}$$

$$v_0^{ij} = r_{ij} = \text{element } (i,j) \text{ in } R. \tag{3.12}$$
Note that the 0th coefficient of the matrix polynomial U_t(z) is the identity matrix, so that the only time-varying coefficients of U^{ij}(z) are the ones collected in the n_i-dimensional (column) vector

$$u^{ij} = (u_1^{ij}, u_2^{ij}, \ldots, u_{n_i}^{ij})'. \tag{3.13}$$
In analogy to (2.6) we shall arrange the vectors u^{ij} in the n × m-matrix

$$U = \begin{pmatrix} u^{11} & u^{12} & u^{13} & \cdots & u^{1m} \\ u^{21} & u^{22} & u^{23} & \cdots & u^{2m} \\ \vdots & & & & \vdots \\ u^{m1} & u^{m2} & u^{m3} & \cdots & u^{mm} \end{pmatrix}. \tag{3.14}$$
The vectors v^{ij} and the matrix V are defined in the same manner. The mn + ½m(m+1) equations of (3.8) can now be used to solve for the mn + ½m(m+1) components of V(t) and R(t) in terms of the mn components of U(t). In Section 7 we shall thus be able to present an algorithm for U to replace the present non-Riccati equations. This algorithm will of course only contain mn scalar (first-order) equations.

Identifying coefficients of z^i in (3.8) we obtain

$$\sum_{k=1}^{m}\sum_{l=0}^{\nu}\left[u_{l-i}^{\alpha k}v_l^{\beta k} + u_{l+i}^{\beta k}v_l^{\alpha k}\right] = b_i^{\alpha\beta}, \tag{3.15}$$

where i = −n_α, −n_α+1, …, 0, …, n_β−1, n_β (if α ≠ β) or i = 0, 1, …, n_α (if α = β). [Note that (3.9) and (3.10) define u_k^{ij} and v_k^{ij} to be zero whenever k < 0 or k > n_i. We have chosen the upper limit ν of the second summation of (3.15) to be greater than all n_i.]

Let us now define the rectangular Toeplitz matrix function (the two first arguments of which exhibit its dimension)
$$T(i, j, x_{-i}, x_{-i+1}, \ldots, x_j) = \begin{pmatrix} x_0 & x_1 & x_2 & \cdots & x_j \\ x_{-1} & x_0 & x_1 & \cdots & x_{j-1} \\ x_{-2} & x_{-1} & x_0 & \cdots & x_{j-2} \\ \vdots & & & & \vdots \\ x_{-i} & x_{-i+1} & x_{-i+2} & \cdots & x_{j-i} \end{pmatrix} \tag{3.16}$$
and the Hankel matrix function

$$H(i, j, x_0, x_1, \ldots, x_{i+j}) = \begin{pmatrix} x_0 & x_1 & x_2 & \cdots & x_j \\ x_1 & x_2 & x_3 & \cdots & x_{j+1} \\ x_2 & x_3 & x_4 & \cdots & x_{j+2} \\ \vdots & & & & \vdots \\ x_i & x_{i+1} & x_{i+2} & \cdots & x_{i+j} \end{pmatrix} \tag{3.17}$$
in terms of which we can write (3.15) as

$$\sum_{k=1}^{m}\left\{T(n_i + n_j,\, n_j,\, 0',\, \delta_{jk},\, u^{jk},\, 0')\begin{pmatrix} r_{jk} \\ v^{jk} \end{pmatrix} + H(n_i + n_j,\, n_i,\, 0',\, \delta_{ik},\, u^{ik},\, 0')\begin{pmatrix} r_{ik} \\ v^{ik} \end{pmatrix}\right\} = b^{ij} \tag{3.18}$$

for j = i+1, i+2, …, m; i = 1, 2, …, m, and

$$\sum_{k=1}^{m}\left\{T(n_i,\, n_i,\, 0',\, \delta_{ik},\, u^{ik}) + H(n_i,\, n_i,\, \delta_{ik},\, u^{ik},\, 0')\right\}\begin{pmatrix} r_{ik} \\ v^{ik} \end{pmatrix} = b^i \tag{3.19}$$

for i = j. Here 0_k is the k-dimensional zero vector, δ_{jk} is defined by (3.11), b^{ij} is the (n_i + n_j + 1)-dimensional vector

$$b^{ij} = (b_{-n_i}^{ij}, b_{-n_i+1}^{ij}, \ldots, b_{n_j}^{ij})', \tag{3.20}$$

and b^i the (n_i + 1)-dimensional vector

$$b^i = (b_0^{ii}, b_1^{ii}, b_2^{ii}, \ldots, b_{n_i}^{ii})'. \tag{3.21}$$
(Note that, in (3.18)-(3.19), 0_k and u^{ik} are arrays of arguments.) The ½m(m−1) vector relations (3.18) and the m vector relations (3.19) together constitute a system of mn + ½m(m+1) linear equations in the mn + ½m(m+1) components of V and R. If the coefficient matrix has full rank, we can solve this system to obtain R and V in terms of U:

$$R(t) = M_0(U(t)),$$
(3.22)
$$V(t) = M(U(t)). \tag{3.23}$$
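For concreteness, the Toeplitz and Hankel matrix functions (3.16)-(3.17), from which the coefficient matrix of the linear system (3.18)-(3.19) is assembled, can be generated directly from their argument lists. A sketch in Python, where the flat array x holds x_{−i}, …, x_j for T and x_0, …, x_{i+j} for H (the function names follow the paper's notation):

```python
import numpy as np

def T(i, j, x):
    """Rectangular Toeplitz matrix (3.16): entry (r, c) is x_{c-r}; x[k + i] holds x_k."""
    return np.array([[x[c - r + i] for c in range(j + 1)] for r in range(i + 1)])

def H(i, j, x):
    """Rectangular Hankel matrix (3.17): entry (r, c) is x_{r+c}."""
    return np.array([[x[r + c] for c in range(j + 1)] for r in range(i + 1)])
```

Both matrices are (i+1) × (j+1), as the first two arguments indicate.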
A sufficient condition for this to be the case is provided by the following theorem, the proof of which will be given in Section 4.

Theorem 3.2. Let (F, H) be given by (2.3)-(2.4), and assume that (F, G) is controllable. Let C be any matrix such that S = CC', and suppose there is a matrix X such that
$$CX = H. \tag{3.24}$$
Then, for each fixed t = 0, 1, 2, …, the system (3.18)-(3.19) of linear equations has a unique solution (3.22)-(3.23).
In particular, condition (3.24) is satisfied if S is nonsingular.

Remark 3.3. The Toeplitz-Hankel structure of the system (3.18)-(3.19) permits the derivation of a fast algorithm for the solution of the system. Hence this latter task should not be quite as laborious as it first appears.

Remark 3.4. To more explicitly bring out the similarities with the exposition in [14] we may instead use the following formalism: Let n̄ = max(n_1, n_2, …, n_m), and define the m × m-matrices

$$U_i = \begin{pmatrix} u_i^{11} & u_i^{12} & \cdots & u_i^{1m} \\ u_i^{21} & u_i^{22} & \cdots & u_i^{2m} \\ \vdots & & & \vdots \\ u_i^{m1} & u_i^{m2} & \cdots & u_i^{mm} \end{pmatrix}, \qquad i = 0, 1, 2, \ldots. \tag{3.25}$$

The matrices V_i are defined analogously. Clearly U_i = V_i = 0 for i > n̄, and by (3.11) and (3.12)

$$U_0 = I,$$
(3.26)
$$V_0(t) = R(t).$$
(3.27)
In fact, U_i(t) and V_i(t) are the coefficients of the matrix polynomials U_t(z) and V_t(z):

$$U_t(z) = \sum_{i=0}^{\bar{n}} U_i(t)z^i, \tag{3.28}$$

$$V_t(z) = \sum_{i=0}^{\bar{n}} V_i(t)z^i, \tag{3.29}$$
which inserted into (3.8) yields

$$\sum_{j=0}^{\bar{n}}[U_{j+i}V_j' + V_jU_{j+i}'] = B_i \tag{3.30}$$

for i = 0, 1, …, n̄, after identifying coefficients of z^i. This is as far as we can
bring the analogy with the scalar case, for in order to be able to write the left member of (3.30) as the sum of a block Toeplitz and a block Hankel system, we would need to have the second term transposed. Relation (3.30) is of course identical to (3.15), and we shall have to proceed as above.

Remark 3.5. Precisely as in the scalar case [14], some components of Q and Q*, and hence of U, will be zero when the polynomial B(z) satisfies certain conditions, which however are somewhat more complicated in the multivariate case. For example, if S = 0, q_{n_i}^{ij} ≡ 0 by (2.14); then, in view of (2.8), q_{n_i}^{*ij}(t) ≡ 0 whenever t ≥ 1; hence u_{n_i}^{ij}(t) ≡ 0, too, for the same values of t. Therefore the number of equations can be further reduced.
4. Proof of Theorem 3.2
Lemma 4.1. Let (F, H) be given by (2.3)-(2.4), and let {P(t); t = 0, 1, 2, …} be the solution of (1.1)-(1.2). Then, for each fixed t = 0, 1, 2, …, P(t) satisfies the Liapunov equation

$$P = (J - UH)P(J - UH)' + (U - A)S(U - A)' + GG' \tag{4.1}$$

where U = U(t) is defined by (3.6).

Proof. From (3.6) and (3.4) we have
$$Q^*(R^*)^{-T/2}R^{T/2} = K - (U - A)R.$$
(4.2)
Then eliminating P(t+1) and Q*(t) between (1.1), (1.5) and (4.2) we have

$$P = [F - (U - A)H]P[F - (U - A)H]' + (U - A)S(U - A)' + GG',$$

which is the same as (4.1).

The following lemma is a discrete-time version of a result due to Wonham [24, p. 289]. We shall use the same idea of proof.

Lemma 4.2. Assume that the conditions of Theorem 3.2 hold. Then, for each fixed t = 0, 1, 2, …,

$$J - U(t)H \tag{4.3}$$

is a stability matrix, i.e. all its eigenvalues have moduli less than one.

Proof. Let (4.3) be denoted F̄(t), and let G̃ be defined by
$$\tilde{G}\tilde{G}' = (U - A)S(U - A)' + GG'. \tag{4.4}$$

Since (F, G) is controllable, (F − (U − A)CX, G) is controllable for each matrix X of suitable dimension [24, Lemma 4.1]. Choose X to be a solution of (3.24). Then it follows that (F̄, G̃) is controllable. The Liapunov equation (4.1) can be written

$$P = \bar{F}P\bar{F}' + \tilde{G}\tilde{G}', \tag{4.1'}$$

or, by successive iterations,

$$P = \bar{F}^sP(\bar{F}^s)' + \sum_{i=0}^{s-1}(\bar{F}^i\tilde{G})(\bar{F}^i\tilde{G})', \tag{4.5}$$

which holds for all s = 0, 1, 2, …. Now (4.1)', and hence (4.5), has a solution which is nonnegative definite, namely the Riccati matrix P(t). Hence

$$\sum_{i=0}^{s-1}(\bar{F}^i\tilde{G})(\bar{F}^i\tilde{G})' \leq P(t)$$
for s = 0, 1, 2, … (where ≤ is defined with respect to the cone of nonnegative definite matrices). Consequently F̄ must be a stability matrix.

Lemma 4.3. Let Y be an n × m-matrix, and let Y(z) be the m × m-matrix polynomial formed from Y in analogy with (3.28). Then
$$Y(z)X(1/z)' + X(z)Y(1/z)' = B(z) + B(1/z) - B(0)$$
(4.6)
has a unique m × m-matrix polynomial solution X(z) (of degree ≤ n̄) for each matrix polynomial B(z) of degree ≤ n̄, if and only if the Liapunov equation

$$P = (J - YH)P(J - YH)' + C \tag{4.7}$$
has a unique solution P for each symmetric matrix C.

We postpone the proof of this lemma to Section 5.

Proof of Theorem 3.2. Since (4.3) is a stability matrix (Lemma 4.2), (4.7) with Y = U(t) has a unique solution [4]. Hence (3.8), and consequently (3.18)-(3.19) too, has a unique solution (Lemma 4.3).

The following is a corollary of Lemma 4.2, which we shall need in Section 8.
Corollary 4.4. Assume that the conditions of Theorem 3.2 hold. Let P be a nonnegative definite solution of the Liapunov equation

$$P = (J - UH)P(J - UH)' + (U - A)S(U - A)' + GG' + C, \tag{4.8}$$

where C is nonnegative definite. Then J − UH is a stability matrix.
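The role of Lemma 4.2 can be illustrated numerically: when the matrix (4.3) is a stability matrix, the Liapunov equation (4.7) has a unique solution, which successive substitution reaches from any starting point. A sketch with a hypothetical stable choice of Y in the canonical form (n = 3, m = 1; the numerical values are our own illustration):

```python
import numpy as np

n = 3
J = np.diag(np.ones(n - 1), 1)                  # up-shift block J_n
H = np.zeros((1, n)); H[0, 0] = 1.0
Y = np.array([[0.4], [0.03], [0.0]])            # hypothetical column playing the role of U(t)
Fbar = J - Y @ H                                # the matrix (4.3)
assert max(abs(np.linalg.eigvals(Fbar))) < 1.0  # all eigenvalue moduli below one

C = np.eye(n)                                   # any symmetric right-hand side
P = np.zeros((n, n))
for _ in range(300):                            # successive substitution for (4.7)
    P = Fbar @ P @ Fbar.T + C
```

After convergence, P satisfies (4.7) to machine precision; without the stability property the iteration would diverge.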
5. Redundancy in the Riccati equation
Partition the Riccati matrix P(t) into m² blocks in the following way:

$$P = \begin{pmatrix} P^{11} & P^{12} & \cdots & P^{1m} \\ P^{21} & P^{22} & \cdots & P^{2m} \\ \vdots & & & \vdots \\ P^{m1} & P^{m2} & \cdots & P^{mm} \end{pmatrix}, \tag{5.1}$$
where P^{ij} is an n_i × n_j-matrix with components p_{kl}^{ij} (k = 1, 2, …, n_i; l = 1, 2, …, n_j). Define D^{ij} and d_{kl}^{ij} by a similar partitioning of D = GG'. Moreover, introduce the constant n × m-matrix Λ = AS, and define the n_i-dimensional vectors λ^{ij} by a partitioning of type (2.6).

Lemma 5.1. For each fixed t, P^{αβ}(t), U(t), V(t) and R(t) are related in the following way:

$$\tfrac{1}{2}\sum_{k=1}^{m}(u_i^{\alpha k}v_j^{\beta k} + v_i^{\alpha k}u_j^{\beta k}) = p_{i+1,j+1}^{\alpha\beta} - p_{ij}^{\alpha\beta} + \sum_{k=1}^{m}\lambda_i^{\alpha k}a_j^{\beta k} + d_{ij}^{\alpha\beta} \tag{5.2}$$

for i = 0, 1, …, n_α; j = 0, 1, 2, …, n_β, provided p_{ij}^{αβ} and d_{ij}^{αβ} are defined to be zero whenever i < 1, i > n_α, j < 1 or j > n_β, and a_0^{ij} = δ_{ij}.
Proof. Insert P(t+1) as given by (1.5) into the Riccati equation (1.1), and use (1.3) and (1.4) to obtain

$$P - Q^*(R^*)^{-1}Q^{*\prime} = FPF' - KR^{-1}K' + D, \tag{5.3}$$

which, in view of (2.3), (2.13) and (1.4), can be written

$$QR^{-1}Q' - Q^*(R^*)^{-1}Q^{*\prime} = JPJ' - P + ASA' + D. \tag{5.4}$$

By employing transformations (3.4) and (3.6)-(3.7), the latter of which come out as
$$U = (Q - \tilde{Q}^*)R^{-1}, \tag{5.5}$$

$$V = Q + \tilde{Q}^* \tag{5.6}$$

in matrix form, to (5.4), we obtain

$$\tfrac{1}{2}(UV' + VU') = JPJ' - P + \Lambda A' + D,$$

the αβth block of which is

$$\tfrac{1}{2}\sum_{k=1}^{m}[u^{\alpha k}v^{\beta k\prime} + v^{\alpha k}u^{\beta k\prime}] = J_{n_\alpha}P^{\alpha\beta}J_{n_\beta}' - P^{\alpha\beta} + \sum_{k=1}^{m}\lambda^{\alpha k}a^{\beta k\prime} + D^{\alpha\beta}. \tag{5.7}$$

Moreover, (2.14) together with (5.5)-(5.6) yields

$$\tfrac{1}{2}(UR + V) = JPH' + \Lambda,$$

the αβth vector (by type (2.6) partitioning) of which is

$$\tfrac{1}{2}\left(\sum_{k=1}^{m}u^{\alpha k}r_{k\beta} + v^{\alpha\beta}\right) = J_{n_\alpha}P^{\alpha\beta}e_1 + \lambda^{\alpha\beta}, \tag{5.8}$$

where e_1 is the first unit vector. Together, relations (5.7) and (5.8) constitute (5.2). (Equation (5.8) provides the part for which i or j is zero.)

Lemma 5.2. The ijth component of P^{αβ}(t) can be expressed in terms of U(t), V(t) and R(t) through the relation
$$p_{ij}^{\alpha\beta} = \sum_{l=i}^{n_\alpha}\left[\sum_{k=1}^{m}\lambda_l^{\alpha k}a_{l+j-i}^{\beta k} + d_{l,l+j-i}^{\alpha\beta} - \tfrac{1}{2}\sum_{k=1}^{m}\left(u_l^{\alpha k}v_{l+j-i}^{\beta k} + v_l^{\alpha k}u_{l+j-i}^{\beta k}\right)\right], \tag{5.9}$$
which holds for all t. Here a_l^{αβ} and λ_l^{αβ} are defined to be zero for l < 0 and l > n_α.

Proof. Since

$$p_{ij}^{\alpha\beta} = \sum_{l=i}^{n_\alpha}\left(p_{l,l+j-i}^{\alpha\beta} - p_{l+1,l+j-i+1}^{\alpha\beta}\right),$$
summing (5.2) in the appropriate fashion will yield the desired result.

Remark 5.3. Lemma 5.2 together with (3.22) and (3.23) defines a function N such that

$$P(t) = N(U(t)) \qquad \text{for all } t. \tag{5.10}$$
By putting j = 0 in (5.9) we obtain an alternative derivation of (3.15) and an expression for b_i^{αβ} in terms of A, Λ and D:

$$b_i^{\alpha\beta} = 2\sum_{l=0}^{\nu}\left[\sum_{k=1}^{m}\lambda_l^{\alpha k}a_{l+i}^{\beta k} + d_{l,l+i}^{\alpha\beta}\right]. \tag{5.11}$$
Proof of Lemma 4.3. Rewrite (4.7) as

$$JPJ' - P = YX' + XY' + C \tag{5.12}$$

where

$$X = JPH' - \tfrac{1}{2}YHPH'. \tag{5.13}$$

Now, following the same procedure as in the proof of Lemma 5.1, these relations can be written

$$p_{i+1,j+1}^{\alpha\beta} - p_{ij}^{\alpha\beta} = \sum_{k=1}^{m}\left[y_i^{\alpha k}x_j^{\beta k} + x_i^{\alpha k}y_j^{\beta k}\right] + c_{ij}^{\alpha\beta}, \tag{5.14}$$

i = 1, 2, …, n_α, j = 1, 2, …, n_β, and

$$x_i^{\alpha\beta} = p_{i+1,1}^{\alpha\beta} - \tfrac{1}{2}\sum_{k=1}^{m}y_i^{\alpha k}p_{11}^{k\beta}, \tag{5.15}$$

i = 1, 2, …, n_α, respectively. Here we have defined p_{n_α+1,j} = p_{i,n_β+1} = 0. Moreover, if we take y_0^{ij} = δ_{ij} and define p_{ij}^{αβ} and c_{ij}^{αβ} to be zero whenever i or j is zero, it is easy to see that, in view of (5.15), (5.14) holds for i = 0 or j = 0 also. Then multiplying (5.14) by z^{i−j} and summing over i = 0, 1, …, n_α and j = 0, 1, …, n_β, we obtain (4.6) for a suitable choice of B(z). Clearly P can be uniquely determined from X and vice versa. This can be seen from (5.14) and (5.15). Hence (4.6) will have a unique solution if and only if (4.7) has one.
6. An algorithm in Q and Q̃*

The coefficients of the matrix polynomial Q̃*(z), defined by (3.4), can, in the usual manner, be collected in the n × m-matrix

$$\tilde{Q}^*(t) = Q^*(t)R^*(t)^{-T/2}R(t)^{T/2}. \tag{6.1}$$

As a preliminary for Section 7, we shall reformulate the equations of Lemma 2.1 in terms of Q, Q̃* and R. To this end, define the m × m-matrix valued function L of two m × m-matrix variables, X and Y, as the Cholesky factor [21] of
$$X - YX^{-1}Y'$$
(6.2)
that is, the (unique) lower triangular m × m-matrix L(X, Y) such that

$$L(X, Y)L(X, Y)' = X - YX^{-1}Y'. \tag{6.3}$$
Clearly, L is defined if X is symmetric and nonsingular and (6.2) is positive definite. This will be the case in the applications below.

Lemma 6.1. The Cholesky factors R^{1/2} and R^{*1/2} of R and R* respectively satisfy the recursions

$$R(t+1)^{1/2} = L(R(t), H\tilde{Q}^*(t)), \tag{6.4}$$

$$R^*(t+1)^{1/2} = R^*(t)^{1/2}R(t)^{-1/2}L(R(t), \tilde{Q}^*(t)'H'). \tag{6.5}$$
Proof. Using (6.1), we can rewrite (1.8) as

$$R(t+1) = R(t) - H\tilde{Q}^*(t)R(t)^{-1}\tilde{Q}^*(t)'H', \tag{6.6}$$

which, by (6.3), provides us with the first recursion (6.4). Similarly, (6.5) is obtained by first writing (1.9) as

$$R^*(t+1) = R^*(t)^{1/2}R(t)^{-1/2}[R(t) - \tilde{Q}^*(t)'H'R(t)^{-1}H\tilde{Q}^*(t)]R(t)^{-T/2}R^*(t)^{T/2},$$

and then again using the definition (6.3).

Note that the product of lower triangular matrices is lower triangular, and that the Cholesky factorization of a positive definite matrix is unique. Then a straightforward application of Lemma 6.1 to the three first matrix recursions of Lemma 2.1 establishes the following lemma.

Lemma 6.2. The matrix sequences Q, Q̃* and R are determined by the matrix recursions

$$Q(t+1) = Q(t) - J\tilde{Q}^*(t)R(t)^{-1}\tilde{Q}^*(t)'H',$$
(6.7)
$$\tilde{Q}^*(t+1) = [J - Q(t)R(t)^{-1}H]\tilde{Q}^*(t)[L(R(t), \tilde{Q}^*(t)'H')^{-1}]'L(R(t), H\tilde{Q}^*(t))', \tag{6.8}$$

$$R(t+1) = R(t) - H\tilde{Q}^*(t)R(t)^{-1}\tilde{Q}^*(t)'H',$$

with initial conditions Q(0) = FP_0H' + AR(0), Q̃*(0) = FP_0H' and R(0) = HP_0H' + S. Since the m² components q_{ν_i}^{ij} are constant and R is symmetric, this
algorithm contains 2mn − ½m(m−1) nontrivial scalar (first-order) difference equations.

Remark 6.3. By applying Lemma 6.1 to the original algorithm [10] or to the modified ones [13], we obtain 2mn equations precisely as in the continuous-time case [6, 7, 11, 12].

7. The reduced-order non-Riccati algorithm
We are now in a position to construct a system of mn first-order difference equations to replace the ½n(n+1) of the matrix Riccati equation (1.1).

Lemma 7.1. The n × m-matrix sequence U, defined by (5.5), satisfies the matrix recursion

$$U(t+1) = \phi(U(t), V(t), R(t)); \qquad U(0) = A. \tag{7.1}$$

Here the function φ : R^{n×m} × R^{n×m} × R^{m×m} → R^{n×m} is defined as

$$\phi(U, V, R) = \psi(\tfrac{1}{2}V + \tfrac{1}{2}UR,\ \tfrac{1}{2}V - \tfrac{1}{2}UR,\ R), \tag{7.2}$$

where

$$\psi(Q, \tilde{Q}^*, R) = \left[Q - J\tilde{Q}^*R^{-1}\tilde{Q}^{*\prime}H' - (J\tilde{Q}^* - QR^{-1}H\tilde{Q}^*)[L(R, \tilde{Q}^{*\prime}H')^{-1}]'L(R, H\tilde{Q}^*)'\right][L(R, H\tilde{Q}^*)^{-1}]'L(R, H\tilde{Q}^*)^{-1}. \tag{7.3}$$

The function L is given by (6.3).

Proof. The lemma is an immediate consequence of (6.7), (6.8), (6.4), and (5.5) and (5.6).

So far we have assumed that (H, F) has the special canonical form (2.3)-(2.4). This is no restriction. Indeed, we have the following well-known result.

Lemma 7.2. Let (H, F) be observable. Then there is an n × n matrix T such that T^{-1}FT equals (2.3) and HT equals (2.4).
We refer the reader to the literature (e.g. [16]) for a proof of this lemma and for an explicit formula for T in terms of H and F.
Theorem 7.3. Let (H, F) be observable and (F, G) controllable. Suppose that H = CX, where CC' = S, for some matrix X. Let {U(t); t = 0, 1, 2, …} be the solution of the first-order n × m matrix recursion

$$U(t+1) = \phi(U(t), M(U(t)), M_0(U(t))); \qquad U(0) = A, \tag{7.4}$$

where M_0 and M are the (well-defined) transformations (3.22) and (3.23) evaluated as in Section 3 with G exchanged for T^{-1}G. Then the solution of (1.1)-(1.2) is given by

$$P(t) = TN(U(t))T' \tag{7.5}$$

where N is the function (5.10) evaluated with D = T^{-1}G(T^{-1}G)'. The gain matrices (1.3) and (1.4) are given by

$$K(t) = \tfrac{1}{2}[U(t)R(t) + M(U(t))] - AR(t), \tag{7.6}$$

$$R(t) = M_0(U(t)). \tag{7.7}$$

Proof. Since (H, F) is observable, we can apply the transformation of Lemma 7.2. It is easy to see that (1.1)-(1.4) hold with P, P_0, G, F, H and K exchanged by T^{-1}P(T')^{-1}, T^{-1}P_0(T')^{-1}, T^{-1}G, (2.3), (2.4) and T^{-1}K respectively. Then the problem is of the form considered in Sections 2-6. By Theorem 3.2 the coefficient matrix of the linear system (3.18)-(3.19) is nonsingular. Hence the functions M, M_0 and N are well-defined for the appropriate values of U. Then (7.5) follows from (5.10), (7.6) from (2.13), (3.23) and (5.5)-(5.6), and (7.7) from (3.22).

Hence, only mn first-order difference equations are needed to solve the matrix Riccati equation. If the system parameters satisfy certain additional conditions (e.g. S = 0) as indicated in Remark 3.5, the order of the algorithm can be further reduced. However, it remains to investigate whether there is a computationally efficient procedure based on this result. It seems that further simplifications are needed to support such a claim.

8. The nonstationary case
Once the structure of the non-Riccati algorithm (1.6)-(1.9) is known, it is not hard to derive the factorization (1.5) directly from the Riccati equation. In fact, as demonstrated in [18], this basically amounts to showing that such a
factorization holds for the initial time. For t = 0 the Riccati equation (1.1) reads:

$$P(1) - P(0) = FP_0F' - P_0 + GG' - FP_0H'(HP_0H' + S)^{-1}HP_0F'. \tag{8.1}$$

Since the Liapunov equation (1.2) holds, (8.1) yields

$$P(1) - P(0) = -Q^*(0)R^*(0)^{-1}Q^*(0)', \tag{8.2}$$
where Q*(0) and R*(0) are defined by (1.10) and (1.11). Now, if we remove the constraint (1.2), the three first terms in the right member of (8.1) will no longer sum to zero, but we can nevertheless factor (8.1) in the manner of (8.2), by some suitable numerical procedure, to obtain the n × ν-matrix Q*(0) and the ν × ν-matrix R*(0). Therefore the recursions (1.6)-(1.9) will still hold with the new initial conditions, although, in general, ν ≠ m. The size of ν, which is bounded by the rank of (8.1), determines the number of equations in the algorithm, for K is n × m, Q* is n × ν, R is m × m and R* is ν × ν. This is, in essence, Kailath's, Morf's, and Sidhu's extension of the algorithm first presented in [10]. Whenever ν is small we shall find it useful.

We shall now investigate to what extent the analysis of Sections 2-7 can be modified to the situation where the Liapunov condition (1.2) no longer holds. To this end, first note that Lemma 2.1 still holds (except for the initial conditions), but that the parameter matrices Γ_t and Γ*_t are now rectangular with dimensions m × ν and ν × m respectively. Similarly, Q*_t(z) is now m × ν, whereas Q_t(z) retains its dimension (m × m). These matrix polynomials satisfy the recursions (2.23) and (2.24), and therefore (3.1) is valid. However, in trying to solve the integrals contained in (3.1), the situation is complicated by the fact that, in general, R* is no longer nonnegative definite. Therefore we shall begin by discussing the situation where it is.

Theorem 8.1. Let (H, F) be observable and (F, G) controllable. Let C be any matrix such that S = CC', and suppose that the equation CX = H has a matrix solution X. Assume that (8.1) is nonpositive definite and has rank ν. Let {P(0), P(1), P(2), …} be the solution of the Riccati equation (1.1). Then there is a system (1.12) of first-order difference equations, with N = max(mn, νn), and a matrix-valued function Π such that P(t) = Π(x_1(t), x_2(t), …, x_N(t)) for t = 0, 1, 2, ….

Proof. Since (H, F) is observable, we may without restriction assume that (H, F) has the form (2.3)-(2.4). The matrix inversion lemma can be used to reformulate (1.9) as
$$R^*(t+1)^{-1} = R^*(t)^{-1} + R^*(t)^{-1}Q^*(t)'H'R(t+1)^{-1}HQ^*(t)R^*(t)^{-1}$$

(cf. [10]). Therefore, since (8.1) is nonpositive definite and consequently R*(0)^{-1} is positive definite, R*(t)^{-1} will be positive definite for all t. In our subsequent analysis we shall distinguish between two cases.

First assume that ν ≥ m. Then define the matrix sequences Q̃* and Q̂* by the partitioning

$$Q^*(R^*)^{-T/2} = (\tilde{Q}^*R^{-T/2}, \hat{Q}^*), \tag{8.3}$$

where Q̃* is n × m and Q̂* is n × (ν − m). Now the counterpart of (5.3) can be written

$$P - \tilde{Q}^*R^{-1}\tilde{Q}^{*\prime} = FPF' - KR^{-1}K' + D + \hat{Q}^*\hat{Q}^{*\prime}. \tag{8.4}$$

Then defining U and V by (5.5)-(5.6), we obtain in analogy with Lemma 4.1

$$P = (J - UH)P(J - UH)' + (U - A)S(U - A)' + GG' + \hat{Q}^*\hat{Q}^{*\prime}. \tag{8.5}$$

Moreover, with obvious modifications of B, which will now depend on Q̂*, (3.8) holds. In view of (8.5), J − UH is a stability matrix (Corollary 4.4), and consequently (3.8) has a unique solution (Lemma 4.3), for (4.7) with Y = U has one. Hence (3.8) can be solved for V and R in terms of U and Q̂*. Precisely as in Sections 6-7 we can now construct a recursion in the n × ν-matrix sequence (U, Q̂*). Then, in the manner of Section 5, P(t) can be expressed in terms of (U(t), Q̂*(t)) by a function Π which does not depend on t.

Next consider the case ν < m. Now define the n × m-matrix sequence

$$\tilde{Q}^* = (Q^*(R^*)^{-T/2}, 0)R^{T/2}. \tag{8.6}$$

Then (5.3) will hold, and therefore the formal analysis will be the same as in Sections 3-5. In particular (4.1) will hold and therefore (3.8) will have a unique solution. Hence (3.8) can be solved for V and R in terms of U. Consequently the first-order difference equation in the n × m-matrix sequence U, constructed as in Sections 6-7, will be the required recursion.

Remark 8.2. If, in the case m > ν (of the proof above), we introduce the partitionings U = (Ū, Û) and V = (V̄, V̂), where Û and V̂ are n × ν, we see that V̄ = Ū. Since V̄ was solved in terms of U, we have a system of (m − ν)n integrals (of the mn first-order difference equations in U) which have not been exploited. At present we do not know how to reduce the system further.
Next let us consider the general case where (8.1) is not necessarily nonpositive definite. Since it is clearly possible to modify the analysis of Section 6 (by writing Q*(R*)^{-1}Q*' as XX' − YY' and handling the X- and Y-terms separately) to obtain (m + ν)n + ½m(m+1) scalar recursions in the components of Q, Q̃ (properly defined) and R (in analogy with Lemma 6.2), and since (3.1) provides mn + ½m(m+1) integrals, we conjecture that nν first-order equations are sufficient for determining the solution of (1.1). So far we have only been able to prove this for the case that (8.1) is nonpositive definite and ν ≥ m (Theorem 8.1), which includes the stationary case. To tackle the general case we shall have to apply some transformation other than the bilinear U-V-transformation of this paper.
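The matrix-inversion-lemma reformulation of (1.9) invoked at the start of the proof of Theorem 8.1 is an instance of the Sherman-Morrison-Woodbury identity, and can be checked numerically. A sketch with hypothetical positive definite R, R* and a small coupling matrix W standing in for HQ*(t):

```python
import numpy as np

rng = np.random.default_rng(0)
m, nu = 2, 3
R = 2.0 * np.eye(m)                         # hypothetical R(t) > 0
Rs = 2.0 * np.eye(nu)                       # hypothetical R*(t) > 0
W = 0.3 * rng.standard_normal((m, nu))      # stands in for H Q*(t)

R_next = R - W @ np.linalg.solve(Rs, W.T)   # R(t+1)
Rs_next = Rs - W.T @ np.linalg.solve(R, W)  # R*(t+1)

# R*(t+1)^{-1} = R*^{-1} + R*^{-1} Q*' H' R(t+1)^{-1} H Q* R*^{-1}
lhs = np.linalg.inv(Rs_next)
rhs = (np.linalg.inv(Rs)
       + np.linalg.solve(Rs, W.T) @ np.linalg.solve(R_next, W) @ np.linalg.inv(Rs))
assert np.allclose(lhs, rhs)
```

The coupling W is kept small so that both updated matrices remain positive definite, as assumed in the proof.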
Acknowledgment

The author would like to thank Professor L.E. Zachrisson, who read the first version of this paper and supplied some valuable suggestions for its revision.

Note added in proof. As a closer study of [24] will reveal, the proof of Lemma 4.2 remains valid if the condition (i) that (F, G) is controllable is replaced by the weaker condition (ii) that (F, G) is stabilizable. Hence condition (i) can be deleted in Theorems 3.2 and 7.3 (for F is stable) and replaced by (ii) in Theorem 8.1.
References

[1] N.I. Akhiezer, The classical moment problem (Hafner, New York, 1965).
[2] R.W. Brockett, Finite dimensional linear systems (Wiley, New York, 1970).
[3] J. Casti, R. Kalaba and V.K. Murthy, "A new initial value method for on-line filtering and estimation", IEEE Transactions on Information Theory (1972) 515-517.
[4] F.R. Gantmacher, The theory of matrices, Vol. 1 (Chelsea, New York, 1959).
[5] L. Ya. Geronimus, Orthogonal polynomials (Consultants Bureau, New York, 1961).
[6] T. Kailath, "Some Chandrasekhar-type algorithms for quadratic regulators", in: Proceedings of the IEEE decision and control conference, Dec. 1972.
[7] T. Kailath, "Some new algorithms for recursive estimation in constant linear systems", IEEE Transactions on Information Theory IT-19 (1973) 750-760.
[8] T. Kailath, M. Morf and S. Sidhu, "Some new algorithms for recursive estimation in constant discrete-time linear systems", in: Proceedings of the seventh Princeton symposium on information and system science, March 1973.
[9] N. Levinson, "The Wiener RMS (root mean square) error in filter design and prediction", Appendix B in: N. Wiener, Extrapolation, interpolation and smoothing of stationary time series (M.I.T. Press, Cambridge, Mass., 1942).
[10] A. Lindquist, "A new algorithm for optimal filtering of discrete-time stationary processes", SIAM Journal on Control 12 (1974) 736-746.
[11] A. Lindquist, "Optimal filtering of continuous-time stationary processes by means of the backward innovation process", SIAM Journal on Control 12 (1974) 747-754.
[12] A. Lindquist, "On Fredholm integral equations, Toeplitz equations and Kalman-Bucy filtering", International Journal of Applied Mathematics and Optimization 1 (4) (1975) 355-373.
[13] A. Lindquist, "Linear least squares estimation of discrete-time stationary processes by means of backward innovations", in: Proceedings of the international symposium on control theory, numerical methods and computer systems modelling, Rocquencourt, France, June 1974, Lecture Notes in Economics and Mathematical Systems, Vol. 107 (Springer, Berlin, 1975) pp. 44-63.
[14] A. Lindquist, "Some reduced-order non-Riccati equations for linear least squares estimation: The stationary, single-output case", International Journal of Control, to appear.
[15] A. Lindquist, "Some new non-Riccati algorithms for continuous-time Kalman-Bucy filtering", Applied Mathematics and Optimization, to appear.
[16] D.G. Luenberger, "Canonical forms for linear multivariable systems", IEEE Transactions on Automatic Control AC-12 (1967) 290-293.
[17] Z. Luo and T.E. Bullock, "Discrete Kalman filtering using a generalized companion form", IEEE Transactions on Automatic Control (1975) 227-230.
[18] M. Morf, G.S. Sidhu and T. Kailath, "Some new algorithms for recursive estimation in constant, linear, discrete-time systems", IEEE Transactions on Automatic Control AC-19 (1974) 315-323.
[19] J. Rissanen, "A fast algorithm for optimum linear prediction", IEEE Transactions on Automatic Control AC-18 (1973) 555.
[20] J. Rissanen, "Canonical Markovian representations and linear prediction", to appear.
[21] G.W. Stewart, Introduction to matrix computations (Academic Press, New York, 1973).
[22] R.A. Wiggins and E.A. Robinson, "Recursive solutions to the multichannel filtering problem", Journal of Geophysical Research 70 (1965) 1885-1891.
[23] P. Whittle, "On the fitting of multivariate autoregressions and the approximate canonical factorization of a spectral density matrix", Biometrika 50 (1963) 129-134.
[24] W.M. Wonham, "On a matrix Riccati equation of stochastic control", SIAM Journal on Control 6 (1968) 681-697.
Mathematical Programming Study 5 (1976) 169-190. North-Holland Publishing Company
ON CONSISTENCY AND IDENTIFIABILITY*
Lennart LJUNG
Lund Institute of Technology, Lund, Sweden**
Received 16 May 1975
Revised manuscript received 31 July 1975

The convergence with probability one of the parameter estimates obtained from prediction error identification methods, such as the maximum likelihood method, is analysed in this paper. It is shown that under quite weak assumptions on the actual system that has generated the data, the expected value of the identification criterion can be used for the asymptotic analysis of the estimates. In particular, the true system does not have to belong to the set of models over which the search for the optimum is made. The implications of this result for consistency analysis and for questions of identifiability, as well as for other related problems, are discussed.
1. Introduction

The identification problem is to determine a dynamic model that (according to some criterion) describes the input-output data measured from some process as well as possible. Once a certain method to solve this problem has been devised it is natural to test its performance in various ways. The tests can be numerical, as when the method is applied to data simulated on a computer. A particularly common analytical test is to study the asymptotic behaviour of the method (and of the estimates that it produces) as the number of measured data tends to infinity. Since the data often are considered to be random processes, such analysis has to be performed using probabilistic methods. The concepts of consistency and identifiability are closely related to such analysis and to the limits of the estimates (if they exist), as is further explained in Section 3.
* This work was supported in part by the Air Force Office of Scientific Research, AF Systems Command, under contract AF 44-620-69-C-0101, the Joint Services Electronics Program under contract N-00014-67-A-0112-0044, and the Industrial Affiliates Program at the Information Systems Laboratory, Stanford University.
** At present: Stanford University, Stanford, Calif., U.S.A.
170
L. Ljung / Consistency and identifiability
The class of identification methods to be studied here are defined as procedures that minimize the prediction error of the model, when applied to the recorded data. This class contains min-max entropy methods and, under certain assumptions on the statistics, the maximum likelihood method. These methods have attracted much interest and have shown good performance in practical applications [1, 7]. In Section 2 we shall define the class of identification methods and the set of models formally, while in Section 3 the concepts of identifiability and consistency are discussed. Section 4 reviews some results on consistency of these methods and in Section 5 a general result on their asymptotic properties is proved. The implications of this result are discussed in Section 6.
2. Models and identification criterion
Loosely speaking, the identification setup is entirely determined by three entities: the data, the set of models, and the identification criterion. The identification procedure, then, is to determine that (those) element in the set of models that gives the best fit to the measured data according to the chosen criterion. In this section we shall discuss some different sets of models to be used throughout the paper and also define a class of identification criteria.
2.1. Models

We shall generally denote a specific model by M(θ), where θ is some parameter vector belonging to a given set D_M. As θ varies over D_M, M(θ) describes a set of models, which will be denoted by M:

M = {M(θ) | θ ∈ D_M}.

In this paper we shall only consider linear models. Some results valid for more general models are given in [2] and [3].

Example 1. Linear models in state space representation. The state space representation is a common and convenient way of describing linear, time-varying systems. The input-output relation for the model M(θ) is then defined by

x_θ(t+1) = F_θ(t) x_θ(t) + G_θ(t) u(t) + e(t),
y(t) = H_θ(t) x_θ(t) + v(t),    θ ∈ D_M,    (1)
where e(·) and v(·) are random processes with zero means and covariances

E e(t) e'(s) = Q_θ(t) δ_ts,    E e(t) v'(s) = R^c_θ(t) δ_ts,    E v(t) v'(s) = R_θ(t) δ_ts.

We shall have reason to be interested in the linear least squares estimate of y(t+1) given y(s), u(s), s = 0, ..., t and some initial estimate x̂(0|0) with error covariance P(0|0), and given that the model M(θ), (1), is a true description of the data. This estimate will be denoted by ŷ(t+1|θ) and is obtained from standard Kalman filtering,
ŷ(t+1|θ) = H_θ(t+1) x̂(t+1|θ),    (2a)

x̂(t+1|θ) = F_θ(t) x̂(t|θ) + G_θ(t) u(t) + K_θ(t)[y(t) − H_θ(t) x̂(t|θ)],    (2b)

where x̂(t|θ) is the state estimate and K_θ(t) is the Kalman gain matrix, determined as

K_θ(t) = [F_θ(t) P(t|θ) H'_θ(t) + R^c_θ(t)] [H_θ(t) P(t|θ) H'_θ(t) + R_θ(t)]^{-1},    (2c)

P(t+1|θ) = F_θ(t) P(t|θ) F'_θ(t) − K_θ(t)[H_θ(t) P(t|θ) H'_θ(t) + R_θ(t)] K'_θ(t) + Q_θ(t).    (2d)
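As a concrete illustration, the recursion (2a)-(2d) can be sketched in a few lines for a time-invariant parameterization. This is a modern numerical sketch, not part of the original text; the matrix names mirror (1) and the data fed to it are whatever measurements are at hand.

```python
import numpy as np

def kalman_predictor(F, G, H, Q, R, Rc, y, u, x0, P0):
    """One-step predictions yhat(t+1 | theta) from the recursion (2a)-(2d)
    for a time-invariant parameterization (F, G, H, Q, R, Rc)."""
    n = len(y)
    x, P = np.array(x0, dtype=float), np.array(P0, dtype=float)
    yhat = np.zeros((n, H.shape[0]))
    for t in range(n):
        S = H @ P @ H.T + R                             # innovation covariance
        K = (F @ P @ H.T + Rc) @ np.linalg.inv(S)       # Kalman gain, (2c)
        x_next = F @ x + G @ u[t] + K @ (y[t] - H @ x)  # state update, (2b)
        P = F @ P @ F.T - K @ S @ K.T + Q               # Riccati step, (2d)
        x = x_next
        yhat[t] = H @ x                                 # (2a): prediction of y(t+1)
    return yhat
```

Each call to the function produces the whole sequence ŷ(1|θ), ..., ŷ(n|θ) for one fixed parameter value θ, which is exactly the quantity entering the criteria of Section 2.2.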
The initial values x̂(0|0) and P(0|0) can either be known or parameterized in an arbitrary way by θ. We shall not be much concerned with the way in which the matrices F, G, H, Q, R, R^c, x̂(0|0) and P(0|0) are determined from θ, but we shall assume that the matrix elements are continuously differentiable functions of θ. Apart from this assumption, the unknown elements may enter quite arbitrarily in the matrices. Some elements may be known from basic physical laws, or fixed a priori as in canonical representations. Other elements may be related to each other, etc. The important thing is that θ is a finite-dimensional, time-invariant parameter that determines all the matrices for all t ≥ 0.

Example 2. General linear, time-invariant models. A linear time-invariant model can be described as

y(t+1) = 𝒢_θ(q^{-1}) u(t) + ℋ_θ(q^{-1}) e(t+1),    (3)

where q^{-1} is the backward shift operator: q^{-1} u(t) = u(t−1), and 𝒢_θ(z) and ℋ_θ(z) are matrix functions of z (z replaces q^{-1}). The variables e(·) are assumed to be independent random variables with zero mean values and covariance matrices E e(t) e'(t) = Λ_θ (which actually may be time-varying).
It will be assumed that 𝒢_θ(z) and ℋ_θ(z) are matrices with rational functions of z as entries and that ℋ_θ(0) = I. The latter assumption implies that e(t) has the same dimension as y(t), but this is no loss of generality. To find the linear least squares estimate ŷ(t+1|θ) from (3) requires some caution regarding the initial values. In general the filter determining the estimate ŷ(t+1|θ) will be time-varying, even though (3) is time-invariant. In such a case a state space representation can be used. A simpler approach is to assume that information, equivalent to knowing all previous y(t), u(t), t < 0, is available and that hence the prediction filter has reached stationarity. It will follow from the analysis in the following sections that this assumption is quite relevant for identification problems. From (3) we have

ℋ_θ^{-1}(q^{-1}) y(t+1) = ℋ_θ^{-1}(q^{-1}) 𝒢_θ(q^{-1}) u(t) + e(t+1)

and

y(t+1) = [I − ℋ_θ^{-1}(q^{-1})] y(t+1) + ℋ_θ^{-1}(q^{-1}) 𝒢_θ(q^{-1}) u(t) + e(t+1).    (4)

Since ℋ_θ^{-1}(0) = I, the right-hand side of (4) contains y(s) and u(s) only up to time t. The term e(t+1) is independent of these variables, also in the case u(t) is determined from output feedback. Hence, if we assume that all previous y(s) and u(s) are known, we have

ŷ(t+1|θ) = [I − ℋ_θ^{-1}(q^{-1})] y(t+1) + ℋ_θ^{-1}(q^{-1}) 𝒢_θ(q^{-1}) u(t),    (5)
which in this case also equals the conditional mean. Now, linear systems are often not modelled directly in terms of the impulse response functions 𝒢_θ(z) and ℋ_θ(z). A frequently used representation is the vector difference equation (VDE):

A_θ(q^{-1}) y(t+1) = B_θ(q^{-1}) u(t) + C_θ(q^{-1}) e(t+1).    (6)

Another common representation is the state space form (in the time-invariant innovations representation form),

x_θ(t+1) = F_θ x_θ(t) + G_θ u(t) + K_θ e(t),
y(t) = H_θ x_θ(t) + e(t).    (7)

It is easily seen that these two representations correspond to

𝒢_θ(z) = A_θ^{-1}(z) B_θ(z),    ℋ_θ(z) = A_θ^{-1}(z) C_θ(z)    (8)

and

𝒢_θ(z) = H_θ [I − z F_θ]^{-1} G_θ,    ℋ_θ(z) = z H_θ [I − z F_θ]^{-1} K_θ + I,    (9)

respectively.
Inserting (8) into (5) it is seen that ŷ(t+1|θ) is found as the solution of

C_θ(q^{-1}) ŷ(t+1|θ) = [C_θ(q^{-1}) − A_θ(q^{-1})] y(t+1) + B_θ(q^{-1}) u(t)    (10)

for the case of a VDE model. For the state space model (7), ŷ(t+1|θ) is found from

x̂_θ(t+1) = F_θ x̂_θ(t) + G_θ u(t) + K_θ [y(t) − H_θ x̂_θ(t)],
ŷ(t+1|θ) = H_θ x̂_θ(t+1).    (11)
We shall also in this case assume that the matrix elements of 𝒢_θ and ℋ_θ (and A_θ(z), B_θ(z), C_θ(z), F_θ, G_θ, H_θ and K_θ) are continuously differentiable with respect to θ, but apart from that the parameter vector θ may enter arbitrarily in the matrices.

Remark. From (10) and (11) it is seen that certain initial information is required to start up the algorithms, namely for (10) {y(0), ..., y(−N), u(0), ..., u(−N), ŷ(0|θ), ..., ŷ(−N|θ)} and for (11) x̂_θ(0). In many cases it is not feasible to assume that these are known. Therefore they should be parametrized by the parameter vector θ. However, nothing prevents us from taking trivial parametrizations, like x̂_θ(0) = 0 for all θ ∈ D_M, etc., since we shall not introduce the requirement that there is a θ in D_M that corresponds to a "true" description of the data. We shall often, for notational convenience, also suppress the initial values in explicit formulas (i.e., suppose that we have the above "trivial" parameterization).

In these two examples the predicted value ŷ(t+1|θ) is obtained by linear filtering operations on y(·) and u(·),

ŷ(t+1|θ) = Σ_{s=0}^{t} [h_{t,s}(θ) y(s) + f_{t,s}(θ) u(s)].    (12)

Since the coefficients of this filter are continuously differentiable with respect to the system matrix parameters, we have

(d/dθ) ŷ(t+1|θ) = Σ_{s=0}^{t} [(d/dθ) h_{t,s}(θ) y(s) + (d/dθ) f_{t,s}(θ) u(s)].    (13)
We shall be particularly interested in the case where the linear filters (12) as well as (13) are exponentially stable. The set of those θ yielding this property will be denoted by D_s(M). It is easy to see that for models described by (6),

D_s(M) = {θ | det C_θ(z) = 0 ⇒ |z| > 1}    (14)
and for models described by (7),

D_s(M) = {θ | F_θ − K_θ H_θ has all eigenvalues in |z| < 1}.    (15)
Moreover, for the general time-varying model (1), the well-known stability properties of the Kalman filter, see, e.g., [4, Theorem 7.4], imply that

D_s(M) = {θ | [F_θ(·), Q_θ(·), H_θ(·)] is completely uniformly controllable and observable}.    (16)
Furthermore, it follows that over compact subsets of these D_s, the base of the exponential decay of the filter coefficients is uniformly bounded by a constant strictly less than 1.
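For the time-invariant cases the membership tests (14) and (15) are directly computable. A minimal sketch (an illustration added here, not from the original text) for a scalar C-polynomial and for the state space form:

```python
import numpy as np

def in_Ds_state_space(F, K, H):
    """Condition (15): all eigenvalues of F - K*H strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(F - K @ H)) < 1.0))

def in_Ds_vde(c):
    """Condition (14) for a scalar C(z) = 1 + c[0]*z + ... + c[n-1]*z^n:
    every zero of C(z) must satisfy |z| > 1."""
    # np.roots wants coefficients of c[n-1]*z^n + ... + c[0]*z + 1, highest degree first
    roots = np.roots(list(reversed([1.0] + list(c))))
    return bool(np.all(np.abs(roots) > 1.0))
```

For instance, C(z) = 1 + 0.5 z has its zero at z = −2 and so lies in D_s(M), while C(z) = 1 + 2z does not.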
2.2. A class of identification criteria

From the linear least squares predictions ŷ(t+1|θ) and the data we can form the following matrix

Q_N(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ŷ(t|θ)][y(t) − ŷ(t|θ)]'.    (17)
This matrix is a measure of how well the model M(θ) is able to describe the recorded data.

Remark. In some cases there might be reason to study a weighted version of (17),

(1/N) Σ_{t=1}^{N} [√R(t) (y(t) − ŷ(t|θ))][√R(t) (y(t) − ŷ(t|θ))]',    (18)

where R(·) is a sequence of positive semidefinite matrices. However, this can also be seen as a rescaling of the output, and we shall confine ourselves, for reasons of notational convenience, to the case (17).

It is reasonable to take as the identification criterion some continuous function h(·) of Q_N(θ):

V_N(θ) = h[Q_N(θ)].    (19)

The parameter estimate based on N measurements, θ̂_N, is thus taken as the θ that minimizes V_N(θ) over D_M, and the corresponding model is taken as M(θ̂_N). For the minimization to make sense, some simple properties of h(·) should be required, essentially that h(·) retains an ordering property among the matrices, see [3].
The identification criterion (19) has in itself a good physical interpretation: to choose that model that has the best prediction performance when applied to the data. Moreover, cf. [28], if the innovations of the models in Examples 1 and 2 are Gaussian with covariances Λ(·), then it is well known that the log likelihood function for the problem is

−(1/2) Σ_{t=1}^{N} [y(t) − ŷ(t|θ)]' Λ^{-1}(t) [y(t) − ŷ(t|θ)] − (N/2) log 2π − (1/2) Σ_{t=1}^{N} log det Λ(t).    (20)

This holds even if there is non-linear output feedback present in the system. If Λ(·) are known, then maximizing (20) with respect to θ is equivalent to minimizing tr Q_N(θ) [with R(t) = Λ^{-1}(t) in (18)]. If Λ does not depend on t, but is unknown, then the maximization of (20) with respect to Λ can be performed analytically, see [5], and θ̂ is found by minimizing

det Q_N(θ).    (21)

In case the distribution of the innovations is unknown, (21) is the θ-dependent term in max H[y(·) − ŷ(·|θ)], where H is the entropy of the prediction error and where the maximization is over all possible distributions, with the constraint that the covariance equals the sample covariance Q_N(θ), see, e.g., [6]. Consequently, the class of prediction error identification methods, defined by minimization of (19), contains the maximum likelihood method as well as min-max entropy methods. The criterion (19) was first suggested and applied to system identification problems in [7], and has since been successfully applied to numerous practical identification problems.
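For a scalar output, Q_N(θ) is a number and minimizing (21) (or tr Q_N) reduces to ordinary least squares over the predictor coefficients. The sketch below (a hypothetical first-order system chosen only for illustration, not taken from the paper) forms Q_N and computes the prediction error estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first-order system y(t+1) = 0.7*y(t) + 1.5*u(t) + e(t+1)
N = 2000
u = rng.standard_normal(N)
e = rng.standard_normal(N)
y = np.zeros(N + 1)
for t in range(N):
    y[t + 1] = 0.7 * y[t] + 1.5 * u[t] + e[t]

def Q_N(theta):
    """Sample prediction-error covariance (17) for the model class
    yhat(t+1 | theta) = theta[0]*y(t) + theta[1]*u(t) (scalar output)."""
    a, b = theta
    eps = y[1:] - (a * y[:-1] + b * u)
    return float(np.mean(eps ** 2))

# For scalar output, minimizing det Q_N (or tr Q_N) is ordinary least squares:
X = np.column_stack([y[:-1], u])
theta_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
```

With data generated by a model in the set, theta_hat recovers the generating coefficients up to sampling error, in line with the consistency results of Section 4.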
3. Identifiability and consistency
The concept of identifiability has been given several different definitions in the literature, and we shall here briefly discuss a few of them. We may distinguish two major approaches. The apparently most common approach is to relate the identifiability property to consistency of the parameter estimate θ̂_N. We shall label this approach as "consistency-oriented identifiability definitions". The "true" parameter θ_0 is then said to
be identifiable if the sequence of estimates θ̂_N converges to θ_0 in some stochastic sense. This is the path followed e.g. in [7] (convergence with probability one), [8] (convergence in the mean square sense) and in [9] (convergence in probability). A somewhat different definition is used in [10] and [3]. There a set

D_T(S, M) = {θ | lim_{N→∞} (1/N) Σ_{t=1}^{N} E |ŷ(t|S) − ŷ(t|θ)|² = 0 for all bounded inputs u(·)}    (22)
is defined, where ŷ(t|S) is the true prediction of the system S. Then S is said to be system identifiable if θ̂_N → D_T(S, M) with probability one as N → ∞, and to be parameter identifiable if, in addition, D_T(S, M) consists of only one point. Although this definition makes no reference to any "true" parameter value θ_0, it should be regarded as "consistency-oriented", since the requirement that D_T(S, M) is non-empty implies that there is a "very good model" available among the set of models M. Indeed, if D_M contains a "true" parameter θ_0, then this definition of parameter identifiability is equivalent to the one first given. These definitions require that the true system allows an exact description within the model set. In practice this is usually not a very realistic assumption, since almost any real-life process is more complex than we would allow our model to be. However, even if the set of models does not contain the true system, questions of identifiability of the model parameters are still relevant. One could think of a state space model like (1) where all the matrices are filled with parameters. Even if the data are furnished by an infinitely complex system, it will not be possible to identify the parameters of the model, simply because several models give exactly the same fit, i.e., the identification criterion V_N(θ) does not have a unique minimum. This leads us to "uniqueness-oriented identifiability definitions", as in [11], where a model set is said to be (globally) identifiable if the identification criterion used has a unique global minimum. A complication in the present context is that the identification criterion is a random function and a bit awkward to handle. We would be much better off if V_N(θ) converged (with probability one) to a deterministic function (or asymptotically behaved like one).
Let us remark here already that such convergence must be uniform in θ, in order to enable us to relate minima of V_N(θ) to minima of the deterministic function. We shall have occasion to return to this point below. In addition to the references mentioned above, interesting results can also be found in, e.g., [29, 30 and 31].
4. Some consistency results
The consistency problem for the maximum likelihood method has been quite widely studied. For independent observations the consistency has been studied by, e.g., Cramér [12], Wald [13] and Kendall-Stuart [14]. The application of the maximum likelihood method to system identification (for single input-single output models on a difference equation form) was introduced in [7], where it also is shown how the assumption on independent observations can be relaxed. Applications to other (linear) model choices have been considered in, e.g., [15, 16, 17, 18, 9 and 19]. However, it should be remarked that several of the proofs of strong consistency (convergence with probability one to the true parameter value) are not complete, a fact that can be traced back to a shortcoming in the proof in [14]. The first complete strong consistency proofs for applications to system identification seem to be given in [2 and 20]. Let us cite, for future discussion, the following consistency result from [3, Theorem 4.2 and Lemma 5.1].

Theorem 1. Consider the set of models described in Example 1. Assume that D_M, over which the search in θ is performed, is a compact subset of D_s(M) (cf. (16)), and is such that D_T(S, M) defined by (22) is non-empty. Assume that the actual system (with possible feedback terms) is exponentially stable and that the innovations of its output have bounded variance and are of full rank. Then the identification estimate θ̂_N that minimizes the criterion (19) converges into

D_1 = {θ | θ ∈ D_M; lim inf_{N→∞} (1/N) Σ_{t=1}^{N} E |ŷ(t|S) − ŷ(t|θ)|² = 0 for the actual input to the process}    (23)

with probability one as N tends to infinity.

This result is rather general and is not based on any ergodicity assumptions. To ensure parameter consistency, it should be required first that the actual input during the identification experiment was sufficiently general so that D_1 ⊂ D_T(S, M) holds (which implies "system identifiability"), and secondly, that the model is suitably parameterized so that
D_T(S, M) = {θ*} holds. It is convenient to study these conditions separately. The restrictive assumption in the theorem apparently is that D_T(S, M) be non-empty. This requires the true system to be "not too complex" and is rarely met for real-life processes. However, the philosophy of consistency results should be viewed as a test of the method: if the method is unable to recognize the true system in a family of models, then it is probably not a good method. The same philosophy clearly lies behind testing identification methods on simulated data. It should however be noted that from such consistency results, strictly speaking, nothing can be stated about the performance of the method when applied to a system that does not have an exact description within the set of models.
5. A limit result for the criterion function
In this section we shall give results for determining the asymptotic behaviour of the estimates θ̂_N that minimize the criterion function (19), V_N(θ), also in the case where the true system is more complex than can be described within the set of models. We shall do that by giving conditions under which

V̄_N(θ) = h[E Q_N(θ)]

can be used for the asymptotic analysis. Thereby "the stochastic part of the problem" is removed and the analysis can proceed with the deterministic loss function V̄_N(θ). In order to make the analysis as general as possible, we would like to impose as weak conditions as possible upon the actual system. The important property we need is a stability property, but in order to state it, we shall assume that the true system with (possibly adaptive) feedback admits a description as follows:

x(t+1) = f[t; x(t), u(t), e(t)],
y(t) = g[t; x(t), e(t)],    (24)
u(t) = h[t; x(t), ..., x(0), u_R(t)],

where y(·) is the output, u(·) the actual input to the process, u_R(·) a reference (extra) input, or noise in the feedback, and e(·) is a sequence of
independent random variables. The over-all stability property which we shall require is the following. Define y_s^0(·) and u_s^0(·) through

x_s^0(t+1) = f[t; x_s^0(t), u_s^0(t), e(t)],
y_s^0(t) = g[t; x_s^0(t), e(t)],    (25a)
u_s^0(t) = h[t; x_s^0(t), ..., x_s^0(s), 0, ..., 0, u_R(t)],
x_s^0(s) = 0 (or any value independent of e(r), r < s).

Then the property is

E |y(t) − y_s^0(t)|⁴ < C λ^{t−s},    E |u(t) − u_s^0(t)|⁴ < C λ^{t−s};    t ≥ s, λ < 1,    (25b)

and

E |y(t)|⁴ < C,    E |u(t)|⁴ < C.    (25c)

All expectations are over e(·). The assumptions (24) and (25) are quite weak, in particular as we shall not need to specify the description (24). We now have the following result.

Lemma. Let the set of models be defined by (1) or (3) (which includes in particular (6) and (7)), and assume that the actual process is subject to (24) and (25). Then

sup_{θ ∈ D̄_s} |Q_N(θ) − E Q_N(θ)| → 0 with probability one as N → ∞,    (26)

where D̄_s is a compact subset of D_s(M) (cf. (14)-(16)), and Q_N(θ) is defined by (17). The expectation is over e(·) in (24).
The proof of the lemma is given in the appendix. The lemma implies that, if h(·) is continuous, then V̄_N(θ) = h[E Q_N(θ)] will be arbitrarily close to V_N(θ) in the sup-norm, w.p.1, and hence that the local and global minima of V̄_N(θ) are arbitrarily close to those of V_N(θ). In particular the globally minimizing points of V̄_N(θ) and V_N(θ) will be arbitrarily close, and if

V̄(θ) = lim_{N→∞} V̄_N(θ)    (27)

exists, then θ̂_N will converge w.p.1 to the globally minimizing point(s) of V̄(θ).
It is important to notice that these properties follow only since (26) holds uniformly in θ. If it is known only that Q_N(θ) − E Q_N(θ) → 0 w.p.1 as N → ∞ for each fixed θ ∈ D̄_s (as has been proved by several authors, although under more restrictive assumptions on the system), then the minimizing points of V̄_N(θ) and V_N(θ) do not have to be close. Moreover, in the lemma no assumptions on stationarity, neither on the system nor on the model, are introduced, and E Q_N(θ) does not have to converge. However, if it does converge, then the asymptotic analysis can most conveniently be performed on V̄(θ). As the following simple example shows, the lemma does not hold without some kind of stability assumption on the true system.

Example 3. Consider the system

y(t+1) = b u(t) + e(t+1),

where |b| < 1 and e(t) ∈ N(0, 1), with (adaptive) feedback

u(t) = −(1/2)[1 + sign y(1)] y(t);    u(0) = 0.

Let the model be y(t+1) = b̂ u(t) + ε(t+1) and the criterion

V_N(b̂) = (1/N) Σ_{t=1}^{N} [y(t+1) − b̂ u(t)]².

Then

E V_N(b̂) = (1/2) [1 + (1 + b̂² − 2b b̂)/(1 − b²)].

However, for realizations such that y(1) > 0,

V_N(b̂) → (1 + b̂² − 2b b̂)/(1 − b²)    w.p.1 as N → ∞,

and for y(1) < 0,

V_N(b̂) → 1    w.p.1 as N → ∞.

Clearly, this adaptive regulator does not yield the over-all stability property (25), since the effect of y(1) lingers forever. This simple example could of course easily be handled by conditioning with respect to y(1), but it illustrates the difficulties that may arise with adaptive regulators. For such
applications it is sometimes helpful to avoid taking the expectation of the criterion function, cf. [3].
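A small simulation makes the two different limits of Example 3 visible. This sketch is an added illustration; the values b = 0.5 and b̂ = 0.2 are hypothetical, and forcing the sign of y(1) simply selects which feedback law is in effect for the entire realization.

```python
import numpy as np

def v_N(b_true, b_hat, sign_y1, N, rng):
    """Simulate Example 3 with the sign of y(1) forced, and return V_N(b_hat)."""
    e = rng.standard_normal(N + 1)
    y = np.zeros(N + 1)
    u = np.zeros(N + 1)
    y[1] = sign_y1 * abs(e[1])          # force the sign of y(1)
    f = -0.5 * (1 + sign_y1)            # feedback gain: -1 if y(1) > 0, 0 if y(1) < 0
    loss = 0.0
    for t in range(1, N):
        u[t] = f * y[t]
        y[t + 1] = b_true * u[t] + e[t + 1]
        loss += (y[t + 1] - b_hat * u[t]) ** 2
    return loss / (N - 1)

rng = np.random.default_rng(1)
b, b_hat = 0.5, 0.2
v_pos = v_N(b, b_hat, +1, 50_000, rng)  # realizations with y(1) > 0
v_neg = v_N(b, b_hat, -1, 50_000, rng)  # realizations with y(1) < 0
# analytic limits: (1 + b_hat**2 - 2*b*b_hat)/(1 - b**2) and 1, respectively
```

The two sample averages settle near different constants, so V_N(b̂) does not approach a single deterministic function: exactly the failure of (26) caused by the lingering effect of y(1).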
6. Some applications of the lemma

We may consider the lemma of Section 5 as a basic tool for the convergence analysis of the estimates θ̂_N, and we shall in this section point out some potential applications of it.
6.1. Properties of the asymptotic estimates

For the sake of definiteness, let us take h[·] = tr and assume that

V̄_N(θ) = (1/N) Σ_{t=1}^{N} E |y(t) − ŷ(t|θ)|²

converges to V̄(θ) as N tends to infinity. It then follows that the estimate(s) θ̂_N that globally minimize V_N(θ) will w.p.1 tend to

D_L = {θ | V̄(θ) = inf_{θ* ∈ D_M} V̄(θ*)}

if D_M ⊂ D̄_s. Moreover, let

ŷ(t+1|S) = E[y(t+1) | Y_t],    where Y_t = {y(s), u(s); s ≤ t},

be the true prediction (which of course may be non-linear in Y_t). Then

y(t+1) = ŷ(t+1|S) + v(t+1),

where v(·) are the innovations, obeying

E[v(t+1) | Y_t] = 0.

We now have

E |y(t+1) − ŷ(t+1|θ)|² = E v'(t+1) v(t+1) + 2 E v'(t+1)[ŷ(t+1|S) − ŷ(t+1|θ)] + E |ŷ(t+1|S) − ŷ(t+1|θ)|².

The second term of the right-hand side obviously is zero, and hence the global minima of V̄(θ) are also the global minima of

W(θ) = lim_{N→∞} (1/N) Σ_{t=0}^{N−1} E |ŷ(t+1|S) − ŷ(t+1|θ)|².    (28)
Consequently, we have proved that θ̂_N will tend to the global minimum of W(θ), no matter if this is unique or not. In other words, the limiting estimate will give a model that is the best approximation of the true system (in the sense of (28)) for the particular input used in the identification experiment. This is by no means a surprising result, but it has here been established under quite general conditions. It can be remarked that it is this property, rather than the consistency properties, that makes the identification method powerful in applications. While these results do not follow from consistency analysis, it is of course possible to deduce consistency properties from the lemma. Hence, if D_T(S, M) is non-empty, then W(θ) assumes the value 0 for some θ*, which has to be its global minimum, and from this the theorem of Section 4 follows.
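The "best approximation" property can be observed numerically. In the sketch below (an added illustration with hypothetical coefficients) a second-order autoregression generates the data while the model class contains only first-order predictors ŷ(t+1|θ) = θ y(t); the prediction error estimate then converges not to a "true" parameter (none exists in the class) but to the minimizer of W(θ), which by the Yule-Walker equations is r_1/r_0 = a_1/(1 − a_2).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical "true" system: a second-order autoregression; the model set is
# the (too small) class of first-order predictors yhat(t+1 | th) = th * y(t).
a1, a2, N = 0.5, 0.3, 200_000
e = rng.standard_normal(N + 2)
y = np.zeros(N + 2)
for t in range(1, N + 1):
    y[t + 1] = a1 * y[t] + a2 * y[t - 1] + e[t + 1]

# Prediction error estimate: least squares over the first-order model class
th_hat = np.dot(y[1:-1], y[2:]) / np.dot(y[1:-1], y[1:-1])

# The minimizer of W(th) is the best one-step predictor coefficient in the
# class, r1/r0 = a1/(1 - a2) by the Yule-Walker equations.
th_star = a1 / (1 - a2)
```

With the white-noise-driven open-loop experiment used here, th_hat approaches th_star as N grows, in accordance with (28).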
6.2. Identifiability properties

The limit function V̄(θ) or W(θ) of the previous subsection can be used for determining the identifiability properties of a certain model set (parameterization) without reference to any true parameter values. Hence, a model parameterization M can be said to be parameter identifiable, under given experimental conditions (input signal properties), for a given system S, if V̄(θ) [or W(θ)] has a unique global minimum. It is clear that this concept is appropriate and that the parameterization problem for multi-output structures is equally important even if the true system is "very complex". The same holds for the identifiability properties under output feedback, cf. e.g., [10 and 21], as is indicated in this simple example.

Example 4. Consider the model

y(t+1) + a y(t) = b u(t) + e(t+1)    (29)

with output feedback

u(t) = g y(t)    (30)

and the criterion

V_N(â, b̂) = (1/N) Σ_{t=1}^{N} [y(t+1) + â y(t) − b̂ u(t)]².

It is clear that, regardless of the true system,

V̄_N(â, b̂) = V̄_N(â + kg, b̂ + k),
and consequently V̄_N(θ) cannot have a unique minimum. Hence, the model set (29) under the experimental condition (30) is never parameter identifiable, no matter what the true system might be.
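The invariance behind Example 4 is easily checked numerically; the "true" system below is arbitrary and hypothetical, chosen only to generate data under the feedback (30). Since u(t) = g y(t) holds exactly in the data, shifting (â, b̂) to (â + kg, b̂ + k) leaves every residual unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data generated by an arbitrary (hypothetical) true system under the
# output feedback (30), u(t) = g*y(t).
g, N = 0.4, 1000
e = rng.standard_normal(N + 1)
y = np.zeros(N + 1)
u = np.zeros(N + 1)
for t in range(N):
    u[t] = g * y[t]
    y[t + 1] = -0.6 * y[t] + 0.8 * u[t] + e[t + 1]

def V_N(a_hat, b_hat):
    """Criterion of Example 4: mean of [y(t+1) + a_hat*y(t) - b_hat*u(t)]^2."""
    eps = y[1:] + a_hat * y[:-1] - b_hat * u[:-1]
    return float(np.mean(eps ** 2))
```

Any k gives V_N(â + kg, b̂ + k) = V_N(â, b̂), so the criterion surface has a whole line of minimizers.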
6.3. Local minima of the criterion function

If the numerical minimization of V_N(θ) is performed using a gradient method, the "false" local minima of V_N(θ) are potential traps for the algorithm, and it is a most interesting problem to analyse the conditions under which such local minima may or may not exist. Since V_N(θ) converges uniformly to V̄(θ), it follows that a local minimum of V̄(θ) will, w.p.1, for sufficiently large N correspond to a local minimum of V_N(θ) and vice versa. Therefore the analysis of local minima can be performed in terms of V̄(θ) instead of V_N(θ), which of course is a great simplification. In [22 and 23] several interesting results of this kind are given.
6.4. Certain difficulties with adaptive regulators

When the true system cannot be modelled exactly, the identification method will still make the best possible out of the situation by minimizing (28), as explained in Section 6.1. However, it should be realized that the minimum of V_N(θ) in general depends on the actual input during the identification experiment. If the input is determined as output feedback, the minimizing element θ̂_N will depend on the feedback law. If now the regulator is adaptive, and the feedback law is determined from the current estimate, cf. [24], the analysis of the loss function V_N(θ) becomes cumbersome. Let us consider the following simple example.
Example 5. Let the system be given by

y(t+1) + a y(t) = u(t) + e(t+1) + c e(t),    (31)

where e(·) is a sequence of independent random variables with unit variance, and let the model set be given by

y(t+1) + â y(t) = u(t) + v(t+1),

where v(·) are assumed to be independent. Then the model set does not contain a true description of the system (31). Let the identification criterion be as in Example 3. The input to the system is determined as

u(t) = â(t) y(t),    (32)
where â(t) is the value that minimizes V_t(â). With a constant feedback (30), we would have as the asymptotic estimate

â = a − c[1 − (a − g)²] / [1 + c² − 2(a − g)c],

which clearly depends on the actual feedback coefficient g. When taking into account the adaptive feedback (32), the determination of E V_N(â) becomes difficult, and it is even impossible to easily decide whether E V_N(â) will converge or not as N increases. In fact, for adaptive regulators of this kind, E V_N(θ) may really fail to converge (without tending to infinity), cf. [24]. Our conclusion from this example is that although the lemma provides the tool for analysing the asymptotic behaviour of the estimate (regardless of convergence of E V_N(θ)) even for these more complex problems, it may not be so easy to use. An idea that, for these problems, seems more appropriate than to determine E V_N(θ), is to consider the conditional change in the criterion

E [V_{N+1}(θ) − V_N(θ) | V_N(·)],

since this quantity reflects the significance of the present control action for future estimates. By instead considering the expected change in the minimizing point θ̂_N of V_N(θ), given θ̂_N, it is possible to track the estimate and analyse its asymptotic properties. This is, essentially, the approach that has been taken in [26, 25 and 24].
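The constant-feedback formula above can be cross-checked by computing the same quantity from the stationary covariances of the closed-loop process y(t+1) = (g − a) y(t) + e(t+1) + c e(t): the minimizer of E[y(t+1) + â y(t) − u(t)]² is â = g − r_1/r_0. This sketch is an added check with hypothetical parameter values, not part of the original text.

```python
def a_hat_formula(a, c, g):
    """Closed-form asymptotic estimate from the text (constant feedback gain g)."""
    d = a - g
    return a - c * (1 - d ** 2) / (1 + c ** 2 - 2 * d * c)

def a_hat_covariance(a, c, g):
    """Same quantity from the stationary covariances of the closed-loop
    ARMA(1,1) process y(t+1) = phi*y(t) + e(t+1) + c*e(t), phi = g - a;
    the minimizer of E[y(t+1) + ahat*y(t) - u(t)]^2 is ahat = g - r1/r0."""
    phi = g - a
    r0 = (1 + c ** 2 + 2 * phi * c) / (1 - phi ** 2)  # E y(t)^2
    r1 = phi * r0 + c                                  # E y(t+1) y(t)
    return g - r1 / r0
```

The two computations agree, and varying g changes the result, which is precisely the dependence on the feedback law that complicates the adaptive case.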
7. Conclusions
A particular aspect of the asymptotic analysis of the estimates obtained by minimization of the prediction error identification criterion (19) is the question of identifiability. By this can be meant that the identification method has the ability of recovering the true parameter values if they belong to the set over which the criterion is minimized (or of yielding a model that has an equivalent input-output mapping, "system identifiability", see Section 3). It can also be meant that the estimate will converge (w.p.1, say) to a unique value, which, however, does not have to be related to any "true system". We have labeled these approaches as "consistency-oriented" and "uniqueness-oriented", respectively, and given some general results for either approach.
The lemma of Section 5 should be regarded as the main contribution of this paper. It states that, under quite weak assumptions on the true system, the expected value of the loss function, which is a deterministic function, can be used for the asymptotic analysis. We have also indicated how this result can be used for identifiability and convergence studies, as well as for analysis of other related problems.
Appendix. Proof of the lemma of Section 5
The idea of the proof is to show that

sup (1/N) | Σ_{t=1}^{N} { [y(t) − ŷ(t|θ)]_i [y(t) − ŷ(t|θ)]_j − E [y(t) − ŷ(t|θ)]_i [y(t) − ŷ(t|θ)]_j } |,
taken over a "small" open set, is arbitrarily small for large N, and then to extend this result to D̄_s using the Heine-Borel theorem. We shall need some properties of ŷ_i(t|θ) ["i" denotes the i-th component], and let us state these results as a lemma.

Lemma A. Let B = B(θ*, ρ) = {θ | |θ − θ*| < ρ}. Then

E sup_{θ∈B} |(d/dθ) ŷ_i(t|θ)|² < C(θ*).    (A1)
Let r/ij(t, 0 *, p) = sup [ y , ( t ) - ~,(t I 0)] [ y , ( t ) - ~,(t t 0)] OEB
~'o(t, 0 *, p) = !nf [y,(t) - )3,(t [ 0)] [yj(t) -
Pj(tl
0)].
Then
-~,=l[rlo(t,O*,p)-Erl,j(t,O*,p)]---~O
w.p.1
as
N--~
and p such that [That is, for all 0", p and e, there exists
l=lrl,j(t,O*,p,w)-Erlli(t,O*,p,w)
a N~ (O*,p,e, to)
r<e
B(O*,p) C ff~s. such that
forN>N~(O*,p,e, to)
all to E 1)(0"), where P [fl(0*)] = 1.] and similarly for ~'~(t, 0 *, O)-
f o r a l l 0*
for
(A2)
Proof of Lemma A. We shall throughout the proof use constants $C$ and $\lambda$, where $\lambda < 1$, that do not need to be the same. We shall also allow ourselves to suppress arguments and subscripts freely, when there is no risk of confusion. Let the linear filter determining $\hat y(t\mid\theta)$ be given by (12). For $B(\theta^*,\rho)\subset\bar D_s$ we then have

$$\sup_{\theta\in B}\,[|h_{t,k}(\theta)| + |f_{t,k}(\theta)|] < C\lambda^{t-k} \tag{A3a}$$

and

$$\sup_{\theta\in B}\left[\left|\frac{\mathrm d}{\mathrm d\theta}h_{t,k}(\theta)\right| + \left|\frac{\mathrm d}{\mathrm d\theta}f_{t,k}(\theta)\right|\right] < C\lambda^{t-k}. \tag{A3b}$$

Hence

$$\sup_{\theta\in B}\left|\frac{\mathrm d}{\mathrm d\theta}\hat y(t\mid\theta)\right| = \sup_{\theta\in B}\left|\sum_{k=0}^{t}\Big[\frac{\mathrm d}{\mathrm d\theta}h_{t,k}(\theta)y(k) + \frac{\mathrm d}{\mathrm d\theta}f_{t,k}(\theta)u(k)\Big]\right| \le \sup_{\theta\in B}\sum_{k=0}^{t}\left|\frac{\mathrm d}{\mathrm d\theta}h_{t,k}(\theta)\right||y(k)| + \sup_{\theta\in B}\sum_{k=0}^{t}\left|\frac{\mathrm d}{\mathrm d\theta}f_{t,k}(\theta)\right||u(k)| < C\sum_{k=0}^{t}\lambda^{t-k}[|y(k)| + |u(k)|]$$

and

$$E\sup_{\theta\in B}\left|\frac{\mathrm d}{\mathrm d\theta}\hat y(t\mid\theta)\right|^2 < C^2\sum_{k,s=0}^{t}\lambda^{2t-k-s}E[|y(k)|+|u(k)|][|y(s)|+|u(s)|] \le C^2\sum_{k,s=0}^{t}\lambda^{2t-k-s}\big[\sqrt{E|y(k)|^2}+\sqrt{E|u(k)|^2}\big]\big[\sqrt{E|y(s)|^2}+\sqrt{E|u(s)|^2}\big], \tag{A4}$$

which proves (A1).

Let the variables $y^{0,s}(\cdot)$ and $u^{0,s}(\cdot)$ be defined as in (25a) and let the prediction based on these variables be denoted by $\hat y^{0,s}(t\mid\theta)$:

$$\hat y^{0,s}(t\mid\theta) = \sum_{k\ge s}\big[h_{t,k}(\theta)y^{0,s}(k) + f_{t,k}(\theta)u^{0,s}(k)\big].$$

The components of $\hat y^{0,s}(t\mid\theta)$ will be denoted by $\hat y_i^{0,s}(t\mid\theta)$. Let us also introduce

$$\eta_{ij}^{0,s}(t,\theta^*,\rho) = \sup_{\theta\in B}\,[y_i^{0,s}(t)-\hat y_i^{0,s}(t\mid\theta)][y_j^{0,s}(t)-\hat y_j^{0,s}(t\mid\theta)].$$
Notice that $\eta^{0,s}(t)$ is by definition independent of $\eta(r)$ for $r < s$. Consider $\mu_s(t) = \eta(t) - \eta^{0,s}(t)$. After elementary calculations we find

$$E\mu_s(t)^2 \le 4\big[E\sup|y-\hat y|^4\big]^{1/2}\Big(\big[E\sup|y-y^{0,s}|^4\big]^{1/2} + \big[E\sup|\hat y-\hat y^{0,s}|^4\big]^{1/2}\Big) + 4\big[E\sup|y^{0,s}-\hat y^{0,s}|^4\big]^{1/2}\Big(\big[E\sup|y-y^{0,s}|^4\big]^{1/2} + \big[E\sup|\hat y-\hat y^{0,s}|^4\big]^{1/2}\Big). \tag{A5}$$

As above, (A4), we readily find that

$$E\sup|y-\hat y|^4 < C \quad\text{and}\quad E\sup|y^{0,s}-\hat y^{0,s}|^4 < C, \tag{A6}$$

and by (25b) we have

$$E|y(t)-y^{0,s}(t)|^4 < C\lambda^{t-s}. \tag{A7}$$

Consider now (for subscript $i$ or $j$)

$$\sup_{\theta\in B}|\hat y(t\mid\theta)-\hat y^{0,s}(t\mid\theta)| = \sup_{\theta\in B}\left|\sum_{k=0}^{s}\big[h_{t,k}(\theta)y(k)+f_{t,k}(\theta)u(k)\big] + \sum_{k=s+1}^{t}\big[h_{t,k}(\theta)[y(k)-y^{0,s}(k)] + f_{t,k}(\theta)[u(k)-u^{0,s}(k)]\big]\right|.$$

Using (A3a), we find after some straightforward algebra, and use of Schwarz' inequality (a detailed account is given in [2, pp. 94-95]),

$$E\sup_{\theta\in B}|\hat y(t\mid\theta)-\hat y^{0,s}(t\mid\theta)|^4 < C\lambda^{t-s}. \tag{A8}$$

Collecting (A5), (A6), (A7) and (A8) we find

$$E\mu_s(t)^2 < C\lambda^{t-s}. \tag{A9}$$

Consider now, for $t > s$,

$$\operatorname{Cov}(\eta(t),\eta(s)) = \operatorname{Cov}[\eta(t)-\eta^{0,s}(t),\eta(s)] = \operatorname{Cov}[\mu_s(t),\eta(s)] \le \big(E\mu_s(t)^2\cdot E\eta(s)^2\big)^{1/2} < C\lambda^{t-s}, \tag{A10}$$

where the second equality follows since $\eta^{0,s}$ is independent of $\{e(r),\ r < s\}$, and hence of $\eta(s)$. Boundedness of $E\eta(s)^2$ follows as above, (A4).

In [27], the following convergence theorem is given: Let $f_i$ be a sequence of random variables with zero mean values and with

$$|Ef_if_j| < C\lambda^{|i-j|}, \quad 0 \le \lambda < 1. \tag{A11}$$

Then

$$\frac{1}{N}\sum_{i=1}^{N}f_i \to 0 \quad\text{w.p.1 as } N\to\infty. \tag{A12}$$

We can now apply this result to

$$\frac{1}{N}\sum_{t=1}^{N}[\eta_{ij}(t,\theta^*,\rho) - E\eta_{ij}(t,\theta^*,\rho)]$$

since (A10) implies (A11). This concludes the proof of Lemma A.

Proof of the main lemma. Let
$$r(\theta) = Q_N^{(ij)}(\theta) - EQ_N^{(ij)}(\theta^*) = \frac{1}{N}\sum_{t=1}^{N}\big[z_i(t,\theta)z_j(t,\theta) - Ez_i(t,\theta^*)z_j(t,\theta^*)\big],$$

where $z_j(t,\theta) = y_j(t) - \hat y_j(t\mid\theta)$, and consider

$$\sup_{\theta\in B}r(\theta) \le \frac{1}{N}\sum_{t=1}^{N}\Big(\sup_{\theta\in B}[z_i(t,\theta)z_j(t,\theta)] - Ez_i(t,\theta^*)z_j(t,\theta^*)\Big) = \frac{1}{N}\sum_{t=1}^{N}[\eta_{ij}(t,\theta^*,\rho) - E\eta_{ij}(t,\theta^*,\rho)] + \frac{1}{N}\sum_{t=1}^{N}E[\eta_{ij}(t,\theta^*,\rho) - z_i(t,\theta^*)z_j(t,\theta^*)]. \tag{A13}$$

Using the mean value theorem and (A1), we readily obtain (cf. [2, pp. 85-86])

$$E|\eta_{ij}(t,\theta^*,\rho) - z_i(t,\theta^*)z_j(t,\theta^*)| < \rho\,C(\theta^*). \tag{A14}$$

Similarly,

$$E|z_i(t,\theta^*)z_j(t,\theta^*) - z_i(t,\theta)z_j(t,\theta)| < \rho\,C(\theta^*) \quad\text{for }\theta\in B(\theta^*,\rho),$$

which implies

$$|EQ_N^{(ij)}(\theta) - EQ_N^{(ij)}(\theta^*)| < \rho\,C(\theta^*) \quad\text{for }\theta\in B(\theta^*,\rho). \tag{A15}$$

Collecting (A13), (A14) and (A15), we obtain

$$\sup_{\theta\in B(\theta^*,\rho)}[Q_N^{(ij)}(\theta) - EQ_N^{(ij)}(\theta)] \le \frac{1}{N}\sum_{t=1}^{N}[\eta_{ij}(t,\theta^*,\rho) - E\eta_{ij}(t,\theta^*,\rho)] + 2\rho\,C(\theta^*).$$

Now choose an $\varepsilon > 0$, and take for every $\theta^*\in\bar D_s$ the radius $\rho = \rho^* = \rho(\theta^*)$ to be the minimum of $\varepsilon/4C(\theta^*)$ and the distance from $\theta^*$ to the boundary of $\bar D_s$. Then for $N > N_1(\theta^*,\rho^*,\varepsilon/2,\omega)$, the first term in the right-hand side above is less than $\varepsilon/2$ according to (A2), and hence

$$\sup_{\theta\in B(\theta^*,\rho^*)}[Q_N^{(ij)}(\theta) - EQ_N^{(ij)}(\theta)] < \varepsilon. \tag{A16}$$
We shall now extend (A16) to hold over $\bar D_s$ by applying the Heine-Borel theorem. Clearly, $\{B(\theta^*,\rho^*),\ \theta^*\in\bar D_s\}$ is a family of open sets covering $\bar D_s$. Select a finite family of sets $\{B(\theta_i,\rho_i^*),\ i = 1,\ldots,M\}$ that also covers $\bar D_s$ and let

$$\bar N_1(\varepsilon,\omega) = \max_{1\le i\le M}N_1(\theta_i,\rho(\theta_i),\varepsilon/2,\omega).$$

Then

$$\sup_{\theta\in\bar D_s}[Q_N^{(ij)}(\theta) - EQ_N^{(ij)}(\theta)] < \varepsilon \quad\text{for } N > \bar N_1(\varepsilon,\omega)$$

for all $\omega\in\Omega' = \bigcap_{i=1}^{M}\Omega(\theta_i)$, where $P(\Omega') = 1$. Similarly,

$$\sup_{\theta\in\bar D_s}[EQ_N^{(ij)}(\theta) - Q_N^{(ij)}(\theta)] < \varepsilon \quad\text{for } N > \bar N_2(\varepsilon,\omega),$$
which concludes the proof of the Main Lemma.

References

[1] K.J. Åström and P. Eykhoff, "System identification - a survey", Automatica 7 (1971) 123-162.
[2] L. Ljung, "On consistency for prediction error identification methods", Rept. 7405, Division of Automatic Control, Lund Institute of Technology, Lund, Sweden (March 1974).
[3] L. Ljung, "On the consistency of prediction error identification methods", in: R.K. Mehra and D.G. Lainiotis, eds., System identification: advances and case studies (Marcel Dekker, Inc., New York) to appear.
[4] A.H. Jazwinski, Stochastic processes and filtering theory (Academic Press, New York, 1970).
[5] J. Eaton, "Identification for control purposes", IEEE Winter meeting, New York (1967).
[6] J. Rissanen, "Minmax entropy estimation of models for vector processes", in: R.K. Mehra and D.G. Lainiotis, eds., System identification: advances and case studies (Marcel Dekker, Inc., New York) to appear.
[7] K.J. Åström and T. Bohlin, "Numerical identification of linear dynamic systems from normal operating records", IFAC Symposium on Self-Adaptive Systems, Teddington, England, 1965; also in: P.H. Hammond, ed., Theory of self-adaptive control systems (Plenum Press, New York, 1965).
[8] R.M. Staley and P.C. Yue, "On system parameter identifiability", Information Sciences 2 (2) (1970) 127-138.
[9] E. Tse and J. Anton, "On the identifiability of parameters", IEEE Transactions on Automatic Control AC-17 (5) (1972).
[10] L. Ljung, I. Gustavsson and T. Söderström, "Identification of linear multivariable systems operating under linear feedback control", IEEE Transactions on Automatic Control AC-19 (6) (1974) 836-841.
[11] R. Bellman and K.J. Åström, "On structural identifiability", Mathematical Biosciences 7 (1970) 329-339.
[12] H. Cramér, Mathematical methods of statistics (Princeton University Press, Princeton, N.J., 1946).
[13] A. Wald, "Note on the consistency of the maximum likelihood estimate", The Annals of Mathematical Statistics 20 (1949) 595-601.
[14] M.G. Kendall and A. Stuart, The advanced theory of statistics, Vol. 2 (Hafner, New York, 1967).
[15] P.E. Caines, "The parameter estimation of state variable models of multivariable linear systems", in: Proceedings of the U.K.A.C. conference on multivariable systems, Manchester, England, 1971.
[16] M. Aoki and P.C. Yue, "On certain convergence questions in system identification", SIAM Journal on Control 8 (2) (1970).
[17] A.V. Balakrishnan, "Stochastic system identification techniques", in: M.F. Karreman, ed., Stochastic optimization and control (Wiley, New York, 1968).
[18] D.S. Spain, "Identification and modelling of discrete, stochastic linear systems", Tech. Rept. 6302-10, Stanford University, Stanford, Calif. (1971).
[19] P.E. Caines and J. Rissanen, "Maximum likelihood estimation in multivariable Gaussian stochastic processes", IEEE Transactions on Information Theory IT-20 (1) (1974) 102-104.
[20] J. Rissanen and P.E. Caines, "Consistency of maximum likelihood estimators for ARMA processes", Control Systems Rept. No. 7424, Department of Electrical Engineering, University of Toronto, Toronto, Canada (December 1974).
[21] T. Söderström, I. Gustavsson and L. Ljung, "Identifiability conditions for linear systems operating in closed loop", International Journal of Control 21 (2) (1975) 243-255.
[22] K.J. Åström and T. Söderström, "Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model", IEEE Transactions on Automatic Control AC-19 (6) (1974) 769-774.
[23] T. Söderström, "On the uniqueness of maximum likelihood identification for different structures", Automatica 11 (2) (1975).
[24] K.J. Åström, U. Borisson, L. Ljung and B. Wittenmark, "Theory and applications of adaptive regulators based on recursive parameter estimation", in: Proceedings of the 6th IFAC world congress, Boston, Mass., August 1975.
[25] L. Ljung and B. Wittenmark, "Analysis of a class of adaptive regulators", in: Proceedings of the IFAC symposium on stochastic control theory, Budapest, Hungary, September 1974.
[26] L. Ljung, "Convergence of recursive stochastic algorithms", in: Proceedings of the IFAC symposium on stochastic control theory, Budapest, Hungary, September 1974.
[27] H. Cramér and M.R. Leadbetter, Stationary and related stochastic processes (Wiley, New York, 1967).
[28] T. Kailath, "The innovations approach to detection and estimation theory", Proceedings of the IEEE 58 (5) (1970) 680-695.
[29] K. Glover and J.C. Willems, "Parametrizations of linear dynamical systems: canonical forms and identifiability", IEEE Transactions on Automatic Control AC-19 (6) (1974) 640-646.
[30] H.E. Berntsen and J.G. Balchen, "Identifiability of linear dynamic systems", in: P. Eykhoff, ed., Proceedings of the 3rd IFAC symposium on identification and system parameter estimation, The Hague, Holland, June 1973.
[31] E. Tse, "Information matrix and local identifiability of parameters", 1973 Joint Automatic Control Conference, Preprints.
Mathematical Programming Study 5 (1976) 191-210. North-Holland Publishing Company
IDENTIFICATION AND ESTIMATION OF THE ERROR-IN-VARIABLES MODEL (EVM) IN STRUCTURAL FORM*

R.K. MEHRA

International Institute for Applied Systems Analysis, Laxenburg, Austria

Received 16 May 1975
Revised manuscript received 14 Nov. 1975

It is shown that the EVM in structural form is identifiable if serial correlation is present in the independent variables. Least Squares, Instrumental Variable and Maximum Likelihood techniques for the identification and estimation of serial correlations and other EVM parameters are given. The techniques used are based on State Vector Models, Kalman Filtering and Innovation representations. Generalizations to EVM involving multiple regressions and randomly time-varying coefficients are also discussed.
1. Introduction

The Error-In-Variables Model (EVM) is a regression model in which the independent variables are only measured with errors. It has been investigated extensively in the statistical and the econometric literature for over thirty years ([1-10]; for further references see the recent paper by Florens et al. [10]). However, as some of these authors point out, the proposed solutions to the problem are still far from satisfactory and require specification of data that may not be readily available in practice.

The two basic assumptions of EVM in Structural Form are normality and independence (or serial uncorrelatedness) of the explanatory variables. The consequences of relaxing normality were analyzed by Reiersøl [11], who showed that the EVM is identifiable for non-normal distributions. In this paper, we examine the assumption of independence and show that relaxation of this assumption makes the EVM identifiable for normal distributions. This may seem paradoxical at first sight, since one is introducing extra correlation parameters into the problem which may be expected to worsen

* The work reported here was made possible through a grant from IIASA and through US Joint Services Contract No. N00014-67-A-0298-0006 to the Division of Engineering and Applied Physics, Harvard University, Cambridge, Massachusetts.
R.K. Mehra / The error-in-variables model ( E V M )
the identifiability problem. But we show, in this paper, that the introduction of a correlation structure between the independent variables gives extra equations that allow one to identify all the parameters associated with the independent variables without using EVM. This solves the basic identifiability problem of EVM in structural form.

The organization of the paper is as follows. In Section 2, we outline the identifiability problem of EVM. The estimation of the covariance and correlation parameters associated with the independent variable using a first order correlation model is discussed in Section 3. The complete identification and estimation of the EVM model is discussed in Sections 4 and 5. Extensions of the EVM to multiple regression and to randomly time-varying coefficients are discussed in Section 6.
2. Error-in-variables model, structural form

Consider a simplified linear regression model [8],

$$y_i = \alpha + \beta x_i + u_i, \qquad i = 1,\ldots,N, \tag{1}$$

where $\alpha$ and $\beta$ are constant unknown parameters, $x_i$ and $y_i$ are respectively the independent and the dependent variables, and $u_i$ is an error variable, normally distributed, white, zero mean and with variance $\sigma_u^2$. The variable $x_i$ is measured with error,

$$z_i = x_i + v_i, \tag{2}$$

where $v_i$ is normally distributed, white, zero mean, and with variance $\sigma_v^2$. We assume that $v_i$ and $u_j$ are independent for all $i, j$. Notice that $u_i$ includes both the model error in (1) and the measurement error in $y_i$.

In the structural form of EVM, it is further assumed that the $x_i$, $i = 1,\ldots,N$, are independent (of each other and of $v_i$ and $u_i$) and normally distributed with mean $\mu$ and variance $\sigma_x^2$. The unknown parameters in the above model are $\alpha$, $\beta$, $\sigma_v^2$, $\sigma_u^2$, $\mu$ and $\sigma_x^2$. It is intuitively clear that the maximum likelihood estimates of these six parameters, if they exist, can be obtained by equating the sample mean and covariance of the pair $(y_i, z_i)$ to their theoretical values, i.e.

$$E(z_i) = \mu, \tag{3}$$
$$E(y_i) = \alpha + \beta\mu, \tag{4}$$
$$\sigma_z^2 = \sigma_x^2 + \sigma_v^2, \tag{5}$$
$$\sigma_y^2 = \beta^2\sigma_x^2 + \sigma_u^2, \tag{6}$$
$$\sigma_{zy} = \beta\sigma_x^2. \tag{7}$$
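The moment equations (3)-(7) can be checked with a small simulation. This is a sketch under assumed illustrative parameter values (they are not taken from the text); it also makes the counting argument concrete: the five sample moments pin down only five of the six unknowns.

```python
import numpy as np

# Monte Carlo check of the moment equations (3)-(7) for the structural
# EVM.  All parameter values below are illustrative assumptions.
rng = np.random.default_rng(1)
alpha, beta, mu = 1.0, 2.0, 0.5
sig_x2, sig_u2, sig_v2 = 1.5, 0.4, 0.6
N = 500_000

x = mu + np.sqrt(sig_x2) * rng.standard_normal(N)
u = np.sqrt(sig_u2) * rng.standard_normal(N)
v = np.sqrt(sig_v2) * rng.standard_normal(N)
y = alpha + beta * x + u                    # equation (1)
z = x + v                                   # equation (2)

print(z.mean(), mu)                         # (3)
print(y.mean(), alpha + beta * mu)          # (4)
print(z.var(), sig_x2 + sig_v2)             # (5)
print(y.var(), beta**2 * sig_x2 + sig_u2)   # (6)
print(np.cov(z, y)[0, 1], beta * sig_x2)    # (7)
```

Each printed pair agrees to sampling error, but no combination of the five left-hand sides separates $\sigma_x^2$ from $\sigma_v^2$ in (5) without further information.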
The five equations (3)-(7) can be solved for only five of the six unknown parameters, thus creating an identifiability problem.¹ It has been suggested in the literature [1-10] that either $\sigma_v^2$ or the ratio $\sigma_u^2/\sigma_v^2$ should be assumed known to resolve the identifiability problem.² However, such information is generally not available in practice and it would be desirable to devise alternative techniques. An extensive analysis by Zellner [8] shows that the use of Bayesian techniques incorporating a priori information in a less rigid form than the exact specification of $\sigma_v^2$ or $\sigma_u^2/\sigma_v^2$ is possible, but the effect of the prior information remains strong for all sample sizes, as pointed out by Florens et al. [10]. Also a recent robustness study by Brown [32] reveals the extreme sensitivity of the classical estimator to the assumed value of $\sigma_u^2/\sigma_v^2$. In fact, if the assumed value of $\sigma_u^2/\sigma_v^2$ is in error by more than 25%, the ordinary least-squares estimator, even though biased, turns out to have a lower mean square error compared with the classical EVM estimator.

Since the estimation and identification of the EVM seems so out of proportion with its simplicity, one is inclined to ask the question: Is there something missing in the model? Clearly, any model is an idealization of reality and one should make sure that the simplifying assumptions do not make the model degenerate. In the next section, we examine critically the assumption of independence of the $x_i$'s and show that a relaxation of this assumption makes EVM identifiable. In most practical applications, some form of correlation either already exists or may be caused to exist between the independent variables, so that the above assumption is not only useful from a mathematical standpoint but also beneficial from a practical standpoint.

¹ In terms of the likelihood function, this implies that no maximum exists in the admissible range of the parameters [8].
² Let $\lambda = \sigma_u^2/\sigma_v^2$ and solve equations (5) and (6) for $\sigma_v^2$ and $\sigma_u^2$. Then using equation (7), the following quadratic equation for $\beta$ is obtained: $\beta = \sigma_{zy}(\beta^2-\lambda)/(\sigma_y^2-\lambda\sigma_z^2)$.

3. EVM with correlated independent variables

In this section we analyze a particular correlation structure having a Gauss-Markov or state-vector representation [12]. This structure has been used for Time-Series Analysis and System Identification with great success [13-15]. In some applications of EVM, the assumption of this type of
correlation structure may not be completely valid and one may use some other structure more suited to the particular application. However, for those applications where the independent variables come from time series (e.g. in forecasting problems) and for illustrative purposes, we consider the following first order Gauss-Markov model for the independent variables $x_i$. (A more general model will be considered in the next section.)

$$x_{i+1} = \phi x_i + w_i, \tag{8}$$
$$z_i = x_i + v_i, \tag{9}$$

where $0 < |\phi| < 1$ and $w_i$ is a sequence of zero mean³ Gaussian uncorrelated variables with variance $\sigma_w^2$. We have excluded the cases $\phi = 0$ and $|\phi| \ge 1$, since the former leads to EVM with uncorrelated $x_i$'s and the latter leads to a nonstationary sequence. The steady state or stationary covariance of (8) satisfies [12]

$$\sigma_x^2 = \phi^2\sigma_x^2 + \sigma_w^2 \quad\text{or}\quad \sigma_x^2 = \sigma_w^2/(1-\phi^2). \tag{10}$$

If we choose $x_0$ to be normally distributed with zero mean and variance $\sigma_x^2$, then the sequences $(x_i, z_i)$, $i = 1, 2, \ldots$, generated by equations (8)-(9) are stationary.

Now we estimate $\phi$, $\sigma_x^2$ and $\sigma_v^2$ from the sample correlation of the observed sequence $z_i$, $i = 1, 2, \ldots$. Let

$$c(k) = E(z_iz_{i+k}), \qquad k = 0, 1, 2, 3, \ldots.$$

A consistent estimator of $c(k)$ is $\hat c(k)$, where

$$\hat c(k) = \frac{1}{N}\sum_{i=1}^{N-k}z_iz_{i+k}. \tag{11}$$

The correlation sequence $c(k)$ satisfies [13]

$$c(0) = \sigma_z^2 = \sigma_x^2 + \sigma_v^2, \tag{12}$$
$$c(1) = \phi\sigma_x^2, \tag{13}$$
$$c(2) = \phi^2\sigma_x^2. \tag{14}$$

In general,

$$c(k) = \phi^k\sigma_x^2, \qquad k = 1, 2, 3, \ldots. \tag{15}$$

Equations (13) and (14) may be solved for $\phi$ and $\sigma_x^2$:

$$\phi = \frac{c(2)}{c(1)}, \tag{16}$$
$$\sigma_x^2 = \frac{c^2(1)}{c(2)}. \tag{17}$$

From equation (12),

$$\sigma_v^2 = c(0) - \frac{c^2(1)}{c(2)}, \tag{18}$$

and from equation (10),

$$\sigma_w^2 = \left(1 - \frac{c^2(2)}{c^2(1)}\right)\frac{c^2(1)}{c(2)} = \frac{1}{c(2)}\big(c^2(1) - c^2(2)\big). \tag{19}$$

It is easily shown that if $\hat c(k)$ is used for $c(k)$ in equations (16)-(19), the corresponding estimates of $\phi$, $\sigma_x^2$, $\sigma_v^2$ are consistent [13]. Using these estimates in equations (6)-(7), one can obtain consistent estimates of $\beta$ and $\sigma_u^2$. The estimation of $\alpha$ and $\mu$ is done from the sample means of $y_i$ and $z_i$ using equations (3) and (4). Thus the EVM with the correlation structure of equation (8) and $\phi \ne 0$ is identifiable. Of course, the accuracy of the estimates would depend on $\phi$, with smaller values of $\phi$ tending to give larger standard deviations of the parameter estimates. In the limit as $\phi\to 0$, equations (13)-(15) do not provide any information about $\sigma_x^2$ and one has the problem of determining both $\sigma_x^2$ and $\sigma_v^2$ from equation (12) alone. This gives rise to the identifiability problem of the classical EVM. Theoretically, thus, the EVM is identifiable for nonzero $\phi$, however small. Furthermore, if correlation is present, its inclusion in the model would, in general, improve the results.

In the next section, we consider estimation of the EVM with a more general Gauss-Markov correlation structure. From here on, we assume that the independent variable has some correlation. Unless there are strong physical reasons to believe that the independent variable is completely uncorrelated, the above assumption is justified in practice. The procedure to be outlined in the next section may, in fact, be used to test correlatedness.

³ For simplicity, we have assumed $E(x_i) = \mu = 0$. In the general case, one should take $E(w_i) = \mu(1-\phi)$.
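The estimation chain of this section, equation (11) followed by (16)-(19) and then (7) for $\beta$, can be sketched as follows; the parameter values (and $\alpha = \mu = 0$, as in footnote 3) are illustrative assumptions.

```python
import numpy as np

# Moment estimator of Section 3: sample correlations (11), then
# (16)-(19) for phi, sig_x2, sig_v2, sig_w2, then beta from (7).
# Parameter values are illustrative; alpha = mu = 0 (footnote 3).
rng = np.random.default_rng(2)
phi, sig_w2, beta, sig_u2 = 0.8, 0.5, 2.0, 0.3
N = 400_000

sig_x2 = sig_w2 / (1 - phi**2)                    # stationary variance, (10)
x = np.empty(N)
x[0] = np.sqrt(sig_x2) * rng.standard_normal()
w = np.sqrt(sig_w2) * rng.standard_normal(N)
for i in range(N - 1):
    x[i + 1] = phi * x[i] + w[i]                  # (8)
z = x + rng.standard_normal(N)                    # (9), sig_v2 = 1
y = beta * x + np.sqrt(sig_u2) * rng.standard_normal(N)   # (1)

def c_hat(k):                                     # (11)
    return (z[:N - k] * z[k:]).sum() / N

phi_hat  = c_hat(2) / c_hat(1)                    # (16)
sig_x2_h = c_hat(1)**2 / c_hat(2)                 # (17)
sig_v2_h = c_hat(0) - sig_x2_h                    # (18)
sig_w2_h = sig_x2_h * (1 - phi_hat**2)            # (19)
beta_hat = np.mean(z * y) / sig_x2_h              # (7): sigma_zy = beta*sig_x2
print(phi_hat, sig_x2_h, sig_v2_h, sig_w2_h, beta_hat)
```

With $N$ this large all five estimates land close to the true values, illustrating the identifiability claim for $\phi \ne 0$.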
4. Consistent estimation and identification of the correlated EVM
Since the details of some of the techniques to be described here are also
covered elsewhere [13-15], we will only sketch these techniques here. The new or special aspects of the EVM will be described in detail.

Consider again the EVM, equations (1)-(2), with scalar $x_i$. We now generalize the first order correlation structure of equation (8) to an $n$th order correlation structure using a state-vector model, i.e.

$$s_{i+1} = \Phi s_i + \Gamma w_i, \tag{20}$$
$$x_i = hs_i, \tag{21}$$
$$z_i = hs_i + v_i, \tag{22}$$

where $s_i$ is an $n\times 1$ state vector; $\Phi\,(n\times n)$, $\Gamma\,(n\times 1)$ and $h\,(1\times n)$ are respectively a constant matrix and vectors with unknown parameters.⁴ By a basis change, the matrices $\Phi$, $\Gamma$ and $h$ can be put into the following canonical form [13-15]:

$$\Phi = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ \phi_1 & \phi_2 & \cdots & & \phi_n\end{bmatrix}, \qquad \Gamma = \begin{bmatrix}\gamma_1\\ \vdots\\ \gamma_n\end{bmatrix}, \qquad h = [1\ 0\ \cdots\ 0].$$

Furthermore, $w_i$ can be taken to be of unit variance, so that the model (20)-(22) has a total of $(2n+1)$ parameters. Generalizations of the technique discussed in Section 3 to this case are given in [13]. The relevant equations are (23)-(25) below:
$$\begin{bmatrix}\hat\phi_1\\ \vdots\\ \hat\phi_n\end{bmatrix} = -\begin{bmatrix} c(1) & \cdots & c(n)\\ \vdots & & \vdots\\ c(n) & \cdots & c(2n-1)\end{bmatrix}^{-1}\begin{bmatrix}c(n+1)\\ \vdots\\ c(2n)\end{bmatrix}, \tag{23}$$

where the $c(k)$'s are estimated by equation (11);

$$\sigma_v^2 = \frac{1}{\phi_1}\sum_{j=0}^{n}\phi_{j+1}c(j), \qquad \phi_{n+1} = 1, \tag{24}$$

$$\sigma_x^2 = h\Phi^{-1}\begin{bmatrix}c(1)\\ \vdots\\ c(n)\end{bmatrix}. \tag{25}$$

⁴ It is assumed that the system $(\Phi,\Gamma,h)$ is completely controllable and observable. In addition, $\Phi$ is stable, $\phi_1 \ne 0$ and the mean value of $x$, viz. $\mu$, is zero.
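A numerical sketch of equation (23) for $n = 2$; the AR coefficients and noise level are illustrative assumptions. In the convention of (34), the characteristic polynomial is $q^n + \sum_j\phi_jq^{j-1}$, so the simulated recursion carries minus signs on the $\phi_j$.

```python
import numpy as np

# Equation (23): solve for phi_1..phi_n from sample correlations
# c(1)..c(2n).  Here n = 2; phi_true and the noise level are assumed
# illustrative values.  x_{i+2} = -phi_2 x_{i+1} - phi_1 x_i + w_i.
rng = np.random.default_rng(3)
n = 2
phi_true = np.array([0.5, -1.2])       # q^2 - 1.2 q + 0.5: stable roots
N = 400_000
x = np.zeros(N)
w = rng.standard_normal(N)
for i in range(N - 2):
    x[i + 2] = -phi_true[1] * x[i + 1] - phi_true[0] * x[i] + w[i]
z = x + 0.7 * rng.standard_normal(N)   # measurement noise, eq. (22)

def c_hat(k):                           # eq. (11)
    return (z[:N - k] * z[k:]).sum() / N

C = np.array([[c_hat(j + k) for j in range(1, n + 1)] for k in range(n)])
rhs = np.array([c_hat(n + k) for k in range(1, n + 1)])
phi_hat = -np.linalg.solve(C, rhs)      # equation (23)
print(phi_hat)
```

Because only lags $k \ge 1$ enter, the measurement-noise variance drops out of the system, which is what makes (23) usable with noisy observations.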
It should be mentioned that a model equivalent to equations (20)-(22) is the following 'Innovation' or 'Kalman Filter' model [13-15]:

$$\hat s_{i+1|i} = \Phi[\hat s_{i|i-1} + k\nu_i], \tag{26}$$
$$z_i = h\hat s_{i|i-1} + \nu_i, \tag{27}$$

where $\hat s_{i+1|i}$ denotes the conditional mean estimate of $s_{i+1}$ given $\{z_1,\ldots,z_i\}$, and $\nu_i$ denotes the sequence of one-step-ahead prediction errors or innovations [13], since from equation (27)

$$\nu_i = z_i - \hat z_{i|i-1}. \tag{28}$$

It is known [13] that $\nu_i$ is a zero mean Gaussian white noise sequence with variance $\sigma_\nu^2$, where $\sigma_v^2 = \sigma_\nu^2(1-hk)$. The Kalman gain $k\,(n\times 1)$ is a constant vector of gains related to $\sigma_w^2$ and $\sigma_v^2$ (cf. equations (31)-(32)). The interesting property of the model (26)-(27), besides the whiteness of the sequence $\nu_i$, is the uncorrelatedness of $\hat z_{i|i-1}$ with $\nu_i$, since $\hat z_{i|i-1}$ is a function of $\{z_1,\ldots,z_{i-1}\}$ only. This fact is useful in constructing an instrumental variable [16] for estimating $\beta$ as follows. Consider EVM (1)-(2) combined into a single equation

$$y_i = \alpha + \beta z_i - \beta v_i + u_i. \tag{29}$$

Using $\hat z_{i|i-1}$ as instrumental variable (IV) [16],

$$E(y_i\hat z_{i|i-1}) = \beta E(z_i\hat z_{i|i-1})$$

or

$$\beta = \frac{E(y_i\hat z_{i|i-1})}{E(z_i\hat z_{i|i-1})}. \tag{30}$$

Equation (30) can be used to obtain a consistent IV estimator for $\beta$ by replacing the theoretical correlations by their sample values. The sequence $\hat z_{i|i-1} = h\hat s_{i|i-1}$ is generated using equations (26)-(27). The matrix $\Phi$ is estimated from the correlations of the $z_i$'s using equation (23), and $k$ is obtained from the estimates of $\Gamma$ and $\sigma_v^2$ as follows:⁵

$$k = Mh^{\mathrm T}(hMh^{\mathrm T} + \sigma_v^2)^{-1}, \tag{31}$$

where

$$M = \Phi[M - Mh^{\mathrm T}(hMh^{\mathrm T} + \sigma_v^2)^{-1}hM]\Phi^{\mathrm T} + \Gamma\Gamma^{\mathrm T}. \tag{32}$$

⁵ To maintain the uncorrelatedness of $\hat z_{i|i-1}$ with $\{z_i, z_{i+1}, z_{i+2},\ldots\}$, the estimates of $\Phi$ and $k$ used in the Kalman Filter are based on the past data, viz. $\{z_{i-1}, z_{i-2},\ldots\}$. These estimates are computed on-line by using a recursive form of equation (23) [13].
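The gain computation (31)-(32) and the IV estimate (30) can be sketched for the scalar case $n = 1$ ($\Phi = \phi$, $\Gamma = \gamma$, $h = 1$). All parameter values are illustrative assumptions, and, unlike the on-line scheme of footnote 5, the true $\phi$, $\gamma$ and $\sigma_v^2$ are used directly when computing $k$.

```python
import numpy as np

# Scalar (n = 1) sketch: steady-state Riccati equation (32) by fixed-
# point iteration, gain (31), one-step predictions from (26)-(27), and
# the instrumental-variable estimate of beta from (30).
rng = np.random.default_rng(4)
phi, gamma, sig_v2 = 0.8, 0.7, 1.0
beta, sig_u2 = 2.0, 0.3
N = 300_000

w = rng.standard_normal(N)
x = np.zeros(N)
for i in range(N - 1):
    x[i + 1] = phi * x[i] + gamma * w[i]
z = x + np.sqrt(sig_v2) * rng.standard_normal(N)
y = beta * x + np.sqrt(sig_u2) * rng.standard_normal(N)

M = 1.0
for _ in range(200):                       # Riccati iteration, (32)
    M = phi * (M - M**2 / (M + sig_v2)) * phi + gamma**2
k = M / (M + sig_v2)                       # gain, (31)

zhat = np.zeros(N)                         # zhat[i] = h s_hat_{i|i-1}
s = 0.0
for i in range(N):
    zhat[i] = s
    s = phi * (s + k * (z[i] - s))         # (26)

beta_hat = np.dot(y, zhat) / np.dot(z, zhat)   # (30)
print(beta_hat)
```

The estimate is consistent because $\hat z_{i|i-1}$ is uncorrelated with both $v_i$ and $u_i$, exactly the instrumental-variable property noted in the text.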
Other methods for direct and more efficient estimation of $k$ exist and are described in [13-15]. In practice, however, the Maximum Likelihood method seems to give the best results, and it may be used for the simultaneous estimation of all the parameters, denoted collectively as

$$\theta = [\beta, \sigma_u, \sigma_v, \phi_1,\ldots,\phi_n, \gamma_1,\ldots,\gamma_n]^{\mathrm T}.$$

In the next section, we describe a Maximum Likelihood (ML) Estimator, keeping in mind that the above correlation procedure is to be used to obtain a consistent estimator $\theta_0$, which will be required to start the ML estimation iterative procedure. But first we discuss the problem of determining the order $n$ of the system.
4.1. Order determination

The state vector model (26)-(27), along with the canonical forms for $\Phi$ and $h$, may be written in input-output form as [13, 14]

$$z_{i+n} + \sum_{j=1}^{n}\phi_jz_{i+j-1} = \nu_{i+n} + \sum_{j=1}^{n}c_j\nu_{i+j-1}. \tag{33}$$

Equations (26)-(27) and equation (33) are related by their transfer functions, viz.

$$h(qI-\Phi)^{-1}\Phi k + 1 = \left(q^n + \sum_{j=1}^{n}c_jq^{j-1}\right)\bigg/\left(q^n + \sum_{j=1}^{n}\phi_jq^{j-1}\right), \tag{34}$$

where $q$ is a forward shift operator, i.e.

$$q\hat s_{i|i-1} = \hat s_{i+1|i}, \qquad qz_i = z_{i+1}. \tag{35}$$

Equation (33) is an Autoregressive Moving Average (ARMA) model of order $(n,n)$ [17]. Let us successively multiply both sides of equation (33) by $\hat z_{i|i-1}, \hat z_{i+1|i-1},\ldots,\hat z_{i+n|i-1}$ and take expectations, where $\hat z_{i+j|i-1} = E\{z_{i+j}\mid z_1,\ldots,z_{i-1}\}$ is a function of $(z_1,\ldots,z_{i-1})$ only. Then since $E\{\nu_{i+j}z_k\} = 0$ for $k \le i-1$ and for $j \ge 0$, we get

$$E\left(\hat z_{i+k|i-1}z_{i+n} + \sum_{j=1}^{n}\phi_j\hat z_{i+k|i-1}z_{i+j-1}\right) = 0, \qquad k = 0, 1,\ldots,n. \tag{36}$$

This may be written in matrix form as
$$E\begin{bmatrix}\hat z_{i|i-1}z_i & \cdots & \hat z_{i|i-1}z_{i+n}\\ \hat z_{i+1|i-1}z_i & \cdots & \hat z_{i+1|i-1}z_{i+n}\\ \vdots & & \vdots\\ \hat z_{i+n|i-1}z_i & \cdots & \hat z_{i+n|i-1}z_{i+n}\end{bmatrix}\begin{bmatrix}\phi_1\\ \phi_2\\ \vdots\\ \phi_n\\ 1\end{bmatrix} = 0, \tag{37}$$

where the $(n+1)\times(n+1)$ matrix of correlations will be denoted by $\bar C_n$.

Using the linearity property of Gaussian conditional expectations, we can write

$$\hat z_{i+l|i-1} = \sum_{t=1}^{i-1}\alpha_{t,l}z_t. \tag{38}$$

For practical purposes, one approximates equation (38) by

$$\hat z_{i+l|i-1} \approx \sum_{t=i-m}^{i-1}\alpha_{t,l}z_t, \tag{39}$$

where $m < i$ is chosen as the least integer value that essentially achieves the minimum prediction error [33]. From equation (39),

$$E(z_{i+j}\hat z_{i+l|i-1}) = \sum_{t=i-m}^{i-1}\alpha_{t,l}c(i+j-t). \tag{40}$$

Also from equation (39) and the orthogonality property, the $\alpha_{t,l}$ satisfy the equations

$$c(i+l-\tau) = \sum_{t=i-m}^{i-1}\alpha_{t,l}c(t-\tau), \qquad \tau = i-m,\ldots,i-1, \quad l = 1,\ldots,n. \tag{41}$$

Equation (37) shows that if the order of the system is $n$, then the matrix of correlations $\bar C_n$ has rank $n$ and the eigenvector corresponding to the zero eigenvalue is $[\phi_1,\ldots,\phi_n,1]^{\mathrm T}$. An estimate of $\bar C_n$ may be obtained by computing the sample correlations $\hat c(l)$, $l = 0,\ldots,m$ (cf. equation (11)), and by solving equation (41) using an efficient recursive algorithm based on the work of Levinson [18], Durbin [19] and Wiggins and Robinson [20]. The algorithm can be made recursive both in the order $m$ of lags and in the lead variable $j \ge 0$, as shown in [21]. The model order is determined by a rank test of $\bar C_n$ for different values of $n$, or by Whittle's method [33] of testing systems of different order $n$ against an autoregressive model of a very high order.
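The normal equations (41) form a Toeplitz system in the sample correlations; a Levinson-Durbin type recursion [18, 19] solves it in $O(m^2)$ operations, but a plain dense solve shows the idea. The AR(1)-plus-noise data model is an illustrative assumption.

```python
import numpy as np

# Solve (41) for the one-step predictor coefficients (l = 0 in (39))
# and check the orthogonality property that underlies the equations.
rng = np.random.default_rng(5)
N, m = 200_000, 10
x = np.zeros(N)
for i in range(N - 1):
    x[i + 1] = 0.8 * x[i] + rng.standard_normal()
z = x + rng.standard_normal(N)

def c_hat(k):                                   # eq. (11)
    k = abs(k)
    return (z[:N - k] * z[k:]).sum() / N

# Toeplitz system: alpha[j-1] multiplies z_{i-j}, j = 1..m.
T = np.array([[c_hat(r - c) for c in range(m)] for r in range(m)])
b = np.array([c_hat(r + 1) for r in range(m)])
alpha = np.linalg.solve(T, b)                   # Levinson solves this faster

zhat = np.zeros(N)                              # z_hat_{i|i-1}
for j in range(1, m + 1):
    zhat[m:] += alpha[j - 1] * z[m - j:N - j]
err = z[m:] - zhat[m:]
print(np.mean(err * z[m - 1:N - 1]))            # orthogonality: ~ 0
```

The near-zero sample correlation between the prediction error and the lagged data is exactly the orthogonality used to derive (41).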
Another procedure, proposed by Akaike [22], is to use the method of canonical correlations between the sets of variables $\xi = \{\hat z_{i|i-1},\ldots,\hat z_{i+n|i-1}\}$ and $\eta = \{z_{i-1},\ldots,z_{i-m}\}$ for $m$ sufficiently large. In this method, correlations between all normalized linear combinations of $\xi$ and $\eta$, viz. $A\xi$ and $B\eta$ with $\|A\xi\| = \|B\eta\| = 1$, are checked, and the combinations with the least correlation are tested for uncorrelatedness. In essence, a Singular Value Decomposition [23] of the matrix $E\{\xi\eta^{\mathrm T}\}$ is performed and the lowest characteristic value is checked for significance. The test has been found useful in practical problems, but seems to involve more computation than the method proposed above.

Remarks. (1) Akaike [22] has shown that the state vector of the system may be defined as

$$\hat s_{i|i-1} = \begin{bmatrix}\hat z_{i|i-1}\\ \vdots\\ \hat z_{i+n-1|i-1}\end{bmatrix}.$$

Then $\hat s_{i|i-1}$ represents all the information from the past needed to predict the future outputs of the system. Thus for an $n$th order system, $\hat z_{i+n|i-1}$ will be linearly dependent on $\hat s_{i|i-1}$, which also follows from equation (37).

(2) In deriving equation (37) from equation (33), $(\hat z_{i+k|i-1},\ k = 0,\ldots,n)$ were used as instrumental variables. If one uses, instead, lagged values of $z$, viz. $(z_{i-1}, z_{i-2},\ldots)$, as instruments, Modified Yule-Walker equations are obtained [13]. The advantage of using the $\hat z_{i+k|i-1}$'s as instruments is an improvement in the efficiency of estimating $(\phi_1,\ldots,\phi_n)$, since the resulting equations have a structure similar to the Maximum Likelihood estimator discussed below. It is important for order determination that the estimates of $(\phi_1,\ldots,\phi_n)$ be as efficient as possible within the constraints of the computation burden.
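The rank idea behind these order tests can be illustrated directly: for lags $k \ge 1$ the correlations $c(k)$ of $z$ are those of the order-$n$ state model alone, so a Hankel matrix of sample correlations has numerical rank $n$. The AR(1)-plus-noise data ($n = 1$) are an illustrative assumption.

```python
import numpy as np

# Order determination by a rank (singular value) test on a Hankel
# matrix of sample correlations c(1), c(2), ...; true order n = 1.
rng = np.random.default_rng(7)
N = 300_000
x = np.zeros(N)
for i in range(N - 1):
    x[i + 1] = 0.8 * x[i] + rng.standard_normal()
z = x + rng.standard_normal(N)

def c_hat(k):                                   # eq. (11)
    return (z[:N - k] * z[k:]).sum() / N

p = 4
Hk = np.array([[c_hat(i + j + 1) for j in range(p)] for i in range(p)])
sv = np.linalg.svd(Hk, compute_uv=False)
print(sv)          # one dominant singular value: numerical rank 1
```

A formal test would compare the small singular values against their sampling variability, in the spirit of Akaike's canonical correlation method.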
5. Maximum likelihood estimation of the correlated EVM

The EVM described by equations (1) and (20)-(22) may be written in state-vector form as

$$s_{i+1} = \Phi s_i + \Gamma w_i, \tag{42}$$
$$y_i = \beta hs_i + u_i, \tag{43}$$
$$z_i = hs_i + v_i, \qquad i = 1,\ldots,N. \tag{44}$$
Equations (43) and (44) may be combined into a vector equation

$$m_i = Hs_i + n_i, \tag{45}$$

where $m_i = (z_i, y_i)^{\mathrm T}$ is the $2\times 1$ vector of measurements, $n_i = (v_i, u_i)^{\mathrm T}$ is the $2\times 1$ vector of noises with covariance matrix

$$R = \begin{bmatrix}\sigma_v^2 & 0\\ 0 & \sigma_u^2\end{bmatrix}, \tag{46}$$

and $H$ is the $2\times n$ matrix defined as

$$H = \begin{bmatrix}1\\ \beta\end{bmatrix}h. \tag{47}$$

Let $\theta\,((2n+3)\times 1)$ be the vector of all unknown parameters in the above model, i.e.

$$\theta = [\beta, \sigma_u, \sigma_v, \phi_1,\ldots,\phi_n, \gamma_1,\ldots,\gamma_n]^{\mathrm T}. \tag{48}$$
We have shown in Section 4 that $\theta$ is identifiable as long as $\phi_1,\ldots,\phi_n$ are not identically zero and the roots of $\Phi$ lie inside the unit circle. Thus the maximum likelihood estimate of $\theta$ would be expected to be consistent. In fact, stronger results can be proved, viz. that under the above conditions the MLE is asymptotically unbiased, efficient, normal and strongly consistent [24, 25]. We describe here only the procedure for obtaining the MLE of $\theta$. For further computational details the reader may refer to [26, 27].

The log-likelihood function is

$$L(\theta) = \log p(y_1,\ldots,y_N, z_1,\ldots,z_N\mid\theta) = \log p(m_1,\ldots,m_N\mid\theta) = \sum_{j=1}^{N}\log p(m_j\mid m_1,\ldots,m_{j-1},\theta). \tag{49}$$

The conditional density $p(m_j\mid m_1,\ldots,m_{j-1},\theta)$ is normal, with mean and
covariance d e n o t e d respectively b y 6 m r ~r -, and Pr'r ,. As is well known, these quantities can be c o m p u t e d recursively using a K a l m a n Filter [12] of the following form:
L , : r = ,tg,r.
(50)
st,, = ~ !j-, + Kr (m, - H~ ,,..,)
(51)
P;~, ,, = 4)Pj ,fl)-r + FF T
(52)
Kj = P~ ,j_,H'~(HPj ,j-,H'r + R ) - '
(53)
,r = ( I - K ~ H ) ~
It-,.
(54)
T h e initial conditions are specified f r o m a priori k n o w l e d g e as S01o = So and P0o = Po. If no a priori information is available, one m a y use the Information F o r m of the Kalman Filter [28] that p r o p a g a t e s P;l'J-, and Pf,', starting from an initial value of zero. A n o t h e r m e t h o d often used in practice is to set P0 to a very large value which essentially eliminates the d e p e n d e n c e of the K a l m a n filter on initial values. T h e log-likelihood function (49) m a y now be written as N
L(O) = - ~ {{(m, - m, ,, ,)'r(Hej , j - , H T + R ) - ' ( m j - mj l j-,)
+ log ] HPr IJ , HT -F R I} + constants.
(55)
We now m a x i m i z e L(O) with respect to 0 @ O subject to the constraints of eqs. (50)--(54). Since this is a nonlinear p r o g r a m m i n g p r o b l e m , a c o m m o n l y used m e t h o d is modified G a u s s - N e w t o n , the details of which are given in [27]. T h e basic iteration is
0'*' = O' + p M ' ( O ' )
~7
,
(56)
where 0 j is the value of 0 during the j t h iteration, 0 < p < 1 is a step-size p a r a m e t e r , M(O j) is an estimate of the Fisher I n f o r m a t i o n Matrix at 0 r defined as
M(θ^j) = E{−∂²L/∂θ ∂θᵀ}|_{θ=θ^j} = E{(∂L/∂θ)(∂L/∂θ)ᵀ}|_{θ=θ^j},   (57)
" D o u b l e hats are u s e d for e s t i m a t e s c o n d i t i o n a l on the joint set of m e a s u r e m e n t s {m . . . . . . m,..,}. T h u s " j Sjl
t
=E{s,
I m..... 9
m, ] , 0 } = E { g . . . . l Y. . . . . . y,_ h 0}.
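The iteration (56) uses a modified inverse of the information matrix M rather than an exact one. A minimal sketch of the eigenvalue-flooring idea for a symmetric 2 × 2 matrix (the closed-form 2 × 2 eigendecomposition and the default threshold here are assumptions of this sketch; the actual computation is n × n):

```python
import math

def modified_inverse_2x2(a, b, c, tau=1e-8):
    """Modified inverse of a symmetric 2x2 matrix M = [[a, b], [b, c]]:
    eigenvalues of M below tau times the largest eigenvalue are raised to
    that floor before inverting, so a near-singular M still yields a
    usable Gauss-Newton step as in (56)."""
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    lam = [(tr + disc) / 2.0, (tr - disc) / 2.0]   # eigenvalues, descending
    if abs(b) > 1e-12:
        vecs = []
        for l in lam:                              # eigenvector (b, l - a)
            n = math.hypot(b, l - a)
            vecs.append((b / n, (l - a) / n))
    else:                                          # M already diagonal
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    floor = tau * lam[0]
    out = [[0.0, 0.0], [0.0, 0.0]]
    for l, (vx, vy) in zip(lam, vecs):
        w = 1.0 / max(l, floor)                    # inverted, floored eigenvalue
        out[0][0] += w * vx * vx
        out[0][1] += w * vx * vy
        out[1][0] += w * vy * vx
        out[1][1] += w * vy * vy
    return out

Minv = modified_inverse_2x2(2.0, 1.0, 2.0)   # well-conditioned: exact inverse
Msing = modified_inverse_2x2(1.0, 1.0, 1.0)  # singular: stays finite
```

For a well-conditioned M the result coincides with the exact inverse; for a singular M it remains finite, so the step (56) stays well defined.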
Here M⁻¹ is a modified inverse of M obtained by setting eigenvalues of M below a certain threshold (such as 10⁻⁸ times the largest eigenvalue) to larger positive values. Most of the computation in this algorithm is involved in calculating the partial derivatives ∂m̂_{j|j-1}/∂θ and ∂P_{j|j-1}/∂θ from a set of linear recursive equations. As indicated in [26, 27], simplifications to these computations are possible in practice. Notice that M⁻¹(θ̂) evaluated at the MLE θ̂ is the Cramér-Rao lower bound. For large samples, it gives a good estimate of the covariance of the ML estimates.

Remarks. (1) Since the log-likelihood function may be multimodal, it is important to have a good starting estimate θ₀ ∈ Θ of the parameters. The Innovation Correlation-Instrumental Variable technique described in Section 4 is recommended for this purpose. In the small-sample case, even the order n may have to be rechecked using MLE along with an F-test or an information criterion [22].

(2) It is also possible to develop a maximum likelihood estimator using the alternative model (26)-(27). This formulation leads to some simplifications and is also useful in the Multiple Regression case to be discussed in the next section, since in that case a multivariate version of (26)-(27) is identified directly. The new set of equations is
x̂_{i+1|i} = Φ[x̂_{i|i-1} + k(z_i − h x̂_{i|i-1})],   (58)

y_i = β h x̂_{i|i-1} + ε_i,   (59)

where

ε_i = β(ν_i − v_i) + u_i   (60)

(here ν_i = z_i − h x̂_{i|i-1} denotes the innovation, with variance σ²_ν) and

σ²_ε = β² σ²_ν h k + σ²_u.   (61)

Defining

θ' = [β, σ²_u, σ²_ν, φ₁, …, φ_n, k₁, …, k_n],   (62)

the log-likelihood function L(θ') may be written in terms of the Kalman filter,⁷
⁷ The Kalman filter for equations (58)-(59) is derived by regarding equation (58) as an equation containing no process noise, viz. x̂_{i+1|i} = Φ[(I − kh) x̂_{i|i-1} + k z_i], where z_i is a known sequence.
x̂̂_{i+1|i} = Φ[x̂̂_{i|i-1} + (I − kh) κ_i (y_i − β h x̂̂_{i|i-1}) + k(z_i − h x̂̂_{i|i-1})],   (63)

P_{i+1|i} = Φ(I − kh)(I − β κ_i h) P_{i|i-1} (I − kh)ᵀ Φᵀ,   (64)

κ_i = β P_{i|i-1} hᵀ (β² h P_{i|i-1} hᵀ + β² σ²_ν h k + σ²_u)⁻¹.   (65)
L(θ') has the same form as eq. (55), except that σ²_v in R is replaced by (I − hk) σ²_ν, and eqs. (63)-(65) are used to evaluate P_{j|j-1} and m̂_{j|j-1} in terms of the θ' parameters. Notice that no matrix inversion is required in eqs. (64)-(65).
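The variance relation (61) can be checked numerically. For a scalar first-order model (h = 1) the steady-state gain satisfies k σ²_ν = P, so σ²_ε = β² σ²_ν h k + σ²_u = β² P + σ²_u; a Monte-Carlo sketch under assumed parameter values:

```python
import math
import random

# Model (8) with (1)-(2): x[i+1] = phi*x[i] + w_i, z_i = x_i + v_i,
# y_i = beta*x_i + u_i (scalar case, h = 1).  Illustrative values:
phi, q, r = 0.8, 1.0, 1.0        # q = var(w), r = var(v)
beta, su2 = 2.0, 0.25            # su2 = var(u)

# Steady-state prediction variance P, gain k, innovation variance s2.
P = 1.0
for _ in range(200):
    k = P / (P + r)
    P = phi * phi * (1.0 - k) * P + q
k, s2 = P / (P + r), P + r
sigma_eps_pred = beta * beta * s2 * k + su2   # right-hand side of (61)

# Monte-Carlo check: sample variance of eps_i = y_i - beta*xhat_{i|i-1}.
random.seed(1)
x, xhat, acc, N = 0.0, 0.0, 0.0, 100000
for _ in range(N):
    x = phi * x + random.gauss(0.0, math.sqrt(q))
    z = x + random.gauss(0.0, math.sqrt(r))
    y = beta * x + random.gauss(0.0, math.sqrt(su2))
    eps = y - beta * xhat
    acc += eps * eps
    xhat = phi * (xhat + k * (z - xhat))      # innovation form (58)
sigma_eps_mc = acc / N
```

The sample variance of the residual y_i − β x̂_{i|i-1} agrees with (61) to within sampling error.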
6. Extensions

In this section we consider two extensions: (i) the vector β and vector x case (Multiple Regression EVM), and (ii) randomly time-varying β.

6.1. Multiple Regression EVM
Let x̲ denote a p × 1 vector of independent variables and β̲ be the corresponding vector of regression coefficients.⁸ The EVM is

y_i = x̲ᵢᵀ β̲ + u_i,   (66)

z̲_i = x̲_i + v̲_i,   i = 1, …, N.   (67)
We now develop a state-vector model for the series {x̲_i}, of the same form as eqs. (20)-(22) except that h is a (p × n) matrix. The identification of this model is more complicated, but follows the same basic principles as outlined in Sections 4 and 5. The essential differences lie in choosing a canonical form, which in the multi-output case depends on p integers {n₁, …, n_p} such that Σ_{k=1}^{p} n_k = n. The state vector x̂_{i|i-1} (n × 1) is defined as

x̂_{i|i-1} = [ẑ_{i|i-1}(1), …, ẑ_{i+n₁-1|i-1}(1), ẑ_{i|i-1}(2), …, ẑ_{i+n₂-1|i-1}(2), …, ẑ_{i+n_p-1|i-1}(p)]ᵀ,   (68)

⁸ A bar under a scalar variable denotes a vector and a bar under a vector denotes a matrix. A bar under a matrix denotes another matrix of different dimensions.
where ẑ_{i+j|i-1}(k) denotes the (j + 1)-step-ahead predicted estimate of the kth component of z̲_{i+j}. The integers n₁, n₂, …, n_p are determined by examining correlations between the above variables in the order ẑ_{i|i-1}(1), ẑ_{i|i-1}(2), …, ẑ_{i|i-1}(p), ẑ_{i+1|i-1}(1), ẑ_{i+1|i-1}(2), …, ẑ_{i+n_j+1|i-1}(j), where j refers to the output variable with the highest value n_j. Thus, n₁ is determined when ẑ_{i+n₁|i-1}(1) becomes linearly correlated with its antecedents. The procedure is quite straightforward and is well described in [22]. The procedure described in Section 4 using eqs. (37)-(41) is also easily extended to the vector case using the recursive algorithms of Wiggins-Robinson [20, 21]. We now summarize the complete procedure, adding a few more practical details.

(1) Compute the sample correlation matrices of {z̲₁, …, z̲_N}, after subtracting the mean, for lags k up to m ≈ 3N^{1/2}.⁹

(2) Determine a state-vector model for the x̲_i's using either the Canonical Correlation Procedure of Akaike [22] or the procedure of Section 4 extended to the vector case [21]. During this step, the order n, the output numbers n₁, …, n_p and the matrices (Φ, k̲, h) in canonical form are determined:

x̂_{i+1|i} = Φ[x̂_{i|i-1} + k̲ ν̲_i],   (69)

ν̲_i = z̲_i − h x̂_{i|i-1}.   (70)
(3) This step should be performed if, due to small sample size, the procedure of step 2 is expected to yield inefficient estimates that may also affect the correct determination of (n₁, …, n_p) [22]. During this step, obtain maximum likelihood estimates of the parameters in Φ, k̲, h and Σ_νν (the covariance matrix of the ν̲'s), denoted collectively by the vector ψ, by maximizing the log-likelihood function

L(ψ) = −½ Σ_{i=1}^{N} ν̲ᵢᵀ Σ_νν⁻¹ ν̲_i − (N/2) ln |Σ_νν|.   (71)
(4) Use ẑ_{i|i-1} = h x̂_{i|i-1} as instrumental variables with eqs. (66) and (67) to obtain a consistent estimate of β̲:

β̲̂ = [Σ_{i=1}^{N} ẑ_{i|i-1} z̲ᵢᵀ]⁻¹ (Σ_{i=1}^{N} ẑ_{i|i-1} y_i).   (72)

From the sum of squares of the residuals (y_i − z̲ᵢᵀ β̲̂), obtain an estimate of σ²_u, using the estimated values of β̲̂ and Σ_v̲v̲ = (I − h k̲) Σ_νν, the covariance of the measurement noise v̲_i.

⁹ 3N^{1/2} is an empirical number beyond which the accuracy of the correlations is found to degrade seriously.
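A scalar sketch of step (4): using the one-step prediction as the instrument makes (72) consistent, while ordinary least squares of y on the noisy z is attenuated toward zero. All parameter values are illustrative assumptions:

```python
import math
import random

# Scalar sketch of the instrumental-variable estimate (72):
#   beta_hat = (sum_i zhat_i * y_i) / (sum_i zhat_i * z_i),
# with instrument zhat_i = xhat_{i|i-1}.
random.seed(2)
phi, q, r = 0.8, 1.0, 1.0
beta, su = 2.0, 0.5
N = 50000

# Steady-state predictor gain for the instrument (cf. Section 5).
P = 1.0
for _ in range(200):
    P = phi * phi * (1.0 - P / (P + r)) * P + q
k = P / (P + r)

x, xhat = 0.0, 0.0
s_iv_num = s_iv_den = s_yz = s_zz = 0.0
for _ in range(N):
    x = phi * x + random.gauss(0.0, math.sqrt(q))
    z = x + random.gauss(0.0, math.sqrt(r))
    y = beta * x + random.gauss(0.0, su)
    s_iv_num += xhat * y          # instrument is uncorrelated with u_i and v_i
    s_iv_den += xhat * z
    s_yz += z * y                 # naive least-squares accumulators
    s_zz += z * z
    xhat = phi * (xhat + k * (z - xhat))

b_iv = s_iv_num / s_iv_den        # consistent, as in (72)
b_ols = s_yz / s_zz               # attenuated by the measurement error in z
```

The serial correlation of the x's is exactly what makes the instrument informative: without it, the prediction x̂_{i|i-1} would be uncorrelated with z_i and (72) would break down.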
(5) Steps 1-4 give the model structure and a consistent estimate of all the unknown parameters, θ ∈ {β̲, σ_u, Σ_νν, Φ, k̲, h}. We now perform the final maximum likelihood estimation by maximizing, with respect to θ ∈ Θ,

L(θ) = −½ Σ_{i=1}^{N} {[r_i, ν̲ᵢᵀ] (H̲ P_{i|i-1} H̲ᵀ + Σ_εε)⁻¹ [r_i, ν̲ᵢᵀ]ᵀ + ln |H̲ P_{i|i-1} H̲ᵀ + Σ_εε|},

subject to the constraints

x̂_{i+1|i} = Φ[x̂_{i|i-1} + (I − k̲h) κ̲_i r_i + k̲ ν̲_i],   (73)

r_i = y_i − β̲ᵀ h x̂_{i|i-1},   (74)

ν̲_i = z̲_i − h x̂_{i|i-1},   (75)

P_{i+1|i} = Φ(I − k̲h)(I − κ̲_i β̲ᵀ h) P_{i|i-1} (I − k̲h)ᵀ Φᵀ,   (76)

κ̲_i = P_{i|i-1} hᵀ β̲ (β̲ᵀ h P_{i|i-1} hᵀ β̲ + σ²_u + β̲ᵀ h k̲ Σ_νν β̲)⁻¹,   (77)

Σ_εε = (I − h k̲) Σ_νν,   (78)

H̲ = [β̲ᵀ h ; h]   (the first row predicting y_i, the remaining rows predicting z̲_i).   (79)
6.2. EVM with randomly time-varying coefficients

In this section we propose an approximate technique based on Extended Kalman Filtering [28, 29] for the estimation of an EVM with time-varying coefficients. For simplicity, consider the simple EVM (1)-(2) with α = 0 and β a function of i, which now explicitly refers to time. (This connection with time is not essential, but it helps motivation, since such models generally arise in forecasting applications where i is a time variable.)

y_i = β_i x_i + u_i,   (80)

z_i = x_i + v_i.   (2)
One way of modeling random changes in β_i that has been used successfully in practice with ordinary regression models [30] is

β_{i+1} = δ β_i + Δ_i,   (81)

where 0 ≤ δ < 1 is a constant unknown parameter and Δ_i is a Gaussian white noise sequence¹⁰ with unknown variance σ²_Δ.

¹⁰ As shown in Zellner [8], the assumption of a Gaussian prior distribution of β_i may lead to inadmissible values. In such cases, we assume that β_i is a transformed variable with a Gaussian density.
Let us assume, again for simplicity, that the x_i's obey a first-order model (cf. eq. (8)):

x_{i+1} = φ x_i + w_i.   (8)
Regarding (x_i, β_i) as the state vector, the above four equations constitute a linear state-vector model with nonlinear measurements, since the product of states, β_i x_i, appears in eq. (80). Initial estimation of φ and σ²_w can still be carried out in the same fashion as before, but to estimate the β_i's we use an Extended Kalman Filter of the following form [28]:
[x̂_{i+1|i} ; β̂_{i+1|i}] = D {[x̂_{i|i-1} ; β̂_{i|i-1}] + K_i e_i},   (82)

K_i = S_{i|i-1} A_iᵀ (A_i S_{i|i-1} A_iᵀ + R)⁻¹,   (83)

S_{i+1|i} = D (I − K_i A_i) S_{i|i-1} Dᵀ + E,   (84)

where e_i is the innovation defined in (90) below, and

A_i = [1, 0 ; β̂_{i|i-1}, x̂_{i|i-1}],    D = [φ, 0 ; 0, δ],   (85, 86)

R = [σ²_v, 0 ; 0, σ²_u],    E = [σ²_w, 0 ; 0, σ²_Δ].   (87, 88)
The Extended Kalman Filter is not an optimal nonlinear filter for the estimation of the β's and x's. Other filters, such as the Iterated Sequential-Extended Kalman Filter [29], which involves a little more computation, may give better results. The choice of filter depends qualitatively on the amount of noise and the extent of nonlinearity in the equations. Approximate maximum likelihood estimation of the unknown parameters θ = (σ_u, σ_v, δ, σ_Δ, φ, σ_w)ᵀ may be performed by maximizing the following approximate log-likelihood function subject to eqs. (82)-(88):

L(θ) = −½ Σ_{i=1}^{N} {e_iᵀ (A_i S_{i|i-1} A_iᵀ + R)⁻¹ e_i + ln |A_i S_{i|i-1} A_iᵀ + R|},   (89)

where

e_i = [z_i − x̂_{i|i-1} ; y_i − β̂_{i|i-1} x̂_{i|i-1}].   (90)
L(θ) is an approximate log-likelihood, since e_i is not exactly Gaussian and white. Further details of this procedure may be found in [31].
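A self-contained sketch of the filter (82)-(88) for the scalar EVM, with hand-rolled 2 × 2 matrix helpers; the parameter values are illustrative assumptions, and δ = 1 is used as the limiting constant-coefficient case:

```python
import math
import random

def mul2(A, B):
    """2x2 matrix product."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def inv2(A):
    """2x2 matrix inverse."""
    d = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [[A[1][1]/d, -A[0][1]/d], [-A[1][0]/d, A[0][0]/d]]

def tr2(A):
    """2x2 transpose."""
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

phi, delta = 0.9, 1.0            # D = diag(phi, delta); delta = 1: constant beta
qw, qd = 1.0, 0.0                # E = diag(var(w), var(Delta))
rv, ru = 0.5, 0.5                # R = diag(var(v), var(u))
beta_true = 1.5

random.seed(3)
x = 0.0
xh, bh = 0.0, 0.0                # predicted state (xhat, betahat); beta prior far off
S = [[5.0, 0.0], [0.0, 5.0]]     # predicted covariance S_{i|i-1}
D = [[phi, 0.0], [0.0, delta]]
for _ in range(2000):
    x = phi * x + random.gauss(0.0, math.sqrt(qw))
    z = x + random.gauss(0.0, math.sqrt(rv))
    y = beta_true * x + random.gauss(0.0, math.sqrt(ru))
    A = [[1.0, 0.0], [bh, xh]]                 # measurement Jacobian (85)
    e = [z - xh, y - bh * xh]                  # innovation (90)
    V = mul2(mul2(A, S), tr2(A))               # A S A^T + R
    V[0][0] += rv
    V[1][1] += ru
    K = mul2(mul2(S, tr2(A)), inv2(V))         # gain (83)
    xf = xh + K[0][0] * e[0] + K[0][1] * e[1]  # filtered estimates
    bf = bh + K[1][0] * e[0] + K[1][1] * e[1]
    KA = mul2(K, A)
    IKA = [[1.0 - KA[0][0], -KA[0][1]], [-KA[1][0], 1.0 - KA[1][1]]]
    S = mul2(mul2(D, mul2(IKA, S)), tr2(D))    # covariance propagation (84)
    S[0][0] += qw
    S[1][1] += qd
    xh, bh = phi * xf, delta * bf              # one-step prediction (82)
```

Starting from a badly wrong prior β̂ = 0, the coefficient estimate is pulled toward the true value as the x-component of the state becomes well tracked through z.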
Remarks. (1) The above method is easily extended to the Multiple Regression EVM with a general correlation model for the independent variables.

(2) In many forecasting applications where regression is used, the values of the independent variables also have to be predicted for the future. If the independent variables are regarded as serially uncorrelated, this cannot be done. Our procedure directly gives forecasts of both the independent and dependent variables via the equations

x̂_{i+j|i} = Φ^{j-1} x̂_{i+1|i},   (91)

[ŷ_{i+j|i} ; ẑ_{i+j|i}] = H̲ x̂_{i+j|i}.   (92)
7. Conclusions
The EVM in structural form is completely identifiable as long as some serial correlation is present in the independent variables. Both least squares and maximum likelihood techniques have been given to identify and estimate the serial correlations and the EVM parameters. The construction of Bayesian techniques is also straightforward and will be discussed elsewhere. The following summarize what are believed to be the original contributions of the paper: (1) The assumption of no serial correlation of the independent variables is a cause of the identifiability problem. The assumption is generally not justified in practical applications such as forecasting, where regression models are commonly used. (2) Consistent estimates of the correlation parameters are obtained by analyzing the series of independent variables alone. These estimates are then used along with a new instrumental variable technique to obtain consistent estimates of the EVM parameters. (3) In Section 4.1, a computationally efficient technique is given for model order determination. (4) In Sections 5 and 6, a maximum likelihood technique using the observations one at a time and incorporating new information into an 'Innovation' model is described (cf. eqs. (58)-(65) and (73)-(79)). (5) An EVM with randomly time-varying coefficients is estimated using Extended Kalman Filtering and Approximate Maximum Likelihood Estimation. The technique is applicable to nonlinear systems as well.
References

[1] A. Wald, "The fitting of straight lines if both variables are subject to error", The Annals of Mathematical Statistics 11 (1940) 284-300.
[2] J. Neyman and E. Scott, "Consistent estimates based on partially consistent observations", Econometrica 16 (1948) 1-32.
[3] T.C. Koopmans and O. Reiersøl, "The identification of structural characteristics", The Annals of Mathematical Statistics 21 (1950) 165-181.
[4] J. Neyman, "Existence of a consistent estimate of the directional parameter in a linear structural relation between two variables", The Annals of Mathematical Statistics 22 (1951) 497-512.
[5] J. Kiefer and J. Wolfowitz, "Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters", The Annals of Mathematical Statistics 27 (1956) 887-906.
[6] A. Madansky, "The fitting of straight lines when both variables are subject to error", Journal of the American Statistical Association 54 (1959) 173-205.
[7] M.E. Solari, "The maximum likelihood solution of the problem of estimating a linear functional relationship", Journal of the Royal Statistical Society Ser. B 31 (1969) 372-375.
[8] A. Zellner, An introduction to Bayesian inference in econometrics (Wiley, New York, 1971).
[9] P.A.P. Moran, "Estimating structural and functional relationships", Journal of Multivariate Analysis 1 (1971) 232-255.
[10] J.P. Florens, M. Mouchart and J.F. Richard, "Bayesian inference in error-in-variables models", Journal of Multivariate Analysis 4 (1974) 419-452.
[11] O. Reiersøl, "Identifiability of a linear relation between variables which are subject to error", Econometrica 18 (1950) 375-389.
[12] A.E. Bryson and Y.C. Ho, Applied optimal control (Xerox Publishing Company, New York, 1969).
[13] R.K. Mehra, "On-line identification of linear dynamic systems with applications to Kalman filtering", IEEE Transactions on Automatic Control AC-16 (1) (1971).
[14] R.K. Mehra, "Identification in control and econometrics, similarities and differences", Annals of Economic and Social Measurement (1974).
[15] R.K. Mehra, "On the identification of variances and adaptive Kalman filtering", IEEE Transactions on Automatic Control AC-15 (2) (1970).
[16] J. Johnston, Econometric methods (McGraw-Hill, New York, 1963).
[17] G.E.P. Box and G.M. Jenkins, Time series analysis, forecasting and control (Holden-Day, San Francisco, 1970).
[18] N. Levinson, "The Wiener RMS criterion in filter design and prediction", Appendix B of N. Wiener, Extrapolation, interpolation and smoothing of stationary time series (Wiley, New York, 1949).
[19] J. Durbin, "The fitting of time series models", Revue de l'Institut International de Statistique 28 (1960) 233-244.
[20] R.A. Wiggins and E.A. Robinson, "Recursive solution to the multichannel filtering problem", Journal of Geophysical Research 70 (1965) 1885-1891.
[21] R.K. Mehra and P.S. Krishnaprasad, "Fast recursive algorithms for parameter estimation", to appear.
[22] H. Akaike, "Canonical correlation analysis of time series and the use of an information criterion", in: R.K. Mehra and D.G. Lainiotis, eds., System identification: advances and case studies (Marcel Dekker, New York, 1976).
[23] G.H. Golub, "Matrix decompositions and statistical calculations", in: R.C. Milton and J.A. Nelder, eds., Statistical computation (Academic Press, New York, 1969).
[24] K.J. Åström, T. Bohlin and S. Wenmark, "Automatic construction of linear stochastic dynamic models for stationary industrial processes with random disturbances using operating records", Rept. TP 18.150, IBM Nordic Laboratories, Lidingö, Sweden (1965).
[25] L. Ljung, "On the consistency of prediction error identification methods", in: R.K. Mehra and D.G. Lainiotis, eds., System identification: advances and case studies (Marcel Dekker, New York, 1975).
[26] R.K. Mehra, "Identification of stochastic linear dynamic systems using Kalman filter representation", AIAA Journal (1971).
[27] N.K. Gupta and R.K. Mehra, "Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations", IEEE Transactions on Automatic Control (1974).
[28] F.C. Schweppe, Uncertain dynamic systems (Prentice-Hall, New York, 1974).
[29] R.K. Mehra, "A comparison of several nonlinear filters for re-entry vehicle tracking", IEEE Transactions on Automatic Control (1971).
[30] R.K. Mehra and P.S. Krishnaprasad, "A unified approach to structural estimation of distributed lags and stochastic differential equations", Third NBER Conference on Stochastic Control, Washington, D.C. (1974).
[31] R.K. Mehra and J.S. Tyler, "Case studies in aircraft parameter identification", Third IFAC Conference on Identification, The Hague, Netherlands (June 1973).
[32] M.L. Brown, "Robust line estimation with errors in both variables", Working Paper No. 83, Computer Research Center for Economics and Management Science, National Bureau of Economic Research, Cambridge, Mass. (May 1975).
[33] P. Whittle, "Tests of fit in time series", Biometrika 39 (1952) 309-318.
Mathematical Programming Study 5 (1976) 211-226. North-Holland Publishing Company
VALUE OF INFORMATION IN ZERO-SUM GAMES*

Fang-Kuo SUN and Yu-Chi HO
Harvard University, Cambridge, Mass., U.S.A.

Received 10 June 1975
Revised manuscript received 10 December 1975
1. Introduction

Assuming the utility of any consequence can be measured in dollars, the concept of the value of information in a statistical decision problem is often informally defined as¹

Value of information = [The "best" one can achieve with the information] − [The "best" one can achieve without the information].   (1)

In a competitive situation, the meaning of (1) must be interpreted with care. It is intuitively clear that the value of information not only depends on what we mean by "best" but also depends on what information the other player has. In this paper we shall provide a definition of the value of information, similar to (1), for a strictly competitive environment, in particular for two-person zero-sum games. We shall require all games considered to have saddle-point values in either pure or mixed strategies, but they are otherwise completely general; e.g., they could be stochastic two-person differential games. The values of information for such games are derived from a market

* The research reported in this paper was made possible through support extended by the Division of Engineering and Applied Physics, Harvard University and the U.S. Office of Naval Research under the Joint Service Electronics Program by Contract N00014-75-C-0648 and by the National Science Foundation under Grant GK 31511.

¹ We recognize that this definition is open to criticism, since utility is in general not linear in dollars, and the consequence of a decision should be considered together with the cost of the information. However, such considerations only complicate the computation of the meaning of the word "best", and do not affect the spirit of the discussions below [1]. Furthermore, despite this, the above definition is generally accepted, at least as a first approximation, in most of the decision-theoretic literature.
place consideration involving buyers (the two players in the original game under consideration) and a seller. This market place situation constitutes a three-person constant-sum game, the solution of which justifies the particular definition of the value of information for zero-sum games. To maintain continuity, we state and discuss the implications of the various propositions and lemmas in the main text. All proofs are given in the appendix.

2. Formulation and definitions
Consider a situation in which A and B are playing a zero-sum game, G, with value V_a under the information structures I_A and I_B, respectively. Here we use I_A and I_B to represent (i) the system information, which includes the system structure, criterion, measuring mechanism, and their parameters as well as other data, and (ii) the decision information, which is about what nature and possibly the other player have done [2]. Without loss of generality, let A be the maximizer and B be the minimizer. Now suppose that there is additional information ΔI available in the market for both players. In practice this information ΔI can be anything: results which may be obtained from doing research, a new technique which is owned by a third party, or even a piece of information obtained by a double agent who stole it from both A and B. Following the usual approach in one-person decision theory, and assuming first that ΔI is costless, we then have the following possible cases for the saddle-point values of the zero-sum game G, depending on whether none, one, or both players acquire the new information ΔI.

                 Information structure
        A (Maximizer)    B (Minimizer)    Value of game
  a     I_A              I_B              V_a
  b     (I_A, ΔI)        I_B              V_b
  c     I_A              (I_B, ΔI)        V_c
  d     (I_A, ΔI)        (I_B, ΔI)        V_d
However, it is also well known [3] that, everything else being equal, increasing the information of one of the players never hurts and may possibly help him. Thus we have immediately

V_c ≤ V_a ≤ V_b,   V_c ≤ V_d ≤ V_b.   (2)
The order of V_a and V_d will depend on the players' "strength", which has been examined by Ho and Sun [4, 5]. Here we assume that the order of V_a and V_d is arbitrary. Thus, when ΔI becomes available, A and B are facing a new game, in which they must decide whether or not to use ΔI, described as follows:

                   B
                   U      N
  (G*)   A   U    V_d    V_b
             N    V_c    V_a
where " U " stands for "using the information", and " N " stands for "not using". From (2) it is easy to see that Vd is the value of game (G*), and (U, U) is the corresponding saddle point. However, since Vd might not be equal to V, in general and the game is zero-sum, one of players, depending on the order of V,~ and Vo, will definitely suffer from the appearance of this additional piece of information. Thus, one might argue that the value of AI will be negative to that player. Nevertheless, the main reason for A (or B ) to choose " U " is to avoid ending up with Vc (or Vb) which is much worse than V~ from A ' s (or B's) point of view. Therefore if AI is available in the market place, it is still worthwhile for A (or B ) to acquire it. Next, let us define VA/B ( A I ) =
V d -- Vc
= [Gain in $ to A due to A I when B already has A I ] ; VA/a ( A I ) = V b -
Vo
= [Gain in $ to A due to AI when B does not have AI].
(3)
Similarly, we have

V_{B/A}(ΔI) = V_b − V_d;   V_{B/Ā}(ΔI) = V_a − V_c.   (4)

From now on, for notational simplicity, we shall write V_{·/·}(ΔI) as V_{·/·}. Immediately, we have the following lemma.
Lemma 2.1.

V_{A/B̄} < V_{A/B}  ⟹  V_{B/A} < V_{B/Ā};
V_{B/Ā} < V_{B/A}  ⟹  V_{A/B} < V_{A/B̄}.   (5)
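Lemma 2.1, together with the definitions (3)-(4) and the identity of Corollary 3.3 below, can be checked mechanically for any values obeying (2); a small numeric check with assumed, purely illustrative values:

```python
# Saddle-point values for the four information cases, chosen to satisfy (2):
# Vc <= Va <= Vb and Vc <= Vd <= Vb.  These numbers are purely illustrative.
Va, Vb, Vc, Vd = 3.0, 7.0, 1.0, 6.0

V_A_B  = Vd - Vc     # A's gain when B already has the information
V_A_nB = Vb - Va     # A's gain when B does not have it
V_B_A  = Vb - Vd     # B's gain when A already has the information
V_B_nA = Va - Vc     # B's gain when A does not have it

# Lemma 2.1: V_A_nB < V_A_B implies V_B_A < V_B_nA.
lemma_ok = (not (V_A_nB < V_A_B)) or (V_B_A < V_B_nA)

# Corollary 3.3: both mixed min/max sums collapse to Vb - Vc, the most
# the seller C can extract from the two players together.
sum1 = min(V_A_B, V_A_nB) + max(V_B_A, V_B_nA)
sum2 = min(V_B_A, V_B_nA) + max(V_A_B, V_A_nB)
```

The implication in the lemma is just a rearrangement of the four differences, which is why it holds for every admissible choice of V_a, V_b, V_c, V_d.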
3. The three-person information-market game, T

In this section we shall show that T admits an ε-equilibrium solution and that no nondominated coalition solutions are possible. Furthermore, various properties of the equilibrium solution (P*_A, P*_B) for C qualify these prices as eminently reasonable choices for the values of the information ΔI. In the next section we shall discuss another, nonmarket interpretation for the value of ΔI. Let P_A and P_B be the prices of ΔI announced by C for A and B, respectively. Let J_C(P_A, P_B) be C's return; then, depending on the announced prices and the actions taken by A and B, we have
J_C(P_A, P_B) = P_A + P_B   if both A and B buy ΔI,
             = P_A          if A buys ΔI, but B does not,
             = P_B          if B buys ΔI, but A does not,
             = 0            if neither A nor B buys ΔI.

In general, P_A and P_B ∈ [0, ∞). From the assumption that A and B will only buy ΔI from C, A and B are now facing the following nonzero-sum game:
                        B
                        B                          D
  (G**)   A   B   (V_d − P_A, V_d + P_B)    (V_b − P_A, V_b)
              D   (V_c, V_c + P_B)          (V_a, V_a)
where " B " stand for "buy h i " , " D " stands for "does not buy"; (V~ - PA, Vd + PB) means that A and B both bought h i with prices PA and PB, respectively, then (Vd - PA) is the net return to A and (V~ + PB) is the net loss to B, and the rest of entries in (G**) is similarly defined. Thus, A, B and C are playing a 3-person zero-sum game. The solution to this 3-person game usually depends on the solution concepts. However, the following series of results lead to an e-equilibrium solution for T. Proposition 3.1. ( a ) I f
PA< Min(VA/B, VAIn) [or PB <
Min(VB/A, VB/~)],
then A's (or B's) optimal strategy for (G**) is "B", where optimality is in the permanent sense from the viewpoint of nonzero-sum games, i.e., for A,

(V_d − P_A) y₁ + (V_b − P_A) y₂ ≥ x₁[(V_d − P_A) y₁ + (V_b − P_A) y₂] + x₂[V_c y₁ + V_a y₂],

and analogously for B with the entries (V_d + P_B), (V_c + P_B), V_b, V_a of his loss matrix, for all x₁, x₂, y₁ and y₂ belonging to [0, 1] with x₁ + x₂ = 1 and y₁ + y₂ = 1.

(b) If

P_A > Max(V_{A/B}, V_{A/B̄})   [or P_B > Max(V_{B/A}, V_{B/Ā})],

then A's (or B's) optimal strategy for (G**) is "D", where the optimality is also in the permanent sense. Furthermore, for both (a) and (b), "B" and "D" are the unique permanently optimal strategies.

Notice that this proposition provides a bounded region of prices P_A and P_B which might be announced by C in order to maximize his return. In the next several corollaries we tighten these bounds.
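The permanent-optimality claim of part (a) can be checked numerically through the quantity K_A of eq. (7) in the appendix: for a price below both thresholds, "buy" dominates against every mixed strategy of B. The values below are assumed for illustration:

```python
# Grid check of Proposition 3.1(a) via the quantity K_A of eq. (7):
# with P_A below both thresholds, "buy" strictly dominates for A whatever
# mixed strategy (y1, y2) B plays.  Values are illustrative and satisfy (2).
Va, Vb, Vc, Vd = 3.0, 7.0, 1.0, 6.0
V_A_B, V_A_nB = Vd - Vc, Vb - Va
PA = min(V_A_B, V_A_nB) - 0.5          # a price strictly below both thresholds

all_positive = True
for i in range(101):
    y1 = i / 100.0
    y2 = 1.0 - y1
    KA = ((Vd - PA) * y1 + (Vb - PA) * y2) - (Vc * y1 + Va * y2)
    all_positive = all_positive and (KA > 0.0)
```

Since K_A is linear in (y₁, y₂), positivity at the two endpoints already implies positivity everywhere; the grid makes the dominance concrete.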
Corollary 3.2. For any ε > 0 there exists a pair of prices (P*_A, P*_B), with

0 < (V_{A/B} − P*_A) < ε/2,   0 < (V_{B/A} − P*_B) < ε/2,

such that

0 < (V_{A/B} + V_{B/A} − J_C(P*_A, P*_B)) < ε.
Corollary 3.3.

V_{A/B} + V_{B/A} = Min(V_{A/B}, V_{A/B̄}) + Max(V_{B/A}, V_{B/Ā})
                 = Min(V_{B/A}, V_{B/Ā}) + Max(V_{A/B}, V_{A/B̄})
                 = V_b − V_c.

Remark 3.4. By these two corollaries we see that in any case C can announce P_A and P_B as close to V_{A/B} and V_{B/A}, respectively, as he pleases, such that he is guaranteed to get an amount which is arbitrarily close to
(V_b − V_c). Thus we have tightened up the region from below. It is obvious that if (V_d − V_c) = (V_b − V_a), i.e., V_{A/B} = V_{A/B̄} and V_{B/A} = V_{B/Ā}, then P*_A and P*_B are clearly the best prices which C can ask. From now on, we shall assume that (V_d − V_c) ≠ (V_b − V_a). The next corollary narrows the possible prices down even further.

Corollary 3.5. Let (P̄_A, P̄_B) be a pair of prices announced by C. If
(i) V_a ≠ V_b,
(ii) either
P̄_A > Max(V_{A/B}, V_{A/B̄})   or   P̄_B > Max(V_{B/A}, V_{B/Ā}),

then there exist P_A and P_B, satisfying the conditions of Corollary 3.2, such that

J_C(P_A, P_B) > J_C(P̄_A, P̄_B).

Note that (a) condition (i) in the corollary simply requires that ΔI not be simultaneously contained in I_A and I_B, and that it not be useless. (b) If both P̄_A and P̄_B are greater than Max(V_{A/B}, V_{A/B̄}) and Max(V_{B/A}, V_{B/Ā}), respectively, then it is obvious that there exists such a pair (P_A, P_B) with J_C(P_A, P_B) > J_C(P̄_A, P̄_B): by part (b) of Proposition 3.1, A and B will not buy ΔI at such a pair of prices, and thus C will gain nothing.
Corollary 3.6. Let condition (i) in Corollary 3.5 be assumed.
(i) If

P̄_A = Max(V_{A/B}, V_{A/B̄})   (or P̄_B = Max(V_{B/A}, V_{B/Ā})),

then A (or B) will either choose "D" definitely or be indifferent between "D" and "B".
(ii) If

P̄_A = Min(V_{A/B}, V_{A/B̄})   (or P̄_B = Min(V_{B/A}, V_{B/Ā})),

then A (or B) will either choose "B" definitely or be indifferent between "D" and "B". However, if A (or B) chooses "B" definitely, then there exists a pair (P_A, P_B), satisfying the conditions of Corollary 3.2, such that

J_C(P_A, P_B) > J_C(P̄_A, P_B)   [or J_C(P_A, P̄_B)].

Remark 3.7. From these four corollaries, the only region left in which C can possibly increase his return definitely, compared to (P*_A, P*_B), is to announce (P_A, P_B) in R, where
R = {(P_A, P_B): Min(V_{A/B}, V_{A/B̄}) < P_A < Max(V_{A/B}, V_{A/B̄});
              Min(V_{B/A}, V_{B/Ā}) < P_B < Max(V_{B/A}, V_{B/Ā})}.

However, we have the following proposition.
Proposition 3.8. If (P_A, P_B) ∈ R, then
(a) there exist no permanently optimal strategies for both A and B;
(b) J_C(P_A, P_B) = (x₁ P_A + y₁ P_B) < (V_{A/B} + V_{B/A}) = (V_b − V_c) for any Nash equilibrium ((x₁, 1 − x₁), (y₁, 1 − y₁)) with 0 ≤ x₁ ≤ 1 and 0 ≤ y₁ ≤ 1.

Remark 3.9. From this proposition we see that for P_A and P_B in the region R we lose the permanent property of optimality. Furthermore, the Nash equilibrium chosen by A and B, for P_A and P_B in the closure of R, cannot increase the expected gain of C compared to the P*_A and P*_B of Corollary 3.2. Thus the overall situation from C's viewpoint may be summarized in Fig. 3.1, where we assume V_{A/B} < V_{A/B̄}.³ Consequently, a set of equilibrium strategies for A, B and C is

C:  γ*_C = {P*_A = V_{A/B}(ΔI) − ε/2,  P*_B = V_{B/A}(ΔI) − ε/2},  for all ε > 0;

A:  γ*_A(P_A, P_B) = Buy, if P_A < V_{A/B}(ΔI); Do not buy, otherwise;

B:  γ*_B(P_A, P_B) = Buy, if P_B < V_{B/A}(ΔI); Do not buy, otherwise.
It should be noticed that the equilibrium defined by this set is strong in the sense of Remarks 3.4, 3.7 and 3.9.

Finally, we consider the possibility of coalition formation in T. To do this, we go to the characteristic function form of T. Let N = {A, B, C} be the player set, and v(S) be the characteristic function, where S ⊂ N. Note that since our game in normal form is zero-sum, we have v(N) = 0, v(∅) = 0, and v(S) + v(N − S) = 0. Furthermore, (i) if S = {A, B}, then the best A and B can do jointly against C is not to buy ΔI, and v({A, B}) = 0, i.e., v({C}) = 0;
[Fig. 3.1. Partition of the price plane (P_A, P_B) into regions α₁: A and B will both buy; α₂: A and B may or may not buy; α₃: A and B will both not buy; α₄: B will buy, but A will not; α₅: A will buy, but B will not.]
(ii) if S = {A, C}, then B knows A will have ΔI definitely, so the worst B will get is (−V_b), i.e., v({B}) = −V_b; (iii) similarly, if S = {B, C}, then v({A}) = V_c. It is easy to see that v(S) is indeed a characteristic function, by checking the property of superadditivity. Since the existence of the core would imply that of a nondominated imputation, which in turn would imply that there might exist prices for ΔI at which a coalition may form, we first investigate the question: "is the core empty for this 3-person game?". This is the next proposition.

Proposition 3.10. The core of this 3-person zero-sum game is empty.

None of the three possible coalitions {A, B}, {B, C} and {C, A} is "stable" enough that the third player can be prevented from enticing a member of the coalition with a better "deal". Thus no coalition will form.⁴ Under this circumstance, we maintain that V_{A/B} and V_{B/A} can be designated as the values of the information ΔI to A and B, respectively.

³ A similar diagram with obvious changes holds for V_{A/B} > V_{A/B̄}.
⁴ We assume that no secret coalition is permitted; equivalently, the existence of a secret "deal" between two players will always be revealed in the course of time, e.g., during repeated play of the game.

4. Arbitration model for the value of ΔI

Suppose that instead of ΔI being available in the market, an arbitrator, e.g.
the government, is used to determine the price. Furthermore, consider only the case ΔI = (ΔI_A, ΔI_B), where ΔI_A ⊂ I_A and ΔI_B ⊂ I_B. In this case the arbitrator is primarily interested in being "fair". He may wish to determine a pair of prices P_B(ΔI_A) and P_A(ΔI_B) which A and B should receive from each other for making ΔI_A and ΔI_B, respectively, available to the other party. From now on, for notational simplicity, we shall write P_B(ΔI_A) and P_A(ΔI_B) as P_B and P_A, respectively. Thus, after the arbitrator has announced the prices, A and B are facing the following zero-sum game:

                         B
                         B                    D
  (G***)   A   B   V_d + P_B − P_A       V_b − P_A
               D   V_c + P_B             V_a
However, it is obvious that for any pair (P_A, P_B) the value of (G***) will not, in general, be equal to V_a. This means that one of the players, depending on the order of V_a and the value of (G***), will definitely suffer from this exchange. Consequently, if this player has the right not to sell his information, he will simply refuse to exchange the information at the announced price. Nevertheless, if we assume that the arbitrator, e.g. the government, can order each player to sell his information at a price, then the "fair" prices defined by Witsenhausen [7] should be acceptable. Under his definition, a pair of prices is "fair" if it makes both players mix buying and not buying with equal probability as their optimal strategies, i.e., they are indifferent. Mathematically, let P̂_B and P̂_A be the "fair" prices; then

P̂_B = ½[(V_b − V_c) + (V_a − V_d)] = ½(V_{B/A} + V_{B/Ā}),
P̂_A = ½[(V_b − V_c) − (V_a − V_d)] = ½(V_{A/B} + V_{A/B̄}).

It is interesting to note that the sum of P̂_A and P̂_B is equal to (V_b − V_c), i.e., the maximal amount which C may obtain from A and B in the market model.
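These price formulas are easy to check numerically. The following sketch (the concrete payoff values V_a, ..., V_d are hypothetical, chosen only for illustration) verifies that at (P̂_A, P̂_B) each player is indifferent between "Buy" and "Don't" against a 50/50 mixture by the opponent, and that the two prices sum to V_b − V_c:

```python
# Hypothetical game values: V_a (neither informed), V_b (only A informed),
# V_c (only B informed), V_d (both informed).
Va, Vb, Vc, Vd = 0.0, 4.0, -2.0, 1.0

# Witsenhausen's "fair" prices.
PB = 0.5 * ((Vb - Vc) + (Va - Vd))   # price paid for AI_A
PA = 0.5 * ((Vb - Vc) - (Va - Vd))   # price paid for AI_B

# Payoff matrix of (G***) to A: rows = A's choice, columns = B's choice.
G = [[Vd + PB - PA, Vb - PA],   # A buys
     [Vc + PB,      Va]]        # A doesn't

# Against B mixing 50/50, A's two rows must give equal payoff.
rowB = 0.5 * (G[0][0] + G[0][1])
rowD = 0.5 * (G[1][0] + G[1][1])
assert abs(rowB - rowD) < 1e-12

# Against A mixing 50/50, B's two columns must give equal loss.
colB = 0.5 * (G[0][0] + G[1][0])
colD = 0.5 * (G[0][1] + G[1][1])
assert abs(colB - colD) < 1e-12

# The prices sum to V_b - V_c, the most C could extract in the market model.
assert abs((PA + PB) - (Vb - Vc)) < 1e-12
```

The indifference conditions are linear in P_A and P_B, which is why solving them gives exactly the half-sum formulas above.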
5. Conclusion
In this paper, the concept of the value of information in the one-person decision problem has been extended to the strictly competitive environment.
Under the market model discussed herein, we find that the intuitively reasonable extension of definition (1) can in fact be justified.
6. Appendix

Since each proof concerning A is similar to the corresponding proof concerning B, we give the proof for only one of them.
Proof of Lemma 2.1. If V_{A/B̄} < V_{A/B}, then (V_b − V_a) < (V_d − V_c). It follows that (V_b − V_d) < (V_a − V_c). Thus V_{B/A} < V_{B/Ā}, and (4)–(5) hold.

Proof of Proposition 3.1. The expected payoff to A is

x₁[(V_d − P_A)y₁ + (V_b − P_A)y₂] + x₂[V_c y₁ + V_a y₂].

Since x₁ and x₂ belong to [0, 1] and x₁ + x₂ = 1, we know that if

[(V_d − P_A)y₁ + (V_b − P_A)y₂] > [V_c y₁ + V_a y₂]     (6)

for any y₁ and y₂, then (1, 0) is the unique permanent optimal strategy (denoted p.o.s.) for A. To show (6), let

K_A = [(V_d − P_A)y₁ + (V_b − P_A)y₂] − [V_c y₁ + V_a y₂].     (7)

By the assumptions that y₁ and y₂ are nonnegative with y₁ + y₂ = 1 and that P_A < Min(V_{A/B}, V_{A/B̄}), it follows that (V_d − P_A − V_c) and (V_b − P_A − V_a) are positive, and so is K_A. We therefore conclude that "Buy" is the unique p.o.s. for A. By (7), it is obvious for part (b) that if P_A > Max(V_{A/B}, V_{A/B̄}), then K_A is negative and A's p.o.s. is "D".
Proof of Corollary 3.2. First assume that Min(V_{A/B}, V_{A/B̄}) = V_{A/B}, i.e., (V_d − V_c) < (V_b − V_a). From Proposition 3.1 we know that if P_A, which is announced by C, is less than Min(V_{A/B}, V_{A/B̄}), then A will buy AI, independently of what P_B is. Thus, for given ε > 0, there exists a P*_A with 0 < [Min(V_{A/B}, V_{A/B̄})] − P*_A < ε/2 such that A will choose "Buy", independently of what P_B is. Thus the game facing B becomes
        B: B               B: D
     V_d + P_B             V_b
B will choose "Buy" if P_B < (V_b − V_d) = V_{B/A}. Obviously, for this same ε > 0, C can pick a P*_B with 0 < (V_{B/A} − P*_B) < ε/2 such that B will also choose "Buy". Since

J_C(P_A, P_B) = P_A + P_B   if both A and B buy AI,

we have

0 < (V_{A/B} + V_{B/A} − J_C(P*_A, P*_B)) < ε.

Thus, if Min(V_{A/B}, V_{A/B̄}) = V_{A/B} = V_d − V_c, then the assertion is true. Conversely, if Min(V_{A/B}, V_{A/B̄}) = V_{A/B̄} = V_b − V_a, then V_{A/B} = V_d − V_c > V_b − V_a.
This implies that Min(V_{B/A}, V_{B/Ā}) = V_{B/A}. Thus, by interchanging the roles of A and B in the previous proof, we complete the proof.

Proof of Corollary 3.3. If Min(V_{A/B}, V_{A/B̄}) = V_{A/B}, then Max(V_{B/A}, V_{B/Ā}) = V_{B/A} from Lemma 3.1, so

Min(V_{A/B}, V_{A/B̄}) + Max(V_{B/A}, V_{B/Ā}) = V_{A/B} + V_{B/A} = V_b − V_c.

Conversely, if Max(V_{B/A}, V_{B/Ā}) = V_{B/Ā}, it follows that Min(V_{A/B}, V_{A/B̄}) = V_{A/B̄}; then

Min(V_{A/B}, V_{A/B̄}) + Max(V_{B/A}, V_{B/Ā}) = (V_b − V_a) + (V_a − V_c) = V_{A/B̄} + V_{B/Ā} = V_b − V_c.

Proof of Corollary 3.5. If only P̂_A > Max(V_{A/B}, V_{A/B̄}), then A will not buy AI, by the fact that the optimality derived in Proposition 3.1 is permanent. The game facing B is then

        B: B               B: D
     V_c + P_B             V_a

Thus, B will buy AI only if P_B < V_a − V_c. Since V_a ≠ V_b by assumption, we have V_b > V_a, and so

P_B < (V_a − V_c) < (V_b − V_c) = V_{A/B} + V_{B/A}.
By Corollary 3.2, there exists a pair (P°_A, P°_B) for which J_C(P°_A, P°_B) is arbitrarily close to V_b − V_c. A similar argument applies when the roles of A and B are reversed.

Proof of Corollary 3.6. (i) From Proposition 3.1 we have

K_A = [(V_d − P_A − V_c)y₁ + (V_b − P_A − V_a)y₂].

Thus, if P̂_A = Max(V_{A/B}, V_{A/B̄}), then K_A ≤ 0 and A will either choose "D" definitely or be indifferent between "D" and "B".

(ii) Let

K_B = [(V_d + P_B − V_b)x₁ + (V_c + P_B − V_a)x₂].

The first part follows easily from the fact that K_A ≥ 0 (or K_B ≤ 0) if the assumption holds. For the second part, if A chooses "B" definitely, then K_A > 0. Thus, if Min(V_{A/B}, V_{A/B̄}) = V_d − V_c = P_A, then

K_A = (V_b − P_A − V_a)y₂ > 0

only if y₂ ≠ 0. However, if y₂ = 1, then

J_C(P_A, P_B) = V_d − V_c < V_b − V_c,

and the assertion is obviously true. By Proposition 3.1, if P_B > Max(V_{B/A}, V_{B/Ā}), then y₂ = 1, and that is the previous case. Hence we only need to consider the case P_B ≤ Max(V_{B/A}, V_{B/Ā}). If y₁ and y₂ are both nonzero, we have

J_C(P_A, P_B) = P_A + y₁P_B ≤ V_d − V_c + y₁(V_b − V_d) < V_d − V_c + V_b − V_d = V_b − V_c.

Here the second inequality holds because y₁ belongs to (0, 1). Conversely, if Min(V_{A/B}, V_{A/B̄}) = V_b − V_a = P_A, then

K_A = (V_d − P_A − V_c)y₁ > 0

only if y₁ ≠ 0. However, if y₁ = 1 and x₁ = 1, then P_B ≤ V_b − V_d. Thus,

J_C(P_A, P_B) = V_b − V_a + P_B ≤ V_b − V_a + V_b − V_d < V_b − V_a + V_a − V_c = V_b − V_c.

The assertion is true. Next, if y₁ belongs to (0, 1), then

J_C(P_A, P_B) = V_b − V_a + y₁(V_a − V_c) < V_b − V_a + V_a − V_c = V_b − V_c.

The proof is therefore completed.
Proof of Proposition 3.8. (a) We shall prove this part by contradiction. Let (x*₁, x*₂) be a p.o.s. for A with x*₁ + x*₂ = 1 and x*₁, x*₂ ∈ [0, 1], and define

M_A = [x*₁(V_d − P_A) + x*₂V_c − x₁(V_d − P_A) − x₂V_c]y₁ + [x*₁(V_b − P_A) + x*₂V_a − x₁(V_b − P_A) − x₂V_a]y₂.     (8)

For (x*₁, x*₂) to be a p.o.s. we need M_A ≥ 0 for all x₁, x₂, y₁, y₂ ∈ [0, 1] with x₁ + x₂ = 1, y₁ + y₂ = 1, and x₁ ≠ x*₁. First, if Min(V_{A/B}, V_{A/B̄}) = V_{A/B}, then choose y₁ = 1 and y₂ = 0 if x*₁ > x₁ (or y₁ = 0 and y₂ = 1 if x*₁ < x₁). Substituting y₁ = 1 and y₂ = 0 into (8), we obtain

M_A = (x*₁ − x₁)(V_d − P_A − V_c) < 0.
The inequality holds by the facts that Min(V_{A/B}, V_{A/B̄}) = V_{A/B}, (V_d − V_c) < P_A, and x*₁ − x₁ > 0. Thus (x*₁, x*₂) cannot be a p.o.s. in this case. For the remaining cases the proof is identical.

(b) The first equality is trivial. To prove the second part, assume that P_A > (V_b − V_a) and P_B > (V_b − V_d), so that (P_A, P_B) ∈ R. Now if x₁ = 1, then the game facing B is

        B: B               B: D
     V_d + P_B             V_b

Since P_B > (V_b − V_d) by assumption, B will choose "D", and y₁ = 0. However, if y₁ = 0, then since P_A > (V_b − V_a), A will choose "D", and x₁ = 0. Thus ((1, 0), (0, 1)) and ((1, 0), (1, 0)) cannot be Nash equilibria.
Similarly, it can be shown that ((0, 1), (1, 0)) and ((0, 1), (0, 1)) cannot be Nash equilibria either. Thus, the only region in which a Nash equilibrium may exist when (P_A, P_B) ∈ R is x₁ ∈ (0, 1). Let S_A and S_B denote the expected payoff to A and the expected loss of B, respectively:

S_A = [x₁, 1 − x₁] [[V_d − P_A, V_b − P_A], [V_c, V_a]] [y₁, 1 − y₁]′
    = x₁y₁(V_d − V_b − V_c + V_a) + x₁(V_b − P_A) + (1 − x₁)V_a + y₁(V_c − V_a);

S_B = [x₁, 1 − x₁] [[V_d + P_B, V_b], [V_c + P_B, V_a]] [y₁, 1 − y₁]′
    = x₁y₁(V_d − V_c − V_b + V_a) + y₁(V_c + P_B) + (1 − y₁)V_a + x₁(V_b − V_a).

The necessary conditions for ((x₁, 1 − x₁), (y₁, 1 − y₁)) with x₁, y₁ ∈ (0, 1) to be an equilibrium are (∂S_A/∂x₁) = 0 and (∂S_B/∂y₁) = 0, i.e.,

y₁ = (V_a − V_b + P_A)/(V_d − V_b − V_c + V_a),    x₁ = (V_a − V_c − P_B)/(V_d − V_b − V_c + V_a).

It is easy to check that this pair is indeed a Nash equilibrium. Next,

x₁P_A + y₁P_B = [(V_a − V_c − P_B)P_A + (V_a − V_b + P_A)P_B]/(V_d − V_b − V_c + V_a) < V_b − V_c.
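The interior-equilibrium formulas are easy to verify on sample numbers. A minimal sketch (all payoff values and prices below are hypothetical, chosen inside the region R and with V_d − V_b − V_c + V_a > 0 so that the mixed equilibrium is interior):

```python
# Hypothetical values; prices satisfy P_A > V_b - V_a and P_B > V_b - V_d.
Va, Vb, Vc, Vd = 0.0, 1.0, -3.0, 2.0
PA, PB = 2.0, 1.0

D = Vd - Vb - Vc + Va                 # common denominator
y1 = (Va - Vb + PA) / D               # B's probability of "Buy"
x1 = (Va - Vc - PB) / D               # A's probability of "Buy"
assert 0 < x1 < 1 and 0 < y1 < 1

# A's payoff matrix (A pays P_A when A buys; P_B goes to C, not to A).
SA = [[Vd - PA, Vb - PA],
      [Vc,      Va]]
# B's loss matrix (B pays P_B when B buys).
SB = [[Vd + PB, Vb],
      [Vc + PB, Va]]

# At a mixed equilibrium each player is indifferent between his two
# pure choices, given the other's mixture.
a_buy  = y1 * SA[0][0] + (1 - y1) * SA[0][1]
a_dont = y1 * SA[1][0] + (1 - y1) * SA[1][1]
assert abs(a_buy - a_dont) < 1e-12

b_buy  = x1 * SB[0][0] + (1 - x1) * SB[1][0]
b_dont = x1 * SB[0][1] + (1 - x1) * SB[1][1]
assert abs(b_buy - b_dont) < 1e-12

# C's expected revenue falls short of V_b - V_c at these prices.
assert x1 * PA + y1 * PB < Vb - Vc
```

The indifference checks are exactly the first-order conditions ∂S_A/∂x₁ = 0 and ∂S_B/∂y₁ = 0 solved above.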
The last inequality holds by the assumptions. For the converse part, the proof is similar.

Proof of Proposition 3.10. Since u(N) = 0 and

u({A}) + u({B}) + u({C}) = −V_b + V_c < 0,

we have

u(N) > Σ_{i∈N} u({i}).

Thus this game is essential. Furthermore, since this game is zero-sum, the result follows from the theorem [8] that the core of an essential constant-sum game is empty.
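The argument can be illustrated numerically. The characteristic function below is our reconstruction: the text gives u({B}) = −V_b and u({A}) = V_c, and we fill in the remaining coalition values by zero-sum complementarity u(S) = −u(N∖S) with u({C}) = 0; the payoff numbers are sample values only.

```python
# Sample values with V_b > V_c, as in the text.
Vb, Vc = 4.0, -2.0

# Reconstructed characteristic function (assumptions noted above).
u = {frozenset("A"): Vc, frozenset("B"): -Vb, frozenset("C"): 0.0,
     frozenset("AB"): 0.0, frozenset("AC"): Vb, frozenset("BC"): -Vc,
     frozenset("ABC"): 0.0}

# Essentiality: u(N) > u({A}) + u({B}) + u({C}), since -V_b + V_c < 0.
assert u[frozenset("ABC")] > sum(u[frozenset(p)] for p in "ABC")

# Core emptiness, checked on a grid of candidate imputations
# (x_A, x_B, x_C) with x_A + x_B + x_C = u(N) = 0.
def in_core(xA, xB):
    x = {"A": xA, "B": xB, "C": -xA - xB}
    return all(sum(x[p] for p in S) >= u[frozenset(S)] - 1e-9
               for S in ("A", "B", "C", "AB", "AC", "BC"))

grid = [i * 0.5 for i in range(-12, 13)]        # covers [-6, 6]
assert not any(in_core(a, b) for a in grid for b in grid)
```

The grid search is only an illustration, of course; the emptiness itself follows from the cited theorem on essential constant-sum games.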
References

[1] K. Arrow, "The value and demand for information", in: C. McGuire and R. Radner, Eds., Decision and organization (North-Holland, Amsterdam, 1972) Ch. 6.
[2] Y.C. Ho and K.C. Chu, "Information structure in dynamic multi-person control problems", Automatica 10 (1974) 341-351.
[3] H. Witsenhausen, "On the relations between the values of a game and its information structure", Information and Control 19 (3) (1971).
[4] Y.C. Ho and F.K. Sun, "Value of information in two-team zero-sum problems", Journal of Optimization Theory and Applications 14 (5) (1974).
[5] F.K. Sun and Y.C. Ho, "Role of information in the stochastic zero-sum differential game", Journal of Optimization Theory and Applications (1976).
[6] H.S. Witsenhausen, "Alternatives to the tree model for extensive games", in: J.D. Grote, Ed., Proceedings of NATO advanced study institute on differential games (Reidel Publishing Co., Dordrecht, 1974).
[7] H.S. Witsenhausen, see [6].
[8] G. Owen, Game theory (Saunders Company, Philadelphia, 1968).
Mathematical Programming Study 5 (1976) 227-243. North-Holland Publishing Company
SEQUENTIAL DECISION AND STOCHASTIC CONTROL*

Edison TSE
Stanford University, Stanford, Calif., U.S.A.

Received 8 July 1975
Revised manuscript received 22 March 1976

The tight coupling between learning and control in a sequential decision or stochastic control problem is considered in this paper. A quantitative description of the learning capability of a control law is defined in terms of Shannon's information measure. It is shown that when the control law cannot influence the amount of learning, the stochastic control problem has the separation property regardless of the cost criterion.
1. Introduction

In many processes arising in social, economic, engineering and biological systems, the problem of decision making or control under various sources of uncertainty is inherent. Usually, if all the uncertain events were made known to the decision maker before his decision was made, he could, at least in principle, make the best decision, optimizing his objective function. In reality, however, the decision maker is forced to make a decision without full knowledge of these uncertainties. Therefore, an intuitive yet plausible approach is to treat the problem as two interconnected problems: first the decision maker tries to estimate the uncertain events, and then he makes an optimum decision based on the estimation results. In most cases, the estimation procedure is independent of the decision rule chosen, whereas the optimum decision made at a particular time depends strongly on the effect of the present decision on future estimation accuracy, which will, in turn, help in making good decisions in the future. This tight coupling between learning and control exists in most statistical sequential decision problems. In the stochastic control literature, this interaction between learning and control has been studied under the topic of dual control theory [1, 5, 14].

* This research is supported by ONR Contract N00014-75-C-0738.
E. Tse / Sequential decision and stochastic control
One very common and widely used (or misused) approach to the sequential decision problem is to neglect the dependency between the present decision and future estimation performance; a suboptimal decision is then made by assuming that the estimation performance remains unchanged in the future. In so doing, we break the problem into two disjoint subproblems, and such an approach is called separation. It is, however, conceivable that for a certain class of degenerate problems the separation approach yields the optimum decision. In fact, it is shown in Section 4 that the separation property is a structural property of the stochastic dynamic process. Except for a difference in terminology, a discrete-time stochastic control problem is equivalent to a Bayesian sequential decision problem. A common-ground formulation is to express the problem in terms of the conditional density of the predicted observation, which depends on the past decisions and the underlying uncertainty. This is done in Section 2. In Section 3, the optimal control (or decision) rule is derived using the principle of optimality [4]. A quantitative measure of the learning capability of a control law is defined in Section 4. In terms of this measure, the coupling between learning and control in a stochastic control problem can be discussed quantitatively. When such coupling disappears, the system is said to be neutral. One of the main results established in Section 4 relates neutrality of a system to the separation property in a stochastic control problem. Using this result one can deduce, very simply, all the separation results which are pertinent to discrete-time dynamical systems. Several specific cases are discussed in Section 5.
2. Stochastic control problem

Consider a stochastic control process operating for N time steps. Observations are made at each step, and a control is applied after each observation. The stochastic control problem is to find a control law which maps the cumulative information data into a value of the control input such that the expected value of an objective criterion is minimized or maximized. We shall distinguish two types of uncertainties: parameter (or process) uncertainty and observation uncertainty. Let (Ω, ℬ) be a measurable space which represents the parameter uncertainty, and let (Ω′, ℬ′) be another measurable space which represents the observation uncertainty. For each parameter θ ∈ Ω, we have a probability measure P_θ on ℬ′. For a fixed admissible deterministic control sequence U^{N−1} ≜ {u₀, ..., u_{N−1}} with
u_k ∈ 𝒰_k, k = 0, 1, ..., N − 1, and a fixed parameter θ ∈ Ω, we have a random (ℬ′-measurable) sequence

y_i = y_i[ω; θ, U^{i−1}];   i = 1, 2, ..., N;   ω ∈ Ω′

which represents the observations at each time step. If there is no confusion, the arguments of y_i will be dropped for simplicity, and the sequence {y₁, ..., y_k} will be denoted by Y^k. Given {P_θ; θ ∈ Ω} and {y_i[ω; θ, U^{i−1}]; θ ∈ Ω, u_k ∈ 𝒰_k, k = 0, 1, ..., i − 1}_{i=1}^N, a family of conditional densities {p(y_k | Y^{k−1}, θ; U^{k−1})}¹ can be deduced. Let u = {u_k(Y^k, U^{k−1}) ∈ 𝒰_k}_{k=0}^{N−1} be an admissible control law. The family of conditional densities on y_k is given by

p_u(y_k | Y^{k−1}; θ) = p(y_k | Y^{k−1}, θ; u₀, u₁(Y¹, u₀), ..., u_{k−1}(Y^{k−1}, U^{k−2}));   θ ∈ Ω     (2.1)
where the subscript u on the left-hand side denotes that the control law u is held fixed. Thus we see that the specification of ℱ ≜ {p(y_k | Y^{k−1}, θ; U^{k−1}) | θ ∈ Ω, u_i ∈ 𝒰_i, i = 0, ..., k − 1}_{k=1}^N allows us to deal with both deterministic control sequences and feedback control laws. If a control law is being considered, we shall use the notation û_i to denote the realization of the control value at step i. A payoff, or a cost, is associated with the triple (θ, U^{N−1}, Y^N), denoted by J(θ, U^{N−1}, Y^N). A stochastic control problem is defined as follows: Given a family of conditional densities ℱ and a prior probability measure P on ℬ, find an optimal control law u*_k = u*_k(Y^k, U*^{k−1}) ∈ 𝒰_k, k = 0, 1, ..., N − 1, such that E{J(θ, U^{N−1}, Y^N)} is either maximized or minimized. In the following discussion, we shall always minimize E{J(θ, U^{N−1}, Y^N)}. The above formulation is within the framework of Bayesian sequential decision analysis. To see that the classical stochastic control problem is included in the present general formulation, consider a stochastic system described by
x_{k+1} = f_k(x_k, u_k, w_k);   x_k ∈ Rⁿ, u_k ∈ Rˢ, w_k ∈ Rʳ     (2.2)

with observation

y_k = h_k(x_k, v_k);   y_k ∈ Rᵐ, v_k ∈ R^d;   m ≥ d     (2.3)
where {v_k, w_k} are noise processes with known statistics, and h_k(x, ·) is a

¹ In the following discussion, we assume that these conditional densities are well defined. If y_k is a discrete random variable, we replace the conditional density by a conditional probability and the integrals by summations.
one-one mapping from R^d to Rᵐ for all x ∈ Rⁿ. The performance criterion is given by E{J(X^N, U^{N−1}, Y^N)}. To cast this problem into the above framework, let us define

θ = {x₀, w₀, ..., w_{N−1}}.

For a given observation sequence Y^{k−1} and a realized control sequence Û^{k−1}, a parameter θ is consistent with the input–output data if there exists a sequence V^{k−1} = {v₁, ..., v_{k−1}} satisfying

y_i = h_i(x_i(Û^{i−1}, W^{i−1}, x₀), v_i);   i = 1, ..., k − 1     (2.4)

where W^{k−1} = {w₀, ..., w_{k−1}} and x_i(U^{i−1}, W^{i−1}, x₀) is the solution of (2.2) with U^{i−1} = Û^{i−1}. Since h_k(x, ·) is a one-one mapping from R^d to Rᵐ for all x ∈ Rⁿ, if V^{k−1} exists it is also the unique solution of (2.4). If θ is consistent with (Y^{k−1}, Û^{k−1}), then

p(y_k | Y^{k−1}, θ; Û^{k−1}) = p(y_k | x_k(Û^{k−1}, W^{k−1}, x₀), V^{k−1}, W^{k−1})     (2.5)

where V^{k−1} is the unique sequence that satisfies (2.4). If, on the other hand, θ is not consistent with (Y^{k−1}, Û^{k−1}), the conditional density for y_k is zero. Therefore, given (2.2), (2.3) and the noise statistics, the family ℱ and the prior P are specified. The performance criterion can be rewritten in terms of (θ, U^{N−1}, Y^N):

E{J(X^N, U^{N−1}, Y^N)} = E{J̃(x_i(U^{i−1}, W^{i−1}, x₀), i = 1, ..., N; x₀; U^{N−1}; Y^N)} ≜ E{J(θ, U^{N−1}, Y^N)}.     (2.6)
Note that in the above discussion there is no requirement that {x_k} be a Markov sequence (i.e., that {w_k} be a white noise sequence), or that {w_k} be independent of {v_k}.
3. Optimal stochastic control

Let us assume that a certain control law {u_k(·)}, k = 0, 1, ..., N − 2, is specified, and the observation sequence Y^{N−1} is obtained. The remaining problem is to select u_{N−1} such that the conditional expected cost
E{J(θ; U^{N−1}, Y^N) | Y^{N−1}} = E{J(θ; Û^{N−2}, u_{N−1}, Y^N) | Y^{N−1}, Û^{N−2}}

is minimized. Using Bayes' rule and equation (2.1), we have

E{J(θ; U^{N−1}, Y^N) | Y^{N−1}} = C_{N−1}(Y^{N−1}, Û^{N−2}) ∫∫ J(θ; Û^{N−2}, u_{N−1}, Y^N) p(y_N | Y^{N−1}, θ; Û^{N−2}, u_{N−1}) ∏_{k=1}^{N−1} p(y_k | Y^{k−1}, θ; Û^{k−1}) p(θ) dθ dy_N     (3.1)

where C_{N−1}(Y^{N−1}, Û^{N−2}) is a normalizing constant for the conditional density.² It should be noted that in (3.1) only the realized values of the past control sequence affect the computation of the conditional expected cost. The minimizing u_{N−1} has the form³

u*_{N−1} = u*_{N−1}(Y^{N−1}, Û^{N−2}).     (3.2)
The optimal return function has the form

J*_{N−1}(Y^{N−1}, Û^{N−2}) = E{J(θ; Û^{N−2}, u*_{N−1}(Y^{N−1}, Û^{N−2}), Y^N) | Y^{N−1}, Û^{N−2}}.     (3.3)
In applying the optimization procedure backward in time, we have the following equations (k = 0, 1, ..., N − 2):

J*_k(Y^k, Û^{k−1}) = min_{u_k ∈ 𝒰_k} E{J*_{k+1}(Y^{k+1}, Û^{k−1}, u_k) | Y^k, Û^{k−1}},     (3.4)

E{J*_{k+1}(Y^{k+1}, Û^{k−1}, u_k) | Y^k, Û^{k−1}} = C_k(Y^k, Û^{k−1}) ∫∫ J*_{k+1}(Y^{k+1}, Û^{k−1}, u_k) p(y_{k+1} | Y^k, θ; Û^{k−1}, u_k) ∏_{i=1}^k p(y_i | Y^{i−1}, θ; Û^{i−1}) p(θ) dθ dy_{k+1},     (3.5)

and u*_k is a minimizing solution of (3.4). We can easily see that

u*_k = u*_k(Y^k, Û^{k−1}).     (3.6)
The optimal control law is then given by {u*_k}_{k=0}^{N−1}. For stochastic control problems with state and observation equations given by (2.2) and (2.3), consider the additive cost

J(X^N, U^{N−1}, Y^N) = K[x_N, y_N] + Σ_{k=0}^{N−1} L_k[x_k, u_k, y_k].     (3.7)

² We shall write formally dP(θ) = p(θ)dθ and allow p(θ) to contain impulses.
³ Throughout the discussion, we assume that the optimal control law exists.
Equations (2.6) and (3.1) give

E{J(θ, U^{N−1}, Y^N) | Y^{N−1}} = E{J(X^N, U^{N−1}, Y^N) | Y^{N−1}, Û^{N−2}}
  = E{K[x_N, y_N] + L_{N−1}[x_{N−1}, u_{N−1}, y_{N−1}] + Σ_{k=0}^{N−2} L_k[x_k, û_k, y_k] | Y^{N−1}, Û^{N−2}}.     (3.8)

Since E{L_k(x_k, û_k, y_k) | Y^{N−1}, Û^{N−2}} is independent of u_{N−1} for k = 0, 1, ..., N − 2, the optimizing u*_{N−1}(Y^{N−1}, Û^{N−2}) is obtained by minimizing E{K(x_N, y_N) + L_{N−1}(x_{N−1}, u_{N−1}, y_{N−1}) | Y^{N−1}, Û^{N−2}}, where x_N and y_N depend on u_{N−1} through the system dynamics (2.2) and the observation equation (2.3). Define the expected optimal cost I_{N−1}(Y^{N−1}, Û^{N−2}) as

I_{N−1}(Y^{N−1}, Û^{N−2}) ≜ E{K[f_{N−1}(x_{N−1}, u*_{N−1}, w_{N−1}), h_N(f_{N−1}(x_{N−1}, u*_{N−1}, w_{N−1}), v_N)] + L_{N−1}(x_{N−1}, u*_{N−1}, y_{N−1}) | Y^{N−1}, Û^{N−2}}.     (3.9)
Equation (3.4) becomes

J*_{N−1}(Y^{N−1}, Û^{N−2}) = E{Σ_{k=0}^{N−2} L_k(x_k, û_k, y_k) | Y^{N−1}, Û^{N−2}} + I_{N−1}(Y^{N−1}, Û^{N−2}).     (3.10)
When we carry the minimization one step backward in time, equation (3.4) with k = N − 2 becomes

J*_{N−2}(Y^{N−2}, Û^{N−3}) = min_{u_{N−2}} E{Σ_{k=0}^{N−3} L_k(x_k, û_k, y_k) + L_{N−2}(x_{N−2}, u_{N−2}, y_{N−2}) + I_{N−1}(Y^{N−2}, y_{N−1}, Û^{N−3}, u_{N−2}) | Y^{N−2}, Û^{N−3}}
  = E{Σ_{k=0}^{N−3} L_k(x_k, û_k, y_k) | Y^{N−2}, Û^{N−3}} + min_{u_{N−2}} E{L_{N−2}(x_{N−2}, u_{N−2}, y_{N−2}) + I_{N−1}(Y^{N−2}, y_{N−1}, Û^{N−3}, u_{N−2}) | Y^{N−2}, Û^{N−3}}     (3.11)

where y_{N−1} depends on u_{N−2} through (2.2) and (2.3). Inductively, the expected optimal cost to go, I_k(Y^k, Û^{k−1}), satisfies
I_k(Y^k, Û^{k−1}) = min_{u_k} E{L_k(x_k, u_k, y_k) + I_{k+1}(Y^k, y_{k+1}, Û^{k−1}, u_k) | Y^k, Û^{k−1}}     (3.12)

and the optimal return function is given by

J*_k(Y^k, Û^{k−1}) = E{Σ_{i=0}^{k−1} L_i(x_i, û_i, y_i) | Y^k, Û^{k−1}} + I_k(Y^k, Û^{k−1}).     (3.13)
Equation (3.12) is the usual dynamic programming equation. It should be noted that in the above derivation, (3.1)–(3.6), we deviate from the conventional approach of introducing the concept of an information state [7, 11, 12]; instead, we express the results in terms of the conditional density of the future observations. These results, equations (3.1)–(3.6), indicate clearly that the coupling between learning and control lies in the fact that, in general, the control law can influence the predictive conditional density of the output observations. A measure of how much a control law can influence the learning capability is discussed in the next section.
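The backward recursion can be illustrated on a small finite example. The following sketch is our own toy problem, not from the paper: θ ∈ {0, 1}, two noisy observations whose law does not depend on the control (so the stages decouple), and quadratic stage costs. It checks that stage-by-stage minimization against the posterior reproduces the exhaustive optimum over all feedback strategies u₁(y₁), u₂(y₁, y₂):

```python
from itertools import product

# Toy problem: theta in {0, 1} uniform; observations with
# p(y | theta) = 0.8 if y == theta else 0.2 (control-independent);
# cost (u_1 - theta)^2 + (u_2 - theta)^2, u_k chosen from a finite grid.
thetas = (0, 1)
prior = {0: 0.5, 1: 0.5}
U = [i / 4 for i in range(5)]

def p_y(y, th):
    return 0.8 if y == th else 0.2

def post(ys):                        # posterior p(theta | observations)
    w = {t: prior[t] for t in thetas}
    for y in ys:
        for t in thetas:
            w[t] *= p_y(y, t)
    s = sum(w.values())
    return {t: w[t] / s for t in thetas}

def stage(po):                       # min_u E{(u - theta)^2 | posterior}
    return min(sum(po[t] * (u - t) ** 2 for t in thetas) for u in U)

# Backward recursion: with a control-independent observation law,
# the optimal expected cost is a sum of per-stage minima.
dp = 0.0
for y1, y2 in product((0, 1), repeat=2):
    pr = sum(prior[t] * p_y(y1, t) * p_y(y2, t) for t in thetas)
    dp += pr * (stage(post((y1,))) + stage(post((y1, y2))))

# Exhaustive search over every feedback law u_1(y_1), u_2(y_1, y_2).
best = float("inf")
for s1 in product(U, repeat=2):
    for s2 in product(U, repeat=4):
        cost = 0.0
        for t in thetas:
            for y1, y2 in product((0, 1), repeat=2):
                pr = prior[t] * p_y(y1, t) * p_y(y2, t)
                cost += pr * ((s1[y1] - t) ** 2 + (s2[2 * y1 + y2] - t) ** 2)
        best = min(best, cost)

assert abs(dp - best) < 1e-9
```

Because the observation density here does not depend on the control, this toy problem is "neutral" in the sense defined in the next section, which is exactly why the recursion decouples.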
4. Learning, neutrality and separation

In this section we shall define a measure of the learning capability of a control law. Neutrality of the system, a concept which was introduced by Feldbaum [5] and vaguely defined by several authors [3, 8], is then defined in terms of lack of learning capability. The justification for the definition given here rests on the relation we can establish between neutrality and the separation property. For a given admissible control law u = {u_k(·)}_{k=0}^{N−1}, the mutual information between θ and Y^k is defined as

I_u(Y^k; θ) = ∫ p_u(Y^k | θ) p(θ) log [p_u(Y^k | θ)/p_u(Y^k)] dY^k dθ     (4.1)

for k = 1, 2, ..., N. Denote ℛ_f ≜ {x ∈ R^N | x_i = I_u(Y^i; θ), i = 1, ..., N, u an admissible control law} ⊂ R^N. Similarly, for a given admissible control sequence U^{N−1} we can define

I(Y^k; θ | U^{k−1}) = ∫ p(Y^k | θ; U^{k−1}) p(θ) log [p(Y^k | θ; U^{k−1})/p(Y^k | U^{k−1})] dY^k dθ     (4.2)

for k = 1, 2, ..., N, and denote ℛ_d ≜ {x ∈ R^N | x_i = I(Y^i; θ | U^{i−1}), i = 1, ..., N, U^{N−1} an admissible control sequence}. Since an admissible control sequence can be viewed as a singular admissible control law, we must have ℛ_d ⊂ ℛ_f. The properties of the mutual information measure between Y^k and θ are discussed in the literature: mutual information is always nonnegative, and its numerical value represents, quantitatively, the amount of information about the unknown "source" θ contained in the observations Y^k (see e.g. [6, 10]). Thus a measure of the learning capability of the control law u can be defined in terms of I_u(Y^k; θ). A partial ordering on the class of admissible control laws, in terms of learning capability, can be defined as follows:

u₁ ≼ u₂   if   I_{u₁}(Y^k; θ) ≤ I_{u₂}(Y^k; θ),   k = 1, ..., N.
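For a discrete toy channel the integral (4.1) reduces to a finite sum, and the ordering it induces is easy to exhibit. In the sketch below (the binary channel and its accuracy parameter are our own illustration, not from the paper), different control laws are imagined to achieve different observation accuracies, and I(Y; θ) orders them by learning capability:

```python
import math

# theta in {0, 1} with prior 1/2; one observation y through a binary
# channel with accuracy acc = p(y == theta | theta).
def mutual_info(acc, prior=(0.5, 0.5)):
    I = 0.0
    for th, pth in enumerate(prior):
        for y in (0, 1):
            p_y_th = acc if y == th else 1.0 - acc
            p_y = sum(p * (acc if y == t else 1.0 - acc)
                      for t, p in enumerate(prior))
            if p_y_th > 0.0:
                I += pth * p_y_th * math.log(p_y_th / p_y)
    return I

# A more accurate observation channel learns more about theta;
# a pure-noise channel (acc = 1/2) yields zero mutual information.
assert mutual_info(0.9) > mutual_info(0.6) > mutual_info(0.5) == 0.0
```

A neutral system, in the sense of Definition 4.1 below, is one in which every admissible law lands on the same such value at every stage.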
It should be noted that, in general, the ordering is not complete; i.e., there may exist a pair (u₃, u₄) such that I_{u₃}(Y^k; θ) > I_{u₄}(Y^k; θ) while I_{u₃}(Y^{k′}; θ) < I_{u₄}(Y^{k′}; θ) for some k′ ≠ k. An element x ∈ ℛ_f represents the "distribution" of cumulative learning throughout the control period. In order to acquire a deeper understanding of the interaction between learning and control, we must investigate the relationship between the set ℛ_f and the structural properties of the optimal control law when the prior density and the performance criterion are specified. In the following, we shall establish some results in this direction. Let Ũ^{N−1} be an admissible reference deterministic control sequence, and let {ỹ_i[ω; θ, Ũ^{i−1}]}_{i=1}^N be the resulting random observation sequence. For any admissible deterministic control sequence U^{N−1} and θ ∈ Ω, we assume that the mapping

φ_{U,θ}^{(i)}: y_i[ω; θ, Ũ^{i−1}] → y_i[ω; θ, U^{i−1}]

is nonsingular for all i = 1, ..., N. This implies that if θ is known, then after observing what has happened, we can deduce exactly what would have happened had we acted differently in the past. Such an assumption holds in many applications. In many cases, if the zero control sequence is admissible, we may take it as the reference control sequence. For a fixed reference control sequence, the density p(Ỹ^k | θ; Ũ^{k−1}) depends only on θ and may be denoted by p(Ỹ^k | θ). Let us now define the neutrality of a system.

Definition 4.1. The system described by the family of conditional densities ℱ is said to be neutral if, for any given prior p(θ), the range ℛ_f contains one single element of R^N.
It turns out that the neutrality of the system can be investigated in terms of ℛ_d instead of ℛ_f. To establish this result, we first prove a lemma which will be useful throughout this section.

Lemma 4.2. For any given prior p(θ), the range ℛ_d contains one single element if and only if, for all admissible control sequences U^{N−1} and θ ∈ Ω,

p(Y^k | θ; U^{k−1})/p(Ỹ^k | θ) = g_k(Y^k, Ỹ^k; U^{k−1}),   k = 1, ..., N,     (4.3)

where g_k(·, ·; ·) is independent of θ ∈ Ω.
Proof. The sufficiency is clear: if (4.3) is true, then

p(Y^k | θ; U^{k−1})/p(Y^k | U^{k−1}) = p(Y^k | θ; U^{k−1})/∫ p(Y^k | θ′; U^{k−1})p(θ′)dθ′ = p(Ỹ^k | θ)/p(Ỹ^k),   k = 1, ..., N     (4.4)

for all admissible U^{N−1} and any prior p(θ); therefore ℛ_d contains only one element. Conversely, if ℛ_d contains only one element for any prior p(θ), we must have in particular

∫ p(Y^k | θ; U^{k−1})p(θ) log [p(Y^k | θ; U^{k−1})/∫ p(Y^k | θ′; U^{k−1})p(θ′)dθ′] dY^k dθ
  = ∫ p(Ỹ^k | θ)p(θ) log [p(Ỹ^k | θ)/∫ p(Ỹ^k | θ′)p(θ′)dθ′] dỸ^k dθ,   k = 1, ..., N     (4.5)

for all admissible control sequences U^{k−1}. Equation (4.5) can be rewritten as (Y^k is evaluated at y_i = φ_{U,θ′}^{(i)}(ỹ_i), where θ′ is an arbitrary fixed parameter in Ω)

∫ p(Ỹ^k | θ)p(θ) log [p(Y^k | θ; U^{k−1})p(Ỹ^k)/(p(Y^k | U^{k−1})p(Ỹ^k | θ))] dỸ^k dθ = 0.     (4.6)

Using the inequality

ln x ≤ x − 1, with equality if and only if x = 1,     (4.7)

and the fact that p(θ) is an arbitrary prior density on Ω, we deduce from (4.6) that

p(Y^k | θ; U^{k−1})p(Ỹ^k)/(p(Y^k | U^{k−1})p(Ỹ^k | θ)) = 1,     (4.8)

which implies (4.3). An important implication of (4.8) is that the mapping φ_{U,θ}^{(i)} must be independent of θ; i.e., y_i = φ_U^{(i)}(ỹ_i), i = 1, ..., N.
Theorem 4.3. The system described by the family of conditional densities ℱ is neutral if and only if, for any prior p(θ), ℛ_d has only one single element.

Proof. Necessity is obvious, since ℛ_f ⊃ ℛ_d. Conversely, if ℛ_d has only one element, then using (2.1) and (4.3) we have, for any admissible control law,

p_u(Y^k | θ)/p_u(Y^k) = p(Y^k | θ; Û^{k−1})/p(Y^k | Û^{k−1}) = [g_k(Y^k, Ỹ^k; Û^{k−1})p(Ỹ^k | θ)]/[g_k(Y^k, Ỹ^k; Û^{k−1})p(Ỹ^k)] = p(Ỹ^k | θ)/p(Ỹ^k),   k = 1, 2, ..., N,     (4.9)
and thus ℛ_f contains only one single element.

Intuitively, Theorem 4.3 implies that if we cannot control the "amount" of learning about the unknown parameter using deterministic control sequences, we cannot do so even with feedback. Therefore, we would expect the control phase to be completely decoupled from the learning phase, and some sort of separation result must hold. This is treated in the remainder of this section. In the degenerate case where p(y_k | Y^{k−1}, θ; U^{k−1}) = p(y_k | Y^{k−1}, θ), the sequence U^{N−1} cannot "control" the output random sequence Y^N, and we shall therefore call the resulting problem a stochastic optimization problem. A given stochastic control problem (i.e., with a given ℱ and J) is said to have the separation property if the optimal control law can be obtained by solving a stochastic optimization problem which is transformed from the original stochastic control problem. From the above results, we note that the following statements are equivalent:
(1) the system described by the family of conditional densities ℱ is neutral;
(2) for any prior p(θ), ℛ_d has only one single element;
(3) p(Y^k | θ; U^{k−1}) and p(Ỹ^k | θ) are related by

p(Y^k | θ; U^{k−1}) = p(Ỹ^k | θ) g_k(Y^k, Ỹ^k; U^{k−1});

(4) the mapping φ_{U,θ}^{(i)} is independent of θ; and
(5) the σ-algebra generated by Ỹ^k and that generated by Y^k(U^{k−1}), for an arbitrary admissible sequence U^{k−1}, are equivalent.

Therefore, if the system is neutral, we can easily show, using the results in Section 3 and the above equivalent statements, that regardless of the cost criterion the original problem can be transformed into the following stochastic optimization problem: Given the family of conditional densities ℱ̃ ≜ {p(ỹ_k | Ỹ^{k−1}; θ), θ ∈ Ω}_{k=1}^N and a prior density p(θ) on Ω, find an optimal control law ũ*_k = ũ_k(Ỹ^k, Ũ*^{k−1}) ∈ 𝒰_k, k = 0, 1, ..., N − 1, such that

E{J̃(θ, U^{N−1}, Ỹ^N)} ≜ E{J(θ, U^{N−1}, φ_U^{(i)}(ỹ_i); i = 1, ..., N)}

is minimized. The optimal control law for the original problem is then given by

u*_k = u*_k(φ_U^{(i)−1}(y_i), i = 1, 2, ..., k; U*^{k−1}),     (4.10)

where φ_U^{(i)−1}(·) is the inverse mapping of φ_U^{(i)}(·). We thus have the following statement:

Separation Theorem. If the system described by the family of conditional densities ℱ is neutral, then the stochastic control problem has the separation property regardless of the cost criterion.
On the other hand, the converse statement is probably true, since if the system is not neutral one can find a cost criterion which does not yield the separation property. It should be noted that the separation property is a structural property which is independent of the prior distribution on Ω and of the cost criterion.
Remarks. (1) One may question the motivation for the definition of neutrality given in this paper. Indeed, one may define neutrality via the relation (4.3), or via the condition that the mapping φ_{U,θ}^{(i)} be independent of θ. In fact, a different definition of neutrality, based on the latter condition, was given by Bar-Shalom and Tse [3]. Such a concept of neutrality, however, is one of kind only, not of degree. The present definition may allow one to define a quantitative measure of neutrality in terms of the "size" of ℛ_d or ℛ_f. Naturally, the usefulness of such a concept can only be justified by the theoretical results one can derive from it. This is a subject for further study.
(2) The present definition of neutrality of systems can easily be extended to continuous-time systems, since it is defined in terms of the mutual information between two random processes.

5. Some special cases
The results obtained in the previous section will now be applied to several classes of stochastic control problems.

(a) Linear systems with partially observed state. Consider a linear system

x_{k+1} = A_k x_k + b_k(u_k) + w_k;   x_k ∈ Rⁿ,
y_k = C_k x_k + v_k     (5.1)

where w_k, v_k are random processes with known statistics (not necessarily independent or white). The cost criterion is

cost = E{J(X^N; U^{N−1}, Y^N)}.     (5.2)
It was noted earlier that this class of problems can be put into the general framework of Section 2 by defining θ = {x₀, w₀, ..., w_{N−1}}. If the zero input sequence is taken as the reference deterministic input sequence, then ỹ_k satisfies (x̃₀ = x₀)

x̃_{k+1} = A_k x̃_k + w_k + b_k(0),   ỹ_k = C_k x̃_k + v_k.     (5.3)

Because of linearity,

y_k = ỹ_k + C_k x̄_k     (5.4)

where

x̄_{k+1} = A_k x̄_k + b_k(u_k) − b_k(0),   x̄₀ = 0.     (5.5)

Since x̄_k depends only on U^{k−1}, the mapping φ_{U,θ}^{(k)} as given by (5.4) is independent of the parameter θ; thus, from Section 4, the stochastic control problem has the separation property regardless of the cost (5.2). To solve the stochastic optimization problem, only the conditional density of ỹ_k, which is specified by (5.3), is needed. The cost criterion (5.2) can also be rewritten as

E{J(X^N; U^{N−1})} = E{J(X̃^N + X̄^N; U^{N−1})}     (5.6)
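The decomposition (5.4)–(5.5) is easy to confirm by simulation. In the sketch below (a scalar system; the coefficients, the affine b(·), and all sample values are our own), the shift y_k − ỹ_k is computed once from the controls alone and then checked against simulations with several random draws of θ = {x₀, w₀, w₁, ...}:

```python
import random

# Scalar instance of x_{k+1} = A x_k + b(u_k) + w_k, y_k = C x_k + v_k.
random.seed(0)
A, C = 0.9, 2.0
def b(u):
    return 1.5 * u + 0.3          # affine control channel, b(0) = 0.3
N = 5
u_seq = [0.7, -0.2, 1.1, 0.4, 0.0]

def observations(x0, w, v, u):
    x, ys = x0, []
    for k in range(N):
        ys.append(C * x + v[k])
        x = A * x + b(u[k]) + w[k]
    return ys

# xbar_k from (5.5): xbar_{k+1} = A xbar_k + b(u_k) - b(0), xbar_0 = 0.
xbar, shifts = 0.0, []
for k in range(N):
    shifts.append(C * xbar)
    xbar = A * xbar + b(u_seq[k]) - b(0.0)

# For several random "theta" (initial state and process noise), the
# observed shift relative to the zero-control reference trajectory is
# always the same C * xbar_k, independently of theta.
for _ in range(3):
    x0 = random.gauss(0, 1)
    w = [random.gauss(0, 1) for _ in range(N)]
    v = [random.gauss(0, 1) for _ in range(N)]
    y = observations(x0, w, v, u_seq)
    ytilde = observations(x0, w, v, [0.0] * N)
    for k in range(N):
        assert abs((y[k] - ytilde[k]) - shifts[k]) < 1e-9
```

This θ-independence of the shift is exactly the neutrality condition of Section 4, which is why separation holds here for any cost.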
where $\tilde{x}_k$ satisfies the deterministic equation (5.5). The stochastic control problem is equivalent to the stochastic optimization problem of finding an optimal control law $u_k^*(\bar{Y}^k, U^{*k-1})$ such that

$E\{J(\bar{X}^N + \tilde{X}^N; U^{N-1})\} = \int J(\bar{X}^N + \tilde{X}^N; U^{N-1})\, p(\bar{X}^N)\, d\bar{X}^N$   (5.7)
is minimized subject to the deterministic equation (5.5). For additive cost

$J(X^N; U^{N-1}) = K(x_N) + \sum_{k=0}^{N-1} L_k(x_k, u_k)$   (5.8)

the dynamic programming equation for the stochastic optimization problem becomes

$I_k(\bar{Y}^k, U^{k-1}) = \min_{u_k} E\{L_k(\tilde{x}_k + \bar{x}_k, u_k) + I_{k+1}(\bar{Y}^k, \bar{y}_{k+1}, U^{k-1}, u_k) \mid \bar{Y}^k\}$   (5.9)

where $\bar{y}_{k+1}$, given by (5.4), is independent of $u_k$, and $\tilde{x}_k$ satisfies (5.5) with $u_i$ replaced by the realized values, $i = 0, \ldots, k-1$. This is the general separation result. We shall specialize the result to two classes of problems.

Suppose $\{v_k\}$, $\{w_k\}$ are independent white noise sequences; then

$E\{L_{N-1}(\bar{x}_{N-1} + \tilde{x}_{N-1}, u_{N-1}) + K(\bar{x}_N + \tilde{x}_N) \mid \bar{Y}^{N-1}\}$
$\quad = \int \{L_{N-1}(\bar{x}_{N-1} + \tilde{x}_{N-1}, u_{N-1}) + K(A_{N-1}(\bar{x}_{N-1} + \tilde{x}_{N-1}) + b_{N-1}(u_{N-1}) + w_{N-1})\}$
$\qquad \cdot\, p(\bar{x}_{N-1} \mid \bar{Y}^{N-1})\, p(w_{N-1})\, d\bar{x}_{N-1}\, dw_{N-1}.$   (5.10)
If, in addition, $p(\bar{x}_i \mid \bar{Y}^i)$ depends on the data $\bar{Y}^i$ through the relation

$p(\bar{x}_i \mid \bar{Y}^i) = \psi(\bar{x}_i - \hat{\bar{x}}_{i|i}); \qquad i = 1, \ldots, N-1$   (5.11)

where $\hat{\bar{x}}_{i|i} = E\{\bar{x}_i \mid \bar{Y}^i\}$, we see that the optimal control law which minimizes (5.10) has the structural form

$u_{N-1}^* = u_{N-1}^*(\hat{\bar{x}}_{N-1|N-1} + \tilde{x}_{N-1}) = u_{N-1}^*(\hat{x}_{N-1|N-1})$   (5.12)

where $\hat{x}_{N-1|N-1} = E\{x_{N-1} \mid Y^{N-1}, U^{N-2}\}$. Inductively, it can easily be shown that

$u_k^* = u_k^*(\hat{\bar{x}}_{k|k} + \tilde{x}_k); \qquad k = 0, 1, \ldots, N-1.$   (5.13)
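The decomposition behind (5.12)-(5.13) rests on linearity: the state of (5.1) splits exactly into the noise-driven reference trajectory (5.3) plus the deterministic, control-driven component (5.5). A minimal numerical check, with hypothetical matrices and $b_k(u) = Bu$ (so $b_k(0) = 0$):

```python
import numpy as np

# Simulate (5.1) alongside the reference system (5.3) and the
# deterministic component (5.5); matrices and noise levels are hypothetical.
rng = np.random.default_rng(0)
n, N = 2, 20
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])              # b_k(u) = B u, so b_k(0) = 0
u = rng.normal(size=(N, 1))               # an arbitrary realized input sequence
w = rng.normal(scale=0.05, size=(N, n))   # process noise, common to x and xbar
x0 = rng.normal(size=n)

x = np.zeros((N + 1, n)); xbar = np.zeros((N + 1, n)); xtil = np.zeros((N + 1, n))
x[0] = xbar[0] = x0                       # xbar_0 = x_0, xtil_0 = 0
for k in range(N):
    x[k + 1] = A @ x[k] + B @ u[k] + w[k]     # (5.1)
    xbar[k + 1] = A @ xbar[k] + w[k]          # (5.3): zero reference input
    xtil[k + 1] = A @ xtil[k] + B @ u[k]      # (5.5): deterministic given U
assert np.allclose(x, xbar + xtil)            # x_k = xbar_k + xtil_k for all k
```

The split holds exactly for any input sequence, which is why only the conditional density of the reference trajectory is needed in (5.7)-(5.9).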
Clearly, (5.11) is true if all the noises are Gaussian. A continuous-time version of this result in the Gaussian case was established by Wonham [15].

For the second class of problems, $\{v_k\}$, $\{w_k\}$ are neither white nor uncorrelated, and the cost criterion is quadratic:
$J(X^N; U^{N-1}) = \tfrac{1}{2} x_N' F_N x_N + \tfrac{1}{2} \sum_{k=0}^{N-1} \{x_k' Q_k x_k + u_k' R_k u_k\}$   (5.14)

where $F_N \ge 0$, $Q_k \ge 0$, $R_k > 0$, and the system (5.1) is linear in the control, i.e., $b_k(u_k) = B_k u_k$. Using (5.3)-(5.5) we have
$I_{N-1}(\bar{Y}^{N-1}, U^{N-2}) = \min_{u_{N-1}} E\{\tfrac{1}{2}(\bar{x}_N + \tilde{x}_N)' F_N (\bar{x}_N + \tilde{x}_N) + \tfrac{1}{2}(\bar{x}_{N-1} + \tilde{x}_{N-1})' Q_{N-1} (\bar{x}_{N-1} + \tilde{x}_{N-1}) + \tfrac{1}{2} u_{N-1}' R_{N-1} u_{N-1} \mid \bar{Y}^{N-1}\}$
$\quad = \min_{u_{N-1}} [\tfrac{1}{2} E\{\bar{x}_N' F_N \bar{x}_N \mid \bar{Y}^{N-1}\} + \tilde{x}_N' F_N \hat{\bar{x}}_{N|N-1} + \tfrac{1}{2} \tilde{x}_N' F_N \tilde{x}_N + \tfrac{1}{2} E\{x_{N-1}' Q_{N-1} x_{N-1} \mid \bar{Y}^{N-1}\} + \tfrac{1}{2} u_{N-1}' R_{N-1} u_{N-1}]$   (5.15)
and since $\hat{\bar{x}}_{N|N-1} \triangleq E\{\bar{x}_N \mid \bar{Y}^{N-1}\}$ is independent of $u_{N-1}$, the minimizing $u_{N-1}^*$ is given by

$u_{N-1}^* = -[R_{N-1} + B_{N-1}' F_N B_{N-1}]^{-1} B_{N-1}' \{F_N (A_{N-1} \tilde{x}_{N-1} + \hat{\bar{x}}_{N|N-1})\}.$   (5.16)
It can be shown by straightforward induction that the optimal control law is given by

$u_k^* = -[R_k + B_k' K_{k+1} B_k]^{-1} B_k' \{K_{k+1}(A_k \tilde{x}_k + \hat{\bar{x}}_{k+1|k}) + \hat{\gamma}_{k+1|k}\}$   (5.17)

where $\hat{\bar{x}}_{k+1|k} = E\{\bar{x}_{k+1} \mid \bar{Y}^k\}$ and $\hat{\gamma}_{k+1|k} \triangleq E\{\gamma_{k+1} \mid \bar{Y}^k\} = E\{\gamma_{k+1} \mid Y^k\}$, with

$\gamma_k \triangleq S_k \gamma_{k+1} + S_k K_{k+1} w_k, \qquad \gamma_N = 0,$
$S_k = A_k' - A_k' K_{k+1} B_k [B_k' K_{k+1} B_k + R_k]^{-1} B_k'$   (5.18)
and $\{K_k\}_{k=0}^N$ satisfies the usual matrix Riccati difference equation; see e.g. [1, 2, 12]. This is the certainty equivalence property as defined by Tse and Bar-Shalom [13]. (Tse and Bar-Shalom [13] prove this result in a different manner; a different set of equations, equivalent to (5.17)-(5.18), is given by them.)

(b) Linear systems with unknown parameters. Consider a linear system
$x_{k+1} = \begin{bmatrix} A_k^{(11)} & A_k^{(12)}(\lambda) \\ 0 & A_k^{(22)}(\lambda) \end{bmatrix} x_k + b_k(u_k) + w_k; \qquad x_k \in R^n, \quad u_k \in R^m,$
$y_k = C_k x_k + v_k; \qquad y_k \in R^m$   (5.19)
where $v_k$, $w_k$ are random processes with known statistics, $A_k^{(11)}$ is a known $n_1 \times n_1$ matrix, and $A_k^{(12)}(\lambda)$, $A_k^{(22)}(\lambda)$ are, respectively, $n_1 \times (n - n_1)$ and $(n - n_1) \times (n - n_1)$ matrices dependent on the unknown vector $\lambda$. The cost criterion is of the form (5.2). By defining

$\theta = \{x_0, w_0, \ldots, w_{N-1}, \lambda\},$
the cost criterion can be rewritten into the form of (2.6). The conditional density $p(y_k \mid Y^{k-1}, U^{k-1}; \theta)$ is also well defined. Choose $\bar{u}_i = 0$, $i = 0, \ldots, N-1$, as the reference control sequence, and

$\bar{x}_{k+1} = \begin{bmatrix} A_k^{(11)} & A_k^{(12)}(\lambda) \\ 0 & A_k^{(22)}(\lambda) \end{bmatrix} \bar{x}_k + w_k + b_k(0).$   (5.20)
Since

$x_{k+1} - \bar{x}_{k+1} = \begin{bmatrix} A_k^{(11)} & A_k^{(12)}(\lambda) \\ 0 & A_k^{(22)}(\lambda) \end{bmatrix} (x_k - \bar{x}_k) + b_k(u_k) - b_k(0)$   (5.21)

and the control enters only the first $n_1$ components of the state, the last $n - n_1$ components of $x_k - \bar{x}_k$ remain zero. Equations (5.19)-(5.21) give

$y_k = \bar{y}_k + \bar{C}_k \tilde{x}_k$   (5.22)

where $\tilde{x}_k$ satisfies

$\tilde{x}_{k+1} = A_k^{(11)} \tilde{x}_k + b_k(u_k) - b_k(0); \qquad \tilde{x}_k \in R^{n_1}, \quad u_k \in R^m$   (5.23)

and

$\bar{C}_k = C_k^{(1)}, \qquad \text{where } C_k = [C_k^{(1)} \;\; C_k^{(2)}] \text{ and } C_k^{(1)} \text{ is } m \times n_1.$   (5.24)
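The conclusion of (5.19)-(5.24) — that $y_k - \bar{y}_k = C_k^{(1)} \tilde{x}_k$ involves neither $\lambda$ nor the noise — can be verified numerically. The sketch below uses hypothetical matrices, takes $b_k(u_k) = B u_k$ (so $b_k(0) = 0$), and omits the measurement noise $v_k$, which cancels in the difference $y_k - \bar{y}_k$:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, N = 2, 1, 15
A11 = np.array([[0.9, 0.1], [0.0, 0.8]])

def A(lam):
    # block-triangular dynamics of (5.19); lambda enters only A12, A22
    A12 = lam * np.ones((n1, n2))
    A22 = lam * np.eye(n2)
    return np.block([[A11, A12], [np.zeros((n2, n1)), A22]])

B = np.array([[0.0], [1.0], [0.0]])   # control enters the first n1 components only
C1 = np.array([[1.0, 0.0]])           # C_k^(1) of (5.24)
C = np.hstack([C1, 0.5 * np.ones((1, n2))])

lam = 0.7
u = rng.normal(size=(N, 1))
w = rng.normal(scale=0.1, size=(N, n1 + n2))
x = np.zeros(n1 + n2); xbar = np.zeros(n1 + n2); xtil = np.zeros(n1)
for k in range(N):
    x = A(lam) @ x + B @ u[k] + w[k]      # (5.19), measurement noise dropped
    xbar = A(lam) @ xbar + w[k]           # (5.20): zero reference input
    xtil = A11 @ xtil + B[:n1] @ u[k]     # (5.23): involves A11 only, not lambda
    assert np.allclose(C @ x - C @ xbar, C1 @ xtil)   # (5.22)
```

The deviation $x_k - \bar{x}_k$ stays confined to the first $n_1$ components, so it can be propagated without knowing $\lambda$.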
The mapping $\bar{y}_k \to y_k$ via (5.22) is clearly independent of $\theta$, and thus the separation property holds for an arbitrary cost criterion. If the cost is quadratic, a similar type of certainty-equivalence property can be obtained.

(c) Non-separation with perfect state measurement. Consider a nonlinear system

$y_{k+1} = f_k(y_k, u_k) + w_k$   (5.25)
where $w_k$ is a noise process with known statistics; $y_k$, $k = 1, \ldots, N$, is observed exactly but $y_0 \triangleq \theta$ is an unknown vector. The cost criterion can be expressed in terms of $\theta$, $Y^N$ and $U^{N-1}$. Take the zero sequence as the reference control sequence. For $k = 0$,

$y_1 = f_0(\theta, u_0) + w_0 = f_0(\theta, u_0) + \bar{y}_1 - f_0(\theta, 0).$   (5.26)
Thus the mapping $\bar{y}_1 \to y_1$ will be independent of $\theta$ if, and only if, there exists a function $g_0(u)$ such that

$f_0(\theta, u) - f_0(\theta, 0) = g_0(u).$   (5.27)
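Condition (5.27) can be probed numerically for a given $f_0$: tabulate $f_0(\theta, u) - f_0(\theta, 0)$ on a grid and check whether the table varies with $\theta$. The two dynamics below are hypothetical illustrations (not from the paper), one satisfying (5.27) and one violating it:

```python
import numpy as np

# Probe condition (5.27) numerically: does f0(theta, u) - f0(theta, 0)
# vary with theta?  Both dynamics below are hypothetical examples.
f_additive = lambda th, u: np.sin(th) + u**3    # control enters additively
f_coupled  = lambda th, u: np.sin(th) + th * u  # control multiplies theta

def control_increment_depends_on_theta(f, thetas, us):
    # (5.27) holds iff the table d[i, j] = f(theta_i, u_j) - f(theta_i, 0)
    # is the same row for every theta_i
    d = np.array([[f(th, uu) - f(th, 0.0) for uu in us] for th in thetas])
    return not np.allclose(d, d[0])

thetas = np.linspace(-1.0, 1.0, 5)
us = np.linspace(-1.0, 1.0, 5)
print(control_increment_depends_on_theta(f_additive, thetas, us))  # False: (5.27) holds
print(control_increment_depends_on_theta(f_coupled, thetas, us))   # True: separation not guaranteed
```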
If (5.27) is not satisfied, the separation property is not guaranteed for an arbitrary cost criterion. If, however, $f_k(\theta, u) - f_k(\theta, 0) = g_k(u)$, $k = 0, 1, \ldots$, then it can easily be shown that

$y_2 = \bar{y}_2 + f_1(g_0(u_0) + \bar{y}_1, u_1) - f_1(\bar{y}_1, 0)$   (5.28)

and

$\bar{y}_2 = y_2 + f_1(y_1 - g_0(u_0), 0) - f_1(y_1, u_1).$   (5.29)

Combining with (5.25) and (5.26), it can easily be seen that, inductively, the $\sigma$-algebra generated by $\bar{Y}^k$ and that generated by $Y^k(U^{k-1})$ are equivalent, $k = 1, 2, \ldots, N$; and thus the separation property holds for an arbitrary criterion. It should be noted that the separation property hinges heavily on the fact that the dynamical system has additive process noise. If the process noise is not additive, then it may be more appropriate to define $\theta = \{y_0, w_0, \ldots, w_{N-1}\}$; and it can easily be seen that separation is not guaranteed even though perfect state observation is available.
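Returning to the quadratic-cost case of Section 5(a): in the white-noise case the feedforward term $\hat{\gamma}_{k+1|k}$ in (5.17) vanishes, and the law reduces to the familiar certainty-equivalent LQ feedback. The sketch below computes the gains of (5.17) via the Riccati recursion for the cost (5.14); matrices are hypothetical, and for brevity the conditional mean $\hat{x}_{k|k}$ is replaced by the exact state:

```python
import numpy as np

# Backward Riccati recursion for the feedback gains of (5.17); in the
# white-noise case the gamma term is zero and only K_k remains.
def lq_gains(A, B, Q, R, F, N):
    K = F
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        G = np.linalg.solve(R + B.T @ K @ B, B.T @ K @ A)  # gain in (5.17)
        K = Q + A.T @ K @ (A - B @ G)                      # Riccati difference equation
        gains[k] = G
    return gains

# Hypothetical discretized double integrator with quadratic cost (5.14)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2); R = np.array([[1.0]]); F = np.eye(2)
N = 60
gains = lq_gains(A, B, Q, R, F, N)

# Certainty equivalence: u_k = -gains[k] @ xhat_{k|k}; here xhat = x for brevity
x = np.array([1.0, 1.0])
for k in range(N):
    x = A @ x + B @ (-gains[k] @ x)
assert np.linalg.norm(x) < 1.0   # the regulator drives the state toward the origin
```

In the correlated-noise case the same gains apply, with the additional predicted term $\hat{\gamma}_{k+1|k}$ of (5.17)-(5.18) added to the feedback.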
6. Concluding remarks

The formulation of stochastic control and sequential decision problems is rather straightforward, and the optimal control law or decision can be obtained by applying the principle of optimality. The only remaining difficulty is that of solving the nested minimization-expectation equation (in the case of additive cost, this becomes the dynamic programming equation). Because this equation cannot be solved in practice, many suboptimal control algorithms have been suggested in the literature. They range from the simplest type, such as enforced certainty equivalence, to the most sophisticated dual control methods [12, 14]. Comparisons of some alternative suboptimal control laws have been carried out by many authors, but almost all of these comparisons were based on a limited number of simulation studies.
Since the data bases used for comparisons differ from author to author, realistic comparisons of all alternative suboptimal control laws are almost nonexistent. In order to gain further insight into stochastic control or sequential decision problems, one must look at the problem from several different directions. One direction is to study the influence of a control law on the information flow in a dynamical system, and the interplay between learning and control. Hopefully, results on these questions can lead to results on the structural properties of the optimal control law, which is a function of the system dynamics, observation program and performance criterion. In order to set a proper framework for studies in this direction, it is necessary to quantify abstract, qualitative concepts like learning and neutrality. This paper is an effort in this endeavor. It has been shown that, in the degenerate case when the control law cannot influence the amount of learning, the stochastic control problem has the separation property regardless of the cost criterion.

References

[1] M. Aoki, Optimization of stochastic systems (Academic Press, New York, 1967).
[2] K.J. Åström, Introduction to stochastic control theory (Academic Press, New York, 1970).
[3] Y. Bar-Shalom and E. Tse, "Dual effect, certainty equivalence and separation in stochastic control", IEEE Transactions on Automatic Control (1974) 494-500.
[4] R. Bellman, Adaptive control processes: a guided tour (Princeton University Press, Princeton, N.J., 1961).
[5] A.A. Feldbaum, Optimal control systems (Academic Press, New York, 1965).
[6] R.G. Gallager, Information theory and reliable communication (Wiley, New York, 1968).
[7] L. Meier, "Combined optimal control and estimation", Proceedings of the 3rd annual Allerton conference on circuit and systems theory.
[8] J.W. Patchell and O.L.R. Jacobs, "Separability, neutrality and certainty equivalence", International Journal of Control (1971) 337-342.
[9] G.N. Saridis and R.N. Lobbia, "Parameter identification and control of discrete-time systems", IEEE Transactions on Automatic Control (1972) 52-60.
[10] C.E. Shannon, "A mathematical theory of communication", Bell System Technical Journal (1948) 379-423; 623-656.
[11] C. Striebel, "Sufficient statistics in the optimum control of stochastic systems", Journal of Mathematical Analysis and Applications 12 (3) (1965) 576-592.
[12] E. Tse and Y. Bar-Shalom, "An actively adaptive control for linear systems with random parameters via the dual control approach", IEEE Transactions on Automatic Control (1973) 109-117.
[13] E. Tse and Y. Bar-Shalom, "Generalized certainty equivalence and dual effect in stochastic control", IEEE Transactions on Automatic Control, to appear.
[14] E. Tse, Y. Bar-Shalom and L. Meier, "Wide-sense adaptive dual control for nonlinear stochastic systems", IEEE Transactions on Automatic Control (1973) 98-108.
[15] W.M. Wonham, "On the separation theorem of stochastic control", SIAM Journal on Control 6 (2) (1968) 312-326.