Institute of Mathematical Statistics
LECTURE NOTES-MONOGRAPH SERIES, Volume 1

Essays on the Prediction Process

Frank B. Knight
University of Illinois at Champaign-Urbana

Institute of Mathematical Statistics, Hayward, California

Editor: Shanti Gupta, Purdue University

International Standard Book Number 0-940600-00-5
Copyright © 1981 Institute of Mathematical Statistics. All rights reserved.
Printed in the United States of America.
TABLE OF CONTENTS

ESSAY I. INTRODUCTION, CONSTRUCTION, AND FUNDAMENTAL PROPERTIES ... 1
  0. INTRODUCTION ... 1
  1. THE PREDICTION PROCESS OF A RIGHT-CONTINUOUS PROCESS WITH LEFT LIMITS ... 3
  2. PREDICTION SPACES AND RAY TOPOLOGIES ... 20
  3. A VIEW TOWARD APPLICATIONS ... 35
  REFERENCES ... 44

ESSAY II. CONTINUATION OF AN EXAMPLE OF C. DELLACHERIE ... 46
  1. THE PROCESS R_t ... 46
  2. THE PREDICTION PROCESS OF R_t ... 50
  3. CONNECTIONS WITH THE GENERAL PREDICTION PROCESS ... 54
  REFERENCES ... 55

ESSAY III. CONSTRUCTION OF STATIONARY STRONG-MARKOV TRANSITION PROBABILITIES ... 57
  REFERENCES ... 67

ESSAY IV. APPLICATION OF THE PREDICTION PROCESS TO MARTINGALES ... 68
  0. INTRODUCTION ... 68
  1. THE MARTINGALE PREDICTION SPACES ... 70
  2. TRANSITION TO THE INITIAL SETTING
  3. THE LEVY SYSTEM OF A PROCESS ... 91
  4. ON CONTINUOUS LOCAL MARTINGALES ... 96
  REFERENCES ... 107
ESSAYS ON THE PREDICTION PROCESS
Frank B. Knight
University of Illinois at Champaign-Urbana
PREFACE.
This work comes at a stage when the literature on the prediction process consists of only six papers, of which two are by the present author and the other four are in the Strasbourg Séminaire de Probabilités. None of these papers is simple to read, much less to understand. Accordingly, our work has been cut out for us to make the prediction process comprehensible to more than a few specialists. One way of doing this, it would appear, is to present the subject separately in several different contexts to which it applies. Thus for a reader interested mainly in a certain aspect, that part may be studied independently, while for one wishing to have a fuller understanding, the force of repetition of a common theme in different settings may serve to deepen the effect. Accordingly, the present work consists of four distinct papers based on a common theme.
No attempt is made to exhaust the subject, but at the same time
the purpose is not just to illustrate.
The first and most fundamental paper
is an introduction to the method.
It has been kept as simple as possible in
order to make it more accessible.
Besides organizing and explaining the
subject, it provides some elements not in the previous literature and which are needed to understand the fourth essay.
On the other hand, a few of the
most difficult known results on the prediction process, in part depending heavily on analytic sets, are not included in the results of this paper.
The attempt has been to make the subject self-contained and as concrete as possible, by avoiding unnecessary mathematical abstractions and artificial methods of proof.

The second essay presents what is perhaps the simplest non-trivial type of stochastic process: one consisting simply of the arrival time (or lifetime) of a single instantaneous event. To a surprising degree, this already illustrates and clarifies the method. One sees in clear distinction the two basic types of processes involved. On the one hand, we have the direct model of the physical phenomenon, where t represents physical time and we allow -∞ < t < ∞. On the other hand, we have the prediction process based on the model, in which t represents observer's time and we require 0 ≤ t < ∞.
This essay uses two results of the Strasbourg school, as well as several of the associated methods, but they are largely confined to the beginning and the end.
It should be possible to gain an understanding of the main idea by
taking for granted these results as stated. The third essay gives an application of the method to ordinary Markov processes.
Like the second, it is written to be read independently of the first, and it does make some demands on the literature of the subject. In a sense it represents a concession to traditional formalism. The problem is to apply the prediction process (which is always Markovian) to a given Markov process without permitting any change in the given joint distribution functions.
This has the double intent of providing new insight into the
usual regularity assumptions for Markov processes, and of clarifying the meaning and role of the prediction process.

The fourth essay brings the method to bear on three basic classes of processes: square integrable martingales, uniformly integrable martingales, and potentials of class D. In accordance with essay one, the study of each class is reduced to that of a corresponding Markov process. Thus for example the "potentials" do actually become Markovian potential functions in the usual sense of probabilistic potential theory. Several basic applications are made, including the orthogonal decomposition of square-integrable martingales, and the Doob-Meyer decomposition of class D potentials. Of some general interest is the Lévy system of a prediction process. This is shown to exist in complete generality, not in any way limited to martingales. It is then applied to an arbitrary process to yield simultaneously the compensators (or dual previsible projections) of all of the integrable, adapted increasing pure-jump processes. Finally, the class of continuous martingales which are germ-diffusion processes (i.e., have an autonomous germ-Markov property) is investigated briefly.

In this essay, more than previously, a basic contrast with the Strasbourg approach to the same subject matter becomes apparent. While the latter approach studies the class of all martingales (or supermartingales, etc.) with respect to a given probability measure and adapted family of σ-fields, the prediction process approach studies the class of all martingale (or supermartingale) probabilities with respect to a fixed canonical definition of the process and σ-fields.
One acknowledgment and one word of caution should be given in conclusion. Essays 1 and 2 have profited from the careful reading and criticism of Professor John B. Walsh.
In particular Theorem 1.2 of Essay 1 owes its
present formulation largely to him.
On the cautionary side, our numbering system permits one consecutive repetition of a number when this corresponds to a different heading. Thus, Theorem 1.2 is followed by Definition 1.2, but it might have been preceded instead. However, since no number is used more than twice, we thought that the present more informal system was justified in preference to the usual monolithic progression.
ESSAY I. INTRODUCTION, CONSTRUCTION, AND FUNDAMENTAL PROPERTIES

0. INTRODUCTION.
In this first essay, our subject is introduced in a setting
general enough to cover its uses in the remainder of the work.
Then the
fundamental properties and results needed later are developed and proved from scratch, making only minimal use of the "general theory of processes," as presented for example in C. Dellacherie [5].
In the later material,
which prepares the method developed here for application in various more specialized situations, it is inevitable that there be more reference to, and reliance on, the results of the Strasbourg school as developed in Volumes I-XII of the Strasbourg Séminaire de Probabilités [14], in C. Dellacherie [5], in C. Dellacherie and P.-A. Meyer [4], and in R. K. Getoor [8].
Yet it should be emphasized that the prediction process is not simply
another chapter in this development.
Rather it is a largely new method.
It
could be developed in the framework of the above, but whatever would be gained in brevity and completeness would be offset, at least for the reader who is less than fully familiar with the Strasbourg developments, by the prerequisites.
Consequently, we have tried to proceed here in such a way
as to be understood by the less initiated reader, and yet not to be considered infantile by the initiated.
For the reader who is familiar with
the Strasbourg work, and wants to get an idea of what the prediction process means in that setting, the second essay below may be read as an introduction. It does not depend on the more general theory to be developed.
The aim here
is not to incorporate the prediction process into any general theory of stochastic processes, but to develop it as an independent entity. Having gone this far in setting our work apart from that of the Strasbourg group, we must hasten to give credit where due.
In the first
place, the present work borrows unsparingly from the papers of P.-A. Meyer [12], of M. Yor and P.-A. Meyer [13], and of M. Yor [15], on the technical side.
The proof of the Markov property of the prediction process, which
was difficult (and possibly incomplete) in Knight [9], is derived in these papers from a stronger identity holding pathwise on the probability space, and we follow their method.
Again, the very definition of the process in
[12] avoids the necessity of completing the
σ-fields (until a later stage),
and we adopt this improvement.
The measurability of the dependence of the
process on the initial measure, too, is due to these authors.
On this score,
we have not hesitated to profit from their mistakes, as described in [12] and [13]. Further, the basic role of the set
H
of "non-branching points"
is due to P.-A. Meyer ([12, Proposition 2]). Finally and perhaps most importantly, we adopt a new idea of M. Yor [15] to the effect that one need not predict only the future of the specified process in order to get a homogeneous Markov process of prediction probabilities.
One may just as well
predict the futures of any countable number of other processes at the same time.
The only essential precondition is that the future of the specified
process (the process which generates the "known past") must be included in the future to be predicted.
This, in our opinion, places the prediction
process of [9] into an entirely new dimension.

Meanwhile, in regard to our use of the Strasbourg ideas and formalism, we would emphasize the distinction between σ-fields on a probability space, such as F_t, F_{t+}, etc., and σ-fields of a product space in which time is one coordinate, such as the optional or previsible σ-fields. It is often very convenient to use σ-fields of the latter type, and for a complete understanding of many results, they are probably unavoidable. On the other hand, while σ-fields of the former type are needed to express the state-of-affairs as it actually exists at a given t, σ-fields of the latter type are needed rather to define various kinds of processes, usually as an auxiliary, and they can always be circumvented at a cost of sacrificing some degree of completeness. Thus, one will not go essentially wrong in the present work, if one substitutes right-continuous and left-continuous adapted process for "optional" and "previsible" process, respectively, and limit of a strictly increasing sequence of stopping times for "previsible stopping time". In particular, while the section theorems are used freely in establishing results for all t, no use is made of the corresponding projections of a measurable process, although they are heavily implicated in the results.

To give a general preview of the applications treated in subsequent essays, some of them (such as the Lévy system of a martingale) may be stated without any reference to the prediction process, and when possible such formulations are included. For these, the prediction aspect is needed only for the proofs. For most results, however, the prediction process is a necessary part of the formulation of the idea or problem involved. The central purpose is thus to elaborate, and by implication more or less to phase in, the prediction process as a feature of the general theory of stochastic processes. Once the reader becomes adept at thinking in terms of this process, other applications will suggest themselves immediately according to the context, or so we have found. For example, very little is done here in the way of using the prediction process itself in the manner of probabilistic potential theory. In this way, many stopping times of the given process would become first passage times of the prediction process, but the interconnection of the two processes remains largely unstudied. It might be of interest to follow such a direction farther. Even less (if possible) has been done in connecting the present work with stochastic integration, a medium in which the author is not highly proficient. Accordingly, such matters are left aside in favor of applications in which we can feel confident at least that a correct beginning has been made.

1. THE PREDICTION PROCESS OF A RIGHT-CONTINUOUS PROCESS WITH LEFT LIMITS.
We use the following standard notation for measurability.

1) If F and G are σ-fields (on their respective spaces), X : F → G means that X is F/G-measurable, i.e. X^{-1}(S) ∈ F for S ∈ G; we write also X ∈ F/G. When X is a random variable, or is real or extended-real valued and G is the corresponding Borel σ-field, we write simply X ∈ F.

2) b(F) denotes the bounded, real-valued, F-measurable functions; b+(F) denotes the non-negative elements of b(F). Further, we denote the extended real line [-∞, ∞] by R̄, with Borel sets B̄, and the product space X_{n=1}^∞ R̄ by (R̄^∞, B̄^∞).
We begin with the following measurable space.

DEFINITION 1.1. Let Ω denote the space of all sequences w(t) = (w_1(t), w_2(t),...,w_n(t),...) of right-continuous extended-real-valued functions of t ≥ 0, with left limits for t > 0. Let G°_t denote the σ-field generated by all w_n(s), s ≤ t, n ≥ 1, and let F°_t denote that generated by all w_{2n}(s), s ≤ t, n ≥ 1, so that F°_t ⊂ G°_t. We set X_t = (w_{2n}(t), n ≥ 1) on Ω, and F° = V_t F°_t, G° = V_t G°_t. Thus X_t has right-continuous paths with left limits in X_{n=1}^∞ R̄ with the product topology. Finally, we set θ_t w(s) = w(t+s) on Ω, and denote by P a fixed probability on (Ω, G°).
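For orientation only, the shift operators θ_t of Definition 1.1 can be sketched numerically on a single path coordinate. The sample path and helper names below are illustrative assumptions, not part of the text; the point is the defining identity θ_t w(s) = w(t+s), which yields the semigroup property θ_t ∘ θ_s = θ_{t+s}.

```python
def make_path():
    # an illustrative right-continuous path with a single jump at t = 1
    return lambda t: 0.0 if t < 1.0 else 2.0

def shift(t, w):
    # the shift operator of Definition 1.1: (theta_t w)(s) = w(t + s)
    return lambda s: w(t + s)

w = make_path()
# semigroup property: theta_t . theta_s = theta_{t+s}
lhs = shift(0.3, shift(0.5, w))
rhs = shift(0.8, w)
print([lhs(s) == rhs(s) for s in (0.1, 0.19, 0.3)])  # -> [True, True, True]
```

Both sides evaluate w at 0.8 + s, so they agree whether the argument falls before or after the jump.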
Before going further, we give a brief rationale for selecting this as the initial structure. In setting up a prediction process, we require basically two things. The first is a process which generates the conditioning σ-fields (in this case, the process X_t), and the second is a definition of the futures which are to be predicted (in this case, θ_t^{-1} G°), which must contain those of X (namely θ_t^{-1} F°). Once we define the process X and the futures θ_t^{-1} G°, there may be some latitude as to exactly how these futures are to be generated, but it seems to be necessary that they be generated by processes in order to write them with shift operators in the form θ_t^{-1} G°. This being granted, the remainder of our set-up represents a compromise between the more general assumption of [9], where X was only a measurable process, and the more familiar requirements of the applications, in all of which X is right-continuous with left limits (abbrev. r.c.l.l.). Since X is assumed r.c.l.l., it is logical that the "unobserved" processes w_{2n-1}, n ≥ 1, are also r.c.l.l.
It should be pointed out that the choice of real-valued processes is only a matter of convenience. If the actual process has values in a locally compact metric space, or even a metrizable Lusin space (homeomorphic to a Borel subset of a compact metrizable space), we can obtain the above situation by considering the processes composed with a sequence of uniformly continuous functions separating points. Similarly, if the actual process is real-valued, we may take P{w_{2n}(t) = 0, n > 1} = 1 and replace Ω by the corresponding subset, and so forth. It is easy to see that our set-up is P-indistinguishable from the canonical space of right-continuous paths with left limits in the product of any two metrizable Lusin spaces, but we prefer the more explicit situation.

A property of (Ω, G°) which is needed in setting up the prediction process is the existence of regular conditional probabilities, given any subfield. For this it is of course sufficient that (Ω, G°) is the Borel space of a metrizable Lusin space, i.e., a "measurable Lusin space" in the language of Dellacherie-Meyer [4, Chap. III, Definition 16]. There are many different topologies under which the present (Ω, G°) becomes a measurable Lusin space. It suffices to write Ω = X_{n=1}^∞ Ω_n, then to give each Ω_n a Lusin topology as (a copy of) the space of all extended-real-valued right-continuous paths with left limits, and finally to give Ω the product topology.

In the present work, we specialize on one particular such topology, a transplant to the present context of the one used in Knight [9]. This turns out again to be quite natural, and to have some rather unique advantages. In brief, this is the topology of scaled weak convergence of sample paths. This topology is metrizable in such a way that the completion of Ω is the space of all sequences of equivalence classes of measurable functions (with respect to Lebesgue measure). The completion is then a compact metric space, which we denote by Ω̄, and Ω is embedded in Ω̄ as a Borel subset. For some purposes, Ω̄ is a more natural space than Ω, and a few results will concern Ω̄ explicitly. The prediction process can be constructed on Ω̄ in complete analogy to Ω, but for simplicity we leave this to the reader (see also [9] and [12]).
Before beginning the construction, one more remark on its essential nature may provide orientation.
It is generally accepted that a stochastic
process, in application, is a model of a phenomenon which develops according to laws of probability.
But there is no such agreement as to
the nature of probability itself.
Some authors (including such renowned
figures as Laplace and Einstein) seem to have doubted that probability even exists in an absolute physical sense.
However, it seems unlikely that
anyone can doubt that probability does exist in a mental sense, as a way of thinking.
If only because one does not know the entire future, it is
clear that probabilistic thinking is an alternative possible procedure in many situations.
Indeed, it may be the only one possible.
Consequently, it
can scarcely be doubted that stochastic processes do exist in some useful sense, if only, perhaps, in the minds of men.
Furthermore, even if
objective probabilities do exist entirely apart from subjective ones, it cannot be considered unimportant to study the more subjective aspects of probability.
As with many other branches of mathematics, one is in a
better position to make applications of probability to the physical world once one understands fully the mental presuppositions which are involved in the applications.
Indeed, a large part of mathematics consists precisely in
cultivating and developing the necessary mental operations, and one of the fundamental requirements for knowing how to apply mathematics lies in distinguishing what is a physical fact from what is only part of the mental reasoning.
Thus, in stochastic processes as elsewhere in mathematics, it is
important and useful to understand what one is doing mentally. Coming, now, to the case of the prediction process, in much the same way as the probability distributions govern the development of a stochastic process, so the prediction process governs, or models, the development of these probabilities themselves.
The prediction process, then, is a process
of conditional probabilities associated with a given or assumed stochastic process.
The given information will be that of the "past" (or observed
part) of the given process, and the probabilities will be the conditional probabilities of the "future" (or unobserved part).
In this way, the
prediction process becomes at first an auxiliary (or second level) stochastic process associated with the given process.
But the remarkable
advantages of the method appear only when we consider this as a process per se, and define the original process in terms of it instead of conversely. This last step constitutes, in a sense, the main theme of the present work. The first step, however, is definition of the prediction process of the given
X ,
and this is our immediate task.
We set p(x) = π^{-1}(π/2 + arctan x), -∞ ≤ x ≤ ∞, and consider the sequential process on Ω

Y(t) = (Y_n(t)) = (∫_0^t e^{-s} p(w_n(s)) ds, 1 ≤ n).

Since (d/dt+) Y(t) = (e^{-t} p(w_n(t)), 1 ≤ n), it is clear that {Y(s), s ≤ t} generates the σ-field G°_t. In fact, the same is true of {Y(r), r rational, r ≤ t}, since the right derivatives at the rationals determine the right-continuous functions p(w_n(s)). In particular, G° is generated by Y(r), r rational, and hence by the countable collection of random variables Y_n(r) = ∫_0^r e^{-s} p(w_n(s)) ds. This countability is essential to the method, which relies on martingale convergence a.s. ("almost surely", i.e., with probability one) at a critical place. The random variables Y_n(r) are analogous to those of [9, Def. 1.1.1], and also to those of [13, Lemma 1]. We note that 0 ≤ Y_n(r) ≤ 1, and that each Y_n(t) satisfies a uniform positive Lipschitz condition of order 1: 0 ≤ Y_n(t+s) − Y_n(t) ≤ e^{-t} s. In particular, convergence of Y_n(r) for each rational r ≥ 0 is equivalent to uniform convergence.
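As a numerical sanity check (the sample path, grid step, and tolerance below are illustrative assumptions, not from the text), the bounds 0 ≤ Y_n ≤ 1 and the Lipschitz estimate 0 ≤ Y_n(t+s) − Y_n(t) ≤ e^{-t} s can be verified with a Riemann-sum approximation of Y_n:

```python
import math

def p(x):
    # p maps the extended reals homeomorphically onto [0, 1]
    return (math.pi / 2 + math.atan(x)) / math.pi

def Y(w, t, dt=1e-4):
    # left Riemann sum for Y_n(t) = int_0^t e^{-s} p(w(s)) ds
    total, s = 0.0, 0.0
    while s < t:
        total += math.exp(-s) * p(w(s)) * dt
        s += dt
    return total

w = lambda s: 10.0 * math.sin(s)   # an illustrative bounded path
y1, y2 = Y(w, 1.0), Y(w, 1.5)
print(0.0 <= y1 <= y2 <= 1.0)                  # monotone and bounded by 1
print(y2 - y1 <= math.exp(-1.0) * 0.5 + 1e-3)  # increment bound e^{-t} s, plus grid error
```

The second bound is exactly the Lipschitz condition above with t = 1 and s = 0.5, loosened slightly for discretization error.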
We will be concerned with the uniformly closed algebra of functions generated on Ω by the Y_n(r). Explicitly, this may be generated as follows. For each m ≥ 1, let f_{m,j}(x_1,...,x_m), 1 ≤ j, be a sequence of continuous functions on [0,1]^m which is uniformly dense in the set of all such functions. Then the algebra in question is the uniform (linear) closure of all the random variables f_{m,j}(Y_1(r_1),...,Y_m(r_m)), for all m ≥ 1, 1 ≤ j, and positive rationals r_1,...,r_m. This is easily checked by first fixing m and r_1,...,r_m, noting that the range of (Y_1(r_1),...,Y_m(r_m)) is compact, and then using the uniform continuity of f_{m,j} in conjunction with the Stone-Weierstrass approximation theorem.

REMARK. More generally, if we enumerate the Y_i(r_j), and choose any countable collection of continuous functions on the Hilbert cube X_{n=1}^∞ [0,1] which is uniformly dense in the set of all such functions, the compositions of these with the sequence Y_i(r_j) can replace the above particular choice.

We let U denote this algebra, and summarize the needed function-theoretic properties as follows.

THEOREM 1.2. a) The topology on Ω generated by U is metrizable by the metric

d(w^a, w^b) = Σ_{n=1}^∞ 2^{-n} ||Y_n^a − Y_n^b||,

where ||f(t)|| = sup_t |f(t)|. The completion of Ω in this metric is a compact metric space Ω̄, whose elements are identified with all sequences of (equivalence classes mod Lebesgue-null sets of) measurable functions w_n(t) : R^+ → R̄, with the same definition of d. With this identification, Ω is Borel in Ω̄, and the Borel field on Ω is G°.

b) The map (t,w) → θ_t w is continuous from [0,∞) × Ω → Ω.

PROOF. We have already noted that convergence in the topology generated by U is the same as uniform convergence of each Y_n, proving the first assertion.
convergence of each
Y
considered as a distribution function, the
completion is a closed subset of the space of all sequences of distributions of mass
< 1
Theorem.
on
Hence
[0,°°) , Ω
which is compact under weak convergence by Helly's
is a compact metric space.
by a sequence of uniform limits of
Y 's
An element of
Ω
is given
i.e. by a sequence cκ£ non-
decreasing continuous functions of Lipschitz constant 1.
Such functions
being absolutely continuous, we may identify them as integrals of their a.e. - derivatives
p(w (t)) < 1, n ~
and
w (t) n
is identified by applying
P Conversely, given any sequence functions
P(w )
are bounded by
w
0
and
functions, convergence in the metric
d
time intervals in the weak topology /(pwn(t))f(t)dt
of measurable functions, the and measurable.
f,
For such
is simply convergence in finite
σ(L .L^),
for bounded measurable
equivalently, only for continuous
1
f
i.e. convergence of
with compact support (or
which are dense in
L ).
The
closure of the continuous functions bounded by intervals.
pw is all measurable functions n 0 and 1, since it contains the L -closure in finite time Therefore, the completion includes all measurable w as n
asserted. Finally, an approximation by Riemann sums shows that G -measurable on G -measurable on (Ω,G ) •> (Ω,G )
Ω
for each
t .
Ω
for fixed
w
is Borel, where
It follows that <= Ω,
G
d(w ,w )
as asserted.
Turning to b ) , we have
p(w(s))ds is is
and therefore the inclusion mapping
denotes the Borel field of
a well-known theorem [4, III, Theorem 21] it follows that G = S |Ω,
J
Ω e G°
Ω . and
By
FRANK B. KNIGHT
p(iΛs))ds| < (eε-l) J ^ e ( t " s ) p (w^s) )ds - phζ(β)))dβ|
ε
< (e -l)
t
2 ||γ«-1ζ||
+
+
2 e.
This easily implies b).

REMARK. This topology is somewhat artificial in that it depends on the choice of the function p. The artificiality disappears, however, if we begin with a process Y_t having values in a metrizable Lusin space E ⊂ Ē, Ē compact, and consider w_n(t) = f_n(Y_t), where (f_n) is a uniformly dense subset of the continuous functions on Ē. Then the topology of Ω reduces to that of weak convergence of the sojourn time measures

μ(t,A) = ∫_0^t I_A(Y_s) ds

for the process Y.
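To see concretely what "scaled weak convergence" means, one can compare a rapidly oscillating path with its flat local average: they stay far apart pointwise, yet are close in the metric d, because only the integrals Y_n enter. The sketch below is an illustrative assumption throughout (one coordinate, finite grid, truncated sup, ad hoc threshold), not the book's construction.

```python
import math

def p(x):
    return (math.pi / 2 + math.atan(x)) / math.pi

def Y(w, t, dt=1e-3):
    # Riemann-sum approximation of Y(t) = int_0^t e^{-s} p(w(s)) ds
    total, s = 0.0, 0.0
    while s < t:
        total += math.exp(-s) * p(w(s)) * dt
        s += dt
    return total

def d1(wa, wb, T=5.0, step=0.05):
    # one-coordinate, finite-grid stand-in for
    # d(w^a, w^b) = sum_n 2^{-n} sup_t |Y_n^a - Y_n^b|
    return max(abs(Y(wa, k * step) - Y(wb, k * step))
               for k in range(int(T / step) + 1))

fast = lambda s: math.sin(200.0 * s)   # oscillates rapidly between -1 and 1
flat = lambda s: 0.0                   # its local average, since p(-x) = 1 - p(x)
print(abs(fast(0.007) - flat(0.007)) > 0.9)  # far apart pointwise
print(d1(fast, flat) < 0.05)                 # yet close in the metric d
```

The cancellation in the running integral is exactly why the completion Ω̄ consists of equivalence classes of measurable functions rather than of paths.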
We turn now to the state space of the prediction process.

DEFINITION 1.2. Let H be the set of all probability measures on (Ω, G°), with the σ-field H generated by {h(S), S ∈ G°} as functions of h ∈ H. We give H the topology of weak-* convergence with respect to convergence in the topology of Ω. Let P^h and E^h, h ∈ H, denote the probability and expectation determined by h.

PROPOSITION 1.3. H is a separable metrizable space, with respect to which H is the Borel σ-field. The metric may be so defined that H is Borel in its completion H̄, the compact metrizable space of all probabilities on Ω̄.

PROOF. Since Ω is embedded in the compact Ω̄, there is a uniformly dense sequence f_n in the uniformly continuous functions on Ω. The functions E^h f_n then induce the topology of H, which is clearly metrizable with completion H̄. Since E^h f is H-measurable in h for continuous bounded f, and since by a monotone class argument h(S) = E^h I_S is then Borel in h for S ∈ G°, H is the Borel σ-field as asserted. Finally, since H = {h ∈ H̄ : h(Ω) = 1} and Ω is Borel in Ω̄, H is Borel in H̄.
For the construction of the prediction process we introduce a fixed sequence of continuous functions which are suitably bounded, but the outcome is entirely free of which such sequence is used.

NOTATION 1.4. Let 0 ≤ f_n ≤ 1 be any fixed sequence of continuous functions on Ω̄ whose uniform linear closure is all continuous functions. For example, the f_{m,j}(Y_1(r_1),...,Y_m(r_m)) of Theorem 1.2 suffice, when extended by continuity to Ω̄.

We also need

LEMMA 1.5. For λ > 0 and bounded measurable f ≥ 0, the expressions

f^h_λ(t) = e^{-λt} E^h((∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_t | F°_t)

are P^h-supermartingales in F°_t, for every h ∈ H.

PROOF. This is a familiar computation, due to G. Hunt. For t_1, t_2 ≥ 0,

E^h(f^h_λ(t_1 + t_2) | F°_{t_1}) = e^{-λ(t_1+t_2)} E^h((∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_{t_2}∘θ_{t_1} | F°_{t_1}) ≤ f^h_λ(t_1),

since f ≥ 0 gives (∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_{t_2} = e^{λt_2} ∫_{t_2}^∞ e^{-λs} f∘θ_s ds ≤ e^{λt_2} ∫_0^∞ e^{-λs} f∘θ_s ds.
In order to use martingale convergence with Lemma 1.5, we first choose for each rational r > 0 a regular conditional probability W^h_r(S), S ∈ G°, of P^h(θ_r^{-1} S | F°_r), given F°_r. In fact, we choose W^h_r to be H × F°_r-measurable in (h,w), as is possible by a well-known construction of J. L. Doob (using the fact that F° is countably generated — for the method, see also Theorem 1.4.1 of [9]). Thus we may be more precise in Lemma 1.5 for f = f_n by setting

f_{λ,n,h}(r) = e^{-λr} E^{W^h_r}(∫_0^∞ e^{-λs} f_n∘θ_s ds),

and we now assume this particular choice. Next, we prepare one more lemma.

LEMMA 1.6. For any t ≥ 0, h ∈ H, and w ∈ Ω, the existence of the limits along the rationals r → t± of lim f_{λ,n,h}(r), for all n and all rational λ > 0, is equivalent to the existence of lim W^h_r in the topology of H̄.
PROOF. By definition of the weak-* topology, existence of the last limit is equivalent to that of lim E^{W^h_r} f_n for all n. Now by Fubini's Theorem we have

f_{λ,n,h}(r) = e^{-λr} ∫_0^∞ e^{-λs} E^{W^h_r}(f_n∘θ_s) ds,

and by Theorem 1.2 b) we know that E^w(f_n∘θ_s) is continuous in s; clearly it is bounded by 1. Convergence of f_{λ,n,h}(r) for all rational λ > 0, for each n, implies, by the continuity theorem for Laplace transforms, convergence of the measures E^{W^h_r}(f_n∘θ_s) ds. By a simple use of the equicontinuity of these densities in s, this is equivalent to convergence of E^{W^h_r}(f_n∘θ_s) for each s. But at s = 0 this implies convergence of E^{W^h_r} f_n for each n, and hence that of W^h_r in H̄. Conversely, since each f_n∘θ_s is continuous on Ω, convergence of W^h_r implies that of E^{W^h_r}(f_n∘θ_s) for each s. Hence by the dominated convergence theorem, we obtain the existence of the limits of f_{λ,n,h}(r), and the proof is complete.
We can now give the definition of the prediction process for fixed h ∈ H.

DEFINITION 1.6. Let T_h = sup{t : for 0 ≤ s < t the limits W^h_{s±} = lim_{r→s±} W^h_r both exist and are in H}. We define the prediction process of h as

Z^h_t = W^h_{t+} = lim_{r→t+} W^h_r on {t < T_h},    Z^h_t = h on {t ≥ T_h}.
In discussing optional and previsible stopping times, it is convenient to use the h-null sets in the
σ-fields
F
consisting of
h-completion of
F° .
T,
T = n on { T = θ } , and let n form of the assertion is trivial).
THEOREM 1.7. for each
a)
t .
perhaps at
T
0 < T < °° for previsible
since we may replace any
and
For
h e H,
augmented by all
Furthermore, there is no loss of
generality in the following theorem to assume stopping times
F
n •+ °°
T
(on
P {T, = °°} = 1, n
by
T
= T Λ n
b)
For every
F
H X B X F
the corresponding
and
is ,
Z^ t
c)
For every
F
=
Z
TίS) '
S
e
F° -measurable t+
Z
measurable in
-optional stopping time
pNθ^1 slFτ+)
(1.7b)
{τ>θ}
{ T = 0}
It is right-continuous, with left limits < °°, and it is
on
in
H
except
(h,t,w) .
T < °°, we have
G
° '
-previsible stopping time
0 < T < °° we
have
p h (θ τ 1 S|FJ_) =ZJ_(S) , S e G° ,
(1.7 c) where we set
Z
= h
if the left limit at
T
< «> does not exist.
Ή
d)
The processes
F t+ -optional and
F
Z
and
Z
-previsible, and either of these facts together with
(1.7b) or (1.7c) respectively, determines h-null set for all REMARKS. PROOF.
are respectively
t > 0
(> 0
Z^
if we set
It follows from [4, VI, 5]
that
or
Z
uniquely up to an
Z Q _ = h) . z£
is even
F°+-optional .
By Lemma 1.6 and the classical supermartingale convergence
theorem of Doob (continuous parameter version) we know that P {lim W = W exist for all s} = 1 . Unfortunately, there seems to be S r+s± r " no way to deduce from this that the limits are concentrated on Ω (hence are in Z
+
let {m2~ k
H)
except by first proving parts
(this is the price we pay for using T
be any finite < T < (m+l)2~k}
F
b)-d) for Ω
instead of
-stopping time, and let
for all
m > 0,
W
as usual.
T
+
in place of Ω) .
Accordingly,
= (m+l)2~
on
Then by Theorem 1.2 b ) ,
and martingale convergence of conditional expectations, we have
12
FRANK B. KNIGHT
(1.8,
liπ. E X e"
λs
k-*»
θ
fn
k
ds|FΪ > k
; = lim E
k
(/£ e "
λs
f n • θgds)
= EW>Γ+ Γ e-λs f J 0 n
θ ds . s
By a monotone class argument using linear combinations of the f_n, it follows that W_{T+} defines a regular conditional probability on G° given F_{T+}, and in particular W_{T+}(Ω̄) = 1 a.s. Since W_{t+} (set ≡ 0 where it does not exist, for all t) is F_{t+}-optional, the optional section theorem [4, IV, 84] shows that

P^h{W_{t+}(Ω̄) = 1 for all t} = 1 .

Turning to the previsible case, let 0 < T < infinity be an F-previsible stopping time. By [4, IV, 71 and 77] this is equivalent to the existence of an increasing sequence (T_n) of stopping times with T_n < T and lim_{n→∞} T_n = T < infinity. Then by (1.8) and Hunt's Lemma [4, V, 45] we have

lim_{n→∞} E^h(∫_0^∞ e^{-λs} f_n ∘ θ_s ds | F_{T_n}) = E^{W_{T-}} ∫_0^∞ e^{-λs} f_n ∘ θ_s ds .

But by [4, IV, 56 b) and d)] we have ∨_n F_{T_n} = F_{T-}, and so by a monotone class argument W_{T-} defines a regular conditional probability on G° given F_{T-}. Since W_{t-} (set ≡ 0 where it does not exist, for all t) is F_t-previsible, the previsible section theorem [4, IV, 85] shows that

P^h{W_{t-}(Ω̄) = 1 for all t > 0} = 1 .

Combining the above results, it follows immediately that P^h{T^h = infinity} = 1, and we have (1.7b) and (1.7c). It is clear that T^h is even an F°_t stopping time, and obviously Z^h_t is right-continuous, with left limits except perhaps at T^h < infinity. It now follows by [5, IV, T27] that Z^h_t is F_{t+}-optional. To see that Z^h_{t-} is F_t-previsible it suffices to note that it is a measurable process h-equivalent to the previsible process W_{t-}, since F_t contains all subsets of h-null sets. In view of the two section theorems, this completes the proof of d). Finally, the joint measurability assertion in a) follows from the H × F°-measurability of W^h_t and the fact that

Z^h_t = W^h_t I_{[0,T^h)}(t) + h I_{[T^h,infinity)}(t)

is H × B × F°-measurable. In fact, for later use we may state
COROLLARY 1.7.
For ε > 0 and t ≥ 0, Z^h_s, 0 ≤ s ≤ t, is H × B_{[0,t]} × F°_{t+ε}-measurable, where B_{[0,t]} are the Borel sets of [0,t].

This follows immediately by the same method, since W^h_{s∧(t+ε)} is F°_{t+ε}-measurable for each s. It does not follow, however, that Corollary 1.7 holds if F°_{t+ε} is replaced by F°_{t+}.

We next examine how to recover the process X_t = (w_{2n}(t)) from Z^h_t. In principle, this is possible because Z^h_t{(w_{2n}(0)) = X_t} = 1 for each t, h-a.s.

DEFINITION 1.8. Let a mapping φ from H into R^infinity be defined by

φ(h) = (p^{-1}(E^h p(w_{2n}(0))))_{1 ≤ n} .

It is easy to see that φ(h) is H/B^infinity-measurable. Now we have

THEOREM 1.9. For h ∈ H, P^h{φ(Z^h_t) = X_t for all t ≥ 0} = 1 .

PROOF. Since φ is a Borel function and the components of X_t are right-continuous, both sides of the equality are F_t-optional. By the usual section theorem, it suffices to prove that for each optional T < infinity we have P^h{φ(Z^h_T) = X_T} = 1. But for n ≥ 1, by Theorem 1.7 b) we have

p(w_{2n}(T)) = E^h(p(w_{2n}(0)) ∘ θ_T | F_{T+}) = E^{Z^h_T} p(w_{2n}(0)) , P^h-a.s.

Applying p^{-1} to both sides, we obtain the identity for the components of X_T, completing the proof.
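In display form, the component identity in the proof of Theorem 1.9 reads as follows (a sketch in the notation above; p denotes the bounded injective map appearing in Definition 1.8, applied coordinatewise):

```latex
\begin{aligned}
\varphi\!\left(Z^h_T\right)_n
 &= p^{-1}\!\left(E^{Z^h_T} p\!\left(w_{2n}(0)\right)\right)
 && \text{(Definition 1.8)}\\
 &= p^{-1}\!\left(E^h\!\left(p\!\left(w_{2n}(0)\right)\circ\theta_T \,\middle|\, F_{T+}\right)\right)
 && \text{(Theorem 1.7 b))}\\
 &= p^{-1}\!\left(p\!\left(w_{2n}(T)\right)\right) = w_{2n}(T),
 && P^h\text{-a.s.,}
\end{aligned}
```

since p(w_{2n}(0)) ∘ θ_T = p(w_{2n}(T)) is already F_{T+}-measurable; the section theorem then upgrades the identity at each optional T to one holding for all t simultaneously.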
REMARK.
It follows in particular that
{z ,
h
0 < s < t}
S
"~
generates a
~"
σ-field whose completion for P contains that of F° . Consequently, by Theorem 1.7b), for any 0 < s < — < s and B_ , ,B e B , we have — — — n l n °° easily
ίw(s k ) * B k } | z ^ , β < t) . But then, by obvious monotone extension, we have P (s|z , σ(Z h , S
s < t) ,
s < t)
S e G° .
) )
Gence it follows that the augmentation of
P h -null sets is
by all
P (s|F
F^ . t+
We turn now to a basic homogeneity property first proved by P.-A. Meyer and M. Yor ([12] and [13]), which is also the key to their proof of the Markov property of Z^h_t. The proof we give is new in that it avoids Theorem 1 of [13], which was in the nature of an amendment to [12]. Here and in the sequel, we will use where convenient the following abbreviation.
NOTATION 1.10. Let Z^h_t f denote E^{Z^h_t} f.

THEOREM 1.11. For each F°_{t+}-stopping time T < infinity, we have

Z^h_{T+t} = Z^{Z^h_T}_t ∘ θ_T for all t ≥ 0 ,   P^h-a.s.,

where θ_T on the right does not apply to the superscript Z^h_T.

PROOF. We first observe that both sides are right-continuous in t. Hence it suffices to prove the equality, for each f in the sequence (f_n) of Notation 1.4, at the times t in a countable dense set {t_k > 0} chosen such that, for each n, Z^h f_n is continuous at t = T + t_k, P^h-a.s. Such t_k exist since Z^h_t f_n is r.c.l.l. in t. Thus Z^h_{T+t_k} = Z^h_{(T+t_k)-} P^h-a.s., and since T + t_k is previsible, Theorem 1.7 a), b) show that F_{(T+t_k)+} and F_{(T+t_k)-} may be included in this equivalence, differing at most by P^h-null sets. Therefore, to prove Theorem 1.11 it suffices to show that the two sides agree at each t_k.
Next, we note that for each t the two sides of Theorem 1.11 are measurable. The left side is clearly so. As for the right side, by Corollary 1.7, for each t and ε > 0, Z_t is (H × F°_{t+ε})/H-measurable, and θ_T is F°_{T+t+ε}/F°_{t+ε}-measurable, whence by composition Z_t ∘ θ_T is (H × F°_{T+t+ε})/H-measurable. Since also Z^h_T is F_{T+}/H-measurable, by composing again it follows that (Z^{Z^h_T}_t f) ∘ θ_T is F°_{T+t+ε}-measurable, and thus F_{(T+t)+}-measurable. Since F_{(T+t_k)+} and F°_{(T+t_k)+} differ only by null sets, the proof of Theorem 1.11 is thus reduced to showing

(1.10)   E^h(Y Z^h_{T+t} f) = E^h(Y (Z^{Z^h_T}_t f) ∘ θ_T)

for each Y ∈ b(F_{(T+t_k)+}) and all f in the sequence {f_n} used above.
To prove (1.10) we need two simple lemmas.

LEMMA 1.12. F_{(T+t_k)+} is contained in the σ-field F_Y generated by all Y of the form Y = (b ∘ θ_T) g, g ∈ b(F°_{T+}), b ∈ b(F°_{t_k}).

PROOF. It is easily seen from Galmarino's Test [4, IV, (100)] that F°_{(T+t_k)+} is generated by the stopped process X_{(T+t_k)∧s}, 0 ≤ s. Hence we need only show that for each s we have X_{s∧(T+t_k)} ∈ F_Y. Clearly X_{s∧T} I_{{s ≤ T}} ∈ F°_{T+} and {s ≤ T} ∈ F°_{T+}, so this is in the σ-field F_Y. Now we have

s ∧ (T + t_k) = s ∧ T on {s ≤ T} ,
             = T + ((s − T) ∧ t_k) on {s > T} ,

hence it remains only to consider the case s > T. For each n, letting T_n = j 2^{-n} on {j 2^{-n} ≤ T < (j+1) 2^{-n}}, 0 ≤ j, we can write on {s > T_n}

X_{T_n + ((s − T_n) ∧ t_k)} I_{{s > T_n}} = (X_{(s − T_n) ∧ t_k} ∘ θ_{T_n}) I_{{T_n = j 2^{-n} < s}} ,

which is in F_Y. Then it is easy to see that the limit as n → infinity puts X_{T + ((s − T) ∧ t_k)} I_{{s > T}} in F_Y, completing the argument of the Lemma.

REMARK. In fact we have F°_{(T+t_k)+} = F_Y, as is easily checked, but not needed below.
The class of finite linear combinations of Y having the form stated in Lemma 1.12 is closed under multiplication, hence it suffices to prove (1.10) only for Y of this form. Then, assuming Y = (b ∘ θ_T) g and writing t, Z for t_k, Z^h, by Theorem 1.7b) we have

(1.11)   E^h(Y Z_{T+t} f) = E^h(g (b ∘ θ_T) E^h(f ∘ θ_{T+t} | F°_{(T+t)+}))
        = E^h(g E^h((b ∘ θ_T)(f ∘ θ_{T+t}) | F°_{T+}))
        = E^h(g E^{Z^h_T}(b (f ∘ θ_t)))
        = E^h(g E^{Z^h_T}(b E^{Z^h_T}(f ∘ θ_t | F°_{t+})))
        = E^h(g E^{Z^h_T}(b Z_t f)) .

To go from here to the right side of (1.10) we need to reintroduce a conditioning by F°_{T+} on certain occurrences of w. To justify this, we have

LEMMA 1.13. Let K(w_1, w_2) be a bounded, F°_{T+} × F°-measurable function. Then

E^{Z^h_T(w)} K(w, w_2) = E^h(K(w, θ_T w) | F°_{T+})

for P^h-a.e. w, where the expectation on the left is with respect to w_2 over Ω.

PROOF. By linearity and an obvious monotone class argument, it suffices to prove this for K of the form K_1(w_1) K_2(w_2), K_1 ∈ b(F°_{T+}), K_2 ∈ b(F°). Then K_1(w) factors out on both sides, and the result follows by Theorem 1.7b).
We now apply Lemma 1.13 to the last expression in (1.11) with

K(w_1, w_2) = g(w_1) b(w_2) (Z^{Z^h_T(w_1)}_t f)(w_2) .

This is justified by composition of (h, w) → Z^h_t f(w), which is H × F°-measurable, with Z^h_T, which is F_{T+}/H-measurable. We obtain

(1.12)   E^h(g E^{Z^h_T}(b Z_t f)) = E^h(E^h(g(w)(b ∘ θ_T)(w)((Z^{Z^h_T}_t f) ∘ θ_T)(w) | F°_{T+}))
        = E^h(Y (Z^{Z^h_T}_t f) ∘ θ_T) ,

proving (1.10), and hence Theorem 1.11.

It is now easy to deduce that the processes Z^h_t are all strong Markov processes with the same Borel transition function. One may describe this as the "intrinsic Markov property." When considering Z^h_t(w) as h varies, we frequently write Z^z_t in place of Z^h_t.

DEFINITION 1.14. We set

q(t,z,A) = P^z(Z^z_t ∈ A) ,   z ∈ H, 0 ≤ t, A ∈ H .

Since Z^z_t is H × F°-measurable, we have by Fubini's Theorem q(t,z,A) ∈ H for fixed (t,A). Further, by right-continuity in t, q(t,z,A) ∈ B^+ × H for fixed A ∈ H.

THEOREM 1.15. As processes with state space (H, H) and σ-fields F_{t+}, the Z^h_t, h ∈ H, are strong-Markov with transition function q.

PROOF. Let T < infinity be an F_{t+}-stopping time. Then by Theorem 1.11 and Lemma 1.13, for A ∈ H,

P^h(Z^h_{T+t} ∈ A | F_{T+}) = P^{Z^h_T}(Z^{Z^h_T}_t ∈ A) = q(t, Z^h_T, A) ,

as asserted.
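In display form, the computation behind Theorem 1.15 runs as follows (a sketch in the notation above, with Y a bounded F_{T+}-measurable test function):

```latex
\begin{aligned}
E^h\!\left[\,Y\,\mathbf{1}_A\!\left(Z^h_{T+t}\right)\right]
  &= E^h\!\left[\,Y\,\mathbf{1}_A\!\left(Z^{Z^h_T}_{t}\circ\theta_T\right)\right]
  && \text{(Theorem 1.11)}\\
  &= E^h\!\left[\,Y\,\left.E^{z}\!\left[\mathbf{1}_A\!\left(Z^{z}_{t}\right)\right]\right|_{z=Z^h_T}\right]
  && \text{(Lemma 1.13)}\\
  &= E^h\!\left[\,Y\,q\!\left(t,Z^h_T,A\right)\right]
  && \text{(Definition 1.14),}
\end{aligned}
```

which is exactly the statement P^h(Z^h_{T+t} ∈ A | F_{T+}) = q(t, Z^h_T, A).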
Using Theorem 1.15, it is surprisingly easy to obtain a remarkable result of P.-A. Meyer [12] concerning the set H_0 of "non-branching points."

DEFINITION 1.16. Let

H_0 = {z ∈ H : P^z{Z^z_0 = z} = 1} ,

and let the trace σ-field on H_0 denote the intersections of sets in H with H_0.

Clearly H_0 ∈ H, and the trace σ-field is its topological Borel field. We will see that H_0 is an alternative state space for the processes Z_t.

THEOREM 1.17. For h ∈ H, P^h{Z^h_t ∈ H_0 for all t ≥ 0} = 1, and for h ∈ H_0 the processes Z^h_t comprise a Borel right-process on H_0, in the sense of Meyer.

PROOF. The second assertion follows immediately from the first and the definition of a right-process (see Getoor [8, (9.7) and (9.4)]) since the Z^h_t are right-continuous, P^h{Z^h_0 = h} = 1 on H_0, and q is Borel.
The first assertion is really a familiar consequence of the strong Markov property. To prove it, we introduce momentarily, for fixed h ∈ H, a canonical version (P, Z_t) of Z^h_t on the space of r.c.l.l. paths with values in (H, H), and let θ_t denote the usual translation operators, and F^P_t the usual right-continuous, P-augmented σ-fields on this space. Of course by Theorem 1.15 this makes sense, and Z_t remains a strong-Markov process on this space, with transition function q. Also, Z_t is F^P_t-optional, hence by the section theorem it suffices to show that for optional T < infinity, P{Z_T ∈ H_0} = 1. But since Z_0 ∘ θ_T = Z_T and Z_T is F^P_T/H-measurable, the strong-Markov property implies

P{Z_T ∈ H_0} = 1 ,

which completes the proof.

REMARK. Usage of this sample space is introduced systematically in the following section. Here it was used only for notational convenience, because we do not have Z^h_0 ∘ θ_t = Z^h_t.
We turn finally to the "moderate Markov property" of the left-limit processes Z^h_{t-}, t > 0, in the terminology of Chung and Walsh [2]. This was anticipated by Theorem 1.7 c), and provides a "practical" form of the Markov property in the sense that it can be applied without knowledge of the future (unlike Theorem 1.15).

THEOREM 1.18. For h ∈ H, let T be an F_t-previsible stopping time with 0 < T < infinity. Then for t ≥ 0 and A ∈ H,

P^h(Z^h_{(T+t)-} ∈ A | F_{T-}) = P^{Z^h_{T-}}(Z_{t-} ∈ A) ,   P^h-a.s.

PROOF. By [4, IV, Theorem 78] there is an F°-previsible stopping time T° equal P^h-a.s. to T, and it suffices to prove the assertion for T°.
Further, by [4, IV, Theorem 71] we can as well assume there is an increasing sequence T_n < T°, with P^h{lim_{n→∞} T_n = T°} = 1, where the T_n are F°-stopping times [4, IV, Theorem 56]. We may replace T° by lim_{n→∞} T_n in proving the assertion, and then F°_{T°-} = ∨_n F°_{T_n}. Again by [4, IV, Theorem 78], F_{T-} and F°_{T°-} differ only by P^h-null sets, so it is enough to prove the result conditional on F°_{T°-}. In short, we have shown that the entire assertion is equivalent to that obtained by replacing T by a strict limit of an increasing sequence of F°-stopping times, and might as well have been so formulated (except for the fact that F_t-stopping times are needed in applications).

We need to use the analogue of Theorem 1.11 for previsible stopping times.

LEMMA 1.19.
For h ∈ H, and F°-previsible T with P^h{0 < T < infinity} = 1, we have

Z^h_{(T+t)-} = Z^{Z^h_{T-}}_{t-} ∘ θ_T for all t ≥ 0 ,   P^h-a.s.

PROOF. As for Theorem 1.11, the problem reduces to showing the analogue of (1.10) with Y, f, and t as before:

(1.13)   E^h(Y Z^h_{(T+t)-} f) = E^h(Y (Z^{Z^h_{T-}}_{t-} f) ∘ θ_T) .

To do this, we need only apply the analogues of Lemmas 1.12 and 1.13. The latter is proved just as before, and we have

E^h(K(w, θ_T w) | F°_{T-}) = E^{Z^h_{T-}} K(w, w_2)

when K(w_1, w_2) is F°_{T-} × F°-measurable. As to the former, where g ∈ b(F°_{T+}) is replaced by g ∈ b(F°_{T-}), the same proof applies except that we must use the familiar fact that {s < T} ∈ F°_{T-} for previsible T. Then we write

s ∧ (T + t_k) = s ∧ T on {s < T} ,
             = T + ((s − T) ∧ t_k) on {s ≥ T} ,
where X_{s∧T} I_{{s < T}} = X_s I_{{s < T}} ∈ F°_{T-} (by definition of F°_{T-}). In the second case, there is no change on {s > T}. Finally, on {s = T} we have

X_s I_{{s=T}} = (X_0 ∘ θ_T) I_{{s=T}} ,

which also has the required form.

We now apply the two lemmas, along with Theorem 1.7 c), replacing F_{T+} and Z^h_T by F_{T-} and Z^h_{T-}, to obtain first the analogue of (1.11) and next the analogue of (1.12). This then completes the proof of Lemma 1.19.

To complete the proof of Theorem 1.18, one need only apply Lemma 1.19 and the analogue of Lemma 1.13 to obtain

P^h(Z^h_{(T+t)-} ∈ A | F_{T-}) = P^{Z^h_{T-}}(Z_{t-} ∈ A) ,

completing the proof.
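In the notation above, the two Markov-type properties now established may be set side by side (optional T for Theorem 1.15, previsible T for Theorem 1.18; the displayed forms follow the statements as given there):

```latex
\begin{aligned}
&\text{strong (Theorem 1.15):} &
P^h\!\left(Z^h_{T+t}\in A \,\middle|\, F_{T+}\right) &= q\!\left(t,\,Z^h_T,\,A\right),\\[2pt]
&\text{moderate (Theorem 1.18):} &
P^h\!\left(Z^h_{(T+t)-}\in A \,\middle|\, F_{T-}\right) &= P^{\,Z^h_{T-}}\!\left(Z_{t-}\in A\right).
\end{aligned}
```

The first conditions on F_{T+}, and so uses information up to and including time T; the second uses only the strict past F_{T-}, which is what makes it "practical."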
2. PREDICTION SPACES AND RAY TOPOLOGIES.

As already became apparent (in the proof of Theorem 1.17, for example), it is a technical obstacle to have to define Z^z_t separately for each z. Furthermore, in view of Theorems 1.9 and 1.15 it is an unnecessary obstacle. We have X_t = φ(Z^z_t) for all t, except on a fixed set N ∈ F° with P^z(N) = 0 for all z ∈ H. Thus we are free to transfer the Z^z to a more convenient sample space, and to study X in terms of Z instead of conversely. This leads to the following concepts and definitions.
DEFINITION 2.1. 1) The prediction space of (Ω, G°, F°_{t+}, G°_{t+}, θ_t, X_t) consists of (Ω_Z, Z°, Z°_t, θ^Z_t, Z_t), where

i) Ω_Z is the set of all right-continuous H_0-valued paths w_Z(t), t ≥ 0, with left limits w_Z(t-) in H for t > 0;
ii) Z°_t is the σ-field generated by {w_Z(s), s ≤ t}, and Z° = ∨_t Z°_t;
iii) θ^Z_t : Ω_Z → Ω_Z is defined by θ^Z_t w_Z(s) = w_Z(s+t), 0 ≤ s, t;
iv) Z_t(w_Z) = w_Z(t), 0 ≤ t.

2) The prediction process (without specification of a fixed probability) is the canonical process Z_t on prediction space, with transition function q(t,z,A), as justified by Theorems 1.15, 1.17, and 1.18.

Thus, Z_t is a strong Markov process and Z_{t-} is a moderate Markov process ("process" being meant in the sense of E. B. Dynkin [6]),
and it is a right-process on H_0 when considered only for initial distributions concentrated on H_0. In both cases we have the same augmented right-continuous σ-fields Z^μ_t, the completion of Z°_t containing all P^μ-null sets in Z°, for every permissible initial law μ on H, since by right-continuity of path at t = 0 each such μ induces a probability P^μ on Ω_Z. Finally, in view of Definition 1.8 and Theorem 1.9, for each h ∈ H the processes (φ(Z_t), Z_t) and (X_t, Z^h_t) are equivalent in joint distribution, and both components are P^h-a.s. right-continuous with left limits in their respective topologies. It is to be noted that we use the same notation P^h for a probability on either (Ω, G°) or (Ω_Z, Z°), the distinction being clear from the context.

3) By a packet of the prediction process we mean a non-void universally measurable subset U of H such that, for all h ∈ U, P^h{Z_t ∈ U for all t ≥ 0} = 1 in the sense of outer measure. If U ∈ H, then U is a "Borel packet", while if U ⊂ H_0 it is an "H_0-packet". We say that a packet U is "complete" if P^h{Z_{t-} ∈ U for all t > 0} = 1 for h ∈ U.

REMARKS. Given a packet U, it is clear that U ∩ H_0 is an H_0-packet, and on an H_0-packet Z_t is a right process in the sense of Getoor [8]. But completeness may be lost in this operation, and on a complete packet one has the moderate Markov property of Z_{t-}. In anticipation of things to follow, we point out that, starting with a process Z_t, or a collection of such (i.e., of P's on (Ω, G°)), it is often possible to find a packet which contains the given process (or processes), but little or nothing superfluous. This is beneficial in applying the prediction process.

As a first step in the construction of packets, we prove

THEOREM 2.1. a) Given any non-void subset A ⊂ H_0, let R_A be a Borel subset of H (i.e., R_A ∈ H) with P^z{Z_t ∈ R_A for all t ≥ 0} = 1 for z ∈ A. Then the set H_A = {h ∈ H : P^h{Z_t ∈ R_A, t ≥ 0} = 1} is a packet, with A ⊂ H_A.

b) For each h ∈ H_A, there is a Borel packet H_h with h ∈ H_h ⊂ H_A, and further H_h ⊇ {z ∈ H : P^z{Z_t ∈ H_h, t ≥ 0} = 1}.

c) The packet H_h is complete.

PROOF. a) Let T = inf{t > 0 : Z_t ∈ H_0 − R_A} be the hitting time of H_0 − R_A on Ω_Z (as in [1, I, (2.8)]). Then T is Z(= ∨_μ Z^μ)-measurable, and for α > 0 the function E^z(exp −αT) is α-excessive for the right-process on H_0. Further, we have for any α > 0, H_A ∩ H_0 = {h ∈ H_0 : E^h(exp −αT) = 0}, which (since the right-process has a Borel
transition function) is a nearly-Borel set [8, (9.4)(i)]. Since we have H_A = {h : q(0,h,H_A ∩ H_0) = 1}, it follows that H_A is nearly Borel in H. Hence it is universally measurable. Also, for h ∈ H_A the process E^{Z_t}(exp −αT) is P^h-a.s. right-continuous. For h ∈ H_A it is thus a positive right-continuous supermartingale starting at 0. Hence it is 0 for all t, and H_A is a packet.
Turning to the proof of b), we use a familiar reasoning due to P.-A. Meyer. Since H_A is nearly Borel, for h ∈ H_A there is a Borel set H_1 ⊂ (H_A ∩ H_0) with P^h{Z_t ∈ H_1 for all t} = 1. Then by the same reasoning as for part a), the set H_2 = {z ∈ H : P^z{Z_t ∈ H_1, t ≥ 0} = 1} is a packet with P^h{Z_0 ∈ H_2} = 1. Similarly, we define by induction a sequence H_1 ⊃ H_2 ⊃ H_3 ⊃ ... such that for all n, H_{2n-1} ∈ H and H_{2n} is a packet with P^h{Z_0 ∈ H_{2n}} = 1. Then plainly H^∞ = ∩_n H_n = ∩_n H_{2n-1} defines a Borel set and P^h{Z_0 ∈ H^∞} = 1. Finally, we set H_h = {z : P^z{Z_0 ∈ H^∞} = 1}. Then H_h is a Borel packet, h ∈ H_h, and if P^z{Z_t ∈ H_h for t ≥ 0} = 1 then obviously z ∈ H_h.

Before proving c), we mention two simple Corollaries.

COROLLARY 2.2. For any probability μ on H_A, there is a Borel packet H_μ ⊂ H_A with P^μ{Z_0 ∈ H_μ} = 1, and further H_μ ⊇ {z ∈ H : P^z{Z_t ∈ H_μ for all t ≥ 0} = 1}.

PROOF. By definition of nearly-Borel set, there is an H^1 ⊂ H_A ∩ H_0, H^1 ∈ H, with P^μ{Z_t ∈ H^1 for all t} = 1. Then as in part a) the set H^2 = {z ∈ H : P^z{Z_t ∈ H^1, t ≥ 0} = 1} is a packet, and P^μ{Z_0 ∈ H^2} = 1. Proceeding by induction as in b), we obtain a decreasing sequence H^n ⊂ H_A ∩ H_0 with H^{2n-1} ∈ H and P^μ{Z_0 ∈ H^n} = 1. Now let H^∞ = ∩_n H^n, and H_μ = {z : P^z{Z_0 ∈ H^∞} = 1}.
COROLLARY 2.3.
For any packet K such that K ∩ H_0 = H_A ∩ H_0, we have K ⊂ H_A. Thus H_A is the largest packet having the given non-branching points of H_A.

PROOF. For any packet K, one has q(0,z,K ∩ H_0) = 1 for z ∈ K. But it follows by the definition of H_A, using the Markov property again, that H_A contains all z with q(0,z,H_A ∩ H_0) = 1. Thus the Corollary is proved.

REMARK. We observe that for any initial probability μ on H_A, an element h_μ of H_A is defined by

h_μ(S) = ∫_{H_A} ∫ q(0,h,dy) y(S) μ(dh) = ∫ (∫_{H_A} q(0,h,dy) μ(dh)) y(S) ,   S ∈ G° ,
where the probability in parentheses is concentrated on H_0 ∩ H_A.

Returning now to the proof of Theorem 2.1 c), for h ∈ H_A let H_h ⊂ H_A be a Borel packet as in b). We wish to show that
e H for all t > 0} = 1, tA t > 0} = 1 . Now 1 (Z. ) is a H h t-
previsible stopping time,
H
and we know that
P {z e H for all t n Z - previsible process, and for each t
T, 0 < T < <», we have by the moderate Markov
property ph(Zt e Hh
for all
t > τ|Z
)
V = p
(Z
e H
for all
t
> 0)
= 1 . Consequently, by b) we have theorem it follows that
P {Z
P {i
(Z
H
P {Z
e H
for all
e H^} = 1 . ) = 1
for all t > 0} = 1 ,
~
t > 0} = 1
as required.
A natural question is whether, given a set A ∈ H, there is a smallest packet containing it. The example of a Brownian motion B_2(t) in R^2, with A = {(0,0)}, shows however that no smallest packet need exist. Here the points (x,y) ∈ R^2 correspond to points of H via the usual P^{(x,y)}, and clearly any polar set may be subtracted from R^2 (but no non-polar set may be subtracted) to leave a packet. It can be shown that in this example H_A is the set of all Brownian probabilities corresponding to initial distributions μ on (R^2, B^2), but the proof probably requires Ray compactifications (see Discussion 3) and 4) of Conjecture 2.10 below). It also should be noted that the definition of packet depends only on the transition measures q(t,h,dz) of the prediction process, and these do not depend on the exact choice of the W^h_t (which is not unique, since it involves the choices of Definition 1.6). In short, a packet is just a continuous-time analogue of a "conservative set" for a Markov chain. In the case that the elements of A are themselves Markovian probabilities on Ω (as in the Brownian example above), the measures q(t,h,·), h ∈ A, are usually easy to identify, and the appropriate packet becomes evident.

This leads to a method of finding a "nice" transition function for a Markov process, which is the subject of the third essay. Here we can illustrate it in a more classical case by continuing our example of B_2(t). Let B be a Borel, non-polar set in R^2, and consider the usual killed process B_Δ(t) = B_2(t) for t < T_B, and B_Δ(t) = Δ for t ≥ T_B,
where Δ is adjoined as an isolated point. Classically, the probabilities P^{(x,y)}{B_Δ(t) ∈ C}, C ∈ B^2, are only known to be universally measurable in (x,y). Thus one obtains for B_Δ a universally measurable transition function. However, using the prediction process it is easy to get a transition function on a countably generated subfield of universally measurable sets which is the restriction of a Borel transition function on a larger space.

The natural state space of B_Δ(t) is Δ together with the (finely open) set (B^c)_r = {(x,y) : P^{(x,y)}{T_B > 0} = 1}, i.e., the complement of the set of regular points for B. Since α-excessive functions for B_2(t) are Borel measurable, and E^{(x,y)}(exp −αT_B) is α-excessive, it is not hard to show that (B^c)_r is a Borel set, but we need only its universal measurability. Identifying (x,y) with P^{(x,y)}, where Δ = (infinity,infinity) and the path is the coordinate pair (w_1(t), w_2(t)), we obtain a one-to-one mapping of (B^c)_r ∪ Δ into H. Letting φ_2(z) denote the first two coordinates of the Borel mapping φ of Theorem 1.9, we set

R_Δ = {z ∈ H : φ_2(z) ∈ (B^c)_r ∪ {(infinity,infinity)}} .

Since (B^c)_r is universally measurable, R_Δ is universally measurable in H, and it follows by using a generating sequence that the trace of H on R_Δ is countably generated. Then, with (B^c)_r ∪ Δ in place of A, the σ-field of universally measurable sets in (B^c)_r ∪ Δ is mapped by the identification onto a countably generated σ-field on which q maps into the transition function of B_Δ. In the present case, it can be shown that the image σ-field is really the Borel field, but this seems to require in general Meyer's hypothesis of "absolute continuity".

The theory of Ray processes (and Ray semigroups) is rather well understood, and will not be developed here. We refer instead to Getoor [8] for all of the facts we shall need. By means of the familiar compactification procedure (to be described below) this theory may be brought to bear on any packet of the prediction process. Thus, it leads to a more satisfactory form of Theorem 2.1 (Corollary 2.12), and also to an interesting open problem (Conjecture 2.10) which is discussed in some detail. It also makes
possible a transcription of much of the "comparison of processes" from [8] to the prediction process setting, but some of this we leave to the reader.
Part
of the material which we do cover is needed again for the fourth essay. We start with any prediction packet which we denote by convenience although
A
alone is unspecified and
H
H
for
has no reference in
A general to Theorem 2.1.
It is clear from Theorem 1.17 that
Z
becomes
a right process on H Π H , with the Borel transition function if H Π H is not Borel, we have for z e H ίl H , q(t,z,B) = A U A U q(t,z,B Π H
Π H )
for
B € H,
subset of the compact metric space
H
Then
C
H
fl H
H
has a countable subset which is dense in
H,
which will be
C
in the uniform norm.
R g(z) denote the resolvent of the right-process Z λ t H Π H , we form the minimal set of functions containing A 0 + {R g : λ > 0, g 6 C } and closed under the two operations: λ a) application of R for λ > 0 , λ b)
as a
(H Π K L ) + is as follows. Let C + denote the A 0 of non-negative continuous functions on H .
Letting
formation of minima
Since we have
ίl H
of Proposition 1.3, and form its Ray
compactification (as in Chapter 10 of [8]) relative to
restriction to
(even
where the right side is the extension to a
universally measurable set). Consequently, we may consider
denoted by (H Π H ) The definition of
q
on
f Λ g .
(f Λ g) + (h Λ k) = (f+h) Λ (g+h) Λ (f+k) Λ (g+k),
it is easy
to see by simple induction that the set is closed under formation of linear combinations with non-negative coefficients.
Hence, it is the minimal
convex cone closed under operations a) and b ) . A crucial lemma ([8, (10.1)]) now asserts that this cone contains a countable uniformly dense subset. Furthermore, the cone separates points in
H
ίl H Λ since R does so. A 0 λ + We now define (H Π H ) to be the compact metrizable space obtained by completing H ΓΊ H Λ in a metric Σ°° . α 1If (z.) - f (zj I , where (f ) A 0 n=l n n 1 n 2 ' n is uniformly dense in the cone, α > 0, and Σ _ α(max f ) < « . n=l n Clearly the topology of of
f
or
α
.
(H
Π H )
does not depend on the particular choice
It is homeomorphic to the closure of the image of
H Π H Λ in X00 . [0,~) by the function f(z) = (f, (z) , fo(z),...) . A 0 n=l 1 2 If H is Borel, then its one-to-one image in (H Π H_) is also Borel, A A 0 while in general its image is universally measurable [8, (11.3)]. It is now easy to see by the Stone-Weierstrass Theorem that the space C(H
Π H_) A 0
of continuous functions on
(H Π H Λ ) A 0
is the uniform closure
of the differences f − g of elements of the cone, extended to (H_A ∩ H_0)^+ by continuity. Letting f̄ denote a uniform limit of such differences on H_A ∩ H_0, and f̄^+ its extension by continuity to (H_A ∩ H_0)^+, we now define a resolvent on C((H_A ∩ H_0)^+) by

(2.2)   R̄_λ f̄^+ = (R_λ f̄)^+ ,   f̄^+ ∈ C((H_A ∩ H_0)^+) , λ > 0 .

The resolvent R̄_λ has the special property that it carries C((H_A ∩ H_0)^+) into itself. Finally, one shows [8, (10.2)] that every element of the cone is λ-excessive for some λ > 0, hence R̄_λ separates points and so R̄_λ is a Ray resolvent on (H_A ∩ H_0)^+.

It follows by a Theorem of D. Ray that there is a unique right-continuous Markov semigroup P̄_t on C((H_A ∩ H_0)^+) with resolvent R̄_λ, whose transition measures we denote by p(t,h,dz). We also introduce the Ray Space (of Z_t on H_A ∩ H_0) as in [8, Chapter 15].

DEFINITION 2.4. The Ray Space is the set

U_A = {z ∈ (H_A ∩ H_0)^+ : p(0,z,H_A ∩ H_0) = 1} .
More properly, one should write
^ n n Q c o n f u s i o n w i l l A 0 arise. It is clear that U does not depend on λ > 0, and that it is + universally measurable in (H Π H ) . If H e H then U is also Borel. A 0 A A Three basic facts about P
P
ϋ
from [8, Chapter 15] which serve to connect
with the prediction process may be summarized as follows.
PROPOSITION 2.5. 1.
For
Thus p and 2.
For
z e H A Π HQ q
and
f € c(H A Π H Q ) + we have
may be identified on z e u A
we have
H
P f(z) = Q
f(z) .
Π HQ .
P. (I u n u (z)) = 1 t H_||π A 0
for
t > 0
(where
5
is t
defined for universally measurable functions by the usual extension procedure). 3. For the canonical Ray process (X , P ) on the probability space of r.c.1.1. paths with values in (H Π H Λ ) + , we have for z e u A 0 A Z P {5 e H Π H_ for all t > 0} = 1 , t A 0 and Z
P {X
e u
Recalling again the space H̄ of probabilities on the compact metrizable space Ω̄ of equivalence classes of measurable functions, we will show that the Ray topology is stronger on H_A ∩ H_0 than the H-topology. Hence (H_A ∩ H_0)^+ is "saturated" by the equivalence classes of elements corresponding to the same element in H, and these classes reduce to single elements on H_A ∩ H_0. Furthermore, on U_A the corresponding elements of H̄ have a special form: they assign probability one to paths which are r.c.l.l. for t > 0. Only the right-limits at t = 0 are not known to exist, hence the mapping does not quite have its range in H. Nevertheless, it is sufficient to permit properties of the Ray process to be applied to the process Z^h_t for h ∈ H_A ∩ H_0.
Turning to the details, we first characterize convergence in H.

LEMMA 2.6. A sequence h_k ∈ H is Cauchy in H if and only if, for the dense sequence f_n of Notation 1.4,

E^{h_k} ∫_0^∞ exp(−βt) f_n ∘ θ_t dt

is a real Cauchy sequence in k for each n and β > 0.

PROOF. By Theorem 1.2 b) the integrals are uniformly continuous on Ω̄. Hence our condition is clearly necessary. To prove sufficiency, we observe by the same result that the E^{h_k} f_n ∘ θ_t are uniformly continuous and bounded in t, uniformly in k, for each n. Then by inversion of the Laplace transforms (as in Lemma 1.6) we have convergence in k of E^{h_k} f_n ∘ θ_t for each t ≥ 0 and n. For t = 0 this reduces to convergence of (h_k) in H, as required.
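One standard way to carry out the Laplace-transform inversion invoked in this proof (and in Lemma 1.6) is the Post–Widder formula: writing u_k(t) = E^{h_k} f_n ∘ θ_t and û_k(β) = ∫_0^∞ e^{−βt} u_k(t) dt,

```latex
u_k(t) \;=\; \lim_{m\to\infty}\;\frac{(-1)^m}{m!}\left(\frac{m}{t}\right)^{m+1}
\hat{u}_k^{(m)}\!\left(\frac{m}{t}\right), \qquad t>0,
```

valid for bounded continuous u_k. Since the û_k are analytic and uniformly bounded on compact subsets of {Re β > 0}, pointwise convergence of û_k(β) for β > 0 yields convergence of all derivatives; the uniform continuity and boundedness of the u_k (uniformly in k) then make it possible to combine the limits in k and m, which is the substance of the Lemma 1.6 argument cited in the text.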
Using the Lemma, we may compare the Ray and H-topologies.

THEOREM 2.7. If we have h_k ∈ H_A ∩ H_0, 1 ≤ k, and lim_{k→∞} h_k = z exists in the topology of (H_A ∩ H_0)^+, then lim_{k→∞} h_k = h exists in the topology of H. Furthermore, let h(z) denote the induced mapping: h(z) = z on H_A ∩ H_0, and h(z) = h if z ∉ H_A ∩ H_0 and (z,h) correspond as above. Then h(z) is continuous on (H_A ∩ H_0)^+. Finally, for z ∈ U_A we have

P^{h(z)}{paths r.c.l.l. for t > 0} = 1 .
PROOF. Let h_k ∈ H_A ∩ H_0 be a convergent sequence in the Ray topology, with limit z ∈ (H_A ∩ H_0)^+. This requires convergence of R_λ g(h_k) for g ∈ C^+. In particular, let g(z) = E^z f (= zf in Notation 1.10) for 0 ≤ f ∈ C(Ω̄). Then we have

(2.3)   R_λ g(h_k) = E^{h_k} ∫_0^∞ e^{−λt} E^{Z_t} f dt = E^{h_k} ∫_0^∞ e^{−λt} f ∘ θ_t dt ,   λ > 0 .

Thus convergence in the Ray topology implies convergence in the topology of H by Lemma 2.6. Accordingly, there is a unique h(z) ∈ H such that h_k → h(z). Since H_A ∩ H_0 is dense in (H_A ∩ H_0)^+, the mapping h(z) : (H_A ∩ H_0)^+ → H is well-defined and continuous, and reduces to the identity on H_A ∩ H_0.

We will examine more closely the case z ∈ U_A. Passing to the limit in (2.3) yields

(2.4)   R̄_λ ḡ^+(z) = E^{h(z)} ∫_0^∞ e^{−λt} f ∘ θ_t dt ,

but the middle term in (2.3) is no longer well-defined in the limit if h(z) ∉ H_A ∩ H_0 (in the context of [9], Z_t becomes the prediction process on Ω̄). However, the same limit may be expressed in terms of the Ray process X̄_t of Proposition 2.5, since X_t = h(X̄_t) on H_A ∩ H_0. To this end, we need to establish
end, we need to establish LEMMA 2.7.
For
g(z) = E f,
f
continuous on
R^i"(z) = E Z / 0 e " λ t E REMARK. PROOF.
t
Ω,
and
z e u , we have
f dt .
This was also used for [10, Theorem 2.4 d)] with incomplete proof. For
3 > 0,
the function
R o FLg p
is
8-excessive for
λ
t
it is known [8, (5.8)] that lim RβIΓg" (XJ = R β £Γg"(z) , t
40
M
fc
β
P Z -a.s.
λ
Also, by (2.2) and the resolvent equation, lim β R. R.g = lim 3 R o R Λ g 3H»
3
λ
3^00
X ,
β
λ
lim (3/(3-λ))(R.g - R.g)
hence
ESSAYS ON THE PREDICTION PROCESS
and the limit is uniform on
(H A Π H Q )
+
.
It follows that limits can be
interchanged to obtain ^
= lim lim 3E S+oo t-*0
Z
RQIΓg*(X ) 3 λ t
= lim lim βE +0 3_oo t
Z
R
= lim E fc*O But since for
t > 0
we have
X
Z
3
R g(X ) λ t
RΓg(X. ) λ
t
e H Π
H
P -a.s., the last expression
becomes = lim E Z R,g(Xj
= lim EZ E X t tK)
= lim i Z t-K)
Γ e " λ S E X s f ds
Γ e " λ s E X t + S f ds
= lim i Z e λ t t-K)
Γ e " λ s E ^ f ds fc
;
dt
f
completing the proof.

Combining this with (2.4) yields

(2.5)   Ē^z ∫_0^∞ e^{−λt} E^{X̄_t} f dt = E^{h(z)} ∫_0^∞ e^{−λt} f ∘ θ_t dt .

Since X̄_t is right-continuous in the Ray topology, which we have seen is stronger than the H-topology on H, E^{X̄_t} f is right-continuous in t > 0. By Theorem 1.2 b), E^{h(z)} f ∘ θ_t is also right-continuous. Thus by inversion of the transforms in (2.5) we obtain

Ē^z E^{X̄_t} f = E^{h(z)} f ∘ θ_t ,   t > 0 ,

for 0 ≤ f continuous on Ω̄. By Proposition 2.5.1, the left side is Ē^z E^{X̄_t}(f I_Ω). By a monotone class argument the equality extends to bounded Borel f, hence it follows that the right side is E^{h(z)}((f I_Ω) ∘ θ_t). This implies that
FRANK B. KNIGHT
for
t > 0,
t -> 0,
p
{paths which are r.c.1.1. in
[t,°°) } = 1 .
Letting
the last assertion of Theorem 2.7 is proved.
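For completeness, the resolvent limit used in the interchange above can be written out; this is a sketch relying only on the resolvent identity and the sup-norm bound for resolvents:

```latex
% Resolvent identity: R_\beta R_\lambda = (\beta-\lambda)^{-1}(R_\lambda - R_\beta),
% valid for \beta \neq \lambda.
\beta R_\beta R_\lambda g
  \;=\; \frac{\beta}{\beta-\lambda}\,\bigl(R_\lambda g - R_\beta g\bigr)
  \;\longrightarrow\; R_\lambda g \qquad (\beta \to \infty),
% since \beta/(\beta-\lambda) \to 1 and, in the supremum norm,
% \|R_\beta g\| \le \beta^{-1}\|g\| \to 0.
```

The convergence takes place in the supremum norm, which is exactly the uniformity on (H_A ∩ H_0)^+ invoked in the proof to justify interchanging the limits in β and t.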
It is thus plausible that for z ∈ U_A the Ray process may be expressed as a prediction process on a slightly larger space than Ω, but smaller than H. We introduce

NOTATION 2.8.  Let Ω₁ = {elements in Ω which are r.c.l.l. for t > 0}, and let H₁ = {h ∈ H : h(Ω₁) = 1}, H₁ = {A ∩ H₁ : A ∈ H}. We introduce F°_{t+} = ∩_{ε>0} σ(X_s, 0 < s < t + ε), and set H̃ = {h ∈ H₁ : P^h{Z_0 = h} = 1}. Note that X_0 is not F°_{0+}-measurable. This conforms to the fact that, as a "coordinate in Ω₁," X_0 is not even well-defined.

REMARK.  The meaning of Z_0 here is really in the sense of an essential right limit, which happens to coincide with Z_{0+}.

THEOREM 2.9.  For h ∈ H₁ one can define the prediction process Z_t of X_t on Ω₁, and extend the transition function q to (H₁, H₁), in such a way that Theorems 1.7 and 1.15 remain true; Theorem 1.17 also applies for t > 0. We will not elaborate all details.

PROOF.  This is just a special case of [9]. The point is that, for t > 0, we can use exactly the same σ-fields G° and the same construction as before to define Z_t, to show that P^h{Z_t ∈ H̃ for t > 0} = 1, and to show that the same transition function q continues to apply for Z_t, t > 0. On the other hand, for f continuous on Ω it follows by Hunt's Lemma that for rationals r > 0,

    lim_{r→0} Z_r f = lim_{r→0} E^h(f ∘ θ_r | F°_{r+}) ,    P^h-a.s.

Since Z_t is right-continuous for t > 0 in the topology of H, we see that lim_{t→0} Z_t = Z_0 exists P^h-a.s., and

    E^h(S | F°_{0+}) = Z_0(S) ,    S ∈ G°(Ω) .

Now if we define q̄(t,h,A) = P^h{Z_t ∈ A} for h ∈ H₁ − H̃, A ∈ H₁, and q̄(t,h,A) = q(t,h,A ∩ H̃) for h ∈ H̃, then the Markov property of Z_t for s > 0 implies that

    q̄(s+t,h,A) = ∫ q̄(s,h,dz) q̄(t,z,A)    for s > 0 and all t ≥ 0 .

On the other hand, for s = 0 we have for t > 0

    q̄(t,h,A) = P^h{Z_t ∈ A} = E^h P^{Z_0}(Z_t ∈ A) = ∫ q̄(0,h,dz) q̄(t,z,A) ,

completing the verification of the Chapman-Kolmogorov property of q̄. Since H̃ ⊂ H₁, it only remains to verify that for h ∈ H̃, P^h{Z_0 ∈ H̃} = 1. Since, by construction, Z_0 is F°_{0+}-measurable, this last is a consequence of the strong Markov property with T = 0. Formally, it follows because 1 = E^h(q(0, Z_0, {Z_0})), implying that the expression in the last parentheses equals 1, P^h-a.s.

In view of Theorem 2.9, we define the prediction space and prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t) in complete analogy with Definition 2.1, and it has the same Markov properties noted there.
We are now in a position to state an interesting conjecture concerning the relation of this prediction process to the Ray processes (see also Theorem 2.7).

CONJECTURE 2.10.  For any packet H_A and h ∈ U_A, let μ_h(dy) = P^h(h(X_0) ∈ dy). Then X_t, t > 0, is P^h-equivalent in distribution to the prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t) on H_A with initial distribution μ_h(dy).

DISCUSSION.  1)  Since X_t has right limits in the H-topology at t = 0, P^h-a.s., the conjecture follows if it is shown that the mapping h(z) is one-to-one on the non-Ray-branching points of U_A. The converse implication is also clear.

2)  We do not conjecture that X_t is P^h-equivalent to the prediction process of a fixed element of H. This is false in general. For example, consider the sequence h_n, 1 ≤ n, where h_n is the probability of the process X_t which with probability 1/2 chooses one of the two paths w_1(t) = n^{-1} + (t-1)^+ or w_2(t) = -(t-1)^+, where (t-1)^+ = max(0, t-1). Then in the Ray topology lim_{n→∞} h_n = h, where h is the Ray branch-point which with probability 1/2 gives the prediction process of either of the deterministic processes X_t = (t-1)^+ or X_t = -(t-1)^+. It is not hard to see that this initial distribution for the prediction process cannot be expressed as P^z{Z_0 ∈ (·)} for any z ∈ H. The necessary and sufficient condition for such a representation is contained in Theorem 1.2 of [10]. On the other hand, in the H-topology lim_{n→∞} h_n = z, where z ∈ H is the obvious probability concentrated on 2 points of Ω.
3)  The importance of the conjecture, at least from the standpoint of theory, lies in the fact that all entrance laws for the transition function q on H_A ∩ H_0 (having mass 1) are expressed by initial distributions on (H_A ∩ H_0)^+ for the Ray process. This fact seems to have first been noted by H. Kunita and T. Watanabe [11, Theorem 1]. Hence, our conjecture is equivalent to the assertion that every (finite) entrance law for the prediction process on H_A ∩ H_0 is realized by an initial distribution of the prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t). Of course, it suffices here to consider the case H_A ∩ H_0 = H_A. The analogous conjecture for the prediction process of [9] on H (or equivalently, on the set H_0 = {h ∈ H : P^h{Z_0 = h} = 1}) would be that it is already closed under formation of entrance laws. Hence the Ray space of H_0 would correspond to a subset of initial distributions over H_0. It is easily shown that this Ray space does define a process corresponding to each P^h, h ∈ H, and by Discussion 1) above it is then strictly larger than H. The class of processes for t > 0 obtained from initial distributions on the Ray space is then the same class as those obtained from all initial distributions on H_0 (or equivalently, on H), if this extended conjecture holds.

4)  For the packet of an autonomous germ-Markov process, the conjecture holds and X_t is even represented by a single element of H (see [10, Theorem 2.4] for a more general setting).

As far as concerns the left-limit process Z_{t-}, it will be seen that the result of Conjecture 2.10 does hold, at least for Borel packets. A still more satisfactory result will be shown subsequently.

THEOREM 2.11.  For any Borel H-packet H_A ∩ H_0, let

    C_A = {z ∈ (H_A ∩ H_0)^+ : for f ∈ C(H_A ∩ H_0)^+ with corresponding f̄ as in (2.2), R_λ f̄(z) = ∫ p(0,z,dy) R_λ f̄(h(y))} ,

where the integral is over {y : h(y) ∈ H_A ∩ H_0}. Then C_A is Borel in (H_A ∩ H_0)^+, and for z ∈ U_A,

    P^z{X_{t-} ∈ C_A for all t > 0} = 1 .
PROOF.  Since h(y) is continuous and p(t,z,dy) is a Borel transition function, while q(t,h(y),A), A ∈ H, is also Borel in the Ray topology, it is clear by letting f range through a countable dense set that C_A is Borel in (H_A ∩ H_0)^+. To prove the second assertion, it suffices to assume t > ε for some ε > 0, and since the Ray and H-topologies induce the same P^z-augmented σ-fields on H_A ∩ H_0, we may as well assume that X_t and Z_t are identified for t > 0. Then I_{C_A}(X_{t-}) is a previsible process for the σ-fields generated by Z_s, s < t. By the previsible section theorem, it now is enough to show that for previsible T with 0 < T < ∞, P^z{I_{C_A}(X_{T-}) = 1} = 1. Since X_t ∈ H_A ∩ H_0 for t > 0, and the Ray processes have the moderate Markov property, it follows that

(2.6)    R_λ f̄(X_{T-}) = E^z(∫_0^∞ e^{-λt} f̄(X_{T+t}) dt | F_{T-}) = ∫ p(0, X_{T-}, dy) R_λ f̄(y) ,    P^z-a.s.

Since h(y) = y on H_A ∩ H_0, this is the asserted result.
Irrespective of Conjecture 2.10, we can regard C_A as a complete Borel packet in the Ray space, each of whose elements corresponds to an initial distribution on H_A ∩ H_0. However, a stronger result is evident by comparison of (2.6) with the moderate Markov property of Z_t (Theorem 1.18). Thus the expression in (2.6) must also equal R_λ f̄(Z_{T-}), since both determine the probabilities of Z_{T+t} given Z_{T-}. It follows by the previsible section theorem that for z ∈ U_A,

    P^z{R_λ f̄(X_{t-}) = R_λ f̄(Z_{t-}) for all t > 0} = 1 .

But by continuity of h(z) we have

    Z_{t-} = lim_{s→t-} Z_s = lim_{s→t-} h(X_s) = h(X_{t-}) .

Substituting for Z_{t-} in the above, we have shown

COROLLARY 2.12.
For any Borel H-packet H_A ∩ H_0, let

    D_A = {z ∈ U_A : for f ∈ C(H_A ∩ H_0)^+ with f̄ as in (2.2), R_λ f̄(z) = R_λ f̄(h(z))} .

Then D_A is Borel in (H_A ∩ H_0)^+, and for z ∈ U_A,

    P^z{X_{t-} ∈ D_A for all t > 0} = 1 .

Finally, the image h(D_A) ∩ H is a complete Borel packet in H containing H_A ∩ H_0.
PROOF.  Only the final assertion remains to be shown, since obviously D_A is Borel in (H_A ∩ H_0)^+. But since z is determined uniquely in (H_A ∩ H_0)^+ by {R_λ f̄(z)}, we see that h(z) is one-to-one on D_A. Hence h(D_A) is Borel in H₁, and h(D_A) ∩ H is Borel in H. Since for z ∈ h(D_A) ∩ H we have

    P^z{Z_t ∈ h(D_A) ∩ H and h(X_t) = Z_t for all t > 0} = 1 ,

the result is proved.

According to Corollary 2.12, starting from any Borel H-packet H_A ∩ H_0, we can form the complete Borel packet h(D_A) ∩ H containing it, all of whose elements determine the same processes as corresponding initial distributions on D_A, and have the property that the process Z_t remains in H_A ∩ H_0 with left-limits in h(D_A) ∩ H for all t > 0. Thus it is quite natural to replace the process on H_A ∩ H_0 by the right process on h(D_A) ∩ H. Since h(z) is one-to-one on D_A, we can regard this process equivalently in either the Ray or the H-topology in so far as concerns its times of discontinuity. Thus, there is no need to make an elaborate "comparison of processes," as in [8, Chapter 13] for example. Instead, we can transcribe results for the Ray process directly into results for the H-process. To conclude the present section, let us illustrate this by transcribing Theorem (7.6) of [8, Chapter 7].

THEOREM 2.13.
For a Borel H-packet H_A ∩ H_0 (or more generally on h(D_A) ∩ H), let μ be a fixed initial distribution on H_A ∩ H_0, and let T be a Z_t^μ-stopping time, where Z_t^μ are the usual augmented σ-fields of Z_t for P^μ.

(i)  If Z_{T-} = Z_T on {0 < T < ∞}, P^μ-a.s., then T is Z_t^μ-previsible.

(ii)  Let B denote the set of Ray branch-points in (H_A ∩ H_0)^+. Then the totally inaccessible part of T is T on A and ∞ on Ω_Z − A, where

    A = {0 < T < ∞, X_{T-} ∈ (H_A ∩ H_0)^+ − B, X_{T-} ≠ X_T}
      = {0 < T < ∞, Z_{T-} ∈ H_0, Z_{T-} ≠ Z_T} ,    P^μ-a.s.
Both (i) and the first expression for
A
P -a.s.
in (ii) are taken
directly from [8]. It remains only to verify the second expression for Clearly if hence p
h(z)
z
G
D
a n d A
z φ B .
{Z
z
< )
e
H
= h(z)} = 0 .
z £ n
z
h
z
a n d
= < ) and
p Z
P {h(X Q ) = h(z)} = 0,
z e B,
^
0
A.
= z} = 1,
h(z) e H - H , then
Z
Hence
Then
t h e n
' o
Conversely, if
P { x = z } = 0 .
3.
h
and so
completing the proof.
3.  A VIEW TOWARD APPLICATIONS.

Since the object of the present work is not to study the prediction process per se but to develop it for applications to other processes, we conclude this essay with some general observations and partly heuristic discussion of the simplest types of examples. It may appear at present that by choosing different packets H_A one can obtain in the form Z_t practically any kind of r.c.l.l. strong-Markov process, but this is not quite true. A special feature of Z_t that is important in applications is the absence of "degenerate branch points." Here a degenerate branch point is one from which the left limit process jumps to a fixed point of the state space. But since we have a Borel transition function q(t,z,A) and the moderate Markov property, and q(0,h,{z}) = 1 if and only if z = h ∈ H_0, such deterministic jumps do not occur. This is again an expression of the fact that, by Corollary 2.12, Z_t is practically just
the Ray process of a right-process.

The same fact permits us to give criteria for Z_t to be a Hunt process, or for it to be a Z_t-previsible process.

THEOREM 3.1.  Let H_A be a complete Borel packet for Z_t (Definition 2.1, 3)). Then

a)  Z_t is a Hunt process on H_A relative to the usual σ-fields Z_t^μ for each initial distribution μ if and only if H_A ⊂ H_0 (i.e., H_A is an H_0-packet).

b)  Z_t is Z_t^μ-previsible if and only if it is continuous (P^μ-a.s. for all μ).

PROOF.  If H_A ⊂ H_0, then clearly Z_t is a right-process. The requirement that it be a Hunt process is then quasi-left-continuity. By decomposing any Z_t^μ-stopping time T into accessible and totally inaccessible parts ([4, IV, 81]), one sees that for quasi-left-continuity it is necessary and sufficient that for any increasing sequence T_n of Z_t^μ-stopping times with lim T_n = T and P^μ{T_n < T} = 1, one have P^μ{Z_{T-} = Z_T} = P^μ{T < ∞}. But by the moderate Markov property,

(3.1)    P^μ(Z_T = Z_{T-}, T < ∞) = E^μ(P^μ(Z_T = Z_{T-} | Z_{T-}), T < ∞)
                                  = E^μ(q(0, Z_{T-}, {Z_{T-}}), T < ∞)
                                  = P^μ{T < ∞} ,

since q(0,z,{z}) = 1 on H_0 and H_A is complete. Finally, H_A ⊂ H_0 is necessary even for a right-process, so the converse is obvious. The statement a) is proved for the Ray topology in [8, (13.2), (i) and (iv)]. Thus it is another way of ensuring that X_t is a Hunt process in the Ray topology, as remarked in [ibid, (13.3)]. However, by considering H_A as a subset of h(D_A) ∩ H from Corollary 2.12, we see that for μ concentrated on H_A the process Z_t is quasi-left-continuous if and only if it is quasi-left-continuous in the Ray topology. Hence the result carries over.

Turning to b), continuity implies previsibility so we need only prove the converse. Then if Z_t is Z_t^μ-previsible, both Z_t and Z_{t-} are Z_t^μ-previsible processes, and to prove that Z_t is continuous we need only prove them indistinguishable. By the previsible section theorem, it is enough to show that for previsible T, P^μ{Z_{T-} = Z_T} = 1 (as usual, Z_{T-} = Z_0 on {T = 0}, and we may replace a general previsible T by T ∧ N, N → ∞). By the moderate Markov property we have P^μ{Z_T = Z_{T-}} = E^μ q(0, Z_{T-}, {Z_{T-}}), hence we must show that P^μ{Z_{T-} ∈ H − H_0} = 0. Since T is previsible it is known [4, IV, 57] that Z_T is Z_{T-}^μ-measurable. Since there are no degenerate branching points, we must have Z_{T-} ∈ H_0, as required.

To give a feeling for the applications, we will consider briefly three situations:

a)  X_t is a Markov process,
b)  Z_t is a Markov chain,
c)  (X_t, (w_{2n}(t))) is a Markov additive process.

It is to be noted that b) is a condition on Z_t, while a) and c) are conditions on X_t. Thus our examples illustrate the point that in the combined study of X_t and Z_t neither is necessarily the first to be considered. One may start either with a known process or a known prediction process. To be sure, one does not ordinarily make assumptions on both X_t and Z_t, since each determines the other uniquely.
To study the case of Markovian X_t, if we are not interested in any "hidden information" we can assume for convenience that P{w_{2n-1}(t) = 0, t ≥ 0} = 1, 1 ≤ n, and drop the coordinates w_{2n-1} from our notation. To relate the Markov properties of X_t and Z_t, since Z_t has the role of a conditional distribution relative to F_t, we must assume that X_t is Markov relative to F_t. Alternatively, we could equivalently use F°_{t+}, but we cannot use F°_t in general since X_t might not be F°_t-measurable.* Let k ∈ H denote the probability of X_t, and let H_k be a Borel prediction packet for X_t as, for example, in Theorem 2.1 and the discussion following its proof. As noted in Definition 2.1 2), (φ(Z_t), Z_t) is P^k-equivalent to (X_t, Z_t), and it suffices to look at the former pair. It is not hard to see how the Markov property of X_t translates into an instantaneous property of Z_t. In the first place, in view of Theorem 1.9 and the Remark following, Z_t° is P^k-equivalent to the σ-field χ_t, where χ_t = σ{φ(Z_s), s ≤ t}. Hence, the Markov property of X_t is equivalent to the conditional independence (for P^k) of Z_t° and σ{φ(Z_{t+s}), 0 ≤ s} given φ(Z_t). But Z_t is also defined as a conditional probability over the latter σ-field given Z_t°, namely

    Z_t(S) = P^k((θ_t^+)^{-1}{φ(Z_·) ∈ S} | Z_t°) ,    P^k-a.s.,    S ∈ F° .

Since F° is countably generated, it follows that Z_t is determined by φ(Z_t) (the details of this transparent reasoning are given in [10, Theorem 2.2] and fortunately need not be repeated here). It follows that there is a P^k-null set N_t and a B/H-measurable ψ_t such that Z_t = ψ_t(φ(Z_t)) for w ∉ N_t. Conversely, if such N_t and ψ_t exist, then plainly X_t was Markov at time t relative to F°_{t+}. The function ψ_t plays the role of transition function for X_t, by assigning to it the conditional future ψ_t(X_t). If ψ_t may be chosen free of t, then by definition X_t is homogeneous in time.

*A variety of analytic conditions making X_t Markov relative to F_t is given in H. J. Engelbert [7]. If X_t is only Markov relative to F°_t, then it is still germ-Markov relative to F°_{t+} in the sense of [10], and may be approached by the method developed there under suitable conditions. From the standpoint of Ω₁ (as in [9]) F_t coincides with F_{t-}, and the distinction becomes meaningless.
Perhaps the most noteworthy fact here is that even if X_t is neither homogeneous nor strong-Markov, the process ψ_t(X_t) (i.e., a standard modification of Z_t) has both properties, with transition function q. Thus any such irregularities of X_t are due to ψ_t and N_t, not to Z_t. This provides a ready method of investigating transition functions of Markov processes which, as mentioned already, is the subject of the third essay.

At present, one may gain further insight by comparing this method to another one: that of the "space-time" process. It is a familiar fact that any Markov process becomes homogeneous in time if we replace X_t by (t, X_t), which means that one considers the process X_{s+t}, t ≥ 0, conditional upon X_s = x, so that an initial value is a pair (s,x), but with the added coordinate s + t, so that no value of the pair can recur. While this device is very useful in particular cases, such as in studying the heat generator (∂/∂t − ½ ∂²/∂x²), it has also been used occasionally in a general role (E. B. Dynkin, [6, 4.6]). Contrary to first impressions, the method of the prediction process apparently is quite unrelated to this as a method of "making a Markov process homogeneous." Not only are the respective topologies quite different (assuming the product topology for the space-time process), but more importantly the prediction process can repeat values, and hence may be simpler. For example, a particle confined to the unit circle 0 ≤ θ < 2π and moving with velocity v(t) = t − [t] (a saw-tooth function) has prediction process with states corresponding to pairs (v,θ), 0 ≤ v < 1, while its space-time process has states (t,θ), 0 ≤ t < ∞. In general, if X_t happens to be a time-homogeneous Markov process then it is usually equivalent to its prediction process, while (t, X_t) may be somewhat artificial and intractable.
somewhat artificial and intractable. Taking up our second illustration, since
Z
is always a homogeneous
Markov process it is natural to ask under what conditions it is a process of some special type.
For instance, if
Z
is a pure jump process, i.e., a
sum of finitely many jumps with exponentially distributed waiting times for the next jumps given the past, then property.
But unlike
Z ,
X
X
and suppose also that
obviously has the same
need not be a Markov process.
To indicate the possibilities for 1 < n,
= φ(Z )
w
X ,
(t) = 0
regarded as the real-valued process
we again take for
2 < n,
w
so that
(t) = 0, X
may be
w (t) . To construct a process X £ t having a pure-jump prediction process (apart from the case of Markovian X ) one can begin with any family K (x_,...,x t ,...,t (dx _xdλ )), n 1_ n 1 n n+1 n+1 1 < n of probability kernels over R * [ε,°°) , for fixed ε > 0, and
ESSAYS ON THE PREDICTION PROCESS
x k e R, t on
> 0,
R x [ε,°°),
variable with
1 < k < n . define
P{e
X
Letting
= x..
(x ,λ )
for
have any initial distribution
0 < t < e
where
> t} = exp(-λ t ) , independent of
Proceeding by induction, suppose that
39
x_,...,x
e x
and
is a random given
λ
e, ,...,e
In
.
have been
In
determined, and that X has been defined for 0 < t < Σ£ e . Then we select a pair (x _,λ ) distributed according to the kernel K n+i n+i n with
t
= 0,
U
t
JC
= Σ
e.,
ϊ= 1
and
~j
x
= X
JC
definition is completed by setting
t,
X
e
Σ
oo
n=l
1 < k < n . ""*""*
for
Σ*J
n-tΊ
< t < Σn+
e
k—1
The inductive
k
e
k^l
, k
is a random variable conditionally independent of
{χ_,...,x _, e_,.. ,e } 1 n+1 1 n On the
,
= x t
where
_
k-1
P-null
e < t . n -
given
λ
,
and
Pie
n+1
_ > t} n+1
CO
set where
Σ e < °° we define n=l n
It is evident that such
X t
X^ = 0 t
for
has a pure-jump prediction J ^ ^ ^
process, and it is plausible that any pure-jump prediction process
Z
all of whose expected waiting times exceed
is
obtained in this way (if
φ(Z )
is a.s.
ε 0
with probability
1
except for the first
coordinate). In this construction, even if distinct values,
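The inductive construction can be sketched in a few lines. The particular kernel `K` below is a hypothetical stand-in of our own choosing, used only to make the sketch runnable; the expected waiting times are kept at least EPS, so that the jump times cannot accumulate and the loop terminates.

```python
import random

EPS = 0.5  # expected waiting times stay >= EPS (rates are capped at 1/EPS)

def simulate_pure_jump(K, x1, lam1, horizon, rng):
    """Sample one path of X_t on [0, horizon].

    K(xs, ts, rng) -> (x_next, lam_next) plays the role of the kernels
    K_n(x_1,...,x_n; t_1,...,t_n; dx_{n+1} x dlam_{n+1}).  Returns the
    jump times t_k and the value held from each t_k onward.
    """
    xs, ts = [x1], [0.0]
    t = rng.expovariate(lam1)           # e_1, with P{e_1 > t} = exp(-lam_1 t)
    while t < horizon:
        x_next, lam_next = K(xs, ts, rng)
        xs.append(x_next)
        ts.append(t)
        t += rng.expovariate(lam_next)  # e_{n+1}, conditionally independent
    return ts, xs

def K(xs, ts, rng):
    # Hypothetical kernel, for illustration only: the next value is the
    # sign-flipped current value plus noise, and the next rate depends on
    # the proposed value, capped at 1/EPS.
    x_next = -xs[-1] + rng.gauss(0.0, 0.1)
    lam_next = min(1.0 / EPS, 0.2 + abs(x_next))
    return x_next, lam_next

rng = random.Random(7)
ts, xs = simulate_pure_jump(K, 1.0, 1.0, horizon=10.0, rng=rng)
assert ts[0] == 0.0 and len(ts) == len(xs)
assert all(a < b for a, b in zip(ts, ts[1:]))   # finitely many ordered jumps
```

Note how the sampled path depends on the whole past sequence (x_1,...,x_n; t_1,...,t_n) through `K`, which is exactly why X_t itself need not be Markov even though its prediction process is.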
Z
X
can assume only a finite number of
may have an uncountable state space since it
"predicts" the whole future sequence of
X -values.
it is easy to give sufficient conditions on the is even a finite Markov chain (other than
X
K
On the other hand, which imply that
Z
being itself one). Thus if,
for some fixed N and all n > N, K = K depends only on (x ., x ....,x ) while X n-N+1 n-N+2 ' n t moreover λ . is a fixed function λ (x H i 1 , x _,...,x ,x _) n+1 n+1 n-N+l n—N+2 n n+1 depending only on the
x v
possibilities for these chain. X
's x k
shown, then it is clear that the finitely many '
s
In particular, if the
imply that λ 's
Z
will be a finite Markov
reduce to a single constant
is a "generalized Poisson process based on an
λ,
then
N-dependent Markov
chain," in the evident sense of dependence on the past only through the last
N
that
Z
states visited.
Obviously, then, the possibilities for
X
such
is a pure jump process are quite great, and we do not pursue
them farther here. For a type of example which involves a non-Markovian which the unobserved data
(w
_ (t))
X ,
and in
are of basic importance, we
consider briefly the "Markov additive processes" (in the sense of E. Cinlar; see [3] and [153 for a vivid introduction and further references), 1 2 Roughly speaking, a standard Markov additive process is a pair (X , X )
40
FRANK B. KNIGHT
where X 1 is a standard process (in the sense of Blumenthal and Getoor) and fc 2 X is a real-valued process with conditionally independent increments 1 2 given X . In the applications X is observed, and one would like to t 1 make inferences about the underlying process X . For simplicity of 1 notation we assume that w (t) = 0 for n > 2, and that X. is realn t valued, so that we may identify the trap state Δ as °°, and let 1 2 1 1 X = w (t) , X = w (t) on Ω. Since X is Markovian, and given X the future increments of X are independent of F , it is to be P 1 o expected that the prediction process Z of (X , X^) is determined by 0
1
o
the value of X and the conditional distribution of X given If one is concerned only with X , it is simpler to treat 2 2 P X
- Xn
the form
as an additive functional, and consider S Π {x
in determining
= 0},
Z^
Se G
.
Z
Then the value z
if the values of
t
ί
χ
e
B
0
K
F
.
restricted to sets of X
becomes irrelevant
B ^ 5,
are known.
We can incorporate this change of view by redefining our translation operators appropriately.
We turn now to the necessary notation and
hypotheses. DEFINITION 3.2.
Let
Ω* = { ( w ^ t ) , w 2 (t) ) : w 2 (0) = 0
and
w ^ t ) j4 ± «>
for all t} . Further, let G* + = {S Π Ω* : S e G° + } and F* = {S Π Ω* : S & F° } . Finally, let θ*((w.,w.)(s)) = c t + λ z * *υ * θ w ) = θ θ W fW n Ω (w χ (s),w 2 (s) - w 2 (0)) and i^2 0 t^ l 2^ ° * t ( HYPOTHESIS 3.3. Ω
A standard Markov additive process (w_1(t), w_2(t)) is a collection of probabilities P^x on G° (= ∨_t G°_t), x ∈ R, such that w_1(t) is a standard Markov process (we take Δ = +∞ as the terminal point), and

(i)  P^x{(w_1(t), w_2(t)) ∈ B_2} is B-measurable in x for B_2 ∈ B²,

(ii)  for G*_{t+}-optional T < ∞, one has

    P^x(θ*_T(w_1,w_2) ∈ B_2 | G*_{T+}) = P^{w_1(T)}((w_1,w_2) ∈ B_2) ,    B_2 ∈ B² .
We now introduce a notation for the process of conditional probabilities of w_1(t) given F_t, which is our main concern.

DEFINITION 3.3.  The filtering process of w_1(t) for initial distribution μ is the process F_t^μ(·):

    F_t^μ(B) = Z_t^μ{w_1(0) ∈ B} ,    B ∈ B ,

where for each initial distribution μ on R we let Z_t^μ denote the prediction process for P^μ = ∫ P^x μ(dx), with P^μ(Ω − Ω*) = 0. We remark that, of course, we have F_t^μ(B) = P^μ(w_1(t) ∈ B | F*_{t+}).
A remarkable result of M. Yor [15, Theorem 4] asserts that the F_t^μ(·) are themselves r.c.l.l. strong-Markov processes with a single Borel transition function. Here we will deduce this from the corresponding fact for the Z_t^μ. However, this does not quite give as nice a topology as [15] (see the remarks following the proof). For our proof, we need a further notation and lemma.

LEMMA 3.4.  For each initial μ on R and y ∈ R, we define a measure P_y^μ on (Ω, G°) by first setting, for S ∈ G°,

    P_y^x(S) = P^x θ*^{-1}(S_y) ,    where    S_y = S ∩ {w_2(0) = y} ,

and then P_y^μ = ∫ P_y^x μ(dx). Let H* = {P_y^μ : y ∈ R, all μ}. Then H* is a Borel prediction packet, and for each μ we have

(3.2)    P^μ{Z_t^μ = P_{w_2(t)}^{F_t^μ} for all t > 0} = 1 .

PROOF.  For S of the form S = {a < w_2(0) < b} ∩ (θ*_0)^{-1} S* with S* ∈ G*°, we have P_y^x(S) = I_{(a,b)}(y) P^x(S*). Let S_n be a countable sequence of such sets which generates G°. Since by (i) and (ii) P^x is a one-to-one Borel kernel of probabilities on G° with P^x{w_1(0) = x} = 1, we see that P_y^μ is also one-to-one and Borel in (μ,y). Then it follows that the sequential range {(P_y^μ(S_n)) : y ∈ R, μ a probability on R} is a Borel set in ×_{n=1}^∞ [0,1], implying that H* is Borel. To prove that H* is a packet it suffices to show (3.2), since clearly P^μ = P_0^μ, and if (3.2) is true then

(3.3)    P_y^μ{Z_t = P_{y+w_2(t)}^{F_t} for all t > 0} = 1 ,    y ∈ R ,

by translation (we omit the superscript μ on Z_t). Since P_y^x is Borel in (x,y), it is clear that both sides of (3.2) are optional, hence it is enough to prove the equality at F_t^μ-optional T < ∞. Now by (ii) and the definition of Z^μ, we have

    Z_T^μ(S) = P_{w_2(T)}^{F_T^μ}(S) ,

as asserted.

By this lemma, we can introduce the filtering process as a function of the prediction process with state space H*, and derive its properties from the latter.

THEOREM 3.4.
The probability-valued process F_t^μ(B) = Z_t^μ{w_1(0) ∈ B}, B ∈ B, as a function of the prediction process Z_t^μ on H*, is a right-continuous, strong-Markov process for a suitable topology such that the space (M, M̄) of probabilities on B with its generated σ-field is a metrizable Lusin space. Accordingly, the same results are true for the processes F_t^μ.

PROOF.  For h = P_y^μ ∈ H*, set F^h(B) = μ(B), B ∈ B (this is not to be mistaken for F_t^μ, which has a subscript). Then for M ∈ M̄, we let A_M = {h ∈ H* : F^h ∈ M}. Clearly A_M ∈ H̄, and writing now P_y^μ for the probability of the canonical prediction process on H* with initial measure h = P_y^μ, we have

(3.4)    P_y^μ(F_{T+t} ∈ M | Z_T°) = q(t, Z_T, A_M) .

On the other hand, recalling the σ-fields χ_t generated by φ(Z_s), s ≤ t, we can transfer (3.3) to the canonical space and rewrite (3.4) in the form

(3.5)    P_y^μ(F_{T+t} ∈ M | χ_{T+}) = P^{F_T}{F_t ∈ M} = q(t, P_0^{F_T}, A_M) ,

where we used (3.3) with F_T in place of μ, along with the fact that in distribution F_t does not depend on y for initial probabilities of the form P_y^μ ∈ H*. Accordingly, we may define a transition function q* for F_t by q*(t,μ,M) = q(t, P_0^μ, A_M), and (3.5) becomes

(3.6)    P_y^μ(F_{T+t} ∈ M | Z_T°) = q*(t, F_T, M) .

Since P_y^μ was shown to be Borel in (μ,y) and one-to-one in μ, it is not hard to see that q* is a Borel transition function on (M, M̄). Finally, the topology on M referred to in the theorem is just that induced by the mapping μ → P_0^μ and the topology of H*, since it is easily seen that right-continuity of Z_t implies right-continuity of P_{y+w_2(t)}^{F_t} in (3.3) (from the right-continuity of w_2(t)). Thus Theorem 3.4 is proved.

DISCUSSION.  It follows directly from the (known) fact that the optional projections of the r.c.l.l. processes f(w_1(t)), f ∈ C(R), are again r.c.l.l. P^μ-a.s. ([5, Chapter 2, Theorem 20]), that F_t^μ is even r.c.l.l. in the usual weak-* topology. This, together with further applications, is found in [15]. From an applied viewpoint, it is only the processes F_{t-}^μ(B) = Z_{t-}^μ{w_1(0) ∈ B}, B ∈ B, which are realistic, since only they do not depend on the future element of F*_{t+}. Further, with the usual convention that F_{0-}^μ is degenerate, one has P^μ{F_{0-}^μ = μ} = 1, unlike F_0^μ. Using the fact that P^μ{w_1(T-) = w_1(T)} = 1 at previsible T < ξ, however, it is clear that F_{t-}^μ has no previsible discontinuities except perhaps at the lifetime ξ of w_1(t). Hence, the moderate Markov property of F_{t-}^μ follows from the Markov property of F_t^μ.
A final remark seems merited concerning the Definition 2.1 of the prediction space Ω_Z. According to [4, IV, 19], Ω_Z is a coanalytic subset of the space of all r.c.l.l. paths with values in H, and this space is a measurable Lusin space. The question naturally arises of whether, by restricting this space to the r.c.l.l. paths in some stronger topology, one might preserve its function of representing the processes Z_t and yet improve some other properties. A natural candidate is then the Skorokhod topology of measures on Ω. However, as shown by D. Aldous (unpublished), one does not have P^z{Z_t is r.c.l.l. in the Skorokhod topology} = 1. The difficulty is that the Skorokhod left-limits do not exist unless X_t is P^z-quasi-left-continuous. Hence the topology of H seems to be the most reasonable alternative.
REFERENCES

1.  Blumenthal, R. M. and Getoor, R. K. Markov Processes and Potential Theory. Academic Press, New York, 1968.
2.  Chung, K. L. and Walsh, J. B. "To reverse a Markov process," Acta Math. 123, 1970, 225-251.
3.  Çinlar, E. Markov additive processes and semi-regeneration. Proc. Fifth Conf. on Probability Theory (Brasov), Acad. R.S.R., Bucharest.
4.  Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chap. I à IV. Hermann, Paris, 1975. Chap. V-VII (to appear).
5.  Dellacherie, C. Capacités et Processus Stochastiques. Springer-Verlag, Berlin, 1972.
6.  Dynkin, E. B. Theory of Markov Processes. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1961.
7.  Engelbert, H. J. "Markov processes in general state spaces" (Part II), Math. Nachr. 82, 1978, 191-203.
8.  Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. No. 440. Springer-Verlag, New York, 1975.
9.  Knight, F. B. "A predictive view of continuous time processes," The Annals of Probability 3, 1975, 573-596.
10.  Knight, F. B. "Prediction processes and an autonomous germ-Markov property," The Annals of Probability 7, 1979, 385-405.
11.  Kunita, H. and Watanabe, T. Some theorems concerning resolvents over locally compact spaces. Proceedings of the Fifth Berkeley Symposium on Math. Stat. and Prob., Vol. II, Part 2. University of California Press, 1966, 131-163.
12.  Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Prob. X, Université de Strasbourg, 1976, 86-104.
13.  Meyer, P.-A. and Yor, M. Sur la théorie de la prédiction, et le problème de décomposition des tribus F°_{t+}. Séminaire de Prob. X, Université de Strasbourg, 1976, 104-117.
14.  Séminaire de Probabilités I-XII. Université de Strasbourg. Lecture Notes in Math. 39, 51, 88, 124, 191, 258, 321, 381, 465, 511, 581, 649.
15.  Yor, M. Sur les théories du filtrage et de la prédiction. Séminaire de Prob. XI, Université de Strasbourg, 257-297.
ESSAY II. CONTINUATION OF AN EXAMPLE OF C. DELLACHERIE

1. THE PROCESS R_t.

We consider a single occurrence in continuous time which happens at an instant T_* > 0, which may be random. For example, T_* may be the failure time of some mechanical apparatus. Analytically, the entire situation is described simply by the distribution function F(x) = P{T_* ≤ x}. We restrict F only by F(0-) = 0 and F(∞) ≤ 1, and we define T_* = ∞ where T_* is not finite, so that P{T_* = ∞} = 1 - F(∞). Without risk of confusion, we speak of the "occurrence of T_*," thus identifying the event with its instant.

From the viewpoint of an observer waiting for T_* to occur, the situation presents itself not as a distribution function but as a stochastic process, and as such it provides a basic example of general methods. Thus we associate with T_* the process

(1.0)   R_t = I_{[T_*, ∞]}(t),   -∞ < t < ∞,

where I denotes the usual indicator function.
This process was studied by C. Dellacherie (1972), and by C. S. Chou and P. A. Meyer (1975). The closely related process T_* ∧ t was also studied briefly by C. Dellacherie and P. A. Meyer (1975), who corrected some errors in [4]. Since we require some preliminary results from [4], we use that formulation in large part. However, our purpose is to study R_t in terms of its prediction process, as defined in F. B. Knight (1975) and P. A. Meyer (1976). This dictates that {T_* = 0} and {T_* = ∞} be permitted to have positive probability, which in turn makes it useful to set R_t = 0 for -∞ < t < 0. Thus we introduce the probability space (Ω, F°, P), where Ω = [0,∞], F° is the Borel σ-field, and P(dx) = F(dx), and we define T_*(x) = x and R_t(x) = I_{[x,∞]}(t), -∞ < t < ∞. Then the σ-field F°_t generated by R_s, s ≤ t, is {∅, Ω} for t < 0, and is that generated by the atom (t,∞] and the Borel sets of [0,t] for t ≥ 0.
As an example of the "general theory of processes," R_t was replaced in [4] by the supermartingale X_t = E(R_∞ - R_t | F_t), which was even a potential, since P{T_* < ∞} = 1 was assumed. In the present case, the argument of [4, Chap. 5, T56] transfers with no substantial change to provide the Doob-Meyer decomposition of R_t. We need the usual augmented σ-fields F_t (= F_{t+}) generated by F°_{t+} and all P-null sets in the completion F of F°, where for any adapted family of σ-fields G_t we set G_{t+} = ∩_{s>t} G_s and G_{t-} = ∨_{s<t} G_s.

THEOREM 1.1. The unique* F_t-previsible increasing process R~_t such that R_t - R~_t is a martingale is given by

   R~_t = 0   for  -∞ < t < 0;
   R~_t = ∫_{0-}^{T_* ∧ t} (1 - F(u-))^{-1} dF(u)   for  0 ≤ t < ∞;
   R~_∞ = lim_{t→∞} R~_t   on  {T_* = ∞}.

REMARK. In the present case R~_0 = F(0) = P{R_0 = 1}.

* Uniqueness means unique up to a fixed P-null set.
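The martingale property in Theorem 1.1 can be checked directly in the purely atomic case, where the integral is a finite sum. The following sketch (our own naming; the example distribution is ours, not the author's) verifies that E[R~_t] = E[R_t] = F(t) by exact enumeration over the atoms of T_*:

```python
# Numerical check of Theorem 1.1 for a purely atomic F with atoms (s_k, p_k).
# The compensator is R~_t = sum over atoms s_k <= min(T*, t) of
# dF(s_k) / (1 - F(s_k-)), and the martingale property of R_t - R~_t
# forces E[R~_t] = E[R_t] = F(t).

atoms = [(1.0, 0.2), (2.0, 0.3), (3.0, 0.4)]   # F(inf) = 0.9 <= 1 is allowed

def F(t):
    """Distribution function F(t) = P{T* <= t}."""
    return sum(p for s, p in atoms if s <= t)

def compensator(t_star, t):
    """R~_t for the realization T* = t_star."""
    total = 0.0
    for s, p in atoms:
        if s <= min(t_star, t):
            F_minus = sum(q for r, q in atoms if r < s)   # F(s-)
            total += p / (1.0 - F_minus)
    return total

def expected_compensator(t):
    """E[R~_t]: average over the atoms of T*, plus the defect mass at infinity."""
    mass_inf = 1.0 - F(float('inf'))
    e = mass_inf * compensator(float('inf'), t)
    for s, p in atoms:
        e += p * compensator(s, t)
    return e

for t in (0.5, 1.0, 2.5, 10.0):
    assert abs(expected_compensator(t) - F(t)) < 1e-12
```

The identity holds because P{T_* ≥ s_k} = 1 - F(s_k-) exactly cancels the denominator in each term of the sum.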
In the present note we will go one step farther, and study R_t as an example in the theory of Markov processes (as well as of martingales). Indeed, a general feature of the prediction process construction is that it permits any process to be viewed as a homogeneous Markov process, more specifically, as a right process in the sense of P. A. Meyer having still additional structure. It may be said here that R_t provides a more or less prototypical example of the prediction process of a positive pure-jump submartingale. The behavior of this prediction process depends, in turn, on the classification of the stopping times of F_t, which accordingly is our next concern. However, the reader may prefer to skip this rather technical discussion, and go directly to Section 2 where the results are applied. The connections with Essay I are postponed until the end of the present essay, for reasons stated there.

We recall that a stopping time T is "totally inaccessible" if for every increasing sequence of stopping times T_n with T_n < T one has P{lim_{n→∞} T_n = T < ∞} = 0, and "previsible" if P{T = 0} = 0 or 1, and if, when P{T = 0} = 0, there exist T_n with 1 = P{T_n < T} = P{lim_{n→∞} T_n = T}. For the remaining concepts in our classification, as well as its existence and uniqueness, we refer to [5, Chap. IV, Theorem 81]. According to the basic representation theorem of our particular situation ([4, III, T53]), a random time T is an F_t-stopping time if and only if for some s ≤ ∞,

(1.1)   P{{T_* < s ∧ T} ∪ {T_* ≥ s = T}} = 1.

We note that s is unique unless P{T_* > T} = 0, and then we may choose s = ∞. The classification of stopping times depends on:
THEOREM 1.2. The accessible part of a stopping time T is given by

(1.2)   A = {T > T_*} ∪ {T = s < T_*} ∪ ⋃_{s_k < s} {T = T_* = s_k},

where s corresponds to T as in (1.1) and the s_k enumerate the values with P{T_* = s_k} > 0; T is totally inaccessible on A^c.

REMARK. It is easy to see that this set is unique up to a P-null set even if s is not unique.
PROOF. We have {T = 0} = {T = 0 = T_*} ∪ {T = 0 < T_*}, hence if P{T = 0} > 0 then either 0 is an s_k or s = 0. In either case {T = 0} is in (1.2), as it should be. Now let T_n be any nondecreasing sequence of stopping times, and let T_∞ = lim_n T_n. If we assume that P{T_n < T_∞} = 1 for all n (thus T_∞ is previsible), and let s_n correspond to T_n as in (1.1) with s_n = ∞ whenever possible, then we see that lim_n s_n = s_∞ exists, and satisfies (1.1) for T_∞. Then we have {T_* < s_∞} ⊂ {T_* ≤ T_∞} up to a P-null set, and therefore

(1.3)   P{{T_* < s_∞ ∧ T_∞} ∪ {s_∞ = T_* ∧ T_∞}} = 1.

Conversely, if a stopping time T satisfies (1.3) for some s_∞ and P{T > 0} = 1, then we can construct a sequence T_n with P{T_n < T} = 1, as follows. If s_∞ = ∞ then 1 = P{{T_* < T} ∪ {T_* = T = ∞}}, and writing T = f(T_*) on Ω we can define T_n = f_n(T_*), where the f_n are any measurable functions with f_n(∞) = n and, for x < ∞, x ≤ f_n(x) < f_{n+1}(x) < f(x) and lim_n f_n(x) = f(x). If 0 < s_∞ < ∞, then we define, for n^{-1} < s_∞,

   T_n = f_n(T_*)   on  {T_* ≤ s_∞ - n^{-1}} ∪ {T_* = s_∞ < T},
   T_n = s_∞ - n^{-1}   elsewhere,

and observe that T_n satisfies (1.1) with s = s_∞ - n^{-1}. Finally, if s_∞ = 0 then P{T_* = 0} = 1 and T is equivalent to a positive constant. It follows that (1.3) characterizes the previsible stopping times T with P{T > 0} = 1.

Next we observe that, for constant c, any T is accessible on a set of the form {T = c}, hence on {T = s} ∪ ⋃_{s_k < s} {T = T_* = s_k}. It remains to show that the rest of the accessible part is given by {T > T_*}. That this is contained in the accessible part follows by writing T_n = f_n(T_*) as in the preceding paragraph. On the other hand, by (1.1) we have {T ≤ T_*} = {T = T_*} ∪ {T_* > s = T} up to a P-null set, hence the only part of {T = T_*} not already found accessible is {T = T_* ≠ s_k for all k}. To see that this last is not accessible, note that for any previsible stopping time T_∞ > 0, (1.3) implies that the set {T = T_* = T_∞} is contained in {T = T_* = T_∞ = s_∞} up to a P-null set, where s_∞ corresponds to T_∞ as in (1.3). Therefore, only sets {T = T_* = s_∞} of positive probability can be in the accessible part, and the proof is complete.
COROLLARY 1.3. A stopping time T is: a) totally inaccessible if and only if P{T > T_*} = 0 and P{s = T ≤ T_*} = 0 for 0 < s < ∞; b) previsible if and only if P{T = 0} = 0 or 1 and, for some s, P{{T_* < s ∧ T} ∪ {s = T_* ∧ T}} = 1.

PROOF. Part b) is just (1.3), so we need only prove a). The condition is obviously sufficient by Theorem 1.2. On the other hand, if P{s = T ≤ T_*} > 0 for some s, then either P{T = s < T_*} > 0, where s corresponds to T as in (1.1), or else P{T = T_* = s} > 0, so that s is one of the s_k. In either case, T is partially accessible.

COROLLARY 1.4. If P{T_* = s} = 0 for all s < ∞, then T_* is totally inaccessible, and a stopping time T is previsible if and only if P{T = T_*} = 0.

REMARK. It is known from [4, Chap. III, T51] that absence of times of discontinuity is equivalent to the previsibility of all T whose accessible part is Ω (up to a P-null set). Furthermore, the necessary and sufficient condition that F_t be free of times of discontinuity is that, for all s > 0, P{T_* > s} > 0 implies P{T_* = s} = 0.

PROOF. The first assertion is immediate from Theorem 1.2. For the second, assume P{T = T_*} = 0, and let s correspond to T as in (1.1). Since P{T_* = s} = 0, we have P{T_* = s ∧ T} = 0, hence T satisfies Corollary 1.3 b). Conversely, if P{T = T_*} > 0, then T is inaccessible on this set, hence not previsible.

It remains to prove the last assertion. Assume that the condition holds; i.e., that the distribution of T_* has no atoms except perhaps its maximal value, and suppose that the accessible part of T is Ω. Let s correspond to T as in (1.1). If P{T_* = s} > 0, then by Theorem 1.2 we have 1 = P{{T > T_*} ∪ {T = T_* = s}}, and since the condition implies T_* ≤ s a.s., T is previsible by Corollary 1.3 b). If, on the other hand, P{T = T_* = s_k} > 0 for some s_k, we see from P{T_* > s_k} = 0 and (1.1) that s ≤ s_k, hence s_k may replace s in (1.1). Thus either we have the former case, or P{T = T_* = s} = 0. Then, since T > T_* implies T_* < s except on a P-null set, 1 = P{{T > T_*} ∪ {T = s < T_*}}, and T is again previsible by Corollary 1.3 b). Thus (see the Remark) F_t is free of times of discontinuity. The converse is obvious, since P{T_* > s} > 0 and P{T_* = s} > 0 together yield a stopping time whose accessible part is Ω but which is not previsible.

2. THE PREDICTION PROCESS OF R_t.
We turn now to the construction of the prediction process of R_t, which we will denote by Z_t. According to its definition, the values of Z_t are the conditional probability distributions of (R_{t+s}, s ≥ 0) given F_t (we recall that F_t = F_{t+}). Clearly such distributions can be specified by the conditional distribution of T_* - t given F_t, whence they have the same form as F. Thus, writing Z_t(x) = Z_t(x,w) for the corresponding distribution function, we have

   Z_t(x) = (F(t + x) - F(t))/(1 - F(t))   if  t < T_*  and  F(t) < 1,

while Z_t(0) = 1 if t ≥ T_* or F(t) = 1. The left-limit process Z_{t-}, in a suitable topology to be specified, is given by

   Z_{t-}(x) = (F(t + x) - F(t-))/(1 - F(t-))   if  t ≤ T_*  and  F(t-) < 1,

and Z_{t-}(0) = 1 otherwise.

The prediction process may be used to best advantage only by introducing it as a Markov process in its own right, instead of confining it to the probability space Ω of R_t (this represents a partial shift of the author's views from those expressed in [9]). This is because there are technical difficulties in carrying out the theory of additive functionals of the prediction process if it is defined on the original probability space (as noted by R. K. Getoor (1978)). On the other hand, once we free ourselves from this restriction, the theory becomes comparatively straightforward. Furthermore, in a sense to be made precise, nothing concerning the process R_t is lost in the transition. Therefore, we introduce formally both a new state space and a new probability space.

DEFINITION 2.1. The prediction state space of R_t is the space (E_Z, E_Z), where

   E_Z = {(F(t + ·) - F(t))/(1 - F(t)), -∞ < t < ∞: F(t) ≠ 1}
       ∪ {(F(t + ·) - F(t-))/(1 - F(t-)), -∞ < t < ∞: F(t-) ≠ 1}
       ∪ {F_{-∞}, F_{+∞}},

with F_{-∞}(x) ≡ 0, F_{+∞}(x) ≡ 1, and E_Z is the σ-field generated by the functions G(x), 0 ≤ x < ∞, as G varies on E_Z. We denote elements of E_Z of the first two types by F_t and F_{t-} respectively (although, with this notation, they are not necessarily distinct). We let E°_Z denote {F_{-∞}, F_{+∞}, F_t, -∞ < t < ∞}.
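As a quick sanity check on the conditional-law formula Z_t(x) = (F(t+x) - F(t))/(1 - F(t)), note that the exponential distribution is exactly the case in which the state never moves before the jump: Z_t = F for every t < T_*. A small numerical sketch (function names are ours, not the author's):

```python
import math

def Z(F, t, x):
    """Conditional law of T* - t given T* > t, i.e. the value Z_t(x)."""
    return (F(t + x) - F(t)) / (1.0 - F(t))

# Exponential F is memoryless, so Z_t should coincide with F for all t.
lam = 0.7
F_exp = lambda u: 1.0 - math.exp(-lam * u)

for t in (0.0, 0.5, 3.0):
    for x in (0.1, 1.0, 5.0):
        assert abs(Z(F_exp, t, x) - F_exp(x)) < 1e-12
```

For any other F the map t ↦ Z_t genuinely moves through the first component of E_Z until the jump occurs.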
In the present very specialized situation, it is natural to introduce in E_Z the topology of weak convergence of measures on Ω, when Ω is considered as a subset of the space D with the Skorokhod J_1-topology (Billingsley, [2], Chapter 3). Specifically, to each x ∈ Ω we associate the element of D given by f_x(s) = R_t(x) with s = (1/2)(1 + (2/π) arctan t), -∞ ≤ t ≤ ∞. We note that f_x(s) = 0 for 0 ≤ s < 1/2, and that convergence in D of f_x is the same as convergence of x in the extended topology of [0,∞]. It therefore follows that the continuous functions on Ω in the D-topology are just C[0,∞], and weak convergence of probabilities on Ω becomes simply weak convergence of the corresponding distribution functions F on [0,∞]. In particular, we note that E_Z is a Borel set and that E_Z is the Borel σ-field generated by this (metrizable) topology on E_Z. Furthermore, since F_t is right-continuous for t < min{s: F(s) = 1}, with left limits F_{t-} for t > 0, it is clear that Z_t is right-continuous with left limits in this topology. In fact, the space E_Z is "almost" compact, the only limit points not necessarily included being those obtained as t → +∞. This set is trivial if either F(∞) < 1 or F(t) = 1 for some t < ∞, but in general it cannot be avoided.
We turn next to the prediction probability space for the process, using the same notation Z_t for the process on the new space.

DEFINITION 2.2. Let (Ω_Z, F_Z, Z_t) consist of
a) the space Ω_Z of all paths z(t), 0 ≤ t < ∞, with values in E_Z, which are right-continuous, with left limits for t > 0, in the topology of weak convergence,
b) the coordinate σ-field F_Z generated on Ω_Z by {z(t) ∈ A}, A ∈ E_Z, t ≥ 0,
c) the coordinate functions Z_t = Z_t(z) = z(t).

We observe that the original process on Ω, given by Z_t = F_t for 0 ≤ t < T_* and by F_{+∞} for t ≥ T_*, has its paths as points in Ω_Z. Hence we can define a probability P^F on (Ω_Z, F_Z) such that the joint distributions of Z(t) are the same as those of the above process on Ω. Furthermore, to every z ∈ E_Z we can associate in the same way a probability P^z on (Ω_Z, F_Z), by using z in the role of F as the distribution of T_*. Thus the points z ∈ E_Z correspond to probabilities for Z_t. If z = F_t for some t, -∞ ≤ t ≤ ∞, then P^z{Z_0 = z} = 1. However, if z = F_{t-} ≠ F_t, so that F(t) - F(t-) > 0, then P^z{Z_0 = F_t} = 1 - P^z{Z_0 = F_{+∞}} = 1 - (F(t) - F(t-))/(1 - F(t-)).

We are now in a position to view the family {P^z, z ∈ E_Z} as a Markov process on (Ω_Z, F_Z). The points z = F_{t-} ≠ F_t are the "branching points" of this process, in the terminology of Walsh and Meyer [13]. The transition function q(t,z,A) of the process is such that for each (t,z) the probability is concentrated on at most two points. Precisely, we have
DEFINITION 2.3. The transition function of Z_t is q(t,z,A), t ≥ 0, z ∈ E_Z, A ∈ E_Z, where
i) q(t, F_{+∞}, {F_{+∞}}) = 1, t ≥ 0 (and F_{-∞} is likewise absorbing);
ii) q(t, z, {F_{+∞}}) = 1 - q(t, z, {F_{s+t}}) = F_s(t) if z = F_s, 1 > F_s(t), and t > 0;
iii) q(t, z, {F_{+∞}}) = 1 - q(t, z, {F_{s+t}}) = F_{s-}(t) if z = F_{s-} ≠ F_s, 1 > F_{s-}(t), and t > 0;
iv) q(t, z, {F_{+∞}}) = 1 in cases ii) and iii) if F_s(t) = 1, resp. F_{s-}(t) = 1;
v) q(0, z, {F_{+∞}}) = 1 - q(0, z, {F_s}) = F_{s-}(0) in case iii).
It follows from the general theory of [9] and [11] (or can easily be seen directly) that (Ω_Z, F_Z, Z_t, P^z) becomes a right process on E_Z in the sense of P. A. Meyer, with transition function q, when we include the canonical translation operators θ_t and σ-fields F^z_t. Of course, both E_Z and q are Borel, so the general U-space set-up of Getoor [6] is unnecessary (this is quite generally true for the prediction process). Furthermore, the process has unique left limits Z_{t-} in E_Z, t > 0.

It is important to observe that probabilistically nothing is lost by considering (Z_t, P^F) in place of (R_t, P). Thus we introduce on E_Z the Borel function

(2.1)   φ(G) = G(0).

Then φ(Z_t) is P^F-equivalent to R_t in joint distribution, and is right-continuous with left limits. Hence it is a valid replacement for R_t. The σ-fields F°Z_t generated by Z_s, s ≤ t, are of course larger than those generated by φ(Z_s), s ≤ t. But the entire difference can be traced to the fact that φ(Z_0) does not determine Z_0. Thus for each initial point z the above two fields have the same P^z-completion, and hence Z_t and φ(Z_t) generate the same completed σ-fields F^z_t.
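Since q(t,z,·) charges at most two points, the Chapman-Kolmogorov identity for q reduces to a two-term sum: absorption by time s+t means absorption by s, or survival to F_{s+u} followed by absorption within a further t. A numerical sketch for the non-branching states of Definition 2.3 ii), with an F of our own choosing:

```python
import math

# A concrete F with an atom at 2 and exponential mass elsewhere (our choice).
def F(u):
    base = 0.6 * (1.0 - math.exp(-u)) if u >= 0 else 0.0
    return base + (0.3 if u >= 2 else 0.0)   # F(inf) = 0.9

def q_absorbed(r, t):
    """q(t, F_r, {F_inf}) = F_r(t) from Definition 2.3 ii)."""
    return (F(r + t) - F(r)) / (1.0 - F(r))

# Chapman-Kolmogorov reduced to two terms: absorbed by s, or survive to
# F_{r+s} and then get absorbed within a further t.
for r in (0.0, 1.0, 1.5):
    for s in (0.5, 1.0):
        for t in (0.25, 2.0):
            lhs = q_absorbed(r, s + t)
            rhs = q_absorbed(r, s) + (1.0 - q_absorbed(r, s)) * q_absorbed(r + s, t)
            assert abs(lhs - rhs) < 1e-12
```

Algebraically the middle terms telescope, so the identity holds for any distribution function F with F(r) < 1 on the range tested.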
One basic feature of the prediction process which gives insight into the given process is its times of discontinuity. The analogue of the jump time T_* on Ω is of course the stopping time

(2.2)   T_Z = inf{t: Z_t = F_{+∞}}.

However, this is not necessarily a time of discontinuity for Z_t under P^F, and by no means the only one. By Theorem 1.2 the accessible part of T_Z under P^F consists of ⋃_k {T_Z = s_k}, where the s_k enumerate the jump points of F. But while R_t is discontinuous at t = s_k with probability F(s_k) - F(s_k-), Z_t is discontinuous at t = s_k with probability 1 - F(s_k-) (= P^F{T_* ≥ s_k}), unless F(s_k) = 1, when it is continuous (since Z_{s_k} is then F^Z_{s_k-}-measurable). On the other hand, at the totally inaccessible part of T_Z (i.e. the part where F is continuous), Z_t like R_t has an inaccessible jump. It is clear that Z_t is continuous except at ⋃_k {s_k} ∪ {T_Z}; hence we have classified its discontinuities under P^F, and for other z ∈ E_Z the situation is analogous. Thus, the conclusion which roughly emerges is that Z_t has the same totally inaccessible jumps as R_t, but it has additional accessible jumps at times when R_t has a positive (but unrealized) potentiality for a jump.
jump. This distinction in the behavior of R and Z at the previsible s disappears when we replace R by the martingale R - R'* k t t t of Theorem 1.1. More generally, we introduce on Ω the previsible additive times
2
——————
functional
Λ * Λt (2.3)
A
=
/
z
'
(1 - G(u-))
Ί^
d G(u)
on
{Z Λ = G} ,
o Λ
G e E
o
(previsibility is clear since process
x
Λ t) .
A
The process
.
z
is a Borel function of the previsible φ(zt) - Ψ(ZQ) "
i s
n o w
s e e n
t
to be a
martingale additive functional of
Z
that
have the same times of discontinuity
φ(Z ) - φ(Z ) = A
for each
P
.
and
Z
.
z
More importantly, one easily checks
This is an expression of the general fact that a right-
continuous martingale has its times of discontinuity contained in those of its prediction process, as proved in F. Knight [10, Lemma 1.5]. However, the application is not direct because the prediction process of φ(Z t ) - A t
for fixed
space than
E^,
F(s) - F(s-) = 1
G = ZQ
has a different (and less convenient) state
and it cannot be identified with for some
s
then
z
φ( J
although continuous, is not constant.
- A
= 0
Z
. for
For example, if PF
while
Z ,
54
FRANK B. KNIGHT
We consider finally the Lévy system of Z_t, and its relevance to R_t - R~_t. By definition [1, Corollary 5.2] this is a pair (N,H), where N(x,dy) is a kernel on (E_Z, E_Z) with N(x,{x}) = 0, and H_t is a previsible additive functional, such that for 0 ≤ f(x,y) with f(z,z) = 0,

(2.4)   E^z( Σ_{0 < s ≤ t} f(Z_{s-}, Z_s) ) = E^z( ∫_0^t dH_s ∫_{E_Z} N(Z_{s-}, dy) f(Z_{s-}, y) ).

In the present case, although Z_t does not satisfy all the hypotheses of [1, Cor. 5.2], it is easy to specify such a system explicitly. One has only to take H_t = A^z_t from (2.3) and then define

(2.5)   N(x,dy) = q(0,x,dy)   for  x = F_{t-} ≠ F_t, -∞ < t < ∞;
        N(x,dy) = δ(F_{+∞})   otherwise,  x ≠ F_{+∞},

where δ(F_{+∞}) is the unit mass at F_{+∞} (we define N(F_{+∞}, ·) in any convenient way).

As a compensator for the discontinuities of Z_t, the Lévy system is here more relevant to R_t - R~_t than to R_t, for the reasons of the preceding paragraph. Thus we have an analogous "Lévy system" for R_t - R~_t in the form (N~, R~_t), where

(2.6)   N~(-R~_{s_j-}, {-R~_{s_j-} + 1}) = F_{s_j-}(0)   for  F(s_j) - F(s_j-) > 0,   and
        N~(x, B) = I_B(x + 1)   for all  x ∉ {-R~_{s_j-}}.

It is clear that (2.6) is obtained from (2.5) by just substituting the jumps of R_t - R~_t for those of (Z_t, P^F), which are disallowed as jump times of Z_t except at t = 0 and t = ∞. Since (2.6) has a role analogous to (2.4), but for the martingale R_t - R~_t instead of Z_t, it is natural to take it as the definition of a Lévy system for the martingale. Again, this is a very special case of a general existence theorem ([10, Theorem 1.3]).
3. CONNECTIONS WITH THE GENERAL PREDICTION PROCESS.

For the reader who is already familiar with Essay I, the present Section 2 is easily incorporated into that more general setting. However, it is somewhat more natural to treat all single-jump processes simultaneously, as realized by a single prediction process. This formalizes, so to speak, the essence of the underlying idea. It has been carried out by Professor John B. Walsh, who has consented to let us use the material that follows.

We take w(t) = w_J(t), with all other components discarded from the notation. Let Ω_J (J for jump) be the set of functions of the form w(x) = I_{[T,∞]}(x), 0 ≤ T ≤ ∞. Then Ω_J inherits from Ω the topology of pointwise convergence of the corresponding T. Hence it is compact. Let H_J be the set of all probability measures on Ω_J, with the weak-* topology. If we identify h ∈ H_J with the probability distribution it assigns to T, then convergence in H_J becomes weak convergence of distribution functions on [0,∞], and H_J is compact.

For h ∈ H_J (regarded as a measure on Ω vanishing outside Ω_J), the prediction process Z_t remains in H_J, and so does Z_{t-} for t > 0. Thus H_J is a complete Borel packet, in the sense of Essay I, Definition 2.1, 3). The transition function of Z_t on H_J is given above by Definition 2.3. The elements of H_J ∩ H_0, regarded as distributions of T, are just F_{+∞} and all F with F(0) = 0. Thus Z_t is a right process on H_J ∩ H_0. In fact, we have more in the present case.

PROPOSITION 3.1. Z_t is a Ray process on H_J.

PROOF. It is to be shown that ∫_0^∞ e^{-ut} q_t f dt ∈ C(H_J) if f ∈ C(H_J), where q_t f(h) = ∫ f(z) q(t,h,dz). As before, we let F(t) = h{T ≤ t}. Then we have
   ∫_0^∞ e^{-ut} q_t f(h) dt = ∫_0^∞ e^{-ut} [F(t) f(F_{+∞}) + (1 - F(t)) f(F^h_t)] dt,

where F^h_t denotes the conditional distribution function (F(t + ·) - F(t))/(1 - F(t)), and the last integrand is read as e^{-ut} f(F_{+∞}) when F(t) = 1. Now if h_n → h, with corresponding F_n → F, the first term on the right obviously converges to its limit with F_n in place of F. Also, if F(t) < 1, then (F_n(t + ·) - F_n(t))/(1 - F_n(t)) has at most two weak limit points as n → ∞: (F(t + ·) - F(t))/(1 - F(t)) and (F(t + ·) - F(t-))/(1 - F(t-)). Thus at continuity points t of F it converges to the same limit. Since f is bounded, it is easy to see that the contribution to the last integral for t ≥ inf{t: F(t) = 1} tends to 0 as n → ∞. Hence by dominated convergence, the last integrals also converge to their value at F, completing the proof.

REMARK. It follows immediately that Conjecture 2.10 of Essay I holds for H_J.
REFERENCES

1. Benveniste, A. and Jacod, J. "Systèmes de Lévy des processus de Markov," Inventiones Mathematicae, 21, 1973, 183-198.
2. Billingsley, P. Convergence of Probability Measures. John Wiley and Sons, Inc., New York, 1968.
3. Chou, C. S. and Meyer, P.-A. Sur la représentation des martingales comme intégrales stochastiques dans les processus ponctuels. Séminaire de Prob. IX, Univ. de Strasbourg, 226-236. Lecture Notes in Math. 465, Springer, Berlin, 1975.
4. Dellacherie, C. Capacités et Processus Stochastiques. Springer-Verlag, Berlin, 1972.
5. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapitres I à IV. Hermann, Paris, 1975.
6. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. 440, Springer, Berlin, 1975.
7. Getoor, R. K. Homogeneous potentials. Séminaire de Prob. XII, Univ. de Strasbourg, 398-410. Lecture Notes in Math. 649, Springer, Berlin, 1978.
8. Knight, F. B. "A predictive view of continuous time processes," The Annals of Probability, 3, 1975, 573-596.
9. Knight, F. B. On prediction processes. Proceedings of the Symposium in Pure Mathematics of the Amer. Math. Soc. XXXI, 79-85. Providence, R.I., 1976.
10. Knight, F. B. Essays on the prediction process. Essay IV.
11. Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Prob. X, Univ. de Strasbourg, 86-104. Lecture Notes in Math. 511, Springer, Berlin, 1976.
12. Meyer, P.-A. and Yor, M. Sur la théorie de la prédiction, et le problème de décomposition des tribus F°_{t+}. Séminaire de Prob. X, Univ. de Strasbourg, 104-117. Lecture Notes in Math. 511, Springer, Berlin, 1976.
13. Walsh, J. B. and Meyer, P.-A. "Quelques applications des résolvantes de Ray," Inventiones Mathematicae, 14, 1971, 143-166.
ESSAY III. CONSTRUCTION OF STATIONARY STRONG-MARKOV TRANSITION PROBABILITIES

Let X_t be a continuous parameter stochastic process on (Ω,F,P) with values in a metrizable Lusin space (E,E) (i.e., E is a Borel set in a compact metric space, and E is the Borel σ-field of E). In order just to state the property of X_t that it be a "time-homogeneous Markov process", it is necessary to introduce some form of conditional probability function to serve as transition function. From an axiomatic standpoint it is of course desirable to assume as little as possible about this function. An interesting and difficult problem is then to deduce from such assumptions the existence of a complete Markov transition probability p(t,x,B) which satisfies the Chapman-Kolmogorov identities

(1.1)   p(s+t,x,B) = ∫ p(s,x,dy) p(t,y,B),

thus giving rise to a family (P^x, x ∈ E) of Markovian probabilities for which

(1.2)   P^x(X_{s+t} ∈ B | σ(X_τ, τ ≤ s)) = P^{X_s}(X_t ∈ B).
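Identity (1.1) is concrete enough to verify numerically for any candidate kernel given in closed form. A sketch with a two-state Markov chain (our own example, not from the text), whose transition function is elementary:

```python
import math

# Two-state chain with jump rates a (0 -> 1) and b (1 -> 0); the closed-form
# transition probabilities below satisfy the Chapman-Kolmogorov identity (1.1).
a, b = 0.4, 1.1

def p(t, x, y):
    """p(t, x, {y}) for states x, y in {0, 1}."""
    pi1 = a / (a + b)                       # stationary mass of state 1
    decay = math.exp(-(a + b) * t)
    p_x1 = pi1 + ((1.0 if x == 1 else 0.0) - pi1) * decay
    return p_x1 if y == 1 else 1.0 - p_x1

# (1.1): p(s+t, x, B) = sum_y p(s, x, {y}) p(t, y, B)
for x in (0, 1):
    for B in (0, 1):
        for s in (0.3, 1.0):
            for t in (0.2, 2.5):
                lhs = p(s + t, x, B)
                rhs = sum(p(s, x, y) * p(t, y, B) for y in (0, 1))
                assert abs(lhs - rhs) < 1e-12
```

The difficulty addressed in this essay is, of course, not verifying (1.1) for a given p, but producing a p satisfying it from weaker conditional-probability hypotheses.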
The analogous time-inhomogeneous problem (of obtaining a p(s,x; s+t,B)) was treated by J. Karush (1961), and considerably later the present problem was taken up by J. Walsh [9]. It seems, however, that for the homogeneous case the solution remained complicated and conceptually difficult. Since the publication of these two works, a new tool has appeared on the scene which has an obvious bearing on the problem, namely, the "prediction process" of [5] and [8]. Accordingly, the present essay aims to show what can be done by using this method. But it is not simply a question of applying a new device. Our view is that the prediction process is fundamental to the problem, and the hypotheses which are needed to apply it give a basic understanding of the nature of the difficulties.* A suggested way of viewing the entire matter is as follows. The prediction process is in some sense the best approximation to X_t by a process which does have a stationary strong-Markov transition function.

* The hypotheses of Theorem 3 of [9] are ultimately consequences of ours (Corollary 1.9 below).
The problem is thus to formulate the conditions under which the prediction process becomes identifiable with X_t itself.

Two immediate requirements are that the paths of X_t be sufficiently regular, and that their probability space be sufficiently tractable, so that the assumed conditional probabilities may be identified P-almost surely for each t with the regular conditional probabilities which constitute the prediction process. We will make the following initial assumption (to be relaxed in Theorem 1.12).

ASSUMPTION 1.1. Let (Ω, θ_t, F°_t) denote the space of right-continuous E-valued paths w(t), t ≥ 0, with left limits for t > 0, and the usual translation operators and generated σ-fields. We assume the canonical representation X_t(w) = w(t).
We now introduce the two basic definitions with which we will be concerned.

DEFINITION 1.2. Let Q(x,S), x ∈ E, S ∈ F° (= ∨_t F°_t), be a probability kernel, i.e. a probability in S for each x and E-measurable in x for each S. A probability P on F° is called homogeneous Markov relative to Q and F°_{t+} (= ∩_{ε>0} F°_{t+ε}) if for each t ≥ 0 and S ∈ F°

(1.3)   P(θ_t^{-1} S | F°_{t+}) = Q(X_t, S)   P-a.s.

DEFINITION 1.3. The Chapman-Kolmogorov identities for Q(x,S) are

(1.4)   Q(x, θ_{s+t}^{-1}(S)) = ∫ Q(x, {X_s ∈ dy}) Q(y, θ_t^{-1} S),   x ∈ E, 0 ≤ s,t, S ∈ F°.
REMARKS. Since regular conditional probabilities exist over F°, the assumption of a Q as in Definition 1.2 is equivalent to assuming only a marginal conditional probability kernel Q_s(x,B), B ∈ E, for each s > 0. In fact, it is enough to have Q_s for rational s, since then ∫ Q_{s_1}(X_τ, dy) Q_{s_2}(y, B) = Q_{s_1 + s_2}(X_τ, B) except on a P-null set, for each τ. We can then use this identity, along with the fact that regular conditional probabilities assign probability one to the r.c.l.l. paths, to construct a Q satisfying (1.3). In fact, the measures generated by Q_s on the space of E-valued functions of rational s ≥ 0 must reduce, when X_τ is substituted as initial value, to the restriction to rational s of any regular conditional probability on the r.c.l.l. paths given F°_{τ+}. Hence they extend to measures on the r.c.l.l. paths, P-a.s. for every τ. The set of restrictions to rational s ≥ 0 of r.c.l.l. paths is a Borel set in the countable product space, so the condition that this set have probability 1 gives a Borel set of initial values. Outside this set, we may take Q(x,S) = I_S(w) with w the path w(·) ≡ x.
The most that follows from (1.3), however, is that (1.4) holds for all s, t and S ∈ F° except for x in a set E(τ,s) with P{X_τ ∈ E(τ,s)} = 0. In short, one can eliminate the dependence on S, since the paths are right-continuous and F° is countably generated. But we do not see how to eliminate dependence on s, much less on τ, without further assumptions.

Secondly, the reason for conditioning on F°_{t+} in Definition 1.2 is in one sense trivial: we could have used F°_t instead, but it is less convenient. However, the distinction between F°_t and F°_{t+} is "unobservable" for the prediction process (see, for example, the Remark following Theorem 1.9 of Essay I). So it is unrealistic to condition on F°_t except when it is shown (as following Theorem 1.12 below) that this is equivalent to F°_{t+}. The point here is that the prediction process is automatically a strong-Markov process relative to F°_{t+}, and our method dictates that the same will be true of X_t.

The problem is now to identify conditions (presumably verifiable in practice) under which, given a Q satisfying (1.3), there exists a Q*(x,S) satisfying both (1.3) and (1.4). To this end we first state the relevant properties of the prediction process of X_t, as obtained in [5] and [8], and Essay I. Let (H, H) be the set of probability measures on (Ω, F°), where H is the σ-field generated by the sets {z: z(S) < a}, S ∈ F°, a ∈ R (z(S) is another notation for the measure of S under z). Further, for each z ∈ H, let Z^z_t = Z^z_t(S,w), S ∈ F°, be the z-prediction process as obtained in [5], unique up to z-equivalence. Then Z^z_t(S) is an F^z_{t+}-optional process with state space (H, H), where F^z_t is the σ-field generated by F°_t and all z-null sets, and for each optional time T < ∞ and all S ∈ F°,

   P^z(θ_T^{-1} S | F^z_{T+}) = Z^z_T(S),   z-a.s.

(where P^z is another notation for z itself).

REMARK. In [5] the spaces Ω and H were "larger" than the present ones. But since Ω is here a Lusin space, it is easy to see that the probabilities of [5] must already equal one on the Borel image of this Ω in the space of [5], for all t, z-a.s. Hence we can assume the present
(H, H).

The second essential feature of the processes Z^z_t concerns their behavior as z varies. From Theorem 1.15 of Essay I we have:

THEOREM 1.4. There is a jointly Borel transition function q(t,y,A) on (H, H) such that for each z the process Z^z_t (with the probability z itself) is a homogeneous strong-Markov process relative to F^z_{t+}, with transition function q(t,y,A). In particular, q satisfies the Chapman-Kolmogorov identities (1.1).

An advantage of restricting to a space of right-continuous paths is that one can be quite explicit about the connection of Z_t and X_t.
Indeed we have a simple functional dependence. THEOREM 1.5.
There is an
H/E-measurable function
φ
such that for all
z « H Pz{χt . φ ( Z Z ) PROOF.
for all
It is convenient to introduce the set of non-branching points of
= {z e H: We have P {Z
Z
H
e H ,
G H,
z <= H
and
q(O,z,{z}) = 1} .
and by Proposition 2 of Meyer [8] for all
t > 0} = 1
(in fact, the distributions of
those of a right process on B <= E,
H
P {X
z(S)
is
that
E Z f(X Q )
B}
ψ
is
fixed
= φ(z)} = 1
H-measurable for
functions).
is
Z *
with transition function
(
""
B|F
o
= P {X we must have
z e H on
q) .
we have H
are
For
since
oι o
and
H-measurable for
Then we have
H-measurable on
H
o+)
e B}
X
is
f e b(E)
a.s.,
, Z
for some function
S e F°
φ(z)
on
x Q G E . Now for any
F
i B (x τ ) = P Z
.
Since
(the bounded
E-measurable
Z
e H,
φ(z) = x
-stopping time
H
F°-measurable, we see
{z: φ(z) e B} = {z: E I (X ) = 1 } B 0 . We set
on
H - H
T < «>
w e
so
for some have for
B e E
(x τ fe B | F Z + )
z = P
T
(XQ € B)
= I β (φ(Z Z )) , Z
z-a.s.
It follows easily that X = φ ( Z ) , z-a.s. Then, since both X and zx z " are F -optional processes, the optional section theorem of —
UT
[1, IV, 84] finishes the proof.
Z
P Z { Z Q Z = z} = 1}
H Q = {z e H:
Z
t > 0} = 1 .
Before proceeding, let us review our notations. P without superscript refers to the original process on Ω, and at the same time we have P ∈ H. P^z and E^z are simply z and its expectation, for z ∈ H, but we do not write P^z when z is that of P. Z^z_t is the prediction process of z; in particular, Z^P_t is that of P. We will need to use Q(x,S) in three distinct senses: first, as a probability kernel; second, as a mapping Q: E → H defined by Q(x) = Q(x,(·)); and third, as a set mapping defined by Q{x ∈ S} = {Q(x): x ∈ S}.

The essential requirement for using the processes Z^z_t to construct a transition function for X_t is that the mapping Q: E → H defined by the given kernel Q(x,S) should have a range Q(E) sufficiently large that P{Z^P_t ∈ Q(E), t ≥ 0} = 1. The most natural way to insure this is to introduce:

ASSUMPTION 1.6. Q is continuous for the given topology on E and some topology on H such that
i) H is the σ-field generated by the open sets, and
ii) Z^P_t is P-a.s. right continuous in t.

There are usually many different topologies generating H and making Z^P_t P-a.s. right-continuous. Perhaps the most natural one is the weak*-topology with respect to the topology of weak convergence on Ω. We postpone further discussion of Assumption 1.6 until after the construction of the transition function Q*(x,S) below.
H
LEMMA 1.7.
PROOF.
K
and
discussed
P{Q(x ) = Z Z^
denote
for all rational it follows that
{w:w(0) = x} .
t > 0, "*
Z^ e H Q
this implies
P{X
=
K
hence
K
e
H.
S (x)
φQ(X ) , t > 0} = 1,
Since
{x: Q(x,S(x)) = 1} Π
and on the above intersection we have
follows by [1, III, 21] that is the identity on
such that
P{Q(Xt,S(X )) = 1, t > 0} = 1 .
is one-to-one on this set, whose image under
REMARKS.
e H,
By right-continuity of
By Theorem 1.5 we have
Q ( X , S ( X ) ) = 1} Π H Q .
Q
K
P{Q(X.) = Z^} = 1, t t
r > 0} = 1 .
and since we have
{x: Q(x) e H } e E,
c H ,
P{Q(X t ) = Z*, t > 0} = 1 . Next, let
We set
K Q = Q{x:
K
is complete.
P{Z* £ K , t > 0} = 1 .
By (1.3) we have for each p
and
Q*(x,S)
Under Assumption 1.6 there is a
Qφ = identity on
then
Ω,
We postpone further discussion of Assumption 1.6 until the
construction of the transition function
Xt
and making
Perhaps the most natural one is the weak*-
topology with respect to the topology of weak convergence on below.
and some
We have
Q
φ Qx = x, is
K
Q φ Qx = Qx,
.
It
hence
and the proof is complete.
We did not quite have to require that
Q(x)
be continuous,
but only that it be measurable and that its graph be closed in
E x H .
Furthermore under the not unreasonable conditions that
Q(x,S(x)) = 1
for all
(where the
x
and that
conditioning is on
Qφ
Q(x, Q(x))
|F°+) = Q(x, ) we have
for all
K Q = Q(E) .
x
62
FRANK B . KNIGHT
We now use the set K_0 of Lemma 1.7 to construct a state space for the prediction process on which it can be identified with X_t.

LEMMA 1.8. There is a K_1 ⊆ K_0, K_1 ∈ H, such that P{Z_t^P ∈ K_1} = 1 and P^z{Z_t^z ∈ K_1, t > 0} = 1 for all z ∈ K_1.

REMARK. In the terminology of Essay I, Definition 2.1, 3), K_1 is a Borel packet of the prediction process.

PROOF. (In part like Theorem 2.4 a) of [6].) We begin by setting

K = {z ∈ H_0: P^z{Z_t^z ∈ K_0, t > 0} = 1} .

Then in the terminology of [3, Section 12], for α > 0 we have K = {z ∈ H_0: u^α(z) = 0}, where u^α(z) = E^z ∫_0^∞ e^{-αt} I_{H_0 − K_0}(Z_t) dt is α-excessive for the transition function q. Since H_0 − K_0 is Borel and the prediction process is a right-process on H_0 (see Remark III.e. of Meyer [8]), K is a nearly Borel set for the prediction process, and I_K(Z_t) is P^z-indistinguishable from a well-measurable (optional) process. It follows that for z ∈ K the section theorem implies that P^z{Z_t^z ∈ K, t > 0} = 1. We have, by Lemma 1.7, P{Z_t^P ∈ K ∩ K_0} = 1. Also, for z ∈ K ∩ K_0 we have by definition of K that P^z{Z_t^z ∈ K ∩ K_0, t > 0} = 1, so we may consider K ∩ K_0 as state space for the prediction process, and by Lemma 1.7, Qφ = identity on K ∩ K_0.

It remains to show that K ∩ K_0 may even be replaced by a Borel subset K_1. We use an argument due to P. A. Meyer [7] (see also the end of [9]). Since K ∩ K_0 is nearly Borel, it has a Borel subset K_2' such that P{Z_t^P ∈ K_2', t > 0} = 1. Let K_2 denote the nearly Borel set {z ∈ K_2': P^z{Z_t^z ∈ K_2', t > 0} = 1}. As before, we have P{Z_t^P ∈ K_2} = 1 and
i) P{Z_t^P ∈ K_2, t > 0} = 1 , and
ii) P^z{Z_t^z ∈ K_2, t > 0} = 1 for z ∈ K_2 .
Similarly, we define by induction a sequence K_2' ⊃ K_2 ⊃ K_3' ⊃ K_3 ⊃ ... ⊃ K_n' ⊃ K_n, where K_n' is Borel, and K_n is nearly Borel and satisfies i) and ii). Now let K_1 = ∩_{n≥2} K_n' = ∩_{n≥2} K_n. Then K_1 is Borel, and obviously satisfies i). But for z ∈ K_1 we have P^z{Z_t^z ∈ K_n, t > 0} = 1 for every n. Since K_1 = ∩_{n≥2} K_n, K_1 also satisfies ii) and the proof is complete.
We can now prove the main theorem.

THEOREM 1.9. Under Assumptions 1.1 and 1.6, given P and Q(x,S) as in Definition 1.2, there exists a Q*(x,S) for the same P which satisfies the identities (1.4).

PROOF. We have Qφ = identity on K_1, and P{X_t ∈ φ(K_1), t > 0} = 1. By [1, III, 21], φ(K_1) ∈ E. Now we define

Q*(x,S) = Q(x,S) if x ∈ φ(K_1) , and Q*(x,S) = I_S(w_x) if x ∉ φ(K_1) ,

where w_x(t) = x for all t ≥ 0. Obviously Q* is a probability kernel and satisfies (1.3) for P, and (1.4) for x ∉ φ(K_1). Finally, for x ∈ φ(K_1), 0 < t_1 < ... < t_n, and B_1,...,B_n ∈ E, by (1.5) and Theorem 1.4 we obtain the identity (1.4) for Q*(x, ∩_{k=1}^n {X_{t_k} ∈ B_k}), where we used the fact that Q is an isomorphism of H restricted to K_1 onto E restricted to φ(K_1) for the last equality (again by [1, III]). In the last term we may omit the φ(K_1)'s, just as for the first equality. Choosing t_1 = s, t_2 − t_1 = t, B_1 = E, and S = ∩_{k≥2} {X_{t_k} ∈ B_k}, this establishes (1.4) for such S. The general case follows immediately by the familiar uniqueness of the extension.

COROLLARY 1.9. For every initial distribution μ, we have the strong Markov property

P^μ(θ_T^{-1} S | F_{T+}^μ) = Q*(X_T, S) , P^μ-a.s. ,

where P^μ(S) = ∫ Q*(x,S) μ(dx), and T is any finite stopping time of the completed σ-fields F_{t+}^μ.

REMARK. It follows that F_{T+}^μ = F_T^μ.

PROOF. For μ concentrated on φ(K_1), this follows from the analogous property of Z^{P^μ}, by writing X_t = φ(Z_t^{P^μ}) as in the former proof. The part of μ outside of φ(K_1) causes no difficulty since, for every T, on {X_0 ∉ φ(K_1)} the path is P^μ-a.s. constant by the definition of Q*, and the assertion is trivial there.
We turn to a discussion of Assumption 1.6, which of course is the main question mark in the theory. The essential fact in identifying such a topology is

THEOREM 1.10. Let f be bounded and F^0-measurable (f ∈ b(F^0)). If f ∘ θ_t is right-continuous (resp. with left limits) in t for all w ∈ Ω, then for every z ∈ H

P^z{E^{Z_t^z} f is right-continuous (resp. with left limits) in t} = 1 .

PROOF. This follows immediately from two known results:
a) E^{Z_t^z} f is the F_t^z-optional projection of f ∘ θ_t [1, III, Theorem 2], and
b) the F_t-optional projection of a right-continuous bounded process (resp. with left limits) is itself right-continuous (resp. with left limits) P^z-a.s. [7, Appendice 2].

Therefore, we have immediately

COROLLARY 1.10. Let {f_n ∈ b(F^0), 1 ≤ n} satisfy the two conditions
a) for each w and n, f_n ∘ θ_t is right-continuous in t ≥ 0, and
b) the monotone linear bounded closure of {f_n} is b(F^0).
Then the topology on H generated by the functions E^z f_n, 1 ≤ n, satisfies i) and ii) of Assumption 1.6.

PROOF. Only i) needs comment. But since each E^z f_n is measurable with respect to the σ-field generated by the open sets, so is E^z f for f in the closure b(F^0), as required.

There are many possibilities for such f_n. Perhaps the most obvious is to take f_n = g_m(X_r), where r runs over the non-negative rationals and g_m runs over a uniformly dense set of continuous functions on a compact metric space Ē containing E as a Borel subset. Then the condition that Q satisfy Assumption 1.6 becomes the Feller property: E^{Q(x)} g_m(X_r) ∈ C(E) for rational r.

A weaker type of requirement, but one which still involves the given topology of E, utilizes all finite products

(1.6) f_n = ∏_{i=1}^k ∫_{r_i}^∞ e^{-t} g_{m_i}(X_t) dt ,

for 0 ≤ r_i rational and the g_m's as above. Here the topology generated on Ω by the f_n is just the weak topology of the sojourn measures μ(t,A) defined by μ(t,A) = ∫_0^t I_A(X_s) ds. Indeed, we have ∫_0^t g_m(X_s) ds = ∫ g_m(x) μ(t,dx). Hence, convergence of these integrals for all m is just weak convergence of μ(t,·). On the other hand, this convergence for all t is easily seen to be equivalent to that generated by the f_n. This topology is metrizable, for example, with metric d(w_1,w_2) = Σ_m 2^{-m} |f_m(w_1) − f_m(w_2)|, whence Ω is embedded as a Borel subset of its compactification, which is the space Ω̄ of (equivalence classes of) measurable functions with values in the closure of E (for this argument, see Essay 1, Theorem 1.2, where an analogous but weaker topology is treated).

Accordingly, we can consider on H the weak-* topology generated by this topology on Ω, by setting h(Ω̄ − Ω) = 0 for h ∈ H. Again, continuity of Q(x) for this topology on its range can be expressed in more familiar terms.

THEOREM 1.11. Continuity of Q(x) for the weak-* topology generated by the f_n of (1.6) is equivalent to the continuity on E, for all λ > 0 and continuous g on Ē, of

(1.7) E^{Q(x)} ∫_0^∞ e^{-λt} g(X_t) dt .

REMARK. Let R_λ g(x) denote (1.7). Then the last continuity is just the Ray property of R_λ g(x), except that we are not assuming the resolvent equation. The proof below is not self-contained, but in the present context it does not seem to merit that degree of emphasis.

PROOF. We rely on the construction of [5], where the coordinate functions h are the present g_m. By the argument just given, convergence in the space Ω' of [5] induces on Ω the topology of weak convergence of sojourn time distributions. Consequently, the topology of H in [5] reduces to the same weak-* topology as above. The assertion of our theorem now follows from the proof of Theorem 3.1.1 of [5] in two steps. First, we observe that the proof of R_λ: C(E) → C(E_Q) needs no change, where E_Q is E with the Q-induced topology. This is simply the observation that each R_λ g_n(x) is continuous on E_Q, since the ∫_0^∞ e^{-λt} g_n(X_t) dt are continuous on Ω. Second, we note that the proof of Lemma 3.1.1 of [5] does not use the resolvent equation or the compactness of E. Accordingly it applies unchanged, and we obtain that if R_λ: C(E) → C(E) holds, then the random variables ∫_0^{t_k} g_{n_k}(X_s) ds, 1 ≤ k ≤ n, have joint distributions for Q(x) which are weakly continuous in x (for any choice of n, the n_k, and t_k > 0). This easily implies continuity of E^{Q(x)} f_n for the f_n of (1.6), so the proof is complete.
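For orientation, in the finite-state case the quantity (1.7) is just the matrix resolvent R_λ g = (λI − G)^{-1} g of the chain's generator G, and the continuity asked for in Theorem 1.11 is automatic. The sketch below (the chain and the test function are invented for the illustration) computes R_λ g by the Neumann series Σ_k G^k / λ^{k+1}, valid for λ larger than the norm of G, and checks the defining identity (λI − G) R_λ g = g.

```python
# Finite-state analogue of (1.7): for a chain with generator G, the map
# g -> E^x int_0^inf e^{-lam*t} g(X_t) dt is the resolvent
# R_lam g = (lam*I - G)^{-1} g.  We build R_lam g by the Neumann series
# R_lam = sum_k G^k / lam^{k+1} and verify (lam*I - G) R_lam g = g.
# The 3-state chain is an illustrative assumption, not from the essay.

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

G = [[-1.0, 0.7, 0.3],
     [0.2, -0.5, 0.3],
     [0.5, 0.5, -1.0]]
g = [1.0, 0.0, 2.0]   # a bounded "test function" on the 3 states
lam = 5.0             # lam exceeds the norm of G, so the series converges

# Neumann series: u = (1/lam) * sum_k (G/lam)^k g
u = [0.0, 0.0, 0.0]
term = [x / lam for x in g]
for _ in range(200):
    u = [a + b for a, b in zip(u, term)]
    term = [x / lam for x in mat_vec(G, term)]

# Check (lam*I - G) u = g, i.e. u really is R_lam g.
Gu = mat_vec(G, u)
residual = [lam * u[i] - Gu[i] - g[i] for i in range(3)]
print(max(abs(r) for r in residual) < 1e-9)
```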
As seen above, both the Feller property and the Ray property are essentially special cases of Assumption 1.6. It is thus of interest to note that (at least formally) the latter is much more general than either of these. According to Corollary 1.10, if g_k ∈ b(E) is any sequence such that the monotone linear bounded closure of {g_k} is all of b(E), then the topology on H generated by the E^z f_n, for f_n = ∫_r^∞ e^{-s} g_k(X_s) ds with 0 ≤ r rational and 1 ≤ k, will satisfy the requirements i) and ii) of Assumption 1.6. Hence one need only find a Q(x) continuous in such a topology to obtain the conclusions of Theorem and Corollary 1.9. Moreover, since the g_k involve only the σ-field E (and not the topology of E), one is now free to change the topology of E, provided that X_t may still be assumed to have right-continuous paths with left limits. Therefore, rather than starting with Assumption 1.1, we could just as well assume such a continuity of E^{Q(X_t)} f_n. This leads to the following statement.

THEOREM 1.12.
Let (Ω_M, θ_t, F_t^0) be the space of Lebesgue measurable (E,E)-valued paths X_t(w) = w(t), t ≥ 0, with the σ-fields F_t^0 augmented to include σ(∫_0^s f(X_τ) dτ, s ≤ t, f ∈ b(E)). Suppose given P and a probability kernel Q(x,S) satisfying (1.3). Let g_k ∈ b(E) be any sequence having monotone linear bounded closure b(E), and let f_n be an enumeration of the random variables ∫_r^∞ e^{-s} g_k(X_s) ds, 0 ≤ r rational, 1 ≤ k. Suppose that the family h_n(x) = E^{Q(x)} f_n generates the σ-field E, and that the processes h_n(X_t) are P*-a.s. right-continuous with left limits, where P* is P-outer-measure. Then the conclusions of Theorem and Corollary 1.9 hold when (Ω, θ_t, F_t^0) is replaced by the space of right-continuous paths with left limits in the topology on E generated by the h_n(x), and when P is transferred to this space.

FINAL REMARKS. Such a P on F^0 is induced through completion by any progressively measurable process. For 0 ≤ g_k the processes e^{-t} h_n(X_t) are measurable supermartingales with respect to F_t^0 and P, as seen by a familiar computation. Hence the martingale convergence theorems can be used to aid in checking the right-continuity with left limits. The question is simply whether, by making a standard modification of X_t, the martingale right-limits along rational t can be evaluated by substitution of X_t in h_n. It is important to note that this is always possible if we permit the standard modification to take values in H instead of just in E (regarded as a Borel subset of H through identification with its image by the mapping Q). Thus by (1.3) the limits along rational t may be evaluated a.s. at each t by substitution of Z_t^P for X_t. Letting Z_t^P denote the general prediction process (see Section 1 of Essay I), we may assume without loss of generality that for each r in a countable dense set P{X_r = φ(Z_r^P)} = 1. Then if we replace X_t by Z_t^P whenever this evaluation fails, and then replace Z_t^P by φ(Z_t^P) whenever Z_t^P ∈ Q(E), we get a standard modification of X_t with values in E ∪ (H − Q(E)) which satisfies the conclusions of Theorem and Corollary 1.9.

It is also of interest to note that for Theorem 1.12 one need only assume (1.3) relative to F_{t+}^0. Then the familiar "Hunt's Lemma" argument shows that the h_n(X_t) are in any case conditional expectations relative to F_t^0, and therefore Q(X_t,S) satisfies (1.3) relative to F_t^0. The analytical question of giving conditions on a semigroup under which, for any corresponding Markov process, F_{t+}^P and F_t^P are equivalent, is dealt with at length in Engelbert (1978). Here it has been implicitly assumed (see the second remark after Definition 1.3).

REFERENCES

1. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapitres I à IV. Hermann, Paris, 1975.
2. Engelbert, H. J. "Markov processes in general state spaces" (Part II), Math. Nachr., 82, 1978, 191-203.
3. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Mathematics, No. 440. Springer-Verlag, New York, 1975.
4. Karush, J. "On the Chapman-Kolmogorov equation," Annals of Math. Stat., 32, 1961, 1333-1337.
5. Knight, F. B. "A predictive view of continuous time processes," Ann. Probability, 3, 1975, 573-596.
6. Knight, F. B. "Prediction processes and an autonomous germ-Markov property," Ann. Prob., 7, 1979, 385-405.
7. Meyer, P.-A. Le retournement du temps, d'après Chung et Walsh. Séminaire de Probabilités V, Université de Strasbourg, 213-236. Lecture Notes in Mathematics 191, Springer-Verlag, Berlin, 1971.
8. Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Probabilités X, Université de Strasbourg, 86-104. Lecture Notes in Mathematics 511, Springer-Verlag, Berlin, 1976.
9. Walsh, J. B. Transition functions of Markov processes. Séminaire de Probabilités VI, Université de Strasbourg, 215-232. Lecture Notes in Mathematics 258, Springer-Verlag, Berlin, 1972.
ESSAY IV. APPLICATION OF THE PREDICTION PROCESS TO MARTINGALES

0. INTRODUCTION.

Let X(t), t ≥ 0, be a right-continuous supermartingale relative to an increasing family of σ-fields G_t* on some probability space (Ω*, F*, P*). We assume that the G_t* are countably generated for each t. It is then easy, by using indicator functions of generators of G_t*, to construct a sequence X_{2(n+1)}(s) of real-valued processes such that {X(s), (X_{2(n+1)}(s)), s ≤ t} generates G_t* for each rational t. We can now transfer both process and probability to the canonical space Ω of Essay 1. We simply set P{w_{2n+1}(s) = 0, all s ≥ 0 and n ≥ 1} = 1, and for S ∈ B^∞ (see Essay 1, Section 1 for notation)

P{(w_{2n}(·)) ∈ S} = P*{(X(·), X_{2(n+1)}(·)) ∈ S} .

Then we obtain a canonically defined process X_t((w_n)) = w_2(t), which is a supermartingale with respect to P and the σ-fields G_t^0 of Essay 1. In the present work, we let X_t denote this process (rather than the sequential process (w_{2n})), and we drop the odd coordinates from the notation (i.e., we discard the set of probability zero where they are non-0). Thus we do not allow any "hidden information": F_t^0 = G_t^0. By a well-known convergence theorem we have

E(X_{s+t} | F_{t+}^0) = lim_{r→t+} E(X_{s+t} | F_r^0) ≤ lim_{r→t+} X_r = X_t .

Hence X_t is a supermartingale relative to F_{t+}^0, and we can connect it with its prediction process Z_t^P. As in Essay 1, the method requires that P be treated as a variable. In the present work we are concerned initially with three familiar classes of P on (Ω, F^0), as follows.
DEFINITION 0.1. Let M = {P: X_t is an F_{t+}^0-martingale and sup_t E X^2(t) < ∞}, U = {P: X_t is a uniformly integrable martingale, i.e., X_t = E(X_∞ | F_{t+}^0)}, and V = {P: X_t is a non-negative supermartingale of class D with lim_{t→∞} E X_t = 0}. The classes M and V are called respectively the square-integrable martingales and the potentials of class D, or simply the potentials (see [4, VI, Part 1, 9]).

Of course, we have M ⊂ U, and most of the attention will be on M and V. For P ∈ M we have a decomposition

(0.1) X_t − X_0 = X_1(t) + X_2(t) ,

where X_1 is a continuous F_{t+}^0-martingale and X_2 is a "compensated sum of jumps" with E(X_1 X_2) = 0. This decomposition is due to P. A. Meyer [11], but it will be obtained here as a consequence of a result on additive functionals of a Markov process (Theorem 1.6), more in the spirit of H. Kunita and S. Watanabe [10]. Given such a decomposition (for fixed P), it is clear that Z_t^P contains the conditional distributions of both processes X_i(s) ∘ θ_t given F_t, but this approach is not useful because one does not have X_i(s+t) − X_i(t) = X_i(s) ∘ θ_t. Rather, one has this at least in principle, so that the X_i(t) become additive functionals of the prediction process.

To make this approach rigorous, it is very convenient (and probably necessary) to transfer the setting once more to the prediction spaces of Essay 1, Section 2. Here the Z_t are given a single definition not depending on P, and for example the above X_i(t) become actual additive functionals of Z_t. In the setting of U, this enables us to avoid the technical difficulties encountered in [7] with a similar question.

This approach permits the application of general Markovian methods to the analysis of the X_i(t), and to other decompositions in U and V. In particular, we obtain the celebrated Doob-Meyer decomposition in V as a theorem on Markov additive functionals (Theorem 1.8). Further investigation of the discontinuities is based on the theory of Levy systems ([1]). Thanks to the use of a suitably weak topology for Z_t, it is possible to transfer directly the known components of the Levy system of a Ray process to Z_t, including separate terms for the compensation of totally inaccessible jumps and previsible jumps. Rather surprisingly, this operation is in no way restricted to martingales. By returning the components to the original probability space (Ω, F^0), we obtain (what is termed) the Levy system of an arbitrary r.c.l.l. process (Definition 2.2, Theorem 2.3).

Treatment of the continuous components, unlike that of the jumps, is restricted to the case of martingales. The continuous local martingales comprise a single prediction packet (a "packet," as in Essay 1). By means of a time change inverse to an additive functional, they are all reduced to a single Brownian motion (but it is a Brownian motion for many different probabilities). We then specialize to the case of autonomous germ-Markov probabilities, which generalizes the one-dimensional diffusion processes in the natural scale on (−∞,∞). Even in this case the variety of possible behavior is large, and we do not obtain anything like as comprehensive a theory as is available for ordinary diffusion.
It is sometimes possible to restrict the
process to a subset which is especially chosen to fit a given
P,
the present purposes there is usually no advantage in doing so. by considering as a single packet all
P
such that
(X ,P)
but for Instead,
has some
abstract defining property, we obtain at once the results which are implied by that property.
On the other hand, since the definition of
X
is fixed, this approach is not as flexible as the usual one for treating all processes adapted to
1.
F
,
relative to a fixed
P .
THE MARTINGALE PREDICTION SPACES. In this section we study the classes
M,
U,
and
them to prediction space, as in Section 2 of Essay 1.
V
by transplanting
In the following
Section 2, it is shown how these results can be interpreted in the original setting of processes on one process at a time.
(Ω,G ) ,
at least if we only deal with
Some familiarity with the terminology and results
of Essay 1, Section 2, is assumed for the present section.
One new basic
method is introduced which is in no way tied to the martingale setting, although it is perhaps especially well suited to martingales.
This is the
application of the Levy system of a Ray process to a packet of the prediction process. natural step to take.
In view of Corollary 2.12 of Essay 1, this is a Here we do not propose to exhaust its implications
even for martingales, but only to use it for the limited purpose of obtaining certain well-known decomposition theorems in the prediction space setting.
It is hardly surprising that these appear as results on
Markov additive functionals of the prediction process, since on prediction space we have a richer structure than on the original space.
A key
ESSAYS ON THE PREDICTION PROCESS
71
ingredient is the fact that on prediction space the prediction process behaves well under the translation operators space there is no corresponding operation.
θ ,
while on the original
Throughout the present section,
we make one significant change in the notation of Definitions 1.8 and 2.1 of Essay 1.
We let
φ(z)
previous definition. X
t
denote only the first coordinate from its
Thus with the present restricted definition of
we retain the fact that for each ,
h,
(φ(Z ) ,Z. ) t ^
is
P "equivalent to
We first establish that the martingale prediction spaces have the nicest possible general properties. THEOREM 1.1.
The sets
M,
U,
and
V
are complete Borel packets of the
prediction process (in the sense of Definition 2.1, 3) of Essay 1 ) . PROOF.
Since for every
h e H
the processes
X
and
φ(Z )
are
P -equivalent we may verify that the three sets are in H by using either X or φ(Z ) . We choose to use φ(Z ) . The three sets' have in common t t t the uniform integrability of any
N > 0,
h £ H,
and
φ(Z ), 0 < t .
By Fatou's lemma, we have for
t > 0,
lim inf E h (|φ(Z ) r r+t+ > E h (|φ(Z t )|;
|φ(Zt) | > N) .
Hence uniform integrability is equivalently expressed, using the rationals
Q,
by the condition lim
and the set of
h
E h (|φ(Z )|
sup
r
o < r^Q
|φ (Z ) | > N) = 0 , r
satisfying this is clearly in
H .
We now further
restrict this set by the martingale or supermartingale conditions. Markov property of
Z ,
By the
these are respectively
h
P -a.s.
But a simple application of Hunt's lemma shows that when uniformly integrable, if these are assumed only for can take right-limits to extend them to all
s,t .
φ(Z )
is
0 1 s, t € Q ,
we
Thus the class of
uniformly integrable martingales (respectively, supermartingales) is in REMARK.
For the martingale case, one can proceed more simply by writing Z
the condition as
E
t
Z
φ (Z ) = E
t
φ^,
h P -a.s.,
where
φ ^ = lim φ(Z )
H .
72
FRANK B. KNIGHT
exists P^h-a.s., which is also a Borel condition on h.

Now to obtain M we have only to append the Borel condition E^h φ_∞^2 < ∞, so it remains to obtain V from the uniformly integrable supermartingales. Clearly the conditions P^h{φ(Z_t) ≥ 0 for all t ≥ 0} = 1 and lim_{n→∞} E^h φ(Z_n) = 0 are Borel in h, so we need only check the class D requirement. According to a criterion of Johnson and Helms (see [4, IV, Section 1, Theorem 25]), for a positive, right-continuous supermartingale φ(Z_t) to be of class D it is necessary and sufficient, for any increasing sequence 0 < C_k ↑ ∞, that

lim_{k→∞} E^h(φ(Z_{T_k}); T_k < ∞) = 0 ,

where T_k = inf{t: φ(Z_t) > C_k}. Obviously T_k is Z_t-optional and φ(Z_{T_k}) is measurable over Z^0, hence this is again a Borel condition on h, as required.

Since our three sets are Borel, to show that they are prediction packets it suffices to show that for Z_t-optional T < ∞, Z_T is P^h-a.s. in each set along with h. For s > 0, we introduce the sets S_s^= = {z ∈ H: E^z φ(Z_s) = φ(z)} and S_s^≤ = {z ∈ H: E^z φ(Z_s) ≤ φ(z)}. Clearly these are in H, and the martingale (respectively, supermartingale) condition on h becomes P^h{Z_t ∈ S_s^=} = 1 (respectively, P^h{Z_t ∈ S_s^≤} = 1) for all s,t ≥ 0. By the classical optional sampling theorems of Doob (Neveu [13, Proposition IV, 5.5]) we have respectively, for h in the corresponding set, P^h{Z_T ∈ S_s^=} = 1 and P^h{Z_T ∈ S_s^≤} = 1. Therefore, we have in the first case

(1.1) 1 = E^h P^h(Z_{T+t} ∈ S_s^= | Z_T) = E^h P^{Z_T}(Z_t ∈ S_s^=) ,

and therefore P^{Z_T}{Z_t ∈ S_s^=} = 1, P^h-a.s., for each (s,t) with 0 ≤ s,t ∈ Q, with the corresponding result using S_s^≤ in the second case. Therefore, these conditions hold for P^h-a.e. Z_T, at least for rational s and t. But this means, in the martingale cases, that φ(Z_t) is a P^{Z_T}-martingale along the rationals, P^h-a.s. Consequently, for fixed K, the φ(Z_t), t < K, are P^{Z_T}-uniformly integrable for rational t, with P^h-probability one. Then as in the first part of the proof we use Hunt's Lemma to extend the martingale property to all (s,t) with s + t < K, and then let K → ∞. The case of positive supermartingales is a little different. One first observes that, simply by martingale convergence of conditional expectations, if φ(Z_r) is a P^{Z_T}-positive-supermartingale as r varies in Q, then for any 0 ≤ t < r, r ∈ Q, one has

E^{Z_t} φ(Z_r) = lim_{s→t+, s∈Q} E^{Z_s} φ(Z_r) ≤ lim_{s→t+} φ(Z_s) = φ(Z_t) , P^{Z_T}-a.s.

Then by positivity and Fatou's Lemma, for 0 ≤ t < s one has

E^{Z_t} φ(Z_s) ≤ lim inf_{r→s+} E^{Z_t} φ(Z_r) ≤ φ(Z_t) , P^{Z_T}-a.s.

Thus φ(Z_t) is a P^{Z_T}-positive supermartingale, P^h-a.s. The uniform integrability, or square integrability, of the P^{Z_T}-martingales now follows easily from the fact that convergence of φ(Z_{T+t}) to φ_∞ in L^1 or L^2 for P^h as t → ∞ implies the same convergence conditional on Z_T, at least for a sequence t_k → ∞ sufficiently fast. This suffices to identify φ_∞ as the value at t = ∞ of the P^{Z_T}-martingales for P^h-a.e. Z_T (see Neveu [13, IV, 5.6]). It remains to verify that Z_T is of class D in the supermartingale case, and to show that the three packets are complete.

With C_k as in the first part of the proof, and optional T < ∞, let T_k' = inf{t > T: φ(Z_t) > C_k}. Then T_k ≤ T_k' with the previous T_k, and for h ∈ V we have

lim_{k→∞} E^h(φ(Z_{T_k}); T_k < ∞) = 0 ,

so that E^h(φ(Z_{T_k}); T_k < ∞ | Z_T) tends to zero in probability as k → ∞. Now by [4, VI, Section 1, 10], conditioning on Z_T, we have

(1.3) E^h(φ(Z_{T_k'}); T_k' < ∞ | Z_T) ≤ E^h(φ(Z_{T_k}); T_k < ∞ | Z_T) .
Therefore, the left side also tends to 0 in probability, and so there is a subsequence k_i for which it converges to 0 P^h-a.s. Clearly, then, the sequence C_{k_i} satisfies the Johnson-Helms criterion for Z_T, P^h-a.s., proving that Z_T is a.s. of class D.

Turning, finally, to the completeness, we must show that for Z_t-previsible 0 < T < ∞, Z_{T-} is in the corresponding packet, P^h-a.s. First, we note that E^{Z_{T-}} |φ(Z_t)| = E^h(|φ(Z_{T+t})| | Z_{T-}) has finite expectation for P^h, hence φ(Z_t) is P^{Z_{T-}}-integrable, P^h-a.s. We can now verify the martingale property (resp. positive supermartingale property) just as before, except that in (1.1) the conditioning is on Z_{T-}. The uniform or square integrability then follows in the martingale cases, as before, by convergence of φ(Z_{T+t_k}) to φ_∞ conditional on Z_{T-}, and it only remains to check the class D restriction in the supermartingale case. It is clear as before that E^h(φ(Z_{T_k'}); T_k' < ∞ | Z_{T-}) converges in probability to 0 as k → ∞. Conditioning both sides of (1.3) by Z_{T-}, we then obtain the same criterion, and the proof of Theorem 1.1 is complete.

REMARK. The supermartingale case could also have been handled by means of the Doob-Meyer decomposition of φ(Z_t), but since our intention is to obtain this decomposition from the prediction space, this would lead to a circular reasoning.

COROLLARY 1.1. The prediction process is a right-process on M ∩ H_0, U ∩ H_0, or V ∩ H_0, and in each case φ(z) is an excessive function. On M ∩ H_0 and U ∩ H_0, φ(z) is an invariant function. On V ∩ H_0 it is a potential of class D.

REMARKS. Since the literature of excessive functions is usually confined to standard processes, this terminology is not quite orthodox. For standard processes, such φ are considered under (1) and (2) of the Notes and Comments to Chapter IV in [2].

PROOF. In all three cases, for z ∈ H_0 we have E^z φ(Z_t) ≤ φ(z), and lim_{t→0+} E^z φ(Z_t) = φ(z) by right-continuity of φ(Z_t) and the fact that P^z{φ(Z_0) = φ(z)} = 1 for z ∈ H_0. Invariance, by definition, becomes the martingale property E^z φ(Z_t) = φ(z). For the last assertion, which is again true by definition, we observe that for any increasing sequence T_n ↑ ∞ of stopping times one has lim_n E^z φ(Z_{T_n}) = 0 for z ∈ V ∩ H_0, since the φ(Z_{T_n}) are P^z-uniformly integrable and, by supermartingale convergence, lim_n φ(Z_{T_n}) = 0 a.s.
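The dichotomy in Corollary 1.1 (invariant on the martingale packets, a potential on V) has a simple finite analogue: for a substochastic kernel P, the potential φ = Σ_n P^n c of a non-negative charge c is excessive, with Pφ = φ − c ≤ φ and P^n φ → 0. The kernel and charge below are invented for the illustration.

```python
# Finite analogue of Corollary 1.1: for a substochastic kernel P (some
# mass is killed at each step), the potential phi = sum_n P^n c of a
# non-negative charge c satisfies P phi <= phi (phi is excessive; in
# fact P phi = phi - c) and P^n phi -> 0 (phi is a potential).  The
# kernel and charge are illustrative assumptions, not from the essay.

def apply_kernel(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

P = [[0.3, 0.3, 0.2],   # rows sum to 0.8 < 1: killing makes the series converge
     [0.1, 0.4, 0.3],
     [0.2, 0.2, 0.4]]
c = [1.0, 0.0, 2.0]

# phi = c + P c + P^2 c + ...  (geometric series, truncated)
phi = [0.0, 0.0, 0.0]
term = c[:]
for _ in range(500):
    phi = [a + b for a, b in zip(phi, term)]
    term = apply_kernel(P, term)

P_phi = apply_kernel(P, phi)
excessive = all(P_phi[i] <= phi[i] + 1e-9 for i in range(3))

# Iterating the kernel kills the potential: P^n phi -> 0.
v = phi[:]
for _ in range(200):
    v = apply_kernel(P, v)
print(excessive and max(v) < 1e-9)
```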
We next take up the discontinuities of φ(Z_t). Here our chief tool is the Levy system of Z_t on the corresponding packet. The theory of Levy systems was initiated by M. Motoo and S. Watanabe under the hypothesis of absolute continuity [12], and developed further by J. Walsh and M. Weil [16]. The final touches, and also the simplest proofs, are provided by A. Benveniste and J. Jacod [1], whose formulation applies to all the discontinuity times of any Ray process. Since we know by Corollary 2.12 of Essay 1 that on any Borel prediction packet the prediction process is (in a sense) a Ray process restricted to a suitable Borel set, it is natural to use the result of Corollary 5.2 of [1], which we now describe.

Continuing the notation (however unwieldy) of Essay 1, let H_A be any Borel prediction packet, and let (H_A^+ ∩ H_0)^- denote the Ray compactification of its "non-branching" points. We denote the canonical Ray process by X̄_t, with probabilities P̄^x and resolvent R̄_λ. Then there exists a Levy system of X̄_t, which consists of four parts, N, M, H, and L. Here N = N(x,dy) and M = M(x,dy) are Borel measure kernels on (H_A^+ ∩ H_0)^-. We have N(x,{x}) = 0, and N is defined for x ∈ D, where D = (H_A^+ ∩ H_0)^- − B (with B as before denoting the Ray branching points, a Ray-Borel set), while M(x,{x}) = 0 and M is defined for x ∈ B. Both N and M yield finite measures on the Borel sets of (H_A^+ ∩ H_0)^-, and are Borel measurable in x. Further, H = H_t(w) is a continuous additive functional of X̄_t with E^x H_t < ∞, while L = L_t(w) is a purely discontinuous additive functional with E^x L_t < ∞ which is previsible with respect to the usual σ-fields F̄_t of the Ray process. These four objects have the property that (N,H) "compensates" the totally inaccessible discontinuities of X̄_t, while (M,L) compensates the previsible discontinuities. In more detail, let f(x,y) be any non-negative, jointly-Ray-Borel function with f(x,x) = 0. Then

(1.4)(a) E^x Σ_{0<s≤t} f(X̄_{s-}, X̄_s) I(X̄_{s-} ∈ D) = E^x ∫_0^t ∫ N(X̄_s, dy) f(X̄_s, y) dH_s
(1.4)(b) E^x Σ_{0<s≤t} f(X̄_{s-}, X̄_s) I(X̄_{s-} ∈ B) = E^x ∫_0^t ∫ M(X̄_{s-}, dy) f(X̄_{s-}, y) dL_s ,

for all x and t > 0. We note that on the right side of (1.4)(a), X̄_s may be replaced by X̄_{s-}, since H is continuous. We combine the Levy system by setting N̄ = N + M and H̄ = H + L, where N and M are extended to be the zero measure at the x for which they were undefined, and X̄_{s-} is used as before.
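For orientation, the compensation identities (1.4) can be checked by hand in the elementary case of a finite-state chain with jump rates q(x,y): there one may take H_t = t and N(x,·) = Σ_{y≠x} q(x,y) δ_y, and since such a chain has no branching points and only totally inaccessible jumps, the pair (M,L) vanishes. A sketch of (1.4)(a) under these assumptions:

```latex
% Finite-chain instance of (1.4)(a): with H_t = t and
% N(x,\cdot) = \sum_{y \neq x} q(x,y)\,\delta_y, the compensation
% identity reads, for f \geq 0 with f(x,x) = 0,
E^x \sum_{0 < s \le t} f(X_{s-}, X_s)
  \;=\; E^x \int_0^t \sum_{y} q(X_s, y)\, f(X_s, y)\, ds ,
% i.e. the Levy-system identity with no previsible-jump part
% (M = 0, L = 0): every jump of a finite chain from a holding state
% is totally inaccessible, and there are no branching points.
```

The general statement above differs only in that the jump measures live on the Ray compactification and the clocks H, L need not be time itself.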
In order to apply this to the packet involved and we do not assume packet
h(D ) Π H
H
H ,
since left limits are
is complete, we use the complete Borel
of Essay 1, Corollary 2.12.
We recall that
A
.
H_ c (h(D_) Π H) , Ά
and for
h e (h(DΛ) Π H)
A
t > 0} = 1 .
elements of
H
h(DA) ^
Thus Z
on
H
H
Π H
c a n
P {Z^ e H t
n L
A
for U
be regarded as a packet of
which, in addition, are given by
(and therefore, by distributions on -z
such an entrance law is determined by Π H ) ,
we'have
A
entrance laws for
(H
N
EX
f,
all
where
Thus we have finally
(1.4)(c)
for
(a),
P
h(D ) Π H ) . A o
Since
for at most one point of
and in the present case by at least one, we see that the
mapping h(z) is one-to-one on D . Thus we can introduce the inverse -1 h : h ( D ) -> D , and for any initial distribution μ for Z on A
A
η
-
t
h(D ) Π H we can identify X = h" (Z ) and X = h" (Z. ) for A U t t Z— t"~ t > 0 as a realization of a Ray process with initial distribution 1 μ(S) = μ(h(S)) on h" (h(DAΛ ) Π H_) . Furthermore, we showed in the 1 -1 proof of Essay 1, Theorem 2.13, that h (h(D ) Π H ) = h (h(D ) Π H) - B, A U A or in other words, for the right process Z on h(D ) Π H with left limits in h(D ) Π H, the elements of (h(D ) ίi H) - H correspond -1 -1 under
h
to the Ray-branching points in
transfer the Levy system of
X
h
(h(D ) Π H) .
to obtain a Levy system of
Thus we can Z
on
h(D A ) Π H . In detail, let t > 0, ""
and
Ω^ = {w,, ^ Ω^ such that Z , A Z Z w^ίt-) e h(D ) Π H, t > 0} . Then ί>
A
{A Π Ω
w (t) e n ( D ) Π H_, Z A U Ω , with the σ-fields Z, A
A € Z } on Ω , is canonical sample space for Z as a Z , A t l Z,A t right process on h(D ) Π H . Using this sample space, we define the four A
\j
elements of a Levy system by
(1.5)  N_Z(h_1,dh_2) = N(h^{-1}h_1, h^{-1}dh_2),  M_Z(h_1,dh_2) = M(h^{-1}h_1, h^{-1}dh_2),  h_1 ∈ h(D_A) ∩ H_0 ,
       H_{Z,t}(w_Z) = H_t(w),  L_{Z,t}(w_Z) = L_t(w),  w_Z ∈ Ω_{Z,A} ,

where w(t) = h^{-1}(w_Z(t)) for t ≥ 0. Then since θ_t w corresponds to θ_t w_Z as w does to w_Z (where θ_t is the Ray-translation operator), we see that H_Z and L_Z are additive functionals of Z on Ω_{Z,A}. Also H_Z is continuous, while, since h is Borel on h(D_A) ∩ H, L_Z is Z_t-previsible and purely discontinuous. Of course, the local integrability of H_Z and L_Z carries over to z ∈ h(D_A) ∩ H_0.

It is now just a matter of transferring (1.4) to the present context, and setting Ñ_Z = N_Z + M_Z and H̃_Z = H_Z + L_Z, to obtain

THEOREM 1.2.  For 0 ≤ f(z_1,z_2) ∈ H × H with f(z,z) = 0, the objects (1.5) satisfy

(1.6)(a)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) I{Z_{s-} ∈ H_0} = E^z ∫_0^t dH_{Z,s} ∫_{H_0} N_Z(Z_{s-},dz') f(Z_{s-},z') ,

(b)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) I{Z_{s-} ∈ H − H_0} = E^z ∫_0^t dL_{Z,s} ∫_{H_0} M_Z(Z_{s-},dz') f(Z_{s-},z') ,

(c)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) = E^z ∫_0^t dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz') f(Z_{s-},z') ,

for all z ∈ h(D_A) ∩ H_0 and t > 0.

Of course, the kernels N_Z, M_Z are Borel in z, and the measures are concentrated on h(D_A) ∩ H_0. Also, we may as well assume f(z_1,z_2) = 0
except for z_1 ∈ h(D_A) ∩ H and z_2 ∈ h(D_A) ∩ H_0. On the other hand, our use of the subscript Z instead of A for the elements of the Lévy system is quite appropriate for the following reason. We could just as well begin with the case H_A = H, h(D_A) ∩ H = H. Then we obtain the four components of the Lévy system in a form which applies, except for negligible changes, to any Borel prediction packet H_A ⊂ H. In fact, the only objection to identifying the restrictions of these components to h(D_A) ∩ H and Ω_{Z,A} with the components of Theorem 1.2, in general, is that the measure kernels might not be suitably restricted to the corresponding h(D_A) ∩ H_0. However, for any fixed A one may redefine these measures to be 0 outside h(D_A) ∩ H_0 without losing property (1.6). This follows by substituting f(z_1,z_2) = 1 − I_{h(D_A) ∩ H_0}(z_2) in (1.6) and noting that for z ∈ h(D_A) ∩ H_0 the result is 0. Another form of the same observation is useful in treating M, U, and V. We may and do take as sample spaces the canonical prediction spaces of all elements of Ω_Z with values and left limits in the respective (complete) packet. An inspection of (1.6) shows that we may just as well restrict the components of the Lévy system from H_A = H to any such complete Borel packet, instead of just h(D_A) ∩ H. (This can also be seen by intersecting the packet first with the corresponding h(D_A), but the step is unnecessary.) We may state

THEOREM 1.3.  There exists a Lévy system for any complete Borel prediction packet and corresponding r.c.l.l. process of Theorem 1.2 with H = H_A. In fact, the components may be restricted to yield such a system.

For application to φ(Z_t), we need to restate the properties of the Lévy system in a somewhat different form.

COROLLARY 1.3.  For any complete Borel prediction packet, the properties (1.6) of the Lévy system imply that for any Z_t-previsible process y_t with values in R, (1.6) holds with f(Z_{t-},z) replaced by f(y_t,z) I{Z_{t-} ≠ z}, for 0 ≤ f ∈ B × H.

PROOF.  We justify the substitution in (1.6)(c), the other two cases being analogous. First we consider f of the special form f(y,z) = k(y)g(z), k ∈ B and g ∈ H, and y of the special form y_t = I_A(w_Z) I_{(t_1,t_2]}(t) for A ∈ Z_{t_1}. Now by (1.6)(a) and the Markov property of Z, for z in the packet we have
(1.7)  E^z( Σ_{t_1<s≤t_2} g(Z_s) | Z_{t_1} ) = E^z( ∫_{t_1}^{t_2} dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz') g(z') | Z_{t_1} ) .

Multiplying both sides of (1.7) by I_A and taking the expectations to remove the conditioning, we obtain the assertion for this y. By [3, Chapter IV, Theorem 67], the previsible σ-field is generated by such y and sets of the form {0} × A, A ∈ Z_0, whose indicators satisfy the assertion trivially. Moreover, the class of finite sums of disjoint indicators of the form y is closed under multiplication. Since the class of indicators y satisfying the assertion is monotone, it follows by [3, Chapter 1, 19] that it contains all Z_t-previsible indicator functions, and hence all previsible y ≥ 0. Since k(y_t) is again previsible for 0 ≤ k ∈ B, we obtain the result for f(z_1,z_2) = k(z_1)g(z_2) immediately. Therefore, it holds for finite positive linear combinations of such, and by [3, Chapter 1, 21] it holds for 0 ≤ f ∈ B × H as asserted.

We first apply this method to obtain a well-known decomposition of square-integrable martingales due to P. A. Meyer [11] and Kunita and Watanabe [10]. We recall that two square-integrable martingales are called orthogonal if their product is a martingale. If they are additive functionals of Z_t, then one requires this to hold for every P^h, h ∈ M. The following notation will be used for M, U, or V, according to context.

NOTATION 1.4.  For any h ∈ H and t > 0, let φ^-(t) = limsup_{s↑t} φ(Z_s). We recall that φ^-(t) exists as an ordinary left limit, P^h-a.s. for all t > 0, and that φ^-(t) is P^h-equivalent in distribution to φ(Z_{t-}) (Definition 2.1, 2), of Essay 1), as φ(Z_t) is to X_t. Our object is to disengage the jumps of φ(Z_t) into a separate martingale, called the "discontinuous part," by means of the Lévy system. For this we need

LEMMA 1.5.  For either all h ∈ M or all h ∈ U, P^h{φ^-(t) ≠ φ(Z_t) implies Z_{t-} ≠ Z_t, for all t > 0} = 1. In words, the discontinuity times of φ(Z_t) are contained in those of Z_t.
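Before turning to the proof, we record for convenience the elementary identity behind the orthogonality computations used below (a standard fact, stated in our notation):

```latex
% If M and N are square-integrable martingales with M_0 = N_0 = 0 whose
% product MN is a martingale, then E[M_t N_t] = E[M_0 N_0] = 0, and hence
\[
E\big[(M_t + N_t)^2\big] \;=\; E\big[M_t^2\big] + E\big[N_t^2\big],
\qquad t \ge 0 .
\]
```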
PROOF.  For ε > 0, let

T_ε = inf{t > 0: |φ(Z_t) − φ^-(t)| > ε} .

Then T_ε is a Z_t-stopping time, and by right-continuity of φ(Z_t) we have P^h{T_ε > 0} = 1 for all h. By the strong Markov property and the existence of φ^-(t) for all t > 0, the iterates T_ε^{n+1} = T_ε^n + T_ε ∘ θ_{T_ε^n}, with T_ε^1 = T_ε, tend to ∞ along with n, P^h-a.s. Hence by the strong Markov property it suffices to show that P^h{Z_{T_ε-} ≠ Z_{T_ε}, T_ε < ∞} = P^h{T_ε < ∞}. By [3, IV, Theorem 81c)] there is a decomposition T_ε = T_A ∧ T_{Ω−A}, where T_A is the accessible part of T_ε and T_{Ω−A} is the totally inaccessible part for (Z_t, P^h). According to Theorem 2.13(ii) of Essay 1, we always have Z_{T_ε-} ≠ Z_{T_ε}, P^h-a.s. on {T_ε = T_{Ω−A}}. On the other hand, since Z_{T_ε-} is Z_{T_ε-}-measurable on A ([3, IV, 57]), and by the moderate Markov property we have

(1.8)  P^h(φ(Z_{T_ε}) ∈ B | Z_{T_ε-}) = q(0, Z_{T_ε-}, {φ(z) ∈ B}),  for B ∈ B,

it follows that P^h(φ(Z_{T_ε}) = φ^-(T_ε) | Z_{T_ε-}) = 1 on {Z_{T_ε-} ∈ H_0} ∩ A ∩ {T_ε < ∞}, P^h-a.s. Therefore, we have Z_{T_ε-} ∉ H_0, P^h-a.s. on A ∩ {T_ε < ∞}, and so Z_{T_ε-} ≠ Z_{T_ε} as required.

We now state and prove the decomposition theorem in Ω_M = {w_Z: w_Z(t) ∈ M for all t ≥ 0 and w_Z(t-) ∈ M for all t > 0}.

THEOREM 1.6.  There is a decomposition φ(Z_t) − φ(Z_0) = M_c(t) + M_d(t), where for h ∈ M, M_c is a continuous, P^h-square-integrable, martingale additive functional on Ω_M, M_d is an E^h-mean-square limit of martingale additive functionals of bounded variation, and M_c is orthogonal to all M_d. The decomposition is unique up to a P^h-null set for each h.

In the course of the proof, and also later, we need

NOTATION 1.7.  For any r.c.l.l. process M(t), t ≥ 0, let ΔM(t) = M(t) − M(t-), where M(t-) denotes the left limit at time t, and we use ∞ − ∞ = 0. In particular, let Δφ^-(t) = φ(Z_t) − φ^-(t).

PROOF.  If M(t) is any P^h-square-integrable martingale (in particular, E^h M²(∞) < ∞) then it is a familiar fact that M(t) has orthogonal
increments.
Thus if {t_{n,i}, 1 ≤ i ≤ n} is a sequence of partitions of [0,t] with maximum separation tending to 0, then by Fatou's Lemma,

E^h M²(t) = lim_n E^h Σ_i (M(t_{n,i}) − M(t_{n,i-1}))² ≥ lim_{ε→0+} E^h Σ_{t_j} (ΔM(t_j))² ,

where the last sum is over all t_j ≤ t such that |ΔM(t_j)| > ε. Letting t → ∞, it follows that

(1.9)  E^h M²(∞) ≥ E^h Σ_j (ΔM(t_j))² ,

where the t_j enumerate the discontinuity times of M(t).

We now fix 0 < a < b, and apply Corollary 1.3 with y_t = φ^-(t) (which is Z_t-previsible by [3, IV, Theorem 92]) and f(y,z) = (φ(z) − y) I_{(a,b]}(φ(z) − y). Letting φ_∞ = limsup_{t→∞} φ(Z_t), (1.9) implies that

E^h | Σ_{0<s} f(φ^-(s), Z_s) | ≤ a^{-1} E^h (φ_∞ − φ(Z_0))² < ∞ .

Then we may subtract the right side in Corollary 1.3, and by Lemma 1.5 and the Markov property of Z_t we obtain that the process

M_{(a,b)}(t) = Σ_{0<s≤t} f(φ^-(s), Z_s) − ∫_0^t dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz) f(φ^-(s), z)

is a martingale additive functional of Z on Ω_M (here f(y,z) was substituted explicitly only in the sum).

The martingale M_{(a,b)}(t) is clearly of bounded variation, and we now evaluate its mean square precisely. Denoting the above difference by M_{(a,b)}(t) = M^+_{(a,b)}(t) − M^-_{(a,b)}(t), let T_N = inf{t: M^+_{(a,b)}(t) + M^-_{(a,b)}(t) > N}. Then T_N is a Z_t-stopping time, and T_N → ∞, P^h-a.s., as N → ∞. Also, as in (1.4), M_{(a,b)}(t) has at most only accessible times of discontinuity, and for previsible T < ∞ it follows by (1.8) of Lemma 1.5 and Jensen's inequality for conditional expectations that

M^+_{(a,b)}(T_N) + M^-_{(a,b)}(T_N) ≤ N + b .
Then by decomposition at T_N we have easily E^h M²_{(a,b)}(t ∧ T_N) ≤ (N+b)². Next, for t > 0 fixed, let t_{n,j} = jt2^{-n}, and reapply the argument beginning the proof to the martingale M_{(a,b)}(t ∧ T_N). Simply by decomposing paths of bounded variation into continuous and jump components, we see that the sums of the squared increments of M_{(a,b)}(t ∧ T_N) along the partitions t_{n,j} converge P^h-a.s. as n → ∞ to Σ (ΔM_{(a,b)}(t_j ∧ T_N))², where the sum is over the jump times less than t. Also, the sums of squares of increments of M^+ alone are decreasing with n, as are those of M^- alone. Using (c−d)² ≤ 2(c² + d²) to bound the squared increments of M_{(a,b)}, the sums are dominated by 2(M^+_{(a,b)}(t ∧ T_N))² + 2(M^-_{(a,b)}(t ∧ T_N))², which has finite expectation. Hence by the dominated convergence theorem,

E^h M²_{(a,b)}(t ∧ T_N) = E^h Σ_{t_j ≤ t} (ΔM_{(a,b)}(t_j ∧ T_N))² .

Letting N → ∞, it follows readily that

(1.11)  E^h M²_{(a,b)}(t) = E^h Σ_{t_j ≤ t} (ΔM_{(a,b)}(t_j))² .

In particular, by (1.9) it follows that E^h M²_{(a,b)}(t) ≤ E^h M²(∞), hence M_{(a,b)} is square-integrable. Furthermore, for 0 < a < b < c we have M_{(a,c)}(t) = M_{(a,b)}(t) + M_{(b,c)}(t), and by (1.11)

E^h(M_{(a,b)}(t) + M_{(b,c)}(t))² = E^h M²_{(a,b)}(t) + E^h M²_{(b,c)}(t) ,

hence E^h(M_{(a,b)}(t) M_{(b,c)}(t)) = 0, and by the Markov property of Z it follows that M_{(a,b)} and M_{(b,c)} are orthogonal. Similarly, for c < d ≤ 0 we can define M_{(c,d)} of the same form as M_{(a,b)} to compensate the negative jumps c < φ(Z_t) − φ^-(t) ≤ d, and (1.11) applies. Finally, since −M_{(c,d)} has the same form as M_{(a,b)}, it is seen that M_{(a,b)} and M_{(c,d)} have no common discontinuities. Hence M_{(a,b)} and M_{(c,d)} are orthogonal, and

E^h(M_{(a,b)}(t) + M_{(c,d)}(t))² ≤ E^h M²(∞) .
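Schematically, the discontinuous part of the decomposition is then assembled from these compensated-jump martingales as a mean-square limit (this summarizes, in our notation, the construction carried out next):

```latex
% The discontinuous part M_d is obtained as an L^2 limit of the
% bounded-variation martingales compensating jumps of size in (a_n,b_n]
% and (c_n,d_n]:
\[
M_d(t) \;=\; \lim_{n\to\infty}\Big( M_{(a_n,b_n)}(t) + M_{(c_n,d_n)}(t) \Big)
\quad \text{in } L^2(P^h),
\]
\[
a_n \downarrow 0,\quad b_n \uparrow \infty,\quad
c_n \downarrow -\infty,\quad d_n \uparrow 0 .
\]
```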
We next choose a sequence a_n → 0+, b_n → ∞, c_n → −∞, d_n → 0−. It follows directly from the above that for h ∈ M and 0 ≤ t ≤ ∞ there exist E^h-mean-square limits of M_{(a_n,b_n)}(t) + M_{(c_n,d_n)}(t) along this sequence. Furthermore, it is known from general theorems of analysis that such limits always may be chosen so as to be valid for all h (see [14, Theorem 3]). Accordingly, we denote such a choice by M*(t), and define

M_d(t) = lim_{r↓t} M*(r)  if this exists for all t < ∞ and equals 0 for t = 0;  M_d(t) = 0 elsewhere.

For each h, we have easily M*(r) = E^h(M*(∞) | Z_r), P^h-a.s., from which it follows that M_d(t) is a right-continuous version of E^h(M*(∞) | Z_t) for each h, and thus it is a square-integrable martingale. To see that it is an additive functional of Z, we note that for fixed s, t, and h ∈ M we can choose α_k, β_k, γ_k, and δ_k such that P^h(S_{s+t}) = 1, P^h(S_t) = 1, and P^{Z_t}(S_s) = 1 for P^h-a.e. Z_t, where S_u is given by

S_u = {M_d(u) = lim_{k→∞} (M_{(α_k,β_k)}(u) + M_{(γ_k,δ_k)}(u))} .

Since then

P^h(θ_t^{-1} S_s) = E^h(P^{Z_t}(S_s)) = 1 ,

the property M_d(s+t) = M_d(t) + M_d(s) ∘ θ_t, P^h-a.s., follows from the corresponding fact for M_{(a,b)}(t) + M_{(c,d)}(t).

Similarly, it follows from a classical martingale theorem of Doob ([5, (Theorem 5.1), p. 363]) that for each h we can choose a subsequence for which M_d is the limit of M_{(a,b)} + M_{(c,d)} uniformly in t,
P^h-a.s., for a = a_{n_k}, etc. Clearly, then, M_d(t) contains all the totally inaccessible jumps of φ(Z_t). But for previsible T < ∞, we have ΔM_d(T) = Δφ(T) plus a quantity which is Z_{T-}-measurable along with the ΔM_{(a,b)}(T), and since E^h(ΔM_d(T) | Z_{T-}) = 0, this quantity must be 0. Hence we see that M_c(t) = φ(Z_t) − φ(Z_0) − M_d(t) defines a continuous martingale additive functional of Z. It remains only to show that M_c and M_d are orthogonal, or again that M_c and M_{(a,b)} are orthogonal. To this effect we have only to apply some of the argument for (1.11) to M_c(t) + M_{(a,b)}(t), with T_N redefined by T_N = inf{t: M^+_{(a,b)}(t) + M^-_{(a,b)}(t) + |M_c(t)| > N}. It follows readily that in computing E^h(M_c(t ∧ T_N) + M_{(a,b)}(t ∧ T_N))² along {t_{n,j}}, the sum in cross-products of the increments of M_c and M_{(a,b)} in (t_{n,j-1}, t_{n,j}) is bounded by 2N(M^+_{(a,b)}(T_N) + M^-_{(a,b)}(T_N)), which has finite expectation, and the sum tends to 0 along with the partition size, P^h-a.s. By dominated convergence we obtain

E^h(M_c(t ∧ T_N) M_{(a,b)}(t ∧ T_N)) = 0 ,

and to conclude the existence proof it suffices to observe that, by using Fatou's Lemma,

lim_{N→∞} E^h(M(t) − M(t ∧ T_N))² = E^h M²(t) − lim_{N→∞} E^h M²(t ∧ T_N) = 0 ,

for any P^h-square-integrable martingale M(t) with respect to Z.

As to the uniqueness, since any two choices for M_d differ by a continuous martingale additive functional, it needs only be shown that such cannot be the E^h-mean-square limit of martingales of integrable total variation unless it is P^h-a.s. identically 0. The reader will readily check that the proof of orthogonality of M_c and M_{(a,b)} just given applies without change to any square-integrable martingales which are respectively continuous and of integrable total variation (where M^+(t) + M^-(t) is defined to be the total variation at time t). Hence the former cannot be approximated in the mean square by the latter, and the uniqueness is proved.

In the present section we make no further use of the packet U, except to remark that any class D right-continuous submartingale X
may be decomposed in the form X_t = E(X_∞ | G⁰_{t+}) − (E(X_∞ | G⁰_{t+}) − X_t), where the first process on the right is in U and the second is in V. For the elements of the packet V, we will derive the celebrated Doob-Meyer decomposition theorem as a theorem on Markov processes. It will be seen that this yields the corresponding decomposition result for X by expressing it in the above form.

Many proofs of the Doob-Meyer decomposition are known, and some are perhaps easier than ours. Nevertheless, ours seems worthwhile because it connects the decomposition with the prediction process, and provides additive functionals where the decomposition alone only provides unrelated pairs of processes. Besides, it does not use the theory of Lévy systems, and most of the work needed for the proof has already been done in [2, Chapter 4, Section 3] and therefore need not be repeated here. We let

Ω_V = {w_Z: w_Z(t) ∈ V for t ≥ 0 and w_Z(t-) ∈ V for t > 0} .

The result to be proved is as follows.

THEOREM 1.8.  There is a decomposition φ(Z_t) − φ(Z_0) = M(t) − A(t) on Ω_V, where A(t) is a (non-decreasing) additive functional of Z_t, Z_t-previsible for every h ∈ V, and M(t) is a uniformly integrable martingale additive functional. The decomposition is unique up to equivalence (i.e., P^h-a.s. for all h ∈ V).

PROOF.  The method of the proof is to write φ = φ_1 + φ_2 + φ_r, where the three terms on the right are class D potentials of Z_t on V; φ_1 corresponds to discontinuities of φ(Z_t) at which Z_t is continuous, φ_2 corresponds to discontinuities of φ(Z_t) with Z_{t-} ∈ H − H_0, and φ_r is a regular potential. The asserted decomposition is obtained separately for each of the three terms.

Recalling from Notation 1.4 that φ^-(t) is a Z_t-previsible process indistinguishable from the left-limit process of φ(Z_t), for fixed ε > 0 let

T = inf{t > 0: |Δφ(t)| I_{H_0}(Z_{t-}) > ε} .

Since φ(Z_t) is r.c.l.l. except on a null set, its jumps of size at least ε do not accumulate, and hence we see that on {T < ∞}, φ(Z_t) has a jump of size at least ε at t = T, while Z_t is continuous at t = T. Also, since T is a Z_t-stopping time (and a terminal time), it follows by Theorem 2.13 of Essay 1 that T is Z_t-previsible for each initial distribution μ. Then by the moderate Markov property

(1.12)  E^h(φ(Z_T) | Z_{T-}) = E^{Z_{T-}} φ(Z_0) = φ(Z_T),  P^h-a.s. on {T < ∞} .

Since φ(Z_t) is a supermartingale, the optional sampling theorem implies that φ(Z_T) ≤ φ^-(T), P^h-a.s. on {T < ∞} (see [4, VI, Part 1, Theorem 14]), and hence Δφ(T) ≤ 0 there. Letting T_1 = T and T_{n+1} = T_n + T ∘ θ_{T_n}, 1 ≤ n, it follows in the same way that Δφ(T_n) ≤ 0, P^h-a.s. on {T_n < ∞}, for all n. Next, by the same supermartingale property we see that

E^h | Σ_{n=1}^∞ Δφ(T_n) | ≤ E^h(φ(h) − lim_{t→∞} φ(Z_t)) = φ(h) .

As ε → 0+, the same facts are seen to hold for all such jumps. Hence we may introduce the process

A_{1,d}(t) = −Σ_{t_i ≤ t} Δφ(t_i) ,

the sum being over all t_i ≤ t with Z_{t_i-} = Z_{t_i} and Δφ(t_i) < 0 (in case this yields A_{1,d}(0+) = ∞, we set A_{1,d}(t) = 0 for all t). It is then clear that A_{1,d} is an additive functional, and we have

(1.13)  E^h A_{1,d}(∞) ≤ φ(h)  for all h ∈ V .

Moreover, for each ε the process A_ε(t) = −Σ_{n: T_n ≤ t} Δφ(T_n) is Z_t-previsible (since it is P^h-indistinguishable from such, and Z_t contains all P^h-null sets). Hence A_{1,d} is Z_t-previsible and equivalent to an additive functional. We now set φ_{1,d}(h) = E^h(A_{1,d}(∞)). It is immediately clear that φ_{1,d} is a potential of class D, with φ_{1,d} ≤ φ. Of course, the Doob-Meyer decomposition of φ_{1,d}(Z_t) for each P^h is given by the martingale E^h(A_{1,d}(∞) | Z_t) minus A_{1,d}(t). According to our construction, φ_{1,d} is Borel. Finally, let 0 < T be Z_t-previsible with Z_{T-} = Z_T on {T < ∞}. Then, A_{1,d}(∞) being even Z⁰-measurable, we have
(1.14)  Δφ_{1,d}(Z_T) = E^h(A_{1,d}(∞) | Z_T) − E^h(A_{1,d}(∞) | Z_{T-}) − ΔA_{1,d}(T) = −ΔA_{1,d}(T),  P^h-a.s. on {T < ∞} .

Consequently, φ_{1,d}(Z_t) contains all of the discontinuities of φ(Z_t) at times t where Z_{t-} = Z_t, and does not introduce any others of the same kind. Setting φ_2 = φ − φ_{1,d}, it follows that φ_2(Z_t) has its discontinuity time set contained a.s. in that of Z_t.

It is next to be shown that φ_2 is excessive, hence a potential of class D. Setting φ_ε(h) = E^h(A_ε(∞)), since φ_ε(h) increases to φ_{1,d}(h) as ε → 0+ it suffices to show that for each ε and t > 0,

E^h(φ(Z_t) − φ_ε(Z_t)) ≤ φ(h) − φ_ε(h) ,

or again that this holds with t ∧ T_n in place of t, for every n. Starting with n = 1, we have

(1.15)  E^h φ_ε(Z_{t ∧ T_1}) − φ_ε(h) = E^h(E^h(A_ε(∞) − A_ε(t ∧ T_1) | Z_{t ∧ T_1})) − E^h A_ε(∞)
        = −E^h A_ε(t ∧ T_1) = −E^h(ΔA_ε(T_1); T_1 ≤ t) .

On the other hand, by the previsibility of t ∧ T_1 and optional sampling for supermartingales,

φ(h) − E^h φ(Z_{t ∧ T_1}) = (φ(h) − E^h φ^-(t ∧ T_1)) − E^h(Δφ(t ∧ T_1))
        ≥ −E^h(Δφ(t ∧ T_1); T_1 ≤ t) = E^h(ΔA_ε(T_1); T_1 ≤ t) .

This finishes the case n = 1. Assuming the case n, and writing t ∧ T_{n+1} = (t ∧ T_n) ∧ T_{n+1}, it follows similarly that

(1.16)(a)  E^h(φ_ε(Z_{t ∧ T_{n+1}}) − φ_ε(Z_{t ∧ T_n})) = E^h(A_ε(t ∧ T_n) − A_ε(t ∧ T_{n+1}))
           = −E^h(ΔA_ε(T_{n+1}); T_{n+1} ≤ t) ,
(b)  φ(h) − E^h φ(Z_{t ∧ T_{n+1}}) = (φ(h) − E^h φ(Z_{t ∧ T_n})) + E^h(φ(Z_{t ∧ T_n}) − φ(Z_{t ∧ T_{n+1}}))
     ≥ (φ_ε(h) − E^h φ_ε(Z_{t ∧ T_n})) + E^h(ΔA_ε(T_{n+1}); T_{n+1} ≤ t) .

This proves the case n + 1, and hence the assertion. We note that only the previsibility of the T_{n+1} is used here, not the continuity of Z(t) at t = T_{n+1}.

We next compensate the accessible jump times of φ_2(Z_t). Since these are contained in those of Z_t, it follows by Theorem 2.13(ii) of Essay 1 that these are contained in the set of times where Z_{t-} ∈ H − H_0. By taking accessible parts of all the discontinuity times of Z_t, it is easy to see that {(t,w_Z): Z_{t-} ∈ H − H_0} is contained in a countable union of graphs of Z_t-previsible times, P^h-a.s. for each h. Then by [3, IV, 88 b)] this set is equal to a countable disjoint union of graphs of such times. Let (T_n) denote such a set, and for each n let (T_{k,n}, 1 ≤ k ≤ n) be defined by T_{k,n} = T_j on the set where exactly k among (T_1, ..., T_n) are less than or equal to T_j. Then the T_{k,n} are Z_t-previsible, and define a natural ordering of T_1, ..., T_n. We now set φ_2^-(t) = limsup_{s↑t} φ_2(Z_s), t > 0, and (letting ∞ − ∞ = 0) define

A_n(t) = Σ_{k: T_{k,n} ≤ t} (φ_2^-(T_{k,n}) − E^{Z_{T_{k,n}-}} φ_2(Z_0)) .

Since the last term on the right is a version of E^h(φ_2(Z_{T_{k,n}}) | Z_{T_{k,n}-}), it follows by the supermartingale property that 0 ≤ E^h(A_n(t)) ≤ E^h φ_2(Z_0) − E^h φ_2^-(t), and the A_n(t) are increasing in n for all t, and in t for each n, P^h-a.s. Thus we may define

(1.17)  A_{2,d}(t) = Σ_{t_i ≤ t} (φ_2^-(t_i) − E^{Z_{t_i-}} φ_2(Z_0)) ,

where the sum is over all t_i with Z_{t_i-} ∈ H − H_0 and 0 ≤ φ_2^-(t_i) − E^{Z_{t_i-}} φ_2(Z_0), if this gives A_{2,d}(0+) = 0, and A_{2,d}(t) = 0 elsewhere. Then A_{2,d} is an additive functional and, setting φ_{2,d}(h) = E^h A_{2,d}(∞), we have 0 ≤ φ_{2,d}(h) ≤ φ_2(h). Moreover, since φ_2 is Borel it is easily seen that A_{2,d} is Z_t-previsible for every h.
Then φ_{2,d} is a potential of class D, and for each h the Doob-Meyer decomposition of φ_{2,d}(Z_t) is given by the martingale E^h(A_{2,d}(∞) | Z_t) minus A_{2,d}(t).

We recall, now, that a potential φ of class D is called a regular potential if, for any increasing sequence T_n of Z_t-stopping times with T = lim_{n→∞} T_n, and any h ∈ V,

E^h lim_{n→∞} φ(Z_{T_n}) = E^h φ(Z_T),  where φ(Z_∞) = 0 .

The key fact needed to reduce Theorem 1.8 to standard methods of Markov processes is

LEMMA 1.8.  Set φ_r = φ_2 − φ_{2,d} = φ − φ_{1,d} − φ_{2,d}. Then φ_r is a regular class D potential.

PROOF.  We have seen that φ_r ≥ 0, and clearly lim_{t→0} φ_r(Z_t) = φ_r(h), P^h-a.s., and lim_{t→∞} E^h φ_r(Z_t) = 0. If we show that E^h φ_r(Z_t) ≤ φ_r(h), then φ_r is a potential of class D. To this effect, we need to repeat the argument used for φ_2, and for this we require the analogues of A_ε and φ_ε. Using the same symbols as before, we introduce

T = inf{t > 0: (φ_2^-(t) − E^{Z_{t-}} φ_2(Z_0)) I_{H−H_0}(Z_{t-}) > ε} ,

and observe that T > 0 a.s., while A_{2,d}(∞) < ∞ holds a.s.; and setting T_1 = T, T_{n+1} = T_n + T ∘ θ_{T_n}, we see that lim_{n→∞} T_n = ∞ a.s. Thus we may define as before

A_ε(t) = Σ_{n: T_n ≤ t} (φ_2^-(T_n) − E^{Z_{T_n-}} φ_2(Z_0)) ,

and observe that A_ε is equivalent to a Z_t-previsible additive functional, while φ_ε(h) = E^h(A_ε(∞)) increases to φ_{2,d}(h) as ε → 0+. It is now easy to check that the proof of (1.15) and (1.16) applies here with φ replaced by φ_2, showing that E^h φ_r(Z_t) ≤ φ_r(h).

Finally, let us prove the regularity. Let T_n be any sequence of Z_t-stopping times increasing to T ≤ ∞. Then clearly lim_{n→∞} E^h{φ_r(Z_{T_n}); T = ∞} = 0. On the other hand, over {T < ∞} there is no difficulty in passing to the limit on {T = T_n for large n}. Then setting S = {T_n < T for all n}, we have
(1.18)  lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); S)
        = lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); {Z_{T-} = Z_T} ∩ S)
          + lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(Δφ(T) − Δφ_{1,d}(T) − Δφ_{2,d}(T); {Z_{T-} = Z_T} ∩ S)
          + E^h(Δφ(T) − Δφ_{1,d}(T) − Δφ_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S) .

But on {Z_{T-} = Z_T} ∩ S we have by (1.14)

Δφ(T) = −ΔA_{1,d}(T) = Δφ_{1,d}(T) ,

while by (1.17) and the martingale property of E^h(A_{2,d}(∞) | Z_t) we have E^h(Δφ_{2,d}(T); {Z_{T-} = Z_T} ∩ S) = 0. Hence the first summand on the right vanishes. As for the second, on {Z_{T-} ∈ H − H_0} we have ΔA_{1,d}(T) = 0, and therefore E^h(Δφ_{1,d}(T); {Z_{T-} ∈ H − H_0} ∩ S) = 0, while by (1.17) and the moderate Markov property

E^h(Δφ(T) − Δφ_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(Δφ(T) + ΔA_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(φ_2(Z_T) − E^{Z_{T-}} φ_2(Z_0); {Z_{T-} ∈ H − H_0} ∩ S)
        = 0 .

This completes the proof of Lemma 1.8.

As mentioned at the beginning, the rest of the proof has already been done in a somewhat different context in [2, IV, Section 3]. It follows that there is a continuous increasing additive functional A_c(t) with φ_r(z) = E^z A_c(∞). The method used is that of Šur [15], together with refinements which reduce the problem to a bounded regular potential. The proof is unfortunately not short. Some simplifications can be made because the multiplicative functionals M of [2] are not present, and hence S and the S_p's of [2] are absent, but it does not seem merited to rewrite the proof. In the present case there may be branching points, but it can be checked that the proof in [2] makes no use of quasi-left-continuity of X and so applies also to Borel right processes (see Getoor [6, 9.] for the relevant information on hitting times and excessive functions).
In the extension argument of [2, p. 168] from bounded to unbounded potentials, use is made of the uniqueness theorem to the effect that A_c(t) is uniquely determined by its potential ([3, IV, (2.13)]). However, this is easy to see directly from martingale arguments. Thus if A_c^1 and A_c^2 are continuous additive functionals with φ_r(z) = E^z A_c^1(∞) = E^z A_c^2(∞), then for each h the identity

E^h(A_c^1(∞) − A_c^1(t) | Z_t) = E^h(A_c^2(∞) − A_c^2(t) | Z_t)

implies that A_c^1(t) − A_c^2(t) is a continuous, (P^h, Z_t)-martingale of bounded variation. By arguments given in the proof of Theorem 1.6 (since the martingale may be stopped at any N) this implies that A_c^1 − A_c^2 is indistinguishable from the zero martingale. But the same reasoning applies if A_c^1 and A_c^2 are only assumed to be Z_t-previsible. Indeed, a previsible right-continuous martingale M is continuous: otherwise, since M − M_- is a previsible process, a bounded previsible stopping time T could be found with P^h{M_T ≠ M_{T-}} > 0, leading to immediate contradiction with the fact that M_T is Z_{T-}-measurable ([3, IV, 67]). It follows that any decomposition of the type asserted by Theorem 1.8 is unique up to equivalence. Hence we must have

A(t) = A_{1,d}(t) + A_{2,d}(t) + A_c(t) ,
M(t) = φ(Z_t) − φ(Z_0) + A(t) ,

and the proof is complete.
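The uniqueness rests on the elementary fact, used twice above, that a continuous martingale of bounded variation vanishes; along partitions t_i^n of [0,t] (after stopping so that the total variation is bounded):

```latex
% If M is a continuous martingale, M_0 = 0, with bounded variation
% Var_{[0,t]}(M) on [0,t], then by orthogonality of increments
\[
E\,M_t^2 \;=\; \lim_{n} \sum_i E\big(M_{t_{i+1}^n} - M_{t_i^n}\big)^2
\;\le\; \lim_{n} E\Big( \sup_i \big|M_{t_{i+1}^n} - M_{t_i^n}\big|
\cdot \mathrm{Var}_{[0,t]}(M) \Big) \;=\; 0 ,
\]
% the last limit vanishing by continuity and dominated convergence.
```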
2.  TRANSITION TO THE INITIAL SETTING: THE LEVY SYSTEM OF A PROCESS.
In order to translate results back and forth between the prediction process setting and their original setting, it is useful to examine more carefully the connection of (Ω, G_t) and (Ω_Z, Z_t). Since the connections we have in mind are completely general, not requiring any restriction on the probabilities, we return temporarily to the notations of Essay 1, Section 2. Thus Ω_Z is the space of all paths w_Z(t) ∈ H_0 for all t ≥ 0 which are right-continuous with left limits in H, φ(h) is the function of Definition 1.8 (rather than only its first coordinate), and φ^-(t) denotes limsup_{s↑t} φ(Z_s) coordinatewise. As remarked at the end of Essay 1, (Ω_Z, Z_t) may be topologized as a coanalytic subset of a Lusin space. While neither this nor the following assertion is essential to the development here, it is worthwhile to have them on record.
PROPOSITION 2.1.  Let Ω_φ = {w_Z: φw_Z(t) is r.c.l.l. in R_∞ with the product topology}, where φw_Z(t) = φ(w_Z(t)), t ≥ 0. Then we have Ω_φ ∈ Z⁰, and φ(Ω_φ) = {all r.c.l.l. paths in R_∞}.

PROOF.  Since φ is a Borel function, the components of φw_Z(t) are Z⁰_t-progressively measurable. We now apply the results of [3, IV, 17], according to which the two processes defined as the right limsup and liminf of φ(w_Z(r)) along rational r > t, r ↓ t, are Z⁰_{t+}-progressively measurable, and the two processes defined as the left limsup and liminf along rational r < t, r ↑ t, are Z⁰_t-progressively measurable in t > 0. The condition w_Z ∈ Ω_φ is simply that the two right-limit processes should equal φ(w_Z(t)) and the two left-limit processes should equal each other. Since φ(w_Z(t)) is also Z⁰_t-progressively measurable, these conditions define a set in Z⁰.

To see that φ(Ω_φ) = {all r.c.l.l. paths}, we fix w ∈ Ω and let h_w be the unit probability at w. Then h_w ∈ H, and so we can define its prediction process Z^{h_w} as in Essay 1, Section 1. By Theorem 1.9 there, we have P^{h_w}{φ(Z_t^{h_w}) = X_t for all t} = 1, meaning in the present case that the even coordinates w_{2n}(t) of w are identical with those of φ(Z_t^{h_w}) at w. Since Z^{h_w} at w defines an element of Ω_Z, and any r.c.l.l. path X_t is obtained as X_t = (w_{2n}(t)) from a w ∈ Ω, the assertion follows.

The mapping φ on Ω_Z is not one-to-one. In fact, since P^h(Ω_φ) = 1 for every h ∈ H (as usual, we use the same notation P^h for measures on Ω_Z or on Ω), if φ were one-to-one then the prediction process Z^h on Ω_φ would not depend on h except for null sets, which is absurd. Thus we cannot use φ on Ω_Z to transfer a process on Ω_Z to one on Ω. Instead, we must reduce Ω_Z to a subset depending on h. Thus, given h and a particular choice of Z^h on (Ω, G_t), we can regard Z^h_{(·)} as a measurable mapping of (Ω, F_t) into (Ω_Z, Z_t). Then the set Ω_h = {w ∈ Ω: φZ^h_{(·)}(w) = w} is in F, we have Z^h_{(·)}(Ω_h) ∈ Z, and φ is one-to-one on Z^h_{(·)}(Ω_h). Also P^h(Ω_h) = 1, and hence we can use φ to transfer objects from Ω to Ω_Z, except for an h-null set. In the cases of interest here the problem is to go in the other direction, from Ω_Z to Ω, and this presents almost no difficulty. Thus we now define the concept of a Lévy system for any h ∈ H, and obtain its existence and properties from Theorem 1.3 and Corollary 1.3.

DEFINITION 2.2.  A Lévy system for h consists of kernels N_Z and M_Z on H_0 (as in (1.5) with H_A = H), and F_t-previsible, increasing processes H^h_s and L^h_s on Ω, with H^h continuous, L^h pure-jump, H^h_0 = L^h_0 = 0, and E^h H^h_s < ∞, E^h L^h_s < ∞, such that for 0 ≤ f(z_1,z_2) ∈ H × H, f(z,z) = 0, we have

(2.1)(a)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) I{Z^h_{s-} ∈ H_0} = E^h ∫_0^t dH^h_s ∫_{H_0} N_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

(b)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) I{Z^h_{s-} ∈ H − H_0} = E^h ∫_0^t dL^h_s ∫_{H_0} M_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

(c)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) = E^h ∫_0^t dH̃^h_s ∫_{H_0} Ñ_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

where H̃^h = H^h + L^h and Ñ_Z = N_Z + M_Z.

DISCUSSION.  Obviously a Lévy system for h does not depend on the particular choice of Z^h. Furthermore, when a Lévy system exists then the proof of Corollary 1.3 carries over without difficulty to show that, for any F_t-previsible process y_t with values in (R_∞, B_∞), (2.1) remains true if f(Z^h_{t-}, z) is replaced by f(y_t, z) I{Z^h_{t-} ≠ z}. In particular, for y_t = X_{t-} we may replace this by f(y_t, φ(z)) I{Z^h_{t-} ≠ z} to obtain the following. For 0 ≤ f(x_1,x_2) ∈ B_∞ × B_∞ with f(x,x) = 0, we have

(2.1)(d)  E^h Σ_{0<s≤t} f(X_{s-}, X_s) I{Z^h_{s-} ≠ Z^h_s} = E^h ∫_0^t dH̃^h_s ∫_{H_0} Ñ_Z(Z^h_{s-}, dz) f(X_{s-}, φ(z)) .

Analogous statements also hold corresponding to (2.1)(a) and (b). In other words, the Lévy system compensates the jumps of X which coincide
94
FRANK B. KNIGHT
in time with jumps of be expected.
Z
.
On the other hand, this is the most that could
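A discrete-time analogue may clarify what the identities (2.1) assert: for a finite Markov chain, the expected sum of f over the jumps equals the expectation of a "compensator" built from the transition kernel, which is the elementary counterpart of compensating jumps by a Lévy system. The two-state chain and the function f below are hypothetical choices made only for this sketch.

```python
from itertools import product

# Discrete analogue of (2.1)(c): for a Markov chain with kernel P,
# E[ sum_k f(X_{k-1}, X_k) ] = E[ sum_k sum_y P(X_{k-1}, y) f(X_{k-1}, y) ].
# Verified exactly by enumerating all paths of a (hypothetical) 2-state chain.

P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

def f(x, y):
    return 1.0 if x != y else 0.0   # f vanishes on the diagonal, as in (2.1)

def jump_sum_and_compensator(n_steps, start=0):
    lhs = rhs = 0.0
    for path in product((0, 1), repeat=n_steps):
        prob, x = 1.0, start
        js = comp = 0.0
        for y in path:
            comp += sum(P[x][z] * f(x, z) for z in (0, 1))  # kernel average
            js += f(x, y)                                   # realized jumps
            prob *= P[x][y]
            x = y
        lhs += prob * js
        rhs += prob * comp
    return lhs, rhs
```

The equality of the two sides is exact here (it is the tower property of conditional expectation), whereas in the continuous-time statement (2.1) the kernel average is integrated against the previsible clock dĤ^h.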
By Essay I, Theorem 2.13(i), a time of the form

    T = inf{t > 0: |Δφ_t| I(Z_{t-} = Z_t) > ε}

on Ω̂ is Ẑ_t-previsible, where |Δφ_t| denotes the magnitude of the jump φ(Z_t) - φ(Z_{t-}) of X over R, in any convenient Borel metric. By the moderate Markov property, iteration of T, and letting ε → 0, it follows in the usual way that for f(x, x) = 0 the processes Σ_{0<s≤t} f(φ(Z_{s-}), φ(Z_s)) I(Z_{s-} = Z_s) are Ẑ_t-previsible (where Ẑ_t is as before). Therefore, these processes are their own compensators, and the Lévy system is irrelevant. It will be seen easily from the proof to follow that this fact translates into the F⁰-previsibility of Σ_{0<s≤t} f(X_{s-}, X_s) I(Z^h_{s-} = Z^h_s), hence there is no need to compensate these discontinuities.

THEOREM 2.3. A Lévy system exists for any probability h on (Ω, G⁰).

PROOF.
All that needs to be done is to define, from Theorem 1.2,

    H^h_t(w) = H_{Z,t}(Z^h_(·)(w))  and  L^h_t(w) = L_{Z,t}(Z^h_(·)(w)),

and to show that they are F⁰-previsible processes. More generally, let us show that if Y_t is any real-valued, Ẑ_t-previsible process with P^h{Y_0 = 0} = 1, then Y_t(Z^h_(·)(w)) is F⁰-previsible. Since, by definition, the previsible σ-field is generated by the left-continuous adapted processes (if we take F_t = F_{t+}; see [3, IV, 61]), and this class is closed under linear and lattice operations, it will suffice to consider the case of left-continuous Y_t (without assuming P^h{Y_0 = 0} = 1). Then Y_t(Z^h_(·)(w)) is left-continuous, and we need only show it is F^h-adapted, or again, that for S ∈ Ẑ_t we have {w: Z^h_t(w) ∈ S} ∈ F^h_{t+}. Let S_0 ∈ Ẑ⁰_t be such that P^h(S_0 Δ S) = 0. Then because Z^h is F⁰_{t+}-progressively measurable, it follows that {w: Z^h_t(w) ∈ S_0} ∈ F⁰_{t+}. Also, by definition of P^h on Ẑ, we have P^h{w: Z^h_t(w) ∈ S_0 Δ S} = P^h(S_0 Δ S) = 0. Consequently, {w: Z^h_t(w) ∈ S} ∈ F^h_{t+}, and so Y_t(Z^h_(·)(w)) is F^h-adapted. But by left-continuity this is equivalent to F^h_t-adaptedness for t > 0, and at t = 0 we have the extra hypothesis needed to complete the proof.
In the more specialized contexts of martingales or class D supermartingales, we have the decomposition results corresponding to Theorems 1.6 and 1.8. Here we return to the notation of Section 1: X_t and φ(z) denote only their first coordinates, while F⁰ and G⁰ coincide.

THEOREM 2.4. Let P be any G⁰-square-integrable martingale probability for X_t = w(t) on (Ω, G⁰). Then there is a decomposition X_t = X^c_t + X^d_t, where X^c is a continuous, F^P-square-integrable martingale with X^c_0 = 0, X^d is a square-integrable martingale which is a P-mean-square limit of martingales of bounded variation, and X^c_t X^d_t is a martingale. The decomposition is unique up to a P-null set.

PROOF. Choosing Z^P as any version of the prediction process of P on Ω, we first write X^c_t(w) = M_c(t, Z^P_(·)(w)) and X^d_t(w) = X_0 + M_d(t, Z^P_(·)(w)), where M_c and M_d are the martingales of Theorem 1.6 on Ω̂_M. These definitions are meaningful except for a P-null set since P{Z^P_(·) ∈ Ω̂_M} = 1. Moreover, X^c and X^d are right-continuous, and by arguing just as in the proof of Theorem 2.3 it follows that they are F^P-adapted. Further, we recall from the Remark to Theorem 1.9 of Essay I that F^P_{t+} is P-equivalent to the σ-field generated by {Z^P_s, s ≤ t}, whence, by the Markov property of Z^P and the martingale property of M_c,

(2.2)  E^P(X^c_{s+t} | F^P_{t+}) = X^c_t,  P-a.s.

The same reasoning applies to X^d, and to X^c X^d. Finally, the mean-square approximation obviously transfers from Theorem 1.6 in the same way, and the uniqueness proof was already formulated for fixed P. We have only to apply it to X_t - X_0 = X^c_t + (X^d_t - X_0) to complete the derivation.
Turning to the specialization of Theorem 1.8, we obtain the Doob-Meyer decomposition of a class D, right-continuous submartingale Y_t by writing

    Y_t = E(Y_∞ | G⁰_{t+}) - (E(Y_∞ | G⁰_{t+}) - Y_t),

and noting that the last term is a class D potential. Then we have only to decompose the last term, as follows.
THEOREM 2.5. Let P on (Ω, G⁰) be such that X_t is a class D potential. Then there is a P-a.s. unique decomposition X_t = M_t - A_t, where M_t is a G^P-uniformly integrable, right-continuous martingale, and A_t is a G^P-previsible increasing process with A_0 = 0.

NOTE. Unlike the decomposition of Theorem 2.4, the present components depend on the choice of the σ-fields, and may be altered if one replaces them by the σ-fields generated by X_t. Hence we denote them by G^P instead of F^P, although in the present notation these are actually identical.

PROOF. We recall again that the uniqueness part of the proof of Theorem 1.8 was in no way Markovian, and applies here without change. For the existence, we set M_t(w) = M(t, Z^P_(·)(w)) and A_t(w) = A(t, Z^P_(·)(w)), where A and M on the right are from Theorem 1.8, and Z^P is any fixed choice of the prediction process of P on Ω. The proof of Theorem 2.3 shows that, since A_0(w) = 0, A_t is G^P-previsible. Of course, since E^P A_∞ < ∞ it is clear that M is uniformly integrable, hence it need only be shown that it is a G^P-martingale. However, this follows by the same reasoning as (2.2), completing the proof.
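An elementary finite-dimensional sketch of the Doob-Meyer mechanism (discrete time, not the previsible continuous-time construction above): for a simple ±1 random walk S_n, the submartingale X_n = S_n² decomposes as X_n = M_n + A_n with predictable increasing part A_n = n and martingale part M_n = S_n² - n, and the martingale property can be verified exactly by enumeration.

```python
from itertools import product

# Discrete Doob decomposition of X_n = S_n**2 for a +/-1 random walk:
# A_n = n (predictable, increasing), M_n = S_n**2 - n (martingale).

def doob_check(n_steps):
    # verify E[M_{n+1} | F_n] = M_n on every path prefix, by enumeration
    for n in range(n_steps):
        for path in product((-1, 1), repeat=n):
            s = sum(path)
            m_now = s * s - n
            # average over the two equally likely next steps
            m_next = sum((s + e) ** 2 - (n + 1) for e in (-1, 1)) / 2
            if m_next != m_now:
                return False
    return True
```

The computation behind the check is one line: E[(s + e)²] = s² + 1, so the compensator must increase by exactly 1 per step.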
3. ON CONTINUOUS LOCAL MARTINGALES.

We pass over any examination of general local martingales or semimartingales. These are treated at length in [4], and it is not clear whether our Markovian approach has anything to add. The continuous local martingales provide not only a simpler application, but also one in which the method of time changes can be aptly illustrated in a prediction process setting. In point of detail, we avoid the "adjoined Brownian motions" of the usual time-change result (as in H. Kunita, S. Watanabe [10], for example). In the last part of the section, we specialize further to the continuous local martingales which are autonomous germ-Markov processes, as defined in [9]. These generalize the one-dimensional diffusions in natural scale, and perhaps should be called germ-diffusion processes in natural scale. However, this would be misleading in that no reduction of the general germ-diffusion to a scale in which it is a local martingale is possible.

We continue with the notations of Section 1: X_t(w) = w(t) = φ(Z^P_t), etc., but our starting point is the prediction space of all continuous local martingales. We recall that "A_t is a continuous local martingale" means that if T_N = inf{t: |A_t| > N}, then A_{t∧T_N} is a continuous martingale for every N > 0. Actually, we will deal somewhat more generally with processes which are continuous local martingales given their initial values. Thus we do not require that the initial value have finite
expectation.

PROPOSITION 3.1. Let L_c ⊂ H be the set of all probabilities h on (Ω, G⁰) such that (φ(Z_t) - φ(Z_0), Ẑ_t) is a P^h-continuous local martingale. Then L_c is a complete Borel packet of the prediction process.

NOTE: It is assumed that, P^h-a.s., φ(Z_0) ≠ ±∞.

PROOF. In the first place, since φ(Z_t) is r.c.l.l. for any h ∈ H, P^h-a.s. the condition that φ(Z_t) be continuous is the same as the vanishing of φ(Z_{t-}) minus the left limit of φ at t. As seen in the proof of Proposition 2.1, this is a Borel condition on h. By the usual optional section theorem argument this implies that {h: φ(Z_t) is P^h-a.s. continuous} is a Borel packet, and the moderate Markov property plus the previsible section theorem show that this packet is complete. In the definition of local martingale, we can redefine T_N by T_N = inf{r ∈ Q: |φ(Z_r)| > N}, which is Ẑ_t-measurable. Then we see that φ(Z_{t∧T_N}) - φ(Z_0) is Ẑ_t-adapted, and in conjunction with the continuity and boundedness of φ(Z_{t∧T_N}) the martingale condition becomes E^{Z_{r₁}} φ(Z_{r₂-r₁}) = φ(Z_0), P^h-a.s. over {r₁ < T_N}, for 0 < r₁ < r₂ ∈ Q. Hence the set of continuous local martingale probabilities for φ(Z_t) - φ(Z_0) is in H. To show that it is a packet, for Ẑ-optional T < ∞ we can replace the martingale condition by the same condition on {r₁ < T_N ∧ T}, 0 < r₁ < r₂ ∈ Q, together with the conditions E^{Z_{T+}} φ(Z_{r₃}) = φ(Z_{T+}), 0 < r₃ ∈ Q, since these
imply that over {T < r₁} ∩ {T + r₃ < T_N} one has E^{Z_{T+}} φ(Z_{r₃}) = E^h(φ(Z_{T+r₃}) | Ẑ_{T+}) = φ(Z_{T+}), using Hunt's Lemma for conditional expectations in the first equality. Then it follows immediately that Z_T is P^h-a.s. in L_c. By the optional section theorem this implies that L_c is a Borel packet. The completeness then follows because, for previsible 0 < T < ∞, we have P^h{Z_T ∈ L_c} = E^h(P^{Z_T}{Z_0 ∈ L_c}) = 1.

We require the following lemma, which was first proved for Hunt processes and square-integrable martingales by H. Kunita and S. Watanabe [10]. But our situation is different, and we prefer to use again the argument of M. G. Šur (see [2, Chapter 4, Section 3]). From now on, we denote φ(Z_t) - φ(Z_0) by A_t.
LEMMA 3.2. There is a unique continuous (non-decreasing) additive functional τ(t) of Z_t on Ω̂_c such that, for h ∈ L_c, A²_t - τ(t) is a continuous local martingale.

PROOF. For each h ∈ L_c and N, A²_{t∧T_N} is a bounded submartingale, and by Theorem 2.5 it has a Doob-Meyer decomposition

    A²_{t∧T_N} = E^h(A²_{T_N} | Ẑ_t) - E^h(τ_N(∞) | Ẑ_t) + τ_N(t),

for an increasing previsible process τ_N(t), depending on h, with τ_N(0) = 0. Thus A²_{t∧T_N} - τ_N(t) is a uniformly integrable martingale, and since A²_{t∧T_N} is continuous while τ_N(t) is previsible, it follows easily that τ_N(t) is continuous along with A_{t∧T_N}, and of course τ_N(·) is unique up to a P^h-null set. Furthermore, τ_N(t) is a.s. constant for t > T_N, hence we can define τ(t) = limsup_N τ_N(t) to obtain a Ẑ_t-adapted, P^h-a.s. continuous, non-decreasing process such that A²_t - τ(t) is a continuous P^h-local martingale.

It now remains only to modify τ(t) to obtain an additive functional, but the argument we follow would also suffice to define τ(t) from scratch. We first observe that T_N is a terminal time of Z_t
on Ω̂_c. Indeed, since φ(Z_t) is continuous and B_N = {z ∈ L_c: |φ(z)| > N} is Borel, we may set T_N = inf{r ∈ Q: φ(Z_r) ∈ B_N}, which is Ẑ_t-measurable. Then E_N = L_c ∩ {h: P^h(T_N > 0) = 1} is a Borel set, and we define Z^N_t as the process obtained by killing Z_t at T_N: Z^N_t = Z_t for t < T_N, and Z^N_t = Δ for t ≥ T_N, where Δ is adjoined with P^Δ(Z_t = Δ for all t) = 1. Then with the Borel transition function derived from that of Z_t, Z^N becomes a right process on E_N ∪ Δ. We show next that
e_N(h) = E^h(τ_N(∞)), h ∈ E_N, and e_N(Δ) = 0, defines a bounded regular potential for Z^N. Indeed, by the optional stopping theorem we have e_N(h) = E^h A²_{T_N}, h ∈ E_N. Next we note that there is no difficulty extending the additive functional property of A to stopping times. Then for any stopping time S ≤ T_N we have

(3.1)  E^h(A²_{T_N}) = E^h((A_S + A_{T_N} ∘ θ_S)²) = E^h(A²_S) + E^h(E^{Z_S} A²_{T_N}).

Setting first S = t ∧ T_N, we obtain E^h(E^{Z^N_t} A²_{T_N}) ≤ e_N(h). Since E^h A²_{t∧T_N} decreases to 0 as t → 0+, it follows that e_N(h) is an excessive function for Z^N. On the other hand, since A²_{t∧T_N} is a bounded submartingale we have lim_{t→∞} A_{t∧T_N} = A_{T_N}, P^h-a.s. (or more
precisely, the right side is defined by the left on {T_N = ∞}). Thus by dominated convergence we have lim_{t→∞} E^h(E^{Z^N_t} A²_{T_N}) = 0, and so e_N is a bounded potential. Finally, let S_n < T_N increase to a limit S. Then by (3.1) with S = S_n we have

    lim_{n→∞} [E^h e_N(Z^N_{S_n}) - E^h e_N(Z^N_S)] = lim_{n→∞} E^h(A²_{S∧T_N} - A²_{S_n∧T_N}) = 0,

proving that e_N is a regular potential. It follows by the argument of Šur [15] that there exists a continuous additive functional τ_N(t) of Z^N_t with potential e_N. Of course, for each h ∈ E_N this τ_N(t) is P^h-equivalent to the one obtained by the Doob-Meyer decomposition, since their difference is a continuous martingale of bounded variation. Now we have τ_N(s+t) = τ_N(t) + τ_N(s) ∘ θ_t, P^h-a.s. on {s+t < T_N}. To complete the proof we let N → ∞ and note that T_N → ∞, P^h-a.s. for h ∈ L_c. It is well-known that
A(t) may be reduced in some sense to a Brownian motion by the time change inverse to τ(t) (see for example H. Kunita and S. Watanabe [10], Theorem 3.1). We wish to formulate a result of this type in the present context, and it will be useful to make a slight enlargement of the σ-fields Ẑ_t to cope with the case when P^h{lim_{t→∞} τ(t) < ∞} > 0.

LEMMA 3.3.
Let h ∈ L_c, M = sup_t τ(t), and T = inf{t: τ(t) = M}, where both M and T are permitted the value +∞. Let Ẑ*_t denote Ẑ_t ∨ σ(T) on {T ≤ t}; that is, the σ-field generated jointly by Ẑ_t, the atom {T > t}, and the trace of σ(T) on {T ≤ t}. The family Ẑ*_t is non-decreasing, and both A(t) and A²(t) - τ(t) are P^h-continuous local martingales relative to Ẑ*_t.

NOTE: T is not a stopping time of Ẑ_t, but it is not hard to see that Ẑ*_t is right-continuous in t.

PROOF.
that for
0 < r
< r
e Q,
P {τ(r_) = τ(r o )} = P {τ(r) = τ ( r o ) ; A(r_) = A(r o )} 1 2 1 2 1 2 whence we obtain without difficulty that
P {A(t) = A(T)
for all
t > T> = 1
ESSAYS ON THE PREDICTION PROCESS
Next, observe that any
S € Z
S = (S χ Π {Tt}) s > 0,
with
T
101
may be written in the form
with
S± e 2 t ,
= inf t: |φ(Z )| > N}
i = 1 or 2 .
Then for
we have trivially
EZ(A((s+t) Λ T N ) ; S) = E Z (A(t Λ T ); S. Π {T
S o Π {T>t}) . 2
But the last term on the right becomes EZ(A((t+s) Λ T N ) ; S 2 ) - E Z (A((t+s) Λ T N )
S 2 Π {T
= E Z (A(t Λ T N ) ; S 2 Π {T>t}) , by the martingale property and the same reasoning as before. two terms yields the local martingale property of 2 The case of
A (t) - τ(t)
A(t)
Adding the
relative to
Z
is clearly analogous.
This is the key step; the rest is somewhat routine and we will omit some details.
Set
τ~ (t) = inf{s: τ(s) > t} with inf(φ) = «> . A -1 * routine check shows that τ (t) Λ T is a stopping time of Z. . Let Z denote the usual indicated σ-fields, thus S Ξ Z τ (t) Λ T _λ τ (t) Λ T A means that for c < ° ° s Π { τ (t) Λ T < c} e Z . Then we have {M < d} = ίτ^Cd) = oo} = { τ " 1 (d) Λ T = T} , from which it follows easily that
M
is a stopping time
Z τ " (t) Λ T
The theorem we wish to prove is as follows.

THEOREM 3.4. For h ∈ L_c, the process B(t ∧ M) = A(τ⁻¹(t) ∧ T) is a Brownian motion adapted to Ẑ*_{τ⁻¹(t)∧T}, stopped at time M. The times τ(s) are stopping times of Ẑ*_{τ⁻¹(t)∧T}, and A(t) = B(τ(t) ∧ M) for all t, P^h-a.s.

REMARK. It is a simple matter to see that B(t ∧ M) remains constant for t ≥ M, so that our notation is consistent. It also would not be difficult to adjoin an auxiliary independent Brownian motion and continue B(t ∧ M) beyond time M as an unstopped Brownian motion (as in [10]), but since M is a stopping time the meaning is clear without this step.

PROOF. The adaptedness and measurability assertions are again routine, and left to the reader. Since τ⁻¹(τ(t)) = ∞ for t > T, while A(τ⁻¹(τ(t))) = A(t) = A(t ∧ T) otherwise, the last assertion is clear.

By a characterization theorem of J. L. Doob ([5, VII, Theorem 11.9]), to show that B(t ∧ M) is stopped Brownian motion relative to Ẑ*_{τ⁻¹(t)∧T} becomes equivalent to showing that both B(t ∧ M) and B²(t ∧ M) - (t ∧ M) are martingales relative to Ẑ*_{τ⁻¹(t)∧T}. By Lemma 3.3 and the optional sampling theorem, they are plainly local martingales. Since t ∧ M is bounded by t, the second is then clearly a martingale. Thus E B²(t ∧ M) is finite, hence so is E sup_{s≤t} |B(s ∧ M)|. Then by dominated convergence B(t ∧ M) is a martingale as well.
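The content of the theorem can be illustrated by a crude numerical sketch (an illustration only, under assumptions of our own: a deterministic integrand sigma standing in for the local martingale's rate, and a fixed seed). For A(t) built as a stochastic integral against Brownian increments, the clock is τ(t) = ∫ σ² ds, and the time-changed path A(τ⁻¹(u)) should accrue quadratic variation at unit rate in the new time u.

```python
import numpy as np

# Discretized time change: A(t) = int sigma dB, tau(t) = int sigma^2 ds,
# and B~(u) = A(tau^{-1}(u)) behaves like a Brownian motion in u.

rng = np.random.default_rng(0)
dt, n = 1e-4, 200_000
t = np.arange(n) * dt
sigma = 1.0 + 0.5 * np.sin(t)                        # hypothetical integrand
dB = rng.normal(0.0, np.sqrt(dt), n)
A = np.concatenate(([0.0], np.cumsum(sigma * dB)))   # the local martingale
tau = np.concatenate(([0.0], np.cumsum(sigma ** 2 * dt)))

u = np.linspace(0.0, 0.99 * tau[-1], 1000)           # grid in the new time
B_tilde = A[np.searchsorted(tau, u)]                 # A(tau^{-1}(u))
qv = float(np.sum(np.diff(B_tilde) ** 2))            # ~ elapsed new time u[-1]
```

The realized quadratic variation of the time-changed path tracks the elapsed new time, which is the discrete shadow of B²(t ∧ M) - (t ∧ M) being a martingale.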
While Theorem 3.4 provides a rough outline of the process A(t), it conceals a variety of possibilities which emerges only when we introduce further assumptions. For convenience, we let L_d denote a complete Borel subpacket of L_c such that, for z ∈ L_d, P^z{M = ∞} = 1. Then it is clear from the theorem that A(t) is unbounded above and below, P^z-a.s. for z ∈ L_d. If we assume that (φ(Z_t), Ẑ_t) is a homogeneous strong-Markov process, then it follows from well-known facts that it must be, for each z, a regular diffusion in the natural scale on (-∞, ∞). Then the representation of Theorem 3.4 becomes B(t) = A(τ⁻¹(t)), and there is a unique measure m(dx), positive on open intervals, such that we have

(3.2)  t = ∫_{-∞}^{∞} s(τ(t), y) m(dy),

where s(t, y) = (d/dy) ∫₀ᵗ I_{(-∞,y)}(B(u)) du is the local time of B(t), jointly continuous in (t, y) outside a P^z-null set for each z. Here m(dx) is the "speed measure," and does not depend on z. The theory
(K Π K ) c H Q
by the functions
such that the trace of
z(S), S ^ G
H
on
(and hence, since
K Π K H
is generated
is countably
U"Γ
generated, by
z(S n )
for a sequence
S
e G° + ) .
The germ-Markov
property, as well as homogeneity in time, then follow for described below. i-c Π K ^ Φ,
But first we replace
KlΊκ
by
L
z ^ K Π K
Π K Π K
to obtain the packet of a continuous martingale autonomous
germ-Markov process.
as
assuming
As discussed in [9], for such a process Z_t the future and past of φ(Z_t) are conditionally independent given the present germ σ(Z_t(S), S ∈ G⁰_{0+}), and this is equivalent to conditional independence of the future and past of φ(Z_t) given the germ ∩_{ε>0} σ(φ(Z_s), 0 ≤ s < ε). Moreover, such a process φ(Z_t) has a stationary transition mechanism (a function of the "germ-state"), hence from an empirical viewpoint it perhaps may be said to arise from essentially the same conditions as if it were a homogeneous Markov process, i.e., in this case a regular diffusion. However, most of the resemblance ends here, as we now indicate. In the first place, unlike the case of diffusion where Z_t and φ(Z_t) may be identified, here the process Z_t may be discontinuous, and the discontinuities of Z_t may greatly affect the behavior of φ(Z_t). We give two such examples: in the first the discontinuities of Z_t are totally inaccessible, while in the second they are previsible.
Let
P
and let
be the probabilities of a Brownian motion
e , e , ... be independent, exponential random
variables with parameter Define a process
X(t)
1,
independent of
B(t)
for each
fB(t+T2n)?
T
2n±
t < T
2n+l < T
is a
T
= 0
T
= Σ?
e., n > 1 .
2(n+l)
Clearly (for each
(t) Ξ 0
for
P
on k 7* 2,
(Ω,G
, F
p X -a.s.
X(t), θ )
X(t)
in such a way that
so that we may assume
•K
= F°
Then we introduce the prediction space and process
probabilities
P
are of two kinds: Z
P {Z
= Z
P {Z
ψ ZQ} = 1
according as the process φ(Z ) ,
G° t"τ"
before.
Z
ί
Π K Π K
P
Z
as
t"Γ
Z ,
where the
either
for all small for each
, s > 0,
t} = 1, t,
or
respectively,
starts during a "level stretch" of
or during a "Brownian stretch" of
corresponds a distinct write
x)
P -continuous martingale, and we may introduce corresponding
probabilities w
and
x .
on the same probability space by
X(t)
where
B(t),
φ(Zfc) .
for each initial point
= {z(l,x), z(2,x); -°° < x < °°}
In each case there x,
so that we can
for the prediction
state space.
It is not hard to see that this does define a packet for
which
is an autonomous germ-Markov process and
φ(Z )
continuous martingale for each
P
.
The times
T
φ(zt)
is
a
become totally
inaccessible stopping times, and the character of φ(Z_t) changes abruptly at each T_n. Of course, such exponential holding times are impossible for diffusions because of the strong-Markov property. Here, the strong-Markov property holds for Z_t because Z_t contains the "information" that a level stretch has just ended or begun, but it does not hold for φ(Z_t).

EXAMPLE 3.6. Let P^x be Brownian probabilities for B(t) as before, and let B₁(t) be an independent "instantaneous return" Brownian motion on [0,1) for the same probability space, so that when B₁(t-) = 1, then B₁ returns instantaneously to 0, and we assume that 0 is a reflecting boundary for B₁ in the usual sense. Let P^{x,y} be the joint probabilities for (B, B₁) with B(0) = x and B₁(0) = y, and assume further a sequence Q₁, Q₂, ... of independent Bernoulli random variables with P{Q_k = a or b} = 1/2 for all (x, y), where a ≠ b are two strictly positive constants. We consider a process X(t) = B(τ(t)), where τ(t) is defined as follows. Let T₁, T₂, ... be the successive instantaneous return times of B₁(t) to 0 from 1-. Then we set τ(t) = Q₁ ∫₀ᵗ B₁(s) ds for 0 ≤ t < T₁, and for n ≥ 1 we define inductively

    τ(t) = τ(T_n) + Q_{n+1} ∫_{T_n}^{t} B₁(s) ds  for  T_n ≤ t < T_{n+1}.
Here the corresponding prediction state space is identified by triples z = (x, y, c) in R × [0,1) × {a, b}, where x = B(0), y = B₁(0), and c = Q₁. It is not hard to recognize that this leads to a Borel packet of the prediction process for which φ(Z_t) is autonomous germ-Markov and a continuous martingale for each P^z. Here the times T_n are previsible stopping times, since they occur when the "rate" dτ(t) = Q_n B₁(t) dt reaches its maximum Q_n on each cycle. Also, Z_t has a previsible jump at each T_n, since the value of Q_{n+1} is not determined by Z_{T_n-}, but is determined by Z_{T_n} (Z_{T_n-} is thus a branching point). Since φ(Z_{T_n}) is arbitrary, φ(Z_t) is not a strong-Markov process in the usual sense. But φ(Z_t) is always a strong-germ-Markov process (as defined and proved in Theorem 2.3 of [9]).

From these examples it is clear that germ-Markov processes exhibit much more variety of behavior than Markov processes, even under quite restrictive assumptions.
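A simulation sketch of a process of the kind in Example 3.5 (a hypothetical discretization with a fixed seed, not a construction from the text): Brownian stretches of exponential duration alternate with level stretches, producing a continuous path with exactly flat runs, which is where the failure of the strong-Markov property for the path alone comes from.

```python
import numpy as np

# Alternating "Brownian stretches" and "level stretches" of exponential
# length: during a level stretch the path holds its current value exactly.

rng = np.random.default_rng(1)
dt, horizon = 1e-3, 10.0
durations = rng.exponential(1.0, 64)      # e_1, e_2, ... with parameter 1
n = int(horizon / dt)
x = np.empty(n + 1)
x[0] = 0.0
stretch, clock, brownian = 0, 0.0, True
for k in range(n):
    # Brownian stretch: diffuse; level stretch: hold the current value
    x[k + 1] = x[k] + (rng.normal(0.0, np.sqrt(dt)) if brownian else 0.0)
    clock += dt
    if clock >= durations[stretch]:       # a switching time T_m is reached
        stretch += 1
        clock = 0.0
        brownian = not brownian
```

An observer of the path alone cannot tell, at a switching time, whether a level stretch has just begun; the prediction process carries exactly that extra information.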
The situation is not much simpler even if we require Z_t to be continuous along with φ(Z_t). Thus if we set a = b in Example 3.6, so that the Q's are constant, Z_t becomes continuous but X(t) still has predictable but sudden changes of behavior at the times T_n. In this example, the time scale τ(t) is independent of B(t). If
still may be distinguished.
The first may be called processes in which
the speed measure developes independently of position. with any fixed speed measure process
g(t)
ψ(t) = /
m(dy),
Here we may begin
and any autonomous germ-Markov
with a continuous prediction process and such that the process
g(s)ds
is strictly increasing (in the last example,
ψ(t) = /Q B (S) ds) . independent of
Now let
B(t), B(0) = 0 ,
ψ(t), with local times
define a random time
τ(t)
(3.3)
s(t,y)
be a Brownian motion as in (3.2).
We may then
by ψ(t) = Γ_m s(τ(t),y)m(dy) ,
and then set
X(t) = B(τ(t)) .
It is to be shown that
X(t)
is an
autonomous germ-Markov process, with a continuous prediction process, which is a continuous local martingale.
X (t) denotes the regular m diffusion with speed measure m(dy) based on B(t) as in (3.2) and if τ (t) is the corresponding additive functional of X (t), then we have m m (3.4)
In fact, if
X(t) = X m (ψ(t)) = B(τ m φ(t)) .
Now since
ψ(t)
is independent of
X
it can be seen that
τ ψ(t) = lim Σ (X(—) - X ( - ^ ί - ) ) 2 , n x» k l n at least in the sense of convergence in probability. that
τ ψ m
is an additive functional of
(3.5)
Setting
we have
Next, we will obtain
J* g^φds^ψtsKy)
as an expression for the local time of
where
X .
This is enough to see
u = ψ (s),
dψ
(u) = (g(ψ P -a.s.
X
at
y
with respect to
m(dy) .
we have from (3.4)
(u))Γ
du .
But for bounded step functions
f(u)
    ∫₀^{ψ(t)} I_{(-∞,y)}(X_m(u)) f(u) du = ∫_{-∞}^{y} [∫₀^{ψ(t)} f(u) d_u s(τ_m(u), y')] m(dy'),

since s(τ_m(u), y) is the local time of X_m. Since this holds for a countable family of step functions generating B(R¹), it follows by monotone extension that it holds for all Borel f ≥ 0. Substituting f(u) = (g(ψ⁻¹(u)))⁻¹, differentiating with respect to m(dy), and finally returning to the variable s, yield (3.5). Integrating (3.5) with respect to dy gives ∫₀ᵗ g⁻¹(s) d(τ_m ψ(s)), which is therefore also an additive functional of X, P-a.s. We denote it by C(t), and observe that dτ_m ψ(s) = g(s) dC(s), where the integrand is the Lebesgue density as indicated. Thus ψ(t) is
X(t), and hence the germ of
X(t)
g(t)
ψ(t)
is also.
determines the prediction process of
view of our assumptions on
g
and
B .
Thus
ψ(t)
is
is contained in But this together
X(t)
autonomously, in
It is clear that
X(t)
is a
continuous local martingale, and that its prediction process is continuous along with that of
g(t) .
It is quite apparent how to extend this type of example to germ-Markov functionals
ψ(t)
other than those which have a density
to Lebesgue measure. time
t
g(t)
with respect
The analogue of the speed measure of the process at
is given formally by
— m(dy), αψ
or
(1/g(t))m(dy)
case, and it evolves independently of the position
in our special
X(t) .
Not surprisingly, this is not the only type of continuous local martingale which is an autonomous germ-Markov process. in which the evolution of
g(t)
depends on
B(t) .
There are also cases
One such example is the
solution of the stochastic integral equation X(t) = x Q + /£ \ jS0 X(τ)dτ
dB(s)
x Q ft 0 .
The existence and uniqueness of the pathwise solution, given any (continuous) Brownian motion functional
τ(t)
B(t), is proved in Section 3.4 of [9]. Here the additive is clearly
τ
Thus if we write formally
rt Γ ,s IΓ
(t) = /
L
1 2ds
(X(u))du
J
.
    dt = d ∫_{-∞}^{∞} s(τ(t), y) m(dy),

we find that this is satisfied at time t if

    m(dy) = m_t(dy) = 2 (dt/dτ) dy = 2 (∫₀ᵗ X(s) ds)⁻² dy.

On the other hand, if we fix m(dy) = 2 dy as in (3.3), the analogue of ψ(t) is just τ(t), and clearly it depends on X(t). It might be of interest to look for further examples of this type in which m(dy) ≠ c dy.

As examples of continuous martingales, such processes are rather specialized. However, in view of the significance of the martingale property (or natural scale) for diffusion, it seems a natural first step to consider it also for a germ diffusion. But perhaps the chief significance of
the examples is only to call attention to the fact that germ-diffusion processes are very much less limited in behavior than ordinary diffusions. Since they both give expression to essentially the same underlying physical hypotheses, it would seem necessary to use some caution before assuming the validity of a diffusion model of a real phenomenon.

REFERENCES

1. Benveniste, A. and Jacod, J. "Systèmes de Lévy des processus de Markov," Inventiones Math., 21, 1973, 183-198.
2. Blumenthal, R. M. and Getoor, R. K. Markov Processes and Potential Theory. Academic Press, New York, 1968.
3. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapters I-IV. Hermann, Paris, 1975.
4. Dellacherie, C. and Meyer, P.-A. Ibid., Chapters V-VIII. Hermann, Paris, 1980.
5. Doob, J. L. Stochastic Processes. Wiley and Sons, New York, 1953.
6. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. 440, Springer, Berlin, 1975.
7. Getoor, R. K. "Homogeneous potentials," Séminaire de Prob. XII, 398-410. Lecture Notes in Math. 649, Springer, Berlin, 1978.
8. Ito, K. and McKean, H. P., Jr. Diffusion Processes and their Sample Paths. Academic Press, New York, 1965.
9. Knight, F. B. "Prediction processes and an autonomous germ-Markov property," The Annals of Probability, 7, 1979, 385-405.
10. Kunita, H. and Watanabe, S. "On square integrable martingales," Nagoya Math. J., 30, 1967, 209-245.
11. Meyer, P.-A. Probability and Potentials. Blaisdell, Waltham, Mass., 1966.
12. Motoo, M. and Watanabe, S. "On a class of additive functionals of Markov processes," J. Math. Kyoto Univ., 4, 1965, 429-469.
13. Neveu, J. Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965.
14. Meyer, P.-A. "Limites médiales, d'après Mokobodzki," Séminaire de Probabilités VII, Univ. de Strasbourg, 198-204. Lecture Notes in Math. 321, Springer, Berlin, 1973.
15. Šur, M. G. "Continuous additive functionals of a Markov process," English translation: Soviet Math. Dokl., 2, 1961, 365-368.
16. Walsh, J. B. and Weil, M. "Représentation des temps terminaux et application aux fonctionnelles additives et aux systèmes de Lévy," Ann. Sci. Éc. Norm. Sup., 5, 1972, 121-155.