Institute of Mathematical Statistics
LECTURE NOTES-MONOGRAPH SERIES, Volume 1

Essays on the Prediction Process

Frank B. Knight
University of Illinois at Champaign-Urbana

Institute of Mathematical Statistics, Hayward, California

Editor: Shanti Gupta, Purdue University

International Standard Book Number 0-940600-00-5
Copyright © 1981 Institute of Mathematical Statistics. All rights reserved.
Printed in the United States of America.
TABLE OF CONTENTS

ESSAY I. INTRODUCTION, CONSTRUCTION, AND FUNDAMENTAL PROPERTIES ... 1
  0. INTRODUCTION ... 1
  1. THE PREDICTION PROCESS OF A RIGHT-CONTINUOUS PROCESS WITH LEFT LIMITS ... 3
  2. PREDICTION SPACES AND RAY TOPOLOGIES ... 20
  3. A VIEW TOWARD APPLICATIONS ... 35
  REFERENCES ... 44

ESSAY II. CONTINUATION OF AN EXAMPLE OF C. DELLACHERIE ... 46
  1. THE PROCESS R_t ... 46
  2. THE PREDICTION PROCESS OF R_t ... 50
  3. CONNECTIONS WITH THE GENERAL PREDICTION PROCESS ... 54
  REFERENCES ... 55

ESSAY III. CONSTRUCTION OF STATIONARY STRONG-MARKOV TRANSITION PROBABILITIES ... 57
  REFERENCES ... 67

ESSAY IV. APPLICATION OF THE PREDICTION PROCESS TO MARTINGALES ... 68
  0. INTRODUCTION ... 68
  1. THE MARTINGALE PREDICTION SPACES ... 70
  2. TRANSITION TO THE INITIAL SETTING
  3. THE LEVY SYSTEM OF A PROCESS ... 91
  4. ON CONTINUOUS LOCAL MARTINGALES ... 96
  REFERENCES ... 107
ESSAYS ON THE PREDICTION PROCESS
Frank B. Knight
University of Illinois at Champaign-Urbana
PREFACE.
This work comes at a stage when the literature on the prediction process consists of only six papers, of which two are by the present author and the other four are in the Strasbourg Séminaire de Probabilités. None of these papers is simple to read, much less to understand. Accordingly, our work has been cut out for us to make the prediction process comprehensible to more than a few specialists. One way of doing this, it would appear, is to present the subject separately in several different contexts to which it applies. Thus for a reader interested mainly in a certain aspect, that part may be studied independently, while for one wishing to have a fuller understanding, the force of repetition of a common theme in different settings may serve to deepen the effect. Accordingly, the present work consists of four distinct papers based on a common theme.
No attempt is made to exhaust the subject, but at the same time
the purpose is not just to illustrate.
The first and most fundamental paper
is an introduction to the method.
It has been kept as simple as possible in
order to make it more accessible.
Besides organizing and explaining the
subject, it provides some elements not in the previous literature and which are needed to understand the fourth essay.
On the other hand, a few of the
most difficult known results on the prediction process, in part depending heavily on analytic sets, are not included in the results of this paper.
The attempt has been to make the subject self-contained and as concrete as possible, by avoiding unnecessary mathematical abstractions and artificial methods of proof.

The second essay presents what is perhaps the simplest non-trivial type of stochastic process: one consisting simply of the arrival time (or lifetime) of a single instantaneous event. To a surprising degree, this already illustrates and clarifies the method. One sees in clear distinction the two basic types of processes involved. On the one hand, we have the direct model of the physical phenomenon, where t represents physical time and we allow -∞ < t < ∞. On the other hand, we have the prediction process based on the model, in which t represents observer's time and we require 0 ≤ t < ∞.
This essay uses two results of the Strasbourg school, as well as several of the associated methods, but they are largely confined to the beginning and the end.
It should be possible to gain an understanding of the main idea by
taking for granted these results as stated. The third essay gives an application of the method to ordinary Markov processes.
Like the second, it is written to be read independently of the first, and it does make some demands on the literature of the subject. In a sense it represents a concession to traditional formalism. The problem is to apply the prediction process (which is always Markovian) to a given Markov process without permitting any change in the given joint distribution functions.
This has the double intent of providing new insight into the
usual regularity assumptions for Markov processes, and of clarifying the meaning and role of the prediction process.

The fourth essay brings the method to bear on three basic classes of processes: square integrable martingales, uniformly integrable martingales, and potentials of class D. In accordance with essay one, the study of each class is reduced to that of a corresponding Markov process. Thus for example the "potentials" do actually become Markovian potential functions in the usual sense of probabilistic potential theory. Several basic applications are made, including the orthogonal decomposition of square-integrable martingales, and the Doob-Meyer decomposition of class D potentials. Of some general interest is the Lévy system of a prediction process. This is shown to exist in complete generality, not in any way limited to martingales. It is then applied to an arbitrary process to yield simultaneously the compensators (or dual previsible projections) of all of the integrable, adapted increasing pure-jump processes. Finally, the class of continuous martingales which are germ-diffusion processes (i.e., have an autonomous germ-Markov property) is investigated briefly.

In this essay, more than previously, a basic contrast with the Strasbourg approach to the same subject matter becomes apparent. While the latter approach studies the class of all martingales (or supermartingales, etc.) with respect to a given probability measure and adapted family of σ-fields, the prediction process approach studies the class of all martingale (or supermartingale) probabilities with respect to a fixed canonical definition of the process and σ-fields.
One acknowledgment and one word of caution should be given in conclusion. Essays 1 and 2 have profited from the careful reading and criticism of Professor John B. Walsh.
In particular Theorem 1.2 of Essay 1 owes its
present formulation largely to him.
On the cautionary side, our numbering system permits one consecutive repetition of a number when this corresponds to a different heading. Thus, Theorem 1.2 is followed by Definition 1.2, but it might have been preceded instead. However, since no number is used more than twice, we thought that the present more informal system was justified in preference to the usual monolithic progression.
ESSAY I. INTRODUCTION, CONSTRUCTION, AND FUNDAMENTAL PROPERTIES

0. INTRODUCTION.
In this first essay, our subject is introduced in a setting
general enough to cover its uses in the remainder of the work.
Then the
fundamental properties and results needed later are developed and proved from scratch, making only minimal use of the "general theory of processes," as presented for example in C. Dellacherie [5].
In the later material,
which prepares the method developed here for application in various more specialized situations, it is inevitable that there be more reference to, and reliance on, the results of the Strasbourg school as developed in Volumes I-XII of the Strasbourg Séminaire de Probabilités [14], in C. Dellacherie [5], in C. Dellacherie and P.-A. Meyer [4], and in R. K. Getoor [8].
Yet it should be emphasized that the prediction process is not simply
another chapter in this development.
Rather it is a largely new method.
It
could be developed in the framework of the above, but whatever would be gained in brevity and completeness would be offset, at least for the reader who is less than fully familiar with the Strasbourg developments, by the prerequisites.
Consequently, we have tried to proceed here in such a way
as to be understood by the less initiated reader, and yet not to be considered infantile by the initiated.
For the reader who is familiar with
the Strasbourg work, and wants to get an idea of what the prediction process means in that setting, the second essay below may be read as an introduction. It does not depend on the more general theory to be developed.
The aim here
is not to incorporate the prediction process into any general theory of stochastic processes, but to develop it as an independent entity. Having gone this far in setting our work apart from that of the Strasbourg group, we must hasten to give credit where due.
In the first
place, the present work borrows unsparingly from the papers of P.-A. Meyer [12], of M. Yor and P.-A. Meyer [13], and of M. Yor [15], on the technical side.
The proof of the Markov property of the prediction process, which
was difficult (and possibly incomplete) in Knight [9], is derived in these papers from a stronger identity holding pathwise on the probability space, and we follow their method.
Again, the very definition of the process in
[12] avoids the necessity of completing the
σ-fields (until a later stage),
and we adopt this improvement.
The measurability of the dependence of the
process on the initial measure, too, is due to these authors.
On this score,
we have not hesitated to profit from their mistakes, as described in [12] and [13]. Further, the basic role of the set
H
of "non-branching points"
is due to P.-A. Meyer ([12, Proposition 2]). Finally and perhaps most importantly, we adopt a new idea of M. Yor [15] to the effect that one need not predict only the future of the specified process in order to get a homogeneous Markov process of prediction probabilities.
One may just as well
predict the futures of any countable number of other processes at the same time.
The only essential precondition is that the future of the specified
process (the process which generates the "known past") must be included in the future to be predicted.
This, in our opinion, places the prediction
process of [9] into an entirely new dimension.

Meanwhile, in regard to our use of the Strasbourg ideas and formalism, we would emphasize the distinction between σ-fields on a probability space, such as F_t, F_{t+}, etc., and σ-fields of a product space in which time is one coordinate, such as the optional or previsible σ-fields. It is often very convenient to use σ-fields of the latter type, and for a complete understanding of many results, they are probably unavoidable. On the other hand, while σ-fields of the former type are needed to express the state-of-affairs as it actually exists at a given t, σ-fields of the latter type are needed rather to define various kinds of processes, usually as an auxiliary, and they can always be circumvented at a cost of sacrificing some degree of completeness. Thus, one will not go essentially wrong in the present work, if one substitutes right-continuous and left-continuous adapted process for "optional" and "previsible" process, respectively, and limit of a strictly increasing sequence of stopping times for "previsible stopping time". In particular, while the section theorems are used freely in establishing results for all t, no use is made of the corresponding projections of a measurable process, although they are heavily implicated in the results.

To give a general preview of the applications treated in subsequent essays, some of them (such as the Lévy system of a martingale) may be stated without any reference to the prediction process, and when possible such formulations are included. For these, the prediction aspect is needed only for the proofs. For most results, however, the prediction process is a necessary part of the formulation of the idea or problem involved. The central purpose is thus to elaborate, and by implication more or less to phase in, the prediction process as a feature of the general theory of stochastic processes. Once the reader becomes adept at thinking in terms of this process, other applications will suggest themselves immediately according to the context, or so we have found. For example, very little is done here in the way of using the prediction process itself in the manner of probabilistic potential theory. In this way, many stopping times of the given process would become first passage times of the prediction process, but the interconnection of the two processes remains largely unstudied. It might be of interest to follow such a direction farther. Even less (if possible) has been done in connecting the present work with stochastic integration, a medium in which the author is not highly proficient. Accordingly, such matters are left aside in favor of applications in which we can feel confident at least that a correct beginning has been made.

1. THE PREDICTION PROCESS OF A RIGHT-CONTINUOUS PROCESS WITH LEFT LIMITS.
We use the following standard notation for measurability.

1) If F and G are σ-fields (on their respective spaces), X : F → G means that X is F/G-measurable, i.e. X^{-1}(S) ∈ F for S ∈ G; we write also X ∈ F/G. When X is a random variable, or is real or extended-real valued and G is the corresponding Borel σ-field, we write simply X ∈ F.

2) b(F) denotes the bounded, real-valued, F-measurable functions; b+(F) denotes the non-negative elements of b(F). Further, we denote the extended real line [-∞, ∞] by R̄, with Borel sets B̄, and the product space X_{n=1}^∞ R̄ by (R̄^∞, B̄^∞).
We begin with the following measurable space.

DEFINITION 1.1. Let Ω denote the space of all sequences w(t) = (w_1(t), w_2(t),...,w_n(t),...) of right-continuous extended-real-valued functions of t ≥ 0, with left limits for t > 0. Let G°_t denote the σ-field generated by all w_n(s), s ≤ t, n ≥ 1, and let F°_t denote that generated by all w_{2n}(s), s ≤ t, n ≥ 1, so that F°_t ⊂ G°_t. We set X_t = (w_{2n}(t), n ≥ 1) on Ω, and F° = V_t F°_t, G° = V_t G°_t. Thus X_t has right-continuous paths with left limits in X_{n=1}^∞ R̄ with the product topology. Finally, we set θ_t w(s) = w(t+s) on Ω, and denote by P a fixed probability on (Ω, G°).
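For orientation only, the shift operators θ_t of Definition 1.1 can be sketched numerically on a single path coordinate. The sample path and helper names below are illustrative assumptions, not part of the text; the point is the defining identity θ_t w(s) = w(t+s), which yields the semigroup property θ_t ∘ θ_s = θ_{t+s}.

```python
def make_path():
    # an illustrative right-continuous path with a single jump at t = 1
    return lambda t: 0.0 if t < 1.0 else 2.0

def shift(t, w):
    # the shift operator of Definition 1.1: (theta_t w)(s) = w(t + s)
    return lambda s: w(t + s)

w = make_path()
# semigroup property: theta_t . theta_s = theta_{t+s}
lhs = shift(0.3, shift(0.5, w))
rhs = shift(0.8, w)
print([lhs(s) == rhs(s) for s in (0.1, 0.19, 0.3)])  # -> [True, True, True]
```

Both sides evaluate w at 0.8 + s, so they agree whether the argument falls before or after the jump.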
Before going further, we give a brief rationale for selecting this as the initial structure. In setting up a prediction process, we require basically two things. The first is a process which generates the conditioning σ-fields (in this case, the process X_t), and the second is a definition of the futures which are to be predicted (in this case, θ_t^{-1} G°), which must contain those of X (namely θ_t^{-1} F°). Once we define the process X and the futures θ_t^{-1} G°, there may be some latitude as to exactly how these futures are to be generated, but it seems to be necessary that they be generated by processes in order to write them with shift operators in the form θ_t^{-1} G°. This being granted, the remainder of our set-up represents a compromise between the more general assumption of [9], where X was only a measurable process, and the more familiar requirements of the applications, in all of which X is right-continuous with left limits (abbrev. r.c.l.l.). Since X is assumed r.c.l.l., it is logical that the "unobserved" processes w_{2n-1}, n ≥ 1, are also r.c.l.l.
It should be pointed out that the choice of real-valued processes is only a matter of convenience. If the actual process has values in a locally compact metric space, or even a metrizable Lusin space (homeomorphic to a Borel subset of a compact metrizable space), we can obtain the above situation by considering the processes composed with a sequence of uniformly continuous functions separating points. Similarly, if the actual process is real-valued, we may take P{w_{2n}(t) = 0, n > 1} = 1 and replace Ω by the corresponding subset, and so forth. It is easy to see that our set-up is P-indistinguishable from the canonical space of right-continuous paths with left limits in the product of any two metrizable Lusin spaces, but we prefer the more explicit situation.

A property of (Ω, G°) which is needed in setting up the prediction process is the existence of regular conditional probabilities, given any subfield. For this it is of course sufficient that (Ω, G°) is the Borel space of a metrizable Lusin space, i.e., a "measurable Lusin space" in the language of Dellacherie-Meyer [4, Chap. III, Definition 16]. There are many different topologies under which the present (Ω, G°) becomes a measurable Lusin space. It suffices to write Ω = X_{n=1}^∞ Ω_n, then to give each Ω_n a Lusin topology as (a copy of) the space of all extended-real-valued right-continuous paths with left limits, and finally to give Ω the product topology.

In the present work, we specialize on one particular such topology, a transplant to the present context of the one used in Knight [9]. This turns out again to be quite natural, and to have some rather unique advantages. In brief, this is the topology of scaled weak convergence of sample paths. This topology is metrizable in such a way that the completion of Ω is the space of all sequences of equivalence classes of measurable functions (with respect to Lebesgue measure). The completion is then a compact metric space, which we denote by Ω̄, and Ω is embedded in Ω̄ as a Borel subset. For some purposes, Ω̄ is a more natural space than Ω, and a few results will concern Ω̄ explicitly. The prediction process can be constructed on Ω̄ in complete analogy to Ω, but for simplicity we leave this to the reader (see also [9] and [12]).
Before beginning the construction, one more remark on its essential nature may provide orientation.
It is generally accepted that a stochastic
process, in application, is a model of a phenomenon which develops according to laws of probability.
But there is no such agreement as to
the nature of probability itself.
Some authors (including such renowned
figures as Laplace and Einstein) seem to have doubted that probability even exists in an absolute physical sense.
However, it seems unlikely that
anyone can doubt that probability does exist in a mental sense, as a way of thinking.
If only because one does not know the entire future, it is
clear that probabilistic thinking is an alternative possible procedure in many situations.
Indeed, it may be the only one possible.
Consequently, it
can scarcely be doubted that stochastic processes do exist in some useful sense, if only, perhaps, in the minds of men.
Furthermore, even if
objective probabilities do exist entirely apart from subjective ones, it cannot be considered unimportant to study the more subjective aspects of probability.
As with many other branches of mathematics, one is in a
better position to make applications of probability to the physical world once one understands fully the mental presuppositions which are involved in the applications.
Indeed, a large part of mathematics consists precisely in
cultivating and developing the necessary mental operations, and one of the fundamental requirements for knowing how to apply mathematics lies in distinguishing what is a physical fact from what is only part of the mental reasoning.
Thus, in stochastic processes as elsewhere in mathematics, it is
important and useful to understand what one is doing mentally. Coming, now, to the case of the prediction process, in much the same way as the probability distributions govern the development of a stochastic process, so the prediction process governs, or models, the development of these probabilities themselves.
The prediction process, then, is a process
of conditional probabilities associated with a given or assumed stochastic process.
The given information will be that of the "past" (or observed
part) of the given process, and the probabilities will be the conditional probabilities of the "future" (or unobserved part).
In this way, the
prediction process becomes at first an auxiliary (or second level) stochastic process associated with the given process.
But the remarkable
advantages of the method appear only when we consider this as a process per se, and define the original process in terms of it instead of conversely. This last step constitutes, in a sense, the main theme of the present work. The first step, however, is definition of the prediction process of the given
X ,
and this is our immediate task.
We set p(x) = π^{-1}(π/2 + arctan x), -∞ ≤ x ≤ ∞, and consider the sequential process on Ω

Y(t) = (Y_n(t)) = (∫_0^t e^{-s} p(w_n(s)) ds, 1 ≤ n).

Since (d/dt+) Y(t) = (e^{-t} p(w_n(t)), 1 ≤ n), it is clear that {Y(s), s ≤ t} generates the σ-field G°_t. In fact, the same is true of {Y(r), r rational, r ≤ t}, since the right derivatives at the rationals determine the right-continuous functions p(w_n(s)). In particular, G° is generated by Y(r), r rational, and hence by the countable collection of random variables Y_n(r) = ∫_0^r e^{-s} p(w_n(s)) ds. This countability is essential to the method, which relies on martingale convergence a.s. ("almost surely", i.e., with probability one) at a critical place. The random variables Y_n(r) are analogous to those of [9, Def. 1.1.1], and also to those of [13, Lemma 1]. We note that 0 ≤ Y_n(r) ≤ 1, and that each Y_n(t) satisfies a uniform positive Lipschitz condition of order 1: 0 ≤ Y_n(t+s) − Y_n(t) ≤ e^{-t} s. In particular, convergence of Y_n(r) for each rational r ≥ 0 is equivalent to uniform convergence.
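As a numerical sanity check (the sample path, grid step, and tolerance below are illustrative assumptions, not from the text), the bounds 0 ≤ Y_n ≤ 1 and the Lipschitz estimate 0 ≤ Y_n(t+s) − Y_n(t) ≤ e^{-t} s can be verified with a Riemann-sum approximation of Y_n:

```python
import math

def p(x):
    # p maps the extended reals homeomorphically onto [0, 1]
    return (math.pi / 2 + math.atan(x)) / math.pi

def Y(w, t, dt=1e-4):
    # left Riemann sum for Y_n(t) = int_0^t e^{-s} p(w(s)) ds
    total, s = 0.0, 0.0
    while s < t:
        total += math.exp(-s) * p(w(s)) * dt
        s += dt
    return total

w = lambda s: 10.0 * math.sin(s)   # an illustrative bounded path
y1, y2 = Y(w, 1.0), Y(w, 1.5)
print(0.0 <= y1 <= y2 <= 1.0)                  # monotone and bounded by 1
print(y2 - y1 <= math.exp(-1.0) * 0.5 + 1e-3)  # increment bound e^{-t} s, plus grid error
```

The second bound is exactly the Lipschitz condition above with t = 1 and s = 0.5, loosened slightly for discretization error.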
We will be concerned with the uniformly closed algebra of functions generated on Ω by the Y_n(r). Explicitly, this may be generated as follows. For each m ≥ 1, let f_{m,j}(x_1,...,x_m), 1 ≤ j, be a sequence of continuous functions on [0,1]^m which is uniformly dense in the set of all such functions. Then the algebra in question is the uniform (linear) closure of all the random variables f_{m,j}(Y_1(r_1),...,Y_m(r_m)), for all m ≥ 1, 1 ≤ j, and positive rationals r_1,...,r_m. This is easily checked by first fixing m and r_1,...,r_m, noting that the range of (Y_1(r_1),...,Y_m(r_m)) is compact, and then using the uniform continuity of f_{m,j} in conjunction with the Stone-Weierstrass approximation theorem.

REMARK. More generally, if we enumerate the Y_i(r_j), and choose any countable collection of continuous functions on the Hilbert cube X_{n=1}^∞ [0,1] which is uniformly dense in the set of all such functions, the compositions of these with the sequence Y_i(r_j) can replace the above particular choice.

We let U denote this algebra, and summarize the needed function-theoretic properties as follows.

THEOREM 1.2. a) The topology on Ω generated by U is metrizable by the metric

d(w^a, w^b) = Σ_{n=1}^∞ 2^{-n} ||Y_n^a − Y_n^b||,

where ||f(t)|| = sup_t |f(t)|. The completion of Ω in this metric is a compact metric space Ω̄, whose elements are identified with all sequences of (equivalence classes mod Lebesgue-null sets of) measurable functions w_n(t) : R^+ → R̄, with the same definition of d. With this identification, Ω is Borel in Ω̄, and the Borel field on Ω is G°.

b) The map (t,w) → θ_t w is continuous from [0,∞) × Ω → Ω.

PROOF. We have already noted that convergence in the topology generated by U is the same as uniform convergence of each Y_n, proving the first assertion.
convergence of each
Y
considered as a distribution function, the
completion is a closed subset of the space of all sequences of distributions of mass
< 1
Theorem.
on
Hence
[0,°°) , Ω
which is compact under weak convergence by Helly's
is a compact metric space.
by a sequence of uniform limits of
Y 's
An element of
Ω
is given
i.e. by a sequence cκ£ non-
decreasing continuous functions of Lipschitz constant 1.
Such functions
being absolutely continuous, we may identify them as integrals of their a.e. - derivatives
p(w (t)) < 1, n ~
and
w (t) n
is identified by applying
P Conversely, given any sequence functions
P(w )
are bounded by
w
0
and
functions, convergence in the metric
d
time intervals in the weak topology /(pwn(t))f(t)dt
of measurable functions, the and measurable.
f,
For such
is simply convergence in finite
σ(L .L^),
for bounded measurable
equivalently, only for continuous
1
f
i.e. convergence of
with compact support (or
which are dense in
L ).
The
closure of the continuous functions bounded by intervals.
pw is all measurable functions n 0 and 1, since it contains the L -closure in finite time Therefore, the completion includes all measurable w as n
asserted. Finally, an approximation by Riemann sums shows that G -measurable on G -measurable on (Ω,G ) •> (Ω,G )
Ω
for each
t .
Ω
for fixed
w
is Borel, where
It follows that <= Ω,
G
d(w ,w )
as asserted.
Turning to b ) , we have
p(w(s))ds is is
and therefore the inclusion mapping
denotes the Borel field of
a well-known theorem [4, III, Theorem 21] it follows that G = S |Ω,
J
Ω e G°
Ω . and
By
FRANK B. KNIGHT
p(iΛs))ds| < (eε-l) J ^ e ( t " s ) p (w^s) )ds - phζ(β)))dβ|
ε
< (e -l)
t
2 ||γ«-1ζ||
+
+
2 e.
This easily implies b).

REMARK. This topology is somewhat artificial in that it depends on the choice of the function p. The artificiality disappears, however, if we begin with a process Y_t having values in a metrizable Lusin space E ⊂ Ē, Ē compact, and consider w_n(t) = f_n(Y_t), where (f_n) is a uniformly dense subset of the continuous functions on Ē. Then the topology of Ω reduces to that of weak convergence of the sojourn time measures

μ(t,A) = ∫_0^t I_A(Y_s) ds

for the process Y.
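To see concretely what "scaled weak convergence" means, one can compare a rapidly oscillating path with its flat local average: they stay far apart pointwise, yet are close in the metric d, because only the integrals Y_n enter. The sketch below is an illustrative assumption throughout (one coordinate, finite grid, truncated sup, ad hoc threshold), not the book's construction.

```python
import math

def p(x):
    return (math.pi / 2 + math.atan(x)) / math.pi

def Y(w, t, dt=1e-3):
    # Riemann-sum approximation of Y(t) = int_0^t e^{-s} p(w(s)) ds
    total, s = 0.0, 0.0
    while s < t:
        total += math.exp(-s) * p(w(s)) * dt
        s += dt
    return total

def d1(wa, wb, T=5.0, step=0.05):
    # one-coordinate, finite-grid stand-in for
    # d(w^a, w^b) = sum_n 2^{-n} sup_t |Y_n^a - Y_n^b|
    return max(abs(Y(wa, k * step) - Y(wb, k * step))
               for k in range(int(T / step) + 1))

fast = lambda s: math.sin(200.0 * s)   # oscillates rapidly between -1 and 1
flat = lambda s: 0.0                   # its local average, since p(-x) = 1 - p(x)
print(abs(fast(0.007) - flat(0.007)) > 0.9)  # far apart pointwise
print(d1(fast, flat) < 0.05)                 # yet close in the metric d
```

The cancellation in the running integral is exactly why the completion Ω̄ consists of equivalence classes of measurable functions rather than of paths.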
We turn now to the state space of the prediction process.

DEFINITION 1.2. Let H be the set of all probability measures on (Ω, G°), with the σ-field H generated by {h(S), S ∈ G°} as functions of h ∈ H. We give H the topology of weak-* convergence with respect to convergence in the topology of Ω. Let P^h and E^h, h ∈ H, denote the probability and expectation determined by h.

PROPOSITION 1.3. H is a separable metrizable space, with respect to which H is the Borel σ-field. The metric may be so defined that H is Borel in its completion H̄, the compact metrizable space of all probabilities on Ω̄.

PROOF. Since Ω is embedded in the compact Ω̄, there is a uniformly dense sequence f_n in the uniformly continuous functions on Ω. The functions E^h f_n then induce the topology of H, which is clearly metrizable with completion H̄. Since E^h f is H-measurable in h for continuous bounded f, and since by a monotone class argument h(S) = E^h I_S is then Borel in h for S ∈ G°, H is the Borel σ-field as asserted. Finally, since H = {h ∈ H̄ : h(Ω) = 1} and Ω is Borel in Ω̄, H is Borel in H̄.
For the construction of the prediction process we introduce a fixed sequence of continuous functions which are suitably bounded, but the outcome is entirely free of which such sequence is used.

NOTATION 1.4. Let 0 ≤ f_n ≤ 1 be any fixed sequence of continuous functions on Ω̄ whose uniform linear closure is all continuous functions. For example, the f_{m,j}(Y_1(r_1),...,Y_m(r_m)) of Theorem 1.2 suffice, when extended by continuity to Ω̄.

We also need

LEMMA 1.5. For λ > 0 and bounded measurable f ≥ 0, the expressions

f^h_λ(t) = e^{-λt} E^h((∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_t | F°_t)

are P^h-supermartingales in F°_t, for every h ∈ H.

PROOF. This is a familiar computation, due to G. Hunt. For t_1, t_2 ≥ 0,

E^h(f^h_λ(t_1 + t_2) | F°_{t_1}) = e^{-λ(t_1+t_2)} E^h((∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_{t_2}∘θ_{t_1} | F°_{t_1}) ≤ f^h_λ(t_1),

since f ≥ 0 gives (∫_0^∞ e^{-λs} f∘θ_s ds)∘θ_{t_2} = e^{λt_2} ∫_{t_2}^∞ e^{-λs} f∘θ_s ds ≤ e^{λt_2} ∫_0^∞ e^{-λs} f∘θ_s ds.
In order to use martingale convergence with Lemma 1.5, we first choose for each rational r > 0 a regular conditional probability W^h_r(S), S ∈ G°, of P^h(θ_r^{-1} S | F°_r), given F°_r. In fact, we choose W^h_r to be H × F°_r-measurable in (h,w), as is possible by a well-known construction of J. L. Doob (using the fact that F° is countably generated — for the method, see also Theorem 1.4.1 of [9]). Thus we may be more precise in Lemma 1.5 for f = f_n by setting

f_{λ,n,h}(r) = e^{-λr} E^{W^h_r}(∫_0^∞ e^{-λs} f_n∘θ_s ds),

and we now assume this particular choice. Next, we prepare one more lemma.

LEMMA 1.6. For any t ≥ 0, h ∈ H, and w ∈ Ω, the existence of the limits along the rationals r → t± of lim f_{λ,n,h}(r), for all n and all rational λ > 0, is equivalent to the existence of lim W^h_r in the topology of H̄.
PROOF. By definition of the weak-* topology, existence of the last limit is equivalent to that of lim E^{W^h_r} f_n for all n. Now by Fubini's Theorem we have

f_{λ,n,h}(r) = e^{-λr} ∫_0^∞ e^{-λs} E^{W^h_r}(f_n∘θ_s) ds,

and by Theorem 1.2 b) we know that E^w(f_n∘θ_s) is continuous in s; clearly it is bounded by 1. Convergence of f_{λ,n,h}(r) for all rational λ > 0, for each n, implies, by the continuity theorem for Laplace transforms, convergence of the measures E^{W^h_r}(f_n∘θ_s) ds. By a simple use of the equicontinuity of these densities in s, this is equivalent to convergence of E^{W^h_r}(f_n∘θ_s) for each s. But at s = 0 this implies convergence of E^{W^h_r} f_n for each n, and hence that of W^h_r in H̄. Conversely, since each f_n∘θ_s is continuous on Ω, convergence of W^h_r implies that of E^{W^h_r}(f_n∘θ_s) for each s. Hence by the dominated convergence theorem, we obtain the existence of the limits of f_{λ,n,h}(r), and the proof is complete.
We can now give the definition of the prediction process for fixed h ∈ H.

DEFINITION 1.6. Let T_h = sup{t : for 0 ≤ s < t the limits W^h_{s±} = lim_{r→s±} W^h_r both exist and are in H}. We define the prediction process of h as

Z^h_t = W^h_{t+} = lim_{r→t+} W^h_r on {t < T_h},    Z^h_t = h on {t ≥ T_h}.
In discussing optional and previsible stopping times, it is convenient to use the h-null sets in the
σ-fields
F
consisting of
h-completion of
F° .
T,
T = n on { T = θ } , and let n form of the assertion is trivial).
THEOREM 1.7. for each
a)
t .
perhaps at
T
0 < T < °° for previsible
since we may replace any
and
For
h e H,
augmented by all
Furthermore, there is no loss of
generality in the following theorem to assume stopping times
F
n •+ °°
T
(on
P {T, = °°} = 1, n
by
T
= T Λ n
b)
For every
F
H X B X F
the corresponding
and
is ,
Z^ t
c)
For every
F
=
Z
TίS) '
S
e
F° -measurable t+
Z
measurable in
-optional stopping time
pNθ^1 slFτ+)
(1.7b)
{τ>θ}
{ T = 0}
It is right-continuous, with left limits < °°, and it is
on
in
H
except
(h,t,w) .
T < °°, we have
G
° '
-previsible stopping time
0 < T < °° we
have
p h (θ τ 1 S|FJ_) =ZJ_(S) , S e G° ,
(1.7 c) where we set
Z
= h
if the left limit at
T
< «> does not exist.
Ή
d)
The processes
F t+ -optional and
F
Z
and
Z
-previsible, and either of these facts together with
(1.7b) or (1.7c) respectively, determines h-null set for all REMARKS. PROOF.
are respectively
t > 0
(> 0
Z^
if we set
It follows from [4, VI, 5]
that
or
Z
uniquely up to an
Z Q _ = h) . z£
is even
F°+-optional .
By Lemma 1.6 and the classical supermartingale convergence
theorem of Doob (continuous parameter version) we know that P {lim W = W exist for all s} = 1 . Unfortunately, there seems to be S r+s± r " no way to deduce from this that the limits are concentrated on Ω (hence are in Z
+
let {m2~ k
H)
except by first proving parts
(this is the price we pay for using T
be any finite < T < (m+l)2~k}
F
b)-d) for Ω
instead of
-stopping time, and let
for all
m > 0,
W
as usual.
T
+
in place of Ω) .
Accordingly,
= (m+l)2~
on
Then by Theorem 1.2 b ) ,
and martingale convergence of conditional expectations, we have
12
FRANK B. KNIGHT
(1.8,
liπ. E X e"
λs
k-*»
θ
fn
k
ds|FΪ > k
; = lim E
k
(/£ e "
λs
f n • θgds)
= EW>Γ+ Γ e-λs f J 0 n
θ ds . s
By a monotone class argument using linear combinations of the f_n, it follows that W_{T+} defines a regular conditional probability on G° given F_{T+}, and in particular W_{T+}(Ω̄) = 1 a.s. Since W_{t+} (set ≡ 0 where it does not exist, for all t) is F_{t+}-optional, the optional section theorem [4, IV, 84] shows that

P^h{W_{t+}(Ω̄) = 1 for all t} = 1 .

Turning to the previsible case, let 0 < T < infinity be an F-previsible stopping time. By [4, IV, 71 and 77] this is equivalent to the existence of an increasing sequence (T_n) of stopping times with T_n < T and lim_{n→∞} T_n = T < infinity. Then by (1.8) and Hunt's Lemma [4, V, 45] we have

lim_{n→∞} E^h(∫_0^∞ e^{-λs} f_n ∘ θ_s ds | F_{T_n}) = E^{W_{T-}} ∫_0^∞ e^{-λs} f_n ∘ θ_s ds .

But by [4, IV, 56 b) and d)] we have ∨_n F_{T_n} = F_{T-}, and so by a monotone class argument W_{T-} defines a regular conditional probability on G° given F_{T-}. Since W_{t-} (set ≡ 0 where it does not exist, for all t) is F_t-previsible, the previsible section theorem [4, IV, 85] shows that

P^h{W_{t-}(Ω̄) = 1 for all t > 0} = 1 .

Combining the above results, it follows immediately that P^h{T^h = infinity} = 1, and we have (1.7b) and (1.7c). It is clear that T^h is even an F°_t stopping time, and obviously Z^h_t is right-continuous, with left limits except perhaps at T^h < infinity. It now follows by [5, IV, T27] that Z^h_t is F_{t+}-optional. To see that Z^h_{t-} is F_t-previsible it suffices to note that it is a measurable process h-equivalent to the previsible process W_{t-}, since F_t contains all subsets of h-null sets. In view of the two section theorems, this completes the proof of d). Finally, the joint measurability assertion in a) follows from the H × F°-measurability of W^h_t and the fact that

Z^h_t = W^h_t I_{[0,T^h)}(t) + h I_{[T^h,infinity)}(t)

is H × B × F°-measurable. In fact, for later use we may state
COROLLARY 1.7.
For ε > 0 and t ≥ 0, Z^h_s, 0 ≤ s ≤ t, is H × B_{[0,t]} × F°_{t+ε}-measurable, where B_{[0,t]} are the Borel sets of [0,t].

This follows immediately by the same method, since W^h_{s∧(t+ε)} is F°_{t+ε}-measurable for each s. It does not follow, however, that Corollary 1.7 holds if F°_{t+ε} is replaced by F°_{t+}.

We next examine how to recover the process X_t = (w_{2n}(t)) from Z^h_t. In principle, this is possible because Z^h_t{(w_{2n}(0)) = X_t} = 1 for each t, h-a.s.

DEFINITION 1.8. Let a mapping φ from H into R^infinity be defined by

φ(h) = (p^{-1}(E^h p(w_{2n}(0))))_{1 ≤ n} .

It is easy to see that φ(h) is H/B^infinity-measurable. Now we have

THEOREM 1.9. For h ∈ H, P^h{φ(Z^h_t) = X_t for all t ≥ 0} = 1 .

PROOF. Since φ is a Borel function and the components of X_t are right-continuous, both sides of the equality are F_t-optional. By the usual section theorem, it suffices to prove that for each optional T < infinity we have P^h{φ(Z^h_T) = X_T} = 1. But for n ≥ 1, by Theorem 1.7 b) we have

p(w_{2n}(T)) = E^h(p(w_{2n}(0)) ∘ θ_T | F_{T+}) = E^{Z^h_T} p(w_{2n}(0)) , P^h-a.s.

Applying p^{-1} to both sides, we obtain the identity for the components of X_T, completing the proof.
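In display form, the component identity in the proof of Theorem 1.9 reads as follows (a sketch in the notation above; p denotes the bounded injective map appearing in Definition 1.8, applied coordinatewise):

```latex
\begin{aligned}
\varphi\!\left(Z^h_T\right)_n
 &= p^{-1}\!\left(E^{Z^h_T} p\!\left(w_{2n}(0)\right)\right)
 && \text{(Definition 1.8)}\\
 &= p^{-1}\!\left(E^h\!\left(p\!\left(w_{2n}(0)\right)\circ\theta_T \,\middle|\, F_{T+}\right)\right)
 && \text{(Theorem 1.7 b))}\\
 &= p^{-1}\!\left(p\!\left(w_{2n}(T)\right)\right) = w_{2n}(T),
 && P^h\text{-a.s.,}
\end{aligned}
```

since p(w_{2n}(0)) ∘ θ_T = p(w_{2n}(T)) is already F_{T+}-measurable; the section theorem then upgrades the identity at each optional T to one holding for all t simultaneously.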
REMARK.
It follows in particular that
{z ,
h
0 < s < t}
S
"~
generates a
~"
σ-field whose completion for P contains that of F° . Consequently, by Theorem 1.7b), for any 0 < s < — < s and B_ , ,B e B , we have — — — n l n °° easily
ίw(s k ) * B k } | z ^ , β < t) . But then, by obvious monotone extension, we have P (s|z , σ(Z h , S
s < t) ,
s < t)
S e G° .
) )
Gence it follows that the augmentation of
P h -null sets is
by all
P (s|F
F^ . t+
We turn now to a basic homogeneity property first proved by P.-A. Meyer and M. Yor ([12] and [13]), which is also the key to their proof of the Markov property of Z^h_t. The proof we give is new in that it avoids Theorem 1 of [13], which was in the nature of an amendment to [12]. Here and in the sequel, we will use where convenient the following abbreviation.
NOTATION 1.10. Let Z^h_t f denote E^{Z^h_t} f.

THEOREM 1.11. For each F°_{t+}-stopping time T < infinity, we have

Z^h_{T+t} = Z^{Z^h_T}_t ∘ θ_T for all t ≥ 0 ,   P^h-a.s.,

where θ_T on the right does not apply to the superscript Z^h_T.

PROOF. We first observe that both sides are right-continuous in t. Hence it suffices to prove the equality, for each f in the sequence (f_n) of Notation 1.4, at the times t in a countable dense set {t_k > 0} chosen such that, for each n, Z^h f_n is continuous at t = T + t_k, P^h-a.s. Such t_k exist since Z^h_t f_n is r.c.l.l. in t. Thus Z^h_{T+t_k} = Z^h_{(T+t_k)-} P^h-a.s., and since T + t_k is previsible, Theorem 1.7 a), b) show that F_{(T+t_k)+} and F_{(T+t_k)-} may be included in this equivalence, differing at most by P^h-null sets. Therefore, to prove Theorem 1.11 it suffices to show that the two sides agree at each t_k.
Next, we note that for each t the two sides of Theorem 1.11 are measurable. The left side is clearly so. As for the right side, by Corollary 1.7, for each t and ε > 0, Z_t is (H × F°_{t+ε})/H-measurable, and θ_T is F°_{T+t+ε}/F°_{t+ε}-measurable, whence by composition Z_t ∘ θ_T is (H × F°_{T+t+ε})/H-measurable. Since also Z^h_T is F_{T+}/H-measurable, by composing again it follows that (Z^{Z^h_T}_t f) ∘ θ_T is F°_{T+t+ε}-measurable, and thus F_{(T+t)+}-measurable. Since F_{(T+t_k)+} and F°_{(T+t_k)+} differ only by null sets, the proof of Theorem 1.11 is thus reduced to showing

(1.10)   E^h(Y Z^h_{T+t} f) = E^h(Y (Z^{Z^h_T}_t f) ∘ θ_T)

for each Y ∈ b(F_{(T+t_k)+}) and all f in the sequence {f_n} used above.
To prove (1.10) we need two simple lemmas.

LEMMA 1.12. F_{(T+t_k)+} is contained in the σ-field F_Y generated by all Y of the form Y = (b ∘ θ_T) g, g ∈ b(F°_{T+}), b ∈ b(F°_{t_k}).

PROOF. It is easily seen from Galmarino's Test [4, IV, (100)] that F°_{(T+t_k)+} is generated by the stopped process X_{(T+t_k)∧s}, 0 ≤ s. Hence we need only show that for each s we have X_{s∧(T+t_k)} ∈ F_Y. Clearly X_{s∧T} I_{{s ≤ T}} ∈ F°_{T+} and {s ≤ T} ∈ F°_{T+}, so this is in the σ-field F_Y. Now we have

s ∧ (T + t_k) = s ∧ T on {s ≤ T} ,
             = T + ((s − T) ∧ t_k) on {s > T} ,

hence it remains only to consider the case s > T. For each n, letting T_n = j 2^{-n} on {j 2^{-n} ≤ T < (j+1) 2^{-n}}, 0 ≤ j, we can write on {s > T_n}

X_{T_n + ((s − T_n) ∧ t_k)} I_{{s > T_n}} = (X_{(s − T_n) ∧ t_k} ∘ θ_{T_n}) I_{{T_n = j 2^{-n} < s}} ,

which is in F_Y. Then it is easy to see that the limit as n → infinity puts X_{T + ((s − T) ∧ t_k)} I_{{s > T}} in F_Y, completing the argument of the Lemma.

REMARK. In fact we have F°_{(T+t_k)+} = F_Y, as is easily checked, but not needed below.
The class of finite linear combinations of Y having the form stated in Lemma 1.12 is closed under multiplication, hence it suffices to prove (1.10) only for Y of this form. Then, assuming Y = (b ∘ θ_T) g and writing t, Z for t_k, Z^h, by Theorem 1.7b) we have

(1.11)   E^h(Y Z_{T+t} f) = E^h(g (b ∘ θ_T) E^h(f ∘ θ_{T+t} | F°_{(T+t)+}))
        = E^h(g E^h((b ∘ θ_T)(f ∘ θ_{T+t}) | F°_{T+}))
        = E^h(g E^{Z^h_T}(b (f ∘ θ_t)))
        = E^h(g E^{Z^h_T}(b E^{Z^h_T}(f ∘ θ_t | F°_{t+})))
        = E^h(g E^{Z^h_T}(b Z_t f)) .

To go from here to the right side of (1.10) we need to reintroduce a conditioning by F°_{T+} on certain occurrences of w. To justify this, we have

LEMMA 1.13. Let K(w_1, w_2) be a bounded, F°_{T+} × F°-measurable function. Then

E^{Z^h_T(w)} K(w, w_2) = E^h(K(w, θ_T w) | F°_{T+})

for P^h-a.e. w, where the expectation on the left is with respect to w_2 over Ω.

PROOF. By linearity and an obvious monotone class argument, it suffices to prove this for K of the form K_1(w_1) K_2(w_2), K_1 ∈ b(F°_{T+}), K_2 ∈ b(F°). Then K_1(w) factors out on both sides, and the result follows by Theorem 1.7b).
We now apply Lemma 1.13 to the last expression in (1.11) with

K(w_1, w_2) = g(w_1) b(w_2) (Z^{Z^h_T(w_1)}_t f)(w_2) .

This is justified by composition of (h, w) → Z^h_t f(w), which is H × F°-measurable, with Z^h_T, which is F_{T+}/H-measurable. We obtain

(1.12)   E^h(g E^{Z^h_T}(b Z_t f)) = E^h(E^h(g(w)(b ∘ θ_T)(w)((Z^{Z^h_T}_t f) ∘ θ_T)(w) | F°_{T+}))
        = E^h(Y (Z^{Z^h_T}_t f) ∘ θ_T) ,

proving (1.10), and hence Theorem 1.11.

It is now easy to deduce that the processes Z^h_t are all strong Markov processes with the same Borel transition function. One may describe this as the "intrinsic Markov property." When considering Z^h_t(w) as h varies, we frequently write Z^z_t in place of Z^h_t.

DEFINITION 1.14. We set

q(t,z,A) = P^z(Z^z_t ∈ A) ,   z ∈ H, 0 ≤ t, A ∈ H .

Since Z^z_t is H × F°-measurable, we have by Fubini's Theorem q(t,z,A) ∈ H for fixed (t,A). Further, by right-continuity in t, q(t,z,A) ∈ B^+ × H for fixed A ∈ H.

THEOREM 1.15. As processes with state space (H, H) and σ-fields F_{t+}, the Z^h_t, h ∈ H, are strong-Markov with transition function q.

PROOF. Let T < infinity be an F_{t+}-stopping time. Then by Theorem 1.11 and Lemma 1.13, for A ∈ H,

P^h(Z^h_{T+t} ∈ A | F_{T+}) = P^{Z^h_T}(Z^{Z^h_T}_t ∈ A) = q(t, Z^h_T, A) ,

as asserted.
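In display form, the computation behind Theorem 1.15 runs as follows (a sketch in the notation above, with Y a bounded F_{T+}-measurable test function):

```latex
\begin{aligned}
E^h\!\left[\,Y\,\mathbf{1}_A\!\left(Z^h_{T+t}\right)\right]
  &= E^h\!\left[\,Y\,\mathbf{1}_A\!\left(Z^{Z^h_T}_{t}\circ\theta_T\right)\right]
  && \text{(Theorem 1.11)}\\
  &= E^h\!\left[\,Y\,\left.E^{z}\!\left[\mathbf{1}_A\!\left(Z^{z}_{t}\right)\right]\right|_{z=Z^h_T}\right]
  && \text{(Lemma 1.13)}\\
  &= E^h\!\left[\,Y\,q\!\left(t,Z^h_T,A\right)\right]
  && \text{(Definition 1.14),}
\end{aligned}
```

which is exactly the statement P^h(Z^h_{T+t} ∈ A | F_{T+}) = q(t, Z^h_T, A).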
Using Theorem 1.15, it is surprisingly easy to obtain a remarkable result of P.-A. Meyer [12] concerning the set H_0 of "non-branching points."

DEFINITION 1.16. Let

H_0 = {z ∈ H : P^z{Z^z_0 = z} = 1} ,

and let the trace σ-field on H_0 denote the intersections of sets in H with H_0.

Clearly H_0 ∈ H, and the trace σ-field is its topological Borel field. We will see that H_0 is an alternative state space for the processes Z_t.

THEOREM 1.17. For h ∈ H, P^h{Z^h_t ∈ H_0 for all t ≥ 0} = 1, and for h ∈ H_0 the processes Z^h_t comprise a Borel right-process on H_0, in the sense of Meyer.

PROOF. The second assertion follows immediately from the first and the definition of a right-process (see Getoor [8, (9.7) and (9.4)]) since the Z^h_t are right-continuous, P^h{Z^h_0 = h} = 1 on H_0, and q is Borel.
The first assertion is really a familiar consequence of the strong Markov property. To prove it, we introduce momentarily, for fixed h ∈ H, a canonical version (P, Z_t) of Z^h_t on the space of r.c.l.l. paths with values in (H, H), and let θ_t denote the usual translation operators, and F^P_t the usual right-continuous, P-augmented σ-fields on this space. Of course by Theorem 1.15 this makes sense, and Z_t remains a strong-Markov process on this space, with transition function q. Also, Z_t is F^P_t-optional, hence by the section theorem it suffices to show that for optional T < infinity, P{Z_T ∈ H_0} = 1. But since Z_0 ∘ θ_T = Z_T and Z_T is F^P_T/H-measurable, the strong-Markov property implies

P{Z_T ∈ H_0} = 1 ,

which completes the proof.

REMARK. Usage of this sample space is introduced systematically in the following section. Here it was used only for notational convenience, because we do not have Z^h_0 ∘ θ_t = Z^h_t.
We turn finally to the "moderate Markov property" of the left-limit processes Z^h_{t-}, t > 0, in the terminology of Chung and Walsh [2]. This was anticipated by Theorem 1.7 c), and provides a "practical" form of the Markov property in the sense that it can be applied without knowledge of the future (unlike Theorem 1.15).

THEOREM 1.18. For h ∈ H, let T be an F_t-previsible stopping time with 0 < T < infinity. Then for t ≥ 0 and A ∈ H,

P^h(Z^h_{(T+t)-} ∈ A | F_{T-}) = P^{Z^h_{T-}}(Z_{t-} ∈ A) ,   P^h-a.s.

PROOF. By [4, IV, Theorem 78] there is an F°-previsible stopping time T° equal P^h-a.s. to T, and it suffices to prove the assertion for T°.
Further, by [4, IV, Theorem 71] we can as well assume there is an increasing sequence T_n < T°, with P^h{lim_{n→∞} T_n = T°} = 1, where the T_n are F°-stopping times [4, IV, Theorem 56]. We may replace T° by lim_{n→∞} T_n in proving the assertion, and then F°_{T°-} = ∨_n F°_{T_n}. Again by [4, IV, Theorem 78], F_{T-} and F°_{T°-} differ only by P^h-null sets, so it is enough to prove the result conditional on F°_{T°-}. In short, we have shown that the entire assertion is equivalent to that obtained by replacing T by a strict limit of an increasing sequence of F°-stopping times, and might as well have been so formulated (except for the fact that F_t-stopping times are needed in applications).

We need to use the analogue of Theorem 1.11 for previsible stopping times.

LEMMA 1.19.
For h ∈ H, and F°-previsible T with P^h{0 < T < infinity} = 1, we have

Z^h_{(T+t)-} = Z^{Z^h_{T-}}_{t-} ∘ θ_T for all t ≥ 0 ,   P^h-a.s.

PROOF. As for Theorem 1.11, the problem reduces to showing the analogue of (1.10) with Y, f, and t as before:

(1.13)   E^h(Y Z^h_{(T+t)-} f) = E^h(Y (Z^{Z^h_{T-}}_{t-} f) ∘ θ_T) .

To do this, we need only apply the analogues of Lemmas 1.12 and 1.13. The latter is proved just as before, and we have

E^h(K(w, θ_T w) | F°_{T-}) = E^{Z^h_{T-}} K(w, w_2)

when K(w_1, w_2) is F°_{T-} × F°-measurable. As to the former, where g ∈ b(F°_{T+}) is replaced by g ∈ b(F°_{T-}), the same proof applies except that we must use the familiar fact that {s < T} ∈ F°_{T-} for previsible T. Then we write

s ∧ (T + t_k) = s ∧ T on {s < T} ,
             = T + ((s − T) ∧ t_k) on {s ≥ T} ,
where X_{s∧T} I_{{s < T}} = X_s I_{{s < T}} ∈ F°_{T-} (by definition of F°_{T-}). In the second case, there is no change on {s > T}. Finally, on {s = T} we have

X_s I_{{s=T}} = (X_0 ∘ θ_T) I_{{s=T}} ,

which also has the required form.

We now apply the two lemmas, along with Theorem 1.7 c), replacing F_{T+} and Z^h_T by F_{T-} and Z^h_{T-}, to obtain first the analogue of (1.11) and next the analogue of (1.12). This then completes the proof of Lemma 1.19.

To complete the proof of Theorem 1.18, one need only apply Lemma 1.19 and the analogue of Lemma 1.13 to obtain

P^h(Z^h_{(T+t)-} ∈ A | F_{T-}) = P^{Z^h_{T-}}(Z_{t-} ∈ A) ,

completing the proof.
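In the notation above, the two Markov-type properties now established may be set side by side (optional T for Theorem 1.15, previsible T for Theorem 1.18; the displayed forms follow the statements as given there):

```latex
\begin{aligned}
&\text{strong (Theorem 1.15):} &
P^h\!\left(Z^h_{T+t}\in A \,\middle|\, F_{T+}\right) &= q\!\left(t,\,Z^h_T,\,A\right),\\[2pt]
&\text{moderate (Theorem 1.18):} &
P^h\!\left(Z^h_{(T+t)-}\in A \,\middle|\, F_{T-}\right) &= P^{\,Z^h_{T-}}\!\left(Z_{t-}\in A\right).
\end{aligned}
```

The first conditions on F_{T+}, and so uses information up to and including time T; the second uses only the strict past F_{T-}, which is what makes it "practical."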
2. PREDICTION SPACES AND RAY TOPOLOGIES.

As already became apparent (in the proof of Theorem 1.17, for example), it is a technical obstacle to have to define Z^z_t separately for each z. Furthermore, in view of Theorems 1.9 and 1.15 it is an unnecessary obstacle. We have X_t = φ(Z^z_t) for all t, except on a fixed set N ∈ F° with P^z(N) = 0 for all z ∈ H. Thus we are free to transfer the Z^z to a more convenient sample space, and to study X in terms of Z instead of conversely. This leads to the following concepts and definitions.
DEFINITION 2.1. 1) The prediction space of (Ω, G°, F°_{t+}, G°_{t+}, θ_t, X_t) consists of (Ω_Z, Z°, Z°_t, θ^Z_t, Z_t), where

i) Ω_Z is the set of all right-continuous H_0-valued paths w_Z(t), t ≥ 0, with left limits w_Z(t-) in H for t > 0;
ii) Z°_t is the σ-field generated by {w_Z(s), s ≤ t}, and Z° = ∨_t Z°_t;
iii) θ^Z_t : Ω_Z → Ω_Z is defined by θ^Z_t w_Z(s) = w_Z(s+t), 0 ≤ s, t;
iv) Z_t(w_Z) = w_Z(t), 0 ≤ t.

2) The prediction process (without specification of a fixed probability) is the canonical process Z_t on prediction space, with transition function q(t,z,A), as justified by Theorems 1.15, 1.17, and 1.18.

Thus, Z_t is a strong Markov process and Z_{t-} is a moderate Markov process ("process" being meant in the sense of E. B. Dynkin [6]),
and it is a right-process on H_0 when considered only for initial distributions concentrated on H_0. In both cases we have the same augmented right-continuous σ-fields Z^μ_t, the completion of Z°_t containing all P^μ-null sets in Z°, for every permissible initial law μ on H, since by right-continuity of path at t = 0 each such μ induces a probability P^μ on Ω_Z. Finally, in view of Definition 1.8 and Theorem 1.9, for each h ∈ H the processes (φ(Z_t), Z_t) and (X_t, Z^h_t) are equivalent in joint distribution, and both components are P^h-a.s. right-continuous with left limits in their respective topologies. It is to be noted that we use the same notation P^h for a probability on either (Ω, G°) or (Ω_Z, Z°), the distinction being clear from the context.

3) By a packet of the prediction process we mean a non-void universally measurable subset U of H such that, for all h ∈ U, P^h{Z_t ∈ U for all t ≥ 0} = 1 in the sense of outer measure. If U ∈ H, then U is a "Borel packet", while if U ⊂ H_0 it is an "H_0-packet". We say that a packet U is "complete" if P^h{Z_{t-} ∈ U for all t > 0} = 1 for h ∈ U.

REMARKS. Given a packet U, it is clear that U ∩ H_0 is an H_0-packet, and on an H_0-packet Z_t is a right process in the sense of Getoor [8]. But completeness may be lost in this operation, and on a complete packet one has the moderate Markov property of Z_{t-}. In anticipation of things to follow, we point out that, starting with a process Z_t, or a collection of such (i.e., of P's on (Ω, G°)), it is often possible to find a packet which contains the given process (or processes), but little or nothing superfluous. This is beneficial in applying the prediction process.

As a first step in the construction of packets, we prove

THEOREM 2.1. a) Given any non-void subset A ⊂ H_0, let R_A be a Borel subset of H (i.e., R_A ∈ H) with P^z{Z_t ∈ R_A for all t ≥ 0} = 1 for z ∈ A. Then the set H_A = {h ∈ H : P^h{Z_t ∈ R_A, t ≥ 0} = 1} is a packet, with A ⊂ H_A.

b) For each h ∈ H_A, there is a Borel packet H_h with h ∈ H_h ⊂ H_A, and further H_h ⊇ {z ∈ H : P^z{Z_t ∈ H_h, t ≥ 0} = 1}.

c) The packet H_h is complete.

PROOF. a) Let T = inf{t > 0 : Z_t ∈ H_0 − R_A} be the hitting time of H_0 − R_A on Ω_Z (as in [1, I, (2.8)]). Then T is Z(= ∨_μ Z^μ)-measurable, and for α > 0 the function E^z(exp −αT) is α-excessive for the right-process on H_0. Further, we have for any α > 0, H_A ∩ H_0 = {h ∈ H_0 : E^h(exp −αT) = 0}, which (since the right-process has a Borel
transition function) is a nearly-Borel set [8, (9.4)(i)]. Since we have H_A = {h : q(0,h,H_A ∩ H_0) = 1}, it follows that H_A is nearly Borel in H. Hence it is universally measurable. Also, for h ∈ H_A the process E^{Z_t}(exp −αT) is P^h-a.s. right-continuous. For h ∈ H_A it is thus a positive right-continuous supermartingale starting at 0. Hence it is 0 for all t, and H_A is a packet.
Turning to the proof of b), we use a familiar reasoning due to P.-A. Meyer. Since H_A is nearly Borel, for h ∈ H_A there is a Borel set H_1 ⊂ (H_A ∩ H_0) with P^h{Z_t ∈ H_1 for all t} = 1. Then by the same reasoning as for part a), the set H_2 = {z ∈ H : P^z{Z_t ∈ H_1, t ≥ 0} = 1} is a packet with P^h{Z_0 ∈ H_2} = 1. Similarly, we define by induction a sequence H_1 ⊃ H_2 ⊃ H_3 ⊃ ... such that for all n, H_{2n-1} ∈ H and H_{2n} is a packet with P^h{Z_0 ∈ H_{2n}} = 1. Then plainly H^∞ = ∩_n H_n = ∩_n H_{2n-1} defines a Borel set and P^h{Z_0 ∈ H^∞} = 1. Finally, we set H_h = {z : P^z{Z_0 ∈ H^∞} = 1}. Then H_h is a Borel packet, h ∈ H_h, and if P^z{Z_t ∈ H_h for t ≥ 0} = 1 then obviously z ∈ H_h.

Before proving c), we mention two simple Corollaries.

COROLLARY 2.2. For any probability μ on H_A, there is a Borel packet H_μ ⊂ H_A with P^μ{Z_0 ∈ H_μ} = 1, and further H_μ ⊇ {z ∈ H : P^z{Z_t ∈ H_μ for all t ≥ 0} = 1}.

PROOF. By definition of nearly-Borel set, there is an H^1 ⊂ H_A ∩ H_0, H^1 ∈ H, with P^μ{Z_t ∈ H^1 for all t} = 1. Then as in part a) the set H^2 = {z ∈ H : P^z{Z_t ∈ H^1, t ≥ 0} = 1} is a packet, and P^μ{Z_0 ∈ H^2} = 1. Proceeding by induction as in b), we obtain a decreasing sequence H^n ⊂ H_A ∩ H_0 with H^{2n-1} ∈ H and P^μ{Z_0 ∈ H^n} = 1. Now let H^∞ = ∩_n H^n, and H_μ = {z : P^z{Z_0 ∈ H^∞} = 1}.
COROLLARY 2.3.
For any packet K such that K ∩ H_0 = H_A ∩ H_0, we have K ⊂ H_A. Thus H_A is the largest packet having the given non-branching points of H_A.

PROOF. For any packet K, one has q(0,z,K ∩ H_0) = 1 for z ∈ K. But it follows by the definition of H_A, using the Markov property again, that H_A contains all z with q(0,z,H_A ∩ H_0) = 1. Thus the Corollary is proved.

REMARK. We observe that for any initial probability μ on H_A, an element h_μ of H_A is defined by

h_μ(S) = ∫_{H_A} ∫ q(0,h,dy) y(S) μ(dh) = ∫ (∫_{H_A} q(0,h,dy) μ(dh)) y(S) ,   S ∈ G° ,
where the probability in parentheses is concentrated on H_0 ∩ H_A.

Returning now to the proof of Theorem 2.1 c), for h ∈ H_A let H_h ⊂ H_A be a Borel packet as in b). We wish to show that
e H for all t > 0} = 1, tA t > 0} = 1 . Now 1 (Z. ) is a H h t-
previsible stopping time,
H
and we know that
P {z e H for all t n Z - previsible process, and for each t
T, 0 < T < <», we have by the moderate Markov
property ph(Zt e Hh
for all
t > τ|Z
)
V = p
(Z
e H
for all
t
> 0)
= 1 . Consequently, by b) we have theorem it follows that
P {Z
P {i
(Z
H
P {Z
e H
for all
e H^} = 1 . ) = 1
for all t > 0} = 1 ,
~
t > 0} = 1
as required.
A natural question is whether, given a set A ∈ H, there is a smallest packet containing it. The example of a Brownian motion B_2(t) in R^2, with A = {(0,0)}, shows however that no smallest packet need exist. Here the points (x,y) ∈ R^2 correspond to points of H via the usual P^{(x,y)}, and clearly any polar set may be subtracted from R^2 (but no non-polar set may be subtracted) to leave a packet. It can be shown that in this example H_A is the set of all Brownian probabilities corresponding to initial distributions μ on (R^2, B^2), but the proof probably requires Ray compactifications (see Discussion 3) and 4) of Conjecture 2.10 below). It also should be noted that the definition of packet depends only on the transition measures q(t,h,dz) of the prediction process, and these do not depend on the exact choice of the W^h_t (which is not unique, since it involves the choices of Definition 1.6). In short, a packet is just a continuous-time analogue of a "conservative set" for a Markov chain. In the case that the elements of A are themselves Markovian probabilities on Ω (as in the Brownian example above), the measures q(t,h,·), h ∈ A, are usually easy to identify, and the appropriate packet becomes evident.

This leads to a method of finding a "nice" transition function for a Markov process, which is the subject of the third essay. Here we can illustrate it in a more classical case by continuing our example of B_2(t). Let B be a Borel, non-polar set in R^2, and consider the usual killed process B_Δ(t) = B_2(t) for t < T_B, and B_Δ(t) = Δ for t ≥ T_B,
where Δ is adjoined as an isolated point. Classically, the probabilities P^{(x,y)}{B_Δ(t) ∈ C}, C ∈ B^2, are only known to be universally measurable in (x,y). Thus one obtains for B_Δ a universally measurable transition function. However, using the prediction process it is easy to get a transition function on a countably generated subfield of universally measurable sets which is the restriction of a Borel transition function on a larger space.

The natural state space of B_Δ(t) is Δ together with the (finely open) set (B^c)_r = {(x,y) : P^{(x,y)}{T_B > 0} = 1}, i.e., the complement of the set of regular points for B. Since α-excessive functions for B_2(t) are Borel measurable, and E^{(x,y)}(exp −αT_B) is α-excessive, it is not hard to show that (B^c)_r is a Borel set, but we need only its universal measurability. Identifying (x,y) with P^{(x,y)}, where Δ = (infinity,infinity) and the path is the coordinate pair (w_1(t), w_2(t)), we obtain a one-to-one mapping of (B^c)_r ∪ Δ into H. Letting φ_2(z) denote the first two coordinates of the Borel mapping φ of Theorem 1.9, we set

R_Δ = {z ∈ H : φ_2(z) ∈ (B^c)_r ∪ {(infinity,infinity)}} .

Since (B^c)_r is universally measurable, R_Δ is universally measurable in H, and it follows by using a generating sequence that the trace of H on R_Δ is countably generated. Then, with (B^c)_r ∪ Δ in place of A, the σ-field of universally measurable sets in (B^c)_r ∪ Δ is mapped by the identification onto a countably generated σ-field on which q maps into the transition function of B_Δ. In the present case, it can be shown that the image σ-field is really the Borel field, but this seems to require in general Meyer's hypothesis of "absolute continuity".

The theory of Ray processes (and Ray semigroups) is rather well understood, and will not be developed here. We refer instead to Getoor [8] for all of the facts we shall need. By means of the familiar compactification procedure (to be described below) this theory may be brought to bear on any packet of the prediction process. Thus, it leads to a more satisfactory form of Theorem 2.1 (Corollary 2.12), and also to an interesting open problem (Conjecture 2.10) which is discussed in some detail. It also makes
possible a transcription of much of the "comparison of processes" from [8] to the prediction process setting, but some of this we leave to the reader.
Part
of the material which we do cover is needed again for the fourth essay. We start with any prediction packet which we denote by convenience although
A
alone is unspecified and
H
H
for
has no reference in
A general to Theorem 2.1.
It is clear from Theorem 1.17 that
Z
becomes
a right process on H Π H , with the Borel transition function if H Π H is not Borel, we have for z e H ίl H , q(t,z,B) = A U A U q(t,z,B Π H
Π H )
for
B € H,
subset of the compact metric space
H
Then
C
H
fl H
H
has a countable subset which is dense in
H,
which will be
C
in the uniform norm.
R g(z) denote the resolvent of the right-process Z λ t H Π H , we form the minimal set of functions containing A 0 + {R g : λ > 0, g 6 C } and closed under the two operations: λ a) application of R for λ > 0 , λ b)
as a
(H Π K L ) + is as follows. Let C + denote the A 0 of non-negative continuous functions on H .
Letting
formation of minima
Since we have
ίl H
of Proposition 1.3, and form its Ray
compactification (as in Chapter 10 of [8]) relative to
restriction to
(even
where the right side is the extension to a
universally measurable set). Consequently, we may consider
denoted by (H Π H ) The definition of
q
on
f Λ g .
(f Λ g) + (h Λ k) = (f+h) Λ (g+h) Λ (f+k) Λ (g+k),
it is easy
to see by simple induction that the set is closed under formation of linear combinations with non-negative coefficients.
Hence, it is the minimal
convex cone closed under operations a) and b ) . A crucial lemma ([8, (10.1)]) now asserts that this cone contains a countable uniformly dense subset. Furthermore, the cone separates points in
H
ίl H Λ since R does so. A 0 λ + We now define (H Π H ) to be the compact metrizable space obtained by completing H ΓΊ H Λ in a metric Σ°° . α 1If (z.) - f (zj I , where (f ) A 0 n=l n n 1 n 2 ' n is uniformly dense in the cone, α > 0, and Σ _ α(max f ) < « . n=l n Clearly the topology of of
f
or
α
.
(H
Π H )
does not depend on the particular choice
It is homeomorphic to the closure of the image of
H Π H Λ in X00 . [0,~) by the function f(z) = (f, (z) , fo(z),...) . A 0 n=l 1 2 If H is Borel, then its one-to-one image in (H Π H_) is also Borel, A A 0 while in general its image is universally measurable [8, (11.3)]. It is now easy to see by the Stone-Weierstrass Theorem that the space C(H
Π H_) A 0
of continuous functions on
(H Π H Λ ) A 0
is the uniform closure
of the differences f − g of elements of the cone, extended to (H_A ∩ H_0)^+ by continuity. Letting f̄ denote a uniform limit of such differences on H_A ∩ H_0, and f̄^+ its extension by continuity to (H_A ∩ H_0)^+, we now define a resolvent on C((H_A ∩ H_0)^+) by

(2.2)   R̄_λ f̄^+ = (R_λ f̄)^+ ,   f̄^+ ∈ C((H_A ∩ H_0)^+) , λ > 0 .

The resolvent R̄_λ has the special property that it carries C((H_A ∩ H_0)^+) into itself. Finally, one shows [8, (10.2)] that every element of the cone is λ-excessive for some λ > 0, hence R̄_λ separates points and so R̄_λ is a Ray resolvent on (H_A ∩ H_0)^+.

It follows by a Theorem of D. Ray that there is a unique right-continuous Markov semigroup P̄_t on C((H_A ∩ H_0)^+) with resolvent R̄_λ, whose transition measures we denote by p(t,h,dz). We also introduce the Ray Space (of Z_t on H_A ∩ H_0) as in [8, Chapter 15].

DEFINITION 2.4. The Ray Space is the set

U_A = {z ∈ (H_A ∩ H_0)^+ : p(0,z,H_A ∩ H_0) = 1} .
More properly, one should write
^ n n Q c o n f u s i o n w i l l A 0 arise. It is clear that U does not depend on λ > 0, and that it is + universally measurable in (H Π H ) . If H e H then U is also Borel. A 0 A A Three basic facts about P
P
ϋ
from [8, Chapter 15] which serve to connect
with the prediction process may be summarized as follows.
PROPOSITION 2.5. 1.
For
Thus p and 2.
For
z e H A Π HQ q
and
f € c(H A Π H Q ) + we have
may be identified on z e u A
we have
H
P f(z) = Q
f(z) .
Π HQ .
P. (I u n u (z)) = 1 t H_||π A 0
for
t > 0
(where
5
is t
defined for universally measurable functions by the usual extension procedure). 3. For the canonical Ray process (X , P ) on the probability space of r.c.1.1. paths with values in (H Π H Λ ) + , we have for z e u A 0 A Z P {5 e H Π H_ for all t > 0} = 1 , t A 0 and Z
P {X
e u
Recalling again the space H̄ of probabilities on the compact metrizable space Ω̄ of equivalence classes of measurable functions, we will show that the Ray topology is stronger on H_A ∩ H_0 than the H-topology. Hence (H_A ∩ H_0)^+ is "saturated" by the equivalence classes of elements corresponding to the same element in H, and these classes reduce to single elements on H_A ∩ H_0. Furthermore, on U_A the corresponding elements of H̄ have a special form: they assign probability one to paths which are r.c.l.l. for t > 0. Only the right-limits at t = 0 are not known to exist, hence the mapping does not quite have its range in H. Nevertheless, it is sufficient to permit properties of the Ray process to be applied to the process Z^h_t for h ∈ H_A ∩ H_0.
Turning to the details, we first characterize convergence in H.

LEMMA 2.6. A sequence h_k ∈ H is Cauchy in H if and only if, for the dense sequence f_n of Notation 1.4,

E^{h_k} ∫_0^∞ exp(−βt) f_n ∘ θ_t dt

is a real Cauchy sequence in k for each n and β > 0.

PROOF. By Theorem 1.2 b) the integrals are uniformly continuous on Ω̄. Hence our condition is clearly necessary. To prove sufficiency, we observe by the same result that the E^{h_k} f_n ∘ θ_t are uniformly continuous and bounded in t, uniformly in k, for each n. Then by inversion of the Laplace transforms (as in Lemma 1.6) we have convergence in k of E^{h_k} f_n ∘ θ_t for each t ≥ 0 and n. For t = 0 this reduces to convergence of (h_k) in H, as required.
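One standard way to carry out the Laplace-transform inversion invoked in this proof (and in Lemma 1.6) is the Post–Widder formula: writing u_k(t) = E^{h_k} f_n ∘ θ_t and û_k(β) = ∫_0^∞ e^{−βt} u_k(t) dt,

```latex
u_k(t) \;=\; \lim_{m\to\infty}\;\frac{(-1)^m}{m!}\left(\frac{m}{t}\right)^{m+1}
\hat{u}_k^{(m)}\!\left(\frac{m}{t}\right), \qquad t>0,
```

valid for bounded continuous u_k. Since the û_k are analytic and uniformly bounded on compact subsets of {Re β > 0}, pointwise convergence of û_k(β) for β > 0 yields convergence of all derivatives; the uniform continuity and boundedness of the u_k (uniformly in k) then make it possible to combine the limits in k and m, which is the substance of the Lemma 1.6 argument cited in the text.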
Using the Lemma, we may compare the Ray and H-topologies.

THEOREM 2.7. If we have h_k ∈ H_A ∩ H_0, 1 ≤ k, and lim_{k→∞} h_k = z exists in the topology of (H_A ∩ H_0)^+, then lim_{k→∞} h_k = h exists in the topology of H. Furthermore, let h(z) denote the induced mapping: h(z) = z on H_A ∩ H_0, and h(z) = h if z ∉ H_A ∩ H_0 and (z,h) correspond as above. Then h(z) is continuous on (H_A ∩ H_0)^+. Finally, for z ∈ U_A we have

P^{h(z)}{paths r.c.l.l. for t > 0} = 1 .
PROOF. Let h_k ∈ H_A ∩ H_0 be a convergent sequence in the Ray topology, with limit z ∈ (H_A ∩ H_0)^+. This requires convergence of R_λ g(h_k) for g ∈ C^+. In particular, let g(z) = E^z f (= zf in Notation 1.10) for 0 ≤ f ∈ C(Ω̄). Then we have

(2.3)   R_λ g(h_k) = E^{h_k} ∫_0^∞ e^{−λt} E^{Z_t} f dt = E^{h_k} ∫_0^∞ e^{−λt} f ∘ θ_t dt ,   λ > 0 .

Thus convergence in the Ray topology implies convergence in the topology of H by Lemma 2.6. Accordingly, there is a unique h(z) ∈ H such that h_k → h(z). Since H_A ∩ H_0 is dense in (H_A ∩ H_0)^+, the mapping h(z) : (H_A ∩ H_0)^+ → H is well-defined and continuous, and reduces to the identity on H_A ∩ H_0.

We will examine more closely the case z ∈ U_A. Passing to the limit in (2.3) yields

(2.4)   R̄_λ ḡ^+(z) = E^{h(z)} ∫_0^∞ e^{−λt} f ∘ θ_t dt ,

but the middle term in (2.3) is no longer well-defined in the limit if h(z) ∉ H_A ∩ H_0 (in the context of [9], Z_t becomes the prediction process on Ω̄). However, the same limit may be expressed in terms of the Ray process X̄_t of Proposition 2.5, since X_t = h(X̄_t) on H_A ∩ H_0. To this end, we need to establish
end, we need to establish LEMMA 2.7.
For
g(z) = E f,
f
continuous on
R^i"(z) = E Z / 0 e " λ t E REMARK. PROOF.
t
Ω,
and
z e u , we have
f dt .
This was also used for [10, Theorem 2.4 d)] with incomplete proof. For
3 > 0,
the function
R o FLg p
is
8-excessive for
λ
t
it is known [8, (5.8)] that lim RβIΓg" (XJ = R β £Γg"(z) , t
40
M
fc
β
P Z -a.s.
λ
Also, by (2.2) and the resolvent equation, lim β R. R.g = lim 3 R o R Λ g 3H»
3
λ
3^00
X ,
β
λ
lim (3/(3-λ))(R.g - R.g)
hence
ESSAYS ON THE PREDICTION PROCESS
and the limit is uniform on
(H A Π H Q )
+
.
It follows that limits can be
interchanged to obtain ^
= lim lim 3E S+oo t-*0
Z
RQIΓg*(X ) 3 λ t
= lim lim βE +0 3_oo t
Z
R
= lim E fc*O But since for
t > 0
we have
X
Z
3
R g(X ) λ t
RΓg(X. ) λ
t
e H Π
H
P -a.s., the last expression
becomes = lim E Z R,g(Xj
= lim EZ E X t tK)
= lim i Z t-K)
Γ e " λ S E X s f ds
Γ e " λ s E X t + S f ds
= lim i Z e λ t t-K)
Γ e " λ s E ^ f ds fc
;
dt
f
completing the proof.

Combining this with (2.4) yields

(2.5)   Ē^z ∫_0^∞ e^{−λt} E^{X̄_t} f dt = E^{h(z)} ∫_0^∞ e^{−λt} f ∘ θ_t dt .

Since X̄_t is right-continuous in the Ray topology, which we have seen is stronger than the H-topology on H, E^{X̄_t} f is right-continuous in t > 0. By Theorem 1.2 b), E^{h(z)} f ∘ θ_t is also right-continuous. Thus by inversion of the transforms in (2.5) we obtain

Ē^z E^{X̄_t} f = E^{h(z)} f ∘ θ_t ,   t > 0 ,

for 0 ≤ f continuous on Ω̄. By Proposition 2.5.1, the left side is Ē^z E^{X̄_t}(f I_Ω). By a monotone class argument the equality extends to bounded Borel f, hence it follows that the right side is E^{h(z)}((f I_Ω) ∘ θ_t). This implies that
FRANK B. KNIGHT
for
t > 0,
t -> 0,
p
{paths which are r.c.1.1. in
[t,°°) } = 1 .
Letting
the last assertion of Theorem 2.7 is proved.
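For completeness, the resolvent limit used in the interchange above can be written out; this is a sketch relying only on the resolvent identity and the sup-norm bound for resolvents:

```latex
% Resolvent identity: R_\beta R_\lambda = (\beta-\lambda)^{-1}(R_\lambda - R_\beta),
% valid for \beta \neq \lambda.
\beta R_\beta R_\lambda g
  \;=\; \frac{\beta}{\beta-\lambda}\,\bigl(R_\lambda g - R_\beta g\bigr)
  \;\longrightarrow\; R_\lambda g \qquad (\beta \to \infty),
% since \beta/(\beta-\lambda) \to 1 and, in the supremum norm,
% \|R_\beta g\| \le \beta^{-1}\|g\| \to 0.
```

The convergence takes place in the supremum norm, which is exactly the uniformity on (H_A ∩ H_0)^+ invoked in the proof to justify interchanging the limits in β and t.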
It is thus plausible that for z ∈ U_A the Ray process may be expressed as a prediction process on a slightly larger space than Ω, but smaller than H. We introduce

NOTATION 2.8.  Let Ω₁ = {elements in Ω which are r.c.l.l. for t > 0}, and let H₁ = {h ∈ H : h(Ω₁) = 1}, H₁ = {A ∩ H₁ : A ∈ H}. We introduce F°_{t+} = ∩_{ε>0} σ(X_s, 0 < s < t + ε), and set H̃ = {h ∈ H₁ : P^h{Z_0 = h} = 1}. Note that X_0 is not F°_{0+}-measurable. This conforms to the fact that, as a "coordinate in Ω₁," X_0 is not even well-defined.

REMARK.  The meaning of Z_0 here is really in the sense of an essential right limit, which happens to coincide with Z_{0+}.

THEOREM 2.9.  For h ∈ H₁ one can define the prediction process Z_t of X_t on Ω₁, and extend the transition function q to (H₁, H₁), in such a way that Theorems 1.7 and 1.15 remain true; Theorem 1.17 also applies for t > 0. We will not elaborate all details.

PROOF.  This is just a special case of [9]. The point is that, for t > 0, we can use exactly the same σ-fields G° and the same construction as before to define Z_t, to show that P^h{Z_t ∈ H̃ for t > 0} = 1, and to show that the same transition function q continues to apply for Z_t, t > 0. On the other hand, for f continuous on Ω it follows by Hunt's Lemma that for rationals r > 0,

    lim_{r→0} Z_r f = lim_{r→0} E^h(f ∘ θ_r | F°_{r+}) ,    P^h-a.s.

Since Z_t is right-continuous for t > 0 in the topology of H, we see that lim_{t→0} Z_t = Z_0 exists P^h-a.s., and

    E^h(S | F°_{0+}) = Z_0(S) ,    S ∈ G°(Ω) .

Now if we define q̄(t,h,A) = P^h{Z_t ∈ A} for h ∈ H₁ − H̃, A ∈ H₁, and q̄(t,h,A) = q(t,h,A ∩ H̃) for h ∈ H̃, then the Markov property of Z_t for s > 0 implies that

    q̄(s+t,h,A) = ∫ q̄(s,h,dz) q̄(t,z,A)    for s > 0 and all t ≥ 0 .

On the other hand, for s = 0 we have for t > 0

    q̄(t,h,A) = P^h{Z_t ∈ A} = E^h P^{Z_0}(Z_t ∈ A) = ∫ q̄(0,h,dz) q̄(t,z,A) ,

completing the verification of the Chapman-Kolmogorov property of q̄. Since H̃ ⊂ H₁, it only remains to verify that for h ∈ H̃, P^h{Z_0 ∈ H̃} = 1. Since, by construction, Z_0 is F°_{0+}-measurable, this last is a consequence of the strong Markov property with T = 0. Formally, it follows because 1 = E^h(q(0, Z_0, {Z_0})), implying that the expression in the last parentheses equals 1, P^h-a.s.

In view of Theorem 2.9, we define the prediction space and prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t) in complete analogy with Definition 2.1, and it has the same Markov properties noted there.
We are now in a position to state an interesting conjecture concerning the relation of this prediction process to the Ray processes (see also Theorem 2.7).

CONJECTURE 2.10.  For any packet H_A and h ∈ U_A, let μ_h(dy) = P^h(h(X_0) ∈ dy). Then X_t, t > 0, is P^h-equivalent in distribution to the prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t) on H_A with initial distribution μ_h(dy).

DISCUSSION.  1)  Since X_t has right limits in the H-topology at t = 0, P^h-a.s., the conjecture follows if it is shown that the mapping h(z) is one-to-one on the non-Ray-branching points of U_A. The converse implication is also clear.

2)  We do not conjecture that X_t is P^h-equivalent to the prediction process of a fixed element of H. This is false in general. For example, consider the sequence h_n, 1 ≤ n, where h_n is the probability of the process X_t which with probability 1/2 chooses one of the two paths w_1(t) = n^{-1} + (t-1)^+ or w_2(t) = -(t-1)^+, where (t-1)^+ = max(0, t-1). Then in the Ray topology lim_{n→∞} h_n = h, where h is the Ray branch-point which with probability 1/2 gives the prediction process of either of the deterministic processes X_t = (t-1)^+ or X_t = -(t-1)^+. It is not hard to see that this initial distribution for the prediction process cannot be expressed as P^z{Z_0 ∈ (·)} for any z ∈ H. The necessary and sufficient condition for such a representation is contained in Theorem 1.2 of [10]. On the other hand, in the H-topology lim_{n→∞} h_n = z, where z ∈ H is the obvious probability concentrated on 2 points of Ω.
3)  The importance of the conjecture, at least from the standpoint of theory, lies in the fact that all entrance laws for the transition function q on H_A ∩ H_0 (having mass 1) are expressed by initial distributions on (H_A ∩ H_0)^+ for the Ray process. This fact seems to have first been noted by H. Kunita and T. Watanabe [11, Theorem 1]. Hence, our conjecture is equivalent to the assertion that every (finite) entrance law for the prediction process on H_A ∩ H_0 is realized by an initial distribution of the prediction process of (Ω₁, G°, F°_{t+}, θ_t, X_t). Of course, it suffices here to consider the case H_A ∩ H_0 = H_A. The analogous conjecture for the prediction process of [9] on H (or equivalently, on the set H_0 = {h ∈ H : P^h{Z_0 = h} = 1}) would be that it is already closed under formation of entrance laws. Hence the Ray space of H_0 would correspond to a subset of initial distributions over H_0. It is easily shown that this Ray space does define a process corresponding to each P^h, h ∈ H, and by Discussion 1) above it is then strictly larger than H. The class of processes for t > 0 obtained from initial distributions on the Ray space is then the same class as those obtained from all initial distributions on H_0 (or equivalently, on H), if this extended conjecture holds.

4)  For the packet of an autonomous germ-Markov process, the conjecture holds and X_t is even represented by a single element of H (see [10, Theorem 2.4] for a more general setting).

As far as concerns the left-limit process Z_{t-}, it will be seen that the result of Conjecture 2.10 does hold, at least for Borel packets. A still more satisfactory result will be shown subsequently.

THEOREM 2.11.  For any Borel H-packet H_A ∩ H_0, let

    C_A = {z ∈ (H_A ∩ H_0)^+ : for f ∈ C(H_A ∩ H_0)^+ with corresponding f̄ as in (2.2), R_λ f̄(z) = ∫ p(0,z,dy) R_λ f̄(h(y))} ,

where the integral is over {y : h(y) ∈ H_A ∩ H_0}. Then C_A is Borel in (H_A ∩ H_0)^+, and for z ∈ U_A,

    P^z{X_{t-} ∈ C_A for all t > 0} = 1 .
PROOF.  Since h(y) is continuous and p(t,z,dy) is a Borel transition function, while q(t,h(y),A), A ∈ H, is also Borel in the Ray topology, it is clear by letting f range through a countable dense set that C_A is Borel in (H_A ∩ H_0)^+. To prove the second assertion, it suffices to assume t > ε for some ε > 0, and since the Ray and H-topologies induce the same P^z-augmented σ-fields on H_A ∩ H_0, we may as well assume that X_t and Z_t are identified for t > 0. Then I_{C_A}(X_{t-}) is a previsible process for the σ-fields generated by Z_s, s < t. By the previsible section theorem, it now is enough to show that for previsible T with 0 < T < ∞, P^z{I_{C_A}(X_{T-}) = 1} = 1. Since X_t ∈ H_A ∩ H_0 for t > 0, and the Ray processes have the moderate Markov property, it follows that

(2.6)    R_λ f̄(X_{T-}) = E^z(∫_0^∞ e^{-λt} f̄(X_{T+t}) dt | F_{T-}) = ∫ p(0, X_{T-}, dy) R_λ f̄(y) ,    P^z-a.s.

Since h(y) = y on H_A ∩ H_0, this is the asserted result.
Irrespective of Conjecture 2.10, we can regard C_A as a complete Borel packet in the Ray space, each of whose elements corresponds to an initial distribution on H_A ∩ H_0. However, a stronger result is evident by comparison of (2.6) with the moderate Markov property of Z_t (Theorem 1.18). Thus the expression in (2.6) must also equal R_λ f̄(Z_{T-}), since both determine the probabilities of Z_{T+t} given Z_{T-}. It follows by the previsible section theorem that for z ∈ U_A,

    P^z{R_λ f̄(X_{t-}) = R_λ f̄(Z_{t-}) for all t > 0} = 1 .

But by continuity of h(z) we have

    Z_{t-} = lim_{s→t-} Z_s = lim_{s→t-} h(X_s) = h(X_{t-}) .

Substituting for Z_{t-} in the above, we have shown

COROLLARY 2.12.
For any Borel H-packet H_A ∩ H_0, let

    D_A = {z ∈ U_A : for f ∈ C(H_A ∩ H_0)^+ with f̄ as in (2.2), R_λ f̄(z) = R_λ f̄(h(z))} .

Then D_A is Borel in (H_A ∩ H_0)^+, and for z ∈ U_A,

    P^z{X_{t-} ∈ D_A for all t > 0} = 1 .

Finally, the image h(D_A) ∩ H is a complete Borel packet in H containing H_A ∩ H_0.
PROOF.  Only the final assertion remains to be shown, since obviously D_A is Borel in (H_A ∩ H_0)^+. But since z is determined uniquely in (H_A ∩ H_0)^+ by {R_λ f̄(z)}, we see that h(z) is one-to-one on D_A. Hence h(D_A) is Borel in H₁, and h(D_A) ∩ H is Borel in H. Since for z ∈ h(D_A) ∩ H we have

    P^z{Z_t ∈ h(D_A) ∩ H and h(X_t) = Z_t for all t > 0} = 1 ,

the result is proved.

According to Corollary 2.12, starting from any Borel H-packet H_A ∩ H_0, we can form the complete Borel packet h(D_A) ∩ H containing it, all of whose elements determine the same processes as corresponding initial distributions on D_A, and have the property that the process Z_t remains in H_A ∩ H_0 with left-limits in h(D_A) ∩ H for all t > 0. Thus it is quite natural to replace the process on H_A ∩ H_0 by the right process on h(D_A) ∩ H. Since h(z) is one-to-one on D_A, we can regard this process equivalently in either the Ray or the H-topology in so far as concerns its times of discontinuity. Thus, there is no need to make an elaborate "comparison of processes," as in [8, Chapter 13] for example. Instead, we can transcribe results for the Ray process directly into results for the H-process. To conclude the present section, let us illustrate this by transcribing Theorem (7.6) of [8, Chapter 7].

THEOREM 2.13.
For a Borel H-packet H_A ∩ H_0 (or more generally on h(D_A) ∩ H), let μ be a fixed initial distribution on H_A ∩ H_0, and let T be a Z_t^μ-stopping time, where Z_t^μ are the usual augmented σ-fields of Z_t for P^μ.

(i)  If Z_{T-} = Z_T on {0 < T < ∞}, P^μ-a.s., then T is Z_t^μ-previsible.

(ii)  Let B denote the set of Ray branch-points in (H_A ∩ H_0)^+. Then the totally inaccessible part of T is T on A and ∞ on Ω_Z − A, where

    A = {0 < T < ∞, X_{T-} ∈ (H_A ∩ H_0)^+ − B, X_{T-} ≠ X_T}
      = {0 < T < ∞, Z_{T-} ∈ H_0, Z_{T-} ≠ Z_T} ,    P^μ-a.s.
Both (i) and the first expression for
A
P -a.s.
in (ii) are taken
directly from [8]. It remains only to verify the second expression for Clearly if hence p
h(z)
z
G
D
a n d A
z φ B .
{Z
z
< )
e
H
= h(z)} = 0 .
z £ n
z
h
z
a n d
= < ) and
p Z
P {h(X Q ) = h(z)} = 0,
z e B,
^
0
A.
= z} = 1,
h(z) e H - H , then
Z
Hence
Then
t h e n
' o
Conversely, if
P { x = z } = 0 .
3.
h
and so
completing the proof.
3.  A VIEW TOWARD APPLICATIONS.

Since the object of the present work is not to study the prediction process per se but to develop it for applications to other processes, we conclude this essay with some general observations and partly heuristic discussion of the simplest types of examples. It may appear at present that by choosing different packets H_A one can obtain in the form Z_t practically any kind of r.c.l.l. strong-Markov process, but this is not quite true. A special feature of Z_t that is important in applications is the absence of "degenerate branch points." Here a degenerate branch point is one from which the left limit process jumps to a fixed point of the state space. But since we have a Borel transition function q(t,z,A) and the moderate Markov property, and q(0,h,{z}) = 1 if and only if z = h ∈ H_0, such deterministic jumps do not occur. This is again an expression of the fact that, by Corollary 2.12, Z_t is practically just
the Ray process of a right-process.

The same fact permits us to give criteria for Z_t to be a Hunt process, or for it to be a Z_t-previsible process.

THEOREM 3.1.  Let H_A be a complete Borel packet for Z_t (Definition 2.1, 3)). Then

a)  Z_t is a Hunt process on H_A relative to the usual σ-fields Z_t^μ for each initial distribution μ if and only if H_A ⊂ H_0 (i.e., H_A is an H_0-packet).

b)  Z_t is Z_t^μ-previsible if and only if it is continuous (P^μ-a.s. for all μ).

PROOF.  If H_A ⊂ H_0, then clearly Z_t is a right-process. The requirement that it be a Hunt process is then quasi-left-continuity. By decomposing any Z_t^μ-stopping time T into accessible and totally inaccessible parts ([4, IV, 81]), one sees that for quasi-left-continuity it is necessary and sufficient that for any increasing sequence T_n of Z_t^μ-stopping times with lim T_n = T and P^μ{T_n < T} = 1, one have P^μ{Z_{T-} = Z_T} = P^μ{T < ∞}. But by the moderate Markov property,

(3.1)    P^μ(Z_T = Z_{T-}, T < ∞) = E^μ(P^μ(Z_T = Z_{T-} | Z_{T-}), T < ∞)
                                  = E^μ(q(0, Z_{T-}, {Z_{T-}}), T < ∞)
                                  = P^μ{T < ∞} ,

since q(0,z,{z}) = 1 on H_0 and H_A is complete. Finally, H_A ⊂ H_0 is necessary even for a right-process, so the converse is obvious. The statement a) is proved for the Ray topology in [8, (13.2), (i) and (iv)]. Thus it is another way of ensuring that X_t is a Hunt process in the Ray topology, as remarked in [ibid, (13.3)]. However, by considering H_A as a subset of h(D_A) ∩ H from Corollary 2.12, we see that for μ concentrated on H_A the process Z_t is quasi-left-continuous if and only if it is quasi-left-continuous in the Ray topology. Hence the result carries over.

Turning to b), continuity implies previsibility so we need only prove the converse. Then if Z_t is Z_t^μ-previsible, both Z_t and Z_{t-} are Z_t^μ-previsible processes, and to prove that Z_t is continuous we need only prove them indistinguishable. By the previsible section theorem, it is enough to show that for previsible T, P^μ{Z_{T-} = Z_T} = 1 (as usual, Z_{T-} = Z_0 on {T = 0}, and we may replace a general previsible T by T ∧ N, N → ∞). By the moderate Markov property we have P^μ{Z_T = Z_{T-}} = E^μ q(0, Z_{T-}, {Z_{T-}}), hence we must show that P^μ{Z_{T-} ∈ H − H_0} = 0. Since T is previsible it is known [4, IV, 57] that Z_T is Z_{T-}^μ-measurable. Since there are no degenerate branching points, we must have Z_{T-} ∈ H_0, as required.

To give a feeling for the applications, we will consider briefly three situations:

a)  X_t is a Markov process,
b)  Z_t is a Markov chain,
c)  (X_t, (w_{2n}(t))) is a Markov additive process.

It is to be noted that b) is a condition on Z_t, while a) and c) are conditions on X_t. Thus our examples illustrate the point that in the combined study of X_t and Z_t neither is necessarily the first to be considered. One may start either with a known process or a known prediction process. To be sure, one does not ordinarily make assumptions on both X_t and Z_t, since each determines the other uniquely.
To study the case of Markovian X_t, if we are not interested in any "hidden information" we can assume for convenience that P{w_{2n-1}(t) = 0, t ≥ 0} = 1, 1 ≤ n, and drop the coordinates w_{2n-1} from our notation. To relate the Markov properties of X_t and Z_t, since Z_t has the role of a conditional distribution relative to F_t, we must assume that X_t is Markov relative to F_t. Alternatively, we could equivalently use F°_{t+}, but we cannot use F°_t in general since X_t might not be F°_t-measurable.* Let k ∈ H denote the probability of X_t, and let H_k be a Borel prediction packet for X_t as, for example, in Theorem 2.1 and the discussion following its proof. As noted in Definition 2.1 2), (φ(Z_t), Z_t) is P^k-equivalent to (X_t, Z_t), and it suffices to look at the former pair. It is not hard to see how the Markov property of X_t translates into an instantaneous property of Z_t. In the first place, in view of Theorem 1.9 and the Remark following, Z_t° is P^k-equivalent to the σ-field χ_t, where χ_t = σ{φ(Z_s), s ≤ t}. Hence, the Markov property of X_t is equivalent to the conditional independence (for P^k) of Z_t° and σ{φ(Z_{t+s}), 0 ≤ s} given φ(Z_t). But Z_t is also defined as a conditional probability over the latter σ-field given Z_t°, namely

    Z_t(S) = P^k((θ_t^+)^{-1}{φ(Z_·) ∈ S} | Z_t°) ,    P^k-a.s.,    S ∈ F° .

Since F° is countably generated, it follows that Z_t is determined by φ(Z_t) (the details of this transparent reasoning are given in [10, Theorem 2.2] and fortunately need not be repeated here). It follows that there is a P^k-null set N_t and a B/H-measurable ψ_t such that Z_t = ψ_t(φ(Z_t)) for w ∉ N_t. Conversely, if such N_t and ψ_t exist, then plainly X_t was Markov at time t relative to F°_{t+}. The function ψ_t plays the role of transition function for X_t, by assigning to it the conditional future ψ_t(X_t). If ψ_t may be chosen free of t, then by definition X_t is homogeneous in time.

*A variety of analytic conditions making X_t Markov relative to F_t is given in H. J. Engelbert [7]. If X_t is only Markov relative to F°_t, then it is still germ-Markov relative to F°_{t+} in the sense of [10], and may be approached by the method developed there under suitable conditions. From the standpoint of Ω₁ (as in [9]) F_t coincides with F_{t-}, and the distinction becomes meaningless.
Perhaps the most noteworthy fact here is that even if X_t is neither homogeneous nor strong-Markov, the process ψ_t(X_t) (i.e., a standard modification of Z_t) has both properties, with transition function q. Thus any such irregularities of X_t are due to ψ_t and N_t, not to Z_t. This provides a ready method of investigating transition functions of Markov processes which, as mentioned already, is the subject of the third essay.

At present, one may gain further insight by comparing this method to another one: that of the "space-time" process. It is a familiar fact that any Markov process becomes homogeneous in time if we replace X_t by (t, X_t), which means that one considers the process X_{s+t}, t ≥ 0, conditional upon X_s = x, so that an initial value is a pair (s,x), but with the added coordinate s + t, so that no value of the pair can recur. While this device is very useful in particular cases, such as in studying the heat generator (∂/∂t − ½ ∂²/∂x²), it has also been used occasionally in a general role (E. B. Dynkin, [6, 4.6]). Contrary to first impressions, the method of the prediction process apparently is quite unrelated to this as a method of "making a Markov process homogeneous." Not only are the respective topologies quite different (assuming the product topology for the space-time process), but more importantly the prediction process can repeat values, and hence may be simpler. For example, a particle confined to the unit circle 0 ≤ θ < 2π and moving with velocity v(t) = t − [t] (a saw-tooth function) has prediction process with states corresponding to pairs (v,θ), 0 ≤ v < 1, while its space-time process has states (t,θ), 0 ≤ t < ∞. In general, if X_t happens to be a time-homogeneous Markov process then it is usually equivalent to its prediction process, while (t, X_t) may be somewhat artificial and intractable.
somewhat artificial and intractable. Taking up our second illustration, since
Z
is always a homogeneous
Markov process it is natural to ask under what conditions it is a process of some special type.
For instance, if
Z
is a pure jump process, i.e., a
sum of finitely many jumps with exponentially distributed waiting times for the next jumps given the past, then property.
But unlike
Z ,
X
X
and suppose also that
obviously has the same
need not be a Markov process.
To indicate the possibilities for 1 < n,
= φ(Z )
w
X ,
(t) = 0
regarded as the real-valued process
we again take for
2 < n,
w
so that
(t) = 0, X
may be
w (t) . To construct a process X £ t having a pure-jump prediction process (apart from the case of Markovian X ) one can begin with any family K (x_,...,x t ,...,t (dx _xdλ )), n 1_ n 1 n n+1 n+1 1 < n of probability kernels over R * [ε,°°) , for fixed ε > 0, and
ESSAYS ON THE PREDICTION PROCESS
x k e R, t on
> 0,
R x [ε,°°),
variable with
1 < k < n . define
P{e
X
Letting
= x..
(x ,λ )
for
have any initial distribution
0 < t < e
where
> t} = exp(-λ t ) , independent of
Proceeding by induction, suppose that
39
x_,...,x
e x
and
is a random given
λ
e, ,...,e
In
.
have been
In
determined, and that X has been defined for 0 < t < Σ£ e . Then we select a pair (x _,λ ) distributed according to the kernel K n+i n+i n with
t
= 0,
U
t
JC
= Σ
e.,
ϊ= 1
and
~j
x
= X
JC
definition is completed by setting
t,
X
e
Σ
oo
n=l
1 < k < n . ""*""*
for
Σ*J
n-tΊ
< t < Σn+
e
k—1
The inductive
k
e
k^l
, k
is a random variable conditionally independent of
{χ_,...,x _, e_,.. ,e } 1 n+1 1 n On the
,
= x t
where
_
k-1
P-null
e < t . n -
given
λ
,
and
Pie
n+1
_ > t} n+1
CO
set where
Σ e < °° we define n=l n
It is evident that such
X t
X^ = 0 t
for
has a pure-jump prediction J ^ ^ ^
process, and it is plausible that any pure-jump prediction process
Z
all of whose expected waiting times exceed
is
obtained in this way (if
φ(Z )
is a.s.
ε 0
with probability
1
except for the first
coordinate). In this construction, even if distinct values,
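The inductive construction can be sketched in a few lines. The particular kernel `K` below is a hypothetical stand-in of our own choosing, used only to make the sketch runnable; the expected waiting times are kept at least EPS, so that the jump times cannot accumulate and the loop terminates.

```python
import random

EPS = 0.5  # expected waiting times stay >= EPS (rates are capped at 1/EPS)

def simulate_pure_jump(K, x1, lam1, horizon, rng):
    """Sample one path of X_t on [0, horizon].

    K(xs, ts, rng) -> (x_next, lam_next) plays the role of the kernels
    K_n(x_1,...,x_n; t_1,...,t_n; dx_{n+1} x dlam_{n+1}).  Returns the
    jump times t_k and the value held from each t_k onward.
    """
    xs, ts = [x1], [0.0]
    t = rng.expovariate(lam1)           # e_1, with P{e_1 > t} = exp(-lam_1 t)
    while t < horizon:
        x_next, lam_next = K(xs, ts, rng)
        xs.append(x_next)
        ts.append(t)
        t += rng.expovariate(lam_next)  # e_{n+1}, conditionally independent
    return ts, xs

def K(xs, ts, rng):
    # Hypothetical kernel, for illustration only: the next value is the
    # sign-flipped current value plus noise, and the next rate depends on
    # the proposed value, capped at 1/EPS.
    x_next = -xs[-1] + rng.gauss(0.0, 0.1)
    lam_next = min(1.0 / EPS, 0.2 + abs(x_next))
    return x_next, lam_next

rng = random.Random(7)
ts, xs = simulate_pure_jump(K, 1.0, 1.0, horizon=10.0, rng=rng)
assert ts[0] == 0.0 and len(ts) == len(xs)
assert all(a < b for a, b in zip(ts, ts[1:]))   # finitely many ordered jumps
```

Note how the sampled path depends on the whole past sequence (x_1,...,x_n; t_1,...,t_n) through `K`, which is exactly why X_t itself need not be Markov even though its prediction process is.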
Z
X
can assume only a finite number of
may have an uncountable state space since it
"predicts" the whole future sequence of
X -values.
it is easy to give sufficient conditions on the is even a finite Markov chain (other than
X
K
On the other hand, which imply that
Z
being itself one). Thus if,
for some fixed N and all n > N, K = K depends only on (x ., x ....,x ) while X n-N+1 n-N+2 ' n t moreover λ . is a fixed function λ (x H i 1 , x _,...,x ,x _) n+1 n+1 n-N+l n—N+2 n n+1 depending only on the
x v
possibilities for these chain. X
's x k
shown, then it is clear that the finitely many '
s
In particular, if the
imply that λ 's
Z
will be a finite Markov
reduce to a single constant
is a "generalized Poisson process based on an
λ,
then
N-dependent Markov
chain," in the evident sense of dependence on the past only through the last
N
that
Z
states visited.
Obviously, then, the possibilities for
X
such
is a pure jump process are quite great, and we do not pursue
them farther here. For a type of example which involves a non-Markovian which the unobserved data
(w
_ (t))
X ,
and in
are of basic importance, we
consider briefly the "Markov additive processes" (in the sense of E. Cinlar; see [3] and [153 for a vivid introduction and further references), 1 2 Roughly speaking, a standard Markov additive process is a pair (X , X )
40
FRANK B. KNIGHT
where X 1 is a standard process (in the sense of Blumenthal and Getoor) and fc 2 X is a real-valued process with conditionally independent increments 1 2 given X . In the applications X is observed, and one would like to t 1 make inferences about the underlying process X . For simplicity of 1 notation we assume that w (t) = 0 for n > 2, and that X. is realn t valued, so that we may identify the trap state Δ as °°, and let 1 2 1 1 X = w (t) , X = w (t) on Ω. Since X is Markovian, and given X the future increments of X are independent of F , it is to be P 1 o expected that the prediction process Z of (X , X^) is determined by 0
1
o
the value of X and the conditional distribution of X given If one is concerned only with X , it is simpler to treat 2 2 P X
- Xn
the form
as an additive functional, and consider S Π {x
in determining
= 0},
Z^
Se G
.
Z
Then the value z
if the values of
t
ί
χ
e
B
0
K
F
.
restricted to sets of X
becomes irrelevant
B ^ 5,
are known.
We can incorporate this change of view by redefining our translation operators appropriately.
We turn now to the necessary notation and
hypotheses. DEFINITION 3.2.
Let
Ω* = { ( w ^ t ) , w 2 (t) ) : w 2 (0) = 0
and
w ^ t ) j4 ± «>
for all t} . Further, let G* + = {S Π Ω* : S e G° + } and F* = {S Π Ω* : S & F° } . Finally, let θ*((w.,w.)(s)) = c t + λ z * *υ * θ w ) = θ θ W fW n Ω (w χ (s),w 2 (s) - w 2 (0)) and i^2 0 t^ l 2^ ° * t ( HYPOTHESIS 3.3. Ω
A standard Markov additive process (w_1(t), w_2(t)) is a collection of probabilities P^x on G° (= ∨_t G°_t), x ∈ R, such that w_1(t) is a standard Markov process (we take Δ = +∞ as the terminal point), and

(i)  P^x{(w_1(t), w_2(t)) ∈ B_2} is B-measurable in x for B_2 ∈ B²,

(ii)  for G*_{t+}-optional T < ∞, one has

    P^x(θ*_T(w_1,w_2) ∈ B_2 | G*_{T+}) = P^{w_1(T)}((w_1,w_2) ∈ B_2) ,    B_2 ∈ B² .
We now introduce a notation for the process of conditional probabilities of w_1(t) given F_t, which is our main concern.

DEFINITION 3.3.  The filtering process of w_1(t) for initial distribution μ is the process F_t^μ(·):

    F_t^μ(B) = Z_t^μ{w_1(0) ∈ B} ,    B ∈ B ,

where for each initial distribution μ on R we let Z_t^μ denote the prediction process for P^μ = ∫ P^x μ(dx), with P^μ(Ω − Ω*) = 0. We remark that, of course, we have F_t^μ(B) = P^μ(w_1(t) ∈ B | F*_{t+}).
A remarkable result of M. Yor [15, Theorem 4] asserts that the F_t^μ(·) are themselves r.c.l.l. strong-Markov processes with a single Borel transition function. Here we will deduce this from the corresponding fact for the Z_t^μ. However, this does not quite give as nice a topology as [15] (see the remarks following the proof). For our proof, we need a further notation and lemma.

LEMMA 3.4.  For each initial μ on R and y ∈ R, we define a measure P_y^μ on (Ω, G°) by first setting, for S ∈ G°,

    P_y^x(S) = P^x θ*^{-1}(S_y) ,    where    S_y = S ∩ {w_2(0) = y} ,

and then P_y^μ = ∫ P_y^x μ(dx). Let H* = {P_y^μ : y ∈ R, all μ}. Then H* is a Borel prediction packet, and for each μ we have

(3.2)    P^μ{Z_t^μ = P_{w_2(t)}^{F_t^μ} for all t > 0} = 1 .

PROOF.  For S of the form S = {a < w_2(0) < b} ∩ (θ*_0)^{-1} S* with S* ∈ G*°, we have P_y^x(S) = I_{(a,b)}(y) P^x(S*). Let S_n be a countable sequence of such sets which generates G°. Since by (i) and (ii) P^x is a one-to-one Borel kernel of probabilities on G° with P^x{w_1(0) = x} = 1, we see that P_y^μ is also one-to-one and Borel in (μ,y). Then it follows that the sequential range {(P_y^μ(S_n)) : y ∈ R, μ a probability on R} is a Borel set in ×_{n=1}^∞ [0,1], implying that H* is Borel. To prove that H* is a packet it suffices to show (3.2), since clearly P^μ = P_0^μ, and if (3.2) is true then

(3.3)    P_y^μ{Z_t = P_{y+w_2(t)}^{F_t} for all t > 0} = 1 ,    y ∈ R ,

by translation (we omit the superscript μ on Z_t). Since P_y^x is Borel in (x,y), it is clear that both sides of (3.2) are optional, hence it is enough to prove the equality at F_t^μ-optional T < ∞. Now by (ii) and the definition of Z^μ, we have

    Z_T^μ(S) = P_{w_2(T)}^{F_T^μ}(S) ,

as asserted.

By this lemma, we can introduce the filtering process as a function of the prediction process with state space H*, and derive its properties from the latter.

THEOREM 3.4.
The probability-valued process F_t^μ(B) = Z_t^μ{w_1(0) ∈ B}, B ∈ B, as a function of the prediction process Z_t^μ on H*, is a right-continuous, strong-Markov process for a suitable topology such that the space (M, M̄) of probabilities on B with its generated σ-field is a metrizable Lusin space. Accordingly, the same results are true for the processes F_t^μ.

PROOF.  For h = P_y^μ ∈ H*, set F^h(B) = μ(B), B ∈ B (this is not to be mistaken for F_t^μ, which has a subscript). Then for M ∈ M̄, we let A_M = {h ∈ H* : F^h ∈ M}. Clearly A_M ∈ H̄, and writing now P_y^μ for the probability of the canonical prediction process on H* with initial measure h = P_y^μ, we have

(3.4)    P_y^μ(F_{T+t} ∈ M | Z_T°) = q(t, Z_T, A_M) .

On the other hand, recalling the σ-fields χ_t generated by φ(Z_s), s ≤ t, we can transfer (3.3) to the canonical space and rewrite (3.4) in the form

(3.5)    P_y^μ(F_{T+t} ∈ M | χ_{T+}) = P^{F_T}{F_t ∈ M} = q(t, P_0^{F_T}, A_M) ,

where we used (3.3) with F_T in place of μ, along with the fact that in distribution F_t does not depend on y for initial probabilities of the form P_y^μ ∈ H*. Accordingly, we may define a transition function q* for F_t by q*(t,μ,M) = q(t, P_0^μ, A_M), and (3.5) becomes

(3.6)    P_y^μ(F_{T+t} ∈ M | Z_T°) = q*(t, F_T, M) .

Since P_y^μ was shown to be Borel in (μ,y) and one-to-one in μ, it is not hard to see that q* is a Borel transition function on (M, M̄). Finally, the topology on M referred to in the theorem is just that induced by the mapping μ → P_0^μ and the topology of H*, since it is easily seen that right-continuity of Z_t implies right-continuity of P_{y+w_2(t)}^{F_t} in (3.3) (from the right-continuity of w_2(t)). Thus Theorem 3.4 is proved.

DISCUSSION.  It follows directly from the (known) fact that the optional projections of the r.c.l.l. processes f(w_1(t)), f ∈ C(R), are again r.c.l.l. P^μ-a.s. ([5, Chapter 2, Theorem 20]), that F_t^μ is even r.c.l.l. in the usual weak-* topology. This, together with further applications, is found in [15]. From an applied viewpoint, it is only the processes F_{t-}^μ(B) = Z_{t-}^μ{w_1(0) ∈ B}, B ∈ B, which are realistic, since only they do not depend on the future element of F*_{t+}. Further, with the usual convention that F_{0-}^μ is degenerate, one has P^μ{F_{0-}^μ = μ} = 1, unlike F_0^μ. Using the fact that P^μ{w_1(T-) = w_1(T)} = 1 at previsible T < ξ, however, it is clear that F_{t-}^μ has no previsible discontinuities except perhaps at the lifetime ξ of w_1(t). Hence, the moderate Markov property of F_{t-}^μ follows from the Markov property of F_t^μ.
A final remark seems merited concerning the Definition 2.1 of the prediction space Ω_Z. According to [4, IV, 19], Ω_Z is a coanalytic subset of the space of all r.c.l.l. paths with values in H, and this space is a measurable Lusin space. The question naturally arises of whether, by restricting this space to the r.c.l.l. paths in some stronger topology, one might preserve its function of representing the processes Z_t and yet improve some other properties. A natural candidate is then the Skorokhod topology of measures on Ω. However, as shown by D. Aldous (unpublished), one does not have P^z{Z_t is r.c.l.l. in the Skorokhod topology} = 1. The difficulty is that the Skorokhod left-limits do not exist unless X_t is P^z-quasi-left-continuous. Hence the topology of H seems to be the most reasonable alternative.
REFERENCES

1.  Blumenthal, R. M. and Getoor, R. K. Markov Processes and Potential Theory. Academic Press, New York, 1968.
2.  Chung, K. L. and Walsh, J. B. "To reverse a Markov process," Acta Math. 123, 1970, 225-251.
3.  Çinlar, E. Markov additive processes and semi-regeneration. Proc. Fifth Conf. on Probability Theory (Brasov), Acad. R.S.R., Bucharest.
4.  Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chap. I à IV. Hermann, Paris, 1975. Chap. V-VII (to appear).
5.  Dellacherie, C. Capacités et Processus Stochastiques. Springer-Verlag, Berlin, 1972.
6.  Dynkin, E. B. Theory of Markov Processes. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1961.
7.  Engelbert, H. J. "Markov processes in general state spaces" (Part II), Math. Nachr. 82, 1978, 191-203.
8.  Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. No. 440. Springer-Verlag, New York, 1975.
9.  Knight, F. B. "A predictive view of continuous time processes," The Annals of Probability 3, 1975, 573-596.
10.  Knight, F. B. "Prediction processes and an autonomous germ-Markov property," The Annals of Probability 7, 1979, 385-405.
11.  Kunita, H. and Watanabe, T. Some theorems concerning resolvents over locally compact spaces. Proceedings of the Fifth Berkeley Symposium on Math. Stat. and Prob., Vol. II, Part 2. University of California Press, 1966, 131-163.
12.  Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Prob. X, Université de Strasbourg, 1976, 86-104.
13.  Meyer, P.-A. and Yor, M. Sur la théorie de la prédiction, et le problème de décomposition des tribus F°_{t+}. Séminaire de Prob. X, Université de Strasbourg, 1976, 104-117.
14.  Séminaire de Probabilités I-XII. Université de Strasbourg. Lecture Notes in Math. 39, 51, 88, 124, 191, 258, 321, 381, 465, 511, 581, 649.
15.  Yor, M. Sur les théories du filtrage et de la prédiction. Séminaire de Prob. XI, Université de Strasbourg, 257-297.
ESSAY II. CONTINUATION OF AN EXAMPLE OF C. DELLACHERIE

1. THE PROCESS R_t.

We consider a single occurrence in continuous time which happens at an instant T_* > 0, which may be random. For example, T_* may be the failure time of some mechanical apparatus. Analytically, the entire situation is described simply by the distribution function F(x) = P{T_* ≤ x}. We restrict F only by F(0-) = 0 and F(∞) ≤ 1, and we define T_* = ∞ where T_* is not finite, so that P{T_* = ∞} = 1 - F(∞). Without risk of confusion, we speak of the "occurrence of T_*," thus identifying the event with its instant.

From the viewpoint of an observer waiting for T_* to occur, the situation presents itself not as a distribution function but as a stochastic process, and as such it provides a basic example of general methods. Thus we associate with T_* the process

(1.0)   R_t = I_{[T_*, ∞]}(t),   -∞ < t < ∞,

where I denotes the usual indicator function.
This process was studied by C. Dellacherie (1972), and by C. S. Chou and P. A. Meyer (1975). The closely related process T_* ∧ t was also studied briefly by C. Dellacherie and P. A. Meyer (1975), who corrected some errors in [4]. Since we require some preliminary results from [4], we use that formulation in large part. However, our purpose is to study R_t in terms of its prediction process, as defined in F. B. Knight (1975) and P. A. Meyer (1976). This dictates that {T_* = 0} and {T_* = ∞} be permitted to have positive probability, which in turn makes it useful to set R_t = 0 for -∞ < t < 0. Thus we introduce the probability space (Ω, F°, P), where Ω = [0,∞], F° is the Borel σ-field, and P(dx) = F(dx), and we define T_*(x) = x and R_t(x) = I_{[x,∞]}(t), -∞ < t < ∞. Then the σ-field F°_t generated by R_s, s ≤ t, is {∅, Ω} for t < 0, and is that generated by the atom (t,∞] and the Borel sets of [0,t] for t ≥ 0.
As an example of the "general theory of processes," R_t was replaced in [4] by the supermartingale X_t = E(R_∞ - R_t | F_t), which was even a potential, since P{T_* < ∞} = 1 was assumed. In the present case, the argument of [4, Chap. 5, T56] transfers with no substantial change to provide the Doob-Meyer decomposition of R_t. We need the usual augmented σ-fields F_t (= F_{t+}) generated by F°_{t+} and all P-null sets in the completion F of F°, where for any adapted family of σ-fields G_t we set G_{t+} = ∩_{s>t} G_s and G_{t-} = ∨_{s<t} G_s.

THEOREM 1.1. The unique* F_t-previsible increasing process R~_t such that R_t - R~_t is a martingale is given by

   R~_t = 0   for  -∞ < t < 0;
   R~_t = ∫_{0-}^{T_* ∧ t} (1 - F(u-))^{-1} dF(u)   for  0 ≤ t < ∞;
   R~_∞ = lim_{t→∞} R~_t   on  {T_* = ∞}.

REMARK. In the present case R~_0 = F(0) = P{R_0 = 1}.

* Uniqueness means unique up to a fixed P-null set.
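The martingale property in Theorem 1.1 can be checked directly in the purely atomic case, where the integral is a finite sum. The following sketch (our own naming; the example distribution is ours, not the author's) verifies that E[R~_t] = E[R_t] = F(t) by exact enumeration over the atoms of T_*:

```python
# Numerical check of Theorem 1.1 for a purely atomic F with atoms (s_k, p_k).
# The compensator is R~_t = sum over atoms s_k <= min(T*, t) of
# dF(s_k) / (1 - F(s_k-)), and the martingale property of R_t - R~_t
# forces E[R~_t] = E[R_t] = F(t).

atoms = [(1.0, 0.2), (2.0, 0.3), (3.0, 0.4)]   # F(inf) = 0.9 <= 1 is allowed

def F(t):
    """Distribution function F(t) = P{T* <= t}."""
    return sum(p for s, p in atoms if s <= t)

def compensator(t_star, t):
    """R~_t for the realization T* = t_star."""
    total = 0.0
    for s, p in atoms:
        if s <= min(t_star, t):
            F_minus = sum(q for r, q in atoms if r < s)   # F(s-)
            total += p / (1.0 - F_minus)
    return total

def expected_compensator(t):
    """E[R~_t]: average over the atoms of T*, plus the defect mass at infinity."""
    mass_inf = 1.0 - F(float('inf'))
    e = mass_inf * compensator(float('inf'), t)
    for s, p in atoms:
        e += p * compensator(s, t)
    return e

for t in (0.5, 1.0, 2.5, 10.0):
    assert abs(expected_compensator(t) - F(t)) < 1e-12
```

The identity holds because P{T_* ≥ s_k} = 1 - F(s_k-) exactly cancels the denominator in each term of the sum.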
In the present note we will go one step farther, and study R_t as an example in the theory of Markov processes (as well as of martingales). Indeed, a general feature of the prediction process construction is that it permits any process to be viewed as a homogeneous Markov process, more specifically, as a right process in the sense of P. A. Meyer having still additional structure. It may be said here that R_t provides a more or less prototypical example of the prediction process of a positive pure-jump submartingale. The behavior of this prediction process depends, in turn, on the classification of the stopping times of F_t, which accordingly is our next concern. However, the reader may prefer to skip this rather technical discussion, and go directly to Section 2 where the results are applied. The connections with Essay I are postponed until the end of the present essay, for reasons stated there.

We recall that a stopping time T is "totally inaccessible" if for every increasing sequence of stopping times T_n with T_n < T one has P{lim_{n→∞} T_n = T < ∞} = 0, and "previsible" if P{T = 0} = 0 or 1, and if, when P{T = 0} = 0, there exist T_n with 1 = P{T_n < T} = P{lim_{n→∞} T_n = T}. For the remaining concepts in our classification, as well as its existence and uniqueness, we refer to [5, Chap. IV, Theorem 81]. According to the basic representation theorem of our particular situation ([4, III, T53]), a random time T is an F_t-stopping time if and only if for some s ≤ ∞,

(1.1)   P{{T_* < s ∧ T} ∪ {T_* ≥ s = T}} = 1.

We note that s is unique unless P{T_* > T} = 0, and then we may choose s = ∞. The classification of stopping times depends on:
THEOREM 1.2. The accessible part of a stopping time T is given by

(1.2)   A = {T > T_*} ∪ {T = s < T_*} ∪ ⋃_{s_k < s} {T = T_* = s_k},

where s corresponds to T as in (1.1) and the s_k enumerate the values with P{T_* = s_k} > 0; T is totally inaccessible on A^c.

REMARK. It is easy to see that this set is unique up to a P-null set even if s is not unique.
PROOF. We have {T = 0} = {T = 0 = T_*} ∪ {T = 0 < T_*}, hence if P{T = 0} > 0 then either 0 is an s_k or s = 0. In either case {T = 0} is in (1.2), as it should be. Now let T_n be any nondecreasing sequence of stopping times, and let T_∞ = lim_n T_n. If we assume that P{T_n < T_∞} = 1 for all n (thus T_∞ is previsible), and let s_n correspond to T_n as in (1.1) with s_n = ∞ whenever possible, then we see that lim_n s_n = s_∞ exists, and satisfies (1.1) for T_∞. Then we have {T_* < s_∞} ⊂ {T_* ≤ T_∞} up to a P-null set, and therefore

(1.3)   P{{T_* < s_∞ ∧ T_∞} ∪ {s_∞ = T_* ∧ T_∞}} = 1.

Conversely, if a stopping time T satisfies (1.3) for some s_∞ and P{T > 0} = 1, then we can construct a sequence T_n with P{T_n < T} = 1, as follows. If s_∞ = ∞ then 1 = P{{T_* < T} ∪ {T_* = T = ∞}}, and writing T = f(T_*) on Ω we can define T_n = f_n(T_*), where the f_n are any measurable functions with f_n(∞) = n and, for x < ∞, x ≤ f_n(x) < f_{n+1}(x) < f(x) and lim_n f_n(x) = f(x). If 0 < s_∞ < ∞, then we define, for n^{-1} < s_∞,

   T_n = f_n(T_*)   on  {T_* ≤ s_∞ - n^{-1}} ∪ {T_* = s_∞ < T},
   T_n = s_∞ - n^{-1}   elsewhere,

and observe that T_n satisfies (1.1) with s = s_∞ - n^{-1}. Finally, if s_∞ = 0 then P{T_* = 0} = 1 and T is equivalent to a positive constant. It follows that (1.3) characterizes the previsible stopping times T with P{T > 0} = 1.

Next we observe that, for constant c, any T is accessible on a set of the form {T = c}, hence on {T = s} ∪ ⋃_{s_k < s} {T = T_* = s_k}. It remains to show that the rest of the accessible part is given by {T > T_*}. That this is contained in the accessible part follows by writing T_n = f_n(T_*) as in the preceding paragraph. On the other hand, by (1.1) we have {T ≤ T_*} = {T = T_*} ∪ {T_* > s = T} up to a P-null set, hence the only part of {T = T_*} not already found accessible is {T = T_* ≠ s_k for all k}. To see that this last is not accessible, note that for any previsible stopping time T_∞ > 0, (1.3) implies that the set {T = T_* = T_∞} is contained in {T = T_* = T_∞ = s_∞} up to a P-null set, where s_∞ corresponds to T_∞ as in (1.3). Therefore, only sets {T = T_* = s_∞} of positive probability can be in the accessible part, and the proof is complete.
COROLLARY 1.3. A stopping time T is: a) totally inaccessible if and only if P{T > T_*} = 0 and P{s = T ≤ T_*} = 0 for 0 < s < ∞; b) previsible if and only if P{T = 0} = 0 or 1 and, for some s, P{{T_* < s ∧ T} ∪ {s = T_* ∧ T}} = 1.

PROOF. Part b) is just (1.3), so we need only prove a). The condition is obviously sufficient by Theorem 1.2. On the other hand, if P{s = T ≤ T_*} > 0 for some s, then either P{T = s < T_*} > 0, where s corresponds to T as in (1.1), or else P{T = T_* = s} > 0, so that s is one of the s_k. In either case, T is partially accessible.

COROLLARY 1.4. If P{T_* = s} = 0 for all s < ∞, then T_* is totally inaccessible, and a stopping time T is previsible if and only if P{T = T_*} = 0.

REMARK. It is known from [4, Chap. III, T51] that absence of times of discontinuity is equivalent to the previsibility of all T whose accessible part is Ω (up to a P-null set). Furthermore, the necessary and sufficient condition that F_t be free of times of discontinuity is that, for all s > 0, P{T_* > s} > 0 implies P{T_* = s} = 0.

PROOF. The first assertion is immediate from Theorem 1.2. For the second, assume P{T = T_*} = 0, and let s correspond to T as in (1.1). Since P{T_* = s} = 0, we have P{T_* = s ∧ T} = 0, hence T satisfies Corollary 1.3 b). Conversely, if P{T = T_*} > 0, then T is inaccessible on this set, hence not previsible.

It remains to prove the last assertion. Assume that the condition holds; i.e., that the distribution of T_* has no atoms except perhaps its maximal value, and suppose that the accessible part of T is Ω. Let s correspond to T as in (1.1). If P{T_* = s} > 0, then by Theorem 1.2 we have 1 = P{{T > T_*} ∪ {T = T_* = s}}, and since the condition implies T_* ≤ s a.s., T is previsible by Corollary 1.3 b). If, on the other hand, P{T = T_* = s_k} > 0 for some s_k, we see from P{T_* > s_k} = 0 and (1.1) that s ≤ s_k, hence s_k may replace s in (1.1). Thus either we have the former case, or P{T = T_* = s} = 0. Then, since T > T_* implies T_* < s except on a P-null set, 1 = P{{T > T_*} ∪ {T = s < T_*}}, and T is again previsible by Corollary 1.3 b). Thus (see the Remark) F_t is free of times of discontinuity. The converse is obvious, since P{T_* > s} > 0 and P{T_* = s} > 0 together yield a stopping time whose accessible part is Ω but which is not previsible.

2. THE PREDICTION PROCESS OF R_t.
We turn now to the construction of the prediction process of R_t, which we will denote by Z_t. According to its definition, the values of Z_t are the conditional probability distributions of (R_{t+s}, s ≥ 0) given F_t (we recall that F_t = F_{t+}). Clearly such distributions can be specified by the conditional distribution of T_* - t given F_t, whence they have the same form as F. Thus, writing Z_t(x) = Z_t(x,w) for the corresponding distribution function, we have

   Z_t(x) = (F(t + x) - F(t))/(1 - F(t))   if  t < T_*  and  F(t) < 1,

while Z_t(0) = 1 if t ≥ T_* or F(t) = 1. The left-limit process Z_{t-}, in a suitable topology to be specified, is given by

   Z_{t-}(x) = (F(t + x) - F(t-))/(1 - F(t-))   if  t ≤ T_*  and  F(t-) < 1,

and Z_{t-}(0) = 1 otherwise.

The prediction process may be used to best advantage only by introducing it as a Markov process in its own right, instead of confining it to the probability space Ω of R_t (this represents a partial shift of the author's views from those expressed in [9]). This is because there are technical difficulties in carrying out the theory of additive functionals of the prediction process if it is defined on the original probability space (as noted by R. K. Getoor (1978)). On the other hand, once we free ourselves from this restriction, the theory becomes comparatively straightforward. Furthermore, in a sense to be made precise, nothing concerning the process R_t is lost in the transition. Therefore, we introduce formally both a new state space and a new probability space.

DEFINITION 2.1. The prediction state space of R_t is the space (E_Z, E_Z), where

   E_Z = {(F(t + ·) - F(t))/(1 - F(t)), -∞ < t < ∞: F(t) ≠ 1}
       ∪ {(F(t + ·) - F(t-))/(1 - F(t-)), -∞ < t < ∞: F(t-) ≠ 1}
       ∪ {F_{-∞}, F_{+∞}},

with F_{-∞}(x) ≡ 0, F_{+∞}(x) ≡ 1, and E_Z is the σ-field generated by the functions G(x), 0 ≤ x < ∞, as G varies on E_Z. We denote elements of E_Z of the first two types by F_t and F_{t-} respectively (although, with this notation, they are not necessarily distinct). We let E°_Z denote {F_{-∞}, F_{+∞}, F_t, -∞ < t < ∞}.
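As a quick sanity check on the conditional-law formula Z_t(x) = (F(t+x) - F(t))/(1 - F(t)), note that the exponential distribution is exactly the case in which the state never moves before the jump: Z_t = F for every t < T_*. A small numerical sketch (function names are ours, not the author's):

```python
import math

def Z(F, t, x):
    """Conditional law of T* - t given T* > t, i.e. the value Z_t(x)."""
    return (F(t + x) - F(t)) / (1.0 - F(t))

# Exponential F is memoryless, so Z_t should coincide with F for all t.
lam = 0.7
F_exp = lambda u: 1.0 - math.exp(-lam * u)

for t in (0.0, 0.5, 3.0):
    for x in (0.1, 1.0, 5.0):
        assert abs(Z(F_exp, t, x) - F_exp(x)) < 1e-12
```

For any other F the map t ↦ Z_t genuinely moves through the first component of E_Z until the jump occurs.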
In the present very specialized situation, it is natural to introduce in E_Z the topology of weak convergence of measures on Ω, when Ω is considered as a subset of the space D with the Skorokhod J_1-topology (Billingsley, [2], Chapter 3). Specifically, to each x ∈ Ω we associate the element of D given by f_x(s) = R_t(x) with s = (1/2)(1 + (2/π) arctan t), -∞ ≤ t ≤ ∞. We note that f_x(s) = 0 for 0 ≤ s < 1/2, and that convergence in D of f_x is the same as convergence of x in the extended topology of [0,∞]. It therefore follows that the continuous functions on Ω in the D-topology are just C[0,∞], and weak convergence of probabilities on Ω becomes simply weak convergence of the corresponding distribution functions F on [0,∞]. In particular, we note that E_Z is a Borel set and that E_Z is the Borel σ-field generated by this (metrizable) topology on E_Z. Furthermore, since F_t is right-continuous for t < min{s: F(s) = 1}, with left limits F_{t-} for t > 0, it is clear that Z_t is right-continuous with left limits in this topology. In fact, the space E_Z is "almost" compact, the only limit points not necessarily included being those obtained as t → +∞. This set is trivial if either F(∞) < 1 or F(t) = 1 for some t < ∞, but in general it cannot be avoided.
We turn next to the prediction probability space for the process, using the same notation Z_t for the process on the new space.

DEFINITION 2.2. Let (Ω_Z, F_Z, Z_t) consist of
a) the space Ω_Z of all paths z(t), 0 ≤ t < ∞, with values in E_Z, which are right-continuous, with left limits for t > 0, in the topology of weak convergence,
b) the coordinate σ-field F_Z generated on Ω_Z by {z(t) ∈ A}, A ∈ E_Z, t ≥ 0,
c) the coordinate functions Z_t = Z_t(z) = z(t).

We observe that the original process on Ω, given by Z_t = F_t for 0 ≤ t < T_* and by F_{+∞} for t ≥ T_*, has its paths as points in Ω_Z. Hence we can define a probability P^F on (Ω_Z, F_Z) such that the joint distributions of Z(t) are the same as those of the above process on Ω. Furthermore, to every z ∈ E_Z we can associate in the same way a probability P^z on (Ω_Z, F_Z), by using z in the role of F as the distribution of T_*. Thus the points z ∈ E_Z correspond to probabilities for Z_t. If z = F_t for some t, -∞ ≤ t ≤ ∞, then P^z{Z_0 = z} = 1. However, if z = F_{t-} ≠ F_t, so that F(t) - F(t-) > 0, then P^z{Z_0 = F_t} = 1 - P^z{Z_0 = F_{+∞}} = 1 - (F(t) - F(t-))/(1 - F(t-)).

We are now in a position to view the family {P^z, z ∈ E_Z} as a Markov process on (Ω_Z, F_Z). The points z = F_{t-} ≠ F_t are the "branching points" of this process, in the terminology of Walsh and Meyer [13]. The transition function q(t,z,A) of the process is such that for each (t,z) the probability is concentrated on at most two points. Precisely, we have
DEFINITION 2.3. The transition function of Z_t is q(t,z,A), t ≥ 0, z ∈ E_Z, A ∈ E_Z, where
i) q(t, F_{+∞}, {F_{+∞}}) = 1, t ≥ 0 (and F_{-∞} is likewise absorbing);
ii) q(t, z, {F_{+∞}}) = 1 - q(t, z, {F_{s+t}}) = F_s(t) if z = F_s, 1 > F_s(t), and t > 0;
iii) q(t, z, {F_{+∞}}) = 1 - q(t, z, {F_{s+t}}) = F_{s-}(t) if z = F_{s-} ≠ F_s, 1 > F_{s-}(t), and t > 0;
iv) q(t, z, {F_{+∞}}) = 1 in cases ii) and iii) if F_s(t) = 1, resp. F_{s-}(t) = 1;
v) q(0, z, {F_{+∞}}) = 1 - q(0, z, {F_s}) = F_{s-}(0) in case iii).
It follows from the general theory of [9] and [11] (or can easily be seen directly) that (Ω_Z, F_Z, Z_t, P^z) becomes a right process on E_Z in the sense of P. A. Meyer, with transition function q, when we include the canonical translation operators θ_t and σ-fields F^z_t. Of course, both E_Z and q are Borel, so the general U-space set-up of Getoor [6] is unnecessary (this is quite generally true for the prediction process). Furthermore, the process has unique left limits Z_{t-} in E_Z, t > 0.

It is important to observe that probabilistically nothing is lost by considering (Z_t, P^F) in place of (R_t, P). Thus we introduce on E_Z the Borel function

(2.1)   φ(G) = G(0).

Then φ(Z_t) is P^F-equivalent to R_t in joint distribution, and is right-continuous with left limits. Hence it is a valid replacement for R_t. The σ-fields F°Z_t generated by Z_s, s ≤ t, are of course larger than those generated by φ(Z_s), s ≤ t. But the entire difference can be traced to the fact that φ(Z_0) does not determine Z_0. Thus for each initial point z the above two fields have the same P^z-completion, and hence Z_t and φ(Z_t) generate the same completed σ-fields F^z_t.
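Since q(t,z,·) charges at most two points, the Chapman-Kolmogorov identity for q reduces to a two-term sum: absorption by time s+t means absorption by s, or survival to F_{s+u} followed by absorption within a further t. A numerical sketch for the non-branching states of Definition 2.3 ii), with an F of our own choosing:

```python
import math

# A concrete F with an atom at 2 and exponential mass elsewhere (our choice).
def F(u):
    base = 0.6 * (1.0 - math.exp(-u)) if u >= 0 else 0.0
    return base + (0.3 if u >= 2 else 0.0)   # F(inf) = 0.9

def q_absorbed(r, t):
    """q(t, F_r, {F_inf}) = F_r(t) from Definition 2.3 ii)."""
    return (F(r + t) - F(r)) / (1.0 - F(r))

# Chapman-Kolmogorov reduced to two terms: absorbed by s, or survive to
# F_{r+s} and then get absorbed within a further t.
for r in (0.0, 1.0, 1.5):
    for s in (0.5, 1.0):
        for t in (0.25, 2.0):
            lhs = q_absorbed(r, s + t)
            rhs = q_absorbed(r, s) + (1.0 - q_absorbed(r, s)) * q_absorbed(r + s, t)
            assert abs(lhs - rhs) < 1e-12
```

Algebraically the middle terms telescope, so the identity holds for any distribution function F with F(r) < 1 on the range tested.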
One basic feature of the prediction process which gives insight into the given process is its times of discontinuity. The analogue of the jump time T_* on Ω is of course the stopping time

(2.2)   T_Z = inf{t: Z_t = F_{+∞}}.

However, this is not necessarily a time of discontinuity for Z_t under P^F, and by no means the only one. By Theorem 1.2 the accessible part of T_Z under P^F consists of ⋃_k {T_Z = s_k}, where the s_k enumerate the jump points of F. But while R_t is discontinuous at t = s_k with probability F(s_k) - F(s_k-), Z_t is discontinuous at t = s_k with probability 1 - F(s_k-) (= P^F{T_* ≥ s_k}), unless F(s_k) = 1, when it is continuous (since Z_{s_k} is then F^Z_{s_k-}-measurable). On the other hand, at the totally inaccessible part of T_Z (i.e. the part where F is continuous), Z_t like R_t has an inaccessible jump. It is clear that Z_t is continuous except at ⋃_k {s_k} ∪ {T_Z}; hence we have classified its discontinuities under P^F, and for other z ∈ E_Z the situation is analogous. Thus, the conclusion which roughly emerges is that Z_t has the same totally inaccessible jumps as R_t, but it has additional accessible jumps at times when R_t has a positive (but unrealized) potentiality for a jump.
jump. This distinction in the behavior of R and Z at the previsible s disappears when we replace R by the martingale R - R'* k t t t of Theorem 1.1. More generally, we introduce on Ω the previsible additive times
2
——————
functional
Λ * Λt (2.3)
A
=
/
z
'
(1 - G(u-))
Ί^
d G(u)
on
{Z Λ = G} ,
o Λ
G e E
o
(previsibility is clear since process
x
Λ t) .
A
The process
.
z
is a Borel function of the previsible φ(zt) - Ψ(ZQ) "
i s
n o w
s e e n
t
to be a
martingale additive functional of
Z
that
have the same times of discontinuity
φ(Z ) - φ(Z ) = A
for each
P
.
and
Z
.
z
More importantly, one easily checks
This is an expression of the general fact that a right-
continuous martingale has its times of discontinuity contained in those of its prediction process, as proved in F. Knight [10, Lemma 1.5]. However, the application is not direct because the prediction process of φ(Z t ) - A t
for fixed
space than
E^,
F(s) - F(s-) = 1
G = ZQ
has a different (and less convenient) state
and it cannot be identified with for some
s
then
z
φ( J
although continuous, is not constant.
- A
= 0
Z
. for
For example, if PF
while
Z ,
54
FRANK B. KNIGHT
We consider finally the Lévy system of Z_t, and its relevance to R_t - R~_t. By definition [1, Corollary 5.2] this is a pair (N,H), where N(x,dy) is a kernel on (E_Z, E_Z) with N(x,{x}) = 0, and H_t is a previsible additive functional, such that for 0 ≤ f(x,y) with f(z,z) = 0,

(2.4)   E^z( Σ_{0 < s ≤ t} f(Z_{s-}, Z_s) ) = E^z( ∫_0^t dH_s ∫_{E_Z} N(Z_{s-}, dy) f(Z_{s-}, y) ).

In the present case, although Z_t does not satisfy all the hypotheses of [1, Cor. 5.2], it is easy to specify such a system explicitly. One has only to take H_t = A^z_t from (2.3) and then define

(2.5)   N(x,dy) = q(0,x,dy)   for  x = F_{t-} ≠ F_t, -∞ < t < ∞;
        N(x,dy) = δ(F_{+∞})   otherwise,  x ≠ F_{+∞},

where δ(F_{+∞}) is the unit mass at F_{+∞} (we define N(F_{+∞}, ·) in any convenient way).

As a compensator for the discontinuities of Z_t, the Lévy system is here more relevant to R_t - R~_t than to R_t, for the reasons of the preceding paragraph. Thus we have an analogous "Lévy system" for R_t - R~_t in the form (N~, R~_t), where

(2.6)   N~(-R~_{s_j-}, {-R~_{s_j-} + 1}) = F_{s_j-}(0)   for  F(s_j) - F(s_j-) > 0,   and
        N~(x, B) = I_B(x + 1)   for all  x ∉ {-R~_{s_j-}}.

It is clear that (2.6) is obtained from (2.5) by just substituting the jumps of R_t - R~_t for those of (Z_t, P^F), which are disallowed as jump times of Z_t except at t = 0 and t = ∞. Since (2.6) has a role analogous to (2.4), but for the martingale R_t - R~_t instead of Z_t, it is natural to take it as the definition of a Lévy system for the martingale. Again, this is a very special case of a general existence theorem ([10, Theorem 1.3]).
3. CONNECTIONS WITH THE GENERAL PREDICTION PROCESS.

For the reader who is already familiar with Essay I, the present Section 2 is easily incorporated into that more general setting. However, it is somewhat more natural to treat all single-jump processes simultaneously, as realized by a single prediction process. This formalizes, so to speak, the essence of the underlying idea. It has been carried out by Professor John B. Walsh, who has consented to let us use the material that follows.

We take w(t) = w_J(t), with all other components discarded from the notation. Let Ω_J (J for jump) be the set of functions of the form w(x) = I_{[T,∞]}(x), 0 ≤ T ≤ ∞. Then Ω_J inherits from Ω the topology of pointwise convergence of the corresponding T. Hence it is compact. Let H_J be the set of all probability measures on Ω_J, with the weak-* topology. If we identify h ∈ H_J with the probability distribution it assigns to T, then convergence in H_J becomes weak convergence of distribution functions on [0,∞], and H_J is compact.

For h ∈ H_J (regarded as a measure on Ω vanishing outside Ω_J), the prediction process Z_t remains in H_J, and so does Z_{t-} for t > 0. Thus H_J is a complete Borel packet, in the sense of Essay I, Definition 2.1, 3). The transition function of Z_t on H_J is given above by Definition 2.3. The elements of H_J ∩ H_0, regarded as distributions of T, are just F_{+∞} and all F with F(0) = 0. Thus Z_t is a right process on H_J ∩ H_0. In fact, we have more in the present case.

PROPOSITION 3.1. Z_t is a Ray process on H_J.

PROOF. It is to be shown that ∫_0^∞ e^{-ut} q_t f dt ∈ C(H_J) if f ∈ C(H_J), where q_t f(h) = ∫ f(z) q(t,h,dz). As before, we let F(t) = h{T ≤ t}. Then we have
   ∫_0^∞ e^{-ut} q_t f(h) dt = ∫_0^∞ e^{-ut} [F(t) f(F_{+∞}) + (1 - F(t)) f(F^h_t)] dt,

where F^h_t denotes the conditional distribution function (F(t + ·) - F(t))/(1 - F(t)), and the last integrand is read as e^{-ut} f(F_{+∞}) when F(t) = 1. Now if h_n → h, with corresponding F_n → F, the first term on the right obviously converges to its limit with F_n in place of F. Also, if F(t) < 1, then (F_n(t + ·) - F_n(t))/(1 - F_n(t)) has at most two weak limit points as n → ∞: (F(t + ·) - F(t))/(1 - F(t)) and (F(t + ·) - F(t-))/(1 - F(t-)). Thus at continuity points t of F it converges to the same limit. Since f is bounded, it is easy to see that the contribution to the last integral for t ≥ inf{t: F(t) = 1} tends to 0 as n → ∞. Hence by dominated convergence, the last integrals also converge to their value at F, completing the proof.

REMARK. It follows immediately that Conjecture 2.10 of Essay I holds for H_J.
REFERENCES

1. Benveniste, A. and Jacod, J. "Systèmes de Lévy des processus de Markov," Inventiones Mathematicae, 21, 1973, 183-198.
2. Billingsley, P. Convergence of Probability Measures. John Wiley and Sons, Inc., New York, 1968.
3. Chou, C. S. and Meyer, P.-A. Sur la représentation des martingales comme intégrales stochastiques dans les processus ponctuels. Séminaire de Prob. IX, Univ. de Strasbourg, 226-236. Lecture Notes in Math. 465, Springer, Berlin, 1975.
4. Dellacherie, C. Capacités et Processus Stochastiques. Springer-Verlag, Berlin, 1972.
5. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapitres I à IV. Hermann, Paris, 1975.
6. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. 440, Springer, Berlin, 1975.
7. Getoor, R. K. Homogeneous potentials. Séminaire de Prob. XII, Univ. de Strasbourg, 398-410. Lecture Notes in Math. 649, Springer, Berlin, 1978.
8. Knight, F. B. "A predictive view of continuous time processes," The Annals of Probability, 3, 1975, 573-596.
9. Knight, F. B. On prediction processes. Proceedings of the Symposium in Pure Mathematics of the Amer. Math. Soc. XXXI, 79-85. Providence, R.I., 1976.
10. Knight, F. B. Essays on the prediction process. Essay IV.
11. Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Prob. X, Univ. de Strasbourg, 86-104. Lecture Notes in Math. 511, Springer, Berlin, 1976.
12. Meyer, P.-A. and Yor, M. Sur la théorie de la prédiction, et le problème de décomposition des tribus F°_{t+}. Séminaire de Prob. X, Univ. de Strasbourg, 104-117. Lecture Notes in Math. 511, Springer, Berlin, 1976.
13. Walsh, J. B. and Meyer, P.-A. "Quelques applications des résolvantes de Ray," Inventiones Mathematicae, 14, 1971, 143-166.
ESSAY III. CONSTRUCTION OF STATIONARY STRONG-MARKOV TRANSITION PROBABILITIES

Let X_t be a continuous parameter stochastic process on (Ω,F,P) with values in a metrizable Lusin space (E,E) (i.e., E is a Borel set in a compact metric space, and E is the Borel σ-field of E). In order just to state the property of X_t that it be a "time-homogeneous Markov process", it is necessary to introduce some form of conditional probability function to serve as transition function. From an axiomatic standpoint it is of course desirable to assume as little as possible about this function. An interesting and difficult problem is then to deduce from such assumptions the existence of a complete Markov transition probability p(t,x,B) which satisfies the Chapman-Kolmogorov identities

(1.1)   p(s+t,x,B) = ∫ p(s,x,dy) p(t,y,B),

thus giving rise to a family (P^x, x ∈ E) of Markovian probabilities for which

(1.2)   P^x(X_{s+t} ∈ B | σ(X_τ, τ ≤ s)) = P^{X_s}(X_t ∈ B).
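Identity (1.1) is concrete enough to verify numerically for any candidate kernel given in closed form. A sketch with a two-state Markov chain (our own example, not from the text), whose transition function is elementary:

```python
import math

# Two-state chain with jump rates a (0 -> 1) and b (1 -> 0); the closed-form
# transition probabilities below satisfy the Chapman-Kolmogorov identity (1.1).
a, b = 0.4, 1.1

def p(t, x, y):
    """p(t, x, {y}) for states x, y in {0, 1}."""
    pi1 = a / (a + b)                       # stationary mass of state 1
    decay = math.exp(-(a + b) * t)
    p_x1 = pi1 + ((1.0 if x == 1 else 0.0) - pi1) * decay
    return p_x1 if y == 1 else 1.0 - p_x1

# (1.1): p(s+t, x, B) = sum_y p(s, x, {y}) p(t, y, B)
for x in (0, 1):
    for B in (0, 1):
        for s in (0.3, 1.0):
            for t in (0.2, 2.5):
                lhs = p(s + t, x, B)
                rhs = sum(p(s, x, y) * p(t, y, B) for y in (0, 1))
                assert abs(lhs - rhs) < 1e-12
```

The difficulty addressed in this essay is, of course, not verifying (1.1) for a given p, but producing a p satisfying it from weaker conditional-probability hypotheses.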
The analogous time-inhomogeneous problem (of obtaining a p(s,x; s+t,B)) was treated by J. Karush (1961), and considerably later the present problem was taken up by J. Walsh [9]. It seems, however, that for the homogeneous case the solution remained complicated and conceptually difficult. Since the publication of these two works, a new tool has appeared on the scene which has an obvious bearing on the problem, namely, the "prediction process" of [5] and [8]. Accordingly, the present essay aims to show what can be done by using this method. But it is not simply a question of applying a new device. Our view is that the prediction process is fundamental to the problem, and the hypotheses which are needed to apply it give a basic understanding of the nature of the difficulties.* A suggested way of viewing the entire matter is as follows. The prediction process is in some sense the best approximation to X_t by a process which does have a stationary strong-Markov transition function.

* The hypotheses of Theorem 3 of [9] are ultimately consequences of ours (Corollary 1.9 below).
The problem is thus to formulate the conditions under which the prediction process becomes identifiable with X_t itself.

Two immediate requirements are that the paths of X_t be sufficiently regular, and that their probability space be sufficiently tractable, so that the assumed conditional probabilities may be identified P-almost surely for each t with the regular conditional probabilities which constitute the prediction process. We will make the following initial assumption (to be relaxed in Theorem 1.12).

ASSUMPTION 1.1. Let (Ω, θ_t, F°_t) denote the space of right-continuous E-valued paths w(t), t ≥ 0, with left limits for t > 0, and the usual translation operators and generated σ-fields. We assume the canonical representation X_t(w) = w(t).
We now introduce the two basic definitions with which we will be concerned.

DEFINITION 1.2. Let Q(x,S), x ∈ E, S ∈ F° (= ∨_t F°_t), be a probability kernel, i.e. a probability in S for each x and E-measurable in x for each S. A probability P on F° is called homogeneous Markov relative to Q and F°_{t+} (= ∩_{ε>0} F°_{t+ε}) if for each t ≥ 0 and S ∈ F°

(1.3)   P(θ_t^{-1} S | F°_{t+}) = Q(X_t, S)   P-a.s.

DEFINITION 1.3. The Chapman-Kolmogorov identities for Q(x,S) are

(1.4)   Q(x, θ_{s+t}^{-1}(S)) = ∫ Q(x, {X_s ∈ dy}) Q(y, θ_t^{-1} S),   x ∈ E, 0 ≤ s,t, S ∈ F°.
REMARKS. Since regular conditional probabilities exist over F°, the assumption of a Q as in Definition 1.2 is equivalent to assuming only a marginal conditional probability kernel Q_s(x,B), B ∈ E, for each s > 0. In fact, it is enough to have Q_s for rational s, since then ∫ Q_{s_1}(X_τ, dy) Q_{s_2}(y, B) = Q_{s_1 + s_2}(X_τ, B) except on a P-null set, for each τ. We can then use this identity, along with the fact that regular conditional probabilities assign probability one to the r.c.l.l. paths, to construct a Q satisfying (1.3). In fact, the measures generated by Q_s on the space of E-valued functions of rational s ≥ 0 must reduce, when X_τ is substituted as initial value, to the restriction to rational s of any regular conditional probability on the r.c.l.l. paths given F°_{τ+}. Hence they extend to measures on the r.c.l.l. paths, P-a.s. for every τ. The set of restrictions to rational s ≥ 0 of r.c.l.l. paths is a Borel set in the countable product space, so the condition that this set have probability 1 gives a Borel set of initial values. Outside this set, we may take Q(x,S) = I_S(w) with w the path w(·) ≡ x.
The most that follows from (1.3), however, is that (1.4) holds for all s, t and S ∈ F° except for x in a set E(τ,s) with P{X_τ ∈ E(τ,s)} = 0. In short, one can eliminate the dependence on S, since the paths are right-continuous and F° is countably generated. But we do not see how to eliminate dependence on s, much less on τ, without further assumptions.

Secondly, the reason for conditioning on F°_{t+} in Definition 1.2 is in one sense trivial: we could have used F°_t instead, but it is less convenient. However, the distinction between F°_t and F°_{t+} is "unobservable" for the prediction process (see, for example, the Remark following Theorem 1.9 of Essay I). So it is unrealistic to condition on F°_t except when it is shown (as following Theorem 1.12 below) that this is equivalent to F°_{t+}. The point here is that the prediction process is automatically a strong-Markov process relative to F°_{t+}, and our method dictates that the same will be true of X_t.

The problem is now to identify conditions (presumably verifiable in practice) under which, given a Q satisfying (1.3), there exists a Q*(x,S) satisfying both (1.3) and (1.4). To this end we first state the relevant properties of the prediction process of X_t, as obtained in [5] and [8], and Essay I. Let (H, H) be the set of probability measures on (Ω, F°), where H is the σ-field generated by the sets {z: z(S) < a}, S ∈ F°, a ∈ R (z(S) is another notation for the measure of S under z). Further, for each z ∈ H, let Z^z_t = Z^z_t(S,w), S ∈ F°, be the z-prediction process as obtained in [5], unique up to z-equivalence. Then Z^z_t(S) is an F^z_{t+}-optional process with state space (H, H), where F^z_t is the σ-field generated by F°_t and all z-null sets, and for each optional time T < ∞ and all S ∈ F°,

   P^z(θ_T^{-1} S | F^z_{T+}) = Z^z_T(S),   z-a.s.

(where P^z is another notation for z itself).

REMARK. In [5] the spaces Ω and H were "larger" than the present ones. But since Ω is here a Lusin space, it is easy to see that the probabilities of [5] must already equal one on the Borel image of this Ω in the space of [5], for all t, z-a.s. Hence we can assume the present
(H, H).

The second essential feature of the processes Z^z_t concerns their behavior as z varies. From Theorem 1.15 of Essay I we have:

THEOREM 1.4. There is a jointly Borel transition function q(t,y,A) on (H, H) such that for each z the process Z^z_t (with the probability z itself) is a homogeneous strong-Markov process relative to F^z_{t+}, with transition function q(t,y,A). In particular, q satisfies the Chapman-Kolmogorov identities (1.1).

An advantage of restricting to a space of right-continuous paths is that one can be quite explicit about the connection of Z_t and X_t.
Indeed we have a simple functional dependence. THEOREM 1.5.
There is an
H/E-measurable function
φ
such that for all
z « H Pz{χt . φ ( Z Z ) PROOF.
for all
It is convenient to introduce the set of non-branching points of
= {z e H: We have P {Z
Z
H
e H ,
G H,
z <= H
and
q(O,z,{z}) = 1} .
and by Proposition 2 of Meyer [8] for all
t > 0} = 1
(in fact, the distributions of
those of a right process on B <= E,
H
P {X
z(S)
is
that
E Z f(X Q )
B}
ψ
is
fixed
= φ(z)} = 1
H-measurable for
functions).
is
Z *
with transition function
(
""
B|F
o
= P {X we must have
z e H on
q) .
we have H
are
For
since
oι o
and
H-measurable for
Then we have
H-measurable on
H
o+)
e B}
X
is
f e b(E)
a.s.,
, Z
for some function
S e F°
φ(z)
on
x Q G E . Now for any
F
i B (x τ ) = P Z
.
Since
(the bounded
E-measurable
Z
e H,
φ(z) = x
-stopping time
H
F°-measurable, we see
{z: φ(z) e B} = {z: E I (X ) = 1 } B 0 . We set
on
H - H
T < «>
w e
so
for some have for
B e E
(x τ fe B | F Z + )
z = P
T
(XQ € B)
= I β (φ(Z Z )) , Z
z-a.s.
It follows easily that X = φ ( Z ) , z-a.s. Then, since both X and zx z " are F -optional processes, the optional section theorem of —
UT
[1, IV, 84] finishes the proof.
Z
P Z { Z Q Z = z} = 1}
H Q = {z e H:
Z
t > 0} = 1 .
Before proceeding, let us review our notations. P without superscript refers to the original process on Ω, and at the same time we have P ∈ H. P^z and E^z are simply z and its expectation, for z ∈ H, but we do not write P^z when z is that of P. Z^z_t is the prediction process of z; in particular, Z^P_t is that of P. We will need to use Q(x,S) in three distinct senses: first, as a probability kernel; second, as a mapping Q: E → H defined by Q(x) = Q(x,(·)); and third, as a set mapping defined by Q{x ∈ S} = {Q(x): x ∈ S}.

The essential requirement for using the processes Z^z_t to construct a transition function for X_t is that the mapping Q: E → H defined by the given kernel Q(x,S) should have a range Q(E) sufficiently large that P{Z^P_t ∈ Q(E), t ≥ 0} = 1. The most natural way to insure this is to introduce:

ASSUMPTION 1.6. Q is continuous for the given topology on E and some topology on H such that
i) H is the σ-field generated by the open sets, and
ii) Z^P_t is P-a.s. right continuous in t.

There are usually many different topologies generating H and making Z^P_t P-a.s. right-continuous. Perhaps the most natural one is the weak*-topology with respect to the topology of weak convergence on Ω. We postpone further discussion of Assumption 1.6 until after the construction of the transition function Q*(x,S) below.
H
LEMMA 1.7.
PROOF.
K
and
discussed
P{Q(x ) = Z Z^
denote
for all rational it follows that
{w:w(0) = x} .
t > 0, "*
Z^ e H Q
this implies
P{X
=
K
hence
K
e
H.
S (x)
φQ(X ) , t > 0} = 1,
Since
{x: Q(x,S(x)) = 1} Π
and on the above intersection we have
follows by [1, III, 21] that is the identity on
such that
P{Q(Xt,S(X )) = 1, t > 0} = 1 .
is one-to-one on this set, whose image under
REMARKS.
e H,
By right-continuity of
By Theorem 1.5 we have
Q ( X , S ( X ) ) = 1} Π H Q .
Q
K
P{Q(X.) = Z^} = 1, t t
r > 0} = 1 .
and since we have
{x: Q(x) e H } e E,
c H ,
P{Q(X t ) = Z*, t > 0} = 1 . Next, let
We set
K Q = Q{x:
K
is complete.
P{Z* £ K , t > 0} = 1 .
By (1.3) we have for each p
and
Q*(x,S)
Under Assumption 1.6 there is a
Qφ = identity on
then
Ω,
We postpone further discussion of Assumption 1.6 until the
construction of the transition function
Xt
and making
Perhaps the most natural one is the weak*-
topology with respect to the topology of weak convergence on below.
and some
We have
Q
φ Qx = x, is
K
Q φ Qx = Qx,
.
It
hence
and the proof is complete.
We did not quite have to require that
Q(x)
be continuous,
but only that it be measurable and that its graph be closed in
E x H .
Furthermore under the not unreasonable conditions that
Q(x,S(x)) = 1
for all
(where the
x
and that
conditioning is on
Qφ
Q(x, Q(x))
|F°+) = Q(x, ) we have
for all
K Q = Q(E) .
x
62
FRANK B . KNIGHT
We now use the set K_0 of Lemma 1.7 to construct a state space for the prediction process on which it can be identified with X_t.

LEMMA 1.8. There is a K_1 ⊆ K_0, K_1 ∈ H, such that P{Z_t^P ∈ K_1} = 1 and P^z{Z_t^z ∈ K_1, t > 0} = 1 for all z ∈ K_1.

REMARK. In the terminology of Essay I, Definition 2.1, 3), K_1 is a Borel packet of the prediction process.

PROOF. (In part like Theorem 2.4 a) of [6].) We begin by setting

K = {z ∈ H_0: P^z{Z_t^z ∈ K_0, t > 0} = 1} .

Then in the terminology of [3, Section 12], for α > 0 we have K = {z ∈ H_0: u^α(z) = 0}, where u^α(z) = E^z ∫_0^∞ e^{-αt} I_{H_0 − K_0}(Z_t) dt is α-excessive for the transition function q. Since H_0 − K_0 is Borel and the prediction process is a right-process on H_0 (see Remark III.e. of Meyer [8]), K is a nearly Borel set for the prediction process, and I_K(Z_t) is P^z-indistinguishable from a well-measurable (optional) process. It follows that for z ∈ K the section theorem implies that P^z{Z_t^z ∈ K, t > 0} = 1. We have, by Lemma 1.7, P{Z_t^P ∈ K ∩ K_0} = 1. Also, for z ∈ K ∩ K_0 we have by definition of K that P^z{Z_t^z ∈ K ∩ K_0, t > 0} = 1, so we may consider K ∩ K_0 as state space for the prediction process, and by Lemma 1.7, Qφ = identity on K ∩ K_0.

It remains to show that K ∩ K_0 may even be replaced by a Borel subset K_1. We use an argument due to P. A. Meyer [7] (see also the end of [9]). Since K ∩ K_0 is nearly Borel, it has a Borel subset K_2' such that P{Z_t^P ∈ K_2', t > 0} = 1. Let K_2 denote the nearly Borel set {z ∈ K_2': P^z{Z_t^z ∈ K_2', t > 0} = 1}. As before, we have P{Z_t^P ∈ K_2} = 1 and
i) P{Z_t^P ∈ K_2, t > 0} = 1 , and
ii) P^z{Z_t^z ∈ K_2, t > 0} = 1 for z ∈ K_2 .
Similarly, we define by induction a sequence K_2' ⊃ K_2 ⊃ K_3' ⊃ K_3 ⊃ ... ⊃ K_n' ⊃ K_n, where K_n' is Borel, and K_n is nearly Borel and satisfies i) and ii). Now let K_1 = ∩_{n≥2} K_n' = ∩_{n≥2} K_n. Then K_1 is Borel, and obviously satisfies i). But for z ∈ K_1 we have P^z{Z_t^z ∈ K_n, t > 0} = 1 for every n. Since K_1 = ∩_{n≥2} K_n, K_1 also satisfies ii) and the proof is complete.
We can now prove the main theorem.

THEOREM 1.9. Under Assumptions 1.1 and 1.6, given P and Q(x,S) as in Definition 1.2, there exists a Q*(x,S) for the same P which satisfies the identities (1.4).

PROOF. We have Qφ = identity on K_1, and P{X_t ∈ φ(K_1), t > 0} = 1. By [1, III, 21], φ(K_1) ∈ E. Now we define

Q*(x,S) = Q(x,S) if x ∈ φ(K_1) , and Q*(x,S) = I_S(w_x) if x ∉ φ(K_1) ,

where w_x(t) = x for all t ≥ 0. Obviously Q* is a probability kernel and satisfies (1.3) for P, and (1.4) for x ∉ φ(K_1). Finally, for x ∈ φ(K_1), 0 < t_1 < ... < t_n, and B_1,...,B_n ∈ E, by (1.5) and Theorem 1.4 we obtain the identity (1.4) for Q*(x, ∩_{k=1}^n {X_{t_k} ∈ B_k}), where we used the fact that Q is an isomorphism of H restricted to K_1 onto E restricted to φ(K_1) for the last equality (again by [1, III]). In the last term we may omit the φ(K_1)'s, just as for the first equality. Choosing t_1 = s, t_2 − t_1 = t, B_1 = E, and S = ∩_{k≥2} {X_{t_k} ∈ B_k}, this establishes (1.4) for such S. The general case follows immediately by the familiar uniqueness of the extension.

COROLLARY 1.9. For every initial distribution μ, we have the strong Markov property

P^μ(θ_T^{-1} S | F_{T+}^μ) = Q*(X_T, S) , P^μ-a.s. ,

where P^μ(S) = ∫ Q*(x,S) μ(dx), and T is any finite stopping time of the completed σ-fields F_{t+}^μ.

REMARK. It follows that F_{T+}^μ = F_T^μ.

PROOF. For μ concentrated on φ(K_1), this follows from the analogous property of Z^{P^μ}, by writing X_t = φ(Z_t^{P^μ}) as in the former proof. The part of μ outside of φ(K_1) causes no difficulty since, for every T, on {X_0 ∉ φ(K_1)} the path is P^μ-a.s. constant by the definition of Q*, and the assertion is trivial there.
We turn to a discussion of Assumption 1.6, which of course is the main question mark in the theory. The essential fact in identifying such a topology is

THEOREM 1.10. Let f be bounded and F^0-measurable (f ∈ b(F^0)). If f ∘ θ_t is right-continuous (resp. with left limits) in t for all w ∈ Ω, then for every z ∈ H

P^z{E^{Z_t^z} f is right-continuous (resp. with left limits) in t} = 1 .

PROOF. This follows immediately from two known results:
a) E^{Z_t^z} f is the F_t^z-optional projection of f ∘ θ_t [1, III, Theorem 2], and
b) the F_t-optional projection of a right-continuous bounded process (resp. with left limits) is itself right-continuous (resp. with left limits) P^z-a.s. [7, Appendice 2].

Therefore, we have immediately

COROLLARY 1.10. Let {f_n ∈ b(F^0), 1 ≤ n} satisfy the two conditions
a) for each w and n, f_n ∘ θ_t is right-continuous in t ≥ 0, and
b) the monotone linear bounded closure of {f_n} is b(F^0).
Then the topology on H generated by the functions E^z f_n, 1 ≤ n, satisfies i) and ii) of Assumption 1.6.

PROOF. Only i) needs comment. But since each E^z f_n is measurable with respect to the σ-field generated by the open sets, so is E^z f for f in the closure b(F^0), as required.

There are many possibilities for such f_n. Perhaps the most obvious is to take f_n = g_m(X_r), where r runs over the non-negative rationals and g_m runs over a uniformly dense set of continuous functions on a compact metric space Ē containing E as a Borel subset. Then the condition that Q satisfy Assumption 1.6 becomes the Feller property: E^{Q(x)} g_m(X_r) ∈ C(E) for rational r.

A weaker type of requirement, but one which still involves the given topology of E, utilizes all finite products

(1.6) f_n = ∏_{i=1}^k ∫_{r_i}^∞ e^{-t} g_{m_i}(X_t) dt ,

for 0 ≤ r_i rational and the g_m's as above. Here the topology generated on Ω by the f_n is just the weak topology of the sojourn measures μ(t,A) defined by μ(t,A) = ∫_0^t I_A(X_s) ds. Indeed, we have ∫_0^t g_m(X_s) ds = ∫ g_m(x) μ(t,dx). Hence, convergence of these integrals for all m is just weak convergence of μ(t,·). On the other hand, this convergence for all t is easily seen to be equivalent to that generated by the f_n. This topology is metrizable, for example, with metric d(w_1,w_2) = Σ_m 2^{-m} |f_m(w_1) − f_m(w_2)|, whence Ω is embedded as a Borel subset of its compactification, which is the space Ω̄ of (equivalence classes of) measurable functions with values in the closure of E (for this argument, see Essay 1, Theorem 1.2, where an analogous but weaker topology is treated).

Accordingly, we can consider on H the weak-* topology generated by this topology on Ω, by setting h(Ω̄ − Ω) = 0 for h ∈ H. Again, continuity of Q(x) for this topology on its range can be expressed in more familiar terms.

THEOREM 1.11. Continuity of Q(x) for the weak-* topology generated by the f_n of (1.6) is equivalent to the continuity on E, for all λ > 0 and continuous g on Ē, of

(1.7) E^{Q(x)} ∫_0^∞ e^{-λt} g(X_t) dt .

REMARK. Let R_λ g(x) denote (1.7). Then the last continuity is just the Ray property of R_λ g(x), except that we are not assuming the resolvent equation. The proof below is not self-contained, but in the present context it does not seem to merit that degree of emphasis.

PROOF. We rely on the construction of [5], where the coordinate functions h are the present g_m. By the argument just given, convergence in the space Ω' of [5] induces on Ω the topology of weak convergence of sojourn time distributions. Consequently, the topology of H in [5] reduces to the same weak-* topology as above. The assertion of our theorem now follows from the proof of Theorem 3.1.1 of [5] in two steps. First, we observe that the proof of R_λ: C(E) → C(E_Q) needs no change, where E_Q is E with the Q-induced topology. This is simply the observation that each R_λ g_n(x) is continuous on E_Q, since the ∫_0^∞ e^{-λt} g_n(X_t) dt are continuous on Ω. Second, we note that the proof of Lemma 3.1.1 of [5] does not use the resolvent equation or the compactness of E. Accordingly it applies unchanged, and we obtain that if R_λ: C(E) → C(E) holds, then the random variables ∫_0^{t_k} g_{n_k}(X_s) ds, 1 ≤ k ≤ n, have joint distributions for Q(x) which are weakly continuous in x (for any choice of n, the n_k, and t_k > 0). This easily implies continuity of E^{Q(x)} f_n for the f_n of (1.6), so the proof is complete.
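For orientation, in the finite-state case the quantity (1.7) is just the matrix resolvent R_λ g = (λI − G)^{-1} g of the chain's generator G, and the continuity asked for in Theorem 1.11 is automatic. The sketch below (the chain and the test function are invented for the illustration) computes R_λ g by the Neumann series Σ_k G^k / λ^{k+1}, valid for λ larger than the norm of G, and checks the defining identity (λI − G) R_λ g = g.

```python
# Finite-state analogue of (1.7): for a chain with generator G, the map
# g -> E^x int_0^inf e^{-lam*t} g(X_t) dt is the resolvent
# R_lam g = (lam*I - G)^{-1} g.  We build R_lam g by the Neumann series
# R_lam = sum_k G^k / lam^{k+1} and verify (lam*I - G) R_lam g = g.
# The 3-state chain is an illustrative assumption, not from the essay.

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

G = [[-1.0, 0.7, 0.3],
     [0.2, -0.5, 0.3],
     [0.5, 0.5, -1.0]]
g = [1.0, 0.0, 2.0]   # a bounded "test function" on the 3 states
lam = 5.0             # lam exceeds the norm of G, so the series converges

# Neumann series: u = (1/lam) * sum_k (G/lam)^k g
u = [0.0, 0.0, 0.0]
term = [x / lam for x in g]
for _ in range(200):
    u = [a + b for a, b in zip(u, term)]
    term = [x / lam for x in mat_vec(G, term)]

# Check (lam*I - G) u = g, i.e. u really is R_lam g.
Gu = mat_vec(G, u)
residual = [lam * u[i] - Gu[i] - g[i] for i in range(3)]
print(max(abs(r) for r in residual) < 1e-9)
```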
As seen above, both the Feller property and the Ray property are essentially special cases of Assumption 1.6. It is thus of interest to note that (at least formally) the latter is much more general than either of these. According to Corollary 1.10, if g_k ∈ b(E) is any sequence such that the monotone linear bounded closure of {g_k} is all of b(E), then the topology on H generated by the E^z f_n, for f_n = ∫_r^∞ e^{-s} g_k(X_s) ds with 0 ≤ r rational and 1 ≤ k, will satisfy the requirements i) and ii) of Assumption 1.6. Hence one need only find a Q(x) continuous in such a topology to obtain the conclusions of Theorem and Corollary 1.9. Moreover, since the g_k involve only the σ-field E (and not the topology of E), one is now free to change the topology of E, provided that X_t may still be assumed to have right-continuous paths with left limits. Therefore, rather than starting with Assumption 1.1, we could just as well assume such a continuity of E^{Q(X_t)} f_n. This leads to the following statement.

THEOREM 1.12.
Let (Ω_M, θ_t, F_t^0) be the space of Lebesgue measurable (E,E)-valued paths X_t(w) = w(t), t ≥ 0, with the σ-fields F_t^0 augmented to include σ(∫_0^s f(X_τ) dτ, s ≤ t, f ∈ b(E)). Suppose given P and a probability kernel Q(x,S) satisfying (1.3). Let g_k ∈ b(E) be any sequence having monotone linear bounded closure b(E), and let f_n be an enumeration of the random variables ∫_r^∞ e^{-s} g_k(X_s) ds, 0 ≤ r rational, 1 ≤ k. Suppose that the family h_n(x) = E^{Q(x)} f_n generates the σ-field E, and that the processes h_n(X_t) are P*-a.s. right-continuous with left limits, where P* is P-outer-measure. Then the conclusions of Theorem and Corollary 1.9 hold when (Ω, θ_t, F_t^0) is replaced by the space of right-continuous paths with left limits in the topology on E generated by the h_n(x), and when P is transferred to this space.

FINAL REMARKS. Such a P on F^0 is induced through completion by any progressively measurable process. For 0 ≤ g_k the processes e^{-t} h_n(X_t) are measurable supermartingales with respect to F_t^0 and P, as seen by a familiar computation. Hence the martingale convergence theorems can be used to aid in checking the right-continuity with left limits. The question is simply whether, by making a standard modification of X_t, the martingale right-limits along rational t can be evaluated by substitution of X_t in h_n. It is important to note that this is always possible if we permit the standard modification to take values in H instead of just in E (regarded as a Borel subset of H through identification with its image by the mapping Q). Thus by (1.3) the limits along rational t may be evaluated a.s. at each t by substitution of Z_t^P for X_t. Letting Z_t^P denote the general prediction process (see Section 1 of Essay I), we may assume without loss of generality that for each r in a countable dense set P{X_r = φ(Z_r^P)} = 1. Then if we replace X_t by Z_t^P whenever this evaluation fails, and then replace Z_t^P by φ(Z_t^P) whenever Z_t^P ∈ Q(E), we get a standard modification of X_t with values in E ∪ (H − Q(E)) which satisfies the conclusions of Theorem and Corollary 1.9.

It is also of interest to note that for Theorem 1.12 one need only assume (1.3) relative to F_{t+}^0. Then the familiar "Hunt's Lemma" argument shows that the h_n(X_t) are in any case conditional expectations relative to F_t^0, and therefore Q(X_t,S) satisfies (1.3) relative to F_t^0. The analytical question of giving conditions on a semigroup under which, for any corresponding Markov process, F_{t+}^P and F_t^P are equivalent, is dealt with at length in Engelbert (1978). Here it has been implicitly assumed (see the second remark after Definition 1.3).

REFERENCES

1. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapitres I à IV. Hermann, Paris, 1975.
2. Engelbert, H. J. "Markov processes in general state spaces" (Part II), Math. Nachr., 82, 1978, 191-203.
3. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Mathematics, No. 440. Springer-Verlag, New York, 1975.
4. Karush, J. "On the Chapman-Kolmogorov equation," Annals of Math. Stat., 32, 1961, 1333-1337.
5. Knight, F. B. "A predictive view of continuous time processes," Ann. Probability, 3, 1975, 573-596.
6. Knight, F. B. "Prediction processes and an autonomous germ-Markov property," Ann. Prob., 7, 1979, 385-405.
7. Meyer, P.-A. Le retournement du temps, d'après Chung et Walsh. Séminaire de Probabilités V, Université de Strasbourg, 213-236. Lecture Notes in Mathematics 191, Springer-Verlag, Berlin, 1971.
8. Meyer, P.-A. La théorie de la prédiction de F. Knight. Séminaire de Probabilités X, Université de Strasbourg, 86-104. Lecture Notes in Mathematics 511, Springer-Verlag, Berlin, 1976.
9. Walsh, J. B. Transition functions of Markov processes. Séminaire de Probabilités VI, Université de Strasbourg, 215-232. Lecture Notes in Mathematics 258, Springer-Verlag, Berlin, 1972.
ESSAY IV. APPLICATION OF THE PREDICTION PROCESS TO MARTINGALES

0. INTRODUCTION.

Let X(t), t ≥ 0, be a right-continuous supermartingale relative to an increasing family of σ-fields G_t* on some probability space (Ω*, F*, P*). We assume that the G_t* are countably generated for each t. It is then easy, by using indicator functions of generators of G_t*, to construct a sequence X_{2(n+1)}(s) of real-valued processes such that {X(s), (X_{2(n+1)}(s)), s ≤ t} generates G_t* for each rational t. We can now transfer both process and probability to the canonical space Ω of Essay 1. We simply set P{w_{2n+1}(s) = 0, all s ≥ 0 and n ≥ 1} = 1, and for S ∈ B^∞ (see Essay 1, Section 1 for notation)

P{(w_{2n}(·)) ∈ S} = P*{(X(·), X_{2(n+1)}(·)) ∈ S} .

Then we obtain a canonically defined process X_t((w_n)) = w_2(t), which is a supermartingale with respect to P and the σ-fields G_t^0 of Essay 1. In the present work, we let X_t denote this process (rather than the sequential process (w_{2n})), and we drop the odd coordinates from the notation (i.e., we discard the set of probability zero where they are non-0). Thus we do not allow any "hidden information": F_t^0 = G_t^0. By a well-known convergence theorem we have

E(X_{s+t} | F_{t+}^0) = lim_{r→t+} E(X_{s+t} | F_r^0) ≤ lim_{r→t+} X_r = X_t .

Hence X_t is a supermartingale relative to F_{t+}^0, and we can connect it with its prediction process Z_t^P. As in Essay 1, the method requires that P be treated as a variable. In the present work we are concerned initially with three familiar classes of P on (Ω, F^0), as follows.
DEFINITION 0.1. Let M = {P: X_t is an F_{t+}^0-martingale and sup_t E X^2(t) < ∞}, U = {P: X_t is a uniformly integrable martingale, i.e., X_t = E(X_∞ | F_{t+}^0)}, and V = {P: X_t is a non-negative supermartingale of class D with lim_{t→∞} E X_t = 0}. The classes M and V are called respectively the square-integrable martingales and the potentials of class D, or simply the potentials (see [4, VI, Part 1, 9]).

Of course, we have M ⊂ U, and most of the attention will be on M and V. For P ∈ M we have a decomposition

(0.1) X_t − X_0 = X_1(t) + X_2(t) ,

where X_1 is a continuous F_{t+}^0-martingale and X_2 is a "compensated sum of jumps" with E(X_1 X_2) = 0. This decomposition is due to P. A. Meyer [11], but it will be obtained here as a consequence of a result on additive functionals of a Markov process (Theorem 1.6), more in the spirit of H. Kunita and S. Watanabe [10]. Given such a decomposition (for fixed P), it is clear that Z_t^P contains the conditional distributions of both processes X_i(s) ∘ θ_t given F_t, but this approach is not useful because one does not have X_i(s+t) − X_i(t) = X_i(s) ∘ θ_t. Rather, one has this at least in principle, so that the X_i(t) become additive functionals of the prediction process.

To make this approach rigorous, it is very convenient (and probably necessary) to transfer the setting once more to the prediction spaces of Essay 1, Section 2. Here the Z_t are given a single definition not depending on P, and for example the above X_i(t) become actual additive functionals of Z_t. In the setting of U, this enables us to avoid the technical difficulties encountered in [7] with a similar question.

This approach permits the application of general Markovian methods to the analysis of the X_i(t), and to other decompositions in U and V. In particular, we obtain the celebrated Doob-Meyer decomposition in V as a theorem on Markov additive functionals (Theorem 1.8). Further investigation of the discontinuities is based on the theory of Levy systems ([1]). Thanks to the use of a suitably weak topology for Z_t, it is possible to transfer directly the known components of the Levy system of a Ray process to Z_t, including separate terms for the compensation of totally inaccessible jumps and previsible jumps. Rather surprisingly, this operation is in no way restricted to martingales. By returning the components to the original probability space (Ω, F^0), we obtain (what is termed) the Levy system of an arbitrary r.c.l.l. process (Definition 2.2, Theorem 2.3).

Treatment of the continuous components, unlike that of the jumps, is restricted to the case of martingales. The continuous local martingales comprise a single prediction packet (a "packet," as in Essay 1). By means of a time change inverse to an additive functional, they are all reduced to a single Brownian motion (but it is a Brownian motion for many different probabilities). We then specialize to the case of autonomous germ-Markov probabilities, which generalizes the one-dimensional diffusion processes in the natural scale on (−∞,∞). Even in this case the variety of possible behavior is large, and we do not obtain anything like as comprehensive a theory as is available for ordinary diffusion.
It is sometimes possible to restrict the
process to a subset which is especially chosen to fit a given
P,
the present purposes there is usually no advantage in doing so. by considering as a single packet all
P
such that
(X ,P)
but for Instead,
has some
abstract defining property, we obtain at once the results which are implied by that property.
On the other hand, since the definition of
X
is fixed, this approach is not as flexible as the usual one for treating all processes adapted to
1.
F
,
relative to a fixed
P .
THE MARTINGALE PREDICTION SPACES. In this section we study the classes
M,
U,
and
them to prediction space, as in Section 2 of Essay 1.
V
by transplanting
In the following
Section 2, it is shown how these results can be interpreted in the original setting of processes on one process at a time.
(Ω,G ) ,
at least if we only deal with
Some familiarity with the terminology and results
of Essay 1, Section 2, is assumed for the present section.
One new basic
method is introduced which is in no way tied to the martingale setting, although it is perhaps especially well suited to martingales.
This is the
application of the Levy system of a Ray process to a packet of the prediction process. natural step to take.
In view of Corollary 2.12 of Essay 1, this is a Here we do not propose to exhaust its implications
even for martingales, but only to use it for the limited purpose of obtaining certain well-known decomposition theorems in the prediction space setting.
It is hardly surprising that these appear as results on
Markov additive functionals of the prediction process, since on prediction space we have a richer structure than on the original space.
A key
ESSAYS ON THE PREDICTION PROCESS
71
ingredient is the fact that on prediction space the prediction process behaves well under the translation operators space there is no corresponding operation.
θ ,
while on the original
Throughout the present section,
we make one significant change in the notation of Definitions 1.8 and 2.1 of Essay 1.
We let
φ(z)
previous definition. X
t
denote only the first coordinate from its
Thus with the present restricted definition of
we retain the fact that for each ,
h,
(φ(Z ) ,Z. ) t ^
is
P "equivalent to
We first establish that the martingale prediction spaces have the nicest possible general properties. THEOREM 1.1.
The sets
M,
U,
and
V
are complete Borel packets of the
prediction process (in the sense of Definition 2.1, 3) of Essay 1 ) . PROOF.
Since for every
h e H
the processes
X
and
φ(Z )
are
P -equivalent we may verify that the three sets are in H by using either X or φ(Z ) . We choose to use φ(Z ) . The three sets' have in common t t t the uniform integrability of any
N > 0,
h £ H,
and
φ(Z ), 0 < t .
By Fatou's lemma, we have for
t > 0,
lim inf E h (|φ(Z ) r r+t+ > E h (|φ(Z t )|;
|φ(Zt) | > N) .
Hence uniform integrability is equivalently expressed, using the rationals
Q,
by the condition lim
and the set of
h
E h (|φ(Z )|
sup
r
o < r^Q
|φ (Z ) | > N) = 0 , r
satisfying this is clearly in
H .
We now further
restrict this set by the martingale or supermartingale conditions. Markov property of
Z ,
By the
these are respectively
h
P -a.s.
But a simple application of Hunt's lemma shows that when uniformly integrable, if these are assumed only for can take right-limits to extend them to all
s,t .
φ(Z )
is
0 1 s, t € Q ,
we
Thus the class of
uniformly integrable martingales (respectively, supermartingales) is in REMARK.
For the martingale case, one can proceed more simply by writing Z
the condition as
E
t
Z
φ (Z ) = E
t
φ^,
h P -a.s.,
where
φ ^ = lim φ(Z )
H .
72
FRANK B. KNIGHT
exists P^h-a.s., which is also a Borel condition on h.

Now to obtain M we have only to append the Borel condition E^h φ_∞^2 < ∞, so it remains to obtain V from the uniformly integrable supermartingales. Clearly the conditions P^h{φ(Z_t) ≥ 0 for all t ≥ 0} = 1 and lim_{n→∞} E^h φ(Z_n) = 0 are Borel in h, so we need only check the class D requirement. According to a criterion of Johnson and Helms (see [4, IV, Section 1, Theorem 25]), for a positive, right-continuous supermartingale φ(Z_t) to be of class D it is necessary and sufficient, for any increasing sequence 0 < C_k ↑ ∞, that

lim_{k→∞} E^h(φ(Z_{T_k}); T_k < ∞) = 0 ,

where T_k = inf{t: φ(Z_t) > C_k}. Obviously T_k is Z_t-optional and φ(Z_{T_k}) is measurable over Z^0, hence this is again a Borel condition on h, as required.

Since our three sets are Borel, to show that they are prediction packets it suffices to show that for Z_t-optional T < ∞, Z_T is P^h-a.s. in each set along with h. For s > 0, we introduce the sets S_s^= = {z ∈ H: E^z φ(Z_s) = φ(z)} and S_s^≤ = {z ∈ H: E^z φ(Z_s) ≤ φ(z)}. Clearly these are in H, and the martingale (respectively, supermartingale) condition on h becomes P^h{Z_t ∈ S_s^=} = 1 (respectively, P^h{Z_t ∈ S_s^≤} = 1) for all s,t ≥ 0. By the classical optional sampling theorems of Doob (Neveu [13, Proposition IV, 5.5]) we have respectively, for h in the corresponding set, P^h{Z_T ∈ S_s^=} = 1 and P^h{Z_T ∈ S_s^≤} = 1. Therefore, we have in the first case

(1.1) 1 = E^h P^h(Z_{T+t} ∈ S_s^= | Z_T) = E^h P^{Z_T}(Z_t ∈ S_s^=) ,

and therefore P^{Z_T}{Z_t ∈ S_s^=} = 1, P^h-a.s., for each (s,t) with 0 ≤ s,t ∈ Q, with the corresponding result using S_s^≤ in the second case. Therefore, these conditions hold for P^h-a.e. Z_T, at least for rational s and t. But this means, in the martingale cases, that φ(Z_t) is a P^{Z_T}-martingale along the rationals, P^h-a.s. Consequently, for fixed K, the φ(Z_t), t < K, are P^{Z_T}-uniformly integrable for rational t, with P^h-probability one. Then as in the first part of the proof we use Hunt's Lemma to extend the martingale property to all (s,t) with s + t < K, and then let K → ∞. The case of positive supermartingales is a little different. One first observes that, simply by martingale convergence of conditional expectations, if φ(Z_r) is a P^{Z_T}-positive-supermartingale as r varies in Q, then for any 0 ≤ t < r, r ∈ Q, one has

E^{Z_t} φ(Z_r) = lim_{s→t+, s∈Q} E^{Z_s} φ(Z_r) ≤ lim_{s→t+} φ(Z_s) = φ(Z_t) , P^{Z_T}-a.s.

Then by positivity and Fatou's Lemma, for 0 ≤ t < s one has

E^{Z_t} φ(Z_s) ≤ lim inf_{r→s+} E^{Z_t} φ(Z_r) ≤ φ(Z_t) , P^{Z_T}-a.s.

Thus φ(Z_t) is a P^{Z_T}-positive supermartingale, P^h-a.s. The uniform integrability, or square integrability, of the P^{Z_T}-martingales now follows easily from the fact that convergence of φ(Z_{T+t}) to φ_∞ in L^1 or L^2 for P^h as t → ∞ implies the same convergence conditional on Z_T, at least for a sequence t_k → ∞ sufficiently fast. This suffices to identify φ_∞ as the value at t = ∞ of the P^{Z_T}-martingales for P^h-a.e. Z_T (see Neveu [13, IV, 5.6]). It remains to verify that Z_T is of class D in the supermartingale case, and to show that the three packets are complete.

With C_k as in the first part of the proof, and optional T < ∞, let T_k' = inf{t > T: φ(Z_t) > C_k}. Then T_k ≤ T_k' with the previous T_k, and for h ∈ V we have

lim_{k→∞} E^h(φ(Z_{T_k}); T_k < ∞) = 0 ,

so that E^h(φ(Z_{T_k}); T_k < ∞ | Z_T) tends to zero in probability as k → ∞. Now by [4, VI, Section 1, 10], conditioning on Z_T, we have

(1.3) E^h(φ(Z_{T_k'}); T_k' < ∞ | Z_T) ≤ E^h(φ(Z_{T_k}); T_k < ∞ | Z_T) .
Therefore, the left side also tends to 0 in probability, and so there is a subsequence k_i for which it converges to 0 P^h-a.s. Clearly, then, the sequence C_{k_i} satisfies the Johnson-Helms criterion for Z_T, P^h-a.s., proving that Z_T is a.s. of class D.

Turning, finally, to the completeness, we must show that for Z_t-previsible 0 < T < ∞, Z_{T-} is in the corresponding packet, P^h-a.s. First, we note that E^{Z_{T-}} |φ(Z_t)| = E^h(|φ(Z_{T+t})| | Z_{T-}) has finite expectation for P^h, hence φ(Z_t) is P^{Z_{T-}}-integrable, P^h-a.s. We can now verify the martingale property (resp. positive supermartingale property) just as before, except that in (1.1) the conditioning is on Z_{T-}. The uniform or square integrability then follows in the martingale cases, as before, by convergence of φ(Z_{T+t_k}) to φ_∞ conditional on Z_{T-}, and it only remains to check the class D restriction in the supermartingale case. It is clear as before that E^h(φ(Z_{T_k'}); T_k' < ∞ | Z_{T-}) converges in probability to 0 as k → ∞. Conditioning both sides of (1.3) by Z_{T-}, we then obtain the same criterion, and the proof of Theorem 1.1 is complete.

REMARK. The supermartingale case could also have been handled by means of the Doob-Meyer decomposition of φ(Z_t), but since our intention is to obtain this decomposition from the prediction space, this would lead to a circular reasoning.

COROLLARY 1.1. The prediction process is a right-process on M ∩ H_0, U ∩ H_0, or V ∩ H_0, and in each case φ(z) is an excessive function. On M ∩ H_0 and U ∩ H_0, φ(z) is an invariant function. On V ∩ H_0 it is a potential of class D.

REMARKS. Since the literature of excessive functions is usually confined to standard processes, this terminology is not quite orthodox. For standard processes, such φ are considered under (1) and (2) of the Notes and Comments to Chapter IV in [2].

PROOF. In all three cases, for z ∈ H_0 we have E^z φ(Z_t) ≤ φ(z), and lim_{t→0+} E^z φ(Z_t) = φ(z) by right-continuity of φ(Z_t) and the fact that P^z{φ(Z_0) = φ(z)} = 1 for z ∈ H_0. Invariance, by definition, becomes the martingale property E^z φ(Z_t) = φ(z). For the last assertion, which is again true by definition, we observe that for any increasing sequence T_n ↑ ∞ of stopping times one has lim_n E^z φ(Z_{T_n}) = 0 for z ∈ V ∩ H_0, since the φ(Z_{T_n}) are P^z-uniformly integrable and, by supermartingale convergence, lim_n φ(Z_{T_n}) = 0 a.s.
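The dichotomy in Corollary 1.1 (invariant on the martingale packets, a potential on V) has a simple finite analogue: for a substochastic kernel P, the potential φ = Σ_n P^n c of a non-negative charge c is excessive, with Pφ = φ − c ≤ φ and P^n φ → 0. The kernel and charge below are invented for the illustration.

```python
# Finite analogue of Corollary 1.1: for a substochastic kernel P (some
# mass is killed at each step), the potential phi = sum_n P^n c of a
# non-negative charge c satisfies P phi <= phi (phi is excessive; in
# fact P phi = phi - c) and P^n phi -> 0 (phi is a potential).  The
# kernel and charge are illustrative assumptions, not from the essay.

def apply_kernel(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

P = [[0.3, 0.3, 0.2],   # rows sum to 0.8 < 1: killing makes the series converge
     [0.1, 0.4, 0.3],
     [0.2, 0.2, 0.4]]
c = [1.0, 0.0, 2.0]

# phi = c + P c + P^2 c + ...  (geometric series, truncated)
phi = [0.0, 0.0, 0.0]
term = c[:]
for _ in range(500):
    phi = [a + b for a, b in zip(phi, term)]
    term = apply_kernel(P, term)

P_phi = apply_kernel(P, phi)
excessive = all(P_phi[i] <= phi[i] + 1e-9 for i in range(3))

# Iterating the kernel kills the potential: P^n phi -> 0.
v = phi[:]
for _ in range(200):
    v = apply_kernel(P, v)
print(excessive and max(v) < 1e-9)
```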
We next take up the discontinuities of φ(Z_t). Here our chief tool is the Levy system of Z_t on the corresponding packet. The theory of Levy systems was initiated by M. Motoo and S. Watanabe under the hypothesis of absolute continuity [12], and developed further by J. Walsh and M. Weil [16]. The final touches, and also the simplest proofs, are provided by A. Benveniste and J. Jacod [1], whose formulation applies to all the discontinuity times of any Ray process. Since we know by Corollary 2.12 of Essay 1 that on any Borel prediction packet the prediction process is (in a sense) a Ray process restricted to a suitable Borel set, it is natural to use the result of Corollary 5.2 of [1], which we now describe.

Continuing the notation (however unwieldy) of Essay 1, let H_A be any Borel prediction packet, and let (H_A^+ ∩ H_0)^- denote the Ray compactification of its "non-branching" points. We denote the canonical Ray process by X̄_t, with probabilities P̄^x and resolvent R̄_λ. Then there exists a Levy system of X̄_t, which consists of four parts, N, M, H, and L. Here N = N(x,dy) and M = M(x,dy) are Borel measure kernels on (H_A^+ ∩ H_0)^-. We have N(x,{x}) = 0, and N is defined for x ∈ D, where D = (H_A^+ ∩ H_0)^- − B (with B as before denoting the Ray branching points, a Ray-Borel set), while M(x,{x}) = 0 and M is defined for x ∈ B. Both N and M yield finite measures on the Borel sets of (H_A^+ ∩ H_0)^-, and are Borel measurable in x. Further, H = H_t(w) is a continuous additive functional of X̄_t with E^x H_t < ∞, while L = L_t(w) is a purely discontinuous additive functional with E^x L_t < ∞ which is previsible with respect to the usual σ-fields F̄_t of the Ray process. These four objects have the property that (N,H) "compensates" the totally inaccessible discontinuities of X̄_t, while (M,L) compensates the previsible discontinuities. In more detail, let f(x,y) be any non-negative, jointly-Ray-Borel function with f(x,x) = 0. Then

(1.4)(a) E^x Σ_{0<s≤t} f(X̄_{s-}, X̄_s) I(X̄_{s-} ∈ D) = E^x ∫_0^t ∫ N(X̄_s, dy) f(X̄_s, y) dH_s
(1.4)(b) E^x Σ_{0<s≤t} f(X̄_{s-}, X̄_s) I(X̄_{s-} ∈ B) = E^x ∫_0^t ∫ M(X̄_{s-}, dy) f(X̄_{s-}, y) dL_s ,

for all x and t > 0. We note that on the right side of (1.4)(a), X̄_s may be replaced by X̄_{s-}, since H is continuous. We combine the Levy system by setting N̄ = N + M and H̄ = H + L, where N and M are extended to be the zero measure at the x for which they were undefined, and X̄_{s-} is used as before.
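For orientation, the compensation identities (1.4) can be checked by hand in the elementary case of a finite-state chain with jump rates q(x,y): there one may take H_t = t and N(x,·) = Σ_{y≠x} q(x,y) δ_y, and since such a chain has no branching points and only totally inaccessible jumps, the pair (M,L) vanishes. A sketch of (1.4)(a) under these assumptions:

```latex
% Finite-chain instance of (1.4)(a): with H_t = t and
% N(x,\cdot) = \sum_{y \neq x} q(x,y)\,\delta_y, the compensation
% identity reads, for f \geq 0 with f(x,x) = 0,
E^x \sum_{0 < s \le t} f(X_{s-}, X_s)
  \;=\; E^x \int_0^t \sum_{y} q(X_s, y)\, f(X_s, y)\, ds ,
% i.e. the Levy-system identity with no previsible-jump part
% (M = 0, L = 0): every jump of a finite chain from a holding state
% is totally inaccessible, and there are no branching points.
```

The general statement above differs only in that the jump measures live on the Ray compactification and the clocks H, L need not be time itself.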
In order to apply this to the packet involved and we do not assume packet
h(D ) Π H
H
H ,
since left limits are
is complete, we use the complete Borel
of Essay 1, Corollary 2.12.
We recall that
A
.
H_ c (h(D_) Π H) , Ά
and for
h e (h(DΛ) Π H)
A
t > 0} = 1 .
elements of
H
h(DA) ^
Thus Z
on
H
H
Π H
c a n
P {Z^ e H t
n L
A
for U
be regarded as a packet of
which, in addition, are given by
(and therefore, by distributions on -z
such an entrance law is determined by Π H ) ,
we'have
A
entrance laws for
(H
N
EX
f,
all
where
Thus we have finally
(1.4)(c)
for
(a),
P
h(D ) Π H ) . A o
Since
for at most one point of
and in the present case by at least one, we see that the
mapping h(z) is one-to-one on D . Thus we can introduce the inverse -1 h : h ( D ) -> D , and for any initial distribution μ for Z on A
A
η
-
t
h(D ) Π H we can identify X = h" (Z ) and X = h" (Z. ) for A U t t Z— t"~ t > 0 as a realization of a Ray process with initial distribution 1 μ(S) = μ(h(S)) on h" (h(DAΛ ) Π H_) . Furthermore, we showed in the 1 -1 proof of Essay 1, Theorem 2.13, that h (h(D ) Π H ) = h (h(D ) Π H) - B, A U A or in other words, for the right process Z on h(D ) Π H with left limits in h(D ) Π H, the elements of (h(D ) ίi H) - H correspond -1 -1 under
h
to the Ray-branching points in
transfer the Levy system of
X
h
(h(D ) Π H) .
to obtain a Levy system of
Thus we can Z
on
h(D A ) Π H . In detail, let t > 0, ""
and
Ω^ = {w,, ^ Ω^ such that Z , A Z Z w^ίt-) e h(D ) Π H, t > 0} . Then ί>
A
{A Π Ω
w (t) e n ( D ) Π H_, Z A U Ω , with the σ-fields Z, A
A € Z } on Ω , is canonical sample space for Z as a Z , A t l Z,A t right process on h(D ) Π H . Using this sample space, we define the four A
\j
elements of a Levy system by
(1.5)  N_Z(h_1,dh_2) = N(h^{-1}h_1, h^{-1}dh_2),  M_Z(h_1,dh_2) = M(h^{-1}h_1, h^{-1}dh_2),  h_1 ∈ h(D_A) ∩ H_0 ,
       H_{Z,t}(w_Z) = H_t(w),  L_{Z,t}(w_Z) = L_t(w),  w_Z ∈ Ω_{Z,A} ,

where w(t) = h^{-1}(w_Z(t)) for t ≥ 0. Then since θ_t w corresponds to θ_t w_Z as w does to w_Z (where θ_t is the Ray-translation operator), we see that H_Z and L_Z are additive functionals of Z on Ω_{Z,A}. Also H_Z is continuous, while, since h is Borel on h(D_A) ∩ H, L_Z is Z_t-previsible and purely discontinuous. Of course, the local integrability of H_Z and L_Z carries over to z ∈ h(D_A) ∩ H_0.

It is now just a matter of transferring (1.4) to the present context, and setting Ñ_Z = N_Z + M_Z and H̃_Z = H_Z + L_Z, to obtain

THEOREM 1.2.  For 0 ≤ f(z_1,z_2) ∈ H × H with f(z,z) = 0, the objects (1.5) satisfy

(1.6)(a)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) I{Z_{s-} ∈ H_0} = E^z ∫_0^t dH_{Z,s} ∫_{H_0} N_Z(Z_{s-},dz') f(Z_{s-},z') ,

(b)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) I{Z_{s-} ∈ H − H_0} = E^z ∫_0^t dL_{Z,s} ∫_{H_0} M_Z(Z_{s-},dz') f(Z_{s-},z') ,

(c)  E^z Σ_{0<s≤t} f(Z_{s-},Z_s) = E^z ∫_0^t dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz') f(Z_{s-},z') ,

for all z ∈ h(D_A) ∩ H_0 and t > 0.

Of course, the kernels N_Z, M_Z are Borel in z, and the measures are concentrated on h(D_A) ∩ H_0. Also, we may as well assume f(z_1,z_2) = 0
except for z_1 ∈ h(D_A) ∩ H and z_2 ∈ h(D_A) ∩ H_0. On the other hand, our use of the subscript Z instead of A for the elements of the Lévy system is quite appropriate for the following reason. We could just as well begin with the case H_A = H, h(D_A) ∩ H = H. Then we obtain the four components of the Lévy system in a form which applies, except for negligible changes, to any Borel prediction packet H_A ⊂ H. In fact, the only objection to identifying the restrictions of these components to h(D_A) ∩ H and Ω_{Z,A} with the components of Theorem 1.2, in general, is that the measure kernels might not be suitably restricted to the corresponding h(D_A) ∩ H_0. However, for any fixed A one may redefine these measures to be 0 outside h(D_A) ∩ H_0 without losing property (1.6). This follows by substituting f(z_1,z_2) = 1 − I_{h(D_A) ∩ H_0}(z_2) in (1.6) and noting that for z ∈ h(D_A) ∩ H_0 the result is 0. Another form of the same observation is useful in treating M, U, and V. We may and do take as sample spaces the canonical prediction spaces of all elements of Ω_Z with values and left limits in the respective (complete) packet. An inspection of (1.6) shows that we may just as well restrict the components of the Lévy system from H_A = H to any such complete Borel packet, instead of just h(D_A) ∩ H. (This can also be seen by intersecting the packet first with the corresponding h(D_A), but the step is unnecessary.) We may state

THEOREM 1.3.  There exists a Lévy system for any complete Borel prediction packet and corresponding r.c.l.l. process of Theorem 1.2 with H = H_A. In fact, the components may be restricted to yield such a system.

For application to φ(Z_t), we need to restate the properties of the Lévy system in a somewhat different form.

COROLLARY 1.3.  For any complete Borel prediction packet, the properties (1.6) of the Lévy system imply that for any Z_t-previsible process y_t with values in R, (1.6) holds with f(Z_{t-},z) replaced by f(y_t,z) I{Z_{t-} ≠ z}, for 0 ≤ f ∈ B × H.

PROOF.  We justify the substitution in (1.6)(c), the other two cases being analogous. First we consider f of the special form f(y,z) = k(y)g(z), k ∈ B and g ∈ H, and y of the special form y_t = I_A(w_Z) I_{(t_1,t_2]}(t) for A ∈ Z_{t_1}. Now by (1.6)(a) and the Markov property of Z, for z in the packet we have
(1.7)  E^z( Σ_{t_1<s≤t_2} g(Z_s) | Z_{t_1} ) = E^z( ∫_{t_1}^{t_2} dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz') g(z') | Z_{t_1} ) .

Multiplying both sides of (1.7) by I_A and taking the expectations to remove the conditioning, we obtain the assertion for this y. By [3, Chapter IV, Theorem 67], the previsible σ-field is generated by such y and sets of the form {0} × A, A ∈ Z_0, whose indicators satisfy the assertion trivially. Moreover, the class of finite sums of disjoint indicators of the form y is closed under multiplication. Since the class of indicators y satisfying the assertion is monotone, it follows by [3, Chapter 1, 19] that it contains all Z_t-previsible indicator functions, and hence all previsible y ≥ 0. Since k(y_t) is again previsible for 0 ≤ k ∈ B, we obtain the result for f(z_1,z_2) = k(z_1)g(z_2) immediately. Therefore, it holds for finite positive linear combinations of such, and by [3, Chapter 1, 21] it holds for 0 ≤ f ∈ B × H as asserted.

We first apply this method to obtain a well-known decomposition of square-integrable martingales due to P. A. Meyer [11] and Kunita and Watanabe [10]. We recall that two square-integrable martingales are called orthogonal if their product is a martingale. If they are additive functionals of Z_t, then one requires this to hold for every P^h, h ∈ M. The following notation will be used for M, U, or V, according to context.

NOTATION 1.4.  For any h ∈ H and t > 0, let φ^-(t) = limsup_{s↑t} φ(Z_s). We recall that φ^-(t) exists as an ordinary left limit, P^h-a.s. for all t > 0, and that φ^-(t) is P^h-equivalent in distribution to φ(Z_{t-}) (Definition 2.1, 2), of Essay 1), as φ(Z_t) is to X_t. Our object is to disengage the jumps of φ(Z_t) into a separate martingale, called the "discontinuous part," by means of the Lévy system. For this we need

LEMMA 1.5.  For either all h ∈ M or all h ∈ U, P^h{φ^-(t) ≠ φ(Z_t) implies Z_{t-} ≠ Z_t, for all t > 0} = 1. In words, the discontinuity times of φ(Z_t) are contained in those of Z_t.
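Before turning to the proof, we record for convenience the elementary identity behind the orthogonality computations used below (a standard fact, stated in our notation):

```latex
% If M and N are square-integrable martingales with M_0 = N_0 = 0 whose
% product MN is a martingale, then E[M_t N_t] = E[M_0 N_0] = 0, and hence
\[
E\big[(M_t + N_t)^2\big] \;=\; E\big[M_t^2\big] + E\big[N_t^2\big],
\qquad t \ge 0 .
\]
```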
PROOF.  For ε > 0, let

T_ε = inf{t > 0: |φ(Z_t) − φ^-(t)| > ε} .

Then T_ε is a Z_t-stopping time, and by right-continuity of φ(Z_t) we have P^h{T_ε > 0} = 1 for all h. By the strong Markov property and the existence of φ^-(t) for all t > 0, the iterates T_ε^{n+1} = T_ε^n + T_ε ∘ θ_{T_ε^n}, with T_ε^1 = T_ε, tend to ∞ along with n, P^h-a.s. Hence by the strong Markov property it suffices to show that P^h{Z_{T_ε-} ≠ Z_{T_ε}, T_ε < ∞} = P^h{T_ε < ∞}. By [3, IV, Theorem 81c)] there is a decomposition T_ε = T_A ∧ T_{Ω−A}, where T_A is the accessible part of T_ε and T_{Ω−A} is the totally inaccessible part for (Z_t, P^h). According to Theorem 2.13(ii) of Essay 1, we always have Z_{T_ε-} ≠ Z_{T_ε}, P^h-a.s. on {T_ε = T_{Ω−A}}. On the other hand, since Z_{T_ε-} is Z_{T_ε-}-measurable on A ([3, IV, 57]), and by the moderate Markov property we have

(1.8)  P^h(φ(Z_{T_ε}) ∈ B | Z_{T_ε-}) = q(0, Z_{T_ε-}, {φ(z) ∈ B}),  for B ∈ B,

it follows that P^h(φ(Z_{T_ε}) = φ^-(T_ε) | Z_{T_ε-}) = 1 on {Z_{T_ε-} ∈ H_0} ∩ A ∩ {T_ε < ∞}, P^h-a.s. Therefore, we have Z_{T_ε-} ∉ H_0, P^h-a.s. on A ∩ {T_ε < ∞}, and so Z_{T_ε-} ≠ Z_{T_ε} as required.

We now state and prove the decomposition theorem in Ω_M = {w_Z: w_Z(t) ∈ M for all t ≥ 0 and w_Z(t-) ∈ M for all t > 0}.

THEOREM 1.6.  There is a decomposition φ(Z_t) − φ(Z_0) = M_c(t) + M_d(t), where for h ∈ M, M_c is a continuous, P^h-square-integrable, martingale additive functional on Ω_M, M_d is an E^h-mean-square limit of martingale additive functionals of bounded variation, and M_c is orthogonal to all M_d. The decomposition is unique up to a P^h-null set for each h.

In the course of the proof, and also later, we need

NOTATION 1.7.  For any r.c.l.l. process M(t), t ≥ 0, let ΔM(t) = M(t) − M(t-), where M(t-) denotes the left limit at time t, and we use ∞ − ∞ = 0. In particular, let Δφ^-(t) = φ(Z_t) − φ^-(t).

PROOF.  If M(t) is any P^h-square-integrable martingale (in particular, E^h M²(∞) < ∞) then it is a familiar fact that M(t) has orthogonal
increments.
Thus if {t_{n,i}, 1 ≤ i ≤ n} is a sequence of partitions of [0,t] with maximum separation tending to 0, then by Fatou's Lemma,

E^h M²(t) = lim_n E^h Σ_i (M(t_{n,i}) − M(t_{n,i-1}))² ≥ lim_{ε→0+} E^h Σ_{t_j} (ΔM(t_j))² ,

where the last sum is over all t_j ≤ t such that |ΔM(t_j)| > ε. Letting t → ∞, it follows that

(1.9)  E^h M²(∞) ≥ E^h Σ_j (ΔM(t_j))² ,

where the t_j enumerate the discontinuity times of M(t).

We now fix 0 < a < b, and apply Corollary 1.3 with y_t = φ^-(t) (which is Z_t-previsible by [3, IV, Theorem 92]) and f(y,z) = (φ(z) − y) I_{(a,b]}(φ(z) − y). Letting φ_∞ = limsup_{t→∞} φ(Z_t), (1.9) implies that

E^h | Σ_{0<s} f(φ^-(s), Z_s) | ≤ a^{-1} E^h (φ_∞ − φ(Z_0))² < ∞ .

Then we may subtract the right side in Corollary 1.3, and by Lemma 1.5 and the Markov property of Z_t we obtain that the process

M_{(a,b)}(t) = Σ_{0<s≤t} f(φ^-(s), Z_s) − ∫_0^t dH̃_{Z,s} ∫_{H_0} Ñ_Z(Z_{s-},dz) f(φ^-(s), z)

is a martingale additive functional of Z on Ω_M (here f(y,z) was substituted explicitly only in the sum).

The martingale M_{(a,b)}(t) is clearly of bounded variation, and we now evaluate its mean square precisely. Denoting the above difference by M_{(a,b)}(t) = M^+_{(a,b)}(t) − M^-_{(a,b)}(t), let T_N = inf{t: M^+_{(a,b)}(t) + M^-_{(a,b)}(t) > N}. Then T_N is a Z_t-stopping time, and T_N → ∞, P^h-a.s., as N → ∞. Also, as in (1.4), M_{(a,b)}(t) has at most only accessible times of discontinuity, and for previsible T < ∞ it follows by (1.8) of Lemma 1.5 and Jensen's inequality for conditional expectations that

M^+_{(a,b)}(T_N) + M^-_{(a,b)}(T_N) ≤ N + b .
Then by decomposition at T_N we have easily E^h M²_{(a,b)}(t ∧ T_N) ≤ (N+b)². Next, for t > 0 fixed, let t_{n,j} = jt2^{-n}, and reapply the argument beginning the proof to the martingale M_{(a,b)}(t ∧ T_N). Simply by decomposing paths of bounded variation into continuous and jump components, we see that the sums of the squared increments of M_{(a,b)}(t ∧ T_N) along the partitions t_{n,j} converge P^h-a.s. as n → ∞ to Σ (ΔM_{(a,b)}(t_j ∧ T_N))², where the sum is over the jump times less than t. Also, the sums of squares of increments of M^+ alone are decreasing with n, as are those of M^- alone. Using (c−d)² ≤ 2(c² + d²) to bound the squared increments of M_{(a,b)}, the sums are dominated by 2(M^+_{(a,b)}(t ∧ T_N))² + 2(M^-_{(a,b)}(t ∧ T_N))², which has finite expectation. Hence by the dominated convergence theorem,

E^h M²_{(a,b)}(t ∧ T_N) = E^h Σ_{t_j ≤ t} (ΔM_{(a,b)}(t_j ∧ T_N))² .

Letting N → ∞, it follows readily that

(1.11)  E^h M²_{(a,b)}(t) = E^h Σ_{t_j ≤ t} (ΔM_{(a,b)}(t_j))² .

In particular, by (1.9) it follows that E^h M²_{(a,b)}(t) ≤ E^h M²(∞), hence M_{(a,b)} is square-integrable. Furthermore, for 0 < a < b < c we have M_{(a,c)}(t) = M_{(a,b)}(t) + M_{(b,c)}(t), and by (1.11)

E^h(M_{(a,b)}(t) + M_{(b,c)}(t))² = E^h M²_{(a,b)}(t) + E^h M²_{(b,c)}(t) ,

hence E^h(M_{(a,b)}(t) M_{(b,c)}(t)) = 0, and by the Markov property of Z it follows that M_{(a,b)} and M_{(b,c)} are orthogonal. Similarly, for c < d ≤ 0 we can define M_{(c,d)} of the same form as M_{(a,b)} to compensate the negative jumps c < φ(Z_t) − φ^-(t) ≤ d, and (1.11) applies. Finally, since −M_{(c,d)} has the same form as M_{(a,b)}, it is seen that M_{(a,b)} and M_{(c,d)} have no common discontinuities. Hence M_{(a,b)} and M_{(c,d)} are orthogonal, and

E^h(M_{(a,b)}(t) + M_{(c,d)}(t))² ≤ E^h M²(∞) .
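Schematically, the discontinuous part of the decomposition is then assembled from these compensated-jump martingales as a mean-square limit (this summarizes, in our notation, the construction carried out next):

```latex
% The discontinuous part M_d is obtained as an L^2 limit of the
% bounded-variation martingales compensating jumps of size in (a_n,b_n]
% and (c_n,d_n]:
\[
M_d(t) \;=\; \lim_{n\to\infty}\Big( M_{(a_n,b_n)}(t) + M_{(c_n,d_n)}(t) \Big)
\quad \text{in } L^2(P^h),
\]
\[
a_n \downarrow 0,\quad b_n \uparrow \infty,\quad
c_n \downarrow -\infty,\quad d_n \uparrow 0 .
\]
```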
We next choose a sequence a_n → 0+, b_n → ∞, c_n → −∞, d_n → 0−. It follows directly from the above that for h ∈ M and 0 ≤ t ≤ ∞ there exist E^h-mean-square limits of M_{(a_n,b_n)}(t) + M_{(c_n,d_n)}(t) along this sequence. Furthermore, it is known from general theorems of analysis that such limits always may be chosen so as to be valid for all h (see [14, Theorem 3]). Accordingly, we denote such a choice by M*(t), and define

M_d(t) = lim_{r↓t} M*(r)  if this exists for all t < ∞ and equals 0 for t = 0;  M_d(t) = 0 elsewhere.

For each h, we have easily M*(r) = E^h(M*(∞) | Z_r), P^h-a.s., from which it follows that M_d(t) is a right-continuous version of E^h(M*(∞) | Z_t) for each h, and thus it is a square-integrable martingale. To see that it is an additive functional of Z, we note that for fixed s, t, and h ∈ M we can choose α_k, β_k, γ_k, and δ_k such that P^h(S_{s+t}) = 1, P^h(S_t) = 1, and P^{Z_t}(S_s) = 1 for P^h-a.e. Z_t, where S_u is given by

S_u = {M_d(u) = lim_{k→∞} (M_{(α_k,β_k)}(u) + M_{(γ_k,δ_k)}(u))} .

Since then

P^h(θ_t^{-1} S_s) = E^h(P^{Z_t}(S_s)) = 1 ,

the property M_d(s+t) = M_d(t) + M_d(s) ∘ θ_t, P^h-a.s., follows from the corresponding fact for M_{(a,b)}(t) + M_{(c,d)}(t).

Similarly, it follows from a classical martingale theorem of Doob ([5, (Theorem 5.1), p. 363]) that for each h we can choose a subsequence for which M_d is the limit of M_{(a,b)} + M_{(c,d)} uniformly in t,
P^h-a.s., for a = a_{n_k}, etc. Clearly, then, M_d(t) contains all the totally inaccessible jumps of φ(Z_t). But for previsible T < ∞, we have ΔM_d(T) = Δφ(T) plus a quantity which is Z_{T-}-measurable along with the ΔM_{(a,b)}(T), and since E^h(ΔM_d(T) | Z_{T-}) = 0, this quantity must be 0. Hence we see that M_c(t) = φ(Z_t) − φ(Z_0) − M_d(t) defines a continuous martingale additive functional of Z. It remains only to show that M_c and M_d are orthogonal, or again that M_c and M_{(a,b)} are orthogonal. To this effect we have only to apply some of the argument for (1.11) to M_c(t) + M_{(a,b)}(t), with T_N redefined by T_N = inf{t: M^+_{(a,b)}(t) + M^-_{(a,b)}(t) + |M_c(t)| > N}. It follows readily that in computing E^h(M_c(t ∧ T_N) + M_{(a,b)}(t ∧ T_N))² along {t_{n,j}}, the sum in cross-products of the increments of M_c and M_{(a,b)} in (t_{n,j-1}, t_{n,j}) is bounded by 2N(M^+_{(a,b)}(T_N) + M^-_{(a,b)}(T_N)), which has finite expectation, and the sum tends to 0 along with the partition size, P^h-a.s. By dominated convergence we obtain

E^h(M_c(t ∧ T_N) M_{(a,b)}(t ∧ T_N)) = 0 ,

and to conclude the existence proof it suffices to observe that, by using Fatou's Lemma,

lim_{N→∞} E^h(M(t) − M(t ∧ T_N))² = E^h M²(t) − lim_{N→∞} E^h M²(t ∧ T_N) = 0 ,

for any P^h-square-integrable martingale M(t) with respect to Z.

As to the uniqueness, since any two choices for M_d differ by a continuous martingale additive functional, it needs only be shown that such cannot be the E^h-mean-square limit of martingales of integrable total variation unless it is P^h-a.s. identically 0. The reader will readily check that the proof of orthogonality of M_c and M_{(a,b)} just given applies without change to any square-integrable martingales which are respectively continuous and of integrable total variation (where M^+(t) + M^-(t) is defined to be the total variation at time t). Hence the former cannot be approximated in the mean square by the latter, and the uniqueness is proved.

In the present section we make no further use of the packet U, except to remark that any class D right-continuous submartingale X
may be decomposed in the form X_t = E(X_∞ | G⁰_{t+}) − (E(X_∞ | G⁰_{t+}) − X_t), where the first process on the right is in U and the second is in V. For the elements of the packet V, we will derive the celebrated Doob-Meyer decomposition theorem as a theorem on Markov processes. It will be seen that this yields the corresponding decomposition result for X by expressing it in the above form.

Many proofs of the Doob-Meyer decomposition are known, and some are perhaps easier than ours. Nevertheless, ours seems worthwhile because it connects the decomposition with the prediction process, and provides additive functionals where the decomposition alone only provides unrelated pairs of processes. Besides, it does not use the theory of Lévy systems, and most of the work needed for the proof has already been done in [2, Chapter 4, Section 3] and therefore need not be repeated here. We let

Ω_V = {w_Z: w_Z(t) ∈ V for t ≥ 0 and w_Z(t-) ∈ V for t > 0} .

The result to be proved is as follows.

THEOREM 1.8.  There is a decomposition φ(Z_t) − φ(Z_0) = M(t) − A(t) on Ω_V, where A(t) is a (non-decreasing) additive functional of Z_t, Z_t-previsible for every h ∈ V, and M(t) is a uniformly integrable martingale additive functional. The decomposition is unique up to equivalence (i.e., P^h-a.s. for all h ∈ V).

PROOF.  The method of the proof is to write φ = φ_1 + φ_2 + φ_r, where the three terms on the right are class D potentials of Z_t on V; φ_1 corresponds to discontinuities of φ(Z_t) at which Z_t is continuous, φ_2 corresponds to discontinuities of φ(Z_t) with Z_{t-} ∈ H − H_0, and φ_r is a regular potential. The asserted decomposition is obtained separately for each of the three terms.

Recalling from Notation 1.4 that φ^-(t) is a Z_t-previsible process indistinguishable from the left-limit process of φ(Z_t), for fixed ε > 0 let

T = inf{t > 0: |Δφ(t)| I_{H_0}(Z_{t-}) > ε} .

Since φ(Z_t) is r.c.l.l. except on a null set, its jumps of size at least ε do not accumulate, and hence we see that on {T < ∞}, φ(Z_t) has a jump of size at least ε at t = T, while Z_t is continuous at t = T. Also, since T is a Z_t-stopping time (and a terminal time), it follows by Theorem 2.13 of Essay 1 that T is Z_t-previsible for each initial distribution μ. Then by the moderate Markov property

(1.12)  E^h(φ(Z_T) | Z_{T-}) = E^{Z_{T-}} φ(Z_0) = φ(Z_T),  P^h-a.s. on {T < ∞} .

Since φ(Z_t) is a supermartingale, the optional sampling theorem implies that φ(Z_T) ≤ φ^-(T), P^h-a.s. on {T < ∞} (see [4, VI, Part 1, Theorem 14]), and hence Δφ(T) ≤ 0 there. Letting T_1 = T and T_{n+1} = T_n + T ∘ θ_{T_n}, 1 ≤ n, it follows in the same way that Δφ(T_n) ≤ 0, P^h-a.s. on {T_n < ∞}, for all n. Next, by the same supermartingale property we see that

E^h | Σ_{n=1}^∞ Δφ(T_n) | ≤ E^h(φ(h) − lim_{t→∞} φ(Z_t)) = φ(h) .

As ε → 0+, the same facts are seen to hold for all such jumps. Hence we may introduce the process

A_{1,d}(t) = −Σ_{t_i ≤ t} Δφ(t_i) ,

the sum being over all t_i ≤ t with Z_{t_i-} = Z_{t_i} and Δφ(t_i) < 0 (in case this yields A_{1,d}(0+) = ∞, we set A_{1,d}(t) = 0 for all t). It is then clear that A_{1,d} is an additive functional, and we have

(1.13)  E^h A_{1,d}(∞) ≤ φ(h)  for all h ∈ V .

Moreover, for each ε the process A_ε(t) = −Σ_{n: T_n ≤ t} Δφ(T_n) is Z_t-previsible (since it is P^h-indistinguishable from such, and Z_t contains all P^h-null sets). Hence A_{1,d} is Z_t-previsible and equivalent to an additive functional. We now set φ_{1,d}(h) = E^h(A_{1,d}(∞)). It is immediately clear that φ_{1,d} is a potential of class D, with φ_{1,d} ≤ φ. Of course, the Doob-Meyer decomposition of φ_{1,d}(Z_t) for each P^h is given by the martingale E^h(A_{1,d}(∞) | Z_t) minus A_{1,d}(t). According to our construction, φ_{1,d} is Borel. Finally, let 0 < T be Z_t-previsible with Z_{T-} = Z_T on {T < ∞}. Then, A_{1,d}(∞) being even Z⁰-measurable, we have
(1.14)  Δφ_{1,d}(Z_T) = E^h(A_{1,d}(∞) | Z_T) − E^h(A_{1,d}(∞) | Z_{T-}) − ΔA_{1,d}(T) = −ΔA_{1,d}(T),  P^h-a.s. on {T < ∞} .

Consequently, φ_{1,d}(Z_t) contains all of the discontinuities of φ(Z_t) at times t where Z_{t-} = Z_t, and does not introduce any others of the same kind. Setting φ_2 = φ − φ_{1,d}, it follows that φ_2(Z_t) has its discontinuity time set contained a.s. in that of Z_t.

It is next to be shown that φ_2 is excessive, hence a potential of class D. Setting φ_ε(h) = E^h(A_ε(∞)), since φ_ε(h) increases to φ_{1,d}(h) as ε → 0+ it suffices to show that for each ε and t > 0,

E^h(φ(Z_t) − φ_ε(Z_t)) ≤ φ(h) − φ_ε(h) ,

or again that this holds with t ∧ T_n in place of t, for every n. Starting with n = 1, we have

(1.15)  E^h φ_ε(Z_{t ∧ T_1}) − φ_ε(h) = E^h(E^h(A_ε(∞) − A_ε(t ∧ T_1) | Z_{t ∧ T_1})) − E^h A_ε(∞)
        = −E^h A_ε(t ∧ T_1) = −E^h(ΔA_ε(T_1); T_1 ≤ t) .

On the other hand, by the previsibility of t ∧ T_1 and optional sampling for supermartingales,

φ(h) − E^h φ(Z_{t ∧ T_1}) = (φ(h) − E^h φ^-(t ∧ T_1)) − E^h(Δφ(t ∧ T_1))
        ≥ −E^h(Δφ(t ∧ T_1); T_1 ≤ t) = E^h(ΔA_ε(T_1); T_1 ≤ t) .

This finishes the case n = 1. Assuming the case n, and writing t ∧ T_{n+1} = (t ∧ T_n) ∧ T_{n+1}, it follows similarly that

(1.16)(a)  E^h(φ_ε(Z_{t ∧ T_{n+1}}) − φ_ε(Z_{t ∧ T_n})) = E^h(A_ε(t ∧ T_n) − A_ε(t ∧ T_{n+1}))
           = −E^h(ΔA_ε(T_{n+1}); T_{n+1} ≤ t) ,
(b)  φ(h) − E^h φ(Z_{t ∧ T_{n+1}}) = (φ(h) − E^h φ(Z_{t ∧ T_n})) + E^h(φ(Z_{t ∧ T_n}) − φ(Z_{t ∧ T_{n+1}}))
     ≥ (φ_ε(h) − E^h φ_ε(Z_{t ∧ T_n})) + E^h(ΔA_ε(T_{n+1}); T_{n+1} ≤ t) .

This proves the case n + 1, and hence the assertion. We note that only the previsibility of the T_{n+1} is used here, not the continuity of Z(t) at t = T_{n+1}.

We next compensate the accessible jump times of φ_2(Z_t). Since these are contained in those of Z_t, it follows by Theorem 2.13(ii) of Essay 1 that these are contained in the set of times where Z_{t-} ∈ H − H_0. By taking accessible parts of all the discontinuity times of Z_t, it is easy to see that {(t,w_Z): Z_{t-} ∈ H − H_0} is contained in a countable union of graphs of Z_t-previsible times, P^h-a.s. for each h. Then by [3, IV, 88 b)] this set is equal to a countable disjoint union of graphs of such times. Let (T_n) denote such a set, and for each n let (T_{k,n}, 1 ≤ k ≤ n) be defined by T_{k,n} = T_j on the set where exactly k among (T_1, ..., T_n) are less than or equal to T_j. Then the T_{k,n} are Z_t-previsible, and define a natural ordering of T_1, ..., T_n. We now set φ_2^-(t) = limsup_{s↑t} φ_2(Z_s), t > 0, and (letting ∞ − ∞ = 0) define

A_n(t) = Σ_{k: T_{k,n} ≤ t} (φ_2^-(T_{k,n}) − E^{Z_{T_{k,n}-}} φ_2(Z_0)) .

Since the last term on the right is a version of E^h(φ_2(Z_{T_{k,n}}) | Z_{T_{k,n}-}), it follows by the supermartingale property that 0 ≤ E^h(A_n(t)) ≤ E^h φ_2(Z_0) − E^h φ_2^-(t), and the A_n(t) are increasing in n for all t, and in t for each n, P^h-a.s. Thus we may define

(1.17)  A_{2,d}(t) = Σ_{t_i ≤ t} (φ_2^-(t_i) − E^{Z_{t_i-}} φ_2(Z_0)) ,

where the sum is over all t_i with Z_{t_i-} ∈ H − H_0 and 0 ≤ φ_2^-(t_i) − E^{Z_{t_i-}} φ_2(Z_0), if this gives A_{2,d}(0+) = 0, and A_{2,d}(t) = 0 elsewhere. Then A_{2,d} is an additive functional and, setting φ_{2,d}(h) = E^h A_{2,d}(∞), we have 0 ≤ φ_{2,d}(h) ≤ φ_2(h). Moreover, since φ_2 is Borel it is easily seen that A_{2,d} is Z_t-previsible for every h.
Then φ_{2,d} is a potential of class D, and for each h the Doob-Meyer decomposition of φ_{2,d}(Z_t) is given by the martingale E^h(A_{2,d}(∞) | Z_t) minus A_{2,d}(t).

We recall, now, that a potential φ of class D is called a regular potential if, for any increasing sequence T_n of Z_t-stopping times with T = lim_{n→∞} T_n, and any h ∈ V,

E^h lim_{n→∞} φ(Z_{T_n}) = E^h φ(Z_T),  where φ(Z_∞) = 0 .

The key fact needed to reduce Theorem 1.8 to standard methods of Markov processes is

LEMMA 1.8.  Set φ_r = φ_2 − φ_{2,d} = φ − φ_{1,d} − φ_{2,d}. Then φ_r is a regular class D potential.

PROOF.  We have seen that φ_r ≥ 0, and clearly lim_{t→0} φ_r(Z_t) = φ_r(h), P^h-a.s., and lim_{t→∞} E^h φ_r(Z_t) = 0. If we show that E^h φ_r(Z_t) ≤ φ_r(h), then φ_r is a potential of class D. To this effect, we need to repeat the argument used for φ_2, and for this we require the analogues of A_ε and φ_ε. Using the same symbols as before, we introduce

T = inf{t > 0: (φ_2^-(t) − E^{Z_{t-}} φ_2(Z_0)) I_{H−H_0}(Z_{t-}) > ε} ,

and observe that T > 0 a.s., while A_{2,d}(∞) < ∞ holds a.s.; and setting T_1 = T, T_{n+1} = T_n + T ∘ θ_{T_n}, we see that lim_{n→∞} T_n = ∞ a.s. Thus we may define as before

A_ε(t) = Σ_{n: T_n ≤ t} (φ_2^-(T_n) − E^{Z_{T_n-}} φ_2(Z_0)) ,

and observe that A_ε is equivalent to a Z_t-previsible additive functional, while φ_ε(h) = E^h(A_ε(∞)) increases to φ_{2,d}(h) as ε → 0+. It is now easy to check that the proof of (1.15) and (1.16) applies here with φ replaced by φ_2, showing that E^h φ_r(Z_t) ≤ φ_r(h).

Finally, let us prove the regularity. Let T_n be any sequence of Z_t-stopping times increasing to T ≤ ∞. Then clearly lim_{n→∞} E^h{φ_r(Z_{T_n}); T = ∞} = 0. On the other hand, over {T < ∞} there is no difficulty in passing to the limit on {T = T_n for large n}. Then setting S = {T_n < T for all n}, we have
(1.18)  lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); S)
        = lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); {Z_{T-} = Z_T} ∩ S)
          + lim_{n→∞} E^h(φ_r(Z_{T_n}) − φ_r(Z_T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(Δφ(T) − Δφ_{1,d}(T) − Δφ_{2,d}(T); {Z_{T-} = Z_T} ∩ S)
          + E^h(Δφ(T) − Δφ_{1,d}(T) − Δφ_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S) .

But on {Z_{T-} = Z_T} ∩ S we have by (1.14)

Δφ(T) = −ΔA_{1,d}(T) = Δφ_{1,d}(T) ,

while by (1.17) and the martingale property of E^h(A_{2,d}(∞) | Z_t) we have E^h(Δφ_{2,d}(T); {Z_{T-} = Z_T} ∩ S) = 0. Hence the first summand on the right vanishes. As for the second, on {Z_{T-} ∈ H − H_0} we have ΔA_{1,d}(T) = 0, and therefore E^h(Δφ_{1,d}(T); {Z_{T-} ∈ H − H_0} ∩ S) = 0, while by (1.17) and the moderate Markov property

E^h(Δφ(T) − Δφ_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(Δφ(T) + ΔA_{2,d}(T); {Z_{T-} ∈ H − H_0} ∩ S)
        = E^h(φ_2(Z_T) − E^{Z_{T-}} φ_2(Z_0); {Z_{T-} ∈ H − H_0} ∩ S)
        = 0 .

This completes the proof of Lemma 1.8.

As mentioned at the beginning, the rest of the proof has already been done in a somewhat different context in [2, IV, Section 3]. It follows that there is a continuous increasing additive functional A_c(t) with φ_r(z) = E^z A_c(∞). The method used is that of Šur [15], together with refinements which reduce the problem to a bounded regular potential. The proof is unfortunately not short. Some simplifications can be made because the multiplicative functionals M of [2] are not present, and hence S and the S_p's of [2] are absent, but it does not seem merited to rewrite the proof. In the present case there may be branching points, but it can be checked that the proof in [2] makes no use of quasi-left-continuity of X and so applies also to Borel right processes (see Getoor [6, 9.] for the relevant information on hitting times and excessive functions).
In the extension argument of [2, p. 168] from bounded to unbounded potentials, use is made of the uniqueness theorem to the effect that A_c(t) is uniquely determined by its potential ([3, IV, (2.13)]). However, this is easy to see directly from martingale arguments. Thus if A_c^1 and A_c^2 are continuous additive functionals with φ_r(z) = E^z A_c^1(∞) = E^z A_c^2(∞), then for each h the identity

E^h(A_c^1(∞) − A_c^1(t) | Z_t) = E^h(A_c^2(∞) − A_c^2(t) | Z_t)

implies that A_c^1(t) − A_c^2(t) is a continuous, (P^h, Z_t)-martingale of bounded variation. By arguments given in the proof of Theorem 1.6 (since the martingale may be stopped at any N) this implies that A_c^1 − A_c^2 is indistinguishable from the zero martingale. But the same reasoning applies if A_c^1 and A_c^2 are only assumed to be Z_t-previsible. Indeed, a previsible right-continuous martingale M is continuous: otherwise, since M − M_- is a previsible process, a bounded previsible stopping time T could be found with P^h{M_T ≠ M_{T-}} > 0, leading to immediate contradiction with the fact that M_T is Z_{T-}-measurable ([3, IV, 67]). It follows that any decomposition of the type asserted by Theorem 1.8 is unique up to equivalence. Hence we must have

A(t) = A_{1,d}(t) + A_{2,d}(t) + A_c(t) ,
M(t) = φ(Z_t) − φ(Z_0) + A(t) ,

and the proof is complete.
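The uniqueness rests on the elementary fact, used twice above, that a continuous martingale of bounded variation vanishes; along partitions t_i^n of [0,t] (after stopping so that the total variation is bounded):

```latex
% If M is a continuous martingale, M_0 = 0, with bounded variation
% Var_{[0,t]}(M) on [0,t], then by orthogonality of increments
\[
E\,M_t^2 \;=\; \lim_{n} \sum_i E\big(M_{t_{i+1}^n} - M_{t_i^n}\big)^2
\;\le\; \lim_{n} E\Big( \sup_i \big|M_{t_{i+1}^n} - M_{t_i^n}\big|
\cdot \mathrm{Var}_{[0,t]}(M) \Big) \;=\; 0 ,
\]
% the last limit vanishing by continuity and dominated convergence.
```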
2.  TRANSITION TO THE INITIAL SETTING: THE LEVY SYSTEM OF A PROCESS.
In order to translate results back and forth between the prediction process setting and their original setting, it is useful to examine more carefully the connection of (Ω, G_t) and (Ω_Z, Z_t). Since the connections we have in mind are completely general, not requiring any restriction on the probabilities, we return temporarily to the notations of Essay 1, Section 2. Thus Ω_Z is the space of all paths w_Z(t) ∈ H_0 for all t ≥ 0 which are right-continuous with left limits in H, φ(h) is the function of Definition 1.8 (rather than only its first coordinate), and φ^-(t) denotes limsup_{s↑t} φ(Z_s) coordinatewise. As remarked at the end of Essay 1, (Ω_Z, Z_t) may be topologized as a coanalytic subset of a Lusin space. While neither this nor the following assertion is essential to the development here, it is worthwhile to have them on record.
PROPOSITION 2.1.  Let Ω_φ = {w_Z: φw_Z(t) is r.c.l.l. in R_∞ with the product topology}, where φw_Z(t) = φ(w_Z(t)), t ≥ 0. Then we have Ω_φ ∈ Z⁰, and φ(Ω_φ) = {all r.c.l.l. paths in R_∞}.

PROOF.  Since φ is a Borel function, the components of φw_Z(t) are Z⁰_t-progressively measurable. We now apply the results of [3, IV, 17], according to which the two processes defined as the right limsup and liminf of φ(w_Z(r)) along rational r > t, r ↓ t, are Z⁰_{t+}-progressively measurable, and the two processes defined as the left limsup and liminf along rational r < t, r ↑ t, are Z⁰_t-progressively measurable in t > 0. The condition w_Z ∈ Ω_φ is simply that the two right-limit processes should equal φ(w_Z(t)) and the two left-limit processes should equal each other. Since φ(w_Z(t)) is also Z⁰_t-progressively measurable, these conditions define a set in Z⁰.

To see that φ(Ω_φ) = {all r.c.l.l. paths}, we fix w ∈ Ω and let h_w be the unit probability at w. Then h_w ∈ H, and so we can define its prediction process Z^{h_w} as in Essay 1, Section 1. By Theorem 1.9 there, we have P^{h_w}{φ(Z_t^{h_w}) = X_t for all t} = 1, meaning in the present case that the even coordinates w_{2n}(t) of w are identical with those of φ(Z_t^{h_w}) at w. Since Z^{h_w} at w defines an element of Ω_Z, and any r.c.l.l. path X_t is obtained as X_t = (w_{2n}(t)) from a w ∈ Ω, the assertion follows.

The mapping φ on Ω_Z is not one-to-one. In fact, since P^h(Ω_φ) = 1 for every h ∈ H (as usual, we use the same notation P^h for measures on Ω_Z or on Ω), if φ were one-to-one then the prediction process Z^h on Ω_φ would not depend on h except for null sets, which is absurd. Thus we cannot use φ on Ω_Z to transfer a process on Ω_Z to one on Ω. Instead, we must reduce Ω_Z to a subset depending on h. Thus, given h and a particular choice of Z^h on (Ω, G_t), we can regard Z^h_{(·)} as a measurable mapping of (Ω, F_t) into (Ω_Z, Z_t). Then the set Ω_h = {w ∈ Ω: φZ^h_{(·)}(w) = w} is in F, we have Z^h_{(·)}(Ω_h) ∈ Z, and φ is one-to-one on Z^h_{(·)}(Ω_h). Also P^h(Ω_h) = 1, and hence we can use φ to transfer objects from Ω to Ω_Z, except for an h-null set. In the cases of interest here the problem is to go in the other direction, from Ω_Z to Ω, and this presents almost no difficulty. Thus we now define the concept of a Lévy system for any h ∈ H, and obtain its existence and properties from Theorem 1.3 and Corollary 1.3.

DEFINITION 2.2.  A Lévy system for h consists of kernels N_Z and M_Z on H_0 (as in (1.5) with H_A = H), and F_t-previsible, increasing processes H^h_s and L^h_s on Ω, with H^h continuous, L^h pure-jump, H^h_0 = L^h_0 = 0, and E^h H^h_s < ∞, E^h L^h_s < ∞, such that for 0 ≤ f(z_1,z_2) ∈ H × H, f(z,z) = 0, we have

(2.1)(a)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) I{Z^h_{s-} ∈ H_0} = E^h ∫_0^t dH^h_s ∫_{H_0} N_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

(b)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) I{Z^h_{s-} ∈ H − H_0} = E^h ∫_0^t dL^h_s ∫_{H_0} M_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

(c)  E^h Σ_{0<s≤t} f(Z^h_{s-}, Z^h_s) = E^h ∫_0^t dH̃^h_s ∫_{H_0} Ñ_Z(Z^h_{s-}, dz) f(Z^h_{s-}, z) ,

where H̃^h = H^h + L^h and Ñ_Z = N_Z + M_Z.

DISCUSSION.  Obviously a Lévy system for h does not depend on the particular choice of Z^h. Furthermore, when a Lévy system exists then the proof of Corollary 1.3 carries over without difficulty to show that, for any F_t-previsible process y_t with values in (R_∞, B_∞), (2.1) remains true if f(Z^h_{t-}, z) is replaced by f(y_t, z) I{Z^h_{t-} ≠ z}. In particular, for y_t = X_{t-} we may replace this by f(y_t, φ(z)) I{Z^h_{t-} ≠ z} to obtain the following. For 0 ≤ f(x_1,x_2) ∈ B_∞ × B_∞ with f(x,x) = 0, we have

(2.1)(d)  E^h Σ_{0<s≤t} f(X_{s-}, X_s) I{Z^h_{s-} ≠ Z^h_s} = E^h ∫_0^t dH̃^h_s ∫_{H_0} Ñ_Z(Z^h_{s-}, dz) f(X_{s-}, φ(z)) .

Analogous statements also hold corresponding to (2.1)(a) and (b). In other words, the Lévy system compensates the jumps of X which coincide
94
FRANK B. KNIGHT
in time with jumps of be expected.
Z
.
On the other hand, this is the most that could
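A discrete-time analogue may clarify what the identities (2.1) assert: for a finite Markov chain, the expected sum of f over the jumps equals the expectation of a "compensator" built from the transition kernel, which is the elementary counterpart of compensating jumps by a Lévy system. The two-state chain and the function f below are hypothetical choices made only for this sketch.

```python
from itertools import product

# Discrete analogue of (2.1)(c): for a Markov chain with kernel P,
# E[ sum_k f(X_{k-1}, X_k) ] = E[ sum_k sum_y P(X_{k-1}, y) f(X_{k-1}, y) ].
# Verified exactly by enumerating all paths of a (hypothetical) 2-state chain.

P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

def f(x, y):
    return 1.0 if x != y else 0.0   # f vanishes on the diagonal, as in (2.1)

def jump_sum_and_compensator(n_steps, start=0):
    lhs = rhs = 0.0
    for path in product((0, 1), repeat=n_steps):
        prob, x = 1.0, start
        js = comp = 0.0
        for y in path:
            comp += sum(P[x][z] * f(x, z) for z in (0, 1))  # kernel average
            js += f(x, y)                                   # realized jumps
            prob *= P[x][y]
            x = y
        lhs += prob * js
        rhs += prob * comp
    return lhs, rhs
```

The equality of the two sides is exact here (it is the tower property of conditional expectation), whereas in the continuous-time statement (2.1) the kernel average is integrated against the previsible clock dĤ^h.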
By Essay I, Theorem 2.13(i), a time of the form

    T = inf{t > 0: |Δφ_t| I(Z_{t-} = Z_t) > ε}

on Ω̂ is Ẑ_t-previsible, where |Δφ_t| denotes the magnitude of the jump φ(Z_t) - φ(Z_{t-}) of X over R, in any convenient Borel metric. By the moderate Markov property, iteration of T, and letting ε → 0, it follows in the usual way that for f(x, x) = 0 the processes Σ_{0<s≤t} f(φ(Z_{s-}), φ(Z_s)) I(Z_{s-} = Z_s) are Ẑ_t-previsible (where Ẑ_t is as before). Therefore, these processes are their own compensators, and the Lévy system is irrelevant. It will be seen easily from the proof to follow that this fact translates into the F⁰-previsibility of Σ_{0<s≤t} f(X_{s-}, X_s) I(Z^h_{s-} = Z^h_s), hence there is no need to compensate these discontinuities.

THEOREM 2.3. A Lévy system exists for any probability h on (Ω, G⁰).

PROOF.
All that needs to be done is to define, from Theorem 1.2,

    H^h_t(w) = H_{Z,t}(Z^h_(·)(w))  and  L^h_t(w) = L_{Z,t}(Z^h_(·)(w)),

and to show that they are F⁰-previsible processes. More generally, let us show that if Y_t is any real-valued, Ẑ_t-previsible process with P^h{Y_0 = 0} = 1, then Y_t(Z^h_(·)(w)) is F⁰-previsible. Since, by definition, the previsible σ-field is generated by the left-continuous adapted processes (if we take F_t = F_{t+}; see [3, IV, 61]), and this class is closed under linear and lattice operations, it will suffice to consider the case of left-continuous Y_t (without assuming P^h{Y_0 = 0} = 1). Then Y_t(Z^h_(·)(w)) is left-continuous, and we need only show it is F^h-adapted, or again, that for S ∈ Ẑ_t we have {w: Z^h_t(w) ∈ S} ∈ F^h_{t+}. Let S_0 ∈ Ẑ⁰_t be such that P^h(S_0 Δ S) = 0. Then because Z^h is F⁰_{t+}-progressively measurable, it follows that {w: Z^h_t(w) ∈ S_0} ∈ F⁰_{t+}. Also, by definition of P^h on Ẑ, we have P^h{w: Z^h_t(w) ∈ S_0 Δ S} = P^h(S_0 Δ S) = 0. Consequently, {w: Z^h_t(w) ∈ S} ∈ F^h_{t+}, and so Y_t(Z^h_(·)(w)) is F^h-adapted. But by left-continuity this is equivalent to F^h_t-adaptedness for t > 0, and at t = 0 we have the extra hypothesis needed to complete the proof.
In the more specialized contexts of martingales or class D supermartingales, we have the decomposition results corresponding to Theorems 1.6 and 1.8. Here we return to the notation of Section 1: X_t and φ(z) denote only their first coordinates, while F⁰ and G⁰ coincide.

THEOREM 2.4. Let P be any G⁰-square-integrable martingale probability for X_t = w(t) on (Ω, G⁰). Then there is a decomposition X_t = X^c_t + X^d_t, where X^c is a continuous, F^P-square-integrable martingale with X^c_0 = 0, X^d is a square-integrable martingale which is a P-mean-square limit of martingales of bounded variation, and X^c_t X^d_t is a martingale. The decomposition is unique up to a P-null set.

PROOF. Choosing Z^P as any version of the prediction process of P on Ω, we first write X^c_t(w) = M_c(t, Z^P_(·)(w)) and X^d_t(w) = X_0 + M_d(t, Z^P_(·)(w)), where M_c and M_d are the martingales of Theorem 1.6 on Ω̂_M. These definitions are meaningful except for a P-null set since P{Z^P_(·) ∈ Ω̂_M} = 1. Moreover, X^c and X^d are right-continuous, and by arguing just as in the proof of Theorem 2.3 it follows that they are F^P-adapted. Further, we recall from the Remark to Theorem 1.9 of Essay I that F^P_{t+} is P-equivalent to the σ-field generated by {Z^P_s, s ≤ t}, whence, by the Markov property of Z^P and the martingale property of M_c,

(2.2)  E^P(X^c_{s+t} | F^P_{t+}) = X^c_t,  P-a.s.

The same reasoning applies to X^d, and to X^c X^d. Finally, the mean-square approximation obviously transfers from Theorem 1.6 in the same way, and the uniqueness proof was already formulated for fixed P. We have only to apply it to X_t - X_0 = X^c_t + (X^d_t - X_0) to complete the derivation.
Turning to the specialization of Theorem 1.8, we obtain the Doob-Meyer decomposition of a class D, right-continuous submartingale Y_t by writing

    Y_t = E(Y_∞ | G⁰_{t+}) - (E(Y_∞ | G⁰_{t+}) - Y_t),

and noting that the last term is a class D potential. Then we have only to decompose the last term, as follows.
THEOREM 2.5. Let P on (Ω, G⁰) be such that X_t is a class D potential. Then there is a P-a.s. unique decomposition X_t = M_t - A_t, where M_t is a G^P-uniformly integrable, right-continuous martingale, and A_t is a G^P-previsible increasing process with A_0 = 0.

NOTE. Unlike the decomposition of Theorem 2.4, the present components depend on the choice of the σ-fields, and may be altered if one replaces them by the σ-fields generated by X_t. Hence we denote them by G^P instead of F^P, although in the present notation these are actually identical.

PROOF. We recall again that the uniqueness part of the proof of Theorem 1.8 was in no way Markovian, and applies here without change. For the existence, we set M_t(w) = M(t, Z^P_(·)(w)) and A_t(w) = A(t, Z^P_(·)(w)), where A and M on the right are from Theorem 1.8, and Z^P is any fixed choice of the prediction process of P on Ω. The proof of Theorem 2.3 shows that, since A_0(w) = 0, A_t is G^P-previsible. Of course, since E^P A_∞ < ∞ it is clear that M is uniformly integrable, hence it need only be shown that it is a G^P-martingale. However, this follows by the same reasoning as (2.2), completing the proof.
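An elementary finite-dimensional sketch of the Doob-Meyer mechanism (discrete time, not the previsible continuous-time construction above): for a simple ±1 random walk S_n, the submartingale X_n = S_n² decomposes as X_n = M_n + A_n with predictable increasing part A_n = n and martingale part M_n = S_n² - n, and the martingale property can be verified exactly by enumeration.

```python
from itertools import product

# Discrete Doob decomposition of X_n = S_n**2 for a +/-1 random walk:
# A_n = n (predictable, increasing), M_n = S_n**2 - n (martingale).

def doob_check(n_steps):
    # verify E[M_{n+1} | F_n] = M_n on every path prefix, by enumeration
    for n in range(n_steps):
        for path in product((-1, 1), repeat=n):
            s = sum(path)
            m_now = s * s - n
            # average over the two equally likely next steps
            m_next = sum((s + e) ** 2 - (n + 1) for e in (-1, 1)) / 2
            if m_next != m_now:
                return False
    return True
```

The computation behind the check is one line: E[(s + e)²] = s² + 1, so the compensator must increase by exactly 1 per step.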
3. ON CONTINUOUS LOCAL MARTINGALES.

We pass over any examination of general local martingales or semimartingales. These are treated at length in [4], and it is not clear whether our Markovian approach has anything to add. The continuous local martingales provide not only a simpler application, but also one in which the method of time changes can be aptly illustrated in a prediction process setting. In point of detail, we avoid the "adjoined Brownian motions" of the usual time-change result (as in H. Kunita, S. Watanabe [10], for example). In the last part of the section, we specialize further to the continuous local martingales which are autonomous germ-Markov processes, as defined in [9]. These generalize the one-dimensional diffusions in natural scale, and perhaps should be called germ-diffusion processes in natural scale. However, this would be misleading in that no reduction of the general germ-diffusion to a scale in which it is a local martingale is possible.

We continue with the notations of Section 1: X_t(w) = w(t) = φ(Z^P_t), etc., but our starting point is the prediction space of all continuous local martingales. We recall that "A_t is a continuous local martingale" means that if T_N = inf{t: |A_t| > N}, then A_{t∧T_N} is a continuous martingale for every N > 0. Actually, we will deal somewhat more generally with processes which are continuous local martingales given their initial values. Thus we do not require that the initial value have finite
expectation.

PROPOSITION 3.1. Let L_c ⊂ H be the set of all probabilities h on (Ω, G⁰) such that (φ(Z_t) - φ(Z_0), Ẑ_t) is a P^h-continuous local martingale. Then L_c is a complete Borel packet of the prediction process.

NOTE: It is assumed that, P^h-a.s., φ(Z_0) ≠ ±∞.

PROOF. In the first place, since φ(Z_t) is r.c.l.l. for any h ∈ H, P^h-a.s. the condition that φ(Z_t) be continuous is the same as the vanishing of φ(Z_{t-}) minus the left limit of φ at t. As seen in the proof of Proposition 2.1, this is a Borel condition on h. By the usual optional section theorem argument this implies that {h: φ(Z_t) is P^h-a.s. continuous} is a Borel packet, and the moderate Markov property plus the previsible section theorem show that this packet is complete. In the definition of local martingale, we can redefine T_N by T_N = inf{r ∈ Q: |φ(Z_r)| > N}, which is Ẑ_t-measurable. Then we see that φ(Z_{t∧T_N}) - φ(Z_0) is Ẑ_t-adapted, and in conjunction with the continuity and boundedness of φ(Z_{t∧T_N}) the martingale condition becomes E^{Z_{r₁}} φ(Z_{r₂-r₁}) = φ(Z_0), P^h-a.s. over {r₁ < T_N}, for 0 < r₁ < r₂ ∈ Q. Hence the set of continuous local martingale probabilities for φ(Z_t) - φ(Z_0) is in H. To show that it is a packet, for Ẑ-optional T < ∞ we can replace the martingale condition by the same condition on {r₁ < T_N ∧ T}, 0 < r₁ < r₂ ∈ Q, together with the conditions E^{Z_{T+}} φ(Z_{r₃}) = φ(Z_{T+}), 0 < r₃ ∈ Q, since these
imply that over {T < r₁} ∩ {T + r₃ < T_N} one has E^{Z_{T+}} φ(Z_{r₃}) = E^h(φ(Z_{T+r₃}) | Ẑ_{T+}) = φ(Z_{T+}), using Hunt's Lemma for conditional expectations in the first equality. Then it follows immediately that Z_T is P^h-a.s. in L_c. By the optional section theorem this implies that L_c is a Borel packet. The completeness then follows because, for previsible 0 < T < ∞, we have P^h{Z_T ∈ L_c} = E^h(P^{Z_T}{Z_0 ∈ L_c}) = 1.

We require the following lemma, which was first proved for Hunt processes and square-integrable martingales by H. Kunita and S. Watanabe [10]. But our situation is different, and we prefer to use again the argument of M. G. Šur (see [2, Chapter 4, Section 3]). From now on, we denote φ(Z_t) - φ(Z_0) by A_t.
LEMMA 3.2. There is a unique continuous (non-decreasing) additive functional τ(t) of Z_t on Ω̂_c such that, for h ∈ L_c, A²_t - τ(t) is a continuous local martingale.

PROOF. For each h ∈ L_c and N, A²_{t∧T_N} is a bounded submartingale, and by Theorem 2.5 it has a Doob-Meyer decomposition

    A²_{t∧T_N} = E^h(A²_{T_N} | Ẑ_t) - E^h(τ_N(∞) | Ẑ_t) + τ_N(t),

for an increasing previsible process τ_N(t), depending on h, with τ_N(0) = 0. Thus A²_{t∧T_N} - τ_N(t) is a uniformly integrable martingale, and since A²_{t∧T_N} is continuous while τ_N(t) is previsible, it follows easily that τ_N(t) is continuous along with A_{t∧T_N}, and of course τ_N(·) is unique up to a P^h-null set. Furthermore, τ_N(t) is a.s. constant for t > T_N, hence we can define τ(t) = limsup_N τ_N(t) to obtain a Ẑ_t-adapted, P^h-a.s. continuous, non-decreasing process such that A²_t - τ(t) is a continuous P^h-local martingale.

It now remains only to modify τ(t) to obtain an additive functional, but the argument we follow would also suffice to define τ(t) from scratch. We first observe that T_N is a terminal time of Z_t
on Ω̂_c. Indeed, since φ(Z_t) is continuous and B_N = {z ∈ L_c: |φ(z)| > N} is Borel, we may set T_N = inf{r ∈ Q: φ(Z_r) ∈ B_N}, which is Ẑ_t-measurable. Then E_N = L_c ∩ {h: P^h(T_N > 0) = 1} is a Borel set, and we define Z^N_t as the process obtained by killing Z_t at T_N: Z^N_t = Z_t for t < T_N, and Z^N_t = Δ for t ≥ T_N, where Δ is adjoined with P^Δ(Z_t = Δ for all t) = 1. Then with the Borel transition function derived from that of Z_t, Z^N becomes a right process on E_N ∪ Δ. We show next that
e_N(h) = E^h(τ_N(∞)), h ∈ E_N, and e_N(Δ) = 0, defines a bounded regular potential for Z^N. Indeed, by the optional stopping theorem we have e_N(h) = E^h A²_{T_N}, h ∈ E_N. Next we note that there is no difficulty extending the additive functional property of A to stopping times. Then for any stopping time S ≤ T_N we have

(3.1)  E^h(A²_{T_N}) = E^h((A_S + A_{T_N} ∘ θ_S)²) = E^h(A²_S) + E^h(E^{Z_S} A²_{T_N}).

Setting first S = t ∧ T_N, we obtain E^h(E^{Z^N_t} A²_{T_N}) ≤ e_N(h). Since E^h A²_{t∧T_N} decreases to 0 as t → 0+, it follows that e_N(h) is an excessive function for Z^N. On the other hand, since A²_{t∧T_N} is a bounded submartingale we have lim_{t→∞} A_{t∧T_N} = A_{T_N}, P^h-a.s. (or more
precisely, the right side is defined by the left on {T_N = ∞}). Thus by dominated convergence we have lim_{t→∞} E^h(E^{Z^N_t} A²_{T_N}) = 0, and so e_N is a bounded potential. Finally, let S_n < T_N increase to a limit S. Then by (3.1) with S = S_n we have

    lim_{n→∞} [E^h e_N(Z^N_{S_n}) - E^h e_N(Z^N_S)] = lim_{n→∞} E^h(A²_{S∧T_N} - A²_{S_n∧T_N}) = 0,

proving that e_N is a regular potential. It follows by the argument of Šur [15] that there exists a continuous additive functional τ_N(t) of Z^N_t with potential e_N. Of course, for each h ∈ E_N this τ_N(t) is P^h-equivalent to the one obtained by the Doob-Meyer decomposition, since their difference is a continuous martingale of bounded variation. Now we have τ_N(s+t) = τ_N(t) + τ_N(s) ∘ θ_t, P^h-a.s. on {s+t < T_N}. To complete the proof we let N → ∞ and note that T_N → ∞, P^h-a.s. for h ∈ L_c. It is well-known that
A(t) may be reduced in some sense to a Brownian motion by the time change inverse to τ(t) (see for example H. Kunita and S. Watanabe [10], Theorem 3.1). We wish to formulate a result of this type in the present context, and it will be useful to make a slight enlargement of the σ-fields Ẑ_t to cope with the case when P^h{lim_{t→∞} τ(t) < ∞} > 0.

LEMMA 3.3.
Let h ∈ L_c, M = sup_t τ(t), and T = inf{t: τ(t) = M}, where both M and T are permitted the value +∞. Let Ẑ*_t denote Ẑ_t ∨ σ(T) on {T ≤ t}; that is, the σ-field generated jointly by Ẑ_t, the atom {T > t}, and the trace of σ(T) on {T ≤ t}. The family Ẑ*_t is non-decreasing, and both A(t) and A²(t) - τ(t) are P^h-continuous local martingales relative to Ẑ*_t.

NOTE: T is not a stopping time of Ẑ_t, but it is not hard to see that Ẑ*_t is right-continuous in t.

PROOF.
that for
0 < r
< r
e Q,
P {τ(r_) = τ(r o )} = P {τ(r) = τ ( r o ) ; A(r_) = A(r o )} 1 2 1 2 1 2 whence we obtain without difficulty that
P {A(t) = A(T)
for all
t > T> = 1
ESSAYS ON THE PREDICTION PROCESS
Next, observe that any
S € Z
S = (S χ Π {Tt}) s > 0,
with
T
101
may be written in the form
with
S± e 2 t ,
= inf t: |φ(Z )| > N}
i = 1 or 2 .
Then for
we have trivially
EZ(A((s+t) Λ T N ) ; S) = E Z (A(t Λ T ); S. Π {T
S o Π {T>t}) . 2
But the last term on the right becomes EZ(A((t+s) Λ T N ) ; S 2 ) - E Z (A((t+s) Λ T N )
S 2 Π {T
= E Z (A(t Λ T N ) ; S 2 Π {T>t}) , by the martingale property and the same reasoning as before. two terms yields the local martingale property of 2 The case of
A (t) - τ(t)
A(t)
Adding the
relative to
Z
is clearly analogous.
This is the key step; the rest is somewhat routine and we will omit some details.
Set
τ~ (t) = inf{s: τ(s) > t} with inf(φ) = «> . A -1 * routine check shows that τ (t) Λ T is a stopping time of Z. . Let Z denote the usual indicated σ-fields, thus S Ξ Z τ (t) Λ T _λ τ (t) Λ T A means that for c < ° ° s Π { τ (t) Λ T < c} e Z . Then we have {M < d} = ίτ^Cd) = oo} = { τ " 1 (d) Λ T = T} , from which it follows easily that
M
is a stopping time
Z τ " (t) Λ T
The theorem we wish to prove is as follows.

THEOREM 3.4. For h ∈ L_c, the process B(t ∧ M) = A(τ⁻¹(t) ∧ T) is a Brownian motion adapted to Ẑ*_{τ⁻¹(t)∧T}, stopped at time M. The times τ(s) are stopping times of Ẑ*_{τ⁻¹(t)∧T}, and A(t) = B(τ(t) ∧ M) for all t, P^h-a.s.

REMARK. It is a simple matter to see that B(t ∧ M) remains constant for t ≥ M, so that our notation is consistent. It also would not be difficult to adjoin an auxiliary independent Brownian motion and continue B(t ∧ M) beyond time M as an unstopped Brownian motion (as in [10]), but since M is a stopping time the meaning is clear without this step.

PROOF. The adaptedness and measurability assertions are again routine, and left to the reader. Since τ⁻¹(τ(t)) = ∞ for t > T, while A(τ⁻¹(τ(t))) = A(t) = A(t ∧ T) otherwise, the last assertion is clear.

By a characterization theorem of J. L. Doob ([5, VII, Theorem 11.9]), to show that B(t ∧ M) is stopped Brownian motion relative to Ẑ*_{τ⁻¹(t)∧T} becomes equivalent to showing that both B(t ∧ M) and B²(t ∧ M) - (t ∧ M) are martingales relative to Ẑ*_{τ⁻¹(t)∧T}. By Lemma 3.3 and the optional sampling theorem, they are plainly local martingales. Since t ∧ M is bounded by t, the second is then clearly a martingale. Thus E B²(t ∧ M) is finite, hence so is E sup_{s≤t} |B(s ∧ M)|. Then by dominated convergence B(t ∧ M) is a martingale as well.
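The content of the theorem can be illustrated by a crude numerical sketch (an illustration only, under assumptions of our own: a deterministic integrand sigma standing in for the local martingale's rate, and a fixed seed). For A(t) built as a stochastic integral against Brownian increments, the clock is τ(t) = ∫ σ² ds, and the time-changed path A(τ⁻¹(u)) should accrue quadratic variation at unit rate in the new time u.

```python
import numpy as np

# Discretized time change: A(t) = int sigma dB, tau(t) = int sigma^2 ds,
# and B~(u) = A(tau^{-1}(u)) behaves like a Brownian motion in u.

rng = np.random.default_rng(0)
dt, n = 1e-4, 200_000
t = np.arange(n) * dt
sigma = 1.0 + 0.5 * np.sin(t)                        # hypothetical integrand
dB = rng.normal(0.0, np.sqrt(dt), n)
A = np.concatenate(([0.0], np.cumsum(sigma * dB)))   # the local martingale
tau = np.concatenate(([0.0], np.cumsum(sigma ** 2 * dt)))

u = np.linspace(0.0, 0.99 * tau[-1], 1000)           # grid in the new time
B_tilde = A[np.searchsorted(tau, u)]                 # A(tau^{-1}(u))
qv = float(np.sum(np.diff(B_tilde) ** 2))            # ~ elapsed new time u[-1]
```

The realized quadratic variation of the time-changed path tracks the elapsed new time, which is the discrete shadow of B²(t ∧ M) - (t ∧ M) being a martingale.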
While Theorem 3.4 provides a rough outline of the process A(t), it conceals a variety of possibilities which emerges only when we introduce further assumptions. For convenience, we let L_d denote a complete Borel subpacket of L_c such that, for z ∈ L_d, P^z{M = ∞} = 1. Then it is clear from the theorem that A(t) is unbounded above and below, P^z-a.s. for z ∈ L_d. If we assume that (φ(Z_t), Ẑ_t) is a homogeneous strong-Markov process, then it follows from well-known facts that it must be, for each z, a regular diffusion in the natural scale on (-∞, ∞). Then the representation of Theorem 3.4 becomes B(t) = A(τ⁻¹(t)), and there is a unique measure m(dx), positive on open intervals, such that we have

(3.2)  t = ∫_{-∞}^{∞} s(τ(t), y) m(dy),

where s(t, y) = (d/dy) ∫₀ᵗ I_{(-∞,y)}(B(u)) du is the local time of B(t), jointly continuous in (t, y) outside a P^z-null set for each z. Here m(dx) is the "speed measure," and does not depend on z. The theory
(K Π K ) c H Q
by the functions
such that the trace of
z(S), S ^ G
H
on
(and hence, since
K Π K H
is generated
is countably
U"Γ
generated, by
z(S n )
for a sequence
S
e G° + ) .
The germ-Markov
property, as well as homogeneity in time, then follow for described below. i-c Π K ^ Φ,
But first we replace
KlΊκ
by
L
z ^ K Π K
Π K Π K
to obtain the packet of a continuous martingale autonomous
germ-Markov process.
as
assuming
As discussed in [9], for such a process Z_t the future and past of φ(Z_t) are conditionally independent given the present germ σ(Z_t(S), S ∈ G⁰_{0+}), and this is equivalent to conditional independence of the future and past of φ(Z_t) given the germ ∩_{ε>0} σ(φ(Z_s), 0 ≤ s < ε). Moreover, such a process φ(Z_t) has a stationary transition mechanism (a function of the "germ-state"), hence from an empirical viewpoint it perhaps may be said to arise from essentially the same conditions as if it were a homogeneous Markov process, i.e., in this case a regular diffusion. However, most of the resemblance ends here, as we now indicate. In the first place, unlike the case of diffusion where Z_t and φ(Z_t) may be identified, here the process Z_t may be discontinuous, and the discontinuities of Z_t may greatly affect the behavior of φ(Z_t). We give two such examples: in the first the discontinuities of Z_t are totally inaccessible, while in the second they are previsible.
Let
P
and let
be the probabilities of a Brownian motion
e , e , ... be independent, exponential random
variables with parameter Define a process
X(t)
1,
independent of
B(t)
for each
fB(t+T2n)?
T
2n±
t < T
2n+l < T
is a
T
= 0
T
= Σ?
e., n > 1 .
2(n+l)
Clearly (for each
(t) Ξ 0
for
P
on k 7* 2,
(Ω,G
, F
p X -a.s.
X(t), θ )
X(t)
in such a way that
so that we may assume
•K
= F°
Then we introduce the prediction space and process
probabilities
P
are of two kinds: Z
P {Z
= Z
P {Z
ψ ZQ} = 1
according as the process φ(Z ) ,
G° t"τ"
before.
Z
ί
Π K Π K
P
Z
as
t"Γ
Z ,
where the
either
for all small for each
, s > 0,
t} = 1, t,
or
respectively,
starts during a "level stretch" of
or during a "Brownian stretch" of
corresponds a distinct write
x)
P -continuous martingale, and we may introduce corresponding
probabilities w
and
x .
on the same probability space by
X(t)
where
B(t),
φ(Zfc) .
for each initial point
= {z(l,x), z(2,x); -°° < x < °°}
In each case there x,
so that we can
for the prediction
state space.
It is not hard to see that this does define a packet for
which
is an autonomous germ-Markov process and
φ(Z )
continuous martingale for each
P
.
The times
T
φ(zt)
is
a
become totally
inaccessible stopping times, and the character of φ(Z_t) changes abruptly at each T_n. Of course, such exponential holding times are impossible for diffusions because of the strong-Markov property. Here, the strong-Markov property holds for Z_t because Z_t contains the "information" that a level stretch has just ended or begun, but it does not hold for φ(Z_t).

EXAMPLE 3.6. Let P^x be Brownian probabilities for B(t) as before, and let B₁(t) be an independent "instantaneous return" Brownian motion on [0,1) for the same probability space, so that when B₁(t-) = 1, then B₁ returns instantaneously to 0, and we assume that 0 is a reflecting boundary for B₁ in the usual sense. Let P^{x,y} be the joint probabilities for (B, B₁) with B(0) = x and B₁(0) = y, and assume further a sequence Q₁, Q₂, ... of independent Bernoulli random variables with P{Q_k = a or b} = 1/2 for all (x, y), where a ≠ b are two strictly positive constants. We consider a process X(t) = B(τ(t)), where τ(t) is defined as follows. Let T₁, T₂, ... be the successive instantaneous return times of B₁(t) to 0 from 1-. Then we set τ(t) = Q₁ ∫₀ᵗ B₁(s) ds for 0 ≤ t < T₁, and for n ≥ 1 we define inductively

    τ(t) = τ(T_n) + Q_{n+1} ∫_{T_n}^{t} B₁(s) ds  for  T_n ≤ t < T_{n+1}.
Here the corresponding prediction state space is identified by triples z = (x, y, c) in R × [0,1) × {a, b}, where x = B(0), y = B₁(0), and c = Q₁. It is not hard to recognize that this leads to a Borel packet of the prediction process for which φ(Z_t) is autonomous germ-Markov and a continuous martingale for each P^z. Here the times T_n are previsible stopping times, since they occur when the "rate" dτ(t) = Q_n B₁(t) dt reaches its maximum Q_n on each cycle. Also, Z_t has a previsible jump at each T_n, since the value of Q_{n+1} is not determined by Z_{T_n-}, but is determined by Z_{T_n} (Z_{T_n-} is thus a branching point). Since φ(Z_{T_n}) is arbitrary, φ(Z_t) is not a strong-Markov process in the usual sense. But φ(Z_t) is always a strong-germ-Markov process (as defined and proved in Theorem 2.3 of [9]).

From these examples it is clear that germ-Markov processes exhibit much more variety of behavior than Markov processes, even under quite restrictive assumptions.
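A simulation sketch of a process of the kind in Example 3.5 (a hypothetical discretization with a fixed seed, not a construction from the text): Brownian stretches of exponential duration alternate with level stretches, producing a continuous path with exactly flat runs, which is where the failure of the strong-Markov property for the path alone comes from.

```python
import numpy as np

# Alternating "Brownian stretches" and "level stretches" of exponential
# length: during a level stretch the path holds its current value exactly.

rng = np.random.default_rng(1)
dt, horizon = 1e-3, 10.0
durations = rng.exponential(1.0, 64)      # e_1, e_2, ... with parameter 1
n = int(horizon / dt)
x = np.empty(n + 1)
x[0] = 0.0
stretch, clock, brownian = 0, 0.0, True
for k in range(n):
    # Brownian stretch: diffuse; level stretch: hold the current value
    x[k + 1] = x[k] + (rng.normal(0.0, np.sqrt(dt)) if brownian else 0.0)
    clock += dt
    if clock >= durations[stretch]:       # a switching time T_m is reached
        stretch += 1
        clock = 0.0
        brownian = not brownian
```

An observer of the path alone cannot tell, at a switching time, whether a level stretch has just begun; the prediction process carries exactly that extra information.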
The situation is not much simpler even if we require Z_t to be continuous along with φ(Z_t). Thus if we set a = b in Example 3.6, so that the Q's are constant, Z_t becomes continuous but X(t) still has predictable but sudden changes of behavior at the times T_n. In this example, the time scale τ(t) is independent of B(t). If
still may be distinguished.
The first may be called processes in which
the speed measure developes independently of position. with any fixed speed measure process
g(t)
ψ(t) = /
m(dy),
Here we may begin
and any autonomous germ-Markov
with a continuous prediction process and such that the process
g(s)ds
is strictly increasing (in the last example,
ψ(t) = /Q B (S) ds) . independent of
Now let
B(t), B(0) = 0 ,
ψ(t), with local times
define a random time
τ(t)
(3.3)
s(t,y)
be a Brownian motion as in (3.2).
We may then
by ψ(t) = Γ_m s(τ(t),y)m(dy) ,
and then set
X(t) = B(τ(t)) .
It is to be shown that
X(t)
is an
autonomous germ-Markov process, with a continuous prediction process, which is a continuous local martingale.
X (t) denotes the regular m diffusion with speed measure m(dy) based on B(t) as in (3.2) and if τ (t) is the corresponding additive functional of X (t), then we have m m (3.4)
In fact, if
X(t) = X m (ψ(t)) = B(τ m φ(t)) .
Now since
ψ(t)
is independent of
X
it can be seen that
τ ψ(t) = lim Σ (X(—) - X ( - ^ ί - ) ) 2 , n x» k l n at least in the sense of convergence in probability. that
τ ψ m
is an additive functional of
(3.5)
Setting
we have
Next, we will obtain
J* g^φds^ψtsKy)
as an expression for the local time of
where
X .
This is enough to see
u = ψ (s),
dψ
(u) = (g(ψ P -a.s.
X
at
y
with respect to
m(dy) .
we have from (3.4)
(u))Γ
du .
But for bounded step functions
f(u)
    ∫₀^{ψ(t)} I_{(-∞,y)}(X_m(u)) f(u) du = ∫_{-∞}^{y} [∫₀^{ψ(t)} f(u) d_u s(τ_m(u), y')] m(dy'),

since s(τ_m(u), y) is the local time of X_m. Since this holds for a countable family of step functions generating B(R¹), it follows by monotone extension that it holds for all Borel f ≥ 0. Substituting f(u) = (g(ψ⁻¹(u)))⁻¹, differentiating with respect to m(dy), and finally returning to the variable s, yield (3.5). Integrating (3.5) with respect to dy gives ∫₀ᵗ g⁻¹(s) d(τ_m ψ(s)), which is therefore also an additive functional of X, P-a.s. We denote it by C(t), and observe that dτ_m ψ(s) = g(s) dC(s), where the integrand is the Lebesgue density as indicated. Thus ψ(t) is
X(t), and hence the germ of
X(t)
g(t)
ψ(t)
is also.
determines the prediction process of
view of our assumptions on
g
and
B .
Thus
ψ(t)
is
is contained in But this together
X(t)
autonomously, in
It is clear that
X(t)
is a
continuous local martingale, and that its prediction process is continuous along with that of
g(t) .
It is quite apparent how to extend this type of example to germ-Markov functionals
ψ(t)
other than those which have a density
to Lebesgue measure. time
t
g(t)
with respect
The analogue of the speed measure of the process at
is given formally by
— m(dy), αψ
or
(1/g(t))m(dy)
case, and it evolves independently of the position
in our special
X(t) .
Not surprisingly, this is not the only type of continuous local martingale which is an autonomous germ-Markov process. in which the evolution of
g(t)
depends on
B(t) .
There are also cases
One such example is the
solution of the stochastic integral equation X(t) = x Q + /£ \ jS0 X(τ)dτ
dB(s)
x Q ft 0 .
The existence and uniqueness of the pathwise solution, given any (continuous) Brownian motion functional
τ(t)
B(t), is proved in Section 3.4 of [9]. Here the additive is clearly
τ
Thus if we write formally
rt Γ ,s IΓ
(t) = /
L
1 2ds
(X(u))du
J
.
    dt = d ∫_{-∞}^{∞} s(τ(t), y) m(dy),

we find that this is satisfied at time t if

    m(dy) = m_t(dy) = 2 (dt/dτ) dy = 2 (∫₀ᵗ X(s) ds)⁻² dy.

On the other hand, if we fix m(dy) = 2 dy as in (3.3), the analogue of ψ(t) is just τ(t), and clearly it depends on X(t). It might be of interest to look for further examples of this type in which m(dy) ≠ c dy.

As examples of continuous martingales, such processes are rather specialized. However, in view of the significance of the martingale property (or natural scale) for diffusion, it seems a natural first step to consider it also for a germ diffusion. But perhaps the chief significance of
the examples is only to call attention to the fact that germ-diffusion processes are very much less limited in behavior than ordinary diffusions. Since they both give expression to essentially the same underlying physical hypotheses, it would seem necessary to use some caution before assuming the validity of a diffusion model of a real phenomenon.

REFERENCES

1. Benveniste, A. and Jacod, J. "Systèmes de Lévy des processus de Markov," Inventiones Math., 21, 1973, 183-198.
2. Blumenthal, R. M. and Getoor, R. K. Markov Processes and Potential Theory. Academic Press, New York, 1968.
3. Dellacherie, C. and Meyer, P.-A. Probabilités et Potentiel, Chapters I-IV. Hermann, Paris, 1975.
4. Dellacherie, C. and Meyer, P.-A. Ibid., Chapters V-VIII. Hermann, Paris, 1980.
5. Doob, J. L. Stochastic Processes. Wiley and Sons, New York, 1953.
6. Getoor, R. K. Markov Processes: Ray Processes and Right Processes. Lecture Notes in Math. 440, Springer, Berlin, 1975.
7. Getoor, R. K. "Homogeneous potentials," Séminaire de Prob. XII, 398-410. Lecture Notes in Math. 649, Springer, Berlin, 1978.
8. Ito, K. and McKean, H. P., Jr. Diffusion Processes and their Sample Paths. Academic Press, New York, 1965.
9. Knight, F. B. "Prediction processes and an autonomous germ-Markov property," The Annals of Probability, 7, 1979, 385-405.
10. Kunita, H. and Watanabe, S. "On square integrable martingales," Nagoya Math. J., 30, 1967, 209-245.
11. Meyer, P.-A. Probability and Potentials. Blaisdell, Waltham, Mass., 1966.
12. Motoo, M. and Watanabe, S. "On a class of additive functionals of Markov processes," J. Math. Kyoto Univ., 4, 1965, 429-469.
13. Neveu, J. Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965.
14. Meyer, P.-A. "Limites médiales, d'après Mokobodzki," Séminaire de Probabilités VII, Univ. de Strasbourg, 198-204. Lecture Notes in Math. 321, Springer, Berlin, 1973.
15. Šur, M. G. "Continuous additive functionals of a Markov process," English translation: Soviet Math. Dokl., 2, 1961, 365-368.
16. Walsh, J. B. and Weil, M. "Représentation des temps terminaux et application aux fonctionnelles additives et aux systèmes de Lévy," Ann. Sci. Éc. Norm. Sup., 5, 1972, 121-155.