Indeed, let $(u^*(\cdot), v^*(\cdot))$ be the equilibrium point in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot))\rangle$. Then
$$K(u(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v(\cdot)) \qquad (1.3.2)$$
for all $u(\cdot) \in P$, $v(\cdot) \in E$. We shall rewrite (1.3.2) as
$$K(u(\cdot), v^*(\cdot)) \ge K(u^*(\cdot), v^*(\cdot)), \qquad (1.3.3)$$
$$K(u^*(\cdot), v(\cdot)) \le K(u^*(\cdot), v^*(\cdot)). \qquad (1.3.4)$$
Integrating (1.3.3) with respect to an arbitrary measure $\mu \in \bar P$, and (1.3.4) with respect to an arbitrary measure $\nu \in \bar E$, we obtain
$$\int K(u(\cdot), v^*(\cdot))\,d\mu \ge \int K(u^*(\cdot), v^*(\cdot))\,d\mu = K(u^*(\cdot), v^*(\cdot)), \qquad (1.3.5)$$
$$\int K(u^*(\cdot), v(\cdot))\,d\nu \le \int K(u^*(\cdot), v^*(\cdot))\,d\nu = K(u^*(\cdot), v^*(\cdot)). \qquad (1.3.6)$$
Let $\mu_{u^*(\cdot)}$, $\nu_{v^*(\cdot)}$ denote the mixed strategies that assign probability one to $u^*(\cdot)$, $v^*(\cdot)$; the left-hand sides of (1.3.5) and (1.3.6) are then $M(\mu, \nu_{v^*(\cdot)})$ and $M(\mu_{u^*(\cdot)}, \nu)$. Thus, using (1.3.5) and (1.3.6), we find
$$M(\mu, \nu_{v^*(\cdot)}) \ge M(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)}) \ge M(\mu_{u^*(\cdot)}, \nu), \qquad (1.3.7)$$
i.e. the point $(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)})$ is the equilibrium point in mixed strategies in the game $\bar\Gamma$, and $M(\mu_{u^*(\cdot)}, \nu_{v^*(\cdot)}) = K(u^*(\cdot), v^*(\cdot))$.
For simplicity, we introduce the following notation:
$$M(u(\cdot), \nu) = M(\mu_{u(\cdot)}, \nu), \qquad M(\mu, v(\cdot)) = M(\mu, \nu_{v(\cdot)}).$$
Theorem 3 For the pair $\mu^*$, $\nu^*$ to be the equilibrium point in the game $\bar\Gamma$, it is necessary and sufficient that for all $u(\cdot) \in P$, $v(\cdot) \in E$ the following inequality holds:
$$M(u(\cdot), \nu^*) \ge M(\mu^*, \nu^*) \ge M(\mu^*, v(\cdot)). \qquad (1.3.8)$$
Preliminaries
Proof: Sufficiency. Let $\mu \in \bar P$, $\nu \in \bar E$ be arbitrary mixed strategies. We integrate both parts of the left-hand inequality in (1.3.8) with respect to the measure $\mu$, and both parts of the right-hand inequality in (1.3.8) with respect to the measure $\nu$. As a result, we obtain
$$M(\mu, \nu^*) = \int M(u(\cdot), \nu^*)\,d\mu \ge \int M(\mu^*, \nu^*)\,d\mu = M(\mu^*, \nu^*), \qquad (1.3.9)$$
$$M(\mu^*, \nu^*) = \int M(\mu^*, \nu^*)\,d\nu \ge \int M(\mu^*, v(\cdot))\,d\nu = M(\mu^*, \nu). \qquad (1.3.10)$$
From (1.3.9) and (1.3.10) we find that the pair $(\mu^*, \nu^*)$ constitutes the equilibrium point in mixed strategies. The necessity is evident: if the equilibrium inequalities $M(\mu, \nu^*) \ge M(\mu^*, \nu^*) \ge M(\mu^*, \nu)$ hold for all $\mu \in \bar P$, $\nu \in \bar E$, then, in particular, they hold for all strategies $\mu_{u(\cdot)}$, $\nu_{v(\cdot)}$ prescribing probability one to the elements $u(\cdot) \in P$, $v(\cdot) \in E$, which is exactly (1.3.8). ■
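When P and E are finite sets, mixed strategies are probability vectors and $M(\mu, \nu) = \sum_{i,j}\mu_i\nu_j K(u_i, v_j)$, so the criterion of Theorem 3 reduces to finitely many comparisons against pure strategies. The sketch below assumes exactly this finite setting: the matrix `K` holds Player E's payoffs (rows are Player P's pure strategies, columns are Player E's), and the helper names are illustrative, not taken from the text.

```python
from fractions import Fraction

def mixed_payoff(K, mu, nu):
    """M(mu, nu) = sum over (i, j) of mu_i * nu_j * K[i][j]; K[i][j] is Player E's payoff."""
    return sum(mu[i] * nu[j] * K[i][j]
               for i in range(len(mu)) for j in range(len(nu)))

def point_mass(k, size):
    """Mixed strategy assigning probability one to the pure strategy with index k."""
    return [Fraction(1 if i == k else 0) for i in range(size)]

def is_equilibrium(K, mu, nu):
    """Check the criterion (1.3.8) of Theorem 3 against every pure u (row) and v (column)."""
    rows, cols = len(K), len(K[0])
    value = mixed_payoff(K, mu, nu)
    # Left-hand inequality: M(u, nu*) >= M(mu*, nu*) for every pure strategy u of Player P.
    left = all(mixed_payoff(K, point_mass(i, rows), nu) >= value for i in range(rows))
    # Right-hand inequality: M(mu*, nu*) >= M(mu*, v) for every pure strategy v of Player E.
    right = all(value >= mixed_payoff(K, mu, point_mass(j, cols)) for j in range(cols))
    return left and right, value

# Matching pennies has no pure equilibrium, but the uniform mixed pair passes the test.
half = [Fraction(1, 2), Fraction(1, 2)]
print(is_equilibrium([[1, -1], [-1, 1]], half, half))
```

Exact rational arithmetic (`Fraction`) is used so that the inequalities of (1.3.8) are compared without rounding.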
As shown, the mixed strategies extend the players' strategy sets in the original game and preserve the value of the game and the equilibrium point, if any. In most cases, the equilibrium point in mixed strategies is found to exist, while the equilibrium point in pure strategies may not exist. We consider a special class of zero-sum two-person games $\Gamma$ where there is an equilibrium point in mixed strategies. Theorem 4 Let $P \in \mathrm{Comp}\,R^k$ ($E \in \mathrm{Comp}\,R^l$) be a compact set in the Euclidean space of corresponding dimension, and let the function $K(u(\cdot), v(\cdot))$ be continuous on the Cartesian product $P \times E$. Then in the game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot))\rangle$ there is an equilibrium point in mixed strategies.
For the proof of this theorem the reader may refer to [8]. In some cases, using specific properties of the payoff function, we may describe the structure of the mixed optimal strategies in the game $\Gamma$ more precisely. In particular, this occurs in games with convex and concave payoff functions. In what follows we will be dealing with this type of games and, therefore, we will consider them in detail in the following section.
1.4
Games with convex payoff function
Assume that the set E is compact in $R^m$, the set P is convex and compact in $R^n$, and the function $K(u(\cdot), v(\cdot))$ is continuous in all its variables and convex in $u(\cdot)$ for all $v(\cdot)$. Such a game $\Gamma = \langle P, E, K(u(\cdot), v(\cdot))\rangle$ will be called the game with convex payoff function (or convex game).
Let us consider a mixed extension of the game $\Gamma$ over spaces of probability measures on P and E. Investigation of such games is of interest to us because the solution of a family of pursuit-evasion games with incomplete information reduces to them. The solution has a simple geometrical structure. In this section we will define a general form of optimal strategies and indicate a method of computing the value of the game. We assume that the reader has some knowledge of the fundamentals of the theory of convex sets and convex functions; therefore the relevant results of this theory are given without proof. Let $I_z$ be the mixed strategy by which the point z is chosen with probability 1. The following theorem holds.
Theorem 5 The value of the game $\Gamma$ is determined by the formula
$$v = \min_u \max_v K(u, v).$$
Player II has an optimal strategy of the form
$$\nu_0 = \sum_{i=1}^{n+1}\lambda_i I_{v_i}, \qquad \lambda_i \ge 0, \quad \sum_{i=1}^{n+1}\lambda_i = 1.$$
At the same time, all pure strategies $u_0$ under which $\min_u \max_v K(u, v)$ is achieved are optimal for Player I. In addition, if the function $K(u, v)$ with each fixed $v(\cdot)$ is strictly convex in $u(\cdot)$, then the optimal strategy of Player I is unique.
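Theorem 5 can be illustrated on a toy convex game of our own choosing (it is not taken from the text): $n = m = 1$, $P = E = [-1, 1]$, $K(u, v) = (u - v)^2$, which is strictly convex in u. Here $\max_v K(u, v) = (|u| + 1)^2$, so the value is 1, attained only at $u_0 = 0$, and the two-point measure $\nu_0 = \frac12 I_{-1} + \frac12 I_{+1}$ (that is, $n + 1 = 2$ points) guarantees Player II $\int K(u, v)\,d\nu_0 = u^2 + 1 \ge 1$ for every u. The grid check below is a sketch; the grid resolution is an arbitrary choice.

```python
def K(u, v):
    # Player E's payoff in the toy game; strictly convex in u for every fixed v.
    return (u - v) ** 2

grid = [-1 + k / 200 for k in range(401)]   # discretization of P = E = [-1, 1]

# Upper value min_u max_v K(u, v), evaluated on the grid.
upper = {u: max(K(u, v) for v in grid) for u in grid}
u0 = min(upper, key=upper.get)
value = upper[u0]

# Candidate optimal strategy of Player II: nu0 = (1/2) I_{-1} + (1/2) I_{+1}.
support, weights = (-1.0, 1.0), (0.5, 0.5)
guaranteed = min(sum(w * K(u, v) for v, w in zip(support, weights)) for u in grid)
print(u0, value, guaranteed)
```

The pure minimizer `u0` plays the role of Player I's optimal pure strategy, and `guaranteed` equals `value`, as Theorem 5 predicts for this example.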
In Theorem 5, $\nu_0$ is a probability measure concentrated on not more than $n + 1$ points $v_i(\cdot)$ of the set E. Before turning to the proof of Theorem 5, we shall prove some auxiliary lemmas describing the properties of convex functions. Let B be a compact convex region in the n-dimensional space whose elements are denoted by y. The function f is called linear if $f(\sum_i \lambda_i y_i) = \sum_i \lambda_i f(y_i)$ whenever $\sum_i \lambda_i = 1$ (a linear function in this sense is not necessarily homogeneous). The function $f(y) \equiv 1$, denoted by $\omega$, is linear. The linear functions form the $(n+1)$-dimensional linear space E. We denote by F an element of the conjugate space $E^*$.
Lemma 3 If $F(\omega) = 1$, then there is a point y such that $F(f) = f(y)$ for all $f \in E$.
Proof: Since both sides are linear in f, it suffices to show that the $n + 1$ equations $F(f_i) = f_i(y)$ have a solution with respect to y for $n + 1$ linearly independent elements $f_i \in E$. But $\omega$ may be taken as $f_1$; the first equation then becomes the identity $F(\omega) = 1 = \omega(y)$. The other n equations are independent and have a solution, since y is an element of the n-dimensional space. ■
Lemma 4 The set of all functions f that are nonnegative on the set B forms a closed convex cone $P \subset E$ with its vertex at zero, and the function $\omega$ is its interior point. Moreover, the set of points y for which $P(y) \ge 0$ coincides with B. (We denote by $P(y)$ the collection of all $f(y)$ for $f \in P$, and by $f(B)$ the collection of all $f(y)$ for $y \in B$.)
Proof: First we show that P is a closed convex cone. Let $f_1, f_2 \in P$, $\alpha \ge 0$, $\beta \ge 0$. Then $f_3 = \alpha f_1 + \beta f_2$ also belongs to P, since $f_3(y) = \alpha f_1(y) + \beta f_2(y) \ge 0$ for all $y \in B$. That P is closed is evident. If $f = 0$, then $f \in P$.
We now show that the set of points y for which $P(y) \ge 0$ coincides with B. Let $y \notin B$. Then y may be separated from B by a hyperplane, i.e. it is possible to indicate $f \in E$ such that $f(y) < c < f(B)$. Consequently, the function $f - c\omega$ belongs to P and is negative at y. Suppose $\omega$ is not an interior point of P. Then there is a sequence $f_n \to \omega$, $f_n \notin P$. For each $f_n$ there is $y_n \in B$ such that $f_n(y_n) < 0$. Since B is compact, there exists a subsequence $y_{n_i}$ converging to some $y_0 \in B$, with $f_{n_i}(y_{n_i}) < 0$. In this case, however, $\lim_{i\to\infty} f_{n_i}(y_{n_i}) \le 0$. At the same time, $f_n \to \omega$ uniformly on the compact set B, so that $\lim_{i\to\infty} f_{n_i}(y_{n_i}) = \omega(y_0) = 1$, a contradiction; i.e. $\omega$ is an interior point of the set P. ■
Lemma 5 Let Q be a compact convex set from E not intersecting P. Then there are $y \in B$ and $\delta > 0$ such that $Q(y) < -\delta$.
Proof: Let $F \in E^*$ separate P and Q, i.e.
$$F(f) > b > F(g) \qquad (1.4.1)$$
for all $f \in P$ and $g \in Q$. From this it immediately follows that $F(f) \ge 0$ for all $f \in P$. Indeed, assume the contrary. Let $f_1 \in P$ be such that $F(f_1) < 0$. Then $\lambda f_1 \in P$ for any $\lambda > 0$ and we have $F(\lambda f_1) = \lambda F(f_1) < 0$. Since $\lambda$ can be chosen arbitrarily large, this contradicts (1.4.1). Thus $F(f) \ge 0$ and, additionally, the number b in (1.4.1) is negative, since $0 \in P$ and $F(0) = 0$; such F and b exist because the sets P and Q are convex, closed and do not intersect. From (1.4.1) we obtain
$$F(f) \ge 0 > F(g) - b. \qquad (1.4.2)$$
Setting $-b = \delta > 0$, we derive the inequality
$$F(Q) + \delta < 0 \le F(P) \qquad (\delta > 0). \qquad (1.4.3)$$
Since $\omega$ is an interior point of P, $F(\omega) > 0$ holds. We may normalize F so that $F(\omega) = 1$ (dividing $\delta$ by the same positive factor). By Lemma 3, we find a point y with $F(f) = f(y)$ for all $f \in E$, which gives $Q(y) + \delta < 0 \le P(y)$. In accordance with the preceding lemma, $y \in B$. This completes the proof of Lemma 5. ■
Lemma 6 If the function $\sup_\alpha f_\alpha(y)$ is positive on B for the family $\{f_\alpha\}$ of linear functions, then for properly chosen $\lambda_i \ge 0$, $\sum_{i=1}^{n+1}\lambda_i = 1$, and $\alpha_i$, $i = 1, \ldots, n+1$, the function $f = \sum_{i=1}^{n+1}\lambda_i f_{\alpha_i}$ is a member of P, i.e. $f(B) \ge 0$.
Proof: If $f_\alpha(y) > 0$ for fixed $\alpha$ and y, then there is an open set containing y where this strict inequality holds. Consequently, by the Heine-Borel covering theorem, we may find a finite subfamily $\{f_{\alpha_j}\}$ such that $\max_j f_{\alpha_j}(y) > 0$ for each $y \in B$. Denote by Q the convex hull of the subfamily $\{f_{\alpha_j}\}$. The sets P and Q intersect: otherwise, by Lemma 5, there would be a point $y \in B$ with $f_{\alpha_j}(y) < 0$ for all j, which is impossible. Since Q is bounded and P is not bounded, a boundary point of the set Q lies in P. This point belongs to a face of Q of dimension not exceeding n; therefore it may be represented as a convex linear combination of not more than $n + 1$ functions $f_{\alpha_j}$. This proves the lemma. ■
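Theorem 6 Let $\{\varphi_\alpha\}$ be a family of continuous convex functions defined on the compact convex n-dimensional set B. If the function $\sup_\alpha \varphi_\alpha(y)$ is positive on B, then there are indices $\alpha_i$ and numbers $\lambda_i$ such that
$$\sum_{i=1}^{n+1}\lambda_i\varphi_{\alpha_i}(y) > 0 \quad \text{for all } y \in B,$$
where $\lambda_i \ge 0$ and $\sum_{i=1}^{n+1}\lambda_i = 1$.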
Proof: Suppose that for all $y \in B$, satisfying the condition of Theorem 5, the following inequality holds: $\delta > \inf\sup \ldots$
… $\in (0, \pi)$ and … Similarly, $F_2(\varphi) < 0$ for … and $F_2(\varphi) > 0$ for $\varphi \in (\pi, 2\pi - \ldots)$.
Consequently,
$$M(x, \nu^*) = F_1(\varphi_1) + F_2(\varphi_2) > F_1(0) + F_2(\pi) = \frac{1}{2\pi}\int (R^2 + r_1^2 - 2Rr_1\cos\varphi)\,d\varphi + \frac{1}{2\pi}\int (R^2 + r_2^2 + 2Rr_2\cos\varphi)\,d\varphi, \qquad (1.6.10)$$
i.e. using the strategy $x_1 = (-r_1, 0)$, $x_2 = (r_2, 0)$ Player P receives a larger payoff than using the strategy $x_i = (r_i\cos\varphi_i, r_i\sin\varphi_i)$, $i = 1, 2$.
Now let $x_1$ and $x_2$ lie on a diameter of the circle D, the distance between them being 2r, and let the chord AB (in this case) subtend the central angle $2\alpha$. Assume that $x_1 = (R\cos\alpha - r, 0)$, $x_2 = (R\cos\alpha + r, 0)$. Player E's payoff is equal to
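$$\Phi(\alpha, r) = \frac{1}{2\pi}\left[\int_{-\alpha}^{\alpha}\big((R\cos\psi - R\cos\alpha - r)^2 + R^2\sin^2\psi\big)\,d\psi + \int_{\alpha}^{2\pi-\alpha}\big((R\cos\psi - R\cos\alpha + r)^2 + R^2\sin^2\psi\big)\,d\psi\right]$$
$$= \frac{1}{2\pi}\left[\int_{-\alpha}^{\alpha}\big(R^2 - 2R\cos\psi\,(R\cos\alpha + r) + (R\cos\alpha + r)^2\big)\,d\psi + \int_{\alpha}^{2\pi-\alpha}\big(R^2 - 2R\cos\psi\,(R\cos\alpha - r) + (R\cos\alpha - r)^2\big)\,d\psi\right]$$
$$= \frac{1}{\pi}\Big\{\big[R^2 + (R\cos\alpha + r)^2\big]\alpha - 2R\sin\alpha\,(R\cos\alpha + r) + \big[R^2 + (R\cos\alpha - r)^2\big](\pi - \alpha) + 2R\sin\alpha\,(R\cos\alpha - r)\Big\}.$$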
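The closed form can be cross-checked numerically. The sketch below assumes, as the splitting of the circle by the chord AB indicates, that Player E's payoff at a point of the circle of radius R is the squared distance to the nearer of $x_1$, $x_2$, and that $\nu^*$ is the uniform distribution on that circle; $R = 1$ and the sample values of $\alpha$ and r are arbitrary.

```python
import math

def phi_numeric(alpha, r, R=1.0, n=40_000):
    """(1/2pi) * integral over the circle of the squared distance to the nearer pursuer (midpoint rule)."""
    c = R * math.cos(alpha)
    x1, x2 = c - r, c + r                 # pursuers on the horizontal diameter
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        psi = (k + 0.5) * h
        px, py = R * math.cos(psi), R * math.sin(psi)
        total += min((px - x1) ** 2 + py ** 2, (px - x2) ** 2 + py ** 2)
    return total * h / (2 * math.pi)

def phi_closed(alpha, r, R=1.0):
    """(1/pi){[R^2+(Rcos a+r)^2] a - 2R sin a (Rcos a+r) + [R^2+(Rcos a-r)^2](pi-a) + 2R sin a (Rcos a-r)}."""
    c, s = R * math.cos(alpha), R * math.sin(alpha)
    return ((R**2 + (c + r)**2) * alpha - 2 * s * (c + r)
            + (R**2 + (c - r)**2) * (math.pi - alpha) + 2 * s * (c - r)) / math.pi

for alpha, r in [(0.4, 0.3), (math.pi / 2, 0.5), (2.2, 0.7)]:
    print(alpha, r, phi_numeric(alpha, r), phi_closed(alpha, r))
```

Taking the minimum of the two squared distances is equivalent to splitting the integral at $\psi = \pm\alpha$, because the chord through $R\cos\alpha$ is the perpendicular bisector of the segment $x_1x_2$.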
We show that the function $\Phi(\alpha, r)$ for a fixed r achieves its minimum with respect to $\alpha$ at $\alpha = \pi/2$. Performing elementary computations, we obtain
$$\frac{\partial\Phi(\alpha, r)}{\partial\alpha} = \frac{2}{\pi}R\sin\alpha\,\big[(\pi - 2\alpha)r - \pi R\cos\alpha\big],$$
therefore for sufficiently small $\alpha$ we have $\partial\Phi/\partial\alpha < 0$, because $\sin\alpha > 0$ and $(\pi - 2\alpha)r - \pi R\cos\alpha < 0$ (in the limiting case $\alpha \to 0$ this expression equals $\pi(r - R) < 0$, since $r < R$). At the same time,
$$\frac{\partial\Phi(\pi/2, r)}{\partial\alpha} = \frac{2}{\pi}R\cdot 0 = 0.$$
With each fixed r, the function $\partial\Phi(\alpha, r)/\partial\alpha$ has no zeros on $(0, \pi/2]$ except $\alpha = \pi/2$. Suppose the opposite is true. Let $\alpha_1 \in (0, \pi/2)$ be a zero of $\partial\Phi/\partial\alpha$. Then, for $\alpha = \alpha_1$, the function $G(\alpha) = (\pi - 2\alpha)r - \pi R\cos\alpha$ also equals zero, so that $G(\alpha_1) = G(\pi/2) = 0$. We compute its derivatives:
$$G'(\alpha) = -2r + \pi R\sin\alpha, \qquad G''(\alpha) = \pi R\cos\alpha > 0, \quad \alpha \in (0, \pi/2),$$
i.e. the function $G(\alpha)$ is convex on $[0, \pi/2]$. Since $G(0) = \pi(r - R) < 0$ and $G(\pi/2) = 0$, convexity gives $G(\alpha) \le (1 - 2\alpha/\pi)\,G(0) < 0$ for $\alpha \in [0, \pi/2)$, which contradicts $G(\alpha_1) = 0$. Thus $\partial\Phi(\alpha, r)/\partial\alpha < 0$ for $\alpha \in (0, \pi/2)$ and $\partial\Phi(\pi/2, r)/\partial\alpha = 0$. Consequently, the function $\Phi(\alpha, r)$ achieves its absolute minimum with respect to $\alpha$ at $\alpha = \pi/2$: $\Phi(\alpha, r) \ge \Phi(\pi/2, r)$. This means that in this case the payoff of Player E satisfies
$$M(x, \nu^*) = \Phi(\alpha, r) \ge \Phi(\pi/2, r). \qquad (1.6.11)$$
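A quick numerical sanity check of the derivative formula and of the location of the minimum is sketched below. It reuses the closed form of $\Phi(\alpha, r)$, takes the illustrative values $R = 1$, $r = 0.5 < R$, compares the formula for $\partial\Phi/\partial\alpha$ with a central finite difference, and searches a uniform grid of $\alpha \in (0, \pi)$ for the minimizer.

```python
import math

def phi(alpha, r, R=1.0):
    """Closed form of Phi(alpha, r)."""
    c, s = R * math.cos(alpha), R * math.sin(alpha)
    return ((R**2 + (c + r)**2) * alpha - 2 * s * (c + r)
            + (R**2 + (c - r)**2) * (math.pi - alpha) + 2 * s * (c - r)) / math.pi

def dphi(alpha, r, R=1.0):
    """(2/pi) R sin(alpha) [(pi - 2 alpha) r - pi R cos(alpha)]."""
    return (2 / math.pi) * R * math.sin(alpha) * ((math.pi - 2 * alpha) * r
                                                   - math.pi * R * math.cos(alpha))

def argmin_alpha(r, R=1.0, steps=1000):
    """Grid minimizer of Phi(., r) over alpha = k*pi/steps, k = 1, ..., steps - 1."""
    grid = [k * math.pi / steps for k in range(1, steps)]
    return min(grid, key=lambda a: phi(a, r, R))

r = 0.5
for a in (0.2, 0.7, 1.2, 2.0):
    fd = (phi(a + 1e-6, r) - phi(a - 1e-6, r)) / 2e-6   # central difference
    print(f"alpha={a}: finite difference {fd:.8f}, formula {dphi(a, r):.8f}")
print("grid minimizer:", argmin_alpha(r), "pi/2 =", math.pi / 2)
```

On this grid the minimizer falls on $\alpha = \pi/2$, in agreement with (1.6.11); the check is illustrative and does not replace the convexity argument.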
Based on (1.6.9)-(1.6.11), it turns out that for any pure strategy $x = (x_1, x_2)$ Player E has the payoff
$$M(x, \nu^*) \ge \Phi(r_0, R). \qquad (1.6.12)$$
Suppose Player P adopts the strategy $\mu^*$, and Player E an arbitrary pure strategy $y = (\rho\cos\psi, \rho\sin\psi)$. Then Player E obtains the payoff
$$M(\mu^*, y) = \frac{1}{2\pi}\int_0^{2\pi}\min\big[\rho^2 + r_0^2 - 2\rho r_0\cos(\psi - \xi),\ \rho^2 + r_0^2 + 2\rho r_0\cos(\psi - \xi)\big]\,d\xi = \frac{1}{2\pi}\int_0^{2\pi}\min\big[\rho^2 + r_0^2 - 2\rho r_0\cos\xi,\ \rho^2 + r_0^2 + 2\rho r_0\cos\xi\big]\,d\xi = \Phi(r_0, \rho),$$
and, because of Lemma 9,
$$M(\mu^*, y) = \Phi(r_0, \rho) \le \Phi(r_0, R). \qquad (1.6.13)$$
Inequalities (1.6.12) and (1.6.13) imply that $\mu^*$ and $\nu^*$ are mixed optimal strategies of Players P and E respectively, and $\Phi(r_0, R)$ is the value of the game. This proves the theorem completely. ■
Example 13. Consider a generalization of the previous game to the case where pursuit is carried out by a pursuer detail involving m participants. Now, suppose that a simultaneous zero-sum game between the pursuer detail $P = \{P_1, \ldots, P_m\}$ (Player I) and the evader E (Player II) is given. Player $P = \{P_1, \ldots, P_m\}$ chooses the point $x = (x_1, \ldots, x_m)$, $x_i \in S$, $i = 1, \ldots, m$, and Player E, without any knowledge of Player P's choice (simultaneously with him), chooses the point $y \in S$. The payoff of Player E is assumed to be equal to $\min_{i=1,\ldots,m}\rho(x_i, y)$.
Thus, as in the previous case, Player E aims to maximize the distance between himself and the detail (maximization of the minimum distance from one of the partners of the detail P). Player P pursues the opposite objective. We shall give a solution for the case where the set S coincides with the interval $[-1, +1]$. At present, the solution of the game in the general case is unknown.
Theorem 9 The mixed optimal strategy of Player E is the choice of the points
$$1,\ \frac{2m-3}{2m-1},\ \ldots,\ \frac{2m-2i-1}{2m-1},\ \ldots,\ \frac{1}{2m-1},\ -\frac{1}{2m-1},\ \ldots,\ -\frac{2m-3}{2m-1},\ -1$$
with probabilities $1/(2m)$. The optimal strategy of Player P is the equiprobable choice of the two sets of m points
$$\left\{\frac{2m-3}{2m-1},\ \frac{2m-7}{2m-1},\ \ldots,\ \frac{2m-4i+1}{2m-1},\ \ldots,\ -1\right\}, \qquad \left\{1,\ \frac{2m-5}{2m-1},\ \ldots,\ \frac{2m-4i+3}{2m-1},\ \ldots,\ -\frac{2m-3}{2m-1}\right\}, \quad i = 1, \ldots, m.$$
The value of the game is equal to $1/(2m-1)$.
Proof: Introduce the following notation:
$$\ell_i' = \left[\frac{2m-4i+3}{2m-1},\ \frac{2m-4(i-1)+1}{2m-1}\right], \quad i = 2, \ldots, m; \qquad \ell_i = \left[\frac{2m-4i+1}{2m-1},\ \frac{2m-4i+3}{2m-1}\right], \quad i = 1, \ldots, m.$$
The intervals $\ell_i$ and $\ell_i'$ have length $2/(2m-1)$ and together partition $[-1, 1]$.
We show that $M(F^*, y) \le 1/(2m-1)$ for all $y \in [-1, 1]$, where $F^*$ is the strategy of Player P. Indeed, for $y \in \ell_i$ the nearest points of the two sets forming $F^*$ are the endpoints of $\ell_i$, so that
$$M(F^*, y) = \frac{1}{2}\left(y - \frac{2m-4i+1}{2m-1}\right) + \frac{1}{2}\left(\frac{2m-4i+3}{2m-1} - y\right) = \frac{1}{2m-1}.$$
A similar equality is derived for $y \in \ell_i'$. We now prove the inequality
$$M(x_1, \ldots, x_m; G^*) \ge \frac{1}{2m-1},$$
where $G^*$ is the strategy of Player E. We restrict ourselves to the cases when the points $x_i$ lie on the intervals $\ell_i$ (in the remaining cases the proof is similar):
1. $x_i \in \ell_i$, $i = 1, 2, \ldots, m$;
2. $x_i \in \ell_i$, $i = 1, 2, \ldots, k-1, k+1, \ldots, m$, $x_k \in \ell_p$, $k \ne p$.
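In the first case, $\frac{2m-4i+1}{2m-1} \le x_i \le \frac{2m-4i+3}{2m-1}$, and
$$M(x_1, \ldots, x_m; G^*) = \frac{1}{2m}\sum_{i=1}^{m}\left[\left(\frac{2m-4i+3}{2m-1} - x_i\right) + \left(x_i - \frac{2m-4i+1}{2m-1}\right)\right] = \frac{1}{2m}\cdot\frac{2m}{2m-1} = \frac{1}{2m-1}.$$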
The second case means that two points fall on one of the intervals ($\ell_p$), while $\ell_k$ contains none of the points $x_i$; hence each of the two minima below is at least $2/(2m-1)$, and we have
$$M(x_1, \ldots, x_m; G^*) \ge \frac{1}{2m}\left[\sum_{j \ne k, p}\left(\frac{2m-4j+3}{2m-1} - x_j + x_j - \frac{2m-4j+1}{2m-1}\right) + \min_i\left|\frac{2m-4k+3}{2m-1} - x_i\right| + \min_i\left|x_i - \frac{2m-4k+1}{2m-1}\right|\right] \ge \frac{1}{2m}\left(\frac{2m}{2m-1} - \frac{4}{2m-1} + \frac{4}{2m-1}\right) = \frac{1}{2m-1}.$$
The latter inequalities show the optimality of the strategies $F^*$, $G^*$ and prove the theorem. ■
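The two inequalities of the proof can be checked with exact rational arithmetic. In the sketch below (helper names are hypothetical) the two equiprobable sets of $F^*$ are the left and right endpoints of the intervals $\ell_i$, the $2m$ points of $G^*$ are their union, and Player E's payoff is the distance to the nearest pursuer on $S = [-1, 1]$.

```python
from fractions import Fraction

def strategies(m):
    """Supports of F* (two equiprobable sets) and G* (2m equiprobable points) from Theorem 9."""
    d = 2 * m - 1
    first = [Fraction(2 * m - 4 * i + 1, d) for i in range(1, m + 1)]   # left ends of l_i
    second = [Fraction(2 * m - 4 * i + 3, d) for i in range(1, m + 1)]  # right ends of l_i
    return first, second, sorted(first + second)

def dist_to_set(points, y):
    """Player E's payoff: distance from y to the nearest pursuer."""
    return min(abs(p - y) for p in points)

def payoff_vs_F(m, y):
    """M(F*, y): Player P picks either set with probability 1/2."""
    first, second, _ = strategies(m)
    return (dist_to_set(first, y) + dist_to_set(second, y)) / 2

def payoff_vs_G(m, xs):
    """M(x_1, ..., x_m; G*): Player E picks each of the 2m points with probability 1/(2m)."""
    _, _, evader = strategies(m)
    return sum(dist_to_set(xs, y) for y in evader) / (2 * m)

m = 3
print(payoff_vs_F(m, Fraction(0)), payoff_vs_G(m, [Fraction(-4, 5), Fraction(0), Fraction(4, 5)]))
```

Exact fractions make the equality $M(F^*, y) = 1/(2m-1)$ testable without tolerances.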
Example 14. Two players, the pursuing detail $P = \{P_1, \ldots, P_m\}$ and the evading detail $E = \{E_1, \ldots, E_n\}$, choose the systems of points $x = (x_1, \ldots, x_m)$, $x_i \in [-1, 1]$, $i = 1, \ldots, m$, and $y = (y_1, \ldots, y_n)$, $y_j \in [-1, 1]$, $j = 1, \ldots, n$, respectively. The choice is made simultaneously and independently, i.e., in making his choice, P has no information about the choice of E and vice versa. Player E's payoff is assumed to equal
$$\frac{1}{2}\left(\max_j\min_i|x_i - y_j| + \max_i\min_j|x_i - y_j|\right).$$
The game is zero-sum. We will prove the following simple theorem. Theorem 10 The optimal strategy for each partner of the detail P, $P_i \in P$, $i = 1, 2, \ldots, m$, is the choice of the point $\{0\}$ with probability 1 (strategy $F^*$), and the optimal strategy for the detail $E = \{E_1, \ldots, E_n\}$ is the choice of the points $-1$, $+1$ with equal probabilities and subsequent concentration of the entire detail E there (strategy $G^*$). The value of the game equals 1.
Proof: Suppose Player E chooses the strategy $G^*$, and Player P an arbitrary pure strategy $x = (x_1, \ldots, x_m)$. For definiteness, assume that $x_1 \le x_2 \le \ldots \le x_m$. Then
$$M(x, G^*) = \frac{1}{4}\left(\min_i|-1 - x_i| + \max_i|-1 - x_i|\right) + \frac{1}{4}\left(\min_i|1 - x_i| + \max_i|1 - x_i|\right) = \frac{1}{4}\left(|1 + x_1| + |1 + x_m| + |1 - x_m| + |1 - x_1|\right) = 1.$$
Suppose now Player P chooses the strategy $F^*$, and Player E an arbitrary pure strategy $y = (y_1, \ldots, y_n)$, $y_1 \le y_2 \le \ldots \le y_n$. Then
$$M(F^*, y) = \frac{1}{2}\left(\max_j|y_j| + \min_j|y_j|\right) \le 1.$$
Thus, the strategies $F^*$, $G^*$ are actually optimal, and the value of the game is equal to 1. ■
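Both halves of the proof of Theorem 10 can be checked numerically. The sketch assumes the payoff of Example 14 is $\frac12(\max_j\min_i|x_i - y_j| + \max_i\min_j|x_i - y_j|)$; function names are illustrative, and the sample configurations are arbitrary points of $[-1, 1]$.

```python
def payoff_E(xs, ys):
    """Player E's payoff in Example 14 for pure choices xs (pursuers) and ys (evaders)."""
    a = max(min(abs(x - y) for x in xs) for y in ys)  # max_j min_i |x_i - y_j|
    b = max(min(abs(x - y) for y in ys) for x in xs)  # max_i min_j |x_i - y_j|
    return (a + b) / 2

def against_G_star(xs, n):
    """Expected payoff when the whole detail E sits at -1 or at +1 with probability 1/2 each."""
    return (payoff_E(xs, [-1.0] * n) + payoff_E(xs, [1.0] * n)) / 2

def against_F_star(ys, m):
    """Payoff when every pursuer chooses the point 0."""
    return payoff_E([0.0] * m, ys)

print(against_G_star([-0.3, 0.1, 0.8], n=2), against_F_star([-0.9, 0.4], m=3))
```

Against $G^*$ every pursuing configuration yields (up to rounding) exactly 1, and against $F^*$ no evading configuration yields more than 1.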
Chapter 2
Definition of differential game of pursuit and existence theorem of equilibrium points
2.1
Nonformal description
Differential games of pursuit are an extension of the multistage games of pursuit discussed in Ch. 1 to the case where the number of steps in the game becomes infinite (a continuum) and the players have an opportunity to make decisions continuously. Therefore, such a formulation naturally suggests that the players' trajectories are solutions of systems of differential equations whose right-hand sides depend on parameters under the players' control. Let $x \in R^n$, $y \in R^n$, $u \in U \subset R^k$, $v \in V \subset R^l$, and let $f(x, u)$, $g(y, v)$ be n-dimensional vector functions given on $R^n \times U$, $R^n \times V$ respectively. Consider two systems of ordinary differential equations
$$\dot x = f(x, u), \qquad (2.1.1)$$
$$\dot y = g(y, v) \qquad (2.1.2)$$
with the initial conditions $x_0$, $y_0$. Player P (E) starts the motion from the point (position) $x_0$ ($y_0$) and moves in the phase space $R^n$, choosing at each instant a value of the parameter $u \in U$ ($v \in V$) in accordance with his objectives and the information accessible in each current state.
The most readily describable is the complete information case. In the differential game this means that the phase states $x(t)$, $y(t)$ are known to the players at each time instant t when the parameters $u \in U$, $v \in V$ are chosen. Sometimes, at each current instant t, one of the players, say, Player P, requires additional information about the value of the parameter $v \in V$ chosen by Player E at the same instant. In such a situation, Player E is said to be discriminated, and the game itself is called the game with discrimination of Player E. This definition of complete information suggests an analogy with the multistage games of pursuit with complete information from Ch. 1. The parameters $u \in U$, $v \in V$ are called the controls of Players P and E, respectively. Let the motions $x(t)$, $y(t)$, $x(0) = x_0$, $y(0) = y_0$, be realized. The trajectory $x(t)$ ($y(t)$) is called the motion trajectory of Player P (E). The players' objectives in the game are determined with the help of the payoff, which depends on the realized trajectories $x(t)$, $y(t)$. In some cases the game is assumed to have a prescribed duration T.
Let $x(T)$, $y(T)$ be the phase states of the Players P and E at the moment T of termination of the game. The payoff of Player E is then assumed to be equal to $M(x(T), y(T))$, where $M(x, y)$ is a continuous function given on $R^n \times R^n$. In the special case when
$$M(x(T), y(T)) = \rho(x(T), y(T)), \qquad (2.1.3)$$
where $\rho(x(T), y(T)) = \sqrt{\sum_{i=1}^{n}(x_i(T) - y_i(T))^2}$ is the Euclidean distance between the points $x(T)$, $y(T)$, the game describes the process of pursuit in which Player E aims to recede from Player P to a maximum distance by the game termination instant. In all cases, the game is assumed to be zero-sum, i.e. the payoff of Player P is equal to $-M(x(T), y(T))$. If condition (2.1.3) is satisfied, then Player P aims to ensure a maximum approach to Player E by the game termination instant T.
The payoff thus defined depends only on the final positions of the players: a player, say Player P, is assessed only by the result he has obtained by the game termination instant T. Therefore, also quite reasonable is the problem formulation where Player E's payoff is defined as the minimum distance between the players during the game:
$$\min_{0 \le t \le T}\rho(x(t), y(t)). \qquad (2.1.4)$$
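For trajectories known only at sampled instants, the payoffs (2.1.3) and (2.1.4) can be evaluated as sketched below. The sampling grid, the helper name and the sample data are assumptions for illustration; note that the grid minimum can only overestimate the true minimum of $\rho(x(t), y(t))$ over $[0, T]$.

```python
import math

def terminal_and_minimal_distance(xs, ys):
    """Player E's payoffs (2.1.3) and (2.1.4) on trajectories sampled at common instants.

    xs[k], ys[k] are the positions x(t_k), y(t_k) on a grid 0 = t_0 < ... < t_K = T.
    """
    d = [math.dist(x, y) for x, y in zip(xs, ys)]   # rho(x(t_k), y(t_k))
    return d[-1], min(d)

# P runs along the axis while E first approaches and then recedes: the two payoffs differ.
xs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ys = [(3.0, 0.0), (2.5, 0.0), (4.0, 0.0)]
print(terminal_and_minimal_distance(xs, ys))
```

The example shows why the two formulations are genuinely different: the players come within distance 1.5 during the game but end 2.0 apart.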
Sometimes a limitation on the game duration is not essential, and the game proceeds until the Players P and E arrive at a particular result. Let an m-dimensional surface S, called terminal, be given in $R^{2n}$. Set
$$t_n = \min\{t : (x(t), y(t)) \in S\},$$
i.e. $t_n$ is the first instant at which the point $(x(t), y(t))$ falls on S. If $(x(t), y(t)) \notin S$ for all $t \ge 0$, then $t_n$ is assumed to be equal to $+\infty$. With the trajectories $x(t)$, $y(t)$ realized, the payoff of Player E is assumed to be equal to $t_n$ (Player P's payoff is equal to $-t_n$). If S is the sphere of radius $\ell > 0$,
$$\sum_{i=1}^{n}(x_i - y_i)^2 = \ell^2,$$
then we are dealing with the problem of pursuit in which Player P aims to ensure the fastest approach to Player E to a distance $\ell > 0$. If $\ell = 0$, then capture means coincidence of the phase coordinates of the Players P and E. Player E seeks to delay the capture instant. Such games of pursuit are called response games of pursuit (or time-optimal games of pursuit). At times, it is essential to define the set of initial states of the Players P and E from which Player P can ensure the capture of Player E at distance $\ell$ ($\ell$-capture), and the set of initial states from which Player E can ensure that $\ell$-capture does not take place in a finite time. The first set is called the capture zone (CZ), and the second the escape zone (EZ). Evidently, these zones do not intersect. However, of importance is the question whether the closure of the union of the capture and escape zones covers the entire phase space. Although this issue will be addressed in what follows, we now note that for an adequate description of the process it suffices to define the payoff as follows. Suppose $t_n < \infty$ (see (2.1.4)). Then the payoff of Player E is assumed to be $-1$. If, however, $t_n = \infty$, then his payoff is assumed to be $+1$ (Player P's payoff is assumed to be equal to Player E's payoff with the opposite sign). The games of pursuit with such a payoff are called pursuit games of kind.
Phase constraints. If we require additionally that during the game the point $(x, y)$ does not leave a set $S \subset R^{2n}$, then we have a phase-constrained differential game of pursuit. A special case of such a game is the "Life-line" game.
The "life-line" game is the zero-sum two-person game of kind in which the payoff of Player E is assumed to be $+1$ if he succeeds in reaching the boundary of the set S (the "life-line") before he has been captured by Player P. Thus, the aim of Player E is to reach the boundary of the set S prior to his rendezvous with Player P (i.e. before Player P has approached him to a distance $\ell$, $\ell > 0$). The aim of Player P, however, is to approach Player E to the distance $\ell$ while the latter is in the set S. It is assumed that Player P cannot abandon the set S.
Example 1 (simple motion). The game takes place on the plane. The motion of the Players P and E is described by the system of differential equations
$$\dot x_1 = u_1, \quad \dot x_2 = u_2, \quad u_1^2 + u_2^2 \le \alpha^2,$$
$$\dot y_1 = v_1, \quad \dot y_2 = v_2, \quad v_1^2 + v_2^2 \le \beta^2, \quad \alpha \ge \beta. \qquad (2.1.5)$$
Physically, equations (2.1.5) mean that the Players P and E move in the plane at limited velocities, the maximum speeds $\alpha$ and $\beta$ being constant, and the speed of Player E does not exceed that of Player P. By choosing at each time instant a control $u = (u_1, u_2)$ constrained by $u_1^2 + u_2^2 \le \alpha^2$ (the set U), Player P can change the velocity direction (the direction of the velocity vector). Similarly, by choosing at each time instant a control $v = (v_1, v_2)$ constrained by $v_1^2 + v_2^2 \le \beta^2$ (the set V), Player E can also change the velocity direction at each time instant. Evidently, if $\alpha > \beta$, then the capture zone (CZ) coincides with the entire space, i.e. Player P can always ensure the capture of Player E at any distance $\ell$. To this end, it suffices to move at the maximum speed $\alpha$ and to direct the velocity vector at each time instant t towards the pursued point $y(t)$, i.e. to carry out pursuit along the pursuit line.
If $\alpha < \beta$, then the escape zone (EZ) coincides with the entire space of the game except the points $(x, y)$ for which $\rho(x, y) \le \ell$. Indeed, if at the initial instant $\rho(x_0, y_0) > \ell$, then Player E can always avoid capture by receding from Player P at the maximum speed $\beta$ along the straight line joining the starting positions $x_0$, $y_0$.
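Both cases of Example 1 can be illustrated by a crude simulation. The sketch below (explicit Euler scheme; the step `dt`, the horizon `t_max` and the function name are arbitrary choices) lets Player P pursue along the pursuit line at speed $\alpha$ while Player E recedes at speed $\beta$ along the fixed direction from $x_0$ to $y_0$, i.e. the open-loop escape described above. In this particular configuration the motion stays on one line, so for $\alpha > \beta$ the distance shrinks at the rate $\alpha - \beta$ and $\ell$-capture occurs at about $(\rho(x_0, y_0) - \ell)/(\alpha - \beta)$; for $\alpha < \beta$ no capture occurs.

```python
import math

def simulate_simple_motion(x0, y0, alpha, beta, ell, dt=1e-3, t_max=100.0):
    """Euler simulation of (2.1.5): P pursues along the line of sight at speed alpha,
    E recedes at speed beta along the fixed direction from x0 to y0 (open-loop).
    Returns the approximate capture instant, or math.inf if no capture before t_max."""
    x, y = list(x0), list(y0)
    d0 = math.dist(x0, y0)
    e = [(y0[0] - x0[0]) / d0, (y0[1] - x0[1]) / d0]   # E's fixed unit direction
    t = 0.0
    while t < t_max:
        d = math.dist(x, y)
        if d <= ell:
            return t                                   # approximate capture instant
        u = [alpha * (y[0] - x[0]) / d, alpha * (y[1] - x[1]) / d]  # |u| = alpha, towards y(t)
        v = [beta * e[0], beta * e[1]]                               # |v| = beta
        x = [x[0] + dt * u[0], x[1] + dt * u[1]]
        y = [y[0] + dt * v[0], y[1] + dt * v[1]]
        t += dt
    return math.inf                                    # no capture within the horizon

print(simulate_simple_motion((0.0, 0.0), (3.0, 4.0), alpha=2.0, beta=1.0, ell=0.5))
print(simulate_simple_motion((0.0, 0.0), (3.0, 4.0), alpha=1.0, beta=2.0, ell=0.5, t_max=10.0))
```

Note that E's control here uses only the initial positions, while P's control needs the current positions at every step.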
A remarkable fact established here will be encountered in what follows. To form a control ensuring capture avoidance, Player E needs to know only the initial states, while in the case $\alpha > \beta$, to form a control ensuring the capture of Player E, Player P needs information on his own and the opponent's state at each current time instant.
Example 2. The Players P and E are material points with unit masses moving on the plane under the action of modulus-constrained and frictional forces. The equations of motion are of the form
3
= Ui - k x , P
3
x
4
t
= z ,
2
= i
3
= u 2
4
, kx, P
V\ = !/3, h
=
yn,
4
u + u\ < 2
a, 2
(2.1.6)
G a m e of pursuit
in normal
form
53
h = v - k y ,v\ x
E
3
+ vl <0 , 2
V« = «a — *E2/4
where $(x_1, x_2)$, $(y_1, y_2)$ are geometric coordinates, and $(x_3, x_4)$, $(y_3, y_4)$ are, respectively, the momenta of the points P and E, $k_P$ and $k_E$ are the friction coefficients, and $\alpha$ and $\beta$ are the maximum forces applicable to the material points P and E. The motion starts from the states $x_i(0) = x_i^0$, $y_i(0) = y_i^0$, $i = 1, 2, 3, 4$. Here, by the state is meant the position of the Players P and E in the momentum-coordinate space rather than their geometric position. The sets U, V are the disks
$$U = \{u = (u_1, u_2) : u_1^2 + u_2^2 \le \alpha^2\}, \qquad V = \{v = (v_1, v_2) : v_1^2 + v_2^2 \le \beta^2\}.$$
The implication here is that the Players P and E may choose the direction of the applied forces at each time instant. However, the maximum values of these forces are restricted by the constants $\alpha$ and $\beta$. In this formulation, as shown below, the condition $\alpha > \beta$ (power superiority) is insufficient for Player P to complete the game.
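A one-line computation hints at why power superiority alone need not help in Example 2: under a constant force of magnitude F in a fixed direction, the momentum obeys $\dot p = F - kp$ and tends to $F/k$. The sketch below (explicit Euler; the step, the horizon and the coefficients $k_P = 4$, $k_E = 1$ are illustrative assumptions) shows P with $\alpha = 2 > \beta = 1$ reaching a limit speed of only 0.5 against E's 1.0. It illustrates the phenomenon; it is not the argument the text refers to.

```python
import math

def euler_step(state, force, k, dt):
    """One explicit Euler step of (2.1.6) for one player; state = (q1, q2, p1, p2)."""
    q1, q2, p1, p2 = state
    return (q1 + dt * p1, q2 + dt * p2,
            p1 + dt * (force[0] - k * p1), p2 + dt * (force[1] - k * p2))

def speed_after(force_max, k, t_end=50.0, dt=1e-3):
    """Speed reached from rest when the maximal force is applied in a fixed direction."""
    state = (0.0, 0.0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        state = euler_step(state, (force_max, 0.0), k, dt)
    return math.hypot(state[2], state[3])

# alpha = 2 > beta = 1, but with k_P = 4 and k_E = 1 the limit speeds are alpha/k_P = 0.5 and beta/k_E = 1.0.
print(speed_after(2.0, 4.0), speed_after(1.0, 1.0))
```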
2.2
Game of pursuit in normal form
In Section 2.1 we did not provide a method by which the Players P and E choose the controls $u \in U$, $v \in V$ during the game according to the incoming information. In other words, we gave no definition of the concept of strategy in the differential game. This concept may be defined by employing different approaches. We focus only on the intuitively evident game-theoretical properties that the concept should possess. As noted in Ch. 1, a strategy must describe the player's behavior in all information states in which he may be during the game. In the game with prescribed duration T, the information state of each player is determined by the phase vectors $x(t)$, $y(t)$ at the current instant t and by the time t elapsed from the start of the game. Therefore, it would be natural to consider the strategy of Player P (E) as a function $u(x, y, t)$ ($v(x, y, t)$) with values in the control set U (V). That is how the strategy is defined in [1]. Such strategies are called synthesizing. However, this method of defining a strategy suffers from some grave disadvantages. Indeed, let the Players P and E choose the strategies $u(x, y, t)$, $v(x, y, t)$. Then, to determine the players' motions and, consequently, the payoff (which is trajectory-dependent), we substitute the functions $u(x, y, t)$, $v(x, y, t)$ for the control parameters u, v in (2.1.1), (2.1.2) and integrate them
on the time interval $[0, T]$ under the initial conditions $x_0$, $y_0$. Thus we get the following system of ordinary differential equations:
$$\dot x = f(x, u(x, y, t)), \qquad \dot y = g(y, v(x, y, t)). \qquad (2.2.1)$$
For the existence and uniqueness of a solution to the system (2.2.1) it is essential that some conditions be imposed on the functions $f(x, u)$, $g(y, v)$ and the strategies $u(x, y, t)$, $v(x, y, t)$. The first group of conditions (for $f(x, u)$, $g(y, v)$) places no limitations on the strategic capabilities of the players; it refers to the problem statement and is justified by the physical nature of the process involved. The case is different with the constraints on the class of functions (strategies) $u(x, y, t)$, $v(x, y, t)$. Such constraints on the players' capabilities are inconsistent with the notions (adopted in game theory) of the players' freedom to choose a behavior and, in some cases, lead to a substantial impoverishment of the strategy sets. For example, if we restrict ourselves only to continuous functions $u(x, y, t)$, $v(x, y, t)$, then problems are encountered where there are no solutions in the class of continuous functions. The assumption of wider strategy spaces makes it impossible to ensure the existence of a unique solution to the system (2.2.1) on the interval $[0, T]$. At times, to overcome this difficulty, one considers the sets of strategies $u(x, y, t)$, $v(x, y, t)$ for which the system (2.2.1) has a unique solution that can be continued on the interval $[0, T]$. Aside from the nonconstructiveness of such a definition of the strategy sets, this approach is not adequately substantiated, since the set of all strategy pairs $u(x, y, t)$, $v(x, y, t)$ for which the system (2.2.1) has a unique solution proves to be nonrectangular. The latter circumstance, as shown below, leads to a transformation of the equilibrium point concept, making it game-theoretically meaningless.
Another approach, proposed by N. N. Krasovsky [10, 12], enables the players to make an ambiguous choice of controls in each information state. In this monograph we choose piecewise open-loop strategies as strategies in the differential game. The piecewise open-loop strategy $u(\cdot)$ for Player P is composed of a pair $\{\sigma, \alpha\}$, where $\sigma$ is some partitioning $0 = t'_0 < t'_1 < \dots < t'_k < \dots$ of the time interval $[0, \infty)$ by the points $t'_k$ without finite points of accumulation, and $\alpha$ is the mapping which places each point $t'_k$ and the phase states $x(t'_k)$, $y(t'_k)$ in correspondence with a measurable open-loop control $u(t)$ for $t \in [t'_k, t'_{k+1})$ (a measurable function $u(t)$). Similarly, the piecewise open-loop strategy $v(\cdot)$ for Player E is composed of a pair $\{\tau, \beta\}$, where $\tau$ is some partitioning $0 = t''_0 < t''_1 < \dots < t''_k < \dots$ of the time interval $[0, \infty)$ by the points $t''_k$ without finite points of accumulation, and $\beta$ is the mapping which places each point $t''_k$ and the positions $x(t''_k)$, $y(t''_k)$ in correspondence with a measurable open-loop control $v(t)$ on the interval $[t''_k, t''_{k+1})$ (a measurable function $v(t)$). Denote the set of all piecewise open-loop strategies for Player P by $\mathbf{P}$ and the set of all possible piecewise open-loop strategies for Player E by $\mathbf{E}$.

Let $u(t)$, $v(t)$ be a pair of measurable open-loop controls (measurable functions) for the Players P and E with values in the sets of controls $U$, $V$. Consider the system of ordinary differential equations
$$\dot x = f(x, u(t)), \qquad \dot y = g(y, v(t)) \tag{2.2.2}$$
on the time interval $[0, \infty)$ with the initial conditions $x(0) = x_0$, $y(0) = y_0$.
Condition A. For any pair of measurable open-loop controls $u(t)$, $v(t)$ and any initial conditions $x_0$, $y_0$, the system of differential equations (2.2.2) has a unique solution $x(t)$, $y(t)$ ($x(0) = x_0$, $y(0) = y_0$) which can be continued on the time interval $t \in [0, \infty)$.

Let $x_0$, $y_0$ be a pair of initial conditions for equations (2.2.2). The system
$$S = \{x_0, y_0; u(\cdot), v(\cdot)\},$$
where $u(\cdot) \in \mathbf{P}$, $v(\cdot) \in \mathbf{E}$, is called the situation in the differential game. Using Condition A, it can readily be shown that to each situation $S = \{x_0, y_0; u(\cdot), v(\cdot)\}$ there uniquely corresponds a pair of trajectories $x(t)$, $y(t)$ such that $x(0) = x_0$, $y(0) = y_0$ and
$$\dot x = f(x, u(\cdot)), \qquad \dot y = g(y, v(\cdot)). \tag{2.2.3}$$
Indeed, let $u(\cdot) = \{\sigma, \alpha\}$, $v(\cdot) = \{\tau, \beta\}$.
Moreover, let $0 = t_0 < t_1 < \dots < t_k < \dots$ be the partitioning of the interval $[0, \infty)$ which is the union of the partitionings $\sigma$, $\tau$. The solution of system (2.2.3) is constructed as follows. On each interval $[t_k, t_{k+1})$, $k = 0, 1, \dots$, the images of the mappings $\alpha$, $\beta$ are the measurable open-loop controls $u(t)$, $v(t)$; hence, by Condition A, the system of equations
$$\dot x = f(x(t), u(t)), \qquad \dot y = g(y(t), v(t)) \tag{2.2.4}$$
has a unique solution $x(t)$, $y(t)$ on the interval $[t_0, t_1)$ such that $x(0) = x_0$, $y(0) = y_0$. Taking $x(t_1)$, $y(t_1)$ as initial conditions, we construct a solution to (2.2.4) on the interval $[t_1, t_2)$, again using the fact that the images of the mappings $\alpha$, $\beta$ are measurable controls $u(t)$, $v(t)$. Obtaining $x(t_2)$, $y(t_2)$ and continuing the process on each interval $[t_k, t_{k+1})$, $k = 0, 1, \dots$, we find a unique solution $x(t)$, $y(t)$ with $x(0) = x_0$, $y(0) = y_0$.

Any trajectory $x(t)$ ($y(t)$) corresponding to $\{x_0, y_0; u(\cdot), v(\cdot)\}$ is called the trajectory of the pursuer P (the evader E).
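The interval-by-interval construction above can be mirrored numerically. The following is a minimal sketch, not the book's method: the names `play`, `alpha` and `beta` are hypothetical, `alpha` and `beta` stand for the mappings α, β and return an open-loop control as a Python function of t, both partitions are assumed to contain 0, and each interval of the merged partition is integrated with explicit Euler, which only approximates the exact solution guaranteed by Condition A.

```python
import numpy as np

def play(x0, y0, sigma, alpha, tau, beta, f, g, T, n_sub=200):
    """Approximate x(T), y(T) for the situation {x0, y0; u(.), v(.)}.

    u(.) = {sigma, alpha}, v(.) = {tau, beta}; alpha(t_k, x, y) and beta(t_k, x, y)
    return open-loop controls as functions of t. Both partitions must contain 0.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    sigma_pts, tau_pts = set(sigma), set(tau)
    # Merged partition t_0 < t_1 < ... (union of sigma and tau) restricted to [0, T].
    pts = sorted({t for t in sigma_pts | tau_pts if 0 <= t < T} | {T})
    u_ctrl = v_ctrl = None
    for a, b in zip(pts[:-1], pts[1:]):
        if a in sigma_pts:   # P re-plans only at points of his own partition sigma
            u_ctrl = alpha(a, x.copy(), y.copy())
        if a in tau_pts:     # E re-plans only at points of his own partition tau
            v_ctrl = beta(a, x.copy(), y.copy())
        h, t = (b - a) / n_sub, a
        for _ in range(n_sub):  # explicit Euler step on [t_k, t_{k+1})
            x, y = x + h * f(x, u_ctrl(t)), y + h * g(y, v_ctrl(t))
            t += h
    return x, y

# Hypothetical dynamics on a line: f(x, u) = u, g(y, v) = v.
f = lambda x, u: u
g = lambda y, v: v
alpha = lambda tk, x, y: (lambda t, d=np.sign(y - x): 2.0 * d)  # P aims at E, speed 2
beta = lambda tk, x, y: (lambda t: np.array([1.0]))            # E runs right, speed 1
xT, yT = play([0.0], [1.0], [0, 0.5], alpha, [0, 0.3, 0.6, 0.9], beta, f, g, T=1.0)
print(xT, yT)  # both close to 2.0
```

Here P re-aims every 0.5 time units and E keeps fleeing, so both reach the point 2 at $T = 1$. Because the controls are piecewise constant and $f$, $g$ do not depend on the state, Euler integration is exact up to rounding in this example.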
In what follows, we consider games with four types of payoff functions.

Terminal payoff. Given are a number $T > 0$ and a function $M(x, y)$ continuous in $(x, y)$. The payoff in the situation $S = \{x_0, y_0; u(\cdot), v(\cdot)\}$ is defined as follows:
$$K(x_0, y_0; u(\cdot), v(\cdot)) = M(x(T), y(T)),$$
where $x(T) = x(t)|_{t=T}$, $y(T) = y(t)|_{t=T}$ (here $x(t)$, $y(t)$ are the trajectories of Players P and E corresponding to the situation $S$). We are dealing with the problem of pursuit when the function $M(x, y)$ is the Euclidean distance between the points $x$ and $y$.

Maximum result. Let $M(x, y)$ be a continuous real function. In the situation $S = \{x_0, y_0; u(\cdot), v(\cdot)\}$, the payoff of Player E is assumed to be
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \min_{0 \le t \le T} M(x(t), y(t)).$$

Integral payoff. Some $m$-dimensional manifold $S$ and a continuous function $M(x, y) > 0$ are given in $R^n \times R^n$. Assume that in the situation $\{x_0, y_0; u(\cdot), v(\cdot)\}$, $t_n$ is the first instant at which the trajectory $(x(t), y(t))$ falls on $S$. Then
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \int_0^{t_n} M(x(t), y(t))\,dt$$
(if $t_n = \infty$, then $K = \infty$), where $x(t)$, $y(t)$ are the trajectories of the Players P and E, respectively, corresponding to the situation. In the case $M \equiv 1$, $K = t_n$, we face the problem of pursuit.

Qualitative payoff. The payoff function $K$ may take one of the three values $+1$, $0$, $-1$ depending on the position of $(x(t), y(t))$ in $R^n \times R^n$.

After defining the strategy sets for the Players P and E and the payoff function, we can define the differential game of pursuit as a game in normal form. In Ch. 1, the normal form is the triple $(X, Y, K)$, where $X$, $Y$ are the spaces of all possible strategies of the players and $K$ is the payoff function defined on $X \times Y$. In the present case, the payoff function is defined not only on the set of all possible strategy pairs but also on the set of all pairs of initial positions $x_0$, $y_0$. Therefore, to each pair $(x_0, y_0) \in R^n \times R^n$ there corresponds its own game in normal form, i.e. we actually define a family of games in normal form which depend on the parameters $(x_0, y_0) \in R^n \times R^n$. By the normal form of the differential game $\Gamma(x_0, y_0)$ given on the space of strategy pairs $\mathbf{P} \times \mathbf{E}$ we mean the expression
$$\Gamma(x_0, y_0) = \langle x_0, y_0; \mathbf{P}, \mathbf{E}, K(x_0, y_0; u(\cdot), v(\cdot)) \rangle,$$
where $K(x_0, y_0; u(\cdot), v(\cdot))$ is the payoff function defined by any one of the four methods given above.
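On sampled trajectories, three of these payoffs can be approximated directly. A minimal sketch, assuming $M$ is the Euclidean distance for the terminal and maximum-result payoffs, and $M \equiv 1$ with the hypothetical target set $S = \{(x, y) : |x - y| \le l\}$ for the integral payoff, which then equals the capture time $t_n$. The function name `payoffs` is illustrative; sampling only approximates the continuous-time quantities. The qualitative payoff is left out because its rule is not specified here.

```python
import numpy as np

def payoffs(t, x, y, l):
    """Sampled approximations of the terminal, maximum-result and integral (M = 1) payoffs.

    t: increasing sample times with t[-1] = T; x, y: arrays of shape (len(t), n).
    The target set is taken to be S = {(x, y): |x - y| <= l}.
    """
    d = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float), axis=1)  # rho(x(t), y(t))
    terminal = d[-1]                 # M(x(T), y(T))
    max_result = d.min()             # min over 0 <= t <= T of M(x(t), y(t))
    hit = np.nonzero(d <= l)[0]      # samples lying in S
    capture = t[hit[0]] if hit.size else np.inf   # t_n; K = +inf if S is never reached
    return terminal, max_result, capture

t = np.linspace(0.0, 1.0, 11)
x = np.stack([2 * t, 0 * t], axis=1)   # pursuer moves right at speed 2
y = np.stack([1 + t, 0 * t], axis=1)   # evader starts at 1 and moves right at speed 1
print(payoffs(t, x, y, l=0.5))         # distance is 1 - t: terminal 0, minimum 0, capture at t = 0.5
```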
If the payoff function $K$ in the game $\Gamma$ is terminal, then $\Gamma$ is called the game with terminal payoff. If the function $K$ is defined in the second way, then the game is called the game for achievement of the maximum result. If the function $K$ has the integral form, then $\Gamma$ is called the integral payoff game. When the payoff function in the game $\Gamma$ is qualitative, $\Gamma$ is called the qualitative game (the game of kind)¹. Naturally, optimal strategies need not exist in the class of piecewise open-loop strategies (in view of the open structure of the class). However, we can show that, in a sufficiently large number of cases, for any $\varepsilon > 0$ there is an $\varepsilon$-equilibrium point (see §2, Ch. 1). Recall the definition of the $\varepsilon$-equilibrium point. Let $\varepsilon > 0$ be given. The situation $S_\varepsilon = \{x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)\}$ is called the $\varepsilon$-equilibrium point in the game $\Gamma(x_0, y_0)$ if for all $u(\cdot) \in \mathbf{P}$, $v(\cdot) \in \mathbf{E}$ the inequality
$$K(x_0, y_0; u_\varepsilon(\cdot), v(\cdot)) - \varepsilon \le K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)) \le K(x_0, y_0; u(\cdot), v_\varepsilon(\cdot)) + \varepsilon \tag{2.2.5}$$
holds.

The strategies $u_\varepsilon(\cdot)$, $v_\varepsilon(\cdot)$ defined in (2.2.5) are called $\varepsilon$-optimal strategies for the Players P and E. The following lemma is a restatement, for differential games, of the lemma in Ch. 1.

Lemma 1. Suppose that in the game $\Gamma(x_0, y_0)$ there exists an $\varepsilon$-equilibrium point for every $\varepsilon > 0$. Then the limit $\lim_{\varepsilon \to 0} K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot))$ exists.

Assume that for any initial conditions $x_0$, $y_0$ from the region $X \subset R^n \times R^n$ and any $\varepsilon > 0$ there is an $\varepsilon$-equilibrium point. We offer the following definition.

Definition 1. The function $V(x, y)$ defined at each point $(x_0, y_0) \in R^n \times R^n$ as
$$V(x_0, y_0) = \lim_{\varepsilon \to 0} K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot))$$
is called the value of the game $\Gamma(x_0, y_0)$, $\mathrm{Val}\,\Gamma(x_0, y_0) = V(x_0, y_0)$, from the initial conditions $(x_0, y_0) \in X$.

The existence of the $\varepsilon$-equilibrium point in the game $\Gamma(x_0, y_0)$ for any $\varepsilon > 0$ is equivalent to the equality
$$\sup_{v(\cdot) \in \mathbf{E}} \inf_{u(\cdot) \in \mathbf{P}} K(x_0, y_0; u(\cdot), v(\cdot)) = \inf_{u(\cdot) \in \mathbf{P}} \sup_{v(\cdot) \in \mathbf{E}} K(x_0, y_0; u(\cdot), v(\cdot)).$$

¹Such games are treated in some detail in Ch. 5, which deals with the "Life line" games where the function $K$ is specified.
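For a game with finitely many strategies, the sup-inf/inf-sup comparison and inequality (2.2.5) can be checked directly. The sketch below is an illustration on a payoff matrix and is not part of the original text. Rows are hypothetical strategies of the minimizing Player P, columns are hypothetical strategies of the maximizing Player E, and each entry is Player E's payoff $K$. The function names are assumptions.

```python
import numpy as np

def lower_upper(K):
    """Return (max_j min_i K[i, j], min_i max_j K[i, j]).

    Rows i: strategies of P (minimizer); columns j: strategies of E (maximizer).
    """
    K = np.asarray(K, dtype=float)
    lower = K.min(axis=0).max()   # sup over v of inf over u
    upper = K.max(axis=1).min()   # inf over u of sup over v
    return lower, upper

def is_eps_equilibrium(K, i_eps, j_eps, eps):
    """Finite analogue of (2.2.5):
    K[i_eps, j] - eps <= K[i_eps, j_eps] <= K[i, j_eps] + eps for every row i and column j."""
    K = np.asarray(K, dtype=float)
    k = K[i_eps, j_eps]
    return bool(K[i_eps, :].max() - eps <= k <= K[:, j_eps].min() + eps)

K = [[3, 1],
     [2, 4]]
print(lower_upper(K))                     # lower 2.0 < upper 3.0: no saddle point
print(is_eps_equilibrium(K, 0, 0, 1.0))   # True: (0, 0) is a 1-equilibrium
print(is_eps_equilibrium(K, 0, 0, 0.5))   # False
```

An $\varepsilon$-equilibrium forces upper minus lower to be at most $2\varepsilon$. So when the lower value is strictly below the upper value, no pair is an $\varepsilon$-equilibrium once $\varepsilon$ is smaller than half the gap. This is the finite counterpart of the equivalence stated above.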
If in the game $\Gamma(x_0, y_0)$ there are $\varepsilon$-optimal strategies for the Players P and E for any $\varepsilon > 0$, then the game $\Gamma(x_0, y_0)$ is said to have a solution.

Definition 2. Let $u^*(\cdot)$, $v^*(\cdot)$ be a pair of strategies such that
$$K(x_0, y_0; u(\cdot), v^*(\cdot)) \ge K(x_0, y_0; u^*(\cdot), v^*(\cdot)) \ge K(x_0, y_0; u^*(\cdot), v(\cdot)) \tag{2.2.7}$$
for all $u(\cdot) \in \mathbf{P}$ and $v(\cdot) \in \mathbf{E}$. The situation $\{x_0, y_0; u^*(\cdot), v^*(\cdot)\}$ is then called the equilibrium point in the game $\Gamma(x_0, y_0)$. The strategies $u^*(\cdot) \in \mathbf{P}$ and $v^*(\cdot) \in \mathbf{E}$ from (2.2.7) are called optimal strategies for the Players P and E. The existence of the equilibrium point in the game $\Gamma(x_0, y_0)$ is equivalent to the equality
$$\max_{v(\cdot) \in \mathbf{E}} \min_{u(\cdot) \in \mathbf{P}} K(x_0, y_0; u(\cdot), v(\cdot)) = \min_{u(\cdot) \in \mathbf{P}} \max_{v(\cdot) \in \mathbf{E}} K(x_0, y_0; u(\cdot), v(\cdot)).$$
Evidently, if there is an equilibrium point, then it is also an $\varepsilon$-equilibrium point for any $\varepsilon > 0$, i.e. the function $V(x_0, y_0)$ here simply coincides with $K(x_0, y_0; u^*(\cdot), v^*(\cdot))$ (see §2, Ch. 1).

Consider the synthesizing strategies.

Definition 3. The situation $u^*(x, y, t)$, $v^*(x, y, t)$ is called the equilibrium point in the game in synthesizing strategies if the inequality
$$K(x_0, y_0; u(x, y, t), v^*(x, y, t)) \ge K(x_0, y_0; u^*(x, y, t), v^*(x, y, t)) \ge K(x_0, y_0; u^*(x, y, t), v(x, y, t)) \tag{2.2.8}$$
holds.

Note that an equilibrium point in the ordinary sense in the class of functions $u(x, y, t)$, $v(x, y, t)$ is not possible because of the nonrectangularity of the situation space (the space of admissible pairs $u(x, y, t)$, $v(x, y, t)$ for which the system (2.2.1) has a unique solution). In fact, the situation $u^*(x, y, t)$, $v^*(x, y, t)$ is the equilibrium point in the game $\Gamma$ in the class of functions $u(x, y, t)$, $v(x, y, t)$ if the inequality (2.2.8) holds for all situations $(u(x, y, t), v^*(x, y, t))$ and $(u^*(x, y, t), v(x, y, t))$ for which the system (2.2.1) has a unique solution on $[0, \infty)$ from the initial states $x_0$, $y_0$. The strategies $u^*(x, y, t)$, $v^*(x, y, t)$ are called the optimal strategies for the Players P and E. The function $V(x_0, y_0) = K(x_0, y_0; u^*(x, y, t), v^*(x, y, t))$ is called the value function of the game.

A distinction is made between the concepts of the equilibrium point in piecewise open-loop strategies and in synthesizing strategies. In synthesizing strategies, we cannot require the inequality (2.2.7) to be satisfied for all pairs $(u(x, y, t), v^*(x, y, t))$ and $(u^*(x, y, t), v(x, y, t))$, since for some $u$, $v$ the pairs $(u^*, v)$, $(u, v^*)$ may not be admissible (in the corresponding situation the system (2.2.1) may, generally, have no solution, or may have no unique solution). Games of this type were studied in the theory of games over finite strategy sets and were called games with prohibited situations (see [3]). In what follows, we consider the classes of piecewise open-loop strategies unless otherwise specified.
2.3 Existence of the ε-equilibrium point in differential games with prescribed duration
In this section we prove the existence of the $\varepsilon$-equilibrium point in differential games of pursuit with prescribed duration in the class of piecewise open-loop strategies described in §2. In what follows, we consider the case where Player E's payoff is the distance $\rho(x(T), y(T))$ at the last instant $T$ of the game. We start with the game with terminal payoff. Let the dynamics of the game be given by the following differential equations:
for P
$$\dot x = f(x, u), \tag{2.3.1}$$
for E
$$\dot y = g(y, v). \tag{2.3.2}$$
Here $x, y \in R^n$, $u \in U$, $v \in V$, where $U$, $V$ are compact sets of the Euclidean spaces $R^k$ and $R^l$, respectively, and $t \in [0, \infty)$ (see [16, 17, 23, 48, 52-56, 63-65]).

Definition 4. Denote by $P(x_0, t_0, t)$ the set of all points $x \in R^n$ for which there is a measurable open-loop control $u(t)$ transferring the point $x_0$ to the point $x$ on the time interval $[t_0, t]$ (here the phase point is assumed to be in the position $x_0$ at the instant $t_0$). The set $P(x_0, t_0, t)$ is the reachability set for Player P. The reachability set $E(y_0, t_0, t)$ for Player E is defined likewise.
Assume that the functions $f$, $g$ are such that the reachability sets $P(x_0, t_0, t)$, $E(y_0, t_0, t)$ for the Players P and E, respectively, satisfy the following conditions:

1. $P(x_0, t_0, t)$ is defined for any $x_0 \in R^n$, $t_0, t \in [0, \infty)$ ($t_0 \le t$) and is a compact set of the space $R^n$;
2. the function $P(x_0, t_0, t)$ is continuous in all of its arguments in the Hausdorff metric, i.e. for any $\varepsilon > 0$, $x_0 \in R^n$, $t_0, t \in [0, \infty)$ ($t_0 \le t$) there is $\delta > 0$ such that if $|t_0 - t_0'| < \delta$, $|t - t'| < \delta$, $\rho(x_0, x_0') < \delta$, then $\rho^*(P(x_0, t_0, t), P(x_0', t_0', t')) < \varepsilon$.

The same is assumed for $E(y_0, t_0, t)$. Recall that the Hausdorff metric $\rho^*$ in the space of compact subsets of $R^n$ is given as follows:
$$\rho^*(A, B) = \max\big(\rho'(A, B), \rho'(B, A)\big), \qquad \rho'(A, B) = \max_{a \in A} \rho(a, B), \qquad \rho(a, B) = \min_{b \in B} \rho(a, b),$$
where $\rho$ is the standard metric in $R^n$.
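For finite point sets, for instance sampled approximations of compact reachability sets, the formulas above can be evaluated directly. A minimal sketch, assuming the Euclidean metric for $\rho$; the function names are illustrative only.

```python
import numpy as np

def rho_prime(A, B):
    """rho'(A, B) = max over a in A of rho(a, B), with rho(a, B) = min over b in B of |a - b|."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # D[i, k] = |a_i - b_k|
    return D.min(axis=1).max()

def hausdorff(A, B):
    """Hausdorff distance rho*(A, B) = max(rho'(A, B), rho'(B, A))."""
    return max(rho_prime(A, B), rho_prime(B, A))

A = [[0.0, 0.0], [1.0, 0.0]]
B = [[0.0, 0.0]]
print(rho_prime(A, B), rho_prime(B, A), hausdorff(A, B))   # 1.0 0.0 1.0
```

The example shows why both one-sided deviations are needed: $\rho'(B, A) = 0$ because $B \subset A$, while $\rho'(A, B) = 1$.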
The existence theorem will be proved for the game of pursuit with prescribed duration $\Gamma(x_0, y_0, T)$, where $x_0, y_0 \in R^n$ are the initial positions of the Players P and E, respectively, and $T$ is the duration of the game. The game proceeds as follows. At the time instant $t = 0$ the Players P and E start moving from the positions $x_0$, $y_0$ in compliance with the chosen piecewise open-loop strategies. The game terminates at the time instant $t = T$, and Player E receives from Player P the payoff equal to $\rho(x(T), y(T))$ (see §1, Ch. 2).

At each time instant $t \in [0, T]$ of the game $\Gamma(x_0, y_0, T)$, each player knows the time instant $t$, his own position, and the opponent's position. Denote by $\mathcal{P}(x_0, t_0, t)$ ($\mathcal{E}(y_0, t_0, t)$) the set of trajectories of system (2.3.1) ((2.3.2)) emanating from the point $x_0$ ($y_0$) and defined on the interval $[t_0, t]$.
Let $\delta = \delta_n = T/2^n$ and introduce the games $\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2, 3$, that are auxiliary with respect to the game $\Gamma(x_0, y_0, T)$.
The game $\Gamma_1^{\delta}(x_0, y_0, T)$ proceeds as follows. At the first step, Player E, being in the position $y_0$, chooses a trajectory $v_1^{\delta}$ from the set $\mathcal{E}(y_0, 0, \delta)$, and Player P, being in the position $x_0$ and knowing Player E's choice at this step, in turn chooses a trajectory $u_1^{\delta}$ from the set $\mathcal{P}(x_0, 0, \delta)$. At the $k$-th step ($k = 2, \dots, 2^n$), Player E knows the position of Player P (the point $x_{k-1}$) and his own position (the point $y_{k-1}$) and chooses a trajectory $v_k^{\delta}$ from the set $\mathcal{E}(y_{k-1}, (k-1)\delta, k\delta)$; Player P additionally knows Player E's choice at this step and chooses a trajectory $u_k^{\delta}$ from the set $\mathcal{P}(x_{k-1}, (k-1)\delta, k\delta)$. The game terminates at the $2^n$-th step, and Player E receives from Player P the payoff equal to $\rho(x(T), y(T))$.
The game $\Gamma_2^{\delta}(x_0, y_0, T)$ differs from the game $\Gamma_1^{\delta}(x_0, y_0, T)$ in that, at the $k$-th step ($k = 1, \dots, 2^n$), Player P, knowing the positions $x_{k-1}$, $y_{k-1}$ at this instant, chooses a trajectory $u_k^{\delta}$ from the set $\mathcal{P}(x_{k-1}, (k-1)\delta, k\delta)$ first; Player E then makes his choice with the additional knowledge of the choice made by Player P at this step.
The game $\Gamma_3^{\delta}(x_0, y_0, T)$ differs from the game $\Gamma_2^{\delta}(x_0, y_0, T)$ in that, at the $2^n$-th step, Player P chooses a trajectory $u_{2^n}^{\delta}$ from the set $\mathcal{P}(x_{2^n - 1}, (2^n - 1)\delta, 2^n\delta)$, whereupon Player E receives the payoff equal to $\rho(x(T), y(T - \delta))$.
Note that the games $\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2$, are special cases of the games from Example 2, Ch. 1. In fact, it suffices to take $U(x) = P(x, 0, \delta)$ as the family of sets $U_x$, $x \in R^n$, and $V(y) = E(y, 0, \delta)$ as the family of sets $V_y$, i.e. the families of reachability sets from the initial states $x$ and $y$ in the time $\delta = T/2^n$.

Denote the values of the games $\Gamma_i^{\delta}(x_0, y_0, T)$ by $\mathrm{Val}\,\Gamma_i^{\delta}(x_0, y_0, T)$.
Lemma 2. Let $\{A(r)\}$, $\{B(s)\}$ be families of compact sets in $R^n$ depending continuously on the parameters $r, s \in R^n$ in the Hausdorff metric. Then the functions
$$F(r, s) = \max_{x \in A(r)} \min_{y \in B(s)} f(x, y), \qquad G(r, s) = \min_{y \in B(s)} \max_{x \in A(r)} f(x, y)$$
are continuous on the product $R^n \times R^n$ if $f(x, y)$ is continuous on $R^n \times R^n$ (this lemma is similar to Lemma 7, Ch. 1).

Proof: Let us prove the lemma for the function $F(r, s)$. The proof for the function $G(r, s)$ is similar. The function $g(x) = \min_{y \in B(s)} f(x, y)$ is the lower envelope of the family of continuous functions $f(\cdot, y)$, $y \in B(s)$, and is continuous on $R^n$. Let a number $\varepsilon > 0$ be given. By the continuity of the function $g(x)$ on the space $R^n$ there is $\delta > 0$ such that if a point $x'$ belonging to the set $A(r) \triangle A(r')$² is such that there is a point $x'' \in A(r) \cap A(r')$ with $\rho(x'', x') < \delta$, then the inequality $|g(x'') - g(x')| < \varepsilon$ is valid. The definition of the Hausdorff metric suggests that, to satisfy this condition, it is sufficient that the inequality $\rho^*(A(r), A(r')) < \delta$ be satisfied. Because the family of sets $A(r)$ is continuous in the parameter $r$ in the Hausdorff metric, for any $\delta > 0$ it is possible to select $\eta > 0$ such that for $\rho(r', r) < \eta$ the inequality $\rho^*(A(r), A(r')) < \delta$ is satisfied. Consequently, for any $\varepsilon > 0$ it is possible to find $\eta > 0$ such that for $r'$, $r$ with $\rho(r', r) < \eta$ the following inequality holds:
$$\Big| \max_{x \in A(r)} \min_{y \in B(s)} f(x, y) - \max_{x \in A(r')} \min_{y \in B(s)} f(x, y) \Big| < \varepsilon.$$
The continuity of the function $F$ in $s \in R^n$ is proved in much the same way.
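Lemma 3. In the games $\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2, 3$, there exist equilibrium points for every $x_0$, $y_0$, $T < \infty$; the functions $\mathrm{Val}\,\Gamma_i^{\delta}(x_0, y_0, T)$ are continuous in $x_0, y_0 \in R^n$ and $T$; and for every $n > 0$ the following inequality holds:
$$\mathrm{Val}\,\Gamma_1^{\delta}(x_0, y_0, T) \le \mathrm{Val}\,\Gamma_2^{\delta}(x_0, y_0, T), \qquad T = 2^n \delta. \tag{2.3.3}$$

Proof: The existence of the equilibrium point in the games $\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2, 3$, follows from Theorem 7, Ch. 1, and its corollaries. To prove the inequality (2.3.3), we write the functional equations³ for the functions $\mathrm{Val}\,\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2$:
$$\mathrm{Val}\,\Gamma_1^{\delta}(x_0, y_0, T) = \max_{v_1^{\delta} \in \mathcal{E}(y_0, 0, \delta)} \min_{u_1^{\delta} \in \mathcal{P}(x_0, 0, \delta)} \mathrm{Val}\,\Gamma_1^{\delta}\big(u_1^{\delta}(\delta), v_1^{\delta}(\delta), T - \delta\big),$$
$$\mathrm{Val}\,\Gamma_1^{\delta}(x_{2^n-1}, y_{2^n-1}, \delta) = \max_{v \in \mathcal{E}(y_{2^n-1}, T - \delta, T)} \min_{u \in \mathcal{P}(x_{2^n-1}, T - \delta, T)} \rho(u(T), v(T)), \tag{2.3.4}$$
$$\mathrm{Val}\,\Gamma_2^{\delta}(x_0, y_0, T) = \min_{u_1^{\delta} \in \mathcal{P}(x_0, 0, \delta)} \max_{v_1^{\delta} \in \mathcal{E}(y_0, 0, \delta)} \mathrm{Val}\,\Gamma_2^{\delta}\big(u_1^{\delta}(\delta), v_1^{\delta}(\delta), T - \delta\big),$$
$$\mathrm{Val}\,\Gamma_2^{\delta}(x_{2^n-1}, y_{2^n-1}, \delta) = \min_{u \in \mathcal{P}(x_{2^n-1}, T - \delta, T)} \max_{v \in \mathcal{E}(y_{2^n-1}, T - \delta, T)} \rho(u(T), v(T)). \tag{2.3.5}$$
The inequality (2.3.3) is obtained by applying the inequalities (1.2.4) step by step to the functional equations (2.3.4) and (2.3.5). The continuity of the functions $\mathrm{Val}\,\Gamma_i^{\delta}(x_0, y_0, T)$, $i = 1, 2, 3$, is proved by applying Lemma 2 step by step to the functional equations (2.3.4), (2.3.5). (The sets $\mathcal{E}$, $\mathcal{P}$ in (2.3.4), (2.3.5) may be replaced by the corresponding reachability sets, which makes Lemma 2 applicable.) ∎

²$A \triangle B = (A \setminus B) \cup (B \setminus A)$.

³To be compared with the equations (1.5.7'), (1.5.8').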
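The effect of the order of moves in (2.3.4) and (2.3.5) can be seen on a small discrete model. The sketch below is a hypothetical illustration, not the games $\Gamma_i^{\delta}$ themselves. Positions live on the integer line. In one step Player P can shift by at most `p` cells and Player E by at most `q` cells, standing in for the reachability sets $P(x, 0, \delta)$, $E(y, 0, \delta)$. The terminal payoff is $|y - x|$. Since only the relative position $z = y - x$ matters in this model, the recursion runs on $z$. The max-min recursion mirrors (2.3.4), the min-max recursion mirrors (2.3.5), and the results reproduce (2.3.3) for this model.

```python
from functools import lru_cache

def make_values(p, q):
    """Return val1, val2 for a discrete toy version of Gamma_1^delta and Gamma_2^delta.

    z = y - x on the integers; per step P changes x by i (|i| <= p), E changes y by j (|j| <= q).
    The terminal payoff |z| plays the role of rho(x(T), y(T)).
    """
    P_moves = range(-p, p + 1)
    E_moves = range(-q, q + 1)

    @lru_cache(maxsize=None)
    def val1(z, steps):
        # E chooses first, P answers knowing E's choice: max over j of min over i, as in (2.3.4)
        if steps == 0:
            return abs(z)
        return max(min(val1(z + j - i, steps - 1) for i in P_moves) for j in E_moves)

    @lru_cache(maxsize=None)
    def val2(z, steps):
        # P chooses first, E answers knowing P's choice: min over i of max over j, as in (2.3.5)
        if steps == 0:
            return abs(z)
        return min(max(val2(z + j - i, steps - 1) for j in E_moves) for i in P_moves)

    return val1, val2

val1, val2 = make_values(p=1, q=1)
print(val1(0, 3), val2(0, 3))   # 0 1
```

With equal speed bounds, Player P answering E's move in the max-min game can keep $z = 0$, while in the min-max game E, answering P, always keeps a distance of at least 1.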
Lemma 4. For any integer $n > 0$, the following inequalities hold:
$$\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) \le \mathrm{Val}\,\Gamma_1^{\delta_{n+1}}(x_0, y_0, T), \qquad \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma_2^{\delta_{n+1}}(x_0, y_0, T).$$

Proof: We show the validity of the first inequality. The second inequality is proved in the same manner. Suppose that Player E receives the guaranteed payoff $\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T)$ by choosing an optimal strategy in the game $\Gamma_1^{\delta_n}(x_0, y_0, T)$. Let us restrict the class of admissible strategies for Player E in the game $\Gamma_1^{\delta_{n+1}}(x_0, y_0, T)$ to the class of strategies in the game $\Gamma_1^{\delta_n}(x_0, y_0, T)$. It can readily be seen that he then ensures for himself the payoff $\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T)$, which proves the required inequality. ∎
Theorem 1. For any $x_0, y_0 \in R^n$, $T < \infty$ the following limiting equality holds:
$$\lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) = \lim_{n \to \infty} \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T).$$
Proof: Consider the games $\Gamma_2^{\delta_n}(x_0, y_0, T)$, $\Gamma_3^{\delta_n}(x_0, y_0, T)$. To each pair of strategies $u(\cdot)$, $v(\cdot)$ in the game $\Gamma_2^{\delta_n}(x_0, y_0, T)$ there corresponds a pair of strategies $u'(\cdot)$, $v'(\cdot)$ in the game $\Gamma_3^{\delta_n}(x_0, y_0, T)$ which are the truncations of the pair $u(\cdot)$, $v(\cdot)$ at the last step ($u(\cdot) = u'(\cdot)$). Denote the payoff functions in the games $\Gamma_2^{\delta_n}(x_0, y_0, T)$, $\Gamma_3^{\delta_n}(x_0, y_0, T)$ by $K_2(u(\cdot), v(\cdot))$ and $K_3(u'(\cdot), v'(\cdot))$, respectively. Since $y(T - \delta_n) \in E(y_0, 0, T - \delta_n)$ and $y(T) \in E(y(T - \delta_n), T - \delta_n, T)$, the triangle inequality gives
$$|K_2(u(\cdot), v(\cdot)) - K_3(u'(\cdot), v'(\cdot))| \le \max_{y \in E(y_0, 0, T - \delta_n)} \max_{y' \in E(y, T - \delta_n, T)} \rho(y, y').$$
This means that the inequality
$$\mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T) \le \mathrm{Val}\,\Gamma_3^{\delta_n}(x_0, y_0, T) + \max_{y \in E(y_0, 0, T - \delta_n)} \max_{y' \in E(y, T - \delta_n, T)} \rho(y, y') \tag{2.3.6}$$
is also satisfied. Write (2.3.6) for the games with other initial data:
$$\mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0^n, T) \le \mathrm{Val}\,\Gamma_3^{\delta_n}(x_0, y_0^n, T) + \max_{y \in E(y_0^n, 0, T - \delta_n)} \max_{y' \in E(y, T - \delta_n, T)} \rho(y, y').$$
By the construction of the set $E(y_0^n, 0, T - \delta_n)$, the inclusion
$$E(y_0^n, 0, T - \delta_n) \subset E(y_0, 0, T) \qquad \big(y_0^n \in E(y_0, 0, \delta_n)\big)$$
is valid and, because of the continuity of the function $\rho(x, y)$ and the compactness of the sets involved, the inequality
$$\mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0^n, T) \le \mathrm{Val}\,\Gamma_3^{\delta_n}(x_0, y_0^n, T) + \max_{y \in E(y_0, 0, T)} \max_{y' \in E(y, T - \delta_n, T)} \rho(y, y') \tag{2.3.7}$$
is satisfied. The definition of the games $\Gamma_1^{\delta_n}(x_0, y_0, T)$, $\Gamma_3^{\delta_n}(x_0, y_0, T)$ implies the equality
$$\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) = \max_{y_0^n \in E(y_0, 0, \delta_n)} \mathrm{Val}\,\Gamma_3^{\delta_n}(x_0, y_0^n, T). \tag{2.3.8}$$
In view of the continuity of the function $E(y, 0, T)$ and the initial condition $E(y, 0, 0) = \{y\}$, the addend in (2.3.7) tends to zero as $n \to \infty$. Denote it by $\varepsilon_1(n)$. From (2.3.7), (2.3.8) we get
$$\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0^n, T) - \varepsilon_1(n). \tag{2.3.9}$$
Because of the continuity of $\mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T)$, (2.3.9) implies the inequality
$$\mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T) - \varepsilon_1(n) + \varepsilon_2(n), \tag{2.3.10}$$
where $\varepsilon_2(n) \to 0$ as $n \to \infty$. Passing to the limit in (2.3.10) as $n \to \infty$, which is possible by Lemma 4 and the theorem on bounded monotone sequences, we have
$$\lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\delta_n}(x_0, y_0, T) \ge \lim_{n \to \infty} \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T). \tag{2.3.11}$$
The opposite inequality follows from Lemma 3. Consequently, both limits in (2.3.11) coincide.

The statement of Theorem 1 has been proved under the assumption that the sequence of partitionings of the interval $[0, T]$
$$\sigma_n = \{t_0 = 0 < t_1 < \dots < t_N = T\}, \qquad n = 1, 2, \dots,$$
satisfies the condition $t_{j+1} - t_j = T/2^n$, $j = 0, 1, \dots, 2^n - 1$. It can readily be seen that the statement of Theorem 1 holds for any sequence $\sigma_n$ of partitionings of the interval $[0, T]$ such that
$$\gamma(\sigma_n) = \max_j (t_{j+1} - t_j) \to 0 \qquad (n \to \infty). \tag{2.3.12}$$
Consider any two such sequences of partitionings of the interval $[0, T]$, $\{\sigma_n\}$ and $\{\sigma'_n\}$. The following lemma holds.
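The mesh $\gamma(\sigma_n)$ from (2.3.12) and the common refinement of two partitions, which the proof of Lemma 5 relies on, are easy to compute. A small sketch, with partitions represented as sorted lists of points of $[0, T]$; the helper names are hypothetical.

```python
def mesh(partition):
    """gamma(sigma) = max_j (t_{j+1} - t_j) for a sorted partition of [0, T], cf. (2.3.12)."""
    return max(b - a for a, b in zip(partition, partition[1:]))

def common_refinement(sigma, sigma_prime):
    """Partition of [0, T] by all points of both partitions."""
    return sorted(set(sigma) | set(sigma_prime))

def dyadic(T, n):
    """The uniform partition t_j = j T / 2^n used in Theorem 1."""
    return [T * j / 2**n for j in range(2**n + 1)]

sigma = dyadic(1.0, 2)                 # [0.0, 0.25, 0.5, 0.75, 1.0]
sigma_prime = [0.0, 0.3, 1.0]
b = common_refinement(sigma, sigma_prime)
print(b, mesh(b))                      # [0.0, 0.25, 0.3, 0.5, 0.75, 1.0] 0.25
```

A common refinement never has a larger mesh than either of its two partitions, so it can be used wherever (2.3.12) is required.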
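Lemma 5.
$$\lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T) = \lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma'_n}(x_0, y_0, T),$$
where $x_0, y_0 \in R^n$, $T < \infty$.

Proof: We give the proof by contradiction. Assume that the statement of the lemma is false, so that, say, the following inequality is satisfied:
$$\lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T) > \lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma'_n}(x_0, y_0, T).$$
Then natural numbers $n_1$, $m_1$ may be found such that
$$\mathrm{Val}\,\Gamma_2^{\sigma_{n_1}}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma_1^{\sigma_{n_1}}(x_0, y_0, T) > \mathrm{Val}\,\Gamma_2^{\sigma'_{m_1}}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma_1^{\sigma'_{m_1}}(x_0, y_0, T)$$
(the middle inequality can be achieved because $\mathrm{Val}\,\Gamma_1^{\sigma_n}$ increases to its limit, while, by Theorem 1, $\mathrm{Val}\,\Gamma_2^{\sigma'_m}$ decreases to $\lim_{m \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma'_m}$).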
Denote by $b$ the partitioning of the interval $[0, T]$ by all points of both partitionings $\sigma_{n_1}$ and $\sigma'_{m_1}$. Since $b$ refines both of them, the argument of Lemma 4 gives the following inequalities:
$$\mathrm{Val}\,\Gamma_2^{b}(x_0, y_0, T) \le \mathrm{Val}\,\Gamma_2^{\sigma'_{m_1}}(x_0, y_0, T) < \mathrm{Val}\,\Gamma_1^{\sigma_{n_1}}(x_0, y_0, T) \le \mathrm{Val}\,\Gamma_1^{b}(x_0, y_0, T),$$
whence
$$\mathrm{Val}\,\Gamma_2^{b}(x_0, y_0, T) < \mathrm{Val}\,\Gamma_1^{b}(x_0, y_0, T).$$
This contradicts (2.3.3) from Lemma 3. Consequently, our assumption is false, and Lemma 5 is proved. ∎
Theorem 2. For all $x_0$, $y_0$, $T < \infty$, there is an $\varepsilon$-equilibrium point in the game $\Gamma(x_0, y_0, T)$ for any $\varepsilon > 0$. Moreover,
$$\mathrm{Val}\,\Gamma(x_0, y_0, T) = \lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T),$$
where $\{\sigma_n\}$ is any decreasing sequence of partitions of the interval $[0, T]$ (to be compared with Theorem 7, Ch. 1).
Proof: Let $\varepsilon > 0$ be an arbitrary given number. We show that strategies $u_\varepsilon(\cdot)$ and $v_\varepsilon(\cdot)$ can be found for the players P and E, respectively, such that for all strategies $u(\cdot) \in \mathbf{P}$, $v(\cdot) \in \mathbf{E}$ the following inequalities are satisfied:
$$K(x_0, y_0; u_\varepsilon(\cdot), v(\cdot)) - \varepsilon \le K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)) \le K(x_0, y_0; u(\cdot), v_\varepsilon(\cdot)) + \varepsilon.$$
By Theorem 1, there exists a partition $\sigma_n$ such that
$$\mathrm{Val}\,\Gamma_2^{\sigma_n}(x_0, y_0, T) - \lim_{n \to \infty} \mathrm{Val}\,\Gamma_2^{\sigma_n}(x_0, y_0, T) < \varepsilon, \qquad \lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T) - \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T) < \varepsilon.$$
Let $u_\varepsilon(\cdot) = \{\sigma_n, \alpha_\varepsilon\}$, $v_\varepsilon(\cdot) = \{\sigma_n, \beta_\varepsilon\}$, where $\alpha_\varepsilon$, $\beta_\varepsilon$ are optimal strategies for the players P and E in the games $\Gamma_2^{\sigma_n}(x_0, y_0, T)$ and $\Gamma_1^{\sigma_n}(x_0, y_0, T)$, respectively. With this choice of the maps $\alpha_\varepsilon$ and $\beta_\varepsilon$, Player P ensures a payoff of at least $-\big(\lim_{n \to \infty} \mathrm{Val}\,\Gamma_2^{\sigma_n}(x_0, y_0, T) + \varepsilon\big)$ and Player E a payoff of at least $\lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T) - \varepsilon$. Consequently, $u_\varepsilon(\cdot)$, $v_\varepsilon(\cdot)$ are $\varepsilon$-optimal strategies for the players P and E, respectively. The function
$$\mathrm{Val}\,\Gamma(x_0, y_0, T) = \lim_{n \to \infty} \mathrm{Val}\,\Gamma_1^{\sigma_n}(x_0, y_0, T)$$
is the value function of the game $\Gamma(x_0, y_0, T)$. ∎
Corollary. In the proof of the existence theorem, we did not use the specific form of the payoff $\rho(x(T), y(T))$. Only the continuous dependence of the payoff on the realized trajectories was important. Therefore, Theorem 1 also holds if any continuous functional of the trajectories $x(t)$, $y(t)$ is considered instead of $\rho(x(T), y(T))$. In particular, such a functional may be equal to $\min_{0 \le t \le T} \rho(x(t), y(t))$.
2.4 Existence of ε-equilibrium points in optimal time differential pursuit games
Optimal time differential pursuit games constitute a special case of differential games with integral payoff, as defined in §2, Ch. 1. The classes of strategies $\mathbf{P}$ and $\mathbf{E}$ are the same as in the game of prescribed duration. The set
$$S = \{(x, y) : \rho(x, y) \le l\}$$
is given in $R^n \times R^n$. Let $x(t)$, $y(t)$ be the trajectories of the players P and E in the situation $(u(\cdot), v(\cdot))$ from the initial states $x_0$, $y_0$. Denote
$$t_n(x_0, y_0; u(\cdot), v(\cdot)) = \min\{t : (x(t), y(t)) \in S\}$$
(if no $t$ with $(x(t), y(t)) \in S$ exists, then $t_n(x_0, y_0; u(\cdot), v(\cdot))$ is assumed to be $+\infty$).
In the differential optimal time pursuit game, Player E's payoff is assumed to be
$$K(x_0, y_0; u(\cdot), v(\cdot)) = t_n(x_0, y_0; u(\cdot), v(\cdot)).$$
Player P's payoff is equal to $-K$. The game depends on the initial states $x_0$, $y_0$; hence it is denoted by $\Gamma(x_0, y_0)$. From the definition of the payoff function it follows that in the game $\Gamma(x_0, y_0)$ Player E aims to maximize the time it takes Player P to approach him to a given distance $l > 0$. Player P, on the other hand, seeks to minimize this time.
The optimal time pursuit game $\Gamma(x_0, y_0)$ is directly related to the prescribed duration pursuit game for achievement of the maximum result. Let $\Gamma(x_0, y_0, T)$ be the game of pursuit with prescribed duration $T$ for achievement of the maximum result (Player E's payoff is equal to $\min_{0 \le t \le T} \rho(x(t), y(t))$). As shown in §3 of this chapter, for games of this type there is an $\varepsilon$-equilibrium point in the class of piecewise open-loop strategies for every $\varepsilon > 0$. Let $V(x_0, y_0, T)$ be the value of such a game, and $V(x_0, y_0)$ the value of the game $\Gamma(x_0, y_0)$.
Lemma 6. For fixed $x_0$, $y_0$, the function $V(x_0, y_0, T)$ decreases monotonically (does not increase) in $T$ on the time interval $[0, \infty)$.

Proof: Let $T_1 > T_2 > 0$. Denote by $v_\varepsilon^{T_1}(\cdot)$ an $\varepsilon$-optimal strategy for Player E in the game $\Gamma(x_0, y_0, T_1)$. This strategy ensures that the distance between Player E and Player P on the interval $[0, T_1]$ is at least $\max[0, V(x_0, y_0, T_1) - \varepsilon]$. Consequently, it also ensures a distance of at least $\max[0, V(x_0, y_0, T_1) - \varepsilon]$ between them on the interval $[0, T_2]$, where $T_2 < T_1$. Hence
$$V(x_0, y_0, T_2) \ge \max[0, V(x_0, y_0, T_1) - \varepsilon] \tag{2.4.1}$$
(the strategy $v_\varepsilon^{T_1}(\cdot)$, which is $\varepsilon$-optimal in the game $\Gamma(x_0, y_0, T_1)$, is not necessarily $\varepsilon$-optimal in the game $\Gamma(x_0, y_0, T_2)$). Since $\varepsilon$ may be chosen arbitrarily, the statement of the lemma follows from (2.4.1).

Consider the equation
$$V(x_0, y_0, T) = l \tag{2.4.2}$$
with respect to $T$. Three cases are possible: 1. (2.4.2) has no root; 2. it has exactly one root; 3. it has more than one root. In case 3, the monotonicity and continuity of the function $V(x_0, y_0, T)$ in $T$ imply that (2.4.2) has a whole segment of roots, i.e. the function $V(x_0, y_0, T)$, as a function of $T$, has an interval of constancy (Fig. 13).
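When $V(x_0, y_0, \cdot)$ is available numerically, the smallest root of (2.4.2) can be located by bisection. This works because, by Lemma 6 and continuity, the set $\{T : V(x_0, y_0, T) \le l\}$ is a closed half-line. The following is a minimal sketch under these assumptions. `min_root`, the search bound `T_hi` and the example function are hypothetical, and `None` only means that no root exists in $[0, T_{hi}]$.

```python
def min_root(V, l, T_hi, tol=1e-10):
    """Smallest T in [0, T_hi] with V(T) <= l, for V continuous and nonincreasing in T.

    Returns 0.0 if V(0) <= l and None if V(T_hi) > l. Otherwise the invariant
    V(lo) > l >= V(hi) is kept, so the result approximates the minimal root of V(T) = l.
    """
    if V(0.0) <= l:
        return 0.0
    if V(T_hi) > l:
        return None
    lo, hi = 0.0, float(T_hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) > l:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical V with an interval of constancy [2, 4] at the level l = 1 (case 3).
def V_plateau(T):
    if T <= 2.0:
        return 3.0 - T
    if T <= 4.0:
        return 1.0
    return max(0.0, 5.0 - T)

print(min_root(V_plateau, 1.0, 10.0))   # close to 2.0, the minimal root T_0
```

On the plateau example every $T \in [2, 4]$ solves (2.4.2). Bisection returns the left end of that interval, which is the root denoted $T_0$ in case 3.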
Let us consider each case separately. In case 1 the following situations are possible:

a) $V(x_0, y_0, T) < l$ for all $T \ge 0$;
b) $\inf_{T \ge 0} V(x_0, y_0, T) > l$;
c) $\inf_{T \ge 0} V(x_0, y_0, T) = l$.
Fig. 13
In case 1a, we have $V(x_0, y_0, 0) = \rho(x_0, y_0) < l$, i.e. $t_n(x_0, y_0; u(\cdot), v(\cdot)) = 0$ for all $u(\cdot)$, $v(\cdot)$. Then the value of the game $\Gamma(x_0, y_0)$ is equal to $V(x_0, y_0) = 0$. In case 1b
$$\inf_{T \ge 0} V(x_0, y_0, T) = \lim_{T \to \infty} V(x_0, y_0, T) > l.$$
For every (sufficiently large) $T > 0$, Player E has a strategy $v(\cdot, T)$ which ensures the avoidance of $l$-capture on the interval $[0, T]$. Player P has no strategy which ensures the $l$-capture of Player E in finite time. Indeed, if such a strategy existed and ensured a particular capture time $V(x_0, y_0)$, then it would be better than the optimal strategy for Player P in the game $\Gamma(x_0, y_0, V(x_0, y_0))$, since, by assumption, the value of this game satisfies $V(x_0, y_0, V(x_0, y_0)) > l$. At the same time, it cannot be stated that Player E has a strategy which ensures the avoidance of $l$-capture for any time. The problem of finding the initial states in which such a strategy exists reduces to the solution of a game of kind for Player E. Thus, for $l < \lim_{T \to \infty} V(x_0, y_0, T)$ it may be stated that the value of the game, if it exists, is larger than any fixed $T$, i.e. it is equal to $+\infty$. Case 1c will be considered together with case 3.
Consider case 2. Let $T_0$ be the unique root of (2.4.2). The monotonicity and continuity of the function $V(x_0, y_0, T)$ in $T$ imply the existence of $\delta > 0$ such that for all $T_0 > T > T_0 - \delta$
$$V(x_0, y_0, T) > V(x_0, y_0, T_0) \tag{2.4.3}$$
and for all $T_0 + \delta > T > T_0$
$$V(x_0, y_0, T) < V(x_0, y_0, T_0). \tag{2.4.4}$$
Consider the pursuit game $\Gamma(x_0,y_0,T)$ with $T_0 + \delta > T > T_0$. It has an $\varepsilon$-equilibrium point in the class of piecewise open-loop strategies for any $\varepsilon > 0$. This means that there exists a strategy $u_\varepsilon(\cdot)$ of Player P which guarantees him an approach to Player E within the distance $V(x_0,y_0,T) + \varepsilon$. From (2.4.4) it follows that $\varepsilon_1 > 0$ may be chosen so that $\varepsilon_1 < V(x_0,y_0,T_0) - V(x_0,y_0,T) = \ell - V(x_0,y_0,T)$. With $\varepsilon_1$ chosen in this way, the strategy $u_{\varepsilon_1}(\cdot)$ of Player P guarantees him an approach to Player E within the distance
$$V(x_0,y_0,T) + \varepsilon_1 < V(x_0,y_0,T) + \ell - V(x_0,y_0,T) = \ell. \qquad (2.4.5)$$
From (2.4.5) it follows that the strategy $u_{\varepsilon_1}(\cdot)$ ensures the $\ell$-capture of Player E in a time not exceeding $T_0 + \delta$. Here $\varepsilon_1$ is found from $\delta$ and can be defined for any $\delta > 0$. Evidently, by adopting the $\varepsilon_2$-optimal strategy $v_{\varepsilon_2}(\cdot)$ in the game $\Gamma(x_0,y_0,T)$ with $T_0 > T > T_0 - \delta$, where $\varepsilon_2$ is chosen from the condition $\varepsilon_2 < V(x_0,y_0,T) - V(x_0,y_0,T_0) = V(x_0,y_0,T) - \ell$ (a positive number by (2.4.3)), Player E can ensure that within the time $T$ Player P will not approach him closer than the distance
$$V(x_0,y_0,T) - \varepsilon_2 > V(x_0,y_0,T) - V(x_0,y_0,T) + \ell = \ell,$$
i.e. in the time $T$ ($T_0 - \delta < T < T_0$) Player E can make the $\ell$-capture impossible. This means that in the optimal-time pursuit game $\Gamma(x_0,y_0)$ there is an $\varepsilon$-equilibrium point in piecewise open-loop strategies for any $\varepsilon > 0$, the value of the game being equal to $T_0(x_0,y_0)$. Here $T_0$ is the unique root of Equation (2.4.2). $\square$
In case 3, denote the least root of (2.4.2) by $T_0$. Generally, we cannot claim that the value of the game $\operatorname{Val}\Gamma(x_0,y_0) = T_0$. Indeed, from $V(x_0,y_0,T_0) = \ell$ it only follows that in the game $\Gamma(x_0,y_0,T_0)$, for any $\varepsilon > 0$, there is a strategy $u_\varepsilon(\cdot)$ of Player P which guarantees him the $(\ell+\varepsilon)$-capture of Player E in the time $T_0$. The existence of more than one root of (2.4.2) implies (by the monotonicity of $V(x_0,y_0,T)$) that the function $V(x_0,y_0,T)$ is constant on some interval $[T_0, T_1]$. Therefore, an increase in the duration of the game $\Gamma(x_0,y_0,T_0)$ by $\delta$, where $\delta < T_1 - T_0$, does not decrease the guaranteed approach to Player E: for all $T \in [T_0, T_1]$ Player P can only ensure an approach to Player E within the distance $\ell + \varepsilon$ (for any $\varepsilon > 0$), and we have no reason to say that for a particular $T \in [T_0, T_1]$ the value of $\varepsilon$ turns out to be zero. Evidently, if an equilibrium point (instead of an $\varepsilon$-equilibrium point) existed in the game $\Gamma(x_0,y_0,T_0)$, then the value of the game $\Gamma(x_0,y_0)$ would be exactly $T_0$.
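Because $V(x_0,y_0,T)$ is assumed continuous and nonincreasing in $T$, the least root $T_0$ of (2.4.2) used in cases 2 and 3 can be approximated by bisection when $V$ is available as a black box. The Python sketch below is only an illustration under that assumption; `minimal_root` and the toy function `V_toy` (which is constant at level 2 on $[1,4]$, as in case 3) are hypothetical and not taken from the text.

```python
from typing import Callable, Optional


def minimal_root(V: Callable[[float], float], ell: float, t_max: float,
                 tol: float = 1e-9) -> Optional[float]:
    """Approximate the least root T0 of V(T) = ell on [0, t_max].

    Assumes V is continuous and nonincreasing in T. Then the set
    {T : V(T) <= ell} is an interval [T0, t_max], and bisection on the
    predicate V(T) <= ell converges to its left end T0, even when V is
    constant on [T0, T1] (case 3).
    """
    if V(0.0) <= ell:
        return 0.0          # already within distance ell at T = 0
    if V(t_max) > ell:
        return None         # no root on [0, t_max]
    lo, hi = 0.0, t_max     # invariant: V(lo) > ell >= V(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) <= ell:
            hi = mid
        else:
            lo = mid
    return hi


def V_toy(T: float) -> float:
    """Hypothetical nonincreasing V: strictly decreasing except on [1, 4]."""
    if T <= 1.0:
        return 3.0 - T
    if T <= 4.0:
        return 2.0
    return max(2.0 - (T - 4.0), 0.0)


print(minimal_root(V_toy, 2.0, 10.0))  # case 3: roots fill [1, 4], least is close to 1
print(minimal_root(V_toy, 1.5, 10.0))  # case 2: unique root close to 4.5
print(minimal_root(V_toy, 0.5, 3.0))   # None: no root up to t_max = 3
```

Bisecting on the predicate $V(T) \le \ell$ rather than on $V(T) = \ell$ is what makes the sketch return the left end $T_0$ of an interval of roots instead of an arbitrary point of it.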
We now modify the concept of the equilibrium point in the game $\Gamma(x_0,y_0)$. Henceforth in this section it is more convenient to employ the notation $\Gamma(x_0,y_0,\ell)$ instead of $\Gamma(x_0,y_0)$, with the implication that in the game $\Gamma(x_0,y_0,\ell)$ the play terminates when the players have approached one another within the distance $\ell$. Let $t_n^{\ell}(x_0,y_0;u(\cdot),v(\cdot))$ be the time needed for the players to approach one another within the distance $\ell$ in the situation $(u(\cdot),v(\cdot))$.

Definition 5. For $\varepsilon > 0$, $\delta > 0$, the pair of strategies $u_\varepsilon^\delta(\cdot)$, $v_\varepsilon^\delta(\cdot)$ is said to constitute an $(\varepsilon,\delta)$-equilibrium situation in the game $\Gamma(x_0,y_0,\ell)$ if it constitutes an $\varepsilon$-equilibrium point in the game $\Gamma(x_0,y_0,\ell+\delta)$, i.e. if
$$t_n^{\ell+\delta}(x_0,y_0;u(\cdot),v_\varepsilon^\delta(\cdot)) + \varepsilon \ge t_n^{\ell+\delta}(x_0,y_0;u_\varepsilon^\delta(\cdot),v_\varepsilon^\delta(\cdot)) \ge t_n^{\ell+\delta}(x_0,y_0;u_\varepsilon^\delta(\cdot),v(\cdot)) - \varepsilon \qquad (2.4.6)$$
for all $u(\cdot) \in P$, $v(\cdot) \in E$.

Definition 6. Suppose there is a sequence $\{\delta_k\}$, $\delta_k \to 0$, such that in all games $\Gamma(x_0,y_0,\ell+\delta_k)$, for any $\varepsilon > 0$, there is an $\varepsilon$-equilibrium point. Then the limit
$$\lim_{k\to\infty} V(x_0,y_0,\ell+\delta_k) = V'(x_0,y_0,\ell) \qquad (2.4.7)$$
is called the value of the game $\Gamma(x_0,y_0,\ell)$ in the generalized sense. Note that the definition does not depend on the choice of the sequence $\{\delta_k\}$, since the function $V(x_0,y_0,\ell)$ decreases monotonically in $\ell$.

Definition 7. We say that the game $\Gamma(x_0,y_0,\ell)$ has a solution in the generalized sense if there is a sequence $\{\delta_k\}$, $\delta_k \to 0$, such that for any $\varepsilon > 0$ and any $\delta_k \in \{\delta_k\}$ there exists an $(\varepsilon,\delta_k)$-equilibrium situation in the game $\Gamma(x_0,y_0,\ell)$. From (2.4.6) and (2.4.7) it follows that if the game has a solution in the generalized sense, then its value $V'(x_0,y_0,\ell)$ in the generalized sense exists and equals
$$\lim_{\substack{k\to\infty\\ \varepsilon\to 0}} t_n^{\ell+\delta_k}(x_0,y_0;u_\varepsilon^{\delta_k}(\cdot),v_\varepsilon^{\delta_k}(\cdot)) = V'(x_0,y_0,\ell). \qquad (2.4.8)$$
Next, from the definition of the value of the game $\Gamma(x_0,y_0,\ell)$ in the generalized sense it follows that if in the game $\Gamma(x_0,y_0,\ell)$ there is an $\varepsilon$-equilibrium point in the ordinary sense for any $\varepsilon > 0$ (i.e. a solution in the ordinary sense), then it is also a solution in the generalized sense, and $V(x_0,y_0,\ell) = V'(x_0,y_0,\ell)$ (it suffices to take the sequence $\{\delta_k\}$ with $\delta_k = 0$ for all $k$).
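The inequalities (2.4.6) can be checked mechanically when each player has only finitely many strategies. The sketch below is a finite analogue only (the strategy sets $P$, $E$ of the text are infinite); `t[a][b]` stands for a hypothetical table of capture times $t_n^{\ell+\delta}$, which Player P minimizes and Player E maximizes.

```python
from typing import Sequence


def is_eps_equilibrium(t: Sequence[Sequence[float]], i: int, j: int,
                       eps: float) -> bool:
    """Check both inequalities of (2.4.6) for the cell (i, j) of a finite table.

    t[a][b] is a hypothetical capture time when Player P uses strategy a
    and Player E uses strategy b.
    """
    value = t[i][j]
    # Left inequality: t(u, v*) + eps >= t(u*, v*) for every P strategy u.
    p_ok = all(t[a][j] + eps >= value for a in range(len(t)))
    # Right inequality: t(u*, v*) >= t(u*, v) - eps for every E strategy v.
    e_ok = all(value >= t[i][b] - eps for b in range(len(t[i])))
    return p_ok and e_ok


times = [[4.0, 5.0],
         [3.0, 6.0]]
print(is_eps_equilibrium(times, 0, 1, 0.0))  # True: a saddle point
print(is_eps_equilibrium(times, 1, 1, 0.5))  # False: P gains 1.0 by switching to row 0
```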
Theorem 3. Suppose there exists the least finite root $T_0$ of (2.4.2). Then the optimal-time pursuit game $\Gamma(x_0,y_0,\ell)$ has a value in the generalized sense, and $V'(x_0,y_0,\ell) = T_0$.

Proof: The monotonicity and continuity of the function $V(x_0,y_0,T)$ in $T$ imply the existence of a sequence $T_k \to T_0$ from the left such that $V(x_0,y_0,T_k) \to V(x_0,y_0,T_0) = \ell$ as $k \to \infty$ and the function $V(x_0,y_0,T)$ is strictly monotone at the points $T_k$. Let
$$\delta_k = V(x_0,y_0,T_k) - \ell > 0. \qquad (2.4.9)$$
The strict monotonicity of the function $V(x_0,y_0,T)$ at the points $T_k$ implies that the equation $V(x_0,y_0,T) = \ell + \delta_k$ has the unique root $T_k$. This means that for any $\delta_k \in \{\delta_k\}$, in the game $\Gamma(x_0,y_0,\ell+\delta_k)$ there is an $\varepsilon$-equilibrium point for any $\varepsilon > 0$, and the value of this game is $T_k$ (see case 2). This, however, means that in the game $\Gamma(x_0,y_0,\ell)$ there is a solution in the generalized sense and
$$\lim_{k\to\infty} V(x_0,y_0,\ell+\delta_k) = \lim_{k\to\infty} T_k = T_0 = V'(x_0,y_0,\ell).$$
This proves the theorem. $\blacksquare$
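The construction in the proof of Theorem 3 can be traced on a toy function. Below, the hypothetical `V_toy` has least root $T_0 = 1$ of $V = \ell$ with $\ell = 2$ and is strictly decreasing to the left of $T_0$; the sketch computes the points $T_k = 1 - 2^{-k}$, the numbers $\delta_k$ from (2.4.9), and the unique roots of $V = \ell + \delta_k$ by bisection. It is an illustration, not part of the proof.

```python
def V_toy(T: float) -> float:
    """Hypothetical nonincreasing V with V = 2 on [1, 4]; least root of V = 2 is T0 = 1."""
    if T <= 1.0:
        return 3.0 - T
    if T <= 4.0:
        return 2.0
    return max(2.0 - (T - 4.0), 0.0)


def least_root(V, level, lo, hi, tol=1e-12):
    """Bisection for the least T in [lo, hi] with V(T) <= level.

    Assumes V is nonincreasing, V(lo) > level and V(hi) <= level.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) <= level:
            hi = mid
        else:
            lo = mid
    return hi


def generalized_value_sequence(ell=2.0, K=8):
    """Follow the proof of Theorem 3 on V_toy: T_k -> T0 = 1 from the left."""
    rows = []
    for k in range(1, K + 1):
        T_k = 1.0 - 2.0 ** (-k)            # V_toy is strictly decreasing at T_k
        delta_k = V_toy(T_k) - ell          # (2.4.9), positive
        root = least_root(V_toy, ell + delta_k, 0.0, 10.0)  # unique root of V = ell + delta_k
        rows.append((delta_k, root))
    return rows


for delta_k, root in generalized_value_sequence():
    print(f"delta_k = {delta_k:.6f}, root of V = ell + delta_k: {root:.6f}")
```

As $\delta_k \to 0$ the printed roots approach $T_0 = 1$, which is the generalized value $V'(x_0,y_0,\ell)$ in this toy setting.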
Now consider case 1c. We have $\inf_T V(x_0,y_0,T) = \ell$. Let $T_k \to \infty$. Then $\lim_{k\to\infty} V(x_0,y_0,T_k) = \ell$. The monotonicity and continuity of $V(x_0,y_0,T)$ in $T$ imply that the sequence $T_k$ may be chosen so that the function $V(x_0,y_0,T)$ is strictly monotone at the points $T_k$. Then, as in the proof of Theorem 3, we may show that there exists a sequence $\{\delta_k\}$ such that
$$\lim_{k\to\infty} V(x_0,y_0,\ell+\delta_k) = \lim_{k\to\infty} T_k = T_0 = \infty.$$
Thus, in this case a generalized solution also exists, and the generalized value of the game $V'(x_0,y_0,\ell)$ equals infinity.
2.5 Alternative
In many cases it is preferable to find out whether Player P can guarantee the $\ell$-capture from the given initial positions $x, y$ in a fixed time $T$ and, if not, whether Player E can guarantee the avoidance of $\ell$-capture in the prescribed time $T$. Let $V(x,y,T)$ be the value of the game with prescribed duration $T$ from the initial states $x, y \in R^n$, the payoff being $\min_{0\le t\le T} \rho(x(t),y(t))$. Two cases are possible:
1. $V(x,y,T) > \ell$;
2. $V(x,y,T) \le \ell$.
Let us consider case 1. It follows from the definition of the function $V(x,y,T)$ that for any $\varepsilon > 0$ there is a strategy $v_\varepsilon(\cdot)$ of Player E such that
$$K(x,y;u(\cdot),v_\varepsilon(\cdot)) \ge V(x,y,T) - \varepsilon$$
holds for all strategies $u(\cdot)$. By choosing a sufficiently small $\varepsilon$, we may ensure that
$$K(x,y;u(\cdot),v_\varepsilon(\cdot)) \ge V(x,y,T) - \varepsilon > \ell$$
for all strategies $u(\cdot)$ of Player P. The form of the payoff function $K$ implies that, by adopting the strategy $v_\varepsilon(\cdot)$, Player E can ensure that the inequality $\min_{0\le t\le T}\rho(x(t),y(t)) > \ell$ is satisfied independently of Player P's actions, i.e. in case 1 Player E ensures the avoidance of $\ell$-capture on the time interval $[0,T]$ independently of Player P's actions.
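Consider case 2. Let $T_0$ be the least root of the equation $V(x,y,t) = \ell$ ($T_0 \le T$) with $x, y$ fixed (if $\rho(x,y) < \ell$, then $T_0$ is assumed to be 0). The definition of $V(x,y,T_0)$ then implies that in the game $\Gamma(x,y,T_0)$, for any $\varepsilon > 0$, Player P has a strategy $u_\varepsilon(\cdot)$ which ensures that the inequality
$$K(x,y;u_\varepsilon(\cdot),v(\cdot)) \le V(x,y,T_0) + \varepsilon = \ell + \varepsilon$$
is satisfied for all strategies $v(\cdot)$ of Player E. The form of the payoff function $K$ implies that, by adopting the strategy $u_\varepsilon(\cdot)$, Player P can ensure that the inequality $\min_{0\le t\le T_0}\rho(x(t),y(t)) \le \ell + \varepsilon$ is satisfied independently of Player E's actions. Extending the strategy $u_\varepsilon(\cdot)$ to the interval $[0,T]$, we see that in case 2, for any $\varepsilon > 0$, Player P can ensure the $(\ell+\varepsilon)$-capture of Player E in the time $T$, whatever the actions of the latter may be.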
We have actually proved the following theorem (about alternative).

Theorem 4. For any $x, y \in R^n$ and $T > 0$ at least one of the following two alternatives holds.
1. From the initial states $x, y$ Player E can ensure the avoidance of $\ell$-capture during the time $T$ independently of Player P's actions.
2. For every $\varepsilon > 0$ Player P can ensure the $(\ell+\varepsilon)$-capture of Player E from the initial states $x, y$ in the time $T$ independently of Player E's actions.

For each fixed $T > 0$ the entire space $R^n \times R^n$ is divided into three nonintersecting regions: the region $A = \{(x,y) : V(x,y,T) < \ell\}$, called the capture zone; the region $B = \{(x,y) : V(x,y,T) > \ell\}$, naturally called the escape zone; and the region $C = \{(x,y) : V(x,y,T) = \ell\}$, the neutral outcome zone.
Let $(x,y) \in A$. By definition of $A$, for any $\varepsilon > 0$ Player P has a strategy $u_\varepsilon(\cdot)$ such that
$$K(x,y;u_\varepsilon(\cdot),v(\cdot)) \le V(x,y,T) + \varepsilon$$
for all strategies $v(\cdot)$ of Player E. By properly choosing $\varepsilon > 0$ (namely $\varepsilon < \ell - V(x,y,T)$), it is possible to ensure that
$$K(x,y;u_\varepsilon(\cdot),v(\cdot)) \le V(x,y,T) + \varepsilon < \ell.$$
The latter means that the strategy $u_\varepsilon(\cdot)$ of Player P guarantees him the $\ell$-capture of Player E from the initial states $x, y$ in the time $T$. As a result, we get the following refinement of Theorem 4.
Theorem 5. For any fixed $T > 0$ the entire space is divided into three nonintersecting regions $A$, $B$, $C$ possessing the following properties:
1. for any $(x,y) \in A$, Player P has a strategy $u_\varepsilon(\cdot)$ which ensures the $\ell$-capture of Player E on the interval $[0,T]$ independently of the latter's actions;
2. for any $(x,y) \in B$, Player E has a strategy $v_\varepsilon(\cdot)$ which ensures the avoidance of $\ell$-capture independently of Player P's actions;
3. for any $(x,y) \in C$ and $\varepsilon > 0$, Player P has a strategy $u_\varepsilon(\cdot)$ which ensures the $(\ell+\varepsilon)$-capture of Player E independently of the latter's actions.

The explicit implication of Theorem 5 is that if $(x,y) \in A$, then Player P can always ensure that the trajectory $x(t), y(t)$ emanating from the initial states $x, y$ does not leave the set $A \cup C$ during the time $T$. Similarly, if $(x,y) \in B$, then Player E can always ensure that the trajectory $x(t), y(t)$ emanating from the initial states $x, y$ does not leave the set $B \cup C$ during the time $T$.
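To make the zones $A$, $B$, $C$ concrete, the sketch below classifies initial positions for a function $V$ that we simply assume to be $V(x,y,T) = \max(|x-y| - (\alpha-\beta)T, 0)$ for simple motions with speeds $\alpha > \beta$; this closed form is an assumption of the example and is not derived in this section. The tolerance `tol` is a numerical convenience for deciding membership in the boundary set $C$.

```python
import math


def V_simple(x, y, T, alpha, beta):
    """Assumed (not derived here) value for simple motions with alpha > beta."""
    return max(math.dist(x, y) - (alpha - beta) * T, 0.0)


def zone(x, y, T, ell, alpha, beta, tol=1e-12):
    """Return 'A' (capture), 'B' (escape) or 'C' (neutral) as in Theorem 5."""
    value = V_simple(x, y, T, alpha, beta)
    if abs(value - ell) <= tol:
        return "C"          # V(x, y, T) = ell up to rounding
    return "A" if value < ell else "B"


for y0 in [(1.5, 0.0), (2.0, 0.0), (3.0, 0.0)]:
    print(y0, zone((0.0, 0.0), y0, T=1.0, ell=1.0, alpha=2.0, beta=1.0))
```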
2.6 Differential games with dependent motions
So far we have discussed differential games where the players' motions were described by (2.1.1), (2.1.2). A special feature of such games is that the control and phase state of the opponent did not explicitly affect a player's motion, since the variables $y, v$ did not appear in the right-hand side of (2.1.1), and the variables $x, u$ did not appear in the right-hand side of (2.1.2). In this and the following sections of the present chapter, we study games where the equations of motion have a general form, which does not allow us to interpret them as differential games of pursuit. In general, theorems on the existence of a solution in the class of piecewise open-loop strategies similar to the theorems of §§3-5 of the present chapter do not hold in this case. Let $z \in R^n$, $u \in U \subset \operatorname{Comp} R^k$, $v \in V \subset \operatorname{Comp} R^l$, and let $f(z,u,v)$ be an $n$-dimensional vector function given on $R^n \times U \times V$. Consider the system
$$\dot z = f(z,u,v) \qquad (2.6.1)$$
with the initial condition $z(0) = z_0$. Then $z$ moves in the phase space $R^n$ under the action of the controls chosen at each instant by the players P and E respectively, in accordance with their objectives and the information available in each state of the game. The players' objectives in the game are determined by a payoff that depends on the realized trajectory $z(t)$.
The game is assumed to have prescribed duration $T$. Let $M[z_0, z(t)]$ be a continuous functional of the trajectory $z(t)$ realized during the game. Assume that Player E's payoff is $M[z_0,z(t)]$ and Player P's payoff is $-M[z_0,z(t)]$. As before, such differential games are called games with prescribed duration. Sometimes the restriction on the duration of the game is of no importance, and the game proceeds until the players P and E achieve some particular result. Let an $m$-dimensional surface $S$, called the terminal surface, be given in $R^n$. Set $t_n = \min\{t : z(t) \in S\}$, i.e. $t_n$ is the first instant at which the point $z(t)$ reaches $S$. If $z(t) \notin S$ for all $t \ge 0$, then $t_n$ is assumed to be equal to $+\infty$. Player E's payoff is assumed to be equal to $t_n$ (Player P's payoff being $-t_n$).
As before, such differential games are called optimal-time games. In many cases it is essential to determine the set of initial states from which Player P can ensure that the point $z$ falls on $S$ independently of Player E's actions and, conversely, the set of initial states from which Player E can ensure the avoidance of a rendezvous with the set $S$. Such differential games will be called differential games of kind. As before (see §2), we choose the classes of piecewise open-loop strategies. To ensure that system (2.6.1) has a unique solution which may be continued on the interval $[0,\infty)$ for any initial condition $z_0 \in R^n$ and any situation $(u(\cdot),v(\cdot))$ in piecewise open-loop strategies, we impose on the right-hand sides of equations (2.6.1) a condition similar to Condition A from §2 of this chapter.

Condition B. The system of ordinary differential equations $\dot z = f(z,u(t),v(t))$ with any pair of measurable open-loop controls $u(t), v(t)$ and any initial condition $z_0$ has a unique solution $z(t)$ which can be extended to the time interval $[0,\infty)$.

As in §2, Ch. 2, by employing Condition B we can construct for any initial conditions the solution of the system $\dot z = f(z,u(t),v(t))$, which can be extended to the interval $[0,\infty)$. Moreover, we may define the payoff function $K(z_0;u(\cdot),v(\cdot))$, whose form depends on the type of the game.
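Under Condition B the trajectory generated by a situation in piecewise open-loop strategies is built interval by interval: at each break point $t_k$ the current state determines the open-loop controls on $[t_k, t_{k+1})$, and the system is integrated up to $t_{k+1}$. The Python sketch below imitates this with an explicit Euler scheme (an approximation we chose, not the exact solution); the maps `alpha` and `beta` and the use of one common partition for both players are hypothetical simplifications. The right-hand side `f_example` is the one of Example 3, which follows.

```python
def simulate(f, z0, partition, alpha, beta, steps=1000):
    """Euler approximation of z' = f(z, u(t), v(t)) on [t_0, t_N].

    partition: break points t_0 < t_1 < ... < t_N (common to both players here).
    alpha, beta: hypothetical maps (t_k, z(t_k)) -> open-loop control on [t_k, t_{k+1}).
    steps: number of Euler steps per subinterval.
    """
    z = list(z0)
    for t_k, t_next in zip(partition, partition[1:]):
        u = alpha(t_k, tuple(z))   # Player P's open-loop control on this interval
        v = beta(t_k, tuple(z))    # Player E's open-loop control on this interval
        h = (t_next - t_k) / steps
        for s in range(steps):
            t = t_k + s * h
            dz = f(z, u(t), v(t))
            z = [zi + h * di for zi, di in zip(z, dz)]
    return z


def f_example(z, u, v):
    """Right-hand side of Example 3: z' = (u - v)^2 (one-dimensional)."""
    return [(u - v) ** 2]


z_T = simulate(f_example, [1.0], [0.0, 0.5, 1.0],
               alpha=lambda t_k, z_k: (lambda t: 1.0),
               beta=lambda t_k, z_k: (lambda t: 0.0))
print(z_T)   # approximately [2.0]
```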
In contrast to the case of independent motions, in differential games with dependent motions $\varepsilon$-equilibrium points for every $\varepsilon > 0$ in the class of piecewise open-loop strategies may fail to exist. This may be illustrated by the following simple example.

Example 3. The game has prescribed duration $T > 0$, $z \in R^1$, $u \in [0,1]$, $v \in [0,1]$,
$$\dot z = (u - v)^2, \quad z(0) = z_0 \ (z_0 > 0), \quad M[z_0, z(t)] = z(T).$$
Assume that an $\varepsilon$-equilibrium point in the class of piecewise open-loop strategies exists for all $\varepsilon > 0$, let $\{(u_\varepsilon(\cdot), v_\varepsilon(\cdot))\}$ be a family of $\varepsilon$-equilibrium points for various $\varepsilon > 0$, and let $V$ be the value of the game. Then
$$K(z_0;u(\cdot),v_\varepsilon(\cdot)) + \varepsilon \ge V \ge K(z_0;u_\varepsilon(\cdot),v(\cdot)) - \varepsilon \qquad (2.6.2)$$
for all $u(\cdot), v(\cdot)$. Since there always exists a strategy $u_*(\cdot)$ coinciding with $v_\varepsilon(\cdot)$ ($u_*(\cdot) = v_\varepsilon(\cdot)$), by choosing the strategy $u_*(\cdot)$ against the strategy $v_\varepsilon(\cdot)$ Player P can ensure that the velocity of the point $z$ is zero, which leads to the payoff $K(z_0;u_*(\cdot),v_\varepsilon(\cdot)) = z_0$ in the situation $(u_*(\cdot),v_\varepsilon(\cdot))$. Since the payoff $K(z_0;u(\cdot),v(\cdot))$ cannot be less than $z_0$ for any $u(\cdot), v(\cdot)$, and the inequality $z_0 + \varepsilon \ge V$ must hold for all $\varepsilon > 0$, we get $V = z_0$, and for all $\varepsilon > 0$ and $v(\cdot)$ the following inequality must be satisfied:
$$z_0 \ge K(z_0;u_\varepsilon(\cdot),v(\cdot)) - \varepsilon. \qquad (2.6.3)$$
Now let $u_\varepsilon(\cdot) = \{\sigma', \alpha'\}$ be an $\varepsilon$-optimal strategy of Player P. We construct the strategy $v(\cdot) = \{\tau, \beta\}$, $\tau = \sigma'$, for Player E. For any $t_k$, $z(t_k)$, the mapping $\beta$ associates the state $t_k, z(t_k)$ with the following open-loop control on the time interval $[t_k, t_{k+1})$:
$$v(t) = \begin{cases} 1 & \text{for all } t \text{ for which } u_\varepsilon(t) < 1/2, \\ 0 & \text{for all } t \text{ for which } u_\varepsilon(t) \ge 1/2, \end{cases}$$
where $u_\varepsilon(t)$ is the open-loop control dictated by the strategy $u_\varepsilon(\cdot)$ on the interval $[t_k, t_{k+1})$ from the state $(t_k, z(t_k))$. Then, in the situation $(u_\varepsilon(\cdot), v(\cdot))$, we have $|u_\varepsilon(t) - v(t)| \ge 1/2$, so the velocity of the point $z$ is $1/4$ or more, and the payoff in this situation satisfies
$$K(z_0;u_\varepsilon(\cdot),v(\cdot)) \ge z_0 + \frac{T}{4}$$
for any $\varepsilon$-optimal strategy $u_\varepsilon(\cdot)$ adopted by Player P. If $\varepsilon = T/8$, then (see (2.6.3))
$$z_0 \ge K(z_0;u_\varepsilon(\cdot),v(\cdot)) - \frac{T}{8} \ge z_0 + \frac{T}{4} - \frac{T}{8} = z_0 + \frac{T}{8}.$$
The contradiction obtained means that, in the case under consideration, $\varepsilon$-equilibrium points do not exist for all $\varepsilon > 0$, and we cannot speak of a solution of the game in the class of piecewise open-loop strategies. The fundamental results for games with dependent motions were obtained by N. N. Krasovsky [12] and his followers, who extended the classes of strategies to include mixed strategies and proved the existence of a solution in these extended classes. Since these results are discussed in detail in [12], we dwell here only on differential games with dependent motions and discrimination of one of the players, for which we can prove the existence of $\varepsilon$-equilibrium points for any $\varepsilon > 0$ in the class of piecewise open-loop strategies.
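The counter-strategy of Example 3 can be checked numerically: whatever open-loop control $u(t) \in [0,1]$ Player P uses, the pointwise response $v(t) = 1$ if $u(t) < 1/2$ and $v(t) = 0$ otherwise gives $(u - v)^2 \ge 1/4$, hence $z(T) \ge z_0 + T/4$. The sketch below uses an Euler sum and a few hypothetical controls $u$; it illustrates the inequality for those controls and proves nothing about all strategies.

```python
def counter_control(u_value: float) -> float:
    """Player E's response in Example 3: v = 1 if u < 1/2, otherwise v = 0."""
    return 1.0 if u_value < 0.5 else 0.0


def terminal_payoff(u, z0: float, T: float, n: int = 10_000) -> float:
    """Euler approximation of z(T) for z' = (u(t) - v(t))^2 with v = counter_control(u)."""
    h = T / n
    z = z0
    for i in range(n):
        u_t = u(i * h)
        z += h * (u_t - counter_control(u_t)) ** 2   # each increment is at least h / 4
    return z


z0, T = 1.0, 2.0
for name, u in [("u = 1/2", lambda t: 0.5),
                ("u = 0.3", lambda t: 0.3),
                ("u = t mod 1", lambda t: t % 1.0)]:
    print(name, terminal_payoff(u, z0, T), ">= z0 + T/4 =", z0 + T / 4)
```

By contrast, if Player P could copy Player E's control, the rate $(u-v)^2$ would vanish and $z(T) = z_0$; it is exactly this gap that rules out $\varepsilon$-equilibria for small $\varepsilon$.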
Denote by $\Gamma(z_0,T)$ the terminal-payoff game of prescribed duration $T$ from the initial state $z_0$. We introduce the upper and lower games with discrimination.

Let $\Gamma^-(z_0,T)$ be the lower game with discrimination of Player E. Let $\sigma$, $\tau$ be the partitions of the interval $[0,T]$ dictated by the piecewise open-loop strategies $u(\cdot)$, $v(\cdot)$ of the players P and E in the game $\Gamma(z_0,T)$, and let $\omega = \sigma \cup \tau$ be the union of these partitions. In the game $\Gamma(z_0,T)$, at each point $t_k \in \omega$ both Player P and Player E know $t_k$ and $z(t_k)$. Assume that in the game $\Gamma^-(z_0,T)$ Player P knows the partition $\omega$ and, at the instants $t_k \in \omega$, knows $t_k$, $z(t_k)$ and the control $v(t)$ of Player E on the time interval $[t_k, t_{k+1}]$. This change in the information state naturally entails a change in the definition of Player P's strategies.
The strategy of Player P in the game $\Gamma^-(z_0,T)$ is defined as follows. Let $v(\cdot) = \{\tau, \beta\}$ be a strategy of Player E in the game $\Gamma(z_0,T)$ (the strategy classes of Player E in the games $\Gamma(z_0,T)$ and $\Gamma^-(z_0,T)$ coincide). Then a strategy $u(\cdot)$ of Player P in the game $\Gamma^-(z_0,T)$ is a pair $\{\sigma, \alpha\}$, where $\alpha$ is a map which places each point $z(t_k)$ and the control of Player E on the corresponding interval in correspondence with a measurable open-loop control $u(t)$, $t \in [t_k, t_{k+1})$.
Denote the strategy classes of the players in the game $\Gamma^-(z_0,T)$ by $P^- = \{u(\cdot)\}$ and $E^- = \{v(\cdot)\}$. Let $\Gamma^+(z_0,T)$ be the upper game with discrimination of Player P, and denote the strategy classes of the players in the game $\Gamma^+(z_0,T)$ by $P^+ = \{u(\cdot)\}$ and $E^+ = \{v(\cdot)\}$. We show that in the games $\Gamma^+(z_0,T)$ and $\Gamma^-(z_0,T)$, for the strategy classes under consideration, there are $\varepsilon$-equilibrium points ($\varepsilon > 0$).
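Example 3 shows why the order of information matters. If Player E must reveal $v$ first (the lower game), Player P can copy it and make the rate $(u-v)^2$ vanish; if Player P must reveal $u$ first (the upper game), Player E can keep the rate at least $1/4$, while $u = 1/2$ keeps it at most $1/4$. A short pointwise argument along these lines (ours, not the text's) suggests $V^-(z_0,T) = z_0$ and $V^+(z_0,T) = z_0 + T/4$ for Example 3. The sketch below only evaluates the two instantaneous quantities on a grid of controls; the function name is hypothetical.

```python
def instantaneous_rates(n: int = 101):
    """Grid versions of max_v min_u and min_u max_v of the rate (u - v)^2 on [0, 1]^2."""
    grid = [i / (n - 1) for i in range(n)]

    def rate(u: float, v: float) -> float:
        return (u - v) ** 2

    lower = max(min(rate(u, v) for u in grid) for v in grid)  # lower game: E commits first
    upper = min(max(rate(u, v) for v in grid) for u in grid)  # upper game: P commits first
    return lower, upper


lower, upper = instantaneous_rates()
print(lower, upper)   # 0.0 0.25
```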
In order to prove the existence of $\varepsilon$-equilibrium points in the games $\Gamma^+(z_0,T)$ and $\Gamma^-(z_0,T)$, we construct approximating sequences of games $\Gamma_\delta^+(z_0,T)$ and $\Gamma_\delta^-(z_0,T)$ with discrete time, as was done in this chapter for games with independent motions.
The game $\Gamma_\delta^-(z_0,T)$ differs from the game $\Gamma^-(z_0,T)$ only in the strategy sets of the players. Let $\delta = \{t_k\}$ be a fixed partition of the time interval $[0,T]$. The set of strategies $E_\delta^-$ in the game $\Gamma_\delta^-(z_0,T)$ is a subset of the set of strategies $E^-$ in the game $\Gamma^-(z_0,T)$ and is defined as follows:
$$E_\delta^- = \{(\tau,\beta) : \tau = \delta,\ (\tau,\beta) \in E^-\},$$
i.e. it is the set of all pairs $(\tau,\beta)$ whose partition $\tau$ coincides with $\delta$. Likewise, the set of strategies $P_\delta^-$ in the game $\Gamma_\delta^-(z_0,T)$ is a subset of the set of strategies $P^-$ in the game $\Gamma^-(z_0,T)$ and is defined as follows:
$$P_\delta^- = \{(\delta',\alpha) : \delta' = \delta,\ (\delta',\alpha) \in P^-\},$$
i.e. it is the set of all pairs $(\delta',\alpha)$ whose partition $\delta'$ coincides with $\delta$. Similarly, by fixing the partition of the time interval $[0,T]$ for all strategies of the players P and E, it is possible to construct a discrete approximation $\Gamma_\delta^+(z_0,T)$ of the game $\Gamma^+(z_0,T)$. The generalization of Lemmas 2 and 3 to the games $\Gamma_\delta^+(z_0,T)$, $\Gamma_\delta^-(z_0,T)$ is elementary, and we obtain the following lemma.
Lemma 7. In the games $\Gamma_\delta^-(z_0,T)$ and $\Gamma_\delta^+(z_0,T)$ there are equilibrium points in the classes of strategies $P_\delta^-, E_\delta^-$ and $P_\delta^+, E_\delta^+$ respectively. The values $V_\delta^-(z_0,T)$ and $V_\delta^+(z_0,T)$ of these games are continuous functions of $z_0, T$. (The reachability sets of system (2.6.1) are assumed to be compact for any $z_0 \in R^n$, $T > 0$ and to depend continuously on $z_0, T$.)
Suppose the partition $\delta'$ contains one more break point than the partition $\delta$, and the other break points $t_k$ of $\delta'$ coincide with the break points of $\delta$. Consider the games $\Gamma_{\delta'}^-(z_0,T)$ and $\Gamma_\delta^-(z_0,T)$. Let $t_k < t' < t_{k+1}$, where $t'$ is the additional break point. By definition, at the point $z(t_k)$ in the game $\Gamma_{\delta'}^-(z_0,T)$ Player P receives from Player E less information than in the game $\Gamma_\delta^-(z_0,T)$ (in the first case he knows $v(t)$ for $t \in [t_k, t')$, and in the second $v(t)$ for $t \in [t_k, t_{k+1})$). Player E, however, receives at the instant $t'$ information about the state $z(t')$ that is additional as compared with the game $\Gamma_\delta^-(z_0,T)$. This means that Player P's position in the game $\Gamma_{\delta'}^-(z_0,T)$ is worse than in the game $\Gamma_\delta^-(z_0,T)$, which leads to the inequality $V_{\delta'}^-(z_0,T) \ge V_\delta^-(z_0,T)$. The same inequality follows from the explicit reasoning $E_{\delta'}^- \supset E_\delta^-$: restrict Player E in the game $\Gamma_{\delta'}^-(z_0,T)$ to the class of strategies $E_\delta^-$ of the game $\Gamma_\delta^-(z_0,T)$. Then, in the game $\Gamma_{\delta'}^-(z_0,T)$, he can always ensure a payoff at least equal to $V_\delta^-(z_0,T)$, from which we get the inequality
$$V_{\delta'}^-(z_0,T) \ge V_\delta^-(z_0,T). \qquad (2.6.4)$$
Similarly, we may obtain the inequality
$$V_{\delta'}^+(z_0,T) \le V_\delta^+(z_0,T). \qquad (2.6.5)$$
Let $\delta_1 \subset \delta_2 \subset \dots \subset \delta_k \subset \dots$ be a sequence of successively refined partitions of the interval $[0,T]$ (decreasing in mesh). Since the terminal payoff $M[z_0, z(T)]$ is bounded on the reachability set of system (2.6.1) from the initial state $z_0$ in the time $T$, the sequences $V_{\delta_k}^-(z_0,T)$ and $V_{\delta_k}^+(z_0,T)$ are bounded monotone sequences of continuous functions. Let $|\delta| = \max_k (t_{k+1} - t_k)$ be the mesh (rank) of the partition $\delta$.
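Lemma 8 below needs sequences of partitions that refine one another and whose mesh $|\delta_k|$ tends to zero. One hypothetical way to produce such a sequence is repeated midpoint insertion, sketched here with exact rational arithmetic; the function names `mesh` and `refine` are illustrative.

```python
from fractions import Fraction


def mesh(partition):
    """|delta| = max_k (t_{k+1} - t_k) for a partition t_0 < t_1 < ... < t_N."""
    return max(b - a for a, b in zip(partition, partition[1:]))


def refine(partition):
    """Insert the midpoint of every subinterval; all old break points are kept."""
    out = [partition[0]]
    for a, b in zip(partition, partition[1:]):
        out.extend([(a + b) / 2, b])
    return out


T = Fraction(1)
delta = [Fraction(0), T]             # delta_0 = {0, T}
for k in range(5):
    finer = refine(delta)
    assert set(delta) <= set(finer)   # delta_k is contained in delta_{k+1}
    assert mesh(finer) == mesh(delta) / 2
    delta = finer
print(len(delta), mesh(delta))       # 33 1/32
```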
Lemma 8. There exist the limits
$$V^+(z_0,T) = \lim_{|\delta_k|\to 0} V_{\delta_k}^+(z_0,T), \qquad V^-(z_0,T) = \lim_{|\delta_k|\to 0} V_{\delta_k}^-(z_0,T)$$
that are independent of the choice of a partition sequence $\{\delta_k\}$.
The existence of the limits $V^+$, $V^-$ for each fixed sequence $\{\delta_k\}$, $\delta_{k+1} \supset \delta_k$, $|\delta_k| \to 0$, follows from the monotonicity and boundedness of the sequences $\{V_{\delta_k}^+\}$, $\{V_{\delta_k}^-\}$. We show that for all uniformly decreasing sequences $\{\delta_k\}$ and $\{\delta'_k\}$ of partitions of the interval the following relationship holds:
$$\lim_{k\to\infty} V_{\delta_k}^-(z_0,T) = \lim_{k\to\infty} V_{\delta'_k}^-(z_0,T).$$
Denote $\lim_{k\to\infty} V_{\delta_k}^- = V^-(\{\delta_k\})$ and $\lim_{k\to\infty} V_{\delta'_k}^- = V^-(\{\delta'_k\})$. Set $L = \sup V^-(\{\delta_k\})$ (here the supremum is taken over the set of all uniformly decreasing sequences of partitions). It is fairly easy to see that $L < \infty$, since all $V^-(\{\delta_k\})$ are bounded by the number $\sup M(z)$, $z \in C^T(z_0)$. Here $M(z)$ is the terminal payoff, which is a continuous function, and $C^T(z_0)$ is the reachability set of system (2.6.1). Let an arbitrary $\eta > 0$ be given. By the definition of $L$, there is a partition $\delta$ such that
$$V_\delta^-(z_0,T) > L - \frac{\eta}{2}.$$
Let $m$ be the number of interior points of the partition $\delta$. Because of the continuity of the reachability set $C^T(z_0)$ (in the Hausdorff metric) and the continuity of the payoff $M$, for any natural number $m$ and any $\varepsilon > 0$ there is $\delta(\varepsilon) > 0$ such that, if the partition $\delta'$ satisfies $|t'_i - t_i| < \delta(\varepsilon)$ for all $i$, then for any partition $\delta''$ containing $m$ interior points the following inequality is satisfied:
$$\left|V_{\delta'\cup\delta''}^-(z_0,T) - V_{\delta\cup\delta''}^-(z_0,T)\right| < \varepsilon.$$
We choose $\delta'$ in such a way that $|t'_i - t_i| < \delta(\eta/2)$. Then
$$\left|V_{\delta'\cup\delta''}^-(z_0,T) - V_{\delta\cup\delta''}^-(z_0,T)\right| < \varepsilon = \frac{\eta}{2}.$$
Moreover, by (2.6.4), $V_{\delta\cup\delta''}^-(z_0,T) \ge V_\delta^-(z_0,T) > L - \eta/2$, whence
$$V_{\delta'\cup\delta''}^-(z_0,T) > L - \eta.$$
The latter inequality shows that for any decreasing sequence $\{\delta'_k\}$ of partitions of the interval $[0,T]$
$$\lim_{k\to\infty} V_{\delta'_k}^-(z_0,T) = L.$$
This proves the first part of Lemma 8. Its second part is proved in much the same way.

Theorem 6. In the games $\Gamma^+(z_0,T)$ and $\Gamma^-(z_0,T)$, for any $\varepsilon > 0$ there are $\varepsilon$-equilibrium points, and $V^+(z_0,T)$, $V^-(z_0,T)$ are the values of the games $\Gamma^+(z_0,T)$ and $\Gamma^-(z_0,T)$ respectively.
Proof: Let us prove the statement of the theorem for the game $\Gamma^-(z_0,T)$; the proof for the game $\Gamma^+(z_0,T)$ is similar. Fix an arbitrary $\varepsilon > 0$. Using Lemma 8, we may find a partition $\delta$ such that
$$V^-(z_0,T) - V_\delta^-(z_0,T) < \varepsilon. \qquad (2.6.6)$$
Let $u_\delta(\cdot)$, $v_\delta(\cdot)$ be optimal strategies of the players P and E in the game $\Gamma_\delta^-(z_0,T)$. We construct the following strategy of Player P in the game $\Gamma^-(z_0,T)$. Let $v(\cdot) = (\tau,\beta)$ be a strategy of Player E; then $u^*(\cdot) = (\delta,\alpha)$ denotes the strategy of Player P which, at the time instants $t_k \in \delta \cup \tau$, places each of his information states in correspondence with the same open-loop control $u(t)$, $t \in [t_k, t_{k+1})$, as the strategy $u_{\delta\cup\tau}(\cdot)$ that is optimal in the game $\Gamma_{\delta\cup\tau}^-(z(t_k), T - t_k)$ from the initial state $z(t_k)$, of duration $T - t_k$, with the partition $\delta \cup \tau$. We show that the strategy $u^*(\cdot)$ is optimal for Player P in the game $\Gamma^-(z_0,T)$.

Evidently, as follows from (2.6.6), by adopting the strategy $v_\delta(\cdot)$ that is optimal in the game $\Gamma_\delta^-(z_0,T)$, Player E can ensure a payoff of at least $V^-(z_0,T) - \varepsilon$. We show that, by playing the strategy $u^*(\cdot)$, Player P guarantees that Player E's payoff does not exceed $V^-(z_0,T) = L$.
Suppose the opposite is true. Let $\bar v(\cdot) = \{\tau', \beta\}$ be a strategy of Player E under which the payoff in the situation $(u^*(\cdot), \bar v(\cdot))$ is $L + c > L$. Consider the discrete game $\Gamma_{\delta\cup\tau'}^-(z_0,T)$ with the partition $\delta \cup \tau'$. In the situation $(u^*(\cdot), \bar v(\cdot))$, the game $\Gamma^-(z_0,T)$ is realized as a discrete game with the partition $\delta \cup \tau'$. In addition, as follows from the definition of the strategy $u^*(\cdot)$, Player P chooses the same controls $u(t)$ for $t \in [t_k, t_{k+1})$ as he would when playing optimally in the game $\Gamma_{\delta\cup\tau'}^-(z_0,T)$, i.e. he ensures that Player E's payoff will not exceed $V_{\delta\cup\tau'}^-(z_0,T)$. From (2.6.4), however, we have
$$V_{\delta\cup\tau'}^-(z_0,T) \le V^-(z_0,T) = L,$$
which contradicts the assumption that the payoff equals $L + c$. Hence Player P can ensure that Player E's payoff will not exceed $L$. This proves the theorem. $\blacksquare$

2.7 Alternative for games with dependent motions and discrimination
Consider the games with dependent motions from §6. Let a compact set $S$ be given in $R^n$. In some cases we have to find out whether Player P can ensure that the trajectory $z(t)$ penetrates the terminal set $S$ in a fixed time $T$ from the initial position $z_0 \in R^n$. If this is not possible, then we have to find out whether Player E can ensure that the point $z(t)$ avoids falling on the set $S$ during the time $T$.

The reasoning given in the previous section points out the way of solving this problem for the upper $\Gamma^+(z_0,T)$ and the lower $\Gamma^-(z_0,T)$ games with discrimination.
Theorem 6 can be readily generalized to the case when the payoff of Player E is a continuous functional of the realized trajectory; in particular, it can be generalized to the case when the payoff has the form
$$\min_{0\le t\le T} \rho(z(t), S), \qquad (2.7.1)$$
where $z(t)$ is the realized trajectory and $\rho$ is the distance to the set $S$. This section does not provide the corresponding proofs, since they are exactly as in §6. Now let $\Gamma^+(z_0,T)$ and $\Gamma^-(z_0,T)$ be the upper and lower games with discrimination and payoff (2.7.1), $V^+(z_0,T)$ and $V^-(z_0,T)$ being the corresponding values of the games. We shall study the game $\Gamma^-(z_0,T)$ in some detail; the game $\Gamma^+(z_0,T)$ is studied in much the same way. The following two cases (two alternatives) are possible:
1. $V^-(z_0,T) > 0$;
2. $V^-(z_0,T) = 0$.
Consider case 1. Let $K(z_0;u(\cdot),v(\cdot))$ be the payoff function in the game under study. It follows from the definition of $V^-(z_0,T)$ that for any $\varepsilon > 0$ there is a strategy $v_\varepsilon(\cdot)$ of Player E such that $K(z_0;u(\cdot),v_\varepsilon(\cdot)) \ge V^-(z_0,T) - \varepsilon$ for all strategies $u(\cdot)$. By choosing a sufficiently small $\varepsilon$, we may ensure that
$$K(z_0;u(\cdot),v_\varepsilon(\cdot)) \ge V^-(z_0,T) - \varepsilon > 0$$
for all strategies $u(\cdot)$ of Player P. The form of the function $K$ implies that, by adopting the strategy $v_\varepsilon(\cdot)$, Player E can ensure that the inequality $\min_{0\le t\le T}\rho(z(t),S) > 0$ is satisfied independently of Player P's actions, i.e. in case 1 Player E ensures that the point $z$ does not fall on the set $S$ during the time interval $[0,T]$, independently of Player P's actions.

Consider case 2. Let $T_0$ be the least root of the equation $V^-(z_0,t) = 0$ ($T_0 \le T$). The definition of $V^-(z_0,t)$ then implies that, in the game $\Gamma^-(z_0,T_0)$, for any $\varepsilon > 0$ Player P has a strategy $u_\varepsilon(\cdot)$ which ensures that the inequality
$$K(z_0;u_\varepsilon(\cdot),v(\cdot)) \le V^-(z_0,T_0) + \varepsilon = \varepsilon$$
is satisfied for all strategies $v(\cdot)$ of Player E. From the form of the function $K$ it follows that, by adopting the strategy $u_\varepsilon(\cdot)$, Player P can ensure that the inequality $\min_{0\le t\le T_0}\rho(z(t),S) \le \varepsilon$ is satisfied independently of Player E's actions. Extending the strategy $u_\varepsilon(\cdot)$ to the interval $[0,T]$, we see that in case 2, for any $\varepsilon > 0$, Player P can guarantee the entry into the $\varepsilon$-neighborhood of the set $S$ in the time $T$ independently of Player E's actions. We have proved the following theorem (about alternative).

Theorem 7. In the game $\Gamma^-(z_0,T)$ ($z_0 \in R^n$, $T > 0$) at least one of the following alternatives holds.
1. Player E can guarantee the avoidance of the entry into the terminal set $S$ from the initial state $z_0$ during the time $T$ independently of Player P's actions.
2. For any $\varepsilon > 0$, Player P can guarantee the entry into the $\varepsilon$-neighborhood of the terminal set $S$ from the initial state $z_0$ during the time $T$ independently of Player E's actions.

A similar theorem also holds for the game $\Gamma^+(z_0,T)$. More general and finer theorems (for the game $\Gamma(z_0,T)$) were proved by N. N. Krasovsky [12]. The corresponding constructions use wide classes of mixed strategies. The remaining chapters of the book do not deal with dependent motions.
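The payoff (2.7.1) and the two cases of the alternative can be illustrated on sampled trajectories: a positive payoff corresponds to avoiding $S$, a zero payoff to reaching it. The sketch assumes, purely for illustration, that the terminal set $S$ is a closed ball with center $c$ and radius $r$, so that $\rho(z,S) = \max(|z-c| - r, 0)$; taking the minimum over finitely many samples $z(t_i)$ can only overestimate the minimum over $[0,T]$. The trajectories are hypothetical.

```python
import math


def dist_to_ball(z, center, radius):
    """rho(z, S) for the hypothetical terminal set S = closed ball B(center, radius)."""
    return max(math.dist(z, center) - radius, 0.0)


def sampled_payoff(trajectory, center, radius):
    """min_i rho(z(t_i), S) over sampled points: an upper bound on (2.7.1)."""
    return min(dist_to_ball(z, center, radius) for z in trajectory)


path_miss = [(3.0, 0.0), (2.0, 0.5), (1.5, 0.0)]
path_hit = [(3.0, 0.0), (1.0, 0.0), (0.5, 0.0)]
print(sampled_payoff(path_miss, (0.0, 0.0), 1.0))  # positive: S not reached by the samples
print(sampled_payoff(path_hit, (0.0, 0.0), 1.0))   # 0.0: S reached
```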
Chapter 3
Class of pursuit-evasion games with optimal open-loop strategy for evader

3.1 Discrete game with terminal payoff and discrimination for player E
The game $\Gamma_\delta(x_0, y_0, T)$ that will be discussed in this section differs from the game $\Gamma(x_0, y_0, T)$ defined in §2, Ch. 2, in the strategy sets $P$, $E$ of the players.
Let $\delta = \{0 = t_0 < t_1 < \dots < t_k < \dots < t_n = T\}$ be a fixed partition of the time interval $[0, T]$.
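The strategy set $E_\delta$ in the game $\Gamma_\delta(x_0, y_0, T)$ is a subset of the strategy set $E$ in the game $\Gamma(x_0, y_0, T)$ and is defined as follows:

$$E_\delta = \{(\tau, \beta) : \tau = \delta,\ (\tau, \beta) \in E\},$$

i.e. this is the set of all pairs $(\tau, \beta) \in E$ whose partition $\tau$ coincides with $\delta$.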
for w h i c h the p a r t i t i o n T = 8 is the same. T h e strategy space for P l a y e r P is composed of pairs {6, a } ,
where 6 is
the p a r t i t i o n of [0,7"] fixed throughout the game (and coinciding with the p a r t i t i o n of the same i n t e r v a l for Player E), a n d a is the map which places an admissible o p e n - l o o p control u(t) on the interval [t*,ifcfi) in correspondence w i t h each p o i n t t
k
€ 8, the p o s i t i o n x(t ), y(t ), k
k
and the o p e n - l o o p control
u(£),fe[r*,r* ). + 1
The strategy spaces for Players P and E in the game $\Gamma_\delta(x_0, y_0, T)$ are denoted by $P_\delta$ and $E_\delta$, respectively.
The construction of strategies for Player P suggests that the information on $x(t_k)$, $y(t_k)$ and on Player E's control $v(t)$ for $t \in [t_k, t_{k+1})$ is supplied at each instant $t_k$. This may be understood as follows. At the instant $t_k$ Player E is the first to choose $v(t)$ for $t \in [t_k, t_{k+1})$ and reports his choice to Player P. Then Player P chooses $u(t)$ for $t \in [t_k, t_{k+1})$. In this case Player P is
in a privileged position, and the game is called the game with discrimination for Player E. Note that discrimination for Player E is not assumed in the game $\Gamma(x_0, y_0, T)$. Denote the strategies in the game $\Gamma_\delta(x_0, y_0, T)$ by $u_\delta(\cdot)$, $v_\delta(\cdot)$. Evidently, all the constructions involving the derivation of a unique solution of the system $\dot x = f(x, u_\delta(\cdot))$, $\dot y = g(y, v_\delta(\cdot))$ with the initial conditions $x_0$, $y_0$ in the situation $(u_\delta(\cdot), v_\delta(\cdot))$ are made just as in §2, Ch. 2. Recall that Player E's payoff in the game with terminal payoff is defined as $\rho(x(T), y(T))$, where $\rho$ is the Euclidean distance between the players' positions $x(T)$, $y(T)$ as the game terminates (see §2, Ch. 2). Denote by $C_P^T(x_0)$ and $C_E^T(y_0)$ the reachability sets of Players P and E from the initial states $x_0$, $y_0$ at the time instant $T$, and let

$$\rho_T(x_0, y_0) = \max_{\eta \in C_E^T(y_0)} \min_{\xi \in C_P^T(x_0)} \rho(\xi, \eta).$$

Using the designations from §3, Ch. 2, we have $P(x_0, 0, T) = C_P^T(x_0)$, $E(y_0, 0, T) = C_E^T(y_0)$.
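To make the max-min quantity $\rho_T(x_0, y_0)$ concrete, here is a minimal sketch for a hypothetical example that is not taken from the text: simple motion in the plane, $\dot x = u$, $|u| \le a$, $\dot y = v$, $|v| \le b$, for which $C_P^T(x_0)$ and $C_E^T(y_0)$ are closed disks of radii $aT$ and $bT$. Under that assumption the inner minimum has the closed form $\max(0, |\eta - x_0| - aT)$, so only the outer maximum needs sampling; the function names are illustrative.

```python
import math

# Hypothetical example: simple motion in the plane, so C_P^T(x0) and C_E^T(y0)
# are closed disks of radii a*T and b*T centred at x0 and y0.

def rho_T_sampled(x0, y0, a, b, T, n=720):
    """Approximate max_{eta in C_E^T(y0)} min_{xi in C_P^T(x0)} |xi - eta|.

    For fixed eta the inner minimum over P's disk is max(0, |eta - x0| - a*T).
    That expression is convex in eta, so the outer maximum over E's disk is
    attained on its boundary circle, which is sampled at n points.
    """
    best = 0.0
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        eta = (y0[0] + b * T * math.cos(phi), y0[1] + b * T * math.sin(phi))
        best = max(best, max(0.0, math.dist(eta, x0) - a * T))
    return best

def rho_T_closed_form(x0, y0, a, b, T):
    """Closed form that holds only under the simple-motion assumption above."""
    return max(0.0, math.dist(x0, y0) + b * T - a * T)
```

With $x_0 = (0, 0)$, $y_0 = (3, 0)$, $a = 2$, $b = 1$, $T = 1$ both functions return 2.0; for $T = 4$ E's disk lies inside P's disk and the sampled value is 0.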
Let $\rho_T(x_0, y_0) = \rho(\bar\xi, \bar\eta)$ (the maximin is achieved at the points $\bar\xi \in C_P^T(x_0)$, $\bar\eta \in C_E^T(y_0)$), where $\rho(\xi, \eta)$ is the Euclidean distance between the points $\xi$, $\eta$. Any trajectory $x^*(t)$, $0 \le t \le T$, joining the points $x_0$ and $\bar\xi$ ($x^*(0) = x_0$, $x^*(T) = \bar\xi$) is called a conditionally optimal trajectory for Player P, and any trajectory $y^*(t)$, $0 \le t \le T$, joining the points $y_0$ and $\bar\eta$ ($y^*(0) = y_0$, $y^*(T) = \bar\eta$) is called a conditionally optimal trajectory for Player E.
The point $M$ is called the center of pursuit in the game $\Gamma_\delta(x_0, y_0, T)$ if

$$\rho_T(x_0, y_0) = \max_{\eta' \in C_E^T(y_0)} \min_{\xi' \in C_P^T(x_0)} \rho(\xi', \eta') = \rho(\xi, M) \qquad (3.1.1)$$

($\rho_T$ is achieved at the points $\xi \in C_P^T(x_0)$, $M \in C_E^T(y_0)$). The center of pursuit need not be unique. To every center of pursuit $M$ there correspond points $\xi \in C_P^T(x_0)$ from (3.1.1).
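For the same hypothetical simple-motion example (disks of radii $aT$ and $bT$, an assumption rather than the text's general dynamics), a center of pursuit $M$ and a matching point $\xi$ from (3.1.1) can be written down directly: $M$ is the point of E's disk farthest from $x_0$, and $\xi$ is the point of P's disk nearest to $M$. The sketch below is illustrative only; when $\rho_T = 0$ every point of E's disk attains the maximin, so the $M$ it returns is just one of many centers.

```python
import math

def center_of_pursuit(x0, y0, a, b, T):
    """Return (xi, M, rho_T) for hypothetical simple motion with disk reachability sets.

    M is the point of E's disk farthest from x0 (it maximizes the distance to
    P's disk), and xi is the point of P's disk nearest to M, so rho(xi, M) = rho_T.
    """
    d = math.dist(x0, y0)
    if d == 0.0:
        raise ValueError("x0 == y0: every boundary point of E's disk is a center")
    ux, uy = (y0[0] - x0[0]) / d, (y0[1] - x0[1]) / d   # unit vector from x0 to y0
    M = (y0[0] + b * T * ux, y0[1] + b * T * uy)
    reach = min(a * T, d + b * T)          # P stops at M if M lies inside his disk
    xi = (x0[0] + reach * ux, x0[1] + reach * uy)
    return xi, M, math.dist(xi, M)
```

For $x_0 = (0, 0)$, $y_0 = (3, 0)$, $a = 2$, $b = 1$, $T = 1$ this gives $M = (4, 0)$, $\xi = (2, 0)$ and $\rho(\xi, M) = 2$, in agreement with the closed form $\max(0, \rho(x_0, y_0) + bT - aT)$.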
(In [10, 12] the quantity $\rho_T(x_0, y_0)$ is called the "hypothetic mismatch" between the sets $C_E^T(y_0)$ and $C_P^T(x_0)$.)

Let $x^*(t)$ be the conditionally optimal trajectory for Player P leading to $\xi$, and $y^*(t)$ the conditionally optimal trajectory for Player E leading to $M$. We say that the trajectories $x^*(t)$, $y^*(t)$ correspond to one another. Consider an auxiliary game $\Gamma_\delta^\sigma(x, y, T)$ which is the same as the game $\Gamma_\delta(x, y, T)$ except that here Player E remains at the point $y$ when $t \in [0, \sigma]$, i.e. Player E's motion takes place with the delay $\sigma$. The motion equations and the payoffs of the players are the same as in the game $\Gamma_\delta(x, y, T)$. Let $\tilde M$, $\tilde x^*(t)$, $\tilde y^*(t)$ be the center of pursuit and the relevant conditionally optimal trajectories in the game $\Gamma_\delta^\sigma(x, y, T)$. We say that in the game $\Gamma_\delta(x, y, T)$ there is an invariant center of pursuit if the following conditions are satisfied.
1. The center of pursuit is the same in all of the games $\Gamma_\delta(x^*(t), y^*(t), T - t)$ from the initial states $x^*(t)$, $y^*(t)$ with the duration $T - t$, where $x^*(t)$, $y^*(t)$ is an arbitrary pair of relevant conditionally optimal trajectories in the game $\Gamma_\delta(x_0, y_0, T)$.

2. Fix $t$, $x^*(t)$, $y^*(t)$, and let $\tilde x^*(t + \alpha)$ ($0 \le \alpha \le T - t$, $\tilde x^*(t + \alpha)|_{\alpha = 0} = x^*(t)$) be a conditionally optimal trajectory for Player P in the game $\Gamma_\delta^\delta(x^*(t), y, T - t)$, where $y \in C_E^\delta(y^*(t))$. Consider the current games $\Gamma_\delta^{\delta - \alpha}(\tilde x^*(t + \alpha), y, T - t - \alpha)$ for $0 \le \alpha \le \delta$ and fixed $t$, $y \in C_E^\delta(y^*(t))$. Let $\tilde M(\alpha)$ be a center of pursuit in the game $\Gamma_\delta^{\delta - \alpha}(\tilde x^*(t + \alpha), y, T - t - \alpha)$ for $0 \le \alpha \le \delta$, $y \in C_E^\delta(y^*(t))$. We require that the point $\tilde M(\alpha)$ not depend on $\alpha$ for all $0 \le \alpha \le \delta$ (i.e. it occupies the same position).
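Condition 1 can be checked numerically in the hypothetical simple-motion example (disks of radii $aT$, $bT$, with $a > b$ and $\rho_T(x_0, y_0) > 0$; assumptions, not the text's setting). There the conditionally optimal trajectories are the straight full-speed runs from $x_0$ towards $\xi$ and from $y_0$ towards $M$, since those are the only ways to cover the distances $aT$ and $bT$ in time $T$. The sketch recomputes the center of the game $\Gamma_\delta(x^*(t), y^*(t), T - t)$ along these runs; it does not test condition 2, and the helper names are illustrative.

```python
import math

def center(x, y, a, b, tau):
    """Center of pursuit and rho_tau(x, y) for disk reachability sets (simple motion)."""
    d = math.dist(x, y)
    ux, uy = (y[0] - x[0]) / d, (y[1] - x[1]) / d
    M = (y[0] + b * tau * ux, y[1] + b * tau * uy)
    return M, max(0.0, d + b * tau - a * tau)

def check_invariance(x0, y0, a, b, T, steps=10, tol=1e-9):
    """True if the center and rho_{T-t} stay fixed along the straight optimal runs."""
    d = math.dist(x0, y0)
    ux, uy = (y0[0] - x0[0]) / d, (y0[1] - x0[1]) / d
    M0, rho0 = center(x0, y0, a, b, T)
    for k in range(steps + 1):
        t = T * k / steps
        xt = (x0[0] + a * t * ux, x0[1] + a * t * uy)   # x*(t)
        yt = (y0[0] + b * t * ux, y0[1] + b * t * uy)   # y*(t)
        M, rho = center(xt, yt, a, b, T - t)
        if math.dist(M, M0) > tol or abs(rho - rho0) > tol:
            return False
    return True
```

Along these runs the distance between the players is $\rho(x_0, y_0) - (a - b)t$, so the quantity $\rho_{T-t}(x^*(t), y^*(t))$ stays equal to $\rho_T(x_0, y_0)$, which is what the check confirms.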
We prove the following lemma.

Lemma 1 Let $x^*(t)$, $y^*(t)$ be a pair of relevant conditionally optimal trajectories in the game $\Gamma_\delta(x_0, y_0, T)$, the center of pursuit being invariant. Then there exists a solution $x'(t_0), x'(t_1), \dots, x'(t_n)$ ($t_n = T$, $x'(t_0) = x_0$) of the recurrent equations

$$\rho_{T - t_{k+1}}(x'(t_{k+1}), y^*(t_{k+1})) = \min_{x \in C_P^\delta(x'(t_k))} \rho_{T - t_{k+1}}(x, y^*(t_{k+1})) \qquad (3.1.2)$$

lying on the conditionally optimal trajectory $x^*(t)$, i.e. $x'(t_k) = x^*(t_k)$.

Proof: From the invariance condition for the center of pursuit $M$ we have

$$\rho(\xi, M) = \rho_{T - t}(x^*(t), y^*(t)) \quad \text{for all } t \in [0, T].$$

Along the relevant conditionally optimal trajectories $x^*(t)$, $y^*(t)$ we therefore have

$$\rho_{T - t_k}(x^*(t_k), y^*(t_k)) = \rho_{T - t_{k+1}}(x^*(t_{k+1}), y^*(t_{k+1})). \qquad (3.1.3)$$

At the same time, for all $x \in C_P^\delta(x^*(t_k))$ the inequality

$$\rho_{T - t_k}(x^*(t_k), y^*(t_k)) \le \rho_{T - t_{k+1}}(x, y^*(t_{k+1}))$$

is satisfied. Indeed,

$$\rho_{T - t_k}(x^*(t_k), y^*(t_k)) = \max_{\eta \in C_E^{T - t_k}(y^*(t_k))} \min_{x' \in C_P^{T - t_k}(x^*(t_k))} \rho(x', \eta) = \min_{x' \in C_P^{T - t_k}(x^*(t_k))} \rho(x', M),$$

$$\rho_{T - t_{k+1}}(x, y^*(t_{k+1})) = \max_{\eta \in C_E^{T - t_{k+1}}(y^*(t_{k+1}))} \min_{x' \in C_P^{T - t_{k+1}}(x)} \rho(x', \eta) \ge \min_{x' \in C_P^{T - t_{k+1}}(x)} \rho(x', M),$$

since $M \in C_E^{T - t_{k+1}}(y^*(t_{k+1}))$. However, since $x \in C_P^\delta(x^*(t_k))$, we have $C_P^{T - t_{k+1}}(x) \subset C_P^{T - t_k}(x^*(t_k))$, and then

$$\min_{x' \in C_P^{T - t_{k+1}}(x)} \rho(x', M) \ge \min_{x' \in C_P^{T - t_k}(x^*(t_k))} \rho(x', M).$$

The latter inequality, together with Eq. (3.1.3), proves the lemma. ∎
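The recurrence (3.1.2) can be solved step by step by brute force. The following sketch does this for the hypothetical simple-motion example (closed-form $\rho_\tau(x, y) = \max(0, |x - y| + (b - a)\tau)$, E moving along $y^*(t)$ straight away from $x_0$ at speed $b$); these modelling choices and the function names are assumptions, not the text's. Each step minimizes over sample points of $C_P^\delta(x'(t_k))$, and for the parameters used the resulting points fall on the straight conditionally optimal trajectory $x^*(t)$, as Lemma 1 asserts, while $\rho_{T - t_k}$ stays constant as in (3.1.3).

```python
import math

def rho(x, y, a, b, tau):
    """Closed-form rho_tau(x, y) for hypothetical simple motion (disk reachability sets)."""
    return max(0.0, math.dist(x, y) + (b - a) * tau)

def lemma1_path(x0, y0, a, b, T, n_steps, n_dirs=3600):
    """Solve the recurrence (3.1.2) along E's conditionally optimal path y*(t).

    At each step the minimum over C_P^delta(x(t_k)) is taken over the disk's
    centre and n_dirs points of its boundary circle; this suffices when
    y*(t_{k+1}) lies outside that disk, as in the example below.
    """
    d = math.dist(x0, y0)
    ux, uy = (y0[0] - x0[0]) / d, (y0[1] - x0[1]) / d
    delta = T / n_steps
    x, path = x0, [x0]
    for k in range(n_steps):
        t_next = (k + 1) * delta
        y_next = (y0[0] + b * t_next * ux, y0[1] + b * t_next * uy)  # y*(t_{k+1})
        tau = T - t_next
        candidates = [x] + [
            (x[0] + a * delta * math.cos(2 * math.pi * j / n_dirs),
             x[1] + a * delta * math.sin(2 * math.pi * j / n_dirs))
            for j in range(n_dirs)
        ]
        x = min(candidates, key=lambda p: rho(p, y_next, a, b, tau))
        path.append(x)
    return path
```

With $x_0 = (0, 0)$, $y_0 = (6, 0)$, $a = 2$, $b = 1$, $T = 2$ and four steps, the points are $(0,0), (1,0), \dots, (4,0)$, i.e. $x^*(t_k) = (2t_k, 0)$, and $\rho_{T - t_k}(x'(t_k), y^*(t_k)) = 4$ at every step.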
Definition 1. The set of points $\{x', y', T'\}$ such that $T' \in [0, T]$, $x' \in C_P^{T - T'}(x)$, $y' \in C_E^{T - T'}(y)$ is called the singular surface in the game $\Gamma_\delta(x, y, T)$ if in the games $\Gamma_\delta(x', y', T')$ there is no invariant center of pursuit. The singular surface is called dispersal if Player E's choice of the control $v(t)$, $t \in [t_k, t_{k+1})$, on the singular surface defines uniquely the invariant center of pursuit $M$ with respect to the reachability sets $C_P^{T - t_k}(x(t_k))$ and $C_E^{T - t_k}(y(t_k), v)$. Here $C_E^{T - t_k}(y(t_k), v)$ is the subset of the set $C_E^{T - t_k}(y(t_k))$ into which Player E may arrive at the instant $T$ from the state $y(t_k)$ with the fixed starting interval of control $v(t)$, $t \in [t_k, t_{k+1})$. This set coincides with $C_E^{T - t_{k+1}}(y(t_{k+1}))$, where $y(t_{k+1})$ is the point at which Player E arrives from the state $y(t_k)$ when using the control $v(t)$, $t \in [t_k, t_{k+1})$.
Theorem 1 Suppose that in the game $\Gamma_\delta(x_0, y_0, T)$ there is an invariant center of pursuit and $\rho_T(x_0, y_0) > 0$. Then there exists an equilibrium point, the conditionally optimal trajectories are optimal, and the value of the game is equal to $V_\delta(x_0, y_0, T) = \rho_T(x_0, y_0)$. Moreover, the optimal strategies for Players P and E are of the following form. For Player E, $v_\delta^*(\cdot) = (\delta, \beta^*)$, where the map $\beta^*$ associates each point $x(t_k)$, $y(t_k)$, $t_k$ with the control $v^*(t)$, $t \in [t_k, t_{k+1})$, which defines the motion of the point E on the conditionally optimal trajectory $y^*(t)$ in the game $\Gamma_\delta(x_0, y_0, T)$ aiming at the invariant center of pursuit (i.e. the piecewise open-loop strategy $v_\delta^*(\cdot)$ is a function of time only). For Player P, $u_\delta^*(\cdot) = (\delta, \alpha^*)$, where $\alpha^*$ places each point $x(t_k)$, $y(t_k)$, $t_k$ and control $v(t)$, $t \in [t_k, t_{k+1})$, of Player E in correspondence with the control $u(t)$, $t \in [t_k, t_{k+1})$, transferring the point $x$ from the state $x(t_k)$ to a state $x(t_{k+1})$ such that

$$\rho_{T - t_{k+1}}(x(t_{k+1}), y(t_{k+1})) = \min_{x \in C_P^\delta(x(t_k))} \rho_{T - t_{k+1}}(x, y(t_{k+1})). \qquad (3.1.4)$$

Here $y(t_{k+1})$ is the point at which Player E arrives at the time instant $t_{k+1}$ when choosing the control $v(t)$, $t \in [t_k, t_{k+1})$.
Proof: We show that

$$K(x_0, y_0; u_\delta^*(\cdot), v_\delta^*(\cdot)) = \rho_T(x_0, y_0). \qquad (3.1.5)$$

Let $y^*(t)$ be the conditionally optimal trajectory of Player E directed towards the invariant center of pursuit $M$ whose existence is assumed in the theorem ($y^*(0) = y_0$, $y^*(T) = M$). In the situation $(u_\delta^*(\cdot), v_\delta^*(\cdot))$ the trajectory $x^*(t)$ of Player P at the time instants $t_k$, $k = 0, 1, \dots, n$, passes through the points $x^*(t_k)$ obtained as a solution of the recurrent equation (Lemma 1)

$$\rho_{T - t_{k+1}}(x^*(t_{k+1}), y^*(t_{k+1})) = \min_{x \in C_P^\delta(x^*(t_k))} \rho_{T - t_{k+1}}(x, y^*(t_{k+1})). \qquad (3.1.6)$$

For any solution of this system, however, we have

$$\rho_T(x_0, y_0) = \rho_{T - t_1}(x^*(t_1), y^*(t_1)) = \dots = \rho_{T - t_k}(x^*(t_k), y^*(t_k)) = \dots = \rho(x^*(T), M) = K(x_0, y_0; u_\delta^*(\cdot), v_\delta^*(\cdot)) = \text{const}. \qquad (3.1.7)$$
We have thus proved the equality (3.1.5). We now show that

$$K(x_0, y_0; u_\delta(\cdot), v_\delta^*(\cdot)) \ge \rho_T(x_0, y_0) \ge K(x_0, y_0; u_\delta^*(\cdot), v_\delta(\cdot)) \qquad (3.1.8)$$

for all $u_\delta(\cdot) \in P_\delta$ and $v_\delta(\cdot) \in E_\delta$. First we prove the validity of the left-hand inequality. Let $u_\delta(\cdot)$ be a strategy of Player P, and $x(T)$ the point at which he arrives when the game ends in the situation $(u_\delta(\cdot), v_\delta^*(\cdot))$. Since

$$\rho_T(x_0, y_0) = \max_{\eta \in C_E^T(y_0)} \min_{\xi \in C_P^T(x_0)} \rho(\xi, \eta) = \min_{\xi \in C_P^T(x_0)} \rho(\xi, M),$$

for any point $\xi' \in C_P^T(x_0)$ the inequality $\rho_T(x_0, y_0) \le \rho(\xi', M)$ is satisfied. In particular, $\rho_T(x_0, y_0) \le \rho(x(T), M)$, and since the strategy $v_\delta^*(\cdot)$ transfers $y_0$ to the point $M$, we have $\rho(x(T), M) = K(x_0, y_0; u_\delta(\cdot), v_\delta^*(\cdot))$; this proves the left-hand inequality in (3.1.8).

We now prove the validity of the right-hand inequality in (3.1.8). In the game $\Gamma_\delta(x_0, y_0, T)$, at each time instant $t_k$ the pursuer P has complete information on the trajectory of Player E for $t \in [t_k, t_{k+1})$. In particular, this means that at each time instant $t_k$ he knows the point $y(t_{k+1})$ which is reached by E in the time $t_{k+1} - t_k$. Now assume that Player E deviates from his conditionally optimal trajectory (control $v^*(t)$) during the time $t_{k+1} - t_k$, choosing on the interval $[t_k, t_{k+1})$ a control $v(t)$ different from $v^*(t)$.
Since Player E has held the control $v^*(t)$ up to the instant $t_k$, from the definition of the strategy $v_\delta^*(\cdot)$ of Player E and from the invariance of the point $M$ we have

$$\rho_T(x_0, y_0) = \rho_{T - t_k}(x^*(t_k), y^*(t_k)).$$

Let $D_E^{T - t_k}(y(t_{k+1}))$ be the subset of the reachability set $C_E^{T - t_k}(y^*(t_k))$ of Player E which is obtained from $C_E^{T - t_k}(y^*(t_k))$ under the additional requirement that, at the instant $t_{k+1}$, E passes through the point $y(t_{k+1})$. Then

$$\rho_T(x_0, y_0) = \rho_{T - t_k}(x^*(t_k), y^*(t_k)) \ge \max_{\eta \in D_E^{T - t_k}(y(t_{k+1}))} \min_{\xi \in C_P^{T - t_k}(x^*(t_k))} \rho(\xi, \eta). \qquad (3.1.9)$$

Consider the auxiliary game $\Gamma_\delta^\delta(x^*(t_k), y, T - t_k)$, where E remains at the point $y = y(t_{k+1})$ during the time $\delta$.
Let $\tilde x^*(t)$ be a conditionally optimal trajectory of Player P in the game $\Gamma_\delta^\delta(x^*(t_k), y, T - t_k)$. For $0 \le \alpha \le \delta$, let $\Gamma_\delta^{\delta - \alpha}(\tilde x^*(t_k + \alpha), y, T - t_k - \alpha)$, with $y = y(t_{k+1})$ and $\tilde x^*(t_k) = x^*(t_k)$, be the game where Player E remains at the point $y = y(t_{k+1})$ during the time $t_{k+1} - (t_k + \alpha) = \delta - \alpha$, and let $\rho_{T - t_k - \alpha}^{\delta - \alpha}(\tilde x^*(t_k + \alpha), y)$ be the maximin distance between the reachability sets computed in this game. Then

$$\rho_{T - t_k}^{\delta}(x^*(t_k), y) = \max_{\eta \in D_E^{T - t_k}(y(t_{k+1}))} \min_{\xi \in C_P^{T - t_k}(x^*(t_k))} \rho(\xi, \eta) \le \rho_{T - t_k}(x^*(t_k), y^*(t_k)). \qquad (3.1.10)$$
From the existence of the invariant center of pursuit in the game $\Gamma_\delta(x_0, y_0, T)$ (see condition 2) it follows that, for all $0 \le \alpha \le \delta$, the center of pursuit in the games $\Gamma_\delta^{\delta - \alpha}(\tilde x^*(t_k + \alpha), y, T - t_k - \alpha)$ is the same; therefore, for all $0 \le \alpha \le \delta$,

$$\rho_{T - t_k - \alpha}^{\delta - \alpha}(\tilde x^*(t_k + \alpha), y(t_{k+1})) = \text{const}.$$

Hence, in particular, with $\alpha = 0$ and $\alpha = \delta$ we get

$$\rho_{T - t_{k+1}}^{0}(\tilde x^*(t_{k+1}), y) = \rho_{T - t_k}^{\delta}(x^*(t_k), y).$$

However,

$$\rho_{T - t_{k+1}}^{0}(\tilde x^*(t_{k+1}), y) = \max_{\eta \in C_E^{T - t_{k+1}}(y)} \min_{\xi \in C_P^{T - t_{k+1}}(\tilde x^*(t_{k+1}))} \rho(\xi, \eta) = \rho_{T - t_{k+1}}(\tilde x^*(t_{k+1}), y),$$

since, in this case, Player E remains at the point $y(t_{k+1})$ during the time $\delta - \delta = 0$. As a result, we get (see Eq. (3.1.10))

$$\rho_{T - t_{k+1}}(\tilde x^*(t_{k+1}), y) \le \rho_{T - t_k}(x^*(t_k), y^*(t_k)), \qquad (3.1.11)$$

i.e. there is a point $\tilde x^*(t_{k+1}) \in C_P^\delta(x^*(t_k))$ such that (3.1.11) is satisfied. Consequently,

$$\rho_{T - t_{k+1}}(x, y(t_{k+1})) = \min_{x' \in C_P^\delta(x^*(t_k))} \rho_{T - t_{k+1}}(x', y(t_{k+1})) \le \rho_{T - t_k}(x^*(t_k), y^*(t_k)) = \rho_T(x_0, y_0),$$

where $x$ is chosen by the strategy $u_\delta^*(\cdot)$ from the condition of the minimum, i.e. in the situation $(u_\delta^*(\cdot), v_\delta(\cdot))$ we have $\rho_{T - t_{k+1}}(x(t_{k+1}), y(t_{k+1})) \le \rho_T(x_0, y_0)$. If, playing the strategy $v_\delta(\cdot)$, Player E continues to select controls which do not coincide with $v^*(t)$, then

$$\rho_T(x_0, y_0) = \rho_{T - t_k}(x^*(t_k), y^*(t_k)) \ge \rho_{T - t_{k+1}}(x(t_{k+1}), y(t_{k+1})) \ge \rho_{T - t_{k+2}}(x(t_{k+2}), y(t_{k+2})) \ge \dots \ge \rho_0(x(T), y(T)) = \rho(x(T), y(T)) = K(x_0, y_0; u_\delta^*(\cdot), v_\delta(\cdot)).$$

This is exactly what proves the validity of the right-hand inequality in (3.1.8). We have thus proved the theorem. ∎
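The right-hand inequality in (3.1.8) says that rule (3.1.4) never lets $\rho_{T - t_k}(x(t_k), y(t_k))$ grow, whatever E does. A minimal simulation of this, assuming the hypothetical simple-motion example (closed-form $\rho_\tau$, and the observation that for disks the minimizer in (3.1.4) is reached by stepping $a\delta$ straight towards $y(t_{k+1})$), is sketched below. E's controls are random (an illustrative choice), and because of discrimination P sees $y(t_{k+1})$ before moving; the names are hypothetical.

```python
import math
import random

def rho(x, y, a, b, tau):
    """Closed-form rho_tau(x, y) for hypothetical simple motion (disk reachability sets)."""
    return max(0.0, math.dist(x, y) + (b - a) * tau)

def step_toward(x, target, r):
    """Point of the disk of radius r about x nearest to target."""
    d = math.dist(x, target)
    if d <= r:
        return target
    return (x[0] + r * (target[0] - x[0]) / d, x[1] + r * (target[1] - x[1]) / d)

def play(x0, y0, a, b, T, n_steps, seed=0):
    """Return the sequence rho_{T - t_k}(x(t_k), y(t_k)) and the final distance."""
    rng = random.Random(seed)
    delta = T / n_steps
    x, y = x0, y0
    values = [rho(x, y, a, b, T)]
    for k in range(n_steps):
        phi = rng.uniform(0.0, 2.0 * math.pi)     # E's control on [t_k, t_{k+1})
        speed = rng.uniform(0.0, b)
        y = (y[0] + speed * delta * math.cos(phi), y[1] + speed * delta * math.sin(phi))
        x = step_toward(x, y, a * delta)          # P already knows y(t_{k+1}): rule (3.1.4)
        values.append(rho(x, y, a, b, T - (k + 1) * delta))
    return values, math.dist(x, y)
```

Starting from $x_0 = (0, 0)$, $y_0 = (6, 0)$ with $a = 2$, $b = 1$, $T = 2$, the first value is $\rho_T(x_0, y_0) = 4$, the sequence is nonincreasing, and the terminal distance does not exceed 4.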
Theorem 1 has a local character and holds in the regions of $x_0$, $y_0$, $T$ where the invariance conditions for the point $M$ are satisfied, i.e. in the space region bounded by the singular surface. Note that $\rho_T(x_0, y_0) = 0$ holds for $C_E^T(y_0) \subset C_P^T(x_0)$. Let $\bar T > 0$ be the first time instant when $\rho_{\bar T}(x_0, y_0) = 0$, so that the condition $\rho_T(x_0, y_0) > 0$ of Theorem 1 is explicitly violated. (Compare this with the first instant of absorption.) The following theorem is readily obtained from the continuity of the function $\rho_T(x_0, y_0)$ in $T$.

Theorem 2 Suppose there is $\delta_1 > 0$ such that in the games $\Gamma_\delta(x_0, y_0, \bar T - \delta_2)$, $\delta_2 < \delta_1$, the conditions of Theorem 1 are satisfied (for all $\delta < \delta_2$). Then the value of the game $\Gamma_\delta(x_0, y_0, \bar T)$ is zero, i.e. for any $\varepsilon > 0$ at the time $\bar T$ Player P can approach Player E to a distance not exceeding $\varepsilon$.

Proof: The proof of Theorem 2 follows from the fact that, by adopting in the game $\Gamma_\delta(x_0, y_0, \bar T)$ the optimal strategy $u_\delta^*(\cdot)$ of the game $\Gamma_\delta(x_0, y_0, \bar T - \delta_2)$, Player P can approach Player E at the instant $\bar T - \delta_2$ to a distance $\rho_{\bar T - \delta_2}(x_0, y_0)$, and from the Hausdorff continuity of the reachability sets $C_P^T(x)$, $C_E^T(y)$ at the point $T = 0$. ∎

Corollary. If $T > \bar T$, then the players P and E may be in the states of passive expectation, i.e. they make arbitrary moves along some trajectories $x(t)$, $y(t)$ up to the time instant $t = \tau$, where $\tau$ is the minimal root of the equation $\rho_{T - \tau}(x(\tau), y(\tau)) = 0$. For $t \in [\tau, T]$ the players start the pursuit in the game $\Gamma_\delta(x(\tau), y(\tau), T - \tau)$, ensuring a zero value of the game in accordance
with Theorem 2 (provided that the conditions of this theorem are satisfied in the game $\Gamma_\delta(x(\tau), y(\tau), T - \tau)$).

Consider a global solution of the games. We restrict ourselves to the games of pursuit where all singular surfaces are dispersal. Theorems 1 and 2 make it possible to construct the solution "in the small", i.e. under the additional assumption that the players' trajectories do not intersect singular surfaces. For the global solution one should be able to construct an optimal control on singular surfaces. In the general case, when Player P adopts the strategy $u^*(\cdot)$ (the principle of minimum $\rho$), the nonincrease of $\rho_T(x, y)$ for a point moving on the singular surface has to be proved separately (see Examples 4, 5, 7, Ch. 4). In the case of a dispersal surface the solution is comparatively simple, since Player E's choice of the control $v(t)$ for $t \in [t_k, t_{k+1})$ on a singular surface (which becomes known to the opponent immediately) defines uniquely the center of pursuit $M$ from the states $x(t_k)$, $y(t_k)$ for the sets $C_P^{T - t_k}(x(t_k))$, $C_E^{T - t_k}(y(t_k))$. The optimal control consists in aiming at the point $M$. Thus, Theorems 1 and 2 also give a global solution to the games of pursuit where all singular surfaces are dispersal.
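Theorem 2 and its corollary revolve around the first instant $\bar T$ at which $\rho_{\bar T}(x_0, y_0)$ vanishes. If $\rho_T(x_0, y_0)$ is nonincreasing in $T$ (an assumption; it holds, for example, in the hypothetical simple-motion case with $a > b$, where $\rho_T(x_0, y_0) = \max(0, \rho(x_0, y_0) - (a - b)T)$), then $\bar T$ can be located by bisection. The sketch below does this; the helper names are illustrative.

```python
import math

def rho_T(x0, y0, a, b, T):
    """Hypothetical simple-motion case: max(0, |x0 - y0| - (a - b) T)."""
    return max(0.0, math.dist(x0, y0) - (a - b) * T)

def first_zero_time(f, T_max, tol=1e-12):
    """Smallest T in [0, T_max] with f(T) == 0, up to tol.

    Assumes f is nonincreasing in T; the returned point is always one where f is zero.
    """
    if f(T_max) != 0.0:
        raise ValueError("f(T_max) must be zero")
    lo, hi = 0.0, T_max
    if f(lo) == 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) == 0.0:
            hi = mid          # zero already reached: the first zero is at or before mid
        else:
            lo = mid
    return hi
```

For $x_0 = (0, 0)$, $y_0 = (3, 4)$, $a = 3$, $b = 1$ the search returns $\bar T \approx 2.5$, which matches $\rho(x_0, y_0)/(a - b) = 5/2$ in this example.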
3.2 Continuous game without discrimination

Using the continuity of the function $\rho_T(x_0, y_0)$ in $T$, $x_0$, $y_0$, we can discard the discrimination requirement.
We define the invariant center of pursuit for the continuous game without discrimination $\Gamma(x, y, T)$. The pursuit center $M$ is called invariant if there is $\bar\delta > 0$ such that for all $0 < \delta < \bar\delta$ it is invariant in the discrete games $\Gamma_\delta(x, y, T)$ with discrimination for Player E.

Theorem 3 Suppose in the game $\Gamma(x_0, y_0, T)$ there is an invariant center of pursuit. Then the function $\rho_T(x_0, y_0)$ is the value of the game without discrimination (i.e. the game of pursuit where Player P's choice is based only on the information about the state of the system at the time instant $t$).
Proof: For a given $\varepsilon > 0$, we choose $\delta_0 > 0$ such that

$$|\rho_T(x_0, y_0) - \rho_T(x', y')| < \frac{\varepsilon}{2} \qquad (3.2.1)$$

for all $x'$, $y'$ belonging to $C_P^{\delta'}(x_0)$, $C_E^{\delta'}(y_0)$, and for all $\delta' < \delta_0$. Next, let $\delta_1$ be such that for all $x \in C_P^{\delta'}(x_0)$ and all $\delta' < \delta_1$ the diameter of the set $C_P^{\delta'}(x)$ is less than $\varepsilon/2$, and let $\delta = \min\{\delta_0, \delta_1\}$. Player P is prescribed the strategy $u_\varepsilon(\cdot)$ under which, on the time interval $[0, \delta]$, he moves arbitrarily to a point $x(\delta) \in C_P^\delta(x_0)$. Let $x(\delta)$, $y(\delta)$ be the states realized by the instant $\delta$. From the state $x' = x(\delta)$, $y_0$ he plays an optimal strategy in the game with discrimination $\Gamma_\delta(x', y_0, T)$ (this is possible, since at each step Player P knows the move made by his opponent). The value of the game $\Gamma_\delta(x', y_0, T)$ is equal to $\rho_T(x', y_0)$, but in the game $\Gamma_\delta$ Player P cannot make the final move (having played the strategy $u_\delta^*(\cdot)$), since he has lost one (opening) move by choosing an arbitrary motion from the point $x_0$ to the point $x'$ in the time $\delta$. Therefore, he can guarantee an approach to Player E by the instant $T$ only to a distance $\rho_T(x', y_0) + \varepsilon/2$ (since the diameter of the one-step reachability set $C_P^\delta(x)$ does not exceed $\varepsilon/2$ for all $x \in C_P^\delta(x_0)$). Because of Eq. (3.2.1), the strategy $u_\varepsilon(\cdot)$ ensures that Player P will approach Player E by the time instant $T$ to a distance $\rho_T(x_0, y_0) + \varepsilon$. Since Player E can always secure for himself the payoff $\rho_T(x_0, y_0)$, the strategy $u_\varepsilon(\cdot)$ is $\varepsilon$-optimal for Player P. This completes the proof of the theorem. ∎
Instead of the lower game of pursuit, where Player E is discriminated, we may consider the upper game of pursuit with discrimination of Player P. All previous results remain valid if instead of the quantity

$$\rho_T(x_0, y_0) = \max_{\eta \in C_E^T(y_0)} \min_{\xi \in C_P^T(x_0)} \rho(\xi, \eta)$$

we consider the quantity

$$\bar\rho_T(x_0, y_0) = \min_{\xi \in C_P^T(x_0)} \max_{\eta \in C_E^T(y_0)} \rho(\xi, \eta).$$

In this case, the definition of the upper center of pursuit and its invariance may also be introduced. When the center of pursuit is invariant, the upper center of pursuit is not (this was to be expected; otherwise the quantities $\rho_T$, $\bar\rho_T$ would be equal, which is far from reality). Moreover, we know of no case of invariance of the upper center of pursuit other than the game of "simple pursuit" on the sphere. At the same time, invariance of the center of pursuit is observed in a large class of problems (see [28, 30, 32]). The latter consideration also supports the intuitively evident nonsymmetry of the players' status in games of pursuit. The above constructions may also be generalized to the case where the sets $C_P^T(x_0)$ and $C_E^T(y_0)$ are not necessarily closed. Moreover, $\varepsilon$-optimality may be shown for the strategies given above, where the quantities $\rho_T(x, y)$, $\bar\rho_T(x, y)$ are understood as

$$\rho_T(x, y) = \sup_{\eta \in C_E^T(y)} \inf_{\xi \in C_P^T(x)} \rho(\xi, \eta), \qquad \bar\rho_T(x, y) = \inf_{\xi \in C_P^T(x)} \sup_{\eta \in C_E^T(y)} \rho(\xi, \eta),$$
and the center of pursuit and the upper center of pursuit are understood as points in the $\varepsilon$-neighbourhoods of the points $M$ and $\bar M$ which belong to the closures of the sets $C_E^T(y_0)$, $C_P^T(x_0)$ and deliver the maximin and the minimax in the preceding equalities.
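The gap between the lower quantity $\rho_T$ and the upper quantity $\bar\rho_T$ can be seen in the hypothetical simple-motion example with $x_0 = (0, 0)$, $y_0 = (d, 0)$ and disks of radii $aT$, $bT$ (assumptions, not the text's setting). There $\rho_T = \max(0, d + bT - aT)$ and, because the inner maximum over E's disk equals $|\xi - y_0| + bT$, $\bar\rho_T = \max(0, d - aT) + bT$. The sketch also brute-forces the outer minimum of $\bar\rho_T$ over a polar grid of P's disk; the function names are illustrative.

```python
import math

def lower_value(d, a, b, T):
    """rho_T: max over E's disk of min over P's disk (disk reachability sets)."""
    return max(0.0, d + b * T - a * T)

def upper_value(d, a, b, T):
    """rho-bar_T: min over P's disk of max over E's disk; inner max is |xi - y0| + b*T."""
    return max(0.0, d - a * T) + b * T

def upper_value_sampled(d, a, b, T, n_r=200, n_phi=360):
    """Brute-force the outer minimum over a polar grid of P's disk centred at (0, 0)."""
    best = math.inf
    for i in range(n_r + 1):
        r = a * T * i / n_r
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            xi = (r * math.cos(phi), r * math.sin(phi))
            best = min(best, math.dist(xi, (d, 0.0)) + b * T)
    return best
```

With $a = 2$, $b = 1$, $T = 1$: for $d = 3$ both quantities equal 2, while for $d = 1$ one gets $\rho_T = 0$ but $\bar\rho_T = 1$, so in this toy the two quantities coincide only when $d \ge aT$.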
3.3 Arbitrary terminal payoff functions and phase constraints
We now turn to the pursuit-type games of prescribed duration $\Gamma(x_0, y_0, T)$ where the terminal payoff is defined as

$$K(x_0, y_0; u(\cdot), v(\cdot)) = M(x(T), y(T)),$$

where $M(x, y)$ is an arbitrary continuous function given on $R^n \times R^n$. The lower game with discrimination $\Gamma_\delta(x_0, y_0, T)$ is defined as in §2, Ch. 2, except that $M(x(T), y(T))$ is meant everywhere instead of $\rho(x(T), y(T))$. The function $M_T(x, y)$ is the analog of the function $\rho_T(x, y)$ and is defined as follows:

$$M_T(x, y) = \max_{\eta \in C_E^T(y)} \min_{\xi \in C_P^T(x)} M(\xi, \eta).$$
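Since $M_T(x, y)$ is a maximin over two reachability sets, it can be approximated for any continuous payoff $M$ once finite samples of those sets are available. The sketch below assumes, for illustration only, that the sets are the disks of the simple-motion example and uses $M(\xi, \eta) = \rho(\xi, \eta)^2$ as a payoff different from the distance; any other continuous $M$, such as a team payoff of the form $\min_i \rho(x^i, y)$, could be passed in the same way given samples of the corresponding joint reachability set. Function names are hypothetical.

```python
import math

def maximin(M, P_points, E_points):
    """Return (max over eta of min over xi of M(xi, eta), maximizing eta) on finite samples."""
    best_val, best_eta = -math.inf, None
    for eta in E_points:
        inner = min(M(xi, eta) for xi in P_points)
        if inner > best_val:
            best_val, best_eta = inner, eta
    return best_val, best_eta

def disk_samples(c, r, n_r=10, n_phi=72):
    """Centre plus a polar grid of the closed disk |p - c| <= r (hypothetical reachability set)."""
    pts = [c]
    for i in range(1, n_r + 1):
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            pts.append((c[0] + r * i / n_r * math.cos(phi), c[1] + r * i / n_r * math.sin(phi)))
    return pts
```

For P's disk of radius 2 about $(0, 0)$ and E's disk of radius 1 about $(3, 0)$, the sampled $M_T$ for the squared distance is 4, attained at the sampled point $(4, 0)$, which plays the role of the center of the game $H$.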
Conditionally optimal trajectories are defined as in §1 (for simplicity, the sets $C_E^T(y)$ and $C_P^T(x)$ are assumed to be compact). The point $H$ is called the center of the game $\Gamma_\delta(x_0, y_0, T)$ if $M_T(x_0, y_0) = M(\xi, H)$. (Here $\xi$, $H$ are the points at which the maximin of the function $M(\xi, \eta)$ is achieved.) By analogy with §1, we introduce the auxiliary game $\Gamma_\delta^\sigma(x, y, T)$ where Player E remains at the point $y$ during the time $\sigma$. The center of the auxiliary game is determined as in the game $\Gamma_\delta(x, y, T)$ except that the reachability sets over which the maximin is taken are computed for the auxiliary game. As in §1, the notion of the invariant center of the game is introduced with the help of the auxiliary game. Theorem 1 is fully applicable to the case of the function $M_T(x_0, y_0)$.
Theorem 4 Assume that in the game $\Gamma_\delta(x_0, y_0, T)$ there is an invariant center of the game and $M_T(x_0, y_0) > 0$. Then there is an equilibrium point, the conditionally optimal trajectories are optimal, the value of the game is $V_\delta(x_0, y_0, T) = M_T(x_0, y_0)$, and optimal strategies for Players P and E are constructed in much the same way as in Theorem 1.

The proof of Theorem 4 fully coincides with the one for Theorem 1. The notion of the singular surface is just as in §1. To generalize Theorem 3 to the games without discrimination, we utilize the continuity of the function $M_T(x, y)$ in all of its arguments.
Theorem 5 Assume that in the game $\Gamma(x_0, y_0, T)$ (without discrimination) there is an invariant center of the game. Then the function $M_T(x, y)$ is the value of the game without discrimination.

Proof: As in Theorem 3, we consider the approximating game $\Gamma_\delta(x_0, y_0, T)$ with discrimination. Here the continuity of the function $M_T(x, y)$ is used instead of the continuity of the function $\rho_T(x_0, y_0)$. ∎
Theorem 5 is applicable to zero-sum games of pursuit with prescribed duration between a pursuer team acting as one player $P = \{P_1, \dots, P_m\}$ and the evader E. Suppose the motion equations for the team partners $P_i$, $i = 1, \dots, m$, and Player E are of the form

$$\dot x^i = f^i(x^i, u^i), \quad u^i \in U^i, \quad x^i \in R^n, \quad i = 1, \dots, m,$$
$$\dot y = g(y, v), \quad v \in V, \quad y \in R^n,$$

and the payoff is

$$K(x_0^1, \dots, x_0^m, y_0; u(\cdot), v(\cdot)) = \min_i \rho(x^i(T), y(T)),$$

where $x^i(t)$, $y(t)$, $i = 1, \dots, m$, are the trajectories of the team members $P_i$ and of Player E from the initial states $x_0^i$, $y_0$ in the situation $(u(\cdot), v(\cdot))$. In order to be able to apply Theorem 5, we add to the motion equations of Player E $n \times (m - 1)$ equations of the form $\dot y_j = 0$, $j = 1, \dots, n \times (m - 1)$, with arbitrary initial conditions. By the state vector of Player E is meant the new extended vector $\bar y = (y, y_1, \dots, y_{n(m-1)})$ of dimension $n \times m$. The vectors $x = (x^1, \dots, x^m)$ and $\bar y$ are of the same dimension. We rewrite the payoff in the equivalent form

$$K(x_0^1, \dots, x_0^m, \bar y_0; u(\cdot), v(\cdot)) = M(x(T), \bar y(T)) = \min_i \rho(x^i(T), y(T)).$$

Here by the distance is meant the distance between the vectors $x^i(T)$ and $y(T)$ (but not $\bar y(T)$) in the space $R^n$. Since the function $M(x, \bar y)$ is continuous, we can apply Theorem 5.

Let a closed convex set $S$ be given in $R^n$. Consider the pursuit game with prescribed duration and with the additional constraint under which the players P and E are not allowed to leave the set $S$ during the game. The motion equations are of the form (2.1.1), (2.1.2), the initial conditions being $x_0 \in S$, $y_0 \in S$. The payoff is defined as the distance between the players at the time instant $T$ when the play ends. Denote such a game by $\Gamma^S(x_0, y_0, T)$.
To extend the previous results, we have to introduce the notions of admissible open-loop controls and admissible strategies. The open-loop control $u(t)$, $0 \le t \le T$, is called admissible in the game $\Gamma^S(x_0, y_0, T)$ if the trajectory $x(t)$ corresponding to this control belongs to $S$ ($x_0 \in S$). The admissible open-loop control for Player E is determined in much the same way. By the reachability set $C_P^T(x_0, S)$ of Player P in the game $\Gamma^S(x_0, y_0, T)$ is meant the set of points at which Player P can arrive at the time instant $T$ from the point $x_0 \in S$ by using admissible open-loop controls. The reachability set $C_E^T(y_0, S)$ of Player E is defined in much the same way.

The piecewise open-loop strategy $u(\cdot)$ is called admissible if in any situation with $v(\cdot) \in E$ the trajectory $x(t)$, $x(0) = x_0 \in S$, $0 \le t \le T$, belongs to $S$. An admissible piecewise open-loop strategy for Player E is defined in much the same way. The sets of all admissible piecewise open-loop strategies of Players P and E will be denoted by $P_S$ and $E_S$, respectively. By the equilibrium point and the value of the game are meant those in the class of admissible piecewise open-loop strategies. It follows from the construction of the sets $P_S$ and $E_S$ that for any $u(\cdot) \in P_S$, $v(\cdot) \in E_S$ and $x_0 \in S$, $y_0 \in S$ the trajectories $x(t)$, $y(t)$, $0 \le t \le T$, belong to $S$. By analogy with $\rho_T(x_0, y_0)$, we define $\rho_T(x_0, y_0, S)$ as
$$\rho_T(x_0, y_0, S) = \sup_{\eta \in C_E^T(y_0, S)} \inf_{\xi \in C_P^T(x_0, S)} \rho(\xi, \eta).$$

Although this does not play a role in what follows, we assume that the sets $C_E^T(y_0, S)$, $C_P^T(x_0, S)$ are compact and the function $\rho_T(x_0, y_0, S)$ is continuous in $x_0$, $y_0$, $T$ (in phase-constrained games this condition may not be fulfilled). Then it is possible to define a center of pursuit and the relevant conditionally optimal trajectories $x^*(t)$, $y^*(t)$. In general, not all of the conditionally optimal trajectories correspond to admissible controls; we agree to consider, without special stipulation, only admissible conditionally optimal trajectories $x^*(t)$, $y^*(t)$, i.e. such that $x^*(t) \in S$, $y^*(t) \in S$ for $0 \le t \le T$. From the definition of the sets $C_P^T(x_0, S)$, $C_E^T(y_0, S)$ it follows that such trajectories necessarily exist.

As in §2, we define the discrete game $\Gamma_\delta^S(x_0, y_0, T)$ with discrimination of Player E. In this game, we find the strategy $u^*(\cdot)$ of Player P from formula (3.1.4) by replacing therein the set $C_P^\delta(x(t_k))$ with $C_P^\delta(x(t_k), S)$. By construction, the strategy $u^*(\cdot)$ is admissible. As in §2, we define an invariant center of pursuit; here by $x^*(t)$, $y^*(t)$ is meant an admissible pair of relevant conditionally optimal trajectories. Theorems 1, 2 and 5 hold fully for the case of phase constraints if by the reachability sets are everywhere meant the sets $C_P(x, S)$, $C_E(y, S)$, both in the definition of $\rho_T(x, y)$ and in the formulation of the optimal strategy of Player P.
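The effect of a phase constraint on the max-min distance can be seen in a small hypothetical example (not from the text): simple motion in the plane inside a closed convex set $S$. Because straight segments from a point of $S$ stay in $S$, the constrained reachability set is then the speed disk intersected with $S$. The sketch samples both sets on a polar grid, keeps the points that lie in $S$, and evaluates $\rho_T(x_0, y_0, S)$ by brute force; the set $S$, the parameters and the function names are assumptions for illustration.

```python
import math

def region_samples(c, r, in_S, n_r=8, n_phi=120):
    """Polar-grid samples of the disk |p - c| <= r that lie in S.

    For simple motion inside a closed convex set S, the constrained reachability
    set is this disk intersected with S (straight segments from c stay in S).
    """
    pts = [c] if in_S(c) else []
    for i in range(1, n_r + 1):
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            p = (c[0] + r * i / n_r * math.cos(phi), c[1] + r * i / n_r * math.sin(phi))
            if in_S(p):
                pts.append(p)
    return pts

def rho_T_constrained(x0, y0, a, b, T, in_S):
    """Brute-force max over E's constrained set of min over P's constrained set."""
    P = region_samples(x0, a * T, in_S)
    E = region_samples(y0, b * T, in_S)
    return max(min(math.dist(xi, eta) for xi in P) for eta in E)
```

With $x_0 = (0, 0)$, $y_0 = (3, 0)$, $a = 2$, $b = 1$, $T = 1$ the unconstrained value is 2, while the half-plane $S = \{p_1 \le 3.6\}$ cuts off E's best escape direction and lowers the value to about $\sqrt{13.6} - 2 \approx 1.69$ (the grid gives a slightly smaller estimate).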
3.4 Optimal time pursuit games
In this section we will consider the games of pursuit where the payoff function equals the time to capture. For definiteness, we assume that the terminal manifold $S$ is the sphere $\rho(x, y) = \ell$, $\ell > 0$, and that the function $M(x, y) \equiv 1$ (see §2, Ch. 2) [28]. Let us assume that the sets $C_P^t(x)$ and $C_E^t(y)$ are continuous at the point $t = 0$ uniformly with respect to $x$ and $y$. Suppose the following quantity has a meaning:

$$\theta(x, y, \ell) = \max_{v(t)} \min_{u(t)} t_n(x, y; u(t), v(t)),$$

where $t_n(x, y; u(t), v(t))$ is the time needed to reach the distance $\ell$ between the players P and E moving from the initial points $x$, $y$ under measurable open-loop controls $u(t)$ and $v(t)$, respectively. Also, assume that the function $\theta(x, y, \ell)$ is continuous in all of its arguments. Kelendzheridze was the first to introduce this function [9].
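The open-loop max-min time $\theta(x, y, \ell)$ can be illustrated in the hypothetical simple-motion example with $a > b$ (an assumption, not the text's general dynamics). Against a fixed evader path $y(t)$ that P knows in advance, P can be within $\ell$ of $y(t)$ at time $t$ exactly when $|y(t) - x| \le \ell + at$, so his best open-loop capture time is the first such $t$. Running straight away from $x$ at full speed makes this time $(\rho(x, y) - \ell)/(a - b)$, and no path does better because $|y(t) - x| \le \rho(x, y) + bt$. The sketch computes the capture time on a time grid; names are illustrative.

```python
import math

def capture_time(x, y_path, a, ell, t_max=100.0, dt=1e-3):
    """First grid time t with |y(t) - x| <= ell + a*t, or None if not found by t_max.

    Against a known open-loop evader path y(t), a simple-motion pursuer with speed
    bound a can be within distance ell of y(t) at time t iff |y(t) - x| <= ell + a*t.
    """
    n = int(round(t_max / dt))
    for k in range(n + 1):
        t = k * dt
        if math.dist(y_path(t), x) <= ell + a * t:
            return t
    return None

def theta_closed_form(x, y, a, b, ell):
    """theta(x, y, ell) for hypothetical simple motion with a > b."""
    return max(0.0, (math.dist(x, y) - ell) / (a - b))
```

For $x = (0, 0)$, $y = (5, 0)$, $a = 2$, $b = 1$, $\ell = 1$, the straight run $y(t) = (5 + t, 0)$ is captured at about $t = 4 = \theta(x, y, \ell)$, whereas the sideways run $y(t) = (5, t)$ is captured earlier, at about $t \approx 2.24$.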
Definition 2. Let $(\bar u(t), \bar v(t))$, $t \in [0, \infty)$, be a pair of open-loop controls under which $\theta(x, y, \ell)$ is achieved, and let $x^*(t)$, $y^*(t)$ be the corresponding pair of trajectories defined by system (2.1.1), (2.1.2) under the open-loop controls $\bar u(t)$, $\bar v(t)$ and initial conditions $x^*(0) = x$, $y^*(0) = y$. This pair of trajectories will be referred to as a pair of conditionally optimal trajectories. In what follows, $\Gamma^\delta(x, y)$ is the game of pursuit which differs from the game $\Gamma(x, y)$ only in that here Player E remains in the initial state $y$ during the time $\delta$. Let $\tilde x^*(t)$ be a conditionally optimal trajectory of Player P in this game.
Theorem 6 Suppose the following conditions are satisfied.

1. $\min_{x' \in C_P^t(x)} \theta(x', y, \ell)$ is achieved for fixed $y$ and any $x$.

2. $\theta(x^*(t), y^*(t), \ell) + t = \text{const}$ for $t \in [0, t_n]$, where $t_n = \min\{t : \rho(x^*(t), y^*(t)) = \ell\}$.

3. There is $\bar\delta > 0$ such that for all $\delta < \bar\delta$ in the game $\Gamma^\delta(x, y)$, with any fixed $t$ and $y \in C_E^\delta(y^*(t))$, for all $0 \le \alpha \le \delta$ we have

$$\theta(\tilde x^*(t + \alpha), y, \ell) + \alpha = \text{const},$$

where $\tilde x^*(t) = x^*(t)$ (here $\theta(\tilde x^*(t + \alpha), y, \ell)$ is computed for the game $\Gamma^{\delta - \alpha}(\tilde x^*(t + \alpha), y)$).

Then for any $\ell > 0$, $\varepsilon > 0$ the game $\Gamma(x, y)$ has an $\varepsilon$-equilibrium point and the value of the game is equal to $\theta(x, y, \ell)$, i.e. for any $\varepsilon > 0$ there exists a pair of strategies $u^\varepsilon(\cdot)$, $v^\varepsilon(\cdot)$ such that

$$\theta(x, y, \ell) - \varepsilon \le t_n(x, y; u^\varepsilon(\cdot), v^\varepsilon(\cdot)) \le \theta(x, y, \ell) + \varepsilon, \qquad (3.4.1)$$

$$t_n(x, y; u(\cdot), v^\varepsilon(\cdot)) + \varepsilon \ge t_n(x, y; u^\varepsilon(\cdot), v^\varepsilon(\cdot)) \ge t_n(x, y; u^\varepsilon(\cdot), v(\cdot)) - \varepsilon \qquad (3.4.2)$$

for all possible strategies $u(\cdot)$ and $v(\cdot)$ of Players P and E [28].

Before passing to the proof of the theorem, we prove some auxiliary statements. As in §2 of this chapter, we introduce the discrete games $\Gamma_\delta(x, y, \ell)$ with discrimination of Player E. The motion equations of the players are the same, and a partition of the interval $[0, \infty)$ is fixed, namely the partition by the set of points $\{t_k\}_{k=0}^{\infty}$ such that $t_{k+1} - t_k = \delta$ for any $k$. Player E is discriminated, i.e. at the time instant $t_k$ Player P has, apart from information on the time and on his own and the opponent's positions at that time, information on the control $v(t)$ for $t \in [t_k, t_{k+1})$. Thus, at the instant $t_k$ Player P has information on the opponent's position at the instant $t_{k+1}$. Let $u_\delta(\cdot)$, $v_\delta(\cdot)$ be strategies of Players P and E, respectively, in the game $\Gamma_\delta(x, y, \ell)$.
Lemma 2. Under the conditions of Theorem 6, in the game $\Gamma_\delta(x,y,\ell)$ for $\delta < \bar\delta$ there is an equilibrium point in pure strategies and $\theta(x,y,\ell)$ is the value of the game, i.e. there exists a pair of strategies $u^*_\delta(\cdot)$, $v^*_\delta(\cdot)$ such that for any strategies $u_\delta(\cdot)$ and $v_\delta(\cdot)$ of the players P and E in the game $\Gamma_\delta(x,y,\ell)$
$$t_n(x,y;u_\delta(\cdot),v^*_\delta(\cdot)) \ge t_n(x,y;u^*_\delta(\cdot),v^*_\delta(\cdot)) = \theta(x,y,\ell) \ge t_n(x,y;u^*_\delta(\cdot),v_\delta(\cdot)). \tag{3.4.3}$$
Proof: Let $v(t;x(t_k),y(t_k))$, $u(t;x(t_k),y(t_k))$, $t \in [t_k,\infty)$, be a pair of open-loop controls on which $\theta(x(t_k),y(t_k),\ell)$ is achieved. Let us define the strategies $u^*_\delta(\cdot)$ and $v^*_\delta(\cdot)$. The strategy $v^*_\delta(\cdot)$ for Player E in the game $\Gamma_\delta(x,y,\ell)$ associates the state $(t_k, x(t_k), y(t_k))$ with the admissible open-loop control $v(t;x(t_k),y(t_k))$ for $t \in [t_k, t_{k+1})$. The strategy $u^*_\delta(\cdot)$ for Player P in the game $\Gamma_\delta(x,y,\ell)$ associates the state $(t_k, x(t_k), y(t_{k+1}))$ with the admissible open-loop control $u(t;x(t_k),y(t_{k+1}))$, $t \in [t_k,t_{k+1})$, transferring the point $x(t_k)$ to a point $x(t_{k+1}) \in C_P^{\delta}(x(t_k))$ such that
$$\theta(x(t_{k+1}),y(t_{k+1}),\ell) = \min_{x' \in C_P^{\delta}(x(t_k))} \theta(x',y(t_{k+1}),\ell), \tag{3.4.4}$$
if the $\ell$-capture from the points $x(t_k)$, $y(t_k)$ is impossible in the time $\delta$. Otherwise Player P chooses the admissible open-loop control $u(t)$ which allows the $\ell$-capture to be realized in the minimum time. We will show that the situation $(u^*_\delta(\cdot), v^*_\delta(\cdot))$ is the equilibrium point. Assume that in the situation $(u^*_\delta(\cdot), v^*_\delta(\cdot))$ for $n \le k$ the equality
$$\theta(x(t_n),y(t_n),\ell) + t_n = \theta(x,y,\ell) \tag{3.4.5}$$
holds, where $x(t_n)$, $y(t_n)$ is the position of the players P and E at the instant $t_n$ in the situation $(u^*_\delta(\cdot), v^*_\delta(\cdot))$. Let us establish that Eq. (3.4.5) holds for $n = k+1$. Let $v(t;x(t_k),y(t_k))$, $u(t;x(t_k),y(t_k))$, $t \in [t_k,\infty)$, be a pair of open-loop controls on which $\theta(x(t_k),y(t_k),\ell)$ is achieved, and let $x^*(t,x(t_k))$, $y^*(t,y(t_k))$, $t \in [t_k,\infty)$, be the corresponding pair of trajectories defined by the system (2.1.1), (2.1.2) under the initial conditions $x^*(t_k,x(t_k)) = x(t_k)$, $y^*(t_k,y(t_k)) = y(t_k)$ and the open-loop controls $u(t;x(t_k),y(t_k))$, $v(t;x(t_k),y(t_k))$. Introduce the time $\tau = t - t_k$ and denote $x^*(t,x(t_k)) = x(\tau,x(t_k))$, $y^*(t,y(t_k)) = y(\tau,y(t_k))$.
Since $x(\delta,x(t_k)) \in C_P^{\delta}(x(t_k))$, from Eq. (3.4.4) we have
$$\theta(x(t_{k+1}),y(\delta,y(t_k)),\ell) \le \theta(x(\delta,x(t_k)),y(\delta,y(t_k)),\ell). \tag{3.4.6}$$
The second condition of Theorem 6 implies
$$\theta(x(\delta,x(t_k)),y(\delta,y(t_k)),\ell) + \delta = \theta(x(t_k),y(t_k),\ell). \tag{3.4.7}$$
By the assumption, the condition (3.4.5) is satisfied. Consequently, introducing the notation $y(\delta,y(t_k)) = y(t_{k+1})$, from (3.4.6) and (3.4.7) we get
$$\theta(x(t_{k+1}),y(t_{k+1}),\ell) + t_{k+1} \le \theta(x,y,\ell). \tag{3.4.8}$$
At the same time, let $x'$ be an arbitrary point of the set $C_P^{\delta}(x(t_k))$, and let $u'(t)$, $t \in [t_k,t_{k+1})$, be an admissible open-loop control transferring the point $x(t_k)$ to the point $x'$. Next, let $\tilde u(t)$, $t \in [t_{k+1},\infty)$, be an admissible open-loop control such that
$$t_n(x',y(t_{k+1});\tilde u(t),v(t;x(t_k),y(t_k))) = \min_{u(t)} t_n(x',y(t_{k+1});u(t),v(t;x(t_k),y(t_k))) \le \max_{v(t)}\min_{u(t)} t_n(x',y(t_{k+1});u(t),v(t)) = \theta(x',y(t_{k+1}),\ell). \tag{3.4.9}$$
Let $D$ be the set of measurable open-loop controls $u(t)$, $t \in [t_k,\infty)$, that are coincident with the function $u'(t)$ on the interval $[t_k,t_{k+1})$. Denote by $\hat u(t)$ the function defined on the interval $[t_k,\infty)$ that is coincident with $u'(t)$ on the interval $[t_k,t_{k+1})$ and with $\tilde u(t)$ on $[t_{k+1},\infty)$. Then
$$t_n(x(t_k),y(t_k);\hat u(t),v(t;x(t_k),y(t_k))) = \min_{u(t) \in D} t_n(x(t_k),y(t_k);u(t),v(t;x(t_k),y(t_k))).$$
Next,
$$t_n(x(t_k),y(t_k);u(t;x(t_k),y(t_k)),v(t;x(t_k),y(t_k))) = \min_{u(t)} t_n(x(t_k),y(t_k);u(t),v(t;x(t_k),y(t_k)))$$
(here the minimum is taken over all possible measurable open-loop controls). However, $D$ is a subset of the set of all measurable open-loop controls. Consequently,
$$\theta(x(t_k),y(t_k),\ell) = t_n(x(t_k),y(t_k);u(t;x(t_k),y(t_k)),v(t;x(t_k),y(t_k))) \le t_n(x(t_k),y(t_k);\hat u(t),v(t;x(t_k),y(t_k))) \le \delta + \theta(x',y(t_{k+1}),\ell). \tag{3.4.10}$$
The latter inequality in (3.4.10) follows from (3.4.9). Thus,
$$\theta(x(t_k),y(t_k),\ell) \le \delta + \theta(x',y(t_{k+1}),\ell). \tag{3.4.11}$$
The inequality (3.4.11) holds for any point $x' \in C_P^{\delta}(x(t_k))$, specifically, for $x(t_{k+1})$. From (3.4.11), (3.4.8), and from the assumption (3.4.5) we have
$$\theta(x,y,\ell) = \theta(x(t_{k+1}),y(t_{k+1}),\ell) + t_{k+1}.$$
So (3.4.5) also holds for $n = k+1$. For $n = 0$, Eq. (3.4.5) is obviously satisfied. We have thus proved (3.4.5). Let $N$ be such that $\theta(x,y,\ell) = N\delta + \delta'$, where $\delta' < \delta$. Since $N\delta = t_N$, from (3.4.5) we have
$$\theta(x,y,\ell) = \theta(x(t_N),y(t_N),\ell) + t_N, \tag{3.4.12}$$
where $\theta(x(t_N),y(t_N),\ell) = \delta' < \delta$. From the states $x(t_N)$, $y(t_N)$, having information on Player E's control for the time $\delta$ ahead and adopting the strategy $u^*_\delta(\cdot)$, Player P may finish the pursuit in the time $\theta(x(t_N),y(t_N),\ell) = \delta' < \delta$. Thus the whole time of pursuit in the situation $(u^*_\delta(\cdot),v^*_\delta(\cdot))$ from the initial states $x$, $y$ along the trajectories $x(\cdot)$, $y(\cdot)$ is equal to
$$t_n(x,y;u^*_\delta(\cdot),v^*_\delta(\cdot)) = t_N + \delta' = N\delta + \delta' = \theta(x,y,\ell). \tag{3.4.13}$$
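The equality (3.4.13) can be observed numerically. The sketch below reuses an assumed simple-motion example that is not taken from the text ($|u| \le \alpha$, $|v| \le \beta$, $\alpha > \beta$, in the plane). Player P follows rule (3.4.4): for simple motion, minimizing $\theta(x',y(t_{k+1}),\ell)$ over the disk $C_P^{\delta}(x(t_k))$ amounts to moving straight toward $y(t_{k+1})$, and when $\ell$-capture is possible within a step, P captures at the earliest moment, found here by bisection. The functions `play`, `away_from_pursuer` and `sideways` are hypothetical helpers for this illustration only.

```python
import math

# Assumed example, not from the text: simple motion, |u| <= alpha, |v| <= beta, alpha > beta.

def theta(x, y, ell, alpha, beta):
    return max(0.0, math.dist(x, y) - ell) / (alpha - beta)

def capture_within_step(x, y, w, alpha, beta, ell, delta):
    """Earliest s in [0, delta] at which P, knowing E's direction w on the step, is within ell of E.
    g(s) = |y + beta*s*w - x| - alpha*s is strictly decreasing (alpha > beta), so bisection works."""
    def g(s):
        return math.hypot(y[0] + beta * s * w[0] - x[0], y[1] + beta * s * w[1] - x[1]) - alpha * s
    if g(0.0) <= ell:
        return 0.0
    if g(delta) > ell:
        return None
    lo, hi = 0.0, delta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) <= ell:
            hi = mid
        else:
            lo = mid
    return hi

def play(x, y, alpha, beta, ell, delta, evader_dir, max_steps=100_000):
    """Discrete game Gamma_delta with discrimination of E; P follows rule (3.4.4)."""
    t = 0.0
    for _ in range(max_steps):
        w = evader_dir(x, y)                 # E's control on [t_k, t_{k+1}), known to P
        s = capture_within_step(x, y, w, alpha, beta, ell, delta)
        if s is not None:
            return t + s
        y_next = (y[0] + beta * delta * w[0], y[1] + beta * delta * w[1])
        d = math.dist(x, y_next)             # d > alpha*delta here, since no capture occurred
        # Minimizing theta(x', y(t_{k+1})) over the disk C_P^delta(x): move straight toward y(t_{k+1}).
        x = (x[0] + alpha * delta * (y_next[0] - x[0]) / d,
             x[1] + alpha * delta * (y_next[1] - x[1]) / d)
        y = y_next
        t += delta
    raise RuntimeError("no capture within max_steps")

def away_from_pursuer(x, y):
    """E's conditionally optimal choice: run along the ray from P through E."""
    d = math.dist(x, y)
    return ((y[0] - x[0]) / d, (y[1] - x[1]) / d)

def sideways(x, y):
    """A non-optimal choice for E: run perpendicular to the line PE."""
    e = away_from_pursuer(x, y)
    return (-e[1], e[0])
```

For instance, `play((0.0, 0.0), (3.0, 4.0), 2.0, 1.0, 1.0, 0.3, away_from_pursuer)` gives $\theta(x,y,\ell) = 4$ up to rounding, as (3.4.13) predicts, while an evader running sideways is caught no later, in line with the right-hand inequality of (3.4.3).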
Let us prove the validity of the left-hand side of the inequality (3.4.3). We first show that in the situation $(u_\delta(\cdot),v^*_\delta(\cdot))$ the following inequality holds:
$$\theta(x(t_k),y(t_k),\ell) \le \theta(x(t_{k+1}),y(t_{k+1}),\ell) + \delta, \tag{3.4.14}$$
where $x(t_k)$, $y(t_k)$, $x(t_{k+1})$, $y(t_{k+1})$ are the positions of the players P and E at the time instants $t_k$ and $t_{k+1}$ in the situation $(u_\delta(\cdot),v^*_\delta(\cdot))$. Since $v(t;x(t_k),y(t_k))$ is the control on which the maximum is attained,
$$\theta(x(t_k),y(t_k),\ell) = \max_{v(t)}\min_{u(t)} t_n(x(t_k),y(t_k);u(t),v(t)) = \min_{u(t)} t_n(x(t_k),y(t_k);u(t),v(t;x(t_k),y(t_k))). \tag{3.4.15}$$
Let $\tilde u(t)$, $t \in [t_k,t_{k+1})$, be an open-loop control transferring the point $x(t_k)$ to the point $x(t_{k+1})$. The open-loop control transferring the point $y(t_k)$ to the point $y(t_{k+1})$ is the control $v(t;x(t_k),y(t_k))$, $t \in [t_k,t_{k+1})$. Let $D$ be the set of open-loop controls defined for $t \in [t_k,\infty)$ and coincident with $\tilde u(t)$ for $t \in [t_k,t_{k+1})$. Consider the quantity
$$\min_{u(t) \in D} t_n(x(t_k),y(t_k);u(t),v(t;x(t_k),y(t_k))).$$
Evidently, since after the instant $t_{k+1}$ Player P can answer the continuation of the control $v(t;x(t_k),y(t_k))$ in the best way,
$$\min_{u(t) \in D} t_n(x(t_k),y(t_k);u(t),v(t;x(t_k),y(t_k))) \le \delta + \theta(x(t_{k+1}),y(t_{k+1}),\ell). \tag{3.4.16}$$
The set $D$ is a subset of the set of all measurable open-loop controls. Consequently, from (3.4.15) and (3.4.16) it is possible to derive the inequality (3.4.14).

Let $u_\delta^r(\cdot)$ be a strategy of Player P under which, in the situation $(u_\delta^r(\cdot),v^*_\delta(\cdot))$, he makes $r$ times a choice that is different from the choice of $u^*_\delta(\cdot)$. In this notation $u^*_\delta(\cdot) = u_\delta^0(\cdot)$. Assume that for $n \le r$ the inequality
$$t_n(x,y;u_\delta^n(\cdot),v^*_\delta(\cdot)) \ge \theta(x,y,\ell) \tag{3.4.17}$$
is satisfied. Let us prove that (3.4.17) also holds for $n = r+1$. Suppose that in the game $\Gamma_\delta(x,y,\ell)$ Player P adopts the strategy $u_\delta^{r+1}(\cdot)$, and Player E the strategy $v^*_\delta(\cdot)$. Let $t_k$ be the time instant at which the choice of $u_\delta^{r+1}(\cdot)$ does not coincide with the choice of $u^*_\delta(\cdot)$ for the first time. We introduce the time $\tau = t - t_{k+1}$ and consider the game $\Gamma_\delta(x(t_{k+1}),y(t_{k+1}),\ell)$, where $x(t_{k+1})$, $y(t_{k+1})$ are the positions of the players P and E respectively at the instant $t_{k+1}$ in the situation $(u_\delta^{r+1}(\cdot),v^*_\delta(\cdot))$. In this game, Player E adopts the strategy $v^*_\delta(\cdot)$ and Player P the strategy under which, in the situation $(u_\delta^{r+1}(\cdot),v^*_\delta(\cdot))$, he makes a choice $r$ times different from the choice of $u^*_\delta(\cdot)$, i.e. the strategy $u_\delta^r(\cdot)$. By assumption,
$$t_n(x(t_{k+1}),y(t_{k+1});u_\delta^r(\cdot),v^*_\delta(\cdot)) \ge \theta(x(t_{k+1}),y(t_{k+1}),\ell), \tag{3.4.18}$$
and since
$$t_n(x,y;u_\delta^{r+1}(\cdot),v^*_\delta(\cdot)) = t_{k+1} + t_n(x(t_{k+1}),y(t_{k+1});u_\delta^r(\cdot),v^*_\delta(\cdot)), \tag{3.4.19}$$
then, by employing (3.4.14), we obtain
$$t_n(x,y;u_\delta^{r+1}(\cdot),v^*_\delta(\cdot)) \ge t_{k+1} + \theta(x(t_{k+1}),y(t_{k+1}),\ell) \ge t_k + \theta(x(t_k),y(t_k),\ell). \tag{3.4.20}$$
The choices by the players P and E in the game $\Gamma_\delta(x,y,\ell)$, however, have been coincident with their choices in the situation $(u^*_\delta(\cdot),v^*_\delta(\cdot))$ up to the instant $t_k$. Consequently, (3.4.5) holds for $n \le k$. Then, from (3.4.20) we find
$$t_n(x,y;u_\delta^{r+1}(\cdot),v^*_\delta(\cdot)) \ge \theta(x,y,\ell). \tag{3.4.21}$$
Thus, (3.4.17) also holds for $n = r+1$, and since it obviously holds for $n = 0$, it holds for all $n \le N$, i.e. for any strategy of Player P in the game $\Gamma_\delta(x,y,\ell)$ we have the inequality
$$t_n(x,y;u_\delta(\cdot),v^*_\delta(\cdot)) \ge \theta(x,y,\ell) = t_n(x,y;u^*_\delta(\cdot),v^*_\delta(\cdot)). \tag{3.4.22}$$
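Inequality (3.4.14) says that, against $v^*_\delta(\cdot)$, one step of any choice of Player P cannot lower $\theta + t$. Under the same assumed simple-motion example (not from the text; $|u| \le \alpha$, $|v| \le \beta$, $\alpha > \beta$), the sketch below samples points $x'$ of the disk $C_P^{\delta}(x)$ on a polar grid, lets E make the conditionally optimal step, and reports the smallest value of $\theta(x',y',\ell) + \delta - \theta(x,y,\ell)$. A non-negative result is (3.4.14) on the sampled points; the grid sizes `n_r`, `n_phi` are arbitrary choices.

```python
import math

def theta(x, y, ell, alpha, beta):
    """Maximin capture time for the assumed simple-motion example."""
    return max(0.0, math.dist(x, y) - ell) / (alpha - beta)

def worst_gain(x, y, ell, alpha, beta, delta, n_r=20, n_phi=72):
    """min over sampled x' in C_P^delta(x) of theta(x', y') + delta - theta(x, y),
    where y' is E's position after one step of v*_delta (running away from P)."""
    d = math.dist(x, y)
    e = ((y[0] - x[0]) / d, (y[1] - x[1]) / d)
    y1 = (y[0] + beta * delta * e[0], y[1] + beta * delta * e[1])
    base = theta(x, y, ell, alpha, beta)
    worst = math.inf
    for i in range(n_r + 1):
        r = alpha * delta * i / n_r
        for j in range(n_phi):
            phi = 2 * math.pi * j / n_phi
            x1 = (x[0] + r * math.cos(phi), x[1] + r * math.sin(phi))  # a point of C_P^delta(x)
            worst = min(worst, theta(x1, y1, ell, alpha, beta) + delta - base)
    return worst

print(worst_gain((0.0, 0.0), (3.0, 4.0), 1.0, 2.0, 1.0, 0.3))
```

The smallest gain is close to zero because the best sampled choice of P is close to the straight move toward E, which keeps $\theta + t$ unchanged.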
Now let us prove the validity of the right-hand side of the inequality (3.4.3). First we establish the following inequality:
$$\theta(x(t_k),y(t_k),\ell) \ge \theta(x(t_{k+1}),y(t_{k+1}),\ell) + \delta,$$
where $x(t_k)$, $y(t_k)$, $x(t_{k+1})$, $y(t_{k+1})$ are the points at which the players P and E arrive in the situation $(u^*_\delta(\cdot),v_\delta(\cdot))$ at the instants $t_k$, $t_{k+1}$. Let $v(t)$, $t \in [t_k,t_{k+1})$, be an open-loop control transferring the point $y(t_k)$ to the point $y(t_{k+1})$, and let $u(t)$, $t \in [t_k,t_{k+1})$, be an open-loop control transferring the point $x(t_k)$ to the point $x(t_{k+1})$. Let $D$ be the set of measurable open-loop controls that are coincident with the control $v(t)$ on the interval $[t_k,t_{k+1})$, where $t_k$ is the time instant when Player E's choice in the situation $(u^*_\delta(\cdot),v_\delta(\cdot))$ is for the first time noncoincident with his choice in the situation $(u^*_\delta(\cdot),v^*_\delta(\cdot))$. Then
$$\theta(x,y,\ell) = t_k + \theta(x(t_k),y(t_k),\ell), \qquad \theta(x(t_k),y(t_k),\ell) \ge \max_{v(t) \in D}\min_{u(t)} t_n(x(t_k),y(t_k);u(t),v(t)). \tag{3.4.23}$$
Consider an auxiliary game $\Gamma^{\delta}(x(t_k),\bar y)$, where Player E remains at the point $\bar y = y(t_{k+1})$ during the time $\delta$. Let $x'^{\delta}(t)$ be a conditionally optimal trajectory for Player P in the game $\Gamma^{\delta}(x(t_k),\bar y)$, and let $\Gamma^{\delta-\sigma}(x'^{\delta}(t_k+\sigma),\bar y)$, $0 \le \sigma \le \delta$, $x'^{\delta}(t_k) = x(t_k)$, be the game where Player E remains at the point $\bar y = y(t_{k+1})$ during the time $t_{k+1} - (t_k+\sigma) = \delta - \sigma$. Let $\theta'(x'^{\delta}(t_k+\sigma),\bar y,\ell)$ be the maximin time prior to the $\ell$-capture computed in this game. Then
$$\theta'(x(t_k),\bar y,\ell) = \theta'(x(t_k),y(t_{k+1}),\ell) = \max_{v(t) \in D}\min_{u(t)} t_n(x(t_k),y(t_k);u(t),v(t)) \le \theta(x(t_k),y(t_k),\ell). \tag{3.4.24}$$
It follows from condition 3 of Theorem 6 that for all $0 \le \sigma \le \delta$
$$\theta'(x'^{\delta}(t_k+\sigma),y(t_{k+1}),\ell) + \sigma = \mathrm{const}. \tag{3.4.25}$$
Hence, in particular, for $\sigma = 0$ and $\sigma = \delta$ we get
$$\theta'(x(t_k),y(t_{k+1}),\ell) = \theta'(x'^{\delta}(t_{k+1}),y(t_{k+1}),\ell) + \delta. \tag{3.4.26}$$
However,
$$\theta'(x'^{\delta}(t_{k+1}),y(t_{k+1}),\ell) = \theta(x'^{\delta}(t_{k+1}),y(t_{k+1}),\ell), \tag{3.4.27}$$
since in this case Player E remains unmoved at the point $y(t_{k+1})$ during the time $\delta - \sigma = 0$. As a result, we have
$$\theta(x(t_k),y(t_k),\ell) \ge \theta(x'^{\delta}(t_{k+1}),y(t_{k+1}),\ell) + \delta. \tag{3.4.28}$$
Consequently, there exists a point $x'^{\delta}(t_{k+1}) \in C_P^{\delta}(x(t_k))$ for which (3.4.28) is satisfied. Then the following inequality holds:
$$\theta(x(t_k),y(t_k),\ell) \ge \theta(\bar x,y(t_{k+1}),\ell) + \delta, \tag{3.4.29}$$
where $\bar x$ is selected under the strategy $u^*_\delta(\cdot)$ from the condition of minimum of the function $\theta(x,y(t_{k+1}),\ell)$ on the set $C_P^{\delta}(x(t_k))$, i.e. in the situation $(u^*_\delta(\cdot),v_\delta(\cdot))$
$$\theta(x,y,\ell) = \theta(x(t_k),y(t_k),\ell) + t_k \ge \theta(x(t_{k+1}),y(t_{k+1}),\ell) + t_{k+1}. \tag{3.4.30}$$
The rest of the proof proceeds in much the same way as the proof of the left-hand side of the inequality (3.4.3). □

We now turn to the proof of Theorem 6.

Proof: By hypothesis, the function $\theta(x,y,\ell)$ is continuous in all of its arguments. Consequently, for any $\varepsilon > 0$ we find such $\delta > 0$ that as soon as
$$\rho((x,y,\ell),(x',y',\ell')) < \delta, \tag{3.4.31}$$
we have
$$|\theta(x,y,\ell) - \theta(x',y',\ell')| < \frac{\varepsilon}{2}. \tag{3.4.32}$$
By $\delta$ we choose $\delta_1 > 0$ so small that (3.4.31) holds for any $x'$, $y'$ with $\rho(x,x') < \delta_1$, $\rho(y,y') < \delta_1$ and $\ell' = \ell - \delta_1$. The assumption about the continuity of $C_P^{\delta}(x)$, $C_E^{\delta}(y)$ in $\delta$ at $\delta = 0$, uniformly with respect to $x$, $y$, implies that by $\delta_1$ it is possible to find such $\sigma_1 > 0$, $\sigma_2 > 0$ that for any $x' \in C_P^{\sigma_1}(x)$, $y' \in C_E^{\sigma_2}(y)$
$$\rho(x,x') < \delta_1, \qquad \rho(y,y') < \delta_1. \tag{3.4.33}$$
Introduce the notation
$$\delta_2 = \min\left\{\sigma_1, \sigma_2, \frac{\varepsilon}{2}\right\}. \tag{3.4.34}$$
Consider the strategy $u^*_\varepsilon(\cdot)$ for Player P that is an ordered pair $(\sigma, a)$, where $\sigma$ is a partition of the interval $[0,\infty)$ by the points $t_k$ with the step equal to $\delta_3 < \delta_2$, and $a$ is a map which associates the state $(t_0, x(t_0), y(t_0))$ ($t_0 = 0$, $x(t_0) = x_0$, $y(t_0) = y_0$) with an arbitrary admissible open-loop control $u(t)$, $t \in [t_0,t_1)$, and then coincides with the map $a'$ which associates the state $(t_k, x(t_k), y(t_k))$, $k \ge 1$, with an admissible open-loop control $u(t)$, $t \in [t_k,t_{k+1})$, transferring the point $x(t_k)$ to a point $x = x(t_{k+1})$ such that
$$\theta(x(t_{k+1}),y(t_k),\ell') = \min_{x' \in C_P^{\delta_3}(x(t_k))} \theta(x',y(t_k),\ell'), \tag{3.4.35}$$
where $\ell' = \ell - \delta_1$. Then the map $a'$ is an optimal strategy for Player P in the game $\Gamma_{\delta_3}(x(\delta_3),y,\ell')$. Moreover, let $v^*_\varepsilon(\cdot)$ be the strategy of Player E under which a partition of the time interval $[0,\infty)$ is chosen with an arbitrary step $\delta$ and the map coincides with the optimal strategy for Player E in the game $\Gamma_\delta(x,y,\ell)$. From the optimality of the strategy $v^*_\varepsilon(\cdot)$ in the game $\Gamma_\delta(x,y,\ell)$ it follows that for any strategy $u(\cdot)$ of Player P
$$\theta(x,y,\ell) \le t_n(x,y;u(\cdot),v^*_\varepsilon(\cdot)). \tag{3.4.36}$$
For $u(\cdot) = u^*_\varepsilon(\cdot)$, (3.4.36) takes the form
$$\theta(x,y,\ell) \le t_n(x,y;u^*_\varepsilon(\cdot),v^*_\varepsilon(\cdot)). \tag{3.4.37}$$
The fact that $u^*_\varepsilon(\cdot)$ is an optimal strategy in the game $\Gamma_{\delta_3}(x(\delta_3),y,\ell')$ implies
$$\theta(x(\delta_3),y,\ell') \ge t'_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)) - \delta_3, \tag{3.4.38}$$
where $t'_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)) = \tau$ is the time in which the points $x(t)$ and $y(t-\delta_3)$, starting from the initial positions $x$ and $y$, approach each other to the distance $\ell'$ in the situation $(u^*_\varepsilon(\cdot),v(\cdot))$. Next,
$$\rho(x(\tau),y(\tau)) \le \rho(x(\tau),y(\tau-\delta_3)) + \rho(y(\tau-\delta_3),y(\tau)).$$
However, $\delta_3 < \delta_2$, and, based on the choice of $\delta_2$ (which is uniform in $y$), we get $\rho(y(\tau-\delta_3),y(\tau)) < \delta_1$.
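Thus,
$$\rho(x(\tau),y(\tau)) < \ell' + \delta_1 = \ell, \tag{3.4.39}$$
which implies
$$t_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)) \le t'_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)). \tag{3.4.40}$$
From (3.4.32) we have
$$\theta(x(\delta_3),y,\ell') < \theta(x,y,\ell) + \frac{\varepsilon}{2}. \tag{3.4.41}$$
From (3.4.41), (3.4.38) and (3.4.40), and taking into account that $\delta_3 < \varepsilon/2$, we obtain
$$\theta(x,y,\ell) + \varepsilon > t_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)). \tag{3.4.42}$$
The inequality (3.4.42) holds for any strategy of Player E in the game $\Gamma(x,y,\ell)$. Substituting $v(\cdot) = v^*_\varepsilon(\cdot)$ in (3.4.42) we get
$$t_n(x,y;u^*_\varepsilon(\cdot),v^*_\varepsilon(\cdot)) - \varepsilon < \theta(x,y,\ell). \tag{3.4.43}$$
Using (3.4.43), (3.4.37), (3.4.36) and (3.4.42), we find
$$t_n(x,y;u^*_\varepsilon(\cdot),v(\cdot)) - \varepsilon < \theta(x,y,\ell) \le t_n(x,y;u^*_\varepsilon(\cdot),v^*_\varepsilon(\cdot)) < \theta(x,y,\ell) + \varepsilon \le t_n(x,y;u(\cdot),v^*_\varepsilon(\cdot)) + \varepsilon. \tag{3.4.44}$$
Thus, the situation $(u^*_\varepsilon(\cdot),v^*_\varepsilon(\cdot))$ is an $\varepsilon$-equilibrium point in the game $\Gamma(x,y,\ell)$, and $\theta(x,y,\ell)$ is the value of the game. □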
Remark. We may abandon the assumption about the existence of
$$\max_{v(t)}\min_{u(t)} t_n(x,y;u(t),v(t)) \quad \text{and} \quad \min_{x' \in C_P^{\delta}(x)} \theta(x',y,\ell)$$
by introducing the quantities
$$\theta(x,y,\ell) = \sup_{v(t)}\inf_{u(t)} t_n(x,y;u(t),v(t)), \qquad \inf_{x' \in C_P^{\delta}(x)} \theta(x',y,\ell).$$
In this case, a theorem similar to Theorem 6 is also valid. Because of cumbersome notation, its precise formulation and proof are omitted here.
3.5 Necessary and sufficient conditions for existence of optimal open-loop strategy for player E
In this section, we restrict ourselves to the pursuit games of prescribed duration, although all of its results extend to the optimal time pursuit games. Let $\Gamma_\delta(x_0,y_0,T)$ be a discrete game of pursuit with the step $\delta$ ($\delta = t_{k+1} - t_k$), with prescribed duration $T$ and with discrimination of Player E, from the initial states $x_0$, $y_0$. Then the following theorem holds.

Theorem 7.
$$\operatorname{Val}\Gamma_\delta(x_0,y_0,T) = \rho_T(x_0,y_0) \tag{3.5.1}$$
if and only if for all $x_0, y_0 \in R^n$ and $T = \delta k$, $k = 1, 2, \ldots$,
$$\rho_T(x_0,y_0) = \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T-\delta}(x,y), \tag{3.5.2}$$
where $\operatorname{Val}\Gamma_\delta(x_0,y_0,T)$ is the value of the game $\Gamma_\delta(x_0,y_0,T)$.
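For an assumed simple-motion example (not from the text) the reachability sets are disks, $C_P^T(x) = \{\xi : |\xi - x| \le \alpha T\}$ and $C_E^T(y) = \{\eta : |\eta - y| \le \beta T\}$, which gives $\rho_T(x,y) = \max(0, |x-y| + (\beta-\alpha)T)$. The sketch below evaluates the right-hand side of (3.5.2) by sampling both one-step disks and compares it with this closed form; agreement up to the sampling error suggests that (3.5.2) holds for this example. The helper names and the grid sizes are illustrative choices.

```python
import math

def rho_T(x, y, T, alpha, beta):
    """rho_T(x, y) for simple motion: disks of radii alpha*T and beta*T as reachable sets."""
    return max(0.0, math.dist(x, y) + (beta - alpha) * T)

def disk(c, r, n_r=4, n_phi=72):
    """Sample points of the closed disk of radius r centred at c (a one-step reachable set)."""
    pts = [c]
    for i in range(1, n_r + 1):
        for j in range(n_phi):
            a = 2 * math.pi * j / n_phi
            pts.append((c[0] + r * i / n_r * math.cos(a), c[1] + r * i / n_r * math.sin(a)))
    return pts

def one_step_rhs(x, y, T, delta, alpha, beta):
    """Right-hand side of (3.5.2) with both one-step disks replaced by samples."""
    xs = disk(x, alpha * delta)
    return max(min(rho_T(xi, eta, T - delta, alpha, beta) for xi in xs)
               for eta in disk(y, beta * delta))

for x, y, T in [((0.0, 0.0), (3.0, 4.0), 2.0), ((0.0, 0.0), (0.0, 10.0), 4.0)]:
    print(rho_T(x, y, T, 2.0, 1.0), one_step_rhs(x, y, T, 0.2, 2.0, 1.0))
```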
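The proof of the theorem is prefaced with the following lemma.

Lemma 3. For any $x_0, y_0 \in R^n$ and $T \ge \delta$, the following inequality is satisfied:
$$\rho_T(x_0,y_0) \le \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T-\delta}(x,y).$$

Proof: By the definition of the function $\rho_T(x,y) = \max_{\eta \in C_E^T(y)}\min_{\xi \in C_P^T(x)} \rho(\xi,\eta)$,
$$\max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T-\delta}(x,y) = \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)}\max_{\eta \in C_E^{T-\delta}(y)}\min_{\xi \in C_P^{T-\delta}(x)} \rho(\xi,\eta).$$
For all $x \in C_P^{\delta}(x_0)$ we have the inclusion $C_P^{T-\delta}(x) \subset C_P^T(x_0)$. Consequently, for any $\eta$
$$\min_{\xi \in C_P^{T-\delta}(x)} \rho(\xi,\eta) \ge \min_{\xi \in C_P^T(x_0)} \rho(\xi,\eta).$$
Then for any $x \in C_P^{\delta}(x_0)$ and $y \in C_E^{\delta}(y_0)$ the inequality
$$\max_{\eta \in C_E^{T-\delta}(y)}\min_{\xi \in C_P^{T-\delta}(x)} \rho(\xi,\eta) \ge \max_{\eta \in C_E^{T-\delta}(y)}\min_{\xi \in C_P^T(x_0)} \rho(\xi,\eta)$$
is satisfied. Thus, since $\bigcup_{y \in C_E^{\delta}(y_0)} C_E^{T-\delta}(y) = C_E^T(y_0)$,
$$\max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T-\delta}(x,y) \ge \max_{y \in C_E^{\delta}(y_0)}\max_{\eta \in C_E^{T-\delta}(y)}\min_{\xi \in C_P^T(x_0)} \rho(\xi,\eta) = \max_{\eta \in C_E^T(y_0)}\min_{\xi \in C_P^T(x_0)} \rho(\xi,\eta) = \rho_T(x_0,y_0).$$
This completes the proof of the lemma. □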
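Lemma 3 only uses the fact that reachable sets compose, $C^T(x_0) = \bigcup_{x \in C^{\delta}(x_0)} C^{T-\delta}(x)$. The sketch below checks the inequality by brute force in a hypothetical lattice analogue that is not in the text: the time step is $\delta = 1$, each player chooses a displacement from a fixed finite set at every step (the sets `P_MOVES`, `E_MOVES` are arbitrary), and reachable sets are Minkowski sums of these sets. With the sets chosen here the inequality is strict at $x_0 = y_0 = 0$, $T = 2$, so condition (3.5.2) fails at that state of the analogue.

```python
from functools import lru_cache

# Hypothetical lattice analogue with time step delta = 1: one-step displacement sets.
P_MOVES = (-1, 0, 2)
E_MOVES = (-1, 1)

@lru_cache(maxsize=None)
def sums(moves, k):
    """Displacements reachable in k steps: the k-fold Minkowski sum of `moves`."""
    if k == 0:
        return frozenset({0})
    return frozenset(s + m for s in sums(moves, k - 1) for m in moves)

def rho(k, x, y):
    """rho_k(x, y) = max over eta in C_E^k(y) of min over xi in C_P^k(x) of |xi - eta|."""
    return max(min(abs((x + a) - (y + b)) for a in sums(P_MOVES, k))
               for b in sums(E_MOVES, k))

def one_step(k, x, y):
    """Right-hand side of Lemma 3 (and of (3.5.2)): one step of max-min, then rho_{k-1}."""
    return max(min(rho(k - 1, x + a, y + b) for a in P_MOVES) for b in E_MOVES)

print(rho(2, 0, 0), one_step(2, 0, 0))   # 0 1
```

The strict gap appears because in $\rho_T$ Player P picks his terminal point knowing E's terminal point, whereas in the one-step form P must commit to his first move before E's second move is known.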
We now prove the theorem.

Proof: Necessity. Suppose that (3.5.1) is satisfied but (3.5.2) is not. Then, by the previous lemma, there exist $x_0, y_0 \in R^n$ and $T_0 = \delta k$, $k \ge 1$, such that
$$\rho_{T_0}(x_0,y_0) < \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T_0-\delta}(x,y). \tag{3.5.3}$$
Denote $x^0(t) = x(t;x_0,u^0(\cdot))$, where $u^0(\cdot)$ is an optimal strategy of Player P in the game $\Gamma_\delta(x_0,y_0,T_0)$. Evidently, there is a point $y^* \in C_E^{\delta}(y_0)$ for which
$$\min_{x \in C_P^{\delta}(x_0)} \rho_{T_0-\delta}(x,y^*) = \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T_0-\delta}(x,y). \tag{3.5.4}$$
Let $\bar u^0(\cdot)$, $\bar v^0(\cdot)$ be optimal strategies in the game $\Gamma_\delta(x^0(\delta),y^*,T_0-\delta)$. We focus on the following strategy $\bar v(\cdot)$ for Player E: at the instant $t = 0$ he chooses a measurable function $v(t)$, $t \in [0,\delta]$, with values in $V$, such that the motion generated by this function transfers Player E from the state $y_0$ to the state $y^*$ in the time equal to $\delta$, i.e. $y^* = y(\delta;y_0,v)$, and starting from the instant $t = \delta$ Player E plays the strategy $\bar v^0(\cdot)$. Denote by $\hat u^0(\cdot)$ the restriction of the strategy $u^0(\cdot)$ to the interval $[\delta,T_0]$. From (3.5.1), (3.5.3), (3.5.4) we find
$$\rho_{T_0}(x_0,y_0) \ge K(u^0(\cdot),\bar v(\cdot);x_0,y_0,T_0) = K(\hat u^0(\cdot),\bar v^0(\cdot);x^0(\delta),y^*,T_0-\delta) \ge \rho_{T_0-\delta}(x^0(\delta),y^*) \ge \min_{x \in C_P^{\delta}(x_0)} \rho_{T_0-\delta}(x,y^*) = \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T_0-\delta}(x,y) > \rho_{T_0}(x_0,y_0).$$
The contradiction obtained proves the necessity of the condition (3.5.2).

Sufficiency. Note that condition (3.5.2), together with the initial condition $\rho_0(x,y) = \rho(x,y)$, is an equation for the value function of the discrete finite-step game $\Gamma_\delta(x_0,y_0,T)$ (see Theorem 7, Ch. 1, and Theorem 2, Ch. 2). The function $\rho_T(x,y)$ satisfies Eq. (3.5.2); hence it is the value of the game $\Gamma_\delta(x,y,T)$. This proves the theorem. □
Lemma 4. For an optimal open-loop strategy (i.e. a strategy which is a function of time only) of Player E to exist in the game $\Gamma(x_0,y_0,T)$, it is necessary and sufficient that
$$\operatorname{Val}\Gamma(x_0,y_0,T) = \rho_T(x_0,y_0).$$
The proof follows from the definition of $\rho_T(x_0,y_0)$.

The game $\Gamma(x_0,y_0,T)$ is approximated by the discrete games $\Gamma_\delta(x_0,y_0,T)$.
Theorem 8. For Player E to have an optimal open-loop strategy in the game $\Gamma(x_0,y_0,T)$ for any $x_0 \in R^n$, $y_0 \in R^n$, $T \ge 0$, it is necessary and sufficient that for any $\delta > 0$ and any $x_0 \in R^n$, $y_0 \in R^n$, $T \ge \delta$
$$\rho_T(x_0,y_0) = \max_{y \in C_E^{\delta}(y_0)}\min_{x \in C_P^{\delta}(x_0)} \rho_{T-\delta}(x,y). \tag{3.5.5}$$

Proof: Sufficiency. We show that the condition (3.5.5) implies the equality
$$\rho_T(x_0,y_0) = \operatorname{Val}\Gamma(x_0,y_0,T). \tag{3.5.6}$$
Evidently, Player E has an open-loop strategy which ensures a payoff of at least $\rho_T(x_0,y_0)$, since for any strategy $u(\cdot)$ of Player P (whose terminal position always lies in $C_P^T(x_0)$)
$$K(u(\cdot),v^*(t);x_0,y_0,T) \ge \rho_T(x_0,y_0),$$
where $v^*(t)$ is an open-loop control transferring Player E from the state $y_0$ to the state $y' \in C_E^T(y_0)$ ($y'$ is the center of pursuit) such that
$$\min_{x \in C_P^T(x_0)} \rho(x,y') = \max_{y \in C_E^T(y_0)}\min_{x \in C_P^T(x_0)} \rho(x,y) = \rho_T(x_0,y_0).$$
Let us prove that for any $\varepsilon > 0$ Player P has a strategy which ensures that he will lose not more than $\rho_T(x_0,y_0) + \varepsilon$. Fix $\varepsilon > 0$. Consider the following strategy $u_\varepsilon(\cdot)$ for Player P: Player P chooses $\delta_0 > 0$ and an arbitrary open-loop control $u_1(t)$, $t \in [0,\delta_0)$, and, starting from the instant $\delta_0$, he pursues the point $y(t-\delta_0)$ in the game $\Gamma_{\delta_0}(x(\delta_0),y_0,T-\delta_0)$, where $y(t)$ is Player E's motion. Because of the continuous dependence of the sets $C_P^T(x_0)$, $C_E^T(y_0)$ on $x_0$, $y_0$, $T$, there is $\delta_1 > 0$ such that for $\delta \le \delta_1$
$$|\rho_{T-\delta}(x(\delta),y_0) - \rho_T(x_0,y_0)| < \frac{\varepsilon}{2},$$
and there is $\delta_2 > 0$ such that for all $y \in C_E^T(y_0)$ and $y' \in C_E^{\delta_2}(y)$ the distance $\rho(y,y') < \varepsilon/2$. Set $\delta_0 = \min\{\delta_1,\delta_2\}$. Since Player P uses an optimal strategy $u^0(\cdot)$ in the game $\Gamma_{\delta_0}(x(\delta_0),y_0,T-\delta_0)$ (see Theorem 7), he ensures, in this game, an approach to the delayed position $y(t-\delta_0)$ of Player E to a distance not more than $\rho_{T-\delta_0}(x(\delta_0),y_0)$. This implies an approach to Player E to a distance not more than $\rho_{T-\delta_0}(x(\delta_0),y_0) + \varepsilon/2$, since it follows from the choice of $\delta_0$ that in the time $\delta_0$ Player E cannot cover a distance of more than $\varepsilon/2$. However, $|\rho_{T-\delta_0}(x(\delta_0),y_0) - \rho_T(x_0,y_0)| < \varepsilon/2$. Therefore, Player P ensures an approach to Player E to a distance not more than $\rho_T(x_0,y_0) + \varepsilon$. This completes the proof of the sufficiency.
Necessity. Assume that Player E has an optimal open-loop strategy. Also, suppose that there are $\delta_0 > 0$, $x_0, y_0 \in R^n$ and $T_0$ such that
$$\rho_{T_0}(x_0,y_0) < \max_{y \in C_E^{\delta_0}(y_0)}\min_{x \in C_P^{\delta_0}(x_0)} \rho_{T_0-\delta_0}(x,y). \tag{3.5.7}$$
Because of Lemma 3, an inequality of the opposite sign is not possible. Then Player E has the following strategy $\bar v(\cdot)$: at the instant $t = 0$ he chooses the open-loop control $v(t)$, $t \in [0,\delta_0]$, transferring $y_0$ to $y''$, where $y''$ is such that
$$\min_{x \in C_P^{\delta_0}(x_0)} \rho_{T_0-\delta_0}(x,y'') = \max_{y \in C_E^{\delta_0}(y_0)}\min_{x \in C_P^{\delta_0}(x_0)} \rho_{T_0-\delta_0}(x,y),$$
and from the instant $t = \delta_0$ he plays the optimal open-loop strategy $v^*(t)$ in the game $\Gamma(x,y'',T_0-\delta_0)$, where $x$ is the state in which Player P appears at the instant $t = \delta_0$. Hence for any strategy $u(\cdot) \in P$
$$K(u(\cdot),\bar v(\cdot);x_0,y_0,T_0) = K(\hat u(\cdot),v^*(t);x(\delta_0;x_0,u(\cdot)),y'',T_0-\delta_0) \ge \rho_{T_0-\delta_0}(x(\delta_0;x_0,u(\cdot)),y'') \ge \min_{x \in C_P^{\delta_0}(x_0)} \rho_{T_0-\delta_0}(x,y'') > \rho_{T_0}(x_0,y_0), \tag{3.5.8}$$
where $\hat u(\cdot)$ is the restriction of the strategy $u(\cdot)$ to the interval $[\delta_0,T_0]$. By Lemma 4, $\operatorname{Val}\Gamma(x_0,y_0,T_0) = \rho_{T_0}(x_0,y_0)$, so the inequality (3.5.8) contradicts the optimality of the open-loop strategy. This completes the proof of the theorem. □
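The sufficiency part of the proof of Theorem 8 uses the open-loop control of Player E that steers him to the center of pursuit $y'$. For an assumed simple-motion example (not from the text) E simply runs at full speed along the ray from $x_0$ through $y_0$, so $y' = y_0 + \beta T (y_0 - x_0)/|y_0 - x_0|$. The sketch computes $y'$ and checks, against randomly generated admissible paths of P with piecewise-constant controls (an arbitrary choice of test paths), that the terminal distance is never below $\rho_T(x_0,y_0)$.

```python
import math
import random

def rho_T(x, y, T, alpha, beta):
    """rho_T for the assumed simple-motion example."""
    return max(0.0, math.dist(x, y) + (beta - alpha) * T)

def center_of_pursuit(x0, y0, T, beta):
    """Terminal point y' of E under the open-loop control v*(t) = beta * e (run away from x0)."""
    d = math.dist(x0, y0)
    e = ((y0[0] - x0[0]) / d, (y0[1] - x0[1]) / d)
    return (y0[0] + beta * T * e[0], y0[1] + beta * T * e[1])

def random_pursuer_endpoint(x0, T, alpha, steps, rng):
    """End point of an admissible path of P with piecewise-constant controls, |u| <= alpha."""
    px, py = x0
    h = T / steps
    for _ in range(steps):
        a, s = rng.uniform(0.0, 2 * math.pi), rng.uniform(0.0, alpha)
        px, py = px + h * s * math.cos(a), py + h * s * math.sin(a)
    return (px, py)

rng = random.Random(0)
x0, y0, T, alpha, beta = (0.0, 0.0), (3.0, 4.0), 2.0, 2.0, 1.0
y_c = center_of_pursuit(x0, y0, T, beta)
worst = min(math.dist(random_pursuer_endpoint(x0, T, alpha, 20, rng), y_c) for _ in range(500))
print(rho_T(x0, y0, T, alpha, beta), worst)
```

Because every terminal position of P lies in the disk $C_P^T(x_0)$, the smallest observed distance cannot fall below $\rho_T(x_0,y_0)$, whichever paths are sampled.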
By employing Theorem 7, we derive a partial differential equation for the value function of the game. The conditions of Theorem 3 are assumed to be satisfied for the game $\Gamma(x,y,T)$. Then the function $\rho_T(x,y)$ is the value of the game $\Gamma(x,y,T)$ with duration $T$ from the initial conditions $x$, $y$. Assume that in some region of the space $R^n \times R^n \times [0,\infty)$ the function $\rho_T(x,y)$ has continuous partial derivatives in all its variables. Since $\rho_T(x,y)$ is independent of $\delta$, Eq. (3.5.5) may be rewritten as
$$\max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} \left[\rho_{T-\delta}(\xi,\eta) - \rho_T(x,y)\right] = 0. \tag{3.5.9}$$
Let $\eta \in C_E^{\delta}(y)$, $\xi \in C_P^{\delta}(x)$. Then there is a pair of controls $u(t)$, $v(t)$ which transfers the points $x$ and $y$, respectively, to the states $\xi$ and $\eta$ in the time $\delta$. Let $x(t)$, $y(t)$ be the corresponding trajectories. In this case
$$\xi = x + \int_0^{\delta} f(x(t),u(t))\,dt, \qquad \eta = y + \int_0^{\delta} g(y(t),v(t))\,dt. \tag{3.5.10}$$
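Dividing both sides of Eq. (3.5.9) by $\delta > 0$, using Eq. (3.5.10) and letting $\delta \to 0$, we obtain
$$\frac{\partial \rho_T}{\partial T} = \max_{v \in V}\sum_{i=1}^{n} \frac{\partial \rho_T}{\partial y_i} g_i(y,v) + \min_{u \in U}\sum_{i=1}^{n} \frac{\partial \rho_T}{\partial x_i} f_i(x,u) \tag{3.5.11}$$
under the initial condition
$$\rho_T(x,y)\big|_{T=0} = \rho(x,y). \tag{3.5.12}$$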
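Equation (3.5.11) can be checked on an assumed simple-motion example (not from the text), where $\rho_T(x,y) = |x-y| + (\beta-\alpha)T$ wherever this quantity is positive. For $f = u$ with $|u| \le \alpha$ and $g = v$ with $|v| \le \beta$, the extremal values of the linear forms are $\min_u \langle \partial\rho_T/\partial x, u\rangle = -\alpha|\partial\rho_T/\partial x|$ and $\max_v \langle \partial\rho_T/\partial y, v\rangle = \beta|\partial\rho_T/\partial y|$. The sketch estimates the derivatives by central differences and returns the residual of (3.5.11), which should vanish up to rounding; the step `h` is an arbitrary choice.

```python
import math

ALPHA, BETA = 2.0, 1.0     # assumed speed bounds of P and E (simple motion)

def rho(x, y, T):
    """rho_T(x, y) for simple motion in the plane."""
    return max(0.0, math.dist(x, y) + (BETA - ALPHA) * T)

def grad(f, p, h):
    """Central-difference gradient of a scalar function f at the planar point p."""
    return ((f((p[0] + h, p[1])) - f((p[0] - h, p[1]))) / (2 * h),
            (f((p[0], p[1] + h)) - f((p[0], p[1] - h))) / (2 * h))

def isaacs_residual(x, y, T, h=1e-6):
    """d(rho)/dT - max_{|v|<=beta} <grad_y rho, v> - min_{|u|<=alpha} <grad_x rho, u>."""
    d_T = (rho(x, y, T + h) - rho(x, y, T - h)) / (2 * h)
    g_x = grad(lambda p: rho(p, y, T), x, h)
    g_y = grad(lambda q: rho(x, q, T), y, h)
    # The max (min) of a linear form over a disk of radius r is +r (-r) times the gradient norm.
    return d_T - BETA * math.hypot(*g_y) + ALPHA * math.hypot(*g_x)

print(isaacs_residual((0.0, 0.0), (3.0, 4.0), 1.0))
```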
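Assume that we have somehow succeeded in defining $\bar u$, $\bar v$, which give the min and max in Eq. (3.5.11), as functions of $x$, $y$, $T$ and $\partial\rho_T/\partial x$, $\partial\rho_T/\partial y$, i.e.
$$\bar u = \bar u\!\left(x,y,\frac{\partial \rho_T}{\partial x}\right), \qquad \bar v = \bar v\!\left(x,y,\frac{\partial \rho_T}{\partial y}\right). \tag{3.5.13}$$
Substituting (3.5.13) in Eq. (3.5.11), we get
$$\frac{\partial \rho_T}{\partial T} = \sum_{i=1}^{n} \frac{\partial \rho_T}{\partial y_i} g_i\!\left(y,\bar v\!\left(x,y,\frac{\partial \rho_T}{\partial y}\right)\right) + \sum_{i=1}^{n} \frac{\partial \rho_T}{\partial x_i} f_i\!\left(x,\bar u\!\left(x,y,\frac{\partial \rho_T}{\partial x}\right)\right), \tag{3.5.14}$$
provided that
$$\rho_T(x,y)\big|_{T=0} = \rho(x,y). \tag{3.5.15}$$
Thus, to define $\rho_T(x,y)$, we have the Cauchy problem for the first-order partial differential equation (3.5.14) with the initial condition (3.5.15). Eq. (3.5.14) has been derived independently by various authors (see [1, 33, 65]) and is called the Isaacs equation. Assume that the second-order partial derivatives of the function $\rho$ exist; then the solution of the Cauchy problem for Eq. (3.5.14) can be obtained by the method of characteristics, which are of the form
$$\frac{dx_i}{dt} = f_i(x,\bar u(x,y,p_x)), \qquad \frac{dy_i}{dt} = g_i(y,\bar v(x,y,p_y)),$$
$$\frac{dp_{x_i}}{dt} = -\sum_{j=1}^{n} p_{x_j}\frac{\partial f_j}{\partial x_i}(x,\bar u(x,y,p_x)), \qquad \frac{dp_{y_i}}{dt} = -\sum_{j=1}^{n} p_{y_j}\frac{\partial g_j}{\partial y_i}(y,\bar v(x,y,p_y)), \qquad i = 1,\ldots,n, \tag{3.5.16}$$
where $p_x = \partial\rho_T/\partial x$, $p_y = \partial\rho_T/\partial y$.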
Here $dx_i/dt$, $dp_{x_i}/dt$, $dy_i/dt$, $dp_{y_i}/dt$ are the time derivatives along the optimal trajectory. The derivation of Eq. (3.5.16) may be found in [1], where the equations of characteristics are shown to determine optimal trajectories for the players. The initial conditions for the integration of the equations (3.5.16) are obtained from (3.5.15).
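For the assumed simple-motion example (not from the text), $f(x,u) = u$ and $g(y,v) = v$ do not depend on the state, so the costate equations in (3.5.16) give constant $p_x$, $p_y$, and the characteristics are straight lines along which P runs toward E and E runs away. The sketch integrates (3.5.16) forward by Euler steps, starting from the gradients of $\rho_T$ at the initial state rather than from the terminal data (3.5.15), and checks that $\rho_{T_0-t}(x(t),y(t))$ stays constant, as it should along an optimal trajectory. The duration `T0` and the step count are arbitrary choices.

```python
import math

ALPHA, BETA, T0 = 2.0, 1.0, 2.0      # assumed speeds and game duration

def rho(x, y, T):
    return max(0.0, math.dist(x, y) + (BETA - ALPHA) * T)

def characteristic(x, y, steps=1000):
    """Euler integration of (3.5.16) for simple motion, f(x, u) = u, g(y, v) = v.
    Since f and g do not depend on the state, dp_x/dt = dp_y/dt = 0."""
    d = math.dist(x, y)
    p_x = ((x[0] - y[0]) / d, (x[1] - y[1]) / d)   # gradient of rho_T in x at t = 0
    p_y = (-p_x[0], -p_x[1])                        # gradient of rho_T in y at t = 0
    u = (-ALPHA * p_x[0], -ALPHA * p_x[1])          # minimizes <p_x, u> over |u| <= ALPHA (|p_x| = 1)
    v = (BETA * p_y[0], BETA * p_y[1])              # maximizes <p_y, v> over |v| <= BETA (|p_y| = 1)
    h = T0 / steps
    path = [(0.0, x, y)]
    for k in range(1, steps + 1):
        x = (x[0] + h * u[0], x[1] + h * u[1])
        y = (y[0] + h * v[0], y[1] + h * v[1])
        path.append((k * h, x, y))
    return path

path = characteristic((0.0, 0.0), (3.0, 4.0))
values = [rho(px, py, T0 - t) for t, px, py in path]
print(min(values), max(values))
```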
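For the optimal time pursuit game, equation (3.5.14) takes the form
$$\max_{v \in V}\sum_{i=1}^{n} \frac{\partial \theta}{\partial y_i} g_i(y,v) + \min_{u \in U}\sum_{i=1}^{n} \frac{\partial \theta}{\partial x_i} f_i(x,u) = -1 \tag{3.5.17}$$
under the initial condition
$$\theta(x,y,\ell)\big|_{\rho(x,y)=\ell} = 0. \tag{3.5.18}$$
Here, as in the previous case, the continuous first-order partial derivatives of the function $\theta(x,y,\ell)$ in $x$, $y$ are assumed to exist. Assuming that it is possible to define $\bar u$, $\bar v$, which provide the min and max in Eq. (3.5.17), as functions of $x$, $y$, i.e.
$$\bar u = \bar u\!\left(x,y,\frac{\partial \theta}{\partial x}\right), \qquad \bar v = \bar v\!\left(x,y,\frac{\partial \theta}{\partial y}\right), \tag{3.5.19}$$
we rewrite equation (3.5.17) in the form
$$\sum_{i=1}^{n} \frac{\partial \theta}{\partial y_i} g_i\!\left(y,\bar v\!\left(x,y,\frac{\partial \theta}{\partial y}\right)\right) + \sum_{i=1}^{n} \frac{\partial \theta}{\partial x_i} f_i\!\left(x,\bar u\!\left(x,y,\frac{\partial \theta}{\partial x}\right)\right) = -1,$$
provided that
$$\theta(x,y,\ell)\big|_{\rho(x,y)=\ell} = 0. \tag{3.5.20}$$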
The derivation of equation (3.5.17) is omitted here, since it is similar to the derivation of equation (3.5.14) for the pursuit game with prescribed duration. Both Cauchy problems (3.5.14), (3.5.15) and (3.5.17), (3.5.18) are nonlinear with respect to the partial derivatives, and therefore their solution poses serious problems. The differentiability of the functions $\rho$ and $\theta$ is itself a subtle and complicated problem. N. N. Krasovsky has shown the differentiability of the function $\rho$ in the so-called regular case [10].
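The Cauchy problem (3.5.17), (3.5.18) can be checked in the same way on an assumed simple-motion example (not from the text), where $\theta(x,y,\ell) = (|x-y| - \ell)/(\alpha-\beta)$ is differentiable wherever $|x-y| > \ell$. The sketch computes the left-hand side of (3.5.17) plus one from central-difference gradients; it should vanish up to rounding, and $\theta$ should vanish on the circle $\rho(x,y) = \ell$.

```python
import math

ALPHA, BETA, ELL = 2.0, 1.0, 1.0     # assumed speeds and capture radius

def theta(x, y):
    """theta(x, y, ELL) for simple motion in the plane."""
    return max(0.0, math.dist(x, y) - ELL) / (ALPHA - BETA)

def grad(f, p, h):
    """Central-difference gradient of a scalar function f at the planar point p."""
    return ((f((p[0] + h, p[1])) - f((p[0] - h, p[1]))) / (2 * h),
            (f((p[0], p[1] + h)) - f((p[0], p[1] - h))) / (2 * h))

def time_residual(x, y, h=1e-6):
    """Left-hand side of (3.5.17) plus 1; zero where theta solves the equation."""
    g_x = grad(lambda p: theta(p, y), x, h)
    g_y = grad(lambda q: theta(x, q), y, h)
    # max over |v| <= BETA and min over |u| <= ALPHA of the linear forms
    return BETA * math.hypot(*g_y) - ALPHA * math.hypot(*g_x) + 1.0

print(time_residual((0.0, 0.0), (3.0, 4.0)), theta((0.0, 0.0), (0.6, 0.8)))
```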
3.6 Iterative methods for solution of differential game of pursuit
Let $\Gamma_\delta(x,y,T)$ be a discrete form of the differential game $\Gamma(x,y,T)$ of duration $T > 0$ with terminal payoff $M(x(T),y(T))$, with a fixed step $\delta$ of the time-interval partition, Player E being discriminated for the time $\delta > 0$ ahead. Denote by $V_\delta(x,y,T)$ the value of the game $\Gamma_\delta(x,y,T)$. Then
$$\lim_{\delta \to 0} V_\delta(x,y,T) = V(x,y,T),$$
where $V(x,y,T)$ is the value of the game $\Gamma(x,y,T)$, and optimal strategies in the game $\Gamma_\delta(x,y,T)$ with sufficiently small $\delta$ can be effectively employed to construct $\varepsilon$-equilibrium points in the game $\Gamma(x,y,T)$.
We now expound the method.

Zero-order approximation. As a zero-order approximation of the value function of the game $\Gamma_\delta(x,y,T)$, we take the function
$$V_\delta^0(x,y,T) = \max_{\eta \in C_E^T(y)}\min_{\xi \in C_P^T(x)} M(\xi,\eta), \tag{3.6.1}$$
where $C_P^T(x)$, $C_E^T(y)$ are the reachability sets for the players P and E from the initial states $x, y \in R^n$ by the time instant $T$.
The choice of the function $V_\delta^0(x,y,T)$ as an initial approximation is justified by the fact that in a sufficiently broad class of games (the "regular case") it proves to be the value of the game $\Gamma(x,y,T)$ (see §2, §5 in this chapter). The following approximations are constructed by the rule
$$V_\delta^1(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^0(\xi,\eta,T-\delta),$$
$$V_\delta^2(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^1(\xi,\eta,T-\delta),$$
$$\ldots$$
$$V_\delta^k(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^{k-1}(\xi,\eta,T-\delta) \tag{3.6.2}$$
for $T > \delta$, and $V_\delta^k(x,y,T) = V_\delta^0(x,y,T)$, $k = 1, 2, \ldots$, for $T \le \delta$.

As we may see from the formulas (3.6.2), the operation max min is taken with respect to the reachability sets $C_E^{\delta}(y)$, $C_P^{\delta}(x)$ for the time $\delta$, i.e. for one step of the discrete game $\Gamma_\delta(x,y,T)$.
Theorem 9. Let the terminal payoff be equal to $M(x(T),y(T))$, where $M(x,y)$ is a function continuous on $R^n \times R^n$. Then the sequence of functions $\{V_\delta^k(x,y,T)\}$ is monotone nondecreasing.
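Proof: We first prove the inequality
$$V_\delta^1(x,y,T) \ge V_\delta^0(x,y,T).$$
For all $\xi \in C_P^{\delta}(x)$ there is the inclusion $C_P^{T-\delta}(\xi) \subset C_P^T(x)$. By the definition of the function $V_\delta^0$ we have
$$V_\delta^0(\xi,\eta,T-\delta) = \max_{\eta' \in C_E^{T-\delta}(\eta)}\min_{\xi' \in C_P^{T-\delta}(\xi)} M(\xi',\eta').$$
Consequently, for any $\eta' \in C_E^{T-\delta}(\eta)$
$$\min_{\xi' \in C_P^{T-\delta}(\xi)} M(\xi',\eta') \ge \min_{\xi' \in C_P^T(x)} M(\xi',\eta').$$
Hence we have that for any $\xi \in C_P^{\delta}(x)$
$$\max_{\eta' \in C_E^{T-\delta}(\eta)}\min_{\xi' \in C_P^{T-\delta}(\xi)} M(\xi',\eta') \ge \max_{\eta' \in C_E^{T-\delta}(\eta)}\min_{\xi' \in C_P^T(x)} M(\xi',\eta').$$
Therefore, in particular,
$$\min_{\xi \in C_P^{\delta}(x)} V_\delta^0(\xi,\eta,T-\delta) \ge \max_{\eta' \in C_E^{T-\delta}(\eta)}\min_{\xi' \in C_P^T(x)} M(\xi',\eta').$$
Thus
$$V_\delta^1(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^0(\xi,\eta,T-\delta) \ge \max_{\eta \in C_E^{\delta}(y)}\max_{\eta' \in C_E^{T-\delta}(\eta)}\min_{\xi' \in C_P^T(x)} M(\xi',\eta') = \max_{\eta' \in C_E^T(y)}\min_{\xi' \in C_P^T(x)} M(\xi',\eta') = V_\delta^0(x,y,T),$$
i.e. $V_\delta^1(x,y,T) \ge V_\delta^0(x,y,T)$. The proof of this inequality is given here only for the purposes of completeness, since it reproduces the one for Lemma 3.

Assume that for $l \le k$ the following inequality holds:
$$V_\delta^l(x,y,T) \ge V_\delta^{l-1}(x,y,T). \tag{3.6.3}$$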
We prove that
$$V_\delta^{k+1}(x,y,T) \ge V_\delta^k(x,y,T). \tag{3.6.4}$$
By definition,
$$V_\delta^{k+1}(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^k(\xi,\eta,T-\delta), \qquad V_\delta^k(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta^{k-1}(\xi,\eta,T-\delta). \tag{3.6.5}$$
From (3.6.3), and using (3.6.5), we obtain the required inequality (3.6.4). We have thus proved that the statement of the theorem holds at each point $(x,y,T)$ in the case $T > \delta$. In the case $T \le \delta$, the statement of the theorem is obvious. This completes the proof of the theorem. □

Theorem 10. The sequence $\{V_\delta^k(x,y,T)\}$ converges in a finite number of steps $N$, with the estimate $N \le [T/\delta] + 1$, where the brackets denote the greatest integer not exceeding $T/\delta$.

Proof: Let $N = [T/\delta] + 1$. Let us show that
$$V_\delta^N(x,y,T) = V_\delta^{N+1}(x,y,T). \tag{3.6.6}$$
The expression (3.6.6) is obtained from the construction of the sequence $\{V_\delta^k(x,y,T)\}$.
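Indeed,
$$V_\delta^N(x,y,T) = \max_{\eta^1 \in C_E^{\delta}(y)}\min_{\xi^1 \in C_P^{\delta}(x)}\max_{\eta^2 \in C_E^{\delta}(\eta^1)}\min_{\xi^2 \in C_P^{\delta}(\xi^1)}\cdots\max_{\eta^{N-1} \in C_E^{\delta}(\eta^{N-2})}\min_{\xi^{N-1} \in C_P^{\delta}(\xi^{N-2})} V_\delta^1(\xi^{N-1},\eta^{N-1},T-(N-1)\delta).$$
Similarly, we get
$$V_\delta^{N+1}(x,y,T) = \max_{\eta^1 \in C_E^{\delta}(y)}\min_{\xi^1 \in C_P^{\delta}(x)}\cdots\max_{\eta^{N-1} \in C_E^{\delta}(\eta^{N-2})}\min_{\xi^{N-1} \in C_P^{\delta}(\xi^{N-2})} V_\delta^2(\xi^{N-1},\eta^{N-1},T-(N-1)\delta).$$
However, $T - (N-1)\delta = \alpha < \delta$, therefore
$$V_\delta^1(\xi^{N-1},\eta^{N-1},T-(N-1)\delta) = V_\delta^2(\xi^{N-1},\eta^{N-1},T-(N-1)\delta) = V_\delta^0(\xi^{N-1},\eta^{N-1},\alpha),$$
whence (3.6.6) follows. The coincidence of the members of the sequence (3.6.2) for $k \ge N$ is obtained from Eq. (3.6.6) by induction. This proves the theorem. □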
Theorem 11. The limit of the sequence $\{V_\delta^k(x,y,T)\}$ coincides with the value of the game $\Gamma_\delta(x,y,T)$.

Proof: Basically, Theorem 11 is a corollary of Theorem 10. Indeed, let
$$V_\delta(x,y,T) = \lim_{k \to \infty} V_\delta^k(x,y,T). \tag{3.6.7}$$
Since the convergence takes place in a finite number of steps not exceeding $[T/\delta] + 1$, in the recurrent Eq. (3.6.2) it is possible to pass to the limit as $k \to \infty$. The limiting function $V_\delta(x,y,T)$ satisfies the equation
$$V_\delta(x,y,T) = \max_{\eta \in C_E^{\delta}(y)}\min_{\xi \in C_P^{\delta}(x)} V_\delta(\xi,\eta,T-\delta) \tag{3.6.8}$$
under the initial condition
$$V_\delta(x,y,T)\big|_{0 \le T \le \delta} = \max_{\eta \in C_E^T(y)}\min_{\xi \in C_P^T(x)} M(\xi,\eta),$$
which is a sufficient condition for the function $V_\delta(x,y,T)$ to be the value of the game $\Gamma_\delta(x,y,T)$.
With knowledge of the function $V_\delta(x,y,T)$, and using Eq. (3.6.8), we can construct optimal piecewise open-loop strategies in the game $\Gamma_\delta(x,y,T)$. The $\varepsilon$-optimal strategies in the basic game $\Gamma(x,y,T)$ are constructed with the help of strategies that are optimal in the game $\Gamma_\delta(x,y,T)$ (see §2, §5 in this chapter).
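Theorems 9-11 can be observed in a hypothetical lattice analogue (not from the text: integer positions in $\{0,\dots,L\}$, time step $\delta = 1$, clipped intervals as reachability sets), this time with a deliberately irregular terminal payoff, an arbitrary choice made so that the approximations do not stop at the first step. The sketch compares the approximations with the value obtained by plain backward induction over the one-step max-min, and implements a stopping rule: stop as soon as two successive approximations coincide on the whole computed domain.

```python
from functools import lru_cache

L, A, B = 10, 2, 1     # hypothetical arena 0..L and one-step speeds of P and E (delta = 1)

def reach(z, speed, k):
    """Clipped lattice interval reachable from z in k steps."""
    return range(max(0, z - k * speed), min(L, z + k * speed) + 1)

def M(xi, eta):
    """A deliberately irregular (hypothetical) terminal payoff."""
    return abs(xi - eta) + 2 * ((xi + eta) % 3 == 0)

def V0(x, y, T):
    return max(min(M(xi, eta) for xi in reach(x, A, T)) for eta in reach(y, B, T))

@lru_cache(maxsize=None)
def V(k, x, y, T):
    """Successive approximations (3.6.2)."""
    if k == 0 or T <= 1:
        return V0(x, y, T)
    return max(min(V(k - 1, xi, eta, T - 1) for xi in reach(x, A, 1))
               for eta in reach(y, B, 1))

@lru_cache(maxsize=None)
def W(x, y, T):
    """Value of the discrete game by backward induction: E moves, P answers knowing E's move."""
    if T == 0:
        return M(x, y)
    return max(min(W(xi, eta, T - 1) for xi in reach(x, A, 1))
               for eta in reach(y, B, 1))

def first_coincidence(T_max):
    """Smallest k with V^{k+1} == V^k at every state with T <= T_max (the stopping rule)."""
    k = 0
    while any(V(k + 1, x, y, T) != V(k, x, y, T)
              for T in range(T_max + 1) for x in range(L + 1) for y in range(L + 1)):
        k += 1
    return k

print(first_coincidence(5))
```

Once two successive tables coincide on the whole domain, every later approximation repeats them, which is why the stopping rule is safe; checking a single layer $T$ alone would not give this guarantee.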
As is clear from (3.6.2), the coincidence of two successive approximations at the steps $k$ and $k+1$ means that the corresponding approximation is the value of the game $\Gamma_\delta(x,y,T)$, since in this case all subsequent approximations coincide with the $k$-th approximation. Such a coincidence is a criterion for terminating the computations. There is every reason to expect that in a broad class of problems the convergence occurs faster than in the number of steps given in Theorem 11. In particular, in the "regular case" the computation terminates at the first step, after the function $V_\delta^1(x,y,T)$ has been computed (at the same time, this provides a "regularity" criterion).

We now modify the above method of successive approximations. As an initial approximation, we take the function $\bar V_\delta^0(x,y,T)=V_\delta^0(x,y,T)$, where $V_\delta^0(x,y,T)$ is defined by (3.6.1). The next approximation is constructed by the rule

$$\bar V_\delta^{k+1}(x,y,T)=\max_{1\le i\le N}\ \max_{\eta\in C_E^{i\delta}(y)}\ \min_{\xi\in C_P^{i\delta}(x)}\bar V_\delta^{k}(\xi,\eta,T-i\delta),\quad T>\delta;\qquad \bar V_\delta^{k+1}(x,y,T)=V_\delta^0(x,y,T),\quad T\le\delta, \qquad (3.6.9)$$

where $N=[T/\delta]$.
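The termination criterion and the "regular case" can be seen on a small hypothetical model that is not taken from the text: one-dimensional simple pursuit on an integer lattice, where in one step of length δ Player P moves at most A cells and Player E at most B cells (A > B), the state is z = y − x, and T = mδ. The sketch assumes that the recurrence (3.6.2) has the max-min form of (3.6.8), with $V_\delta^{k+1}=V_\delta^0$ for T ≤ δ; the names `v0`, `step` and `v1` are illustrative only.

```python
# Hypothetical lattice model of 1-D simple pursuit (illustration only).
# One step of length delta: P moves at most A cells, E at most B cells, A > B.
# State: z = y - x (E's position relative to P); T = m * delta.
A, B = 2, 1


def v0(z, m):
    """Zero-order approximation V^0, cf. (3.6.1): the max over E's reachable cells
    of the distance to P's reachable cells after m steps."""
    return max(0, abs(z) - (A - B) * m)


def step(prev, z, m):
    """One application of the assumed max-min recurrence (3.6.2):
    E chooses its displacement first (max), P answers knowing it (min).
    For T <= delta (m <= 1) the approximation is V^0 by definition."""
    if m <= 1:
        return v0(z, m)
    return max(
        min(prev(z + e - p, m - 1) for p in range(-A, A + 1))
        for e in range(-B, B + 1)
    )


def v1(z, m):
    """First approximation V^1 built from V^0."""
    return step(v0, z, m)


# Termination test: V^1 == V^0 on the grid, so every later iterate coincides
# with V^0 and the iteration stops at the first step (the "regular case").
m = 6
regular = all(v1(z, m) == v0(z, m) for z in range(-15, 16))
print(regular)
```

Because $V_\delta^1$ coincides with $V_\delta^0$ on the whole grid in this toy model, the computation stops after the first step, which is the regularity criterion described above.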
The statements of Theorems 9–11 hold for the sequence of functions $\{\bar V_\delta^k(x,y,T)\}$ as well as for the sequence $\{V_\delta^k(x,y,T)\}$. For $\{\bar V_\delta^k(x,y,T)\}$ they are proved in much the same way as for $\{V_\delta^k(x,y,T)\}$. The functional equation for the value function of the game $\Gamma_\delta(x,y,T)$ in the region $\{(x,y,T)\mid T>\delta\}$ is of the form

$$V(x,y,T)=\max_{1\le i\le N}\ \max_{\eta\in C_E^{i\delta}(y)}\ \min_{\xi\in C_P^{i\delta}(x)}V(\xi,\eta,T-i\delta), \qquad (3.6.10)$$

where $N=[T/\delta]$, and the initial condition remains the same, i.e.

$$V(x,y,T)\big|_{T\le\delta}=\max_{\eta\in C_E^{T}(y)}\ \min_{\xi\in C_P^{T}(x)}H(\xi,\eta).$$
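Theorem 12 below can be illustrated on a hypothetical lattice model (not from the text): one-dimensional simple pursuit with z = y − x, T = mδ, and per-step displacements of at most A cells for P and B cells for E. In this sketch the helper `rhs` evaluates the right-hand side of (3.6.10) with the outer maximum restricted to i ≤ imax, so imax = 1 gives the right-hand side of (3.6.8). The candidate function `v0` is the max-min of the distance over the reachable intervals, which on this lattice satisfies the initial condition (3.6.11).

```python
# Hypothetical lattice model of 1-D simple pursuit (illustration only):
# per step of length delta, P moves at most A cells and E at most B cells,
# z = y - x, and T = m * delta.
A, B = 2, 1


def v0(z, m):
    """Candidate solution max(0, |z| - (A - B) m); for m <= 1 it equals the
    initial condition (3.6.11) of the lattice model."""
    return max(0, abs(z) - (A - B) * m)


def rhs(V, z, m, imax):
    """Right-hand side of (3.6.10) with the outer max over i restricted to
    1 <= i <= imax; imax = 1 gives the right-hand side of (3.6.8).
    In i steps E reaches [-B*i, B*i] and P reaches [-A*i, A*i]."""
    return max(
        max(
            min(V(z + e - p, m - i) for p in range(-A * i, A * i + 1))
            for e in range(-B * i, B * i + 1)
        )
        for i in range(1, imax + 1)
    )


m = 5  # T = 5 * delta, so N = [T / delta] = 5
zs = range(-12, 13)
satisfies_368 = all(v0(z, m) == rhs(v0, z, m, 1) for z in zs)
satisfies_3610 = all(v0(z, m) == rhs(v0, z, m, m) for z in zs)
print(satisfies_368, satisfies_3610)
```

Both checks succeed for the same function, which is what the equivalence of (3.6.8) and (3.6.10) leads one to expect.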
We prove the equivalence of Eq. (3.6.8) and Eq. (3.6.10).

Theorem 12. Eq. (3.6.8) and Eq. (3.6.10) with the initial condition

$$V(x,y,T)\big|_{T\le\delta}=\max_{\eta\in C_E^{T}(y)}\ \min_{\xi\in C_P^{T}(x)}H(\xi,\eta) \qquad (3.6.11)$$

are equivalent.
Proof: Suppose the function $V_\delta(x,y,T)$ satisfies Eq. (3.6.8) and initial condition (3.6.11). We show that it satisfies Eq. (3.6.10) in the region $\{(x,y,T)\mid T>\delta\}$. Indeed, since $V_\delta(x,y,T)$ satisfies (3.6.8), the relationships

$$\begin{aligned}
V_\delta(x,y,T)&=\max_{\eta\in C_E^{\delta}(y)}\min_{\xi\in C_P^{\delta}(x)}V_\delta(\xi,\eta,T-\delta)\\
&=\max_{\eta\in C_E^{\delta}(y)}\min_{\xi\in C_P^{\delta}(x)}\max_{\eta'\in C_E^{\delta}(\eta)}\min_{\xi'\in C_P^{\delta}(\xi)}V_\delta(\xi',\eta',T-2\delta)\\
&\ge\max_{\eta\in C_E^{\delta}(y)}\max_{\eta'\in C_E^{\delta}(\eta)}\min_{\xi\in C_P^{\delta}(x)}\min_{\xi'\in C_P^{\delta}(\xi)}V_\delta(\xi',\eta',T-2\delta)
=\max_{\eta\in C_E^{2\delta}(y)}\min_{\xi\in C_P^{2\delta}(x)}V_\delta(\xi,\eta,T-2\delta)\ge\dots\\
&\ge\max_{\eta\in C_E^{i\delta}(y)}\min_{\xi\in C_P^{i\delta}(x)}V_\delta(\xi,\eta,T-i\delta)\ge\dots
\end{aligned}$$

hold in the region $\{(x,y,T)\mid T>i\delta\}$. Since the equality

$$V_\delta(x,y,T)=\max_{\eta\in C_E^{i\delta}(y)}\min_{\xi\in C_P^{i\delta}(x)}V_\delta(\xi,\eta,T-i\delta)$$

is satisfied for $i=1$, the equality

$$V_\delta(x,y,T)=\max_{1\le i\le N}\max_{\eta\in C_E^{i\delta}(y)}\min_{\xi\in C_P^{i\delta}(x)}V_\delta(\xi,\eta,T-i\delta),$$

where $N=[T/\delta]$, holds, which proves the required statement.

Now let the function $V_\delta(x,y,T)$ satisfy Eq. (3.6.10) in the region $\{(x,y,T)\mid T>\delta\}$ together with initial condition (3.6.11). We show that it also satisfies Eq. (3.6.8). Suppose the opposite is true. Then from (3.6.10) we find

$$V_\delta(x,y,T)>\max_{\eta\in C_E^{\delta}(y)}\min_{\xi\in C_P^{\delta}(x)}V_\delta(\xi,\eta,T-\delta) \qquad (3.6.12)$$

in the region $\{(x,y,T)\mid T>\delta\}$. However,

$$\begin{aligned}
\max_{\eta\in C_E^{\delta}(y)}\min_{\xi\in C_P^{\delta}(x)}V_\delta(\xi,\eta,T-\delta)
&=\max_{\eta\in C_E^{\delta}(y)}\min_{\xi\in C_P^{\delta}(x)}\max_{1\le i\le N-1}\max_{\eta'\in C_E^{i\delta}(\eta)}\min_{\xi'\in C_P^{i\delta}(\xi)}V_\delta(\xi',\eta',T-(i+1)\delta)\\
&\ge\max_{1\le i\le N-1}\max_{\eta\in C_E^{\delta}(y)}\max_{\eta'\in C_E^{i\delta}(\eta)}\min_{\xi\in C_P^{\delta}(x)}\min_{\xi'\in C_P^{i\delta}(\xi)}V_\delta(\xi',\eta',T-(i+1)\delta)\\
&=\max_{2\le i\le N}\max_{\eta\in C_E^{i\delta}(y)}\min_{\xi\in C_P^{i\delta}(x)}V_\delta(\xi,\eta,T-i\delta)=V_\delta(x,y,T),
\end{aligned}$$

since, by (3.6.12), the strict inequality holds for $i=1$, so that the maximum in (3.6.10) is attained at some $i\ge2$. As a result, we get a contradiction, which completes the proof of the theorem. ∎

Consider now the optimal-time pursuit game $\Gamma(x,y)$. As is seen from §4, Ch. 2, the existence of $(\varepsilon,\delta)$-equilibrium points in the game $\Gamma(x,y)$ for any $\varepsilon>0$, $\delta>0$ is connected with the solution of the game $\Gamma(x,y,T)$ with prescribed duration $T$ and the payoff $w=\min_{0\le t\le T}\rho(x(t),y(t))$. In the game $\Gamma(x,y,T)$ with prescribed duration, the solution is commonly understood to be the $\varepsilon$-equilibrium point. For any $\varepsilon>0$, the existence of the $\varepsilon$-equilibrium point and of the game value for games with prescribed duration is proved under sufficiently general assumptions in §3, Ch. 2.
Let $V(x,y,T)$ be the value of such a game. Fix the initial state $x,y\in R^n$ and consider the function $w(t)=V(x,y,t)$, which is a continuous nonincreasing function of $t\in[0,\infty)$ (see §4, Ch. 2). Let $\Gamma_\delta(x,y,T)$ be the game with prescribed duration $T$, the payoff $w$ and the value $V_\delta(x,y,T)$, where Player E is discriminated for the time $\delta>0$. Based on §3, Ch. 2,

$$w_\delta(t)=V_\delta(x,y,t),\qquad \lim_{\delta\to0}V_\delta(x,y,t)=V(x,y,t).$$

Denote by $\Gamma_\delta^\varepsilon(x,y)$ (see §4 in this chapter) the optimal-time game up to the $\varepsilon$-capture instant in which the evader is discriminated for the time $\delta$ ahead. Assume that the values of the games $\Gamma^\varepsilon(x,y)$ and $\Gamma_\delta^\varepsilon(x,y)$ exist, are finite and are equal, respectively, to $V^\varepsilon(x,y)$ and $V_\delta^\varepsilon(x,y)$. Then the following theorem holds.
Theorem 13. $\lim_{\delta\to0}V_\delta^\varepsilon(x,y)=V^\varepsilon(x,y)$ at each point $(x,y)\in R^n\times R^n$.

Proof: As is evident from Theorem 3, Ch. 2, $V^\varepsilon(x,y)$ and $V_\delta^\varepsilon(x,y)$ are, for fixed $x,y$, the minimum roots of the equations

$$V(x,y,T)=\varepsilon,\qquad V_\delta(x,y,T)=\varepsilon. \qquad (3.6.13)$$

Moreover,

$$w(T)=V(x,y,T)\ge V_\delta(x,y,T)=w_\delta(T), \qquad (3.6.14)$$

and the functions $w(T)$, $w_\delta(T)$ are monotonically nonincreasing. Since $V^\varepsilon(x,y)$ and $V_\delta^\varepsilon(x,y)$ are the minimum roots of (3.6.13), there are $\alpha>0$, $\alpha_1>0$ such that on the intervals $[V^\varepsilon(x,y)-\alpha,V^\varepsilon(x,y)]$, $[V_\delta^\varepsilon(x,y)-\alpha_1,V_\delta^\varepsilon(x,y)]$ the functions $w(T)$ and $w_\delta(T)$ are strictly decreasing. In addition, $\lim_{\delta\to0}w_\delta(T)=w(T)$. Therefore, for any $\alpha>0$ there are $\delta_1>0$ and $\tau\in[V^\varepsilon(x,y)-\alpha,V^\varepsilon(x,y)]$ such that $w_{\delta_1}(\tau)>w(V^\varepsilon(x,y))=\varepsilon$, and since $w_{\delta_1}(V^\varepsilon(x,y))\le w(V^\varepsilon(x,y))=\varepsilon$, the minimum root of the equation $w_{\delta_1}(T)=\varepsilon$ belongs to the interval $[V^\varepsilon(x,y)-\alpha,V^\varepsilon(x,y)]$. Consequently, $V_{\delta_1}^\varepsilon(x,y)$ also belongs to this interval. Since $\alpha$ may be chosen arbitrarily small, and the sequence $\{V_\delta^\varepsilon(x,y)\}$ is monotonically nondecreasing in $\delta$, we obtain

$$\lim_{\delta\to0}V_\delta^\varepsilon(x,y)=V^\varepsilon(x,y).$$

This completes the proof of the theorem. ∎
Theorem 13 suggests that the sequence $\{V_\delta^\varepsilon(x,y)\}$ can be used for approximation of $V^\varepsilon(x,y)$.
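The mechanism of the proof of Theorem 13 (minimum roots of nonincreasing functions $w_\delta\le w$ that converge to $w$) can be watched numerically. The sketch below finds the minimum root of $w(T)=\varepsilon$ by bisection on the monotone predicate $w(T)\le\varepsilon$. The particular functions are hypothetical: $w(T)=\max(0,\rho-(\alpha-\beta)T)$ imitates a simple-pursuit value with the payoff $\min_t\rho(x(t),y(t))$, and `w_delta` is an arbitrary stand-in for the discriminated game, chosen only so that $w_\delta\le w$ and $w_\delta\to w$.

```python
def min_root(w, eps, t_max, tol=1e-12):
    """Smallest T in [0, t_max] with w(T) <= eps for a continuous nonincreasing w,
    i.e. the minimum root of w(T) = eps when w(0) > eps.
    Bisection is valid because the predicate w(T) <= eps is monotone in T."""
    if w(0.0) <= eps:
        return 0.0
    if w(t_max) > eps:
        raise ValueError("w stays above eps on [0, t_max]")
    lo, hi = 0.0, t_max          # invariant: w(lo) > eps >= w(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


rho, alpha, beta, eps = 3.0, 2.0, 1.0, 0.5        # hypothetical data


def w(T):
    return max(0.0, rho - (alpha - beta) * T)


def w_delta(T, delta):
    return max(0.0, rho - (alpha - beta) * T - delta)   # stand-in with w_delta <= w


V_eps = min_root(w, eps, 10.0)
approx = [min_root(lambda T, d=d: w_delta(T, d), eps, 10.0) for d in (0.1, 0.01, 0.001)]
print(V_eps, approx)   # approx increases towards V_eps as delta -> 0
```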
Fix a particular $\delta>0$. The computation of $V_\delta^\varepsilon(x,y)$ is a rather laborious operation, since it involves the solution of the discrete Isaacs equation. We suggest a recurrent procedure for finding $V_\delta^\varepsilon(x,y)$.
Let $\theta(x,y;u(t),v(t))$ be the functional that is defined for any admissible open-loop controls $u(t)$ and $v(t)$ and is equal to the time until the $\varepsilon$-capture instant from the initial states $x,y\in R^n$ if $\varepsilon$-capture occurs when the controls $u(t)$ and $v(t)$ are used; otherwise it equals $\infty$. Next, let

$$Q^\varepsilon(x,y)=\sup_{v(t)}\ \inf_{u(t)}\ \theta(x,y;u(t),v(t)).$$

As a zero-order approximation, we take

$$V_{\delta,0}^\varepsilon(x,y)=Q^\varepsilon(x,y).$$

The choice of the zero-order approximation is justified by the fact that in some cases it provides the value of the game (see §4 in this chapter). The second approximation is constructed by the rule

$$V_{\delta,1}^\varepsilon(x,y)=\sup_{\eta\in C_E^{\delta}(y)}\ \inf_{\xi\in C_P^{\delta}(x)}V_{\delta,0}^\varepsilon(\xi,\eta)+\delta \qquad (3.6.15)$$

for $(x,y)\notin L_0$, and $V_{\delta,1}^\varepsilon(x,y)=V_{\delta,0}^\varepsilon(x,y)$ for $(x,y)\in L_0$, where $L_0=\{(x,y)\in R^n\times R^n\mid V_{\delta,0}^\varepsilon(x,y)<\delta\}$.
As in the proof of a similar statement in §5 of this chapter (see Lemma 3), we may show that for any states $x,y$, any $\varepsilon$ and any $\delta\in(0,Q^\varepsilon(x,y))$ the following inequality holds:

$$Q^\varepsilon(x,y)\le\sup_{\eta\in C_E^{\delta}(y)}\ \inf_{\xi\in C_P^{\delta}(x)}Q^\varepsilon(\xi,\eta)+\delta.$$

Hence $V_{\delta,1}^\varepsilon(x,y)\ge V_{\delta,0}^\varepsilon(x,y)$. The approximation $k+1$ is constructed by the rule

$$V_{\delta,k+1}^\varepsilon(x,y)=\sup_{\eta\in C_E^{\delta}(y)}\ \inf_{\xi\in C_P^{\delta}(x)}V_{\delta,k}^\varepsilon(\xi,\eta)+\delta \qquad (3.6.16)$$

for $(x,y)\notin L_k$, and $V_{\delta,k+1}^\varepsilon(x,y)=V_{\delta,k}^\varepsilon(x,y)$ for $(x,y)\in L_k$, where $L_k=\{(x,y)\in R^n\times R^n\mid V_{\delta,k}^\varepsilon(x,y)<(k+1)\delta\}$. For $(x,y)\in L_i$ the sequence $\{V_{\delta,k}^\varepsilon(x,y)\}$ converges in not more than $i$ steps, and the $i$-th approximation coincides with the value of the game $\Gamma_\delta^\varepsilon(x,y)$. Indeed, passing to the limit in Eq. (3.6.16) as $k\to\infty$ at a point $(x,y)\in L_i$ (this is possible because the process converges in a finite number of steps), we get

$$V_\delta^\varepsilon(x,y)=\sup_{\eta\in C_E^{\delta}(y)}\ \inf_{\xi\in C_P^{\delta}(x)}V_\delta^\varepsilon(\xi,\eta)+\delta, \qquad (3.6.17)$$

where $V_\delta^\varepsilon(x,y)=Q^\varepsilon(x,y)$ for $(x,y)\in L_0$.
However, Eq. (3.6.17) with a suitable initial condition on the set $L_0$ provides a sufficient condition for $V_\delta^\varepsilon(x,y)$ to be the value of the game $\Gamma_\delta^\varepsilon(x,y)$. Thus, the following theorem holds.

Theorem 14. Let $(x,y)\in L_i$ and let $k$ be the first step at which $V_{\delta,k+1}^\varepsilon(x,y)=V_{\delta,k}^\varepsilon(x,y)$. Then for all $l\ge k$

$$V_{\delta,l}^\varepsilon(x,y)=V_{\delta,k}^\varepsilon(x,y)=V_\delta^\varepsilon(x,y),$$

where $V_\delta^\varepsilon(x,y)$ is the value of the game $\Gamma_\delta^\varepsilon(x,y)$.

Knowing $V_\delta^\varepsilon(x,y)$ and using Eq. (3.6.17), we may find the $(\varepsilon,\delta)$-optimal strategies of the players P and E, as has been done in §4 of this chapter.
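A minimal sketch of the recurrent procedure (3.6.15)–(3.6.16) and of the stopping rule of Theorem 14, under hypothetical assumptions that are not taken from the text: a one-dimensional lattice model with z = y − x, one time step δ = 1, per-step speeds A > B, and ε-capture when |z| ≤ EPS cells. The zero-order approximation `q0` is the open-loop capture time in which E runs straight away and P chases. In this model $V_{\delta,1}^\varepsilon$ already equals $V_{\delta,0}^\varepsilon$, so k = 0 in Theorem 14.

```python
# Hypothetical lattice model (illustration only): z = y - x, one step delta = 1,
# P moves at most A cells per step, E at most B cells, eps-capture when |z| <= EPS.
A, B, EPS = 3, 1, 2


def q0(z):
    """Open-loop capture time in steps (E runs straight away, P chases):
    ceil((|z| - EPS) / (A - B)) outside the capture zone, 0 inside it."""
    gap = abs(z) - EPS
    return 0 if gap <= 0 else -(-gap // (A - B))   # integer ceiling


def refine(Vk, k):
    """Rule (3.6.16) with delta = 1: on L_k = {V_k < k + 1} keep V_k, elsewhere
    take sup over E's step (max) of inf over P's step (min) of V_k, plus one step."""
    def v_next(z):
        if Vk(z) < k + 1:
            return Vk(z)
        return max(min(Vk(z + e - p) for p in range(-A, A + 1))
                   for e in range(-B, B + 1)) + 1
    return v_next


v1 = refine(q0, 0)
zs = range(-20, 21)
stopped_at_zero = all(v1(z) == q0(z) for z in zs)
print(stopped_at_zero)   # V_1 == V_0 on the grid: k = 0 in Theorem 14
```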
Chapter 4
Examples of differential games of pursuit

4.1 Games with prescribed duration without phase constraints

This chapter studies differential games of pursuit whose solution is based on the results presented in Ch. 3.

Example 1. Let the reachability sets of Players P and E be such that

$$\max_{\eta\in C_E^T(y)}\ \min_{\xi\in C_P^T(x)}H(\xi,\eta)=\min_{\xi\in C_P^T(x)}\ \max_{\eta\in C_E^T(y)}H(\xi,\eta). \qquad (4.1.1)$$
Let $(\bar\xi,\bar\eta)$ be an arbitrary pair of points at which the max min in (4.1.1) is achieved. Then, by Theorem 1, Ch. 1, the min max is also achieved at the same points. In this case the point $\bar\eta$ satisfies the first condition of invariance. Indeed, for all $0\le t\le T$ the point $\bar\xi$ belongs to the set $C_P^{T-t}(x^*(t))$, and the point $\bar\eta$ belongs to the set $C_E^{T-t}(y^*(t))$. Moreover, from Theorem 1, Ch. 1, for all $\eta\in C_E^T(y)$, $\xi\in C_P^T(x)$ we have

$$H(\bar\xi,\eta)\le H(\bar\xi,\bar\eta)\le H(\xi,\bar\eta).$$

Since for $t\in[0,T]$ we have $C_E^{T-t}(y^*(t))\subset C_E^T(y)$ and $C_P^{T-t}(x^*(t))\subset C_P^T(x)$, the above inequality also holds for all $\eta\in C_E^{T-t}(y^*(t))$, $\xi\in C_P^{T-t}(x^*(t))$. Applying Theorem 1, Ch. 1, again, we obtain

$$\max_{\eta\in C_E^{T-t}(y^*(t))}\ \min_{\xi\in C_P^{T-t}(x^*(t))}H(\xi,\eta)=\min_{\xi\in C_P^{T-t}(x^*(t))}\ \max_{\eta\in C_E^{T-t}(y^*(t))}H(\xi,\eta)=H(\bar\xi,\bar\eta).$$

The latter relationship shows that the first condition of invariance of the point $\bar\eta$ is satisfied.
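Condition (4.1.1) can be tested directly once the reachability sets are discretized. The sketch below does this for a hypothetical one-dimensional case (not from the text), with $H(\xi,\eta)=|\xi-\eta|$ and the sets replaced by uniform grids on intervals. When $C_E^T(y)$ lies entirely to the right of $C_P^T(x)$, max min equals min max and the pair of right endpoints is a saddle point; when the intervals overlap, (4.1.1) fails.

```python
def grid(lo, hi, n=200):
    """n + 1 equally spaced points of the interval [lo, hi]."""
    return [lo + (hi - lo) * i / n for i in range(n + 1)]


def maxmin_minmax(P, E, H):
    """Return (max over eta of min over xi, min over xi of max over eta) of H."""
    maxmin = max(min(H(xi, eta) for xi in P) for eta in E)
    minmax = min(max(H(xi, eta) for eta in E) for xi in P)
    return maxmin, minmax


def H(xi, eta):
    return abs(xi - eta)


P = grid(0.0, 1.0)                               # stand-in for C_P^T(x)
separated = maxmin_minmax(P, grid(3.0, 3.5), H)  # C_E^T(y) to the right of C_P^T(x)
overlapping = maxmin_minmax(P, grid(-1.0, 2.0), H)
print(separated, overlapping)   # (4.1.1) holds only in the first case
```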
When (4.1.1) is satisfied, the differential multistage game $\Gamma(x,y,T)$ reduces to a simultaneous single-stage game, and it follows from (4.1.1) that such a game has an equilibrium point in pure strategies. In this case, the optimal strategy of Player P is the choice of the point $\bar\xi$, and that of Player E is the choice of the point $\bar\eta$. Hence in the corresponding differential game (under condition (4.1.1)) the optimal strategies of the players use only the information on the players' positions at the initial instant of the game and on the game duration $T$. Continuous "tracking" of the opponent is unnecessary.

Example 2 (simple pursuit). The pursuit occurs over the entire Euclidean space. Motion equations are of the form

$$\dot x_i=\alpha u_i,\quad i=1,\dots,n,\qquad \sum_{i=1}^n u_i^2\le1,$$
$$\dot y_i=\beta v_i,\quad i=1,\dots,n,\qquad \sum_{i=1}^n v_i^2\le1,$$

where $\alpha=\mathrm{const}$, $\beta=\mathrm{const}$, $\alpha>\beta$. The set $C_P^T(x)$ is a sphere with radius $\alpha T$ and center at the point $x$. The set $C_E^T(y)$ is a sphere with radius $\beta T$ and center at the point $y$. The payoff function is defined as follows:

$$H(x(T),y(T))=\rho(x(T),y(T))=\sqrt{\sum_{i=1}^n\big(x_i(T)-y_i(T)\big)^2}.$$
Geometrically, the invariance of the center of pursuit (the center of the game) is straightforward, and Theorem 3, Ch. 3, is applicable here. Indeed (see Example 1 from §2, Ch. 1),

$$\rho_T(x,y)=\rho(x,y)-(\alpha-\beta)T\qquad(\rho_T(x,y)>0).$$

The quantity $\rho_T(x,y)$ is achieved at the points $M_1$, $M$ (see Figs. 5–7, Ch. 1), where $M$ is the intersection point of the line of centers $xy$ ($OO_1$) with the boundary of $C_E^T(y)$ that is farthest from $x$ ($O$), and $M_1$ is the point of the set $C_P^T(x)$ nearest to $M$. The conditionally optimal trajectories of Players P and E are the straight-line segments $xM_1$ ($OM_1$) and $yM$ ($O_1M$). The first condition of invariance follows from the fact that for any $0\le t\le T$ the point $M(t)$, which is the center of pursuit with respect to the states $x^*(t)$, $y^*(t)$, $T-t$, remains on the center line $xy$ ($OO_1$), i.e. it coincides with $M$.
We now assume that at some instant $0<t<T$ Player E deviates from the conditionally optimal trajectory during the time $\delta$ and makes a transition to the point $y'\in C_E^{\delta}(y^*(t))$. Then the center of pursuit with respect to the sets $C_P^{T-t}(x^*(t))$ and $C_E^{T-t-\delta}(y')$, i.e. the center of pursuit $M'$ in the game $\Gamma^{\delta}(x^*(t),y',T-t)$, is on the center line $x^*(t)y'$. The conditionally optimal trajectory $x^*(t)$ of Player P in this game is a straight-line segment of length $\alpha(T-t)$ directed towards $M'$. Consider the games $\Gamma^{\delta-\sigma}(x^*(t+\sigma),y',T-t-\sigma)$ for $0\le\sigma\le\delta$. In these games the center of pursuit $M'(\sigma)$ is on the center line $x^*(t+\sigma)y'$, and since the trajectory $x^*(t)$ is directed towards $M'=M'(0)$, for all $0\le\sigma\le\delta$ the point $M'(\sigma)$ coincides with $M'$, i.e. $M'=M'(0)=M'(\sigma)$. We have thus proved the second condition of invariance. Consequently, the value of the game is

$$\rho_T(x,y)=\rho(x,y)-(\alpha-\beta)T.$$

The optimal strategy of Player E is open-loop and provides an open-loop transition to the center of pursuit $M$.
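The $\varepsilon$-optimal strategy of Player P at each instant of time minimizes the function $\rho_T$ with respect to the realized state of the game. Geometrically, this means motion towards the realized point $y$. In particular, when the equilibrium point is concerned, this leads to motion along the conditionally optimal straight trajectory directed towards the point $M$ (see Example 10, Ch. 2). If $R_1=|xM_1|<|xM|-R_2=|xM|-|yM|$, then, based on the Example from Chapter 1 (see §2, case 2),

$$\max_{\eta\in C_E^T(y)}\ \min_{\xi\in C_P^T(x)}\rho(\xi,\eta)=\min_{\xi\in C_P^T(x)}\ \max_{\eta\in C_E^T(y)}\rho(\xi,\eta)=\rho_T(x,y),$$

and the conditions of Example 1 are satisfied.

Now let us write formally the Isaacs equation for the "simple pursuit" game in the plane ($n=2$). It is of the form

$$\frac{\partial\rho}{\partial T}=\max_{v_1^2+v_2^2\le1}\beta\Big(\frac{\partial\rho}{\partial y_1}v_1+\frac{\partial\rho}{\partial y_2}v_2\Big)+\min_{u_1^2+u_2^2\le1}\alpha\Big(\frac{\partial\rho}{\partial x_1}u_1+\frac{\partial\rho}{\partial x_2}u_2\Big) \qquad (4.1.2)$$

with the initial condition $\rho_T(x,y)\big|_{T=0}=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}$. In (4.1.2), the max and min are achieved at

$$u_i(x,y,\rho_x)=-\frac{\partial\rho/\partial x_i}{\sqrt{(\partial\rho/\partial x_1)^2+(\partial\rho/\partial x_2)^2}},\qquad v_i(x,y,\rho_y)=\frac{\partial\rho/\partial y_i}{\sqrt{(\partial\rho/\partial y_1)^2+(\partial\rho/\partial y_2)^2}},\qquad i=1,2, \qquad (4.1.3)$$

and (4.1.2) takes the form

$$\frac{\partial\rho}{\partial T}+\alpha\sqrt{\Big(\frac{\partial\rho}{\partial x_1}\Big)^2+\Big(\frac{\partial\rho}{\partial x_2}\Big)^2}-\beta\sqrt{\Big(\frac{\partial\rho}{\partial y_1}\Big)^2+\Big(\frac{\partial\rho}{\partial y_2}\Big)^2}=0 \qquad (4.1.4)$$

with the initial condition

$$\rho_T(x,y)\big|_{T=0}=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}. \qquad (4.1.5)$$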
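We now check the continuous differentiability of the function

$$\rho_T(x,y)=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}-(\alpha-\beta)T, \qquad (4.1.6)$$

whose partial derivatives are

$$\frac{\partial\rho_T}{\partial x_i}=\frac{x_i-y_i}{\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}}=-\frac{\partial\rho_T}{\partial y_i},\qquad i=1,2.$$

It follows from (4.1.6) that the function $\rho_T(x,y)$ has continuous derivatives everywhere except on the surface

$$\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}=0. \qquad (4.1.7)$$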
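Before discussing the singular surface, a quick numerical check may help; it uses illustrative speeds α = 2, β = 1 chosen here, not taken from the text. Central differences of (4.1.6) make the left-hand side of (4.1.4) vanish away from the surface (4.1.7), whereas at a point with x = y the symmetric differences of the distance term vanish and the residual equals −(α − β), reflecting the loss of differentiability there.

```python
import math

ALPHA, BETA = 2.0, 1.0   # illustrative speeds with ALPHA > BETA


def rho_T(x1, x2, y1, y2, T):
    """Value (4.1.6) of planar simple pursuit."""
    return math.hypot(x1 - y1, x2 - y2) - (ALPHA - BETA) * T


def residual(x1, x2, y1, y2, T, h=1e-6):
    """Left-hand side of (4.1.4) with every derivative replaced by a central difference."""
    args = [x1, x2, y1, y2, T]

    def d(i):
        plus, minus = list(args), list(args)
        plus[i] += h
        minus[i] -= h
        return (rho_T(*plus) - rho_T(*minus)) / (2 * h)

    return d(4) + ALPHA * math.hypot(d(0), d(1)) - BETA * math.hypot(d(2), d(3))


print(residual(1.0, 2.0, -0.5, 0.3, 1.5))   # close to 0: (4.1.4) holds
print(residual(0.5, 0.5, 0.5, 0.5, 1.0))    # close to -(ALPHA - BETA) on (4.1.7)
```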
The surface (4.1.7) corresponds to the case $\rho_T(x,y)=0$, since in this case $x=y$ and $C_P^T(x)\supset C_E^T(y)$ ($\beta<\alpha$). Thus, the function $\rho_T(x,y)$ turns out to be a solution to the Cauchy problem for (4.1.4) in the region of space that does not contain the surface (4.1.7), the latter being singular for the Isaacs equation (4.1.4). Note that this singularity has been overcome in the procedure developed in Ch. 3 (Theorems 2, 3).

Example 3 (dynamic game of pursuit with frictional forces). The pursuit takes place over the entire plane. Motion equations are of the form

$$\dot q_i=p_i,\qquad \dot p_i=\alpha u_i-k_Pp_i,\qquad |u|\le1,\quad i=1,2,$$
$$\dot r_i=s_i,\qquad \dot s_i=\beta v_i-k_Es_i,\qquad |v|\le1,\quad i=1,2.$$
Here $q$ and $r$ are respectively the geometric coordinates of Players P and E, $s$ ($p$) is the momentum of Player E (P), and $k_P$, $k_E$ are constants interpreted as friction coefficients. The payoff function is defined as follows:

$$H(q(T),r(T))=\rho(q(T),r(T))=\sqrt{\sum_{i=1}^2\big(q_i(T)-r_i(T)\big)^2}.$$

First consider the case $k_P\ne0$, $k_E\ne0$. The set $C_P^T(q)$ (in the space $q$) is the circle with the radius

$$R_P(T)=\alpha\,\frac{e^{-k_PT}+k_PT-1}{k_P^2}$$

and its center at the point

$$\bar q=q+\frac{1-e^{-k_PT}}{k_P}\,p,$$

where $q$, $p$ are the starting position and initial momentum of Player P. The set $C_E^T(r)$ (in the space $r$) is the circle with the radius

$$R_E(T)=\beta\,\frac{e^{-k_ET}+k_ET-1}{k_E^2}$$

and its center at the point

$$\bar r=r+\frac{1-e^{-k_ET}}{k_E}\,s,$$

where $r$, $s$ are the starting position and initial momentum of Player E. Let us make the following assumption:

$$\alpha>\beta,\qquad \frac{\alpha}{k_P}>\frac{\beta}{k_E}. \qquad (4.1.8)$$

The quantity $\rho_T(q,r)$ equals

$$\rho_T(q,r)=\rho(O,O')+R_E(T)-R_P(T)$$

(in this example, the centers $O=\bar q$, $O'=\bar r$ of the circles $C_P^T(q)$, $C_E^T(r)$ do not coincide with the points $q$, $r$), whence

$$\rho_T(q,r)=\sqrt{\sum_{i=1}^2\Big(q_i+\frac{1-e^{-k_PT}}{k_P}p_i-r_i-\frac{1-e^{-k_ET}}{k_E}s_i\Big)^2}-\alpha\,\frac{e^{-k_PT}+k_PT-1}{k_P^2}+\beta\,\frac{e^{-k_ET}+k_ET-1}{k_E^2}.$$

Fig. 14.
Let $q^*(t)$, $r^*(t)$ be conditionally optimal trajectories. Then, to satisfy the first condition of invariance, it is sufficient that along the conditionally optimal trajectories

$$R_P(T-t)>R_E(T-t) \qquad (4.1.9)$$

for all $0\le t<T$, where $R_P(T-t)$ and $R_E(T-t)$ are respectively the radii of the circles $C_P^{T-t}(q^*(t))$ and $C_E^{T-t}(r^*(t))$ (note that condition (4.1.9) is also satisfied for simple pursuit, since in this case $R_P(T-t)=\alpha(T-t)$, $R_E(T-t)=\beta(T-t)$, $\alpha>\beta$). Indeed, suppose there are instants $t_1$ and $t_2$ such that

$$R_P(T-t_1)>R_E(T-t_1),\qquad R_P(T-t_2)<R_E(T-t_2).$$

Then the situation shown in Fig. 14 becomes possible: the center of pursuit has "jumped" from the point $M(t_1)$ to the diametrically opposite point $M(t_2)$. Inequality (4.1.9) excludes such a possibility. We now obtain a sufficient condition on the coefficients $\alpha$, $\beta$, $k_P$, $k_E$ under which (4.1.9) holds. Denote $T-t=\tau$.
Let us compute $R_P'$, $R_E'$:

$$R_P'(\tau)=\alpha\,\frac{1-e^{-k_P\tau}}{k_P},\qquad R_E'(\tau)=\beta\,\frac{1-e^{-k_E\tau}}{k_E},\qquad R_P'(0)=0,\quad R_E'(0)=0.$$

We find the second derivatives:

$$R_P''(\tau)=\alpha e^{-k_P\tau},\qquad R_E''(\tau)=\beta e^{-k_E\tau}.$$

Moreover, $R_P''(0)=\alpha$ and $R_E''(0)=\beta$, so (4.1.8) gives $R_P''(0)>R_E''(0)$. Consequently, in some neighbourhood of zero $R_P''(\tau)>R_E''(\tau)$, and since $R_P'(0)=R_E'(0)=0$, in the same neighbourhood $R_P'(\tau)>R_E'(\tau)$. Since also $R_P(0)=R_E(0)=0$, for small $\tau>0$ we get $R_P(\tau)>R_E(\tau)$.

Evidently, if $\alpha>\beta$ and $k_P\le k_E$, then condition (4.1.8) is satisfied and $R_P(\tau)>R_E(\tau)$ for all $\tau>0$. Therefore, it remains to prove (4.1.9) for $k_P>k_E$ (with $\alpha/k_P>\beta/k_E$). We will show that in this case

$$R_P'(\tau)>R_E'(\tau)\quad\text{for all }\tau>0. \qquad (4.1.10)$$
Indeed, suppose the opposite is true. Then there is an instant $\tau_1>0$ such that

$$R_P'(\tau_1)=\frac{\alpha}{k_P}\big(1-e^{-k_P\tau_1}\big)=\frac{\beta}{k_E}\big(1-e^{-k_E\tau_1}\big)=R_E'(\tau_1).$$

But $\beta/k_E<\alpha/k_P$ (see condition (4.1.8)), so this equality requires $1-e^{-k_P\tau_1}<1-e^{-k_E\tau_1}$. On the other hand, from $k_E<k_P$ it follows that $k_E\tau_1<k_P\tau_1$, or $e^{-k_P\tau_1}<e^{-k_E\tau_1}$, whence

$$1-e^{-k_E\tau_1}<1-e^{-k_P\tau_1}. \qquad (4.1.11)$$

The previous inequality contradicts (4.1.11); hence (4.1.10) holds for all $\tau>0$. Since $R_P(\tau)>R_E(\tau)$ in a neighbourhood of zero, (4.1.10) implies $R_P(\tau)>R_E(\tau)$ for all $\tau>0$. We have thus proved (4.1.9) under condition (4.1.8). The second condition of invariance is satisfied and proved as in Example 2.

We now construct the first-order partial differential equation (see (3.5.12)) for the game involved. It is of the form
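$$\frac{\partial\rho}{\partial T}=\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial q_i}p_i+\frac{\partial\rho}{\partial r_i}s_i\Big)-\sum_{i=1}^2\Big(k_P\frac{\partial\rho}{\partial p_i}p_i+k_E\frac{\partial\rho}{\partial s_i}s_i\Big)+\beta\max_{|v|\le1}\sum_{i=1}^2\frac{\partial\rho}{\partial s_i}v_i+\alpha\min_{|u|\le1}\sum_{i=1}^2\frac{\partial\rho}{\partial p_i}u_i \qquad (4.1.12)$$

with the initial condition

$$\rho_T(q,p,r,s)\big|_{T=0}=\sqrt{\sum_{i=1}^2(q_i-r_i)^2}. \qquad (4.1.13)$$

The controls $\bar u$, $\bar v$ from (3.5.13), on which the min and max in (4.1.12) are achieved, are determined by the formulas

$$\bar u_i=-\frac{\partial\rho/\partial p_i}{\sqrt{(\partial\rho/\partial p_1)^2+(\partial\rho/\partial p_2)^2}},\qquad \bar v_i=\frac{\partial\rho/\partial s_i}{\sqrt{(\partial\rho/\partial s_1)^2+(\partial\rho/\partial s_2)^2}},\qquad i=1,2.$$

Substituting $\bar u_i$, $\bar v_i$ into (4.1.12), we obtain the Cauchy problem for the first-order partial differential equation

$$\frac{\partial\rho}{\partial T}=\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial q_i}p_i+\frac{\partial\rho}{\partial r_i}s_i-k_P\frac{\partial\rho}{\partial p_i}p_i-k_E\frac{\partial\rho}{\partial s_i}s_i\Big)-\alpha\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial p_i}\Big)^2}+\beta\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial s_i}\Big)^2} \qquad (4.1.14)$$

with the initial condition

$$\rho\big|_{T=0}=\sqrt{\sum_{i=1}^2(q_i-r_i)^2}. \qquad (4.1.15)$$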
Introduce the notation

$$\lambda_i=\frac{\partial\rho}{\partial q_i},\qquad \mu_i=\frac{\partial\rho}{\partial p_i},\qquad \nu_i=\frac{\partial\rho}{\partial r_i},\qquad \zeta_i=\frac{\partial\rho}{\partial s_i},\qquad i=1,2.$$

The characteristics of (4.1.14) are optimal trajectories in the game of pursuit [1]. The equations of characteristics are of the form (see (3.5.18))

$$\dot q_i=p_i,\qquad \dot p_i=-\alpha\frac{\mu_i}{\sqrt{\mu_1^2+\mu_2^2}}-k_Pp_i,\qquad \dot r_i=s_i,\qquad \dot s_i=\beta\frac{\zeta_i}{\sqrt{\zeta_1^2+\zeta_2^2}}-k_Es_i,$$
$$\dot\lambda_i=0,\qquad \dot\mu_i=-\lambda_i+k_P\mu_i,\qquad \dot\nu_i=0,\qquad \dot\zeta_i=-\nu_i+k_E\zeta_i,\qquad i=1,2.$$

Solving equation (4.1.14) with the initial condition (4.1.15) by the method of characteristics, we obtain the previously derived expression

$$\rho_T(q,p,r,s)=\sqrt{\sum_{i=1}^2\Big(q_i-r_i+\frac{1-e^{-k_PT}}{k_P}p_i-\frac{1-e^{-k_ET}}{k_E}s_i\Big)^2}-\alpha\,\frac{e^{-k_PT}+k_PT-1}{k_P^2}+\beta\,\frac{e^{-k_ET}+k_ET-1}{k_E^2}. \qquad (4.1.16)$$
The function $\rho_T$ from (4.1.16) has continuous partial derivatives over the entire space except at the points lying on the surface

$$\sqrt{\sum_{i=1}^2\Big(q_i-r_i+\frac{1-e^{-k_PT}}{k_P}p_i-\frac{1-e^{-k_ET}}{k_E}s_i\Big)^2}=0, \qquad (4.1.17)$$

where the partial derivatives $\rho_{q_i}$, $\rho_{r_i}$, $i=1,2$, are not defined. The surface (4.1.17) corresponds to the initial states for which the distance $\rho(O,O')$ between the centers of the reachability circles is zero, since the expression on the left side of (4.1.17) is exactly this distance. Thus, the function $\rho_T$ is a solution to the Cauchy problem for equation (4.1.14) with the initial condition (4.1.15) in the region of space free from singularities; at the singular points $C_P^T(q)\supset C_E^T(r)$ and $\rho_T(q,r)=0$. When the condition $R_P(T)>R_E(T)$ ($\alpha>\beta$, $\alpha/k_P>\beta/k_E$) is satisfied, the results of Chapter 1 (see Theorems 2, 3) make it possible to overcome this singularity [27], just as in Example 1.
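The radii and the value (4.1.16) are easy to evaluate numerically. The sketch below uses illustrative parameter values (not from the text): it checks (4.1.9) on a grid of τ for data satisfying (4.1.8) with $k_P>k_E$, shows that (4.1.9) fails for large τ when $\alpha/k_P<\beta/k_E$, and confirms that (4.1.16) approaches the frictionless formula (4.1.19) as $k_P,k_E\to0$. The standard-library function `math.expm1` is used to limit cancellation in $e^{-k\tau}+k\tau-1$ for small $k\tau$.

```python
import math


def radius(force, k, tau):
    """Reachability radius force * (e^{-k tau} + k tau - 1) / k^2; force * tau^2 / 2 if k == 0."""
    if k == 0.0:
        return 0.5 * force * tau ** 2
    return force * (math.expm1(-k * tau) + k * tau) / k ** 2


def drift(k, tau):
    """Coefficient (1 - e^{-k tau}) / k of the initial momentum; tau if k == 0."""
    return tau if k == 0.0 else -math.expm1(-k * tau) / k


def rho_T(q, p, r, s, T, alpha, beta, kP, kE):
    """Value (4.1.16): distance between the circle centers plus R_E - R_P."""
    d = [q[i] + drift(kP, T) * p[i] - r[i] - drift(kE, T) * s[i] for i in range(2)]
    return math.hypot(d[0], d[1]) - radius(alpha, kP, T) + radius(beta, kE, T)


taus = [0.01 * j for j in range(1, 2001)]
# (4.1.8) holds: alpha > beta and alpha / kP = 2 > beta / kE = 1 / 0.6.
holds = all(radius(2.0, 1.0, t) > radius(1.0, 0.6, t) for t in taus)
# (4.1.8) violated: alpha / kP = 1 < beta / kE = 10, so R_E overtakes R_P eventually.
fails_late = radius(2.0, 2.0, 50.0) < radius(1.0, 0.1, 50.0)

data = ((0.0, 0.0), (1.0, 0.0), (5.0, 1.0), (0.0, -1.0), 2.0, 2.0, 1.0)
near_zero = rho_T(*data, 1e-6, 1e-6)
frictionless = rho_T(*data, 0.0, 0.0)   # formula (4.1.19)
print(holds, fails_late, abs(near_zero - frictionless))
```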
128 Consider the case k
F
T h e set Cf(q)
of differential
games of
= ICE = 0 ( the absence of friction) [35] i n some detail.
is a circle with the radius Rp(T)
= or— a n d its center at the
point q = q + pT, where q, p are the s t a r t i n g position a n d i n i t i a l for P l a y e r P . R (T) E
pursuit
T h e set C (r)
momentum
(in the space q, r) is a circle w i t h the radius
E
= 0%£ and its center at the point p m r + sT,
s t a r t i n g position and i n i t i a l m o m e n t u m for P l a y e r E. converts to the inequality a ^ 3
> 0—-,
where r, s are the
T h e c o n d i t i o n (4.1.9)
and the existence c o n d i t i o n for the
invariant center of pursuit (4.1.9) converts to the inequality a > 0
(4.1.18)
P h y s i c a l l y , this means that the m a x i m u m force applied by Player P exceeds the force applied by Player E. W h e n (4.1.18) is satisfied, the value of the game is equal to
PT(q,r).
tt is
determined from the f o r m u l a
Mq,
- r,- 4 T{
r) = A
-
Pi
- (a - 0)~.
(4.1.19)
T h e equation w i t h partial derivatives of the first order for the f u n c t i o n
PT{q,^)
is derived from (4.1.14) by s u b s t i t u t i n g kp — fcg = 0 Example
3a.
T h e pursuit occurs over the entire plane.
M o t i o n equations
are of the form 4i =
Pi,
p\ = aui - kppi,
|u| < 1,
i =
1,2
Vi = 0v
M S I ,
i =
1,2,
U
Here $q$ and $y$ are respectively the geometric positions of Players P and E, and $p$ is Player P's momentum. Thus, in the case at hand, Player E moves according to "simple motion", while Player P, a material point of unit mass, moves under the action of the friction force and the control force $\alpha u$. Player E's payoff is defined as the distance between the players' geometric locations when the play terminates:

$$H(q(T),y(T))=\rho(q(T),y(T))=\sqrt{\sum_{i=1}^2\big(q_i(T)-y_i(T)\big)^2}.$$

First consider the case $k_P\ne0$. The set $C_P^T(q)$ (in the space $q$) is a circle with the radius

$$R_P(T)=\alpha\,\frac{e^{-k_PT}+k_PT-1}{k_P^2}$$

and its center at the point

$$\bar q=q+\frac{1-e^{-k_PT}}{k_P}\,p,$$

where $q$, $p$ are the starting position and initial momentum of Player P. The set $C_E^T(y)$ is a circle with the radius $R_E(T)=\beta T$ and its center at the point $y$. As before, the quantity $\rho_T(q,y)$ is computed by the formula

$$\rho_T(q,y)=\rho(\bar q,y)+R_E-R_P=\sqrt{\sum_{i=1}^2\Big(q_i-y_i+\frac{1-e^{-k_PT}}{k_P}p_i\Big)^2}-\alpha\,\frac{e^{-k_PT}+k_PT-1}{k_P^2}+\beta T.$$
As shown in the previous example, to satisfy the invariance conditions for the center of pursuit $M$ it is sufficient that along the conditionally optimal trajectories $q^*(t)$, $y^*(t)$ the inequality

$$R_P(T-t)>R_E(T-t) \qquad (4.1.20)$$

holds for all $0\le t<T$, where $R_P(T-t)$, $R_E(T-t)$ are the radii of the reachability circles $C_P^{T-t}(q^*(t))$, $C_E^{T-t}(y^*(t))$. In what follows, $T-t$ is denoted by $\tau$. Let us check whether condition (4.1.20) is satisfied in the case under study. Compute $R_P'$ and $R_E'$:

$$R_P'(\tau)=\alpha\,\frac{1-e^{-k_P\tau}}{k_P},\qquad R_E'(\tau)=\beta.$$

Since $\beta>0$ and $R_P'(0)=0$, there is a neighbourhood of zero in which $R_P'(\tau)-R_E'(\tau)<0$. Since $R_P(0)=R_E(0)=0$, we have $R_P(\tau)<R_E(\tau)$ in this neighbourhood. Thus, the sufficient condition of invariance (4.1.20) is violated. At the same time, under some initial conditions the center of pursuit turns out to be invariant. Let us deduce these conditions. Consider the difference

$$\varphi(\tau)=R_P(\tau)-R_E(\tau)=\alpha\,\frac{e^{-k_P\tau}+k_P\tau-1}{k_P^2}-\beta\tau.$$

We show that $\varphi(\tau)$ vanishes at a unique point $\tau>0$, which lies in the region where $\varphi$ is monotonically increasing. Compute
$$\varphi''(\tau)=\alpha e^{-k_P\tau}.$$

Since $\varphi''(\tau)>0$, the derivative $\varphi'(\tau)$ increases monotonically. However, $\varphi'(0)=-\beta<0$. Let $\beta<\alpha/k_P$. Then there is $\tau_0$ such that $\varphi'(\tau_0)=0$, namely

$$\tau_0=\frac{1}{k_P}\ln\frac{1}{1-\beta k_P/\alpha}. \qquad (4.1.21)$$

For $\tau>\tau_0$ we have $\varphi'(\tau)>0$, i.e. the function $\varphi$ is strictly increasing. Since $\varphi(0)=0$ and $\varphi'(\tau)<0$ for $0<\tau<\tau_0$, we get $\varphi(\tau_0)<0$, and $\varphi(\tau)$ vanishes at some $\bar\tau>\tau_0$. At the same time,

$$\lim_{\tau\to\infty}\varphi(\tau)=+\infty.$$
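For Example 3a the point $\tau_0$ from (4.1.21) and the zero $\bar\tau>\tau_0$ of $\varphi$ can be computed directly. In the sketch below the values α = 2, β = 1, $k_P$ = 1 are illustrative (they satisfy $\beta<\alpha/k_P$). The zero is bracketed between $\tau_0$, where $\varphi<0$, and a point where $\varphi>0$, and is then located by bisection, which is justified because $\varphi$ is strictly increasing on $(\tau_0,\infty)$.

```python
import math


def phi(tau, alpha, beta, kP):
    """phi(tau) = R_P(tau) - R_E(tau) = alpha (e^{-kP tau} + kP tau - 1) / kP^2 - beta tau."""
    return alpha * (math.expm1(-kP * tau) + kP * tau) / kP ** 2 - beta * tau


def tau0(alpha, beta, kP):
    """Minimum point (4.1.21) of phi; requires beta < alpha / kP."""
    return math.log(1.0 / (1.0 - beta * kP / alpha)) / kP


def zero_after_tau0(alpha, beta, kP, tol=1e-12):
    """The zero of phi on (tau0, inf), found by bracketing and bisection."""
    lo = tau0(alpha, beta, kP)            # phi(lo) < 0
    hi = 2.0 * lo + 1.0
    while phi(hi, alpha, beta, kP) <= 0.0:
        hi *= 2.0                         # phi -> +inf, so a sign change exists
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, alpha, beta, kP) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi


alpha, beta, kP = 2.0, 1.0, 1.0
t0 = tau0(alpha, beta, kP)               # equals ln 2 for these values
t_bar = zero_after_tau0(alpha, beta, kP)
print(t0, phi(t0, alpha, beta, kP), t_bar)
```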
Suppose the following condition is satisfied:

(4.1.22)

Then the point $M$ is invariant. In fact, the invariance of the point $M$ may be violated (see Fig. 14) only if, during the motion along the conditionally optimal trajectories, the point $\bar M$ diametrically opposite to $M$ in the set $C_E(y)$ is farther from $C_P(q)$ than the point $M$ (or is at the same distance from $C_P(q)$ as $M$). Geometrically, it is evident that under condition (4.1.22) this is not possible. When condition (4.1.22) is satisfied, for all $\tau$ we have

$$R_E(\tau)-R_P(\tau)<\rho_T(q_0,y_0). \qquad (4.1.23)$$

In the case $k_P=0$, inequality (4.1.23) takes the form

$$\beta\tau-\frac{\alpha\tau^2}{2}<\rho_T(q_0,y_0), \qquad (4.1.24)$$

or, computing the maximum of the function $\beta\tau-\alpha\tau^2/2$,

$$\max_{\tau}\Big(\beta\tau-\frac{\alpha\tau^2}{2}\Big)=\frac{\beta^2}{2\alpha},\qquad \tau_{\max}=\frac{\beta}{\alpha}, \qquad (4.1.25)$$

we rewrite (4.1.24) as

$$\frac{\beta^2}{2\alpha}<\rho_T(q_0,y_0). \qquad (4.1.26)$$

The function $\rho_T(q,y)$ is continuously differentiable everywhere except on the surface

$$\sum_{i=1}^2\Big(q_i-y_i+\frac{1-e^{-k_PT}}{k_P}\,p_i\Big)^2=0, \qquad (4.1.27)$$
corresponding to the initial states for which the centers of the circles $C_P^T(q)$ and $C_E^T(y)$ coincide. This function is a solution to the Cauchy problem everywhere off the set (4.1.27) for the first-order partial differential equation

$$\frac{\partial\rho}{\partial T}=\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial q_i}p_i-k_P\frac{\partial\rho}{\partial p_i}p_i\Big)-\alpha\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial p_i}\Big)^2}+\beta\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial y_i}\Big)^2}$$

with the initial condition

$$\rho_T\big|_{T=0}=\sqrt{(q_1-y_1)^2+(q_2-y_2)^2}.$$

When conditions (4.1.21), (4.1.22), (4.1.26) are not satisfied, the singular surface (4.1.27) may not be dispersal, and the optimal strategy of Player E may not be open-loop.
4.2 Phase-constrained "simple pursuit" games

Example 4 ("simple pursuit" game on the half-plane). The players move in the half-plane $S$ (they cannot leave $S$ until the play terminates) in accordance with the motion equations

$$\dot x_i=\alpha u_i,\quad i=1,2,\quad |u|\le1;\qquad \dot y_i=\beta v_i,\quad i=1,2,\quad |v|\le1,$$

where $\alpha=\mathrm{const}$, $\beta=\mathrm{const}$, $\alpha>\beta$ (see Example 2 in this chapter). The game has the prescribed duration $T$, and Player E's payoff is equal to the distance between him and Player P at the termination instant $T$. Denote by $A_1$ the set of initial positions for which the center of pursuit $M$ belongs to the boundary of the set $S$, and by $A_2$ the set of positions for which this condition is not satisfied. The solution of the game from initial positions belonging to $A_2$ coincides with the solution of the "simple pursuit" game without phase constraints (see Example 2 in this chapter). Of greater interest is the case $(x,y,T)\in A_1$. Let us choose a coordinate system in such a way that the abscissa axis passes along the boundary of the half-plane $S$ (Fig. 15). Let $S_R(x)$ be the minimum sphere with its center at $x$ such that $S_R(x)\cap S\supset C_E^T(y)\cap S$, and let $\bar S_R(x)=S_R(x)\cap S$. Then the center of pursuit in the game $\Gamma(x,y,T)$ for $(x,y,T)\in A_1$ is the intersection point of the sphere $S_{\beta T}(y)$ and the sphere $S_R(x)$ lying in $S$. For $\rho_T(x,y)>0$ the point $M$ is invariant. Indeed, the conditionally optimal trajectory $y(t)$ of Player E prescribes motion along the segment $[y,M]$, and the conditionally optimal trajectory $x(t)$ of Player P prescribes motion along the straight line from $x$ towards $M$. Geometrically, $\bar S_R(x(t))\supset C_E^{T-t}(y(t))\cap S$ for all $0\le t\le T$ and, consequently, the point $M$ remains invariant in all the games $\Gamma(x(t),y(t),T-t)$. Therefore, using Theorem 3, Ch. 3, we may write immediately

$$\rho_T(x,y)=\mathrm{Val}\,\Gamma(x,y,T),$$

which gives

$$\mathrm{Val}\,\Gamma(x,y,T)=-\alpha T+\sqrt{\Big(x_1-y_1+\sqrt{\beta^2T^2-y_2^2}\Big)^2+x_2^2}\qquad\text{if } x_1-y_1>0, \qquad (4.2.2)$$

or

$$\mathrm{Val}\,\Gamma(x,y,T)=-\alpha T+\sqrt{\Big(x_1-y_1-\sqrt{\beta^2T^2-y_2^2}\Big)^2+x_2^2}\qquad\text{if } x_1-y_1<0. \qquad (4.2.3)$$

Fig. 15.
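The formulas (4.2.2) and (4.2.3) admit a direct numerical check. The sketch below assumes that $S$ is the upper half-plane $x_2\ge0$ bounded by the abscissa axis and uses illustrative speeds α = 2, β = 1 (not from the text). It evaluates the value as $|xM|-\alpha T$, with $M$ on the boundary at horizontal distance $\sqrt{\beta^2T^2-y_2^2}$ from $y$ on the side away from P, and verifies with central differences that the simple-pursuit equation (4.1.4) holds on both sides of the singular surface $x_1=y_1$.

```python
import math

ALPHA, BETA = 2.0, 1.0   # illustrative speeds, ALPHA > BETA


def value_half_plane(x1, x2, y1, y2, T):
    """(4.2.2)-(4.2.3): M lies on the boundary x2 = 0 at horizontal distance
    sqrt(BETA^2 T^2 - y2^2) from y, on the side away from P; value = |xM| - ALPHA T.
    Requires BETA * T > |y2| and x1 != y1 (x1 == y1 is the singular surface)."""
    reach = math.sqrt((BETA * T) ** 2 - y2 ** 2)
    m1 = y1 - reach if x1 > y1 else y1 + reach
    return math.hypot(x1 - m1, x2) - ALPHA * T


def residual(V, x1, x2, y1, y2, T, h=1e-6):
    """V_T + ALPHA |grad_x V| - BETA |grad_y V| by central differences, cf. (4.1.4)."""
    args = [x1, x2, y1, y2, T]

    def d(i):
        plus, minus = list(args), list(args)
        plus[i] += h
        minus[i] -= h
        return (V(*plus) - V(*minus)) / (2 * h)

    return d(4) + ALPHA * math.hypot(d(0), d(1)) - BETA * math.hypot(d(2), d(3))


print(residual(value_half_plane, 3.0, 2.0, 1.0, 0.5, 1.0))    # x1 > y1, close to 0
print(residual(value_half_plane, -1.0, 1.5, 0.5, 0.2, 1.2))   # x1 < y1, close to 0
```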
Recall the first-order partial differential equation for the value function of the "simple pursuit" game:

$$\frac{\partial V}{\partial T}=-\alpha\sqrt{\Big(\frac{\partial V}{\partial x_1}\Big)^2+\Big(\frac{\partial V}{\partial x_2}\Big)^2}+\beta\sqrt{\Big(\frac{\partial V}{\partial y_1}\Big)^2+\Big(\frac{\partial V}{\partial y_2}\Big)^2}. \qquad (4.2.4)$$

The functions (4.2.2) and (4.2.3) satisfy this basic equation of the simple pursuit game in the regions $x_1-y_1>0$ and $x_1-y_1<0$, respectively, with the initial condition

$$V(x,y,T)\big|_{y_2=0}=-\alpha T+\sqrt{(x_1-y_1+\beta T)^2+x_2^2}>0 \qquad (4.2.5)$$

(for $x_1-y_1>0$).

Singular surface. A singular surface comes into play when the positions of Players P and E lie on a perpendicular to the boundary of the set $S$, i.e. the singular surface is of the form $x_1=y_1$.
W e show that for all points x,
x
y on the s i n g u l a r surface the value of the game is equal to p-p{x,y)
and the
o p t i m a l strategy u*(') for P l a y e r P satisfies the condition of T h e o r e m 1, C h . 3, i.e. P l a y e r P chooses the segment of his o p e n - l o o p control from the condition min p. C o n s i d e r two cases. 1. T h e p o i n t E lies at the foot of a perpendicular dropped from the point P to the b o u n d a r y of the set S. In this case: a) the reachability set Cp(x)
f| S of P l a y e r P does not contains the initial
position E (Fig.16); b) the reachability set C (x)
f] S of P l a y e r P contains the i n i t i a l position
P
E (Fig.17); 2. T h e p o i n t E lies on a p e r p e n d i c u l a r dropped from the p o i n t P to the b o u n d a r y of the set 5 , but it does not belong to the boundary of S. C o n s i d e r case l a . T o prove the o p t i m a l i t y of the strategy u'( ) for P l a y e r P, it suffices to show t h a t , when this strategy is adopted, p decreases if P l a y e r E remains on the singular surface (in this case, he remains at the foot of the p e r p e n d i c u l a r d r o p p e d from the point P to the b o u n d a r y of the set S). C o n s i d e r F i g . 1 6 . L e t the p o i n t E be at y of the interval \N(T),y ] 0
d u r i n g t i m e (. Since the length
0
is s t r i c t l y positive here, then as the p o i n t x moves in
accordance w i t h the strategy u'{ ) (this strategy prescribes the motion along the p e r p e n d i c u l a r Xo, yo since if the point y a m o t i o n gives m i n p ) the angle X(T)
a n d parallel to x M
X(T)
we lay off the line segment
0
[X(T],
0
is i n the state of rest, then such y ] increases for 0 < r < t. F r o m
N(T),
0
we draw the line
on w h i c h from the p o i n t
X(T)M\
of lenght a(T-r).
X(T)N'
of triangles we have that the points A ' , N ' , N(T)
F r o m the similarity
are l y i n g on the same straight
line. Hence we get = P(N,M) > p(N\M')
fr(xo,yo)
= p(N(r),M(r))
>
=
0
( x ( r ) , M ( r ) ) - a(T
-
r) =
fr„ (z(j),yWh r
We have thus proved the optimality of the strategy $u^*(\cdot)$ for the case under study.

Consider case 1b (see Fig.17). As in the previous case, assume that the point E stays at $y_0$ during the time $t$. Let $\bar u(\cdot)$ be an auxiliary strategy for Player P, under which he moves during the time $[0,t]$ towards the point $y_0$ so that the line segment $X(\tau)M(\tau)$ moves parallel to itself for $0<\tau<t$. We prescribe this strategy to Player P up to the instant $\bar t$ at which case 1b converts to case 1a (provided, of course, that $\bar t<t$), i.e. until the circle $C_P^{T-\bar t}(x(\bar t))$ touches the boundary of S at the point $y_0$. The latter is always possible, since under the strategy $\bar u(\cdot)$ the motion does not occur at maximum speed (to maintain the parallelism in approach, Player P has to decrease his speed). As before, note that the points $N$, $N(\tau)$, $y_0$ lie on the same straight line. Hence

$$\rho_T(x_0,y_0)=\rho(N,M)>\rho(N(\tau),M(\tau))=\rho_{T-\tau}(x(\tau),y_0).$$

In this manner $\rho$ decreases up to the instant $\bar t$, at which case 1b converts to case 1a. This completes the proof of the optimality of the strategy $u^*(\cdot)$: although the strategy $\bar u(\cdot)$ does not carry the point P to the point $\min\rho$, the quantity $\rho$ decreases under it; hence the strategy $u^*(\cdot)$, which transfers the point P to the point $\min\rho$, also ensures a decrease in $\rho$. We have thus shown that, in case 1, $\rho_T(x_0,y_0)$ is the value of the game and $u^*(\cdot)$ is the optimal strategy for Player P.

Fig. 17.

Consider case 2 (Fig.18). We show that, when the strategy $u^*(\cdot)$ is adopted by Player P, the quantity $\rho$ decreases if the point E remains on the singular surface. Note that Player E is concerned with the motion for which the velocity of the current center of pursuit $M(t)$ towards the point $M$ is minimum. Such a motion corresponds to the case where the point E moves from the position $y_0$ towards the point $M$ at maximum velocity $\beta$. We draw the straight line $L$ through the point $N$ parallel to the boundary of the set S. When the point $x$ moves towards the point $K$ at maximum velocity $\alpha$, the intersection point $K(t)$ of the boundary of the set $C_P^{T-t}(x(t))$ and the straight line $L$ moves from $N$ to $K$. The velocity of the point $K(t)$ at each instant $t$ is less than the velocity of the point $M'$ (here we consider times $t$ less than the time during which the point E reaches the point $M$). This means that the angle $K(t)M'M$ increases continuously. But the angle $NM'M$ is acute and, therefore, remains acute during some time $[0,\delta]$. This enables us to conclude that on the interval $[0,\delta]$

$$\rho_T(x_0,y_0)=NM>KM'>N'M'=\rho_{T-t}(x(t),y(t)),$$
i.e. $\rho$ decreases. If there is an instant $t_1$ at which the angle $KM'M$ turns out to be right, then we draw a new chord $L'$ from the point $K'$ and consider the intersection point $K''$ of the boundary of $C_P^{T-t}(x(t))$ and the chord $L'$ on some interval $[t_1,t_1+\delta]$ where the angle $K''M'M$ is acute. As before, we prove that $\rho$ decreases for $t\in[t_1,t_1+\delta]$. This process may be continued up to the instant $\hat t$ at which the point E occupies the position $M$, since the time $\hat t$ during which the point E arrives at the point $M$ is strictly less than the time during which the point P arrives at the point $K$. In this manner we prove that there is a strategy $\bar u(\cdot)$ of Player P (i.e. he moves on the singular surface at maximum velocity) under which $\rho$ decreases on the interval $[0,\hat t]$. The further decrease in $\rho$ follows from case 1. Since $\bar u(\cdot)$ is a strategy of Player P, the strategy $u^*(\cdot)$, transferring at each step the point $x$ to the point $\min\rho$, ensures a decrease in $\rho$. We have thus obtained a complete solution to the "simple pursuit" game in the half-plane.

Theorem 1 In the game of "simple pursuit" in the half-plane, an optimal strategy for Player P is the strategy $u^*(\cdot)$, and the value of the game is equal to $\rho_T(x_0,y_0)$.
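The quantity $\rho_T(x_0,y_0)$ in Theorem 1 can be estimated numerically. The Python sketch below is an illustration under stated assumptions, not the method of the text: S is convex and given by a membership test (`upper_half_plane` is a hypothetical helper for the half-plane $x_2\ge 0$), and for $x,\eta\in S$ the distance from $\eta$ to $C_P^T(x)$ is taken to be $\max(0,\rho(x,\eta)-\alpha T)$, which relies on the convexity of S as in the proof of Theorem 3 of Section 4.4. The set $C_E^T(y)$ is sampled on a polar grid whose resolution is an arbitrary choice.

```python
import math

def rho_T(x, y, T, alpha, beta, in_S, n_r=60, n_phi=360):
    """Estimate rho_T(x, y): the largest distance from a point of
    C_E^T(y) = disc(y, beta*T) ∩ S to the set C_P^T(x), for simple motion
    in a convex set S described by the membership test in_S (assumed)."""
    best = 0.0
    R = beta * T
    for i in range(n_r + 1):
        r = R * i / n_r
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            eta = (y[0] + r * math.cos(phi), y[1] + r * math.sin(phi))
            if not in_S(eta):
                continue  # eta is not reachable by E inside S
            # For convex S the nearest point of C_P^T(x) to eta lies on [x, eta],
            # so the distance is max(0, |x - eta| - alpha*T).
            best = max(best, math.dist(x, eta) - alpha * T)
    return best

def upper_half_plane(p):
    """Hypothetical membership test for S = {x2 >= 0}."""
    return p[1] >= 0.0
```

For $x_0=(0,3)$, $y_0=(0,0)$, $\alpha=2$, $\beta=1$, $T=1$ the point E sits at the foot of the perpendicular and $y_0\notin C_P^T(x_0)$ (case 1a); the maximum $\sqrt{10}-2$ is attained at both boundary points $(\pm1,0)$, i.e. there are two centers of pursuit, as on the singular surface discussed above. Without the constraint the estimate returns $\rho(x_0,y_0)+(\beta-\alpha)T=2$.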
Example 5. ("simple pursuit" within an acute angle. The case of prescribed duration and terminal payoff.) Let S be the interior of an acute angle together with its boundary. We restrict ourselves to the case where the intersection $C_E^T(y_0)\cap S$ does not include the vertex O of the angle ($x_0,y_0\in S$) (Fig.19). There are not more than four points of intersection of the boundary of the set $C_E^T(y_0)$ with the sides of the angle. Denote them by $M_1$, $M_2$, $M_3$, $M_4$, respectively. Let $N$ be the farthest point of intersection of the straight line $x_0y_0$ with the boundary of the set $C_E^T(y_0)$. Then for any arrangement of the points $x_0,y_0\in S$ the center of pursuit is among the points $N$, $M_1$, $M_2$, $M_3$, $M_4$. When the center of pursuit coincides with the point $N$, the game involved is equivalent to the "simple pursuit" game free of phase constraints and reduces to the above-mentioned case (see Example 1 in this chapter). Since $\alpha>\beta$, for all $x_0,y_0\in S$ no more than two centers of pursuit may exist. Moreover, such a pair may be formed only by the points $M_1$, $M_2$ or $M_3$, $M_4$, lying pairwise on one of the angle sides. The singular surface in the game corresponds exactly to the set of initial states for which there are two centers of pursuit. Geometrically, it corresponds to the case where the points $x_0,y_0$ belong to one of the perpendiculars $L_1$ or $L_2$ erected from the midpoints of the line segments $M_1M_2$ and $M_3M_4$ (see Fig.19).

Fig. 19.
When dealing with the regions bounded by the singular surface, we find that in such regions there is an invariant center of pursuit and the value of the game is equal to $\rho_T(x_0,y_0)$, since in each such case the game is equivalent to the game of pursuit in the half-plane whose boundary is one of the angle sides.

For the global solution of the game, we have to study the players' behavior on the singular surface. For definiteness, assume that the points $x,y$ are on the perpendicular $L_1$ erected from the midpoint of the line segment $M_1M_2$ lying on the side $S_1$ of the angle. When dealing with the auxiliary game of "simple pursuit" in the half-plane with the boundary $S_1$ and prescribing an optimal strategy in this game to Player P (see the previous example), we find that, as shown above, the quantity $\rho$ only decreases when the points $x,y$ are on $L_1$. This means that the strategy $u^*(\cdot)$ for Player P ensures the constancy of $\rho$ in the regions bounded by the singular surface, and its decrease on this surface. So the value of the game is $\rho_T(x_0,y_0)$, and an optimal strategy of Player P coincides with the strategy $u^*(\cdot)$ from Theorem 1, Ch. 3.
Example 6. ("simple pursuit", where $\rho_T(x_0,y_0)$ is not the value of the game.) Consider the previous example where the vertex O of the angle is an interior point of the set $C_E^T(y)$ (Fig.20). Let $L$ be a perpendicular erected from the midpoint of the line segment $OM_1$. Assume that the point $x\in L$. At this point there are two centers of pursuit, and for all $y'\in S$ for which $O\in C_E^T(y')$ the conditions of Theorem 1, Ch. 3, are violated at these points. Thus, we are facing here a whole region of singularity rather than a singular surface. We will show that for such points $x,y$ there is no strategy $u^*(\cdot)$ which ensures a nonincrease in $\rho_T(x,y)$.

Fig. 20.

Examine Fig.20 (consider the case $C_P^T(x)\not\supset C_E^T(y)\cap S$). Let the time interval $[0,\delta)$ be such that for all $y'\in C_E^\delta(y)$ the set $C_E^{T-\delta}(y')$ contains the point O. Suppose the point E moves towards the point $M_1$. The new reachability set $C_E^{T-\delta}(y(\delta))$ contains both the point $M_1$ and the point O. To keep the distance from $C_P^{T-\delta}(x(\delta))$ to $M_1$ equal to $\rho_T(x,y)$, Player P must move towards the point $M_1$. But this involves an increase in the distance from $C_P^{T-\delta}(x(\delta))$ to the point O, i.e. an increase in $\rho$. The latter means that for any strategy of Player P there is a strategy of Player E (viz., the strategy which directs the motion of the point E towards the point $M_1$) for which $\rho_{T-\delta}(x(\delta),y(\delta))>\rho_T(x,y)$. A more complex case of "simple pursuit" in the circle is discussed in [15]. But there only the game of kind is considered, which substantially simplifies the problem.
4.3 "Simple pursuit" game with two pursuers and one evader
Example 7. Consider the case where several pursuers and one evader participate in the game, i.e. the team of pursuers $P$ consists of the players $P=\{P_1,P_2\}$, and all the participants of the game have simple motion. Thus, the motion equations are of the form:

$$\text{for } P_i:\quad \dot x^{(i)}_j=u^{(i)}_j,\quad |u^{(i)}|\le\alpha^{(i)},\quad i=1,2,$$
$$\text{for } E:\quad \dot y_j=v_j,\quad |v|\le\beta,\quad j=1,2,\qquad \beta<\min_i\alpha^{(i)}.$$

The game has a prescribed duration $T$, and the payoff of Player E is equal to $\min_i\rho(x^{(i)}(T),y(T))$. Consider the properties of the center of pursuit $M$. By definition,

$$\rho_T(x^{(1)},x^{(2)},y)=\max_{\eta\in C_E^T(y)}\ \min_{i=1,2}\ \min_{\xi\in C_{P_i}^T(x^{(i)})}\rho(\xi,\eta).$$

The point $M$ is the point of $C_E^T(y)$ which is farthest from the union of the sets $C_{P_1}^T(x^{(1)})$ and $C_{P_2}^T(x^{(2)})$. The following two cases are possible:
1. The minimum in the previous formula is achieved at one point (for definiteness, let it be the point $\xi_1\in C_{P_1}^T(x^{(1)})$).
2. The minimum is achieved simultaneously at two points $\xi_1\in C_{P_1}^T(x^{(1)})$ and $\xi_2\in C_{P_2}^T(x^{(2)})$. Then $\rho_T(x^{(1)},x^{(2)},y)=\rho(\xi_1,M)=\rho(\xi_2,M)$.
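The definition of $\rho_T(x^{(1)},x^{(2)},y)$ and of the center of pursuit $M$ can be explored numerically. The Python sketch below assumes the unconstrained plane, so that the distance from $\eta$ to $C_{P_i}^T(x^{(i)})$ is $\max(0,\rho(x^{(i)},\eta)-\alpha^{(i)}T)$, and approximates the maximum over $C_E^T(y)$ on a polar grid; the function name and grid sizes are illustrative choices. It returns the estimate of $\rho_T$ together with one maximizing point, an approximation of $M$.

```python
import math

def rho_T_two(x1, x2, y, T, a1, a2, beta, n_r=80, n_phi=720):
    """Estimate max over eta in disc(y, beta*T) of
    min_i dist(eta, C_{P_i}^T(x_i)) for two pursuers in the plane.
    Returns (value, maximizing point)."""
    best, center = -1.0, None
    for i in range(n_r + 1):
        r = beta * T * i / n_r
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            eta = (y[0] + r * math.cos(phi), y[1] + r * math.sin(phi))
            d1 = max(0.0, math.dist(x1, eta) - a1 * T)  # distance to C_{P_1}^T(x1)
            d2 = max(0.0, math.dist(x2, eta) - a2 * T)  # distance to C_{P_2}^T(x2)
            d = min(d1, d2)  # distance to the union of the two reachability sets
            if d > best:
                best, center = d, eta
    return best, center
```

With $x^{(1)}=(-4,0)$, $x^{(2)}=(4,0)$, $y=(0,0)$, $\alpha^{(1)}=\alpha^{(2)}=2$, $\beta=1$, $T=1$ the minimum is attained for both pursuers at once (case 2): the centers of pursuit are $(0,\pm1)$ and $\rho_T=\sqrt{17}-2$. Moving the second pursuer far away, e.g. to $(100,0)$, gives case 1, where the answer coincides with the one-pursuer value $\rho(x^{(1)},y)+\beta T-\alpha^{(1)}T=3$.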
Consider the first case. The following lemma holds.

Lemma 1 In the first case, the point $M$ lies on the center line of the reachability regions $C_{P_1}^T(x^{(1)})$ and $C_E^T(y)$ and coincides with the center of pursuit $\bar M$ for the "simple pursuit" game with the players $P_1$, E, and the game itself is equivalent to this game.

Proof: Since $\rho_T$ is achieved at one point $\xi_1$, we have $\rho(\xi_1,M)<\rho(\xi_2,M)$, where $\xi_2$ is the point of $C_{P_2}^T(x^{(2)})$ nearest to $M$. Let $\rho(\xi_2,M)-\rho(\xi_1,M)=\delta>0$. Suppose the statement of the lemma is not true and $M\ne\bar M$. Then $M$ is not the point of the set $C_E^T(y)$ which is farthest from the set $C_{P_1}^T(x^{(1)})$. Therefore, for any $\varepsilon>0$ there is a point $M'\in C_E^T(y)$ in the $\varepsilon$-neighbourhood of $M$ which is farther from $C_{P_1}^T(x^{(1)})$ than the point $M$, i.e.

$$\min_{\xi\in C_{P_1}^T(x^{(1)})}\rho(\xi,M')>\rho(\xi_1,M).$$

Let $\xi_2'$ be the point of $C_{P_2}^T(x^{(2)})$ nearest to $M'$. Choose $\varepsilon>0$ in such a way that

$$|\rho(\xi_2',M')-\rho(\xi_2,M)|<\delta/2$$

(this is possible because the distance to a fixed set changes by at most $\rho(M,M')<\varepsilon$). Then

$$\rho(\xi_2',M')>\rho(\xi_2,M)-\delta/2=\rho(\xi_1,M)+\delta/2>\rho(\xi_1,M).$$

Hence the point $M'$ is farther from the union $C_{P_1}^T(x^{(1)})\cup C_{P_2}^T(x^{(2)})$ than the point $M$. This contradicts the fact that the point $M$ is the center of pursuit. This completes the proof of the first part of the lemma. Its second part is a consequence of the first one. ∎
It follows from the lemma that for the complete solution of the problem it suffices to consider case 2. In case 2, let $\rho(\xi_1,M)=\rho(\xi_2,M)=\ell$, so that $\rho_T(x^{(1)},x^{(2)},y)=\ell$. Denote by $U_T^\ell(x^{(i)})$ the $\ell$-neighbourhood of the set $C_{P_i}^T(x^{(i)})$. The set $U_T^\ell(x^{(i)})$ is a circle with $\alpha^{(i)}T+\ell$ as a radius and its center at the point $x^{(i)}$. The definition of the quantity $\rho_T(x^{(1)},x^{(2)},y)$ implies

$$C_E^T(y)\subset U_T^\ell(x^{(1)})\cup U_T^\ell(x^{(2)}).$$

Let us prove that, in the case involved, the quantity $\rho_T(x^{(1)},x^{(2)},y)$ is the value of the game (in case 1, this result is obtained as a consequence of the solution of the "simple pursuit" game). To this end, we shall check the invariance conditions and study the singular surfaces. First consider the case where the points $x^{(1)}$, $x^{(2)}$, $y$ do not lie on the same straight line. We prove that the first condition of invariance is satisfied. Let $y^*(t)$, $x^{(1)*}(t)$, $x^{(2)*}(t)$ be conditionally optimal trajectories (in this case, $y^*(t)$ is a straight trajectory from $y$ to $M$, and $x^{(i)*}(t)$ a straight trajectory from $x^{(i)}$ to $\xi_i$). The first condition of invariance may be written as

$$C_E^{T-t}(y^*(t))\subset U_{T-t}^\ell(x^{(1)*}(t))\cup U_{T-t}^\ell(x^{(2)*}(t)),\qquad 0\le t\le T.\qquad(*)$$
The boundary of the set $U_{T-t}^\ell(x^{(1)*}(t))\cup U_{T-t}^\ell(x^{(2)*}(t))$ is composed of two arcs, one on the boundary of $U_{T-t}^\ell(x^{(1)*}(t))$ and one on the boundary of $U_{T-t}^\ell(x^{(2)*}(t))$, each being a portion of the circle with $\alpha^{(i)}(T-t)+\ell$ as a radius and its center at the point $x^{(i)*}(t)$ (Fig.21). One of the intersection points of these circles, $M$, is stationary (i.e. it does not vary for $0\le t\le T$), while the other, the point $N(t)$, depends on the parameter $t$. Denote by $a$ the angle $x_0^{(1)}Mx_0^{(2)}$ not exceeding $\pi$.

Fig. 21.

Lemma 2 Suppose that the point $M$ does not coincide with the point $\bar M$ of intersection of the center line $x_0^{(1)}y_0$ and the boundary of $C_E^T(y_0)$. Then the point $y_0$ lies within the angle $a$.

Proof: Suppose the opposite is true, and the point $y_0$ lies outside the angle $a$ (see Fig.21). Then, of the two line segments $x_0^{(1)}M$ and $x_0^{(2)}M$, there is one which does not intersect the line segment $x_0^{(1)}\bar M$; let it be the segment $x_0^{(2)}M$. It can readily be seen that the point $\bar M$ is farther from the points $x_0^{(1)}$, $x_0^{(2)}$ than the point $M$. Consequently, it lies farther from the sets $C_{P_1}^T(x_0^{(1)})$, $C_{P_2}^T(x_0^{(2)})$, since these are the circles with centers at the points $x_0^{(1)}$ and $x_0^{(2)}$. This, however, is inconsistent with $M$ being the center of pursuit. This completes the proof of the lemma. ∎
We now prove the inclusion (*). Assume that it is not satisfied. Then there is an instant $t$ at which a point $N$ not belonging to $U_{T-t}^\ell(x^{(1)*}(t))\cup U_{T-t}^\ell(x^{(2)*}(t))$ may be chosen from $C_E^{T-t}(y^*(t))$. Then there is a time instant $\bar t<t$ such that the point $N(\bar t)$ belongs to the boundary of the set $C_E^{T-\bar t}(y^*(\bar t))$. That implies that the points $x^{(1)*}(\bar t)$, $x^{(2)*}(\bar t)$, $y^*(\bar t)$ must lie on the same line. But if at the instant $t=0$ the points $x^{(1)}(0)$, $x^{(2)}(0)$, $y(0)$ did not lie on the same straight line, then henceforth (when moving along $x^{(1)*}(t)$, $x^{(2)*}(t)$, $y^*(t)$) they would never lie on the same line. Indeed, since $\ell>0$, in the course of time the angle $x^{(1)*}(t)\,y^*(t)\,x^{(2)*}(t)$ decreases to the angle $a$. The points $x_0^{(1)}$, $y_0$, $x_0^{(2)}$, however, are not on the same straight line; therefore the points $x^{(1)*}(t)$, $x^{(2)*}(t)$, $y^*(t)$ cannot lie on the same straight line for any $0\le t\le T$. This completes the proof of the first invariance condition. The validity of the second invariance condition is proved in much the same way.

Consider the case where the points $x_0^{(1)}$, $x_0^{(2)}$, $y_0$ are on the same straight line. We show that this case corresponds to a singular surface. Indeed, the second invariance condition is violated at the points $x_0^{(1)}$, $y_0$, $x_0^{(2)}$. Suppose the point E remains in the position $y_0$ during the time $\delta$. For the initial states on the singular surface there are two centers of pursuit, $M$ and $N$, corresponding to the two intersection points of the boundaries of the sets $U_T^\ell(x_0^{(1)})$ and $U_T^\ell(x_0^{(2)})$. After choosing a motion direction in accordance with one of the conditionally optimal trajectories (say, the trajectory directed to $M$), in the time $\delta$ Player P makes a transition to the initial state $x^{(1)}(\delta)$, $x^{(2)}(\delta)$ (Player E is resting at the point $y_0$) for which the center of pursuit is now the point $N$.

The singular surface under study is not dispersal, since there is a control for Player E (viz., the control $v=0$) which, for $t\in[0,\delta]$, keeps the points $x^{(1)}(t)$, $x^{(2)}(t)$, $y(t)$ on the singular surface S, provided that Player P chooses the control $u(\cdot)$ in accordance with Theorem 3, Ch. 3, i.e. from the condition $\min\rho$. Indeed, the motion under the strategy $u^*$ (condition $\min\rho$) directs the points $P_1$, $P_2$ towards the point E, and since the latter is at rest, the game does not leave S (Fig.22).
We show that for such a behavior of Player E the function $\rho$ decreases, which means that the value of the game is equal to $\rho$. Now, assume that Player E employs the control $v=0$ during the time $[0,\delta]$. Then the point $x^{(i)}(t)$ moves towards the point $y_0$ a distance $\alpha^{(i)}\delta$ ($\alpha^{(i)}>\beta$) along a straight line which is perpendicular to the straight line $My_0$, and the point $M$ occupies the position $M(\delta)$ on this straight line. Here we are facing a situation similar to the behavior on the singular surface in case 1 of Example 4 ("simple pursuit" in the half-plane). Reproducing the reasoning given there, we may show that the length of the segment $[\xi_i(0),M]$ is strictly greater than the length of the segment $[\xi_i(\delta),M(\delta)]$. However, the length of the first segment is $\ell=\rho_T(x_0^{(1)},x_0^{(2)},y_0)$, and the length of the second is $\rho_{T-\delta}(x^{(1)}(\delta),x^{(2)}(\delta),y_0)$. This proves the decrease of $\rho$ and implies the following theorem (see Fig.22).

Fig. 22.

Theorem 2 In the game of "simple pursuit" of Player E by the team $P=\{P_1,P_2\}$, the value of the game is $\rho_T(x_0^{(1)},x_0^{(2)},y_0)$, and an optimal strategy for Player P is constructed according to Theorem 1, Ch. 3.
4.4 Relations between maximin time of pursuit and time of absorption

Let $L_P^T(x)$ be the $\ell$-neighbourhood of the set $C_P^T(x)$. Denote by $\bar T$ the first time instant $T$ at which

$$C_E^T(y)\subset L_P^T(x).\qquad(4.4.1)$$

N. N. Krasovsky was the first to introduce the quantity $\bar T$ and called it the absorption time. Under the conditions of Theorem 3, Ch. 3, the value of the pursuit game of prescribed duration $\bar T$ is precisely equal to $\ell$ (the terminal payoff). Evidently, the following inequality always holds:

$$\Theta(x,y,\ell)\le\bar T,\qquad(4.4.2)$$

where $\Theta(x,y,\ell)$ is the maximin time up to the $\ell$-capture instant (until the distance $\ell$ is reached), and $\bar T$ is the absorption time ($\ell$-absorption). Indeed, moving along the conditionally optimal trajectory $y^*(t)$ in the optimal time game of pursuit $\Gamma(x,y,\ell)$, Player E can be sure that Player P cannot approach him to a distance not exceeding $\ell$ up to the instant $\Theta$. Since the value of the game $\Gamma(x,y,\bar T)$ is equal to $\ell$, by the instant $\bar T$ Player P ensures an approach to Player E to the distance $\ell$, whatever the motion of the latter may be. Consequently, $\Theta\le\bar T$ (see 4, Ch. 3).
Fig. 23.

The inequality (4.4.2) may also be interpreted somewhat differently. In the game $\Gamma(x,y,\ell)$ Player P ensures an approach to Player E to a distance $\ell$ in the time $\Theta(x,y,\ell)$. Here he also takes into account the cases of approaching Player E to the distance $\ell$ in a time less than $\Theta$. At the same time, the fact that $\ell$ equals the value of the game $\Gamma(x,y,\bar T)$ merely signifies that Player P can ensure an approach to Player E to a distance not exceeding $\ell$ exactly at the instant $\bar T$. The latter is a more complex problem (for Player P), since approaching Player E to the distance $\ell$ before the instant $\bar T$ is not taken into consideration. Consider the game of "simple pursuit" in some closed convex set S on the plane. This means that there are phase constraints: the players P and E are not allowed to leave the set S as the game proceeds. We prove the following obvious lemma.

Lemma 3 Let the point E move along a ray from the point $y$ at a constant velocity $\beta$. Then for $\alpha>\beta$ and $\rho(x,y)>\ell$, the equation

$$\rho(x,y(t))-\alpha t=\ell\qquad(t>0)\qquad(4.4.3)$$

has a unique solution.
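Lemma 3 can also be checked numerically. The function $f(t)=\rho(x,y(t))-\alpha t-\ell$ is continuous, positive at $t=0$ when $\rho(x,y)>\ell$, and strictly decreasing, because $\rho(x,y(t))$ changes at rate at most $\beta<\alpha$; a bracket-and-bisect search therefore finds the unique root of (4.4.3). The Python sketch below is illustrative: the name `capture_time_on_ray` is ours, and the ray direction `e` is assumed to be a unit vector.

```python
import math

def capture_time_on_ray(x, y, e, alpha, beta, ell, tol=1e-12):
    """Unique t >= 0 with rho(x, y + beta*t*e) - alpha*t = ell (Lemma 3).
    Assumes alpha > beta, rho(x, y) > ell and |e| = 1."""
    assert alpha > beta and math.dist(x, y) > ell

    def f(t):
        yt = (y[0] + beta * t * e[0], y[1] + beta * t * e[1])
        return math.dist(x, yt) - alpha * t - ell

    lo, hi = 0.0, 1.0
    # f(t) <= rho(x, y) + (beta - alpha) * t - ell, so doubling terminates.
    while f(hi) > 0:
        hi *= 2.0
    # f is strictly decreasing, so bisection converges to the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $x=(0,0)$, $y=(5,0)$, $\alpha=2$, $\beta=1$, $\ell=1$, an evader running straight away from P gives $5+t-2t=1$, i.e. $t=4$, while an evader running towards P gives $5-t-2t=1$, i.e. $t=4/3$.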
Proof: Suppose there are two solutions $t_1>t_2>0$ (the equation (4.4.3) is a quadratic equation). Examine Fig.23. Here $|xA|=\alpha t_2$, $|Ay(t_2)|=\ell$. From the point $A$ the straight line $L$ is drawn parallel to the ray $y(t)$. Prescribe to Player P a motion from the point $x$ to the point $A$ with velocity $\alpha$ and further along $L$ with velocity $\beta$. The velocity $\beta<\alpha$, and therefore such a motion is possible up to the instant $t_1>t_2$.

Fig. 24.

Let $x(t)$ be the corresponding trajectory. By construction, $\rho(x(t_1),y(t_1))=\ell$. However, since the trajectory $x(t)$ is a polygonal line, it is evident that, by employing a straight-line motion from $x$ to $y(t_1)$, Player P will approach Player E by the instant $t_1$ to a distance less than $\ell$, i.e.

$$\rho(x,y(t_1))-\alpha t_1<\ell.$$
This is inconsistent with the definition of $t_1$. We have proved the lemma. ∎

Theorem 3 Let the conditions of Theorem 3, Ch. 3, be satisfied in the "simple pursuit" game of prescribed duration $T$ in S. Then

$$\mathrm{Val}\,\Gamma(x,y,\bar T)=\rho_{\bar T}(x,y)=\ell,\qquad \mathrm{Val}\,\Gamma(x,y,\ell)=\bar T=\Theta(x,y,\ell)$$

(here $\Gamma(x,y,\ell)$ is the optimal time pursuit game until a distance $\ell>0$ is reached).

Proof:
The equality $\mathrm{Val}\,\Gamma(x,y,\bar T)=\rho_{\bar T}(x,y)=\ell$ follows from Theorem 3, Ch. 3. Denote by $M\in S$ a center of pursuit, i.e. the point for which

$$\ell=\rho(\xi,M)=\max_{\eta\in C_E^{\bar T}(y)}\ \min_{\zeta\in C_P^{\bar T}(x)}\rho(\zeta,\eta)=\rho_{\bar T}(x,y).$$

The set $C_E^{\bar T}(y)$ is the intersection of the disc $S_y(\beta\bar T)$ with radius $\beta\bar T$ ($v_1^2+v_2^2\le\beta^2$) and its center at $y$, and the set S. Similarly, the set $C_P^{\bar T}(x)$ is the intersection of the disc $S_x(\alpha\bar T)$ with radius $\alpha\bar T$ ($u_1^2+u_2^2\le\alpha^2$) and its center at $x$, and the set S. The point $\xi\in S$ lies on the boundary of the disc $S_x(\alpha\bar T)$: since $M\in S$ and $x\in S$, the convexity of S implies that the whole segment $[x,M]$ lies in S, and $\xi\in[x,M]$ because it is the point of $C_P^{\bar T}(x)$ nearest to the point $M$ (Fig.24). Thus, the conditionally optimal trajectory for Player P is a straight line. We show that among the conditionally optimal trajectories for Player E there is also a straight one. Indeed, if $M$ is not on the boundary of S, then the point $M$ belongs to the boundary of $S_y(\beta\bar T)$, and Player E's trajectory is straight. If the point $M$ belongs to the boundary of S, then a trajectory joining the points $y$ and $M$ ($y(0)=y$, $y(\bar T)=M$) may be defined as follows. Let $\rho(M,y)$ be the distance from $M$ to $y$ (evidently, $\rho(M,y)\le\beta\bar T$). Consider the segment $[y,M]$. Player E is prescribed a motion with velocity $\beta$ along the interval $[y,M]$ until the instant $\rho(M,y)/\beta$, and waiting at the point $M$ on the time interval $[\rho(M,y)/\beta,\bar T]$. This motion is possible because of the convexity of S and is a conditionally optimal straight trajectory.

As shown before, according to (4.4.2), $\Theta(x,y,\ell)\le\bar T$. Assume that the strict inequality $\Theta(x,y,\ell)<\bar T$ holds. Let $\tilde x(t)$, $\tilde y(t)$ be the corresponding conditionally optimal trajectories in the optimal time pursuit game $\Gamma(x,y,\ell)$. As before, we may show that $\tilde x(t)$, $\tilde y(t)$ are straight-line trajectories emanating from the points $x$, $y$; here $\tilde x(t)$ is directed to the point $\tilde y(\Theta(x,y,\ell))$ (the "capture point"). Since $\Theta(x,y,\ell)<\bar T$ and the game is simple pursuit, the point $\tilde y(\Theta(x,y,\ell))\in C_E^{\bar T}(y)$ (E can realize the following motion: he moves in a straight line towards the point $\tilde y(\Theta(x,y,\ell))$ until the instant $\Theta$ and then stays at this point on the time interval $[\Theta,\bar T]$). Now, let $x(t)$, $y(t)$ be a pair of conditionally optimal trajectories in the game $\Gamma(x,y,\bar T)$. Player E is prescribed a motion from the point $y$ to the point $M$ along a straight trajectory with velocity $\beta$ and then waiting at the point $M$ until the instant $\Theta(x,y,\ell)$. Then Player P cannot approach him to the distance $\ell$, since $\Theta<\bar T$ (see Lemma 3) and

$$\rho(x(\Theta),y(\Theta))>\rho(x(\bar T),y(\bar T))=\rho(\xi,M)=\rho_{\bar T}(x,y)=\ell.$$

The contradiction obtained proves the theorem. ∎
Theorem 3 contains the method of finding the solution of optimal time pursuit games in the "simple motion" case. The solution of the optimal time pursuit game (when the conditions of Theorem 3 are fulfilled) is constructed as follows. For the initial conditions of the game $\Gamma(x,y,\ell)$ (the optimal time pursuit game from the initial conditions $x,y$ until capture at the distance $\ell$), the game $\Gamma(x,y,T)$ of prescribed duration $T$ is constructed. Moreover, the duration is chosen from the condition $\bar T=\min\{T:\mathrm{Val}\,\Gamma(x,y,T)=\ell\}$, i.e. $\bar T$ is the minimum duration of the game $\Gamma(x,y,T)$ for which its value is equal to $\ell$ (the capture radius in the game $\Gamma(x,y,\ell)$). If $\bar T=\infty$, then the $\ell$-approach cannot be ensured by Player P in a finite time. In the examples of simple pursuit 2, 4 and 5 the conditions of Theorem 3, Ch. 3 are satisfied; consequently, Theorem 3 is applicable to them. In Example 2 the set S coincides with the entire plane, in Example 4 with the half-plane, and in Example 5 with the interior of the angle.
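The recipe $\bar T=\min\{T:\mathrm{Val}\,\Gamma(x,y,T)=\ell\}$ amounts to a one-dimensional search. The Python sketch below assumes that $T\mapsto\mathrm{Val}\,\Gamma(x,y,T)$ is continuous and non-increasing; this holds in the plane with $\alpha>\beta$, where $\mathrm{Val}\,\Gamma(x,y,T)=\rho_T(x,y)=\max(0,\rho(x,y)-(\alpha-\beta)T)$. The names `min_duration` and `value_plane` and the cut-off `T_max` are illustrative; a level not reached by `T_max` is reported as infinity, mirroring the case $\bar T=\infty$.

```python
import math

def min_duration(value_of_T, ell, T_max=1e6, tol=1e-10):
    """Smallest T with value_of_T(T) <= ell, assuming value_of_T is
    continuous and non-increasing; by continuity this is also the first T
    at which the value equals ell when value_of_T(0) > ell."""
    if value_of_T(0.0) <= ell:
        return 0.0            # the players are already within distance ell
    if value_of_T(T_max) > ell:
        return math.inf       # ell-approach not ensured up to T_max
    lo, hi = 0.0, T_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_of_T(mid) <= ell:
            hi = mid
        else:
            lo = mid
    return hi

def value_plane(x, y, alpha, beta):
    """rho_T(x, y) for simple pursuit in the whole plane, as a function of T."""
    r = math.dist(x, y)
    return lambda T: max(0.0, r - (alpha - beta) * T)
```

For $\rho(x,y)=5$, $\alpha=2$, $\beta=1$, $\ell=1$ the search returns $\bar T=4$, in agreement with formula (4.4.4) for Example 2; with $\alpha=\beta$ the value never drops to $\ell$ and the search reports infinity.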
In Example 2, the function $\Theta(x,y,\ell)$ is computed in a simple way: it is the time to the $\ell$-capture instant provided that Player E moves away from Player P in a straight line with velocity $\beta$, and Player P moves towards Player E in a straight line at velocity $\alpha$. This gives us the following formula:

$$\Theta(x,y,\ell)=\frac{\rho(x,y)-\ell}{\alpha-\beta}.\qquad(4.4.4)$$

The function $\Theta(x,y,\ell)$ has continuous partial derivatives everywhere except on the surface

$$\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}=0,\qquad(4.4.5)$$

which corresponds to a coincidence of the positions of players P and E. If the distance between the players at the initial instant is assumed to be strictly greater than zero and $\ell>0$, then the players' trajectories do not penetrate the surface (4.4.5) during the game, and hence the function $\Theta(x,y,\ell)$ satisfies the equation

$$\max_{|v|\le\beta}\sum_{i=1}^{2}\frac{\partial\Theta}{\partial y_i}v_i+\min_{|u|\le\alpha}\sum_{i=1}^{2}\frac{\partial\Theta}{\partial x_i}u_i=-1\qquad(4.4.6)$$

with the initial condition $\Theta(x,y,\ell)\big|_{\rho(x,y)=\ell}=0$.
Substituting into (4.4.6) the controls $u$, $v$ (see Example 2 in this chapter) on which the max and min in this equation are realized, we obtain the first-order partial differential equation

$$-\alpha\sqrt{\Big(\frac{\partial\Theta}{\partial x_1}\Big)^2+\Big(\frac{\partial\Theta}{\partial x_2}\Big)^2}+\beta\sqrt{\Big(\frac{\partial\Theta}{\partial y_1}\Big)^2+\Big(\frac{\partial\Theta}{\partial y_2}\Big)^2}=-1\qquad(4.4.7)$$

with the initial condition

$$\Theta(x,y,\ell)\big|_{\rho(x,y)=\ell}=0.\qquad(4.4.8)$$
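Formula (4.4.4) can be checked against (4.4.7) and (4.4.8). The gradients of $\Theta$ in $x$ and in $y$ both have norm $1/(\alpha-\beta)$, so the left side of (4.4.7) equals $(-\alpha+\beta)/(\alpha-\beta)=-1$. The Python sketch below confirms this with central differences; the helper names and the step `h` are illustrative choices.

```python
import math

def theta_plane(x, y, alpha, beta, ell):
    """Formula (4.4.4): maximin time to ell-capture in the plane."""
    return (math.dist(x, y) - ell) / (alpha - beta)

def _grad(f, p, h):
    """Central-difference gradient of f at the 2-D point p (a tuple)."""
    g = []
    for k in range(2):
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        g.append((f(tuple(plus)) - f(tuple(minus))) / (2.0 * h))
    return g

def residual_447(x, y, alpha, beta, ell, h=1e-6):
    """Left side of (4.4.7) plus 1; close to 0 if (4.4.4) solves (4.4.7)."""
    gx = _grad(lambda p: theta_plane(p, y, alpha, beta, ell), x, h)
    gy = _grad(lambda p: theta_plane(x, p, alpha, beta, ell), y, h)
    return -alpha * math.hypot(*gx) + beta * math.hypot(*gy) + 1.0
```

The residual is of the order of the finite-difference error at any point with $\rho(x,y)>0$, and $\Theta$ vanishes on $\rho(x,y)=\ell$, which is the initial condition (4.4.8).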
For the game of "simple pursuit" in the half-plane (see Example 4 in this chapter) and "simple pursuit" within the angle (see Example 5), the function $\Theta(x,y,\ell)$ cannot be computed explicitly as simply as in Example 2 of this chapter; in the half-plane case it turns out to be the root of the following equation (see (4.2.2), (4.2.3)):

$$-\alpha\Theta+\sqrt{\Big(x_1-y_1+\sqrt{(\beta\Theta)^2-y_2^2}\Big)^2+x_2^2}=\ell.$$
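The root can be found by bracketing and bisection. The Python sketch below reads the equation under assumptions that are ours, not the text's: S is the half-plane $\{x_2\ge 0\}$ with boundary $x_2=0$, $x_1\ge y_1$, $\alpha>\beta$, and only roots with $\beta\Theta\ge y_2$ (so that the inner square root is real) are sought. The function name is illustrative, and the sketch returns one root of the equation without claiming uniqueness.

```python
import math

def theta_half_plane(x, y, alpha, beta, ell, tol=1e-12):
    """Root of sqrt((x1 - y1 + sqrt((beta*th)^2 - y2^2))^2 + x2^2) - alpha*th = ell
    on the branch beta*th >= y2 (assumed half-plane {x2 >= 0}, x1 >= y1)."""
    (x1, x2), (y1, y2) = x, y

    def f(th):
        # Distance from x to the boundary point at horizontal offset
        # sqrt((beta*th)^2 - y2^2) from y, minus alpha*th and ell.
        s = math.sqrt(max(0.0, (beta * th) ** 2 - y2 ** 2))
        return math.hypot(x1 - y1 + s, x2) - alpha * th - ell

    lo = y2 / beta
    if f(lo) <= 0:
        raise ValueError("no root with beta*theta >= y2 on this branch")
    hi = max(1.0, 2.0 * lo)
    while f(hi) > 0:          # f -> -infinity because alpha > beta
        hi *= 2.0
    while hi - lo > tol:      # keep f(lo) > 0 >= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $x=(0,4)$, $y=(0,0)$, $\alpha=2$, $\beta=1$, $\ell=1$ the equation reduces to $\sqrt{\Theta^2+16}-2\Theta=1$, whose positive root is $\Theta=5/3$.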
As noted before (see (4.4.2)), $\Theta(x,y,\ell)\le\bar T$, i.e. the maximin time of the $\ell$-capture instant does not exceed the absorption time (the time of pursuit until absorption). One obvious sufficient condition may be given for the equality in (4.4.2) to be satisfied.
Let $y^*(t)$ be an optimal trajectory and $v^*(t)$ an optimal open-loop control for Player E in the game $\Gamma(x,y,\bar T)$, and let $u^*(t)$ be an optimal open-loop control for Player P realizing the $\ell$-capture of Player E in minimum time provided that Player E employs the open-loop control $v^*(t)$ previously reported to Player P. Denote by $t_n(x,y;u^*(t),v^*(t))$ the corresponding time until the $\ell$-capture instant. Then if

$$t_n(x,y;u^*(t),v^*(t))=\bar T,\qquad(4.4.9)$$

then also

$$\Theta(x,y,\ell)=\bar T.\qquad(4.4.10)$$

Indeed, since $\Theta(x,y,\ell)=\max_{v}\min_{u}t_n(x,y;u(t),v(t))$, from the definition of $u^*(t)$ we have

$$\Theta(x,y,\ell)\ge t_n(x,y;u^*(t),v^*(t))=\bar T,$$

and, using (4.4.2), we obtain (4.4.10).
Chapter 5 "Lifeline" game of pursuit¹

5.1 Definition of "life line" game

This chapter deals with a family of zero-sum two-person games, each being a model of pursuit in a closed convex set $S\subset R^2$. Two points, Pursuer P and Evader E, have modulus-constrained velocities and move within the set S with the possibility to switch the direction of motion (simple motion) at each instant of time. Evader E is said to be captured as soon as the distance between him and Pursuer P becomes less than or equal to $\ell>0$. The number $\ell$ is called the capture radius. Player P seeks to approach Player E before the latter intersects the "life line", i.e. the boundary of the set S. Player P must stay in S throughout the duration of the game. Motion equations are of the form

$$\dot x_i=u_i,\quad \dot y_i=v_i,\quad i=1,2,\qquad u_1^2+u_2^2\le\alpha^2,\quad v_1^2+v_2^2\le\beta^2,\qquad \alpha=\mathrm{const},\ \beta=\mathrm{const},\ \alpha>\beta.\qquad(5.1.1)$$

The game is studied in the class of piecewise open-loop strategies (see 2, Ch. 2). The payoff function is defined in the following way. Let $x(t)$, $y(t)$ be a solution of the system of differential equations (5.1.1) in the situation $(u(\cdot),v(\cdot))$ with the initial positions $x_0,y_0\in S$, and let

$$t_{S_P}=\inf\{t:x(t)\notin S\},\qquad t_{S_E}=\inf\{t:y(t)\notin S\}.$$

¹The problem for the half-plane was first stated in [1]. The solution for $\ell=0$ was obtained in [31], for $\ell>0$ in [34]. Another method of solving the problem for the half-plane case is given in [43].
Then we require that $x(t)\notin S$ for all $t>t_{S_P}$ and $y(t)\notin S$ for all $t>t_{S_E}$ (the latter condition restricts the strategy classes of the players). Next, let $t_P=\min\{t:\rho(x(t),y(t))=\ell\}$ (if there is no such $t$, then $t_P$ is assumed to be equal to $\infty$).

Payoff function. Let $x(t)$, $y(t)$ be the trajectories of the players P and E corresponding to the initial conditions $x_0,y_0\in S$ in the situation $(u(\cdot),v(\cdot))$. Then the payoff function (Player P's payoff) is defined as follows:

$$K(x_0,y_0;u(\cdot),v(\cdot))=\begin{cases}+1, & \text{if } t_P<t_{S_E},\ t_P<\infty,\\ 0, & \text{if } t_P=t_{S_E}=\infty,\\ -1, & \text{if } t_P>t_{S_E},\ t_{S_E}<\infty.\end{cases}$$
With the players' strategy sets and the payoff function defined, for each pair of initial positions $x_0,y_0\in S$ the game is denoted by $\Gamma(x_0,y_0)$. From the form of the payoff function it follows that $\Gamma(x_0,y_0)$ is a game of kind where Player P seeks to approach Player E before the latter intersects the boundary of the set S. Assume that the game is of zero-sum two-person type, i.e. Player E's payoff is Player P's payoff with opposite sign. We first prove basic theorems for the games with Player E discriminated for the time $\delta>0$ ahead, and then show that these theorems also hold for the game $\Gamma(x_0,y_0)$ (without discrimination of Player E).
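The payoff function $K$ of Section 5.1 translates directly into code. In the Python sketch below (names are illustrative) the instants $t_P$ and $t_{S_E}$ are passed in, with `math.inf` standing for "never"; the tie $t_P=t_{S_E}<\infty$, which is not among the three listed cases, is reported as an error rather than assigned a value.

```python
import math

def lifeline_payoff(t_P, t_SE):
    """Player P's payoff K in the "life line" game.

    t_P  -- first instant with rho(x(t), y(t)) = ell (math.inf if none)
    t_SE -- first instant at which y(t) leaves S (math.inf if none)
    """
    if t_P < t_SE:                 # capture before E crosses the life line
        return 1
    if t_P == t_SE == math.inf:    # no capture, and E never leaves S
        return 0
    if t_P > t_SE:                 # E crosses the life line first (t_SE finite)
        return -1
    raise ValueError("t_P == t_SE < inf is not covered by the listed cases")
```

For instance, capture at $t_P=2$ with E never leaving S gives $+1$, while E crossing the boundary at $t_{S_E}=3$ before any capture gives $-1$.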
5.2 Discrete game

Let $\delta$ be a fixed partition of the interval $[0,\infty)$ with a constant step $\Delta>0$. The game $\Gamma_\delta(x_0,y_0)$ is the game with discrimination of Player E over the classes of strategies P and E defined in 2, Ch. 2.

Let $u_\delta(\cdot)$ be a fixed strategy for Player P which can, in any situation $(u_\delta(\cdot),v_\delta(\cdot))$, ensure the $\ell$-capture of Player E in the game $\Gamma_\delta(x_0,y_0)$. Let $C_{u_\delta(\cdot)}(x_0,y_0)$ be the set of all possible locations of Player E's position at the $\ell$-capture instant in the situations $(u_\delta(\cdot),v_\delta(\cdot))$ for different $v_\delta(\cdot)\in E$ (in what follows, Player E's position at the $\ell$-capture instant is also called the "capture point"). Evidently, if the set $C_{u_\delta(\cdot)}(x_0,y_0)$ has a nonempty intersection with the complement of the set S, Player P cannot guarantee the $\ell$-capture of E in the set S using $u_\delta(\cdot)$. Consequently, in this case Player E can always choose a strategy under which the $\ell$-capture in the situation $(u_\delta(\cdot),v_\delta(\cdot))$ occurs in the complement of the set S. There may exist a whole set of such strategies.
Theorem 1. Suppose the following conditions are satisfied:
1. the intersection of $C_{u_\delta(\cdot)}(x_0, y_0)$ with the complement of the set S is nonempty;
2. there exists a strategy $v^*_\delta(\cdot)$ of Player E under which, in the situation $(u_\delta(\cdot), v^*_\delta(\cdot))$, Player P speeds up to the "capture point" from the initial point $x_0$, and the "capture point" (at the instant of $\ell$-capture) does not belong to S.

The speed-up to the "capture point" is interpreted as follows. Let $x^*(t)$, $y^*(t)$ be the trajectories of the players P and E in the situation $(u_\delta(\cdot), v^*_\delta(\cdot))$ from the initial states $x_0$, $y_0$. We say that the strategy $u_\delta(\cdot)$ speeds up to the "capture point" in the situation $(u_\delta(\cdot), v^*_\delta(\cdot))$ if, moving along the trajectory $x^*(t)$, Player P realizes the $\ell$-capture of Player E as fast as possible. This is understood under the proviso that Player E moves along the trajectory $y^*(t)$ and that this trajectory is reported to Player P in advance.
Then the pair $(u_\delta(\cdot), v^*_\delta(\cdot))$ constitutes an equilibrium point in the game $\Gamma_\delta(x_0, y_0)$, and the value of the game is $-1$. That is, under the conditions of the theorem, none of the strategies of Pursuer P can ensure the $\ell$-capture of Player E in S.

Proof: Let $x^*(t)$, $y^*(t)$ be the trajectories of the players P and E in the situation $(u_\delta(\cdot), v^*_\delta(\cdot))$ in the game $\Gamma_\delta(x_0, y_0)$. Denote by $\bar v_\delta(\cdot)$ a strategy of Player E under which, at each time instant, he chooses the direction of motion along the trajectory $y^*(t)$ irrespective of Player P's actions. Whatever the strategy $\bar u_\delta(\cdot) \in P$ may be, the $\ell$-capture in the situation $(\bar u_\delta(\cdot), \bar v_\delta(\cdot))$ cannot take place before the instant $t_P(u_\delta(\cdot), \bar v_\delta(\cdot))$. The reason is that in the situation $(u_\delta(\cdot), v^*_\delta(\cdot))$, or equivalently $(u_\delta(\cdot), \bar v_\delta(\cdot))$, Player P speeds up to the "capture point". Since this capture point lies outside S, Player E leaves S before it is reached. This means that

$$t_P(\bar u_\delta(\cdot), \bar v_\delta(\cdot)) \ge t_P(u_\delta(\cdot), \bar v_\delta(\cdot)) > t_{S_E}$$

for all strategies $\bar u_\delta(\cdot)$ of Player P. We have thus proved the theorem. ■
When the set $C_{u_\delta(\cdot)}(x_0, y_0)$ is contained in S, the $\ell$-capture of Player E in S is always possible if Player P adopts the strategy $u_\delta(\cdot)$. In this case the strategy $u_\delta(\cdot)$ turns out to be optimal for Player P (it may not be unique), and the value of the game is $+1$. Any strategy $v_\delta(\cdot)$ is optimal for Player E.

Now, to solve the game it suffices to establish one of two things. Either there exists a strategy $v^*_\delta(\cdot)$ of Player E satisfying the conditions of Theorem 1, or there exists a strategy $u_\delta(\cdot)$ under which the set $C_{u_\delta(\cdot)}(x_0, y_0)$ is contained in S.
We now turn to the construction of a pair of strategies $u_\delta(\cdot)$, $v^*_\delta(\cdot)$ possessing the property mentioned above. Suppose that from the initial position $y_0$ Player E chooses a constant control $v = \text{const}$, $|v| = \beta$, i.e. he moves in a straight line with velocity $\beta$ along some ray $y_0A$. For each such motion, Player P has a unique constant control $u = \text{const}$ that guarantees him the $\ell$-capture of Player E from the position $x_0$ in minimum time. This control prescribes him the motion along the ray $x_0B$ directed to the "capture point". This motion is called the speed-up to the "capture point" (Fig. 25).

[Fig. 25: initial positions $x_0$, $y_0$ and the ray $y_0A$]
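For a straight-line motion of Player E, the speed-up and its capture point reduce to a single quadratic equation. Suppose E runs from $y_0$ along a unit vector $e$ at speed $\beta$, and P runs straight at speed $\alpha > \beta$. Capture at time $t$ then requires $|y_0 + \beta t e - x_0| = \alpha t + \ell$. The sketch below works under these assumptions and further assumes that the players start more than $\ell$ apart. The function name `speed_up_capture` is hypothetical.

```python
import math

def speed_up_capture(x0, y0, e, alpha, beta, ell):
    """Capture time t, capture point w and Player P's unit heading when Player E
    runs from y0 along the unit vector e at speed beta, and Player P runs from x0
    at speed alpha straight towards w.

    Solves |y0 + beta*t*e - x0| = alpha*t + ell, i.e.
    (beta^2 - alpha^2) t^2 + 2 (beta <d, e> - alpha*ell) t + (|d|^2 - ell^2) = 0
    with d = y0 - x0. Assumes alpha > beta and |d| > ell.
    """
    dx, dy = y0[0] - x0[0], y0[1] - x0[1]
    a = beta ** 2 - alpha ** 2                   # negative because alpha > beta
    b = 2.0 * (beta * (dx * e[0] + dy * e[1]) - alpha * ell)
    c = dx * dx + dy * dy - ell * ell            # positive: players start more than ell apart
    # a < 0 < c, so the roots have opposite signs; this formula gives the positive one
    t = (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    w = (y0[0] + beta * t * e[0], y0[1] + beta * t * e[1])
    r = math.hypot(w[0] - x0[0], w[1] - x0[1])   # equals alpha*t + ell
    heading = ((w[0] - x0[0]) / r, (w[1] - x0[1]) / r)
    return t, w, heading
```

For example, take P at $(0,0)$, E at $(1,0)$, $\alpha = 1$, $\beta = 0.5$, $\ell = 0.1$, with E running straight away along $e = (1, 0)$. The sketch gives $t \approx 1.8$ and the capture point $w \approx (1.9, 0)$. At that moment P has covered 1.8 and is exactly $\ell$ behind E.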
Definition 1. Let $v(t)$, $t \in [t_k, t_{k+1}]$, be an open-loop control transferring $y(t_k)$ to $y(t_{k+1})$. With it we associate the open-loop control $\bar v(t)$, $|\bar v(t)| = \beta$, which transfers $y(t_k)$ to $y(t_{k+1})$ along a single-vertex polygonal trajectory $\bar y(t)$ with its vertex at a point $\bar y(\bar t_k)$, $t_k \le \bar t_k \le t_{k+1}$. Player E's speed is maximal along $\bar y(t)$.

The Π-strategy $u^\Pi_\delta(\cdot)$ works as follows. It takes each state $x(t_k)$, $y(t_k)$ together with the interval of the open-loop control $v(t)$ for $t \in [t_k, t_{k+1})$ (i.e. the point $y(t_{k+1})$). It assigns to them an interval of the open-loop control $u(t)$ for $t \in [t_k, t_{k+1})$, defined piecewise:
1. for $t \in [t_k, \bar t_k)$, $u(t)$ coincides with the open-loop control realizing the speed-up to the "capture point" motion, provided that from the time $t_k$, and throughout the duration of the game, Player E moves at maximum speed along the straight trajectory from $y(t_k)$ towards the point $\bar y(\bar t_k)$;
2. for $t \in [\bar t_k, t_{k+1})$, $u(t)$ coincides with the open-loop control realizing the speed-up to the "capture point" motion, provided that from the time $\bar t_k$, and throughout the duration of the game, Player E moves at maximum speed along the straight trajectory from $\bar y(\bar t_k)$ towards the point $y(t_{k+1})$.

Moreover, suppose that at an instant $t' \in [t_k, t_{k+1})$ it turns out that $\rho(x(t'), y(t')) = \ell$. Then, under the strategy $u^\Pi_\delta(\cdot)$, Player P chooses $u(t)$ on the time interval $[t', t_{k+1})$ so that $\rho(x(t), y(t)) = \ell$ for all $t \in [t', t_{k+1})$ [31].
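The definition only requires that the imaginary path of Player E have a single vertex and be traversed at full speed $\beta$ during the step $\delta$. It does not fix where the vertex lies. The sketch below makes one possible choice, which is an assumption of ours: the vertex lies on the perpendicular bisector of the chord $y(t_k)y(t_{k+1})$, so that both legs have length $\beta\delta/2$ and $\bar t_k = t_k + \delta/2$. The function name is hypothetical.

```python
import math

def single_vertex_path(y_k, y_k1, beta, delta):
    """Vertex of a two-leg polygonal path from y_k to y_k1 that Player E covers
    at full speed beta in exactly the step delta.

    Assumption (not fixed by Definition 1): the vertex lies on the perpendicular
    bisector of the chord, so both legs have length beta*delta/2."""
    cx, cy = y_k1[0] - y_k[0], y_k1[1] - y_k[1]
    chord = math.hypot(cx, cy)
    leg = beta * delta / 2.0
    if chord > 2.0 * leg + 1e-12:
        raise ValueError("y_k1 is not reachable from y_k within delta at speed beta")
    mid = ((y_k[0] + y_k1[0]) / 2.0, (y_k[1] + y_k1[1]) / 2.0)
    offset = math.sqrt(max(leg * leg - (chord / 2.0) ** 2, 0.0))
    if chord == 0.0:
        n = (0.0, 1.0)                  # E returns to y_k: any unit direction will do
    else:
        n = (-cy / chord, cx / chord)   # unit normal to the chord
    return (mid[0] + offset * n[0], mid[1] + offset * n[1])
```

When the chord already has length $\beta\delta$, the vertex degenerates to the midpoint and the imaginary path is a straight segment.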
Let us determine the structure of the set $C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$. The Π-strategy has the following important property. If the "capture point" in the situation $(u^\Pi_\delta(\cdot), v(\cdot))$ satisfies $y(t_P) \in S$, then $x(t) \in S$ for all $0 \le t \le t_P$. In other words, Player P does not leave the set S before the $\ell$-capture of Player E.

Consider the subset $\tilde C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ of the set $C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ obtained when Player E adopts only constant strategies $v_\delta(\cdot)$, $|v_\delta(\cdot)| = \beta$. This means that Player E moves only at the constant speed $\beta$ along rays emanating from the point $y_0$. Let $\hat C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ be the convex hull of the set $\tilde C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$. We show that

$$\hat C_{u^\Pi_\delta(\cdot)}(x_0, y_0) = C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$$

and that the set $\tilde C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ is the boundary of the set $\hat C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ (see §3).
Theorem 2. Assume that the set $C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ intersects the complement of the set S. Then the Π-strategy satisfies the conditions of Theorem 1 provided there is a strategy $v^*_\delta(\cdot)$ of Player E with two properties: under it he moves along some ray at the speed $\beta$, and the "capture point" in the situation $(u^\Pi_\delta(\cdot), v^*_\delta(\cdot))$ belongs to the complement of the set S.

Proof: In the situation $(u^\Pi_\delta(\cdot), v^*_\delta(\cdot))$ (see the definition of the Π-strategy), Player P also moves along some ray passing through $x_0$ to the "capture point", i.e. he realizes the speed-up to the "capture point". The theorem follows from this. ■

We now show that such a strategy exists under the condition $C_{u^\Pi_\delta(\cdot)}(x_0, y_0) \setminus S \ne \varnothing$ (see Lemma 1). It is the motion along a ray passing through $y_0$ to the "capture point". We thus have the following theorem.

Theorem 3. For Player E to avoid the $\ell$-capture in S, it is necessary and sufficient that the set $C_{u^\Pi_\delta(\cdot)}(x_0, y_0)$ have a nonempty intersection with the complement of the set S.
5.3 Proof of one geometric lemma
Let Cartesian coordinates x, y be introduced on the plane. Assume that at $t = 0$ the pursuer P is situated at the point $(0, 0)$, and the evader E at $(a, 0)$, $a > \ell$. For simplicity of notation, we set $\alpha = 1$, $\beta = \lambda$, $\lambda < 1$. Denote the position of Player P at the time instant $t$ by $P(t) = (x_P^t, y_P^t)$, and that of Player E by $E(t) = (x_E^t, y_E^t)$.³

The players P and E are assumed to move in straight lines with maximal speeds along the rays $L_P$ and $L_E$ and to "meet" at the time instant $t$ at the point $w$, $w \in L_P \cap L_E$. The coordinates of the point $w = (x, y)$ then satisfy the equations

$$x^2 + y^2 = (t + \ell)^2, \qquad (x - a)^2 + y^2 = (\lambda t)^2, \qquad t > 0. \tag{5.3.1}$$

³ Here, for convenience in investigating the geometric properties of capture curves, we slightly depart from the adopted notation $P = (x_1, x_2)$, $E = (y_1, y_2)$.

[Fig. 26]
Equations (5.3.1) define on the plane x, y a curve called the Cartesian oval [44]. Since $t > 0$, we obtain a section of the Cartesian oval; denote it by D. Assume that at the time instant $\tau$ the players P and E change the direction of motion (Player P changes his direction so as to realize the $\ell$-capture). Then we have a new "capture" curve $D(\tau)$ given by the equations

$$(x - x_P^\tau)^2 + (y - y_P^\tau)^2 = (t - \tau + \ell)^2, \qquad (x - x_E^\tau)^2 + (y - y_E^\tau)^2 = [\lambda(t - \tau)]^2, \qquad \tau < t. \tag{5.3.2}$$

Here $t$ is the "capture" time for the players P and E when they move along $L_P$ and $L_E$, respectively. Note that for $\tau = 0$ equations (5.3.2) reduce to (5.3.1).
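Equations (5.3.1) say that, for each admissible $t$, the point $w$ lies on two circles: one of radius $t + \ell$ about P's start and one of radius $\lambda t$ about E's start. These circles meet exactly when $(a - \ell)/(1 + \lambda) \le t \le (a - \ell)/(1 - \lambda)$. The sketch below samples D this way under the normalization of this section ($\alpha = 1$, $\beta = \lambda < 1$, $a > \ell$). It then checks numerically that the sampled closed polygon is convex, as Lemma 1 asserts for $\tau = 0$. The function names are ours.

```python
import math

def cartesian_oval(a, lam, ell, n=200):
    """Sample the closed curve D of (5.3.1): P starts at (0, 0) with speed 1,
    E starts at (a, 0) with speed lam (0 < lam < 1 and a > ell assumed).
    Returns (x, y, t) triples tracing the curve clockwise."""
    t_min = (a - ell) / (1.0 + lam)   # circles touch between the players: E runs straight at P
    t_max = (a - ell) / (1.0 - lam)   # circles touch behind E: E runs straight away
    upper = []
    for k in range(n + 1):
        t = t_min + (t_max - t_min) * k / n
        r1, r2 = t + ell, lam * t
        x = (r1 * r1 - r2 * r2 + a * a) / (2.0 * a)   # radical line of the two circles
        y = math.sqrt(max(r1 * r1 - x * x, 0.0))
        upper.append((x, y, t))
    # the curve is symmetric about the x-axis; skip the two axis points when mirroring
    lower = [(x, -y, t) for (x, y, t) in reversed(upper[1:-1])]
    return upper + lower

def is_convex(points, eps=1e-15):
    """True if the closed polygon through the (x, y, ...) points turns the same way
    at every vertex (numerically negligible turns are ignored)."""
    m = len(points)
    turns = set()
    for i in range(m):
        x0, y0 = points[i - 1][0], points[i - 1][1]
        x1, y1 = points[i][0], points[i][1]
        x2, y2 = points[(i + 1) % m][0], points[(i + 1) % m][1]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > eps:
            turns.add(cross > 0)
    return len(turns) <= 1
```

With $a = 1$, $\lambda = 0.5$, $\ell = 0.1$, the curve crosses the x-axis at $(0.7, 0)$, where E runs towards P, and at $(1.9, 0)$, where E runs away. A sampled check like this illustrates the lemma for particular parameters; it is not a proof.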
We have the following basic lemma [34].

Lemma 1. $D(\tau)$ is a closed, bounded, convex curve for $\lambda < 1$, $\ell > 0$, $\tau < t$. Moreover, for $\tau > 0$ the curve $D(\tau)$ lies in the closed region bounded by the curve $D = D(0)$ (Fig. 26).

Before proving the lemma, we state three simple remarks without proof.

Remark 1. If every straight line intersects a closed curve K at no more than two points, then this curve is strictly convex.

Remark 2. Let

$$y^2 - x^2 = m^2,\quad y > 0, \qquad \lambda^2 (y - y_0)^2 - (x - x_0)^2 = n^2,\quad y > y_0 > 0,$$

be the upward branches of two hyperbolas. Then they have no more than two common points.

Remark 3. Let $\gamma$ be a branch of a hyperbola and $M \in \gamma$. Under a homothety with center M, the branch $\gamma$ changes to a branch $\gamma'$ of a hyperbola, and $\gamma$ and $\gamma'$ are tangent at M. If the homothety coefficient satisfies $0 < k < 1$, then the convex region G bounded by $\gamma$ contains $\gamma'$. If, however, $k > 1$, then $\gamma'$ lies outside G.

Proof (of the lemma).
In the Euclidean space with coordinates $(x, y, t)$, equations (5.3.2) define right circular cones with axes parallel to the t-axis. Under the condition $\tau < t$, each of the equations (5.3.2) gives the upper nappe (i.e. the one directed towards increasing $t$) of its cone. Denote these nappes by $K(P, \tau)$ and $K(E, \tau)$. The curve $D(\tau)$ is the projection of the intersection line of these nappes onto the plane x, y.

We show that $D(\tau)$ is convex. By Remark 1, it suffices to check that any straight line intersects $D(\tau)$ at no more than two points. Let p be a straight line in the plane x, y and $\pi$ the plane passing through p parallel to the t-axis. Then $\pi \cap K(P, \tau)$ and $\pi \cap K(E, \tau)$ are the upper branches of two hyperbolas, which by Remark 2 have no more than two common points. We have thus proved the convexity.

We now prove that $D(\tau)$ with $\tau > 0$ lies in the closed (convex) region bounded by $D = D(0)$. Let $w \in D(\tau) \cap D$. It suffices to show that on any straight line through w, the chord cut out by $D(\tau)$ is a portion of the chord cut out by D.

Suppose that the players P and E move along the straight lines $L_P$ and $L_E$, respectively. At the instant $t$ we associate with them the points $\bar P(t) \in K(P, 0)$ and $\bar E(t) \in K(E, 0)$, whose projections are $P(t)$ and $E(t)$, respectively. The points $\bar P(t)$ and $\bar E(t)$ move along the rays $\bar L_P \subset K(P, 0)$ and $\bar L_E \subset K(E, 0)$, which intersect at the point $\bar w$. On the plane x, y the point $\bar w$ projects to the point w. The directions of motion of P and E change at the instant $t = \tau$. Accordingly, for $t > \tau$ the points $P(t)$ and $E(t)$ are associated in the same way with the points $\bar P(t) \in K(P, \tau)$ and $\bar E(t) \in K(E, \tau)$ (Fig. 27).

Let p be a straight line on the plane passing through the point w, and $\pi$ the plane passing through p parallel to the t-axis. Introduce the notation

$$\gamma(P, \tau) = \pi \cap K(P, \tau), \qquad \gamma(E, \tau) = \pi \cap K(E, \tau).$$

The curves $\gamma(P, 0)$, $\gamma(E, 0)$, $\gamma(P, \tau)$, $\gamma(E, \tau)$ are branches of hyperbolas lying in the plane $\pi$ (see Fig. 27), and all of them pass through the point $\bar w$. The real axes of these hyperbolas are parallel to the t-axis. The center of the hyperbola $\gamma(P, 0)$ is the point A, the orthogonal projection onto $\pi$ of the point $(0, 0, -\ell)$, which is the vertex of the cone $K(P, 0)$.

[Fig. 27]

In the same manner, the center $A'$ of the hyperbola $\gamma(P, \tau)$ is the projection onto the same plane $\pi$ of the vertex of the cone $K(P, \tau)$. Hence the points A, $A'$ and $\bar w$ lie on one straight line, namely the projection onto $\pi$ of the straight line $\bar P(0)\bar w$. It follows that $\gamma(P, \tau)$ is obtained from $\gamma(P, 0)$ by the homothety with center $\bar w$ and coefficient

$$k_P = \frac{\bar w A'}{\bar w A} = \frac{t - \tau + \ell}{t + \ell},$$

where $t$ is the applicate (the t-coordinate) of the point $\bar w$. Similarly, let B and $B'$ be the centers of the hyperbolas $\gamma(E, 0)$ and $\gamma(E, \tau)$. The points B, $B'$ and $\bar w$ lie on one straight line, so $\gamma(E, \tau)$ is obtained from $\gamma(E, 0)$ by the homothety with center $\bar w$ and coefficient

$$k_E = \frac{\bar w B'}{\bar w B} = \frac{t - \tau}{t}.$$

Clearly, $k_E < k_P < 1$.
Let M be the second intersection point of $\gamma(P, 0)$ and $\gamma(E, 0)$, and N the second intersection point of the hyperbolas $\gamma(P, \tau)$ and $\gamma(E, \tau)$. We will show that the projection of the segment $\bar w M$ onto the plane x, y covers the projection of $\bar w N$.

Indeed, consider the homothety with center $\bar w$ and coefficient $k_E$. It maps the hyperbola $\gamma(E, 0)$ to the hyperbola $\gamma(E, \tau)$, and the hyperbola $\gamma(P, 0)$ to a branch $\tilde\gamma$. The second intersection point $M'$ of $\tilde\gamma$ with $\gamma(E, \tau)$ lies on the segment $\bar w M$. The curve $\gamma(P, \tau)$ is obtained from $\tilde\gamma$ by the homothety with center $\bar w$ and coefficient $k = k_P / k_E > 1$. Therefore, by Remark 3, N lies on the arc $\bar w M'$ of the curve $\gamma(E, \tau)$. Since $\gamma(E, \tau)$ projects one-to-one onto the plane x, y, the projection of N lies on the projection of the segment $\bar w M$. We have thus proved the lemma. ■

5.4 Basic theorem
Using Lemma 1, we now show that the oval D, i.e. the set of "capture points" for the straight-line motions of Player E, coincides with the boundary of the set $C_{u^\Pi_\delta(\cdot)}(x, y)$. Let $D(x, y)$ be the region bounded by the oval D constructed for the initial states x, y of the players P and E. We choose $\delta$ so that the diameter of the set $C^\delta_E(y)$ (the set of points Player E can reach from y in time $\delta$) is less than $\varepsilon$. Then, under any strategy $v_\delta(\cdot)$, the "capture point" in the situation $(x, y; u^\Pi_\delta(\cdot), v_\delta(\cdot))$ belongs to the $\varepsilon$-neighborhood of the set $D(x, y)$.

Indeed, let $x(t)$, $y(t)$ be the trajectories of the players in the situation $(x, y; u^\Pi_\delta(\cdot), v_\delta(\cdot))$, and let $t_0, t_1, \ldots$ be the break points of the time interval $[0, \infty)$ with the step $\delta$. Let $\bar y(t)$ be the polygonal trajectory corresponding to the control $\bar v(t)$, which constructs the imaginary motion of Player E as described in the definition of the Π-strategy. According to Lemma 1, we have

$$D(x, y) \supset D(x(\bar t_0), \bar y(\bar t_0)) \supset D(x(t_1), y(t_1)) \supset \cdots \supset D(x(t_k), y(t_k)) \supset D(x(\bar t_k), \bar y(\bar t_k)) \supset D(x(t_{k+1}), y(t_{k+1})) \supset \cdots$$

These inclusions imply that the "capture point" of the fictitious Player E moving along $\bar y(t)$ belongs to the region $D(x, y)$. By the definition of the Π-strategy, however, the $\ell$-capture of Player E follows the $\ell$-capture of the fictitious Player E within a time not exceeding $\delta$. It therefore occurs in the $\varepsilon$-neighborhood of the set $D(x, y)$.
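Lemma 1, and with it the chain of inclusions above, can be spot-checked numerically. The sketch below uses hypothetical helper names and assumes $\alpha > \beta$ and an initial distance greater than $\ell$. Both players move straight towards a common capture point for a time $\tau$. The sketch then samples the new oval from the new positions and tests each sample against the old region. Membership is tested with the inequality $\alpha|z - y_0| \le \beta(|z - x_0| - \ell)$, which says that E reaches z no later than P comes within $\ell$ of it. The boundary of this set is the oval of (5.3.1). The check illustrates the lemma for chosen data; it does not prove it.

```python
import math

def capture_time(xp, ye, e, alpha, beta, ell):
    """Positive root t of |ye + beta*t*e - xp| = alpha*t + ell (alpha > beta, |ye - xp| > ell)."""
    dx, dy = ye[0] - xp[0], ye[1] - xp[1]
    a = beta ** 2 - alpha ** 2
    b = 2.0 * (beta * (dx * e[0] + dy * e[1]) - alpha * ell)
    c = dx * dx + dy * dy - ell * ell
    return (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def in_region(z, xp, ye, alpha, beta, ell, tol=1e-9):
    """z lies in the closed region bounded by the oval for (xp, ye) iff E reaches z
    (after |z - ye|/beta) no later than P gets within ell of z (after (|z - xp| - ell)/alpha)."""
    return alpha * math.dist(z, ye) <= beta * (math.dist(z, xp) - ell) + tol

def oval_after_turn_is_nested(x0, y0, theta, tau, alpha, beta, ell, n=360):
    """E runs along the ray of angle theta and P speeds up to the resulting capture
    point w, both for a time tau shorter than the capture time. Then sample the new
    oval from (x(tau), y(tau)) and check that it lies in the region bounded by the old one."""
    e = (math.cos(theta), math.sin(theta))
    t = capture_time(x0, y0, e, alpha, beta, ell)
    if not 0.0 < tau < t:
        raise ValueError("tau must lie strictly between 0 and the capture time")
    w = (y0[0] + beta * t * e[0], y0[1] + beta * t * e[1])
    r = math.dist(w, x0)
    x_tau = (x0[0] + alpha * tau * (w[0] - x0[0]) / r, x0[1] + alpha * tau * (w[1] - x0[1]) / r)
    y_tau = (y0[0] + beta * tau * e[0], y0[1] + beta * tau * e[1])
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        d = (math.cos(phi), math.sin(phi))
        s = capture_time(x_tau, y_tau, d, alpha, beta, ell)   # new straight-line capture time
        z = (y_tau[0] + beta * s * d[0], y_tau[1] + beta * s * d[1])
        if not in_region(z, x0, y0, alpha, beta, ell):
            return False
    return True
```

The new oval touches the old one at the common capture point w. This is why a small tolerance is used in the membership test.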
Theorem 4. For any $\varepsilon > 0$ there is $\delta > 0$ such that the $\varepsilon$-neighborhood of the set $D(x, y)$ contains the set $C_{u^\Pi_\delta(\cdot)}(x, y)$. In this case, the boundary of the set $D(x, y)$ is the oval $D(x, y)$. This oval is the set of "capture points" of Player E in the situation $(x, y; u^\Pi_\delta(\cdot), v^*_\delta(\cdot))$ for the straight-line motions $v^*_\delta(\cdot)$. Moreover, the convex hull of the set $\tilde C_{u^\Pi_\delta(\cdot)}(x, y)$ satisfies the equality

$$\hat C_{u^\Pi_\delta(\cdot)}(x, y) = D(x, y).$$
As a corollary of Theorem 4 and Theorems 1–3 of this chapter, we obtain the following theorem, which is basic for this chapter.

Theorem 5. The strategy $u^\Pi_\delta(\cdot)$ is optimal for Player P in the game $\Gamma_\delta(x, y)$ in the following sense. If $D(x, y)$ has a nonempty intersection with the complement of the set S, the $\ell$-capture of E in S is impossible. If some $\varepsilon$-neighborhood of the set $D(x, y)$ is contained in S, then Player E is $\ell$-captured in S under any strategy $v_\delta(\cdot) \in E$.

5.5 Rejection of discrimination

Discrimination against Player E can be dropped in either of the following cases (throughout, $\ell > 0$).

1. There exists $\varepsilon > 0$ such that the $\varepsilon$-neighborhood of the set $D(x, y, \ell)$ is contained in S. In this section we write $D(x, y, \ell)$ instead of $D(x, y)$ to emphasize that the set D depends on the capture radius $\ell$. The $\varepsilon$-neighborhood of a set is the union of the $\varepsilon$-neighborhoods of all its points.
2. The set S does not contain the set $D(x, y, \ell)$.
Theorem 6. Let the points $x, y \in S$ satisfy Condition 1. Then Player P can ensure the $\ell$-capture of Player E independently of the latter's actions, i.e. the value of the game is $V(x, y) = +1$. Let the points $x, y \in S$ satisfy Condition 2. Then Player E can ensure the avoidance of $\ell$-capture by Player P independently of the latter's actions, i.e. the value of the game is $V(x, y) = -1$.

Proof: We prove the first part of the theorem. Let $D_\varepsilon(x, y, \ell)$ be the $\varepsilon$-neighborhood of the set $D(x, y, \ell)$. Note that the set $D_\varepsilon(x, y, \ell)$ varies continuously as x, y, $\ell$ vary. In particular, for any $\varepsilon' > 0$ there is $\delta > 0$ such that

$$D_\varepsilon(x', y', \ell') \subset D_{\varepsilon + \varepsilon'}(x, y, \ell)$$

for all $x'$, $y'$, $\ell'$ with $\rho(x', y', \ell'; x, y, \ell) < \delta$.
Take $\varepsilon', \varepsilon > 0$ such that $D_{\varepsilon + \varepsilon'}(x, y, \ell) \subset S$; this is possible by Condition 1. Set $\bar\varepsilon = \varepsilon' + \varepsilon$. Let $\delta(\varepsilon)$ correspond to this $\varepsilon' > 0$ in the sense just described. Let $\delta'(\varepsilon)$ correspond to $\varepsilon > 0$ so that $C_{u^\Pi_{\delta'}(\cdot)}(x, y) \subset D_\varepsilon(x, y, \ell)$ (Theorem 5, Ch. 5). Then for all $x'$, $y'$, $\ell'$ such that $\rho(x', y', \ell'; x, y, \ell) < \delta(\varepsilon)$ we have

$$D_\varepsilon(x', y', \ell') \subset D_{\bar\varepsilon}(x, y, \ell) \subset S.$$

Take $\ell' < \ell$ with $|\ell' - \ell| < \delta(\varepsilon)$, and set $\delta_1 < \min[(\ell - \ell')/\beta,\ \delta'(\varepsilon)]$. Consider the following strategy for Pursuer P. During the time $\delta_1$ he stays at the point x. After that, he applies the Π-strategy against the delayed state of Player E, i.e. against the point $y(t - \delta_1)$. Such a piecewise open-loop strategy is realizable. At each instant $t_k$ Player P knows $y(t_k)$, and hence the segment of trajectory between $y(t_k - \delta_1)$ and $y(t_k)$. We may therefore say that Player P follows Player E's trace $y(t - \delta_1)$, and that the trace is discriminated for the time $\delta_1 > 0$. Then, by Theorem 5 of the previous section, this strategy lets Player P ensure the $\ell'$-capture of Player E's trace $y(t - \delta_1)$ in the set $D_\varepsilon(x, y, \ell')$. However,

$$\rho(x(t), y(t)) \le \rho(x(t), y(t - \delta_1)) + \rho(y(t - \delta_1), y(t)).$$

Consider this inequality at the instant $\bar t$ when Player P realizes the $\ell'$-capture of Player E's trace $y(t - \delta_1)$. By the choice of $\delta_1$, during the time $\delta_1$ the point y cannot move away from the point $y(\bar t - \delta_1)$ by more than $\delta_1 \beta < \ell - \ell'$. This gives

$$\rho(x(\bar t), y(\bar t)) \le \ell' + \delta_1 \beta < \ell.$$

Thus, when Player P realizes the $\ell'$-capture of Player E's trace $y(t - \delta_1)$, he also realizes the $\ell$-capture of Player E. Hence Player P ensures the $\ell$-capture of Player E in the set $D_\varepsilon(x, y, \ell') \subset D_{\bar\varepsilon}(x, y, \ell) \subset S$. This means exactly that $\mathrm{Val}\,\Gamma(x, y) = +1$ when Condition 1 is satisfied.
Assume that Condition 2 of the theorem is satisfied. Consider any strategy $v_\delta(\cdot)$ that, in the situation $(u^\Pi_\delta(\cdot), v_\delta(\cdot))$, guarantees Player E that the $\ell$-capture can occur only outside S (at the points of $D(x, y, \ell)$ that do not belong to S). Such a strategy guarantees him the same outcome without discrimination, since removing the discrimination does not improve Player P's position. This completes the proof of the theorem. ■
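The parameter choices in the proof of the first part can be traced with concrete numbers. This sketch assumes, following the paragraph preceding Theorem 4, that any step $\delta'$ with $2\beta\delta' < \varepsilon$ is acceptable for $\delta'(\varepsilon)$; the value $2\beta\delta'$ is the diameter of the disc Player E can reach in one step. Halving each bound to obtain a strict inequality is an arbitrary choice, and all function names are ours.

```python
def theorem4_step(eps, beta):
    """A discrimination step delta' with 2*beta*delta' < eps (assumed admissible for
    delta'(eps)); we take half of the bound eps / (2*beta)."""
    return 0.5 * eps / (2.0 * beta)

def delay_step(ell, ell_prime, beta, delta_prime):
    """A delay delta_1 < min((ell - ell')/beta, delta'(eps)); we take half of the bound."""
    if not 0.0 < ell_prime < ell:
        raise ValueError("need 0 < ell' < ell")
    return 0.5 * min((ell - ell_prime) / beta, delta_prime)

def distance_bound_at_trace_capture(ell_prime, beta, delta_1):
    """Triangle inequality at the trace-capture instant:
    rho(x, y(t)) <= rho(x, y(t - delta_1)) + rho(y(t - delta_1), y(t)) <= ell' + beta*delta_1."""
    return ell_prime + beta * delta_1
```

For example, take $\ell = 0.1$, $\ell' = 0.08$, $\beta = 0.5$, $\varepsilon = 0.2$. Then $\delta' = 0.1$, $\delta_1 = 0.02$, and the distance at the trace-capture instant is at most $0.09 < \ell$.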
Corollary. The whole set of points $x, y \in S$ is divided into three sets.

1. The winning set of Player P:
$$W_P = \{x, y : \exists\, \varepsilon > 0,\ D_\varepsilon(x, y, \ell) \subset S\}.$$
2. The winning set of Player E:
$$W_E = \{x, y : D(x, y, \ell) \not\subset S\}.$$
3. The neutral outcome set:
$$W_0 = \{x, y : D(x, y, \ell) \subset S,\ \text{but } D_\varepsilon(x, y, \ell) \not\subset S \text{ for every } \varepsilon > 0\}.$$

For all $x, y \in W_P$ the value of the game is $\mathrm{Val}\,\Gamma(x, y) = V(x, y) = +1$, and for all $x, y \in W_E$ it is $\mathrm{Val}\,\Gamma(x, y) = V(x, y) = -1$. For any points $x, y \in W_0$, the sets $D(x, y, \ell)$ and S have common boundary points. For all $x, y \in W_0$, the $\ell$-capture of Player E is not possible without discriminating him. Evidently, any trajectory $x(t)$, $y(t)$ passing from $W_P$ to $W_E$ must intersect $W_0$. In this sense, the set $W_0$ separates the sets $W_P$ and $W_E$.
Definition 2. The set $W_0$ is called a barrier in the game $\Gamma(x, y)$. In a game of kind, it is essential to determine not only the optimal strategies for the players P and E but also the sets $W_P$ and $W_E$. To do this, it suffices to construct the separating set $W_0$.
The barrier has the following property. For any point $x, y \in W_P$, Player P has a strategy guaranteeing that the trajectory $x(t)$, $y(t)$ from the initial states x, y does not intersect the barrier during the game. Likewise, for any point $x, y \in W_E$, Player E has a strategy guaranteeing that the trajectory $x(t)$, $y(t)$ from the initial states x, y does not intersect the barrier. These strategies are optimal for the players P and E in the game $\Gamma(x, y)$. Using this property, R. Isaacs [1] derived a barrier equation for the "life-line" game in the half-plane. A similar definition may, however, be given for the barrier $W_0$ in the case of an arbitrary convex set S.
Suppose we know the boundary of the set S and have an explicit expression for the boundary of the set $D(x, y, \ell)$. Then the barrier $W_0$ admits a geometrically simple construction. It is the set of points $x, y \in S$ at which the boundary of $D(x, y, \ell)$ is tangent to the boundary of S. That boundary is a Cartesian oval when $\ell > 0$ and an Apollonius circle when $\ell = 0$.
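For a concrete S, the classification into $W_P$, $W_E$ and $W_0$ reduces to locating $D(x, y, \ell)$ relative to S. The sketch below takes the half-plane $S = \{z : z_2 \ge 0\}$, an illustrative choice not made in the text. It samples the oval by the directions of Player E's straight-line motion and uses a numerical tolerance to recognize tangency. The helper names are hypothetical, and the sketch assumes $\alpha > \beta$ and an initial distance greater than $\ell$.

```python
import math

def capture_time(xp, ye, e, alpha, beta, ell):
    """Positive root t of |ye + beta*t*e - xp| = alpha*t + ell (alpha > beta, |ye - xp| > ell)."""
    dx, dy = ye[0] - xp[0], ye[1] - xp[1]
    a = beta ** 2 - alpha ** 2
    b = 2.0 * (beta * (dx * e[0] + dy * e[1]) - alpha * ell)
    c = dx * dx + dy * dy - ell * ell
    return (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def lowest_point_of_region(xp, ye, alpha, beta, ell, n=720):
    """Smallest second coordinate over n sampled points of the oval bounding
    D(xp, ye, ell); the region is convex, so its lowest point lies on the oval."""
    best = math.inf
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        e = (math.cos(phi), math.sin(phi))
        t = capture_time(xp, ye, e, alpha, beta, ell)
        best = min(best, ye[1] + beta * t * e[1])
    return best

def classify_half_plane(xp, ye, alpha, beta, ell, tol=1e-6):
    """Classify the initial pair (xp, ye) for the illustrative set S = {z : z[1] >= 0}."""
    m = lowest_point_of_region(xp, ye, alpha, beta, ell)
    if m > tol:
        return "W_P"   # any eps < m keeps D_eps(x, y, ell) inside S
    if m < -tol:
        return "W_E"   # D(x, y, ell) sticks out of S
    return "W_0"       # D touches the line z[1] = 0: the barrier, up to sampling error
```

With $\alpha = 1$, $\beta = 0.5$, $\ell = 0.1$, placing P at $(0, 7.9)$ and E at $(0, 3.9)$ makes the oval touch the line $z_2 = 0$ exactly below E. That pair therefore lies on the barrier.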
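Lemma 2. Let $S_1 \subset S_2 \subset \cdots \subset S_n \subset \cdots$, and let $W_P^{(1)}, W_P^{(2)}, \ldots, W_P^{(n)}, \ldots$ and $W_E^{(1)}, W_E^{(2)}, \ldots, W_E^{(n)}, \ldots$ be the corresponding sequences of winning sets for the players P and E. Then

$$W_P^{(1)} \subset W_P^{(2)} \subset \cdots \subset W_P^{(n)} \subset \cdots, \qquad W_E^{(n+1)} \cap S_n \subset W_E^{(n)}, \quad n = 1, 2, \ldots$$

Proof: By the definition of $W_P$, for any $x, y \in W_P^{(k)}$ there is $\varepsilon > 0$ for which $D_\varepsilon(x, y, \ell) \subset S_k$. Since $S_{k+1} \supset S_k$, we have $D_\varepsilon(x, y, \ell) \subset S_{k+1}$, and hence $x, y \in W_P^{(k+1)}$. Now let $x, y \in W_E^{(k+1)} \cap S_k$. Then $D(x, y, \ell) \not\subset S_{k+1}$. Since $S_{k+1} \supset S_k$, also $D(x, y, \ell) \not\subset S_k$, i.e. $x, y \in W_E^{(k)}$. ■

Lemma 3. If S is closed, the sets $W_P$ and $W_E$ are open.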
Proof: Let $x, y \in W_P$. Then, by definition, there is $\varepsilon > 0$ such that $D_\varepsilon(x, y, \ell) \subset S$. Since the set $D(x, y, \ell)$ depends continuously on x, y, $\ell$, there exists $\delta(\varepsilon)$ such that

$$D(x', y', \ell) \subset D_{\varepsilon/2}(x, y, \ell)$$

for all $x'$, $y'$ with $\rho(x', y'; x, y) < \delta(\varepsilon)$. Since

$$D_{\varepsilon/2}(x', y', \ell) \subset D_\varepsilon(x, y, \ell) \subset S,$$

we have $x', y' \in W_P$.

Now let $x, y \in W_E$. Then $D(x, y, \ell) \not\subset S$. Take a sequence $x_n \to x$, $y_n \to y$ and consider the corresponding sequence of sets $D(x_n, y_n, \ell)$. By the continuous dependence of $D(x, y, \ell)$ on x, y, $\ell$, for any $\varepsilon > 0$ there is $N(\varepsilon)$ such that for all $n > N(\varepsilon)$

$$D(x_n, y_n, \ell) \supset D_{-\varepsilon}(x, y, \ell).$$

Here $D_{-\varepsilon}(x, y, \ell) \subset D(x, y, \ell)$ denotes the set of points whose $\varepsilon$-neighborhoods lie in $D(x, y, \ell)$. Since S and $D(x, y, \ell)$ are closed and $D(x, y, \ell) \not\subset S$, there is a point $z_0 \in D(x, y, \ell)$, $z_0 \notin S$, that is interior to $D(x, y, \ell)$. Denote by $\varepsilon_0$ the diameter of the largest disk centered at $z_0$ and contained in $D(x, y, \ell)$. Evidently $z_0 \in D_{-\varepsilon_0/2}(x, y, \ell)$. Hence, for all $n > N(\varepsilon_0/2)$,

$$z_0 \in D(x_n, y_n, \ell), \qquad z_0 \notin S,$$

i.e. $D(x_n, y_n, \ell) \not\subset S$ and $x_n, y_n \in W_E$. This proves the lemma. ■
Lemma 4. Let $S_1, S_2, \ldots, S_n, \ldots$ be a sequence of closed convex sets. Let $W_P^{(1)}, \ldots, W_P^{(n)}, \ldots$ and $W_E^{(1)}, \ldots, W_E^{(n)}, \ldots$ be the corresponding sequences of winning sets for the players P and E, and $W_0^{(1)}, \ldots, W_0^{(n)}, \ldots$ the corresponding sequence of barriers. Suppose $S = \bigcap_n S_n$. Then the sets $W_P$, $W_E$ defined by

$$W_P \cup W_0 = \bigcap_n \big(W_P^{(n)} \cup W_0^{(n)}\big), \qquad W_E = \bigcup_n W_E^{(n)},$$

are the winning sets for the players P and E in the game defined in the set S.

Proof: We prove only the first part of the lemma; the second part is proved in much the same way. Let $x, y \in \bigcap_n (W_P^{(n)} \cup W_0^{(n)})$. This means that $D(x, y, \ell) \subset S_n$ for all n. Evidently, $D(x, y, \ell) \subset S = \bigcap_n S_n$, and hence $x, y \in W_P \cup W_0$. Now let $x, y \in W_P \cup W_0$. Then either $D_\varepsilon(x, y, \ell) \subset S$ for some $\varepsilon > 0$, or $D(x, y, \ell) \subset S$ and the boundaries of $D(x, y, \ell)$ and S have common points. Thus, in both cases $D(x, y, \ell) \subset S$. Since $S = \bigcap_n S_n$, it follows that $D(x, y, \ell) \subset S_n$ for $n = 1, 2, \ldots$ Therefore, $x, y \in \bigcap_n (W_P^{(n)} \cup W_0^{(n)})$. This completes the proof of the lemma. ■
A generalization to the case of incomplete information is given in [49].
5.6 Multiplayer "life line" games
The games under study model pursuit in a closed convex set S on the plane. Several players take part: a team of pursuers $P = \{P_1, \ldots, P_m\}$ and evaders $E_1, \ldots, E_n$. They move within the set S with velocities of bounded modulus and can change the direction of motion at each instant of time. The evader $E_j$ is said to be captured if, for some index $i \in \{1, \ldots, m\}$, the players $P_i$ and $E_j$ are at the distance $\ell$ ($\ell > 0$) from each other at some instant. Each evader seeks to reach the boundary of the set S before he is captured by one of the pursuers [24].

The pursuit takes place on the Euclidean plane, and the players' positions are given by the point $(x^1, \ldots, x^m; y^1, \ldots, y^n)$. Here $x^i = (x^i_1, x^i_2)$ is the position of the pursuer $P_i$, a member of the team P, and $y^j = (y^j_1, y^j_2)$ is the position of the evader $E_j$. At each instant of time the players may choose the direction of motion (the direction of the velocity vector) and the speed within prescribed limits. The maximum speed is constant: $\alpha_i$ for $P_i \in P$ and $\beta_j$ for $E_j$. The equations of motion are

$$\dot x^i_k = u^i_k,\quad (u^i_1)^2 + (u^i_2)^2 \le \alpha_i^2,\quad i = 1, \ldots, m;\qquad \dot y^j_k = v^j_k,\quad (v^j_1)^2 + (v^j_2)^2 \le \beta_j^2,\quad j = 1, \ldots, n;\qquad k = 1, 2. \tag{5.6.1}$$
Here $\alpha_i = \text{const}$, $i = 1, \ldots, m$, $\beta_j = \text{const}$, $j = 1, \ldots, n$, and $\min_i \alpha_i > \max_j \beta_j$. Let $\{x^1(t), \ldots, x^m(t), y^1(t), \ldots, y^n(t)\}$ be a solution of the system of differential equations (5.6.1) with the initial conditions $x^i(0) = \xi^i$, $y^j(0) = \eta^j$, and let

$$t_{S_{E_j}} = \inf\{t : y^j(t) \notin S\}, \qquad j = 1, \ldots, n.$$

Then $y^j(t) \in S$ for $t < t_{S_{E_j}}$, $j = 1, \ldots, n$. Take any $\xi = (\xi^1, \ldots, \xi^m)$ with $\xi^i \in S$, $i = 1, \ldots, m$, and any $\eta = (\eta^1, \ldots, \eta^n)$ with $\eta^j \in S$, $j = 1, \ldots, n$. For them we define the differential game of pursuit, denoted by $\Gamma(m, n; \xi, \eta)$. As noted before, $\Gamma(m, n; \xi, \eta)$ is a game with $n + 1$ players: the pursuer team $P = \{P_1, \ldots, P_m\}$ and the evaders $E_j$, $j = 1, \ldots, n$. The strategy set of the pursuer team $P = \{P_1, \ldots, P_m\}$ is the Cartesian product $P = \prod_{i=1}^m P^i$ of the sets $P^i = \{u^i(\cdot)\}$. The strategy sets of the evaders $E_1, \ldots, E_n$ are the sets $E^j = \{v^j(\cdot)\}$, $j = 1, \ldots, n$.
Let $u(\cdot) \in P$ and $v^j(\cdot) \in E^j$, $j = 1, \ldots, n$. Each situation $(u(\cdot), v^1(\cdot), \ldots, v^n(\cdot))$ uniquely determines $m + n$ trajectories, called the trajectories of the pursuers $P_1, \ldots, P_m$ and of the evaders $E_1, \ldots, E_n$. Let

$$t^i_j = \min\{t : \rho(x^i(t), y^j(t)) = \ell\}, \qquad i = 1, \ldots, m,\ j = 1, \ldots, n.$$

In any situation the matrix $\{t^i_j\}$ is uniquely defined, though some of its elements may evidently be equal to infinity. The payoff functions are given as follows:

$$K_{E_j}(\xi, \eta^j; u(\cdot), v^j(\cdot)) = \begin{cases} +1, & \text{if } t^i_j > t_{S_{E_j}},\ i = 1, \ldots, m,\ t_{S_{E_j}} < \infty,\\ -1, & \text{if there is } i \text{ such that } t^i_j < t_{S_{E_j}},\\ 0, & \text{if } t_{S_{E_j}} = \infty \text{ and } t^i_j = \infty \text{ for all } i, \end{cases}$$

$$K_P(\xi, \eta; u(\cdot), v^1(\cdot), \ldots, v^n(\cdot)) = -\sum_{j=1}^n K_{E_j}(\xi, \eta^j; u(\cdot), v^j(\cdot)).$$
Once the strategy spaces of the players P and $E_j$, $j = 1, \ldots, n$, are constructed and the payoff functions are determined in each situation, we obtain a family of games in normal form $\Gamma(m, n; \xi, \eta)$. Here $\xi = (\xi^1, \ldots, \xi^m)$, $\eta = (\eta^1, \ldots, \eta^n)$, $\xi^i \in S$, $i = 1, \ldots, m$, and $\eta^j \in S$, $j = 1, \ldots, n$. The game $\Gamma(m, n; \xi, \eta)$ generalizes the "life-line" games discussed in §§1–5, Ch. 4, to the case where several pursuers and evaders take part in the pursuit–evasion process.
Consider the game $\Gamma(m, 1; \xi, \eta)$, in which one evader is pursued by a team of pursuers. Player P is allowed to leave the set S before the capture of Player E.

Definition 3. The strategy $u^\Pi(\cdot) \in P = \prod_{i=1}^m P^i$ is called the Π-strategy if

$$u^\Pi(\cdot) = (u^{1\Pi}(\cdot), \ldots, u^{m\Pi}(\cdot)),$$

where $u^{i\Pi}(\cdot)$, $i = 1, \ldots, m$, is the Π-strategy in the game $\Gamma(\xi^i, \eta)$ considered on the strategy sets $P^i$ and E. For the definition of the Π-strategy see §2 of this chapter. We now prove some auxiliary statements.
Lemma 5. For any $\varepsilon > 0$ there is $\delta > 0$ with the following property. In the discrimination game $\Gamma_\delta(x, y)$ (one pursuer P and one evader E), in any situation $(u^\Pi_\delta(\cdot), v_\delta(\cdot))$, the points $y(t)$ with $0 \le t \le t_P$ belong to the set $D_\varepsilon(x, y, \ell)$.

Proof: Let $y(\bar t)$ be such a point. Consider the strategy $\bar v(\cdot)$ that prescribes Player E to move from the point y to the point $y(\bar t)$ along the trajectory $y(t)$ and to remain at $y(\bar t)$ from the instant $\bar t$ on. For any $\varepsilon > 0$ we can choose $\delta > 0$ so that in any situation $(u^\Pi_\delta(\cdot), v(\cdot))$ the point of $\ell$-capture belongs to $D_\varepsilon(x, y, \ell)$. In particular, this holds for the situation $(u^\Pi_\delta(\cdot), \bar v(\cdot))$. However, the "capture point" in the situation $(u^\Pi_\delta(\cdot), \bar v(\cdot))$ is exactly the point $y(\bar t)$, so $y(\bar t) \in D_\varepsilon(x, y, \ell)$. This completes the proof of the lemma. ■
We now consider the game $\Gamma(m, 1; x, y)$. Denote by $\Gamma_\delta(m, 1; x, y)$ the game corresponding to $\Gamma(m, 1; x, y)$ in which Player E is discriminated for the time $\delta > 0$. Let $x^1, \ldots, x^m$ be the initial positions of the players $P_1, \ldots, P_m$, respectively, and write $x = (x^1, \ldots, x^m)$. Introduce the notation

$$D(x, y, \ell) = \bigcap_{i=1}^m D(x^i, y, \ell).$$
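Each $D(x^i, y, \ell)$ is convex and contains y. So along any ray from y, the intersection $D(x, y, \ell)$ ends at the nearest of the individual ovals: its radial function is the minimum of the individual ones. The sketch below uses hypothetical names and assumes every pursuer is faster than E and starts more than $\ell$ away. It computes this radial function and the lowest point of the intersection. The lowest point is the quantity that matters when S is a half-plane $\{z : z_2 \ge c\}$, an illustrative choice of S for the condition of Theorem 9.

```python
import math

def capture_time(xp, ye, e, alpha, beta, ell):
    """Positive root t of |ye + beta*t*e - xp| = alpha*t + ell (alpha > beta, |ye - xp| > ell)."""
    dx, dy = ye[0] - xp[0], ye[1] - xp[1]
    a = beta ** 2 - alpha ** 2
    b = 2.0 * (beta * (dx * e[0] + dy * e[1]) - alpha * ell)
    c = dx * dx + dy * dy - ell * ell
    return (-b - math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def team_radius(pursuers, alphas, ye, phi, beta, ell):
    """Distance from ye to the boundary of D(x, y, ell), the intersection of the
    D(x^i, y, ell), along the ray of angle phi: the minimum of the individual radii."""
    e = (math.cos(phi), math.sin(phi))
    return min(beta * capture_time(xp, ye, e, a, beta, ell)
               for xp, a in zip(pursuers, alphas))

def lowest_point_of_team_region(pursuers, alphas, ye, beta, ell, n=720):
    """Smallest second coordinate over n sampled boundary points of D(x, y, ell)."""
    best = math.inf
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        best = min(best, ye[1] + team_radius(pursuers, alphas, ye, phi, beta, ell) * math.sin(phi))
    return best
```

For example, let pursuers start at $(\pm 2, 0)$ with $\alpha_i = 1$, E at the origin with $\beta = 0.5$, and $\ell = 0.1$. The lowest point of $D(x, y, \ell)$ is then about $-1.09$. So a half-plane $\{z_2 \ge -2\}$ contains the intersection with room to spare, while $\{z_2 \ge -1\}$ does not.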
Theorem 7. For any $\varepsilon > 0$ there is $\delta > 0$ such that in the game $\Gamma_\delta(m, 1; x, y)$

$$C_{u^\Pi_\delta(\cdot)}(x, y) \subset \bigcap_{i=1}^m D_\varepsilon(x^i, y, \ell).$$
P r o o f : Let y{t) be a trajectory in some s i t u a t i o n (u^(-),v (-)). L e t (},,.. .,f)5 be the instants, when the players P ...,P capture P l a y e r E, ordered so that tp > $ > ... > tp 6
u
m
Multiplayer
"life
games
line"
Consider the trajectory y(t)
165 on the interval [0,t ]. P
y(t) e
B y L e m m a 5,
D,(x\y,£).
Similarly, for t G [0,fj>] y(t)€ Evidently, y(t)
D^y.l)
for t G [ 0 , i £ ] belongs to all D (x\y,£), c
i =],...,m,
i.e. for
<e[M?) j ( i ) e A ( s , y , ^ n 1=1
f
l
Hence, i n p a r t i c u l a r , we have t h a t the point y(f$) in the s i t u a t i o n (u (-),v {-)), belongs to D (x y,£). arbitrary, then n
s
t
t
^ s - 4 which is the " c a p t u r e p o i n t " Since the strategy v (-) is c
Note that the game $\Gamma(m, 1; x, y)$ is zero-sum. The payoff of the team P equals $+1$ as soon as one of the players $P_i$ of the team comes within the distance $\ell$ ($\ell > 0$) of Player E. As a corollary of the previous theorem, we have the following theorem.

Theorem 8. In the game $\Gamma_\delta(m, 1; x, y)$ the Π-strategy is optimal for the team of pursuers $P = \{P_1, \ldots, P_m\}$ in the following sense. If $D(x, y, \ell) \not\subset S$, then Player E has a strategy $\bar v(\cdot)$ under which the $\ell$-capture in S is impossible for any strategy $u_\delta(\cdot)$.
Indeed, since $D(x, y, \ell) \not\subset S$, there is a point $\bar y \in D(x, y, \ell)$ with $\bar y \notin S$. Let $\bar v(\cdot)$ be the strategy that prescribes Player E to move with maximum speed in a straight line from the point y to the point $\bar y$. Then the $\ell$-capture of Player E on the interval $[y, \bar y)$ is impossible under any strategy $u_\delta(\cdot)$.

Proof: Suppose the opposite. Let $u_\delta(\cdot)$ be a strategy under which the "capture point" in the situation $(u_\delta(\cdot), \bar v(\cdot))$ belongs to the interval $[y, \bar y)$, and let the capture be performed by Player $P_i$. Against any straight-line motion of Player E, the Π-strategy gives the minimum capture time. Hence, in the situation $(u^\Pi_\delta(\cdot), \bar v(\cdot))$, Player $P_i$ using the Π-strategy can also perform the $\ell$-capture on the interval $[y, \bar y)$. This means that some points of the interval $[y, \bar y)$ belong to the boundary of the set $D(x^i, y, \ell)$. This is impossible, however. The interval $[y, \bar y)$ lies in the interior of the intersection of the convex sets $D(x, y, \ell) = \bigcap_{i=1}^m D(x^i, y, \ell)$, whose boundary contains no interval. This completes the proof of the theorem. ■

As in §5, the discrimination requirement can be dropped. Here we only formulate the theorem without repeating the earlier reasoning.
Theorem 9. Suppose that in the game $\Gamma(m, 1; x, y)$ there is $\ell > 0$ such that the set $D(x, y, \ell)$ is contained in S. Then the team $P = \{P_1, \dots, P_m\}$ can ensure the $\ell$-capture of Player E (the capture of Player E by one of the players $P_i$) in the set S regardless of the latter's actions.
Assume that in the game $\Gamma(m, 1; x, y)$ the set S does not contain the set $D(x, y, \ell)$. Then Player E can ensure the avoidance of the $\ell$-capture by the team P (the avoidance of the $\ell$-capture by all players $P_i$, regardless of the actions of the team P). In the first case the value of the game is $+1$, and in the second $-1$. As in Section 5, we can define the winning set for the team P, the winning set for Player E, and the barrier.

For the game $\Gamma(m, n; \xi, \eta)$ we did not manage to find optimal strategies for the players or to prove the existence of a Nash equilibrium in pure strategies [2]. However, by employing integer programming methods, we succeed in estimating the greatest guaranteed payoff $\mathrm{Val}\,\Gamma(m, n; \xi, \eta)$ of the team of pursuers. The value $\mathrm{Val}\,\Gamma(m, n; \xi, \eta)$ is estimated as follows. Assume that the set of pursuers (the team) $P = \{P_1, \dots, P_m\}$ is divided into $n$ groups $M_1, \dots, M_n$ ($m \ge n$) so that, until the game terminates, each of these groups pursues only one of the players $E_1, \dots, E_n$ under the Π-strategy.
For each fixed partitioning of the set of pursuers into the groups $M_1, \dots, M_n$, the maximum payoff $f(M_1, \dots, M_n)$ of the team P is obtained as the solution of the following assignment problem [4]:
$$\max \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_{ij}\,\xi_{ij} \tag{5.6.2}$$
subject to $\sum_{i=1}^{n} \xi_{ij} \le 1$, $j = 1, \dots, n$; $\sum_{j=1}^{n} \xi_{ij} = 1$, $i = 1, \dots, n$; $\xi_{ij} \in \{0, 1\}$. Here $\alpha_{ij}$ is the payoff of the $i$-th group $M_i$ when Player $E_j$ is pursued in accordance with the Π-strategy; it can be found with the help of Theorem 9. Evidently,
$$\mathrm{Val}\,\Gamma(m, n; \xi, \eta) \ge \max_{\{M_1, \dots, M_n\}} f(M_1, \dots, M_n), \tag{5.6.3}$$
where the maximum is taken over all possible partitions $\{M_1, \dots, M_n\}$ of the team $P = \{P_1, \dots, P_m\}$ into $n$ pursuing groups.
Note that since $n \ge \mathrm{Val}\,\Gamma(m, n; \xi, \eta)$ always, as soon as the value $n$ is obtained on the right-hand side of inequality (5.6.3), the inequality becomes an equality. In that case the optimal strategy of the pursuer team is to divide, at the initial instant of the game, into the groups $M_1, \dots, M_n$ that give the payoff $\max f(M_1, \dots, M_n)$, after which each group $M_i$ pursues, under the Π-strategy, the player $E_j$ for which $\xi_{ij} = 1$ in the solution of problem (5.6.2).
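For a small team, the lower bound (5.6.3) can be evaluated by plain enumeration. The sketch below swaps brute force in for the integer-programming method the text refers to: it lists every split of the pursuers into $n$ non-empty groups and, for each split, every one-to-one assignment of groups to evaders, which is exactly the feasible set of (5.6.2) when there are as many groups as evaders. The payoff function `alpha(group, j)` is a hypothetical input standing for $\alpha_{ij}$ (in the text it comes from Theorem 9), and the toy `covers` table is invented for illustration.

```python
from itertools import permutations, product
from typing import Callable, FrozenSet, Iterator, Optional, Tuple

Group = FrozenSet[int]
# Hypothetical payoff: alpha(group, j) = payoff when this group pursues evader E_j.
Alpha = Callable[[Group, int], float]


def splits(m: int, n: int) -> Iterator[Tuple[Group, ...]]:
    """Yield every division of pursuers 0..m-1 into n non-empty labelled groups."""
    for labels in product(range(n), repeat=m):
        if len(set(labels)) == n:  # every group M_i must be non-empty
            yield tuple(frozenset(p for p in range(m) if labels[p] == g)
                        for g in range(n))


def f_value(groups: Tuple[Group, ...], alpha: Alpha) -> Tuple[float, Tuple[int, ...]]:
    """Solve (5.6.2) for a fixed split: group i pursues evader perm[i]."""
    n = len(groups)
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for perm in permutations(range(n)):
        total = sum(alpha(groups[i], perm[i]) for i in range(n))
        if best is None or total > best[0]:
            best = (total, perm)
    assert best is not None
    return best


def lower_bound(m: int, n: int, alpha: Alpha):
    """Right-hand side of (5.6.3): the best f(M_1, ..., M_n) over all splits."""
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    best = None
    for groups in splits(m, n):
        value, perm = f_value(groups, alpha)
        if best is None or value > best[0]:
            best = (value, groups, perm)
    return best


if __name__ == "__main__":
    # Toy data (invented): pursuer p alone can capture the evaders in covers[p].
    covers = {0: {0}, 1: {1}, 2: {0, 1}}
    alpha = lambda g, j: 1 if any(j in covers[p] for p in g) else -1
    value, groups, perm = lower_bound(3, 2, alpha)
    print(value, groups, perm)  # value is 2 == n, so (5.6.3) holds with equality
```

The enumeration grows like $n^m \cdot n!$, so it only suits small illustrative instances.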
Remark 1. Let S be a closed convex set. Consider the game with the "life line" in the complement of the set S. Here the set $C_{u_\Pi(\cdot)}(x, y)$ is the same as in Theorem 7, so all the reasoning holds if in the situation $(u_\Pi(\cdot), v(\cdot))$ the trajectory of Player P does not intersect the boundary of the set S before the game terminates. Otherwise the Π-strategy is inadmissible, since it does not satisfy the conditions (*) (see Section 2 of this chapter). This, however, does not happen if the set $M(x, y)$, the convex hull of $C_{u_\Pi(\cdot)}(x, y)$ and the point $y$, does not intersect the set S; that is, the following theorem holds.
Theorem 10. Assume that the set $M(x, y)$ is contained in the complement of the set S. Then the optimal strategy of Player P is the Π-strategy, and the $\ell$-capture of Player E is always possible in the complement of the set S.

Remark 2. Consider the game $\Gamma(x, y)$ from initial positions $x, y \in S$ for which $C_{u_\Pi(\cdot)}(x, y) \subset S$. By Theorem 5, the $\ell$-capture of Player E is always possible in S. In this case the following statement of the problem makes sense: Player E seeks to minimize the distance between himself and the boundary of the set S at the "capture" instant (he seeks to be captured nearer to his "own bank", the boundary of the set S). Then the equilibrium is of the form $(u_\Pi(\cdot), v^*(\cdot))$, where $v^*(\cdot)$ is the strategy of Player E prescribing that he move along the ray joining the point $y$ with the point of the Cartesian oval D nearest to the boundary of the set S.
Chapter 6
Differential games with incomplete information

6.1 Pursuit games with delayed information for Player P
Games of pursuit with incomplete information are an immediate generalization of games of pursuit with complete information. The simplest case is the one in which Player P acquires information on the phase state of Player E with a delay $\ell > 0$, while Player E has complete information. For such games with prescribed duration a satisfactory theory has been developed and the structure of optimal strategies has been determined. This section describes pursuit games with prescribed duration and delayed information for Player P; they are continuous counterparts of Example 3 from Ch. 1 (see [25], [26], [29]).

Let a number $\ell > 0$ be given; it is called the information delay. For $0 \le t \le \ell$, at each time instant $t$ Player P knows his state $x(t)$, the time $t$ and Player E's state $y_0$ at the initial instant. For $\ell \le t \le T$, at each time instant $t$ Player P knows his state $x(t)$, the time $t$ and Player E's state $y(t - \ell)$ at the instant $t - \ell$. Player E knows his state $y(t)$, the opponent's state $x(t)$ and the time $t$. His payoff is equal to $\rho(x(T), y(T))$. The game is zero-sum. Denote it by $\Gamma(x_0, y_0, T)$.

Pure piecewise open-loop strategies. By a pure piecewise open-loop strategy $v(\cdot)$ of Player E is meant a pair $\{\tau, b\}$, where $\tau$ is a partitioning of the time interval $[0, T]$ by a finite number of points $0 = t_1 < t_2 < \dots < t_k = T$, and $b$ is a mapping which places each state $x(t_k), y(t_k), t_k$ in correspondence with a measurable open-loop control $v(t)$ of Player E defined for $t \in [t_k, t_{k+1})$. By a pure piecewise open-loop strategy $u(\cdot)$ of Player P is meant a pair $\{\sigma, a\}$, where $\sigma$ is an arbitrary partitioning of the time interval $[0, T]$ by a finite number of points $0 = t'_1 < t'_2 < \dots < t'_k = T$, and $a$ is a mapping which places each state $x(t'_k), y(t'_k - \ell), t'_k$ for $t'_k \ge \ell$, and $x(t'_k), y_0, t'_k$ for $t'_k < \ell$, in correspondence with a measurable control $u(t)$ of Player P for $t \in [t'_k, t'_{k+1})$. The sets of all pure piecewise open-loop strategies of the players P and E are denoted by P and E, respectively.

The game develops according to the system of controlled differential equations
$$\dot x = f(x, u), \qquad \dot y = g(y, v), \tag{6.1.1}$$
assuming that all the conditions ensuring the existence and uniqueness of a solution of the system (6.1.1) on the time interval $[0, T]$ for any pair of measurable open-loop controls $u(t)$, $v(t)$ are satisfied. This ensures the existence of a unique solution of (6.1.1) from the given initial conditions $x_0, y_0$ when the players P and E adopt piecewise open-loop strategies $u(\cdot) \in P$, $v(\cdot) \in E$.
Thus, in any situation $(u(\cdot), v(\cdot))$, the payoff function is uniquely defined:
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \rho(x(T), y(T)), \tag{6.1.2}$$
where $x(t), y(t)$ is the solution of (6.1.1) with the initial conditions $x_0, y_0$ in the situation $(u(\cdot), v(\cdot))$, and $\rho$ is the Euclidean distance. It is well known, and can be illustrated by simple examples, that in general
$$\sup_{v(\cdot) \in E}\ \inf_{u(\cdot) \in P} K(x_0, y_0; u(\cdot), v(\cdot)) \ne \inf_{u(\cdot) \in P}\ \sup_{v(\cdot) \in E} K(x_0, y_0; u(\cdot), v(\cdot)), \tag{6.1.3}$$
since the game $\Gamma(x_0, y_0, T)$ under study is not a game with complete information. From (6.1.3) it follows that, in general, an $\varepsilon$-equilibrium does not exist for every $\varepsilon > 0$. Therefore, to obtain an equilibrium, we follow the approach proposed by J. von Neumann and O. Morgenstern for finite positional games with incomplete information [2], [8]. We extend the strategy spaces of the players P and E to the so-called mixed piecewise open-loop behavior strategies (MPOLBS), which allow a random choice of controls at each step. We will show that in this class of strategies the two sides of (6.1.3) coincide.
Mixed piecewise open-loop behavior strategies. By an MPOLBS of Player P is meant a pair $\mu(\cdot) = \{\tau, a\}$, where $\tau$ is an arbitrary partitioning of the time interval $[0, T]$ by a finite number of points $0 = t_1 < t_2 < \dots < t_k = T$, and $a$ is a mapping which places each information state of Player P at the instant $t_k$ in correspondence with a probability distribution $\mu_k(\cdot)$ concentrated on a finite number of measurable open-loop controls $u(t)$, $t \in [t_k, t_{k+1})$. An MPOLBS $\nu(\cdot)$ of Player E is defined similarly with respect to Player E's information states.
The MPOLBS sets of the players P and E are denoted by $\bar P$ and $\bar E$, respectively (compare these with the "behavior strategies" from [13]). Each pair of MPOLBS $\mu(\cdot), \nu(\cdot)$ with fixed initial conditions $x_0, y_0$ induces a probability distribution on the space of trajectories $x(t), y(t)$, $x(0) = x_0$, $y(0) = y_0$; therefore the payoff $M(x_0, y_0; \mu(\cdot), \nu(\cdot))$ in MPOLBS is defined as the expectation of the payoff $K(x_0, y_0; u(\cdot), v(\cdot))$ with respect to the distributions on the trajectory spaces induced by $\mu(\cdot), \nu(\cdot)$. Having defined the strategy spaces $\bar P$, $\bar E$ and the payoff $M$, we obtain the mixed extension $\bar\Gamma(x_0, y_0, T)$ of the game $\Gamma(x_0, y_0, T)$ with the initial position $x_0, y_0, T$.

Introduce the following auxiliary definition. Let $C_E^T(y)$ be the reachability set of Player E, and denote by $\overline{C}_E^T(y)$ the convex hull of the set $C_E^T(y)$. Set
$$\gamma(y, T) = \min_{\eta' \in \overline{C}_E^T(y)}\ \max_{\eta'' \in C_E^T(y)} \rho(\eta', \eta''), \tag{6.1.4}$$
and let the min max be achieved at the points $(\bar y, \hat y)$, so that
$$\min_{\eta' \in \overline{C}_E^T(y)}\ \max_{\eta'' \in C_E^T(y)} \rho(\eta', \eta'') = \rho(\bar y, \hat y). \tag{6.1.5}$$
It follows from the definition of $\bar y$ that it is the center of the minimal sphere containing the set $C_E^T(y)$; hence the point $\bar y$ is unique. At the same time, the set and the minimal sphere containing it have at least two points of tangency, and these are exactly the points at which the maximum in (6.1.5) is attained. Let $y(t)$, $y(0) = y_0$, $0 \le t \le T$, be a trajectory of Player E. When Player E moves along $y(t)$, the quantity $\gamma(y(t), T - t)$ varies. Let $\bar y(t)$ be the trajectory of the point $\bar y$ from (6.1.5) corresponding to the trajectory $y(t)$. In what follows, we deal with the case where $\bar y(T) \in C_P^T(x_0)$ for all trajectories $y(t)$. The point $M$ is called the center of pursuit if
$$\max_{y' \in C_E^{T-\ell}(y_0)} \gamma(y', \ell) \tag{6.1.6}$$
is achieved at this point. Thus,
$$\gamma(M, \ell) = \max_{y' \in C_E^{T-\ell}(y_0)}\ \min_{\eta' \in \overline{C}_E^{\ell}(y')}\ \max_{\eta'' \in C_E^{\ell}(y')} \rho(\eta', \eta'').$$
Consider an auxiliary simultaneous zero-sum game of pursuit over the convex hull of the set $C_E^T(y)$. Player P chooses a point $\eta' \in \overline{C}_E^T(y)$, and Player E a point $\eta'' \in C_E^T(y)$. The choices are made simultaneously: when choosing $\eta'$, Player P does not know the choice of $\eta''$ by Player E, and conversely. Player E has the payoff $\rho(\eta', \eta'')$. Denote the value of this game by $V(y, T)$ to emphasize its dependence on the parameters $y, T$, which determine the strategy sets $\overline{C}_E^T(y)$ and $C_E^T(y)$ of the players P and E.
The game in normal form is written as follows:
$$\Gamma(y, T) = \langle \overline{C}_E^T(y),\ C_E^T(y),\ \rho(y', y'') \rangle.$$
The strategy set $\overline{C}_E^T(y)$ of Player P (the minimizer) is convex, being the convex hull of the set $C_E^T(y)$. The function $\rho(y', y'')$ is also convex in its arguments and continuous. For such games we may employ the following theorem (see Theorem 5, Ch. 1).

Theorem 1. In the game $\Gamma(y, T)$ there is an equilibrium in mixed strategies. The optimal strategy of Player P is pure, and that of Player E prescribes positive probability to not more than $n + 1$ points of the set $C_E^T(y)$, $C_E^T(y) \subset \mathbb{R}^n$. The value of the game is equal to
$$V(y, T) = \min_{\eta' \in \overline{C}_E^T(y)}\ \max_{\eta'' \in C_E^T(y)} \rho(\eta', \eta'') = \gamma(y, T).$$
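For a finite planar sample of points standing in for $C_E^T(y)$, the min-max in Theorem 1 is the radius of the smallest enclosing circle, and the minimizing choice of Player P is its center $\bar y$. The sketch below assumes $n = 2$ and a small sample: it tries every circle determined by two or three sample points (the smallest enclosing circle always has this form) and keeps the smallest one covering the whole sample. The function names are illustrative; for large samples Welzl's randomized algorithm is the usual choice.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[Point, float]


def circle_from_two(a: Point, b: Point) -> Circle:
    """Circle having the segment ab as a diameter."""
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return center, math.dist(a, b) / 2


def circle_from_three(a: Point, b: Point, c: Point) -> Optional[Circle]:
    """Circumcircle of the triangle abc, or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)


def gamma(points: Sequence[Point], tol: float = 1e-9) -> Tuple[Point, float, List[Point]]:
    """Return (center y_bar, value = radius R, tangency points) for the sample."""
    if not points:
        raise ValueError("empty sample")
    if len(points) == 1:
        return points[0], 0.0, [points[0]]
    candidates = [circle_from_two(a, b) for a, b in itertools.combinations(points, 2)]
    for trio in itertools.combinations(points, 3):
        circ = circle_from_three(*trio)
        if circ is not None:
            candidates.append(circ)
    # Smallest candidate circle that contains every sample point.
    center, radius = min(
        (c for c in candidates if all(math.dist(c[0], p) <= c[1] + tol for p in points)),
        key=lambda c: c[1],
    )
    touching = [p for p in points if abs(math.dist(center, p) - radius) <= tol]
    return center, radius, touching
```

For twelve equally spaced points on a circle of radius 1 centered at $(2, -1)$, `gamma` returns that center and the radius 1; the returned tangency points are the candidates for the support of Player E's mixed strategy.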
The optimal strategy of Player P in the simultaneous game $\Gamma(y, T)$ is the choice of the center $\bar y$ of the minimal sphere containing the set $C_E^T(y)$. Indeed, let $R$ be the radius of this sphere; its center lies in $\overline{C}_E^T(y)$, so the min max in Theorem 1 equals $R$. Then
$$R \ge \max_{\eta'' \in C_E^T(y)} \rho(\bar y, \eta'') \ge \min_{\eta' \in \overline{C}_E^T(y)}\ \max_{\eta'' \in C_E^T(y)} \rho(\eta', \eta'') = V(y, T) = R,$$
whence
$$\rho(\bar y, \eta'') \le V(y, T) \quad \text{for all } \eta'' \in C_E^T(y),$$
which proves the optimality of the strategy $\bar y$ for Player P. We conclude that the optimal strategy of Player P in the game $\Gamma(y, T)$ is the choice of the (unique) point $\bar y$, and the optimal strategy of Player E prescribes positive probabilities to not more than $n + 1$ points of tangency of the set $C_E^T(y)$ and the minimal sphere containing it. The value of the game is equal to the radius of this sphere (see Example 8, Ch. 1).

Consider the simultaneous game $\Gamma(M, \ell)$, where $M$ is the center of pursuit defined in (6.1.6). Denote by $y_1(M), \dots, y_{n+1}(M)$ the points of the set $C_E^{\ell}(M)$ that enter the spectrum of the mixed optimal strategy of Player E in the game $\Gamma(M, \ell)$, and by $\bar y(M)$ the center of the minimal sphere containing the set $C_E^{\ell}(M)$, i.e. the optimal strategy of Player P in the game $\Gamma(M, \ell)$.

Definition 1. A trajectory $y_k^*(t)$ is called conditionally optimal if $y_k^*(0) = y_0$, $y_k^*(T - \ell) = M$, $y_k^*(T) = y_k(M)$ for some particular $k$, $1 \le k \le n + 1$. For each $k$ there may be several conditionally optimal trajectories of Player E.
Theorem 2. Suppose $T > \ell$ and, for any $\varepsilon > 0$, Player P can ensure by the time instant $T$ the $\varepsilon$-capture of the center $\bar y(t)$ of the minimal sphere containing the set $C_E^{\ell}(y(t - \ell))$. Then the game $\Gamma(x_0, y_0, T)$ has an $\varepsilon$-equilibrium in mixed piecewise open-loop behavior strategies. The $\varepsilon$-optimal strategy of Player P is pure and coincides with any strategy ensuring the $\varepsilon/2$-capture of the point $\bar y(t)$. The optimal strategy of Player E is mixed: during the time $0 \le t \le T - \ell$ he must move to the point $M$ along any conditionally optimal trajectory $y_k^*(t)$ and then, with probabilities $p_1, \dots, p_{n+1}$ (the optimal strategy of Player E in the game $\Gamma(M, \ell)$), choose one of the conditionally optimal trajectories leading to one of the points $y_k(M)$ from the spectrum of the optimal strategy of Player E in the game $\Gamma(M, \ell)$. The value of the game is equal to $\gamma(M, \ell)$.
Proof: Denote by $u_\varepsilon^*(\cdot)$ one of the strategies of Player P whose existence is assumed in the theorem, and by $\nu^*(\cdot)$ the MPOLBS of Player E whose optimality will be proved below. We shall prove that
$$M(x_0, y_0; \mu(\cdot), \nu^*(\cdot)) + \varepsilon \ge M(x_0, y_0; u_\varepsilon^*(\cdot), \nu^*(\cdot)) \ge M(x_0, y_0; u_\varepsilon^*(\cdot), \nu(\cdot)) - \varepsilon \tag{6.1.7}$$
for all $\mu(\cdot) \in \bar P$, $\nu(\cdot) \in \bar E$. It is well known (see Theorem 3, Ch. 1) that to prove (6.1.7) it suffices to prove it only for pure piecewise open-loop strategies of the players P and E, i.e. to prove that
$$M(x_0, y_0; u(\cdot), \nu^*(\cdot)) + \varepsilon \ge M(x_0, y_0; u_\varepsilon^*(\cdot), \nu^*(\cdot)) \ge M(x_0, y_0; u_\varepsilon^*(\cdot), v(\cdot)) - \varepsilon \tag{6.1.8}$$
for all $u(\cdot) \in P$, $v(\cdot) \in E$. Denote by $x^*(t)$ Player P's trajectory in the situation $(u_\varepsilon^*(\cdot), \nu^*(\cdot))$. Then
$$M(x_0, y_0; u_\varepsilon^*(\cdot), \nu^*(\cdot)) = \sum_{k=1}^{n+1} p_k\, \rho(x^*(T), y_k(M)). \tag{6.1.9}$$
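Let $R$ be the radius of the minimal sphere containing the set $C_E^{\ell}(M)$, i.e. $R = \gamma(M, \ell)$. Since $\rho(x^*(T), \bar y(M)) \le \varepsilon/2$ and every point $y_k(M)$ lies on this sphere, for all $k = 1, \dots, n + 1$
$$R - \frac{\varepsilon}{2} \le \rho(x^*(T), y_k(M)) \le R + \frac{\varepsilon}{2}.$$
From (6.1.9) we obtain (recall that $\sum_{k=1}^{n+1} p_k = 1$)
$$R - \frac{\varepsilon}{2} \le M(x_0, y_0; u_\varepsilon^*(\cdot), \nu^*(\cdot)) \le R + \frac{\varepsilon}{2}. \tag{6.1.10}$$
Now let $x^*(T)$, $y(T - \ell)$ denote the corresponding states in the situation $(u_\varepsilon^*(\cdot), v(\cdot))$, and let $Q(\cdot)$ be the probability measure induced on the set $C_E^{\ell}(y(T - \ell))$. From the optimality of the mixed strategy $p = (p_1, \dots, p_{n+1})$ in the game $\Gamma(M, \ell)$ and the definition of the center of pursuit we have
$$R = \mathrm{Val}\,\Gamma(M, \ell) \ge \gamma(y(T - \ell), \ell) \ge \int_{C_E^{\ell}(y(T - \ell))} \rho(\bar y[y(T - \ell)], y)\, dQ, \tag{6.1.11}$$
where $\bar y[y(T - \ell)]$ is the center of the minimal sphere containing the set $C_E^{\ell}(y(T - \ell))$. But $\rho(x^*(T), \bar y[y(T - \ell)]) \le \varepsilon/2$; therefore
$$\int_{C_E^{\ell}(y(T - \ell))} \rho(x^*(T), y)\, dQ \le \frac{\varepsilon}{2} + \int_{C_E^{\ell}(y(T - \ell))} \rho(\bar y[y(T - \ell)], y)\, dQ \le R + \frac{\varepsilon}{2}. \tag{6.1.12}$$
From (6.1.10)-(6.1.12) we find
$$M(x_0, y_0; u_\varepsilon^*(\cdot), \nu^*(\cdot)) \ge \int_{C_E^{\ell}(y(T - \ell))} \rho(x^*(T), y)\, dQ - \varepsilon; \tag{6.1.13}$$
however,
$$\int_{C_E^{\ell}(y(T - \ell))} \rho(x^*(T), y)\, dQ = M(x_0, y_0; u_\varepsilon^*(\cdot), v(\cdot)). \tag{6.1.14}$$
From (6.1.13) and (6.1.14) we obtain the right-hand inequality in (6.1.8). The left-hand inequality in (6.1.8) follows from the definition of the strategy $u_\varepsilon^*(\cdot)$. ■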
Remark. For $T \le \ell$ the solution of the game does not differ substantially from the case $T > \ell$, and the theorem holds if instead of $C_E^{\ell}(M)$, $\gamma(M, \ell)$, $y(T - \ell)$ we take $C_E^T(y_0)$, $\gamma(y_0, T)$, $y_0$, respectively.

The diameter $D[C_E^{\ell}(M)]$ of the set $C_E^{\ell}(M)$ tends to zero as $\ell \to 0$, which in turn makes the value of the auxiliary game $\Gamma(M, \ell)$ tend to zero. On the other hand, the value of the auxiliary simultaneous game $\Gamma(M, \ell)$ over the sets $\overline{C}_E^{\ell}(M)$, $C_E^{\ell}(M)$ equals the value of the pursuit game with information delay $\ell$, denoted by $\mathrm{Val}\,\Gamma(x_0, y_0, T) = V_\ell(x_0, y_0, T)$ (the index $\ell$ stands for the information delay). Hence
$$\lim_{\ell \to 0} V_\ell(x_0, y_0, T) = 0.$$
Since $\lim_{\ell \to 0} D[C_E^{\ell}(M)] = 0$, the mixed optimal strategy of Player E in the game $\Gamma(M, \ell)$, which concentrates its mass at no more than $n + 1$ points of the set $C_E^{\ell}(M)$, concentrates (in the limit) its mass at the single point $M$, i.e. becomes a pure strategy. This, in turn, implies the existence of an optimal pure strategy of Player E in the game $\Gamma(x_0, y_0, T)$ as $\ell \to 0$.

Example 1. The motion equations are of the form:
for Player P: $\dot x = u$, $|u| \le \alpha$, $x \in \mathbb{R}^2$;
for Player E: $\dot y = v$, $|v| \le \beta$, $y \in \mathbb{R}^2$ ($\alpha > \beta$).
Let $x_0, y_0$ be the initial state of the game, and let the time $T$ satisfy the condition
$$T \ge \frac{\rho(x_0, y_0)}{\alpha - \beta} + \ell$$
(here $\rho$ is the Euclidean distance between the points $x_0, y_0$). The reachability set $\overline{C}_E^{\ell}(y) = C_E^{\ell}(y)$ is the disk with center $y$ and radius $\beta\ell$. The value of the game $\Gamma(y, \ell)$ is equal to the radius of the disk $C_E^{\ell}(y)$, i.e. $\mathrm{Val}\,\Gamma(y, \ell) = \beta\ell$.
Clearly $\mathrm{Val}\,\Gamma(y, \ell)$ is independent of $y$; hence any point of the set $C_E^{T-\ell}(y_0)$ may serve as the point $M$. The optimal strategy of Player P in the game $\Gamma(y, \ell)$ is the choice of the point $y$, and the optimal strategy of Player E is mixed: he chooses either of two diametrically opposite points of the boundary circle of $C_E^{\ell}(y)$ with probabilities $(\frac12, \frac12)$ (see Example 7, Ch. 1).
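The claim $\mathrm{Val}\,\Gamma(y, \ell) = \beta\ell$ can be checked numerically. Against Player E's $(\frac12, \frac12)$ mixture on an antipodal pair $p, q$, any choice $x$ of Player P costs at least $\beta\ell$ on average, because $\rho(x, p) + \rho(x, q) \ge \rho(p, q) = 2\beta\ell$; the pure choice $x = y$, in turn, limits every point of the disk to distance $\beta\ell$. A small sketch; the function names and the sampling of the boundary circle are illustrative choices, not taken from the text.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def antipodal_pair(y: Point, beta: float, ell: float, theta: float = 0.0) -> Tuple[Point, Point]:
    """Two diametrically opposite boundary points of the disk C_E^ell(y), radius beta*ell."""
    r = beta * ell
    dx, dy = r * math.cos(theta), r * math.sin(theta)
    return (y[0] + dx, y[1] + dy), (y[0] - dx, y[1] - dy)


def expected_distance(x: Point, y: Point, beta: float, ell: float, theta: float = 0.0) -> float:
    """Payoff to E of the (1/2, 1/2) mixture on the antipodal pair when P picks x."""
    p, q = antipodal_pair(y, beta, ell, theta)
    return 0.5 * math.dist(x, p) + 0.5 * math.dist(x, q)


def worst_case(x: Point, y: Point, beta: float, ell: float, samples: int = 720) -> float:
    """Largest distance from x to the (sampled) boundary circle: what P concedes at most."""
    r = beta * ell
    return max(
        math.dist(x, (y[0] + r * math.cos(2 * math.pi * k / samples),
                      y[1] + r * math.sin(2 * math.pi * k / samples)))
        for k in range(samples)
    )
```

With $y = (1, -2)$, $\beta = 0.5$, $\ell = 2$, both `expected_distance(y, y, ...)` and `worst_case(y, y, ...)` equal $\beta\ell = 1$, and `expected_distance` never drops below 1 for other choices of $x$.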
Accordingly, the pure $\varepsilon$-optimal strategy of Player P in the game $\Gamma(x_0, y_0, T)$ is to move with maximal velocity toward the point $y(t - \ell)$ for $\ell \le t \le T$ (toward the point $y_0$ for $0 \le t \le \ell$) until this point is reached, and afterwards to remain in the $\varepsilon$-neighbourhood of this point until the instant $T$. The optimal strategy of Player E, an MPOLBS, is exactly the passage from the point $y_0$ to an arbitrary point $M \in C_E^{T-\ell}(y_0)$ during the time $T - \ell$, followed by the choice of one of two directions toward two diametrically opposite points of the boundary circle of $C_E^{\ell}(M)$ with probabilities $(\frac12, \frac12)$. Moreover, $\mathrm{Val}\,\Gamma(x_0, y_0, T) = \beta\ell$.
Evidently,
$$\lim_{\ell \to 0} V_\ell(x_0, y_0, T) = \lim_{\ell \to 0} \beta\ell = 0, \tag{6.1.15}$$
and in the limit game with complete information Player P ensures the $\varepsilon$-capture of Player E for any $\varepsilon > 0$. At the same time, for $\ell > 0$ Player P cannot ensure the $\varepsilon$-capture of his opponent for arbitrarily small $\varepsilon$.
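The effect of the delay can be seen in a crude discrete-time simulation of Example 1. The sketch below assumes one particular pure strategy for Player E (straight-line motion at full speed in a fixed direction) and lets Player P run at speed $\alpha$ toward the last position he is allowed to know, $y(t - \ell)$ (or $y_0$ while $t < \ell$), sitting on it once reached. The time step `dt`, the function name and the parameter values are illustrative assumptions. Once P has caught the delayed image of E, the terminal distance settles at about $\beta\ell$, matching $\mathrm{Val}\,\Gamma(x_0, y_0, T) = \beta\ell$; the simulation does not exhibit E's optimal mixed strategy.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def simulate_delayed_pursuit(x0: Point, y0: Point, alpha: float, beta: float,
                             ell: float, T: float, heading: Point,
                             dt: float = 1e-3) -> Tuple[Point, Point]:
    """Euler simulation of Example 1 with information delay ell.

    E moves along the unit vector `heading` at speed beta (an assumed pure
    strategy); P moves at speed alpha toward y(t - ell), or y0 while t < ell.
    Returns (x(T), y(T)).
    """
    steps, lag = round(T / dt), round(ell / dt)
    history = [y0]                        # history[k] = y(k * dt)
    x = x0
    for k in range(steps):
        target = history[max(k - lag, 0)]  # the only information P has at t = k * dt
        gap = math.dist(x, target)
        step = alpha * dt
        if gap <= step:
            x = target                     # sit on the delayed image of E
        else:
            x = (x[0] + step * (target[0] - x[0]) / gap,
                 x[1] + step * (target[1] - x[1]) / gap)
        y = history[-1]
        history.append((y[0] + beta * dt * heading[0], y[1] + beta * dt * heading[1]))
    return x, history[-1]


if __name__ == "__main__":
    xT, yT = simulate_delayed_pursuit((0.0, 0.0), (1.0, 0.0), alpha=2.0, beta=1.0,
                                      ell=0.5, T=5.0, heading=(0.0, 1.0))
    print(math.dist(xT, yT))  # about beta * ell = 0.5, off by at most about beta * dt
```

Setting `ell=0.0` in the same call brings the terminal distance down to one time step's worth of motion, in line with (6.1.15).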
Example 2. The motion equations are of the form:
for Player P
$$\dot x_1 = u_1, \quad \dot x_2 = u_2, \quad u_1^2 + u_2^2 \le \alpha^2, \tag{6.1.16}$$
for Player E
$$\dot y_1 = y_1 + v\,y_2, \quad \dot y_2 = -1, \quad |v| \le 1. \tag{6.1.17}$$
The reachability set of Player P is the same as in the game from Example 1. We now find the reachability set of Player E. We have
$$y_2(t) = y_2(0) - t. \tag{6.1.18}$$
Set $v = +1$ and substitute (6.1.18) into the first equation of (6.1.17). Then
$$y_1(t) = t + 1 - y_2(0) + [y_1(0) + y_2(0) - 1]e^{t}.$$
Setting $v = -1$, we find
$$y_1(t) = -t - 1 + y_2(0) + [y_1(0) - y_2(0) + 1]e^{t}.$$
The reachability set $C_E^t(y_1(0), y_2(0))$ is the segment of the line $y_2 = y_2(0) - t$ between the points with abscissas
$$A = t + 1 - y_2(0) + (y_1(0) + y_2(0) - 1)e^{t}, \qquad B = -t - 1 + y_2(0) + (y_1(0) - y_2(0) + 1)e^{t}.$$
The radius of the minimal sphere containing $C_E^t(y_1(0), y_2(0))$ is equal to half the length of the interval $[A, B]$:
$$R[C_E^t(y_1(0), y_2(0))] = |1 - y_2(0) + t - (1 - y_2(0))e^{t}|.$$
Thus, the value of the auxiliary simultaneous game $\Gamma(y_1(0), y_2(0); \ell)$ is equal to
$$\mathrm{Val}\,\Gamma(y_1(0), y_2(0); \ell) = |1 - y_2(0) + \ell - (1 - y_2(0))e^{\ell}|.$$
To find the value of the pursuit game, we have to determine
$$\max_{(y_1, y_2) \in C_E^{T-\ell}(y_1(0), y_2(0))} \mathrm{Val}\,\Gamma(y_1, y_2; \ell) = \max_{(y_1, y_2) \in C_E^{T-\ell}(y_1(0), y_2(0))} |1 - y_2 + \ell - (1 - y_2)e^{\ell}|. \tag{6.1.19}$$
The reachability set $C_E^{T-\ell}(y_1(0), y_2(0))$, however, consists only of points of the form $(y_1, y_2(0) - T + \ell)$; hence the maximum in (6.1.19) is equal to
$$|1 + T - y_2(0) - (1 + T - \ell - y_2(0))e^{\ell}|.$$
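The formulas of Example 2 can be checked numerically. The sketch below integrates (6.1.17) with the constant controls $v = \pm 1$ by the classical fourth-order Runge-Kutta scheme, compares the end points with $A$ and $B$, and confirms that the closed-form maximum of (6.1.19) equals the auxiliary value taken at $y_2 = y_2(0) - T + \ell$. One caveat about the sketch itself, not a claim of the text: the constant controls $v = \pm 1$ give the extreme points only while $y_2(s)$ keeps one sign on $[0, t]$, so the test data keep $y_2 < 0$ throughout. Function names and step counts are illustrative.

```python
import math
from typing import Tuple


def endpoints(y1_0: float, y2_0: float, t: float) -> Tuple[float, float]:
    """A and B: the y1-coordinates reached with v = +1 and v = -1 (Example 2)."""
    a = t + 1 - y2_0 + (y1_0 + y2_0 - 1) * math.exp(t)
    b = -t - 1 + y2_0 + (y1_0 - y2_0 + 1) * math.exp(t)
    return a, b


def aux_value(y2: float, ell: float) -> float:
    """Val Gamma(y1, y2; ell) = |1 - y2 + ell - (1 - y2) e^ell|, half of |A - B|."""
    return abs(1 - y2 + ell - (1 - y2) * math.exp(ell))


def pursuit_value(y2_0: float, T: float, ell: float) -> float:
    """The maximum in (6.1.19): |1 + T - y2(0) - (1 + T - ell - y2(0)) e^ell|."""
    return abs(1 + T - y2_0 - (1 + T - ell - y2_0) * math.exp(ell))


def rk4(y1_0: float, y2_0: float, v: float, t: float, steps: int = 2000) -> Tuple[float, float]:
    """Integrate y1' = y1 + v*y2, y2' = -1 on [0, t] with a constant control v."""
    def rhs(y1: float, y2: float) -> Tuple[float, float]:
        return y1 + v * y2, -1.0

    h = t / steps
    y1, y2 = y1_0, y2_0
    for _ in range(steps):
        k1 = rhs(y1, y2)
        k2 = rhs(y1 + h / 2 * k1[0], y2 + h / 2 * k1[1])
        k3 = rhs(y1 + h / 2 * k2[0], y2 + h / 2 * k2[1])
        k4 = rhs(y1 + h * k3[0], y2 + h * k3[1])
        y1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y1, y2
```

For example, with $y(0) = (0.3, -0.7)$ and $t = 1.2$, `rk4(0.3, -0.7, 1.0, 1.2)[0]` and `rk4(0.3, -0.7, -1.0, 1.2)[0]` agree with `endpoints(0.3, -0.7, 1.2)` to well below $10^{-8}$.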
The optimal strategy of Player P is the pursuit of the center of the reachability set $C_E^{\ell}(y_1, y_2)$ (the midpoint of the interval $[A, B]$). Let us assume that by the time $T$ Player P can ensure the capture of that center.

The optimal strategy of Player E is arbitrary for $t \in [0, T - \ell)$. At the instant $T - \ell$, with probabilities $\frac12, \frac12$, Player E chooses one of the two possible values of the control, $v = +1$ or $v = -1$, and on the remaining time interval $(T - \ell, T]$ he uses the control $v(t)$ identically equal to the value chosen by the random mechanism at the instant $T - \ell$.
6.2 Game with delayed information. Case of m pursuers and one evader
This section deals with the zero-sum two-person game of pursuit between the pursuer team $P = \{P_1, \dots, P_m\}$ and the evader E, the duration $T$ being prescribed. The motion equations are of the form:
$$\text{for the Players } P_i:\ \dot x^{(i)} = f^{(i)}(x^{(i)}, u^{(i)}),\ i = 1, \dots, m; \qquad \text{for Player E:}\ \dot y = g(y, v). \tag{6.2.1}$$
Here $x^{(i)} \in \mathbb{R}^n$, $y \in \mathbb{R}^n$, $u^{(i)} \in U \subset \mathbb{R}^k$, $v \in V \subset \mathbb{R}^l$. Let a number $\ell > 0$ be given; it is called the information delay. For $0 \le t \le \ell$, at each instant $t$ Player P knows his state $x(t)$, the time $t$ and Player E's state $y_0$ at the initial instant. For $\ell \le t \le T$, at each instant $t$ Player P knows his state $x(t)$, the time $t$ and Player E's state $y(t - \ell)$ at the instant $t - \ell$. At each instant $t$ Player E knows $x(t)$, $y(t)$, $t$. His payoff is equal to $\frac{1}{m}\sum_{i=1}^{m} \rho^2(x^{(i)}(T), y(T))$. Denote this game by $\Gamma(x_0^{(1)}, \dots, x_0^{(m)}, y_0, T)$.
Pure piecewise open-loop strategies (POLS) of the players $P = \{P_1, \dots, P_m\}$ and E have the form defined in Section 6.1 of this chapter. The game proceeds in accordance with the motion equations (6.2.1), assuming that all the conditions ensuring the existence and uniqueness of a solution of the system (6.2.1) on the interval $[0, T]$ for any pair of measurable open-loop controls $u(t)$, $v(t)$ are satisfied. This ensures the existence and uniqueness of a solution of the system (6.2.1) from the initial conditions $x_0, y_0$ when the players P and E adopt piecewise open-loop strategies $u(\cdot) \in P$, $v(\cdot) \in E$. Thus, in any situation $(u(\cdot), v(\cdot))$, $u(\cdot) = \{u^{(i)}(\cdot)\}$, with given initial conditions $x_0 = \{x_0^{(1)}, \dots, x_0^{(m)}\}$, $y_0$, the payoff function is uniquely defined:
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \frac{1}{m}\sum_{i=1}^{m} \rho^2(x^{(i)}(T), y(T)), \tag{6.2.2}$$
where $x(t) = \{x^{(i)}(t)\}$, $y(t)$ is the solution of the system (6.2.1) with the initial conditions $x_0, y_0$ in the situation $(u(\cdot), v(\cdot))$, and $\rho$ is the Euclidean distance.
In general, as in the game from Section 6.1 of this chapter,
$$\sup_{v(\cdot) \in E}\ \inf_{u(\cdot) \in P} K(x_0, y_0; u(\cdot), v(\cdot)) \ne \inf_{u(\cdot) \in P}\ \sup_{v(\cdot) \in E} K(x_0, y_0; u(\cdot), v(\cdot)), \tag{6.2.3}$$
since $\Gamma(x_0^{(1)}, \dots, x_0^{(m)}, y_0, T)$ is not a game with perfect information. From (6.2.3) it follows that an $\varepsilon$-equilibrium in this game does not exist for every $\varepsilon > 0$. We extend the strategy spaces P and E to the mixed piecewise open-loop behavior strategies, which allow a random choice of controls at each step.
We then show that in this class of strategies the two sides of (6.2.3) coincide. Denote the MPOLBS of the $i$-th member of the team $P = \{P_1, \dots, P_m\}$ by $\mu^{(i)}(\cdot)$. Then the MPOLBS of the team P is of the form $\mu(\cdot) = \{\mu^{(i)}(\cdot)\}$, $i = 1, \dots, m$.
The MPOLBS sets of the players P and E are denoted by $\bar P$ and $\bar E$, respectively. With the initial conditions $x_0, y_0$ fixed, each MPOLBS pair induces a probability distribution on the space of trajectories $x^{(i)}(t)$, $i = 1, \dots, m$, $y(t)$; hence the payoff $M(x_0, y_0; \{\mu^{(i)}(\cdot)\}, \nu(\cdot))$ in MPOLBS is the expectation of the payoff $K(x_0, y_0; \{u^{(i)}(\cdot)\}, v(\cdot))$ with respect to the distribution on the trajectory spaces induced by the MPOLBS $\{\mu^{(i)}(\cdot)\}$, $\nu(\cdot)$.
After the strategy spaces $\bar P$, $\bar E$ and the payoff $M$ are defined, for each fixed set of initial conditions $x_0, y_0, T$ we construct the mixed extension $\bar\Gamma(x_0^{(1)}, \dots, x_0^{(m)}, y_0, T)$ of the game $\Gamma(x_0^{(1)}, \dots, x_0^{(m)}, y_0, T)$.
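Introduce the following auxiliary quantity. Let $C_E^T(y)$ be the reachability set of Player E, and denote the convex hull of the set $C_E^T(y)$ by $\overline{C}_E^T(y)$. Let
$$\gamma(y, T) = \min_{\eta^{(1)}, \dots, \eta^{(m)} \in \overline{C}_E^T(y)}\ \max_{\eta \in C_E^T(y)} \frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta). \tag{6.2.4}$$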
Consider a simultaneous auxiliary zero-sum game of pursuit on the convex hull of the set $C_E^T(y)$. Player $P = \{P_1, \dots, P_m\}$ chooses $m$ points $\eta^{(i)} \in \overline{C}_E^T(y)$, and Player E a point $\eta \in C_E^T(y)$. The choices are made simultaneously: when choosing the points $\eta^{(i)}$, Player P has no knowledge of the choice made by Player E, and vice versa. Player P pays Player E the amount $\frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta)$. Denote the value of the game by $V(y, T)$ (the game is two-person zero-sum) to emphasize its dependence on the parameters $y, T$, which determine the strategy sets $\{\overline{C}_E^T(y)\}^m$, $C_E^T(y)$ of the players P and E. The game in normal form is written as follows:
$$\Gamma(y, T) = \Big\langle \{\overline{C}_E^T(y)\}^m,\ C_E^T(y),\ \frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta) \Big\rangle.$$
Here $\{\overline{C}_E^T(y)\}^m$ is the $m$-fold Cartesian product of the set $\overline{C}_E^T(y)$.
m
T h e strategy set {Cl(y)}m of P l a y e r P (minimizer) is convex as the C a r t e sian product o f convex hulls of the sets C (y). T h e function 4- Ti'^ P^ifl tV) is also convex i n its arguments and is continuous. For such games we can apply T h e o r e m 5, C h . 1. In this case, it is stated as follows. 1
E
Theorem 3. In the game $\Gamma(y, T)$ there is an equilibrium in mixed strategies. The optimal strategy of Player P is pure, and the optimal strategy of Player E is mixed and prescribes positive probabilities to not more than $n + 1$ points of the set $C_E^T(y)$ ($C_E^T(y) \subset \mathbb{R}^n$). The value of the game is equal to
$$V(y, T) = \min_{\eta^{(1)}, \dots, \eta^{(m)} \in \overline{C}_E^T(y)}\ \max_{\eta \in C_E^T(y)} \frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta) = \gamma(y, T).$$
The solution of the game $\Gamma(y, T)$ is given in the second part of Example 8, Ch. 1. It turns out that the optimal strategy of Player P in the game $\Gamma(y, T)$ is the choice of $m$ identical points $\bar y$, the center of the minimal sphere S containing the set $C_E^T(y)$ (the point $\bar y$ is unique), and the optimal strategy of Player E prescribes positive probabilities to not more than $n + 1$ points at which the minimal sphere containing the set $C_E^T(y)$ touches this set. The value of the game is equal to the square of the radius of this sphere.
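Why $m$ identical points at the center are optimal for the team can be seen from an identity: if Player E's mixed strategy has mean $\bar y$ (for instance, two antipodal tangency points with probabilities $\frac12, \frac12$), then $\mathbb{E}\,\rho^2(x, \eta) = \rho^2(x, \bar y) + R^2$, so every pursuer placed away from $\bar y$ only raises the payoff above $R^2$. The sketch below evaluates the expected payoff $\frac1m\sum_i \rho^2(\eta^{(i)}, \eta)$ under such a mixture; the specific points and probabilities are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def team_payoff(team: Sequence[Point], spectrum: Sequence[Point],
                probs: Sequence[float]) -> float:
    """E[(1/m) * sum_i rho^2(eta_i, eta)] when E draws eta from `spectrum` with `probs`."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    m = len(team)
    return sum(p * sum(math.dist(x, y) ** 2 for x in team) / m
               for y, p in zip(spectrum, probs))


if __name__ == "__main__":
    center = (1.0, 2.0)                    # center of a sphere of radius R = 0.5 (assumed)
    spectrum = [(0.5, 2.0), (1.5, 2.0)]    # antipodal tangency points (assumed)
    probs = [0.5, 0.5]
    print(team_payoff([center] * 3, spectrum, probs))          # R**2 = 0.25
    print(team_payoff([center, (1.0, 3.0)], spectrum, probs))  # 0.25 + (0 + 1) / 2 = 0.75
```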
The point $M$ is called the center of pursuit if
$$\max_{y' \in C_E^{T-\ell}(y_0)} \gamma(y', \ell) \tag{6.2.5}$$
is achieved at it. Thus
$$\gamma(M, \ell) = \max_{y' \in C_E^{T-\ell}(y_0)}\ \min_{\eta^{(1)}, \dots, \eta^{(m)} \in \overline{C}_E^{\ell}(y')}\ \max_{\eta \in C_E^{\ell}(y')} \frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta). \tag{6.2.6}$$
Consider the simultaneous game $\Gamma(M, \ell)$, where $M$ is the center of pursuit defined in (6.2.5). Denote by $y_1(M), \dots, y_{n+1}(M)$ the points of the set $C_E^{\ell}(M)$ entering the spectrum of the mixed optimal strategy of Player E in the game $\Gamma(M, \ell)$, and by $\bar y(M)$ the center of the minimal sphere containing the set $C_E^{\ell}(M)$, i.e. the optimal strategy of Player P in the game $\Gamma(M, \ell)$.
Definition 2. A trajectory $y_k^*(t)$ is called conditionally optimal if $y_k^*(0) = y_0$, $y_k^*(T - \ell) = M$, $y_k^*(T) = y_k(M)$ for some particular $k$, $1 \le k \le n + 1$. For each $k$ there may be several conditionally optimal trajectories of Player E.

Denote by $y^{\ell}(t)$ the trace of the trajectory $y(t)$ that becomes known to Player P by the instant $t$. By the definition of the information state in the game, $y^{\ell}(t) = y_0$ for $0 \le t \le \ell$ and $y^{\ell}(t) = y(t - \ell)$ for $\ell \le t \le T$.
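Consider the quantity
$$\gamma(y^{\ell}(t), \ell) = \min_{\eta^{(1)}, \dots, \eta^{(m)} \in \overline{C}_E^{\ell}(y^{\ell}(t))}\ \max_{\eta \in C_E^{\ell}(y^{\ell}(t))} \frac{1}{m}\sum_{i=1}^{m} \rho^2(\eta^{(i)}, \eta)$$
for $0 \le t \le T$. Let the min max be achieved at the points $\bar y^{(i)}(t)$, $i = 1, \dots, m$, $0 \le t \le T$.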
Each open-loop control $v(t)$ of Player E corresponds to a unique trajectory $y(t)$; the trajectory $y^{\ell}(t)$, in turn, corresponds to unique trajectories $\bar y^{(i)}(t)$ (evidently, $\bar y^{(i)}(t) = \bar y(t)$, $i = 1, \dots, m$).
Let us introduce a fictitious player $E'$ capable of moving along the trajectory $\bar y(t)$.

Theorem 4. Let $T > \ell$, and suppose that for any $1 > \varepsilon > 0$ each member of the team $P = \{P_1, \dots, P_m\}$ can ensure the $\varepsilon_1$-capture of Player $E'$ by the instant $T$, where $R$ is the radius of the minimal sphere containing the set $C_E^{\ell}(M)$ and $\varepsilon_1$ is determined from the conditions
$$(R + \varepsilon_1)^2 - R^2 < \varepsilon, \qquad R^2 - (R - \varepsilon_1)^2 < \varepsilon.$$
Then the game $\Gamma(x_0^{(1)}, \dots, x_0^{(m)}, y_0, T)$ has an $\varepsilon$-equilibrium in mixed piecewise open-loop behavior strategies. The $\varepsilon$-optimal strategy of the team $P = \{P_1, \dots, P_m\}$ is pure and coincides with any strategy $u^*(\cdot) = \{u^{(1)*}(\cdot), \dots, u^{(m)*}(\cdot)\}$ ensuring the $\varepsilon_1$-capture of Player $E'$ by the instant $T$. The optimal strategy of Player E is mixed: during the time $0 \le t \le T - \ell$ he must move to the point $M$ along any conditionally optimal trajectory $y_k^*(t)$, and then choose, with probabilities $p_1, \dots, p_{n+1}$ (the optimal strategy of Player E in the game $\Gamma(M, \ell)$), one of the conditionally optimal trajectories carrying him to the points $y_k(M)$, $k = 1, \dots, n + 1$, from the spectrum of the mixed optimal strategy of Player E in the game $\Gamma(M, \ell)$. The value of the game is equal to $\gamma(M, \ell)$.
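Theorem 4 only requires $\varepsilon_1$ to satisfy the two inequalities. A convenient explicit choice, which appears to be the one used in (6.2.13), is $\varepsilon_1 = \varepsilon/(1 + 2R)$; for $0 < \varepsilon < 1$ it works because $(R + \varepsilon_1)^2 - R^2 = \varepsilon_1(2R + \varepsilon_1) < \varepsilon_1(2R + 1) = \varepsilon$ and $R^2 - (R - \varepsilon_1)^2 = \varepsilon_1(2R - \varepsilon_1) < \varepsilon$. A small numerical check; the helper names are illustrative.

```python
def eps1_choice(eps: float, R: float) -> float:
    """epsilon_1 = eps / (1 + 2R); assumes 0 < eps < 1 and R >= 0."""
    if not 0.0 < eps < 1.0 or R < 0.0:
        raise ValueError("need 0 < eps < 1 and R >= 0")
    return eps / (1.0 + 2.0 * R)


def meets_theorem4_conditions(eps: float, R: float) -> bool:
    """Check (R + e1)^2 - R^2 < eps and R^2 - (R - e1)^2 < eps for e1 = eps1_choice."""
    e1 = eps1_choice(eps, R)
    return (R + e1) ** 2 - R ** 2 < eps and R ** 2 - (R - e1) ** 2 < eps


if __name__ == "__main__":
    for R in (0.0, 0.5, 3.0, 100.0):
        print(R, [meets_theorem4_conditions(e, R) for e in (0.01, 0.5, 0.99)])
```

The restriction $\varepsilon < 1$ in the theorem is what guarantees $\varepsilon_1 < 1$, so that $\varepsilon_1^2 < \varepsilon_1$ in the first bound.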
Game with information
delayed.
Case of m pursuiers
181
and one evader
Proof: Denote by $u^*(\cdot)$ one of the strategies of Player P whose existence is assumed in the theorem, and by $\nu^*(\cdot)$ the MPOLBS of Player E whose optimality we are going to prove. We shall prove that
$$M(x_0, y_0; \mu(\cdot), \nu^*(\cdot)) + \varepsilon \ge M(x_0, y_0; u^*(\cdot), \nu^*(\cdot)) \ge M(x_0, y_0; u^*(\cdot), \nu(\cdot)) - \varepsilon \tag{6.2.7}$$
for all $\mu(\cdot) = \{\mu^{(i)}(\cdot)\}$ and $\nu(\cdot)$. It is well known (see Theorem 3, Ch. 1) that to prove (6.2.7) it suffices to prove it only for pure POLS of the players P and E, i.e. to prove that
$$M(x_0, y_0; u(\cdot), \nu^*(\cdot)) + \varepsilon \ge M(x_0, y_0; u^*(\cdot), \nu^*(\cdot)) \ge M(x_0, y_0; u^*(\cdot), v(\cdot)) - \varepsilon \tag{6.2.8}$$
for all $u(\cdot) \in P$, $v(\cdot) \in E$.
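Denote by $x^*(t) = \{x^{*(i)}(t)\}$ the trajectory of the team $P = \{P_1, \dots, P_m\}$ in the situation $(u^*(\cdot), \nu^*(\cdot))$. Then
$$M(x_0, y_0; u^*(\cdot), \nu^*(\cdot)) = \sum_{k=1}^{n+1} p_k\, \frac{1}{m}\sum_{i=1}^{m} \rho^2(x^{*(i)}(T), y_k(M)). \tag{6.2.9}$$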
Let R be a radius of the m i n i m a l sphere containing the set C {M), E
•y(MJ).
i.e. H
J
=
Then
ft-«i
£
m
' =y ^
( - °) 6
21
for all k = I , . . . , n + 1 and all i = 1 , . . . , m. F r o m (6.2.9) we get {j2%\p
= l).
k
( f t - e , ) < M(* ,*»;<(•),**(•)) < ( * W J
8
(6-2.11)
Now let $Q(\cdot)$ be the probability measure induced by the MPOLBS $\nu(\cdot)$ on the set $C_E(M)$. From the optimality of the mixed strategy $p = (p_1, \ldots, p_{n+1})$ in the game $\Gamma(M, \ell)$ we have

$$R^2 = \sum_{k=1}^{n+1} p_k \, \rho^2(M, y_k) \ge \int_{C_E(M)} \rho^2(M, y)\, dQ. \tag{6.2.12}$$

However,

$$\rho\bigl(x^{(i)}(T), M\bigr) \le \varepsilon_1 = \frac{\varepsilon}{1 + 2R} \tag{6.2.13}$$
for all $i = 1, \ldots, m$. From (6.2.12), (6.2.13) we find

$$\int_{C_E(M)} \frac{1}{m} \sum_{i=1}^{m} \rho^2\bigl(x^{(i)}(T), y\bigr)\, dQ \le (R + \varepsilon_1)^2. \tag{6.2.14}$$

However,

$$\int_{C_E(M)} \frac{1}{m} \sum_{i=1}^{m} \rho^2\bigl(x^{(i)}(T), y\bigr)\, dQ = M(x_0, y_0; u^*(\cdot), \nu(\cdot)). \tag{6.2.15}$$

From (6.2.14), (6.2.15) we obtain the right-hand side of the inequality (6.2.8). The left-hand side of (6.2.8) follows immediately from the definition of the strategy $u^*(\cdot)$. ∎
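Reading (6.2.13) as $\varepsilon_1 = \varepsilon/(1 + 2R)$, both gaps that appear in the estimates, $(R+\varepsilon_1)^2 - R^2 = \varepsilon_1(2R + \varepsilon_1)$ and $R^2 - (R-\varepsilon_1)^2 = \varepsilon_1(2R - \varepsilon_1)$, are below $\varepsilon_1(2R + 1) = \varepsilon$ whenever $0 < \varepsilon < 1$. The short sketch below evaluates these gaps numerically; the helper names and sample values are arbitrary choices for illustration.

```python
def eps1(eps, R):
    """Capture accuracy eps_1 = eps / (1 + 2R), as read from (6.2.13)."""
    return eps / (1.0 + 2.0 * R)


def gaps(eps, R):
    """Return ((R + e1)^2 - R^2, R^2 - (R - e1)^2) for e1 = eps1(eps, R)."""
    e1 = eps1(eps, R)
    upper = (R + e1) ** 2 - R ** 2  # overshoot of the upper bound (R + e1)^2
    lower = R ** 2 - (R - e1) ** 2  # undershoot of the lower bound (R - e1)^2
    return upper, lower


# Both gaps stay below eps for any radius R >= 0 when 0 < eps < 1.
worst = max(max(gaps(e, R)) / e for e in (0.5, 0.1, 0.01) for R in (0.0, 1.0, 10.0))
```

Here `worst` is the largest ratio gap/ε over the sampled values and is smaller than 1.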
6.3 Existence of equilibria in mixed strategies in "princess and monster" game of pursuit
Consider the class of differential games of pursuit providing a generalization of the "princess and monster" game proposed by R. Isaacs as a research problem in [1]. We state the problem correctly and show that games of this type have an equilibrium in mixed strategies. The case of simple motion on the plane is discussed in some detail, although all results may be extended to the general case of arbitrary motion equations with convex vectograms of the Players P and E.

First recall the problem proposed by R. Isaacs. Monster P tries to find Princess E. The time required to accomplish this is the payoff. Both of them are in an absolutely dark room R (of any shape) whose boundaries are known to them. Capture takes place if the distance PE becomes less than a quantity $\ell$ which is small compared with the size of the room R. Monster is assumed to be highly intelligent and makes a simple motion with the known velocity $\alpha$. Princess is free to make any move.

We give a formal description of the game. Let a convex polyhedron S be given on the plane. Denote by $S_0$ the interior of S, by $S_1, \ldots, S_m$ the sides of S (without vertices), and by $S_{m+1}, \ldots, S_{2m}$ the corresponding vertices. At the initial instant of time the "chance" chooses a position $x_0 \in S$ for Player P and a position $y_0 \in S$ for Player E in accordance with the uniform distribution in S. If $x_0$ ($y_0$) belongs to $S_k$, $k = 0, \ldots, n$ ($n = 2m$), as a result of the random move, then Player P (E) knows only that he is in $S_k$, but does not know at what exactly point of this set he is. Next, the Players P and E move in S in accordance with the simple motion

$$\dot x = \alpha u, \qquad \dot y = \beta v, \qquad |u| = 1, \quad |v| = 1,$$

from the initial states $x_0 \in S$, $y_0 \in S$. Assume that at the instant $0 \le t \le T$ the point $x(t) \in S_k$ and the point $y(t) \in S_l$. Then at this instant of time the Players P and E know only that they are respectively in $S_k$ and $S_l$, but they do not know at what exactly points of these sets they are. The duration of the game is fixed and equals $T$. A continuous functional $F(x_0, y_0; x(t), y(t))$ is given on the set of trajectories $x(t)$, $y(t)$, $t \in [0, T]$, and Player E's payoff is equal to F (Player P's payoff is equal to $-F$).
Information sets. According to the conditions of the game the players distinguish only the sets $S_k$, $k = 0, \ldots, n$; being in $S_k$, they do not distinguish positions in this set. Moreover, the players also know the set S. Therefore, when Player P (E) is, say, on the side $S_k$, he knows which side it is and hence on which side of $S_k$ the convex polyhedron S lies. If Player P (E) is at the vertex $S_k$, $k = m+1, \ldots, 2m$, then he knows the position of the polyhedron S and the incident sides $S_{k_1}$, $S_{k_2}$ adjacent to the vertex $S_k$. If, however, $x \in S_0$, then Player P (E) knows only that he is in $S_0$. Therefore the information sets $S^{(k)}$ of Player P (E) are defined as follows:

$$S^{(0)} = S_0, \qquad S^{(k)} = S_k \cup S_0, \ k = 1, \ldots, m, \qquad S^{(k)} = S_k \cup S_{k_1} \cup S_{k_2} \cup S_0, \ k = m+1, \ldots, 2m$$

(Fig. 28). Here $S_{k_1}$, $S_{k_2}$ are the sides of the polyhedron S that are incident to $S_k$. Let us define the admissible controls in each of $S^{(k)}$, $k = 0, \ldots, n$. For $x \in S^{(0)}$ Player P may choose an arbitrary direction of motion (both players have a simple motion) (Fig. 29). For $x \in S^{(k)}$, $k = 1, \ldots, m$, we should remember that, when on the side $S_k$, Player P must choose a direction pointing inward S; since the admissible control sets must coincide for all $x \in S^{(k)}$ (otherwise Player P could distinguish different positions in $S^{(k)}$), only these inward directions are admissible throughout $S^{(k)}$ (Fig. 30).
If $x \in S^{(k)}$, $k = m+1, \ldots, 2m$, then for all $x \in S^{(k)}$ the admissible control set is the same as for $x \in S_k$, i.e. it consists of the velocity directions inside the angle formed by the incident sides $S_{k_1}$, $S_{k_2}$ (Fig. 31). Although in this case the definition of the information sets of Players P and E is similar to the one introduced in finite games (see [7]), it differs from Kuhn's definition [13], which requires that each information set intersect each trajectory (play) only once. In the case under study, the trajectories of the Players pass through the information sets.

Fig. 28.

Pure strategies. As in §1, Ch. 1, by a player's strategy we mean a rule which associates each information state of the player with an action admissible in this information state. This informal description will be used to provide a formal definition of the strategy. Since a control is chosen in accordance with the information, it must be constant on each information set $S^{(k)}$, $k = 0, \ldots, 2m$, because the information at all points of one information set is the same and the player knows only that he is in $S^{(k)}$. Since in each information set the player must choose a control admissible in this set, at all points $x \in S^{(0)}$ Player P can choose an arbitrary direction of motion as a control. At the points $x \in S^{(k)}$, $k = 1, \ldots, m$, he must choose a control in such a way that the point remains in S, which means that he may choose only the controls generating motion directed inward S. The same holds for the choice of controls in $S^{(k)}$, $k = m+1, \ldots, 2m$.
Denote the admissible control set in $S^{(k)}$ by $U_k$ (as mentioned before, the sets $U_k$ are closed in $R^2$). The same statement also holds for Player E, whose admissible control sets are denoted by $V_k$.
Now we can give a formal definition of the strategy.

Definition 3. By the strategy $u(S^{(k)})$ ($v(S^{(k)})$), $k = 0, \ldots, 2m$, is meant a mapping which associates each information set $S^{(k)}$, $k = 0, \ldots, 2m$, with a control $u_k \in U_k$ ($v_k \in V_k$) that is admissible in this information set.
Let a situation $(u(S^{(k)}), v(S^{(k)}))$ be given. The game proceeds as follows. The "chance" chooses the initial points $x_0$, $y_0$ in the set $S_0$. Player P then chooses the control $u(S^{(0)})$ prescribed by the strategy $u(S^{(k)})$, and Player E the control $v(S^{(0)})$ prescribed by the strategy $v(S^{(k)})$. Here the motion complies with the equations $\dot x = \alpha u(S^{(0)})$, $\dot y = \beta v(S^{(0)})$ with the initial conditions $x(0) = x_0$, $y(0) = y_0$.

Fig. 29. Fig. 30. Fig. 31.
Let $t_1$ ($\tau_1$) be the instant of time when the trajectory $x(t)$ ($y(t)$) passes from $S^{(0)}$ to $S^{(k_1)}$ ($S^{(l_1)}$). Then at the instant $t_1$ ($\tau_1$) Player P (E) chooses the control $u(S^{(k_1)})$ ($v(S^{(l_1)})$) prescribed by the strategy $u(S^{(k)})$ ($v(S^{(k)})$) and adheres to it until the game terminates or the state variable $x$ ($y$) leaves the information set $S^{(k_1)}$ ($S^{(l_1)}$). The motion in $S^{(k_1)}$ ($S^{(l_1)}$) complies with the equations $\dot x = \alpha u(S^{(k_1)})$, $\dot y = \beta v(S^{(l_1)})$ with the initial conditions $x(t_1)$, $y(\tau_1)$.
This process continues until the instant $T$. The game terminates at the instant $T$, and Player E receives the payoff

$$K(x_0, y_0; u(S^{(k)}), v(S^{(k)})) = F(x_0, y_0; x(t), y(t)),$$

where $x(t)$, $y(t)$ are the corresponding trajectories of the players from the initial states $x_0$, $y_0$ in the situation $(u(S^{(k)}), v(S^{(k)}))$, and F is a continuous functional given on the trajectories of the Players P and E. Since the initial states $x_0$, $y_0$ are chosen in $S_0$ in a random way, before the start of the game the player may count on receiving the average payoff

$$E(u(S^{(k)}), v(S^{(k)})) = \frac{1}{\mu^2(S_0)} \int_{S_0} \int_{S_0} K(x_0, y_0; u(S^{(k)}), v(S^{(k)}))\, dx_0\, dy_0,$$

where $\mu(S_0)$ is the Lebesgue measure of the set $S_0$.
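The mechanics described above (a constant control on each information set, switched only when the point reaches a new side) can be simulated exactly for a polygon, and the average payoff E can then be estimated by Monte Carlo sampling of $x_0$, $y_0$. The sketch below assumes that S is the unit square with a hypothetical side numbering, that controls are admissible unit vectors, that vertices can be ignored (they are hit with probability zero), and that F depends only on the terminal positions; the book allows a general functional of the trajectories.

```python
import math
import random

# Assumed geometry: S = [0,1]^2 with sides 1: y = 0, 2: x = 1, 3: y = 1, 4: x = 0.
SIDES = {1: (1, 0.0), 2: (0, 1.0), 3: (1, 1.0), 4: (0, 0.0)}  # side -> (axis, level)


def next_side(p, d, current):
    """Path length s and index of the first side (other than `current`)
    reached by the ray p + s*d, d a unit vector."""
    best_s, best_side = math.inf, None
    for side, (axis, level) in SIDES.items():
        if side == current or d[axis] == 0.0:
            continue
        s = (level - p[axis]) / d[axis]
        if 0.0 < s < best_s:
            best_s, best_side = s, side
    return best_s, best_side


def final_position(x0, strategy, speed, T, max_switches=10_000):
    """Terminal point of the trajectory generated by a pure strategy.

    strategy[0] is the unit control used in S^(0); strategy[k], k = 1..4, is the
    inward control adopted on reaching side k and kept until another side is
    reached (S^(k) = S_k united with S_0). Controls must be admissible."""
    p, d, current, t = list(x0), strategy[0], None, 0.0
    for _ in range(max_switches):
        s, side = next_side(p, d, current)
        if side is None or t + s / speed >= T:
            r = (T - t) * speed  # remaining path length before the game ends
            return (p[0] + r * d[0], p[1] + r * d[1])
        t += s / speed
        p = [p[0] + s * d[0], p[1] + s * d[1]]
        axis, level = SIDES[side]
        p[axis] = level  # snap onto the side to avoid round-off drift
        current, d = side, strategy[side]
    # Guard: controls chattering into a vertex could switch infinitely often.
    return tuple(p)


def average_payoff(u, v, alpha, beta, T, F, samples=20_000, seed=1):
    """Monte Carlo estimate of E(u, v) with x0, y0 uniform in the square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x0 = (rng.random(), rng.random())
        y0 = (rng.random(), rng.random())
        total += F(final_position(x0, u, alpha, T),
                   final_position(y0, v, beta, T))
    return total / samples
```

For instance, with `u = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (-1.0, 0.0), 3: (0.0, -1.0), 4: (1.0, 0.0)}`, a player starting at (0.5, 0.5) with unit speed and T = 1 reaches side 2 at t = 0.5 and then moves back, ending at (0.5, 0.5).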
By the payoff function in this game is meant the average value of the payoff K, i.e. the function $E(u(S^{(k)}), v(S^{(k)}))$. We have thus defined the pursuit game with incomplete information in normal form. Let us show the continuity of the functional E as a function of the strategies $u(S^{(k)})$, $v(S^{(k)})$. Introduce the notation $u(S^{(k)}) = u_k$, $v(S^{(k)}) = v_k$, $k = 0, \ldots, n = 2m$. Then each strategy of Player P (E) can be represented as a vector $u = (u_0, \ldots, u_n)$ ($v = (v_0, \ldots, v_n)$), where $u_k \in U_k$, $v_k \in V_k$, and $U_k$ ($V_k$) are bounded closed sets in $R^2$. Therefore the set of all strategies $\{u\}$ ($\{v\}$) is a compact subset of the space $R^{2(n+1)}$. Denote this set by P (E). We will show that the function $E(u(S^{(k)}), v(S^{(k)})) = E(u, v)$ is continuous on the Cartesian product of compact sets $P \times E$.

Consider an arbitrary situation $(u, v)$. Let $D_k^\delta$, $k = m+1, \ldots, 2m$, be the $\delta$-neighbourhoods of the vertices $S_k$ of S. Denote by $S_\delta \times S_\delta$ the rectangular set of initial states $x_0$, $y_0$ for which the trajectories $x(t)$, $y(t)$, $0 \le t \le T$, from the initial states $x_0$, $y_0$ in the situation $(u, v)$ do not intersect $D_k^\delta$. For any $\varepsilon > 0$ it is possible to find $\delta > 0$ such that $|\mu^2(S_\delta) - \mu^2(S_0)| < \varepsilon$ (here $\mu$ is the Lebesgue measure of a set).
Let $(x_0, y_0) \in S_\delta \times S_\delta$, and let $t_1, \ldots, t_r$ ($\tau_1, \ldots, \tau_l$) be the instants at which the trajectory $x(t)$ ($y(t)$) falls on the sides of S. Since $x(t)$ ($y(t)$) does not intersect the sets $D_k^\delta$, there is $\varepsilon_1 > 0$ ($\varepsilon_1 < \varepsilon$) such that the $\varepsilon_1$-neighbourhoods $\gamma_{\varepsilon_1}(x(t_i))$ ($\gamma_{\varepsilon_1}(y(\tau_j))$) of the points $x(t_1), \ldots, x(t_r)$ ($y(\tau_1), \ldots, y(\tau_l)$) contain no vertices of the set S. For any $\varepsilon_1 > 0$ there is $\delta_1 > 0$ such that, as soon as $\rho(u', u) < \delta_1$, $\rho(v', v) < \delta_1$, the points at which the trajectories $x'(t)$, $y'(t)$ from the initial states $x_0$, $y_0$ intersect the sides of the set S in the situation $(u', v')$ belong to $\gamma_{\varepsilon_1}(x(t_i))$ ($\gamma_{\varepsilon_1}(y(\tau_j))$), since minor variations in the strategies cannot change the order in which the information sets are passed (as compared with the situation $(u(\cdot), v(\cdot))$). Since the motions along all trajectories occur at the constant velocities $\alpha$ and $\beta$, the instants of falling on the information sets differ only slightly from the instants $t_i$ ($\tau_j$) in the situation $(u(\cdot), v(\cdot))$.
It follows from the foregoing that for any $\varepsilon > 0$ it is possible to find $\delta > 0$ such that, as soon as $\rho(u', u) < \delta$, $\rho(v', v) < \delta$,

$$\max_{0 \le t \le T} |x'(t) - x(t)| < \varepsilon, \qquad \max_{0 \le t \le T} |y'(t) - y(t)| < \varepsilon.$$
The latter means that the trajectories emanating from $(x_0, y_0) \in S_\delta \times S_\delta$ depend continuously on the strategies at the point $u(\cdot)$, $v(\cdot)$. Hence, since F is continuous, if $u_n(\cdot)$, $v_n(\cdot)$ is a sequence of strategies converging to $u(\cdot)$, $v(\cdot)$, then for each $(x_0, y_0) \in S_\delta \times S_\delta$ and $\varepsilon > 0$ there is $N(x_0, y_0, \varepsilon)$ such that for all $m$ and all $n > N(x_0, y_0, \varepsilon)$

$$|K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot)) - K(x_0, y_0; u_n(\cdot), v_n(\cdot))| < \varepsilon, \tag{6.3.1}$$

i.e. the sequence $K(x_0, y_0; u_n(\cdot), v_n(\cdot))$ is fundamental (converges in itself) at all points of the set $S_\delta \times S_\delta$.
We now show that the sequence of integrals $\int_S K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu$ is also fundamental (converges in itself). For simplicity, denote $S_0 \times S_0 = S$ and $S_\delta \times S_\delta = S_\delta$. Indeed,

$$\begin{aligned}
\mu^2(S_0)\,\bigl|E(u_{n+m}, v_{n+m}) - E(u_n, v_n)\bigr| &= \Bigl|\int_S K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\, d\mu - \int_S K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu\Bigr| \\
&\le \Bigl|\int_{S \setminus S_\delta} K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\, d\mu - \int_{S \setminus S_\delta} K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu\Bigr| \\
&\quad + \Bigl|\int_{S_\delta} K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\, d\mu - \int_{S_\delta} K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu\Bigr|.
\end{aligned}$$
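It follows from (6.3.1) that the sequence $K(x_0, y_0; u_n(\cdot), v_n(\cdot))$ converges at each point $(x_0, y_0) \in S_\delta$; since this sequence is uniformly bounded, the sequence of integrals $\int_{S_\delta} K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu$ also converges (by the dominated convergence theorem). This means that there is $N(\varepsilon)$ such that for $n > N(\varepsilon)$ and all $m$

$$\Bigl|\int_{S_\delta} K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\, d\mu - \int_{S_\delta} K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu\Bigr| < \varepsilon.$$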
By estimating further the first difference ($\mu(S \setminus S_\delta) < \varepsilon$, $|K(x_0, y_0; u(\cdot), v(\cdot))| \le M$), we obtain

$$\Bigl|\int_S K(x_0, y_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\, d\mu - \int_S K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu\Bigr| < \varepsilon + 2\varepsilon M$$

for $n > N(\varepsilon)$ and all $m$. This means that the sequence $\int_S K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu$ is fundamental, i.e. there exists the limit

$$\lim_{n \to \infty} \int_S K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu. \tag{6.3.2}$$
Since the measure $\mu$ is continuous, the integral $\int_{S_\delta} K(x_0, y_0; u(\cdot), v(\cdot))\, d\mu$ is a continuous function of the set. From the fact that for any $\delta > 0$

$$\lim_{n \to \infty} \int_{S_\delta} K(x_0, y_0; u_n(\cdot), v_n(\cdot))\, d\mu = \int_{S_\delta} K(x_0, y_0; u(\cdot), v(\cdot))\, d\mu,$$

it follows that the limit (6.3.2) is equal to $\int_S K(x_0, y_0; u(\cdot), v(\cdot))\, d\mu$, i.e. the function $E(u, v)$ is continuous at the point $u(\cdot)$, $v(\cdot)$. Since the situation $(u(\cdot), v(\cdot))$ is arbitrary, the function $E(u, v)$ is a continuous function of the strategies $u(\cdot)$, $v(\cdot)$.
Theorem 5. The payoff function $E(u, v)$ is continuous on the product of compact sets $P \times E$.

It follows from Theorem 4 that any game in normal form defined on compact sets of strategies with a continuous payoff function has an equilibrium in mixed strategies. In view of Theorem 5, this gives the theorem that is basic for this section.

Theorem 6. The pursuit game with incomplete information defined in §3 of this chapter has an equilibrium in mixed strategies.
A slight modification of the proof allows the generalization of Theorem 6 to the case of arbitrary motions $\dot x = f(x, u)$, $\dot y = g(y, v)$.
Example 3. Consider a special case where the polyhedron S is a square. The motion equations are of the form

$$\text{for Player P:} \quad \dot x_1 = 0, \ \ \dot x_2 = u_2; \qquad \text{for Player E:} \quad \dot y_1 = v_1, \ \ \dot y_2 = 0. \tag{6.3.3}$$

The initial positions of the players P and E coincide. The controls $u_2$, $v_1$ can take only one of two values: $+1$ or $-1$. The duration of the game is fixed and equals $T = 1$. According to the conditions of the game the players cannot leave the square S, and the payoff of Player E depends only on the final positions $x(1)$, $y(1)$ and is defined as follows: $F(x_0, y_0; x(t), y(t)) = F'(x_0, y_0; x(1), y(1))$, which equals $+1$ or $-1$ depending on the position of the final point (see Fig. 32).
Here $0 < \alpha < 1$. Since variations in the variables $x_1$, $y_2$ do not change the payoff function, the game can be reduced to an equivalent but simpler game by introducing the vector $z = (z_1, z_2) = (y_1, x_2)$ and the controls $u = (0, u_2)$, $v = (v_1, 0)$. Indeed, the motion equations (6.3.3) are rewritten in the form

$$\dot z = u + v, \qquad z(0) = z_0, \tag{6.3.4}$$

where $v$ and $u$ take the values $(+1, 0)$, $(-1, 0)$ and $(0, +1)$, $(0, -1)$, respectively. The payoff $F(z(1))$ takes the values $+1$ and $-1$ and depends only on the final point $z(1)$ and on the parameter $\alpha$, $0 < \alpha < 1$.

Fig. 32.

Assume that the initial point $z_0 \in S$ is chosen at random according to the uniform probability distribution on S. Let us examine Fig. 32. If the game terminates at a point $z(1)$ lying above the line segment AB, then Player E's payoff is $+1$, otherwise $-1$. For simplicity, we omit the information sets which are the vertices of the square, the probability of falling on these vertices being zero. Therefore, the dimension of the strategy vector of each player is 5. Since the controls may take no more than two values, and only one value on some of the sides, then, as is readily seen, each player has exactly eight strategies. Player E's strategies are

$v_1(\cdot) = (1, 1, -1, -1, 1)$, $v_2(\cdot) = (-1, 1, -1, -1, 1)$, $v_3(\cdot) = (1, -1, -1, -1, 1)$, $v_4(\cdot) = (-1, -1, -1, -1, 1)$,
$v_5(\cdot) = (1, 1, -1, 1, 1)$, $v_6(\cdot) = (-1, 1, -1, 1, 1)$, $v_7(\cdot) = (1, -1, -1, 1, 1)$, $v_8(\cdot) = (-1, -1, -1, 1, 1)$.

Player P's strategies $u_1(\cdot), \ldots, u_8(\cdot)$ are five-component vectors of the same kind: in each of them the second component equals $+1$ and the fourth equals $-1$ (the two sides on which his control is forced), while the other three components take the values $\pm 1$.
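The count of eight strategies per player can be checked by enumeration: fixing the components on which the control is forced and letting the other three range over $\pm 1$ gives $2^3 = 8$ vectors. In this sketch the fixed positions are read off from Player E's list and from the description of Player P's strategies above; the enumeration reproduces the sets of strategies, not necessarily the book's numbering.

```python
from itertools import product


def strategies(fixed, size=5):
    """All +-1 strategy vectors of length `size`; the components listed in
    `fixed` (0-based index -> forced value) are held fixed."""
    free = [k for k in range(size) if k not in fixed]
    result = []
    for values in product((1, -1), repeat=len(free)):
        vec = [0] * size
        for k, val in fixed.items():
            vec[k] = val
        for k, val in zip(free, values):
            vec[k] = val
        result.append(tuple(vec))
    return result


# Player E: third component -1, fifth component +1.
E_STRATEGIES = strategies({2: -1, 4: 1})
# Player P: second component +1, fourth component -1.
P_STRATEGIES = strategies({1: 1, 3: -1})
```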
Thus, the payoff function may be written as a payoff matrix. The elements $a_{ij}$ of the matrix are computed by the formula

$$a_{ij} = \int_S F(z, z(1))\, dz,$$

where $z(t)$ is the trajectory from the initial state $z \in S$ in the situation $(u_i(\cdot), v_j(\cdot))$.
The most tedious operation is to find $z(t)$ in the situation $(u_i(\cdot), v_j(\cdot))$. Without presenting all the computations, we write the payoff matrices for the cases $\alpha = 0$ and $\alpha = 1/2$. For $\alpha = 0$ the matrix has the form:

[Payoff matrix $(a_{ij})$, $i, j = 1, \ldots, 8$, for $\alpha = 0$]
In this case, as is seen from the payoff matrix, there is an equilibrium in the pure strategies $v_3(\cdot)$, $u_7(\cdot)$, and the value of the game is equal to 0.
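Detecting an equilibrium in pure strategies, as for $\alpha = 0$, amounts to finding a saddle point: an entry that is the largest in its row (E cannot gain by switching columns) and the smallest in its column (P cannot gain by switching rows). The book's matrix is not reproduced here, so the sketch is demonstrated on a small made-up matrix; the orientation (rows for Player P, columns for Player E, entries being E's payoff) is an assumption.

```python
def pure_saddle_points(A):
    """Return all (i, j) such that A[i][j] is the maximum of row i and the
    minimum of column j.

    Rows: Player P's pure strategies (minimizer); columns: Player E's
    (maximizer); A[i][j] is Player E's payoff."""
    col_min = [min(A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    row_max = [max(row) for row in A]
    return [(i, j) for i, row in enumerate(A) for j, a in enumerate(row)
            if a == col_min[j] and a == row_max[i]]
```

For `A = [[3, 1], [2, 0]]` the only saddle point is `(1, 0)`, with value 2; for matching pennies the list is empty, and one has to pass to mixed strategies, as in the case $\alpha = 1/2$.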
For $\alpha = 1/2$ the matrix is of the form:

[Payoff matrix $(a_{ij})$, $i, j = 1, \ldots, 8$, for $\alpha = 1/2$]
For $\alpha = 1/2$ an equilibrium in pure strategies does not exist: the mixed optimal strategy of Player E concentrates its measure on the two pure strategies $v_3(\cdot)$, $v_4(\cdot)$, and the mixed optimal strategy of Player P on the strategies $u_7(\cdot)$, $u_8(\cdot)$. The mixed optimal strategies are of the form:
for Player E
( 0 , 0 , |, ± , 0 , 0 , 0 , 0 ) ,
for Player P
(0,0,0,0,0,0,1,
f).
We may show that for all $0 < \alpha < 1$ the mixed optimal strategies of the players concentrate their measures on the pure strategies $v_3(\cdot)$, $v_4(\cdot)$ of Player E and on the strategies $u_7(\cdot)$, $u_8(\cdot)$ of Player P.
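Since for $0 < \alpha < 1$ both optimal strategies are supported on two pure strategies each, the game effectively reduces to a 2×2 submatrix, which can be solved in closed form by the standard equalizing formulas: each player mixes so that the opponent is indifferent between his two pure strategies. The sketch below applies to any 2×2 game without a saddle point; the matrix entries for Example 3 are not reproduced, and the orientation (rows for P, columns for E, entries being E's payoff) is an assumption.

```python
def solve_2x2(A):
    """Optimal mixed strategies of a 2x2 game without a pure saddle point.

    A[i][j] is Player E's payoff (E maximizes over columns j, P minimizes over
    rows i). Returns (p, q, value): p is the probability of P's first row,
    q the probability of E's first column."""
    (a, b), (c, d) = A
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("the game has a pure saddle point")
    p = (d - c) / denom  # makes E indifferent between the two columns
    q = (d - b) / denom  # makes P indifferent between the two rows
    value = (a * d - b * c) / denom
    return p, q, value
```

For example, `solve_2x2([[3, -1], [-2, 4]])` gives `p = 0.6`, `q = 0.5` and the value 1.0: with `q = 0.5` both rows yield 1 to Player E.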
6.4 Differential games with discrete information partition
Consider a differential game in which the players P and E have no information on their states and know only some sets in which these states reside. Assume that the equation of motion is of the form

$$\dot x = f(x, u, v), \qquad u \in U, \quad v \in V \tag{6.4.1}$$

(f is continuous and satisfies a Lipschitz condition in x uniformly with respect to u, v in a bounded set X), where $x \in R^2$, and U and V are compact sets in $R^2$, with the initial condition $x(0) = x_0$. For simplicity, we restrict ourselves to games of prescribed duration T with $x \in R^2$.
Definition 4. Let $S = \{S_1, \ldots, S_m\}$ be a partition of the plane $R^2$ into a finite number of sets, i.e. $S_i \cap S_j = \emptyset$, $i \ne j$, $\bigcup_{i=1}^m S_i = R^2$, which is the image of a cell subpartition of the same plane under a diffeomorphism. (By a cell subpartition is meant any subpartition of a cell partition.) The point $x \in S_i$ is called a boundary point of the partition if for any $\varepsilon > 0$ there is $k \ne i$ such that the $\varepsilon$-neighbourhood $U(x, \varepsilon)$ of the point x satisfies

$$U(x, \varepsilon) \cap S_k \ne \emptyset. \tag{6.4.2}$$
The point $x \in S_i$ is called a boundary point of the first order if it is a boundary point of the partition and the set $S_k$ in (6.4.2) is unique. The remaining boundary points are called boundary points of the second order. The partition S is called regular if it contains only a finite number of boundary points of the second order and at least one element of the partition is bounded. A connected set of boundary points of the first order is called a regular segment of the boundary. Assume that, under any initial conditions and any pair of constant controls u, v, the solution $x(t)$ of the system (6.4.1) for $t \ge 0$ is not tangent to regular segments of the boundary.

The game proceeds as follows. A regular information partition S is given on $R^2$. A continuous distribution F is given on one of the bounded elements $S_{k_0}$ of this partition. The "chance" chooses the point $x_0 \in S_{k_0}$ in accordance with the distribution F. The point $x_0$ serves as the initial condition for Eq. (6.4.1) and is not known to the Players P and E. The players P and E only know that the point $x_0$ belongs to the partition element $S_{k_0}$, the distribution F also being known to them. The point x moves in $S_{k_0}$ in accordance with the motion equation (6.4.1) under the influence of the controls u and v chosen respectively by the players P and E.
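Definition 4 can be explored numerically: given a labelling function that returns the index of the partition element containing a point, one can count how many other elements are met in a small neighbourhood. The sketch below is a heuristic stand-in for the definition, assuming that sampling a small circle of radius `r` around the point is fine enough to see every adjacent element (thin elements could be missed); the function name and parameters are hypothetical.

```python
import math


def boundary_order(label, x, r=1e-6, n=64):
    """Classify x from sampled neighbours: 0 = not a boundary point,
    1 = boundary point of the first order (exactly one other element nearby),
    2 = boundary point of the second order (several other elements nearby)."""
    seen = {label(x)}
    for k in range(n):
        a = 2 * math.pi * k / n
        seen.add(label((x[0] + r * math.cos(a), x[1] + r * math.sin(a))))
    others = len(seen) - 1
    return 0 if others == 0 else (1 if others == 1 else 2)
```

For the partition of the plane into the four closed-open quadrants, `label = lambda p: (p[0] >= 0, p[1] >= 0)`, the origin is a boundary point of the second order, a point such as (1, 0) on an axis is of the first order, and (1, 1) is not a boundary point.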
Fig. 33.
Assume that the system (6.4.1) has a unique solution, which can be continued to the interval $[0, T]$, for all initial conditions $x_0 \in S_{k_0}$ and for all measurable open-loop controls $u(t)$, $v(t)$ of the players P and E.
Let $t_1$ be the first instant when the point $x(t)$ falls on the boundary of the set $S_{k_0}$. At the point $x(t_1)$, the set of partition elements $\{S_{k_i}\}$, $i = 1, \ldots, r$, for which it is a boundary point, is made known to the players. If the point x moves further along the boundary of the partition S, then new information is furnished when $x(t)$ reaches a point of the boundary with a changed set of neighbouring elements. If, however, x starts moving from the point $x(t_1)$ into one of the elements of the partition S, the next information is furnished at the instant $t_2$ when $x(t_2)$ again falls on the boundary of S. In this case, the set of partition elements for which the point $x(t_2)$ is a boundary point is made known to the players. This process continues until the instant T. Player E gets the payoff $H(x(T))$, where H is a continuous function given on $R^2$; Player P gets the payoff $-H(x(T))$. In what follows the partition S is assumed to be regular. A regular partition S is of the form shown in Fig. 33, where the boundary points of the second order are shown by circles.
As is seen from the definition, a boundary point of the first order lies on the boundary of only two partition elements of S. Hence any regular segment of the boundary is the boundary of two elements of the partition S.
Denote a regular segment of the boundary between the sets $S_i, S_j \in S$ by $S_{ij}$ (if $S_i$ and $S_j$ have no boundary points in common, then $S_{ij} = \emptyset$). Let $\ell(S_{ij})$ be the length of the curve $S_{ij}$ (it may also be infinite). If the sets $S_i$, $S_j$ have a nonempty regular segment of the boundary $S_{ij}$, then between these sets there is a maximal regular segment of the boundary $\bar S_{ij}$, i.e. one that is not contained in a larger regular segment between $S_i$ and $S_j$ (here $\ell(\bar S_{ij})$ may be infinite). If $\ell(\bar S_{ij}) < \infty$, then the set $\bar S_{ij}$ is a closed curve (Fig. 34) or the ends of the segment $\bar S_{ij}$ are isolated boundary points of the second order (since there is only a finite number of boundary points of the second order). If, however, $\ell(\bar S_{ij}) = \infty$, then the segment $\bar S_{ij}$ has not more than one boundary point of the second order.

Fig. 34.

Denote the boundary points of the second order by $\{M_l\}$, $l = 1, \ldots, s$. The informational description of the game suggests that the regular segments of the boundary $S_{ij}$ and the points $M_l$ play a special role in the game. At the points of the set $S_{ij}$, and at the points $M_l$, the player acquires information on the transition of the process to one of the partition elements for which the point is a boundary point. Thus, after realization of a random mechanism with the distribution $F(x)$ on the set $S_{k_0} \in S$ at the initial instant of the game, the players' information states vary as the trajectory $x(t)$ passes through the sets $S_{ij}$ and the points $M_l$. This enables us to give the following definition.
Definition 5. The set $S_{k_0} \in S$, the regular segments of the boundary $S_{ij}$, and the isolated boundary points of the second order $M_l$ are called the information sets of the players P and E. Since the information sets are uniquely defined by the partition S, the latter is called the information partition.

Now we can define the players' strategies. Consider the collection of all information sets

$$(S_{k_0}, S_{12}, \ldots, S_{ij}, \ldots, S_{m,m-1}; M_1, \ldots, M_l, \ldots, M_s),$$

composed of $s + m^2 - m + 1$ elements.
The strategy of Player P is a $2(s + m^2 - m + 1)$-dimensional vector

$$u(\cdot) = (u_{k_0}, u_{12}, \ldots, u_{ij}, \ldots, u_{m,m-1}, u_1, \ldots, u_l, \ldots, u_s), \tag{6.4.3}$$
where $u_{k_0} \in U$ is the choice made by Player P in the information set $S_{k_0}$, $u_{ij} \in U$ is his choice in the information set $S_{ij}$, and $u_l \in U$ the choice in the information set $M_l$. Similarly, the strategy of Player E is a $2(s + m^2 - m + 1)$-dimensional vector

$$v(\cdot) = (v_{k_0}, v_{12}, \ldots, v_{ij}, \ldots, v_{m,m-1}, v_1, \ldots, v_l, \ldots, v_s), \tag{6.4.4}$$

where $v_{k_0} \in V$ is the choice made by Player E in the set $S_{k_0}$, $v_{ij} \in V$ is his choice in the information set $S_{ij}$, and $v_l \in V$ the one in the information set $M_l$. The set of all vectors (6.4.3) with $u_{k_0} \in U$, $u_{ij} \in U$, $u_l \in U$ is the set of pure strategies of Player P and is denoted by P. Similarly, the set of vectors (6.4.4) with $v_{k_0} \in V$, $v_{ij} \in V$, $v_l \in V$ is the set of pure strategies of Player E and is denoted by E.
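The bookkeeping behind (6.4.3) and (6.4.4) can be made explicit by listing the information sets in the order used there: the initial element $S_{k_0}$, the segments $S_{ij}$ indexed by ordered pairs $i \ne j$ (as in the count $s + m^2 - m + 1$ above), and the points $M_l$. The sketch assumes two-dimensional controls, as in the text; the function names and tuple keys are hypothetical.

```python
def information_sets(m, s, k0):
    """Information sets in the order of (6.4.3): S_{k0}, then S_ij for all
    ordered pairs i != j of the m partition elements, then M_1, ..., M_s."""
    sets = [("S", k0)]
    sets += [("S", i, j) for i in range(1, m + 1) for j in range(1, m + 1) if i != j]
    sets += [("M", l) for l in range(1, s + 1)]
    return sets


def strategy_dimension(m, s, control_dim=2):
    """Length of a strategy vector: one control of size control_dim per set."""
    return control_dim * len(information_sets(m, s, 1))
```

For $m = 3$ and $s = 2$ there are $1 + 6 + 2 = 9$ information sets, so a strategy is an 18-dimensional vector, in agreement with $2(s + m^2 - m + 1)$.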
Let $(u(\cdot), v(\cdot))$ be a situation in the game. Consider the formation of $x(t)$ in the situation $(u(\cdot), v(\cdot))$ in more detail. Since the initial state of the game belongs to the information set $S_{k_0} \in S$ as a result of a random move, at the first stage the players' strategies determine the choices of controls $u(S_{k_0}) = u_{k_0}$, $v(S_{k_0}) = v_{k_0}$, which remain constant until the instant of time $t_1$ when the point x, moving in accordance with the system

$$\dot x = f(x, u_{k_0}, v_{k_0}), \qquad x(0) = x_0,$$

falls on the boundary of the partition S. If the point $x(t_1) \in S_{i_1 j_1}$, then its further motion occurs in accordance with the system

$$\dot x = f(x, u_{i_1 j_1}, v_{i_1 j_1}), \qquad x(t)\big|_{t = t_1} = x(t_1), \tag{6.4.5}$$

where $u_{i_1 j_1} = u(S_{i_1 j_1})$ and $v_{i_1 j_1} = v(S_{i_1 j_1})$ are the players' choices in the information set $S_{i_1 j_1}$ determined by the strategies $u(\cdot)$, $v(\cdot)$, respectively. If, however, $x(t_1) = M_l$, then the further motion of the point occurs in accordance with the system

$$\dot x = f(x, u_l, v_l), \qquad x(t)\big|_{t = t_1} = x(t_1), \tag{6.4.6}$$

where $u_l = u(M_l)$, $v_l = v(M_l)$ are the players' choices in the information set $M_l$ determined by the strategies $u(\cdot)$, $v(\cdot)$, respectively. The motion continues until the instant $t_2$ when the point $x(t)$ arrives at the next information set, different from the current one, where the controls are changed in accordance with the strategies $u(\cdot)$, $v(\cdot)$, and so on, until the game terminates at the time instant T. As a result, we obtain the motion trajectory $x(t)$, $0 \le t \le T$.
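The switching rule above can be approximated by an explicit Euler scheme: integrate $\dot x = f(x, u, v)$ with the current pair of controls and, whenever the label of the partition element changes, switch to the controls prescribed for the boundary segment just crossed. This is a minimal sketch under stated assumptions: the step size, the dictionary keys (`("S", k0)` for the initial element, `("S", i, j)` for the segment crossed from element i into element j), and the neglect of second-order points and of motion along the boundary are all choices made for illustration.

```python
def simulate(f, label, strategy, x0, T, dt=1e-3):
    """Euler approximation of the trajectory x(t), 0 <= t <= T.

    f(x, u, v): right-hand side of (6.4.1), returning a tuple.
    label(x): index of the partition element containing x.
    strategy: dict mapping an information set to a pair (u, v) of controls.
    """
    x = tuple(x0)
    k = label(x)
    u, v = strategy[("S", k)]          # controls chosen in S_{k0}
    t = 0.0
    while t < T - 1e-12:
        h = min(dt, T - t)
        dx = f(x, u, v)
        x = tuple(xi + h * di for xi, di in zip(x, dx))
        t += h
        k_new = label(x)
        if k_new != k:                 # the point has crossed the segment S_{k k_new}
            u, v = strategy[("S", k, k_new)]
            k = k_new
    return x
```

For example, with $f(x, u, v) = u + v$, the half-plane partition `label = lambda x: 1 if x[0] < 0 else 2`, the start $x_0 = (-0.5, 0)$ and the strategy `{("S", 1): ((1.0, 0.0), (0.0, 0.0)), ("S", 1, 2): ((0.0, 1.0), (0.0, 0.0))}`, the point reaches the boundary at $t \approx 0.5$ and ends near $(0, 0.5)$ at $T = 1$.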
Assume that to each initial condition $x_0 \in S_{k_0}$ and each situation $(u(\cdot), v(\cdot))$, where $u(\cdot) \in P$, $v(\cdot) \in E$, there corresponds a unique motion trajectory $x(t)$, $0 \le t \le T$, and hence a unique payoff $H(x(T))$. Therefore the payoff in this game is a function of the initial condition $x_0 \in S_{k_0}$ and the strategies $u(\cdot)$, $v(\cdot)$, i.e.

$$K(x_0; u(\cdot), v(\cdot)) = H(x(T)), \tag{6.4.7}$$

where $x(t)$ is the motion trajectory from the initial state $x_0$ in the situation $(u(\cdot), v(\cdot))$. Since the realized initial state $x_0$ is unknown to the players and is a random variable from the set $S_{k_0}$ with the distribution $F(x)$, in the situation $(u(\cdot), v(\cdot))$ Player E can count on the averaged payoff

$$M(u(\cdot), v(\cdot)) = \int_{S_{k_0}} K(x_0; u(\cdot), v(\cdot))\, dF, \tag{6.4.8}$$

where $K(x_0; u(\cdot), v(\cdot))$ is determined by formula (6.4.7). Thus we have reduced the game to the normal form

$$\Gamma = \langle P, E, M(u(\cdot), v(\cdot)) \rangle, \tag{6.4.9}$$

where P, E are compact sets in the $2(s + m^2 - m + 1)$-dimensional Euclidean space that are the strategy sets of the players P and E, and $M(u(\cdot), v(\cdot))$ is a real function given on the Cartesian product $P \times E$ which is the payoff function (for Player E).
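When the integral (6.4.8) cannot be computed in closed form, it can be estimated by Monte Carlo sampling of $x_0$ from the distribution F. In the sketch below, `K` is any callable returning $K(x_0; u(\cdot), v(\cdot)) = H(x(T))$ for fixed strategies (for instance, built from a trajectory simulator), and `sample_x0` draws one point from F; both callables and the function name are assumptions of this sketch.

```python
import random


def averaged_payoff(K, sample_x0, n=10_000, seed=0):
    """Monte Carlo estimate of M(u, v) = integral of K(x0; u, v) dF(x0).

    K: callable x0 -> payoff H(x(T)) for fixed strategies u, v.
    sample_x0: callable rng -> one draw of x0 from the distribution F."""
    rng = random.Random(seed)
    return sum(K(sample_x0(rng)) for _ in range(n)) / n
```

For instance, with $f(x, u, v) = u + v$, constant controls with $u + v = (1, 0)$, $T = 1$ and $H(x) = x_1$, one has $K(x_0) = x_{0,1} + 1$; if F is uniform on the unit square, the estimate is close to 1.5.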
It may be illustrated by simple examples that the game Γ has no equilibrium in pure strategies even in the simplest cases (for the simplest partitions $S = (S_1, \ldots, S_m)$). Therefore it seems necessary to introduce mixed strategies. Before defining them, we prove the continuity of the payoff function $M(u(\cdot), v(\cdot))$ on the set $P \times E$. The proof of this assertion is similar to the proof of the corresponding Theorem 3 of this chapter, but it calls for some additional assumptions.
Consider an arbitrary situation $(u(\cdot), v(\cdot))$. Denote by $D_l^\delta$, $l = 1, \ldots, s$, the $\delta$-neighbourhoods of the boundary points $M_l$ of the second order of the partition S. Let $S(\delta) \subset S_{k_0}$ be the set of initial states $x_0$ for which the trajectory $x(t)$, $0 \le t \le T$, does not intersect the sets $D_l^\delta$. Assume that for any $\varepsilon > 0$ we can find $\delta > 0$ such that $\mu_F(S_{k_0} \setminus S(\delta)) < \varepsilon$, where $\mu_F$ is the measure generated by the distribution F on the set $S_{k_0}$. Let $x_0 \in S(\delta)$, and let $t_1, \ldots, t_r$ be the instants when the trajectory $x(t)$ arrives at the information sets $S_{ij}$ (i.e. the first instants when the trajectory $x(t)$ intersects the regular boundary segments of the partition S).
Since $x(t)$ does not pass through the sets $D_l^\delta$, there is $\varepsilon_1 > 0$ ($\varepsilon_1 < \delta$) such that the $\varepsilon_1$-neighbourhoods $\gamma_{\varepsilon_1}(x(t_i))$, $i = 1, \ldots, r$, of the points $x(t_1), \ldots, x(t_r)$ do not contain the boundary points of the second order $M_l$. For any $\varepsilon_1 > 0$ there is $\delta_1 > 0$ such that, when $\rho(u', u) < \delta_1$, $\rho(v', v) < \delta_1$, the intersection points of the boundary of the partition S with the trajectory $x'(t)$ from the initial state $x_0 \in S(\delta)$ in the situation $(u'(\cdot), v'(\cdot))$ belong to $\gamma_{\varepsilon_1}(x(t_i))$, since minor variations in the strategies cannot change the order of passage through the information sets (as compared with the situation $(u(\cdot), v(\cdot))$).
Then we may show that the instants when the trajectories $x'(t)$ arrive at the information sets differ only slightly from the instants $t_i$ of arrival of the trajectory $x(t)$ in the situation $(u(\cdot), v(\cdot))$. The foregoing implies that for any $\varepsilon > 0$ it is possible to find $\delta > 0$ such that, when $\rho(u'(\cdot), u(\cdot)) < \delta$, $\rho(v'(\cdot), v(\cdot)) < \delta$,

$$\max_{0 \le t \le T} |x'(t) - x(t)| < \varepsilon.$$
The latter means that the trajectories emanating from the points $x_0 \in S(\delta)$ depend continuously on the strategies at the point $(u(\cdot), v(\cdot))$. Since $H(x(T))$ is assumed to be a continuous function of $x$, for any sequence $u_n(\cdot), v_n(\cdot)$ converging to $u(\cdot), v(\cdot)$ and for any point $x_0 \in S(\delta)$ there is $N(x_0, \varepsilon)$ such that for all $m, n > N(x_0, \varepsilon)$
$$|K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot)) - K(x_0; u_n(\cdot), v_n(\cdot))| < \varepsilon, \qquad (6.4.10)$$
i.e. the sequence $K(x_0; u_n(\cdot), v_n(\cdot))$ converges in itself at all points of the set $S(\delta)$.
We now show that the sequence of integrals converges in itself. Indeed,
$$
\Big|\int_{S_{k_0}} K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\,dF - \int_{S_{k_0}} K(x_0; u_n(\cdot), v_n(\cdot))\,dF\Big|
\le \Big|\int_{S_{k_0}\setminus S(\delta)} K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\,dF - \int_{S_{k_0}\setminus S(\delta)} K(x_0; u_n(\cdot), v_n(\cdot))\,dF\Big|
+ \Big|\int_{S(\delta)} K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\,dF - \int_{S(\delta)} K(x_0; u_n(\cdot), v_n(\cdot))\,dF\Big|.
$$
From (6.4.10) it follows that the sequence $K(x_0; u_n(\cdot), v_n(\cdot))$ converges at each point $x_0 \in S(\delta)$; therefore the sequence of integrals $\int_{S(\delta)} K(x_0; u_n(\cdot), v_n(\cdot))\,dF$ also converges. This means that there is $N(\varepsilon)$ such that for $n > N(\varepsilon)$ and all $m$
$$\Big|\int_{S(\delta)} K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\,dF - \int_{S(\delta)} K(x_0; u_n(\cdot), v_n(\cdot))\,dF\Big| < \varepsilon.$$
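Estimating further the first difference (here $\mu_F(S_{k_0} \setminus S(\delta)) < \varepsilon$ and $|K(x_0; u(\cdot), v(\cdot))| \le M$), we obtain
$$\Big|\int_{S_{k_0}} K(x_0; u_{n+m}(\cdot), v_{n+m}(\cdot))\,dF - \int_{S_{k_0}} K(x_0; u_n(\cdot), v_n(\cdot))\,dF\Big| < \varepsilon + 2\varepsilon M$$
for $n > N(\varepsilon)$ and all $m$.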
This means that the sequence $\int_{S_{k_0}} K(x_0; u_n(\cdot), v_n(\cdot))\,d\mu_F$ is fundamental, i.e. the following limit exists:
$$\lim_{n \to \infty} \int_{S_{k_0}} K(x_0; u_n(\cdot), v_n(\cdot))\,d\mu_F. \qquad (6.4.11)$$
Since the measure $F$ is continuous, the integral $\int_S K(x_0; u(\cdot), v(\cdot))\,d\mu_F$ is a continuous function of the set $S$. From the condition that for any $\delta > 0$
$$\lim_{n \to \infty} \int_{S(\delta)} K(x_0; u_n(\cdot), v_n(\cdot))\,d\mu_F = \int_{S(\delta)} K(x_0; u(\cdot), v(\cdot))\,d\mu_F,$$
it follows that the limit (6.4.11) is equal to $\int_{S_{k_0}} K(x_0; u(\cdot), v(\cdot))\,d\mu_F$, i.e. the function $M(u(\cdot), v(\cdot))$ is continuous at the point $(u(\cdot), v(\cdot))$. Since the situation $(u(\cdot), v(\cdot))$ is arbitrary, $M(u(\cdot), v(\cdot))$ is a continuous function of the strategies $u(\cdot), v(\cdot)$. Thus we have the following theorem.

Theorem 7 In the game with the regular information partition $S$, the payoff function $M(u(\cdot), v(\cdot))$ is a continuous function defined on the product of sets $P \times E$.

As in the previous section, define a mixed extension of the game $\Gamma$. Let $\mu, \nu$ be mixed strategies of the players and
$$E(\mu, \nu) = \int\!\!\int M(u(\cdot), v(\cdot))\,d\mu\,d\nu$$
be the expectation of the payoff in the mixed strategies $\mu, \nu$. Denote by $\{\mu\}$ and $\{\nu\}$ the sets of all mixed strategies of the players P and E, respectively. Employing Theorem 4 in Chapter 1 and Theorem 5 in this chapter, we obtain the following theorem.
[Fig. 35. Labels recoverable from the figure: $A$, $B$, $S_0$, $-1$, $0$, $+1$.]
Theorem 8 The game $\Gamma$ with a regular information partition $S$ has the equilibrium in mixed strategies.
Example 4. The game proceeds on the plane. The motion equations are the same as in Example 3 of this chapter. The duration of the game is $T = 1$. The payoff to Player E is terminal and equal to $|z_1(1)| + |z_2(1)|$. The information partition is of the form shown in Fig. 35. The game starts in the information set $AB$ (the interval $[-1, 1]$ on the axis $z_1$) with a random choice (in accordance with the uniform distribution on $AB$) of the initial point $z_0$.
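Since one of the two controls ($+1$ or $-1$) can be chosen in each of the three information sets, the pure strategy sets of both players coincide and are of the form
$u_1(\cdot), v_1(\cdot)$: $(1, 1, 1)$;
$u_2(\cdot), v_2(\cdot)$: $(1, 1, -1)$;
$u_3(\cdot), v_3(\cdot)$: $(1, -1, 1)$;
$u_4(\cdot), v_4(\cdot)$: $(1, -1, -1)$;
$u_5(\cdot), v_5(\cdot)$: $(-1, 1, 1)$;
$u_6(\cdot), v_6(\cdot)$: $(-1, -1, 1)$;
$u_7(\cdot), v_7(\cdot)$: $(-1, 1, -1)$;
$u_8(\cdot), v_8(\cdot)$: $(-1, -1, -1)$.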
To define the payoff matrix, we have to compute the coordinates $z_1(1), z_2(1)$ of the terminal point in each situation and to average the payoff over the initial data from the interval $[-1, 1]$. For simplicity, we omit minor computations here and give the final form of the payoff matrix (rows: strategies of Player P; columns: strategies of Player E):
$$
\begin{array}{c|cccccccc}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & v_7 & v_8 \\ \hline
u_1 & 2 & 2 & 2 & 2 & 1 & 2 & 1 & 2 \\
u_2 & 2 & 2 & 2 & 2 & 1 & 2 & 1 & 2 \\
u_3 & 1.5 & 0.5 & 1.5 & 0.5 & 1.5 & 0.5 & 1.5 & 0.5 \\
u_4 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
u_5 & 0.5 & 1.5 & 0.5 & 1.5 & 0.5 & 1.5 & 0.5 & 1.5 \\
u_6 & 2 & 1 & 2 & 1 & 2 & 2 & 2 & 2 \\
u_7 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\
u_8 & 2 & 1 & 2 & 1 & 2 & 2 & 2 & 2
\end{array}
$$
As is seen from the payoff matrix, the only nondominated strategies (see [2, 8]) are $v_1(\cdot)$, $v_6(\cdot)$ for Player E and $u_3(\cdot)$, $u_5(\cdot)$ for Player P. Consideration of the corresponding $2 \times 2$ matrix shows that these strategies are to be chosen with equal probabilities. Thus the optimal mixed strategies of Players P and E are $(0, 0, 1/2, 0, 1/2, 0, 0, 0)$ and $(1/2, 0, 0, 0, 0, 1/2, 0, 0)$, respectively.
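The reduction to a $2 \times 2$ game can be checked mechanically. The Python sketch below uses our own helper names (`solve_2x2`, `expected_payoff`, `is_equilibrium`; they do not come from the text). It solves a $2 \times 2$ zero-sum matrix game without a saddle point by the usual equalization formulas, with rows for the minimizing Player P and columns for the maximizing Player E. It then tests the equilibrium inequalities in the form used in Theorem 3 of Chapter 1. The submatrix $[[1.5, 0.5], [0.5, 1.5]]$ for rows $u_3, u_5$ and columns $v_1, v_6$ is an assumption based on our reading of the payoff matrix.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def solve_2x2(A: Matrix) -> Tuple[List[float], List[float], float]:
    """Mixed equilibrium of a 2x2 zero-sum game that has no saddle point.

    Rows belong to the minimizing player P, columns to the maximizing player E.
    Returns (p, q, value), where p and q are the mixed strategies of P and E.
    """
    (a, b), (c, d) = A
    den = a - b - c + d
    if den == 0:
        raise ValueError("degenerate 2x2 game: equalization formulas do not apply")
    p0 = (d - c) / den          # makes E indifferent between its two columns
    q0 = (d - b) / den          # makes P indifferent between its two rows
    value = (a * d - b * c) / den
    return [p0, 1 - p0], [q0, 1 - q0], value


def expected_payoff(A: Matrix, p: Sequence[float], q: Sequence[float]) -> float:
    """E(mu, nu) = sum_i sum_j p_i q_j A[i][j] for finitely supported mixed strategies."""
    return sum(p[i] * q[j] * A[i][j] for i in range(len(p)) for j in range(len(q)))


def is_equilibrium(A: Matrix, p, q, eps: float = 1e-12) -> bool:
    """Check E(u, q) + eps >= E(p, q) >= E(p, v) - eps for all pure u (rows) and v (columns)."""
    value = expected_payoff(A, p, q)
    row_payoffs = [sum(q[j] * A[i][j] for j in range(len(q))) for i in range(len(A))]
    col_payoffs = [sum(p[i] * A[i][j] for i in range(len(p))) for j in range(len(A[0]))]
    return min(row_payoffs) + eps >= value and max(col_payoffs) - eps <= value


# Assumed submatrix: rows u_3, u_5 (Player P), columns v_1, v_6 (Player E).
sub = [[1.5, 0.5], [0.5, 1.5]]
p, q, value = solve_2x2(sub)
print(p, q, value)                 # [0.5, 0.5] [0.5, 0.5] 1.0
print(is_equilibrium(sub, p, q))   # True
```

The same `is_equilibrium` check applies unchanged to the full $8 \times 8$ matrix once the optimal vectors are padded with zeros.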
6.5 Games with mixed information state
Here, as in the previous section, we assume that the motion equation is of the form
$$\dot x = f(x, u, v), \qquad (6.5.1)$$
where $x \in R^2$, $u \in U \subset R^2$, $v \in V \subset R^2$ ($U$ and $V$ are compact sets), with the initial condition $x(0) = x_0$. As usual, we assume that the conditions ensuring the existence, uniqueness and $[0, T]$ continuability of the solution for any measurable open-loop controls $u(t), v(t)$, $0 \le t \le T$, are imposed on the system (6.5.1). The game has a prescribed duration $T$. Player E's payoff is determined by the function $H(x(T))$, where $H(x)$ is a given continuous function (Player P's payoff is equal to $-H(x(T))$).
The convex set $M$, wherein the players have a discrete information state, is given in $R^2$. When the point $x$ falls on the boundary of $M$ (from the inside), the game continues with complete information (i.e. the players are informed about the phase state $x(t)$ of the system at each current instant of time). Under the conditions of the game, once the point has fallen on the boundary of $M$, the game proceeds with phase constraints in the set $R^2 \setminus M$.
For simplicity, the set $M$ is assumed to be open, i.e. $\overline{R^2 \setminus M} = R^2 \setminus M$. Let $S'$ be a regular partition of the plane $R^2$. Denote by $S = S' \cap M$ the subpartition of the partition $S'$, where $S$ is the partition of the set $M$ by the sets $S'_k \cap M$, $S'_k \in S'$. Similarly, we define the regular intervals of the boundary of $S$ as the intersections $S_{ij} = S'_{ij} \cap M$ of the regular intervals $S'_{ij}$ of the boundary of the partition $S'$ with the set $M$. Next, let $M_l$, $l = 1, 2, \dots, s$, be the second order boundary points of the partition $S'$ belonging to the set $M$. The strategy $u(\cdot)$ of Player P consists of a pair of strategies $u(\cdot) = \{u_1(\cdot), u_2(\cdot)\}$, where $u_1(\cdot)$ is a strategy of the form (6.4.3) defined on the information sets $S_k$ of the partition $S$ (see §4 of this chapter), and $u_2(\cdot)$ is a piecewise open-loop strategy in the perfect information game with discrimination and phase constraints in the set $R^2 \setminus M$. The strategy $v(\cdot) = \{v_1(\cdot), v_2(\cdot)\}$ of Player E is defined in much the same way. Denote by $P$ and $E$ the sets of all possible strategies $u(\cdot), v(\cdot)$ of the players P and E, respectively. This definition of strategy reflects the fact that at the first stage of the game (in $M$) the players' actions are based on the discrete information determined by the partition $S$ (the functions $u_1(\cdot), v_1(\cdot)$ are used to form the controls), and then, after leaving $M$, they play the perfect information game (the second components $u_2(\cdot), v_2(\cdot)$ of the strategies are used here).

The game proceeds as follows. Let $S'_{k_0}$ be an element of the partition $S'$ having a nonempty intersection with the set $M$. At the initial instant of time, "chance" chooses a point $x_0 \in S_{k_0} = S'_{k_0} \cap M$ in accordance with the given continuous distribution $F$. Suppose the players have chosen the strategies $u(\cdot) = \{u_1(\cdot), u_2(\cdot)\} \in P$ and $v(\cdot) = \{v_1(\cdot), v_2(\cdot)\} \in E$. In the situation $(u(\cdot), v(\cdot))$ the motion of the point is as follows. In the set $M$, the motion of the point $x$ from the state $x_0 \in S_{k_0}$ realized by the random mechanism is affected only by the components $u_1(\cdot), v_1(\cdot)$ of the strategies $u(\cdot), v(\cdot)$, and at this stage the trajectory $x(t)$ is constructed as in §4 of this chapter. Let $\bar t$ be the first instant when $x(t)$ falls on the set $R^2 \setminus M$.
From the state $x(\bar t)$ (for $\bar t < T$), the motion of the point is determined by the components $u_2(\cdot), v_2(\cdot)$ of the strategies $u(\cdot), v(\cdot)$ and is constructed as in §2, Ch. 2. If $\bar t > T$, the game terminates in the set $M$, and the trajectory $x(t)$ is constructed as in §4 of the present chapter. Thus, to each randomly realized initial position and each situation $(u(\cdot), v(\cdot))$ there corresponds a unique trajectory $x(t)$ and therefore a unique payoff
$$K(x_0; u(\cdot), v(\cdot)) = H(x(T)), \qquad (6.5.2)$$
where $x(t)$ is the trajectory from the initial state $x_0$ in the situation $(u(\cdot), v(\cdot))$. Since the realized initial state $x_0$ is unknown to the players and is a random variable with the distribution $F(x)$ on the set $S_{k_0} = S'_{k_0} \cap M$, Player E in the situation $(u(\cdot), v(\cdot))$ can only reckon on the averaged payoff
$$M(u(\cdot), v(\cdot)) = \int_{S_{k_0}} K(x_0; u(\cdot), v(\cdot))\,dF, \qquad (6.5.3)$$
where $K(x_0; u(\cdot), v(\cdot))$ is determined from formula (6.5.2). We have thus reduced the game to normal form:
$$\Gamma = \langle P, E, M(u(\cdot), v(\cdot)) \rangle, \qquad (6.5.4)$$
where $P$, $E$ are the strategy sets of the players P and E, and $M(u(\cdot), v(\cdot))$ is the real function (see (6.5.3)) defined on the Cartesian product $P \times E$ and representing the payoff function (Player E's payoff). As in §6, Ch. 2, we may show that for any $\varepsilon > 0$ the differential game with discrimination and phase constraints in the set $R^2 \setminus M$ has an $\varepsilon$-equilibrium in piecewise open-loop strategies. Denote by $V(x_0, T)$ the value function of the differential game of duration $T$ in the set $R^2 \setminus M$ from the initial state $x_0 \in R^2 \setminus M$. To obtain a solution of the game $\Gamma$, we define an auxiliary game $\Gamma'$, which differs from $\Gamma$ only in the definition of the payoff function. Let $(u(\cdot), v(\cdot))$ be a situation in $\Gamma$, $x_0 \in S_{k_0}$ the result of the random move, and $x(t)$ the trajectory in the situation $(u(\cdot), v(\cdot))$ from the initial state $x_0 \in S_{k_0}$. Denote by $\bar t$ the first instant when the trajectory $x(t)$ penetrates the set $R^2 \setminus M$, i.e.
$$\bar t = \inf\{t : x(t) \notin M\}. \qquad (6.5.5)$$
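If the set in (6.5.5) is empty, we set $\bar t = \infty$. The payoff function in the game $\Gamma'$ is defined as follows:
$$K'(x_0; u_1(\cdot), v_1(\cdot)) = \begin{cases} V(x(\bar t), T - \bar t), & \bar t \le T,\\ H(x(T)), & \bar t > T.\end{cases}$$
Note that, by the definition of the game, $V(x, 0) = H(x)$. Next, let
$$M'(u_1(\cdot), v_1(\cdot)) = \int_{S_{k_0}} K'(x_0; u_1(\cdot), v_1(\cdot))\,dF.$$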
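The switching rule that defines $K'$ can be illustrated numerically. The Python sketch below approximates $\bar t$ from (6.5.5) on a uniform time grid, so the exit instant is only accurate to the step `dt`. It then applies the rule $K' = V(x(\bar t), T - \bar t)$ for $\bar t \le T$ and $K' = H(x(T))$ otherwise. It assumes that the trajectory generated by $u_1(\cdot), v_1(\cdot)$ is available as a function of time and that the caller supplies $V$. The function names are ours. In the usage lines, $M$ is taken to be the open unit disc and $V(x, s) = |x_1| + |x_2|$ is a hypothetical stand-in for the value function.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def first_exit_time(traj: Callable[[float], Point],
                    in_M: Callable[[Point], bool],
                    T: float, dt: float = 1e-3) -> float:
    """Grid approximation of t_bar = inf{t : x(t) not in M}, searched on [0, T].

    Returns math.inf if no grid point of [0, T] lies outside M. For K' only the
    comparison of t_bar with T matters, so later exits need not be located.
    """
    steps = int(round(T / dt))
    for k in range(steps + 1):
        t = k * dt
        if not in_M(traj(t)):
            return t
    return math.inf


def auxiliary_payoff(traj: Callable[[float], Point],
                     in_M: Callable[[Point], bool],
                     V: Callable[[Point, float], float],
                     H: Callable[[Point], float],
                     T: float, dt: float = 1e-3) -> float:
    """K' = V(x(t_bar), T - t_bar) if t_bar <= T, and H(x(T)) otherwise."""
    t_bar = first_exit_time(traj, in_M, T, dt)
    if t_bar <= T:
        return V(traj(t_bar), T - t_bar)
    return H(traj(T))


# Usage with hypothetical data: M is the open unit disc.
in_disc = lambda z: z[0] ** 2 + z[1] ** 2 < 1.0
V = lambda z, s: abs(z[0]) + abs(z[1])      # stand-in for the value function
H = lambda z: abs(z[0]) + abs(z[1])
print(auxiliary_payoff(lambda t: (0.5 + t, 0.0), in_disc, V, H, T=2.0))  # about 1.0 (exit near t = 0.5)
print(auxiliary_payoff(lambda t: (0.2, 0.3), in_disc, V, H, T=2.0))      # stays in M, so H = 0.5
```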
The strategies in the game $\Gamma'$ are the first components of the strategies $u(\cdot) = (u_1(\cdot), u_2(\cdot))$, $v(\cdot) = (v_1(\cdot), v_2(\cdot))$, determined on the elements of the partition $S$. In other words, $u_2(\cdot)$ and $v_2(\cdot)$ are not employed, because the game terminates when the point $x$ arrives at $R^2 \setminus M$.
Denote the sets of strategies $u_1(\cdot)$ and $v_1(\cdot)$ in the game $\Gamma'$ by $P_1$ and $E_1$, respectively. Evidently, these sets are compact sets in Euclidean spaces of finite dimension (see §4 of this chapter).
Theorem 9 The payoff function $M'(u_1(\cdot), v_1(\cdot))$ is a continuous function on the set $P_1 \times E_1$.
Proof: This theorem is proved in much the same way as Theorem 7. All the conditions from §4 of this chapter are assumed to be imposed on the partition $S'$ and the system (6.5.1). As in §4, introduce the set $S(\delta)$ of initial states from which, in the situation $(u_1(\cdot), v_1(\cdot))$, the trajectory does not pass through the $\delta$-neighbourhoods of the second order boundary points of the partition $S$. We show that for $x_0 \in S(\delta)$ the payoff $K'(x_0; u_1(\cdot), v_1(\cdot))$ is continuous at the point $(u_1(\cdot), v_1(\cdot))$. To this end, consider two cases.
Case 1. The initial state $x_0 \in S(\delta)$ is such that $\bar t \le T$.
Case 2. The game terminates before the point reaches the set $R^2 \setminus M$, i.e. $\bar t > T$.
In Case 1 the continuity of $K'(x_0; u_1(\cdot), v_1(\cdot))$ follows from that of the function $V(x(\bar t), T - \bar t)$ in $x(\bar t)$ and $T - \bar t$ (the value function of the differential perfect information game depends continuously on the initial state). If $\bar t < T$, we prove the continuity by reproducing the reasoning of §4. If, however, $\bar t = T$, we employ the condition $H(x(T)) = V(x(T), 0)$ and the continuity of the function $H$ in $x$. The proof of Theorem 7 fully applies to Case 2. The continuity of the function $M'(u_1(\cdot), v_1(\cdot))$ at the point $(u_1(\cdot), v_1(\cdot))$ follows from that of the function $K'(x_0; u_1(\cdot), v_1(\cdot))$ at this point for $x_0 \in S(\delta)$ (see (6.4.10), (6.4.11) and the computations of the payoff in Theorem 7). □
Since the point $(u_1(\cdot), v_1(\cdot))$ has been chosen arbitrarily, the payoff function $M'(u_1(\cdot), v_1(\cdot))$ of the game $\Gamma'$ is continuous on the strategy set $P_1 \times E_1$. Since the strategy sets in the auxiliary game $\Gamma'$ are compact sets in Euclidean space, employing Theorem 4, Ch. 1, we obtain the following theorem.
T h e o r e m 10 In the auxiliary strategies.
game V there exist the equilibrium
in mixed
We now focus on the original game $\Gamma$. In the game $\Gamma$ the strategy sets are not compact, since each strategy contains a piecewise open-loop component representing a strategy in the perfect information game on the set $R^2 \setminus M$. The set of such strategies is obviously noncompact, since it contains the set of all measurable open-loop controls $u(t), v(t)$ for $\bar t \le t \le T$. At the same time, for $\varepsilon > 0$ we succeed in proving the existence of an $\varepsilon$-equilibrium in the game $\Gamma$ in mixed strategies. The proof employs the condition that in the perfect information game on the set $R^2 \setminus M$, for any $\varepsilon > 0$, there exists an $\varepsilon$-equilibrium point.
Let $V$ be the value of the game $\Gamma'$, and $(\mu^*, \nu^*)$ the equilibrium in mixed strategies in this game (see Theorem 10). For a particular $\varepsilon > 0$, each situation $(u_1(\cdot), v_1(\cdot))$, $u_1(\cdot) \in P_1$, $v_1(\cdot) \in E_1$, in the game $\Gamma'$ is associated with the situation $(u^\varepsilon(\cdot), v^\varepsilon(\cdot))$, $u^\varepsilon(\cdot) \in P$, $v^\varepsilon(\cdot) \in E$, in the game $\Gamma$, where $u^\varepsilon(\cdot) = (u_1(\cdot), u_2^\varepsilon(\cdot))$, $v^\varepsilon(\cdot) = (v_1(\cdot), v_2^\varepsilon(\cdot))$, and $u_2^\varepsilon(\cdot)$, $v_2^\varepsilon(\cdot)$ are $\varepsilon$-optimal strategies in the perfect information game of duration $T - \bar t$ on the set $R^2 \setminus M$ from the initial state $x(\bar t)$. Here $\bar t$ is the first instant when the trajectory $x(t)$ in the situation $(u_1(\cdot), v_1(\cdot))$ penetrates the set $R^2 \setminus M$ (if $\bar t > T$, the strategies $u_2^\varepsilon(\cdot)$, $v_2^\varepsilon(\cdot)$ are defined arbitrarily).
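Theorem 11 Let $(\mu^*, \nu^*)$ be the equilibrium in mixed strategies in the game $\Gamma'$. Then the same situation $(\mu^*, \nu^*)$ is an $\varepsilon_1$-equilibrium, $\varepsilon_1 = 3\varepsilon$, in the game $\Gamma$, provided that $\mu^*$ is understood as the probability measure on the strategies of the form $u^\varepsilon(\cdot) = (u_1(\cdot), u_2^\varepsilon(\cdot))$, $u_1(\cdot) \in P_1$, and $\nu^*$ as the probability measure on the strategies of the form $v^\varepsilon(\cdot) = (v_1(\cdot), v_2^\varepsilon(\cdot))$, $v_1(\cdot) \in E_1$.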
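Proof: By the definition of the game $\Gamma'$,
$$K'(x_0; u_1(\cdot), v_1(\cdot)) = \begin{cases} V(x(\bar t), T - \bar t), & \bar t \le T,\\ H(x(T)), & \bar t > T,\end{cases}$$
but $|V(x(\bar t), T - \bar t) - K(x_0; u^\varepsilon(\cdot), v^\varepsilon(\cdot))| < \varepsilon$, since the pair $u_2^\varepsilon(\cdot), v_2^\varepsilon(\cdot)$ forms an $\varepsilon$-equilibrium in the perfect information game of duration $T - \bar t$ on the set $R^2 \setminus M$ from the initial state $x(\bar t)$. Therefore $|K'(x_0; u_1(\cdot), v_1(\cdot)) - K(x_0; u^\varepsilon(\cdot), v^\varepsilon(\cdot))| < \varepsilon$ and, consequently, $|M'(u_1(\cdot), v_1(\cdot)) - M(u^\varepsilon(\cdot), v^\varepsilon(\cdot))| < \varepsilon$ for all $u_1(\cdot), v_1(\cdot)$ and $u_2^\varepsilon(\cdot), v_2^\varepsilon(\cdot)$. Then
$$|E'(\mu, \nu) - E(\mu, \nu)| < \varepsilon, \qquad (6.5.6)$$
where
$$E'(\mu, \nu) = \int_{P_1}\int_{E_1} M'(u_1(\cdot), v_1(\cdot))\,d\mu\,d\nu. \qquad (6.5.7)$$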
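We show that
$$E(u(\cdot), \nu^*) + 3\varepsilon \ge E(\mu^*, \nu^*) \ge E(\mu^*, v(\cdot)) - 3\varepsilon \qquad (6.5.8)$$
for all $u(\cdot) \in P$, $v(\cdot) \in E$. By Theorem 3, Ch. 1, this implies that $(\mu^*, \nu^*)$ is an $\varepsilon_1$-equilibrium point, $\varepsilon_1 = 3\varepsilon$, in the game $\Gamma$. Let us prove the left-hand side of (6.5.8). Let $u(\cdot) = (u_1(\cdot), u_2(\cdot)) \in P$, where $u_1(\cdot) \in P_1$. Since $(u_2^\varepsilon(\cdot), v_2^\varepsilon(\cdot))$ is an $\varepsilon$-equilibrium in the perfect information subgame,
$$E(u(\cdot), \nu^*) = \int_{E_1} M(u(\cdot), v^\varepsilon(\cdot))\,d\nu^* = \int_{E_1}\int_{S_{k_0}} K\big(x_0; [u_1(\cdot), u_2(\cdot)], [v_1(\cdot), v_2^\varepsilon(\cdot)]\big)\,dF\,d\nu^* \ge \int_{E_1}\int_{S_{k_0}} K\big(x_0; u^\varepsilon(\cdot), v^\varepsilon(\cdot)\big)\,dF\,d\nu^* - \varepsilon.$$
From the definition of $K'$, the $\varepsilon$-optimality of $u_2^\varepsilon(\cdot), v_2^\varepsilon(\cdot)$, the equilibrium property of $(\mu^*, \nu^*)$ in $\Gamma'$, and (6.5.6), (6.5.7) we have
$$\int_{E_1}\int_{S_{k_0}} K(x_0; u^\varepsilon(\cdot), v^\varepsilon(\cdot))\,dF\,d\nu^* - \varepsilon \ge \int_{E_1}\int_{S_{k_0}} K'(x_0; u_1(\cdot), v_1(\cdot))\,dF\,d\nu^* - 2\varepsilon = E'(u_1(\cdot), \nu^*) - 2\varepsilon \ge E'(\mu^*, \nu^*) - 2\varepsilon \ge E(\mu^*, \nu^*) - 3\varepsilon.$$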
We have thus proved the left-hand side of inequality (6.5.8). The proof of the right-hand side of (6.5.8) is carried out in much the same way and is omitted here. □ In conclusion, we obtain the following theorem.
Theorem 12 In the game $\Gamma$, for any $\varepsilon > 0$ there exists an $\varepsilon$-equilibrium in mixed strategies.
Example 5. The set $M$ is the open unit disc centred at the origin of coordinates. The regular partition $S'$ of the plane $R^2$ is composed of two sets $S'_1$ and $S'_2$, where $S'_1 \supset M$ and $S'_2 = R^2 \setminus S'_1$. Thus, the subpartition $S$ of the partition $S'$ is composed of the unique set $S_1 = S'_1 \cap M = M$. The motion equations in this case are the same as in Examples 3 and 4 of this chapter, i.e.
$$\dot z = u + v, \qquad z \in R^2.$$
The sets of control variables $U$, $V$, however, are richer than in the previous examples, viz. $U = V = \{(1, 0), (-1, 0), (0, 1), (0, -1)\}$. The initial state $z_0$ is chosen in accordance with the uniform distribution on $M$. As in §5, the players have perfect information outside $M$, i.e. they know the state $z$ at each instant of time. The duration of the game is fixed and equal to $T = 2$. The payoff is terminal and equals
$$F(z_0, z(2)) = |z_1(2)| + |z_2(2)|.$$
As shown before, to solve the game we first have to find a solution of the perfect information subgame on the set $R^2 \setminus M$.
Suppose the initial state $z_0$ belongs to the boundary of the set $M$. Consider the case $z_1 > 0$, $z_2 > 0$; the perfect information game from other initial states is solved in much the same way. Assume that Player E adopts the constant strategy $v^*(\cdot) = (1, 0)$ irrespective of time and of the realized states $z$. The best response of Player P is then to minimize the growth rate of the component $z_1$, i.e. to adopt the strategy $u^*(\cdot) = (-1, 0)$ (any other strategy involves the growth of $|z_1| + |z_2|$). As a result, the point stays where it is until the game terminates. At the same time, the best response of Player E to the strategy $u^*(\cdot) = (-1, 0)$ is the choice of the strategy $v^*(\cdot) = (1, 0)$. Thus, in the region $z_1 > 0$, $z_2 > 0$ the situation $(u^*(\cdot), v^*(\cdot))$ forms an equilibrium in the perfect information game, and the value of this game is equal to
$$|z_1| + |z_2| = |R\cos\varphi| + |R\sin\varphi|,$$
where $z_0 = (R\cos\varphi, R\sin\varphi)$. This value is independent of the duration of the subgame. Optimal strategies in the perfect information game are of the form
$$v^*(z) = \begin{cases} (1, 0), & z_1 > 0,\\ (-1, 0), & z_1 < 0,\end{cases} \qquad u^*(z) = \begin{cases} (-1, 0), & z_1 > 0,\\ (1, 0), & z_1 < 0.\end{cases}$$
For the complete solution of the game it remains to define optimal strategies in the information set $M$. Since the partition $S$ contains only one information set, the strategy sets of both players coincide, each containing four strategies:
$u_1(\cdot), v_1(\cdot)$: $(1, 0)$;
$u_2(\cdot), v_2(\cdot)$: $(0, 1)$;
$u_3(\cdot), v_3(\cdot)$: $(-1, 0)$;
$u_4(\cdot), v_4(\cdot)$: $(0, -1)$.
Note that in the computation of the payoff matrix, in all situations $(u_i(\cdot), v_j(\cdot))$ except $(u_1(\cdot), v_3(\cdot))$, $(u_2(\cdot), v_4(\cdot))$, $(u_3(\cdot), v_1(\cdot))$, $(u_4(\cdot), v_2(\cdot))$ (in these four situations $u + v = 0$, so the point does not move), the trajectory reaches the boundary of the set $M$ for all initial states $z_0 \in M$. From the solution of the perfect information game it follows that the trajectory then remains on the boundary of $M$ until the game terminates. Therefore, the payoff in each such situation is the mathematical expectation, over the initial positions $z_0 \in M$, of the sum of the absolute values of the coordinates of the point where the trajectory meets the boundary of $M$. We shall not give all the computations, but note that the payoffs $M'$ coincide in the following situations:
$$M'(u_1, v_1) = M'(u_2, v_2) = M'(u_3, v_3) = M'(u_4, v_4) = a,$$
$$M'(u_1, v_2) = M'(u_1, v_4) = M'(u_2, v_1) = M'(u_2, v_3) = M'(u_3, v_2) = M'(u_3, v_4) = M'(u_4, v_1) = M'(u_4, v_3) = b,$$
$$M'(u_1, v_3) = M'(u_2, v_4) = M'(u_3, v_1) = M'(u_4, v_2) = c.$$
Now, to solve the game $\Gamma'$ we have to solve the following auxiliary matrix game:
$$
\begin{array}{c|cccc}
 & v_1 & v_2 & v_3 & v_4 \\ \hline
u_1 & a & b & c & b \\
u_2 & b & a & b & c \\
u_3 & c & b & a & b \\
u_4 & b & c & b & a
\end{array}
$$
The mixed optimal strategies in the matrix game $\Gamma'$ under consideration prescribe equal probabilities $(1/4, 1/4, 1/4, 1/4)$ to the pure strategies of the players P and E. The mixed optimal strategies in the original game are constructed with the help of the solution of the game $\Gamma'$ and of the perfect information subgame on the set $R^2 \setminus M$, just as in Theorem 11.
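The text does not give the numbers $a$, $b$, $c$. They can be estimated by Monte Carlo under the following assumptions, which are our reading of Example 5 rather than statements of the text:

- Inside $M$ the point moves with velocity $u + v$.
- If $u + v = 0$, the point stays at $z_0$ until $T = 2$, so $K' = |z_{0,1}| + |z_{0,2}|$.
- Otherwise the point reaches the unit circle before $T$ and, under $u^*, v^*$, stays there, so $K'$ equals $|z_1| + |z_2|$ at the exit point.

Under these assumptions one can check that exactly $c = 8/(3\pi) \approx 0.849$ (the mean of $|z_1| + |z_2|$ over the disc) and $a = 4/\pi \approx 1.273$, while $b$ lies in $[1, \sqrt 2]$. The sketch also confirms that every row and every column of the circulant matrix has the same average for any $a$, $b$, $c$. The uniform strategies are therefore an equilibrium with value $(a + 2b + c)/4$. All function names are hypothetical.

```python
import math
import random

# Pure strategies of Example 5: u_1..u_4 (and v_1..v_4) are these unit directions.
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def exit_point(z, w):
    """Point where the ray z + s*w (s > 0) meets the unit circle, for |z| < 1."""
    A = w[0] ** 2 + w[1] ** 2
    B = 2.0 * (z[0] * w[0] + z[1] * w[1])
    C = z[0] ** 2 + z[1] ** 2 - 1.0            # negative inside the disc
    s = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
    return (z[0] + s * w[0], z[1] + s * w[1])


def k_prime(z0, u, v):
    """Assumed payoff K' of the auxiliary game for one initial point z0."""
    w = (u[0] + v[0], u[1] + v[1])
    if w == (0, 0):                            # opposite controls: the point never moves
        return abs(z0[0]) + abs(z0[1])
    # |w| >= sqrt(2) and the diameter is 2, so the circle is reached before T = 2;
    # afterwards the point stays put, so K' = |z_1| + |z_2| at the exit point.
    p = exit_point(z0, w)
    return abs(p[0]) + abs(p[1])


def sample_disc(rng):
    """Uniform point of the open unit disc (rejection sampling)."""
    while True:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            return (x, y)


def estimate_abc(n=20000, seed=1):
    """Monte Carlo estimates of a (same), b (perpendicular), c (opposite directions)."""
    rng = random.Random(seed)
    pts = [sample_disc(rng) for _ in range(n)]
    u = DIRS[0]
    a = sum(k_prime(z, u, DIRS[0]) for z in pts) / n
    b = sum(k_prime(z, u, DIRS[1]) for z in pts) / n
    c = sum(k_prime(z, u, DIRS[2]) for z in pts) / n
    return a, b, c


def circulant(a, b, c):
    """The 4x4 matrix of the auxiliary game (rows u_i, columns v_j)."""
    return [[a, b, c, b], [b, a, b, c], [c, b, a, b], [b, c, b, a]]


a, b, c = estimate_abc()
A = circulant(a, b, c)
row_vals = [sum(A[i][j] * 0.25 for j in range(4)) for i in range(4)]   # P deviates to u_i
col_vals = [sum(0.25 * A[i][j] for i in range(4)) for j in range(4)]   # E deviates to v_j
print(round(a, 3), round(b, 3), round(c, 3))   # a near 4/pi, c near 8/(3*pi)
print(max(row_vals) - min(row_vals), max(col_vals) - min(col_vals))  # zero up to rounding
```

Because every pure deviation against the uniform mixture yields the same payoff, neither player can gain by leaving $(1/4, 1/4, 1/4, 1/4)$. This is the property the text uses.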
6.6 Pursuit game with prescribed duration and delayed information for both players
Consider a differential zero-sum pursuit game of prescribed duration $T$ between the pursuer team $P = \{P_1, \dots, P_m\}$, acting as one player, and the evader E. The motion equations are of the form
$$\text{for the players } P_i:\quad \dot x^{(i)} = f^{(i)}(x^{(i)}, u^{(i)}),\quad u^{(i)} \in U^{(i)} \in \operatorname{Comp} R^k,\ i = 1, \dots, m,$$
$$\text{for Player E}:\quad \dot y = g(y, v),\quad v \in V \in \operatorname{Comp} R^l, \qquad (6.6.1)$$
where $x^{(i)} \in R^n$, $y \in R^n$, $x^{(i)}(0) = x_0^{(i)}$, $y(0) = y_0$. All the conditions that ensure the existence, uniqueness and $[0, T]$ continuability of the solution from the initial states $x_0, y_0$ for any pair of measurable open-loop controls $u(t) = \{u^{(i)}(t)\}$, $v(t)$ are imposed on the right-hand sides of equations (6.6.1).
The games under consideration differ from the imperfect information games of pursuit presented in §1 of this chapter in two respects. First, the payoff function is nonconvex. Second, information about the state of Player $P = \{P_i\}$ is delayed, in addition to the delayed information about Player E available to Player P. In §2, convexity of the payoff implies the existence of a pure optimal strategy for Player P and a mixed one for Player E. In the present case both players have only mixed optimal strategies. Let us define the state of information in the game more precisely. Assume that the numbers $\ell_1 > 0$, $\ell > 0$ ($\ell_1 > \ell$) are given and represent, respectively, the delay in acquiring information on Player P by Player E, and vice versa.

For $0 \le t < \ell$, at each time instant $t$ Player P knows his state $x(t) = \{x^{(i)}(t)\}$, the time $t$ and Player E's state $y_0$ at the initial time instant. For $\ell \le t \le T$, at each instant $t$ Player P knows his state $x(t) = \{x^{(i)}(t)\}$, the time $t$ and Player E's state at the instant $t - \ell$. For $0 \le t < \ell_1$, at each instant $t$ Player E knows the time $t$, his state $y(t)$ and Player P's state $x_0 = \{x_0^{(i)}\}$ at the initial instant of time. For $\ell_1 \le t \le T$, at each instant $t$ Player E knows his state $y(t)$, the time $t$ and the state $x(t - \ell_1) = \{x^{(i)}(t - \ell_1)\}$ of Player $P = \{P_i\}$ at the instant $t - \ell_1$.
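The information structure just described reduces to one rule. A player whose information about the opponent has delay $d$ observes, at time $t$, the opponent's state at time $\max(0, t - d)$. The Python sketch below encodes this rule for the team P (delay $\ell$) and the evader E (delay $\ell_1$). The trajectories are supplied as functions of time; the function names and the sample trajectories are hypothetical.

```python
from typing import Callable, Tuple

State = Tuple[float, ...]


def delayed(traj: Callable[[float], State], t: float, delay: float) -> State:
    """Opponent state available at time t: traj(0) while t < delay, traj(t - delay) after."""
    return traj(max(0.0, t - delay))


def info_P(t: float, x, y, ell: float):
    """Information of the team P at time t: (t, x(t), y(0) or y(t - ell))."""
    return (t, x(t), delayed(y, t, ell))


def info_E(t: float, x, y, ell1: float):
    """Information of the evader E at time t: (t, y(t), x(0) or x(t - ell1))."""
    return (t, y(t), delayed(x, t, ell1))


# Hypothetical trajectories and delays (ell1 > ell, as in the text).
x = lambda t: (t, 0.0)
y = lambda t: (0.0, 2.0 * t)
print(info_P(1.0, x, y, ell=0.5))    # (1.0, (1.0, 0.0), (0.0, 1.0))
print(info_E(1.0, x, y, ell1=0.75))  # (1.0, (0.0, 2.0), (0.25, 0.0))
```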
Player E's payoff is defined as
$$\min_i \rho(x^{(i)}(T), y(T)), \qquad (6.6.2)$$
where $\rho(x^{(i)}, y)$ is the Euclidean distance between the points $x^{(i)}$ and $y$. Denote the game so defined by $\Gamma(x_0, y_0, T)$. As in §§1, 2, we introduce mixed and pure piecewise open-loop strategies.

Pure piecewise open-loop strategies. By the pure piecewise open-loop strategy $v(\cdot)$ of Player E is meant a pair $\{\tau, b\}$, where $\tau$ is a partition of the time interval $[0, T]$ by a finite number of points $0 = t_1 < t_2 < \dots < t_s = T$, and $b$ is a mapping with two parts. It places each state $t_k, x_0, y(t_k)$ for $0 \le t_k < \ell_1$ in correspondence with a measurable open-loop control $v(t)$ for $t \in [t_k, t_{k+1})$. It also places each state $t_k, x(t_k - \ell_1), y(t_k)$ for $\ell_1 \le t_k \le T$ in correspondence with a measurable open-loop control $v(t)$ for $t \in [t_k, t_{k+1})$. By the pure piecewise open-loop strategy $u(\cdot)$ of Player P is meant a pair …
Mixed piecewise open-loop behavior strategies. By the mixed piecewise open-loop behavior strategy $\nu(\cdot)$ of Player E is meant a pair $\{\tau, c\}$, where $\tau$ is a partition of the time interval $[0, T]$ by a finite number of points $0 = t_1 < t_2 < \dots < t_s = T$, and $c$ is a mapping with two parts. It places each state $t_k, x_0, y(t_k)$ for $0 \le t_k < \ell_1$ in correspondence with a probability measure $\nu_k$ (depending on $t_k, x_0, y(t_k)$) on Player E's reachability set $C_E^{t_{k+1} - t_k}(y(t_k))$ from the state $y(t_k)$ in the time $t_{k+1} - t_k$. It also places each state $t_k, y(t_k), x(t_k - \ell_1)$ for $\ell_1 \le t_k \le T$ in correspondence with a probability measure $\nu_k$ on Player E's reachability set $C_E^{t_{k+1} - t_k}(y(t_k))$. By the mixed piecewise open-loop behavior strategy $\mu(\cdot)$ of the team $P = \{P_1, \dots, P_m\}$ is meant a pair $\{\sigma, \psi\}$, where … in the time $t'_{k+1} - t'_k$, and places each state $t'_k, y(t'_k - \ell), x(t'_k)$ for $\ell \le t'_k \le T$ in correspondence with a probability measure $\mu_k$ (depending on $t'_k, y(t'_k - \ell), x(t'_k)$) on Player P's reachability set
$$C_P^{t'_{k+1} - t'_k}(x(t'_k)) = \prod_{i=1}^m C_{P_i}^{t'_{k+1} - t'_k}(x^{(i)}(t'_k))$$
in the time $t'_{k+1} - t'_k$.
For each pair $(u(\cdot), v(\cdot))$ of pure piecewise open-loop strategies (POLS) we may define the payoff function
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \min_i \rho(x^{(i)}(T), y(T)), \qquad (6.6.3)$$
where $x(t), y(t)$ are the players' trajectories realized when the strategies $u(\cdot), v(\cdot)$ are employed in the game $\Gamma(x_0, y_0, T)$. Since the game is an imperfect information game, an equilibrium in pure POLS may not exist in it. Hence the introduction of mixed piecewise open-loop behavior strategies (MPOLBS) is necessary. Each pair of strategies $\mu(\cdot), \nu(\cdot)$ and initial conditions $x_0, y_0$ induce a probability distribution on the final position of the game, i.e. on the set $\prod_{i=1}^m C_{P_i}^T(x_0^{(i)}) \times C_E^T(y_0)$. Hence the payoff is understood as the mathematical expectation of (6.6.3), denoted by $M(x_0, y_0; \mu(\cdot), \nu(\cdot))$. By the solution of the game $\Gamma(x_0, y_0, T)$ we mean finding an equilibrium in the MPOLBS class. The motion under MPOLBS $\mu(\cdot), \nu(\cdot)$ from the initial states $x_0, y_0$ is constructed as follows. Let $\mu(\cdot) = \ldots$
Let $C_{P_i}^T(x_0^{(i)})$ and $C_E^T(y_0)$ be the reachability sets of the team member $P_i$ and of Player E from the initial states $x_0^{(i)}$, $y_0$ at the time $T$. For simplicity, assume (though it is not important for what follows) that the set $C_E^T(y)$ is convex and compact for any $y$, $T$.
Consider a simultaneous auxiliary zero-sum game of pursuit $\Gamma_y$ on the set $C_E^\ell(y)$. The game proceeds as follows. Player P chooses $m$ points $\eta_i \in C_E^\ell(y)$, and Player E chooses a point $\xi \in C_E^\ell(y)$. The choices are made simultaneously and independently of one another, and Player E receives the payoff defined by formula (6.6.2). Since the function (6.6.2) is continuous and the pure strategy sets in the auxiliary game $\Gamma_y$ are compact, the game has an equilibrium in the class of mixed strategies. Denote the value of the game by $V(y)$ to emphasize its dependence on the strategy sets $[C_E^\ell(y)]^m$, $C_E^\ell(y)$ of the players P and E. Let $(\mu_y^*, \nu_y^*)$ be the equilibrium in the game $\Gamma_y$. For $y \in C_E^{T-\ell}(y_0)$, the following requirements are imposed on the family of games $\Gamma_y$.
1. For any $\varepsilon > 0$ there is $N$ such that in the game $\Gamma_y$ Player P has a mixed $\varepsilon$-optimal strategy $\mu_y^\varepsilon$ prescribing equal probabilities $1/N$ to some $N$ points ($m$-collections of points) $\eta_1(y), \dots, \eta_N(y)$ of the set $[C_E^\ell(y)]^m$. Moreover, the number $N$ chosen for a given $\varepsilon$ is independent of $y$ for all $y \in C_E^{T-\ell}(y_0)$.
2. Let $y(t)$ be an arbitrary admissible trajectory of Player E, $0 \le t \le T$. Then there are continuous nonintersecting trajectories $\eta_j[y(t - \ell)]$, $j = 1, \dots, N$, such that $\eta_j[y(t - \ell)] \in [C_E^\ell(y(t - \ell))]^m$ and each point $\eta_j[y(t - \ell)]$ belongs to the spectrum of the strategy $\mu_{y(t-\ell)}^\varepsilon$ in the game $\Gamma_{y(t-\ell)}$.
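The existence argument for the auxiliary game $\Gamma_y$ is non-constructive. One way to approximate its value numerically is sketched below. It is not the method of the text. The set $C_E^\ell(y)$ is replaced by a finite set of points, and the payoff matrix $\min_i \rho(\eta_i, \xi)$ is built over $m$-tuples $\eta$ of P's points and E's points $\xi$. Brown-Robinson fictitious play is then run on that matrix; for zero-sum matrix games it yields a lower and an upper bound that both converge to the value. The discretization, the function names and the toy instance (one pursuer, three points on a segment, value $0.5$) are hypothetical.

```python
import math
from itertools import product


def pursuit_matrix(points, m):
    """Discretized auxiliary game: rows are m-tuples of pursuer points (P minimizes),
    columns are evader points (E maximizes); entry = min_i dist(eta_i, xi), as in (6.6.2)."""
    rows = list(product(points, repeat=m))
    A = [[min(math.dist(eta, xi) for eta in row) for xi in points] for row in rows]
    return rows, A


def fictitious_play(A, iters=50000):
    """Brown-Robinson fictitious play for a zero-sum matrix game.

    Returns (lower, upper) with lower <= value <= upper: lower is guaranteed to E
    by its empirical mixed strategy, upper is conceded by P's empirical mixed strategy.
    """
    n_rows, n_cols = len(A), len(A[0])
    row_total = [0.0] * n_rows   # cumulative payoff of each row against E's past choices
    col_total = [0.0] * n_cols   # cumulative payoff of each column against P's past choices
    i, j = 0, 0
    for _ in range(iters):
        for r in range(n_rows):
            row_total[r] += A[r][j]
        for c in range(n_cols):
            col_total[c] += A[i][c]
        i = min(range(n_rows), key=row_total.__getitem__)   # P: best reply to E's history
        j = max(range(n_cols), key=col_total.__getitem__)   # E: best reply to P's history
    return min(row_total) / iters, max(col_total) / iters


# Toy instance: one pursuer (m = 1), three grid points on a segment.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
_, A = pursuit_matrix(points, m=1)
lower, upper = fictitious_play(A)
print(lower, upper)   # both close to 0.5, the value of this toy game
```

The row count grows as (number of grid points)$^m$, so this brute-force discretization is only practical for small teams and coarse grids.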
Theorem 13 Assume the following:
- the set $C_E^\ell(y)$ is convex and compact for any $y \in C_E^{T-\ell}(y_0)$;
- the above-mentioned conditions 1, 2 are satisfied;
- each team member $P_i$ can ensure by the time instant $T - \ell_1$ the fulfilment of the inclusion
$$C_{P_i}^{\ell_1}(x^{(i)}(T - \ell_1)) \supset C_E^{\ell}(y(T - \ell)) \qquad (6.6.4)$$
for all $y(T - \ell) \in C_E^{\ell_1}(y(T - \ell - \ell_1))$, irrespective of Player E's actions and proceeding only from the information acquired in the game $\Gamma(x_0, y_0, T)$;
- the team P, acting further on the time interval $[T - \ell_1, T]$, can ensure $\varepsilon$-capture of any point $\eta_j[y(T - \ell)]$ by the instant $T$ for any $\varepsilon > 0$. Here $\varepsilon$-capture means the arrival of the point $x = (x^{(1)}, \dots, x^{(m)})$ in the $\varepsilon$-neighbourhood of the point $\eta_j = (\eta_j^{(1)}, \dots, \eta_j^{(m)})$, and $\eta_j$, $j = 1, \dots, N$, are the collections of points defined by conditions 1, 2.
Player E then has the optimal MPOLBS which prescribes, for $0 \le t \le T - \ell$, an open-loop transition to the point $\bar y$ at which the value $V(y)$ of the auxiliary game is maximal. Further, at the instant $T - \ell$, the optimal MPOLBS prescribes a random choice of a point $y(T)$ in accordance with the mixed strategy $\nu_{\bar y}^*$, which is optimal in the game $\Gamma_{\bar y}$. It then prescribes an open-loop transition to the point $y(T)$ realized as a result of this random choice. For any $\varepsilon > 0$ Player P has the $\varepsilon$-optimal MPOLBS which, for $0 \le t$ …
Proof: Let $\mu^*(\cdot)$, $\nu^*(\cdot)$ be the strategies whose optimality is to be proved. It suffices to prove that the inequalities
$$M(x_0, y_0; u(\cdot), \nu^*(\cdot)) + \varepsilon \ge M(x_0, y_0; \mu^*(\cdot), \nu^*(\cdot)) \ge M(x_0, y_0; \mu^*(\cdot), v(\cdot)) - \varepsilon \qquad (6.6.5)$$
are satisfied for all pure piecewise open-loop strategies $u(\cdot), v(\cdot)$. From the definition of the strategies $\mu^*(\cdot), \nu^*(\cdot)$ it follows immediately that
$$|M(x_0, y_0; \mu^*(\cdot), \nu^*(\cdot)) - V(\bar y)| < \varepsilon. \qquad (6.6.6)$$
Therefore it suffices to prove that the inequalities
$$M(x_0, y_0; u(\cdot), \nu^*(\cdot)) \ge V(\bar y), \qquad (6.6.7)$$
$$M(x_0, y_0; \mu^*(\cdot), v(\cdot)) \le V(\bar y) \qquad (6.6.8)$$
are satisfied. To prove (6.6.7), consider two cases.
1. Assume that the strategy $u(\cdot)$ fails to ensure the fulfilment of the inclusion (6.6.4) by the instant $T - \ell_1$. Then inequality (6.6.7) follows from the monotonic dependence of the payoff (6.6.2) on $\rho$ and the convexity of the set $C_E^\ell(y)$.
2. Suppose the strategy $u(\cdot)$ ensures the fulfilment of the inclusion (6.6.4) but then directs the pursuit towards points outside the spectrum of Player P's optimal strategy in the game $\Gamma_{\bar y}$. Then inequality (6.6.7) follows from the optimality of the strategy $\nu_{\bar y}^*$ of Player E in the game $\Gamma_{\bar y}$.
Inequality (6.6.8) follows immediately from two facts: the optimal strategy of Player E contains the optimal strategy $\nu_{\bar y}^*$ of the game $\Gamma_{\bar y}$ as a behavior strategy, and the maximum value of the auxiliary game is achieved at the point $\bar y$.
To complete the proof, note t h a t the conditions 1, 2 on the class of games r „ are not rigid and are satisfied in most of the cases involved when the payoff in each game T is continuously dependent u p o n the parameter y. T h e other conditions of the theorem are the conditions concerning the perfect information games. •
6.7
Delayed information for both players when the evader team takes part in the game
Consider a generalization of the problem given in the previous section to the case when several evaders, acting as one player, take part in the game. We have a zero-sum differential pursuit-evasion game of prescribed duration between a team of pursuers $P=\{P_1,\dots,P_m\}$, acting as one player, and a team of evaders $E=\{E_1,\dots,E_n\}$, also acting as one player. The motion equations are of the form
$$P_i:\ \dot x^{(i)}=f^{(i)}(x^{(i)},u^{(i)}),\quad x^{(i)}\in R^k,\quad u^{(i)}\in U^{(i)},\quad x^{(i)}(0)=x_0^{(i)},\quad i=1,\dots,m;$$
$$E_j:\ \dot y^{(j)}=g^{(j)}(y^{(j)},v^{(j)}),\quad y^{(j)}\in R^k,\quad v^{(j)}\in V^{(j)},\quad y^{(j)}(0)=y_0^{(j)},\quad j=1,\dots,n. \tag{6.7.1}$$
Differential
212
games with incomplete
information
All the conditions which ensure the existence, uniqueness and continuability on $[0,T]$ of the solution of (6.7.1) are supposed to be satisfied for any measurable open-loop controls $u(t)=\{u^{(i)}(t)\}$, $v(t)=\{v^{(j)}(t)\}$. The information structure of the game is as follows. Given are numbers $\ell>0$ and $\ell_1>0$ representing, respectively, the delay of Player P's information about Player E and the delay of Player E's information about Player P. This means that, for $0\le t<\ell$, at each instant of time $t$ Player P knows his state $x(t)=\{x^{(i)}(t)\}$, the time $t$ and Player E's state at the initial instant, $y_0=\{y_0^{(j)}\}$. For $\ell\le t\le T$, Player P knows his state $x(t)=\{x^{(i)}(t)\}$, the time $t$ and Player E's state $y(t-\ell)=\{y^{(j)}(t-\ell)\}$ at the time instant $t-\ell$. Similarly, for $0\le t<\ell_1$, at each instant of time $t$ Player E knows his state $y(t)=\{y^{(j)}(t)\}$, the time $t$ and Player P's state at the initial instant, $x_0=\{x_0^{(i)}\}$. For $\ell_1\le t\le T$, Player E knows his state $y(t)$, the time $t$ and Player P's state $x(t-\ell_1)=\{x^{(i)}(t-\ell_1)\}$ at the time instant $t-\ell_1$.
Player E's payoff is terminal and is defined as follows. Consider the matrix $A(x,y)=(\rho(x^{(i)},y^{(j)}))$, where $\rho(x^{(i)},y^{(j)})$ is the Euclidean distance between the points $x^{(i)},y^{(j)}\in R^k$; $x=(x^{(1)},\dots,x^{(m)})$, $y=(y^{(1)},\dots,y^{(n)})$. Introduce an order relation on the set of matrices. We say that $A>B$ if all elements $b_{ij}$ of the matrix $B$ do not exceed the corresponding elements $a_{ij}$ of the matrix $A$ and there exist indices $(i_0,j_0)$ such that $a_{i_0j_0}>b_{i_0j_0}$. A real function $F$ defined on the set of $m\times n$ matrices is monotonically increasing if $F(A)>F(B)$ for $A>B$. Let the payoff of Player E be $K(x,y)=F(A(x,y))$, where $F$ is a real, continuous, monotonically increasing function.
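The payoff construction can be illustrated numerically. The Python sketch below builds the distance matrix $A(x,y)$, implements the order relation $A>B$ exactly as defined above, and evaluates $K=F(A)$ for one admissible choice of $F$. The choice $F(A)=\sum_{i,j}a_{ij}$ and all function names are hypothetical, used only for illustration; any real, continuous $F$ that is strictly increasing with respect to this order would serve.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Matrix = List[List[float]]


def distance_matrix(xs: Sequence[Point], ys: Sequence[Point]) -> Matrix:
    """A(x, y) = (rho(x^(i), y^(j))): row i for pursuer P_i, column j for evader E_j."""
    return [[math.dist(x, y) for y in ys] for x in xs]


def dominates(a: Matrix, b: Matrix) -> bool:
    """A > B in the sense of the text: a_ij >= b_ij for all (i, j), strictly for at least one."""
    pairs = [(aij, bij) for row_a, row_b in zip(a, b) for aij, bij in zip(row_a, row_b)]
    return all(aij >= bij for aij, bij in pairs) and any(aij > bij for aij, bij in pairs)


def f_sum(a: Matrix) -> float:
    """Hypothetical choice of F: the sum of all entries (strictly increasing w.r.t. '>')."""
    return sum(sum(row) for row in a)


def payoff(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """K(x, y) = F(A(x, y)) with the illustrative F = f_sum."""
    return f_sum(distance_matrix(xs, ys))
```

For example, with one evader at $(0,4)$ and pursuers at $(0,0)$ and $(3,0)$ the matrix is $[[4],[5]]$; moving the first pursuer to $(0,1)$ gives $[[3],[5]]$, which is dominated by the former, so Player E's payoff decreases.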
By the pure piecewise open-loop strategy $u(\cdot)$ of Player P is meant a pair $\{\sigma,a\}$, where $\sigma$ is an arbitrary partition of the time interval $[0,T]$ by a finite number of points $0=t_1<t_2<\dots<t_s=T$ and $a$ is a mapping which, for $0\le t_k<\ell$, places each state $x(t_k)=\{x^{(i)}(t_k)\}$, $y_0=\{y_0^{(j)}\}$, $t_k$ in correspondence with a measurable open-loop control $u(t)=\{u^{(i)}(t)\}$ defined on $[t_k,t_{k+1})$, and, for $\ell\le t_k<T$, places each state $x(t_k)$, $y(t_k-\ell)=\{y^{(j)}(t_k-\ell)\}$, $t_k$ in correspondence with a measurable open-loop control $u(t)=\{u^{(i)}(t)\}$ defined on $[t_k,t_{k+1})$. Similarly, by the pure piecewise open-loop strategy $v(\cdot)$ of Player E is meant a pair $\{\tau,b\}$, where $\tau$ is an arbitrary partition of the time interval $[0,T]$ by a finite number of points $0=t'_1<t'_2<\dots<t'_{s'}=T$ and $b$ is a mapping which, for $0\le t'_k<\ell_1$, places each state $x_0$, $y(t'_k)$, $t'_k$ in correspondence with a measurable open-loop control $v(t)=\{v^{(j)}(t)\}$ defined on the time interval $[t'_k,t'_{k+1})$, and, for $\ell_1\le t'_k<T$, places each state $x(t'_k-\ell_1)$, $y(t'_k)$, $t'_k$ in correspondence with a measurable open-loop control $v(t)=\{v^{(j)}(t)\}$ defined on $[t'_k,t'_{k+1})$.
For any pair of pure piecewise open-loop strategies $u(\cdot)$, $v(\cdot)$ the payoff function is defined uniquely:
$$K(x_0,y_0;u(\cdot),v(\cdot))=F(A(x(T),y(T))), \tag{6.7.2}$$
where $x(t)$, $y(t)$ constitute a solution of (6.7.1) from the initial states $x_0$, $y_0$ in the situation $(u(\cdot),v(\cdot))$. Denote the game in hand by $\Gamma(x_0,y_0,T)$.
By the mixed piecewise open-loop behavior strategy (MPOLBS) $\mu(\cdot)$ of Player P is meant a pair $\{\sigma,d\}$, where $\sigma$ is an arbitrary partition of the time interval $[0,T]$ by a finite number of points $0=t_1<t_2<\dots<t_s=T$ and $d$ is a mapping which, for $0\le t_k<\ell$, places each state $x(t_k)$, $y_0$, $t_k$ in correspondence with a probability measure $\mu_k$ on Player P's reachability set in the time $t_{k+1}-t_k$ from the state $x(t_k)$,
$$C_P^{t_{k+1}-t_k}(x(t_k))=\prod_{i=1}^{m}C_{P_i}^{t_{k+1}-t_k}(x^{(i)}(t_k)),$$
and, for $\ell\le t_k<T$, places each state $x(t_k)$, $y(t_k-\ell)$, $t_k$ in correspondence with a probability measure $\mu_k$ on Player P's reachability set $C_P^{t_{k+1}-t_k}(x(t_k))$. Similarly, by the MPOLBS $\nu(\cdot)$ of Player E is meant a pair $\{\tau,c\}$, where $\tau$ is an arbitrary partition of the time interval $[0,T]$ by a finite number of points $0=t'_1<t'_2<\dots<t'_{s'}=T$ and $c$ is a mapping which, for $0\le t'_k<\ell_1$, places each state $x_0$, $y(t'_k)$, $t'_k$ in correspondence with a probability measure $\nu_k$ on Player E's reachability set in the time $t'_{k+1}-t'_k$ from the state $y(t'_k)$,
$$C_E^{t'_{k+1}-t'_k}(y(t'_k))=\prod_{j=1}^{n}C_{E_j}^{t'_{k+1}-t'_k}(y^{(j)}(t'_k)),$$
and, for $\ell_1\le t'_k<T$, places each state $x(t'_k-\ell_1)$, $y(t'_k)$, $t'_k$ in correspondence with a probability measure $\nu_k$ on the reachability set $C_E^{t'_{k+1}-t'_k}(y(t'_k))$.
The sets of all MPOLBS of the players P and E are denoted by $\mathcal{P}$ and $\mathcal{E}$, respectively. With the initial states fixed, each MPOLBS pair $\mu(\cdot)$, $\nu(\cdot)$ induces a probability distribution over the sets of final states of the game, $C_P^T(x_0)$ and $C_E^T(y_0)$; therefore, by the payoff of Player E we mean the mathematical expectation $M(x_0,y_0;\mu(\cdot),\nu(\cdot))$ of the payoff $K(x_0,y_0;u(\cdot),v(\cdot))$. We have thus defined a mixed extension $\bar\Gamma(x_0,y_0,T)$ of the game $\Gamma(x_0,y_0,T)$.
Let us describe how, in the situation $(\mu(\cdot),\nu(\cdot))\in\mathcal{P}\times\mathcal{E}$, the motion from the initial states $x_0$, $y_0$ is constructed. Denote the union of the partitions $\sigma=\{t_k\}$ and $\tau=\{t'_k\}$ by $\omega=\{t''_k\}$. At the instant $t''_1=0$ the players P and E realize the measures $\mu_1$ and $\nu_1$ prescribed respectively by the mappings $d$ and $c$ in the state $x_0$, $y_0$, and make transitions to the points $x(t''_2)=\{x^{(i)}(t''_2)\}$, $x^{(i)}(t''_2)\in C_{P_i}^{t''_2}(x_0^{(i)})$, $i=1,\dots,m$, and $y(t''_2)=\{y^{(j)}(t''_2)\}$, $y^{(j)}(t''_2)\in C_{E_j}^{t''_2}(y_0^{(j)})$, $j=1,\dots,n$, using any open-loop controls which carry the point $x_0^{(i)}$ over to $x^{(i)}(t''_2)$ and the point $y_0^{(j)}$ over to $y^{(j)}(t''_2)$; here the points $x(t''_2)$ and $y(t''_2)$ are chosen randomly in accordance with the probability measures $\mu_1$ and $\nu_1$ generated by the strategies $\mu(\cdot)$ and $\nu(\cdot)$. In the states $x(t''_2)$ and $y(t''_2)$ the probability measures $\mu_2$ and $\nu_2$ dictated by the maps $d$ and $c$ are again realized, and the players make transitions to the randomly chosen points $x(t''_3)\in C_P^{t''_3-t''_2}(x(t''_2))$, $y(t''_3)\in C_E^{t''_3-t''_2}(y(t''_2))$, and so on. As a result of this sequential choice of behaviors, the random trajectories $x(t)$ and $y(t)$ from the initial states $x_0$ and $y_0$ corresponding to the situation $(\mu(\cdot),\nu(\cdot))$
are realized.

Let all reachability sets $C_{E_j}^{\ell}(y^{(j)})$ be compact, and let $D_E(y)$ be the convex hull of the set $\bigcup_{j=1}^{n}C_{E_j}^{\ell}(y^{(j)})$. Consider the simultaneous auxiliary zero-sum game
$$\Gamma(y)=\langle[D_E(y)]^m,\ C_E^{\ell}(y),\ K\rangle.$$
The game proceeds as follows. Player $P=\{P_1,\dots,P_m\}$ chooses $m$ points $\xi^{(i)}$ of the set $D_E(y)$, and Player E chooses $n$ points $\eta^{(j)}\in C_{E_j}^{\ell}(y^{(j)})$. The choices are made simultaneously and independently of one another. Player E receives the payoff $F(A(\xi^{(1)},\dots,\xi^{(m)};\eta^{(1)},\dots,\eta^{(n)}))$. Since the payoff function of Player E is continuous and the sets of pure strategies of the players are compact, the game $\Gamma(y)$ has an equilibrium in the class of mixed strategies. Let $\mathrm{Val}\,\Gamma(y)=V(y)$, and let $(\mu_y^{\varepsilon},\nu_y^{*})$ be an $\varepsilon$-equilibrium in the game $\Gamma(y)$. Suppose the following two conditions are satisfied.

1. For any $\varepsilon>0$ there is a number $N$ such that in the game $\Gamma(y)$ Player P has an $\varepsilon$-optimal strategy $\mu_y^{\varepsilon}$ prescribing equal probabilities $1/N$ to the points $\xi_1,\dots,\xi_N$ of the set $[D_E(y)]^m$, the number $N$ being independent of $y$ for all $y\in C_E^{T-\ell}(y_0)$.

2. Let $y(t)=\{y^{(j)}(t)\}$, $0\le t\le T$, be an arbitrary admissible motion of Player E. Then there are continuous nonintersecting trajectories $\xi_j[y(t-\ell)]$, $j=1,\dots,N$, such that $\xi_j[y(t-\ell)]\in[D_E(y(t-\ell))]^m$ and each $\xi_j[y(t-\ell)]$ is a point of the spectrum of the strategy $\mu^{\varepsilon}_{y(t-\ell)}$ in the game $\Gamma(y(t-\ell))$.
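The sequential construction of a random motion from a pair of MPOLBS, described earlier in this section, can be sketched in Python for the simplest case $m=n=1$. The sketch assumes simple motions in the plane with speed bounds $\alpha$ (pursuer) and $\beta$ (evader), so that the reachability set in time $\Delta t$ is a closed disk of radius $\alpha\Delta t$ or $\beta\Delta t$. It further assumes, hypothetically, that every behavior measure is the uniform distribution on that disk, and it ignores the delayed information that the mappings $d$ and $c$ may use. Only the positions at the points of $\omega=\sigma\cup\tau$ are generated; between them, any admissible open-loop control joining consecutive points (for instance a straight segment at constant speed) may be used.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def union_partition(sigma: List[float], tau: List[float]) -> List[float]:
    """omega = sigma ∪ tau: sorted partition points of both players, duplicates removed."""
    return sorted(set(sigma) | set(tau))


def sample_disk(center: Point, radius: float, rng: random.Random) -> Point:
    """Uniform sample from the closed disk B(center, radius); a hypothetical behaviour measure."""
    rad = radius * math.sqrt(rng.random())  # sqrt gives a density uniform in area
    ang = 2.0 * math.pi * rng.random()
    return (center[0] + rad * math.cos(ang), center[1] + rad * math.sin(ang))


def random_motion(x0: Point, y0: Point, alpha: float, beta: float,
                  sigma: List[float], tau: List[float], seed: int = 0):
    """Sample one pair of random trajectories at the points of omega for simple motions
    with speed bounds alpha (pursuer) and beta (evader)."""
    rng = random.Random(seed)
    omega = union_partition(sigma, tau)
    xs: List[Point] = [x0]
    ys: List[Point] = [y0]
    for t_k, t_next in zip(omega, omega[1:]):
        dt = t_next - t_k
        # reachability sets over [t_k, t_next] are disks of radii alpha*dt and beta*dt
        xs.append(sample_disk(xs[-1], alpha * dt, rng))
        ys.append(sample_disk(ys[-1], beta * dt, rng))
    return omega, xs, ys
```

Each consecutive pair of sampled points is at most $\alpha\Delta t$ (respectively $\beta\Delta t$) apart, so a constant-speed straight transition between them respects the speed bound.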
Theorem 14 Assume that, using his accessible information, Player P can ensure by the time instant $T-\ell_1$ the inclusions
$$D_E(y(T-\ell))\subset C_{P_i}^{\ell_1}(x^{(i)}(T-\ell_1)),\quad i=1,\dots,m, \tag{6.7.3}$$
for all points $y(T-\ell)\in C_E^{\ell_1}(y(T-\ell-\ell_1))$ and, acting further on the time interval $[T-\ell_1,T]$, he can ensure by the instant $T$, for any number $\varepsilon_1>0$, the $\varepsilon_1$-capture of any of the points $\xi_j[y(T-\ell)]$ determined by conditions 1 and 2. Then Player $E=\{E_1,\dots,E_n\}$ has the optimal MPOLBS which is formulated as follows. For $0\le t<T-\ell$ he has to make an open-loop transition to the point $\bar y=(\bar y^{(1)},\dots,\bar y^{(n)})$ at which the value $V(y)$ of the auxiliary game is maximal over $y\in C_E^{T-\ell}(y_0)$; at the instant $T-\ell$ he has to choose a point $y'\in C_E^{\ell}(\bar y)$ in accordance with the mixed strategy $\nu^{*}_{\bar y}$ that is optimal in the game $\Gamma(\bar y)$ (making a random choice) and make an open-loop transition to the point $y'=y(T)$ realized as a result of this random choice. For any number $\varepsilon>0$ the following MPOLBS is $4\varepsilon$-optimal for Player $P=\{P_1,\dots,P_m\}$. For $0\le t<T-\ell_1$ he has to act (in a deterministic way) so as to ensure inclusions (6.7.3); then, with probability $1/N$, he chooses one of the points $\xi_j[y(T-\ell-\ell_1)]$ and ensures by the instant $T$ the $\varepsilon_1$-capture ($\varepsilon_1=\varepsilon_1(\varepsilon)$) of the point $\xi_j$ moving along the trajectory $\xi_j[y(t-\ell)]$.
Proof: Note that the fulfilment of the inequalities $\rho(x^{(i)},\xi^{(i)})<\varepsilon$, $i=1,\dots,m$, is called the $\varepsilon$-capture of the point $\xi=\{\xi^{(i)}\}$ by the point $x=\{x^{(i)}\}$. Let $\mu^{\varepsilon}(\cdot)$, $\nu^{*}(\cdot)$ be the strategies described in the theorem. We prove that
$$M(x_0,y_0;\mu^{\varepsilon}(\cdot),\nu(\cdot))-4\varepsilon\le M(x_0,y_0;\mu^{\varepsilon}(\cdot),\nu^{*}(\cdot))\le M(x_0,y_0;\mu(\cdot),\nu^{*}(\cdot))+4\varepsilon \tag{6.7.4}$$
for all MPOLBS $\mu(\cdot)$, $\nu(\cdot)$. One can show that (6.7.4) follows from the inequalities
$$M(x_0,y_0;\mu^{\varepsilon}(\cdot),v(\cdot))-4\varepsilon\le M(x_0,y_0;\mu^{\varepsilon}(\cdot),\nu^{*}(\cdot))\le M(x_0,y_0;u(\cdot),\nu^{*}(\cdot))+4\varepsilon \tag{6.7.5}$$
for all pure piecewise open-loop strategies $u(\cdot)$, $v(\cdot)$. Let $x^{\varepsilon}(t)$ and $y^{*}(t)$ be the trajectories of the players P and E respectively resulting from the employment of the strategies $\mu^{\varepsilon}(\cdot)$, $\nu^{*}(\cdot)$. Then the point $x^{\varepsilon}(T)$ with probability $1/N$ falls within the $\varepsilon_1$-neighbourhood of the point $\xi_j[y(T-\ell)]$ of the spectrum of the strategy $\mu^{\varepsilon}_{\bar y}$ that is $\varepsilon$-optimal in the game $\Gamma(\bar y)$. The point $y^{*}(T)$ is a realization of the strategy $\nu^{*}_{\bar y}$ that is optimal for Player E in the game $\Gamma(\bar y)$. Since the payoff function depends continuously on $\rho(x^{(i)},y^{(j)})$, for any number $\varepsilon>0$ there is a number $\varepsilon_1>0$ such that, as soon as the point $x^{\varepsilon}(T)$ with probability $1/N$ falls within the $\varepsilon_1$-neighbourhood of the point $\xi_j[y(T-\ell)]$, the inequality
$$|M(x_0,y_0;\mu^{\varepsilon}(\cdot),\nu^{*}(\cdot))-M(\mu^{\varepsilon}_{\bar y},\nu^{*}_{\bar y})|<\varepsilon \tag{6.7.6}$$
is satisfied, where $M(\mu^{\varepsilon}_{\bar y},\nu^{*}_{\bar y})$ is the mathematical expectation of the payoff in the game $\Gamma(\bar y)$ when the mixed strategies $\mu^{\varepsilon}_{\bar y}$, $\nu^{*}_{\bar y}$ are used. On the other hand, the situation $(\mu^{\varepsilon}_{\bar y},\nu^{*}_{\bar y})$ is $\varepsilon$-optimal in the game $\Gamma(\bar y)$, and hence
$$|M(\mu^{\varepsilon}_{\bar y},\nu^{*}_{\bar y})-V(\bar y)|<\varepsilon. \tag{6.7.7}$$
From (6.7.6) and (6.7.7) we obtain $|M(x_0,y_0;\mu^{\varepsilon}(\cdot),\nu^{*}(\cdot))-V(\bar y)|<2\varepsilon$. Hence, to prove (6.7.5), and thus the theorem, it suffices to show that the inequalities
$$M(x_0,y_0;\mu^{\varepsilon}(\cdot),v(\cdot))-2\varepsilon\le V(\bar y), \tag{6.7.8}$$
$$M(x_0,y_0;u(\cdot),\nu^{*}(\cdot))+2\varepsilon\ge V(\bar y) \tag{6.7.9}$$
are satisfied for all pure piecewise open-loop strategies $u(\cdot)$, $v(\cdot)$. We first prove inequality (6.7.8). Let Player P adopt the strategy $\mu^{\varepsilon}(\cdot)$, and Player E an arbitrary pure piecewise open-loop strategy $v(\cdot)$. Then, under the strategy $\mu^{\varepsilon}(\cdot)$, Player P makes a random choice at the instant $T-\ell_1$ and, by the game termination instant $T$, he appears with probability $1/N$ in the $\varepsilon_1$-neighbourhood of the point $\xi_j[y(T-\ell)]$ belonging to the spectrum of the strategy $\mu^{\varepsilon}_{y(T-\ell)}$ that is $\varepsilon$-optimal in the game $\Gamma(y(T-\ell))$. Whatever the strategy $v(\cdot)$ of Player E and its corresponding trajectory $y(t)$ may be, the inclusion $y(T)\in C_E^{\ell}(y(T-\ell))$ is satisfied, i.e. $y(T)$ is a pure strategy of Player E in the game $\Gamma(y(T-\ell))$. Since the delay of Player E's information about Player P equals $\ell_1>0$, the result of Player P's random choice remains unknown to Player E until the game terminates, and Player P's behavior proves to be strategically equivalent to an equiprobable choice of points at the instant $T$. Next, since the payoff function depends continuously on $\rho(x^{(i)},y^{(j)})$ and the set $C_E^{\ell}(y(T-\ell))$ is compact, for any number $\varepsilon>0$ there is a number $\varepsilon_1>0$ such that, as soon as $\rho(x^{(i)}(T),\xi_j^{(i)}[y(T-\ell)])<\varepsilon_1$, $i=1,\dots,m$, the inequality
$$|M(x_0,y_0;\mu^{\varepsilon}(\cdot),v(\cdot))-M(\mu^{\varepsilon}_{y(T-\ell)},y(T))|<\varepsilon \tag{6.7.10}$$
is satisfied, where $M(\mu^{\varepsilon}_{y(T-\ell)},y(T))$ is the mathematical expectation of the payoff in the auxiliary game $\Gamma(y(T-\ell))$ and $y(t)$ is the trajectory resulting from the employment of the strategy $v(\cdot)$. On the other hand, the strategy $\mu^{\varepsilon}_{y(T-\ell)}$ is $\varepsilon$-optimal in the game $\Gamma(y(T-\ell))$:
$$M(\mu^{\varepsilon}_{y(T-\ell)},y(T))-\varepsilon\le V(y(T-\ell)). \tag{6.7.11}$$
It follows from inequalities (6.7.10), (6.7.11) that $M(x_0,y_0;\mu^{\varepsilon}(\cdot),v(\cdot))-2\varepsilon<V(y(T-\ell))$; here the number $\varepsilon_1$ can be chosen the same for all strategies $v(\cdot)$. Since $V(y(T-\ell))\le V(\bar y)$, this implies inequality (6.7.8).
We now prove inequality (6.7.9). Let Player E make a random choice at the time instant $T-\ell$; then Player P does not know the result of this choice until the game terminates, and therefore he cannot move to the point realized as a result of this random choice. Consider two cases.

1. Let the strategy $u(\cdot)$ of Player P be such that $x(T)\in[D_E(\bar y)]^m$, where $x(t)$ is the pursuer's trajectory corresponding to this strategy. Then $x(T)$ is a pure strategy of Player P in the game $\Gamma(\bar y)$, and hence inequality (6.7.9) is satisfied because of the optimality of the strategy $\nu^{*}_{\bar y}$ in the game $\Gamma(\bar y)$.

2. Let $u(\cdot)$ be a strategy of Player P such that at least one of the points $x^{(i)}(T)$ does not belong to $D_E(\bar y)$; say, $x^{(1)}(T)\notin D_E(\bar y)$. Writing $x^{(1)}=x^{(1)}(T)$, consider the quantity
$$\rho(x^{(1)},\xi_0)=\min_{\xi\in D_E(\bar y)}\rho(x^{(1)},\xi),$$
where $\xi_0$ is a boundary point of the set $D_E(\bar y)$. Since the set $D_E(\bar y)$ is convex, the inequality $\langle x^{(1)}-\xi_0,\,y-\xi_0\rangle\le 0$, where $\langle x,y\rangle$ is the scalar product of vectors $x,y\in R^k$, holds for any point $y$ of this set. Therefore
$$\rho^2(x^{(1)},y)-\rho^2(\xi_0,y)=\langle x^{(1)}-y,\,x^{(1)}-y\rangle-\langle\xi_0-y,\,\xi_0-y\rangle=\|x^{(1)}-\xi_0\|^2-2\langle x^{(1)}-\xi_0,\,y-\xi_0\rangle\ge\|x^{(1)}-\xi_0\|^2>0.$$
Consequently, $\rho(x^{(1)},y)>\rho(\xi_0,y)$ for any point $y\in D_E(\bar y)$ and, in particular, for the points forming the spectrum of the strategy $\nu^{*}_{\bar y}$ in the game $\Gamma(\bar y)$. Then the matrix $A(x,y)=(\rho(x^{(i)}(T),y^{(j)}(T)))$ can be compared with the matrix $B(x,y)$, which differs from it only in the first row, where the point $\xi_0$ stands instead of the point $x^{(1)}(T)$, and $A(x,y)>B(x,y)$. The function $F$ increases monotonically; hence, for the strategy $u(\cdot)$, Player P can select a dominating strategy $\bar u(\cdot)$ such that $\bar x^{(i)}(T)\in D_E(\bar y)$, $i=1,\dots,m$, where $\bar x(t)$ is Player P's trajectory corresponding to the strategy $\bar u(\cdot)$, and we have made a transition to the first case. ■
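The projection argument in case 2 can be checked numerically. The Python sketch below takes a disk as a hypothetical convex set $D_E(\bar y)$ (in Example 1 below, $D_E(y)$ is indeed a disk), projects an exterior point $x^{(1)}$ onto it, and verifies on random points $y$ of the disk that $\langle x^{(1)}-\xi_0,\,y-\xi_0\rangle\le 0$ and $\rho(x^{(1)},y)>\rho(\xi_0,y)$. It illustrates the inequality under these assumptions and is not a proof.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def project_to_disk(x: Point, center: Point, radius: float) -> Point:
    """Nearest point xi0 of the closed disk B(center, radius) to x (Euclidean projection)."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= radius:
        return x  # x already lies in the set
    s = radius / d
    return (center[0] + s * dx, center[1] + s * dy)


def check_projection_property(x: Point, center: Point, radius: float,
                              trials: int = 1000, seed: int = 1) -> bool:
    """For random y in the disk, check <x - xi0, y - xi0> <= 0 and rho(x, y) > rho(xi0, y)."""
    rng = random.Random(seed)
    xi0 = project_to_disk(x, center, radius)
    for _ in range(trials):
        r = radius * math.sqrt(rng.random())
        a = 2.0 * math.pi * rng.random()
        y = (center[0] + r * math.cos(a), center[1] + r * math.sin(a))
        inner = (x[0] - xi0[0]) * (y[0] - xi0[0]) + (x[1] - xi0[1]) * (y[1] - xi0[1])
        if inner > 1e-12:
            return False
        if not math.dist(x, y) > math.dist(xi0, y):
            return False
    return True
```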
We now prove two theorems which enable us to isolate a sufficiently wide class of auxiliary games satisfying conditions 1 and 2. Consider an infinite two-person zero-sum game $\Gamma$. Let $X$, $Y$ be the pure strategy sets of the minimizer and the maximizer respectively, and let $K(x,y)$ be the kernel of the game;
$$K(\xi,\nu)=\int_X\int_Y K(x,y)\,\xi(dx)\,\nu(dy)$$
is the expectation of the maximizer's payoff, where $\xi$ is a probability measure on the Borel field of subsets of $X$ and $\nu$ is a probability measure on the Borel field of subsets of $Y$.

Theorem 15 Let $X$ and $Y$ be compact subsets of the spaces $R^n$ and $R^m$ respectively, and let $K$ be a continuous function defined on the set $X\times Y$. Then for any number $\varepsilon>0$ there are numbers $N$ and $M$ such that in the game $\Gamma$ the players have mixed $\varepsilon$-optimal strategies $\xi_N^{\varepsilon}$ and $\eta_M^{\varepsilon}$ prescribing equal probabilities $1/N$ and $1/M$ respectively to $N$ points $x_i\in X$, $i=1,\dots,N$, and $M$ points $y_j\in Y$, $j=1,\dots,M$.
Proof: Let us define the Borel field $\mathfrak{X}$ of subsets of the set $X$ with the help of the metric
$$\rho_1(x_1,x_2)=\max_{y\in Y}|K(x_1,y)-K(x_2,y)|$$
and the Borel field $\mathfrak{Y}$ of subsets of the set $Y$ with the help of the metric
$$\rho_2(y_1,y_2)=\max_{x\in X}|K(x,y_1)-K(x,y_2)|.$$
We shall say that the strategies $x_1$ and $x_2$ ($y_1$ and $y_2$) for which $\rho_1(x_1,x_2)=0$ ($\rho_2(y_1,y_2)=0$) are equivalent. A. Wald showed that if the strategy space of one of the players in the zero-sum game $\Gamma$ is conditionally compact in the topology under consideration, then for any number $\varepsilon>0$ the game has an $\varepsilon$-equilibrium, and each of the players has an $\varepsilon$-optimal strategy which is a mixture of a finite number of pure strategies. From the conditions of the theorem it follows that this assertion is applicable to the game $\Gamma$. Let $\xi_{\varepsilon}=(p_1,\dots,p_r)$ be a finite $\frac{\varepsilon}{2}$-optimal strategy for the minimizer, $p_k$ being the probability of choosing the pure strategy $x_k\in X$. We may always suppose that all $p_k$ are rational. Reducing them to a common denominator, we write $p_k=l_k/N$, where $l_k$ are integers and $\sum_{k=1}^{r}l_k=N$. Since the function $K$ is uniformly continuous on the compact set $X\times Y$, for any number $\varepsilon>0$ there is a number $\delta>0$ such that the inequality $\rho(x',x'')<\delta$ implies $|K(x',y)-K(x'',y)|<\varepsilon/2$ for any $y\in Y$. The strategy $\xi_N^{\varepsilon}$ is constructed as follows. Let it prescribe equal probabilities $1/N$ to noncoincident points $x'_i$ such that $\rho(x'_i,x_k)<\delta$ for $i=\sum_{s=0}^{k-1}l_s+1,\dots,\sum_{s=0}^{k-1}l_s+l_k$, $k=1,\dots,r$ ($l_0=0$). Thus, the points $x'_i$ are grouped in the $\delta$-neighbourhoods of the points $x_k$, with $l_k$ points in each neighbourhood. The number $\delta$ is chosen so that these neighbourhoods do not intersect. Then for any element $y\in Y$ the following condition is satisfied (the inner sums run over the $l_k$ indices $i$ assigned to $x_k$):
$$|M(\xi_N^{\varepsilon},y)-M(\xi_{\varepsilon},y)|=\frac1N\Bigl|\sum_{k=1}^{r}\sum_{i}\bigl(K(x'_i,y)-K(x_k,y)\bigr)\Bigr|\le\frac1N\sum_{k=1}^{r}\sum_{i}|K(x'_i,y)-K(x_k,y)|<\frac1N\cdot N\cdot\frac{\varepsilon}{2}=\frac{\varepsilon}{2}. \tag{6.7.12}$$
For the strategy $\xi_{\varepsilon}$ we have
$$M(\xi_{\varepsilon},y)\le V+\frac{\varepsilon}{2}, \tag{6.7.13}$$
where $V$ is the value of the game $\Gamma$. From inequalities (6.7.12), (6.7.13) it follows that $M(\xi_N^{\varepsilon},y)<V+\varepsilon$ for all $y\in Y$. The maximizer's strategy $\eta_M^{\varepsilon}$ is constructed in much the same way and satisfies the inequality $M(x,\eta_M^{\varepsilon})>V-\varepsilon$ for all $x\in X$. Then
$$M(\xi_N^{\varepsilon},y)-\varepsilon<V<M(x,\eta_M^{\varepsilon})+\varepsilon$$
for all pure strategies $x$, $y$, i.e. the strategies $\xi_N^{\varepsilon}$ and $\eta_M^{\varepsilon}$ are $\varepsilon$-optimal. This completes the proof of the theorem. ■
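The step from a finite $\frac{\varepsilon}{2}$-optimal mixture with rational weights to an equal-weight strategy can be sketched in Python for a one-dimensional strategy set. The sketch writes $p_k=l_k/N$ with a common denominator, places $l_k$ distinct points in the $\delta$-neighbourhood of each $x_k$, and measures the left-hand side of (6.7.12) on a grid of $y$. The kernel $K(x,y)=(x-y)^2$ on $[0,1]^2$, the points and the weights are hypothetical; for this kernel $|K(x',y)-K(x,y)|\le 2|x'-x|$, so $\delta=\varepsilon/4$ yields the bound $\varepsilon/2$. The sketch checks only the perturbation bound (6.7.12), not the $\frac{\varepsilon}{2}$-optimality of the original mixture.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Callable, List, Tuple


def common_denominator(weights: List[Fraction]) -> Tuple[int, List[int]]:
    """Write rational weights p_k = l_k / N with a common denominator N (sum of l_k = N)."""
    n = reduce(lambda a, b: a * b // gcd(a, b), (w.denominator for w in weights), 1)
    return n, [int(w * n) for w in weights]


def equal_weight_points(points: List[float], weights: List[Fraction],
                        delta: float) -> Tuple[int, List[float]]:
    """Replace sum_k p_k [x_k] by N distinct points of weight 1/N each,
    l_k of them inside the delta-neighbourhood of x_k (1-D sketch)."""
    n, counts = common_denominator(weights)
    spread: List[float] = []
    for x, l_k in zip(points, counts):
        # offsets delta*(j+1)/(l_k+1) lie strictly between 0 and delta and are distinct
        spread.extend(x + delta * (j + 1) / (l_k + 1) for j in range(l_k))
    return n, spread


def expected_payoff(points: List[float], probs: List[float], y: float,
                    kernel: Callable[[float, float], float]) -> float:
    """M(xi, y) for a finite mixed strategy xi."""
    return sum(p * kernel(x, y) for x, p in zip(points, probs))


def max_deviation(points, weights, delta, kernel, ys) -> float:
    """max over y in ys of |M(xi_N, y) - M(xi, y)|, the quantity bounded in (6.7.12)."""
    n, spread = equal_weight_points(points, weights, delta)
    probs = [float(w) for w in weights]
    return max(abs(expected_payoff(spread, [1.0 / n] * n, y, kernel)
                   - expected_payoff(points, probs, y, kernel)) for y in ys)
```

With the points $0.1, 0.5, 0.9$ and weights $1/6, 1/2, 1/3$ one gets $N=6$ and $l=(1,3,2)$. The neighbourhoods must be disjoint (here $\delta$ is far smaller than the spacing $0.4$), which keeps all $N$ points distinct.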
The Cartesian product of the sets $[Q_j]^{(\delta)}$, where $[Q_j]^{(\delta)}$ is the set of all points $q\in R^n$ for each of which there is a point $q'\in Q_j$ such that $\rho(q,q')<\delta$, will be called the $\delta$-neighbourhood $[Q]^{(\delta)}=\prod_{j=1}^{m}[Q_j]^{(\delta)}$ of the set $Q=\prod_{j=1}^{m}Q_j$. The sets $Q(x)=\prod_{j=1}^{m}Q_j(x^{(j)})$ are said to be continuous in the inclusion at the point $x^{*}=(x^{*(1)},\dots,x^{*(m)})$ if for any number $\delta>0$ there is a number $\Delta>0$ such that the inclusions $Q(x^{*})\subset[Q(x)]^{(\delta)}$ and $Q(x)\subset[Q(x^{*})]^{(\delta)}$ are satisfied for all points $x=(x^{(1)},\dots,x^{(m)})$ satisfying the inequality $\max_j\|x^{(j)}-x^{*(j)}\|<\Delta$.
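For disks, continuity in the inclusion reduces to an elementary check, since the open $\delta$-neighbourhood of a closed disk $B(c,r)$ is the open disk of radius $r+\delta$. The Python sketch below (a single factor, $m=1$) tests both inclusions of the definition for translated disks $Q(y)=B(y,R)$, the situation of Example 1 below, where $C_E^{\ell}(y)$ is a disk of radius $\beta\ell$ centred at $y$. For translates both inclusions hold exactly when $\|y-y^{*}\|<\delta$, so one may take $\Delta=\delta$. The function names are illustrative.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def disk_inside_neighbourhood(c_a: Point, r_a: float, c_b: Point, r_b: float,
                              delta: float) -> bool:
    """Closed disk B(c_a, r_a) lies in [B(c_b, r_b)]^(delta), the open disk of radius
    r_b + delta about c_b, iff |c_a - c_b| + r_a < r_b + delta."""
    return math.dist(c_a, c_b) + r_a < r_b + delta


def inclusions_hold(y_star: Point, y: Point, radius: float, delta: float) -> bool:
    """Both inclusions of the definition for the translated disks Q(y) = B(y, radius)."""
    return (disk_inside_neighbourhood(y_star, radius, y, radius, delta)
            and disk_inside_neighbourhood(y, radius, y_star, radius, delta))
```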
Theorem 16 Let $C_{E_j}^{\ell}(y^{(j)})$, $j=1,\dots,n$, be compact and convex sets for $y^{(j)}\in C_{E_j}^{T-\ell}(y_0^{(j)})$, and let the sets $C_E^{\ell}(y)$ be continuous in the inclusion as functions of $y$ for any $y\in C_E^{T-\ell}(y_0)$. Then for every $\varepsilon>0$ the number $N$ of spectrum points of the strategy $\mu_y^{\varepsilon}$ may be selected the same for all $y\in C_E^{T-\ell}(y_0)$.
We first prove the following statement.

Lemma 1 Under the conditions of Theorem 16 the value of the game $\Gamma(y)$ depends continuously on the points $y\in C_E^{T-\ell}(y_0)$.
Proof: For simplicity, assume that $m=1$, $n=1$. We seek to prove that for any number $\varepsilon>0$ there is a number $\delta>0$ such that if $\rho(y,\bar y)<\delta$, then the inequality $|V(y)-V(\bar y)|<\varepsilon$ is satisfied. The sets $C_E^{\ell}(y)$ vary continuously in the inclusion, and hence for any number $\Delta>0$ there is a number $\delta>0$ such that if $\rho(y,\bar y)<\delta$, then $C_E^{\ell}(y)\subset[C_E^{\ell}(\bar y)]^{(\Delta)}$ and $C_E^{\ell}(\bar y)\subset[C_E^{\ell}(y)]^{(\Delta)}$. Consider the simultaneous zero-sum game $\tilde\Gamma(\bar y)$ with the payoff function $K(x,y)=F(A(x,y))$ on the closure $S_\Delta$ of the set $[C_E^{\ell}(\bar y)]^{(\Delta)}$. In this game, for any number $\varepsilon_1>0$ there is a pair of $\varepsilon_1$-optimal strategies $\mu''$, $\nu''$ with finite spectrum, constructed as follows. Let us set up finite $\varepsilon_1$-nets $\xi_1,\dots,\xi_N$ and $\eta_1,\dots,\eta_M$ on the set $S_\Delta$ in the corresponding Helly metrics $\rho_1$ and $\rho_2$, and let $\mu''$, $\nu''$ be the optimal strategies of the players in the matrix game $\Gamma''(\bar y)$ on these $\varepsilon_1$-nets. On these nets construct the strategy $\mu'$ as follows. If $\xi_i\in C_E^{\ell}(y)$, then $\xi'_i=\xi_i$ is a point of the spectrum of $\mu'$, chosen with the same probability $p_i$ as $\xi_i$ in the strategy $\mu''$. If $\xi_i\in S_\Delta\setminus C_E^{\ell}(y)$, then as $\xi'_i$ we choose any point $\xi'_i\in C_E^{\ell}(y)$ such that $\rho_1(\xi_i,\xi'_i)<\varepsilon_1$, and it appears in the strategy $\mu'$ with the same weight as the point $\xi_i$ in the strategy $\mu''$. Note that, in view of the uniform continuity of the payoff function $K$ on the compact set $S_\Delta$, such a point $\xi'_i\in C_E^{\ell}(y)$ exists for sufficiently small numbers $\Delta$.
The strategy $\nu'$ is constructed in much the same way. Then $\mu'$ and $\nu'$ are strategies of the players in the game $\Gamma(y)$. For any point $\eta\in S_\Delta$ and, in particular, for $\eta\in C_E^{\ell}(y)$ we have
$$|M(\mu',\eta)-M(\mu'',\eta)|\le\sum_{i=1}^{N}p_i\,|K(\xi'_i,\eta)-K(\xi_i,\eta)|<\varepsilon_1. \tag{6.7.14}$$
Similarly, for any point $\xi\in S_\Delta$, and specifically for $\xi\in C_E^{\ell}(y)$, the inequality
$$|M(\xi,\nu')-M(\xi,\nu'')|<\varepsilon_1 \tag{6.7.15}$$
is satisfied. Moreover,
$$|M(\mu',\nu')-M(\mu'',\nu'')|\le\sum_{i=1}^{N}\sum_{j=1}^{M}p_iq_j\,|K(\xi'_i,\eta'_j)-K(\xi_i,\eta_j)|<2\varepsilon_1 \tag{6.7.16}$$
(here $q_j$ are the weights of the points $\eta_j$ in the strategy $\nu''$). From inequalities (6.7.14)-(6.7.16) and the $\varepsilon_1$-optimality of the strategies $\mu''$, $\nu''$ in the game $\tilde\Gamma(\bar y)$ it follows that for any points $\xi,\eta\in S_\Delta$ (and hence for $\xi,\eta\in C_E^{\ell}(y)$)
$$M(\mu',\eta)-4\varepsilon_1<M(\mu'',\eta)-3\varepsilon_1\le M(\mu'',\nu'')-2\varepsilon_1<M(\mu',\nu')<M(\mu'',\nu'')+2\varepsilon_1\le M(\xi,\nu'')+3\varepsilon_1<M(\xi,\nu')+4\varepsilon_1.$$
Thus the strategy pair $(\mu',\nu')$ forms a $4\varepsilon_1$-equilibrium in the game $\Gamma(y)$. Therefore $|V(y)-M(\mu',\nu')|\le4\varepsilon_1$ and $|\tilde V-M(\mu'',\nu'')|\le\varepsilon_1$ ($\tilde V$ being the value of the game $\tilde\Gamma(\bar y)$), which, together with inequality (6.7.16), gives the estimate $|\tilde V-V(y)|<7\varepsilon_1$. If the strategies $\mu'_1$, $\nu'_1$ in the game $\Gamma(\bar y)$ are now constructed similarly to the strategies $\mu'$, $\nu'$ (by "adjusting" the strategies $\mu''$, $\nu''$), we obtain the inequality $|\tilde V-V(\bar y)|<7\varepsilon_1$, and hence $|V(y)-V(\bar y)|<14\varepsilon_1$. It remains to choose $\varepsilon_1$ so that $14\varepsilon_1=\varepsilon$ and to find the corresponding $\delta$. This completes the proof of the lemma. ■
Proof of Theorem 16. It suffices to show that the number of points $N=N(y)$ of the strategy spectrum $\mu_y^{\varepsilon}$ may be chosen so that $N(y)$ is bounded on $C_E^{T-\ell}(y_0)$. Suppose the opposite is true, i.e. let there be a number $\varepsilon>0$ and a sequence of points $\{y_s\}=\{(y_s^{(1)},\dots,y_s^{(n)})\}\subset C_E^{T-\ell}(y_0)$ such that $\lim_{s\to\infty}N_s=\infty$ for any choice of the numbers $N_s=N(y_s)$. Since the set $C_E^{T-\ell}(y_0)$ is compact, the sequence $\{y_s\}$ has a convergent subsequence. For simplicity, we assume it to be the sequence $\{y_s\}$ itself; then $\bar y\in C_E^{T-\ell}(y_0)$, where $\bar y=\lim_{s\to\infty}y_s$. Consider the auxiliary game $\Gamma(\bar y)$. By Theorem 15, in this game for any number $\varepsilon_1>0$ there is an $\varepsilon_1$-optimal strategy $\mu_{\bar y}^{\varepsilon_1}$ of Player P prescribing equal probabilities $1/N$ to the points $\xi_1,\dots,\xi_N$ of the set $[D_E(\bar y)]^m$. Fix an arbitrary number $\delta>0$. Because of the continuity in the inclusion of the sets $C_E^{\ell}(y)$ at the point $\bar y$, there is a number $\Delta>0$ such that the inclusions $C_E^{\ell}(y)\subset[C_E^{\ell}(\bar y)]^{(\delta)}$ and $C_E^{\ell}(\bar y)\subset[C_E^{\ell}(y)]^{(\delta)}$ are satisfied under the condition $\max_j\|y^{(j)}-\bar y^{(j)}\|<\Delta$. In particular, these inclusions are satisfied for all $y_s$ with $s>S$ for a sufficiently large number $S$, i.e.
$$C_E^{\ell}(y_s)\subset[C_E^{\ell}(\bar y)]^{(\delta)},\qquad C_E^{\ell}(\bar y)\subset[C_E^{\ell}(y_s)]^{(\delta)} \tag{6.7.17}$$
for all $s>S$; here $\max_j\|y_s^{(j)}-\bar y^{(j)}\|<\Delta$. The strategies $\mu_s^{\varepsilon}$ of Player P in the games $\Gamma(y_s)$ ($s>S$) are constructed in the following way. Let $\mu_s^{\varepsilon}$ prescribe equal probabilities $1/N$ to the points $\xi_1^s,\xi_2^s,\dots,\xi_N^s$ of the set $[D_E(y_s)]^m$; moreover, if $\xi_i\in[D_E(y_s)]^m$, then $\xi_i^s=\xi_i$. If $\xi_i\notin[D_E(y_s)]^m$, then as $\xi_i^s$ we take any point of $[D_E(y_s)]^m$ for which
$$\max_r\|\xi_i^{(r)}-\xi_i^{s(r)}\|<\delta. \tag{6.7.18}$$
Evidently, the points $\xi_i^s$ may be selected pairwise distinct. Let us fix an arbitrary point $y\in C_E^{\ell}(y_s)$. From the continuity of the payoff function it follows that for any number $\varepsilon_2>0$ there is a number $\delta_2>0$ such that $|K(\xi_i^s,y)-K(\xi_i,y)|<\varepsilon_2$, $i=1,\dots,N$, for $\max_r\|\xi_i^{s(r)}-\xi_i^{(r)}\|<\delta_2$, the number $\delta_2$ being independent of $y\in C_E^{\ell}(y_s)$. Thus, for $\delta\le\delta_2$,
$$\Bigl|M(\mu_s^{\varepsilon},y)-\frac1N\sum_{i=1}^{N}K(\xi_i,y)\Bigr|\le\frac1N\sum_{i=1}^{N}|K(\xi_i^s,y)-K(\xi_i,y)|<\varepsilon_2,$$
where $M(\mu_s^{\varepsilon},y)$ is the mathematical expectation of the payoff in the situation $(\mu_s^{\varepsilon},y)$ in the game $\Gamma(y_s)$. This implies the inequality
$$M(\mu_s^{\varepsilon},y)<\frac1N\sum_{i=1}^{N}K(\xi_i,y)+\varepsilon_2. \tag{6.7.19}$$
Let us estimate the sum on the right side. We first consider the case $y\in C_E^{\ell}(y_s)\setminus C_E^{\ell}(\bar y)$. Because of inclusions (6.7.17) there is a point $\tilde y\in C_E^{\ell}(\bar y)$ such that $\max_j\|\tilde y^{(j)}-y^{(j)}\|<\delta$. From the continuity of the payoff, for any number $\varepsilon_3>0$ there exists a number $\delta_3>0$ such that, as soon as $\max_j\|y^{(j)}-\tilde y^{(j)}\|<\delta_3$, the inequality
$$\Bigl|\frac1N\sum_{i=1}^{N}K(\xi_i,y)-\frac1N\sum_{i=1}^{N}K(\xi_i,\tilde y)\Bigr|\le\frac1N\sum_{i=1}^{N}|K(\xi_i,y)-K(\xi_i,\tilde y)|<\varepsilon_3 \tag{6.7.20}$$
holds, the number $\delta_3$ being independent of the choice of the point $y\in C_E^{\ell}(y_s)$. The $\varepsilon_1$-optimality of the strategy $\mu_{\bar y}^{\varepsilon_1}$ of Player P in the game $\Gamma(\bar y)$ and (6.7.20) (for $\delta\le\delta_3$) imply that
$$\frac1N\sum_{i=1}^{N}K(\xi_i,y)<V(\bar y)+\varepsilon_1+\varepsilon_3. \tag{6.7.21}$$
When $y\in C_E^{\ell}(\bar y)\cap C_E^{\ell}(y_s)$, the $\varepsilon_1$-optimality of the strategy $\mu_{\bar y}^{\varepsilon_1}$ in the game $\Gamma(\bar y)$ implies the inequality
$$\frac1N\sum_{i=1}^{N}K(\xi_i,y)=M(\mu_{\bar y}^{\varepsilon_1},y)\le V(\bar y)+\varepsilon_1<V(\bar y)+\varepsilon_1+\varepsilon_3,$$
so (6.7.21) holds in this case as well. By Lemma 1, for any number $\varepsilon_4>0$ there is a number $\Delta_4>0$ such that
$$|V(y_s)-V(\bar y)|<\varepsilon_4 \tag{6.7.22}$$
if $\max_j\|y_s^{(j)}-\bar y^{(j)}\|<\Delta_4$. Let the number $S$ be chosen so that for $s>S$ relations (6.7.17)-(6.7.22) are satisfied. Inequalities (6.7.19), (6.7.21) and (6.7.22) imply that
$$M(\mu_s^{\varepsilon},y)<V(y_s)+\varepsilon_1+\varepsilon_2+\varepsilon_3+\varepsilon_4 \tag{6.7.23}$$
for all $y\in C_E^{\ell}(y_s)$, $s>S$. Since the numbers $\varepsilon_1,\varepsilon_2,\varepsilon_3,\varepsilon_4$ are chosen arbitrarily, it follows from inequality (6.7.23) that for any number $\varepsilon>0$ we may find a strategy $\mu_s^{\varepsilon}$ such that $M(\mu_s^{\varepsilon},y)<V(y_s)+\varepsilon$ for all $y\in C_E^{\ell}(y_s)$, $s>S$. Thus, there exists a number $S$ such that the strategy $\mu_s^{\varepsilon}$ with the spectrum composed of $N_s=N$ points is $\varepsilon$-optimal in the game $\Gamma(y_s)$ for $s>S$, which is inconsistent with the assumption that $\lim_{s\to\infty}N_s=\infty$. This completes the proof of the theorem. ■
We now give two examples of zero-sum differential pursuit-evasion games of prescribed duration with delayed information for both players.

Example 1. Consider the zero-sum game on the plane between the pursuer team $P=\{P_1,P_2\}$ and the evader $E$. The motion equations are of the form
$$P_i:\ \dot x^{(i)}=u^{(i)},\quad x^{(i)}\in R^2,\quad\|u^{(i)}\|\le\alpha_i,\quad\alpha_i>0,\quad i=1,2;$$
$$E:\ \dot y=v,\quad y\in R^2,\quad\|v\|\le\beta,\quad\beta>0. \tag{6.7.24}$$
Player E's payoff is defined as
$$\min_{i=1,2}\rho^2(x^{(i)}(T),y(T)). \tag{6.7.25}$$
Consider the auxiliary simultaneous game of pursuit $\Gamma(y)$ on the set $C_E^{\ell}(y)$, which is a disk of radius $\beta\ell$ with center at the point $y$. As shown by Satimov et al., $V(y)=\beta^2\ell^2-4\beta^2\ell^2/\pi^2$, and the players possess the following optimal strategies. For Player E, the uniform distribution on the circle of radius $\beta\ell$ with center at the point $y$ is optimal. For Player P, the uniform distribution of the points $\xi^{(i)}$ ($i=1,2$) on the circle of radius $r=2\beta\ell/\pi$ with center at the same point $y$ is optimal, the points $\xi^{(1)}$ and $\xi^{(2)}$ being at the opposite ends of a diameter of this circle, which is denoted by $C(y,r)$.
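The value formula can be checked numerically. With $R=\beta\ell$, place the evader on the boundary circle and let Player P use an antipodal pair at radius $r$, rotated uniformly at random. The expected payoff is then $R^2+r^2-\frac{4Rr}{\pi}$ (the minimum of the two squared distances is $R^2+r^2-2Rr|\cos\varphi|$, and the mean of $|\cos\varphi|$ is $2/\pi$), which is minimized at $r=2R/\pi$ with value $R^2-4R^2/\pi^2$. The Python sketch below approximates this expectation by the midpoint rule and searches for the minimizing radius. It is consistent with, but does not prove, the result of Satimov et al., since it optimizes only over this one family of pursuer strategies.

```python
import math


def expected_min_sq_dist(R: float, r: float, samples: int = 20000) -> float:
    """Evader at (R, 0) on the boundary circle; pursuers at the antipodal pair
    +/-(r cos phi, r sin phi) with phi uniform on [0, 2*pi) (midpoint rule)."""
    total = 0.0
    for k in range(samples):
        phi = 2.0 * math.pi * (k + 0.5) / samples
        px, py = r * math.cos(phi), r * math.sin(phi)
        d1 = (R - px) ** 2 + py ** 2  # squared distance to xi^(1)
        d2 = (R + px) ** 2 + py ** 2  # squared distance to xi^(2) = -xi^(1)
        total += min(d1, d2)
    return total / samples


def best_radius(R: float, iters: int = 60, samples: int = 4000) -> float:
    """Ternary search for r in [0, R] minimising the expected payoff (convex in r)."""
    lo, hi = 0.0, R
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if expected_min_sq_dist(R, m1, samples) < expected_min_sq_dist(R, m2, samples):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0
```

For $R=1$ the search returns approximately $2/\pi\approx0.6366$, and the expected payoff there is approximately $1-4/\pi^2\approx0.5947$.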
Let $\mu_N$ be the mixed strategy of Player P which prescribes equal probabilities $1/N$ to the points $\xi_j=(\xi_j^{(1)},\xi_j^{(2)})$, $j=1,\dots,N$, such that the points $\xi_j^{(1)}$, $\xi_j^{(2)}$ lie at the opposite ends of a diameter of the circle $C(y,r)$, the point $\xi_1^{(1)}$ is chosen arbitrarily on this circle, and the remaining points $\xi_j^{(i)}$ divide the circle into equal arcs. The sequence of strategies $\{\mu_N\}$ converges to an optimal strategy $\mu^{*}$ of Player P in the game $\Gamma(y)$ and, since the payoff function is continuous, $\lim_{N\to\infty}M(\mu_N,\nu)=M(\mu^{*},\nu)$. Since the sets $C_E^{\ell}(y)$ are congruent, the number $N$ is independent of $y$, and hence the game $\Gamma(y)$ satisfies condition 1 for all $y\in C_E^{T-\ell}(y_0)$. Let $y(t)$ be an arbitrary admissible motion of Player E for $0\le t\le T$. Denote by $\xi_j[y(t-\ell)]$ the points of the strategy spectrum in the game $\Gamma(y(t-\ell))$. Suppose that the disk $C_E^{\ell}(y(t-\ell))$ moves translationally, its center moving along the trajectory $y(t-\ell)$. Then the trajectories of all points of this disk are curves congruent to the trajectory $y(t-\ell)$; in particular, so are the trajectories $\xi_j^{(i)}[y(t-\ell)]$. Thus condition 2 is also satisfied.
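Condition 1 for this example can be quantified. Against $\mu_N$, the evader's best reply lies on the circle of radius $R=\beta\ell$: along each ray the payoff is convex in the distance from the centre, and at the centre it equals $r^2<V$. The Python sketch below scans the evader's position on that circle and computes how far $\sup_y M(\mu_N,y)$ exceeds $V$ for $r=2R/\pi$, with the pairs placed at the angles $\pi k/N$. Working the same quantity out by hand gives $2Rr\bigl(\frac{2}{\pi}-\frac1N\cot\frac{\pi}{2N}\bigr)\approx\frac{2R^2}{3N^2}$, so a prescribed accuracy $\varepsilon$ is reached for a number $N$ that does not depend on $y$. The function names are illustrative.

```python
import math


def worst_case_payoff(R: float, r: float, N: int, steps_per_arc: int = 50) -> float:
    """sup over evader points on the circle of radius R of M(mu_N, y), where mu_N puts
    weight 1/N on the antipodal pairs at angles pi*k/N, k = 0..N-1, on the circle of radius r."""
    total_steps = 2 * N * steps_per_arc  # grid contains theta = pi/2, where the sup is attained
    best = -math.inf
    for i in range(total_steps):
        theta = math.pi * i / total_steps  # the payoff has period pi in theta
        h = sum(abs(math.cos(theta - math.pi * k / N)) for k in range(N)) / N
        best = max(best, R * R + r * r - 2.0 * R * r * h)
    return best


def gap_to_value(R: float, N: int) -> float:
    """sup_y M(mu_N, y) - V for r = 2R/pi and V = R^2 - 4R^2/pi^2."""
    r = 2.0 * R / math.pi
    return worst_case_payoff(R, r, N) - (R * R - 4.0 * R * R / math.pi ** 2)
```

For $R=1$ the gap is about $0.81$ for $N=1$, about $0.010$ for $N=8$ and about $1.6\cdot10^{-4}$ for $N=64$.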
Assume that each team partner $P_i$ of $P$ stays at the point $x_0^{(i)}$ until the time instant $\ell$ and then employs a deterministic strategy which is a linear pursuit of the point $y(t-\ell)$, i.e. the strategy $u^{(i)}(\cdot)=\{\sigma,a\}$ is such that the mapping $a$ associates with each state $x(t_k)=(x^{(1)}(t_k),x^{(2)}(t_k))$, $y(t_k-\ell)$, $t_k$ the control
$$u^{(i)}(t)=\alpha_i\,\frac{y(t_k-\ell)-x^{(i)}(t_k)}{\rho(x^{(i)}(t_k),y(t_k-\ell))},\qquad t_k\le t<t_{k+1}. \tag{6.7.26}$$
If, moreover, the inequalities
$$T\ge\frac{\rho(x_0^{(i)},y_0)}{\alpha_i-\beta}+\ell+\ell_1,\qquad\alpha_i\ell_1>\beta(\ell+\ell_1),\qquad i=1,2, \tag{6.7.27}$$
are satisfied, then by the time instant $T-\ell_1$ each pursuer $P_i$ ensures that the inclusion
$$C_E^{\ell}(y(T-\ell))\subset C_{P_i}^{\ell_1}(x^{(i)}(T-\ell_1)) \tag{6.7.28}$$
holds for all $y(T-\ell)\in C_E^{\ell_1}(y(T-\ell-\ell_1))$.
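The role of conditions (6.7.27) can be illustrated by simulation. The Python sketch below discretizes the linear pursuit (6.7.26) of the delayed point $y(t-\ell)$ for one pursuer, using Euler steps of length `dt` with each step capped so as not to overshoot the delayed point (a modification of the piecewise-constant control, assumed here for numerical stability). It reports the slack $\alpha_i\ell_1-\beta(\ell+\ell_1)-\rho(x^{(i)}(T-\ell_1),y(T-\ell-\ell_1))$. Inclusion (6.7.28) holds when the slack is nonnegative, because the union of the sets $C_E^{\ell}(y(T-\ell))$ over $y(T-\ell)\in C_E^{\ell_1}(y(T-\ell-\ell_1))$ is the disk of radius $\beta(\ell+\ell_1)$ about $y(T-\ell-\ell_1)$. The evader's path (a circle of radius 2 traversed at speed $\beta=1$) and all numerical values are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def condition_6_7_27(alpha: float, beta: float, ell: float, ell1: float,
                     T: float, d0: float) -> bool:
    """Conditions (6.7.27) for one pursuer; d0 = rho(x_0^(i), y_0)."""
    # the first clause implies alpha > beta, so the division below is safe
    return alpha * ell1 > beta * (ell + ell1) and T >= d0 / (alpha - beta) + ell + ell1


def inclusion_slack(x0: Point, evader: Callable[[float], Point], alpha: float,
                    beta: float, ell: float, ell1: float, T: float,
                    dt: float = 1e-3) -> float:
    """Simulate the delayed pursuit up to T - ell1 and return
    alpha*ell1 - beta*(ell + ell1) - rho(x(T - ell1), y(T - ell - ell1))."""
    steps = round((T - ell1) / dt)
    x = x0
    for k in range(steps):
        t = k * dt
        if t < ell:
            continue                      # the pursuer waits at x0 until time ell
        target = evader(t - ell)          # delayed information: only y(t - ell) is known
        dx, dy = target[0] - x[0], target[1] - x[1]
        d = math.hypot(dx, dy)
        if d > 0.0:
            step = min(alpha * dt, d)     # full speed, but never overshoot the target
            x = (x[0] + step * dx / d, x[1] + step * dy / d)
    t_end = steps * dt                    # approximately T - ell1
    gap = math.dist(x, evader(t_end - ell))
    return alpha * ell1 - beta * (ell + ell1) - gap
```

With $\alpha_i=3$, $\beta=1$, $\ell=\ell_1=0.5$, $x_0^{(i)}=(-3,0)$, $y_0=(2,0)$ and $T=4$, conditions (6.7.27) hold and the simulated slack is about $0.5$.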
N e x t , consider one o f the points £ = (j[y(t - £}] - {# [y{t }
t £\T - ti,T].
- QUf
M* ~
T h e s e points move along the curves w h i c h are congruent
to the t r a j e c t o r y y(t — £). Moreover, a t each time instant f for T -1\ < t < T Player P knows the coordinates of these points. T h u s the p r o b l e m of P l a y e r
Differentia!
224 P i ' s approach to the point
games with incomplete
information
( i = 1,2) converts to the p r o b l e m of pursuit
w i t h perfect i n f o r m a t i o n . M o t i o n equations for this p r o b l e m are of the form P : ;
#
E:
f
= )
[|u<'>|| ;
= V,
M<0;
where E, is a " d u m m i " evader " h a n d l i n g " t h e m o t i o n of the point (j* QW t
and 0 are the same as in the equations (6.7.24), a n d the d u r a t i o n of the game is fixed and equals t\. Denote these d u m m i games by P , . In the games F; the choice of the control v made by Player E{ coinsides w i t h the choice of control made by Player E i n the game T(x , yo, T) at any instant of time. 0
Recall that the inclusion (6.7.28) is satisfied. In the game $\Gamma_i$ this means that the reachability set of the evader $E_i$ in time $\ell_1$ from the initial state $\zeta_i[y(T-\ell-\ell_1)]$ is contained in the reachability set of the pursuer $P_i$ in time $\ell_1$ from the initial state $x^{(i)}(T-\ell_1)$. Hence, if the pursuer $P_i$ employs the strategy of linear pursuit (6.7.26), he can approach the point moving along any trajectory $\zeta_i[y(t-\ell)]$ in a time not exceeding $\ell_1$.
Note that the function (6.7.25) is a special case of the payoff function (6.7.2). Thus, in this example all conditions of Theorem 1 are satisfied whenever the conditions (6.7.27) hold, i.e. whenever the velocities of the team partners are sufficiently high compared with the evader's velocity.
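The first inequality in (6.7.27) contains the term $\rho(x_0^{(i)}, y_0)/(\alpha_i - \beta)$. The sketch below shows where that term comes from by discretizing the linear pursuit (6.7.26) of the delayed point $y(t-\ell)$. It is only an illustration, and its setup is our own assumption rather than the book's:

- the evader runs at full speed $\beta$ around a circle centred at the origin;
- the time step `dt` is fixed;
- each pursuer step is capped at the remaining distance, so the pursuer never overshoots.

The function name and parameters are hypothetical.

```python
import math


def linear_pursuit_catch_time(x0, alpha, beta, ell, radius=5.0, dt=1e-3, t_max=100.0):
    """Discretized linear pursuit of the delayed evader position y(t - ell).

    Illustrative assumptions: the evader moves at speed beta on a circle of the given
    radius centred at the origin, starting at (radius, 0); the pursuer stands idle at
    x0 on [0, ell] and afterwards heads straight for y(t_k - ell) at speed alpha.
    Returns the first time at which the pursuer is within 2 * beta * dt of
    y(t - ell), or None if this does not happen before t_max.
    """
    def y(s):
        # evader position at time s >= 0 (arc-length speed beta)
        angle = beta * s / radius
        return radius * math.cos(angle), radius * math.sin(angle)

    px, py = x0
    t = ell                                  # the pursuit starts at t = ell
    while t < t_max:
        tx, ty = y(t - ell)                  # delayed position known to the pursuer
        dx, dy = tx - px, ty - py
        dist = math.hypot(dx, dy)
        if dist <= 2 * beta * dt:
            return t
        step = min(alpha * dt, dist)         # move toward the target, never past it
        px += step * dx / dist
        py += step * dy / dist
        t += dt
    return None
```

Take $\alpha = 2$, $\beta = 1$, $\ell = 1$ and $\rho(x_0, y_0) = 15$. At each step the gap to the delayed point shrinks by at least $(\alpha-\beta)\,dt$. So the returned time should not exceed $\ell + \rho(x_0, y_0)/(\alpha-\beta) = 16$, up to a few steps of size `dt`.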
Let us describe optimal strategies for the players. Since the value $V(y)$ of the auxiliary game does not depend on the choice of the point $y$, the optimal strategy for Player E is as follows.

- For $0 \le t \le T-\ell$, Player E makes an open-loop transition from the point $y_0$ to any point $y \in C_E^{T-\ell}(y_0)$.
- At the time instant $T-\ell$ he chooses, according to the uniform distribution, a point $\eta$ on the circle of radius $\beta\ell$ centered at $y$.
- He then makes an open-loop transition to the point $\eta$ in the time $\ell$.

The following MPOLBS is $\varepsilon$-optimal for Player $P = \{P_1, P_2\}$.

Until the time instant $\ell$ the pursuers $P_1$, $P_2$ move towards the point $y_0$ or stand idle. For $\ell \le t \le T-\ell_1$ the pursuer $P_i$ pursues the point $y(t-\ell)$, directing his velocity vector at this point, and thereby ensures that the inclusion (6.7.28) is satisfied. At the time instant $T-\ell_1$ some $\varepsilon$-optimal strategy $\mu^{\varepsilon}_{N}$ is fixed. In accordance with it, Player P selects one of the pairs of points $(\zeta_1, \zeta_2)$; the two points of a pair lie at the opposite ends of a diameter of the circle of radius $r = 2\beta\ell/\pi$. During the remaining time $\ell_1$, Player $P_i$ pursues the point moving along the trajectory $\zeta_i[y(t-\ell)]$, directing his velocity vector at this evading point, until he achieves its $\varepsilon_i$-capture. Thus, when the game terminates, the point $x(T) = (x^{(1)}(T), x^{(2)}(T))$ lies in the $\varepsilon$-neighbourhood of the point $(\zeta_1[y(T-\ell)], \zeta_2[y(T-\ell)])$. The value of the game is equal to $\beta^2\ell^2 - 4\beta^2\ell^2/\pi^2$.
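The value $\beta^2\ell^2 - 4\beta^2\ell^2/\pi^2$ and the radius $2\beta\ell/\pi$ can be checked numerically. The sketch below makes two assumptions of our own:

- the payoff in the auxiliary game is the squared distance to the nearer of the two points, which is what the form of the value suggests;
- the evader is uniform on the circle of radius $R = \beta\ell$, and the pursuer's points sit at $(\pm r, 0)$. By rotational symmetry any diameter gives the same average.

The angle to the nearer point is uniform on $[0, \pi/2]$, where the mean of $\cos$ is $2/\pi$. The average therefore equals $R^2 + r^2 - 4Rr/\pi$, which is minimized at $r = 2R/\pi$. The function names are ours.

```python
import math


def mean_sq_distance(R, r, n=100_000):
    """Average of min(|eta - zeta_1|^2, |eta - zeta_2|^2) over eta uniform on the
    circle of radius R, with zeta_1 = (r, 0), zeta_2 = (-r, 0); midpoint rule in angle."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        ex, ey = R * math.cos(theta), R * math.sin(theta)
        d1 = (ex - r) ** 2 + ey ** 2
        d2 = (ex + r) ** 2 + ey ** 2
        total += min(d1, d2)
    return total / n


def mean_sq_closed_form(R, r):
    """R^2 + r^2 - 2 R r E[cos phi], phi uniform on [0, pi/2], E[cos phi] = 2/pi."""
    return R ** 2 + r ** 2 - 4.0 * R * r / math.pi


def game_value(beta, ell):
    """beta^2 ell^2 - 4 beta^2 ell^2 / pi^2, attained at r = 2 beta ell / pi."""
    return (beta * ell) ** 2 * (1.0 - 4.0 / math.pi ** 2)
```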
Example 8. Consider a zero-sum differential game of pursuit on the plane between the pursuer team $P = \{P_1, \ldots, P_m\}$, acting as one player, and the evader $E$. The motion equations are of the form
$$P_i:\ \dot x^{(i)} = u^{(i)},\ \ \|u^{(i)}\| \le \alpha_i,\ \ i = 1, \ldots, m;$$
$$E:\ \dot y_1 = v y_2 + y_1,\ \ \dot y_2 = -1,\ \ -1 \le v \le +1.$$
Player E's payoff is defined as $\min_i \rho(x^{(i)}(T), y(T))$. The partners of the team $P$ have simple motion. From the motion equations for the evader we have
$$y_2(t) = y_2(0) - t, \qquad (6.7.29)$$
$$y_1(t) = t + 1 - y_2(0) + [y_1(0) + y_2(0) - 1]e^{t} \quad \text{for } v = +1, \qquad (6.7.30)$$
$$y_1(t) = -t - 1 + y_2(0) + [y_1(0) - y_2(0) + 1]e^{t} \quad \text{for } v = -1. \qquad (6.7.31)$$
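The closed forms (6.7.29) to (6.7.31) can be checked against a direct numerical integration of the evader's equations with the constant controls $v = \pm 1$. The sketch below uses a hand-written classical Runge-Kutta step, which is our choice and not the book's. It also confirms that half the distance between the two endpoints equals $d_t(y(0)) = t + 1 - y_2(0) + [y_2(0) - 1]e^{t}$, the half-length of the reachability segment. The function names are hypothetical.

```python
import math


def evader_rhs(y, v):
    """Example 8 evader dynamics: y1' = v*y2 + y1, y2' = -1."""
    y1, y2 = y
    return (v * y2 + y1, -1.0)


def rk4(y0, v, t, n=2000):
    """Integrate from 0 to t with constant control v by the classical Runge-Kutta method."""
    h = t / n
    y = tuple(y0)
    for _ in range(n):
        k1 = evader_rhs(y, v)
        k2 = evader_rhs((y[0] + h / 2 * k1[0], y[1] + h / 2 * k1[1]), v)
        k3 = evader_rhs((y[0] + h / 2 * k2[0], y[1] + h / 2 * k2[1]), v)
        k4 = evader_rhs((y[0] + h * k3[0], y[1] + h * k3[1]), v)
        y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return y


def endpoint_closed_form(y0, v, t):
    """(6.7.30) for v = +1 or (6.7.31) for v = -1, together with (6.7.29) for y2."""
    y1, y2 = y0
    if v == 1:
        return (t + 1 - y2 + (y1 + y2 - 1) * math.exp(t), y2 - t)
    if v == -1:
        return (-t - 1 + y2 + (y1 - y2 + 1) * math.exp(t), y2 - t)
    raise ValueError("closed forms are given only for v = +1 and v = -1")


def half_length(y2_0, t):
    """d_t(y(0)) = t + 1 - y2(0) + (y2(0) - 1) e^t."""
    return t + 1 - y2_0 + (y2_0 - 1) * math.exp(t)
```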
It may be shown that the reachability set $C_E^{t}(y_1(0), y_2(0))$ is the line segment joining the points $A_t(y(0))$ and $B_t(y(0))$. The coordinates of $A_t(y(0))$ are given by Eqs. (6.7.29) and (6.7.30), and those of $B_t(y(0))$ by Eqs. (6.7.29) and (6.7.31). Half of this line segment is equal to
$$d_t(y(0)) = t + 1 - y_2(0) + [y_2(0) - 1]e^{t}.$$
Thus the set $C_E^{t}(y(0))$ is a line segment parallel to the abscissa axis, and its length is a continuous function of time.

Consider a simultaneous zero-sum auxiliary game $\Gamma(y)$ on the set $C_E^{\ell}(y)$, i.e. on the interval $[B_\ell(y), A_\ell(y)]$. The game proceeds as follows. Player P chooses $m$ points $\zeta_1, \ldots, \zeta_m$ from the set $C_E^{\ell}(y_1, y_2)$, and Player E chooses a point $\eta \in C_E^{\ell}(y_1, y_2)$. The choices are made simultaneously and independently of one another. Player E receives the payoff $\min_i \rho(\zeta_i, \eta)$. As is known, the value of this game is equal to
$$V(y) = \frac{d_\ell(y)}{2m-1}.$$
Let $\zeta^0_1, \ldots, \zeta^0_{2m}$ be the $2m$ points that divide the interval $[B_\ell(y), A_\ell(y)]$ into $2m-1$ equal parts. An optimal strategy of Player P is to choose, with probability $1/2$ each, one of two collections of $m$ points, each collection consisting of every other one of these points. For Player E there are several optimal strategies. One of them is to choose each of the points $\zeta^0_1, \ldots, \zeta^0_{2m}$ with probability $1/(2m)$. Thus, this game satisfies condition 1.

Let $y(t)$ be an arbitrary admissible motion of Player E for $0 \le t \le T$. Then the set $C_E^{\ell}(y(t-\ell))$ is a line segment moving parallel to the abscissa axis. Its length
$$2d_\ell(y(t-\ell)) = 2\{\ell + 1 - y_2(t-\ell) + [y_2(t-\ell) - 1]e^{\ell}\} \qquad (6.7.32)$$
does not depend on the coordinate $y_1(t-\ell)$. In this case all the points $\zeta^0_j[y(t-\ell)]$ ($j = 1, \ldots, 2m$) describe nonintersecting trajectories, since at any instant of time $t$
$$\rho(\zeta^0_j[y(t-\ell)], \zeta^0_{j+1}[y(t-\ell)]) = \frac{2d_\ell(y(t-\ell))}{2m-1}.$$
Hence condition 2 is also satisfied.
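The value $V(y) = d_\ell(y)/(2m-1)$ of the auxiliary game on a segment of length $2d_\ell(y)$ can be checked with exact rational arithmetic. In the sketch below, which is our own construction, the $2m$ points $\zeta^0_1, \ldots, \zeta^0_{2m}$ split the segment into $2m-1$ equal gaps of length $s$.

- Player P chooses, with probability $1/2$ each, one of the two collections of alternate points. At every $\eta$ his average distance is exactly $s/2$.
- Player E is uniform over the $2m$ points. His side is checked against all $m$-point choices on a half-gap grid. This is a finite sample of P's options, not a proof.

Both guarantees should equal $s/2 = d_\ell(y)/(2m-1)$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def min_dist(points, eta):
    """Distance from eta to the nearest of the given points."""
    return min(abs(z - eta) for z in points)


def check_interval_game(d, m, grid_per_gap=4):
    """Auxiliary game on a segment of length 2d (placed at [0, 2d]) with m pursuer points.

    Returns (the claimed value d/(2m-1), what P's 1/2-1/2 mixture guarantees over a
    grid of eta values, what E's uniform strategy guarantees against every m-point
    choice on a half-gap grid)."""
    length = 2 * Fraction(d)
    s = length / (2 * m - 1)                      # common gap between consecutive points
    zeta0 = [j * s for j in range(2 * m)]         # 2m points, both ends included
    first, second = zeta0[0::2], zeta0[1::2]      # P's two collections of m points

    n = grid_per_gap * (2 * m - 1)
    etas = [length * i / n for i in range(n + 1)]
    pursuer = max((min_dist(first, e) + min_dist(second, e)) / 2 for e in etas)

    candidates = [s * i / 2 for i in range(2 * (2 * m - 1) + 1)]
    evader = min(
        sum(min_dist(choice, e) for e in zeta0) / (2 * m)
        for choice in combinations_with_replacement(candidates, m)
    )
    return Fraction(d) / (2 * m - 1), pursuer, evader
```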
From formulas (6.7.29) and (6.7.32) it follows that
$$\max_{y \in C_E^{T-\ell}(y(0))} V(y) = \frac{1}{2m-1}\max_{y \in C_E^{T-\ell}(y(0))}\{\ell + 1 - y_2(T-\ell) + [y_2(T-\ell) - 1]e^{\ell}\} = \frac{1}{2m-1}\{T + 1 - y_2(0) + [y_2(0) - T + \ell - 1]e^{\ell}\},$$
and the point $y \in C_E^{T-\ell}(y(0))$ may be chosen arbitrarily. Now, if the conditions of Theorem 14 are satisfied, the players' optimal strategies are constructed as follows.
Each partner of the team pursues the midpoint of the interval $[B_\ell(y(t-\ell)), A_\ell(y(t-\ell))]$ until the time instant $T-\ell_1$, in order to ensure that the inclusion (6.7.3) is satisfied. At the instant $T-\ell_1$, Player P chooses with probability $1/2$ one of the two collections of $m$ points $\zeta^{1}[y(T-\ell-\ell_1)]$, $\zeta^{2}[y(T-\ell-\ell_1)]$. On the remaining time interval each partner $P_i$ pursues the point $\zeta_i$ moving along the trajectory $\zeta_i[y(t-\ell)]$. This ensures that by the time instant $T$ the point $x^{(i)}$ arrives in the $\varepsilon_i$-neighbourhood of the point $\zeta_i[y(T-\ell)]$.

For $0 \le t \le T-\ell$ Player E can move in an arbitrary way. At the time instant $T-\ell$ he chooses, with probability $1/(2m)$, one of the points $\zeta^0_j$ dividing the interval $[B_\ell(y(T-\ell)), A_\ell(y(T-\ell))]$ into equal parts. During $T-\ell \le t \le T$ he makes an open-loop transition to the point resulting from this random choice.

To find the parameters for which the conditions of Theorem 1 are satisfied, we have to solve the auxiliary perfect-information game of pursuit and make appropriate estimates. These estimates are rather cumbersome and are not given here.
6.8
One multistage game with delayed information
In the general case, even games of pursuit in which at each stage the players have complete information on the opponents' actions turn out to be very complex. Therefore, it is common to analyze such games through relatively simple situations. Games of pursuit with incomplete information are considerably more complex still. It is therefore all the more important to find nontrivial examples that are relatively free from complicating factors.

In this respect, the problem formulated by R. Isaacs is of interest. In it, a ship maneuvers so as to minimize the probability of being hit by a bomber. The problem is idealized. The ship (Player E, the evader) is assumed to be a point on the plane. The bomber (Player P, the pursuer) has only one bomb, with which it can strike any point on the plane (the sea surface). The survivability of the ship is not taken into consideration, i.e. hitting the ship is taken to be equivalent to destroying it. After its release the bomb reaches the water surface in $k$ time units ($k$ is a natural number). In each time unit the ship must move a unit distance either south-east or south-west.

Let us introduce a system of coordinates $xOt$. The axis $Ox$ is directed from west to east, the axis $Ot$ is directed from north to south, and the origin is placed at the point where the ship was at the initial instant of time. Since in each time unit Player E has to move southward, the axis $Ot$ also serves to measure time. At the initial instant the ship can move to the point $(-1, 1)$ or to the point $(1, 1)$. If at the time instant $t$ (where $t$ is a natural number) it is at the point $(x, t)$, then at the instant $t + k$ it must be at one of the points
$$g_j = (x - k + 2j,\ t + k), \qquad j = 0, 1, \ldots, k,$$
i.e. $g_0 = (x-k, t+k)$, $g_1 = (x-k+2, t+k)$, ..., $g_k = (x+k, t+k)$ (see Fig. 36). In what follows this set of points is called the reachability set in $k$ steps from the position $(x, t)$ and is denoted by $D_k(x, t)$.

Fig. 36. The reachability set $D_3(x, t)$: $g_0 = (x-3, t+3)$, $g_1 = (x-1, t+3)$, $g_2 = (x+1, t+3)$, $g_3 = (x+3, t+3)$.
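The reachability set $D_k(x, t)$ is simply the set of end points of all $k$-step moves by $\pm 1$ along $Ox$. The sketch below builds $D_k(x, t)$ from the formula $g_j = (x - k + 2j, t + k)$ and compares it with brute-force enumeration. It also counts the paths ending at each $g_j$: this count is the binomial coefficient $\binom{k}{j}$, because such a path has exactly $j$ steps to the east. The function names are ours.

```python
from itertools import product
from math import comb


def reach_set(x, t, k):
    """D_k(x, t) = {g_j = (x - k + 2j, t + k) : j = 0, ..., k}, ordered by j."""
    return [(x - k + 2 * j, t + k) for j in range(k + 1)]


def reach_by_enumeration(x, t, k):
    """End points of all k-step paths (each step moves x by -1 or +1), with path counts."""
    counts = {}
    for steps in product((-1, 1), repeat=k):
        end = (x + sum(steps), t + k)
        counts[end] = counts.get(end, 0) + 1
    return counts
```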
At the time instant $t$ both players know the position occupied by Player E at that instant. Next, it is assumed that Player P may drop the bomb only at the instants $t$ (where $t$ is a natural number or $0$), and that the bomb reaches the sea surface at the instant $t + k$. Evidently, because of the fall delay of $k$ time units, a bomb dropped at the instant $t$ should be aimed at one of the points of the set $D_k(x, t)$. Thus the bomb fall delay plays the role of an ordinary information delay for Player P.

Definition 6. The path $\xi_t$ of length $t$ is a collection of points $((0,0), (x_1, 1), (x_2, 2), \ldots, (x_t, t))$, where $x_{i+1} - x_i = \pm 1$ for $i = 0, 1, \ldots, t-1$ (with $x_0 = 0$). $S(\xi_t)$ stands for the point $(x_t, t)$, i.e. $S(\xi_t)$ is the culminating point of the path $\xi_t$. The notation $\xi_{t'} \prec \xi_t$ means that the path $\xi_{t'}$ is an initial part of the path $\xi_t$ ($t' < t$). In other words, if $\xi_{t'}$ is composed of the points $((0,0), (x_1, 1), \ldots, (x_{t'}, t'))$, then $\xi_t$ is composed of the points $((0,0), (x_1, 1), \ldots, (x_{t'}, t'), (x_{t'+1}, t'+1), \ldots, (x_t, t))$.

In this section it is assumed that Player E (the ship) adopts a mixed strategy. That is, at the instant $t$, for every possible path $\xi_t$, the evader determines the probability $p_{\xi_t}$ with which he will reduce his coordinate along the axis $Ox$. Hence $1 - p_{\xi_t}$ is the probability with which he will increase it. Thus the direction of motion is determined by a random test.

Definition 7. The mapping $u: Z \to [0, 1]$, where $Z$ is the set of possible paths, is called the mixed behavior strategy of Player E.

The mixed behavior strategy of Player P is the probability distribution of the bomb drop over the reachability set of Player E at each instant, depending on the path $\xi_t$. For any position $(x, t)$ we number the elements of the set $D_k(x, t)$ and denote them by $A_1(x, t), \ldots, A_{k+1}(x, t)$, respectively. If $(y, t+k) \in D_k(x, t)$, then $\nu(y)$ denotes the number of the point $(y, t+k)$ in the set $D_k(x, t)$. Consider the set $D = \{A_1, \ldots, A_{k+1}\}$ and let $\bar D = D \cup \{O\}$, where $O$ is an abstract element standing for a bomb drop omission.

Definition 8. The behavior strategy of Player P is a mapping $V$ which assigns to every path $\xi_t \in Z$ a probability distribution on the set $\bar D$, equal to $(V(\xi_t, A_1), \ldots, V(\xi_t, A_{k+1}), V(\xi_t, O))$.

The number $V(\xi_t, A_i)$, $i = 1, \ldots, k+1$, is the probability of dropping the bomb at the position $A_i(x, t)$. The number $V(\xi_t, O)$ is the probability of omitting the bomb drop at that instant of time. In what follows we also write $\beta(A_i) = V(\xi_t, A_i)$ and $\beta(O) = V(\xi_t, O)$. Evidently, $V(\xi_t, O) = 1 - \sum_{i=1}^{k+1} V(\xi_t, A_i)$.

For Player E (the ship) the natural problem is to make Player P maximally uncertain about the ship's true position at the moment the bomb is released. It can be stated as follows.

Condition 1. If the ship has travelled the path $\xi_t$ and arrived at the point $(x, t)$, then under the behavior strategy the conditional probability of its arrival at any point of the set $D_k(x, t)$ should be close to $1/(k+1)$.
This condition is an intuitive formulation of the optimality criterion for Player E's behavior strategy. Suppose it is possible to define a behavior strategy $u(\cdot)$ for which the conditional probabilities in condition 1 are equal to $1/(k+1)$. Then at any instant of time $t$ there are $k+1$ points at which, from the bomber's standpoint, the ship may be at the instant $t+k$. All these points are equivalent, even though Player P knows the path $\xi_t$ travelled by the ship. But under the game conditions the evader must actually be at one of these points, i.e. he cannot deceive the pursuer any further. Thus this behavior strategy $u$ proves to be the best for Player E. Unfortunately, for $k > 1$ and sufficiently large values of $T$ such behavior strategies do not exist.

Theorem 17 For any natural number $k > 1$ there is a number $N = N(k)$ such that for $T > N$ there is no behavior strategy for which the conditional probabilities from condition 1 are exactly equal to $1/(k+1)$.

Proof: Suppose the opposite is true, namely that for some $k > 1$ a behavior strategy $u$ makes all the conditional probabilities from condition 1 equal to $1/(k+1)$.
Under the given behavior strategy, let $p_1, \ldots, p_k$ be the probabilities of the transition from the position $(1-i, i-1)$ to the position $(-i, i)$. Let $p'_1, \ldots, p'_k$ be the probabilities of the transition from the position $(i-1, i-1)$ to the position $(i, i)$, $i = 1, \ldots, k$. The probabilities of the paths leading to the positions $(-k, k)$ and $(k, k)$ are equal to $1/(k+1)$, so
$$\prod_{i=1}^{k} p_i = \prod_{i=1}^{k} p'_i = \frac{1}{k+1}.$$
By the rules of the game, $p_1 + p'_1 = 1$ and $p_i, p'_i \in [0, 1]$ for $i = 1, \ldots, k$. Hence $p_1 \ge \prod_{i=1}^k p_i = 1/(k+1)$ and $p'_1 \ge \prod_{i=1}^k p'_i = 1/(k+1)$. Consequently $1 - p_1 = p'_1 \ge 1/(k+1)$, i.e. $p_1 \le k/(k+1)$, and similarly for $p'_1$. Thus $p_1, p'_1 \in [1/(k+1), k/(k+1)]$. By the same reasoning, these restrictions must hold for the probabilities $p_i$ and $p'_i$ for every realized path $((0,0), \ldots, (1-i, i-1))$ and $((0,0), \ldots, (i-1, i-1))$ respectively, $i = 1, \ldots, k$. But then
$$p_1 = \frac{1}{(k+1)\prod_{i=2}^{k} p_i} \ge \frac{1}{k+1}\left(\frac{k}{k+1}\right)^{1-k} = (k+1)^{k-2}k^{1-k}.$$
Similarly $p'_1 \ge (k+1)^{k-2}k^{1-k}$, whence
$$p_1, p'_1 \in \left[(k+1)^{k-2}k^{1-k},\ 1 - (k+1)^{k-2}k^{1-k}\right],$$
and so on.

As noted above, all the probabilities $p_i, p'_i$, $i = 1, \ldots, k$, must satisfy the same restrictions. Suppose that at some stage of this reasoning we have the estimate $p_i, p'_i \in [a, 1-a]$, where $a < 1/2$. Then at the next stage we have the estimate
$$p_i, p'_i \in \left[\frac{(1-a)^{1-k}}{k+1},\ 1 - \frac{(1-a)^{1-k}}{k+1}\right].$$
That is, the lower bound for the probability $p_i$ is raised by
$$\delta(a, k) = \frac{(1-a)^{1-k} - a(k+1)}{k+1}.$$
Consider $\delta(a, k)$ as a function of $a$. Differentiation shows that it is minimal at $a = a_0 = 1 - ((k-1)/(k+1))^{1/k}$, where
$$\delta(a_0, k) = \frac{k}{k-1}\left(\frac{k-1}{k+1}\right)^{1/k} - 1.$$
This quantity is positive for every $k \ge 2$, because $(k-1)/(k+1) > ((k-1)/k)^k$. For instance, $\delta(a_0, 2) = 2/\sqrt{3} - 1 > 0.1$. Thus at each stage the lower bound increases by at least $\delta(a_0, k) > 0$. Therefore, for any $k$, finitely many stages of similar reasoning make the system of restrictions for the probability $p_1$ contradictory, i.e. $p_1 > 1/2$ and $p_1 < 1/2$. Five stages suffice whenever $\delta(a_0, k) > 0.1$.

Each stage of the above reasoning requires the game to have $k$ more steps than the previous stage. We therefore obtain an upper bound $N(k)$ on the length $T$ of games in which a behavior strategy making all conditional probabilities from condition 1 equal to $1/(k+1)$ can exist. This completes the proof of the theorem. ∎
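The quantities in the proof are easy to tabulate. The sketch below evaluates four things (the helper names are ours):

- the increase $\delta(a, k)$;
- its minimizer $a_0$;
- the simplified minimum $\delta(a_0, k) = \frac{k}{k-1}\left(\frac{k-1}{k+1}\right)^{1/k} - 1$;
- the number of stages, obtained by iterating the bound $a \mapsto (1-a)^{1-k}/(k+1)$ from $a = 1/(k+1)$ until $a > 1/2$.

For instance, $\delta(a_0, 2) = 2/\sqrt{3} - 1 \approx 0.155$, while $\delta(a_0, k)$ falls below $0.1$ from $k = 9$ on. It stays positive for every $k$, so the number of stages is always finite, but it grows with $k$.

```python
import math


def delta(a, k):
    """Increase of the lower bound in one stage: [(1-a)^(1-k) - a(k+1)] / (k+1)."""
    return ((1 - a) ** (1 - k) - a * (k + 1)) / (k + 1)


def a0(k):
    """Minimizer of delta(., k): a0 = 1 - ((k-1)/(k+1))^(1/k)."""
    return 1 - ((k - 1) / (k + 1)) ** (1 / k)


def delta_min(k):
    """delta(a0(k), k), the smallest possible increase per stage."""
    return delta(a0(k), k)


def stages_needed(k):
    """Iterate a -> (1-a)^(1-k)/(k+1) from a = 1/(k+1) until [a, 1-a] is empty."""
    a, n = 1 / (k + 1), 0
    while a <= 0.5:
        a = (1 - a) ** (1 - k) / (k + 1)
        n += 1
    return n
```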
The proximity of the conditional probabilities to the number $1/(k+1)$ may be interpreted in various ways. Let $\alpha = (1/(k+1), \ldots, 1/(k+1))$ be a vector of dimension $k+1$. For each behavior strategy and each path $\xi_t$ ($t \le T-k$), the conditional probabilities form a vector $\beta = (\beta_0, \beta_1, \ldots, \beta_k)$, which is assigned to this path, i.e. $\beta(\xi_t) = \beta$. Denote by $G_r(\xi_t)$ the value of the expression
$$\left(\sum_{i=0}^{k} |\beta_i - \alpha_i|^r\right)^{1/r} \quad \text{for } r \ge 1$$
(i.e. the norm of $\beta - \alpha$ in the space $L_r$), or
$$\sum_{i=0}^{k} \beta_i \ln \beta_i + \ln(k+1) \quad \text{for } r = 0$$
(the amount of information about Player E's position). Thus $G(\xi_t)$ is the expected gain from step $t$ if the path $\xi_t$ has already been realized. If the index $r$ is omitted, any of its values from the set $[1, \infty) \cup \{0\}$ is meant. Let
$$W_r(U_T) = \max_{\xi_t} \{G_r(\xi_t)\}.$$
A behavior strategy is optimal in the sense of the criterion $r$ if it achieves the minimum of the function $W_r(U_T)$ over the class of behavior strategies.
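The two proximity measures behind $G_r$ can be written down directly. The sketch below takes a vector $\beta$ of conditional probabilities and compares it with the uniform vector $\alpha$. It returns the $L_r$ distance for $r \ge 1$. For $r = 0$ it returns $\ln(k+1) + \sum_i \beta_i \ln \beta_i$, which is the Kullback-Leibler divergence of $\beta$ from the uniform distribution, with the convention $0 \ln 0 = 0$. $W_r$ is then the largest $G_r$ over the paths considered. The names `G` and `W` are our own.

```python
import math


def G(beta, r):
    """Proximity of beta = (beta_0, ..., beta_k) to the uniform vector alpha.

    r >= 1 : (sum |beta_i - alpha_i|^r)^(1/r), the L_r distance;
    r == 0 : ln(k + 1) + sum beta_i ln beta_i (terms with beta_i = 0 contribute 0).
    """
    n = len(beta)                        # n = k + 1
    alpha = 1.0 / n
    if r == 0:
        return math.log(n) + sum(b * math.log(b) for b in beta if b > 0)
    if r < 1:
        raise ValueError("r must be 0 or at least 1")
    return sum(abs(b - alpha) ** r for b in beta) ** (1.0 / r)


def W(betas, r):
    """Largest G_r over the conditional-probability vectors of the paths considered."""
    return max(G(b, r) for b in betas)
```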
Consider various techniques of solving this game for $k = 2$, based on different approaches to the notion of "proximity". In what follows this game is called the "bomber versus ship" game. Denote by $p$ the probability of arriving at the position $(x-1, t+1)$ from the point $(x, t)$. Denote by $b$ and $c$ the probabilities of arriving at the positions $(x-2, t+2)$ and $(x, t+2)$ from the points $(x-1, t+1)$ and $(x+1, t+1)$, respectively. Let
$$q_1 = pb, \qquad q_2 = p(1-b) + (1-p)c, \qquad q_3 = (1-p)(1-c)$$
be the conditional probabilities from condition 1. In this section a behavior strategy is called optimal if it minimizes, over all behavior strategies, the largest of the numbers $q_1, q_2, q_3$, taken over all paths of length not exceeding $T-2$. The duration of the process is $T$, and Player P has to drop the bomb during this time. His payoff is unity if the ship is hit and zero otherwise. Denote this game by $\Gamma_T$.

If Player P releases the bomb at the initial instant, the expected value of his payoff equals $q_1$, $q_2$ or $q_3$, depending on which point he aims at. If the bomb has not been released, the expected payoff equals $p\,g_{T-1}(b) + (1-p)\,g_{T-1}(c)$. Here $g_{T-1}(y)$ is the value of the game for the pursuer when $T-1$ steps remain and the evader's probability of reducing his coordinate along the axis $Ox$ is $y$. Evidently, the largest expected payoff of Player P equals the maximum of these four quantities. Since Player E seeks to minimize this payoff, the following equality holds:
$$g_T(p) = \min_{b, c \in [0,1]} \max\{pb,\ p(1-b) + (1-p)c,\ (1-p)(1-c),\ p\,g_{T-1}(b) + (1-p)\,g_{T-1}(c)\} \qquad (6.8.1)$$
(here $p \in [0,1]$ and $g_0 \equiv 0$). The analysis and solution of the functional equations (6.8.1) are used to find a solution to the infinite game. Let $\operatorname{val}\Gamma_n$ denote the value of the $n$-step game $\Gamma_n$.
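The recursion (6.8.1) can be iterated numerically. The sketch below is our own discretization, not part of the original analysis:

- $p$, $b$ and $c$ are restricted to a uniform grid on $[0,1]$, which only gives the evader fewer options;
- the iteration starts from $g_0 \equiv 0$;
- each step takes the minimum over $(b, c)$ of the maximum of the four quantities.

The right-hand side is monotone in $g_{n-1}$, so the computed values do not decrease with $n$. At $p = 1/2$ they stay between $1/3$ and $0.62^2 = 0.3844$; the upper bound comes from the grid strategy $b = 0.62$, $c = 0.38$. This range is close to the value $(3-\sqrt{5})/2 \approx 0.382$ obtained below. The computed $g_n$ is also symmetric about $p = 1/2$.

```python
def bellman_step(g_prev, grid):
    """One step of (6.8.1): g_n(p) = min_{b,c} max{pb, p(1-b)+(1-p)c, (1-p)(1-c),
    p g_{n-1}(b) + (1-p) g_{n-1}(c)}, with b and c restricted to the grid."""
    g_next = []
    for p in grid:
        best = float("inf")
        for ib, b in enumerate(grid):
            q1 = p * b                               # bomb aimed at (x-2, t+2)
            wait_b = p * g_prev[ib]
            for ic, c in enumerate(grid):
                q2 = p * (1 - b) + (1 - p) * c       # bomb aimed at (x, t+2)
                q3 = (1 - p) * (1 - c)               # bomb aimed at (x+2, t+2)
                wait = wait_b + (1 - p) * g_prev[ic] # bomb not released now
                best = min(best, max(q1, q2, q3, wait))
        g_next.append(best)
    return g_next


def iterate_values(n_steps, n_grid=51):
    """Return the grid, g_{n_steps} on it, and g_0(1/2), ..., g_{n_steps}(1/2)."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    g = [0.0] * n_grid                               # g_0 = 0
    mid = (n_grid - 1) // 2                          # index of p = 1/2 (n_grid odd)
    history = [g[mid]]
    for _ in range(n_steps):
        g = bellman_step(g, grid)
        history.append(g[mid])
    return grid, g, history
```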
Lemma 2 The function $g_n$ is symmetric about the point $p = 1/2$, and $g_n(1/2) = \operatorname{val}\Gamma_n$.

Proof: Since $g_0$ is a constant, the statements of the lemma hold for it. Assume that the function $g_i$ is symmetric, and replace $p$ by $1-p$, $b$ by $1-c$ and $c$ by $1-b$. Then the equality $g_{i+1}(p) = g_{i+1}(1-p)$ follows from formula (6.8.1). If $g_n(p^*) = \min_{p \in [0,1]} g_n(p)$, then $g_n(1-p^*) = \min_{p \in [0,1]} g_n(p)$ as well. These probabilities are components of optimal strategies for the evader, so their mixture is also optimal. Hence $g_n(1/2) = \min_{p \in [0,1]} g_n(p) = \operatorname{val}\Gamma_n$. ∎
Lemma 3 The inequality $|g_n(p \pm \varepsilon) - g_n(p)| \le \varepsilon$ holds for any positive number $\varepsilon$ and any number $n$.

This statement follows from the lemma proved below.

Lemma 4 $g_n(p) \le g_{n+1}(p)$.

Proof: In the game with $n+1$ steps, the pursuer can ensure the payoff $g_n(p)$ by adopting the behavior strategy that is optimal in the game of length $n$. ∎
Lemma 5 The functions $g_n$ converge uniformly to a continuous function $g$ satisfying the equation
$$g(p) = \min_{b, c \in [0,1]} \max\{pb,\ p(1-b) + (1-p)c,\ (1-p)(1-c),\ p\,g(b) + (1-p)\,g(c)\}. \qquad (6.8.2)$$

Proof: By Lemma 4, for any fixed value of the argument $p$ the sequence $\{g_n(p)\}$ does not decrease. It is also bounded above, for example by unity. Consequently, the pointwise limit exists. Lemma 3 and the Arzelà theorem imply that this convergence is uniform and that the limit function $g$ is continuous. Passing to the limit in (6.8.1), we get (6.8.2). This completes the proof of the lemma. ∎

Denote by $U_0$ the following strategy. At the first step Player E reduces his coordinate along the axis $Ox$ with an arbitrary probability $p \in [p_0, 1-p_0]$, where $p_0 = (3-\sqrt{5})/2$. After that, at each step he keeps the direction in which he moved at the previous step with probability $1-p_0$. Such a behavior strategy is called Markovian, since the choice of control at each step depends only on the previous step and not on the controls chosen earlier. The following statement is verified immediately.
Lemma 6 If Player E adopts the behavior strategy $U_0$, then the expected maximum payoff of Player P is equal to $p_0$.
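Lemma 6 can be checked directly on the formulas for $q_1, q_2, q_3$. Under $U_0$, the probability of reducing the coordinate after a reduction is $b = 1-p_0$, and after an increase it is $c = p_0$. So at every position after the first step the current probability $p$ is either $1-p_0$ or $p_0$. The sketch below evaluates the largest $q$ for these values and for first-step probabilities in $[p_0, 1-p_0]$. The key identity is $(1-p_0)^2 = p_0$. The names are ours.

```python
import math

P0 = (3 - math.sqrt(5)) / 2          # p_0 = (3 - sqrt 5)/2, about 0.382


def q_values(p, b, c):
    """q_1, q_2, q_3 for k = 2: probabilities of the three points of D_2(x, t)."""
    return (p * b, p * (1 - b) + (1 - p) * c, (1 - p) * (1 - c))


def u0_largest_q(p):
    """Largest q when E follows U_0 from a position with 'reduce' probability p:
    keep the previous direction with probability 1 - P0, i.e. b = 1 - P0, c = P0."""
    return max(q_values(p, 1 - P0, P0))
```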
Note that time does not enter the definition of the strategy $U_0$. Therefore this lemma holds for the game with any number of steps, in particular for the infinite game.

Theorem 18 The minimum value of the function $g(p)$ equals $p_0$ and is achieved on the interval $[p_0, 1-p_0]$.

Proof: By Lemma 4, the values $\operatorname{val}\Gamma_n = \min_{p \in [0,1]} g_n(p)$ converge to the quantity $\operatorname{val}\Gamma$. Moreover, the function $g(p)$ is symmetric as a limit of symmetric functions. Let $p_1$ be the least value of $p$ for which $g(p) = \operatorname{val}\Gamma$ (so that $p_1 \le 1/2$). Let $b, c$ be the corresponding values at which the minimum in expression (6.8.2) is achieved, i.e.
$$\operatorname{val}\Gamma = g(p_1) = \max\{p_1 b,\ p_1(1-b) + (1-p_1)c,\ (1-p_1)(1-c),\ p_1 g(b) + (1-p_1) g(c)\}. \qquad (6.8.3)$$
It follows from (6.8.3) that $g(p_1) \ge p_1 g(b) + (1-p_1) g(c)$, and hence $g(p_1) \ge g(b)$ and $g(p_1) \ge g(c)$. However, $p_1$ is the least value of $p$ with $g(p) = \operatorname{val}\Gamma$, and $g$ is symmetric about $p = 1/2$. It follows that $p_1 \le 1-b$ and $c \ge p_1$; otherwise we would have $g(p_1) < g(b)$ or $g(p_1) < g(c)$. Consequently,
$$p_1(1-b) + (1-p_1)c \ge p_1^2 + p_1(1-p_1) = p_1,$$
i.e. $g(p_1) \ge p_1$. At the same time, (6.8.3) gives $(1-b)p_1 + c(1-p_1) \le g(p_1) = \operatorname{val}\Gamma$. Thus $p_1 \le \operatorname{val}\Gamma$.

Assume that $p_1 < \operatorname{val}\Gamma$. Then $\operatorname{val}\Gamma \ge (1-c)(1-p_1)$ and $\operatorname{val}\Gamma \ge (1-b)p_1 + (1-p_1)c \ge p_1^2 + c(1-p_1)$. Adding these inequalities, we get $2\operatorname{val}\Gamma \ge p_1^2 - p_1 + 1$. The function $p^2 - p + 1$ is strictly decreasing on $[0, 1/2]$, and $p_1 < \operatorname{val}\Gamma < 1/2$. Hence $2\operatorname{val}\Gamma > (\operatorname{val}\Gamma)^2 - \operatorname{val}\Gamma + 1$, which gives $\operatorname{val}\Gamma > (3-\sqrt{5})/2 = p_0$. But by Lemma 6 we have $\operatorname{val}\Gamma \le p_0$. This contradiction shows that $p_1 = \operatorname{val}\Gamma$. We have thus proved the theorem. ∎
We have $(1-b)p_1 + (1-p_1)c \ge p_1^2 + p_1(1-p_1) = p_1 = \operatorname{val}\Gamma$, and by formula (6.8.3) also $(1-b)p_1 + c(1-p_1) \le \operatorname{val}\Gamma$. Hence $p_1 = 1-b$, $c = p_1$ and $\operatorname{val}\Gamma = p_1 = p_0 = (3-\sqrt{5})/2$. Using the same formula (6.8.3), one readily sees that $g(p) = \operatorname{val}\Gamma$ if and only if $p_0 \le p \le 1-p_0$ and $c = 1-b = p_0$.

Any truncated (i.e. finite-step) game is an ordinary matrix game whose matrix consists of zeros and ones, so its value is a rational number. By Lemma 4, $\operatorname{val}\Gamma_n \le \operatorname{val}\Gamma$, and $\operatorname{val}\Gamma$ is irrational. Hence $\operatorname{val}\Gamma_n < \operatorname{val}\Gamma$ and $\lim_{n\to\infty} \operatorname{val}\Gamma_n = \operatorname{val}\Gamma$. That is, for any positive number $\varepsilon$ there is a number $N$ such that $\operatorname{val}\Gamma - \operatorname{val}\Gamma_N < \varepsilon$. If Player P uses in the infinite game the behavior strategy that is optimal in the game of length $N$, he obtains an $\varepsilon$-optimal behavior strategy. The following statement holds.
Theorem 19 The infinite "bomber versus ship" game has the value $\operatorname{val}\Gamma = (3-\sqrt{5})/2$. The optimal strategy for Player E coincides with the strategy $U_0$. For any number $\varepsilon > 0$ there exists an $\varepsilon$-optimal behavior strategy for Player P.
Chapter 7
Noncooperative differential games

7.1
Game on a finite graph tree

In most conflict-controlled decision-making problems, passing to the normal form does not lead to efficient solution methods. The normal form reduces the problem to a single instantaneous choice of pure strategies; it can illustrate one optimality principle or another, but no more. In some cases, the general existence theorems for solutions of normalized games do not allow one to find, or even describe, the optimal behavior in the original games that were normalized. For instance, as will be seen below, chess has a saddle point in pure strategies. Yet for the matrix game obtained by normalizing chess, one can only claim that a solution exists in the class of mixed strategies. This becomes even more apparent for differential games of pursuit. In some cases their solution can be found explicitly, whereas the normal form of a differential game is so general that obtaining results from it is practically impossible.

The theory of positional games studies mathematical models of conflict that take dynamics into account. The simplest class of positional games is the class of finite-stage positional games with complete information. Defining these games requires some knowledge of graph theory. Let $X$ be a finite set. A rule $f$ that assigns to each element $x \in X$ an element $f(x) \in X$ is called a single-valued mapping of $X$ into $X$, or a function defined on $X$ with values in $X$. A point-to-set mapping $\Gamma$ of $X$ into $X$ is a rule that assigns to each element $x \in X$ a subset $\Gamma(x) \subset X$. Here $\Gamma(x) = \emptyset$ is also possible.
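The set
$$\Gamma(A) = \bigcup_{x \in A} \Gamma(x)$$
is called the image of the set $A \subset X$. It is easy to see that if $A_i \subset X$ ($i = 1, \ldots, n$), then
$$\Gamma\Big(\bigcup_i A_i\Big) = \bigcup_i \Gamma(A_i), \qquad \Gamma\Big(\bigcap_i A_i\Big) \subset \bigcap_i \Gamma(A_i).$$
Let $\Gamma^2(x) = \Gamma(\Gamma(x))$ and $\Gamma^{l+1}(x) = \Gamma(\Gamma^{l}(x))$ ($l = 2, 3, \ldots$). The mapping $\hat\Gamma$ defined by
$$\hat\Gamma(x) = \{x\} \cup \Gamma(x) \cup \Gamma^2(x) \cup \cdots \cup \Gamma^{l}(x) \cup \cdots$$
is called the transitive closure of the mapping $\Gamma$. The mapping $\Gamma^{-1}$, the inverse of the mapping $\Gamma$, is defined as follows:
$$\Gamma^{-1}(y) = \{x \in X \mid y \in \Gamma(x)\}.$$
Let $\Gamma^{-l}(y) = \Gamma^{-1}(\Gamma^{-(l-1)}(y))$ ($l = 2, 3, \ldots$) and
$$\hat\Gamma^{-1}(y) = \{y\} \cup \Gamma^{-1}(y) \cup \Gamma^{-2}(y) \cup \cdots.$$
Finally, for all $B \subset X$,
$$\Gamma^{-1}(B) = \{x \in X \mid \Gamma(x) \cap B \neq \emptyset\}.$$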
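These set operations translate directly into code. In the sketch below, which uses our own representation rather than the book's, a point-to-set mapping $\Gamma$ on a finite set is a dictionary from each node to the set of its successors. The functions compute three things:

- the image $\Gamma(A)$;
- the transitive closure $\hat\Gamma(x)$, by repeatedly taking images until nothing new appears;
- the inverse image $\Gamma^{-1}(B)$.

```python
def image(gamma, A):
    """Gamma(A): union of Gamma(x) over x in A."""
    result = set()
    for x in A:
        result |= gamma.get(x, set())
    return result


def closure(gamma, x):
    """hat-Gamma(x) = {x} | Gamma(x) | Gamma^2(x) | ...; terminates because X is finite."""
    reached, frontier = {x}, {x}
    while frontier:
        frontier = image(gamma, frontier) - reached   # nodes seen for the first time
        reached |= frontier
    return reached


def inverse_image(gamma, B):
    """Gamma^{-1}(B) = {x in X : Gamma(x) and B have a common element}."""
    B = set(B)
    return {x for x, succ in gamma.items() if succ & B}
```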
Example 1. Draughts. Each position on the draughtboard is determined by the arrangement of draughtsmen at a given moment and by an indication of whose move it is. Let $X$ be the set of positions, and let $\Gamma(x)$ be the set of positions that can be realized immediately after the position $x \in X$. If in the position $x$ the number of white or black draughtsmen is zero, or the position is drawn, then $\Gamma(x) = \emptyset$. Further:

- $\Gamma^k(x)$ is the set of positions that can be obtained from $x$ in $k$ moves;
- $\hat\Gamma(x)$ is the set of all positions that can be obtained from $x$;
- $\Gamma^{-1}(A)$ is the set of positions from which a transition to a position in the set $A \subset X$ can be made in one move.

Depict the positions by dots, and draw an arrow from $x$ to $y$ whenever $y \in \Gamma(x)$. In theory this yields a plane figure called the graph of the game. In practice, however, the number of positions is so large that such a graph cannot be drawn. Point-to-set mappings make it possible to represent the structure of many games: chess, go, etc.

The pair $G = \langle X, \Gamma \rangle$ is a graph if $X$ is a finite set and $\Gamma$ is a point-to-set mapping of $X$ into $X$. In what follows, the elements of the set $X$ are represented by points on the plane. Two points $x$ and $y$ with $y \in \Gamma(x)$ are joined by a continuous line with an arrow directed from $x$ to $y$. Each element of the set $X$ is called a vertex or node of the graph. A pair of elements $z = (x, y)$ with $y \in \Gamma(x)$ is called an arc of the graph. We say that $x$ and $y$ are the boundary nodes of the arc $z = (x, y)$, $x$ being the origin and $y$ the end point. The arcs $z_1$ and $z_2$ are called contiguous if they are different and have a boundary point in common.

A path in the graph $G = \langle X, \Gamma \rangle$ is a sequence of arcs $p = (z_1, \ldots, z_k, \ldots)$ in which the end point of each arc coincides with the origin of the next one. The length of a finite path $p = (z_1, \ldots, z_k)$ is the number $l(p) = k$ of its arcs; for an infinite path $p$ we set $l(p) = +\infty$. The set of arcs of the graph $G = \langle X, \Gamma \rangle$ is denoted by $U$. The mapping $\Gamma$ determines the set of arcs $U$, and conversely, $U$ determines $\Gamma$. Hence the graph $G = \langle X, \Gamma \rangle$ can also be written as $G = \langle X, U \rangle$.

Fig. 37. A graph tree with initial node $x_0$.

An edge of the graph $G = \langle X, \Gamma \rangle$ is a pair of elements $\bar z = \{x, y\}$ such that $(x, y) \in U$ or $(y, x) \in U$. Unlike an arc, an edge has no orientation. The set of edges of the graph $G = \langle X, U \rangle$ is denoted by $\bar U$. A chain is a sequence of edges $(\bar z_1, \bar z_2, \ldots, \bar z_k, \ldots)$ in which, for $k \ge 2$, one boundary node of each edge $\bar z_k$ is a boundary node of $\bar z_{k-1}$ and the other is a boundary node of $\bar z_{k+1}$. A cycle is a finite chain that starts at a node $x$ and ends at the same node. A graph is called connected if any two of its nodes can be joined by a chain. By definition, a tree, or graph tree, is a finite connected graph without cycles that has at least two nodes. In any graph tree there is a unique node $x_0$ such that $\hat\Gamma(x_0) = X$. The node $x_0$ is called the initial node of the graph.
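The definition of a graph tree can be tested mechanically. The sketch below is our own and takes arcs as ordered pairs. It ignores orientation to build the edges and relies on a standard fact: a finite graph with at least two nodes is connected and has no cycles exactly when it is connected and has one edge fewer than it has nodes. A loop $(x, x)$ is treated as a cycle. The sketch also lists the nodes $x_0$ with $\hat\Gamma(x_0) = X$.

```python
from collections import deque


def is_graph_tree(nodes, arcs):
    """Finite, connected (orientation ignored), without cycles, at least two nodes."""
    nodes = set(nodes)
    if len(nodes) < 2 or any(x == y for x, y in arcs):
        return False
    edges = {frozenset(arc) for arc in arcs}          # orientation plays no role
    if len(edges) != len(nodes) - 1:
        return False
    neighbours = {v: set() for v in nodes}
    for edge in edges:
        x, y = tuple(edge)
        neighbours[x].add(y)
        neighbours[y].add(x)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:                                       # breadth-first search
        v = queue.popleft()
        for w in neighbours[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes


def initial_nodes(nodes, arcs):
    """All x0 whose transitive closure hat-Gamma(x0) is the whole node set."""
    succ = {}
    for x, y in arcs:
        succ.setdefault(x, set()).add(y)
    result = []
    for x0 in nodes:
        reached, stack = {x0}, [x0]
        while stack:
            v = stack.pop()
            for w in succ.get(v, ()):
                if w not in reached:
                    reached.add(w)
                    stack.append(w)
        if reached == set(nodes):
            result.append(x0)
    return result
```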
Example 2. Fig. 37 shows a tree, or graph tree, with the origin $x_0$. The nodes $x \in X$, or vertices of the graph, are marked by dots. The arcs are drawn as arrowed segments that show the origin and the end point of each arc.

Let $y \in X$. The subgraph $G_y$ of the graph tree $G = \langle X, \Gamma \rangle$ is the graph $\langle X_y, \Gamma_y \rangle$, where $X_y = \hat\Gamma(y)$ and $\Gamma_y(x) = \Gamma(x) \cap X_y$. In Fig. 37 the graph $G_y$ is shown by a dotted line. In a graph tree, the sets $\Gamma(x)$ and $\Gamma_y(x)$ coincide for all $x \in X_y$, i.e. the mapping $\Gamma_y$ is the restriction of the mapping $\Gamma$ to the set $X_y$. Therefore, the notation $G_y = \langle X_y, \Gamma \rangle$ will be used for subgraphs of the graph $G$.
Example 3. In general, draughts or chess cannot be represented by a graph tree if a node of the corresponding graph is taken to be an arrangement of draughtsmen (chessmen) on the board at a given instant together with the indication of whose move it is, since the same arrangement can be obtained in a variety of ways. At the same time, if a node of the graph is taken to be an arrangement of draughtsmen (chessmen) on the board at a given instant, the indication of the move, and the history of the game (all successive arrangements in the earlier moves), then each node is reached from the initial node in a unique way, the relevant graph of the game contains no circles, and hence it is a tree.

We will now define a multistage game with complete information on a finite graph tree. Let $G = \langle X, \gamma \rangle$ be a graph tree, $X = \bigcup_{i=1}^{n+1} X_i$, where $X_i \cap X_j = \emptyset$ for $i \ne j$, and $\gamma(x) = \emptyset$ for all positions $x \in X_{n+1}$. The set $X_i$ is called the priority set of player $i$ $(i = 1, \dots, n)$, and $X_{n+1}$ the set of final positions. Real functions $H_1, \dots, H_n$ are defined on the set $X_{n+1}$. The function $H_i$ is called the payoff of player $i$ $(i = 1, \dots, n)$.

The game proceeds as follows. Suppose there are $n$ players labelled with the natural numbers $1, \dots, n$. Assume that $x_0 \in X_{i_1}$; then in the node (position) $x_0$ player $i_1$ "makes a move" and selects a node $x_1 \in \gamma(x_0)$; if $x_1 \in X_{i_2}$, then in the node $x_1$ player $i_2$ "makes a move" and selects the next node $x_2 \in \gamma(x_1)$, and so on. If the position $x_{k-1} \in X_{i_k}$ is realized at the $k$-th step, then in this position player $i_k$ "makes a move" and selects the next node (position) $x_k$ from the set $\gamma(x_{k-1})$. The game terminates as soon as a node $x_l \in X_{n+1}$ is reached. Such step-by-step selection yields a unique sequence of nodes $x_0, \dots, x_l$, which determines a path in the graph tree $G$ emanating from the initial position $x_0$ and reaching one of the final positions of the game. In what follows, such a path is called a play of the game. Because of the treelike structure of the graph $G$, each position $x_l \in X_{n+1}$ uniquely determines the play reaching it. In the final position $x_l$ each player $i \in \{1, \dots, n\}$ receives the payoff $H_i(x_l)$.
When making a decision in a position $x \in X_i$, player $i$ is assumed to know this position and hence, because of the treelike structure of the graph $G$, he can restore all previous positions. In this case the players are said to have complete information. Chess and draughts provide an example: the players may write down their moves, and hence they know the history of the game when making each move in turn.

Nash equilibrium 239

A function $u_i$ that assigns to each position $x \in X_i$ a position $y \in \gamma(x)$ is called a strategy of player $i$. The set of all possible strategies of player $i$ is denoted by $D_i$ $(i = 1, \dots, n)$. Thus the strategy of the $i$-th player prescribes to him, in any position $x$ of his priority set $X_i$, a unique choice of the next position. An $n$-tuple of strategies $u = (u_1, \dots, u_n)$, where $u_i \in D_i$, is called a situation. Each situation uniquely defines a play of the game and hence the players' payoffs. Indeed, let $x_0 \in X_{i_1}$; then in the situation $u = (u_1, \dots, u_n)$ the next position $x_1$ is uniquely determined by the rule $x_1 = u_{i_1}(x_0)$; now let $x_1 \in X_{i_2}$; then $x_2$ is uniquely determined by the rule $x_2 = u_{i_2}(x_1)$; if the position $x_{k-1} \in X_{i_k}$ is realized at the $k$-th step, then $x_k$ is uniquely determined by the rule $x_k = u_{i_k}(x_{k-1})$, and so on. Suppose that the situation $u = (u_1, \dots, u_n)$ corresponds, in this sense, to the play $x_0, x_1, \dots, x_l$. Then we may introduce the payoff function $K_i : D = D_1 \times \dots \times D_n \to \mathbb{R}$,

$$K_i(u) = H_i(x_l),$$

where $x_l$ is the last node of the play $x_0, \dots, x_l$ corresponding to the situation $u = (u_1, \dots, u_n)$ $(i = 1, \dots, n)$. The collection

$$\Gamma = \langle D_1, \dots, D_n, K_1, \dots, K_n \rangle \qquad (7.1.1)$$

is called the $n$-person game with complete information on the graph $G$.
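How a situation $u$ fixes the play $x_0, \dots, x_l$, and hence $K_i(u) = H_i(x_l)$, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the tree, its node names and the payoffs are hypothetical, each strategy $u_i$ is stored as a dictionary from positions of $X_i$ to the chosen successor, and players are numbered from 1.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tree (not from the book): position -> successors gamma(x).
GAMMA: Dict[str, List[str]] = {
    "x0": ["a", "b"],
    "a": ["t1", "t2"],
    "b": ["t3", "t4"],
    "t1": [], "t2": [], "t3": [], "t4": [],
}
# Priority sets X_1, X_2 encoded as position -> player who moves there.
OWNER: Dict[str, int] = {"x0": 1, "a": 2, "b": 2}
# Payoffs (H_1, H_2) on the final positions X_{n+1} (made-up numbers).
H: Dict[str, Tuple[int, int]] = {"t1": (3, 1), "t2": (0, 2), "t3": (1, 4), "t4": (2, 0)}

Situation = Dict[int, Dict[str, str]]  # player i -> strategy u_i (position -> next node)


def play(x0: str, situation: Situation) -> List[str]:
    """Follow x_k = u_{i_k}(x_{k-1}) from x0 until a final position is reached."""
    path = [x0]
    while GAMMA[path[-1]]:              # gamma(x) is empty exactly on X_{n+1}
        x = path[-1]
        y = situation[OWNER[x]][x]      # the mover's strategy picks the next node
        if y not in GAMMA[x]:
            raise ValueError(f"{y} is not a successor of {x}")
        path.append(y)
    return path


def payoff(x0: str, situation: Situation) -> Tuple[int, int]:
    """K(u) = H(x_l), where x_l is the last node of the play."""
    return H[play(x0, situation)[-1]]


u = {1: {"x0": "b"}, 2: {"a": "t1", "b": "t3"}}
print(play("x0", u))    # ['x0', 'b', 't3']
print(payoff("x0", u))  # (1, 4)
```

The strategy of player 2 also prescribes a choice in position `a`, which this play never reaches; as in the definition above, a strategy is defined on the whole priority set, not only on the realized play.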
7.2 Nash equilibrium
We first consider the general case. Let $n$ arbitrary sets $A_1, \dots, A_n$ be given. The set $A_i$ is called the strategy set of the $i$-th player $(i = 1, \dots, n)$. Let further $n$ real functions $K_1, \dots, K_n$ be given on the Cartesian product $A_1 \times \dots \times A_n$. The function $K_i$ is called the payoff function of the $i$-th player. Then the collection

$$\Gamma = \langle A_1, \dots, A_n, K_1, \dots, K_n \rangle \qquad (7.2.1)$$

is called the $n$-person game in normal form. The game may be described as follows. Each player $i$, simultaneously with and independently of the others, chooses a strategy $u_i \in A_i$ $(i = 1, \dots, n)$. The players' choices result in an $n$-tuple of strategies $u = (u_1, \dots, u_n)$ called a situation. Thereafter player $i$ receives a payoff equal to the value of his payoff function in this situation, $K_i(u) = K_i(u_1, \dots, u_n)$. Each player aims to obtain a maximal payoff. If $n = 2$ and $K_1 = -K_2$, then the game (7.2.1) is a zero-sum game. This chapter considers noncooperative games (7.2.1), in which each player acts individually and receives the payoff determined by his payoff function. Since the function $K_i$ does not depend on the strategy $u_i$ alone, the payoff of the $i$-th player in a noncooperative game is not determined by his own behavior alone. Therefore, in choosing his strategy, each player has to take into account the possible behavior of all his partners.

The situation $u^* = (u_1^*, \dots, u_n^*)$ is called a Nash equilibrium in the noncooperative game (7.2.1) if

$$K_i(u^*) \ge K_i(u^* \| u_i) \quad \text{for all } u_i \in A_i \quad (i = 1, \dots, n).$$

Here $(u^* \| u_i)$ denotes the situation $(u_1^*, \dots, u_{i-1}^*, u_i, u_{i+1}^*, \dots, u_n^*)$.
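When the strategy sets are finite, the definition can be checked directly by trying every unilateral deviation $(u^* \| u_i)$. A minimal brute-force sketch follows; the 2 × 2 coordination game used to exercise it is hypothetical, and payoffs are indexed from 0 in the code while players are numbered from 1 in the text.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Situation = Tuple[str, ...]


def is_nash(u: Situation, strategy_sets: Sequence[Sequence[str]],
            K: Callable[[Situation], Tuple[float, ...]]) -> bool:
    """True if K_i(u) >= K_i(u || u_i') for every player i and every u_i' in A_i."""
    base = K(u)
    for i, A_i in enumerate(strategy_sets):        # i is the 0-based player index
        for alt in A_i:
            deviated = u[:i] + (alt,) + u[i + 1:]  # the situation (u || alt)
            if K(deviated)[i] > base[i]:
                return False                        # a profitable unilateral deviation
    return True


def pure_equilibria(strategy_sets, K) -> List[Situation]:
    """All Nash equilibria among the situations of A_1 x ... x A_n."""
    return [u for u in product(*strategy_sets) if is_nash(u, strategy_sets, K)]


# Hypothetical coordination game: a player gains only if both choices match.
TABLE = {("a", "a"): (2, 1), ("a", "b"): (0, 0), ("b", "a"): (0, 0), ("b", "b"): (1, 2)}
SETS = [("a", "b"), ("a", "b")]
print(pure_equilibria(SETS, TABLE.__getitem__))  # [('a', 'a'), ('b', 'b')]
```

Both situations in which the players coordinate are Nash equilibria, so an equilibrium need not be unique, and the two equilibria give the players different payoffs.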
The optimality of the strategy $u_i^*$ appearing in the Nash equilibrium means that a deviation of the $i$-th player from it, while the other players keep playing the equilibrium strategies $u_j^*$ $(j \ne i)$ of the situation $u^*$, does not increase the payoff of player $i$. Since only unilateral deviations are considered, a Nash equilibrium is not necessarily stable against group deviations. For the noncooperative multistage game with complete information (7.1.1), the notion of equilibrium may be strengthened by introducing the notion of absolute equilibrium. A subgame $\Gamma_y$ is associated with every subgraph $G_y = \langle X_y, \gamma \rangle$ of the graph $G$ in the following manner: the priority sets of the players in the subgame $\Gamma_y$ are determined by the rule $X_i^y = X_i \cap X_y$ $(i = 1, \dots, n)$, the set of final positions is $X_{n+1}^y = X_{n+1} \cap X_y$, and the payoffs of the players in the subgame are $H_i^y(x) = H_i(x)$ for all $x \in X_{n+1}^y$ $(i = 1, \dots, n)$. Accordingly, the strategy of the $i$-th player in the subgame is defined as the truncation of the strategy $u_i$ of the same player in the game (7.1.1), i.e. $u_i^y(x) = u_i(x)$, $x \in X_i^y$ $(i = 1, \dots, n)$. The set of all such strategies is denoted by $D_i^y$. As a result, each subgraph $G_y$ defines the game in normal form

$$\Gamma_y = \langle D_1^y, \dots, D_n^y, K_1^y, \dots, K_n^y \rangle, \qquad (7.2.3)$$

where the payoff function $K_i^y$ is defined on $D_1^y \times \dots \times D_n^y$. A Nash equilibrium $u^* = (u_1^*, \dots, u_n^*)$ is called an absolute Nash equilibrium (subgame perfect equilibrium) in the game (7.1.1) if for any position $y \in X$ the situation $u^{*y} = (u_1^{*y}, \dots, u_n^{*y})$, where $u_i^{*y}$ is the truncation of the strategy $u_i^*$, is a Nash equilibrium in the corresponding subgame $\Gamma_y$ (7.2.3). The following statement holds true.
Theorem 1. In any multistage game with complete information (7.1.1) on a finite graph tree $G$ there exists an absolute Nash equilibrium.
Proof: The length of the game (7.1.1) on the graph $G$ is defined as the length of the longest path in $G$. The proof is by induction on the length of the game. If the length of the game (7.1.1) equals 1, then only one of the players can make a move in it. By choosing the next node so as to maximize his payoff, this player acts according to a strategy that forms an absolute Nash equilibrium. Let us now assume that the theorem holds for all games whose length does not exceed $k - 1$, and prove it for a game (7.1.1) of length $k$. Since the subgames $\Gamma_y$, $y \in \gamma(x_0)$, have length at most $k - 1$, by the induction assumption an absolute Nash equilibrium exists in each of them. For each subgame $\Gamma_y$ we denote this equilibrium by

$$\bar u^y = (\bar u_1^y, \dots, \bar u_n^y).$$

Let

$$\bar u_i(x) = \bar u_i^y(x) \quad \text{for } x \in X_i \cap X_y, \; y \in \gamma(x_0) \quad (i = 1, \dots, n), \qquad (7.2.4)$$

and $\bar u_{i_1}(x_0) = y'$, where $x_0 \in X_{i_1}$ and

$$K_{i_1}^{y'}(\bar u^{y'}) = \max_{y \in \gamma(x_0)} K_{i_1}^y(\bar u_1^y, \dots, \bar u_n^y).$$

The function $\bar u_i$ is a strategy in the game (7.1.1), and its truncation $\bar u_i^y$ appearing in (7.2.4) is a strategy in the subgame $\Gamma_y$. Thus, to complete the proof, it suffices to show that the situation $\bar u = (\bar u_1, \dots, \bar u_n)$ is a Nash equilibrium in the game (7.1.1). Indeed, for player $i_1$, with $y'' = u_{i_1}(x_0)$,

$$K_{i_1}(\bar u) = K_{i_1}^{y'}(\bar u^{y'}) = \max_{y \in \gamma(x_0)} K_{i_1}^y(\bar u^y) \ge K_{i_1}^{y''}(\bar u^{y''}) \ge K_{i_1}^{y''}(\bar u^{y''} \| u_{i_1}^{y''}) = K_{i_1}(\bar u \| u_{i_1})$$

for all $u_{i_1} \in D_{i_1}$; if $i \ne i_1$, then

$$K_i(\bar u) = K_i^{y'}(\bar u^{y'}) \ge K_i^{y'}(\bar u^{y'} \| u_i^{y'}) = K_i(\bar u \| u_i)$$

for all $u_i \in D_i$. This completes the proof of the theorem. ∎
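The backward construction in this proof can be sketched as a recursive function that first solves the subgames $\Gamma_y$, $y \in \gamma(x)$, and then lets the player moving at $x$ pick the best successor. This is a sketch on a hypothetical toy tree; the tie-breaking rule (keep the first maximiser, which is what Python's `max` does) is an arbitrary assumption, and other rules can give other absolute equilibria.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree (not from the book).
GAMMA: Dict[str, List[str]] = {
    "x0": ["a", "b"], "a": ["t1", "t2"], "b": ["t3", "t4"],
    "t1": [], "t2": [], "t3": [], "t4": [],
}
OWNER: Dict[str, int] = {"x0": 1, "a": 2, "b": 2}
H: Dict[str, Tuple[int, int]] = {"t1": (3, 1), "t2": (0, 2), "t3": (1, 4), "t4": (2, 0)}


def absolute_equilibrium(x: str, choice: Optional[Dict[str, str]] = None):
    """Backward construction from the proof of Theorem 1.

    Solves every subgame Gamma_y, y in gamma(x), first, then lets the player
    who moves at x pick the successor with the largest equilibrium payoff.
    Returns (payoff vector of Gamma_x, choice made at every decision node below x).
    """
    if choice is None:
        choice = {}
    if not GAMMA[x]:                  # final position: the payoff is H(x)
        return H[x], choice
    value = {y: absolute_equilibrium(y, choice)[0] for y in GAMMA[x]}
    i = OWNER[x] - 1                  # 0-based index of the player moving at x
    best = max(GAMMA[x], key=lambda y: value[y][i])  # ties: first maximiser is kept
    choice[x] = best
    return value[best], choice


v, u = absolute_equilibrium("x0")
print(v)  # (1, 4)
print(u)  # {'a': 't2', 'b': 't3', 'x0': 'b'}
```

The resulting choice function is defined at every decision node, including `a`, which the equilibrium play does not reach, so its truncation to each subgame is again an equilibrium there.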
Example 4. Suppose the two-person game (7.1.1), $n = 2$, is played on the graph depicted in Fig. 37; the structure of the game is presented in Fig. 38: the nodes of the priority set $X_1$ are represented by circles and those of the set $X_2$ by blocks, with the players' payoffs written in the final positions. The positions of the sets $X_i$ $(i = 1, 2)$ and the arcs emanating from each node $x \in X_i$ are numbered; positions carry double indices $(i, j)$, where $i$ is the index of the player and $j$ the index of the node $x$ in the set $X_i$. Since a choice in the node $x$ is equivalent to the choice of the next node $y \in \gamma(x)$, for simplicity we assume that in each node the strategy indicates the number of the arc leading to the node $y \in \gamma(x)$. For example, the strategy of the first player $u_1 = (2, 1, 2, 3, 1, 2, 1, 1)$ prescribes that he choose arc 2 in node (1,1), arc 1 in node (1,2), arc 2 in node (1,3), arc 3 in node (1,4), etc. Since the priority set of the first player consists of 8 nodes, his strategy is an eight-dimensional vector. Any strategy of the second player is a seven-dimensional vector. Altogether the first player has 864 strategies and the second has 576. The corresponding game in normal form is a bimatrix game with two payoff matrices of dimension 864 × 576. The direct solution of such bimatrix games is not only difficult but practically impossible. At the same time, the game has a fairly simple structure and can be readily solved by the backward construction of an absolute Nash equilibrium, as was done in the proof of Theorem 1.

Fig. 38.

We denote by $v_1(y)$ and $v_2(y)$ the payoffs of Players 1 and 2, respectively, in the subgame $\Gamma_y$ in a fixed absolute equilibrium. We first solve the subgames $\Gamma_{(1,6)}$, $\Gamma_{(1,7)}$, $\Gamma_{(2,7)}$. It can be readily seen that $v_1(1,6) = 6$, $v_2(1,6) = 2$, $v_1(1,7) = 2$, $v_2(1,7) = 4$, $v_1(2,7) = 1$, $v_2(2,7) = 2$. Then we solve the subgames $\Gamma_{(2,5)}$, $\Gamma_{(2,6)}$, $\Gamma_{(1,8)}$. In the subgame $\Gamma_{(2,5)}$ there are two Nash equilibria, since the second player is indifferent between the alternatives. His choice, however, matters to Player 1: if Player 2 chooses the left-hand arc, the first player scores 1, whereas if Player 2 chooses the other arc, Player 1 scores 6. With this in mind, assume that the second player "favours" Player 1 and chooses the right-hand arc in position (2,5). Then $v_1(2,5) = v_1(1,6) = 6$, $v_2(2,5) = v_2(1,6) = 2$, $v_1(2,6) = v_1(1,7) = 2$, $v_2(2,6) = v_2(1,7) = 4$, $v_1(1,8) = 2$, $v_2(1,8) = 3$. Next we solve the games $\Gamma_{(1,3)}$, $\Gamma_{(1,4)}$, $\Gamma_{(1,5)}$, $\Gamma_{(2,3)}$, $\Gamma_{(2,4)}$. In the subgame $\Gamma_{(1,3)}$ there are two Nash equilibria, since the first player is indifferent between the alternatives. His choice, however, matters to Player 2: if Player 1 chooses the left-hand alternative, Player 2 scores 1, whereas with the right-hand alternative he scores 10. Assume that Player 1 selects the right-hand alternative in position (1,3). Then $v_1(1,3) = 5$, $v_2(1,3) = 10$, $v_1(1,4) = v_1(2,5) = 6$, $v_2(1,4) = v_2(2,5) = 2$, $v_1(1,5) = v_1(2,6) = 2$, $v_2(1,5) = v_2(2,6) = 4$, $v_1(2,3) = 0$, $v_2(2,3) = 6$, $v_1(2,4) = 3$, $v_2(2,4) = 5$. For the games $\Gamma_{(2,1)}$, $\Gamma_{(1,2)}$, $\Gamma_{(2,2)}$ we have $v_1(2,1) = v_1(1,3) = 5$, $v_2(2,1) = v_2(1,3) = 10$, $v_1(1,2) = v_1(2,4) = 3$, $v_2(1,2) = v_2(2,4) = 5$, $v_1(2,2) = -5$, $v_2(2,2) = 6$. Now it is possible to solve the game $\Gamma = \Gamma_{(1,1)}$. It can be readily seen that $v_1(1,1) = v_1(2,1) = 5$, $v_2(1,1) = v_2(2,1) = 10$. As a result, we get the following absolute Nash equilibrium:

$$(\bar u_1, \bar u_2) = ((1, 2, 2, 2, 2, 3, 2, 1), (1, 3, 2, 2, 2, 1, 2)). \qquad (7.2.5)$$
In the equilibrium (7.2.5) the game follows the path ((1,1), (2,1), (1,3)). By construction, the strategies $\bar u_i$ $(i = 1, 2)$ are "favourable" in the sense that player $i$, when making his move and being indifferent between the subsequent alternatives, chooses the alternative that is favourable for player $3 - i$. In the game under study there are equilibria with different payoffs. To construct one of them, it suffices to replace the players' "favourableness" condition with the inverse condition of "unfavourableness". Then we obtain a new Nash equilibrium

$$(\tilde u_1, \tilde u_2) = ((2, 2, 1, 1, 2, 3, 2, 1), (3, 3, 2, 2, 1, 1, 2)). \qquad (7.2.6)$$

The payoffs of both players in (7.2.6) are less than those in the equilibrium (7.2.5). The equilibrium (7.2.6), like the equilibrium (7.2.5), is an absolute equilibrium. Evidently, apart from the "favourable" and "unfavourable" absolute Nash equilibria, there is an entire family of intermediate absolute equilibria. A natural question is when these equilibria do not differ in the players' payoffs.

Theorem 2. Suppose the payoffs $H_i$ $(i = 1, \dots, n)$ of the players in the game (7.1.1) are such that, whenever $H_k(x) = H_k(y)$ for some $k$ and some $x \ne y$, then $H_i(x) = H_i(y)$ for all $i = 1, \dots, n$. Then in the game (7.1.1) the players' payoffs coincide in all absolute equilibria.
Proof: Consider the family of subgames of the game (7.1.1); the proof is by induction on their length. Suppose the subgame $\Gamma_x$ has length 1 and player $i_1$ makes a move in its unique non-final position. In equilibrium he then chooses $\bar x \in \gamma(x)$ from the condition

$$H_{i_1}(\bar x) = \max_{y \in \gamma(x)} H_{i_1}(y).$$

If the point $\bar x$ is unique, then the players' payoff vector in the Nash equilibrium is also unique and equals $H(\bar x) = (H_1(\bar x), \dots, H_n(\bar x))$. If there exists a point $\tilde x \ne \bar x$ such that $H_{i_1}(\tilde x) = H_{i_1}(\bar x)$, then there is one more equilibrium, with the payoffs $H(\tilde x) = (H_1(\tilde x), \dots, H_n(\tilde x))$. By the condition of the theorem, however, $H(\tilde x) = H(\bar x)$. Let $v(x) = (v_1(x), \dots, v_n(x))$ be the payoff vector in the equilibrium of a single-stage subgame $\Gamma_x$, which, as has just been shown, is uniquely defined. We will prove that if $v_{i_1}(x') = v_{i_1}(x'')$ for some $i_1$, where the subgames $\Gamma_{x'}$ and $\Gamma_{x''}$ both have length 1, then $v_i(x') = v_i(x'')$ for all $i = 1, \dots, n$. Indeed, let $x' \in X_{i_k}$, $x'' \in X_{i_s}$, and let $\bar x'$, $\bar x''$ be the nodes chosen in $\Gamma_{x'}$ and $\Gamma_{x''}$. Then

$$\begin{aligned} v_{i_k}(x') &= H_{i_k}(\bar x') = \max_{y \in \gamma(x')} H_{i_k}(y), & v_{i_s}(x'') &= H_{i_s}(\bar x'') = \max_{y \in \gamma(x'')} H_{i_s}(y), \\ v_i(x') &= H_i(\bar x'), & v_i(x'') &= H_i(\bar x''), \quad i = 1, \dots, n. \end{aligned} \qquad (7.2.7)$$

From (7.2.7) we have $H_{i_1}(\bar x') = v_{i_1}(x') = v_{i_1}(x'') = H_{i_1}(\bar x'')$, and hence, using the condition of the theorem, we get

$$v_i(x') = H_i(\bar x') = H_i(\bar x'') = v_i(x''), \qquad i = 1, \dots, n.$$

We now assume that in all subgames $\Gamma_x$ of length at most $k - 1$ the equilibrium payoffs $v_i(x)$ of all players are uniquely defined, and that if for two subgames $\Gamma_{x'}$ and $\Gamma_{x''}$ the equality $v_{i_1}(x') = v_{i_1}(x'')$ holds for one index $i_1$, then $v_i(x') = v_i(x'')$ for all indices $i \ne i_1$. Let $\Gamma_x$ be a subgame of length $k$. If $x \in X_{i_1}$, then in the position $x$ player $i_1$ chooses the next position $\bar x \in \gamma(x)$ from the condition

$$v_{i_1}(\bar x) = \max_{y \in \gamma(x)} v_{i_1}(y).$$

If the point $\bar x$ determined by this condition is unique, there is a unique vector $(v_1(\bar x), \dots, v_n(\bar x))$ of the players' payoffs in the Nash equilibrium of the game $\Gamma_x$. If, however, there are two positions $\bar x$ and $\tilde x$ for which $v_{i_1}(\bar x) = v_{i_1}(\tilde x) = v_{i_1}(x)$, then by the induction assumption $v_i(\bar x) = v_i(\tilde x)$ for all $i \ne i_1$. Consequently, the players' payoffs $\{v_i(x)\}$ in the Nash equilibrium of the game $\Gamma_x$ are uniquely defined: $v_i(x) = v_i(\bar x)$, $i = 1, \dots, n$. This completes the proof of the theorem. ∎
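The hypothesis of Theorem 2 concerns only the final payoffs, so it can be tested mechanically. A minimal sketch, assuming payoffs are stored as a dictionary from final positions to payoff tuples; both payoff tables are hypothetical. In the second table one player is indifferent between two outcomes that differ for the other player, which is the kind of tie that produced the "favourable" and "unfavourable" equilibria of Example 4.

```python
from itertools import combinations
from typing import Dict, Hashable, Tuple


def satisfies_theorem_2(H: Dict[Hashable, Tuple[float, ...]]) -> bool:
    """Condition of Theorem 2 on the final payoffs.

    Whenever some player k has H_k(x) == H_k(y) for two distinct final
    positions x != y, the whole payoff vectors H(x) and H(y) must coincide.
    """
    for x, y in combinations(H, 2):
        some_tie = any(hx == hy for hx, hy in zip(H[x], H[y]))
        if some_tie and H[x] != H[y]:
            return False
    return True


print(satisfies_theorem_2({"t1": (3, 1), "t2": (0, 2), "t3": (1, 4)}))  # True
print(satisfies_theorem_2({"f1": (4, 5), "f2": (1, 5)}))                # False
```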
We will now derive functional equations for nonzero-sum games with complete information. It follows from Theorem 1 that in any nonzero-sum game with complete information there exists an absolute Nash equilibrium, i.e. a situation $\bar u = (\bar u_1, \dots, \bar u_n)$ whose truncation to any subgame $\Gamma_x$ of the game $\Gamma$ is a Nash equilibrium there. Denote by $v(x) = (v_1(x), \dots, v_n(x))$ the payoff vector in the Nash equilibrium of the subgame $\Gamma_x$ obtained by truncating the absolute equilibrium $(\bar u_1, \dots, \bar u_n)$. In contrast to the zero-sum case, where the value of the game is uniquely defined, here the vector $v(x)$ takes different values for different Nash equilibria, since the players' payoff vectors may differ across absolute equilibria. As was shown above, $v(x)$ may be constructed as a single-valued vector function that assigns to each $x$ the payoff vector in one of the absolute equilibria of the subgame $\Gamma_x$. We now derive functional equations for computing the vector function $v(x)$.

We have

$$v(x) = (v_1(x), \dots, v_n(x)) = (K_1^x(\bar u^x), \dots, K_n^x(\bar u^x)),$$

where $\bar u^x = (\bar u_1^x, \dots, \bar u_n^x)$ are absolute equilibria in the subgames $\Gamma_x$. Let $x \in X_i$ and let $y^* = \bar u_i(x) \in \gamma(x)$ be the node chosen by player $i$. Then

$$v_i(x) = K_i^x(\bar u^x) = K_i^{y^*}(\bar u^{y^*}) = \max_{y \in \gamma(x)} K_i^y(\bar u^y) = \max_{y \in \gamma(x)} v_i(y),$$

$$v_j(x) = K_j^x(\bar u^x) = K_j^{y^*}(\bar u^{y^*}) = v_j(y^*) \quad \text{for } j \ne i.$$

Thus,

$$v_i(x) = v_i(y^*) = \max_{y \in \gamma(x)} v_i(y) \qquad (7.2.8)$$

for all $x \in X_i$, and

$$v_j(x) = v_j(y^*) \quad \text{for } j \ne i \quad (i, j = 1, \dots, n). \qquad (7.2.9)$$

Equations (7.2.8), (7.2.9) are solved under the boundary condition

$$v(x) = H(x), \quad x \in X_{n+1}. \qquad (7.2.10)$$
The system of equations (7.2.8), (7.2.9) with the boundary condition (7.2.10) makes it possible to construct a backward recurrent procedure for computing the vector function $v(x)$ and the corresponding absolute Nash equilibrium. Indeed, let the vector function $v(y)$ be computed for all subgames $\Gamma_y$ of length $l \le k - 1$, and let $\Gamma_x$ be a subgame of length $k$. In this case, if $x \in X_i$, then $v_i(x)$ is determined from (7.2.8) and $v_j(x)$ $(j \ne i)$ from (7.2.9), i.e.

$$v(x) = \Big(v_1(y^*), \dots, v_{i-1}(y^*), \max_{y \in \gamma(x)} v_i(y), v_{i+1}(y^*), \dots, v_n(y^*)\Big).$$

The same formulas indicate the initial optimal choice in the position $x \in X_i$ of the subgame $\Gamma_x$: in this position player $i$ has to choose a node $y^* \in \gamma(x)$ at which $v_i(y^*)$ is maximal. Evidently, the solution of the system of recurrent equations (7.2.8), (7.2.9) under the boundary condition (7.2.10) is not unique, since the maximum in (7.2.8) may be attained at different nodes $y_1^*$, $y_2^*$; although $v_i(y_1^*) = v_i(y_2^*)$, the values $v_j(y_1^*)$ and $v_j(y_2^*)$ may differ for some $j \ne i$, which substantially affects the rest of the solution. As noted in Example 4, this is exactly what leads to different payoffs in different absolute equilibria.
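The non-uniqueness just described can be reproduced with a small backward solver for (7.2.8)-(7.2.10). A minimal sketch for two players, assuming the "favourable" and "unfavourable" tie-breaking rules of Example 4; the tree and payoffs are hypothetical. In node `p` Player 2 gets 5 either way, and his tie-breaking alone changes the payoff vector of the whole game.

```python
from typing import Dict, List, Tuple

# Hypothetical tree (not from the book): Player 1 moves at r, Player 2 at p.
GAMMA: Dict[str, List[str]] = {"r": ["p", "q"], "p": ["f1", "f2"], "q": [], "f1": [], "f2": []}
OWNER: Dict[str, int] = {"r": 1, "p": 2}
H: Dict[str, Tuple[int, int]] = {"q": (3, 3), "f1": (4, 5), "f2": (1, 5)}


def v(x: str, favourable: bool = True) -> Tuple[int, int]:
    """Backward solution of (7.2.8)-(7.2.10) for a two-person game.

    Among the maximisers in (7.2.8) the mover picks the node that is best
    (favourable=True) or worst (favourable=False) for the other player.
    """
    if not GAMMA[x]:
        return H[x]                                    # boundary condition (7.2.10)
    vals = {y: v(y, favourable) for y in GAMMA[x]}
    i = OWNER[x] - 1                                   # 0-based index of the mover
    top = max(val[i] for val in vals.values())         # right-hand side of (7.2.8)
    ties = [y for y in GAMMA[x] if vals[y][i] == top]  # all maximisers y*
    pick = max if favourable else min
    y_star = pick(ties, key=lambda y: vals[y][1 - i])  # tie-break on player 3 - i
    return vals[y_star]                                # (7.2.9): v_j(x) = v_j(y*)


print(v("r", favourable=True))   # (4, 5)
print(v("r", favourable=False))  # (3, 3)
```

Here, as in Example 4, the "favourable" rule gives both players more; Example 7 below shows that this ordering is not guaranteed in general.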
Although we have proved the existence of an absolute Nash equilibrium in the nonzero-sum multistage game with complete information, the basic game (7.1.1) may also have equilibria that are not absolute. Among such equilibria a special role is played by equilibria in penalty strategies. This will be illustrated by examples.

Example 5. Consider the nonzero-sum two-person game from Example 4. We compare the payoffs in the "favourable" and "unfavourable" equilibria. In the "favourable" equilibrium (7.2.5) the players' payoffs are (5, 10), whereas in the equilibrium (7.2.6) they are (3, 5). In the equilibrium (7.2.5) the large payoff of Player 2 depends entirely on the whims of Player 1, since in the node (1,3) the choice between the left-hand and right-hand arcs makes no difference to Player 1, and if for some reason he chooses the left-hand arc, Player 2's payoff drops to 1. However, by employing a particular behavior, here called a penalty strategy, Player 2 can always ensure the payoff 5. Indeed, if Player 2 adopts the strategy

$$u_2' = (2, 3, 2, 2, 1, 1, 2), \qquad (7.2.11)$$

then the best response of Player 1 can only come from the class of strategies

$$u_1' = (2, 2, a_3, 1, a_5, a_6, a_7, a_8), \qquad (7.2.12)$$

where $a_i$ $(i \ne 1, 2, 4)$ are arbitrary admissible choices of Player 1 in the corresponding nodes. The players' payoffs in the equilibrium (7.2.11), (7.2.12) are $K_1(u_1', u_2') = 3$, $K_2(u_1', u_2') = 5$ and coincide with those in (7.2.6). It is fairly easy to show that the pair of strategies (7.2.11), (7.2.12) forms a Nash equilibrium. At the same time, in this equilibrium Player 2 ensures the payoff 5, which is not true for the situation (7.2.6), in which Player 1 may choose the strategy $u_1'' = (1, a_2, a_3, 3, a_5, a_6, 2, a_8)$, and then $K_2(u_1'', \tilde u_2) = 4$.

The strategy (7.2.11) is a penalty strategy in the following sense: when Player 2 adopts it, a departure of Player 1 from the strategy (7.2.12) in the node $x_0$, say by choosing the left-hand arc, triggers Player 2's choice of the central arc in the node (2,1), after which Player 1's payoff is 2. In this case Player 2's choice in the subgame $\Gamma_{(2,1)}$ is far from optimal (in this position Player 2 should have chosen the right-hand arc), but he thereby "penalizes" Player 1 and prevents his departure from the strategy $u_1'$ in the position (1,1). Thus the truncation of the Nash equilibrium (7.2.11), (7.2.12) to the subgame $\Gamma_{(2,1)}$ is not a Nash equilibrium, i.e. the equilibrium is not absolute (subgame perfect), although it guarantees Player 2 the payoff 5.

Example 6. We now give a simpler example illustrating the above-mentioned property of equilibria in penalty strategies. The structure of the game is shown in Fig. 39.

Fig. 39.
At the first step Player 1 chooses one of two alternatives, "right" (R) or "left" (L); at the second step Player 2, knowing Player 1's move, chooses R or L; at the third step Player 1 chooses R or L, whereupon the game terminates and the players receive their payoffs in the final positions. In Fig. 39 the payoffs of Player 1 are shown in blocks and those of Player 2 in circles. Denote the nodes in which Player 1 makes a move by $x_1, x_2, x_3, x_4, x_5$, and the nodes in which Player 2 makes a move by $y_1, y_2$. In each position $x_i \in X_1$ (respectively $y_i \in X_2$) the strategy $u$ of the first player (respectively $v$ of the second) takes one of two values: R or L. Denote the game in question by $\Gamma(x_1, 3)$ (here $x_1$ is the initial position and 3 is the number of moves in the game). An absolute equilibrium in this game has the form $u(x_1) = L$, $u(x_2) = L$, $u(x_3) = R$, $u(x_4) = R$, $u(x_5) = R$, $v(y_1) = L$, $v(y_2) = L$. These strategies form a Nash equilibrium in all subgames $\Gamma(x_1, 3)$, $\Gamma(y_j, 2)$ $(j = 1, 2)$, $\Gamma(x_i, 1)$ $(i = 2, 3, 4, 5)$. We now consider an equilibrium in penalty strategies. It can be readily seen that it has the form

$$u(x_1) = R, \; u(x_2) = L, \; u(x_3) = R, \; u(x_4) = L, \; u(x_5) = R, \; v(y_1) = R, \; v(y_2) = R. \qquad (7.2.13)$$

Note that the strategies (7.2.13) do not form a Nash equilibrium in the subgame $\Gamma(x_4, 1)$. It is also of interest that the players' payoffs in this equilibrium are higher than in the absolute equilibrium ((10, 3) and (8, 2), respectively). If Player 2 does not follow the strategy (7.2.13), in particular if he chooses $v(y_2) = L$ instead of $v(y_2) = R$, then Player 1 in the node $x_4$ chooses the alternative $u(x_4) = L$, which minimizes not only the payoff of the "penalized" Player 2 but also his own payoff. For this reason the truncation of the strategies (7.2.13) to the subgame $\Gamma(x_4, 1)$ is not an equilibrium in this subgame.

Example 7. We have seen in Example 4 that "favourable" behavior gives the players higher payoffs in the corresponding Nash equilibria than "unfavourable" behavior. But this is not always the case: sometimes the "unfavourable" Nash equilibrium gives all the players higher payoffs than the "favourable" one. We illustrate this rather nontrivial fact with the following example. Consider the two-person game in Fig. 40. The nodes of the priority set $X_1$ are represented by circles and those of $X_2$ by blocks, with the players' payoffs written in the final positions. As in Example 4, positions of the set $X_i$ $(i = 1, 2)$ are numbered by double indices $(i, j)$, where $i$ is the index of the player and $j$ the index of the node $x$ in the set $X_i$. One can easily see that the "favourable" equilibrium has the form ((2, 2, 1, 1, 1), (2, 1)) with payoffs (2, 1), whereas the "unfavourable" equilibrium has the form ((1, 1, 2, 1, 1), (1, 1)) with payoffs (5, 3).

Fig. 40.

We now give a precise definition of penalty strategies. For simplicity, we restrict ourselves to the case of the nonzero-sum two-person multistage game with complete information

$$\Gamma = \langle D_1, D_2, K_1, K_2 \rangle \qquad (7.2.14)$$

on the graph $G$. This game is related to two zero-sum games

$$\Gamma_1 = \langle D_1, D_2, K_1, -K_1 \rangle, \qquad (7.2.15)$$
$$\Gamma_2 = \langle D_1, D_2, -K_2, K_2 \rangle. \qquad (7.2.16)$$

Let $\Gamma_{1x}$, $\Gamma_{2x}$ be the subgames of the games (7.2.15), (7.2.16); $v_1(x)$, $v_2(x)$ the values of these subgames; and $(\bar u_{11}^x, \bar u_{21}^x)$ and $(\bar u_{12}^x, \bar u_{22}^x)$ the equilibria in the games $\Gamma_{1x}$, $\Gamma_{2x}$, respectively. Consider a fixed path $Z = (x_0 = z_0, z_1, \dots, z_l)$ on the graph $G$ realized in the games (7.2.14)-(7.2.16). The strategy $\bar u_1$ is called a penalty strategy of Player 1 if

$$\bar u_1(z_k) = z_{k+1} \quad \text{for positions } z_k \in Z \cap X_1, \qquad (7.2.17)$$

$$\bar u_1(y) = \bar u_{12}^x(y) \quad \text{for positions } y \in X_1 \setminus Z \text{ such that } (\gamma^{-1})^m(y) = x \in Z \text{ for some } m. \qquad (7.2.18)$$

The penalty strategy $\bar u_2$ of Player 2 is defined in much the same way:

$$\bar u_2(z_k) = z_{k+1} \quad \text{for positions } z_k \in Z \cap X_2, \qquad (7.2.19)$$

$$\bar u_2(y) = \bar u_{21}^x(y) \quad \text{if } y \in X_2 \setminus Z \text{ and } (\gamma^{-1})^m(y) = x \in Z \text{ for some } m. \qquad (7.2.20)$$
Note that

$$K_i(\bar u_1, \bar u_2) = H_i(z_l) \quad (i = 1, 2). \qquad (7.2.21)$$

If Player 1 employs a strategy $u_1$ for which $z_k \in Z$ is the first position of the path $Z$ where $u_1(z_k) \ne z_{k+1}$, then conditions (7.2.19), (7.2.20) imply the inequality

$$K_1(u_1, \bar u_2) \le v_1(z_k). \qquad (7.2.22)$$

If for Player 2's strategy $u_2$ the position $z_k \in Z$ is the first position of the path $Z$ where $u_2(z_k) \ne z_{k+1}$, then conditions (7.2.17), (7.2.18) imply the inequality

$$K_2(\bar u_1, u_2) \le v_2(z_k). \qquad (7.2.23)$$

The next statement follows from relations (7.2.21)-(7.2.23).

Theorem 3. For the pair of penalty strategies $(\bar u_1, \bar u_2)$ to be a Nash equilibrium in the game (7.2.14), it is sufficient that for all $j = 0, 1, \dots, l$ the following inequalities hold:

$$K_i(\bar u_1, \bar u_2) \ge v_i(z_j) \quad (i = 1, 2), \qquad (7.2.24)$$

where $(z_0, z_1, \dots, z_l)$ is the path realized in the situation $(\bar u_1, \bar u_2)$.

Let $\bar u_{11}$ and $\bar u_{22}$ be the strategies of Players 1 and 2 in the equilibria of the games (7.2.15) and (7.2.16), respectively; $Z$ the path in the corresponding situation $(\bar u_{11}, \bar u_{22})$; and $\bar u_1$, $\bar u_2$ the penalty strategies for the path $Z$. It can be shown that condition (7.2.24) is then satisfied. This yields the following theorem.

Theorem 4. In the game (7.2.14) there always exists a Nash equilibrium in penalty strategies.
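Condition (7.2.24) involves only the values $v_1$, $v_2$ of the zero-sum subgames and the payoff at the end of the path, so it is easy to check on small trees. The sketch below computes these values by minimax recursion and tests (7.2.24) for a given path; the tree and payoffs are hypothetical, and the check covers only the sufficient condition of Theorem 3, not the construction of the penalty strategies themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical tree (not from the book): Player 1 moves at r and s, Player 2 at p.
GAMMA: Dict[str, List[str]] = {
    "r": ["p", "q"], "p": ["f1", "s"], "s": ["g1", "g2"],
    "q": [], "f1": [], "g1": [], "g2": [],
}
OWNER: Dict[str, int] = {"r": 1, "p": 2, "s": 1}
H: Dict[str, Tuple[int, int]] = {"q": (1, 1), "f1": (3, 3), "g1": (0, 0), "g2": (4, 2)}


def zero_sum_value(x: str, i: int) -> int:
    """Value v_i(x) of the subgame of Gamma_i in (7.2.15)/(7.2.16):
    player i maximises H_i and his opponent minimises it."""
    if not GAMMA[x]:
        return H[x][i - 1]
    vals = [zero_sum_value(y, i) for y in GAMMA[x]]
    return max(vals) if OWNER[x] == i else min(vals)


def condition_7_2_24(path: List[str]) -> bool:
    """Sufficient condition of Theorem 3 for the path Z = (z_0, ..., z_l)."""
    K = H[path[-1]]                                  # (7.2.21): K_i = H_i(z_l)
    return all(K[i - 1] >= zero_sum_value(z, i) for z in path for i in (1, 2))


print(condition_7_2_24(["r", "p", "f1"]))  # True
print(condition_7_2_24(["r", "q"]))        # False: v_1(r) = 3 exceeds K_1 = 1
```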
The point of penalty strategies is that with their help a player makes his opponent follow a particular path in the game (particular choices) by continually threatening to switch to a strategy directed against the opponent (a strategy that is optimal in the corresponding zero-sum game). The set of equilibria in the class of penalty strategies is quite representative. It should be noted that by penalizing his opponent a player may penalize himself to an even greater extent.

The strong dynamic stability of Nash equilibrium and K-equilibrium

Let us state the concept of dynamic stability (time consistency) for the nonzero-sum multistage game on a graph tree. In this section the game (7.1.1) is denoted by $\Gamma(x_0, l)$, where $l$ is the length of the game. Let $U(x_0, l)$ be the set of all situations in this game satisfying a given optimality principle (for instance, the set of Nash equilibria).
Consider a play $(x_0,x_1,\ldots,x_\ell)$ of the game in the situation $u=(u_1,\ldots,u_n)\in U(x_0,\ell)$, and let $\Gamma(x_i,\ell-i)$ be the corresponding subgames. Denote by $u^i$ the truncation of the situation $u$ to the positions of the subgame $\Gamma(x_i,\ell-i)$. The dynamic stability requirement then reduces to the condition $u^i\in U(x_i,\ell-i)$, where $U(x_i,\ell-i)$ is the set of all situations in the subgame $\Gamma(x_i,\ell-i)$ satisfying the given optimality principle (for instance, the set of equilibria). If for any $\bar u^i\in U(x_i,\ell-i)$ there exists a situation $w=(w_1,\ldots,w_n)\in U(x_0,\ell)$ such that
$$w(x)=\bar u^i(x)\tag{7.2.25}$$
on the positions of the subgame $\Gamma(x_i,\ell-i)$, $0\le i\le\ell$, then the optimality principle $U(x_0,\ell)$ is said to be strongly dynamic stable (strongly time consistent). Here $w(x)=\bar u^i(x)$ means that in each position $x\in X_j$ the choice $w_j(x)$ of the $j$-th player coincides with his choice under $\bar u^i$, i.e.
$$w_j(x)=\bar u^i_j(x),\quad x\in X_j\ (j=1,\ldots,n).\tag{7.2.26}$$

Theorem 5 The set of all Nash equilibria is a strongly dynamic stable principle of optimality.
Proof: Let $v_j(x_i)=K_j^{x_i}[\bar u^i]$ $(j=1,\ldots,n)$ be the payoffs of the players in the subgame $\Gamma(x_i,\ell-i)$ in the equilibrium $\bar u^i$ of this subgame. Consider an auxiliary game $\tilde\Gamma_i(x_0,\ell)$ (a truncation of the game $\Gamma(x_0,\ell)$ at the node $x_i$) which is the same as the game $\Gamma(x_0,\ell)$ except that in the graph $G$ of this game the node $x_i$ is declared final and the other nodes of the subgraph $G_{x_i}$ are eliminated from the set $X$. The payoffs at $x_i$ are assumed to be
$$\tilde H_j(x_i)=K_j^{x_i}[\bar u^i]=v_j(x_i),\quad j=1,\ldots,n.$$
In other respects the games $\Gamma(x_0,\ell)$ and $\tilde\Gamma_i(x_0,\ell)$ coincide. Let $\tilde w(x)$ be an equilibrium in the game $\tilde\Gamma_i(x_0,\ell)$; construct the situation $w(x)$ in the game $\Gamma(x_0,\ell)$ by the rule
$$w(x)=\begin{cases}\tilde w(x) & \text{in positions of the game } \tilde\Gamma_i(x_0,\ell),\\ \bar u^i(x) & \text{in positions of the subgame } \Gamma(x_i,\ell-i).\end{cases}\tag{7.2.27}$$
We will show that the situation $w$ is an equilibrium in the game $\Gamma(x_0,\ell)$. Note that $K_j(w)=\tilde K_j(\tilde w)$ for all $j=1,\ldots,n$, where $\tilde K_j$ is the payoff of player $j$ in the game $\tilde\Gamma_i(x_0,\ell)$. Now let $u_j\in D_j$, and denote by $u_j^i$ and $\tilde u_j$ the truncations of $u_j$ to the subgame $\Gamma(x_i,\ell-i)$ and to the game $\tilde\Gamma_i(x_0,\ell)$, respectively. Consider the situation $(w\|u_j)$. If in this situation the play of the game passes through the position $x_i$, then, since $\bar u^i$ is an equilibrium in $\Gamma(x_i,\ell-i)$ and $\tilde w$ is an equilibrium in $\tilde\Gamma_i(x_0,\ell)$,
$$K_j(w\|u_j)=K_j^{x_i}(\bar u^i\|u_j^i)\le K_j^{x_i}(\bar u^i)=\tilde K_j(\tilde w\|\tilde u_j)\le\tilde K_j(\tilde w)=K_j(w).$$
If in the situation $(w\|u_j)$ the play of the game does not pass through the position $x_i$, then
$$K_j(w\|u_j)=\tilde K_j(\tilde w\|\tilde u_j)\le\tilde K_j(\tilde w)=K_j(w).$$
This completes the proof of the theorem. ∎
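For finite games with perfect information, the statement of Theorem 5 can be explored numerically. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it assumes a hypothetical two-player tree (`TREE`, `LEAVES` are invented for the example), builds one situation by backward induction, and then checks by brute force that its truncation to every subgame $\Gamma(x_i,\ell-i)$ is a Nash equilibrium there, i.e. no single player gains by changing his own choices inside the subgame.

```python
from itertools import product

# Hypothetical two-player tree (players 0 and 1): position -> (player who moves, {choice: next node}).
TREE = {
    "x0": (0, {"L": "a", "R": "b"}),
    "a": (1, {"l": "t1", "r": "t2"}),
    "b": (1, {"l": "t3", "r": "t4"}),
}
# Terminal payoffs (K_1, K_2) at the final positions.
LEAVES = {"t1": (3, 1), "t2": (0, 0), "t3": (1, 2), "t4": (2, 1)}


def play(node, situation):
    """Follow the choices of `situation` from `node` to a final position."""
    while node not in LEAVES:
        node = TREE[node][1][situation[node]]
    return LEAVES[node]


def backward_induction(node, situation):
    """Fill in `situation` below `node` and return the resulting payoff vector."""
    if node in LEAVES:
        return LEAVES[node]
    player, moves = TREE[node]
    values = {c: backward_induction(child, situation) for c, child in moves.items()}
    best = max(values, key=lambda c: values[c][player])
    situation[node] = best
    return values[best]


def positions(node):
    """Decision positions of the subgame rooted at `node`."""
    if node in LEAVES:
        return []
    return [node] + [p for child in TREE[node][1].values() for p in positions(child)]


def is_nash_in_subgame(root, situation):
    """True if no single player gains by changing his choices inside the subgame."""
    nodes = positions(root)
    base = play(root, situation)
    for player in {TREE[n][0] for n in nodes}:
        own = [n for n in nodes if TREE[n][0] == player]
        for choices in product(*(TREE[n][1] for n in own)):
            deviation = {**situation, **dict(zip(own, choices))}
            if play(root, deviation)[player] > base[player]:
                return False
    return True


u = {}
backward_induction("x0", u)
print(u, all(is_nash_in_subgame(x, u) for x in positions("x0")))
# {'a': 'l', 'b': 'l', 'x0': 'L'} True
```

By contrast, the situation `{"x0": "L", "a": "l", "b": "r"}` is also a Nash equilibrium of the whole game, but its truncation to the subgame at `"b"` is not an equilibrium there; the theorem asserts only that some equilibrium extending a given subgame equilibrium exists.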
Consider the concept of K-equilibrium introduced by N. N. Vorobyev. Let $K$ be a system of subsets of the set of all players $N=\{1,\ldots,n\}$. The elements of the system $K$ are called coalitions, the system $K$ itself being a coalition structure. Let $u=(u_1,\ldots,u_n)$ be a situation in the game (7.1.1), and let $u_S=\{u_i\}_{i\in S}$ be a set of strategies of the players from the coalition $S$. Denote by $(u\|u_S)$ the situation which is the same as the situation $u$ except that the strategies of the players $i\in S$ are replaced therein by the strategies $u_i$ from $u_S$. The situation $u$ is called a K-equilibrium in the game (7.1.1) if for each coalition $S\in K$ and each $u_S$ there is a player $i\in S$ such that
$$K_i(u\|u_S)\le K_i(u).$$

Theorem 6 Suppose that in the truncated games $\tilde\Gamma_i(x_0,\ell)$ $(i=1,2,\ldots)$ and in the subgames $\Gamma(x_i,\ell-i)$ there exist K-equilibria. Then the set of all K-equilibria is a strongly dynamic stable principle of optimality.
If the coalition structure $K$ contains all individual players, i.e. the sets $\{i\}$ $(i\in N)$, then it can be readily seen that any K-equilibrium is also a Nash equilibrium. If the coalition structure $K$ is composed of all subsets of the set $N$, then the K-equilibrium is called a strong Nash equilibrium. Let us compare the concepts of Nash equilibrium and K-equilibrium. The Nash equilibrium $u=(u_1,\ldots,u_n)$ means that if a player departs from his equilibrium strategy (while the other players keep to their equilibrium strategies in the situation $u$), then this will never increase his payoff. The condition that, when player $i$ deviates, the other players keep to their equilibrium strategies is very important, since when two players depart from their equilibrium strategies simultaneously, they may be able to improve their payoffs. The Nash equilibrium is a strongly dynamic stable principle of optimality, but its implicit assumption that players cannot unite into groups in order to increase their payoffs seems not quite realistic. The K-equilibrium of a situation means that in any coalition $S$ from a given coalition structure $K$ there is necessarily a player who derives no benefit when the coalition departs from the equilibrium strategy. The K-equilibrium of a situation may be interpreted as the impossibility, in this situation, for a coalition $S\in K$ to raise the payoffs of all its members. Thus, the existence of a K-equilibrium prevents the formation of player coalitions $S\in K$ seeking to depart from it. Unfortunately, the existence of a K-equilibrium is a very exceptional case.
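The K-equilibrium condition can be checked mechanically in a finite game with pure strategies. The following is a minimal sketch, assuming a finite normal-form game given by a payoff function and finite strategy sets (the function names and the prisoner's dilemma data are illustrative, not from the text). It returns `False` as soon as some coalition $S\in K$ has a joint deviation $u_S$ that strictly improves the payoff of every member of $S$, which is exactly the negation of the definition. With $K=\{\{1\},\{2\}\}$ it reduces to the Nash test; with $K=2^N\setminus\{\emptyset\}$ it tests for a strong Nash equilibrium.

```python
from itertools import chain, combinations, product


def all_coalitions(n):
    """All nonempty subsets of the players {0, ..., n-1}."""
    return [set(c) for c in chain.from_iterable(combinations(range(n), k) for k in range(1, n + 1))]


def is_k_equilibrium(u, strategies, payoff, coalition_structure):
    """Check the K-equilibrium condition for a pure situation `u` of a finite game.

    strategies[i] is the finite strategy set of player i, payoff(v) returns the
    tuple (K_1(v), ..., K_n(v)), and coalition_structure is the system K.
    """
    base = payoff(u)
    for S in coalition_structure:
        members = sorted(S)
        for joint in product(*(strategies[i] for i in members)):
            v = list(u)
            for i, s in zip(members, joint):
                v[i] = s
            new = payoff(tuple(v))
            # (u || u_S) violates the condition if every member of S strictly gains.
            if all(new[i] > base[i] for i in members):
                return False
    return True


# Prisoner's dilemma as a hypothetical test case: C = cooperate, D = defect.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 4), ("D", "C"): (4, 0), ("D", "D"): (1, 1)}
STRATEGIES = [("C", "D"), ("C", "D")]

print(is_k_equilibrium(("D", "D"), STRATEGIES, PD.get, [{0}, {1}]))          # True: Nash
print(is_k_equilibrium(("D", "D"), STRATEGIES, PD.get, all_coalitions(2)))  # False: not strong
```

The second call fails because the coalition of both players can jointly switch to $(C,C)$ and raise both payoffs, which illustrates why a K-equilibrium for rich coalition structures is exceptional.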
7.3 Definition of the noncooperative differential games in normal form
Let the state of the controlled process at the time instant $t$ be described by the vector
$$x=x(t)=(x_1(t),\ldots,x_m(t))\in\mathbb R^m,\tag{7.3.1}$$
which varies on the time interval $[t_0,T]$ in accordance with the differential equation of motion
$$\dot x=f(x,u_1(t),\ldots,u_n(t)),\tag{7.3.2}$$
where we abbreviate $f(x,u_1,\ldots,u_n)=f(x,u)$. It is assumed that player $i$ can influence the realization of the state (7.3.1) of the process by choosing, as a control, a measurable function $u_i:[t_0,T]\to U_i$, where $U_i$ is a compact set in the Euclidean space $\mathbb R^{k_i}$. Denote the set of all such controls by $D_i$ $(i\in N=\{1,\ldots,n\})$. We assume that the vector function $f$ is continuous on the set $\mathbb R^m\times U_1\times\cdots\times U_n$ and satisfies the Lipschitz condition in $x$ with a constant $K$:
$$\|f(x_1,u_1,\ldots,u_n)-f(x_2,u_1,\ldots,u_n)\|\le K\|x_1-x_2\|$$
uniformly in $u_1\in U_1,\ldots,u_n\in U_n$. It is well known that these conditions ensure, for each initial state
$$x(t_0)=x_0\tag{7.3.3}$$
and a fixed $n$-tuple of player controls $u_1\in D_1,\ldots,u_n\in D_n$, the existence of a unique solution
$$x=x(t)=x(t,t_0,x_0,u_1,\ldots,u_n)\tag{7.3.4}$$
of the problem (7.3.2), (7.3.3) on the time interval $[t_0,T]$. In this case the solution is understood in the sense of Carathéodory, i.e. as an absolutely continuous function $x(t)$ satisfying the equation (7.3.2) almost everywhere on the interval $[t_0,T]$. From the continuity of the function $f$ and the compactness of the set $W=U_1\times\cdots\times U_n$ it follows that
$$\max_{w\in W}\|f(0,w)\|=B<\infty.$$
Consequently, for any trajectory (7.3.4) the following inequality holds:
$$\|x(t)\|\le\|x_0\|+\int_{t_0}^{t}\|f(x(\tau),u_1(\tau),\ldots,u_n(\tau))\|\,d\tau\le\|x_0\|+B(t-t_0)+K\int_{t_0}^{t}\|x(\tau)\|\,d\tau.$$
By using the Gronwall-Bellman inequality we obtain the estimate
$$\|x(t,t_0,x_0,u_1,\ldots,u_n)\|\le\bigl(\|x_0\|+B(t-t_0)\bigr)e^{K(t-t_0)}\le\bigl(\|x_0\|+B(T-t_0)\bigr)e^{K(T-t_0)}=R(\|x_0\|).$$
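The a priori bound can be sanity-checked numerically. This is a sketch under explicit assumptions: a hypothetical scalar right-hand side $f(x,u_1,u_2)=-0.5x+u_1\sin x+u_2$ with $u_1,u_2\in[-1,1]$, for which $K=1.5$ and $B=\max|f(0,u)|=1$, and an explicit Euler scheme with random piecewise-constant controls. For the Euler scheme the same argument gives $|x_{k+1}|\le(1+hK)|x_k|+hB$, so every simulated path stays below $(\|x_0\|+B(t-t_0))e^{K(t-t_0)}$.

```python
import math
import random

K = 1.5  # Lipschitz constant: |df/dx| = |-0.5 + cos(x) * u1| <= 1.5
B = 1.0  # max over the control set of |f(0, u1, u2)| = |u2|


def f(x, u1, u2):
    """Hypothetical scalar right-hand side with controls u1, u2 in [-1, 1]."""
    return -0.5 * x + math.sin(x) * u1 + u2


def euler_path(x0, t0, T, steps, rng):
    """Euler scheme with piecewise-constant random controls; returns (t, x) pairs."""
    h = (T - t0) / steps
    t, x, path = t0, x0, [(t0, x0)]
    for _ in range(steps):
        u1, u2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        x += h * f(x, u1, u2)
        t += h
        path.append((t, x))
    return path


def gronwall_bound(x0, t0, t):
    """(|x0| + B (t - t0)) * exp(K (t - t0))."""
    return (abs(x0) + B * (t - t0)) * math.exp(K * (t - t0))


rng = random.Random(0)
x0, t0, T = 2.0, 0.0, 3.0
ok = all(abs(x) <= gronwall_bound(x0, t0, t)
         for _ in range(100)
         for t, x in euler_path(x0, t0, T, 300, rng))
print(ok)  # True
```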
Thus, the set $\Phi(t_0,x_0)$ of all trajectories (solutions of the problem (7.3.2), (7.3.3) for all possible sets of player controls) is uniformly bounded: all the trajectories (7.3.4) lie wholly inside the closed ball $S(0,R(\|x_0\|))$ with center $x=0$ and radius $R(\|x_0\|)$ in the Euclidean space $\mathbb R^m$. Next, from the continuity of the function $f$ on the compact set $C(\|x_0\|)=S(0,R(\|x_0\|))\times W$ it follows that
$$\max_{(x,w)\in C(\|x_0\|)}\|f(x,w)\|=B_1(\|x_0\|)<\infty.$$
This means that all trajectories (7.3.4) satisfy the Lipschitz condition in $t$:
$$\|x(t',t_0,x_0,u_1,\ldots,u_n)-x(t'',t_0,x_0,u_1,\ldots,u_n)\|\le B_1(\|x_0\|)\,|t'-t''|.$$
Thus the set $\Phi(t_0,x_0)$ is equicontinuous. Note that the continuity of the function $(t_0,x_0)\mapsto x(t,t_0,x_0,u_1,\ldots,u_n)$ uniformly in time $t$ and controls $u_1,\ldots,u_n$ may also be proved.

In this chapter we consider differential games in the class of piecewise open-loop strategies. Denote by $D_i[t,\tau)$ the set of restrictions of all admissible controls of the $i$-th player to the interval $[t,\tau)\subset[t_0,T]$ $(i\in N)$, and by $PF[t_0,T]$ the set of all finite partitions
$$\Delta=\{t_0=t_0^\Delta<t_1^\Delta<\cdots<t_{n(\Delta)}^\Delta=T\}$$
of the time interval $[t_0,T]$. The sequence
$$\psi_i^\Delta=\bigl(\psi_i^1,\ldots,\psi_i^{n(\Delta)}\bigr),\quad\text{where}\quad\psi_i^j:\mathbb R^m\to D_i[t_{j-1}^\Delta,t_j^\Delta)\quad(j=1,\ldots,n(\Delta)),$$
is called a $\Delta$-strategy of the $i$-th player. The pair $a_i=(\Delta,\psi_i^\Delta)$, where $\Delta\in PF[t_0,T]$ and $\psi_i^\Delta$ is a $\Delta$-strategy, is called a piecewise open-loop strategy (POLS) of the $i$-th player.
Suppose that in the initial state $x(t_0)=x_0$ the players choose piecewise open-loop strategies
$$a_1=(\Delta_1,\psi_1^{\Delta_1}),\ \ldots,\ a_n=(\Delta_n,\psi_n^{\Delta_n}).\tag{7.3.5}$$
The numbers $t_j^{\Delta_i}$ $(j=1,\ldots,n(\Delta_i);\ i\in N)$ are arranged in ascending order. Then we obtain a sequence $t_0<t_1<t_2<\cdots<t_q=T$. At the initial instant $t_0$ the players choose the controls
$$u_i^1=\psi_i^1(x_0)\in D_i[t_0,t_1^{\Delta_i}),\quad i\in N,\tag{7.3.6}$$
whereupon the trajectory
$$x(t)=x(t,t_0,x_0,u_1^1,\ldots,u_n^1)\tag{7.3.7}$$
is realized on the time interval $[t_0,t_1]$. The players from the set $N_1=\{i\in N\mid t_1^{\Delta_i}=t_1\}$ stick to the controls (7.3.6) up to the time instant $t_1$; then they choose new controls $u_i^2=\psi_i^2(x(t_1))\in D_i[t_1,t_2^{\Delta_i})$, $i\in N_1$, while the remaining players continue to use their controls (7.3.6). The trajectory
$$x(t)=x(t,t_1,x(t_1),\bar u_1,\ldots,\bar u_n)\tag{7.3.8}$$
is then realized on the time interval $[t_1,t_2]$, where $\bar u_i=u_i^2$ for $i\in N_1$ and $\bar u_i=u_i^1$ otherwise. Let $N_2=\{i\in N\mid t_1^{\Delta_i}=t_2\}$ and $A_2=\{i\in N\mid t_2^{\Delta_i}=t_2\}$ $(N_1\cap N_2=\emptyset,\ A_2\subset N_1)$. At the time instant $t_2$ the players from the set $N_2$ choose new controls $u_i^2=\psi_i^2(x(t_2))\in D_i[t_2,t_2^{\Delta_i})$, and the players from the set $A_2$ choose new controls $u_i^3=\psi_i^3(x(t_2))\in D_i[t_2,t_3^{\Delta_i})$. Proceeding in this way, we determine in $q$ steps a unique trajectory
$$x(t)=x(t,t_0,x_0,a_1,\ldots,a_n)\tag{7.3.9}$$
generated by the strategies (7.3.5) from the initial state $x(t_0)=x_0$. On the time interval $[t_0,t_1]$ the trajectory (7.3.9) coincides with the trajectory (7.3.7), on the interval $[t_1,t_2]$ with the trajectory (7.3.8), etc.
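The step-by-step construction of the trajectory (7.3.9) can be sketched in code. In this minimal sketch each hypothetical strategy is a pair `(partition_i, psis_i)`: at each of his own switching instants player $i$ observes the current state and returns an open-loop control (a function of $t$) for the next interval; all switching instants are merged and sorted, and the state equation is integrated with an explicit Euler scheme between consecutive instants. The Euler integrator, the names, and the example dynamics $\dot x=u_1+u_2$ are assumptions made for illustration only.

```python
def const(value):
    """Open-loop control that is constant on its interval."""
    return lambda t: value


def pols_trajectory(f, x0, strategies, h=1e-3):
    """Trajectory generated by piecewise open-loop strategies (sketch).

    strategies[i] = (partition_i, psis_i): partition_i = [t0, ..., T] and
    psis_i[j](x) returns the open-loop control (a function of t) that player i
    uses on [partition_i[j], partition_i[j + 1]) when the state there starts at x.
    """
    times = sorted({t for part, _ in strategies for t in part})  # t0 < t1 < ... < tq = T
    controls = [None] * len(strategies)
    x, path = x0, [(times[0], x0)]
    for tk, tk1 in zip(times, times[1:]):
        for i, (part, psis) in enumerate(strategies):
            if tk in part:  # player i reaches one of his own switching instants
                controls[i] = psis[part.index(tk)](x)
        n = max(1, round((tk1 - tk) / h))
        dt = (tk1 - tk) / n
        for m in range(n):  # explicit Euler on [tk, tk1]
            t = tk + m * dt
            x = x + dt * f(x, *(c(t) for c in controls))
            path.append((t + dt, x))
    return path


# Hypothetical example: dx/dt = u1 + u2, x(0) = 1, T = 2.
strategies = [
    ([0.0, 1.0, 2.0], [lambda x: const(-x), lambda x: const(-x)]),  # player 1
    ([0.0, 0.5, 2.0], [lambda x: const(1.0), lambda x: const(0.0)]),  # player 2
]
path = pols_trajectory(lambda x, u1, u2: u1 + u2, 1.0, strategies)
print(abs(path[-1][1]) < 1e-9)  # True: x(2) = 0
```

Here the merged sequence is $t_0=0<t_1=0.5<t_2=1<t_3=2$: the state stays at $1$ on $[0,0.5]$, falls to $0.5$ at $t=1$ after player 2 switches, and reaches $0$ at $T=2$ after player 1 switches.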
Consider a conflict decision process in which the $i$-th player seeks to maximize the functional (payoff)
$$H^i(x_0,T-t_0;x(\cdot)),\quad i\in N,\tag{7.3.10}$$
defined on the set of all trajectories (7.3.9). Let $u=(u_1,\ldots,u_n)$ be a situation in POLS. Then the functional
$$J^i(x_0,T-t_0;u_1,\ldots,u_n)=H^i(x_0,T-t_0;x(\cdot)),\tag{7.3.11}$$
where $x(\cdot)$ is the trajectory (7.3.9) generated by $u$, defined on the set $D_1^*\times\cdots\times D_n^*$ ($D_i^*$ denotes the set of piecewise open-loop strategies of player $i$), is called the payoff function of the $i$-th player corresponding to the payoff (7.3.10). The collection
$$\Gamma(x_0,T-t_0)=\langle N,\{D_i^*\}_{i\in N},\{J^i\}_{i\in N}\rangle\tag{7.3.12}$$
is called the noncooperative differential $n$-person game described by the equations (7.3.2) with the payoff functions (7.3.11). In the games (7.3.12) with prescribed duration $T-t_0$ the payoff of the $i$-th player is of the form
$$H^i(x_0,T-t_0;x(\cdot))=\int_{t_0}^{T}h_i(x(\tau))\,d\tau+H_i(x(T)),\quad h_i>0,\ H_i>0,\tag{7.3.13}$$
where $h_i$ and $H_i$ are continuous functions on $\mathbb R^m$. The first addend on the right-hand side of (7.3.13) is called the integral payoff, the second the terminal payoff. Basically, we will consider the games (7.3.12) with the payoff (7.3.13), for which the designation $\Gamma(x_0,T-t_0)$ is reserved. Let $N$ be the set of all players. Then every subset $S\subset N$ is called a coalition in the game $\Gamma(x_0,T-t_0)$.

In our book we pay main attention to cooperative differential games and the corresponding optimality principles. For the noncooperative case we give only the definitions of the main optimality principles, which are generalizations of the multistage case.

Nash equilibrium. The $n$-tuple of strategies $u^*=(u_1^*,\ldots,u_i^*,\ldots,u_n^*)$ constitutes a Nash equilibrium in the game $\Gamma(x_0,T-t_0)$ if
$$J^i(x_0,T-t_0;u^*(\cdot))\ge J^i(x_0,T-t_0;u^*(\cdot)\|u_i(\cdot))$$
for all $i=1,\ldots,n$, $u_i(\cdot)\in D_i^*$. The situation $u^\varepsilon(\cdot)=(u_1^\varepsilon(\cdot),\ldots,u_n^\varepsilon(\cdot))$ is called a Nash $\varepsilon$-equilibrium in the game $\Gamma(x_0,T-t_0)$ if
$$J^i(x_0,T-t_0;u^\varepsilon(\cdot))+\varepsilon\ge J^i(x_0,T-t_0;u^\varepsilon(\cdot)\|u_i(\cdot))$$
for all $i=1,\ldots,n$, $u_i(\cdot)\in D_i^*$. The following statement holds: for any $\varepsilon>0$ the game $\Gamma(x_0,T-t_0)$ has a Nash $\varepsilon$-equilibrium in the class of piecewise open-loop strategies (POLS). We shall not give the proof of this statement, which is a generalization of the corresponding theorem for the discrete-time case.
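When each strategy set $D_i^*$ is replaced by a finite list of candidate strategies, the $\varepsilon$-equilibrium inequality becomes a finite check: compute the largest gain any player can obtain by a unilateral deviation and compare it with $\varepsilon$. The sketch below assumes such a finite stand-in; the payoff function and the grid are hypothetical and serve only to exercise the definition (they are not a differential game).

```python
def max_unilateral_gain(payoff, situation, candidate_sets):
    """Largest gain J^i(u || u_i) - J^i(u) over players i and candidates u_i.

    payoff(v) returns (J^1(v), ..., J^n(v)); candidate_sets[i] is a finite
    stand-in for the strategy set D_i^* of player i.
    """
    base = payoff(situation)
    gain = 0.0
    for i, candidates in enumerate(candidate_sets):
        for s in candidates:
            deviation = situation[:i] + (s,) + situation[i + 1:]
            gain = max(gain, payoff(deviation)[i] - base[i])
    return gain


def is_eps_equilibrium(payoff, situation, candidate_sets, eps):
    """Nash eps-equilibrium test: no unilateral deviation gains more than eps."""
    return max_unilateral_gain(payoff, situation, candidate_sets) <= eps


# Hypothetical two-player game with scalar strategies on a grid in [0, 1].
def payoff(v):
    a, b = v
    return (a * (1 - a - b), b * (1 - a - b))


grid = [k / 20 for k in range(21)]
u = (grid[6], grid[6])  # both players use 0.3
print(round(max_unilateral_gain(payoff, u, [grid, grid]), 6))  # 0.0025
print(is_eps_equilibrium(payoff, u, [grid, grid], 0.003))      # True
print(is_eps_equilibrium(payoff, u, [grid, grid], 0.001))      # False
```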
Pareto optimality. The situation $\bar u(\cdot)=(\bar u_1(\cdot),\ldots,\bar u_i(\cdot),\ldots,\bar u_n(\cdot))$, $\bar u_i(\cdot)\in D_i^*$, is called Pareto optimal in the game $\Gamma(x_0,T-t_0)$ if the existence of a situation $u'(\cdot)=(u_1'(\cdot),\ldots,u_n'(\cdot))$, $u_i'(\cdot)\in D_i^*$, such that
$$J^i(x_0,T-t_0;u'(\cdot))\ge J^i(x_0,T-t_0;\bar u(\cdot)),\quad i=1,\ldots,n,$$
implies
$$J^i(x_0,T-t_0;u'(\cdot))=J^i(x_0,T-t_0;\bar u(\cdot)),\quad i=1,\ldots,n.$$
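For a finite list of situations with known payoff vectors, the Pareto optimal ones can be filtered directly from the definition: a situation is kept unless some other situation gives every player at least as much and some player strictly more. The sketch below assumes such a finite (hypothetical) game; the prisoner's dilemma payoffs are used only as test data.

```python
def weakly_better(p, q):
    """p >= q componentwise with at least one strict inequality."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_optimal(payoffs):
    """Situations u for which no u' gives J(u') >= J(u) with J(u') != J(u).

    payoffs maps each situation of a finite (hypothetical) game to its payoff vector.
    """
    return [u for u, p in payoffs.items()
            if not any(weakly_better(q, p) for q in payoffs.values())]


PD = {("C", "C"): (3, 3), ("C", "D"): (0, 4), ("D", "C"): (4, 0), ("D", "D"): (1, 1)}
print(pareto_optimal(PD))  # [('C', 'C'), ('C', 'D'), ('D', 'C')]
```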
7.4 Definition of cooperative differential game in the form of characteristic sets
Consider the previously defined differential game $\Gamma(x_0,T-t_0)$, assuming that the players' payoffs are nontransferable. Recall that in the nontransferable payoff case the players from a coalition $S\subset N$ aim to maximize each component of the vector $J_S=(J^i,\ i\in S)$, where $J^i$ is the payoff function of player $i$ (see (7.3.11)). Therefore the cooperation between the members of the coalition $S$ consists only in the choice of joint strategies $u_S(\cdot)\in D_S$ without side payments.
Definition 1. The game $\Gamma(x_0,T-t_0)$ is called a cooperative differential game without side payments if
1. its rules provide for the formation of any coalition;
2. the players' payoffs are nontransferable, and the players may jointly choose strategies but without making any side payments;
3. the players' objective is to maximize, by joint efforts, their own payoffs.

For each coalition $S\subset N$ we consider a family of zero-sum games
$$\{\Gamma_S^y(x_0,T-t_0)\}_{y\in C^{T-t_0}(x_0)},$$
where $C^{T-t_0}(x_0)$ is the reachability set in the game $\Gamma(x_0,T-t_0)$. The zero-sum game $\Gamma_S^y(x_0,T-t_0)$ between the coalitions $S$ and $N\setminus S$, each acting as one player, starts at the instant $t_0$ from the state $x_0$ and proceeds until the instant $T$. Its dynamics are described by the equations (7.3.2). The objective of Player $S$ in the game $\Gamma_S^y(x_0,T-t_0)$ is to minimize the distance $\rho(x(T),y)=\|x(T)-y\|$ at the terminal instant $T$, where $x(T)=x(T;x_0,u_S(\cdot),u_{N\setminus S}(\cdot))$ is the end point of the trajectory $x(\cdot)=x(\cdot;x_0,u_S(\cdot),u_{N\setminus S}(\cdot))$, which is a solution of the system (7.3.2); $y\in C^{T-t_0}(x_0)$ is a point the coalition $S\subset N$ seeks to reach at the terminal instant $T$; and $u_S(\cdot)\in D_S=\prod_{i\in S}D_i$, $u_{N\setminus S}(\cdot)\in D_{N\setminus S}=\prod_{i\in N\setminus S}D_i$ ($D_i$ is the admissible strategy set of player $i$ in the game $\Gamma(x_0,T-t_0)$). Using Theorem 3 from Ch. 2 we find that in the game $\Gamma_S^y(x_0,T-t_0)$, for any $\varepsilon>0$, there exist $\varepsilon$-optimal piecewise open-loop strategies $u_S^\varepsilon(\cdot)$ and $u_{N\setminus S}^\varepsilon(\cdot)$, and the value of this game is equal to
$$\mathrm{val}\,\Gamma_S^y(x_0,T-t_0)=\lim_{\varepsilon\to0}\|x(T;x_0,u_S^\varepsilon(\cdot),u_{N\setminus S}^\varepsilon(\cdot))-y\|.$$
Introduce the sets
$$Y_S^{T-t_0}(x_0)=\{y\in C^{T-t_0}(x_0)\mid\mathrm{val}\,\Gamma_S^y(x_0,T-t_0)=0\},$$
$$X_S^{T-t_0}(x_0)=\{x(\cdot)\mid x(t_0)=x_0,\ x(T)\in Y_S^{T-t_0}(x_0)\}.$$
By construction, $Y_S^{T-t_0}(x_0)$ is the subset of the reachability set $C^{T-t_0}(x_0)$ for each point $y$ of which, for any $\varepsilon>0$, there is a strategy $u_S^\varepsilon(\cdot)$ ensuring the arrival of the coalition $S$ in the $\varepsilon$-neighborhood of the point $y$ at the instant $T$, irrespective of the behavior of the coalition $N\setminus S$; $X_S^{T-t_0}(x_0)$ is the set of all trajectories of the system (7.3.1), (7.3.2) leading to the set $Y_S^{T-t_0}(x_0)$. Let $V$ be a mapping that assigns to each triple $(S,x_0,T-t_0)\in 2^N\times\mathbb R^m\times\mathbb R^1$ a subset of $\mathbb R^n$ according to
$$V(S;x_0,T-t_0)=\{J(x(\cdot))\mid x(\cdot)\in X_S^{T-t_0}(x_0)\},\tag{7.4.1}$$
where $J(x(\cdot))=(J_1(x(\cdot)),\ldots,J_n(x(\cdot)))$,
$$J_i(x(\cdot))=\int_{t_0}^{T}h_i(x(t))\,dt+H_i(x(T)),\quad i=1,\ldots,n.$$
Lemma 1 The mapping $V$ defined by (7.4.1) has the following properties:
1. for the empty coalition and any $x_0\in\mathbb R^m$ and $T>t_0$, $V(\emptyset;x_0,T-t_0)=\emptyset$;
2. for the maximal coalition $N$ and any $x_0\in\mathbb R^m$ and $T>t_0$, $V(N;x_0,T-t_0)=\{J(x(\cdot))\mid x(\cdot)\in X_N^{T-t_0}(x_0)\}$, where $X_N^{T-t_0}(x_0)$ is the set of all trajectories that can be realized in the game $\Gamma(x_0,T-t_0)$;
3. for any fixed $x_0\in\mathbb R^m$ and $T>t_0$ the mapping $V$ is superadditive in $S$, i.e. for arbitrary nonintersecting coalitions $S$ and $R$
$$V(S\cup R;x_0,T-t_0)\supset V(S;x_0,T-t_0)\cup V(R;x_0,T-t_0).$$
Proof: Since $D_\emptyset=\emptyset$, we get sequentially $Y_\emptyset^{T-t_0}(x_0)=\emptyset$, $X_\emptyset^{T-t_0}(x_0)=\emptyset$ and $V(\emptyset;x_0,T-t_0)=\emptyset$.

By the definition of the reachability set, for each $y\in C^{T-t_0}(x_0)$ there is a strategy $u_N(\cdot)\in D_N$ such that $\rho(x(T;x_0,u_N(\cdot)),y)=0$; hence $Y_N^{T-t_0}(x_0)=C^{T-t_0}(x_0)$ and, consequently, $V(N;x_0,T-t_0)=\{J(x(\cdot))\mid x(\cdot)\in X_N^{T-t_0}(x_0)\}$, where $X_N^{T-t_0}(x_0)=\{x(\cdot)\mid x(t_0)=x_0,\ x(T)\in C^{T-t_0}(x_0)\}$ is the set of all trajectories that can be realized in the game $\Gamma(x_0,T-t_0)$.

We now prove the superadditivity. Let us show that $S'\subset S$ implies $X_{S'}^{T-t_0}(x_0)\subset X_S^{T-t_0}(x_0)$. Let $J\in V(S';x_0,T-t_0)$. Then there exists a trajectory $x(\cdot)\in X_{S'}^{T-t_0}(x_0)$ for which $J_i=J_i(x(\cdot))$, $i=1,\ldots,n$. Since $S'\subset S$, we have $x(\cdot)\in X_S^{T-t_0}(x_0)$ (the following strategy is always possible: in the game $\Gamma_S^y(x_0,T-t_0)$ the players from $S'$ employ the strategy that is $\varepsilon$-optimal in the game $\Gamma_{S'}^y(x_0,T-t_0)$, whereas the players from $S\setminus S'$ employ the truncation to the set $S\setminus S'$ of the $\varepsilon$-optimal strategies of the coalition $N\setminus S'$ in the same game). Consequently, $J=J(x(\cdot))\in V(S;x_0,T-t_0)$ and $V(S';x_0,T-t_0)\subset V(S;x_0,T-t_0)$. For any $S,R\subset N$, $S\cap R=\emptyset$, by what has been proved we have
$$V(S;x_0,T-t_0)\subset V(S\cup R;x_0,T-t_0),\qquad V(R;x_0,T-t_0)\subset V(S\cup R;x_0,T-t_0).$$
But then $V(S;x_0,T-t_0)\cup V(R;x_0,T-t_0)\subset V(S\cup R;x_0,T-t_0)$. This completes the proof of the lemma. ∎
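Property 3 of Lemma 1 is a statement about set inclusions, so for finite stand-ins of the characteristic sets it can be checked directly. The sketch below assumes each $V(S;x_0,T-t_0)$ is represented by a finite set of payoff vectors stored under a `frozenset` key; the data are hypothetical, chosen so that the first family is superadditive and the second is not.

```python
from itertools import combinations


def coalitions(players):
    """All nonempty coalitions as frozensets."""
    return [frozenset(c) for k in range(1, len(players) + 1)
            for c in combinations(players, k)]


def is_superadditive(V, players):
    """Check V(S | R) contains V(S) | V(R) for all disjoint nonempty S, R.

    V maps frozenset coalitions to finite sets of payoff vectors, a finite
    stand-in for the characteristic sets V(S; x0, T - t0).
    """
    cs = coalitions(players)
    return all((V[S] | V[R]) <= V[S | R] for S in cs for R in cs if not S & R)


players = (0, 1)
V = {
    frozenset({0}): {(1, 0)},
    frozenset({1}): {(0, 1)},
    frozenset({0, 1}): {(1, 0), (0, 1), (2, 2)},
}
print(is_superadditive(V, players))  # True
V_bad = dict(V)
V_bad[frozenset({0, 1})] = {(2, 2)}
print(is_superadditive(V_bad, players))  # False
```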
Using the set $D_N$ of all admissible situations in the game $\Gamma(x_0,T-t_0)$, one can write
$$V(N;x_0,T-t_0)=\{J(x_0,u_N(\cdot))\mid u_N(\cdot)\in D_N\},$$
where $J(x_0,u_N(\cdot))=\{J_i(x_0,u_N(\cdot))\}_{i\in N}$. For each coalition $S\subset N$ the set $V(S;x_0,T-t_0)$ may be interpreted as the set of all payoff vectors which can be realized in the game $\Gamma(x_0,T-t_0)$ by the efforts of the members of the coalition $S$ alone, irrespective of the behavior of the players from the set $N\setminus S$. That is, it is the set of payoff vectors $J\in V(N;x_0,T-t_0)$ whose corresponding components may be guaranteed by the coalition $S$ to its members. From the superadditivity of the mapping $V$ in $S$ (take $R=S'\setminus S$ in property 3 of Lemma 1) it follows that for $S\subset S'$
$$V(S;x_0,T-t_0)\subset V(S';x_0,T-t_0),$$
i.e. the larger the coalition, the wider its opportunities. Thus, the players can obtain the largest benefit by uniting in the maximal coalition $N$.

Definition 2. The set $V(S;x_0,T-t_0)$ defined by (7.4.1) is called the characteristic set of the coalition $S$ in the game $\Gamma(x_0,T-t_0)$.
The mapping $V$ is the analog of the characteristic function of a cooperative game with side payments and can be used in defining solutions of the cooperative differential game without side payments. The pair $\langle N,V(S;x_0,T-t_0)\rangle$, where $N$ is the set of players and $V$ is the mapping defined by (7.4.1), is called the cooperative differential game in the form of characteristic sets. For short, we denote it by $\Gamma_V(x_0,T-t_0)$.

We constructed the characteristic set proceeding from the minimax principle. As Lemma 1 shows, it is superadditive in $S$ at all times. Note that the characteristic set may also be constructed by other methods. However, it should always have the meaning of the set of payoffs guaranteed to the coalition, be superadditive in $S$, and satisfy $V(\emptyset;x_0,T-t_0)=\emptyset$.
In the game $\Gamma_V(x_0,T-t_0)$ the players face the problem of realizing an optimal (in some sense) payoff vector $J=(J_1,\ldots,J_n)$ from the set $V(N;x_0,T-t_0)$. The set of payoff vectors satisfying some optimality principle is called the solution of the game $\Gamma_V(x_0,T-t_0)$ and is denoted by $W_V(x_0,T-t_0)\subset V(N;x_0,T-t_0)$. Let us now specify the form of the set $W_V(x_0,T-t_0)$.

Definition 3. The vector $J=(J_1,\ldots,J_n)\in V(N;x_0,T-t_0)$ is called an imputation in the game $\Gamma_V(x_0,T-t_0)$ if
$$J_i\ge\max\{J_i'\mid J'\in V(\{i\};x_0,T-t_0)\},\quad i=1,\ldots,n.\tag{7.4.3}$$
Condition (7.4.3) expresses the individual rationality of the imputation vector. The set of all imputations in the game $\Gamma_V(x_0,T-t_0)$ is denoted by $E_V(x_0,T-t_0)$. Each imputation is a payoff vector, and $E_V(x_0,T-t_0)\subset V(N;x_0,T-t_0)$.
Definition 4 Let $J',J''\in E_V(x_0,T-t_0)$. We say that the imputation $J'$ dominates the imputation $J''$ for the coalition $S$ ($J'\succ_S J''$) if
1. $J_i'>J_i''$ for all $i\in S$;
2. $J'\in V(S;x_0,T-t_0)$.

The imputation $J'$ is said to dominate the imputation $J''$ ($J'\succ J''$) if there is a nonempty coalition $S\subset N$ such that $J'\succ_S J''$. Note that in games with nontransferable payoffs domination is possible for all coalitions containing more than one player, and for the coalition $N$.

Definition 5. By the c-kernel of the game $\Gamma_V(x_0,T-t_0)$ without side payments is meant the set of nondominated imputations; it is denoted by $C_V(x_0,T-t_0)$.

Definition 6. By the NM-solution of the game $\Gamma_V(x_0,T-t_0)$ without side payments is meant a subset $L_V(x_0,T-t_0)$ of the imputation set $E_V(x_0,T-t_0)$ for which the following conditions are satisfied:
1. if $J',J''\in L_V(x_0,T-t_0)$, then $J'$ and $J''$ do not dominate one another;
2. for any $J\in E_V(x_0,T-t_0)\setminus L_V(x_0,T-t_0)$ there is an imputation $J'\in L_V(x_0,T-t_0)$ such that $J'$ dominates $J$.
Theorem 7 The imputation $J$ belongs to the c-kernel $C_V(x_0,T-t_0)$ of the game $\Gamma_V(x_0,T-t_0)$ if and only if, for each coalition $S\subset N$ and each imputation $J'\in V(S;x_0,T-t_0)$, there exists $i\in S$ such that $J_i\ge J_i'$.

Proof: If $J\in C_V(x_0,T-t_0)$ and for some coalition $S\subset N$ there exists an imputation $J'\in V(S;x_0,T-t_0)$ such that $J_i'>J_i$ for all $i\in S$, then $J'\succ_S J$. This contradicts the membership of the imputation $J$ in the c-kernel. If, however, the imputation $J$ has the property indicated in the statement of the theorem, then it is nondominated, i.e. it belongs to the c-kernel. ∎
Theorem 8 If the c-kernel $C_V(x_0,T-t_0)$ and the NM-solution $L_V(x_0,T-t_0)$ exist in the game $\Gamma_V(x_0,T-t_0)$, then $C_V(x_0,T-t_0)\subset L_V(x_0,T-t_0)$.
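The proof follows from the definition of domination and from the definitions of the sets $C_V(x_0,T-t_0)$ and $L_V(x_0,T-t_0)$: an imputation of $C_V(x_0,T-t_0)$ is dominated by no imputation, so by condition 2 of Definition 6 it cannot lie outside $L_V(x_0,T-t_0)$.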
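Definitions 3 to 6 and Theorems 7 and 8 can be exercised on finite characteristic sets. The sketch below assumes a hypothetical two-player family of finite sets `V` (keys are `frozenset` coalitions, values are sets of payoff vectors); it computes the imputations (7.4.3), the c-kernel both from Definition 5 and from the criterion of Theorem 7, and all NM-solutions by brute-force enumeration of subsets of $E_V$, which is only feasible for very small examples.

```python
from itertools import combinations


def coalitions(n):
    """All nonempty coalitions of players 0..n-1 as frozensets."""
    return [frozenset(c) for k in range(1, n + 1) for c in combinations(range(n), k)]


def imputations(V, n):
    """Vectors of V(N) satisfying individual rationality (7.4.3)."""
    floor = [max(J[i] for J in V[frozenset({i})]) for i in range(n)]
    return {J for J in V[frozenset(range(n))] if all(J[i] >= floor[i] for i in range(n))}


def dominates(J1, J2, V, n):
    """J1 dominates J2 via some nonempty coalition S (Definition 4)."""
    return any(J1 in V[S] and all(J1[i] > J2[i] for i in S) for S in coalitions(n))


def c_kernel(V, n):
    """Nondominated imputations (Definition 5)."""
    E = imputations(V, n)
    return {J for J in E if not any(dominates(J1, J, V, n) for J1 in E)}


def c_kernel_by_theorem7(V, n):
    """For every S and every imputation J1 in V(S), some i in S has J_i >= J1_i."""
    E = imputations(V, n)
    return {J for J in E
            if all(any(J[i] >= J1[i] for i in S)
                   for S in coalitions(n) for J1 in V[S] & E)}


def nm_solutions(V, n):
    """All subsets of E satisfying conditions 1 and 2 of Definition 6 (brute force)."""
    E = sorted(imputations(V, n))
    found = []
    for k in range(len(E) + 1):
        for L in combinations(E, k):
            internal = not any(dominates(a, b, V, n) for a in L for b in L)
            external = all(any(dominates(a, J, V, n) for a in L) for J in E if J not in L)
            if internal and external:
                found.append(set(L))
    return found


# Hypothetical finite characteristic sets for two players.
V = {
    frozenset({0}): {(1, 0)},
    frozenset({1}): {(0, 1)},
    frozenset({0, 1}): {(1, 0), (0, 1), (1, 1), (2, 2), (3, 1), (2, 3)},
}
C = c_kernel(V, 2)
print(sorted(C))                                 # [(2, 2), (2, 3), (3, 1)]
print(C == c_kernel_by_theorem7(V, 2))           # True
print(all(C <= L for L in nm_solutions(V, 2)))   # True (Theorem 8)
```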
Definition 7. The situation $u_N^P(\cdot)\in D_N$ is called Pareto optimal in the game $\Gamma_V(x_0,T-t_0)$ if for any situation $u_N(\cdot)\in D_N$ the condition $J_i(x_0,u_N(\cdot))\ge J_i(x_0,u_N^P(\cdot))$ for all $i=1,\ldots,n$ implies $J_i(x_0,u_N(\cdot))=J_i(x_0,u_N^P(\cdot))$ for all $i=1,\ldots,n$.
As is seen from the latter definition, any deviation from a Pareto optimal situation that changes the payoffs involves a decrease in the payoff of at least one player (at the same time, the other players' payoffs may increase). Let $D_N^P$ be the set of all Pareto optimal situations in the game $\Gamma_V(x_0,T-t_0)$. Introduce the set
$$P(x_0,T-t_0)=\{J(x_0,u_N(\cdot))\mid u_N(\cdot)\in D_N^P\},$$
which is called the Pareto optimal set of payoff vectors in the game $\Gamma_V(x_0,T-t_0)$. Evidently, $P(x_0,T-t_0)\subset V(N;x_0,T-t_0)$. The vectors from the set $P(x_0,T-t_0)$ do not dominate one another for the coalition $N$.
7.5 Classification of dynamic stable (time-consistent) solutions
As is seen from Definition 3, any imputation $K\in E_V(x_0,T-t_0)$ is a payoff vector in the game $\Gamma_V(x_0,T-t_0)$ and hence is realized along a trajectory $x(\cdot)$, i.e. $K=J(x(\cdot))$. Because side payments are absent, the imputation realized along a given trajectory $x(\cdot)$ is unique, and each imputation from the set $E_V(x_0,T-t_0)$ corresponds to its own trajectory. Hence a conditionally optimal trajectory in the game $\Gamma_V(x_0,T-t_0)$ has to be defined as a trajectory along which one of the imputations from the solution $W_V(x_0,T-t_0)$ is realized.

Definition 8. Let $W_V(x_0,T-t_0)\ne\emptyset$. Any trajectory $x(\cdot)$ of the system (7.3.1), (7.3.2) such that $J(x(\cdot))\in W_V(x_0,T-t_0)$ will be called a conditionally optimal trajectory in the cooperative differential game $\Gamma_V(x_0,T-t_0)$ without side payments.

In Definition 8, $J(x(\cdot))=\{J_i(x(\cdot)),\ i=1,\ldots,n\}$, where
$$J_i(x(\cdot))=\int_{t_0}^{T}h_i(x(t))\,dt+H_i(x(T)),\quad i=1,\ldots,n,$$
are the players' payoffs along the trajectory $x(\cdot)$ (see (7.3.13)).
Suppose that in the initial state of the game $\Gamma_V(x_0,T-t_0)$ an optimality principle is chosen and the solution $W_V(x_0,T-t_0)$ is constructed in compliance with it. Let $\bar x(\cdot)$ be a conditionally optimal trajectory, i.e. $J(\bar x(\cdot))\in W_V(x_0,T-t_0)$. Assume that the game $\Gamma_V(x_0,T-t_0)$ proceeds along the trajectory $\bar x(\cdot)$, i.e. consider the current family of subgames $\{\Gamma_V(\bar x(t),T-t),\ t_0\le t\le T\}$ defined along the trajectory $\bar x(\cdot)$. The current game $\Gamma_V(\bar x(t),T-t)$ is defined as follows. The following sets are constructed for each coalition $S\subset N$:
$$Y_S^{T-t}(\bar x(t))=\{z\in C^{T-t}(\bar x(t))\mid\mathrm{val}\,\Gamma_S^z(\bar x(t),T-t)=0\},$$
$$X_S^{T-t}(\bar x(t))=\{y(\cdot)\mid y(t)=\bar x(t),\ y(T)\in Y_S^{T-t}(\bar x(t))\}.$$
Here $C^{T-t}(\bar x(t))$ is the reachability set of the system from the initial state $\bar x(t)$ at the instant $T$; $\Gamma_S^z(\bar x(t),T-t)$ is a zero-sum game between the coalitions $S$ and $N\setminus S$ described by the equation $\dot y=f(y,u_S(\cdot),u_{N\setminus S}(\cdot))$, $u_S(\cdot)\in D_S[t,T]$, $u_{N\setminus S}(\cdot)\in D_{N\setminus S}[t,T]$ ($D_S[t,T]$ is the set of admissible strategies of the coalition $S$ restricted to the interval $[t,T]$), the game duration being $T-t$ and the initial state $y(t)=\bar x(t)$. In this game the objective of Player $S$ is to minimize the distance $\|y(T)-z\|$ by the instant $T$ (the objective of Player $N\setminus S$ is the opposite one), where $z\in C^{T-t}(\bar x(t))$ is a fixed point of the reachability set in the current game $\Gamma_V(\bar x(t),T-t)$. We now construct the set
$$V(S;\bar x(t),T-t)=\{J(y(\cdot))\mid y(\cdot)\in X_S^{T-t}(\bar x(t))\},$$
$$J(y(\cdot))=(J_1(y(\cdot)),\ldots,J_n(y(\cdot))),\quad J_i(y(\cdot))=\int_t^{T}h_i(y(\tau))\,d\tau+H_i(y(T)),\quad i=1,\ldots,n.$$
The construction of the characteristic sets $V(S;\bar x(t),T-t)$ for each $S\in2^N$ fully defines the current cooperative game $\Gamma_V(\bar x(t),T-t)=\langle N,V(S;\bar x(t),T-t)\rangle$.

Let $E_V(\bar x(t),T-t)$ be the imputation set, i.e.
$$E_V(\bar x(t),T-t)=\{J\in V(N;\bar x(t),T-t)\mid J_i\ge\max\{J_i'\mid J'\in V(\{i\};\bar x(t),T-t)\},\ i=1,\ldots,n\},$$
and let $W_V(\bar x(t),T-t)\subset E_V(\bar x(t),T-t)$ be a solution of the game $\Gamma_V(\bar x(t),T-t)$ generated by the same optimality principle as the solution $W_V(x_0,T-t_0)$ of the game $\Gamma_V(x_0,T-t_0)$. The current game $\Gamma_V(\bar x(T),0)$ is of zero duration, and hence a unique payoff vector defined in the initial state $\bar x(T)$ exists therein. For this reason, we assume that the set $W_V(\bar x(T),0)$, which is the solution of the current game $\Gamma_V(\bar x(T),0)$, consists of the unique imputation $H(\bar x(T))=\{H_i(\bar x(T)),\ i=1,\ldots,n\}$, where $H_i(\bar x(T))$ is the terminal part of player $i$'s payoff (see (7.3.13)). For the trajectory $\bar x(\cdot)$ to be realized in the game $\Gamma_V(x_0,T-t_0)$, it should be conditionally optimal in the initial sense in all current subgames (otherwise the players have no reason to keep to this trajectory until the game terminates). In other words, the conditionally optimal trajectory $\bar x(\cdot)$ in the game $\Gamma_V(x_0,T-t_0)$ can be realized only when the imputation $J(\bar x(\cdot))\in W_V(x_0,T-t_0)$ has the property of dynamic stability.
Dynamic stable solution. Let $\bar x(\cdot)$ be a conditionally optimal trajectory in the game $\Gamma_V(x_0,T-t_0)$. For $t\in[t_0,T]$ its portion on $[t,T]$ is denoted by $\bar x(\cdot)[t,T]$. If the game $\Gamma_V(x_0,T-t_0)$ proceeds along the trajectory $\bar x(\cdot)$ until the instant $t$, then the payoff of player $i$ on the time interval $[t_0,t]$ is equal to
$$J_i(\bar x(\cdot)[t_0,t])=\int_{t_0}^{t}h_i(\bar x(\tau))\,d\tau.$$
For the imputation $J(\bar x(\cdot))\in W_V(x_0,T-t_0)$ to be realized in the game, it is essential that at each instant $t$ the vector
$$J(\bar x(\cdot)[t,T])=\int_t^{T}h(\bar x(\tau))\,d\tau+H(\bar x(T)),$$
where $h=(h_1,\ldots,h_n)$ and $H=(H_1,\ldots,H_n)$, representing the players' payoffs in the game $\Gamma_V(\bar x(t),T-t)$, be an imputation in the game $\Gamma_V(\bar x(t),T-t)$ belonging to its solution $W_V(\bar x(t),T-t)$. In this case, at each instant the part $J(\bar x(\cdot)[t,T])\in W_V(\bar x(t),T-t)$ of the imputation $J(\bar x(\cdot))\in W_V(x_0,T-t_0)$ is optimal in the initial sense, and hence the players have no reason to abandon the orientation towards its realization in the current game $\Gamma_V(\bar x(t),T-t)$, $t_0\le t\le T$. This means that during the game the players have no intention to reject the conditionally optimal trajectory $\bar x(\cdot)$ and its corresponding imputation $J(\bar x(\cdot))$.

Definition 9. The imputation $J(\bar x(\cdot))\in W_V(x_0,T-t_0)$ is called dynamic stable in the game $\Gamma_V(x_0,T-t_0)$ if
1. along the conditionally optimal trajectory $\bar x(\cdot)$, $W_V(\bar x(t),T-t)\ne\emptyset$ for $t_0\le t\le T$;
2. $J(\bar x(\cdot))\in\bigcap_{t_0\le t\le T}\bigl[J(\bar x(\cdot)[t_0,t])\oplus W_V(\bar x(t),T-t)\bigr]$.
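Both conditions of Definition 9 can be checked on a time grid when the current solutions are available as finite sets. This is a sketch under explicit assumptions: $a\oplus A=\{a+z:z\in A\}$ for a vector $a$ and a finite set $A$; the accumulated payoffs $J(\bar x(\cdot)[t_0,t_k])$ and the sets $W_V(\bar x(t_k),T-t_k)$ are supplied as hypothetical data (a constant integrand $h=(1,2)$ on $[0,1]$ and $H=(0.5,0.5)$); and the intersection over all $t\in[t_0,T]$ is replaced by the grid instants only. A float tolerance is used for the membership test.

```python
def oplus(a, A):
    """a (+) A = {a + z : z in A} for a vector a and a finite set A of vectors."""
    return {tuple(ai + zi for ai, zi in zip(a, z)) for z in A}


def is_dynamic_stable(J_total, accumulated, solutions, tol=1e-9):
    """Check Definition 9 on a time grid t_0 < ... < t_K = T.

    accumulated[k] = J(x[t0, t_k]) (integral part collected up to t_k) and
    solutions[k] = W_V(x(t_k), T - t_k), given as a finite set of vectors.
    """
    for acc, W in zip(accumulated, solutions):
        if not W:
            return False  # condition 1: the current solution must be nonempty
        if not any(all(abs(p - q) <= tol for p, q in zip(J_total, v)) for v in oplus(acc, W)):
            return False  # condition 2 fails at this instant
    return True


# Hypothetical data: constant integrand h = (1, 2) on [0, 1] and terminal payoff H = (0.5, 0.5).
h, H = (1.0, 2.0), (0.5, 0.5)
ts = [k / 4 for k in range(5)]
acc = [(t * h[0], t * h[1]) for t in ts]
rest = [((1 - t) * h[0] + H[0], (1 - t) * h[1] + H[1]) for t in ts]  # J(x[t, T])
W = [{r, (0.0, 0.0)} for r in rest[:-1]] + [{H}]
J = rest[0]  # (1.5, 2.5)
print(is_dynamic_stable(J, acc, W))  # True
W_bad = W[:2] + [{(0.0, 0.0)}] + W[3:]
print(is_dynamic_stable(J, acc, W_bad))  # False
```

In `W_bad` the current solution at $t=0.5$ no longer contains the remaining payoff $J(\bar x(\cdot)[0.5,1])$, so condition 2 fails at that instant.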
We say that the cooperative differential game $\Gamma_V(x_0,T-t_0)$ without side payments has a dynamic stable solution $W_V(x_0,T-t_0)$ if all the imputations from $W_V(x_0,T-t_0)$ are dynamic stable. Any conditionally optimal trajectory along which a dynamic stable imputation entering into the solution is realized will be called an optimal trajectory. The dynamic stable solution can be placed in correspondence with the set $\mathcal X$ of optimal trajectories: each imputation from the dynamic stable solution corresponds to at least one optimal trajectory, so that
$$W_V(x_0,T-t_0)=\{J(x(\cdot))\mid x(\cdot)\in\mathcal X\}.$$
Let $J',J''\in W_V(x_0,T-t_0)$ be dynamic stable imputations. Then there exist trajectories $x'(\cdot),x''(\cdot)\in\mathcal X$ such that $J'=J(x'(\cdot))$, $J''=J(x''(\cdot))$, and at any instant $t_0\le t\le T$
$$J(x'(\cdot))\in\bigl[J(x'(\cdot)[t_0,t])\oplus W_V(x'(t),T-t)\bigr],$$
$$J(x''(\cdot))\in\bigl[J(x''(\cdot)[t_0,t])\oplus W_V(x''(t),T-t)\bigr].$$
Let a £ R " , A C R
N
, then by a © J4 we u n d e r s t a n d the set of a l ! vectors
ofthe form a + z, where z € A (a © A =
a + z : z G A).
Here, in general, $J(x(\cdot)[t_0,t])\neq J(\bar x(\cdot)[t_0,t])$. The sets $W_v(x(t),T-t)$ and $W_v(\bar x(t),T-t)$ are solutions of different games $\Gamma_v(x(t),T-t)$ and $\Gamma_v(\bar x(t),T-t)$, and so they are not related to one another. Evidently, the imputation $J(x(\cdot))\in W_v(x_0,T-t_0)$ is dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ if, at each instant $t_0\le t\le T$, the vector $J(x(\cdot)[t,T])$ is a dynamic stable imputation in the current game $\Gamma_v(x(t),T-t)$.

Strongly dynamic stable solution. The dynamic stability of the imputation $J(x(\cdot))\in W_v(x_0,T-t_0)$ implies that the inclusion
$$J(x(\cdot)[t,T])\in W_v(x(t),T-t) \tag{7.5.1}$$
is satisfied at each instant $t_0\le t\le T$. That is, on each interval $[t,T]$ the remaining portion $x(\cdot)[t,T]$ of the trajectory $x(\cdot)$ must be optimal in the game $\Gamma_v(x(t),T-t)$. Suppose that, together with (7.5.1), at every instant $t_0\le t\le T$ and for all conditionally optimal trajectories $x_t(\cdot)$ in the game $\Gamma_v(x(t),T-t)$ the inclusion
$$J(x(\cdot)[t_0,t])+J(x_t(\cdot))\in W_v(x_0,T-t_0)$$
holds. Then a "good" continuation of the game after the instant $t$ is not only the imputation $J(x(\cdot)[t,T])\in W_v(x(t),T-t)$ but also any imputation $J(x_t(\cdot))\in W_v(x(t),T-t)$.
By slightly strengthening this requirement, we obtain a new concept of dynamic stability, which is called strong dynamic stability.

Definition 10. The imputation $J(x(\cdot))\in W_v(x_0,T-t_0)$ is called strongly dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ if:
1. the imputation $J(x(\cdot))$ is dynamic stable;
2. for any $t_0\le t_1<t_2\le T$
$$\big[J(x(\cdot)[t_0,t_2])\oplus W_v(x(t_2),T-t_2)\big]\subset\big[J(x(\cdot)[t_0,t_1])\oplus W_v(x(t_1),T-t_1)\big]. \tag{7.5.2}$$

We say that the cooperative differential game $\Gamma_v(x_0,T-t_0)$ without side payments has a strongly dynamic stable solution $W_v(x_0,T-t_0)$ if all the imputations from $W_v(x_0,T-t_0)$ are strongly dynamic stable. A conditionally optimal trajectory along which a strongly dynamic stable imputation from the solution exists is called a strongly optimal trajectory.

From (7.5.2), for $t_2=t$ and $t_1=t_0$ we have
$$J(x(\cdot)[t_0,t])\oplus W_v(x(t),T-t)\subset W_v(x_0,T-t_0),$$
and for $t_2=T$ and $t_1=t$ we obtain
$$J(x(\cdot)[t_0,T])\oplus W_v(x(T),0)\subset\big[J(x(\cdot)[t_0,t])\oplus W_v(x(t),T-t)\big],\quad t_0\le t\le T,$$
or
$$\int_{t_0}^{T}h(x(t))\,dt+H(x(T))\in\big[J(x(\cdot)[t_0,t])\oplus W_v(x(t),T-t)\big],\quad t_0\le t\le T.$$
Remark. Let $J(x(\cdot))\in W_v(x_0,T-t_0)$ be a strongly dynamic stable imputation, and let $x_t(\cdot)$ be an arbitrary conditionally optimal trajectory in the current game $\Gamma_v(x(t),T-t)$. From the definition of strong dynamic stability it follows that $J(x(\cdot)[t_0,t])+J(x_t(\cdot))\in W_v(x_0,T-t_0)$. The vector $J(x_t(\cdot))$ is realized in the game $\Gamma_v(x(t),T-t)$ along the trajectory $x_t(\cdot)$. Hence the vector $J(x(\cdot)[t_0,t])+J(x_t(\cdot))$ will be realized in the game $\Gamma_v(x_0,T-t_0)$ if the trajectory that coincides with $x(\cdot)$ on $[t_0,t]$ and with $x_t(\cdot)$ on $[t,T]$ is realized there. □
The existence of a strongly dynamic stable solution in a cooperative differential game without side payments requires the fulfilment of rather rigid conditions. Hence such a solution often does not exist.

Terminal payoffs. Consider the case when in (7.3.13) $h_i=0$, $i=1,\dots,n$. Because the players' payoffs are nontransferable, to each point $x\in C^{T-t_0}(x_0)$ there corresponds exactly one payoff vector $H(x)=\{H_i(x),\ i=1,\dots,n\}$. Here $C^{T-t_0}(x_0)$ is the reachability set of system (7.3.2)-(7.3.3) on the interval $[t_0,T]$. Therefore the imputations in the game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs are realized in the final states (points) rather than along the trajectories. Accordingly,
$$V(N;x_0,T-t_0)=\{H(x)\mid x\in C^{T-t_0}(x_0)\}.$$
The conditionally optimal trajectory is defined as follows.

Definition 11. Let $W_v(x_0,T-t_0)\neq\emptyset$. Any trajectory of system (7.3.2)-(7.3.3) such that $H(x(T))\in W_v(x_0,T-t_0)$ is called a conditionally optimal trajectory in the game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs.

Let $\bar x(\cdot)$ be a conditionally optimal trajectory. Consider the current games $\Gamma_v(\bar x(t),T-t)$, $t_0\le t\le T$, along this trajectory. The game $\Gamma_v(\bar x(t),T-t)$ differs from the game $\Gamma_v(x_0,T-t_0)$ in the initial state and the duration. The players' payoff functions are the same as in the game $\Gamma_v(x_0,T-t_0)$, i.e. $J_i=H_i(x(T))$, $i=1,\dots,n$. As before, the solution of the current game $\Gamma_v(\bar x(t),T-t)$ is denoted by $W_v(\bar x(t),T-t)$.
Definition 12. The imputation $H(y)\in W_v(x_0,T-t_0)$ is called dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs if
1. there exists a conditionally optimal trajectory $\bar x(\cdot)$ such that $\bar x(T)=y$ and $W_v(\bar x(t),T-t)\neq\emptyset$, $t_0\le t\le T$;
2. $H(y)\in\bigcap_{t_0\le t\le T}W_v(\bar x(t),T-t)$.
The definitions of the dynamic stable solution and of the optimal trajectory in the game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs are the same as in the general case. Let
$$X=\{x(\cdot)\mid H(x(T))\in W_v(x(t),T-t),\ t_0\le t\le T\}$$
be the set of optimal trajectories in the game $\Gamma_v(x_0,T-t_0)$. The dynamic stability of the solution $W_v(x_0,T-t_0)$ means that
$$W_v(x_0,T-t_0)\subset\bigcup_{x(\cdot)\in X}\ \bigcap_{t_0\le t\le T}W_v(x(t),T-t).$$
Definition 13. We say that the game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs has a strongly dynamic stable solution $W_v(x_0,T-t_0)$ if:
1. it is dynamic stable;
2. for any optimal trajectory $\bar x(\cdot)$ and any $t_0\le t_1<t_2\le T$
$$W_v(\bar x(t_2),T-t_2)\subset W_v(\bar x(t_1),T-t_1).$$
In this case the conditionally optimal trajectory is called a strongly optimal trajectory. Suppose $W_v(x_0,T-t_0)$ is strongly dynamic stable and $H(x(T)),H(\bar x(T))\in W_v(x_0,T-t_0)$. Then at any instant $t_0\le t\le T$
$$W_v(x(t),T-t)\subset W_v(x(\tau),T-\tau),\qquad W_v(\bar x(t),T-t)\subset W_v(\bar x(\tau),T-\tau)$$
for all $t_0\le\tau\le t$ and, in particular, for $\tau=t_0$.

Regularization of the game.
The players are actually unable to affect the dynamic stability of the solution; they can only choose a conditionally optimal trajectory. Suppose the solution is not dynamic stable along a conditionally optimal trajectory. Because the payoffs are nontransferable, the players cannot change this by making mutual (side) payments. To solve this problem, we perform a regularization of the game $\Gamma_v(x_0,T-t_0)$ by giving the players somewhat wider opportunities.
Consider the integral payoff game, i.e. in (7.3.13) we set $H_i=0$, $i=1,\dots,n$. Let $x^0(\cdot)$ be a conditionally optimal trajectory in the game $\Gamma_v(x_0,T-t_0)$ such that
$$W_v(x^0(t),T-t)\neq\emptyset,\quad t_0\le t\le T. \tag{7.5.3}$$
Assume that the imputation $J(x^0(\cdot))\in W_v(x_0,T-t_0)$ is dynamic unstable. Let $t_1>t_0$ be the first instant at which the dynamic stability of this imputation is disturbed, i.e.
$$\int_{t_1}^{T}h(x^0(t))\,dt\notin W_v(x^0(t_1),T-t_1).$$
Let us choose an arbitrary imputation
$$\int_{t_1}^{T}h(x^1(t))\,dt\in W_v(x^0(t_1),T-t_1), \tag{7.5.4}$$
where $x^1(\cdot)$ is a conditionally optimal trajectory in the current game $\Gamma_v(x^0(t_1),T-t_1)$ ($x^1(t_1)=x^0(t_1)$, $x^1(t)\neq x^0(t)$, $t>t_1$). Denote
$$\eta_1=\int_{t_1}^{T}\big[h(x^1(t))-h(x^0(t))\big]\,dt. \tag{7.5.5}$$
From (7.5.4) and (7.5.5) it follows that
$$\eta_1+\int_{t_1}^{T}h(x^0(t))\,dt\in W_v(x^0(t_1),T-t_1).$$
Note that there are no restrictions on the sign of the components of the vector $\eta_1$. Set
$$J^1(x^0(\cdot)[t_1,T])=\int_{t_1}^{T}h(x^0(t))\,dt+\eta_1.$$
Evidently, $J^1(x^0(\cdot)[t_1,T])=J(x^1(\cdot)[t_1,T])$. Let
$$J^1(x^0(\cdot)[t,T])=\int_{t}^{T}h(x^0(\tau))\,d\tau+\eta_1,\quad t\in[t_1,T].$$
Let $t_2>t_1$ be the first instant such that $J^1(x^0(\cdot)[t,T])\in W_v(x^0(t),T-t)$ for $t\in[t_1,t_2)$ and
$$J^1(x^0(\cdot)[t_2,T])\notin W_v(x^0(t_2),T-t_2).$$
We choose an arbitrary imputation
$$\int_{t_2}^{T}h(x^2(t))\,dt\in W_v(x^0(t_2),T-t_2), \tag{7.5.6}$$
where $x^2(\cdot)$ is a conditionally optimal trajectory in the current game $\Gamma_v(x^0(t_2),T-t_2)$ ($x^2(t_2)=x^0(t_2)$, $x^2(t)\neq x^0(t)$, $t>t_2$). Denote
$$\eta_2=\int_{t_2}^{T}\big[h(x^2(t))-h(x^0(t))\big]\,dt-\eta_1. \tag{7.5.7}$$
From (7.5.6) and (7.5.7) it follows that
$$\eta_1+\eta_2+\int_{t_2}^{T}h(x^0(t))\,dt\in W_v(x^0(t_2),T-t_2).$$
Note that there are no restrictions on the sign of the components of the vector $\eta_2$. Set
$$J^2(x^0(\cdot)[t_2,T])=J^1(x^0(\cdot)[t_2,T])+\eta_2=\int_{t_2}^{T}h(x^0(t))\,dt+\eta_1+\eta_2.$$
It can be readily seen that $J^2(x^0(\cdot)[t_2,T])=J(x^2(\cdot)[t_2,T])$. Let
$$J^2(x^0(\cdot)[t,T])=\int_{t}^{T}h(x^0(\tau))\,d\tau+\eta_1+\eta_2,\quad t\in[t_2,T].$$
Let $t_3>t_2$ be the first instant such that $J^2(x^0(\cdot)[t,T])\in W_v(x^0(t),T-t)$ for $t\in[t_2,t_3)$ and
$$J^2(x^0(\cdot)[t_3,T])\notin W_v(x^0(t_3),T-t_3).$$
Assume that continuing this process yields a sequence of time instants $t_1<t_2<t_3<\dots$ and a sequence of imputations:
- $J^0(x^0(\cdot)[t,T])=J(x^0(\cdot)[t,T])$ for $t\in[t_0,t_1)$;
- $J^1(x^0(\cdot)[t,T])$ for $t\in[t_1,t_2)$;
- $J^2(x^0(\cdot)[t,T])$ for $t\in[t_2,t_3)$;
- $\dots$, $J^k(x^0(\cdot)[t,T])$ for $t\in[t_k,t_{k+1})$, $\dots$

These imputations satisfy the inclusion
$$J^k(x^0(\cdot)[t,T])\in W_v(x^0(t),T-t),\quad t\in[t_k,t_{k+1}),\ k=0,1,\dots,$$
and
$$J^k(x^0(\cdot)[t_{k+1},T])\notin W_v(x^0(t_{k+1}),T-t_{k+1}).$$
Also, assume that there exists a time instant $t_l<T$ for which $J^l(x^0(\cdot)[t,T])\in W_v(x^0(t),T-t)$ for all $t\in[t_l,T]$. Define the function $\gamma^t$ by the formula
$$\gamma^t=\sum_{r=0}^{k-1}\Big[\int_{t_r}^{t_{r+1}}h(x^0(\tau))\,d\tau-\eta_{r+1}\Big]+\int_{t_k}^{t}h(x^0(\tau))\,d\tau,\quad t\in[t_k,t_{k+1}),\ k=0,1,\dots,l\ (t_{l+1}=T).$$
Let us introduce a vector function $\beta(t)=(\beta_1(t),\dots,\beta_n(t))$ defined on the intervals $[t_k,t_{k+1})$, $k=0,1,\dots,l$ ($t_{l+1}=T$), with
$$\beta(t)=1,\quad t\in[t_0,t_1);\qquad \beta_i(t)>0,\quad t\in[t_k,t_{k+1}),\ k=1,\dots,l.$$
Then the expression for $\gamma^t$ may be written in the form
$$\gamma^t=\sum_{r=0}^{k-1}\int_{t_r}^{t_{r+1}}\beta(\tau)h(x^0(\tau))\,d\tau+\int_{t_k}^{t}\beta(\tau)h(x^0(\tau))\,d\tau,\quad t\in[t_k,t_{k+1}),\ k=0,1,\dots,l\ (t_{l+1}=T),$$
where the product $\beta(\tau)h(x^0(\tau))$ is taken componentwise.
We may show that for any $t\in[t_0,T]$
$$J(x^0(\cdot))-\gamma^t\in W_v(x^0(t),T-t).$$
The proof of this assertion is left to the reader. The players' payoffs are corrected at discrete time instants by the quantities $\eta_1,\dots,\eta_l$ and regularized with the multipliers $\beta(t)$ for $t\in[t_k,t_{k+1})$. As a result, the remaining part $J(x^0(\cdot))-\gamma^t$ of the imputation belongs to the solution of the current game.
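The following sketch illustrates the bookkeeping behind $\gamma^t$ numerically; it does not compute the sets $W_v$ themselves. It uses the first representation of $\gamma^t$: the integral of the payoff flow along $x^0(\cdot)$ minus the corrections $\eta_k$ applied at the instants $t_k\le t$. It assumes a hypothetical constant payoff rate and hypothetical values of $t_k$ and $\eta_k$; the names `gamma` and `flow` are ours. For $t\in[t_k,t_{k+1})$ the remainder $J(x^0(\cdot))-\gamma^t$ equals $\int_t^T h(x^0(\tau))\,d\tau+\eta_1+\dots+\eta_k=J^k(x^0(\cdot)[t,T])$, which is the vector whose membership in $W_v(x^0(t),T-t)$ holds by construction.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def gamma(t: float, t0: float,
          flow_integral: Callable[[float, float], Vec],
          breaks: Sequence[float],
          etas: Sequence[Vec]) -> Vec:
    """Payoff vector paid out on [t0, t] under the regularized distribution.

    flow_integral(a, b) returns the componentwise integral of h(x0(tau)) over [a, b];
    breaks = [t_1, ..., t_l] are the instants at which dynamic stability was lost,
    and etas[k - 1] is the correction eta_k subtracted from the payout at t_k.
    """
    paid = flow_integral(t0, t)
    for t_k, eta_k in zip(breaks, etas):
        if t_k <= t:
            paid = [p - e for p, e in zip(paid, eta_k)]
    return paid


# Hypothetical data: constant payoff rate h(x0(t)) = (1, 2) on [t0, T] = [0, 10].
RATE = [1.0, 2.0]


def flow(a: float, b: float) -> Vec:
    return [r * (b - a) for r in RATE]


t0, T = 0.0, 10.0
J_total = flow(t0, T)                            # J(x0(.)) = (10, 20)
breaks, etas = [3.0, 6.0], [[0.5, -1.0], [1.0, 0.25]]

t = 5.0                                          # t lies in [t_1, t_2)
remaining = [j - g for j, g in zip(J_total, gamma(t, t0, flow, breaks, etas))]
print(remaining)  # [5.5, 9.0], i.e. the integral of h over [5, 10] plus eta_1
```

The step-function form makes the identity $J(x^0(\cdot))-\gamma^t=J^k(x^0(\cdot)[t,T])$ immediate, which is why it was chosen for the sketch.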
The regularization works as follows. At the start of the game the players agree to distribute the imputation $J(x^0(\cdot))$ over the time interval $[t_0,T]$ so that the vector of payoffs obtained on each interval $[t_0,t]$ equals $\gamma^t$, $t\in[t_0,T]$. Then the remaining imputation $J(x^0(\cdot))-\gamma^t$ always belongs to the solution of the current game $\Gamma_v(x^0(t),T-t)$. Consequently, the players have no reason to depart from the trajectory $x^0(\cdot)$ chosen at the start of the game.

Remark. Regularization of the game is performed provided that $W_v(x^0(t),T-t)\neq\emptyset$, $t_0\le t\le T$. Suppose this condition fails, and let $t$ be an instant at which $W_v(x^0(t),T-t)=\emptyset$. Then the current game $\Gamma_v(x^0(t),T-t)$ has no imputations satisfying the initial optimality principle, that is, the optimality principle that generated the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$. Therefore there is no reason to continue the game along the trajectory $x^0(\cdot)$. In this case we say that the game $\Gamma_v(x_0,T-t_0)$ has no dynamic stable solution.
7.6 Structure and dynamic stability of Pareto optimal solution in the game of approaching
A typical example of nontransferable payoffs in differential games occurs when the payoffs are distances. In this and the following sections we apply the approach developed above to differential games with payoffs of this type. We assume that the dynamics of the $n$-person game is described by equations (7.3.2)-(7.3.3). The players' objective is to approach specified stationary points $M_i\in\mathbb{R}^m$, $i=1,\dots,n$, as closely as possible by the game termination instant $T$. That is, player $i$'s payoff is of the form
$$H_i(x(T))=-\rho(x(T),M_i), \tag{7.6.1}$$
where $\rho$ is the Euclidean distance. For each coalition $S\subset N$ ($N=\{1,\dots,n\}$) we construct a characteristic set $V(S;x_0,T-t_0)$ by the rule (7.4.1), setting $J_i(x(\cdot))=H_i(x(T))$, $i=1,\dots,n$. The resulting cooperative game is denoted by $\Gamma_v(x_0,T-t_0)$ and is called the cooperative game of approaching. It is an example of the cooperative differential games without side payments defined in §4 of this chapter. In this and the following sections, the reachability set $C^{T-t_0}(x_0)$ of system (7.3.2)-(7.3.3) is assumed to be convex and compact (in the Euclidean metric) for all $x_0\in\mathbb{R}^m$.
In the set $C^{T-t_0}(x_0)$, each player $i$ has a best point $x^i$, namely the point closest to $M_i$:
$$\rho(x^i,M_i)=\min_{x\in C^{T-t_0}(x_0)}\rho(x,M_i).$$
None of the players, however, can guarantee reaching his best point. Because the motion is joint, and as a result of discussions at the start of the game, the players naturally restrict themselves to a subset of $C^{T-t_0}(x_0)$ lying between the best points of all players. In a number of cases, exactly this set corresponds to dynamic stable solutions of the game $\Gamma_v(x_0,T-t_0)$ in the sense of one or another optimality principle.

We denote by $\pi_B$ the operator of orthogonal projection from $\mathbb{R}^m$ onto a convex compact set $B\subset\mathbb{R}^m$. The orthogonal projection of a point $x\in\mathbb{R}^m$ onto $B$ is the point $\pi_Bx\in B$ such that
$$\rho(x,\pi_Bx)=\min_{y\in B}\rho(x,y).$$
We say that $\pi_Bx$ is the image of $x$ under the orthogonal projection $\pi_B$, and that $x\in\mathbb{R}^m$ is a pre-image. For $x\in B$, $\pi_Bx=x$; for $A\subset\mathbb{R}^m$, $\pi_BA=\{\pi_Bx\mid x\in A\}$.

Let $A=\operatorname{conv}\{A_1,\dots,A_n\}$, $A_i\in\mathbb{R}^m$, where $\operatorname{conv}\{A_i,\ i=1,\dots,n\}$ is the convex hull of the points $A_1,\dots,A_n$. Recall that $A$ is an $r$-dimensional polyhedron ($r\le m$) if $A$ is contained in some $r$-dimensional plane but not wholly in any $(r-1)$-dimensional one. The boundary of any $r$-dimensional polyhedron is composed of a finite number of $(r-1)$-dimensional polyhedrons, called its $(r-1)$-faces. In what follows, we need the lemmas given below.

Lemma 2. Let $B$ be a convex compact set in $\mathbb{R}^m$, and let $x$ be a point in $\mathbb{R}^m$ which does not belong to $B$. Then the inequality
$$\rho(\pi_Bx,y)<\rho(x,y)$$
holds for all $y\in B$.
Proof: The point $\pi_Bx$ exists (since $B$ is compact) and is a boundary point of $B$. Take an arbitrary point $y\in B$, $y\neq\pi_Bx$. Since $B$ is convex,
$$w_\lambda=\lambda y+(1-\lambda)\pi_Bx\in B,\quad 0\le\lambda\le1.$$
Consider the function
$$\rho^2(x,w_\lambda)=\sum_{j=1}^{m}\big[x^j-\lambda y^j-(1-\lambda)(\pi_Bx)^j\big]^2.$$
By the definition of the operator $\pi_B$, the point $\pi_Bx$ (i.e. $\lambda=0$) minimizes $\rho^2(x,w_\lambda)$ on $[0,1]$. Hence
$$\frac{d\rho^2(x,w_\lambda)}{d\lambda}\Big|_{\lambda=0}=2\sum_{j=1}^{m}\big((\pi_Bx)^j-y^j\big)\big(x^j-(\pi_Bx)^j\big)\ge0.$$
From this we have
$$-2\,(y-\pi_Bx,\ x-\pi_Bx)\ge0,$$
where $(\cdot,\cdot)$ is the scalar product of vectors. Consequently, the angle between the vectors $y-\pi_Bx$ and $x-\pi_Bx$ is at least a right angle, so it is the largest interior angle of the triangle with vertices $x$, $y$, $\pi_Bx$. Hence $\rho(x,y)>\rho(\pi_Bx,y)$. For $y=\pi_Bx$ the inequality is trivial, since $\rho(x,\pi_Bx)>0$. This completes the proof of the lemma. □
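Lemma 2 can be checked numerically when $B$ is an axis-aligned box, a convex compact set whose orthogonal projection is simply componentwise clipping. The box, the sample size and the function names below are illustrative choices, not part of the text.

```python
import math
import random


def project_onto_box(x, lo, hi):
    """Orthogonal projection onto the box [lo_1, hi_1] x ... x [lo_m, hi_m]."""
    return [min(max(xj, a), b) for xj, a, b in zip(x, lo, hi)]


def lemma2_holds(x, lo, hi, samples=1000, seed=0):
    """Check rho(pi_B x, y) < rho(x, y) for randomly sampled y in the box B.

    x is assumed to lie outside B, as in Lemma 2.
    """
    rng = random.Random(seed)
    px = project_onto_box(x, lo, hi)
    for _ in range(samples):
        y = [rng.uniform(a, b) for a, b in zip(lo, hi)]
        if not math.dist(px, y) < math.dist(x, y):
            return False
    return True


print(lemma2_holds([3.0, -2.0, 0.5], lo=[0.0, 0.0, 0.0], hi=[1.0, 1.0, 1.0]))  # True
```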
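Let $x,\bar x\notin w_\lambda$, where $w_\lambda=\lambda y+(1-\lambda)\bar y$, $\lambda\in[0,1]$, and $x,\bar x,y,\bar y\in\mathbb{R}^m$. Introduce the function $F(\lambda)=\rho(x,w_\lambda)-\rho(\bar x,w_\lambda)$, where
$$\rho(x,w_\lambda)=\Big(\sum_{j=1}^{m}\big[x^j-\lambda y^j-(1-\lambda)\bar y^j\big]^2\Big)^{1/2}.$$

Lemma 3. If $F(\lambda)>0$ for $\lambda=0$ and $\lambda=1$, then $F(\lambda)>0$ for all $0\le\lambda\le1$.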
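A short way to see why Lemma 3 holds is to expand the squares. Let $\tilde F(\lambda)=\rho^2(x,w_\lambda)-\rho^2(\bar x,w_\lambda)$, the function used in the proof below, and let $\langle\cdot,\cdot\rangle$ be the scalar product. The terms $\|w_\lambda\|^2$ cancel, so
$$\tilde F(\lambda)=\|x\|^2-\|\bar x\|^2-2\langle x-\bar x,\,w_\lambda\rangle=(1-\lambda)\tilde F(0)+\lambda\tilde F(1),$$
because $w_\lambda=\lambda y+(1-\lambda)\bar y$ depends affinely on $\lambda$. Hence $\tilde F$ is positive on $[0,1]$ whenever it is positive at both end points. Likewise, it is nonnegative on $[0,1]$ whenever it is nonnegative at both end points.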
Proof: Since $\rho(x,w_\lambda)$ and $\rho(\bar x,w_\lambda)$ are positive, the inequality $\rho^2(x,w_\lambda)>\rho^2(\bar x,w_\lambda)$ implies $\rho(x,w_\lambda)>\rho(\bar x,w_\lambda)$ for $0\le\lambda\le1$. Therefore, for simplicity, we consider $\tilde F(\lambda)=\rho^2(x,w_\lambda)-\rho^2(\bar x,w_\lambda)$. For an arbitrary $\lambda=\lambda_1\in(0,1)$ we have $w_{\lambda_1}=\lambda_1y+(1-\lambda_1)\bar y$ and
$$\tilde F(\lambda_1)=\sum_{j=1}^{m}\big[x^j-\lambda_1y^j-(1-\lambda_1)\bar y^j\big]^2-\sum_{j=1}^{m}\big[\bar x^j-\lambda_1y^j-(1-\lambda_1)\bar y^j\big]^2$$
$$=\sum_{j=1}^{m}\big[(x^j)^2-(\bar x^j)^2-2\lambda_1y^jx^j-2(1-\lambda_1)\bar y^jx^j+2\lambda_1y^j\bar x^j+2(1-\lambda_1)\bar y^j\bar x^j\big].$$
Consider the equalities
$$\tilde F(0)=\sum_{j=1}^{m}\big[(x^j)^2-2x^j\bar y^j-(\bar x^j)^2+2\bar x^j\bar y^j\big],\qquad \tilde F(1)=\sum_{j=1}^{m}\big[(x^j)^2-2x^jy^j-(\bar x^j)^2+2\bar x^jy^j\big].$$
Using them, we get
$$\tilde F(\lambda_1)=\tilde F(0)+\tilde F(1)+\sum_{j=1}^{m}\big[(\bar x^j)^2-(x^j)^2+2x^j\tilde x^j-2\bar x^j\tilde x^j\big],$$
where $\tilde x=\lambda_1\bar y+(1-\lambda_1)y=w_{1-\lambda_1}$ is the point symmetric to $w_{\lambda_1}$ with respect to the midpoint of the segment $w_\lambda$, $0\le\lambda\le1$. The last sum equals $-\tilde F(1-\lambda_1)$. Finally,
$$\tilde F(\lambda_1)=\tilde F(0)+\tilde F(1)-\tilde F(1-\lambda_1).$$
Setting $\lambda_1=1/2$, we obtain
$$\tilde F(1/2)=\tfrac12\big(\tilde F(0)+\tilde F(1)\big)>0.$$
Applying the same argument to each half of the segment, and so on, we may show that $\tilde F(\lambda)>0$ for all $\lambda=\ell/2^k$, $k=1,2,\dots$, $\ell=1,\dots,2^k-1$. The assertion of the lemma then follows from the continuity of $\tilde F(\lambda)$. □

Lemma 4. Let $B^r$ be an $r$-dimensional closed convex polyhedron in $\mathbb{R}^m$ with vertices $B_1,\dots,B_q$. Then, for all $y\in B^r$ and $x\notin B^r$, the inequality $\rho(y,B_k)<\rho(x,B_k)$ is satisfied for at least one $k\in I$ ($I=\{1,\dots,q\}$).
Proof: The proof is by induction. For $r=1$, $B^1$ is a line segment, and the lemma is evident. Suppose it holds for $r=p-1$. Since $B^p$ is convex, the segment $w_\lambda=\lambda\xi+(1-\lambda)\pi_{B^p}x$ belongs to $B^p$ for all $0\le\lambda\le1$. Here $\xi\in B^{p-1}$ is a point of a $(p-1)$-face of $B^p$ chosen so that $y\in w_\lambda$. By the induction hypothesis, there exists $k_0\in I$ such that $\rho(\xi,B_{k_0})<\rho(x,B_{k_0})$, where $B_{k_0}\in B^{p-1}$. But
$$\rho(y,B_{k_0})=\rho\big(\lambda\xi+(1-\lambda)\pi_{B^p}x,B_{k_0}\big)\le\lambda\rho(\xi,B_{k_0})+(1-\lambda)\rho(\pi_{B^p}x,B_{k_0}).$$
By Lemma 2, $\rho(\pi_{B^p}x,B_{k_0})<\rho(x,B_{k_0})$. Since at least one of the numbers $\lambda$, $1-\lambda$ is positive, we get
$$\rho(y,B_{k_0})<\lambda\rho(x,B_{k_0})+(1-\lambda)\rho(x,B_{k_0})=\rho(x,B_{k_0}).$$
Therefore $\rho(y,B_{k_0})<\rho(x,B_{k_0})$, $k_0\in I$, quod erat demonstrandum. □
Lemma 5. Let $B^r$ be an $r$-dimensional closed convex polyhedron in $\mathbb{R}^m$ with vertices $B_1,\dots,B_q$, and let $x,y$ be points not belonging to $B^r$. If there is a point $\xi\in B^r$ for which $\rho(y,\xi)<\rho(x,\xi)$, then $\rho(y,B_k)<\rho(x,B_k)$ for at least one $k\in I$.

Proof: For $\xi\in B^1$ the lemma is evident. Assume that the lemma holds for $B^{p-1}$ and that $\xi\in B^p$. By the definition of $B^{p-1}$, there exists an index $k_0\in I$ such that $B_{k_0}\in B^p\setminus B^{p-1}$. Then there exists a point $\xi_{k_0}\in B^{p-1}$ for which
$$\xi\in w_\lambda=\lambda B_{k_0}+(1-\lambda)\xi_{k_0}\subset B^p,\quad 0\le\lambda\le1.$$
If $\rho(y,\xi_{k_0})<\rho(x,\xi_{k_0})$, then by the induction hypothesis there is a vertex $B_k\in B^{p-1}$ such that $\rho(y,B_k)<\rho(x,B_k)$. Now suppose $\rho(y,\xi_{k_0})\ge\rho(x,\xi_{k_0})$ and also $\rho(y,B_{k_0})\ge\rho(x,B_{k_0})$. Then by Lemma 3, whose proof applies equally to non-strict inequalities, $\rho(\zeta,y)\ge\rho(\zeta,x)$ for all $\zeta\in w_\lambda$, $0\le\lambda\le1$. This contradicts the condition of the lemma. Hence $\rho(y,B_{k_0})<\rho(x,B_{k_0})$, and this completes the proof of the lemma. □
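Lemma 5 can be used constructively: once some point $\xi$ of the polyhedron is closer to $y$ than to $x$, it suffices to scan the vertices. The sketch assumes a hypothetical triangle in $\mathbb{R}^2$ and hypothetical points $x$, $y$. The names `vertex_witness` and `point_in_hull` are illustrative.

```python
import math


def vertex_witness(vertices, x, y):
    """Return the index k of a vertex with rho(y, B_k) < rho(x, B_k), or None.

    By Lemma 5 such a vertex exists whenever some point xi of the convex hull
    of `vertices` satisfies rho(y, xi) < rho(x, xi).
    """
    for k, b in enumerate(vertices):
        if math.dist(y, b) < math.dist(x, b):
            return k
    return None


def point_in_hull(vertices, weights):
    """Convex combination sum_k weights[k] * vertices[k] (weights >= 0, summing to 1)."""
    m = len(vertices[0])
    return [sum(w * v[j] for w, v in zip(weights, vertices)) for j in range(m)]


B = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]       # a triangle B^2 in R^2
x, y = (5.0, 5.0), (-1.0, 1.0)                  # two points outside the triangle
xi = point_in_hull(B, [0.5, 0.25, 0.25])        # xi = (1.0, 1.0)
assert math.dist(y, xi) < math.dist(x, xi)      # hypothesis of Lemma 5
print(vertex_witness(B, x, y))  # 0
```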
Let
$$P_c(x_0,T-t_0)=\{H(x(T;x_0,u(\cdot)))\mid u(\cdot)\in U_P\},$$
where $U_P$ is the set of Pareto optimal situations in the game $\Gamma_v(x_0,T-t_0)$ and $H(\cdot)=\{H_i(\cdot),\ i=1,\dots,n\}$. The set $P_c(x_0,T-t_0)$ is called the Pareto optimal set of payoff vectors in the game $\Gamma_v(x_0,T-t_0)$, or simply the Pareto optimal set. In this section, the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$ means the Pareto optimal set $P_c(x_0,T-t_0)$. We will study this solution. Denote $M=\operatorname{conv}\{M_1,\dots,M_n\}$.

Theorem 9. If $M\cap C^{T-t_0}(x_0)=\emptyset$, then
$$P_c(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M\}.$$
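To make Theorem 9 concrete, the sketch below takes $C^{T-t_0}(x_0)$ to be a disk, as it is, for example, for simple motion in the plane. It takes two objective points whose segment $M$ misses the disk, and checks property (7.6.3) for the projected points against a finite random sample of $C^{T-t_0}(x_0)$. This only illustrates one direction of the theorem and proves nothing. The disk, the points and the helper names are hypothetical.

```python
import math
import random


def project_onto_disk(p, center, radius):
    """Orthogonal projection of p onto the closed disk |q - center| <= radius."""
    d = math.dist(p, center)
    if d <= radius:
        return tuple(p)
    return tuple(c + radius * (pj - c) / d for pj, c in zip(p, center))


def sample_disk(center, radius, n, rng):
    """Uniformly distributed points of the closed disk (rejection sampling)."""
    pts = []
    while len(pts) < n:
        q = tuple(c + rng.uniform(-radius, radius) for c in center)
        if math.dist(q, center) <= radius:
            pts.append(q)
    return pts


# Hypothetical data: C = unit disk at the origin, M = segment [M1, M2] outside it.
center, radius = (0.0, 0.0), 1.0
M1, M2 = (3.0, -1.0), (3.0, 1.0)
segment = [tuple(lam * a + (1 - lam) * b for a, b in zip(M1, M2))
           for lam in (i / 20 for i in range(21))]
projected = [project_onto_disk(p, center, radius) for p in segment]

rng = random.Random(1)
others = sample_disk(center, radius, 500, rng)
# Property (7.6.3): every sampled x in C is farther than y' from at least one M_i.
ok = all(any(math.dist(yp, m) < math.dist(x, m) for m in (M1, M2))
         for yp in projected for x in others)
print(ok)  # True
```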
Proof: Let
$$y'\in\pi_{C^{T-t_0}(x_0)}M,\qquad x\in C^{T-t_0}(x_0),\quad x\neq y', \tag{7.6.2}$$
and let $Y=\{y\in M\mid y'=\pi_{C^{T-t_0}(x_0)}y\}$. Consider the function $F(\xi)=\rho(y',\xi)-\rho(x,\xi)$, $\xi\in M$. For definiteness, take the point $\bar y\in Y$ such that $\rho(y',\bar y)=\min_{y\in Y}\rho(y',y)$; such a point exists, since $Y$ is compact. The point $y'$ is the unique point of $C^{T-t_0}(x_0)$ nearest to $\bar y$. Hence $F(\bar y)<0$ for all $x$ from (7.6.2). Then Lemma 5 implies $\rho(y',M_i)<\rho(x,M_i)$ for at least one $i=1,\dots,n$, i.e.
$$H_i(y')>H_i(x)\ \text{for at least one } i=1,\dots,n, \tag{7.6.3}$$
and this means that $H(y')\in P_c(x_0,T-t_0)$.
We now show that no points of $C^{T-t_0}(x_0)$ other than those of $\pi_{C^{T-t_0}(x_0)}M$ have property (7.6.3). Equivalently, for any $x\in C^{T-t_0}(x_0)\setminus\pi_{C^{T-t_0}(x_0)}M$ there is a point $y\in\pi_{C^{T-t_0}(x_0)}M$ for which
$$H_i(y)>H_i(x)\ \text{for all } i=1,\dots,n. \tag{7.6.4}$$
Consider the sets $\mathcal M_1=\operatorname{conv}\{M,\pi_{C^{T-t_0}(x_0)}M\}$ and $\mathcal M_2=\operatorname{conv}\,\pi_{C^{T-t_0}(x_0)}M$. Evidently, $\mathcal M_2\subset\mathcal M_1$. Denote by $\partial C^{T-t_0}(x_0)$ and $\partial\pi_{C^{T-t_0}(x_0)}M$ the boundaries of the sets $C^{T-t_0}(x_0)$ and $\pi_{C^{T-t_0}(x_0)}M$, respectively. Since $C^{T-t_0}(x_0)\cap M=\emptyset$, we have $\pi_{C^{T-t_0}(x_0)}M\subset\partial C^{T-t_0}(x_0)$. By convexity, $\mathcal M_2\subset C^{T-t_0}(x_0)$.
Let $z\in\partial\pi_{C^{T-t_0}(x_0)}M$, and let $Q_z$ be a hyperplane tangent to $C^{T-t_0}(x_0)$ at the point $z$. The hyperplane $Q_z$ divides $\mathbb{R}^m$ into two parts. Since $C^{T-t_0}(x_0)$ is convex, it lies wholly on one side of $Q_z$; denote this part of the space by $Q_z^+$. Let
$$Q^+=\bigcap_{z}Q_z^+.$$
It can be readily seen that $C^{T-t_0}(x_0)\subset Q^+$. Further, let $\tilde Q=Q^+\setminus\mathcal M_1$, and let $\partial\tilde Q$ be the boundary of $\tilde Q$. By construction, $\partial\tilde Q$ is the locus of points of $\mathbb{R}^m$ whose orthogonal projections onto $\mathcal M_1$ lie on $\partial\pi_{C^{T-t_0}(x_0)}M$. Hence for each point $x\in\tilde Q$ its projection $\pi_{\mathcal M_1}x$ belongs to $\pi_{C^{T-t_0}(x_0)}M$. Indeed, suppose $\pi_{\mathcal M_1}x\in\mathcal M_1\setminus\pi_{C^{T-t_0}(x_0)}M$. Consider the point where the segment $\lambda x+(1-\lambda)\pi_{\mathcal M_1}x$ meets $\partial\tilde Q$. Its orthogonal projection onto $\mathcal M_1$ would not belong to $\partial\pi_{C^{T-t_0}(x_0)}M$, which contradicts the definition of $\partial\tilde Q$.
For any $x\in C^{T-t_0}(x_0)\setminus\mathcal M_1$ we have $x\in\tilde Q$. So the point $y=\pi_{\mathcal M_1}x$ lies in $\pi_{C^{T-t_0}(x_0)}M$, and by Lemma 2 it satisfies inequality (7.6.4).

Now let $x\in\big(\mathcal M_1\cap C^{T-t_0}(x_0)\big)\setminus\pi_{C^{T-t_0}(x_0)}M$, and let $\pi_Mx$ be its image on $M$. By assumption, $C^{T-t_0}(x_0)$ is convex, and $\pi_{C^{T-t_0}(x_0)}M\subset C^{T-t_0}(x_0)$. Hence there exists a point
$$x'\in\big[\pi_{C^{T-t_0}(x_0)}M\cap w_\lambda\big],\qquad w_\lambda=\lambda x+(1-\lambda)\pi_Mx,\ 0\le\lambda\le1.$$
This is the desired point: using Lemma 2, one easily shows that it satisfies inequality (7.6.4). Thus only the points of $\pi_{C^{T-t_0}(x_0)}M$ have property (7.6.3). Consequently,
$$P_c(x_0,T-t_0)=\{H(y)\mid y\in\pi_{C^{T-t_0}(x_0)}M\}.$$
□

Theorem 10. If $M\subset C^{T-t_0}(x_0)$, then
$$P_c(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M=M\}.$$
Proof: By Lemma 2, for every $x\notin M$ there is a point $\xi\in M$ ($\xi=\pi_Mx$) such that $H_i(\xi)>H_i(x)$ for all $i=1,\dots,n$. Moreover, by Lemma 4, for any $x\notin M$ and any $y\in M$ the inequality $H_i(y)>H_i(x)$ holds for at least one $i=1,\dots,n$. So only payoff vectors defined on $M$ can be Pareto optimal. We now show that all of them are.

Let $x,y\in M$, $x\neq y$. For $r=1$, $M$ is the line segment with end points $M_1$ and $M_2$. We have
$$\rho(M_1,M_2)=\rho(M_1,y)+\rho(y,M_2)=\rho(M_1,x)+\rho(x,M_2).$$
Since $x\neq y$, one of the following holds:
- $\rho(M_1,y)<\rho(M_1,x)$ and $\rho(y,M_2)>\rho(x,M_2)$;
- $\rho(M_1,y)>\rho(M_1,x)$ and $\rho(y,M_2)<\rho(x,M_2)$.

Hence $H(x),H(y)\in P_c(x_0,T-t_0)$.

Suppose the theorem holds for $r=k-1$, and let $r=k$. Consider the ray $Z_x$ emanating from $x$ and passing through $y$. There is a point $\bar z\in Z_x\cap M$ such that
$$\rho(x,\bar z)=\max_{z\in Z_x\cap M}\rho(x,z).$$
Evidently, $\bar z$ lies on the boundary of $M$, i.e. it belongs to a $(k-1)$-face of the $k$-dimensional polyhedron $M$. By the induction hypothesis, $H(\bar z)\in P_c(x_0,T-t_0)$. Then, by the definition of the Pareto optimal set, there is at least one index $i_0\in N$ for which $H_{i_0}(\bar z)>H_{i_0}(x)$, or equivalently $\rho(\bar z,M_{i_0})<\rho(x,M_{i_0})$. Since $y\in w_\lambda=\lambda x+(1-\lambda)\bar z$ with $0\le\lambda<1$, we have
$$\rho(y,M_{i_0})=\rho(\lambda x+(1-\lambda)\bar z,M_{i_0})\le\lambda\rho(x,M_{i_0})+(1-\lambda)\rho(\bar z,M_{i_0})<\lambda\rho(x,M_{i_0})+(1-\lambda)\rho(x,M_{i_0})=\rho(x,M_{i_0}),$$
i.e. $\rho(y,M_{i_0})<\rho(x,M_{i_0})$. Consequently, $H(y)\in P_c(x_0,T-t_0)$. Interchanging $x$ and $y$, we similarly obtain $H(x)\in P_c(x_0,T-t_0)$. Thus $H(y)\in P_c(x_0,T-t_0)$ exactly for the points $y\in M=\pi_{C^{T-t_0}(x_0)}M$. This completes the proof of the theorem. □
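Both halves of the proof of Theorem 10 can be checked numerically in its simplest case: $r=1$, two objective points, and $M=[M_1,M_2]\subset C^{T-t_0}(x_0)$, which is assumed here rather than computed. A point off the segment is Pareto-dominated by its projection onto $M$ (Lemma 2), while distinct points of $M$ do not dominate each other. The points and helper names are hypothetical.

```python
import math


def payoff(x, targets):
    """H_i(x) = -rho(x, M_i), the payoffs (7.6.1) of the game of approaching."""
    return tuple(-math.dist(x, m) for m in targets)


def dominates(h1, h2):
    """True if payoff vector h1 Pareto-dominates h2."""
    return all(a >= b for a, b in zip(h1, h2)) and any(a > b for a, b in zip(h1, h2))


def project_onto_segment(p, a, b):
    """Orthogonal projection of p onto the segment [a, b]."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    s = sum(u * v for u, v in zip(ap, ab)) / sum(u * u for u in ab)
    s = min(max(s, 0.0), 1.0)
    return tuple(aj + s * d for aj, d in zip(a, ab))


M1, M2 = (0.0, 0.0), (2.0, 0.0)          # hypothetical objective points, M = [M1, M2]
targets = (M1, M2)

off = (1.0, 1.5)                           # a point of C outside M
print(dominates(payoff(project_onto_segment(off, M1, M2), targets),
                payoff(off, targets)))     # True

on = [(0.5, 0.0), (1.0, 0.0), (1.7, 0.0)]  # distinct points of M
print(any(dominates(payoff(u, targets), payoff(v, targets))
          for u in on for v in on if u != v))  # False
```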
Theorem 11. Suppose the points $M_1,\dots,M_n$ are arranged so that $M\supset C^{T-t_0}(x_0)$. Then
$$P_c(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M=C^{T-t_0}(x_0)\}.$$
The proof of Theorem 11 is obtained from that of Theorem 10 by interchanging the sets $M$ and $C^{T-t_0}(x_0)$.
We now consider the general arrangement of the points $M_1,\dots,M_n$ with respect to the reachability set $C^{T-t_0}(x_0)$. Let
$$N_1=\{i\in N\mid M_i\in C^{T-t_0}(x_0)\},\qquad N_2=\{i\in N\mid M_i\notin C^{T-t_0}(x_0)\}.$$
Evidently, $N_1\cap N_2=\emptyset$ and $N_1\cup N_2=N$.
Theorem 12. In the game of approaching $\Gamma_v(x_0,T-t_0)$, the set of Pareto optimal payoff vectors has the following structure:
$$P_c(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M\}.$$

Proof: If $N_1=\emptyset$ or $C^{T-t_0}(x_0)\subset M$, then Theorem 9 or Theorem 11 applies when $M\cap C^{T-t_0}(x_0)=\emptyset$ or $C^{T-t_0}(x_0)\subset M$, respectively. A further case is $M\cap C^{T-t_0}(x_0)\neq\emptyset$ with $M_i\notin C^{T-t_0}(x_0)$ for all $i=1,\dots,n$. As is readily seen, the theorem is then proved in the same way as in cases 1) and 2) below. If $N_2=\emptyset$, then $M\subset C^{T-t_0}(x_0)$ and Theorem 10 applies.
Assume that $N_1,N_2\neq\emptyset$. Introduce the notation
$$\mathcal M_1=M\cap C^{T-t_0}(x_0),\qquad \mathcal M_2=M\setminus\mathcal M_1.$$
Then $\pi_{C^{T-t_0}(x_0)}M=\mathcal M_1\cup\pi_{C^{T-t_0}(x_0)}\mathcal M_2$. Let
$$y'\in\pi_{C^{T-t_0}(x_0)}M,\qquad x\in C^{T-t_0}(x_0),\quad x\neq y'. \tag{7.6.5}$$
We shall show that
$$H_i(y')>H_i(x)\ \text{for at least one } i=1,\dots,n. \tag{7.6.6}$$
To do this, consider two cases:
1. $y'\in\mathcal M_1$. Here inequality (7.6.6) is proved in much the same way as Theorem 10.
2. $y'\in\pi_{C^{T-t_0}(x_0)}M\setminus\mathcal M_1=\pi_{C^{T-t_0}(x_0)}\mathcal M_2$. Let $Y=\{y\in\mathcal M_2\mid\pi_{C^{T-t_0}(x_0)}y=y'\}$. For any $x$ from (7.6.5) and any $\bar y\in Y$, the inequality $\rho(y',\bar y)<\rho(x,\bar y)$ holds. Therefore inequality (7.6.6) follows from Lemma 5.

Thus the set $\pi_{C^{T-t_0}(x_0)}M$ consists only of points satisfying condition (7.6.6).
We now show that no points other than those of $\pi_{C^{T-t_0}(x_0)}M$ have property (7.6.6). That is, for any $x\in C^{T-t_0}(x_0)\setminus\pi_{C^{T-t_0}(x_0)}M$ there is a point $y\in\pi_{C^{T-t_0}(x_0)}M$ with $\rho(y,M_i)<\rho(x,M_i)$ for all $i=1,\dots,n$. Consider the set $\mathcal M_3=\operatorname{conv}\{M,\pi_{C^{T-t_0}(x_0)}M\}$. By Lemma 2, for any $x\in C^{T-t_0}(x_0)\setminus\mathcal M_3$ the point $y=\pi_{\mathcal M_3}x\in\mathcal M_3$ satisfies $\rho(y,M_i)<\rho(x,M_i)$ for all $i=1,\dots,n$.

Now let $x\in\mathcal M_3\setminus\pi_{C^{T-t_0}(x_0)}M$. If $y=\pi_Mx\in\mathcal M_1$, then $y$ is the desired point. If, however, $\pi_Mx\in\mathcal M_2$, then $\pi_Mx\notin\pi_{C^{T-t_0}(x_0)}M$, but there exists a point
$$x'\in\pi_{C^{T-t_0}(x_0)}M\cap w_\lambda,\qquad w_\lambda=\lambda x+(1-\lambda)\pi_Mx,\ 0\le\lambda\le1.$$
Applying Lemma 2 to the convex hull $B$ of the points $x',M_1,\dots,M_n$, we get $\rho(x',M_i)<\rho(x,M_i)$ for all $i=1,\dots,n$. So $x'$ is the desired point. □
Theorems 9-12 give a constructive method for building the Pareto optimal set in the game $\Gamma_v(x_0,T-t_0)$ for the various arrangements of the "objective" points relative to the reachability set. Consider the set of points corresponding to the Pareto optimal payoff vectors in the game $\Gamma_v(x_0,T-t_0)$:
$$P(x_0,T-t_0)=\{x\in C^{T-t_0}(x_0)\mid H(x)\in P_c(x_0,T-t_0)\}.$$
In all cases this set is obtained as the orthogonal projection of the convex hull of the "objective" points onto the reachability set of system (7.3.2)-(7.3.3).
7.7 Existence of dynamic stable C-kernel and NM-solution in cooperative game of approaching
In this section, the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$ means the c-kernel $C_v(x_0,T-t_0)$ and the NM-solution $L_v(x_0,T-t_0)$. Let $x(\cdot)$ be a trajectory of system (7.3.2)-(7.3.3) such that $x(T)\in\pi_{C^{T-t_0}(x_0)}M$. Let
$$R_i(x(t))=R\big(M_i,r_i(x(t))\big)$$
be the closed ball with center $M_i$ and radius
$$r_i(x(t))=\max_{x\in C^{T-t}(x(t))}\rho(x,M_i)\quad(i=1,\dots,n,\ t_0\le t\le T).$$
For each coalition $S\subset N$ we construct the set
$$R_S(x(t))=\bigcap_{i\in S}R_i(x(t)).$$
Theorem 13. The Pareto optimal set $P_c(x_0,T-t_0)$ in the game $\Gamma_v(x_0,T-t_0)$ is dynamic stable.

Proof: Any trajectory $x(\cdot)$ with $x(t_0)=x_0$ and $x(T)\in\pi_{C^{T-t_0}(x_0)}M$ is conditionally optimal. Because $x(T)\in\pi_{C^{T-t_0}(x_0)}M$, the point $x(T)$ is the point of $C^{T-t_0}(x_0)$ nearest to some point $z\in M$, i.e.
$$\rho(z,x(T))=\min_{x\in C^{T-t_0}(x_0)}\rho(z,x).$$
But $C^{T-t}(x(t))\subset C^{T-t_0}(x_0)$ and $x(T)\in C^{T-t}(x(t))$, $t_0\le t\le T$. Hence $x(T)$ remains the point of $C^{T-t}(x(t))$ nearest to $z$ for all $t_0\le t\le T$, i.e. $x(T)\in\pi_{C^{T-t}(x(t))}M$ for all $t_0\le t\le T$. By Theorem 12 applied to the current games,
$$P_c(x(t),T-t)=\{H(x)\mid x\in\pi_{C^{T-t}(x(t))}M\},\quad t_0\le t\le T.$$
Since $x(T)\in\pi_{C^{T-t}(x(t))}M$ for all $t_0\le t\le T$, we get $H(x(T))\in P_c(x(t),T-t)$ for all $t_0\le t\le T$, i.e.
$$H(x(T))\in\bigcap_{t_0\le t\le T}P_c(x(t),T-t).$$
The point $x(T)\in\pi_{C^{T-t_0}(x_0)}M$ was arbitrary, so the dynamic stability of the set $P_c(x_0,T-t_0)$ is proved. □
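The key step of this proof is that $x(T)$ remains the nearest point to $z$ as the reachability sets $C^{T-t}(x(t))$ shrink along the trajectory. This can be observed in a toy model. The sketch assumes simple motion $\dot x=u$, $\|u\|\le1$, in the plane, so that $C^{T-t}(x(t))$ is a disk of radius $T-t$ around $x(t)$. That model is our choice and is not system (7.3.2)-(7.3.3); the point $z$ and all names are hypothetical.

```python
import math


def project_onto_disk(p, center, radius):
    """Orthogonal projection of p onto the closed disk |q - center| <= radius."""
    d = math.dist(p, center)
    if d <= radius:
        return tuple(p)
    return tuple(c + radius * (pj - c) / d for pj, c in zip(p, center))


# Hypothetical simple motion dx/dt = u, |u| <= 1, in the plane: the reachability
# set C^{T-t}(x(t)) is the closed disk of radius T - t centred at x(t).
t0, T = 0.0, 1.0
x0 = (0.0, 0.0)
z = (3.0, 0.5)                                   # a point of M outside C^{T-t0}(x0)
xT = project_onto_disk(z, x0, T - t0)            # terminal point x(T) in pi_C M
velocity = tuple((a - b) / (T - t0) for a, b in zip(xT, x0))


def x_of(t):
    """Trajectory from x0 to x(T) at full speed (|velocity| = 1)."""
    return tuple(b + (t - t0) * v for b, v in zip(x0, velocity))


# x(T) stays the point of the shrinking reachability set nearest to z.
stays_nearest = all(
    all(math.isclose(a, b, abs_tol=1e-12)
        for a, b in zip(project_onto_disk(z, x_of(t), T - t), xT))
    for t in (0.0, 0.25, 0.5, 0.75, 0.99)
)
print(stays_nearest)  # True
```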
Remark. The dynamic stability of the Pareto optimal set can also easily be proved for general $n$-person games.

Theorem 14. Suppose the following condition holds for each $S\subset N$ ($S\neq N$):
$$\operatorname{int}R_S(x_0)\cap Y_S^{T-t_0}(x_0)=\emptyset, \tag{7.7.1}$$
where $\operatorname{int}R_S(x_0)$ is the interior of the set $R_S(x_0)$ and
$$Y_S^{T-t_0}(x_0)=\{y\in C^{T-t_0}(x_0)\mid H(y)\in V(S;x_0,T-t_0)\}$$
(see §3 of this chapter). Then the game $\Gamma_v(x_0,T-t_0)$ has a nonempty c-kernel $C_v(x_0,T-t_0)$, and it coincides with the Pareto optimal set $P_c(x_0,T-t_0)$.
Proof: Let condition (7.7.1) be satisfied. We must prove that the c-kernel $C_v(x_0,T-t_0)$ exists and coincides with $P_c(x_0,T-t_0)$. It suffices to show that no vector of $P_c(x_0,T-t_0)$ is dominated. Suppose the opposite: let $H(x)\in P_c(x_0,T-t_0)$ be dominated by a vector $H(y)\in E_v(x_0,T-t_0)$. Then there is a coalition $S\subset N$ such that $H(y)\in V(S;x_0,T-t_0)$ and
$$H_i(y)>H_i(x)\ \text{for all } i\in S. \tag{7.7.3}$$
For $S=N$ this is impossible, since $H(x)$ is Pareto optimal; so $S\neq N$. Since $H(y)\in V(S;x_0,T-t_0)$, we have $y\in Y_S^{T-t_0}(x_0)$. Condition (7.7.1) then implies $\rho(y,M_i)\ge r_i(x_0)$ for at least one $i\in S$. On the other hand, $\rho(x,M_i)\le r_i(x_0)$ for all $x\in\pi_{C^{T-t_0}(x_0)}M$ and $i=1,\dots,n$. So for no $x\in\pi_{C^{T-t_0}(x_0)}M$ does the strict inequality $\rho(y,M_i)<\rho(x,M_i)$ hold for all $i\in S$. Therefore (7.7.3) fails, i.e. $H(x)$ is a nondominated imputation. Consequently, the c-kernel $C_v(x_0,T-t_0)$ exists and $P_c(x_0,T-t_0)\subset C_v(x_0,T-t_0)$. Conversely, $C_v(x_0,T-t_0)\subset P_c(x_0,T-t_0)$. This completes the proof of the theorem. □
It follows from Theorem 14 that, under condition (7.7.1), each trajectory from the set
$$X=\{x(\cdot)\mid x(t_0)=x_0,\ x(T)\in\pi_{C^{T-t_0}(x_0)}M\} \tag{7.7.4}$$
is conditionally optimal (in the sense of the c-kernel) in the game $\Gamma_v(x_0,T-t_0)$.
Theorem 15. Suppose that along each trajectory $x(\cdot)\in X$ the following condition holds:
$$\operatorname{int}R_S(x(t))\cap Y_S^{T-t}(x(t))=\emptyset\ \text{for all } S\subset N\ (S\neq N),\quad t_0\le t\le T. \tag{7.7.5}$$
Then the game $\Gamma_v(x_0,T-t_0)$ has a dynamic stable c-kernel, with the structure
$$C_v(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M\}.$$

Proof: Each trajectory $x(\cdot)\in X$ is optimal. Let $C_v(x(t),T-t)$ and $P_c(x(t),T-t)$ denote the c-kernel and the Pareto optimal set of the current subgame $\Gamma_v(x(t),T-t)$. Under condition (7.7.5), Theorem 14 gives $C_v(x(t),T-t)=P_c(x(t),T-t)$ at each instant $t_0\le t\le T$ along any trajectory $x(\cdot)\in X$. Therefore the dynamic stability of the c-kernel $C_v(x_0,T-t_0)$ follows from that of $P_c(x_0,T-t_0)$ (Theorem 13). □

Theorem 16. Suppose the following condition holds:
$$Y_{N\setminus\{i\}}^{T-t_0}(x_0)\cap\pi_{C^{T-t_0}(x_0)}M=\emptyset,\quad i=1,\dots,n. \tag{7.7.6}$$
Then the NM-solution $L_v(x_0,T-t_0)$ exists in the game $\Gamma_v(x_0,T-t_0)$, and it coincides with the Pareto optimal set
$$P_c(x_0,T-t_0)=\{H(x)\mid x\in\pi_{C^{T-t_0}(x_0)}M\}.$$
Proof: From condition (7.7.6) it follows, in particular, that

$$Y_S^{T-t_0}(x_0) \cap \pi_{C^{T-t_0}(x_0)} M = \emptyset \quad \text{for all } S \neq N,$$

since any $S \neq N$ is contained in $N \setminus \{i\}$ for some $i$, and the superadditivity of $V$ in $S$ then implies $Y_S^{T-t_0}(x_0) \subset Y_{N\setminus\{i\}}^{T-t_0}(x_0)$. Let $H', H'' \in P_c(x_0, T-t_0)$. We will show that they cannot dominate one another. From the definition of the set $P_c(x_0, T-t_0)$ it follows that the imputations $H'$ and $H''$ do not dominate one another in the coalition $N$ (in this respect, recall that in games without side payments domination in the coalition $N$ is possible). If, however, $H' = H(y)$ dominates $H''$ in a coalition $S \neq N$, then by Definition 9 of domination $H(y) \in V(S; x_0, T-t_0)$, $y \in \pi_{C^{T-t_0}(x_0)} M$. But this is not possible, since $H(y) \in V(S; x_0, T-t_0)$ implies $y \in Y_S^{T-t_0}(x_0)$ and $H(y) \in P_c(x_0, T-t_0)$ implies $y \in \pi_{C^{T-t_0}(x_0)} M$, with the result that

$$Y_S^{T-t_0}(x_0) \cap \pi_{C^{T-t_0}(x_0)} M \neq \emptyset$$

for a particular $S \neq N$. The latter is inconsistent with the condition (7.7.6). The second property of the NM-solution is always satisfied for the set $P_c(x_0, T-t_0)$. This completes the proof of the theorem. ■
As follows from Theorem 16, under condition (7.7.6) each trajectory from the set (7.7.4) is a conditionally optimal trajectory (in the sense of the NM-solution).

Theorem 17 For a dynamic stable NM-solution to exist in the game $\Gamma_v(x_0, T-t_0)$, it is sufficient that along each trajectory $x(\cdot) \in X$ and for all $i = 1, \ldots, n$

$$Y_{N\setminus\{i\}}^{T-t}(x(t)) \cap \pi_{C^{T-t}(x(t))} M = \emptyset, \quad t_0 \le t \le T. \qquad (7.7.7)$$

In this case the dynamic stable NM-solution has the following structure:

$$L_v(x_0, T-t_0) = \{H(x) \mid x \in \pi_{C^{T-t_0}(x_0)} M\},$$

where the set $X$ is defined by (7.7.4).
Under conditions (7.7.5) and (7.7.7), Theorems 15 and 17 provide a constructive method for constructing a dynamic stable c-kernel and NM-solution.

Example. Suppose the game $\Gamma_v(x_0, T-t_0)$ is described as follows:

$$\dot x = u_1 + u_2 + u_3, \quad x(0) = 0,$$

where $x \in R^2$, $u_i = (u_i^1, u_i^2)$, $\|u_i\| \le 1$, $i = 1, 2, 3$, $T = 1$. The reachability set $C^{T-t_0}(x_0)$ in this game is the disk $(x^1)^2 + (x^2)^2 \le 9$ (the three controls add up to a velocity of norm at most 3). For each $i = 1, 2, 3$, $V(\{i\}; x_0, T-t_0) = \emptyset$; for each of the three coalitions $S = \{1, 2\}, \{1, 3\}, \{2, 3\}$,

$$V(S; x_0, T-t_0) = \{H(x) \mid (x^1)^2 + (x^2)^2 \le 1\}.$$
Fig. 41.

The set $\pi_{C^{T-t_0}(x_0)} M$ coincides with the arc $AB$ (see Fig. 41). Any trajectory $x(\cdot)$ running along the segment $Oy$, where $y \in AB$, is a conditionally optimal trajectory. For $0 \le \tau \le 1$ the set $C^{1-\tau}(x(\tau))$ is a circle with center at the point $x(\tau)$ and is tangent to the set $C^{T-t_0}(x_0)$ at the point $y$. The set $\pi_{C^{1-\tau}(x(\tau))} M$ coincides with the arc $A'B'$ that is the orthogonal projection of the triangle $M = \triangle M_1 M_2 M_3$ onto the circle $C^{1-\tau}(x(\tau))$, i.e. for each $y \in AB$ the vector

$$H(y) = (-\rho(y, M_1), -\rho(y, M_2), -\rho(y, M_3))$$

belongs to the set $P_c(x(\tau), 1-\tau)$, $0 \le \tau \le 1$. Since the condition of Theorem 17 is explicitly satisfied, there exists a dynamic stable NM-solution

$$L_v(x_0, T-t_0) = \{H(y) \mid y \in AB\} = P_c(x_0, T-t_0).$$
The existence of the c-kernel, as is shown by condition (7.7.1) of Theorem 14, depends on the arrangement of the "objective" points $M_1, M_2, M_3$. Let $M_1 = (2, 8)$, $M_2 = (3, 6)$, $M_3 = (6, 6)$. Then

$$\pi_{C^{T-t_0}(x_0)} M = \partial C^{T-t_0}(x_0) \cap K,$$

where $\partial C^{T-t_0}(x_0)$ is the boundary of the region $C^{T-t_0}(x_0)$, and $K = \{x \mid \tfrac14 x^2 \le x^1 \le x^2\}$. It is easily verified that for all these points the condition (7.7.5) of Theorem 15 is satisfied, and a dynamic stable c-kernel is of the form:

$$C_v(x_0, T-t_0) = \{H(y) \mid y \in AB\} = P_c(x_0, T-t_0).$$

Moreover, it can be easily shown that in the given game the c-kernel and the NM-solution are not strongly dynamically stable.
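The geometry of the example can be checked numerically. The sketch below is a minimal illustration, assuming (as in the example) that the reachability set is the disk of radius 3 about the origin; the helper names `radial_projection` and `in_cone_K` are hypothetical. Because the triangle $M_1 M_2 M_3$ lies outside this disk, the nearest point of the disk to any of its points is obtained by radial scaling, so the arc $AB$ is bounded by the projections of the vertices with the smallest and largest polar angles.

```python
import math

# Objective points of the example and the radius of the reachability disk
# (T = 1 and three controls of norm at most 1, so the disk has radius 3).
M = [(2.0, 8.0), (3.0, 6.0), (6.0, 6.0)]
RADIUS = 3.0


def radial_projection(p, radius=RADIUS):
    """Nearest point of the disk |x| <= radius to a point p lying outside it."""
    norm = math.hypot(p[0], p[1])
    return (radius * p[0] / norm, radius * p[1] / norm)


def in_cone_K(x, tol=1e-9):
    """Membership in K = {x : x^2 / 4 <= x^1 <= x^2} (x^1, x^2 are coordinates)."""
    return x[1] / 4.0 - tol <= x[0] <= x[1] + tol


# Polar angles of the vertices; the projected arc runs between the extreme ones.
angles = [math.atan2(p[1], p[0]) for p in M]
A = radial_projection(M[angles.index(min(angles))])  # direction of M3 = (6, 6)
B = radial_projection(M[angles.index(max(angles))])  # direction of M1 = (2, 8)

print("A =", A, "B =", B)
print("A, B in K:", in_cone_K(A), in_cone_K(B))
```

The endpoint $A$ lies on the ray $x^1 = x^2$ and $B$ on the ray $x^2 = 4x^1$, which are exactly the two bounding rays of the cone $K$.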
Chapter 8 Cooperative differential games with side payments

8.1 Definition of cooperative differential game in the characteristic function form
Evidently, the Nash equilibrium principle discussed in the previous chapter fails to reflect all the facets of optimal behavior in n-person games. This optimality principle in noncooperative games takes no account of coalition formation. At the same time, this factor is a special feature of the majority of n-person decision problems and deserves comprehensive investigation. This chapter considers a differential game $\Gamma(x_0, T-t_0)$ defined in Section 3 of Chapter 7 by (7.3.2), assuming that any coalitions from the set $N$ can be formed therein. By a coalition is meant a subset $S \subset N$ whose players are in at least one of the following relations:

• between all members of coalition $S \subset N$ there is a full information exchange;
• members of coalition $S \subset N$ are capable of correlating their strategies, i.e. united in coalition $S$, the players act cooperatively as one player with the set of strategies $\mathcal{D}_S = \prod_{i \in S} \mathcal{D}_i$, where $\mathcal{D}_i$ is the set of admissible strategies of player $i$;
• members of coalition $S$ have common interests, i.e. the payoff function of coalition $S$ to be maximized by joint efforts of its members can be expressed as $J_S = \sum_{i \in S} J_i$, where $J_i$ is the payoff function of player $i \in S$.

These three types of relations between the members of one coalition we call levels of cooperation. In what follows, the levels of cooperation in all coalitions $S \subset N$ in the game $\Gamma(x_0, T-t_0)$ are assumed to be the same. The first level of cooperation can exist independently of the two subsequent levels; without information exchange the strategies cannot be correlated; without strategy correlation, and hence without information exchange, maximization of the total payoff is senseless. In connection with the community of interests (third level of cooperation), the possibility to transfer the payoff arises. When different players can compare their payoffs, have an opportunity to sum them up and divide them, and transfer part of the payoff to the other players (to effect side payments), such payoffs are called transferable. Otherwise we are dealing with nontransferable payoffs. In contrast to games with transferable payoffs, in games with nontransferable payoffs only the second level of cooperation (cooperative selection of strategies) can be reached. Since nontransferable payoffs cannot be divided into parts and transferred to other players, the objective of the selection of a joint strategy $u_S(\cdot) \in \mathcal{D}_S$ by the coalition $S$ is a possible maximization of each component of the vector $J_S = (J_{i_1}, \ldots, J_{i_s})$, $i_k \in S$.

Definition 1. The game $\Gamma(x_0, T-t_0)$ is called a cooperative differential game with side payments if:
1. its rules provide for the formation of any coalition;
2. the players' payoffs are transferable and the players are at the third level of cooperation (choosing joint strategies and effecting side payments);
3. the players' objective is to ensure a maximal total payoff and its equitable distribution under the agreement.

Under the most favorable conditions (specified in what follows), it is advantageous for the players to form one maximal coalition $N$, since in this case it is possible to ensure the greatest total payoff. As a result of such formation, the distribution of the total payoff among the players turns out to be the main problem in cooperative games. For this reason the principles of optimal behavior of the players take into account their advancing demands when the total payoff is shared. In this sense, cooperative games are sharing games rather than strategic games. Such an approach to cooperative games is based on the notion of characteristic function.

Definition 2. The function $v: 2^N \to R^1$, where $2^N$ is the set of all subsets of the set $N$ and $R^1$ is the space of real numbers, such that
1. $v(\emptyset) = 0$, where $\emptyset$ is the empty set,
2. $v(S \cup R) \ge v(S) + v(R)$ for any $S, R \subset N$ such that $S \cap R = \emptyset$,

is called a characteristic function. $v(S)$ is interpreted as a guaranteed payoff of the coalition $S$, i.e. the payoff which can be obtained by the coalition $S$ from the game independently of the other players' actions. The second property of the characteristic function is superadditivity. This means that any two disjoint coalitions united in one coalition receive an amount not less than what they could have received without such union, i.e. without coordination of their actions.

We shall construct a characteristic function for cooperative differential $n$-person games with side payments. For each coalition $S \subset N$ we consider a zero-sum differential game $\Gamma_S(x_0, T-t_0)$ between the coalition $S$ and $N \setminus S = \{i \in N \mid i \notin S\}$ that is described by the equations

$$\dot x = f(x, u_S, u_{N\setminus S}), \quad x(t_0) = x_0, \qquad (8.1.1)$$

where $u_S \in \prod_{i \in S} U_i$, $u_{N\setminus S} \in \prod_{i \in N\setminus S} U_i$, and the vectors $x$, $f$ and the sets $U_i$, $i = 1, \ldots, n$, are the same as in the game (7.3.2). The payoff of Player $S$ (maximizer) in the game $\Gamma_S(x_0, T-t_0)$ in each situation $(u_S(\cdot), u_{N\setminus S}(\cdot))$, $u_S(\cdot) \in \mathcal{D}_S$, $u_{N\setminus S}(\cdot) \in \mathcal{D}_{N\setminus S}$, is defined as

$$J_S(x_0, u_S(\cdot), u_{N\setminus S}(\cdot)) = \sum_{i \in S} J_i(x_0, u_S(\cdot), u_{N\setminus S}(\cdot)),$$

where the sets $\mathcal{D}_S$, $\mathcal{D}_{N\setminus S}$ and the functions $J_i$ ($i = 1, \ldots, n$) are defined just as in the game (7.3.2). The payoff of Player $N \setminus S$ is assumed to be $-J_S(x_0, u_S(\cdot), u_{N\setminus S}(\cdot))$.
By existence theorems concerning zero-sum games, in the game $\Gamma_S(x_0, T-t_0)$ for any $\varepsilon > 0$ there exist $\varepsilon$-optimal piecewise open-loop strategies $u_S^\varepsilon(\cdot)$, $u_{N\setminus S}^\varepsilon(\cdot)$ and the value

$$\operatorname{val}\Gamma_S(x_0, T-t_0) = \lim_{\varepsilon \to 0} J_S(x_0, u_S^\varepsilon(\cdot), u_{N\setminus S}^\varepsilon(\cdot)).$$

For a fixed initial state $x_0 \in R^m$ the function $v(S; x_0, T-t_0)$, $S \in 2^N$, is defined as follows:

$$v(S; x_0, T-t_0) = \begin{cases} 0, & S = \emptyset, \\ \operatorname{val}\Gamma_S(x_0, T-t_0), & S \subset N\ (S \neq \emptyset,\ S \neq N), \\ \max_{u_N(\cdot) \in \mathcal{D}_N} J_N(x_0, u_N(\cdot)), & S = N. \end{cases} \qquad (8.1.2)$$
In what follows, for simplicity, we assume that the maximum in (8.1.2) is achieved. Otherwise all constructions become complicated, but they can be carried out by employing the sup operation instead of max and $\varepsilon$-optimal strategies instead of optimal strategies.

Lemma 1 The function $v$ defined by (8.1.2) is superadditive in $S$, i.e. for a fixed $x_0 \in R^m$ and any $S, R \subset N$, $S \cap R = \emptyset$, the following inequality holds:

$$v(S \cup R; x_0, T-t_0) \ge v(S; x_0, T-t_0) + v(R; x_0, T-t_0).$$
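Proof: Let $K = S \cup R$, so that $u_K(\cdot) = (u_S(\cdot), u_R(\cdot))$ and $J_K = J_S + J_R$. Using the property of the inf operation, for all $u_K(\cdot) \in \mathcal{D}_K$

$$\inf_{u_{N\setminus K}} J_K(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) \ge \inf_{u_{N\setminus K}} J_S(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) + \inf_{u_{N\setminus K}} J_R(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)).$$

Evidently, with an increase in the left-hand member, this inequality is preserved:

$$\sup_{u_K} \inf_{u_{N\setminus K}} J_K(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) \ge \inf_{u_{N\setminus K}} J_S(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) + \inf_{u_{N\setminus K}} J_R(x_0, u_K(\cdot), u_{N\setminus K}(\cdot))$$

for all $u_K(\cdot) \in \mathcal{D}_K$. The wider the set over which the infimum is taken, the smaller the infimum itself. Since $\mathcal{D}_{N\setminus K} \subset \mathcal{D}_{N\setminus S}$ and $\mathcal{D}_{N\setminus K} \subset \mathcal{D}_{N\setminus R}$, we have

$$\inf_{u_{N\setminus K}} J_S(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) + \inf_{u_{N\setminus K}} J_R(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) \ge \inf_{u_{N\setminus S}} J_S(x_0, u_S(\cdot), u_{N\setminus S}(\cdot)) + \inf_{u_{N\setminus R}} J_R(x_0, u_R(\cdot), u_{N\setminus R}(\cdot)).$$

We get

$$v(S \cup R; x_0, T-t_0) = \sup_{u_K} \inf_{u_{N\setminus K}} J_K(x_0, u_K(\cdot), u_{N\setminus K}(\cdot)) \ge \inf_{u_{N\setminus S}} J_S(x_0, u_S(\cdot), u_{N\setminus S}(\cdot)) + \inf_{u_{N\setminus R}} J_R(x_0, u_R(\cdot), u_{N\setminus R}(\cdot))$$

for all $u_S(\cdot) \in \mathcal{D}_S$ and $u_R(\cdot) \in \mathcal{D}_R$. Consequently, this inequality is preserved if in its right-hand member we take, instead of the first addend,

$$v(S; x_0, T-t_0) = \sup_{u_S} \inf_{u_{N\setminus S}} J_S(x_0, u_S(\cdot), u_{N\setminus S}(\cdot)),$$

and instead of the second addend

$$v(R; x_0, T-t_0) = \sup_{u_R} \inf_{u_{N\setminus R}} J_R(x_0, u_R(\cdot), u_{N\setminus R}(\cdot)).$$

Hence we have the inequality $v(S \cup R; x_0, T-t_0) \ge v(S; x_0, T-t_0) + v(R; x_0, T-t_0)$. This completes the proof of the lemma. ■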
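Lemma 1 can be illustrated on a finite analogue of (8.1.2). The sketch below is a toy under stated assumptions, not the differential game itself: each of three players chooses one of three pure strategies, the payoffs $J_i$ are random nonnegative numbers, and $v(S)$ is the max-min of $J_S = \sum_{i \in S} J_i$ over pure strategies (with $v(\emptyset) = 0$ and $v(N) = \max J_N$). The same chain of inequalities as in the proof (the infimum of a sum is at least the sum of infima, and widening the opponents' strategy set can only lower the infimum) guarantees that the superadditivity check never fails. All names are hypothetical.

```python
import itertools
import random

PLAYERS = (0, 1, 2)       # a hypothetical three-player game
STRATEGIES = (0, 1, 2)    # each player has three pure strategies


def make_payoffs(seed):
    """Random nonnegative payoffs J_i for every pure-strategy profile."""
    rng = random.Random(seed)
    return {profile: [rng.uniform(0.0, 10.0) for _ in PLAYERS]
            for profile in itertools.product(STRATEGIES, repeat=len(PLAYERS))}


def coalition_payoff(J, S, u_S, others, u_others):
    """J_S = sum of J_i over i in S for the profile assembled from both parts."""
    profile = [None] * len(PLAYERS)
    for i, a in zip(S, u_S):
        profile[i] = a
    for i, a in zip(others, u_others):
        profile[i] = a
    return sum(J[tuple(profile)][i] for i in S)


def characteristic_function(J):
    """v(S) = max over u_S of the min over the strategies of N minus S of J_S."""
    v = {}
    for r in range(len(PLAYERS) + 1):
        for S in itertools.combinations(PLAYERS, r):
            if not S:
                v[S] = 0.0
                continue
            others = tuple(i for i in PLAYERS if i not in S)
            v[S] = max(
                min(coalition_payoff(J, S, u_S, others, u_o)
                    for u_o in itertools.product(STRATEGIES, repeat=len(others)))
                for u_S in itertools.product(STRATEGIES, repeat=len(S)))
    return v


def is_superadditive(v, tol=1e-12):
    """Check v(S u R) >= v(S) + v(R) for all disjoint S, R."""
    for S in v:
        for R in v:
            if set(S) & set(R):
                continue
            K = tuple(sorted(set(S) | set(R)))
            if v[K] < v[S] + v[R] - tol:
                return False
    return True


def is_monotone(v, tol=1e-12):
    """Check v(S) <= v(S') whenever S is a subset of S'."""
    return all(v[T] >= v[S] - tol for S in v for T in v if set(S) <= set(T))


v = characteristic_function(make_payoffs(seed=1))
print(is_superadditive(v), is_monotone(v))
```

Because the random payoffs are nonnegative, the computed $v$ is also monotone, in line with the remark on monotonicity that follows Definition 3.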
It follows from the superadditivity condition that it is advantageous for the players to form the maximal coalition $N$ and obtain the maximal total payoff $v(N; x_0, T-t_0)$ that is possible in the game. By its meaning, the quantity $v(S; x_0, T-t_0)$ ($S \neq N$) is equal to the guaranteed payoff of the coalition $S$ obtained irrespective of the behavior of the other players, even though the latter form the coalition $N \setminus S$ against $S$.
Definition 3. The function $v: 2^N \times R^m \times R^1 \to R^1$ defined by (8.1.2) is called a characteristic function of the game $\Gamma(x_0, T-t_0)$.
Note that the positiveness of the payoff functions $J_i$, $i = 1, \ldots, n$, implies that of the characteristic function. From the superadditivity of $v$ it follows that $v(S'; x_0, T-t_0) \ge v(S; x_0, T-t_0)$ for any $S, S' \subset N$ such that $S \subset S'$: indeed, $v(S'; x_0, T-t_0) \ge v(S; x_0, T-t_0) + v(S' \setminus S; x_0, T-t_0)$, and the last term is nonnegative. Thus the superadditivity of the function $v$ in $S$, together with its nonnegativity, implies that this function is monotone in $S$.
Since the essence of a cooperative game is the possibility to form coalitions and the main problem therein is the distribution of the total payoff between the players, the subject of cooperative theory is the characteristic function rather than strategy. In fact, the characteristic function displays the possibilities of coalitions in the best way and can form the basis for an equitable distribution of the total payoff between the players. The pair $\langle N, v(S; x_0, T-t_0) \rangle$, where $N$ is the set of players and $v$ the characteristic function defined by (8.1.2), is called the cooperative differential game in the form of characteristic function $v$. For short, it will be denoted by $\Gamma_v(x_0, T-t_0)$.
Remark. The equality (8.1.2) is not the only way of defining the characteristic function; it is used when the formation of a counter-coalition is plausible in the formation of coalitions. Other methods may also be applied to estimate the coalition strength.

Various methods for "equitable" distribution of the total profit between players are treated as optimality principles in cooperative games. The set of such distributions satisfying an optimality principle is called a solution of the cooperative game (in the sense of this optimality principle). We will now define solutions of the game $\Gamma_v(x_0, T-t_0)$.
Denote by $\xi_i$ the share of player $i \in N$ in the total gain $v(N; x_0, T-t_0)$.

Definition 4. The vector $\xi = (\xi_1, \ldots, \xi_n)$, whose components satisfy the conditions:

1. $\xi_i \ge v(\{i\}; x_0, T-t_0)$, $i \in N$;
2. $\sum_{i \in N} \xi_i = v(N; x_0, T-t_0)$,

is called an imputation in the game $\Gamma_v(x_0, T-t_0)$. The equity of the distribution $\xi = (\xi_1, \ldots, \xi_n)$ representing an imputation is that each player receives at least his safe payoff and the entire maximal payoff is distributed without a remainder.
Theorem 1 Suppose the function $w: 2^N \times R^m \times R^1 \to R^1$ is additive in $S \in 2^N$, i.e. for any $S, R \in 2^N$, $S \cap R = \emptyset$, we have $w(S \cup R; x_0, T-t_0) = w(S; x_0, T-t_0) + w(R; x_0, T-t_0)$. Then in the game $\Gamma_w(x_0, T-t_0)$ there is a unique imputation $\xi = \{w(\{i\}; x_0, T-t_0),\ i = 1, \ldots, n\}$.

Proof: From the additivity of $w$ we immediately obtain $w(N; x_0, T-t_0) = w(\{1\}; x_0, T-t_0) + \ldots + w(\{n\}; x_0, T-t_0)$. Since every imputation satisfies $\xi_i \ge w(\{i\}; x_0, T-t_0)$ and $\sum_{i \in N} \xi_i = w(N; x_0, T-t_0)$, all these inequalities must hold as equalities, whence follows the statement of the theorem. ■
The game with an additive characteristic function is called inessential. In the essential game $\Gamma_v(x_0, T-t_0)$ there is an infinite set of imputations. Indeed, any vector of the form

$$(v(\{1\}; x_0, T-t_0) + \alpha_1, \ldots, v(\{n\}; x_0, T-t_0) + \alpha_n), \qquad (8.1.3)$$

where

$$\alpha_i \ge 0,\ i \in N, \qquad \sum_{i \in N} \alpha_i = v(N; x_0, T-t_0) - \sum_{i \in N} v(\{i\}; x_0, T-t_0) > 0,$$

is an imputation. The imputation set in the game $\Gamma_v(x_0, T-t_0)$ is denoted by $E_v(x_0, T-t_0)$.
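Definition 4 and the construction (8.1.3) translate directly into a membership test and a generator of imputations for a game with finitely many players. The sketch below assumes the characteristic function is given as a dictionary keyed by `frozenset` coalitions of players $0, \ldots, n-1$ (a hypothetical representation chosen for illustration); `Fraction` keeps the efficiency check exact.

```python
from fractions import Fraction


def singleton(v, i):
    return v[frozenset({i})]


def grand(v, n):
    return v[frozenset(range(n))]


def is_imputation(xi, v):
    """Definition 4: xi_i >= v({i}) for every i and sum(xi) == v(N)."""
    n = len(xi)
    rational = all(xi[i] >= singleton(v, i) for i in range(n))
    efficient = sum(xi) == grand(v, n)
    return rational and efficient


def surplus(v, n):
    """v(N) - sum_i v({i}); positive exactly when the game is essential."""
    return grand(v, n) - sum(singleton(v, i) for i in range(n))


def imputation_from_alpha(v, alpha):
    """Build the vector (8.1.3) from nonnegative alpha_i summing to the surplus."""
    n = len(alpha)
    if any(a < 0 for a in alpha) or sum(alpha) != surplus(v, n):
        raise ValueError("alpha must be nonnegative and sum to v(N) - sum v({i})")
    return [singleton(v, i) + alpha[i] for i in range(n)]


# Hypothetical three-player data: v({i}) = 1, 2, 0 and v(N) = 6.
v = {frozenset({0}): 1, frozenset({1}): 2, frozenset({2}): 0,
     frozenset({0, 1, 2}): 6}
xi = imputation_from_alpha(v, [Fraction(3, 2), Fraction(3, 2), 0])
print(xi, is_imputation(xi, v))
```

For an inessential game the surplus is zero, so the only admissible choice is $\alpha = 0$ and the generator returns the unique imputation of Theorem 1.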
Definition 5. We say that the imputation $\xi$ dominates the imputation $\eta$ in the coalition $S$ ($\xi \succ_S \eta$) if

1. $\xi_i > \eta_i$, $i \in S$;
2. $\sum_{i \in S} \xi_i \le v(S; x_0, T-t_0)$.

The imputation $\xi$ is said to dominate the imputation $\eta$ ($\xi \succ \eta$) if there is a coalition $S \subset N$ such that $\xi \succ_S \eta$. It follows from the definition of the imputation that domination in single-element coalitions and in the coalition $N$ is not possible: for $S = \{i\}$ it would require $v(\{i\}; x_0, T-t_0) \ge \xi_i > \eta_i \ge v(\{i\}; x_0, T-t_0)$, and for $S = N$ it would require $v(N; x_0, T-t_0) \ge \sum_{i \in N} \xi_i > \sum_{i \in N} \eta_i = v(N; x_0, T-t_0)$.

Definition 6. The set of nondominated imputations is called the c-kernel of the game $\Gamma_v(x_0, T-t_0)$ and is denoted by $C_v(x_0, T-t_0)$.
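For a finite coalition table, Definition 5 becomes a direct check over coalitions. The sketch below is illustrative only: the dictionary-based characteristic function and the helper names are assumptions, not part of the text. Testing whether a given imputation belongs to the c-kernel would require comparing it with all (infinitely many) imputations, so the code only decides domination between two given vectors.

```python
import itertools


def dominates_in(xi, eta, S, v):
    """xi dominates eta in coalition S (Definition 5)."""
    S = frozenset(S)
    if not S:
        return False
    better = all(xi[i] > eta[i] for i in S)
    feasible = sum(xi[i] for i in S) <= v[S]
    return better and feasible


def dominates(xi, eta, v):
    """xi dominates eta if it does so in at least one coalition S."""
    n = len(xi)
    return any(dominates_in(xi, eta, S, v)
               for r in range(1, n + 1)
               for S in itertools.combinations(range(n), r))


# Hypothetical symmetric three-player game: v({i}) = 0, v(pair) = 4, v(N) = 6.
v = {frozenset(S): (0, 0, 4, 6)[len(S)]
     for r in range(4) for S in itertools.combinations(range(3), r)}
print(dominates((3, 1.5, 1.5), (4, 1, 1), v))
```

In this game the imputation $(2, 2, 2)$ gives every pair exactly $v(S) = 4$, so no pair can offer all of its members strictly more; together with the impossibility of domination in single-player coalitions and in $N$, it is nondominated.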
The equity of an imputation belonging to the c-kernel is that none of the coalitions can offer a reasonable alternative against this imputation.

Definition 7. The set $L_v(x_0, T-t_0) \subset E_v(x_0, T-t_0)$ is called the Neumann-Morgenstern solution (the NM-solution) of the game $\Gamma_v(x_0, T-t_0)$ if:

1. $\xi, \eta \in L_v(x_0, T-t_0)$ implies $\xi \not\succ \eta$ ($\xi$ does not dominate $\eta$);
2. for $\eta \notin L_v(x_0, T-t_0)$ there is $\xi \in L_v(x_0, T-t_0)$ such that $\xi \succ \eta$.
As is seen from Definition 7, the conditions placed on the imputations from the NM-solution are weaker than those on the imputations from the c-kernel and, as a result, the NM-solution (whenever it exists) always contains the c-kernel. Unlike the c-kernel and the NM-solution, the Shapley value, representing an optimal distribution principle of the total gain $v(N; x_0, T-t_0)$, is defined without using the concept of domination.
Definition 8. The vector $\Phi^v(x_0, T-t_0) = (\Phi^v_1(x_0, T-t_0), \ldots, \Phi^v_n(x_0, T-t_0))$ is called the Shapley value if it satisfies the following conditions:

1. if $v, w$ are two characteristic functions, then $\Phi^v(x_0, T-t_0) + \Phi^w(x_0, T-t_0) = \Phi^{v+w}(x_0, T-t_0)$;
2. $\pi\Phi^v(x_0, T-t_0) = \Phi^{\pi v}(x_0, T-t_0)$, where $\pi$ is any permutation of players, $\pi\Phi^v(x_0, T-t_0)$ is the vector whose $\pi(i)$-th component is $\Phi^v_i(x_0, T-t_0)$, $i = 1, \ldots, n$, and $\pi v$ is the characteristic function such that for any coalition $S = \{i_1, \ldots, i_s\}$, $\pi v(\{\pi(i_1), \ldots, \pi(i_s)\}; x_0, T-t_0) = v(S; x_0, T-t_0)$;
3. $\sum_{i \in N} \Phi^v_i(x_0, T-t_0) = v(N; x_0, T-t_0)$;
4. if $v(S; x_0, T-t_0) - v(S \setminus \{i\}; x_0, T-t_0) = 0$ for all $S \subset N$ ($S \ni i$), then $\Phi^v_i(x_0, T-t_0) = 0$.

It is well known that there exists a unique vector $\Phi^v(x_0, T-t_0)$ satisfying these four conditions, and its components are computed by the formulas

$$\Phi^v_i(x_0, T-t_0) = \sum_{\substack{S \subset N \\ S \ni i}} \frac{(n-s)!\,(s-1)!}{n!}\,\big[v(S; x_0, T-t_0) - v(S \setminus \{i\}; x_0, T-t_0)\big], \qquad (8.1.4)$$

$i = 1, \ldots, n$, where $s$ is the number of players in $S$. The components of the Shapley value have the meaning of the players' expected shares in the total gain. Also, it may be shown that the Shapley value is an imputation.
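Formula (8.1.4) is easy to evaluate for small $n$ by enumerating the coalitions that contain player $i$. The sketch below is a minimal illustration, assuming players are numbered $0, \ldots, n-1$ and the characteristic function is a dictionary keyed by `frozenset` coalitions with $v(\emptyset) = 0$ (a hypothetical representation); exact arithmetic is done with `Fraction`.

```python
import itertools
from fractions import Fraction
from math import factorial


def shapley_value(n, v):
    """Components of the Shapley value by formula (8.1.4)."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = Fraction(0)
        for r in range(n):  # r = s - 1 other players accompany i in S
            for rest in itertools.combinations(others, r):
                S = frozenset(rest) | {i}
                s = len(S)
                weight = Fraction(factorial(n - s) * factorial(s - 1), factorial(n))
                total += weight * (v[S] - v[S - {i}])
        phi.append(total)
    return phi


def table(n, by_size):
    """Hypothetical symmetric game: v(S) depends only on the size of S."""
    return {frozenset(S): by_size[len(S)]
            for r in range(n + 1) for S in itertools.combinations(range(n), r)}


two = {frozenset(): 0, frozenset({0}): 1, frozenset({1}): 2, frozenset({0, 1}): 5}
print(shapley_value(2, two))
print(shapley_value(3, table(3, (0, 0, 1, 3))))
```

The weight $(n-s)!\,(s-1)!/n!$ is the probability that, in a uniformly random ordering of the players, player $i$ arrives right after exactly the other members of $S$; this is the source of the interpretation of $\Phi^v_i$ as an expected share.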
8.2
Principle of dynamic stability (time-consistency)
Formalization of the notion of optimal behavior constitutes one of the fundamental problems in the theory of n-person games. At present, quite different optimality principles have been constructed for the various classes of games. Some of them are stated in the previous section. Recall that the players' behavior (strategies in noncooperative games or imputations in cooperative games) satisfying some given optimality principle is called a solution of the game in the sense of this principle, and it must possess two properties.
On the one hand, it must adequately reflect the conceptual notion of optimality, providing for the special features of the class of games for which it is defined. On the other hand, it must be feasible under the conditions of the game where it is applied. This property reduces to the existence of a solution of the game generated by a specified principle of optimality. In dynamic games, one more requirement is naturally added to the above-mentioned ones, viz. the purposefulness and feasibility of an optimality principle are to be preserved throughout the game. This requirement is called the dynamic stability of a solution of the game (time consistency). The dynamic stability of a solution of the differential game is the property that, when the game proceeds along an "optimal" trajectory, at each instant of time the players are to be guided by the same optimality principle, and hence do not have any ground for deviation from the previously adopted "optimal" behavior throughout the game.
When the dynamic stability is violated, at some instant of time there are conditions under which the continuation of the initial behavior becomes non-optimal, and hence the initially chosen principle of optimality proves to be unfeasible. In what follows the concept of dynamic stability is specified for each class of games. Some principles of optimality in noncooperative differential games (and, in particular, in zero-sum two-person games) are dynamic stable. We will show, for example, that Nash equilibria are always dynamic stable. Let $u^\varepsilon(\cdot) = (u^\varepsilon_1(\cdot), \ldots, u^\varepsilon_n(\cdot))$ be a Nash $\varepsilon$-equilibrium in the noncooperative game $\Gamma(x_0, T-t_0)$ (see Definition 4 in Section 2, Chapter 1). The trajectory of system (7.3.2)-(7.3.3) corresponding to the equilibrium $u^\varepsilon(\cdot)$ is denoted by $x^\varepsilon(\cdot)$. Together with the game $\Gamma(x_0, T-t_0)$, we consider the current games $\Gamma(x^\varepsilon(t), T-t)$, $t_0 \le t \le T$, determined along the trajectory $x^\varepsilon(\cdot)$. The game $\Gamma(x^\varepsilon(t), T-t)$ is described by the equation

$$\dot x = f(x, u_1, \ldots, u_n)$$

with the initial state $x^\varepsilon(t)$, is of duration $T - t$, and the payoff functions of the players are of the form

$$J_i(x^\varepsilon(t), u(\cdot)) = \int_t^T h_i(x(\tau))\,d\tau + H_i(x(T)), \quad i = 1, \ldots, n.$$
Denote $\Delta_i[t_0, T] = \{t_0 = t_0^i < t_1^i < \ldots < t_{l_i}^i = T\}$ and $a_i[t_0, T] = (a_i^0, \ldots, a_i^{l_i - 1})$, $i = 1, \ldots, n$. Let the image of the map $a_i^k(t_k^i, x(t_k^i))$ be the control $u_i(t)$, $t \in [t_k^i, t_{k+1}^i)$, with $u_i(t) \in U_i$. We construct a truncated map $a_i^k(t, x(t))$ of the map $a_i^k(t_k^i, x(t_k^i))$ by setting, for any $t \in [t_k^i, t_{k+1}^i)$, the image of the map $a_i^k(t, x(t))$ equal to $u_i(\tau)$, $\tau \in [t, t_{k+1}^i)$, or, in other words,

$$a_i^k(t, x(t))(\tau) = a_i^k(t_k^i, x(t_k^i))(\tau), \quad \tau \in [t, t_{k+1}^i).$$

Now let $t \in [t_0, T]$, and let $u_i(\cdot)[t, T]$ be a strategy truncation on the time interval $[t, T]$, i.e.

$$u_i(\cdot)[t, T] = (\Delta_i[t, T], a_i[t, T]),$$

where $\Delta_i[t, T] = \{t, t_{k+1}^i, \ldots, t_{l_i}^i = T\}$, $a_i[t, T] = (a_i^k(t, x(t)), a_i^{k+1}, \ldots, a_i^{l_i - 1})$, $t \in [t_k^i, t_{k+1}^i)$. Then, for any equilibrium $u(\cdot)$ and its corresponding trajectory,

$$J_i(x(t), u(\cdot)[t, T]) = \int_t^T h_i(x(\tau))\,d\tau + H_i(x(T)),$$

where $x(\tau) = x(\tau, x_0, u(\cdot))$.
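The truncation of a strategy to $[t, T]$ amounts to cutting the partition $\Delta_i$ at $t$ and keeping the maps that are still active. The sketch below simplifies the piecewise open-loop strategy by replacing each map $a_i^k$ with a fixed control value on $[t_k^i, t_{k+1}^i)$; this is an assumption made for illustration, since in the text the control may depend on the state $x(t_k^i)$. The function names are hypothetical.

```python
import bisect


def truncate_strategy(partition, controls, t):
    """Restrict a piecewise control to [t, T].

    partition = [t_0, t_1, ..., t_l = T]; controls[k] acts on [t_k, t_{k+1}).
    Returns the truncated partition {t, t_{k+1}, ..., T} and the active controls.
    """
    T = partition[-1]
    if not partition[0] <= t <= T:
        raise ValueError("t must lie in [t_0, T]")
    if t == T:
        return [T], []
    k = bisect.bisect_right(partition, t) - 1  # index with t in [t_k, t_{k+1})
    return [t] + partition[k + 1:], controls[k:]


def control_at(partition, controls, tau):
    """Value of the piecewise control at time tau (the last piece covers T)."""
    k = bisect.bisect_right(partition, tau) - 1
    return controls[min(k, len(controls) - 1)]


P, C = [0.0, 0.5, 1.0], ["a", "b"]
P_t, C_t = truncate_strategy(P, C, 0.3)
print(P_t, C_t)
```

On $[t, T]$ the truncated strategy prescribes the same controls as the original one, which is what makes the remaining payoff $J_i(x(t), u(\cdot)[t, T])$ well defined.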
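The equilibrium $u^\varepsilon(\cdot)$ is called dynamic stable in the noncooperative game $\Gamma(x_0, T-t_0)$ if for any $t \in [t_0, T]$, $u_i(\cdot)[t, T] \in \mathcal{D}_i[t, T]$ and $i = 1, \ldots, n$

$$J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T]) \ge J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T] \,\|\, u_i(\cdot)[t, T]) - \varepsilon,$$

where

$$(u^\varepsilon(\cdot)[t, T] \,\|\, u_i(\cdot)[t, T]) = (u^\varepsilon_1(\cdot)[t, T], \ldots, u^\varepsilon_{i-1}(\cdot)[t, T], u_i(\cdot)[t, T], u^\varepsilon_{i+1}(\cdot)[t, T], \ldots, u^\varepsilon_n(\cdot)[t, T]),$$

i.e. from the fact that the situation $u^\varepsilon(\cdot)$ is an $\varepsilon$-equilibrium in the game $\Gamma(x_0, T-t_0)$ it follows that the truncated situation $u^\varepsilon(\cdot)[t, T]$ is an $\varepsilon$-equilibrium in the current game $\Gamma(x^\varepsilon(t), T-t)$ ($t_0 \le t \le T$).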
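To prove the dynamic stability of the $\varepsilon$-equilibrium situation $u^\varepsilon$ in the noncooperative game $\Gamma(x_0, T-t_0)$, we construct the strategy

$$\tilde u_i(\cdot) = \begin{cases} u^\varepsilon_i(\cdot)[t_0, t] & \text{on } [t_0, t], \\ u_i(\cdot)[t, T] & \text{on } [t, T], \end{cases}$$

where $u_i(\cdot)[t, T] \in \mathcal{D}_i[t, T]$, and $\mathcal{D}_i[t, T]$ is the restriction of $\mathcal{D}_i$ on $[t, T]$. From the definition of the $\varepsilon$-equilibrium situation $u^\varepsilon(\cdot)$, for all $u_i(\cdot)[t, T] \in \mathcal{D}_i[t, T]$ we get

$$J_i(x_0, u^\varepsilon(\cdot)) \ge J_i(x_0, u^\varepsilon(\cdot) \,\|\, \tilde u_i(\cdot)) - \varepsilon, \quad i = 1, \ldots, n.$$

Since on $[t_0, t]$ the situations $u^\varepsilon(\cdot)$ and $(u^\varepsilon(\cdot) \,\|\, \tilde u_i(\cdot))$ coincide, both generate the same trajectory $x^\varepsilon(\tau)$ there, and hence

$$\int_{t_0}^t h_i(x^\varepsilon(\tau))\,d\tau + J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T]) \ge \int_{t_0}^t h_i(x^\varepsilon(\tau))\,d\tau + J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T] \,\|\, u_i(\cdot)[t, T]) - \varepsilon, \quad i = 1, \ldots, n.$$

Hence for all $u_i(\cdot)[t, T] \in \mathcal{D}_i[t, T]$

$$J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T]) \ge J_i(x^\varepsilon(t), u^\varepsilon(\cdot)[t, T] \,\|\, u_i(\cdot)[t, T]) - \varepsilon, \quad i = 1, \ldots, n.$$

This shows that the situation $u^\varepsilon(\cdot)[t, T]$ is an $\varepsilon$-equilibrium in the sense of Nash in the current game $\Gamma(x^\varepsilon(t), T-t)$. Consequently, the Nash $\varepsilon$-equilibrium in the game $\Gamma(x_0, T-t_0)$ is dynamic stable.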
The dynamic stability of the Nash equilibrium in the game $\Gamma(x_0, T-t_0)$ may be shown in much the same way. Thus, the employment of an equilibrium in noncooperative (and, in particular, zero-sum) games does not involve any dynamic stability problem. The dynamic stability of the equilibrium is ensured by the very fact of its existence. This validates the consideration of noncooperative games in normal form, which reduces the game to a unique choice of strategies at the initial instant of the game.

The situation is different in cooperative differential games. Assume that at the start of the game the players adopt an optimality principle and construct a solution based on it (an imputation set satisfying the chosen principle of optimality, say the c-kernel, NM-solution, etc.). From the definition of a cooperative game it follows that the evolution of the game is to be along the trajectory providing a maximal total payoff for the players. When moving, the players arrive at subgames featuring current initial states and current durations. In due course, not only the conditions of the game and the players' opportunities, but even the players' interests may change. Therefore, at an instant $t$ the initially optimal solution of the current game may not exist or may no longer satisfy the players. Then, at the instant $t$, the players will have no ground to keep to the initially chosen trajectory. The latter exactly means the dynamic instability of the chosen optimality principle and, as a result, the dynamic instability of the motion itself.

We now focus our attention on dynamic stable solutions in cooperative differential games with side payments. Let an optimality principle be chosen in the game $\Gamma_v(x_0, T-t_0)$. The solution of this game constructed in the initial state $x(t_0) = x_0$ based on the chosen principle of optimality is denoted by $W_v(x_0, T-t_0)$. The set $W_v(x_0, T-t_0)$ is a subset of the imputation set $E_v(x_0, T-t_0)$ in the game $\Gamma_v(x_0, T-t_0)$. Assume that $W_v(x_0, T-t_0) \neq \emptyset$.
Definition 9. Any trajectory $\bar x(\cdot)$ of the system (7.3.2)-(7.3.3) such that

$$\sum_{i \in N}\Big[\int_{t_0}^T h_i(\bar x(\tau))\,d\tau + H_i(\bar x(T))\Big] = v(N; x_0, T-t_0)$$

is called a conditionally optimal trajectory in the game $\Gamma_v(x_0, T-t_0)$.

The definition suggests that along the conditionally optimal trajectory the players obtain the largest total payoff. For simplicity, we assume henceforth that such a trajectory exists. In the absence of a conditionally optimal trajectory we may introduce the notion of an "$\varepsilon$-conditionally optimal trajectory" and carry out the necessary constructions with accuracy $\varepsilon$.

We will now consider the behavior of the set $W_v(x_0, T-t_0)$ along the conditionally optimal trajectory. Towards this end, in each current state $\bar x(t)$ the current subgame $\Gamma_v(\bar x(t), T-t)$ is defined as follows. In the state $\bar x(t)$ we define the characteristic function

$$v(S; \bar x(t), T-t) = \begin{cases} 0, & S = \emptyset, \\ \operatorname{val}\Gamma_S(\bar x(t), T-t), & S \subset N\ (\emptyset \neq S \neq N), \\ \max_{u_N(\cdot)[t,T] \in \mathcal{D}_N[t,T]} J_N(\bar x(t), u_N(\cdot)[t, T]), & S = N. \end{cases}$$

Here $v(N; \bar x(t), T-t)$ is the remaining total payoff of the players from the initial state $\bar x(t)$ on the conditionally optimal trajectory, i.e.

$$v(N; \bar x(t), T-t) = \sum_{i \in N}\Big[\int_t^T h_i(\bar x(\tau))\,d\tau + H_i(\bar x(T))\Big],$$

and $\operatorname{val}\Gamma_S(\bar x(t), T-t)$ is the value of the zero-sum differential game $\Gamma_S(\bar x(t), T-t)$ between the coalitions $S$ and $N \setminus S$, which is described by the equation $\dot x = f(x, u_S, u_{N\setminus S})$ with the initial state $\bar x(t)$ and duration $T - t$. Here the payoff of Player $S$ (maximizer) in each situation $(u_S(\cdot)[t, T], u_{N\setminus S}(\cdot)[t, T])$, $u_S(\cdot)[t, T] \in \mathcal{D}_S[t, T]$, $u_{N\setminus S}(\cdot)[t, T] \in \mathcal{D}_{N\setminus S}[t, T]$, equals

$$J_S(\bar x(t), u_S(\cdot)[t, T], u_{N\setminus S}(\cdot)[t, T]) = \sum_{i \in S} J_i(\bar x(t), u_S(\cdot)[t, T], u_{N\setminus S}(\cdot)[t, T]).$$

The payoff of Player $N \setminus S$ is set equal to $-J_S$. The truncation of the admissible strategy $u_S(\cdot)$ and of the admissible strategy set $\mathcal{D}_S$ of the coalition $S$ over the time interval $[t, T]$ are denoted by $u_S(\cdot)[t, T]$ and $\mathcal{D}_S[t, T]$ respectively. The current subgame $\Gamma_v(\bar x(t), T-t)$ is defined as $\langle N, v(S; \bar x(t), T-t) \rangle$. The imputation set in the game $\Gamma_v(\bar x(t), T-t)$ is of the form

$$E_v(\bar x(t), T-t) = \Big\{\xi \in R^n \ \Big|\ \xi_i \ge v(\{i\}; \bar x(t), T-t),\ i = 1, \ldots, n;\ \sum_{i \in N} \xi_i = v(N; \bar x(t), T-t)\Big\},$$

where

$$v(N; \bar x(t), T-t) = v(N; x_0, T-t_0) - \int_{t_0}^t \sum_{i \in N} h_i(\bar x(\tau))\,d\tau.$$

The quantity $\int_{t_0}^t \sum_{i \in N} h_i(\bar x(\tau))\,d\tau$ is interpreted as the total gain of the players on the time interval $[t_0, t]$ when the motion is carried out along the trajectory $\bar x(\cdot)$.
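Along $\bar x(\cdot)$ the grand-coalition value of the current game is obtained by subtracting the gain already collected. The sketch below approximates the integral with the trapezoidal rule and tests membership in $E_v(\bar x(t), T-t)$; the function `h_total`, standing for $\tau \mapsto \sum_{i \in N} h_i(\bar x(\tau))$, and all numerical data are hypothetical.

```python
def trapezoid(f, a, b, steps=1000):
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    if b == a:
        return 0.0
    dt = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * dt)
    return total * dt


def remaining_total_payoff(v_N0, h_total, t0, t):
    """v(N; x(t), T - t) = v(N; x0, T - t0) - int_{t0}^{t} sum_i h_i(x(tau)) dtau."""
    return v_N0 - trapezoid(h_total, t0, t)


def in_imputation_set(eta, v_singletons, v_N, tol=1e-9):
    """Membership in the imputation set of the current game (Definition 4 at time t)."""
    rational = all(e >= s - tol for e, s in zip(eta, v_singletons))
    return rational and abs(sum(eta) - v_N) <= tol


def h_total(tau):
    """Hypothetical total payoff rate along the trajectory."""
    return 2.0


# Hypothetical data: v(N; x0, T - t0) = 10 and the current time t = 1.5.
v_N_t = remaining_total_payoff(10.0, h_total, 0.0, 1.5)
print(v_N_t, in_imputation_set([3.0, 4.0], [1.0, 2.0], v_N_t))
```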
Consider the family of current games

$$\{\Gamma_v(\bar x(t), T-t) = \langle N, v(S; \bar x(t), T-t) \rangle,\ t_0 \le t \le T\},$$

determined along the conditionally optimal trajectory $\bar x(\cdot)$, and their solutions $W_v(\bar x(t), T-t) \subset E_v(\bar x(t), T-t)$ generated by the same principle of optimality as the initial solution $W_v(x_0, T-t_0)$.

Lemma 2 The set $W_v(\bar x(T), 0)$ is a solution of the current game $\Gamma_v(\bar x(T), 0)$ and is composed of the only imputation $H(\bar x(T)) = \{H_i(\bar x(T)),\ i = 1, \ldots, n\}$, where $H_i(\bar x(T))$ is the terminal part of player $i$'s payoff along the trajectory $\bar x(\cdot)$.

Proof: Since the game $\Gamma_v(\bar x(T), 0)$ is of zero duration, for all $i \in N$ we have $v(\{i\}; \bar x(T), 0) = H_i(\bar x(T))$. Hence

$$\sum_{i \in N} v(\{i\}; \bar x(T), 0) = \sum_{i \in N} H_i(\bar x(T)) = v(N; \bar x(T), 0),$$

i.e. the characteristic function of the game $\Gamma_v(\bar x(T), 0)$ is additive in $S$ and, by Theorem 1, $E_v(\bar x(T), 0) = H(\bar x(T)) = W_v(\bar x(T), 0)$. This completes the proof of the lemma. ■
Dynamic stability of solution. Let the conditionally optimal trajectory $\bar x(\cdot)$ be such that $W_v(\bar x(t),T-t) \neq \emptyset$, $t_0 \le t \le T$. If this condition is not satisfied, it is impossible for the players to adhere to the chosen principle of optimality, since at the very first instant $t$ when $W_v(\bar x(t),T-t) = \emptyset$ the players have no possibility to follow this principle. Assume that in the initial state $x_0$ the players agree upon the imputation $\xi^0 \in W_v(x_0,T-t_0)$. This means that in the state $x_0$ the players agree upon such an imputation of the gain $v(N;x_0,T-t_0)$ that (when the game terminates at the instant $T$) the share of the $i$-th player is equal to $\xi^0_i$, i.e. the $i$-th component of the imputation $\xi^0$. Suppose player $i$'s payoff (his share) on the time interval $[t_0,t]$ is $\xi_i(\bar x(t))$. Then, on the remaining time interval $[t,T]$, according to $\xi^0$ he is to receive the gain $\eta^t_i = \xi^0_i - \xi_i(\bar x(t))$. For the original agreement (the imputation $\xi^0$) to remain in force at the instant $t$, it is essential that the vector $\eta^t = (\eta^t_1,\dots,\eta^t_n)$ belong to the set $W_v(\bar x(t),T-t)$, i.e. to a solution of the current game $\Gamma_v(\bar x(t),T-t)$. If such a condition is satisfied at each instant of time $t \in [t_0,T]$ along the trajectory $\bar x(\cdot)$, then the imputation $\xi^0$ is realized. Such is the conceptual meaning of the dynamic stability of the sharing. Along the trajectory $\bar x(\cdot)$ on the time interval $[t,T]$, $t_0 \le t \le T$, the coalition $N$ obtains the payoff
$$v(N;\bar x(t),T-t) = \sum_{i\in N}\Big[\int_t^T h_i(\bar x(\tau))\,d\tau + H_i(\bar x(T))\Big].$$
Then the difference
$$v(N;x_0,T-t_0) - v(N;\bar x(t),T-t) = \int_{t_0}^{t}\sum_{i\in N} h_i(\bar x(\tau))\,d\tau$$
is equal to the payoff the coalition $N$ obtains on the time interval $[t_0,t]$. The share of the $i$-th player in this payoff, considering the transferability of payoffs, may be represented as
$$\gamma_i(\bar x(t),\beta) = \int_{t_0}^{t}\beta_i(\tau)\sum_{j\in N} h_j(\bar x(\tau))\,d\tau, \tag{8.2.1}$$
where $\beta_i(\tau)$ is a $[t_0,T]$ integrable function satisfying the condition
$$\sum_{i=1}^{n}\beta_i(\tau) = 1,\quad \beta_i(\tau) \ge 0,\quad t_0 \le \tau \le T\quad (i = 1,\dots,n).$$
From (8.2.1) we have
$$\frac{d\gamma_i}{dt} = \beta_i(t)\sum_{j\in N} h_j(\bar x(t)).$$
This quantity may be interpreted as an instantaneous gain of player $i$ at the moment $t$. Hence it is clear that the vector $\beta(t) = (\beta_1(t),\dots,\beta_n(t))$ prescribes a distribution of the total gain among the members of the coalition $N$. By properly choosing $\beta(t)$, the players can ensure the desirable outcome, i.e. regulate the players' gain receipt with respect to time, so that at each instant $t \in [t_0,T]$ there will be no objection against realization of the original agreement (the imputation $\xi^0$).

Definition 10. The imputation $\xi^0 \in W_v(x_0,T-t_0)$ is called dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ if the following conditions are satisfied:
1. there exists a conditionally optimal trajectory $\bar x(\cdot)$ along which $W_v(\bar x(t),T-t) \neq \emptyset$, $t_0 \le t \le T$;
2. there exists such a $[t_0,T]$ integrable vector function $\beta(t) = (\beta_1(t),\dots,\beta_n(t))$ that for each $t_0 \le t \le T$, $\beta_i(t) \ge 0$, $\sum_{i=1}^{n}\beta_i(t) = 1$ and
$$\xi^0 \in \bigcap_{t_0 \le t \le T}\big[\gamma(\bar x(t),\beta) \oplus W_v(\bar x(t),T-t)\big], \tag{8.2.2}$$
where $\gamma(\bar x(t),\beta) = (\gamma_1(\bar x(t),\beta),\dots,\gamma_n(\bar x(t),\beta))$ and $W_v(\bar x(t),T-t)$ is a solution of the current game $\Gamma_v(\bar x(t),T-t)$.

The cooperative differential game $\Gamma_v(x_0,T-t_0)$ with side payments has a dynamic stable solution $W_v(x_0,T-t_0)$ if all of the imputations $\xi \in W_v(x_0,T-t_0)$ are dynamic stable.
The conditionally optimal trajectory along which there exists a dynamic stable solution of the game $\Gamma_v(x_0,T-t_0)$ is called an optimal trajectory. If there exists at least one dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$, but not all of the imputations from the set $W_v(x_0,T-t_0)$ have such a property, then we are dealing with a partial dynamic stability of the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$.
The sum $\oplus$ in the above Definition 10 has the following meaning: for $\eta \in R^n$ and $A \subset R^n$, $\eta \oplus A = \{\eta + a \mid a \in A\}$.

From the definition of dynamic stability, at the instant $t = T$ we have $\xi^0 \in \gamma(\bar x(T),\beta) \oplus W_v(\bar x(T),0)$, where $W_v(\bar x(T),0)$ is a solution of the current game $\Gamma_v(\bar x(T),0)$ of zero duration. Since this solution is made up of the only imputation $\xi^T = H(\bar x(T))$, the sharing $\xi^0$ may be represented as $\xi^0 = \gamma(\bar x(T),\beta) + H(\bar x(T))$, or
$$\xi^0 = \int_{t_0}^{T}\beta(\tau)\sum_{i\in N} h_i(\bar x(\tau))\,d\tau + H(\bar x(T)).$$
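The representation of $\xi^0$ above, together with (8.2.1), can be evaluated numerically. The following is a minimal sketch, assuming the running payoffs $h_i(\bar x(\tau))$ along the conditionally optimal trajectory and the weights $\beta_i(\tau)$ are supplied as plain Python functions of time; the names `trapezoid`, `gamma` and `xi_from_idp` and the two-player data are hypothetical, and integrals use the composite trapezoidal rule.

```python
from typing import Callable, List, Sequence

Fn = Callable[[float], float]


def trapezoid(f: Fn, a: float, b: float, steps: int = 1000) -> float:
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    if b == a:
        return 0.0
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * width)
    return total * width


def gamma(t: float, t0: float, beta: Sequence[Fn], h_along: Sequence[Fn]) -> List[float]:
    """gamma_i(t) = int_{t0}^{t} beta_i(tau) * sum_j h_j(xbar(tau)) dtau, cf. (8.2.1)."""
    def total_rate(tau: float) -> float:
        return sum(h(tau) for h in h_along)

    return [trapezoid(lambda tau, b=b: b(tau) * total_rate(tau), t0, t) for b in beta]


def xi_from_idp(t0: float, T: float, beta: Sequence[Fn], h_along: Sequence[Fn],
                terminal: Sequence[float]) -> List[float]:
    """xi^0 = gamma(xbar(T), beta) + H(xbar(T))."""
    return [g + H for g, H in zip(gamma(T, t0, beta, h_along), terminal)]


# Hypothetical data: two players, constant running payoffs 1 and 2 along the path,
# constant shares beta = (1/3, 2/3), terminal payoffs H = (0.5, 0.5), [t0, T] = [0, 2].
h_along = [lambda tau: 1.0, lambda tau: 2.0]
beta = [lambda tau: 1.0 / 3.0, lambda tau: 2.0 / 3.0]
xi0 = xi_from_idp(0.0, 2.0, beta, h_along, [0.5, 0.5])
print(xi0)  # approximately [2.5, 4.5]
```

Whatever $\beta$ is chosen, the components of $\gamma(\bar x(t),\beta)$ add up to the coalition's earnings $\int_{t_0}^{t}\sum_i h_i\,d\tau$; only their split between the players changes.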
The dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$ may be realized as follows. From (8.2.2) at any instant $t_0 \le t \le T$ we have
$$\xi^0 \in \gamma(\bar x(t),\beta) \oplus W_v(\bar x(t),T-t), \tag{8.2.3}$$
where
$$\gamma(\bar x(t),\beta) = \int_{t_0}^{t}\beta(\tau)\sum_{i\in N} h_i(\bar x(\tau))\,d\tau$$
is the payoff vector on the time interval $[t_0,t]$, player $i$'s share in the gain on the same interval being
$$\gamma_i(\bar x(t),\beta) = \int_{t_0}^{t}\beta_i(\tau)\sum_{j\in N} h_j(\bar x(\tau))\,d\tau.$$
When the game proceeds along the optimal trajectory, the players on each time interval $[t_0,t]$ share the total gain
$$\int_{t_0}^{t}\sum_{i\in N} h_i(\bar x(\tau))\,d\tau$$
among themselves so that
$$\xi^0 - \gamma(\bar x(t),\beta) \in W_v(\bar x(t),T-t), \tag{8.2.4}$$
and the inclusion (8.2.3) is satisfied. Furthermore, (8.2.3) implies the existence of such a vector $\xi^t \in W_v(\bar x(t),T-t)$ that $\xi^0 = \gamma(\bar x(t),\beta) + \xi^t$. That is, in the description of the above method of choosing $\beta(\tau)$, the vector of the gains to be obtained by the players at the remaining stage of the game
$$\xi^t = \xi^0 - \gamma(\bar x(t),\beta) = \int_t^T\beta(\tau)\sum_{i\in N} h_i(\bar x(\tau))\,d\tau + H(\bar x(T))$$
belongs to the set $W_v(\bar x(t),T-t)$. Geometrically, this means that by varying the vector $\gamma(\bar x(t),\beta) = (\gamma_1(\bar x(t),\beta),\dots,\gamma_n(\bar x(t),\beta))$, restricted by the only condition
$$\sum_{i\in N}\gamma_i(\bar x(t),\beta) = \int_{t_0}^{t}\sum_{i\in N} h_i(\bar x(\tau))\,d\tau,$$
the players ensure displacement of the set $\gamma(\bar x(t),\beta) \oplus W_v(\bar x(t),T-t)$ in such a way that the inclusion (8.2.3) is satisfied. In general, it is fairly easy to see that there may exist an infinite number of vectors $\beta(\tau)$ satisfying conditions (8.2.3), (8.2.4). Therefore the sharing method proposed here seems to lack true uniqueness. However, for any vector $\beta(\tau)$ satisfying conditions (8.2.3)-(8.2.4), at each time instant $t_0 \le t \le T$ the players are guided by the imputation $\xi^t \in W_v(\bar x(t),T-t)$ and the same optimality principle throughout the game, and hence have no reason to violate the previously concluded agreement.
Let us make the following additional assumptions:
a) the set $W_v(\bar x(t),T-t)$ depends continuously on $t$ in the Hausdorff metric;
b) the vector $\xi^t \in W_v(\bar x(t),T-t)$ may be chosen as a continuously differentiable monotone nonincreasing function of the argument $t$.

Show that by properly choosing $\beta(t)$ we may always ensure dynamic stability of the sharing $\xi^0 \in W_v(x_0,T-t_0)$ under assumptions a), b) and the first condition of Definition 10 (i.e. along the conditionally optimal trajectory at each time instant $t_0 \le t \le T$, $W_v(\bar x(t),T-t) \neq \emptyset$). We choose $\xi^t \in W_v(\bar x(t),T-t)$ to be a continuously differentiable function of $t$, $t_0 \le t \le T$. Construct the difference $\xi^0 - \xi^t = \alpha(t)$; then we get $\xi^t + \alpha(t) = \xi^0 \in W_v(x_0,T-t_0)$. Let $\beta(t) = (\beta_1(t),\dots,\beta_n(t))$ be a $[t_0,T]$ integrable vector function satisfying condition (8.2.4). Solve the equation (with respect to $\beta(t)$)
$$\int_{t_0}^{t}\beta(\tau)\sum_{i\in N} h_i(\bar x(\tau))\,d\tau = \alpha(t).$$
Differentiating both sides with respect to $t$ and using $d\alpha/dt = -d\xi^t/dt$, we get
$$\beta_i(t) = -\frac{d\xi^t_i/dt}{\sum_{j\in N} h_j(\bar x(t))},\quad i \in N.$$
Make sure that for such $\beta(t)$ the condition (8.2.4) is satisfied. Indeed, since $\sum_{i\in N}\xi^t_i = v(N;\bar x(t),T-t)$ and $\frac{d}{dt}v(N;\bar x(t),T-t) = -\sum_{i\in N} h_i(\bar x(t))$,
$$\sum_{i\in N}\beta_i(t) = -\frac{\sum_{i\in N} d\xi^t_i/dt}{\sum_{i\in N} h_i(\bar x(t))} = \frac{\sum_{i\in N} h_i(\bar x(t))}{\sum_{i\in N} h_i(\bar x(t))} = 1.$$
From condition (7.3.13) we have $h_i > 0$, $H_i > 0$, $i \in N$, and since $d\xi^t/dt \le 0$, then $\beta_i(\tau) \ge 0$. Thus, if along the conditionally optimal trajectory all current games have nonempty solutions possessing properties a), b), then the original game $\Gamma_v(x_0,T-t_0)$ has a dynamic stable solution. Theoretically, the main problem is to study the conditions imposed on the vector function $\beta(t)$ in order to ensure dynamic stability of specific forms of solutions $W_v(x_0,T-t_0)$ in various classes of games.
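The construction of $\beta(t)$ from a chosen path $\xi^t$ can be sketched numerically. This is a minimal illustration under assumed data: the path `xi_path`, the function `total_rate` (standing for $\sum_{j\in N} h_j(\bar x(t))$) and the two-player example are hypothetical, and the derivative $d\xi^t/dt$ is approximated by a central difference rather than computed exactly.

```python
from typing import Callable, List

Path = Callable[[float], List[float]]


def idp_from_path(xi_path: Path, total_rate: Callable[[float], float],
                  t: float, eps: float = 1e-5) -> List[float]:
    """beta_i(t) = -(d xi^t_i / dt) / sum_j h_j(xbar(t)), using a central difference."""
    plus, minus = xi_path(t + eps), xi_path(t - eps)
    rate = total_rate(t)
    return [-(p - m) / (2.0 * eps) / rate for p, m in zip(plus, minus)]


def is_admissible(beta: List[float], tol: float = 1e-6) -> bool:
    """Check beta_i >= 0 and sum_i beta_i = 1 up to a tolerance."""
    return all(b >= -tol for b in beta) and abs(sum(beta) - 1.0) <= tol


# Hypothetical two-player example on [t0, T] = [0, 1] with h_1 = h_2 = 1 and H = 0,
# so v(N; xbar(t), T - t) = 2 (1 - t). The chosen path splits this total and is
# componentwise nonincreasing in t, as assumption b) requires.
def xi_path(t: float) -> List[float]:
    r = 1.0 - t
    return [r + 0.5 * r * r, r - 0.5 * r * r]


def total_rate(t: float) -> float:
    return 2.0  # sum_j h_j along the trajectory


for t in (0.1, 0.5, 0.9):
    beta = idp_from_path(xi_path, total_rate, t)
    print(t, beta, is_admissible(beta))  # beta = ((2 - t)/2, t/2) up to rounding
```

Here $\xi^0 = \xi_{\text{path}}(0) = (1.5, 0.5)$, and the weights shift from the first player to the second as $t$ grows, exactly as the slopes of the chosen path dictate.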
8.3 Classification of dynamic stable solutions
In this section we will consider the concept of strong dynamic stability and define dynamic stable solutions for cooperative games with terminal payoffs.

Strongly dynamic stable solution. For the dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$, as follows from Definition 10, for $t_0 \le t \le T$ there exist such a $[t_0,T]$ integrable vector function $\beta(t)$ and an imputation $\xi^t$ (generally nonunique) from the solution $W_v(\bar x(t),T-t)$ of the current game $\Gamma_v(\bar x(t),T-t)$ that $\xi^0 = \gamma(\bar x(t),\beta) + \xi^t$. The conditions of dynamic stability do not affect the imputations from the set $W_v(\bar x(t),T-t)$ which fail to satisfy this equation. Furthermore, of interest is the case where any imputation from the current solution $W_v(\bar x(t),T-t)$ may provide a "good" continuation for the original agreement, i.e. for a dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$, at any instant $t_0 \le t \le T$ and for every $\xi^t \in W_v(\bar x(t),T-t)$ the condition
$$\gamma(\bar x(t),\beta) + \xi^t \in W_v(x_0,T-t_0),$$
where $\gamma(\bar x(T),\beta) + H(\bar x(T)) = \xi^0$, should be satisfied. By slightly strengthening this requirement, we obtain a qualitatively new dynamic stability concept of the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$ and call it strong dynamic stability.
Definition 11. The imputation $\xi^0 \in W_v(x_0,T-t_0)$ is called strongly dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ if the following conditions are satisfied:
1. the imputation $\xi^0$ is dynamic stable;
2. for any $t_0 \le t_1 \le t_2 \le T$ and $\beta^0(t)$ corresponding to the imputation $\xi^0$,
$$\gamma(\bar x(t_2),\beta^0) \oplus W_v(\bar x(t_2),T-t_2) \subset \gamma(\bar x(t_1),\beta^0) \oplus W_v(\bar x(t_1),T-t_1). \tag{8.3.1}$$

The cooperative differential game $\Gamma_v(x_0,T-t_0)$ with side payments has a strongly dynamic stable solution $W_v(x_0,T-t_0)$ if all the imputations from $W_v(x_0,T-t_0)$ are strongly dynamic stable. The conditionally optimal trajectory along which there exists a strongly dynamic stable solution of the game $\Gamma_v(x_0,T-t_0)$ is called a strongly optimal trajectory. If there exists at least one strongly dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$, but not all of the imputations from the set $W_v(x_0,T-t_0)$ have such a property, then we are dealing with a partial strong dynamic stability of the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$. The dynamic instability of the solution of the cooperative differential game leads to abandonment of the optimality principle generating this solution, since none of the imputations from the set $W_v(x_0,T-t_0)$ remains optimal until the game terminates. Therefore, the set $W_v(x_0,T-t_0)$ is generally to be called a solution to the game $\Gamma_v(x_0,T-t_0)$ only if it is dynamic stable. Otherwise the game $\Gamma_v(x_0,T-t_0)$ is assumed to have no solution.
Terminal payoffs. In (7.3.2), let $h_i = 0$, $i = 1,\dots,n$. The cooperative differential game with terminal payoffs is denoted by the same symbol $\Gamma_v(x_0,T-t_0)$. In such games the payoffs are made when the game terminates. Denote by $C^{T-t_0}(x_0)$ the set of points $y \in R^m$ for which there exists an open-loop control $(u_1(t),\dots,u_n(t))$, $u_i(t) \in U_i$, $i = 1,\dots,n$, transferring (by virtue of system (7.3.2)) a phase point from the initial state $x_0$ to the point $y$ in time $T-t_0$. The set $C^{T-t_0}(x_0)$ is called the reachability set in the game $\Gamma_v(x_0,T-t_0)$. It is naturally assumed that in the game with terminal payoffs
$$v(N;x_0,T-t_0) = \max_{x \in C^{T-t_0}(x_0)} H_N(x) = H_N(x^*), \tag{8.3.2}$$
where $H_N(x) = \sum_{i\in N} H_i(x)$ (for simplicity, assume that the maximum in (8.3.2) is achieved; otherwise the constructions become somewhat complicated and we have to deal with $\varepsilon$-dynamic stability).

Definition 12. Any trajectory $\bar x(\cdot)$ of system (7.3.2)-(7.3.3) such that $\bar x(T) = x^*$ is called a conditionally optimal trajectory in the cooperative differential game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs.
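When the reachability set $C^{T-t_0}(x_0)$ is approximated by a finite sample of end points, (8.3.2) reduces to picking the sample point with the largest $H_N(x) = \sum_i H_i(x)$. The sketch below assumes such a sample is already available (how it is generated, e.g. by simulating (7.3.2) under many open-loop controls, is outside its scope); the function name `grand_coalition_value` and the data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def grand_coalition_value(
    reachable: Sequence[Point],
    payoffs: Sequence[Callable[[Point], float]],
) -> Tuple[Point, float]:
    """Return (x*, H_N(x*)) maximizing H_N(x) = sum_i H_i(x) over a finite sample."""
    def h_total(x: Point) -> float:
        return sum(H(x) for H in payoffs)

    best = max(reachable, key=h_total)
    return best, h_total(best)


# Hypothetical sample of reachable end points in R^2 and terminal payoffs H_1, H_2.
sample: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.5, 0.5)]
H = [lambda x: x[0], lambda x: x[1]]
x_star, v_N = grand_coalition_value(sample, H)
print(x_star, v_N)  # (0.0, 2.0) 2.0
```

Any trajectory ending at the returned point would then play the role of a conditionally optimal trajectory in the sense of Definition 12, within the accuracy of the sampling.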
The definition of a dynamic stable imputation from the solution $W_v(x_0,T-t_0)$ is obtained as a special case of Definition 10. Since games with terminal payoffs are frequently encountered in this book, this definition is provided separately. Consider the current games $\Gamma_v(\bar x(t),T-t)$, $t_0 \le t \le T$, along a conditionally optimal trajectory $\bar x(\cdot)$. As before, their solutions are denoted by $W_v(\bar x(t),T-t) \subset E_v(\bar x(t),T-t)$, $t_0 \le t \le T$. The game $\Gamma_v(\bar x(t),T-t)$ is of duration $T-t$, has the initial state $\bar x(t)$, and the payoff functions therein are defined just as in the game $\Gamma_v(x_0,T-t_0)$. Note that, with the motion along the conditionally optimal trajectory, at each time instant $t_0 \le t \le T$ the point $x^*$ remains in the reachability region
$$C^{T-t}(\bar x(t)) = \{y \in R^m \mid x(T;\bar x(t),u_1(\cdot),\dots,u_n(\cdot)) = y,\ u_i(\cdot) \in U_i,\ i = 1,\dots,n\},$$
and at the instant $t = T$, $C^0(\bar x(T)) = \{x^*\}$.
Definition 13. The imputation $\xi^0 \in W_v(x_0,T-t_0)$ is called dynamic stable in the game with terminal payoffs if:
1. there is a conditionally optimal trajectory $\bar x(\cdot)$ along which $W_v(\bar x(t),T-t) \neq \emptyset$, $t_0 \le t \le T$;
2. $\xi^0 \in \bigcap_{t_0 \le t \le T} W_v(\bar x(t),T-t)$.

The conditionally optimal trajectory along which there exists a dynamic stable imputation $\xi^0 \in W_v(x_0,T-t_0)$ is called an optimal trajectory.

Theorem 2. In the cooperative differential game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs $H_i(x(T))$, $i = 1,\dots,n$, only the vector $H(x^*) = \{H_i(x^*),\ i = 1,\dots,n\}$, whose components are equal to the players' payoffs at the end point of the conditionally optimal trajectory, may be dynamic stable.

Proof: It follows from the dynamic stability of the imputation $\xi^0 \in W_v(x_0,T-t_0)$ that
$$\xi^0 \in \bigcap_{t_0 \le t \le T} W_v(\bar x(t),T-t).$$
But since the current game $\Gamma_v(\bar x(T),0)$ is of zero duration, therein $E_v(\bar x(T),0) = W_v(\bar x(T),0) = H(\bar x(T)) = H(x^*)$. Hence
$$\bigcap_{t_0 \le t \le T} W_v(\bar x(t),T-t) \subset \{H(x^*)\},$$
i.e. $\xi^0 = H(x^*)$ and there are no other imputations. □
Theorem 3. For the existence of the dynamic stable solution in the game with terminal payoffs it is necessary and sufficient that for all $t_0 \le t \le T$
$$H(x^*) \in W_v(\bar x(t),T-t),$$
where $H(x^*)$ is the players' payoff vector at the end point of the conditionally optimal trajectory $\bar x(\cdot)$, with $W_v(\bar x(t),T-t) \neq \emptyset$, $t_0 \le t \le T$.

This theorem is a corollary of Theorem 2. Thus, if in the game with terminal payoffs there is a dynamic stable imputation, then the players in the initial state $x_0$ have to agree upon realization of the vector (imputation) $H(x^*) \in W_v(x_0,T-t_0)$ and, with the motion along the optimal trajectory $\bar x(\cdot)$, at each time instant $t_0 \le t \le T$ this imputation $H(x^*)$ belongs to the solution of the current games $\Gamma_v(\bar x(t),T-t)$.

As Theorem 2 shows, in the game with terminal payoffs only a unique imputation from the set $W_v(x_0,T-t_0)$ may be dynamic stable. Therefore, in such games there is no point in discussing both the dynamic stability of the solution $W_v(x_0,T-t_0)$ as a whole and its strong dynamic stability.
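The condition of Theorem 3 can be tested on a time grid once a concrete solution concept is fixed. The sketch below assumes, purely for illustration, that $W_v(\bar x(t),T-t)$ is the core of the current game and that the characteristic function $v(S;\bar x(t),T-t)$ is available as a Python function of the coalition (players indexed from 0) and of $t$; all names and data are hypothetical. A grid check can only refute the condition, not prove it for every $t$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

CharFn = Callable[[FrozenSet[int], float], float]


def in_core(xi: Sequence[float], v: CharFn, t: float, tol: float = 1e-9) -> bool:
    """xi is in the core of the current game at time t: efficiency plus coalition rationality."""
    n = len(xi)
    players = range(n)
    if abs(sum(xi) - v(frozenset(players), t)) > tol:
        return False
    for size in range(1, n):
        for coalition in combinations(players, size):
            if sum(xi[i] for i in coalition) < v(frozenset(coalition), t) - tol:
                return False
    return True


def terminal_vector_is_stable(H_star: Sequence[float], v: CharFn,
                              t0: float, T: float, steps: int = 100) -> bool:
    """Grid check of H(x*) in W_v(xbar(t), T - t), t0 <= t <= T, with W_v taken as the core."""
    return all(in_core(H_star, v, t0 + k * (T - t0) / steps) for k in range(steps + 1))


# Hypothetical two-player data with H(x*) = (3, 2); the grand coalition always gets 5.
def v_ok(S: FrozenSet[int], t: float) -> float:
    singles = {0: 1.0 + t, 1: 1.0}
    return 5.0 if len(S) == 2 else singles[next(iter(S))]


def v_bad(S: FrozenSet[int], t: float) -> float:
    singles = {0: 1.0 + 3.0 * t, 1: 1.0}  # the first player alone gets more than 3 for t > 2/3
    return 5.0 if len(S) == 2 else singles[next(iter(S))]


print(terminal_vector_is_stable([3.0, 2.0], v_ok, 0.0, 1.0))   # True
print(terminal_vector_is_stable([3.0, 2.0], v_bad, 0.0, 1.0))  # False
```

In the second case the current games late in the play no longer contain $H(x^*)$ in their cores, so, by Theorem 3, no dynamic stable imputation exists.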
Chapter 9 New optimality principles in n-person differential games

9.1 Integral optimality principles
Most of the optimality principles (OP) in n-person differential game theory are taken from the classical (static) theory and are dynamic unstable (time inconsistent) or strongly dynamic unstable (strongly time inconsistent). In this chapter, for the cooperative case, we propose a family of strongly time consistent optimality principles (STCOP) constructed on the basis of integration of the locally optimal behavior in the classical sense. Using the regularizing procedure we get new STCOPs based on the core, the Shapley value and the NM-solution. Consider the n-person differential game $\Gamma(x_0,T-t_0)$
$$\dot x = f(x,u_1,\dots,u_n),\quad u_i \in U_i \subset \mathrm{Comp}\,R^l,\quad x(t_0) = x_0,$$
with integral payoffs
$$K_i(x_0,T-t_0;u_1,\dots,u_n) = \int_{t_0}^{T} h_i(x(t))\,dt,\quad h_i > 0,\quad i = 1,\dots,n.$$
Cooperative form of $\Gamma(x_0,T-t_0)$. In this formalization we suppose that the players before starting the game agree to play $u_1^*,\dots,u_n^*$ such that the corresponding trajectory maximizes the sum of the payoffs
$$\max_{u_1,\dots,u_n}\sum_{i=1}^{n} K_i(x_0,T-t_0;u_1,\dots,u_n) = \sum_{i=1}^{n} K_i(x_0,T-t_0;u_1^*,\dots,u_n^*) = \sum_{i=1}^{n}\int_{t_0}^{T} h_i(x^*(t))\,dt = V(N;x_0,T-t_0), \tag{9.1.1}$$
where $N$ is the set of all players in $\Gamma(x_0,T-t_0)$. The trajectory $x^*(t)$ is called optimal. Let $V(S;x_0,T-t_0)$ be the characteristic function ($S \subset N$) and $C(x_0,T-t_0)$ the core. Consider the family of subgames $\Gamma(x^*(t),T-t)$ along $x^*(t)$, $t \in [t_0,T]$, the corresponding cores $C(x^*(t),T-t)$ (which are supposed to be nonvoid) and c.f. $V(S;x^*(t),T-t)$. The core $C(x_0,T-t_0)$ is strongly time inconsistent and, moreover, in all nontrivial cases even time inconsistent. But using the c.f. $V(S;x^*(t),T-t)$ and $C(x^*(t),T-t)$, $t \in [t_0,T]$, we shall construct a new c.f. and, based on it, a new strongly dynamic stable (strongly time consistent, STC) optimality principle (OP). Let us introduce the following function
$$\bar V(S;x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\sum_{i\in S}\int_{t_0}^{t} h_i(x^*(\tau))\,d\tau + V(S;x^*(t),T-t)\Big]dt, \tag{9.1.2}$$
where $S \subset N$, $x^*(t)$ is the optimal trajectory from (9.1.1), and $V(S;x^*(t),T-t)$, $t \in [t_0,T]$, is the c.f. in the subgame $\Gamma(x^*(t),T-t)$. We suppose $V(S;x^*(t),T-t)$ integrable on $[t_0,T]$. We have for $S = N$
$$\bar V(N;x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\sum_{i\in N}\int_{t_0}^{t} h_i(x^*(\tau))\,d\tau + V(N;x^*(t),T-t)\Big]dt = \frac{1}{T-t_0}\int_{t_0}^{T} V(N;x_0,T-t_0)\,dt = V(N;x_0,T-t_0), \tag{9.1.3}$$
because along the optimal trajectory $x^*(t)$, $t \in [t_0,T]$, Bellman's optimality principle holds for the function $V(N;x^*(t),T-t)$, i.e.
$$V(N;x_0,T-t_0) = \sum_{i\in N}\int_{t_0}^{t} h_i(x^*(\tau))\,d\tau + V(N;x^*(t),T-t),\quad t \in [t_0,T].$$
It is easily seen that
$$\bar V(S_1 \cup S_2;x_0,T-t_0) \ge \bar V(S_1;x_0,T-t_0) + \bar V(S_2;x_0,T-t_0) \tag{9.1.4}$$
for $S_1 \cap S_2 = \emptyset$, $S_1 \subset N$, $S_2 \subset N$. Indeed, from the superadditivity of the c.f. we have for all $t \in [t_0,T]$
$$V(S_1;x^*(t),T-t) + V(S_2;x^*(t),T-t) \le V(S_1 \cup S_2;x^*(t),T-t). \tag{9.1.5}$$
Adding to both sides of (9.1.5) $\sum_{i\in S_1 \cup S_2}\int_{t_0}^{t} h_i(x^*(\tau))\,d\tau$ and integrating on $[t_0,T]$ we get (9.1.4).
Thus from (9.1.3) and (9.1.4) it follows that $\bar V(S;x_0,T-t_0)$, $S \subset N$, is a c.f. in the game $\Gamma(x_0,T-t_0)$. Define now the analogue of $\bar V(S;x_0,T-t_0)$, $S \subset N$, for the subgames $\Gamma(x^*(\Theta),T-\Theta)$, $\Theta \in [t_0,T]$. Let
$$\bar V(N;x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\sum_{i\in N}\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau + V(N;x^*(t),T-t)\Big]dt \tag{9.1.6}$$
and
$$\bar V(S;x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\sum_{i\in S}\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau + V(S;x^*(t),T-t)\Big]dt,\quad S \subset N. \tag{9.1.7}$$
From (9.1.6) we get that
$$\bar V(N;x^*(\Theta),T-\Theta) = \frac{T-\Theta}{T-t_0}\,V(N;x^*(\Theta),T-\Theta). \tag{9.1.8}$$
We see that $\bar V$ is not a c.f. in the common sense in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, $t_0 < \Theta < T$, because $\bar V(N;x^*(\Theta),T-\Theta)$ is not equal to the maximal sum of the payoffs of all the players in this subgame.

In the same way as we have done it for $\bar V(S;x_0,T-t_0)$, we show that $\bar V(S;x^*(\Theta),T-\Theta)$, $\Theta \in [t_0,T]$, $S \subset N$, is a superadditive function of $S$.
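A numerical sketch of (9.1.2), (9.1.3) and (9.1.8), assuming the running payoffs $h_i(x^*(\tau))$ and the subgame characteristic function $V(S;x^*(t),T-t)$ are given as Python functions of time. The two-player data are hypothetical: $h_1 = h_2 = 1$, $t_0 = 0$, $T = 1$, so that Bellman's principle gives $V(N;x^*(t),T-t) = 2(1-t)$, and each player alone is assumed to secure $0.5(1-t)$. The function name `v_bar` is illustrative.

```python
from typing import Callable, FrozenSet, Sequence

Fn = Callable[[float], float]
CharFn = Callable[[FrozenSet[int], float], float]


def trapezoid(f: Fn, a: float, b: float, steps: int = 200) -> float:
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    if b == a:
        return 0.0
    width = (b - a) / steps
    return width * (0.5 * (f(a) + f(b)) + sum(f(a + k * width) for k in range(1, steps)))


def v_bar(S: FrozenSet[int], theta: float, t0: float, T: float,
          h: Sequence[Fn], V: CharFn) -> float:
    """bar V(S; x*(theta), T - theta) from (9.1.7); theta = t0 gives (9.1.2)."""
    def integrand(t: float) -> float:
        earned = sum(trapezoid(h[i], theta, t) for i in S)  # sum_{i in S} int_theta^t h_i
        return earned + V(S, t)

    return trapezoid(integrand, theta, T) / (T - t0)


# Hypothetical data (see the lead-in).
h = [lambda tau: 1.0, lambda tau: 1.0]
N = frozenset({0, 1})


def V(S: FrozenSet[int], t: float) -> float:
    return 2.0 * (1.0 - t) if S == N else 0.5 * (1.0 - t)


print(v_bar(N, 0.0, 0.0, 1.0, h, V))               # (9.1.3): about V(N; x0, T - t0) = 2
print(v_bar(frozenset({0}), 0.0, 0.0, 1.0, h, V))  # about 0.75
print(v_bar(N, 0.5, 0.0, 1.0, h, V))               # (9.1.8): about 0.5 * V(N; x*(0.5), 0.5) = 0.5
```

The last value illustrates why $\bar V$ is not a c.f. in the common sense in the subgame: it equals only the fraction $(T-\Theta)/(T-t_0)$ of the maximal joint payoff $V(N;x^*(\Theta),T-\Theta) = 1$.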
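Let $C(x_0,T-t_0)$ and $C(x^*(t),T-t)$ be the nonvoid cores in the games $\Gamma(x_0,T-t_0)$ and $\Gamma(x^*(t),T-t)$, $t \in [t_0,T]$, respectively. Let
$$\xi(t) = \{\xi_1(t),\dots,\xi_n(t)\} \in C(x^*(t),T-t),\quad t \in [t_0,T],$$
be an integrable selector, which is an imputation from the core of the subgame $\Gamma(x^*(t),T-t)$ at each instant $t$. Consider the quantities
$$\bar\xi_i = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\int_{t_0}^{t} h_i(x^*(\tau))\,d\tau + \xi_i(t)\Big]dt,\qquad \bar\xi_i^{\Theta} = \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau + \xi_i(t)\Big]dt, \tag{9.1.9}$$
$i = 1,\dots,n$. Let $\bar C(x_0,T-t_0)$ and $\bar C(x^*(\Theta),T-\Theta)$ be the sets of all possible vectors $\bar\xi$ and $\bar\xi^{\Theta}$ obtained from all integrable selectors $\xi(t) \in C(x^*(t),T-t)$ of the cores of the subgames $\Gamma(x^*(t),T-t)$. Then, using set integration, we may write
$$\bar C(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + C(x^*(t),T-t)\Big]dt,$$
$$\bar C(x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\int_{\Theta}^{t} h(x^*(\tau))\,d\tau + C(x^*(t),T-t)\Big]dt, \tag{9.1.10}$$
where $h = (h_1,\dots,h_n)$. The necessary and sufficient condition for an imputation $\xi(t)$ to belong to the core $C(x^*(t),T-t)$ is
$$\sum_{i\in S}\xi_i(t) \ge V(S;x^*(t),T-t),\quad S \subset N. \tag{9.1.11}$$
Adding to both sides of (9.1.11) $\sum_{i\in S}\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau$ and integrating over $[\Theta,T]$, we get
$$\frac{1}{T-t_0}\int_{\Theta}^{T}\sum_{i\in S}\Big[\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau + \xi_i(t)\Big]dt \ge \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\sum_{i\in S}\int_{\Theta}^{t} h_i(x^*(\tau))\,d\tau + V(S;x^*(t),T-t)\Big]dt. \tag{9.1.12}$$
Using (9.1.7) and (9.1.9) we get
$$\sum_{i\in S}\bar\xi_i^{\Theta} \ge \bar V(S;x^*(\Theta),T-\Theta),\quad S \subset N,$$
i.e. any vector $\bar\xi^{\Theta} \in \bar C(x^*(\Theta),T-\Theta)$, $\Theta \in [t_0,T]$, also belongs to the core of the subgame $\Gamma(x^*(\Theta),T-\Theta)$ defined by the c.f. $\bar V(S;x^*(\Theta),T-\Theta)$.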
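The argument above can be checked on a small example: build $\bar\xi^{\Theta}$ from (9.1.9) for one core selector and verify the core conditions of the game with c.f. $\bar V(S;x^*(\Theta),T-\Theta)$ from (9.1.7). This is a minimal sketch with hypothetical data ($t_0 = 0$, $T = 1$, $h_1 = h_2 = 1$ along $x^*(t)$, $V(N;x^*(t),T-t) = 2(1-t)$, $V(\{i\};x^*(t),T-t) = 0.5(1-t)$, selector $\xi(t) = (1-t,1-t)$); the helper names are illustrative.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Fn = Callable[[float], float]
CharFn = Callable[[FrozenSet[int], float], float]


def trapezoid(f: Fn, a: float, b: float, steps: int = 200) -> float:
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    if b == a:
        return 0.0
    width = (b - a) / steps
    return width * (0.5 * (f(a) + f(b)) + sum(f(a + k * width) for k in range(1, steps)))


def xi_bar_theta(theta: float, t0: float, T: float, h: Sequence[Fn],
                 selector: Callable[[float], List[float]]) -> List[float]:
    """bar xi^Theta_i = (1/(T - t0)) int_Theta^T [int_Theta^t h_i + xi_i(t)] dt, cf. (9.1.9)."""
    return [trapezoid(lambda t, i=i: trapezoid(h[i], theta, t) + selector(t)[i], theta, T) / (T - t0)
            for i in range(len(h))]


def v_bar(S: FrozenSet[int], theta: float, t0: float, T: float,
          h: Sequence[Fn], V: CharFn) -> float:
    """bar V(S; x*(Theta), T - Theta), cf. (9.1.7)."""
    return trapezoid(lambda t: sum(trapezoid(h[i], theta, t) for i in S) + V(S, t), theta, T) / (T - t0)


def in_core(x: Sequence[float], value: Callable[[FrozenSet[int]], float], tol: float = 1e-9) -> bool:
    """Efficiency plus sum_{i in S} x_i >= value(S) for every proper coalition S, cf. (9.1.11)."""
    n = len(x)
    if abs(sum(x) - value(frozenset(range(n)))) > tol:
        return False
    return all(sum(x[i] for i in S) >= value(frozenset(S)) - tol
               for size in range(1, n) for S in combinations(range(n), size))


# Hypothetical data (see the lead-in).
h = [lambda tau: 1.0, lambda tau: 1.0]
N = frozenset({0, 1})


def V(S: FrozenSet[int], t: float) -> float:
    return 2.0 * (1.0 - t) if S == N else 0.5 * (1.0 - t)


def selector(t: float) -> List[float]:
    return [1.0 - t, 1.0 - t]  # a core imputation of every subgame


theta = 0.4
x = xi_bar_theta(theta, 0.0, 1.0, h, selector)
ok = in_core(x, lambda S: v_bar(S, theta, 0.0, 1.0, h, V))
print(x, ok)  # x is about [0.36, 0.36] and ok is True, in line with Theorem 1
```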
Theorem 1. The set $\bar C(x^*(\Theta),T-\Theta)$ belongs to the core of the subgame $\Gamma(x^*(\Theta),T-\Theta)$ with the c.f. $\bar V(S;x^*(\Theta),T-\Theta)$, $S \subset N$.
Now we have the intuitive background to introduce $\bar C(x^*(\Theta),T-\Theta)$ as an OP in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, $\Theta \in [t_0,T]$ (in the case $\Theta = t_0$ we have an OP for the original game $\Gamma(x_0,T-t_0)$). Define now a natural procedure of distribution of the imputation on the time interval $[t_0,T]$ which leads to the STC OP.

Let $\bar\xi \in \bar C(x_0,T-t_0)$ and let a function $\beta_i(t)$, $i = 1,\dots,n$, $t \in [t_0,T]$, satisfy the condition
$$\int_{t_0}^{T}\beta_i(t)\,dt = \bar\xi_i,\quad \beta_i(t) \ge 0.$$
The function $\beta(t) = \{\beta_i(t)\}$ we shall call the imputation distribution procedure (IDP). Define
$$\int_{t_0}^{\Theta}\beta_i(t)\,dt = \bar\xi_i(\Theta),\quad i = 1,\dots,n.$$

Definition 1. The OP $\bar C(x_0,T-t_0)$ is called STC if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta) \subset \bar C(x_0,T-t_0) \tag{9.1.14}$$
for all $\Theta \in [t_0,T]$. The definition of STC is applicable to a larger class of OPs (such as the Shapley value, NM-solution, Kalai-Smorodinsky solution, etc.). The STC of the OP means that if an imputation $\bar\xi \in \bar C(x_0,T-t_0)$ and an IDP $\beta(t) = \{\beta_i(t)\}$ of $\bar\xi$ are selected, then after the players get, on the time interval $[t_0,\Theta]$, the amount
$$\bar\xi_i(\Theta) = \int_{t_0}^{\Theta}\beta_i(t)\,dt,\quad i = 1,\dots,n,$$
the optimal income (in the sense of the OP $\bar C(x^*(\Theta),T-\Theta)$) on the time interval $[\Theta,T]$ in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, together with $\bar\xi(\Theta)$, constitutes an imputation belonging to the OP in the original game $\Gamma(x_0,T-t_0)$. This condition is stronger than time consistency, where we have only
$$\bar\xi - \bar\xi(\Theta) \in \bar C(x^*(\Theta),T-\Theta),$$
which means that the remaining part of the previously considered "optimal" imputation belongs to the OP in the corresponding current subgame $\Gamma(x^*(\Theta),T-\Theta)$.

Theorem 2. The OP $\bar C(x_0,T-t_0)$ is STC in $\Gamma(x_0,T-t_0)$.
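Proof: Define
$$\beta_i(t) = \frac{T-t}{T-t_0}\,h_i(x^*(t)) + \frac{1}{T-t_0}\,\xi_i(t),\quad i = 1,\dots,n, \tag{9.1.15}$$
where $\xi(t) \in C(x^*(t),T-t)$, $t \in [t_0,T]$. Consider the set $\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta)$, where
$$\bar\xi(\Theta) = \int_{t_0}^{\Theta}\beta(t)\,dt. \tag{9.1.16}$$
From (9.1.16) we get
$$\bar\xi(\Theta) = \frac{1}{T-t_0}\int_{t_0}^{\Theta}\big[(T-t)\,h(x^*(t)) + \xi(t)\big]dt = \frac{1}{T-t_0}\int_{t_0}^{\Theta}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + \xi(t)\Big]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^{\Theta} h(x^*(\tau))\,d\tau, \tag{9.1.17}$$
by using the formula
$$\int_{t_0}^{\Theta}(\Theta-t)\,h(x^*(t))\,dt = \int_{t_0}^{\Theta}\Big(\int_{t_0}^{t} h(x^*(\tau))\,d\tau\Big)dt.$$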
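From (9.1.17) and (9.1.10) we get
$$\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta) = \Big\{\frac{1}{T-t_0}\int_{t_0}^{\Theta}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + \xi(t)\Big]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^{\Theta} h(x^*(\tau))\,d\tau + \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\int_{\Theta}^{t} h(x^*(\tau))\,d\tau + \xi'(t)\Big]dt \ :\ \xi'(t) \in C(x^*(t),T-t),\ t \in [\Theta,T]\Big\}.$$
Since $\frac{T-\Theta}{T-t_0}\int_{t_0}^{\Theta} h(x^*(\tau))\,d\tau = \frac{1}{T-t_0}\int_{\Theta}^{T}\int_{t_0}^{\Theta} h(x^*(\tau))\,d\tau\,dt$, we see that every element $\hat\xi$ of the set $\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta)$ is represented in the form
$$\hat\xi = \frac{1}{T-t_0}\int_{t_0}^{\Theta}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + \xi(t)\Big]dt + \frac{1}{T-t_0}\int_{\Theta}^{T}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + \xi'(t)\Big]dt,$$
where $\xi'(t)$ is some imputation from the core $C(x^*(t),T-t)$, $t \in [\Theta,T]$, in the subgame $\Gamma(x^*(t),T-t)$. But the function
$$\xi''(t) = \begin{cases}\xi(t), & t \in [t_0,\Theta),\\ \xi'(t), & t \in [\Theta,T],\end{cases}$$
is also a selector from $C(x^*(t),T-t)$, $t \in [t_0,T]$; thus
$$\hat\xi = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + \xi''(t)\Big]dt \in \bar C(x_0,T-t_0),$$
and we have
$$\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta) \subset \bar C(x_0,T-t_0)$$
for all $\Theta \in [t_0,T]$. The theorem is proved. □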
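The IDP (9.1.15) can be checked numerically on hypothetical data: $t_0 = 0$, $T = 1$, $h_1 = h_2 = 1$ along $x^*(t)$, and the core selector $\xi(t) = (1-t, 1-t)$. The sketch verifies that $\int_{t_0}^{T}\beta(t)\,dt = \bar\xi$ and that, for the same selector, $\bar\xi(\Theta) + \bar\xi^{\Theta} = \bar\xi$; this is the element of $\bar\xi(\Theta) + \bar C(x^*(\Theta),T-\Theta)$ obtained with $\xi'(t) = \xi(t)$ in the proof. The helper names are illustrative.

```python
from typing import Callable, List, Sequence

Fn = Callable[[float], float]
Selector = Callable[[float], List[float]]


def trapezoid(f: Fn, a: float, b: float, steps: int = 200) -> float:
    """Composite trapezoidal approximation of the integral of f over [a, b]."""
    if b == a:
        return 0.0
    width = (b - a) / steps
    return width * (0.5 * (f(a) + f(b)) + sum(f(a + k * width) for k in range(1, steps)))


def idp(t: float, t0: float, T: float, h: Sequence[Fn], selector: Selector) -> List[float]:
    """beta_i(t) = ((T - t) h_i(x*(t)) + xi_i(t)) / (T - t0), cf. (9.1.15)."""
    xi = selector(t)
    return [((T - t) * h[i](t) + xi[i]) / (T - t0) for i in range(len(h))]


def paid_out(theta: float, t0: float, T: float, h: Sequence[Fn], selector: Selector) -> List[float]:
    """bar xi(Theta) = int_{t0}^{Theta} beta(t) dt, cf. (9.1.16)."""
    return [trapezoid(lambda t, i=i: idp(t, t0, T, h, selector)[i], t0, theta)
            for i in range(len(h))]


def xi_bar_theta(theta: float, t0: float, T: float, h: Sequence[Fn], selector: Selector) -> List[float]:
    """bar xi^Theta from (9.1.9); theta = t0 gives bar xi itself."""
    return [trapezoid(lambda t, i=i: trapezoid(h[i], theta, t) + selector(t)[i], theta, T) / (T - t0)
            for i in range(len(h))]


# Hypothetical data (see the lead-in).
h = [lambda tau: 1.0, lambda tau: 1.0]
t0, T = 0.0, 1.0


def selector(t: float) -> List[float]:
    return [1.0 - t, 1.0 - t]


xi_bar = xi_bar_theta(t0, t0, T, h, selector)
print(xi_bar, paid_out(T, t0, T, h, selector))  # both about [1.0, 1.0]
for theta in (0.3, 0.7):
    # Amount paid out on [t0, Theta] plus the continuation built from the same selector.
    split = [a + b for a, b in zip(paid_out(theta, t0, T, h, selector),
                                   xi_bar_theta(theta, t0, T, h, selector))]
    print(theta, split)  # about [1.0, 1.0] for each theta
```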
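It may be easily seen that in the case of terminal payoff, when
$$K_i(x_0,T-t_0;u_1,\dots,u_n) = H_i(x(T)),\quad i = 1,\dots,n,$$
all the results remain valid if we put
$$\beta(t) = \frac{1}{T-t_0}\,\xi(t),\quad \text{where } \xi(t) \in C(x^*(t),T-t),$$
$$\bar C(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{T} C(x^*(t),T-t)\,dt,\qquad \bar C(x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_{\Theta}^{T} C(x^*(t),T-t)\,dt,\quad \Theta \in [t_0,T].$$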
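Using the c.f. $\bar V(S;x_0,T-t_0)$ we may compute the Shapley value $\overline{Sh} = \{\overline{Sh}_i\}$ in the game $\Gamma(x_0,T-t_0)$. We get the following formula:
$$\overline{Sh}(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{T}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + Sh(x^*(t),T-t)\Big]dt,$$
where $Sh(x^*(t),T-t)$ is the Shapley value for the subgame $\Gamma(x^*(t),T-t)$ with c.f. $V(S;x^*(t),T-t)$, $S \subset N$. The following equality holds:
$$\overline{Sh}(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^{\Theta}\Big[\int_{t_0}^{t} h(x^*(\tau))\,d\tau + Sh(x^*(t),T-t)\Big]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^{\Theta} h(x^*(\tau))\,d\tau + \overline{Sh}(x^*(\Theta),T-\Theta),\quad \Theta \in [t_0,T],$$
where $\overline{Sh}(x^*(\Theta),T-\Theta)$ is the Shapley value in the subgame $\Gamma(x^*(\Theta),T-\Theta)$ with c.f. $\bar V(S;x^*(\Theta),T-\Theta)$. (In the case of terminal payoffs $h \equiv 0$ and the terms containing $h$ vanish.)

As in the case of the core, we may construct the IDP which guarantees the STC (which in this case coincides with TC) of $\overline{Sh}$. For this purpose it is enough to put
$$\beta(t) = \frac{T-t}{T-t_0}\,h(x^*(t)) + \frac{1}{T-t_0}\,Sh(x^*(t),T-t).$$
The construction of the STC NM-solution proceeds in much the same way.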
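The averaging formulas for $\overline{Sh}$ rest on the linearity of the Shapley value in the characteristic function. A minimal exact sketch, using the standard formula $Sh_i(V) = \sum_{S \ni i}\frac{(|S|-1)!\,(n-|S|)!}{n!}\,[V(S) - V(S\setminus\{i\})]$ with `fractions.Fraction` to avoid rounding; players are indexed from 0, and the two 3-person characteristic functions (stored as dictionaries keyed by `frozenset`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Dict, FrozenSet, List, Tuple

CharFn = Dict[FrozenSet[int], Fraction]


def shapley(v: CharFn, n: int) -> List[Fraction]:
    """Shapley value of the n-person game v; the empty coalition is taken to be worth 0."""
    value = [Fraction(0)] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # size = |S \ {i}|
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for rest in combinations(others, size):
                S = frozenset(rest)
                value[i] += weight * (v[S | {i}] - v.get(S, Fraction(0)))
    return value


def cf(pairs: Dict[Tuple[int, int], int], grand: int) -> CharFn:
    """3-person c.f. with zero singletons, the given pair values and v(N) = grand."""
    v = {frozenset({i}): Fraction(0) for i in range(3)}
    v.update({frozenset(p): Fraction(val) for p, val in pairs.items()})
    v[frozenset(range(3))] = Fraction(grand)
    return v


# Two hypothetical games and their average (as in averaging c.f.'s over time).
v1 = cf({(0, 1): 2, (0, 2): 2, (1, 2): 0}, 4)
v2 = cf({(0, 1): 1, (0, 2): 1, (1, 2): 1}, 3)
avg = {S: (v1[S] + v2[S]) / 2 for S in v1}

sh1, sh2, sh_avg = shapley(v1, 3), shapley(v2, 3), shapley(avg, 3)
print(sh1, sh2, sh_avg)  # values (2, 1, 1), (1, 1, 1) and (3/2, 1, 1)
assert sh_avg == [(a + b) / 2 for a, b in zip(sh1, sh2)]  # linearity of the Shapley value
```

Because $\bar V$ is an average over $t$ of the subgame characteristic functions plus an additive term, the same linearity turns the Shapley value of $\bar V$ into the time average of the subgame Shapley values plus the corresponding additive shares.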
9.2 Differential strongly time consistent optimality principles
In Section 9.1 we have introduced a family of strongly time consistent optimality principles (STCOP) based upon the integration of optimality principles (OP) in current subproblems occurring along the optimal path. When trying to apply these new OPs to the optimization of complicated economic systems, one faces the problem of the redistribution of the investments on the time interval under consideration. The latter may require additional investments in intermediate time intervals. This narrows the applicability of the proposed integral STCOP. To overcome the difficulty we propose here another approach, suitable for the optimization of developing economic systems and close to the one considered by S. Chistjakov. Let the n-person dynamic game $\Gamma(x_0,T-t_0)$ with prescribed duration $T-t_0$ from the initial position $x_0$,
$$\dot x = f(x,u_1,\dots,u_n),\quad x \in R^m,\quad u_i \in U_i \subset \mathrm{Comp}\,R^l, \tag{9.2.1}$$
with integral payoffs
$$K_i(x_0,T-t_0;u_1,\dots,u_n) = \int_{t_0}^{T} h_i(x(t))\,dt,\quad h_i > 0,\quad i = 1,\dots,n,$$
be given. Consider the cooperative form of the game $\Gamma(x_0,T-t_0)$. In this formalization, as before, we suppose that the players before starting the game agree to play $u_1^*,\dots,u_n^*$ such that the corresponding trajectory maximizes the sum of the payoffs
$$\max_{u_1,\dots,u_n}\sum_{i=1}^{n} K_i(x_0,T-t_0;u_1,\dots,u_n) = \sum_{i=1}^{n} K_i(x_0,T-t_0;u_1^*,\dots,u_n^*) = V(N;x_0,T-t_0),$$
where $N$ is the set of all players in $\Gamma(x_0,T-t_0)$.
The trajectory $x^*(t)$ is called optimal. Let $S \subset N$, and let $V(S;x_0,T-t_0)$ be a characteristic function. Denote by $L(x_0,T-t_0)$ the set of all imputations in $\Gamma(x_0,T-t_0)$, i.e.
$$L(x_0,T-t_0) = \Big\{\xi = \{\xi_i\} : \sum_{i=1}^{n}\xi_i = V(N;x_0,T-t_0),\ \xi_i \ge V(\{i\};x_0,T-t_0),\ i = 1,\dots,n\Big\}.$$
Let $C^{t-t_0}(x_0)$ ($t \in [t_0,T]$) be a reachable set of the system (9.2.1), i.e. the set of all points in $R^m$ which could be reached at the instant $t \in [t_0,T]$ from the initial position $x(t_0) = x_0$ according to (9.2.1) with the help of some admissible open-loop control $u(\tau)$, $\tau \in [t_0,t]$. For each $y \in C^{t-t_0}(x_0)$ consider a subgame $\Gamma(y,T-t)$ of the game $\Gamma(x_0,T-t_0)$ with corresponding characteristic function $V(S;y,T-t)$ and set of imputations $L(y,T-t)$.
Definition 2. A point-to-set mapping $C(y,T-t) \subset L(y,T-t)$ defined for all $y \in C^{t-t_0}(x_0)$, $t \in [t_0,T]$, is called an optimality principle (OP) in the family of subgames $\Gamma(y,T-t)$. In special cases $C(y,T-t)$ may be the core, the NM-solution, the Shapley value, etc.
Consider the family of subgames $\Gamma(x^*(t),T-t)$ along the optimal trajectory $x^*(t)$, $t \in [t_0,T]$, with corresponding characteristic functions $V(S;x^*(t),T-t)$ and sets of imputations $L(x^*(t),T-t)$. Define now a natural procedure of distribution of the imputation on the time interval $[t_0,T]$ which leads to the differential STC OP. Let $\xi \in C(x_0,T-t_0)$ and let a function $\beta_i(t)$, $i = 1,\dots,n$, $t \in [t_0,T]$, satisfy the condition
$$\int_{t_0}^{T}\beta_i(t)\,dt = \xi_i,\quad \beta_i(t) \ge 0.$$
The function $\beta(t) = \{\beta_i(t)\}$ we call the imputation distribution procedure (IDP). Define
$$\int_{t_0}^{\Theta}\beta_i(t)\,dt = \xi_i(\Theta),\quad i = 1,\dots,n.$$
Definition 3. The OP $C(x^*(t),T-t)$, $t \in [t_0,T]$, is called dynamic stable or TC (time consistent) if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\xi - \xi(\Theta) \in C(x^*(\Theta),T-\Theta) \tag{9.2.2}$$
for all $\Theta \in [t_0,T]$ (see Ch. 8).

Definition 4. The OP $C(x^*(t),T-t)$, $t \in [t_0,T]$, is called strongly dynamic stable or STC (strongly time consistent) if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\xi(\Theta) + C(x^*(\Theta),T-\Theta) \subset C(x_0,T-t_0) \tag{9.2.3}$$
for all $\Theta \in [t_0,T]$. Here $\xi(\Theta) + C(x^*(\Theta),T-\Theta)$ means the set of all possible vectors $\xi(\Theta) + \eta$ for all $\eta \in C(x^*(\Theta),T-\Theta)$.
The STC of the OP means that if an imputation $\xi \in C(x_0,T-t_0)$ and an IDP $\beta(t) = \{\beta_i(t)\}$ of $\xi$ are selected, then after the players get on the time interval $[t_0,\Theta]$ the amount
$$\xi_i(\Theta) = \int_{t_0}^{\Theta}\beta_i(t)\,dt,\quad i = 1,\dots,n,$$
any optimal income (in the sense of the OP $C(x^*(\Theta),T-\Theta)$) on the time interval $[\Theta,T]$ in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, together with $\xi(\Theta)$, constitutes an imputation belonging to the OP in the original game $\Gamma(x_0,T-t_0)$. This condition is stronger than time consistency.
Suppose $t_0 = \Theta_0 < \Theta_1 < \dots < \Theta_k < \Theta_{k+1} < \dots < \Theta_m = T$ is a partition of the time interval $[t_0,T]$ such that $\Theta_{k+1} - \Theta_k = \delta$, $k = 0,1,\dots,m-1$. If (9.2.2) holds only at the points $\Theta_k$, $k = 0,1,\dots,m-1$, i.e.
$$\xi - \xi(\Theta_k) \in C(x^*(\Theta_k),T-\Theta_k), \tag{9.2.4}$$
we call the OP $C(x^*(t),T-t)$ $\delta$-TC ($\delta$ time consistent). If (9.2.3) holds only at the points $\Theta_k$, $k = 0,1,\dots,m-1$, i.e.
$$\xi(\Theta_k) + C(x^*(\Theta_k),T-\Theta_k) \subset C(x_0,T-t_0), \tag{9.2.5}$$
we call the OP $C(x^*(t),T-t)$ $\delta$-STC ($\delta$ strongly time consistent).
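Now we have everything necessary to construct a differential STC OP. Introduce the following functions:
$$\beta^1(\tau) = \xi^0\,\frac{\sum_{i=1}^{n} h_i(x^*(\tau))}{V(N;x^*(\Theta_0),T-\Theta_0)},\quad \tau \in [\Theta_0,\Theta_1),\ \text{where } \xi^0 \in C(x^*(\Theta_0),T-\Theta_0),\ x^*(\Theta_0) = x^*(t_0) = x_0;$$
$$\beta^2(\tau) = \xi^1\,\frac{\sum_{i=1}^{n} h_i(x^*(\tau))}{V(N;x^*(\Theta_1),T-\Theta_1)},\quad \tau \in [\Theta_1,\Theta_2),\ \text{where } \xi^1 \in C(x^*(\Theta_1),T-\Theta_1);$$
$$\dots$$
$$\beta^k(\tau) = \xi^{k-1}\,\frac{\sum_{i=1}^{n} h_i(x^*(\tau))}{V(N;x^*(\Theta_{k-1}),T-\Theta_{k-1})},\quad \tau \in [\Theta_{k-1},\Theta_k),\ \text{where } \xi^{k-1} \in C(x^*(\Theta_{k-1}),T-\Theta_{k-1});$$
$$\dots$$
$$\beta^m(\tau) = \xi^{m-1}\,\frac{\sum_{i=1}^{n} h_i(x^*(\tau))}{V(N;x^*(\Theta_{m-1}),T-\Theta_{m-1})},\quad \tau \in [\Theta_{m-1},\Theta_m],\ \Theta_m = T,\ \text{where } \xi^{m-1} \in C(x^*(\Theta_{m-1}),T-\Theta_{m-1}). \tag{9.2.6}$$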
Define the IDP $\beta(\tau)$ by the formula
$$\beta(\tau) = \beta^k(\tau),\quad \tau \in [\Theta_{k-1},\Theta_k),\quad k = 1,\dots,m. \tag{9.2.7}$$
It is easily seen that $\beta \ge 0$. Consider the formulas (9.2.6). For different imputations $\xi^{k-1} \in C(x^*(\Theta_{k-1}),T-\Theta_{k-1})$, $k = 1,\dots,m$, we get different functions $\beta^k(\tau)$, $\tau \in [\Theta_{k-1},\Theta_k)$, and thus different functions $\beta(\tau)$ defined by (9.2.7). Let $B$ be the set of all such IDPs $\beta(\tau)$, $\tau \in [t_0,T]$, for all possible $\xi^{k-1} \in C(x^*(\Theta_{k-1}),T-\Theta_{k-1})$, $k = 1,\dots,m$. Consider the set
$$\bar C(x_0,T-t_0) = \Big\{\bar\xi : \bar\xi = \int_{t_0}^{T}\beta(\tau)\,d\tau,\ \beta(\tau) \in B\Big\}$$
and the sets
$$\bar C(x^*(\Theta_k),T-\Theta_k) = \Big\{\bar\xi : \bar\xi = \int_{\Theta_k}^{T}\beta(\tau)\,d\tau,\ \beta(\tau) \in B\Big\}.$$
The set $\bar C(x_0,T-t_0)$ is called the regularized OP $C(x_0,T-t_0)$, and correspondingly $\bar C(x^*(\Theta_k),T-\Theta_k)$ is a regularized OP $C(x^*(\Theta_k),T-\Theta_k)$. We consider $\bar C(x_0,T-t_0)$ as a new optimality principle in the game $\Gamma(x_0,T-t_0)$.
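Theorem 3. If the IDP $\beta(\tau)$, $\tau \in [t_0,T]$, is defined by (9.2.7), then always
$$\bar\xi(\Theta_k) + \bar C(x^*(\Theta_k),T-\Theta_k) \subset \bar C(x_0,T-t_0),$$
i.e. the OP $\bar C(x^*(\tau),T-\tau)$ is a $\delta$-STCOP.

Proof: Suppose $\xi \in \bar\xi(\Theta_k) + \bar C(x^*(\Theta_k),T-\Theta_k)$. Then $\xi = \bar\xi(\Theta_k) + \int_{\Theta_k}^{T}\beta(\tau)\,d\tau$ for some $\beta(\tau) \in B$. But $\bar\xi(\Theta_k) = \int_{t_0}^{\Theta_k}\beta'(\tau)\,d\tau$ for some $\beta'(\tau) \in B$. Consider
$$\beta''(\tau) = \begin{cases}\beta'(\tau), & \tau \in [t_0,\Theta_k),\\ \beta(\tau), & \tau \in [\Theta_k,T];\end{cases}$$
then $\beta''(\tau) \in B$, $\xi = \int_{t_0}^{T}\beta''(\tau)\,d\tau$, and thus $\xi \in \bar C(x_0,T-t_0)$. The theorem is proved. □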
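A sketch of the piecewise IDP (9.2.6)-(9.2.7), assuming the running payoffs along $x^*(\tau)$, the grand-coalition value $V(N;x^*(\Theta_k),T-\Theta_k)$ at the partition points and one core imputation $\xi^{k}$ per interval are supplied by the caller; the class name `PiecewiseIDP` and the data are hypothetical. Each interval is integrated with the formula of its own piece, so the jumps of $\beta$ at the points $\Theta_k$ do not disturb the quadrature. The example also checks that the components of $\beta(\tau)$ add up to $\sum_i h_i(x^*(\tau))$.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Fn = Callable[[float], float]


def trapezoid(f: Fn, a: float, b: float, steps: int = 200) -> float:
    """Composite trapezoidal approximation of the integral of f over [a, b] (0 if b <= a)."""
    if b <= a:
        return 0.0
    width = (b - a) / steps
    return width * (0.5 * (f(a) + f(b)) + sum(f(a + k * width) for k in range(1, steps)))


class PiecewiseIDP:
    """beta(tau) = beta^{k+1}(tau) on [Theta_k, Theta_{k+1}), cf. (9.2.6)-(9.2.7)."""

    def __init__(self, thetas: Sequence[float], xis: Sequence[Sequence[float]],
                 h: Sequence[Fn], v_grand: Fn) -> None:
        self.thetas, self.xis, self.h, self.v_grand = list(thetas), xis, h, v_grand

    def piece(self, k: int, tau: float) -> List[float]:
        """xi^k * sum_i h_i(x*(tau)) / V(N; x*(Theta_k), T - Theta_k)."""
        scale = sum(h(tau) for h in self.h) / self.v_grand(self.thetas[k])
        return [x * scale for x in self.xis[k]]

    def __call__(self, tau: float) -> List[float]:
        k = min(bisect_right(self.thetas, tau) - 1, len(self.thetas) - 2)
        return self.piece(max(k, 0), tau)

    def paid_out(self, theta: float) -> List[float]:
        """xi(Theta) = int_{t0}^{Theta} beta(tau) dtau, integrated piece by piece."""
        total = [0.0] * len(self.h)
        for k in range(len(self.thetas) - 1):
            a, b = self.thetas[k], min(self.thetas[k + 1], theta)
            for i in range(len(total)):
                total[i] += trapezoid(lambda tau: self.piece(k, tau)[i], a, b)
        return total


# Hypothetical data: t0 = 0, T = 1, m = 2 (delta = 0.5), h_1 = h_2 = 1 along x*,
# so V(N; x*(Theta), T - Theta) = 2 (1 - Theta); xi^0 and xi^1 are assumed core points.
beta = PiecewiseIDP(
    thetas=[0.0, 0.5, 1.0],
    xis=[[1.2, 0.8], [0.5, 0.5]],
    h=[lambda tau: 1.0, lambda tau: 1.0],
    v_grand=lambda theta: 2.0 * (1.0 - theta),
)
print(beta.paid_out(1.0))  # about [1.1, 0.9]: one element of the regularized OP
print(sum(beta(0.25)))     # equals h_1 + h_2 = 2 (up to rounding)
```

Changing the imputation chosen on either interval produces another element of $B$ and hence another point of $\bar C(x_0,T-t_0)$, while the total paid out up to any $\Theta$ stays equal to what the coalition has earned.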
The defined IDP has the advantage (compared with the integral one defined in Section 9.1) that, since $\sum_{i}\xi^{k-1}_i = V(N;x^*(\Theta_{k-1}),T-\Theta_{k-1})$,
$$\sum_{i=1}^{n}\beta_i(\tau) = \sum_{i=1}^{n} h_i(x^*(\tau)),\quad \tau \in [t_0,T],$$
and thus
$$\sum_{i=1}^{n}\bar\xi_i(\Theta) = \sum_{i=1}^{n}\int_{t_0}^{\Theta} h_i(x^*(\tau))\,d\tau, \tag{9.2.8}$$
which is the actual amount to be divided between the players on the time interval $[t_0,\Theta]$ and which, as formula (9.2.8) shows, is exactly equal to the amount earned by them on this time interval. Thus, for the realization of the proposed IDP no additional investments are needed ((9.2.8) may not hold for integral OPs).

If $\delta$ tends to zero, we may get an STC OP by introducing the IDP $\beta(\tau)$, $\tau \in [t_0,T]$, by the formula
$$\beta(\tau) = \xi(\tau)\,\frac{\sum_{i=1}^{n} h_i(x^*(\tau))}{V(N;x^*(\tau),T-\tau)},$$
where $\xi(\tau) \in C(x^*(\tau),T-\tau)$ is an integrable selector.

9.3 Strongly time consistent optimality principles for the games with discount payoffs
The problem of dynamic stability (time consistency) for n-person differential games with discount payoffs was first raised in [68]. There it was proved that in this case even the Pareto optimal solutions may be time inconsistent. The reason is that, with discount payoffs, the payoffs of the players in the subgames arising along an optimal path essentially change their structure. This implies the time inconsistency of the chosen optimality principle (OP). Until recently no attempts had been made to regularize the OP's in the discount payoff case. We refer to the report [69] at the V International Symposium on Dynamic Games, where this question was stated once more. Here we try to use the approach from §2 of this chapter to construct a family of strongly dynamic stable optimality principles in the case under discussion.
Here we shall consider the core as the OP in the game, but all the results remain valid for any other subset of imputations considered as an optimality principle. Consider the n-person differential game $\Gamma(x_0)$:

$$\dot{x} = f(x, u_1, \ldots, u_n), \qquad u_i \in U_i \subset \mathrm{Comp}\,R^{l}, \qquad x(t_0) = x_0,$$

with payoffs

$$K_i(x_0; u_1, \ldots, u_n) = \int_{t_0}^{\infty} e^{-\lambda_i (t - t_0)} h_i(x(t))\,dt, \qquad h_i \geq 0, \quad \lambda_i > 0. \qquad (9.3.1)$$
Cooperative form of $\Gamma(x_0)$. Consider the n-tuple of open loop controls $u_1^*, \ldots, u_n^*$ such that the corresponding trajectory $x^{1*}(t)$ maximizes the sum of the payoffs (utilities)

$$\max_{u_1, \ldots, u_n} \sum_{i=1}^{n} K_i(x_0; u_1, \ldots, u_n) = \sum_{i=1}^{n} K_i(x_0; u_1^*, \ldots, u_n^*) = \sum_{i=1}^{n} \int_{t_0}^{\infty} e^{-\lambda_i (t - t_0)} h_i(x^{1*}(t))\,dt = V(N; x_0),$$
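The following is a rough sketch of the cooperative maximization. As a simplifying assumption, the maximum over controls is replaced by a search over a few hypothetical candidate trajectories, all starting from $x_0 = 1$. The infinite horizon is truncated, and the integrals in (9.3.1) are approximated by the trapezoidal rule. The rates, payoff densities and candidates are made up for illustration. This is not a method for solving the underlying optimal control problem.

```python
import math


def trapezoid(f, a, b, steps=4000):
    """Composite trapezoidal rule on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * width)
    return total * width


lam = (0.2, 0.6)                           # hypothetical discount rates lambda_i > 0
h = (lambda x: x, lambda x: 2.0 - x)       # hypothetical h_i(x), nonnegative for 0 <= x <= 2
t0, horizon = 0.0, 50.0                    # the horizon truncates [t0, infinity)

# Hypothetical trajectories produced by different open-loop control n-tuples.
candidates = {
    "stay at 1": lambda t: 1.0,
    "rise to 2": lambda t: 2.0 - math.exp(-t),
    "fall to 0": lambda t: math.exp(-t),
}


def total_payoff(x):
    """Sum over players of K_i = int e^{-lambda_i (t - t0)} h_i(x(t)) dt (truncated)."""
    return sum(
        trapezoid(lambda t, i=i: math.exp(-lam[i] * (t - t0)) * h[i](x(t)), t0, horizon)
        for i in range(len(lam))
    )


best = max(candidates, key=lambda name: total_payoff(candidates[name]))
print(best)   # rise to 2
```

With these made-up data, the winning trajectory moves the state towards the region favoured by player 1. Player 1 discounts the future less ($\lambda_1 < \lambda_2$), so long-run payoff to player 1 adds more to the sum.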
where $N$ is the set of all players in $\Gamma(x_0)$. The trajectory $x^{1*}(t)$ is called conditionally optimal. Let $V(S; x_0)$ be the characteristic function ($S \subset N$) and $C(x_0)$ the core. Consider the family of subgames $\Gamma(x^{1*}(t))$ along $x^{1*}(t)$, $t \in [t_0, \infty)$, with the corresponding cores $C(x^{1*}(t))$ and c.f. $V(S; x^{1*}(t))$. The payoff functions in the subgames have the form

$$K_i^{t}(x^{1*}(t); u_1, \ldots, u_n) = \int_{t}^{\infty} e^{-\lambda_i (\tau - t)} h_i(x(\tau))\,d\tau.$$

They differ by the multiplier $e^{\lambda_i (t - t_0)}$ from the payoff functions of the subgames defined as for the games without discount factor. That is, they differ from the remaining part $\int_{t}^{\infty} e^{-\lambda_i (\tau - t_0)} h_i(x(\tau))\,d\tau$ of the original payoff. When the rates $\lambda_i$ differ between players, this essentially changes the relative weights of the payoff functions of different players as the game develops, and thus the whole game itself.
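The change of weights can be checked numerically. The sketch below makes several hypothetical choices: the rates $\lambda_i$, a payoff density $h(x(\tau))$ shared by both players, and a truncation of the horizon at $H$. It compares the subgame payoff $K_i^t$ with the remaining part of the original payoff. Their ratio equals $e^{\lambda_i(t - t_0)}$, so for $\lambda_1 \neq \lambda_2$ the players' relative weights drift with $t$.

```python
import math


def trapezoid(f, a, b, steps=2000):
    """Composite trapezoidal rule on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * width)
    return total * width


lam = (0.1, 0.5)                 # hypothetical discount rates lambda_i
t0, H = 0.0, 60.0                # H truncates [t0, infinity) (assumption)


def h(tau):
    """Hypothetical payoff density h_i(x(tau)), taken equal for both players."""
    return 1.0 + math.sin(tau) ** 2


def subgame_payoff(i, t):
    """K_i^t: discounting restarts at the current time t."""
    return trapezoid(lambda s: math.exp(-lam[i] * (s - t)) * h(s), t, H)


def remaining_original_payoff(i, t):
    """Tail of K_i over [t, H]: discounting is still measured from t0."""
    return trapezoid(lambda s: math.exp(-lam[i] * (s - t0)) * h(s), t, H)


t = 3.0
for i in range(len(lam)):
    ratio = subgame_payoff(i, t) / remaining_original_payoff(i, t)
    print(i, ratio, math.exp(lam[i] * (t - t0)))   # the two numbers agree
```

At $t = 3$ the multipliers are $e^{0.3} \approx 1.35$ for player 1 and $e^{1.5} \approx 4.48$ for player 2. In the subgame, player 2's payoff is therefore inflated far more than player 1's, relative to the original game.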
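Consider the partition of the time interval $[t_0, \infty)$ by the points $t_0 = \Theta_0 < \Theta_1 < \ldots < \Theta_k < \Theta_{k+1} < \ldots$, where $\Theta_{k+1} - \Theta_k = \delta > 0$ does not depend upon $k$. Consider also the subgame $\Gamma(x^{1*}(\Theta_1))$ with c.f. $V^2(S; x^{1*}(\Theta_1))$ and core $C^2(x^{1*}(\Theta_1))$. Let $x^{2*}(t)$, $t \geq \Theta_1$, be the conditionally optimal trajectory in the subgame $\Gamma(x^{1*}(\Theta_1))$, i.e. such that

$$\max_{u_1, \ldots, u_n} \sum_{i=1}^{n} K_i^{\Theta_1}(x^{1*}(\Theta_1); u_1, \ldots, u_n) = \sum_{i=1}^{n} K_i^{\Theta_1}(x^{1*}(\Theta_1); u_1^*, \ldots, u_n^*) = \sum_{i=1}^{n} \int_{\Theta_1}^{\infty} e^{-\lambda_i (t - \Theta_1)} h_i(x^{2*}(t))\,dt = V^2(N; x^{1*}(\Theta_1)).$$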
Then consider the subgame $\Gamma(x^{2*}(\Theta_2))$, c.f. $V^3(S; x^{2*}(\Theta_2))$, core $C^3(x^{2*}(\Theta_2))$. Continuing in the same manner, we get the sequence of subgames $\Gamma(x^{k*}(\Theta_k))$, c.f. $V^{k+1}(S; x^{k*}(\Theta_k))$, cores $C^{k+1}(x^{k*}(\Theta_k))$ and conditionally optimal trajectories $x^{(k+1)*}(t)$, $t \geq \Theta_k$.
Definition 5. The trajectory $x^*(t) = x^{k*}(t)$, $t \in [\Theta_{k-1}, \Theta_k)$, $k = 1, 2, \ldots$, is called optimal in the game $\Gamma(x_0)$.
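Definition 5 glues together the conditionally optimal trajectories of the successive subgames. The following is a minimal sketch, assuming a scalar state and hypothetical closed-form pieces $x^{k*}(t)$ that match at the partition points; the helper name `spliced_trajectory` is illustrative.

```python
from typing import Callable, List


def spliced_trajectory(t0: float, delta: float, pieces: List[Callable[[float], float]]):
    """Return x*(t) = x^{k*}(t) for t in [Theta_{k-1}, Theta_k), Theta_k = t0 + k * delta.

    pieces[k - 1] plays the role of x^{k*}; the callables are hypothetical stand-ins
    for the conditionally optimal trajectories of the successive subgames.
    """
    def x_star(t: float) -> float:
        if t < t0:
            raise ValueError("t must not precede t0")
        k = int((t - t0) // delta) + 1          # t lies in [Theta_{k-1}, Theta_k)
        if k > len(pieces):
            raise ValueError("no trajectory piece supplied for this t")
        return pieces[k - 1](t)
    return x_star


# Hypothetical scalar pieces with delta = 0.5 that agree at Theta_1 = 0.5 and Theta_2 = 1.0.
pieces = [
    lambda t: 1.0 + t,                  # x^{1*}
    lambda t: 1.5 + 2.0 * (t - 0.5),    # x^{2*}, starts at x^{1*}(0.5) = 1.5
    lambda t: 2.5 + 0.5 * (t - 1.0),    # x^{3*}, starts at x^{2*}(1.0) = 2.5
]
x_star = spliced_trajectory(0.0, 0.5, pieces)
print(x_star(0.25), x_star(0.75), x_star(1.2))
```

The intervals are half-open, as in Definition 5. At a partition point $\Theta_k$ the next piece $x^{(k+1)*}$ is used, and the matching values at the junctions make the spliced trajectory continuous.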
In this formalization of the cooperative game, we suppose that the players agree at the start of the game to use the optimal trajectory $x^*(t)$, $t \geq t_0$, $t_0 = 0$. Our formalization depends upon $\delta > 0$, and we denote the cooperative form of $\Gamma(x_0)$ by $\Gamma^{\delta}(x_0)$. The vector function $\beta(\tau) = (\beta_1(\tau), \ldots, \beta_n(\tau))$ is called the utility distribution procedure (UDP) in $\Gamma^{\delta}(x_0)$ if

$$\sum_{i=1}^{n} \int_{\Theta_k}^{\Theta_{k+1}} \beta_i(\tau)\,d\tau = \sum_{i=1}^{n} \int_{\Theta_k}^{\Theta_{k+1}} e^{-\lambda_i (t - \Theta_k)} h_i(x^{(k+1)*}(t))\,dt = V^{k+1}(N; x^{k*}(\Theta_k)) - V^{k+1}(N; x^{(k+1)*}(\Theta_{k+1})),$$
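$\beta_i(\tau) \geq 0$, $i = 1, \ldots, n$, $k = 0, 1, \ldots$ (here $x^{0*}(\Theta_0) = x_0$, $V^1 = V$ and $C^1 = C$). Let $\xi^k \in C^{k+1}(x^{k*}(\Theta_k))$, $k = 0, 1, \ldots$, be any imputation in the subgame $\Gamma(x^{k*}(\Theta_k))$ belonging to the core of this subgame. Define the function $\beta_i(t)$, $t \in [\Theta_k, \Theta_{k+1})$, by the formula

$$\beta_i(t) = \frac{\xi_i^k}{V^{k+1}(N; x^{k*}(\Theta_k))\,(\Theta_{k+1} - \Theta_k)} \sum_{j=1}^{n} \int_{\Theta_k}^{\Theta_{k+1}} e^{-\lambda_j (\tau - \Theta_k)} h_j(x^{(k+1)*}(\tau))\,d\tau = \frac{\xi_i^k \big[ V^{k+1}(N; x^{k*}(\Theta_k)) - V^{k+1}(N; x^{(k+1)*}(\Theta_{k+1})) \big]}{V^{k+1}(N; x^{k*}(\Theta_k))\,\delta},$$

$i = 1, \ldots, n$, $k = 0, 1, \ldots$. The functions $\{\beta_i(t)\}$, $i = 1, \ldots, n$, …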
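Since $\beta_i(t)$ is constant on each interval $[\Theta_k, \Theta_{k+1})$, it is easy to evaluate once the two values of the characteristic function are known. The sketch below takes those values and the core imputation $\xi^k$ as hypothetical inputs, since computing them requires solving the subgames. It then checks the UDP condition: because $\sum_i \xi_i^k = V^{k+1}(N; x^{k*}(\Theta_k))$, the amounts $\delta\,\beta_i$ add up to the difference of the two values.

```python
import math
from typing import List


def udp_on_interval(xi_k: List[float], v_start: float, v_end: float, delta: float) -> List[float]:
    """Constant values beta_i(t), t in [Theta_k, Theta_k + delta).

    v_start stands for V^{k+1}(N; x^{k*}(Theta_k)) and v_end for
    V^{k+1}(N; x^{(k+1)*}(Theta_{k+1})); both are hypothetical inputs here.
    """
    if v_start <= 0.0 or delta <= 0.0:
        raise ValueError("need v_start > 0 and delta > 0")
    if not math.isclose(sum(xi_k), v_start):
        raise ValueError("an imputation in the core must satisfy sum_i xi_i^k = v_start")
    factor = (v_start - v_end) / (v_start * delta)
    return [xi_i * factor for xi_i in xi_k]


# Hypothetical three-player data on one interval of length delta = 0.5.
beta = udp_on_interval([5.0, 3.0, 2.0], v_start=10.0, v_end=8.5, delta=0.5)
print(beta)                 # approximately [1.5, 0.9, 0.6]
print(sum(beta) * 0.5)      # approximately 1.5 = v_start - v_end
```

Nonnegativity of $\beta_i$, which a UDP requires, holds in this example because $\xi^k \geq 0$ and `v_end` does not exceed `v_start`. The function itself does not enforce it.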
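Definition 6. The set $\bar{C}(x_0)$ is called STC if, for all $\beta$ with $\int_{t_0}^{\infty} \beta(t)\,dt \in \bar{C}(x_0)$, we have

$$\int_{t_0}^{\Theta_k} \beta(t)\,dt + \bar{C}(x^*(\Theta_k)) \subset \bar{C}(x_0), \qquad k = 0, 1, \ldots$$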
Theorem 4. The regularized core (RC) $\bar{C}(x_0)$ is an STC optimality principle in $\Gamma^{\delta}(x_0)$.
Bibliography

[1] R. Isaacs, Differential Games, (Krieger Publishing Company, Huntington, New York, 1975), p. 474.
[2] N. N. Vorobyev, Advances in Math. Sc., vol. 14, no. 4, (1959).
[3] N. N. Vorobyev and I. V. Romanovsky, Len. Univ. Bulletin, no. 7, part 2, (1959).
[4] D. Geil, The Theory of Linear Economic Models, (McGraw-Hill, New York, 1960), p. 410.
[5] M. I. Zelikin and N. T. Tiniansky, Advances in Math. Sc., vol. 20, no. 4, (1965).
[6] V. I. Zubov, USSR Ac. of Sc., Reports, vol. 190, no. 4, (1970).
[7] D. Isbell, Positional Games, (Nauka, Moscow, 1967).
[8] N. Karlin, Mathematical Methods in Game Theory, Programming and Economics, (Pergamon Press, London, 1959), p. 838.
[9] D. Z. Kelendzheridze, USSR Ac. of Sc., Reports, vol. 133, no. 3, (1961).
[10] N. N. Krasovskii, Game Theoretic Problems of Motion Control, (Nauka, Moscow, 1970), p. 420.
[11] N. N. Krasovskii and A. I. Subbotin, Positional Differential Games, (Moscow, 1967), p. 455.
[12] N. N. Krasovskii, Appl. Math. and Mech., vol. 27, no. 2, (1963).
[13] G. U. Kuhn, Positional Games, (Moscow, 1967).
[14] V. N. Lagunov, Theory of Games, (Yerevan, 1973).
[15] D. I. Littlewood, Mathematician's Miscellany, (Methuen & Co, London, 1960).
[16] O. A. Malafeyev, Len. Univ. Bulletin, no. 19, part 4, (1972).
[17] O. A. Malafeyev, Controlled Systems, no. 4-5, (1970).
[18] E. F. Mishchenko, Automatics and Telemechanics, no. 9, (1972).
[19] E. F. Mishchenko and L. S. Pontryagin, USSR Ac. of Sc., Reports, vol. 174, no. 1, (1967).
[20] M. S. Nikolsky, USSR Ac. of Sc., Reports, vol. 205, no. 4, (1972).
[21] M. S. Nikolsky, Cybernetics, no. 2, (1973).
[22] J. S. Osipov, USSR Ac. of Sc., Reports, vol. 203, no. 1, (1972).
[23] N. M. Petrov, USSR Ac. of Sc., Reports, vol. 190, no. 6, (1970).
[24] L. A. Petrosjan, USSR Ac. of Sc., Reports, vol. 161, no. 2, (1965).
[25] L. A. Petrosjan, USSR Ac. of Sc., Reports, vol. 195, no. 3, (1970).
[26] L. A. Petrosjan, Advances in Theory of Games, (Vilnius, 1973).
[27] L. A. Petrosjan, Armenian Ac. of Sc., Reports, vol. 44, no. 1, (1967).
[28] L. A. Petrosjan, Len. Univ. Bulletin, no. 19, (1972).
[29] L. A. Petrosjan, Armenian Ac. of Sc., Bulletin, vol. 8, no. 2, (1973).
[30] L. A. Petrosjan, Armenian Ac. of Sc., Bulletin, vol. 3, no. 5, (1968).
[31] L. A. Petrosjan, USSR Ac. of Sc., Reports, vol. 161, no. 1, (1965).
[32] L. A. Petrosjan, Problems of Cybernetics, no. 27, (1973).
[33] L. A. Petrosjan, Armenian Ac. of Sc., Reports, vol. 40, no. 4, (1965).
[34] L. A. Petrosjan and J. G. Dutkevich, Len. Univ. Bulletin, no. 13, part 3, (1969).
[35] L. A. Petrosjan and N. V. Murzov, USSR Ac. of Sc., Reports, vol. 172, no. 6, (1967).
[36] L. S. Pontryagin and E. F. Mishchenko, Differential Equations, vol. 7, no. 3, (1971).
[37] L. S. Pontryagin, USSR Ac. of Sc., Reports, vol. 191, no. 2, (1970).
[38] L. S. Pontryagin, USSR Ac. of Sc., Reports, vol. 174, no. 6, (1967).
[39] L. S. Pontryagin, USSR Ac. of Sc., Reports, vol. 175, no. 4, (1967).
[40] L. S. Pontryagin, USSR Ac. of Sc., Reports, vol. 156, no. 4, (1967).
[41] L. S. Pontryagin and E. F. Mishchenko, USSR Ac. of Sc., Reports, vol. 7, no. 3, (1971).
[42] B. N. Pshenichny, Cybernetics, no. 1, (1968).
[43] B. N. Pshenichny, Mathematical Research Methods and System Optimization, Kiev, no. 3, (1969).
[44] A. A. Savelov, Plane Curves, (Fizmatgiz, Moscow, 1960).
[45] N. Y. Satimov, Differential Equations, vol. 9, no. 10, (1973).
[46] N. Y. Satimov, USSR Ac. of Sc., Engineering Cybernetics, no. 6, (1972).
[47] H. E. Scarf and L. S. Shapley, Applications of Game Theory in Military Affairs, (Moscow, 1961).
[48] N. T. Tiniansky, USSR Ac. of Sc., Reports, vol. 207, no. 1, (1972).
[49] I. P. Jachauskas, Lith. Math. Articles, no. 1, (1967).
[50] L. D. Berkovitz, Ann. Math. Study, vol. 52, (Princeton, 1964).
[51] L. D. Berkovitz, Math. Theory of Control, ed. by Balakrishnan and Newstadt, (Academic Press, 1967).
[52] W. H. Fleming, J. Math. and Appl., vol. 3, (1961).
[53] W. H. Fleming, Advances in Game Theory, (Princeton, 1964).
[54] A. Friedman, Differential Equations, vol. 12, no. 1, (1972).
[55] A. Friedman, Differential Equations, vol. 12, no. 2, (1972).
[56] A. Friedman, Differential Equations, vol. 12, no. 3, (1972).
[57] J. C. Ho, SIAM J. on Control, vol. 4, no. 3, (1966).
[58] J. C. Ho, A. E. Bryson and S. Barson, IEEE Trans. Aut. Control, vol. 10, no. 4, (1965).
[59] G. Leitman, Intern. J. of Non Linear Mechanics, vol. 3, (1969).
[60] C. Ryll Nardzenski, Ann. Math. Studies, no. 3, (1957).
[61] H. E. Scarf, Ann. Math. Studies, no. 3, (1957).
[62] H. Steinhaus, Mysl. Acad., vol. 1, no. 1, (1925).
[63] P. P. Varaiya, SIAM J. on Control, vol. 5, (1967).
[64] P. P. Varaiya and J. Lin, SIAM J. on Control, vol. 7, (1969).
[65] A. Zieba, Studia Mat., vol. 22, (1962).
[66] L. A. Petrosjan, Vestnik of the Len. State Univ., no. 13, (1977).
[67] L. A. Petrosjan, Dynamic Control, (Sverdlovsk, 1979).
[68] R. H. Strotz, Review of Economic Studies XXIII, (1955-56).
[69] V. Kaitala and M. Pohjola, Reports of the Fifth International Symposium on Dynamic Games and Applications, (Geneva, 1992).
[70] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, (Academic Press, London, 1982), p. 255.
[71] A. Friedman, Differential Games, (John Wiley, New York, 1971), p. 280.
[72] J. Nash, Annals of Mathematics, vol. 54, no. 2, (1951).
[73] J. Von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, (Princeton University Press, Princeton, NJ, 1947), p. 850.
[74] G. Owen, Game Theory, (Saunders, Philadelphia, 1968), p. 320.
[75] Arunabha Bagchi, Stackelberg Differential Games in Economic Models, in Lecture Notes in Control and Information Sciences, ed. A. V. Balakrishnan and M. Thoma, no. 64, (Springer-Verlag, Berlin, 1984), p. 203.
[76] Differential Games in Economic Analysis, ed. R. P. Hamalainen, H. K. Ehtamo, in Lecture Notes in Control and Information Sciences, ed. M. Thoma and A. Winer, no. 157, (Springer-Verlag, Berlin, 1992), p. 312.
[77] Differential Games-Developments in Modelling and Computation, in Lecture Notes in Control and Information Sciences, ed. M. Thoma and A. Winer, no. 156, (Springer-Verlag, Berlin, 1992), p. 294.
[78] A. I. Subbotin and A. Y. Chentsov, Optimization of Guarantee in Control Problems, (Nauka, Moscow, 1981), p. 264.
[79] F. L. Chernousko and A. A. Melikjan, Game-Theoretic Problems of Control and Search, (Nauka, Moscow, 1978), p. 268.
[80] N. L. Grigorenko, USSR Ac. of Sc., Reports, vol. 258, no. 2, (1988).
[81] M. I. Zelikin, USSR Ac. of Sc., Reports, vol. 202, no. 5, (1972).
[82] A. F. Kononenko, USSR Ac. of Sc., Reports, vol. 231, no. 2, (1976).
[83] E. F. Mishchenko, USSR Ac. of Sc., Engineering Cybernetics, no. 5, (1971).
[84] M. S. Nikolsky, Math. Proceedings, vol. 116, no. 1, (1981).
[85] N. Y. Satimov, Math. Notes, vol. 21, no. 3, (1977).
[86] F. L. Chernousko, Appl. Math. and Mech., vol. 40, no. 1, (1976).
[87] A. A. Chikry, Appl. Math. and Mech., vol. 39, no. 5, (1975).
[88] A. N. Krasovskii, USSR Ac. of Sc., Reports, vol. 253, no. 6, (1980).
[89] A. I. Subbotin, USSR Ac. of Sc., Reports, vol. 254, no. 2, (1980).
[90] A. I. Subbotin and N. N. Subbotina, Appl. Math. and Mech., vol. 46, no. 2, (1982).
[91] B. B. Rihsiev, Differential Games with Simple Motions, (Fan, Tashkent, 1989), p. 232.
[92] L. A. Petrosjan and N. N. Danilov, Cooperative Differential Games and their Applications, (Izd. Tomskogo Univ., Tomsk, 1985), p. 276.
[93] N. N. Krasovskii, Dynamic Systems Control, (Nauka, Moscow, 1985), p. 518.
[94] L. A. Petrosjan and G. V. Tomskii, Geometry of Simple Pursuit, (Nauka, Novosibirsk, 1983), p. 142.
[95] L. A. Petrosjan and G. V. Tomskii, Dynamic Games and Their Applications, (Izd. Len. Univ., Leningrad, 1982).
[96] L. A. Petrosjan and G. V. Tomskii, Differential Games with Incomplete Information, (Izd. Irk. Univ., Irkutsk, 1984), p. 188.
[97] L. A. Petrosjan and B. B. Rihsiev, Pursuit on the Plane, (Nauka, Moscow, 1991), p. 96.
[98] L. A. Petrosjan, Len. Univ. Bulletin, Ser. 1, no. 8(2), (1992).
[99] L. A. Petrosjan, in IFIP Technical Conference Reports, (Springer-Verlag, Berlin, 1974).
[100] L. A. Petrosjan and T. I. Kuzmina, Non-zero Sum Differential Games, (Jakutsk, 1989), p. 147.
[101] L. A. Petrosjan, in Reports of VII Conference of Applied Mathematics, University of Osijek, ed. R. Skitovsky, (Osijek, 1990).
[102] L. A. Petrosjan and V. V. Zakharov, Introduction to the Mathematical Ecology, (Izd. Len. Univ., Leningrad, 1986), p. 220.
[103] L. A. Petrosjan, in Proceedings of the International Conference of Differential Equations and Applications, held in Ruse, Bulgaria, Aug. 13-19, 1989, (Ruse, 1991).
[104] N. A. Zenkevich, Len. Univ. Bulletin, no. 19, (1981).
[105] V. V. Zakharov, Len. Univ. Bulletin, no. 1, (1988).
[106] G. V. Tomskii, Appl. Math. and Mech., vol. 45, no. 2, (1981).
[107] V. I. Zhukovskii and N. T. Tiniansky, Equilibrium Controls in Multicriterial Dynamic Systems, (Moscow Univ., Moscow, 1984), p. 224.
[108] V. I. Zhukovskii, in Operation Research, (Bulgar. Acad. Sci., Sofia, 1885).
[109] N. M. Sloboshanin, Len. Univ. Bulletin, vol. 3, no. 13, (1981).
Index

absolute Nash equilibrium 244
admissible 58
barrier 160, 166
behavior strategy 213
capture 51
capture point 153
capture zone 51
center of pursuit 84, 90
characteristic function 293, 284
coalition 252
conditionally optimal trajectory 92, 95
convex 22
convex game 21
cooperative game 261
core 303
current subgame 292
delay of information 169
discrete game 36
discrimination 2
dispersal surface 86
dynamic stability 250, 262
equilibrium point 13, 14
escape zone 51
evader 6, 8
family of games 34
game in normal form 9
game with perfect information 5
integral optimality principle 301
integral payoff 57
invariant center of pursuit 84, 94
invariant center of the game 93
Isaacs equations 108
iterative method 110
maximin strategy 10
method of characteristics 109
minimax strategy 10
mixed strategy 21, 30
mixed extension 18, 21
noncooperative game 283
optimal strategy 13
optimal trajectory 84, 95
outcome 73
payoff function 6
payoff 238
Princess and Monster game 182
pure strategy 21
pursuer 36
pursuer detail 46
pursuer team 166
qualitative payoff 56
reachability set 59
Shapley value 289
singular surface 92
situation 34
strategy 34
strictly convex 42
subgame 240
synthesizing strategies 58
terminal payoff 56
time consistency 290
tree 236
value of the game 10
variance 31
vertex 22
zero-sum 10