y| _
dii \(/dii
£
X fjy
d / l X HY
Y
denotes the direct product of /i\ and
2
B
d/iY/d/lB
duy
(
Hence i t follows from
(6.2.4) and (6.2.5) that
" ( f t ' ou ) — d p X p.y " ' fixp
[j-T ~
£
( f ) ( i
y
( ( )
_ I j-T |
£(<)
|2 *
(6.2.6)
We use (1.6.6) and (6.2.6) t o derive 1
All-,,
IHCY) =E
d u X fly "hi * r*Y
\
£
L
'ft
J
i
f
T ' dt 2
(6.2.7)
- E
Noting that
E
E[f
T
0
x(t)dB(t}]
=
0 (see (A.3.2)), i t follows from (6.2.3) that
= i j T E[\x(t)\ dt.
y\(t)dY(t)-l£\x(t)\*dt^
s
(6.2.8)
In the same way, i t can be shown that
E
-l>\
x(t)x(t)--\x(t)n
dt.
(6.2.9)
Now the formula (6.2.2) is an easy consequence of (6.2.7) - (6.2.9). To derive (6.2.2) we have assumed that ( i y •< f i x p (
£
Y
• Therefore, to complete the proof, we have to
CONTINUOUS
234
GAUSSIAN
CHANNELS
show that this is true. Oo the contrary, let us suppose that fi y e
there exists a set A , w i t h p-ey(A) > 0, on which dpy/dft
7$ H
x
(*$*'• Then
— 0 and dpyytldpB
B
> 0,
consequently log -
= oo
dpy/d/iB
on
A.
This means that dVY\(/dPB
E
/e
v
T
\
(6.2.10)
is infinite. On the other hand, (6.2.10) is equal to the right-hand side of (6.2.7) This is a contradiction. Therefore p y
which is finite by assumption (6.2.1).
(
should be absolutely continuous w i t h respect to / i x fiy.
•
£
R e m a r k . Since x(t) x(t))x(t)\
is a function of Y ', i t follows from (A.2.7) that E[(x(t) 0
-
= 0, and that E[\x[t)
-
x(t)\ ] 2
= E\\x(t)\
-
2
|2(«)| ].
(6.2.11)
2
Therefore i t is true that
m,y)-\l <|jf
T
{E[\ (m-E[x(t)\ ]}dt 2
X
E[\x(t)\ )di.
(6.2.12)
3
The formula (6.2.2) for the mutual information is quite i m p o r t a n t , especially the following points are worth being emphasized. (a) The formula can be applied even i f the channel is w i t h feedback. (b) The formula can be applied whether the channel input is Gaussian or nonGaussian. (c) Using the formula one can rather easily analyze how does the m u t u a l information Iriti
Y) grow w i t h T.
I t is known that the conditional expectation x(t) optimal mean square estimator of x(t)
= E[x(t)\K(ti),
u < f] is the
based on the observation of the output
K ( u ) , u < t. I n the filtering theory, i t is a fundamental problem to f i n d the best estimator x(t) and t o compute the error £[li(f.) — x(t)| ]. 2
is called the f i l t e r of x(t) and E\\x(t) - x(t)\ ] 2
is called the
The best estimator x(t) filtering
e r r o r . I t has
been seen that in white Gaussian channel to calculate the mutual information is equivalent to calculate the filtering error. Some examples t o calculate the mutual information will be given i n the next section.
S-t MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
2 3 5
(I)
Here let us consider a channel w i t h linear feedback. I t has a simple structure, b u t plays an important role i n practical situations. Let a message £ = { £ ( ( ) ; ' > 0} be a stochastic process. A white Gaussian channel w i t h l i n e a r feedback is presented by Y(*)=
f A(u)(£(u)-<(u))d + B(t), Jo
(6.2.13)
U
where A ( u ) is a non-random function and ( ( u ) = < ( u , F ) depends only on the 0
u
output Y . Intuitively speaking, £(u) represents the feedback term and A(u) is an U
B
amplitude function. I f feedback is not available, then the white Gaussian channel without feedback, corresponding to (6.2.13), is presented by
Vo(t) =
/ ' A{u)f(n)
du + B(t).
Ja
(6.2.14)
It is shown that the mutual information and the filtering error are not changed by using the linear feedback. T h e o r e m 6.2.3. Let £ = ( £ { ( ) } be a message, and
Y =
{Y(t)}
and
Y
a
=
{Y (t)} 0
he the outputs of the feedback channel (6.2.13) and the non-feedback channel (6.2.14), respectively. Assume that 0 < |A(u)| < M < oo (0 < u < T ) and
£
*< < oo,
[
£[IC(f)| ] du < co. 2
Then for any 0 < t < T I,(f,Y)
- (Wl } 1
= It&Yo),
(6.2.15)
= £ ( l « 0 - tm%
(6-2-16)
where ff(f) = £[£(f)| V » , « < t] and £o(f) = £[£(()! tt(u), u < (].
P r o o f . Since £ ( « ) is a function of Yg, the output K ( u ) = Y { u ) + f * A(s)£(s) da 0
of the non-feedback channel is a function of the output Y
g
of the feedback channel.
Consequently, we have I,((>Y)>It(t>Y )
(6.2.17)
0
and B [ m - m i < E [ \ m - i o ( t ) \
2
) .
(6.2.18)
236
CONTINUOUS
From (A.2.6) we know that
GAUSSIAN
E{£(u)\Y(s),
CHANNELS
s < u] =
Hence, applying Theorem
((s).
6.2.1, we see that /«(*,
y)=\f
A(u?mM
= \f
-
m
«km
-
-
cooi vm,
«in
* <
A(u) E[|£(u) - £(«)| ] du. 2
(6.2.19)
2
It follows from (6.2.17) - (6.2.19) and Theorem 6.2.1 that I (f,Y)>I (£,Y ) t
t
a
=
>
-^A(ufE[\£(u)-Uu)\ du
1
^'A(u) £[|£(u)-£( )| ]d 2
!
U
i
u
= /.(£, n
(6.2.20)
Thus the two inequalities i n (6.2.20) hold w i t h equality and we have the equations (6.2.15) and (6.2.16).
•
The output of the feedback channel is written as
Y(t)
=
Y (t)
— j ' A(u)C(u)
a
0
du.
The feedback term j j . A ( u ) ( ( u ) d u is specified by the o u t p u t Y * which is known a
for the receiver.
This means that the linear feedback brings to the receiver no
new information. This is the reason why the mutual information is not increased and the filtering error is not decreased by linear feedback. However feedback can be used to save the transmission energy. Suppose that the channel (6.2.13) has an average power constraint £[|i(!)| ] 2
— A(t) E[\£(t) 2
—
C(f)| ] < p imposed on 2
2
the input signals. We can reduce £[|£(i) — ((t)| ]> by choosing £ appropriately. 2
Then more effective amplitude function A(t)
is allowed. As a result, this make
i t possible to increase the mutual information transmitted over the channel and reduce the filtering error. This observation suggests to us how to construct an optimal coding scheme under an average power constraint. The optimal coding methods w i l l be studied i n Sec. 6.5 i n detail.
6.3 M U T U A L I N F O R M A T I O N W H I T E GAUSSIAN
IN
CHANNELS
(II)
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
(It)
A message £ = {£(s);s > 0} is called a Gaussian message i f £ is a Gaussian process.
I n this section we consider a case where a Gaussian message is trans-
mitted over a white Gaussian channel. As we have seen i n the preceding section, calculation of the mutual information is directly related to calculation of the mean square filtering error. I t is known that some problems i n filtering can be solved w i t h the aid of results i n the theory of integral equation. Some useful known results, concerning the filtering error, will be put i n order. Then they w i l l be applied to calculate the m u t u a l information. A t first, a white Gaussian channel without feedback is considered. Then the channel input may be identified w i t h the message, so that the channel is described as t
r
Y(t)=
/ ' e ( ui ))d u + B(t), Jo
0
(6.3.1)
where £ = { £ ( ( ) } is assumed to be a Gaussian process w i t h mean aero and covariance function
= £[£(i)£(s)] such that
R(t,s)
R{t,t)dt
<
co. Note that the
covariance function R ( t , s) has the following properties: (i) i t is symmetric, i.e., R{t,
s) = R(s,t);
(ii) it is positive definite, i.e., ,T
and
(iii) R(t,s)
,T
R(t,s)f(t)f(s)d dt>0,
/
/
Jo
Jo
V/eL ([0,T]); 3
s
g L ( [ 0 , T ] ) , i.e., 2
r
Jo
J
fR^sf
dsdt
<
Ja
rr
Jo
Elatfmttsywadt
Jo
2 R{t,t)dt
T
< OO.
Here we denote by £ ([0, T]) the space of all real (measurable) functions /(t) such J
that j
a
T
f(t)
2
real functions
dt < oo. I n the same way £ ( [ 0 , T ] ) is defined as the space of all 3
g(t,s)
satisfying
£
$
g(t,
2
s)
2
dsdt
<
oo. Since
forms a Gaussian system, i t is seen that the filter (f(t) = £(t) is a linear function of Y(u), integral equation. A function
E[£{t)\Y(u),
{f(t),B{t),Y(t)} u <
t]
of
u < t , and is obtained by solving a Wiener-Hopf f{t,s)
is called a Volterra function if / ( t , s ) =
0
whenever t < s. L e m m a 6 . 3 . 1 . Let
R(t,s)
be a covariance function. Then the Wiener-Hopf equa-
tion h(r, s) + J
k(t, u)R(u,
s) du = J i ( i , s),
a<s
(6.3.2)
238
CONTINUOUS
GAUSSIAN
has a unique Volterra type solution
6
h(t,s)
CHANNELS
I f R(t,s)
L (\0,Tf). 2
is continuous i n
( i , s), then as the solution we may take a function h(t, s) which is continuous on the domain {((, s ) e R ; 0 < ^ < ( < T } . We omit the proof but briefly sketch the outline of the proof. O u t l i n e o f P r o o f . Define Ji< >(i,s) by / ^ " ( M ) = n
/E
( n + l ,
/ R^(t,u)R(u,s)d ,
(t, )=
Jo
Then we can show that the function = E(-ir
Ji
(6.3.3)
defined by
h(t,s)
+ l
and n>l.
u
s
m 4
R(t,s)
t n ,
(6.3.4)
3 < t,
((,s),
1=1
satisfies (6.3.2). The uniqueness of the solution is proved as follows. Let
ki(t,s)
and h i ( t , s) be solutions of (6.3.2). Then the function h (t, s) = h (t, s) — h\(t, s) 0
satisfies Ji ((, s)
=
0
— j
Q
k (t, u}R(u,
s) du.
0
2
Therefore, noting the positive definite-
ness of R { t , a), we have 0 <
meaning that
/ Jo
k {t,a) da
h (t,s) 0
0
= -
1
f j Jo Jo
R(u,s)h (t u)h (t,a)duda l)
>
v
0. Thus h , ( t , a ) = « i ( t , s ) .
=
< 0
•
T h e o r e m 6.3.2. Let the message £ = { £ ( ( ) } be a Gaussian process w i t h continuous covariance function
R(t,s).
Then the filter, the filtering error and the mutual
information i n the white Gaussian channel (6.3.1) without feedback are given by f(*)=
(6.3.5)
/ h(t,s)dY{s),
Jo
E[\at)-at)\ ) 2
(6.3.6)
= h(t,t),
respectively, where k{t, s) is the solution of the Wiener-Hopf equation (6.3.2). P r o o f . Since
{({t),
B(t),
Y{t);
t >
0} forms a Gaussian system and the right-hand
side of (6.3.5) is a function of {Y(u);u
< t],
to prove (6.3.5) i t suffices to show
that £(t) - /„' n(f, a) d Y ( s ) is orthogonal for all Y(u), u < t: = 0,
u < ( .
(6.3.8)
6.3 MUTUAL
INFORMATION
IN WHITE
From the independence of {£(*)} and
GAUSSIAN
CHANNELS
(II)
239
(6.3.1), (6.2.3) and (A.3.3) we have
{B(t)},
Right-hand side of (6.3.8) =
-
E
= jf
jf
h(t, )f( )ds s
-
s
j * h(t,s)dB(s) j
+
(J*t(v)dv
S
B(u)
)
j f h(^»)J^B «)«foi ds
(R{t,s)-k{t,s)-
)
= 0, and we have (6.3.8). Using (6.3.2) and the symmetry of fl(i,s), i n the same way as above we obtain E\\Z(t)\\
=
[
I
h(t u)h{t,v)R(u,v)dudv
+
i
Jo JO =
/
f
h{t,u)
2
du
Jo k{t,u)R(u,t)du.
Jo
Thus, using (6.3.2) again,
EW)
2
~ ?(t) ] 2
= R(t,
t) -
/'
h{t, u)R(u,
t) du = h(t,
t).
Jo Now (6.3.6) and (6.3.7) follow from (6.2.11) and (6.2.12).
•
It is not true to say that the mutual information is explicitly calculated by means of (6.3.7), because no general method to solve the Wiener-Hopf equation has been known. However, there is a rather simple but important case where the Wiener-Hopf equation can be solved to calculate the mutual information. E x a m p l e 6 . 3 . 1 . Let £o(^) be a zero mean Gaussian random variable w i t h variance c , A(t) be a non-random function w i t h Jg A{t} 2
dt < oo, and £ = { £ ( ( ) ; ' >
2
0) be a message given by £(t,w) = A(t){o(w),
r > 0, bi e 9 .
Then, since the covariance function of £ is
R{t,
s)
=
a A{t)A(s), 2
(6.3.9) i t is easily seen
that the solution of (6.3.2) is given by
^ • ' - i / f f i , 1 +
0
2
(
6
3
1
0
)
CONTINUOUS
240
GAUSSIAN
CHANNELS
Consequently, the m u t u a l information transmitted by sending the message (6.3.9) over the channel (6.3.1) without feedback is given by a
A(s)
t*
2
1
I n particular, if A ( t ) = 1, then r (to,Y)
=
t
= | l o g ( l +
IfcY)
(6.3.11)
a
If a message is a Gaussian Markov process, namely a Gaussian process with simple Markov property, then the filter is given as a solution of a stochastic integral equation and the filtering error is given as a solution of a Riccati differential equation which is rather easier to solve than the Wiener-Hopf equation. This is known as the Kalman theory. We give here the results without proof. Let a Gaussian process rj = { l ( t ) i i > 0} be a solution of a stochastic integral equation >?(<)= f
0
+
f
a(s)ri( )d + S
Jo
S
f
Jo
t>0,
b{ )dW(s), S
(6.3.12)
where n is a random variable w i t h distribution JV(0, t r ) , W = ( W ( t ) } is a Brownian motion independent of no, and a ( t ) and b(t) are real functions satisfying 2
0
fT
T
f
J
\a{t)\dt
< oo
j
and
b(t)
2
dt <
co.
It is known that n is a Markov process, and conversely a Gaussian Markov process w i t h mean zero is presented as a solution of a stochastic integral equation of this type if its covariance function is differentiable. Let a message £ = { £ ( ' ) } be given by * > 0, w g
« t , w ) = A(f)ij(*, ), W
where A ( t ) is a function such that
A(t)
2
ft,
(6.3,13)
dt < oo.
T h e o r e m 6.3.3. Assume that the message £ of (6.3.13) is t o be transmitted over the channel (6.3.1). Then the filter ij(/) =
E[t}(t)\
Y(u),
u < f] of rj(i) is a solution
of a stochastic integral equation flr)=
f{a(s)-A( ) {s)m )d + S
Jo
2
1
s
s
f
Jo
A(s)-y(s)dY(s),
t >
0,
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
241
(II)
where 7(1) = £[|n(i) - i7(i)| ) is the filtering error and is given as the solution of the following Riccati equation 2
t 7(0)
( 6
- " 3
1 4 )
R e m a r k . The differential equation (6.3.14) is equivalent to the integral equation
a
+
(s)ds)
(6(s)2 j4(s) 7(s)3)exp
* i' " '
a(u)rfu ds
(- £ ) }•
E x a m p l e 6.3.2. Let the input £ = { £ ( ( ) } of the channel (6.3.1) be a Gaussian process given by £(t) = ft - <> / ( ( s ) d i + i.W((), Jo
' >0,
(6.3.15)
where £0 is a Gaussian random variable with distribution JV(0,a ). The message 2
£ is obtained as a special case of (6.3.12), by substituting o ( t ) = — a, 6(t) = 6, A(t) = 1. I n this case (6.3.14) turns out to be £ ( f ) = - 7 ( 0 - 2 n ( r ) + t> , 2
7
2
7
<>0
7(0) = <*\ Solving this equation, i t can be seen that the filtering error 7(f) — i?[|£(() — £(t)| ] 2
is given by 7
(i) = " '
g l Z cex (2 ^ +i r)-l 2
P
v
r
v
+
V
^ T ^ - a ,
7
where c
=
/ 2
" T T
• w •
(6.3.16)
Consequently the mutual information is equal to
1 c - e x p ( - 2 y ^ T i ' t ) , 1, , = xlog + ; ( v « ' + 6 ' - o)(. ~ /
1
c
l
2
f
1
(6.3.17)
242
CONTINUOUS
GAUSSIAN
In particular, i f a > 0 and b = 2ac , 1
CHANNELS
then we can show that £ = { £ ( * ) } is an
2
Ornstein-Uhlenbeck Brownian m o t i o n w i t h covariance function R{t,s)
= <7 e-'
M ,
2
(6.3.18)
We now consider Gaussian channels w i t h feedback. Suppose that the message £ = {£(t);t > 0} is a Gaussian process w i t h a continuous covariance function R(t, s ) . Let us start w i t h a white Gaussian channel (6.2.13) w i t h additive feedback. For simplicity we assume A(t) = 1. Then the channel is given by
Y(t,u)=
l {£(u,w)-C(u,w)}du + Jo
fl(r,w),
0 < ( < T.
(6.3.19)
Moreover we assume that £(u, w) = C(u, ^ " ( w ) ) is a linear function of K(t»,u>), 0 < v < u. I n other words, Q = { ( ( ( ) ; t > 0} is a process satisfying
C(t)eH {Y),
/ Jo
t
where H (Y)
is the subspace of
t
L (il,
spanned by Y(s),
P)
2
E[C.{t?\dt < m,
(6.3.20)
0 < s < t. I n this case
it is shown that the stochastic equation (6.3.19) has a unique solution Y = L e m m a 6.3.4. If (6.3.19) has a solution Y = {Y(t)},
U {Y)
= [J
t
f £ L ([0,(])|,
then
0 < i < T.
2
u
Denote by H (Y)
Proof. Ht(Y)
Hu)dY( );
{Y(t)}.
(6.3.21)
the set of the right-hand side of (6.3.21). T h e n clearly
t
For an element
D
n
V = J^c Y(t )£H (Y), k
k
(6.3.22)
t
=a
k
(0 = (
0
< ••-< t
n
< t, and c is a constant), let / ( i t ) = E ? = * + i <*> ** < « < * * + ! • k
Then n y
=
E
i
\
n+1
E
t = 0 \,=*+l
^
i
(n**+a-ni*)}= /
/
/ ( « w « > e w . ( n
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
(II)
243
where t = t and c , = 0. Noting that the set of all elements of the form (6.3.22) is dense i n H,(Y) and that H {Y) is a closed subset of H,(Y), we conclude that H,{Y) = Ht{Y). • a
+
n +
J
t
This lemma implies that (6.3.19) can be replaced by r(t,u>) = j f ^ (
u
,
w
) - ^
l ( u , s ) d K ( s , ) ) du +
fl(t,w),
w
(6.3.23)
where, by (6.3.20), l{t s) is a Volterra function belonging to L ^ j O . T ] ) . We consider a Volterra equation 1
t
l(t, u)k(u,
fc(t, s) + j
s) du = -l(t,
s),
0 < s < t < T,
(6.3.24)
which is obtained from (6.3.2) by replacing the integral kernel fl(i,s) by a Volterra function i(*,s). We shall show a similar result t o Lemma 6.3.1. L e m m a 6.3.5. The Volterra equation (6.3.24) has a unique solution k ( t , s ) £ L ( [ 0 , T ] ) such that 2
2
k{t,s)
+ J
= 0 for ( <
k(t,s)
k(t,u)I(u s)du
s.
The solution fc(f,s) satisfies
= -l(t,s),
t
0<s
(6.3.25)
Moreover, i f i ( t , s ) is continuous in ( t , s ) on {((,s) £ R ; 0 < s < t < T } , then fc((, s) can be taken as a continuous function on the same set. 2
P r o o f . Define ('"'((,s) i n the same manner as i n (6.3.3), and define k ( t , s ) by
n=]
We can easily show that the right-hand side converges i n L ( [ 0 , T ) ) and fc(f,s) 2
2
satisfies (6.3.24) and (6.3.25). To show the uniqueness of the solution, let fc[((,.s) and
k {t,s) 2
be solutions of (6.3.24). Then fc (t,s) 0
Since ;
( n l
= (-1)" j
k [t, 0
l< >(t,u)k {u, ) n
a
s
s) = fci((,s) — du,
k (t,s) 2
Wn,
satisfies (6.3.26)
- » 0 i n i ( [ 0 , T ] ) as n - » co, we conclude that fc (f,s) = 0. Thus the 2
2
solution of (6.3.24) is unique.
0
•
The solution k ( t , s ) of (6.3.24) is called the r e s o l v e n t k e r n e l of the Volterra kernel (((, s).
244
CONTINUOUS
GAUSSIAN
CHANNELS
T h e o r e m 6.3.6. The stochastic equation (6.3.23), representing a white Gaussian channel w i t h linear feedback, has a unique solution Y(t,u>)
= %rt,vi+£
^J\u,»)dY (a,u)j
du,
0
(6.3.27)
where k(t, a) is the resolvent kernel of Z(r, s) and %fau)=
f(u,u>)du
I
Jo
+ B(t,u>).
(6.3.28)
Furthermore i t holds that H,{Y)
= H,(Y ),
0
0
(6.3.29)
P r o o f . Using Y o f (6.3.28), (6.3.23) can be rewritten i n the form Q
Y(t)
Let
Y = {Y(t)}
£
+ j
(J"
/(u, s) dYfrj)
du = Y (t).
(6.3.30)
0
be given by (6.3.27). Then, noting l(u,v)k(v, )dY {a)J s
0
dv = £
l(u,v)k(v,s)dY {a))
dv
from (6.3.27) and (6.3.24), we see that
= 0. Thus
Y
satisfies (6.3.30). Conversely, let
Y =
{Y(t)}
be a solution o f (6.3.30).
Then, using (6.3.25) i n place of (6.3.24), the same argument as above shows that Y satisfies (6.3.27). This means that Y given by (6.3.27) is the unique solution of (6.3.30), equivalently of (6.3.23). T h e relation (6.3.29) is clear from (6.3.27) and (6.3.30).
•
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
(II)
2-15
R e m a r k . Equation (6.3.28) presents the non-feedback Gaussian channel corresponding t o the feedback channel (6.3.23). We have already seen that additive feedback does not change the mutual information and the filtering error (see (6.2.15) and (6.2.16)).
Clearly the invariance (6.3.29) under linear feedback is
a stronger property than (6.2.15) or (6.2.16), and we obtain equations £(f) = £[£(01 > » , u <
t] = £ [ £ ( 0 1 Y l u ) , a
f
u
Mi, a) dY„(s)
7
(6.3.31)
Jo
and I,{t,Y)
= I,(Z,Yo)
(6.3.32)
=
where k(t, s) is the solution of (6.3.2). One of the most important model of an additive feedback channel is presented by
f {£(«,«)-£>,«)}*< + *(*,«),
Y(t,*)=
(6.3.33)
Jo where £(ti) = £[£(u)l K ( s ) , s < u j . I t is shown that the feedback i n (6.3.33) is linear and that the signal energy ( = the variance of the input) is minimized by the feedback. T h e o r e m 6.3.7. Under the assumptions of Theorem 6.3.6, the stochastic equation (6.3.33) has a unique solution Y(t,u)
where
Y
0
= Yo(t,u)-
J
(f
h(v,a)dY (a,u)j
is given by (6.3.28) and
= {Y (t)} 0
du,
0
k(t,a)
(6.3.34)
is the solution of (6.3.2). The
filter of ((t) is
tO>j[
l(t,s)dY(s)
=
jf
(6.3.35)
h(t,s)dY (a), 0
where /((, a) is the resolvent kernel o f — h(t, s). Furthermore, for any white Gaussian channel (6.3.19) w i t h additive feedback, i t holds that Elm
- c(*)i ] > mm 2
- mn
o< *< r.
(6.3.36)
P r o o f . By Lemma 6.3.5, — h(t, s) is the resolvent kernel of i(r, a), and by Theorem 6.3.6, the solution of Y{t,v)
= Y (t,w)-jf 0
( T
l{u,s)dY(a w)^ t
du,
(6.3.37)
246
CONTINUOUS
GAUSSIAN
CHANNELS
is given by (6.3.34). Combining (6.3.31), (6.3.34) and (6.3.37), we have E[ttt)\Y(u),u
h(t,s)dY ( )=
f Jo
[ Jo
D S
Kt, )dY{s), 3
meaning that the equation (6.3.37) coincides w i t h (6.3.33).
Thus
Y
=
{Y(t)},
denned by (6.3.34), is the unique solution of (6.3.33) and the filter is given by (6.3.35). Finally (6.3.36) is clear from (6.2.16) and (A.2.8).
E x a m p l e 6.3.3.
(Continuation of Example 6.3.1)
•
Let a message £ (w) be a D
random variable w i t h distribution jV(0,
by M
P M
=
e K P
a
jf
p ( u )
2
p(t)
where p(i) > 0 is a function such that
2
d
t
s
( 6
t
.3.
3 8 }
dt < co.
The channel to be
+ Z?(r),
(6.3.39)
considered here is =
Y(t)
where £o(u) = £[£o| l^fs),
3
/ ' -4(u)(£ Jo 0
Mu))du
1? ] - Then the average power is equal to u
A(t) E[(to-Um }
= p(t) ,
2
2
(6.3.40)
1
and the mutual information and the filtering error are given by
W o ^ ^ ^ J ^ p l u f d u ,
El\So
~ fo(()i ] = ^ e*P ( "
(6.3.41)
P(«)
f
2
.
Indeed, since the Gaussian process £ = { £ ( ( ) } , £(*,">) = function R(t, s) = a A(t)A(s), 2
(6.3.42)
A(t)( (w), 0
has covariance
the solution of the Wiener-Hopf equation (6.3.2) is
given by k(t,s)
= p(t)p(s)exp
where we have used (6.3.10). Particularly
p(u) du}
,
2
k(t,t)
=
p(t) . 2
t >
Thus (6.3.41) follows
from (6.3.32), and (6.3.42) and (6.3.40) follow from (6.3.31) and (6.3.6).
We t u r n back to the case of channels without feedback.
Consider the white
Gaussian channel (6.3.1) without feedback, where the message £ is a Gaussian
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
247
(II)
process w i t h continuous covariance function R(t, s). Then the mutual information can be evaluated by using the eigenvalues of the integral operator R with R(t, s) as the integral kernel. The method developed i n the following corresponds to the orthogonal expansion method which is applied i n Sec.
5.3 to calculate mutual
information i n discrete Gaussian channels. Let S € (0,T] be fixed. The integral operator R on L ( [ 0 , S ] ) , defined by 2
Rf(t)
=
f R(t,s)f(a)d3, Jo
(6.3.43)
feL\[Q,S]),
is called the covariance operator of £. I f there exist a number A and a nonzero function / € £ ( [ 0 , S j ) such that 2
Rf
= A/,
(6.3.44)
then A and / are called an eigenvalue and a corresponding eigenfunction, respectively, of R. I t is known that, for every eigenvalue A ^ 0, (6.3,44) has at most a finite number of linearly independent solutions. The maximum number of linearly independent eigenfunctions corresponding to A is called the multiplicity of the is a covariance function, the operator R has the follow-
eigenvalue A. Since R(t,s) ing properties:
(i) R is symmetric, i.e., (Rf,g)
= (f,Rg),
f,g
£ £ ([0,S])j and J
(ii) R is positive definite, i.e., {Rf, f) > 0, / € L ( [ 0 , 5 ) ) , where (•, •) denotes the 2
inner product of L ( [ 0 , S]). The operator R has at most countable eigenvalues and 2
they are a l l non-negative real numbers. We may arrange the eigenvalues in decreasing order: Ai > - • • > A
> • • • > 0, here ) A „ ) contains all eigenvalues appearing
n
in accordance w i t h their multiplicity. Moreover, we may choose the eigenfunction /„(f) corresponding to A„ i n such a way that /„(() is a continuous function and {/n(*)i * *
=
1 ) 2 « . J forms an orthonormal basis (c.o.n.s.) of £ ([0, S]), so that 2
(
(/*,/») = f /-(0/»(0* S
= *mn,
= VV»-
(6-3.45)
Jo Mercer's expansion theorem says that R(t, s) can be expanded in the form R(^s) = £ A f (()/,d>), 71=] n
n
where the series on the right converges uniformly and absolutely.
(6.3.46) I t is noticed
that, combining (6.3.45) and (6.3.46), we have V A „ = f R(t,t)dt. Tr, Jo S
(6.3.47)
248
CONTINUOUS
GAUSSIAN
CHANNELS
Needless to say, the eigenvalues and the eigenfunctions vary w i t h the interval [0, SI. If i t is necessary to specify the dependence on S , we denote A „ ( S ) and /,,((; S ) i n place of X„ and /n(t)- I t can be shown that / „ { ( ; S ) is also continuous also i n S. T h e o r e m 6.3.8. T h e m u t u a l information transmitted over the channel (6.3.1) without feedback is given by
where { A ( S ) } are the eigenvalues of the covariance operator of the Gaussian n
process £. To prove the theorem we prepare two lemmas. L e m m a 6.3.9. The integral equation a) + /
fc(t,u)fl(u,s)du
=
R(t,3),
0
(6.3.49)
Jo
has a unique solution fc(i,s) = fc(f,s;
6 £ ([0,S]).
S)
2
T h e function
k(t,s)
is
continuous i n ( t , s ) . P r o o f . The solution of (6.3.49) is given by *(*.•») = £
TX^T-Z-CO/nW-
(6.3.50)
Indeed, the series on the right converges uniformly and absolutely to a continuous function
fc(i,s).
Using (6.3.45) and (6.3.46), i t is easily shown that the function
fc(f, s) defined by (6.3.50) satisfies (6.3.49). The uniqueness of the solution can be shown i n the same manner as i n Lemma 6.3.1.
•
R e m a r k . I f S = f i n (6.3.49), then the integral equation coincides w i t h (6.3.2). Hence we have k(t,s;t)
0<s
= h{t, ), 3
(6.3.51)
where h(t, s) is the solution of (6.3.2). Moreover, i n the same manner as i n Theorem 6.3.2, i t can be shown that, based on the observation K ( u ) , u < S , t h e filter of f ( t ) ( t < S ) which minimizes the mean square error is given by E[fX0l
/ fc(f,s;
S)dY(a).
Jo
Note that (6.3.5) can also be derived by p u t t i n g S = f i n (6.3.52).
(6.3.52)
6.3 MUTUAL
INFORMATION
IN WHITE
GAUSSIAN
CHANNELS
(II)
249
L e m m a 6 . 3 . 1 0 . For each n , A „ ( S ) is differentiable and A
A
„(
S
) = \ (S)f {S; n
S) .
(6.3.53)
2
u
P r o o f . For any k i t follows from (6.3.45) that 0=
/
/„(t;S + n ) d i - /
=
f (t;S) dt
J
Jo
2
n
Jo Un(.t;S
I Jo
+ h) + f {t;S))(f (_t-S n
+
n
h)-Ut;S))dt
rS+h +
J IS
Mt-.s
s
+
hfdt.
Since /„((; 5 ) is continuous, dividing by h and then letting h tend to 0, we have lim | f
S
/„(*; S + h)(f (t;S
+ h) - /„((;
n
s
Note that MS)
=
S))dt
= -h {S;
S) . 2
n
(6.3.54)
s
/
/
Ja
Jo
B{t,a)U(t;
Stfjj*
S ) dsdt.
(6.3.55)
Using (6.3.55), (6.3.54) and (6.3.44), we can show that lim \ ( K ( S + h ) - A (S)) = A„(5)/ (5; S ) . n
Thus we have obtained (6.3.53).
n
2
•
P r o o f o f T h e o r e m 6.3.8. By (6.3.7) and (6.3.51),
For convenience, denote by 7 ( 5 ) the right-hand side of (6.3.48). By differentiation, we obtain
US) n= l
-ijK5,S;5),
(5)
(6.3.57)
250
CONTINUOUS
GAUSSIAN
where we have used (6.3.53) and (6.3.50).
CHANNELS
Apparently l i m s _ o
l i m s - o /
0
I ( S ) = 0.
J2T=i
*«(^)
=
Consequently, by
(6.3.56) and (6.3.57), we have
Is(t,Y) The proof is complete.
= j * ±I(t)dt
=
I(S).
•
6.4 C A P A C I T Y O F W H I T E
GAUSSIAN
CHANNELS
In general i t is a difficult task to compute the capacity of a channel. However, under an average power constraint, the capacity of a white Gaussian channel can be determined explicitly. It will also be shown that the capacity is not increased by feedback. This is an interesting property satisfied by white Gaussian channels. In general, the capacity is increased i f feedback is available. The channel to be considered here is a white Gaussian channel (6.1.6) under the conditions ( A , 0 ) — ( A , 3 ) given i n Sec. 6.1. As was mentioned previously, i t is D
0
practical to assume that the channel has some constraints on the channel input. A n average power constraint, which is described i n terms of the second moment £[|x(t)| ] of the signal, is the most common limitation. I n the sequel, we deal w i t h 3
average power constraints of the forms (6.4.1)
J EMt)\>}dt
i
(6.4.2)
T
and £Ht)| ]
2
Vt,
(6.4.3)
where p > 0 is a constant and p(t) > 0 is a continuous function. These constraints correspond to the constraints (5.3.1) - (5.3.3) i n the discrete time case. For the feedback white Gaussian channel subject to (6.4.1), we denote by the class of all admissible pairs (£, i ) of a message £ and an input Afi)
=
x =
( £ . * ) satisfies ( A „ , 1) - ( A , 3 ) and (6.4.1)}. 0
{x(t}}:
A(p)
6.4 CAPACITY
OF WHITE
GAUSSIAN
251
CHANNELS
Under the constraint (6.4.2) or (6.4.3), the class A(p) of all admissible pairs is defined i n the same way, replacing (6.4.1) w i t h (6.4.2) or (6.4.3), respectively. Recall that the capacity C j = C j { p ) (up t o time T) o f the channel (6.1.6) w i t h feedback has been denned as
C j ( p ) - s u p { / ( £ , y ) ; { ( , x ) £ A(p))
(6.4.4)
T
(see Definition 4.4.2). T h e capacity C
= C {p)
T
(up t o time T) of the same
T
channel w i t h o u t feedback is given by
c { p ) = supfz-H*,r) T
; i
e *(/>)},
(6-4.5)
where X ( p ) is the class consisting of all inputs x = { i ( t ) } which satisfy the given average power constraint and are independent of the noise
B =
{B(t)).
T h e o r e m 6.4.1. The capacity of the white Gaussian channel (6.1.6) subject t o the constraints (6.4.1) - (6.4.3) is not changed by feedback. Under the constraint
(6.4.1) or (6.4.2), the capacity is given by
C ( p ) = Cj(p) = \p T. T
(6.4.6)
2
Under (6.4.3), C
T
( p ) = C } {
P
)
= \
£
(6.4.7)
p(t) dt. 2
If the channel is w i t h feedback, then there exists an admissible pair (£, x) € A(p) by which the mutual information equal to the capacity is transmitted: I (f,Y)
(6.4.8)
= Cj( ).
T
P
P r o o f . Suppose that the constraint (6.4.2) is imposed. Clearly C {p) T
by definition. For any admissible pair (£, x ) , inequalities
h i a y ) < \ [
follows from
mm\ ]dt<\p T 2
(6.2.12) and (6.4.2). Thus we have C (p) T
< Cj{p)
<
\ T. 3
P
2
< Cj(p)
252
CONTINUOUS
GAUSSIAN
CHANNELS
To prove (6.4.6), i t suffices to show t h a t , i n the non-feedback case, there exists an input x such that the mutual information IT(X, Y ) is sufficiently close to 2 Jo
dt.
Let an input
x =
be an Ornstein-Uhlenbeck Brownian mo-
{x(t)}
tion w i t h covariance function R ( t , s) of (6.3.18) w i t h a = p. Using (6.3.18) and (6.3.17), we see that the constraints (6.4.1) and (6.4.2) are satisfied and
where c is given by (6.3.16) w i t h y/a
+ lap
2
2
— a -> p . 2
b
=
2
lap . 2
Let
co, then c —» —oo and
a -*
This means that, i f a is large enough, then
to p T / 2 . Thus (6.4.6) is proved under the constraint (6.4.2). 2
IT(X,Y)
is close
The proof given
above remains valid without any changes, even i f the constraint (6.4.2) is replaced by (6.4.1). Under the constraint (6.4.3), (6.4.7) can be shown analogously. I f the channel is with feedback, then Example 6.3.3 shows that there exists an admissible pair (£, x ) by which the capacity is attained.
Remark,
(a)
•
I n the case where the white Gaussian channel is w i t h feedback,
as will be shown i n the next section (Theorem 6.5.3), for any Gaussian message £ there exists an encoding scheme by which the capacity is attained, namely (6.4.8) holds. (b)
I n the case where the channel subject to (6.4.3) is w i t h o u t feedback, the ex-
istence of an admissible Gaussian input x = ( i ( f ) } , for which the m u t u a l information J T ( I Y ) transmitted over the channel is approximately equal to the capacity, X
is shown as follows. Divide the interval [0, T] into sufficiently small subintervals Alt,
k — 1,
K . Let
k = 1 , K , be mutually independent Gaussian random
variables w i t h distribution
where
N(Q,pj.),
p\
= min,^,
A n d define
p(t) . 2
\x(t)}
by x ( t ) = Ct
if
(SAjt.
Then, as is shown i n Example 6.3.1, 1 IT(X,Y)
=
K
-Y,1°&{1 fc=i
+
\&M).
where |A| denotes the length of an interval A. Let K -* oo and m a x t |A*] —* 0. Then i t can be shown that
if>g(l
+ |A |pi.) t
—
lj p(t) dt T
o
2
=
C (p) T
=
Cf{p).
6.4 CAPACITY
OF WHITE
GAUSSIAN
CHANNELS
253
Thus the capacity is approximated by the mutual information. We proceed to consider a white Gaussian channel on infinite time interval [0, oo). Assume that a constraint lirn i
[ E[\x(t)\*\dt
(6.4.9)
T
(p > 0 is a constant) is imposed on the channel inputs. The per unit time capacity is defined i n Definition 4.4.3. We denote the class of all admissible pairs (£,x) of a message £ and an input x = ( i ( t ) l 0 < t < oo} by MP)
= iU,x);
( £ , * ) satisfies ( A „ , l ) - ( A „ , 3 ) and (6.4.9)}.
Then the per unit time capacity C/ = C/(p)
of the channel (6.1.6) w i t h feedback
is given by C,(p)
= sup{7(£, Y); ( & « ) € 3 , ( p ) }
(6.4.10)
On the other hand, the per unit time capacity C{p) of the same channel without feedback is given by C(p) = sup{7(s, Y); x £ X(p)}, where X(p)
is the set of all processes x = {x(t)}
independent of the noise B =
(6.4.11)
which satisfy (6.4.9) and are
{B(t)}.
T h e o r e m 6.4.2. Under the constraint (6.4.9) the per unit time capacity of the white Gaussian channel (6.1.6) is given by C( ) P
= C,(p)
=
\ *. P
If the channel is w i t h feedback, the capacity is achievable, namely there exists an admissible pair (£, i ) 6 -4(p) such that l(tY)
= C,(p)
=
\p . 7
This theorem can be proved i n the same way as Theorem 6.4.1, so the proof is omitted.
254
CONTINUOUS
GAUSSIAN
CHANNELS
6.5 O P T I M A L C O D I N G S I N W H I T E G A U S S I A N
CHANNELS
Let us consider a white Gaussian channel (6.1.6) w i t h feedback, so that the conditions ( A , 0) - ( A , 3 ) are satisfied. I n this section, we shall study the problem 0
0
of o p t i m a l codings when a Gaussian message is to be transmitted over the channel. It will be shown that there exists an optimal coding scheme which maximizes the mutual information and at the same time which minimizes the filtering error. The o p t i m a l coding is given by using a linear feedback. Assume that the average power constraint (6.4.3) is imposed on the channel input.
Then the capacity is equal to Cj
=
j j
g
p{t)
2
dt (see (6.4.7)), where
T < oo is the terminal time. Suppose that a message £ = { £ ( ( ) ; 0 < t < T] is a Gaussian process, then an input signal x = ( z ( ( ) } satisfies (Ao, 2) and a function x(t,C{-,i>>),Y '(tA>)) 0
on the right-hand side of (6.1.7) indicates how to encode the
message w i t h the aid of the backward information about the o u t p u t up t o t. So an input x may also be called an encoding x. For a given message ft denote by A(p;()
the class of all possible encodings: A(pU)
= {x; x satisfies ( A , 2 ) , ( A , 3 ) and (6.4.3)}. 0
0
Each element x of A ( p ; f t l is called an admissible encoding. We consider a simple but quite important case where a message process £ = {£(<)} is given by £(t)=ft,
0
(6.5.1)
here ft is a Gaussian random variable w i t h distribution iV(0,1). To transmit the message we use the coding scheme given i n Example 6.3.3. Denote by x' = ( i * ( ( ) } the encoding given by (6.3.39) and by V Y-(t)
= { K " ( i ) } the corresponding o u t p u t :
= f
A » ( £ ( u ) - £ • ( « ) ) du +
=
X
f *lu)du Jo
B(t)
+ B(i),
(6.5.2)
where A ' ( u ) is the function of (6.3.38) and £ * ( « ) = E [ f t u ) | Y*(s),
s < u ] . By
(6.3.40), the encoding x' is admissible. Since the capacity C J is attained by this coding scheme, for any other encoding x e -4(p;£), the inequality
it%ry$i,{z,Y*% holds, where Y = {Y(t)}
o
(6.5.3)
is the output corresponding to x. T h i s means that the
encoding x~ is o p t i m a l from the viewpoint of mutual information. I t is said that
5.5 OPTIMAL
CODINGS
IN WHITE
GAUSSIAN
ass
CHANNELS
a n encoding * • is o p t i m a l i n t h e s e n s e o f m u t u a l i n f o r m a t i o n , i f (6.5.3) holds for all i € A{p\
{)-
A t the same time, the encoding x' is optimal also in the sense that the mean square filtering error is minimized. Namely i t is shown that, for any x e A(p;
mm -
f(*)i] > m\m - Wfb
o
a
<
T
(6.5.4)
< T,
where £(u) = £[£(s)| K<s), s < u] is the filter corresponding to x.
(),
I n fact, by
(6.3.42) and Theorem 1.8.8, we have
mm
-
EM*)
~ m }
M i
2
)
= «jexp
(~£ \u)du P
)
and
2
> «p{-2/,(£(t),f(0)} ><7 e p{~2I,(f,Y)} 2
X
>a exp(-2C' ) 2
f
— (7 exp^— J J
yielding (6.5.4). filtering
I t is said that an encoding i"
if (6.5.4) holds for all x 6 Aip;£).
p (u)du 3
is o p t i m a l i n t h e sense o f
Summarizing the arguments given
above, we have obtained the following theorem. T h e o r e m 6.5.1. Under the constraint (6.4.3), to transmit the Gaussian message £ of (6.5.1) over the white Gaussian channel (6.1.6) w i t h feedback, the encoding x" given by (6.5.2) is o p t i m a l not only i n the sense of mutual information but also of filtering.
Note that, as was seen i n Sec. 6.3, the feedback i n (6.5.2) is linear. Thus, even if nonlinear coding schemes are available, we have seen that the optimal coding scheme is linear and the resulting system {£(•), !"(•)> & ( ) } is Gaussian. A n analogous result has been established for a Gaussian Markov message. Let £ = { £ ( ( ) } be a process given by (6.3.12). We assume that the available encoding schemes are restricted to those given in the form, x(i, ) = (t,£(t,«),K H), w
I
0
,
0
(6.5.5)
251
CONTINUOUS
GAUSSIAN
CHANNELS
Note that, w i t h respect to the message, an input signal x(t) at t depends only on £{(). I n this case, the class A ( p ; £) of all admissible encodings is described as A { p \ Q = [m x satisfies ( 6 . 5 . 5 ) , ( A , 3 ) and (6.4.3)}. 0
The following result has been shown {[87]). We give the statement of the result without proof due to limited space. T h e o r e m 6.5.2. Let a message £ = { £ ( ' ) ) be the Gaussian Markov process given by (6.3.12) where we assume that a(t) and b(t) are bounded. Let A ' ( t ) > 0 be an amplification function given by A - ( f ) '
2
^ = exp { j f W )
- pfu)) du }
+ j f V s ) e x p { ^ ' ( 2 ( u ) - * > < u } ) d « } ds, 2
a
where a is the variance of £(0). Then the encoding x* of (6.5.2) is o p t i m a l i n the 1
sense of filtering as well as of mutual information. I n this case the average power of x ' is equal to
- C(t)\ ] = p ( i ) .
B[|**(t)|*3 = A\t?E\W)
2
(6.5.6)
2
Moreover the mutual information and the filtering error are given by h(t,Y')
= C)
=
| £
p{uf
du
and
E[\i{i) - C(t)\ )
« P { j f (2"(u) - Pi")) d w }
2
+
a
2
j
6(s) exp|y (2a(u) 2
p{u))du
J
ds.
R e m a r k . To be precise, in order to prove the theorem, we need some additional assumptions on the class A(p; f ) concerning continuity and boundedness of encod¬ ings x ( t ) = * { ( , £ ( ( ) , 1?). By analogy, one may expect that for any Gaussian message, not necessarily Markov, the encoding x ' determined by (6.5.2) is optimal. However this conjecture
6.5 OPTIMAL
CODINGS
IN WHITE
GAUSSIAN
CHANNELS
257
remains to be proved. As a partial answer to the conjecture, i t is shown that the encoding x' is optimal in the class of encodings w i t h additive feedback. I f a coding scheme is given i n the form F(*,w)=
f
A(«)tt{u,w)-C(«.«)}
JO =
then
x
/ Jo
+ B(t,u>),
x(u,u)du
is called an encoding w i t h additive feedback, where A ( u ) > 0
— {x(t)}
is a bounded non-random function and C(u,w) Y{s),
(6.5.7)
= C(u,Y *(fa>)) is a function of 0
s < u. We denote by A ^ a = A d d ( p ; 0 the class of all encodings w i t h
additive feedback satisfying (6.4.3): A d ( p \ 0 = {x; x satisfies (6.5.7),(6.4.3) and ( A , 3 ) } . i d
0
6.5.3. Let a. message £ — { £ ( ( ) } be a Gaussian process such that
Theorem
a<j E{f(t)*\dt
0
1°) There exist a unique function A * ( i ) i * G Avii which satisfy (6.5.2) and (6.5.6).
> 0 (0 < t < T ) and an encoding
The encoding x ' of 1°) is optimal i n -4 dd i n the sense of filtering as well
2°)
4
as of m u t u a l information. Proof.
1°)
Let fcj = A , ( ) { f ( u ) - <,(«)} € A d d ( » - l , 2 ) and U
Yi =
[Yi(i)}
be
the corresponding output. Then the following properties (i) and (ii) can be shown. (i)
If Ai(f)
£i(0H < (ii)
>
EM*)
for aU 0 < t <
At(t) ~
6(<)| 1, &>r every 2
If A i ( t ) > A (t)
T,
then M f t Y i ) > J i ( f t x j ) and
(, where ft(t) = £[£(t)| Y^u),
for all 0 < t < T and A , ( f ) > A (t)
2
2
Lebesgue measure, then
IT{(,YI)
>
u <
E[\f(t)
-
*].
on a set w i t h positive
Jr(ftlj).
Using these properties, one can construct the function A'(i)
which satisfies (6.5.2)
and (6.5.6). 2°)
Since E[x'(u)\K*(s),
s < ti] =
A'(u){E[£(u}\Y'(s),
s <
« J - £ * ( » ) ) = 0,
it is easily seen from (6.5.2) and (6.5.6) that /.{ft
Y*)
=
\ £
E\x'{uf)
du =
\ £
p(uf
du =
C).
Thus the capacity is achieved and the encoding x ' is optimal i n the sense of mutual information. Let x £ A ^ d be an arbitrary encoding given as i n (6.5.7).
Denote
258
CONTINUOUS
by A(f-) = 2
E[\({t)
-
GAUSSIAN
£(r)| ] and A * ( t ) 2
A'(t)
the filtering errors corresponding to
s
x ' , respectively. By (i), to prove A * ( t )
CHANNELS
0
On the contrary, suppose that A ' { t ) < A(t)
=
0
£
(6.5.8)
on a set w i t h positive Lebesgue
= max( A " ( t ) , A ( t ) ) and define a coding scheme by
0
Y (t)
and
2
> A(t),
measure. Then we define A (t)
x
< A * { ( ) , i t suffices t o show that
2
- f ( )}
A (u){tiu)
0
0
s jf
du + B(t)
U
x (u)
du +
0
B(t).
By (i), A ( i ) < m i n ( A * ( i ) , A ( i ) ) , where A ( i ) is the filtering error corresponding 0
D
to IQ.
Therefore J S M O
2
= A (f) A (f)
]
meaning that x
0
0
2
0
2
< m a x ( A ' ( ) A ' ( ( ) , A(f) A(t.) ) E
I
1
2
2
=
(t)\
P
£ A^dd. Thus by (ii)
This is a contradiction. Thus (6.5.8) should be true. I n the same way, the uniqueness of the function A * { t ) i n 1°) can be proved.
•
6.6 G A U S S I A N C H A N N E L S W I T H O U T
FEEDBACK
In the preceding sections, we have discussed Gaussian channels w i t h additive white noise. I n this section, we consider channels w i t h additive Gaussian noise of general type, assuming that the channels are
without
feedback.
The case of
channels w i t h feedback will be discussed i n Sec. 6.8. Since the channels are w i t h o u t feedback, Hilbert space method can be applied to calculate m u t u a l information and channel capacity. We start w i t h the calculation of the mutual information between two Gaussian processes.
Let
X
=
{X{t);t
£ T}
and
Y
=
{Y(t);t
E
T)
be mean continuous
Gaussian processes w i t h mean zero and finite variance, where the time parameter space T is [0,cc) or a subinterval of [0,oo) and a process to be mean continuous if l i m , _ , E\\X(s) pairwise Gaussian, namely
{X(t),
Y(t);
t
X
=
{X(t}}
is said
- X ( ( ) t ! = 0. Assume that (X, Y ) is 2
£ T ) forms a Gaussian system. T h e n the
6.6 GAUSSIAN
CHANNELS
m u t u a l information I(X,Y)
between X
WITHOUT
259
FEEDBACK e T } a n d Y = {Y{t);t
= {X(t);t
£ T)
is c a l c u l a t e d i n the s a m e w a y as i n T h e o r e m 5.3.3 for discrete time G a u s s i a n processes.
W e denote b y H{X),
e T ) , {Y{t);t
{X(t);t
operators Bx
J
£ T } a n d {X{t),Y{i);t
€ T ) , respectively, a n d define
a n d By o n H b y (5.3.3) using t h e orthogonal projection operators
I l x a n d f l y o n H{X) a c.o.n.s.
a n d H subspaces of L ( f i , P ) spanned by
H(Y)
V = {V ;n n
a n d H{Y). £ Z j
U s i n g L e m m a 5.3.2, we c a n see that there exists
(resp.
+
€ Z + l ) of H(X)
V = {V ;n n
consisting of t h e eigenvectors of B
(resp. By)
X
(resp.
m,n e Z ,
(6.6.1)
+
where
{ - , •} denotes t h e inner product
eigenvalues of B
X
of L*(Sl,P)
a n d A„, n 6 Z , are t h e +
a n d B y . W e call (U, V) = ({(/„},
onahzation of (X, Y).
K(K))
such that
{V }) n
a simultaneous orthog-
W e c a n prove the following result w h i c h is corresponding to
T h e o r e m 5.3.3. Theorem
6 . 6 . 1 . Let X
=
€ T ) and Y
[X(t);t
= {Y(t);t
€ T } be G a u s s i a n
processes s u c h that (X, K ) is pairwise G a u s s i a n . T h e m u t u a l information I(X, is finite if a n d only if Bx
eigenvalue. I n this case, t h e m u t u a l information I(X,
I(X,Y)
= IiV,V)
=
f^I{U ,V ) n
=
n
=
{X{t))
Y)
is given by
-l|Sog(l-
n=l
Note t h a t , e v e n if X
Y)
is a n operator of trace class a n d does n o t have 1 as a n
A„).
(6.6.2)
n=l
and Y
=
{Y(t)}
a r e not G a u s s i a n , the first
equality of (6.6.2) is true; I(X,Y)
(6.6.3)
= I(U,V).
W e now t u r n to discuss t h e information stability of continuous time G a u s s i a n processes.
L e t (X,Y)
=
[0, oo) a n d denote by IT{X, t < T)
a n d Y0T.
{X{t),Y{t);t Y)
S T } be a G a u s s i a n s y s t e m , where T m
the m u t u a l information between Xj
= {X(t);0
where f t j P i " v " * a n d / i x y denote t h e probabihty distributions of Xj, joint d i s t r i b u t i o n , respectively. W e denote by T ,
T.
m
<
W e put
= [(«,») € R ^ x
( T )
Yj" a n d t h e
the £-typical set:
1 |kiv («.1f) " W . * " ) | < e} OT
(6.6.4)
CONTINUOUS
260
GAUSSIAN
CHANNELS
(pi^KP'jl*
(cf. (4.5.7)). For each fixed T > 0, Set ( t f < > , V< >) = r
T
simultaneous orthogonalization of {Xj,Y ).
€ Z + ) be a
A p p l y i n g Theorem 6.6.1, we obtain
T
0
co h(X,Y)
Since {U$P\n H (X) T
= KV^VW)
g Z } (resp. +
= Y I(U^X /
{vi ;n
T
>
)-
(6-6.5)
€ Z } ) forms a c.o.n.s. of the subspace
T)
+
spanned by X(t)
(resp. H (Y)),
T
(resp. Y(t)),
0
u!P
and
can be w r i t t e n i n the form
U P =
[
T
f (t)X(t)dt,
V™
n
=
Jo Moreover we can show that ip^ \x,y) T
fg (t)Y(t)dt, Jo
nEZ+.
n
is invariant,
(x,y)
= ^ (v,v),
T
(6.6.6)
T)
under the transformation ( z , y ) = ( { z ( t ) } , {y(t)})
( u , v ) =J ( ( « „ ) , { « „ ) ) given
-*
by Jo
W(M<)d(,
t , „ = / g„{t)y(t}dt, Jo
neZ+,
where j (T) du'y
X
U
'
V
and (i'Jji denote the probability distributions o f U^ \ V ' * ' and
here //J"*,
T
the joint distribution, respectively. Define a set
f
(T)
t
7
by
= { ( « , » ) ; | | l o g ^ ( « , « ) - /(i/< »,v*i)| < j . . m
r
e
Then ^ ( ^ W c r ^
5
) -
(6.6.7)
Using (6.6.2) and (6.6.7) i n place of (5.3.17) and (5.3.18), we can prove that a continuous time Gaussian process is information stable, i n the same way as i n the discrete t i m e case (cf. Theorem 5.3.5).
6.6 GAUSSIAN
Theorem
CHANNELS
6 . 6 . 2 . Suppose that
(X,Y)
WITHOUT =
261
FEEDBACK
g [0,co)J is the same
{X(t),Y(t);t
pairwise Gaussian system as in Theorem 6.6.1. 1°)
For any
0 and
T >
£ >
^
0,
K
m
)
>
l
-
^
.
(6.6-8)
The pair ( X , Y ) is information stable, provided that
2°)
r ™
M
§in=<>-
1
Let us apply the simultaneous orthogonalization method developed above to calculate m u t u a l information transmitted over a Gaussian channel represented by (6.1.1). Assume that the channel is without feedback, so that every channel input X
=
{X(t)}
is independent of the Gaussian noise
Z =
I n the following
{Z(t)}.
we assume that the noise process Z is mean continuous. Theorem
6.6.3. Let the Gaussian channel (6.1.1) be without feedback and a
channel input X be a mean continuous (not necessarily Gaussian) process. Assume that the operator B x is of trace class and does not have 1 as an eigenvalue. Let {U,V) W
{ V } ) be a simultaneous orthogonalization of
= ({U„}, = {W ] N
n
(X,Y)
and define
by = T/KV*
V
N
n g Z+,
+ W, N
(6.6.10)
where ( A „ ) are the eigenvalues of B x - Then (6.6.10) presents a discrete white Gaussian channel and the mutual information I(X,Y)
= T[U,V)
= Jim
=
is calculated by
I(X,Y)
£ ( { # , , U
l i m {h (V) JV—oo
N
) , ( V
U
V
N
)
- h (W)}.
N
(6.6.11)
N
P r o o f . To prove that (6.6.10) presents a white Gaussian channel, we have to show that W , n g Z , are mutually independent Gaussian random variables and that +
n
W = {W } n
is independent of U = {[/"„}. Since U
m
and V
m
are orthogonal t o U
a
and V„ ( m / n ) , i t is clear that W , n g Z , are mutually orthogonal. Noting +
n
\\U \\ = n
1, from (6.6.1) and (6.6.10), we know that {U ,W )=Q m
n
for all
(U ,W„) n
m,ngZ . +
= 0, so that
262
CONTINUOUS
GAUSSIAN
CHANNELS
This means that W
n
e H(X) .
(6.6.12)
L
O n the other hand, i t follows from (6.6.10) and (6.6.1) that Wn 6 n(X,Y)
c H(X)ffi
H{Z),
where H(JT) ffi W(Z) denotes the direct sum of H{X)
(6.6.13) and H{Z).
(6.6.12) and
(6.6.13) mean that
w
n
Thus 0 = {Vn}
e W(Z),
n e z+.
is independent of W = {W„}.
(6.6.3) and Theorem 5.1.1.
Finally Eq. (6.6.11) follows from
•
We proceed to study the capacity of the Gaussian channel (6.1.1) without feedback under an average power constraint. Mathematically speaking, an average power constraint is specified by a class of covariance functions. I n the following, we fix a time parameter space T = [0, I*J, Let K be a class of covariance functions of processes w i t h time parameter space T . We assume that a process X = is possible t o be input, if the covariance function Rx(t,
{X(t)}
s) of X is i n R. Now the
class of all admissible inputs is given by X(R)
= {X; X is independent of Z and R
£ R).
x
It is shown that the capacity C — C(X{&)),
(6.6.14)
defined by (4.4.8), of the channel is
achieved by i n p u t t i n g a Gaussian signal. We denote by C
a
the capacity of the
same channel limiting channel inputs only on Gaussian processes. Define a class * , ( R ) by X {R) 3
= {X € # ( R ) ; A" is a Gaussian process }.
Then C = C ( * ( R ) ) . f
(
T h e o r e m 6.6.4. Under the constraint given by (6.6.14), the capacity C C{X(R))
=
of the Gaussian channel (6.1.1) without feedback is attained i n the
Gaussian system, namely C(X(R))
P r o o f . For any X = { A ' ° ( f ) } £ X (K) g
{X(t)}
= C(X,{R)).
(6.6.15)
6 A'(R), there exists a Gaussian process X"
w i t h the same covariance function as X.
=
Let us consider two
discrete Gaussian channels given by (6.6.10), corresponding to X and X . a
The
6.7 STATIONARY
GAUSSIAN
channels have a common noise W = {W }, n
corresponding t o X and X°,
263
CHANNELS
the inputs U = {U } n
and U° = {£/£}
respectively, have the same covariances, and U° is
Gaussian. T h e n , by (6.6.11) and (1.8.16), i t is shown that
I(U,V)
<
I(U",
V°),
and we have I(X,Y)
where Y,
V, V°, V
=
I{U,V)
<
I(U°,V}
=
I(X ,Y°),
(6.6.16)
a
are the outputs corresponding to X,
U, X",
X € A'(R) is arbitrary, (6.6.16) implies that C is not greater than C . g
C is not less t h a n C,. Thus (6.6.15) is obtained.
V
Since
By definition
•
When an additive noise is not Gaussian, no formulas have been known to calculate the capacity.
However, one can give an upper and a lower bounds by
comparison w i t h a Gaussian channel. We consider two channels: Y { t ) = X ( t ) + Z(t), K°(t) = X ( t ) + Z' (t), ,
where
Z =
{Z(t)}
0<(
(6.6.17) (6.6.18)
is a non-Gaussian process, while Z° = { Z ° ( t ) } is Gaussian. We
assume that the channels are without feedback and the constraint given by (6.6.14) is imposed. Using the simultaneous orthogonalization method, i n the same way as in the discrete time case (see Theorem 5.5.4), we can prove the following theorem. T h e o r e m 6.6.5. Let C ( Z ) and C{Z°)
be the non-feedback capacities of the addi-
tive noise channels (6.6.17) and (6.6.18), subject t o the constraint given by (6.6.14). If the noise Z ° is Gaussian and has the same covariance function as Z , then the capacity C ( Z ) is bounded by C(Z°)
< C(Z)
< C(Z°)
+ H(Z;Z°).
(6.6.19)
6.7 S T A T I O N A R Y G A U S S I A N C H A N N E L S A channel (6.1.1) w i t h additive noise Z = ( Z ( ( ) } is called a s t a t i o n a r y c h a n n e l i f Z is a stationary process, where the time parameter space is T = R.
In
particular, if Z is stationary Gaussian, then the channel is called a s t a t i o n a r y
264
CONTINUOUS
GAUSSIAN
CHANNELS
G a u s s i a n c h a n n e l . This section is devoted to studying m u t u a l information t r a n s m i t t e d over a stationary channel.
Throughout the section we assume t h a t the
channel is without feedback, so that every input We start w i t h a white Gaussian channel 6 . 1 , a white Gaussian noise i/(() = B(t)
X
— {X{t)}
(6.1.6).
is independent of
Z.
As we have discussed i n Sec.
can be regarded as a stationarity process
( i n a generalized sense), the white Gaussian channel
(6.1.6)
can be considered as
( a n integral form of) a stationary Gaussian channel. Let the white Gaussian channel
Theorem 6.7.1.
an input signal
x =
be w i t h o u t feedback and
(6.1.6)
be a stationary Gaussian process w i t h S D F /(A).
{x(t}}
Then
the mutual information rate T(x, Y ) is given by lim
I(x,Y)=
T—°°
~I (x,Y)
1
=
T
The proof is based on Theorem
±-
l o g ( l + 27r/(A))dA.
(
J _
in
o
(6.7.1)
a
We need to examine the asymptotic
6.3.8.
behavior of the eigenvalues A „ ( £ ) as S —< co. Although we do not give the proof here, the following lemma has been known. Lemma
6.7.2.
on L (\0,S]) 2
Let A „ ( S ) , n = 1 , 2 , b e the eigenvalues of the integral operator
w i t h integral kernel R(t, s) = y ( t - s), where 7 ( 1 ) is the covariance
function of x =
\x{t)): 7
(t)=
f°
z f{\)d\.
(6.7.2)
iik
J-oo
Then, for any 0 < a < b such t h a t m ( { A ; / ( A ) = a}) ( m ( A ) denotes the Lebesgue measure of a set
^#
a
£ ^ ( S )
< &}
= m ( { A ; / ( A ) = 6}) = 0
A),
= ^ m ( { A ; a < /(A)
< 6}),
(6.7.3)
where # B denotes the cardinal number of a set B.
P r o o f o f T h e o r e m 6 . 7 . 1 . Note that / ^ , / ( A ) d A =
7
( 0 ) and
A (S) n
=
7 ( 0 ) 5 are finite. For any e > 0, we can choose a = a < o, < •. • < , = ft and a
S
0
such that m ( { A ; / ( A ) = a )) k
a
= 0 and the following conditions (a) - (e)
satisfied.
(a)
1 / 4^r J-oo
l o g ( l + 2 * / ( A ) ) d A - / l o g ( l + 2 r/(A))dA 1
JB
are
6.7 STATIONARY
GAUSSIAN
265
CHANNELS
whereB={A;c<<|A|5}.
£
where £ *
log(l + A„(S)) -
M l + A„{5))
<£,
> S ,
S
0
< 0.
denotes the summation taken over all t i such that or <
n
/ l o g ( l + 27r/(A))dA - V l o g ( l + 2 r a _ ) m ( y l * k=i )
jfc
1
1 , 8
where A
t
= {A; a _ , < /(A) < t
a }. k
(d)
< y,
where B * = {n;2wa -i k
25 2
=
5> 5 , 0
< A„(5) < 2 7 ^ } .
l o g { l + 2 A ( 5 ) ) < e.
# B * l o g ( l + Z*B»_ ) t
n
Combining (a) - (e), we have
j -
r
l o g ( l + 2*/(A)) dA - f ]
Fl=l
log(l + 2A„(5))
<5e.
Since e > 0 is arbitrary, we obtain (6.7.1) from (6.3.38) and (6.7.4).
(6.7.4)
•
Let us consider a stationary Gaussian channel (6.1,1), where the noise Z
=
{ Z ( f ) } is a stationary Gaussian process with SDF ff(A). We assume that an input X = {X(t)}
is also a stationary Gaussian process w i t h SDF /(A), I n this case, i t
is conjectured that the mutual information rate I(X, I(X,Y)
l i m l=lT(X,Y)
= I
—'OO
=
Y) is given by A(f,g),
(6.7.5)
1
whe
In some special cases, the formula (6.7.5) has been proved i n a rigorous way. For example, i f b o t h /(A) and g(\)
are rational SDF's, then (6.7.5) is true ([96]).
266
CONTINUOUS
GAUSSIAN
CHANNELS
However, i n general, there are some mathematical difficulties t o prove (6.7.5) rigorously.
Recall that, i n discrete time case, a result similar to (6.7.5) has been
proved i n Theorem 5.2.3. Note that the formula (6.7.1) can be considered as a special case of (6.7.5), because the white noise B(t)
is a generalized stationary Gaussian process w i t h
SDF ( A ) = ( 2 i r ) - . 1
?
For the stationary channel (6.1.1), i t is interesting to examine the behavior of IT(X,
Y ) as a function of T. For notational convenience we put Y£ = {Y(t);
t
S <
Then note that I {X,Y)
=
T
I{X£,Y ). T
0
The conditional mutual information L(T)
= KXl^Yl^Y^),
T>0
(6.7.7)
is a useful quantity because, as will be seen later, L ( T ) is linear i n T . We introduce the following notations:
MY)
=
I(Y? ,YF\Y ),
MY)
=
I{Yl ,Y«\Y(Q)),
T
oo
0
x
K (Y)^I{Y^,Y \Y(0)), T
T
(5, T > 0). Note that
0
monotonically increases to
Jr,s( ) y
as S - t co. The
J (Y) T
following theorem can be proved i n a way similar to the proof of Theorem 5.1.2. T h e o r e m 6.7.3. Let the stationary channel (6.1.1) be without feedback and an input signal
X
=
{X{t)}
be also stationary. Here 2 = ( Z ( i ) } and
X
=
\X(_t)}
are not necessarily Gaussian. 1°)
If
I (X,Y), T
J (Y) T
and
are finite for every
J {Z) T
T
> 0, then for all T > 0
L{T) = AT, I (X, T
(6.7.8)
Y ) = A T + J ( Z ) - J ( Y ) + C, T
(6.7.9)
T
where A and C are constants independent of T. 2°)
If
I {X,Y) T
I (X, T
for all
T >
0.
is finite for all
T
>
0, and if J ( V ) and 0
Y ) = A T + K ( Y ) - K ( Z ) + I(X(0), T
T
M
z
)
K(0)),
are finite, then (6.7.10)
6.7 STATIONARY
R e m a r k . W h e n J {Y)
GA USSIAN
+ J { )-
is finite we have J ( Y ) = K {Y)
0
367
CHANNELS
A
T
T
y
Therefore the
assumptions of 1°) are weaker than those of 2").
As an easy consequence of this theorem, we know that I T { X , Y )
diverges ap-
proximately linearly as T —» oo. T h e o r e m 6.7.4. Under the assumptions of 1°) of Theorem 6.7.3 we have lim (I (X, T—*oo
Y) - AT)
T
= C
(6.7.11)
and ^Ir(X,Y)
(6.7.12)
= A.
The constant C satisfies C = Km{MY)
~ Js(Z)
+
Is(X,Y)}.
Moreover, i f the assumptions of 2°) are satisfied, then c = MY)
If the input X
= {X(t)}
- MZ)
+ i(X(Q),
y(0)).
and the noise Z = (Z(t))
are stationary Gaussian
processes w i t h SDF /(A) and ff(A), respectively, then i t is known that the constant A i n Theorem 6.7.3 and 6.7.4 is equal to A(f,g) that, if the assumptions (finiteness of JT(Y)
of (6.7.6) ([97, 66]). This implies and J T ( Z ) )
i n Theorem 6.7.3 are
proved to be true, then the formula (6.7.5) should be true. Here we give a comment on a stationary Gaussian channel presented by
Y(t)=
where Z — {Z(t)}
f
Jo
x(u)dv
+ Z(t),
(6.7.13)
t>0,
is a stationary Gaussian process w i t h SDF g(A).
that the inputs x = ( z ( t ) } ' s are limited on the set X.(p)
We assume
of all stationary processes
which satisfy (6.4.1) (or (6.4.9)) and independent of Z. Then the capacity is defined by (6.4.11) w i t h replacement of X{p)
by X,(p).
C.{p)
If the formula (6.7.5) is
true, then we can easily show, using the water filling method, that C.(p) = ^ / _ J o g [ m a x ( ^ , l ) j d A ,
(6.7.14)
268
CONTINUOUS
GAUSSIAN
CHANNELS
where B is a constant determined by /
(B-g(X))dX
=
p. 2
J{X;gW
Again, we may expect that (6.7.14), corresponding to the formula (5.5.7) i n the discrete time case, is true i n general. However (6.7.14) has been proved only in special cases including the case where g(X) is rational.
6.8 GAUSSIAN CHANNELS WITH FEEDBACK

In the preceding sections we have dealt with Gaussian channels without feedback or white Gaussian channels with feedback. In this section we treat Gaussian channels with feedback. One of the fundamental problems is to find a formula to calculate mutual information. To find a formula, using a transformation, we reduce the problem to the one in the case of white Gaussian channels. Since the channels are with feedback, the transformation should be causal. For this purpose the Hilbert space method employed in Sec. 6.6 is not suitable, because the causality is not preserved. From this point of view, the canonical representation is useful.

Throughout the section, we consider Gaussian channels (6.1.1) with feedback. The time parameter space T = [0, T] is fixed, unless otherwise stated. We need to introduce multidimensional white Gaussian channels. First we define multidimensional Brownian motion. A Gaussian process B = {B(t)} is said to be a process with independent increments if B(t_k) - B(t_{k-1}), k = 2, ..., N, are mutually independent for arbitrary t_1 < t_2 < ··· < t_N. An N-dimensional process B = {B(t) = (B_1(t), ..., B_N(t)); t ≥ 0} is called an N-dimensional Brownian motion if B_i = {B_i(t)}, i = 1, ..., N, are mutually independent zero mean
Gaussian processes with independent increments. To each B_i, there corresponds a measure m_i on [0, ∞) such that

    m_i((a, b]) = E[|B_i(b) - B_i(a)|²],    a < b.

The stochastic integral

    ∫_0^t f(u) dB_i(u) = ∫_0^t f(u, ω) dB_i(u, ω)           (6.8.1)

with respect to B_i can be defined in the same manner as in Sec. A.3, where f is a function such that ∫_0^T |f(t, ω)|² dm_i(t) < ∞ with probability one. Property (A.3.3) should be replaced by

    E[∫_0^a f(u) dB_i(u) ∫_0^b g(v) dB_i(v)] = ∫_0^{a∧b} f(u) g(u) dm_i(u).
In the sequel of this section we assume that the noise process Z = {Z(t)} of the channel (6.1.1) is represented in the form

    Z(t) = Σ_{i=1}^N ∫_0^t F_i(t, u) dB_i(u),               (6.8.2)

where B = {B(t) = (B_1(t), ..., B_N(t))} is an N-dimensional Brownian motion with associated measures m_i, i = 1, ..., N, such that m_1 ≻ m_2 ≻ ··· ≻ m_N. The functions F_i, i = 1, ..., N, satisfy the following conditions: F_i(t, u) = 0 if t < u, Σ_{i=1}^N ∫_0^T F_i(t, u)² dm_i(u) < ∞, and for any real functions f_i, i = 1, ..., N, the following equivalence relation holds:

    Σ_{i=1}^N ∫_0^t F_i(t, u) f_i(u) dm_i(u) = 0 for all t   ⟺   f_i(t) = 0 for all t, i = 1, ..., N.

In this case the equivalence

    σ_t(Z) = σ_t(B),    0 ≤ t ≤ T,                          (6.8.3)

holds, where σ_t(Z) denotes the σ-field generated by Z(u), u ≤ t. Intuitively speaking, (6.8.3) means that {B(u); u ≤ t} contains exactly the same information that is contained in {Z(u); u ≤ t}. In this sense the representation (6.8.2) is causal. The representation (6.8.2) with property (6.8.3) is called the canonical representation in the sense of Lévy-Hida-Cramér.
Remark. In general, a separable Gaussian process Z = {Z(t)} such that ∩_t H_t(Z) = {0} can be canonically represented in the form

    Z(t) = Σ_{i=1}^N ∫_0^t F_i(t, u) dB_i(u) + Σ_{j=1}^J b_j(t) B^j(t).

The first term (called the continuous spectrum term) on the right-hand side satisfies the same conditions stated above. The second term (discrete spectrum term) is independent of the first term, and the B^j(t)'s are mutually independent processes of the form B^j(t) = 1_{I_j}(t) W_j, where W_j is a Gaussian random variable with distribution N(0,1), the I_j's are intervals of the form I_j = (t_j, T] or I_j = [t_j, T] such that I_{j+1} ⊂ I_j, and the b_j(t)'s are functions such that Σ_j |b_j(t)|² < ∞ for every t. Moreover the equivalence (6.8.3) holds with the σ-field σ_t(B) generated by {B_i(u); u ≤ t, i = 1, ..., N} and {B^j(u); u ≤ t, j = 1, ..., J} (J ≤ ∞).

Although we assume, for simplicity, that the Gaussian noise Z has a canonical representation without discrete spectrum term, the theorems given in this section can be proved even if the representation has the discrete spectrum term (the formula (6.8.12) in Theorem 6.8.3 needs to be modified).
It will be seen that mutual information in Gaussian channels is described in terms of reproducing kernel Hilbert spaces (RKHS). Denote by R(t, s) the covariance function of the process Z. We denote by K_T(Z) the reproducing kernel Hilbert space with R(t, s) as the reproducing kernel. The space K_T(Z), consisting of real functions defined on [0, T], is a Hilbert space with inner product (·,·)_{Z,T} and norm ||·||_{Z,T} satisfying the following properties (a) and (b):

(a)  R(t, ·) ∈ K_T(Z),  ∀t ∈ [0, T];
(b)  (R(t, ·), φ)_{Z,T} = φ(t),  ∀t ∈ [0, T], ∀φ ∈ K_T(Z).

Lemma 6.8.1. The space K_T(Z) is given by

    K_T(Z) = {φ(t) = Σ_{i=1}^N ∫_0^t F_i(t, u) φ_i(u) dm_i(u);  Σ_{i=1}^N ∫_0^T |φ_i(u)|² dm_i(u) < ∞}     (6.8.4)

with inner product

    (φ, ψ)_{Z,T} = Σ_{i=1}^N ∫_0^T φ_i(u) ψ_i(u) dm_i(u).   (6.8.5)
Outline of Proof. By (6.8.2), it is shown that the covariance function R(t, s) of Z is written in the form

    R(t, s) = Σ_{i=1}^N ∫_0^{t∧s} F_i(t, u) F_i(s, u) dm_i(u).     (6.8.6)

Using (6.8.6) and properties (a) and (b), we can prove (6.8.4) and (6.8.5).  □
It is known that the RKHS K_T(B) of a Brownian motion B = {B(t)} is given by

    K_T(B) = {φ(t) = ∫_0^t φ'(u) du;  ∫_0^T |φ'(u)|² du < ∞}      (6.8.7)

with norm

    ||φ||²_{B,T} = ∫_0^T |φ'(u)|² du.                       (6.8.8)
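The reproducing property (b) can be made concrete for K_T(B): with the Brownian kernel R(t, s) = min(t, s) one has (R(t,·), φ)_{B,T} = ∫_0^T 1_{{s<t}} φ'(s) ds = φ(t). The following short sketch (our own numerical illustration; the test function and grid are arbitrary) checks this on a grid.

```python
import numpy as np

# Reproducing property of K_T(B): <R(t,.), phi>_{B,T} = phi(t) for R(t,s) = min(t,s).
T, n = 1.0, 100001
s = np.linspace(0.0, T, n)

phi_dot = np.cos(3.0 * s)                                     # an arbitrary phi' in L^2([0,T])
phi = lambda t: np.trapz(np.where(s <= t, phi_dot, 0.0), s)   # phi(t) = int_0^t phi'(u) du

for t in (0.2, 0.5, 0.9):
    kernel_dot = np.where(s < t, 1.0, 0.0)        # d/ds min(t,s) = 1_{s<t}
    inner = np.trapz(kernel_dot * phi_dot, s)     # (R(t,.), phi)_{B,T} via (6.8.8)
    print(t, inner, phi(t))                        # the two values agree up to grid error
```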
From the theoretical point of view, the following fact is fundamental (the proof is omitted).

Theorem 6.8.2. ([53]) Assume that the Gaussian channel (6.1.1) is without feedback and that an input signal X = {X(t)} is a Gaussian process. Then, the mutual information I_T(X, Y) between the input X and the output Y corresponding to X is finite if and only if

    P(X(·, ω) ∈ K_T(Z)) = 1.                                (6.8.9)

It is known that, under the condition (6.8.9), a Gaussian process X is represented in the form

    X(t, ω) = Σ_{i=1}^N ∫_0^t F_i(t, u) x_i(u, ω) dm_i(u),  (6.8.10)

where X_i = {x_i(t)}, i = 1, ..., N, are Gaussian processes. Moreover it is known that (6.8.9) is satisfied if and only if

    E[||X||²_{Z,T}] = Σ_{i=1}^N ∫_0^T E[|x_i(t)|²] dm_i(t) < ∞.    (6.8.11)
In the study of white Gaussian channels, it has always been assumed that the channel input X = {X(t)} is of the form X(t) = ∫_0^t x(u) du (cf. (6.1.6)). In terms of RKHS, this is equivalent to assuming that X(·) ∈ K_T(B). Therefore, Theorem 6.8.2 provides a theoretical justification of this assumption.

We now turn to the study of a Gaussian channel (6.1.1) with feedback. As a generalization of Theorem 6.2.1, a formula for the mutual information is given in the following theorem.
Theorem 6.8.3. Assume that the Gaussian channel (6.1.1) is with feedback and that an input signal X = {X(t)} satisfies (6.8.10) and (6.8.11). Then, the mutual information I_t(ξ, Y) between a message ξ and the corresponding output Y is given by

    I_t(ξ, Y) = (1/2) E[||X - X̂||²_{Z,t}]
              = (1/2) Σ_{i=1}^N ∫_0^t E[|x_i(u) - x̂_i(u)|²] dm_i(u),    t ≥ 0,    (6.8.12)

where

    X̂(t) = Σ_{i=1}^N ∫_0^t F_i(t, u) x̂_i(u) dm_i(u)         (6.8.13)

and x̂_i(t) = E[x_i(t) | Y(u), u ≤ t] is the conditional expectation.
Outline of Proof. We put

    Y_i(t) = ∫_0^t x_i(u) dm_i(u) + B_i(t),    i = 1, ..., N.      (6.8.14)

Then

    Y(t) = X(t) + Z(t) = Σ_{i=1}^N ∫_0^t F_i(t, u) dY_i(u).        (6.8.15)

By the causality (6.8.3), we can show that

    σ_t(Y) = σ_t(Y_1, ..., Y_N),                            (6.8.16)

meaning that

    I_t(ξ, Y) = I_t(ξ, (Y_1, ..., Y_N)).                    (6.8.17)

Equations (6.8.14) present an N-dimensional white Gaussian channel with feedback. Although (6.8.14) is multidimensional, in a way parallel to Theorem 6.2.1 we can show that the mutual information I_t(ξ, (Y_1, ..., Y_N)) is equal to the right-hand side of (6.8.12).  □
Noting (6.8.8), we know that the right-hand side of (6.2.2) can also be written as (1/2) E[||X - X̂||²_{B,T}], where X(t) = ∫_0^t x(u) du and X̂(t) = ∫_0^t x̂(u) du. Therefore, the formula (6.2.2) is a special case of (6.8.12). In the study of (non-white) Gaussian channels with feedback, the formula (6.8.12) plays fundamental roles. We can prove interesting theorems (Theorems 6.8.4 - 6.8.6) which are continuous time versions of Theorems 5.7.3 - 5.7.5.
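As an aside, the white special case (6.2.2) of (6.8.12) is easy to check numerically. The sketch below (our own illustration; the constant-message model, grid and sample sizes are arbitrary choices, not from the text) simulates the non-feedback channel Y(t) = ξt + B(t) with ξ ~ N(0,1); there the filter is ξ̂(t) = Y(t)/(1+t), the filtering error is 1/(1+t), and (6.2.2) gives I_T(ξ, Y) = ½ log(1+T), which the Monte Carlo estimate of ½ ∫_0^T E|x(t) − x̂(t)|² dt reproduces.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, paths = 2.0, 400, 20000
dt = T / n
t = np.arange(1, n + 1) * dt

xi = rng.standard_normal(paths)                          # message, N(0,1)
dB = rng.standard_normal((paths, n)) * np.sqrt(dt)       # Brownian increments
Y = np.cumsum(xi[:, None] * dt + dB, axis=1)             # Y(t) = xi*t + B(t)

xi_hat = Y / (1.0 + t)                                   # E[xi | Y(s), s <= t] for this model
err = ((xi[:, None] - xi_hat) ** 2).mean(axis=0)         # filtering error E|x(t) - x_hat(t)|^2

I_mc = 0.5 * np.sum(err) * dt                            # (1/2) int_0^T E|x - x_hat|^2 dt
print(I_mc, 0.5 * np.log(1.0 + T))                       # compare with the closed form
```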
Let R be a class of covariance functions. Denote by R_X(t, s) the covariance function of a process X. Define classes A(R), A_g(R) and X(R) by

    A(R) = {(ξ, X); (ξ, X) satisfies (A.1) - (A.3), R_X ∈ R},
    A_g(R) = {(ξ, X) ∈ A(R); (ξ(·), X(·), Y(·)) forms a Gaussian system},
    X(R) = {X; X is independent of Z, R_X ∈ R}.

Theorem 6.8.4. ([62]) Assume that the Gaussian channel (6.1.1) is with feedback. Then the capacity under the constraint specified by R is attained in a Gaussian system:

    C_f(A(R)) = C_f(A_g(R)).                                (6.8.18)
Later this theorem will be proved only for a special case. However, the basic idea of the proof can be applied to the general case.

Theorem 6.8.5. Let ξ and X = {X(t)} be a message and an input of the Gaussian channel (6.1.1) with feedback. Let X⁰ = {X⁰(t)} be a Gaussian process independent of Z and denote by Y⁰ = {Y⁰(t)} the output corresponding to X⁰:

    Y⁰(t) = X⁰(t) + Z(t),    0 ≤ t ≤ T.                     (6.8.19)

If X and X⁰ have the same covariance function, then

    I_T(ξ, Y) ≤ 2 I_T(X⁰, Y⁰).                              (6.8.20)

This theorem can be proved using Theorem 6.8.2 and Theorem 5.7.4. As an easy consequence of this theorem, we can show that feedback at most doubles the capacity of a non-white Gaussian channel.

Theorem 6.8.6. Under a constraint specified by a class R of covariance functions, denote by C⁰(X(R)) the capacity of the Gaussian channel (6.8.19) without feedback. Then

    C⁰(X(R)) ≤ C_f(A(R)) ≤ 2 C⁰(X(R)).                      (6.8.21)
The factor "2" in (6.8.21) cannot be replaced by any number less than 2. Later, we shall give an example (Example 6.8.1) which shows that the last assertion is true.

We denote by R* the class of all covariance functions of the form

    R(t, s) = Σ_{i,j=1}^N ∫_0^s ∫_0^t F_i(s, u) F_j(t, v) r_{ij}(u, v) dm_i(u) dm_j(v),

where r(u, v) = (r_{ij}(u, v))_{i,j=1,...,N} is a symmetric non-negative definite function defined on [0, T]², satisfying

    Σ_{i=1}^N ∫_0^T r_{ii}(u, u) dm_i(u) < ∞.
The equivalence of (6.8.9) and (6.8.11), together with Theorem 6.8.2, means that the capacity C_f(A(R)) is infinite unless R ⊂ R*.

Let W = {W(t); 0 ≤ t ≤ T} be a Gaussian process equivalent to Z = {Z(t); 0 ≤ t ≤ T}, that is, μ_W and μ_Z are mutually absolutely continuous, where μ_W denotes the probability distribution of W. Then it is known that K_T(W) coincides with K_T(Z) as a set (the inner products are different). Thus, from the mathematical point of view, it seems reasonable to consider the capacity of the Gaussian channel (6.1.1) under a constraint

    E[||X||²_{W,T}] ≤ ρ²T.                                  (6.8.22)
Assume that the Gaussian noise Z = {Z(t); 0 ≤ t ≤ T} is equivalent to a Brownian motion B = {B(t); 0 ≤ t ≤ T}. Then it is known [51, 50] that Z is canonically represented in the form

    Z(t) = B(t) + ∫_0^t ∫_0^s f(s, u) dB(u) ds,             (6.8.23)

where f(s, u) ∈ L²([0, T]²) is a Volterra function. Let g(s, u) ∈ L²([0, T]²) be the resolvent kernel of f(s, u) (cf. Lemma 6.3.5). Let (ξ, X) be an admissible pair of a message and an input. We may assume that X ∈ K_T(Z) with probability one. Since K_T(Z) = K_T(B) (as a set), X can be expressed as

    X(t) = ∫_0^t x(u) du.                                   (6.8.24)
We define x̃ = {x̃(t)} by

    x̃(t) = x(t) + ∫_0^t g(t, u) x(u) du.                    (6.8.25)

Then it is true that

    x(t) = x̃(t) + ∫_0^t f(t, u) x̃(u) du.                    (6.8.26)

Moreover we define Ỹ = {Ỹ(t)} by

    Ỹ(t) = ∫_0^t x̃(u) du + B(t).                            (6.8.27)

Then (6.8.27) presents a white Gaussian channel. Moreover it is shown that σ_t(Y) = σ_t(Ỹ). Therefore, using Theorem 6.2.1, we have

    I_t(ξ, Y) = I_t(ξ, Ỹ) = (1/2) ∫_0^t E[|x̃(u) - x̂̃(u)|²] du,     (6.8.28)

where x̂̃(t) = E[x̃(t) | Ỹ(u), u ≤ t] denotes the filter of x̃.
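Numerically, the resolvent kernel g(s, u) appearing in (6.8.25)-(6.8.26) can be obtained from f(s, u) by discretizing the Volterra relation g = -f - f∗g. The following rough sketch (our own; the grid and the test kernel are arbitrary choices) does this by forward substitution and checks it against the case f ≡ -1, for which the resolvent is g(t, u) = e^{t-u} (this f is the one of Example 6.8.1 below).

```python
import numpy as np

def resolvent_kernel(f, T=1.0, n=400):
    """Solve g(t,u) = -f(t,u) - int_u^t f(t,s) g(s,u) ds on a grid (Volterra kernel),
    so that x~ = x + int g x inverts x = x~ + int f x~ as in (6.8.25)-(6.8.26)."""
    h = T / n
    t = np.linspace(0.0, T, n + 1)
    F = f(t[:, None], t[None, :]) * (t[:, None] >= t[None, :])   # Volterra: zero for t < u
    G = np.zeros_like(F)
    for j in range(n + 1):                     # fixed second argument u = t[j]
        for i in range(j, n + 1):              # march forward in t
            conv = h * np.dot(F[i, j:i], G[j:i, j]) if i > j else 0.0
            G[i, j] = -F[i, j] - conv
    return t, G

# For f(t,u) = -1 the resolvent is g(t,u) = exp(t - u).
t, G = resolvent_kernel(lambda s, u: -np.ones_like(s * u))
i, j = 300, 100
print(G[i, j], np.exp(t[i] - t[j]))   # the two values agree up to discretization error
```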
We shall prove Theorem 6.8.4 in a special case where Z is equivalent to B.

Proof of Theorem 6.8.4 (a special case). We assume that the noise Z is represented as (6.8.23). Let (ξ, X) ∈ A(R) be an admissible pair such that X is given in the form (6.8.24). Note that (ξ, X) is not necessarily Gaussian. Define x̃ = {x̃(t)} by (6.8.25). There exists a Gaussian process x̃_0 = {x̃_0(t)} such that (x̃_0, B) is pairwise Gaussian and that (x̃_0, B) and (x̃, B) have a common covariance function. Define Y_0 = {Y_0(t)} by

    Y_0(t) = ∫_0^t x̃_0(u) du + B(t),    t ≥ 0.              (6.8.29)

Since B(u_2) - B(u_1) is independent of x̃(t) (u_2 > u_1 ≥ t), we see that

    h(t, u) = (∂/∂u) E[x̃_0(t) B(u)] = (∂/∂u) E[x̃(t) B(u)]

is a Volterra function. Denoting by k(t, u) the resolvent kernel of h(t, u), define a process ξ_0 = {ξ_0(t)} by

    ξ_0(t) = x̃_0(t) + ∫_0^t k(t, u) dY_0(u).

Then we can prove that ξ_0 is independent of B, and consequently of Z. Finally, define X_0 = {X_0(t)} by

    X_0(t) = ∫_0^t x_0(u) du,

where x_0 = {x_0(t)} is defined by (6.8.26) with x̃(u) replaced by x̃_0(u), and denote by Y_0 the output corresponding to (ξ_0, X_0). Then we can show that (ξ_0, X_0) ∈ A_g(R) and

    I_T(ξ, Y) ≤ I_T(ξ_0, Y_0).

Since (ξ, X) ∈ A(R) is arbitrary, we have obtained (6.8.18).  □
- J
dB(u)ds.
j
(6.8.30)
This is a special case of (6.8.23). We consider the following coding scheme: K-(t) =
I
Jo
A(u)(t
- £ { « ) ) du + Z(t),
t > 0,
(6.8.31)
where a message £ is a Gaussian random variable w i t h distribution /V(0,1),
A(t)
is given by
with a function g{t) uniquely determined by 2 s ' ( 0 = ~ {t) 9
3(0) =
+ ^s(i)
3
2
+ V2,
t>0,
^,
and f(u) = £ [ ( | r ( « ) , j < u]. We p u t X-{t)
=
f Jo
x »
= A(u)(t
du,
-
£(«)).
Then i t can be shown that ElVWPl «
\,
<>0,
(6.8.32)
and 1((,Y')
=
lim^lT&Y-)
= 1.
(6.8.33)
S.9 RATE-DISTORTION
FUNCTION
OF A GAUSSIAN
277
PROCESS
We are interested i n the capacity of the channel under the constraint (6.4.9). From (6.8.8), we know that (6.4.9) is rewritten i n the form, Era ^ £ ( | | X | | We denote by C/(p)
(resp.
C{p))
2
Bi7
.]<^.
(6.8.34)
the feedback capacity (resp.
non-feedback
capacity) of the channel subject to (6.8.34). Then (6.8.32) and (6.8.33) mean that C ( l / 2 ) > 1.
(6.8.35)
f
On the other hand, we can show that e~o/2) =
|
Thus, i n this case, C , ( i ) / C ( ± ) > 2. Taking (6.8.21) into account, we conclude that the equality holds i n (6.8.35), and at the same time we know that the factor " 2 " in (6.8.21) cannot be replaced by any other numbers less than 2.
6.9 R A T E - D I S T O R T I O N OF A GAUSSIAN
FUNCTION
PROCESS
We shall give a formula for the rate-distortion function of a Gaussian process under mean square distortion. It is given in terms of the eigenvalues of the covariance operator of the Gaussian process.

Let ξ = {ξ(t); 0 ≤ t ≤ T} be a Gaussian process. The parameter space [0, T] is fixed in the sequel. Denote by

    d(ξ, η) = {∫_0^T E[|ξ(t) - η(t)|²] dt}^{1/2}

the mean square distortion. The rate-distortion function R(D; ξ) is defined by (1.7.5). We denote by {λ_n}_{n=1}^∞ the set of all eigenvalues of the covariance operator R of ξ. Recall that R is the integral operator on L²([0, T]) having the covariance function R(t, s) of ξ as the integral kernel (see Sec. 6.3). Let f_n ∈ L²([0, T]), n = 1, 2, ..., be the eigenfunctions of R corresponding to λ_n. As we have seen in Sec. 6.3, {f_n} can be chosen so that {f_n} forms a c.o.n.s. of L²([0, T]).
Theorem 6.9.1. Let ξ = {ξ(t)} be a mean continuous Gaussian process. Then, for any D > 0,

    R(D; ξ) = (1/2) Σ_{n=1}^∞ log max(λ_n/θ², 1),           (6.9.1)

where θ > 0 is a constant uniquely determined by

    Σ_{n=1}^∞ min(λ_n, θ²) = D².                            (6.9.2)
Proof. Define an operator K : L²([0, T]) → ℓ², x ↦ x̄ = {x_n}, by

    x_n = ∫_0^T f_n(t) x(t) dt,    n = 1, 2, ...,

where ℓ² = {x̄ = {x_n}; Σ_n |x_n|² < ∞}. Since {f_n} is a c.o.n.s., the operator K is an isometry. Let η = {η(t)} be a process, and define sequences ξ̄ = {ξ_n} and η̄ = {η_n} of random variables by

    ξ̄ = Kξ,    η̄ = Kη,                                      (6.9.3)

namely

    ξ_n(ω) = ∫_0^T f_n(t) ξ(t, ω) dt,    η_n(ω) = ∫_0^T f_n(t) η(t, ω) dt.    (6.9.4)

Since K is an isometry, we can show that

    E[Σ_n |ξ_n - η_n|²] = ∫_0^T E[|ξ(t) - η(t)|²] dt.       (6.9.5)

Since K is invertible, it is true that

    I(ξ̄, η̄) = I(ξ, η).                                      (6.9.6)

Here we denote by R(D; ξ̄) the rate-distortion function of ξ̄ under the distortion d(ξ̄, η̄) = {E[Σ_n |ξ_n - η_n|²]}^{1/2}. Then (6.9.5) and (6.9.6) mean that, under the transformation K of (6.9.3), the mutual informations and the distortions are kept invariant. Therefore

    R(D; ξ) = R(D; ξ̄).                                      (6.9.7)

On the other hand, since ξ is Gaussian and K is linear, every ξ_n is a Gaussian random variable. Moreover

    E[ξ_n ξ_m] = E[∫_0^T ∫_0^T f_n(t) f_m(s) ξ(t) ξ(s) dt ds]
               = ∫_0^T ∫_0^T R(t, s) f_n(t) f_m(s) dt ds
               = λ_m ∫_0^T f_n(t) f_m(t) dt = λ_m δ_{nm}.    (6.9.8)
This means that ξ̄ = {ξ_n} forms a system of independent Gaussian random variables. In the same way as in the proof of Theorem 5.8.1, it is shown that R(D; ξ̄) is equal to the right-hand side of (6.9.1). Thus the proof is complete.  □
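The reverse water-filling prescription (6.9.1)-(6.9.2) is easy to evaluate numerically. The sketch below (our own illustration; the eigenvalue truncation and distortion levels are arbitrary choices) uses the eigenvalues of the Brownian covariance operator, anticipating Example 6.9.1, and compares R(D; B) with the small-D behavior 2T²/(π²D²).

```python
import numpy as np
from scipy.optimize import brentq

def rd_gaussian(eigs, D):
    """Reverse water-filling (6.9.1)-(6.9.2): given eigenvalues of the covariance
    operator and a distortion level D, return R(D) in nats."""
    distortion = lambda theta2: np.sum(np.minimum(eigs, theta2))
    theta2 = brentq(lambda th: distortion(th) - D**2, 1e-15, eigs.max())
    return 0.5 * np.sum(np.log(np.maximum(eigs / theta2, 1.0)))

# Eigenvalues of the Brownian covariance operator on [0,T] (cf. Example 6.9.1 below).
T = 1.0
n = np.arange(1, 200001)
eigs = (2.0 * T / ((2.0 * n - 1.0) * np.pi)) ** 2

for D in (0.2, 0.1, 0.05):
    print(D, rd_gaussian(eigs, D), 2.0 * T**2 / (np.pi**2 * D**2))  # approaches the asymptotics
```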
Example 6.9.1. Let B = {B(t); 0 ≤ t ≤ T} be a Brownian motion. Then the covariance function is R(t, s) = min(t, s) and the eigenvalues of the covariance operator are

    λ_n = (2T/((2n-1)π))²,    n = 1, 2, ....

Then, using Theorem 6.9.1, we can show that

    R(D; B) = 2T²/(π²D²) + o(D⁻²),    as D → 0.
For a non-Gaussian process, the same result as Theorem 5.8.3 has been proved.

Theorem 6.9.2. Let ξ = {ξ(t)} be a non-Gaussian process, and let ξ⁰ = {ξ⁰(t)} be a Gaussian process with the same covariance function as ξ. Then the rate-distortion function R(D; ξ), with the mean square distortion, is bounded by

    R(D; ξ⁰) - H(ξ; ξ⁰) ≤ R(D; ξ) ≤ R(D; ξ⁰),               (6.9.9)

where H(ξ; ξ⁰) is the relative entropy.
For a stationary Gaussian process ξ = {ξ(t)}, the rate-distortion function R̄(D; ξ) per unit time can be calculated exactly. Here R̄(D; ξ) is defined by (4.2.8) and the distortion d(ξ, η) per unit time is defined by

    d(ξ, η) = {limsup_{T→∞} (1/T) ∫_0^T E[|ξ(t) - η(t)|²] dt}^{1/2}
logmax ( ^ , l )
(6-9-10)
where 6 > 0 is a constant uniquely determined by f
mm(f(X),0 )d\ 2
=
D. 2
The formula (6.9.10) can be proved by using Theorem 6.9.1 and the asymptotic behavior (6.7.3) of the eigenvalues { A „ ( T ) } as T -> co.
6.10 C O D I N G T H E O R E M
FOR GAUSSIAN CHANNELS
The coding theorem can be proved for continuous time Gaussian channels similarly as i n the case of discrete time. The theorem means that the channel capacity is equal to the supremum of the achievable rates for the channel. Let us consider a Gaussian channel presented by (6.1.1) w i t h time parameter space T = |0,oo). Suppose that the noise Z = { Z ( f ) ; f > 0} is a mean continuous Gaussian process w i t h canonical representation (6.8.2). For each T , we consider a Gaussian channel Y(t) w i t h feedback.
= X{t)
+ Z(t%
0 < t < T,
(6.10.1)
6.10 CODING
L e t W = {W(t)\
THEOREM
7
FT>
0 < i < T}. We denote by
= {W(t);
(T)
the R K H S of W .
281
CHANNELS
t > 0} be a Gaussian process such that W' "* is equivalent to
for each T < co, where W
Z
(T>
FOR GAUSSIAN
K {W) T
We assume that the channel has a constraint given by E\]\Xf \
(6.10.2)
WT
where || • \\w,T denotes the norm of KT{W),
and denote by Cj(p)
the capacity
of the channel (6.10.1) subject t o (6.10.2). I n order that the channel has a finite capacity, as we have seen i n Sec. 6.8, i t is reasonable to treat the channel under the constraint (6.10.2). Moreover we denote by C/(/>) the per unit time capacity of the Gaussian channel (6.1.1) w i t h feedback, under the constraint lim j * £ [ f X | L r i
< P.
(6.10.3)
2
t —•» i
Let J ' * ' be a message w i t h probability 7
PijM
=
m
)
m = 1,...,M< >,
=
(6.10.4)
T
where Af' *^ > 0 is an integer. The message £ * is encoded into a channel input (T
7
A"' "* w i t h the a i d of feedback. We denote by Aj(p) 7
pairs ({, X^)
the class of all admissible
of a message f to be transmitted and an input X^ " . Suppose that 1
1
the output y l " ' corresponding to ( £ ' ' , A""' "') is decoded into a received message 7
r
7
/ J ^ . Then the error probability TT' "' is given by 7
„m
=
P
[
i
m ^ T ) y
The coding theorem for the continuous time Gaussian channel can be stated as follows. The proof parallels the one for the discrete case (see Theorem 5.9.1). T h e o r e m 6.10.1.
1°)
Assume that the feedback capacity Cj(p)
of the Gauss-
ian channel (6.10.1) subject to the average power constraint (6.10.2) satisfies
lim T—oo
3^1=0. T
(6.10.5)
1
If a rate R is less than the per unit time capacity Cf(p),
and C/(p) is continuous
at p, then the rate R is achievable. Conversely, if a rate R is achievable, then R < 2")
C (p). f
T h e same statement holds i n the special case without feedback upon sub-
s t i t u t i o n of C {p) T
for Cj(p)
and C(p) for C/(p), where C (p) T
is the capacity of
282
CONTINUOUS
GAUSSIAN
CHANNELS
the same channel without feedback and C(p)
the per unit time capacity without
feedback.
Proof.
1")
exist 0 <
p
0
Let rf. < C/(p). Then, by the continuity assumption on C/(/>), there < p
and e > 0 such that [ e
for sufficiently large T.
R
T
} <
e
X
0
p { l l -
E
) C j ( p
) } ,
0
Using Theorem 6.8.4, we can show that there exists an £ A p(p )
admissible pair {0,X}
and that
R < Cf(p )
(
such that $ =
0
{0(t))
is a Gaussian process
independent of Z , the channel input X = {A"(f)} is w r i t t e n in the form X { t ) = 6{t) + Y(t), where Y(t)
(6.10.6)
is a linear functional of the output { y ( s ) ; s < (}, and that Cj(p )
-
0
Y)
<
Cj(p ).
Y(t)
+
Z(t).
1 < I (9, T
(6.10.7)
0
Note that the channel is rewritten as =
Y(t)
We define a Gaussian process
=
Y
0
Y (t) o
8(t)
+
by
{Y»{t)} =
+
0[t)
Z(t).
Then we can show that
r (Y) = ^,(y ), t
< t ) and { y ( 3 ) ; s < /} contain the same information.
implying that {Y(s);0
0
Therefore, since F i ( Y } C Tt{Y)
Y can be decomposed into a sum,
= ^(YQ), Y(t)
where V(t)
(>o,
0
=
V(i)
+
S(t),
and 5(f) are linear functionals off?' ' and Z ' , respectively. For later 1
( l
use, we express V i n the form V(t) = 0 ( l , { ( ? ( s ) ; s < t } ) .
(6.10.8)
Note that (6.10.6) is rewritten as X(t)
= 6(t) +
{t)( );s < (}) + 5 ( t ) , s
(6.10.9)
6.10 CODING
The e-typical set T}
FOR
GAUSSIAN
283
CHANNELS
(e > 0 ) is given by
T)
TP
THEOREM
= {jM
6 R' ' 0
7 1
x R ^ ;
I |logv» '(»,lf) -
<
(T
•
where
y
By Theorem 6 . 6 . 2 , lim
/
i«?(7? )> = T
I .
(6.10.10)
Here we employ the random coding method similarly as i n the proof of Theorem Let ( } be a message given by ( 6 . 1 0 . 4 ) w i t h M = [e ] and let u = { u ( t ) ; ( < T ) , m = 1 , . . . , A f ' ' , be code words independently drawn according to the identical distribution Pg. For the message define the channel input 4.5.1-
(T>
T )
WO)
RT
m
T
m
°y X ( t ) = u ( i ) + - A ( ( , { n ( s ) ; s < t } ) + S((),
if
m
m
Given an output V"' "', the received message
( T )
=
m.
(6.10.11)
is reproduced by the same de-
7
coding rule as i n the proof of Theorem
£
4.5.1:
(u^y-mjeTf',
where £ = p e / 4 . 0
Denote by jr<' > = P(£
+ ? )
(T)
r
T )
the error probability of
reproduction. We denote by P { •) and E [ • ] the probability and the expectation c
c
w i t h respect to the choice of code u = (4.5.14), (6.6.8)
EfrW]
and
(6.10.7),
< e x P { ( l - e)Cf(po)}
< expfl for sufficiently large T .
-
Then, using
{u ). m
(4.5.10), (4.5.12),
we have
e{Cj( o) P
ex {-/r(f>, P
-
\ T)}
+
Po
Consequently, noting
Y)
+ 6)
+
^4^,
(6.10.5)
and
Cj(p„)
>
\ps,T,
we
obtain l i m E [*W\ t
T—*oo
=
0.
(6.10.12)
284
CONTINUOUS
GAUSSIAN
CHANNELS
Since the channel has feedback, concerning the average power constraint we need a treatment slightly different from that i n the proof of Theorem 4.5.1. P u t
Then, since {6(t)
is independent of { S ( t ) } , and {X(t)}
+ V(t)}
of (6.10.9) satisfies
(6.10.2) w i t h replacement p w i t h p , a
| B [ l i r + V f c I < ^ - f . Therefore < pi - l\
+ M V l where v
m
O
= (v (t)},
v (t)
m
( T )
=
m
(P) -
{ « ( » ) ; « < *})- We define the class D m
| « = {u }; j p j E
f l K + **ftyr < P - T 2
m
Note t h a t , if u = { u }
(6-10.13)
1
€ Z?' '(p), then the corresponding input X T
m
( T )
{ p ) by
J • =
{X(t)}
given by (6.10.9) satisfies the constraint (6.10.2). From (6.10.13) we know that Pc(D (p))
>
(T)
(6.10.14)
Denoting by E [ j r ' | Z ) ' ( p ) ] the conditional expectation on the event c
(7
)
(7
)
D (p), ,Ti
it follows from (6.10.14) that
E^\D^(P)\<^4E [^]. C
P
~
Po
Consequently, from (6.10,12), we have
lim £ ^ [ 0 ^ 0 0 1 = 0. 1 —'OO
This means that there exist an input X ™
and a received message f * * such that r
@n,xm)eAf>(p)*M>* lim T r—oo
(
T
)
=
0.
Thus the rate R is achievable. The converse part can be proved i n the same way as i n the proof of Theorem 4.5.2. Assertion 2°) can be proved i n the same manner.
•
HISTORICAL NOTES
Shannon [103] included continuous time channels in his original treatment of information theory. Since then continuous channels, especially Gaussian channels, have been studied as one of the major subjects i n information theory (cf. [38, 43]). Kolmogorov [79] and Gel'fand and Yaglom [45] (see also [97]) have analyzed Gaussian systems, using Hilbert space approach, to calculate mutual information and channel capacity. Hilbert space approach is useful to analyze non-feedback channels or linear systems. By an alternative approach, Duncan [33] and Kadota, Zakai and Ziv [73] have derived the formula (6.2.2) (Theorem 6.2.1) for the mutual information transmitted over a white Gaussian channel w i t h feedback.
The
formula (6.2.2) has played a central role in the theory of information transmission over feedback channels. Proof of Theorem 6.2.1 is essentially based on a result (Theorem 6.2.2) in probability theory. For the proof of Theorem 6.2.2, see [42, 75, 88]. Section 6.4 follows essentially the approach due to Kadota, Zakai and Ziv [73]. Coding schemes in continuous Gaussian channels have been investigated in [58, 87, 91, 92]. Theorems 6.5.1 and 6.5.2 are, respectively, due to Ihara [58] and Liptser [87]. Theorem 6.5.3 is obtained in [59]. The Hilbert space approach to calculating mutual information (Theorem 6.6.1) and channel capacity has been developed by Gel'fand and Yaglom [45] and Pinsker [97]. This approach has been further developed by Baker [4, 5]. A useful formula for the capacity of a Gaussian channel without feedback is obtained in [5]. Theorem 6.6.2 on information stability is due to Pinsker [97]. Theorem 6.6.4 was first proved in [55, 56]. Theorem 6.6.5 is proved in [61]. The study of mutual information and capacity of stationary Gaussian channels has been one of the significant topics in continuous communication systems [38, 43, 95, 97, 103, 116]. Formulas (6.7.5) and (6.7.14) themselves are well known; however, there are some mathematical difficulties in deriving them in a rigorous way. This problem is discussed in [6]. The expressions for the mutual information, given in Theorems 6.7.4 and 6.7.5, are due to Solev [110] and Ihara [66]. Hitsuda [53] has proposed to apply the canonical representation of Gaussian processes to calculate mutual information. His approach enables us to calculate mutual information in non-white Gaussian channels with feedback. Theorems 6.8.2 and 6.8.3 are proved in [53] and [54], respectively. Theorem 6.8.4 is due to Ihara [63]. As in the discrete time case, feedback at most doubles the capacity of a continuous Gaussian channel (Theorems 6.8.5 and 6.8.6). Example 6.8.1, provided by Ihara [69], shows that there exists a Gaussian channel whose capacity is exactly doubled by feedback. The formulas for the rate-distortion function (or ε-entropy) of Gaussian processes, given in Theorems 6.9.1 and 6.9.3, are due to Kolmogorov [79] and Pinsker [96]. The comparison theorem (Theorem 6.9.2), concerning the rate-distortion function, is due to Binia, Zakai and Ziv [11]. The coding theorem (Theorem 6.10.1) for a Gaussian channel with feedback is proved by Ihara. There are many suitable references [32, 51, 57, 88, 98] for stochastic processes. Among them, see [51, 57] for Gaussian processes, especially [51] for the canonical representation, and [57, 98] for stationary processes. See [76] for reproducing kernel Hilbert spaces. Integral equations of the type considered here have been treated in [109].
Appendix Preparation from Probability Theory A.l PROBABILITY AND RANDOM
VARIABLE
This section is devoted to explaining measures, probabilities, measurable functions, and random variables. D e f i n i t i o n A . 1 . 1 . A
B3fl;
(b)
I f B £ B , then B
(c)
I f B „ € B, n = 1,2,..., then U » , B
c
€ B, where B° = f l - B ; n
€ B.
Each element B of B is called a m e a s u r a b l e set.
The pair {St,8) is called a
m e a s u r a b l e space.
D e f i n i t i o n A . 1 . 2 . A m e a s u r e P on a measurable space ( f l , B) is a real function defined on B satisfying (a) and (b); (a)
P(B)
(b)
If B
> 0 for every B e B; n
€ B, n = 1 , 2 , a r e mutually disjoint ( B i UBj=4,
n=l
/
n=] 287
*? i ) . £
t n
en
288
PREPARATION
FROM
PROBABILITY
THEORY
We say that P ( B ) ( B £ 8 ) is the measure of the set B, and ( f t , B , P ) is called a m e a s u r e s p a c e . I n addition to (a) and (b), if (c)
P(fl) = l ,
is satisfied, then P is called a p r o b a b i l i t y or p r o b a b i l i t y m e a s u r e and (ft, B, P) is called a p r o b a b i l i t y s p a c e . The m i n i m u m
of R
d
that contains all open sets i n R
the B o r e l field and each element i n B(B. ) d
d
is called
is a B o r e l s e t . More generally, for a is defined as the smallest a¬
topological space G , the topological Borel field B(G)
field of G containing all open sets i n G . The L e b e s g u e m e a s u r e m is a measure on ( R , e ( R ) ) such that m((<M,6,] x -•- X J
d
(a b }) iy
=
d
D e f i n i t i o n A . 1.3. A real valued function X = X{w),
Y\Li( i ~ i)b
a
defined on f l , is called a 8
m e a s u r a b l e f u n c t i o n (or simply, a measurable function), i f (L> e fl;A"(w) e A ) e 8,
VA e S ( R ) .
In probability theory, a measurable function is called a r a n d o m v a r i a b l e , where ( f l , B, P) is a probability space. The m i n i m u m ir-field c(X) generated by X,
for which X is measurable, is called the ir-fieid
I n the following ( G , £ J ) and ( H , H ) are measurable spaces.
D e f i n i t i o n A . 1 . 4 . If ( f l , 8, P ) is a probability space and X is a G valued function such that { w e i i ; A » e A } e B ,
vAec;,
then A" is called a G valued random variable.
Definition A.l.S.
For a G valued random variable A", the p r o b a b i l i t y d i s t r i -
b u t i o n fix of X is a probability measure on ( G , G) defined by u (A) x
= P(X
G A) = P({w
Denote by G x H = { { i , y ) ; i
e fl; A »
e A}),
A e
Q.
f G , j J H ) the product space. The product
Q x H of Q and TI is the m i n i m u m u-field containing all sets of the form Ax
Aeg,BeH.
B,
CONDITIONAL
PROBABILITY
AND
CONDITIONAL
289
EXPECTATION
D e f i n i t i o n A . 1 . 6 . Let X and Y be, respectively, G and H valued random variables. The j o i n t p r o b a b i l i t y d i s t r i b u t i o n p x Y is a probability distribution of (X,Y),
namely, a probability measure on ( G x H,{? x H ) satisfying X B } = nx{A)fi (B),
PXY(A
Y
AEQ,
BaH.
AeQ,
B e n
We say that A" and Y are independent, if = iLx{A)a (B),
x B)
PXY{A
Y
is satisfied. I n this case we write /IXY — " X x o y and we say that ux X fiy is the direct product of fix and u j - .
Let X and Y be (complex valued) random variables. We say that X is integrable, if/ |A>)|dP( )
W
D e f i n i t i o n A . 1 . 7 . If a complex valued random variable X is integrable, then the expectation of A' is given by j A») =
E[X] =
X(z)du (z).
/
Ja
x
J c
If E[|A"| j and E[|V| ] are finite, then the covariance of X and Y is given by 2
E[{X
2
-
E\X])(Y
-
A.2 C O N D I T I O N A L CONDITIONAL
E[Y])]
=
/ ( A » Jsi
E[X\){Y^)
-
PROBABILITY
-
E\Y\)dP{u).
AND
EXPECTATION
Let ( f l , B , P) be a probability space, A" be a (complex valued) random variable, and
T
be a sub
B.
Assume that
E\\X\)
= /„
\X{w)\dP(ui)
<
oo. Then,
by Radon-Nikodym theorem, there exists a T measurable function
[
VBe/.
(A.2.1)
290
PREPARATION
FROM
PROBABILITY
THEORY
D e f i n i t i o n A . 2 . 1 . T h e function tp i n (A.2.1) is called the c o n d i t i o n a l e x p e c t a t i o n of X given T and is w r i t t e n as V
=
E[X\?].
If Y = { K ( t ) ; t e T ) is a stochastic process and ff(Y) = a { Y { t ) ; t £ T ) is the (r-field generated by Y, then E[X\Y]
= E[X[Y(t),
t e T] =
E[X\a(Y)\
is called the conditional expectation of X given Y. Theorem A.2.1.
{Property of Conditional Expectation)
integrable {complex) random variables and j , JFj, 1°)
2
For any constant a and 6, E\aX +bX2\j \=aE\X \F\ T
l
2°)
Let X , X i , X , U be
be sub fields of B.
7
+
1
bE[X \r}.
(A.2.2)
2
I f A is independent of .F, namely, P ( A D S ) = P ( A ) P ( B ) for any A e ^ J f ) 1
and
then .EfA'l.r] =
3°)
If T C J ! , then 2
7
E\E[X\r,\\r \ 2
4°)
= E[X\F \.
(A.2.4)
= E[X\.
(A.2.5)
2
I t holds that E\E\X\r\)
5°)
(A.2.3)
I f U is ^ measurable and £[|A'| ], £[|J7| ] are finite, then 2
E\XU\F\
!
= E[X\ f]U,
E[(X-E[X\F))U]
(A.2.6)
= Q,
(A.2.7)
£[|A- - Iff] > £[|A- - E L V I S H ] .
(A.2.8)
2
Let
Y = {Y(t);
t €
T } be a stochastic process and
T = a{Y).
T h e n (A.2.8) i m -
plies that the m i n i m u m of mean square errors E^\X — U\ } for all a { Y ) measurable 2
functions is attained by the conditional expectation U = E[X\
(!')].
The conditional probability is denned as a special form of the conditional expectation. We denote by 1 A { X ) the indicator function: l (X)(uj)= A
I
o
if
X{u>)iA.
CONDITIONAL
PROBABILITY
AND CONDITIONAL
191
EXPECTATION
D e f i n i t i o n A . 2 . 2 . The c o n d i t i o n a l p r o b a b i l i t y of {X
e A] given F is defined
by P(XeA\T)
= E[l (X)\r\,
(A.2.9)
A
where X is a G valued random variable and A € C- Especially P(A" G A| V ) = P ( X € A| ( r ( y ) )
(A.2.10)
is called the conditional probability of {X € A} given V .
Let ( G , G) and ( H , Ti) be measurable spaces, and let X and Y be, respectively, G and H valued random variables. T h e o r e m A . 2 . 2 . Let G be a locally compact space satisfying the second countability axiom and Q be the topological Borel field of G. Then there exists a real function
defined o n g x H satisfying (a) - ( b ) ;
(a)
(b)
f{A, -) is an H measurable function, for each fixed A € Q\
(c)
For each fixed A e S , {A,y)
= P{X
Z A\Y)(u),
Y(u)
if
= y,
holds w i t h probability one. D e f i n i t i o n A . 2 . 3 . The functional ip{A,y) >p(A, y) = P{X
in Theorem A.2.2 is written as
e A| Y = y) = (1 \Y{A\
y),
X
and is called the c o n d i t i o n a l p r o b a b i l i t y d i s t r i b u t i o n of X given Y.
In general i t holds that UXY{AXB)=
f JB
(A\y)d,iY(yh
A€g,B€H.
UxlY
(A.2.11)
In particular, substituting B — H in (A.2.11), we have ux(A)=
f p.x\Y(A\y)dii (y), JH Y
AeQ.
(A.2.12)
292
PREPARATION
FROM
PROBABILITY
THEORY
A.3 S T O C H A S T I C I N T E G R A L
Let ( f l , S , P ) b e a probability space. motion and T,(B)
L e t B = {B{t)\t
> 0 } be a B r o w n i a n
< ( } be the ir-field generated by { B ( s ) , s < t}.
= a{B{s)\s
Let {Bt\t > 0 } be an increasing (8, C B i i f s < f) family o f
C 0 , and that B , is independent o f B(t + u ) - B ( f ) , V u > 0. T h e n the
s t o c h a s t i c i n t e g r a l ( h o integral)
/(u)dB(u) = f
j'
0 < t < T ( < oo),
/(u,w)rfB(u,w),
is defined for real stochastic processes {/(()} w i t h properties (a) - ( c ) ; /((,uf) is B ( R ) X B measurable;
(a) (b)
I f t is fixed, then /((, •) is B
(c)
jT f(t,w)
t
measurable;
dt < oo w i t h probability one.
2
It is known that the stochastic integral /„ / ( u ) d B ( u ) is a random variable which satisfies the following properties. Theorem A.3.1.
{Property of Stochastic Integral)
L e t / = {/(*)} a n d g =
{ ? ( ' ) } be stochastic processes w i t h properties (a) - ( b ) stated above. Then 1")
For any constant a and 6, f'{ f(u) Jo
+ bg(u))dB(u)
a
2")
f * f{u,u)dB[u,u)
3°)
I t holds that
= a f fp)dB(«) Jo
K £ | / / ( u ) d B ( u ) ] < co and Etff 0
E
T
{A.3.1)
is a continuous function of ( w i t h probability one.
0
ft £f(u)dB(u)
4°)
+ b f g(u)dB{u). Jo
2
i = 0.
(A.3.2)
g(u) dB(u)} < oo, then
j ' f{u) dB{u) £ g{v) d B ( „ ) ] = jf
2
f(u)g(u)
du
(A.3.3)
(s A f = m i n ( s , f ) ) .
I f / ( " ) is non-random, namely /(u,u>) does not depend o n u , then the stochastic integral JljJ /(u) d B ( u , w ) is called a W i e n e r i n t e g r a l .
T h e Wiener integral is
defined for all functions i n £ ( [ 0 , T ] ) . Finally we note that 2
/
Jo
ldB(u,u)=M(t,u),
(A.3.4)
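Property (A.3.3) can also be checked by simulation. The following sketch (our own illustration; the integrands, grid and sample size are arbitrary choices) approximates two Wiener integrals on a partition and compares the Monte Carlo covariance with ∫_0^{a∧b} f(u)g(u) du.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, paths = 1.0, 400, 20000
dt = T / n
t = (np.arange(n) + 0.5) * dt           # midpoints of the partition
f, g = np.sin(t), np.exp(-t)            # non-random integrands (Wiener integrals)
a_idx, b_idx = int(0.7 * n), n          # a = 0.7, b = 1.0

dB = rng.standard_normal((paths, n)) * np.sqrt(dt)   # Brownian increments
I_f = dB[:, :a_idx] @ f[:a_idx]                      # int_0^a f(u) dB(u)
I_g = dB[:, :b_idx] @ g[:b_idx]                      # int_0^b g(u) dB(u)

lhs = np.mean(I_f * I_g)                             # E[ int f dB * int g dB ]
rhs = np.sum(f[:a_idx] * g[:a_idx]) * dt             # int_0^{a^b} f(u) g(u) du
print(lhs, rhs)                                      # agree up to Monte Carlo error
```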
Bibliography
|1] Aczel, J . arid Z. Dardczy (1975). On Measures of Information acterizations.
and Their
Char-
Academic Press, New York.
[2] Akaike, H . (1973). Information theory and an extension of the maximum likelihood principle. I n Proc. 2nd Int.
Theory, 267-281, Akad.
Symp. Information
Kiado, Budapest. [3] Arimoto, S. (1972). A n algorithm for computing the capacity of an arbitrary discrete memoryless channel. IEEE
Tram.
Inform.
Theory, I T - 1 8 , 14-20.
[4] Baker, C. ft. (1978). Capacity of the Gaussian channel without feedback. form.
Control,
3 7 , 70-89.
[5] Baker, C. R. (1987). Trans. Inform.
In-
Capacity of the mismatched Gaussian channel.
IEEE
Theory, I T - 3 3 , 802-812.
[6] Baker, C. R. and S. Ihara (1991). Information capacity of the stationary Gaussian channel. IEEE
Theory, 1 T - 3 7 , 1314-1326.
Trans. Inform.
[7] Barron, A. (1986). Entropy and the central l i m i t theorem. Ann.
Probability,
14, 336-342. [8] Berger, T . (1971).
Rate Distortion
Theory.
Prentice-Hall, Englewood Cliffs,
NJ. [9] Billingsley, P. (1965). Ergodic
Theory and Information.
[10] Billingsley, P. (1986). Probability
Wiley, New York.
and Measure. Wiley, New York (Second Ed.).
[11] Binia, J . , Zakai, M . and J . Ziv (1974). On the £-entropy and the rate-distortionfunction of certain non-Gaussian processes.
IEEE
Trans.
Inform.
Theory,
I T - 2 0 , 517-524. [12] Blachman, N. (1965). Tram.
Inform.
The convolution inequality for entropy powers.
Theory, I T - 1 1 , 267-271. 293
IEEE
294
BIBLIOGRAPHY
[13] Blahut, R. E. (1972). functions. IEEE
Computation of channel capacity and rate d i s t o r t i o n Theory, I T - 1 8 , 460-473.
Trans. Inform.
[14] Blahut, R. E. (1974). Hypothesis testing and information theory. IEEE
Trans.
Theory, I T - 2 0 , 405-417.
Inform.
[15] Blahut, R. E. (1987). Principles
and Practice
of Information
Addison-
Theory.
Wesley, Reading, M A . [16] Box, G. E. P. and G. M . Jenkins (1976). Time Series Analysis: Control.
Forecasting
and
Holden-Day, Oakland.
[17] Bucklew, J . A. (1990).
Large Deviations
Techniques
in Decision,
Simulation,
Wiley, New York.
and Estimation.
[18] Burg, J . P. (1967). Maximum entropy spectral analysis. Presented at the 37th Ann.
Int.
Meet.
Spectral Analysis,
Soc.
Explor.
Geophys.,
Oklahoma City, OK.; I n
Modern
Childer, D. G. (Ed.), I E E E Press, New York, 1978.
[19] Butman, S. (1969).
A general formulation of linear feedback communication
systems w i t h solutions. IEEE
Trans. Inform.
Theory, I T - 1 5 , 392-400.
[20] Butman, S. (1976). Linear feedback rate bounds for regressive channels. Trans. Inform.
Theory, I T - 2 2 , 363-366.
[21] Childer, D. G. (Ed.) (1978). Modern Spectral Analysis.
I E E E Press, New York.
[22] Cover, T . M . and S. Pombra (1989). Gaussian feedback capacity. IEEE Inform.
IEEE
Trans.
Theory, I T - 3 5 , 37-13.
[23] Cover, T . M . and J . A. Thomas (1991). Elements of Information
Theory.
Wiley,
New York. [24] Csiszar, I . (1975). I-divergence geometry of probability distributions and minimization problems. A n n . Probability,
3 , 146-158.
[25] Csiszar, I . (1984). Sanov property, generalized I-projection and a conditional 1 2 , 768-793.
limit theorem. A n n . Probability,
[26] Csiszar, I . and J . K o m e r (1981). Discrete
Memoryless
Systems.
Information
Theory:
Coding
Theorems
for
Academic Press, New York.
[27] Dacunha-Castelle, D. (1979). Formule de Chernoff pour des rapports de vraisemblance. I n Grandes Deviations
et Applications
Statistiques
(asterisqve
68),
25-31, Societe M a t h . France. [28] Dembo, A. (1989).
On Gaussian feedback capacity.
IEEE
Trans.
Inform.
Theory, I T - 3 5 , 1072-1076. [29] Dembo, A., Cover, T . M . and T . A. Thomas (1992). inequalities. IEEE
Trans. Inform.
Information theoretic
Theory, I T - 3 7 , 1501-1518.
[30] Deuschel, J . D. and D. W . Strook (1989). Large Deviations. New York.
Academic Press,
295
BIBLIOGRAPHY
[31] Dobrushin, R. L. (1959). General formulation o f Shannon' s main theorem i n information theory.
Uspekhi
Mat.
1 4 , 3-104; Translated i n A.M.S.
Nauk,
Transl.
Ser. 2, 33 (1963), 323-438. [32] Doob, J . L. (1953).
Stochastic
Wiley, New York.
Processes.
[33] Duncan, T . E. (1970). On the calculation of the mutual information. Appl.
Math.,
SIAM
[34] Ebert, P. (1970). The capacity o f the Gaussian channel w i t h feedback. System
J.
1 9 , 215-220.
Tech.
Bell
J., 4 9 , 1705-1712.
[35] Ellis, R. S. (1984). Large deviations for a general class of random vectors. Ann. Probability,
1 2 , 1-12.
[36] Ellis, R. S. (1988).
Entropy,
Large
Deviations
and Statistical
Sprin-
Mechanics.
ger Verlag, Berlin. [37] Faddeev, D . K . (1956). scheme. Uspehi
[38] Fano, R. M . (1961).
1 1 , 227-231 ( i n Russian).
Transmission
of Information:
A statistical
theory
[39] Feller, W. (1950, 1966).
com-
An Introduction
to Probability
Theory
and Its
Applica-
Vol. I , I I . Wiley, New York.
[40] Franke, J . (1985). M i n i m ax-robust prediction of discrete time series. verm.
of
Wiley, New York.
munication.
tions,
O n the concept of entropy o f a finite probabilistic
Mat. Natik,
Z.
Wahr.
Geo., 6 8 , 337-364.
[41] Franke, J . (1985). A R M A processes have maximal entropy among time series w i t h prescribed autocovariances and impulse responses.
Adv.
Appl.
Probability,
1 7 , 810-840. [42] Fujisaki, M., Kallianpur, G. and H. K u n i t a (1972). Stochastic differential equations for the nonlinear filtering problem. [43] Gallager, R. G . (1968). Information
Osaka
Theory
J.
Math.,
and Reliable
9, 19-40. Communication.
Wi-
On large deviations from the invariant measure.
Theory
ley, New York. [44] Gartner, J . (1977). Probability
Appl,
2 2 , 24-39.
[45] Gel'fand, I . M . and A . M . Yaglom (1957). Calculation of the amount of information about a random function contained i n another such function. Mat.
Nauk,
1 2 , 3-52; Translated i n A.M.S.
Transl.,
Uspekhi
Ser. 2, 1 2 (1959), 199-246.
[46] Gray, R. (1977). Toeplitz and circulant matrices: I I . Stanford Univ., Inform. System Lab., Tech. Report, 6504-1. [47] Gray, R. (1990).
Entropy
and Information
[48] Grenander, U . and G. Szego (1984).
Theory. Toeplitz
Springer Verlag, New York.
Forms
Chelsea, New York (Second Ed.). [49] Halmos, P. R. (1950). Measure Theory,
van Nostrand.
and Their
Applications.
296
BIBLIOGRAPHY
[50] Hida, T . (1960).
Canonical representations of Gaussian processes and their
applications. Mem. Coll.
Sci. Univ.
Kyoto,
[51] Hida, T . and M . Hitsuda (1993).
Ser. A , 3 3 , 109-155.
Gaussian
Amer.
Processes.
M a t h . Soc,
Providence, Rhode Island. [52] Hitsuda, M . (1968). Representation of Gaussian processes equivalent t o Wiener process.
Osaka
Main., 5, 299-312.
J.
[53] Hitsuda, M . (1974). M u t u a l information i n Gaussian channels. /. Anal,
Multivariate
4 , 66-73.
[54] Hitsuda, M . and S. Ihara (1975). Gaussian channels and the optimal coding. J. Multivariate
Anal,
5, 106-118.
[55] Huang, R. Y . and R. A . Johnson (1962). nuous channels.
IEEE
Trans.
Inform.
Information capacity o f time-conti-
Theory,
I T - 8 , sl91-sl98.
[56] Huang, R. Y . and R. A. Johnson (1963). Information transmission w i t h timecontinuous random processes. [57]
IEEE
Trans.
Inform.
Theory,
I T - 9 , 84-94.
G a u s s i a n Random
Ibragimov, I . A . and Yu. A. Rozanov (1978).
Processes.
Springer Verlag, New York. [58] Ihara, S, (1973). In
Proc.
Math.,
Second
Optimal coding i n white Gaussian channel w i t h feedback. Japan-USSR
Symp.
on Probability
Theory{Lecture
Notes
in
3 3 0 ) , 120-123, Springer Verlag, Berlin.
[59] Ihara, S. (1974). Coding theory i n white Gaussian channel w i t h feedback. J. Anal., 4, 74-87.
Multivariate
[60] Ihara, S. (1977). Coding theory i n Gaussian channel w i t h feedback. I n Seventh
Prague
Con},
and 1974 Europ.
Meet.
Stat,
IVaiw.
Vol. B , 201-207, Czecho.
Acad. Sci.. [61] Ihara, S. (1978). On the capacity of channels w i t h additive non-Gaussian noise. Inform.
Control,
3 7 , 34-39.
[62] Ihara, S. (1979). O n the capacity of the discrete time Gaussian channel w i t h feedback. I n
Trans.
Eighth
Prague
Con}.,
Vol. C , 175-186, Czecho. Acad. Sci..
[63] Ihara, S. (1980). On the capacity of the continuous time Gaussian channel w i t h feedback. J. Multivariate
[64] Ihara, S. (1984).
Anal,
Stochastic
1 0 , 319-331.
Processes
and Entropy.
Iwanami Shoten, Tokyo,
(in Japanese). [65] Ihara, S. (1984). IEEE
Trans.
M a x i m u m entropy spectral analysis and A R M A processes.
Inform.
Theory,
I T - 3 0 , 377-380.
[66] Ihara, S. (1985). M u t u a l information i n stationary channels w i t h additive noise. IEEE
Trans.
Inform.
Theory,
I T - 3 1 , 602-606.
[67] Ihara, S. (1987). Relationship between the constraints and the m i n i m u m relative entropy spectral density of time series. Mem.
Fac.
Sci.
Kochi
Univ.
(Math.)
297
BIBLIOGRAPHY
8, 31-42. [68] l h a i a , S. (1988). Capacity of discrete time Gaussian channel w i t h and without feedback, L Mem.
Fac. Sci. Kocki
(Math.) 9 , 21-36.
Vniv.
[69] Ihara, S. (1990). Capacity of mismatched Gaussian channels w i t h and without feedback. Probability
Theory
Rel.
Fields,
84, 453-471.
[70] Jaynes, E. T . (1957). Information theory and statistical mechanics.
Phys.
Rev.,
106, 620-630. [71] Jaynes, E. T . (1957). Rev.,
Information theory and statistical mechanics, 11.
Phys.
108, 171-190.
[72] Kac. M . , Murdock, W . L. and G. Szego (1953). On the eigen-values of certain Hermitian forms. J .
Rat. Mech.
Anal.,
767-800.
2,
[73] Kadota, T . T . , Zakai, M . and J . Z i v (1971). M u t u a l information of the white Gaussian channel w i t h and without feedback.
IEEE
Trans.
Inform.
Theory,
I T - 1 7 , 368-371. [74] Kadota, T . T . , Zakai, M . and J . Ziv (1971). Capacity of a continuous memoryless channel w i t h feedback.
IEEE
Trans.
Inform.
Theory,
I T - 1 7 , 372-378.
[75] K a i l a t h , T . (1970). The innovation approach to detection and estimation theory. Proc.
IEEE,
5 9 , 680-695.
[76] K a i l a t h . T . (1971).
R K H S approach t o detection and estimation problems -
Part I : Deterministic signals i n Gaussian noise,
IEEE
Trans.
Inform.
Theory,
I T - 1 7 , 530-549. [77] Kapur, J . N . (1989).
Maximum-Entropy
Models
in Science
and
Engineering.
Wiley Eastern, New Delhi. [78] K h i n c h i n , A . I . (1957).
Mathematical
Foundations
of Information
Theory.
Do-
ver, New York. [79] Kolmogorov, A . N. (1956). Theory of information transmission. I n sion
A.M.S.
on Sci.
Transl.
Problems
Proc.
Ses-
Izd. Akad. Nauk SSSR; Translated i n
of Automation,
Ser. 2, 3 3 (1963), 291-321.
[80] Kolmogorov, A. N . (1956). On the Shannon theory of information transmission in the case of continuous signals.
IRE
Trans.
Inform.
Theory,
I T - 2 , 102-108.
[81] Kolmogorov, A . N. (1958), A new invariant of transitive dynamical systems and automorphisms of Lebesgue spaces.
Dokl.
Acad.
Nauk
SSSR,
119, 861-864 (in
Russian). [82] Kolmogorov, A . N . (1959). automorphism. Dokl. [83] Kullback, S. (1959).
Acad.
Entropy per unit time as a metric invariant o f Nauk
Information
SSSR, Theory
[84] Kullback. S. and R. A. Leibler (1951). Math.
Statist.,
2 2 , 79-86.
124, 754-755 (in Russian). and Statistics.
Wiley, New York.
On information and sufficiency. A n n .
298
BIBLIOGRAPHY
[85] Lin'kov, Yu. N . (1965). Evaluation of e-entropy of random variables for small e.
Problems
Inform.
[86] Linnik, Yu.
1, 12-18.
Trans.,
V . (1959).
A n information-theoretic proof of the central l i m i t
theorem w i t h the Lindeberg condition. [87] Liptser, R. S. (1974).
Theory
Probability
4 , 288-299.
Appl.,
O p t i m a l encoding and decoding for transmission of
a Gaussian Markov signal i n a noiseless-feed back channel.
Problems
Inform.
T r a n s . , 1 0 , 279-288. [88] Liptser, R. S. and A. N . Shiryayev (1977). Statistics
of Random
Processes
I, I I .
Springer Verlag, New York. [89] Loeve, M . (1977).
Probability
Theory
I.
Springer Verlag, New York ( 4 t h Ed.).
[90] Natarajan, S. (1985). Large deviations, hypothesis testing, and source coding for finite Markov chains.
IEEE
Trans.
Inform.
I T - 3 1 , 360-365.
Theory,
[91] Ovseevich, I . A. (1970). O p t i m u m transmission o f a Gaussian message over a channel w i t h white Gaussian noise and feedback.
Problems
Inform.
Trans.,
6,
191-199. [92] Ovseevich, I . A. and M . S. Pinsker (1974). Linear transmission of Gaussian messages over a nonstationary Gaussian channel w i t h feedback.
Problems
Inform.
1 0 , 85-99.
Trans.,
[93] Ozarow, L. H . (1990). feedback.
IEEE
Trans.
Random coding for additive Gaussian channels w i t h Inform.
I T - 3 6 , 17-22.
Theory,
[94] Ozarow, L. H. (1990). Upper bounds on the capacity o f Gaussian channels w i t h feedback.
IEEE
Trans.
Inform.
[95] Pinsker, M . S. (1956).
I T - 3 6 , 155-161.
Theory,
Calculation of the rate of information transmission of
stationary random processes and the capacity of stationary channels. Akad.
Nauk
[96] Pinsker, M . S. (1963). Gaussian sources. sion,
Problems
of Information
Transmis-
14, 59-100 (in Russian).
[97] Pinsker, M . S. (1964). Information
ables
Dokl.
1 1 1 , 753-756 ( i n Russian).
SSSR,
and Processes.
and Information
Stability
of Random
Vari-
Holden-Day, San Francisco.
[98] Priestley, M . B. (1981).
Spectral
Analysis
and Time
Series.
Academic Press,
London. [99] Renyi, A. (1961). Berkeley
Symp.
O n measures of entropy and information.
Math.
Statist.
Prob.,
I n Proc. .ji/i
Vol. 1, 547-561, Univ. o f Calif. Press,
Berkeley. [100] Renyi, A. (1965). On the foundations of information theory. nal.
Stat.,
Rev.
Inst.
Inter-
3 3 , 1-33.
[101] Sanov, I . N. (1957). On the probability of large deviations of random variables. Mat.
Sbornik,
4 2 , 11-44; Translated in
Selected
TntuL
in Math.
Statist.
2fl9
BIBLIOGRAPHY Probability,
1 (1961), 213-244.
[102] Schalkwijk, J . P. M. and T . Kailath (1966). A coding scheme for additive noise channels w i t h feedback, Part I : No bandwidth constraints. Theory,
IEEE
Trans.
[103] Shannon, C. E. (1948). A mathematical theory of communication. Tech.
Inform.
I T - 1 2 , 172-182. Bell
System
J . , 2 7 , 379-423, 623-656.
[104] Shannon, C. E. (1949).
IRE,
Communication i n the presence of noise. Proc.
3 7 , 10-21. [105] Shannon, C. E. (1959). Coding theorems for a discrete source w i t h a fidelity criterion.
IRE
Nat.
ff.ec, p t . 4, 142-163.
Conv.
[106] Shannon, C. E. and W. Weaver (1949).
The
Mathematical
Theory
of
Commu-
Univ. of Illinois Press, Urbana.
nication.
[107] Shore, J. E. and FL W. Johnson (1980). Axiomatic derivation of the principle of m a x i m u m entropy and the principle of minimum cross-entropy. Inform.
[108] Slepian, D. (Ed.) (1974). ory.
Key
Integral
[110] Solev, V. N. (1981). [Ill]
Trans.
Papers
in the
Development
of Information
The-
I E E E Press, New York.
[109] Smithies, F. (1958). Math.,
IEEE
I T - 2 6 , 26-37.
Theory,
Equations.
Cambridge Univ. Press.
Information i n a scheme w i t h additive noise.
J.
Soviet
16, 996-1004 (translated from Russian).
Stam, A. (1959). sources. Inform.
Some inequalities satisfied by the quantities of information Control,
2, 101-112.
|112] Tiernan, J . C. and J . P. M . Schalkwijk (1974). A n upper bound to the capacity of band-limited Gaussian autoregressive channel w i t h noiseless feedback. Trans.
Inform.
IEEE
I T - 2 0 , 311-316.
Theory,
[113] Tverberg, H. (1958).
Math.
A new derivation of the information function.
S c a n d . , 6, 297-298.
[114] Ulrych, T . J . and T . N. Bishop (1975). and autoregressive decomposition.
Reviews
M a x i m u m entropy spectral analysis of Geophys.
and
Space
13,
Phys.,
183-200. [115]
Varadhan, S. R. S. (1984). Large
Deviations
and Applications.
Soc.
Industrial
and Applied Mathematics, Philadelphia. [116] Wyner, A. D . (1965). The capacity of the band-limited Gaussian channel. System
Tech.
J.,
45,
[117] Yule, G. U. (1927).
Bell
359-395.
On a method of investigating i n distributed series, w i t h
special reference to Wolfer's sunspot numbers. Phil.
Ser. A, 2 2 6 , 267-298.
Trans.
Royal
Soc.
London,
List of Symbols
det A
determinant of a m a t r i x A
18
in
determinant of a m a t r i x T
47
M
largest integer not greater than x
II HI
total variation
II • IU,T
norm of reproducing kernel Hilbert space
270
( • > ' )z,T
inner product of reproducing kernel Hilbert space
270
(•-•)
inner product of L (fl, P) ji is absolutely continuous w i t h respect to v
259
cardinal number of B
264
p.
31
2
•< v
covariance m a t r i x of
r (z) w
7(0 7(M)
{Z\,Zpi)
183 63
covariance function of a stationary process
58, 83
boundary of
(S) "(
21
covariance function of a stationary process covariance function of a process
PX
153
58 103
F fi r
probability distribution of X
36, 288
joint probability distribution of X and Y
36, 289
probability d i s t r i b u t i o n of
151
{£*;-
150
<S)
error probability of reproduction
163
p(i>y)
distortion measure
(T-field generated by Y
290
properties of Gaussian channel w i t h feedback
229
(A.0) -
(A.3)
42
property of Gaussian channel w i t h o u t feedback
229
(Ao.O) - ( A 3 )
properties of white Gaussian channel w i t h feedback
230
!A ,2')
property of white Gaussian channel without feedback
230
(A.2') 0 l
0
300
LIST
OF
301
SYMBOLS
161
A
class of admissible pairs
A
class of admissible pairs
161
A i m
information rate
265
Af{ )
class of admissible pairs
253
At
class of admissible Gaussian pairs
212
•VR)
class of admissible Gaussian pairs
273
Am
class of admissible pairs
273
A{P)
class of admissible pairs
250
( U j , (a.2)
properties of discrete Gaussian channel w i t h feedback
178
(a.2 )
property of discrete Gaussian channel without feedback
178
S
(7-field
287
P
1
Brownian motion a-field
generated by (X\,
59 ...,X ) n
66
8(X)
(T-field
C
channel capacity
21, 31
c
channel capacity per unit time
Cj
feedback capacity per unit time
Cf
feedback capacity
161
Cf{A)
channel capacity
161
Cf(A)
channel capacity per unit time
162
G(X)
channel capacity
160
C{X)
channel capacity per unit time
162 251
C {p)
channel capacity
Cf(p)
feedback capacity
C{ )
channel capacity per unit time
T
P
feedback capacity per unit time
Cf(p) D(R,
X)
D{R;i)
distortion-rate function distortion-rate function per unit time e-typical
set i n distortion
m i n i m u m reproduction error
160 162 162, 253
251 163, 253 253 43 150 152 172
D(p;u)
entropy function
26
d{X,Y)
reproduction error, distortion
42
distortion per unit time
150
E[X\
expectation of X
E[X\?\
conditional expectation of X given T
290
E[X\Y]
conditional expectation of X given Y
290
p
-
37, 289
a set of spectral density functions
122
a set of probability distributions
101
302
LIST
OF
SYMBOLS
spectral density function
63, 84
( H . l ) - (H.10)
properties of entropy
5, 6 , 8
( H . l ' ) , (H.2')
properties of entropy
9
relative entropy rate
80
H(f;g)
H-projection of v on F
100
H-projection
122
entropy of ( K , , Y
H (Y) N
N
178
)
H(p ...,p )
entropy of distribution ( p i , . . . , p )
H (X)
subspace spanned by {A ,, s < (}
u
m
m
-
t
3,4
H{X)
entropy of X
W(X)
entropy rate of X
H(X\Y)
conditional entropy of X given Y
H(X\Y
=
59
conditional entropy of X given Y = b
b)
relative entropy of M w i t h respect to v
mm*)
relative entropy rate of p w i t h respect t o v relative entropy rate of X w i t h respect t o Y
H{PX\PY)
3,4 64, 83
3,7 8 4, 21 61 61,62 23
H±{p\v)
relative entropy associated w i t h p a r t i t i o n A
( h . l ) - (h.7)
properties of continuous entropy
h(f)
entropy rate
h (Y)
continuous entropy of
HP)
continuous entropy of p(x)
13
h(X)
continuous entropy of X
13
h(X)
continuous entropy rate of X
59
KX[Y)
conditional entropy of A" given Y
17
N
h(X\Y
=
76 (Yj,Iw)
conditional entropy of A" given Y = y
y)
17, 18 178
17
(1.1) - (1.10)
properties of mutual information
/»(e.r) frit.Y) I{X,Y)
mutual information
178
mutual information
62, 231
I(X,Y)
mutual information rate between X and Y
mutual information between X and Y conditional m u t u a l information
I{X,Y\Z) I{X,Y\Z
=
z)
conditional m u t u a l information
39, 40
3, 36 6 1 , 62 39 38
I{x)
entropy function
ICT(Z)
reproducing kernel Hilbert space
270
mm
i(x?«AW*)
180
111, 116
L(T)
UXT^YZJY*^
i"(R)
L
p
space
92
i (0,P)
L
2
space
64
2
266
LIST
OF
SYMBOLS
303
L ([-n,ir],F)
t? space
64
Mi
space of probability distributions
29
M,(X,fi(X)) P(A)
space of probability distributions
29
2
probability of A
4
P(A- €
A\T)
conditional probability
291
P(X€
A\Y)
conditional probability d-dimensional Euclidean space
291
R
s
R(D;
R)
rate-distort ion function
13 43
R(D;0
rate-distort ion function per unit time
150
R{t,s)
covariance function of a process
237
S
class of spectral density functions
T
time parameter space
57
W )
Toeplitz matrix trace of matrix A
49
r
e
( S )
e-typical
set
65, 121 78 152
class of admissible input signals
160
X
class of admissible input signals
#,(*)
class of admissible Gaussian input signals
161 262
class of admissible input signals
262, 273
class of admissible input signals
251
AP)
class of admissible input signals
163, 253
XT
{X(s);s
X?
{X(s);s>t}
YI
{Y{t);S
Z
the set of all integers
57
Z+
the set of all positive integers
59
86 86
231
Subject Index

absolutely continuous, 21
   Wiener measure, 232
Aczel,J., 293
achievable rate, 164
admissible input, 160
admissible pair, 161
Akaike,H., 293
Akaike information criterion, 143
alphabet, 145
alphabet space, 145
Arimoto,S., 293
AR (autoregressive) process, 69, 119, 120
ARMA (autoregressive-moving average) process, 70, 126, 127
auto covariance function, 59
autoregressive process, 69
autoregressive-moving average process, 70
average power constraint, 159, 195, 250
backward channel, 147
Baker,C.R., 285, 293
band limited process, 90
Barron,A., 293
Berger,T., 56, 176, 293
best predictor, 67, 85
Billingsley,P., 293
Binia,J., 227, 286, 293
binomial distribution, 108
Bishop,T.N., 143, 299
bit, 5
Blachman,N., 293
Blahut,R.E., 56, 294
Boltzmann's H theorem, 3
Borel field, 288
Borel set, 288
Box,G.E.P., 96, 294
Brownian motion, 59, 230
   N-dimensional, 268
Bucklew,J.A., 143, 294
Burg,J.P., 118, 143, 294
Butman,S., 294
canonical representation, 66, 85, 269
   in the sense of Lévy-Hida-Cramér, 96, 269
capacity, 160
   coding capacity, 160, 164, 165
   Gaussian channel, 200, 201, 213, 262, 273
   information capacity, 160, 165
   per unit time, 162
   stationary Gaussian channel, 202, 267
   white Gaussian channel, 196, 197, 199, 251, 253
chain rule, 9, 41
channel, 144
   memoryless channel, 159
   stable channel, 164
   with additive noise, 147, 177-183
   with additive feedback, 242, 245
   with feedback, 147, 178
   without feedback, 178
channel capacity, 158-162
channel coding theorem, 162-171
   converse, 164
   Gaussian channel, 224, 281
Childers,D.G., 294
code, 153
code word, 153
conditional entropy, 3, 7, 17
conditional expectation, 289
conditional mutual information, 39
conditional probability, 291
conditional probability distribution, 44, 291
c.o.n.s. (complete orthonormal system), 189
continuous entropy, 13-20
   Gaussian random variable, 48
convergence in entropy, 31
convergence in law, 31
convergence in variation, 31
convex, 29
covariance function, 58
covariance operator, 247
Cover,T.M., 56, 227, 294
cross entropy, 23
Csiszar,I., 143, 176, 294
Dacunha-Castelle,D., 143, 294
Daroczy,Z., 293
decision region, 128
decoder, 144
Dembo,A., 294
Deuschel,J.D., 143, 294
differential entropy, 13-20
distortion, 42
   per unit time, 150
distortion measure, 42
distortion-rate function, 43, 44, 54, 149
   per unit time, 150
distortion stable, 152
Dobrushin,R.L., 38, 176, 295
Doob,J.L., 96, 295
Duncan,T.E., 285, 295
Ebert,P., 295
eigenfunction, 247
eigenvalue, 247
Ellis,R.S., 116, 143, 295
encoder, 144
entropy, 2, 3, 4-12, 13
   per unit time, 59
entropy function, 26, 111
entropy rate, 59, 76-83
   stationary Gaussian process, 76
ε-entropy, 56
ε-typical pair, 152
ε-typical set, 152, 165, 283
error probability, 42
   hypothesis testing, 128, 132, 137
   of reproduction, 42, 163
   of the first kind, 128
   of the second kind, 128
exponential distribution, 14, 109
Faddeev,D.K., 56, 295
Fano,R.M., 56, 295
Fano's inequality, 11
feedback, 147
   linear feedback, 234
Feller,W., 295
filter, 234
filtering error, 234
forward channel, 147
Franke,J., 143, 295
Fujisaki,M., 295
fundamental theorem in information transmission, 171-175
Gallager,R.G., 56, 97, 295
Gartner,J., 116, 143, 295
Gaussian channel, 147, 177-286
   continuous time, 228-286
   discrete time, 177-227
   stationary, 263
   with feedback, 229
   without feedback, 229
Gaussian distribution, 47
Gaussian process, 57
Gaussian random variable, 47
Gaussian random vector, 47
Gaussian system, 57
Gel'fand,I.M., 227, 285, 295
Gibbs distribution, 108
Gray,R., 56, 80, 295
Grenander,U., 295
Halmos,P.R., 295
Hida,T., 96, 296
Hitsuda,M., 96, 286, 296
H-projection, 100, 101-109
H̄-projection, 122
Huang,R.Y., 296
hypothesis testing, 127-142
Ibragimov,I.A., 96, 296
Ihara,S., 56, 143, 227, 285, 286, 293, 296, 297
i.i.d. (independent and identically distributed), 67
information divergence, 23
information for discrimination, 23
information gain, 23
information source, 144
information stable, 152, 193, 261
innovation process, 66
input signal, 144
Jaynes,E.T., 98, 107, 143, 297
Jenkins,G.M., 96, 294
Johnson,R.A., 296
Johnson,R.W., 143, 299
joint probability distribution, 288
Kac,M., 97, 297
Kadota,T.T., 285, 297
Kailath,T., 297, 299
Kallianpur,G., 295
Kapur,J.N., 297
Khinchin,A.I., 56, 297
Kolmogorov,A.N., 56, 97, 176, 227, 285, 286, 297
Korner,J., 176, 294
Kullback,S., 2, 23, 56, 98, 143, 297
Kullback-Leibler information number, 23
Kunita,H., 295
large deviation property, 111, 117
large deviation theorem, 110-118, 141
law of large numbers, 110
   weak law, 110
   strong law, 110
Lebesgue measure, 288
Leibler,R.A., 56, 297
Lin'kov,Yu.N., 56, 298
Linnik,Yu.V., 298
Liptser,R.S., 285, 298
Loeve,M., 298
log likelihood ratio function, 129
lower semi-continuity, 33
MA (moving average) process, 69
Markov chain, 39
Markov process, 58
maximum entropy distribution, 99
maximum entropy principle, 98
maximum entropy spectral analysis, 118-127
ME (maximum entropy), 99, 119
mean square error, 42, 290
measure, 287
measurable set, 287
measurable space, 287
measure space, 288
MEP (maximum entropy principle), 98
message, 144
   stable message, 154
minimum prediction error, 67
minimum relative entropy distribution, 99
minimum relative entropy principle, 98
minimum relative entropy spectral analysis, 122-127
minimum reproduction error, 171
moment generating function, 106
moving average process, 69
moving average representation, 66
MRE (minimum relative entropy), 99, 119
MREP (minimum relative entropy principle), 98
multiple Markov process, 74, 87, 95
   in the strict sense, 74, 87
   in the wide sense, 74, 87
Murdock,W.L., 97, 297
mutual information, 2, 3, 35-41
   Gaussian processes, 191, 259
   Gaussian random variables, 50
   in additive noise channel, 178
   in Gaussian channel, 183, 184, 212, 272
   in stationary channel, 181, 266
   in white Gaussian channel, 235, 238, 248
   per unit time, 61, 62
mutual information rate, 61, 62
   in stationary Gaussian channel, 186, 264
nat, 5
Natarajan,S., 143, 298
Neyman-Pearson theorem, 129
noise, 144
optimal coding, 207-210, 254-258
   in the sense of filtering, 255
   in the sense of mutual information, 255
Ornstein-Uhlenbeck Brownian motion, 59, 88, 242
orthogonal projection, 188
output signal, 144
Ovseevich,I.A., 298
Ozarow,L.H., 298
partition, 23
   refinement, 24
Pinsker,M.S., 56, 82, 97, 168, 176, 227, 285, 286, 298
Poisson distribution, 109
Pombra,S., 227, 294
power spectral density, 63
Priestley,M.B., 96, 298
probability, 4, 288
probability density function, 13
probability distribution, 4, 288
   continuous, 13
probability measure, 288
probability space, 288
Radon-Nikodym derivative, 21, 232
Radon-Nikodym theorem, 21
random coding method, 154, 166
random variable, 288
   continuous, 13
rate-distortion function, 41-47, 149
   Brownian motion, 279
   Gaussian process, 220, 222, 277, 280
   Gaussian random variable, 54
   per unit time, 150
rate function, 26, 111
received message, 145
receiver, 144
reference measure, 21
relative entropy, 2, 3, 20-34
   Gaussian distributions, 49
relative entropy rate, 61, 62, 80-83
   stationary Gaussian process, 81
Renyi,A., 298
reproducing kernel Hilbert space, 270
reproduction error, 42
resolvent kernel, 243
RKHS (reproducing kernel Hilbert space), 270
Riccati equation, 241
Rozanov,Yu.A., 96, 296
sampling theorem, 91-93
Sanov,I.N., 143, 298
Sanov property, 111, 113
Schalkwijk,J.P.M., 299
SDF (spectral density function), 63, 84
Shannon,C.E., 1, 2, 3, 56, 97, 175, 226, 285, 299
Shiryayev,A.N., 298
Shore,J.E., 143, 299
σ-algebra, 287
σ-field, 287
simultaneous orthogonalization, 191
Slepian,D., 299
Smithies,F., 299
Solev,V.N., 227, 285, 299
source, 144
   memoryless source, 148
source coding theorem, 153-158
spectral decomposition, 63
spectral density function, 63, 84
   rational, 72, 84
spectral distribution function, 63, 84
spectral representation, 64, 84
stable
   in distortion, 152
   in information, 152
Stam,A., 299
stationary channel, 179, 181, 263
stationary process, 58, 62-96
   deterministic, 64, 84
   purely nondeterministic, 65, 84
   strictly, 58
   weakly, 58
stochastic integral, 64, 292
Stroock,D.W., 143, 294
Szego,G., 97, 295, 297
Thomas,J.A., 56, 294
Tiernan,J.C., 299
Toeplitz matrix, 78
total variation, 31
transmitter, 144
Tverberg,H., 299
Ulrych,T.J., 143, 299
uniform distribution, 13
user, 144
Varadhan,S.R.S., 143, 299
Volterra function, 237
water filling method, 203
weak convergence, 31
Weaver,W., 299
white Gaussian channel, 183, 230
   with feedback, 230
   without feedback, 230
white noise, 66
Wiener-Hopf equation, 237
Wiener integral, 292
Wiener measure, 232
Wold decomposition, 65
Wyner,A.D., 299
Yaglom,A.M., 227, 285, 295
Yule,G.U., 70, 299
Yule-Walker equation, 73
Zakai,M., 227, 285, 286, 293, 297
Ziv,J., 227, 285, 286, 293, 297