λ(u) = sup{ |(φ* ⊗ ψ*)(u)| : φ* ∈ E*, ψ* ∈ F*, ‖φ*‖ ≤ 1, ‖ψ*‖ ≤ 1 },
where (φ* ⊗ ψ*)(φ ⊗ ψ) = φ*(φ)ψ*(ψ). The completion of E ⊗ F w.r.t. λ is denoted by E ⊗_λ F and called the injective tensor product Banach space of E and F. Suppose that X, Y are compact Hausdorff spaces. If E = C(X), then C(X) ⊗ F is identified with a function space consisting of F-valued functions on X:
(Σ_k a_k ⊗ f_k)(x) = Σ_k a_k(x) f_k,  x ∈ X.
Moreover, it is true that
C(X) ⊗_λ F = C(X; F),
the Banach space of all F-valued (norm) continuous functions on X with the sup norm. In particular, we have that
C(X) ⊗_λ C(Y) = C(X × Y),
where we identify (a ⊗ b)(x, y) = a(x)b(y) for a ∈ C(X), b ∈ C(Y) and x ∈ X, y ∈ Y.

Proposition 5. Consider alphabet message spaces X = X₀^ℤ, Y = Y₀^ℤ and a function ν : X × 𝔜 → [0, 1]. Then:
(1) (c5) ⇒ (c5′) ⇔ (c5″).
(2) (c5′), (c6) ⇒ (c2).
Proof. (1) (c5) ⇒ (c5′) is clear, and (c5′) ⇔ (c5″) follows from the fact that each message [y_i ⋯ y_j] is a clopen set and the set 𝔐 of all messages forms a topological basis for Y, together with the fact that C(X × Y) = C(X) ⊗_λ C(Y), where the algebraic tensor product C(X) ⊗ C(Y) is dense in it, as was noted above.
(2) Let b ∈ C(Y). Then there exists a sequence {s_n}_{n=1}^∞ of simple functions of the form
s_n = Σ_{k=1}^{k_n} α_{n,k} 1_{A_{n,k}},  A_{n,k} ∈ 𝔐,  n ≥ 1,
such that
sup_{y∈Y} |s_n(y) − b(y)| = ‖s_n − b‖ → 0  as n → ∞.
3.1. Information channels
This implies that
∫_Y s_n(y) ν(x, dy) → ∫_Y b(y) ν(x, dy)  uniformly in x ∈ X.
Since ∫_Y s_n(y) ν(·, dy) ∈ C(X) by (c5′), we have ∫_Y b(y) ν(·, dy) ∈ C(X). Now (c6) implies that the Radon-Nikodým derivative k(x, ·) ≡ (dν(x, ·)/dη)(·) exists in L¹(Y, η) for every fixed x ∈ X and
ν(x, C) = ∫_C k(x, y) η(dy),  x ∈ X, C ∈ 𝔜.
Let C ∈ 𝔜 be given and find a sequence {b_n}_{n=1}^∞ ⊆ C(Y) such that
∫_Y |b_n(y) − 1_C(y)| η(dy) → 0.
Then we have that, for x ∈ X,
ν(x, C) = ∫_Y 1_C(y) k(x, y) η(dy)
= lim_{n→∞} ∫_Y b_n(y) k(x, y) η(dy)
= lim_{n→∞} ∫_Y b_n(y) ν(x, dy)
by the Dominated Convergence Theorem. Therefore, ν(·, C) is a limit of a sequence of continuous functions, and hence is measurable on X. That is, (c2) holds.

For a channel ν ∈ C(X, Y) define an operator K_ν : B(Y) → B(X) by
(K_ν b)(x) = ∫_Y b(y) ν(x, dy),  b ∈ B(Y),  (1.5)
where B(X), B(Y) are the spaces of all bounded measurable functions on X, Y, respectively. In fact, if b = Σ_{j=1}^n α_j 1_{C_j}, a simple function on Y, then obviously K_ν b ∈ B(X). For a general b ∈ B(Y) we can choose a sequence {b_n} of simple functions converging to b pointwise with |b_n| ≤ |b| for n ≥ 1. Then the Dominated Convergence Theorem applies to conclude that K_ν b ∈ B(X). Now we can characterize continuous channels between alphabet message spaces using the operator K_ν.
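As a concrete illustration of (1.5), suppose for the moment that X and Y are finite sets (a hypothetical toy setting, not the sequence spaces of the text); then a channel is just a row-stochastic matrix and K_ν is matrix-vector multiplication. A minimal sketch:

```python
import numpy as np

# Hypothetical finite sketch: X = {0, 1}, Y = {0, 1, 2}.  A channel nu is a
# probability measure nu(x, .) on Y for each x, i.e. a row-stochastic matrix
# N with N[x, y] = nu(x, {y}).
N = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])

def K(b):
    """(K_nu b)(x) = integral of b(y) nu(x, dy) = sum over y of N[x, y] b(y)."""
    return N @ b

b = np.array([1.0, -2.0, 0.5])     # a bounded function on Y
Kb = K(b)                           # a bounded function on X

assert Kb.shape == (2,)                             # K_nu b is a function on X
assert np.allclose(K(np.ones(3)), 1.0)              # K_nu 1 = 1
assert (K(np.array([1.0, 2.0, 3.0])) >= 0).all()    # b >= 0 implies K_nu b >= 0
```

The two assertions at the end are exactly the normalization and positivity properties that (c1) forces on K_ν in the general measure-theoretic setting.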
Proposition 6. Consider alphabet message spaces X = X₀^ℤ and Y = Y₀^ℤ and a stationary channel ν ∈ C_s(X, Y). Then the following conditions are equivalent:
Chapter III: Information Channels
(1) ν is continuous, i.e., ν satisfies (c5′);
(2) the operator K_ν defined by (1.5) is a linear operator from C(Y) into C(X);
(3) the dual operator K_ν* of K_ν maps C(X)* into C(Y)*, i.e., K_ν* : M(X) → M(Y), and is sequentially weak* continuous, i.e., ξ_n → ξ weak* implies K_ν*ξ_n → K_ν*ξ weak*;
(4) ν(·, ∩_{k=1}^n [y_{i_k}^{(k)} ⋯ y_{j_k}^{(k)}]) ∈ C(X) for n ≥ 1 and messages [y_{i_k}^{(k)} ⋯ y_{j_k}^{(k)}] (1 ≤ k ≤ n).

Proof. (4) ⇒ (1) is obvious. We shall show (1) ⇒ (4) ⇒ (2) ⇒ (3) ⇒ (1).
(1) ⇒ (4). We first prove the case where n = 2. If j₁ < i₂, then
[y_{i_1}^{(1)} ⋯ y_{j_1}^{(1)}] ∩ [y_{i_2}^{(2)} ⋯ y_{j_2}^{(2)}] = ⋃_{y_k ∈ Y₀, j₁ < k < i₂} [y_{i_1}^{(1)} ⋯ y_{j_1}^{(1)} y_{j₁+1} ⋯ y_{i₂−1} y_{i_2}^{(2)} ⋯ y_{j_2}^{(2)}]  (1.6)
and hence
ν(·, [y_{i_1}^{(1)} ⋯ y_{j_1}^{(1)}] ∩ [y_{i_2}^{(2)} ⋯ y_{j_2}^{(2)}]) = Σ_{y_k ∈ Y₀, j₁ < k < i₂} ν(·, [y_{i_1}^{(1)} ⋯ y_{j_1}^{(1)} y_{j₁+1} ⋯ y_{i₂−1} y_{i_2}^{(2)} ⋯ y_{j_2}^{(2)}]).  (1.7)
Since each term on the RHS of (1.7) is continuous by (1) and the sum is finite, the LHS is also continuous. If i₁ ≤ i₂ ≤ j₁ ≤ j₂, then the LHS of (1.6)
= ∅ if y_k^{(2)} ≠ y_k^{(1)} for some i₂ ≤ k ≤ j₁,
= [y_{i_1}^{(1)} ⋯ y_{j_1}^{(1)} y_{j₁+1}^{(2)} ⋯ y_{j_2}^{(2)}] if y_k^{(1)} = y_k^{(2)} for i₂ ≤ k ≤ j₁.
Thus the LHS of (1.7) is continuous by (1). The other cases are proved similarly. For a general n ≥ 2, we can establish (4) by mathematical induction.

(4) ⇒ (2). Let 𝔄₀ be the algebra generated by the functions of the form 1_{[y_i ⋯ y_j]} (i ≤ j). Then (4) implies that K_ν 𝔄₀ ⊆ C(X). Since each b ∈ C(Y) is uniformly approximated by a sequence {b_n} ⊂ 𝔄₀, we have K_ν b ∈ C(X) because ‖K_ν(b − b_n)‖ ≤ ‖b − b_n‖ → 0. Therefore, K_ν is a linear operator from C(Y) into C(X).

(2) ⇒ (3). Suppose that {ξ_n} ⊆ M(X) = C(X)* is a sequence converging weak* to ξ ∈ M(X), i.e.,
⟨a, ξ_n⟩ = ∫_X a dξ_n → ∫_X a dξ = ⟨a, ξ⟩,  a ∈ C(X),
where ⟨·, ·⟩ is the duality pair for C(X) and M(X). Then for b ∈ C(Y) we have that
⟨b, K_ν*ξ_n⟩ = ⟨K_ν b, ξ_n⟩ → ⟨K_ν b, ξ⟩ = ⟨b, K_ν*ξ⟩,
since K_ν b ∈ C(X) by (2). This means that {K_ν*ξ_n} ⊆ M(Y) converges weak* to K_ν*ξ. Therefore (3) holds.

(3) ⇒ (1). For x ∈ X we denote by δ_x the Dirac measure at x, i.e., δ_x(A) = 1 if x ∈ A and = 0 if x ∉ A, for A ∈ 𝔛. Note that δ_x ∈ M(X) for x ∈ X. Assume that {x_n}_{n=1}^∞ ⊆ X converges to x, i.e., d(x_n, x) → 0, where the metric d is defined by (II.1.1). For a ∈ C(X) one has
⟨a, δ_{x_n}⟩ = a(x_n) → a(x) = ⟨a, δ_x⟩.
Thus, by (3) we see that for b ∈ C(Y)
(K_ν b)(x_n) = ⟨b, K_ν*δ_{x_n}⟩ → ⟨b, K_ν*δ_x⟩ = (K_ν b)(x).
If, in particular, we take b = 1_{[y_i ⋯ y_j]}, then the above shows that K_ν b = ν(·, [y_i ⋯ y_j]) ∈ C(X). That means (1) holds.
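Condition (3) involves the dual operator K_ν* : M(X) → M(Y). In the same hypothetical finite sketch, measures are nonnegative vectors and K_ν* is multiplication by the transposed channel matrix; the duality ⟨K_ν b, ξ⟩ = ⟨b, K_ν*ξ⟩ used in the proof can be checked directly:

```python
import numpy as np

# Hypothetical finite sketch: channel matrix N, K_nu b = N b, and the dual
# operator acting on measures (vectors) through the transpose: K_nu* xi = N^T xi.
N = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])

def K(b):
    return N @ b

def K_star(xi):
    """(K_nu* xi)({y}) = integral of nu(x, {y}) xi(dx)."""
    return N.T @ xi

xi = np.array([0.25, 0.75])        # a probability measure on X
b = np.array([0.5, -1.0, 2.0])     # a function on Y

# Duality <K_nu b, xi> = <b, K_nu* xi>, which underlies condition (3):
assert np.isclose(K(b) @ xi, b @ K_star(xi))
# K_nu* carries probability measures on X to probability measures on Y:
assert np.isclose(K_star(xi).sum(), 1.0) and (K_star(xi) >= 0).all()
```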
3.2. Channel operators

Channels can be described by operators acting on spaces of bounded Baire functions. We shall study this kind of operators.
Let X, Y be a pair of compact Hausdorff spaces with Baire measurable transformations S, T, respectively. Consider the algebras B(X), B(Y) and B(X × Y) of all bounded Baire functions on X, Y and X × Y, respectively, where the product is pointwise multiplication. These are C*-algebras with unit 1, and B(X) and B(Y) are regarded as *-subalgebras of B(X × Y) by the identification of a = a ⊗ 1 and b = 1 ⊗ b for a ∈ B(X) and b ∈ B(Y). Here the involution f ↦ f* is defined by f*(ω) = the complex conjugate of f(ω) for f ∈ B(Ω), ω ∈ Ω, where Ω = X, Y or X × Y, and the norm is the sup norm. A linear operator A from B(X × Y) onto B(X) is said to be an averaging operator if for f, g ∈ B(X × Y):
(a1) A1 = 1, A(f·Ag) = (Af)(Ag);
(a2) f ≥ 0 ⇒ Af ≥ 0,
which imply:
(a1′) A is idempotent, i.e., A² = A;
(a2′) ‖A‖ = 1.
That is, A is a norm 1 projection from B(X × Y) onto B(X). An averaging operator A satisfies
Af* = (Af)*,  (Af)*(Af) ≤ A(f*f),  f ∈ B(X × Y).
In fact, the first follows from (a2) and the second is derived using the first, (a1) and (a2):
0 ≤ A((f − Af)*(f − Af)) = A(f*f) − (Af)*(Af).
Let us denote by A(X, Y) the set of all averaging operators A from B(X × Y) onto B(X) such that
(a3) f_n ↓ 0 ⇒ Af_n ↓ 0.
A ∈ A(X, Y) is said to be stationary if
(a4) SA = A(S ⊗ T).
A_s(X, Y) denotes the set of all stationary operators in A(X, Y). Note that A(X, Y) and A_s(X, Y) are convex sets. We need another set of operators. Let K(X, Y) denote the set of all bounded linear operators K : B(Y) → B(X) such that
(k1) K1 = 1, Kb ≥ 0 if b ≥ 0;
(k2) b_n ↓ 0 ⇒ Kb_n ↓ 0.
K ∈ K(X, Y) is said to be stationary if
(k3) KT = SK.
K_s(X, Y) denotes the set of all stationary K ∈ K(X, Y). Also note that K(X, Y) and K_s(X, Y) are convex sets. Let ν ∈ C(X, Y) be an abstract channel and define operators A : B(X × Y) → B(X) and K : B(Y) → B(X) respectively by
(Af)(x) = ∫_Y f(x, y) ν(x, dy),  f ∈ B(X × Y),  (2.1)
(Kb)(x) = ∫_Y b(y) ν(x, dy),  b ∈ B(Y).  (2.2)
A and K are called channel operators associated with ν and are sometimes denoted by A_ν and K_ν, respectively. Note that A_ν(1 ⊗ b) = K_ν b for b ∈ B(Y), where (a ⊗ b)(x, y) = a(x)b(y) for a ∈ B(X), b ∈ B(Y) and x ∈ X, y ∈ Y. First we establish a one-to-one, onto and affine correspondence between the convex sets C(X, Y) and K(X, Y) as follows:
Theorem 1. There exists a one-to-one, onto and affine correspondence ν ↔ K between C(X, Y) and K(X, Y) (or C_s(X, Y) and K_s(X, Y)) given by (2.2).

Proof. Let ν ∈ C(X, Y) and define K : B(Y) → B(X) by (2.2). Then K satisfies (k1) by (c1) and (c2). (k2) follows from the Monotone Convergence Theorem.
Conversely, let K ∈ K(X, Y). For each fixed x ∈ X, p_x defined by
p_x(b) = (Kb)(x),  b ∈ C(Y),
is a positive linear functional of norm 1 on C(Y) by (k1). Hence, there exists a unique probability measure ν_x ∈ P(Y) such that
p_x(b) = ∫_Y b(y) ν_x(dy),  b ∈ C(Y).  (2.3)
Let 𝔅₁ be the set of all b ∈ B(Y) for which (2.3) holds. Then C(Y) ⊆ 𝔅₁ and 𝔅₁ is a monotone class by (k2). Thus 𝔅₁ = B(Y). If we take b = 1_C for C ∈ 𝔜, then
(K1_C)(x) = ∫_Y 1_C(y) ν_x(dy) = ν_x(C),  x ∈ X,
and ν_{(·)}(C) ∈ B(X). Letting ν(x, C) = ν_x(C) for x ∈ X and C ∈ 𝔜, we see that (c1) and (c2) are satisfied. Thus we have established a one-to-one, onto correspondence ν ↔ K between C(X, Y) and K(X, Y), which is clearly affine. Moreover, if ν ∈ C_s(X, Y), then for x ∈ X and b ∈ B(Y) it holds that
(KTb)(x) = [K(Tb)](x) = ∫_Y b(Ty) ν(x, dy) = ∫_Y b(y) ν(x, dT⁻¹y)
= ∫_Y b(y) ν(Sx, dy) = (Kb)(Sx) = (SKb)(x),
i.e., K ∈ K_s(X, Y). If, conversely, K ∈ K_s(X, Y), then considering b = 1_C for C ∈ 𝔜 we see that ν(x, T⁻¹C) = ν(Sx, C), x ∈ X, by virtue of (k3), i.e., ν ∈ C_s(X, Y). Therefore, the correspondence ν ↔ K is one-to-one, onto and affine between C_s(X, Y) and K_s(X, Y).

Using Theorem 1, we can also obtain a one-to-one correspondence between
C(X, Y) and A(X, Y) as follows.

Theorem 2. There exists a one-to-one, onto and affine correspondence ν ↔ A between C(X, Y) and A(X, Y) (or C_s(X, Y) and A_s(X, Y)) given by (2.1).

Proof. Let ν ∈ C(X, Y) be given and define an operator A by (2.1). A1 = 1 is clear. Observe that for f, g ∈ B(X × Y) and x ∈ X
[A(f·Ag)](x) = ∫_Y (f·Ag)(x, y) ν(x, dy) = ∫_Y f(x, y)(Ag)(x) ν(x, dy)
= ∫_Y f(x, y) ν(x, dy) · (Ag)(x) = (Af)(x)(Ag)(x).
Hence A satisfies (a1); (a2) is rather obvious, and (a3) follows from the Monotone Convergence Theorem. Thus A ∈ A(X, Y). If ν ∈ C_s(X, Y), i.e., ν is stationary, then for f ∈ B(X × Y) and x ∈ X one has
(SAf)(x) = (Af)(Sx) = ∫_Y f(Sx, y) ν(Sx, dy) = ∫_Y f(Sx, y) ν(x, dT⁻¹y)
= ∫_Y f(Sx, Ty) ν(x, dy) = ∫_Y [(S ⊗ T)f](x, y) ν(x, dy) = [A(S ⊗ T)f](x).
Consequently, (a4) is satisfied and A ∈ A_s(X, Y). Conversely, let A ∈ A(X, Y) and
(Kb)(x) = [A(1 ⊗ b)](x),  b ∈ B(Y), x ∈ X.
Then, it is easily seen that K ∈ K(X, Y). By Theorem 1 there exists a unique channel ν ∈ C(X, Y) such that (2.2) holds. Thus, for a₁, …, a_n ∈ B(X) and b₁, …, b_n ∈ B(Y) we have
Σ_{k=1}^n a_k(x)(Kb_k)(x) = Σ_{k=1}^n a_k(x) ∫_Y b_k(y) ν(x, dy)
= ∫_Y (Σ_{k=1}^n a_k ⊗ b_k)(x, y) ν(x, dy).  (2.4)
Since the algebraic tensor product space B(X) ⊗ B(Y) is a monotone class, (2.4) implies (2.1). If A ∈ A_s(X, Y), then for x ∈ X and C ∈ 𝔜 it holds that
ν(Sx, C) = ∫_Y 1_C(y) ν(Sx, dy) = ∫_Y (1 ⊗ 1_C)(Sx, y) ν(Sx, dy)
= [SA(1 ⊗ 1_C)](x) = [A(S ⊗ T)(1 ⊗ 1_C)](x)
= ∫_Y 1(Sx) 1_C(Ty) ν(x, dy) = ∫_Y 1_C(y) ν(x, dT⁻¹y) = ν(x, T⁻¹C).
Therefore, ν ∈ C_s(X, Y).
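In the finite sketch used earlier (an assumption for illustration only), both channel operators (2.1) and (2.2) can be written down explicitly, and the averaging identities (a1), the relation A_ν(1 ⊗ b) = K_ν b, and the inequality (Af)*(Af) ≤ A(f*f) become direct computations:

```python
import numpy as np

rng = np.random.default_rng(0)
N = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])    # hypothetical channel matrix on X x Y
m, n = N.shape

def A(f):
    """(A_nu f)(x) = integral of f(x, y) nu(x, dy); f given as an m x n array."""
    return (N * f).sum(axis=1)

def K(b):
    """(K_nu b)(x) = integral of b(y) nu(x, dy)."""
    return N @ b

f = rng.standard_normal((m, n))
g = rng.standard_normal((m, n))
b = rng.standard_normal(n)

assert np.allclose(A(np.ones((m, n))), 1.0)            # (a1): A1 = 1
assert np.allclose(A(f * A(g)[:, None]), A(f) * A(g))  # (a1): A(f.Ag) = (Af)(Ag)
assert np.allclose(A(np.tile(b, (m, 1))), K(b))        # A_nu(1 (x) b) = K_nu b
assert (A(f) ** 2 <= A(f ** 2) + 1e-12).all()          # (Af)*(Af) <= A(f*f)
```

The last assertion is the Cauchy-Schwarz (Kadison-type) inequality stated after (a2′), here for real-valued f.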
Remark 3. Let ν ↔ K ↔ A, where ν ∈ C(X, Y), K ∈ K(X, Y) and A ∈ A(X, Y). If ν is continuous, i.e., ν satisfies (c5″), then we can see that K : C(Y) → C(X) and A : C(X × Y) → C(X) are positive bounded linear operators of norm 1.

Definition 4. Let P be a subset of P(X).
(1) Two channels ν₁, ν₂ ∈ C(X, Y) are said to be identical mod P, denoted ν₁ ≡ ν₂ (mod P), if
ν₁(x, ·) = ν₂(x, ·)  μ-a.e. x ∈ X
for every μ ∈ P. In this case, we write ν₁(x, ·) = ν₂(x, ·) P-a.e. x.
(2) Two operators K₁, K₂ ∈ K(X, Y) are said to be identical mod P, denoted K₁ ≡ K₂ (mod P), if
K₁b = K₂b  μ-a.e.
for every b ∈ B(Y) and μ ∈ P. In this case, we write K₁b = K₂b P-a.e.
(3) Two operators A₁, A₂ ∈ A(X, Y) are said to be identical mod P, denoted A₁ ≡ A₂ (mod P), if
A₁f = A₂f  μ-a.e.
for every f ∈ C(X × Y) and μ ∈ P, or, equivalently, A₁*μ = A₂*μ for μ ∈ P, where A_i* : B(X)* → B(X × Y)* is the adjoint operator of A_i (i = 1, 2).

We need the following lemma and proposition.
Lemma 5. Let ν ∈ C(X, Y) and μ ∈ P(X). Then the following holds:
∫_X (Kb)(x) μ(dx) = ∫_X ∫_Y b(y) ν(x, dy) μ(dx) = ∫_Y b(y) μν(dy),  (2.5)
∫_X (Af)(x) μ(dx) = ∫_X ∫_Y f(x, y) ν(x, dy) μ(dx) = ∫_{X×Y} f(x, y) μ ⊗ ν(dx, dy)  (2.6)
for b ∈ B(Y) and f ∈ B(X × Y), where μν and μ ⊗ ν are the output source and the compound source defined by (1.1) and (1.2), respectively.
Proof. (2.5) and (2.6) can be verified first for simple functions and then for general bounded Baire functions by approximation.

Corollary 6. (1) Let ν₁, ν₂ ∈ C(X, Y) and P ⊆ P(X). If ν₁(x, ·) ≪ ν₂(x, ·) P-a.e. x, then μν₁ ≪ μν₂ and μ ⊗ ν₁ ≪ μ ⊗ ν₂ for every μ ∈ P.
(2) If ν ↔ K ↔ A for ν ∈ C(X, Y), K ∈ K(X, Y) and A ∈ A(X, Y), then K*μ = μν and A*μ = μ ⊗ ν for μ ∈ P(X).

Proposition 7. Let P ⊆ P(X) and suppose that ν_i ∈ C(X, Y), K_i ∈ K(X, Y) and A_i ∈ A(X, Y) (i = 1, 2) correspond to each other, i.e., ν_i ↔ K_i ↔ A_i for i = 1, 2. Then the following statements are equivalent:
(1) ν₁ ≡ ν₂ (mod P).
(2) K₁ ≡ K₂ (mod P).
(3) A₁ ≡ A₂ (mod P).
(4) A₁*μ = A₂*μ, i.e., μ ⊗ ν₁ = μ ⊗ ν₂ for every μ ∈ P.
Proof. We have the following two-sided implications:
ν₁ ≡ ν₂ (mod P)
⟺ ν₁(x, ·) = ν₂(x, ·) P-a.e. x
⟺ ∫_X ∫_Y a(x)b(y) ν₁(x, dy) μ(dx) = ∫_X ∫_Y a(x)b(y) ν₂(x, dy) μ(dx)
  for a ∈ C(X), b ∈ C(Y) and μ ∈ P  (2.7)
⟺ ∫_X A₁(a ⊗ b)(x) μ(dx) = ∫_X A₂(a ⊗ b)(x) μ(dx) for a ∈ C(X), b ∈ C(Y) and μ ∈ P
⟺ μ(A₁(a ⊗ b)) = μ(A₂(a ⊗ b)) for a ∈ C(X), b ∈ C(Y) and μ ∈ P
⟺ A₁ ≡ A₂ (mod P),
since C(X × Y) = C(X) ⊗_λ C(Y). This implies (1) ⇔ (3). Moreover, we have
(2.7) ⟺ μ(aK₁b) = μ(aK₂b) for a ∈ C(X), b ∈ C(Y) and μ ∈ P
⟺ K₁b = K₂b P-a.e. for b ∈ C(Y)
⟺ K₁ ≡ K₂ (mod P),
implying (2) ⇔ (3), and
(2.7) ⟺ ∫_{X×Y} a(x)b(y) μ ⊗ ν₁(dx, dy) = ∫_{X×Y} a(x)b(y) μ ⊗ ν₂(dx, dy)
  for a ∈ C(X), b ∈ C(Y) and μ ∈ P
⟺ μ ⊗ ν₁ = μ ⊗ ν₂ for μ ∈ P  (2.8)
by Lemma 5, implying (1) ⇔ (4).

The important cases are those where P = P_s(X) and P = P_se(X).

Definition 8. P_s(X) is said to be complete for ergodicity provided that if A ∈ 𝔛 and μ ∈ P_s(X) are such that μ(A) > 0, then there exists an ergodic source η ∈ P_se(X) such that η(A) > 0.

As was remarked before (cf. Section 2.2), if S is a continuous transformation on X, then S is measurable, P_s(X) ≠ ∅ and P_se(X) ≠ ∅.

Proposition 9. Suppose that S is a continuous transformation of the compact Hausdorff space X into X. Then P_s(X) is complete for ergodicity.
Proof. Suppose that μ(A) > 0 for some μ ∈ P_s(X) and A ∈ 𝔛. Since μ is regular, there is a compact set C ∈ 𝔛 such that C ⊆ A and μ(C) > 0. Then we can choose a sequence {f_n}_{n=1}^∞ ⊆ C(X) such that f_n ↓ 1_C as n → ∞, since X is compact and Hausdorff. If 0 ≤ α ≤ 1, then the set P_α = {η ∈ P_s(X) : η(C) ≥ α} is a weak* compact convex set, since
P_α = ∩_{n=1}^∞ {η ∈ P_s(X) : η(f_n) ≥ α}
and P_s(X) is so. Let α₀ = sup{α : P_α ≠ ∅}. Then α₀ ≥ μ(C) > 0 and P_{α₀} is a nonempty weak* compact convex set. Hence, there exists some η₀ ∈ ex P_{α₀} such that η₀(C) = α₀ > 0. We claim that η₀ ∈ ex P_s(X) = P_se(X). If this is not the case, there exist μ₁, μ₂ ∈ P_s(X) and β ∈ (0, 1) such that η₀ = βμ₁ + (1 − β)μ₂ and μ₁ ≠ μ₂. Note that by the definition of α₀ we have μ₁(C), μ₂(C) ≤ α₀. Since α₀ = η₀(C) = βμ₁(C) + (1 − β)μ₂(C), we must have μ₁(C) = μ₂(C) = α₀ and hence μ₁, μ₂ ∈ P_{α₀}. This is a contradiction to the extremality of η₀ in P_{α₀}. Thus η₀ ∈ P_se(X). For this η₀ it holds that η₀(A) ≥ η₀(C) > 0.

Proposition 10. Suppose that P_s(X) is complete for ergodicity and let ν₁, ν₂ ∈ C_s(X, Y). Then the following conditions are equivalent:
(1) ν₁ ≡ ν₂ (mod P_s(X)).
(2) ν₁ ≡ ν₂ (mod P_se(X)).
(3) ν₁(x, E_x) = ν₂(x, E_x) P_s(X)-a.e. x for every E ∈ 𝔛 ⊗ 𝔜.
(4) ν₁(x, E_x) = ν₂(x, E_x) P_se(X)-a.e. x for every E ∈ 𝔛 ⊗ 𝔜.
(5) μ ⊗ ν₁ = μ ⊗ ν₂ for every μ ∈ P_s(X).
(6) μ ⊗ ν₁ = μ ⊗ ν₂ for every μ ∈ P_se(X).

Proof. (1) ⇒ (2), (3) ⇒ (4), (5) ⇒ (6), (1) ⇔ (5) and (2) ⇔ (6) are clear.
(2) ⇒ (1) and (4) ⇒ (3) follow from the completeness for ergodicity of P_s(X).
(6) ⇒ (4). Assume (4) is not true. Then there is some E ∈ 𝔛 ⊗ 𝔜 and some μ ∈ P_se(X) such that μ({x ∈ X : ν₁(x, E_x) ≠ ν₂(x, E_x)}) > 0. We may suppose that the set A = {x ∈ X : ν₁(x, E_x) > ν₂(x, E_x)} satisfies μ(A) > 0. Then
∫_A ν₁(x, E_x) μ(dx) > ∫_A ν₂(x, E_x) μ(dx).
Since
μ ⊗ ν_i(E ∩ (A × Y)) = ∫_A ν_i(x, E_x) μ(dx),  i = 1, 2,
it holds that
μ ⊗ ν₁(E ∩ (A × Y)) > μ ⊗ ν₂(E ∩ (A × Y)),
which implies that μ ⊗ ν₁ ≠ μ ⊗ ν₂, a contradiction to (6).
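The role of the null sets in Proposition 7 and Definition 4 can be seen in the finite sketch: two kernels that differ only on a μ-null input state are identical mod {μ}, and accordingly produce the same compound and output sources. The state space and numbers below are hypothetical:

```python
import numpy as np

mu = np.array([0.5, 0.5, 0.0])     # hypothetical input source; state 2 is mu-null
N1 = np.array([[0.7, 0.3],
               [0.2, 0.8],
               [1.0, 0.0]])
N2 = N1.copy()
N2[2] = [0.0, 1.0]                  # nu1 and nu2 differ only on the mu-null state

def compound(mu, N):
    """mu (x) nu as a matrix on X x Y: (mu (x) nu)({(x, y)}) = mu(x) nu(x, {y})."""
    return mu[:, None] * N

# nu1 == nu2 (mod {mu}) even though the kernels are not equal everywhere,
# and consequently the compound and output sources coincide:
assert not np.allclose(N1, N2)
assert np.allclose(compound(mu, N1), compound(mu, N2))
assert np.allclose(mu @ N1, mu @ N2)
```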
To close this section we characterize continuous channels once more in terms of channel operators, which is an extension of Proposition 1.6.
Proposition 11. Let ν ∈ C(X, Y) be a channel and let K ∈ K(X, Y) and A ∈ A(X, Y) be the corresponding channel operators. Then the following conditions are equivalent:
(1) ν is continuous, i.e., ν satisfies (c5″).
(2) K : C(Y) → C(X) is a linear operator.
(3) A : C(X × Y) → C(X) is a linear operator.
(4) K* : M(X) → M(Y) is sequentially weak* continuous.
(5) A* : M(X) → M(X × Y) is sequentially weak* continuous.

Proof. This follows from Proposition 1.6, Remark 3 and Corollary 6.

The results obtained in this section will be applied to characterize ergodicity of stationary channels in Section 3.4 and to discuss channel capacity in Section 3.7.
3.3. Mixing channels

Finite dependence was defined for channels between alphabet message spaces. As a generalization, asymptotic independence (or strong mixing) and weak mixing are introduced, which are similar to those for stationary information sources. Also ergodicity and semiergodicity are defined for stationary channels, and relations among these notions are clarified.
Consider a pair of abstract measurable spaces (X, 𝔛) and (Y, 𝔜) with measurable transformations S and T, respectively. We begin with a notion of ergodicity.

Definition 1. A stationary channel ν ∈ C_s(X, Y) is said to be ergodic if
(c7) μ ∈ P_se(X) ⇒ μ ⊗ ν ∈ P_se(X × Y),
i.e., if a stationary ergodic source is the input, then the compound source must also be stationary ergodic. C_se(X, Y) denotes the set of all stationary ergodic channels.

To obtain conditions for ergodicity we first consider channels between alphabet message spaces X = X₀^ℤ and Y = Y₀^ℤ, where X₀ and Y₀ are finite sets. We have the following with the notation in Section 3.1.
Proposition 2. If ν ∈ C_s(X₀^ℤ, Y₀^ℤ) is a memoryless channel, then ν is ergodic.
Proof. Let P = (p(b|a))_{a,b} be the channel matrix of ν (cf. Definition 1.1, (c4)). Let μ ∈ P_se(X) and A = [x₀ = a], B = [x₀ = a′] ⊂ X and C = [y₀ = b], D = [y₀ = b′] ⊂ Y. Then one has
(1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S × T)^{−k}(A × C) ∩ (B × D))
= (1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S^{−k}A ∩ B) × (T^{−k}C ∩ D))
= (1/n) Σ_{k=0}^{n−1} μ ⊗ ν([x_k = a, x₀ = a′] × [y_k = b, y₀ = b′])
= (1/n) Σ_{k=0}^{n−1} μ([x_k = a, x₀ = a′]) p(b|a) p(b′|a′)
→ μ([x₀ = a]) μ([x₀ = a′]) p(b|a) p(b′|a′), by ergodicity of μ,
= μ ⊗ ν(A × C) μ ⊗ ν(B × D).
It is not hard to verify that the above holds for general messages A, B ⊂ X₀^ℤ and C, D ⊂ Y₀^ℤ. Hence μ ⊗ ν is ergodic by Theorem II.3.2. Therefore ν is ergodic.
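For a Bernoulli input the Cesàro computation in the proof is exact from k = 1 on: μ([x_k = a, x₀ = a′]) already factorizes, so each term equals the limiting product. A small numeric check (channel matrix and input distribution are hypothetical):

```python
import numpy as np

P = np.array([[0.9, 0.1],          # hypothetical channel matrix p(b|a)
              [0.2, 0.8]])
mu0 = np.array([0.3, 0.7])         # Bernoulli input: coordinates are independent

a, a2, b, b2 = 0, 1, 0, 1          # A = [x0=a], B = [x0=a'], C = [y0=b], D = [y0=b']

def term(k):
    """mu (x) nu((S x T)^{-k}(A x C) cap (B x D)) for the Bernoulli source."""
    mu_joint = mu0[a] * mu0[a2] if k >= 1 else mu0[a] * float(a == a2)
    return mu_joint * P[a, b] * P[a2, b2]

product = (mu0[a] * P[a, b]) * (mu0[a2] * P[a2, b2])  # mu(x)nu(AxC) mu(x)nu(BxD)
cesaro = sum(term(k) for k in range(1000)) / 1000

assert np.isclose(term(1), product)       # each k >= 1 term is already the limit
assert np.isclose(cesaro, product, atol=1e-3)
```

For an ergodic (but not independent) input only the Cesàro average, not each term, converges to the product, which is exactly what the proof uses.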
Definition 3. A channel ν ∈ C(X₀^ℤ, Y₀^ℤ) is said to be finitely dependent or m-dependent if
(c8) there exists a positive integer m ∈ ℕ such that for any n, r, s, t ∈ ℕ with n ≤ r < s ≤ t and s − r > m it holds that
ν(x, C_{n,r} ∩ C_{s,t}) = ν(x, C_{n,r}) ν(x, C_{s,t})
for every x ∈ X and all messages C_{n,r} = [y_n ⋯ y_r], C_{s,t} = [y_s ⋯ y_t] ⊂ Y.
Then we record the following theorem, which immediately follows from Theorem 7 below.
Theorem 4. If a stationary channel ν ∈ C_s(X₀^ℤ, Y₀^ℤ) is finitely dependent, then ν is ergodic.

Example 5. Consider alphabet message spaces X = X₀^ℤ and Y = Y₀^ℤ with the set 𝔐_Y of messages in Y. Let m ≥ 1 be an integer and suppose that for each (x₁, …, x_m) ∈ X₀^m a probability distribution p((x₁, …, x_m)| ·) on Y₀^m is given. Define ν₀ : X × 𝔐_Y → [0, 1] by
ν₀(x, [y_s ⋯ y_t]) = Π_{k=0}^{t−s} p((x_{s+k−m}, …, x_{s+k−1}) | (y_{s+k−m}, …, y_{s+k−1}))
and extend ν₀ to ν : X × 𝔜 → [0, 1], so that ν becomes a channel. Then this ν is stationary and finitely dependent, and has a finite memory. This is easily verified and the proof is left to the reader.

Finite dependence can be generalized as follows. Consider a general pair of abstract measurable spaces (X, 𝔛) and (Y, 𝔜) with measurable transformations S and T as before. First recall that a stationary source μ ∈ P_s(X) is said to be strongly mixing (SM) if
lim_{n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B),  A, B ∈ 𝔛,
and to be weakly mixing (WM) if
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |μ(S^{−k}A ∩ B) − μ(A)μ(B)| = 0,  A, B ∈ 𝔛.
Also recall that μ ∈ P_s(X) is ergodic iff
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(S^{−k}A ∩ B) = μ(A)μ(B),  A, B ∈ 𝔛
(cf. Theorem II.3.2). It was noted that for stationary sources strong mixing ⇒ weak mixing ⇒ ergodicity. As an analogy to these formulations we have the following definition.
Definition 6. A channel ν ∈ C(X, Y) is said to be asymptotically independent or strongly mixing (SM) if
(c9) for every C, D ∈ 𝔜,
lim_{n→∞} {ν(x, T^{−n}C ∩ D) − ν(x, T^{−n}C)ν(x, D)} = 0  P_s(X)-a.e. x,
to be weakly mixing (WM) if
(c10) for every C, D ∈ 𝔜,
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)| = 0  P_s(X)-a.e. x,
and to be semiergodic (SE) if
(c11) for every C, D ∈ 𝔜,
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} {ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)} = 0  P_s(X)-a.e. x.

Note that if (X, 𝔛, S) is complete for ergodicity (cf. the previous section), then in (c9), (c10) and (3.1) below we can replace P_s(X)-a.e. by P_se(X)-a.e. Consider the following condition, which looks slightly stronger than (c11): for every C, D ∈ 𝔜
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, T^{−k}C ∩ D) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, T^{−k}C) · ν(x, D)  P_s(X)-a.e. x.  (3.1)
In view of Proposition 1.4, the LHS of (3.1) exists for a stationary channel. Hence (c11) and (3.1) are equivalent for stationary channels. If (3.1) holds for every x ∈ X, then the existence of the RHS means that ν(x, ·) ∈ P_a(Y), and (3.1) itself means that ν(x, ·) ∈ P_ae(Y) for every x ∈ X (cf. Theorem II.4.12). Also we have for (stationary) channels
strong mixing ⇒ weak mixing ⇒ ergodicity ⇒ semiergodicity,
where the first implication is obvious, the second is seen in Theorem 7 below, and the last will be verified later. Actually, (3.1) is a necessary (but not sufficient) condition for ergodicity of a stationary channel (cf. Theorem 13 and Theorem 4.2). Now we can give a basic result regarding these concepts.
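For a constant channel ν_η(x, ·) = η, conditions (c9)-(c11) reduce to the corresponding mixing properties of η. With η the (½, ½) i.i.d. product measure on {0, 1}^ℤ and one-coordinate messages, the strong-mixing difference in (c9) vanishes for every n ≥ 1 (a minimal sketch):

```python
# Constant channel nu_eta(x, .) = eta, with eta the (1/2, 1/2) i.i.d. product
# measure on {0, 1}^Z.  Messages: C = [y_0 = 1] and D = [y_0 = 1].
eta_C = 0.5
eta_D = 0.5

def eta_shift_cap(n):
    """eta(T^{-n}C cap D): coordinates n and 0 are independent for n >= 1."""
    return eta_C * eta_D if n >= 1 else eta_C    # n = 0: eta(C cap C) = eta(C)

# The strong-mixing difference of (c9) vanishes identically from n = 1 on:
diffs = [eta_shift_cap(n) - eta_C * eta_D for n in range(1, 50)]
assert all(d == 0.0 for d in diffs)
```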
Theorem 7. Let ν ∈ C_s(X, Y) and μ ∈ P_s(X).
(1) If μ is ergodic and ν is strongly or weakly mixing, then μ ⊗ ν is ergodic. Hence, every weakly mixing stationary channel is ergodic.
(2) If μ and ν are weakly mixing, then μ ⊗ ν is also weakly mixing.
(3) If μ and ν are strongly mixing, then μ ⊗ ν is also strongly mixing.
Proof. (1) Let A, B ∈ 𝔛 and C, D ∈ 𝔜. Then we have that
(1/n) Σ_{k=0}^{n−1} ∫_{S^{−k}A∩B} ν(x, T^{−k}C)ν(x, D) μ(dx)
= (1/n) Σ_{k=0}^{n−1} ∫_X ν(x, T^{−k}C) 1_A(S^k x) ν(x, D) 1_B(x) μ(dx)
= (1/n) Σ_{k=0}^{n−1} ∫_X ν(S^k x, C) 1_A(S^k x) ν(x, D) 1_B(x) μ(dx), since ν is stationary,  (3.2)
→ ∫_X ν(x, C) 1_A(x) μ(dx) ∫_X ν(x, D) 1_B(x) μ(dx), since μ is stationary and ergodic,
= μ ⊗ ν(A × C) μ ⊗ ν(B × D).
On the other hand, since ν is weakly mixing, it holds that
(1/n) Σ_{k=0}^{n−1} |ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)| → 0  μ-a.e. x ∈ X
as n → ∞. By the Bounded Convergence Theorem we have
(1/n) Σ_{k=0}^{n−1} ∫_{S^{−k}A∩B} |ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)| μ(dx)
≤ (1/n) Σ_{k=0}^{n−1} ∫_X |ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)| μ(dx) → 0  (3.3)
as n → ∞.
Combining (3.2) and (3.3), we get
|(1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S × T)^{−k}(A × C) ∩ (B × D)) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
= |(1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S^{−k}A ∩ B) × (T^{−k}C ∩ D)) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
= |(1/n) Σ_{k=0}^{n−1} ∫_{S^{−k}A∩B} ν(x, T^{−k}C ∩ D) μ(dx) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
≤ (1/n) Σ_{k=0}^{n−1} ∫_{S^{−k}A∩B} |ν(x, T^{−k}C ∩ D) − ν(x, T^{−k}C)ν(x, D)| μ(dx)
  + |(1/n) Σ_{k=0}^{n−1} ∫_{S^{−k}A∩B} ν(x, T^{−k}C)ν(x, D) μ(dx) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
→ 0 as n → ∞.
Thus, by Theorem II.3.2, μ ⊗ ν is ergodic.
(2) and (3) can be verified in a similar manner as above.

In the above theorem we obtained sufficient conditions for ergodicity of a stationary channel, namely strong mixing and weak mixing. Key expressions there are (3.2) and (3.3). In particular, if η ∈ P_s(Y) is WM, then the constant channel ν_η is stationary ergodic. For, if μ ∈ P_se(X), then μ ⊗ ν_η = μ × η is ergodic by Theorem II.3.10. In the rest of this section, we consider semiergodic channels together with averages of channels. We shall show that there exists a semiergodic channel for which, if a weakly mixing source is the input, then the compound source is not ergodic. This implies that semiergodicity of a channel is weaker than ergodicity.

Definition 8. A stationary channel ν* ∈ C_s(X, Y) is said to be an average of ν ∈ C_s(X, Y) if
(c12) ν*(Sx, C) = ν*(x, C) for x ∈ X and C ∈ 𝔜;
(c13) ∫_A ν(x, C) μ(dx) = ∫_A ν*(x, C) μ(dx) for A ∈ 𝔍, C ∈ 𝔜 and μ ∈ P_s(X).
(c12) means that for each C ∈ 𝔜 the function ν*(·, C) is S-invariant, while (c13) means that
ν*(·, C) = E_μ(ν(·, C)|𝔍)(·)  μ-a.e.  (3.4)
for C ∈ 𝔜 and μ ∈ P_s(X), where E_μ(·|𝔍) is the conditional expectation w.r.t. the σ-algebra 𝔍 of S-invariant sets under the measure μ. A sufficient condition for the existence of the average of a given stationary channel will be obtained (cf. Theorem 13). We need a series of lemmas.
Lemma 9. If ν ∈ C_s(X, Y) has an average ν*, then for C ∈ 𝔜
ν*(·, C) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(S^k ·, C)  P_s(X)-a.e.  (3.5)
Proof. In fact, this follows from (3.4) and the Pointwise Ergodic Theorem.

Lemma 10. Let ν ∈ C_s(X, Y) have an average ν*. If μ ∈ P_se(X) is such that μ ⊗ ν ≠ μ × (μν), the product measure, then μ ⊗ ν ≠ μ ⊗ ν*.

Proof. Observe that for A ∈ 𝔛 and C ∈ 𝔜
μ ⊗ ν*(A × C) = ∫_A ν*(x, C) μ(dx)
= ∫_A lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(S^k x, C) μ(dx), by (3.5),
= lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫_X ν(S^k x, C) 1_A(x) μ(dx)
= ∫_X ν(x, C) μ(dx) ∫_X 1_A(x) μ(dx), since μ is ergodic,
= μ(A) μν(C),
that is, μ ⊗ ν* = μ × (μν). Thus μ ⊗ ν ≠ μ ⊗ ν* by assumption.
Lemma 11. Let ν₁, ν₂ ∈ C_s(X, Y) be semiergodic with averages ν₁*, ν₂*, respectively. If ν₁* ≡ ν₂* (mod P_s(X)) (cf. Definition 2.4), then ν₀ = ½(ν₁ + ν₂) is also semiergodic.

Proof. Since ν_i* (i = 1, 2) exists, we see that for C, D ∈ 𝔜
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_i(x, T^{−k}C) ν_i(x, D) = ν_i*(x, C) ν_i(x, D)  P_s(X)-a.e. x  (3.6)
by Lemma 9, and hence, by semiergodicity of ν_i,
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_i(x, T^{−k}C ∩ D) = ν_i*(x, C) ν_i(x, D)  P_s(X)-a.e. x.
Consequently, we have that
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} {ν₀(x, T^{−k}C ∩ D) − ν₀(x, T^{−k}C)ν₀(x, D)}
= lim_{n→∞} (1/n) Σ_{k=0}^{n−1} [ ½ν₁(x, T^{−k}C ∩ D) + ½ν₂(x, T^{−k}C ∩ D)
  − {½ν₁(x, T^{−k}C) + ½ν₂(x, T^{−k}C)}{½ν₁(x, D) + ½ν₂(x, D)} ],
since ν₁, ν₂ are stationary,
= ½{ν₁*(x, C)ν₁(x, D) + ν₂*(x, C)ν₂(x, D)}
  − ¼{ν₁*(x, C) + ν₂*(x, C)}{ν₁(x, D) + ν₂(x, D)},
by (3.6), stationarity of ν₁, ν₂, and the definition of the average,
= ¼{ν₁*(x, C) − ν₂*(x, C)}{ν₁(x, D) − ν₂(x, D)}
= 0  P_s(X)-a.e. x,
by ν₁* ≡ ν₂* (mod P_s(X)). Thus ν₀ is semiergodic.
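The last algebraic step of the proof is the elementary identity ½A₁B₁ + ½A₂B₂ − ¼(A₁ + A₂)(B₁ + B₂) = ¼(A₁ − A₂)(B₁ − B₂), with A_i in the role of ν_i*(x, C) and B_i in the role of ν_i(x, D); a quick numeric sanity check:

```python
import numpy as np

rng = np.random.default_rng(1)
# A_i stands for nu_i*(x, C) and B_i for nu_i(x, D); any values in [0, 1] work.
A1, A2, B1, B2 = rng.uniform(0.0, 1.0, size=4)

lhs = 0.5 * A1 * B1 + 0.5 * A2 * B2 - 0.25 * (A1 + A2) * (B1 + B2)
rhs = 0.25 * (A1 - A2) * (B1 - B2)
assert np.isclose(lhs, rhs)
```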
Lemma 12. If ν ∈ C_s(X, Y) is semiergodic and has an average ν*, then ν* is also semiergodic.
Proof. Let C, D ∈ 𝔜 and μ ∈ P_s(X). Then,
lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν*(x, T^{−k}C ∩ D)
= lim_{n→∞} (1/n) Σ_{k=0}^{n−1} E_μ(ν(·, T^{−k}C ∩ D)|𝔍)(x)
= lim_{n→∞} E_μ((1/n) Σ_{k=0}^{n−1} ν(·, T^{−k}C ∩ D) | 𝔍)(x)
= E_μ(ν*(·, C)ν(·, D)|𝔍)(x),
by (3.6) with ν_i = ν and (10) in Section 1.2,
= ν*(x, C) E_μ(ν(·, D)|𝔍)(x), since ν*(·, C) is 𝔍-measurable by (c12),
= ν*(x, C) ν*(x, D)
= lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν*(x, T^{−k}C)ν*(x, D)  μ-a.e. x,
by (c12), which yields that ν* is semiergodic.
Now the existence of an average channel of a given stationary channel is considered under some topological conditions.
Theorem 13. Let (X, 𝔛, S) be an abstract measurable space with a measurable transformation S and let Y be a compact metric space with the Borel σ-algebra 𝔜 and a homeomorphism T. Then every stationary channel ν ∈ C_s(X, Y) has an average ν* ∈ C_s(X, Y).
Proof. Denote by M₁⁺(Y) ⊂ M(Y) = C(Y)* the positive part of the unit sphere and by B(Y) the space of bounded Borel functions on Y. Let ν ∈ C_s(X, Y) be given. For each x ∈ X define a functional ν_x on B(Y) by
ν_x(f) = ∫_Y f(y) ν(x, dy),  f ∈ B(Y).
If we restrict ν_x to C(Y), then we see that ν_x ∈ M₁⁺(Y) for each x ∈ X. Let 𝔇 be a countable dense subspace of C(Y) under scalar multiplication by rational complex numbers, and let
Λ_x(f) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_{S^k x}(f),  f ∈ 𝔇,  (3.7)
for each x ∈ X. Since, for each f ∈ C(Y), the function ν_{(·)}(f) is bounded and measurable on X, (3.7) is well-defined μ-a.e. by the Pointwise Ergodic Theorem for each μ ∈ P_s(X). Let
X_𝔇 = {x ∈ X : the limit (3.7) exists for all f ∈ 𝔇}.
It is easily seen that X_𝔇 ∈ 𝔛, X_𝔇 is S-invariant, and μ(X_𝔇) = 1 for μ ∈ P_s(X), since 𝔇 is countable. Note that, for each x ∈ X_𝔇, Λ_x(·) is a positive bounded linear functional on 𝔇 since for f ∈ 𝔇
|Λ_x(f)| = |lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_{S^k x}(f)| ≤ limsup_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫_Y |f(y)| ν(S^k x, dy) ≤ ‖f‖,
so that Λ_x(·) can be uniquely extended to a positive bounded linear functional of norm 1 on C(Y), denoted by the same symbol Λ_x. Let us examine some properties of Λ_x. Λ_x satisfies
Λ_{Sx}(f) = Λ_x(f),  x ∈ X_𝔇, f ∈ C(Y),  (3.8)
since Λ_{Sx}(f) = Λ_x(f) for x ∈ X_𝔇 and f ∈ 𝔇. For each f ∈ 𝔇, Λ_{(·)}(f) is 𝔛-measurable on X_𝔇, which follows from (3.7). For a general f ∈ C(Y) there exists a sequence {f_n}_{n=1}^∞ ⊂ 𝔇 such that ‖f_n − f‖ → 0, so that
|Λ_x(f_n) − Λ_x(f)| ≤ ‖f_n − f‖ → 0 uniformly in x ∈ X_𝔇,
implying the measurability of Λ_{(·)}(f) on X_𝔇. Moreover, for each x ∈ X_𝔇, one can find a probability measure η_x on 𝔜 such that
Λ_x(f) = ∫_Y f(y) η_x(dy),  f ∈ C(Y),
by the Riesz-Markov-Kakutani Theorem. One can also verify that η_x is T-invariant, i.e., η_x ∈ P_s(Y) for x ∈ X_𝔇, which follows from (3.7) and stationarity of ν. Consider the set
𝔅₀ = {f ∈ B(Y) : ∫_Y f dη_x is 𝔛-measurable on X_𝔇}.
Then we see that C(Y) ⊆ 𝔅₀ and 𝔅₀ is a monotone class, i.e., if {f_n}_{n=1}^∞ ⊆ 𝔅₀ and f_n ↓ f, then f ∈ 𝔅₀. Hence one has 𝔅₀ = B(Y). Denote by the same symbol Λ_x the functional extended to B(Y). Take any η ∈ P_s(Y) and define ν* by
ν*(x, C) = η_x(C) if x ∈ X_𝔇, and ν*(x, C) = η(C) if x ∈ X \ X_𝔇,
for C ∈ 𝔜. We shall show that ν* is the desired average of ν. First note that ν* is a stationary channel, ν* ∈ C_s(X, Y), by η_x ∈ P_s(Y) (x ∈ X_𝔇), η ∈ P_s(Y) and (3.8). Now we verify (c12) and (c13) (Definition 8). If x ∈ X_𝔇, then Sx ∈ X_𝔇 and
ν*(Sx, C) = η_{Sx}(C) = η_x(C) = ν*(x, C),
and if x ∈ X \ X_𝔇, then Sx ∈ X \ X_𝔇 and
ν*(Sx, C) = η(C) = ν*(x, C),  C ∈ 𝔜,
so that (c12) is satisfied.
L[
fey) vex, dy)p,(dx) =
for I E 11. For, if g(x)
= Jy I(y) v(x, dy)
gs(x) == lim (Sng)(x) n-+oo
and
L
L
[f(Y) v*(x, dy)p,(dx)
(x E X), then
= }y [ fey)
g(x) p,(dx) =
L
v*(x, dy) u-a.e.x
gs(x) p,(dx)
(3.9)
by the Pointwise Ergodic Theorem, which gives (3.9). Moreover, (3.9) holds for f ∈ C(Y) since 𝔇 is dense in C(Y). If G ∈ 𝔛 is S-invariant (G ∈ 𝔍) with μ(G) > 0, then the measure μ_G defined by
μ_G(A) = μ(A ∩ G)/μ(G),  A ∈ 𝔛,
is an S-invariant probability measure. We have that for C ∈ 𝔜
∫_G ν(x, C) μ(dx) = μ(G) ∫_X ν(x, C) μ_G(dx)
= μ(G) ∫_X ν*(x, C) μ_G(dx)
= ∫_G ν*(x, C) μ(dx).
If G ∈ 𝔍 is such that μ(G) = 0, then the equality in (c13) also holds. Thus ν* satisfies (c13). Therefore ν* is the desired average of ν.

The following theorem is the main result of this section.

Theorem 14. There is a semiergodic channel ν₀ and a strongly mixing input μ₀ for which the compound source μ₀ ⊗ ν₀ is not ergodic. Hence, a semiergodic channel need not be ergodic.

Proof. Consider the alphabet message space X = {0, 1}^ℤ with the Baire σ-algebra 𝔛 and the shift S. Let ν ∈ C(X, X) be a memoryless channel with the channel matrix
(I t)
(cf. Definition 1.1). Then one can directly verify that ν is stationary and semiergodic. ν has an average ν* by Theorem 13, and ν* is semiergodic by Lemma 12, so that
ν₀ = ½ν + ½ν*
is also semiergodic by Lemma 11, since (ν*)* = ν*. Now let μ₀ be the (½, ½)-Bernoulli source (cf. Example II.1.2), which is strongly mixing (cf. Example II.3.7). Then μ₀ ⊗ ν is not a direct product measure, so that
μ₀ ⊗ ν ≠ μ₀ ⊗ ν*
by Lemma 10. Since
μ₀ ⊗ ν₀ = ½ μ₀ ⊗ ν + ½ μ₀ ⊗ ν*
is a proper convex combination, μ₀ ⊗ ν₀ is not ergodic by Theorem II.3.2. This completes the proof.
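The engine of Theorem 14 is the fact (from Theorem II.3.2) that a proper convex combination of distinct stationary measures is never ergodic: time averages then converge to component-dependent limits rather than to a single constant. A Monte Carlo illustration with a mixture of two Bernoulli sources (a hypothetical stand-in, not the measures μ₀ ⊗ ν and μ₀ ⊗ ν* themselves):

```python
import numpy as np

rng = np.random.default_rng(2)

def orbit_time_average(n=20000):
    """Sample one path of the (1/2, 1/2)-mixture of the Bernoulli(0.2) and
    Bernoulli(0.8) sources and return the time average of x_0 along its orbit."""
    p = 0.2 if rng.random() < 0.5 else 0.8      # pick a mixture component
    return (rng.random(n) < p).mean()

avgs = np.array([orbit_time_average() for _ in range(40)])

# Ergodicity would force every time average to the ensemble mean 0.5; instead
# each orbit converges to the mean of its own component, 0.2 or 0.8:
assert ((np.abs(avgs - 0.2) < 0.05) | (np.abs(avgs - 0.8) < 0.05)).all()
assert np.abs(avgs - 0.5).max() > 0.2
```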
3.4. Ergodic channels

In the previous section sufficient conditions for ergodicity were given. In this section, ergodicity of stationary channels is characterized by finding several necessary and sufficient conditions. We shall notice that many of these conditions are similar to those for a stationary source to be ergodic. Also we use channel operators to obtain equivalence conditions for ergodicity. In particular, we realize that there is a close relation between ergodicity and extremality.

Let (X, 𝔛, S) and (Y, 𝔜, T) be a pair of abstract measurable spaces with measurable transformations S and T, respectively. Recall that a stationary channel ν ∈ C_s(X, Y) is said to be ergodic if
(c7) μ ∈ P_se(X) ⇒ μ ⊗ ν ∈ P_se(X × Y).

Definition 1. Let P be a subset of P(X). A stationary channel ν ∈ C_s(X, Y) is said to be extremal in C_s(X, Y) mod P, denoted ν ∈ ex C_s(X, Y) (mod P), if whenever ν₁, ν₂ ∈ C_s(X, Y) and α ∈ (0, 1) are such that ν ≡ αν₁ + (1 − α)ν₂ (mod P), then ν₁ ≡ ν₂ (mod P).

The following theorem gives some necessary and sufficient conditions for ergodicity of a stationary channel, which are very similar to those for ergodicity of a stationary source (cf. Theorem II.3.2).
Chapter III: Information Channels

Theorem 2. For a stationary channel ν ∈ C_s(X, Y) the following conditions are equivalent:
(1) ν ∈ C_se(X, Y), i.e., ν is ergodic.
(2) If E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then ν(x, E_x) = 0 or 1 P_se(X)-a.e. x.
(3) There exists an ergodic channel ν₁ ∈ C_se(X, Y) such that ν(x, ·) ≪ ν₁(x, ·) P_se(X)-a.e. x.
(4) If a stationary channel ν₀ ∈ C_s(X, Y) is such that ν₀(x, ·) ≪ ν(x, ·) P_se(X)-a.e. x, then ν₀ ≡ ν (mod P_se(X)).
(5) ν ∈ ex C_s(X, Y) (mod P_se(X)).
(6) For E, F ∈ 𝔛 ⊗ 𝔜 and μ ∈ P_se(X) it holds that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E]_x\big)\,\nu(x, F_x) = \mu \otimes \nu(E)\,\nu(x, F_x) \quad \mu\text{-a.e. } x.$$

(7) For A, B ∈ 𝔛, C, D ∈ 𝔜 and μ ∈ P_se(X) it holds that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \int_{S^{-k}A \cap B} \big\{\nu(x, T^{-k}C \cap D) - \nu(x, T^{-k}C)\,\nu(x, D)\big\}\,\mu(dx) = 0.$$
Proof. (1) ⇒ (2). Let μ ∈ P_se(X). Then μ ⊗ ν ∈ P_se(X × Y) by (1). Hence, if E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then μ ⊗ ν(E) = 0 or 1, i.e.,

$$\int_X \nu(x, E_x)\,\mu(dx) = 0 \text{ or } 1.$$

This implies ν(x, E_x) = 0 μ-a.e. x or ν(x, E_x) = 1 μ-a.e. x. Thus (2) holds.

(2) ⇒ (1). Let E ∈ 𝔛 ⊗ 𝔜 be S × T-invariant, μ ∈ P_se(X) and

$$A_0 = \{x \in X : \nu(x, E_x) = 0\}, \qquad A_1 = \{x \in X : \nu(x, E_x) = 1\}.$$

Note that T⁻¹E_{Sx} = E_x since

$$T^{-1}E_{Sx} = \{y \in Y : Ty \in \{z \in Y : (Sx, z) \in E\}\} = \{y \in Y : (Sx, Ty) \in E\} = \{y \in Y : (x, y) \in (S \times T)^{-1}E\} = E_x. \tag{4.1}$$

Hence A₀ is S-invariant since

$$S^{-1}A_0 = \{x \in X : Sx \in A_0\} = \{x \in X : \nu(Sx, E_{Sx}) = 0\} = \{x \in X : \nu(x, T^{-1}E_{Sx}) = 0\}, \quad \text{because } \nu \text{ is stationary},$$
$$= \{x \in X : \nu(x, E_x) = 0\} = A_0.$$

Similarly, we see that A₁ is S-invariant. Thus, μ(A₀) = 0 or 1, and μ(A₁) = 0 or 1, since μ is ergodic. Consequently, we have that

$$\mu \otimes \nu(E) = \int_X \nu(x, E_x)\,\mu(dx) = 0 \text{ or } 1.$$

Therefore, μ ⊗ ν is ergodic.

(1) ⇒ (3) is trivial since we can take ν₁ = ν, and (3) ⇒ (1) follows from Theorem II.3.2 and Corollary 2.6 (1).
(1) ⇒ (5). Suppose that (5) is false. Then for some ν₁, ν₂ ∈ C_s(X, Y) with ν₁ ≢ ν₂ (mod P_se(X)) and α ∈ (0, 1) we have ν ≡ αν₁ + (1 − α)ν₂ (mod P_se(X)). Hence μ ⊗ ν₁ ≠ μ ⊗ ν₂ for some μ ∈ P_se(X) and

$$\mu \otimes \nu(E) = \int_X \nu(x, E_x)\,\mu(dx) = \int_X \big[\alpha\,\nu_1(x, E_x) + (1 - \alpha)\,\nu_2(x, E_x)\big]\,\mu(dx) = \alpha\,\mu \otimes \nu_1(E) + (1 - \alpha)\,\mu \otimes \nu_2(E), \qquad E \in \mathfrak{X} \otimes \mathfrak{Y}.$$

Thus μ ⊗ ν = α μ ⊗ ν₁ + (1 − α) μ ⊗ ν₂ is a nontrivial convex combination of two distinct stationary sources. Therefore, μ ⊗ ν is not ergodic, which contradicts the ergodicity of ν.

(5) ⇒ (2). Assume that (2) is false. Then there are some μ ∈ P_se(X) and some S × T-invariant E ∈ 𝔛 ⊗ 𝔜 such that μ(A) > 0, where A = {x ∈ X : ν(x, E_x) ≠ 0, 1}. Define ν₁ and ν₂ by

$$\nu_1(x, C) = \begin{cases} \dfrac{\nu(x, C \cap E_x)}{\nu(x, E_x)}, & x \in A, \\ \nu(x, C), & x \notin A, \end{cases} \qquad \nu_2(x, C) = \begin{cases} \dfrac{\nu(x, C \cap E_x^c)}{\nu(x, E_x^c)}, & x \in A, \\ \nu(x, C), & x \notin A, \end{cases}$$

where C ∈ 𝔜. Clearly, ν₁ and ν₂ are channels. As in the proof of (2) ⇒ (1) we see that E_x = T⁻¹E_{Sx} by (4.1) and hence

$$\nu(x, E_x) = \nu(x, T^{-1}E_{Sx}) = \nu(Sx, E_{Sx}), \qquad x \in A, \tag{4.2}$$

implying that S⁻¹A = A. Thus, for x ∈ A and C ∈ 𝔜

$$\nu_1(Sx, C) = \frac{\nu(Sx, C \cap E_{Sx})}{\nu(Sx, E_{Sx})} = \frac{\nu(x, T^{-1}(C \cap E_{Sx}))}{\nu(x, E_x)} = \frac{\nu(x, T^{-1}C \cap T^{-1}E_{Sx})}{\nu(x, E_x)} = \frac{\nu(x, T^{-1}C \cap E_x)}{\nu(x, E_x)} = \nu_1(x, T^{-1}C).$$

It follows that ν₁ is stationary. Similarly, one can verify that ν₂ is stationary. Moreover, we see that

$$\nu_1(x, E_x) = 1 \quad \text{and} \quad \nu_2(x, E_x) = 0, \qquad x \in A,$$

and hence ν₁ ≢ ν₂ (mod P_se(X)). Note that for x ∈ X and C ∈ 𝔜

$$\nu(x, C) = \nu(x, E_x)\,\nu_1(x, C) + \big(1 - \nu(x, E_x)\big)\,\nu_2(x, C). \tag{4.3}$$

Let B = {x ∈ X : ν(x, E_x) ≥ ½} ∈ 𝔛 and define ν₃ and ν₄ by

$$\nu_3(x, C) = \begin{cases} \nu_1(x, C), & x \in B, \\ \big(1 - 2\nu(x, E_x)\big)\nu_2(x, C) + 2\nu(x, E_x)\,\nu_1(x, C), & x \in B^c, \end{cases}$$

$$\nu_4(x, C) = \begin{cases} \big(2\nu(x, E_x) - 1\big)\nu_1(x, C) + \big(2 - 2\nu(x, E_x)\big)\nu_2(x, C), & x \in B, \\ \nu_2(x, C), & x \in B^c, \end{cases}$$

where C ∈ 𝔜. Obviously, ν₃, ν₄ ∈ C(X, Y) and ν = ½ν₃ + ½ν₄ by (4.3). Observe that S⁻¹B = B and S⁻¹Bᶜ = Bᶜ by (4.2). Then it is easily verified that ν₃ and ν₄ are stationary, by S-invariance of B and Bᶜ, stationarity of ν₁ and ν₂, and (4.2). Furthermore, we have by the definition of ν₃ and ν₄, and by (4.3), that if x ∈ A ∩ B, then ν₃(x, E_x) = ν₁(x, E_x) = 1, while

$$\nu_4(x, E_x) = \big(2\nu(x, E_x) - 1\big)\nu_1(x, E_x) + \big(2 - 2\nu(x, E_x)\big)\nu_2(x, E_x) = 2\nu(x, E_x) - 1 < 1.$$

If x ∈ A ∩ Bᶜ, then

$$\nu_3(x, E_x) = \big(1 - 2\nu(x, E_x)\big)\nu_2(x, E_x) + 2\nu(x, E_x)\,\nu_1(x, E_x) = 2\nu(x, E_x) > 0,$$

while ν₄(x, E_x) = ν₂(x, E_x) = 0. Thus, ν₃(x, E_x) ≠ ν₄(x, E_x) for x ∈ A. This means that ν₃ ≢ ν₄ (mod P_se(X)) and so ν ∉ ex C_s(X, Y) (mod P_se(X)). Therefore, (5) is not true.

(4) ⇒ (2). Suppose (2) is false. Then for some μ ∈ P_se(X) and E ∈ 𝔛 ⊗ 𝔜 with (S × T)⁻¹E = E it holds that μ(A) > 0, where A = {x ∈ X : 0 < ν(x, E_x) < 1}. Define ν₀ by

$$\nu_0(x, C) = \begin{cases} \dfrac{\nu(x, C \cap E_x)}{\nu(x, E_x)}, & x \in A,\ C \in \mathfrak{Y}, \\ \nu(x, C), & x \in A^c,\ C \in \mathfrak{Y}. \end{cases}$$

Then we see that ν₀ ∈ C_s(X, Y) and ν ≢ ν₀ (mod P_se(X)) in a similar way as before. Moreover, we have ν₀(x, ·) ≪ ν(x, ·) P_se(X)-a.e. x, since D ∈ 𝔜 and ν(x, D) = 0 imply ν₀(x, D) = 0 for x ∈ X. This contradicts (4).
(1), (2), (5) ⇒ (4). Thus far we have proved (1) ⇔ (2) ⇔ (3) ⇔ (5). Assume (4) is false. Then there exists some ν₀ ∈ C_s(X, Y) such that

$$\nu_0(x, \cdot) \ll \nu(x, \cdot) \quad P_{se}(X)\text{-a.e. } x, \qquad \nu_0 \not\equiv \nu \pmod{P_{se}(X)}.$$

Let μ ∈ P_se(X) be arbitrary. By (2), if E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then ν(x, E_x) = 0 or 1 μ-a.e. x.

Let ν₁ = ½ν + ½ν₀. Then ν₁(x, E_x) = 0 or 1 μ-a.e. x, since ν(x, E_x) = 0 μ-a.e. x implies ν₀(x, E_x) = 0 μ-a.e. x, and ν(x, E_x) = 1 μ-a.e. x implies ν₀(x, E_x) = 1 μ-a.e. x. Thus ν₁ is ergodic by (2), which contradicts (5).

(1) ⇒ (6). Suppose ν is ergodic and let μ ∈ P_se(X). Then μ ⊗ ν ∈ P_se(X × Y) and hence for every E, F′ ∈ 𝔛 ⊗ 𝔜

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}E \cap F'\big) = \mu \otimes \nu(E)\,\mu \otimes \nu(F'). \tag{4.4}$$

If we take F′ = F ∩ (A × Y) for F ∈ 𝔛 ⊗ 𝔜 and A ∈ 𝔛, then on the one hand

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \int_X \nu\big(x, [(S \times T)^{-k}E \cap F \cap (A \times Y)]_x\big)\,\mu(dx) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}E \cap F \cap (A \times Y)\big)$$
$$= \mu \otimes \nu(E)\,\mu \otimes \nu\big(F \cap (A \times Y)\big), \quad \text{by (4.4)},$$
$$= \mu \otimes \nu(E)\int_A \nu(x, F_x)\,\mu(dx), \tag{4.5}$$

and on the other hand, by Proposition 1.4,

$$\text{LHS of (4.4)} = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \int_A \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big)\,\mu(dx) = \int_A \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big)\,\mu(dx). \tag{4.6}$$
Since (4.5) and (4.6) are equal for every A ∈ 𝔛, one has the equation in (6).

(6) ⇒ (7). If E = A × C and F = B × D, then the equation in (6) reduces to

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{S^{-k}A \cap B}(x)\,\nu(x, T^{-k}C \cap D) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{S^{-k}A \cap B}(x)\,\nu(x, T^{-k}C)\,\nu(x, D) \quad P_{se}(X)\text{-a.e. } x.$$

Integrating both sides w.r.t. μ ∈ P_se(X) over X, we get the equation in (7).

(7) ⇒ (1). Let μ ∈ P_se(X), A, B ∈ 𝔛 and C, D ∈ 𝔜. Then

$$\frac{1}{n}\sum_{k=0}^{n-1} \big\{\mu \otimes \nu\big((S \times T)^{-k}(A \times C) \cap (B \times D)\big) - \mu \otimes \nu(A \times C)\,\mu \otimes \nu(B \times D)\big\}$$
$$= \frac{1}{n}\sum_{k=0}^{n-1} \int_{S^{-k}A \cap B} \big\{\nu(x, T^{-k}C \cap D) - \nu(x, T^{-k}C)\,\nu(x, D)\big\}\,\mu(dx)$$
$$+ \frac{1}{n}\sum_{k=0}^{n-1} \Big\{\int_X 1_A(S^k x)\,\nu(S^k x, C)\,1_B(x)\,\nu(x, D)\,\mu(dx) - \int_X 1_A(x)\,\nu(x, C)\,\mu(dx)\int_X 1_B(x)\,\nu(x, D)\,\mu(dx)\Big\}$$
$$\to 0 \quad (n \to \infty)$$

by the assumption (7) and the ergodicity of μ. Thus μ ⊗ ν is ergodic.

If in (6) of Theorem 2 we take E = X × C and F = X × D, then it reduces to the condition (3.1).

We now consider ergodicity of a stationary channel in terms of the channel operators associated with it. So assume that X and Y are a pair of compact Hausdorff spaces with homeomorphisms S and T, respectively.
Definition 3. Let K ∈ 𝒦_s(X, Y), A ∈ 𝒜_s(X, Y) and 𝒫 ⊆ P(X). Then K is said to be extremal in 𝒦_s(X, Y) mod 𝒫, denoted K ∈ ex 𝒦_s(X, Y) (mod 𝒫), provided that, if K ≡ αK₁ + (1 − α)K₂ (mod 𝒫) for some K₁, K₂ ∈ 𝒦_s(X, Y) and α ∈ (0, 1), then K₁ ≡ K₂ (mod 𝒫). Similarly, A is said to be extremal in 𝒜_s(X, Y) mod 𝒫, denoted A ∈ ex 𝒜_s(X, Y) (mod 𝒫), provided that, if A ≡ αA₁ + (1 − α)A₂ (mod 𝒫) for some A₁, A₂ ∈ 𝒜_s(X, Y) and α ∈ (0, 1), then A₁ ≡ A₂ (mod 𝒫).

Under these preparations, extremal operators A ∈ 𝒜_s(X, Y) are characterized as follows:

Theorem 4. Let A ∈ 𝒜_s(X, Y). Then the following conditions are equivalent:
(1) For every f, g ∈ C(X × Y)

$$\lim_{n\to\infty} A(f_n g)(x) = \lim_{n\to\infty} Af_n(x)\,Ag(x) \quad P_{se}(X)\text{-a.e. } x, \tag{4.7}$$

where

$$f_n = (S \otimes T)_n f = \frac{1}{n}\sum_{k=0}^{n-1} (S \otimes T)^k f \quad \text{for } n \geq 1.$$

(2) μ ∈ P_se(X) ⇒ A*μ ∈ P_se(X × Y).
(3) A ∈ ex 𝒜_s(X, Y) (mod P_se(X)).

Proof. (1) ⇒ (2). Let μ ∈ P_se(X) and f, g ∈ C(X × Y). Then
$$\lim_{n\to\infty} A^*\mu(f_n g) = \lim_{n\to\infty} \mu\big(A(f_n g)\big) = \lim_{n\to\infty} \mu\big(Af_n \cdot Ag\big), \quad \text{by (1)},$$
$$= \lim_{n\to\infty} \mu\big((Af)_n\,Ag\big), \quad \text{since } A(S \otimes T) = SA,$$
$$= \mu(Af)\,\mu(Ag), \quad \text{since } \mu \text{ is ergodic},$$
$$= A^*\mu(f)\,A^*\mu(g),$$

where (Af)_n = S_n(Af), n ≥ 1 (cf. Theorem II.3.2). Thus A*μ ∈ P_se(X × Y) by Theorem II.3.2.

(2) ⇒ (1). Suppose that (1) is false. Then it holds that for some μ ∈ P_se(X) and f, g ∈ C(X × Y)

$$\mu\Big(\Big\{x \in X : \lim_{n\to\infty} A(f_n g)(x) \neq \lim_{n\to\infty} Af_n(x)\,Ag(x)\Big\}\Big) > 0$$

or, equivalently, for some h ∈ C(X)

$$\lim_{n\to\infty} \mu\big(A(f_n g) \cdot h\big) \neq \lim_{n\to\infty} \mu(Af_n)\,\mu(Ag \cdot h). \tag{4.8}$$

Since A*μ is ergodic by (2), we have

$$\lim_{n\to\infty} \mu\big(A(f_n g) \cdot h\big) = \lim_{n\to\infty} \mu\big(A(f_n g h)\big), \quad \text{by (a1)},$$
$$= \lim_{n\to\infty} A^*\mu(f_n g h) = \lim_{n\to\infty} A^*\mu(f_n)\,A^*\mu(g h) = \lim_{n\to\infty} \mu(Af_n)\,\mu(Ag \cdot h). \tag{4.9}$$

Moreover, since μ is ergodic, we also have

$$\lim_{n\to\infty} \mu(Af_n)\,\mu(Ag \cdot h) = \mu(Af)\,\mu(Ag \cdot h). \tag{4.10}$$

(4.9) and (4.10) contradict (4.8). Thus (4.7) holds.

(2) ⇒ (3). Suppose A ≡ αA₁ + (1 − α)A₂ (mod P_se(X)) for some A₁, A₂ ∈ 𝒜_s(X, Y) and α ∈ (0, 1). For μ ∈ P_se(X), A*μ = αA₁*μ + (1 − α)A₂*μ is also ergodic by (2). Then by Theorem II.3.2, A₁*μ = A₂*μ = A*μ ∈ P_se(X × Y). Since this is true for every μ ∈ P_se(X), we have A₁ ≡ A₂ ≡ A (mod P_se(X)) by Proposition 2.7. Thus (3) holds.

(3) ⇒ (2). Assume that (2) is not true. Then there exists a μ ∈ P_se(X) such that A*μ is not ergodic. Hence there is some S × T-invariant set E ∈ 𝔛 ⊗ 𝔜 such that 0 < A*μ(E) < 1. Let λ₁ = A*μ(E) and λ₂ = 1 − λ₁, and take γ so that 0 < γ < min{λ₁, λ₂}. Let α_i = γ/λ_i (i = 1, 2) and define operators A₁, A₂ on B(X × Y) by

$$A_1 f = \alpha_1 A(f 1_E) + (1 - \alpha_1 A1_E)Af, \qquad A_2 f = \alpha_2 A(f 1_{E^c}) + (1 - \alpha_2 A1_{E^c})Af, \qquad f \in B(X \times Y).$$
We shall show A₁, A₂ ∈ 𝒜(X, Y). A₁1 = 1 is clear. A₁(f A₁g) = (A₁f)(A₁g) for f, g ∈ B(X × Y) is seen from the following computation:

$$A_1(f A_1 g) = A_1\big[f\{\alpha_1 A(g 1_E) + (1 - \alpha_1 A1_E)Ag\}\big]$$
$$= \alpha_1 A\big[f\{\alpha_1 A(g 1_E) + (1 - \alpha_1 A1_E)Ag\}1_E\big] + (1 - \alpha_1 A1_E)A\big[f\{\alpha_1 A(g 1_E) + (1 - \alpha_1 A1_E)Ag\}\big]$$
$$= \alpha_1^2 A\big[f 1_E A(g 1_E)\big] + \alpha_1 A\big[f 1_E(1 - \alpha_1 A1_E)Ag\big] + \alpha_1(1 - \alpha_1 A1_E)A\big[f A(g 1_E)\big] + (1 - \alpha_1 A1_E)A\big[f(1 - \alpha_1 A1_E)Ag\big]$$
$$= \alpha_1^2 A(f 1_E)A(g 1_E) + \alpha_1 A\big[f 1_E(1 - \alpha_1 A1_E)\big]Ag + \alpha_1(1 - \alpha_1 A1_E)Af \cdot A(g 1_E) + (1 - \alpha_1 A1_E)A\big[f(1 - \alpha_1 A1_E)\big]Ag$$
$$= \alpha_1^2 A(f 1_E)A(g 1_E) + \alpha_1 A(f 1_E)(1 - \alpha_1 A1_E)Ag + \alpha_1(1 - \alpha_1 A1_E)Af \cdot A(g 1_E) + (1 - \alpha_1 A1_E)^2 Af \cdot Ag$$
$$= \big[\alpha_1 A(f 1_E) + (1 - \alpha_1 A1_E)Af\big]\big[\alpha_1 A(g 1_E) + (1 - \alpha_1 A1_E)Ag\big] = (A_1 f)(A_1 g),$$

where we used A(f·Ag) = (Af)(Ag). Similarly, we can verify (a1) for A₂. Moreover, (a3) is clearly satisfied by A₁, A₂, as is seen from their definitions. Thus A₁, A₂ ∈ 𝒜(X, Y). Furthermore, A₁, A₂ ∈ 𝒜_s(X, Y), since for f ∈ B(X × Y) we have

$$A_1(S \otimes T)f = \alpha_1 A\big(((S \otimes T)f)1_E\big) + (1 - \alpha_1 A1_E)A(S \otimes T)f$$
$$= \alpha_1 A\big((S \otimes T)(f 1_E)\big) + (1 - \alpha_1 A1_E)SAf, \quad \text{since } A(S \otimes T) = SA \text{ and } E \text{ is } S \times T\text{-invariant},$$
$$= \alpha_1 SA(f 1_E) + (1 - \alpha_1 A1_E)SAf = SA_1 f,$$

and similarly A₂(S ⊗ T)f = SA₂f. Now we show that A₁ ≢ A₂ (mod P_se(X)). In fact, we have

$$\mu\big(A_2 1_{E^c} - A_1 1_{E^c}\big) = \mu\big(\alpha_2 A1_{E^c} + (1 - \alpha_2 A1_{E^c})A1_{E^c} - \alpha_1 A(1_{E^c} 1_E) - (1 - \alpha_1 A1_E)A1_{E^c}\big)$$
$$= \mu\big(\alpha_2 A1_{E^c} + A1_{E^c}(\alpha_1 A1_E - \alpha_2 A1_{E^c})\big)$$
$$= \alpha_2 A^*\mu(E^c) + \mu\big(A1_{E^c}(\alpha_1 A1_E - \alpha_2 A1_{E^c})\big)$$
$$= \alpha_2 \lambda_2 + \mu(A1_{E^c})\,\mu(\alpha_1 A1_E - \alpha_2 A1_{E^c}), \quad \text{since } S_n(A1_{E^c}) = A1_{E^c} \text{ and } \mu \text{ is ergodic},$$
$$= \alpha_2 \lambda_2 + \lambda_2(\alpha_1 \lambda_1 - \alpha_2 \lambda_2) = \gamma + \lambda_2(\gamma - \gamma) > 0. \tag{4.11}$$

Finally we see that λ₁A₁ + λ₂A₂ = A, since for f ∈ B(X × Y)

$$\lambda_1 A_1 f + \lambda_2 A_2 f = \gamma A(f 1_E) + (\lambda_1 - \gamma A1_E)Af + \gamma A(f 1_{E^c}) + (\lambda_2 - \gamma A1_{E^c})Af = \gamma Af + (1 - \gamma)Af = Af. \tag{4.12}$$
These results imply that A ∉ ex 𝒜_s(X, Y) (mod P_se(X)), a contradiction.

As a corollary we have:

Corollary 5. Let ν ∈ C_s(X, Y) be a stationary channel and let K ∈ 𝒦_s(X, Y) and A ∈ 𝒜_s(X, Y) be the channel operators corresponding to ν. Then the following statements are equivalent:
(1) ν is ergodic.
(2) ν ∈ ex C_s(X, Y) (mod P_se(X)).
(3) K ∈ ex 𝒦_s(X, Y) (mod P_se(X)).
(4) A ∈ ex 𝒜_s(X, Y) (mod P_se(X)).
(5) For f, g ∈ C(X × Y) it holds that

$$\lim_{n\to\infty} A(f_n g)(x) = \lim_{n\to\infty} Af_n(x)\,Ag(x) \quad P_{se}(X)\text{-a.e. } x,$$

where f_n = (S ⊗ T)_n f, n ≥ 1.
(6) For f, g ∈ B(X × Y) the equality in (5) holds.
(7) For E, F ∈ 𝔛 ⊗ 𝔜 it holds that for P_se(X)-a.e. x

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E]_x\big)\,\nu(x, F_x).$$

Proof. (1) ⇔ (2) ⇔ (3) ⇔ (4) ⇔ (5) follow from Theorems 2.1, 2.2, 2 and 4. (5) ⇒ (6) is proved by a suitable approximation. (6) ⇒ (7) is derived by taking f = 1_E and g = 1_F for E, F ∈ 𝔛 ⊗ 𝔜. (7) ⇒ (6) follows from the fact that {1_E : E ∈ 𝔛 ⊗ 𝔜} spans B(X × Y). (6) ⇒ (5) is trivial.

Remark 6. Observe that each of the following conditions is not sufficient for ergodicity of a stationary channel ν ∈ C_s(X, Y), where (X, 𝔛, S) and (Y, 𝔜, T) are a pair of abstract measurable spaces with measurable transformations:
(1) ν(x, ·) ∈ P_ae(Y) P_se(X)-a.e. x.
(2) ν(x, ·) ∈ P_se(Y) P_se(X)-a.e. x.
(3) ν = ν_η for η ∈ P_se(Y), ν_η being the constant channel determined by η.
In fact, if X = Y, η ∈ P_se(X) is not WM, and ν_η is ergodic, then η ⊗ ν_η = η × η is ergodic, which implies η is WM by Theorem II.3.10, a contradiction.
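As a concrete instance of condition (7) of Theorem 2, for a memoryless channel the individual averaged terms already vanish: on one-letter cylinders C = [c]₀ and D = [d]₀ one has ν(x, T⁻ᵏC ∩ D) = ν(x, T⁻ᵏC)ν(x, D) for every k ≥ 1, since distinct output coordinates are independent given the input sequence x. A small numerical check (the channel matrix P is a hypothetical example):

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical memoryless channel matrix

def nu_cyl(x, coords_vals):
    """nu(x, C) for a memoryless channel and a cylinder C that fixes
    output letters coords_vals = {time: letter}."""
    p = 1.0
    for k, c in coords_vals.items():
        p *= P[x[k], c]
    return p

x = [0, 1, 1, 0, 1, 0, 0, 1]             # a fixed segment of an input sequence
c, d = 1, 0
for k in range(1, len(x)):
    lhs = nu_cyl(x, {k: c, 0: d})        # nu(x, T^{-k}C ∩ D)
    rhs = nu_cyl(x, {k: c}) * nu_cyl(x, {0: d})
    assert abs(lhs - rhs) < 1e-12        # each term of (7) vanishes for k >= 1
```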
3.5. AMS channels

AMS sources were considered as an extension of stationary sources in Section 2.4. In this section, AMS channels are defined and studied as a generalization of stationary channels. A characterization of AMS channels is obtained, as well as one of ergodic AMS channels. Absolute continuity of measures plays an important role in this section.

Let X, Y be a pair of compact Hausdorff spaces with Baire σ-algebras 𝔛, 𝔜 and homeomorphisms S, T, respectively. The invertibility assumption is crucial. Assume that 𝔜 has a countable generator 𝔜₀, i.e., 𝔜₀ is countable and σ(𝔜₀) = 𝔜. Recall that P_a(Ω) and P_ae(Ω) stand for the set of all AMS sources in P(Ω) and the set of all AMS ergodic sources in P_a(Ω), for Ω = X, Y and X × Y, respectively.

Definition 1. A channel ν ∈ C(X, Y) is said to be asymptotically mean stationary (AMS) if

(c14) μ ∈ P_a(X) ⇒ μ ⊗ ν ∈ P_a(X × Y).

That is, if the input source is AMS, then the compound source is also AMS. C_a(X, Y) denotes the set of all AMS channels. First we need the following two lemmas.
Lemma 2. A channel ν ∈ C(X, Y) is AMS iff μ ∈ P_s(X) ⇒ μ ⊗ ν ∈ P_a(X × Y).

Proof. The "only if" part is obvious. As to the "if" part, let μ ∈ P_a(X). Then μ̄ ∈ P_s(X) and μ ≪ μ̄ by Remark II.4.9. Hence

$$\mu \otimes \nu \ll \bar{\mu} \otimes \nu \ll \overline{\bar{\mu} \otimes \nu},$$

where the first ≪ is clear and the second follows from Remark II.4.9 and the assumption, since μ̄ ∈ P_s(X). Thus μ ⊗ ν ∈ P_a(X × Y) by Proposition II.4.8.
Lemma 3. Let ν ∈ C(X, Y) and μ ∈ P(X) be such that μ ⊗ ν ∈ P_a(X × Y). Then:
(1) μ ∈ P_a(X) with the stationary mean μ̄(·) = \overline{μ ⊗ ν}(· × Y) ∈ P_s(X).
(2) μν ∈ P_a(Y) with the stationary mean \overline{μν}(·) = \overline{μ ⊗ ν}(X × ·) ∈ P_s(Y).
(3) ν(x, ·) ≪ \overline{μν} μ-a.e. x.

Proof. (1) Observe that for A ∈ 𝔛

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(S^{-k}A) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}(A \times Y)\big) = \overline{\mu \otimes \nu}(A \times Y).$$

Thus μ is AMS with the desired stationary mean. (2) is verified similarly.

(3) Suppose that \overline{μν}(C) = 0. Then μν(C) = 0 since μν ≪ \overline{μν}. Hence

$$\mu\nu(C) = \int_X \nu(x, C)\,\mu(dx) = 0.$$

This implies that ν(x, C) = 0 μ-a.e. x, completing the proof.

An important consequence of these two lemmas is:

Corollary 4. For a channel ν ∈ C(X, Y) the following conditions are equivalent:
(1) ν is AMS, i.e., ν ∈ C_a(X, Y).
(2) For each stationary μ ∈ P_s(X) there exists a stationary channel ν₁ ∈ C_s(X, Y) such that

$$\nu(x, \cdot) \ll \nu_1(x, \cdot) \quad \mu\text{-a.e. } x. \tag{5.1}$$

(3) For each stationary μ ∈ P_s(X) there exists an AMS channel ν₁ ∈ C_a(X, Y) such that (5.1) holds.

Proof. (1) ⇒ (2). Suppose ν is AMS and let μ ∈ P_s(X). Then μν ∈ P_a(Y) by Lemma 3 (2), since μ ⊗ ν ∈ P_a(X × Y). If we let ν₁(x, C) = \overline{μν}(C) for x ∈ X and C ∈ 𝔜, then ν₁ is a constant stationary channel and (5.1) is true by Lemma 3 (3). (2) ⇒ (3) is immediate.
(3) ⇒ (1). Let μ ∈ P_s(X) and suppose there exists ν₁ as mentioned. Then μ ⊗ ν₁ ∈ P_a(X × Y) and μ ⊗ ν ≪ μ ⊗ ν₁ by (5.1). Hence μ ⊗ ν ∈ P_a(X × Y) by Proposition II.4.8. Thus ν is AMS by Lemma 2.
Now we want to consider an analogue of stationary means for AMS channels. Suppose that ν ∈ C_a(X, Y) and μ ∈ P_s(X). Then μ ⊗ ν is AMS. Observe the following computation: for A ∈ 𝔛 and C ∈ 𝔜

$$\frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}(A \times C)\big) = \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu(S^{-k}A \times T^{-k}C) = \frac{1}{n}\sum_{k=0}^{n-1} \int_{S^{-k}A} \nu(x, T^{-k}C)\,\mu(dx)$$
$$= \frac{1}{n}\sum_{k=0}^{n-1} \int_A \nu(S^{-k}x, T^{-k}C)\,\mu\big(d(S^{-k}x)\big) = \frac{1}{n}\sum_{k=0}^{n-1} \int_A \nu(S^{-k}x, T^{-k}C)\,\mu(dx), \quad \text{since } \mu \text{ is stationary},$$
$$= \int_A \nu_n(x, C)\,\mu(dx), \quad \text{say},$$
$$\equiv \mu \otimes \nu_n(A \times C) \to \overline{\mu \otimes \nu}(A \times C) \quad (n \to \infty).$$

Clearly, each ν_n (n ≥ 1) is a channel, but not necessarily stationary. The sequence {ν_n} is expected to converge to some stationary channel ν̄, to be called again a "stationary mean" of ν. We shall prove the following.

Proposition 5. A channel ν ∈ C(X, Y) is AMS iff for each stationary input source μ ∈ P_s(X) there exists a stationary channel ν̄ ∈ C_s(X, Y) such that for every C ∈ 𝔜

$$\bar{\nu}(x, C) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) \quad \mu\text{-a.e. } x. \tag{5.2}$$

In this case it holds that

$$\overline{\mu \otimes \nu} = \mu \otimes \bar{\nu}. \tag{5.3}$$
Proof. Assume that ν ∈ C_a(X, Y) and μ ∈ P_s(X). Let η(·) = \overline{μν}(·). Then, by Lemma 3, η ∈ P_s(Y) and ν(x, ·) ≪ η μ-a.e. x. Let X₁ = {x ∈ X : ν(x, ·) ≪ η}, so that μ(X₁) = 1. Moreover, if X* = ∩_{n∈ℤ} Sⁿ X₁, then X* is S-invariant and μ(X*) = 1 since μ is stationary. Let

$$k(x, y) = \begin{cases} \dfrac{\nu(x, dy)}{\eta(dy)}, & x \in X^*,\ y \in Y, \\ 0, & x \notin X^*,\ y \in Y. \end{cases}$$

Then we have that k ∈ L¹(X × Y, μ × η) by Theorem IV.2.5 and Remark IV.2.6, which are independent of this proposition, since μ ⊗ ν ≪ μ × η, and

$$\nu(x, C) = \int_C k(x, y)\,\eta(dy), \qquad x \in X^*,\ C \in \mathfrak{Y}.$$

Now one has for x ∈ X* and C ∈ 𝔜

$$\frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) = \frac{1}{n}\sum_{k=0}^{n-1} \int_{T^{-k}C} k(S^{-k}x, y)\,\eta(dy) = \frac{1}{n}\sum_{k=0}^{n-1} \int_C k(S^{-k}x, T^{-k}y)\,\eta(dy), \quad \text{since } \eta \text{ is stationary},$$
$$= \int_Y 1_C(y)\,\frac{1}{n}\sum_{k=0}^{n-1} k(S^{-k}x, T^{-k}y)\,\eta(dy) = \int_Y 1_C(y)\big[(S^{-1} \otimes T^{-1})_n k\big](x, y)\,\eta(dy).$$

By the Pointwise Ergodic Theorem there is an S × T-invariant function k* such that

$$(S^{-1} \otimes T^{-1})_n k \to k^* \quad \mu \times \eta\text{-a.e.},$$

since k is jointly measurable. Hence we see that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) = \int_Y 1_C(y)\,k^*(x, y)\,\eta(dy) \quad \mu\text{-a.e. } x,$$

since η(·) = \overline{μ ⊗ ν}(X × ·). Take a stationary channel ν* ∈ C_s(X, Y) and define ν̄ by

$$\bar{\nu}(x, C) = \begin{cases} \displaystyle\int_C k^*(x, y)\,\eta(dy), & x \in X^*,\ C \in \mathfrak{Y}, \\ \nu^*(x, C), & x \notin X^*,\ C \in \mathfrak{Y}. \end{cases}$$

Then, clearly, ν̄ is a stationary channel and (5.2) holds. By the Bounded Convergence Theorem we have from (5.2) that

$$\mu \otimes \bar{\nu}(A \times C) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}(A \times C)\big)$$

for A ∈ 𝔛 and C ∈ 𝔜. Thus, since μ ⊗ ν̄ is stationary, μ ⊗ ν is AMS with the stationary mean μ ⊗ ν̄, i.e., (5.3) is established.

In Proposition 5, the stationary channel ν̄ depends on the given AMS channel ν and the stationary input source μ. We would like to have a single stationary channel, independent of input sources, such that (5.2) and (5.3) are true for all stationary μ ∈ P_s(X). This will be obtained using the countable generator 𝔜₀ of 𝔜.
Theorem 6. For any AMS channel ν ∈ C_a(X, Y) there is a stationary channel ν̄ ∈ C_s(X, Y) such that for any stationary input source μ ∈ P_s(X)

$$\bar{\nu}(x, C) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) \quad \mu\text{-a.e. } x,\ C \in \mathfrak{Y}, \tag{5.4}$$

$$\overline{\mu \otimes \nu} = \mu \otimes \bar{\nu}, \tag{5.5}$$

$$\nu(x, \cdot) \ll \bar{\nu}(x, \cdot) \quad \mu\text{-a.e. } x. \tag{5.6}$$

Proof. Let ν ∈ C_a(X, Y) and

$$X(C) = \Big\{x \in X : \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) \text{ exists}\Big\}, \quad C \in \mathfrak{Y}_0, \qquad X(\nu) = \bigcap_{C \in \mathfrak{Y}_0} X(C).$$

Then for any stationary μ ∈ P_s(X) we have μ(X(ν)) = 1, since μ(X(C)) = 1 for C ∈ 𝔜₀ by Proposition 5 and 𝔜₀ is countable. Take a stationary channel ν* ∈ C_s(X, Y) and define a set function ν̄ on X × 𝔜₀ by

$$\bar{\nu}(x, C) = \begin{cases} \displaystyle\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C), & x \in X(\nu),\ C \in \mathfrak{Y}_0, \\ \nu^*(x, C), & x \notin X(\nu),\ C \in \mathfrak{Y}_0. \end{cases}$$

It is evident that ν̄ can be extended to X × 𝔜 and becomes a stationary channel. Thus (5.4) is satisfied. (5.5) is shown by the Bounded Convergence Theorem and (5.6) is almost obvious.
Definition 7. The stationary channel ν̄ constructed above is called a stationary mean of the AMS channel ν ∈ C_a(X, Y).
The following is a collection of equivalence conditions for a channel to be AMS. Recall the notations S_n and (S ⊗ T)_n.
Theorem 8. For a channel ν ∈ C(X, Y) the following conditions are equivalent:
(1) ν ∈ C_a(X, Y), i.e., ν is AMS.
(2) μ ∈ P_s(X) ⇒ μ ⊗ ν ∈ P_a(X × Y).
(3) There is a stationary ν₁ ∈ C_s(X, Y) such that ν(x, ·) ≪ ν₁(x, ·) P_s(X)-a.e. x.
(4) There is an AMS ν₁ ∈ C_a(X, Y) such that ν(x, ·) ≪ ν₁(x, ·) P_s(X)-a.e. x.
(5) There is a stationary ν̄ ∈ C_s(X, Y) such that for C ∈ 𝔜

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu(S^{-k}x, T^{-k}C) = \bar{\nu}(x, C) \quad P_s(X)\text{-a.e. } x.$$

(6) For f ∈ B(X × Y) and μ ∈ P_s(X) the following limit exists:

$$\lim_{n\to\infty} \int_X \big[A_\nu(S \otimes T)_n f\big](x)\,\mu(dx).$$

If any (and hence all) of the above is true, then it holds that:
(7) \overline{μ ⊗ ν} = μ ⊗ ν̄ for μ ∈ P_s(X).
(8) ν(x, ·) ≪ ν̄(x, ·) P_s(X)-a.e. x.
(9) For μ ∈ P_s(X) and f ∈ B(X × Y)

$$\lim_{n\to\infty} \int_X \big[A_\nu(S \otimes T)_n f\big](x)\,\mu(dx) = \int_X (A_{\bar{\nu}} f)(x)\,\mu(dx).$$
Proof. (1) ⇔ (2) was proved in Lemma 2 and (1) ⇔ (5) follows from Theorem 6. By taking ν₁ = ν̄, (3) is derived from (5). (3) ⇒ (4) is immediate and (4) ⇒ (1) is proved in Corollary 4. Thus we have proved (1) ⇔ (2) ⇔ (3) ⇔ (4) ⇔ (5).

(2) ⇔ (6). Observe that for f ∈ B(X × Y) and μ ∈ P_s(X)

$$\int_X \big[A_\nu(S \otimes T)_n f\big](x)\,\mu(dx) = \int_X \int_Y (S \otimes T)_n f(x, y)\,\nu(x, dy)\,\mu(dx) = \int_{X \times Y} (S \otimes T)_n f(x, y)\,\mu \otimes \nu(dx, dy)$$

by Lemma 2.5. The equivalence then follows from Theorem II.4.6.

(7) and (8) are already noted in Theorem 6. As to (9) we proceed as follows: for f ∈ B(X × Y) and μ ∈ P_s(X)

$$\int_X (A_{\bar{\nu}} f)(x)\,\mu(dx) = \int_X \int_Y f(x, y)\,\bar{\nu}(x, dy)\,\mu(dx) = \int_{X \times Y} f(x, y)\,\mu \otimes \bar{\nu}(dx, dy), \quad \text{by Lemma 2.5},$$
$$= \int_{X \times Y} f(x, y)\,\overline{\mu \otimes \nu}(dx, dy), \quad \text{by (7)},$$
$$= \lim_{n\to\infty} \int_{X \times Y} (S \otimes T)_n f(x, y)\,\mu \otimes \nu(dx, dy) = \lim_{n\to\infty} \int_X \big[A_\nu(S \otimes T)_n f\big](x)\,\mu(dx).$$

Example 9. (1) As was mentioned before, each probability measure η ∈ P(Y) can be regarded as a channel by letting ν_η(x, C) = η(C) for x ∈ X and C ∈ 𝔜. If η ∈ P_a(Y), then ν_η is AMS. In fact, η ≪ η̄ since T is invertible, so that ν_η(x, ·) = η ≪ η̄ = ν_{η̄}(x, ·) for x ∈ X. Moreover, ν_{η̄} ∈ C_s(X, Y) implies that ν_η ∈ C_a(X, Y) by Theorem 8 (3). In this case we have \overline{ν_η} = ν_{η̄}.

(2) If a channel ν ∈ C(X, Y) satisfies

$$\nu(x, \cdot) \ll \eta \quad P_s(X)\text{-a.e. } x$$

for some AMS η ∈ P_a(Y), then ν is AMS by Theorem 8 (4) and (1) above. Let

$$k(x, y) = \frac{\nu(x, dy)}{\bar{\eta}(dy)}, \qquad (x, y) \in X \times Y,$$

where η̄ ∈ P_s(Y) is the stationary mean of η. Then ν can be written as

$$\nu(x, C) = \int_C k(x, y)\,\bar{\eta}(dy), \qquad x \in X,\ C \in \mathfrak{Y},$$

and its stationary mean ν̄ as

$$\bar{\nu}(x, C) = \int_C k^*(x, y)\,\bar{\eta}(dy) \quad P_s(X)\text{-a.e. } x,\ C \in \mathfrak{Y},$$

where k*(x, y) = lim_{n→∞} (S⁻¹ ⊗ T⁻¹)_n k(x, y) μ ⊗ ν-a.e. (x, y) for μ ∈ P_s(X).

(3) A channel ν ∈ C(X, Y) satisfying the following conditions is AMS: (i) ν is dominated, i.e., ν(x, ·) ≪ η (x ∈ X) for some η ∈ P(Y); (ii) ν(x, ·) ∈ P_a(Y) for every x ∈ X. In fact, ν is strongly measurable by Corollary IV.2.3, since ν is dominated and 𝔜 has a countable generator. Hence ν has a separable range in M(Y), so that {ν(x_n, ·) : n ≥ 1} is dense in its range for some {x_n} ⊆ X. Let

$$\xi(\cdot) = \sum_{n=1}^{\infty} \frac{\nu(x_n, \cdot)}{2^n}.$$

Then we see that ξ ∈ P_a(Y) by (ii) and Lemma II.4.3 (1). Thus ν is AMS by (2) above, since ν(x, ·) ≪ ξ (x ∈ X).
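The domination trick in (3) can be checked concretely: a countable convex combination ξ = Σ_n 2⁻ⁿ ν(x_n, ·) is again a probability measure, and every component is absolutely continuous with respect to it. On a finite output alphabet, absolute continuity is just support inclusion; the distributions below are hypothetical.

```python
import numpy as np

# hypothetical output distributions nu(x_n, .) on a 3-letter alphabet
nus = np.array([[0.2, 0.8, 0.0],
                [0.0, 0.5, 0.5],
                [1.0, 0.0, 0.0]])

weights = np.array([2.0 ** -(n + 1) for n in range(len(nus))])
weights[-1] *= 2           # make the finite tail sum to 1 exactly
xi = weights @ nus         # xi = sum_n w_n nu(x_n, .)

assert abs(xi.sum() - 1.0) < 1e-12
# nu(x_n, .) << xi : wherever xi vanishes, every component vanishes
for nu in nus:
    assert np.all(nu[xi == 0] == 0)
```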
Definition 10. An AMS channel ν ∈ C_a(X, Y) is said to be ergodic if

(c15) μ ∈ P_ae(X) ⇒ μ ⊗ ν ∈ P_ae(X × Y).

C_ae(X, Y) denotes the set of all ergodic AMS channels in C_a(X, Y). After giving a lemma we obtain a characterization of AMS ergodic channels. Some of the equivalence conditions are similar to those for AMS ergodic sources and some to those for stationary ergodic channels.
Lemma 11. Let ν ∈ C_a(X, Y) be AMS and μ ∈ P_s(X) be stationary. Then for every E, F ∈ 𝔛 ⊗ 𝔜 the following limit exists μ-a.e. x:

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big).$$

Proof. Since ν is AMS and μ is stationary, μ ⊗ ν is AMS with \overline{μ ⊗ ν} = μ ⊗ ν̄. The proof parallels that of Proposition 1.4, and we have

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big) = \int_Y 1_F(x, y)\,E^{\overline{\mu \otimes \nu}}(1_E \mid \mathcal{I})(x, y)\,\nu(x, dy) \quad \mu\text{-a.e. } x,$$

where ℐ = {E ∈ 𝔛 ⊗ 𝔜 : (S × T)⁻¹E = E}.
Theorem 12. Let ν ∈ C_a(X, Y) with the stationary mean ν̄ ∈ C_s(X, Y). Then the following conditions are equivalent:
(1) ν ∈ C_ae(X, Y), i.e., ν is ergodic.
(2) μ ∈ P_se(X) ⇒ μ ⊗ ν ∈ P_ae(X × Y).
(3) ν̄ ∈ C_se(X, Y).
(4) There is a stationary ergodic ν₁ ∈ C_se(X, Y) such that

$$\nu(x, \cdot) \ll \nu_1(x, \cdot) \quad P_{se}(X)\text{-a.e. } x. \tag{5.7}$$

(5) There is an AMS ergodic ν₁ ∈ C_ae(X, Y) such that (5.7) is true.
(6) If E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then ν(x, E_x) = 0 or 1 P_se(X)-a.e. x.
(7) For E, F ∈ 𝔛 ⊗ 𝔜 and μ ∈ P_se(X)

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \nu\big(x, [(S \times T)^{-k}E \cap F]_x\big) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \bar{\nu}\big(x, [(S \times T)^{-k}E]_x\big)\,\nu(x, F_x) = \mu \otimes \bar{\nu}(E)\,\nu(x, F_x) \quad \mu\text{-a.e. } x.$$

Proof. (1) ⇒ (2) is clear.

(2) ⇒ (3). Suppose (2) is true and let μ ∈ P_se(X). Then μ ⊗ ν ∈ P_ae(X × Y). Hence μ ⊗ ν̄ = \overline{μ ⊗ ν} ∈ P_se(X × Y) by Theorem II.4.12 and Theorem 8 (7). Thus ν̄ is ergodic.

(3) ⇒ (4). Take ν₁ = ν̄ and invoke Theorem 8. (4) ⇒ (5) is immediate.

(5) ⇒ (6). Assume (5) is true. Let E ∈ 𝔛 ⊗ 𝔜 be S × T-invariant and take μ ∈ P_se(X). Then μ ⊗ ν₁ ∈ P_ae(X × Y) and μ ⊗ ν ≪ μ ⊗ ν₁. If μ ⊗ ν₁(E) = 0, then 0 = μ ⊗ ν(E) = ∫_X ν(x, E_x) μ(dx), so that ν(x, E_x) = 0 μ-a.e. x. Similarly, if μ ⊗ ν₁(E) = 1, then we can show that ν(x, E_x) = 1 μ-a.e. x.

(6) ⇒ (1). Let μ ∈ P_ae(X). Suppose that E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant and ν(x, E_x) = 0 μ̄-a.e. x, and hence μ-a.e. x. Then

$$\mu \otimes \nu(E) = \int_X \nu(x, E_x)\,\mu(dx) = 0.$$

If ν(x, E_x) = 1 μ̄-a.e. x, then we have μ ⊗ ν(E) = 1. Thus μ ⊗ ν is ergodic. Therefore ν is ergodic.

(1) ⇒ (7). This is shown in a similar manner to the proof of (1) ⇒ (6) of Theorem 4.2, using Lemma 11.

(7) ⇒ (1). Let μ ∈ P_se(X) and E, F ∈ 𝔛 ⊗ 𝔜. By integrating both sides of the equation in (7) w.r.t. μ over X, we get

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu \otimes \nu\big((S \times T)^{-k}E \cap F\big) = \mu \otimes \bar{\nu}(E)\,\mu \otimes \nu(F).$$

Hence μ ⊗ ν is AMS ergodic. Therefore ν is AMS ergodic.

We noted that ex P_a(X) ⊆ P_ae(X) and the set inclusion is proper (cf. Theorem II.4.14). Similarly we can prove the following.

Theorem 13. (1) If ν ∈ ex C_a(X, Y) (mod P_se(X)), then ν ∈ C_ae(X, Y). Thus ex C_a(X, Y) ⊆ C_ae(X, Y).
(2) If there exists a weakly mixing source in P_se(Y), then the above set inclusion is proper, in that there exists some AMS ergodic channel ν ∈ C_ae(X, Y) such that ν ∉ ex C_a(X, Y) (mod P_se(X)).
Proof. (1) Let ν ∈ C_a(X, Y), let ν̄ ∈ C_s(X, Y) be its stationary mean, and let ν ↔ A ∈ 𝒜(X, Y) be the corresponding channel operator. Suppose that ν ∉ C_ae(X, Y). Then there is some μ ∈ P_se(X) such that A*μ ∉ P_ae(X × Y). Hence there is some S × T-invariant set E ∈ 𝔛 ⊗ 𝔜 such that 0 < λ₁ ≡ A*μ(E) < 1. Letting λ₂ = 1 − λ₁, take γ > 0 so that 0 < γ < min{λ₁, λ₂}. Let α_i = γ/λ_i (i = 1, 2) and define operators A₁, A₂ on B(X × Y) by

$$A_1 f = \alpha_1 A(f 1_E) + (1 - \alpha_1 A1_E)Af, \qquad A_2 f = \alpha_2 A(f 1_{E^c}) + (1 - \alpha_2 A1_{E^c})Af, \qquad f \in B(X \times Y).$$

Then, as in the proof of (3) ⇒ (2) of Theorem 4, we see that A₁, A₂ ∈ 𝒜(X, Y). It follows from (4.11) that A₁ ≢ A₂ (mod P_se(X)). Moreover, A is a proper convex combination of A₁ and A₂:

$$A = \lambda_1 A_1 + \lambda_2 A_2,$$

which follows from (4.12). Now we want to show that ν₁ and ν₂ are AMS channels, where ν_i ↔ A_i (i = 1, 2). Observe that for f ∈ B(X × Y)

$$\int_X \big[A_1(S \otimes T)_n f\big](x)\,\mu(dx) = \int_X \big[\alpha_1 A\big(((S \otimes T)_n f)1_E\big) + (1 - \alpha_1 A1_E)A(S \otimes T)_n f\big](x)\,\mu(dx)$$
$$= \int_X \big[\alpha_1 A\big((S \otimes T)_n(f 1_E)\big) + (1 - \alpha_1 A1_E)A(S \otimes T)_n f\big](x)\,\mu(dx),$$

because E is S × T-invariant. Since ν is AMS,

$$\lim_{n\to\infty} \int_X \alpha_1 A\big((S \otimes T)_n(f 1_E)\big)\,d\mu \quad \text{exists}$$

by Theorem 8 (6). Also lim_{n→∞} ∫_X (1 − α₁A1_E)A(S ⊗ T)_n f dμ exists by Theorem 8 (6) and the Bounded Convergence Theorem. Thus, we proved that

$$\lim_{n\to\infty} \int_X \big[A_1(S \otimes T)_n f\big](x)\,\mu(dx) \quad \text{exists}$$

for every f ∈ B(X × Y), and hence ν₁ is AMS by Theorem 8 (6). Similarly, ν₂ is AMS. Consequently we see that ν ∉ ex C_a(X, Y) (mod P_se(X)).

(2) Take an η ∈ P_se(Y) that is WM and define ξ by

$$\xi(C) = \int_C g\,d\eta, \qquad C \in \mathfrak{Y},$$

where g ∈ L¹(Y, η) is nonnegative with norm 1 and is not T-invariant on a set of positive η measure. Then, as in the proof of Theorem II.4.14, we see that ξ ∈ P_ae(Y), ξ ≠ η, ξ̄ = η, and ζ ≡ ½(ξ + η) ∈ P_ae(Y) is a proper convex combination of two distinct AMS sources. Hence

$$\nu_\zeta \notin \mathrm{ex}\,C_a(X, Y) \pmod{P_{se}(X)}$$

since ν_ζ = ½(ν_ξ + ν_η), ν_ξ, ν_η ∈ C_a(X, Y) and ν_ξ ≢ ν_η. We need to show ν_ζ ∈ C_ae(X, Y). Clearly \overline{ν_ζ} = ν_{ζ̄} = ν_η ∈ C_se(X, Y), since μ ⊗ ν_η = μ × η ∈ P_se(X × Y) for μ ∈ P_se(X) by Theorem II.3.10. Thus ν_ζ ∈ C_ae(X, Y) by Theorem 12.

If the output space is an alphabet message space Y = Y₀^ℤ with a shift transformation T, then there exists a Bernoulli source that is strongly mixing (cf. Example II.3.7), so the assumption in Theorem 13 (2) is satisfied.
3.6. Capacity and transmission rate

For a stationary channel we define the transmission rate functional and the stationary and ergodic capacities. An integral representation of the transmission rate functional is given. For a stationary ergodic channel the coincidence of the two capacities is shown. First we deal with the alphabet message space case and then the general case.

Consider alphabet message spaces X = X₀^ℤ and Y = Y₀^ℤ with shifts S and T, respectively, where X₀ = {a₁, …, a_p} and Y₀ = {b₁, …, b_q}. For each n ≥ 1, 𝔐ᵢⁿ(X) denotes the set of all messages in X of length n starting at time i, of the form [x_i^{(k)} ⋯ x_{i+n−1}^{(k)}], 1 ≤ k ≤ pⁿ. Similarly, we denote by 𝔐ᵢⁿ(Y) and 𝔐ᵢⁿ(X × Y) the sets of all messages in Y and X × Y of length n starting at time i, respectively. Note that

$$\mathfrak{M}_i^n(X) = \bigvee_{j=0}^{n-1} S^{-j}\mathfrak{M}_i(X) \in \mathcal{P}(X)$$

for n ≥ 1 and i ∈ ℤ.

Let a channel ν ∈ C(X, Y) be given. For each input source μ ∈ P(X) we associate the output source μν ∈ P(Y) and the compound source μ ⊗ ν ∈ P(X × Y). The mutual information I_n(X, Y) = I_n(μ; ν) between the two finite schemata (𝔐₁ⁿ(X), μ) and (𝔐₁ⁿ(Y), μν) is given by

$$I_n(X, Y) = I_n(\mu; \nu) = H_\mu\big(\mathfrak{M}_1^n(X)\big) + H_{\mu\nu}\big(\mathfrak{M}_1^n(Y)\big) - H_{\mu \otimes \nu}\big(\mathfrak{M}_1^n(X \times Y)\big)$$
(cf. (1.1.2)), where

$$H_\mu\big(\mathfrak{M}_1^n(X)\big) = -\sum_{A \in \mathfrak{M}_1^n(X)} \mu(A)\log\mu(A),$$
etc. Then, (1/n)I_n(μ; ν) is regarded as the average information per symbol (or letter) when messages of length n from the input source [X, μ] are sent through the channel ν. If the limit

$$I(\mu; \nu) = \lim_{n\to\infty} \frac{1}{n} I_n(\mu; \nu)$$

exists, it is called the transmission rate of the channel ν w.r.t. the input μ. If μ and ν are stationary, then the limit exists and equals

$$I(\mu; \nu) = H_\mu(S) + H_{\mu\nu}(T) - H_{\mu \otimes \nu}(S \times T),$$

since ⋁_{n=−∞}^{∞} Sⁿ𝔐₁(X) = 𝔛, etc. In this case note that I(· ; ν) is affine: for α, β ≥ 0 with α + β = 1 and μ, η ∈ P_s(X)

$$I(\alpha\mu + \beta\eta; \nu) = \alpha I(\mu; \nu) + \beta I(\eta; \nu),$$

which follows from Theorem I.5.6 (or Lemma I.5.1). The stationary capacity and the ergodic capacity are respectively defined by

$$C_s(\nu) = \sup\{I(\mu; \nu) : \mu \in P_s(X)\}, \qquad C_e(\nu) = \sup\{I(\mu; \nu) : \mu \in P_{se}(X) \text{ and } \mu \otimes \nu \in P_{se}(X \times Y)\},$$

where, if μ ⊗ ν ∉ P_se(X × Y) for every μ ∈ P_se(X), we let C_e(ν) = 0.

Let us mention the weak* upper semicontinuity of the entropy functional. For n ≥ 1 let H_n(μ) = H_μ(𝔐₁ⁿ(X)). Then we have that for each μ ∈ P_s(X)
$$\lim_{n\to\infty} \frac{1}{n}H_n(\mu) = H_\mu(S), \qquad H_{n+k}(\mu) \leq H_n(\mu) + H_k(\mu), \quad n \geq 1,\ k \geq 1. \tag{6.1}$$

In particular, (6.1) implies that

$$\frac{H_{kn}(\mu)}{kn} \leq \frac{H_n(\mu)}{n}, \qquad n \geq 1,\ k \geq 1,$$

so that

$$H_1(\mu) \geq \frac{H_2(\mu)}{2} \geq \frac{H_{2^2}(\mu)}{2^2} \geq \cdots \geq \frac{H_{2^n}(\mu)}{2^n} \geq \cdots,$$

i.e., H_{2ⁿ}(μ)/2ⁿ ↓ H_μ(S). This shows that H_{(·)}(S) is a weak* upper semicontinuous function on P_s(X) ⊂ C(X)*, since each H_n(·) is so. Although this was proved in Lemma II.7.3, we shall use this method to prove part (2)(i) of the following theorem, which summarizes the fundamental results regarding transmission rates and capacities in the alphabet message space case.
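The monotone convergence H_{2ⁿ}(μ)/2ⁿ ↓ H_μ(S) can be observed numerically. For a stationary Markov source the block entropies are exactly computable by the chain rule, H_n(μ) = H₁ + (n − 1)H_c, with H_c the one-step conditional entropy; the transition matrix below is a hypothetical example and entropies are in bits.

```python
import numpy as np

Q  = np.array([[0.9, 0.1], [0.3, 0.7]])   # hypothetical transition matrix
pi = np.array([0.75, 0.25])               # its stationary distribution (pi Q = pi)

def H_block(n):
    """n-block entropy H_n(mu) of the stationary Markov source (chain rule)."""
    h1 = -np.sum(pi * np.log2(pi))                     # H(X_1)
    hc = -np.sum(pi[:, None] * Q * np.log2(Q))         # H(X_2 | X_1)
    return h1 + (n - 1) * hc

rates = [H_block(2 ** k) / 2 ** k for k in range(6)]
# H_{2^n}/2^n decreases monotonically toward the entropy rate H_mu(S) = H_c
assert all(a >= b - 1e-12 for a, b in zip(rates, rates[1:]))
```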
Theorem 1. Let ν ∈ C_s(X₀^ℤ, Y₀^ℤ) be a stationary channel.
(1) 0 ≤ C_e(ν) ≤ C_s(ν) ≤ log pq, where p = |X₀| and q = |Y₀|.
(2) If ν has finite memory (cf. Definition 1.1 (c5)) and is finitely dependent (cf. Definition 3.3 (c8)), then:
(i) I(· ; ν) is upper semicontinuous in the weak* topology of P_s(X) ⊂ C(X)*;
(ii) C_s(ν) = C_e(ν);
(iii) There exists a stationary ergodic input source μ* ∈ P_se(X) such that C_e(ν) = I(μ*; ν).
(3) C_s(ν) = C_e(ν) if ν is ergodic.
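Before the proof, the transmission rate can be made concrete in the simplest setting: for a memoryless channel with an i.i.d. input, I_n(μ; ν) = n·I₁(μ; ν), so the rate equals the single-letter mutual information. For a binary symmetric channel with uniform input this gives 1 − h(ε) bits, the classical BSC capacity. The crossover probability below is hypothetical; entropies are in bits.

```python
import numpy as np

def H(p):
    """Shannon entropy (base 2) of a distribution given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

eps = 0.1                                      # hypothetical crossover probability
P  = np.array([[1 - eps, eps], [eps, 1 - eps]])
mu = np.array([0.5, 0.5])                      # uniform (i.i.d.) input, one letter

joint = mu[:, None] * P                        # one-letter compound source mu (x) nu
out = joint.sum(axis=0)                        # one-letter output source mu.nu

# I_1 = H(input) + H(output) - H(joint), the single-letter transmission rate
I1 = H(mu) + H(out) - H(joint)
```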
Proof. All the statements of Theorem 1 (except (2)(i)) follow from the discussion below (see Theorem 8), where we consider a more general setting. So we prove (2)(i). Since ν has finite memory, it is continuous (Definition 3.1, (c5')), so that H_{μν}(T) is a weak* upper semicontinuous function of μ by Proposition 1.6 and Lemma II.7.3. Hence, it is sufficient to show that H_μ(S) − H_{μ⊗ν}(S×T) is a weak* upper semicontinuous function of μ on P_s(X). Since ν has finite memory and is finitely dependent, there exists a positive integer m such that (c5) and (c8) hold. By (c5) we have that for any message C = [y_{m+1} ··· y_n] (m+1 ≤ n) ⊂ Y

ν(x,C) = ν(x',C),  x, x' ∈ A = [x_1 ··· x_n].

We denote this common value by ν(A,C). For n ≥ 1 and μ ∈ P_s(X) let
f_n(μ) = (1/n)[ Σ_{A,C} μ⊗ν(A×C) log μ⊗ν(A×C) − Σ_A μ(A) log μ(A) ],

where the sum is taken for A ∈ 𝔐_1^n(X) and C ∈ 𝔐_{m+1}^n(Y). Observe that

f_n(μ) = (1/n)[ Σ_{A,C} ν(A,C)μ(A) log[ν(A,C)μ(A)] − Σ_A μ(A) log μ(A) ]   (6.2)
       = (1/n) Σ_{A,C} ν(A,C)μ(A) log ν(A,C).   (6.3)
For A = [x_1 ··· x_n] ∈ 𝔐_1^n(X) let A' = [x_{m+1} ··· x_n] ∈ 𝔐_{m+1}^n(X), and for C = [y_{m+1} ··· y_n] ∈ 𝔐_{m+1}^n(Y) let C' = [y'_1 ··· y'_m y_{m+1} ··· y_n] ∈ 𝔐_1^n(Y). Then one has for μ ∈ P_s(X)

μ⊗ν(A×C') ≤ μ⊗ν(A×C) ≤ μ⊗ν(A'×C).

It follows that for n ≥ m+1 and μ ∈ P_s(X)
−H_n(μ⊗ν) = Σ_{A,C'} μ⊗ν(A×C') log μ⊗ν(A×C')
           ≤ Σ_{A,C'} μ⊗ν(A×C') log μ⊗ν(A×C)
           = Σ_{A,C} μ⊗ν(A×C) log μ⊗ν(A×C) = G_n(μ), say,   (6.4)
           ≤ Σ_{A,C} μ⊗ν(A×C) log μ⊗ν(A'×C)
           = Σ_{A',C} μ⊗ν(A'×C) log μ⊗ν(A'×C)
           = Σ_{A',C} μ⊗ν(S^m A' × T^m C) log μ⊗ν(S^m A' × T^m C)
           = −H_{n−m}(μ⊗ν).   (6.5)
Hence (6.2) and (6.4) imply

f_n(μ) = (1/n)(G_n(μ) + H_n(μ)) ≥ (1/n)(H_n(μ) − H_n(μ⊗ν)),

while (6.2) and (6.5) imply

f_n(μ) ≤ (1/n)(H_n(μ) − H_{n−m}(μ⊗ν)).

These two yield that

liminf_{n→∞} f_n(μ) ≥ lim_{n→∞} (1/n)(H_n(μ) − H_n(μ⊗ν)) = H_μ(S) − H_{μ⊗ν}(S×T)
and

limsup_{n→∞} f_n(μ) ≤ lim_{n→∞} (1/n)(H_n(μ) − H_{n−m}(μ⊗ν))
                    = lim_{n→∞} (1/n)H_n(μ) − lim_{n→∞} ((n−m)/n) · (1/(n−m))H_{n−m}(μ⊗ν)
                    = H_μ(S) − H_{μ⊗ν}(S×T).

So we conclude that

lim_{n→∞} f_n(μ) = H_μ(S) − H_{μ⊗ν}(S×T),   μ ∈ P_s(X).
Note that f_n(·) is a weak* continuous function on P_s(X) for each n ≥ 1. To prove the weak* upper semicontinuity of H_μ(S) − H_{μ⊗ν}(S×T) it suffices to show that {f_n(μ)} contains a monotonely decreasing subsequence. Let ℓ ≥ 1 be arbitrary and n = 2(m+ℓ). Denote a message A = [x_1 ··· x_n] ∈ 𝔐_1^n(X) by

A = [x_1 ··· x_{m+ℓ}] ∩ [x_{m+ℓ+1} ··· x_n] = A_1 ∩ A_2, say.

Similarly, we write

C = [y_{m+1} ··· y_n] = [y_{m+1} ··· y_{m+ℓ}] ∩ [y_{m+ℓ+1} ··· y_{2m+ℓ}] ∩ [y_{2m+ℓ+1} ··· y_{2(m+ℓ)}] = C_1 ∩ C' ∩ C_2, say.

Since ν has a finite memory and is finitely dependent, one has

ν(A,C) = ν(A, C_1 ∩ C' ∩ C_2) ≤ ν(A, C_1 ∩ C_2) = ν(A,C_1)ν(A,C_2) = ν(A_1,C_1)ν(A_2,C_2),

where ν(A, C_1 ∩ C_2) stands for the common value of ν(x, C_1 ∩ C_2) for x ∈ A. Now from (6.3) and the above we see that

f_n(μ) = f_{2(m+ℓ)}(μ)
       ≤ (1/(2(m+ℓ))) Σ_{A_1,C_1} Σ_{A_2,C_2} ν(A_1,C_1)ν(A_2,C_2)μ(A_1∩A_2) log[ν(A_1,C_1)ν(A_2,C_2)]
       = (1/(m+ℓ)) Σ_{A_1,C_1} ν(A_1,C_1)μ(A_1) log ν(A_1,C_1)
       = f_{m+ℓ}(μ),

so that f_{m+ℓ}(μ) ≥ f_{2(m+ℓ)}(μ). Since this holds for every ℓ = 1, 2, ..., we can choose a monotonely decreasing subsequence {f_{n_k}(μ)}_{k=1}^∞ for each μ ∈ P_s(X).
Now we assume that X and Y are totally disconnected compact Hausdorff spaces with bases 𝔛_0 and 𝔜_0 of clopen sets and homeomorphisms S and T, respectively. As before, 𝔛 and 𝔜 stand for the Baire σ-algebras of X and Y, respectively. Let 𝔄 ∈ P(𝔛_0) and 𝔅 ∈ P(𝔜_0) be fixed clopen partitions of X and Y, respectively. ℭ = 𝔄 × 𝔅 denotes the clopen partition {A×B : A ∈ 𝔄, B ∈ 𝔅} of X×Y. We consider three entropy functionals:

H_1(μ) = H(μ, 𝔄, S),     μ ∈ M_s(X),
H_2(η) = H(η, 𝔅, T),     η ∈ M_s(Y),                  (6.6)
H_3(ξ) = H(ξ, ℭ, S×T),   ξ ∈ M_s(X×Y).

By Theorem II.7.6 there are S-, T- and S×T-invariant nonnegative measurable functions h_1 on X, h_2 on Y and h_3 on X×Y, respectively, such that

H_1(μ) = ∫_X h_1(x) μ(dx),            μ ∈ M_s(X),
H_2(η) = ∫_Y h_2(y) η(dy),            η ∈ M_s(Y),
H_3(ξ) = ∫_{X×Y} h_3(x,y) ξ(dx,dy),   ξ ∈ M_s(X×Y).
Definition 2. Let ν ∈ C_s(X,Y) be a stationary channel and μ ∈ P_s(X) be a stationary source. Then, the transmission rate ℜ(μ;ν) = ℜ(μ;ν,𝔄,𝔅) of the channel ν w.r.t. μ is defined by

ℜ(μ;ν) = H_1(μ) + H_2(μν) − H_3(μ⊗ν),

where H_1, H_2 and H_3 are defined by (6.6). Hence ℜ(· ;ν) is called the transmission rate functional of ν on P_s(X) or on M_s(X). Hereafter, we shall use the letter ℜ for the transmission rate functional.
We can obtain an integral representation of the transmission rate functional ℜ(· ;ν).

Proposition 3. Let ν ∈ C_s(X,Y) be a stationary channel. Then the transmission rate functional ℜ(· ;ν) is a bounded positive linear functional on M_s^+(X) and there is a universal S-invariant bounded Baire function t(·) on X such that

ℜ(μ;ν) = ∫_X t(x) μ(dx),  μ ∈ M_s^+(X).   (6.7)
Proof. Observe that for μ ∈ M_s(X)

ℜ(μ;ν) = H_1(μ) + H_2(μν) − H_3(μ⊗ν)
       = ∫_X h_1(x) μ(dx) + ∫_Y h_2(y) μν(dy) − ∫_{X×Y} h_3(x,y) μ⊗ν(dx,dy)
       = ∫_X h_1(x) μ(dx) + ∫_X ∫_Y h_2(y) ν(x,dy) μ(dx) − ∫_X ∫_Y h_3(x,y) ν(x,dy) μ(dx)
       = ∫_X [ h_1(x) + ∫_Y h_2(y) ν(x,dy) − ∫_Y h_3(x,y) ν(x,dy) ] μ(dx)

by Lemma 2.5. Hence, letting

t(x) = h_1(x) + ∫_Y h_2(y) ν(x,dy) − ∫_Y h_3(x,y) ν(x,dy),  x ∈ X,   (6.8)

we have the desired integral representation (6.7). S-invariance of t follows from S-invariance of h_1, T-invariance of h_2, S×T-invariance of h_3 and stationarity of ν. Clearly ℜ(· ;ν) is a bounded linear functional on M_s(X) since t is bounded. We show that ℜ(· ;ν) is nonnegative. Note that
ℜ(μ;ν) = H_3(μ × μν) − H_3(μ⊗ν)   (6.9)
       = lim_{n→∞} (1/n) Σ_{A,B} [ μ⊗ν(A×B) log μ⊗ν(A×B) − (μ × μν)(A×B) log(μ × μν)(A×B) ]
       = lim_{n→∞} (1/n) Σ_{A,B} μ⊗ν(A×B) [ log μ⊗ν(A×B) − log(μ × μν)(A×B) ]
       ≥ 0

by Theorem 1.1 (1), where the sum is taken over A ∈ 𝔄 ∨ S^{-1}𝔄 ∨ ··· ∨ S^{-(n-1)}𝔄 and B ∈ 𝔅 ∨ T^{-1}𝔅 ∨ ··· ∨ T^{-(n-1)}𝔅, and we have used the following computation:

Σ_{A,B} (μ × μν)(A×B) log(μ × μν)(A×B)
= Σ_A μ(A) log μ(A) + Σ_B μν(B) log μν(B)
= Σ_{A,B} μ⊗ν(A×B) log μ(A) + Σ_{A,B} μ⊗ν(A×B) log μν(B)
= Σ_{A,B} μ⊗ν(A×B) log(μ × μν)(A×B).
For a general μ ∈ M_s^+(X) we can similarly show ℜ(μ;ν) ≥ 0.

The following corollary immediately follows from (6.9).

Corollary 4. For a stationary channel ν ∈ C_s(X,Y), the transmission rate functional ℜ(· ;ν) is written as

ℜ(μ;ν) = H_3(μ⊗ν + ξ_μ) − H_3(μ⊗ν),

where ξ_μ = μ × μν − μ⊗ν for μ ∈ M_s^+(X).
To consider the Parthasarathy-type integral representation of transmission rates we use results in Sections 2.6 and 2.7. As in Section 2.7 denote by 𝔇_0 the algebra of clopen sets generated by ∪_{j=−∞}^∞ S^{−j}𝔄 and let 𝔇_X = σ(𝔇_0), the σ-algebra generated by 𝔇_0. Let C(X,𝔄) be the closed subspace of C(X) spanned by {1_A : A ∈ 𝔇_0}. P_s(X,𝔇_X) stands for the set of all stationary sources on (X,𝔇_X), and P_se(X,𝔇_X) for the set of all ergodic elements of P_s(X,𝔇_X). As usual, for μ ∈ P(X), μ|_{𝔇_X} denotes the restriction of μ to 𝔇_X.

Lemma 5. With the notation mentioned above,
(1) P_s(X,𝔇_X) = { μ|_{𝔇_X} : μ ∈ P_s(X) }.
(2) P_se(X,𝔇_X) = { μ|_{𝔇_X} : μ ∈ P_se(X) }.

Proof. (1) Let μ_1 ∈ P_s(X,𝔇_X) be given. Then μ_1 is regarded as a positive linear functional of norm 1 on C(X,𝔄) by the Riesz–Markov–Kakutani Theorem. By the Hahn–Banach Theorem there exists an extension of μ_1 onto C(X) of norm 1. Let P_1 be the set of all such extensions. Since P_1 is weak* compact and convex, the fixed point theorem implies that there is some S-invariant μ ∈ P_1, i.e., μ ∈ P_s(X) with μ|_{𝔇_X} = μ_1.
(2) Suppose that μ_1 ∈ P_se(X,𝔇_X) is ergodic and let P_1 be as in (1). If μ ∈ ex[P_s(X) ∩ P_1], then μ is ergodic. For, if μ = αη + βξ with α, β > 0, α + β = 1 and η, ξ ∈ P_s(X), then as functionals μ = αη + βξ = μ_1 on C(X,𝔄). Since μ_1 is ergodic, we see that μ = η = ξ on C(X,𝔄) and hence η, ξ ∈ P_s(X) ∩ P_1. Thus μ = η = ξ on C(X) because μ is extremal in P_s(X) ∩ P_1. Therefore μ is ergodic and μ|_{𝔇_X} = μ_1.
It follows from Lemma 5 (1) that every μ_1 ∈ P_s(X,𝔇_X) is of the form μ_1 = μ|_{𝔇_X} for some μ ∈ P_s(X). Hence the entropy functional H_1(·) on P_s(X,𝔇_X) is unambiguously defined by

H_1(μ_1) = H_1(μ),  where μ ∈ P_s(X) is any source with μ|_{𝔇_X} = μ_1.

Recall that R denotes the set of all regular points in X (cf. Section 2.7). For μ ∈ P_s(X) and r ∈ R, μ_r denotes the stationary ergodic source corresponding to r. By Lemma 5 (2) there is some ρ_r ∈ P_se(X) such that μ_r = ρ_r|_{𝔇_X}, and we define

ℜ(μ_r;ν) = ℜ(ρ_r;ν)

for a stationary channel ν.
Proposition 6. Let ν ∈ C_s(X,Y) be a stationary channel satisfying
(c2') ν(·,C) is 𝔇_X-measurable for C ∈ 𝔜.
Then the function ℜ(μ_r;ν) of r ∈ R is 𝔇_X-measurable on R and

ℜ(μ;ν) = ∫_R ℜ(μ_r;ν) μ(dr),  μ ∈ P_s(X).

Proof. According to the proof of Theorem II.7.6 the entropy functions h_1, h_2 and h_3 are 𝔇_X-, 𝔇_Y- and 𝔇_{X×Y}-measurable, respectively. Hence the condition (c2') implies that ∫_Y h_2(y) ν(·,dy) and ∫_Y h_3(·,y) ν(·,dy) are 𝔇_X-measurable, so that ℜ(μ_{(·)};ν) is also 𝔇_X-measurable on R. By Lemma II.7.5 we see that for μ ∈ P_s(X)

ℜ(μ;ν) = ∫_X t(x) μ(dx) = ∫_R [ ∫_X t(x) μ_r(dx) ] μ(dr) = ∫_R ℜ(μ_r;ν) μ(dr).
Definition 7. For a stationary channel ν ∈ C_s(X,Y) the stationary capacity C_s(ν) is defined by

C_s(ν) = sup{ ℜ(μ;ν) : μ ∈ P_s(X) }

and the ergodic capacity C_e(ν) by

C_e(ν) = sup{ ℜ(μ;ν) : μ ∈ P_se(X) and μ⊗ν ∈ P_se(X×Y) }.

If there is no μ ∈ P_se(X) with μ⊗ν ∈ P_se(X×Y), then we let C_e(ν) = 0.
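For a discrete memoryless channel the corresponding one-shot capacity sup_μ I(μ;ν) can be computed by the Arimoto–Blahut alternating iteration. The sketch below is not part of the text; the channel matrix is an illustrative assumption and entropies are in nats:

```python
import math

def blahut_arimoto(W, iters=200):
    # W[x][y]: transition probabilities of a discrete memoryless channel
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                        # start from the uniform input law
    for _ in range(iters):
        out = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp( sum_y W(y|x) log( W(y|x) / out(y) ) )
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / out[y])
                          for y in range(ny) if W[x][y] > 0)) for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        p = [p[x] * c[x] / z for x in range(nx)]   # multiplicative update of the input
    return math.log(z), p                       # capacity (nats) and optimal input

C, p = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(C, p)   # BSC(0.1): C = log 2 - H(0.1) ≈ 0.368 nat, p = [0.5, 0.5]
```

For the symmetric matrix above the uniform input is already optimal, so the iteration converges immediately; for asymmetric channels it converges monotonically to the capacity.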
Theorem 8. Let ν ∈ C_s(X,Y) be a stationary channel satisfying (c2').
(1) C_s(ν) = sup{ ℜ(μ;ν) : μ ∈ P_se(X) }.
(2) If ν is ergodic, then C_s(ν) = C_e(ν).
(3) If ν is ergodic and ℜ(· ;ν) is weak* upper semicontinuous on P_s(X) ⊂ C(X)*, then there exists an ergodic source μ* ∈ P_se(X) such that C_s(ν) = C_e(ν) = ℜ(μ*;ν).

Proof. (1) By Proposition 6 we see that for any μ ∈ P_s(X)

ℜ(μ;ν) = ∫_R ℜ(μ_r;ν) μ(dr) ≤ sup_{r∈R} ℜ(μ_r;ν) ≤ sup{ ℜ(μ;ν) : μ ∈ P_se(X) }.

Taking the supremum on the LHS over P_s(X) gives the conclusion.
(2) is obvious.
(3) Since ℜ(· ;ν) is weak* upper semicontinuous and the set P_s(X) is weak* compact and convex, there exists at least one μ_0 ∈ P_s(X) at which ℜ(· ;ν) attains its supremum, i.e., ℜ(μ_0;ν) = C_s(ν). Let P_1 be the set of all such μ_0. Since ℜ(· ;ν) is weak* upper semicontinuous and affine on P_s(X), P_1 is weak* compact and convex. By the Krein–Milman Theorem there is at least one extreme point μ* in P_1. Then μ* is also extremal in P_s(X). For, if μ* = αμ_1 + (1−α)μ_2 for α ∈ (0,1) and μ_1, μ_2 ∈ P_s(X), then

C_s(ν) = ℜ(μ*;ν) = αℜ(μ_1;ν) + (1−α)ℜ(μ_2;ν) ≤ C_s(ν),

which implies ℜ(μ_1;ν) = ℜ(μ_2;ν) = C_s(ν) and μ_1, μ_2 ∈ P_1. By the extremality of μ* in P_1 we have μ* = μ_1 = μ_2, so that μ* is extremal in P_s(X). Therefore μ* is ergodic by Theorem II.3.2.
In the rest of the section we discuss the transmission rate and the capacity of a stationary channel in a measure theoretic manner. Let (X,𝔛,S) and (Y,𝔜,T) be a pair of abstract measurable spaces with measurable transformations S and T. A slightly different type of transmission rate is defined as follows.
Definition 9. Let ν ∈ C_s(X,Y) be a stationary channel.
(1) The transmission rate ℜ(μ;ν) of ν w.r.t. μ ∈ P_s(X) is defined by

ℜ(μ;ν) = sup{ H_μ(𝔄,S) + H_{μν}(𝔅,T) − H_{μ⊗ν}(𝔄×𝔅, S×T) : 𝔄 ∈ P(𝔛), 𝔅 ∈ P(𝔜) },
where H_μ(𝔄,S) = H(μ,𝔄,S), etc.
(2) The stationary capacity C_s(ν) and the ergodic capacity C_e(ν) are respectively defined as in Definition 7.

Remark 10. (1) In the above definition, the equality C_s(ν) = C_e(ν) for a stationary ergodic channel ν is not trivial.
(2) In the alphabet message space case we have chosen 𝔄 = 𝔐_1^1(X) and 𝔅 = 𝔐_1^1(Y). Since they are generators in the sense that ∨_{n∈ℤ} S^n𝔄 = 𝔛 and ∨_{n∈ℤ} T^n𝔅 = 𝔜, we obtained

H_μ(𝔄,S) = H_μ(S),  H_{μν}(𝔅,T) = H_{μν}(T),  H_{μ⊗ν}(𝔄×𝔅, S×T) = H_{μ⊗ν}(S×T)

by the Kolmogorov–Sinai Theorem. In the discussion from Definition 2 through Theorem 8 we fixed partitions 𝔄 ∈ P(𝔛_0) and 𝔅 ∈ P(𝔜_0).
(3) Since μ⊗ν ≪ μ × μν and
H_μ(𝔄) + H_{μν}(𝔅) − H_{μ⊗ν}(𝔄×𝔅) = Σ_{A,B} μ⊗ν(A×B) log [ μ⊗ν(A×B) / (μ × μν)(A×B) ],

where the sum is taken for A ∈ 𝔄 and B ∈ 𝔅, it follows from the definition of the relative entropy and Theorem I.6.2 that

ℜ(μ;ν) = ∫_{X×Y} [d(μ⊗ν)/d(μ × μν)] log [d(μ⊗ν)/d(μ × μν)] d(μ × μν),

the relative entropy of μ⊗ν with respect to μ × μν.
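For finite partitions the identity just displayed can be checked numerically: the sum H_μ(𝔄) + H_{μν}(𝔅) − H_{μ⊗ν}(𝔄×𝔅) coincides with the relative entropy of μ⊗ν with respect to μ × μν. A sketch (the 2×2 channel matrix and input law are illustrative assumptions):

```python
import math

mu = [0.3, 0.7]                        # illustrative input law on a 2-set partition
nu = [[0.8, 0.2], [0.25, 0.75]]        # illustrative channel matrix nu(A_i, B_j)

joint = [[mu[i] * nu[i][j] for j in range(2)] for i in range(2)]   # mu (x) nu
out = [sum(joint[i][j] for i in range(2)) for j in range(2)]       # mu nu

def H(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

lhs = H(mu) + H(out) - H([p for row in joint for p in row])
rhs = sum(joint[i][j] * math.log(joint[i][j] / (mu[i] * out[j]))
          for i in range(2) for j in range(2) if joint[i][j] > 0)
print(lhs, rhs)    # equal: both are the relative entropy of mu (x) nu w.r.t. mu x mu nu
```

The equality is exact (up to floating-point error), and both sides are nonnegative, as (6.9) requires.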
Note that this quantity is defined for all channels and input sources.

Proposition 11. If P_s(X) is complete for ergodicity (cf. Definition 2.8) and ν ∈ C_se(X,Y) is a stationary ergodic channel, then the equality C_s(ν) = C_e(ν) holds.

Proof. We only have to prove C_s(ν) ≤ C_e(ν), assuming C_s(ν) < ∞. For an arbitrary ε > 0 choose a stationary source μ ∈ P_s(X) and finite partitions 𝔄 ∈ P(𝔛) and 𝔅 ∈ P(𝔜) such that

H_μ(𝔄,S) + H_{μν}(𝔅,T) − H_{μ⊗ν}(𝔄×𝔅, S×T) > C_s(ν) − ε.   (6.10)
Then by Theorem II.7.7 we can find transformation invariant bounded measurable functions h_1(x), h_2(y) and h_3(x,y) such that

H_μ(𝔄,S) = ∫_X h_1(x) μ(dx),
H_{μν}(𝔅,T) = ∫_Y h_2(y) μν(dy),
H_{μ⊗ν}(𝔄×𝔅, S×T) = ∫_{X×Y} h_3(x,y) μ⊗ν(dx,dy).

As in the proof of Proposition 3, if we define t(x) by (6.8), then

LHS of (6.10) = ∫_X t(x) μ(dx).

Since t(·) is S-invariant there exists a sequence {t_n(·)} of S-invariant simple functions such that t_n ↑ t on X. For each n ≥ 1 write t_n as

t_n = Σ_{k=1}^{k_n} α_{n,k} 1_{A_{n,k}},

where {A_{n,1}, ..., A_{n,k_n}} ∈ P(𝔛) with S^{−1}A_{n,k} = A_{n,k} for 1 ≤ k ≤ k_n. It follows from the Monotone Convergence Theorem that

∫_X t_n(x) μ(dx) → ∫_X t(x) μ(dx).

Hence by (6.10) we see that

∫_X t_{n_0}(x) μ(dx) = Σ_{k=1}^{k_{n_0}} α_{n_0,k} μ(A_{n_0,k}) > C_s(ν) − ε

for some n_0 ≥ 1. Consequently one has for some k_0

α_{n_0,k_0} > C_s(ν) − ε  and  μ(A_{n_0,k_0}) > 0.

Since P_s(X) is complete for ergodicity, there exists some μ* ∈ P_se(X) such that μ*(A_{n_0,k_0}) = 1. Thus

ℜ(μ*;ν) ≥ ∫_X t(x) μ*(dx) ≥ ∫_X t_{n_0}(x) μ*(dx) = α_{n_0,k_0} > C_s(ν) − ε.

By the ergodicity of ν, μ*⊗ν is ergodic, so that C_e(ν) ≥ ℜ(μ*;ν) > C_s(ν) − ε. Since ε > 0 was arbitrary, we obtain C_e(ν) ≥ C_s(ν), completing the proof.
3.7. Coding theorems

In this section, Shannon's first and second coding theorems are formulated and proved. Feinstein's fundamental lemma is also proved; it is used to establish the above mentioned theorems. We use the alphabet message space setting: X = X_0^ℤ and Y = Y_0^ℤ for finite sets X_0 and Y_0 with the shifts S and T, respectively. A channel ν ∈ C(X,Y) is said to be nonanticipatory if
(c16) ν(x,[y_n = b]) = ν(x',[y_n = b]) for every n ∈ ℤ, b ∈ Y_0, and x = (x_k), x' = (x'_k) ∈ X with x_k = x'_k (k ≤ n).
Recall the notations 𝔐_i^n(X) and ν(A,D): 𝔐_i^n(X) is the set of all messages of length n starting at time i in X, and ν(A,D) is the common value ν(x,D) for x ∈ A, where A ∈ 𝔛 and D ∈ 𝔜. Then Feinstein's fundamental lemma is formulated as follows:
Lemma 1 (Feinstein's fundamental lemma). Let ν ∈ C_s(X,Y) be a stationary, m-memory, m-dependent and nonanticipatory channel with the ergodic capacity C = C_e. For any ε > 0 there exist positive integers n = n(ε) and N = N(ε), messages U_1, ..., U_N ∈ 𝔐_{−m}^{m+n}(X) and measurable sets V_1, ..., V_N ∈ A(𝔐_0^n(Y)), the algebra generated by 𝔐_0^n(Y), such that
(1) V_i ∩ V_j = ∅ (i ≠ j);
(2) ν(U_i, V_i) > 1 − ε, 1 ≤ i ≤ N;
(3) N > e^{n(C−ε)}.

Proof. By the definition of C_e there exists an ergodic source μ ∈ P_se(X) such that

ℜ(μ;ν) > C − ε/2.

In fact, we may assume ℜ(μ;ν) = C in view of Theorem 6.1 (2). Since ν is stationary and ergodic by Theorem 3.4, μν and μ⊗ν are also stationary and ergodic. For n ≥ 1 we denote

A = [x_{−m} x_{−m+1} ··· x_{n−1}] ∈ 𝔐_{−m}^{m+n}(X),   (7.1)
D = [y_0 y_1 ··· y_{n−1}] ∈ 𝔐_0^n(Y).   (7.2)

Then, by the SMB Theorem (Theorem II.5.1),

−(1/n) log μ(A),  −(1/n) log μν(D),  −(1/n) log μ⊗ν(A×D)

converge to H_μ(S), H_{μν}(T) and H_{μ⊗ν}(S×T) a.e. and hence in probability, respectively. Hence
(1/n) log [ν(A,D)/μν(D)] = (1/n) log [μ⊗ν(A×D) / (μ(A)μν(D))]

tends to H_μ(S) + H_{μν}(T) − H_{μ⊗ν}(S×T) = ℜ(μ;ν) in probability (μ⊗ν). Thus we can choose a large enough n = n(ε) ≥ 1 for which

μ⊗ν([ (1/n) log (ν(A,D)/μν(D)) > C − ε/2 ]) > 1 − ε/2,   (7.3)

where [···] indicates the set of all (x,y) ∈ X×Y such that x ∈ A and y ∈ D for some pair (A,D) satisfying the stated inequality. For each A ∈ 𝔐_{−m}^{m+n}(X) let

V_A = ∪{ D ∈ 𝔐_0^n(Y) : (1/n) log (ν(A,D)/μν(D)) > C − ε/2 }.
Then we get

LHS of (7.3) = μ⊗ν( ∪_A (A × V_A) ) = Σ_A μ⊗ν(A × V_A) = Σ_A μ(A)ν(A, V_A) > 1 − ε/2.   (7.4)

If A and D are such that (1/n) log (ν(A,D)/μν(D)) > C − ε/2, then

ν(A,D) > μν(D) e^{n(C−ε/2)}.   (7.5)
Now choose a U_1 ∈ 𝔐_{−m}^{m+n}(X) such that ν(U_1, V_{U_1}) > 1 − ε, which is possible in view of (7.4), and let V_1 = V_{U_1}. If there is a U_2 ∈ 𝔐_{−m}^{m+n}(X) such that ν(U_2, V_{U_2} − V_1) > 1 − ε, then we let V_2 = V_{U_2} − V_1. If, moreover, there is a U_3 ∈ 𝔐_{−m}^{m+n}(X) such that ν(U_3, V_{U_3} − (V_1 ∪ V_2)) > 1 − ε, then we let V_3 = V_{U_3} − (V_1 ∪ V_2). This procedure terminates in a finite number of steps, and we write the sets so obtained as U_1, ..., U_N and V_1, ..., V_N.
It follows from the construction of these sets that (1) and (2) are satisfied. Furthermore, we have for A ∈ 𝔐_{−m}^{m+n}(X)

ν(A, V_A − ∪_{j=1}^N V_j) ≤ 1 − ε,

since otherwise the procedure would not have terminated, which implies

ν(A, V_A) ≤ ν(A, V_A ∩ ∪_{j=1}^N V_j) + ν(A, V_A − ∪_{j=1}^N V_j) ≤ ν(A, ∪_{j=1}^N V_j) + 1 − ε.

Taking the average over 𝔐_{−m}^{m+n}(X) w.r.t. μ, we get

Σ_A μ(A)ν(A, V_A) ≤ μν( ∪_{j=1}^N V_j ) + 1 − ε,

which, together with (7.4), yields that

μν( ∪_{j=1}^N V_j ) > 1 − ε/2 − (1 − ε) = ε/2.   (7.6)

Since V_j ⊂ V_{U_j} (1 ≤ j ≤ N), we have by (7.5) that

μν(V_j) ≤ μν(V_{U_j}) < e^{−n(C−ε/2)} ν(U_j, V_{U_j}) ≤ e^{−n(C−ε/2)}.

Thus, invoking (7.6), we see that

ε/2 < μν( ∪_{j=1}^N V_j ) ≤ Σ_{j=1}^N μν(V_j) < N e^{−n(C−ε/2)},

so that N > (ε/2) e^{n(C−ε/2)} ≥ e^{n(C−ε)} if we choose n = n(ε) ≥ 1 such that

n(ε) ≥ (2/ε) log (2/ε).

Thus (3) holds.
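The greedy selection used in the proof — accept an input message whenever its high-likelihood output set, stripped of outputs already claimed, still carries channel probability above 1 − ε — can be mimicked for a one-shot finite channel. Everything below (the matrix W, the threshold ε, and the choice of V_x as the outputs where x beats the uniform-input output average) is an illustrative assumption, not the construction of the text:

```python
# greedy Feinstein-type code construction for a one-shot finite channel
W = [
    [0.94, 0.02, 0.02, 0.02],   # W[x][y]: illustrative channel matrix
    [0.02, 0.94, 0.02, 0.02],
    [0.02, 0.02, 0.94, 0.02],
    [0.02, 0.02, 0.02, 0.94],
]
eps = 0.1

codewords, decode_sets, claimed = [], [], set()
for x, row in enumerate(W):
    # V_x: outputs where x is likelier than the uniform-input output average
    out = [sum(r[y] for r in W) / len(W) for y in range(len(row))]
    V_x = {y for y, p in enumerate(row) if p > out[y]}
    V = V_x - claimed                       # discard outputs already assigned
    if sum(row[y] for y in V) > 1 - eps:    # keep x only if nu(U, V) > 1 - eps
        codewords.append(x)
        decode_sets.append(V)
        claimed |= V

print(codewords, decode_sets)
```

Here the decoding sets come out pairwise disjoint by construction, mirroring conclusion (1) of the lemma, and each kept input satisfies the analogue of (2).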
An immediate consequence of the above lemma is:

Corollary 2. Let ν ∈ C_s(X,Y) be a stationary, nonanticipatory, m-memory and m-dependent channel with the ergodic capacity C = C_e. Then, for any sequence {ε_n}_{n=m+1}^∞ such that ε_n ↓ 0, there exist a sequence of positive integers {N = N(n)}_{n=m+1}^∞, a family of finite sets of messages {U_1^{(n)}, ..., U_N^{(n)}} ⊂ 𝔐_{−m}^{m+n}(X) (n ≥ m+1) and a family of finite sets of measurable sets {V_1^{(n)}, ..., V_N^{(n)}} ⊂ A(𝔐_0^n(Y)) (n ≥ m+1) such that for n ≥ m+1
(1) U_i^{(n)} ∩ U_j^{(n)} = ∅ (i ≠ j);
(2) ν(U_i^{(n)}, V_i^{(n)}) > 1 − ε_n, 1 ≤ i ≤ N;
(3) N > e^{n(C−ε_n)}.
To formulate Shannon's coding theorems, we consider another alphabet X'_0 = {a'_1, ..., a'_ℓ} and the alphabet message space (X', 𝔛', S'), where X' = X'_0^ℤ and S' is the shift on X'. A code is a one-to-one measurable mapping φ : X' → X. If we let

ν_φ(x', A) = 1_A(φ(x')),  x' ∈ X', A ∈ 𝔛,

then [X', ν_φ, X] is called a noiseless channel. This means that, if ν_φ(x',A) = 1, then we know that x' ∈ X' is encoded as φ(x') ∈ A. When φ carries each message of length r in X' into a message of length r in X, φ is called a block code, where r ≥ 1 is an integer, called the code length. If we define

ν_φ(x', D) = ν(φ(x'), D),  x' ∈ X', D ∈ 𝔜,

then we have a new channel [X', ν∘φ, Y] or ν_φ ∈ C(X', Y), which is called the induced channel. Suppose that ν ∈ C_s(X,Y) is a stationary, nonanticipatory, m-memory and m-dependent channel, and
let A'_1, A'_2, ..., A'_{ℓ^{m+n}} ∈ 𝔐_{−m}^{m+n}(X') be an enumeration of the ℓ^{m+n} messages of length m+n in X', with [X',μ'] a given input source.   (7.7)

Consider messages D_1, D_2, ..., D_{q^n} ∈ 𝔐_0^n(Y) of length n. For each k (1 ≤ k ≤ q^n) choose a subscript i_k (1 ≤ i_k ≤ ℓ^{m+n}) for which

μ'⊗ν_φ(A'_{i_k} × D_k) = max_{1≤i≤ℓ^{m+n}} μ'⊗ν_φ(A'_i × D_k),   (7.8)

and let

E_n = ∪_{k=1}^{q^n} (A'_{i_k} × D_k).   (7.9)
When, for each k, the message D_k is received at the output Y, it is natural to consider that the message A'_{i_k} was sent; that is, we decode D_k as A'_{i_k}. Then μ'⊗ν_φ(E_n) indicates the probability of decoding without error. The following theorem asserts that under the above mentioned conditions we can find a block code of sufficiently large length for which the decoding error probability is arbitrarily close to zero.
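In a one-shot finite analogue, decoding each received D_k as the message maximizing the joint probability is maximum-likelihood decoding, and μ'⊗ν_φ(E_n) is exactly its success probability. A sketch (the prior and the matrix are illustrative assumptions):

```python
# success probability of maximum-likelihood decoding,
# a one-shot analogue of mu' (x) nu_phi (E_n)
prior = [0.5, 0.3, 0.2]            # illustrative input probabilities mu'(A_i)
W = [[0.8, 0.1, 0.1],              # illustrative channel matrix nu_phi(A_i, D_k)
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]

# decode D_k as the A_i maximizing the joint probability prior[i] * W[i][k]
success = sum(max(prior[i] * W[i][k] for i in range(3)) for k in range(3))
print(success)   # 0.76 here: for each output, the best joint mass is kept
```

Summing the columnwise maxima of the joint matrix corresponds to choosing the subscript i_k in (7.8) for every received message.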
Theorem 3 (Shannon's first coding theorem). Let ν ∈ C_s(X,Y) be a stationary, nonanticipatory, m-memory and m-dependent channel with the ergodic capacity C = C_e and [X',μ'] be a stationary ergodic source with entropy H_0 = H(μ') < C. Then, for any ε (0 < ε < 1) there exists a positive integer n_0 = n_0(ε) such that for any n ≥ n_0 there exists an n-block code φ : X' → X with μ'⊗ν_φ(E_n) > 1 − ε, where E_n is defined by (7.9).

Proof. Since [X',μ'] is stationary ergodic, the SMB Theorem implies that

−(1/n) log μ'([x'_{−m} ··· x'_{n−1}]) → H_0

in probability as n → ∞, and that for any given ε (0 < ε < 1) there exists a positive integer n_1 = n_1(ε) such that

μ'([ −(1/n_1) log μ'([x'_{−m} ··· x'_{n_1−1}]) ≤ H_0 + ε/2 ]) ≥ 1 − ε/2.   (7.10)

Let A'_1, ..., A'_{N_1} ∈ 𝔐_{−m}^{m+n_1}(X') be those messages for which

−(1/n_1) log μ'(A'_k) ≤ H_0 + ε/2,  1 ≤ k ≤ N_1.

Then, this and (7.10) imply that

Σ_{k=1}^{N_1} μ'(A'_k) ≥ 1 − ε/2,   (7.11)
μ'(A'_k) ≥ e^{−n_1(H_0+ε/2)},  1 ≤ k ≤ N_1,

from which we see that

1 ≥ Σ_{k=1}^{N_1} μ'(A'_k) ≥ N_1 e^{−n_1(H_0+ε/2)}  or  N_1 ≤ e^{n_1(H_0+ε/2)}.   (7.12)
Since [X,ν,Y] is stationary, nonanticipatory, m-memory and m-dependent, we can apply Feinstein's fundamental lemma to see that there exist positive integers n_2 = n_2(ε) and N_2 = N_2(ε), messages U_1, ..., U_{N_2} ∈ 𝔐_{−m}^{m+n_2}(X) and measurable sets V_1, ..., V_{N_2} ∈ A(𝔐_0^{n_2}(Y)) such that
(i) V_i ∩ V_j = ∅ (i ≠ j);
(ii) ν(U_i, V_i) > 1 − ε/2, 1 ≤ i ≤ N_2;
(iii) N_2 > e^{n_2(C−ε/2)}.
Let n_0 = n_0(ε) = max{n_1, n_2}. We may suppose that C > H_0 + ε, so that C − ε/2 > H_0 + ε/2. Thus (7.12) and (iii) imply that N_1 < N_2. Let n ≥ n_0 be arbitrary and define a function φ_0 on the messages of 𝔐_{−m}^{m+n}(X') by

φ_0(A'_k) = U_k,  1 ≤ k ≤ N_1,
φ_0(A'_k) = U_{N_1},  N_1 + 1 ≤ k ≤ ℓ^{m+n},

where the A'_k satisfy (7.7), and let φ : X' → X be an n-block code induced by φ_0. Then,

μ'⊗ν_φ(A'_k × D_j) = ν(U_k, D_j) μ'(A'_k),  1 ≤ k ≤ N_1,

where D_j ∈ 𝔐_0^n(Y) (1 ≤ j ≤ q^n) and ν(U_k, D_j) is the common value of ν(x, D_j) for x ∈ U_k, since ν has m-memory and since x' ∈ A'_k implies φ(x') ∈ U_k. In particular,

μ'⊗ν_φ(A'_k × V_k) = ν(U_k, V_k) μ'(A'_k) > (1 − ε/2) μ'(A'_k),  1 ≤ k ≤ N_1.   (7.13)

Consequently, for E_n defined by (7.9), it holds that

μ'⊗ν_φ(E_n) = μ'⊗ν_φ( ∪_{k=1}^{q^n} (A'_{i_k} × D_k) )
            ≥ Σ_{j=1}^{N_1} Σ_{D_k⊂V_j} μ'⊗ν_φ(A'_{i_k} × D_k)
            ≥ Σ_{j=1}^{N_1} Σ_{D_k⊂V_j} μ'⊗ν_φ(A'_j × D_k),  by (7.8),
            = Σ_{j=1}^{N_1} μ'⊗ν_φ(A'_j × V_j)
            > Σ_{j=1}^{N_1} (1 − ε/2) μ'(A'_j),  by (7.13),
            ≥ (1 − ε/2)^2,  by (7.11),
            > 1 − ε.
The first coding theorem states that the error probability of decoding can be made arbitrarily small. On the other hand, the second coding theorem asserts that, in addition, the transmission rate of the associated channel can be close to the entropy of the input, so that efficiency of the information transmission is also guaranteed.
Theorem 4 (Shannon's second coding theorem). Let [X',μ'] be a stationary ergodic source with the entropy H_0 = H(μ') and [X,ν,Y] be a stationary, nonanticipatory, m-memory and m-dependent channel with the ergodic capacity C = C_e. If H_0 < C and 0 < ε < 1, then there exists an n-block code φ : X' → X such that

μ'⊗ν_φ(E_n) > 1 − ε  and  ℜ(μ';ν_φ) > H_0 − ε,

where E_n is defined by (7.9).

Proof. For each n ≥ 1 let

ε'_n = inf{ ε > 0 : μ'([ −(1/n) log μ'([x'_{−m} ··· x'_{n−1}]) < H_0 + ε ]) ≥ 1 − ε },

so that

μ'([ −(1/n) log μ'([x'_{−m} ··· x'_{n−1}]) ≤ H_0 + ε'_n ]) ≥ 1 − ε'_n.   (7.14)

It follows from the SMB Theorem that

−(1/n) log μ'([x'_{−m} ··· x'_{n−1}]) → H_0 in probability,

and hence ε'_n → 0 as n → ∞. For each n ≥ 1 let A'_{n,1}, ..., A'_{n,N_1} ∈ 𝔐_{−m}^{m+n}(X') be such that

−(1/n) log μ'(A'_{n,k}) ≤ H_0 + ε'_n,  1 ≤ k ≤ N_1.   (7.15)

Note that

Σ_{k=1}^{N_1} μ'(A'_{n,k}) ≥ 1 − ε'_n,
μ'(A'_{n,k}) ≥ e^{−n(H_0+ε'_n)},  1 ≤ k ≤ N_1,

by (7.14) and (7.15), respectively. It follows from the above pair of inequalities that

1 ≥ Σ_{k=1}^{N_1} μ'(A'_{n,k}) ≥ N_1 e^{−n(H_0+ε'_n)}  or  N_1 ≤ e^{n(H_0+ε'_n)}.   (7.16)
By Corollary 2 we can find a sequence {ε''_n}_{n=m+1}^∞ of positive numbers, a sequence {N_2 = N_2(n)}_{n=m+1}^∞ of positive integers, a family of finite sets of messages {U_1^{(n)}, ..., U_{N_2}^{(n)}} ⊂ 𝔐_{−m}^{m+n}(X) (n ≥ m+1) and a family of finite sets of measurable sets {V_1^{(n)}, ..., V_{N_2}^{(n)}} ⊂ A(𝔐_0^n(Y)) (n ≥ m+1) such that
(i) U_i^{(n)} ∩ U_j^{(n)} = ∅ (i ≠ j);
(ii) ν(U_i^{(n)}, V_i^{(n)}) > 1 − ε''_n, 1 ≤ i ≤ N_2;
(iii) N_2 > e^{n(C−ε''_n)};
(iv) lim_{n→∞} ε''_n = 0.
Since lim_{n→∞} ε''_n = 0, we have lim_{n→∞} (ε'_n + ε''_n) = 0, so that there exists an n_0 ≥ 1 such that ε'_n + ε''_n < C − H_0 for n ≥ n_0. Let n ≥ n_0. Then by (7.16) and (iii) we see that N_1 < N_2. Thus we can define a function φ_0 on the messages of 𝔐_{−m}^{m+n}(X') by

φ_0(A'_{n,k}) = U_k^{(n)},  1 ≤ k ≤ N_1,
φ_0(A'_{n,k}) = U_{N_1}^{(n)},  N_1 + 1 ≤ k ≤ ℓ^{m+n},

and hence an (m+n)-block code φ : X' → X, so that for a large enough n it holds that μ'⊗ν_φ(E_n) > 1 − ε. To prove ℜ(μ';ν_φ) > H_0 − ε, it suffices to show that

lim_{n→∞} (1/n)[ H_{μ'}(𝔐_{−m}^{m+n}(X')) + H_{μ'ν_φ}(𝔐_0^n(Y)) − H_{μ'⊗ν_φ}(𝔐_{−m}^{m+n}(X') × 𝔐_0^n(Y)) ] ≥ H_0,

or, equivalently,

lim_{n→∞} (1/n)[ H_{μ'⊗ν_φ}(𝔐_{−m}^{m+n}(X') × 𝔐_0^n(Y)) − H_{μ'ν_φ}(𝔐_0^n(Y)) ] = 0,

since lim_{n→∞} (1/n) H_{μ'}(𝔐_{−m}^{m+n}(X')) = H(μ') = H_0. Let

ξ_{D_{n,k}}(A'_{n,i}) = μ'⊗ν_φ(A'_{n,i} × D_{n,k}) / μ'ν_φ(D_{n,k}),  1 ≤ i ≤ ℓ^{m+n}, 1 ≤ k ≤ q^n.
Then we have that for each k

Σ_{i=1}^{ℓ^{m+n}} ξ_{D_{n,k}}(A'_{n,i}) log ξ_{D_{n,k}}(A'_{n,i})
  = ξ_{D_{n,k}}(A'_{n,i_k}) log ξ_{D_{n,k}}(A'_{n,i_k}) + Σ_{i=1, i≠i_k}^{ℓ^{m+n}} ξ_{D_{n,k}}(A'_{n,i}) log ξ_{D_{n,k}}(A'_{n,i})
  ≥ ξ_{D_{n,k}}(A'_{n,i_k}) log ξ_{D_{n,k}}(A'_{n,i_k})
    + (1 − ξ_{D_{n,k}}(A'_{n,i_k})) log(1 − ξ_{D_{n,k}}(A'_{n,i_k}))
    − (1 − ξ_{D_{n,k}}(A'_{n,i_k})) log(ℓ^{m+n} − 1),  since x log x is convex on [0,1],
  ≥ −log 2 − (1 − ξ_{D_{n,k}}(A'_{n,i_k})) log(ℓ^{m+n} − 1).

Now if we take the average over k (1 ≤ k ≤ q^n) w.r.t. the measure μ'ν_φ, then

−Σ_{k=1}^{q^n} μ'ν_φ(D_{n,k}) Σ_{i=1}^{ℓ^{m+n}} ξ_{D_{n,k}}(A'_{n,i}) log ξ_{D_{n,k}}(A'_{n,i})
  ≤ log 2 + (1 − μ'⊗ν_φ(E_n)) log(ℓ^{m+n} − 1)
  ≤ log 2 + (ε'_n + ε''_n)(m+n) log ℓ.

On the other hand, it is easy to verify that

−Σ_{k=1}^{q^n} μ'ν_φ(D_{n,k}) Σ_{i=1}^{ℓ^{m+n}} ξ_{D_{n,k}}(A'_{n,i}) log ξ_{D_{n,k}}(A'_{n,i})
  = −Σ_{i,k} μ'⊗ν_φ(A'_{n,i} × D_{n,k}) log μ'⊗ν_φ(A'_{n,i} × D_{n,k})
    + Σ_k μ'ν_φ(D_{n,k}) log μ'ν_φ(D_{n,k})
  = H_{μ'⊗ν_φ}(𝔐_{−m}^{m+n}(X') × 𝔐_0^n(Y)) − H_{μ'ν_φ}(𝔐_0^n(Y)).

Therefore,

lim_{n→∞} (1/n)[ H_{μ'⊗ν_φ}(𝔐_{−m}^{m+n}(X') × 𝔐_0^n(Y)) − H_{μ'ν_φ}(𝔐_0^n(Y)) ]
  ≤ lim_{n→∞} (1/n)[ log 2 + (ε'_n + ε''_n)(m+n) log ℓ ] = 0,

as desired.
Bibliographical notes

3.1. Information channels. The present style of formulation of channels (Definition 1.1) is due to McMillan [1](1953), Khintchine [2](1956) and Feinstein [2](1958). Proposition 1.2 is proved by McMillan [1]. Proposition 1.3 is due to Fontana, Gray and Kieffer [1](1981). Proposition 1.4 is obtained in Yi [2](1964). Proposition 1.5 is in Umegaki and Ohya [1](1983). Proposition 1.6 is proved by Umegaki [7](1974).
3.2. Channel operators. Echigo (Choda) and M. Nakamura studied information channels in an operator algebra setting in Echigo and Nakamura [1](1962) and Choda and Nakamura [1, 2](1970, 1971), where a channel is identified with an operator between two C*-algebras. Umegaki [6](1969) established a one-to-one correspondence between channels and a certain type of averaging operators on some function spaces, which led to a characterization of ergodic channels. Theorems 2.1, 2.2 and Propositions 2.7, 2.9 are due to Umegaki [6]. Regarding completeness for ergodicity, Y. Nakamura [1](1969) gave a sufficient condition. Proposition 2.10 is proved by Y. Nakamura [1] and Umegaki [6] independently. Proposition 2.11 is obtained by Umegaki [5](1964).
3.3. Mixing channels. Proposition 3.2 is noted by Feinstein [2]. Theorem 3.4 together with Definition 3.3 is due to Takano [1](1958). Strong and weak mixing properties of channels were introduced by Adler [1](1961), where he proved Theorem 3.7. Definition 3.8 through Theorem 3.14 are formulated and proved by Nakamura [2](1970).
3.4. Ergodic channels. A characterization of stationary ergodic channels was first obtained by Yi [1, 2] in 1964 (Theorem 4.2 (2), (6)). For some years it was overlooked. Umegaki [6] and Nakamura [1] independently obtained some equivalent conditions for ergodicity of a stationary channel. Umegaki used a functional analytic approach to this characterization, while Nakamura applied a measure theoretic consideration. Theorem 4.2 (3), (4), (5) and (7) are due to Nakamura [1]. Theorem 4.4 and Corollary 4.5 (2) (7) are obtained by Umegaki [6]. Ergodicity of Markov channels is considered in Gray, Dunham and Gobbi [1](1987).
3.5. AMS channels. Jacobs [1](1959) defined "almost periodic channels" together with "almost periodic sources" in the alphabet message space setting, which are special cases of the AMS channels and AMS sources defined by Fontana, Gray and Kieffer [1] and Gray and Kieffer [1](1980). Almost periodic channels and sources are treated in essentially the same way as AMS ones. Subsequently Jacobs showed some
rigorous results regarding almost periodic channels in his papers [2, 6](1960, 1967). Lemma 5.2 through Theorem 5.8 are mainly due to Fontana, Gray and Kieffer [1]. In Theorem 5.8, (6) is noted in Kakihara [4](1991). Kieffer and Rahe [1](1981) showed that a Markov channel between one-sided alphabet message spaces is AMS. Lemma 5.11 is given by Ding and Yi [1](1965) for an almost periodic channel. In Theorem 5.12, (2) is observed by Kieffer and Rahe [1], (3) is due to Fontana, Gray and Kieffer [1], (4) and (5) are given by Kakihara [5], and (6) and (7) by Ding and Yi [1] for an almost periodic channel. Theorem 5.13 is obtained in Kakihara [5].
3.6. Capacity and transmission rate. The equality Ce(ν) = Cs(ν) for a stationary channel has been one of the important problems since Khintchine [2](1956) (cf. Theorem 6.1). Carleson [1](1958) and Tsaregradskii [1](1958) proved Ce = Cs for finite memory channels. Feinstein [3](1959) and Breiman [2](1960) showed Ce = Cs for finite memory and finitely dependent channels. In particular, Breiman [2] proved that for such a channel Ce is attained by some stationary ergodic source, where he used the Krein–Milman Theorem. So parts (i) and (iii) of Theorem 6.1 (2) are due to Breiman [2]. Parthasarathy [1](1961) also showed Ce = Cs using integral representations of entropy and transmission rate functionals, invoking the ergodic decomposition of stationary sources. According to his proof, Ce = Cs holds for stationary ergodic channels (Theorem 6.1 (3)). For channels between compact and totally disconnected spaces Umegaki [5] considered transmission rates and capacities. Proposition 6.3 through Theorem 6.8 are due to Umegaki [5]. Nakamura [1] formulated transmission rates and capacities in an abstract measurable space setting and proved Proposition 6.11. For a discrete memoryless channel an iterative method to compute the capacity was obtained by Arimoto [1](1972), Blahut [1](1972) and Jimbo and Kunisawa [1](1979). Various types of channel capacity have been formulated; we refer to Kieffer [2](1974), Nedoma [1, 3](1957, 1963), Winkelbauer [1](1960) and Zhi [1](1965).
3.7. Coding theorems. Feinstein [1](1954) proved Lemma 7.1. The proof here is due to Takano [1]. Shannon [1](1948) stated his coding theorems (Theorems 7.3 and 7.4). Subsequently, their rigorous proofs were given by Khintchine [2], Feinstein [2] and Dobrushin [1](1963). Here we followed Takano's method (see Takano [1]). The converse of the coding theorems was obtained by many authors; see e.g. Feinstein [3] and Wolfowitz [1](1978).
Gray and Ornstein [1](1979) (see also Gray, Neuhoff and Shields [1](1975)) give a good review of coding theorems. Related topics can be seen in Kieffer [2], Nedoma [1, 2](1957, 1963) and Winkelbauer [1].
CHAPTER IV
SPECIAL TOPICS
In this chapter, special topics on information channels are considered. First, ergodicity and capacity of an integration channel are studied, where an integration channel is determined by a certain mapping and a stationary noise source. A channel can be regarded as a vector valued function on the input, i.e., a measure valued function. Then strong and weak measurabilities are considered together with dominating measures. A metric topology is introduced in the set of channels, and completeness w.r.t. this metric is examined for various types of channels. When a channel is strongly measurable, it can be approximated by a channel of Hilbert–Schmidt type in the metric topology. Harmonic analysis is applied when the output is a locally compact abelian group. In this case a channel induces a family of unitary representations and a family of positive linear functionals on the L¹-group algebra. The Fourier transform is shown to be a unitary operator between certain Hilbert spaces induced by a channel and an input source. Finally, noncommutative channels are considered as an extension of channel operators between function spaces, which are commutative algebras. Here the function spaces are replaced by certain (noncommutative) operator algebras.
4.1. Channels with a noise source

In this section we consider integration channels determined by a mapping and a noise source. Ergodicity and capacity of this type of channels are studied.
Let (X,𝔛,S) and (Y,𝔜,T) be a pair of abstract measurable spaces with measurable transformations, which are the input and the output of our communication system as before. We consider another measurable space (Z,𝔷,U) with a measurable transformation U on it and a measurable mapping ψ : X×Z → Y satisfying that
(n1) ψ(Sx,Uz) = Tψ(x,z) for x ∈ X and z ∈ Z.
Choose any U-invariant probability measure ζ ∈ P_s(Z), called a noise source, and define a mapping ν : X×𝔜 → [0,1] by

ν(x,C) = ∫_Z 1_C(ψ(x,z)) ζ(dz),  x ∈ X, C ∈ 𝔜.   (1.1)

Since ψ is measurable, it is easily seen that ν is a channel, i.e., ν ∈ C(X,Y). Moreover, it is stationary since for x ∈ X and C ∈ 𝔜

ν(x,T^{-1}C) = ∫_Z 1_{T^{-1}C}(ψ(x,z)) ζ(dz),  by (1.1),
             = ∫_Z 1_C(Tψ(x,z)) ζ(dz)
             = ∫_Z 1_C(ψ(Sx,Uz)) ζ(dz),  by (n1),
             = ∫_Z 1_C(ψ(Sx,z)) ζ(dz),  since ζ is stationary,
             = ν(Sx,C).

The channel ν defined by (1.1) is called an integration channel determined by the pair (ψ,ζ) and is sometimes denoted by ν_{ψ,ζ}. We shall consider ergodicity of this type of channels, where the mapping ψ and the noise source ζ are fixed.
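The defining integral (1.1) can be illustrated for a single coordinate of the additive-noise case treated in Example 3 below: with ψ(x,z) = x + z (mod p+q), one has ν(x,C) = ζ({z : x + z ∈ C}). The alphabet sizes and the noise law in this sketch are illustrative assumptions:

```python
# one-coordinate slice of an additive-noise integration channel:
# psi(x, z) = x + z (mod p+q)
p, q = 2, 3                 # X0 = {0,...,p-1}, Z0 = {0,...,q-1}, Y0 = {0,...,p+q-2}
zeta = [0.5, 0.3, 0.2]      # illustrative noise law on Z0

def nu(x, C):
    # nu(x, C) = zeta({z : psi(x, z) in C}), the integral (1.1) as a finite sum
    return sum(zeta[z] for z in range(q) if (x + z) % (p + q) in C)

row = [nu(1, {y}) for y in range(p + q - 1)]
print(row)    # the channel row for input symbol x = 1; entries sum to 1
```

Each input symbol thus yields a shifted copy of the noise law, which is the defining feature of an additive-noise channel.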
Proposition 1. Suppose that μ×ζ ∈ P_se(X×Z) for every μ ∈ P_se(X). Then the integration channel ν = ν_{ψ,ζ} is ergodic, i.e., ν ∈ C_se(X,Y).

Proof. Assume that E ∈ 𝔛⊗𝔜 is S×T-invariant. Then T^{-1}E_{Sx} = E_x for x ∈ X by (III.4.1). Letting f(x,z) = 1_{E_x}(ψ(x,z)) for (x,z) ∈ X×Z, we note that f is S×U-invariant since for (x,z) ∈ X×Z

f(Sx,Uz) = 1_{E_{Sx}}(ψ(Sx,Uz)) = 1_{E_{Sx}}(Tψ(x,z)) = 1_{T^{-1}E_{Sx}}(ψ(x,z)) = 1_{E_x}(ψ(x,z)) = f(x,z).

By assumption μ×ζ is ergodic for any μ ∈ P_se(X), so that μ×ζ({f = 1}) = 0 or 1 and hence ζ({z : ψ(x,z) ∈ E_x}) = 0 or 1 μ-a.e. x. Thus

f = 0 or 1, μ×ζ-a.e.

Consequently, for every μ ∈ P_se(X)

μ⊗ν(E) = ∫_X ν(x,E_x) μ(dx)
       = ∫_X ∫_Z 1_{E_x}(ψ(x,z)) ζ(dz) μ(dx)
       = ∫_X ∫_Z f(x,z) ζ(dz) μ(dx)
       = ∫_{X×Z} f(x,z) μ×ζ(dx,dz)
       = 0 or 1.
Thus $\mu \otimes \nu$ is ergodic. Therefore $\nu$ is ergodic.

In addition to (n1) we impose the following two conditions on $\psi$:

(n2) $\psi(x,\cdot) : Z \to Y$ is one-to-one for every $x \in X$;

(n3) $\lambda(G) \in \mathfrak{X} \otimes \mathfrak{Y}$ for every $G \in \mathfrak{X} \otimes \mathfrak{Z}$, where the mapping $\lambda : X \times Z \to X \times Y$ is defined by $\lambda(x,z) = (x, \psi(x,z))$ for $(x,z) \in X \times Z$.

Note that if $X, Y, Z$ are complete metric spaces, then (n3) is always true for any Baire measurable mapping $\psi : X \times Z \to Y$. Under the additional assumptions (n2) and (n3) the converse of Proposition 1 is proved as follows.

Proposition 2. Suppose that the mapping $\psi$ satisfies (n1)–(n3). Then the integration channel $\nu = \nu_{\psi,\zeta}$ is ergodic iff $\mu \times \zeta$ is ergodic for every ergodic $\mu \in P_{se}(X)$.

Proof. The "if" part was shown in Proposition 1. To prove the "only if" part, let $\lambda$ be the mapping in (n3). Since $\lambda$ is one-to-one by (n2), we have $\lambda^{-1}\lambda(G) = G$ for $G \in \mathfrak{X} \otimes \mathfrak{Z}$. If $G \in \mathfrak{X} \otimes \mathfrak{Z}$ is $S \times U$-invariant, then
$$\begin{aligned}
\lambda(G) &= \lambda\big((S \times U)^{-1}G\big) = \{\lambda(x,z) : (Sx, Uz) \in G\} = \{(x, \psi(x,z)) : (Sx, Uz) \in G\} \\
&= \{(x,y) : y = \psi(x,z),\ (Sx, Uz) \in G\} \\
&\subseteq \{(x,y) : Ty = \psi(Sx, Uz),\ (Sx, Uz) \in G\} \\
&\subseteq \{(x,y) : (Sx, Ty) \in \lambda(G)\} = (S \times T)^{-1}\lambda(G).
\end{aligned}$$
Now one has that for an ergodic $\mu \in P_{se}(X)$
$$\begin{aligned}
\mu \times \zeta(G) &= \mu \times \zeta\big(\lambda^{-1}\lambda(G)\big) = \int_X \int_Z 1_{\lambda^{-1}\lambda(G)}(x,z)\,\zeta(dz)\,\mu(dx) \\
&= \int_X \int_Z 1_{\lambda(G)}(\lambda(x,z))\,\zeta(dz)\,\mu(dx) = \int_X \int_Z 1_{\lambda(G)}(x, \psi(x,z))\,\zeta(dz)\,\mu(dx) \\
&= \int_X \int_Z 1_{\lambda(G)_x}(\psi(x,z))\,\zeta(dz)\,\mu(dx) = \int_X \nu\big(x, \lambda(G)_x\big)\,\mu(dx) \\
&= \mu \otimes \nu\big(\lambda(G)\big) = 0 \ \text{or}\ 1,
\end{aligned}$$
since $\nu$ is ergodic and hence $\mu \otimes \nu$ is so. Therefore $\mu \times \zeta$ is ergodic.
Example 3. (1) For an integration channel $\nu = \nu_{\psi,\zeta}$ with (n1)–(n3) it holds that, for a stationary $\mu \in P_s(X)$, $\mu \times \zeta$ is ergodic iff $\mu \otimes \nu$ is so.
(2) Suppose that $(X, \mathfrak{X}, S) = (Z, \mathfrak{Z}, U)$. Then an integration channel $\nu = \nu_{\psi,\zeta}$ with (n1)–(n3) is ergodic iff $\zeta$ is WM (weakly mixing).
(3) Suppose that $(X, \mathfrak{X}) = (Y, \mathfrak{Y}) = (Z, \mathfrak{Z})$ is a measurable group with a group operation $\cdot$ commuting with $S = T = U$. Let $y = \psi(x,z) = x \cdot z$ for $x, z \in X$. Then the integration channel $\nu = \nu_{\psi,\zeta}$ determined by $(\psi, \zeta)$ is ergodic iff $\zeta$ is WM.
(4) Consider a special case where $X = X_0^{\mathbb{Z}}$, $Y = Y_0^{\mathbb{Z}}$ and $Z = Z_0^{\mathbb{Z}}$ with $X_0 = \{0,1,2,\dots,p-1\}$, $Y_0 = \{0,1,2,\dots,p+q-2\}$ and $Z_0 = \{0,1,2,\dots,q-1\}$. Define $\psi(x,z)_i = x_i + z_i \pmod{p+q}$ for $i \in \mathbb{Z}$, where $x = (x_i)$, $z = (z_i)$ and $\psi(x,z)_i$ is the $i$th coordinate. Then the integration channel $\nu = \nu_{\psi,\zeta}$ is called a channel of additive noise. In this case, $\nu$ is ergodic iff $\mu \times \zeta$ is ergodic for every $\mu \in P_{se}(X)$.

The following proposition characterizes integration channels.
Proposition 4. Assume that $\psi : X \times Z \to Y$ is a measurable mapping satisfying (n1)–(n3). Then a stationary channel $\nu \in \mathcal{C}_s(X, Y)$ is an integration channel determined by $(\psi, \zeta)$ for some noise source $\zeta \in P_s(Z)$ iff
(1) $\nu(x, \psi(x, Z)) = 1$ for $x \in X$;
(2) $\nu(x, \psi(x, W)) = \nu(x', \psi(x', W))$ for $x, x' \in X$ and $W \in \mathfrak{Z}$.

Proof. Under conditions (n1)–(n3), for any $x \in X$ and $W \in \mathfrak{Z}$ the set $\psi(x, W)$ is measurable, i.e., $\psi(x, W) = \{\psi(x,z) : z \in W\} \in \mathfrak{Y}$. For, $\lambda(X \times W) \in \mathfrak{X} \otimes \mathfrak{Y}$ by (n3) and hence $\psi(x, W) = \lambda(X \times W)_x \in \mathfrak{Y}$.

Suppose $\nu = \nu_{\psi,\zeta}$. Then for $x \in X$ and $W \in \mathfrak{Z}$
$$\nu(x, \psi(x, W)) = \int_Z 1_{\psi(x,W)}(\psi(x,z))\,\zeta(dz) = \int_Z 1_W(z)\,\zeta(dz), \quad \text{by (n2)}, \quad = \zeta(W),$$
which is independent of $x$. Thus (1) and (2) immediately follow.

Conversely assume that (1) and (2) are true. Then $\zeta(\cdot) \equiv \nu(x, \psi(x, \cdot))$ is independent of $x$. Note that $\zeta$ is a $U$-invariant probability measure on $\mathfrak{Z}$, i.e., $\zeta \in P_s(Z)$, by (c1), (n2) and (1). Now for $C \in \mathfrak{Y}$ we have that
$$\begin{aligned}
\nu(x, C) &= \nu\big(x, C \cap \psi(x, Z)\big) \\
&= \nu\big(x, \psi_x\psi_x^{-1}(C) \cap \psi_x(Z)\big), && \text{where } \psi_x(\cdot) = \psi(x, \cdot), \\
&= \nu\big(x, \psi_x(\psi_x^{-1}(C))\big) \\
&= \zeta\big(\psi_x^{-1}(C)\big), && \text{by (1)}, \\
&= \int_Z 1_{\psi_x^{-1}(C)}(z)\,\zeta(dz) = \int_Z 1_C(\psi(x,z))\,\zeta(dz),
\end{aligned}$$
which implies that $\nu$ is an integration channel.

Next we consider capacities of integration channels in the setting of alphabet message spaces. So assume that $X = X_0^{\mathbb{Z}}$, $Y = Y_0^{\mathbb{Z}}$ and $Z = Z_0^{\mathbb{Z}}$ for some finite sets $X_0$, $Y_0$ and $Z_0$, and that $S$, $T$ and $U$ are the shifts on the respective spaces. For a stationary channel $\nu \in \mathcal{C}_s(X, Y)$ the transmission rate functional $\mathfrak{R}(\cdot\,;\nu)$ is given by
$$\mathfrak{R}(\mu;\nu) = H_\mu(S) + H_{\mu\nu}(T) - H_{\mu\otimes\nu}(S \times T), \qquad \mu \in P_s(X),$$
as in Section 3.6. Also the stationary capacity $C_s(\nu)$ and the ergodic capacity $C_e(\nu)$ are defined there. We now define an integration channel as follows. Let $m \ge 0$ be a nonnegative integer and $\psi_0 : X_0^{m+1} \times Z_0 \to Y_0$ be a mapping such that $\psi_0(a_0, a_1, \dots, a_m, \cdot) : Z_0 \to Y_0$ is one-to-one for each $(a_0, a_1, \dots, a_m) \in X_0^{m+1}$.
Define a mapping $\tilde\psi : X \times Z \to Y$ by
$$\tilde\psi(x,z)_i = \psi_0(x_{i-m}, x_{i-m+1}, \dots, x_i, z_i), \qquad i \in \mathbb{Z},$$
where $x = (x_i) \in X$, $z = (z_i) \in Z$ and $\tilde\psi(x,z)_i$ is the $i$th coordinate of $\tilde\psi(x,z) \in Y$. Evidently $\tilde\psi$ is measurable, and $\tilde\psi$ satisfies (n1) since
$$\tilde\psi(Sx, Uz)_i = \psi_0\big((Sx)_{i-m}, (Sx)_{i-m+1}, \dots, (Sx)_i, (Uz)_i\big) = \psi_0(x_{i-m+1}, x_{i-m+2}, \dots, x_{i+1}, z_{i+1}) = \tilde\psi(x,z)_{i+1}
for each $i \in \mathbb{Z}$. Moreover, $\tilde\psi$ enjoys the properties (n2) and (n3). Taking any noise source $\zeta \in P_s(Z)$, we can define an integration channel $\nu = \nu_{\tilde\psi,\zeta}$. We note that $\nu$ has a finite memory, or $m$-memory (cf. (c5) in Section 3.1): for any message $[y_i \cdots y_j]$ $(i \le j)$ in $Y$,
$$\nu(x, [y_i \cdots y_j]) = \nu(x', [y_i \cdots y_j]) \quad \text{if } x = (x_k),\ x' = (x_k') \text{ satisfy } x_k = x_k'\ (i-m \le k \le j). \tag{1.2}$$
Theorem 5. The transmission rate functional $\mathfrak{R}(\cdot\,;\nu)$ of an integration channel $\nu = \nu_{\tilde\psi,\zeta}$ is given by
$$\mathfrak{R}(\mu;\nu) = H_{\mu\nu}(T) - H_\zeta(U), \qquad \mu \in P_s(X).$$
Proof. Let $\mu \in P_s(X)$. Since $\nu$ has $m$-memory (cf. (1.2)), we see that
$$\begin{aligned}
H_{\mu\otimes\nu}(S\times T) &= -\lim_{n\to\infty}\frac1n\sum \mu\otimes\nu\big([(x_1,y_1)\cdots(x_n,y_n)]\big)\log \mu\otimes\nu\big([(x_1,y_1)\cdots(x_n,y_n)]\big) \\
&= -\lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\ \sum_{y_1,\dots,y_n} \mu\otimes\nu\big([x_{1-m}\cdots x_n]\times[y_1\cdots y_n]\big)\log \mu\otimes\nu\big([x_{1-m}\cdots x_n]\times[y_1\cdots y_n]\big) \\
&= -\lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\ \sum_{y_1,\dots,y_n} \mu([x_{1-m}\cdots x_n])\,\nu(x,[y_1\cdots y_n])\log\big\{\mu([x_{1-m}\cdots x_n])\,\nu(x,[y_1\cdots y_n])\big\},
\end{aligned}$$
where $x = (x_k) \in [x_{1-m}\cdots x_n]$. Hence we have that
$$\begin{aligned}
\mathfrak{R}(\mu;\nu) &= H_\mu(S) + H_{\mu\nu}(T) - H_{\mu\otimes\nu}(S\times T) \\
&= H_{\mu\nu}(T) - \lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\mu([x_{1-m}\cdots x_n])\log\mu([x_{1-m}\cdots x_n]) \\
&\qquad + \lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\ \sum_{y_1,\dots,y_n}\mu([x_{1-m}\cdots x_n])\,\nu(x,[y_1\cdots y_n])\log\big\{\mu([x_{1-m}\cdots x_n])\,\nu(x,[y_1\cdots y_n])\big\} \\
&= H_{\mu\nu}(T) + \lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\mu([x_{1-m}\cdots x_n])\sum_{y_1,\dots,y_n}\nu(x,[y_1\cdots y_n])\log\nu(x,[y_1\cdots y_n]),
\end{aligned}$$
since $\sum_{y_1,\dots,y_n}\nu(x,[y_1\cdots y_n]) = 1$. Now it holds that
$$\nu(x,[y_1\cdots y_n]) = \int_Z 1_{[y_1\cdots y_n]}(\tilde\psi(x,z))\,\zeta(dz) = \zeta\big(M_1(x_{1-m},\dots,x_1,y_1)\cap\cdots\cap M_n(x_{n-m},\dots,x_n,y_n)\big),$$
where for $i = 1,\dots,n$
$$M_i(a_0,a_1,\dots,a_m,b) = \{z = (z_k) \in Z : \psi_0(a_0,\dots,a_m,z_i) = b\} \subseteq Z,$$
since $\tilde\psi(x,z) = (\psi_0(x_{i-m},\dots,x_i,z_i))_{i\in\mathbb{Z}} \in [y_1\cdots y_n]$ iff $z \in M_i(x_{i-m},\dots,x_i,y_i)$ for $1 \le i \le n$. Let $Y_1 = \{\psi_0(x_{i-m},\dots,x_i,d) : d \in Z_0\} \subseteq Y_0$. Then for any $y_i \in Y_1$ there is a unique message $[z_i] \in \mathfrak{M}_i^1(Z)$ such that $M_i(x_{i-m},\dots,x_i,y_i) = [z_i]$, where $\mathfrak{M}_i^1(Z)$ denotes the set of all messages of length 1 starting at time $i$ in $Z$; if $y_i \in Y_0 - Y_1$, then $M_i(x_{i-m},\dots,x_i,y_i) = \emptyset$. Therefore,
$$\begin{aligned}
\mathfrak{R}(\mu;\nu) &= H_{\mu\nu}(T) + \lim_{n\to\infty}\frac1n\sum_{x_{1-m},\dots,x_n}\mu([x_{1-m}\cdots x_n])\sum_{y_1,\dots,y_n}\nu(x,[y_1\cdots y_n])\log\nu(x,[y_1\cdots y_n]) \\
&= H_{\mu\nu}(T) - \lim_{n\to\infty}\frac1n H_\zeta\big(\mathfrak{M}_1^1(Z)\vee\cdots\vee\mathfrak{M}_n^1(Z)\big) \\
&= H_{\mu\nu}(T) - H_\zeta(U).
\end{aligned}$$

Proposition 6. For the integration channel $\nu = \nu_{\tilde\psi,\zeta}$ there exists a stationary ergodic input source $\mu^* \in P_{se}(X)$ such that $C_s(\nu) = \mathfrak{R}(\mu^*;\nu)$.
Proof. $H_\mu(S)$ is a weak* upper semicontinuous function of $\mu$ by Lemma II.7.3. Since $\nu$ has a finite memory, it is continuous, i.e., it satisfies (c5') or (c5''). Continuity of a channel implies that $H_{\mu\nu}(T)$ is a weak* upper semicontinuous function of $\mu$ by Proposition III.2.11. Thus $\mathfrak{R}(\cdot\,;\nu)$ is weak* upper semicontinuous on $P_s(X)$. The assertion follows from Theorem III.6.8 (3).

Let us consider a special case where $X_0 = Y_0 = Z_0$ is a finite group. Let $\psi_1(x,z) = x \cdot z = (x_i \cdot z_i)_{i\in\mathbb{Z}}$. The channel $\nu$ determined by $\psi_1$ and $\zeta \in P_s(Z)$ is called a channel of product noise.
Theorem 7. For a channel $\nu = \nu_{\psi_1,\zeta}$ of product noise it holds that
$$C_s(\nu) = \log|X_0| - H_\zeta(U).$$
Proof. Let $p = |X_0|$ and consider the $(\frac1p,\dots,\frac1p)$-Bernoulli source $\mu_0$ on $X = X_0^{\mathbb{Z}}$. Then we see that for $n \ge 1$
$$\begin{aligned}
\mu_0\nu([y_1\cdots y_n]) &= \int_X \nu(x,[y_1\cdots y_n])\,\mu_0(dx) = \sum_{x_1,\dots,x_n}\int_{[x_1\cdots x_n]}\nu(x,[y_1\cdots y_n])\,\mu_0(dx) \\
&= \sum_{x_1,\dots,x_n}\nu(x,[y_1\cdots y_n])\,\mu_0([x_1\cdots x_n]), \qquad x = (x_k), \\
&= \frac1{p^n}\sum_{x_1,\dots,x_n}\nu(x,[y_1\cdots y_n]), \qquad \text{since } \mu_0([x_1\cdots x_n]) = \frac1{p^n}, \\
&= \frac1{p^n}\sum_{x_1,\dots,x_n}\int_Z 1_{[y_1\cdots y_n]}(\psi_1(x,z))\,\zeta(dz) \\
&= \frac1{p^n}\sum_{x_1,\dots,x_n}\zeta\big(\big[(x_1^{-1}\cdot y_1)\cdots(x_n^{-1}\cdot y_n)\big]\big) = \frac1{p^n},
\end{aligned}$$
since $\{x_i^{-1}\cdot y_i : x_i \in X_0\} = X_0$ for each $i$, so that the blocks $(x_1^{-1}\cdot y_1,\dots,x_n^{-1}\cdot y_n)$ run over all of $X_0^n$ and the $\zeta$-masses sum to 1. Hence $\mu_0\nu$ is again the $(\frac1p,\dots,\frac1p)$-Bernoulli source, and $H_{\mu_0\nu}(T) = \log|X_0| = \log p$. Therefore, by Theorem 5,
$$C_s(\nu) = \sup_{\mu\in P_s(X)}\big[H_{\mu\nu}(T) - H_\zeta(U)\big] \le \log p - H_\zeta(U) = \mathfrak{R}(\mu_0;\nu) \le C_s(\nu),$$
so that $\mathfrak{R}(\mu_0;\nu) = C_s(\nu) = \log|X_0| - H_\zeta(U)$.
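Theorem 7 can be checked numerically in the i.i.d.-noise special case, where $H_\zeta(U)$ reduces to the entropy of the one-letter noise distribution and the uniform Bernoulli input attains the supremum (an assumed setup for illustration, Python):

```python
# Numerical illustration of Theorem 7 for the group X0 = Z/pZ with i.i.d. noise:
# the capacity is log|X0| - H(zeta0), attained by the uniform Bernoulli input.
# (i.i.d. noise is an assumed special case, so H_zeta(U) is a one-letter entropy.)
from math import log

p = 4
zeta0 = [0.7, 0.1, 0.1, 0.1]          # one-letter noise distribution on Z/4Z

def H(dist):
    return -sum(t * log(t) for t in dist if t > 0)

# One-letter mutual information I(X; Y) for Y = X + Z (mod p), X uniform:
mu0 = [1.0 / p] * p
joint = [[mu0[x] * zeta0[(y - x) % p] for y in range(p)] for x in range(p)]
out = [sum(joint[x][y] for x in range(p)) for y in range(p)]
I = H(out) - H(zeta0)                 # H(Y) - H(Y|X), since H(Y|X) = H(zeta0)

assert all(abs(o - 1.0 / p) < 1e-12 for o in out)   # output is again uniform
assert abs(I - (log(p) - H(zeta0))) < 1e-12          # matches log|X0| - H(zeta)
```

The uniformity of `out` mirrors the proof's observation that $\mu_0\nu$ is again the uniform Bernoulli source.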
4.2. Measurability of a channel

In this section we regard a channel as a measure-valued function on the input space and consider its measurability. Let $(X, \mathfrak{X}, S)$ and $(Y, \mathfrak{Y}, T)$ be a pair of abstract measurable spaces with measurable transformations $S$ and $T$, respectively. As before, $M(\Omega)$ denotes the space of all $\mathbb{C}$-valued measures on $\Omega$ and $B(\Omega)$ the space of all $\mathbb{C}$-valued bounded measurable functions on $\Omega$, where $\Omega = X$, $Y$ or $X \times Y$. First we need notions of measurability of Banach space valued functions, which are slightly different from those in Bochner integration theory (cf. Hille and Phillips [1] and Diestel and Uhl [1]).
Definition 1. Let $\mathcal{E}$ be a Banach space with dual space $\mathcal{E}^*$, where the duality pair is denoted by $\langle\phi,\phi^*\rangle$ for $\phi \in \mathcal{E}$ and $\phi^* \in \mathcal{E}^*$, and consider a function $\varphi : X \to \mathcal{E}$. $\varphi$ is said to be finitely $\mathcal{E}$-valued, or an $\mathcal{E}$-valued simple function, if
$$\varphi(x) = \sum_{k=1}^n 1_{A_k}(x)\,\phi_k, \qquad x \in X,$$
for some partition $\{A_k\}_{k=1}^n \subseteq \mathfrak{X}$ of $X$ and some $\{\phi_k\}_{k=1}^n \subseteq \mathcal{E}$. $\varphi$ is said to be strongly measurable if there exists a sequence $\{\varphi_n\}_{n=1}^\infty$ of $\mathcal{E}$-valued simple functions on $X$ such that
$$\|\varphi_n(x) - \varphi(x)\|_{\mathcal{E}} \to 0, \qquad x \in X,$$
where $\|\cdot\|_{\mathcal{E}}$ is the norm in $\mathcal{E}$, and weakly measurable if the scalar function $\phi^*(\varphi(\cdot)) = \langle\varphi(\cdot),\phi^*\rangle$ is measurable for every $\phi^* \in \mathcal{E}^*$.

Although the above definition is neither identical nor equivalent to the one in Bochner integration theory, one can show that $\varphi : X \to \mathcal{E}$ is strongly measurable iff $\varphi$ is weakly measurable and has a separable range. The usual definition of measurability in Bochner integration theory is as follows: if $m$ is a finite positive measure on $(X,\mathfrak{X})$, then a function $\varphi : X \to \mathcal{E}$ is said to be strongly measurable if there is a sequence $\{\varphi_n\}$ of $\mathcal{E}$-valued simple functions such that
$$\|\varphi_n(x) - \varphi(x)\|_{\mathcal{E}} \to 0 \qquad m\text{-a.e. } x.$$
Note that strong (or weak) measurability on $(X,\mathfrak{X})$ implies that on $(X,\mathfrak{X},m)$ for any finite measure $m$.

Take a channel $\nu \in \mathcal{C}(X,Y)$. Then by condition (c1) we can regard $\nu$ as an $M(Y)$-valued function $\tilde\nu$ on $X$:
$$\tilde\nu(x) = \nu(x,\cdot) \in M(Y), \qquad x \in X. \tag{2.1}$$
We want to consider strong and weak measurabilities of $\tilde\nu$ in the following.

Definition 2. Let $\nu \in \mathcal{C}(X,Y)$ be a channel and $\tilde\nu : X \to M(Y)$ be defined by (2.1). Then $\nu$ is said to be strongly measurable if
(c17) $\tilde\nu$ is strongly measurable on $(X,\mathfrak{X})$,
and weakly measurable if
(c18) $\tilde\nu$ is weakly measurable on $(X,\mathfrak{X})$.
Clearly (c17) implies (c18). Recall that a channel $\nu \in \mathcal{C}(X,Y)$ is said to be dominated if
(c6) there exists some $\eta \in P(Y)$ such that $\nu(x,\cdot) \ll \eta$ for every $x \in X$.
Then (c6) lies between (c17) and (c18), as seen from the following.

Theorem 3. Let $\nu \in \mathcal{C}(X,Y)$. Then (c17) $\Rightarrow$ (c6) $\Rightarrow$ (c18). That is, if $\nu$ is strongly measurable, then $\nu$ is dominated, which in turn implies that $\nu$ is weakly measurable.

Proof. (c17) $\Rightarrow$ (c6). Assume that $\nu$ is strongly measurable. Then the range $\{\nu(x,\cdot) : x \in X\}$ of $\tilde\nu$ is separable in $M(Y)$ and hence has a countable dense subset $\{\nu(x_n,\cdot) : n \ge 1\}$. Let
$$\eta(\cdot) = \sum_{n=1}^\infty \frac{\nu(x_n,\cdot)}{2^n},$$
where the RHS is a well-defined element of $P(Y)$. Suppose $\eta(C) = 0$. Then $\nu(x_n, C) = 0$ for $n \ge 1$ by definition. For any $x \in X$ let $\{x_{n_k}\}_{k=1}^\infty$ be a subsequence of $\{x_n\}$ such that
$$\|\nu(x_{n_k},\cdot) - \nu(x,\cdot)\| \to 0 \quad \text{as } k \to \infty,$$
which is possible by denseness of $\{\nu(x_n,\cdot) : n \ge 1\}$ in $\{\nu(x,\cdot) : x \in X\}$. Hence we see that $\nu(x,C) = \lim_{k\to\infty}\nu(x_{n_k},C) = 0$. This shows that $\nu(x,\cdot) \ll \eta$.
(c6) $\Rightarrow$ (c18). Assume that some $\eta \in P(Y)$ satisfies $\nu(x,\cdot) \ll \eta$ for $x \in X$. The $L^1$-space $L^1(Y,\eta)$ is identified with a subspace of $M(Y)$ by the identification $L^1(Y,\eta) \ni g \equiv \eta_g \in M(Y)$ given by
$$\eta_g(C) = \int_C g(y)\,\eta(dy), \qquad C \in \mathfrak{Y},$$
i.e., $g = \frac{d\eta_g}{d\eta}$, where $\|\eta_g\| = |\eta_g|(Y) = \|g\|_{1,\eta}$. The $L^\infty$-space $L^\infty(Y,\eta)$ is the dual of $L^1(Y,\eta)$ by the identification $L^\infty(Y,\eta) \ni f \equiv f^* \in L^1(Y,\eta)^*$ given by
$$f^*(g) = \int_Y f(y)g(y)\,\eta(dy), \qquad g \in L^1(Y,\eta).$$
Now for $x \in X$ let
$$\nu_x(\cdot) = \frac{\nu(x,dy)}{\eta(dy)}(\cdot) \in L^1(Y,\eta)$$
be the RN-derivative. To show the weak measurability of $\tilde\nu$ it suffices to prove that of the function $X \ni x \mapsto \nu_x \in L^1(Y,\eta)$. For $f \in L^\infty(Y,\eta)$ choose a sequence $\{f_n\}_{n=1}^\infty$ of bounded simple functions on $Y$ such that $f_n \to f$ $\eta$-a.e. For each $n \ge 1$,
$$f_n^*(\nu_x) = \int_Y f_n(y)\,\nu_x(y)\,\eta(dy) \equiv f_n^*(x), \quad \text{say},$$
is a function of $x$ converging to $f^*(\nu_x) \equiv f^*(x)$, say, for all $x \in X$ by the Bounded Convergence Theorem. If we let $f_n = \sum_{k=1}^{k_n}\alpha_{n,k}1_{C_{n,k}}$ for $n \ge 1$, then we see that for $n \ge 1$
$$f_n^*(x) = \sum_{k=1}^{k_n}\alpha_{n,k}\,\nu(x, C_{n,k}),$$
which implies that $f_n^*(\cdot)$ is measurable. Hence $f^*(\cdot)$ is also measurable. Since $f \in L^\infty(Y,\eta)$ is arbitrary, the function $x \mapsto \nu_x$ is weakly measurable. This completes the proof.

When the measurable space is separable, the implication (c6) $\Rightarrow$ (c17) holds, i.e., under the assumption of separability of the measurable space, (c6) and (c17) are equivalent, as will be shown in the following.
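The dominating measure built in the proof of Theorem 3, $\eta = \sum_n 2^{-n}\nu(x_n,\cdot)$, can be illustrated in a finite toy setting (hypothetical data):

```python
# Theorem 3's dominating measure in a finite toy setting: eta = sum_n 2^{-n} nu(x_n, .)
# dominates every row of the channel. Names and data are illustrative only.
rows = {            # nu(x, .) for three inputs over Y = {0, 1, 2}
    'a': [0.5, 0.5, 0.0],
    'b': [1.0, 0.0, 0.0],
    'c': [0.0, 1.0, 0.0],
}
xs = list(rows)     # here the "countable dense subset" is simply all rows
eta = [sum(rows[x][y] / 2 ** (n + 1) for n, x in enumerate(xs)) for y in range(3)]
eta = [t / sum(eta) for t in eta]     # renormalize so eta is a probability

# Domination: eta({y}) = 0 forces nu(x, {y}) = 0 for every x.
for y in range(3):
    if eta[y] == 0.0:
        assert all(rows[x][y] == 0.0 for x in rows)

# RN-derivatives k(x, y) = nu(x, {y}) / eta({y}) exist on the set {eta > 0}:
k = {x: [rows[x][y] / eta[y] for y in range(3) if eta[y] > 0] for x in rows}
```

Here the output letter 2 has $\eta$-measure zero precisely because no row charges it, which is the content of the domination argument.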
Corollary 4. Let $\nu \in \mathcal{C}(X,Y)$ and assume that $\mathfrak{Y}$ has a countable generator. Then (c6) implies (c17). That is, every dominated channel is strongly measurable.
Proof. Suppose that (c6) holds with $\eta \in P(Y)$. Since $\mathfrak{Y}$ has a countable generator, $L^1(Y,\eta)$ is separable. Using the notations of the proof of Theorem 3, we only have to prove the strong measurability of the function $X \ni x \mapsto \nu_x \in L^1(Y,\eta)$. But the weak measurability of that function was shown in the proof of Theorem 3, and $L^1(Y,\eta)$ is separable. Therefore $x \mapsto \nu_x$ is strongly measurable.

Assume that a channel $\nu \in \mathcal{C}(X,Y)$ is dominated: $\nu(x,\cdot) \ll \eta$ $(x \in X)$ for some $\eta \in P(Y)$. Let
$$k(x,y) = \frac{\nu(x,dy)}{\eta(dy)}, \qquad (x,y) \in X \times Y,$$
be the RN-derivative. We want to consider joint measurability of this function $k$. To this end we shall use tensor product Banach spaces, which we briefly review; we refer to Schatten [1] and Diestel and Uhl [1]. Let $\mu \in P(X)$ and let $\mathcal{E}$ be a Banach space. $L^1(X;\mathcal{E})$ denotes the Banach space of all $\mathcal{E}$-valued strongly measurable functions on $X$ which are Bochner integrable w.r.t. $\mu$, where the norm $\|\cdot\|_{1,\mu}$ is defined by
$$\|\varphi\|_{1,\mu} = \int_X \|\varphi(x)\|_{\mathcal{E}}\,\mu(dx).$$
The algebraic tensor product $L^1(X) \otimes \mathcal{E}$ consists of functions of the form
$$\varphi = \sum_{k=1}^n f_k \otimes \phi_k, \qquad f_k \in L^1(X),\ \phi_k \in \mathcal{E},$$
which are identified with $\mathcal{E}$-valued functions
$$\varphi(x) = \sum_{k=1}^n f_k(x)\,\phi_k, \qquad x \in X.$$
The greatest crossnorm $\gamma(\cdot)$ is defined on $L^1(X) \otimes \mathcal{E}$ by
$$\gamma(\varphi) = \inf\Big\{\sum_{k=1}^n \|f_k\|_1\|\phi_k\|_{\mathcal{E}} : \varphi = \sum_{k=1}^n f_k \otimes \phi_k\Big\},$$
which is equal to $\|\varphi\|_{1,\mu}$. Thus the completion of $L^1(X) \otimes \mathcal{E}$ w.r.t. the greatest crossnorm $\gamma$, denoted $L^1(X) \otimes_\gamma \mathcal{E}$, is identified with $L^1(X;\mathcal{E})$. $L^1(X) \otimes_\gamma \mathcal{E}$ is called the projective tensor product of $L^1(X)$ and $\mathcal{E}$. If $\mathcal{E} = M(Y)$ or $\mathcal{E} = L^1(Y,\eta)$ for some $\eta \in P(Y)$, then
$$L^1(X; M(Y)) = L^1(X) \otimes_\gamma M(Y),$$
$$L^1(X; L^1(Y)) = L^1(X) \otimes_\gamma L^1(Y) = L^1(X \times Y) = L^1(X \times Y, \mu \times \eta).$$
Theorem 5. Assume that a channel $\nu \in \mathcal{C}(X,Y)$ is strongly measurable, and hence dominated by some source $\eta \in P(Y)$. Let $\mu \in P(X)$ be arbitrarily fixed. Then the RN-derivative $k(x,y) = \frac{\nu(x,dy)}{\eta(dy)}$ is jointly measurable on the product measure space $(X \times Y, \mathfrak{X} \otimes \mathfrak{Y}, \mu \times \eta)$.

Proof. We consider the tensor product space $L^1(X,\mu) \otimes_\gamma \mathcal{E}$ for $\mathcal{E} = M(Y)$ or $L^1(Y,\eta)$. Since $\nu$ is strongly measurable, $\tilde\nu(\cdot)$ is an $M(Y)$-valued strongly measurable function on the measure space $(X,\mathfrak{X},\mu)$ and, moreover, it is Bochner integrable w.r.t. $\mu$, i.e., $\tilde\nu(\cdot) \in L^1(X; M(Y))$. Then there is a sequence $\{\varphi_n\} \subseteq L^1(X) \otimes M(Y)$ with $\|\varphi_n - \tilde\nu\|_{1,\mu} \to 0$, where
$$\varphi_n = \sum_{j=1}^{j_n} f_{n,j} \otimes \xi_{n,j}, \qquad n \ge 1.$$
Since $L^1(Y) = L^1(Y,\eta)$ is identified with a closed subspace of $M(Y)$ as in the proof of Theorem 3, we can regard $L^1(X) \otimes L^1(Y)$ and $L^1(X) \otimes_\gamma L^1(Y)$ as subspaces of $L^1(X) \otimes M(Y)$ and $L^1(X) \otimes_\gamma M(Y)$, respectively. In fact, since $\nu(x,\cdot) \ll \eta$ for every $x \in X$, the following identification can be made:
$$\tilde\nu \equiv \frac{\nu(\cdot,dy)}{\eta(dy)} = k(\cdot,\cdot) \in L^1(X) \otimes_\gamma L^1(Y) = L^1(X \times Y, \mu \times \eta).$$
This means $k \in L^1(X \times Y)$ and $k$ is jointly measurable.
Remark 6. In Theorem 5, if a measure $\xi \in P(X \times Y)$ is such that $\xi \ll \mu \times \eta$, then $k(x,y) = \frac{\nu(x,dy)}{\eta(dy)}$ is in $L^1(X \times Y, \xi)$ and hence is jointly measurable. In particular, if $\xi = \mu \otimes \nu$, then $k \in L^1(X \times Y, \mu \otimes \nu)$ and $k$ is jointly measurable, since $\mu \otimes \nu \ll \mu \times \eta$.
=
(X, X). Assume that a channel
C(X, X) is strongly measurable, so that
v(x) = v(x, .) ~ n,
xEX
for SOllie 'rJ E P(X). Recall that if'rJ is considered as an input source, then the output source 'rJV E P(X) is given by
'TJv(C) =
L
vex, C) 'TJ(dx),
CEX.
Then we have: Proposition 7. Under the above assumption one has: (1) 'rJv ~ 'rJ;
(2) 'TJv(dy) 'rJ(dy)
= ( vex, dy) 'TJ(dx), n-a.e.u E X.
Jx
'rJ(dy)
Proof. (1) is obvious since $\nu(x,\cdot) \ll \eta$ for every $x \in X$.
(2) Note that $k(x,y) = \frac{\nu(x,dy)}{\eta(dy)}$ is jointly measurable on $(X \times X, \mathfrak{X} \otimes \mathfrak{X}, \eta \times \eta)$ by Theorem 5. Applying Fubini's Theorem we get for $C \in \mathfrak{X}$ that
$$\int_C \frac{\eta\nu(dy)}{\eta(dy)}\,\eta(dy) = \eta\nu(C) = \int_X \nu(x,C)\,\eta(dx) = \int_X \int_C k(x,y)\,\eta(dy)\,\eta(dx) = \int_C \Big(\int_X k(x,y)\,\eta(dx)\Big)\eta(dy).$$
This is enough to obtain the conclusion.
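In a finite setting Proposition 7 (2) is just a reordering of a double sum; the following sketch (illustrative names and data) verifies it:

```python
# Proposition 7 (2) in a finite setting: the density of the output source eta*nu
# w.r.t. eta equals the eta-average of the kernel k(x, y) = nu(x,{y}) / eta({y}).
nu = [[0.2, 0.8],     # nu(x, .) for X = Y = {0, 1}; each row is a probability
      [0.6, 0.4]]
eta = [0.5, 0.5]

eta_nu = [sum(nu[x][y] * eta[x] for x in range(2)) for y in range(2)]   # output source
k = [[nu[x][y] / eta[y] for y in range(2)] for x in range(2)]           # RN-derivative

for y in range(2):
    avg_k = sum(k[x][y] * eta[x] for x in range(2))
    assert abs(eta_nu[y] / eta[y] - avg_k) < 1e-12
```

The assertion is exactly the displayed identity with integrals replaced by sums.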
4.3. Approximation of channels

Let $(X, \mathfrak{X}, S)$ and $(Y, \mathfrak{Y}, T)$ be a pair of abstract measurable spaces with measurable transformations. We introduce a metric topology in the space $\mathcal{C}(X,Y)$ of channels from $X$ to $Y$. It will be shown that $\mathcal{C}(X,Y)$, $\mathcal{C}_s(X,Y)$, $\mathcal{C}_{se}(X,Y)$, $\mathcal{C}_a(X,Y)$ and $\mathcal{C}_{ae}(X,Y)$ are complete w.r.t. this topology. When a channel is strongly measurable, a Hilbert-Schmidt type for it is defined. We shall prove that any strongly measurable channel can be approximated by a channel of Hilbert-Schmidt type in the metric topology.
Definition 1. Define $\rho(\cdot,\cdot)$ on $\mathcal{C}(X,Y) \times \mathcal{C}(X,Y)$ by
$$\rho(\nu_1,\nu_2) = \sup_{x\in X}\|\nu_1(x,\cdot) - \nu_2(x,\cdot)\|, \qquad \nu_1,\nu_2 \in \mathcal{C}(X,Y), \tag{3.1}$$
where $\|\cdot\|$ is the total variation norm in $M(Y)$. Clearly $\rho$ is a metric on $\mathcal{C}(X,Y)$.
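For channels on finite spaces the metric (3.1) is computable directly: each $\nu(x,\cdot)$ is a row of a stochastic matrix and the total variation norm is the $\ell^1$-distance of rows. A minimal sketch:

```python
# The metric (3.1) for finite channels: rho(nu1, nu2) = max over rows x of the
# total variation norm ||nu1(x,.) - nu2(x,.)|| = sum_y |nu1(x,y) - nu2(x,y)|.
def rho(nu1, nu2):
    return max(sum(abs(p - q) for p, q in zip(r1, r2)) for r1, r2 in zip(nu1, nu2))

nu1 = [[1.0, 0.0], [0.0, 1.0]]   # noiseless binary channel
nu2 = [[0.9, 0.1], [0.2, 0.8]]   # a noisy perturbation of it

assert rho(nu1, nu1) == 0.0
assert abs(rho(nu1, nu2) - 0.4) < 1e-12     # worst row is x = 1: 0.2 + 0.2
unif = [[0.5, 0.5], [0.5, 0.5]]
assert rho(nu1, nu2) <= rho(nu1, unif) + rho(unif, nu2)   # triangle inequality
```

Taking the supremum over rows is what makes $\rho$ uniform in the input, matching the definition.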
Recall that each channel $\nu \in \mathcal{C}(X,Y)$ induces a channel operator $K_\nu : B(Y) \to B(X)$ given by
$$(K_\nu g)(x) = \int_Y g(y)\,\nu(x,dy), \qquad g \in B(Y),$$
where $B(X)$ and $B(Y)$ are the spaces of all bounded measurable functions on $X$ and $Y$ as before. An immediate consequence of the above definition is:

Lemma 2. Let $\nu_1,\nu_2 \in \mathcal{C}(X,Y)$ and let $K_{\nu_1}, K_{\nu_2} : B(Y) \to B(X)$ be the corresponding channel operators. Then it holds that
$$\|K_{\nu_1} - K_{\nu_2}\| = \rho(\nu_1,\nu_2).$$
Proof. Observe that
$$\begin{aligned}
\|K_{\nu_1} - K_{\nu_2}\| &= \sup_{\|f\|\le1}\big\|(K_{\nu_1}-K_{\nu_2})f\big\| = \sup_{\|f\|\le1}\,\sup_{x\in X}\Big|\int_Y f(y)\,\{\nu_1(x,dy)-\nu_2(x,dy)\}\Big| \\
&\le \sup_{\|f\|\le1}\,\sup_{x\in X}\int_Y |f(y)|\,|\nu_1(x,\cdot)-\nu_2(x,\cdot)|(dy) \le \rho(\nu_1,\nu_2).
\end{aligned}$$
Conversely, let $\varepsilon > 0$ be given and choose an $x_0 \in X$ such that
$$\|\nu_1(x_0,\cdot) - \nu_2(x_0,\cdot)\| > \rho(\nu_1,\nu_2) - \varepsilon,$$
which is possible by (3.1). Define a functional $\Lambda$ on $B(Y)$ by
$$\Lambda(f) = \int_Y f(y)\,\{\nu_1(x_0,dy) - \nu_2(x_0,dy)\}, \qquad f \in B(Y).$$
Then $\Lambda$ is bounded and linear with norm $\|\nu_1(x_0,\cdot) - \nu_2(x_0,\cdot)\|$. Clearly $\|K_{\nu_1} - K_{\nu_2}\| \ge \|\Lambda\|$, so that $\|K_{\nu_1} - K_{\nu_2}\| > \rho(\nu_1,\nu_2) - \varepsilon$. Therefore $\|K_{\nu_1} - K_{\nu_2}\| \ge \rho(\nu_1,\nu_2)$.

Lemma 3. Let $\nu_n\ (n \ge 1),\ \nu \in \mathcal{C}(X,Y)$ and suppose $\rho(\nu_n,\nu) \to 0$ as $n \to \infty$. Then, for any $\mu \in P(X)$ it holds that
$$\|\mu\nu_n - \mu\nu\| \to 0 \quad\text{and}\quad \|\mu\otimes\nu_n - \mu\otimes\nu\| \to 0.$$
Proof. Let $\mu \in P(X)$. It suffices to show the second convergence. For $n \ge 1$ choose $E_{n,1}, E_{n,2} \in \mathfrak{X} \otimes \mathfrak{Y}$ such that $E_{n,1} \cap E_{n,2} = \emptyset$, $E_{n,1} \cup E_{n,2} = X \times Y$ and
$$\|\mu\otimes\nu_n - \mu\otimes\nu\| = |\mu\otimes(\nu_n-\nu)|(X \times Y) = \mu\otimes(\nu_n-\nu)(E_{n,1}) - \mu\otimes(\nu_n-\nu)(E_{n,2}).$$
Then the RHS of the above is estimated as
$$\begin{aligned}
&\mu\otimes(\nu_n-\nu)(E_{n,1}) - \mu\otimes(\nu_n-\nu)(E_{n,2}) \\
&= \int_X \big\{\nu_n(x,(E_{n,1})_x) - \nu(x,(E_{n,1})_x)\big\}\,\mu(dx) - \int_X \big\{\nu_n(x,(E_{n,2})_x) - \nu(x,(E_{n,2})_x)\big\}\,\mu(dx) \\
&\le \sup_{x\in X}\big|\nu_n(x,(E_{n,1})_x) - \nu(x,(E_{n,1})_x)\big| + \sup_{x\in X}\big|\nu_n(x,(E_{n,2})_x) - \nu(x,(E_{n,2})_x)\big| \\
&\le 2\sup_{x\in X}\|\nu_n(x,\cdot) - \nu(x,\cdot)\| = 2\rho(\nu_n,\nu) \to 0 \quad (n\to\infty),
\end{aligned}$$
where $(E_{n,i})_x$ is the $x$-section of $E_{n,i}$ for $i = 1,2$ and $n \ge 1$. Therefore the conclusion follows.
Proposition 4. Consider the metric $\rho$ on $\mathcal{C}(X,Y)$.
(1) $(\mathcal{C}(X,Y), \rho)$ is a complete metric space.
(2) $(\mathcal{C}_s(X,Y), \rho)$ is a complete metric space.
(3) $(\mathcal{C}_{se}(X,Y), \rho)$ is a complete metric space.
(4) $(\mathcal{C}_a(X,Y), \rho)$ is a complete metric space.
(5) $(\mathcal{C}_{ae}(X,Y), \rho)$ is a complete metric space.
Proof. (1) Let $\{\nu_n\} \subseteq \mathcal{C}(X,Y)$ be a Cauchy sequence, i.e., $\rho(\nu_n,\nu_m) \to 0$ as $n,m \to \infty$. Let $x \in X$ be fixed. Then
$$\|\nu_n(x,\cdot) - \nu_m(x,\cdot)\| \le \rho(\nu_n,\nu_m) \to 0 \quad (n,m\to\infty).$$
Since $\nu_n(x,\cdot) \in P(Y)$ for $n \ge 1$ and $P(Y)$ is norm closed, there is some $\nu(x,\cdot) \in P(Y)$ such that $\|\nu_n(x,\cdot) - \nu(x,\cdot)\| \to 0$. Hence $\nu(\cdot,\cdot)$ satisfies (c1). To see that $\nu(\cdot,\cdot)$ satisfies (c2), let $C \in \mathfrak{Y}$ be arbitrary. Since $\nu_n(\cdot,C)$ is $\mathfrak{X}$-measurable for $n \ge 1$ and
$$\nu_n(x,C) \to \nu(x,C), \qquad x \in X,$$
we see that $\nu(\cdot,C)$ is also $\mathfrak{X}$-measurable. Hence (c2) is satisfied. Thus $\nu$ is a channel. For any $\varepsilon > 0$ there is some $n_0 \ge 1$ such that $\rho(\nu_n,\nu_m) < \varepsilon$ for $n,m \ge n_0$. Letting $m \to \infty$ we obtain $\rho(\nu_n,\nu) \le \varepsilon$ for $n \ge n_0$. This means that $\rho(\nu_n,\nu) \to 0$ as $n \to \infty$. Therefore $(\mathcal{C}(X,Y),\rho)$ is complete.
(2) Let $\{\nu_n\} \subseteq \mathcal{C}_s(X,Y)$ be a Cauchy sequence. Then by (1) there is some $\nu \in \mathcal{C}(X,Y)$ such that $\rho(\nu_n,\nu) \to 0$. To see that $\nu$ is stationary, let $x \in X$ and $C \in \mathfrak{Y}$. It follows that
$$\begin{aligned}
|\nu(Sx,C) - \nu(x,T^{-1}C)| &\le |\nu(Sx,C) - \nu_n(Sx,C)| + |\nu_n(Sx,C) - \nu_n(x,T^{-1}C)| + |\nu_n(x,T^{-1}C) - \nu(x,T^{-1}C)| \\
&\le \|\nu(Sx,\cdot) - \nu_n(Sx,\cdot)\| + \|\nu_n(x,\cdot) - \nu(x,\cdot)\| \\
&\le 2\rho(\nu_n,\nu) \to 0 \quad (n\to\infty).
\end{aligned}$$
Hence $\nu(Sx,C) = \nu(x,T^{-1}C)$. Thus $\nu$ is stationary.
(3) Let $\{\nu_n\} \subseteq \mathcal{C}_{se}(X,Y)$ be a Cauchy sequence. Then there is some stationary $\nu \in \mathcal{C}_s(X,Y)$ such that $\rho(\nu_n,\nu) \to 0$ by (2). To see that $\nu$ is ergodic, let $\mu \in P_{se}(X)$. Since $\mu\otimes\nu_n$ is ergodic for $n \ge 1$, $\|\mu\otimes\nu_n - \mu\otimes\nu\| \to 0$ by Lemma 3, and $P_{se}(X\times Y)$ is norm closed, one has that $\mu\otimes\nu$ is also ergodic. Thus $\nu$ is ergodic.
(4) and (5) can be proved similarly.

Let us now consider channels $[X, \nu, X]$, where the input and the output are identical. Let $\eta \in P(X)$ and consider the Hilbert space $L^2(X,\eta)$. If an operator $L : L^2(X,\eta) \to L^2(X,\eta)$ is defined by
$$(Lf)(x) = \int_X k(x,y)f(y)\,\eta(dy),$$
where $k(\cdot,\cdot)$ is a suitable kernel function on $X \times X$, then $L$ is said to be of Hilbert-Schmidt type if $k \in L^2(X \times X, \eta \times \eta)$. Similarly we have:
Definition 5. Assume that a channel $\nu \in \mathcal{C}(X,X)$ is strongly measurable, so that there is some $\eta \in P(X)$ such that $\nu(x,\cdot) \ll \eta$ for $x \in X$ and
$$k(x,y) = \frac{\nu(x,dy)}{\eta(dy)}$$
is jointly measurable. In this case, $\nu$ is said to be of $(\eta)$ Hilbert-Schmidt type if $k \in L^2(X \times X, \eta \times \eta)$.
Theorem 6. Let $\nu \in \mathcal{C}(X,X)$ be a strongly measurable channel with a dominating measure $\eta \in P(X)$. For any $\varepsilon$ $(0 < \varepsilon < 1)$ there exist a pair of channels $\nu_1,\nu_2 \in \mathcal{C}(X,X)$ and a $\lambda$ $(0 < \lambda < \varepsilon)$ such that
$$\nu = \lambda\nu_1 + (1-\lambda)\nu_2, \tag{3.2}$$
$$k_1(x,y) \equiv \frac{\nu_1(x,dy)}{\eta(dy)} \in L^1(X \times X, \eta \times \eta), \tag{3.3}$$
$$k_2(x,y) \equiv \frac{\nu_2(x,dy)}{\eta(dy)} \in L^2(X \times X, \eta \times \eta). \tag{3.4}$$

Proof. Let $k(x,y) \equiv \frac{\nu(x,dy)}{\eta(dy)}$ for $(x,y) \in X \times X$. Then clearly $k \in L^1(X \times X)$. For a constant $c > 0$ let
$$k^c(x,y) = k(x,y)1_{[k\ge c]}(x,y), \qquad k_c(x,y) = k(x,y)1_{[k<c]}(x,y), \qquad x,y \in X,$$
where $[k \ge c] = \{(x,y) \in X \times X : k(x,y) \ge c\}$. Note that $0 \le k_c \le c$ and $k_c \in L^1$, so that $k_c \in L^2(X \times X, \eta \times \eta)$. Now let $\varepsilon$ $(0 < \varepsilon < 1)$ be given. Choose a constant $c > 0$ for which
$$0 < \|k^c\|_{1,\eta\times\eta} < \varepsilon, \qquad 1 - \varepsilon \le \|k_c\|_{1,\eta\times\eta} < 1.$$
Then define $\nu_1(x,C)$ and $\nu_2(x,C)$ for $x \in X$, $C \in \mathfrak{X}$ from $k^c$ and $k_c$, respectively, and let $\lambda = \|k^c\|_{1,\eta\times\eta}$. It follows that (3.2)–(3.4) hold.
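The truncation used in the proof can be mimicked numerically: split a kernel $k$ into $k^c = k\,1_{[k\ge c]}$ and $k_c = k\,1_{[k<c]}$; the bounded part $k_c$ is automatically square-integrable, and $\lambda$ is the $(\eta\times\eta)$-mass of $k^c$ (toy data, Python):

```python
# Theorem 6's truncation in a discrete toy model: split the kernel k into
# k^c = k*1[k >= c] (small L1 mass) and k_c = k*1[k < c] (bounded, hence in L2).
eta = [0.25, 0.25, 0.25, 0.25]
k = [[4.0, 0.0, 0.0, 0.0],     # a "spiky" kernel: k(x, y) = d nu(x,.)/d eta
     [0.0, 4.0, 0.0, 0.0],
     [1.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]
c = 2.0
k_hi = [[v if v >= c else 0.0 for v in row] for row in k]
k_lo = [[v if v < c else 0.0 for v in row] for row in k]

def mass(kk):   # (eta x eta)-mass of a kernel
    return sum(kk[x][y] * eta[x] * eta[y] for x in range(4) for y in range(4))

lam = mass(k_hi)
assert abs(mass(k_hi) + mass(k_lo) - 1.0) < 1e-12   # k = k^c + k_c, total mass 1
assert max(v for row in k_lo for v in row) < c       # k_c is bounded, so in L^2
```

Here $\lambda = 0.5$; in the theorem one takes $c$ large enough that $\lambda < \varepsilon$, i.e., the spiky part carries arbitrarily little mass.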
Theorem 7. Let $\nu \in \mathcal{C}(X,X)$ be a channel. If $\nu$ is strongly measurable, then:
(c19) there is an $\eta \in P(X)$ such that for any $\varepsilon > 0$ there exists an $(\eta)$ Hilbert-Schmidt channel $\nu_\varepsilon \in \mathcal{C}(X,X)$ for which $\rho(\nu,\nu_\varepsilon) < \varepsilon$.
If $(X,\mathfrak{X})$ is separable, then the converse is true, i.e., (c19) implies the strong measurability of $\nu$ ((c17)).

Proof. Let $\eta \in P(X)$ be a dominating measure for $\nu$. Theorem 6 implies that for any $\varepsilon$ $(0 < \varepsilon < 1)$ there are channels $\nu_1,\nu_2 \in \mathcal{C}(X,X)$ and a $\lambda$ with
$$0 < \lambda < \frac{\varepsilon}{2},$$
where $\nu_2$ is of $(\eta)$ Hilbert-Schmidt type. It follows that
$$\|\nu(x,\cdot) - \nu_2(x,\cdot)\| = \lambda\|\nu_1(x,\cdot) - \nu_2(x,\cdot)\| \le \lambda\big(\|\nu_1(x,\cdot)\| + \|\nu_2(x,\cdot)\|\big) \le 2\lambda < \varepsilon, \qquad x \in X,$$
or $\rho(\nu,\nu_2) < \varepsilon$.
Conversely, assume (c19). Then there exists a sequence $\{\nu_n\} \subseteq \mathcal{C}(X,X)$ of channels of $(\eta)$ Hilbert-Schmidt type such that
$$\rho(\nu,\nu_n) < \frac1n, \qquad n \ge 1.$$
Since $\nu_n(x,\cdot) \ll \eta$ for $x \in X$ and $n \ge 1$, one has $\nu(x,\cdot) \ll \eta$ for $x \in X$, and $\nu$ is weakly measurable by Theorem 2.3, where $\tilde\nu(x) = \nu(x,\cdot)$. Since $(X,\mathfrak{X})$ is separable, $\tilde\nu$ has a separable range in $L^1(X,\eta) \subseteq M(X)$. Thus $\nu$ is strongly measurable.
4.4. Harmonic analysis for channels
We consider channels $[X, \nu, G]$, where $(X,\mathfrak{X})$ is an abstract measurable space and $G$ is an LCA (= locally compact abelian) group with the Baire $\sigma$-algebra $\mathfrak{G}$. It will be shown that a channel is associated with a family of continuous positive definite functions and a family of positive linear functionals on the $L^1$-group algebra. Furthermore, the Fourier transform is studied in connection with the output source of a channel.

There are two good reasons for studying such channels. (1) We can employ methods of harmonic analysis. (2) For any measurable space $(Y,\mathfrak{Y})$ we can construct a compact abelian group $G$ such that $Y \subseteq G$. In fact, let $\Gamma = \{f \in B(Y) : |f(y)| \equiv 1\}$ and consider the discrete topology on $\Gamma$. Then $\Gamma$ is a discrete abelian group under pointwise multiplication. Hence $G = \widehat\Gamma$ is a compact abelian group, and $Y \subseteq G$ by the identification $y \equiv \delta_y$ for $y \in Y$, where $\delta_y(\chi) = \chi(y)$ for $\chi \in \Gamma$ is the point evaluation.

Let us denote by $\widehat G$ the dual group of $G$ and by $(t,\chi) = \chi(t)$ the value of $\chi \in \widehat G$ at $t \in G$. $L^1(\widehat G) = L^1(\widehat G, \widehat m)$ denotes the $L^1$-group algebra of $\widehat G$, where $\widehat m$ is the Haar measure of $\widehat G$, and the convolution, involution and norm are defined respectively by
$$f * g(\chi) = \int_{\widehat G} f(\chi')\,g(\chi\chi'^{-1})\,\widehat m(d\chi'), \qquad f^*(\chi) = \overline{f(\chi^{-1})}, \qquad \|f\|_1 = \int_{\widehat G} |f(\chi)|\,\widehat m(d\chi)$$
for $f,g \in L^1(\widehat G)$ and $\chi \in \widehat G$. There is an approximate identity $\{e_\kappa\} \subseteq L^1(\widehat G)$, which is a directed set and satisfies, for each $\kappa$, $e_\kappa * e_\kappa = e_\kappa = e_\kappa^*$, $\|e_\kappa\|_1 = 1$ and $\|e_\kappa * f - f\|_1 \to 0$ for $f \in L^1(\widehat G)$. We begin with a definition.

Definition 1. (1) Let $P = P(X,\widehat G)$ be the set of all functions $\phi : X \times \widehat G \to \mathbb{C}$ satisfying the following conditions:
(p1) For every $x \in X$, $\phi(x,\cdot)$ is a continuous positive definite function on $\widehat G$ such that $\phi(x,1) = 1$, where $1$ is the identity character, i.e., the function identically $1$ on $G$;
(p2) For every $\chi \in \widehat G$, $\phi(\cdot,\chi) \in B(X)$.
Here a function $\varphi : G \to \mathbb{C}$ is said to be positive definite if
$$\sum_{j=1}^n \sum_{k=1}^n a_j\bar a_k\,\varphi\big(t_jt_k^{-1}\big) \ge 0$$
for any $n \ge 1$, $t_1,\dots,t_n \in G$ and $a_1,\dots,a_n \in \mathbb{C}$.

(2) Let $Q = Q(X, L^1(\widehat G))$ be the set of all functions $q : X \times L^1(\widehat G) \to \mathbb{C}$ satisfying the following conditions:
(q1) For every $x \in X$, $q(x,\cdot)$ is a positive linear functional on $L^1(\widehat G)$ of norm 1;
(q2) For every $f \in L^1(\widehat G)$, $q(\cdot,f) \in B(X)$.
Note that $P$ and $Q$ are convex sets.
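Positive definiteness in (p1) says the quadratic form $\sum_{j,k} a_j\bar a_k\,\varphi(t_jt_k^{-1})$ is nonnegative. For a character of a finite cyclic group this form equals $\big|\sum_j a_j\chi(t_j)\big|^2 \ge 0$; a random spot check (illustrative, stdlib only):

```python
# Checking positive definiteness (the displayed condition): for a character of
# the cyclic group Z/nZ, the form sum_{j,k} a_j conj(a_k) phi(t_j - t_k) is >= 0.
import cmath, random

n = 6
phi = lambda t: cmath.exp(2j * cmath.pi * t / n)   # a character of Z/nZ

random.seed(0)
for _ in range(100):
    ts = [random.randrange(n) for _ in range(4)]
    a = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
    form = sum(a[j] * a[k].conjugate() * phi((ts[j] - ts[k]) % n)
               for j in range(4) for k in range(4))
    assert abs(form.imag) < 1e-9 and form.real > -1e-9   # quadratic form is >= 0
```

This is exactly Bochner's criterion in miniature: characters are the extreme positive definite functions, and mixtures of them (Fourier transforms of probability measures) inherit the property.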
Proposition 2. (1) For a channel $\nu \in \mathcal{C} = \mathcal{C}(X,G)$ define $\phi_\nu$ by
$$\phi_\nu(x,\chi) = \int_G (t,\chi)\,\nu(x,dt), \qquad x \in X,\ \chi \in \widehat G.$$
Then $\phi_\nu \in P$.
(2) If $G$ is compact and totally disconnected, then for any $\phi \in P$ there exists a unique channel $\nu \in \mathcal{C}$ such that $\phi = \phi_\nu$. In this case, the correspondence $\nu \leftrightarrow \phi_\nu$ between $\mathcal{C}$ and $P$ is onto, one-to-one and affine.

Proof. (1) This is a simple application of Bochner's Theorem together with the fact that $\phi_\nu(x,\chi) = K_\nu\chi(x)$ $(x \in X,\ \chi \in \widehat G)$, where $K_\nu$ is the channel operator associated with $\nu$ (cf. Section 3.2).
(2) Suppose that $G$ is compact and totally disconnected, and let $\phi \in P$ be given. For each $x \in X$ there exists by Bochner's Theorem a Baire measure $\nu_x$ on $G$ such that
$$\phi(x,\chi) = \int_G (t,\chi)\,\nu_x(dt), \qquad \chi \in \widehat G.$$
$\phi(x,1) = 1$ implies that $\nu_x$ is a probability measure. Define a mapping $\tilde\phi(\cdot,\cdot) : X \times C(G) \to \mathbb{C}$ by
$$\tilde\phi(x,b) = \int_G b(t)\,\nu_x(dt), \qquad x \in X,\ b \in C(G).$$
Clearly $\phi = \tilde\phi$ on $X \times \widehat G$. Since $G$ is compact, for a fixed $b \in C(G)$ there exists a sequence $\{b_n\}$ of finite linear combinations of elements of $\widehat G$ such that $b_n \to b$ uniformly on $G$. Hence we have that
$$\tilde\phi(\cdot,b_n) \to \tilde\phi(\cdot,b) \quad \text{uniformly on } X.$$
Since, for each finite linear combination $b' = \sum_{k=1}^n \alpha_k\chi_k$ with $\alpha_k \in \mathbb{C}$ and $\chi_k \in \widehat G$,
In this case, there exist a family $\{U_x(\cdot), \mathcal{H}_x, \theta_x\}_{x\in X}$ of weakly continuous unitary representations of $\widehat G$ and a family $\{V_x(\cdot), \mathcal{H}_x, \theta_x\}_{x\in X}$ of bounded *-representations of $L^1(\widehat G)$ such that for each $x \in X$
$$\phi(x,\chi) = \big(U_x(\chi)\theta_x, \theta_x\big)_x, \tag{4.2}$$
$$q(x,f) = \big(V_x(f)\theta_x, \theta_x\big)_x, \tag{4.3}$$
where $\mathcal{H}_x$ is a Hilbert space with inner product $(\cdot,\cdot)_x$ and $\theta_x \in \mathcal{H}_x$ is a cyclic vector of norm 1 for both representations.

Remark 4. In Lemma 3 above, for each $x \in X$ the Hilbert space $\mathcal{H}_x$ can be obtained from $L^1(\widehat G)$ through the positive linear functional $q(x,\cdot)$ by means of the standard GNS (Gel'fand-Neumark-Segal) construction as follows. Define $(\cdot,\cdot)_x$ by
$$(f,g)_x = q(x, f*g^*), \qquad f,g \in L^1(\widehat G), \tag{4.4}$$
which is a semi-inner product. Let $N_x = \{f \in L^1(\widehat G) : (f,f)_x = 0\}$ and let $\mathcal{H}_x$ be the completion of the quotient space $L^1(\widehat G)/N_x$ w.r.t. $(\cdot,\cdot)_x$; denote the inner product in $\mathcal{H}_x$ by the same symbol $(\cdot,\cdot)_x$, and by $[f]_x$ the equivalence class in $\mathcal{H}_x$ containing $f \in L^1(\widehat G)$, i.e.,
$$[f]_x = f + N_x, \qquad f \in L^1(\widehat G). \tag{4.5}$$
$U_x(\cdot)$ and $V_x(\cdot)$ are given respectively by
$$U_x(\chi)[g]_x = [g_\chi]_x, \qquad V_x(f)[g]_x = [f*g]_x \tag{4.6}$$
for $\chi \in \widehat G$ and $f,g \in L^1(\widehat G)$, where $g_\chi(\cdot) = g(\chi\,\cdot)$. We see that $U_x(\chi)$ is a unitary operator on $\mathcal{H}_x$ satisfying:
(1) $U_x(\chi\chi') = U_x(\chi)U_x(\chi')$ for $\chi,\chi' \in \widehat G$;
(2) $U_x(1) = 1$, the identity operator on $\mathcal{H}_x$;
(3) $U_x(\chi)^{-1} = U_x(\chi^{-1}) = U_x(\chi)^*$, the adjoint of $U_x(\chi)$, for $\chi \in \widehat G$;
(4) $(U_x(\cdot)[g]_x, [g]_x)_x$ is continuous on $\widehat G$ for any $g \in L^1(\widehat G)$;
while $V_x(f)$ is a bounded linear operator on $\mathcal{H}_x$ such that:
(5) $V_x(\alpha f + \beta g) = \alpha V_x(f) + \beta V_x(g)$ for $\alpha,\beta \in \mathbb{C}$ and $f,g \in L^1(\widehat G)$;
(6) $V_x(f*g) = V_x(f)V_x(g)$ for $f,g \in L^1(\widehat G)$;
(7) $V_x(f^*) = V_x(f)^*$ for $f \in L^1(\widehat G)$;
(8) $\|V_x(f)\| \le \|f\|_1$ for $f \in L^1(\widehat G)$.
Moreover, the cyclic vector $\theta_x$ is obtained as the weak limit of the approximate identity: $\theta_x = \text{w-}\lim_\kappa\,[e_\kappa]_x$.

Now let $S$ be a measurable transformation on $(X,\mathfrak{X})$ and let $T$ be a continuous homomorphism on $G$. $T$ induces a continuous homomorphism $\widehat T$ on $\widehat G$ given by
$$(Tt,\chi) = (t,\widehat T\chi), \qquad t \in G,\ \chi \in \widehat G,$$
and $\widehat T$ is associated with an operator $\overline T$ on $B(\widehat G)$ such that
$$\overline Tf(\chi) = f(\widehat T\chi), \qquad f \in B(\widehat G),\ \chi \in \widehat G.$$
Definition 5. $\phi \in P$ is said to be stationary if
(p3) $\phi(Sx,\chi) = \phi(x,\widehat T\chi)$ for $x \in X$ and $\chi \in \widehat G$.
Denote by $P_s$ the set of all stationary $\phi \in P$. $q \in Q$ is said to be stationary if
(q3) $q(Sx,f) = q(x,\overline Tf)$ for $x \in X$ and $f \in L^1(\widehat G)$.
Denote by $Q_s$ the set of all stationary $q \in Q$.

Proposition 6. Let $\nu \in \mathcal{C}$, $\phi_\nu \in P$ and $q_\nu \in Q$ be corresponding elements. Then $\nu \in \mathcal{C}_s \iff \phi_\nu \in P_s \iff q_\nu \in Q_s$.
Proof. That $\nu \in \mathcal{C}_s \iff \phi_\nu \in P_s$ is proved by the following two-sided implications, using the fact that $\widehat G$ spans a dense subset of $L^1(G,\nu(x,\cdot))$ for each $x \in X$:
$$\begin{aligned}
\nu \in \mathcal{C}_s &\iff \nu(Sx,C) = \nu(x,T^{-1}C), && x \in X,\ C \in \mathfrak{G}, \\
&\iff \int_G (t,\chi)\,\nu(Sx,dt) = \int_G (t,\chi)\,\nu(x,dT^{-1}t), && x \in X,\ \chi \in \widehat G, \\
&\iff \phi_\nu(Sx,\chi) = \phi_\nu(x,\widehat T\chi), && x \in X,\ \chi \in \widehat G, \\
&\iff \phi_\nu \in P_s.
\end{aligned}$$
Similarly, $\phi_\nu \in P_s \iff q_\nu \in Q_s$ follows from
$$\begin{aligned}
\phi_\nu \in P_s &\iff \int_{\widehat G} f(\chi)\,\phi_\nu(Sx,\chi)\,\widehat m(d\chi) = \int_{\widehat G} f(\chi)\,\phi_\nu(x,\widehat T\chi)\,\widehat m(d\chi), && x \in X,\ f \in L^1(\widehat G), \\
&\iff q_\nu(Sx,f) = q_\nu(x,\overline Tf), && x \in X,\ f \in L^1(\widehat G), \\
&\iff q_\nu \in Q_s.
\end{aligned}$$
Remark 7. Let $\mu \in P(X)$ and define $(\cdot,\cdot)_\mu$ on $L^1(\widehat G)$ by
$$(f,g)_\mu = \int_X ([f]_x,[g]_x)_x\,\mu(dx) = \int_X q(x, f*g^*)\,\mu(dx), \qquad f,g \in L^1(\widehat G),$$
where we use the notations in (4.4), (4.5) and (4.6). Let $N_\mu = \{f \in L^1(\widehat G) : (f,f)_\mu = 0\}$ and let $\mathcal{H}_\mu$ be the completion of the quotient space $L^1(\widehat G)/N_\mu$ w.r.t. $(\cdot,\cdot)_\mu$. Then $\mathcal{H}_\mu$ is a Hilbert space. Denote by $[f]_\mu$ the equivalence class in $\mathcal{H}_\mu$ containing $f \in L^1(\widehat G)$.
Lemma 8. Let $\nu \in C$. For each input source $\mu \in P(X)$ there exist a weakly continuous unitary representation $\{U_\mu(\cdot), \mathcal H_\mu, \varrho_\mu\}$ of $\widehat G$ and a bounded *-representation $\{V_\mu(\cdot), \mathcal H_\mu, \varrho_\mu\}$ of $L^1(\widehat G)$ on the same Hilbert space $\mathcal H_\mu$ such that for $g, h \in L^1(\widehat G)$
$$(U_\mu(\chi)[g]_\mu, [h]_\mu)_\mu = \int_X (U_x(\chi)[g]_x, [h]_x)_x\,\mu(dx), \qquad \chi \in \widehat G,$$
$$(V_\mu(f)[g]_\mu, [h]_\mu)_\mu = \int_X (V_x(f)[g]_x, [h]_x)_x\,\mu(dx), \qquad f \in L^1(\widehat G).$$

Proof. Let $\mu \in P(X)$ and $\phi_\nu \in P$. By Lemma III.2.5 we have that for $\chi \in \widehat G$
$$\int_X \phi(x, \chi)\,\mu(dx) = \int_X \int_G (t, \chi)\,\nu(x, dt)\,\mu(dx) = \int_G (t, \chi)\,\mu\nu(dt).$$
Since $\mu\nu$ is a Baire probability measure on $G$,
$$Q(f) = \int_{\widehat G} f(\chi)\Big[\int_G (t, \chi)\,\mu\nu(dt)\Big]\hat m(d\chi), \qquad f \in L^1(\widehat G),$$
defines a positive linear functional $Q$ on $L^1(\widehat G)$. It follows from Fubini's Theorem that for $f \in L^1(\widehat G)$
$$Q(f) = \int_{\widehat G} f(\chi)\Big[\int_X \phi(x, \chi)\,\mu(dx)\Big]\hat m(d\chi) = \int_X \int_{\widehat G} f(\chi)\phi(x, \chi)\,\hat m(d\chi)\,\mu(dx) = \int_X q(x, f)\,\mu(dx)$$
by (4.1), where $q = q_\nu$. Observe that for $f, g \in L^1(\widehat G)$
$$(f, g)_\mu = \int_X q(x, f * g^*)\,\mu(dx) = Q(f * g^*).$$
Thus there exist a weakly continuous unitary representation $\{U_\mu(\cdot), \mathcal H_\mu, \varrho_\mu\}$ of $\widehat G$ and a bounded *-representation $\{V_\mu(\cdot), \mathcal H_\mu, \varrho_\mu\}$ of $L^1(\widehat G)$ on the Hilbert space $\mathcal H_\mu$ such that
$$\int_X \phi(x, \chi)\,\mu(dx) = (U_\mu(\chi)\varrho_\mu, \varrho_\mu)_\mu, \qquad \chi \in \widehat G,$$
$$Q(f) = (V_\mu(f)\varrho_\mu, \varrho_\mu)_\mu, \qquad f \in L^1(\widehat G).$$
Now the desired equations are easily verified.

Let us now consider the Fourier transform $\mathcal F$ on $L^1(\widehat G)$ defined by
$$(\mathcal F f)(t) = \int_{\widehat G} (t, \chi)f(\chi)\,\hat m(d\chi), \qquad t \in G.$$
For $\mu \in P(X)$ let $\mathcal H_\mu$ be the Hilbert space obtained in Lemma 8. As is well known (Plancherel's Theorem), the Fourier transform $\mathcal F$ is a unitary operator between $L^2(\widehat G)$ and $L^2(G)$ (cf. e.g. Dixmier [1, p. 369]). Similarly, we can prove the following.

Theorem 9. Let $\nu \in C$, and let $\phi_\nu \in P$ and $q_\nu \in Q$ be corresponding to $\nu$. For each $\mu \in P(X)$ let $\mathcal H_\mu$ be as in Lemma 8. Then the Fourier transform $\mathcal F$ on $L^1(\widehat G)$ is regarded as a unitary operator from $\mathcal H_\mu$ onto $L^2(G, \mu\nu)$.
Proof. Let $\mu \in P(X)$. Using Fubini's Theorem we have for $f \in L^1(\widehat G)$ that
$$([f]_\mu, [f]_\mu)_\mu = \int_X q(x, f * f^*)\,\mu(dx) = \int_X \int_{\widehat G} (f * f^*)(\chi)\,\phi(x, \chi)\,\hat m(d\chi)\,\mu(dx)$$
$$= \int_X \int_{\widehat G} (f * f^*)(\chi)\int_G (t, \chi)\,\nu(x, dt)\,\hat m(d\chi)\,\mu(dx) = \int_X \int_G \int_{\widehat G} (f * f^*)(\chi)(t, \chi)\,\hat m(d\chi)\,\nu(x, dt)\,\mu(dx)$$
$$= \int_X \int_G \mathcal F(f * f^*)(t)\,\nu(x, dt)\,\mu(dx) = \int_G |\mathcal F(f)(t)|^2\,\mu\nu(dt) = (\mathcal F(f), \mathcal F(f))_{2,\mu\nu},$$
where $(\cdot, \cdot)_{2,\mu\nu}$ is the inner product in $L^2(G, \mu\nu)$. Since $\mathcal F(L^1(\widehat G))$ is dense in $C_0(G)$, the Banach space of all $\mathbb C$-valued continuous functions on $G$ vanishing at infinity, which in turn is dense in $L^2(G, \mu\nu)$, we conclude that $\mathcal F(L^1(\widehat G))$ is dense in $L^2(G, \mu\nu)$. Therefore the Fourier transform $\mathcal F$ can be uniquely extended to a unitary operator from $\mathcal H_\mu$ onto $L^2(G, \mu\nu)$.

Remark 10. To consider measurability of a channel $\nu \in C(X, G)$, assume that $G$ is compact abelian with the normalized Haar measure $m(\cdot)$. The dual group $\widehat G$ is
discrete abelian, so that $L^1(\widehat G) = \ell^1(\widehat G)$. Let $\phi = \phi_\nu \in P$ correspond to $\nu$. If $\phi(x, \cdot) \in \ell^1(\widehat G)$ for every $x \in X$, then $\nu(x, \cdot) \ll m$ $(x \in X)$, i.e., $\nu$ is dominated by $m$, and the function $\tilde\nu : X \to M(G)$ is weakly measurable. For, since $\phi(x, \cdot) \equiv \phi_x(\cdot) \in \ell^1(\widehat G)$ for each $x \in X$, it follows from the Fourier inversion formula that
$$\phi_x(\chi) = \int_G \mathcal F\phi_x(t)\,(t, \chi)\,m(dt), \qquad \chi \in \widehat G.$$
On the other hand, we know that
$$\phi(x, \chi) = \int_G (t, \chi)\,\nu(x, dt), \qquad \chi \in \widehat G.$$
Hence $\nu(x, \cdot) \ll m$ $(x \in X)$. Therefore the conclusion follows from Theorem 2.3.
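For the finite cyclic group $G = \mathbb Z_N$ (where $\widehat G \cong \mathbb Z_N$ and the integrals above become finite sums), the unitarity of the Fourier transform asserted by Plancherel's theorem can be checked directly. This is only a finite-dimensional analogue of the statements in the text:

```python
import numpy as np

N = 8
# Characters of Z_N: (t, chi_k) = exp(2*pi*i*k*t/N); normalized Fourier matrix
F = np.array([[np.exp(2j * np.pi * k * t / N) for k in range(N)]
              for t in range(N)]) / np.sqrt(N)

# Unitarity F*F = I is Plancherel's theorem for Z_N
assert np.allclose(F.conj().T @ F, np.eye(N))

# Consequently ||F f||_2 = ||f||_2 for every f
rng = np.random.default_rng(0)
f = rng.normal(size=N) + 1j * rng.normal(size=N)
assert np.isclose(np.linalg.norm(F @ f), np.linalg.norm(f))
```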
4.5. Noncommutative channels

In this section, a noncommutative extension of (classical) channels is discussed, based on the one-to-one correspondence between the set of channels and a certain set of operators between function spaces (Theorem III.2.1). We formulate a noncommutative (or quantum mechanical) channel as a certain mapping between the state spaces of two C*-algebras. Stationarity, ergodicity, the KMS condition and weak mixing are introduced, and their properties are examined. As standard textbooks of operator algebra theory we refer to Dixmier [1](1982), Sakai [1](1971) and Takesaki [2](1979).

Let $A$ be a C*-algebra and $\mathfrak S(A)$ its state space, consisting of the positive linear functionals on $A$ of norm 1. Consider a strongly continuous one-parameter group $\alpha(G)$ of *-automorphisms of $A$ over an LCA group $G$. That is, i) $\alpha_t$ is a *-automorphism of $A$ for $t \in G$; ii) $\alpha_{t_1}\alpha_{t_2} = \alpha_{t_1+t_2}$ for $t_1, t_2 \in G$; iii) $\lim_{t\to e}\|\alpha_t a - a\| = 0$ for $a \in A$, $e$ being the unit of $G$. Then the triple $(A, \mathfrak S(A), \alpha(G))$ is called a C*-dynamical system.

Definition 1. Let $G$ be an LCA group and $(A, \mathfrak S(A), \alpha(G))$ and $(\mathbb B, \mathfrak S(\mathbb B), \beta(G))$ be a pair of C*-dynamical systems. A mapping $\Lambda^* : \mathfrak S(A) \to \mathfrak S(\mathbb B)$ is said to be a channel if
(qc1) $\Lambda^*$ is the dual map of a completely positive map $\Lambda : \mathbb B \to A$, where complete positivity means that, for every $n \ge 1$, $\Lambda$ maps each $n \times n$ positive matrix $(b_{ij})$ over $\mathbb B$ to an $n \times n$ positive matrix $(\Lambda b_{ij})$.
Sometimes $\Lambda^*$ is called a quantum channel. $C(A, \mathbb B)$ denotes the set of all channels from $A$ to $\mathbb B$.
Example 2. (1) Let $(X, \mathfrak X)$ and $(Y, \mathfrak Y)$ be a pair of compact Hausdorff spaces with the Baire $\sigma$-algebras, and let $A \equiv C(X)$ and $\mathbb B \equiv C(Y)$, the Banach spaces of $\mathbb C$-valued continuous functions on $X$ and $Y$, respectively. Then $A$ and $\mathbb B$ are commutative C*-algebras with $\mathfrak S(A) \equiv [C(X)^*]_1^+ = M_1^+(X) \equiv P(X)$ and $\mathfrak S(\mathbb B) \equiv P(Y)$. If $S : X \to X$ and $T : Y \to Y$ are invertible measurable transformations, then $\alpha_n \equiv \bar S^n$ and $\beta_n \equiv \bar T^n$ $(n \in \mathbb Z)$ define one-parameter groups of *-automorphisms on $A$ and $\mathbb B$ over $\mathbb Z$, respectively, where $\bar S$ and $\bar T$ are the operators induced from $S$ and $T$. Hence $(C(X), P(X), \bar S(\mathbb Z))$ and $(C(Y), P(Y), \bar T(\mathbb Z))$ are C*-dynamical systems. Let $\Lambda : C(Y) \to C(X)$ be a positive linear operator with $\Lambda 1 = 1$; then the dual map $\Lambda^* : P(X) \to P(Y)$ is a channel, where $1$ is the identity function. In fact, the following statement is true by the proof of Theorem III.2.1: if $\Lambda : C(Y) \to C(X)$ is a positive linear operator with $\Lambda 1 = 1$, then $\Lambda$ has a unique extension $\Lambda_1 : B(Y) \to B(X)$ such that $B(Y) \ni b_n \downarrow 0$ implies $B(X) \ni \Lambda_1 b_n \downarrow 0$. Thus, since $\Lambda_1$ induces a unique channel, such a $\Lambda$ defines a continuous channel in view of Proposition III.2.11.

(2) In (1), let $B(X)$ and $B(Y)$ be the Banach spaces of all bounded Baire functions on $X$ and $Y$, respectively. In this case, $B(X)$ and $B(Y)$ are also C*-algebras and it holds that $P(X) \subseteq \mathfrak S(B(X))$ and $P(Y) \subseteq \mathfrak S(B(Y))$. Let $\Lambda : B(Y) \to B(X)$ be a positive linear operator with $\Lambda 1 = 1$ such that $b_n \downarrow 0 \Rightarrow \Lambda b_n \downarrow 0$ (cf. (k1) and (k2) in Section 3.2). Then $\Lambda^* : \mathfrak S(B(X)) \to \mathfrak S(B(Y))$ is a channel. In view of Theorem III.2.1, Definition 1 is a noncommutative extension of (classical) channels. Note that $B(X)$ is a $\Sigma^*$-algebra, i.e., a C*-algebra with a $\sigma$-convergence. In this case, the $\sigma$-convergence $a_n \xrightarrow{\sigma} a$ is given by $\sup_{n\ge1}\|a_n\| < \infty$ and $a_n \to a$ pointwise. Also note that $B(X)$ is the smallest $\Sigma^*$-algebra containing the C*-algebra $C(X)$, denoted $B(X) \equiv C(X)^\sigma$ and called the $\sigma$-envelope of $C(X)$. Therefore, as a direct noncommutative extension of a channel operator defined in Section 3.2, we can formulate a noncommutative channel operator as a positive linear operator $\Lambda$ between two $\Sigma^*$-algebras with $\Lambda 1 = 1$ and such that $b_n \xrightarrow{\sigma} 0$ implies $\Lambda b_n \xrightarrow{\sigma} 0$.

(3) Let $\mathcal H$ be a complex separable Hilbert space and $B(\mathcal H)$ the algebra of all bounded linear operators on $\mathcal H$. $T(\mathcal H)$ denotes the Banach space of trace class operators on $\mathcal H$, so that $T(\mathcal H)^* \equiv B(\mathcal H)$ holds. Hence
$$\mathfrak S(B(\mathcal H)) \equiv T_1^+(\mathcal H) \equiv \{\rho \in T(\mathcal H) : \rho \ge 0,\ \operatorname{tr}\rho = 1\},$$
where $\operatorname{tr}(\cdot)$ is the trace. Let $\{Q_n\}_{n=1}^\infty$ be a set of mutually orthogonal projections on $\mathcal H$ such that $\sum_{n=1}^\infty Q_n = 1$, i.e., a resolution of the unity, and define $\Lambda^*$ by
$$\Lambda^*\rho = \sum_{n=1}^\infty Q_n\rho Q_n, \qquad \rho \in T_1^+(\mathcal H),$$
which is called a quantum measurement. Then $\Lambda^*$ is a channel in $C(B(\mathcal H), B(\mathcal H))$.

(4) Let $\mathbb B$ be a C*-algebra and $A$ its closed *-subalgebra. If $\Lambda : \mathbb B \to A$ is a projection of norm 1, then the dual map $\Lambda^* : \mathfrak S(A) \to \mathfrak S(\mathbb B)$ is a channel.

(5) Let $(X, \mathfrak X)$ and $C(X)$ be as in (1) and let $(A, \mathfrak S(A), \alpha)$ be a C*-dynamical system. If $w : X \to \mathfrak S(A)$ is an $\mathfrak X$-measurable mapping, then
$$\Lambda^*\mu = \int_X w(x)\,\mu(dx), \qquad \mu \in P(X),$$
defines a channel $\Lambda^* : P(X) \to \mathfrak S(A)$, called a classical-quantum channel. If $\Xi : \mathfrak X \to A^+ = \{a \in A : a \ge 0\}$ is an $A^+$-valued c.a. measure, then
$$(\Lambda^*\varphi)(A) = \varphi(\Xi(A)), \qquad A \in \mathfrak X,\ \varphi \in \mathfrak S(A),$$
defines a channel, called a quantum-classical channel, where $\Lambda : C(X) \to A$ and $\Xi$ satisfy
$$\Lambda(f) = \int_X f\,d\Xi, \qquad f \in C(X),$$
which is a Riesz type theorem for vector measures.

Let us fix two C*-dynamical systems $(A, \mathfrak S(A), \alpha(\mathbb R))$ and $(\mathbb B, \mathfrak S(\mathbb B), \beta(\mathbb R))$, where we assume that $A$ and $\mathbb B$ have the identity $1$ and $G = \mathbb R$, the real line.
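The quantum measurement of Example 2 (3) can be illustrated in finite dimensions: the map $\rho \mapsto \sum_n Q_n\rho Q_n$ takes density matrices to density matrices. The particular projections and state below are illustrative choices:

```python
import numpy as np

# A resolution of the unity on C^4 by two mutually orthogonal projections
Q1 = np.diag([1.0, 1.0, 0.0, 0.0])
Q2 = np.diag([0.0, 0.0, 1.0, 1.0])
assert np.allclose(Q1 + Q2, np.eye(4))

def measurement_channel(rho, projections):
    """Lambda* rho = sum_n Q_n rho Q_n, the quantum measurement map."""
    return sum(Q @ rho @ Q for Q in projections)

# An arbitrary density matrix rho = M M* / tr(M M*): positive, trace one
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = M @ M.conj().T
rho /= np.trace(rho).real

sigma = measurement_channel(rho, [Q1, Q2])
assert np.isclose(np.trace(sigma).real, 1.0)         # trace is preserved
assert np.all(np.linalg.eigvalsh(sigma) >= -1e-12)   # positivity is preserved
```

The output state is the "pinched" block-diagonal part of $\rho$ with respect to the chosen resolution of the unity.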
Definition 3. Consider a C*-dynamical system $(A, \mathfrak S(A), \alpha(\mathbb R))$. $I(\alpha) = I(\alpha, A)$ denotes the set of all $\alpha$-invariant states, i.e.,
$$I(\alpha) = \{\rho \in \mathfrak S(A) : \rho(\alpha_t a) = \rho(a),\ a \in A,\ t \in \mathbb R\}.$$
A state $\rho \in \mathfrak S(A)$ is said to be $\alpha$-KMS if to any $a, b \in A$ there corresponds a $\mathbb C$-valued function $f_{a,b}$ on $\mathbb C$ such that
(i) $f_{a,b}$ is analytic on $D = \{z \in \mathbb C : 0 < \mathrm{Im}\,z < 1\}$, where $\mathrm{Im}(\cdot)$ indicates the imaginary part;
(ii) $f_{a,b}$ is bounded and continuous on $\bar D$;
(iii) $f_{a,b}(t) = \rho((\alpha_t a)b)$ and $f_{a,b}(t+i) = \rho(b(\alpha_t a))$ for $t \in \mathbb R$.
$K(\alpha) = K(\alpha, A)$ denotes the set of all $\alpha$-KMS states. A state $\rho$ is said to be weakly mixing (WM) if
$$\rho(\bar\alpha(a)b) = \rho(\bar\alpha(a))\rho(b), \qquad a, b \in A,$$
where
$$\bar\alpha(a) = \lim_{t\to\infty}\frac1t\int_0^t \alpha_s(a)\,ds, \qquad a \in A.$$
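The boundary condition (iii) of the KMS property can be verified numerically for a matrix algebra with the Gibbs state $\rho = e^{-H}/\operatorname{tr}e^{-H}$ and the dynamics $\alpha_t(a) = e^{-itH}ae^{itH}$; the sign of the generator is chosen here so that the analyticity strip matches the one in the text, and $H$, $a$, $b$ are random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 4
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (M + M.conj().T) / 2                 # a Hermitian "Hamiltonian"
w, V = np.linalg.eigh(H)

def exp_H(z):
    """e^{zH} for complex z via the spectral decomposition of H."""
    return (V * np.exp(z * w)) @ V.conj().T

def alpha(z, a):
    """Analytic continuation of the dynamics: alpha_z(a) = e^{-izH} a e^{izH}."""
    return exp_H(-1j * z) @ a @ exp_H(1j * z)

rho = exp_H(-1.0)
rho /= np.trace(rho)                     # Gibbs state at inverse temperature 1

a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

t = 0.7
# f_{a,b}(t)     = rho((alpha_t a) b)
# f_{a,b}(t + i) = rho(b (alpha_t a))   -- the KMS boundary values
f_upper = np.trace(rho @ alpha(t + 1j, a) @ b)
boundary = np.trace(rho @ b @ alpha(t, a))
assert np.isclose(f_upper, boundary)
```

Here $f_{a,b}(z) = \rho(\alpha_z(a)b)$ is entire in finite dimensions, so only the boundary identity needs checking.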
$WM(\alpha) = WM(\alpha, A)$ denotes the set of all WM states. Using the above definitions we can introduce stationarity, ergodicity, the KMS condition and the WM condition for quantum channels.

Definition 4. Let $\Lambda^* : \mathfrak S(A) \to \mathfrak S(\mathbb B)$ be a channel. $\Lambda^*$ is said to be stationary if
(qc2) $\Lambda \circ \beta_t = \alpha_t \circ \Lambda$ for $t \in \mathbb R$.
$C_s(A, \mathbb B)$ denotes the set of all stationary channels in $C(A, \mathbb B)$. A stationary channel $\Lambda^*$ is said to be ergodic if
(qc3) $\Lambda^*(\mathrm{ex}\,I(\alpha)) \subseteq \mathrm{ex}\,I(\beta)$,
where $\mathrm{ex}\{\cdot\}$ denotes the set of all extreme points of the set $\{\cdot\}$. $C_{se}(A, \mathbb B)$ denotes the set of all stationary ergodic channels. A stationary channel $\Lambda^*$ is said to be KMS if
(qc4) $\Lambda^*(K(\alpha)) \subseteq K(\beta)$.
$C_k(A, \mathbb B)$ denotes the set of all KMS channels. A channel $\Lambda^*$ is said to be weakly mixing (WM) if
(qc5) $\Lambda^*(WM(\alpha)) \subseteq WM(\beta)$.
$C_{wm}(A, \mathbb B)$ denotes the set of all WM channels.

As is easily seen, the condition (qc2) corresponds to the condition (k3) ($KT = SK$). In a classical channel setting, $\mathrm{ex}\,P_s(X) = P_{se}(X)$ and $\mathrm{ex}\,P_s(Y) = P_{se}(Y)$, and a stationary ergodic channel $\nu$ transforms ergodic input sources into ergodic output sources, i.e., $K_\nu^*(\mathrm{ex}\,P_s(X)) \subseteq \mathrm{ex}\,P_s(Y)$. Hence (qc3) is a natural extension of ergodicity of classical channels.
Example 5. Let $A \otimes \mathbb B$ be the C*-tensor product of $A$ and $\mathbb B$, and let $j : \mathbb B \to A \otimes \mathbb B$ be the natural embedding, i.e., $j(b) = 1 \otimes b$ for $b \in \mathbb B$; $A$ is embedded into $A \otimes \mathbb B$ similarly.
(1) Take a state $\psi \in \mathfrak S(\mathbb B)$ and define $\Lambda : \mathbb B \to A$ by
$$\Lambda = (1 \otimes \psi) \circ j.$$
Then it is easily seen that $\Lambda^*\varphi = \psi$ for every $\varphi \in \mathfrak S(A)$.
(2) Let $\sigma$ be a *-automorphism of $A \otimes \mathbb B$ commuting with $\alpha_t \otimes \beta_t$ $(t \in \mathbb R)$ and let $\Lambda = (1 \otimes \psi) \circ \sigma \circ j$. Then, for $\varphi \in WM(\alpha)$ and $b_1, b_2 \in \mathbb B$,
$$\Lambda^*\varphi(\bar\beta(b_1)b_2) = \varphi \otimes \psi\big(\sigma(1 \otimes \bar\beta(b_1))\,\sigma(1 \otimes b_2)\big) = \varphi \otimes \psi\big(\overline{\alpha \otimes \beta}(\sigma(1 \otimes b_1))\,\sigma(1 \otimes b_2)\big),$$
since $\varphi$ is WM, so that
$$\Lambda^*\varphi(\bar\beta(b_1)b_2) = \varphi \otimes \psi\big(\overline{\alpha \otimes \beta}(\sigma(1 \otimes b_1))\big)\,\varphi \otimes \psi\big(\sigma(1 \otimes b_2)\big) = \Lambda^*\varphi(\bar\beta(b_1))\,\Lambda^*\varphi(b_2),$$
since the algebraic tensor product $A \odot \mathbb B$ is dense in $A \otimes \mathbb B$. This means that $\Lambda^*\varphi \in WM(\beta)$. Therefore $\Lambda^*$ is WM.
(3) In (2), if $\psi \in K(\beta)$, then $\Lambda^* = [(1 \otimes \psi) \circ \sigma \circ j]^*$ is KMS. For, let $\varphi \in K(\alpha)$. Then $\varphi \otimes \psi \in K(\alpha \otimes \beta)$. Now for $Q, R \in A \otimes \mathbb B$ there exists a $\mathbb C$-valued function $f_{Q,R}$ on $\mathbb C$ that is analytic on $D = \{z \in \mathbb C : 0 < \mathrm{Im}\,z < 1\}$, and bounded and continuous on $\bar D$, such that
$$f_{Q,R}(t) = \varphi \otimes \psi\big((\alpha_t \otimes \beta_t(Q))R\big), \qquad f_{Q,R}(t+i) = \varphi \otimes \psi\big(R(\alpha_t \otimes \beta_t(Q))\big), \qquad t \in \mathbb R.$$
Letting $Q = \sigma(1 \otimes b_1)$ and $R = \sigma(1 \otimes b_2)$ for $b_1, b_2 \in \mathbb B$, we see that for $t \in \mathbb R$
$$f_{Q,R}(t) \equiv f_{b_1,b_2}(t) = \Lambda^*\varphi(\beta_t(b_1)b_2), \qquad f_{Q,R}(t+i) \equiv f_{b_1,b_2}(t+i) = \Lambda^*\varphi(b_2\beta_t(b_1)).$$
Hence $\Lambda^*\varphi$ is KMS. Therefore $\Lambda^*$ is KMS.

Definition 6. Let $S \subseteq \mathfrak S(A)$ and $a_1, a_2 \in A$. We say that $a_1 \equiv a_2 \pmod S$ if $\varphi(a_1) = \varphi(a_2)$ for $\varphi \in S$. If $\Lambda_1^*$ and $\Lambda_2^*$ are channels, then we say that $\Lambda_1^* \equiv \Lambda_2^* \pmod S$ provided that $\Lambda_1^*\varphi = \Lambda_2^*\varphi$ for $\varphi \in S$.

Proposition 7. If $\Lambda^*$ is a stationary ergodic channel, then $\Lambda^*$ is extremal in $C_s(A, \mathbb B)$ mod $I(\alpha)$, i.e., if $\Lambda^* = \lambda\Lambda_1^* + (1-\lambda)\Lambda_2^*$ with $0 < \lambda < 1$ and $\Lambda_1^*, \Lambda_2^* \in C_s(A, \mathbb B)$, then $\Lambda_1^* \equiv \Lambda_2^* \pmod{I(\alpha)}$.
Proof. Suppose that $\Lambda^* = \lambda\Lambda_1^* + (1-\lambda)\Lambda_2^*$ for some $\lambda \in (0,1)$ and $\Lambda_1^*, \Lambda_2^* \in C_s(A, \mathbb B)$. Then we have $\Lambda^*\varphi(b) = \lambda\Lambda_1^*\varphi(b) + (1-\lambda)\Lambda_2^*\varphi(b)$ for $b \in \mathbb B$ and $\varphi \in \mathrm{ex}\,I(\alpha)$. It is easily verified that
$$\Lambda_k^*\varphi(\beta_t(b)) = \Lambda_k^*\varphi(b), \qquad \varphi \in \mathrm{ex}\,I(\alpha),\ b \in \mathbb B,\ t \in \mathbb R,\ k = 1, 2,$$
so that $\Lambda_k^*\varphi \in I(\beta)$ $(k = 1, 2)$. Since $\Lambda^*\varphi \in \mathrm{ex}\,I(\beta)$ by ergodicity, it follows that $\Lambda_1^*\varphi = \Lambda_2^*\varphi$, i.e., $\Lambda_1^* \equiv \Lambda_2^* \pmod{I(\alpha)}$.
Theorem 8. If $\mathbb B$ is simple, i.e., $\mathbb B$ contains no proper two-sided ideal, and $A = B(\mathcal H)$ for some Hilbert space $\mathcal H$, then a stationary channel $\Lambda^*$ is ergodic iff $\Lambda^* \in \mathrm{ex}\,C_s(A, \mathbb B)$ (mod $I(\alpha)$).

Proof. The "only if" part is proved in Proposition 7. To prove the "if" part, suppose that $\Lambda^*$ is not ergodic. Then there exists some $\varphi \in \mathrm{ex}\,I(\alpha)$ for which $\psi \equiv \Lambda^*\varphi \notin \mathrm{ex}\,I(\beta)$, so that $\psi = \lambda\psi_1 + (1-\lambda)\psi_2$ for some $\lambda \in (0,1)$ and distinct $\psi_1, \psi_2 \in I(\beta)$. Let $\{\pi_\psi, \mathcal H_\psi, x_\psi\}$ be the GNS representation associated with $\psi$. Then there exist operators $Q_k \in \pi_\psi(\mathbb B)' \cap u_\psi(\mathbb R)'$ $(k = 1, 2)$ such that
(i) $Q_k \ge 0$;
(ii) $(x_\psi, Q_k x_\psi) = 1$;
(iii) $\psi_k(b) = (x_\psi, Q_k\pi_\psi(b)x_\psi)$ $(b \in \mathbb B)$,
for $k = 1, 2$, where $u_\psi(\mathbb R)$ is a one-parameter group of unitary operators on $\mathcal H_\psi$, and $\pi_\psi(\mathbb B)'$ and $u_\psi(\mathbb R)'$ are the commutants in $B(\mathcal H_\psi)$. Since $\mathbb B$ is simple, $\pi_\psi$ is faithful, i.e., $\pi_\psi(b) = 0 \Leftrightarrow b = 0$. Let $\Theta_\psi = \Lambda \circ \pi_\psi^{-1} : \pi_\psi(\mathbb B) \to A$, which is completely positive because $\Lambda : \mathbb B \to A$ is so. Since $A = B(\mathcal H)$, $\Theta_\psi$ can be extended to a completely positive map $\tilde\Theta_\psi : B(\mathcal H_\psi) \to A$, so that $\tilde\Theta_\psi \circ \pi_\psi = \Lambda$ on $\mathbb B$. Let $\Lambda_k = \tilde\Theta_\psi \circ Q_k \circ \pi_\psi : \mathbb B \to A$, which is completely positive $(k = 1, 2)$. For $k = 1, 2$ consider the map
$$\tilde\Lambda_k : \mathbb B \to \bigoplus_{\omega \in \mathfrak S(A)} A_\omega,$$
where $A_\omega = A$ for each $\omega \in \mathfrak S(A)$, whose $\omega$-component is $\Lambda_k$ for $\omega = \varphi$ and $\Lambda$ for every other $\omega$ $(b \in \mathbb B,\ \omega \in \mathfrak S(A),\ k = 1, 2)$. Let $\bar\Lambda_k$ $(k = 1, 2)$ denote the map $\mathbb B \to A$ determined by $\tilde\Lambda_k$. Then $\bar\Lambda_k : \mathbb B \to A$ is completely positive, $\bar\Lambda_k 1 = 1$ and $\bar\Lambda_k^* \in C_s(A, \mathbb B)$ for $k = 1, 2$. Moreover, $\bar\Lambda_1^* \ne \bar\Lambda_2^*$ and
$$\Lambda^* = \lambda\bar\Lambda_1^* + (1-\lambda)\bar\Lambda_2^* \pmod{I(\alpha)},$$
which contradicts the assumption.

We now consider interrelations between KMS channels, WM channels and ergodic channels. $\tilde{\mathbb B}$ denotes the set of all $\beta$-analytic elements of $\mathbb B$, i.e., $b \in \tilde{\mathbb B}$ iff there exists a $\mathbb B$-valued entire function $A(t)$ such that $A(t) = \beta_t(b)$ for $t \in \mathbb R$. $(A, \mathfrak S(A), \alpha)$ is said to be $\alpha$-abelian if
$$\lim_{t\to\infty}\frac1t\int_0^t \rho\big(c^*[\alpha_s(a), b]c\big)\,ds = 0, \qquad a, b, c \in A,\ \rho \in I(\alpha),$$
where $[a, b] = ab - ba$ for $a, b \in A$. Then we can state the following.

Theorem 9. Let $\Lambda^* \in C_s(A, \mathbb B)$ be a stationary channel. Then:
(1) $\Lambda^*$ is WM iff condition (5.1) holds.
(2) $\Lambda^*$ is KMS iff the condition analogous to (5.1) holds for KMS states.
(3) If $(A, \mathfrak S(A), \alpha)$ is $\alpha$-abelian and $\Lambda$ is a *-homomorphism, then $\Lambda^*$ is KMS iff $\Lambda^*$ is ergodic.

Proof. (1) Suppose $\Lambda^*$ is WM. Then we have for $\varphi \in WM(\alpha)$ and $b_1, b_2 \in \mathbb B$ that
$$\Lambda^*\varphi(\bar\beta(b_1)b_2) = \Lambda^*\varphi(\bar\beta(b_1))\,\Lambda^*\varphi(b_2),$$
since $\Lambda^*\varphi \in WM(\beta)$.
As in the commutative case, we want to consider compound states. Let $\Lambda^* : \mathfrak S(A) \to \mathfrak S(\mathbb B)$ be a channel. Take any $\varphi \in S$, where $S$ is a suitable compact convex subset of $\mathfrak S(A)$. A compound state $\Phi$ of $\varphi$ and $\Lambda^*\varphi$ should satisfy
(i) $\Phi(a \otimes 1) = \varphi(a)$, $a \in A$; (ii) $\Phi(1 \otimes b) = \Lambda^*\varphi(b)$, $b \in \mathbb B$.
There exists a probability measure $\mu$ on $S$, concentrated on $\mathrm{ex}\,S$, such that
$$\varphi = \int_{\mathrm{ex}\,S} \omega\,\mu(d\omega);$$
thus $\varphi$ is the barycenter of $\mu$. Using this $\mu$ and $\Lambda^*$, we can define a compound state of $\varphi$ and $\Lambda^*\varphi$ by
$$\Phi_{\mu,S} = \int_{\mathrm{ex}\,S} (\omega \otimes \Lambda^*\omega)\,\mu(d\omega),$$
which depends on the choice of $S$ and $\mu$, since $\mu$ is not necessarily unique.
Remark 10. (1) $\Phi_{\mu,S}$ is an extension of the classical compound source. In fact, let $A = C(X)$, $\mathbb B = C(Y)$ and $S = P(X)$. Let $\nu \in C(X, Y)$ with $K_\nu^* = \Lambda^*$. Then each $\mu$ has a unique decomposition
$$\mu = \int_X \delta_x\,\mu(dx),$$
where $\delta_x$ is the Dirac measure at $x \in X$. Hence we have that for $A \in \mathfrak X$ and $C \in \mathfrak Y$
$$\Phi_{\mu,S}(A \times C) = \int_X \delta_x(A)\,\Lambda^*\delta_x(C)\,\mu(dx) = \int_X 1_A(x)\,\nu(x, C)\,\mu(dx) = \int_A \nu(x, C)\,\mu(dx) = \mu \otimes \nu(A \times C).$$
(2) If $A = B(\mathcal H_1)$, $\mathbb B = B(\mathcal H_2)$, $\Lambda^* : \mathfrak S(A) \to \mathfrak S(\mathbb B)$ is a channel, and $\rho \in \mathfrak S(A) = T_1^+(\mathcal H_1)$ is written in a Schatten decomposition form
$$\rho = \sum_{n=1}^\infty \lambda_n\rho_n, \tag{5.2}$$
then
$$\rho \otimes \Lambda^*\rho = \sum_{n=1}^\infty \lambda_n\,\rho_n \otimes \Lambda^*\rho_n$$
is a compound state, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and the $\rho_n$'s are mutually orthogonal one-dimensional projections (cf. Schatten [2]).
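In finite dimensions, the marginals of the compound state $\sum_n \lambda_n\,\rho_n \otimes \Lambda^*\rho_n$ are $\rho$ and $\Lambda^*\rho$, mirroring the classical identity in (1). The channel below is an illustrative pinching, not one drawn from the text:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 3
# A density matrix and its Schatten decomposition rho = sum_n lambda_n p_n
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho).real
lam, U = np.linalg.eigh(rho)
proj = [np.outer(U[:, n], U[:, n].conj()) for n in range(d)]  # 1-dim projections

def channel(r):
    """An illustrative channel: pinching to the standard basis (diagonal part)."""
    return np.diag(np.diag(r))

# Compound state: sum_n lambda_n p_n (x) Lambda* p_n
compound = sum(lam[n] * np.kron(proj[n], channel(proj[n])) for n in range(d))

# Its marginals are rho and Lambda* rho (partial traces over each factor)
marg1 = np.trace(compound.reshape(d, d, d, d), axis1=1, axis2=3)
marg2 = np.trace(compound.reshape(d, d, d, d), axis1=0, axis2=2)
assert np.allclose(marg1, rho)
assert np.allclose(marg2, channel(rho))
```

The second marginal equals $\Lambda^*\rho$ by linearity of the channel, for any choice of trace-preserving positive map in place of the pinching.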
A quantum entropy and a relative entropy can be defined, and entropy transmission through a quantum channel can be discussed. A brief description regarding this will be given in the Bibliographical notes.
Bibliographical notes
4.1. Channels with a noise source. The content of this section is taken from Nakamura [3](1975).
4.2. Measurability of a channel. The content of this section is taken from Umegaki [7](1974). Remark 2.6, which was used in Section 3.5, is added. The role of dominated channels is clarified in connection with weak and strong measurabilities.
4.3. Approximation of channels. The metric $\rho(\cdot,\cdot)$ is introduced in Jacobs [1](1959) and Ding and Yi [1](1965). Lemma 3.2 is in Kakihara [3](1985). Lemma 3.3 and Proposition 3.4 are noted here. Theorems 3.6 and 3.7 are obtained in Umegaki [7]. Another type of distance was considered by Neuhoff and Shields [1](1982) using the $\bar d$-distance. Kieffer [4](1983) considered various topologies on $C(X, Y)$ and showed that the class of weakly continuous channels is dense in some types of channels under these topologies.
4.4. Harmonic analysis for channels. The idea of this section is taken from Kakihara [1, 2](1976, 1981) and [3], and Kakihara and Umegaki [1](1979).
4.5. Noncommutative channels. Echigo (Choda) and Nakamura [1](1962) and Choda and Nakamura [1, 2](1970, 1971) considered a noncommutative extension of classical channels. On the other hand, quantum channels were defined and studied by authors such as Davies [1](1977), Holevo [1](1977), Ingarden [1](1976) and Takahashi [1](1966). The present style of formulation of a (quantum) channel is due to Ohya [1](1981). A $\Sigma^*$-algebra formulation of a channel was considered by Ozawa [1](1977) (see also [2](1980)), which is a direct generalization of the classical channel, as was seen in Example 5.2 (2). Definition 5.4 through Remark 5.10 are due to Ohya [1], [2](1983).

von Neumann [2](1932) introduced an entropy for a state $\rho \in T_1^+(\mathcal H)$, defined by
$$H(\rho) = -\operatorname{tr}\rho\log\rho = -\sum_{n=1}^\infty (\phi_n, \rho\log\rho\,\phi_n),$$
where $\{\phi_n\}$ is a complete orthonormal basis of $\mathcal H$. If $\rho$ has the Schatten decomposition $\rho = \sum_{n=1}^\infty \lambda_n\rho_n$, then
$$H(\rho) = -\sum_{n=1}^\infty \lambda_n\log\lambda_n.$$
Thus the von Neumann entropy can be viewed as a generalization of the Shannon entropy, since $\{\lambda_n\}$ is an infinite probability distribution, i.e., $\lambda_n \ge 0$ $(n \ge 1)$ and $\sum_{n=1}^\infty \lambda_n = 1$. Basic properties of $H(\rho)$ are collected in the following:
(1) (Positivity) $H(\rho) \ge 0$ for $\rho \in T_1^+(\mathcal H)$.
(2) (Concavity) $H(\lambda\rho_1 + (1-\lambda)\rho_2) \ge \lambda H(\rho_1) + (1-\lambda)H(\rho_2)$ for $\lambda \in [0,1]$ and $\rho_1, \rho_2 \in T_1^+(\mathcal H)$.
(3) (Lower semicontinuity) $\rho_n \to \rho$ (weakly) $\Rightarrow$ $H(\rho) \le \liminf_{n\to\infty} H(\rho_n)$.
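In finite dimensions the eigenvalue formula $H(\rho) = -\sum_n \lambda_n\log\lambda_n$ and the properties above are easy to check numerically; the states below are illustrative:

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -tr(rho log rho), computed from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]               # 0 log 0 = 0 by convention
    return float(-np.sum(lam * np.log(lam)))

# A pure state has zero entropy; the maximally mixed state on C^d has log d
d = 4
pure = np.zeros((d, d)); pure[0, 0] = 1.0
mixed = np.eye(d) / d
assert np.isclose(von_neumann_entropy(pure), 0.0)
assert np.isclose(von_neumann_entropy(mixed), np.log(d))

# Concavity: H(l*r1 + (1-l)*r2) >= l*H(r1) + (1-l)*H(r2)
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d)); r1 = A @ A.T; r1 /= np.trace(r1)
B = rng.normal(size=(d, d)); r2 = B @ B.T; r2 /= np.trace(r2)
l = 0.3
assert von_neumann_entropy(l * r1 + (1 - l) * r2) >= \
       l * von_neumann_entropy(r1) + (1 - l) * von_neumann_entropy(r2) - 1e-12
```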
These properties, together with some other properties of the von Neumann entropy, were obtained by Lieb and Ruskai [1](1973), Lieb [1](1973) and Lindblad [1, 2, 3, 4](1972-1975). The von Neumann entropy was extended to a C*-dynamical setting by Ohya [2, 3]. Another type of entropy was introduced by Segal [1](1960), now known as the Segal entropy, which was developed by Ruskai [1](1973) and Ochs and Spohn [1](1978). Umegaki [1](1962) introduced what is now called the Umegaki relative entropy in a semifinite and $\sigma$-finite von Neumann algebra $A$. When $A = B(\mathcal H)$, for two states $\rho, \sigma \in T_1^+(\mathcal H)$ the Umegaki relative entropy $H(\rho|\sigma)$ is defined by
$$H(\rho|\sigma) = \begin{cases} \operatorname{tr}\rho(\log\rho - \log\sigma) & \text{if } s(\rho) \le s(\sigma),\\ +\infty & \text{otherwise,}\end{cases}$$
where $s(\rho)$ is the support projection of $\rho$. Araki [1, 2](1976, 1977) extended it to the case where $A$ is any von Neumann algebra, based on the Tomita-Takesaki theory (cf. Takesaki [1](1970)). Shortly afterwards, Uhlmann [1](1977) defined the relative entropy for a pair of positive linear functionals on a *-algebra, which is a generalization of Araki's definition and is in the framework of interpolation theory. Ohya [2], [3](1984) defined the relative entropy in a C*-algebra setting and applied it to entropy transmission in quantum channels (see also Ohya [4](1985)). Yuen and Ozawa [1](1993) showed the ultimate upper bound for information carried through a quantum channel. A related topic can be seen in Sugawara [1](1986).
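For faithful (full-support) finite-dimensional states, the formula $H(\rho|\sigma) = \operatorname{tr}\rho(\log\rho - \log\sigma)$ and Klein's inequality $H(\rho|\sigma) \ge 0$ can be checked directly; the states below are illustrative:

```python
import numpy as np

def logm_psd(r, eps=1e-12):
    """Matrix logarithm of a positive definite matrix via its spectrum."""
    w, V = np.linalg.eigh(r)
    return (V * np.log(np.clip(w, eps, None))) @ V.conj().T

def relative_entropy(rho, sigma):
    """Umegaki relative entropy H(rho|sigma) = tr rho (log rho - log sigma),
    assuming both states have full support."""
    return float(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real)

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d)); rho = A @ A.T; rho /= np.trace(rho)
B = rng.normal(size=(d, d)); sigma = B @ B.T; sigma /= np.trace(sigma)

assert relative_entropy(rho, sigma) >= -1e-10   # Klein's inequality
assert np.isclose(relative_entropy(rho, rho), 0.0)
```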
REFERENCES
R. L. Adler
[1] Ergodic and mixing properties of infinite memory channels. Proc. Amer. Math. Soc. 12 (1961), 924-930.
R. Ahlswede and I. Csiszár
[1] Hypothesis testing with communication constraints. IEEE Trans. Inform. Theory IT-32 (1986), 533-542.
M. A. Akcoglu
[1] A pointwise ergodic theorem in Lp-spaces. Canad. J. Math. 27 (1975), 1075-1082.
P. Algoet and T. Cover
[1] A sandwich proof of the Shannon-McMillan-Breiman theorem. Ann. Probab. 16 (1988), 899-909.
H. Araki
[1] Relative entropy of states of von Neumann algebras. Publ. RIMS Kyoto Univ. 11 (1976), 809-833.
[2] Relative entropy of states of von Neumann algebras II. Publ. RIMS Kyoto Univ. 13 (1977), 173-192.
S. Arimoto
[1] An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inform. Theory IT-18 (1972), 14-20.
R. Ash
[1] Information Theory. Interscience, New York, 1965.
[2] Real Analysis and Probability. Academic Press, New York, 1972.
R. R. Bahadur
[1] Sufficiency and statistical decision functions. Ann. Math. Statist. 25 (1954), 423-462.
O. Barndorff-Nielsen
[1] Subfields and loss of information. Z. Wahrscheinlichkeitstheorie 2 (1964), 369-379.
A. R. Barron
[1] The strong ergodic theorem for densities: Generalized Shannon-McMillan-Breiman theorem. Ann. Probab. 13 (1985), 1292-1303.
P. Billingsley
[1] Ergodic Theory and Information. Wiley, New York, 1965.
G. D. Birkhoff
[1] Proof of the ergodic theorem. Proc. Nat. Acad. Sci. U.S.A. 17 (1931), 656-660.
R. E. Blahut
[1] Computation of channel capacity and rate-distortion functions. IEEE Trans. Inform. Theory IT-18 (1972), 460-473.
[2] Hypothesis testing and information theory. IEEE Trans. Inform. Theory IT-20 (1974), 405-417.
J. R. Blum and D. L. Hanson
[1] On invariant probability measures. Pacific J. Math. 10 (1960), 1125-1129.
L. Boltzmann
[1] Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Wiener Berichte 63 (1872), 275-370.
[2] Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht. Wiener Berichte 76 (1877), 373-435.
L. Breiman
[1] The individual ergodic theorem of information theory. Ann. Math. Statist. 28 (1957), 809-811; Correction: ibid. 31 (1960), 809-810.
[2] On achieving channel capacity in finite-memory channels. Ill. J. Math. 4 (1960), 246-252.
J. R. Brown
[1] Ergodic Theory and Topological Dynamics. Academic Press, New York, 1976.
L. Carleson
[1] Two remarks on the basic theorems of information theory. Math. Scand. 6 (1958), 175-180.
H. Chernoff
[1] A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23 (1952), 493-507.
[2] Large sample theory: parametric case. Ann. Math. Statist. 27 (1956), 1-22.
G. Y. H. Chi and N. Dinculeanu
[1] Projective limits of measure preserving transformations on probability spaces. J. Multivariate Anal. 2 (1972), 404-417.
M. Choda and M. Nakamura
[1] A remark on the concept of channels II. Proc. Japan Acad. 46 (1970), 932-935.
[2] A remark on the concept of channels III. Proc. Japan Acad. 47 (1971), 464-469.
K. L. Chung
[1] A note on the ergodic theorem of information theory. Ann. Math. Statist. 32 (1961), 612-614.
I. P. Cornfeld, S. V. Fomin and Ya. G. Sinai
[1] Ergodic Theory. Springer-Verlag, Berlin, 1982.
I. Csiszár
[1] Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2 (1967), 299-318.
I. Csiszár and J. Körner
[1] Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.
E. B. Davies
[1] Quantum communication systems. IEEE Trans. Inform. Theory IT-23 (1977), 530-534.
N. Dinculeanu and C. Foias
[1] A universal model for ergodic transformations on separable measure space. Michigan Math. J. 13 (1966), 109-117.
[2] Algebraic models for measures. Ill. J. Math. 12 (1968), 340-351.
[3] Algebraic models for measure preserving transformations. Trans. Amer. Math. Soc. 134 (1968), 215-237.
Hu Guo Ding and Shen Shi Yi
[1] Some coding theorems for almost periodic channels. Chinese Math. 6 (1965), 437-455.
J. Diestel and J. J. Uhl, Jr.
[1] Vector Measures. Amer. Math. Soc., Providence, R.I., 1977.
J. Dixmier
[1] C*-algebras. North-Holland, New York, 1982.
R. L. Dobrushin
[1] General formulation of Shannon's main theorems in information theory. Amer. Math. Transl. 33 (1963), 323-438.
J. L. Doob
[1] Stochastic Processes. John Wiley & Sons, New York, 1953.
Y. N. Dowker
[1] Invariant measures and the ergodic theorems. Duke Math. J. 14 (1947), 1051-1061.
[2] Finite and σ-finite invariant measures. Ann. Math. 54 (1951), 595-608.
[3] On measurable transformations in finite measure space. Ann. Math. 62 (1955), 504-516.
N. Dunford and J. T. Schwartz
[1] Linear Operators, Part I. Interscience, New York, 1958.
M. Echigo (Choda) and M. Nakamura
[1] A remark on the concept of channels. Proc. Japan Acad. 38 (1962), 307-309.
A. D. Faddeev
[1] On the notion of entropy of a finite probability space. Uspekhi Mat. Nauk 11 (1956), 227-231 (in Russian).
R. H. Farrel
[1] Representation of invariant measures. Ill. J. Math. 6 (1962), 447-467.
A. Feinstein
[1] A new basic theorem of information theory. IRE Trans. Inform. Theory P.G.I.T. 4 (1954), 2-22.
[2] Foundations of Information Theory. McGraw-Hill, New York, 1958.
[3] On the coding theorem and its converse for finite-memory channels. Inform. Control 2 (1959), 25-44.
C. Foias
[1] Automorphisms of compact abelian groups as models for measure-preserving transformations. Michigan Math. J. 13 (1966), 349-352.
R. J. Fontana, R. M. Gray and J. C. Kieffer
[1] Asymptotically mean stationary channels. IEEE Trans. Inform. Theory IT-27 (1981), 308-316.
R. G. Gallager
[1] Information Theory and Reliable Communication. Wiley, New York, 1968.
I. M. Gel'fand, A. N. Kolmogorov and A. M. Yaglom
[1] On the general definition of the amount of information. Dokl. Akad. Nauk SSSR 111 (1956), 745-748 (in Russian).
S. G. Ghurge
[1] Information and sufficient subfields. Ann. Math. Statist. 38 (1968), 2056-2066.
R. M. Gray
[1] Probability, Random Processes, and Ergodic Properties. Springer-Verlag, New York, 1988.
[2] Entropy and Information Theory. Springer-Verlag, New York, 1990.
R. M. Gray and L. D. Davisson
[1] The ergodic decomposition of stationary discrete random processes. IEEE Trans. Inform. Theory IT-20 (1974), 625-636.
R. M. Gray, M. O. Durham and R. L. Gobbi
[1] Ergodicity of Markov channels. IEEE Trans. Inform. Theory IT-33 (1987), 656-664.
R. M. Gray and J. C. Kieffer
[1] Asymptotically mean stationary measures. Ann. Probab. 8 (1980), 962-973.
R. M. Gray, D. L. Neuhoff and P. C. Shields
[1] A generalization of Ornstein's d̄ distance with application to information theory. Ann. Probab. 3 (1975), 315-328.
R. M. Gray and D. S. Ornstein
[1] Block coding for discrete stationary d̄-continuous noisy channels. IEEE Trans. Inform. Theory IT-25 (1979), 292-306.
R. M. Gray and F. Saadat
[1] Block source coding theory for asymptotically mean stationary sources. IEEE Trans. Inform. Theory IT-30 (1984), 54-68.
S. Guiasu
[1] Information Theory with Applications. McGraw-Hill, New York, 1977.
P. R. Halmos
[1] Lectures on Ergodic Theory. Math. Soc. Japan, Tokyo, 1956.
[2] Entropy in Ergodic Theory. Lecture Notes, University of Chicago Press, Chicago, 1959.
P. R. Halmos and L. J. Savage
[1] Application of the Radon-Nikodym theorem to the theory of sufficient statistics. Ann. Math. Statist. 20 (1949), 225-241.
Te Sun Han and K. Kobayashi
[1] Exponential-type error probabilities for multiterminal hypothesis testing. IEEE Trans. Inform. Theory IT-35 (1989), 2-14.
[2] The strong converse theorem for hypothesis testing. IEEE Trans. Inform. Theory IT-35 (1989), 178-180.
R. V. L. Hartley
[1] Transmission of information. Bell Sys. Tech. J. 7 (1928), 535-563.
E. Hille and R. S. Phillips
[1] Functional Analysis and Semi-groups. Amer. Math. Soc., Providence, R.I., 1957.
W. Hoeffding
[1] Asymptotically optimal tests for multinomial distributions. Ann. Math. Statist. 36 (1965), 369-401, 401-408.
A. S. Holevo
[1] Problems in the mathematical theory of quantum communication channels. Rep. Math. Phys. 12 (1977), 273-278.
R. S. Ingarden
[1] Quantum information theory. Rep. Math. Phys. 10 (1976), 43-73.
K. Jacobs
[1] Die Übertragung diskreter Informationen durch periodische und fastperiodische Kanäle. Math. Annalen 131 (1959), 125-135.
[2] Über die Durchlasskapazität periodischer und fastperiodischer Kanäle. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1959, Ed. by J. Kozesnik, Academic Press, New York, pp. 231-251, 1960.
[3] Über die Struktur der mittleren Entropie. Math. Zeit. 78 (1962), 33-43.
[4] Über Kanäle vom Dichtetypus. Math. Zeit. 78 (1962), 151-170.
[5] Ergodic decomposition of the Kolmogorov-Sinai invariant. In: Ergodic Theory, Ed. by F. B. Wright, Academic Press, New York, pp. 173-190, 1963.
[6] Almost periodic sources and channels. Z. W. Verw. Geb. 9 (1967), 65-84.
M. Jimbo and K. Kunisawa
[1] An iteration method for calculating the relative capacity. Inform. Control 43 (1979), 216-223.
Y. Kakihara
[1] Information channels between compact groups. Res. Rep. Inform. Sci. Tokyo Institute of Technology, No. A-28, May 1976.
[2] Some remarks on information channels. Res. Rep. Inst. Inform. Sci. Tech. Tokyo Denki Univ. 7 (1981), 33-45.
[3] Stochastic processes with values in a Hilbert space and information channels. Doctoral Thesis, Tokyo Institute of Technology, March 1985.
[4] Ergodicity of asymptotically mean stationary channels. J. Multivariate Anal. 39 (1991), 315-323.
[5] Ergodicity and extremality of AMS sources and channels. Submitted.
Y. Kakihara and H. Umegaki
[1] Harmonic analysis and information channels. In: 1978-Seminar on Applied Functional Analysis, Ed. by H. Umegaki, Yurinsha, Tokyo, pp. 19-23, 1979.
S. Kakutani
[1] Examples of ergodic measure-preserving transformations which are weakly mixing but not strongly mixing. In: Lecture Notes in Mathematics #318, Springer, New York, pp. 143-149, 1973.
G. Kallianpur
[1] On the amount of information contained in a σ-field. In: Contributions to Probability and Statistics - Essays in Honor of Harold Hotelling, Stanford University Press, Stanford, 1960.
J. N. Kapur
[1] Maximum Entropy Models in Science and Engineering. Wiley Eastern Limited, New Delhi, 1989.
J. N. Kapur and H. K. Kesavan
[1] Entropy Optimization Principles with Applications. Academic Press, New York, 1992.
Y. Katznelson and B. Weiss
[1] A simple proof of some ergodic theorems. Israel J. Math. 42 (1982), 291-296.
A. I. Khintchine
[1] The concept of entropy in probability theory. Uspekhi Mat. Nauk 8 (1953), 3-20 (in Russian).
[2] On the fundamental theorems of information theory. Uspekhi Mat. Nauk 11 (1956), 17-75 (in Russian).
[3] Mathematical Foundations of Information Theory. Dover, New York, 1958.
J. C. Kieffer
[1] A simple proof of the Moy-Perez generalization of the Shannon-McMillan theorem. Pacific J. Math. 51 (1974), 203-206.
[2] A general formula for the capacity of stationary nonanticipatory channels. Inform. Control 26 (1974), 381-391.
[3] A generalized Shannon-McMillan theorem for the action of an amenable group on a probability space. Ann. Probab. 3 (1975), 1031-1037.
[4] Some topologies on the set of discrete stationary channels. Pacific J. Math. 105 (1983), 359-385.
J. C. Kieffer and M. Rahe
[1] Markov channels are asymptotically mean stationary. SIAM J. Math. Anal. 12 (1981), 293-305.
A. N. Kolmogorov
[1] A new metric invariant of transitive dynamical systems and automorphisms in Lebesgue spaces. Dokl. Akad. Nauk SSSR 119 (1958), 861-864 (in Russian).
[2] On the entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 124 (1959), 754-755 (in Russian).
B. O. Koopman and J. von Neumann
[1] Dynamical systems of continuous spectra. Proc. Nat. Acad. Sci. U.S.A. 18 (1932), 255-263.
U. Krengel
[1] Ergodic Theorems. De Gruyter Series in Mathematics, De Gruyter, New York, 1985.
N. Kryloff and N. Bogoliouboff
[1] La théorie de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire. Ann. Math. 38 (1937), 65-113.
S. Kullback
[1] Information Theory and Statistics. Wiley, New York, 1959.
S. Kullback and R. A. Leibler
[1] On information and sufficiency. Ann. Math. Statist. 22 (1951), 79-86.
E. H. Lieb
[1] Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. in Math. 11 (1973), 267-288.
E. H. Lieb and M. B. Ruskai
[1] Proof of the strong subadditivity of quantum mechanical entropy. J. Math. Phys. 14 (1973), 1938-1941.
G. Lindblad
[1] An entropy inequality for quantum mechanics. Comm. Math. Phys. 28 (1972), 245-249.
[2] Entropy, information and quantum measurements. Comm. Math. Phys. 33 (1973), 305-322.
[3] Expectations and entropy inequalities for finite quantum systems. Comm. Math. Phys. 39 (1974), 111-119.
[4] Completely positive maps and entropy inequalities. Comm. Math. Phys. 40 (1975), 147-151.
N. F. G. Martin and J. W. England
[1] Mathematical Theory of Entropy. Addison-Wesley, New York, 1981.
B. McMillan
[1] The basic theorems of information theory. Ann. Math. Statist. 24 (1953), 196-219.
Shu-Teh C. Moy
[1] Asymptotic properties of derivatives of stationary measures. Pacific J. Math. 10 (1960), 1371-1383.
[2] Generalizations of Shannon-McMillan theorem. Pacific J. Math. 11 (1961), 705-714.
[3] A note on generalizations of Shannon-McMillan theorem. Pacific J. Math. 11 (1961), 1459-1465.
K. Nakagawa and F. Kanaya
[1] On the converse theorem in statistical hypothesis testing. IEEE Trans. Inform. Theory IT-39 (1993), 623-628.
[2] On the converse theorem in statistical hypothesis testing for Markov chains. IEEE Trans. Inform. Theory IT-39 (1993), 629-633.
Y. Nakamura
[1] Measure-theoretic construction for information theory. Kodai Math. Sem. Rep. 21 (1969), 133-150.
[2] A non-ergodic compound source with a mixing input source and an Adler ergodic channel. Kodai Math. Sem. Rep. 22 (1970), 159-165.
[3] Ergodicity and capacity of information channels with noise sources. J. Math. Soc. Japan 27 (1975), 213-221.
J. Nedoma
[1] The capacity of a discrete channel. In: Trans. First Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, pp. 143-181, 1957.
[2] On non-ergodic channels. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1959, Academic Press, New York, pp. 363-395, 1960.
[3] Die Kapazität der periodischen Kanäle. Z. Wahrscheinlichkeitstheorie 2 (1963), 98-110.
D. L. Neuhoff and P. C. Shields
[1] Channel distances and representation. Inform. Control 55 (1982), 238-264.
W. Ochs and H. Spohn
[1] A characterization of the Segal entropy. Rep. Math. Phys. 14 (1978), 75-87.
M. Ohya
[1] Quantum ergodic channels in operator algebras. J. Math. Anal. Appl. 84 (1981), 318-328.
[2] On compound state and mutual information in quantum information theory. IEEE Trans. Inform. Theory IT-29 (1983), 770-774.
[3] Entropy transmission in C*-dynamical systems. J. Math. Anal. Appl. 100 (1984), 222-235.
[4] State change and entropies in quantum dynamical systems. In: Lecture Notes in Mathematics #1136, Quantum Probability and Applications II, Springer, Berlin, pp. 397-408, 1985.
N. Oishi
[1] Notes on ergodicity and mixing property. Proc. Japan Acad. 41 (1965), 767-770.
D. S. Ornstein
[1] Bernoulli shifts with the same entropy are isomorphic. Adv. in Math. 4 (1970), 337-352.
[2] Ergodic Theory, Randomness, and Dynamical Systems. Yale University Press, New Haven, 1974.
D. S. Ornstein and P. C. Shields
[1] An uncountable family of K-automorphisms. Adv. in Math. 10 (1973), 63-88.
D. S. Ornstein and B. Weiss
[1] The Shannon-McMillan-Breiman theorem for a class of amenable groups. Israel J. Math. 44 (1983), 53-60.
J. C. Oxtoby
[1] Ergodic sets. Bull. Amer. Math. Soc. 58 (1952), 116-136.
M. Ozawa
[1] Channel operators and quantum measurements. Res. Rep. Inform. Sci. Tokyo Institute of Technology, No. A-29, May 1977.
[2] Optimal measurements for general quantum systems. Rep. Math. Phys. 18 (1980), 11-28.
W. Parry
[1] Entropy and Generators in Ergodic Theory. Benjamin, New York, 1969.
[2] Topics in Ergodic Theory. Cambridge University Press, Cambridge, 1981.
K. R. Parthasarathy
[1] On the integral representation of the rate of transmission of a stationary channel. Ill. J. Math. 5 (1961), 299-305.
[2] A note on McMillan's theorem for countable alphabets. In: Trans. Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 541-543, 1964.
[3] Probability Measures on Metric Spaces. Academic Press, New York, 1967.
A. Perez
[1] Information theory with an abstract alphabet. Generalized forms of McMillan's limit theorem for the case of discrete and continuous times. Theory Probab. Appl. 4 (1959), 99-102.
[2] Extensions of Shannon-McMillan's limit theorems to more general stochastic processes. In: Trans. Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 545-574, 1964.
K. Petersen
[1] Ergodic Theory. Cambridge University Press, Cambridge, 1983.
J. R. Pierce
[1] The early days of information theory. IEEE Trans. Inform. Theory IT-19 (1973), 3-8.
M. S. Pinsker
[1] Information and Information Stability of Random Variables and Processes. Holden Day, San Francisco, 1964.
M. M. Rao
[1] Foundations of Stochastic Analysis. Academic Press, New York, 1981.
[2] Probability Theory with Applications. Academic Press, New York, 1984.
[3] Conditional Measures and Applications. Marcel Dekker, New York, 1993.
O. W. Rechard
[1] Invariant measures for many-one transformations. Duke Math. J. 23 (1956), 477-488.
A. Rényi
[1] On mixing sequences of sets. Acta Math. Acad. Sci. Hungar. 9 (1958), 215-228.
S. M. Rudolfer
[1] On characterizations of mixing properties of measure-preserving transformations. Math. Systems Theory 3 (1969), 86-94.
M. B. Ruskai
[1] A generalization of entropy using trace on von Neumann algebras. Ann. Inst. Henri Poincaré 19 (1973), 357-373.
S. Sakai
[1] C*-algebras and W*-algebras. Springer-Verlag, New York, 1971.
R. Schatten
[1] A Theory of Cross-Spaces. Ann. Math. Studies No. 26, Princeton University Press, Princeton, 1950.
[2] Norm Ideals of Completely Continuous Operators. Springer-Verlag, New York, 1960.
I. E. Segal
[1] A note on the concept of entropy. J. Math. Mech. 9 (1960), 623-629.
C. E. Shannon
[1] A mathematical theory of communication. Bell System Tech. J. 27 (1948), 379-423, 623-656.
C. E. Shannon and W. Weaver
[1] The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1949.
P. C. Shields
[1] The Theory of Bernoulli Shifts. University of Chicago Press, Chicago, 1973.
Ya. G. Sinai
[1] On the concept of entropy for dynamical systems. Dokl. Akad. Nauk SSSR 124 (1959), 768-771 (in Russian).
D. Slepian
[1] Information theory in the fifties. IEEE Trans. Inform. Theory IT-19 (1973), 145-148.
[2] Key Papers in the Development of Information Theory. IEEE Press, 1974.
C. Stein
[1] Information and comparison of experiments. Unpublished.
A. Sugawara
[1] On mathematical information channels with a non-commutative intermediate system. J. Math. Anal. Appl. 114 (1986), 1-6.
H. Takahashi
[1] Information theory of quantum mechanical channels. In: Advances in Communication Systems, Vol. 1, Academic Press, New York, pp. 227-310, 1966.
K. Takano
[1] On the basic theorems of information theory. Ann. of the Inst. Statist. Math. (Tokyo) 9 (1958), 53-77.
M. Takesaki
[1] Tomita's theory of modular Hilbert algebras and its applications. Lecture Notes in Mathematics #128, Springer, Berlin, 1970.
[2] Theory of Operator Algebras I. Springer-Verlag, New York, 1979.
I. P. Tsaregradskii
[1] A note on the capacity of a stationary channel with finite memory. Theory Probab. Appl. 3 (1958), 79-91.
A. I. Tulcea
[1] Contributions to information theory for abstract alphabets. Arkiv för Mat. 4 (1960), 235-247.
A. I. Tulcea and C. I. Tulcea
[1] Topics in the Theory of Lifting. Springer-Verlag, Berlin, 1969.
H. Tverberg
[1] A new derivation of the information function. Math. Scand. 6 (1958), 297-298.
H. Uhlmann
[1] Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in interpolation theory. Comm. Math. Phys. 54 (1977), 21-32.
H. Umegaki
[1] Conditional expectation in an operator algebra IV (entropy and information). Kodai Math. Sem. Rep. 14 (1962), 59-85.
[2] Entropy functionals in stationary channels. Proc. Japan Acad. 38 (1962), 668-672.
[3] A functional method on amount of entropy. Kodai Math. Sem. Rep. 15 (1963), 162-175.
[4] General treatment of alphabet-message space and integral representation of entropy. Kodai Math. Sem. Rep. 16 (1964), 18-26.
[5] A functional method for stationary channels. Kodai Math. Sem. Rep. 16 (1964), 27-39; Supplement and correction, ibid. 189-190.
[6] Representations and extremal properties of averaging operators and their application to information channels. J. Math. Anal. Appl. 25 (1969), 41-73.
[7] Absolute continuity of information channels. J. Multivariate Anal. 4 (1974), 382-400.
[8] Operator Algebras and Mathematical Information Theory, Selected Papers. Kaigai, Tokyo, 1985.
H. Umegaki and M. Ohya
[1] Entropies in Probabilistic Systems - Information Theory in Functional Analysis I. Kyoritsu, Tokyo, 1983 (in Japanese).
[2] Quantum Mechanical Entropies - Information Theory in Functional Analysis II. Kyoritsu, Tokyo, 1984 (in Japanese).
A. J. Viterbi
[1] Information theory in the sixties. IEEE Trans. Inform. Theory IT-19 (1973), 257-262.
J. von Neumann
[1] Proof of the quasi-ergodic hypothesis. Proc. Nat. Acad. Sci. U.S.A. 18 (1932), 70-82.
[2] Die mathematischen Grundlagen der Quantenmechanik. Springer-Verlag, Berlin, 1932.
P. Walters
[1] An Introduction to Ergodic Theory. Springer-Verlag, New York, 1982.
K. Winkelbauer
[1] Communication channels with finite past history. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1959, Ed. by J. Kozesnik, Academic Press, New York, pp. 685-831, 1960.
J. Wolfowitz
[1] Coding Theorems of Information Theory (Third Ed.). Springer-Verlag, New York, 1978.
Shen Shi Yi
[1] The fundamental problem of stationary channel. In: Trans. Third Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 637-639, 1964.
[2] Basic problems concerning stationary channels. Advancement in Math. 7 (1964), 1-38 (in Chinese).
H. P. Yuen and M. Ozawa
[1] Ultimate information carrying limit of quantum systems. Phys. Rev. Lett. 70 (1993), 363-366.
Zhang Zhao Zhi
[1] Some results obtained with almost-periodic channels. Chinese Math. 6 (1965), 428-436.
INDICES
Notation index
Chapter I
P(AI~), 11
N, 1
E(f), 12
p = (p₁, ..., pₙ), 1
p(·), 1
(X, p), 1
L∞(X), 12
‖f‖₁, 12
H(X), 1
RHS, 1
H(p), 1
H(p₁, ..., pₙ), 1
Δₙ, 2
p(xⱼ, yₖ), 2
p(xⱼ|yₖ), 2
H(X|Y), 2
H(X|y), 2
I(X, Y), 2
H(X, Y), 2
H(p|q), 2
ℝ, 4
ℝ₊, 4
ℝ̄₊, 4
LHS, 9
(X, 𝔛, μ), 11
μ∘S⁻¹, 11
(X, 𝔛, μ, S), 11
L¹(X), 11
μt, 11
μt ≪ μ, 11
E(f|𝔉), 11
2, 12
Ilflloo, 13 Ilfll p , 14 PL 2(q)) , 14
L 2(X), 14 (f, 9 )2, 14 f ° S, 14 ~n t~, 15 a(·), 15 P(~),
16 2tVQ3, 16 S-I2t, 16 2t Q3, 16 H(2t) , 17 I(2t),17 I(2tI~), 17 H(2tI~), 17 2l, 17 ~1 V~2, 17 J-l(AIB),17 H(2t, S), 20
<
H(S),20 Z,22
[f > t], 24
S₁ ≅ S₂, 25
X~, 27
[x9 . . . x q] 27 't J ' 001, 28 A~B, 29
A,29 23J-L' 29 t-t1 ~ t-t2, 29 r(t-t), 31 C,31 C, 31
S,47 JJ-L' 48 EJ-L(·I JJ-L),48 H(€,Q{,8),48 H~(t-tlv), 49 H(t-tl v),49 t-te(A),55 dt,55 I(€I1J), 55 t-t1 ~ t-t2, 57 t-t1 .L t-t2, 57 SJ1 « 001, 57 001 ~ SJ1, 57 (3(k,e), 61
Chapter II X~, 67
Z,67 [x9 . . .x q] 68 't J ' 001, 68 dO(ai, aj), 68 d(x, x'), 68
prk(·),
69
C(X),69 B(X),69 M(X),69 P(X),69 Ps(X),69 P s e(X),69 A(001),69 C(X)*, 70 AJ-L(f),70 1€I(X),70 t-t(f), 70 S,71 LP(X, t-t), 71 Sn,71 S,71 exPs(X),71
co [exPs(X)] , 71 !s,72
!,1,72 TI ·112,/L' 74 E/L(·IJ), 75 11·llp,/L' 75 <5{···},75 S ffi 1£, 75 const, 75 X x~, 77
J/L' 79 Pr(·), 80 1£.1.,83 Z+,83 I n,83
IJn Jnl, 83 X u, 85
M x(!),106 /-tx, 106 B(X,S),107 !Q, 107 R,108 !#,109 H(/-t) , 111 G/L' 111 ~o, 113 ~, 113 G(X,2{), 113 P(X, ~), 113 (J,114 R,114 Ps(X, ~), 115 2{, S), 118
m.,
/-t
j1(A), 91 Pa (X ), 91 M a(X),92 M s (X ), 92
Chapter III
/-t~1J,
C(X,Y),122
a
93 (8/-t)a, 93 (8/-t)s, 93 X oo,93 Pa e (X ), 97 [X, /-t], 99 VJtn , 99 H/L(VJt n), 99 rot~, 99 VJt~, 100 H(/-t) , 100 H(/-t, S), 100 I/L(2t) , 100 I/L(2tI~), 100
VJtn,g, 104 VJtn,b, 104 h/L' 105 Mn,x(!), 106 Q,106
(X
X Y,X®~,S®T),
[X, t/, Y], 122 Ex, 122
Cs(X, Y), 122 p(bklaj), 122 P = (P(bklaj))J,. k' 122 /-tV, 123 /-t®V, 123 V1J(X' G), 123 8®T, 125 (8 ® T)n, 125 0:F, 126 A(q»,126 ¢ 0'l/J, 126 ¢* 0'l/J*, 126 ®,\:F, 126 G(X) ®,\ :F, 126 G(X ; :F), 126 K v,127 (., .), 128
e
e
121
s., 129 A(X, Y), 130 As(X, Y), 130 IC(X, Y), 130 IC s(X,Y),130 A == A v , 130 K == x., 130 VI == V2 (mod P), 133 K I == K 2 (mod P), 133 Al == A 2 (mod P), 133 Cse(X, Y), 137 9.ny, 138
u"(x, C), 141 Mt(Y),144 v E exCs(X, Y) (mod P), 147
K E exICs(X,Y) (modP), 152 A E exAs(X,Y) (rnod P), 152 Ca(X, Y), 156 tu», C), 160 c.: (X, Y), 163 9.n~ (X), 166 9.n~ (Y), 166 9.n~(X x Y), 166 In (X, Y), 166 In(/-l; v), 166 I(/-l; v), 167 Cs(v), 167, 174, 176 Ce(v), 167, 174, 176 n; (/-l) , 167 v(A, C), 168 9l(/-l; v), 171, 175 Ps(X,~x), 173 Pse(X, ~x), 173 /-ll~x, 173
/-lr, 174 A(9.n~ (Y)), 178 vep, 181 CPr, 181 vep, 181 En, 181
Chapter IV v'ljJ,(, 190 r(j;,193 (¢, ¢*), 197
11,11£,197
V, 197 'fig, 198 v x , 198 k(x, y), 199 LI(X; £), 199 11
LI(X) 0 c, 200 ')1(,), 200 LI(X) 0, e. 200 p(', '), 202 Sy, 207 G,207 (t, X), 207 m,207 1* g, 207 e"",207 P == P(X, G), 207 Q == Q(X,LI(G)), 208 ¢v, 208 {U x ( ' ) ' 1-lx, ex} xEX' 209 {Vx(')' 1-lx, ex} xEX' 209 (-, -)x, 209 [Ix], 209 %,209,210 ~, 210 T,210
r., 210
Qs, 210 ex.T', (modPse(X)), 211 ex Qs (rnodPse(X)), 211 (-, -)/-£,211
1-l/-£, 211 [f]/-£,211 {U/-£(-), 1-l/-£, e/-£}, 212
{VIL(o), tilL' gIL}' 212 Ff,213 C o(G),213 fl(G),214 A,214 6(A), 214 a(G),214 (A,6(A),a(G)),214 A*, 214 C(A, JB), 215 an .!!+ a, 215 C(X)CT, 215 B(ti), 215 T(ti), 215 Tt (ti), 215 trf-), 215 I(a) = I(a, A), 216 Imj-}, 216 K(a) = K(a, A), 216 a(a), 216 WM(a) = WM(a, A), 216 Cs (A,JB), 217 Cse(A, JB), 217 Ck(A, JB), 217 Cwm(A, JB), 217 A0JB, 217 al == a2 (modS), 218 Ai == A2 (modS), 218 {ti~,~~,x~,u~}, 219 ~(JB)', 219 lB, 220 [a, b], 220 <J? IL,S, 221 H(p),222 H(pIO"), 223 s(p), 223
Author index
A
Adler, R. L., 187
Ahlswede, R., 64
Akcoglu, M. A., 119
Algoet, P., 119
Araki, H., 223
Arimoto, S., 188
Ash, R., 63, 64
B
Bahadur, R. R., 64
Barndorff-Nielsen, O., 64
Bernoulli, J., 27, 69
Billingsley, P., 63
Birkhoff, G. D., 72, 119
Blahut, R. E., 64, 188
Blum, J. R., 119
Bogoliouboff, N., 120
Boltzmann, L., 63
Breiman, L., 64, 102, 119, 120, 188
Brown, J. R., 28, 63
C
Carleson, L., 188
Chernoff, H., 64
Chi, G. Y. H., 64
Choda, M., 185, 222
Chung, K. L., 119
Clausius, R., 63
Cornfeld, I. P., 63
Cover, T., 119
Csiszár, I., 63, 64
D
Davies, E. B., 222
Davisson, L. D., 120
Diestel, J., 196, 199
Dinculeanu, N., 64
Ding, Huo Guo, 119, 120, 188, 222
Dixmier, J., 209, 213, 214
Dobrushin, R. L., 188
Doob, L., 64
Dowker, Y. N., 119
Dunford, N., 70, 71, 91
Durham, M. O., 189
E
Echigo (Choda), M., 187, 222
England, J. W., 63
F
Faddeev, A. D., 7, 64
Farrel, R. H., 119
Feinstein, A., 63, 64, 177, 187, 188
Foiaş, C., 64
Fomin, S. V., 63
Fontana, R. J., 119, 187, 188
G
Gallager, R. G., 63
Gauss, C. F., 56, 57
Gel'fand, I. M., 64
Ghurge, S. G., 64
Gobbi, R. L., 187
Gray, R. M., 63, 110, 119, 120, 187, 188
Guiasu, S., 63
H
Hahn, H., 43
Halmos, P. R., 63, 64
Han, Te Sun, 64
Hanson, D. L., 119
Hartley, R. V. L., 63
Hille, E., 196
Hoeffding, W., 64
Hölder, O., 14
Holevo, A. S., 222
I
Ingarden, R. S., 222
J
Jacobs, K., 64, 119, 120, 187, 222
Jensen, J. L. W. W., 13
Jimbo, M., 188
K
Kakihara, Y., 119, 188, 222
Kakutani, S., 119
Kallianpur, G., 64
Kanaya, F., 64
Kapur, J. N., 63
Katznelson, I., 119
Kesavan, H. K., 63
Khintchine, A. Y., 7, 63, 64, 187, 188
Kieffer, J. C., 119, 120, 187, 188, 222
Kobayashi, K., 64
Kolmogorov, A. N., 20, 25, 64
Koopman, B. O., 119
Körner, J., 63
Krengel, U., 63
Kryloff, N., 120
Kullback, S., 55, 63, 64
Kunisawa, K., 188
L
Lebesgue, H., 93
Leibler, R. A., 55, 64
Lieb, E. H., 223
Lindblad, G., 223
M
Markov, A. A., 28
Martin, N. F. G., 63
McMillan, B., 102, 119, 187
Moy, Shu-Teh C., 120
N
Nakagawa, K., 64
Nakamura, M., 187, 222
Nakamura, Y., 119, 120, 187, 188, 222
Nedoma, J., 188
Neuhoff, D. L., 188, 222
O
Ochs, W., 223
Ohya, M., 63, 187, 222, 223
Oishi, N., 117
Ornstein, D., 28, 63, 64, 120, 188
Oxtoby, J. C., 120
Ozawa, M., 222, 223
P
Parry, W., 63
Parthasarathy, K. R., 64, 120, 188
Perez, A., 120
Petersen, K., 63
Phillips, R. S., 196
Pierce, J. R., 64
Pinsker, M. S., 63
R
Rahe, M., 188
Rao, M. M., 16, 61, 64, 75
Rechard, O. W., 119
Rényi, A., 119
Rudolfer, S. M., 119
Ruskai, M. B., 223
S
Saadat, F., 119
Sakai, S., 214
Savage, L. J., 64
Schatten, R., 126, 199, 222
Schwartz, J. T., 70, 71, 91
Segal, I. E., 223
Shannon, C. E., 1, 7, 63, 64, 102, 119, 182, 184, 188
Shields, P. C., 63, 64, 188, 222
Sinai, Ya. G., 20, 25, 63, 64
Slepian, D., 64
Spohn, H., 223
Stein, C., 64
Sugawara, A., 223
T
Takahashi, H., 222
Takano, K., 187, 188
Takesaki, M., 214, 223
Tsaregradskii, I. P., 188
Tulcea, A. I., 31, 118
Tulcea, C. I., 31
Tverberg, H., 64
U
Uhl Jr., J. J., 196, 199
Uhlmann, H., 223
Umegaki, H., 63, 64, 119, 120, 187, 188, 222, 223
V
Viterbi, A. J., 64
von Neumann, J., 74, 119, 222
W
Walters, P., 28, 63, 88
Weaver, W., 63
Weiss, B., 119, 120
Winkelbauer, K., 64, 188
Wolfowitz, J., 188
Y
Yaglom, A. M., 64
Yi, Shen Shi, 119, 120, 187, 188, 222
Yuen, H. P., 223
Z
Zhi, Zhang Zhao, 188
Subject index
A
absolutely continuous, 11, 57
abstract channel, 123
additivity, 4
algebraic dynamical system, 39
algebraic measure system, 35
algebraic model, 36, 39
algebraic tensor product, 126
α-invariant, 216
α-KMS, 216
α-abelian, 220
alphabet, 65
__ message space, 68
AMS, 91, 156
analytic element, 220
aperiodic, 87
approximate identity, 207
asymptotically dominate, 93
asymptotically independent, 139
asymptotically mean stationary (source), 91
asymptotically mean stationary (channel), 156
average (of a channel), 141
averaging operator, 129
stationary __, 130
B
barycenter, 221
Bernoulli shift, 27, 69
Bernoulli source, 69
β-analytic element, 220
Birkhoff Pointwise Ergodic Theorem, 72
block code, 181
bounded *-representation, 209
boundedness, 12
C
C*-algebra, 214
C*-dynamical system, 214
C*-tensor product, 217
capacity
stationary __, 167, 174, 176
ergodic __, 167, 174, 176
chain, 57
channel, 122, 214
__ distribution, 122
__ matrix, 122
__ of additive noise, 192
__ of product noise, 195
__ operator, 130
abstract __, 123
asymptotically mean stationary __, 156
classical-quantum __, 216
constant __, 123
continuous __, 122
dominated __, 123
ergodic __, 136, 147, 163, 217
induced __, 181
integration __, 190
KMS __, 217
m-dependent __, 137
m-memory __, 122
memoryless __, 122
noiseless __, 181
quantum __, 215
quantum-classical __, 216
stationary __, 122, 217
strongly mixing (SM) __, 139
weakly mixing (WM) __, 139, 217
chunk, 57
classical-quantum channel, 216
clopen, 68
code, 181
__ length, 181
block __, 181
commutant, 219
complete for ergodicity, 135
complete invariant, 28
complete system of events, 1
completely positive, 214
compound scheme, 2
compound source, 123
compound space, 121
compound state, 221
trivial __, 221
concave, 3
concavity, 5
conditional entropy, 2, 17
__ function, 17
conditional expectation, 11
conditional probability, 11
conjugate, 29, 38
CONS, 38
constant channel, 123
continuity, 4
continuous (channel), 122
convex, 13
cyclic vector, 209
cylinder set, 27, 68
D
density zero, 83
Dirac measure, 128
dominated, 57
dominated (channel), 123, 197
Dominated Convergence Theorem, 12
dynamical system, 11
E
E-valued simple function, 197
entropy, 1
equipartition property, 104
__ function, 17, 49
__ functional, 42
__ of a finite scheme, 1
__ of a measure preserving transformation, 20, 46
__ of a partition, 17
conditional __, 2, 17
conditional __ function, 17
Kolmogorov-Sinai __, 20
relative __, 2, 49
Segal __, 223
Shannon __, 1
universal __ function, 117, 118
von Neumann __, 223
equivalent, 57
ergodic
__ capacity, 167, 174, 176
__ channel, 136, 147, 163, 217
__ decomposition, 109
__ source, 69, 97, 99
__ theorem, 72, 74
Pointwise __, 72
Mean __, 74
expectation, 12
extendability, 4
extremal, 147, 152, 211
extreme point, 71
E E-valuedsimple function, 197 entropy, 1 equipartition property, 104 __ function, 17, 49
G
Gaussian (probability measure), 57
Gaussian (random variable), 56
generator, 176
greatest crossnorm, 200
GNS construction, 209
GNS representation, 219
H
Hahn decomposition, 43
Hilbert-Schmidt type (channel), 205
Hilbert-Schmidt type (operator), 205
Hölder's Inequality, 14
homogeneous, 59
hypothesis testing, 60
I
idempotency, 12
identical (mod P), 133
induced channel, 181
information, 1
__ source, 69
Kullback-Leibler __, 55
mutual __, 2, 167
stationary __ source, 69
injective tensor product, 126
input source, 123
input space, 121
integration channel, 190
intertwining property, 217
invariant measure, 106
invertible, 11
irreducible, 80
isomorphic, 26, 35, 39
isomorphism, 26
J
Jensen's Inequality, 13
K
KMS channel, 217
KMS state, 216
Kolmogorov-Sinai entropy, 20
Kolmogorov-Sinai Theorem, 25
Kullback-Leibler information, 55
L
least crossnorm, 126
Lebesgue decomposition, 93
lifting, 31
linearity, 11
M
m-dependent, 137
m-memory, 122
Markov shift, 28
martingale, 15
__ Convergence Theorem, 15
Mean Ergodic Theorem, 74
measurable (transformation), 11
strongly __, 197
weakly __, 197
measure algebra, 29
measure preserving, 11
memoryless (channel), 122
message, 68
mixing
strongly __ channel, 139
strongly __ source, 81, 99, 138
weakly __ channel, 139, 217
weakly __ source, 81, 99, 138
weakly __ state, 216
Monotone Convergence Theorem, 13
monotonicity, 4
μ-equivalent, 29
μ-a.e. S-invariant, 79
mutual information, 2, 167
N
noise source, 189
noiseless channel, 181
nonanticipatory, 178
O
of density zero, 83
output space, 121
P
pairwise sufficient, 59
partition, 16
Plancherel Theorem, 213
point evaluation, 207
Pointwise Ergodic Theorem, 72
positive definite, 31, 207
positivity, 4, 12
projective tensor product, 200
pseudosupported, 221
Q
quantum channel, 215
quantum measurement, 216
quantum-classical channel, 216
quasi-regular, 106, 114
R
regular, 108, 114
relative entropy, 2, 49
Umegaki __, 223
resolution of the unity, 215
S
S-invariant (mod μ), 79
S-stationary, 47
Schatten decomposition, 221
semialgebra, 68, 77
semiergodic, 139
Shannon entropy, 1
Shannon's first coding theorem, 182
Shannon's second coding theorem, 184
Shannon-Khintchine Axiom, 7
Shannon-McMillan-Breiman Theorem, 102
shift, 27, 68
Bernoulli __, 27
Markov __, 28
σ-convergence, 215
σ-envelope, 215
Σ*-algebra, 215
simple, 219
simple function, 197
singular, 75
source, 69
Bernoulli __, 69
compound __, 123
ergodic __, 69
input __, 123
noise __, 189
output __, 123
state, 214
__ space, 214
compound __, 221
trivial compound __, 221
stationary, 210
__ capacity, 167, 174, 176
__ channel, 122, 217
__ channel operator, 130
__ information source, 69
__ mean (of a channel), 160
__ mean (of a source), 91
__ source, 69
strongly measurable, 197
strongly mixing, 81, 99, 139
subadditivity, 5
submartingale, 15
__ Convergence Theorem, 16
sufficient, 58
pairwise __, 59
supermartingale, 15
support, 223
symmetry, 4
T
tensor product
algebraic __, 126
injective __, 126
projective __, 200
totally disconnected, 68
transmission rate (functional), 167, 171, 175
trivial (compound state), 221
type 1 error probability, 61
type 2 error probability, 61
U
Umegaki relative entropy, 223
uncertainty, 1
uniform integrability, 24
universal entropy function, 117, 118
V
von Neumann entropy, 223
von Neumann Mean Ergodic Theorem, 74
W
weak law of large numbers, 61
weakly continuous unitary representation, 209
weakly measurable, 197
weakly mixing, 81, 99, 138, 139, 216, 217
X
x-section, 122
ξ-partition, 16