ON A PAPER OF A. FELDZAMEN BY
S. R. FOGUEL* ABSTRACT
Results of A. Feldzamen on semi-similarity of operators are proved here using matrix methods. The use of these methods yields simpler proofs, the formulations of the theorems assume a more transparent form. The purpose of this note is to give shorter and more transparent proofs of results given in [3]. This will be done by using the methods developed in [4]. It should be mentioned that we assumed separability while Feldzamen does not. 1. Preliminary notions. Let S be a normal operator, on a separable Hilbert space H, of uniform multiplicity n < ~ . Thus H can be taken as direct sum of n equal spaces L2(F/, ]E,p), where f/is a Borel subset of the plane, ~ the collection of Borel subsets off~, and # a finite positive measure. Also:
See [3], [4], or [5]. The spectral measure E(.) of S is given by (f1(2)~
X(~5)fl(2)~
E('),f,(i2)/
=
(Z(cS)f.(2) ,,
where )C(fi) is the characteristic function of 6. Every operator A that commutes with S is given by a matrix of bounded and measurable functions azj(2), where
A(f,,(i)O) fi(2)
ft(2) ~ .
= (a,j(2)) ( f . ( i , j
See Theorem 2.1. of [4]. Received January 18, 1963. * The research reported in this document has been sponsored in part by the Air Force Orifice of Scientific Research of the Air Research and Development Command United States Air Force through its European Otfice. 133
134
S.R. FOGUEL
[September
DEFINITION. The vectors y~ e H, i = 1 ... k, will be called dependent over 6 if y~(2) are dependent for almost every 2 ~ 6. LEMMA 1.1. The vectors yi are dependent over 6 if and only if there exist k measurable functions gi, and a sequence of Borel sets 6m increasing to 6, such that a. The functions gi are bounded on 6r~ and not all zero. b. I f gi, m is the restriction of gi to 6m then k
~, gi, m(S)Yi = O. i=1
Proof. It is clear that a. and b. imply dependence. Conversely, let yi(2) be dependent for 2 e 6. For each 2 E 6 there exist constants g~(2) such that k
g~(2)yi(2 ) = O. i=1
It is enough to show that one can choose gi to be measurable. Let us consider the matrix (yi,r(2)) where yi,r(2) is the rth component of yi(2). The set t) can be decomposed into finitely many disjoint measurable sets, on each a certain determinant of (yi,r(2)) is the largest non vanishing one. On each set gi(2) can be chosen by Cramer's Rule, and are thus measurable. COROLLARY.
If k
> n then the vectors y~ are dependent over every set 6.
LEMMA 1.2. Let y~ 1 < i < n be independent over~. Let x be any vector in H. There exist n measurable functions fi( 2) and a sequence of Borel sets 6,, increasing to f~ such that n
x = lim ]E f~,m(S)Y~, m--,ooi=l
where f i,,~ is the restriction o f f i to 6 m and is bounded. The functions f ~are uniquely defined.
Proof. The vectors x(2), yi(2) are dependent by the previous Corollary. Thus x(;t) can be represented by a linear combination of yi(2). Since these vectors are independent the representation is unique. The same result could be proved for the case that Yi are independent over some set 6 = fL 2. Canonical form for niipotents. In this section we will follow ['1] to bring a nilpotent matrix with measurable elements to canonical form. It was proved in 1-4] that if N is quasi nilpotent and commuting with S, then N(2)" = 0 a.e.
1963]
O N A P A P E R O F A. F E L D Z A M E N
135
Let A(2; x) be an n by n matrix whose elements are polynomials in x with coefficients that are measurable functions of 4. Let f~k be the set on which the minimal order of the polynomials a,j(2; x) is equal to k. This is a measurable set. Let f~l = U ~ ' ~
where f ~ ' J = { 2 l o r d e r of aij(2;x )
=
1}. Again I)~ 'j is
t,J
measurable. An elementary transformation will bring aij to the upper left corner and by more elementary transformations A(2; x) can be brought to the form a(2) 0 ... 0 ! A~(2; x) 0
0
where order of a(2) is one and A1(2; x) has the same form as A(2; x). Let us split f~k to fl~i , j = {4 [ 2 ~ f~k and order of a,j(2; x) = k}. On f~'Jwe apply to A(2; x) an elementray transformation to bring aij to the left upper corner. Using the Euclidean Algorithm we see that there are two possibilities: 1. By an elementary transformation (using measurable coefficients) we can bring A(2; x) on ~'~ to the form a(2 x) 0
i
...
A1(2; x)
where A1(2; x) has the same form as A(2; x) and a(2; x) divides every element of A1(2; x). 2. A(2 ; x) can be transformed to a matrix whose minimal order is less than k, on
f~. These considerations prove: LrMMA 2.1. There exists a matrix B(2;x) such that both B(2;x) and B(2;x) -1 have polynomial elements with coefficients that are measurable functions of 2 and: B(2; x)a(2; x)B(2; x ) - l = diag{fl(2; x), f2(2; x), ..., f,,(2; x)}, where fi(2;x) are polynomials fi(2; x)l f,+1(2; x).
in x
with measurable
coefficients and
Let now A(2; x ) = x I - N(2) where N(2) represents the nilpotent operator N. Thenfi(2; x) = x i~a) (or 0), for they divide the minimal polynomial of N(2) (Theorem 8, Chapter V, of [1]). Thus i(2) is a measurable function of 2 and 0 < i(2) < n. Let f~ be the union of the disjoint sets f~,, where on ~ , i(2) is equal to a given fixed integer 1 < i < n. The sets f~, are measurable. By chapter V of [1],
136
S.R. FOGUEL
[September
Theorem 6.10, the matrix diag(fi(2; x)) is equivalent, on f2~,to a canonical Jordan matrix diag(fl(2; x)) ~ x l - Q~, where 081 : QGt "~-
... 0
0
0
/~,- 1
...0
and ei is either I or zero. Using Lemma 2.1 again one can find a matrix C(2; x), with the same properties as B(2; x) of Lemma 2.1, such that C(2; x) (xI - N(2)) C(2; x)- 1 = xI - Q for 2 Ef~. Finally by chapterV, Theorem 5.10, of I-1.1: C,(Q; ~)N(,~)C,(Q,,; 2)-~ = Q,, when 2 e f~. To summarize: THEOREM 2.2. Let N be a nilpotent operator commuting with S and let N(2) be its matrix representation. Let Q~ be the Jordan forms of a nilpotent matrix. There exists a matrix D(2) of measurable functions such that D-1 (2) exists, and measurable sets 12~ whose union is f~ such that D(A)N(2)D- 1(2) = Q~ For the matrix Q, there exist vectors Ya, ..., Y, such that j1-1
Yl, Q~Yl .... ,Q=
O J,.-1
yx, ...,yr, Q~yr, . . . , ~
Yr
are independent, j 1 + ... + j , = n, and Q~yi = O. Let xi(2) = D- 1(2)y t, and let f~,m c 11~ be such that xi(2) is bounded on f~,m and f~,,m increases to f~. Then on f~,m (on E(f~,,~)H) N1, Nx1, ..., Njt -
l x l , .,. ' Xr, N Xr, ..., N j r - l x r
are independent, and NJ'x, = O. This shows that the sets f~ do not depend on the representation of H as direct sum of L2 spaces (Spectral Multiplicity Theorem). The sets f ~ will be called the canonical sets of S + N. 3. Semi similarity. Let T = S + N and T1 = $1 + N1 be two spectral operators (see 121) and let S have uniform multiplicity n (equivalenty S is similar to a normal operator with uniform multiplicity). In [31 the notion of semi semilarity is defined by:
1963]
ON A PAPER OF A. FELDZAMEN
137
DEFINITION. T and T~ are semi similar if there is a sequence o f Borel sets 6m increasing to ~ such that, if E(.) and E~(.) are the spectral measures of T a n d T 1, there are bounded maps Lm, from EI(6")H to E(6m)H, with
Lmr'L-m 1 = r l ". where Tm(Tlm) is the restriction of T(T~) to E(6") (E1(6")). It was shown in [3], Theorem 27, that if Tand T1 are semi similar, then S and S~ are similar. If Tis semi similar to Tt and T = KT2K - 1for a bounded operator K where T2 is again spectral then
L ' K T 2 K - 1L-~l= TI" or T2 is semi similar to T1. Also by the remark following Theorem 2.2 the operators T2 and T have the same canonical sets. THEOREM 3.1. The spectral operators T and T I are semi similar if and only if S and S t ".are similar and T and T 1 have the same canonical sets. Proof. Without loss of generality we may assume that S = S~. If S + N is semi similar to S + N~ then
L ' N ' L ~ 1= Nlm, where
N m and N I ". a r e
the restrictions of N and N~ to E(6")H. But then
Lm(a)N'(a)Lfl(a) = N~'(~), which proves that N(2) and NI(A) have the same canonical sets. Conversely, if N and N1 have the same canonical sets, then on f ~
N = D-I(A)Q~D(2),
N 1 = D'~l(2)Q~Dl(2);
hence
N 1 = D-~ ~(2)D(2)N(2)D-.'(2)Dl(2 ). Define 6., so that D-I(2)Dt(2) and D~-I(2)D(2) L'(2) = {D~-I(,~)D(~) ] restricted to E(6m)H}.
be
bounded on 6", and
COROLLARY. Semi similarity is a transitive relation. This is Theorem 26 of [3]. Theorem 3.1 is essentially equivalent to Theorem 29, and 30 o f [3].
138
S. R. FOGUEL BIBLIOGRAPHY
I. Albert, A. A., 1941, Introduction to algebraic theories. The University of Chicago Press, Chicago. 2. Dunford, N., 1950, Spectral operators, Pac. J. Math., 4, 213-227. 3. Feldzamen, A. N., 1961, Semi similarityinvariants for spectral operators on Hilbert space Trans. Amer. Math. Soc., 100, 277-323. 4, Foguel, S. R., 1958, Normal operators of finite multiplicity, Comm. Pure Appl. Math. 11, 297-313. 5. Mackey, G. W., 1959, Commutative Banach algebras. Notas de Matematica No. 17, Rio De Janeiro. THE HEBREWUNIVERSITYOF JERUSALEM
ON OPERATORS WHICH ATTAIN THEIR NORM BY
JORAM LINDENSTRAUSSQ)
ABSTRACT
The following problem is considered. Let X and Y be Banach spaces. Are those operators from X to Y which attain their norm on the unit cell of X, norm dense in the space of all operators from X to Y? It is proved that this is always the case if X is reflexive. In general the answer is negative and it depends on some convexity and smoothness properties of the unit cells inX and Y. As an application a refinement of the Krein-Milman theorem and Mazur's theorem concerning the density of smooth points, in the case of weakly compact sets in a separable space, is obtained. 1. Introduction. Let B(X, Y) be the Banach space of all bounded linear operators from the Banach space X into the Banach space Y. (The norm in B(X, Y) is the usual operator norm.) Let P(X, Y) be the subset of B(X, Y) consisting of all the operators which attain their norm on the unit cell of X, that is all those T for which there is an x ~ X satisfying I! x II = 1 and I] Tx I1= 11T II. Bishop and Phelps [1] (cf. also [2]) proved that if dim Y = 1 then P(X, Y) is norm dense in B(X, Y) for every Banach space X. In [1] they also raised the general question--for which Banach spaces X and Y is P(X, Y) n o r m dense in B(X, Y)? This question is the subject of the present note. Rather simple examples show that in general P(X, Y) is not dense in B(X, Y). The simplest examples, perhaps, are based on the fact that if a 1 - 1 operator T f r o m X into a strictly convex space Yattains its n o r m at a point x, then x is an extreme point of the unit cell of X. However, if we consider instead of P(X, Y) the larger set Po(X, Y), consisting of all the operators T such that T** attains its n o r m on the unit cell of X**, then it can be shown (Theorem 1) that this set is always n o r m dense in B(X, Y). The question of Bishop and Phelps, as it stands, is very general. In fact, it seems to be too general to have a reasonably complete solution. We therefore restrict ourselves here to the study of those spaces X which have either one of the following properties. A. For every Banach space Y, P(X, Y) is n o r m dense in B(X, Y). B. For every Banach space Y, P(Y, X) is n o r m dense in B(Y, X). An immediate consequence of the density of Po(X, Y) in B(X, Y) is that every Received August 20, 1963. (1) Research supported by the National Science Foundation, U.S.A. (NSF--GP--378). 139
140
JORAM LINDENSTRAUSS
[September
reflexive space has property A. In Theorem 2 we show that if a Banach space X has property A then, under certain circumstances (for example if it is separable), its unit cell must have many strongly exposed points(2). A dual result concerning property B and strongly smooth points is also given (Theorem 3). As a consequence of the results mentioned above we obtain (for weakly compact sets in a separable space) refinements of the Krein-Milman Theorem and Mazur's Theorem [10] concerning the density of smooth points (cf. Theorem 4). In section 3 we prove some results of a more special nature and discuss a few simple examples. It is shown in particular that a finite-dimensional space whose unit cell is a polyhedron has property A and that there are Banach spaces X such that P(X, X) is not norm dense in B(X, X). I am indebted to Professor R.R. Phelps for helpful conversations concerning the subject of this note. Notations. By " o p e r a t o r " we always mean a bounded linear operator. Our results hold for real and for complex Banach spaces; however, for convenience of notation, we shall assume that the Banach spaces are real. The unit cell (x: xeX,[I x [I < 1} of X is denoted by Sx. Let C be a convex set in the Banach space X. A point x e C is called an exposed point of C if there is a n f e X* such that f ( y ) < f ( x ) for every y # x in C. A point x ~ C is called a strongly exposed point of C if there is a n f E X * such that (Of(Y) < f ( x ) for y # x in C; and (ii) co f(xn) ~ f ( x ) and {X ,},=1 c C imply Hx, - x [1 ~ 0. The dual notions are those of a smooth and strongly smooth point. We shall use these notions only for the unit cell and so we define them only in this case. A point x with [1x ][ = 1 is called a smooth point of Sx if there exists only one f ~ X * satisfying f ( x ) = = 1. A point x e X with II x !I = 1 is called a strongly smooth point of Sx iff,(x) ~ 1, and {f,}ff= 1 c Sx. imply that II•-f 11-' 0 (wheref is, necessarily, the unique element of Sx. satisfying f ( x ) = 1). A point x with II x I! = 1 is a smooth point (resp. strongly smooth point) of Sx if and only if the norm is Gateaux (resp. Fr6chet) differentiable at x (cf. ~mul'yan I-11]). A Banach space X is called strictly convex (resp. smooth) if every point on the boundary of Sx is an exposed (resp. smooth) point of Sx. X is called locally uniformly convex [9] if [1x,, + x 2, and 11xn !1= II x II = I imply [1x , - x 11~ 0. If X is locally uniformly convex then eveiy point on the boundary of S x is a strongly exposed point. If every point on the boundary of Sx is strongly smooth we say that the norm in X is Fr6chet differentiable (cf. Day [5, pp. 112-113] tbr these and related notions).
Ilfll
2. The main results. We begin by giving a simple characterization of the set Po(X, Y). LEMMA 1. An operator T from X to Y belongs to Po(X, Y) if and only if ~ there are {XkIk=1 in X and StfkIk=l in y * such that (2) This notion is defined below.
1963]
ON OPERATORS WHICH ATTAIN THEIR NORM
(1)
II x k II = liT, II = 1
(2)
If~(Tx,)l ->- II z l l - 1/j
141
k
=
1,2,
..-
j<=k, k = 1,2, ...
Proof. Suppose (1) and (2) hold and let x** be any weak* limit point of the sequence {x,}~=l. Then, for every j, IT**x**¢fj) l>=II Tll- 1/j and hence II T**x** II--II T** II- The proof of the converse is also immediate (using the weak* density of Sx in Sx..). THEOREM 1. For every X and Y, Po(X, Y) is norm dense in B(X, Y). Hence every reflexive space has property A. Proof. Let T~B(X, Y) with II ~ii = 1 and an e with 0 < e < l / 3 be given. We choose first a monotonically decreasing sequence {ek} of positive numbers such that (3)
2~
e:<e,
i=1
ei<e~,
2
ek < l / 1 0 k ,
k=l,2,...
i=k+l
We next choose inductively sequences {T,k)k~= 1' {Xk}k Qo 1' and {fk}~°=l satisfying =
(4)
T, = T
(5)
I1 TkXk II >-- II T, II- ~, II x, II = 1
k = 1,2, ...
(6)
fk(Tkxk)-- 11 T~x~ I!, llA 11= 1
k = 1,2, ..-
(7)
Tt,+ ix = TkX "[- ekf k( TkX) " TkXk
x GX ,
k = 1,2,
...
Having chosen these sequences we verify that the following hold. k-I
(8)
IITj-T, 11 _-< 2X~,, i=j
(9)
i
11~+~ II ~ It r~ II + ~ II Y~ II~- 4e~
(10) (11)
IIT, II----4/3
k=l,2,..k = 1, 2, .-.
II r~ll => II ~11 ~1
j
k = 1,2,...
Ifj(rjxk)l >_- II r~ 1!- 68j
j < k,
k = 1,2,...
Assertion (8) is easily proved by using induction on k. By (5), (6) and (7)
II Tk+, II > II r,+,~, II = II Z~x,(1 + ~f~(T~,))II = IIr,x,
II(l+ ~,llT,,,,ll)-:(llr,II-~,)(:+ ~,11r, li-~,~).
Relation (9) fo]lows easilyfrom thi~inequatity,since IIT, II= 4/3 and ~, < 1/10k, while (10) is an immediate consequence of (4) and (9). Finally we verify (11). By the triangle inequality, (5), (8), (10) and (3) we have, for j < k,
JORAM LINDENSTRAUSS
142
[1 Tj+~xk II = II Tkx~, 11- tl T k -
Tj+~
[September
11>=
k-!
= I1 T~ II - ~ - 2i = j~+ l ~, > II ~+~ II - 24 Hence, by (7) and (9),
~IS~(T~x~) I I1% 11+ 11r~ II :> II T~+ ~xk II = I1T~ II + ~J II T~ 112 &if, -
so that
If~(Zjxk)l >_ 11r~ II-
6~j,
and this proves (11). The sequence Tk converges in norm to an operator T satisfying II ~- z II z and I1 ~- Tj !1 --<~?-1 f o r . / = 2, 3,... (use (3) and (8)). We claim that ~ P o ( X , Y). Indeed,
Ifjt~x,) I >: If,(T~-x~)I- II %- ~ll->--- II r~ I1-
6~s-
zj_,
>= II ~11 - 6~-2~f-,
>= II ~11 - ~/s,
and the desired conclusion follows from Lemma 1. For reflexive X it is obvious that P(X, Y) = Po(X, Y) and thus every reflexive X has property A. Remark. The operator T constructed in the proof of Theorem 1 has also the property that T - T is compact. THEOREM 2. Let the Banach space X have property A. Then (i) I f X is isomorphic to a strictly convex space, then Sx is the closed convex hull of its exposed points. (ii) If X is isomorphic to a locally uniformly convex space, then Sx is the closed convex hull of its strongly exposed points. Proof. The proofs of (i) and (ii) are almost identical so we prove here only (ii). Let C be the closed convex hull of the strongly exposed points of Sx. Suppose that C # S x . Then there is an f ~ X * with [[fl[ = 1 and a 6 > 0 such that I/(x~l < 1-6 for x ~ C . Let III II1 be a locally uniformly convex norm in X which is equivalent to the given norm II II and such that I I Ix III --- II x II for every x. Let Ybe the space X @ R(3) with the norm II (x,r) !! = (111 x 1115+ re) "~. Then Y is locally uniformly convex. Let V be the operator from X into Y defined by Vx = (x, Mf(x)) where M > 2/~5. Then V is an isomorphism (into) and the same is true for every operator sufficiently close to I1. We have
II vii ~
M;
II Vx II z (1 + <M- 2)2) t/2 <
(3) R denotes the one-dimensional space.
M-
1 for x ~C.
1963]
ON OPERATORS WHICH ATTAIN THEIR NORM
143
It follows that operators sufficiently close to Vcannot attain their norm at a point belonging to C. To conclude the proof we have only to show that if Tis an isomorphism (into) which attains its norm at a point x and if the range of Tis locally uniformly convex, then x is a strongly exposed point of Sx. Indeed, let g ~ Y* satisfy [] g [1 = 1, g ( T x ) = 1[ T[]. Suppose that g(Tx,)--I] TI], with [Ix,, [] __<1. Then [] Tx + Tx, [] ~ 2 ][ T []and hence, by our assumption on Y, ][ Tx - Tx,][ --* 0. Since T is an isomorphism, II x , - x It "-"0, and this concludes the proof of the theorem. For examples of X which satisfy the assumption in (i) cf. Day [3], [4]. Kadec [6] has proved that every separable Banach space is isomorphic to a locally uniformly convex space. We prove next a partial converse to Theorem 2(ii). We say that a family of points {x~} on the boundary of Sx is uniformly strongly exposed (u.s.e.) if there is a function ~$(e),with ~$(e) > 0 for every ~ > 0, and a set {f~} of elements of norm 1 in X* such that for every ~, f~(x~) = 1, and tbr any x,
II x II --- 1 and f:(x) _>_1 - 6(e) imply II x - x: l! -In a uniformly convex space the set of all the boundary points of the unit cell is u.s.e. The set of all the extreme points of the unit cell of ll is also u.s.e. PROPOSITION 1. Suppose S x is the closed convex hull of a set of uniformly strongly exposed points. Then X has property A. Proof. The proof is similar to that of Theorem I and we indicate here only the necessary modifications. Let {x:} be a set of u.s.e, points whose closed convex hull is Sx and let {f:} be the corresponding set in X* (appearing in the definition o f a u.s.e, set). We choose the e; as in the proof of Theorem I and define a sequence of operators T k a s follows. T1 = T and
Tk+ lX = TkX + ekf~,,(x) " TkX~,
k = 1,2, ...
where x,k is an element of {x~} satisfying ]] Tkx,~ [1 -->_[[ Tk 11- e~2, and f~k is the corresponding element of {f~}. As in the proof of Theorem 1 it can be shown that the sequence Tk converges in the norm topology to an operator i~ satisfying 1[ ~V_ T][ < e, and that ]f,j(x~) [ > 1 - 1/j for i < k. By the definition of a u.s.e, set it follows that the sequence x,~ converges in the norm topology to a point x, say, and we have [[ iPx 1[ = 11i? 1[. This concludes the proof. REMARK. There exist even finite-dimensional spaces whose unit cells cannot be obtained as the closed convex hull of a u.s.e, set. Indeed, in a finite-dimensional space the closure of a u.s.e, set is again u.s.e., and simple 2-dimensional examples show that in general a convex set cannot be obtained as the closed convex hull of a closed subset of its set of exposed points. Our next result is concerned with some smoothness properties.
144
JORAM LINDENSTRAUSS
[September
THEOREM 3. Let X have property B. Then (i) I f X is isomorphic to a quotient space of a smooth space then the smooth points of S x are norm dense in the boundary of S x. (ii) If X is isomorphic to a quotient space of a space which has a Frdchet differentiable norm, then the strongly smooth points of Sx are norm dense in the boundary of Sx. Proof. Again the proofs of parts (i) and (ii) are almost identical so we prove only (ii). Let Z be a space whose norm is Fr6chet differentiable and let To be an operator from Z onto X with HTo[I <1" Let Y be Z ~ R with H(z,r) ll = (H z ]I2 + r2) 1/2. The norm in Yis also Fr6chet differentiable. Take Xo on the boundary of Sx and define T e B(Y,X) by T(z, r) = Toz + rMxo, where M is a sufficiently large positive number. Let e > 0 be given and let ire P(Y, X) satisfy II ir - T I[ < 5. Since T* is an isomorphism into, the same is true for ~* ife is small enough. Take Yo = (zo, ro)e Y with UYo t{ = 1 and 1{Tyo II = 11ir 1t- We have Tryo = Toz o + roMxo + u for some u of norm < 5, and ]l iryo ]l > M - 5. Put xl = yo/I] ~ l l Since I[ Tozo II = 1 it follows easily that HX o - xl [] = O(M-1) as M -~ oo (uniformly with respect to e in [0, 1])(3a). Hence to conclude the proof it is sufficient to prove that xt is a strongly smooth point of Sx. Take a n f e X* with f ( x , ) = [tSll = 1 and let f . ( x : ) ~ 1 with lif. II <- 1. Put g = t ' f / I I iPil and gn = T * I J 11f I]" Clearly g(Yo) = 11g II = 1 and g,(Yo) ~ 1. Since the norm in Y is Fr6chet differentiable ][ g , - g ]]-~0 and hence IIf -: ]1-,0 (~* is an isomorphism). This concludes the proof. Our main reasons for stating and proving Theorem 3 are its obvious duality with Theorem 2 and the application of its proof in Theorem 4b. In contrast to Theorem 2, Theorem 3 does not seem to be a useful criterion for deciding whether a given Banach space has property B. This is because the assumptions in (i) or (ii) here hold less frequently than the corresponding ones in Theorem 2 while the conclusion of (i), for example, holds for every separable space (cf. Day [3] and Klee [8] for more details). It is even conceivable that statement (i) in Theorem 3 holds in general, that is, without the assumption that X has property B. Concerning statement (ii) it should be remarked, perhaps, that if a Banach space X has a Fr6chet differentiable norm then the density character of X* is the same as that of X. This result was recently announced by Kadec [7] and it is a consequence of the theorem of Bishop and Phelps. The results which we have already proved imply easily the following refinement of the Krein-Milman theorem and the density theorem of Mazur [10] (cf. also Klee [81]). THEOREM 4. a. Every weakly compact convex set in a separable Banach space is the closed convex hull of its strongly exposed points. (3a) Here we assume that ro ~ 0. If ro < 0 we replace Yo by - y o .
1963]
ON OPERATORS WHICH ATTAIN THEIR NORM
145
b. The boundary of the unit cell of a separable reflexive space has a dense set of strongly smooth points. Proof. From Theorems 1 and 2 it follows immediately that the unit cell of a separable reflexive space is the closed convex hull of its strongly exposed points. The more general statement of (a) results from the observation that the method of proof of Theorem 1 yields the following statement: For everyX and Ythe set consisting of all the operators T which attain their supremum on a given weakly compact convex set C in X (that is, for which there exists an xo ~ C with I1 Txo = supxoc Zx II) is norm dense in B(X, Y). Theorem 2 c a n also be modified in an obvious way so as to apply to operators which attain their supremum on a fixed C. Part (b) follows by observing that if X is reflexive and separable, we can take as Yin the proof of Theorem 3 a space isomorphic to X ~)R. Since Yis reflexive the density of P(Y,X) in B(Y,X) follows from Theorem 1 (and hence we do not use the assumption in Theorem 3 that X has property B).
11
I1
REMARK. There exist separable reflexive spaces whose unit cells have exposed points which are not strongly exposed. For example, let X be the space 12 and denote by S its unit cell in the usual norm. Let en= ( 1 - 1 / n , O , ... ,0,1,0,... ) for n = 2,3,.-. (the number I stands in the n-th place). Let Sz be the closed convex hull of Un°°=2 { + e,} U S. It is easy to see that the point (1, 0,0,...) is the only point in S~ whose first coordinate is 1 and hence it is an exposed point of $1. It is, however, not strongly exposed since the first coordinate of e~ tends to 1. It follows, by duality, that there are separable reflexive spaces whose unit ceils have smooth boundary points which are not strongly smooth. It should be remarked also that there exist separable (non-reflexive) Banach spaces whose unit cells do not have any strongly smooth boundary point (l~, for example). 3. Some examples. Using known representation theorems for operators it is possible in some special cases to verify directly the density of P(X, Y)in B(X, Y). We shall now give a few results in which some of the common spaces are examined as to whether they have property A or B. The results are, however, very incomplete. Even for the spaces C(K) we do not know a complete characterization of those which have either property A or B. Moreover, the examples given here do not answer some questions which naturally arise in connection with the results proved in section 2 (for example: Does every reflexive space have property B?). PROPOSITION 2. a. The space LI(#) has property A if and only if the measure It is purely atomic.
b. The space C(K) with K compact metric has property A if and only if K is a finite set.
146
JORAM LINDENSTRAUSS
[Septemlxr
Proof. It is well known and easy to see that the extreme points of the unit cell of an LI(IX) space are the characteristic functions of the atoms of/x (multiplied by a suitable scalar so as to have norm 1). Hence the unit cell of LI(IX) is the closed convex hull of its extreme points if and only if IXis purely atomic. Every L l space is isomorphic to a strictly convex space (Day [411). Hence, by Theorem 2(i), Li(IX) does not have property A if/x is not purely atomic.That an l i(I ) space has property A for every index set I is very easily seen directly (it follows also from Proposition 1). To prove part (b) it is, by Theorem 2(ii), sufficient to show that if K is infinite compact Hausdorff, then the unit cell of C ( K ) has no strongly exposed point. Suppose there were such a point x. Let # be the functional(4) appearing in the definition of a strongly exposed point and let e > 0. There exists a non-void open set G in K such that 0 ~<[IXI(G) < e/2. Since x is an extreme point of the unit cell of C(K),[ x(k) [ = 1 for every k E K. It is now easily seen that there is a y ~ C ( K ) such that [I Y [[ =< 1, [[ y - x [] = 2 and y(k) = x ( k ) for k ~ G. Clearly p(y) __>1 and this contradicts the definition of a strongly exposed point. Our next result is a consequence of the theorem of Bishop and Phelps [1]. It may be regarded as the dual of Proposition 1. PROPOSITION 3. Let X be a Banach space such that there exist two sets {x,} in X and {f,} in X*(5) and 2 < 1 such that
~. IIs: II =
1 f o r every ~ and
II x II = sup:lf:(x) l S°r
every x ~ X .
2. II x: II =f,(x=) = 1 f o r every ~ and if:(x,) i < 2 f o r ct ~ ft. T h e n X has p r o p e r t y B. with 11zll = 1 and ~, 0 < ~ < 1, be given. Clearly 1 = II zll = sup, II Z*f=11. Let ~o be such that [1T*f=o II = 1 - e(1 - ){)/4. Choose a g ~ Y* which attains its norm on Sr and satisfies Proof.
Let T e B ( Y , X )
II g
- T*f~o
I1<--~(1 -
2)/2,
1 - e(1 -- 2)74 <
II g II z
~.
Put Toy = T y + [(1 + e)g(y) - T*f,o(Y)]X~o.
We have
II T- To II--- ~ll g Further, T o ~ P ( Y , X ) .
[i +
II T * f ~
-
g
11-- 2~.
Indeed, for ~ ~ s0,
I[ z~f~ II ~ It zll + IfXx,o)I@I! g [I + I[T*f~o - g II) < =< 1 + ;t(g + e(1 - 2)/2) =< 1 + e(1 + 2)/2. (4) We identify the functional with the corresponding measure on K. (5) In both sets the set of indices {a) is the same.
1963]
ON OPERATORS WHICH ATTAIN THEIR NORM
147
while To*f~o = (1 + e)g satisfies
II Zo*f~o II -- ( i + ~)II ~ II >-- ( I II To* II--II To*f~o !l and since
Hence true for To.
+ ~)(I - e(l - )I.)/4) => 1 + e(l + 2 ) / 2 .
T*fao attains its n o r m on S r the same is
The assumptions in Proposition 3 are satisfied if, tor example, X is finitedimensional and its unit cell is a polyhedron or if X = C(K) with K having a dense set of isolated points. Many examples of spaces which do not have property B can be obtained by using Theorem 2. For example, if X is strictly convex and if there is a Banach space Y such that
S r is not the closed convex hull of its exposed points and such that Yis isomorphic to a proper subspace of X then X does not have property B. In some special cases it is easy to obtain somewhat stronger results. We have for example PROPOSITION 4. If X is strictly convex and if there is a non-compact operator from Co into X then X does not have property B. Proof. Suppose T e P(co, X) and let y~_ Co satisfy II Y II-- 1 and II Tr II = H z!l. Denote by {ei}~°°_-~ the natural basis of c o. There is an integer n such that for i > n II Y + ei/ 2 II = 1. It follows that for these i, [] Ty +Tei/ 2 II ==- II T~ II and hence, by the strict convexity of X, Te~ = 0 for i > n. Thus every operator belonging to P(c o, X) has a finite-dimensional range and, as a consequence, every operator in the closure of P(co, X) is compact. Finally we observe the following. PROPOSITION 5.
There exist Banach spaces X for which P(X,X) is not dense
in B(X,X). Proof. Let Y = c o with the usual n o r m and let Z be a strictly convex space isomorphic to c o. Put X = Y @ Z with 1[ (y,z)II-- m a x ( II y II, II z [1)(6). x has the required property. Indeed, let TO be an isomorphism from Y onto Z with
II To II ==-1 and
define T i n B(X,X) by T(y,z) = (0, Toy ). We have II Toy 11>= 2~11Y II for every y e Yand some e > 0. Suppose there were a T ~ B(X, X) with II t- zl[ < and I1 5~ [t = [I T(yo ,zo) II for some (yo,Zo) in X of n o r m 1. Put T(yo, Zo)= (u,v). Clearly II u 11<~ and since II t [I >~ it follows that II u 11< II t II -- Ilv II. Since S r has no extreme point there is a y~ ¢ 0 in Y such that
II y~ + y o II = II- y, + y o II ~ 1. (6) For every vector it will be clear to which space it belongs. Therefore we use the same notation, tl It, for the norms in X, Y and Z.
148
JORAM LINDENSTRAUSS
Hence II T(yo, Zo) + f'(Yx,O)[I -<- II fll.
Since Z is strictly convex and II v !1--II ill it follows that f(y,,0)= (y2,0) for some y2~ Y. We get 511 yl II > II z
t(y~,o)II--
II Toy, II > 2~11 y, II
and this is a contradiction.
REFERENCES 1. Bishop, E. and Phelps, R. R., 1961, A proof that every Banach space is subreflexive, Bull. Amer. Math. Soc., 67, 97-98. 2. Bishop, E. and Phelps, R. R., 1963, The support functionals of a convex set,Proc. Symp. Pure Math., 7, (Convexity) 27-35. 3. Day, M. M., 1955, Strict convexity and smoothness, Trans. Amer. Math. Soc., 78, 516-528. 4. Day, M. M., 1957, Every L space is isomorphic to a strictly convex space, Proc. Amer. Math. Soc., 8, 415-417. 5. Day, M. M., 1958, Normed linear spaces, Springer, Berlin. 6. Kadec, M. I., 1959, On spaces which are isomorphic to locally uniformly convex spaces, Izvestia Vysshikh Uchebnykh Zavedenii (Mathematics) 6, 51-57. 7. Kadec, M. I., 1963, Some problems in the geometry of Banach spaces, Dissertation, Moscow University. 8. Klee, V., 1959, Some new results on smoothness and rotundity in normed linear spaces, Math. Ann., 139, 51-63. 9. Lovaglia, A. R., 1955, Locally uniformly convex Banach spaces, Trans. Amer. Math. Soc., 78, 225-238. 10. Mazur, S., 1933, 0ber konvexe Mengen in linearen normierten R~iumen, Studia Math., 4, 70-84. 11. ~mul'yan, V. L., 1941, Sur la structure de la sph6re unitaire dans l'espace de Banach, Math. Sb. (N. S.), 9, 545-561. YALE UNIVERSITY,NEW HAVEN AND UNIVERSITYOF WASHINGTON, SEATTLE
MINIMAL UNIVERSAL COVERS IN
En
BY
H. G. EGGLESTON*
ABSTRACT
It is shown that any plane set of constant unit width contains a semi-circle of radius ½, and using this a minimal univeral plane cover is explicitly constructed. It is also shown that in an n-dimensional space" with n> 2 there are minimal universal covers of arbinary large diameter.
Introduction. We shall consider subsets of real n-dimensioral Euclidean space E". Denote by J r , that class of subsets of E" which have a width in every direction equal to 1. If two subsets X and Y of E n are congruent we write X ,-- Y. A subset C o f E" is called a universal cover if it is closed, convex and such that for every subset A of E n whose diameter is less than or equal to 1 we can find a subset B of C such that B ~ A. Since every such set A is contained in a member of o,~,, it is sufficient in proving that a set C is a universal cover to vertify that the congruent subset B of C can be found corresponding to every set A that belongs to d , . By a m i n i m a l universal cover is meant a universal cover of which no proper subset is also a universal cover. In the plane the diameter of any minimal cover is less than 3 and the question has been asked (by V. Klee, see [23) as to whether there is a finite upper bound of the diameters of compact minimal universal covers in E", depending possibly on n. We show that this is not the case by proving that for any given positive number K and any integer n with n > 3, there is a compact minimal universal cover in E" whose diameter is greater than K. We first prove that a certain plane set is a minimal universal plane cover by means of a lemma, which incidentally also shows that any set in aT"2 contains a semicircle of radius ½ (see [1]). §I. An explicit example of a plane minimal universal cover. In this section it is shown that a certain set ¥, defined explicitly, is a plane minimal universal cover. Y is the union of a disc and of a IReuleaux triangle, both of unit diameter, so * This paper was written while the author was a National Science Foundation Visiting Senior Fellow at the University of Washington, Seattle, Washington, U.S.A. Received Septembei 12, 1963 149
150
H . G . EGGLESTON
[September
placed that two of the vertices of the triangle are diametrically opposite points on the disc. See Figure 1.
Figure 1 I f Yis a universal cover at all it must: be a minimal one: indeed no proper subset o f Y can contain b o t h a disc and a Reuleaux triangle each of unit diameter. We shall deduce that Y is a universal cover from the following lemma. L E n A 1. I f Z is a plane set of unit constant width then there are two points on the frontier of Z, say s,t, which are at unit distance apart and such that of the two semicircles of radius ½ that pass through both s and t, one at least, does not meet the interior of Z. F o r if this l e m m a were true then, since every l:oint of Z is distant at m o s t 1 f r o m b o t h s and t, it would follow that Z lies in a figure bounded by a semicircle on st and (on the opposite side of st) two arcs of unit radius and centers s and t. This figure is congruent to IT. Thus Z is congruent to a subset of Y. But any set whose diameter does not exceed 1 is contained in a set of unit constant width. Hence Y is a universal cover. It remains to prove the lemma. Proof of the lemma. By a standard approximation argument it is sufficient to establish the lemma when Z is a Reuleaux polygon and in what follows we consider this case only. Two vertices of Z, say a, b, are said to be opposite if a n d only if they are at unit distance apart. In the frontier of Z lie two circular arcs whose centers are a and b. They lie on one and the same side of the line ab; we call this the positive side and the other side will be called the negative side. The circumcircle of the part of Z on the negative side of ab will be denoted by Yah, its radius by rob (~ab is a disc). In any case rob ~_ ½ and we wish to establish the existence of two opposite vertices a,b, for which r,b = ½. We assume that no such vertices exist and show that this leads to a contradiction. Any two vertices p,q on the frontier of Z divide this frontier into two arcs,
1963]
M I N I M A L U N I V E R S A L COVERS I N E n
151
Moreover, if p and q are not opposite, then one of these two arcs contains no points opposite to p or q. The vertices of Z on this arc other than p or q are said to lie between p and q. I f p and q are opposite then by the vertices bet~veen p and q we mean those that lie on the negative side of the line pq. Since rpq > ½, there must, f o r any pair of opposite vertices p,q, be vertices between p and q which lie on Vp~. Define the vertices v, w such that they lie on Vpq and no other vertices between p and v or between q and w lie on Vpq. Let there be f(p), g(q) vertices between p and v and between q and w respectively. Let h(pq)=min(f(p),g(q)) and z = m i n p , q h(pq). Choose opposite vertices a,b so that z = h(ab) and suppose for definiteness that f ( a ) = h(ab). Then let b 1 be the vertex opposite a adjacent to b and at be the vertex opposite to bt adjacent t o a.
We consider two cases. CASE (i) Z = 0. at lies on Y.b (see Figure 2). bt lies outside Y~b and the line joining bt to the center of Yah bisects internally the angle abia~. Hence Y.b cuts the segment b~at 0 //1
/
i I
\
i
iI
\\
I I
\\
//
Iiii
I. !~
/I \ i X jl
.i I
.. i - j r I.t.i-
•
ii
.i ..i Y<,b
bs~_~
b Figure 2
in two points a t and c, and the center of Tab lies on the positive side of atb t. Thus the part of Yah on the negative side of albt, apart f r o m the point al, lies interior to 7a,b,. But this means that no vertex o f Z on the negative side ofaib~ lies on V~,b,- This is impossible since it implies 1",,b~ = ½. CASE (ii) Z > 0. Let c be the vertex of Z between a and b on Yob nearest to a (see Figure 3). Because of the extremal property of a,b and c and because z > 0, b,c and all the vertices of Z between a~ and c must be interior points of ~o,b,. Since b,c are interior to Vo,b,, of the two parts of rob on the two sides o f the line bc one must
152
H . G . EGGLESTON
[September
0 //" /
/// I I \
r c \
/ \
I
/
b Figure 3 lie interior to )'~b,. Since a~ lies on the frontier of Ya,b, it must be that part of yah on the side of bc opposite to a~. Thus all the vertices of Z between b and c are interior to Va,bl" Hence all the vertices of Z between at and b~ are interior to V,,b,. Again this is impossible. The lemma 1 is proved. REU_ARK. The lemma also shows that any member of K2 contains a semicircle of radius ½. For if Z is a Reuleaux polygon and in the notation of the lemma Vob = ½ then all vertices of Z on the negative side of ab lie in Yahand the frontier o f Z on the positive side of ab (which is formed from circular arcs of radius 1 with centers at these vertices) lies outside or on the frontier of Yah" Thus of the frontier of Yahone semicircle lies inside Z and the other lies outside Z where inside and outside are to be interpreted in the weak sense; i.e., frontier points of Z can lie in the frontier of V~b. §2. A minimal universal cover in E* of large diameter. By a ball is meant a closed solid sphere; for example, a set such as {(xl,..., x,) I Y-,F=l(x~ - k3 z <_r2} where (xt .... ,x~) denote the coordinates o f a point in E". We need the following lemma. LEMMA. 2. There is a member W of o,~,, n > 3, such that the projection of W in any direction is not an (n - 1)-dimensional ball. Let Bo be the n-dimensional ball with center (0,0 .... ,0) and radius ½. Let Bi be the ball with radius 1 and center p~ where every coordinate of p~ is zero except the i th, and the ita coordinate is (7½ - 1)/28. Let U be the intersection of Bo and all the B~ and let V be the union of U and all the points Pv Let W be a member o f o,~, such that W ~ V. Let Wo be the projection of Win the direction 0. By direct calculation it can be seen that there is a p o i n t f on the frontier of Wo which is the projection o f a point
1963]
MINIMAL UNIVERSAL COVERS IN E n
153
o f the frontier of Wthat lies interior to B o and on the frontier of one o f the B~. But this means that the frontier of Wo in a sufficiently small neighborhood o f f coincides with the frontier of an (n - 1) dimensional ball of radius 1. Thus W0 is not an (n - 1) dimensional of radius ½. Since Wo ~ a~rn- 1 it follows that W0 is not an (n - 0-dimensional ball. COROLLARY. There is a member W of a~ n such that the greatest lower bound of the circumradii of projections of Win all directions is greater than ½. The p r o o f is obvious by compactness arguments. LEMMA 3. I f Z E K 2 and every square circumscribing Z has its sides bisected by the points of contact with Z then Z is a circle. Let a,b be two points of Z distant 1 apart and such that the semicircle tQb joining them lies in Z. Let the center of this semicircle be p, and let q be one of the points of Z most distant from p. The line through q perpendicular to pq is a support line of Z: hence by the hypothesis the support lines of Z parallel to pq touch the circle of which tQb is a part. But then by hypothesis pq must be equal to ½. Hence Z is a circle. Denote by Z that member of ~ n such that the greatest lower bound of the circumradii of projections of Z in all directions has the largest possible value. Let this value be R; then R > ½ .
Construction of a universal minimal cover. Let P be the infinite prism lxll < ½ , i = 1 , 2 .... , n - l , x n > 0 . P i s a u n i v e r s a l cover and our universal minimal cover will be a subset of P. Denote the projection of any set X in E n onto xn = 0 by X o. Then if YE a~f-n and Y c P we have Yo ~ of'n- 1 and Yo c Po where Po is the ( n - 1)-dimensional cube whose faces lie in the intersections of the hyperplanes xi = + ½, i = 1,2,..., n - 1, with Xn = 0. O f the 2 n- 1 faces o f this cube denote that which lies in x~ = ½ by F. Yo meets F in precisely one point; denote this point by F(Y). Consider the class of sets °2/ congruent to Y which lie in P and can be obtained from Y by combining a rotation or reflection which leaves the line x~ --- x2 . . . . . Xn_ ~ = 0 fixed, with a translation perpendicular to this line. Let F(g r) be the set of points formed by F ( Y ) for all Y~ q¢. Let F*(q¢) be the subset of the points of F(q/) that are most distant from the point (½, 0,... ,0) (i.e. the center of face F). Both F ( ~ ) and F * ( ~ ) are closed non-void sets. If (½, X2, X3, . . . , X n _ l , 0) belongs to F(q¢) so do all the 2 n-2 points (½, _+ x2, + x3,..., + x n - 1 , 0) and they are all the same distance from (½,0,0,...,0). Thus there is a point, say (½,x2,x3 .... ,x*_l,0), of F*(q/) for which x, > 0. Select one such point and let Y* be the set (or one of the sets) belonging to q/* such that F(Y*)=(½,x*2,x*a .... ,x,*_~,0). It follows from lemma 3 that Y* is the point (½, 0, 0 .... ,0) if and only if Yis a ball.
154
H.G. EGGLESTON
[September
Next let g be a large positive number so that 2(R - 1 / 2 ) . g > K (K is the preassigned positive number, R the minimal circumradius of any projection of Z), and let Pg be the intersection of P with the cone (1)
2 < (x,,+g)2 x 2 + x22 + ... + xn-t (2g + 1 ) 2 - 1
This cone intersects xn = h in the (n - 1)-dimensional ball with center (0,0 .... ,0, h) and radius rh = (h + g)/((2g + 1)2 - 1) 1/2. Since this radius is large for large values of h it follows that Pg is a universal cover. Also the set Pg contains the n-dimensional unit ball with center at x , = ½ , x i = O , i = 1..... n - 1 . Denote this ball by B*. F o r each set Yof og'n in P select Y* as described above and translate Y* parallel to the x, axis until it lies inside Pg and, subject to this condition, is as close to x, = 0 as possible. Let the translated set be Y**. Let Q be the union o f all these sets and let S be the convex cover of the closure of Q. Q and (therefore) S are bounded. S is a subset of Pg that meets the hyperplane x, = 0 (since Q ~ B * ) and S also meets the hyperplane x, = K. F o r consider where Z** can lie. If Z** lies in x, < K then R < rx, i. e. R =< (K + g)/((2g + 1) 2 - 1) '/z , which implies K+g R =< 2 ( g ( g + 1 ) ) l / 2
K 1 <2g- +-2-" =
Thus 2(R - ½)g < K. But this contradicts our choice of g. Thus Z** lies at least partly in xn ~ K. Hence S lies partly in x~ >= K, and the diameter of S is at least K. S is a compact universal cover; let T b e a subset of S that is a minimal compact universal cover• By the same argument as that above Tcontains points in x, > K. Now the line L defined by xl = ½, x2 = xa . . . . . x , _ l = 0 meets Q in the single point q = (½, 0,...0, ½). Moreover q is the only point of the closure of Q on L. F o r if this were not so let p in Q lie on L, p va q. Then there exists a sequence of points pj such that pj E Q and pj ~ q as j ~ ~ . Then P1 belongs to one of the sets of which Q is the union, say pj ~ Yj**. Now Yj** meets x~ = ½ in a single point, say y~, and is moreover contained in an n-dimensional ball of radius 1 touching x~ = ½ at y~. If the distance py~ is tan 0i, that o f p p j is greater than or equal to sec 0j - 1. Since pj ~ p as j ~ oo it follows that 0j ~ 0 as j ~ and thus y~ ~ p as j ~ oo • Let the coordinates of y~. be (ylU),y2 U),... yn(J)). Since p has its 2 nd, 3r~, .... ( n - 1st) coordinates all zero this means that y ( j ) ~ 0,..., y ~ l ~ O. Now a subsequence {Yj,} converges to a member, say W, o f ~f', and F*(W) is the single point (½,0,0 .... ,0). By the remark concerning
1963]
MINIMAL UNIVERSAL COVERS IN E n
155
Lemma 3, W must be such that all its two-dimensional projections are circles. Hence W is an n-dimensional ball, i.e., W ~ B*. N o w Yf* is one of the sets of which Q is the union and thus Y~*can not be translated inside Po parallel to the x, axis so that it lies nearer to x, = 0. Thus Y],* meets the surface of the cone (I). Hence so also does W. Thus in fact W is B*. But if this is so, Y~* converges to B*. Hence y~ converges to q and p is q. This contradiction establishes that the line L meets Q in the single point q = ( ~ , 0 .... ,0,½). Now S it the convex cover of Q and Q lies in the set xa < ½, and if xl = ½, then 0 __<xi < ½, i = 2, . . . , n - l , and it follows that S cannot meet the line L in any point other than q. T is a universal cover and hence contains an n-dimensional ball o f radius ½. S c P and it follows that S meets the line L. Thus T contains B*. Thus T contains the point (0,...,0). Hence T has diameter at least equal to K. This establishes the required result. REFERENCES
1. Besicovltch, A. S., 1963, On semicircles inscribed into sets of constant width. Proe. Syrup. Pure Math., 7, (Convexity), 15-18.
2. Grfinbaum, B., 1963, Borsuk's problem and related questions. Proe. Symp. Pure Math., 7, (Convexity), 271-284. BEDFORD COLLEGE, LONDON AND UNIVERSITY OF WASHINGTON, SEATrLE
ON THE STRUCTURE OF LINEAR GRAPHS BY
P. ERDOS ABSTRACT
Denote by G(n; m) a graph of n vertices and m edges. We prove that every G(n; [n2/4] + 1) contains a circuit of l edges for every 3 =< I < c2n, also that every G(n; [n2/4] + 1) contains a ke(un, Un) with un = [Cllogn] (for the definition of ke(un, Un) see the introduction). Finally for t > to every G(n; [tn3/2])contains acircuit of 21 edges for 2 <= I < c3t2.
G(n; m) will denote a graph of n vertices and m edges, K(p) will denote the complete graph of p vertices, and K(p, p) will denote the complete bipartite graph of 2p vertices. More generally K(pt,..., Pr) denotes the r-chromatic graph where there are p, vertices of the i-th color and any two vertices of different color are adjacent. Ke(pl,.",pr), Pl < P2 <="'" <=P,, will denote a K ( P l , ' " , Pr) where two vertices o f the first color are adjacent, i.e. Ke(pl, "" ,p,) is a K(pl, ..., p,) with an extra edge. The vertices of G will be denoted by x, xl, y , ' " ; the edge connecting x and y will be denoted by (x,y). ( G - xl . . . . . xr) denotes the graph G f r o m which the vertices xt,---, x, and all edges which are incident to them have been deleted, v(x), the valency of x, is the number of edges adjacent to x. C1 will denote a circuit having l edges, c I, c2, "" denote suitable positive absolute constants. It] is the greatest integer not exceeding t. A special case of a well known theorem of Tur~in [1] states that every G(n; [ n 2 / 4 ] + 1) contains a K(3) (i.e. a triangle). Dirac and I observed (independently) that every G(n; [n2/4] + 1) contains for every 4 < k _< n a subgraph G(k; [ k 2 / 4 ] + 1)and in fact Dirac proved a more general theorem [2]. In the present paper we continue the investigation of the structure of the graphs G(n; [n2/4] + 1) and we are going to prove the following theorems: THEOREM l.
Put [cl log n] = u n. Every G(n; [n2/4] + 1)contains a Ke(un,u~).
REMARK. The structure of K~(un,u~) is clearly uniquely determined. It is the G(2u,; u,,2 + 1) which contains a K(u,,u~) as a subgraph.
THEOREM 2.
Every G(n; [n2/4] + 1 ) contains a Cl for every 3 < l < c2n.
THEOREM 3.
Let i > t o , then every G(n; [~na/2]) contains a C2t for
every
2 < l < c3 t2. Received September 28, 1963. This work was done while the author received support from the National ScienceFoundation, N.S.F.G.88.
156
ON THE STRUCTURE OF LINEAR GRAPHS
157
Apart from the value of cl Theorem 1 is best possible. In fact we can show the following THEOREM 4. To every ~ > 0 there is a c(e) so that ,for every n there is a G(n; [ ( ~ ) ( 1 - e)])which does not contain a K([c(e)logn], [e(e)logn]). We suppress the proof of Theorem 4 since it uses the methods used in [31. A theorem o f A. H. Stone and myself [41 implies that every G(n; [en2]) contains a K([cl(e)logn ], [c~(e)logn]). The exact determination of c(e) and cl(~,) seems difficult. I would expect that the exact determination of e2 in Theorem 2 will be difficult. Theorem 3 is best possible in the sense that E. Klein [5] showed that there is a G(n; [c4nZ]) which contains no C4. For t > to perhaps every G(n; [tn3/2]) contains a C2, for every 2 < l < Cstnl/2; if true, then apart from the value of c 5 this is easily seen to be best possible. By the same method as used in the proof of Theorem 1 we can prove THEOREM 5. 7'0 every k there is an no = no(k) and a Ck SO that, .for n > no, G(n; [n2/ 4] + k) always contains a K([cklogn], [cklogn]) and k further edges. We suppress the proof of Theorem 5. Put r k = [cklog HI. For k > 1 the structure of our G(2rk; r2k + k) is of course not uniquely determined. Perhaps the following result holds: Let n > 8. Then every G(n; [ n 2 / 4 ] + n - 1) contains a K([clogn], [c log HI) and two edges which have no vertex in common and all four vertices c f which have the same color. It is easy to see that a G(n; [n2/41 + n - 2) does not have to have this property. To see this consider a K([n/2], [(n + 1)/2]) where further one vertex of each color is adjacent to all the vertices of our graph i.e., the vertices of our G ( n ; [ n 2 / 4 ] + n - 2 ) are x , . . . , x ~ ; y l , . . . , y ~ k = I n / 2 ] , l = [(n + 1)/2] and its edges are (xl, yj); 1 _< i --- k, l___<j____I and (xl,xi),(yl,yj); 2 <_ i <_ k, 2 <=j<=I. Put P D2
"
2
r
m(n,p)-- 2~-~----1)(n - - r 2 ) + ( 2 ) , n = ( p - 1 ) t + r , l _ < _ r = < p - 1 . Turfin proved that every G(n; m(n,p)) contains a K(p) and Dirac and I [2] observed (independently) that it contains a K(p + 1) from which one edge is missing. By very much more complicated methods I can prove that for n > no(p, k) G(n: m(n, p)) contains a p chromatic subgraph K(k, ...,k) and one further edge (i. e., a K~(k,..., k)); for p = 2 this is a weakened form o f Theorem 1. Now we prove Theorem 1. First we need two Lemmas. LEMMA 1. Every G(n; m) contains a subgraph which has valency greater than [m/HI. Further (1)
M>=m-(n-N)
[~]
G(N,M) every vertex of
158
P. ERDOS
[September
(The L e m m a of course means that every vertex of G(N, M) has valency in G(N,M) greater than [re~n]). I f every vertex of G(n, m) has valency > In~ m], there is nothing to prove. Hence we can assume that G(n,m) has a vertex x 1 of valency < [re~n]. If G(n; m) - xl has a vertex x2 with v(x2) < [re~n] we consider G(n; m) - xl - x 2 . We repeat this process and obtain a sequence of vertices x x,..-,xk so that the valency of xi in (G(n; m) - x 1 . . . . . xi-1) is _< [re~n] for every 1 < i < k - 1, but every vertex of (2)
(G(n; m) - x I . . . . . .
x~.) =
G(N; M)
has valency > [re~n]. Clearly M > 0 for otherwise, since (G(n; m) - xl . . . . . x,_~) has only one vertex and thus no edges, we can put in (2) k < n - 1 and by our construction we would have
an evident contradiction. Further by our construction (k = n - N)
," which proves (1), and the p r o o f of L e m m a 1 is complete.
LEM~ 2. Let m> [n2/4]. Then every G(n; m) contains a K,(2,k) where k = Icon].
L e m m a 2 is known [6]. Now we can prove Theorem 1. In fact we shall prove the stronger statement: To every e > 0 there is a ci = q(~) so that every G(n; [n2/4] + 1) contains a
Ke([ctlogn], [n1-*]). By L e m m a 1 our G(n; In2~4.] + 1) contains a subgraph G(N, M) every vertex of which has valency > [ [ n 2 / 4 ] + 1] = [ n / 4 ] . Further (1) implies by a simple n computation (2)
M=>
~-
+1-(n-N)
Further since every vertex of (3)
>
--~-- .
G(N,M) has valency > [n/4] we have n N > --. 4
By (2) L e m m a 2 can be applied to G(N, M) and by L e m m a 2 and (3) we obtain that G(N, M) contains a Ke(2, k) with k = [c5n/4 ]. Let the vertices of our Ke(2, k) be (.we choose e5 < 1/3)
1963]
ON THE STRUCTURE OF LINEAR GRAPHS
xl,xz; Y~, "",Yk,
(4)
159
k =
<
["] -g
- 1
Denote by zl, " ' , z, the other vertices of G(N, M). Each y has by Lemma 1 valency > In/4] (in G(N,M)), hence each y~, 1 < i < k is connected with more than n
(5)
n
---2-k+l 4
>--_ 8
z's. ((5) follows immediately from (4) since the number of x's and y's is k + 2 < [ n / 8 ] + 1 and in the worst case yf is connected with all of them). Let z~°, 1 < j < fi, t~ > n/8, be the z's adjacent to Yv Form all the (u.-2)-tuples (u. = [c 1 logn] of Theorem 1) of these vertices for each i,1 < i < k = [c5n/4 ]. By a simple computation we obtain (we use (b) > (6)
~,
ti
i=1
u.-
> csn 2
=
(a/ b) b)
(iinjS+l) csn( n)-2
4
u, - 2
> ---4-
8(u,-2)
Further trivially
(7)
u. - 2
< ( u . - 2) ! <
(u.- 2)..- 2 <
u.--sT!
Hence from (6) and (7) (8)
~ i=a
ti u.-2
)
csn > --4-
( ) n u.-2
1 nl_, 24~. ---~- >
( ) n u.-2
for every 8 > 0 if cl = c~(e) is sufficiently small. The number of the z's is clearly less than n, hence the number of the (u, - 2)-tuples formed from z's is less than (u, n_ 2)" Thus from (8)there is a (u, -
2)-tuple which occurs more than
n 1-~ t i m e s - - i n other words there is a set of u,. - 2 z's which are adjacent to the same [n I -*] y's. If we adjoin to these z's xl and x2 (which are adjacent and are adjacent to all y's) we obtain that G(N; M) and hence our G(n; [n2/4] + 1) contains a Ke(u.,n 1-~) for every e > 0 if cl = c1(~) is sufficiently small. This completes the proof of our assertion and hence Theorem 1 is proved. Proof of Theorem 2. As in the proof of Theorem 1 our G(n; [n2/4] + 1) contains a Ke(2,[csn/4]), cs < ! 1 / 3 , having the vertices Xl,X2, YI,...,yk, k = [c5n/4]. Each of the k vertices y~,...,yk are adjacent to more than n/8 z's (we use the notations of Theorem 1). Consider now the bipartite graph whose
160
P. ERDOS
vertices are Yx,'",Yk; z l, ...,z, and whose edges are the edges (Yt, z j ) o f G(n;m). This bipartite graph has fewer than n vertices and more than --~
~-- c6 n2
edges. Hence by a theorem of Gallai and myself I-7] it has a path of length c2n (the length of a path is the number of its edges). Since our graph is bipartite every second of its vertices is a y. Now since x l and x2 are adjacent and they are adjacent to each of the y ' s we immediately obtain that our G(n; 1-n2/4] + 1) contains a C t for each 3 < k < [c2n ], which proves Theorem 2. Proof of Theorem 3. By L e m m a 1 G(n; [tn3/2]) contains a subgraph G(N; M) every vertex of which has valency => [tn~/Z]. Let x be one such vertex and let Yl, "",Yk, k = ½ [ thaI2] be some of the vertices adjacent to x and denote by z l , . . , the other vertices of G(N,M). Every y has valency > 1-tnl/2],thus since the number of y ' s is ½1-tn~/2] there are at least ½ 1-tn1/2] z's adjacent to each y. Hence the bipartite graph whose vertices are y~,..., Yk : Z~,... and whose edges are the edges (y~,z j) of G(n, m) has at least t2 k ½[tn l/z] = k[tnl/2] 2 > ~-n edges. The number of its vertices is clearly < n. Thus by the theorem of Gallai and myself [7] it has a path of length > 2c3 t 2 and as in the p r o o f of Theorem 2 every second vertex of this graph is a y. Since x is adjacent to every y this path together with the vertex x gives the required circuits C2t, 2 < l < c3t 2, which proves Theorem 3.
REFERENCES 1. Tur~tn, P., 1941, Mat. Lapok, 48, 436--452 (Hungarian), see also Turan, P., 1955, On the theory of graphs, Coil. Math., 3, 19-30. 2. Dirac, G., will appear in Aeta Math. Acad. Sci. Hungar 3. Erdtis, P., 1947, Some remarks on the theory of graphs, Bull. Amer, Math. Soc., 53, 292-294; see also ErdSs, P. and Renyi, A., 1960, On the evolution of random graphs, PubL Math. Inst. Hung. Acad, 5, 17-61. 4. Erd6s, P. and Stone, A. H., 1946, On the structure of linear graphs, Bull. Amer. Math. Soc., 52, 1087-1091. 5. Erd6s, P., 1938, On sequences of integers no one of which divides the product of two others and on some related problems, Izv. Nauk. Inst. Mat. Mech. Tomsk., 2, 74-82. 6. ErdiSs, P., 1962, On the theorem of Rademacher-Tur~in, Illinois Y. of Math., 6, 122-127. 7. Erd6s, P. and Gallai, T., 1959, On maximal paths and circuits of graphs, Acta. Math. Acad. Sei. Hungar., 10, 337-356. UNIVERSITYOF MICHIGAN~ ANN ARBOR,MIctnOAN, U.S.A.
A k-EXTREME POINT IS THE LIMIT OF k-EXPOSED POINTS BY
EDGAR ASPLUND ABSTRACT
It is proved that the relative boundary of a k-dimensional intersection of a hyperplane and a compact convex body is contained in the closure of the union of all intersections of dimension lower than p that the same convex body makes with different hyperplanes. We prove in this note a theorem that generalizes a theorem of Straszewicz [-3]. Let C be a c o m p a c t convex body in R n. DEFINITION 1. A point p e C is called k-extreme if it is not the centroid of some non-degenerate (k + 1)-simplex in C. In the terminology of Bourbaki I-2, §1, exerc. 2] one would say that a k-extreme point is o f order at most k. DEFINITION 2. A point p e C is called k-exposed if it is contained in a closed half-space K such that K n C is at most k-dimensional. We collect some immediate consequences of the definitions. COROLLARY. A k-exposed point is k-extreme. A k-exposed (extreme) point is
h-exposed (extreme)for all h > k. I f ~p is a supporting hyperplane of C and p ~ q~ n C is k-extreme with respect to ~ n C, then p is k-extreme with respect to C. We will call 0-exposed (extreme) points exposed (extreme) to conform with earlier usage. The following theorem coincides for k = 0 with the theorem of Straszewicz. THEOREM. For k > O, the closure of the set of k-exposed points contains the set of k-extreme points. F o r k > n - 1, the theorem is true trivially. For k = n - 1 the set of k-exposed points and the set of k-extreme points both coincide with bd C and for k __> n with C. Received September 30, 1963 161
162
EDGAR ASPLUND
This theorem is applied in [1] to the problem of determining which subsets of a general finite-dimensional Banach space have unique farthest points. We proceed to prove the theorem by induction on k. The proof for k = 0 may be found in Straszewicz [3] and in Bourbaki [2, §4, exerc. 15c], and we will use this "ordinary Straszewicz t h e o r e m " both as a starting point for the induction and in the p r o o f itself. Suppose that the theorem has been proved for k < h. Let p be a point in the complement of the closure of the set of h-exposed points. In order to obtain a contradiction we also add the hypothesis that p is h-extreme. Let A be an open neighborhood of p such that no q EA is h-exposed. By the induction hypothesis, p is the centroid of some non-degenerate h-simplex S. Choose a linear manifold M that passes through p and that is supplementary to the one generated by S. Since by hypothesis the point p is h-extreme, it must be extreme relative to M n C and hence, by the ordinary Straszewicz theorem, some point q E A n M n C is exposed relative to M n C. We will now show that the face of q is at most (h-1)-dimensional, thereby arriving at the contradiction, since by hypothesis q is not ( h - 1)-extreme. Let N be a hyperplane of M that separates q from M n C. As an affine manifold of R n, N has codimension h + 1 and N n C = {q). By the Hahn-Banach theorem, there is a hyperplane tk containing N that supports C at q. In this hyperplane N has codimension h and separates q from the rest of ~b n C. But q is not h-exposed, hence q5 n C has at least dimension h + 1. Let F be the affine space generated by ~b n C. Then N n F has codimension at most h in F, and (N n F) n C = {q} so that there exists a hyperplane ~k of F supporting q ~ A C = F C ~ C at q and containing N ~ F . In ~, N t ~ F has codimension at most h - 1 . But ~kalso contains the face o f q (in C), which therefore can have at most dimension h - 1. The proof is now complete.
REFERENCES 1. Asplund, E., Sets with unique farthest points (to appear). 2. Bourbaki, N., 1956, Espaces vectoriels topologiques, chapitre IV. Paris. 3. Straszewicz, S., 1935, Uber exponierte Punkte abgesehlossener Punktmengen, Fund. Math., 24, 139-143. UNIVERSITY OF CALIFORNIA, BERKELEY, CALIFORNIA
ON HAMILTONIAN BIPARTITE GRAPHS BY J. MOON AND L. MOSER ABSTRACT Various sufficientconditions for the existence of Hamiltonian circuits in ordinary graphs are known. In this paper the analogous results for bipartite graphs are obtained. Various sufficient conditions for an ordinary graph (without loops or multiple edges) to be Hamiltonian have been given by Dirac, Erd6s, Ore, P6sa, and others. The object of this note is to point out some corresponding results for bipartite graphs which can be obtained by similar methods. A bipartite graph B(n,n) consists of the vertices Pl,P2,'",Pn and ql,q2 ..... qn and some of the edges (Pi,qi). We assume throughout that n _~ 2. The degree d(x) of the vertex x is the number of vertices with which it is adjacent, or joined by an edge. A graph is Hamiltonian if it contains a complete cycle, i.e., a cycle which contains every vertex of the graph exactly once. The following result is proved by an argument very similar to one used by P6sa I7]. THEOREM. A bipartite graph B(n,n) is Hamiltonian if it has the following property: For any nonempty subset F of k < ½n vertices Pi each of degree d(pi) < k, every vertex q.i with degree d(qj) < n - k is adjacent to at least one vertex in F, and similarly with Pi and q~ interchanged. Proof. Assume that B = B(n, n) satisfies the hypothesis of the theorem but is not Hamiltonian. We may suppose that B becomes Hamiltonian whenever any new edge (p,q) is added, joining vertices not already adjacent. For if B did not originally have this property we could add suitable edges until it did and the graph so obtained would still satisfy the hypothesis of the theorem. Suppose that the vertices Pl and q i are not adjacent. Then there exists a complete path, say (Pl,q2,P2 ..... qn, P,,q~), from Pl to ql. If qi is any vertex adjacent to p~ then it must be that Pi-1 is not adjacent to q~, for otherwise (Pl,qj, P j , . . . , q l , P j - I , q j - I , P j - ~ . , . . . , q 2 , P l ) would be a complete cycle o f B. F r o m this it follows that
d(p) + d(q) <_ n,
(1) Received October 9, 1963.
163
164
3. MOON AND L. MOSER
[September
for any nonadjacent vertices p and q in B. When d(p) + d(q) is evaluated for all pairs of nonadjacent vertices p and q, suppose that it attains its maximum when p = Pl and q = ql and that d(pl) < d(ql). If d(pl) = k, then from (1) and the hypothesis of the theorem it follows that 1 -< k _< ½n. We have seen that if q1 is adjacent to p~ then ql is not adjacent to P j-1, using the same notation as before. There are k such vertices p j_ 1 and each of their degrees is at most equal to k, for otherwise we would have
d(pj-1) + d(ql) > d(Pl) + d(ql), contradicting the choice of Pl and ql. The fact that ql, where d(ql) < n - k, is not adjacent to any of the vertices p j_ 1 violates the original hypothesis. This contradiction suffices to complete the proof of the theorem. The following corollary is analogous to the theorem proved by P6sa [7] for ordinary graphs. COROLLARY1. I f B(n,n) is such that for every k, where 1 <_k <_½n, the number of vertices p such that d(p) < k is less than k, and similarly with p replaced by q, then B(n,n) is Hamiltonian. A simple example of a class of graphs to which the theorem but not this corollary would apply is the bipartite graph with four vertices of each type and which contains the edge (Pi, qj) if and only if [j - i [ _-< 1. The following result follows from Corollary 1 by the same sort of argument as was used to obtain the result of Erd6s [3-1 from the result of P6sa [7]. COROLLARY2. Suppose that d(x) >=k for every vertex x of a bipartite graph B(n,n) where k is some integer satisfying 1 <- k <_½n. I f the number of edges in B(n,n) exceeds max {n(n - t) + t 2 I k
-<_t < ½n} = max{n(n - k)
+ k 2,
n(n-[½n"]) + [½n.12},
then B(n,n) is Hamiltonian. Setting k = 1 in this statement gives a result corresponding to one given by Ore [6] for ordinary graphs. The following corollary is another consequence of the above theorem. A similar result for ordinary graphs was given by Ore [5]. (See also Erd6s and Gallai [2]). COROLLARY 3.
If B(n,n) is a bipartite graph in which d(p) + d(q) > n,
for every pair of nonadjacent vertices p and q, then B(n, n) is Hamiltonian. If it is assumed that d(x) > ½n for every vertex x in a bipartite graph B(n, n), then Corollary 3 may be applied to yield a result analogous to one given first by Dirac [1] and later by Newman [4] for ordinary graphs. ~fhe result may be expressed in somewhat more terpsichorean language as follows.
1963]
ON HAMILTONIAN BIPARTITE GRAPHS
165
COROLLARY4. Given n men and n women, a sufficient condition f o r setting up a Hora (A Hora is an Israeli dance involving a circle made up of the participants, in which men and women alternate and presumably each person knows his or her two partners.) is that each man knows more than half the women present and each woman knows more than half the men present. It is not difficult to construct examples that show that each of the preceding results, is in a sense, best possible. However, the problem o f finding conditions which are both necessary and sufficient for a bipartite graph to be Hamiltonian remains unsolved and appears to be very difficult.
REFERENCES 1. Dirac, G.A., 1952, Some theorems on abstract graphs, Proc. London Math. Soc., 2, 69-87. 2. ErdSs, P. and Gallai, T., 1959, On maximal paths and circuits of graphs, Aeta Math. Acad. Sci. Hung., 10, 337-356. 3, Erd6s, P., 1962, Remarks on a papel of P6sa, Publ. Math. Inst. Hung. Acad. Sci., 7, 227-228. 4. Newman, D. J., 1958, A problem in graph theory, Amer. Math. Monthly, 65, 677. 5. Ore, O., 1960, Note on Hamilton circuits, Amer. Math. Monthly, 67, 55. 6. Ore, O., 1961, Are coverings of giaphs, Ann. Mat. Pura Appl., 55, 315-321. 7. P6sa, L., 1962, A theorem concerning Hamilton lines, Publ. Math. Inst. Hung. Acad. Sci., 7, 225-226. UNIVERSITY COLLEGE, LONDON AND UNIVERSITY OF ALBERTA
ON A THEOREM BY P. MALLIAVIN BY
/~KE PLEIJEL ABSTRACT
P. Malliavin has deduced the asymptotic behaviour of a measure from the behaviour of its Stieltjes transform when z-+ oo along a curve. His result is here proved in an elementary way. 0. Recently P. Malliavin [-1] published a tauberian theorem for Stieltjes-transforms considered in the complex plane. His result has already been used by S. Agmon in a study of eigenvalues of differential operators which will appear in a forthcoming paper. I have observed that Malliavin's theorem is easily obtained by a method which I once used to study tauberian theorems [2]. Since I have understood that Malliavin's proof follows other lines I have considered it worth while to publish my way of proving his result. 1. A basic formula.
Let
f(z)
dtr (2)
=
and let L(Z) be an oriented curve in the complex plane from Z to Z = X + iY not cutting the line of integration 0 < x < oo. We assume X > 0, Y > 0. By a change of the order of integration (1)
t ( z ) = T 1~ i
~z) f(z)dz =
v(2,Z)dtr(2)
where v is the angle between the negative real direction and the direction from (~,0) to Z. We observe that if f = f l + if2, f t and f2 real, then
Yf t(Z) = fo~ sin 2v dtr(2),
Yfz(Z)- JoSin2v da (,~). Formula (1) can be written Received November 1, 1963 166
ON A THEOREM BY P. MALLIAVIN
(2)
167
1
I(Z) - ~ Yfl(Z) - a(X) + a(O) =
lf~(v =-½sin2v)d, ( ) +
f:(v-½sin2v)da( ).
Here
(3)
Iv - n - ½sin2v[ < csin2v,
(4)
[ v - ½sin 2v] c sin2v,
5¢<
<
2=v
It,
O
~-,
lg
where c is a constant. It is evident that sin2v can be replaced by sinav which, however, does not improve the results in the considered cases. From (2), (3), (4) it follows, provided d a ( 2 ) > 0, that
(5)
It(z) -
1
c
r f l ( z ) - o ( x ) + .(0)[ ____ rye(z).
2. M a l l i a v i n ' s t h e o r e m . Malliavin considers a curve L which for x > 0 is given by y = + x ~, 0 < y < 1, and assumes for f of section 1 that
f(z) = a ( - z) ~ + O(z #)
(6)
when z ~ oo along L. Here - 1 < ~ < 0 and fl < ~. It is also supposed that
da(2) > O. Evidently L may be modified near the origin so as to give a connected curve not cutting the positive real axis x > 0. Since fl < a it follows from (6) that
Yf(Z) = O(X ~+~) when Z ~ 0o along L. We define L(Z) as the part of the modified curve L for which Re z < X when Z = X + i Y. O( ) refers to the case when Z--} oo along L. Write (7)
l(Z)=
lfL
af,
t z ) i f ( z ) - a(-z)')dz +
~z)(-z)"dz.
To estimate the first integral of this formula we consider separately the cases - 1 < fl < 0, fl = - 1 , fl < - 1 . In the first case the integral is O(Xa+X), in the second it is O(logX). In the third case the integral converges to some constant A. The rate in which it approximates to A is O(X#+I). The second integral of (7) equals
aR~+ t sin(a + 1 ) ( n - ~k) n(~ + 1)
'
where R = ] Z 1, ff = arg Z. This expression is written
/~KE PLEIJEL
168
aXe+ t sinz(~ + 1) +
O(X~+r).
z(~ + 1) All taken together we find by the help of (5) that when X-~ oo
a(X)_a(O)=aX~+ 1 s i n n ( ~ + 1) n(a+l) + O(X ~+~) + O(X p+~) + A, where A is present only when ] / < - 1 and the term O(X ~+t) should be replaced by O(logX) when ] / = - 1 . (The 8 in Malliavin's remainder is not necessary.)
BIBLIOGRAPHY 1. Malliavin, Paul, 1962, Un th~or~me taub~rien avec reste pour la transform6e de Stieltjes, Comptes ReMus de l'Aeademie des Sciences, Paris, 255, 2351-2352. 2. Pleijel, ./d~e, 1952, On a theorem of Carleman, Matematisk Tidskrift, B, 39-43.
MATEMATISKAINSTITUTIONEN LUND, SWEDEN
GRAPH-THEORETIC AND ALGEBRAIC CHARACTERIZATIONS OF SOME MARKOV PROCESSES* BY
A. PAZ ABSTRACT
An algebraic decidable condition for a stationary Markov chain to consist of a single ergodic set, and a graph-theoretic decidable condition for a stationary Markov chain to consist of a single ergodic noncyclic set are formulated. In the third part of the paper a graph-theoretic condition for a nonstationary Markov chain to have the weakly-ergodic property is given. 1. Introduction. In this paper we are concerned with algebraic and graphtheoretic characterizations of stationary and nonstationary M a r k o v chains (with discrete time parameter and finite n u m b e r of states). In the first part we formulate a graph-theoretic condition for a given chain to consist of a single ergodic noncyclic set. This condition is shown to be equivalent, as to characterization, t c a condition given in D o o b [3, p. 173] but more economical as to computation. In the second part we formulate a simple algebraic sufficient and necessary condition for a stationary M a r k o v chain to consist of a single ergodic set. This condition can also be used to characterize finite graphs, since with every finite graph one can associate an infinite number o f M a r k o v chains. In the third part we derive a theorem generalizing a result mentioned in D o o b [2] for stationary chains, to a certain type of nonstationary chains. The theorem states, roughly, that such a nonstationary chain, after a sufficiently long lapse of time, " f o r g e t s " its initial state. Some formulas characteristic of nonstationary chains are given, permitting approximation of long products of stochastic matrices satisfying certain conditions.
Afinite graph [1] is an ordered pair ( S , F ) where S is a finite nonempty set and F a multi-valued mapping of S into S. The elements of S are called vertices, and the ordered pairs ( a , b ) , such that a E S and b c a F * * are called edges. A sequence of vertices (aoa 1 ... av) such that ( a i a i + x ) (i = 0 , . . . , v - l ) is an edge, is a path of length v and av is a consequent of order v of ao. (Notation: a~ E a0F*). Received May 14, 1963. * The paper is based on part of the author's work towards the D. Se. degree. ** a F is the set of all F images ofa. 169
170
A. PAZ
[September
A graph is strongly connected if for every pair of vertices i,j (i ~ j), i is a consequent of j. A pair of vertices i and j has a common consequent k (of order v) if there exists a v such that k ~ i F ' f 3 j F ~. If all the vertices in the graph have a common consequent k of order v, then k is a universal consequent of order v for the given graph. A vertex in a given graph is transient if it has a consequent of which it is not itself a consequent. A vertex which is not transient is nontransient. Note that, if i is a consequent o f j and j is nontransient, then j is a consequent of i and i is nontransient. (For let k be any consequent of i; then k is a consequent o f j and j is nontransient, whence j is a consequent of k, this implying that i is also a consequent of k). Note also that the set of nontransient vertices cannot be empty, since a maximal sequence of vertices connected by a path must terminate in a nontransient vertex. The class of nontransient vertices is subdivided into ergodic subclasses (where an ergodic class is the set of vertices of a maximal strongly-connected subgraph) with two vertices belonging to the same ergodic class iff they are consequents of each other. Finally, it can be shown [4, p. 611 that with any ergodic class E there is associated a unique positive integer d with the following properties: 1) If i e E and i e iF', then d divides v. 2) ! f i , j ~ E and .~eiF z as well as j e i F ' , then l ~ - v ( m o d d ) . 3) d is the smallest integer having properties 1) and 2). Thus any ergodic class E may be subdivided into d cyclic classes C1 ". Cd, as follows: Two vertices i,j belong to the same cyclic class, i f f j • iF" and v - 0(modd). Note that if i e Ct, j e Ct (1 < t -< d) and j e iF" then v - t - l ( m o d d ) EXAMPLES.
1
2
1"
)'3
\
s
3
5
4
2"
7.4 II
In graph I vertices 1 and 5 are transient while 2, 3, 4 are nontransient, forming an ergodic (strongly-connected) subgraph. Vertices 5 and 3 have vertex 4 as common consequent of order 1. In graph II all vertices are nontransient. The graph is strongly connected and subdivisible into the cyclic subclasses {1,2} and {3,4}.
1963]
CHARACTERIZATIONS
OF SOME MARKOV
PROCESSES
171
2. Conditions H 1 and H 2. We shall consider the following conditions for a given graph: Condition HI., Every pair of vertices in the graph has a common consequent. Condition H 2. The graph has a universal consequent. THEOREM 1. A graph which satisfies condition H 1 contains a single ergodic class which is not divisible into cyclic subclasses. Proof. Suppose there are several ergodic classes in the graph, G~,G 2 " " G r. Let il ~ G~, it ~ Gt be a pair of vertices of different classes. By our assumption, il and is have a common consequent k, where k is nontransient being a consequent of nontransient vertices. Hence il and is are consequents of k, and this implies that k c Gt and k ~ Gr Thus G1 and Gt are identical classes. To prove the second part of the theorem, assume that the ergodic class in the graph is divisible into several cyclic subclasses C1, C2"" Cd. Let c 1 e C1 and ct ~ Ct be a pair of vertices of different classes. They have a common consequent k which is nontransient and hence belongs to a cyclic class Ck. This implies that k is a consequent of order k - l ( m o d d ) of cl and a consequent of order k - t ( m o d d ) of Cr Now k is a common consequent of c I and ct, which implies that k - 1 - k - t ( m o d d ) or 1 - t ( m o d d ) ; thus c~ and ct are identical cyclic classes. THEOREM 2. Let ( S , F ) be a graph with n vertices. If a pair of vertices i and j, i, j ~_S, has a common consequent, then it has a common consequent of order v where v < n ( n - 1 ) / 2 . Proof. If states i and j have a common consequent, then there exists a sequence of (unordered) pairs of vertices (with i = io, j =Jo): (ioJo),(iljx) ... (iuju)
such that (1)
(2)
i k ~6 jk i k E iF k, Jk EJ Fk
k=0,1,2,...,/t-1
(3) iu = j, If the sequence contains two equal pairs, then omit the part of the sequence between these pairs, including the second of the equal pairs. Repeat this procedure until a reduced sequence is obtained: (ioJo),(i[j~) ... (i~j'k)"'" (i'j'v) such that (1') ~kV~Jk k = 0,1...v-1 (2') i;, ~ iF k j k % j r k (3') (i~j;) ~ (i~j',) l~t, k,t=l,2...v (4') i" = j" Now by (2') and (4') i~ = j" is a common censequent of order v of the vertices i and j, while by (1') and (3') ~ is at most n ( n - 1)/2. Q.e.d. • !
.!
172
A. PAZ
[September
Note that if vertices i and j have a common consequent of order v, then they also have a common consequent of order v + k,k = 1,2.... TrmOR~M 3.
Condition H1 is equivalent to H2.
Proof. That H 2 implies H1 is trivial. Now assume Ht to hold and let i and j be a pair of vertices in the graph. These vertices have a common consequent k of order yr. Let t be a vertex different from i and j and t any consequent of t of order vl. By H1, vertices l and k have a common consequent of order v2, which is a common consequent of order vl + v2 of vertices i, j and t. This argument can be repeated a finite number of times to give a universal consequent. Q.e.d. With every finite graph ( S , F ) where S = {vlv2 "" v,} one can associate an n x n Boolean matrix (transition matrix) M<s,r> = II m,jl!, such that mi~ =
1 - i f (v~vi) is an edge of the graph, 0 - otherwise.
It is easily proved that the powers of M <s,r>, namely M
= IIm~'ll, k are such that:
-- 1,2,.
,
(k)_ [ 1 - i f there is a path of length k from i to j, mij - / 0 - otherwise. This result, combined with Theorem 2, provides a computational method for deciding whether a given graph satisfies H1. The question is decided by raising the transition matrix of the graph to the n. ( n - 1 ) / 2 - t h power at most. By Theorem 3, the same method will decide whether a given graph satisfies H2. Note that H 1, although equivalent to H2, is more economical in the sense that more steps would be required to decide directly whether a given graph satisfies H2. EXAm'LE. Consider the following graph 2"
1
>.3
T
l
-4
.<
•",a
J
In order to verify whether vertices 2 and 5 have a common consequent, we consider the sequence of consequent sets: {2} 1 {3} 2 {4} 3 {5,1} " {1,2} s {2,3} 6 {3,4} 7 {4,5,1} 8 {5} {1} {2} {3} {4} {5,1} {1,2} {2,3}
s {5,1,2}
-,,.k
{3,4}
'
..-.k
{1,2,3}
{4,5,1}
1963]
CHARACTERIZATIONS OF SOME MARKOV PROCESSES
173
One sees that vertex 1 is a common consequent at vertices 2 and 4 of order 9 (the bound given by Theorem 2 in this case being 10). Inspection also shows that any other pair of vertices in the graph have a common consequent at smaller order than vmtices 2 and 5. Further consideration of this example shows that 13 steps would be required to decide directly whether this graph satisfies H 2. REMARK. The above graph can be generalized: Let S = {1,2,...,n} and define: iF=i+1 (n-2)F (n-
= {n-ll
1)r = 1
It can be shown that this graph satisfies the condition H1 but the smallest order for which the condition is satisfied is (n z - 2n + 2)/2 if n is even and (n 2 - 2n + 3)/2 if n is odd. These numbers are close enough to the bound given in Theorem 2. 3. Stochastic matrices. Stationary ease. In what follows familiarity with finite stochastic matrices and their properties is assumed. The reader is referred to [2, 3, 4] for a detailed account on these topics. A finite stochastic matrix is a square matrix A = IIaij[l such that aij_-> 0, i,j = 1,2,..-,n, and ~ = 1 alj = 1, i = 1,2,...,n. With every stochastic finite matrix A there can be associated a finite graph (S, F ) such that S = (vlv 2... v,,}, where n is the order of A and the ordered pair of vertices (Vl, vj ) is an edge of the graph iff a~j > 0. More generally, the elements of the stochastic matrix are interpreted as the probability of transition from state to state in a system having n internal states. Using the previous classification of vertices in a given graph (internal states in a given system) we can now state: THEOREM 4*. I f A is a stochastic matrix, there is a stochastic matrix Q = UqlJ][ such that lim 1 ~ Am = Q. m=l
n~oO n
THEOREM 5**. The limit qij in Theorem 4 is independent of i iff there is a single ergodic class in the graph related to A. TrIEOREM 6***.
The limit qij of Theorem 4 can be taken as ordinary limit lim 1 ~ Am = lim A m n~oO n m=l
* For a proof, see [2, p. 175]. ** For proof, see [2, p. 181]. *** See [2, p. 182].
n~oo
=
Q
174
A. PAZ
[September
iff there are no cyclic subclasses in any ergodic class of the graph related to A.
COROLLARY 1. If A is a stochastic matrix and the graph related to this matrix satisfies H1 or (H2), then there is a matrix Q = IIq, ll such that lim -1 ~ A m = l i m A" = Q n-¢oo n
m=l
n,..¢oo
and qij is independent of i. Proof. By Theorems 1, 3, 5 and 6. The limiting matrix Q. In the sequel, some additional definitions are needed: Let A be an n x n matrix, ~/the n-component row vector and ~ the n-component column vector having all components equal to 1. A(,) denotes the r-th row and A (') the r-th column of A. Clearly (A • B)(,) = A(,)B; .4(,) is obtained from A(,) by omitting the r-th term. Clearly ~- A(,) =
A(') , i. ,1~(,) e. a matrix all whose rows are equal to At,).
If A is stochastic, then A(,)~ = 1 and A~ = ~. If A¢ = ~ and also r/A = r/then A called doubly stochastic. For any stochastic n × n matrix A and any r < n we define the r-th kernel of A, denoted by kr(A), as k,(A)= A - ~A(,). Finally, k,(A) is obtained from k,(A) by omitting the r-th column and the r-th row. (k,(A) is a ( n - 1 ) x ( n - 1 ) matrix). The usual way to compute lim An or lim 1 ]~ Ak (when the first limit does n~oO
n~oO
n
k=l
not exist) is based on the fact that the limiting matrix Q satisfies (in both cases) the equation QA = AQ - Q or Q I - I - A ] = 0 This equation is equivalent to a set of n linear homogeneous equations. Nontrivial solutions exist provided ] I - A I= 0. If the system associated with the matrix consists of a single ergodic class, then it is known that a unique solution exists for the set of equations f (xl ...~n) [I - A] = 0 (1)
xi = 1 i=l
This implies that I 1 - A I = 0 and the dimension of [ I - ,4] is n - 1 (for the single ergodic class system). From (1) we obtain, by adding to both sides (xl " ' x n ) ' ~ ' A t , ),
1963] (2)
CHARACTERIZATIONS OF SOME MARKOV PROCESSES
175
(xx -.. x,) [I - (A - ~A(r))] = (xl ... x,) ~ A(,)
Now ]Exi = 1, hence (xl "" x,)# = 1 and (xl "" x,)#A(,)= At,); thus (xl . . . x , ) [ I - ( A - CA(,))] = A(, ), and by the definitions introduced above, (3)
(xl ... x,) [I -/~,(A)} = a(,)
Any vector (xl • .. x,) which is a solution of(3) is such that (xx ".. x,)~ = ~]in = 1X i = 1. Indeed, ]ELlx i = (xl.-.x,)~ = ( X l . - . x , ) [ ~ - ~ + ¢] = (xx...x,,)[I - A + ~A(r)] ~ = = (x 1 . . . x , ) [ I - /~,(A)]~ = A(,)~ = 1. Any such vector is therefore a solution of (2) and (1). Thus (1) is seen to be equivalent to (3). From Theorem 5 it follows that the system related to the matrix A consists o f a single ergodic set iff (1) and (3) have a unique solution such that;
I I - x (a) l # o (4)
and (xl "'" x,) = A(r)[I -- £ ( A ) ] - 1.
We have thus proved the following TI-rEOREM 7. Let A be a stochastic matrix, then for any r = 1,2,...,n, ]1 - /~r(A) ] # 0 if and only if the system related to A consists oJ a single ergodic
class. REMARKS. 1) It is easily seen that the theorems in this section could have been proved with K,(A) replacing K,(A) (see definition on p. 178). This use of K,(A) has the advantage that K,(A) is an ( n - 1 ) by ( n - 1 ) matrix while R~(A) is n by n. Similarly we can write, instead of (3) and (4), (3')
(xl "'" Xr- 1X,+I "'" Xn) [I -- Kr(A)] = A(r)
(4')
(xl ... x,_ 1Xr+ l " " X,) = A(~)[I - Kr(A)]- 1
The information contained in these formulas is the same as that in (1) and (2), since (xl .--x,) is a stochastic vector, whence x, = 1 - ~ i , , x i . 2) Corollary 1 in the preceding section and Theorem 7 provide a method for analysing the structure of a system representable by a stochastic matrix. With the aid o f Theorem 7, we first ascertain whether the system has a single ergodic class or not. If it has several ergodic classes, we can investigate each class separately. If it has a single ergodic class, we have to ascertain whether it satisfies H1 (or H2). If it does, the ergodic class is not divisible into cyclic subclasses. The decision procedure given here is quite simple from a computational point of view. 3) The decision procedure described in Remark 2 can be applied to finite graphs as well. With a given finite graph one can associate a stochastic matrix
176
A. PAZ
[September
A=II~,jI I
such that a , j > 0 iff (v,vj) is an edge of the graph, and ~ = t a ~ k = 1, i = 1,2,...,n. The matrix A is clearly not unique and we can choose the simplest possible matrix (as to computation) satisfying the above condition; then the procedure described in Remark 2 can be applied to the chosen matrix. EXAMPLES. Consider the matrices
303
o ~ o A=
0 01
o o
o~o
0
ooo33"
0330
o o]~o
o
½0~
3¼0
0
oo]]o
]ooo
B=
o 3J
C =
0¼¼03
]ooo
]o]o~
¼ 030¼
(]3o~o
For matrix A, [ I - Ks(A)] = 0. Hence A has more than one ergodic class. We find by inspection that the sets are {1, 3} and {2,4}, while state 5 is transient. Matrix B corresponds to a chain consisting of a single ergodic set and two cyclic subclasses, and matrix C satisfies HI.
Set
A =
0
3
¼ 3 We shall calculate limA n using (4): '
=16[{
lim X(n)= ~ x hence lira Ate3)= (~,~,~) ~ 1 and -'1(3) A(a)(I - Ka(A)] -1 = (~,-~), 11"-)00
n--P oo 2
1
2
2
1
2
lim A" 11")(30
4. Stochastic matrices - - nonstatlonary case. In the preceding sections we considered stationary stochastic systems representable by a constant transition matrix. Nonstationary stochastic systems (discrete time parameter and finite number of states) are systems for which the transition probabilities may change from step to step. Mathematical represemation of such a system is provided by a sequence of stochastic matrices A(1), A(2),..., A(O,..', where A(/) is the transition matrix in the i-th step.
1963]
CHARACTERIZATIONS
OF SOME MARKOV
PROCESSES
177
The set {A(k) [ k = 1,2,....) for a given system may contain an infinite number of distinct matrices. However, we confine ourselves here to a finite set of distinct matrices N = {Ak]k = 1,2,...,n} and consider all products of the form H I~=1Ar~/, D m = 1,2,-.-, where A(i) is as above and A ( i ) e N , i = t , 2 , . . . , m . Set I I =lA(i) 11b,J II, M5")= maxb,j, m~"'= m~n b,~. The counterpart of H 1 for the nonstationary case is the following: Condition H 3. A finite graph satisfies H 3 if every pair of vertices in the graph has a common consequent of order one. In what follows we shall say that a matrix satisfies Hx (H2 or Ha) if the graph related to the matrix satisfies H1 (H2 or H3). We can now prove the following: THEOREM 8. I f N is a finite set of matrices all satisfying H a and if B = I'IL~A(i), A(i) ~ N, then M~ a) -- m~B)< (1 - 6)"
j = 1,2, . . . , n
where 6 is the minimal nonzero element among all elements of the matrices in N. Proof. (This proof generalizes the one given in [3, p. 173] for the stationary case). Let A be any matrix in N, A = IIa,j II. Consider the sum ~ = l ( a , k - a,k) for fixed e and ft. Divide this sum into two sums, ]E~ denoting summation over values of k for which a,k > aak, and I2~ denoting summation over values of k for which a,~ < aak. Then: ~..+ (a~k ~-- aOka) ar ~,-(a~k~ - a#k 2) = ~ (a~k -- aOk) = ~ aak -- ~ aOk = 1-- 1 = O. k~
k2
k=l
k=l
k=l
Hence
States ~ and fl have a common consequent of order one (condition Ha); hence there exists a~ such that aft, > 0 and ao~ > 0. If a,~ > ap[, then (a,~ - ap~) s ~+,, and
Z+ta ~:, ~, -- aak,) = Z~a~,-- ~+,atjk, < ~, a~--a#i~<- l--6 k--I
If a#~ < a¢~, then (a~g - a#~) ¢ ~ ,
and
Thus in both cases
(**)
~ + l ( a ~ , - apk,) < 1 - 6.
Let C = 1 e,l II ¢ N. We obtain
178
[September
A. PAZ
M~jAc)- m~A'c)= max Z (a~ - a~k)ekj ~,~
=<
k
l+
}
~
max Zk,(a,k,- a~,)M} c~ + Z£(a~k,- apk~)m}c~
--
max 2+(a~k - a Ok]w~(c) k ~'~ j =,fl
--
m~c~) =< (1 - ~)(M} ~
m}%;
kj
due to (*) and (**) M~ c) -- m~c) = c~1 -- cpj < ]~(C~k -- ct~k) <= 1 -- 6
for fixed ~ and fl, so that M~ ac) - m(jA'c) < (1 - 6) 2. The theorem follows by induction. THEOREM 9.
I f A a n d B are stochastic matrices, then K , ( A B ) = K , ( A ) . K ~ ( B ) .
By definition R , ( A B ) = A B - ~(AB)(,) = A B - ~A(,)B =
Proof. -
~A(,)B
~B(r) = A B - ~A(,)B + ~A(,)~B(,) - A~B(,)
+ ~B(r ) -
=
AB
=
[A - ~A(,)] [B - ~B(,)]
=
gr(A) g,(B).
Thus
g,(AB) = g~(A)" g~(B).
The r-th rows of the matrices o n both sides of this equations are zero rows. The r-th column on the right is obtained by multiplying/~(A) by the r-th column of/~,(B). Thus omitting the r-th rows and columns on both sides does not affect the equality, and therefore K ~ ( A B ) = K ~ ( A ) " K~(B).
THEOREM 10. I f A a n d B are stochastic m a t r i c e s then (XB)(,) = A(r)K(,)(B) + 1~(,) Proof. A ( , ) R , ( B ) + B(~) = A ( , ) [ B - ~B(r)] + B(,) = A(r)B - A(,)~B(,) + B(,)= = A(,)B - B(,) + B(,) = A ( o B = (AB)(,). Hence, (AB)~,) = A(o R(,)(B) + B(,), and by considerations similar to those in the proof of the preceding theorem we obtain (AB)tr) = A(,)K(,)(B) + ]~(,). By induction, we obtain the formula A(i) (,) = ,,i(,) K,(A(2))" Kr(A(3)) ..... K , ( A ( m ) ) + i
+ A ( 2 ) ( , ) K , ( A ( 3 ) ) K , ( A ( 4 ) ) . . . . . K , ( A ( m ) ) + ... + A(m)(,).
THEOREM 11.
I f H is a set o f m a t r i c e s all s a t i s f y i n g Ha, then
("
lira K, l-IA(i) m-~oo
)
= 0,
\i=1
where A ( i ) ~ N a n d 0 is the zero m a t r i x .
Proof. The terms in K , ( [ I m A ( i ) ) are differences of pairs of terms in the same column of the product matrix (1-l'~(A(i)). By Theorem 8, these differences are
1963]
CHARACTERIZATIONS OF SOME MARKOV PROCESSES
179
smaller than ( 1 - ~ ) m where 6 is the smallest nonzero element in all matrices contained in N. But lim=_.oo(1-6)= = 0, and the p r o o f is complete. REMARKS
AND
CONCLUSIONS.
1) Consider the following condition for a set N of stochastic matrices: Condition H , . A finite set of stochastic matrices of the same order satisfies H , (of order k) if there is a k such that every product of k or more matrices from N is a matrix satisfying Ha. In a forthcoming paper we shall prove the following: THEOREM. Let N = {Ai]i = 1 . . . n } be a finite set of stochastic matrices o the same order. Let Bm be any matrix of the f o r m B m = I-I~=i Aj, A j ~ N. Define 11Bm II as: IIB ll = max,(M~ Bin'- m~~m,) (M~~m, and m (B') are as in Theorem 8). I f and only i] N satisfies H a then limm..~o Bin ]l = 0. This theorem generalizes Theorem 8. Moreover, it will be shown in that paper that H 4 is a decidable condition, i.e. one can check in a finite number of steps whether a given set of matrices N satisfies H 4. 2) If N contains a single matrix, we revert to the stationary case. In this case Theorem 12 states that lim,_,oo[Kr(A)] n = 0. By Theorem 11 we find that
II
m--1
A(L) = A(o[Kr(A)] " - I + .'[(o[Kr(a)] "-2 + ... + A(O = A(r ) ~, [K~(A)] i. i=0
In the limit this becomes n--1
lim A(•) = A(o lim ]~ [K,(A)]k n -..¢oo
n'coo / = 1
Consider the following THEOREM 12". I f A is a matrix such that l i m A " = 0 then ]I - A] ~ 0 and lim ~i~=o Ai = [I - A] - x With the aid of this theorem, noting that K~(A) satisfies its condition, (if A satisfies Ha) we obtain lim A(~,) = A(r)[I - K,(A)] -x, a formula derived earlier in a different manner. 3) If N contains an infinite set of matrices, Theorems 8 and 11 may be proved by stipulating that all nonzero terms in all matrices of the (infinite) set N have a lower bound e > 0 and that all matrices in N satisfy H a. 4) If N is a finite set of doubly stochastic matrices satisfying Ha, then the products o f the form II~AI,A~ ~ N, have a limit or, more precisely, the matrix Q all whose terms equal 1/n satisfies the equation *) For proof, see [4, p. 22].
180
A. PAZ
[September
lim f i A i = Q. m-~OO i
5) With regard to long products of stochastic matrices it is of interest to inquire whether there exists a procedure for approximating such a product. Suppose, for example, that A is a stochastic matrix which can be written in the form A = I + e where terms in e are small. The usual approximation A " = I + he, ceases to be stochastic for sufficiently large n. However, it is seen from Theorems 8, 10 and 11 that a good stochastic approximation to a long product of stochastic matrices (all satisfying H3 and all belonging to a finite set N of matrices) is obtained by omitting the first k matrices in the product, where k is an easily computable function of the error allowed. For, as is seen from the above theorems, the first k matrices in the product contribute to the terms in the rth row of the product o f m matrices at most: (1 - 5 ) " - 1 + (1
-
~)m-2 +
. . .
+ (1 -- 5) 'n-k = (1 -- 5) " - k -
(1 -- 5)"
5 while those in the other rows differ from their counterparts in the r-th row by at most (1 - 5)% where 5 is defined as in Theorem 9 and (1 - 5) is the maximal term in all kernels of the matrices in the finite set N. Acknowledgements. The author is indebted to Dr. M. Yoeli for guidance, advice and a number of suggestions followed up in the paper; to Prof. H. Hanani, for guidance and advice; to Prof. M. O. Rabin of the Hebrew University for encouragement is undertaking the present study; and to Mr. E. Goldberg for editorial assistance.
1. 2. 3. 4.
BIBLIOGRAPHY B~rgc, C., 1962, The Theoryof Graphs and its Applications, Wiley, New York. Doob, J. L., 1953, Stochastic Processes, Wiley, New York. Feller, W., 1950,Probability Theory and Applications, Wiley, New York. Kemeny, J. C. and Snell, J. L., 1960,Finite Marker Chains, Van Nostrand, Princeton.
TECHN/ON-ISRAEL INSTITUTE OF TECHNOLOGY,
~A
THE GEOMETRY OF SOLVABILITY AND DUALITY IN LINEAR PROGRAMMING BY
ADI BEN-ISRAEL* ABSTRACT
Solvability and boundedness criteria for dual linear programming problems are given in terms of the problem data and the intersections of the nonnegative orthant with certain complementary orthogonal subspaces. Introduction. The duality theorem of linear programming(1) relates two linear extremum problems in terms of solvability, boundedness and equality of functional values. The classical theory of Lagrange multipliers admits extensions to some special nonlinear cases(2) as well as interpretations of duality in the context of applications(a). Tucker, in [16], showed duality--in the linear case--to follow from elementary geometric considerations of complementary orthogonality of manifolds corresponding to the dual problems(4). In this paper we follow Tucker's approach and supplement his results [16] by an alternative theorem for dual programs (Theorem 4 below), and by a characterization of all duality situations in terms of the geometrical configurations of certain manifolds associated with the data and the data itself (Theorem 6 below). None of our results seem to be essentially new; yet our efforts may be justified for pedagogical reasons. NOTATIONS. In this note we use the same notations as in [1]. In particular: is an arbitrary ordered field; E" is the n-dimensional vector space over ~ ' ; C{fl,...,fg} is the cone spanned by the vectors f l , ' " , f k in E"; Received August 22, 1963 *This research was partly supported by the Officeof Naval Research, contract Nonr-1228(10), project NR 0474)21, and by the National Science Foundation, project G-14102. (1) Conjectured by Von Neumann (e.g. [7], p. 23) and proved byGale, Kuhn and Tucker [8]. For the extended form discussed here, see Charnes and Cooper [3]. (2) Notably Kuhn and Tucker [11]. (3) E.g. [7], pp. 19-22. (4) A similar approach was used in [1] to develop, in a unified manner, the main theorems of linear inequalities. 181
182
ADI BEN-ISRAEL
[September
A is an m x n matrix over ~ ; N(A) is the null space of A in E"; R(A r) is the range space of A r in E". In addition, let A + denote the generalized inverse of A (see, e.g., [14] and [2]). The following two theorems, which are Corollaries 3' and 5 respectively of [1], are used in the sequel and quoted here for ease of reference: 0.1 Then (i) (ii)
Tt-mOREM. Let L, L ± be complementary orthogonal subspaces in E ~. the following are equivalent: L n E ~ . = {0}; L± n intE~ # ¢.
0.2 THEOREM. Let Lbe a subspace of E ~ of dimension k, k = 1,2,...,n. Then the following are equivalent: (i) LC~bdE~+=C{el,...,ep}, l <=p<=k; (ii) L± has a basis in intC{ep+l,...,e,}.
1. LEPTA. Let L be an arbitrary subspace of E ~. Then the following are equivalent: (i) {x +L}~E~+ v~p for all x~E~; (ii) L n int E~. # ~b; (iii) {x + L} nintE~. # q~ for all x E E ~. Proof. (i) => (ii) Suppose (ii) is false. This is possibly only in two cases: Case A: L N b d E ~ = C{el,...,%}, 1< p < n-dimL. Case B: LOE~_ = {0}. We will now show (i) to be false by producing, in each case, a vector Xo such that {Xo + L} = ¢. Case A: Let Xo be any vector in L ± n i n t C { - % + l , ...,-e~}, a set which is nonempty by Theorem 0.2. Case B: Let Xo be any vector in L Z n i n t { - E + } , the latter set is nonempty by Theorem 0.1. (ii) ~ (iii) Let y E L ~ intEr_. Then for all x e E ~ and scalars 2 satisfying n
o
> max x'----L-I I xt
x + 2y EintE~.. Obvious.
2. COROLLARY. Let A be an arbitrary m x n matrix over ,~. Then the following are equivalent:
1963]
SOLVABILITY AND DUALITY
183
(i) A x = b, x > 0 is solvable for all b ~ R ( A ) ; (ii) A x = O, x > 0 is solvable; (iii) A x = b, x > 0 is solvable for all b ~ R ( A ) . Proof.
The solutions of A x = b, when solvable, form the manifold A+b + N(A), where A + denotes the generalized inverse of A, [14]. Now (i), (ii) and (iii) are the corresponding parts in Lemma 1 with x = A+b and L = N(A). 3. COROLLARY. Let A be an arbitrary m x n matrix over ~ . Then the following are equivalent: (i) A r w > c is solvable for all c ~ E " ; (ii) Arw > 0 is solvable; (iii) Arw > c is solvable for all c ~ E " . Proof.
In Lemma 1 let L = R(AT), x = - c .
4. THEOREM. Let A be an arbitrary m x n matrix over ~ . Consider the system of equations and inequalities: I) A x = b , x>=O I') A x = O , x>=O II) A r w > c II') Arw > 0 Then: a) I is solvable for all b ~ R(A) if and only if II' does not have solutions with nonnegative nonzero vectors Arw. b) II is solvable for all c ~ E" if and only if I' does not have nonnegative nonzero solutions. c) I f II' has solutions w with Arw > 0 then I is solvable if and only if Arw>_O ~ (b,w)>=O. d) I f I' has solutions x with x >_0 then II is solvable if and only if Ax=O, x>O ~ (c,x)
a)
By Corollary 2 it follows that I is solvable for all b ~ R(A) if and only if N ( A ) n i n t E " + ~ ~p. By Theorem 0.1, the latter condition is equivalent to
R(A
n
= {0}.
b) By Corollary 3, II is solvable for all c ~ E" if and only if R (A r) ~ inter+ ~ q~. This is equivalent, by Theorem 0.1 to: N(A)nE"+ = {0}. c) This is the well-known Farkas' lemma (see, e.g. [15]). d) This is Theorem 1 in Ky Fan [6]. REMARKS.
a) Theorem 4 is a collection of classical results in a setup which is completely analogous to "Fredholm's Alternative" theorem for linear equations [5]. Solvability relations between linear inequalities and equations were studied by
184
ADI BEN-ISRAEL
[September
Motzkin [13], Kuhn [10] and generalized by Ky Fan to the case of complex normed linear spaces [6]. For a use of Fredholm's theorem to prove the main theorems of linear inequalities see [1]. b) For b E R(A), part c can be rewritten as: c') If b ~ R(A) and II' has solutions w with Arw > 0 then I is solvable if and only if Arw > 0 =~ ( A r w , A +b) > O. This follows from the fact that AA ÷ is the perpendicular projection on R(A), e.g. [2]. 5. Let A be an arbitrary m x n matrix over ~-, b e E " and c e E". Let
S = {xeE": Ax=b, x>O}
11 = sup(c,x) x~,.~
T=
{ w e e m: A r w > c }
12 = inf (b,w) KET
The duality theorem of linear programming relates the problem of solving for 11 the primal problem, to that of solving for 12, the dual problem. The duality theorem states indeed that there are four mutually exclusive cases: Case A: S ~ , T~b, Ix=l 2 Case B: S = ¢ , T¢~b, I 2 = - ~ Case C: S ~ ¢ , T = q ~ , I 1 = Case D: S = ¢ , T=~b Conjectured by von Neumann (see [7], p. 23) and proved by Gale, Kuhn and Tucker [8], this theorem was extended to some nonlinear situations, the most general being that of Charnes, Cooper and Kortanek [4]. We will now elaborate on the four cases given above. In terms of the data {A, b, c}, and more specifically of the configurations of N(A) and R(A r) with respect to E~, we give below conditions for the attainment of each of the above cases. 6. THEOREM. Let A be an arbitrary m n matrix over ,~, b eR(A) in E m, c ~ E" and let S, T, I1 and 12 be as above. Then there are eight mutually exclusive cases, tabulated below. Proof. The cases 1, ..., 8 are clearly mutually exclusive. In each case Theorem 4 is used to draw the conclusions regarding the sets S and T. Then the duality theorem of linear programming is used to obtain 11 and I2. RE~a~Ks. a) The above 8 cases can be visualized geometrically in a manner which helps to clarify the concept of duality. Thus in the 2-dimensional case where A is a 1 x 2 matrixj dimR(A r) = dimN(A) = 1, the first case appears as follows:
u
7
6
5
4
3
1
Case
R(A r) N i n t C{ep+l, ...,e,} ¢ ¢
N ( A ) n b d E ~ = C{ei "" ep} 1 __
R ( A T) ~ E~+ = {0}
N ( A ) (~ E~_ = {0}
Intersection with E~.
Assumptions
and
and A x = O, x >_ 0 and (c, x) > 0 for some x
A r w >_ 0 and ( A r w , A + b ) < 0 f o r some w
A r w >_ 0 and (Arw, A + b) < 0 for some w but A x = O, x >_ 0 =>(c, x) < 0
A r w > 0 =>( A r w , A +b) > 0 but A x = O, and (c,x) > 0 for some x > 0
A r w > 0 =>(Arw, A+b) > 0 A x = O, x > O =>(c,x) < O
A x = O, x > 0 and (c,x) > 0 for some x
A x = O, x > 0 =>(c,x) 5 0
ATw > 0 and ( A r w , A + b) < for some w
AT w >_ 0 =~(Arw, A+b) > 0
Conditions on b,c
Empty
Non-empty
Non-empty forall b G R(A)
Empty
Non-empty
Empty
Non-empty
Empty
Non-empty
Non-empty for all c ~ E "
Non-empty
Empty
T
S
Conclusions
O0
12 = - - 0 0
I 1 = 12
I:=oo
11 = / 2
12=
11 = 12
[t,I2
,<
-]
>
> z~7
~q
©
186
ADI BEN-ISRAEL
[September
ez
/ The other cases are drawn in a similar manner. Furthermore, by the "complementary slackness" property, it is now easy to identify optimal points. Thus Xo is the optimal solution of the primal problem and a = Arwo- c, where Wo is the optimal solution of the dual problem ([,16], p. 15). b) Theorem 6 combines well-known solvability theorems (Tucker [16], Charnes-Cooper [.3], p. 214), and the duality theorem of linear programming to characterize the duality situations in terms of the data {A,b,c).
Acknowledgment. The help and encouragement of Professor A. Charnes are hereby gratefully acknowledged. REFERENCES 1. Ben-Israel, A., Notes on linear inequalities, I: The intersection of the nonnegative orthant with complementary orthogonal subspaces, ONR Research Memorandum, No. 78; (to appear in J. Math. Anal. Appl.). 2. Ben-Israel, A. and Chames, A., 1963, Contributions to the theory of generalized inverses, J. See. Ind. Appl. Math. (to appear). 3. Charnes, A. and Cooper W. W., 1961, Management Models and Industrial Applications of Linear Programming, Vols. I, II, J. Wiley and Sons, New York. 4. Charnes, A., Cooper ,W. W. and Kortanek, K., 1962, Prec. Nat. Acad. Sci. U.S.A., 48, 783-786. 5. Courant, R. and Hilbert, D., 1953, Methods of Mathematieal Physics, Vol. I, Interscience Publishers, Inc., New York. 6. Fan, Ky., pp. 99-156 in [12]. 7. Gale D., 1960, The Theory of Linear Economic Models, McGraw-Hill Book Co., New York. 8. Gale, D., Kuhn, H. W. and Tucker, A. W., pp. 317-319 in [9]. 9. Koopmans, T. C. (editor), 1951, Activity Analysis of Production and Allocation, Cowles Monograph No. 13, J. Wiley and Sons, New York. 10. Kulm, H. W., 1959, Amer. Math. Monthly, 63, 217-232.
1963]
SOLVABILITY AND DUALITY
187
11. Kuhn, H. W. and Tucker, A. W., 1951, Nonlinear Programming, Proc. Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481-492. 12. Kuhn, H. W. and Tucker, A. W. (editors), 1956, Linear Inequalities and Related Systems, Annals of Mathematics Studies, No. 38, Princeton University Press, Princeton. 13. Motzkin, T. S., 1936, Beitrage zur Theorie der Linearen Ungleichungen, Inaugural Dissertation, Basel, 1933, Azriel, Jerusalem. 14. Penrose, R., 1955, A generalized inverse for matrices. Proc. Camb. Philos. Soc., 51, 3, 406-413. 15. Tucker, A. W., pp. 3-18 in [12]. 16. Tucker, A. W., June 1962, Simplex Method and Theory, Notes on Linear Programming and Extensions, Part 62. The Rand Corporation, Santa Monica, Memorandum RM-3199-PR. TECHNION-ISRAELINSTITUTEOF TECHNOLOGY,HAIFA. AND NORTHWESTERNUNIVERSITY,EVANSTON,ILLINOIS
THE
SPHERE
IN
THE
IMAGE
BY
H. HANANI, E. NETANYAHU ANDM. REICHAW-REICHBACH Introduetion. If f ( x ) is an analytic and univalent function of the complex variable x in the unit circle S = {x: Ix I < 1} and f ( 0 ) = Yo, then the image f ( S ) of S contains a circle with radius ro = Even when f ( x ) is not univalent, the i m a g e f ( S ) of S contains a circle with radius k where k is the constant o f Bloch(1). The aim of this paper is to prove theorems, similar to the last statement, for some mappings of n-dimensional Euclidean spaces and general Banach spaces into themselves. The idea of the proofs consists in the following use of fixed point theorems (z): Suppose that S = {x: Ilxll < 1} is the unit sphere in a Banach space X and that for a given y E X the mapping x - fl [ f ( x ) - y] of S into X has a fixed point x ~ S for some fl ~ 0. Then f ( x ) = y and so y is contained in the image J(S) of S. Considering all such points y, we are looking for conditions under which the set of these points contains a sphere with radius as large as possible. By S = S(x o,r) = {x: p(Xo,X) < r) we denote the sphere with center Xo and radius r in a metric space with metric p and by Bd(S) = {x:p(xo, x ) = r} the boundary of S. (By (x, y) we denote the scalar product of x and y. 1. In this section a simple generalization to Hilbert spaces of the fixed-point theorem of Schauder (Theorem 1) and its applications are given. Before proceeding with the proof of Theorem 1, let us first note the following
1If'(0)1.
If'(0)1,
LEMMh 1. I f X1 and X2 are closed subsets of a metric space X and f x :X1 -~ Y and f2 "-X 2 ~ Y a r e mappings(a) of X 1 and X z into a metric space Y, then f l ( x ) for x E X 1 f ( x ) = fz(x) for x e X z is continuous on X 1 U X 2 , provided that f l ( x ) = f 2 ( x ) on X1 n X2. The proof is trivial. RE~ARK 1. Easy examples show that the assumption of closedness o f the sets X~ and X2 in Lemma 1 is essential. (1) Bloeh's Theorem has been generalized to mappings of n-dimensionalspaces by S. Bochner in [2]. (z) A particular ease of this idea has been used in [9], p. 734. (3) By "mapping" we always understand a continuous mapping. Received August 12, 1963. 188
1963]
THE SPHERE IN THE IMAGE
189
THEOREM 1. Let f : S--* X be a completely continuous mapping of the sphere S = (x: I l x - Xoll < r} in the Hilbert space X into X , such that (1)
(x - Xo, f ( x ) -- Xo) < r 2
for
x ~ Bd(S).
Then there exisls a point ~ ~ S such that f ( ~ ) = 2(4).
Proof.
g(x)
Define for x E S the function
[f,(x ) = f(x) for x X, = (x: I!f(x)-xoll ~ r) 1 f(x) - x o -- Xo + r llS(x)- XolI forx X -- (x:lls(x3- xoll->r)
Then, by the continuity o f f , the sets X1 and X 2 a r e closed subsets of S, and for x E X1 n X 2 = (x: [if(x) - x 0 II = r) obviously f~(x) =f2(x). Hence, by Lemma 1, g(x) is a continuous function on S. Moreover, since S = X I u X 2 and f is completely continuous on S, it follows that g is also completely continuous on S. Thus by ~ g ( x ) - Xol] < r and by the theorem of Schauder [10, Theorem 2] there exists a point ~ such that ~ = g(~). Supposing g(~) =f2(~) = ~, we get f(£)-x 0
(2)
r
Ilf( )- xoll
_£_x
o.
Hence i1g - X o I] = r, i.e. ~ ~ Bd(S). On multiplying both sides of (2) scalarly by f ( g ) - Xo, it follows from (1) that r - Xo II --< r2- But then, by the definition of g(x), we have g = g(~) =f(:~).
Hf(~)
LEMMA 2. Let f : S - - X be a mapping of a sphere S = (x:ltx- xoll ----r) in the Hilbert space X into X , such that for some fl # O, the mapping x + fir(x) is completely continuous on S. Suppose further that for some given . ~ X we have (3)
( x - Xo, f l [ f ( x ) - 37]) < 0 for every x E Bd (S).
Then there exists a point ~ ~ S, such that f ( ~ ) = 37.
Proof. We show that Theorem 1 applies to the mapping h(x) = x + f l [ f ( x ) - 37]. In fact, since x + flf(x) is completely continuous, it follows that h(x) is completely continuous, and it remains to verify that (1) holds with f replaced by h. Indeed, for x e B d ( S ) w e have ( X - X o , X - X o + f l [ f ( x ) - 3 7 1 ) = IIx- oll 2 + ( x - Xo, fl[f(x) - 37])= r 2 + (x - Xo, f l [ f ( x ) - 37]). Hence by (3), the assumption (1) holds with f replaced by h. By Theorem 1 there exists a point ~ = h($), and since fl # 0 it follows that f($) = 37. Putting, in Lemma 2, Xo = 0 and either fl = 1 or fl = - 1 we obtain the following (4) For mappings of finite dimensional spaces, Theorem 1 can be derived from a result ofA. Abian and A. B. Brown [1].
190
H. HANANI et al.
[September
TrmOREM 2. I f f : S ~ X is a mapping of the sphere S = {x: IIx 11--< r} in the Hilbert space X into X such that for some given ~ either (a) (x,f(x)) < (x,y,) for ~ x II = r a n a x + f ( x ) is completely continuous in S or
(b)
(x,f(x)) >_ (x,~) for IIx II = r and
x
- f(x) is completely continuous in S,
then there exists a point 2 ~ S, such that f ( 2 ) = .~. The following two examples illustrate the use of Theorem 2. EXAMPLE 1. Let X be the Euclidean 2n-dimensional space, {aj}j=~,2..., a sequence of n real numbers, k - a positive integer and consider the mapping f : X ---,X defined by (4)
t
2k- 11 - ajx2j Y2j- 1 = - x2j2k-1
LY2)
=
--x2j
(j = 1,2, ...,n)
+ ajx2i-1
where x = ( x l , x 2, "",x2,) and y = (Yl,Y2, "",Y2,) are points o f X. Then the image of the unit sphere S = {x: Hxll z l} contains a sphere s'= {y: ]IYD ----ro} with radius r o > ( 1 / 2 n ) ~-1. Proof. We have ~b(x) = (x,f(x)) = - ~,)=~x2k2, and for II x ~ = 1 (i.e. for x E Bd(S)), tk(x) has a maximum for xt = x2 . . . . . x2,. This maximum equals max Jlxll=t ~b(x)= - ( 1 / 2 n ) k-~ = - ro. Now, if ~ p II --- - r o . It follows, that for every II ~ II ~ ro we have ( x , f ( x ) ) < (x,~) for x with 11x !1 = 1. Moreover, since X is finite-dimensional, x + f ( x ) is completely continuous and therefore the assumption (a) of Theorem 2 is satisfied. Thus, by Theorem 2, there exists a point ~ e S such that f(~) = ~. Hence each point p ~ S' is an image of some point ~ES. REMARK 2. Let us note that the result obtained in the above example is in general the best possible. In fact, if X is the real plane (i.e. n = 1) and k = 2 the mapping (4) has the form
(4')
Yl = -x31 - alx2
L Y2
=
--Xa2 -t- a l x 1
Therefore, by the result of Example 1, the map of the unit circle under mapping (4') contains a circle ~ y [I < ro with radius ro > ½. Now taking al = 0, we obtain Yl = -xla; Y2 = -x32, thus xl = _y~/3, x2 = -yX2/3 and the image of the unit circle x 2 + x 2 < 1 is the set S' of points satisfying y2/3 + y2/3 < 1. It is easy to see, that the largest circle contained in S' has radius ½. Similarly it can be shown that for a (2n - 1)-dimensional Euclidean space X, the map of the anit sphere ~ x II --- 1 under the mapping:
1963]
THE SPHERE IN THE IMAGE
191
2k-- 1
Y2j-1
=
--X2j-1
-- a j x 2 j
Y2j
= _ x22k- 1 + ajx2 j - 1,
(j = 1,2,..., n - 1)
2k-- 1
Y2n-t
=
--x2n-1
where a~,j = 1 , 2 , . . . , n - 1 , are real constants and k is a positive integer, contains a sphere S' = {y: ]l Y I[ ~ ro} with r o >__1 / ( 2 n - 1)g- 1. EXAMPLE 2. Let X = L2[0 ,1] be the Hilbert space of all square integrable functions on the interval [0,1] and consider the mapping f : S -~ X of the unit sphere S = {x = x(t): IIx ]1 =< 1} into X defined by f ( x ) = x + StoX2(u)du. Then x - f ( x ) = - f~x2(u)du is a completely continuous mapping on S and for x with II x II = 1 we have ](x,1-2)[
_-< I l x l l ' l l l - 5 [ ] = [ ] 1 - 2 [ I
Hence, for x with ][x I] = 1 we obtain
(x,f(x))
- ( x , y ) -- 1 +
x0)[1
- 2 0 ) ] a ¢ >_- 1 - II 1 - 211.
It follows that for 2 = 2(0 satisfying 1 > II 1 - 2 II the inequality ( x , f ( x ) ) > (x, 2) holds for x E Bd(S). Thus the assumption (b) of Theorem 2 is satisfied. By Theorem 2, we obtain that the imagef(S) contains a sphere S' = {y: I1Y - 1 II -<--1}. it is easy to verify that S' is the largest sphere with center Yo(0 = 1 contained in the image f ( S ) of S. In fact, taking any constant function Y(0 = 2 + e with e > 0 and assuming that for some x = x(0 there is J(x) = 2 + e, we obtain x(t) = const = c and C 2 + C - - (2 + ~.) = 0. Hence [c] > i and thus [[x [l > 1. 2. In this section some consequences of the contractive-mapping principle and their applications are given. The idea of use of the contractive mapping principle is analogous to that used in [5, p. 148] for finding of the so-called resolvent of a non-linear operator. Let us call a mapping g : X ~ X of a complete metric space X, with metric p, into itself y-contractive (0 < ~, < 1), if (5)
p(g(x), g(y)) <= yp(x, y)
holds f o r a l l x , y e X . If (5) holds for all x , y belonging to a sphere S in X, then g is called y-contractive in S. It is known that (6) If g : S ~ X is y-contractive in the sphere S = S(xo, r) contained in a complete metric space X and if for Xa = g(xo) we have p(xo,xl)<= ( 1 - y)r, then the sequence x, = g ( x , _ 1 ) , n = 1,2,... converges to the unique solution x c S ( x o , r ) of the equation g ( x ) = x [7, p. 49, Remark 2]. Let now f : X ~ X be a mapping of a Banach space X into itself. We say that
H. HANANI et aL
192
[September
the sequence (g,fl,7,r, xo) has the property P and write (g,fl,7,r, Xo)~P if the mapping gx - flf(x) is y-contractive in the sphere S = S(xo, r)(S). Note, that if(a, fl,7, r, Xo) e P and y is any point of X, then g(x) = ctx- fl[f(x) - y] is ),-contractive in S(xo, r ) and, for x x = g(xo) and Yo = f ( x o ) , we have
p(xo,xl)=HXo-gXo +/~(yo-y)ll =
+/~llyo- eli. - g l IlXoll]/l#l, we
ll-glllxoH
Hence, for y satisfying [lYo- Y[[ < [ ( 1 - 7 ) r - [ 1 obtain p(Xo,X 0 < ( 1 - 7)r. Therefore it follows from (6) that if (g,fl,7,r, Xo)~P, then there exists a unique point x~S(xo, r), such that g x - f l [ f ( x ) - y ] =x, i.e. y = ((1-a)/fl)x +f(x) (if f l ~ 0). In other words: (7) If (g, fl,7, r, Xo) ~ P and fl ~ 0, then the imagcf(S) of the sphere S = S(x o, r) under the mapping f(x) + ( ( 1 - g ) / f l ) x contains a sphere S(yo, ro) with center yo=f(xo) and radius r o = [ O - 7 ) r - l l - g l . l l X o l l ] / l ~ I. Noting that if S 1 c $2 then f(Si) cf(S2), and putting in (7) g -- 1, we obtain the following
If (1,fl,7, r, xo)eP, S(xo, r) c S = S ( 2 , f) and fl¢O then the image f(S) of S contains a sphere S ' = S(yo, ro) with center Yo =f(xo) and radius r o = ((1-7)/[/~l)r. THEOREM 3.
For some applications of this theorem, let us recall the notion of derivative for mappings in Banach spaces. Let f:G--> H be a mapping of an open set G contained in a Banach space X into a subset H of a Banach space Y. Let x o e G, and suppose that there exists a linear mapping A : X - - * Y such that for every x e X we have limt. o [ f ( x o + tx) - f ( x o ) ] / t = A(x). The mapping A is called the derivative o f f at the point x o and denoted b y f ' ( x o ) o r f ' . It can be shown (6) that: If [y,x] ~ G is an interval and f : G ~ H has a derivative at each point of [y,x], then for every linear mapping U : X - ~ g we have ] [ f ( x ) - f ( y ) - U(Ay)][ < supo
I <=s u p g ~ s [ l f l f ' ( ~ ) - I H . l l x - y ~ .
Thus, if for some f l ¢ 0 and 7, 0 < 7 < 1,
(8)
II
- z II <
is satisfied for every ~ e S(xo, r), we obtain that (1, fl,?, r, Xo) ~ P. Therefore, by Theorem 3, we have the following (5) An analogous condition was used in [8].
(6) [6, p. 592].
1963]
THE SPHERE IN THE IMAGE
193
THEOREM 4. I f f : S ~ X is a mapping of a sphere S = S(2,f) in a Banach space X into X, having a derivative f ' at every point of some sphere S(xo,r) c S and if there exist fl # 0 and 7, 0 < 7 < 1, such that (8) holds, then f ( S ) contains a sphere S ' = S(yo,ro) with center Yo = f ( x o ) and radius r o = ( ( 1 - ~ ) / ] f l ] ) r . The next two examples illustrate the use of this theorem in the case fl = 1. EXAMPLE 3. Let X = C [0,1] be the Banach space of all continuous functions on the interval [0,1], with the n o r m Ilxl[--maxo---,xlx(t)[ • Let f ( x ) = x(t) + .[~xZ(u)du(7). Then f'(~)x = x(t) + 2 .~ ~(u) x(u) du and [ I - f ' ( ¢ ) ] x = 2 j'~ ¢(u)x(u)du. Hence ~ [ I - f ' ( ~ ) ] x II --<2 f oll ¢(u)I. I x(u) la. and
IIl-f'(¢)II ____2I oll¢(u)lldu= . Taking [[ ¢(u) ~ < r we thus get, by Theorem 4, ro = ( 1 - 2 r ) r , the m a x i m u m of which is attained for r = ¼. Hence the image of the sphere S(O,¼) contains the sphere S(0, I).
f~oe-S'[x(t)
EXAMPLE 4. Let X = C [ 0 , 1 ] and l e t f ( x ) = x(s) + + ½x2(t)],dt, 0 < s < 1. We have f ' ( ¢ ) x ( t ) = x(s) + .[~e-st[1 - ~(t)]x(t)dt and for ~(t)~ S(0,1) we obtain
IIf'(¢)- z I1 =
max
-~'(1 + {(t))dt =
[1 + {(t)] dr.
0<s
Let us now look for a fixed sphere S(xo,r) contained in S(0,1) and find sup~ osc~o.,> I[f'(O - I I[. For ¢ ~ S(xo, r) we have 1 + ~(t) < 1 + Xo + r and f~ [1 + ¢(t)]dt < 1 + r + f~Xo(t)dt = 7. Therefore I l f ' ( ¢ ) - , II =< ~ for ~ ~ S('~,r). By Theorem 4, we obtain r o = ( 1 - ?)r = - r ( r + f~Xo(t)dt. Taking Xo(t ) = c = c o n s t , with c < 0 , we obtain from S(xo,r)= S(0,1) that r = 1 + c and ro = - r ( r + c ) = - ( 1 + c)(1 + 2c). This has a m a x i m u m for c = - ¼ . Therefore for Xo(t) = - ¼, we have r = 1 + c = ¼ and r o = - ¼ ( - ½) = I. It follows that the image f [ S ( - ¼ , ¼ ) ] of the sphere S(-¼,¼) contains a sphere with radius I. REMARK 3. The results obtained in Examples 3 and 4 are not the best possible. We note also the following consequence of Theorem 1: I f f : S ~ X is a completely continuous mapping of a sphere S = S(O,r) in a Hilbert space X into X such that f [ B d ( S ) ] = S, then there exists a fixed point x = f ( x ) ~ S . Indeed, i f f [ B d ( S ) ] = S then evidently (x,f(x)) < r 2 on Bd(S), i.e. condition (1) is satisfied (for Xo = 0) and by Theorem 1 there exists a point x such that x = f ( x ) . This result is well known for mappings of spheres in finite-dimensional Euclidean spaces. Another well-known result, which can also be easily derived from Theorem 1, is: I f f : S ~ X is a mapping of a sphere in a finite dimensional Euclidean space X into X which is a h o m e o m o r p h i s m on Bd (S) and for which f ( B d (S)) = Bd (S), then S = f ( S ) . (7) This mapping was considered in [3, p. 136].
194
H. HANANI et al.
[September
We shall now prove a theorem, which is an analogue for Banach spaces to a " u n i f o r m " version of the theorem on implicit functions for finite dimensional spaces. Before proving this analogue let us make some general remarks. First of all, we note that in infinite dimensional spaces X an isometric mapping f : S---} S of a sphere S c X need not necessarily contain a sphere in X. Indeed, let X be the Hilbert space 12. i,e., the set of all sequences (x~,x2, ...,x,...,) such that ~ =o~l x i 2 < ~ , where x~ are real numbers and let S be the unit sphere s-- (x: II x II -<- 1} in X. Let f(x) = (0,xl,x2,..-) for x = (xl,x2,...). Then J(S) is nowhete dense in X and thus it does not contain any sphere in X. In what follows, we shall confine ourselves to mappings of the form x - F(x), where F(x) maps a Banach space X into itself. In the case that F(x) is completely continuous it is known that the image f ( S ) of a closed unit sphere S c X (or even of a closed and bounded set) is a closed subset of X [4, p. 193]. A simple example shows that it is not true that the image f ( S ) contains a sphere in X. Indeed let us define F(x)=(xl,0,O,...) for x = (xl,x2, "") where x E 12. Then f(x) = x - F(x) = (O, x2,xa, ...), where F(x) is completely continuous, but f(S) does not contain any sphere in X. Note however, that in this last case we have l]F'(x)[l = 1 for every x e 12. In the case that I[F ' ~ = ((1 - 7)/2K), then the image f ( S ) of the sphere s = (x 11x ll ~ r) contains a sphere of radius r o = ((1 - 7)/2) 2 / K . Proof. We have [IF'(x)II = IIF'(O) + F'(x) - F'(O) [I< 7 + ~ F'(x) - F'(O) II _-< r + K 11 x 11(8) Hence for x satisfying l[x II ~ ((1- 7)/2K) we obtain ][F'(x)II <= (1 + 7)/2< 1. Therefore Hf' - II[ = [1F' II ~ (1 + 7)/2 and by Theorem 4, the imagef(S) of the sphere S contains a sphere of radius ro = ((1 - 7)/2)2/K and center Yo = f(0). It is easy to show that J maps the sphere {x: [Ix 1] <--((1-7)/2K} onto S(yo, ro) in a one-to-one fashion. REFERENCES 1. Abian, A. and Brown, A. B., 1962, A new fixed point theorem for continuous maps of the closed n-cell, Enseignement Math., (2) 8, 34-40. 2. Bochner, S., 1946, Bloch's theorem for real variables, Bull. Amer. Math. Soc., 52, 715-719. 3. Cameron, R. H., 1956/57, Nonlinear Volterra functional equations and linear parabolic differential systems, J. Analyse Math., 5, 136--182. 4. Granas, A., 1960, On the disconnection of Banach spaces, Fund. Math., 48, 189-200. 5. Krasnosel'skii, M. A., 1956, Topologi~eskie metody v teorii nelineingh integral'nyh
uravnenii, Moscow. (s) This last inequality follows from an inequality in [6, p. 592].
1963]
THE SPHERE IN THE IMAGE •
195 g v
6. Kantorovi~, L. V., and Akilov, G. P., 1959, Funkclonal nil analiz v normirovannyh prostranstvah, Moscow. 7. Ljusternik, L. A. and Sobolev, V. I., 1951, Elementy funkcional'nogo analiza, MoscowLeningrad. 8. Reichbach, M., 1960, Some theorems on mappings onto, Paci[ic J. Math. 10, 1397-1407. 9. Reichbach, M., 1961, Fixed points and openness, Proc. ,4mer. Math. Soc., 12, 734-736. 10. Schauder, J., 1930, Der Fixpunktsatz in Funktionalriiumen, Studia Math., 2, 171-180. T E C H N I O N - I S R A E L I N S T I T U T E OF T E C H N O L O G Y , HAIFA