TRANSLATIONS SERIES IN MATHEMATICS AND ENGINEERING
A.V. Balakrishnan
TRANSLATIONS SERIES IN MATHEMATICS AND ENGINEERING
A.V. Balakrishnan, A.A. Dorodnitsyn, and J.L. Lions, eds.
1983, 267 pp. ISBN 0-911575-00-6
Vistas in Applied Mathematics: Numerical Analysis, Atmospheric
-3;
D-0
Spectral Theory of Random Fields
M.I. Yadrenko
Sciences, Immunology. Collected papers
Mathematical Models in Immunology
dedicated to G.I. Marchuk on his 60th birthday.
1983, 378 pp. ISBN 0-911575-01-4
1986, 396 pp. ISBN 0-911575-38-3
A.A. Borovkov, ed.
V.F. Kolchin
Advances in Probability Theory: Limit Theorems and Related Problems
Random Mappings
G.I. Marchuk
1984, 392 pp. ISBN 0-911575-03-0
1986, 224 pp. ISBN 0-911575-16-2
.60
L. Telksnys, ed.
V.A. Dubovitskij
Z01!
The Ulam Problem of Optimal Motion of Line Segments V-0
1985, 128 pp. ISBN 0-911575-04-9
Detection of Changes in Random Processes 1986, 240 pp. ISBN 0-911575-20-0
V.F. Dem'yanov, and A.M. Rubinov
Statistics and Control of Stochastic
Quasidifferential Calculus
mar
O;5
N.V. Krylov, R.S. Liptser, and A.A. Novikov, eds.
1986, 301 pp. ISBN 0-911575-35-9
Processes 1985, 521 pp. ISBN 0-911575-18-9
V.P. Chistyakov, B.A. Sevast'yanov, and V.K. Zakharov
Probability Theory for Engineers Yu.G. Evtushenko
Numerical Optimization Techniques
R.F. Gabasov, and F.M. Kirillova
°o°
1985, 575 pp. ISBN 0-911575-07-3
1986, 175 pp. ISBN 0-911575-13-8
Methods of Optimization V.F. Dem'yanov, and L.V. Vasil'ev 1985, 472 pp. ISBN 0-911575-09-X
1986, approx. 350 pp. ISBN 0-911575-02-2 acv
Nondifferentiable Optimization
B.T. Polyak
Introduction to Optimization A.A. Borovkov, ed.
Advances in Probability Theory:
1986, approx. 450 pp. ISBN 0-911575-14-6
Limit Theorems for Sums of Random Variables
I.A. Boguslavskij
1985, 313 pp. ISBN 0-911575-17-0
Filtering and Control 1986, approx. 400 pp. ISBN 0-911575-21-9
Vladimir F. Dem'yanov Alexander M. Rubinov
QUASIDIFFERENTIAL CALCULUS
timization Software, Inc. Publications Division, New York
Authors V.F. Dem'yanov Applied Mathematics Department Leningrad State University Leningrad, Staryj Peterhof USSR 198904 A.M. Rubinov Institute for Socio-Economic Problems USSR Academy of Sciences ul. Voinova 50-a Leningrad USSR 198015
Library of Congress Cataloging-in-Publication Data Dem'ianov, V.G. (Vladimir Fedorovich), 1938Quasidifferential calculus.
(Translations series in mathematics and engineering) Bibliography: p. Includes index. 1. Calculus, Differential. I. Rubinov, Aleksandr Moiseevich. II. Title. III. Series. 515.3'3 86-5422 QA304.D45 1986 ISBN 0-911575-35-9
© 1986 by Optimization Software, Inc., Publications Division, 4 Park Avenue, New York, New York 10016. All rights reserved. Published in 1986. Printed in the United States of America.
ABOUT THE AUTHORS
Vladimir Fedorovich
Dem'yanov
(born
1938) received his Ph.D. in Mathematics from the Leningrad State University in 1964
and his D.Sc. degree from the Computing Center of the USSR Academy of Sciences in
1972. He was appointed Professor at the Applied Mathematics Department of the Leningrad State University in 1970. Alexander Moiseevich Rubinov (born 1940) graduated from the Leningrad State University in 1962. He received his Ph.D. in Mathe-
matics from the Novosibirsk Institute of Mathematical Sciences in 1965. Since 1976
he has been with the Leningrad Research Institute for Socio-Economic Problems at the USSR Academy of Sciences.
TABLE OF CONTENTS
Preface
ix
Chapter 1 Preliminaries
1
Chapter 2 Elements of the Theory of Multivalued Mappings
6
Chapter 3 Directional Derivatives
19
Chapter 4 Convex Functions
37
Chapter 5 Sublinear Functions
46
Chapter 6 The Space of Convex Sets
.
.
.
.
65
78
Chapter 8 The Clarke Derivatives
91
00
Chapter 7 Upper Semicontinuously Directionally Differentiable Functions . . . . .
T--4
Chapter 9 Upper Convex and Lower Concave Approximations . . . . . . .
.
104
vii
Table of Contents
viii
Chapter 10 Quasidifferentiable Functions
112
Chapter 11 Examples
122
Chapter 13 A Relation Between the Clarke Subdifferential and the Quasi differe ntial
143
0
Chapter 14 Implicit and Inverse Function Theorems
00
128
0
Chapter 12 Quasidifferentiability of a Composition
155
176
Chapter 16 Necessary and Sufficient Conditions for an Extremum
188
Chapter 17 Rate and Directions of Steepest Descent and Ascent
203
Chapter 18 Saddle Points of Quasidifferentiable Functions
229
Chapter 19 Star-Shaped Sets and Their Applications in Nonsmooth Optimization
236
Chapter 20 Approximate Quasidifferentiability
254
Bibliographic Notes
.
.
.
.
.
.
.
.
.
265 \-0
Z4..
Chapter 15 Cones of Admissible Directions and a Nondegeneracy Condition
269
Index
283
Transliteration Table (Russian-English)
289
000
00
References
PREFACE
W-4
0'+
When studying optimization problems, the mathematician has to make use of certain tools. In the smooth case, for example, one such tool is the derivative; in the nonsmooth case, the derivative concept is naturally replaced by the idea of a directional derivative or one of its generalizations. But there is a catch: any such generalization must be consistent with simpler cases, e.g., if the function is directionally differentiable, then the generalized directional derivative should be the same as the ordinary directional derivative. Otherwise the generalization-even though aesthetically elegant may have little or no relevance to the optimization problem. BCD
'ti
Suppose that a function f is directionally differentiable on some set S in the finite-dimensional space R'2 , i.e. for any X E S and g E R'2 the following limit exists: d f (x)
dg
= 1 im
III
III
f(g)
f(x+ctg)-f(x)
a
a-4+0
Here fx(g) is called the directional derivative of f at x in the direction g E Rn. Let x be fixed. The function
A (alg) = fW + a
dg
is called the first-order approximation of f in a neighborhood of the point x. Then
boo
f(x+xg) = f(x) + a °dg+ o(a) where o(a) = o(a,g, x) is such that °
a a
fi(a,g) + o(a) ,
- 0.
a-4+0
The use of the directional derivative allows us to study some very important properties of the function, e.g., (a) o check necessary (and sometimes even sufficient) conditions for optimality; (b) to find directions of steepest descent or ascent; (c) to construct numerical algorithms.
ix
Preface
x
Suppose that f(g) is a continuous function of g for every fixed x (this
for example, f is a Lipschitzian). In this case, f(g) is also positively homogeneous. But every positively homogeneous continuous func-
00,0
occurs if,
tion can be approximated to within any prescribed accuracy by the differ-
".OW
h(g) = ma x (v,g)
0
'b>
ence of two positively homogeneous convex functions. We know that to study the first-order properties of a function f at a point x it is sufficient to study a good approximation of its directional derivative at the point x. As has been noted, such an approximation can be provided by the difference of two positively homogeneous convex functions. But every positively homogeneous convex function h(g) can be expressed in the form ,
VEA
where A E Rn is a convex compact set. Therefore, the difference of two positively homogeneous functions can be written
ma x (v,g) - m a x (v,g) vEB vEA or, equivalently, m a x (v,g) + min (w,g) vEA WE[-B]
.
anon
Here A and B are convex compact sets in Rn. It should now be clear why it is important to study the class of functions whose directional derivatives can be represented in the form df (x)
dg E
-
min (w,g)
max (v,g) + E
v
,
(1)
(x)
Rn and df(x) c Rn are convex compact sets. 4-.
A function f is said to be quasidifferentiable at a point x if it is directionally differentiable at x and there exist convex compact sets !2f(x) and df(x) such that the directional derivative df(x)/ag of the function f can be is called a represented as in (1). The pair of sets Df(x) = quasidifferential of f at x. It is clear that a function does not have a unique quasidifferential.
We emphasize the fact that to examine the first-order properties of the function, it is sufficient (at least in the case of directionally differentiable Lipschitz functions) to study only quasidifferentiable functions.
If a function f is Lipschitz but not directionally differentiable, we can bbl
consider the upper Dini derivative dD f (x,g) = 1 1 m sup a-4+0
[ f (x+ag) - f (x) ]
Preface
xi
or the lower Dini derivative
a f(x,g)
-
a [f(x+ag) -.f(x)]
These derivatives are consistent with the directional derivative: if f is a directionally differentiable Lipschitzian, then
dDf(x,g) _ d f(x,g)
= f(g).
But, even more importantly, the functions dD f (x,g) and dD f (x,g) are continuous and positively homogeneous (as functions of g). Thus quasi-
differentiable functions can also be used to study the Dini derivatives.
This again demonstrates the importance of studying quasidifferentiable functions.
The concept of quasidifferentiability was introduced in 1979. Since then a
whole theory of quasidifferential calculus has been developed, and many problems traditionally related to classical calculus have been stated and solved for quasidifferentiable functions -formulas for computing quasidifferentials, a chain rule, inverse and implicit function theorems, necesran
sary and sufficient conditions for an extremum, quasidifferentiable mappings, etc.
This book presents the state-of-the-art of quasidifferential calculus. It was written while one of the authors was a member of the International Institute for Applied Systems Analysis in Laxenburg, Austria. The authors express their gratitude to Professor A. Wierzbicki and Professor A.B. Kurzhanski, the former and present Chairmen of the System and Decision Sciences Program at IIASA, for their support, and to thank Helen Gaskin for her careful editing and Elfriede Herbst and Edith Gruber for typing and retyping the manuscript.
Chapter 1 PRELIMINARIES
We recall here some definitions and results which will be used throughout this book. Proofs can be found in any monograph on convex r-,
analysis (see, e.g., [44], [79], [83], [1191, [1271, [1361).
0".
1.1. In what follows, Rn is the n-dimensional Euclidean space. An element x = (X(09 ..., x(n)) of Rn is called a vector (or sometimes a direction or a point, depending on the role it is to play). If x = (x(1) ,...,x(n)) E Rn , y = (y(1)...,y(n)) E Rn, then (x,y) = En=1 x(')y(i) is called the scalar product of vectors x and y. Let X E Rn . Then the quantity
II x II =
( x, x) is called the Euclidean norm
of X.
Vectors x1,...,x2 are said to be linearly independent if the equality r
ITV
(1.1)
LrkXk = 0
k=1
holds if and only if ak = 0 V k E 1:r. Otherwise they are called linearly dependent.
In (1.1), 0 = (0,...,0) E Rn is the zero element of Rn (in the sequel the dimensionality of 0 is usually clear from the context). Recall that in Rn any n+ l vectors are linearly dependent. By Ve (x) c Rn we denote an open ball centered at a point X E Rn with radius . ? 0
Ve(x) =
E R"RnI IIv x II <e} .
By B we denote the unit closed ball centered at the origin:
B=
E R"RnI IIvII <_ 11 ;
by Be(x) we denote a closed ball of radius e >_ 0 centered at X E Rn :
B. (x) _{ v E Rn I
11 v -x 11
:5.61 ; 1
Chapter 1
2
CJ`
and by BE we denote a closed ball of radius .6 >_ 0 centered at the origin. is clear that
It
= x+BE = x+eB.
BE(x)
Let A be a set in the space Rn. Then by cl A we denote the closure of A; by int A we denote the set of interior points of the set A. It is clear that BE (x) = cl VV (x). The empty set is denoted by 0. y"'
A set A c Rn is called compact if it is closed and bounded. It is easy to check that
m a x (v,g) =
V v E Rn
1101
gEB
min (v,g) _ -IIvII
VvE Rn
gEB
1.2. A set d2 c Rn is called convex if the inclusions x1 E £2, x2 E £2 imply that
x a = ax1 + (1-a)x2 E i 2
V a E [0,1 ]
.
A set £2 c Rn is called strictly convex if the relations x1 E
x2 E
x1 # x2 imply that
xa = ax1 + (1-a)x2 E int 12
V a E (0,1)
.
For D1 c Rn, £2 c Rn the set £2 =
£1 +
£2
= {x = x1 +x2 I X1 E £21, x2 E D2 }
is called the sum of the sets d21 and £22. If £2 c Rn , 2 E R1, then the set
,(21-AD ={x=Ay iyE,2} is called the product of the set d2 and the number 2. Let £2 c Rn. The set r
co
r
x = I akxk I Xk E 12, ak ? 0, 1 ak = 1, r is a natural number k=1
k=1
is called the convex hull of the set £2. If r
ai =
xi E Rn, ai ? 0 V i E 1: r; i=1
1
Preliminaries
3
then the vector x = Ei = i a1x1 is called a convex combination of the vectors xl ,...,Xr.
Theorem 1.1 (The Caratheodory Theorem). Any vector x E co Sl can be represented as a convex combination of not more than n+l vectors from the set Sl.
Note that the convex hull of a compact set is a compact set. If Sl is a closed bounded set, then
ma x (v, x) = max X E 12
If £21 ,. .. ,is
(v, x)
V v E Rn
XE CO12
are closed bounded sets in Rn and £2 =121 + ... +
,mss ,
then
s
ma x (v,g) _ Y ma x (v,g) . V E12
i=1
V EQt
A
Theorem 1.2 (The Separation Theorem). Let £2 c Rn be a closed convex set and let x0 4 £2. Then there exists a vector go E Rn, Iigo II = 1, and a number a > 0 such that (x-x0 ,g0) < -a V x E £2 .
Corollary 1.1. If £2 c Rn is a closed convex set and x0 is a boundary point of the set £2, then there exists a vector go E Rn , II go II = 1, such that
(x x0 ,g0) < 0
V X E D.
1.3. A set T c Rn is called a cone if the inclusion x E T implies v = Ax c
£2 V2>0.
If T is a cone, then the set
F* = {w E Rn I (w,v) >_ 0 V V E T} is called the cone conjugate to F.
Proposition 1.1. If F c Rn is a cone, then F* c Rn is a closed convex cone.
Theorem 1.3. Let F c Rn be a closed convex cone and let 12 c Rn be a con-
vex compact set. For the cone F and the set 12 not to
intersect, i.e.,
F r, d2 = 0, it is necessary and sufficient that there exist a vector w0 E F such that in a x (w0, x) < 0. XE12
Chapter 1
4
Corollary 1.2. Let T c Rn be a closed convex cone and let 12 c Rn be a convex compact set. In order that int T n d2 = m, T n int 12 = m, it is necessary and sufficient that there exist a vector w0 E F* such that max (w0,x)<0. XE12
Proposition 1.2. If r c Rn is a closed convex cone, then r** = T. Corollary 1.3. If F c Rn is a convex cone, then
T**
= cl T.
Corollary 1.4. If Ti c Rn , i E I :s, are convex cones and r
S
{v=vjEr,iE
Ti =
1:s
,
i =1
i =1
then S
ri
*
S
=nr.
i=1
i =1
Corollary 1.5. If Ti c Rn , i E I :s, are closed convex cones, then
= cl n T
n T i =1
i =1
1.4. Let d2 be an arbitrary set in Rn and let r
cone Q = v = Y akvk I vk e d2, ak ? 0, k E 1:r, r is a natural number k=1
This set (cone £2) is called the convex conical hull of the set d2 (and is clearly a convex cone). PIN
Proposition 1.3. Any point v E cone dl can be represented in the form r
v = I akvk k=1
where vk E A ak ? 0, k e 1:r, 1 < r < n, and the vectors {vk k E 1:r} I
are linearly independent.
Proposition 1.4. Let 12 c Rn be a bounded closed set and let 0 4 co £2. Then the cone 12 is a closed convex cone.
Preliminaries
5
Proposition 1.5. If a set Sl e Rn contains a finite number of points, then the cone Sl is a closed convex cone (compare with Proposition 1.4; here the condition 0 4 co 12 is dropped).
Chapter 2 ELEMENTS OF THE THEORY OF MULTIVALUED MAPPINGS
"'o
2.1. Let us consider two sets X and Y. A mapping a is called a multivalued (or point-to-set) mapping of X into Y if every point x E X is associated with some nonempty subset a(x) of the set Y. By 17(X) we shall denote the family of all nonempty subsets of X (2X usually denotes the family of all subsets of X, i.e.,
17(X) = 2X \m. If a : X -4 (Y),
E 17(X), then we shall use the notation
a(n
x
E a(x) .
$_T
9Q,
The set gr a of all pairs [x,y], y E a(x), is called the graph of the map-
ping a : X -4 17(Y). In other words,
gr a = {[x,y] E XXYIy E a(x)} .
Every set Z belonging to the direct product Xx Y is the graph of some multivalued mapping a which is defined on the set X1 of those points x E X for which there exists y E Y such that [x,y] E Z. In this case a(x) = {Y I [x,Y] E ZI V.,
Sao
and the set X1 is called the domain of a and is denoted dom a. The set a 1 defined on the subset a(X) of the set Y by the formula
is called the inverse of a.
In other words, if [x,y] E gr a, then [y,x] E gr a-1. Let us consider two simple examples.
6
Elements of the Theory of Multivalued Mappings
7
4-a
x
Example 2.1. Let f(xy) be a function defined on the direct product XxY of sets X and Y. Let
x, = {xEx13yE r:f(x,y) :5 o}. For X E X1 let us put a(x) = ty c- Y I f(X9Y) :5 01
-
If EXI, then fy E Y I 3X E
: f(X,Y) :5 01 .
The set gr a consists of all pairs [x,y] such that f(x,y) <_ 0. If y E a(X 1), i.e., there exists an x E X such that f(x,y) < 0, then a- I (y) = jx c- X I f(xy) :5 01
-
Example 2.2. Let X = RI , Y = R2 and let B be the unit ball in the space Y. Consider a scalar-valued function f defined and nonnegative on R1. For every
x E RI let a(x) = f(x)B. This
is
a ball of radius f(x) centered at the
origin. Now let
c R1, # = suFf(x) XE
a')
If the supremum is achieved, then a() _ pB; if the supremum is not achieved, then
U AB Let f(x) = x. It is clear that
gr a = {[x,y] E R1 xR2 and a-1 (y) = {x
I
Ily ll
IX
_
Ilyll},
<_ x} is a one-dimensional ray with its vertex at the
point Ily ll.
Sao
One more definition: let mappings a: X -4 I7(Y), b: Y -4 17(Z) be given. The mapping b o a: X -4 17(Z) defined by the equality
(b° a)(x) = b(a(x)) =
u b(y)
yE a(x)
is called the composition of mappings a and b.
Chapter 2
8
2.2. In what follows it is assumed that X and Y are sets in finite-dimensional spaces. A mapping a: X -4 17(Y) is called closed if the graph gr a of this mapping is closed, i.e., if the relations
x k k-4 x
A k-4 y
imply y E a(x).
Proposition 2.1. If
is a compact subset of a set X and a mapping a: X -4
17(Y) is closed, then the set a() is closed.
P r o o f. Let yk E a(). Then there exists an xk E
such that yk E a(xk ). We can assume without loss of generality that the sequence {xk } converges.
If X = I i m Xk, then x E . Since the mapping a is closed, we have y E a(x) and therefore y E a(). 0 Corollary 2.1. If a is a closed mapping, then the set a(x) is closed for every x E X. any sets.
is bounded for A mapping a: X -4 17(Y) is called bounded if the set e X. It is clear that images of a closed bounded mapping are compact By 170(Y) we denote the family of all compact subsets of the set Y. A map-
ping a: X -4 170(Y) is called upper semicontinuous (u.s.c.) at a point x if for any e > 0 there exists a 9 > 0 such that a(V5(x)) c VV(a(x))
.
Here V5(x) is an open ball of radius 6 centered at the point x,
vE(a(x)) _ {y E Y I p(y,a(x)) = min 11y-zll < e} ZE a(x)
is the set of points lying at a distance of less than e from a(x). (The set VV(a(x)) is also called the a neighborhood of the set a(x).)
Proposition 2.2. A bounded mapping a: X -4 170(Y) is u.s.c. at a point x if and only if it is closed at a, i.e., if the relations Xk -4 X,
imply that y E a(x).
yk -4 y,
yk E a(xk)
Vk
Elements of the Theory of Multivalued Mappings
9
P r o o f. (a) Let a be u.s.c. at x and let xk _ x, yk _ y, yk E a(xk) V k. We W'.
must show that y E a(x). Assume the converse; let y 4 a(x). Since a(x) is com-
pact, we have p(y,a(x)) = e > 0. But yk - y and therefore for sufficiently large k we have P(Yk,a(x)) >
2,
i.e., Yk 4 Ve(a(x)) A
On the other hand, since the mapping a is u.s.c. there exists a 5 > 0 (depending on e) such that a(Vs(x)) c VE (a(x)) 2
If k is sufficiently large, then xk E Va(x); but this implies the inclusion
WIN
Yk E a(xk) C a(Vs(x)) C VE (a(x)) z
This contradiction proves that the mapping a is closed at the point x.
(b) Let a mapping a be closed at x. Assuming that it is not u.s.c. at x, let us find e > 0 such that for every k there exists a point xk E VlIk(x) with the property that the set a(xk) is not contained in V f (a(x)), i.e., there exists an element Yk E a(xk) such that p(yk,a(x)) > e.
The boundedness of the mapping a implies that {yk } is a bounded sequence, and therefore, without loss of generality, we can assume that i m Yk = y exists and, in addition, p(y,a(x)) >_ E. Thus, there must exist sequences {xk } and {yk} such that Xk _ x, Yk _ y, Yk E a(xk) V k and y 4 a(x). This contradiction completes the proof. 0
We shall define a lower semicontinuous (l.s.c.) mapping only in terms of sequences.
A mapping a: X - 17(3 is called lower semicontinuous (l.s.c.) at a point
x E X if y E a(x), xk -4 x imply that there exists a sequence {yk } such that Yk E a(xk) V k and Yk
A mapping a is called Kakutani continuous (or K-continuous) at a point x E X if it is both u.s.c. and l.s.c. at this point. For mappings with compact images, continuity may also be defined by introducing a metric into the family 17o(Y) of all compact subsets of the set Y, treating a as a singlevalued mapping operating in 17o(Y) and also introducing continuity in the same
way as for single-valued mappings. As a rule, it is the so-called Hausdorff
Chapter 2
10
metric that is introduced into 170(Y). This metric is defined by the equality max {m a x p(v,77), m a x
VE
U ET/
where , q E 170 (Y). Here is (as above) the distance from a point y to a set . We shall just mention in passing that it is possible to show that the function
thus defined is in fact a metric on the set 170(Y). A mapping a: X -4 170(Y) is called Hausdorff continuous (or H-continuous)
at a point x if
it is continuous at this point as a single-valued mapping, i.e., if for any e > 0 there exists a 8 > 0 such that the inequality Ilx y II < 8 implies p(a(x),a(y)) < E. An equivalent definition: if Xk -4 x, then p(a(xk),a(x)) -4 0.
The following property holds.
Theorem 2.1. Let a mapping a: X -4 17(Y) be bounded in a neighborhood of
some point x E X. Then H-continuity of the mapping a at the point x is equivalent to the K-continuity at this point. Proof of this theorem can be found, for example, in [47].
Therefore, when dealing with the continuity of bounded mappings, 4-,
we shall not specify the nature of the continuity. ...CD
2.3. Here we consider mappings with convex compact images. By 17c (Y) we denote the family of all convex compact subsets of a set Y. Consider a mapping a: Y -4 17JRm ), where X is an open set in the space Rn. Take x E X. For every e E Rm define the function q,e(x) =
max (.e,y)
.
yE a(x)
It turns out that topological properties of the mapping a (its continuity, upper and lower semicontinuity) are closely related to topological properties of the 4-J
.N3
function q,e. A function q is called upper semicontinuous (u.s.c.) at a point x E X if for any sequence {xk } such that xk -4 x the following inequality holds:
lim q(xk) = lim sup q(xk)
<_
q(x)
.
Proposition 2.3. A bounded mapping a: X -4 17c(Rm) is u.s.c. at a point x E X if and only if for every e E Rm the function q,e is upper semicontinuous at x.
P r o o f. (a) Let a mapping a be u.s.c. at a point x. Take any e E Rm. We shall prove that q,e is a u.s.c. function. Choose any sequence {xk } such that
Elements of the Theory of Multivalued Mapping
11
xk -4 x. Since sets a{xk } are compact, there exists a point yk E a(xk) such that q,e(xk) =
max (e,y) _ (1'yk )
yE a (Xk)
for every k. The set {xk } is bounded, and therefore the boundedness of a implies that the sequence {yk } is bounded. Let a sequence {k1 } be such that q,e(xk i ) -4 lim g1(xk)
.
Since the sequence {yk . } is bounded, we can assume without loss of generality the lim yk . = y exists. The mapping a is u.s.c. and therefore y E a(x). Thus we have
_ (4)
lim q,e(xk) = lim qe(xk i) = lim (.4k.) <_
max
y'E a( x)
(.y') = q1(x)
,
'z33
i.e., the function q1 is u.s.c.
(b) Now let the function q1 be u.s.c. at a point x for any e E Rm. We wish to show that the mapping a is u.s.c. at x. Assuming the opposite, there must exist sequences {xk } and {yk } such that Xk -4 X, yk -4 Y, yk E a(xk) V k
and y 4 a(x). Since a(x) is a convex compact set, the separation theorem tells us that there must exist an element e E Rm and a number e > 0 such that (e,y) >
max (ey') +e = qe(x) +e. a(x)
y,E
For sufficiently large k we have
q,e(xk) ! VIYO > VY)
-2
2
which contradicts the assumed upper semicontinuity of the function q1.
0
Remark 2.1. Arguing as in the first part of the proof above, it can be shown that if a is u.s.c., then for any continuous function f the function q(x) _ max f(y) is u.s.c. ye a(x) 3"o
Remark 2.2. Under the assumptions of Proposition 2.3, we can speak about the boundedness of a mapping only in some neighborhood of a point x. Here lim = lim inf.
Chapter 2
12
CD>
A function q is called lower semicontinuous (l.s.c.)
at
a point
x,
if the
inequality lim q(xk) > q(x) holds for any sequence {xk } such that Xk -4 x.
Proposition 2.4. Let a mapping a: X -4 IIc(Rm) be bounded in some neighbor-
hood of a point x E X. Then a is l.s.c. at x if and only if the function q,e is I.s.c, at x for every e E Rm .
P r o o f. (a) Let a mapping a be l.s.c. at x. Take any e E Rm and find y E a(x) such that max
q,e (x) =
y'E a(x)
(.e',y') = (.e,y)
.
The lower semicontinuity of the mapping a implies the existence of a sequence {Yk } such that Yk E a(xk ), Yk -4 y. Since q,e (xk) > V,yk ), we have
lim q1(x)
"M (eyk )
_ (y) = qe (x)
i.e., the function q is l.s.c.
(b) Let the function q,e be l.s.c. at x for any e E Rm. We must show that the mapping a is also l.s.c. at x. Assume the opposite. In this case it is not hard to see that there exist an e > 0, an element y E a(x) and a sequence {xk } such that xk -4 x, a(xk) n (y+Be) = 0. Applying the separation theorem, we shall find, for every k, an element ek such that
max (.ek,v) <
min (,ek,u+y)
(2.1)
.
u E B.6
vE a ( x k )
Since
min (.Pk,u+y)
_ (.Pk,y) + min (.ek,u) uEB.6
0.y
uEB.6
Vk,y) + e m i n (ek,y)
( k,y) - EIi ek II
U E B
and
it follows from (2.1) that q,ek (xk) < (ek,y) - e 1"ek II
We can assume that
IL,ek II
=
1
.
(inequality (2.1) is not violated if
(2.2) ek
is
.
Elements of the Theory of Multivalued Mappings
13
multiplied by any positive number). Without loss of generality suppose that .ek -4 e. Let hk = e - ek and let C be a number such that IA
IIVII :5 C V v C= V a(Xk)
-
The existence of such a C follows from the boundedness of the mapping a. Since q,e (xk)
=
max (.e,v) =
VE a (xk)
<
max (.ek,v) +
VE a (xk)
max (ek +hk ,v)
VE a (xk)
max (hk,v) < gzk(xk) + 11hk II C ,
VE a (xk)
(2.1) and the equality II,ek II = 1 lead to
q1(x) <- qZ k (xk) + I I hk 11 C < (ek y) + 11.ek 11 C - e
= (_ey) -(hk,y) + I1hk11C-e < qe(x) + [11hk 11C - (hk,y)] - e
.
Since hk -4 0, the following inequality holds for sufficiently large k:
q (xk) < q (x) - C2 This contradicts the lower semicontinuity of the function q1.
0
Remark 2.3. Arguing as above, we can show that if f is a continuous function and a is a l.s.c. mapping, then the function q(x) = E ma(x/(y) is Y
l.s.c.
Theorem 2.2. Let a mapping a: X -4 17,(Rm) be bounded in some neighbor-
hood of a point xp E X. Then the mapping a is continuous at xp if and only if for every e E R'" the function q,e(x) =
max (.e,v)
VE a(x)
is continuous at xp.
Proof follows immediately from Propositions 2.3 and 2.4.
Chapter 2
14
Proposition 2.5. Let a mapping a be continuous on some compact set X. Then a is bounded on X. 0
P r o o f. We wish to show that the set a(X) is bounded. Assuming the opposite, let us find sequences {xk } and {yk } such that xk E X, Yk E a(xk) V k, "Yk II -4 oo. There exists a vector e such that (e,yk) -4 +00.
Consider the function
q1(x) =
max (e,y) yE a(x)
.
continuous on X and therefore bounded. On the other hand, q1 (x) >_ (eyk) -4 +00. This contradiction completes the proof. 0 It follows from Theorem 2.2 that this function is
CCs
4-'
+-' +-)
+-J
V,0
2.4. In the topological theory of multivalued mappings it is proved that under some natural assumptions every u.s.c. mapping can be approximated to within some given accuracy by a continuous mapping. We need one result of this type:
Theorem 2.3. Let a: X -, 177(Rm) be a bounded u.s.c. mapping, where X c Rn is a compact set and 17c(Rm) is the family of convex compact subsets
of the space Rm. Then for any .6 > 0 there exists a continuous mapping b: X -4 17c(Rm) such that
gr a cgrb c V,,(gra) ,
(2.3)
where gr a and gr b are the graphs of mappings a and b, respectively, and
t.'
V,,(gr a) _ {z = [x,y] I x E Rn, y c= Rn , p(z,gr a) <_ e} is a closed e -neighborhood of the set gr a.
Before we proceed to prove the theorem let us check two simple properties related to inclusions (2.3). Let
z = [x,y] E Rn X Rm . 010
Introduce the following norm into the space Rn x Rm : IIzII
= max {Ilxll,llyll}
norm.
is
evaluated on the basis of this CCs
and suppose that the distance p(x,gra)
Elements of the Theory of Multivalued Mappings
15
Lemma 2.1. Relation (2.3) is equivalent to stating that the inclusions a(x) c b(x) c V. [a(BE (x))] = a(x+BE) + BE
(2.4)
moo
are satisfied for any x E X. Here BE and BE are closed balls of radius e centered at zero in the spaces Rn and Rm, respectively, and BE (x) is a ball of radius e centered at x in the space Rn.
P r o o f. It is clear that the relations gr a c gr b and a(x) c b(x) V x E X are equivalent. Consider the right-hand side of (2.3). Let y E b(x). Then [x,y] E gr b and, hence, there exists a point [x',y'] E gr a such that P([x,y],[x',y']) = max { IIx-x' II, ii y-y' ii} :5 e
.
Since
x' E x+ Be , y E y'+Bm , y' E a(x')
we have y E y' + Bm c a(x') + BmC c a(x + BE) + Bm . 46
(2.5)
On the other hand, it is not hard to see that if (2.4) holds for all x E X and y E b(x), then gr b c VE(gr a) .
0
Lemma 2.2. Let open balls
Vs (xi) _ {y I
IIy - xi II < 05i }
where xi E X, i E 1:p, form an open covering of a compact set X, i.e., v Vs (xi)
X
.
Then there exist continuous functions 1, i E 1:p, such that (i } e,(x) > O. Furthermore, ei(x) > 0 if x E V5(x1) and .ei(x) = 0 if x
Vs (xi) ;
(2.6)
p
(ii)
1, (x) = i =1
1 .
(2.7)
Chapter 2
16
P r o o f. Let a set Wi be complementary to the set V5(xi ); i.e.,
Wi = {yI 11y-x111>_c}, >,U
and let us put yi (x) = p(x,Wi ). Then some y1i (x) ? 0 and yi (x) = 0 only on the set Wi. Since the balls {V5(x)} form a covering of the compact set X, we have p
Y Y/k(x)>0
VXE X.
k=1
The functions ei (x) = Vli (x)IF
Vlk (x) are the ones we are looking for. 0
_1
k
Remark 2.4. In this case, the functions {yi} form a continuous division of the unit function depending on the covering {V5(xi)}.
P r o o f of Theorem 2.3. Since the mapping a is u.s.c., there exists a number SX such that a(V sX (x)) C VE (a(x))
for a given e and every x e X. We can assume without loss of generality that 8X <_ e. Since x e V5X (x),
the balls {VaX (x)} form an open covering of X. Since X is compact, we can choose a finite subcovering V5X (x1),...,V5X (xp ). Let Di, i e 1:p, be the clop
t
Nib
sure of the set VE(a(xi)). It is not difficult to check that Di = a(xi) + Bm . Let us construct functions {ei } satisfying properties (2.6) and (2.7) for the 46
covering {Vs (xi) }p =1.
We shall put
and show that the mapping b has the required properties.
(a) Let us show that the set b(x) is convex and compact. We know that the sets Di = cl(a(xi)+VE) = a(xi) + Be are convex and compact. Therefore Y2,(x)D1 is a convex set. Since the Di are bounded, the set Y2 (x)D1 is also bounded. Finally, the compactness of the Di implies that the set Y1i (x)Di is also closed. Thus b(x) E 17,(Rm ).
(b) To prove that the mapping b is continuous, let us show that b is both u.s.c. and l.s.c. Take sequences {xk }, {yk } such that Xk -4 X, yk -4 y, yk E b(xk ) V k, i.e. p
yk =
i =1
'ei (xk) Zk
Elements of the Theory of Multivalued Mappings
17
where zk E Di. We can assume without loss of generality that
limzk = zi E Di Vi. k-300
Passing to the limit in the equality yk = E -ei(Xk)Zk
yields y E b(x). It follows from Proposition 2.3 that b is u.s.c. at x. Now let xk -4 x, y E b(x). Then y = Tei(x)zi, where zi E Di. Take
Yk = E 'et(xx)zt
It is clear that yk E b(xk), yk - y, which means that b is l.s.c. The upper semicontinuity of b implies that b is continuous. (c) To show that a(x) c b(x) V x E X, we shall introduce the set
From the definition of 8x for i E 1(x) we have the inclusion a(x) c VE (a(xi )), .
i.e.,
a(x) c D;
d i r= 1(x)
.
Since -ei (x) = 0 if i 4 1(x), we have
h=1
Awl
P
a(x) _ 1: -ei (x)a(x) _
-ei (x)Di = b(x) .
-ei (x)a(x) c iE I (x)
iE I (X)
(d) Finally, let us check that b(x) c a(x+BE) + BE. Let i E I(x). Then IIx xill
<
cx
<_ e
and hence xi e x+BE. Therefore a(xi) c a(x+Bn ). Adding the ball BE to both sides of this inclusion, we have
Di = a(xi) + Be c a(x+Be) + Be
Chapter 2
18
Now P
b(x) _
-ei (x)Di =
c
-ei (x)Di iE 1(X)
i=1
1 e (x) [a(x+BE)+Bm ] = a(x+Be) + Bm
iE 1(X)
This completes the proof of the theorem. 0
Chapter 3 DIRECTIONAL DERIVATIVES
3.1. Let a function f be defined on an open set X c Rn. Fix X E X and g E Rn . If the limit
1im a1 [f(x+ag) - f(x)]
(3.1)
a -9+0
exists, it is called the derivative of the function f in the direction g. We shall express the directional derivative in a number of different ways: Sao
df(x) g
f(g), f'(x,g).
The first form makes it explicit that we are considering the directional deriv-
ative ("partial" derivative); the second is convenient if a point x is fixed and the derivative is a function of direction g; the third version is useful if the directional derivative is being studied as a function of two variables x and g.
A function is said to be directionally differentiable at x if the limit (3.1) ado
exists for every g.
Let f be directionally differentiable at x. Then the directional derivative fx' is a positively homogeneous function, i.e., Al
f,(2g) = 2ff(g)
V A >_ 0
.
This follows immediately from the definition.
Let us fix x EX and g E Rn and consider the following function of one variable:
h(a) = f(x+ag)
.
It is clear that f'(x,g) coincides with the right-hand-side derivative h+ (0) of the function h at zero and therefore the directional derivative shares all the usual properties of the derivative.
19
20
Chapter 3
First of all, the directional derivative f'(x,g) represents a first-order approximation of the increment f(x+ag) - f (x): f(x+ag) - f(x) - af'(x,g) = o(a)
,
(3.2)
where -a- +0
a
0
Note that some of the generalizations of the derivative concept used in nonsmooth analysis (for example, the Clarke directional derivative) do not represent a first-order approximation. Note also that many of the formulas and rules from differential calculus can be applied to the directional derivative: e.g., if f1 and f2 are differentiable at x0 in a direction g, then their sum, difference, product and quotient are also differentiable at x0 in the direction g; and the rules of differential calculus can be used to compute the corresponding directional derivative. But more importantly, the functions
f(x) = max Ui (x), f2(x)} and ,fix)
= min
Ui (x), f2 (x)}
are also differentiable at x0 in the direction g, i.e., the directional derivative allows the operations of taking the pointwise maximum and minimum. This property (which is fundamental and will be proved below in Proposi-
tion 3.3) is due to the fact that f'(x,g) is a one-sided derivative. More precisely, the value f (x,g) coincides with the right-hand-side derivative of the function g(a) (which is a function of one real variable) at zero. Now let us consider an example of a directionally differentiable function.
Example 3.1. Let p(x) =
xl i) ,
where x = (x( 1) ,..., x(n)) E Rn. We shall
show that the function p is directionally differentiable on Rn and that its directional derivative is given by the formula 0_W
dg
where R(x) = {i E
I :n
I
max
iER(x)
g(i)
(3.3)
x(i) = p(x)}, and g = (g(1),...,g(n) ). Letting p(x) = c,
we conclude that for some e > 0 the inequality x(i) < c-e holds for all
Directional Derivatives i
21
R(x). Therefore, for any sufficiently small positive a we have
x(i) + ag(s) < c - e + ag(') < c - C2
E
A
(j)
2
< x(j) + ag(j)
V i4 R(x), j E R(x). It follows from the above inequalities that for such values of a
p(x+ag) = max [x(J) + ag(i)]
,
...
jE R (x)
p(x+ag) - p(x) = max
[x(i) + ag(i)] - c
...
jE R (x)
[c+ag(j)]-c
max
...
jE R (x)
max ag(j) = a max g(j) jE R(x)
...
jE R(x)
which implies (3.3). +-j
Note that in this case the function o(a) defined by formula (3.2) is equal to zero for sufficiently small a. Viewing the directional derivative as a one-sided derivative of a function
of one real variable,
it is
easy to obtain an analogue of the Lagrange
0.,
formula. But first we prove the following lemma.
Lemma 3.1. Suppose that a function h(a) is defined and continuous on an interval [a,b] and that at every point a in this interval there exists a right-hand-side derivative
ha+2- ha.
h ' a = lim Z- 4
if h+ (a)
>_
0 V a E [a,b] ,
(3.4)
then
h(b)
>_
h(a)
.
(3.5)
Chapter 3
22
P r o o f. Fix any e > 0 and consider a subset A of the interval [a,b] such that if a E A and a :5,8:5 a, then
hA - h(a) : -e(# - a)
(3.6)
.
yam
aim
It is clear that a E A and that if a E A, then 8 E A V 8 E [a,a]. This means that A is itself an interval. Let us prove that A is a closed interval (it is enough to show that its right-hand end y lies in A). It follows from (3.6) that
h(fl) - h(a) ! -e(fl-a)
V,# < y.
Passing to the limit as /3 -4 y - 0 yields
These two inequalities imply that y E A. Thus A = [a,y]. Let us show that y = b. Arguing by contradiction and assuming that y < b, we conclude that for every 8 > 0 there exists an as < S such that
h(y+as) - h(a) < e(y+ as -a)
.
(3.8)
A
From the definition of A, it is also clear that as > 0. It follows from (3.7) and (3.8) that
(3.9)
Since as
+o
+ 0, (3.9) implies that h+
<e
which contradicts (3.4). Thus y = b and therefore
h(b) - h(a) >- -e(b-a)
.
(3.10)
The required inequality (3.5) now follows from (3.10) since e is an arbitrary positive number.
Directional Derivatives
23
Corollary 3.1. Under the hypotheses of Lemma 3.1 the following properties hold: r-.
(a) h is nondecreasing on [a,b], i.e.,
h(fl) > h(a) if 8 > a ;
(3.11)
(b) If h+ (a) > 0 for some a E [a,b], then h(,(3) > h(a) V fl > a.
(3.12)
Property (a) follows immediately from the lemma. To prove property (b), let
us assume the converse, i.e., there exists a f3 > a such that h(f3) = h(a). From (3.11) we conclude that h(f3') = h(a) V f3' E [a, f3]. But then
h a+ -h a= 0
h' a= I i m
S-9+ 0
which contradicts (3.12).
Corollary 3.2. Let the hypotheses of Lemma 3.1 be satisfied and let
inf h+ (a)
,r.
m=
aE [a, b]
,
M=
sup h+ (a) .
aE [a, b]
Then
m(b-a) <_ h(b) - h(a) <_ M(b-a) 0
P r o o f. We apply Lemma 3.1 to the functions
hl (a) = Ma - h(a) and
.
Chapter 3
24
Proposition 3.1. Let a function f be defined and continuous on an open set
X c Rn and differentiable in a direction g at every point of the interval [x,x+aog] _ {x+ag I a E [O,ao]}, where ao > 0 and x is fixed. Then
f(x+a0 g) = f(x) + cap , where
c E [m,M] ,
m=
M=
sup f'(x+ag, g) aE [0, ao]
inf
aE [O,ao]
f '(x+ag,g)
,
.
P r o o f. Let us introduce the function h(a) = f(x+ag)
r-,
which is defined on [O,ao ]. From the definition of the directional derivative, it follows that the function h has a right-hand-side derivative h+ (a) on [O,ao ] and
h+ (a) = f'(x+ag,g). Applying Corollary 3.2 leads to the required result.
o
3.2. When speaking about the directional derivative we usually assume that some of its properties are known, since otherwise the information about the function is rather limited. Considering the directional derivative f'(x,g) as a function of f at some fixed g, it is generally assumed that the function has some topological properties (such as continuity, upper or lower semicontinuity, etc.). If we study f'(x,g) as a function of g at some fixed x, this function should have properties of an algebraic nature (such as linearity, sublinearity, superlinearity, representation as the sum of a sublinear function and a superlinear one, etc.).
0.0
cps
The above assumptions are justified since they are usually satisfied for most applied problems. It turns out that the algebraic and topological properties mentioned above are closely related: for example, continuity of the derivative in x implies that it is linear in g, upper semicontinuity in x implies sublinearity in g (these results are stated and proved in Chapter 7). Classical ("smooth") calculus is mainly concerned with functions which are continuously differentiable in x. In this case the derivative f(g) is linear, i.e., there exists a vector v such that fX (g) = (v,g). The vector v is called the
Directional Derivatives
25
gradient of the function f at the point x and is denoted by Vf(x), grad f(x) or df(x)ldx.
It is well known that
VAX) = r L9(1)
where df
(x)/dx(i)
200 02
aXcn>
is the partial derivative of f with respect to the ith variable,
i.e., = dx(e)
1 im 1 [f(x(l),...,x(_l),x(1)+a,x(i+l),...x(z)) - x(1)
... x(n )
a -)O
.a0
If the directional derivative f(g) of a function f at x exists and is linear,
i.e., Vf(x) exists, then we say that f is Gateaux differentiable at x. Thus, Gateaux differentiability may be described as "linear directional differentiability." If a function i.e., we have
f is differentiable at a point x in the classical sense, f(x+g) = f(x) + (Of(x)2g) + o(g)
,
where -3
11g 11
11g ii-40
0
'
the function f is called Frechet differentiable (or just differentiable). What is the difference between Gateaux differentiability and Frechet differentiability? Suppose that the gradient Vf(x) exists at a point x. For vectors then
,
g of norm one let us consider the functions
og (a) = f(x+ag) - f(x) - a(of(x),g) . If f is Gateaux differentiable, then og (a)
a
0
i.e., for any e > 0 there exists a S > 0 such that
Iog(a)I < ea V CC E (0,6) Thus S may depend on g.
.
Chapter 3
26
Frechet differentiability means that
og (a)
a
-4 0 uniformly with respect to
g: S depends only on e and the same value can be taken for all g (such that Ilgh = 1).
Now we consider an example involving a function which is Gateaux differentiable but not Frechet differentiable.
.00
Example 3.2. Consider the set A shown in Figure 3.1, (an "apple"). The stalk of the "apple" is a vertical tangent at x0 to the curve forming the "boundary" of the "apple." Let us put
fW = n, 0
a x 4 A
.
For any g, the intersection of A and the ray {xo +ag I a >_ 01 is an interval, the length of which depends on g. Therefore the directional derivative fX (g) exists and is equal to zero. Since f,,' 0 is a linear function of g, the function f
is Gateaux differentiable. On the other hand, it is clear that f is not Frechet differentiable at x0 since f is discontinuous at x0 (Frechet differentiability implies continuity).
o
Figure 3.1.
Directional Derivatives
27
3.3. Let a function f be defined on an open set X c R. We say that f is unidifferentiable (or just uniformly differentiable) at x EX if for any e > 0 there exists an ao > 0 such that
formly directionally
1 [f(x+ag) - f() x-af'(x,g)I <e
a
V a E (O,ao ], d g E B
(3.13)
where B is (as usual) the unit ball. Assume fig II = 1 and put v = ag. Then it follows from (3.13) that the inequality
[f(x+v) -f(x) - f r(v)I < e IIv II
(3.14)
holds for all sufficiently small v. The inequality shows that
1
[f(x+v) - f(x)
-fv( }l
-IIVII-)0
0
.
(
3 . 15 )
It is clear that (3.14) implies (3.13) and also the following property (which seems to be even stronger than (3.13)): for every e > 0 and r > 0 there exists an a0 > 0 such that
1a Lf(x+ag) -f(x) - af'(x,g) I <e V aE (O,a0),d g E Br
(3.16)
where B,. is the ball of radius r centered at zero.
Unfortunately, uniform differentiability of a function f does not imply that either f itself or its directional derivative fX is continuous. To illustrate this, let us consider the following example.
Example 3.3. Let us take a sequence {xk } of different points on the unit circle in the space R2. Now define the following function:
k2
if x = 2xk for some k
0
otherwise
f(x) = .
The function f is directionally differentiable at x0 = 0 and k fig 112 if g = 2xk for some k ,
f'(O,g) _ 0
otherwise
.
,
Chapter 3
28
Since at xo = 0 we have f(x0+ag)
f(xo) - afXO(8) _ f(ag) - af"(O,S) _ 0
,
f must be uniformly differentiable at x0. But at the same time the function f and its derivative fx are discontinuous. D
Proposition 3.2. Let a function f be uniformly differentiable at a point x. Then
(a) if fX is continuous at zero, then the function f is continuous at x ;
(b) if fX is continuous at a point g, then
f'(x,g)
=
1 im g a -4+0 4
a
[f ( x+aq)
-.f(x) ]
(3.17)
.
P r o o f. The continuity of f follows immediately from (3.14). Since we have f (x + aq) -f(x)
a
.f(x+aq} f(x) _ '(x, .f q) a
- Px' g)
+ lf'(x,g) - f'(x, q) i equality
,
(3.17) may be obtained from (3.13) and the continuity of the
derivative. 0
Proposition 3.3. Uniform differentiability of a function f at a point x is +.'
equivalent to the following property:
(A) For every q E Rn and any e > 0 there exist a 8 > 0 and an a0 > 0 such that
= lf(x+ag) - f(x) - af'(x,g)I < ea .-.
log (a) I
V a E [O,ao], V g E B5(q) .
(3.18)
b'`
cps
P r o o f. (a) Let f be uniformly differentiable, where e > 0 and q E Rn is given. Choose any o > 0 and put r = IIq II + J. It follows from (3.16) that
Directional Derivatives
29
there exists an ao > 0 such that inequality (3.18) holds for all a E (O,ao ] and g E Br. Since B5(q) c Br, this inequality is also valid for g E B5(q). (b) Let property A be satisfied. It then follows from the definition that for every q E S = {x I fix ii = 1 } there exist numbers ao (q) and 8(q) such that < ae V a E (0 900 (q) 19 g E V5(q) (q) ,
log (a) I
where V5(q) is an open ball of radius 5 centered at the point q. The balls V5(q) (q), form an open covering of the sphere S. Since this sphere is compact, there exists a finite covering V5(gI) (q, ),...,V5(q ) (qp ). Let ao be the smallest P
of the numbers ao (q, ),...,a0 (qq ), and let g E S. Then g E Vs(q t) (q) for some i, and therefore l o8 (a)I < ae V a E (O,ao (q j)]
Clearly, this inequality also holds for a E (O,ao ].
.
0 f/1
Remark 3.1. In fact, property A means that f is locally uniformly differ,RS
entiable. Hence, it follows from Proposition 3.3 that locally uniform differentiability is equivalent to "global" differentiability. Unfortunately, in infinite-dimensional spaces this property does not hold and therefore it is necesF-+
V.4
sary to use locally uniform differentiability (see, e.g., [39]).
Next we introduce one important class of uniformly directionally differC13
entiable functions.
qty
Proposition 3.4. Let a function f be directionally differentiable at a point x C X and Lipschitz with a constant L in some neighborhood of this point. Then (a) the derivative fX is Lipschitz (in g) with the same constant L; (b) the function f is uniformly directionally differentiable at the point x.
P r o o f. Consider the absolute value of the difference
l a If(x+a4) -.f(x)] - a V(x+ag) -f(x)] I = l a [f(x+a9) -.f(x+a8)]
(3.19)
0
Assume that a is small enough to ensure that elements ag and aq belong to a neighborhood wheref satisfies a Lipschitz condition. Then
[f(x+aq) - f(x+ag)I <_ aL II q-g II
.
Chapter 3
30
Taking the limit in (3.19) yields [f(q) -.fX(B)I
:5
Lllq-g11
If, in particular, g = 0, then [f(q) l <_ Lligll. Therefore [f(x+q) -.f(x) -.fX(q)I
:5
[f(+q) -.f(x)I + [f(q) I
<- 2L 11q II ,
which implies that the function f is uniformly differentiable. D A theorem on the differentiability of compositions holds for uniformly directionally differentiable functions. Before formulating it, we shall introduce
the following definition: a mapping H: X -4 Rm, where X c Rn is an open set,
is said to be differentiable in a direction g at a point x E X if the followRIB
ing limit exists: ,.,
H'(x,g) = HX(g) = 11m 1 [H(x+ag) -H(x)] a-4+0 a
.
Let H(x) = (hi (x),...,hm (x)). It is clear that H is differentiable at x in a direction g if and only if its coordinate functions hi(x) (i E 1:m) are differentiable at the point x in the direction g. In this case H'(x,g) _ (hi (x,g),..., hm (x,g))
We say that H is uniformly directionally differentiable at x if functions hi,..., hm have this property.
Theorem 3.1. Let X be an open set in Rm and let a mapping H: X -4Y be direcgyp.,..,
tionally differentiable at a point x E X. Let a function f be defined on Y and uniformly directionally differentiable at the point y = H(x). Assume also obi
that the derivative f' is continuous as a function of direction. Then the function u(z) = f(H(z))yis directionally differentiable at the point x and
u'(x,g) = f'(H(x), HX (g)) V g E Rn
.
P r o o f. Let yt8 (a)
[H(x+ag) - H(x) -aHX(g)]
v = HX (g) ,
va = HX (g) + y/8 (a) .
Directional Derivatives
31
Find the difference
u(x+ag) - u(x) = f(H(x+ag)) -f(H(x)) = f(H(x) + aHX(g) + ayr8(a)) - f(H(x))
= f(y+ava) -.fV') = af'(y,va) + O(ava) , where 0 11V 11
1 1V1 1- 4 0 0 11
Taking the limit in the equality
[u(x+ag) - u(x)] = f'(y, va ) +
o(ava)
a
we conclude that the derivative uX (g) exists and coincides with f'(y,v) _ f'(H(x), H'(x,g) ). o
Theorem 3.1'. Let all the assumptions of Theorem 3.1 be satisfied and let, in addition, the mapping H be uniformly directionally differentiable. Then the function u(z) = f(H(z)) is uniformly directionally differentiable at the point x.
P r o o f. It follows from the proof of Theorem 3.1 that
u(x+ag) - u(x) - au'(x,g) = a[f'(H(x), HX (g) + y'8 (a)) - f'(H(x), HX (g))] + o(ava (g))
.
Here
va (g) = HX (g) + Vg (a).
Since the mapping is uniformly differentiable, we have y'8 (a) a -- 0 0 uniformly with respect to g E B. Therefore, the quantity in square brackets tends to zero (as a -4 +0) uniformly with respect to g, the same being true for the quantity o(ava (g))/a. 0
Chapter 3
32
To illustrate Theorem 3.1, we shall find the directional derivative of the maximum function
u(z) = max J(z).
(3.20)
iE 1: m
Proposition 3.5. Let functions fl,...,fm be directionally differentiable at a point x. Then the function u defined by formula (3.20) is also directionally differentiable at this point and
max f(x,g),
u'(x,g) =
iE R (x)
where
R(x) = {i E l:m I J(x) = u(x)} .
If the functions fi' are uniformly directionally differentiable at x, then the same is true for the function u.
P r o o f. Let us put H(x) = (fl (x),...,fm (x))
p(y) = max
iE 1: m
y(i)
where y = (y(1),...,y(m)) E Rm. The function p is directionally differentiable (see Example 3.1) and
p'(y,v) =
max v(i)
iER*(y)
where
v-,
It is clear that p is a Lipschitz function and therefore uniformly differentiable. It follows from Theorem 3.1 that the function u is directionally differentiable and
u'(x, g) = p'(H(x), Hx (g)) =
max
f(x,g) = max f,'(x,g).
iE R*(H(x))
iE R(x)
Directional Derivatives
33
It can be deduced from Theorem 3.1 that if the functions f are uniformly directionally
also uniformly
differentiable, then u is
directionally
dif-
ferentiable. 0 3.4. In addition to uniform directional differentiability, uniform differentiability with respect to a parameter (uniform parametric differentiability) is of great significance.
Let f(x,y) be a continuous function of two variables defined on a set X xY, where X is an open set in R'z and Y is a compact set in Rm. Assume that for every
y E Y the function fy (x) = f(x,y) is directionally differentiable at a point x0 . The directional derivative f(x0 ,g) of this function shall be denoted by f'(x0,g,y). Let oy(a) = f(x0+a8)y) -.f(xo9Y) a
function f(xy) is uniformly directionally differentiable with
respect to parameter y E Y at a point xo if for any e > 0 there exists an ao > 0 such that
loy(a)I < ea `daE (O,ao), VyE Y. In this case we also say that the derivative f'(x,g,y) is uniform with respect to y. Let us consider an example.
Example 3.4. Suppose that for ally E Y a function f(x,y) has a partial derivative oaf (x,y)/dx which is jointly continuous in both variables. Then the derivative lie/
f'(x,g,y) =
Y)
x'
g
is uniform with respect to y. Indeed, it follows from the Lagrange formula that
ry(a) = f(x+ag,y) - f(x,y) a
)81
J
Y)1
,df(x+O 9)Y)
al
ax
df(x,y
dx J
where By = By (a) E [O, a]. The continuity of oaf /dx implies that for any .6 > 0
Chapter 3
34
there exists a 5 > 0 such that df(x+8yg,y)
d
,
< . V aE (0,05), V
E Y.
(DD
Therefore
joy(a)I
<_
aeIIgII VyEY.
The uniform parametric derivative has the following important property.
obi
Proposition 3.6. If a continuous function f has a derivative f'(x,g,y) which is uniform with respect to y at x in a direction g, then the function y -4 f'(x,g,y) is continuous.
P r o o f. Let ha(Y) =
a Cf(x+ag, Y) -.f(x,Y)]
The function ha (y) is continuous for all a. -4 The uniformity of the derivative means that the convergence ha (y) a 4+0 f'(x,g,y) is uniform with respect to y, and the proposition may be proved
using the well-known theorem on the continuity of the uniform limit of a family of continuous functions. o Let a function f(x,y) be defined on X x Y, where X is an open set and Y is a compact set. Put
O(x) = m a x f(x,y) .
yEy
Theorem 3.2. Suppose that a function f is jointly continuous in both variables and has a derivative at a point x in a direction g which is uniform with respect to y. Then f is differentiable at x in the direciton g and
0'(x,g) =
max f'(x,g,y),
yE R(x)
where
R(x) = {y E Y I O(x) = f'(x,y)} .
Directional Derivatives
35
ors
This theorem is well known (it is proved, for example, in [47]). A simpler proof is based on the composition theorem (compare with Proposition 3.5), but one needs to use the technique of functional analysis.
Note that if the derivative f'(x,g,y) is not uniform, then Theorem 3.2 is not valid.
Now let us consider the case where the partial derivative oaf/dx of a "'(D
function is jointly continuous in both variables (it is of course assumed that f is continuous). We showed in Example 3.4 that the function f is differen-
tiable in x uniformly with respect to y and therefore Theorem 3.2 holds. Hence, the derivative q'(x,g) exists and
q'(x,g) =
[f(x,y) , g
max
(3.21)
,
dx
Y E R (x)
0
a)a)
where, as above, R(x) = {y E Y 10(x) = f(x,y)}. It follows from the continuity of functions 0(x) and f(x,y) that the set R(x) is closed and therefore compact. Furthermore, the continuity of these functions implies the upper semicontinuity of the mapping R. Indeed, it is clear from Proposition 2.3 that it is sufficient to verify that this mapping is closed. Let xk -4 x, yk -4 y, yk E R(xk ), i.e., O(Xk) = f(xk,yk). Passing to the limit yields 0(x) = f(x,y), and hence y E R(x), which implies that the mapping R is closed.
Let us consider the following multivalued mapping a defined on X: x , y E R(x) A
a(x) _ Je E Rn I e =
.
>4~
The compactness of R(x) and the continuity of the partial derivative dfldx '0`
imply that a(x) is a compact set. Let d0(x) denote the convex hull of a(x)
(x) = co a(x)
.
We shall call this set the subdifferential of the maximum function 0 at the point x. This set is convex and compact (because it is the convex hull of a compact set). The following equality holds:
O'(x,g) =
max
(v, g) .
VE of (x)
Indeed, =
max
[df(x,y) , g
yE R ( x )
max vE co a(X)
Ox
=
(v,g) = max
max (v,g) /'1
O'(x,g)
vE a (x)
vE f (X)
(v,g).
Chapter 3
36
Proposition 3.7. The mapping x -4 f(x) is upper semicontinuous.
P r o o f. It is sufficient to check that the mapping x -4 f(x) is closed. Let Xk -4 x, Vk -4 v, Vk E f(xk). It follows from the Caratheodory theorem and the definition of the set f(xk) that every Vk can be represented in the form n+1
Vk =
aki
off (xk , yk i )
(3.22)
dx
i =1
where k+1
Yki E R(xk),
aki >_ 09 i =1
aki = 1
E--
The sequences {aki I k E 1:oo} (i E 1:n+1) are bounded. We can assume with-
out loss of generality that aki kt ai and that the sequences {yki} also converge to elements yi E Y. Since the mapping R is u.s.c., we have
yi E R(x) V i E 1: (n+ 1) . Passing to the limit in (3.22) yields
df(x,yi)
n+1
V= i=1
ai
(3.23)
dx
where n+1
yi E R(x),
a, > 0,
But (3.23) simply means that v E f(x). 0
i=1
ai = 1
.
Chapter 4 CONVEX FUNCTIONS
The family of convex functions and the family of maximum functions are
two most important classes of nonsmooth functions, the study of which helped better understand the nature, properties and problems of nonsmooth analysis.
The heart of convex analysis is the concept of the subdifferential, which was discovered and widely used by J.-J. Moreau [104] and R.T. Rockafellar [136,140].
-,:
3C!
Here we shall summarize (without proof) some of the properties of &.4
convex functions to be used later on. For detailed proofs see [44,136].
4.1. A function f defined on a convex set S c Rn is called convex on S if
f(ax1 + (1-a)x2)
<_
af(xl) + (1-a)f'(x2)
V xl,x2 E S, V a E [0,1]
.
(4.1)
A function f is called strictly convex on S if
f(axl + (1-a)x2) < af(x1) + (1-a)f(x2) r-.
V xl,x2 E S, xl # x2
d a E (0,1)
.
(4.2)
V xl,x2 E S, V a E [0,1] .
(4.3)
A function f is called concave on S if >_
af(x1) + (1-a)f'(x2)
r"+
f(axl + (1-a)x2)
A function f is called strictly concave on S if
`''
f(ax1 + (1-a)x2) > cLAxl) + (1-a),f(x2)
V XI J2 E S, xl # x2,
V a E (0,1)
.
(4.4)
Theorem 4.1. Let a function f be convex on a convex open set S c Rn. Then f is continuous on S. 37
Chapter 4
38
is differentiable at any point x E S in every direction g E Rn, i.e.,
cow
Theorem 4.2. If a function f is convex on a convex open set S c R", then f the
following limit exists:
df(x) a8
= 1 im 1 [f(x+ag) a a
-f(x)]
.
4.2. Let a function f be convex on Rn, x E Rn. The set
f(x) _ {v E Rn I f(z) - f(z)
(v,z-x) V z E Rn}
(4.5)
0
is called the subdifferential of the function f at the point x. Any element v e q)I(x) is called a subgradient (or generalized gradient) of the function f at x.
The mapping ) q : Rn - 17(Rn) is called the subdifferential mapping (related to the function P.
Theorem 4.3. The set oY(x) is nonempty, convex, closed and bounded. Corollary 4.1. The mapping x -4oY(x) is bounded on any bounded set.
Lemma 4.1. Let x E Rn, and consider a function f which is convex on Rn. Then for any .6 > 0 there exists a 8 > 0 such that
f(z) c 0Y(x) + SE (0)
V z E Si(x)
.
(4.6)
Remark 4.1. It follows from (4.6) and the definition of upper semicontinuity that the mapping x -4o3(x) is upper semicontinuous at any point x E Rn. But, generally speaking, it is not lower semicontinuous.
Remark 4.2. It is also possible to show that if S is an open set in R12, and x E S, then
f(x) = {v E Rn I f(z) - f(x) >_ (v,z x) V z E S} ,
(4.7)
i.e., the subdifferential concept is a local one (the subdifferential of f at x is completely defined by the properties of the function f in a neighborhood of the point x).
Theorem 4.4. The following relation holds:
f(g)
=
df(x) dg
=
max VE
(X)
(v, g)
.
(4.8)
Convex Functions
Corollary 4.2. Let X E Rn , g E Rn ,
v1
39
f(x). If oaf (x)/dg = -a < 0
E
and max
(v1, g) =
f(x)
--
(v, g)
df (x) dgg
then forge = -g we have
df(x) gl i.e.,
d=
max (v,g1) > (v1,g1) = (-v1,g) _ vE f (x)
a> 0,
if a convex function f decreases in some direction, then it necessarily
increases in the opposite direction.
4.3. Let functions f , i E 1:N, be defined and convex on Rn. It is easy to show that the functions N
ai > 0
ai, f (x) ,
0 1(x) _
i =1
and '0,
02 (x) = max{f (x) i E 1:N} I
are convex on Rn.
Theorem 4.5. The following relations hold: N
d 1(x) =
ai Ai (x)
(4.9)
i =1
'0,
&2 (X) = CO fei(X) I i ER(x)j ,
(4.10)
where
R(x) = {i E
1:N I f(x) = 02 (x)} .
Corollary 4.3. Let a function f be convex continuously differentiable in a neighborhood of a point x. Then
f(x) _ {f'(x)} .
(4.11)
Remark 4.3. Observe that the multiplication of a convex function by a negative number does not preserve the convexity.
Chapter 4
40
Theorem 4.6. If a function f is convex on Rn, then it is Lipschitz on any convex bounded set, i.e., for any bounded set G c Rn there exists an L < oo such that 11(z) - f(x)l
<_
Lllz-y11
V x E G, b' z E G
.
Theorem 4.7. If a function f is convex on R", then }'(x)
= ma x [f(z) + (v(z),x-z)) V x E Rn z e R"
,
(4.12)
where v(z) is an arbitrary vector from q,Y(z), i.e., a convex function is a maximum of affine functions.
Corollary 4.4.1f f is convex on a convex set S1 c Rn and X E int S1, then
f(x) = max [f(z) + (v(z),x-z)]
(4.13)
Z E S1
Remark 4.4. Note that an affine function (i.e., a function of the form f(x) = (a,x) + b, where a E Rn, b E R1) is both convex and concave at the same time.
4.4. The subdifferential concept can be used to describe necessary (and in the convex case also sufficient) conditions for a minimum.
Theorem 4.8. Let a function f be convex on R". For a point X E R" to be a minimum point of the function f on the space Rn it is necessary and sufficient that m
0E0 ,Y(x)
(4.14)
.
Clearly the set MW of minimum points of a convex function f is convex. If f is strictly convex on Rn, then M(f) is a singleton (or empty).
Necessary and sufficient conditions for a constrained minimum will be discussed in Chapter 16.
Theorem 4.9. If a point x is not a minimum point of a convex function f, then the direction AX)
g(x)
(4.15)
11 AX)11
where IIv(x)II
=
min vE f(x)
11v 11
,
Convex Functions
41
is the direction of steepest descent of the function f at the point x, i.e., min1
dg(x)
11g 11:5
dg
0
(MD
In the convex case the direction of steepest descent is unique (it will be shown below that this property does not necessarily hold for a nonconvex function).
Thus, in the convex case the subdifferential concept allows us:
0(D
.joy
1. to compute the directional derivative (see formula (4.8)); 2. to give necessary and sufficient\conditions for a minimum (see (4.14)); 3. to find the steepest descent direction (see (4.15)) if the point under consideration is not itself a minimum.
o'"
Note also that many numerical methods based on subdifferentials (or subgradients) have been developed.
°10
4.5. In many cases we do not need, or are not able, to compute the subdifferential or a subgradient. Therefore it is important to have some 0
approximation of the subdifferential.
Let f be convex on Rn. Fix x E Rn and e > 0 and set
fi(x) = {v E R" I f(z) -f(X) >_ (v,z-x)-E V Z ER"}
.
(4.16)
The set fix) thus defined is called the e-differential of the function f at the point x. Any element v E fi(x) is called an e-subgradient of f at x. The r-,
mapping J: [0,oo) x Rn -417(R") is called the E-subdifferential mapping.
Theorem 4.10. For any fixed x E R" and e > 0 the set fi(x) is nonempty, closed, convex and bounded.
Corollary 4.5. The mapping (ex) -4 fi(x) is bounded on any bounded set from [0,oo) x R" .
Definition. Let e ? 0, g E Rn , and let a function f be convex on Rn . The quantity
max
vE aef(x)
(v,g)
is called the e-derivative of the function f at the point x in the direction g.
(4.17)
Chapter 4
42
Theorem 4.11. The following relation holds: de.f(x) = dg
i of 1 ff(x+a8) -f(x) + E] .
a>0 a
(4.18)
A point X E Rn is called an c-stationary point of f on Rn if
0 E &f(X)
(4.19)
.
Theorem 4.12. Relation (4.19) is equivalent to 0
!5 f(x) - f* !5 e
9
(4.20)
where
f* = lrifnfly). ye R
Corollary 4.6. If for some X E Rn , .6 >0 and g E Rn we have
d,f (x) dg
0
then the funcrion f can be decreased in the direction g by not less than E. The direction ve(x) g,c (x)
-
II v x II '
where IIve(x)II =
min
IIvIi
VE def(X)
is called the c-steepest descent direction of the function f at the point x (and is, again, unique). The following property is most important.
Theorem 4.13. The e-subdifferential mapping (e, x) - d f (x) is H-continuous in .6 and x on (0,oo) x Rn.
Convex Functions
43
4.6. Let S c Rn be an open convex set and let Q c S be a convex set. Let a function f be defined and convex on S. Fix X E 12, e ? 0 and set m
dn}'(x) = {v E R" I f(Z) -,t'(x) >_ (v,z-x) V z E DI ff(x) _ {V E Rn I 1(z) - f (X) The sets o"(2f(x)
and dnf(x)
(V,Z-X)
,
- E d Z E D}
(4.21)
(4.22)
the conditional
are called (respectively) 0E.
subdifferential and the conditional E-subdifferential of the function f at the `CJ
,CD
point x ED with respect to the set D. An element v E f(x) is called a conditional subgradient, and an element v E dnf(x) is called a conditional e-subgradient of f at x with respect to D. These concepts are useful in studying constrained optimization problems.
Lemma 4.2. If X E int .fl, then for any foxed c >_ 0 the set do f(x) is nonempty, convex, closed and bounded. Moreover, if f is defined and convex on R", then o"f(x) = o'(x). Theorem 4.14. Let c >0, x ED. Then the following relation holds: d,12f(x) = cl BBf(x)
(4.23)
,
where Ble2f (X) = J V E R" 13Z E Sl: V E C2f (Z), f(z) - f (X)
(V,Z X) - E} .
Theorem 4.15. Let f = fl +f2, where functions fl and f2 are convex on an open set S C R" . Let .fl C S be a convex set and let c >_ 0. Then
0f(x)
=
1zo'
2
>_o
[d fi (x) + d 12f2(x)]
.
(4.24)
+
e1 +e2=e
Let x E 1, e > A, g E Rn,
II g II = 1. Set
ma x vE dnff(X)
(v, g)
do f (x) dg
sup (v,g) VEdff(x)
The quantity dnf(x)ldg is called the conditional derivative of the function f at the point x ED in the direction g with respect to the set .fl; the quantity dnf(x)ldg is called the conditional e-derivative of f at x in the direction g
Chapter 4
44
with respect to 12. Let
y(x) _ {V = 2(Z-x) I A > O, Z E D} , x E D . It is possible to show that a12
fcX
_
d-dga if g E cl r(x)
,
(4.25)
anf(x) dg
sup (v,g) _ V E dd2 f
RIB
Lemma 4.3. If X E 1, g E y(x), II g II = 1, then inf a>O
(x)
[f( x+ag) -f(x) + e]
(4.26))
x+agE £2
Corollary 4.7. For any e > 0 dg .f (x) dg
-
+C>O
Vg4y(x).
(4.27)
Lemma 4.4. Let a function f be convex on S and x c= .fl. Then O-Y(x) - I' (X)
--a f (X)
where T*
(x) = {w I (w,v) >_ 0 V v E AX)J . I'.
The cone I'*(x) is conjugate to the cone Ax) (see Chapter 1).
Theorem 4.16. Let a function f be convex on a convex set .fl. For a point x E S2 to be a minimum point off on .fl it is necessary and sufficient that 0 E dOf(x)
.
(4.28)
HIV
Let e > 0. A point x E S2 is called an e-stationary point of a convex function f on a convex set .fl if 0E
(4.29)
Convex Functions
45
Lemma 4.5. Relation (4.29) is equivalent to
.f(X) -}*
0
(4.30)
E,
where
PyE n Corollary 4.8. If 0 4 an f(x), then there exists a direction g E y(x) such that (4.31) and
i of
a>O
f(x+ag) < f(x) -.c
.
x+agE£2
Let X E 12, g(x) E Rn, II g(x) II = 1 be such that
d nf(x) d 8(x)
inf u g a= i
dg f (X) dg
<0
The direction g(x) is called the conditional E-steepest descent direction of the function f at the point x with respect to the set .fl.
Opt
Theorem 4.17. If .2 c Rn is a closed convex set, then the mapping d9 f is K-continuous (Kakutani continuous) in x and c on the set (o,oo) x 12 (i.e., it is both upper and lower continuous). v
Corollary 4.9. If X E int 1, c > 0, then the mapping dj f is H-continuous (Hausdorff continuous) at the point [e,x].
Chapter 5 SUBLINEAR FUNCTIONS
5.1. A very important subclass of convex functions
is
that of sublinear
`CC's"
functions, i.e., convex positively homogeneous functions. There are nu-
merous reasons for their importance, of which we shall mention only two. First, the derivative of a convex function is convex and positively homogeneous, i.e., sublinear; second,
sublinear functions
are convenient for
describing convex compact sets. Minkowski duality is used for this purpose (see Section 5.2).
Let K be a convex cone in the space Rn. A function p defined on K is called sublinear if it is subadditive, i.e.,
p(x+y) < p(x) + p(Y)
V x,y E K
and positively homogeneous, i.e.,
p(2x) = 2p(x) V X E K, VA > 0. We shall now summarize some elementary properties of sublinear functions:
1. A sublinear function is convex. This follows immediately from the definition: if x,y E K, a E [0,1], then p(ax+(1-a)y)
<_ p(ax) + p((1-a)y) = ap(x) + (1-a)p(y) .
0
2. A convex positively homogeneous function is subadditive (and therefore sublinear).
P r o o f. From the definition,
P(x+Y) = P (2 2 + 2) = 2p 3. The positive homogeneity of a sublinear function p implies that p(O) = 0. Indeed, p(0) = p(2.0) = 2p(0), and hence p(O) = 0. D
46
Sublinear Functions
47
CDR
A convex cone is the natural domain of a sublinear function, because only there does it make sense to consider the relations defining a sublinear function.
First of all, let us consider the case where this cone coincides with Rn, i.e.,
the function is defined on the entire space. In this case, zero is an
interior point of the domain. In what follows we shall assume that p is a sublinear function defined on Rn. Other properties of p include:
4. p(x) < p(x). Indeed, 0 = p(O) = p(x+(-x)) < p(x) + p(-x). 0 5. p is continuous and Lipschitz on the entire space Rn.
P r o o f. Continuity follows from Theorem 4.1. Since p is a convex function, it is Lipschitz on any bounded set. In particular, there exists an L such that A
Ip(x')-p(y')I < LIIx' y'II
Vx',y'E B,
where B is the unit ball. Now let x and y be arbitrary. Putting c = max { Iix II, IIy II }, we have
Therefore
m
xEB,
c
p
Y-EB.
c
5 Lug-Y1.
It follows from this inequality that
Ip(x)-p(y)I < Lllx - yll. o 6. p(O) = {v E Rn I (v,g) < p(g) V X E Rn }.
This property follows immediately from the definition and the equality p(O) = O. o 7. p'(O,g) = p(g). To be more precise, p '(O,g) =
1im 1 [p(O+ag) - p(O)]
a a9+ 0
a
p(g)
0
Chapter 5
48
8. The following basic equality holds:
p(x) =
max (v, x) vEdp(0)
.
This follows from Theorem 4.4 and properties 6 and 7 above. o
Thus, the set p(O) describes the behavior of the function p on the entire space, not only in a neighborhood of zero. (Recall that, in general, the subdifferential of a convex function represents the directional derivative, which describes only local properties of the function.) For this reason the set
dP(0) _ {v E R" I (v,x) <_ p(x) V x} is called the subdifferential of p and is denoted by o)i. We can now state the following theorem.
Theorem 5.1. Let p be a sublinear function defined on the space R. Then p(x)
= max
VEdp
(v, x)
.
We shall now describe the subdifferential fi(x) at an arbitrary point x.
Proposition 5.1. If p is defined and sublinear on Rn, then for any point x E Rn the following equality holds:
o p(x) _ {v E op I (v,x) = p(x) } . 0 0
Proof. 1.LetvE op. Then (v,y) < p(Y) V y E Rn If, in addition, (v, x) = p(x), then
(v,y x) < p(Y) - p(x) V Y E Rn i.e., v E
(x).
2.LetvE p(x), i.e., (v,yx) < p(Y) - p(x) V y E Rn
(5.1)
Substituting y = 0 and y = 2x in (5.1), we obtain
(v,x) ? p(x) and (v,x) < p(x) . Thus, (v,x) = p(x). It now follows from (5.1) that v Edp.
0
Sublinear Functions
49
The subdifferential dp of a sublinear function p is closely related to the set Jg r= Rn I p(g) :5 01
K
It follows from the sublinearity and continuity of p that the set K is a convex closed cone. This cone is not empty since it contains zero.
Proposition 5.2. Let p be a sublinear function defined on R" and let K = {g I p(g) S 0}. Then the cone K* conjugate to K coincides with the cone (-cl cone dam) which is the closed conical hull of the set (--o37).
Proof. Let v E (--o3q), i.e., -(v,g) :5 p(g) V g r= Rn .
Therefore, if g E K then
(v>8) ? P(8)
0
i.e., v E K*. Hence (-dp) c K*. Since K* is a closed convex cone, we have
c K*
cl cone (-o
r-,
Check the reverse inclusion. Let v 4 [-cl cone obi], or equivalently,
(-v) 4 cl cone dp . Since cl cone dp is a convex closed cone, the element (-v) can be separated from the cone cl cone dam, i.e., there exists a g E Rn such that
-(v,g) > 0
=
(w,g) max we cl cone dp
.
(5.2)
Relation (5.2) implies the inequality and hence
0 t max (w,g) wedp
p(g)
Since p(g) <_ 0, we have g E K. On the other hand, it follows from (5.2) that A
(V9g) < 0 ,...
;.p
i.e., v 4 K*. Thus, if v 4 (-cl cone dpi), then v 4 K*. This proves the inclusion K* c cl cone (-dam). 0
Chapter 5
50
Corollary 5.1. If 0 4 dp, then K* = cone (-gyp).
P r o o f. It is sufficient to show that cone (--dp) is closed. To do this, we shall prove the following more general fact:
If U is a convex compact set not containing zero, then A = cone U is closed. Let vk -4 v, vk E A V k. Then Vk = 2kwk, where 2k >_ 4, Wk E U.
Without loss of generality, we can assume that the limit lim Wk = w exists. Thus w E U (since U is compact). Since 0 4 U, we have w # 0. Therefore the sequence {2k } is bounded (otherwise
II vk II
-4 +oo). We can also assume
BCD
that 2k -4 A >_ 0. It follows from the equality v = 2w that v E A, and hence the cone A is closed. 0
Note that if 0 E U, then the conical hull cone U is not necessarily 0
closed.
Note also that the relation 0 4 )7 holds if and only if there exists an element g such that p(g) < 0. Indeed, if such an element exists, then the relation
p(g) = max (v,g) < 0 V dp
,-+
implies that 0 4 dp.
Arguing from the other direction, if 0 4 p, then it follows from the 4-4
separation theorem that there exists an element g such that
p(g) = max (v,g) < (v,4) = 0 . V
dp
IoW
r..
5.2. Let M be the family of all convex compact subsets of the space Rn, and let P be the family of all sublinear functions defined on Rn. Theorem 5.1
shows that for every p E P there exists an element U E M (viz. U = p) such that
p(x) = max (v,x).
(5.3)
Take an arbitrary compact set U E M and define the function
pu(x) = max (v,x). The function pU defined on Rn by formula (5.4)
(5.4)
is
called the support
Sublinear Functions
51
function of the compact U. It is easy to verify that this function is sublinear. Indeed, pu (xl +x2)
= m a x [(v,xl) + (v,x2 )] VEU
Cue
< m a x (v,xl) + m a x (v,x2) = pu(xi)+pu(x2), vEU
VEU
pU (2x) = m a x (v,2x) _ Amax (v,x) _ 2p(x) V2 > o. VEU
VEU
Hence, pu E P. What can be said about the subdifferential dpu of the function pU ? Proposition 5.3. The following equality holds:
dPa
= u.
P r o o f. For any fixed v E U it follows from (5.4) that (v,x)
PU (x)
VXEX
.
Hence, v E 39 and therefore U c off.
Let us now assume that there exists an e E such that e 4 U. Since U is a convex compact set, it follows from the separation theorem that there must exist an element x such that
(v,x) > m a x (w,x) = p(x)
,
which contradicts the inclusion v E off. This contradiction completes the proof.
0
C4"
pU.
044
0,,
Consider a mapping 0: P -) M which associates every subdifferentiable function p with its subdifferential off. This mapping operates on the entire set M: every compact set U c M is the subdifferential of some sublinear funcIn addition, it follows from Proposition 5.3 that 0 is a one-totion one mapping. The mapping thus defined is called Minkowski duality. This mapping makes it possible to study convex compact sets using the properties of sublinear functions, and conversely, to study sublinear functions s..
using the properties of convex compact sets. We shall now summarize some of the properties of support functions and subdifferentials which are related to algebraic operations.
Chapter 5
52
The support function of the sum of sets is equal to the sum of
1.
the support functions: PU1+U2 = PU1 +PU2 .
Proof. PU 1 + U 2 (x)
=
max [(v1,x) + (v2 ,x)]
(v,x) =
max
v1 E U 1 V2 E U2
vE U 1 +U2
max (v1 , x) + max (v2 , x) = pU l (x) + pU 2 (x) , p
V1 EU1
V2 EU2
This property can be extended to an arbitrary number of summands by induction.
2. The subdifferential of the sum of sublinear functions is equal to the ^`.
sum of subdifferentials of these functions. ran
This property follows immediately from Theorem 4.5, but we shall give another proof which makes use of Minkowski duality. Without loss of generality, we can consider the case of only two functions. Let pt , p2 E P. The subdifferentials of these functions dal and 0)72 are
convex and compact, and therefore their sum U =da + dp2 is also convex and compact. It follows from property 1 that
pU (x) = m a x (v,x) = V EU
max (v1,x) +
v1 EaPI
max (v2,x) = P1 (x) + p2 (x)
V2 EaP2
Thus, dpU = a(pl +p2 ). On the other hand, by virtue of Proposition 5.3, PU = U = d-P I + -0392
0
3. If A > 0, U E M, then piU = 2pu
IfA>O,pE P, then d(A ,)=Ads I
IV
This property can be verified directly. o r-,
4. Let U = co [ U1 U U2 ] =
u [aU1 +(1-a) U2 ] . Then 0<_a<_1
pU (x)
= max
{pU 1(x),pU 2 (x)}
.
Sublinear Functions
53
P r o o f. First of all note that U is a convex compact set. It follows from Properties 1 and 3 that pU (x) =
max max (v, x) 05 a51 v [aUl +(1-a)U2] max paU,+(1-a)U2(x)
05 a51
max [apU 1(x) + (l-a)pU (x)] 2
05 a51
.
If pU,l (x) >_ pu 2 (x), then the last maximum above is achieved if a = 1 and coincides with pU , (x). If pU 2 (x) >_ pU 1(x), then this maximum coincides with pU 2 (x). 0
By induction, it is possible to show that if U = co pU (x)
= max
fpU 1(x),...,pU
m
u Uj, then
ie1:m
(x)} .
5. Let p(x) = max {p1(X),...,P(X)}. Then
o5 = co u dpi . i=1:m Proof. LetUi=o i V i E 1:m,and
U=co u Ui U. i 1:m
Then U is a convex compact set and it follows from property 4 that max {pu (x),...,PU m W1
Pu(x)
1
= max {pl (x),..., pm (x)} .
Thus pu = p and therefore U = o U = ff.
El
6. Let p 1(x) >_ p2 (x) V x E Rn. Then Pi
P2
This property follows immediately from the definition of the subdifferential. 0
Chapter 5
54
7. Let Ul and U2 be convex compact sets. If Ul D U2, then PU 1(x) >_ pU 2(x)
V x E Rn
.
This follows from the definition of the support function. o
Finally, we have one more relation between a convex compact set U and its support function pU.
8. Let lip II = max p(x), where p is a sublinear function. Clearly Ilx II < 1
p(x) 5
lip 11
V x E R"
Ilx 11
.
then
=CD
It is easy to see that if II U II = p(U,O), where p is the Hausdorff metric, and II U II = II pU II .
In fact,
max max (1,x)
= max pU (x)
IIpU II
Ilxll<_1 AEU
I1x11:5 1
max max (1,x) = max 11111 zeU
We shall show that
lip II
=
IIUII
eeU
Ilx 11 :5 1
coincides with the smallest Lipschitz constant for
the sublinear function p. First let us check that lip II is a Lipschitz constant. Let x,y E Rn. We have PAY + (x-Y)) :5 P(Y) + P(X-Y)
and therefore
p(x) - p(y) 5 p(x y) 5
Ilp II
Ilx y II
Similarly
p(y) - p(x) 5
Ily x 11
lip II
.
It now follows that O(X) - P(Y)l
i.e.,
lip 11
is a Lipschitz constant.
:5
lip 11
-
llX-Y 11 ,
Sublinear Functions
55
Setting y = 0 in the last inequality, we get
O(x)) 5
Ilp II
IIX II
V X.
Hence 11P 11
> max OWI IIX its i
On the other hand,
= max p(x)
Ilp II
max IuXNSi
IIX osi
Therefore Ilp II
= max
(p(x)I
IIX os i
Let L be any Lipschitz constant for the function p. Then
IP(x)I = Ip(x) - p(0)j 5 Lllxll and therefore Ilp 11 5 L. This completes the proof. D
Essentially we have proved the following:
Theorem 5.2. (i) Let U1,...,Um E M. Then P
i 1:m
U
71
.(x) = I PU i(x) , :
i
(x) = max {Pu1 (X),..., pU m (x)}
.
If A > 0, U E M, then pi U (x) _ 2pu (x) V x E Rn I f Ul D U 2, then pU 1(A) >_ pU 2 (x)
(ii) Let p, ,...pm E P. Then IQ,
d(I ppi) d max pi i l:m
Io = co u
39i
V x E Rn
Chapter 5
56
Here (max pt)(x) = max pt (x). (iii) Let U E M. Then the quantity IIUII
= max I1.ell .AEU
coincides with the quantity II pU II
= max pU (x) , iiX 11s i
which is the smallest Lipschitz constant for the function pU.
(iv) If .Z > 0, p E P, then d(Ap) _ Adp.
If Pi W > P2 (x) V x E P, then dal DO192 We shall now prove the following proposition, using Minkowski duality. Proposition 5.4. Let U, V, W be convex compact sets such that
U+W = V+ W. Then
U=V.
P r o o f. The support functions pl and p2 of the compact sets U + W and V + W coincide. But
pi = pU+w = pU + pw P2 = pv + w = pv + pw It is clear that pU = Pv and therefore U = V.
0
jai
5.3. Now let us study sublinear functions defined on some convex cone K in the space Rn. Such functions can, generally speaking, be discontinuous on the boundary of K. In what follows we shall consider only lower semicontinuous (l.s.c.) sublinear functions.
Proposition 5.5. Let p be a l.s.c. sublinear function defined on a convex cone K c R" . Then its epigraph
epi p = {(x,A) E K x RI is a closed convex cone.
I p(x) >_ Al
Sublinear Functions
57
P r o o f. The fact that epi p is a convex cone follows immediately from the sublinearity of p. We must prove that this cone is closed. Let sequences {xk} and {Ak } be such that
(xk,Ak) E epi p
dk,
xk-X, 2k-4A. Since p is l.s.c., we have P(x)
<_
lim p(xk) S lim 2k = A
,
i.e., (x,2) E epi p. o Remark 5.1. It is possible to show that the converse is also true: if a function p is such that epi p is a convex closed cone then p is sublinear and a4-.
I.S.C.
The following definition is a slight generalization of the one for p defined over the entire space: The set
d p= {v E R" I (v,x) <_ p(x) V x E K} is
called the subdifferential of a l.s.c. sublinear function p defined on a
convex cone K.
Theorem 5.3. Let p be a l.s.c. sublinear function defined on a convex cone K. Then the set d p is nonempty and
p(x) =
sup (v,x) V x E K . vE aKP
P r o o f. Let Z = epi p be the epigraph of a function p. It follows from Proposition 5.5 that Z s a convex closed cone. Let y E K, .6 > 0. Since the point (y, p(y)-e) does not belong to Z, we can use the separation theorem to find a nonzero vector (v,c) E Rn x R1 such that (5.5)
(V,Y) + C(P(Y) -C) > 0
(5.6)
+
(v,x)+c2 <_ 0 V (xA E Z, .
Substitute x = y, A = p(y) in (5.5). Comparing the obtained inequality with (5.6), we conclude that c < 0. Therefore (5.5) can be rewritten in the form
-1 (v,x) <2 VxEKV2? x.
000
Chapter 5
58
For v'=- 1c v, 2= P(x) we have (v', x) < p(x) V x E K so that v' E d p and therefore d p is not empty. It follows from (5.6) that
(v',y) >
P(y)-e.
This inequality holds for any e > 0 and y E K. Therefore
p(x) < (v',x) <
sup (v,x) V x E K
v dKP
The reverse inequality follows from the definition ofd p.
.
LI
Remark 5.2. The equality
p(x) =
max (v,x) V e ax P
may not be true for some x E K. Let us consider sets which are subdifferentials of l.s.c. sublinear functions defined on a cone K.
A set A c Rn is said to be stable with respect to a cone K (or K-stable) if any inclusion x E A also implies that
x + y E A Vy E K. In other words, if x E A, then the cone x+K with vertex x belongs to A. Formally this means that A+K c A. However, since A+K always contains A (because 0 E K), the set A is K-stable if and only if entirely
A+K=A. Let us consider two elementary examples.
Example 5.1. The epigraph
epi f = {(x,A) E Rn x R1
I A >_ f(x)}
of a function f is stable with respect to the cone
K = J(OA) E Rn x R1
> 01
Example 5.2. Let us consider sets
VI
X(1) < p, X(2) < p, X(I)X(2) > 11 VI
A2 = {x = (X(1 ),x(2 )) E R2
Al
Al
Al = {x = (xcl),xc2>) E R2 xcl> > p, xc2> > p, x(1)x(2) > 1}
Sublinear Functions
59
The set Al is stable with respect to the cone R2 _ {x = (X(1)9X(2)) I xt l) >_ 0, x(2) >_ o}
The set A2 is stable with respect to the cone R2 = R2. Let p be a l.s.c. sublinear function defined on a convex cone K. It is not difficult to check that the subdifferential d p of this function is a closed convex set.
Proposition 5.6. The set d p is stable with respect to the cone (K*), where K* is the cone conjugate to K.
P r o o f. Let V E d p, v' E (K*). Then (v, x) < p(x) ,
(v', x) < 0
VXEK
.
Therefore (v+v',x) < p(x), i.e., v+v' Ed p. Finally, we conclude from Theorem 5.3 that
sup (v,x) = p(x) < +oo . 0
vEdKP
Proposition 5.7. Let K be a convex closed cone and let a set U c Rn be convex, closed, stable with respect to the cone (K*) and such that
sup (v,x) < 00 d x E K
.
(5.7)
vE U
Then U is the subdifferential of the l.s.c. sublinear function p defined on K by the equality
p(x) = sup (v, x)
.
(5.8)
vE U
P r o o f. Let us first check that formula (5.8) describes a l.s.c. sublinear function defined on K. It follows from (5.7) that the definition of p is consistent. Let a sequence {xk } be such that xk -4 x, xk, x E K. Let us show that P(X) :5 liM P(Xk)
Fix e > 0 and find v E U such that
Chapter 5
60
Since p(xk) >_ (v, xk) V k, we have
lim p(xk)
>_
lim (V,xk) _ (V,Xk) > p(x) - e
.
Since e is arbitrary positive, we have lim p(xk) >_ p(x)
.
Hence, p is a l.s.c. function. The relations Coo
p(x1 +x2) = sup (v, x1 +x2) <_ sup (v, x1) + sup (v, x2 ) VEU
VEU
VEU
= p(x1) + P(X2)
p(2x) = sup (v,2x) _ A s up (v, x) _ 2p(x) V A > 0 VEU
VEU
imply that p is sublinear. Let V E U. It follows from (5.8) that (v, x)
<_ p(x) V X E K , 7T'
i.e., v E dKp. Hence, U c d p. Let us now check the reverse inclusion. Assuming this inclusion does not hold, let us find an element v E dK p such that v 4 U. Since the set U is convex and closed, the inequality
w(v, x) > sup (w, x)
(5.9)
WE U
holds for some X E R'z. Let us now prove that X E K. Since the set U is (K*)-stable, we have
(w-K*) c U for any W E U. It follows from (5.9) that
(v,x) >
sup
V'E [w-K*]
sup
W'E (-K*)
(v',x) =
sup
w'E [-K*]
(w-w',x)
[(w, x) - (w', x)] _ (w, x) -- i n f (w', x) w'E K*
Sublinear Functions
61
These relations imply that
in f (w', x) > (w, x) - (v, x)
WEK*
It follows from these inequalities that
>_ 0 V W' E K*
(w', x)
(5.10)
.
Indeed, if this is not the case, then there exists a w" E K* such that
(W",x) < 0 and therefore, since K* is a cone, we have
i n f (w', x) < i n f (2w ", x) _ -0 R>0
W'E K*
8m?
Inequality (5.10) indicates that x E K**. But the cone K is closed so that K** =KandhencexE K.
III
Now we can rewrite (5.9) in the form (v,x) > p(x), which is impossible since v EdX p. This contradiction shows that u = dK p. 0 is not difficult
to show that an
x=0
=w,
'CA
Remark 5.3. Using Proposition 5.7 it
analogue of Theorem 5.2 holds for the subdifferentials of sublinear functions defined on a cone. Note the following property, which will be used later on: If p, and P2 are l .s.c. sublinear functions defined on a cone K, then d K(PI+P2)
_K P2)
Cl
Proposition' 5.8. Let p be a l.s.c. sublinear function defined on a convex closed cone K. Then P(x) , x E K ,
sup
(v, x)
VEdKP
+oo
,
xcL K
.
'0"a
Q,0
Ado
P r o o f. It is only necessary to consider the case x 4 K. Since K is a convex closed cone, we can use the separation theorem to find an element W E K*
such that (w,x) < 0. Let V E dK p. Since the set dK p is (-K*)-stable, we have
(v 2w) E dK P
V A> 0
.
62
Chapter 5
Therefore
sup (v,x) VE
>_
dKp
sup
v'=v-2w
(v',x) _ (v,x) + sup 2(-w,x) = +00
.
R>0
0
R>_0
BCD
USU.
,per
Let p be a sublinear function defined on the space Rn, and let K be a convex closed cone in Rn. The function PK defined on the cone K by the equality
PK W = P(x) is called the contraction of the function p to the cone K.
The following property of dKPK is a very useful technical tool for studying convex sets contained in a cone, and will be widely used later on.
Proposition 5.9. The subdifferential dKPK of the contraction of a sublinear function p defined on the whole space R" to a convex closed cone K is given by the formula c4'
d pK =
Proof. Let U =
p-K*
.
- K*. The set U is convex since it is the sum of Let a se-
convex sets p and (K*). Let us prove that this set is closed.
may
quence {vi } be such that vi -- v, vi E U V i. Any vi can be represented in the form vi = v + v Ed , E (K*). Since the set p is comCAI
pact, we can assume that I i m v = v' and v' E. Since vi' = vi - vj, the 1-400 sequence {v"} also converges to some element v" E (-K*). Thus v = v' + v" E (off-K*), which implies that U is closed.
The set U is stable with respect to the cone (K*). Indeed, let v E U, v = v' + v", where v' E d5 v" E (K*). Take any W E (-K*). Then
v + w = (v'+v") + w = v' + (v"+w) E (-K*) since v"+w E (K*). Finally, if x E K, then s u p (v, x) =
s ud
vE U
V'E
[(v', x)+(v ", x)l
V"E (- K*)
orb
= sup (v',x) + V'E dp
= p(x) < +0 .
sup
V"E (-K*)
(v",x) = sup (v',x) v'E dp
Sublinear Functions
63
Thus, all the assumptions of Proposition 5.7 are satisfied for the set U. Therefore U is the subdifferential of the function x
(v,x) vE U
VxE K,
which coincides with PK . 0
a.,
Remark 5.4. Proposition 5.9 can easily be proved by means of theorems on l.s.c. convex functions which take infinite values (see, e.g., [79,136]). But the proofs of these theorems are rather complicated, and therefore we shall not use them here.
Remark S.S. Most of results in this section follow from the theorems in Sec,,,
tion 4.6 given therein without proofs. 0
a".
5.4. A function q defined on a convex cone K in the space Rn is called superlinear if the function p = (-q) is sublinear. In other words, q is superlinear if
q(2x) = 2q(x) V X E K, VA >_ 0, q(x1 +x2) >_ q(x1) + q(x2)
V x1, x2 E K .
ten.
All of the above results concerning sublinear functions can be reformulated (with appropriate modifications) for the superlinear case. We shall not state 0
all these results here, but rather give only the definition of the superdifferential. We shall confine ourselves to the case of a superlinear function q defined on the whole space. In this case,
q(x) = min (w,x), W Edq
where dq is a convex compact set called the superdifferential of the function q and is defined by the equality
Let U be a convex compact set in the space Rn. In addition to the support function
pu(x) = m a x (v, x) , VEU
we can also introduce the function
qU (x) = min (w, x) . WE U
Chapter 5
64
The function pU is sublinear, the function qU is superlinear and dU = dqu = U. Note that
qU (x) = min (v, x) _ - m a x (-v, x) VEU
VEU
ma x (w, x) WE [- U]
P- v (x) _ -Pa (x) Alternatively qu (x)
min (w, x) _ - m a x (v, x) WEU
vEU
-PU(x)
Conversely, if
q(x) = -p(x) V X E Rn , where q is a superlinear function and p is sublinear, then o p = dq.
Chapter 6 THE SPACE OF CONVEX SETS
6.1. Let M be the family of all convex compact subsets of the n-dimensional space Rn. We can introduce the operations of addition and multiplication by a nonnegative number in a natural way (see Chapter 1): Let A c Rn and B c Rn be convex sets and let c >_ 0. Then
A+B = {xIx=y+z, YEA, zE B}, M = JXIX=Cy, YEAJ. It is well known that these operations have the following properties: if A, B, C cM,c1 >_0,c2 _ 0, then
(A+B)+C = A+(B+C),
A+B = B+A, c(A+B) = cA+cB, (Cl +c2)A = c1A+c2A,
(6.1)
(Cl c2 )A = c1(c2A)
=A, 0 A = {0} . All these properties are easily proved. Let us prove, for example, the equality
65
Chapter 6
66
Here it is assumed that c1,c2 ? 0. If c1 = c2 = 0, then the property is obvious. We shall therefore assume that c1 + c2 > 0. Let x c (Cl +c2 )A. Then x = (Cl +c2 )y, where y E A, and therefore
x = (C1y+C2y) E c1A +c2A . This proves the inclusion
(Cl +c2)A c [c1A + c2A] . To check the reverse inclusion, let us take X E [c1 A+ c2A], i.e., x = c1 x1 +C2 x2 , where x1 E A, X2 E A. We have C1 X1 + C2 X2 = (Cl +C2)
C2 C
-Cl -FC2
xl + Cl -1-C2 x2J 0
and since A is a convex set, the term in the square brackets belongs to A, so that X E (Cl +c2 )A. 0 0.O
Remark 6.1. The equality cA = {x I x = cy, y E Al allows us to define the multiplication of elements of the set M by any real number. But in this case some of the properties mentioned above no longer hold. For example, let B be the unit ball of the space R'z. Then -B = (-1)B = B and therefore 4-+
B+B = B + B = 2B. Assuming at the same time that (6.1) is valid, we get
B + B = (-1+ 1)B = O -B = {0}
.
The operation of addition in M has the property of contraction. This means that if A, B, C EM, then the equality
A+C = B+C implies that A = B. This property was proved in Chapter 5 (see Proposition 5.4).
H.6
6.2. The set M is not a linear space (multiplication by (-1) is not allowed .,..,
and therefore we cannot subtract). This is not very convenient, and therefore we have to consider whether it is possible to extend this set in such a way that subtraction is always feasible. Mathematicians have faced many analo.,.,
gous problems; to solve them a special algebraic technique has been develop-
The Space of Convex Sets
67
ed involving the imbedding of a semigroup with contraction into a group. We shall illustrate the idea with a simple example.
Suppose that (a) we are already familiar with the concepts of a natural number and of zero, (b) we know how to perform algebraic operations and
(c) we are aware of the fact that the operation of subtraction on the set N = J09 1,2,...l is not always feasible. We want to extend this set to make subtraction possible.
Consider all ordered pairs [a,b], where a,b E N, i.e. the direct product N x N. Let us define the operations of addition and multiplication by a natural number and zero in N x N:
[a,b] + [c,d] = [a+c,b+d]
,
r-,
p[a,b] = [pa,pb] V P EN . Let us also introduce the equivalence relation =:
a+d=b+c.
[a,b] = [c,d] ate.
r-,
A pair (x,y) such that [a,b] + [x,y] = [c,d] is called the difference of pairs [a,b] and [c,d].
The operation of subtraction on the set of pairs is always feasible. For example, we can take [x,y] = [b+c,a+d]. One inconvenience inherent in such
a definition is the nonuniqueness caused by the use of equivalence rather than equality. In addition, it is necessary to link the new objects with the original set N. (IQ
To do this, let us factorize the set of pairs so that a class of equivalent pairs will be referred to as a single element. We shall denote the set of all classes of equivalent pairs by Z. Let a,,8 E Z. How should we define their sum? It is easy to check that if [a1,a2 ] and [a1 ,a2 ] are two pairs from the class a and [b1,b2 ] and [b1 ,b2 ] are two pairs from the class f3, then their sums [a1,a2 ] + [b1,b2 ] and [al ,a2 ] + [bl ,b2 ]
Rte
are equivalent pairs. This means that the sum (up to the equivalence relation) is completely defined by classes a and f3 and does not depend on specific repre-
sentatives of these classes. The class a + 6 to which the sums of pairs from a and f3 belong is therefore called the sum of classes a and f3.
`-'
ran
s.,
5'O
We define the multiplication of a class by a natural number and by zero in an analogous way. The operations thus defined have all the usual properties. Thus, for any two elements a,,6 E Z we define the difference of a and f3 as the class which contains the pair [b+c,a+d], where [a,b] E a, [c,d] E f3.
Chapter 6
68
0
0
The role of zero is played by the class which contains all pairs of the form [a,a]. Let us denote this class by 0. Let aQ denote the class which contains the pair [a,O]. It is clear that aQ = ab if and only if a = b, aQ + ab = as + b, ap.Q = pa, Thus, algebraic operations over classes of the form as are complete-
ly defined by operations over natural numbers. This allows us to identify a class aQ and a number a, and therefore to consider the set N as being imbedded in Z.
d..
Special attention should be paid to the class a containing the pair [O,a]. This class is the inverse of the class aQ : aQ + aQ = 0.
Elements of the set Z are called integers: the integer as containing the v)'
Coca
pair [O,a] is called the number -a (here a E N).
It is obvious that the definition of negative numbers is rather complicated. However we just forget their complicated origin when operating with them, making use only of the very fact of their existence and their properties.
An analogous construction with appropriate modifications is used to define the difference of convex compacts. Since the concept of numbers (including negative ones) is assumed to be familiar, we shall immediately 4-+
define the multiplication of pairs of sets by an,y real number. Let us first consider the direct product M x M, i.e., the family of pairs of the form [A,B], where A E M, BEM. We shall define the algebraic operations of addition and multiplication by a real number, using the following relations: [A 1,B 1 ] + [A2 ,B2 ] = [A 1 +A2 ,B 1 +B2 ]
[cA,cB]
ifc >_ 0
,
P-4
ifc <_ 0
.
(6.2)
,
c[A,B] =
(6.3)
[cB,cA]
Here A,A1,A2,B,B1,B2 E M. In particular, (-1)[A,B] _ [-B, A]. The set ell
M X M does not yet represent a linear space. To obtain a linear space, let us introduce the equivalence relation - into M x M. We say that the pairs [A1 ,B 1 ] and [A2 ,B2 ] are equivalent if A,-B2 = A2 B 1. By M we denote the set of all
equivalent classes of the set M x M. Let a,/3 E M, pairs of sets [A1 ,B1 ] and [A2 ,B2 ] belong to a, and pairs [A1 ,B1 ] and [A2 ,B2 ] belong to /3. It is easy to check that pairs [A1 +A 1,B 1 +B i ] and [A2 +A2 ,B2 +B2'2 ] are equivalent,
i.e., belong to the same class, say y. This enables us to define the sum of classes a and 8 as follows: a + 8 = y, where y is the class containing 4-'
r-,
the pair [A+A',B+B']. (Here [A,B] is a pair from a and [A',B'] is a pair from /3.) This definition is consistent, since the class y does not depend on the choice of pairs from classes a and /3, but only on these classes themselves.
The Space of Convex Sets
69
We may define the product of a class a and a real number c in an +.+
analogous way. By ca we denote the class which contains the pair c[A,B], where [A,B] is any pair from the class a. (CD
0
6...
A),
It is now possible to show that the set M with the algebraic operations defined above represents a linear space. The equivalence class containing the element [0,0] (i.e., the family of all pairs of the form [A, A], where A E M)
plays the role of zero. A proof of the linearity of M is given below in 0
Section 6.3.
CDaaC
Let us consider a convex compact set A. Suppose we want to find the class aA which contains the pair [A,0]. It is possible to show (see Section 6.3) that: (a) if A # B, then aA # aB ; (b) the sum of sets is associated with the sum of the corresponding classes; (c) the multiplication of a set A by a number c >_ 0 is associated with the class acA . Identifying the set A with the class aA, we can consider the family of all convex compact sets M as being
imbedded in the linear space M; to be more precise, M is a convex cone in M. Furthermore, M is a reproducing cone, i.e., every element of M can be represented as the difference of two elements of M. Indeed, let a EM and let the class a contain a pair [A,B]. Representing [A,B] in the form
[A,B] = [A,0] + [0,B] = [A,0] - [-B90] and using the definition of algebraic operations in M, we find that a = aA - a- B ,
i.e., a E M - M. Thus, M is a linear space containing M as a convex reproducing cone. For this reason the space M is called the space of convex sets (or, more precisely, convex compact sets).
ten
6.3. At first glance the construction above looks rather complicated and unnatural. But in fact the opposite is the case. This will become clear from P".
the functional realization of the space M, which is given below.
'=0
Let P be the family of all sublinear functions defined on Rn. If a funcis contained in M. Note that every convex compact A is the subdifferential of some function PA E P
tion p belongs to P, then its subdifferential
-this is the support function of the set A and is defined by PA (x) = m a x (v,x) v EA
V X E Rn
Let us also consider the family Q of all superlinear functions defined on Rn. The superdifferential dq of every function q E Q belongs to M; if AE M, then A = ?qA for the function qA E Q defined by the equality qA (x) = m i n (w,x) V X E Rn wEA
Chapter 6
70
Every set A E M0" is associated with a sublinear function pA and a super0
linear function qA. Since
max(v,x) _ -min(-(v,x)), v EA
V EA
we have (see Section 5.4)
PA W = -qA (x) ;
-4(- A) W
PA W
Let us consider the set L = P + Q of all functions which can be represented as the sum of a sublinear function and a superlinear function. It is not difficult to show that this set is a linear space. We shall prove only that if e E L, then-,e E L. Indeed, let e = p + q, where p E P, q E Q. Then .I = (-p) + (-q). Since (p) E Q, (-q) E P, we must have (.e) E L. We shall now describe the relations between L and M. Consider a pair [A,B] which belongs to M x M. Put
4A,B1(X) = max (v, x) + m i n (w, x) . VEA
WEB
It is clear that 1[A ,B ] EL and (1[A1,B1] + I[A2,B21 )(x)
gee
= max (v, x) + min (w, x) + max (v, x) + min (w, x) V EA1
max (v,x) + a>4
=
wEBI
vE [AI+A2]
v EA2
wEB2
min (w,x) _ 'e[A I+A2B I+B2I(x)
WE [BI+B2]
_ ZIAI)BII+[A2,B2](x)
Thus, every sum of pairs is associated with a function from L, which is equal to the sum of the corresponding functions. For c >_ 0 we have CI[A,B ] (x)
= c (m a x(v,x) + m i n(w,x)) = v EA
_ 'e[cA,cB ](x) _
wEB
c[A,B ](x)
max (v,x) + min (w,x)
v EcA
wEcB
The Space of Convex Sets
71
`k'
-Icl(m a x(v,x) + m i n(w,x)) WEB
V EA
(max (v, x)
+
vE I CIA
min (w, x) WE 1cJB
min (-v, x) +
max (-w, x) WE I cIB
VE JcIA
= min (v, x) + max (w, x) = vEcA
[cB ,cA]
(x) = 'ec [ A , B ] (x}
Thus the operation of multiplying a pair by a number (see formulas (6.3)) is associated with the multiplication of the corresponding function by the corresponding number. Let pairs [A 1,B 1 ] and [A2 ,B2 ] be equivalent, i.e., A 1-B2 = A2-B1. Then
max(v,x) + max
v e Al
WE (-B2)
(w,x) = max(v,x) +
max (w,x) we (-B1)
v E A2
.
Therefore
Z[A 1,B 1 ] (x) = max (v, x) + m in (w, x) V E Al
we B1
max (v, x) -
max (w, x) WE(-B1)
vEAl
(6.4)
max (v, x) v e A2
max
WE (-B2)
(w, x)
= max (v,x) + m i n (w, x) _ l[A 2,B 2 ] (x) . v EA2
m
WEB2
Thus, the equivalence of pairs implies equality of the corresponding functions. It follows from (6.4) that the converse is also true: If 1[A 1,B 1 ] = .e[A 2,B 21
,
then [A1,B1 ]
' [A2,B2]
''-h
Since elements of the space of convex sets are classes of equivalent pairs, it follows from the above argument that these elements are completely defined by functions from L. To be more precise, the mapping yf: M - L defined by
Chapter 6
72
the equality
where
1(x) = max (v, x) + m in (w, x) wEB
V EA
4-4
and (A,B) is an arbitrary pair from the class a, is a one-to-one mapping on the entire space L. Note that every pair (A,B) belonging to the class a is associated with a particular decomposition of the function e into the sum of a sublinear function p and a superlinear function q (namely, p = PA, q = qB ); conversely, every representation e = p + q is associated with a particular pair (A = dp, B = dq) from the class a. As shown above, algebraic operations over pairs from a (and hence over
classes of equivalent pairs) are associated with the corresponding algebraic operations over functions from L. In other words,
VI(al+a2) = Kal) + V02) 9
:tea
Since the set L is a linear space and yi is a one-to-one correspondence M - L, Q''
it is clear that M is also a linear space. Furthermore, yi can be regarded as f-+
an isomorphism of linear spaces M and L. Let A E M and let aA be the class from M which contains the pair [A,0]. Then where
1(x) = max (v, x)
.
v EA
Thus, the isomorphism yi associates every class aA with the support function PA of the set A. Identifying, as well as above, the class aA with the set A means that we have indeed identified this set and its support function.
Let us summarize the results. Any function e which can be represented as the sum of a sublinear function and a superlinear function can be associated with a class of equivalent pairs of sets. Furthermore, algebraic operations over functions are associated with algebraic operations over the corresponding classes."', The family of all equivalence classes is a linear space which contains the set M of all convex compact sets as a convex reproducing cone.
The Space of Convex Sets
73
6.4. We shall now discuss some properties of the space L and M which have to do with ordering relations. N
Definition 6.1. A set Z is called ordered if an ordering relation is introduced into Z. This is a relation >_ with the following properties: if z1 >_ z2, z2 z3, y"'
Nor
then z1 >_ z3; the relations z1 >_ z2 and z2 >_ z1 are satisfied if and only if z1 =
Z2-
o°'
A set Z is called an ordered linear space if it is both a linear space and an ordered set, and its linear operations and ordering relation are consistent in the following sense:
thenz1 +z _ Z2+z V Z E Z
and CZ1
CZ2
b4+
if Z1 _ Z2,
V C>O.
If Z is an ordered linear space, then the set K = {Z EzZ>_I 0} is a convex cone which does not contain straight lines. This cone is usually called the cone of positive elements. gyp
Conversely, if we choose some convex cone K which does not contain straight lines from a linear space Z, then Z becomes an ordered linear space. An ordering relation can be introduced as follows: z1 Z2 if and only if
Z1-Z2EK. +.+
Nor
Now let us consider the space L. An ordering relation can be introduced in a natural way: the relation el >_ 12 implies that el (x) >_ .e2(x) V X E Rn. Clearly, this relation transforms L into an ordered linear space. The cone K -44,
of positive elements consists of functions e which are nonnegative on Rn. r1'
Our next aim is to show that L is a lattice. First we give the necessary definitions.
coo
N
Let Z be an ordered set. A subset U of this set is said to be upper
(lower) bounded if for some Z E Z the following inequality holds: x<_ z
(x>_ Z)
VXE U.
X00
The element z is called an upper (lower) bound of the set U. Let U be an upper (lower) bounded set. If the family of all upper (lower) bounds of the set U contains a smallest (largest) element, this is called the supremum (infimum) of the set U and is denoted by sup U (inf U). Sometimes we use the term `the exact upper (lower) bound' of U instead of the term `supremum' ('infimum'). In addition to sup {ai ,...,an }, sup ay, the notation a 1 v a2 v yE r
,..
v an' yE r ay is also used. The forms inf {a,b} and a A b are equivalent.
A set Z is called a lattice if every finite subset of Z has its own supremum
and infimum. Clearly, if a supremum and an infimum exist for all two-element subsets of Z, then Z is a lattice.
Chapter 6
74
Let Z be a set of functions defined on some set T, where the relation zl >_ z2 means that >_
z2 (t)
V t E T. +.,
z1(t)
If in addition to functions zl and z2 the set Z also contains functions
z(t) = max{z1(t),z2(t)},
z(t) = min{z1(t),z2(t)}
then Z is a lattice, where z1 v z2=z, z1 Az2=z.
,°"S
Nay
Remark 6.2. If for some z1, z2 E Z the corresponding elements z and z do not belong to Z, it does not necessarily imply that Z is not a lattice. For example, the set of convex functions defined on some convex set is a lattice,
although z does not necessarily belong to this set. If z1, z2 E Z, then the function zl A z2 is defined as follows: the epigraph of this function is such that it coincides with the convex hull of the union of the epigraphs of z 1 and z2.
Lemma 6.1. The space L is a lattice.
Proof. Letej,Z2 EL and let Li = pi + q1; 12 = P2 + q2, where P1,P2 E P,
qi, q2 E Q Q.
Let us show that the functions .e(x)
= max {e1(44 (x) } V xE Rn
:g(x) =min {.e1(x),.e2(x)}
belong to L. First note that
.e(x) = max {pl(x)-g2(x),p2(x)-ql(x)} + (ql(x)+q2(x}) fi(x) = (Pi(x)+Pa(X)) + min {q2(x) Pi(x),4i(x) P2(x)} .
Consider, for example, equality (6.7). Let x E Rn be such that 1i (x)
e2 (x)
,
p l (x) +q l (x) > P2(x)+q2(X)
,
(6.6) (6.7)
The Space of Convex Sets
75 4..
In this case, the expression on the right-hand side of (6.7) coincides with p2(x)-q2(x) = 12(x). At the same time, .:g(x)
= min
{11(x),12 (x) } = 12 (x)
.
If 11(x) <_ 1(x), then we have 11(x) on both sides of (6.7). Since the functions q2pl and q1p2 are superlinear, the function min {g2(x)p 1(x), g1(x)p2(x)}
cry
must be superlinear as well. Since, in addition, p 1 +p2 E P, it follows from (6.7) that e E L. _ Analogously, it follows from (6.6) that 1 E L, thus completing the proof. 0 We shall now give formulas for computing the exact bounds of a finite number of elements of L. Let I be a finite set of indices, 1i E L. 1i = pi + qi, where pi E P,qi E Q V i E I. Then
max 1i (x) = max pk (x) - tE2:1 i E1
k E1
i* k
qi (x)
+ 2: qi(x) , iE1
min 1i(x) = iE1 2: pi(x) + m in qk(x) - 1: pi(x) iE I i E1
k E1
.
(6.8)
(6.9)
i* k
Formulas (6.8) and (6.9) can easily be proved by induction with respect to the number of elements of set I. Now let us consider the space M of convex sets. Recall the mapping yr describing the relation between linear spaces M and L (see (6.5)). Let yr-1 be the mapping inverse to V/. If 1 E L, 1= p+q, where p E P, q E Q, then V-1 (1) coin-
cides with the class a of equivalent pairs containing the pair [dp,dq], where p is the subdifferential of the sublinear function p and dq is the superdifferential of the superlinear function q. Let us introduce an ordering relation into M as follows: a1 >_ a2 if and only if yi(ai) >_ yr(a2). Let W(al) = 11 = pi + q1, y(a2) = 12 = p2+q2, where p1,p2 E P and ql 9 q2 E Q.
Then
a1 =
1(11) contains the pair [dp1,dg1]
a2 = V _1(,e2) contains the pair [p2,dg2]
,
Chapter 6
76
The inequality ;Y(al) >_ V/(a2) implies that
pl -q2 ? P2 -ql Let us consider the sublinear function p = pl -q2. coincides with the set
Its subdifferential
1-dq2. Moreover, the subdifferential d(-q2) of the ,,p
sublinear function -q2 coincides with (-dq2). Therefore
dff
d(p I +(-q2))
dp I + d (-q2)
d-P I - q2
-
Analogously the subdifferential p' of the sublinear function p' = P2 - ql coincides with d P2-dq1. The inequality yr(a1) >_ V/(a2) is equivalent to the inequality
pl -q2 > p2 -ql which (from the above) is equivalent to the inclusion
P2 - dql
P1 - dq2
Let [A1,B1 ] be an arbitrary pair from the class a1 and let [A2,B2] be an arbitrary pair from the class a2. Putting
pi(x) = m a x (v,x); qi(x) = m i n (w,x) we B
V E A.
1:2
,
we get VI(a1) =
p1 + ql
;
V/(a2) = P2 + q2
Ai = pi, Bi = dqi; V i E 1:2. Thus, the relation a1 >_ a2 implies that the inclusion Al - B2
A2 - B1
holds for arbitrary pairs [A1 ,B 1 ] E a1, [A2,B2 ] E a2.
It follows immediately from the definition that the ordered spaces L and M are isomorphic. Since L is an ordered linear space, M is also an ordered c13
linear space. Since L is a lattice, M is a lattice as well.
Using formulas (6.8) and (6.9), it is easy to find the exact bounds of a finite set of elements from M. Let I be a finite set of indices, where pi E P, qi E Q. It follows from Theorem 5.2 that
d v pi = co u dpi . iE I
iE I
The Space of Convex Sets
77
Applying an analogue of this theorem to superlinear functions yields
= co u dqi
A qi
iEI
iE I
Here q1 E Q V i E I.
Let ai E M V I E I and let (Ai,B1) be any pair from the class ai. Then the supremum iE I ai coincides with the class containing the pair [A,B],
where
kE I
iE I i* k
(6.10)
'-%
A = co u IAk- 2: Bi B = 2: Bi iE I
A ai coincides with the class containing the pair [C,D], p,;
The infimum
iE I
where
C = IiEAi I D = co u Bk - 2: Ai iE I
kE I
i# k
Let us prove (6.10). It follows from (6.8) that an element e c L related to the class v ai can be represented as the sum of the sublinear function iE I
p(x) = max L(X)
(qi (x)
+ iE I
,he
i# k
and the superlinear function
q(x) =
iE I
qi (x) ,
where we can take pi(x) = m a x (v,x), qi(x) = m a x (w,x). Using the linearity
VEAW E B.
of Minkowski duality and the rules for computing the subdifferentials and superdifferentials of exact bounds, it is now easy to obtain formula (6.10).
Chapter 7 UPPER -SEMICONTINUO USLY DIRECTIONALLY DIFFERENTIABLE FUNCTIONS
7.1. Let us consider the n-dimensional space Rn equipped with some norm II. Let X be an open set in this space and let a function f be defined, continuous and directionally differentiable on X. We say that the function f is II
°,4
upper-semicontinuously directionally differentiable (u.s.c.d.d.) at a point xo E X if for any fixed g E Rn the function x -4f'(x,g) is upper semicontinuous (in x) at this point and is bounded in some neighborhood of x0. The latter property means that there exists a number C < 00 such that [f'(X'g)J
:5
(7.1)
C11g11
for all g E Rn and every x in some neighborhood of x0. Let us consider two examples of u.s.c.d.d. functions.
Example 7.1. Let f be a convex function defined on Rn. Then it is directionally differentiable on Rn and df (x)
dg
= f'(x,g) =
max
VEdf(x)
v (,g)
where fi(x) is the subdifferential of f at x. The mapping x - e(x) is u.s.c. ...
and bounded (see [44,136]). It follows from Proposition 2.3 that the function x -4 f'(x,g) is upper semicontinuous for all g. The boundedness of the mapping x -4 e(x) on any bounded set in R" implies the existence of C < C>0 such that (7.1) holds. Hence f is an u.s.c.d.d. function.
Example 7.2. Let f be a maximum function, i.e.,
f(x) = m a x O(x,y) yEY
V X E X,
where the function 0 is defined on X x Y, Y is a compact set in some space, and 0 and its partial derivative doldx are jointly continuous in both variables. 78
Upper Semicontinuously Directionally Differentiable Functions
79
The function f is directionally differentiable and
f'(x,g) =
max (v,g) vEd f(X)
where
f(x) = co fd'vx R(x)
y e R(x)}
,
= {y e Y I O(x,Y) =f(x)}. r.,
dam
(D0
It follows from Proposition 3.7 that the mapping x -4 f(x) is u.s.c. The boundedness of Y and continuity of do/dx imply that the mapping x -4 d(x) is bounded. Therefore the function f is u.s.c.d.d. It turns out that these examples are quite typical.
Definition 7.1. We say that a function f defined on X is subdifferentiable at a point x E X if it is directionally differentiable at x and if its directional derivative fX is a sublinear function (as a function of direction). We denote by f(x) the subdifferential of f at x. By definition,
f(g) =
max (v,g)
vEd f(X)
V x E Rn
Recall that the subdifferential is a convex compact set.
Proposition 7.1. If a function f is u.s.c.d.d. at a point x E X, then it is also subdifferentiable at this point.
P r o o f. The positive homogeneity of the function f(g) = f'(x,g) is obvious. Let us check its subadditivity. Take gi, 92 E Rn . Then there exist functions Sri (a) and V2(a) such that
ui (a) a and
0
0
V2 (a) a
--0 0
I
Px'gi) = a [f(x+agl) -f(x)] + Vj(a) f'(x,gi +g) = a [f(x+agi+a82) f(X)] + V2(a)
Chapter 7
80
yielding
f'(x,g+g) -f'(x,gl)
=
V3 (a)
a [f(x+ag1+ag2) -f(x+agl)] + V3(a) , V2 (a)
w1(a)
a+0
0
.
Fix some a > 0, put xa = x + ag1 and define
Ma =
sup f'(xa+/Jg2,g2) 0<_/B:5 a
It follows from Proposition 3.1 that
f(xa +ag2) - f(xa) <_ Ma a . Therefore
I/1
f'(x,g1+g2) -f'(X,81) = a [f(Xa+a8a) -f(Xa)] + V3(a) Ma + V3(a).
fts
Since f is an u.s.c.d.d. function, the derivative f'(x,g2) is u.s.c. (as a function of x). This means that for any .6 > 0 there exists a 8 > 0 such that
f(y,g) < f'(x,g2) + 2
b' y E Bs(x)
For sufficiently small a and fl E (O,a) we have Xa + /3g2 = x + Q'81 +,892 E By (x)
and therefore Ma < f U,92) + 2. Assuming that
Iu3(a)I < 2 (which is the case if a is sufficiently small), we have f'(x,g1+g2) - f'(x+g1) <_ f'(x,g2) + 0
.
This implies that the function f(g) = f'(x,g) is subadditive (since .6 is arbitrary).
0
Upper Semicontinuously Directionally Differentiable Functions
81
Remark 7.1. If f is an u.s.c.d.d. Lipschitz function, then it is easy to check that f (x,g) coincides with the Clarke derivative f(x,g) (see Chapter 8). Therefore Proposition 7.1 will follow from Proposition 8.1.
cep
Corollary 7.1. Let a function f be directionally differentiable in some neighborhood of a point x₀ ∈ X and let its directional derivative f'(x,g) be continuous in x at x₀ for any fixed g. Then f is Gateaux differentiable at x₀, i.e., the following gradient exists:

∇f(x₀) = (∂f(x₀)/∂x^(1), ..., ∂f(x₀)/∂x^(n)),

and

f'(x₀,g) = (∇f(x₀),g).

Here x = (x^(1),...,x^(n)).

Indeed, the continuity of f'(x,g) implies that the functions f and −f are u.s.c.d.d. at x₀ and therefore the functions f'_{x₀}(g) and (−f)'_{x₀}(g) are both sublinear. But this is possible only if f'_{x₀}(g) is a linear function, which implies that f is Gateaux differentiable. □
Let a function f be defined on an open set X ⊂ R^n and be u.s.c.d.d. on this set. It follows from Proposition 7.1 that f is subdifferentiable at every point x ∈ X (and the subdifferential ∂f(x) is defined for every x ∈ X). Fix any g ∈ R^n and consider the function

q_g(x) = max_{v ∈ ∂f(x)} (v,g) = f'(x,g).

It follows from the definition of q_g that q_g is an u.s.c. function. Inequality (7.1) implies that the mapping ∂f is bounded in some neighborhood of every point x ∈ X. It thus follows from Proposition 2.3 and Remark 2.1 that the mapping x → ∂f(x) is u.s.c.

From Theorem 2.3 and Lemma 2.1, we conclude that every point x₀ ∈ X has a neighborhood (in which the mapping x → ∂f(x) is bounded) such that for any fixed ε > 0 we can find a continuous multivalued mapping b which is defined in this neighborhood, has convex compact sets as its images, and is such that the inclusions

∂f(x) ⊂ b(x) ⊂ ∂f(B_ε(x)) + B_ε     (7.2)

hold. Here B_ε(x) = x + B_ε; B_ε = B_ε(0).
For simplicity we assume that the mapping x → ∂f(x) is bounded on the whole set X. It then follows from Theorem 2.3 that a continuous mapping b satisfying (7.2) can be defined on this set. Let ε and μ be positive numbers. It follows directly from (7.2) that there exists a continuous mapping b such that

∂f(x) ⊂ b(x) ⊂ ∂f(B_ε(x)) + B_μ   ∀ x ∈ X.     (7.3)

This condition is met, for example, by a mapping b which satisfies (7.2) for ε' = min{ε,μ}.

A continuous mapping b satisfying (7.3) is called a continuous (ε,μ)-subdifferential of the function f and is denoted by ∂_{εμ}f. Clearly this mapping is not uniquely defined: if 0 < ε' ≤ ε and 0 < μ' ≤ μ, then every continuous (ε',μ')-subdifferential is also a continuous (ε,μ)-subdifferential. The definition of a continuous (ε,μ)-subdifferential can be extended to the case in which either ε or μ is zero. However, in this case we cannot guarantee the existence of a continuous (ε,μ)-subdifferential for an arbitrary u.s.c.d.d. function. Nevertheless, continuous (ε,0)-subdifferentials exist for convex functions. We shall now describe one of them.

Let a function f be defined and convex on an open convex set X. By ∂_ε f(x) we denote the conditional ε-subdifferential of f at x with respect to the ball B_ε(x) (see Section 4.6):

∂_ε f(x) = {v ∈ R^n | f(z) − f(x) ≥ (v, z−x) − ε   ∀ z ∈ B_ε(x)}.
Proposition 7.2. Let a function f be defined and convex on an open convex set X ⊂ R^n. Then the mapping ∂_ε f is a continuous (ε,0)-subdifferential of the function f.

P r o o f. It follows from Theorem 4.14 that ∂_ε f(x) coincides with the closure of the set

C_ε f(x) = {v ∈ R^n | ∃ x' ∈ int B_ε(x): v ∈ ∂f(x'),  f(x') − f(x) ≥ (v, x'−x) − ε}.

It follows from the definition that

C_ε f(x) ⊂ ∂f(int B_ε(x)) ⊂ ∂f(B_ε(x)).

In addition, ∂f(x) ⊂ C_ε f(x), and the sets ∂_ε f(x) are convex compact sets (the latter follows from Theorem 4.10). Thus

∂f(x) ⊂ ∂_ε f(x) ⊂ ∂f(B_ε(x)).     (7.4)
It is now necessary to demonstrate the continuity of the mapping ∂_ε f(x). It follows from Theorem 4.11 that the support function q_ε f(x,g) of the set ∂_ε f(x) is given by

q_ε f(x,g) = inf_{0 < α ≤ ε/||g||} (1/α)[f(x+αg) − f(x) + ε].

Fix any vector g and consider the function

h(x,α) = (1/α)[f(x+αg) − f(x) + ε],

which is jointly continuous in both variables on X × (0, ε/||g||]. Fix x₀ ∈ X. Since

lim_{α→+0} h(x₀,α) = +∞,

there exist numbers δ > 0 and α₀ > 0 such that

inf_{0 < α ≤ ε/||g||} h(x,α) = min_{α₀ ≤ α ≤ ε/||g||} h(x,α)   ∀ x ∈ B_δ(x₀).

Since h is jointly continuous in both variables on the compact set B_δ(x₀) × [α₀, ε/||g||], the function

x → q_ε f(x,g) = min_{α₀ ≤ α ≤ ε/||g||} h(x,α)

is continuous at the point x₀. Moreover, since the mapping x → ∂_ε f(x) is bounded in some neighborhood of x₀ (this follows from (7.4) and the boundedness of the subdifferential), Theorem 2.2 demonstrates that the mapping x → ∂_ε f(x) is continuous. □

The following example will be of special interest later on.
Example 7.3. Let p(x) = max_{i ∈ 1:n} x^(i), where x = (x^(1),...,x^(n)) ∈ R^n. Assume that

||x|| = max_{i ∈ 1:n} |x^(i)|.
Observe, first of all, that

C_ε p(x) = {v ∈ R^n | ∃ x' : ||x−x'|| < ε, v ∈ ∂p(x'), p(x) ≤ (v,x) + ε}.     (7.5)

This follows from the equality

(v,x') = p(x'),

which holds for every v ∈ ∂p(x') since p is sublinear (see Proposition 5.1). Put, for J ⊂ 1:n = I,

Λ_J = {v ∈ R^n | Σ_{i=1}^n v^(i) = 1, v^(i) ≥ 0, v^(i) = 0 ∀ i ∉ J},

and let

R(x) = {i ∈ I | p(x) = x^(i)},   J(x) = ∪_{||x−x'|| < ε} R(x').

We want to show that

C_ε p(x) = {v ∈ Λ_{J(x)} | p(x) ≤ (v,x) + ε}.     (7.6)

It follows from Example 3.1 that

∂p(x') = Λ_{R(x')}   ∀ x' ∈ R^n,

and therefore

Λ_{J(x)} = ∪_{||x−x'|| < ε} Λ_{R(x')} = ∪_{||x−x'|| < ε} ∂p(x').

Thus formula (7.6) is an immediate corollary of equality (7.5). Let

R_{2ε}(x) = {i ∈ I | p(x) − x^(i) < 2ε}.

Then

J(x) = R_{2ε}(x).
Let i ∈ J(x), i.e., i ∈ R(x_i) for some x_i such that ||x − x_i|| < ε. Since

p(x) − x^(i) = p(x) − x_i^(i) + x_i^(i) − x^(i) = p(x) − p(x_i) + (x_i^(i) − x^(i)) ≤ p(x − x_i) + (x_i^(i) − x^(i)) ≤ 2||x − x_i|| < 2ε,

we have i ∈ R_{2ε}(x). Now let us assume that i ∈ R_{2ε}(x), i.e., there exists a number δ > 0 such that

p(x) − x^(i) = 2ε − 2δ.

Take

y^(i) = x^(i) + ε − δ,   y^(j) = x^(j) − ε + δ   ∀ j ≠ i.

Then

|y^(j) − x^(j)| = ε − δ   ∀ j ∈ I,
||x − y|| = ε − δ < ε,   y ∈ int B_ε(x).

The definition of δ means that the equality

p(x) = x^(i) + 2ε − 2δ

is satisfied, and therefore for j ≠ i we have

y^(j) ≤ p(x) − ε + δ = x^(i) + ε − δ = y^(i).

This implies that i ∈ R(y), i.e., i ∈ J(x). It follows from formula (7.6) and the equality J(x) = R_{2ε}(x) that

C_ε p(x) = {v ∈ Λ_{R_{2ε}(x)} | p(x) ≤ (v,x) + ε}.

Since this set is closed, it follows from Theorem 4.14 that

∂_ε p(x) = {v ∈ Λ_{R_{2ε}(x)} | p(x) ≤ (v,x) + ε}.     (7.7)
Theorem 7.1 (On the continuous (ε,μ)-subdifferential of a composition). Let a function f be defined, Lipschitz and u.s.c.d.d. on an open set X₁ ⊂ R^n. Suppose also that for any ε > 0 and μ > 0 there exists a continuous (ε,μ)-subdifferential ∂_{εμ}f. Let functions h₁,...,h_n be defined and continuously differentiable on an open set X₂ ⊂ R^m, where m ≥ n, and let the mapping H(x) = (h₁(x),...,h_n(x)) be such that:

(i) H(X₂) ⊂ X₁;

(ii) the Jacobian matrix

H'_x = [∂h_i/∂x^(j)],  i = 1,...,n,  j = 1,...,m,

has a minor of the nth order which does not vanish on the closure cl X of some bounded open subset X of the set X₂.

Then the function φ(x) = f(H(x)) is u.s.c.d.d. and for any δ > 0 and ν > 0 there exist ε > 0 and μ > 0 such that the mapping

x → (H'_x)* ∂_{εμ}f(H(x))

is a continuous (δ,ν)-subdifferential of the function φ on the set X. Here * denotes transposition.
The proof is based on the following lemma.

Lemma 7.1. Under the hypotheses of Theorem 7.1, for any δ > 0 there exists an ε > 0 such that

H(x) + B_ε ⊂ H(x + B_δ) = H(B_δ(x))   ∀ x ∈ X.

P r o o f. First let us show that for every point x ∈ X₂ the image of any of its neighborhoods contains some ball centered at the point H(x). Assume for the sake of argument that the minor which does not vanish (see condition (ii)) corresponds to the first n indices. Let

x̄ = (x̄^(1),...,x̄^(n),...,x̄^(m)) ∈ X₂.

Consider the set

X̃₂ = {y = (y^(1),...,y^(n)) ∈ R^n | (y^(1),...,y^(n),x̄^(n+1),...,x̄^(m)) ∈ X₂}

and the mapping H̃ defined on this set by the equality

H̃(y) = H(x),   where x = (y^(1),...,y^(n),x̄^(n+1),...,x̄^(m)).

Since the Jacobian of this mapping does not vanish at the point ȳ = (x̄^(1),...,x̄^(n)), it follows from the inverse function theorem that in some neighborhood of this point there exists a continuous mapping H̃⁻¹ which is the inverse of H̃. The continuity of H̃⁻¹ implies that the image of every sufficiently small neighborhood of the point ȳ in the set X̃₂ (under the mapping H̃) contains a ball centered at the point H̃(ȳ) = H(x̄). Therefore the image of any neighborhood of the point x̄ in the set X₂ (under the mapping H) contains a ball centered at the point H(x̄).

Fix δ > 0. For x ∈ X₂ let ε(x) denote the supremum of the set of numbers ε > 0 such that

H(x) + B_ε ⊂ H(x + B_δ).

Here B_δ and B_ε are open balls of radius δ and ε, respectively, centered at zero. It follows from the above argument that ε(x) > 0 for all x. Let us show that the function ε(x) is l.s.c. Supposing the opposite, we can find a sequence {x_k} and numbers ε', ε'' > 0 such that

x_k → x,  x_k ∈ X₂,  ε(x) > ε'' > ε' > ε(x_k)   ∀ k.     (7.8)

The inequality ε' > ε(x_k) implies that there exist elements {y_k} such that

||H(x_k) − y_k|| ≤ ε',   y_k ∉ H(x_k + B_δ).     (7.9)

Since the sequence {H(x_k)} converges, the sequence {y_k} is bounded. We can assume without loss of generality that the limit lim y_k = y exists. Since

||H(x) − y|| = lim ||H(x_k) − y_k|| ≤ ε' < ε'' < ε(x),

we have

y ∈ H(x) + B_{ε''} ⊂ H(x + B_δ),

i.e., the equality y = H(x') holds for some x' ∈ x + B_δ. Let ||x' − x|| = δ'. Take numbers γ and γ' such that

0 < 2γ < γ' < δ − δ'.

Since the image of a neighborhood contains a neighborhood and y_k → H(x'), the inclusion

y_k ∈ H(x' + B_γ)

holds for all sufficiently large k. Let k be large enough that ||x − x_k|| ≤ γ. Then for every z ∈ x' + B_γ

||z − x_k|| ≤ ||z − x'|| + ||x' − x|| + ||x − x_k|| < 2γ + δ' < δ.

We conclude that x' + B_γ ⊂ x_k + B_δ and therefore

y_k ∈ H(x' + B_γ) ⊂ H(x_k + B_δ).

However, this contradicts (7.9), showing that ε(x) must be l.s.c. The set cl X is compact by assumption, and therefore ε(x) achieves its minimum on cl X at some point x₀, where ε(x) ≥ ε(x₀) > 0. □
P r o o f of Theorem 7.1. Let φ(x) = f(H(x)). Since f is Lipschitz, it follows from Theorem 3.1 that

φ'(x,g) = f'(H(x), H'_x(g)) = max_{v ∈ ∂f(H(x))} (v, H'_x(g)) = max_{v ∈ ∂f(H(x))} ((H'_x)*v, g) = max_{v' ∈ (H'_x)*(∂f(H(x)))} (v', g).

We conclude that φ is an u.s.c.d.d. function and that

∂φ(x) = (H'_x)*(∂f(H(x))).

Let numbers δ > 0 and ν > 0 be given. Find an ε > 0 which corresponds to δ (and whose existence is guaranteed by Lemma 7.1) and choose a μ > 0 such that the inequality

μ ||(H'_x)*|| ≤ ν

holds (for all x ∈ cl X). Take a continuous (ε,μ)-subdifferential ∂_{εμ}f of the function f. Then

∂f(H(x)) ⊂ ∂_{εμ}f(H(x)) ⊂ ∂f(H(x) + B_ε) + B_μ.

Applying the operator (H'_x)* to these inclusions yields

(H'_x)* ∂f(H(x)) ⊂ (H'_x)* ∂_{εμ}f(H(x)) ⊂ (H'_x)* ∂f(H(x) + B_ε) + μ(H'_x)* B₁ ⊂ (H'_x)* ∂f(H(x + B_δ)) + B_ν.
Making use of the equalities

∂φ(x) = (H'_x)* ∂f(H(x)),
∂φ(x + B_δ) = ∪_{||x'−x|| ≤ δ} (H'_{x'})* ∂f(H(x')),

we finally obtain

∂φ(x) ⊂ (H'_x)* ∂_{εμ}f(H(x)) ⊂ ∂φ(x + B_δ) + B_ν,

i.e., the mapping x → (H'_x)* ∂_{εμ}f(H(x)) is a continuous (δ,ν)-subdifferential of φ. □
Remark 7.2. If for every ε > 0 a function f has a continuous (ε,0)-subdifferential ∂_{ε0}f, then for any δ > 0 there exists an ε > 0 such that the mapping

x → (H'_x)* ∂_{ε0}f(H(x))

is a continuous (δ,0)-subdifferential of the function φ = f(H) on the set X. This follows directly from the proof of Theorem 7.1 and allows us to describe a continuous (ε,0)-subdifferential for one class of finite maximum functions.
Theorem 7.2. Let functions h₁,...,h_n be defined and continuously differentiable on an open set in R^m (where m ≥ n) and let

φ(x) = max_{i ∈ 1:n} h_i(x)

on this set. Assume that the Jacobian matrix {∂h_i/∂x^(j)} has a minor of the nth order which does not vanish on the closure cl X of some bounded open subset X of this set. Then for any δ > 0 there exists an ε > 0 such that the mapping ∂_δφ(x) defined below is a continuous (δ,0)-subdifferential of φ on X. The mapping ∂_δφ is described by the relation

∂_δφ(x) = {y ∈ R^m | y = (Σ_{i=1}^n v_i ∂h_i(x)/∂x^(1), ..., Σ_{i=1}^n v_i ∂h_i(x)/∂x^(m)),  v = (v₁,...,v_n) ∈ V_ε(x)},

where

V_ε(x) = {v ∈ R^n | Σ_{i=1}^n v_i = 1, v_i ≥ 0; v_i = 0 if i ∉ R_{2ε}(H(x)); φ(x) ≤ Σ_{i=1}^n v_i h_i(x) + ε},
R_{2ε}(H(x)) = {i ∈ 1:n | φ(x) − h_i(x) < 2ε}.

Remark 7.3. The set ∂_δφ(x) is a part of the convex hull of the vectors

z_i = (∂h_i(x)/∂x^(1), ..., ∂h_i(x)/∂x^(m)),   i ∈ 1:n,

and the set V_ε(x) indicates the sets of coefficients which define the convex combinations of the vectors z_i belonging to ∂_δφ(x).

P r o o f. The function φ can be represented in the form φ(x) = p_n(H(x)), where p_n is the function described in Example 7.3 and the mapping H is H(x) = (h₁(x),...,h_n(x)). To conclude the proof, it is necessary to use the results of Example 7.3 and Remark 7.2. □
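Under the assumptions of Theorem 7.2, the support function of the set ∂_δφ(x) in a direction g can be evaluated by a small linear program over V_ε(x). The sketch below is an illustration, not the book's code; the function names and the use of scipy.optimize.linprog are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def support_of_eps_subdiff(g, h_vals, h_grads, eps):
    """Support function  max{(y, g) : y in the (eps,0)-subdifferential of
    phi = max_i h_i}  from Theorem 7.2: maximize sum_i v_i (grad h_i, g)
    over the simplex restricted to R_{2eps}, subject to
    phi <= sum_i v_i h_i + eps."""
    phi = max(h_vals)
    n = len(h_vals)
    # indices allowed to carry weight: R_{2eps}(H(x)) = {i : phi - h_i < 2 eps}
    free = [i for i in range(n) if phi - h_vals[i] < 2.0 * eps]
    c = np.array([-float(np.dot(h_grads[i], g)) for i in free])  # maximize -> minimize -c
    A_eq = np.ones((1, len(free)))                                # sum v_i = 1
    b_eq = np.array([1.0])
    A_ub = np.array([[-h_vals[i] for i in free]])                 # phi - eps <= sum v_i h_i
    b_ub = np.array([eps - phi])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(free))
    return -res.fun if res.success else None

if __name__ == "__main__":
    # phi(x) = max(h1, h2, h3) at a point where h = (1.0, 0.95, 0.2)
    h_vals = [1.0, 0.95, 0.2]
    h_grads = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
    print(support_of_eps_subdiff(np.array([1.0, -1.0]), h_vals, h_grads, eps=0.1))
```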
Chapter 8 THE CLARKE DERIVATIVES

8.1. For the theoretical analysis of extremal problems, it is worth considering generalizations of the derivative which are defined for all functions in some sufficiently rich class. One such class is the class of Lipschitz functions.

In nonsmooth analysis, generalizations of the derivative concept which are sublinear functions of direction are of particular interest. For such generalized derivatives it is possible to introduce the concept of a generalized subdifferential.
Let X be an open set in R^n and let a function f defined on X be locally Lipschitz, i.e., such that it satisfies a Lipschitz condition in some neighborhood of every point x ∈ X (the Lipschitz constant may depend on x as well as on the neighborhood). Assume that f is directionally differentiable at a point x ∈ X. Then (see Propositions 3.2 and 3.4)

df(x)/dg = f'(x,g) = lim_{α→+0, q→g} (1/α)[f(x+αq) − f(x)]   ∀ g ∈ R^n.     (8.1)

If f is not differentiable in some direction, then the limit in (8.1) does not exist. However, in this case we can consider the upper limit

f↑(x,g) = lim sup_{α→+0, q→g} (1/α)[f(x+αq) − f(x)].     (8.2)

Since f is Lipschitz, the function

h(α,q) = (1/α)[f(x+αq) − f(x)]

is bounded for small α > 0 and q close to g. Thus the upper limit in (8.2) is finite. This limit is called the upper directional derivative (Dini derivative, contingent derivative).
It is clear that one can also consider the lower directional derivative:

f↓(x,g) = lim inf_{α→+0, q→g} (1/α)[f(x+αq) − f(x)].     (8.3)

An important property of derivatives (8.2) and (8.3) is that they coincide with the directional derivative (if it exists). They therefore provide a local approximation of the function to within an accuracy of higher order than α (in the case where such an approximation is possible). If this is impossible, then the numbers f↓(x,g) and f↑(x,g) represent the lower and upper bounds (respectively) of the numbers γ for which there exist sequences {α_k} and {q_k} such that

α_k → +0,   q_k → g,   f(x+α_k q_k) = f(x) + α_k γ + o(α_k).

The derivatives f↑(x,g) and f↓(x,g) are not sublinear functions of direction, and therefore F. Clarke [27] suggested the use of another quantity, now called the Clarke derivative.
Definition 8.1. Let a function f be defined and locally Lipschitz on an open set X, and let g ∈ R^n. The quantity

f°(x,g) = lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)]

is called the Clarke upper derivative of f at x ∈ X in the direction g.

Proposition 8.1. The Clarke upper derivative is finite for all g ∈ R^n. In addition, the function g → f°(x,g) is sublinear (in g ∈ R^n).

P r o o f. Since f is locally Lipschitz, we can assume without loss of generality that y and α are such that

|f(y+αg) − f(y)| ≤ Lα||g||

(where L is a Lipschitz constant), and therefore

|f°(x,g)| ≤ L||g|| < +∞.
The relations

f°(x,λg) = lim sup_{y→x, α→+0} (1/α)[f(y+αλg) − f(y)] = λ lim sup_{y→x, α→+0} (1/(αλ))[f(y+αλg) − f(y)] = λ f°(x,g)   (λ > 0)

show that the function g → f°(x,g) is positively homogeneous. Let us now prove that it is subadditive. We have

f°(x,g+q) = lim sup_{y→x, α→+0} (1/α)[f(y+αg+αq) − f(y)]
≤ lim sup_{y→x, α→+0} (1/α)[f(y+αg+αq) − f(y+αg)] + lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)]
= f°(x,q) + f°(x,g).

Here we used the subadditivity of the upper limit. □

Let us set p(g) = f°(x,g). Since p is a sublinear function, we have (see Theorem 5.1)

p(g) = max_{v ∈ ∂p} (v,g),

where ∂p is a convex compact set (it is the subdifferential of the function p). This set is called the Clarke subdifferential of the function f at the point x and is denoted by ∂_Cl f(x). The following equality is an immediate corollary of this definition:

f°(x,g) = max_{v ∈ ∂_Cl f(x)} (v,g)   ∀ g ∈ R^n.     (8.4)

Fix a direction g and consider the function

q_g(x) = max_{v ∈ ∂_Cl f(x)} (v,g) = f°(x,g).
Proposition 8.2. The function q_g is upper semicontinuous.

P r o o f. We must check that if x_k → x, then

q_g(x) ≥ lim sup q_g(x_k).

Consider a subsequence {k_s} such that

lim_s q_g(x_{k_s}) = lim sup_k q_g(x_k),

and let ε_s → +0. Since

q_g(x_{k_s}) = lim sup_{y→x_{k_s}, α→+0} (1/α)[f(y+αg) − f(y)],

there exist, for every s, a vector y_s and a number α_s such that

||y_s − x_{k_s}|| ≤ ε_s,   0 < α_s ≤ ε_s,   q_g(x_{k_s}) ≤ (1/α_s)[f(y_s+α_s g) − f(y_s)] + ε_s.

Passing to the limit yields

lim sup_k q_g(x_k) = lim_s q_g(x_{k_s}) ≤ lim sup_s (1/α_s)[f(y_s+α_s g) − f(y_s)] ≤ lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)] = q_g(x). □

Corollary 8.1. The multivalued mapping x → ∂_Cl f(x) is upper semicontinuous.
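Definition 8.1 suggests a crude numerical experiment: estimate f°(x,g) by maximizing the difference quotient over sampled base points y near x and several small steps α. The sketch below (not from the book) assumes nothing beyond NumPy; the sampling radius, step sizes and sample count are arbitrary choices, so the result is only an approximation of the upper limit.

```python
import numpy as np

def clarke_upper_deriv(f, x, g, radius=1e-4, alphas=(1e-4, 1e-5, 1e-6),
                       n_samples=2000, seed=0):
    """Monte-Carlo estimate of f°(x,g) = limsup (f(y + a g) - f(y)) / a
    over y -> x, a -> +0 (Definition 8.1); for experiments only."""
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(n_samples):
        y = x + radius * rng.uniform(-1.0, 1.0, size=x.shape)
        for a in alphas:
            best = max(best, (f(y + a * g) - f(y)) / a)
    return best

if __name__ == "__main__":
    f = lambda z: -abs(z[0])              # concave, nonsmooth at 0
    x, g = np.array([0.0]), np.array([1.0])
    print(clarke_upper_deriv(f, x, g))    # close to f°(0,1) = 1, while f'(0,1) = -1
```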
To characterize the Clarke subdifferential, we need the following results, which are related to the concept of measure.

(a) Let a function f be Lipschitz on an open set X ⊂ R^n. Then this function is differentiable almost everywhere, i.e., the gradient of f exists except on a set of Lebesgue measure zero. (It follows from Proposition 3.4 that in this case the Gateaux and the Frechet derivatives coincide.)

(b) Let T be a set of measure zero in some ball B_ε(x), let g be a nonzero element of R^n, and let M_y be the interval formed by the intersection of the line {y + αg | α ∈ R¹} and the ball B_ε(x) (here y ∈ B_ε(x)). Then for almost all y ∈ B_ε(x) the measure of the intersection of the interval M_y and the set T is zero (considered as the Lebesgue measure on an interval).

(c) Let a function g(α) be defined and Lipschitz on an interval [a,b]. Then

∫_a^b g'(α) dα = g(b) − g(a).     (8.5)

Let us explain this equality. From property (a), the function g is differentiable almost everywhere on [a,b]; hence the integrand in (8.5) is a function which is defined almost everywhere. (We are considering the Lebesgue integral.)
Proposition 8.3. Let a function f be Lipschitz in a neighborhood of a point x and Gateaux differentiable at this point. Then ∇f(x) ∈ ∂_Cl f(x).

P r o o f. Consider the sublinear function p(g) = f°(x,g). We shall show that

(∇f(x),g) ≤ p(g)   ∀ g ∈ R^n.     (8.6)

We actually have

(∇f(x),g) = lim_{α→+0} (1/α)[f(x+αg) − f(x)] ≤ lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)] = p(g).

Inequality (8.6) implies that ∇f(x) ∈ ∂p = ∂_Cl f(x). □

Remark 8.1. The equality

∂_Cl f(x) = {∇f(x)}

does not always hold (see [31], where counterexamples are given).
8.2. Let a function f be Lipschitz in some neighborhood V of a point x, and let T ⊂ V be a set of measure zero which contains all points in V where f is not differentiable. Let

A = {v | v = lim_{i→∞} ∇f(y_i),  y_i → x,  y_i ∈ V\T}.     (8.7)

Since f is Lipschitz, the gradients ∇f(y), y ∈ V\T, are jointly bounded and therefore the set A is bounded. It is not difficult to check that A is closed. Thus A is a compact set. The compactness of A implies that D = co A is a compact set as well.

Theorem 8.1. The following relation holds:

∂_Cl f(x) = D.
P r o o f. (a) First let us show that D ⊂ ∂_Cl f(x). Let v ∈ A and let a sequence {y_i} be such that

y_i ∈ V\T,   y_i → x,   v = lim_{i→∞} ∇f(y_i).

From Proposition 8.3, it follows that

∇f(y_i) ∈ ∂_Cl f(y_i).

Since the mapping ∂_Cl f is u.s.c. (see Corollary 8.1), Proposition 2.2 implies that v ∈ ∂_Cl f(x). Thus A ⊂ ∂_Cl f(x). But the subdifferential ∂_Cl f(x) is a convex set, and therefore D = co A ⊂ ∂_Cl f(x).

(b) Let us now check the reverse inclusion. Theorem 5.2 indicates that it is sufficient to show that the support function p(g) of the compact set D majorizes the support function f°(x,g) of the compact set ∂_Cl f(x):

p(g) ≥ f°(x,g)   ∀ g ∈ R^n.

Since D = co A, we have

p(g) = max_{v ∈ D} (v,g) = max_{v ∈ A} (v,g).
Let ε be a positive number. It follows from the definition of the set A that there exists a δ > 0 such that

(∇f(y),g) ≤ p(g) + ε   ∀ y ∈ B_δ(x)\T.     (8.8)

Fix y ∈ B_{δ/2}(x). Let M_y be the intersection of the line {y+αg | α ∈ R¹} with the ball B_δ(x); M_y is nonempty. Put

h_{y,g}(α) = f(y+αg).

If the function f is differentiable at y + αg, then the following derivative exists:

h'_{y,g}(α) = (∇f(y+αg), g).

Clearly, h_{y,g} is Lipschitz. It was noted above (see property (b)) that for almost all y the intersection L_y of the interval M_y with the set T is a set of one-dimensional measure zero. Considering the function h_{y,g}, using the fact that f is differentiable on M_y\L_y and noting property (c), we obtain (for almost all y)

f(y+αg) − f(y) = h_{y,g}(α) − h_{y,g}(0) = ∫_0^α (∇f(y+βg), g) dβ.

Here the number α is sufficiently small to ensure that y+αg ∈ B_δ(x). It follows from (8.8) that for almost all β ∈ [0,α] the inequality

(∇f(y+βg), g) ≤ p(g) + ε

is satisfied. Therefore

f(y+αg) − f(y) ≤ α(p(g) + ε).     (8.9)

Inequality (8.9) holds for all y from a set of full measure, which is dense in the ball B_{δ/2}(x). Since the function f is continuous, inequality (8.9) holds for all y from this ball. It follows from (8.9) that

lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)] ≤ p(g) + ε.

Since ε is arbitrary, we have f°(x,g) ≤ p(g). This completes the proof. □
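Theorem 8.1 describes ∂_Cl f(x) as the convex hull of limits of nearby gradients, which is the idea behind gradient-sampling approximations. The sketch below is an assumption-laden illustration, not the book's construction: it collects finite-difference gradients at random points near x, and the convex hull of the sampled gradients approximates co A(x).

```python
import numpy as np

def sampled_gradients(f, x, radius=1e-3, n_samples=200, fd_step=1e-7, seed=0):
    """Gradients of f at random points near x, taken by central differences.
    The convex hull of the returned rows approximates the Clarke
    subdifferential at x (motivated by Theorem 8.1); a rough sketch only."""
    rng = np.random.default_rng(seed)
    n = x.size
    grads = np.empty((n_samples, n))
    for k in range(n_samples):
        y = x + radius * rng.uniform(-1.0, 1.0, size=n)
        for i in range(n):
            e = np.zeros(n)
            e[i] = fd_step
            grads[k, i] = (f(y + e) - f(y - e)) / (2 * fd_step)
    return grads

if __name__ == "__main__":
    f = lambda z: abs(z[0]) + abs(z[1])
    G = sampled_gradients(f, np.zeros(2))
    print(G.min(axis=0), G.max(axis=0))   # roughly (-1,-1) and (1,1): the unit square
```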
Remark 8.2. Let a function f be defined and Lipschitz on an open set X ⊂ R^n and let T be a subset of X of zero measure containing all points of X where f is not differentiable. Let

Z = {[y, ∇f(y)] | y ∈ X\T} ⊂ R^n × R^n.

This set is the graph of the gradient mapping y → ∇f(y) defined on the set X\T. Let cl Z be the closure of Z. Since it is a subset of the direct product R^n × R^n, the set cl Z is the graph of some multivalued mapping. It follows from the proof of Theorem 8.1 that this mapping associates the set A(x) defined by formula (8.7) with every point x ∈ X. The same theorem also implies that the mapping x → ∂_Cl f(x) coincides with the mapping x → co A(x). Thus, the subdifferential mapping x → ∂_Cl f(x) is obtained from the gradient mapping y → ∇f(y) by the operation of graph closure followed by taking the convex hulls of the images.

Clearly, the subdifferential mapping provides the same information about the function as the initial gradient mapping, but it is much easier to use since it is defined on the entire domain of f. Both the subdifferential mapping and the gradient mapping completely characterize the function in a neighborhood of a point at which the function is not differentiable.
8.3. We shall now find the Clarke derivative and the Clarke subdifferential for various classes of functions.

Example 8.1. Let a function f be continuously differentiable at a point x, i.e., the gradient ∇f(y) exists for all y in some neighborhood of x and ∇f(y) → ∇f(x) as y → x. Then it follows from Theorem 8.1 that

∂_Cl f(x) = {∇f(x)}

and hence

f°(x,g) = (∇f(x),g).
Example 8.2. Let f be a convex function. Then f is Lipschitz in some neighborhood of x and therefore the Clarke derivative exists. Since

lim sup_{y→x, α→+0} (1/α)[f(y+αg) − f(y)] ≥ lim_{α→+0} (1/α)[f(x+αg) − f(x)] = f'(x,g),

the subdifferential of the sublinear function g → f°(x,g) contains the subdifferential of the sublinear function f'_x(g) (see Theorem 5.2). In other words,

∂_Cl f(x) ⊃ ∂f(x).     (8.10)

Let us check the reverse inclusion. By Theorem 8.1,

∂_Cl f(x) = co A,

where the set A is defined by (8.7). Let v ∈ A. Then there exists a sequence {y_i} such that y_i → x, f is differentiable at y_i, and ∇f(y_i) → v. Since f is a Lipschitz function, it is uniformly differentiable, and therefore the existence of the gradient ∇f(y_i) and the directional differentiability of f at y_i imply the Frechet differentiability of f at y_i. From the fact that the mapping ∂f is upper semicontinuous (see [136]) and using y_i → x, ∇f(y_i) → v, ∇f(y_i) ∈ ∂f(y_i), we have

v ∈ ∂f(x).

Thus A ⊂ ∂f(x). Since the subdifferential ∂f(x) is a convex set, we obtain

∂_Cl f(x) = co A ⊂ ∂f(x).     (8.11)

It follows from (8.10) and (8.11) that for a convex function

∂_Cl f(x) = ∂f(x),

and therefore

f°(x,g) = f'(x,g)   ∀ g.

Remark 8.3. The results described in Examples 8.1 and 8.2 follow immediately from Remark 7.1. The following proposition will be used in Examples 8.3 and 8.4.
Proposition 8.4. Let a function f be Lipschitz in a neighborhood of a point x. Then

f°(x,−g) = (−f)°(x,g).

P r o o f. We have

f°(x,−g) = lim sup_{y→x, α→+0} (1/α)[f(y−αg) − f(y)].

Put y − αg = z. Then y = z + αg and therefore

f°(x,−g) = lim sup_{z→x, α→+0} (1/α)[f(z) − f(z+αg)] = lim sup_{z→x, α→+0} (1/α)[(−f)(z+αg) − (−f)(z)] = (−f)°(x,g). □

Corollary 8.2. Under the hypotheses of Proposition 8.4, we have

∂_Cl f(x) = −∂_Cl(−f)(x).

Indeed, putting A = ∂_Cl f(x) and B = ∂_Cl(−f)(x), we have for g ∈ R^n

p_A(g) = max_{v ∈ A} (v,g) = f°(x,g) = (−f)°(x,−g) = max_{v' ∈ B} (v',−g) = max_{v ∈ [−B]} (v,g) = p_{−B}(g).

Since the support functions of the sets A and (−B) coincide, these sets must coincide as well. □
Example 8.3. Let f be a concave function and let ∂̄f(x) be its superdifferential at a point x. Consider the function h = −f. This function is convex and its subdifferential ∂h(x) coincides with −∂̄f(x). It has already been shown that for a convex function h we have

∂_Cl h(x) = ∂h(x).

Corollary 8.2 then implies that

∂_Cl f(x) = −∂_Cl h(x) = −∂h(x) = ∂̄f(x).

Therefore, for a concave function f we have

∂_Cl f(x) = ∂̄f(x),

and hence the Clarke directional derivative is

f°(x,g) = max_{v ∈ ∂_Cl f(x)} (v,g) = max_{v ∈ ∂̄f(x)} (v,g).     (8.12)

However, the directional derivative of a concave function is given by the relation

f'(x,g) = min_{v ∈ ∂̄f(x)} (v,g).     (8.13)

Comparing (8.12) and (8.13), we conclude that if f is concave and not differentiable at a point x, then its directional derivative differs from the Clarke derivative. The directional derivative provides an exact local approximation of the function f in a neighborhood of x, in the sense that

f(x+αg) − f(x) − αf'(x,g) = o(α),   where o(α)/α → 0 as α → +0;

the same cannot be said of the Clarke derivative.

Let us consider this result. A concave function can be represented as the minimum of affine functions. Since the derivative (in any sense) of an affine function is linear, a concave function should be approximated by a function which is the minimum of some set of linear functions. A local approximation of a concave function in terms of the maximum of linear functions seems very unnatural and hardly deserves to be called an approximation. At the same time, since the Clarke derivative is always sublinear, it approximates any function (regardless of its algebraic nature) by the maximum of linear functions. Thus, the Clarke derivative is too rough to be used for local approximations, since it often gives unrealistic results.
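A short computation illustrates the preceding discussion for the concave function f(x) = −|x^(1)| − |x^(2)| at zero: the first-order model built from f'(0,g) is exact, while the one built from the Clarke derivative f°(0,g) is off by a quantity proportional to α. The sketch is not from the book; the function and the step sizes are chosen only for illustration.

```python
import numpy as np

def approx_error(f, x, g, d_value, alphas):
    """max over the given steps of |f(x + a g) - f(x) - a * d_value| / a."""
    return max(abs(f(x + a * g) - f(x) - a * d_value) / a for a in alphas)

if __name__ == "__main__":
    # Concave f(x) = -|x1| - |x2| at x = 0:
    #   directional derivative f'(0,g) = -|g1| - |g2|  (min over the superdifferential),
    #   Clarke upper derivative f°(0,g) = |g1| + |g2|  (max over the same set).
    f = lambda z: -abs(z[0]) - abs(z[1])
    x, g = np.zeros(2), np.array([1.0, -0.5])
    alphas = [10.0 ** (-k) for k in range(1, 7)]
    d_dir = -abs(g[0]) - abs(g[1])
    d_clarke = abs(g[0]) + abs(g[1])
    print(approx_error(f, x, g, d_dir, alphas))     # ~0: exact first-order model
    print(approx_error(f, x, g, d_clarke, alphas))  # ~2(|g1|+|g2|): not o(alpha)
```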
Example 8.4. Consider an even function f (i.e., a function such that f(y) = f(−y)) which is Lipschitz in some neighborhood of the point x = 0. Let f be differentiable at some point y. Then

(∇f(y),g) = lim_{α→+0} (1/α)[f(y+αg) − f(y)] = lim_{α→+0} (1/α)[f(−y+α(−g)) − f(−y)].

It follows from this equality that f is differentiable at the point y₁ = −y and ∇f(−y) = −∇f(y). Using this property and Theorem 8.1, we conclude that the set ∂_Cl f(0) is symmetric, i.e., if v ∈ ∂_Cl f(0), then (−v) ∈ ∂_Cl f(0). The symmetry of this set can be expressed in the form

∂_Cl f(0) = −∂_Cl f(0).

It therefore follows from Corollary 8.2 that

∂_Cl f(0) = ∂_Cl(−f)(0).

This implies that the Clarke derivatives of the functions f and −f at zero coincide:

f°(0,g) = (−f)°(0,g).

This result shows that it is impossible to develop a full analogue of Differential Calculus based on the Clarke derivative. Indeed, the formula

(−f)°(x,g) = −f°(x,g)

certainly does not hold for all functions. Let f be an even function. Then

∂_Cl f(0) + ∂_Cl(−f)(0) = 2∂_Cl f(0),

whereas ∂_Cl(f+(−f))(0) = {0}. Thus the Clarke subdifferential of a sum is not always equal to the sum of the Clarke subdifferentials. Only a few analogues of the rules of Differential Calculus hold for the Clarke subdifferential (and derivative). For example, we have

∂_Cl(f₁+f₂)(x) ⊂ ∂_Cl f₁(x) + ∂_Cl f₂(x).

The sets on the left- and right-hand sides of this inclusion may differ quite markedly. For example, let f₁(x) = R||x|| and f₂(x) = −f₁(x), where R is a positive real number. Since f₁ is a convex function, we have

∂_Cl f₁(0) + ∂_Cl f₂(0) = B_{2R},

i.e., the ball of radius 2R centered at zero, where R can be as large as desired. At the same time,

∂_Cl(f₁+f₂)(0) = {0}.
The problems related to analogues of Differential Calculus for the Clarke subdifferentials have been treated quite extensively (see, e.g., [31,137]). In particular, it has been shown that there are some classes of functions for which the equality

∂_Cl(f₁+f₂)(x) = ∂_Cl f₁(x) + ∂_Cl f₂(x)

can be guaranteed to hold. We shall not go into details here.

The derivative discussed above can be called the Clarke upper derivative. It is also possible to introduce the Clarke lower derivative. By definition,

f_o(x,g) = lim inf_{y→x, α→+0} (1/α)[f(y+αg) − f(y)].

Applying Proposition 8.4 and the equality

lim inf_{y→x, α→+0} (1/α)[f(y+αg) − f(y)] = −lim sup_{y→x, α→+0} (1/α)[(−f)(y+αg) − (−f)(y)],

we obtain

f_o(x,g) = −(−f)°(x,g) = −f°(x,−g),

and hence

f_o(x,g) = −max_{v ∈ ∂_Cl f(x)} (v,−g) = min_{v ∈ ∂_Cl f(x)} (v,g).

Thus the Clarke lower derivative is a superlinear function of direction whose superdifferential coincides with the Clarke subdifferential ∂_Cl f(x).
Chapter 9 UPPER CONVEX AND LOWER CONCAVE APPROXIMATIONS

9.1. Let a function f be defined on an open set X ⊂ R^n and be directionally differentiable at a point x ∈ X. In what follows it is assumed that the derivative f'_x is continuous (as a function of direction). A sublinear function p defined on R^n and such that

p(g) ≥ f'_x(g)   ∀ g ∈ R^n

is called an upper convex approximation (u.c.a.) of the function f at the point x.

It follows immediately from the definition that if p is an u.c.a. of f at x, then

f(x+αg) ≤ f(x) + αp(g) + o_g(α),     (9.1)

where o_g(α)/α → 0 as α → +0.

Remark 9.1. The term "upper convex approximation" is not sufficiently precise, since the function f(x) + αp(g) approximates f(x+αg) only if p is (in some sense) close to f'_x. However, in the case where f'_x is not convex, it is impossible for an u.c.a. to be close to the function f'_x in all directions simultaneously. Therefore, it would be better to use in this case the term "convex majorant."

It is natural to consider the symmetric concept of a lower concave approximation. A superlinear function q defined on R^n and such that

q(g) ≤ f'_x(g)   ∀ g ∈ R^n

is called a lower concave approximation (l.c.a.) of the function f at the point x ∈ X.
Of course, as in the case of the u.c.a., it would be more correct to refer to a l.c.a. as a "concave minorant" of f at x.

If a sublinear function p is an u.c.a. of a function f at a point x, then its subdifferential ∂p represents a linearization of the derivative f'_x at this point. In other words, ∂p provides a local approximation of f at x by a maximization. Clearly, this approximation is more precise the closer p is to f'_x. In the same way, if a superlinear function q is a l.c.a. of f at x, then its superdifferential ∂̄q gives a local approximation of f at this point by a minimization.
9.2. Let us consider some examples of upper convex approximations and of lower concave approximations. It is assumed that all functions discussed below are defined on an open set X ⊂ R^n and that x ∈ X.

Example 9.1. Let f be a differentiable function, i.e., its directional derivative f'_x is a scalar product: there exists a vector z such that f'_x(g) = (z,g). It is clear that f'_x is both an u.c.a. and a l.c.a. of the function f at the point x. It is not difficult to show that a sublinear function p is an u.c.a. if and only if its subdifferential contains z, and that a superlinear function q is a l.c.a. if and only if z ∈ ∂̄q.

Example 9.2. Let f be a convex function. Then it has a directional derivative f'_x at x, which is a sublinear function such that ∂f'_x = ∂f(x), where ∂f(x) is the subdifferential of the function f at the point x.

It is clear that f'_x is an u.c.a. of f at x. A sublinear function p is an u.c.a. of f at x if and only if ∂p ⊃ ∂f(x). Any element v ∈ ∂f(x) is a l.c.a. of f at x. It is possible to show that a superlinear function q is a l.c.a. of f at x if and only if ∂̄q contains at least one element of ∂f(x). Observe that in this example the minimal upper convex approximation f'_x exists but the maximal l.c.a. does not. In other words, the inequality

p(g) ≥ f'_x(g)   ∀ g ∈ R^n

holds for any upper convex approximation p, but there is no lower concave approximation q̄ such that

q̄(g) ≥ q(g)   ∀ g ∈ R^n,  ∀ q ∈ Q,

where Q is the family of lower concave approximations of the function f at x.

Example 9.3. Let f be a concave function. Then the derivative f'_x exists, is continuous and superlinear. Moreover,

∂̄f'_x = ∂̄f(x),
where ∂̄f(x) is the superdifferential of f at x. It is possible to prove that a sublinear function p is an u.c.a. if and only if

∂p ∩ ∂̄f(x) ≠ ∅,

and a superlinear function q is a l.c.a. if and only if ∂̄q ⊃ ∂̄f(x). A concave function has a maximal lower concave approximation f'_x but no minimal upper convex approximation.

Example 9.4. Let a function f be Lipschitz in a neighborhood of a point x ∈ X and let f°(x,g) be the Clarke derivative (more precisely, the Clarke upper derivative) of the function f at x in a direction g (see Chapter 8). Assuming that f is directionally differentiable at x, it is easy to see that

f°(x,g) ≥ f'_x(g).

Since the function g → f°(x,g) is also sublinear and continuous, this function is an u.c.a. of f at x. Thus, the Clarke subdifferential ∂_Cl f(x) is only one of many sets which provide a local approximation of f at x. The Clarke lower derivative f_o(x,g) can be taken as a l.c.a. (see Chapter 8).
9.3. Since an individual u.c.a. is not a satisfactory approximation of a function, it is appealing to consider families of u.c.a.'s which provide an exact local approximation.

Let Λ be an arbitrary set. A family {p_λ | λ ∈ Λ}, where each p_λ is an u.c.a. of a function f at a point x, is called an exhaustive family of u.c.a.'s of f at x if

inf_{λ∈Λ} p_λ(g) = f'_x(g)   ∀ g ∈ R^n,

in other words, if

f(x+αg) = f(x) + α inf_{λ∈Λ} p_λ(g) + o_g(α)   ∀ g ∈ R^n,

where o_g(α)/α → 0 as α → +0.

Analogously, a family {q_λ | λ ∈ Λ}, where each q_λ is a l.c.a. of a function f at a point x, is called an exhaustive family of l.c.a.'s of f at x if

sup_{λ∈Λ} q_λ(g) = f'_x(g)   ∀ g ∈ R^n,

i.e., if

f(x+αg) = f(x) + α sup_{λ∈Λ} q_λ(g) + o_g(α)   ∀ g ∈ R^n.

The existence of an exhaustive family of u.c.a.'s means that the directional derivative f'_x can be linearized (described by linear functions) by means of the "minimax" operation:

f'_x(g) = inf_{λ∈Λ} max_{v ∈ ∂p_λ} (v,g).

The existence of an exhaustive family of l.c.a.'s implies that f'_x can be linearized by means of the "maximin" operation:

f'_x(g) = sup_{λ∈Λ} min_{v ∈ ∂̄q_λ} (v,g).
Theorem 9.1. Let a function f be defined on an open set X ⊂ R^n and be directionally differentiable at a point x ∈ X. Assume also that the derivative f'_x is continuous (as a function of direction). Then the function f has exhaustive families of u.c.a.'s and l.c.a.'s at the point x.

This theorem follows immediately from the next result.

Theorem 9.2. Let a function h be defined, positively homogeneous and continuous on R^n. Then there exist a family of sublinear functions {p_λ | λ ∈ Λ} and a family of superlinear functions {q_λ | λ ∈ Λ} such that

h(g) = inf_{λ∈Λ} p_λ(g),   h(g) = sup_{λ∈Λ} q_λ(g).     (9.2)

The same set of indices Λ may be chosen for all h.
P r o o f. Suppose that for any continuous positively homogeneous function h there exists a family of superlinear functions {q^h_λ | λ ∈ Λ} such that the second equality in (9.2) is satisfied. Then

(−h)(g) = sup_{λ∈Λ} q^{−h}_λ(g),   h(g) = inf_{λ∈Λ} (−q^{−h}_λ(g)).

Thus, the first equality in (9.2) holds for the family {p_λ | λ ∈ Λ} of sublinear functions (where p_λ = −q^{−h}_λ). This implies that it is sufficient to prove the existence of a family of superlinear functions.

First let us show that the function h is bounded on the unit sphere S = {x | ||x|| = 1}. Since h is continuous at zero, there exists an η > 0 such that

|h(x)| ≤ 1   ∀ x : ||x|| ≤ η.

Therefore, if ||x|| = 1, then |h(ηx)| ≤ 1 and hence

sup_{x∈S} |h(x)| ≤ 1/η < +∞.

Let us put Λ = S × (0,1) and construct a family of functions {ℓ_λ | λ ∈ Λ} defined on S and such that

h(g) = sup_{λ∈Λ} ℓ_λ(g)   ∀ g ∈ S.

Let λ ∈ Λ, i.e., λ = [u,ε], where u ∈ S, ε ∈ (0,1). Since the function h is continuous at u, for the given ε there exists a δ > 0 such that

h(g) > h(u) − ε   if ||g−u|| ≤ δ.

Take

ℓ_λ(g) = −(1/δ²)(h(u)−ε−M) ||g−u||² + (h(u)−ε)   ∀ g ∈ S,

where M is a number such that

inf_{g∈S} h(g) > M + 1;   (1 − δ²/2)(h(u) − ε) > M.     (9.3)
Check that ℓ_λ has the following properties:

(1) ℓ_λ(u) = h(u) − ε (this follows from the definition of ℓ_λ);
(2) ℓ_λ(g) ≤ h(g)   ∀ g ∈ S.

Indeed, if ||u−g|| ≤ δ, then h(g) > h(u) − ε ≥ ℓ_λ(g). Here we use the inequality

h(u) − ε > inf_{g∈S} h(g) − 1 > M,

which may be derived from the definition of M.

If ||u−g|| ≥ δ, then

ℓ_λ(g) = −(||u−g||²/δ²)(h(u)−ε−M) + (h(u)−ε) ≤ −(h(u)−ε−M) + (h(u)−ε) = M < h(g).

It follows from properties (1) and (2) that

h(g) = sup_{λ∈Λ} ℓ_λ(g)   ∀ g ∈ S.

Let λ = [u,ε] ∈ Λ. Since ||u||² = ||g||² = 1, we have ||u−g||² = 2 − 2(u,g). Therefore

ℓ_λ(g) = −(1/δ²)(h(u)−ε−M)(2 − 2(u,g)) + (h(u)−ε) = β₁(u,g) + β₂,

where

β₁ = (2/δ²)(h(u)−ε−M);   β₂ = −(2/δ²)[(1 − δ²/2)(h(u)−ε) − M].     (9.4)

It follows from (9.3) that

β₁ > 0,   β₂ < 0.

Let us consider the function

q_λ(g) = β₁(u,g) + β₂||g||,     (9.5)

where the coefficients β₁ and β₂ are defined by equalities (9.4). Since β₂ < 0, the functional q_λ is superlinear. If g ∈ S, then q_λ(g) = ℓ_λ(g), and therefore the equality

h(g) = sup_{λ∈Λ} q_λ(g)

holds for g ∈ S. Since the functionals h and q_λ are positively homogeneous, this equality holds for all g ∈ R^n. □
Remark 9.2. The proof of this theorem indicates a means of constructing a l.c.a. which approximates a function f at a point x in a direction g, ||g|| = 1. This l.c.a. is of the form (9.5), where the coefficients β₁ and β₂ are given by formula (9.4) with h = f'_x. Here ε is the accuracy of the approximation, and the numbers δ and M are related to f'_x and ε (see the proof). A sketch of this construction is given below.

Remark 9.3. The proof is based on the ideas of supremal generation proposed in [88].
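The following sketch (not from the book) turns Remark 9.2 into a small computation: for a positively homogeneous h, a unit vector u and hand-chosen constants ε, δ, M satisfying (9.3), it builds the superlinear minorant (9.5) and checks the two properties used in the proof on random unit directions. The sample function and constants are assumptions.

```python
import numpy as np

def lca(h, u, eps, delta, M):
    """Lower concave approximation q(g) = beta1*(u,g) + beta2*||g|| built as in
    (9.4)-(9.5); a sketch, not the book's code."""
    hu = h(u)
    beta1 = (2.0 / delta**2) * (hu - eps - M)
    beta2 = -(2.0 / delta**2) * ((1.0 - delta**2 / 2.0) * (hu - eps) - M)
    return lambda g: beta1 * float(np.dot(u, g)) + beta2 * float(np.linalg.norm(g))

if __name__ == "__main__":
    h = lambda g: abs(g[0]) - abs(g[1])      # a nonconvex positively homogeneous function
    u = np.array([1.0, 0.0])
    # eps, delta, M chosen by hand so that h(g) > h(u) - eps on {g in S : ||g-u|| <= delta}
    # and both inequalities in (9.3) hold for this particular h and u.
    eps, delta, M = 0.8, 0.5, -3.0
    q = lca(h, u, eps, delta, M)
    rng = np.random.default_rng(1)
    dirs = rng.normal(size=(1000, 2))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    print(max(q(g) - h(g) for g in dirs) <= 1e-9,   # q is a minorant of h on the sphere
          abs(q(u) - (h(u) - eps)) < 1e-12)         # and q(u) = h(u) - eps
```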
9.4. Finally, some remarks on the calculus of exhaustive families of u.c.a.'s and l.c.a.'s.

(a) Let {p_λ | λ ∈ Λ} and {q_μ | μ ∈ M} be exhaustive families of u.c.a.'s and l.c.a.'s (respectively) of a function f at a point x. Consider the function f₁ = cf, where c is a real number.

If c > 0, then the families {cp_λ | λ ∈ Λ} and {cq_μ | μ ∈ M} are exhaustive families of u.c.a.'s and l.c.a.'s of f₁ at x.

If c < 0, then the family {cq_μ | μ ∈ M} is an exhaustive family of u.c.a.'s of f₁ at x. Indeed,

(f₁)'_x(g) = −|c| f'_x(g) = −|c| sup_{μ∈M} q_μ(g) = |c| inf_{μ∈M} (−q_μ(g)) = inf_{μ∈M} cq_μ(g).

Analogously, it can be shown that the family {cp_λ | λ ∈ Λ} is an exhaustive family of l.c.a.'s of f₁ at x.

(b) Let {p'_λ | λ ∈ Λ} and {p''_μ | μ ∈ M} be exhaustive families of u.c.a.'s of functions f₁ and f₂, respectively, at a point x. For [λ,μ] ∈ Λ × M let us put

p_{λμ}(g) = p'_λ(g) + p''_μ(g).

Also, let f = f₁ + f₂. Then

f'_x(g) = (f₁)'_x(g) + (f₂)'_x(g) = inf_{λ∈Λ} p'_λ(g) + inf_{μ∈M} p''_μ(g) = inf_{λ∈Λ, μ∈M} [p'_λ(g) + p''_μ(g)] = inf_{[λ,μ]∈Λ×M} p_{λμ}(g),

i.e., the family {p_{λμ} | [λ,μ] ∈ Λ×M} is an exhaustive family of u.c.a.'s of the function f at the point x. We can construct an exhaustive family of l.c.a.'s in the same way.
(c) Using (a) and (b), it is not difficult to write down formulas for computing exhaustive families of u.c.a.'s and l.c.a.'s for a linear combination c₁f₁ + c₂f₂, the product f₁f₂, and the quotient f₁/f₂. Of course, these formulas are of practical interest only if we know exhaustive families of u.c.a.'s and l.c.a.'s of the functions f₁ and f₂; such families, however, can usually be constructed without difficulty.

The concepts of an u.c.a. and a l.c.a. are very convenient in the study of extremal problems. We shall illustrate this by the following example. Suppose that f is defined on an open set X ⊂ R^n and is directionally differentiable at x* ∈ X. For the point x* to be a minimum point of f on X, it is necessary that

f'_{x*}(g) ≥ 0   ∀ g ∈ R^n.

If p is an u.c.a. of f at x*, then p ≥ f'_{x*} and therefore

p(g) ≥ 0   ∀ g ∈ R^n.     (9.6)

This inequality is equivalent to the inclusion 0 ∈ ∂p. Thus, a necessary condition for a minimum has a very simple and natural form: the subdifferential ∂p must contain zero.

If {p_λ | λ ∈ Λ} is an exhaustive family of u.c.a.'s, then the necessary condition (9.6) is equivalent to the condition 0 ∈ ∂p_λ ∀ λ ∈ Λ. Analogously, a necessary condition for a maximum can be expressed in terms of l.c.a.'s. Thus the problem of studying necessary conditions is reduced to that of finding u.c.a.'s (or l.c.a.'s) and exhaustive families of these approximations.

In the next chapter we shall describe one sufficiently rich class of functions for which exhaustive families of u.c.a.'s and l.c.a.'s can be constructed in a rather simple canonical way (and with relative ease).
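The necessary condition 0 ∈ ∂p from (9.6) can be tested numerically when ∂p is the convex hull of finitely many points: zero belongs to the hull iff a feasibility linear program has a solution. The sketch below assumes scipy is available; the example subdifferentials are illustrative, not taken from the book.

```python
import numpy as np
from scipy.optimize import linprog

def zero_in_hull(points, tol=1e-9):
    """Check 0 in co{points}: look for weights lam >= 0 with sum lam = 1 and
    sum lam_i * points_i = 0 (a feasibility LP).  A sketch only."""
    P = np.asarray(points, dtype=float)        # rows = extreme points of dp
    m, n = P.shape
    A_eq = np.vstack([P.T, np.ones((1, m))])   # sum lam_i p_i = 0, sum lam_i = 1
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success and np.linalg.norm(A_eq @ res.x - b_eq) < tol

if __name__ == "__main__":
    # u.c.a. of f(x) = |x1| + |x2| at 0:  dp = co{(+-1, +-1)}  ->  0 in dp
    print(zero_in_hull([(1, 1), (1, -1), (-1, 1), (-1, -1)]))   # True
    # an u.c.a. at a non-stationary point, e.g. dp = co{(1,0),(2,1)}  ->  0 not in dp
    print(zero_in_hull([(1, 0), (2, 1)]))                        # False
```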
Chapter 10 QUASIDIFFERENTIABLE FUNCTIONS

10.1. A function f defined on an open set X ⊂ R^n is said to be quasidifferentiable at a point x ∈ X if it is directionally differentiable at this point and if its derivative df(x)/dg can be represented in the form

df(x)/dg = f'_x(g) = max_{v∈V} (v,g) + min_{w∈W} (w,g),     (10.1)

where V and W are convex compact sets in R^n.

Remark 10.1. It is possible to show that representation (10.1) is equivalent to the following property: there exist bounded sets Z₁ and Z₂ in the space R^n such that

df(x)/dg = sup_{v∈Z₁} (v,g) + inf_{w∈Z₂} (w,g).

Let a function f be quasidifferentiable at a point x (i.e., representation (10.1) holds). Put

p(g) = max_{v∈V} (v,g);   q(g) = min_{w∈W} (w,g).

Clearly, the function p is sublinear and the function q is superlinear. Thus, the directional derivative of a quasidifferentiable function can be represented as the sum of a sublinear function and a superlinear function, and therefore the derivative is continuous as a function of direction. Moreover, this derivative is Lipschitz, since both sublinear and superlinear functions are Lipschitz (see Chapter 5).

It is clear, of course, that if the derivative f'_x can be represented as the sum of a sublinear function p and a superlinear function q, then f is quasidifferentiable at x; in formula (10.1) we can take the subdifferential ∂p of the function p as V and the superdifferential ∂̄q of the function q as W.
Let us establish a relation between quasidifferentiability and the concepts of upper convex and lower concave approximations.

Theorem 10.1. Let a function f be defined on an open set X ⊂ R^n and be directionally differentiable at a point x ∈ X. The following propositions are equivalent:

(a) the function f is quasidifferentiable at the point x;
(b) the function f has an exhaustive family of u.c.a.'s {p_w | w ∈ W}, where p_w = w + p, W is a convex compact set, and p is a sublinear function;
(c) the function f has an exhaustive family of l.c.a.'s {q_v | v ∈ V}, where q_v = v + q, V is a convex compact set, and q is a superlinear function.

P r o o f. (a) ⟺ (b). Let f be quasidifferentiable at x, so that relation (10.1) holds. Put

p(g) = max_{v∈V} (v,g).

The function p is sublinear and therefore for any w ∈ W the function p_w = w + p is also sublinear. Thus

f'_x(g) = max_{v∈V} (v,g) + min_{w∈W} (w,g) = min_{w∈W} [(w,g) + p(g)] = min_{w∈W} p_w(g),

i.e., the family {p_w | w ∈ W} is an exhaustive family of u.c.a.'s. Conversely, if {p_w | w ∈ W} is an exhaustive family of u.c.a.'s (where W is a convex compact set and p_w = w + p), then

f'_x(g) = min_{w∈W} [(w,g) + p(g)] = p(g) + min_{w∈W} (w,g) = p(g) + q(g).

Since the function q(g) = min_{w∈W} (w,g) is superlinear, the function f is quasidifferentiable at x. The equivalence (a) ⟺ (c) may be proved in an analogous way. □
10.2. The nonuniqueness of the pair of sets defining the directional derivative of a quasidifferentiable function seems quite natural if viewed in terms of the space of convex sets. Indeed, quasidifferentiability means that the derivative f'_x can be represented as the sum of a sublinear function and a superlinear function, i.e., belongs to the space L (see Chapter 6). But, as observed in Chapter 6, every function from L is associated with a class of pairs of convex compact sets. Two pairs [A₁, B₁] and [A₂, B₂] belong to the same class if and only if they are equivalent, i.e.,

A₁ − B₂ = A₂ − B₁.     (10.2)
Definition 10.1. Let a function f be quasidifferentiable at a point x. The class of equivalent pairs of sets [A,B] such that

f'_x(g) = max_{v∈A} (v,g) + min_{w∈B} (w,g)   ∀ g ∈ R^n

is called the quasidifferential (QD) of the function f at the point x and is denoted by Df(x). Any pair of sets belonging to this class is also called a quasidifferential of f at x and is denoted by Df(x). This ambiguous usage of Df(x) should not cause confusion.

Remark 10.2. Using the terminology of the space of convex sets, Df(x) is just an element a of the space of convex sets such that ψ(a) = f'_x, where ψ is the mapping defined by formula (6.5). It is also a pair of sets which belongs to the class of a.

If Df(x) = [V,W], then the set V is called a subdifferential of the function f at the point x and is denoted by ∂f(x), and the set W is called a superdifferential of f at x and is denoted by ∂̄f(x). Note that these sets must be considered as a pair Df(x) = [∂f(x), ∂̄f(x)] (a quasidifferential): they cannot be studied and used separately. By definition, a quasidifferential of f at x is a pair of convex sets such that

f'_x(g) = max_{v∈∂f(x)} (v,g) + min_{w∈∂̄f(x)} (w,g)   ∀ g.

This pair is defined up to the equivalence relation (described by (10.2)).

10.3. MAIN FORMULAS OF QUASIDIFFERENTIAL CALCULUS. It is understood that algebraic operations over quasidifferentials are performed in the same way as operations over elements of the space of convex sets (or, equivalently, over pairs of sets). Recall that (see Chapter 6)

[A₁,B₁] + [A₂,B₂] = [A₁+A₂, B₁+B₂],
c[A,B] = [cA,cB]  if c ≥ 0,   c[A,B] = [cB,cA]  if c < 0.     (10.3)

In the statements below, all functions are defined on an open set X ⊂ R^n containing the point x.
Theorem 10.2. (i) If f₁ and f₂ are quasidifferentiable at x, then f = f₁ + f₂ is quasidifferentiable at x and

D(f₁+f₂)(x) = Df₁(x) + Df₂(x).     (10.4)

In other words, if Df₁(x) = [∂f₁(x), ∂̄f₁(x)] and Df₂(x) = [∂f₂(x), ∂̄f₂(x)], then

Df(x) = [∂f(x), ∂̄f(x)],

where

∂f(x) = ∂f₁(x) + ∂f₂(x);   ∂̄f(x) = ∂̄f₁(x) + ∂̄f₂(x).

(ii) If f₁ is quasidifferentiable at x and c is a real number, then f = cf₁ is quasidifferentiable at x and

Df(x) = D(cf₁)(x) = cDf₁(x).     (10.5)

In other words, if Df₁(x) = [∂f₁(x), ∂̄f₁(x)], then

Df(x) = [∂f(x), ∂̄f(x)],

where

∂f(x) = c∂f₁(x), ∂̄f(x) = c∂̄f₁(x)   if c ≥ 0;
∂f(x) = c∂̄f₁(x), ∂̄f(x) = c∂f₁(x)   if c < 0.

(iii) If f₁ and f₂ are quasidifferentiable at x, then f = f₁f₂ is quasidifferentiable at x and

Df(x) = f₁(x)Df₂(x) + f₂(x)Df₁(x).     (10.6)

In other words, if Df₁(x) = [∂f₁(x), ∂̄f₁(x)] and Df₂(x) = [∂f₂(x), ∂̄f₂(x)], then

Df(x) = [∂f(x), ∂̄f(x)],
where

∂f = f₁∂f₂ + f₂∂f₁,  ∂̄f = f₁∂̄f₂ + f₂∂̄f₁   if f₁ ≥ 0, f₂ ≥ 0;
∂f = f₁∂f₂ + f₂∂̄f₁,  ∂̄f = f₁∂̄f₂ + f₂∂f₁   if f₁ ≥ 0, f₂ ≤ 0;
∂f = f₁∂̄f₂ + f₂∂f₁,  ∂̄f = f₁∂f₂ + f₂∂̄f₁   if f₁ ≤ 0, f₂ ≥ 0;
∂f = f₁∂̄f₂ + f₂∂̄f₁,  ∂̄f = f₁∂f₂ + f₂∂f₁   if f₁ ≤ 0, f₂ ≤ 0.

Here the values of the functions and of their subdifferentials and superdifferentials are computed at the point x.

(iv) If f₁ is quasidifferentiable at x and f₁(x) ≠ 0, then f = 1/f₁ is quasidifferentiable at x and

Df(x) = −(1/f₁²(x)) Df₁(x).

In other words, if Df₁(x) = [∂f₁(x), ∂̄f₁(x)], then

Df(x) = [∂f(x), ∂̄f(x)],

where

∂f(x) = −(1/f₁²(x)) ∂̄f₁(x),   ∂̄f(x) = −(1/f₁²(x)) ∂f₁(x).
P r o o f. We shall prove only properties (i) and (ii). Properties (iii) and (iv) can be proved in an analogous manner.

(i) Since f₁ and f₂ are quasidifferentiable at x, there exist derivatives df₁(x)/dg and df₂(x)/dg, where

df₁(x)/dg = max_{v∈∂f₁(x)} (v,g) + min_{w∈∂̄f₁(x)} (w,g),
df₂(x)/dg = max_{v∈∂f₂(x)} (v,g) + min_{w∈∂̄f₂(x)} (w,g).

Since the directional derivative of a sum is equal to the sum of the directional derivatives, we have

df(x)/dg = df₁(x)/dg + df₂(x)/dg
= max_{v∈∂f₁(x)} (v,g) + max_{v∈∂f₂(x)} (v,g) + min_{w∈∂̄f₁(x)} (w,g) + min_{w∈∂̄f₂(x)} (w,g)
= max_{v∈[∂f₁(x)+∂f₂(x)]} (v,g) + min_{w∈[∂̄f₁(x)+∂̄f₂(x)]} (w,g).     (10.7)

But (10.7) implies (10.4). This proves (i).

(ii) Let f = cf₁, where f₁ is quasidifferentiable at x. If c ≥ 0, then

df(x)/dg = c df₁(x)/dg = c (max_{v∈∂f₁(x)} (v,g) + min_{w∈∂̄f₁(x)} (w,g))
= max_{v∈[c∂f₁(x)]} (v,g) + min_{w∈[c∂̄f₁(x)]} (w,g).     (10.8)

If c < 0, then

df(x)/dg = c df₁(x)/dg = |c| (−max_{v∈∂f₁(x)} (v,g) − min_{w∈∂̄f₁(x)} (w,g))
= |c| (min_{v∈[−∂f₁(x)]} (v,g) + max_{w∈[−∂̄f₁(x)]} (w,g))
= max_{w∈[c∂̄f₁(x)]} (w,g) + min_{v∈[c∂f₁(x)]} (v,g).     (10.9)

Relation (10.5) now follows from (10.8) and (10.9). □
Remark 10.3. Theorem 10.2 is an immediate corollary of the rules for computing directional derivatives and of the fact that the mapping ψ defined by (6.5) is linear. (Recall that this mapping is a one-to-one correspondence between the space M of convex sets and the space L of functions which can be represented as the sum of a sublinear function and a superlinear function.)

Theorem 10.3. Let an index set I = 1:n be given, let functions ψ₁,...,ψ_n be quasidifferentiable at x, and let

f₁(y) = max_{i∈I} ψ_i(y)   ∀ y ∈ X,     (10.10)
f₂(y) = min_{i∈I} ψ_i(y)   ∀ y ∈ X.     (10.11)

Then the functions f₁ and f₂ are quasidifferentiable at x and

Df₁(x) = [∂f₁(x), ∂̄f₁(x)],   Df₂(x) = [∂f₂(x), ∂̄f₂(x)],

where

∂f₁(x) = co ∪_{k∈R(x)} [∂ψ_k(x) − Σ_{i∈R(x), i≠k} ∂̄ψ_i(x)],   ∂̄f₁(x) = Σ_{k∈R(x)} ∂̄ψ_k(x),     (10.12)

∂f₂(x) = Σ_{k∈Q(x)} ∂ψ_k(x),   ∂̄f₂(x) = co ∪_{k∈Q(x)} [∂̄ψ_k(x) − Σ_{i∈Q(x), i≠k} ∂ψ_i(x)].     (10.13)

Here

R(x) = {i ∈ I | ψ_i(x) = f₁(x)},   Q(x) = {i ∈ I | ψ_i(x) = f₂(x)}.

P r o o f. We shall confine ourselves to considering the function f₁. It follows from Proposition 3.5 that this function is directionally differentiable:

df₁(x)/dg = max_{i∈R(x)} dψ_i(x)/dg.

Formula (10.12) may then be obtained using formula (6.10). □
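The rules (10.3)-(10.6) and Theorem 10.3 become very concrete when the sub- and superdifferentials are polytopes stored by their vertex lists. The following sketch is an illustration under that assumption, not the book's code: it implements the sum, the scalar multiple and the maximum of two quasidifferentiable functions on such pairs and recovers the directional derivative (cf. Examples 11.9-11.11 below).

```python
import numpy as np
from itertools import product

# A quasidifferential is stored as a pair (V, W) of finite vertex lists whose
# convex hulls are the sub- and superdifferential; redundant vertices are not pruned.

def qd_sum(qd1, qd2):
    """D(f1+f2) = Df1 + Df2  (Minkowski sums of the vertex sets)."""
    (V1, W1), (V2, W2) = qd1, qd2
    V = [tuple(np.add(v1, v2)) for v1, v2 in product(V1, V2)]
    W = [tuple(np.add(w1, w2)) for w1, w2 in product(W1, W2)]
    return V, W

def qd_scale(c, qd):
    """D(c*f) = c*Df, with the swap of the pair for c < 0  (formula (10.3))."""
    V, W = qd
    cV = [tuple(c * np.asarray(v)) for v in V]
    cW = [tuple(c * np.asarray(w)) for w in W]
    return (cV, cW) if c >= 0 else (cW, cV)

def qd_max2(qd1, qd2):
    """D(max{f1,f2}) at a point where f1 = f2 (both active), Theorem 10.3."""
    (V1, W1), (V2, W2) = qd1, qd2
    V = [tuple(np.subtract(v, w)) for v, w in product(V1, W2)] + \
        [tuple(np.subtract(v, w)) for v, w in product(V2, W1)]
    W = [tuple(np.add(w1, w2)) for w1, w2 in product(W1, W2)]
    return V, W

def deriv(qd, g):
    """f'(x,g) = max over co V + min over co W, computed on the vertices."""
    V, W = qd
    return max(np.dot(v, g) for v in V) + min(np.dot(w, g) for w in W)

if __name__ == "__main__":
    # f1 = |x1| and -|x2| at the origin.
    qd_abs1 = ([(1.0, 0.0), (-1.0, 0.0)], [(0.0, 0.0)])
    qd_neg_abs2 = qd_scale(-1.0, ([(0.0, 1.0), (0.0, -1.0)], [(0.0, 0.0)]))
    qd_f = qd_sum(qd_abs1, qd_neg_abs2)              # f = |x1| - |x2|
    qd_f3 = qd_max2(qd_f, qd_scale(-1.0, qd_f))      # f3 = | |x1| - |x2| |
    g = np.array([0.3, -2.0])
    print(deriv(qd_f, g),  abs(g[0]) - abs(g[1]))        # both equal f'(0,g)
    print(deriv(qd_f3, g), abs(abs(g[0]) - abs(g[1])))   # both equal f3'(0,g)
```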
Remark 10.4. Another proof of this theorem, which does not involve the space of convex sets, is given in Example 12.3.

Remark 10.5. It follows from the above results that the set of quasidifferentiable functions is a linear space, an algebra (with respect to pointwise multiplication) and a lattice (with respect to the natural ordering relation). In other words, the set of quasidifferentiable functions is a linear space closed with respect to all the algebraic operations listed above and with respect to the operations of taking a pointwise maximum and minimum.
10.4. Now let us show that under some natural assumptions the integral of a summable family of quasidifferentiable functions is also quasidifferentiable. Consider some space (S, Σ, m), where the measure m is probabilistic, i.e., m ≥ 0, m(S) = 1.

Let X be an open set in R^n and let a function f_s be associated with every s ∈ S, where f_s is defined on X and is quasidifferentiable at x ∈ X. Assume that for every y ∈ X the function s → f_s(y) is summable. Let

f(y) = ∫_S f_s(y) dm.     (10.14)

We shall now find conditions under which the function f is quasidifferentiable and describe its quasidifferential.

Let every s ∈ S be associated with a sublinear function p_s defined on R^n and let the function s → p_s(g) be measurable for every g ∈ R^n. (In this case the mapping s → p_s is called weakly measurable.) It is not difficult to show that in this case the function s → ||p_s|| = max_{||g||=1} |p_s(g)| is also measurable; the continuity of p_s implies that ||p_s|| < +∞. Assume that the function s → ||p_s|| is summable. Since

|p_s(g)| ≤ ||p_s|| ||g||   ∀ g ∈ R^n,

the function s → p_s(g) is also summable for every g. Let

p(g) = ∫_S p_s(g) dm.     (10.15)

Clearly, the function p is sublinear. Putting ∫_S ||p_s|| dm = c, we conclude that

|p(g)| ≤ c||g||   ∀ g ∈ R^n,

which implies that the function p is also continuous.

Theorem 10.4 (von Strassen). A vector v ∈ R^n belongs to the subdifferential ∂p of the function p defined by (10.15) if and only if

(v,g) = ∫_S (v_s,g) dm   ∀ g ∈ R^n,

where s → v_s is a weakly measurable mapping and v_s ∈ ∂p_s for all s ∈ S.

This theorem was proved for the infinite-dimensional case in [137]. A proof can also be found in [64]. More general results related to this problem are described in [64].
Let us consider the function f_s (see (10.14)) for every s ∈ S. Since this function is quasidifferentiable at x, the relation

f_s(x+αg) = f_s(x) + α df_s(x)/dg + o_{s,x,g}(α)

holds for all g ∈ R^n and α > 0, where lim_{α→+0} o_{s,x,g}(α)/α = 0, and the function (f_s)'_x(g) = df_s(x)/dg can be represented in the form

(f_s)'_x = p_s + q_s,   p_s ∈ P,  q_s ∈ Q

(for the definition of P and Q see Section 6.3).

Lemma 10.1. Let

(1/α) ∫_S o_{s,x,g}(α) dm → 0   as α → +0.

Suppose that the function (f_s)'_x can be decomposed into a sum p_s + q_s in such a way that the mappings s → p_s and s → q_s are weakly measurable and, in addition,

∫_S ||p_s|| dm < +∞,   ∫_S ||q_s|| dm < +∞.

Then the function f defined by (10.14) is quasidifferentiable at x and Df(x) = [∂f(x), ∂̄f(x)], where ∂f(x) (resp. ∂̄f(x)) consists of all vectors v such that

(v,g) = ∫_S (v_s,g) dm   ∀ g ∈ R^n,

where s → v_s is weakly measurable and v_s ∈ ∂p_s (resp. v_s ∈ ∂̄q_s) for all s.

P r o o f. We have

f(x+αg) = ∫_S f_s(x+αg) dm = ∫_S (f_s(x) + α df_s(x)/dg + o_{s,x,g}(α)) dm
= f(x) + α ∫_S df_s(x)/dg dm + ∫_S o_{s,x,g}(α) dm.

It follows from the hypotheses of the lemma that the function f is directionally differentiable at x and

df(x)/dg = ∫_S df_s(x)/dg dm   ∀ g ∈ R^n.

Since

df(x)/dg = ∫_S p_s(g) dm + ∫_S q_s(g) dm   ∀ g ∈ R^n,

we conclude that f is quasidifferentiable at x. The required description of the sets ∂f(x) and ∂̄f(x) follows directly from Theorem 10.4. □
Chapter 11 EXAMPLES

Let us consider some simple examples of quasidifferentiable functions.

Example 11.1. If a function f is Gateaux differentiable at a point x, i.e., there exists an element ∇f(x) ∈ R^n such that

f(x + αg) = f(x) + α(∇f(x),g) + o_g(α)   ∀ g ∈ R^n,

where o_g(α)/α → 0 as α → +0, then f is quasidifferentiable at x and Df(x) = [∇f(x), 0] (more precisely, Df(x) = [{∇f(x)}, {0}]). Clearly, Df(x) can also be represented in the form Df(x) = [0, ∇f(x)], since the pairs [∇f(x),0] and [0,∇f(x)] are equivalent.

Example 11.2. Let a function f be defined and convex on an open convex set X ⊂ R^n and let

∂f(x) = {v ∈ R^n | f(y) − f(x) ≥ (v, y−x)   ∀ y ∈ X}

be the subdifferential of f at a point x ∈ X. Then

f'_x(g) = df(x)/dg = max_{v∈∂f(x)} (v,g)   ∀ g ∈ R^n.

Since the function f'_x is sublinear (as a function of direction), the function f is quasidifferentiable on X and Df(x) = [∂f(x), {0}].

Example 11.3. Let a function f be defined and concave on an open convex set X ⊂ R^n and let
∂̄f(x) = {w ∈ R^n | f(y) − f(x) ≤ (w, y−x)   ∀ y ∈ X}

be the superdifferential of the function f at a point x ∈ X (see Remark 4.1). As in the case of a convex function, it is clear that f is quasidifferentiable on X and Df(x) = [{0}, ∂̄f(x)].

Remark 11.1. If the directional derivative of a function f at a point x is a sublinear (superlinear) function (as a function of direction), then it is natural to call such a function subdifferentiable (superdifferentiable) at x. A subdifferentiable (superdifferentiable) function f has a quasidifferential of the form [∂f(x), {0}] (respectively [{0}, ∂̄f(x)]). Thus, a convex function is subdifferentiable, a concave function is superdifferentiable, and a Gateaux-differentiable function is both subdifferentiable and superdifferentiable.

Example 11.4. Let f be a maximum function, i.e.,

f(x) = max_{y∈Y} ψ(x,y),

where Y ⊂ R^m is a compact set and the function ψ and its partial derivative ∂ψ/∂x are both continuous on X×Y. Here X ⊂ R^n is an open set. Then the function f is directionally differentiable at any x ∈ X and (see (3.21))

f'_x(g) = max_{y∈R(x)} (∂ψ(x,y)/∂x, g),     (11.1)

where R(x) = {y ∈ Y | f(x) = ψ(x,y)}. It follows from (11.1) that f'_x is a convex function. Thus the function f is quasidifferentiable (or, more precisely, subdifferentiable) at x and Df(x) = [A, 0], where

A = co {∂ψ(x,y)/∂x | y ∈ R(x)}.

Example 11.5. Let f be a minimum function, i.e.,

f(x) = min_{y∈Y} ψ(x,y),

where Y and ψ are the same as in Example 11.4. Then f is superdifferentiable at any x ∈ X and has a quasidifferential of the form Df(x) = [0, B], where

B = co {∂ψ(x,y)/∂x | y ∈ Q(x)},   Q(x) = {y ∈ Y | f(x) = ψ(x,y)}.
Example 11.6. Let f(x) = min_{i∈I} f_i(x), where I = 1:N and the functions f_i are defined and convex on an open convex set X ⊂ R^n. It follows from Chapter 10 and Example 11.2 that f is a quasidifferentiable function. Let us find its quasidifferential at a point x ∈ X. Using formula (10.13), we have

Df(x) = [∂f(x), ∂̄f(x)],

where

∂f(x) = Σ_{k∈Q(x)} ∂f_k(x),   ∂̄f(x) = co ∪_{k∈Q(x)} [ −Σ_{i∈Q(x), i≠k} ∂f_i(x) ].

Here Q(x) = {i ∈ I | f(x) = f_i(x)} and ∂f_i(x) is the subdifferential of the convex function f_i at x.

Example 11.7. Let f(x) = f(y,z) be defined on X = Y×Z, where Y ⊂ R^{n₁} and Z ⊂ R^{n₂} are convex open sets in the corresponding spaces. The function f is called convexo-concave on X if the function f(y,z) is convex in y on Y for any fixed z ∈ Z and concave in z on Z for any fixed y ∈ Y. Fix x₀ = (y₀,z₀) ∈ X and let

∂f_y(y₀,z₀) = {v ∈ R^{n₁} | f(y,z₀) − f(y₀,z₀) ≥ (v, y−y₀)   ∀ y ∈ Y},     (11.2)
∂̄f_z(y₀,z₀) = {w ∈ R^{n₂} | f(y₀,z) − f(y₀,z₀) ≤ (w, z−z₀)   ∀ z ∈ Z}.     (11.3)

The set ∂f_y(y₀,z₀) is the subdifferential of the convex function h₁(y) = f(y,z₀) at the point y₀ ∈ Y, and the set ∂̄f_z(y₀,z₀) is the superdifferential of the concave function h₂(z) = f(y₀,z) at the point z₀ ∈ Z. These sets are nonempty, convex, closed, and bounded. In [47, Section 4, Chapter II] it was proved that the function f is quasidifferentiable on X and Df(x₀) = [∂f(x₀), ∂̄f(x₀)], where

∂f(x₀) = ∂f_y(y₀,z₀) × {0_{n₂}},   ∂̄f(x₀) = {0_{n₁}} × ∂̄f_z(y₀,z₀).     (11.4)

Here 0_{n₁} and 0_{n₂} are the zero elements of the spaces R^{n₁} and R^{n₂}, respectively. Thus the first formula in (11.4) means that ∂f(x₀) is the direct product of the sets ∂f_y(y₀,z₀) ⊂ R^{n₁} and {0_{n₂}} ⊂ R^{n₂}.
Examples
125
Example 11.8. Let a function f be defined on an open set X c Rn and let
F(x) = sat f(x) =
if f (x) 2 1 , if -1 S f(x) S 1 if f (x) 5 -1 .
1+1 f (x) L-1
Since Fl(x) = max {fl(x),-1}, where fl(x) =min {f(x),1}, quasidifferentiability of the function f at a point xo E X implies quasidifferentiability of the function F at xo.
Applying formulas (10.12) and (10.13), we get DF(x) _ [o'(x),df(x)], where
if f(x) >_ 1 or f(x) < 1
1{O}
f(x) _
{f' (x)}
if I
0
1co if (X), 01
I<1
if f(X) = 1
fol
if f(X)
1
9
.
r-,
co {0, f'(x)} if f(x) = 1
,
In the following examples, x = (x(' ),x(2)) E R2, x(O) = (0,0).
III
(D'
max {fi(x),f2(x)}, where Example 11.9. Let f(x) = JxMj, i.e., f(x) fl (x) = P>, f2(x) = -P). Since fi and f2 are smooth functions, we can take
Dfi(xo) _ [o
Dfa =
where
= {A1}, Al = (1,0) E R2,
df2(xo) = {A2}, A2 = (-1,0) E From formula (10.2) we have D,f(xo) = f(xo)
fi(XO) _ {(090)} , R29
df2(xo) _ {(0,0)} all
oY&o)
where (see Fig. 11.1)
co {Ao,A2}, df(xo) _ {(0,0)}
Chapter 11
126
Remark 11.2. Recall that a quasidifferential is not defined uniquely. For example, if we take
),,(Xo) = famg df2(xo) =
Ofil(io)
1(190)1 9
{(-1,0)}, o?f2(xO) _ {(0,0)}
,
we get another quasidifferential for the function f at the point xo: Df (xo) = [df(xo) f (xo) 1
9
where
co {(0,0),(2,0)}, o?f(xo) _ {(1,0)}
Example 11.10. Analogously, if f(x) = Ix(2)I, then Df(xo) _ [df(xo),df(xo)]; where (see Fig. 11.2) o)I(xo) = co {B19B2}9 Axo) _ {(0,0)}
B1 = (0,1) E R2,
B2 = (0,-1) E R2
.
Example 11.11. Let x0 = (0,0) and let f1(x) = IX(1)1 + IX(2)1,
f2 = IX(') I - Ix(2) l
f3(X) = JJX(1) I - JX(2)11
,
.
Making use of the formulas from Chapter 10 and Examples 11.9 and 11.10, it is easy to show (see Figs. 11.3, 11.4, 11.5) that fi(xo) = co{A1,A2} + co{B1,B2}, af(x0) _ {(0,0}
,2(xo) = co {A1,A2},
,
d2(xo) = co {B1,B2}
of3(xo) = co {co{Ai,A2}+co{A1,A2}; co {B1,B2}+co {B1,B2}}
0}3(x) = co{B1,B2} + co{A1iA2}
It is easy to see that for the functions fi' f2, f3,
dcz f (xo) = co {A1,A2} + co {B1,B2} V i
,
i.e., the Clarke subdifferential of all three functions at the point x0 is the same. Other examples can be found, for instance, in [47, Chapter II].
Chapter 12 QUASIDIFFERENTL4BILITY OF A COMPOSITION
12.1. A mapping H: X -, Rm , where X is an open set in Rn, is called quasidifferentiable if its coordinate functions are quasidifferentiable. In other words, let H be given by H(x) = (hl (x),...,hm (x)). Then, if every function hj, i E 1:M is quasidifferentiable at a point X EX, the mapping H is said to be quasidifferentiable at the point x.
Let Dhj(x) = [dhj(x),dhj(x)] be a quasidifferential of the function h, i E 1:M, at x. We shall define a quasidifferential DH(x) of the mapping H at the point x as a pair of sets [dH(x),dH(x)]. The set dH(x) consists of all nxm-matrices composed of rows i formed by vectors belonging to dhj(x); analogously, the set dH(x) consists of all nxm-matrices composed of rows i formed by vectors belonging to dhj(x).
"(D
We shall call the set dH(x) a subdifferential and the set dH(x) a super-
differential of the mapping H at the point x.
Let U be some set of nxm-matrices and let Uj denote the set of vectors corresponding to the ith rows of matrices in U. Let X E Rn. We shall let m a x Ax to denote the vector y E Rm whose ith coordinate is equal to
AEU C>4
m a x (aj,x). In other words,
aieUj
ale U1
amEUm
a2EU2
ski
max AX = [max (al, x), max (a2, x), ..., max (am, x)
AEU
Similarly, we introduce
128
ale U1
a2EU2
...,
min (am, x)
amEUm
ski
min AX = 1min (al, x), min (a2 , x),
AEU
Quasidifferentiability of a Composition
129
Proposition 12.1. If a mapping H is quasidifferentiable at a point X E X, then
dH x = H'x (, g) dH
=
max A g
AEdH(x)
+
min Bg _
BEdH(x)
0+0
The proof follows immediately from the definition. 0 It also follows from the definition that subdifferentials dH(x) and superdifferentials dH(x) are compact (i.e., closed and bounded and convex. Furthermore, these sets of matrices have the property of operational convexity.
Definition 12.1. We say that a set U of nxm-matrices is operationally convex if the inclusions AEU, A' E U imply that
for any a such that 0
al
a2
a = 0
am
where 0 <_ aj <_ 1 V i. Here I is the identity matrix.
boo
p." cry
If a set U of matrices is operationally convex, then a matrix composed of rows formed by convex combinations of the corresponding rows of matrices A E U and A' E U will also belong to set U. _ The operational convexity of sets dH(x) and dH(x) follows immediately from the definition since the ith row of a matrix belonging, e.g. to dH(x), is composed of elements of a convex compact dhj(x), and has nothing to do with the jth row of this matrix if j # i.
Remark 12.1. The above definition of a quasidifferential of a mapping is not quite consistent, since it depends on the choice of specific quasidifferentials Dhj(x) = [dhj(x),dhj(x)] of functions hj, i E 1:m. More precisely, the quasidifferential of a mapping H is a class of pairs [dH(x), dH(x)], where dH(x) and ;)H(x) (AD
are families of matrices constructed as described above for any combination of quasidifferentials Dhj(x) = [dhj(x),dhj(x)j. Hence, every {Dhj(x) i E 1: m} is associated with a particular pair [dH(x),dH(x)] and conversely every such pair I
corresponds to some particular {Dhj(x) I i E 1: m}.
It is possible to define the quasidifferential of a mapping in terms of the space of operationally convex sets (which is analogous to the space of convex sets), rather than coordinate functions. This approach has been described in detail in the literature.
Chapter 12
130
12.2. Let P: Rn -4 Rm be a mapping of the form P(x) = (p1(x), ..., pm (x)). The mapping P is called sublinear if its coordinate functions pi,...,pm are sublinear. Sometimes the term "a sublinear operator" is used instead of "a sublinear mapping." Let x,y E Rm and x = (x(1), ..., X(m )), y = (y(1)) ..., y(m )). We write x > y if xW > y( V i E 1: m. In other words, we treat Rm as an ordered space (see Section 6.4) in which the cone of positive elements of the space Rn coincides with the cone + Rm
_ {xE
R"
IE
1.,
If an operator P is sublinear, then it is (i) subadditive, i.e., P(x+y) _< P(x) + P(y) V X, Y E Rm ; (ii) positively homogeneous, i.e., P(Ax) = 2P(x) V A > 0, V X E Rn .
Let P: Rn - Rm be a sublinear operator. The subdifferential dP of P con-, sists of all nxm-matrices composed of rows i which belong to the subdifferentials dp of the coordinate functions pi. Clearly, dP is a compact and operationally convex set and P(x) = m a x Ax. AEdP It follows from the definition that
dP = {A I Ax 5 P(x) V x E Rn }
.
A mapping Q; Rn -4 Rm is called superadditive if
Q(x+y) > Q(x) + Q(y) V x,y E Rm A mapping which is both superadditive and positively homogeneous is called a superlinear mapping (or a superlinear operator). The superdifferential dQ
of a linear operator Q consists of all nxm-matrices A such that Ax > Q(x) V x E Rn. Clearly, Q(x) = m i n Ax. A E aQ
Let X be an open set in Rn. It is clear that a mapping H: X - Rm is quasi-
differentiable at a point x E X if and only jf (i) it is directionally differentiable and (ii) its directional derivative HX can be represented in the form HX = P + Q, where P is a sublinear operator and Q is a superlinear operator. Moreover, the subdifferential dH(x) coincides with the subdifferential dP of the operator P, and the superdferential dH(x) coincides with the superdifferential dQ of the operator Q.
12.3. First we shall discuss the composition of a sublinear function and a sublinear operator.
Let T: Rn - Rm be a sublinear operator and let p be a sublinear function defined on Rm. Unfortunately, the composition g(x) = p(T(x)) is not neces-
Quasidifferentiability of a Composition
131
sarily a sublinear function. This can be guaranteed only in the case where p has the property described below.
We say that a function p defined on Rm is increasing if the inequality x ? y implies p(x) ? p(y). If a sublinear function p is increasing, then the composition g(x) = p(T(x)) is sublinear. Indeed, since T is sublinear, we have yl = T(xl+x2) <_ T(xl)+T(x2)
In.
Y2- Since p is an increasing function, we must have p(yl) <_ P(Y2). Therefore /-.
8(xi+x2) = P(T(xi+x2)) = XYi) :5 PCY2) = P(T(xi) + T(xz))
< P(T(xi)) + P(T(xz)) = 8(xi) + 8x2)
Proposition 12.2. A sublinear function p is increasing if and only if its subdifferential & is contained in the cone Rm of vectors with nonnegative components.
P r o o f. 1. Let p be increasing. Then for x <_ 0 we have
max (v, x) = p(x)
<_ 0
.
V 607P
In particular, if ej is the ith basis vector, ej = (0,...,0,1,0,...,0) E Rm , i E 1: m, and V Edp, then
-(v, ej) = (v,-ej),
p(-ej)
<_ 0
,
.ti
DEC)
,..
i.e., (v, ej) >- 0 V i, and hence v >- 0. Al
IC)
Via.
L%'
2. Let op c Rm . Take any v E& and x, y E Rm such that x >- y, i.e., x - y >- 0. Since op c Rm, we have (v,x-y) > 0, i.e., (v, x) >- (v,y). Therefore
p(x) = max (v,x) > max (v,y) = p(y). 0 vEap
vEap
Mao
Proposition 12.3. Let a function p be defined and sublinear on Rm and let v' be a lower bound to the value of its subdifferential dam, i.e., v' <_ v d v E dom. Then the function
p1(X) = P(x) - (vfgx)
is sublinear and increasing.
Chapter 12
132
Remark 12.2. A lower bound v' always exists since the set p is bounded.
P r o o f of Proposition 12.3. Clearly the function pl
is sublinear and
moreover
p1 = P - v' This equality implies that p 1 c Rm since v' is a lower bound. It follows from Proposition 12.2 that pl is an increasing function.
0
cps
12.4. Lemma 12.1. Let a function p be defined and sublinear on Rm and let a function p given by p (x,y) = p(x-y) be defined on Rm X Rm. Then the function f is sublinear and its subdifferential is d, = f V = [v 1,v2 ] E Rm x Rm I VI E p, V2 = -VII .
(12.1)-
b".
a'0
Remark 12.3. Recall (see Chapter 2) that an element of a direct product is denoted by [x,y]. We can regard [x,y] to be a 2m-dimensional vector the first m coordinates of which form the vector x and the last m coordinates the vector y. As usual, we shall write f (x,y) instead of f([x,yJ).
P r o o f of Lemma 12.1. It is obvious that the function p is sublinear. Let us prove equality (12.1).
Take v = [v1,v2 ] E, i.e., (v 1,x) + (v2,y) <_ p (X-Y)
V x,y E Rm
.
Putting y = 0 yields v1 E p. Setting x = y, we obtain the inequality (v1+v2,x)
<-
0
V x E Rm
,
which can be true only if v2 = -v1. Now consider a vector v = [v1 ,v2 ] such that VI E p, v2 = -vi. Then for any [x,y] E Rm X Rn we have
(v1,x) + (V2,y) = (v1,x) + (V1,--y) = (v1,x--y) < p(x y) = jJ(x,y)
,
(see Property 6 in Section 5.1). 0 We say that a function f is uniformly quasidifferentiable at a point x if it is both uniformly directionally differentiable and quasidifferentiable at this i.e., v E point.
Quasidifferentiability of a Composition
133
Theorem 12.1 (A composition theorem). Let X be an open set in Rn, let Y be be quasidifferentiable at a point xo E X. If a function f defined on Y is uniformly quasidifferentiable at
an open set in Rm, and let a mapping H : X the point yo = H(xo), then the function
V(x) _ f(H(x))
(12.2)
is quasidifferentiable at the point xo. 0
P r o o f. Since all the conditions of Theorem 3.1 are satisfied, the function
V is directionally differentiable at xo and the following equality holds: Sao
dg
-
Vg E R
(xo,g) = f'(H(xo),H(xo,g))
s,,
00r
We must show that the function h(g) = f'(yo,H '(xo,g)) can be represented as the sum of a sublinear function and a superlinear function. Let DH(xo) = [aH(xo), dH(xo)] be a quasidifferential of the mapping H at
the point xo and let Df(yo) = [moo), df(yo)] be a quasidifferential of the function f at the point yo = H(xo). Then H' (xo ,g)
=
max
Ag + min
max
Ag --
AE dH(xo)
AE aH(xo)
Be aH(xo)
Bg
max
Be [- aH(xo)]
Bg = T(g) - S(g)
,
0
where T and S_ are sublinear operators whose subdifferentials coincide with dH(xo) and [-rdH(xo)], respectively: max VE of (yo)
(v,.1) + max
(v,1) -
max sir
max
vE df (y0)
(w,.1)
we df (y0)
m
f(yo,1) =
(w,.1) = p(1) - q(.1)
.
we [- of (y0)]
Here p and q are sublinear functions such that =
f(yo),
d4 =
(12.3)
Chapter 12
134
Now we can write (12.4)
h(8) = P(T(S)-S(8)) - q(T(8)-S(S)) .
Let z : Rn - Rm x Rm be the mapping defined by the equality z (g) = [T(g), S (g) ], i.e., z (g) _ (tl (g),...,tm (g),sl (g),...,sm (g)), where {t1 i E 1: m}, {sj i E 1: m} I
I
are coordinate functions of the operators T and S, respectively. Clearly, z is a sublinear operator. Construct the sublinear functions p and q as in Lemma 12.1: p(g,e) = p (g_,1), q(g,1) = q (g-1) .
(12.5)
h(g) = fl(z(g)) - 4 (z(8)) .
(12.6)
We have
Let an- _element v' be a common lower bound of the sets dp = f (yo) and d q = -Oif (yo), and let an element v" be a common upper bound of the same sets, i.e.,
[af(vo)]
vvsvof
F-"
By Lemma 2.1, v1 E p, v2 = -v1, and hence Let w = [v1,v2] E v' < v1, --v" < v2. It follows from Proposition 2.3 that the function p can ((A
b represented in the form p(v1,V2) = pl(v1,v2) + (V',V1) -
(12.7)
where pl is an increasing sublinear function. Similarly, it follows from the inequalities v' < v1,
-v < V2
V w = [v1,v2] E
q
that q(V1,V2) = gl(v1,V2) + (V')V1) - (v")v2)
where ql is also an increasing sublinear function. Since Z(g) = [T(g)-S(g)], (12.4) leads to
h(g) = Pi(T(8),5(8)) - gi(T(8))S(8)) .
(12.8)
Since pl is an increasing sublinear function, the function hl(g) = pl(T(g), $(g)) is sublinear. Similarly, the function h2(g) = gl(T(g),S(g)) is sublinear and hence (-h2) is superlinear. This completes the proof. 0
Quasidifferentiability of a Composition
135
12.5. The following lemma enables us to compute a quasidifferential of a composition.
Lemma 12.2. Let p be an increasing sublinear function defined on Rn, let T: Rn -- Rm be a sublinear operator and h(g) = p (T(g)). Then
dh = {w I w = A*v, A E dT, v E dp}.
(12.9)
Here A* is the transpose of A.
Remark 12.4. Let al,..., am represent the rows of a matrix A and let v = (v(1 >
v(m) } Then
M
A*v =
v(`)aj j ffi1
P r o o f of Lemma 12.2. Let W denote the set on the right-hand side of equality (12.9). We shall first show that
h(g) = m a x (w,g)
V g e Rn
We W
.
(12.10)
Let w = A*v E W, V E p. It follows from Proposition 12.2. that v >_ 0 and therefore
(w,g) = (A*v,g) = (v,Ag)
(v,T(g)) <_ p(T(g)) = h (g) .
<_
Hence,
m a x (w,g)
_<
h(g)
.
WEW
On the other hand, we can also show that for any g E Rn there exists a w E W such that (w,g) = h(g). To construct such a w, let us consider the coordinate functions f tj I iE 1:m} of the operator T. These functions are sublinear. It follows from Proposition 5.1 that if aj E d tj(g), then
(aj,g) =
tj (g)
and
aj E d tj
V i E l :m
.
Let us consider the matrix whose rows are vectors al ,..., am. It follows from the definition of the subdifferential d T that A E d T. In addition,
Ag = ((a1
,g),..., (am ,g)) = (t1(g),..., tm (g)) = T(g) .
Chapter 12
136
If v e dp(T(g)), then (v,T(g)) = p(T(g)) and v e dp. For w = A*v we have
(w,8) = (A*v,g) = (v,Ag) = (v,T(g)) = p(T(g)) = h(g) This proves formula (12.10). To complete the proof of the lemma, it is sufficient to check that W is a convex compact set. From its definition, it is clear that W is closed. Since h is a sublinear function, it must be Lipschitz, and therefore for w E W we have (w,g)
h(g)
_<
V g: iig II = 1
_< L
_< L 11g II
,
where L is a Lipschitz constant. This inequality implies that Ilw II <_ L V w E W, i.e., W is bounded. Thus W is a compact set. Let us now prove that W is convex.
It has already been shown that if al ,...,am represent the rows of an nxm matrix A, then m
A*v = I vMa1 i =1
where v = (V(l),...,V(m)) E Rm.
Let w1,w2 E Wand w1 = A v1, w2 = A v2, where v1,v2 Edp, A1,A2 E dT, V1 = (V(1
1
))...,vim)
), v2 =
all
(v21),...,VZm},
al m represent the rows of a
matrix A2. Note (from Proposition 12.2) that
vii)>0 ViE1:m,jE1:2. Let 1+ denote the set of indices i such that V(`) + V O) 1 2
>0.
If a,l > 0, a +,8 = 1, then m
* m (0 = aA1v1 +fAIv2 = ay, vl ali+l 1: v2(0a2i i =1 i =1
r+^
awl +/3w2
Yd +
iE I
[avi
0a1
0)
i E 1+
(av1
i + flv2')a2iI
av(`) +,(3v2i)}
1
1
i
av( ) +,v2 ) 1
ali
flV (T) 1
(i) flv( i) 2
a2i
Quasidifferentiability of a Composition
137
Let A be the matrix whose rows al,..., am are given by fly2(i) av(0 aV(` +f v(2 i) all + av i) + v(2 0 1
ai
V i E 1:M0
a2 i
1
1
BCD
Clearly, A E o?T. Indeed, since ai i,a2i E ?ti and ai is a convex combination of a1 i and a2 i, we must have ai E d ti . But this means that A E 9T. In addition, v = av1 + /3v2 E dp. Thus m
awl + /3w2 = I v(i)ai = A*v E w . 0 i =1
Theorem 12.2. Consider a function V defined by formula (12.1). Then its quasidifferential at a point xa satisfies the following relation:
DyI(xo) = [&(xo), ay1(xO)], where o 1(xp)
m w I W = I [v(i) (2i +."d - v'(i)2 - v"(i)#i]
,
(12.11)
=1
v = (V(1),...,v(m)) E %/V'0), Ai E dhi(xp), 9i E M
d xo) = fW I w[v(`)(2+1ui)+v'(t)2i+v"(`)pi],
(12.12)
i =1
V = (V(1),...,V(m)) E f(yo), 2i E ahi(xo), /li E d h1(xo)
;
Here v' = (v'(1) ,...,v'(m)) and v" = (v"(1 ) ,...,v "(m)) are arbitrary vectors such that v
OP
<
v :5 v
v C- Ayo) U (-Z*O)) -
P r o o f. It follows from (12.8) that
aft o)
8
fyo(8) s h(g)
Pi(Z(8)) - 4i(Z(8))
where Z: Rn - Rm x Rm is a sublinear operator, Z(g) = (T(g), S (g)), and pi,gi are increasing sublinear functions defined on Rm X Rm. Since the func-
Chapter 12
138
tion hl (g) = pl (Z(g)) is sublinear, the subdifferential d yi(xo) coincides with
the set dhl. Similarly, since the function h2(g) = g1(Z(g)) is sublinear, we
III
have d y(xo) = -d h2.
Let us find the subdifferential d hl. Recalling the definition of operators Z, T and S (see the proof of Theorem 12.1), we have
Z(g) =
(v,g), max
max VE
(v,g), ..., max
max
(v,g),
vE I- dh1 (XO)]
(v,g)
vE dhm(Xo)
VE dh2(XO)
dh1(XO)
max
max
(v,g), ...,
This implies that an nx2m-matrix A with rows (a1 ,...am to the subdifferential dZ if and only if aj E d hj(xo)
V i e I :m
aj E [--t ?hi- m (xo)l
(v,g)
VE [- dhm (XO)]
VE I- dh2(XO)]
, am + 1
belongs
,
V i E m+ 1,...,2m
.
It follows from (12.7) that p1(V1,V2) = j3'(v1,v2) - ([V', -v"],[V1,V2])
where (see (12.5)) p is defined by the equality p(u,v) = p(u-v). Applying Lemma 12.1 yields 1
=
- [V , -v "] { [.e, x211 1 E dp,
12 = -1 } '-
[v %-v" ]
= {[11-v', -11+v"] I £1 E dp }
Since (see (12.3)) dp = e(yo), we have dpi = {[V-V', --V+V"] I V E f(yo)} Using Lemma 12.2 yields
dh1 = {wIw=A*v,,AE dZ,VE d Since A, = d y(xo), A*v = 1j2 s 1 v(j)aj, where Jai of the matrix A and v = (V(1),...,V(2rn) ), we can write
I
11.
i E 1: 2m} are the rows
Quasidifferentiability of a Composition
ay/(xo) =
M
wIw=
(v(i)-v'('))2i +
i =1
m
(v (1) _v(0
139
1,
i =1
V = (V(1),...,V(m)) E B(YO), Ai E dhi(xo), /Ui E [-Oihi(xo)
{w I W = I [v(`)(2i tui) - v'(`)2i - v "V)pi 1, V = (V(1),...,V(m)) E
f(yo)' 2i E ahi(x0), /i E ahi(xo)} .
This proves formula (12.11). Formula (12.12) can be proved in a similar way. 0
12.6. We shall now illustrate Theorems 12.1 and 12.2, using some simple examples.
Example 12.1. Let a function f be defined and convex on Rm and let functions hl,...,hm be defined and concave on Rn. Consider a function VI(x) = f(h1 (X),...,hm (x))
.
Find a quasidifferential of yi at a point x0. Let v" = (VII), ...,v" (m )) be such that
V" > 0, V _ V"
V V E e(ZO)
,
where zo = (hi (xO),...,hm (xo)) and (zo) is the subdifferential of f at zo. Let d hi(xo) denote the superdifferentials of concave functions hi at x0. Applying Theorem 12.2 yields
DV(xo) = where m
VIV
,'Y
d 4v(x0) =
(v(s) - V"(i))/3i
i =1
fli E d hi(xo), v au/(xo) =
/3V"('),/3 E dhi(x0)
wIw= i =1
v"(i)ahi(x0)
_ i =1
E e(ZO)
Chapter 12
140
+,0
Example 12.2. Let f be defined and convex on Rm and let hl,...,hm be defined and convex on Rn. Consider a function VI(x) _ f(h1 (X),...,hm (x)) .
Find Dyf(x) at a point x0. Let v' _ (v'(1),...,V'(m)) be such that
v'<0, v'
Let f(z0) and dh;(xo) be the subdifferentials (at the corresponding points) of convex functions f and hi respectively. It follows from Theorem 12.2 that DV(xo)
where
d Vl(xo) _ {v ( v = I (v(i) - v'(i>)a1, ai E dhi(xo) v = (vt l),...,V(m)) E f(z0)} , m
d v(xo)
_ I vW d hi(xo)
.
i= 1
.'y
Remark 12.5. If v > 0 VV E f (z), then we can take v' = 0. From the above formula it is clear that in this case the yi is a subdiffere nti able function and
aiv(xo) = {v I v = E a;v('), ai E dht(xo), v E f(zo)}
.
0.04,
Example 12.3. Let us use Theorem 12.2 to obtain a quasidifferential of a maximum function. Let functions h1,...,hm be quasidifferentiable at a point xo E X c R", z = (zl ,...,zm) c Rm and define
f(z) = max
iE 1: m
zi
Quasidifferentiability of a Composition
141
Clearly, f is a sublinear function and Df(z) = [2ftz),0], where f (z) _ {w I w = (w1,...,Wm ),
M
wi = 1, Wi 2t 0 i= 1
Vic 1:n,wi=OVid R(z)}, R(z) = fir= I:m I zi = f(z)j Let V (x) = f (h 1(x),..., hm (x)) = m a x
i e 1:m
.
hi (x) and w' = 0, w" = (w j',...,w,n ), where
0
if i 4 R(zo)
,
1
if i e R(zo)
,
zo = y/(zo) Clearly, w' _< w <_ w" V w E f(z0) V {0}. Applying Theorem 7.2, we find that yf is quasi differentiable at xo and
DV(xo) = where
d yi(xo) =
vIv=
2: wi(ai iE R (zo)
+/i) -iE R2:(zo)
ai E d hi (xo ), Iii E d hi (xo),
F
iE R (z0)
d yi(xo) _
2: dhi(xo) iE R (zo)
wi = 1, wi ? 0
Chapter 12 IQ,
142
v =
I
wi (ai +A) -
I
wi ai +fli
iER(z0)
iE R (zo )
wi ai iE R ( Zo)
I
IiER(z0)
wi
2:
kER(zo)
A
kE R (zo)
tab
II'
If v e dy/(xo), then
k
kE R (zo)
k*i m
Here Ywi = 1, wi -?02 ai E dhi(xo), /3i E dhi(xo). It follows from the above that
u
(dhi(xo)
iE R (zo)
- kE RI(zo) dhk(xo)) IQ,
d yi(xo) = co
k*i
fir.
The expressions for d w(xo) and d V(xo) just obtained coincide with the formulas given in Chapter 10 (see Theorem 10.3).
Chapter 13 A RELATION BETWEEN THE CLARKE SUBDIFFERENTIAL AND THE QUASIDIFFERENTIALS
13.1. Let a function f be defined, Lipschitz, and quasidifferentiable on an open set X e R", and let x be a point in X. In Chapter 8 it was shown that in this case the function f is differentiable on X almost everywhere and that the following relation holds for the Clarke subdifferential:
dc ZAx) = co A
(13.1)
,
where
A= {v ER" I v = lim Of(Yj), yj -+ x, yj E T(ffl
.
4.4
Here T(f) c V, V is a neighborhood of the point x, and V\ T(f) is a set of 4-;
~'0
measure zero containing all the points in V at which f is not differentiable. Our aim is to determine a relation between the Clarke subdifferential and the quasidifferentials. Let us first consider the case where f is a convex function. Let a mapping 0: Rn -) 2R" be defined by the relation
0(g) = Arg max {(v,g) I v e fi(x)} , where
(13.2)
f(x) is the subdifferential of f at x and Arg max denotes the set of
points at which a maximum is achieved. 4..
Since f(x) is a convex compact set, 0(g) is a singleton (i.e., consists of a single point) for almost all g E R". Let C
C(fj = {g E R" I g e C, 0(g) is a singleton} ,
where C = {g E R" I
11g 11 = 1}.
143
Chapter 13
144
It follows from the properties of convex sets that if a set C C C is such that the inner measure of the set C\ C is zero, then
f(x) = cl co{ fi(g) I g r C}
.
Therefore
f(x) = cl co{z(g) I g e C} ,
(13.3)
where {z(g)} = fi(g).
Proposition 13.1. If xk -+ x, xk e T(f), gk s Ilxk xkxxII -+ g, and if D, f (xk) then z e fi(g).
z,
Pro of. We have dfd W gk
-
max
v6 of (x)
(v,gk) = (vk,gk)
,
(13.4)
where vk E (g). We can assume without loss of generality that vk -4 V.
(13.5)
Clearly, v E fi(g).
Since the function h(a) = f(x+ag) is convex (in a), the derivative of its right-hand side III
h+(a) =
h(a+/3) -h(a)
lim
=
,l-4+o
ofx+agk )
d d9k
is nondecreasing. Since xk = x + akgk, where ak = Iixk -x II, we have dfcXk> d9k
o?f(x) d9k
Since f is differentiable at the point xk, we can write
af(xk) = (ofcxk>,gk) d9k
(13.6)
The Clarke Subdifferential and the Quasidifferential
145
Therefore, from (13.4) and (13.6) we have
(V.f(Xk)rBk) =
df(xk) agk
(13.7)
>
C4,
Taking the limit as k -+ oo in (13.7) and using (13.5) and the assumptions of the proposition, we have
(z,g) ? (v,g) =
max (w,g) weaf(X)
Since z e fix), this inequality implies that indeed
(z,g) = and therefore z e fi(g).
max
(w,g)
,
W6 df(X)
0
Corollary 13.1. If GE C, then Dftxk) -+ z(g) as xk -+ X, Xk e T(fl, and
8k -
xk x Ilxkxll
erg'
where 0(g) = {z(g)}.
Since for a convex function f we have dcfix) = f(x), relation (13.7) implies that
ac fix) = cl co{z(g) I g e c}
(13.8)
.
13.2. Now suppose that a function f is Lipschitz and quasidifferentiable on X and, also, that Df(x) = [&x),dflx)] is its quasidifferential at a point x e X. Let O(g)
O-W + O(g)
where
O(g) = Arg max {(v,g) I v Ed ex)I `0,
O(g) = Arg min {(w,g) I w e dAx)}
.
Chapter 13
146
Since f(x) and df(x) are convex compact sets, each of the sets do(g) and ao(g) consists of a single point, and therefore O(g) is also a singleton for almost all g E C. The set of values of g is denoted by C(f). (Clearly, C(f)
also depends on x.)
Definition 13.1. A is a thick subset of B if the inner measure of its relative complement is zero.
If f is a Lipschitz function, then for any thick sets C c C and T e T(f) tea'
we have
dc Zflx) = cl co v 13 {xk }, xk e T, xk -+ x, Xk -X
8k =
IIXk x II
g E C, vf kxk) -4 v
}
(this property follows easily from (13.1)). C,4
We say that f r= M(x) if f is Lipschitz on an open set X e Rn, quasidifferentiable at x E X and there exist thick sets C(f) c CO and T(fl e T(t) such that for any sequence {Xk} described by Xk --X
Xk -4 X,
IIXk-XII
-3g, gE
f), Xk r T(f)
(13.9)
we have (13.10)
Df(xk) -+ z(g) ,
here
Z(g) = VW + W(g);
O-W = Ng* W = fw(g)l -
°u°
Hence, 0(g) = {z(g)}. Thus, for f E M(x) we have
dC f(x) = Cl co {z(g) I g e CO c C(fl}
.
(13.11}
It follows from Corollary 13.1 and relation (13.8) that f e M(x) if f is a convex function. Clearly M(x) also contains concave functions and smooth (i.e., continuously differentiable) functions.
The Clarke Subdifferential and the Quasidifferential
147
Example 13.1 shows that differentiability in a neighborhood of a point x
is not sufficient for a function to belong to M(x). The following theorem indicates that all "normal" operations over functions in the class M(x) produce a function which also belongs to M(x). .
Theorem 13.1. The set M(x) is closed with respect to all algebraic operations and the operations of taking pointwise maxima and minima (of a finite number of functions).
P r o o f. This theorem may be proved by verifying it for all operations. We shall consider only two cases: (i) a sum of two functions and (ii) a maximum function.
Let f =fl + f2, where fif2 E M(x). It then follows from (10.4) that Dflx) _
where
f(x) _ fi(x) + h(x), df(x) = af, (x) + af2cx
(13.12)
.
Since T(fl) and T(f2) are thick sets on V, we have that T(fl = T(fl) n T(f2 )
is also a thick set on V, and T(t) c T(f). Similarly, C(t) = C(fl) c C(f2) is a thick set on C, and C(f) c C(fl. Suppose that Xk x
xk -4 x,
xk E T(f),
gk =
11 xkx I I
-4 g,
g E C(f) .
Since }'1, }2 E M(x), we have
Vfi(xk) -4 z1(8) = V1(8) + w1(8),
V.fz(xk) -4 z2(8) = V2(8) + N'2(8)
where
vl(g) = Arg max {(v,g) I V E ffd1x)} , v2(g) = Arg max {(v,g) I V E ffdZ(x)} , ...
wl(g) = Arg min {(w,g) I W E dfl(x)} w2(g) = Arg min {(w,g) I W E df2(x)} .
,
Chapter 13
148
Since xk e T(f), we have Vf(xk) = Vfj (xk) + Vf2(xk) and hence Vflxk) zi(8) + z2(g) = v(g) + w(g), where (13.13)
v(S) = v1(8) + v2(9)) w(8) = w1(8) + x'28) . It is clear from (13.12) that
Arg max J(v,g) I v e fi(x)} "^J
Arg max J(v,g) I v E f1(x)} + Arg max J(v,g) I v e e2(x)l = vl(g) + v2(g) .
(13.14)
Similarly, .-.
Arg min {(w,g) I w E fi(x)} = wl(g) + W2 (9) .
(13.15)
We conclude from (13.13)-(13.15) that Vf(xk) - v(g) + W(g) ,
(13.16)
where
v(g) = Arg max J(v,g) I v e f(x)} _ O(g) , w(g) = Arg min {(w,g) I W E af(x)}
(g) ,
and the sets fi(g) and O(g) each consist of a single point. Thus (13.16) holds for all g E C(t) and for all sequences {xk} such that xk --x
Xk E T(f), xk -4 x, A/
gk = Ilxkx II
N
-4g , AI
where the sets CO and T(f) are thick on C and X respectively, and C(f) e C(f ),
T(f) e T(f ). It follows that f E M(x), and therefore (13.11) holds. This
proves the theorem for a sum of two functions. Now let us consider a maximum function. Suppose that
f(x)
where I = 1:N, f E M(x).
ma x f (x) , i
The Clarke Subdifferential and the Quasidifferential
149
Recall that if the functions fi, i E I, are Lipschitz on an open set X e Rn ,
then for almost all x' E Tf we have Vf(x') = Vf (x') V i E R(x'), where R(x') =j i E I I f (x') = f (x')}.
Let T1(fl e Tf denote the set of all points in T(t) for which this property holds. Let
C'(f) = n C(f ), T'(fl = n T(f) iEI
iEI
.
It is obvious that C'(f) and T'(f) are thick sets (on C and V, respectively). Since C(f) and T1(f) are thick sets, it is clear that C{fl = CO n C'(f) and T(f) = T1(fl n T'(f)
are also thick sets on C and V, respectively. Now consider a sequence {xk } such that Xk x
Xk -4 x,
Xk E TO,
gk = IIXkx I
I
-p g,
g E C(f) .
We can assume without loss of generality that i(g) E R(xk) V k. Therefore i(g) E R(x). Moreover, it is easy to see that i(g) E R(x,g), where
R(x,g) =
df(x) _ df (x)
i e R(x)
dg
dg
Since
,
f(x) = co fi(x) -
dfk(x) I i e R(x)
ke R (x)
k*i
df(x) _
2: df (x) iE R (x)
,
we have, for any i E R(x,g), max VE o f (X)
(v,g) = m a x (v,g) VEAi
,
Chapter 13
150
where
A; = afj (x) -
E
kE R (x)
afk(x)
k*i
Therefore, since i(g) E R(x,g), we have max
vE aJ (x)
(v,g) =
max
vE d fi(g)(x)
(v)g) --
min
kE R (x) w E 0 fk(x) k* i(g)
(w,g)
.
But g E C, i.e., g E C(f) V i, g E C(f ), and hence --
I
ITV
Arg max {(v,g) I V E(x)} = vi(g) (g)
kE R (x)
wk(g) __ v(g) ,
(13.17)
k* i(g)
where
vi(g) = Arg max {(v,g) I V Ed(. fi )} wi(g) = Arg min {(w,g) , w E df (x)}
.
Note that the set Arg max {(v,g) I V E f(x)} is a singleton (since g E C(f)). It is clear that min WE d f (X)
(w,g) =
Y iE R (X)
min
(w,g)
WE d fi(X )
and therefore
Arg min {(w,g) I W E df (x)} _
Y
wi(g)
.
(13.18)
iE R (x)
Now, from (13.17) and (13.18) we have Arg max {(v,g) I v E fi(x)} + Arg min {(w,g) I W E d}'(x)} (13.19)
The Clarke Subdifferential and the Quasidifferential
151
On the other hand, since xk E T1(f), we have
V.f(xk) = Vf(8)(xk) and therefore
limVf(xk) = limVf(g)(xk).
k-oo
(13.20)
k-oo
Since f (g) E M(x), we can write vf(g)(xk) -4 zi(g)
(13.21)
vi(g)(g) + wi(g)(g) .
Thus, from (13.21), (13.20) and (13.19) we finally conclude that f E M(x).
Remark 13.1. We can now find the Clarke subdifferential at x of any quasidifferentiable function belonging to M(x), as long as we have a quasidifferential of f at x. Quasidifferential Calculus can therefore be used as a calculus for Clarke subdifferentials (via relation (13.11)).
Observe that this is a real calculus since the rules are expressed in terms ...
of equalities, not inclusions. caw
Relation (13.11) shows that the Clarke subdifferential is a contraction of a quasidifferential (and therefore functions which have different quasidifferentials may have the same Clarke subdifferential).
Example 13.1. Let X E R1, .1F(x) = x2 sinx1, and x0 = 0. The function f is Lipschitz and differentiable on R1:
2xsinx
- cos- if x 0,
VAX) = f f(X) L
0
if=. if x = 0 .
Clearly, dc zf'(xo) = co {v 13 {xi}, xi -- 0, Vf(xi) -- v} = [-1,1].
cry
Since f(xo) _ [{0,0}, {0,0}1, it is clear that relation (13.11) cannot be satisfied. This is because f 4 M(xo) (the function f is differentiable but not continuously differentiable at x0). In this case,
0(g) _ {0}, 0(g) _ {0}, 0(g) _ 0_(g) + 0(g) _ {0}
Vg
and we cannot find a thick set T(f) c T(f) = (-oo,oo) (here C(f) = {-1,1 } and is "thick" with respect to itself) such that z(g) = 0 for any sequence {xk } satisfying (13.9). This is because the sequence {Yk }, where yk = cos 1/xk + 2xk sin 1/xk,
may tend to any number in the interval
not necessarily to zero.
Chapter 13
152
Example 13.2. Let x = (x(1),x(2)) e R2, xo = (0,0), and define
f(x) = max {min {x(' ),x(2) }; x(2) -x(I) }
(13.22)
.
Put I"1
f1(x) =min {X(I), z(2)}, .f2(x) _ X(2) -X(I)
>
f3(x) = x(') , f4(x) = -x(2)
.f(x) = max {fi(x),.fi(x)}, fi (x) =min {f3(x),.fa(x)}
.
and f4 are continuously differentiable, we can take , Since f2f3 Df2(xo) =
af(x0)J, ei(xo) = {(-1,1)}, af(xo) = {(0)4)}
D.f3(x0) = 13(x0), df3(xo)],
f3(xo) = {(1,D)}, dfs(xo) = {(0 ,0)}
Dfa(xo) = [a(xo), afa(xo)]>
%(xO) = {(0,-1)}, o'fa(xo) = {(0,0)} .
Using the rules of quasidifferential calculus (see Chapter 10), we have
Dfl(xo) = [e&O)q fj(xo)]
9
where
2fI(XO) = Wq-l)j
9
dfi(xo) = co J(0,1), (-190)1
9
and
Df(xo) = &(xo), f(xo)]
,
where
f(xo) = co {(1,-1) - (0,0); (-1,1) - co {(0,1), (-1,0)} = co {(1,-1); co {(0,1), (-1,0)}} = co o '(xo) = (0,0) + co {(0,1), (-1,0)1 = co {(0,1), (-1,0)} We observe that f E M(xo).
(0,1), (-1,0)1
,
The Clarke Subdifferential and the Quasidifferential
153
It is clear from Fig. 13.1 that we can take
C(f) = C(f) _ Ig E R2 1 IIg II = 1 } \ u
iE 1:4
Ai
,
where
A, =
-
9
9-
A3
(T2
Tj ) 1
1
( 42-
,
9
T2)
A2 =
(2 T 9
5
1
T5
I-
A4 NF5
T 5)
Find
fi(g) = Arg max {(v,g) (v e f(x0)} O(g) = Arg min {(w,g) I w e df(xo)} . con
It is clear that for g E A1A2 (where AB is the arc on the unit circle which connects points A and B but does not include them) we have (see Figs. 13.1 and 13.2) O(g) = {(0,1)}, O(g) = {(-1,0)} and hence 0(g) = {(-1,1)} .
x2
Figure 31.1
Figure 13.2
Chapter 13
154
e-1
m
coq
For g e A2A3 we have {(1,-1)}, O(g) _ {(-1,0)} and hence O(g) = {(0,-1)} m
010
For g E A3A4 we have {(1,-1)}, O(g) _ {(0,1)} and hence 0(g) = {(1,0)}
For g e A4A 1 we have
fi(g) = {(-1,0)}, O(g) = {(0,1)} and hence 0(g) Applying (13.11) we finally obtain (see Fig. 13.3) x2
x1
Figure 13.3
do Lf(xo)
= co {O(g) I g E C} = co {(-1,0), (0,-1), (1,0)} .
The direct evaluation of do z f (xo) using formula (13.1) gives the same result.
Remark 13.2. The function (13.22) was described by F. Clarke in [31, Section 25].
Chapter 14 IMPLICIT AND INVERSE FUNCTION THEOREMS
CAD
14.1. The implicit and inverse function theorems are essential components of classical differential calculus. Here we describe analogous theorems for quasidifferentiable functions.
First we shall consider the case in which there is only one equation. A system of equations will be examined in Section 14.3. Let z = [xy] E Rm + 1
,
where x e R"', y E R1, and consider a given function f(z) = f(x,y). Let zo = [xo,yo] be a solution of the equation f(z) = 0, i.e.,
f(xo,yo) = 0 .
(14.1)
Fix any g E Rm, 11g II = 1, and consider the function 010
F(a,y) = f(x0+ag,y), where 0
a continuous function y(a) defined on [O,ao] can ,..
a
be found, such that
F(a,y(a)) = 0
V a E [O,ao]
,
(14.2)
then we say that equation (14.2) describes an implicit function y(a) = y(a,g).
In this case a neighborhood of the point zo contains a solution to the equation
F(xo+ag,y) = 0
.
(14.3)
The question of existence and uniqueness of implicit functions has been studied in some detail in the case where the function f is smooth. Now let us 155
Chapter 14
156
N
assume that f is jointly quasidifferenti able in both variables at the point zo and continuous in some neighborhood of zo. Then
f(z0+a) = f(zo) + a
max (v,77) + min (w,R) vEd f(ZO) WEd f( 2O)
+ o(a) .
(14.4)
arm
Here 17 = [g,q] (recall that g E Rm is fixed). We assume that there exists a qo ER' such that f(zo,770) =
af( zo) drb
max
min
(v,i7o) +
VE d f (ZO)
(w,i7o) = 0
,
(14.5)
WE d f( ZO)
where 77o = [g,qo] Let that function o(a) = o(a,g) in (14.4) be such that o a,
a
-4
a-+0
(14.6)
0
uniformly with respect to q in some neighborhood of the point qo E R1.
This assumption is quite natural and is satisfied in many cases of practical importance. Put
v = [vl,v2],
w = [w1,w2] ,
where v2 E R 1,
V1 E Rm ,
wl E Rm ,
W2 E R 1,
and consider the function
h(q) =
max
vEd f(ZO)
((v1,g) + v2q) +
min
wEd f(ZO)
((wl,g) + w2q) .
(14.7)
Relation (14.5) implies the equality
h(9o) = 0 .
(14.8)
Implicit and Inverse Function Theorems
157
Theorem 14.1. Let h(qo) = 0. If the function h(q) is strictly monotonic in a neighborhood of the point qo, then there exists a function y(a) which is defined and continuous for a E [O,ao], where ao > 0, and such that equality (14.2) is satisfied. Furthermore, the right-hand-side derivative of the function y(a) exists at the point a = 0 and
(0) a yyo y+ C) y(O) =
1im
,
a-+0
a [y(a)) -- yC0 )]
= qo
(14.9)
0
P r o o f. It follows from (14.1) and (14.4) that f(xo+ag, yo+a(qo+,8)) = ah2(a,A
,
(14.10)
where
h2(a,fl) = hl(,8) +
o a,fl)
hiA = h(qo+,8), hl(0) = 0 .
9
Since the function h(q) is strictly monotonic in a neighborhood of the point qo, the function hl(#) is strictly monotonic for ,l3 sufficiently small. Therefore there exists an Eo > 0 such that for any e E (O,eo] we can find 131(e) and Q2(E) which satisfy the equations
hl(#2(e)) = -e
h I(#, (e)) = e I and the relations ,81(e)
0
fl2(e) - 0
0,
.
m
Fix any e E (O,EO]. It follows from (14.6) that
oa a
OAA
_
a
-4
0
a-+0
uniformly with respect to,8 (if the absolute value of fl is sufficiently small). Therefore there exists an ao(E) > 0 such that e
NAM
0
V e E Aad(e)]
't3
Chapter 14
15 8
Thus WIN 2
29
The continuity of the function h2(avg) in ,8 implies that there exists a /3(a) such that h2(a,,Q(a)) = 0, i.e., f(xo+ag, yO+a(qo+fl(a))) 010
0
Clearly,
fl(a)
-4
a 4+0
0
Y(a) = Yo + a(9o+Q(a)) is a solution to equation (14.2) such that y+ (0) = qo.
0
14.2. Now let us consider the case m = 1, i.e., the case in which f(x,y) is a function of two variables. Let z = (x,y) E R2 and let zo = (xo,yo) be a solution to the equation f (z) = 0, i.e., O15,
f(XO'YO) = 0
-
Put h(q)
(v1g+v2q) +
max
=
(v1,V2)E
(ZO)
min
(w1,w2)E af(zO)
(w1g+w2q)
h(q)+l(q). Here g = +1 or -land
h(q) =
max (v1)v2)E f (ZO)
min
(w1g+w2q)
.
(w1,w2)E ?f (ZO)
and B =Pry [4f(zo)] be the projections of the sets e(zo)
Let A = Pry
respectively, on the axis y (A and B are obviously intervals).
r.1
and
(v1g+v2q), h(q) =
Implicit and Inverse Function Theorems
159
Proposition 14.1. if A n B
m
(14.11)
,
then the equation h(q) = 0 has a unique solution and the function h(q) is strictly monotonic (for both g = +1 and g = -1).
P r o o f. Fix any q E E1 and let A = [a1,a2 ], B = [b1,b2 ], where a1 < a2, b1 S b2. Take a direction s E El (s = +1 or s = -1). Then
max vs+
(v1,v2) E R(q)
mi n (wl,w2) E Q (q)
ws 2
14. ()
where
R(q) = {v = (v1tv2) I V E f(zo); v18 + v29 = h(9)}
Q(9) = {w = Mtw2) I W E df(ZO); x'18 +x'29 = h(9)} . It follows from (14.11) that
as
>_ max {b1-a2, a1-b2} = c >0 .
(14.13)
It is clear from (14.12) that if
for a direction s, then the inequality
necessarily holds for the direction s1 = -s. Since (14.13) is valid for any q, there exists a unique qo such that h(qo) = 0. Inequality (14.13) also implies that the function h(q) is monotonic.
Chapter 14
160
Corollary 14.1. If f is a subdifferentiable function, then Df(zo) = &(zo),O] and condition (14.11) is equivalent to the condition 0 4 A.
If f is a superdifferentiable function, then (14.11) is equivalent to the condition 0 4 B. If f is a smooth function, then B = O1, A = f(z0) and condition (14.11) becomes the classical condition
f(xo,yo) # 0 . Remark 14.1. Proposition 14.1 can be generalized to the case x E Rm. Put HQ7) = df (zo)/d17 and find the solution to the equation
H(17) = 0 .per
subject to the condition
(14.14)
11 17 II = 1. -14
Assume that equation (14.14) has a finite number of solutions where 77k = (P 1 k'p2 k )
LA,
CIO
p,,
Of course it may happen that there is either an infinite number of
solutions or none at all. For example, if
4f(zo) c int J(zo) or e(zo) c int [--af(zo)]
>
..4
then equation (14.14) has no solution. Take a solution 77k = (Plk'P2k). If P1 k > 0, then q1 k = P2 k'Pl k is a solution to the equation max
(w1 +w2q) = 0 .
min
(v 1 +v2q) +
IUD
h1(q) -
(14.15)
(w1,w2) E af(Z0)
(v1,v2) E af(ZO)
If, in addition, the function hl (q) is strictly monotonic in a neighborhood
of the point ql k, then there exists a continuous function y(a) defined on [O,ao ], where ao > 0, such that
y(o) = yo,
f(x0+a, y(a)) = 0
V a E [0,ao]
.
(14.16)
The function y(a) is differentiable from the right at the point a = 0 and
y+(o)=qlk
If pl k < 0, then q2 ,k = P2 k/ Pl k is a solution to the equation
h2 (q) =
max
(vIIv2) 6 d(Z0)
(-v1 +v2q) +
min (w1,w2) 6 f(ZO)
(-w1 +W2 q) = 0 . (14.17)
Implicit and Inverse Function Theorems
161
4-+
If, in addition, the function h2(q) is strictly monotonic in a neighborhood of the point q2 k, then there exists a continuous function y(a) defined on [O,ao], where ao > 0, such that .
.-,
V a E [O,ao] w
y(O) = Yo, f(x0-a, y(a)) = 0
(14.18)
This function y(a) is differentiable from the right at the point a = 0 and y+(0) = q2 k If p1 k = 0, then the equation f(xo+ag,y) = 0 may have a solution (for g = +1 .-r
or g = -1) which has an infinite derivative at zero (this solution corresponds to the point p7k ). r,,
If pl k > 0 V k E 1,.e, then the equation f(xo-a,y) = 0 has no solution for a > 0 sufficiently small (in a neighborhood of yo). If pl k < 0 V k E 1:.e, then the equation f(xo+a,y) = 0 has no solution for a sufficiently small (in a neighborhood of yo). ?7
,...,i7
can help us to solve the inverse problem: find
Q,^
CD's
4.4
The points
max r01
hi(g) =
BCD
4-A
solutions to the equation f (x, yo+aq) = 0 (where q = +1 or q = -1). In particular, if P2 k > 0 and the function
(v1g+v2) +
min
(w1$+w2)
(w1,w2) E f(z0)
(v1,v2) E af(ZO)
is strictly monotonic in a neighborhood of the point g1 k = Pi k'P2 k, then there exists a continuous function x(a) defined on [0,ao], where ao > 0, such that
x(0) = xo, f(x(a), yo+a) = 0
V a E [0,ao]
(14.19)
and x+ (0) = gl k.
Analogously, if P2 k < 0 and the function
h2(g) =
max (v1,v2) E af(ZO)
(v1$-v2) +
min
(wlg-w2)
(w1,w2) E f(zO)
is strictly monotonic in a neighborhood of the point 92k = Pl k'-p2 k, then there exists a continuous function x(a) defined on [0,ao], where ao > 0, such that
x(0) = xo, f(x(a), yo-a) = 0 and x+ (0) = 92k-
V a E [0,ao]
(14.20)
Thus, in the two-dimensional case, studying a subdifferential of the function enabled us to prove the existence of an implicit function for both directions, g = +1 and g = -1.
Chapter 14
162
Example 14.1. Let f(xy) = jxj - jyj + 2y and zo = (0,0). Clearly, RZO) = f(OVO) = 0
-
We have RZ) = A (Z) + f2 (Z)
,
where
fi(z) =
f2(z) = -lvi
ICI + 2Y,
The function f, is subdifferentiable at the point zo and we can take
f1(zo) = co {(-1,0),(1,0)} + (0,2) = co {(-1,2),(1,2)} df, (zo)
JO)Oj
The function f2 is superdifferentiable at zo and we can take
f(zo) = {(0,0)}, af2(zo) = co {(0,-1),(0,1)}
.
Therefore
f(zo) =
Jj(zo),
afcZO> = af2(=o)
In Figure 14.1, the set f(zo) is bounded by a bold line and the set [-f(zo)] .t)
by a dashed line (-df (zo) = f(zo)) (in this case 4-f (zo) = df(zo)).
,.0.<
...
Find solutions to the equation H(i) = 0. Let L1 be the interval connecting points (1,2) and (0,-1), and let L2 be the interval connecting points (-1,2) and (0,-1). We find the points in L1 and L2 which are the nearest to the origin and call them z1 and z2, respectively.
Then the solutions to the equation H(i) = 0 are Z2
Zl 171
=
172=
11Z111'
112211-
It is clear that 3 771
Z
TI
- VltJI- )
772
3- --1
10 I
N1V
)
.
Implicit and Inverse Function Theorems
163
Since pit =3/vTV>0,we have g1i = - 1/3. The function hi (g) (see (14.15)) is strictly monotonic in a neighborhood
of the point ql 1, and therefore equation (14.16) has a solution y(a) for
sufficiently small a > 0 and y' _ - 1/3. Analogously, since P21 = - 3/410 < 0, we have q21 = - 1/3; equation (14.18) has a solution y(a) for sufficiently small a > 0 and y+'(0) _ - 1/3. Next we consider the inverse problem. Since the second components of the vectors 1i and 112 are negative, there exist two continuous functions xl (a) and x2 (a) (defined for sufficiently small a > 0) such that
x1 (0) = xpt f(x(a),y0-a) = 0
V i E 1:2.
Furthermore, xi+ (0) = 3, x2+ (0) = 0.
At the same time, there is no solution to the equation f(x, yo+a) = 0 (where a > 0) in any neighborhood of the point xo.
Example 14.2. Let f(x,y) = Ixl - lYl, zo = (0,0). It is clear that f(zo) = 0. We can take
f(zo) = co {(-1,0),(1,0)}
,
df(zo) = co {(0,--1),(0,1)}
Again, 4f(zo) = f(zo). V V
Figure 14.1
Figure 14.2
Chapter 14
164
It is clear from Fig. 14.2 that the equation H(r7) = 0 has four solutions:
27 1
L - F2
\F2
272
t
L
NF2
- \F2
1
273
1
272
t
-\F2 '
-%F2
Since pl 1 > 0 and p2 2 > 0, equation (14.15) has two solutions: q1 1 = 1, q12 = -1.
The function h1 (q) is strictly monotonic in a neighborhood of the points
q1 1 and q12 . Therefore there exist two functions yl (a) and y2 (a) such that
yi(0) = 0, f(xo+a,y(a)) = 0 V i E 1:2 for all sufficiently small a > 0. In addition, yi + (0) = 1, y2.} (0) = -1. Analogously, since p13 < 0 and P14 < 0, equation (14.17) has two solu-
tions q13 = +1, q14 = -1, and the function h2(q) is strictly monotonic in a neighborhood of points q1 3 and q1 4 .
Thus, there exist two functions y3(a) and y4(a) such that
V i E 3:4
yi (0) = yo = 0, f(xo-a, yi (a)) = 0
.-r
r-,
for sufficiently small a > 0. In addition, y3 + (0) = 1, y4 + (0) = -1. It is not difficult to check directly that this is really the case.
Example 14.3. Let f(xy) = sin xj - sin 12yj and zo = (0,0). It is clear that f(zo) = 0. We have RZ) = A (Z) + f2 (Z)
,
where
f1(z) = sin xj = max {sin x, -sin x} , f2(z) = -sin 12yj = -sin (max {2y,-2y}) . The function fi(z) is subdifferentiable at the point zo and we can take
f1(zo) = co {(cos 0, 0), (-cos 0, 0)} = co {(1,0), (-1,0)} t(0)0)1
Implicit and Inverse Function Theorems
165
0
The function f2(x) is a smooth composition of quasidifferentiable functions, i.e., f2(x) = q5(y(z)), where
0(t) = sin t, yl(z) = 12y1. Obtaining the quasidifferential of a composite function in the usual way, we have Df2(zo) _ 0"(Y1(zo))DY1(zo) = -cos 0 Dyr(zo) _ -Dyr(zo) D Vl(zo) = [r2o(zo),
where
dyf(zo) = co {(0,2), (0,-2)} d yf(zo) _ {(0,0)} Therefore
t'2(zo) _ {(0)0)} N
df2(zo) _ -dyf(zo) = co {(0,2), (0)-2)} Thus 0
f(zo) = co {(1,0),(-1,0)} 0
df(zo) = co {(0,2),(0,-2)} . It is clear from Fig. 14.3 that the equation H Q7) = 0 has four solutions:
171 =
(45
e
2 273
_
I,
VJ
1
VJ $ ;5-
s
272
r > -VJr )
s
L yJ
272 = (
2 %F5
-
)
.
yJ
For each of two directions, g = +1 and g = -1, there exist two functions yi (a), y2(a) satisfying equations (14.16) and (14.18), respectively (for all sufficiently small a > 0), and such that y1 (0) = yo (i E 1:2). In each case, one of these functions has a right-hand-side derivative equal to 1/2, while the second has a right-hand-side derivative equal to -1/2.
Chapter 14 C13
166
Example 14.4. Let f(x,y) = Ix +y2-21 - Jx2-2y+i1 and zo =
(1,-1). Clearly,
f(go)=0. `'h
According to the rules of Quasidifferential Calculus (see Chapter 10), we can take Df(zo) = &(zo), af(zo)J, where
f(zo) = co {(1,2), (-1)-2)}, df(zo) = co {(2,-2), (-2,2) . It follows from Figure 14.4 that the equation df(zo)/dij = 0 (111711
1) has
four solutions:
771= (&) 773 = (0,1) ,
V1-7-
l/
774 = (0,-1) .
Y Y
Figure 14.3
Figure 14.4
The equation f(x0+a, y(a)) = 0 has a solution related to the point 77, such that y(O) = yo = 1, y,'(0) = 1/4.
Similarly, the equation f(x0-a, y(a)) = 0 has a solution related to the point 172 such that y(O) = yo = 1, y.} (0) = - 1/4.
The points 273 and 774 indicate (since p13 = P14 = 0) the possibility that there are solutions to equations (14.16) and (14.18) which have infinite
derivatives at the point a = 0. It is not difficult to check directly that such
Implicit and Inverse Function Theorems
167
M-+
solutions exist. Since p2 * 0 V i E 1:4, the equation f(x, yo+a) = 0 (for all sufficiently small a > 0) has two solutions x1(a) and x2 (a) related to the points 711 and 772 and such that
xi + (0) = 4,
x1(0) =x2(0) = xo = It
x2+ (0) = 0
.
Similarly, the equation f(x, yo-a) = 0 has two solutions x3(a) and x4(a) related to the points 773 and N and such that
x3+(0)= -4,
x3(0)=x4(0)=xo= It
x4+(0)=0.
G."
ti,
Example 14.5. Let f(x,y) = Ix+y2-21 + 1x2-2y+1I and zo = (1,1). It is clear that f(zo) = 0. The function f is subdifferentiable at the point zo and we can take
f(zo) = co {(3,0), (-1,4), (1,-4), (-3,0)} df(zo) _ {(0,0)}
vow
Since [-rdf(zo)] c int f(zo), the equation df(zo)ld1 = 0 (where 47 11 = 1) has no solution. Therefore there are no solutions to equations (14.16) and (14.18) in a neighborhood of the point (1,1).
14.3. Now let us consider the multidimensional case. Let z = [x,y], x E Rm, y E Rn, and let the function f (z), i E I:n, be quasidifferentiable on Rm + n Consider the following system: r-.
f (x,y) = 0
V i r= 1:n .
This can be rewritten in the form
f(z) = 0
(14.21)
,
CT'
where f = (f ,...,fn ), 0 E Rn. The problem is to find a function y(x) such that
f(x, y(x)) = 0
ViE 1:n,VxE Rn
.
Unfortunately we cannot solve this very general formulation of the problem for an arbitrary quasi differentiable system of type (14.21). But what we shall try to do is to solve this problem for a given direction g E Rm. We shall call this a directional implicit function problem.
Chapter 14
168
Suppose that zo = [xo,yo] is a solution to system (14.21), i.e.,
J(zo) = 0
V i E 1:n
.
Consider the system of equations
f(x0+ag, y(a)) = 0
(14.22)
,
where a > 0. Since the functions f are quasidifferentiable for any q E Rn, we have from (14.21) that dfi (zo )
f (xo+ag, yo+aq)
= f (xo,yo) + a d
=ad
[g,q]
+ oi(a,q)
i(zo) + of (a,q) [g,q]
(14.23)
where
dfi (zo)
d[g,q]
=
max
viEdfi(z0) [(vi
i 99) + (v2 i ,q)]
min
+
[(wt i,g) + (w2 i,q)] .
(14.24)
W E o fi (Z0)
Here Df i (z) = [ f (z), d f (z)] is a quasidifferential of fi at z; f (z) c R'n + n , N
df (z) c R'n+n are respectively subdifferentials and superdifferentials off at z (convex compact sets); vi = [vii, v2 i] and wi = [wi i, w2 i]. Let qo E Rn be a solution to the quasilinear system afi (zo)
d [g,qo ]
=0
Vi E
1: n
.
(1425) .
Suppose that in (14.23) of (a,q)
(14.26)
Implicit and Inverse Function Theorems
169
uniformly with respect to q E S8(g0)
Jq E Rn I ll(J-(jp 11 < } s
where 8 > 0 is fixed.
Is it possible to find a vector function r(a) with ao > 0 such that
V i e I :n, a E [O,ao]
f (xo+ag) yo +a[qo+z(a)]) = 0
(14.27)
,
where r(a) E Rn V a E [O,ao]? Take e 2t 0 and introduce the sets vi e ei(zo) I (vl i,g) + (v2 i,q0) 2 R ie
=
max
Ift,
vie d f i
[(vi i ,g)+(%i ,qo)] - e (ZO)
wi E d f (ZO) I (wl i,g) + (w2 i )qo)
<
min
r-,
gic =
W E d fi (ZO)
=
vi e ei(z0) I (vi i,g) + (v2 i,q0+Z)
max
r-,
R i(Z)
viE d f i (ZO)
Ri(Z)
=
[(W'I i,g)+(w2 i,q0)] + e
[(vi i ,g)+(V2 i ,q0+z)]
wi e d f (z0) I (wi i,g) + (w2 i,qO+Z) =
min WE
fi (z0)
[(1'i i,g)+(w2 i,q0+Z)]
It is clear that all these sets depend on zo, g, qo. Note that mappings R i(z) and 0
R i(r) are upper semicontinuous (i.e., closed) and that for any e > 0 there exists a 51 > 0 such that <
R i(Z) c R ic,
R i(Z) c Ric
V i e 1:n, V- re S51 (0)
.
(14.28)
Chapter 14
170
From (14.24) we have dfi (zo )
d[g,go+z]
(Vii(Z), g) + (v2i(Z), qo+z) + (wij(Z)) g) + (wai(Z)> qo+Z) (v2i(Z)+w2i(z),z) + r1 i(z) ,
where
r1 i(z) = (v1 i(Z), g) + (v2 i(z), qo) + (w1 i(z), g) + (w2 i(z), qo) , Vi(T)
_ [v1 i(Z), v2 i(z)] E R i(z)
wi(z) _ [w1 i(z), w2 i(r)] E R i(Z)
Since R i(z) and R i(Z) are upper semicontinuous, if zs wi (ZS)
s400
s400
0, vi (ZS)
S-400
vi, and
wi, then vi E R i (0), wi E Ri (0). This means that
rI i(O)
dfi (zo)
d
and the rl i(z) are continuous. It follows from (14.25) that
rl i(0) = 0
V i E l:n
.
(14.29)
Thus, from (14.23) f (xo+ag) yo+a(qo+z)) = a[(v2 i (z)+w2 i (z), z) + ri (a,z)] , where
ri(a,Z) = r1 i(s.) +
oi(a , qo+Z)
a
Consider the functions Fia (Z) = (v2 i (Z) + w2 i(2'), Z) + ri(a,2) . Here v21(r) E "21(')' w2 i (Z) E
"2i()' where
V2i(Z) _ {v2i 13 v1i E Rn: [vli,v2i] E Ri(Z)} W2 i(Z) = {w2 i 3 wl i E Rn : [w1 i,w2 i] E R1(z)}
(14.30)
Implicit and Inverse Function Theorems
171
The mappings vi i(r) and w2 i(r) are upper semicontinuous. Now introduce
the set M(z) of matrices such that A E m(r) if A is a matrix with ith row [v2 i(z) + w2 i(Z)]T, where
v2 i(z) E V2 i(z) and w2 i(Z) E W2 i(z)
The mapping M is convex valued and upper semicontinuous. Let us denote by M. (where e >_ 0) the set of matrices such that A2
Me = IA =-
:
Is
Ai = [v2 i+w2 i]T , V2 i e
R, W2 i E R ie
V
An
From (14.28) it is clear that
M(z) C M£ d z E S' 61 ( 0 )
(14.31)
.
Note that if 51 =di(e) in (14.31), then (14.28) is satisfied.
Theorem 14.2. If for some e > 0 we have
mi n detA>0,
(14.32)
AEMc
then for a positive and sufficiently small a there exists a solution to system (14.27) or, equivalently, to the system
Fia(z) = 0
Vie 1:n.
P r o o f. Let us construct the mapping M-1 (r) r(a,r) _ Oa (z)
,
where
M- I (r) = JB = A- I I A e M(r) I
.
From (14.31) and (14.32), it follows that 0a (z) is upper semicontinuous (for any fixed a e Mao]) in z e Sal (0) and that Oa (S81(0)) C S461 (0) .
Chapter 14
172
It is easy to see that 0a (2) is convex for all z. This means that all of the conditions of the Kakutani theorem (see [81 ]) are satisfied and therefore there exists at least one point 2(a) which is a fixed point of the mapping Oa (z): Z(a) E 0a ma)) ANC)
From (14.26) and (14.29) it is also clear that Z(a)
a0
which together with (14.30) yield Fi a (Z(a)) = 0
.
0
000
Corollary 14.2. If qo is a solution to (14.25) and condition (14.32) of
'0
III
Theorem 14.2 is satisfied, then system (14.22) has a solution y(z) defined on [O,ao] (where ap > 0) and such that im
a
fy(a)
0
= qo
goo
We shall call Theorem 14.2 a directional implicit function theorem. Of course, there could be several solutions to (14.25), or none at all. It is important to be able to solve systems of equations of the form max
vi Eali
[(VI i,g) + (v2 i,q)] + m i n [(wl i,g) + (w2 i,q)] = bi Wi Ea21
V i E 1:n
,
where vi = [vl i, v2 i], wi = [w1 i, w2 i ], and al i c Rm + n and 0121 c Rm + n are convex sets. We shall call systems of this type quasilinear.
In some cases (for example, if 61 i and 621 are convex hulls of a finite number of points) the problem of solving quasilinear systems can be reduced to ;,.,
that of solving several linear systems of algebraic equations (we shall illustrate this later on).
Implicit and Inverse Function Theorems
173
14.4. Now let us consider a special case of the problem, namely, where system (14.21) is of the form
x+O(y) = 0,
x(i) + Oi(y) = 0
(14.33)
V i e 1:n
,
where
X = (x(1),...,X(n)) E Rn, y = (y(l),...,y(n)) E Rn and the function Oi are quasi differentiable on Rn. Suppose that zo = [xo yo] E R2 n is a solution to (14.33), i.e.,
Choose any direction g E R. We now have to consider two questions: 1. What conditions are necessary for the existence of a positive ap and a continuous vector function y(a) such that the expressions V a e [O,ao]
y(0) = yo, xo + ag + r(y(a)) = 0
(14.34)
are satisfied? 2. If y(a) exists, does
.-.
Y'' (0) = 1 i m a-)+ 0
1 a Cy(a) - yC0)]
necessarily exist? CDR
To answer these questions we turn to Theorem 14.2 and its corollary. Let DOi (y)= [d Oi (y), du(y) be a quasidifferential of Oi at y. We then have
max Vie d O i (y0)
(vi,q) + -{-
Oi(yo+a) _ oi( O) + a
min
(wi,q)
W iE d O i (yo)
+ o; (a,q) .
(14.35)
Chapter 14
174
In this case equation (14.25) takes the form max
(wi,q) _ -g1 V i E 1:n
min
(vi,q) +
.
(14.36)
W 1E d Oi (Yo)
v1E d Ot (yo)
Suppose that qo E Rn is a solution to (14.36) and that in (14.23) o.(a)g)
a
a 4+0
0
uniformly with respect to q E Sg(go) We now introduce the sets
= {vi E do (yo)
R
I
(vt,q) >_
max
(vi,q) - e}
v1E d Ot (yo)
_ {wt E dO (yo) I (wi ,q)
R ie
_min
(wi,q) + e}
.
W 1E d 0t (Yo)
Let M. be a set of matrices such that [At
Me
A
,
Ai = [vi + w,]T , Vi E Rte, wt E kig, V
i
A_ An
where e >_ 0.
Theorem 14.3. If for some e > 0 we have
min det A > 0
,
(14.37)
AEMe
then there exist an Cro > 0 and a continuous vector function y(a) such that
and y+ (0) = qo.
Implicit and Inverse Function Theorems
175
Remark 14.2. In the case where each of the sets ao;(yo) and dOt(yo) (for all values of t) is a convex hull of a finite number of points, it can be shown that Theorem 14.3 is valid if (14.37) holds for e = 0. An analogous result can also be obtained for Theorem 14.2. L3,
Remark 14.3. Suppose that [xo,yo] is a solution to (14.34). Then to solve the directional inverse function problem, it is necessary to find all solutions to (14.36) and check whether condition (14.37) is satisfied.
Chapter 15 CONES OF ADMISSIBLE DIRECTIONS AND A NONDEGENERACY CONDITION
cor
15.1. In studying and solving extremal problems, it is often convenient to U1,
approximate those sets over which optimization is to be performed. Approxima0
tions of this type can also be useful in studying other problems. There are a number of ways of making such an approximation; we shall mention only those we shall be using in later chapters.
000
Let Q be a set in the space Rn and let x be a point in the closure of Q. A vector g E Rn is called an admissible direction with respect to the set at the point x if there exists an ag > 0 such that x+ag E Q V a E (0,ag ). A vector g E Rn is called a tangent direction with respect to the set Q at the point x if for any a > 0 there exists a vector
a
-4
0
(a) such that x+ag+yr(a) E £2.
0.
(15.1)
Let V1 be the set of functions yf operating in Rn, defined for sufficiently small a > 0 and such that (15.1) holds. Elements of the set V/1 will also be denoted by o(a).
Thus, a direction g is tangent (with respect to a set Q at a point x) if there exists a function f E V/1 such that x + ag + Vl(a) E Q or, equivalently,
if x+ag+o(a) E Q. A vector g is feasible with respect to a set Q at a point x if there exist sequences {gk } and {ak } such that gk E Rn ,
ak > 0, gk -4 g, ak -4 +0, x + ak gk E Q .
The following definition is equivalent: a vector g is feasible if for any e > 0,
a0 > 0 there exists an element w E g+B,, and a number a E (0,a0) such BCD
that x+aw E 12. Let y(x,A2) denote the set of directions which are admissible with respect
to a set .2 at a point x; let K(x,Q) denote the set of directions which are tangent to a set d2 at a point x, and let r'(x,Q) denote the set of directions which are feasible with respect to a set Q at a point x. 176
Cones of Admissible Directions and a Nondegeneracy Condition
177
Clearly, each of the set y(x,Q), K(x,Q) and r(x,sl) is a cone (although not necessarily a convex cone). We shall call these sets the cones approximating a set d2 at a point x. It follows directly from the definition that
K(x,sl) =
U K V (x,(2) , VIE V1
where
KxQ= n
a> 0
l Q-x- a.
Similarly,
n rxa = C> n 0 a0>0 u aE (O,ao)
1 Q-x + eB.
Let
Pa (y) = in f Ily-z ll 2E2
Since x e cl fl, we have 012 (x) = 0. It is clear that g e y(x,i2) if and only if there exists an ag > 0 such that pfj (x+ag) = 0 V a E (O,ag). Property 15.1. The following equality holds:
K(x,!l) _ {g e Rn I (pa )'(x,g) = o}
(15.2)
.
P r o o f. By definition, '(,g) x = 1 i m 1 [p1-i ( x+ag) -P(x)] = 1 i m Let
lim 1p(x+ag) =0 a and let va E Q be such that Ilx+ag-va II < Pa (x+ag) + a2 .
a P (x+ag)
.
Chapter 15
178
Put V/(a) = x+ag-va. Then x+ag+br(a) E d2 and simultaneously yr E V1. Thus g E K(x,Q). Analogously we can show that the relation g E K(x,d2) implies (pp)'(x,g) = 0. For the cone T(x,d2) we have SIa
a
fPn (x+aS) -PD (x)] =
0}
(15.3)
or, equivalently,
a JJni a -,+0
y(x, Q) c K(x,Q) c T'(x, w) .
(15.4)
If x is an interior point of Q, then all three cones coincide with Rn ; if x is an isolated point, then all three cones contain only zero.
(DEC
0.0
The cone T(x,Q) is always closed. Indeed, let g be a limit point of this cone. For every .6 > 0 and ao > 0 it is necessary to find a v E BB (g) and an a E (O,ao) such that x+ av E D. Let w E T(x,Q) be such that IIw-g II < .6/2. Then there exist a v E B,12 (w) and an a E (O,ao) such that x+ag E d2. This is precisely what we required, i.e., T(x,Q) is closed.
Col'
If the function pp is directionally differentiable at x E cl d2 and its
fir,
directional derivative is continuous as a function of direction, then the cones K(x,Q) and T(x,Q) coincide. A corollary of this is that the cone K(x,Q) is closed (in general this cone is not assumed to be closed). Note also that the following inclusions hold:
I'(x,Q) c cl (cone(Q--xw)), y(x,Q) c cone
(Q-x)
.
(15.5)
tea)
.fl
Indeed, if g E T(x,Q), then there exists a sequence {gk } such that gk -4g and akgk E [Q x] for some ak > 0. This implies the first inclusion in (15.5). The second inclusion can be checked in a similar way. The following property follows from relations (15.5) and (15.4): If y(x,Q) = cone (Q -x), then cl y(x,Q) = T(x,Q)
.
..,
0
These three cones describe the local properties of d2 in a neighborhood of x. Indeed, it follows from the definition that for any e > 0 we have
Cones of Admissible Directions and a Nondegeneracy Condition
170
y(x, i2) = y(x, Q r Bo (x) )
K(x,(2) = K(x,(2 r Bo(x))
,
T(x,Q) = T(x,Q r
.
B,,(x))
CAD
Why is it necessary to use different cones? The cone of admissible directions seems to be the simplest and most convenient of the cones considered, but unfortunately in many cases it is empty or contains only zero. For example, if d2 is described by an equality, i.e., d2 = {x h(x) = 0}, and if the graph of the function h does not contain "linear" segments in a neighborhood of the point [xof(xo)], then the cone y(xo,D) consists only of zero. In some cases the cones K(x,a) and T(x,!2) provide much more informa0
I
CSC
tion about the set a than does the cone y(x,a), but they are much more !"f
difficult to construct. Thus, if a is the intersection of a number of sets
n 'Ii
iE 1:N
then y(x,(2) = n y(x,ai) and, thus, if we know the cones y(x,Qi), it is not iE 1:N
difficult to find y(x,Q). At the same time, the equalities K(x, (2) =
r) K(x, (2i ), T (x, Q) =
iE 1:N
o T (x, Qi )
ie 1:N
(/S
do not necessarily hold and it is not easy to find the conditions under which they do. Observe in passing that
r x, v d2j) iE 1:N
but that y (x,
v ai
iE 1:N
v T (x,Qi) ,
ie 1:N
does not always coincide with
u y(x9d2i). However,
ie 1:N
extremal problems in which a set a is given as the finite union of its subsets are not encountered too frequently. &n'
There is quite a large class of sets for which cl y(x,.rl) = T(x,a). This class includes all convex sets and "inequality sets," i.e., sets of the form 12 = {x I h(x) < O}, where h is a function satisfying certain natural properties. We shall now describe approximating cones for some special classes of sets.
Chapter 15
180
15.2. First let us consider the case where d2 is a convex set.
Proposition 15.1. Let Q be a convex set and X ED. Then
cone (.- x)
A> 0
T (x, Q) = K(x, S2) =
cl
42-x) ,
y(x, a)
(15.6)
.
Proof. Let g E cone (a-x). Then aog E (Q--x) for some ao > 0 or, equivalently, x+ago E fl. In addition, since X E !2, we have x+ag = [/3(x+ao g) + (1--,(3)x] E !2
V a E [O,ao ]
,
where ao
Thus g E y(x,Q), i.e., cone (fl-x) c y(x,Q). The reverse inclusion follows from (15.5), proving (15.6). Using (15.4), (15.5) and (15.6), we arrive at the equality cl y(x,Q) = T(x,Q). Let us prove that K(x,Q) = T(x,Q). Consider the function psl (y) = p(x,a) which represents the distance of a point y from the set 42. The convexity of
4 implies that po is a convex function (this can easily be checked by the reader). Therefore pn is directionally differentiable at the point x and hence, in particular, we have
111
=
r-'
'x
l i m a[pa( 1 x+ag)-pa (x)]
a-,+0
r-'
l i m 1 [pa Cx+a ( -,On g) p.(X) () a+ a
V g
E Rn .
a)04.4
From (15.2) and (15.3) we deduce that K(x,Q) = T(x,12). 0
",p
In what follows, the cone y(x,Q) will be called the cone tangent to a convex set !2 at a point x. It will be denoted by Kx(Q) or simply by KX (where this does not cause confusion). t10
Let us now describe the cone conjugate to KX in the case where 42 is a convex set.
Cones of Admissible Directions and a Nondegeneracy Condition
181
Definition 15.1. Let .fl c R" be a convex set and x e .Cl. The cone
NX = NX (i2) _ {v E R' I (v,x) = ma x (v,y)}
yEl)
is called the cone normal to the set a at the point x. In other words, v E Nx(A2) if (v,x) = pa (v), where pa is the support function of the set D. Nx is a cone since, if (v,x) = pD (v), we have (AV9 X) = A (V) X) = APD (V) = PD (AV)
VA 2t 0
.
0
Let us prove that NX is a convex cone. It is necessary to show that if v1,v2 E NX, then VI +V2 E N. We have
(vi+v2,x) = (vi,x) + (v2,x) = pn (vi) + pn (v2) >_ pry (vi+v2)
.
At the same time, since x E .Q, we have
(v1+v2,x) < ma x (v1+v2,y) = p.Q(v1+v2)
Yen
Thus, (vl+v2,x) = pQ (vi+v2), which implies that v1+v2 E Nx.
Proposition 15.2. The following relation holds:
NX = - KX
.
Pro o f. (i) Let I E (KX ). Applying the equality Kx = cone (Q-x) yields (v,y-x) <0VyE yEfl, and hence m a x (v,y) Y En
(v, x)
Since X E £2, the last inequality is possible only if
m a x (v,y) _ (v, x) ,
y ED
i.e., v EN,.
Chapter 15
182
A
(ii) Let V E N,r, g G K. Since x+ag E d2 for sufficiently small a > 0, we have (v,x+ag) < (v, x), i.e., (v,g) < 0. This implies that v E (-Kr). 0
15.3. Consider the case where d2 = {y E X I h(y) <_ 0} (an "inequality" constraint).
Here H is a function defined on an open set X c Rn and directionally differentiable at a point x E X. It is assumed that the directional derivative h'(x,g) is continuous in g. Put
Yi = {8 I h'(x,g) < 0} ,
(15.7)
0} .
(15.8)
Y2 = {8 I h`(x,8)
follows directly from the definition that yl c y(x,Q), r(x,(2) c y2. Therefore, if cl yl = y2, then cl y(x,a) = r(x,a). We shall now formulate sufficient conditions such that cl Ti = y2 in the It
case where h is quasidifferentiable at a point x. Let V be a convex compact set and g E Rn. The set
GX(V) _ {v E VI (v,x) = m (v,x)} V
V
is called the max-face of the set V generated by x. It is well known (see Proposition 5.12) that the max-face GX (V) coincides with the subdifferential dpX of the directional derivative pX of the sublinear function p, where p is the support function of the set V. In other words, GX (V) coincides with the subdifferential dp(x) of the function p at the point x: pX (g) =
max
(v,g)
(15.9)
ve GX(V)
It is also clear that GX (V) is a convex compact set. If x = 0, then GX (V) = V.
Let us consider an ordered pair [V,W] of convex compact sets, where V c Rn, W c Rn. We say that a pair [V,W] is in a general position if for any g E Rn the max-face Gg (W) does not belong to the max-face Gg (V). Some examples of this now follow.
If the intersection of sets V and W is empty then the pair [V,W] is in a general position.
Cones of Admissible Directions and a Nondegeneracy Condition
183
If the boundaries of these sets intersect, but there is no common support hyperplance at each point of intersection, then the pair [V,W] is in a general position.
If one of the sets V and W is contained in the interior of the other, then these sets are in a general position.
s''
In the above examples both the pair [V,W] and the pair [W,V] are in a general position. Now let V,W c R2, where W is a triangle and V is a circle inscribed in this triangle. In this case the pair [V,W] is in a general position, but the pair [W,V] is not.
Proposition 15.3. Let a function h be quasidifferentiable at a point x and let rl and r2 be the cones defined by (15.7) and (15.8), respectively. If the pair [d h(x), -dh(x)] is in a general position, then cl ri = r2 = T(x,a).
P r o o f. We must show that for any g such that hX (g) = 0 and any .6 > 0 there exists an element q such that ,.p
IIq-g II < e,
hr(q) < 0
.
Put
max (v,y), p2(y) =
...
p, (y) =
vE a h(x)
max (v,y) vE [-a h(X)]
V Y E Rn
The functions p, and P2 are sublinear and
hz(Y) = Pi(Y) -P2(Y)
V y E R"
.
Therefore pi (g) = p2(g)-
Since the pair [dh(x), -dh(x)] is in a general position, there exists a v
such that vE
Gg (dh(x))
v 4 Gg
(15.10) (15.11)
Making use of (15.11), the separation theorem and formula (15.9), we find an element w E Rn such that
(v,w) >
max v'E G (ah (X)) 9
(v',w) _ (p1)(w)
Chapter 15
184
Therefore for sufficiently small a > 0 we have (V,W) >
I (pj(g+aw)-pj(g)]
or, equivalently,
Pt(B+ax') < P1(g) + a(v,w) .
N-+
r-1
(15.12).
From (15.10), the definition of the max-face and the equality p1(g) = p2(9), we deduce that
pi (g) + a(v,w) = (v,g) + a(v)w) = (v, g+aw) < p2(g+aw) . Putting q = g + aw, where a is sufficiently small, so that 11q-g11 < e, we have from (15.12) that h,r (q) = pI (q) - P2 (q) <
0. 0 O''
Remark 15.1. Let [V, W] and [V, W] be two equivalent pairs which are quasi-
differentials of a function h at a point x. If the pair [V, -W] is in a general s'"
position, then the pair [V,-W} is also in a general position. To show this, let S'(q) = h' (q). Then le = P1--P2 = p3-p4, where
pl (q) = max (v,q),
p2(q) =
P3 (q) = max (v,q),
p4 (q) =
vE V
VEV
max (v,q)
vE[-W ]
max (v,q) vE[-W ]
For g,q E Rn we have (q) _ (Pi )g (9) - (n2)8 (q) _ (P3)g (9) - (P4)8 (9) .
(15.13)
Using (15.9), it is easy to check that the pair [V,-W] is in a general position if and only if for any g E Rn there exists a q E Rn such that (p 1 )9 (q) - (P2)9 (q) < 0
Therefore, it follows from (15.13) that the pairs [ V, -W] and [V,-W] are both in general positions. Hence the property of being in a general position does not depend on a specific choice of a pair representing the subdifferential.
Cones of Admissible Directions and a Nondegeneracy Condition
185
Theorem_15.1. Let a function h be quasidifferentiable at a point x, let the set [d h(x), -d h(x)) be in a general position and let 12 = {y E Rn h(y) <_ 0}. I
Then
This theorem is an immediate corollary of Proposition 15.3. CAD
Remark 15.2. Equality (15.14) is automatically satisfied if (i) the sets dh(x) and -dh(x) do not intersect, or (ii) their boundaries intersect but at each point of intersection there is no common hyperplane that is also a support hyperplane to both sets, or (iii) one of the sets is contained in the interior of the >,4-
other.
15.4. Let a set a be defined by
.Cl = {y E X I h(y) = 0}
(15.15)
(an "equality" constraint). Here X c Rn is an open set, h is continuous on X and uniformly directionally differentiable at a point x E X, and hz is a
continuous function (of direction). In this case the cone y(x,t2) may be empty. We shall study the cone T(x,a).
Proposition 15.4. Let h satisfy the above conditions and, in addition, let its derivative h,r have the following property: if hx(g) = 0, then for any .6 > 0 there exist elements v E B, (g) and w E B, (g) such that hX (v) < 0, hX (w) > 0. Then
T(x,t2) = {g E Rn I hX (g) = 0}
(15.16)
.
Proof. (i) Let g E I'x (.2). Then there exist sequences {ak } and {gk } such that
ak - +0, gk - g, h(x+ak gk) = 0 . Since
h(x+ak gk) = h(x) + ak hX (gk) + o(ak gk) where 11v 11
00
,
Chapter 15
186
and the derivative hX is continuous with respect to direction, taking the limit leads to hX(g)=0. (ii) Let hX (g) = 0. Assuming that g 4 T(x,a), we shall find ao > 0 and e > 0 such that
x+av 4 Sl VaE (O,ao), VvE B'C (g),
r-.
h(x+av) # 0 if 0 < a < ao,
II v-g II < e
.
We shall show later that in this case h(x+av) always has the same sign. For the sake of argument one needs to assume that h(x+av) >_ 0. Then
h(x+av) = h(x) +ah'xv +oav = a (h'(x,v) +>_ 0. Dividing by a and taking the limit as a - +0 yields Al
h'(x, v) > 0
V V E BC (g)
.
However, these inequalities contradict the assumptions of the proposition. We obtain a contradiction also in the case h(x+av) < 0.
Let us now prove that the function h(x+av) always has the same sign if a E (O,ao], V E B,(g). Assuming the contrary, we find a1,a2 E (O,ao] and VI V2 E B, {g} such that
h(x+al vl)
< 0,
h(x+a2v2) > 0
.
Let
0(t) = h(x+talvl +(1-t)a2v2)
V t E [0,1]
.
The function 0 is continuous and is of different sign at 0 and 1. Therefore O(to) = 0 . Then
toai vi + (1-to)a2v2 = av 9
`"'
there exists a point to E (0,I) such that (15.17)
Cones of Admissible Directions and a Nondegeneracy Condition
187
where
a = to al + (1-to)a2 E V=
ta1
vi +
(1
a)a2
(O,ao]
v2, E BE(D)
But (15.17) implies that h(x+av) = 0, which contradicts the assumption that
g 4 T(x,a). 0
Theorem 15.2. Let a function h be uniformly quasidifferentiable at a point x E .Q, where .f2 is described by (15.15). If the pairs [d h(x), -N(x)] and [dh(x), 4h(x)] are both in general positions, then relation (15.16) holds.
P r o o f. It follows from the proof of Proposition 15.3 that if the pair [dh(x), -dh(x)] is in a general position, then for any e > 0 there exists an element v E B,(g) such that hX (v) < 0. _
Analogously, if the pair [dh(x), -dh(x)) is in a general position, then for some W E BB (w) we have (--h)X (w) > 0 or, equivalently, hX (w) < 0.
To conclude the proof it is sufficient to refer to Proposition 15.4. 0
Remark 15.3. Equality (15.16) is automatically satisfied under the same conditions as equality (15.14) (see Remark 15.2.).
Chapter 16 NECESSARY AND SUFFICIENT CONDITIONS FOR AN EXTREMUM
"tea
16.1. It is convenient to describe necessary conditions for an extremum in terms of the cones approximating the set over which optimization is to be performed.
0:.
The following necessary conditions follow directly from the definition of an extremum point.
Theorem 16.1. Let a function f defined on an open set X c R" be directionally differentiable at a point X E X. Let d2 be some subset of X and x E .fl. Then for a point x to be a minimum point of f on Sl it is necessary that min f(g) gE r (x,.?)
=0.
(16.1)
For a point x to be a maximum point off on !2 it is necessary that max f(g) gE Y (X,a)
=0.
(16.2)
If, in addition, the directional derivative fx' is continuous in g, then the cone y(x,s2) in (16.1) and (16.2) can be replaced by its closure cl r(x,s2).
ate'
P r o o f. We consider only the case of a minimum. Let g c= y(x,Q). Then for sufficiently small a > 0 we have x+ag E d2 and therefore f(x+ag) > f(x). This inequality implies that f(g) > 0. Since, in addition, 0 E y(x,Q) and f(0) = 0, condition (16.1) must be true. 0 116
Definition 16.1. A point x E .C2 satisfying (16.1) is called an inf-stationary point of the function f on the set d2.
A point x E S2 satisfying (16.2) is called a sup-stationary point of the function f on the set d2.
188
Necessary and Sufficient Conditions for an Extremum
189
Theorem 16.2. Let all the conditions of Theorem 16.1 be satisfied. Assume that the function f is uniformly directionally differentiable at a point x E Sl
and that its derivative f(g) is continuous in g. Then for a point x to be a minimum point off on Sl it is necessary that min f(g) gEr(X,Q)
=0
(16.3)
.
For a point x to be a maximum point off on dl it is necessary that max
gEr(X,Q)
f(g) = 0
(16.4)
.
P r o o f. We shall again only consider the case of a minimum. Take any g E r(x,dl). Then there exist sequences {ak } and {9k} such that
gk E R' , gk -4 g, x + ak gk E 12 . m
ak -4 +0,
We have
.fix) _ f(x+ag) _ .f(x) + akfX (8k) + 0(ak9k) , where IIvO 0
IIV11
Taking the limit yields fX (g) > 0.
O
o
Corollary 16.1. Let f be Lipschitz on S5(x) ri 12, where 8 > 0. If
f(g) > 0
V g E r(x,Q), g# 0,
then x is a strict local minimum of the function f on the set 12.
P r o o f. We must show that there exists an r > 0 such that
f(y) > f(x)
V Y E Sr (x) r1 Q Q.
Assuming the contrary, there must exist a sequence {xt } such that
X.
x, xt E 12, f(x) < f(x) -4 -400
d1
Chapter 16
190
We can assume without loss of generality that
xi -x ai
gi
-4
:400
g,
where ai = Ilxi x I I . Clearly, g E T(x,Q), u g h I = 1. We have
f(x) -f(x) = f(x+ag) -f(x) = [f (x+ai g) f (x)] + [f (x+aigi) f (x+ai g) ]
The function f is Lipschitz and therefore jf(x+a1g1) -f (x+ai g) J < Lai II gi-g II
Since 1
[(
:
-f(x)] 'a
i-4
f(g),
we have 1
On the other hand, our assumptions.
1
a
.
Cf(xt) -f(x)]
f;(g).
[f(x)-f(x)] < 0 V i. Thus fX(g) < 0, which contradicts
00c
ado
Remark 16.1. Clearly, Theorems 16.1 and 16.2 are still valid if we replace the cones r(x,Q), cl r(x,sl), T(x,Q) by subcones of these cones. For example, if T(x,sl) is not convex, it is more convenient to use convex subcones of T(x,Q) rather than T(x,sl) itself. Now consider the case where the function f is quasidifferentiable.
Theorem 16.3. Let a function f be quasidifferentiable at some point x and let K be a convex cone in R2 . Then (i) the relation m i n f,(g) = 0 holds if and only if r7f (x) (= f(x) - K* ;
geK
( ii) the relation ma x f(g) = 0 holds if and only if -df'(x) c df(x) + K* .
geK
Here, as usual, K* is the cone conjugate to K and Df(x) = [fi(x), df(x)] is a quasidifferential off at x.
Necessary and Sufficient Conditions for an Extremum
191
P r o o f. Let
pl (g) =
max (v,g), p2(g) = (v,g) max vEdf(x) VE[-of(x)]
V g E Rn
.
(16.5)
The functions p, and P2 are sublinear and
max (v,g) +
vE d f (x)
min (w,g) = pi(g) - p2(g)
w r: d f(x)
Let m i n f(g) = 0. Then gEK f(g) = p i (g) - p2 (g)
0
VgEK m
f,(g) =
.
ooh
Hence Pi K > P2 K , where PiK is the contraction of the function pi to the cone K. Since P i K > P2 K) we have P2 K Making use of Proposition 5.8, iK we then obtain (16.6)
It is easy to check that this inclusion is equivalent to
dpi - K* D P2 .
(16.7)
Since f (x) and - if (x) are convex compact sets, equation (16.5) implies that pi = f(x), 0p2 = -, if(x). The required inclusion f(x) - K*
(16.8)
-df (x)
then follows from (16.7). BCD
n°'
Next let us prove the opposite inclusion. Let (16.8) hold. Inclusion (16.7) is also satisfied, implying inclusion (16.6), i.e., P i K P2 K This is equivalent to PI K (g) > P2K (g) V g E K, which in its turn is equivalent to f,(g) > 0 V g E K. This proves the first part of the Theorem. The second part can be proved in a similar way. 0 0
The following necessary conditions for the unconstrained case were stated by L.N. Polyakova (see [1201):
Chapter 16
192
Theorem 16.4. Let a function f be quasidifferentiable at X E int Q. Then for point x to be a minimum point off on .fl it is necessary that --df (x) c
f(x).
(16.9)
For the point x to be a maximum point off on Sl it is necessary that
f(x)
df(x)
(16.10)
Pro of. Since X E int £2, the cone r(x,Q) coincides with the entire space Rn, and therefore y*(x,Q) _ {0}. Applying Theorems 16.1 and 16.3 now yields the required result.
16.2. Let a function h be defined and quasidifferentiable on an open set X c Rn. Put
S l = {yE XI h(y)<_0}
(16.11)
= {gIh'(x,g)<0},
(16.12)
and define y,
y 2 = gI h'(x,g):! 0}. The relation between r1, r2, r(x,Q) and T(x,Q) was discussed in Chapter 15. Let us consider the cone r2. Proposition 16.1. The following relation holds:
r2 =
_u
WE
[-cone (dh(x)+w)] * .
(16.14)
dh(x)
Here, as usual, the cone is the conical hull of * is the cone conjugate to
(the cone 0
P r o o f. Let pw
where w E dh(x).
max
vE dh(x)
[(v,g) + (w,g)],
Kw = {g I pw (g) <_ 0}
,
Necessary and Sufficient Conditions for an Extremum
193
Since (see Theorem 10.1)
hh(g) =
min
WE dh(x)
pw (g) ,
we have
_u Kw
72 =
WE dh(x)
Using Proposition 5.2, we obtain Kw = -cl cone &w, which implies that KW = [-cone &wl*. It follows from the definition that the subdifferential °&w of the function pw coincides with the set dh(x) + w. Then
r2 =
_u Kw =
_u
WEdh(x)
WEdh(x)
[-cone(dh(x)+w)]*
.
0
(16.15)
16.3. We shall now consider necessary conditions for a constrained extremum.
Theorem 16.5. Let a quasidifferentiable function f attain its extremal value on a set Sl (defined by (16.11)) at a point x such that h(x) = 0 and the pair [dh(x), -odh(x)] is in a general position. If x is a minimum point, then
odf(x) c
n [fi(x) + cl cone (dh (x)+w)] .
(16.16)
n [df(x) - cl cone (dh(x)+w)] .
(16.17)
WE ah(x)
If x is a maximum point, then
odf(x)
c
WE dh(z)
P r o o f. We shall consider only the case of a minimum. It follows from Theorem 15.1 that the closure cl y(x,Q) of the cone y(x,Q) coincides with y2. Since (see (16.15)) Y2 =
where K wW - - -[ that
(d
h() x +w )]
,
_ V Kw WE dh(x)
it follows from Theorem 16.1 and Remark 16.1
min f,(g) =0 VwE dh(x). g r= Kw
Chapter 16
194
Therefore (see Theorem 16.3)
-d f (x) c f (x) - Kw
V w E 9 h (x)
.
It follows from the definition of Kw that
Kw = -cl cone (d h (x)+w)
,
and hence
-df (x) c
_ n [f(x) WE d h(x)
+ cl cone (d h (x)+w)] .
El
Remark 16.2. Let h(x) = 0 and Dh(x) = [d h (x), d h (x)] be a quasidifferential of a function h at a point x such that
d h (x) n(-dh (x)) = 0.
(16.18)
Then the necessary condition for a minimum (condition (16.16)) can be rewritten in the form
_r) &(x) (x) + cone (dh (x)+w) ]
d f (x) c
(16.19)
.
WE dh(x)
obi
The necessary condition for a maximum (condition (16.17)) can be rewritten in the form
-f(x) c
_r) [df(x) - cone (d h (x)+w)] WE
.
(16.20)
dh(x)
.,'
Indeed, condition (16.18) is equivalent to 0 4 [dh (x)+w] V w E [d h (x)]. Since the conical hull of a compact set not containing zero is closed, the operation of closure in formulas (16.16) and (16.17) is not necessary.
Unfortunately, the validity of condition (16.18) depends on the pair of sets chosen to represent the quasidifferential. To show this, let condition (16.18) be satisfied for some pair [dh (x), d h (x)]. If B is a ball with a sufficiently large radius centered at zero, then the sets dh (x) +B and d h (x) +B intersect. At the same time, the pair [dh (x)+B, d h (x)+B] is _ equivalent to the pair [dh (x), d h (x)]. Observe also that if (16.18) is satisfied, then the pair [dh (x), d h (x)] is in a general position.
Necessary and Sufficient Conditions for an Extremum
195
Remark 16.3. It is not difficult to check that the necessary condition for a minimum (condition (16.16)) can be rewritten in the form
-(w + f (x)) n cl cone (w'+dh(x)) # 0 V W E off (x), V W' E ooh (x)
.
(16.21)
.
(16.22)
Relation (16.21) can be represented as
0
- cone (w+d f (x)) n cl cone (w'+d h (x))
V W E Off (x), V W' E ooh (x)
= v 2 (recall that the cone
Here the cone
=u
If the sets dh(x) and -d h (x) do not intersect, then (16.22) can be rewritten as .-,
- cone (w+f(x)) n cone (w'+dh (x))
m
.
(16.23)
Analogously, the necessary condition for a maximum (condition (16.17)) can be represented as
cone (v +df (x)) n cl cone (w'+d h (x))
m
VvEdf(x),Vw'Edh(x).
(16.24)
If dh(x) n (-dh (x)) = 0, then (16.24) is equivalent to
cone (v +af(x)) n cone (w'+dh(x)) VVE
f(x),
m
b' w'E dh(x).
If, in addition, the intersection of the sets !2f(x) and -d f (x) is also empty, then these necessary conditions can be rewritten in the following symmetric form:
the necessary condition for a minimum:
-cone (w+f(x)) n cone (w'+dh(x))
m
V W E af(x), `d w' E ooh (x) ;
(16.25)
Chapter 16
196
the necessary condition for a maximum:
cone (v+df(x)) n cone (w'+dh(x)) # 0
VvEe(x),Vw'Edh(x).
(16.26)
The following necessary conditions (first derived by A. Shapiro [146]) may be useful for practical purposes:
Proposition 16.2. Let functions f and h be quasidifferentiable on R" and
,fl={yE R" I h(y)_<0}.LetxE.fl,h(x)=0. For the point x to be a minimum point off on Sl it is necessary that L1(x) C L2 (x)
,
where
Ll(x) r^,
L2(x) _ co {at'cx> - dh(x), dh(x) - afcx>}
P r o o f. Let h(x) = 0 and let x be a minimum point of f on Q. Consider the function
F(y) = max {f(y) f*, h(y)} , where
f* = f(x) = mi n f(y)
.
YED
The equality F(x) = 0 implies that x is a minimum point of F on R'2. Since
the function F is quasidifferentiable (as the maximum of two quasidifferentiable functions), applying the necessary condition for an unconstrained minimum (condition (16.9)) leads to -9F(x) c d F (x), where
aF(x) = co {afcx>-ah(x), ah(x) - afcx>} aF(x) = df(x) + ah(x)
,
.
`dam
(These regulations follow from the rules of quasidifferential calculus; see Chapter 10).
11
Necessary and Sufficient Conditions for an Extremum 16.4. Let Sl = {y I h(y) <_ 0} and D = {y I f(y)
197
f(x)}. Putting f(y) = f(x)-f(y),
we have dl = {y I f(y) < 0}. Note that Df(x) = [-df(x), -e(x)]. Assume that the pairs of sets -df(x) and f(x) and -dh(x) and ah (x) do not intersect. Find a necessarycondition for the point x to be a minimum point of the function h on the set .fl. Using (16.25), we have
- cone (w'+dh (x)) n cone (v'-df(x)) # 0 V w' E d h (x), V v' E[
(x)]
.
(16.27)
Putting v = -v' and using the equality
cone (v+df(x)) _ --cone (v'-df (x))
,
we can rewrite (16.27) in the form
cone (w'+dh(x)) n cone (v+df(x))
0
V WE d h(x), V v E f(x) Thus, the necessary condition for x to be a minimum point of the function h on the set d2 coincides with the necessary condition for x to be a maximum point of the function f on the set D.
Remark 16.4. Let a set £2 be described by the following system of inequalities:
d2 = {yE Rn Iht(y)<_0 ViE 1:N}, where functions ht (i E 1:N) are defined and quasidifferentiable on an open set X c Rn. Putting h(y) = max he(y), we can represent d2 in the form d2 {y E Rn I h(y) <_ 0}, where h is a quasidifferentiable function.
Necessary conditions for an extremum of a function f on £2 can be express-
ed in terms of quasidifferentials of the functions f and h. Since, in turn, a quasidifferential of h can be expressed in terms of quasidifferentials of the functions hl,..., hN , necessary conditions can also be expressed directly in terms of quasi differentials of functions f, h1,..., hN .
Chapter 16
198
16.5. Let
fit,,
whgre h, as above, is qua8idifferentiable on some open set X containing Q. Necessary conditions for an extremum can be formulated in terms of the _ cone T(x,Q) (where x ED is an extremum point of f).
_ Assume tl a function h is such that the pairs of sets [d h (x), -d h (x)] and [d h (x), -d h (x)] are in a general position. Then from Theorem 15.2 we have {g I h'(g) = o} or, equivalently,
F(x,Q) = Y2 n Y2 where
r2 = {g I h "(g) <_ 0}
,
_ {gI hh(g)>0} _ {gI Using Pro 0"
AU
.
16.1, we have
,72
Applying t h e s
(-h),"(g)<-0}
=
v
W'E d h(x)
[-cone (dh(x), (dh (x)+w'] *
.
Preposition to the function -h and using the fact that D(-h)(x) = [-d h (x), --d(x)]
leads to
t,; r2
U
AE [-d h(x)]
v
w 'E d h (x)
[-cone (-dh(x)+2)]*
[cone (d h (x) + w')] * .
Necessary and Sufficient Conditions for an Extremum
199
Thus
U
Dint
T(x,.(Q) =
w'E d h (x)
w'Ed h(x) V'Ed h(x)
[-cone (d h (x)+w' )] * n
U
[cone (d h (x)+v' )]
vIF Ed h (x)
[-cone (dh (x)+w')]* n [cone (d h (x)+v' )]*
.
(16.29)
Theorem 16.6. Let a quasidifferentiable function f attain its extremum on
a set D described by relation (16.28) at a point x such that the pairs [d h (x), --r7h (x)] and [d h (x), ooh (x)] are in a general position. Assume that f and h are uniformly quasidifferentiable at the point x.
If x is a minimum point off on .fl, then
-d f (x) c
n
w'Edh(x)
{f(x) - cl [cone (d h (x)+v') - cone (dh+w' )] }
.
(16.30)
.
(16.31)
vIF Ed h(x)
If x is a maximum point off on fl, then
-d f(x) c
n
w'E d h (x)
{d f (x) + cl [cone (d h (x)+v') - cone (d h+w') ] }
V Ed h(x) 0
P r o o f. We shall consider only the case of a minimum. Let Kv 'w
= [-cone (dh (x)+w') ] * n [cone (d h (x) +v') ] * ,
where v' E dh (x), w' E d h (x). It follows from (16.29) that
T(x,12) =
U V 'E d h (x) W 'E d h (x)
Chapter 16
200
Theorem 16.2 and Remark 16.1 imply that
f,(g) = 0
min
Vv'E dh(x), VW'E dh(x).
gE KV,w,
Applying Theorem 16.3 yields
- d f ( x ) c [f(x) - K*
,]
(16.32)
.
Since (see Chapter 1) the cone conjugate to an intersection is equal to the closure of the sum of the conjugate cones, we have K* -w -
= -cl [cone (d h (x)+W') + cone (d h (x)+v')] .
It follows from (16.32) that d f (x)
c e(x) - cl
[cone (d h (x)+v') - cone (d h (x) + w')]
.
Since this inclusion holds for every V' E dh(x) and W' E d h (x), we can conclude that (16.30) is valid.
16.6. We shall now consider sufficient conditions for a strict (local) maximum and minimum.
Theorem 16.7. Let a function f defined on an open set X c Rn be uniformly quasidifferentiable at a point x E X. Then if
-df (x) c int
f(x)
,
(16.33)
then the point x is a strict local minimum off If
-df(x) c int df(x)
,
then x is a strict local maximum off
Remark 16.5. Note that this theorem is essentially "nonsmooth," since for differentiable functions the conditions of Theorem 16.7 are never satisfied.
P r o o f of Theorem 16.7. We shall again consider only the case of a minimum.
Inclusion (16.33) implies that -df(x) + eB c fi(x) for some e > 0 (here B is the unit ball).
Necessary and Sufficient Conditions for an Extremum
201
We have
(v,g) <
max
max (v,g) .
vE d f(X)
VE [-af(X) +eB ]
Since max
(v)g)
max (v,g) + m ax (v,g) VE [-d f(X)] v E eB
=
VE [-df(X) +eB ]
max
(v,g) + ell g II
,
w
VE [-a f (X)]
we have
f;(g) =
max (v,g) -
vE d f(X)
(v,g) max VE [-a f(X)]
ell g II
Thus, for II g II = 1, we have
f(x+ag) - f(x) = aff (g) + Ox ,g (a)
ae+
0X3,g (a)
a
Since oX,g(a
a
-4 0
uniformly with respect to U E B, there exists a 8 > 0 such that
a6 +OX,g(a) > 0
V g E B, V a E (0,8).
0
Remark 16.6. A sufficient condition for a strict local extremum in the constrained case can be stated as follows (seems e.g., [44]): If there exists an r > 0 such that for everyw E df(x) and w' e d h (x) the intersection n cl cone (d h (x)+w')
contains a ball of radius r, then the point x is a strict local minimum of f on
fl.
Chapter 16
202
Similarly, if the necessary condition described in Proposition 16.2 is satisfied and
Ll (x) c int L,2(x)
(16.34)
,
then x is a strict local minimum of f on D. `-'
Remark 16.7. Let us consider condition (16.21) once again. If each of the sets df(x) and dh(x) is a convex hull of a finite number of points, then (16.21) may be checked simply by verifying this condition for the vertices of the sets of (x) and d h (x) only. Let us prove this property. Let
(x1 +A) n B # 0
,
(16.35)
(x2 +A) nB # 0,
(16.36) 4-'
4-+
where A and B are convex sets. We must show that for any a e [0,1 ] we have
((axl + (1-a)x2) + A) n B # 0
(16.37)
.
Indeed, (16.35) and (16.36) imply that there exist points y1 E A and Y2 E A such that zi
= (xi+Yl) E B,
Z2 = (x2+Y2) E B .
Since B is convex, we have (az1+(1-a)z2) E B V a e [0,1 ], i.e.,
((axi + (1-a)x2) + (ayl + (1-a)y2)) E B .
(16.38)
But A is a convex set and therefore (ayl+(1-a)y2)) E A. It is now clear that (16.37) may be deduced from (16.38). A),
This shows that to check X16.21), it is necessary to verify this relation only for the vertices of the set df(x). Analogous arguments can be applied to the set cl cone (ah(x)+w'). `k,
Thus, if each of the sets df(x) and dh(x) is a convex hull of a finite `Z3
4..
number of points, the problem of verifying necessary conditions can be simplified considerably. This is of particular importance for computational purposes.
Chapter 17 RATE AND DIRECTIONS OF STEEPEST DESCENT AND ASCENT
17.1. We shall now show how quasidifferentials may be used to find the rate and directions of steepest descent or ascent.
Let a function f defined on some open set X c Rn be quasidifferentiable at a point x E X, and let K c Rn be a convex cone. Put
-ak =
i n f f(g),
gEK
bk =
Ilg II 5 1
sup f(g).
gEK
(17.1)
Ilg II 5 1
The quantities aK and bK are called the rate of steepest descent and the rate of steepest ascent, respectively, of the function f at the point x with respect to the cone K. If aK > 0 and there exists a vector g E K, II g II = 1, such that the infimum in (17.1) is obtained, then this vector g is called a direction of steepest descent of the function f at the point x with respect to the cone K. If bk > 0 and there exists a vector g, E K, 11g, II = 1, such that the supremum
in (17.1) is obtained, then this vector gl is called a direction of steepest ascent of the function f at the point x with respect to the cone K. Of course, it can happen that one (or both) of these directions does not exist; and even if they do exist, they are not necessarily unique. 17.2.
Definition 17.1. Let A c Rn, B c Rn be closed sets. The quantity
d(A,B) = s u p p(v, B) = sup i n f vEA
II v-w II
vEA WEB
is called the semideviation of the set A from the set B.
If the sets A and B are not only closed but also bounded, then the ord.
+-'
distance between A and B in the Hausdorff metric (the Hausdorff deviation of A from B) is given by the formula (see Chapter 2) p(A,B) = max {d(A, B), d(B,A) } . 203
Chapter 17
204
Theorem 17.1. Let a function f defined on an open set X c Rn be quasidifferentiable at a point x E X, and let K c Rn be a convex closed cone. Then
aK = d(---df(x), f(x)-K*) ,
bK = d(fi(x), af(x)-K*)
.
BCD
P r o o f. We shall concern ourselves only with the quantity aK. Note that the set f(x) -K* has the following property : if g 4 K, then sup
vE [af(X) -K *I
(v,g) _ + 0
(17.2)
.
Indeed (compare with the proof of Proposition 5.7), since g 4 K, the sepaCDR
ration theorem can be used to find an element w E K* such that (w,g) < 0. Let v' E f (x). Then (v' 2w) E ( f (x) -K*) V A > 0 and therefore sup
VE [df(x) - K*]
(v,g) ? sup (v'-2w, g) _ + 00
.
A>O
From (17.2) we now have
ak = - inf f'(g) = - inf gEK
gEK Ilgll <- 1
Ilgll <- 1
inf
Ilg ll-1
min (w,g) max (v,g) + wEaf(X) vEdf(x)
min (w,g) (v,g) + sup vE [af(x) -K*] WE a f(x)
IQ,
r-.
{(v,g) + (w,g)} inf inf sup IIgII<_1 wEaf(x) vE [af(x)-K*]
inf
inf
(v+w,g) .
sup
w E o f f (x) g E B VE [f(x) -K * ]
Using the fact that the unit ball B is convex and compact, the set [f(x) -K* ] is convex and the function O(v,g) = (v+w, g) is affine in v and g, the minimax theorem yields
(v+w, g) = sup VE [df(x) -K*]
sup VE
m
inf gE B
i n f (v+w, g)
-K*] gE B
.
Rate and Directions of Steepest Descent and Ascent
205
Since
inf (g,r) _ - Il r ll gE B
we have
inf W E off (x)
inf (v+w,)
sup
vE [f(x) -K*] gE B m
inf WE o f f (x)
sup vE [f(x) -K * ]
inf
sup
wE off (x) vE [df(x)-K*]
sup
WE [-off (x)]
- II v+w II
m
aK = -
11 v+w 11
o(w, f(x)_K*) = d(-df (x),
f(x)_K*).
Let a denote the rate of steepest descent in the unconstrained case (i.e., if
the cone K coincides with the entire space), and let b denote the rate of steepest ascent in the unconstrained case. The next result follows from Theorem 17.1.
Theorem 17.2. Let a function f defined on an open set X c Rn be quasidiffer-
entiable at a point x E X. Then the rate of steepest descent at the point x is jal = d(-df(x), f(x)); and the rate of steepest ascent is b = d(-fi(x), df(x)).
Remark 17.1. It is easy to check that f(x)) df(x))
d(e(x), --df(x))
It follows from this equality and Theorem 17.2 that the distance
-df(x)) in the Hausdorff metric coincides with the rate of steepest descent or ascent, whichever is the greater.
X01
Remark 17.2. The equality a = 0 implies that the necessary condition (16.9) is satisfied at the point x. Analogously, the relation b = 0 holds if and only if necessary condition (16.10) is satisfied at x.
206 I-+
Chapter 17
17.3.
oar
sae
Definition 17.2. Let .0 be a set in Rn, X E £2, and let 7(x,4'2) be the cone of admissible directions at the point x. Let a function f be directionally differentiable at x. Also, let
a n (x) = a n =
inf
fX (g)
.
(17.3)
g ii=1
Cog
/ham
The quantity -an = Ian I is called the rate of steepest descent of the function f at the point x with respect to the set D. We can define the rate of steepest ascent with respect to the set 12 in a similar way. Let g(x) E T(x,Q), IIg(x) II = 1, be such that
f(g(x)) =
min
gET(x,£2)
f;(g).
Ilgil=1
The direction g(x) is called a direction of steepest descent of the function f at the point x with respect to the set 1, and is not necessarily unique. A steepestdescent direction may be only feasible not admissible. Analogously, , if gl (x) E T(x,Q), 11g, (x) 11 = 1, and
f(gi(x)) =
max
gET(x,
f;(g),
IIgIi=1
then the direction g1 (x) is called a direction of steepest ascent of f at x with respect to D. Consider the case in which .(l is given by
.fl = {y I h(y) :5 0}
,
(17.4)
where h is a quasidifferentiable function defined on Rn. Let a point x E Sl be such that h(x) = 0 and the pair [d h (x), -df(x)] is in a general position. Then (see Proposition 16.1, Proposition 15.3 and equation (16.12)) the cone. cl y(x,.(2) is defined by the equality
T(x,fl) = cl y(x,.(2) = {g I hr (g) <_ 0}
u
w E dh (x)
[-cone (d h (x) +w) ] * .
Rate and Directions of Steepest Descent and Ascent
207
Put Kw = -[cone (d h (x)+w)]*. The set Kw is thus a convex closed cone. Find a direction of steepest descent of the function f at the point x with respect to the set Q. Then from Theorem 17.1 we have an (x)
= an
min
gEr(x,.2)
f", (g) = -
inf
11 gIIS1
IIgH=1
inf
wE oIh(x)
sup
WE dh(x)
inf f(g) _
gE Kw iig ii S1
f(g)
gE cl y(x,.2) sup
WE Ih(x)
d(-df(x)) e(x)-Kw*)
1-inf f(g)] gE Kw Ilghi S 1
.
This implies the following result.
Theorem 17.3. Let x E .0 and h(x) = 0. Then the following relation holds:
la,?(x)I = la,?l = m
WE
sup d(--d}'(x), fi(x) + cl cone (dh(x)+w)) dh(x)
.
Remark 17.3. The relation a,? (x) = 0 holds if and only if the necessary condition for a minimum (condition (16.16)) is satisfied at the point x.
Theorem 17.4. Let 12 c Rn be a convex set and let x E Q. Then I a,? (x) I
= d(-df(x), dJ(x) -Kx) ,
where an (x) is given by (17.3) and Kx is the cone tangent to the set .(2 at the point x. 0
The proof of this theorem is left to the reader.
Remark 17.4. Theorems 15.2 and 17.1 enable us to compute the rates and directions of steepest descent and ascent in the case where 12 is given by (16.28) (see e.g. [121,123]).
Chapter 17
208
Remark 17.5. Theorem 17.3 enables us to find steepest descent directions. Let X E .fl, h(x) = 0 and Ian (x)l > 0 (i.e., x is not an inf-stationary point). For everyw E df(x) and w' E dh(x) we calculate min
II z + z' II
XE f (X)+w
=
II z(w,w') + z'(w,w') II
= q(w,w')
z'E cl cone(f (X)+w') Then find
Arg max {q(w,w') I [w,w'] : w E off (x), w' E ooh (x)} = N.
It is easy to see that for any [w,w'] EN the direction z(w,w') + z'(w,w' ) I I z w w' + z' w w'
g (w,w)
11
O.,
is a direction of steepest descent. Off'
Clearly, a direction of steepest descent is not necessarily unique
if the
(-.1
set N is not a singleton, then there may be a number of steepest descent directions.
yak
OOH
Of course, it is quite possible that a steepest-descent direction is tangential but not admissible (i.e., belongs to the cone cl y(x,Q) but not to the !"f
s.0
cone y(x,Q). This is not very convenient from a computational standpoint. There are several ways of guaranteeing that a descent direction (not neces-
sarily a steepest descent direction) will be admissible. One of these is based on the following proposition:
Proposition 17.1. Let the assumptions of Theorem 16.5 be satisfied and let K(w') = cl (cone ooh(x)+w')). If int K*(w') # 0, then the necessary condition (16.16) is equivalent to III
r,,
0 E co { [f (x) + w] u T,7 (w') } - L,7 (w,w') , where
Tj (w') = {v E K(w') 1 for any 17 >0.
11V II
= 171
Rate and Directions of Steepest Descent and Ascent
209
Corollary 17.1. If, for some x E D and w E d f (x), w' E d h (x), we have h(x) = 0 and 0 4 LI (w,w'), then z71 (w,w' )
g, (w,w) _II
Z27 w w' I I'
Here
min
IIz 11
zE L77 (w,w') m
11 Z77 (w,w')11 =
is a descent direction off on .2 at x, which is also admissible, i.e.,
dx dg,7 (w,w')
<0
and
dh(x) dg, (w,w')
We can also use Proposition 16.2 to guarantee that a descent direction be r..
admissible. r,'
Proposition 17.2. Let x e .fl and h(x) = 0. If Ll(x) t L2(x) (defined as in Proposition 16.2), then any direction of the form v(x)-w(v(x))
...
g-
Il v(x)-w(v(x)) II '
where
max p(v) _ p(v(x))
vE Ll (x)
p(v) =
min II v-w II = we L2(x)
II v-w(v) II
is a descent direction which is also admissible.
Example 17.1. First let us consider an unconstrained optimization problem. Let x E R2 and take
f(x) = max{min{x(l) , x(2) }; x(2) -x(l) } where x0 = (0,0).
Chapter 17
210
This function was considered in Example 13.2. As was shown therein, we can take Df(xo) = [e(xo), df(xo)], where
f(xo) = co {(1,-1),(0,1),(-1,0)} df(xo) = co J(0,1),(-1,O)j o.'
This does not satisfy the necessary condition for an unconstrained minimum (condition (16.9)). Find the Hausdorff semideviation of the set -df(x°) from the set f(x). It is clear (see Fig. 17.1) that the most widely separated points in the sets -df (x°) and f(x0) are (1,0) and (0,-1). To find the distance between
the point (1,0) and the set f(xo), it is necessary to find the distance between this point and the interval connecting the points (0,1) and (1,-1). Trivial calculations show the nearest point to be z° = (3/5, -1/5), and therefore 121
r-+
(1,0)-z° g0
1
I I (1,0) - z° II
F5 /S
is a steepest-descent direction. Xa
Figure 17.1
-\5 '-\5
Rate and Directions of Steepest Descent and Ascent
211
Analogously, the point (0,-1) yields zo = (1/5, -3/5), and thus go
is another steepest-de scent direction. Note that the rate of steepest descent is Ia(xo)I =
11(1,0)-zo II
=
11(0'-1)-zo II =
5'
Example 17.2. Now let us consider a constrained optimization problem. Let
f be the same function as in Example 17.1 and let xo = (0,0). Take ,Cl {x I h(x) <_ 0} (see Fig. 17.2), where
h(x) = min {hl(x), h2(x)} ,
hl(x) = max {x(1>, x(2)} h2W
max J-X(l), X(2)1
Applying the rules of quasidifferential calculus (Chapter 10), we c
A, (xo) = co JO
N, (xO) = JOI 9
ahz (xo) _ co {(-1,0),c0,1>},
ah2 cXO>
take
_ {ol
Finally (see Fig. 17.3) we have
dh(xo) = co {(1,0),(0,-1)} + co {(-1,0),(0,1)} = co {(-1,-1),(1,1)} dh(xo) = co {-co {(1,0),(0,-1)}, -co {(-1,Q),(0,1)}} = co
',_
4)C13
C13
The pair dh(xo) and 4h(xa) is in a general position since the boundaries of these sets intersect at point (1/2, 1/2) and (-1/2, -1/2) where there is r1o .,.
common support hyperplane (see Remark 15.2). We shall now check whether the point xo satisfies the necessary condition .-^i
(16.1) for a constrained minimum. Since h(xo) = 0 and the sets dh(xo) and (4h (xo)) are in a general position, we can apply necessary condition (16.21). From Remark 16.62 it is sufficient to check this condition for the vertices of the sets df(xo) and dh(xo.)
Chapter 17
212
xi
x
00,610
A
X2
Figure 17.3
Figure 17.2 Since
f(xo) = co dh(xo) = co 1-4
and the vertices of d f (xo) are (-1,0), (0,1) and the vertices of d h (xo) are (1,0), (0,-1), (-1,0), (0,I), we must check the following eight relations:
m,
(17.5)
A1nB2 m,
(17.6)
A1nB3
m,
(17.7)
Al nB4
m,
(17.8)
A2nB1
m,
(17.9)
A2 n B2
m
A,
B,
,
(17.10)
A2nB3 # m,
(17.11)
A2 n B4 # m
(17.12)
,
Rate and Directions of Steepest Descent and Ascent
213
where
0
Al = -[(-1,0) + f(xo)] = co {(0,1),(1,-1),(2,0)}
A2 = -[(0,1) + f(xo)] = co
{(0,-2),(1,-1),(-1,0)}
B1 = cone co {(2,1),(0,-1)}, B2 = cone co {(1,0),(-1,-2)} '-'
0
0
B3 = cone co {(0,1),(-2,-1)}, B4 = cone co {(1,2),(-1,0)} It is clear from Figs. 17.4 - 17.7 that conditions (17.5) - (17.12) are satis-
00'
crab
c13
fied. Note also that the set Al and the cone B3 and the set A2 and the cone B4 each have only one point of intersection, i.e., these intersections have empty interiors and therefore the sufficient condition for a minimum (see Remark 16.6) is not satisfied. (In this case the point xo is actually a minimum point but this cannot be deduced from the necessary and sufficient condition.)
Now let us check the necessary condition given in Proposition 16.2. We have L1(xo)
= -df(xo) - dh(xo) = co {(1,0),(0,-1)} + co {(1,0),(0,-1),(-1,0),(0,1)} = co {(2,0),(1,-1 },(0,0},(1,1),(0,-2),(-1,-1)} ,
f(xo) - d h (xo)
= co {(0,1),(1,-1),(-1,0)} + co {(0,1),(1,0),(-1,0),(0,-1)} co
(-1,-1},(1,0},(2,-1},(0,-1},(1,-2}} ,
dh(xo) - df(xo) = co
{(1,1),(-1,-1)} + co {(1,0),(0,-1)}
= co {(2,1),(1,0),(0,-1),(-1,-2)} ,
L2 (xo)
= co {f(xo) - d h (xo ), d h (xo)
- df (xo) }
= co (1,0),(2,-1),(0,-1),(1,-2),(2,1),(-1,-2)} .
Chapter 17
214
It is clear from Fig. 17.8 that the necessary condition
Ll(xa) c int Z.l(xo) is not satisfied. -CD
Example 17.3. Let f be the same function as in Examples 17.1 and 17.2 and let xo = (0,0). Take .fl = {x I h(x) <_ 0} (see Fig. 17.9), where
h(x) = min {hl (x), h2(x)}
h l (x) = max 12x-3y, x} , h2{x) = max {3x-2y, y} . Applying the rules of quasidifferential calculus (Chapter 10), we can take (see Fig. 17.10) d hi (xo)
= co {(2,--3),(-1,0)} r6,
dhl(xo) _ {(0,0)}
,
dh2(xo) = co {(3,-2),(0,-2)} , dh2(xo) _ {(0,0)} d h (xo)
= co
j(5,-5),(2,-2),(-1,1)j
dh(xo) = co {(-2,3),(1,0),(-3,2),(0,-1)}
It is clear that the pair d h (xo) and -dh (xo) is in a general position. As in Example 17.2, let (see Fig. 17.11)
Al = -[(-1,0) + f(xo)] = co {(0,1),(1,-1),(2,0)} ,
A2 = -[(0,1) + f (xo)] = co Since the vertices of d h (xo) are (-2,3),(1,0),(-3,2),(0,-1) (the vertices of )(xo) are again (-1,0),(0,1)), we must check the relations
AjnBj # 0 where (see Figs. 17.12 - 17.15)
V i E 1:2,jE 1:4,
(17.13)
Rate and Directions of Steepest Descent and Ascent B1 = cone [(-2,3) + d h (xo)] = cone co J(3,-2),(0,1),(-3,4)1
215
,
B2
cone [(1,0) + d h (xo)] = cone co {(6,-5),(3,-2),(0,1)} ,
B3
cone [(-3,2) + d h (xo)] = cone co {(2,-3),(-1,0),(-4,3)}
B4
cone [(0,-1) + dh(xo)] = cone co {(5,-6),(2,-3),(-1,0)}
It is clear from Fig. 17.16 that A 1 n B 1 * 0 but A2 n B 1 = 0. Find min
11V-w II
=
11v2-w1 II
V EA2
WEBI
where W,
V2
r15
10)
F13
13
U3'1115
v2-w1 = (1,-1)
101
WIN
Since 2
1
3
,
we have that v2 -w1
921 =
1-2
-3 9
11V2-W1 1i
L41-3
41-3 j
is a descent direction off at xo with respect to Q. The rate of descent in this direction is 1
la(821)1 =
(17.14)
11V2-we 11
1-3
It is clear from Fig. 17.17 that B2 n Al * 0 but B2 n A2 = 0. Find
min
Il v-w II
= IIV2-w2 11
m
V EA2
wEB2
where 66 V2
W2
1.61
L5 61)
217
X2
D
0
i
X2
Figure 17.6
n
v.. .
1.. '..Uiiiii 3.. .r, .
110
...... . ......:
lima., 11..!
11.... 11..,,
Figure 17.7
8IZ
Nm NU OMNI KiNg 'NEW 14
SEND
6:57116 NOMOMMEMONEENI Z;u
IN,
isssss.A irssusal.
555511 ussuui
l
5IU ANs
Now
1
.somos 0 Ism
I
Am
is..... 1aIRRII,ME Isms iuIs SE0i 441, 00 Awl
a.Tn2i,4
8'LI
VVVVVVV.A
a.Tn.5TtI I 6'L
219 X2
T4 3 2
.1N
-3 -2 -1 llf
2
X
3
4
5
6
ah
Figure 17.10
x2
Figure 17.11
220
Figure 17.12 x2
Figure 17.13
221 X2
Figure 17.14
x2
4ti
ti
Rate and Directions of Steepest Descent and Ascent
223
Since 1
v2-w2 =
5
6
61
61)
we have that v2-w2 922
4
5
' '4611
11V2-W2 II
is a descent direction of f at xo with respect to Q. The rate of descent in this direction is la(922)1 =
1
11V2-W2 11 V -6-1
It is clear from Fig. 17.18 that A2 n B3 #0 but Al n B3 = 0. Find
min
IIv-wII =
11 VI w3 II
v EA1 wEB3
where
OIL
V1
W3
flO
15
F13
13)
Therefore
(
3
NIA
VI -w3 =
13)
,
and V1-w3
813 =
11V1-W3 11
141-3 ' -
) 2
3
L 13
4--
is a descent direction of f at xo with respect to Q. The corresponding rate of descent is la(813)1 =
11V1 -W3 II
=
1 V1J
224
Chapter 17
It is clear from Fig. 17.19 that A2 n B4 #0 but that Al n B4 = 0. Since
min
11v-w11 =
11v1-w411
v EA1
wEB4
where 1--1
V1
W4
55
66
F61
61
we have
V ' 61 5
VI-W4 =
'
,.d
Then the direction
gla -
''1 x'4 I1vl-w4 II
-
5
6
46-1 ' 4-61-
is a descent direction of f at xo with respect to .fl and the corresponding rate of descent is
11 Vl-Wq 11
r-+
la(g14)1 =
1
_
(17.17)
6-1
It follows from (17.14) - (17.17) that there are two directions of steepest
r-1
descent:
821 = 1J_
)
813 =
-13-
( ylJ_
The rate of the steepest descent is then
la(920I = la(gl3)I =
1
4 1-3
'
J
- y13
225 x2
x2
Chapter 17
226
;.0
Now let us check the necessary condition given in Proposition 16.2. Since
e(xo) = co {(1,-1),(0,1),(-1,0)} df(xo)
= co j(0,1),(-I,O)j
dh(xo) = co 1(5,-5),(2,-2),(-1,1)1
ah(xo) = co {(-2,3),(1,0),(-3,2),(0,-1)} we have L1(xo)
= -[df(xo)+dh(xo)] = -[co {(0,1),(-1,0)} + co {(--2,3),(1,0),(--3,2),(0,-1)}]
_ -co (--3,3),(0,0),(-4,2),(--1,-1 )}
co {(2,-4),(-1,-1),(3,-3),(0,0),(4,-2),(1,1)} ,
f(xo) - dh (xo)
= co {(1,-1 },(0,1),(-1,Q)}
+ co {(2,-3},(-1,0},(3,-2),(0,1)} = co {(3,-4),(2,-2),(1,--3)x(0,--1),(-1,1),(--2,0), (4,-3),(3,-1),(2,-2),(1,0),(0,2),(--1,1)}
= co {(3,-4},(2,-2),(1,-3},(0,--1),(-1,1), (--2,0),(4,-3),(3,--1),(1,0),(0,2)}
dh(xo)
- df(xo)
= co {(5,-5),(2,-2),(-1,1)} + co {(0,-1 },(1,0)}
= co {(5,-6),(2,-3),(-1,0),(6,-5),(3,-2),(0,1)}
Rate and Directions of Steepest Descent and Ascent
L2 (xo)
227
= co J f (xo) -d h (xo), d h (xo) -df (xo )} = co {(3,-4},(2,-2},(1,-3},(0,--1 },(-1,1 },(-2,0),(4,--3},(3,-1) (1,0),(0,2),(5,--6),(2,-3),(--1,0),(6,--5),(3,-2),(0,1)}
.
It is clear from Fig. 17.20 that Ll(xo) (t I.L(xo). Find the semideviation of the set Ll from the set I.2. Routine calculations show that d(L1(xo),L2(xo)) =
min
max
"V--w II
ve L1(X0) we L2(x0)
=
"VI -WI II
= 11v2-w2 II
where 25 '
53 25 96 25
w2
153
LAW
SIN
v2 = (2,-4),
SIN
196
11V1-W111 =
25 '
Therefore V1 -W1
V2 -W2
(4
3
25 ' 25 3
4
25 '
25
1
5, 1
II v2w2 II
5.
Then 41 gi
g2
5)
coo
are directions of descent of f at xo with respect to Sl and the rate of descent in these directions is 1/5 (note that the rate of the steepest descent is 1/-NFI-3 > 1/5).
These descent directions are interior (see Fig. 17.21 for the set Q). Note that steepest descent directions 92, and 913 are tangential to the set Q.
228 x2
Figure 17.20
x2
Figure 17.21
Chapter 18
)CI
SADDLE POINTS OF Q UASIDIFFERENTIABLE FUNCTIONS
18.1. Let z = [x,y] E Rn x Rm = Rn+ m , x E Rn, y E Rm, and consider a finitevalued function f(z) = f(x,y) defined on Rn X Rm. Then z* = [x*,y*] is called a minimax saddle point of f(z) on Rn x Rm if
f(x*,y) < f(x*,y*) < f(x,y*)
V x E Rn, V y E Rm
.
(18.1)
f(x*,y) >_ f(x*,y*) >_ f(x,y*)
V x E Rn, V y E Rm ,
(18.2)
If
then z* is called a maximin saddle point of f(z) on Rn x Rm. The terms "minimax saddle point" and "maximin saddle point" are used to distinguish the roles played by the x and y variables: in (18.1) the "player" who controls CJ`
(D'
P".
the parameter x tries to minimize the objective function f(x,y), while the "player" who controls the parameter y tries to maximize it; in (18.2) everything is reversed. If it is clear from the context which case is being considered, we shall just use the term "saddle point." (D'
'"'
If for x # x* and y # y* the inequalities (18.1) are strict, the point z* is called an isolated minimax saddle point of the function f(z) on Rn x Rm. An isolated maximin saddle point is defined in an analogous way.
A point z* E Rn + m is called a locally minimax saddle point of f(z) on Rn X Rm if there exists a 8 > 0 such that
f(x*,y) < f(x*,y*) < f(x,y*)
V Z E SS(z*)
,
where
S5(z*) _ {z = [x,y] I (x x*)2 + (y_y*)2 <
L$21
A locally maximin saddle point of f(z) on Rn x Rm may be defined in an analogous way. The concepts of an isolated locally minimax saddle point of f(z) on Rn x Rm and an isolated locally maximin saddle point of f(z) on Rn X Rm follow naturally from the above definitions. 229
Chapter 18
23Q
Suppose that £2 c Rn + m is a closed set. A point z* = [x*,y*] E d2 is called a minimax saddle point of the function f(z) on d2 if
f(x*)Y) ! f(x*)Y*) ! f(x)y*)
V [XY] C= d2.
If r-,
f(x*,y) ? f(x*,y*) ? f(x,y*) V [x,y] E
d2,
then the point z* E .fl is called a maximin saddle point of the function f(z) on Q. A point z* E .fl is called a locally minimax saddle point of the function f(z) on Sl if there exists a 8 > 0 such that V [x,y] E d2 n SS(z*)
f(x*,y) < f(x*,y*) < f(x,y*)
.
W-4
4-+
The definitions of an isolated saddle point and a locally maximin saddle point of 4 function f(z) on d2 should be obvious. We recall the definition of a quasidifferentiable function (see Chapter 10). A function f(z) is said to be quasidifferentiable at a point zo if it is directionally differentiable at this point and there exist compact convex sets f (zo) c Rn + m and df(zo) c Rn + m such that
af(zo) d8
_ 1 im 1 [f(zo+ag) -f(zo)l
a
max
=
VE f (z0)
(v,g) +
_max
f(zp)
(w,g)
.
(18.3)
Here
g = [g1,g2] E Rn X Rm, v = [vi,v2] E Rn X Rm, I-1
w = [wi,w2 ] E Rn X
Rm
.
Obviously, `o'
where o(a,g,zo)
a
-a 0 asa-4+0.
(18.5)
Saddle Points of Quasidifferentiable Functions
231
The pair of sets Df(zo) = &(zo), df(zo)] is called a quasidifferential of f(z) at the point zo. We have the following necessary conditions for an extremum: if F(x) attains its smallest value on RS at the point x* E Rs, then
_dF(x*) c dF(x*)
(18.6)
.
if F(x) attains its largest value on Rs at the point x** E Rs , then
-dF(x**) c iF(x**)
.
(18.7)
-dF (x*) c int dF (x*)
(18.8)
The condition
is sufficient for x* to be a local minimum of F(x) on Rs, and the condition
-dF(x**) c int dF(x**)
(18.9)
is sufficient for x** to be a local maximum of F(x) on R. Let N
d x (z) = {v1 E Rn 13V2 E Rm : v = [V1,v2 ] E f (z)} , N
acv
dfx(z) = {w1 E Rn 13 w2 E Rm : w = [w1,w2] E df(z)} N
fy (z) = {V2 E Rm 3 v1 (=- Rn : V = [V1,v2]
,
f(z)}
E
N
dfy (z) = {w2 E Rm 13 w1 E Rn : W = [w1,w2] E off(z)}
.
..'
The sets ex(z) and dfX(z) are projections of the sets (z) and df(z) on the subspace Rn, and fy(z) and dfy(z) are their projections on Rm. ,C3
"rd
Theorem 18.1. Suppose that the function f(z) is quasidifferentiable at z* [x*,y*]. If z* is a minimax saddle point off(z) on Rn X Rm, then
-dfx (z*) c
fx(z*),
-oYy(z*) c 6y(z*)
.
(18.10)
.
(18.11)
If z* is a maximin saddle point off(z) on Rn x Rm, then
-t9fy (z*) c o y (z*) r-.
-+fx (z*) c dfx(z*),
Chapter 18
232
P r o o f. It suffices to check (18.10). It is clear from (18.1) that x* E Rn is a minimum point of f1(x) = f(x,y*) on Rn, and y* E Rm is a maximum point of f2(y) = f(x*,y) on Rm. It is obvious from (18.3) that f1(x) and 12(Y) are quasidifferentiable at points x* and y*, respectively, and that
dfi(x*)], Df2(y*) = 0Z(y*), af2cY*»
D.fi(x*) _
,
where
fi(x*) = f(z*), 2(y*) = oy (z*),
dfl(x*) = ;)fx (Z*) df2 (y*) =
I
(z*) .
To prove the theorem now requires only the use of the necessary condition for a minimum of f1(x) (condition 18.6) and the necessary condition for a maximum of f2(y) (condition 18.7). 0
A point z* E Rm satisfying (18.10) is called a minimax stationary point, while a point y* E Rn x Rm satisfying (18.11) is called a maximin stationary point of f(z) on Rn X Rm.
Theorem 18.2. (Sufficient conditions for a saddle point). Suppose that f(z) is quasidifferentiable at the point z* E Rn X Rm and that relation (18.5) holds uniformly for g E R n + m , 11g 1l = 1, in the expansion (18.4) about the point zo = z*. If
-dfx (z*) c int
fx
(z*),
(18.12)
--o )'Y (z*) c int o?fy (z*) ,
then z* is an isolated locally maximin saddle point off(z) on Rn X Rm. But if
_d fx(z*) c int dfx(z*),
_cfy(z*) c int oo y(z*)
then z* is an isolated locally maximin saddle point off(z) on Rn X Rm.
P r o o f. The proof of the first part of the theorem follows from the sufficient condition for a strict local minimum of f1(x) (condition (18.8)) and the sufficient condition for a strict local maximum of f2(y) (condition (18.9)). The second part of the theorem can be proved in a similar way.
In the examples below, z = [x,y] E R1 x R1, x E R 1, y E R1, and zo = R01.
Example 18.1. Let f(z) = [xj - yI. It is easy to see that we can take Df(zo) _ W(zo), df(zo)], where
f(zo) = co {(1,0),(-1,0)}, df(zo) = co {(0,1),(0,-1)}
.
Saddle Points of Quasidifferentiable Functions
233
Fig. 18.1 makes it clear that
dx(zo) = [-1,1] c R1, df (zo) _ {o} c R1 fy(zo) = {0} c R1,
dfy(zo)
-dfX(zo) c int f(zo),
= [-1,1] c R' c int dfy(zo) ,
condition (18.12) holds for zo and, by Theorem 18.2, zo is an isolated locally minimax saddle point of f(z) on R1 x R1. Example 18.2. Let f(z) = LvI - Jxj. It can be verified as above that zo = [0,0] is an isolated locally maximin saddle point of f(z) on R1 X R1.
Example 18.3. Let f(z) = jxj + yl. Applying the rules of quasidifferential calculus (see Chapter 10), we obtain Df(zo) =
df(zo)], where
f(zo) = co {(0,0),(2,2),(-2,2)}
,
df(zo) = co We have (see Fig. 18.2)
_ [-2,2] c R1,
!2fx(zo)
!2fy
(zo) _ [0,2] c R1, Y
I
df (zo)
= [-1,1 ] c
R1
dfy (zo) = J-11 c R1 V
1
-T
X
-1
1
r Figure 18.1
Figure 18.2
Chapter 18
234
'r'
It is clear from this that neither condition (18.10) nor condition (18.11) holds at zo; therefore, this point is neither a minimax saddle point nor a maximin saddle point of f(z) on R1 x R1.
Ira
18.2. We shall now consider saddle points on quasidifferentiable sets. Sup-
pose that d2 c Rn + m is a quasidifferentiable set, i.e., d2 = {z E Rn + m I h(z) <_ 0 },
where h(z) = h(x,y) is a continuous quasidifferentiable function on Rn + m . Let zo E d2 and set ooh
8 E Rn+m
Y1(ZO) =
g E Rn+m
m
AZO) _
ahag >
<0
The nondegeneracy condition is said to be satisfied at the point zo if Y(zO) =Y1(zo),
where y is the closure of the set Y. Suppose that f(z) is quasidifferentiable .-,
(AD
on an open set S Q. We shall establish necessary conditions that must hold at a saddle point z* E d2 of f(z) on the set d2. If h(z*) < 0, then these conditions are given by Theorem 18.1. Let us now consider the case h(z*) = 0.
Theorem 18.3. Suppose that f(z) and h(z) are Lipschitz functions in a neighborhood of the point z*, that the nondegeneracy condition for the function h1(x) = h(x,y*) is satisfied at the point x* E Rn, and that the nondegeneracy condition for the function h2(y) = h(x*,y) is satisfied at the point y* E Rm. If the point z* = [x*,y*] E d2 is a minimax saddle point off(z) on d2, then
(ff (z*) + w) n [
(z*) + w')]
0
V W E d fx (z*), V W' E d hx (z*) ,
(-dfy (z*) - w) n [
(18.13)
(z*) + w')] # 0 .-.
V w E fy (z*), V w' E d by (z*)
Here K(A) is the closure of the conical hull of a set A. The proof follows from Remark 16.3.
.
(18.14)
Saddle Points of Quasidifferentiable Functions
235
Remark 18.1. Condition (18.13) is equivalent to
-dfx (z*) c
r)
[ fy (z*) + k(dhx(z*) + w)]
,
r)
[o_Yy (z*) - K(dhy (z*) + w)]
.
we dhy(a*)
while (18.14) is equivalent to -,o fy (z*) c
_
we ahy(z*)
ry,
Remark 18.2. The problem is somewhat simplified if d2 = £21 x £22, where d21 c Rn and d22 c Rm are quasidifferentiable sets.
Remark 18.1. Suppose that z = [xl,...,xm ] E Rn, n = ri ni, and that f(z) = fj(xj)***jxm)) i = 1:m, are finite-valued functions xi E Rn i. Then z* _ [4) ...,x,*n ] is called an equilibrium point if
(Z*) <
fi(x*,...,x,*
x x*
... x *
V x E Rn V i E 1:m.
If the functions f(z) are quasidifferentiable on Rn , then, as above, we can also write necessary conditions (in terms of quasidifferentials of the functions f (z)) which must be satisfied by an equilibrium point.
Chapter 19 STAR-SHAPED SETS AND THEIR APPLICATIONS IN NONSMOOTH OPTIMIZATION
19.1. One very important concept in Subdifferential Calculus is that of Minkowski duality (see Chapter 5), through which every convex compact set is
associated with a specific support function. The study of quasidiffer-
entiable functions is essentially based on the properties of the space of (CD
convex sets. Making use of this space, a sum of a convex function and a concave function can be associated with every class of equivalent pairs of convex compact sets.
The concept of a gauge (a gauge function of a convex set containing the origin) is very important in convex analysis. However, the definition of a b'4
gauge does not require the corresponding set to be convex but only to have a "star shape" with respect to its "zero" (origin). For this reason the idea of a gauge is not limited to convex sets, but can be applied to a much wider class of sets altogether (correspondence between gauges and these sets has long been recognized in the geometry of numbers). It is convenient to consider only those sets which are star-shaped with respect to their zero and which have a continuous gauge. We call these sets star-shaped. It is possible to
pea)
introduce algebraic operations (called here inverse addition and inverse
W'~
multiplication by a nonnegative number) within this family of sets in such a way that the natural correspondence between gauges and star-shaped sets
becomes an algebraic isomorphism. This allows us to use the standard algebraic technique (see Chapter 6) normally used to construct the space of convex sets to build the space of star-shaped sets. The duality between gauge functions and support functions (which holds in the convex case) allows us to consider the polar operator as a linear mapping from the space of star-shaped sets
into the space of convex sets. It
is
then possible to look at some
'"r
problems previously studied, using the space of convex sets from a different, some respect more quasidifferential calculus. in
236
general,
view.
This
is
particularly
useful
in
Star-shaped Sets and Their Applications in Nonsmooth Optimization
237
19.2. We shall now consider star-shaped sets and their gauges in more detail.
Definition 19.1 A closed subset U of the n-dimensional space Rn is called a star-shaped set if it contains the origin as an interior point and every ray defined by
{2xIA>_0} (x# 0)
X
does not intersect the boundary of U more than once.
spa
To justify the definition we shall show that a star-shaped set U is starshaped with respect to its zero, i.e., for all points x E U the set U contains the interval [O,x] = {2x I A E [0,1]1. Let us consider the set UX = U n 'ex, where x # 0. This set is closed since it is a subset of the ray J3 and the endpoints of the adjoining intervals are the boundaries of U. The fact that U ..,
is star-shaped implies either that there is no adjoining interval (i.e., UX = ZX ) or that an adjoining interval is unique and of the form { vx v E (v',+oo)}, where v' > 0. In this case, UX = [O,v x]. The star shape of U with respect to its zero follows immediately from the above and is equivalent to either the relation I
AU c U
V2E [0,1]
or the relation
AU D U V A z 1
.
Let .fl be a set in Rn and let 0 E int .fl. The function ICI
IxJn
= inf {A > 0 1 x E ASl}
(19.1)
is called the gauge of the set d2 (or the Minkowski gauge function). If d2 is v)'
convex, then the gauge coincides with the gauge function familiar from convex analysis; if d2 is a ball, then the gauge is a norm corresponding to CS`
this ball.
Theorem 19.1. Let s be a functional defined on Rn. The following propositions are then equivalent: (a) the functional s is positively homogeneous, nonnegative and continuous; (b) s coincides with the gauge of a star-shaped set .fl, where
Chapter 19
238
4-:
P r o o f. (a) Let s be a positively homogeneous, nonnegative, continuous functional and let £2 = {x I s(x) < 11. Then
It is easy to check that the set d2 is star-shaped. Indeed, the continuity of s implies that the set £21 = {x s(x) < 1 } is open and therefore zero is an interior point of £2. Let x # 0. If s(x) = 0, then the entire ray PX belongs to d2i , i.e., to the interior of £2, and therefore does not intersect its boundary. If s(x) > 0, then the intersection of the ray PX and the boundary of d2 consists I
+,,
of only one point Ax, where A = 1/s(x). At this point, s(2x) = 1. (b) Let s coincide with the gauge of a star-shaped set £2. Since £2 is star-
shaped, it follows from the definition that s(x) < 1 if x E £2 and that if s(x) < 1, then x E £2. Since £2 is closed, we have £2 = {x I jxi < 1 }. It is clear that the gauge is both positively homogeneous and nonnegative.
Let us now show that the gauge is continuous. Since the gauge is positively homogeneous, it is enough to check that the set B1 = {x (xi < 1} is closed and that the set B2 = {x (xI < 1 } is open. (Indeed, in this case the sets {x I lxi < Al are open for all A > 0 and the sets {x I (xi < Al are closed for all A > 0. Therefore the gauge is continuous on the whole space.) B1 is closed since it coincides with D. Suppose now that B2 is not open, that x E B2 and that there exists a sequence {xk } such that xk -4 x, ixk I >_ 1. We can assume I
+''
I
ate
0
without loss of generality that lim (xk I = V >_ 1. Take yk = xk/Ixk J. Then (yk I = 1
U4)
ItU
ate
and therefore yk is a boundary point of £2. Since yk - x/v is also a boundary point of £2. If x # 0, it follows that the ray PX intersects the boundary of d2 at at least two different points, x/ixi and x/v, which is impossible.
call
4-.
If lxi = 0, then the ray PX lies entirely in d2 and (from the definition of "star-shaped") does not contain any boundary points of £2. Thus the gauge of a star-shaped set must also be continuous and the theorem is proved. 0
{x
I
0.."
Remark 19.1. Since the gauge is continuous and int £2 coincides with the set
lxi < 11, £2 must be regular, i.e., it coincides with the closure of its
interior. Q,,
Let us denote by S the set of all star-shaped subsets of the space Rn, and by K the family of all nonnegative, continuous, positively homogeneous functions defined on Rn. The following propositions may then be deduced.
Proposition 19.1. A mapping i/: S -4 K which associates a gauge with every star-shaped set is a bijection.
The set K is a cone in the space C0(Rn) of all continuous, positively ...
homogeneous functions defined on Rn. Since every function from Co(Rn) is
Star-shaped Sets and Their Applications in Nonsmooth Optimization
239
I
0
0
'+~
0330<
.ow
completely defined by its trace on the unit sphere Sl = {x E Rn Ilx II = 11, where 4 11 is the Euclidean norm of x, the space Co(R") can be identified with the space C(Sl) of all functions which are continuous on Sl, whereas the cone K coincides with the cone of functions which are nonnegative on S. Assume that C(Sl) (and hence the cone K) are ordered in some natural way: fi ? .f2 HA (x) ? f2(x) V x. Let us introduce the following ordering relation (by antiinclusion) within
::z
.1.4
the family S of all star-shaped sets: d22
if Qi c Q2
.
tome
o4)
It follows immediately from the definition of a gauge that the bijection VI
which associates a gauge with every star-shaped set is an isomorphism of ordered sets S and K. In other words, relations Q1 c Q2 and [xIi >_ (x12 V x .,.
are equivalent (where I.1i is the gauge of the set Sli).
The cone K is a lattice (see Chapter 6), i.e., if fi ,...,fm E K, then functions [and f defined by min fi(x), i
f(x) =
max f(x) i
also belong to K. Let f be the gauge of a star-shaped set d 2j. Then [is the gauge of the union Q = ud2i and f is the gauge of the intersection Q = n Q1. This follows from the relations (19.2)
{2>OIxEW} = nJA >OIxEAQi},
(19.3)
which can be verified quite easily. 0
Thus, the union and intersection of a finite number of star-shaped sets are star-shaped sets. Furthermore, the union coincides with the infimum and the intersection with the supremum of these sets in the lattice S.
Proposition 19.2. Let A be a set of indices and let U. be a star-shaped set with gauge (a . If the function (xl = a n A (xla is continuous, then it is the gauge of the set cl u Ua. If the function Ix! = a EpA Ixla is finite and contiI
a
nuous, then it is the gauge of the set a Ua.
Chapter 19
240
ate,
We shall prove only the first part of the proposition. Since the function IxIa is continuous, it follows from Theorem 19.1 that this function I xI = a n A is the gauge of some star-shaped set t2. It is now not difficult to check that £2 = cl au U.. Indeed, the continuity of the functions I and ( a implies that
int.fl = {x I IxI < 1} _ x I inf IxIa < 1} = u int U. a a Therefore, making use of the regularity of star-shaped sets yields
12 = cl int a = cl u int U. = cl u Ua a
a
This proves the first part of the proposition.
o
19.3. The algebraic operations of addition and multiplication by a nonnegative
number have been introduced within the family K of gauges of star-shaped sets in a natural way. We shall now introduce corresponding operations within the family S with the help of isomorphism Vv. I' I
Let Sl c S and .2 ? 0. We shall describe the set A 0 dl with gauge is the gauge of Sl, as the inverse product of L.2 = AI' In , Where I
the set dl and the number A. The set .ill (D .f12 with gauge I'I
which satisfies the relation
= I'll +1'Iz.
where I li is the gauge of set a1, is called the inverse sum of the star-shaped sets d2l and d22.
It follows from the definition that if A > 0, then A 0 a = 1/2 a. If A = 0, then the set A 0 d2 coincides with the entire space Rn.
We shall now describe inverse summation. To do this, we require the following elementary proposition.
Proposition 19.3. Let al,...,am be nonnegative numbers. Then
al + ... + am =
min max a. 0
jai=1 (where it is assumed that 0/0 = 0).
i
1
ai
ai
(19.4)
Star-shaped Sets and Their Applications in Nonsmooth Optimization
241
P r o o f. If ai = 0 for all i, then (19.4) is trivial. Consider the case I ai > 0 and put ai
V i E 1:M.
m
ak k=1
Let Jai I i E l:m} be an arbitrary set of nonnegative numbers such that
ai =
1
.
i =1
If ai > ai V i, then 1 = I ai > I ai = 1, which is impossible. Therefore for some j we have ai <_ aj and, by the definitions of the aj, m k=1
ag - a j
This implies that m
max i
1
ai
ai ? Y ak . k=1
At the same time, m
max e
which completes the proof. I
jf2,
m
y ak = Y ak a ai = max k=1 k=1 i e
D
Now let us consider the star-shaped sets .fll and d12 with their gauges be the gauge of their inverse sum and lf22, respectively, and let I
I
I
.ill (D .f12. Then the following equality holds for every x:
Ixl
= Ixlf2, + lxlf2Z
= m i n max f OS a51
min max {IxIaai,
O5 a<<-1
a
Ix In , ,
11a IXI. 2 }
(kI(1- a )Q 2
min
IxIa
O5 a<-1
a is the gauge of the set rill n (1-a)Q2. (It is assumed that 012=anoaQ.) where
Chapter 19
242
Since the function ( I is continuous, it follows from Proposition 19.2 that
.ill (D d22 = cl
u
[rill n (1- a) Sl2 ]
.
0<_a<_1
Cod
Note that the role of zero (a neutral element) with respect to summation in a "semilinear space" S is played by the space Rn (since the gauge of Rn coincides with the identity zero). At the same time, Rn is the smallest element in the ordered set S. We shall now give some examples.
Example 19.1. Consider the following rectangles in R2:
U = [-1,1]x[-2,2]
V, _ [-22,22]x[ The inverse sum of these rectangles coincides with an octagon which is symmetric with respect to the coordinate axes. The intersection of this
(0,0),
'
0,' (1+2a.' 1+2A(TI
a''
octagon with the first quadrant has the vertices: * 2,A
2+A ,
.
CDR
Rectangles U and V1 and their inverse sum are shown in Figure 19.1. The set U Q V1 is shown in Figure 19.2. Example 19.2. Let U = {(x,y) E R2 I y <_ 11 and V = {(x,y) E R2 I x<_ 11. The set
U o V is depicted in Fig. 19.3.
Example 19.3. The sets U and V are presented in Figs. 19.4(a) and 19.4(b), respectively; the set U o V coincides with the intersection of U and V (see Fig. 19.4(c)). "'CD
19.4. We shall now describe the vector space generated by the "cone" of star-
shaped sets S, for which an order relation (with respect to antiinclusion) and inverse algebraic operations have already been defined. We shall argue in the same way as in Chapter 6. Let S2 be the set of pairs (U1,U2), where UU E S. Let us introduce within
S2 the operations of inverse addition Q and inverse multiplication by a number 0, and a pre-ordering relation ? and an equivalence relation ---. These are defined as follows:
143
Figure 19.1
A 1.67
0.48
I
.83 0.95
Figure 19.2
Figure 19.3
Chapter 19
244
(Ul,U2) (D (VI ,V2) = (U1 (D Vl, U2 (DV2) ,
AO(U1,U2) _ (2OU19
2OU2) if2?0,
A O (U1,U2) _ (I%I O U2,I%I O U1) if A < 0
(U1,U2) > (V19V2) H U1 0 V2 > U2 (D (U19U2)
,
V1
(V19V2) H U1 (DV2 = U2 (@ V1 .
'BCD
0
We shall now factorize the set S2 with respect to the equivalence relation In other words, we shall consider the family T of all classes of equivalent pairs. Since the operators o and 0 produce equivalent pairs when applied to equivalent pairs, the operations of inverse summation and inverse multiplication by a number can be introduced within T in quite a natural way. The order relation within T is derived naturally from S2. An element of T which contains a given pair (U1,U2) will be denoted by [U1,U2]. We shall identify an element U of the set S with the element [U,Rn ] of the set T. The equality [U19U2] = [U1,Rn ] +O [Rn ,U2] = [U1,Rn ] O [U2,Rn ]
(where O 77 = p+ (-1) 0 77) then implies every element of T can be represented as the difference of two elements of S, i.e., T is the smallest
vector-ordered space containing S. For this reason we shall call S the space C/1
of star-shaped sets (compare this with the space of convex sets). We shall associate with every pair (U1,U2) E S2 a positively homogeneous function f = ' I i - 1'12, where ' Ii is the gauge of Ut. It is clear that two pairs generate the same function if and only if they are equivalent. Hence, the function f = I ' I1 - 1'12 E C0(Rn) is associated with every element [U1,U2] of the space T. Conversely, by representing a continuous positively homogeneous function I
i
,036
f in various forms f = fi - f2 (where f E K), we conclude that every element of the space C0(Rn) is associated with a class of equivalent pairs [U1,U2], where U1 = {x I f(x) <_ 11. Identifying, as above, a star-shaped set U with the element [U, En ] c T, we conclude that the mapping
[11i,Ui] - I'll -
(19.5)
is an extension of the bijection ': S - K (which associates a gauge with a star-shaped set) to the bijection T -4 C0(Rn ). We shall use the same symbol i to denote this bijection and refer to it as a natural isomorphism.
u U
Figure 19.4(a)
Chapter 19
246
..r
It is clear that VI preserves both the algebraic operations and the order relation. It is also clear that T, Co(R'1) and C(S1) can be viewed as different manifestations of the same ordered vector space. It is well known that the space C(S1) is a vector lattice: with its elem
ments fi ,..., fm it includes a pointwise supremum v f and a pointwise infimum i =1
m
A f . In addition, if f = fl i - f2i 9 then i =1
m
m
m
i=1
k=1
i=1
m
m
v (flk+f2jJ i-k
of=
Af= i=1
k=1
() fl k +
i-k
.f2 i
-
.f2 i
m
i=1
.f2 i
We may now conclude that the space T is also a vector lattice: if a1,..., an E T,
m
m
i=1
1 k=1
v ai =
n
v (Ul k ED
A at = i=1
Ulk (D
( Y- ED) Uzt), iA
k=1
+O U2i
+O U2i
i k
(19.6)
i =1
ED)
U2i
1%o
ai = [U1 i,U2 i], then
(19.7)
=1
where (Y, 0) denotes the inverse sum of the corresponding terms. From (19.6) and the relation m
m
i=1
i=1
A ai = - v (-ad ,
we conclude that m
A ai = i=1
(E ED) U,,, n (U2 k i =1
k=1
JUiJl.
(19.8)
mar,
/-Y
°0o
Equation (19.8) is in some respect more convenient than (19.7).
Let a = [U1,U2] be an element of the space of star-shaped sets and let f = I'll - 1'12 be the corresponding positively homogeneous function:
Star-shaped Sets and Their Applications in Nonsmooth Optimization
247
Let V = {x I f(x) <_ 11. The set V is star-shaped. It is not difficult to check that the element a+ = a v 0 coincides with [V,Rn ], i.e., that V is the smallest (in the sense of the ordering within S, or the largest with respect to inclusion) U
star-shaped set with the property U1 z) U2 o V. We shall now introduce a norm I I within the space CO(Rn ). If f E CO(Rn ), then
IfI = max
x
xERn
where
II
II
11X 11
is the Euclidean norm in R" . The corresponding norm in C(Sl) is
If I = m a k, Jf(z)j.
In what follows we shall use the equality
Ifl = inf{A _ 01 -r211x11
Let B be the unit ball in Rn. The element e = (B,Rn) of the space T corresponds to the function , and the element -e = (Rn, B) to the function - . Let us define the following norm in T: II
II
II
II
where a E Rn . If a = [U1,U2], then
lal = inf{.2>O1 r-"
U
set IUI = I [U,Rn ] I and therefore
IUI = inf{A > O 1 A U
inf{A > O 1 U
0.0
obi 0d0
,.G
Let X be a star-shaped compact set in Rn and let E be some subset of the
family S(X) of all star-shaped subsets of X. Let U EE and let JU be the I
gauge of U. We shall consider the sets
du = {x I (the boundary of U) and
Jxju = 1 }
Chapter 19
248
Proposition 19.4. Let a subset E of the space S(X) be closed in the topology of the space of star-shaped sets T. The set E is compact in this topology if and only if (i) there exists a neighborhood B of zero such that B c U V U E I;
(ii) for every .6 > 0 there exists a 5 > 0 such that du +B5 c dU V U
`',
where By = {x I IIx II < 5}.
0c,
P r o o f. Let us consider the set E1.1 of all functions from C(S1) this
Mad
:.CD
wpb
QC..,
represents a contraction (on S1) of the gauges of sets from E. The fact that being compact. By the the set E is compact is equivalent to the set Arzela-Ascoli theorem, this property of 2j.I is equivalent to this set being bounded and equicontinuous. It is clear that condition (i) is satisfied if and E1.
1
only if E., is bounded.
'"*.,
We shall now show that condition (ii) is equivalent to being equicontinuous, assuming that (i) holds. Let condition (ii) be satisfied and let
UEE.
tea)
7..
IAA'
V
First note that there exists an r > 0 such that IIx II > r for any U E I and any x such that IxIU = 1. This follows immediately from condition (i). Since X is compact, there exists an R < 00 such that IxI <_ R V x E X. Let condition (ii) be satisfied. We shall show that the set L1.1 is uniformly
continuous, i.e., for e > 0 there exists a 5 > 0 such that the relations
IIx II
=
Ily II = 1 and 11x -y II < Simply that Ixlu - Iymu I
< e V U E E.
Putting e' = re, we shall find a 8' corresponding to e' such that (ii) is satisfied. Let 5 = 8'/R, take elements x and y such that IIx II = Ily II = 1, top
11x -y II < 8 and choose any U E E. Let
A=
1
,
IxIU
x = 2x,
y
= Ay .
It is clear that 11x'11 = 11y'11 = A. Since Ix'I U = 2I xIU = 1, we have x E U C X and hence A = 11x'11
R.
Thus
11x' -y'11 = A11x y1I < R8 _ 8' From (ii) we have y' E dU , i.e., LY'Iu - IxIU
< C' .
Star-shaped Sets and Their Applications in Nonsmooth Optimization
249
Since Ix'IU = 1, we have Ilx'll = A > r and therefore 1
- lulu _
I1XPIU
I
- LY 'U
I
r E' = e
<
0O)
I1XIU
MME
Since 5 does not depend on U, the required uniform continuity is established. Next let us assume that the set of functions L1.1 is uniformly continuous.
It is necessary to prove that (ii) holds. Choose an e > 0 and let e' < dIR. We shall find a 8' such that the relations IIx' II
=
= 1,
Ily' II
<S
Ilx' y' II
imply that IIx'IU
VUE
- VIu < .6' I
E.
Choose a 5 such that 5 < (e-Re')r, 5 < rS'. Let U EE and X E dU, i.e., Iu = 1. Assume also that
ily x II
E S and put x' = x/ Ilx ii, y' = y/ ily II. The
inequality 2
IlyII
Iix ii
II
Ilx II
Ilx y 112 1 Ily 11
implies that iix' y' ll < 5'. At the same time, since ily' ii = 1, we have Iy'IU
IlxlU
- lulu = I
1
r and hence
IIX'IU . lix u - Iy, JU uy" 111
I[x'IU' lix II
<_
5
lix u
IIX-IU - VIu I + VIu I lix II - Ily II
Re'+1IIx-y II r
<e,
HIYUU_11 _< C
or, equivalently, y E dd . This completes the proof.
0
Chapter 19
250 10
OD,
Remark 19.2. Let I be the family of all convex compact sets belonging to X for which condition (i) of Proposition 19.4 is satisfied. Then it is not difficult to show that the set E1.1 is equicontinuous and therefore I is compact.
19.5. Let L be the subspace of the space C0(R'z) which consists of functions which can be represented by the sum of a convex function and a concave function. It was shown in Chapter 6 that the mapping 0: M - L defined by 'P([U,V])(x) = max (u, x) + m i n (v, x) U EU
(19.9)
V EV
p00
is an algebraic and ordering isomorphism (it is of course assumed that L is provided with natural algebraic operations and an order relation).
The inverse mapping j-1 associates an element [pp, dq] from M with a function p + q E L (here p is the subdifferential of the sublinear functional p and dq is the superdifferential of the superlinear functional q). Let us consider a subset U of the space R'z, with polar U° given by
000,°
U° = {x I (u, x) < 1 V U E U}
.
Dap
W04°0
The main properties of the polar are (see [79,88,1361): (i) the set U° is convex and closed; 0 E U° ; (ii) if U is convex and closed and O E U, then U°° = U ; (iii) U is compact if and only if 0 E int U° ; (iv) let U be a convex closed set, with O E U. Then the gauge function of U coincides with the support function of the polar U° and the support function of U coincides with the gauge function of the polar; (v) let U1 and U2 be convex and closed and let 0 E U1, 0 E U2. Then
the relations U1 D U2, U1 c UU are equivalent and (U1+U2)° = U1 O+ U2
(
=
°
,
1 U° if A > 0 .
Now let us consider star-shaped convex sets U1 and U2. Since 0 E int Ut, the polar U is compact. Since the gauge I It of the set U1 coincides with the support function of the polar Ui*, the following relation holds: 1x11 - Ix12 =max ° (l,x) - m a x ° (l,x) I EU1 I EU2
max ° (l,x) + min I
E U1
°
IE [-U2]
(l,x)
.
(19.10)
Star-shaped Sets and Their Applications in Nonsmooth Optimization
251
Let yf and 'P be mappings defined by formulas (19.5) and (19.9), respec-
tively, let a be an element of the space T containing the pair (U1,U2), and let /3 be an element of the space M containing the pair (Ul , -UU ). It follows from (19.10) that yia = (Pf3 and hence /3 = (0-1 ;V) (a). The operator it =
C13
It is clear that TT is a linear space (this follows from the fact that the
BCD'
convexity of a star-shaped set is equivalent to the convexity of its gauge). BCD
The set of values of the operator it coincides with the space of convex sets. Indeed, for 8 EM it is always possible to find a pair (U,V) E 8 such that 0 E U, O E V. Then U = (U°)°, V = (V°)° so that ,(3 = ira, where a = [U°,V°] E Tc.
The properties of the polar imply that the operator it is linear and order-preserving.
19.6. The space of star-shaped sets can be used to provide a geometrical interpretation of directional derivatives. Let f be a function defined on an open set Q c R'z and suppose that at a point x E R'z we can construct the
hot'
directional derivative of f:
dg
f"' (g)
m
[f(x+ag) -f(x)l
where the function fX (g) is continuous in g.
Since the functional fX is positively homogeneous, an element of the space T of star-shaped sets is associated with f,,. In other words, a pair of star-shaped sets (U,V) exists, such that
f,(g)=min{2>OIgE2U}-min{2>OIgE2V} or, equivalently,
f,(g)=min{2>OIgE2U}+max{2
(19.11)
Note (from equation (19.11) that the pairs (U,V) and (UI,VI) represent the 0
derivative of f if and only if they are equivalent. Let us denote the set U in (19.11) by f(x) and the set V by df(x). Invoking the properties of the space T of star-shaped sets, it is possible to state rules for algebraic operations over functions and the corresponding pairs:
Chapter 19
252
4(fi + f2)(x) =
fi (x) E) 4f2(x) ,
d(f, + f2)(x) = df, (x) E) df2 (x)
,
4(fi f2)(x) = fi (x) 0 f2(x) E) 12(x) 0 4fi (x) ti:
fi (x) 0 df2(x)
12(x) odf, (x)
Using formulas (19.6) and (19.7) and the rules for differentiability of a maximum function, it is easy to find d (max f (x)),
d (maxj(x)), d min f (x)), d min f (x))
It is clear that a function f is quasidifferentiable at x if and only if there exist convex sets df(x) and df(x). In this case,
4f(x) = [df(x)]`, df(x) = [-af(x)]`, where fi(x) and af(x) are a subdifferential and a superdifferential, respectively of f at x (this follows from Section 19.5). We shall now present a geometrical interpretation of necessary conditions for a minimum, based on the following lemma.
Lemma 19.1. Let a function f be directionally differentiable at x E R'z , let its derivative f(g) be continuous in g and let K be a cone in R1z. Then (i) the relation
m i n fX (g) = 0 g EK
is satisfied if and only if 4f(x) r1 K c df(x); (ii) the relation
m a x fX(g) = 0
g EK
is satisfied if and only if df(x) n K c fi(x).
Star-shaped Sets and Their Applications in Nonsmooth Optimization
253
Proof. Let us write fX(g) in the form f.,.'(9) = 1911 - 1921
where I' I I is the gauge of the set
9
f(x) and 1'12 is the gauge of the set df(x).
Assume that
m i n fX (g) = 0 and g E ff(x) n K, gEK
1 and I8li - 1812 ? 0, so that 1$12 <_ 1, which is equivalent to the inclusion g E df(x). Thus, we have f(x) n K c df(x). then 1811
CPO
(CD
:..
Otherwise, suppose that this last incluson holds. For a g E K such that 1811 > 0 let us find a A > 0 such that RgIl = 1. Then Ag E fi(x). But since Ag E df(x) we have the inequality IA912 <_ 1. This means that 1811 - 1812 = 181i)
4-o
UFO
f(g) >_ 0. Thus, if IgIl = 0, then 1$12 = 0 (since 1812
Part (ii) of the lemma can be proved in the same way. D Let X E .fl c R" . By yX we denote the cone of admissible directions and by TX we denote the cone of feasible directions of .fl at x (see Chapter 15). CPO
Theorem 19.2. Let x* E .fl be a minimum point of f on .fl. If f is directionally differentiable at x* and fX(g) is continuous in g, then
c df(x*) .
(19.12)
4f(x*) n yX* c df(x*) .
(19.13)
4f(x*) n yX If f is uniformly differentaible at x*, then
The proof is a consequence of Theorems 16.1 and 16.2 and Lemma 19.1.
Chapter 20 APPROXIMATE Q UASIDIFFERENTIABILITY
::r
The e-subdifferential plays an important role in convex analysis (in particular, because an e-subdifferential is a continuous mapping if e > 0). It therefore seems useful to have some analogous approximation for quasidifferentials. Here we describe an approach suggested by V. Gorokhovik [61,62].
20.1. Let a function f defined on an open set X c Rn be directionally differentiable at a point x E X, and let e > 0. We say that a pair of compact convex sets DEf(x) = [r7 f(x), aEf(x)] is an e-quasidifferential of the function f at the point x if
x f(g) -
(v,g) + min
max
(w,g) ski
W E def(x)
L''E def(x)
where B is the unit ball. Equivalently, we can define De f (x) _ [dE f (x), ^..,
de f(x)] as an e-quasidifferential of f at x if max
[yE def(x)
(v,g) + min
(w,g)
WE def(x)
< eIIgII
IV g Rn
.
ski
x (x) -
ago
Clearly, if a pair [de f (x), de f (x)] is an e-quasidifferential of f at x, then any equivalent pair (as defined in Chapter 6) is also an e-quasidifferential of f at x. Thus, an e-quasidifferential is an element of the space of convex sets. Note that even in this space an e-quasidifferential is not uniquely defined, since even nonequivalent pairs may be e-quasidifferentials of f at x. We say that a function f is approximately quasidifferentiable at x if f has
an e-quasidifferentiable at x for any e > 0. It is clear that any quasidiffer4-4
entiable function is also approximately quasidifferentiable, and a quasidifferential is also an e-quasidifferential for all e > 0. Theorem 20.2 shows that the class of approximately quasidifferentiable
3a)
functions is quite rich. But first let us recall a fundamental result from functional analysis. 254
Approximate Quasidifferentiability
255
Theorem 20.1 (Stone-Weierstrass theorem). Let Z be a set of functions defined and continuous on a compact set S such that (1)
Z is a linear set;
(ii) if zl, z2 E Z, then the functions
z(x) = max {z1 (x), z2(x)} and z(x) = min {zl(x), z2(x)} also belong to Z; cow
(iii) Z contains the identity element, i.e., the function e defined b y the equality e(x) = 1 V X E S. coo
If a function h is continuous on S, then there exists a sequence {zk Zk E Z V k} which converges uniformly to h.
This theorem is proved, for instance, in [87].
Theorem 20.2. Let a function f be defined on an open set X c Rn, X E X. The function f is approximately quasidifferentiable at the point x if and only if f is directionally differentiable at x and its directional derivative f(g) is %o'
continuous (as a function of g).
P r o o f. (i) Let L be the set of all functions defined on R'z which can be represented as the sum of a sublinear function and a superlinear function. Let Z be the trace of L on the unit sphere S = {g E R'z II g II = 11 (z E Z if and only if there exists a function e EL such that z(g) = 1(g) V g E S). Since L is a linear space, Z is also linear. From Lemma 6.1, the set L satisfies condition (ii) of Theorem 20.1, and therefore Z also satisfies this N-,
I
condition.
Finally, the identity element (the function e -the trace of the
sublinear function p(g) = II g II on S) belongs to Z. Thus all the conditions of Theorem 20.1 are satisfied.
'a'
p,,
Let a function f be directionally differentiable at x and the function f(g) be continuous in g. It follows from Theorem 20.1 that for any e > 0 there exists a function Z E Z such that Lfx'(g) - Z(g) I
<EVgES.
Let a function I E L be such that its trace on S is z and consider a N
sublinear function p and a superlinear function q such that p + q = I. Then
Chapter 20
256
This relation immediately implies that the pair [dam, aq] is an e-quasidifferential of the function f at x and therefore f is approximately quasidifferentiable. (ii) Let a function f be approximately quasidifferentiable at x. It follows from the definition of approximate quasidifferentiability that for any natural number k there exists a funcrion ek EL such that
[f(g)
-
Vg
k Ilg II
ek(8)I
.
This means that for any R > 0 the sequence {.2k(g)} converges to f(g) uniformly on BR (0), where
BR(O) = {g E Rn I
Ilg II
_< R}
,
i.e., fX is the uniform limit of a sequence of continuous functions and therefore fX is also continuous. This completes the proof. 0 20.2. We shall now give a theorem on the e-quasidifferentiability of a composition.
Let functions hl,...,hm be defined on an open set X c Rn and where ej > 0. Let De .hi(xo) _ [de hi(xo), do .hi(xo)] be an e1-quasidifferential 1
of the function hi at a point xo, and H(x) _ (hl (x),..., hm (x)}. Consider a 4-.
function f defined on an open set in Rm containing the point yo = H(xo) and such that f is uniformly directionally differentiable at yo. Suppose that for some > 0, f has an quasidifferential:
Dj(yo) _ [d .f(yo), d.f(yo)] Put CO
_ em+ IIeIIC,
where
M=
max IIHX (v) II
=
IIvII<_1
max (I (hj(x,v)}2 IIvII<_1
C = max (IId.f(yo) II, Ila f(yo) II) and lie II
1/2
(20.1)
is the Euclidean norm of the vector e. If A c Rn is a compact set,
then, by definition (see Chapter 5), 11A11
= max 11111 .eEA
Approximate Quasidifferentiability
Let vectors
vol
257
= (vi,...,v,') and v" = (vl ,...,v,n) be such that
V' < v < v" VVE d,f(yo)udj(yo). Theorem 20.3. The pair of sets [D, D], where tai
D=wIw=
vi(2i + 1ui) c =1
- vi2i -
V = (Vi,...,Vm) E d.f(yo),
Ai Ed, .hi(xo), ui Ed, .hi(xo),
(20.2)
m
D=
ww=
Vi(2i +,ui) + V12i + V!iui; V = (Vi,...,Vm) E c =1
Ai Ed, .hi(xo),,ui Ed, .hi(xo)
E ,
(20.3)
is an eo-quasidifferential of the function 0(x) = f(hi(X),...,hm(X)) at the point x0.
P r o o f. It follows from the conditions of the theorem that the function O(x) = f(H(x)) is directionally differentiable at xo and O'(xo,v) = f'(yo, H'(xo,v)) . Let
A = dj(yo), B = dj(yo) -e E
,
1(0) = m a x (u,u) + min (v, u) ft EA
V EB
It follows from the definition of an e-quasidifferential that [f'(yo,u) - .e(u)i
E no u II
du
Let us write e in the form
._
e"
,
.
.
Chapter 20
258
where
.e'(u) = max (u,u), e "(u) = m in (v, u)
.
ft EB
ft EA
The function e' is sublinear, and therefore (see theorem 5.2) it is Lipschitz (ODE'
with a constant IIA II. It also follows from Theorem 5.2 that the sublinear function e" is Lipschitz with a constant IIB II. Thus a must be a Lipschitz function with a constant C defined by (20.1). Let
Al = d .hi(xo), Bi = de .hj(xo)
1i(v) = max (u,v) + m i n (v,v) ft E Ai
V E Bi
Let L: R'z - Rm be a mapping with coordinate functions ei : Lv = (,el (v) ......
m (v)}
.
LIZ
We have
IIH'(xo,v) - Lvll = E (hi(xo,v) <
- _e(v)I2)ii2
'(6?Ilvl12)1i2 =
IIEII
Ilvll
Using this inequality and the definitions of M and C yields I f'(yo,H'(xo,v)} - .1(Lv)I
I f'(yo, H'(xo,v) - I(H'(xo,v)}I + II(H'(xo,v)} - .e(Lv)I IIH'(xo,v) II
+ CIIH'(xo,v) - Lv II
M'MIIvII + CIIeII
IIvII
= eIIvII
To complete the proof, it is necessary to check that the function o(v) = .e(Lv) can be represented in the form
0(v) = m a x (u,v) + m i n (v,v) , fl ED
VED
where the sets D and D are defined by (20.2) and (20.3).
Approximate Quasidifferentiability
259
Since 0 is positively homogeneous, we have a-4+0 a
Analogously,
L'(O,v) = L(v), .e'(O,u) = .e(u)
.
It follows from Theorem 12.2 that
O(v) _ 0'(O,v) =
max
E ao (o)
(u,v) + min
(v.v)
,
vE do co>
where m
WW=
\.y
(0) =
[V i (2i+pi) - vi2i -" vi,ui ]; V = (V19 ... )VM) E de(0),
r..
i =1
Ai E dei(O), Pi E dei (O)
,
m
do (0) =
ww=
[Vi(/Z.i+,lli) + v /.i + VI;ui]; v = (Vi,...,Vm ) E de(0}, i =1
Ai E
dei(O),,Ui E dei(0)
and
v' < v < v" V V E de(0) u de(0)
.
Since the pair
[A,B] _ [d.f(yo), d.f(yo)] is a quasidifferential of e at zero and the pair [Ai,Bi] = [d .hi(xo), d£ .hi(xo)]
is a quasidifferential of ei at zero, &(0) coincides wih the set D and 9O(0) coincides with the set D. 0 Using Theorem 20.3, it is not difficult to derive formulas for the e-quasidifferentials of a sum, a product, a quotient, a pointwise maximum and a pointwise minimum. For example, let us compute the e-quasidifferential of a sum. Consider the function f(y) = yi + --- + y11. In this case ! = 0, d N f (y) = (1,...,1) and d f (y) = 0.
Chapter 20
260
20.3. A point x0 is called a local e-minimum point of a function f if there exists a neighborhood N(x) of x0 such that
f(xo) < f(x) + Elix-xo II
V x E N(xo), x # xo
.
(20.4)
Here e > 0. Clearly any point xo satisfying (20.4) is a local minimum of the function
h. (x) = f(x) + e II x -
xo II
and, conversely, any local minimum point of h£(x) is a local e-minimum point of the function f. This definition of a local e-minimum point seems to be more natural than
the one used in convex analysis: there an e-minimum point is defined as a point satisfying f(xo) <_ f(x) + e d x. Clearly, for any e > 0 an e-approximation of a minimum point can be rough enough.
Theorem 20.4. Let a function f defined on an open set X c R" be directionally differentiable at a point X0 E X, and let its directional derivative f'(xo,g)
be continuous as a function of direction. Let e > 0 and let x0 be a local .6-minimum point of the function f. Take El > 0 and let
Del.f(xo) =
[de1f(xo), de1f(xo)]
be an El-quasidifferential off at x0. Then de1f(xo) + (e+e1)B D [--a£1 f(xo)]
.
P r o o f. Let .e(g) =
max vE a£1f(XO)
(v,g) + min
(w,g)
.
wE d£1f(xo)
Then Lf'(xo,g) - 1(g) I <_ e1 II g II
Vg
and hence
f'(xo,g) < 1(g) + e1
VgES
,
where S is the unit sphere. On the other hand, since x0 is a local e-minimum point, we have f(xo)
_<
f(x0+ctg) + ae V g E S
.
Approximate Quasidifferentiability
261
This inequality implies that Al
f(xo,g)
a
Taking the limit as a -+ +o, we have f'(xo,g) >_ -e and therefore
1(g) =
(v,g) + min
max
(w,g) >- - (e+e1) _ - (e+e) II g II
wE d£1f(xo)
VE d£1f(xo)
VgE S. Putting e' = e + e1, we can rewrite this inequality in the form e' II g II
+
max
(v,g)
-min
>_
WE [-d£1 f(xo )]
vE d£1f(xo)
(w,g)
or, equivalently, max
VE [d£1f(xo)+£'B]
de , g(xo)
(v,g)
-min
WE [-d£1f(xo)]
+ eB c: [-d. , f(xo) 1
(w,g)
,
0
Corollary 20.1. If xo is a local minimum point off, then ,.f(x0) + E1B
[-df1.f(x0)]
b El > 0
.
Remark 20.1. A point xo is called a local e-maximum point of a function f if there exists a neighborhood N(xo) of xo such that
f(x0) > f(x) + Ellx-xo II
V x E N(xo), x # xo
.
0
In an analogous way it is possible to prove that if xo is a local E-maximum point of f, then
a f(xo)
[-d
+ (e+ei)B] .
In particular, if xo is a local maximum point, then
a f(x0) c [--d£,f(xo) + e1B]
V el >- 0
.
262
Chapter 20
20.4. Constrained optimization problems can be treated as in Chapter 16. The 4.,
following analogue of Theorem 16.3 holds.
Theorem 20.5. Let a function f defined in a neighborhood of a point x be directionally differentiable at the point x, and let f'(x,g) be continuous in g. +.4
Let K c Rn be a convex cone. Then ,.,
(i) the equality
min f'(x,g) = 0 g EK
holds if and only if
Ve>0;
ae.f(x) + eB - K* (ii) the equality
m a x f '(x,g) =0 g EK
holds if and only if
df f (x) +,6B + K*
[-d f (x)]
Ve>0.
P r o o f. We shall consider only case (i). Let e > 0 and
f'(x,g) >_ 0 V g E K. Al
(20.5)
Then max r01
vE df (x)
(v,g) -
max
WE [- If (x)]
- 0VgEK
(w,g) + e I i g I I >
Put
pi(g) = max vE
p2(g) =
(v,g) + e II g II =
max W r= [-;)f W]
max w r= [af(x) +£B ]
(X)
(w,g)
.
(v,g)
.
Approximate Quasidifferentiability
263
As in Theorem 16.3, it can be shown that
dpi - K* z
,
which implies that !2f(x) + eB - K*
VE>0
(20.6)
.
Arguing the same way as in proving Theorem 16.3, it is possible to prove that (20.6) implies (20.5). D
The development then proceeds as in Chapter 16. Let f attain its minimum on the set
at a point xo. Let f and h be directionally differentiable at xo and let their derivatives
f'(xo,g) and h'(xo,g) be continuous in g. Fix e > 0 and let
Y£1
(w,g) < ell g II g I max (v,g) + min vE d£h(xo) WE d£h(xo)
Assume that Y i is not empty. Then
h'(xo,g) <0 V g E Y1 .
.
BIBLIOGRAPHIC NOTES
Chapter 2
000'r
The theory of multivalued mappings is very important in the study of extremal problems, in convex analysis, mathematical economics, and related fields. In Chapter 2 we give some elementary results related to the topological theory of multivalued mappings. A detailed survey of the main results of this theory is presented in [20], which also includes an extensive bibliography. The Hausdorff metric is discussed from the geometric standpoint in [64].
Chapter 3
The importance of directional differentiability in nonsmooth optimization
Cry
is due to the fact that many of the functions occurring in real problems are nondifferentiable but have the property of directional differentiability. Examples include convex functions and maximum functions. The directional differentiability of maximum functions is discussed in many publications; see, e.g., [32, 34, 38]. As already noted, Theorem 3.2 can be found in Pshenichnyj [127]. An approach to the differentiability of a maximum function based on the composition theorem is described in [39]. Chapter 4
gun
The true "encyclopedia" (the "Bible") with respect to convex functions is Rockafellar [136]. Convex analysis is based on fundamental results obtained by F. Fenchel [52, 53], J.-J. Moreau [104], and R.T. Rockafellar [137]. The properties of convex functions and numerical methods for minimizing such functions are described in [44, 107, 119, 127, 128, 130, 148]. The fundamental concept in convex analysis that of a subdifferential-was introduced in Moreau [104] and Rockafellar [137]. This concept had a tremendous impact on research in convex analysis, as well as on nonsmooth optimization in general.
265
Bibliographic Notes
266
Chapter 5
+'0
,ran
mar"
Minkowski duality was introduced by H. Minkowski in his work on the geometry of numbers; see, e.g., [53, 88]. Minkowski duality plays a fundamental role in modern convex analysis and its applications, in many cases making it possible to ignore the distinction between convex sets and their support functions. The existence of this duality means that it is necessary to pay special attention to sublinear functions. An abstract formulation of Minkowski duality is given in [88].
Chapter 6
,.,
"-h
The space of convex sets is studied, for instance, in [74, 115, 132]; also see [88] where this space and its generalizations are treated from the standpoint of an abstract Minkowski duality. In addition to its applications in quasidifferential calculus, the space of convex sets is an important element in the theory of multivalued mappings, since it allows us to consider a multivalued mapping with convex images as a single-valued mapping with values in the space of convex sets.
Chapter 7 The results stated in this chapter were first presented in [144].
Chapter 8
A survey of varied generalizations of the concept of a derivative, their properties and interplay can be found in Rockafellar [140], which also contains an extensive bibliography.
Nom'
0..
A detailed presentation of these and related problems is given in [137]. The Clarke derivative was introduced in [27, 28], and its properties and applications are treated at some length by F. Clarke in his recent monograph [31]. Property (a) (following Corollary 8.1) is the Rademaker theorem, property (b) follows directly from the Fubini theorem, and property (c) is a corollary of the absolute continuity of functions satisfying the Lipschitz condition.
Chapter 9
N0-'
The notion of an upper convex approximation was introduced by B.N. Pschenichnyj in [128], where it was systematically used to study extremal problems. Many examples of upper convex approximations are also given in [16]. Exhaustive families of upper convex and lower concave approximations have been defined and studied by A.M. Rubinov (see [42]).
Bibliographic Notes
267
Chapter 10 Quasidifferentiable functions were first introduced in [40, 47].
Chapter 12
The composition theorem ('Theorem 12.2) was proved by Rubinov [41]. Sublinear operators and the subdifferentials have been studied extensively in convex analysis [21, 87, 141]. Chapter 13 The results given in this chapter first appeared in [35]. Chapter 14
The first implicit and inverse function theorems for the quasidifferentiable case were stated by V.F. Dem'yanov [37, 44], and are also discussed in [33]. Other approaches to this problem are described in [19, 22]. Chapter 15
The notion of "a pair being in a general position" was introduced by A. M. Rubinov [41].
Chapters 16, 17
Necessary conditions for an unconstrained extremum of a quasidifferentiable function were obtained by L.N. Polyakova [120]. In the constrained case, necessary conditions were obtained by Dem'yanov and Polyakova (see, e.g., [44]). Proposition 16.2 was proved by A. Shapiro [146]. We do not consider the computational aspects of the problem; note only that the first numerical method for minimizing quasidifferentiable functions was suggested by T.I. Sivelina [149]. This method was generalized in [46] and [112].
Chapter 19
Star-shaped sets and their applications to nonsmooth analysis are studied in [ 145].
Chapter 20 This chapter is based on results obtained by V.V. Gorokhovik [61, 62].
REFERENCES
[1] Alekseev, V.M., Tikhomirov, V.M, and Fomin, S.V. Optimal'noe Upravlenie (Optimal Control). Moscow: Nauka, 1979.
[2] Alexandrov, A.D. 1949. On surfaces which may be represented by a difference of convex functions (in Russian). Izv. Akad. Nauk Kazakhskoj SSR, Ser. Fiz.-Mat. 3:3-20. [3]
1950. On surfaces which may be represented by a difference
.
of convex functions (in Russian). Dokl. Akad. Nauk SSSR 72:613-6. 40i
[4] Aubin, Jean Pierre. Mathematical Methods of Game and Economic Theory. Amsterdam: North-Holland Publ. Co., 1979. [5]
.
1980. Contingent derivatives of set-valued maps. Technical all
Report 2044, Mathematics Research Center at the University of WisconsinMadison, Madison, Wisconsin, USA. . 1980. Contingent derivatives of set-valued maps and existence of solutions to nonlinear inclusions and differential inclusions. Advances
oaf
+-+
[6]
t0)
in Mathematics. Supplementary Studies. Volume 7A. Mathematical Analysis and Applications, ed. Leopoldo Nachbin, Part A, 160-229. New York London Toronto Sydney San Francisco: Academic Press, 1981. [7] Aumann, Robert J. August 1965. Integrals of set-valued functions.
J. Math. Anal. Appl. 12, 1:1-12.
[8] Auslender, A. February 1979. Differentiable stability in non convex and non differentiable programming. Math. Programming Stud. 10. Point-toset Maps and Mathematical Programming, ed. P. Huard, 29-41. Amsterdam: North-Holland Publ. Co.
269
References
270
1981. Minimisation de fonctions localement Lipschitziennes: . applications a la programmation mi-convexe, mi-differentiable. In Non-
(D.
[9]
linear Programming 3, ed. Olvi L. Mangasarian, Robert R. Meyer, and Stephen M. Robinson, 429-60. New York San Francisco London: Academic Press.
[10] Averbukh, I., and Smolyanov, G. August 1968. The various definitions of the derivative in linear topological spaces. Russian Math. Surveys 23, 4:67-111. (English transl.)
[11] Bank, B., Guddat, J., Klatte, D., Kummer, B., and Tammer, K. Nonlinear Parametric Optimization. Berlin: Akademie-Verlag, 1982.
[12] Banks, H.T., and Jacobs, Marc Q. February 1970. A differential calculus for multifunctions. J. Math. Anal. Appl. 29, 2:246-72.
[13] Barbu, V. 1981. Necessary conditions for nonconvex distributed control problems governed by elliptic variational inequalities. J. Math. Anal. Appl. 80:566-97.
0
[14] Ben-Tal., A., and Zowe, J. 1982. Necessary and sufficient optimality conditions for a class of nonsmooth minimization problems. Math. Programming 24:70-91.
[15] Beresnev, V.V., and Pshenichnyi, 1974. The differential properties of .p,
minimum functions. U.S.S.R. Comput. Math. and Math. Phys. 14, 3:101-13. (English transl.)
[16] Berkovitz, Leonard D. August 1916. Variational methods in problems of control and programming. J. Math. Anal. Appl. 3, 1:145-69. [17]
.
May 1974. Lower semicontinuity of integral functionals. Trans.
Amer. Math. Soc. 192: 51-7. X00
i-+
[18] Bishop, Errett, and Phelps, R.R. 1963. The support functionals of a convex set. Proceedings of the Symposia in Pure Mathematics 7: Convexity, 27-35.
`o^
[19] Boltyanskij (Bolt'yanskii), V.G. 1974. A method of tents in the theory of extremal problems. Russian Math. Surveys 30, 3:1-54. (English transl.) [20] Borisovich, Yu.G., Gel'man, B.D., Myshkis, A.D., and Obukhovskii,
V.V. January 1984. Multivalued mappings. J. Soviet Math. 24, 6:719-91. ...
(English transl.)
References
271
[21 ] Borwein, J.M. 1984. Subgradients of convex operators. Math. Operationsforsch. Statist. Ser. Optimization 15: 179-91.
[22] Bradly, M., and Datko, R. 1977. Some analytic and measure-theoretic properties of set-valued maps. SIAM J. Control Optim. 15:625-35.
[23] Bryzgalova, L.N. 1978. Maximum functions of a family of functions depending on parameters. Functsional'nyi Anal. Appl. 12, 1:50-51. (English transl.)
[24] Cassels, J.W.S. An Introduction to the Geometry of Numbers. 2d prtg, corr. Berlin New York: Springer-Verlag, 1971.
[25] Chaney, R.W. 1982. On sufficient conditions in nonsmooth optimization. Math. Oper. Res. 7: 463-75. [26]
.
1983. A general sufficiency theorem for nonsmooth nonlinear
programming. Trans. Amer. Math. Soc. 276: 235-46.
[27] Clarke, Frank H. 1975. Generalized gradients and applications. Trans. Amer. Math. Soc. 205:247-62. .
1976. A new approach to Lagrange multipliers. Math. Oper. CD'
[28]
Res. 1:165-74. [29]
. 1976. The maximum principle under minimal hypothesis. SIAM J. Control Optim. 14:1078-91.
.1976. On the inverse function theorem. Pacific J. Math. 64:97-102.
[30] [311
. Optimization and Nonsmooth Analysis. New York: John Wiley & Sons Publ. Co., 1983.
[32] Danskin, John M. The Theory of Max-Min and Its Application to Weapons Allocation Problems. New York Berlin Heidelberg: Springer-Verlag, 1967.
[33] Demidova, V.A., and Dem'yanov, V.F. 1983. A directional implicit function theorem for quasidifferentiable functions. Working Paper WP-83-125, International Institute for Applied Systems Analysis, Laxenburg, Austria.
[34] Dem'yanov, V.F. Minimaks: Differentsiruemost' po Napravleniyam (Minimax: Directional Differentiability). Leningrad: Izd. Leningrad Univ., 1974.
References
272
[35]
.
1981. On a relation between the Clarke subdifferential and the
quasidifferential. Vestnik Leningrad Univ. Math. 13:183-89. (English transl. )
(ed.). Negladkie Problemy v Teorii Optimizatsii i Upravleniya (Nonsmooth Problems in the Theory of Optimization and Control). Leningrad: Izd. Leningrad Univ., 1982.
[37]
. 1983. Quasidifferentiable functions: necessary conditions and descent directions. Working Paper WP-83-64, International Institute for
'ti
[36]
moo"
Applied Systems Analysis, Laxenburg, Austria.
[38] Dem'yanov, V.F., and Malozemov, V.N. Introduction to Minimax. New York: John Wiley & Sons Publ. Co., 1974. (English transl.)
tad
coy
[39] Dem'yanov, V.F., and Rubinov, A.M. Approximate Methods of Solving Extremal Problems. New York: American Elsevier, 1970. (English trans.)
[40]
(Dem'janov). 1980. On quasidifferentiable functionals. Soviet Math. Dokl. 21, 1:14-7. (English transl.)
1981. On some approaches to nonsmooth optimization pro-
[41]
blems (in Russian). Ekonomika i Matematicheskie Metody 17:1153-74. [42]
C3'
. 1982. Elements of quasidifferentiable calculus (in Russian). In Nonsmooth Problems of the Theory of Control and Optimization, ed. V.F. Dem'yanov, 5-127. Leningrad: Izd. Leningrad Univ.
[43]
.
1983. On quasidifferentiable mappings. Math. Operationsforsch.
Statist. Ser. Optimization 14:3-21. F-"
[44] Dem'yanov, V.F., and Vasil'ev, L.V. Nondifferentiable Optimization. New York: Optimization Software, Inc., Publications, 1986. (English transl.) y''
'D-
[45] Dem'yanov, V.F., and Zabrodin, I.S. 1983. Directional differentiability of a continuous maximum function of quasidifferentiable functions. Working Paper WP-83-58, International Institute for Applied Systems Analysis, Laxenburg, Austria.
[46] Dem'yanov, V.F., Gamidov, S., and Sivelina, T.I. 1983. An algorithm for minimizing a certain class of quasidifferentiable functions. Working Paper WP-83-122, International Institute for Applied Systems Analysis, Laxenburg, Austria.
References
273
[47] Dem'yanov, V.F., Polyakova, L.N., and Rubinov, A.M. 1979. On a genmar
eralization of the concept of a subdifferential (in Russian). Vsesoyuznaya Konferentsiya po Dinamicheskomy Upravleniyu. Tezisy Dokladov (The All-Union Conference on Dynamic Control: Abstracts of Reports), 79-84. Sverdlovsk, USSR. [48] Dubovitskij (Dubovitskii), A.Ya., and Milyutin, A.A. 1965. Extremum problems in the presence of restrictions. U.S.S.R. Comput. Math. and Math. Phys. 5:1-80. (English transl.)
[49] Ekeland, I., and Temam, R. Convex Analysis and Variational Problems. Amsterdam: North-Holland Publ. Co., 1976.
[50] Ekeland, I. 1979. Nonconvex minimization problems. Bull. Amer. Math. Soc. 1:443-74.
[51 ] Ermol'ev, Yu.M. 1966. Methods for solving nonlinear extremal problems (in Russian). Kibernetika 4:1-17. [52] Fenchel, W. 1979. On conjugate convex functions. Canad. J. Math. 1:73-7. [53]
. 1953. Convex Cones, Sets, and Functions. Lecture Notes. Princeton. Princeton University Press.
[54] Fletcher, R., and Watson, G.A. 1980. First- and second-order conditions for a class of nondifferentiable optimization problems. Math. Programming 18:291-307. nom'
[55] Gabasov, R., and Kirillova, F.M. Methods of Optimization. New York: Optimization Software, Inc., Publications. Forthcoming. (English transl.) CD'
[56] Gamkrelidze, R.V., ed. 1981. Progress in Science and Engineering (in Russian). Matem. Analiz 19:127-230.
[57] Gauvin, J. 1979. The generalized gradient of a marginal function in mathematical programming. Math. Oper. Res. 4:458-63.
19:101-19.
000
wQ,,
[58] Gauvin, J., and Dubeau, F. 1982. Differential properties of the marginal functions in mathematical programming. Math. Programming Stud. 0
[59] Goldstein, A.A. 1977. Optimization of Lipschitzian continuous functions. Math. Programming 13:14-22.
References
274
[60]
Gol'stein, E.G. Vypukloe Programmirovanie. Teoreticheskie Osnovy (Convex Programming. Elements of the Theory). Moscow: Nauka, 1970.
[611 Gorokhovik, V.V. 1982. On the quasidifferentiability of real-valued functions. Soviet Math. Dokl. 26:491-4. (English transl.) [62]
.
1984. Quasidifferentiability of real-valued functions and local
extremum conditions. Siberian Math. J. 25, 3:388-95. (English transl.) \C)
[63] Gwinner, J. 1981. Bibliography on nondifferentiable optimization and nonsmooth analysis. J. Comput. Appl. Math. 7:277-85. ODD
[64] Hadwiger, H. Vorlesung uber Inhalt, Oberflache and Isoperimetrie
(Lectures on the Subject of Volume, Surface Area and Isoperimetry). Berlin: Springer-Verlag, 1957.
[65] Halkin, H. 1976. Interior mapping theorem with set-valued derivative. J. Analyse Math. 30:200-7.
1976. Mathematical programming without differentiability. In . Calculus of Variations and Control Theory, ed. D.L. Russell, 279-88.
cod
1.,
[66]
New York: Academic Press. 4-+
[67] Hartman, P. 1959. On functions representable as a difference of convex functions. Pacific J. Math. 9:707-13. [68] Hiriart-Urruty, J.-B.
1978.
Gradient generalises de fonction margi-
nal. SIAM J. Control Optim. 16:381-416. [69]
.
1979. Tangent cones, generalized gradients and mathematical
programming in Banach spaces. Math. Oper. Res. 4:79-97. [70]
.
1979. New concepts in nondifferentiable programming. Bull.
Soc. Math. France Mem. 60:57-85. [711
.
1980. Mean-value theorems in nonsmooth analysis. Numer.
Funct. Anal. Optim. 2:1-30.
[72] Hogan, W. 1971. Directional derivatives for extremal-value functions. Working Paper no. 117, Western Management Science Institute, Los Angeles, California, USA. [73]
1973. Directional derivatives for extremal-value functions . with applications to the completely convex case. Oper. Res. 21:188-209.
References
275
00a
[74] Hormander, L. 1954. Sur la fonction d'appui des ensembles convexes dans un espace localement convex. Ark. Mat. 3:181-6.
[75] Huard, P., ed. 1979. Point-to-set maps and mathematical programming. Math. Programming Stud. 10:1-190. 0
Ana,
[76] Ioffe, A.D. 1979. Necessary and sufficient conditions for a local minimum. 1: A reduction theorem and first-order conditions. SIAM J. Control Optim. 17:245-50.
[77] Ioffe, A.D. 1981. Nonsmooth analysis: differential calculus of nondifferentiable mappings. Trans. Amer. Math. Soc. 266:1-56.
[78] Ioffe, A.D., and Levin, V.L. 1972. Subdifferentials of convex functions. Trans. Moscow Math. Soc. 26:1-72. (English transl.)
[79] Ioffe, A.D., and Tikhomirov, V.M. Theory of Extremal Problems. Amsterdam New York: North-Holland Publ. Co., 1979.
[80] John, F. 1948. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays: Courant Anniversary Volume, ed. K.O. Friedrichs, O.E. Neugebauer, and J.J. Stoker, 187-204. New York: Wiley Interscience.
[81] Kakutani, S. 1941. A generalization of Brouwer's fixed point theorem. Duke Math. J. 8:457-9.
[82] Kantorovich, L.V., and Akilov, G.P. Functional Analysis (in Russian). Moscow: Nauka, 1977.
[83] Karlin, Samuel. Mathematical Methods and Theory in Games, Programming and Economics. Vols. I, II. Reading, Mass.: Addison-Wesley Publ. Co., 1959.
[84] Kough, P.F. 1979. The indefinite quadratic programming problem. Oper. Res. 27:516-33.
[85] Kruger, A.Ya., and Mordukhovich, B.S. 1980. Extremal points and the Euler equation in nonsmooth optimization problems (in Russian). Dokl. Akad. Nauk BSSR 24:684-7.
[86] Kuhn, H.W., and Tucker, A.W. 1951. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, ed. J. Neyman, 481-92. Berkeley Los Angeles, CA: University of California Press.
[87] Kutateladze, S.S. Osnovy Funktsional'nogo Analiza (Foundations of Functional Analysis). Novosibirsk, USSR: Nauka, 1983.
[88] Kutateladze, S.S., and Rubinov, A.M. Dvojstvennost' Minkovskogo i ee Prilozheniya (Minkowski Duality and Its Applications). Novosibirsk, USSR: Nauka, 1976.
[89] Kugushev, E.I. 1973. The maximum principle in optimal control problems for systems with nonsmooth right-hand side (in Russian). Vestnik Moskov. Univ. Ser. I Mat. Mekh. 3:107-13.
[90] Lemarechal, C., and Mifflin, R., eds. Nonsmooth Optimization. New York: Pergamon Press, 1977.
[91] Le Van Hot. 1981. On the differentiability of multivalued mappings. I, II. Comment. Math. Univ. Carolinae 22:267-80; 337-50.
[92] Makarov, V.L., and Rubinov, A.M. Mathematical Theory of Economic Dynamics and Equilibria. Berlin Heidelberg New York: Springer-Verlag, 1977. (English transl.)
[93] Malozemov, V.N. 1981. On sufficient conditions for a local minimax. Vestnik Leningrad. Univ. Math. 9:141-5. (English transl.)
[94] Mangasarian, O.L. Nonlinear Programming. New York: McGraw-Hill, 1969.
[95] Maurer, Helmut. 1979. First-order sensitivity of the optimal-value function in mathematical programming and optimal control. In Mathematical Programming with Data Perturbations 1, ed. Anthony V. Fiacco, 51-67.
Lecture Notes in Pure and Appl. Math. 73. Papers presented to the First Symposium on Mathematical Programming with Data Perturbations,
held on May 24-25, 1979, at the George Washington University. New York: Marcel Dekker, Inc., 1979.
[96] McCormick, Garth P. 1975. Optimality criteria in nonlinear programming. In Nonlinear Programming, ed. Richard W. Cottle and C.E. Lemke. Proc. of Symposium in Appl. Math. of Amer. Math. Soc. and the Soc. for Industr. and Appl. Math., held in New York City, March 23-24, 1975. Vol. 9. Providence, R.I.: Amer. Math. Soc., 1976.
[97] Melzer, D. 1984. Expressibility of two piecewise linear convex functions. Humboldt University, Berlin. Preprint.
[98] Mifflin, R. 1977. Semismooth and semiconvex functions in constrained optimization. SIAM J. Control Optim. 15, 6:959-72.
[99] Mifflin, R. 1982. A modification and an extension of Lemarechal's algorithm for nonsmooth minimization. Math. Programming Stud. 17:77-90.
[100] Minchenko, L.I., and Borisenko, O.F. (O.F. Borisenko and L.I. Minchenko). 1983. On the directional differentiability of a maximum function. U.S.S.R. Comput. Math. and Math. Phys. 23, 3:34-9. (English transl.)
[101] Mirica, S. 1980. A note on the generalized differentiability of mappings. Nonlinear Anal. 4:567-75.
[102] Mordukhovich, B.S. 1980. Metric approximations and necessary optimality conditions for general classes of nonsmooth extremal problems. Soviet Math. Dokl. 22:526-30. (English transl.)
[103] Mordukhovich, B.S. and Kruger, A.Ya. 1976. Necessary optimality conditions in the problem of terminal control with nonfunctional constraints (in Russian). Dokl. Akad. Nauk BSSR 20, 2:1064-7.
[104] Moreau, J.-J. 1963. Fonctionnelles sous-différentiables. C.R. Acad. Sci. Paris Ser. A-B, 257:4117-9.
[105] Neustadt, L.W. Optimization. Princeton: Princeton Univ. Press, 1976.
[106] Nurminskij (Nurminskii), E.A. 1977. Continuity of ε-subgradient mappings. Cybernetics 13, 5:790-1. (English transl.)
[107] Nurminskij, E.A., ed. Progress in Nondifferentiable Optimization. Laxenburg, Austria: International Institute for Applied Systems Analysis, 1982.
[108] Pallaschke, D. 1984. On numerical experiments with a quasidifferentiable optimization algorithm. In Abstracts of the IIASA Workshop on Nondifferentiable Optimization: Motivations and Applications, Sopron, Hungary, 17-22 September 1984, 138-40. Laxenburg, Austria: International Institute for Applied Systems Analysis.
[109] Papavassilopoulos, G. 1981. Algorithms for a class of nondifferentiable problems. J. Optim. Theory Appl. 24:41-82.
[110] Pecherskaya, N.A. 1976. On the directional differentiability of a maximum function subject to linked constraints (in Russian). In Operations Research (Models, Systems, Solutions), ed. Yu.G. Evtushenko, 11-6. Moscow: Computer Center of the Academy of Sciences of the USSR.
[111] Pecherskaya, N.A. 1981. On the differentiability of multivalued mappings (in Russian). Vestnik Leningrad. Univ. Mat. Mekh. Astronom. 7:115-7.
[112] Pecherskaya, N.A. 1982. Differentiability of set-valued mappings (in Russian). In Nonsmooth Problems of the Theory of Control and Optimization, 128-47. Leningrad: Izd. Leningrad. Univ.
[113] Pecherskaya, N.A. 1986. Quasi-differentiable mappings and the differentiability of maximum functions. Math. Programming Stud. 29:145-59.
[114] Penot, J.P. 1978. Calcul sous-différentiel et optimisation. J. Funct. Anal. 27:248-76.
[115] Pevnyj (Pevnyi), A.B. 1971. Differentiation of a Maximin Function. U.S.S.R. Comput. Math. and Math. Phys. 11, 2:253-9. (English transl.)
[116] Pinsker, A.G. 1966. The space of convex sets of a locally convex space (in Russian). In Some Classes of Semiordered Spaces, ed. B.Z. Vulich, 13-8. Leningrad: Izd. Leningrad. Univ.
[117] Polak, E., and Wardi, Y. 1982. Nondifferentiable optimization algorithm for designing control systems having singular value inequalities. Automatica 18:267-83.
[118] Polak, E., Mayne, D.Q., and Wardi, Y. 1983. On the extension of constrained optimization algorithms from differentiable to nondifferentiable problems. SIAM J. Control Optim. 21:179-203.
[119] Polyak, B.T. Introduction to Optimization. New York: Optimization Software, Inc., Publications. Forthcoming. (English transl.)
[120] Polyakova, L.N. 1980. Necessary conditions for an extremum of quasidifferentiable functions. Vestnik Leningrad. Univ. Math. 13 (1981):241-7. (English transl.)
[121] Polyakova, L.N. 1982. On a problem in nonsmooth optimization (in Russian). Cybernetics 2:119-22.
[122] Polyakova, L.N. 1984. On the minimization of the sum of a convex function and a concave function. Collaborative Paper CP-84-28, International Institute for Applied Systems Analysis, Laxenburg, Austria.
[123] Polyakova, L.N. 1984. On the minimization of a quasidifferentiable function subject to equality quasidifferentiable constraints. Collaborative Paper CP-84-27, International Institute for Applied Systems Analysis, Laxenburg, Austria.
[124] Pontryagin, L.S., Boltyanskij (Boltjanskii), V.G., Gamkrelidze, R.V., and Mishchenko, E.F. The Mathematical Theory of Optimal Processes. Chichester: Wiley, 1962.
[125] Pourciau, B.H. 1977. Analysis and optimization of Lipschitz continuous mappings. J. Optim. Theory Appl. 22:311-51.
[126] Pshenichnyj, B.N. 1965. Dual method in extremal problems, I (in Russian). Kibernetika 3:89-95.
[127] Pshenichnyj, B.N. Necessary Conditions for Extremum Problems. New York: Marcel Dekker, 1971. (English transl.)
[128] Pshenichnyj, B.N. Vypuklyj Analiz i Ekstremal'nye Zadachi (Convex Analysis and Extremal Problems). Moscow: Nauka, 1980.
[129] Pshenichnyj, B.N. Metod Linearizatsij (The Method of Linearizations). Moscow: Nauka, 1983.
[130] Pshenichnyj (Pshenichnyi), B.M., and Danilin, Yu.M. Numerical Methods in Extremal Problems. Moscow: Mir, 1978. (English transl.)
[131] Pshenichnyj, B.M., and Khachatryan, R.A. 1982. Equality constraints in nonsmooth optimization problems (in Russian). Ekonomika i Mat. Metody 18:1133-40.
[132] Rådström, H. 1952. An embedding theorem for spaces of convex sets. Proc. Amer. Math. Soc. 3:165-9.
[133] Roberts, A.W., and Varberg, D.E. Convex Functions. New York: Academic Press, 1973.
[134] Robinson, S.M. 1976. First-order conditions for general nonlinear optimization. SIAM J. Appl. Math. 30:597-603.
[135] Rockafellar, R. Tyrrell. 1966. Characterization of the subdifferentials of convex functions. Pacific J. Math. 17:497-510.
[136] Rockafellar, R. Tyrrell. Convex Analysis. Princeton Math. Ser. 28. Princeton, N.J.: Princeton Univ. Press, 1970.
[137] Rockafellar, R. Tyrrell. The Theory of Subgradients and Its Applications to Problems of Optimization. Lecture Notes Series. Montreal: Montreal Univ. Press, 1978.
[138] Rockafellar, R. Tyrrell. 1979. Clarke's tangent cones and the boundaries of closed sets in R^n. Nonlinear Anal.: Theory, Methods and Appl. 3:145-54.
[139] Rockafellar, R. Tyrrell. 1982. Favorable classes of Lipschitz continuous functions in subgradient optimization. In Progress in Nondifferentiable Optimization, ed. E. Nurminskii. Laxenburg, Austria: International Institute for Applied Systems Analysis.
[140] Rockafellar, R. Tyrrell. Generalized subgradients in mathematical programming. In Mathematical Programming. The State-of-the-Art, 368-90. Berlin Heidelberg New York Tokyo: Springer-Verlag, 1983.
[141] Rubinov, A.M. 1977. Sublinear operators and their applications. Russian Math. Surveys 32:115-75. (English transl.)
[142] Rubinov, A.M. 1980. Magistral sets in discrete, disperse dynamical systems. Siberian Math. J. 21, 4:583-90. (English transl.)
[143] Rubinov, A.M. Superlinear Multivalued Mappings and Their Applications to Problems of Mathematical Economics (in Russian). Leningrad: Nauka, 1980.
[144] Rubinov, A.M. 1984. Continuous (ε,μ)-subdifferentials of upper semicontinuously differentiable functions. In Abstracts of the IIASA Workshop on Nondifferentiable Optimization: Motivations and Applications, Sopron, Hungary, 17-22 September 1984, 148-52. Laxenburg, Austria: International Institute for Applied Systems Analysis.
[145] Rubinov, A.M., and Yagubov, A.A. 1984. The space of star-shaped sets and its applications in nonsmooth optimization. Collaborative Paper CP-84-28, International Institute for Applied Systems Analysis, Laxenburg, Austria.
[146] Shapiro, A. 1984. Quasidifferential calculus and first-order optimality conditions in nonsmooth optimization. SIAM J. Control Optim. 23:610-7.
[147] Shor, N.Z. 1972. A class of almost-differentiable functions and a minimization method for this class. Cybernetics 8, 4:599-606. (English transl.)
[148] Shor, N.Z. Minimization Methods for Non-differentiable Functions. Berlin Heidelberg New York Tokyo: Springer-Verlag, 1985. (English transl.)
[149] Sivelina, T.I. 1983. Minimizing a certain class of quasidifferentiable functions (in Russian). Vestnik Leningrad. Univ. Mat. Mekh. Astronom. 7:103-5.
[150] Strodiot, J.J., and Hien Nguyen, V. 1979. Caractérisation des solutions optimales en programmation non différentiable. C. R. Acad. Sci. Paris Ser. A-B, 288:1075-8.
[151] Tagawa, S. 1979. Optimierung mit mengenwertigen Abbildungen (Optimization with Set-Valued Mappings). Oper. Res. Verfahren 31:619-29.
[152] Thibault, L. 1982. Subdifferentials of nonconvex vector-valued functions. J. Math. Anal. Appl. 86:319-44.
[153] Tuy, H. 1983. Global minimization of a difference of two convex functions. Institute of Mathematics of Hanoi, Hanoi, Vietnam. Preprint.
[154] Tuy, H. 1984. Global minimization of a difference of two convex functions. In Selected Topics in Operations Research and Mathematical Economics, ed. G. Hammer and D. Pallaschke, 98-118. Lecture Notes in Economics and Mathematical Systems, 226. Berlin Heidelberg New York Tokyo: Springer-Verlag.
[155] Tyurin, Yu.N. 1965. A mathematical formulation of a simplified model of industrial planning (in Russian). Ekonomika i Matematicheskie Metody 1:391-409.
[156] Valadier, M. 1969. Sous-différentiels d'une borne supérieure et d'une somme continue de fonctions convexes. C. R. Acad. Sci. Paris Ser. A-B, 268:39-49.
[157] Vial, J.-P. 1983. Strong and weak convexity of sets and functions. Math. Oper. Res. 8:231-59.
[158] Vinogradova, T.K., Dem'yanov, V.F., and Pevnyj, A.B. 1973. On the directional differentiability of functions of maximum and minimum (in Russian). Optimizatsiya 10, 27:17-21.
[159] Vulich, B.Z. Special Problems of the Geometry of Cones in Normed Spaces (in Russian). Kalinin: Izd. Kalinin. Univ., 1978.
[160] Warga, J. Optimal Control of Differential and Functional Equations. New York: Academic Press, 1972.
[161] Warga, J. 1975. Necessary conditions without differentiability assumptions in optimal control. J. Differential Equations 18:41-62.
[162] Warga, J. 1975. Derivative containers, inverse functions and controllability. In Calculus of Variations and Control Theory (Symposium held at the University of Wisconsin-Madison, Wisconsin, USA), ed. D.L. Russell, 13-45. New York: Academic Press, 1976.
[163] Warga, J. 1978. An implicit function theorem without differentiability. Proc. Amer. Math. Soc. 69:65-8.
[164] Wolfe, P., and Balinski, M.S. (M.L. Balinski and P. Wolfe), eds. Nondifferentiable Optimization. Math. Programming Stud. 3. Amsterdam: North-Holland Publ. Co., 1975.
[165] Womersley, R.S. 1982. Optimality conditions for piecewise smooth functions. Math. Programming Stud. 17:13-27.
[166] Yomdin, Y. 1983. On functions representable as a supremum of a family of smooth functions. SIAM J. Math. Anal. 14:239-46.
[167] Zalgaller, V.A. 1963. On the representation of functions of two variables as a difference of convex functions (in Russian). Vestnik Leningrad. Univ. Mat. Mekh. Astronom. 1:44-5.
INDEX
Approximation
  first-order -- ix
  lower concave -- 104
  exhaustive families of l.c.a's 106
  upper convex -- 104
  exhaustive families of u.c.a's 106
Bijection 238
Bound
  lower -- 73
  upper -- 73
  exact lower (upper) -- 73
Concave minorant 105
Cone 3
  approximating a set 177
  conjugate to 3
  closed convex -- 3
  convex -- 4
  of positive elements 73
  tangent to a convex set 180
Contraction of a function to a cone 62
Convex majorant 104
Derivative (see Function, derivative of)
Difference of pairs 67
Direction 1
  admissible -- 176
  tangent -- 176
Euclidean norm 1
Extremum
  necessary conditions for a constrained -- 193
  necessary conditions for an -- 231
Function
  concave -- 37
  strictly concave -- 37
  convex -- 37
  strictly convex -- 37
  convex positively homogeneous -- 46
  continuous (ε,μ)-subdifferential of a -- 82
  directionally differentiable -- 19
  derivative of a --
  Clarke derivative of a -- 92
  Clarke lower derivative of a -- 103
  Clarke upper derivative of a -- 92, 103
  conditional derivative of a -- 43
  contingent derivative of a -- 91
  Dini derivative of a -- 91
  directional derivative of a -- ix, 19
  lower directional derivative of a -- 92
  upper directional derivative of a -- 91
  ε-derivative of a -- 41
  conditional ε-derivative of a -- 43
  differentiable --
  directionally differentiable -- 19
  Frechet differentiable -- 25
  Gateaux differentiable -- 25
  ε-differential of a -- 41
  gradient of a -- 25
  implicit -- 155
  increasing -- 131
  lower semicontinuous -- 12
  quasidifferentiable -- x, 112
  quasidifferential of a -- x, 114
  steepest descent direction of a -- 41
  ε-steepest descent direction of a -- 42
  subdifferential of a -- 38, 114
  Clarke subdifferential of a -- 93
  ε-subdifferential of a -- 41
  conditional subdifferential of a -- 43
  conditional ε-subdifferential of a -- 43
Function
  subdifferentiable -- 79, 123
  subgradient of a -- 38
  ε-subgradient of a -- 41
  conditional ε-subgradient of a -- 43
  sublinear -- 46
  superdifferentiable -- 123
  superdifferential of a -- 114
  superlinear -- 63
  support -- 50-51, 236
  uniformly differentiable -- 27
  uniformly directionally differentiable -- 27
  uniformly quasidifferentiable -- 132
  upper semicontinuous -- 10
  upper semicontinuously directionally differentiable --
General position, of a pair 182
Generalized subdifferential 91
Infimum 73
Lattice 73
Mapping
  bounded -- 8
  closed -- 8
  composition of --s 6
  directionally differentiable -- 30
  uniformly directionally differentiable -- 30
  domain of a -- 6
  graph of a -- 6
  Hausdorff continuous -- 10
  inverse of a -- 6
  multivalued -- 6
  point-to-set -- 6
  quasidifferentiable -- 128
  subdifferential -- 38
  ε-subdifferential -- 41
  sublinear -- 130
  superadditive -- 130
Mapping
  superlinear -- 130
  upper semicontinuous -- 8
  weakly measurable -- 119
Matrix
  operationally convex -- 129
Metric
  Hausdorff -- 9-10
Minkowski duality 51, 236
Necessary condition for a maximum 194-6
Necessary condition for a minimum 194-5
Operator
  sublinear -- 130
  superlinear -- 130
Point
  equilibrium -- 235
  ε-stationary -- of a convex function 44
  inf-stationary -- 188
  local ε-maximum -- 261
  local ε-minimum -- 260
  maximin saddle -- 229, 230
  isolated maximin saddle -- 229
  isolated locally maximin saddle -- 229
  locally maximin saddle -- 229
  minimax saddle -- 229, 230
  isolated minimax saddle -- 229
  isolated locally minimax saddle -- 229
  locally minimax saddle -- 229
  sup-stationary -- 188
Polar
  properties of -- 250-251
Problem
  directional implicit function -- 167
Product 2
  scalar -- 2
Quasidifferential calculus
  main formulas of -- 114
Semideviation of a set A from the set B 203
Set
  compact -- 2, 3
  convex -- 2
  gauge of the -- 237
  star-shaped -- 236, 237
  inverse product of a -- 240
  inverse sum of a -- 240
  strictly -- 2
  convex conical hull of a -- 4
  convex hull of -- 2
  convex hull of a compact -- 3
  empty -- 2
  ε-neighborhood of a -- 8
  lower semicontinuous -- 9
  Kakutani continuous -- 9
  ordered -- 73
  stable with respect to a cone 58
  lower bounded -- 73
  upper bounded -- 73
Space
  ordered linear -- 73
Steepest ascent
  direction of -- 203, 206
  rate of -- 203, 206
Steepest descent
  direction of -- 203, 206
  rate of -- 203, 206
Subset
  thick -- 146
Sum 2
Sum of classes 67
Supremum 73
System
  quasilinear -- 172
Theorems
  composition -- 133
  directional implicit function -- 172
  on continuous (ε,μ)-subdifferential of a composition 86
  on ε-quasidifferentiability of a composition 256
  on separation 3
  Stone-Weierstrass -- 255
  von Strassen -- 120
Vector 1
  convex combination of --s 3
  feasible -- 176
  linearly dependent -- 1
  linearly independent -- 1
TRANSLITERATION TABLE

а  a        л  l        ц  ts
б  b        м  m        ч  ch
в  v        н  n        ш  sh
г  g        о  o        щ  shch
д  d        п  p        ъ  ''
е  e        р  r        ы  y
ж  zh       с  s        ь  '
з  z        т  t        э  eh
и  i        у  u        ю  yu
й  j        ф  f        я  ya
к  k        х  kh