POINT-TO-SET MAPS AND MATHEMATICAL PROGRAMMING
MATHEMATICAL PROGRAMMING STUDIES
Editor-in-Chief
M.L. BALINSKI, Yale University, New Haven, CT, U.S.A.
Senior Editors
E.M.L. BEALE, Scientific Control Systems, Ltd., London, Great Britain
GEORGE B. DANTZIG, Stanford University, Stanford, CA, U.S.A.
L. KANTOROVICH, National Academy of Sciences, Moscow, U.S.S.R.
TJALLING C. KOOPMANS, Yale University, New Haven, CT, U.S.A.
A.W. TUCKER, Princeton University, Princeton, NJ, U.S.A.
PHILIP WOLFE, IBM Research, Yorktown Heights, NY, U.S.A.
Associate Editors
VACLAV CHVATAL, Stanford University, Stanford, CA, U.S.A.
RICHARD W. COTTLE, Stanford University, Stanford, CA, U.S.A.
H.P. CROWDER, IBM Research, Yorktown Heights, NY, U.S.A.
J.E. DENNIS, Jr., Cornell University, Ithaca, NY, U.S.A.
B. CURTIS EAVES, Stanford University, Stanford, CA, U.S.A.
R. FLETCHER, The University, Dundee, Scotland
B. KORTE, Universität Bonn, Bonn, West Germany
MASAO IRI, University of Tokyo, Tokyo, Japan
C. LEMARECHAL, IRIA-Laboria, Le Chesnay, Yvelines, France
C.E. LEMKE, Rensselaer Polytechnic Institute, Troy, NY, U.S.A.
GEORGE L. NEMHAUSER, Cornell University, Ithaca, NY, U.S.A.
WERNER OETTLI, Universität Mannheim, Mannheim, West Germany
MANFRED W. PADBERG, New York University, New York, U.S.A.
M.J.D. POWELL, University of Cambridge, Cambridge, England
JEREMY F. SHAPIRO, Massachusetts Institute of Technology, Cambridge, MA, U.S.A.
L.S. SHAPLEY, The RAND Corporation, Santa Monica, CA, U.S.A.
K. SPIELBERG, IBM Scientific Computing, White Plains, NY, U.S.A.
HOANG TUY, Institute of Mathematics, Hanoi, Socialist Republic of Vietnam
D.W. WALKUP, Washington University, Saint Louis, MO, U.S.A.
ROGER WETS, University of Kentucky, Lexington, KY, U.S.A.
C. WITZGALL, National Bureau of Standards, Washington, DC, U.S.A.
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM-NEW YORK-OXFORD
MATHEMATICAL PROGRAMMING STUDY
10
Point-to-Set Maps and Mathematical Programming
Edited by P. HUARD
A. Auslender J.M. Borwein J.P. Delahaye J. Denel J.Ch. Fiorot E.G. Gol'shtein P. Huard D. Klatte R. Klessig
B. Kummer G.G.L. Meyer E. Polak S.M. Robinson A. Ruszczynski R. Saigal J. Szymanowski S. Tishyadhigama N.V. Tret'yakov
1979
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM-NEW YORK-OXFORD
© THE MATHEMATICAL PROGRAMMING SOCIETY - 1979 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
This book is also available in journal format on subscription.
North-Holland ISBN for this series: 0 7204 8300 X for this volume: 0444 85243 3
Published by: NORTH-HOLLAND PUBLISHING COMPANY, AMSTERDAM - NEW YORK - OXFORD
Sole distributors for the U.S.A. and Canada: Elsevier North-Holland, Inc. 52 Vanderbilt Avenue New York, N.Y. 10017
Library of Congress Cataloging in Publication Data
Main entry under title:
Point-to-set maps and mathematical programming.
(Mathematical programming study ; no. 10)
1. Programming (Mathematics)--Addresses, essays, lectures. 2. Mappings (Mathematics)--Addresses, essays, lectures. I. Huard, Pierre. II. Auslender, Alfred. III. Series.
QA402.5.P57  519.7  78-23304
ISBN 0-444-85243-3
PRINTED IN THE NETHERLANDS
PREFACE

The theory of point-to-set maps, and more exactly the notions of continuity connected with it, forms a most interesting mathematical tool for the study of algorithms. It has come into increasing use during the last twelve years in papers on optimization (convergence of algorithms, synthesis of existing methods, stability of parametrized mathematical programs, etc.). The object of this monograph is to give a sample of this literature, and to endeavour to take stock of the question.

The monograph includes a bibliographic survey drafted by the editor in collaboration with several colleagues. A list of references going back to the beginning of the century is given in an annex, as well as a short communication by Delahaye and Denel concerning the equivalences between the various notions of continuity of the point-to-set maps used in classical papers. The other articles cover a great variety of subjects, which can however be regrouped under a few headings: stability, optimality-duality, algorithms and fixed points.

Three papers deal with the stability of nonlinear programming, each with a different subject of concern. Paper 1 by Auslender studies the directional derivatives of the optimal value of a program whose right-hand members are parameters, in the case of nondifferentiable constraints. Paper 7 by Klatte lays down sufficient conditions for the lower semicontinuity of the set of optimal solutions in the case of a program where the objective function and the domain are defined by parametrized quadratic convex functions. It will be recalled that this semicontinuity is generally not assured, contrary to upper semicontinuity. Lastly, Paper 8 by Kummer deals with the same problem, but for the set of optimal solutions of the dual program of a linearly constrained program where only the convex objective function is parametrized: it is consequently a matter of continuity of the set of multipliers of the primal.
Stability is also studied in Paper 10 by Robinson, but in a broader setting: the subject is the stability of the set of solutions of multivalued equations, i.e., where the sign of equality is replaced by that of belonging to a set. This leads to applications to complementarity problems and to quadratic programming, where the constraints are cone-constraints.

Extensions of the theory of duality are given in two papers. Paper 2 by Borwein gives a generalization of the Farkas lemma, and illustrates the possibilities of simplification afforded by point-to-set maps. Paper 5 by Gol'shtein and Tret'yakov studies the conservation of duality properties for generalized
Lagrangian functions and the applications resulting therefrom for the convergence of methods of subgradients for determining saddle-points.

Seven other papers deal mainly with algorithms. Papers 6 by Huard and 13 by Tishyadhigama, Polak and Klessig extend the applications of Zangwill's general algorithm by weakening its hypotheses. In Paper 4 by Fiorot and Huard are studied the possibilities of combining the iterations of different algorithms in a cyclic way (composition of ranges of point-to-set maps) or arbitrarily (union of the ranges), thus generalizing the conventional relaxation methods. The conventional continuity properties of point-to-set maps are hard to preserve in these operations, and new notions of continuity are proposed by Denel in Paper 3. This author is thus able to construct general algorithms having a great number of applications. Also in the field of composition of algorithms, Paper 12 by Szymanowski and Ruszczynski studies the convergence of two-level algorithms and especially the influence of the approximation of calculations in the sub-optimizations.

All these algorithms lead to obtaining fixed points of point-to-set maps. Paper 11 by Saigal studies at theoretic and practical levels a method for obtaining fixed points by simplicial decomposition of space, and piecewise linear approximation of the functions. Lastly, Paper 9 by Meyer studies the properties of cluster points of a sequence of points generated by an algorithm, account taken of the continuity properties of the point-to-set map that defines the algorithm.

P. Huard
CONTENTS

Preface v
Contents vii

Background to point-to-set maps in mathematical programming 1
Annex 1: The continuities of the point-to-set maps, definitions and equivalences 8
Annex 2: Relaxation methods 13
General reference list 14
(1) Differentiable stability in non convex and non differentiable programming, A. Auslender 29
(2) A multivalued approach to the Farkas lemma, J.M. Borwein 42
(3) Extensions of the continuity of point-to-set maps: applications to fixed point algorithms, J. Denel 48
(4) Composition and union of general algorithms of optimization, J.Ch. Fiorot and P. Huard 69
(5) Modified Lagrangians in convex programming and their generalizations, E.G. Gol'shtein and N.V. Tret'yakov 86
(6) Extensions of Zangwill's theorem, P. Huard 98
(7) On the lower semicontinuity of optimal sets in convex parametric optimization, D. Klatte 104
(8) A note on the continuity of the solution set of special dual optimization problems, B. Kummer 110
(9) Asymptotic properties of sequences iteratively generated by point-to-set maps, G.G.L. Meyer 115
(10) Generalized equations and their solutions, Part I: Basic theory, S.M. Robinson 128
(11) The fixed point approach to nonlinear programming, R. Saigal 142
(12) Convergence analysis for two-level algorithms of mathematical programming, J. Szymanowski and A. Ruszczynski 158
(13) A comparative study of several general convergence conditions for algorithms modeled by point-to-set maps, S. Tishyadhigama, E. Polak and R. Klessig 172
Mathematical Programming Study 10 (1979) 1-7. North-Holland Publishing Company
BACKGROUND TO POINT-TO-SET MAPS IN MATHEMATICAL PROGRAMMING
1. Aim of the book
The articles of this monograph deal with the use of point-to-set maps in the theory of mathematical programming or, more generally, of optimization. We should recall that a point-to-set map is, generally speaking, a function defined on a space X and whose ranges are subsets of a space Y. This is expressed symbolically F : X → 𝒫(Y) or again F : X → 2^Y.

Why a monograph on such a subject? The use is recent, and to our knowledge the first articles of this kind are those by Cesari (1966) and by Rosen (1966). In these papers the authors use continuity properties of point-to-set maps in optimal control problems (existence or convergence theorems). With great modesty, apparently, for the authors do not stress the originality of their approaches. We might also mention, in the book by Berge (1959, and 1966 for the second edition), the expression of a theorem known as the maximum value theorem (Ch. 6), regarding the stability or continuity properties of the optimal value of a parametrized program, as well as of the set of optimal solutions. But the use of point-to-set maps only really got into its stride after the publication of Zangwill's book, in 1969. This book, the theme of which bears on the representation and study of iterative methods by means of general schemas, makes most stimulating reading. One important idea set forth in this book is the "macroscopic" analysis and synthesis of algorithms, in the study of their convergence, thanks to point-to-set maps.

Notwithstanding the obvious advantage of this mathematical tool in the representation and study of optimization algorithms, its use by specialists has not spread rapidly. There are now hardly more than a few dozen users. One of the reasons for this slow spread is probably that point-to-set maps are hardly ever studied in university courses in mathematics.
Hence the absence of a common language, and especially the use of notions that are neighbouring but nevertheless vary from author to author: the notions of continuity may be taken as the main example of this. The authors are thus practically compelled, at the beginning of each article, to define the notions they are using, thus weighing down the presentation of results. This is also the case for the basic theorems applied: few readers know, for instance, the theorems on the stability of the continuity properties in the composition of two point-to-set maps. On the other hand, can one imagine a "classical" author reminding readers of a similar
theorem on the composition of two "univoque" functions, or quoting a reference thereto? With this monograph we hope to point out the advantage of using point-to-set maps for optimization, and to make this mathematical tool better known, a tool of great use which often simplifies theoretical schemas. Before justifying the above statement, we would remind readers how the notion of a point-to-set map appeared in the literature of mathematics. By reading the references quoted in the historical outline that follows, we note that these studies deal essentially with the notions of continuity of these functions. Most authors propose and study two types of continuity, originally called upper semi-continuity and lower semi-continuity. These names cover, as we have indicated above, notions that differ slightly from author to author. A comparative study of these notions is given in Annex 1.
2. Historical outline
The notion of point-to-set mapping, and more exactly the notions of continuity connected with it, made their appearance long before mathematical programming, in the study of the limit of a sequence of sets. We can cite a theorem on the upper limit of a family of subsets of the interval [0, 1] depending on a real variable, presented by Painlevé in his course at the École Normale Supérieure de Paris, in 1902. This theorem was taken up again and developed by Zoretti in 1905. Then, spread out over about 20 years, several articles came out on the limits of sets or on that type of functions valued in 𝒫(R): Janiszewski (1912), Hahn (1921), Moore (1924-1925), Vasilesco (1925), Wilson (1926), Hill (1927). But it was during the thirties that studies were published on more general point-to-set maps as well as on the notions of continuity related thereto: Bouligand (1931, 1932, 1933), Kuratowski (1932), Hahn (1932), Blanc (1933). Among these more modern studies, published independently of one another (except for Blanc, who refers to Bouligand, Vasilesco and Kuratowski), the work of Hahn stands out. This most complete exposition makes use of the modern topological viewpoint. He drew up definitions of continuity and many results, often made use of subsequently, but very rarely quoted, despite a second edition that came out in 1948. The latter date appears to correspond, after a long interruption, to a revival in publications on the continuity properties of point-to-set maps: Brisac (1947), Choquet (1947, 1948), Fort (1949), these studies dealing with topological spaces. Then Michael (1951) studied the construction of topologies on the space of the subsets. More recently, we should mention Cesari (1966, 1968, 1970), Lasota and Olech (1968), Ky Fan (1970), Billera (1971), etc. This list does not claim to be exhaustive, especially for the latter years. Increasing numbers of theoretical articles are being published on point-to-set maps, without direct relationship to
optimization: we are not including them in our reference list, except for some books of a general nature. We close, however, with a reminder respecting some known works dealing with and making use of these notions of continuity properties: Berge (1957), Debreu (1959), Kuratowski (1961) and, already referred to, Berge (1959 and 1966), Zangwill (1969).

3. Parametrization in mathematical programming

The notion of point-to-set mapping appears quite naturally in mathematical programming during the study of the stability of the optimal value of a program or of the set of optimal solutions, when the problem data depend on a parameter: the domain and the set of optimal solutions appear as point-to-set maps of this parameter. This problem of parametrization may itself be the outcome of solving a program in two steps. The variables being denoted by x and y, we fix y and optimize with respect to x. The problem thus reduced is a problem parametrized by y, and its optimal value is a function of y, which has to be improved by modifying y. Many articles have been written (several dozen) dealing closely or not with these questions, and they are marked with a cross in column 1 of the reference list. Among the first published are those by Berge (1966), Rosen (1966), Rockafellar (1967 and 1969), Dantzig, Folkman and Shapiro (1967), Dinkelbach (1969), Zangwill (1969).

4. General algorithms

Another fruitful field of application of point-to-set maps to mathematical programming is the representation of iterative solving methods or algorithms. Most of these autonomous algorithms can be simply defined by the recurrence relation x_{i+1} ∈ F(x_i), where F is a suitably chosen point-to-set map, possibly with a termination rule. The determination of a point of F(x), x being given, then represents the set of calculations to be made at each iteration. The definition of F may be somewhat complex, and generally appears as the composition of several point-to-set maps.
For instance, in order to represent the maximization without constraints of a function f : R^n → R, presumed to be continuously differentiable, by means of the classical algorithm known as the "ascent method", we can first consider the point-to-set maps D : R^n × R^n → 𝒫(R^n) and M : R^n × R^n → 𝒫(R^n), giving to a point x and to a direction z the respective values

D(x, z) = {y ∈ R^n | y = x + θz, θ ≥ 0},

i.e. the half-line with extremity x and direction z, and

M(x, z) = {y ∈ D(x, z) | f(y) ≥ f(t) ∀t ∈ D(x, z)},

i.e. the set of optimal solutions on the half-line D(x, z). The choice of the
direction z, for a given x, offers some degree of freedom and can be defined by z ∈ Δ(x), with Δ : R^n → 𝒫(R^n) a point-to-set map such that

Δ(x) = {z ∈ R^n | z · ∇f(x) ≥ α ‖∇f(x)‖ ‖z‖}
where α is a positive constant. Finally, writing Δ'(x) = {x} × Δ(x), the point-to-set map F, defining the ascent method algorithm, appears as the composition F = M ∘ Δ', which means that we have, for any x, F(x) = ⋃_{(x,z) ∈ Δ'(x)} M(x, z). Other examples can lead to compositions of a higher order.

To return to the general case. If the algorithm properly plays its part, it must generate, from an initial approximate solution x_0, a sequence of points converging towards an optimal solution of the problem posed (or any accumulation point of this sequence must be an optimal solution, or more modestly a stationary point). The justification of this convergence generally consists of two steps, the first of which amounts to showing that the limit point (respectively any accumulation point) of the sequence is a fixed point of F, i.e. satisfies x ∈ F(x). The second step amounts to showing that this fixed point condition is a sufficient condition of optimality (or a necessary condition of optimality, in which case the algorithm yields only stationary points). In what follows, the word convergence will thus take on a very broad meaning, for the sake of simplification.

In such an approach to the convergence of algorithms, the demonstrations clearly show the respective parts played by the hypotheses and especially by the continuity properties of F. These demonstrations are simple in outline, easily allowing partial changes (possibly prompted by plain common sense) to be made in any given special method, while ensuring that the theoretical framework that guarantees convergence is not overstepped. For example, the introduction of an approximation ε in the calculations of an iteration (i) does not alter the schema of the demonstration, where F becomes a function of (x, ε) instead of x. If (x, ε) is an accumulation value of the sequence of the pairs (x_i, ε_i), then x ∈ F(x, ε) is obtained.
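The ascent-method construction described above can be made concrete with a small numerical sketch. This is our illustration, not part of the original text: the objective f, its gradient, the choice z = ∇f(x) in Δ(x) (which satisfies the angle condition with α = 1), and the crude sampling line search standing in for the exact map M are all illustrative assumptions.

```python
def f(x):
    # Objective to maximize (illustrative): f(x) = -(x1^2 + x2^2).
    return -(x[0] ** 2 + x[1] ** 2)

def grad_f(x):
    # Gradient of f.
    return (-2 * x[0], -2 * x[1])

def delta(x):
    # Delta(x): one admissible ascent direction; z = grad f(x) satisfies
    # z . grad f(x) >= alpha ||grad f(x)|| ||z|| with alpha = 1.
    return grad_f(x)

def M(x, z, step=1e-3, n_steps=4000):
    # M(x, z): an approximate maximizer of f on the half-line
    # D(x, z) = { x + theta*z : theta >= 0 }, found by crude sampling.
    best, best_val = x, f(x)
    for k in range(1, n_steps + 1):
        y = (x[0] + k * step * z[0], x[1] + k * step * z[1])
        if f(y) > best_val:
            best, best_val = y, f(y)
    return best

def ascent(x, iterations=20):
    # The recurrence x_{i+1} in F(x_i), with F the composition M o Delta'.
    for _ in range(iterations):
        x = M(x, delta(x))
    return x

x_star = ascent((3.0, -2.0))   # approaches the maximizer (0, 0)
```

Any fixed point x ∈ F(x) of this scheme has ∇f(x) = 0, as the text explains.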
A recurrence relation forcing ε = 0 in the limit is, for instance, (x_{i+1}, ε_{i+1}) ∈ F(x_i, ε_i) × [0, ½ε_i]. To make things clear, in the example of the ascent method algorithm referred to above, the fixed point relation x ∈ F(x, ε) means that there exists z ∈ Δ(x) such that f(x) ≥ f(t) − ε ∀t ∈ D(x, z), which, if ε = 0 and account is taken of the definition of the function Δ, leads to ∇f(x) = 0.

This new approach to algorithms quite naturally lends itself to a synthesis of optimization methods. After 1960, the nonlinear programming methods proposed in mathematical literature showed considerable development. Without prejudice to the practical value of these methods, many of them show but slight variations, unjustified from the theoretical standpoint. A natural need for classification has become evident, both for characterization of the methods and for unifying the convergence demonstrations. This is why general methods have appeared in the literature on the subject, providing a minimum of precise data about the hypotheses necessary for their theoretical justification, and affording a wide
choice for possible particularizations, thus permitting a rediscovery of many conventional methods. A forerunner in general algorithms is undoubtedly Zoutendijk (1960) who, while proposing well-defined special methods, demonstrated their convergence by means of theorems valid for an entire category of methods. His well-known theory of "feasible directions" did not, however, make use of point-to-set mapping. The same is true of some general algorithms proposed during the decade that followed: Levitin and Polyak (1966), Huard (1967), Fiacco (1967), etc. Then came the publication of Zangwill's book (1969), in which is presented, with the aid of point-to-set mapping, a general algorithm that goes beyond mathematical programming, and makes it possible to determine a point of a privileged subset P in a given set E. A point-to-set map F : E → 𝒫(E) is used iteratively following the rule:
x_{i+1} ∈ F(x_i)  if x_i ∉ P,
x_{i+1} = x_i     otherwise.
On the basis of some general hypotheses, any accumulation point of the infinite sequence thus generated belongs to P. One important application in mathematical programming is to take as P the set of optimal (or stationary) solutions of a program. Zangwill's book provided an incentive for work on a synthesis covering nonlinear programming methods. His general algorithm was extended by weakening the hypotheses: R.R. Meyer (1968, 1970, 1976(a), 1976(b)), Polak (1970, 1971 and this volume), Huard (1972, and this volume), Dubois (1973), G.G.L. Meyer (1977). At the same time there was a growing concern for the presentation of conventional methods in the most general form. Among the articles most "committed", we should mention R.R. Meyer (1970, 1976(b), 1977(a)), G.G.L. Meyer and Polak (1971), Huard (1972, 1975), G.G.L. Meyer (1974, 1975) and Auslender and Dang Tran Dac (1976). We should not overlook the advanced article by an anonymous author (see Anonymous (1972)) which, if it amused all its readers, discouraged no-one.
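The iteration rule quoted above can be sketched in a few lines. This is a minimal sketch of ours, not Zangwill's own formulation: the function names, the representation of F(x) by a single chosen point of the image set, and the toy instance are all illustrative assumptions.

```python
def zangwill_iterate(x0, F, in_P, max_iter=1000):
    # Iterate x_{i+1} in F(x_i) if x_i is not in P, x_{i+1} = x_i otherwise.
    # F(x) returns one chosen point of the image set F(x);
    # in_P(x) tests membership in the privileged subset P.
    x = x0
    for _ in range(max_iter):
        if in_P(x):
            return x          # from here on, x_{i+1} = x_i
        x = F(x)              # choose some x_{i+1} in F(x_i)
    return x

# Toy instance: E = R, P = {x : |x| <= 1e-6}, F(x) = {x/2}.
result = zangwill_iterate(1.0, lambda x: x / 2, lambda x: abs(x) <= 1e-6)
```

Under Zangwill's hypotheses, any accumulation point of the sequence so generated belongs to P; in the toy instance the iterates halve until they land in P.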
5. Macroscopic study of the algorithms

The regrouping of a family of special methods in a single general algorithm reveals a common, simpler schema, which makes it easier to distinguish between and separate the various basic operations forming an iteration. Hence the natural idea of considering the structure of an algorithm "macroscopically" in order to be able to alter some parts without touching the others. It is thus possible to consider setting up a new method "by blocks", taking these blocks from various existing methods. Or one can use, successively, complete iterations of different algorithms. Here again point-to-set mapping lends itself to such construction. Let us consider, for example, several maps F_j : X → 𝒫(X), j = 1, 2, ..., p,
such that for any j, the relation x_{i+1} ∈ F_j(x_i) represents the current iteration of an algorithm (j). If we use these relations each in turn, in the natural order j = 1, 2, ..., p, in cyclical form, by regrouping p iterations in succession we implicitly build up an algorithm defined by the map F = F_p ∘ F_{p−1} ∘ ⋯ ∘ F_1. Other types of compositions can be envisaged. The problem set is to establish under what conditions any accumulation point obtained by the composed algorithm is a fixed point for each of the maps F_j. This latter property often represents an optimality condition for some decomposable problems. The most classical example of application of this procedure is relaxation. For instance, if a point (x, y) of R^2 maximizes a differentiable concave function in both directions of coordinates, it then maximizes this function in the whole space. And this point can be obtained by an infinite sequence of sub-optimizations, effected alternately with respect to x, with y fixed, and with respect to y, with x fixed. These relaxation procedures are old, and were developed independently of the use of point-to-set mapping. A short reference list regarding this work, which made use solely of sub-optimizations on lines of R^n or possibly on affine varieties, is given in an annex. It is only recently that the use of point-to-set mapping has extended relaxation to subsets of any kind, at the same time simplifying demonstrations by means of the idea of composing algorithms. This notion is the basic idea of Zangwill's book (1969). It has been taken over and developed by different authors, e.g. R.R. Meyer (1976(b)), G.G.L. Meyer (1975(a) and 1975(b)), Fiorot and Huard (1974 and this volume).
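The relaxation example above can be sketched as a composition F = F2 ∘ F1 of two coordinate-wise maximization maps. This sketch is ours, not the text's: the particular concave function f(x, y) = −(x² + xy + y²) + x + y and its closed-form coordinate maximizers are illustrative assumptions.

```python
def F1(p):
    # Maximize f over x with y fixed: df/dx = -2x - y + 1 = 0.
    x, y = p
    return ((1 - y) / 2, y)

def F2(p):
    # Maximize f over y with x fixed: df/dy = -x - 2y + 1 = 0.
    x, y = p
    return (x, (1 - x) / 2)

def relax(p, cycles=60):
    # One cycle applies F1 then F2; the composed map is F = F2 o F1.
    for _ in range(cycles):
        p = F2(F1(p))
    return p

p_star = relax((5.0, -3.0))   # converges to the global maximizer (1/3, 1/3)
```

The limit point is a fixed point of both F1 and F2, hence (for this concave function) the global maximizer, exactly as the text asserts for relaxation in R².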
6. Extending the classical notions of continuity

Study of general algorithms is as yet but in its initial stages, and the results obtained do not always permit a satisfactory description of certain special methods. Obtaining a fixed point by means of an iterative algorithm rests largely on certain continuity properties of the associated point-to-set map. In building up a complex algorithm, making use of operations on the ranges of various point-to-set maps, such as intersection, union, optimization, etc., the continuity properties of these maps are not always preserved. It is obvious that the classical notions of continuity, originally introduced independently of optimization methods, are not always well adapted to recent studies. In particular, Zangwill's general algorithm, in its classical form, is based on hypotheses that are too strong to permit a great number of applications. Apart from the articles already referred to, generalizing this algorithm, there is a new notion of continuity introduced by Denel (1977 and this volume). This continuity does not tie in with one point-to-set map but with a family of them, depending on a scalar parameter p, the values of which are decreasing functions of p, in the sense of inclusion. What is involved is an extension of the classical continuities, better adapted to the construction of general algorithms, which generalizes fixed point theorems. Lastly, this notion of a "p-decreasing family" makes possible the measurement of the discontinuity of point-to-set maps.
7. Various uses of point-to-set mapping
Apart from this important movement of research, various applications should be mentioned. The classical extension of optimality conditions and of duality to problems having cone-constraints naturally makes use of point-to-set mapping: e.g., see Borwein (1977). Zang, Choo and Avriel (1975, 1976, 1977) relate the notion of global optimality to the lower semi-continuity of a point-to-set map, defined by the truncation of the domain by an equi-level surface of the objective function. The weakening of the notion of gradient, e.g. the sub-differential, replaces a single-valued function by a point-to-set map. For example, see Minty (1964), Moreau (1967), Valadier (1968), Dem'yanov (1968), Rockafellar (1970), Bertsekas and Mitter (1973), Clarke (1976). The many procedures for solving convex programs by successive linearizations, proposed in mathematical literature, have often been taken up again and analysed in the light of point-to-set mapping. We cite: R.R. Meyer (1968), Robinson and R.R. Meyer (1973), Fiorot and Huard (1974(a)), Fiorot (1977), Denel (1977(b)). As already pointed out, taking approximate calculations into account during the course of each iteration brings but very small complication to the theoretical schema of a method when it is represented by a point-to-set map. Likewise, it is often possible to insert at each iteration the discretization of the domain where sub-optimizing is being carried out, in order to obtain finitely many calculations. If this discretization can be defined as a point-to-set mapping of the current solution, offering the requisite continuity properties, then the demonstration of convergence remains unchanged. Two examples of such a method of implementation are given in Huard (1977 and 1978), one for the maximization of an ascent direction, the other for adapting Rosen's gradient projection method to the case of nonlinear constraints.
Finally, we would point out that integer programming is also affected by point-to-set mapping although, a priori, it appears to be excluded from this field because of its combinatorial nature. The proof is given by R.R. Meyer (1975, 1977(c)). In this rapid review we have neglected mathematical studies unrelated, a priori, to optimization, as for instance articles concerning the derivability of point-to-set maps. On this subject we might mention, however, Methlouthi (1977), who gives an application to the stability of inequalities.

Acknowledgments
This review has been written with the help of several authors, who sent us references and documents. The oldest references were discovered by Dr. J. Denel. We wish to thank them for their helpful information.

P. HUARD
Électricité de France, Clamart, France
Mathematical Programming Study 10 (1979) 8-12. North-Holland Publishing Company
ANNEX 1
THE CONTINUITIES OF THE POINT-TO-SET MAPS, DEFINITIONS AND EQUIVALENCES

0. Introduction
In the theory of set-valued mapping, two kinds of continuity have been developed. For each of them, very closely related definitions have been given, using on one hand (Hill (1927), Kuratowski and Hahn (1932), Bouligand and Blanc (1933)) ordering inclusion properties in terms of limits of sequences of sets, and on the other hand (Hahn (1932), Choquet (1948), Berge (1959)) topological properties of the "inverse image". The connexions between these definitions are given in the following. Throughout the paper, we consider a map F from X into 𝒫(Y), the set of subsets of Y, where X, Y are Hausdorff spaces. Particular assumptions (for example, first countability) will be specified when necessary; it is to be noticed that none of the assumptions that are given can be deleted. The properties are presented without proofs, the majority of the results being stated in the literature (for complete proofs and counterexamples see [2]).

0.1. Notation
𝒱(x): the family of the neighbourhoods of x ∈ X;
N' ⊂ N will always denote an infinite subset of N;
{x_n}_N: a sequence of points in X ({x_n}_{N'}: an extracted subsequence);
[A] ⇔_H [B] means: A ⇒ B, and B ⇒ A if assumption H is verified (for the meaning of such a specified assumption, see the footnotes to Diagrams 1 and 2).
1. Limits of sets (Hahn and Kuratowski)

Let {A_n}_N be a given sequence of subsets of a topological space Y.

1.1. Lower limit of {A_n}_N
lim inf_N A_n denotes the lower limit of the sequence {A_n}_N, i.e. the subset of Y (possibly empty) that consists of the points x satisfying
(∀V ∈ 𝒱(x))(∃n_0) n ≥ n_0 ⇒ A_n ∩ V ≠ ∅.
1.2. Upper limit of {A_n}_N
lim sup_N A_n denotes the upper limit of the sequence {A_n}_N, i.e. the subset of Y (possibly empty) that consists of all points x satisfying
(∀V ∈ 𝒱(x))(∀n)(∃n' ≥ n) A_{n'} ∩ V ≠ ∅.

1.3. Properties
(a) lim inf_N A_n ⊂ lim sup_N A_n, and these subsets are closed in Y.
(b) x ∈ lim inf_N A_n ⇔_{H1} (∃{x_n}_N → x)(∃n_0)(∀n ≥ n_0) x_n ∈ A_n.
(c) x ∈ lim sup_N A_n ⇔_{H1} (∃N' ⊂ N)(∃{x_n}_{N'} → x)(∀n ∈ N') x_n ∈ A_n.
(d) If 𝒢 is open in Y, then (∃N' ⊂ N)(∀n ∈ N') A_n ∩ 𝒢 = ∅ ⇒ (lim inf_N A_n) ∩ 𝒢 = ∅.
(e) If G is compact (or G sequentially compact), (lim sup_N A_n) ∩ G = ∅ ⇒ (∃n_0)(∀n ≥ n_0) A_n ∩ G = ∅.
(f) If Y is a metric space with compact closed balls and G is closed, then
A_n ∩ G ≠ ∅ (∀n), A_n a connected subset of Y (∀n), lim inf_N A_n ≠ ∅ and compact ⇒ (lim sup_N A_n) ∩ G ≠ ∅.
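The two limits just defined can be illustrated numerically. The sketch below is ours, not the annex's: it approximates the topological definitions for the concrete sequence A_n = {(−1)^n, 1/n} in Y = R by testing ε-balls over a finite range of indices, so the parameters eps, n0 and n1 are illustrative assumptions. Here lim inf_N A_n = {0} while lim sup_N A_n = {−1, 0, 1}.

```python
def meets(x, n, eps):
    # Does the ball B(x, eps) meet A_n = {(-1)**n, 1/n}?
    return abs(x - (-1) ** n) < eps or abs(x - 1 / n) < eps

def in_liminf(x, eps=1e-3, n0=10_000, n1=20_000):
    # x is in the lower limit if B(x, eps) meets A_n for ALL large n
    # (finite-range approximation of the definition in 1.1).
    return all(meets(x, n, eps) for n in range(n0, n1))

def in_limsup(x, eps=1e-3, n0=10_000, n1=20_000):
    # x is in the upper limit if B(x, eps) meets A_n for INFINITELY many n
    # (finite-range approximation of the definition in 1.2).
    return any(meets(x, n, eps) for n in range(n0, n1))
```

For this sequence, 0 = lim 1/n passes both tests, while ±1 pass only the upper-limit test (A_n contains (−1)^n = 1 only for even n, and −1 only for odd n), agreeing with the inclusion in property (a).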
2. First kind of continuity: the lower continuity
2.1. We present four definitions that have been introduced in the literature.

Definition 2.1 (Hill, Kuratowski, Hahn, Blanc). The map F is said to be lower semi continuous by inclusion (L.S.C.) at x ∈ X if and only if
(∀{x_n}_N → x) F(x) ⊂ lim inf_N F(x_n).

Definition 2.2 (Hahn, Choquet, Berge). The map F is said to be lower semi continuous (l.s.c.) at x ∈ X if and only if
(∀𝒢 ⊂ Y, open) F(x) ∩ 𝒢 ≠ ∅ ⇒ (∃V ∈ 𝒱(x))(∀x' ∈ V) F(x') ∩ 𝒢 ≠ ∅.

Definition 2.3 (Debreu, Hogan, Huard). The map F is said to be open (or lower continuous) at x ∈ X if and only if
(∀{x_n ∈ X}_N → x)(∀y ∈ F(x))(∃{y_n ∈ Y}_N → y)(∃n_0)(∀n ≥ n_0) y_n ∈ F(x_n).
Definition 2.4 (Brisac). The map F is said to be lower semicontinuous (l.s.c.) at x ∈ X if and only if

cl F(x) = {y ∈ Y | (∀V ∈ 𝒱(y)) {x' ∈ X | F(x') ∩ V ≠ ∅} ∈ 𝒱(x)},

where cl F(x) denotes the closure of F(x).
2.2. Equivalence properties

Diagram 1 shows the connections between these definitions; the reader is referred to its footnote for the meaning of the hypotheses H_i (i = 1, 2). These equivalences are given for the definitions at x ∈ X.
Diagram 1. H₁: Y satisfies the "first axiom of countability"; H₂: "there exists at x ∈ X a countable base of neighbourhoods".
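As a concrete illustration of lower continuity (our example, not from the paper), consider F: R → 𝒫(R) with F(x) = [0, 1] for x ≠ 0 and F(0) = {0}. This map is L.S.C. at 0, since F(0) = {0} ⊂ lim inf_N F(x_n) = [0, 1] for every sequence x_n → 0, but it is not upper semicontinuous there. The sketch below checks Definition 2.1 numerically along the representative sequence x_n = 1/n; the interval encoding and helper names are ours.

```python
def F(x):
    # F(x) = [0, 1] if x != 0, {0} if x = 0 (intervals as endpoint pairs)
    return (0.0, 1.0) if x != 0 else (0.0, 0.0)

def dist_to_interval(y, iv):
    """Distance from the point y to the closed interval iv = (lo, hi)."""
    lo, hi = iv
    return max(lo - y, y - hi, 0.0)

def lsc_at_zero(eps=1e-6, N=1000):
    # Definition 2.1 along x_n = 1/n -> 0: every y in F(0) must be
    # approachable by points y_n in F(x_n), i.e. dist(y, F(x_n)) -> 0.
    ys = [0.0]                       # F(0) = {0}
    return all(dist_to_interval(y, F(1.0 / n)) < eps
               for y in ys for n in range(1, N))

def usc_at_zero(eps=1e-6, N=1000):
    # The upper-inclusion property (Definition 3.1) fails: y = 1 lies in
    # every F(1/n) but not in F(0).
    y = 1.0
    hit_along_sequence = all(dist_to_interval(y, F(1.0 / n)) < eps
                             for n in range(1, N))
    return not (hit_along_sequence and dist_to_interval(y, F(0)) > eps)

print(lsc_at_zero())  # True
print(usc_at_zero())  # False
```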
Remark 1. The notion of lower continuity is extended to the whole space X by assuming the l.s.c. at every x ∈ X. A characterization of the lower semicontinuity (l.s.c.) on X is given by the following relation (Berge, Choquet): F is l.s.c. on X if and only if

{x ∈ X | F(x) ∩ G ≠ ∅} is open in X for every open G in Y.

Remark 2. It is worth noticing that all the previous definitions are equivalent if X and Y are first countable Hausdorff spaces. This is the case for metric spaces.

3. Second kind of continuity: the upper continuity
3.1.

Definition 3.1 (Hill, Kuratowski, Hahn, Bouligand, Choquet). The map F is said to be upper semicontinuous by inclusion (U.S.C.) at x ∈ X if and only if

(∀{x_n ∈ X}_N → x) lim sup_N F(x_n) ⊂ F(x).
Definition 3.2 (Hahn, Choquet, Berge). The map F is said to be upper semicontinuous (u.s.c.) at x ∈ X if and only if

(∀G ⊂ Y, G open) F(x) ⊂ G ⟹ (∃V ∈ 𝒱(x)) x' ∈ V ⟹ F(x') ⊂ G.
Definition 3.3 (Debreu, Hogan, Huard). The map F is said to be closed (or upper continuous) at x ∈ X if and only if

(∀{x_n ∈ X}_N → x)(∀{y_n ∈ Y}_N → y such that y_n ∈ F(x_n) (∀n)) y ∈ F(x).
Definition 3.4 (Choquet). The map F is said to be weakly upper semicontinuous (w.u.s.c.) at x ∈ X if and only if

(∀y ∉ F(x))(∃U ∈ 𝒱(x))(∃V ∈ 𝒱(y)) x' ∈ U ⟹ F(x') ∩ V = ∅.

Remark 3. Choquet calls "strong upper semicontinuity" the property of Definition 3.2.
Remark 4. Both Definitions 3.1 and 3.4 imply "F(x) closed", but Definition 3.3 only implies "F(x) sequentially closed".
3.2. Equivalence properties

Diagram 2 shows the connections between these definitions (the meaning of the assumptions H_i (i = 1, ..., 4) is given by its footnote). Besides these equivalences, the following proposition gives a relation between U.S.C. and u.s.c. in a particularly interesting context.

Diagram 2. The following equivalences are given for the definitions at x ∈ X. H₁: Y satisfies the "first axiom of countability"; H₂: "there exists at x ∈ X a countable base of neighbourhoods"; H₃: Y is a regular space and F(x) is closed; H₄: Y − F(x) is compact (in particular, if Y is compact, H₄ is fulfilled).
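The mirror of the earlier lower-continuity example (again ours, for illustration) separates the upper notions from the lower ones: F(x) = {0} for x ≠ 0 and F(0) = [0, 1] is upper semicontinuous by inclusion at 0, since lim sup_N F(x_n) = {0} ⊂ F(0), but it is not lower semicontinuous there, since y = 1 ∈ F(0) stays at distance 1 from every F(x_n). A quick check in the same style, with our own interval encoding:

```python
def F(x):
    # F(x) = {0} if x != 0, [0, 1] if x = 0 (intervals as endpoint pairs)
    return (0.0, 0.0) if x != 0 else (0.0, 1.0)

def dist_to_interval(y, iv):
    """Distance from the point y to the closed interval iv = (lo, hi)."""
    lo, hi = iv
    return max(lo - y, y - hi, 0.0)

# U.S.C. (Definition 3.1): along x_n = 1/n the sets F(x_n) = {0}
# accumulate only at y = 0, which does belong to F(0).
usc_holds = dist_to_interval(0.0, F(0)) == 0.0

# l.s.c. fails (Definition 2.1): y = 1 is in F(0) but keeps distance 1
# from every F(1/n).
lsc_fails = all(dist_to_interval(1.0, F(1.0 / n)) == 1.0
                for n in range(1, 1000))

print(usc_holds, lsc_fails)  # True True
```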
Proposition 3. If Y is a metric space with compact closed balls, then

F(x) ≠ ∅ and compact,  F(x') connected for all x' in some V ∈ 𝒱(x),  (∃C compact)(∃V ∈ 𝒱(x)) x' ∈ V ⟹ F(x') ∩ C ≠ ∅  ⟹  [U.S.C. at x ⟹ u.s.c. at x].

Remark 5. It is to be noticed that Definitions 3.1, 3.3 and 3.4 are equivalent if the spaces X and Y are first countable Hausdorff spaces (in particular, if X, Y are metric spaces).

Remark 6. It is worth noticing that Berge defines the u.s.c. of a map over the whole space X by the two following conditions: F is u.s.c. at every x ∈ X, and F is compact-valued.
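The compactness conditions above are not decorative. For instance (our example, not from the paper), F(x) = {1/x} for x ≠ 0 and F(0) = {0} is closed at 0 in the sense of Definition 3.3: along x_n → 0 the points y_n = 1/x_n escape to infinity, so no convergent sequence {y_n} exists and the condition holds vacuously. Yet F is not u.s.c. at 0, since the open set G = (-1, 1) contains F(0) = {0} but no F(x') with x' ≠ 0 close to 0. A numerical check of both facts:

```python
def F(x):
    # F(x) = {1/x} for x != 0, {0} for x = 0
    return {1.0 / x} if x != 0 else {0.0}

# Definition 3.3 holds vacuously at 0: along x_n = 1/n the values
# 1/x_n grow without bound, hence admit no limit y.
values = [max(F(1.0 / n)) for n in range(1, 100)]
escapes = all(v > n - 0.5 for n, v in enumerate(values, start=1))

# Definition 3.2 fails: G = (-1, 1) contains F(0) = {0}, but every
# F(x') with x' = 1/k, k >= 1, lies outside G.
G = (-1.0, 1.0)
not_usc = all(not (G[0] < y < G[1]) for k in range(1, 100)
              for y in F(1.0 / k))

print(escapes, not_usc)  # True True
```

This is exactly the gap that hypotheses such as H₄ and the compactness condition in Proposition 3 are designed to close.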
Bibliography

(1) The reader is referred to the general reference list of this Study (more precisely, to the papers with a mark in column 2).

(2) J.P. Delahaye and J. Denel, "Equivalences des continuités des applications multivoques dans des espaces topologiques", Publication no. 111, Laboratoire de Calcul, Université de Lille (1978).

J.P. DELAHAYE and J. DENEL
Université de Lille I, France
Mathematical Programming Study 10 (1979) 13. North-Holland Publishing Company
ANNEX 2

RELAXATION METHODS

A. Auslender, "Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables", Numerische Mathematik 18 (1971) 213-223.
J. Cea and R. Glowinski, "Sur les méthodes d'optimisation par relaxation", Rairo No. R-3 (1973) 5-32.
D. Chazan and W. Miranker, "Chaotic relaxation", Linear Algebra and its Applications (1969) 199-222.
B. Martinet and A. Auslender, "Méthodes de décomposition pour la minimisation d'une fonction sur un espace produit", SIAM Journal on Control 12 (1974) 635-642.
J.C. Miellou, "Algorithmes de relaxation chaotique à retard", Rairo No. R-1 (1975) 55-82.
J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
F. Robert, M. Charnay and F. Musy, "Itérations chaotiques série-parallèle pour des équations non-linéaires de point fixe", Aplikace Matematiky 20 (1975) 1-37.
S. Schechter, "Relaxation methods for linear equations", Communications on Pure and Applied Mathematics 12 (1959) 313-335.
S. Schechter, "Iterative methods for nonlinear problems", Transactions of the American Mathematical Society 104 (1962) 179-189.
S. Schechter, "Relaxation methods for convex problems", SIAM Journal on Numerical Analysis 5 (1968) 601-612.
S. Schechter, "Minimization of a convex function by relaxation", in: J. Abadie, ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) pp. 117-189.
Mathematical ProgrammingStudy I0 (1979) 14-28. North-Holland Publishing Company
GENERAL REFERENCE LIST
Each reference's main subjects are indicated by numbers:

(1) Parametrization; stability.
(2) Theory of point-to-set maps, possibly not related to optimization: continuity, differentiability, integrability, existence of fixed points, etc.
(3) General algorithms; synthesis of optimization methods; computation of fixed points.
(4) Use of point-to-set maps in particular problems or methods of optimization (except stability problems).
(5) Book; survey.
Anonymous, "A new algorithm for optimization", Mathematical Programming 3 (1972) 124-128.
(3)
J.P. Aubin, "Propriété de Perron-Frobenius pour des correspondances positives semi-continues supérieurement", Centre de recherches mathématiques, rapport No. 719, University of Montreal (1977).
(4)
J.P. Aubin, Mathematical methods of game and economic theory (North-Holland, Amsterdam, 1979).
(4, 5)
J.P. Aubin, Applied functional analysis (Wiley-Interscience, New York, 1978).
(5)
J.P. Aubin and F.H. Clarke, "Multiplicateurs de Lagrange en optimisation non convexe et applications", Comptes Rendus de l'Académie des Sciences 285-A (Paris, 1977) 451-454.
(4)
J.P. Aubin and F.H. Clarke, "Removal of linear constraints in minimizing nonconvex functions", Modelling research group report No. 7708, University Park, Los Angeles (1978).
(1, 4)
J.P. Aubin and J. Siegel, "Fixed points and stationary points of dissipative multivalued maps", Modelling research group report No. 7712, University Park, Los Angeles (1978).
(2)
R.J. Aumann, "Integrals of set valued functions", Journal of Mathematical Analysis and Applications 12 (1965) 1-12.
(2)
A. Auslender, "Résolution numérique d'inégalités variationnelles", Comptes Rendus de l'Académie des Sciences 276-A (Paris, 1973) 1063-1066.
(2, 4)
A. Auslender and Dang Tran Dac, "Méthodes de descente et analyse convexe", Cahiers du Centre d'Études et de Recherche Opérationnelle 18 (1976) 269-307.
(3, 4)
A. Auslender, Optimisation (méthodes numériques) (Masson, Paris, 1976).
(3, 5)
A. Auslender, "Minimisation de fonctions localement lipschitziennes: Applications à la programmation mi-convexe, mi-différentiable", in: Mangasarian, Meyer and Robinson, eds., Nonlinear programming 3 (Academic Press, New York, 1978).
(4)
A. Auslender, "Differentiable stability in nonconvex and nondifferentiable programming", Mathematical Programming Study 10 (1979) (this volume).
(1,4)
H.T. Banks and M.Q. Jacobs, "A differential calculus for multifunctions", Journal of Mathematical Analysis and Applications 29 (1970) 246-272.
(2)
M.S. Bazaraa, "A theorem of the alternative with application to convex programming: optimality, duality and stability", Journal of Mathematical Analysis and Applications 41 (1973) 701-715.
(1)
B. Bereanu, "The continuity of the optimum in parametric programming and applications to stochastic programming", Journal of Optimization Theory and Applications 18 (1976) 319-334.
(1, 2)
C. Berge, "Théorie générale des jeux à n personnes", Mémorial des sciences mathématiques 138 (Gauthier-Villars, Paris, 1957).
(2, 5)
C. Berge, Espaces topologiques. Fonctions multivoques (Dunod, Paris, 1966).
(2, 5)
D.P. Bertsekas and S.K. Mitter, "A descent numerical method for optimization problems with nondifferentiable cost functionals", SIAM Journal on Control 11 (1973) 637-652.
(4)
L.J. Billera, "Topologies for 2^X; set-valued functions and their graphs", Transactions of the American Mathematical Society 155 (1971) 137-147.
(2)
E. Blanc, "Sur une propriété différentielle des continus de Jordan", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 600-602.
(2)
E. Blanc, "Sur la structure de certaines lois générales régissant des correspondances multiformes", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 1769-1771.
(2)
F.H. Bohnenblust and S. Karlin, "On a theorem of Ville", in: H.W. Kuhn and A.W. Tucker, eds., Contributions to the theory of games, Vol. 1 (Princeton University Press, Princeton, NJ, 1950) pp. 155-160.
(4)
J. Borwein, "Multivalued convexity and optimization: a unified approach to inequality and equality constraints", Mathematical Programming 13 (1977) 183-199.
(2, 4)
G. Bouligand, "Sur la semi-continuité d'inclusion et quelques sujets connexes", Enseignement mathématique 31 (1932) 14-22, and 30 (1931) 240.
(2)
G. Bouligand, "Propriétés générales des correspondances multiformes", Comptes Rendus de l'Académie des Sciences 196 (Paris, 1933) 1767-1769.
(2)
R. Brisac, "Sur les fonctions multiformes", Comptes Rendus de l'Académie des Sciences 224 (Paris, 1947) 92-94.
(2)
F. Browder, "The fixed point theory of multivalued mappings in topological vector spaces", Mathematische Annalen 177 (1968) 283-301.
(2)
C. Castaing and M. Valadier, Convex analysis and measurable multifunctions (Springer-Verlag, Berlin, 1977).
(2, 5)
A. Cellina, "A further result on the approximation of set valued mappings", Rendiconti Accademia Nazionale dei Lincei 48 (1970) 230-234.
(2)
A. Cellina, "The role of approximation in the theory of set valued mappings", in: H.W. Kuhn and G.P. Szegő, eds., Differential games and related topics (North-Holland, Amsterdam, 1971).
(2)
L. Cesari, "Existence theorems for weak and usual optimal solutions in Lagrange problems with unilateral constraints, I and II", Transactions of the American Mathematical Society 124 (1966) 369-430.
(2, 4)
L. Cesari, "Existence theorems for optimal solutions in Pontryagin and Lagrange problems", SIAM Journal on Control 3 (1966) 475-498.
(2, 4)
L. Cesari, "Existence theorems for optimal controls of the Mayer type", SIAM Journal on Control 6 (1968) 517-552.
(2, 4)
L. Cesari, "Seminormality and upper semicontinuity in optimal control", Journal of Optimization Theory and Applications 6 (1970) 114-137.
(2, 4)
G. Choquet, "Convergences", Annales de l'Université de Grenoble 23 (1947-1948) 55-112.
(2)
F.H. Clarke, "A new approach to Lagrange multipliers", Mathematics of Operations Research 1 (1976) 165-174.
(4)
F. Cordellier and J.C. Fiorot, "Trois algorithmes pour résoudre le problème de Fermat-Weber généralisé", Bulletin de la Direction des Études et Recherches (E.D.F.), Série C, supplément au no. 2 (1976) 35-54.
(4)
F. Cordellier and J.C. Fiorot, "Sur le problème de Fermat-Weber avec fonctions de coûts convexes", Laboratoire de Calcul, publication no. 74, University of Lille (1976).
(4)
F. Cordellier and J.C. Fiorot, "On the Fermat-Weber problem with convex cost functions", Mathematical Programming 14 (1978) 295-311.
(4)
D.E. Cowles, "Upper semi-continuity properties of variables sets in optimal control", Journal of Optimization Theory and Applications 10 (1972) 222-236.
(2, 4)
J.P. Crouzeix, "Continuité des applications linéaires multivoques", Revue Française d'Automatique, d'Informatique et de R.O. No. R1 (1973) 62-67.
(2)
Dang Tran Dac, "Décomposition en programmation convexe", Revue Française d'Automatique, d'Informatique et de R.O. No. R1 (1973) 68-75.
(3, 4)
J.W. Daniel, "Stability of the solution of definite quadratic programs", Mathematical Programming 5 (1973) 41-53.
(1)
G.B. Dantzig, J. Folkman and N. Shapiro, "On the continuity of the minimum of a continuous function", Journal of Mathematical Analysis and Applications 17 (1967) 519-548.
(1, 2)
R. Datko, "Measurability properties of set-valued mappings in a Banach space", SIAM Journal on Control 8 (1970) 226-238.
(2)
G. Debreu, Theory of value, Cowles Foundation monograph No. 17, (Wiley, New York, 1959).
(5)
J.P. Delahaye and J. Denel, "Equivalences des continuités des applications multivoques dans des espaces topologiques", Laboratoire de Calcul, publication no. 111, University of Lille (1978).
(2)
V.F. Dem'yanov, "Algorithms for some minimax problems", Journal of Computer and System Sciences 2 (1968) 342-380.
(4)
J. Denel, "Propriétés de continuité des familles p-décroissantes d'applications multivoques", Laboratoire de Calcul, publication no. 87, University of Lille (1977).
(2)
J. Denel, "On the continuity of point-to-set maps with applications to Optimization", Proceedings of the 2nd symposium on operations research, Aachen, 1977 (to appear).
(2, 3, 4)
W. Dinkelbach, Sensitivitätsanalysen und parametrische Programmierung (Springer-Verlag, Berlin, 1969).
(1,5)
S. Dolecki, "Extremal measurable selections", Bulletin de l'Académie Polonaise des Sciences 25 (1977) 355-360.
(2)
S. Dolecki, "Semicontinuity in constrained optimization", Control and Cybernetics (to appear).
(1,2)
S. Dolecki and S. Rolewicz, "Metric characterizations of the upper semicontinuity", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(2)
S. Dolecki and S. Rolewicz, "Exact penalty for local minima", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(4)
S. Dolecki and S. Rolewicz, "A characterization of semicontinuity preserving multifunctions", Institute of Mathematics, report 125 (Polish Academy of Sciences, Warsaw, 1978).
(2)
J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1973) 328-332.
(3)
B.C. Eaves, "Homotopies for computation of fixed points", Mathematical Programming 3 (1972) 1-22.
(3)
B.C. Eaves and R. Saigal, "Homotopies for computation of fixed points on unbounded regions", Mathematical Programming 3 (1972) 225-237.
(3)
B.C. Eaves and W.I. Zangwill, "Generalized cutting plane algorithms", SIAM Journal on Control 9 (1971) 529-542.
(4)
I. Ekeland and M. Valadier, "Representation of set-valued mappings", Journal of Mathematical Analysis and Applications 35 (1971) 621-629.
(2)
J.P. Evans and F.J. Gould, "Stability in nonlinear programming", Operations Research 18 (1970) 107-118.
(1)
J.P. Evans and F.J. Gould, "A nonlinear duality theorem without convexity", Econometrica 40 (1972) 487-496.
(4)
A.V. Fiacco, "Sequential unconstrained minimization methods for nonlinear programming", Thesis, Northwestern University, Evanston, Illinois (1967).
(3)
J.C. Fiorot, "Algorithmes de programmation convexe par linéarisation en format constant", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 11 (1977) 245-253.
(4)
J.C. Fiorot and P. Huard, "Une approche théorique du problème de linéarisation en programmation mathématique convexe", Laboratoire de Calcul, publication no. 42, University of Lille (1974).
(4)
J.C. Fiorot and P. Huard, "Composition et réunion d'algorithmes généraux", Laboratoire de Calcul, publication no. 43, University of Lille (1974).
(3)
M.K. Fort Jr., "A unified theory of semicontinuity", Duke Mathematical Journal 16 (1949) 237-246.
(2)
J. Gauvin and J.W. Tolle, "Differential stability", SIAM Journal on Control and Optimization 15 (1977) 294-311.
(1)
R.J. Gazik, "Convergence in spaces of subsets", Pacific Journal of Mathematics 43 (1972) 81-92.
(2)
A.M. Geoffrion, "Duality in nonlinear programming: a simplified applications-oriented development", SIAM Review 13 (1974) 1-37.
(1)
J.H. George, V.M. Sehgal and R.E. Smithson, "Application of Liapunov's direct method to fixed point theorems", Proceedings of the American Mathematical Society 18 (1971) 613-620.
(2)
I.L. Glicksberg, "A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points", Proceedings of the American Mathematical Society 3 (1952) 170-174.
(2, 3)
H.J. Greenberg and W.P. Pierskalla, "Extensions of the Evans and Gould stability theorems for mathematical programs", Operations Research 20 (1972) 143-153.
(1)
H.J. Greenberg and W.P. Pierskalla, "Stability theorems for infinitely constrained mathematical programs", Journal of Optimization Theory and Applications 16 (1975) 409-428.
(1)
J. Guddat, "Stability in convex quadratic parametric programming", Mathematische Operationsforschung und Statistik 7 (1976) 223-245.
(1)
J. Guddat and D. Klatte, "Stability in nonlinear parametric optimization", Proceedings of the IX Symposium on Mathematical Programming, Budapest (1976).
(1)
H. Hahn, "Über irreduzible Kontinua", Sitzungsberichte der Akademie der Wissenschaften Wien 130 (Vienna, 1921).
(2)
H. Hahn, Reelle Funktionen, 1. Teil: Punktfunktionen, copyright 1932 by Akademische Verlagsgesellschaft m.b.H., Leipzig (Chelsea Publishing Co., New York, 1948).
(2, 5)
F. Hausdorff, Set theory (Chelsea Publishing Co., New York, 1962).
(5)
H. Hermes, "Calculus of set valued functions and control", Journal of Mathematics and Mechanics 18 (1968) 47-59.
(4)
L.S. Hill, "Properties of certain aggregate functions", American Journal of Mathematics 49 (1927) 419-432.
(2)
C.J. Himmelberg, "Fixed points of compact multifunctions", Journal of Mathematical Analysis and Applications 38 (1972) 205-207.
(2)
W.W. Hogan, "Directional derivatives for extremal-value functions with applications to the completely convex case", Operations Research 21 (1973) 188-209.
(1)
W.W. Hogan, "The continuity of the perturbation function of a convex program", Operations Research 21 (1973) 351-352.
(1)
W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
(1, 2, 5)
W.W. Hogan, "Applications of general convergence theory for outer approximation algorithms", Mathematical Programming 5 (1973) 151-168.
(3, 4)
P. Huard, "Resolution of mathematical programming problems with nonlinear constraints by the method of centres", in: J. Abadie, ed.,
Nonlinear programming (North-Holland, Amsterdam, 1967) pp. 206-219.
(3)
P. Huard, "Optimisation dans R^n, 2ème partie: Algorithmes généraux", Laboratoire de Calcul, University of Lille (1972).
(3, 4, 5)
P. Huard, "Tentative de synthèse dans les méthodes de programmation non-linéaire", Cahiers du Centre d'Études de R.O. 16 (1974) 347-367.
(3, 4)
P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
(3, 4)
P. Huard, "Implementation de méthodes de gradients par discrétisation tangentielle", Bulletin de la Direction des Études et Recherches (E.D.F.), Série C, No. 2 (1977) 43-57.
(4)
P. Huard, "Implementation of gradient methods by tangential discretization", Journal of Optimization Theory and Applications 28 (1979).
(4)
M.Q. Jacobs, "Some existence theorems for linear optimal control problems", SIAM Journal on Control 5 (1967) 418-437.
(4)
M.Q. Jacobs, "On the approximation of integrals of multivalued functions", SIAM Journal on Control 7 (1969) 158-177.
(2, 4)
R. Janin, "Sensitivity for nonconvex optimization problems", in: A. Auslender ed., Convex analysis and its applications (Springer-Verlag, Berlin, 1977) pp. 115-119.
(1)
Z. Janiszewski, "Sur les continus irréductibles entre deux points", Journal de l'École Polytechnique, Série II, 16 (1912).
(2)
J.L. Joly and P.J. Laurent, "Stability and duality in convex minimization problems", Revue Française d'Informatique et de R.O. No. R-2 (1971) 3-42.
(1)
S. Kakutani, "A generalization of Brouwer's fixed point theorem", Duke Mathematical Journal 8 (1941) 457-459.
(2)
P.R. Kleindorfer and M.R. Sertel, "Equilibrium existence results for simple dynamic games", Journal of Optimization Theory and Applications 14 (1974) 614-631.
(1,4)
R. Klessig, "A general theory of convergence for constrained optimization algorithms that use antizigzagging provisions", SIAM Journal on Control 12 (1974) 598-608.
(3)
R. Klessig and E. Polak, "An adaptive precision gradient method for optimal control", SIAM Journal on Control 11 (1973) 80-93.
(4)
B. Kummer, "Global stability of optimization problems", Mathematische Operationsforschung und Statistik, Optimization 8 (1977) 367-383.
(1)
C. Kuratowski, "Les fonctions semi-continues dans l'espace des ensembles fermés", Fundamenta Mathematicae 18 (1932) 148-160.
(2)
C. Kuratowski, Topologie, third edition, volume II, Monografie matematyczne (Polska Akademia Nauk., Warszawa, 1961) Ch. 4, section 39.
(2, 5)
Ky Fan, "Fixed point and minimax theorems in locally convex topological linear spaces", Proceedings of the National Academy of Sciences of USA 38 (1952) 121-126.
(2)
Ky Fan, "Extensions of two fixed point theorems of F.E. Browder", in: W.M. Fleischman, ed., Set-valued mappings, selections and topological properties of 2^X, Lecture Notes in Mathematics 171 (Springer-Verlag, Berlin, 1970).
(2, 5)
A. Lasota and C. Olech, "On Cesari's semicontinuity condition for set valued mappings", Bulletin de l'Acad~mie Polonaise des Sciences 16 (1968).
(2)
J.M. Lasry and R. Robert, "Analyse non linéaire multivoque", Cahiers de mathématiques de la décision No. 11, University of Paris-Dauphine (1976).
(2)
E.S. Levitin and B.T. Polyak, "Constrained optimization methods", USSR Computational Mathematics and Mathematical Physics 6 (1966) 1-50.
(3, 4)
T.C. Lim, "A fixed point theorem for multivalued nonexpansive mappings in a uniformly convex Banach space", Bulletin of the American Mathematical Society 80 (1974) 1123-1126.
(2)
D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
(4, 5)
V.J. Mancuso, "An Ascoli theorem for multi-valued functions", Journal of the Australian Mathematical Society 12 (1971) 466-477.
(2)
M. Martelli and A. Vignoli, "On differentiability of multi-valued maps",
Bollettino della Unione Matematica Italiana 10 (1974) 701-712.
(2)
D.H. Martin, "On the continuity of the maximum in parametric linear programming", Journal of Optimization Theory and Applications 17 (1975) 205.
(1)
B. Martinet, "Perturbation des méthodes d'optimisation. Applications", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 12 (1978) 153-171.
(1,4)
O.H. Merrill, "Applications and extensions of an algorithm that computes fixed points of certain upper semi-continuous point-to-set mappings", Ph.D. Dissertation, University of Michigan, Ann Arbor (1972).
(3)
H. Methlouthi, "Calcul différentiel multivoque", Centre de recherche de mathématiques de la décision, Cahier No. 7702, Université Paris-Dauphine (1977).
(2)
G.G.L. Meyer, "Algorithm model for penalty functions-type iterative procedures", Journal of Computer and System Sciences 9 (1974) 20-30.
(3, 4)
G.G.L. Meyer, "A canonical structure for iterative procedures", Journal of Mathematical Analysis and Applications 52 (1975) 120-128.
(3)
G.G.L. Meyer, "A systematic approach to the synthesis of algorithms", Numerische Mathematik 24 (1975) 277-289.
(3)
G.G.L. Meyer, "Conditions de convergence pour les algorithmes itératifs monotones, autonomes et non déterministes", Revue Française d'Automatique, d'Informatique et de R.O., Analyse Numérique 11 (1977) 61-74.
(3)
G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
(3)
G.G.L. Meyer and E. Polak, "A decomposition algorithm for solving a class of optimal control problems", Journal of Mathematical Analysis and Applications 32 (1970) 118-140.
(4)
G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control and Optimization 9 (1971) 547-560.
(3)
G.G.L. Meyer and R.C. Raup, "On the structure of cluster point sets of iteratively generated sequences", Electrical engineering department, report No. 75-24, The Johns Hopkins University, Baltimore, MD (1975).
(3)
R.R. Meyer, "The solution of non-convex optimization problems by iterative convex programming", Ph.D. Thesis, University of Wisconsin-Madison (1968).
(4)
R.R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41-54.
(3, 4)
R.R. Meyer, "Integer and mixed-integer programming models: General properties", Journal of Optimization Theory and Applications 16 (1975) 191-206.
(4)
R.R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computer and System Sciences 12 (1976) 108-121.
(3)
R.R. Meyer, "On the convergence of algorithms with restart", SIAM Journal on Numerical Analysis 13 (1976) 696-704.
(3, 4)
R.R. Meyer, "A convergence theory for a class of anti-jamming strategies", Journal of Optimization Theory and Applications 21 (1977) 277-297.
(4)
R.R. Meyer, "A comparison of the forcing function and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control and Optimization 15 (1977) 699-715.
(3)
R.R. Meyer, "Equivalent constraints for discrete sets", Mathematics research center report No. 1748, University of Wisconsin-Madison (1977).
(4)
E. Michael, "Topologies on spaces of subsets", Transactions of the American Mathematical Society 71 (1951) 152-182.
(2)
E. Michael, "Continuous selections", Annals of Mathematics 63, 64, 65 (1956, 1957).
(2)
G.J. Minty, "On the monotonicity of the gradient of a convex function", Pacific Journal of Mathematics 14 (1964) 243-247.
(4)
R.L. Moore, "Concerning upper semicontinuous collections of continua", Proceedings of the National Academy of Sciences of USA 10 (1924) 356-360.
(2)
R.L. Moore, "Concerning upper semicontinuous collections of continua", Transactions of the American Mathematical Society 27 (1925) 416.
(2)
J.J. Moreau, "Séminaire sur les équations aux dérivées partielles, II: Fonctionnelles convexes", Collège de France (1966-1967).
(4)
S.B. Nadler, "Multivalued contraction mappings", Pacific Journal of Mathematics 30 (1969) 475-488.
(2)
F. Nožička, J. Guddat, H. Hollatz and B. Bank, Theorie der linearen parametrischen Optimierung (Akademie-Verlag, Berlin, 1974).
(1, 5)
J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
(5)
J.M. Ortega and W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1972) 40-43.
(3)
A.M. Ostrowski, Solution of equations and systems of equations (Academic Press, New York, 1966).
(5)
W.W. Petryshyn and P.M. Fitzpatrick, "Fixed point theorems for multivalued noncompact inward maps", Journal of Mathematical Analysis and Applications 46 (1974) 756-767.
(2)
E. Polak, "On the convergence of optimization algorithms", Revue Française d'Informatique et de R.O. 16-R1 (1969) 17-34.
(3, 4)
E. Polak, "On the implementation of conceptual algorithms", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 275-291.
(3, 4)
E. Polak, Computational methods in optimization: A unified approach (Academic Press, New York 1971).
(3, 4, 5)
E. Polak, R.W.H. Sargent and D.J. Sebastian, "On the convergence of sequential minimization algorithms", Journal of Optimization Theory and Applications 14 (1974) 439-442.
(4)
B.T. Polyak, "Gradient methods for the minimization of functionals", U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1963) 864-878.
(3)
S.M. Robinson, "Stability theory for systems of inequalities, Part II: Differentiable nonlinear systems", Mathematics research center technical report No. 1338, University of Wisconsin-Madison (1974).
(1)
S.M. Robinson, "Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms", Mathematical Programming 7 (1974) 1-16.
(1)
S.M. Robinson, "Regularity and stability for convex multivalued functions", Mathematics of Operations Research 1 (1976) 130-143.
(1)
S.M. Robinson, "First-order conditions for general nonlinear optimization", SIAM Journal on Applied Mathematics 30 (1976) 597-607.
(4)
S.M. Robinson, "A characterization of stability in linear programming", Operations Research 25 (1977) 435-447.
(1)
S.M. Robinson and R.H. Day, "A sufficient condition for continuity of optimal sets in mathematical programming", Journal of Mathematical Analysis and Applications 45 (1974) 506-511.
(1)
S.M. Robinson and R.R. Meyer, "Lower semicontinuity of multivalued linearization mappings", SIAM Journal on Control 11 (1973) 525-533.
(2, 4)
R.T. Rockafellar, "Duality and stability in extremum problems involving convex functions", Pacific Journal of Mathematics 21 (1967).
(1)
R.T. Rockafellar, Convex functions and duality in optimization problems and dynamics, Lecture notes in operations research and mathematical economics 11 (Springer-Verlag, Berlin, 1969).
(5)
R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
(5)
R.T. Rockafellar, "Monotone operators and the proximal point algorithm", SIAM Journal on Control and Optimization 14 (1976) 877-892.
(3)
J.B. Rosen, "Iterative solution of nonlinear optimal control problems", SIAM Journal on Control 4 (1966) 223-244.
(4)
J.B. Rosen, "Two-phase algorithm for nonlinear constraint problems", Computer science department technical report No. 7%8, University of Minnesota (1977).
(4)
R. Saigal, "Extension of the generalized complementarity problem", Mathematics of Operations Research 1 (1976) 260-266.
(4)
J. Saint-Pierre, "Borel convex-valued multifunctions", in: A. Auslender, ed., Convex analysis and its applications (Springer-Verlag, Berlin, 1977) pp. 180-190.
(2)
M.R. Sertel, "The fundamental continuity theory of optimization on a compact space, I", Journal of Optimization Theory and Applications 16 (1975) 549-558.
(1)
S. Itoh and W. Takahashi, "Single-valued mappings, multi-valued mappings and fixed point theorems", Journal of Mathematical Analysis and Applications 59 (1977) 514-521.
(2)
R.E. Smithson, "Sub continuity for multifunctions", Pacific Journal of Mathematics 61 (1975) 283-288.
(2)
S. Swaminathan, Fixed point theory and its applications (Academic Press, New York, 1977).
(2, 5)
Hoang Tuy, "On the convex approximation of nonlinear inequalities", Operationsforschung und Statistik 5 (1974) 451-466.
(4)
Hoang Tuy, "Stability property of a system of inequalities", Mathematische Operationsforschung und Statistik, Optimization 8 (1977) 27-39.
(1)
M. Valadier, "Quelques propriétés des sous-gradients", Rapport de l'I.R.I.A., Automatique 6833, Rocquencourt (1968).
(4)
F.L. Vasilesco, Thèse, Faculté des Sciences, Paris (1925).
(2)
R.J.B. Wets, "On the convergence of random convex sets", in: A. Auslender ed., Convex analysis and its applications (Springer-Verlag, Berlin, 1977) pp. 191-206.
(2)
W.A. Wilson, "On the structure of a continuum, limited and irreducible between two points", American Journal of Mathematics 48 (1926) 145-168.
(2)
I. Zang and M. Avriel, "On functions whose local minima are global", Journal of Optimization Theory and Applications 16 (1975) 183-190.
(4)
I. Zang, E.U. Choo and M. Avriel, "A note on functions whose local minima are global", Journal of Optimization Theory and Applications 18 (1976) 555-559.
(4)
I. Zang, E.U. Choo and M. Avriel, "On functions whose stationary points are global minima", Journal of Optimization Theory and Applications 22 (1977) 195-208.
(4)
W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
(3, 4, 5)
W.I. Zangwill, "Convergence conditions for nonlinear programming algorithms", Management Science 16 (1969) 1-13.
(3, 4)
L. Zoretti, "Sur les fonctions analytiques uniformes qui possèdent un ensemble parfait discontinu de points singuliers", Journal de Mathématiques Pures et Appliquées 1 (1905) 1-51.
(2)
G. Zoutendijk, Methods of feasible directions (Elsevier, Amsterdam, 1960).
(3, 5)
Mathematical Programming Study 10 (1979) 29-41. North-Holland Publishing Company
DIFFERENTIABLE STABILITY IN NON CONVEX AND NON DIFFERENTIABLE PROGRAMMING

A. AUSLENDER

Université de Clermont, Aubière, France

Received 24 October 1977
This paper consists of a study of differentiable stability in non convex and non differentiable mathematical programming. Vertically perturbed programs are studied and upper and lower bounds are estimated for the potential directional derivative of the perturbed objective function.
Key words: Mathematical Programming, Objective Function Sensitivity, Locally Lipschitz Functions, Generalized Gradients.
0. Introduction

This paper consists of a study of differentiable stability in non convex and non differentiable mathematical programming. For a survey of this and related work see Gauvin and Tolle [7]. The goal of this paper is to extend Gauvin and Tolle's results to the non differentiable case.

More precisely, in this paper $\mathbb{R}^N$ is the usual vector space of real $N$-tuples with the usual inner product denoted by $\langle \cdot , \cdot \rangle$; $m$, $p$, $N$ ($p < N$) are positive integers; $f$ and $f_i$, $i \in (1, m)$, are real-valued locally Lipschitz functions defined on $\mathbb{R}^N$, where $(1, p)$ denotes the set of integers contained in the closed interval $[1, p]$; and the functions $g_i$, $i \in (1, p)$, are continuously differentiable. Let $(z, w) \in \mathbb{R}^m \times \mathbb{R}^p$; we shall consider the sets

$$C = \{x : f_i(x) \le 0,\ i \in (1, m),\ g_i(x) = 0,\ i \in (1, p)\}$$

($C$ is assumed to be nonempty),

$$C(z, w) = \{x : f_i(x) \le z_i,\ i \in (1, m),\ g_i(x) = w_i,\ i \in (1, p)\},$$

and the mixed non linear program P:

$$\gamma = \inf\{f(x) : x \in C\}$$

and its perturbed program P(z, w):

$$h(z, w) = \inf\{f(x) : x \in C(z, w)\}.$$

The aim of this paper is to obtain lower and upper bounds for

(A) $$\liminf_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda}, \qquad \limsup_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda}.$$
When the functions $f$, $f_i$, $i \in (1, m)$, are continuously differentiable, such bounds were obtained by Gauvin and Tolle [7] with the use of the set of Kuhn and Tucker vectors under the Mangasarian-Fromovitz constraint qualification condition [11]. When the functions are only locally Lipschitz, Hiriart-Urruty [9] and Clarke [3] have recently generalized the set of Kuhn and Tucker vectors by giving necessary conditions. With this generalized set and under a generalized constraint qualification condition, bounds for (A) will be obtained, and these bounds coincide with those given by Gauvin and Tolle in the differentiable case. In Section 1 of this paper, the basic terminology and notation used, as well as some fundamental results from pseudoinverse matrix theory, point-to-set maps and the theory of locally Lipschitz functions, are presented. In Section 2, a fundamental theorem is given, with first applications to the necessary conditions of optimal programming. In Section 3, lower and upper bounds for (A) are given.
1. Preliminaries

1.1. Point-to-set maps

For proving Theorems 3.3 and 3.4 we shall use concepts and theorems from the theory of point-to-set maps $\Omega$ defined from $\mathbb{R}^n$ to $\mathbb{R}^p$. $\Omega$ is closed at a point $\bar{x}$ if $x_k \to \bar{x}$, $y_k \in \Omega(x_k)$ and $y_k \to \bar{y}$ imply that $\bar{y} \in \Omega(\bar{x})$. $\Omega$ is uniformly compact near $\bar{x}$ if there is a neighborhood $V$ of $\bar{x}$ such that $\bigcup_{x \in V} \Omega(x)$ is bounded. Let

$$v(x) = \inf\{\varphi(x, y) : y \in \Omega(x)\},$$

where $\varphi$ is a real-valued function defined on $\mathbb{R}^n \times \mathbb{R}^p$, and let

$$H(x) = \{y \in \Omega(x) : v(x) = \varphi(x, y)\}.$$

Theorem 1.1 ([10, Theorem 5]). If $\Omega$ is closed at $\bar{x}$ and uniformly compact near $\bar{x}$, and if $\varphi$ is lower semi-continuous on $\{\bar{x}\} \times \Omega(\bar{x})$, then $v$ is lower semi-continuous at $\bar{x}$.

Now let $\{A_n\}$ be a sequence of subsets of $\mathbb{R}^m$. We define the outer limit $\overline{\lim}_{n \to \infty} A_n$ by

$$\overline{\lim}\, A_n = \{x \in \mathbb{R}^m : x = \lim x_{n_i},\ \text{where } \{n_i\} \text{ is an infinite subsequence of integers and } x_{n_i} \in A_{n_i}\},$$

and the inner limit $\underline{\lim}_{n \to \infty} A_n$ by

$$\underline{\lim}\, A_n = \{x : x = \lim x_n,\ \text{where } x_n \in A_n \text{ for all but a finite number of } n\}.$$
If $\underline{\lim}_{n \to \infty} A_n = \overline{\lim}_{n \to \infty} A_n$, we say that $\lim_{n \to \infty} A_n$ exists and set $\lim A_n = \underline{\lim} A_n = \overline{\lim} A_n$.
Theorem 1.2 ([6, Theorem 1.2.2]). Suppose that for every sequence $\{x_n\}$ with $x_n \to x$, $\overline{\lim}\, \Omega(x_n)$ is either empty or equal to $\Omega(x)$, and that $\varphi$ is continuous; then $H$ is closed at $x$.

Theorem 1.3 ([6, Corollary 2.3.4]). Let $g$, $\{g_r\}$ be affine functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ with $g = \lim_{r \to \infty} g_r$. If we define $H(g) = \{x : g(x) = 0\}$ and suppose that

$$\limsup_{r \to \infty} \operatorname{rank}(g_r) \le \operatorname{rank}(g),$$

then either $\lim_{r \to \infty} H(g_r) = H(g)$ or $H(g_r)$ is empty for infinitely many $r$.
1.2. Pseudoinverse maps

In this paper we shall have occasion to use the pseudoinverse of a $p \times N$ matrix $A$ with rank $p$ ($p < N$). For such a matrix, $AA^t$ is non singular, and the matrix $A^t(AA^t)^{-1}$, denoted by $A^+$, is called the pseudoinverse of $A$. Moreover we have:

Theorem 1.4 ([8, Theorem 8.1]). If $A$ is a $p \times N$ matrix with rank $p$, then $A^+ b$ is a solution of $Ax = b$.

The next theorem will be fundamental in this paper.

Theorem 1.5. Let $g_i$, $i \in (1, p)$, be a set of $p$ continuously differentiable real-valued functions defined on $\mathbb{R}^N$. Suppose further that the matrix of partial derivatives

(1.0) $$\left(\frac{\partial g_i}{\partial x_j}\right), \quad i \in (1, p),\ j \in (1, N),$$

has rank $p$ at $x = x_0$. Then for a given vector $h_0$ there exist a neighborhood $V \times X$ of $(h_0, x_0)$, $\delta > 0$, and a continuous function $\alpha$ with values in $\mathbb{R}^N$ defined on $]-\delta, \delta[ \times V \times X$, having with respect to the first component a continuous partial derivative $\partial\alpha/\partial\lambda$, such that

(1.1) $$\alpha(0, h, y) = 0, \qquad \frac{\partial\alpha}{\partial\lambda}(0, h, y) = 0 \quad \forall (y, h) \in X \times V,$$

and such that, if we set $x(\lambda, h, y) = y + \alpha(\lambda, h, y) + \lambda h$, then $x(\lambda, h, y)$ satisfies

(1.2) $$g_i(x(\lambda, h, y)) = g_i(y) + \lambda \langle \nabla g_i(y), h \rangle \quad \forall \lambda \in ]-\delta, \delta[,\ \forall (y, h) \in X \times V,\ i \in (1, p).$$

Proof. Use the proof of [8, Theorem 5.2], substituting for the variable $\lambda$ the triplet $(\lambda, h, y)$.

In the following we shall denote by $G(x_0)$ the pseudoinverse of the matrix defined by (1.0).
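As a numerical illustration (an editorial sketch, not part of the original text), the pseudoinverse construction of Theorem 1.4 can be checked with NumPy; the matrix and right-hand side below are arbitrary choices.

```python
import numpy as np

# A p x N matrix (p = 2 < N = 4) with rank p, as in Theorem 1.4; data are arbitrary.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
b = np.array([3.0, 5.0])

# Pseudoinverse of a full-row-rank matrix: A^+ = A^t (A A^t)^(-1).
A_plus = A.T @ np.linalg.inv(A @ A.T)

x = A_plus @ b                                  # a solution of Ax = b
print(np.allclose(A @ x, b))                    # True
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True: agrees with NumPy's pinv
```

For a full-row-rank matrix this $A^+$ coincides with the Moore-Penrose pseudoinverse, and $A^+ b$ is in fact the minimum-norm solution of $Ax = b$.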
1.3. Locally Lipschitz functions

For completeness we include some important concepts from the theory of locally Lipschitz functions. For a more detailed exposition the reader is referred to Clarke [2, 3, 4, 5]. Let $f$ be a real-valued locally Lipschitz function defined on $\mathbb{R}^N$; then there exists for every $x \in \mathbb{R}^N$ a nonempty convex compact set, denoted by $\partial f(x)$ and called the generalized gradient, such that, if we set

(1.3) $$f^0(x; v) = \limsup_{h \to 0,\ \lambda \to 0^+} \frac{f(x + h + \lambda v) - f(x + h)}{\lambda},$$

then we have

(1.4) $$f^0(x; v) = \delta^*(v \mid \partial f(x)) \quad \forall v \in \mathbb{R}^N.$$

In this formula, $\delta^*(\cdot \mid \partial f(x))$ is the support functional of $\partial f(x)$. More generally, if $C$ is a convex compact set we set

(1.5) $$\delta^*(v \mid C) = \max\{\langle v, y \rangle : y \in C\}.$$

We shall use the two classical properties of support functionals ([12, Theorem 13.1]):

(1.6) $$0 \in C \iff \delta^*(v \mid C) \ge 0 \quad \forall v \in \mathbb{R}^N;$$

if $C_i$, $i \in (1, r)$, are convex compact sets and $\lambda_i$ positive reals, then

(1.7) $$\delta^*\Big(v \,\Big|\, \sum_{i=1}^{r} \lambda_i C_i\Big) = \sum_{i=1}^{r} \lambda_i \delta^*(v \mid C_i).$$

We shall also use the following properties of $f^0$:
(1) $f^0$ is upper semi continuous (u.s.c.);
(2) for any $x$, $f^0(x; \cdot)$ is convex and positively homogeneous;
(3)

(1.8) $$f^0(x; v) = \limsup_{h \to 0,\ v' \to v,\ \lambda \to 0^+} \frac{f(x + h + \lambda v') - f(x + h)}{\lambda}.$$
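As a one-dimensional illustration (added here, not in the original text) of (1.3)-(1.4), take the locally Lipschitz function $f(x) = |x|$ at $x = 0$:

```latex
f^0(0; v) = \limsup_{h \to 0,\ \lambda \to 0^+} \frac{|h + \lambda v| - |h|}{\lambda} = |v|,
\qquad
\partial f(0) = [-1, 1],
\qquad
\delta^*(v \mid [-1, 1]) = \max_{|y| \le 1} v\,y = |v|,
```

so (1.4) holds at this point, and (1.6) correctly reports $0 \in \partial f(0)$: the point $0$ is a minimum of $|x|$.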
2. A fundamental theorem: first applications

Theorem 2.1. Let $h_i$, $i \in (1, s)$, be a set of $s$ real-valued locally Lipschitz functions defined on $\mathbb{R}^N$. For any $(z, w) \in \mathbb{R}^s \times \mathbb{R}^p$ set

$$D(z, w) = \{x : h_i(x) \le z_i\ (i \in (1, s)),\ g_j(x) = w_j\ (j \in (1, p))\};$$

$$D^*(z, w) = \{x : h_i(x) < z_i\ (i \in (1, s)),\ g_j(x) = w_j\ (j \in (1, p))\}.$$

Let $\bar{x} \in D(0, 0)$ and suppose that:

(1) the matrix

(2.0) $$\left(\frac{\partial g_i}{\partial x_j}\right), \quad i \in (1, p),\ j \in (1, N),$$

has rank $p$ at the point $\bar{x}$;

(2) there exists $\bar{y}$ such that

(2.1) $$h_i^0(\bar{x}; \bar{y}) < 0,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), \bar{y} \rangle = 0,\ i \in (1, p),$$

where $I(\bar{x}) = \{i : h_i(\bar{x}) = 0\}$. Then:

(A) For any $(z, w)$:

(a) there exists at least one direction $d$ such that

(2.2) $$h_i^0(\bar{x}; d) < z_i,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), d \rangle = w_i,\ i \in (1, p).$$

(b) For any $d$ satisfying (2.2), if $\alpha$ is the function associated to $(\bar{x}, d)$ by Theorem 1.5 and if we set $x(\lambda) = \bar{x} + \alpha(\lambda, d, \bar{x}) + \lambda d$, then there exists $\lambda_0 > 0$ such that

(2.3) $$x(\lambda) \in D^*(\lambda z, \lambda w) \quad \forall \lambda \in ]0, \lambda_0[.$$

(B) More generally, for $(\bar{z}, \bar{w})$ there exist a neighborhood $Z \times W$ of $(\bar{z}, \bar{w})$, $\lambda_0 > 0$ and an affine injective transformation $d$ defined from $\mathbb{R}^p$ into $\mathbb{R}^N$ such that for any $(z, w) \in Z \times W$, $\lambda \in ]0, \lambda_0[$:

(a) $h_i^0(\bar{x}; d(w)) < z_i$, $i \in I(\bar{x})$; $\langle \nabla g_i(\bar{x}), d(w) \rangle = w_i$, $i \in (1, p)$;

(b) if $\alpha$ is the function associated to $(\bar{x}, d(\bar{w}))$ by Theorem 1.5 and if we set $x(\lambda, w) = \bar{x} + \alpha(\lambda, d(w), \bar{x}) + \lambda d(w)$, then

(2.4) $$x(\lambda, w) \in D^*(\lambda z, \lambda w).$$
Proof. Obviously part (A) is a corollary of part (B), but for greater clarity we shall prove part (A) first.
(A(1)) Since the $\nabla g_i(\bar{x})$, $i \in (1, p)$, are independent, $G(\bar{x})w$ satisfies

$$\langle \nabla g_i(\bar{x}), G(\bar{x})w \rangle = w_i, \quad i \in (1, p)$$

($G(\bar{x})$ is the pseudoinverse of the matrix defined in (2.0)). Thus from (2.1), and since the functionals $\langle \nabla g_i(\bar{x}), \cdot \rangle$ are linear and the functionals $h_i^0(\bar{x}; \cdot)$ are homogeneous and convex, there exists $\bar{\mu} \ge 0$ such that, if we set $d = \bar{\mu}\bar{y} + G(\bar{x})w$, then

(A) $$h_i^0(\bar{x}; d) < z_i,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), d \rangle = w_i,\ i \in (1, p).$$

(A(2)) Let $d$ satisfy (2.2), let $\alpha$ be the function associated to $(d, \bar{x})$ by Theorem 1.5, and set $x(\lambda) = \bar{x} + \alpha(\lambda, d, \bar{x}) + \lambda d$; then:

(a) From Theorem 1.5, there exists $\lambda_1 > 0$ such that for $\lambda \in ]0, \lambda_1[$,

$$g_i(x(\lambda)) = \lambda w_i, \quad i \in (1, p).$$

(b) Since $z_i - h_i^0(\bar{x}; d) > 0$, there exists $\lambda_2 > 0$ such that for $i \in I(\bar{x})$, $\lambda \in ]0, \lambda_2[$,

$$\frac{h_i(x(\lambda)) - h_i(\bar{x})}{\lambda} < \limsup_{\lambda \to 0^+} \frac{h_i(x(\lambda)) - h_i(\bar{x})}{\lambda} - h_i^0(\bar{x}; d) + z_i.$$

Set $d' = d + \alpha(\lambda, d, \bar{x})/\lambda$; then from (1.1), $d' \to d$ as $\lambda \to 0^+$. Thus from (1.8) we have

$$\limsup_{\lambda \to 0^+} \frac{h_i(x(\lambda)) - h_i(\bar{x})}{\lambda} \le \limsup_{h \to 0,\ \lambda \to 0^+} \frac{h_i(\bar{x} + h + \lambda d') - h_i(\bar{x} + h)}{\lambda} \le \limsup_{h \to 0,\ d^* \to d,\ \lambda \to 0^+} \frac{h_i(\bar{x} + h + \lambda d^*) - h_i(\bar{x} + h)}{\lambda} = h_i^0(\bar{x}; d),$$

and consequently we obtain

(B) $$h_i(x(\lambda)) < \lambda z_i.$$

(c) Since $h_i$ is continuous and since $\alpha(\lambda, d, \bar{x}) \to 0$ as $\lambda \to 0^+$, there exists $\lambda_3 > 0$ such that for $\lambda \in ]0, \lambda_3[$, $i \notin I(\bar{x})$,

(C) $$h_i(x(\lambda)) < \lambda z_i.$$

If we set $\lambda_0 = \min(\lambda_1, \lambda_2, \lambda_3)$, then we obtain (2.3).

(B(1)) For every bounded neighborhood $Z \times W$ of $(\bar{z}, \bar{w})$ there exists $\bar{\mu} \ge 0$ such that, if we set $d(w) = \bar{\mu}\bar{y} + G(\bar{x})w$, then $d$ is an injective affine map satisfying, for every $(z, w) \in Z \times W$,

$$h_i^0(\bar{x}; d(w)) < z_i,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), d(w) \rangle = w_i,\ i \in (1, p).$$

(B(2)) Let $\alpha$ be the function associated to $(\bar{x}, d(\bar{w}))$ by Theorem 1.5 and set $x(\lambda, w) = \bar{x} + \alpha(\lambda, d(w), \bar{x}) + \lambda d(w)$. Then:

(a) From Theorem 1.5, since $\alpha(\lambda, d(w), \bar{x}) \to 0$ as $\lambda \to 0^+$ and $w \to \bar{w}$, there exist
$\lambda_1 > 0$ and a neighborhood $W_1 \subset W$ of $\bar{w}$ such that for $\lambda \in ]0, \lambda_1[$, $w \in W_1$, $z \in Z$ we have

$$g_i(x(\lambda, w)) = \lambda w_i,\ i \in (1, p); \qquad h_i(x(\lambda, w)) < \lambda z_i,\ i \notin I(\bar{x}).$$

(b) Since $z_i - h_i^0(\bar{x}; d(\bar{w})) > 0$, there exist $\lambda_2 \in ]0, \lambda_1[$ and a neighborhood $W_2 \subset W_1$ such that for $\lambda \in ]0, \lambda_2[$, $w \in W_2$, $z \in Z$, $i \in I(\bar{x})$ we have

$$\frac{h_i(x(\lambda, w)) - h_i(\bar{x})}{\lambda} < \limsup_{\lambda \to 0^+,\ w \to \bar{w}} \frac{h_i(x(\lambda, w)) - h_i(\bar{x})}{\lambda} - h_i^0(\bar{x}; d(\bar{w})) + z_i,$$

and then for such $(\lambda, w)$, with the same arguments as in (A(b)), we can prove that $h_i(x(\lambda, w)) < \lambda z_i$.

Now return to problem P given in the introduction. First we want to give necessary conditions characterizing a local minimum $\bar{x}$. For that it is necessary to impose some type of constraint qualification condition on the functions $f_i$, $g_i$ at $\bar{x}$. We shall say that $\bar{x}$ satisfies the constraint qualification condition if:

(1) the matrix

$$\left(\frac{\partial g_i}{\partial x_j}\right), \quad i \in (1, p),\ j \in (1, N),$$

has rank $p$ at $\bar{x}$;

(2) there exists $\bar{y}$ such that

(2.5) $$f_i^0(\bar{x}; \bar{y}) < 0,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), \bar{y} \rangle = 0,\ i \in (1, p),$$

where $I(\bar{x}) = \{i : f_i(\bar{x}) = 0\}$.

This condition generalizes the Mangasarian-Fromovitz condition [11] of the differentiable case. The next generalized Kuhn-Tucker theorem appears as a trivial corollary of more general theorems given by Hiriart-Urruty [9] and by Clarke [3]. Nevertheless this theorem is also a simple corollary of Theorem 2.1, and we thus obtain another proof of it.
Theorem 2.2. If $\bar{x} \in C$ satisfies the constraint qualification condition and if $\bar{x}$ is a local minimum for $f$ in $C$, then there exist real numbers $u_i$, $v_j$, $i \in I(\bar{x})$, $j \in (1, p)$, such that

(2.6) $$u_i \ge 0, \quad i \in I(\bar{x}),$$

(2.7) $$0 \in \partial f(\bar{x}) + \sum_{i \in I(\bar{x})} u_i\, \partial f_i(\bar{x}) + \sum_{j=1}^{p} v_j\, \nabla g_j(\bar{x}).$$

Proof. Denote by $D$ the following set:

$$D = \{d : f^0(\bar{x}; d) < 0,\ f_i^0(\bar{x}; d) < 0,\ i \in I(\bar{x}),\ \langle \nabla g_j(\bar{x}), d \rangle = 0,\ j \in (1, p)\}.$$
$D$ is empty: in the contrary case, if we set $h_i = f_i$, $i \in (1, m)$, $h_{m+1}(\cdot) = f(\cdot) - f(\bar{x})$ and $(z, w) = (0, 0)$, then by part (A) of Theorem 2.1 we obtain a contradiction with the fact that $\bar{x}$ is a local minimum. Thus, from the fundamental theorem [1, p. 210], there exist $\lambda_0, \lambda_i \ge 0$, $i \in I(\bar{x})$, and $\mu_j$, $j \in (1, p)$, with $\lambda_0 + \sum_{i \in I(\bar{x})} \lambda_i \ne 0$, such that

$$\lambda_0 f^0(\bar{x}; d) + \sum_{i \in I(\bar{x})} \lambda_i f_i^0(\bar{x}; d) + \sum_{j=1}^{p} \mu_j \langle \nabla g_j(\bar{x}), d \rangle \ge 0 \quad \forall d.$$

Next, the constraint qualification condition implies that $\lambda_0 \ne 0$; then from (1.6) and (1.7) we conclude that (2.7) holds.

Let $s$ be the number of elements of $I(\bar{x})$; without loss of generality we assume that $I(\bar{x}) = (1, s)$, and in the following we shall denote by $K(\bar{x})$ the set of Kuhn-Tucker vectors $(u, v) \in \mathbb{R}^s \times \mathbb{R}^p$ such that (2.6) and (2.7) hold.
3. Differentiable stability

In this section we investigate some differential properties of the perturbed program P(z, w) given in the introduction. We denote by $M(z, w)$ the optimal set of P(z, w), that is,

$$M(z, w) = \{x \in C(z, w) : f(x) = h(z, w)\}.$$

It will be assumed in this section that $\bar{x}$ is a local minimum for $f$ in $C$ and that the constraint qualification condition is satisfied at $\bar{x}$.
Theorem 3.1. For every $(z, w) \in \mathbb{R}^s \times \mathbb{R}^p$ and every $\varepsilon > 0$ there exists a vector $y(\varepsilon, z, w)$ such that

(3.1) $$f_i^0(\bar{x}; y(\varepsilon, z, w)) < z_i,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), y(\varepsilon, z, w) \rangle = w_i,\ i \in (1, p),$$

(3.2) $$f^0(\bar{x}; y(\varepsilon, z, w)) \le -\min\{\langle (u, v), (z, w) \rangle : (u, v) \in K(\bar{x})\} + \varepsilon.$$

Proof. For $(u, v, y) \in \mathbb{R}^s \times \mathbb{R}^p \times \mathbb{R}^N$ we set

$$L(u, v, y) = f^0(\bar{x}; y) + \sum_{i \in I(\bar{x})} u_i f_i^0(\bar{x}; y) + \sum_{j=1}^{p} v_j \langle \nabla g_j(\bar{x}), y \rangle.$$

Consider the minimization problem P$_1$:

$$\alpha = \inf\{f^0(\bar{x}; y) : f_i^0(\bar{x}; y) \le z_i,\ i \in I(\bar{x}),\ \langle \nabla g_i(\bar{x}), y \rangle = w_i,\ i \in (1, p)\}$$
and its ordinary dual problem Q$_1$:

$$\beta = \sup\Big\{\inf_y \Big(-\sum_{i=1}^{p} v_i w_i - \sum_{i \in I(\bar{x})} u_i z_i + L(u, v, y)\Big) : u_i \ge 0,\ i \in I(\bar{x}),\ v_i \in \mathbb{R}\Big\}.$$

Since from part (A(a)) of Theorem 2.1 there exists $y^*$ such that

$$f_i^0(\bar{x}; y^*) < z_i,\ i \in I(\bar{x}); \qquad \langle \nabla g_i(\bar{x}), y^* \rangle = w_i,\ i \in (1, p),$$

we can apply the classical duality theorem of [12] and we obtain:

(a) $\alpha = \beta$;

(b) the set of optimal solutions of Q$_1$ is nonempty.

Thus

$$\alpha = \max\{-\langle (u, v), (z, w) \rangle + \inf_y L(u, v, y) : u \ge 0,\ v \in \mathbb{R}^p\}.$$

From (1.6) and (1.7) we have

(A) $$K(\bar{x}) = \{(u, v) : u \ge 0,\ L(u, v, \cdot) \ge 0\}.$$

Let $u \ge 0$. If $(u, v) \notin K(\bar{x})$, it is a consequence of (A) that there exists $y$ such that $L(u, v, y) < 0$, and since $L(u, v, \cdot)$ is positively homogeneous we have $\inf\{L(u, v, y) : y \in \mathbb{R}^N\} = -\infty$. In the other case, if $(u, v) \in K(\bar{x})$, since $L(u, v, \cdot)$ is homogeneous and $\ge 0$ we have $\inf\{L(u, v, y) : y \in \mathbb{R}^N\} = 0$. Thus finally we obtain

$$\alpha = -\min\{\langle (u, v), (z, w) \rangle : (u, v) \in K(\bar{x})\}.$$

Taking an $\varepsilon$-optimal solution of P$_1$, we obtain the desired result.

For every $z \in \mathbb{R}^m$ we write $z = (z^1, z^2)$ with $z^1 = (z_1, \ldots, z_s)$, $z^2 = (z_{s+1}, \ldots, z_m)$. We now assume for the sequel that $\bar{x} \in M(0, 0)$.

Theorem 3.2. For any direction $(z, w) \in \mathbb{R}^m \times \mathbb{R}^p$:

(1) there exists $\lambda_0$ such that

$$C(\lambda z, \lambda w) \ne \emptyset \quad \forall \lambda \in ]0, \lambda_0[;$$

(2)

(3.3) $$\limsup_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda} \le -\min\{\langle (u, v), (z^1, w) \rangle : (u, v) \in K(\bar{x})\}.$$
Proof. Let $\varepsilon > 0$, $\mu > 0$ and $y(\varepsilon, z, w)$ satisfying (3.1) and (3.2). Set $d(\varepsilon, \mu) = y(\varepsilon, z, w) + \mu\bar{y}$ ($\bar{y}$ satisfies (2.5)). Then from part (A(a)) of Theorem 2.1, if $x$ is the curve associated to $d$ in this theorem, there exists $\lambda_0 > 0$ such that for $\lambda \in ]0, \lambda_0[$,

(A) $$x(\lambda) \in C(\lambda z, \lambda w).$$

Moreover, from (A) we obtain $h(\lambda z, \lambda w) \le f(x(\lambda))$, so that

$$\limsup_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda} \le \limsup_{\lambda \to 0^+,\ h \to 0,\ d' \to d(\varepsilon, \mu)} \frac{f(\bar{x} + h + \lambda d') - f(\bar{x} + h)}{\lambda} \le f^0(\bar{x}; d(\varepsilon, \mu)).$$

Thus from inequality (3.2), and since $f^0(\bar{x}; \cdot)$ is convex and homogeneous, we obtain

(B) $$\limsup_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda} \le -\min\{\langle (u, v), (z^1, w) \rangle : (u, v) \in K(\bar{x})\} + \varepsilon + \mu f^0(\bar{x}; \bar{y}).$$

Letting $\varepsilon \to 0$, $\mu \to 0$, we obtain from (B) inequality (3.3).

Theorem 3.3. Assume that the point-to-set mapping $C(\cdot)$ is uniformly compact
near $(0, 0)$; then $h$ is continuous at $(0, 0)$.

Proof. Since $C(\cdot)$ is uniformly compact near $(0, 0)$ and closed at $(0, 0)$, $h$ is lower semi-continuous at $(0, 0)$ by Theorem 1.1. We shall now prove that $h$ is upper semi-continuous at $(0, 0)$. Let $\{(z_n, w_n)\}$ converge to $(0, 0)$ with

$$\delta = \limsup_{(z, w) \to (0, 0)} h(z, w) = \lim_{n \to \infty} h(z_n, w_n).$$

Set

$$(\bar{z}_n, \bar{w}_n) = \frac{(z_n, w_n)}{\|(z_n, w_n)\|}, \qquad \lambda_n = \|(z_n, w_n)\|;$$

then without loss of generality we can assume that there exists $(\bar{z}, \bar{w})$ such that

$$(\bar{z}, \bar{w}) = \lim_{n \to \infty} (\bar{z}_n, \bar{w}_n), \qquad 0 = \lim_{n \to \infty} \lambda_n.$$

Thus from part (B) of Theorem 2.1, if $x$ is the curve associated to $(\bar{x}, d(\bar{w}))$, there exists $n_0$ such that for $n \ge n_0$,

$$x(\lambda_n, \bar{w}_n) \in C(\lambda_n \bar{z}_n, \lambda_n \bar{w}_n).$$

Since $h(\lambda_n(\bar{z}_n, \bar{w}_n)) = h(z_n, w_n)$ we obtain

$$h(z_n, w_n) \le f(x(\lambda_n, \bar{w}_n))$$

and

$$\delta \le \lim_{n \to \infty} f(x(\lambda_n, \bar{w}_n)) = f(\bar{x}) = h(0, 0).$$
Theorem 3.4. Assume that $C(\cdot)$ is uniformly compact near $(0, 0)$ and that the constraint qualification condition is satisfied for every $x \in M(0, 0)$; then there exists $\bar{x} \in M(0, 0)$ such that

(3.5) $$\liminf_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda} \ge -\delta^*\big((z^1, w) \mid K(\bar{x})\big).$$
Proof. (a) Since $C(\cdot)$ is uniformly compact near $(0, 0)$, from Theorem 3.2 there exists $\lambda_0$ such that for $\lambda \in ]0, \lambda_0[$, $M(\lambda z, \lambda w)$ is not empty. Let $\{\lambda_n\}$ be any sequence of positive reals such that $0 = \lim \lambda_n$ and

$$\liminf_{\lambda \to 0^+} \frac{h(\lambda z, \lambda w) - h(0, 0)}{\lambda} = \delta = \lim_{n \to \infty} \frac{h(\lambda_n(z, w)) - h(0, 0)}{\lambda_n},$$

and let $x_n \in M(\lambda_n z, \lambda_n w)$ be such that $\bar{x} = \lim_{n \to \infty} x_n$. Since $h$ is continuous we have

$$\delta = \lim_{n \to \infty} \frac{f(x_n) - f(\bar{x})}{\lambda_n}, \qquad \bar{x} \in M(0, 0).$$

(b) Let $\varepsilon, \mu > 0$, let $y(\varepsilon, -z^1, -w)$ be given by Theorem 3.1, and let $\bar{y}$ satisfy (2.5). Set

$$D(x, w) = \{d : \langle \nabla g_i(x), d \rangle = -w_i,\ i \in (1, p)\}.$$

For any closed convex set $H$ denote by $P_H(\cdot)$ the projection operator. Set

$$\tilde{y}_n = P_{D(x_n, w)}\big(y(\varepsilon, -z^1, -w)\big), \qquad \hat{y}_n = P_{D(x_n, 0)}(\bar{y}).$$

Since $P_H$ is nonexpansive we have, for $n$ sufficiently large,

$$\|\hat{y}_n\| \le \|\bar{y}\|, \qquad \|\tilde{y}_n + G(x_n)w\| \le \|G(x_n)w + y(\varepsilon, -z^1, -w)\|.$$

Thus the sequences $\{\tilde{y}_n\}$, $\{\hat{y}_n\}$ are bounded, and from Theorems 1.2 and 1.3 we have

$$y(\varepsilon, -z^1, -w) = \lim \tilde{y}_n, \qquad \bar{y} = \lim \hat{y}_n.$$

Set $d_n = \tilde{y}_n + \mu\hat{y}_n$; then

$$\langle \nabla g_i(x_n), d_n \rangle = -w_i, \quad i \in (1, p).$$

Set $d = y(\varepsilon, -z^1, -w) + \mu\bar{y}$ and let $\alpha$ be the curve associated to $(\bar{x}, d)$ by Theorem 1.5. Let $\alpha_n(\lambda) = \alpha(\lambda, d_n, x_n)$. From Theorem 1.5 there exist $\bar{\lambda} > 0$ and $n_1$ such that, if we set $\bar{x}_n(\lambda) = x_n + \alpha_n(\lambda) + \lambda d_n$, then for $n > n_1$ we have

$$g_i(\bar{x}_n(\lambda)) = g_i(x_n) - \lambda w_i \quad \forall \lambda \in ]0, \bar{\lambda}[, \qquad \alpha_n(0) = 0, \qquad \lim_{\lambda \to 0^+} \frac{\alpha_n(\lambda)}{\lambda} = 0.$$

Thus there exists $n_2 > n_1$ such that for $n > n_2$, $\lambda_n \in ]0, \bar{\lambda}[$,

$$\lim \alpha_n(\lambda_n) = 0, \qquad \lim \frac{\alpha_n(\lambda_n)}{\lambda_n} = 0.$$

Let $y_n = \bar{x}_n(\lambda_n)$. Then $y_n = x_n + \lambda_n d_n'$ with $d = \lim_{n \to \infty} d_n'$. Since $x_n \in M(\lambda_n z, \lambda_n w)$ we have

(A) $$g_i(y_n) = 0, \quad n \ge n_2.$$

Then there exists $n_3 \ge n_2$ such that:

(B) $$f_i(y_n) < 0, \quad i \notin I(\bar{x});$$

(C) $$\frac{f_i(y_n) - f_i(x_n)}{\lambda_n} \le \limsup_{n \to \infty} \frac{f_i(y_n) - f_i(x_n)}{\lambda_n} \le f_i^0(\bar{x}; d) \le f_i^0(\bar{x}; y(\varepsilon, -z^1, -w)) + \mu f_i^0(\bar{x}; \bar{y}) < -z_i, \quad i \in I(\bar{x});$$

(D) $$\frac{f(y_n) - f(x_n)}{\lambda_n} \le \limsup_{n \to \infty} \frac{f(y_n) - f(x_n)}{\lambda_n} + \varepsilon \le f^0(\bar{x}; d) + \varepsilon \le f^0(\bar{x}; y(\varepsilon, -z^1, -w)) + \mu f^0(\bar{x}; \bar{y}) + \varepsilon.$$

Since $x_n \in M(\lambda_n(z, w))$, from (A), (B), (C) this implies that $y_n \in C$ for all $n \ge n_3$, and then

$$f(y_n) \ge f(\bar{x}) \quad \forall n \ge n_3.$$

Thus from (D) and from Theorem 3.1 we obtain

$$-\delta \le \max\{\langle (u, v), (z^1, w) \rangle : (u, v) \in K(\bar{x})\} + 2\varepsilon + \mu f^0(\bar{x}; \bar{y}).$$

Then, when $(\varepsilon, \mu) \to (0, 0)$, we obtain inequality (3.5).

Remark. From Theorems 3.2, 3.3 and 3.4, corollaries as in [7] can be obtained.
References

[1] C. Berge, Espaces topologiques: fonctions multivoques (Dunod, Paris, 1966).
[2] F. Clarke, "Generalized gradients and applications", Transactions of the American Mathematical Society 205 (1975) 247-262.
[3] F. Clarke, "A new approach to Lagrange multipliers", Mathematics of Operations Research 1 (1976) 165-174.
[4] F. Clarke, "Generalized gradients of Lipschitz functionals", M.R.C. Tech. Summary Rept.
[5] F. Clarke, "On the inverse function theorem", Pacific Journal of Mathematics 69 (1) (1976).
[6] G. Dantzig, J. Folkman and J.F. Shapiro, "On the continuity of the minimum set of a continuous function", Journal of Mathematical Analysis and Applications 17 (1967) 519-548.
[7] J. Gauvin and J.W. Tolle, "Differentiable stability", SIAM Journal on Control and Optimization 15 (1977) 294-311.
[8] M.R. Hestenes, Optimization theory: The finite dimensional case (Wiley, New York, 1975).
[9] J.B. Hiriart-Urruty, "On optimality conditions in nondifferentiable programming", Mathematical Programming 14 (1978) 73-86.
[10] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
[11] O.L. Mangasarian and S. Fromovitz, "The Fritz John necessary optimality conditions in the presence of equality and inequality constraints", Journal of Mathematical Analysis and Applications 17 (1967) 37-47.
[12] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
Mathematical Programming Study 10 (1979) 42-47. North-Holland Publishing Company
A MULTIVALUED APPROACH TO THE FARKAS LEMMA

J.M. BORWEIN

Dalhousie University, Halifax, Nova Scotia, Canada

Received June 1977
Revised manuscript received 17 February 1978

The Farkas lemma is examined in the context of point-to-set mappings. Some general non-linear inclusions are studied and the standard linear results are rederived in a strengthened and simplified form.
Key words: Multivalued Mappings, Upper Semi-continuity, Lower Semi-continuity, Nonlinear Farkas Lemmas, Linear Theorems.

1. Introduction

Suppose that we are in $\mathbb{R}^n$. The classical Farkas Lemma [15, 22.3.1, p. 200] states that a linear inequality $a_0 \cdot x \le 0$ is a consequence of the system

$$a_i \cdot x \le 0, \quad i = 1, \ldots, m,$$

if and only if there exist non-negative real numbers $\lambda_1, \ldots, \lambda_m$ such that

$$\sum_{i=1}^{m} \lambda_i a_i = a_0.$$
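Numerically, the lemma reduces to a small feasibility check: $a_0 \cdot x \le 0$ follows from the system exactly when $a_0$ is a non-negative combination of the $a_i$. The sketch below (an editorial illustration with arbitrary data, not part of the original paper) tests this with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

a = np.array([[1.0, 0.0],
              [0.0, 1.0]])       # rows a_1, a_2 of the system a_i . x <= 0
a0 = np.array([2.0, 3.0])        # a_0 = 2 a_1 + 3 a_2, so the lemma applies

# Feasibility LP: find lambda >= 0 with sum_i lambda_i a_i = a_0 (zero objective).
res = linprog(c=np.zeros(len(a)), A_eq=a.T, b_eq=a0, bounds=[(0, None)] * len(a))

print(res.status == 0)                 # True: non-negative multipliers exist
print(np.allclose(a.T @ res.x, a0))    # True: they certify a_0 = sum_i lambda_i a_i
```

When the LP is infeasible (`res.status != 0`), the lemma guarantees instead a point $x$ with $a_i \cdot x \le 0$ for all $i$ but $a_0 \cdot x > 0$.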
Variants of this result, which is central to a great many optimization principles and to many other results concerning general inequalities (see for example Mangasarian [8], Ritter [11, 12, 13] or Lehmann and Oettli [7]), have been examined in infinite dimensional spaces by Craven [3], Ritter [12], Flett [4] and others. The central tool which we use is the theory of multivalued functions developed by Berge [1], Kuratowski [6] and others. This approach, which is entirely different from that in [3], [4], [12], enables us to present stronger theorems and considerably more elementary proofs.

We review some terminology about multivalued mappings [1]. Let $X$, $Y$ be non-empty sets. A multivalued mapping $T$ between $X$ and $Y$, denoted $T : X \to Y$, is a mapping from $X$ into the power set of $Y$. The effective domain of $T$, denoted $D(T)$, is the set of points in $X$ for which $Tx$ is not empty. The range of $T$, $R(T)$, is the set $\bigcup_{x \in X} Tx$. If $V \subset Y$, it is usual to write $T^-(V) = \{x : x \in X \text{ and } Tx \cap V \ne \emptyset\}$ and $T^+(V) = \{x : x \in X \text{ and } Tx \subset V\}$.

Let $X$ and $Y$ be topological spaces. $T$ is said to be lower semi-continuous when $T^-(V)$ is open in $X$ whenever $V$ is open in $Y$. $T$ is said to be upper
semi-continuous when $T^+(V)$ is open in $X$ whenever $V$ is open in $Y$. If $T$ has non-empty compact images and is upper semi-continuous, $T$ is said to be u.s.c.

Let $W$ be a third space. Suppose $S : X \to Y$ and $T : Y \to W$. The composition of $T$ and $S$ is defined by $(TS)x = T(Sx) = \bigcup_{w \in Sx} Tw$. The fundamental continuity result for compositions is that $TS$ is lower semi-continuous whenever $T$ and $S$ are. Similarly, $TS$ is upper semi-continuous, or u.s.c., whenever both $T$ and $S$ are. These results are very easily established.

Finally, $T$ is said to be single valued when $D(T) = X$ and $Tx$ is a singleton for all $x$ in $X$. It is immediate that for single valued mappings all the previous notions of continuity coincide with the standard one. For convenience, single valued mappings will be denoted with small Roman letters. Thus we identify the single valued mapping $T = \{f\}$ and the ordinary function $f$. We will write $T_1 \subset T_2$ if $T_1 x \subset T_2 x$ for all $x$ in $X$. All topological vector space notions are taken from Robertson and Robertson [14].
2. Multivalued non-linear Farkas lemmas

Let $X$, $Y$, $Z$ be non-empty sets. Let $A \subset Y$, $B \subset Z$ be arbitrary non-empty subsets. Let $F : X \to Y$, $G : X \to Z$ be multivalued mappings. We specify conditions on their domains as we proceed.
Proposition 1. Suppose that $D(G) \subset D(F)$ and

(1) $$Fx \cap A \ne \emptyset \quad \text{implies} \quad Gx \cap B \ne \emptyset.$$

Then there exists $H : Y \to Z$, with $D(H) \subset R(F)$, such that
(i) $G \subset HF$,
(ii) $y \in A \cap R(F)$ implies $Hy \cap B \ne \emptyset$.
If $D(G) = X$, then $R(F) = D(H)$.

Proof. Let $H = GF^-$. Then if $x_0 \in D(G)$, one has

$$F^-(Fx_0) = \{x : Fx \cap Fx_0 \ne \emptyset\},$$

and $x_0 \in F^-(Fx_0)$. Thus

$$Gx_0 \subset G(F^-(Fx_0)) = (GF^-)(Fx_0) = (HF)x_0.$$

This establishes (i). Now suppose that $y \in R(F) \cap A$. There is then some $x \in X$ with $y \in Fx \cap A$. Thus $x \in F^- y$ and $Gx \subset G(F^- y) = Hy$. Moreover, since $Fx \cap A \ne \emptyset$, (1) implies that $Gx \cap B \ne \emptyset$. This in turn implies that $Hy \cap B \ne \emptyset$, which establishes (ii). If $D(G) = X$, then $R(F^-) = D(F) = X$ and $D(H) = R(F)$.
Proposition 2. Suppose that $F = f$ is single valued and

(2) $$f(x) \in A \quad \text{implies} \quad Gx \subset B.$$

Then one can find $H : Y \to Z$ with $D(H) \subset R(f)$ and
(i) $G \subset Hf$,
(ii) $H(A \cap R(f)) \subset B$.
Conversely, if such an $H$ exists, (2) holds.
Proof. Suppose (2) holds. As before, let $H = GF^- = Gf^{-1}$. Then (i) is established as before. Now suppose $y \in A \cap R(f)$. Then $y = f(x_0) \in A$ for some $x_0$. If $x \in f^{-1}y$, one has $f(x) = f(x_0) \in A$ and thus $Gx \subset B$. Thus $Hy = G(f^{-1}y) \subset B$, which establishes (ii). It is easy to verify that if such an $H$ exists, (2) must hold.

We now suppose that $X$, $Y$, $Z$ have been topologized.
Proposition 3. The following continuity results hold for $H$ defined as in Propositions 1 and 2.
(i) If $F^-$ and $G$ are lower semi-continuous, so is $H$.
(ii) If $F^-$ and $G$ are upper semi-continuous (u.s.c.), so is $H$.
(iii) If $F = f$ is single valued and open relative to $R(f)$, and $G$ is lower semi-continuous, then $H$ is lower semi-continuous relative to $R(f)$.
Proof. (i) and (ii) follow from the previously made observation that composition preserves all three continuity properties.

(iii) It suffices to show that $F^- = f^{-1}$ is relatively lower semi-continuous, as we may then apply (i) (with $Y$ replaced by $R(f)$). Now suppose $O \subset X$ is open. Then

$$(f^{-1})^-(O) = \{y : f^{-1}(y) \cap O \ne \emptyset\} = \{y : \exists x \in X,\ y = f(x),\ x \in O\}.$$

Thus $(f^{-1})^-(O) = f(O)$, which is open relative to $R(f)$ by hypothesis. This is the definition of relative lower semi-continuity for $F^- = f^{-1}$.

In the case that $H$ is single valued on $D(H)$, Proposition 3 gives conditions for it to be continuous relative to $D(H)$. A simple condition which guarantees this is that $g = G$ be single valued on $X$ and that

(3) $$Fx_1 \cap Fx_2 \ne \emptyset \quad \text{implies} \quad g(x_1) = g(x_2).$$

In the event that $F$ is also single valued, (3) is the condition required by Craven in [3]. Collecting results, we have the following corollary, which generalizes Theorem 1 of [3].

Corollary 1. Suppose that $X$, $Y$, $Z$ are topological spaces and that $A \subset Y$, $B \subset Z$ are non-empty sets. Let $f : X \to Y$ and $g : X \to Z$ be single valued and, respectively, open relative to $R(f)$ and continuous. Suppose that condition (3) holds. Then the following are equivalent:
(4) $$f(x) \in A \quad \text{implies} \quad g(x) \in B;$$

(5) $g = hf$, where $h : R(f) \to Z$ is single valued, continuous, and satisfies $h(A \cap R(f)) \subset B$.
Proof. Suppose (4) holds. Proposition 2 shows that $h = H$ satisfies

$$g \subset hf \quad \text{and} \quad h(A \cap R(f)) \subset B.$$

Condition (3) guarantees that $H = h$ is single valued on $D(H) = R(f)$. This in turn implies that $g = hf$. The relative lower semi-continuity of $H$ is a consequence of Proposition 3(iii), and since $H = h$ is single valued on the set $R(f)$, $h$ is actually continuous as a mapping of $R(f)$ into $Z$. The converse is obvious.

This theorem is stronger than Craven's [3], which required in addition that $f$ be continuous (a condition which was necessary for his proof method). We also remark that there are clearly other variants of Propositions 1 and 2. The two given seem the most useful.
3. Linear theorems

In this section we suppose that all spaces concerned are real vector spaces and that all topological spaces are real, Hausdorff, topological vector spaces. We have the following linear Farkas lemma.

Theorem 1. Let $S : X \to Y$ be a linear mapping which is open relative to $R(S)$. Let $T : X \to Z$ be a continuous linear mapping. Suppose that $A \subset Y$, $B \subset Z$ are closed convex cones and that $B \cap -B = \{0\}$. Then the following are equivalent:

(6) $$Sx \in A \quad \text{implies} \quad Tx \in B;$$

(7) $T = LS$, where $L : R(S) \to Z$ is linear, continuous and satisfies $L(R(S) \cap A) \subset B$.

Proof. It is only necessary to verify that (3) holds and that $L$ is linear. Linearity is immediate. To see that (3) holds, suppose that $Sx_1 = Sx_2$. Then $S(x_1 - x_2) = 0 \in A \cap -A$, and so (6) implies that $T(x_1 - x_2) \in B \cap -B$. But $B \cap -B = \{0\}$ by assumption. Thus $T(x_1 - x_2) = 0$ and $Tx_1 = Tx_2$ as required. The theorem now follows from Corollary 1.

It is apparent that the requirement that $A$ and $B$ be closed convex cones could be considerably weakened.
The linear Farkas lemmas proven in [2], [3], [11] all assume that $S$ is continuous, and impose topological conditions on the spaces to ensure that the open mapping theorem holds, so that $S$ can be assumed open as well. The continuity of $S$ is then also needed in the proof, which proceeds by factoring out the null space of $S$. This method not only relies on considerably more machinery but also yields a weaker theorem. Note that even in the case that $X$ is an infinite dimensional Banach space and $Y = \mathbb{R}$, we can find (open) discontinuous linear functionals, so that Theorem 1 is properly stronger than the results in [3], [4], [12]. When $R(S)$ is not equal to $Y$ we would still like to replace (7) by

(8) $T = LS$, where $L : Y \to Z$ is linear, continuous and satisfies $L(A) \subset B$.
The results in [5] or [10] on the extension of positive operators allow one to do this if the appropriate topological and order theoretic conditions are imposed. We content ourselves with establishing a theorem for the central case of the Farkas lemma, in which $Z = \mathbb{R}$.

Theorem 2. Let $X$ and $Y$ be topological vector spaces. Suppose that $S : X \to Y$ is a continuous linear mapping which is open relative to $R(S)$. Let $f : X \to \mathbb{R}$ be a continuous linear functional on $X$. Suppose that $A \subset Y$ is a closed convex cone with non-empty interior, $\operatorname{int} A$, and that

(9) $$R(S) \cap \operatorname{int} A \ne \emptyset.$$

Then the following are equivalent:

(10) $$Sx \in A \quad \text{implies} \quad f(x) \ge 0;$$

(11) $f = LS$, where $L : Y \to \mathbb{R}$ is a continuous linear functional on $Y$ which is non-negative on $A$.
Proof. It suffices, by Theorem 1, to replace (7) by (11). Thus we wish to show that any continuous linear functional $L$ on $R(S)$ which is non-negative on $A \cap R(S)$ may be extended to a continuous linear functional on $Y$ which is non-negative on the whole of $A$. This is established in a variety of places as a consequence of the Hahn-Banach theorem [10], [14] or of more general convexity theory [15]. Often it is established in an unnecessarily complicated manner [11]. Here we give a simple proof in the spirit of mathematical programming. Consider the convex program

(P) $$\inf\{Lx : (-1)x \in -A,\ x \in R(S)\}.$$

By assumption, (P) has its infimum attained at 0. We will apply the Lagrange multiplier theorem [2], [8], [15]. This is established in arbitrary topological vector spaces in [2] using multivalued notions. To apply the multiplier theorem, recall
J.M. Borwein/ Farkas lemmas
47
that a constraint qualification is required. Slater's constraint qualification (that some feasible point for (P) has its image under the constraint function in the interior of −A) is exactly (9). Thus from Corollary 2.7 of [2] we may derive the multiplier relationships: there exists a continuous linear functional l̄ on Y with

l̄(a) ≥ 0  ∀a ∈ A, (12)

and

Lx + l̄((−1)x) ≥ 0  ∀x ∈ R(S). (13)

Since (13) says that L and l̄ coincide on R(S) we are done. If A is a polyhedral cone and X and Y are finite dimensional, (12) and (13) will still hold as a consequence of linear duality theory [7], even if (9) fails. Thus, in Theorem 2 we have reestablished a generalization of all the classical Farkas lemmas.
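For orientation, in the polyhedral, finite dimensional case just mentioned, the equivalence (10) ⟺ (11) is the classical Farkas lemma. A sketch in this setting (the names c and λ are ours; S is realized by an m × n matrix, the cone A is the nonnegative orthant, and f(x) = c^T x):

```latex
% Classical special case of Theorem 2: X = R^n, Y = R^m, S a matrix,
% A = R^m_+, f(x) = c^T x.  Then (10) <=> (11) reads:
\[
  \bigl(\,Sx \ge 0 \;\Longrightarrow\; c^{\mathsf T}x \ge 0\,\bigr)
  \iff
  \exists\,\lambda \in \mathbb{R}^{m}_{+}:\; c^{\mathsf T} = \lambda^{\mathsf T} S ,
\]
% i.e. f = LS with L(y) = \lambda^{\mathsf T} y linear, continuous and
% non-negative on the cone A = R^m_+.
```

Here no interiority condition such as (9) is needed, in accordance with the polyhedral remark above.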
Acknowledgment. This work was partially produced while the author was a D.Phil. student under the supervision of Dr. M.A.H. Dempster of Balliol College, Oxford, whose continued interest is much appreciated. Research was partially funded on N.R.C. Account A4493.
References
[1] C. Berge, Topological spaces (Oliver and Boyd, London, 1963).
[2] J.M. Borwein, "Multivalued convexity and optimization: a unified approach to equality and inequality constraints", Mathematical Programming 13 (1977) 163–180.
[3] B.D. Craven, "Nonlinear programming in locally convex spaces", Journal of Optimization Theory and Applications 10 (1972) 197–210.
[4] T.M. Flett, "On differentiation in normed spaces", Journal of the London Mathematical Society 42 (1967) 523–533.
[5] D.H. Fremlin, Topological Riesz spaces and measure theory (Cambridge University Press, Cambridge, 1974).
[6] K. Kuratowski, Topology, Volume I (revised) (Academic Press, New York, 1966).
[7] R. Lehmann and W. Oettli, "The theorem of the alternative, the key theorem and the vector maximization problem", Mathematical Programming 8 (1975) 332–344.
[8] O.L. Mangasarian, Nonlinear programming (McGraw-Hill, New York, 1969).
[9] V.J. Mancuso, "An Ascoli theorem for multivalued functions", Journal of the Australian Mathematical Society 12 (1971) 466–477.
[10] A.L. Peressini, Ordered topological vector spaces (Harper and Row, New York, 1967).
[11] K. Ritter, "Optimization in linear spaces, I", Mathematische Annalen 182 (1969) 189–206.
[12] K. Ritter, "Optimization in linear spaces, II", Mathematische Annalen 183 (1969) 169–180.
[13] K. Ritter, "Optimization in linear spaces, III", Mathematische Annalen 184 (1970) 133–154.
[14] A.P. Robertson and W.J. Robertson, Topological vector spaces (Cambridge University Press, Cambridge, 1964).
[15] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, 1970).
Mathematical Programming Study 10 (1979) 48-68. North-Holland Publishing Company
EXTENSIONS OF THE CONTINUITY OF POINT-TO-SET MAPS: APPLICATIONS TO FIXED POINT ALGORITHMS

J. DENEL
Université de Lille 1, France

Received 17 January 1978
Revised manuscript received 24 March 1978
A new approach for synthesizing optimization algorithms is presented. New concepts for the continuity of point-to-set maps are given in terms of families of maps. These concepts are well adapted to constructing fixed point theorems that are widely useful for synthesizing optimization methods. The general algorithms already published are shown to be particular applications, and illustrations in the field of mathematical programming are given.

Key words: Point-to-Set Maps, Continuity, General Algorithms, Fixed Point Theorems, Synthesis, Optimization.
1. Introduction
For about ten years the growing number of optimization methods has been leading some authors to develop a synthetic approach to such algorithms in order to recognize their common features and a global condition for convergence. Zangwill [19] seems to have been the first to fully exploit the concept of point-to-set maps in the optimization area. He showed that many methods can be viewed as applications of the fixed point method, x_{n+1} ∈ F(x_n), where F is a point-to-set map that depends on the particular algorithm. This approach divides the convergence criteria into two kinds: the first ensures that any accumulation point of the sequence generated by the method is a fixed point of F; the second ensures that every fixed point has optimality properties for the problem. We are interested here in the description of fixed point algorithms which generalize those already published [15, 16, 17, 19] and enlarge their possible applications. The convergence conditions generally given for such general schemes are of two kinds. The first is the so-called strict monotonicity property (which means the existence of a function h such that x' ∈ F(x) and x ∉ F(x) implies h(x') > h(x)). The second condition uses the continuity properties of point-to-set maps (on these notions see for example Berge [1]; for the terminology used here, see [15] and the Appendix). Many papers are concerned with the study of the continuity of maps often used in the modelling of mathematical programs (see for example [1, 10, 15, 16]). Another topic that arises is the stability of
J. Denel/ Extensions of the continuity
49
optimization problems; it has been widely studied [3, 8, 9, 10, 15, 17]. These results show that the classical continuities of mappings are not well adapted, because when modelling one has to consider maps that are defined either by an operation (intersection, composition, ...) between maps or by the set of optimal solutions of a parametrized program. Unfortunately the stability of the classical continuities (in particular the lower one) is verified in only a few cases and always requires strong assumptions. This restricts the power of the already published schemes. In this paper we present a new synthetic approach to optimization methods that largely avoids this non-stability. The originality lies in the definition, in terms of families of maps, of properties like continuity, and in a different approach to algorithms (no maximizations at each iteration, but something that preserves the strict monotonicity property, which seems to be essential). This proposed approach is justified in the context of optimization by the new possible applications and leads to general fixed point algorithms that generalize [15, 17]. To develop this approach we consider families of point-to-set maps that depend on a parameter. These families, which we call p-decreasing families of maps, are defined in Section 1. Some essential definitions and properties needed for the description of fixed point algorithms are given (for more details see [5]). These definitions can be considered as extensions of the classical continuities; they allow us to define a "measure" for maps that are not lower or upper continuous. Besides this generalization, the formalism used here allows us to exhibit, for convergence purposes, a "minimal property" (the uniform regularity) that is not a continuity property. In Sections 2 and 3, two general algorithms are presented. They are generalizations of the one (and of the two) stage algorithms of [15, 17]. Illustrations of the possible applications are shown in Section 4.
Notations
R^n, the n-dimensional Euclidean space;
𝒫(X), the set of all subsets of X ⊂ R^n;
Å (Fr(A)), the interior (the boundary) of A ⊂ R^n;
Ā, the closure of A ⊂ R^n;
(x, y), the Euclidean scalar product on R^n;
[x, z], the convex hull of x, z ∈ R^n (segment);
lim_N x_n (respectively l̄im_N x_n), the smallest (respectively the greatest) accumulation point of the sequence {x_n ∈ R^n}_N.

1. p-Decreasing families of point-to-set maps: Definitions

In the following we shall have to consider families of point-to-set maps from X (⊂ R^n) into 𝒫(Y) (Y ⊂ R^m), which depend on a nonnegative parameter and
which are ordered by inclusion. Definitions and properties about such families are given in this section; for a complete study of p-decreasing families the reader is referred to [5].

Definition 1 (p-decreasing family). A family {F_p | p ≥ 0} of point-to-set maps from X into 𝒫(Y) is said to be a p-decreasing family if and only if
(i) ∀p ≥ 0, F_p : X → 𝒫(Y),
(ii) ∀x ∈ X, ∀p ≥ 0, ∀p' ≥ 0 (p' ≤ p): F_p(x) ⊂ F_{p'}(x).

Definition 2 (uniform regularity at x̄ ∈ X). A p-decreasing family {F_p | p ≥ 0} from X into 𝒫(Y) is uniformly regular at x̄ ∈ X if and only if

F_0(x̄) ≠ ∅ ⟹ ∃p > 0, ∃V(x̄) a neighbourhood of x̄: ∀x ∈ X ∩ V(x̄), F_p(x) ≠ ∅.

The family is uniformly regular on X if it is uniformly regular at every x ∈ X.

Remarks. (1) A p-decreasing family {F_p | p ≥ 0} will be called regular at x̄ if the property of Definition 2 holds only at x̄, that is, F_0(x̄) ≠ ∅ ⟹ ∃p > 0: F_p(x̄) ≠ ∅.
(2) It is convenient to call dense at x̄ ∈ X a p-decreasing family such that

F_0(x̄) ⊂ cl( ⋃_{p>0} F_p(x̄) );
it is obvious that density at x̄ implies regularity at x̄.
(3) In Definition 2 it is equivalent to say: F_0(x̄) ≠ ∅ ⟹ ∃p > 0, ∀{x_n ∈ X}_N → x̄, ∃n_0: ∀n ≥ n_0, F_p(x_n) ≠ ∅.
(4) Definition 2 is not concerned with a notion of continuity of the map F_0 (in the classical meaning). But it will be seen (Sections 2, 3) that this notion is the one that is needed for convergence results in the fixed point theorems.

To ensure the stability of the uniform regularity of p-decreasing families under elementary operations we have to introduce an extension of the classical lower continuity.

Definition 3 (pseudo-lower-continuity (or p.l.c.) at x̄ ∈ X). A p-decreasing family {F_p | p ≥ 0} from X into 𝒫(Y) is pseudo-lower-continuous at x̄ ∈ X if and only if, for every {x_n ∈ X}_N → x̄, every p, p' > 0 (p' < p) and every y ∈ F_p(x̄), we have

∃n_0, ∃{y_n ∈ Y}_N → y: ∀n ≥ n_0, y_n ∈ F_{p'}(x_n).

The family is p.l.c. on X if it is p.l.c. at every x ∈ X.

Remarks. (5) If the family {F_p | p ≥ 0} does not depend on the parameter p, Definition 3 is obviously the definition of lower-continuity; but the p.l.c. at x̄ of a p-decreasing family does not imply the lower continuity (in the classical sense) of the map F_0. Conversely, even if F_0 is lower continuous at x̄ ∈ X, the p-decreasing family {F_p | p ≥ 0} may not be p.l.c. at x̄. But it is easy to
see that if a p-decreasing family is dense and p.l.c. at x̄ ∈ X then the map F_0 is lower continuous at x̄.
(6) In the above definitions, the variable x and the parameter p play distinct parts. In fact to a p-decreasing family we can associate a map G defined on X × R_+ into 𝒫(Y) by G(x, p) = F_p(x). It is easy to verify that the lower continuity of the map G implies the pseudo-lower-continuity of the family; the converse is not true.
Definition 4 (pseudo-upper-continuity (or p.u.c.) at x̄ ∈ X). A p-decreasing family {F_p | p ≥ 0} from X into 𝒫(Y) is pseudo-upper-continuous at x̄ ∈ X if and only if, for every {x_n ∈ X}_N → x̄ and every {y_n ∈ Y}_N → y such that ∃p > 0, ∃n_0: ∀n ≥ n_0, y_n ∈ F_p(x_n), we have y ∈ F_0(x̄). The family is p.u.c. on X if it is p.u.c. at every x ∈ X.
Remarks. (7) If the family {F_p | p ≥ 0} does not depend on the parameter p, Definition 4 is obviously the classical definition of upper continuity. It is easy to construct families that are p.u.c. with the map F_0 not upper continuous.
(8) The previous definitions have been given in terms of sequences, which is well adapted to the study of algorithms. In [5], these definitions are given in topological spaces and the connections with the definitions given here are studied.

Let us now give some properties that will be used in the following sections.
Proposition 1. Let there be given a p-decreasing family {F_p | p ≥ 0} from X into 𝒫(Y), and denote M = {x ∈ X | F_0(x) = ∅}. Then:

{F_p | p ≥ 0} uniformly regular on X ⟹ M closed.

Proof. If M = ∅ the result is immediate; so assume M ≠ ∅ and consider a sequence {x_n ∈ M}_N converging to x̄. For contradiction, assume x̄ ∉ M. Then F_0(x̄) ≠ ∅, and by Remark 3 the uniform regularity of {F_p | p ≥ 0} at x̄ gives ∃p > 0, ∃n_0: n ≥ n_0 ⟹ F_p(x_n) ≠ ∅. But ∀n, F_0(x_n) = ∅ and F_p(x_n) ⊂ F_0(x_n), leading to a contradiction.
Proposition 2. If {F_p | p ≥ 0} is uniformly regular at x̄ ∈ X and if there exist

{x_n ∈ X}_N → x̄,  {p_n ∈ R_+}_N → 0

such that F_{p_n}(x_n) = ∅ for every n, then F_0(x̄) = ∅.

Proof. Suppose, for contradiction, that F_0(x̄) ≠ ∅. Then (a) {F_p | p ≥ 0} uniformly regular at x̄ ⟹ ∃p > 0, ∃V(x̄): ∀x ∈ X ∩ V(x̄), F_p(x) ≠ ∅; and (b) {x_n ∈ X}_N → x̄ ⟹ ∃n_0: n ≥ n_0 ⟹ x_n ∈ X ∩ V(x̄). Together these imply

∃n_0: ∀n ≥ n_0, F_p(x_n) ≠ ∅. (1)

But

{p_n ≥ 0}_N → 0 and {F_p | p ≥ 0} p-decreasing

imply

∃n_1: ∀n ≥ n_1, p_n ≤ p and F_p(x_n) ⊂ F_{p_n}(x_n) = ∅,
which contradicts result (1).

With the following proposition we give a result similar to the stability of optimization programs. The statement proposed here will be used in Section 2. Let us consider X ⊂ R^n, Y ⊂ R^p, X closed; f : X × Y → R; φ : Y → R; γ : R_+ → R_+ a monotone strictly increasing continuous function such that γ(0) = 0; {A_p | p ≥ 0} a p-decreasing family from Y into 𝒫(X). Proposition 3 below gives results about the family {F_p | p ≥ 0} defined by

∀y ∈ Y, ∀p > 0: F_p(y) = {x ∈ A_p(y) | f(x, y) ≥ φ(y) + γ(p)},
∀y ∈ Y, p = 0: F_0(y) = {x ∈ A_0(y) | f(x, y) > φ(y)}.

Proposition 3. With the notations and assumptions given above, {F_p | p ≥ 0} is a p-decreasing family and we have:

(i) f upper semi-continuous on X × {ȳ}, φ lower semi-continuous at ȳ, {A_p | p ≥ 0} p.u.c. at ȳ ⟹ {F_p | p ≥ 0} p.u.c. at ȳ;

(ii) f lower semi-continuous on X × {ȳ}, φ upper semi-continuous at ȳ, {A_p | p ≥ 0} dense and p.l.c. at ȳ ⟹ {F_p | p ≥ 0} dense, uniformly regular and p.l.c. at ȳ.
Proof. (i) Consider {y_n ∈ Y}_N → ȳ and {x_n ∈ X}_N → x̄ such that ∃p_0 > 0, ∃n_0: ∀n ≥ n_0, x_n ∈ F_{p_0}(y_n). X closed ⟹ x̄ ∈ X; furthermore, because {A_p | p ≥ 0} is pseudo-upper-continuous at ȳ, we have x̄ ∈ A_0(ȳ), and for every n ≥ n_0 we have

f(x_n, y_n) ≥ φ(y_n) + γ(p_0).

Taking the limit (n → +∞) in this result and using the continuity properties of f and φ we obtain

f(x̄, ȳ) ≥ φ(ȳ) + γ(p_0) > φ(ȳ),

which implies, with x̄ ∈ A_0(ȳ), that x̄ ∈ F_0(ȳ).
(ii) We first have to show that F_0(ȳ) ⊂ cl(⋃_{p>0} F_p(ȳ)). If x̄ ∈ F_0(ȳ) we have x̄ ∈ A_0(ȳ) and f(x̄, ȳ) > φ(ȳ). The density of the family {A_p | p ≥ 0} implies the existence of a sequence {x_n ∈ X}_N converging to x̄ with x_n ∈ ⋃_{p>0} A_p(ȳ) for every n, or equivalently

∀n ∈ N, ∃p_n > 0: x_n ∈ A_{p_n}(ȳ). (1)

On the other hand, the assumption about γ implies that there exists p_0 > 0 such that 0 < γ(p_0) < f(x̄, ȳ) − φ(ȳ), since f(x̄, ȳ) > φ(ȳ), and then ∃n_0: ∀n ≥ n_0, f(x_n, ȳ) > φ(ȳ) + γ(p_0). This result with (1) shows that for every n ≥ n_0 there exists p'_n = min{p_n, p_0} such that x_n ∈ F_{p'_n}(ȳ); hence x̄ ∈ cl(⋃_{p>0} F_p(ȳ)).

Let us now prove the pseudo-lower-continuity of {F_p | p ≥ 0} at ȳ. Consider p, p' > 0 (p' < p), {y_n ∈ Y}_N → ȳ and x̄ ∈ F_p(ȳ). Since {A_p | p ≥ 0} is p.l.c. at ȳ and x̄ ∈ A_p(ȳ), we have

∃n_0, ∃{x_n ∈ X}_N → x̄: ∀n ≥ n_0, x_n ∈ A_{p'}(y_n).

We will show that this sequence {x_n ∈ X}_N verifies f(x_n, y_n) > φ(y_n) + γ(p') for every n ≥ n_1. For contradiction, assume there exists a subsequence {x_n ∈ X}_{N'} such that ∀n ∈ N', f(x_n, y_n) ≤ φ(y_n) + γ(p'). Using the continuity of f and φ, and taking the limit (n ∈ N'), we obtain

f(x̄, ȳ) ≤ φ(ȳ) + γ(p') < φ(ȳ) + γ(p)

(because γ is strictly increasing), and hence a contradiction with x̄ ∈ F_p(ȳ).

Finally, to prove the uniform regularity at ȳ, we observe that the regularity (cf. Remarks 1, 2 in Section 1) and the pseudo-lower-continuity of a p-decreasing family at ȳ imply its uniform regularity (see [5, P.I.3, p. 10]).
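To make the construction of Proposition 3 concrete, here is a small numerical sketch (the particular choices of f, φ, γ and all names are ours, not the paper's): X is a finite grid, A_p(y) = X for every p, and we check that the resulting family {F_p} is ordered by inclusion in p, as Definition 1 requires.

```python
import numpy as np

# Toy instance of the construction in Proposition 3: A_p(y) = X for every p,
# f(x, y) = 1 - (x - y)^2 (peak value 1 at x = y), phi(y) = 0, gamma(p) = p.
X = np.linspace(-2.0, 2.0, 81)
f = lambda x, y: 1.0 - (x - y) ** 2
phi = lambda y: 0.0
gamma = lambda p: p        # strictly increasing, continuous, gamma(0) = 0

def F(p, y):
    """F_p(y): strict improvement over phi(y) for p = 0, margin gamma(p) for p > 0."""
    if p == 0.0:
        return {x for x in X if f(x, y) > phi(y)}
    return {x for x in X if f(x, y) >= phi(y) + gamma(p)}

# p-decreasing (Definition 1): p' <= p implies F_p(y) contained in F_p'(y)
sets = [F(p, 0.0) for p in (0.0, 0.25, 0.5, 1.0)]
assert all(sets[i + 1] <= sets[i] for i in range(3))
```

Since f is continuous and γ(p) = p, the hypotheses of both parts of Proposition 3 hold trivially here; the family is also regular at ȳ = 0 because F_p(0) still contains x = 0 for every p ≤ 1.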
2. A one stage algorithm

2.1. Principle, assumptions of algorithm A_1

Here we describe a fixed point algorithm constructing a feasible sequence {x_n}_N for solving the general problem:

(𝒫)  maximize f(x), subject to x ∈ A ⊂ R^n.
2.1.1. Intuitive description
A well known approach to solve problem (𝒫) is to replace the direct solving by an infinite sequence of optimization sub-problems, associated with the current solution x, easier to solve than (𝒫) because their domain Ω(x) and/or their objective function g(·, x) are simpler. More precisely, assume that to every point x ∈ A is associated a subset Ω(x) ⊂ A. Let us denote by P_0(x) the subset of Ω(x) defined by

P_0(x) = {y ∈ Ω(x) | g(y, x) > g(x, x)},

where g is the objective function of the sub-problems, properly associated with f. In the classical approach, [15] or [17], the successor x' of x is chosen among the points which maximize g(·, x) over P_0(x), and hence, to prove convergence, the continuity properties of the optimal solution set of a parametrized problem have to be used. Intuitively, the set P_0(x) may be considered as the union of the sets P_p(x), p > 0, where (see Fig. 1)

P_p(x) = {y ∈ Ω(x) | g(y, x) ≥ g(x, x) + p}.
Fig. 1.
In the proposed approach, a step consists, x ∈ A and p > 0 being given, in arbitrarily choosing x' in P_p(x) if P_p(x) is not empty. If P_p(x) is empty, and only in this event, the step consists in setting x' = x and in reducing the parameter p (for example p' = p/2). The classical maximization of g(·, x) over Ω(x) is obviously a particular case. We shall present two versions of such a one stage algorithm: the first one when x' is arbitrarily chosen (no maximizations), the second one when the point x' maximizes g(·, x) over P_p(x). The convergence results are of course stronger with this second version. It is to be noticed that to recognize that a set P_p(x) = ∅ is as difficult as to recognize that a point x maximizes the function g(·, x) over Ω(x) within some error ε, this knowledge being required in the classical schemes.
2.1.2. Assumptions
Let us consider: E ⊂ R^n, a compact subset; {F_p | p ≥ 0}, a p-decreasing family from E into 𝒫(E); and assume:
H1: M = {x ∈ E | F_0(x) = ∅} is not empty.
H2: The family {F_p | p ≥ 0} is uniformly regular on E.
H3: There exist h : E → R and a : ]0, +∞[ × E → R such that
(a) ∃K: ∀x ∈ E − M, h(x) ≤ K;
(b) ∀x ∈ E − M, ∀p > 0: x' ∈ F_p(x) ⟹ h(x') ≥ a(p, x);
(c) ∀{x_n ∈ E − M}_N such that {h(x_n)}_N has a limit h̄, we have

∀p > 0, h̄ < lim_N a(p, x_n).
Remarks. (9) An example of function a is given, in a lot of applications, by a(p, x) = h(x) + p. In Section 2.4 it is shown that the classical relation [15] between the original objective function f of problem (𝒫) and the related function g implies the existence of such a function.
(10) ∀x ∈ E − M, ∀p > 0: h(x) < a(p, x). To prove this, consider in H3c a sequence {x_n}_N with x_n = x for every n. With this remark, assumption H3 can be seen as a property like the strict monotonicity property.
(11) In terms of p-decreasing families, assumption H2 replaces the upper-continuity in Zangwill's theorem.
2.2. Description of algorithm A_1
Let there be given starting values x_0 ∈ E, p_0 > 0 and a scalar β ∈ ]0, 1[.
Step n:
  if F_{p_n}(x_n) = ∅, then x_{n+1} = x_n, 0 < p_{n+1} ≤ βp_n;
  otherwise choose x_{n+1} ∈ F_{p_n}(x_n) and set p_{n+1} = p_n.
end of step n.
The sequence generated by this one stage algorithm may be constant for n ≥ n_0; the scalar β ensures that the sequence {p_n}_N converges to zero if the event F_{p_n}(x_n) = ∅ occurs infinitely often.
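As an entirely illustrative sketch of such a one stage scheme, the following Python fragment runs it on a toy finite problem, with h = f and F_p(x) = {y ∈ E | f(y) ≥ f(x) + p}; the function and variable names are ours, not the paper's.

```python
def algorithm_A1(E, f, x0, p0=1.0, beta=0.5, steps=100):
    """One-stage scheme: accept an arbitrary point of F_p(x);
    when F_p(x) is empty, keep x and shrink p by the factor beta."""
    x, p = x0, p0
    for _ in range(steps):
        F_p = [y for y in E if f(y) >= f(x) + p]   # the set F_p(x)
        if F_p:
            x = F_p[0]       # arbitrary choice, no maximization
        else:
            p = beta * p     # F_p(x) empty: reduce the parameter
    return x, p

# maximize f(x) = -(x - 3)^2 over E = {0, ..., 6}
E = range(7)
f = lambda x: -(x - 3) ** 2
x_star, p_final = algorithm_A1(E, f, x0=0)   # → x_star = 3
```

Here F_0(x) = {y | f(y) > f(x)} is empty exactly at the maximizer, so M = {3}; the run settles at x = 3 while p is halved forever, mirroring the role of β above.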
2.3. Convergence results
Theorem 1. Under the assumptions H1, H2, H3 there exists a well determined subsequence of the sequence {x_n}_N constructed by A_1 having its accumulation points in M.
Proof. We may suppose that for every n, x_n ∉ M; otherwise, if x_{n_0} ∈ M, then F_0(x_{n_0}) = ∅. Thus we would have F_p(x_{n_0}) = ∅ for every p > 0 and the sequence would be constant for n ≥ n_0 and equal to x_{n_0}. With {x_n ∈ E − M}_N let us show that: (1) {h(x_n)}_N has a limit h̄; (2) {p_n}_N converges to zero.
(1) By construction, assumptions H3b (completed by Remark 10) and H3a, the sequence {h(x_n)}_N is an upper-bounded nondecreasing sequence, and thus converges to some h̄.
(2) The sequence {p_n}_N is a non-increasing positive sequence; it has a limit p*. Assume that p* > 0. From the construction of the sequence {p_n}_N it is obvious then that there exists n_0 such that ∀n ≥ n_0:

p_n = p_{n_0} = p*,  F_{p_n}(x_n) = F_{p*}(x_n) ≠ ∅.

So we have, using H3b: ∀n ≥ n_0, h(x_{n+1}) ≥ a(p*, x_n). This is inconsistent with {h(x_n)}_N → h̄ (use H3c and n → +∞). Thus there exists an infinite well-determined subset N_1 ⊂ N such that

∀n ∈ N_1, F_{p_n}(x_n) = ∅.
The result follows by applying Proposition 2 to any convergent subsequence of the (compact) sequence {x_n}_{N_1}.

We propose with Theorems 2 and 3 stronger results according to stronger assumptions (in Theorem 2) or according to a modification of each step in A_1 (version A_1', Theorem 3).

Theorem 2. Under the assumptions H1, H2, H3 and
(H4) h is l.s.c. at any x ∈ M and u.s.c. on E,
(H5) if x ∈ M, then ∀y ∈ E: h(y) ≤ h(x),
algorithm A_1 constructs a sequence {x_n}_N having all its accumulation points in M.

Remark. The latter assumption H5 implies that M is the set of optimal solutions of the problem sup{h(x) | x ∈ E}. This assumption will often be verified in the applications (concave maximizations) and is similar to hypothesis H_d1 in [15, p. 313].

Proof. The results from Theorem 1 are available. Since E is compact there exists x̄ ∈ M, limit of a subsequence N_1' ⊂ N_1 (where N_1 is determined in Theorem 1). Then lim{x_n}_{N_1'} = x̄ ∈ M, h l.s.c. on M and {h(x_n)}_N → h̄ imply h(x̄) ≤ h̄; with H5 (∀y ∈ E: h(y) ≤ h(x̄)) this yields

∀y ∈ E: h(y) ≤ h̄. (1)
Let us now consider any accumulation point x* = lim{x_n | n ∈ N' ⊂ N} and assume that x* ∉ M. Then F_0(x*) ≠ ∅, and H2 (regularity at x*) gives ∃p_0 > 0: F_{p_0}(x*) ≠ ∅; by H3b and Remark 10,

∃y ∈ F_{p_0}(x*): h(y) ≥ a(p_0, x*) > h(x*),

yielding ∃y ∈ E: h(y) > h(x*). But lim{x_n}_{N'} = x* ∈ E − M, h u.s.c. on E − M and {h(x_n)}_N → h̄ imply h̄ ≤ h(x*), hence ∃y ∈ E: h̄ < h(y); this contradicts result (1).
Version A_1' of algorithm A_1. In this version A_1', we modify the choice of x_{n+1} in F_{p_n}(x_n) whenever F_{p_n}(x_n) ≠ ∅. The successor x_{n+1} is chosen in F_{p_n}(x_n) as a point which maximizes the function h within some error over the set F_{p_n}(x_n).

Description of A_1'. Let there be given x_0 ∈ E, p_0 > 0, β ∈ ]0, 1[ and {γ_n}_N a sequence of nonnegative reals converging to zero.
Step n:
  if F_{p_n}(x_n) = ∅, then x_{n+1} = x_n, 0 < p_{n+1} ≤ βp_n;
  otherwise choose x_{n+1} ∈ F_{p_n}(x_n) such that ∀y ∈ F_{p_n}(x_n), h(y) ≤ h(x_{n+1}) + γ_n, and set p_{n+1} = p_n.
end of step n.
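The inexact-maximization step can be sketched in Python on a toy finite problem (all names are ours; γ_n = 1/(n+1) is just one admissible tolerance sequence converging to zero):

```python
def algorithm_A1_prime(E, f, x0, p0=1.0, beta=0.5, steps=100):
    """Inexact-maximization variant: over a nonempty F_p(x), take a
    near-maximizer of h = f within a tolerance gamma_n -> 0."""
    x, p = x0, p0
    for n in range(steps):
        F_p = [y for y in E if f(y) >= f(x) + p]
        if not F_p:
            p = beta * p          # F_p(x) empty: shrink the parameter
            continue
        gamma_n = 1.0 / (n + 1)   # tolerance sequence, converges to zero
        best = max(f(y) for y in F_p)
        # any y with f(y) >= best - gamma_n is an admissible successor
        x = next(y for y in F_p if f(y) >= best - gamma_n)
    return x

# maximize f(x) = -(x - 3)^2 over {0, ..., 6}
x_star = algorithm_A1_prime(range(7), lambda x: -(x - 3) ** 2, x0=0)
```

The only change from the arbitrary-choice version is the near-maximization line; as the tolerance shrinks, the choice approaches the exact maximization of the classical schemes.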
Theorem 3. Under the assumptions H1, H2, H3, every accumulation point of the sequence {x_n}_N constructed by A_1' is in M.
Proof. Points 1 and 2 from Theorem 1 are available, that is {h(x_n)}_N → h̄ and {p_n}_N → 0. Let us then consider any accumulation point x* = lim{x_n}_{N'} of the sequence {x_n}_N and assume that x* ∉ M. F_0(x*) ≠ ∅ and H2 imply

∃p_0 > 0, ∃n_0: ∀n ≥ n_0 (n ∈ N'), F_{p_0}(x_n) ≠ ∅.

Thus, for every n ≥ n_0 (n ∈ N') there exists y_n such that y_n ∈ F_{p_0}(x_n) and, by H3b, h(y_n) ≥ a(p_0, x_n). Since {p_n}_N → 0, we can prove as in Theorem 1 the existence of n_1 such that

∀n ≥ n_1 (n ∈ N'), y_n ∈ F_{p_n}(x_n).

But the choice of x_{n+1} in F_{p_n}(x_n) implies that ∀n ≥ max{n_0, n_1}, n ∈ N':

a(p_0, x_n) ≤ h(y_n) ≤ h(x_{n+1}) + γ_n.

Taking the limit (n ∈ N') we conclude lim_{N'} a(p_0, x_n) ≤ h̄. So, a contradiction by using H3c.
2.4. Application to optimization
It is shown in this section that the previous one stage fixed point algorithm A_1 (or A_1') leads to an application for solving problem (𝒫) that generalizes the classical one stage schemes already published by Meyer [17] and Huard [14, 15]. Let us recall briefly their description: assume A is compact, f continuous on A, and consider: Δ, a map : A → 𝒫(A) satisfying x ∈ Δ(x), ∀x ∈ A; g : A × A → R a continuous function such that ∀x ∈ A, ∀y ∈ Δ(x): g(x, x) < g(y, x) ⟹ f(x) < f(y). The application of Zangwill's theorem leads to the following algorithm (Theorem 1.6 of [17], or Theorem 4.1 of [15]); it is to be noticed that this related function g, introduced by Huard, is of great interest when modelling optimization methods.
Step n:
  x_n ∈ A being given, choose x_{n+1} ∈ M_Δ(x_n) = {y ∈ Δ(x_n) | g(y, x_n) ≥ g(t, x_n), ∀t ∈ Δ(x_n)} if x_n ∉ M_Δ(x_n).
When the map Δ is continuous on A (i.e. upper and lower continuous), the convergence result is that any accumulation point of the generated sequence is a fixed point of the map M_Δ. It is easy to describe these theorems with algorithm A_1 by defining the p-decreasing family {P_p | p ≥ 0} as follows:

∀x ∈ A, ∀p > 0: P_p(x) = {y ∈ Δ(x) | g(y, x) ≥ g(x, x) + p},
p = 0: P_0(x) = {y ∈ Δ(x) | g(y, x) > g(x, x)}.

The use of A_1' with this family leads to the same sequences as the ones of [17] or [15]. Furthermore it is possible to apply A_1 (without maximizations). Let us show that the assumptions H1, H2, H3 are satisfied.
H1: f being continuous on the compact A implies the existence of x̄ ∈ A such that ∀y ∈ A, f(y) ≤ f(x̄). Assume P_0(x̄) ≠ ∅: there exists y ∈ Δ(x̄) with g(y, x̄) > g(x̄, x̄), which implies (by the assumption on g) that f(y) > f(x̄), so, a contradiction. Hence, M is not empty.
H2: {P_p | p ≥ 0} is of course a p-decreasing family from A into 𝒫(A); its uniform regularity is given by Proposition 3 (Section 1): use {A_p | p ≥ 0} = {Δ | p ≥ 0}, φ(x) = g(x, x), γ(p) = p, and the fact that Δ being lower continuous implies that the family {Δ | p ≥ 0}, which does not depend on p, is dense and pseudo-lower-continuous on A.
H3: Let us set h = f and define a by

∀x ∈ A, ∀p > 0: a(p, x) = min{f(y) | y ∈ Δ(x), g(y, x) ≥ g(x, x) + p};

if the set {y ∈ Δ(x) | g(y, x) ≥ g(x, x) + p} = ∅, then a(p, x) is set to +∞. With this definition it is obvious that the assumptions H3a and H3b are satisfied. Furthermore, the upper continuity of Δ implies H3c.

Another application of Zangwill's theorem (which is demonstrated by a particular proof) with a weaker assumption on Δ (i.e. only lower continuous) has been presented by Meyer (Theorem 1.7 of [17]) or, more precisely, by Huard [14]. This application requires the use of the original objective function of problem (𝒫) instead of a related function g(·, x) at each iteration. Again, algorithm A_1 generalizes these results; to show this, the previous verifications are still available, but in H3 the function a will be defined by ∀y ∈ A, ∀p ≥ 0: a(p, y) = f(y).
3. A two-stage algorithm A_2
The algorithm described in this section is similar to the one published by Huard [15, p. 317]. However the use of p-decreasing families allows one to enlarge the applications of this type of general algorithm in the field of mathematical programming. A two-stage algorithm is described as follows: to a feasible point x is associated a set Δ(x) (for example the set of all the descent directions in unconstrained optimization). An arbitrary z in Δ(x) being chosen, the successor x' is picked in the set F(x, z), where F is a point-to-set map. The set F(x, z) consists generally of the points of the segment [x, z] which satisfy a given property (for example the points maximizing a function). In Section 3.1 assumptions and notations are given for the description of algorithm A_2 (Section 3.2). The convergence is proved in Section 3.3.
3.1. Assumptions, notations for algorithm A_2
Let us consider E_0 ⊂ R^n, E_1 ⊂ R^n two compact subsets; {Δ_p | p ≥ 0} and {F_p | p ≥ 0} two p-decreasing families from E_0 into 𝒫(E_1) and from E_0 × E_1 into 𝒫(E_0). Denote

M_1 = {x ∈ E_0 | Δ_0(x) = ∅},
M_2 = {x ∈ E_0 | ∃z ∈ Δ_0(x), F_0(x, z) = ∅},
M = M_1 ∪ M_2.

And assume:
H_1': (a) M ≠ ∅; (b) the maps Δ_0 and F_0 are such that M_1 ≠ ∅ ⟹ M_2 = ∅.
H_2': (a) the family {Δ_p | p ≥ 0} is uniformly regular; (b) the family {F_p | p ≥ 0} is uniformly regular; (c) the family {Δ_p | p ≥ 0} is pseudo-upper-continuous.
H_3': ∃h : E_0 → R and ∃a : ]0, +∞[ × ]0, +∞[ × E_0 → R such that
(a) ∃K < +∞: ∀x ∈ E_0, h(x) ≤ K;
(b) ∀x ∈ E_0 − M, ∀p > 0, ∀p' > 0, we have

z ∈ Δ_p(x) and x' ∈ F_{p'}(x, z) ⟹ h(x') ≥ a(p, p', x);

(c) ∀{x_n ∈ E_0}_N such that {h(x_n)}_N has a limit h̄: ∀p > 0, ∀p' > 0, h̄ < lim_N a(p, p', x_n).

Remarks. (1) H_3'c ⟹ ∀x ∈ E_0, ∀p, p' > 0: h(x) < a(p, p', x).
(2) The function a can be a(p, p', x) = h(x) + pp'; in a lot of applications a does not depend on the parameter p.
3.2. Description of algorithm A_2
Starting values: x_0 ∈ E_0, p_0 > 0, p_0' > 0 and a scalar β ∈ ]0, 1[.
Step n:
  1st stage: if Δ_{p_n}(x_n) = ∅, then x_{n+1} = x_n, 0 < p_{n+1} ≤ βp_n, 0 < p'_{n+1} ≤ p'_n; otherwise choose z_n ∈ Δ_{p_n}(x_n) and go to the 2nd stage.
  2nd stage: if F_{p'_n}(x_n, z_n) = ∅, then x_{n+1} = x_n, p_{n+1} = p_n, 0 < p'_{n+1} ≤ βp'_n; otherwise choose x_{n+1} ∈ F_{p'_n}(x_n, z_n), p_{n+1} = p_n, p'_{n+1} = p'_n.
end of step n.
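The two stages can be sketched in Python on a toy one-dimensional problem (all names and the concrete choices of the two families are ours, not the paper's): Δ_p(x) collects candidate points improving f by at least p, and F_q(x, z) collects points of the segment [x, z] improving f by at least q.

```python
def algorithm_A2(E0, f, x0, p0=1.0, q0=1.0, beta=0.5, steps=200):
    """Two-stage scheme (sketch): stage 1 picks z in Delta_p(x),
    stage 2 picks the successor in F_q(x, z); an empty set shrinks
    the corresponding parameter by the factor beta."""
    x, p, q = x0, p0, q0
    for _ in range(steps):
        # 1st stage: Delta_p(x) = points improving f by at least p
        Delta = [z for z in E0 if f(z) >= f(x) + p]
        if not Delta:
            p = beta * p                  # Delta_p(x) empty: shrink p
            continue
        z = Delta[0]                      # arbitrary choice in Delta_p(x)
        # 2nd stage: F_q(x, z) = points of the segment [x, z] (within E0)
        # improving f by at least q
        segment = [y for y in E0 if min(x, z) <= y <= max(x, z)]
        F = [y for y in segment if f(y) >= f(x) + q]
        if not F:
            q = beta * q                  # F_q(x, z) empty: shrink q
        else:
            x = F[0]
    return x

# maximize f(x) = -(x - 3)^2 over E0 = {0, ..., 6}
x_star = algorithm_A2(range(7), lambda x: -(x - 3) ** 2, x0=0)
```

In this toy run the iterate settles at the maximizer, after which Δ_p(x) is empty for every p > 0 and only the parameter p keeps shrinking, as in the convergence analysis below.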
3.3. Convergence
Theorem 4. Under the assumptions H_1', H_2', H_3' there exists a well-determined subsequence of the sequence {x_n}_N constructed by algorithm A_2 having its accumulation points in M.
Proof. We may assume that for every n, x_n ∉ M; otherwise the proof is achieved. Let us show the three following points:
(1) {h(x_n)}_N has a limit h̄.
(2) At least one among the two sequences {p_n}_N or {p'_n}_N converges to 0.
(3) M_1 ≠ ∅ ⟺ {p_n}_N → 0.
The proofs of points 1 and 2 are very similar to the corresponding ones in Theorem 1. Let us show point 3. First, if {p_n}_N → 0 there exists, by construction of A_2, an infinite subset N_1 such that ∀n ∈ N_1, Δ_{p_n}(x_n) = ∅. By extracting a convergent subsequence {x_n}_{N_1' ⊂ N_1} (E_0 compact) and using Proposition 2 with this sequence, we prove that M_1 ≠ ∅. Second, assume M_1 ≠ ∅ and {p_n}_N → p* > 0 ({p_n}_N converges). This implies

∃n_0: ∀n ≥ n_0, z_n ∈ Δ_{p_n}(x_n) ⊂ Δ_{p*}(x_n);

but, from point 2, {p'_n}_N → 0 implies the existence of N_2 ⊂ N such that

∀n ∈ N_2, z_n ∈ Δ_{p*}(x_n) and F_{p'_n}(x_n, z_n) = ∅.

Again we can apply Proposition 2 with a subsequence N_2' ⊂ N_2 such that {z_n}_{N_2'} → z̄ and {x_n}_{N_2'} → x̄ (E_0 and E_1 compact). Hence we have, using the p.u.c. of {Δ_p | p ≥ 0}, z̄ ∈ Δ_0(x̄) with F_0(x̄, z̄) = ∅, i.e. M_2 ≠ ∅. This is inconsistent with M_1 ≠ ∅ (see H_1'b).
In conclusion: (a) If M_1 ≠ ∅, the sequence N_1 of indices such that n ∈ N_1 ⟹ Δ_{p_n}(x_n) = ∅ is well-determined. Proposition 2 proves that the accumulation points of {x_n}_{N_1} are in M. (b) If M_1 = ∅, the sequence N_2 of indices such that

∀n ∈ N_2, ∃z_n ∈ Δ_{p_n}(x_n) ⊂ Δ_{p*}(x_n): F_{p'_n}(x_n, z_n) = ∅

(p* is the limit of {p_n}_N) is well-determined and again Proposition 2 proves the theorem.

The following theorem (similar to Theorem 2 in Section 2) holds for algorithm
A_2.
Theorem 5. If the function h is l.s.c. at any x ∈ M and u.s.c. on E_0, if x ∈ M implies ∀y ∈ E_0, h(y) ≤ h(x), and if the assumptions H_1', H_2', H_3' are satisfied, then every accumulation point of the sequence {x_n}_N is in M.

The proof is omitted because it is similar to the proof of Theorem 2. (Remark: x_0 ∉ M ⟹ Δ_0(x_0) ≠ ∅ and ∀z ∈ Δ_0(x_0), F_0(x_0, z) ≠ ∅.)

As in Section 2.3 it is possible to derive from algorithm A_2 particular versions (denoted by A_2' and A_2''). In version A_2', inaccurate maximizations are performed in F_{p'_n}(x_n, z_n) to determine x_{n+1} (that is, if F_{p'_n}(x_n, z_n) ≠ ∅, choose x_{n+1} ∈ F_{p'_n}(x_n, z_n) such that ∀y ∈ F_{p'_n}(x_n, z_n), h(y) ≤ h(x_{n+1}) + γ_n). As shown by Theorem 6, this version A_2' leads to convergence results if M_1 = ∅, or if M_1 ≠ ∅ and Δ_0 is upper-continuous on E_0 − M. Whenever M_1 ≠ ∅ and Δ_0 is not upper-continuous, no results can be proved. This is why we define a modified version A_2'' of A_2'. It will be assumed in A_2'' that z_n is arbitrarily chosen (if Δ_{p_n}(x_n) ≠ ∅) in Δ_{p̂_n}(x_n) ⊂ Δ_{p_n}(x_n) with p̂_n = min{p_0, sup{p > 0 | Δ_p(x_n) ≠ ∅}}. This modification of A_2' leads to Theorem 6' (every accumulation point is in M provided H_1', H_2', H_3' are satisfied). It is to be noticed that this version is not implementable (at each step, find p̂_n) but it will be useful in some applications to obtain the same convergence results as those already published.

Theorem 6. If the assumptions H_1', H_2', H_3' are satisfied, then every accumulation point of the sequence {x_n}_N constructed by A_2' is in M provided
(a) M_1 = ∅, or
(b) M_1 ≠ ∅ and Δ_0 upper-continuous on E_0 − M (classical definition) instead of H_2'c.
{h(x.)}N --->h, {p'}N-->O and
{p,}N--*p*>O.
Now let us consider x* = lim{xn}N, any accumulation point. If there exists N" C N' such that:
Vn E N",
z. E Ap.(x.)C Ap.(x.)
and
Fp,(x., z.) = (~,
then Proposition 2 applied to a sub sequence N'{ C N " such that z. ~ ~ (n E N'[) implies :If E A0(2) (by Hic) Fo(x*, ~) = ~t. If such a N" does not exist, that is: :IN0: V n >- no(n E N')
z. E Ap.(x.)
and
Fp,(x., z.) ~ O,
then the assumption x* ∉ M leads to a contradiction (use H'1b, H'1c).
J. Denel / Extensions of the continuity    63
Case (b). The proof is similar and without particular difficulties; the use of assumption H'1c is replaced by the upper-continuity of A_0.
Theorem 6'. If the assumptions H'1, H'2, H'3 are satisfied, then every accumulation point of the sequence constructed by A''2 is in M.

Proof. If M_1 = ∅, or M_1 ≠ ∅ and A_0 upper-continuous, then Theorem 6 proves the result. In the other case, it is clear that {ρ_n}_N → 0. Let us consider x* = lim{x_n}_{N'}, any accumulation point. If for infinitely many n ∈ N', A_{ρ̄_n}(x_n) = ∅, then Proposition 2 implies x* ∈ M. Otherwise, if we assume x* ∉ M, the particular choice of z_n and the uniform regularity of {A_ρ | ρ ≥ 0} imply the existence of ρ̄ > 0 such that n ∈ N' ⇒ z_n ∈ A_{ρ̄}(x_n). In both cases ({ρ'_n} → 0 or not), a contradiction with H'1c follows.
4. Applications to optimization

Algorithm 5.2 (p. 318 in [15]) is a particular case of algorithm A2. It corresponds to the version A'2 and Theorem 6, case (a), where the map A_0 is nevertheless assumed to be upper-continuous, because in that scheme the family {A_ρ | ρ ≥ 0} is such that ∀ρ ≥ 0, A_ρ = A_0. There are many particularizations of the algorithm by Huard [15]: for example, related gradient methods (conjugate gradient methods, …), Frank and Wolfe's method, Rosen's method, the linearized method of centers, Zoutendijk's method, …. We shall now show, on only two new examples, that with this algorithm A2 we can model many well-known methods that could not be modelled with the classical approach.

4.1. Linearized method of centers with partial linearization [11, 4]

The problem to be solved is:
maximize
f(x),
subject to
g_i(x) ≥ 0, i = 1, …, m,
x ∈ B,
where the functions are concave and continuously differentiable and B is a compact polyhedron. For a given ε > 0, we denote by d'_ε : B × B → R the function defined by d'_ε(z; x) = min{f'(z; x) − f(x), g'_i(z; x) | i ∈ I_ε(x)}, with f' and g'_i the tangential approximations of f and g_i at the point x, and with I_ε(x) = {i ∈ {1, …, m} | g_i(x) < ε}. This function d'_ε is actually the "partial linearized F-distance" related to the
particular F-distance
d(t, f(x)) = min{f(t) − f(x), g_i(t) | i ∈ {1, …, m}}.
The method consists in: (1) maximize d'_ε over B and choose an optimal solution z; (2) maximize d(·, f(x)) on the segment [x, z] and choose as a successor of x any solution of this one-dimensional optimization. It is to be noticed that this method is a particular case of the "method of centers by upper bounding functions" [12]. We could verify that this latter method is an application of algorithm A2. It is not possible to interpret this method with the two-stage scheme given in [15] because the function d'_ε is only u.s.c. on B. Thus the map
M'(x) = {z ∈ B | d'_ε(z; x) ≥ d'_ε(t; x) ∀t ∈ B} is not upper-continuous. But, by setting:
∀ρ > 0, A_ρ(x) = {z ∈ B | d'_ε(z; x) ≥ d'_ε(t; x) ∀t ∈ B},
ρ = 0, A_0(x) = {z ∈ B | d'_ε(z; x) ≥ d(t, f(x)) ∀t ∈ B},
and
∀ρ > 0, F_ρ(x, z) = {y ∈ [x, z] | d(y, f(x)) ≥ ρ},
ρ = 0, F_0(x, z) = {y ∈ [x, z] | d(y, f(x)) > 0},
we can verify that the linearized method of centers, when only the constraints active within ε are linearized, is an application of A2, by setting:
E_0 = {x | g_i(x) ≥ 0, i = 1, …, m} ∩ B,  E_1 = B,
h(x) = f(x),  α(ρ, ρ', x) = f(x) + ρ'.
The family {A_ρ | ρ ≥ 0} is of course ρ-decreasing and uniformly regular, because for every x ∈ E_0 and ρ ≥ 0 the set A_ρ(x) ≠ ∅. Furthermore, B being closed, d'_ε being u.s.c. and max{d(t, f(x)) | t ∈ B} being continuous (classical result), we have the pseudo-upper-continuity of the family {A_ρ | ρ ≥ 0}. Besides, Proposition 3 implies the uniform regularity of the family {F_ρ | ρ ≥ 0}, because the map (x, z) → [x, z] is lower-continuous. Finally, with the choices made for h and α, assumption H'1 is satisfied. This shows that this method is an application of A2; Theorem 5 is available because the set M (= M_2 in this case) is the set of all the optimal solutions of the considered problem (see [15, p. 325]).
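The two-stage scheme just described can be sketched numerically. The instance below is our own toy illustration, not taken from the paper: a linear objective over the unit disk inside the box B = [−2, 2]², with coarse grid searches standing in for the exact maximizations of d'_ε and d.

```python
# Toy sketch of the linearized method of centers with partial
# linearization.  All numerical choices (instance, EPS, grid sizes)
# are our own illustrative assumptions.

def f(x): return x[0] + x[1]                       # concave objective
def g(x): return 1.0 - x[0]**2 - x[1]**2           # concave constraint
def grad_f(x): return (1.0, 1.0)
def grad_g(x): return (-2.0 * x[0], -2.0 * x[1])

EPS = 0.1  # threshold defining I_eps(x) = {i : g_i(x) < eps}

def d_eps(z, x):
    """Partial linearized F-distance d'_eps(z; x)."""
    val = grad_f(x)[0]*(z[0]-x[0]) + grad_f(x)[1]*(z[1]-x[1])
    if g(x) < EPS:  # the constraint is linearized only when almost active
        lin_g = g(x) + grad_g(x)[0]*(z[0]-x[0]) + grad_g(x)[1]*(z[1]-x[1])
        val = min(val, lin_g)
    return val

def d(t, x):
    """True F-distance d(t, f(x)) = min(f(t) - f(x), g(t))."""
    return min(f(t) - f(x), g(t))

x = (0.0, 0.0)
grid = [(k - 40) / 20.0 for k in range(81)]        # B discretized
for _ in range(12):
    # (1) maximize d'_eps(.; x) over B
    z = max(((a, b) for a in grid for b in grid), key=lambda t: d_eps(t, x))
    # (2) maximize d(., f(x)) on the segment [x, z]; t = 0 gives d = 0,
    # so the successor stays feasible and f never decreases
    ts = [k / 400.0 for k in range(401)]
    x = max(((x[0] + t*(z[0]-x[0]), x[1] + t*(z[1]-x[1])) for t in ts),
            key=lambda y: d(y, x))
print(x, f(x))  # drifts toward the optimum near (sqrt(2)/2, sqrt(2)/2)
```

Each pass keeps the iterate feasible (the maximal d is nonnegative) while strictly increasing f until the center of the level set reaches the boundary.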
4.2. Subdifferentiable optimization

Let us consider the algorithm of Dem'yanov [7] for solving the problem:
minimize f(x), subject to x ∈ R^n,
3". Denel/ Extensions o[ the continuity
65
where f is defined by
f(x) = max{f_i(x) | i = 1, …, m},  f_i continuously differentiable,
and A = {x ∈ R^n | f(x) ≤ f(x_0)} compact.
Let us define, ∀x ∈ R^n, ∀ρ ≥ 0:
I_ρ(x) = {i ∈ {1, …, m} | f(x) − f_i(x) ≤ ρ},
φ_ρ(x) = min_{|g|=1} {max_{i∈I_ρ(x)} ⟨∇f_i(x), g⟩} = max_{i∈I_ρ(x)} ⟨∇f_i(x), g(x, ρ)⟩;
g(x, ρ) is the ρ-steepest descent direction.
The method of Dem'yanov consists in minimizing f along the ε-steepest descent direction. It is possible to reduce ε (for example ε' = ε/2) if at some step we obtain φ_ε(x) > −a_0 ε (a_0 a positive constant); see [7, p. 79]. It is easy to show that this method is a particular case of A2 by setting:
E_0 = A,  E_1 = {y ∈ R^n | |y| = 1},
{A_ρ | ρ ≥ 0} a ρ-decreasing family defined by
∀x ∈ A, ∀ρ > 0, A_ρ(x) = {g ∈ E_1 | max_{i∈I_ρ(x)} ⟨∇f_i(x), g⟩ ≤ −a_0 ρ},
ρ = 0, A_0(x) = {g ∈ E_1 | max_{i∈I_0(x)} ⟨∇f_i(x), g⟩ < 0},
{F_ρ | ρ ≥ 0} the family defined by
∀ρ > 0, F_ρ(x, g) = {y = x + θg, θ ≥ 0 | f(y) ≤ f(x) − ρ},
ρ = 0, F_0(x, g) = {y = x + θg, θ ≥ 0 | f(y) < f(x)},
h(x) = f(x),  α(ρ, ρ', x) = −f(x) + ρ'.
The assumptions H'1, H'2, H'3 of Section 3.1 are satisfied (for a justification, see just below). Hence Theorem 4 shows that every accumulation point of the subsequence N_1 (defined by n ∈ N_1 ⟺ ρ_{n+1} < ρ_n) is such that φ_0(x*) ≥ 0, that is, x* is a stationary point. This is exactly Theorem 1.7.2 in [7, p. 79]. Furthermore, if the functions f_i are convex, then Theorem 5 is available and proves that every accumulation point is a stationary point (i.e. a minimum). If at each step the parameter ρ_n is chosen as in version A''2 (i.e. the largest possible), we obtain the "first method" of Dem'yanov [7, p. 71], and Theorem 6' gives the same results as Dem'yanov's (Theorem 1.6.1 in [7, p. 75]).
Justification of H'1, H'2, H'3. It is clear that {x ∈ A | A_0(x) = ∅} corresponds to the stationary points (φ_0(x) ≥ 0), and every direction g ∈ A_0(x) being a descent direction, we have {x ∈ A | ∃g ∈ A_0(x), F_0(x, g) = ∅} = ∅, hence M_2 = ∅.
Assumptions H'1a, b, c are obviously satisfied with the particular choice of h and α. Let us prove H'2. The family {A_ρ | ρ ≥ 0} is ρ-decreasing because ρ' ≤ ρ ⇒ I_{ρ'}(x) ⊂ I_ρ(x), hence A_ρ(x) ⊂ A_{ρ'}(x). Let us consider {x_n}_N → x̄ with A_0(x̄) ≠ ∅. The existence of g ∈ E_1 such that max_{i∈I_0(x̄)} ⟨∇f_i(x̄), g⟩ < 0 implies φ_0(x̄) < 0. Because there is a finite number of functions f_i, we can exhibit α > 0 such that I_α(x̄) = I_0(x̄); by setting ρ_0 = min{−φ_0(x̄)/(2a_0), α} we can prove, by contradiction, the existence of n_0 such that n ≥ n_0 ⇒ φ_{ρ_0}(x_n) ≤ −a_0 ρ_0; hence the family {A_ρ | ρ ≥ 0} is uniformly regular on E_0. The pseudo-upper-continuity of {A_ρ | ρ ≥ 0} follows from the fact that i_0 ∈ I_0(x*) ⇒ i_0 ∈ I_ρ(x_n) for n ≥ n_0.
The verification of the uniform regularity of the family {F_ρ | ρ ≥ 0} is quite obvious (consider ρ_0 = ½[f(x̄) − f(ȳ)] with ȳ ∈ F_0(x̄, ḡ); it is clear that there exists n_0 such that n ≥ n_0 ⇒ f(y_n) ≤ f(x_n) − ρ_0 with y_n = x_n + θ̄g_n).

Remark. Algorithm A2 with Theorem 5 is available to model the method of Bertsekas and Mitter [2] in general convex subdifferentiable optimization.

As a conclusion, the theoretical definitions of continuity for point-to-set maps presented in this paper are justified by the number of applications of the fixed point theorems A1 and A2 in the mathematical programming area. From a theoretical point of view, these definitions (completely studied in [5]) imply very interesting properties which should be used when synthesizing optimization methods. Furthermore, algorithm A2 opens a simple way to model restarted algorithms (or algorithms which work with p previous steps), by constructing a multistage algorithm of the same kind as A2. The two fixed point theorems A1 and A2 cannot be seen, in general, as applications of a unique theorem. It is shown in [6] that A1 and A2 are of the same type (i.e. the same theorem) provided slightly stronger assumptions on A1 are made.
Appendix

Classical definitions for the continuity of point-to-set maps [15, p. 310]. Let us consider X ⊂ R^n, Y ⊂ R^m and F : X → 𝒫(Y) such that ∀x ∈ X, F(x) ≠ ∅.
(1) F is said to be upper-continuous at x ∈ X if ∀{x_k ∈ X}_N → x, ∀{y_k ∈ Y}_N → y such that y_k ∈ F(x_k) ∀k, we have y ∈ F(x).
(2) F is said to be lower-continuous at x ∈ X if ∀{x_k ∈ X}_N → x, ∀y ∈ F(x), there exist a sequence {y_k ∈ Y}_N → y and n_0 such that ∀k ≥ n_0, y_k ∈ F(x_k).
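The difference between the two definitions can be checked numerically. The following sketch uses our own toy map (not from the paper): F(x) = [0, 1] for x ≤ 0 and F(x) = {0} for x > 0, which is upper- but not lower-continuous at 0.

```python
# Illustrative numerical check of upper vs. lower continuity for a
# simple point-to-set map, represented here as a closed interval.

def F(x):
    """Return F(x) as an interval (lo, hi)."""
    return (0.0, 1.0) if x <= 0 else (0.0, 0.0)

def in_F(x, y, tol=1e-9):
    lo, hi = F(x)
    return lo - tol <= y <= hi + tol

xs = [1.0 / k for k in range(1, 200)]   # a sequence x_k -> 0 from above

# Upper continuity at 0: any y_k in F(x_k) equals 0, and the limit 0
# does belong to F(0).
assert all(in_F(x, 0.0) for x in xs)
assert in_F(0.0, 0.0)

# Lower continuity fails at 0: y = 1 lies in F(0), but no sequence
# y_k in F(x_k) can converge to 1, since every F(x_k) = {0}.
assert in_F(0.0, 1.0)
assert all(not in_F(x, 1.0) for x in xs)
print("F is upper- but not lower-continuous at 0")
```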
Remark (see Berge [1]). F is said to be upper semicontinuous at x ∈ X if for every open set G in Y such that F(x) ⊂ G there exists V(x), a neighbourhood of x in X, such that x' ∈ V(x) ⇒ F(x') ⊂ G. It is known that F u.s.c. and F(x) closed imply F upper-continuous at x, and that F upper-continuous and Y compact imply F u.s.c. at x.
Acknowledgements I am indebted to Professor P. Huard for his helpful comments about this work.
References

[1] C. Berge, Espaces topologiques. Fonctions multivoques (Dunod, Paris, 1966).
[2] D.P. Bertsekas and S.K. Mitter, "A descent numerical method for optimization problems with nondifferentiable cost functionals", SIAM Journal on Control 11 (1973) 637–652.
[3] G. Dantzig, J. Folkman and N. Shapiro, "On the continuity of the minimum set of a continuous function", Journal of Mathematical Analysis and Applications 17 (1967) 519–548.
[4] J. Denel, "Adaptation and performance of the linearized method of centers", Cahiers du CERO (Bruxelles) 16 (1974) 447–458.
[5] J. Denel, "Propriétés des familles ρ-décroissantes d'applications multivoques", Publication no. 87, Laboratoire de Calcul, Université de Lille I (1977).
[6] J. Denel, "On the continuity of point-to-set maps with applications to optimization", Proceedings of the II symposium on operations research, Aachen, 1977, to appear.
[7] V.F. Dem'yanov and V.N. Malozemov, "On the theory of nonlinear minimax problems", Russian Mathematical Surveys 26 (1971) 57–115.
[8] J.P. Evans and F.J. Gould, "Stability in nonlinear programming", Operations Research 18 (1970) 107–118.
[9] A.M. Geoffrion, "Duality in nonlinear programming: a simplified applications-oriented development", SIAM Review 13 (1971) 1–37.
[10] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591–603.
[11] P. Huard, "Programmation mathématique convexe", Revue Française d'Informatique et de Recherche Opérationnelle 7 (1968) 43–59.
[12] P. Huard, "A method of centers by upper bounding functions with applications", in: O.L. Mangasarian, K. Ritter and J.B. Rosen, eds., Nonlinear programming (Academic Press, New York, 1970) 1–30.
[13] P. Huard, "Tentative de synthèse dans les méthodes de programmation non linéaire", Cahiers du CERO (Bruxelles) 16 (1974) 347–367.
[14] P. Huard, "Implementation of gradient methods by tangential discretization", Journal of Optimization Theory and Applications 28 (1978).
[15] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308–331.
[16] G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control 9 (1971) 547–560.
[17] R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41–54.
[18] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
[19] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
Mathematical Programming Study 10 (1979) 69-85. North-Holland Publishing Company
COMPOSITION AND UNION OF GENERAL OPTIMIZATION ALGORITHMS
J.Ch. FIOROT
University of Lille I, Villeneuve d'Ascq, France

and

P. HUARD
University of Lille I, Villeneuve d'Ascq, France
Électricité de France, Clamart, France
Received 7 February 1978
We study the convergence of optimization algorithms obtained by composition or union, taken in the sense of relaxation. After recalling Zangwill's theorem and giving two extensions, we study the obtainment of generalized fixed points in the framework of the composition or the union of algorithms applied in a free steering way, firstly for functions having a unique maximum over some particular subsets (ranges of the current point), and secondly for general functions. The validity of the different hypotheses is discussed through some examples.

Key words: Convergence, Composition of Functions, Union of Algorithms, Fixed Points, Free-steering Algorithms, Point-to-Set Maps.
1. Introduction
Relaxation methods were proposed originally by Jacobi and Gauss–Seidel for solving linear systems of equations. These methods consist in solving in a cyclic way a sequence of partial problems limited to certain variables only. We can quote, for the solving of linear systems, Varga [28], and for nonlinear systems, Ortega and Rheinboldt [19] and Miellou [17]. Recently Robert [24] and Robert, Charnay and Musy [25] have treated the free steering case, where the partial resolutions are chosen in a free order; Miellou [18] has treated the chaotic case, where delays are introduced. These ideas were first presented by Schechter [26] and Chazan and Miranker [4]. The relaxation process has been extended to the framework of optimization problems, the algorithm consisting in a sequence of suboptimizations related to parts of the domain, essentially straight lines or linear subspaces parallel to coordinate axes, or else the component parts of a set-product. Let us quote, for the maximization of strictly concave functions, using directional derivatives: Auslender [1]; or using differentiability: Ortega and Rheinboldt [19, 20], Cea
J.Ch. Fiorot, P. Huard / Composition of algorithms    70
and Glowinski [2], Martinet and Auslender [10]. The free steering case has been treated for functions of class C² by Schechter [27] and for functions of class C¹ by Ortega and Rheinboldt [19] and Boyer [3].

We consider in this paper the optimization problem: Max{f(x) | x ∈ A}. Restricting ourselves to the case where A is a subset of R^n, we shall study here a recent generalization of methods of optimization by relaxation, in the sense that the suboptimizations are done on subsets more general than straight lines or affine varieties. More precisely, these subsets are defined as the ranges of the current solution by point-to-set maps (cf. Huard [7]). Cyclic or free steering relaxations are then defined in a very natural way by the composition or the union of these ranges; hence the expression "composition and union" of algorithms given as the title of this paper. Moreover, we want to evaluate the part and the power of each hypothesis considered in the study of the convergence of a sequence of points to an optimal solution or, more modestly, in the obtainment of optimum accumulation points. These accumulation points appear as fixed points of point-to-set maps.

The article consists of four parts. In the first we recall Zangwill's theorem and give two slight extensions. In the second part we give the notation and the essential hypotheses. In the third we study the free steering utilization of algorithms. The treatment also differs from the papers quoted in the references in the sense that they have always used either the Fréchet or Gâteaux differential (with different definitions according to the authors) or the directional derivative. These hypotheses, fundamental for the previous authors, are not needed here. We only consider a continuous function with a unique maximum on each given subset. The shape of these subsets is not taken into account; they simply need, besides the unimodality just recalled, to be the range of the current solution by a continuous point-to-set map and to contain x. These hypotheses are sufficient for any accumulation point to be a fixed point of the point-to-set maps describing the algorithm. In some examples, when we apply these results to a strictly concave differentiable function, we recover the convergence to the optimal solution of the optimization problem. In the fourth part we weaken the hypotheses; for instance, we drop the unicity of the maximum of the function on the given subsets. The results we obtain establish the existence of generalized fixed points for only certain of the point-to-set maps defining the composition or the union of the algorithm.
1. Preliminaries
First recall Zangwill's theorem [29, p. 21].

Theorem 1. Suppose: E a compact set in R^n, P a subset of E, F : E → 𝒫(E) a point-to-set map, h : E → R a continuous function on E such that ∀x ∈ E − P:
(α) F(x) ≠ ∅ and F closed at x,
(β) x' ∈ F(x) ⇒ h(x') > h(x).
Given 0x ∈ E, F defines the following sequence: if kx ∈ P, stop; otherwise k+1x ∈ F(kx), k ∈ N. Under these conditions, every accumulation point *x of the sequence satisfies *x ∈ P.
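The iteration scheme of Theorem 1 can be sketched on a one-line toy instance (ours, not from the paper): E = [0, 1], P = {1}, h(x) = x, and the single-valued closed map F(x) = {(x + 1)/2}, which strictly increases h outside P.

```python
# Minimal numerical sketch of Zangwill's iteration scheme.  The
# instance (E, P, h, F) is an illustrative assumption.

def F(x):
    """Point-to-set map, here single-valued: the set {(x + 1)/2}."""
    return {(x + 1.0) / 2.0}

def h(x):
    return x

def iterate(x0, steps=60, tol=1e-12):
    x = x0
    for _ in range(steps):
        if abs(x - 1.0) < tol:   # x in P: stop
            return x
        (x_next,) = F(x)         # choose any successor in F(x)
        assert h(x_next) > h(x)  # condition (beta): strict ascent
        x = x_next
    return x

print(iterate(0.0))  # the iterates accumulate at the unique point of P
```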
Corollary 2. Suppose: E ⊂ R^n a compact set, P ⊂ E, I a finite set of indices, h : E → R a continuous function on E and F_i : E → 𝒫(E), i ∈ I, a set of point-to-set maps such that ∀i ∈ I, ∀x ∈ E − P:
(α') F_i(x) ≠ ∅ and F_i closed at x,
(β') x' ∈ F_i(x) ⇒ h(x') > h(x).
Let us define F = ∪_{i∈I} F_i. We consider the following sequence: 0x ∈ E; if kx ∈ P, stop; otherwise k+1x ∈ F(kx), k ∈ N. Then every accumulation point *x of the sequence satisfies *x ∈ P.
Proof. Zangwill's theorem assumptions are satisfied: conditions (α) and (β) follow directly from conditions (α') and (β') respectively.

Remark 3. From the above, when only one F_i is considered, we obtain Zangwill's statement.

Remark 4. Taking a successor k+1x ∈ F(kx) means choosing it arbitrarily in one of the ranges F_i(kx), i.e. in a free manner. A particular realization, for instance, consists in using all the F_i's one after the other in a determined order, i.e. in a cyclic way. In this case, if I = {1, 2, …, p} and we take the natural order, we can define F' by F'(x) = F_p ∘ F_{p−1} ∘ ⋯ ∘ F_1(x), where F' is the composition of p point-to-set maps.

Now we give a second corollary (Dubois [5]).

Corollary 5. Suppose: E a compact set in R^n, P ⊂ E, F : E → 𝒫(E) a point-to-set map, and h : E → R a continuous function on E such that ∀x ∈ E − P:
(α) F(x) ≠ ∅ and F closed at x,
(β) x' ∈ F(x) ⇒ h(x') > h(x).
We consider the following algorithm: 0x ∈ E; if kx ∈ P, stop; otherwise choose k+1x ∈ E such that h(k+1x) ≥ h(ky) with ky ∈ F(kx). Then every accumulation point *x of the sequence satisfies *x ∈ P.
Other results related to the analysis of convergence of mathematical programming
algorithms have been considered in the literature, for instance: G.G.L. Meyer [11, 12, 13], Huard [8], R.R. Meyer [16], Polak [23].
2. Notation and hypotheses

The problem is to maximize a continuous function f over a subset A of R^n. For this we give p point-to-set maps A_i, i ∈ I = {1, 2, …, p}, from A into 𝒫(A). We require the following hypotheses:
(H1) x ∈ A_i(x), ∀i ∈ I, ∀x ∈ A,
(H2) A_i is continuous on A (closed and lower semi-continuous),
(H3) ∀x ∈ A, f : A → R has a unique maximum over A_i(x),
(H4) there exists 0x ∈ A such that E_0 = {x ∈ A | f(x) ≥ f(0x)} is a compact subset.
2.1. Standard examples of such A_i

(a) Unconstrained case (A = R^n). The range A_i(x) is an r_i-dimensional linear variety containing x, with Σ_{i∈I} r_i = n (p ≤ n) and R^n = Σ_{i∈I} A_i(0). In particular, for p = n and r_i = 1, A_i(x) = {u ∈ R^n | u = x + θe_i, θ ∈ R}, with e_i the i-th vector of any basis of R^n.
(b) Constrained case. Let be given, for each i ∈ I, a linear subspace V_i of R^n such that R^n = V = Π_{i∈I} V_i, and K = Π_{i∈I} K_i with K_i a nonempty closed subset of V_i. In Cea and Glowinski [2], V and the V_i are reflexive Banach spaces. Define A = K, and for any x ∈ K we set:
A_i(x) = {y ∈ K | y = (x_1, x_2, …, x_{i−1}, y_i, x_{i+1}, …, x_p), y_i ∈ K_i}
or
A_i(x) = (x_1, …, x_{i−1}, 0, x_{i+1}, …, x_p) + K_i ⊂ x + V_i.

Remark 6. In the previous examples all the sets A_i(x) are convex, but this condition is not required by Assumptions (H1)–(H4).
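The sets A_i(x) of example (b) admit a direct membership test. The sketch below is our own toy instance (K_i taken as intervals); it checks numerically that hypothesis (H1) holds, i.e. that x always belongs to A_i(x).

```python
# Membership test for A_i(x) in the constrained standard example:
# A_i(x) consists of the points differing from x only in coordinate i,
# with that coordinate ranging over K_i.  Instance values are ours.

def in_A_i(y, x, i, K):
    """True iff y is in A_i(x), with K a list of intervals (lo, hi)."""
    lo, hi = K[i]
    same_elsewhere = all(y[j] == x[j] for j in range(len(x)) if j != i)
    return same_elsewhere and lo <= y[i] <= hi

K = [(-1.0, 1.0), (0.0, 2.0)]   # K = K_1 x K_2
x = [0.5, 1.5]
assert in_A_i(x, x, 0, K)            # hypothesis (H1): x in A_i(x)
assert in_A_i([-1.0, 1.5], x, 0, K)  # move coordinate 1 inside K_1
assert not in_A_i([0.5, 3.0], x, 1, K)  # 3.0 is outside K_2
print("A_i membership checks passed")
```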
2.2. Definition and properties

Definition 7. For any x ∈ E_0 and any i ∈ I, we define the point-to-set map M_i:
M_i : x → {u ∈ A_i(x) | f(u) ≥ f(t), ∀t ∈ A_i(x)}.
If the subset M_i(x) is reduced to a point, which is the case when (H3) is used, we shall denote this subset by M_i(x) instead of {M_i(x)}. For any positive integer m we also define M = M_{i_m} ∘ M_{i_{m−1}} ∘ ⋯ ∘ M_{i_1} with i_j ∈ I for j = 1, 2, …, m.
Remark 8. The following results about convergence are established independently of the methods used to maximize f over A_i(·).
Property 9. Under Hypotheses (H1)–(H4), M is a continuous function on E_0.

Proof. For any i_j ∈ I and any x ∈ E_0, from (H3), the set M_{i_j}(x) is nonempty and M_{i_j} is single-valued. Hypothesis (H2) and the continuity of f give the closedness of M_{i_j}. Moreover, from (H1), M_{i_j}(x) belongs to the compact set E_0. Consequently M_{i_j} is a continuous single-valued function on E_0, and so is M.
Lemma 10. Under Hypotheses (H1) and (H3): x = M_{i_j}(x) for every i_j ∈ I, j = 1, 2, …, m, is equivalent to x = M(x).

Proof. (a) From x = M_{i_j}(x) for every i_j ∈ I, j = 1, 2, …, m, we obtain directly from the definition of M: x = M(x).
(b) If x = M(x), let 0y = x, 1y, 2y, …, my = x with jy ∈ M_{i_j}(j−1y) be a sequence originating and arriving at x. From (H1) and the definition of M_{i_j} we obtain f(j−1y) ≤ f(jy). As 0y = my = x it follows that f(jy) = f(x), and from (H1) and (H3), for any i_j ∈ I we have jy = x, j = 1, 2, …, m; consequently x = M_{i_j}(x).
3. Study of free steering utilization of algorithms

3.1. Algorithm I

Let 0x be a given starting point in A, and consider the following sequence (kx), k ∈ N: if kx = M_i(kx) for every i ∈ I, then stop; else
k+1x ∈ ∪_{i∈I} M_i(kx).
The primitive problem is to maximize f over A; in Section 3.2 below, with only Hypotheses (H1)–(H4), we are concerned with seeking the fixed points of the M_i's. These fixed points are related to the optimum of f over A. Additional hypotheses, such as differentiability and strict concavity of f, then permit us to obtain the maximum. We always suppose in the sequel that the sequence (kx), k ∈ N, given by the algorithm involves infinitely many distinct points; otherwise the last given point satisfies kx = M_i(kx) for every i ∈ I, i.e. kx is a fixed point for each M_i.
Property 11. For the sequence (kx), k ∈ N, given by Algorithm I we have ‖k+1x − kx‖ → 0 when k → ∞.

Proof. Suppose the converse, i.e. that there exist a real δ > 0 and a subsequence (kx), k ∈ N_1 ⊂ N, such that ‖k+1x − kx‖ > δ. As (k+1x, kx) ∈ E_0 × E_0 we can find a subsequence converging to (*x, **x) with ‖**x − *x‖ ≥ δ. But I is finite, and there exists at least one function M_i which is used infinitely many times such that
k+1x = M_i(kx). From the continuity of M_i it follows that **x = M_i(*x). But f(*x) = f(**x); this equality and Hypothesis (H3) give *x = **x, yielding a contradiction.

Remark 12. Following Ostrowski [21] we know that either the entire sequence converges or it possesses a set of accumulation points which constitutes a continuum. Consequently, if as in Meyer [14, 15] we add the following hypothesis: for any given λ, {x ∈ E_0 | x = M(x) and f(x) = λ} is finite, we obtain the convergence of the sequence (kx), k ∈ N. In a forthcoming example (Example 17) we shall note that convergence may occur even if this last hypothesis is not satisfied.
3.2. Study of fixed points given by Algorithm I

We consider an arbitrary accumulation point *x of the sequence (kx), k ∈ N, given by Algorithm I and the corresponding subsequence (kx), k ∈ Ñ ⊂ N, converging to *x. For a given but arbitrary integer m we define a partition of the indices of the sequence (i.e. in fact a partition of the sequence) into subsets of m successive terms. Then we consider the groups of m successive iterates which contain at least one point kx, k ∈ Ñ, of the subsequence. Among these infinitely many subsets (or groups) there exists at least an infinity of them which use the "operations" M_{i_1}, M_{i_2}, …, M_{i_m} in this order. This follows from the fact that, given m successive iterates taken in the partitioning, there is a finite number of possible ways (p^m) to maximize over the A_i's in a given order. Repetitions are allowed, particularly if m > p. Of course if m < p only a part of the A_i's will be used. Let us set M = M_{i_m} ∘ M_{i_{m−1}} ∘ ⋯ ∘ M_{i_1}, the composition of these m successive "operations", where i_1, i_2, …, i_m is an ordered sequence of indices taken in I.

Lemma 13. Under Hypotheses (H1) to (H4) and with the above definition of M,
every accumulation point *x of the sequence given by Algorithm I satisfies *x = M(*x), and *x is also a fixed point for each M_j, j ∈ {i_1, i_2, …, i_m}.

Proof. Let N″ ⊂ N be the subset of indices of the points which give, by the point-to-set map M_{i_1}, the first point of each special group of m successive iterates (recall that each special group contains at least one point kx, k ∈ Ñ). We have, for k ∈ N″, k+m x = M_{i_m} ∘ M_{i_{m−1}} ∘ ⋯ ∘ M_{i_1}(kx). If k and k' are two successive integers in N″ we may have k' = k + m if there are two successive groups of "operations" M_{i_1}, M_{i_2}, …, M_{i_m}, each of them giving at least one point of (kx), k ∈ Ñ. We consider the subsequence (kx), k ∈ N″. In particular, thanks to (H1) and (H3), it is possible to satisfy the requirements of Corollary 5. Instead of E, F, ky, kx, P and h we take respectively E_0, M, k+m x, kx (with k ∈ N″), {x ∈ E_0 | x = M(x)}
and f. Property 9 gives point (α). Condition (β) is written in the following way: x' = M(x), x ≠ M(x) implies f(x') > f(x). If we suppose the contrary, i.e. f(x') = f(x), then from (H1) the sequence 0y = x, 1y, …, my = x' with jy = M_{i_j}(j−1y), j ∈ {1, 2, …, m}, is such that f(x) = f(1y) = f(2y) = ⋯ = f(my) = f(x'). And now (H3) gives x = 1y = 2y = ⋯ = my = x', which cannot hold. Then any accumulation point x̄ of the subsequence (kx), k ∈ N″, satisfies x̄ = M(x̄).

Returning to the subsequence (kx), k ∈ Ñ, we can write it (k+p_k x) with k ∈ N″ and p_k ∈ {1, 2, …, m}. In a group of m successive iterates the subsequence (kx), k ∈ Ñ, may have several elements, i.e. for a given k ∈ N″ there may exist several p_k ∈ {1, 2, …, m}, from which we just pick one, denoted p_k, and we still have k+p_k x → *x when k → ∞ (with k ∈ N″). Property 11 gives:

‖k+1x − kx‖ → 0, ‖k+2x − kx‖ → 0, …, ‖k+mx − kx‖ → 0  for k ∈ N″;
Counterexample 14. If we drop H y p o t h e s i s (H3) we cannot apply Corollary 5 i.e. in fact Zangwill's T h e o r e m l as shown by an example in R z where the graph of a quasi-concave function [ is drawn (Fig. l). Outside the square abcd the value of f is zero. i = 1, 2.
Ai(x ) = {U]U = X + Oei, 0 E R},
Ml(b)=(8),
M ( b ) = M2oMl(b),
! !
i.e. segment [c,a] union the shaded portions,
Fig. 1.
b ∉ M(b), a or c ∈ M(b), and yet f(c) = f(a) = f(b) = 0.

Introduction of Hypothesis (H5):
(H5) ∃m ∈ N, m ≥ p, such that ∀j ∈ N, ∀i ∈ I, ∃k ∈ [jm + 1, (j + 1)m] satisfying k+1x = M_i(kx).
Algorithm I defines implicitly a sequence (i_k), k ∈ N, satisfying k+1x = M_{i_k}(kx). Arbitrary sequences (i_k) satisfying Hypothesis (H5) will be called "essentially periodic", by analogy with the property introduced in [19, p. 513] for a sequence of vectors. Effectively, (H5) may be written:
(H5') ∃m', m' ≥ p, such that ∀j ∈ N, ∀i ∈ I, ∃k ∈ [j + 1, j + m'] satisfying k+1x = M_i(kx).
By setting m' = 2m, for example, (H5) implies (H5'), and conversely, setting m = m', (H5') implies (H5).
Theorem 15. Under Hypotheses (H1) to (H5) we have:
(i) every accumulation point *x of the sequence generated by Algorithm I satisfies *x = M_i(*x) for any i ∈ I;
(ii) if x = M_i(x) for every i ∈ I is a sufficient condition of optimality, then *x is a maximum of f over A;
(iii) if in addition this maximum is unique, then the sequence (kx), k ∈ N, converges to this maximum.

Proof. (i) Hypothesis (H5) means that in any sequence of m successive iterates any "operation" i is used at least once. That is to say, all the functions M_i, i ∈ I, appear in M. Then by Lemma 13, any accumulation point of the sequence (kx), k ∈ N, is a fixed point for each M_i, i ∈ I.
(ii) Evident.
(iii) Let x̄ be an arbitrary accumulation point of (kx), k ∈ N. By (H1), the sequence of values f(kx) is monotone nondecreasing; then we obtain f(x̄) = f(*x). Hypothesis (ii) and the unicity of the maximum yield x̄ = *x.
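Hypothesis (H5) can be made concrete with a small sketch. The toy instance below is ours, not from the paper: index windows of length m each contain every i ∈ I but are shuffled freely (an "essentially periodic" free steering sequence), and each M_i is the exact maximizer of a strictly concave quadratic along the coordinate line A_i(x).

```python
# Free steering Algorithm I under (H5) on f(x) = -(x-c)^T Q (x-c),
# Q positive definite.  Instance and window scheme are illustrative.
import random

def M_i(x, i, Q, c):
    """Exact maximizer of f over the line A_i(x) = {x + theta*e_i}."""
    g = sum(Q[i][j] * (x[j] - c[j]) for j in range(len(x)))  # [Q(x-c)]_i
    y = list(x)
    y[i] = x[i] - g / Q[i][i]
    return y

random.seed(0)
Q = [[2.0, 1.0], [1.0, 2.0]]
c = [1.0, -1.0]
p, m = 2, 4
x = [5.0, 5.0]
for _ in range(40):                       # 40 windows of m iterates
    window = list(range(p)) + [random.randrange(p) for _ in range(m - p)]
    random.shuffle(window)                # free order inside the window
    for i in window:
        x = M_i(x, i, Q, c)
print(x)  # accumulates at c, the common fixed point of all M_i
```

Since f is strictly concave with the unique maximizer c, parts (ii) and (iii) of Theorem 15 apply and the whole sequence converges to c.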
3.3. Application to the cyclic case
Algorithm I used in a cyclic way is written in the following manner.

Algorithm I bis. Let 0x be a starting point given in A; if kx = M_i(kx) for every i ∈ I, then stop; else:
k+1x = M_i(kx)  with k ≡ i (mod p).
Gathering p successive iterations and defining M = M_p ∘ M_{p−1} ∘ ⋯ ∘ M_1, this cyclic algorithm may also be written: let 0x be a starting point given in A; if
kx = M(kx), then stop; else k+1x = M(kx). This algorithm has been treated in [29, p. 111] and in [19, p. 515] in a less general form (called the "univariate relaxation method" following the terminology of [19, p. 224]). If the sequence (kx), k ∈ N, has infinitely many distinct points, Hypothesis (H5) is automatically satisfied; we then have the results of Theorem 15 under Hypotheses (H1) to (H4): any accumulation point of (kx), k ∈ N, given by Algorithm I bis is a fixed point for each M_i, i ∈ I. Let us give some examples relating to Algorithm I or I bis.

Example 16. The set A is R². Let us define A_i(x) = {y ∈ R² | y = x + θe_i, θ ∈ R}, i = 1, 2, and
f : (x_1, x_2) → −(|(x_1 − 2) cos α + (x_2 − 2) sin α|)^{1/2} − (|−(x_1 − 2) sin α + (x_2 − 2) cos α|)^{1/2}
with α ≢ 0 (mod π/2). This function f is not quasi-concave and its directional derivatives are not defined along L_1 and L_2 (Fig. 2), but f satisfies (H3). The iterates kx given by Algorithm I bis lie alternately on L_1 and L_2 and converge to O (which is the unique maximum).
Fig. 2.
Example 17. In Remark 12 we pointed out that the iterates can converge even if Meyer's hypothesis is not satisfied. We give such an example with A = R² and f : (x_1, x_2) → min(1 − |x_1 + x_2|, −2|x_1 − x_2|). The set of optimal solutions (Fig. 3) is the segment [a, b] with a = (−½, −½) and
b = (½, ½). The corresponding optimum value of f is zero. The maximizations are done on the A_i, with A_i(x) = {y ∈ R² | y = x + θe_i, θ ∈ R}, i = 1, 2. Although {x ∈ E_0 | x = M(x), f(x) = 0} = [a, b], an infinite set, for an arbitrary starting point 0x the sequence of iterates converges to a or b, or reaches a point of [a, b] in one step. In this example f also satisfies (H3), but Theorem 15 does not warrant the convergence.
Fig. 3.
3.4. Differentiable case

Introduce (H6): f is differentiable. Now recall the well-known notion of the cone of tangents to A at a point x ∈ A:
T(A, x) = {y ∈ R^n | y = lim kλ(kx − x) with kx ∈ A, kx → x (k → ∞), kλ ∈ R_+}.
In the following we shall simply write T instead of T(A, *x) and T_i instead of T(A_i(*x), *x). We shall also use Γ(T), the negative polar cone of the cone T:
Γ(T) = {u ∈ (R^n)* | u·x ≤ 0 ∀x ∈ T},
where u·x is the inner product of u and x.

Corollary 18. Under Hypotheses (H1) to (H6), for any accumulation point *x (given by Algorithm I) satisfying T ⊂ [∪_{i∈I} T_i] we have ∇f(*x) ∈ Γ(T), i.e. *x is a stationary point.
Proof. From Theorem 15, *x = Mi(*x), i ∈ I; then ∇f(*x) ∈ F(Ti), i.e. ∇f(*x) ∈ ∩i∈I F(Ti). As T ⊂ [∪i∈I Ti] ([·] is the symbol of the convex hull),

F(T) ⊃ F([∪i∈I Ti]) = F(∪i∈I Ti) = ∩i∈I F(Ti).

Example 19. Coming back to the standard Example 2.1(b) we obtain (A = K) the following properties [6]:

Ti = T(Ki, (0, …, 0, *xi, 0, …, 0)),  T = ∪i∈I T(Ki, (0, …, 0, *xi, 0, …, 0)),

then T = ∪i∈I Ti. As (H1) and (H2) are satisfied we obtain the following corollary.

Corollary 20 (Example 2.1(b)). (i) For any accumulation point *x of the sequence given by Algorithm I under Hypotheses (H3) to (H6): ∇f(*x) ∈ F(T).
(ii) Under Hypotheses (H4) to (H6) and f strictly concave, the sequence converges to the maximum of f over A.
(iii) The same conclusion as (ii) holds if f is strictly quasi-concave and ∇f(*x) ≠ 0.

Proof. (ii) As f is strictly concave, (H3) is satisfied and ∇f(*x) ∈ F(T) implies that *x is a maximum of f over A. As this maximum is unique, the sequence converges to it.
(iii) Since the maximum is unique (Poljak [22]), the convergence follows.

A similar corollary can be given for Example 2.1(a), which is a particular case of Example 2.1(b) obtained by setting Ki = Ai(0); this implies A = K = Rn and then ∇f(*x) ∈ F(T) = {0}.
4. Results for more general functions, free steering and cyclic cases
4.1. Hypotheses

In what follows we make hypotheses much weaker than in Section 2 (dropping (H3) among others). An essential consequence is that the Mi's are point-to-set maps and no longer have classical fixed points, i.e. points satisfying x = Mi(x), but generalized fixed points satisfying x ∈ Mi(x). We shall see that accumulation points of sequences given by free steering or cyclic composition of algorithms are generalized fixed points of certain Mi but not of all. Examples illustrating these results will show the part played by the hypotheses.

As in Section 2 we are given p point-to-set maps Ai, i ∈ I, from A into P(A), and f a continuous function from A into R such that:
(H1) x ∈ Ai(x), ∀i ∈ I, ∀x ∈ A,
(H2) Ai is continuous on A,
(H4) there exists x0 ∈ A such that E0 = {x ∈ A | f(x) ≥ f(x0)} is a compact subset,
(H7) y ∈ Ai(x) ⇒ Ai(y) = Ai(x).

In the following, for Algorithms I, II and I bis, the generated sequence (kx), k ∈ N, contains infinitely many distinct points. If this were not the case the problem would already have been solved, i.e. x ∈ Mi(x), ∀i ∈ I. We take for the Mi's the same definition as in Section 2.2, but now the Mi's are point-to-set maps. As in Property 9, from (H1), (H2) and (H4) we show that Mi(x) ≠ ∅ for any x belonging to E0 and that Mi is closed on E0.

For an accumulation point *x given by Algorithms I, II or I bis, we consider in the following the subsequence (kx), k ∈ N' ⊂ N, converging to *x. Let I' be the subset of the i's belonging to I such that the "operation" i (to maximize f over Ai(·)) happens infinitely many times in building the subsequence (kx), k ∈ N'. In the terminology of Robert, Charnay and Musy [25], I' is called a "residual" associated to the subsequence (kx), k ∈ N'. In the case I' = I this residual is said to be maximal. As I is finite, I' is nonempty.
4.2. Algorithm I, free steering case

We use again Algorithm I (Section 3.1) with point-to-set maps Mi. In this algorithm repetitions are allowed. On the contrary, for Algorithms II and I bis repetitions will not take place.

Theorem 21. With Algorithm I, under the Hypotheses and Definitions of Section 4.1, we have *x ∈ Mi(*x) for any i ∈ I'.

Proof. For an arbitrary i belonging to I', there exists S ⊂ N' such that k+1x ∈ Mi(kx) for any k ∈ S. We consider two consecutive elements of S, called k and k', and the subsequence (k+2x), k ∈ S. As there exist infinitely many pairs (k, k') ∈ S × S and a finite number of "operations" i, one of them, j (possibly equal to i), is used infinitely many times to obtain k+2x from k+1x with k ∈ S, i.e. there exist (i, j) and S' ⊂ S such that k+1x ∈ Mi(kx) and k+2x ∈ Mj(k+1x) for any k ∈ S'. Now consider the sequence (kx, k+1x, k+2x) ∈ E0 × E0 × E0 for k ∈ S'. The compactness of E0 ensures the existence of a subsequence converging to (x̂, *x, **x) for k ∈ S'' ⊂ S'. As Mi is closed, *x ∈ Mi(x̂), i.e. *x maximizes f over Ai(x̂). From *x ∈ Ai(x̂) and Hypothesis (H7) we obtain Ai(*x) = Ai(x̂), therefore Mi(*x) = Mi(x̂) and *x ∈ Mi(*x).

Remark 22. Theorem 21 is proved, but actually we have also proved that *x ∈ Mj(*x). Indeed, Mj being closed, we obtain **x ∈ Mj(*x). Moreover the sequence of values is monotone nondecreasing; from (H1) we deduce f(*x) =
f(**x). If *x ∉ Mj(*x), then f(**x) > f(*x), yielding a contradiction; hence *x ∈ Mj(*x).

4.3. Algorithm II, free steering case

Let L(x) = {i ∈ I | x ∈ Mi(x)}. Let 0x be given in A and consider the sequence: if L(kx) = I, then stop; else:

k+1x ∈ ∪_{i ∈ I − L(kx)} Mi(kx).

We observe that the same "operation" i cannot be used twice consecutively because, from (H7), if k+1x ∈ Mi(kx), then Ai(k+1x) = Ai(kx) and k+1x ∈ Mi(k+1x), hence i ∈ L(k+1x).
Corollary 23. With Algorithm II, under the Hypotheses and Definitions of Section 4.1, for any i ∈ I' there exists j ≠ i such that *x ∈ Mi(*x) and *x ∈ Mj(*x).
Proof. It is a direct consequence of Theorem 21 and the previous remark.

4.4. Algorithm I bis, cyclic case

We use again Algorithm I bis (Section 3.3) with point-to-set maps Mi. It is a particular case of the previous ones in which the "operations" i are used in turn in a fixed order, i.e. cyclically.
Corollary 24. With Algorithm I bis, under the Hypotheses and Definitions of Section 4.1, for every i ∈ I' we have *x ∈ Mi(*x) and *x ∈ Mi+1(*x).

4.5. Examples in R2 and R3 illustrating the results of Theorem 21 and its Corollaries

We give below two illustrations of the application of Algorithms I, II and I bis which give accumulation points satisfying Theorem 21 but not satisfying x ∈ Mi(x) for every i of I.

Example in R2. Consider in the plane x1Ox2 the polyhedron G of vertices a, b, c, d (Fig. 4). Let gi · x = 0, i = 1, 2, 3, 4, be the equations of the straight lines supporting the four sides: G = {x ∈ R2 | gi · x ≥ 0, i = 1, 2, 3, 4}. We define a function f explicitly inside G, and outside G we define it by its level sets. For x ∈ G,

f(x) = f1(x) with f1(x) = ∏_{i=1}^{4} (gi · x)².

It is a regular F-distance, following Huard [9]. For x ∉ G, f is a function whose level sets are defined by four line segments respectively parallel to the sides of G, their lengths being equal to those of the sides, at a distance ρ from
Fig. 4.
them; these four line segments are linked together by arcs of circles with centers a, b, c, d and radius ρ, so that the join of the different curves and line segments is continuously differentiable. Let us note that f(x) = 0 and ∇f(x) = 0 for any x belonging to the boundary abcd of G; inside G, f is a strictly positive quasi-concave function, and the level sets of f decrease from zero as we move away from G. In Fig. 5 we draw the graph of f, but only for x3 ≤ 0. For x3 ≥ 0 we obtain a bell-shaped surface whose top projects onto the plane x1Ox2 at the "centre", in the sense of [9], belonging to the interior of G. This surface, tangent to the plane x1Ox2 along abcd, is connected with the part x3 ≤ 0 in such a way that f is continuously differentiable and quasi-concave on R2.
Let us define three directions Δ1, Δ2, Δ3 parallel to the sides ab, dc, ad, and
Fig. 5. (The graph of f for x3 ≤ 0: partly a ruled surface generated by a line segment moving parallel to bc and leaning on the edges (γ) and (γ') of two paraboloid pieces with vertices b and c, and partly a paraboloid of revolution of equation (x1 − c1)² + (x2 − c2)² + x3 = 0.)
maximize following Δ1, Δ2, Δ3 in this order and cyclically. Starting from 0x (Fig. 4) and choosing as successor, when there are several possibilities, the point furthest away, we obtain four accumulation points a, b, c, d, which are stationary points, i.e. points with zero gradient. These points are not the maximum, the maximum being the "centre" of G. For b and c we have b ∈ Mi(b), c ∈ Mi(c) for i = 1, 2, 3, i.e. b ∈ M(b) and c ∈ M(c). On the other hand the points a and d satisfy the following conditions:

a ∈ M1(a), a ∈ M3(a) but a ∉ M2(a): M2(a) = {a'} ⊂ int G,
d ∈ M2(d), d ∈ M3(d) but d ∉ M1(d): M1(d) = {d'} ⊂ int G.

This is exactly what Corollary 24, i.e. Theorem 21, predicts, and no more can be said about these points.

Example in R3. Here the directions Δ1, Δ2, Δ3 are independent; they are the directions of the axes e1, e2, e3. Fig. 6 represents, incompletely, three contours of f with values 1, 0, −1. Fig. 7 represents a section in the horizontal plane containing the points a, b, c, d and the starting point 0x. The maximization is made following Δ1, Δ2, Δ3 in this order and cyclically.
Fig. 6.
Fig. 7.

The infinite sequence of points kx stays in the section plane because in the vicinity of d and b we have kx ∈ M3(kx). This sequence has four accumulation points a, b, c, d. For c we have c ∈ Mi(c) for i = 1, 2, but c ∉ M3(c), because towards the lower part there are better values than f(c) over A3(c).
References
[1] A. Auslender, "Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables", Numerische Mathematik 18 (1971) 213-223.
[2] J. Céa et R. Glowinski, "Sur les méthodes d'optimisation par relaxation", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle (Décembre 1973) 5-32.
[3] R. Boyer, "Quelques algorithmes diagonaux en optimisation convexe", Thèse 3ème cycle, Université de Provence (1974).
[4] D. Chazan and W. Miranker, "Chaotic relaxation", Linear Algebra and its Applications 2 (1969) 199-222.
[5] J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1973) 328-332.
[6] J.C. Fiorot et P. Huard, "Composition et réunion d'algorithmes généraux", Comptes Rendus Académie des Sciences Paris, tome 280 (2 juin 1975), Série A, 1455-1458; Séminaire d'Analyse Numérique No. 229, Université de Grenoble (Mai 1975).
[7] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
[8] P. Huard, "Extensions of Zangwill's theorem", this volume.
[9] P. Huard, "A method of centers by upper-bounding functions with applications", in: J.B. Rosen, O.L. Mangasarian, K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) 1-30.
[10] B. Martinet et A. Auslender, "Méthodes de décomposition pour la minimisation d'une fonction sur un espace produit", SIAM Journal on Control 12 (1974) 635-643.
[11] G.G.L. Meyer, "Conditions de convergence pour les algorithmes itératifs monotones, autonomes et non déterministes", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle 11 (1977) 61-74.
[12] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
[13] G.G.L. Meyer, "A systematic approach to the synthesis of algorithms", Numerische Mathematik 24 (1975) 277-289.
[14] R. Meyer, "On the convergence of algorithms with restart", SIAM Journal on Numerical Analysis 13 (1976) 696-704.
[15] R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computer and System Sciences 12 (1976) 108-121.
[16] R. Meyer, "A comparison of the forcing function and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control and Optimization 15 (1977) 699-715.
[17] J.C. Miellou, "Méthode de Jacobi, Gauss-Seidel, sur-(sous) relaxation par blocs appliquée à une classe de problèmes non linéaires", Comptes Rendus Académie des Sciences Paris, tome 273 (20 Décembre 1971), Série A, 1257-1260.
[18] J.C. Miellou, "Algorithme de relaxation chaotique à retards", Revue Française d'Automatique, d'Informatique et de Recherche Opérationnelle (Avril 1975) 55-82.
[19] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[20] J.M. Ortega and W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1972) 40-43.
[21] A.M. Ostrowski, Solution of equations and systems of equations (Academic Press, New York, 1966).
[22] B.T. Poljak, "Existence theorems and convergence of minimizing sequences in extremum problems with restrictions", Soviet Mathematics Doklady 7 (1966) 72-75.
[23] E. Polak, Computational methods in optimization, a unified approach (Academic Press, New York, 1971).
[24] F. Robert, "Contraction en norme vectorielle: convergence d'itérations chaotiques pour des équations de point fixe à plusieurs variables", Colloque d'Analyse Numérique (Gourette, 1974).
[25] F. Robert, M. Charnay et F. Musy, "Itérations chaotiques série-parallèle pour des équations non linéaires de point fixe", Aplikace Matematiky 20 (1975) 1-37.
[26] S. Schechter, "Relaxation methods for linear equations", Communications on Pure and Applied Mathematics 12 (1959) 313-335.
[27] S. Schechter, "Minimization of a convex function by relaxation", in: J. Abadie, ed., Integer and nonlinear programming (North-Holland, Amsterdam, 1970) 177-189.
[28] R.S. Varga, Matrix iterative analysis (Prentice-Hall, Englewood Cliffs, NJ, 1962).
[29] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
Mathematical Programming Study 10 (1979) 86-97. North-Holland Publishing Company
MODIFIED LAGRANGIANS IN CONVEX PROGRAMMING AND THEIR GENERALIZATIONS

E.G. GOL'SHTEIN
Central Economics-Mathematical Institute, Moscow, U.S.S.R.

and

N.V. TRET'YAKOV
Central Economics-Mathematical Institute, Moscow, U.S.S.R.

Received 22 June 1977
In this paper a rather general class of modified Lagrangians is described for which the main results of duality theory hold. Within this class two families of modified Lagrangians receive special consideration: the elements of the first family are characterized by so-called stability of saddle points, and the elements of the second family generate smooth dual problems. The computational methods naturally connected with each of these two families are examined. Further, a more general scheme is considered which applies the idea of modification to the problem of finding a root of a monotone operator. This scheme yields a unified approach to convex programming problems and to the determination of saddle and equilibrium points, and it also expands the class of modified Lagrangians.
Key words: Modified Lagrangians, Convex Programming, Stability of Saddle Points, Smooth Dual Problems, Monotone Operator.
1. A general class of modified Lagrangians in convex programming

Consider the convex programming problem

f(x) → sup,  g(x) = (g1(x), …, gm(x)) ≥ 0,  x ∈ G,   (1)

where G is a convex subset of the Euclidean space E^n and the functions f(x), gi(x) are finite-valued and concave on G.

The standard approach to problem (1) (see, e.g. [4]) makes use of the Lagrangian function

F0(x, y) = f(x) + (g(x), y),   (2)

where x ∈ G, y ∈ E^m, and (·, ·) denotes the inner product in the Euclidean space. One may briefly describe the role of function (2) in the following way. Denote

E^m_+ = {y ∈ E^m : y ≥ 0},  φ0(x) = inf_{y ∈ E^m_+} F0(x, y),  ψ0(y) = sup_{x ∈ G} F0(x, y)

and
consider the problems

φ0(x) → sup,  x ∈ G,   (3)

ψ0(y) → inf,  y ∈ E^m_+.   (4)

It is easy to verify that (3) is equivalent to the original problem (1). More precisely, for the feasible set Ĝ = {x ∈ G: g(x) ≥ 0} of problem (1) one has

φ0(x) = f(x) for x ∈ Ĝ,  φ0(x) = −∞ for x ∈ G, x ∉ Ĝ.   (5)

Thus the equality sup_{x ∈ G} φ0(x) = sup_{x ∈ Ĝ} f(x) holds together with X*_0 = X*, where

X*_0 = Arg max_{x ∈ G} φ0(x),  X* = Arg max_{x ∈ Ĝ} f(x).

On the other hand, under well-known conditions (see, e.g. [4]) the duality framework is valid, i.e.

sup_{x ∈ G} φ0(x) = inf_{y ∈ E^m_+} ψ0(y),

the saddle points of F0(x, y) with respect to (x, y) ∈ G × E^m_+ forming the set X*_0 × Y*_0 with Y*_0 = Arg min_{y ∈ E^m_+} ψ0(y). In other words, the determination of a solution x* of problem (1) may be replaced by that of a saddle point (x*, y*_0) of function (2). Recall that the solutions y*_0 of the dual problem (4) yield a certain characteristic of stability of the original problem (1).

The same approach to the convex programming problem turns out to be applicable with F0(x, y) replaced by some other functions. The latter functions are called modified Lagrangians. A function F(x, y), concave in x ∈ G and convex in y ∈ E^m_+, is said to be a modified Lagrangian for problem (1) if a relation analogous to (5) holds, i.e. if

inf_{y ∈ E^m_+} F(x, y) = f(x) for x ∈ Ĝ,  = −∞ for x ∈ G, x ∉ Ĝ.

Let the saddle-point set X̂* × Ŷ* of the modified Lagrangian F(x, y) with respect to (x, y) ∈ G × E^m_+ be non-empty. Then, according to the given definition, the first component X̂* of this set coincides with the solution set X*_0 of problem (3) or, equivalently, with the solution set X* of problem (1).

Now we define a class of modified Lagrangians. Let λi(ξ, η), i = 1, …, m, be finite-valued functions for all ξ ∈ (−∞, +∞), η ∈ [0, +∞). For y = (y1, …, ym) define a function
Fλ(x, y) = f(x) + (g(x), y) − Λ(g(x), y)   (6)

with

Λ(g(x), y) = Σ_{i=1}^{m} λi(gi(x), yi).   (7)
Further, denote

φλ(x) = inf_{y ∈ E^m_+} Fλ(x, y),  ψλ(y) = sup_{x ∈ G} Fλ(x, y)

and consider the two problems

φλ(x) → sup,  x ∈ G,   (8)

ψλ(y) → inf,  y ∈ E^m_+.   (9)

Obviously, the ordinary Lagrangian (2) and problems (3), (4) are the particular case of (6)-(9) which corresponds to λi(ξ, η) ≡ 0, i = 1, …, m. Finally, denote

X*_λ = Arg max_{x ∈ G} φλ(x),  Y*_λ = Arg min_{y ∈ E^m_+} ψλ(y).
Lemma 1 (see [1]). Let the following assumptions hold for i = 1, …, m:
(a) λi(ξ, η) is convex in ξ, concave in η.
(b) The function ξη − λi(ξ, η) is non-decreasing in ξ for any η ≥ 0.
(c) lim_{η → +∞} [ξη − λi(ξ, η)] = −∞ for any ξ < 0.
(d) inf_{η ≥ 0} [ξη − λi(ξ, η)] = 0 for any ξ > 0.
Then the function Fλ(x, y) defined by (6), (7) is a modified Lagrangian. In particular, X*_λ = X*_0 = X*.

Under the assumptions of Lemma 1 the set Y*_λ depends upon the choice of λi(ξ, η), i = 1, …, m. A simple extra condition implies that Y*_λ is independent of this choice, i.e. one has Y*_λ = Y*_0.

Lemma 2 (see [1]). Let, in addition to (a)-(d), the following condition be satisfied for i = 1, …, m:
(e) λi(0, η) = (∂λi/∂ξ)(0, η) = 0 for all η ≥ 0.
Then the equality Y*_λ = Y*_0 holds.
The contraction of the class of modified Lagrangians due to the additional requirement (e) looks natural in view of the following circumstances. On the one hand this contraction maintains the usual interpretation of the vectors y*_0 ∈ Y*_0 in various applications (e.g., in mathematical economics). On the other hand it still gives one a chance to improve certain computational methods of convex programming, as is shown below.

The set of conditions (a)-(d) admits a more obvious form. Namely, Lemmas 1 and 2 imply the following proposition.

Theorem 1 (see [1]). Let the following assumptions hold for i = 1, …, m:
(1) λi(ξ, η) is convex in ξ ∈ (−∞, +∞), concave in η ∈ [0, +∞).
(2) λi(0, η) = (∂λi/∂ξ)(0, η) = 0 for all η ≥ 0.
E.G. Gol' shtein, N. V. Tret'yakov/ Modified Lagrangians
89
(3) λi(ξ, η) ≤ ξη for all ξ ≥ 0, η ≥ 0.
Then the function Fλ(x, y) defined by (6), (7) is a modified Lagrangian for any problem of the form (1). Furthermore, the equality Y*_λ = Y*_0 is valid.

Throughout the rest of the paper we deal only with such modified Lagrangians for which Y*_λ = Y*_0. In view of that we shall use the notation Y* for this set.
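As an illustration of Theorem 1, consider the λ generated by the "quadratic" modification a(u) = (γ/2)|u|² of Section 3. The closed form below is our own computation (it is not displayed in the paper); the sketch verifies requirements (2) and (3) pointwise on a sample grid:

```python
# A sketch checking that the "quadratic" lambda satisfies the pointwise
# requirements (2) and (3) of Theorem 1.  The closed form is our own
# computation of max_{t>=0} [(xi - t)*eta - (GAMMA/2)*(xi - t)**2]
# rewritten as xi*eta - lam(xi, eta); it is not displayed in the paper.
GAMMA = 2.0

def lam(xi, eta):
    if xi <= eta / GAMMA:
        return GAMMA * xi * xi / 2
    return xi * eta - eta * eta / (2 * GAMMA)

# (2): lam(0, eta) = (d lam / d xi)(0, eta) = 0 for eta >= 0
for eta in [0.0, 0.5, 3.0]:
    assert lam(0.0, eta) == 0.0
    h = 1e-6
    assert abs((lam(h, eta) - lam(-h, eta)) / (2 * h)) < 1e-5

# (3): lam(xi, eta) <= xi*eta for xi >= 0, eta >= 0, on a sample grid
grid = [0.1 * k for k in range(41)]
assert all(lam(x, e) <= x * e + 1e-12 for x in grid for e in grid)
print("conditions (2) and (3) hold on the grid")
```

Convexity in ξ and concavity in η (condition (1)) also hold for this λ, since each branch is quadratic or linear and the two branches join smoothly along ξ = η/γ.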
2. Modified Lagrangians and stability of saddle points

The following definition was first given in [5] (see also [1]). The saddle-point set U* × V* of a function F(u, v), concave in u ∈ U and convex in v ∈ V, is said to be stable in u (with respect to F(u, v)) iff

Arg max_{u ∈ U} F(u, v*) = U* for any v* ∈ V*.
Theorem 2 (see [1]). Let, in addition to (1)-(3), the following condition hold for i = 1, …, m:
(4) λi(ξ, η) > 0 for ξ ≠ 0, η > 0 and for ξ < 0, η = 0.
Then the set X* × Y*, which is the saddle-point set of the function Fλ(x, y), is stable in x with respect to this function.

Evidently condition (4) is not satisfied for λi(ξ, η) ≡ 0. This agrees with the well-known fact that if no extra properties similar to strict concavity of f(x), gi(x) are present, then the set X* × Y* is not stable in x with respect to the ordinary Lagrangian F0(x, y) in the sense of the above definition. The stability of saddle points in y may be defined in a similar way.

There is a connection between stability and the problem of convergence of the subgradient method for determining saddle points. The following proposition concerns arbitrary saddle functions which may have no relation to problem (1). Let U, V be closed convex sets in Euclidean spaces and let O, Q be open convex sets which contain U and V respectively. Consider a function F(u, v), concave in u ∈ O and convex in v ∈ Q, whose saddle-point set U* × V* with respect to u ∈ U, v ∈ V is assumed to be non-empty. Let ∂uF and ∂vF denote the subdifferentials of F in u and in v respectively. Further, let πU and πV be the projections on U and on V. The subgradient projection method may be written as follows:

u^{k+1} = πU(u^k + αk l^k_u),  v^{k+1} = πV(v^k − αk l^k_v),   (10)

where

l^k_u ∈ ∂uF(u^k, v^k),  l^k_v ∈ ∂vF(u^k, v^k).
Theorem 3 (see [1, 5]). If the saddle-point set U* × V* of the function F(u, v) with respect to u ∈ U, v ∈ V is bounded and stable both in u and in v, and if

lim_{k → ∞} αk = 0,  Σ_{k=0}^{∞} αk = +∞,

where α0 = α0(u^0, v^0) is small enough, then method (10) converges in terms of the distance, i.e.

lim_{k → ∞} dist(u^k, U*) = lim_{k → ∞} dist(v^k, V*) = 0.

One may indicate explicit expressions of modified Lagrangians with the property of saddle-point stability in both variables in the case of the linear programming problem. In the general case of problem (1) such explicit expressions are not known, although it is possible to obtain modified Lagrangians with the desired property using the operation "sup" in x (see below, Section 5). In view of that, the following recent result of Maistrovskii [6] is of interest. Let the saddle function F(u, v) considered in Theorem 3 be smooth. Then according to [6] the statement of Theorem 3 is valid if the stability of U* × V* either in u or in v is required instead of stability in both variables. Therefore if the functions involved in the convex programming problem are smooth, then the subgradient projection method (10) with an infinitely decreasing step size, applied to a modified Lagrangian which satisfies the conditions of Theorem 2, converges to the set X* × Y* in terms of the distance.
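A minimal numerical sketch of method (10) follows (our own toy example: F(u, v) = −u² + uv + v², concave in u and convex in v, with U = V = [−5, 5] and unique saddle point (0, 0); the step sizes αk = 1/(k+1) decrease to zero as in Theorem 3, and the subdifferentials reduce to gradients here):

```python
# Subgradient projection method (10) on a smooth toy saddle function
# F(u, v) = -u**2 + u*v + v**2 with U = V = [-5, 5]; saddle point (0, 0).
def proj(t):                      # projection on [-5, 5]
    return max(-5.0, min(5.0, t))

u, v = 3.0, 3.0
for k in range(2000):
    a = 1.0 / (k + 1)             # step size alpha_k -> 0
    lu = -2 * u + v               # gradient of F in u (ascent step)
    lv = u + 2 * v                # gradient of F in v (descent step)
    u, v = proj(u + a * lu), proj(v - a * lv)
print(u, v)  # both close to 0
```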
3. Modified Lagrangians generating smooth dual problems and the modified dual method of convex programming
The requirements (1)-(4) of Theorem 2 permit one to get a smooth dual problem (9) with no restrictive assumptions concerning the original problem (1). This important property of modified Lagrangians was first discovered by several authors [7-9] for the "quadratic" modification (see below). It is this property that leads to certain generalizations of modified Lagrangians given in Section 4.

Consider a family of modified Lagrangians for which the property holds. Let a(u) be a convex function which belongs to C1(E^m) and satisfies the conditions a(0) = 0, ∇a(0) = 0 and

|∇a(u') − ∇a(u'')| ≥ γ|u' − u''|,  γ > 0,

for all u', u'' ∈ E^m. Consider the function

F^a(x, y) = f(x) + max_{t ∈ E^m_+} [(g(x) − t, y) − a(g(x) − t)]   (11)

which is concave in x ∈ G and convex in y ∈ E^m. Generally speaking, the function F^a(x, y) does not admit the form (6) with Λ(g(x), y) of the separable
type (7). But as a matter of fact the form (6), (7) was used mainly to simplify the notation, and it can easily be verified that the statements of Theorems 1-3 are valid for F^a(x, y). Furthermore, it is quite natural to consider F^a(x, y) for any y ∈ E^m since, according to the following proposition, the function

ψ^a(y) = sup_{x ∈ G} F^a(x, y)

is well-defined (and smooth) everywhere in E^m.

Theorem 4 (see [1, 2]). The saddle-point set of F^a(x, y) with respect to x ∈ G, y ∈ E^m coincides with X* × Y*, this set being stable in x. The only assumption ψ0(y) ≢ +∞ implies that ψ^a(y) ∈ C1(E^m) and

|∇ψ^a(y') − ∇ψ^a(y'')| ≤ (1/γ)|y' − y''|,  y', y'' ∈ E^m,

where γ is the constant involved in the definition of a(u).

Note that the modulus 1/γ in the Lipschitz condition above depends only upon the choice of a(u), and consequently it may be treated as known. Therefore the dual problem

ψ^a(y) → inf,  y ∈ E^m,

i.e. the problem of unconstrained minimization of the smooth function ψ^a(y), may be solved by means of the finite-step gradient method. This is a way of solving problem (1), and it is natural to call it the modified dual method.

When investigating the convergence of the method, one should take account of the fact that in general the calculation of ∇ψ^a(y) may be carried out only approximately. In the following scheme the errors in calculating ∇ψ^a are expressed in terms of approximate maximization of F^a(x, y) over x ∈ G, which is natural from the computational point of view. So, for any y^0 ∈ E^m let the sequences {x^k}, {y^k}, k = 0, 1, …, be defined by the relations

x^k ∈ G,  F^a(x^k, y^k) ≥ sup_{x ∈ G} F^a(x, y^k) − δk,

y^{k+1} = y^k − γk ∇y F^a(x^k, y^k),   (12)

with

δk ≥ 0,  0 < inf γk ≤ sup γk < 2γ.

Denote by v and v̂ respectively the optimal value and the asymptotic optimal value of problem (1).

Theorem 5 (see [2]). Assume that ψ0(y) ≢ +∞ and Y* ≠ ∅. Let {x^k}, {y^k} satisfy (12) with an arbitrary starting point y^0 ∈ E^m. Then the condition Σ_{k=0}^{∞} δk^{1/2} < +∞
implies that

lim_{k → ∞} y^k = y* ∈ Y*,  lim inf_{k → ∞} gi(x^k) ≥ 0,  lim_{k → ∞} f(x^k) = v̂.
Moreover, in the case when Y* is bounded (e.g., when the Slater condition is satisfied), the only requirement lim_{k → ∞} δk = 0 implies that

lim inf_{k → ∞} gi(x^k) ≥ 0,  lim_{k → ∞} f(x^k) = v,  lim_{k → ∞} dist(y^k, Y*) = 0.

For the case of a closed bounded set G, some algorithms for solving problem (1) which are based on the scheme (12) are considered in [2]. For the "quadratic" modified Lagrangian, i.e. for a(u) = (γ/2)|u|², the modified dual method was investigated in [7-10]. On the other hand, in several papers by B.W. Kort and D.P. Bertsekas (see, e.g., [12, 13]) a version of the dual method for non-quadratic modified Lagrangians is examined which differs from the method presented here.
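For the quadratic modification a(u) = (γ/2)|u|², scheme (12) can be sketched on a one-dimensional toy problem (our own example, not from the paper): f(x) = −(x − 2)², g(x) = 1 − x ≥ 0, G = R, with solution x* = 1 and multiplier y* = 2. The inner sup over x is a grid search, so δk is of the order of the grid resolution:

```python
# Modified dual method (12) with the quadratic modification
# a(u) = (GAMMA/2)*u**2 on a toy instance of problem (1) (our own example).
GAMMA = 1.0

def f(x):  return -(x - 2.0) ** 2
def g(x):  return 1.0 - x         # constraint g(x) >= 0

def F_a(x, y):                    # F^a(x, y) from (11), inner max in closed form
    xi = g(x)
    if xi >= y / GAMMA:
        return f(x) + y * y / (2 * GAMMA)
    return f(x) + xi * y - GAMMA * xi * xi / 2

xs = [k * 0.001 - 2.0 for k in range(6001)]   # grid for the inner sup over x
y = 0.0
for _ in range(60):
    x = max(xs, key=lambda t: F_a(t, y))      # near-exact inner maximization
    y = y - 1.0 * min(g(x), y / GAMMA)        # grad_y F^a = min(g(x), y/GAMMA)
print(x, y)  # close to x* = 1, y* = 2
```

With γk = 1 this is the classical multiplier update for the quadratic modification; the dual iterates converge linearly here (the error in y shrinks by the factor 2/3 per step in this instance).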
4. The modification method for monotone operators
Let Z be a convex set in the Euclidean space E. A point-to-set operator T: Z → 2^E is considered, the set T(z) being non-empty for all z ∈ Z. The operator T is supposed to be monotone, i.e. for any z', z'' ∈ Z and for any t' ∈ T(z'), t'' ∈ T(z'') one has

(t' − t'', z' − z'') ≥ 0.

The problem under investigation consists in finding a root of T on the set Z, i.e. such z* ∈ Z that 0 ∈ T(z*). In the special case when Z = E and T is a single-valued operator satisfying the inverse strong monotonicity condition

(T(z') − T(z''), z' − z'') ≥ γ|T(z') − T(z'')|²,  γ > 0,   (13)

for all z', z'' ∈ E, the problem may be solved by means of the following method:

z^{k+1} = z^k − γk l^k,  |l^k − T(z^k)| ≤ εk,  εk ≥ 0,  k = 0, 1, … .

Theorem 6 (see [3]). Suppose that (13) holds, the set Z* = {z: T(z) = 0, z ∈ Z} is non-empty, and the following inequalities are satisfied:

0 < inf γk ≤ sup γk < 2γ,  Σ_{k=0}^{∞} εk < +∞.
Then lim_{k → ∞} z^k = z* ∈ Z*.

It is shown in [2] that if, in particular, T(z) is the gradient of a convex differentiable function f(z), then (13) is equivalent to the Lipschitz condition for
T(z). Obviously, in this case the method above coincides with the perturbed gradient method of minimization of f(z).

The modification method presented here treats the general case in which (13) is not valid. In this case the problem of finding a root of a given arbitrary monotone operator is replaced by that of finding a root of a modified operator, the latter satisfying condition (13). The modification may be obtained by the following scheme. Let R: E → E be any operator satisfying the inverse strong monotonicity condition (13) as well as the strong monotonicity condition

(R(z') − R(z''), z' − z'') ≥ γ1|z' − z''|²,  γ1 > 0,   (14)

for all z', z'' ∈ E, with γγ1 ≤ 1. For each w ∈ E consider the operator T_{R,w}: Z → 2^E defined by the equality

T_{R,w}(z) = T(z) − R(w − z)

and denote by z(w) the root of T_{R,w}. (Evidently, if such a z(w) exists then it is the only root of T_{R,w} on Z.) The modified operator T_R: E → E is then defined by the formula

T_R(w) = R(w − z(w)).
Recall that a monotone operator T: Z → 2^E is said to be maximal monotone if z' ∈ Z, t' ∈ T(z') whenever the inequality (t' − t, z' − z) ≥ 0 holds for all z ∈ Z, t ∈ T(z). An important example of a maximal monotone operator is given by

T = K + Q: Z → 2^E,

where Z is a closed convex set, K(z) is the normal cone for Z at the point z, i.e.

K(z) = {l : (l, z' − z) ≥ 0, ∀z' ∈ Z},

and Q is a monotone upper-semicontinuous point-to-set mapping, the set Q(z) being compact in E for each z ∈ Z.

Theorem 7 (see [3]). Suppose that T: Z → 2^E is a maximal monotone operator. Then the following properties hold:
(i) the modified operator T_R is well-defined and single-valued for each w ∈ E;
(ii) T_R satisfies condition (13);
(iii) the modified operator T_R: E → E has the same roots as the original operator T: Z → 2^E.

Obviously, one may set R(z) = ∇φ(z), where the function φ(z) is strongly convex in E, the gradient ∇φ being Lipschitz continuous. Note that the case of R(z) = ∇φ(z) with φ(z) = ½|z|² was considered by R.T. Rockafellar in [13].
As to the root z(w) of T_{R,w}, it can be found with any accuracy by means of a "gradient-like" method with infinitely decreasing step size (see [3, Section 4]). Thus, in view of Theorems 6 and 7, the modification method yields, for practically any maximal monotone operator, a process converging to its root.
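A one-dimensional sketch of the modification method (our own example): take T(z) = ∂|z| on Z = E = R, which is maximal monotone, multi-valued at 0 and fails (13), and R(z) = ∇φ(z) = z with φ(z) = ½z², so γ = γ1 = 1. Then z(w) is the soft-thresholding of w and T_R(w) = clip(w, −1, 1) satisfies (13); iterating w^{k+1} = w^k − γk T_R(w^k), i.e. the method of Theorem 6 applied to T_R, reaches the root of T:

```python
# Modification method for T(z) = subdifferential of |z| (root z* = 0),
# with R(z) = z (phi(z) = z**2/2, gamma = gamma_1 = 1).  The root z(w)
# of T_{R,w}(z) = T(z) + z - w is the soft-thresholding of w, hence
# T_R(w) = R(w - z(w)) = clip(w, -1, 1): single-valued and satisfying (13)
# although T itself does not.
def z_of_w(w):                    # unique root of T(z) + z - w
    if w > 1.0:  return w - 1.0
    if w < -1.0: return w + 1.0
    return 0.0

def T_R(w):                       # modified operator, equals clip(w, -1, 1)
    return w - z_of_w(w)

w = 4.5
for _ in range(10):               # gamma_k = 1 in (0, 2*gamma)
    w = w - 1.0 * T_R(w)
print(w)  # the root of T: 0.0
```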
5. Some applications of the modification method

(A) Consider again the convex programming problem (1). Using the notations of Sections 1-3, let us set

E = E^m,  Z = Y = {y ∈ E^m : ∂ψ0(y) ≠ ∅},  T(y) = ∂ψ0(y)

in the above scheme of the modification method. The operator T is known to be maximal monotone under the assumption that ψ0(y) ≢ +∞. Further, set R(y) = ∇φ(y), the convex differentiable function φ(y) satisfying the conditions

γ1|y' − y''| ≤ |∇φ(y') − ∇φ(y'')| ≤ (1/γ)|y' − y''|,  0 < γ1 ≤ 1/γ,   (15)

for all y', y'' ∈ E^m. The modified operator T_R, which under the assumption above is well-defined, takes the form

T_R(v) = ∇ψ^a(v),  v ∈ E^m,
z = (x,y),
T(z) = (-O~Fo(x, y)) x {~TrFo(x, y)} = O~Fo(z), Z = {z ~ E"+" : a~Fo(z) = t~}, g(z) = {V,p,(x), V,p2(y)}, where the function r conditions:
satisfies (15), and the function ~o~(x) satisfies similar
y l x ' - x"[-< [V~,(x') - V~,(x")[--- 1 I x ' - x"[,
(16)
for all x', x" E E". If the set G involved in problem (1) is closed and the functions [(x), gi(x) are upper-semicontinuous over G then the modified operator TR exists and
TR(w) = V~P"(w) = ( - V . F " ( w ) , V~P"(w)),
w = (u, v) E E "+",
95
E.G. Gol'shtein, N. V. Tret'yakov/ Modified Lagrangians
with Pa(u, v) = max #"(u, v, x), xEG
# ~ ( u , v, x ) = f ( x ) - a l ( u - x ) + max [(g(x) - t, v) - a 2 ( g ( x ) - t)] tEE+
= F'~2(x, v) - a d u - x ) ,
al(u) = ,p,(u),
a2(v) = ~'(v).
By Theorem 7 the function P~ is a modified Lagrangian for problem (1) with the following properties: (i) the saddle-point set of Pa(u, v) with respect to u E E n, v E E" coincides with X* x Yr
(ii) P~(u, v) is differentiable in (u, v) = w and there holds (VwP~(w')- VwF~
w ' - w " )>- 3' [ VwPO(w')- VwP~(w")l2
(17)
with w', w " E E n+m,
VwP~(w) = (-Vu#a(w), VvP~(w)).
The inequality (17) being verified, a saddle point of F ( u , v) may be found by means of the perturbed finite-step gradient method (see Section 2) which in this case takes the form x k ~ G,
# ~ ( u k, v k, x k) -> P ~ ( u k, v k ) - 8~,
8k --- 0,
u k+l = u k - ykVat(U t - x k ) ,
k = 0, 1 . . . . .
v~+, = v k _ y k V v # ~ ( u k, v k, xk), T h e o r e m 8 (see [3]). A s s u m e
that
G is c l o s e d a n d the f u n c t i o n s
i = 1 . . . . . m are u p p e r - s e m i c o n t i n u o u s
o v e r G a n d let t h e s a d d l e - p o i n t
Y * o f Fo(x, y), x E G, y ~ E~' be n o n - e m p t y .
0 < inf Yk --<sup Yk < 2y,
f(x), gi(x), set X * x
T h e n u n d e r the c o n d i t i o n s
8~/2 < + ~
~__0
one has i m u k = u* E X * ,
i m v k = v* E Y *
f o r a n y s t a r t i n g p o i n t (u ~ v ~ C E n+m.
Note that for the case of quadratic functions al(u), a2(v) Theorem 8 was first proved by R.T. Rockafellar in [14].
6. O n e m o r e p r o p e r t y of the m o d i f i e d L a g r a n g i a n
Fa(x,
y)
The inequality (17) is valid for the function pa but it does not hold in the case
96
E.G. Gol'shtein. N. V. Tret'yakov/ Modified Lagrangians
o f F a. T h e l a t t e r f u n c t i o n satisfies the c o n d i t i o n (V~F~(z ') - V y ~ ( z " ) , z ' - z">= = - (VxF~
') - VxF~(z"), x ' - x">+ (VvFa(z ') - VyF~(z"), y ' - y")
_ 3,lV,Fa(z ') - V,Fa(z")12;
(18)
w i t h z = (x, y), x', x " E G, y', y " E E ~, w h i c h is not as s t r o n g as (17). N e v e r t h e less, the c o n d i t i o n (18) is sufficient f o r c o n v e r g e n c e o f the f i n i t e - s t e p g r a d i e n t m e t h o d to a s a d d l e p o i n t o f F~(x, y) w h e n e v e r the f u n c t i o n s f ( x ) , gi(x) i n v o l v e d in p r o b l e m (1) a r e s m o o t h e n o u g h . M o r e p r e c i s e l y , s u p p o s e t h a t G = E n a n d t h a t the g r a d i e n t s V f ( x ) , Vg~(x) a r e L i p s c h i t z c o n t i n u o u s o n e a c h b o u n d e d s u b s e t o f E n. C o n s i d e r t h e p r o c e s s xk+t = x k + / 3 V x F ~ ( x ~, yk), yk+l = y~ _ / 3 V y F , ( x k, y~),
k=0,1
.....
Theorem 9. I f X * x Y * • ~ then f o r a n y s t a r t i n g p o i n t (x ~ yO) there exists such /3o(Xo, yO) > 0 t h a t u n d e r the condition 0 < / 3 3o(X ~ yO) one has [im=x k = x * E X * ,
[imy~=y*~
Y*
f o r the p r o c e s s above.
References [I] E.G. Gol'shtein and N.V. Tret'yakov, "Modified Lagrangian functions", Economics and Mathematical Methods 10 (3) (1974) 568--591 (in Russian). [2] E.G. Gol'shtein and N.V. Tret'yakov, "The gradient method of minimization and algorithms of convex programming based on modified Lagrangian functions", Economics and Mathematical Methods 11 (4) (1975) 730-742 (in Russian). [3] E.G. Gol'shtein, "The modification method for monotone mappings", Economics and Mathematical Methods 11 (6) (1975) 1144-1159 (in Russian). [4] E.G. Gorshtein, Theory of convex programming, AMS Translation Series (1972), (Translation of a book in Russian edited by "Nauka", Moscow, 1970). [5] E.G. Gol'shtein, "The generalized gradient method for determination of saddle points", Economics and Mathematical Methods 8 (4) (1972) 569-579 (in Russian). [6] G.D. Maistrovskii, "On the gradient methods for determination of saddle points", Economics and Mathematical Methods 12 (5) (1976) (in Russian). [7] B.T. Polyak and N.V. Tret'yakov, "On an iterative method of linear programming and its economic interpretation", Economics and Mathematical Methods 8 (5) (1972) 740-751 (in
Russian). [8] N.V. Tret'yakov, "The method of penalty prices for convex programming problems", Economics and Mathematical Methods 9 (3) (1973) 525-540 (in Russian). [9] R.T. Rockafellar, '% dual approach to solving nonlinear programming problems by unconstrained optimization", Mathematical Programming 5 (3) (1973) 354-373. [10] R.T. Rockafellar, "The multiplier method of Hestenes and Powell applied to convex programruing", Journal of Optimization Theory and Applications 12 (6) (1973) 555-562. [11] B.W. Kort and D.P. Bertsekas, "Multiplier methods for convex programming", Proceedings 1973 IEEE Conf. on Decision and Control (San Diego, California) 428-432.
E.G. Gol'shtein. N. V. Tret'yakov/ Modi]ied Lagrangians
97
[12] B.W. Kort and D.P. Bertsekas, "Combined primal dual and penalty methods for convex programming", S I A M Journal on Control and Optimization 14 (2) (1976) 268-294 [13] R.T. Rockafellar, "Monotone operators and the proximal point algorithm", S I A M Journal on Control and Optimization 14 (5) (1976). [14] R.T. Rockafellar, "Augmented Lagrangians and applications of the proximal point algorithm in convex programming", Mathematics o f Operations Research 1 (2) (1976) 97-116.
Mathematical Programming Study I0 (1979) 98-103. North-Holland Publishing Company
EXTENSIONS OF ZANGWILL'S
THEOREM
P. H U A R D Electricit~ de France, Clamart, France University of Lille, Lille, France
Received 20 January 1978 The validity of Zangwill's general algorithm for finding a point of a subset of a set is given here with weakened hypotheses. In particular the closedness of the point-to-set map used in the algorithm is not needed. Key words: Point-to-Set Maps, General Algorithm, Fixed Point, Convergence Conditions.
Introduction The results given in this paper are extensions of Zangwill's well known theorem [6, p. 91, T h e o r e m A]. The hypotheses needed here are weaker than Zangwill's or Polak's hypotheses [5, p. 278, T h e o r e m 2.3] t. In particular, neither the upper semi-continuity of the point-to-set map used, nor the classical strictly monotonic property is needed. The main hypothesis we use here was proposed in 1971 simply in a concern for synthesis, more precisely for the needs of an adacemic course. But its usefulness for applications only came to light a few years later: in 1974 for proving the convergence of a method of convex programming using successive linearizations (Fiorot and Huard [2]), and in 1976 in a subgradient method for solving the generalized F e r m a t - W e b e r problem (Cordellier and Fiorot [1]). This hypothesis has recently been used, in the form of Relation (~'3) of Corollary 3, by G.G.L. M e y e r [4, p. 780, Hypothesis 5], for a comparative study of hypotheses. Concerning Corollaries 1 and 2, which are but direct applications of Proposition 0, two different definitions of the algorithm are considered. Corollary 2 uses a more flexible iterative rule which applies to the case where it is not possible to check whether or not the current point x~ is a solution of the problem. Corollary 3 and Proposition 4 give us an easier comparison of the preceding results with those of Zangwill and Polak. Notations R ", the n-dimensional euclidean space. N, the set of nonnegative integers. ~The latter author proposes in this volume, with S. Tishyadhigama and R. Klessig, weaker hypotheses. See the corresponding paper. 98
P. Huard/ Zangwill' s theorem
99
~ ( E ) , the set of the subsets of a set E. /~, the closure of the set E. In the following, we shall call cluster point of an infinite sequence of points any limit of an infinite sub-sequence.
Proposition 0. Let E be a closed subset o f R", P be a subset o f E, Q be a subset o f P. Let F1, F2 : ( E - Q ) ~ ~ ( E ) be two point-to-set maps such that F~ D F2. We assume, moreover, f o r all x E E - Q: (Clo) F2(x) # O. (/30) x' E (E - Q) n FI(X) :~ Fl(x') C Fz(x). (~/o) If x~_P, 3 V ( x ) , a neighbourhood of x, such that: x' E (E - Q) n V(x),
x" E (E - Q) n F2(x') ~ x ~ F,(x").
A n infinite sequence {xi I i E N } is generated with the following rule: Xo~ E. xi+l E F2(xi)
(ro)
if xi~ P.
xi+l ~ F2(xi) U{xi} xi+l= xi
if xi ~ P - Q.
if xi E Q.
Under these conditions, f o r any cluster point x . o f the sequence we have x . ~ P.
Remark 1. One practical advantage of this proposition, which uses an auxiliary subset Q, is to group in a single proof the very similar direct proofs of the two results that follow (Corollaries 1 and 2). This idea was suggested to me by F. CordeUier. Remark 2. As we can check in the following proof, Hypothesis (3'o) is used at point x . only. Consequently this hypothesis may be weakened in an appropriate manner (cf. [3]) for some applications. Remark 3. As we shall see at the end of the proof, it is possible to substitute F.~(x') for F2(s') in Hypothesis (y0), with p being any positive integer depending on x'. Proof. Let x, be a cluster point (if one exists) of the sequence. Let N ' be a subset of N such that i ~ N'c~x~+~ # x~. Two cases have to be considered. (1) The sub-sequence corresponding to N ' is finite. Therefore, from a certain rank i0 we have x~ = x~ = x,, Vi ~ N, i -> i0. Let us assume x ~ P. Then x~+lE F2(x~), Vi E N, i - i0, and consequently
1oo
P. Huardl Zangwill's theorem
x , E F2(x,). We may use (3,0) with x = x ' = x " = x,, and we obtain x, ~ Ft(x,), and hence x, ~ F2(x,) because f t ~ Us, hence a contradiction. The hypothesis x~ ~ P is not possible, and the proposition is then satisfied.
(2) The s u b - s e q u e n c e corresponding to N ' is infinite. By definition of N', x. is a cluster point of this sub-sequence. Denoting by s(i) the successor of i in N ' , we have: xiEE-Q,
ViEN',
(1)
xs(i) E FE(Xi),
Vi E N ' .
(2)
Since F, D F2 and from (2) we have x.,~ E F,(xi), Vi E N ' and using (/3o) we get in succession: Fl(x sti)) C Fl(xi),
V i E N'.
xj E Fl(X~),
Vi, j ~ N " , j > i.
x , E F~(xi),
Vi E N ' .
(3)
Let N " be a subset of N ' defining a sub-sequence converging towards x,. This convergence implies with (1): 3 i o E N": xi E ( E - Q) n V ( x , ) ,
v i E N " , i >_ io.
(4)
(1), (2) and (4) allow us to use (yo) with x = x,, x ' = xi and x " = x,,), where i E N". We get: X , v: F,(xs,~),
Vi E N", i >- io.
(5)
(3) and (5) are in contradiction because s ( i ) E N " if i ~ N " by definition. Then the hypothesis x, ~ P is impossible. If following Remark 3 we modify Hypothesis (3,0) we may still apply (3,0) taking for x" the pth successor of xi in the sub-sequence N". Denoting it by xj, it is evident that x i E F~(x~), and relation (3) and relation (5) thus obtained are still in contradiction.
Corollary 1. L e t E be a closed subset o f R", P be a subset o f E. L e t El, F2: (E-P)~ ~ ( E ) be two p o i n t - t o - s e t m a p s such that Fl D F2. W e s u p p o s e more f o r all x E E - P :
(al) F~(x) # fk. (/30 x ' E (E - P ) n F,(x) => Fl(x') c F,(x). (yl) :1V(x), a n e i g h b o u r h o o d o / x , such that: x' E ( E - P ) fq V(x),
x" ~ ( E - P ) tq F2(x') ~ x ~ F,(x").
A n infinite s e q u e n c e {xi ] i E N } is generated with the [ollowing rule:
P. Huardl Zangwill's theorem
101
x0C E. (rO
xi+l ~ F2(xi) Xi+l = xl
if x ~ P.
otherwise.
Under these conditions, f o r any cluster point x , o f the sequence we have x. EP.
Proof. It is a direct application of Proposition O, taking Q = P.
Remark 4. Originally in [3], instead of Hypothesis (yl), this proposition used the slightly stronger hypothesis (y2) of Corollary 2. This weakening, which does not alter the proof, was suggested by J. Denel. Remark 5. Rule (rt) for generating the sequence assumes that we are able to check whether or not a given point x belongs to P. The next corollary uses a more flexible rule (r2) which permits us to take for xi+~ a point different from x, even if x~ ~ P: that is to say using the rule x~+tE F2(x~). This flexibility is obtained at the cost of a slight strengthening of the hypotheses. Corollary 2. We use the same definitions and hypotheses as in Corollary 1, with the following modifications : F~, F2 are defined over the whole set E. (aO and (ill) are supposed valid f o r the whole set E. (y0 is replaced by: (Y2) If x ~ P. ::iV(x), a neighbourhood of x, such that: x'EEtqV(x),
x"~F2(x')~x~FK(x").
Here, the generating rule (r0 becomes:
(r2)
Xo~ E. xi+t E F2(xi) if x ~ P. xi+m E F2(xi)U{xi} otherwise.
Under these conditions, for any cluster point x, of the sequence we have x , E P.
Proof. It is a direct application of Proposition 0, taking Q = ~.
Corollary 3 (Extension of Zangwill's and Polak's theorems [6, 5]). Let E be a closed subset o f R ~. P be a subset o f E. F2:E~(E),
f:E~R.
We assume f o r all x E E - P:
(or3) F2(X) # ~.
102
P. Huard/ Zangwill" s theorem
(/33) x' E F2(x) :~ f ( x ' ) > f(x). (3,3) f upper-semi-continuous at x. 3 V(x), a neighbourhood o f x, such that: x" ~ ( E - P ) O F2(x') :::>f(x") > f ( x ) .
x' E ( E - P ) O V(x),
A n infinite sequence {x, I i E N } is obtained with the following rule: XoE E.
(r,)
xi+l E F2(xi) xi§ = xi
if xi~ P.
otherwise.
Under these conditions, f o r any cluster point x , o f the sequence we have x , E P.
Proof. This is an application of Corollary l, taking for F~ the point-to-set map defined by: El(x) = {y E E ] f ( y ) > f(x)},
which satisfies (/30. Furthermore, (/33) implies F, D F2. Lastly, (3'3) implies (3'0 in this particular context. In fact:
ffx") > f(x) ~ x r {y E E ]f(y) >- f(x")}, f u.s.c, over E - P } x"E E - P ::~ {y E. E ] f ( y ) >- f(x")} D Fl(x")n (E - P), and hence x ~ Fl(x"). Remark 6. With a slightly different formulation, Zangwill proposed in [6] the following stronger hypothesis: E compact, f continuous over E, F2 closed over E - P. His hypothesis, with (/33), implies (3'3), as shown further on by Proposition 2. Polak proposed in [5] the following hypothesis. Vx E E - P, we have (i) f continuous at x; (ii) 3 V(x), a neighbourhood of x, and 8(x), a positive scalar, such that:
x' ~ E n V(x), x" ~ F2(x')~ f(x") > f(x')+ 8(x).
This hypothesis implies (3'3)and (/33). Proposition 4. Using the notations and definitions of Corollary 3 we have the following relation. Suppose E - P compact, F2 closed at x,
f l.s.c, over E - P , f(x') > f(x),
V x ' E F~(x).
Then there exists V(x), a neighbourhood o f x, such that x'~(E-P)OV(x),
x"E(E-P)OF2(x')~f(x")>f(x).
103
P. Huardl Zangwill's theorem
Proof, Let us assume the negation o f the conclusion, that is: (Hyp.)
VV(x),
3x'E(E-P)NV(x),3x"E(E-P)AF2(x')
such that
.f(x") -< .f(x) and let us show that this hypothesis leads to a contradiction. Under this hypothesis, there exist two sequences of points xi and Yi such that:
{xi E E - P l i ~ N}--> x. {yi E ( E - P ) nF2(x~) I i E N}: f(y~) <-.f(x),
Vi E N.
The compacity of E - P implies the existence of an infinite sub-sequence of points y, defined by N' C N, and converging towards y E E - P . And the lower semi-continuity of 1" over E - P gives f(y) <-.((x). Furthermore, F2 being closed at x, we have y E F2(x), and hence ]'(y) >.f(x), in contradiction with the preceding result.
References [1] F. Cordellier and J.C. Fiorot, "On the Fermat-Weber problem with convex cost function", Mathematical Programming 14 (1978). [2] J.C. Fiorot and P. Huard, "Une approche th6orique du probl~me de lin6arisation en programmation math6matique convexe", Publication No. 42 du Laboratoire de Calcul (Universit6 de Lille, 1974). [3] P. Huard, "Optimisation dans R"", Cours de 3b,me cycle, Laboratoire de Calcul (Universit6 de Lille, 1971). [4] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", S l A M Journal on Control and Optimization 15 0977) 779-784. [5] E. Polak, "On the implementation of conceptual algorithm", in: O.L. Mangasarian, K. Ritter and J.B. Rosen, eds., Nonlinear programming (Academic Press, New York, 1970) 275-291. [6] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice Hall, Englewood Cliffs, RI, 1969).
Mathematical Programming Study 10 (1979) 104-109. North-Holland Publishing Company
ON T H E L O W E R S E M I C O N T I N U I T Y OF O P T I M A L SETS IN CONVEX P A R A M E T R I C O P T I M I Z A T I O N D. KLATTE Der Humboldt Universitiit, Berlin, G.D.R. Received December 1977 Revised manuscript received March 1978 Regarding a special class of convex parametric problems sufficient conditions for the lower semicontinuity of the optimal solution sets are developed. Key words: Lower Semicontinuity, Convex Parametric Programs, Optimal Solutions Set, Point-to-Set Maps.
1. Introduction and notation
We consider a parametric programming problem given by P(w)
min{f0(x,w) Ix E M(w)},
w E W variable,
(1)
where the parameter set W is a metric space, [0 is a real-valued function on E" • W and for each w E W, M(w) C_E" represents the constraint set. E" is the Euclidean n-space. Numerous authors have discussed the continuity of the optimal sets and/or the extreme values for various classes of linear and nonlinear mathematical programs, we refer to [1-15]. They have studied these questions by applying appropriate concepts of set convergence or of semicontinuity of point-to-set maps. Throughout this paper Berge's concepts of lower semicontinuous (l.s.c.), upper semicontinuous (u.s.c.) and closed point-to-set maps are used [1, Chapter VI]. In the second paragraph we will apply some results published by Hogan [8]. For point-to-set maps from a metric space W into the Euclidean n-space Hogan's definitions of semicontinuity are equivalent to those given by Berge (see [8, pp. 592-595]). The purpose of this note is to present a sufficient condition for the I.s.c. of the optimal set map ~ : W ~ E" defined by
O(w)={zEM(w)lfo(z,w)=
inf
xEM(w)
fo(x,w)}.
(2)
Under rather general conditions the map ~b is closed or u.s.c, on the solvability set Wp given by
Wp = {w ~ W [~b(w) = r
(3) 104
105
D. Klatte/ Continuity of optimal sets
(see [1, 2, 3, 8, 12]), while the l.s.c, of $ on Wp requires very strong assumptions. It is a well established fact that $ is not 1.s.c. on We in the simple case of linear programs parametrized in the objective function. Some classes of parametric programs which satisfy the I.s.c. property of ~/, have been discussed in [2, 5, 11, 14], for instance. The following lemma will be used in the next paragraph. It can be proved in the same way as Theorem 13 [8]. (A straightforward proof will be found in [10].) Lemma. Suppose that W is a metric space, w E W, J is a finite index set, fj are real-valued, continuous functions on E" x W f o r j E J, and the functions fj(., w), j E J, are convex on E". Let point-to-set maps L, F ~ and F f r o m W into E" be given such that F~
and F~
L(u)n{xEE"lfj(x,u)
jEY},
u E W,
F(u)=L(u)O{xEE'lfi(x,u)<-O,
jEJ},
u E W,
is nonempty. I[ L is l.s.c, at w, then F ~ and F are l.s.c, at w.
In view of the following considerations some notations will be given, ri Q (cl Q, dim Q) means the relative interior (the closure, the dimension) of a convex set Q c E". IIll is the Euclidean norm of E", xTy is the scalar product of x E E" by y E E'. H m'" means the linear normed space of (m, n)-matrices of real-valued elements with the Euclidean norm of Era". If the domain of a point-to-set map F : W--* E" is restricted to a subset G C_ W, this will be expressed by F6.
2. Lower semicontinuity of Suppose that the parametric programming problem P(w) defined by (1) satisfies the following additional assumptions: (A1) Let the constraint sets M ( w ) , w E W, be described by M ( w ) = {x E E" I f ( x , w) -< 0, i E J~ U J2},
where J1 = {1 . . . . . m}, J2 = {m + 1.... r}, and fi, i ~ J1 U J2, are real-valued, continuous functions on E" x W. (A2) Let fi(x, w) = xTCi(w)x + pi(w)Tx, i E Jt U{0}, (x, w) E E" x W, where C i : W ~ H*'" and pi : W--> E" are continuous mappings on W, and where C~(w), w E W, are symmetrical, positive semidefinite matrices for i ~ J~ U{0}. (A3) The functions fj(., w), j E J2, w E W, are convex on E". (A4) For each w E W, M ( w ) satisfies any kind of Slater condition. Consequently, if w ~ Wp the usual necessary and sufficient conditions for optimality in c o n v e x programming can be used (in the sense of [15], Section 28], for instance).
106
D. Klatte/ Continuity of optimal sets
Borrowing the concepts by No~:i6ka [13] and Guddat [6] we define ch(w) = {j 9 Jt U J2 [fi(x, w) = 0 for all x 9 ~/,(w)},
w 9 Wp,
(4)
and G(I, d) = {u 9 Wp [ I = ch(w), dim ~/,(u) = d}
(5)
where I C_J, U J2 and 0 <- d -< n. Applying well-known optimality criteria in convex quadratic optimization the following assertion can easily be shown (see [10]): Suppose that u 9 Wp, ch(u) = I C_J1 and y ( u ) 9 ~/,(u). Then
ri O(u) =
f C ' ( u ) x = C ' ( u ) y ( u ) , i 9 I U{O} 1 x 9 gntpi(u)Tx pi(u)Ty(a), i 9 I U{0}I. [fi(x,u)
(6)
For every u 9 Wp such that ch(u) = I C Jt the matrix of the equality system in (6) is denoted by R1(u), IIR~(u)ll. means the Euclidean norm of R l ( u ) .
Theorem. Let I be a subset of J1, and let 0 <- d <- n be an integer. Suppose that G(I, d) is nonempty. Let G be a subset of G(I, d) such that the constraint set map Mo is l.s.c, on G. Then d/a is l.s.c, on G. Proof. Let w be an arbitrary point of G. Choose any z E ri $(w). Consider PZ(w):
min{f0(x, u) Ix E MZ(u)},
u E G,
where M = ( u ) = {x e M ( u ) l l l x
- zll--- 1},
u 9 G.
The optimal set map of P~(w)q is denoted by ffz. By hypothesis, Mo is l.s.c, at w. Further, the function x --, II1 - zll is convex on E" and z9 {x 9 M(w) I IIx - zll < 1}. Hence, by the L e m m a stated in Section 1, M" is l.s.c, at w. M" is closed at w (see Hogan [8, T h e o r e m 10]). Since f0 is continuous, Theorem 8 in [8] yields that ~bz is closed at w. Now, M~(w) is nonempty and compact, M ' ( u ) , u 9 G, are convex sets, and [0(', u), u 9 G, are convex on E*. By [8, Corollary 9.1 and Theorem 3], ~b' is u.s.c, at w. Since M ~ is l.s.c, at w, there is an open neighborhood N~(w) of w such that M'(u)#r for u E N ' ( w ) C _ G . Thus, ~b'(u)#~J for u E N Z ( w ) . Let gz, F ~ : N'(w)--',E" be defined by
II~'(u)- zll = min{lly - zlll y e r
(7)
F'(u) = R l ( u ) ~ ( u ) .
(8)
Choose ~ > 0. By construction, the mapping R t is continuous on G. Thus, there is an open neighborhood of w (open relative to the metric space G) denoted by Nl(W) such that t[
I I R ' ( w ) - R'(u)ll. < 2(1 + IIzll) for u E N,(w).
D. Klatte/ Continuity o.[ optimal sets
107
Let S, = {x ~ E" I inf{llx - yll] y E ~b~(w)} < 89 Since the optimal set map ~,z is u.s.c, at w, there exists an open neighborhood N:(w) such that O:(u) _c S, for each u E N2(w). Obviously, O~(w) C_ ~/,(w) because /0(z, w) -0(x, w) for each x E MZ(w) and MZ(w) C_M ( w ) . It follows that R1(w)z = R t ( w ) y for all y E ~b~(w). Hence, for u ~ N~(w) n N l ( w ) n N2(w),
IlR'(w)z -
R'tu)~(u)ll
-< I I R ' ( w ) l l . l l y ( u ) - ~:~(u)[I + (1 + Ilzll)llR'(w)- g'(u)llH < ~,
where [[y(u)- ~:Z(u)l[ = min{lly - ~:'(u)[[ [ y E ~b~(w)}. Therefore, if z E ri ~b(w), then F ~ is continuous at w. Let S C_ E" be an open set (relative to E") such that ~ ( w ) A S # 0. We shall construct an open neighborhood /V(w) such that for each u ~ / V ( w ) , we have ~b(u) n S # 0. Obviously, ri ~b(w) O S # 0 . Let z be any point of r i ~ b ( w ) n S , and let N~(w) be the neighborhood introduced above. Now we define point-to-set maps L, F ~ N Z ( w ) ~ E" by L ( u ) = {x E E" [ R l ( u ) x = r ' ( u ) } , F~
= {x E L ( u ) [llx - zll < 1, fi(x, u) < o, i E ( J l ~ I ) u J2},
where F ~ is given by (8). Hence, L ( u ) # O for each u E NZ(w). Since GC_ G(L d), for each u E G, the rank of R t ( u ) equals the rank of R t ( w ) . Using the continuity of F ' at w from a result published by Dantzig et al. [2, Theorem II.2.2] it follows that L is l.s.c, at w. The lemma stated in Section 1, the l.s.c, of L, z E F~ and the hypotheses A1, A2 and A3 guarantee that F ~ is l.s.c, at w. Hence, since z E F~ A S , there is an open neighborhood N ( w ) C_N ~ ( w ) such that u E N ( w ) implies that F~ O S # 0. Choose for each u E N ( w ) , x(u) ~ F~ A S . Accordingly, R t ( u ) x ( u ) = FZ(u) for u E N ( w ) . By definitions of R t ( u ) in (6) and of F~(u) in (8), for each u E N(w), we have flo(X(U), u)
=
fO(~Z(U), U),
IIX(U) -- ZII < 1,
fi(X(U), U) = fi(~'(U), U),
i E I,
[i(X(U), U) < O, j E ( J r \ I ) U J2.
By (7), ~Z(u)E ~/,~(u) for u E N~(w). Thus, if u E N ( w ) then x(u) is a local optimal solution of min{.f0(x, u ) l x E M(u)}. Convexity insures x ( u ) E ~(u) for each u E N ( w ) . Since S and w were arbitrary, the I.s.c. of ~G on G is shown. If the hypothesis for MG to be l.s.c, is deleted, the I.s.c. of ~b6 on G is no longer valid. Consider min{x [ x 2 + y2 _< 1, y -> 0, y = tx - Itl}, We have ~(t)={(l,0)} ,/,(t) = { ( - I, o)}
if t > 0 , if t -< o.
t E E ~ variable.
108
D. Klatte/ Continuity of optimal sets
Although ch ~O(t)= {I, 2, 3} and dim t 0 ( t ) = 0 for t E E t, ~ is not closed and not l.s.c, at t = 0. Finally, an e x a m p l e will illustrate w h y w e had to restrict o u r studies to the case that f o r e a c h w E G ( I , d), the active constraints fi(', w), i E ch(w), are quadratic (I _C J1). C o n s i d e r the p a r a m e t r i c p r o g r a m 15
min{tz - y I f~(x, y, z, s) -< 0}, (t, s) E E I x El+ variable, E+-{s ~ E I Is-0}, [l(x,y,z,s)=gl(x,s)+g2(z)+y-l, 1
__
where gt(x,s)
g2(z) =
=
( x - s) ifx>_s (x -q- s ) 2 i f x < _ - s , 0 if-s<x<s, ( z + 4 ) 2 if z - < - 4 , z2 if z -->0, 0 if - 4 < z < 0 .
O b v i o u s l y , 15 satisfies the suppositions A I - A 4 , and the constraint set m a p is l.s.c. on El+. It is easily s h o w n that
q,(t, s ) = {(x, y,
E 3 I [ - s ---x -< s, y = 1 - 8 8 2, z = - I t } fortO,
~O(t,s)={(x,y,z)EE 3[-s<_x<_s,y=
l-lt2, z=-4-1t} fort>0,
~b(t,s)={(x,y,z)~E31-s<-x<-s,y
s-0,
= 1, - 4 _ < z__<0}
f o r t = 0, s_>0. H e n c e , G({1}, 1) = {(0, 0)} U{(t, s) E E 2 [ s > 0, t # 0}, but qJ~t{ll.~ is not l.s.c, at (0, 0).
References
[1] C. Berge, Topological spaces (Macmillan, New York, 1963). [2] G.B. Dantzig, J. Foikman and N. Shapiro, "On the continuity of the minimum set of a continuous function", Journal of Mathematical Analysis and Applications 17 (1967) 519-548. [3] G. Debreu, Theory of value (John Wiley, New York, 1959). [4] J.P. Evans and F.J. Gould, "Stability in nonlinear programming", Operations Research 18 (1970) 107-118. [5] H.J. Greenberg and W.P. Pierskalla, "Extensions of the Evans-Gould stability theorems for mathematical programs", Operations Research 20 ( 1972) i 43-153. [6] J. Guddat, "Stability in convex quadratic parametric programming", Mathematische Operationsforschung und Statistik 7 (1976) 223-245. [7] J. Guddat and D. Klatte, "Qualitative stability in nonlinear optimization", Proceedings of the IX. lnternat. Symposium on Mathematical Programming, Budapest, 1976 (to appear). [8] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
D. Klatte[ Continuity o1: optimal sets
109
[9] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331. [10] D. Klatte, "Untersuchungen zur Iokalen Stabilit~it konvexer parametrischer Optimierungsprobleme", Dissertation (A), (Humboldt-Universit/it, Berlin, 1977). Ill] B. Kummer, "Stability and weak duality in convex programming without regularity", preprint (Humboldt-Universit~it, Berlin, 1978). [12] R. Meyer. "The validity of a family of optimization methods", S l A M Journal on Control 8 (1970) 41-54. [13] F. No~.i~ka, J. Guddat, H. Hollatz und B. Bank, Theorie der linearen parametrischen Optimierung (Akademie-Verlag, Berlin, 1974). [14] S.M. Robinson and R.H. Day, "A sufficient condition for continuity of optimal sets in mathematical programming", Journal o[ Mathematical Analysis and Application 45 (1974) 506-511. [15] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, N J, 1970).
Mathematical Programming Study 10 (1979) 110-114. North-Holland Publishing Company
A NOTE ON THE CONTINUITY OF THE SOLUTION SET OF SPECIAL DUAL OPTIMIZATION PROBLEMS B. K U M M E R Der Humboldt Unioersitiit, Berlin, G.D.R.
Received December 1977 Revised manuscript received March 1978 In this paper dual programs of convex optimization problems having a parametric objective function and a fixed linear feasible set are studied. By using some properties of the primal problem the continuity of the dual optimal solution set is proved. Two examples show the necessity of the suppositions. Key words: Stability, Special Dual Programs, Optional Solutions Set, Point-to-Set Maps.
1. Introduction Generally, the solution set of parametric optimization p r o b l e m s does not continuously depend on the parameter. That shows the study of linear problems with parametric objective functions. The aim of this p a p e r is to state the continuity of the solution set map ~bD of special dual programs D(A)
y EEm, y > 0 .
[inf(jr(x,A)+y'(Ax-b))]~max, xEE n
For this we need a stability property of the corresponding primal problem P(A)
jr(x, A)-~ rain,
A x < b, x E E,.
2. Definitions and assumptions We will consider the mappings ~/,p and I//D defined as So(A) = {y [ y solves D(A)}, Sp(A) = {x I x solves P(A)}. Let us throughout suppose (l) A is a m x n-matrix, b E Era, A ~ E~; (2) jr(., A) is c o n v e x and continuously differentiable for all fixed A; (3) The gradient Vd'(., ") is continuous on E , x {0}; (4) The solution set Sp(A) of the problem P(A) is always not e m p t y . 110
B. Kummer/ Continuity of the solution sets
111
We denote by Ai and bi the ith row of the matrix A and the ith component of the vector b respectively. Further, let us form
l ( x ) = {i [A~c = b,}
for all x in En.
We will give conditions for ~bD to be continuous at h = 0 what means H(A) = inf{~ [ 0 o ( h ) C U,~bD(0) ^ ~D(0)C U,~bo(X)} converges to zero as A --* 0. Here U,M denotes the set of all points z whose distance d(z, M ) to M is less than ~. This continuity means the convergence of the Hausdorff-distance between the sets $nOt) and ~,o(0) which may be unbounded. Generally, that is not equivalent to Berge's continuity-definition [1], but both definitions coincide for compact 0o(0). Our main assumptions will be the following: (5) For every sequence At--*0 (t = 1,2 .... ) there are an infinite subsequence {X,~k)}~-m,... and points x k E ~v(Xttk)), X~ ~vt0) such that V,./(x ~ 0) = lim V,f(x k, X,~k~) k--~*
and
I(x k) C l ( x ~
for all k.
As it is easy to see from (2), (3) and (4), the condition (5) holds if for every sequence At ~ 0 (t = 1, 2 .... ) the upper limit (in the sense of Hahn, Kuratowski) li---m0P(A,) = {x E E, ]lim d(x, ~bp(Z,) = 0} is not empty (because of lim ~bp(At)C ikp(0)). T h e r e f o r e the assumption (5) is satisfied if (5.1) the mapping 0v is lower semicontinuous at A = 0 (in Berge's sense [1]); or
(5.2) f(x, A)= c(A)'x, c continuous (there are only finite many solution sets). Further, using T h e o r e m V.3.1. by Goldstein [2] or Hogan's Theorem 9 [3] we note that (5) holds if (5.3) ~bp(0) is compact and f is continuous on E , • where also this condition leads to lim Iltp(A,) ~ 0. In the case lira 0p(A~)= 0 the condition (5) is difficult to verify. H o w e v e r , it can be fulfilled what the following simple example shows: f(x, A)= A2x2 - 2Ax, x-0. Mathematically, it seems to be more difficult to show that (5) does not follow from (2), (3) and (4) than to prove the next theorem. 3. Results Theorem. If the above suppositions (1)-(5) hold, then the solution set map dID is
continuous at h = O.
B. Kummer / Continuity of the solution set
Proof. By our suppositions the Lagrangian L(x, y, λ) = f(x, λ) + y'(Ax − b) has a saddle point with respect to x ∈ E_n, y ∈ E_m, y ≥ 0 for every λ. The local Kuhn-Tucker conditions yield that if x(λ) belongs to ψ_P(λ), then the dual solution set can be written as

(6) ψ_D(λ) = {y ≥ 0 | y'(Ax(λ) − b) = 0, ∇_x f(x(λ), λ) + y'A = 0}.
Now, let us assume that ψ_D is not continuous at λ = 0. Then there is a sequence λ_t → 0 (t = 1, 2, ...) such that H(λ_t) ≥ ε̄ > 0 for all t and for some ε̄. Let {λ_{t(k)}} be a subsequence satisfying condition (5). An appropriate choice of the λ_{t(k)} leads to the identity of all sets I(x^k) (k = 1, 2, ...). Setting J = I(x^k), I = I(x^0), we have J ⊆ I. Further, in view of (5) and (6) the equations
(7) ψ_D(λ_{t(k)}) = {y ≥ 0 | y_i = 0 ∀ i ∉ J, Σ_{i∈J} y_i A_i = −∇_x f(x^k, λ_{t(k)})},

(8) ψ_D(0) = {y ≥ 0 | y_i = 0 ∀ i ∉ I, Σ_{i∈I} y_i A_i = −∇_x f(x^0, 0)}

follow. Let us introduce the set

M = {y ≥ 0 | y_i = 0 ∀ i ∉ J, Σ_{i∈J} y_i A_i = −∇_x f(x^0, 0)}.
Because ψ_D(λ_{t(k)}) ≠ ∅ and ∇_x f(x^0, 0) = lim_{k→∞} ∇_x f(x^k, λ_{t(k)}), it is not difficult to see that

ε_k = inf{ε | ψ_D(λ_{t(k)}) ⊆ U_ε M and M ⊆ U_ε ψ_D(λ_{t(k)})}

converges to zero as k → ∞. (The reader will find a proof of this fact e.g. in [4], where it is shown that parametric polyhedra Q(μ) = {x ∈ E_n | Ax ≤ μ} can be written as a Minkowski sum Q(μ) = K(μ) + U (μ ∈ L) such that K is a continuous and compact-valued point-to-set map defined on L = {μ | Q(μ) ≠ ∅} and U is the cone {x | Ax ≤ 0}.) Therefore the choice of ε̄ and of the sequence λ_t implies J ≠ I, and we obtain J ⊂ I and M ⊂ ψ_D(0). Let y^I ∈ ψ_D(0)∖M and y^J ∈ M be chosen arbitrarily. Then we can easily construct a contradiction. Since

−∇_x f(x^0, 0) = Σ_{i∈I} y_i^I A_i = Σ_{i∈J} y_i^J A_i,

it follows that

(9) Σ_{i∈I} α_i A_i = 0, where α_i = y_i^I − y_i^J (with y_i^J = 0 for i ∈ I∖J).
From A_i x^0 = b_i (i ∈ I) we obtain

(10) Σ_{i∈I} α_i b_i = Σ_{i∈I} α_i A_i x^0 = 0.

On the other side the relations

A_i x^k = b_i (i ∈ J) and A_i x^k < b_i (i ∈ I∖J)

hold. According to (9) they lead to

(11) Σ_{i∈I∖J} α_i b_i = Σ_{i∈I∖J} α_i A_i x^k = Σ_{i∈I∖J} y_i^I A_i x^k < Σ_{i∈I∖J} y_i^I b_i.

For the last inequality it is used that y_i^I > 0 for at least one i ∈ I∖J and y_i^I ≥ 0 for all i. Since α_i = y_i^I on I∖J, the left- and right-hand sides of (11) coincide; this contradiction between (10) and (11) proves the theorem.

Now we will present two examples showing that the theorem no longer holds if nonlinear restrictions appear in the primal problem P(λ) or if the supposition (5) is removed.
Example 1. f(x, λ) = −2λx_1 + x_2 → min, x ∈ E_2,

g_1(x) = x_1² − x_2 ≤ 0,
g_2(x) = x_1² + (x_2 − 2)² − 4 ≤ 0.

For |λ| < 2 we obtain the single primal solution x(λ) = (λ, λ²) and the dual solution set

ψ_D(λ) = {(1, 0)} if λ ≠ 0,
ψ_D(λ) = {y ≥ 0 | y_1 + 4y_2 = 1} if λ = 0.

Obviously, ψ_D is not continuous at λ = 0.
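The computations behind Example 1 can be checked in a few lines of pure Python. This is a numerical sketch only (the helper `kt_residual` is ours): it verifies that the Kuhn-Tucker residual ∇f + y_1∇g_1 + y_2∇g_2 vanishes at x(λ) = (λ, λ²) for y = (1, 0) when λ ≠ 0, while at λ = 0 every y ≥ 0 with y_1 + 4y_2 = 1 is a dual solution, exhibiting the jump of ψ_D.

```python
def kt_residual(lam, y1, y2):
    """|∇f + y1·∇g1 + y2·∇g2| at the primal solution x(λ) = (λ, λ²)."""
    x1, x2 = lam, lam * lam
    grad_f = (-2.0 * lam, 1.0)                # f  = -2λ x1 + x2
    grad_g1 = (2.0 * x1, -1.0)                # g1 = x1² - x2
    grad_g2 = (2.0 * x1, 2.0 * (x2 - 2.0))    # g2 = x1² + (x2 - 2)² - 4
    r1 = grad_f[0] + y1 * grad_g1[0] + y2 * grad_g2[0]
    r2 = grad_f[1] + y1 * grad_g1[1] + y2 * grad_g2[1]
    return abs(r1) + abs(r2)

# λ ≠ 0: the unique multiplier is y = (1, 0).
res_nonzero = kt_residual(0.5, 1.0, 0.0)
# λ = 0: the whole segment {y ≥ 0 | y1 + 4 y2 = 1} satisfies the conditions.
res_zero = [kt_residual(0.0, 1.0 - 4.0 * t, t) for t in (0.0, 0.1, 0.25)]
```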
Example 2. f(x, λ) → min, x ∈ E_2,

−x_1 ≤ 0,
−x_2 ≤ 0,
x_2 − 1 ≤ 0.

In order to define the function f we consider the non-differentiable function g(x, λ) = max{1 − |λ|x_1, 1 − x_2} and smooth it in an appropriate way. For this we distinguish the cases

(i) |λ|x_1 + 1 ≤ x_2,
(ii) |λ|x_1 − 1 < x_2 < |λ|x_1 + 1,
(iii) x_2 ≤ |λ|x_1 − 1,

and put

f(x, λ) = 1 − |λ|x_1 if (i),
f(x, λ) = 1 − |λ|x_1 + (1/4)(1 − x_2 + |λ|x_1)² if (ii),
f(x, λ) = 1 − x_2 if (iii).
Then, for every λ in E_1 the function f(·, λ) is convex and continuously differentiable. The gradient ∇_x f(·, ·) is continuous. The partial derivatives have the form
∂f/∂x_1 = −|λ| if (i), −|λ| + (1/2)|λ|(1 − x_2 + |λ|x_1) if (ii), 0 if (iii);
∂f/∂x_2 = 0 if (i), −(1/2)(1 − x_2 + |λ|x_1) if (ii), −1 if (iii).
Finally, it is not difficult to compute that

ψ_P(λ) = {x ∈ E_2 | x_1 ≥ 2|λ|⁻¹, x_2 = 1} (λ ≠ 0),
ψ_P(0) = {x ∈ E_2 | x_1 ≥ 0, x_2 = 1},
ψ_D(λ) = {(0, 0, 1)} (λ ≠ 0),
ψ_D(0) = {(0, 0, 0)}.

Here the third component of the dual variable is connected with the third primal condition x_2 − 1 ≤ 0.
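The smoothing in Example 2 can be probed numerically. The sketch below assumes the piecewise form f = 1 − |λ|x_1 in case (i), 1 − |λ|x_1 + ¼(1 − x_2 + |λ|x_1)² in case (ii), and 1 − x_2 in case (iii) (our reading of the printed partial derivatives), and checks that the values match across the case boundaries x_2 = |λ|x_1 ± 1, consistent with the claimed continuous differentiability.

```python
def f(x1, x2, lam):
    """The smoothed objective of Example 2 (piecewise form as assumed above)."""
    a = abs(lam)
    if x2 >= a * x1 + 1.0:                 # case (i)
        return 1.0 - a * x1
    if x2 > a * x1 - 1.0:                  # case (ii): the quadratic blend
        return 1.0 - a * x1 + 0.25 * (1.0 - x2 + a * x1) ** 2
    return 1.0 - x2                        # case (iii)

lam, x1, eps = 0.7, 2.0, 1e-7
upper = abs(lam) * x1 + 1.0                # boundary between (i) and (ii)
lower = abs(lam) * x1 - 1.0                # boundary between (ii) and (iii)
jump_upper = abs(f(x1, upper + eps, lam) - f(x1, upper - eps, lam))
jump_lower = abs(f(x1, lower + eps, lam) - f(x1, lower - eps, lam))
# both jumps vanish as eps -> 0, so f is continuous across the seams
```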
References
[1] C. Berge, Topological spaces (Macmillan, New York, 1963).
[2] E.G. Gol'shtein, "Teorija dvoistvennosti v matematicheskom programmirovanii i ee prilozhenija", Izdatel'stvo Nauka (Moscow, 1971) [In Russian].
[3] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
[4] B. Kummer, "Globale Stabilität quadratischer Optimierungsprobleme", Wissenschaftliche Zeitschrift der Humboldt-Universität, Naturwissenschaftliche Reihe (to appear in 1978).
Mathematical Programming Study 10 (1979) 115-127. North-Holland Publishing Company
ASYMPTOTIC PROPERTIES OF SEQUENCES ITERATIVELY GENERATED BY POINT-TO-SET MAPS

Gerard G.L. MEYER

The Johns Hopkins University, Baltimore, Maryland, U.S.A.
Received 23 November 1977
Revised manuscript received 16 March 1978

In this paper, we examine the relationships between the fixed point set of a point-to-set map A(.), and the asymptotic properties of the sequences which may be iteratively generated by using the map A(.). Let L be the set of all limit points, and Q be the set of all cluster points, of all sequences which may be iteratively generated by A(.). The consequences of various assumptions on the map A(.) and the sequences generated by A(.) on lower bounds and upper bounds for L and Q are discussed.

Key words: Point-to-Set Map, Characteristic Set, Periodic Point, Fixed Point Set, Asymptotic Regularity, Monotonic Algorithm, Algorithm Schema.
Introduction

A point-to-set map A(.) from a set X into a set Y is a map which associates a subset of Y with each point in X. Since the pioneering work of Zangwill [25], many papers and books have been written which use this concept to unify and simplify the study of iterative algorithms for solving optimization problems [2, 4, 6-8, 10-19, 22-26]. The results presented in the literature, with few exceptions [11, 23], deal with point-to-set maps in which both X and Y are subsets of the same topological space, and are usually concerned with discussing the applicability of point-to-set maps to the study of a class or classes of optimization algorithms. This paper is also devoted to the study of point-to-set maps in which both X and Y are subsets of the same space. To facilitate the comprehension of the results, we have assumed that X and Y are identical and subsets of E^n, but the proofs are given in such a manner that it is easy to generalize the results to more complicated spaces. We do not take into account the manner in which the map A(.) is described, and the implications of the description on the properties of A(.). Rather, we assume that we have a map A(.) from a closed subset T of E^n into all the non-empty subsets of T, and we examine the relationships between the fixed point set of A(.) and the asymptotic properties of the sequences which may be iteratively generated by using the map A(.). In particular, we try to find upper bounds on the set L of all limit points of all sequences which may be generated iteratively by A(.) and on Q, the set of all cluster points of all sequences which may be iteratively generated by A(.).

The paper's first part is devoted to the precise definition of the manner in
which a point-to-set map A(.) is used to generate sequences, i.e., the description of an algorithm schema, and the definition of the characteristic set of the schema. The second part of the paper presents the consequences of continuity assumptions on the map A(.). We show that under suitable hypotheses, we may obtain upper bounds on L. Then, in the third part of the paper, we show that in contrast with the deterministic case [17], the cluster points of a sequence generated by a non-deterministic algorithm schema need not have the same periodicity. The last two sections of the paper are devoted to the analysis of upper bounds for the set Q. These bounds can only be obtained if some additional assumptions are made on the algorithm schema. Section four deals with the concept of asymptotic regularity, and section five examines the consequences of monotonicity on the behavior of the schema.

The notations used in this paper are standard, with the following possible exceptions:
(i) {a; b; c} denotes the set containing the elements a, b, and c;
(ii) [a; b], (a; b], (a; b), and [a; b) denote the sets {x | a ≤ x ≤ b}, {x | a < x ≤ b}, {x | a < x < b}, and {x | a ≤ x < b} respectively;
(iii) given a sequence {x_i} and an index set K, {x_i}_K denotes the subsequence of {x_i} consisting of all points in {x_i} with index i in K;
(iv) E^n denotes the Euclidean space of dimension n;
(v) all neighborhoods are relative to the set T.
Algorithm schema

Let T be a closed subset of E^n, and let A(.) be a map from T into all the non-empty subsets of T. The algorithm schema we wish to consider in this paper is of the following form.

1. Algorithm. Let z_0 be a given point in T.
Step 0. Set i = 0.
Step 1. Pick a point z_{i+1} in A(z_i).
Step 2. Set i = i + 1, and go to Step 1.

Algorithm 1 is non-deterministic (the specific manner in which z_{i+1} is chosen at each iteration is not given), autonomous (the map A(.) does not depend on the index i), and does not possess a stop rule (the algorithm can only generate infinite sequences). The finite properties of Algorithm 1 are described by its characteristic set [12].

2. Definition. The characteristic set of an iterative algorithm without stop rule is the set of all points z such that the algorithm may generate a sequence {z_i} for which there exists an integer k with z_i = z for all i ≥ k.
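The schema and Definition 2 can be made concrete in a few lines. The sketch below is our own toy illustration (the map A and the selection rule are not from the paper): it iterates Algorithm 1 with an arbitrary choice in Step 1 and computes the set of points at which the schema may stall forever, i.e. {z ∈ T | z ∈ A(z)}.

```python
import random

def algorithm_1(A, z0, steps, pick=random.choice):
    """Iterate Algorithm 1: Step 1 picks some z_{i+1} in A(z_i)."""
    z, seq = z0, [z0]
    for _ in range(steps):
        z = pick(sorted(A(z)))   # the unspecified, non-deterministic choice
        seq.append(z)
    return seq

# A toy point-to-set map on T = {0, 1, 2, 3}: from z one may step down by
# one or by two; 0 is the only point with z in A(z).
def A(z):
    return {max(z - 1, 0), max(z - 2, 0)}

T = range(4)
C = {z for z in T if z in A(z)}          # characteristic set: here {0}
seq = algorithm_1(A, 3, 20)              # every generated sequence reaches 0
```

Whatever selection rule is used, each step strictly decreases z until it reaches 0, after which the sequence stalls at the unique point of the characteristic set.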
Let C be the characteristic set of Algorithm 1; then Definition 2 implies immediately that C = {z ∈ T | z ∈ A(z)}.

3. Definition. A point z in T is a periodic point of A(.) of period p if and only if (i) z ∈ A^p(z), and (ii) z ∉ A^q(z) for all q = 1, 2, ..., p − 1.

The characteristic set C of Algorithm 1 consists of all periodic points of A(.) of period 1 (i.e., the fixed points of A(.)). The sequences generated by Algorithm 1 are infinite, and therefore we must characterize the asymptotic properties of Algorithm 1: let L be the set of all limit points of all convergent sequences which may be generated by Algorithm 1, and let Q be the set of all cluster points of all sequences which may be generated by Algorithm 1.

4. Lemma. C ⊆ L ⊆ Q.

Proof. Obviously, L is a subset of Q, and to show that C is a subset of L, it is enough to note that if z_0 is in C, then Algorithm 1 may generate the infinite sequence {z_i} with z_i = z_0 for all i.

We note that C is not a fixed set for the map A(.). In fact we have C ⊆ A(C). This result can be sharpened when A(.) is a point-to-point map, i.e. when A(z) contains only one point for every z in T. In such a case C = A(C).
Continuity of the map A(.)

In the absence of assumptions on A(.), the result contained in Lemma 4 is essentially the only result linking the finite and asymptotic properties of Algorithm 1. In order to get finer results, we have to assume that A(.) is continuous in some sense.

5. Definition. The map A(.) is upper semi-continuous (u.s.c.) at a point z in T iff given any neighborhood N(A(z)) of A(z), there exists a neighborhood N(z) of z so that A(z') is a subset of N(A(z)) whenever z' is in N(z) ∩ T. The map A(.) is u.s.c. on a subset S of T iff A(.) is u.s.c. at every point z in S.

6. Definition. The map A(.) is closed at a point z in T iff {z_i} converges to z, z_i in T for all i, {y_i} converges to y, and y_i in A(z_i) for all i implies that y is in A(z). The map A(.) is closed on a subset S of T iff A(.) is closed at every point z in S.
We repeat these two well-known definitions because there is not complete agreement on terminology among various authors [1, 2, 3, 5-9, 18-20, 25, 26].

7. Definition. A map A(.) is compact on T iff the set A(z) is compact for every z in T.

Contrary to a prevalent belief, the concepts of closedness and upper semi-continuity are not equivalent.

8. Example. Let T = E, and let A(.) be the map from T into T defined by

A(z) = (0; 1) for all z in T.
The map A(.) is not closed on T, but it is obvious that A(.) is u.s.c. on T.

9. Example. Let T = E, and let A(.) be the map from T into T defined by

A(z) = {0} if z ≤ 0,
A(z) = {z; 1/z} if 0 < z ≤ 1,
A(z) = {z} if 1 < z.
The map A(.) is not u.s.c. on T, but A(.) is closed on T. The relationships between the closedness and u.s.c. concepts are contained in the following two lemmas.

10. Lemma. Suppose that T is bounded and that A(.) is closed on T; then A(.) is u.s.c. and compact on T.

Proof. Let z be a point in T and suppose that A(.) is not u.s.c. at z. Then, given a neighborhood N(A(z)) of A(z), we can find a sequence {z_i} in T converging to z and a sequence {y_i} so that y_i is in A(z_i) but not in N(A(z)) for all i. The set T is bounded and thus there exists an infinite subset K of the integers so that {y_i}_K converges to some point y. Clearly y cannot be in A(z), thus contradicting the fact that A(.) is closed at z. We conclude that A(.) must be u.s.c. at every point z of T. To show that A(z) is compact for every z in T, it suffices to note that A(z) is a closed subset of T, which is closed and bounded.

11. Lemma. Suppose that A(.) is u.s.c. on T and that the set A(z) is closed for every z in T; then the map A(.) is closed on T.

Proof. Let z be a point in T, let {z_i} be a sequence in T converging to z and let {y_i} be a sequence in T converging to some point y such that y_i is in A(z_i) for all i. Suppose that y does not belong to A(z); then we can find disjoint neighborhoods N(y) and N(A(z)) of y and A(z) respectively. The map A(.) is u.s.c. and
there exists a neighborhood N(z) of z so that A(w) is in N(A(z)) for all w in N(z). The sequence {z_i} converges to z, the sequence {y_i} converges to y, and there exists k so that z_i is in N(z) and y_i is in N(y) for all i ≥ k. But y_i is in A(z_i) for all i and therefore y_i is in N(A(z)) for all i ≥ k, which is clearly impossible because N(y) and N(A(z)) are disjoint. We conclude that A(.) is closed at every z in T, i.e., A(.) is closed on T.

Note that Lemma 10 and Lemma 11 may be relaxed in various ways. For example, Lemma 10 may be relaxed in the following manner: if the range of A(.) is bounded and A(.) is closed on T, then A(.) is u.s.c. and compact on T. We are now ready to present some results which are the consequence of the continuity of A(.).

12. Lemma. Suppose that A(.) is u.s.c. and compact on T, and let {z_i} be a specific sequence generated by Algorithm 1. If z* is a cluster point of {z_i}, then for every p = 1, 2, ..., the set A^p(z*) contains a cluster point of {z_i}.

Proof. The map A(.) is u.s.c. and compact on T and therefore A(S) is compact for every non-empty compact subset S of T [1, p. 116]. This implies immediately that A^p(z) is compact for every p = 1, 2, ..., and for every z in T. Let z* be a cluster point of {z_i} and let N(A^p(z*)) be a compact neighborhood of A^p(z*). Then there exists a neighborhood N(z*) of z* so that y in N(z*) implies that A^p(y) is a subset of N(A^p(z*)). The point z* is a cluster point of {z_i} and thus there exist an infinite subset K of the integers and k so that z_i is in N(z*) for all i ≥ k, i in K. It follows that z_{i+p} is in N(A^p(z*)) for all i ≥ k, i in K, i.e., N(A^p(z*)) contains an infinite number of points of {z_i}. It follows immediately that the compact set A^p(z*) contains at least one cluster point of {z_i}.

13. Corollary. If A(.)
is u.s.c. and compact on T, then L = C (i.e. if a sequence generated by Algorithm 1 converges, it converges to a point in C).

Proof. Let {z_i} be a convergent sequence generated by Algorithm 1. The sequence {z_i} possesses only one cluster point z*. From Lemma 12, the set A(z*) contains a cluster point of {z_i}, which has to be z*, and we conclude that z* is in C.

It may happen that the continuity properties of A(.) do not hold on all of T, or that it is not possible to prove that they hold on all of T. Let D be a subset of T, and let D̄ be the complement of D with respect to T.

14. Corollary. Suppose that A(.) is u.s.c. and compact on D̄; then C ⊆ L ⊆ C ∪ D.
The result of Corollary 14 shows that any upper bound we may obtain on the set L is bounded from below by C, and that this upper bound depends on the strength of the assumptions on A(.). In particular, we see that in order to prove that L = C, we do not need A(.) to be u.s.c. and compact on all of T, but only that A(.) be u.s.c. and compact on the complement C̄ of C with respect to T. It is well known that if a sequence {z_i} is bounded and possesses one and only one cluster point z*, then {z_i} converges to z*. We now show that under appropriate assumptions on the map A(.), a sequence {z_i} generated by Algorithm 1 is bounded whenever {z_i} possesses one and only one cluster point.

15. Theorem. Suppose that A(.) is u.s.c. and compact on T, and let {z_i} be a
specific sequence generated by Algorithm 1. If {z_i} possesses one and only one cluster point z*, then: (i) the sequence {z_i} is bounded, (ii) {z_i} converges to z*, and (iii) z* is in C.

Instead of proving the theorem directly, we shall prove a more general result, and then use this result to prove Theorem 15.

16. Lemma. Suppose that A(.) is u.s.c. and compact on T, and let {z_i} be a specific sequence generated by Algorithm 1. If the set of cluster points of {z_i} is non-empty and bounded, then the sequence {z_i} is bounded.

Proof. Let q be the set of all cluster points of the sequence {z_i}. The set q is bounded by assumption and, being a cluster point set, is also closed. The fact that the map A(.) is u.s.c. and compact on T implies immediately that the set A(q) = {y ∈ T | y ∈ A(z), z in q} is also closed and bounded. Let N(A(q)) be a compact neighborhood of the set A(q), and let N(q) be a compact neighborhood of the set q, so that A(y) is a subset of N(A(q)) whenever y is in N(q). Suppose that {z_i} is unbounded; then there exists an infinite subset K of the integers so that, for all i in K, z_i is not in N(q) ∪ N(A(q)). Let j be the smallest index so that z_j is in N(q), and for all i in K, i > j, let j(i) be the largest index so that z_{j(i)} is in N(q) and j(i) < i. Then, for all i in K, i > j, the point z_{j(i)+1} is in N(A(q)), but is not in N(q). It follows that there exists an infinite number of points of {z_i} which are in N(A(q)), but not in N(q), and the sequence {z_i} possesses a cluster point which is not in q. By definition, every cluster point of {z_i} is in q, and therefore the assumption "the sequence {z_i} is unbounded" leads to a contradiction.

Using Lemma 16, the result of Theorem 15 is obvious: if {z_i} possesses one and only one cluster point, then the cluster point set of {z_i} is non-empty and bounded, and therefore the sequence {z_i} is bounded. This implies immediately
that {z_i} converges to its unique cluster point, which by Corollary 13 is in C. The fact that the cluster point set of a sequence generated by Algorithm 1 is bounded has two important consequences when A(.) is u.s.c. and compact on T.

17. Corollary. Suppose that A(.) is u.s.c. and compact on T and let {z_i} be a specific sequence generated by Algorithm 1. If the cluster point set q of {z_i} is non-empty and bounded, then given any neighborhood N(q) of q, there exists k, depending on N(q), so that z_i is in N(q) for all i ≥ k.
Proof. Lemma 16 implies that {z_i} is contained in a bounded set B. Let N(q) be a neighborhood of q. The set {z | z ∉ N(q), z ∈ B} is bounded and therefore cannot contain an infinite subsequence of {z_i} (otherwise there would exist a cluster point of {z_i} in {z | z ∉ N(q), z ∈ B}, which is impossible). It follows that there exists k so that z_i is in N(q) for all i ≥ k.

18. Corollary. Suppose that A(.) is u.s.c. and compact on T and let {z_i} be a specific sequence generated by Algorithm 1. If the cluster point set q of {z_i} is non-empty and bounded, then q ⊆ A(q).

Proof. Let N(A(q)) be a compact neighborhood of A(q) and let N(q) be a compact neighborhood of q so that y in N(q) implies that A(y) is a subset of N(A(q)). From Corollary 17, we know that there exists k so that z_i is in N(q) for all i ≥ k. It follows that z_{i+1} is in N(A(q)) for all i ≥ k, i.e. z_i is in N(A(q)) for all i ≥ k + 1. We conclude that all the cluster points of {z_i} are in all the neighborhoods of the compact set A(q), thus the result.
Periodicity of cluster points

Let {z_i} be a specific sequence generated by Algorithm 1 from the initial point z_0, and let q be the set of all cluster points of {z_i}. In this section, we investigate the possible relationship between the number of points in q and the periodicity of the points in q. We begin our investigation by showing with an example that not all points in q need to have identical periodicity.

19. Example. Let y_1 and y_2 be two distinct points in T, and assume the following:
(i) y_1 ∈ A(y_2), (ii) y_1 ∉ A(y_1), and (iii) y_2 ∈ A(y_1) ∩ A(y_2).
Then, Algorithm 1 may generate the sequence {z_i} defined by z_0 = y_1, z_1 = y_2, z_2 = y_1, z_3 = y_2, etc. This sequence possesses the two cluster points y_1 and y_2, and y_1 is of periodicity two and y_2 is of periodicity one.
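Example 19 can be realized with an explicit two-point map. In the sketch below (the helper `period` is our own implementation of Definition 3, with A^p computed as the p-fold set image), the alternating sequence y_1, y_2, y_1, y_2, ... is admissible for Algorithm 1 and has both points as cluster points, yet their periods differ.

```python
# Two points with y1 in A(y2), y1 not in A(y1), y2 in A(y1) and in A(y2).
y1, y2 = "y1", "y2"
A = {y1: {y2}, y2: {y1, y2}}

def period(z):
    """Smallest p with z in A^p(z), i.e. the period of Definition 3."""
    image = {z}
    for p in range(1, 5):
        image = set().union(*(A[w] for w in image))  # A^p(z) as a set image
        if z in image:
            return p
    return None

# y1 returns to itself only after two steps (via y2); y2 may stay put.
```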
We know from Corollary 18 that if A(.) is u.s.c. and compact on T, and q is the bounded cluster point set of a specific sequence {z_i} generated by Algorithm 1, then q ⊆ A(q). Clearly this implies that q ⊆ A(q) ∩ q. We now show that this relation characterizes q when q contains a finite number of points.

20. Lemma. Suppose that A(.) is u.s.c. and compact on T, let q be the cluster point set of a specific sequence {z_i} generated by Algorithm 1, and assume that q is non-empty and contains a finite number of points. A non-empty subset Y of q is equal to q if and only if A(Y) ∩ q ⊆ Y.

Proof. (i) If Y = q, then Corollary 18 implies that q ⊆ A(q), i.e., q ⊆ A(q) ∩ q. (ii) Now suppose that Y is a non-empty subset of q so that A(Y) ∩ q ⊆ Y, and let Ȳ be the complement of Y with respect to q. Let N(A(Y)) and N(Ȳ) be disjoint neighborhoods of A(Y) and Ȳ respectively, and let N(Y) be a neighborhood of Y so that y in N(Y) implies that A(y) is a subset of N(A(Y)). Corollary 17 implies that there exists k so that z_i is in N(Y) ∪ N(Ȳ) for all i ≥ k. Suppose that z_j is in N(Y) for some j ≥ k; then z_{j+1} is in N(A(Y)). But z_{j+1} is also in N(Y) ∪ N(Ȳ), and the fact that N(Ȳ) ∩ N(A(Y)) = ∅ implies that z_{j+1} is in N(Y). We conclude that if z_j is in N(Y) for some j ≥ k, then z_i is in N(Y) for all i ≥ j, and therefore q ⊆ N(Y). Actually, q is in every neighborhood of Y which is a subset of N(Y), and hence q ⊆ Y. But by assumption Y ⊆ q, and therefore Y = q.

21. Theorem. Suppose that A(.) is u.s.c. and compact on T, let {z_i} be a specific sequence generated by Algorithm 1, and let q be the set of cluster points of {z_i}. If q is not empty and contains exactly p points, then: (i) q ⊆ ∪_{j=0}^{p−1} A^j(y) for every y in q, and (ii) every point y in q is a periodic point of A(.) of period r_y, with 1 ≤ r_y ≤ p.

Proof. (i) Let y be a point in q and construct iteratively the sequence of sets Y_j, j = 0, 1, ...,
using the following scheme:

Y_0 = {y} and Y_{j+1} = (Y_j ∪ A(Y_j)) ∩ q.

Then the sequence {Y_j} satisfies Y_j ⊆ q and Y_j ⊆ Y_{j+1} for all j = 0, 1, ....
Lemma 20 implies that if Y_j is not equal to q, then A(Y_j) ∩ q is not a subset of Y_j, and therefore card(Y_{j+1}) ≥ card(Y_j) + 1, where card(Y) denotes the cardinality of the set Y. It follows that if Y_j is not equal to q, then card(Y_j) ≥ j + 1. By assumption q contains p points, and therefore Y_j must be equal to q for some j ≤ p − 1. We conclude that q = Y_{p−1}. The construction of the sequence {Y_j} implies that

Y_j ⊆ Y_0 ∪ A(Y_0) ∪ ... ∪ A^j(Y_0)
and therefore

q ⊆ ∪_{j=0}^{p−1} A^j(y),
where we define A^0(y) to be equal to {y}. (ii) Let y be a point in q; then Lemma 12 implies that there exists a point w in A(y) ∩ q. If w = y, then y is a periodic point of A(.) of period 1. If w ≠ y, we know from the first part of Theorem 21 that y is in A^j(w) for some j, with 0 ≤ j ≤ p − 1. It follows immediately that y is in A^{j+1}(y), i.e., y is a periodic point of A(.) of period r_y, with 1 ≤ r_y ≤ p.

It may happen that q contains p points, and that every point y in q is of periodicity r_y with r_y < p.

22. Example. Let y_1 and y_2 be two distinct points in T, and assume that both y_1 and y_2 are in A(y_1) ∩ A(y_2). Then, Algorithm 1 may generate the sequence {z_i} defined by z_0 = y_1, z_1 = y_2, z_2 = y_1, z_3 = y_2, etc. This sequence possesses two cluster points y_1 and y_2, i.e., card(q) = 2, but both y_1 and y_2 are periodic points of A(.) of period 1.

Note that if the assumptions of Theorem 21 are satisfied and if A(.) is a point-to-point map, then every point y in q is a periodic point of A(.) of period p [17].
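The set iteration Y_{j+1} = (Y_j ∪ A(Y_j)) ∩ q used in the proof of Theorem 21 can be traced on a toy example. The 3-cycle map below is our own illustration (not from the paper); its cardinality grows by at least one per step, so Y_{p−1} = q after p − 1 steps, as the proof asserts.

```python
q = {0, 1, 2}                    # a cluster point set with p = 3 points
A = {0: {1}, 1: {2}, 2: {0}}     # a 3-cycle: every point has period 3

def image(S):
    """The set image A(S) of a subset S."""
    return set().union(*(A[w] for w in S))

Y, iterates = {0}, [{0}]
while Y != q:
    Y = (Y | image(Y)) & q       # the scheme from the proof of Theorem 21
    iterates.append(set(Y))
# card(Y_j) grows by at least one per step, so Y_2 = q after p - 1 = 2 steps.
```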
Asymptotically regular algorithms

We have seen that we may obtain upper bounds on the set L by assuming that the map A(.) possesses the appropriate continuity properties. In order to obtain upper bounds on the set Q, we must make some further assumptions concerning Algorithm 1. In this section we shall assume that Algorithm 1 is asymptotically regular.

23. Definition. A sequence {z_i} is asymptotically regular iff {||z_{i+1} − z_i||} converges to 0. An iterative algorithm is asymptotically regular iff every infinite sequence it generates is asymptotically regular.

24. Theorem. If the map A(.) is u.s.c. and compact on D̄, and Algorithm 1 is asymptotically regular, then C ⊆ Q ⊆ C ∪ D.
Proof. If z* is a point in Q, there exist a sequence {z_i} generated by Algorithm 1 and K, an infinite subset of the integers, so that {z_i}_K converges to z*. Suppose that z* is not in C ∪ D; then z* does not belong to A(z*) and there exist
compact neighborhoods N(z*) and N(A(z*)) of z* and A(z*) respectively so that N(A(z*)) ∩ N(z*) = ∅ and A(z) ⊆ N(A(z*)) for all z ∈ N(z*). It follows that there exist ε > 0 and k so that

||z_{i+1} − z_i|| ≥ ε for all i ≥ k, i in K,

and this contradicts the assumption that {z_i} is an asymptotically regular sequence.

25. Corollary. If A(.) is u.s.c. and compact on C̄, and Algorithm 1 is asymptotically regular, then L = Q = C.

A.M. Ostrowski has shown [21, p. 203] that if a sequence {z_i} is asymptotically regular and contained in a bounded set, then only the following may occur:
(i) the sequence {z_i} possesses one and only one cluster point, or
(ii) the sequence {z_i} possesses an uncountable number of cluster points, and these cluster points form a continuum.
Ostrowski's results imply immediately the following.

26. Corollary. Suppose that A(.) is u.s.c. and compact on C̄, T is bounded, Algorithm 1 is asymptotically regular, and C contains at most a countable number of points; then every sequence generated by Algorithm 1 converges to a point in C.
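Asymptotic regularity is easy to test numerically for a concrete schema. In the sketch below (the contracting map is our own toy choice), every admissible selection yields ||z_{i+1} − z_i|| → 0, so the generated sequence is asymptotically regular and, as Corollary 25 predicts, converges to the fixed point set C = {0}.

```python
def A(z):
    return {0.5 * z, 0.25 * z}           # both selections contract toward 0

z, seq = 1.0, [1.0]
for _ in range(60):
    z = max(A(z))                        # one specific (slowest) selection
    seq.append(z)

steps = [abs(b - a) for a, b in zip(seq, seq[1:])]
# steps decreases geometrically, so this sequence of Algorithm 1 is
# asymptotically regular (Definition 23) and converges to 0, and 0 is the
# only point with z in A(z).
```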
Monotonic algorithms

An alternate approach for obtaining upper bounds on the set Q consists in using the concept of monotonicity. Given a map v(.) from T into E, we induce an ordering on T.

27. Definition. A sequence {z_i} is monotonic with respect to v(.) iff v(z_i) ≥ v(z_{i+1}) for all successive points z_i and z_{i+1} of the sequence. A sequence {z_i} is strictly monotonic with respect to v(.) iff either v(z_i) > v(z_{i+1}) for all successive elements z_i and z_{i+1} of the sequence, or there exists k, depending on the sequence, so that v(z_i) > v(z_{i+1}) for all i ≤ k, and v(z_i) = v(z_{i+1}) for all i > k.

28. Definition. An iterative algorithm is monotonic (strictly monotonic) iff every sequence it generates is monotonic (strictly monotonic).

29. Lemma. Suppose that v(.) is lower semi-continuous on T, let {z_i} be monotonic with respect to v(.), and let q be the cluster point set of {z_i}. If q is not empty, then (i) {v(z_i)} converges to some value v*, and (ii) v(z) = v(y) for every z and y in q.
Corollary 12 and Lemma 29 imply immediately that we can find an upper bound for Q by using maps v(.) which have the following property:

30. Hypothesis. C = {z ∈ T | v(y) = v(z) for at least one point y in A(z)}.

Note that for every map v(.) we have {z ∈ T | v(y) = v(z) for at least one point y in A(z)} ⊇ C. By choosing maps v(.) for which Hypothesis 30 is satisfied, we ensure that equality holds, i.e., {z ∈ T | v(y) = v(z) for at least one point y in A(z)} = {z ∈ T | z ∈ A(z)}.

31. Lemma. Suppose that A(.) is u.s.c. and compact on T, Algorithm 1 is monotonic with respect to v(.), and v(.) is lower semi-continuous on T and satisfies Hypothesis 30; then Q = L = C.

The assumptions of Lemma 31 may be weakened in various ways by using hypotheses on A(.) and v(.) which do not involve A(.) and v(.) separately. In order to be able to vary the strength of the hypotheses, we introduce once again a subset D of T.

32. Hypothesis. If z belongs to D̄, there exist a neighborhood N(z) of z, δ(z) > 0, and Δ(z) such that for all x' in A(z') and for all z' in N(z),
(i) v(x') + δ(z) ≤ v(z'), and
(ii) Δ(z) ≤ v(z').

33. Hypothesis. If z belongs to D̄, there exists a neighborhood N(z) of z such that v(x') < v(z) for all x' in A(z') and for all z' in N(z), and v(.) is lower semi-continuous on D̄.

It was shown in [15] that Hypothesis 32 and Hypothesis 33 are not comparable, and that the following results are true:

34. Lemma. If either Hypothesis 32 or 33 is satisfied, then C ⊆ L ⊆ Q ⊆ D.

Note that a more general form of Hypothesis 33 has been proposed by Huard [7, 8].
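A monotonic schema in the sense of Definition 27 can be sketched with a toy surrogate map (both v and A below are our own choices, not from the paper): v(z) = z² strictly decreases along every admissible selection of A(z) = {z/2, −z/2} except at z = 0, so Hypothesis 30 holds with C = {0}, and Lemma 31 predicts Q = L = C.

```python
import random

def A(z):
    return {z / 2.0, -z / 2.0}           # every selection halves |z|

def v(z):
    return z * z                         # surrogate map: v drops by 3/4 per step

rng = random.Random(0)
z, values = 3.0, [v(3.0)]
for _ in range(50):
    z = rng.choice(sorted(A(z)))         # any non-deterministic selection
    values.append(v(z))

monotone = all(a >= b for a, b in zip(values, values[1:]))
# v decreases along the whole sequence, which therefore converges to 0 in C.
```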
35. Hypothesis. Let B(.) be a map from T into all the non-empty subsets of T such that: (i) A(z) ⊆ B(z) for all z in T, (ii) if y ∈ B(z), then B(y) ⊆ B(z), and (iii) if z belongs to D̄, there exists a neighborhood N(z) of z so that z does not belong to the closure of B(x') for all x' in A(z') and for all z' in N(z).

36. Lemma. If Hypothesis 35 is satisfied, then C ⊆ L ⊆ Q ⊆ D.

The proof of Lemma 36 may be found in [8, p. 156]. The upper bound D may not be as small as C. In fact, in some cases D is bounded from below by the extended characteristic set C_0 of Algorithm 1 with respect to v(.), where C_0 is the set of all points z in T such that for each scalar δ > 0 there exists at least one point x in A(z), which may depend on z and δ, such that v(x) ≥ v(z) − δ [14, 15].

37. Lemma. If Hypothesis 32 is satisfied, then D ⊇ C_0, and if either Hypothesis 33 or 35 is satisfied, then D ⊇ C.

The use of a surrogate map v(.) to obtain the convergence properties of Algorithm 1 has been widely described in the literature [3, 5, 6-8, 10-19, 22-26]. Hypotheses 32 and 33 presented in this paper have been discussed in [15] and have been shown to be weaker than the existing ones. It has been noted by Zangwill [26] that the map v(.) used in convergence theory to help find an upper bound for Q is closely related to a Liapunov map. This is indeed the case when T is bounded and A(.) is u.s.c. on T, provided that only asymptotic stability in the large is considered.

38. Definition. A non-empty subset D of T is an asymptotically stable in the large equilibrium set of Algorithm 1 iff to every neighborhood N(D) of D and to every sequence {z_i} generated by Algorithm 1 correspond k, depending on N(D) and {z_i}, so that z_i is in N(D) for all i > k.

39. Lemma. If T is bounded, A(.) is u.s.c. on D, and either Hypothesis 32, 33 or 35 is satisfied, then D is an asymptotically stable in the large equilibrium set of
Algorithm 1.

Proof. Let N(D) be a neighborhood of D and let {z_i} be a sequence generated by Algorithm 1. By assumption T is bounded and therefore the cluster point set q of {z_i} is non-empty and bounded. Lemma 34 shows that q is a subset of D, and therefore N(D) is a neighborhood of q. Corollary 17 implies that there exists k so that z_i is in N(D) for all i ≥ k, and the set D is an asymptotically stable in the large equilibrium set of Algorithm 1.
G.G.L. Meyer/Asymptotic properties of sequences
References

[1] C. Berge, Espaces topologiques - fonctions multivoques (Dunod, Paris, 1966).
[2] G. Debreu, Theory of Value, Cowles Foundation Monograph 17 (Wiley, New York, 1959).
[3] B.C. Eaves and W.I. Zangwill, "Generalized cutting plane algorithms", SIAM Journal on Control 9 (1971) 529-542.
[4] H. Hermes, "Calculus of set valued functions and control", Journal of Mathematics and Mechanics 18 (1968) 47-59.
[5] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (1973) 591-603.
[6] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
[7] P. Huard, Cours de 3ème cycle, 1ère partie: éléments théoriques, Université de Lille I (France) Janvier 1972.
[8] P. Huard, Cours de 3ème cycle, 2ème partie: algorithmes généraux, Université de Lille I (France) Janvier 1972.
[9] S. Kakutani, "A generalization of Brouwer's fixed point theorem", Duke Mathematical Journal 8 (1941) 457-459.
[10] D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
[11] G.G.L. Meyer, "Algorithm model for penalty functions-type iterative procedures", Journal of Computer and System Sciences 9 (1974) 20-30.
[12] G.G.L. Meyer, "A canonical structure for iterative procedures", Journal of Mathematical Analysis and Applications 52 (1975) 120-128.
[13] G.G.L. Meyer, "A systematic approach to the synthesis of algorithms", Numerische Mathematik 24 (1975) 277-289.
[14] G.G.L. Meyer, "Conditions de convergence pour les algorithmes itératifs monotones, autonomes et non déterministes", Revue Française d'Automatique, Informatique et Recherche Opérationnelle, Analyse Numérique 11 (1977) 61-74.
[15] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", SIAM Journal on Control and Optimization 15 (1977) 779-784.
[16] G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control 9 (1971) 547-560.
[17] G.G.L. Meyer and R.C. Raup, "On the structure of cluster point sets of iteratively generated sequences", Journal of Optimization Theory and Applications, to be published.
[18] R.R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41-54.
[19] R.R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Technical Report 220, Computer Science Department, University of Wisconsin, Madison, WI.
[20] E. Michael, "Topologies on spaces of subsets", Transactions of the American Mathematical Society 71 (1951) 152-182.
[21] A.M. Ostrowski, Solution of equations and systems of equations (Academic Press, New York, 1966).
[22] E. Polak, "On the convergence of optimization algorithms", Revue Française d'Automatique, Informatique et Recherche Opérationnelle 16 (1969) 17-34.
[23] E. Polak, Computational methods in optimization: a unified approach (Academic Press, New York, 1971).
[24] B.T. Polyak, "Gradient methods for the minimization of functionals", U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1963) 864-878.
[25] W.I. Zangwill, "Convergence conditions for nonlinear programming algorithms", Working Paper No. 197, Center for Research in Management Science, University of California, Berkeley, November 1966.
[26] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
Mathematical Programming Study 10 (1979) 128-141. North-Holland Publishing Company
GENERALIZED EQUATIONS AND THEIR SOLUTIONS, PART I: BASIC THEORY

Stephen M. ROBINSON*

University of Wisconsin-Madison, Madison, Wisconsin, U.S.A.

Received 28 November 1977
We consider a class of "generalized equations," involving point-to-set mappings, which formulate the problems of linear and nonlinear programming and of complementarity, among others. Solution sets of such generalized equations are shown to be stable under certain hypotheses; in particular a general form of the implicit function theorem is proved for such problems. An application to linear generalized equations is given at the end of the paper; this covers linear and convex quadratic programming and the positive semidefinite linear complementarity problem. The general nonlinear programming problem is treated in Part II of the paper, using the methods developed here.
Key words: Variational Inequalities, Generalized Equations, Monotone Operators, Nonlinear Complementarity Problem, Nonlinear Programming, Economic Equilibria.
1. Introduction
In this paper we shall study the behavior of solutions of the generalized equation

0 ∈ f(x) + T(x),  (1.1)

where f is a continuously Fréchet differentiable function from an open set Ω ⊆ Rⁿ into Rⁿ and T is a maximal monotone operator from Rⁿ into itself (recall that an operator T is monotone if for each (x₁, w₁), (x₂, w₂) in graph T one has ⟨x₁ − x₂, w₁ − w₂⟩ ≥ 0, where ⟨·, ·⟩ denotes the inner product, and maximal monotone if its graph is not properly contained in that of any other monotone operator). We use the term "generalized equation" because if T is identically zero, then (1.1) reduces to the equation f(x) = 0, and because systems like (1.1) retain some of the analytic properties of nonlinear equations, as we shall show in what follows. We shall be particularly interested in conditions which, when imposed on f and T, will ensure that the set of solutions to (1.1) remains nonempty and is well behaved (in a sense to be defined) when f is subjected to small perturbations. To introduce these perturbations, we shall make use of a topological space P and a

* Sponsored by the United States Army under Contract No. DAAG29-75-C-0024 and by the National Science Foundation under Grant No. MCS74-20584 A02.
function f : P × Ω → Rⁿ, so that we can replace (1.1) by

0 ∈ f(p, x) + T(x),  (1.2)
and study the set of x which solve (1.2) as p varies near a base value p₀. A particular case of (1.2) of special interest for applications is that in which T is taken to be the operator ∂ψ_C, where for a closed convex set C ⊆ Rⁿ one defines the indicator function ψ_C of C by

ψ_C(x) := 0 if x ∈ C,  +∞ if x ∉ C,

and where ∂ denotes the subdifferential operator [13, Section 23]. This yields the special generalized equation

0 ∈ f(p, x) + ∂ψ_C(x),  (1.3)
which expresses analytically the geometric idea that f(p, x) is an inward normal to C at x. Many problems from mathematical programming, complementarity, mathematical economics and other fields can be represented in the form (1.3): for example, the nonlinear complementarity problem

F(x) ∈ K*,  x ∈ K,  ⟨x, F(x)⟩ = 0,  (1.4)
where F : Rⁿ → Rⁿ, K is a nonempty polyhedral convex cone in Rⁿ, and K* := {y ∈ Rⁿ | ⟨y, k⟩ ≥ 0 for each k ∈ K}, can be written as 0 ∈ F(x) + ∂ψ_K(x). Further information on nonlinear complementarity problems (often with K = Rⁿ₊, the non-negative orthant) may be found in, e.g., [2, 4, 7, 8]. The Kuhn-Tucker necessary conditions for mathematical programming [6] form a special case of (1.4); e.g., for the problem minimize
θ(y),  subject to  g(y) ≤ 0,  h(y) = 0,  (1.5)
where θ, g and h are differentiable functions from Rᵐ into R, R^q and R^r respectively, one has the Kuhn-Tucker conditions

θ′(y) + ug′(y) + vh′(y) = 0,  h(y) = 0,  u ≥ 0,  g(y) ≤ 0,  ⟨u, g(y)⟩ = 0,
and these can be written in the form (1.4) by taking n = m + q + r, K = Rᵐ × R^q₊ × R^r, x = (y, u, v) and

F(x) = ( [θ′(y) + ug′(y) + vh′(y)]ᵀ, −g(y), −h(y) ).
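As a numerical sketch of this reduction, consider the hypothetical one-dimensional problem (invented here, not taken from the paper) of minimizing θ(y) = (y − 2)² subject to g(y) = y − 1 ≤ 0, with no equality constraints; its Kuhn-Tucker point is y = 1 with multiplier u = 2. The conditions (1.4), with x = (y, u) and K = R × R₊, can then be checked directly:

```python
# Hypothetical example (not from the paper): theta(y) = (y - 2)**2,
# g(y) = y - 1 <= 0, no h.  Then x = (y, u), K = R x R_+,
# K* = {0} x R_+, and F(x) = (theta'(y) + u*g'(y), -g(y)).

def F(y, u):
    # first row: stationarity; second row: -g(y)
    return (2.0 * (y - 2.0) + u * 1.0, -(y - 1.0))

y, u = 1.0, 2.0
F1, F2 = F(y, u)

assert u >= 0.0                        # x in K (y is free, u >= 0)
assert abs(F1) < 1e-12 and F2 >= 0.0   # F(x) in K* = {0} x R_+
assert abs(y * F1 + u * F2) < 1e-12    # <x, F(x)> = 0
print("Kuhn-Tucker point satisfies the complementarity problem (1.4)")
```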
There are also important applications of (1.3) to economic equilibrium problems [15], among others. It is of interest to note that in most of the applications mentioned one finds that C is a polyhedral convex set, and we shall see that particularly strong results can be obtained for such problems. It is also worth pointing out that problems of linear or quadratic programming lead to linear generalized equations: for example, if P ⊆ Rᵐ and Q ⊆ Rˡ are two polyhedral convex cones, H and A are matrices of dimensions m × m and l × m respectively, c ∈ Rᵐ and a ∈ Rˡ, then we can consider the quadratic programming problem

minimize  ½⟨x, Hx⟩ + ⟨c, x⟩,  subject to  a − Ax ∈ Q*,  x ∈ P,  (1.6)
where Q* is the dual cone of Q. The necessary optimality conditions for (1.6) are (assuming without loss of generality that H is symmetric):

xᵀH + c + uA ∈ P*,  a − Ax ∈ Q*,  x ∈ P,  u ∈ Q,  ⟨xᵀH + c + uA, x⟩ = 0,  ⟨u, a − Ax⟩ = 0.
These can be formulated in a somewhat more transparent manner by writing them as

0 ∈ [ H  Aᵀ ; −A  0 ] (x, u) + (c, a) + ∂ψ_{P×Q}(x, u),

a linear generalized equation which, if P and Q are taken to be Rᵐ and Rˡ respectively (i.e., in the case of quadratic programming with equality constraints and unconstrained variables), reduces to an ordinary linear equation. We shall see that linear generalized equations are basic to the analysis done here in much the same way as linear equations are basic to the analysis of nonlinear equations.

The organization of this paper is as follows: in the next section we state and prove the main result (Theorem 1) after defining a property used in the statement. We also discuss a way of simplifying (by restricting) one of the hypotheses. In Section 3 we examine a class of multivalued functions frequently found in applications, and show that they have one of the key properties needed in Theorem 1. Finally, in Section 4 we apply the results of Sections 2 and 3 to linear generalized equations. Applications to nonlinear problems will be the subject of Part II of this paper.
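A minimal numerical sketch of this formulation, with problem data invented here for illustration: take P = Q = R₊, H = [2], c = [−4], A = [1] and a = [1], i.e., minimize x² − 4x subject to x ≤ 1, x ≥ 0, whose Kuhn-Tucker pair is (x, u) = (1, 2). Since both components of (x, u) are strictly positive, the normal cone to P × Q there is {0}, and the generalized equation demands a zero residual:

```python
import numpy as np

# Illustrative data (not from the paper): minimize x**2 - 4*x over
# x in P = R_+ subject to a - A x in Q* with Q = R_+, i.e. x <= 1.
H, A = np.array([[2.0]]), np.array([[1.0]])
c, a = np.array([-4.0]), np.array([1.0])

# Matrix and constant vector of the linear generalized equation
# 0 in M z + q + d psi_{P x Q}(z), with z = (x, u).
M = np.block([[H, A.T], [-A, np.zeros((1, 1))]])
q = np.concatenate([c, a])

z = np.array([1.0, 2.0])      # Kuhn-Tucker pair (x, u)
r = M @ z + q                 # residual before adding the normal cone
assert np.all(z > 0)          # interior of P x Q: normal cone is {0}
assert np.allclose(r, 0.0)    # so the generalized equation forces r = 0
print("(x, u) = (1, 2) solves the linear generalized equation")
```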
2. Main results
Before stating the main theorem, we require a preliminary definition dealing with a certain continuity property of multivalued functions (or multifunctions, as we shall call them).
Definition 1. Let X and Y be normed linear spaces. A multifunction F : X → Y is upper Lipschitzian with modulus λ, or U.L.(λ), at a point x₀ ∈ X with respect to a set V ⊆ X, if for each v ∈ V one has

F(v) ⊆ F(x₀) + λ‖v − x₀‖B_Y,

where B_Y is the unit ball in Y. We say F is locally U.L.(λ) at x₀ if it is U.L.(λ) at x₀ with respect to some neighborhood of x₀.

This property is close to the Lipschitz continuity for multifunctions defined by Rockafellar [14, Section 3], except that we do not require F(x₀) to be a singleton; in the problems we shall consider, F(x₀) will often be multivalued. Note that the distance from any point of F(v) to the set F(x₀) is bounded above by λ‖v − x₀‖, although the distance from a point of F(x₀) to F(v) may be large.

Before stating the main theorem, we shall try to motivate its hypotheses. Recall that in the classical inverse-function theorem the key assumption is that the linearization of the function being considered, about a point x₀ in the inverse image of 0, should be regular: specifically, that the inverse image of 0 under the linearized function should be a singleton (in fact, the point x₀ itself). In our situation, since we may be dealing with solution sets rather than points, we have to linearize about each point in a set. The first assumption in the theorem is that there is a nonempty bounded set X₀ (analogous to the point x₀ in the classical case) such that the inverse image of 0, under an appropriate kind of linearization performed at any point of X₀, is X₀ itself together with, possibly, points outside some neighborhood of X₀. There is also an assumption of uniform upper Lipschitz continuity, which is automatically true in the classical case. Finally, there is an assumption that the inverse image of any point near 0, under the linearization previously mentioned, has a convex component in the neighborhood of X₀ within which we are working. In the classical case this is equivalent to the first hypothesis, but not so here. We shall show, below and in Part II, that many problems of practical interest satisfy these hypotheses.
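The asymmetry noted after Definition 1 can be seen on a small polyhedral example (chosen here for illustration, not taken from the paper): the multifunction F := (∂ψ_{[0,1]})⁻¹, which sends y > 0 to {1}, y < 0 to {0}, and 0 to the whole segment [0, 1].

```python
# Illustration of Definition 1 (example constructed here, not from the
# paper): F := (d psi_{[0,1]})^{-1}, so F(y) = {1} for y > 0,
# F(y) = {0} for y < 0, and F(0) = [0, 1].
def F(y):
    if y > 0: return [1.0]
    if y < 0: return [0.0]
    return [0.0, 1.0]          # endpoints representing F(0) = [0, 1]

# F is U.L.(lambda) at 0 for any lambda >= 0: every point of F(v)
# already lies in F(0) = [0, 1], so F(v) is in F(0) + lambda*|v|*B.
for v in (-0.5, -1e-3, 1e-3, 0.5):
    assert all(0.0 <= x <= 1.0 for x in F(v))

# The reverse inclusion fails: the endpoint 0 of F(0) is at distance 1
# from the singleton F(0.5) = {1}.
assert min(abs(0.0 - x) for x in F(0.5)) == 1.0
print("F is upper Lipschitzian at 0, but points of F(0) can be far from F(v)")
```

Thus F is U.L.(λ) at 0 for every λ ≥ 0, even though d[F(0), F(v)] = 1 for all v ≠ 0.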
In particular we show in Proposition 1 that the third assumption can be replaced by an assumption of positive semidefiniteness which is often satisfied in applications.

In the following theorem, we use f₂ to denote the partial Fréchet derivative, with respect to the second argument, of a function f(p, x) of two variables; B denotes the unit ball in Rⁿ with respect to the Euclidean norm, which is used throughout the remainder of the paper.

Theorem 1. Let P be a topological space, Ω an open set in Rⁿ and T a closed multifunction from Rⁿ into itself. Let f be a continuous function from P × Ω into Rⁿ such that f₂ is continuous on P × Ω. Let p₀ ∈ P; write Lf_{x₀}(x) for f(p₀, x₀) + f₂(p₀, x₀)(x − x₀). Suppose that there are a nonempty, bounded convex set X₀ and constants λ, γ > 0 and η > 0 with X_γ := X₀ + γB ⊆ Ω, such that for each x₀ ∈ X₀:
(i) X_γ ∩ (Lf_{x₀} + T)⁻¹(0) = X₀;
(ii) X_γ ∩ (Lf_{x₀} + T)⁻¹ is U.L.(λ) at 0 with respect to ηB;
(iii) for each y ∈ ηB, X_γ ∩ (Lf_{x₀} + T)⁻¹(y) is convex and nonempty.
Then there exist a number δ ∈ (0, γ] and a neighborhood U(p₀) such that with

Σ(p) := {x ∈ X₀ + δB | 0 ∈ f(p, x) + T(x)} for p ∈ U,  Σ(p) := ∅ for p ∉ U,

one has:
(1) Σ is upper semicontinuous from U to Rⁿ;
(2) Σ(p₀) = X₀; and
(3) for each ε > 0, for some neighborhood U_ε(p₀) and for each p ∈ U_ε, ∅ ≠ Σ(p) ⊆ Σ(p₀) + (λ + ε)α₀(p)B, where α₀(p) := max{‖f(p, x) − f(p₀, x)‖ | x ∈ X₀}.

Note that if P is actually a normed linear space and if f(p, x) is Lipschitzian in p uniformly over x ∈ X₀, then for some constant μ and each p ∈ U_ε we have

Σ(p) ⊆ Σ(p₀) + (λ + ε)μ‖p − p₀‖B,

so that Σ is locally U.L.((λ + ε)μ) at p₀.
so that ,~ is locally U.L.[()t + e)Ix] at po. Proof. Choose x0E X0; denote Lfxo+ T by Q(xo). Let 0 E (0, ri] with )tO ---3' and let y E OB; then X , n Q(xo)-I(y)CXo+XllYllB c X,. Hypothesis (iii), together with closure of Q(xo), implies that for each y ~ OB, Xv n Q(x0)-l(y) is non-empty, compact and convex. In particular, X0 is a compact convex set. The basic idea of the proof is to approximate the inverse of the operator f(p, x) + T ( x ) by the inverse of the operator Q('n'(x))(z) := Lf,,(~)(z) + T(z) = f(Po, rr(x)) +f2(Po, rr(x))(z - or(x)) + T(z), where It(x) is the closest point to x in X0, just as one approximates the inverse of a function in the classical inverse-function theorem by the inverse of its linearization about some point. We then apply a fixed-point theorem; in proving the inverse-function theorem one usually uses the contraction principle, but here we have to use the Kakutani theorem. Observe that the "linearized" operator appearing here is of the type we discussed above in considering linear generalized equations; this illustrates our comment that these operators play a r61e in the analysis of generalized equations analogous to that of linear operators in classical analysis. Of course, during this approximation it will be necessary to be careful that we work with the correct component of the inverse image (i.e., that lying in X,), and this adds a certain amount of complexity to the notation.
Define, for two subsets A and C of Rⁿ and a point x ∈ Rⁿ, d[x, C] := inf{‖x − c‖ | c ∈ C} and d[A, C] := sup{d[a, C] | a ∈ A}, where the supremum and infimum of ∅ are defined to be −∞ and +∞ respectively. Denote by π the projection from Rⁿ onto X₀; π is well known to be nonexpansive, hence a fortiori continuous. Using continuity and compactness, one can show that the function

β(δ) := max{‖f₂(p₀, x) − f₂(p₀, π(x))‖ | x ∈ X₀ + δB}
is well defined for small δ, and is continuous at 0 with β(0) = 0. Thus, we can choose a δ ∈ (0, γ] such that λβ(δ) ≤ ½ and δβ(δ) ≤ ½η. It is not difficult to show that for this fixed δ the function

α_δ(p) := max{‖f(p, x) − f(p₀, x)‖ | x ∈ X_δ}

(where X_δ := X₀ + δB) is well defined for all p ∈ P, and is continuous at p₀ with α_δ(p₀) = 0. Thus, we can choose a neighborhood U(p₀) such that for each p ∈ U, α_δ(p) < ½η and λα_δ(p) ≤ ½δ. Now choose any p ∈ U, and define a multifunction F_p from X_δ into Rⁿ by
F_p(x) := X_γ ∩ Q(π(x))⁻¹[Lf_{π(x)}(x) − f(p, x)].

If x is any point of X_δ, we have

‖Lf_{π(x)}(x) − f(p, x)‖ ≤ ‖f(p, x) − f(p₀, x)‖ + ‖f(p₀, x) − Lf_{π(x)}(x)‖.  (2.1)
Now define (for this fixed x) a function of one real variable τ by

g(τ) := f(p₀, τx + (1 − τ)π(x)) − Lf_{π(x)}(τx + (1 − τ)π(x)).

We find that

‖f(p₀, x) − Lf_{π(x)}(x)‖ = ‖g(1) − g(0)‖ ≤ sup{‖g′(τ)‖ | 0 < τ < 1}.

However, for τ ∈ [0, 1], g′(τ) = [f₂(p₀, x_τ) − f₂(p₀, π(x))][x − π(x)], where x_τ := τx + (1 − τ)π(x). We have by properties of the projection that π(x_τ) = π(x), so

‖f₂(p₀, x_τ) − f₂(p₀, π(x))‖ = ‖f₂(p₀, x_τ) − f₂(p₀, π(x_τ))‖ ≤ β(δ),

since x_τ ∈ X_δ. Hence

‖f(p₀, x) − Lf_{π(x)}(x)‖ ≤ β(δ)‖x − π(x)‖.  (2.2)

As ‖f(p, x) − f(p₀, x)‖ ≤ α_δ(p), we have from (2.1) and (2.2)

‖Lf_{π(x)}(x) − f(p, x)‖ ≤ α_δ(p) + β(δ)‖x − π(x)‖ < ½η + ½η = η.  (2.3)
Hence, by our previous remarks, F_p(x) is a nonempty compact convex set for each x ∈ X_δ. Also, using (i), (ii) and (2.3), we have for x ∈ X_δ,
d[F_p(x), X₀] = d[X_γ ∩ Q(π(x))⁻¹[Lf_{π(x)}(x) − f(p, x)], X_γ ∩ Q(π(x))⁻¹(0)]
  ≤ λ‖Lf_{π(x)}(x) − f(p, x)‖  (2.4)
  ≤ λα_δ(p) + λβ(δ)‖x − π(x)‖ ≤ ½δ + ½δ = δ,

so F_p carries X_δ into itself. We have

graph F_p = {(x, y) | x ∈ X_δ, y ∈ X_γ, Lf_{π(x)}(x) − f(p, x) ∈ Lf_{π(x)}(y) + T(y)}
  = {(x, y) | 0 ∈ f(p, x) + f₂(p₀, π(x))(y − x) + T(y)} ∩ (X_δ × X_γ).

Using the continuity of f, f₂, and π, together with the fact that T is closed, one can show without difficulty that graph F_p is closed in X_δ × X_δ. We can thus apply the Kakutani fixed-point theorem [5, 9] to conclude that there is some x_p ∈ X_δ with x_p ∈ F_p(x_p); that is,
Lf_{π(x_p)}(x_p) − f(p, x_p) ∈ Lf_{π(x_p)}(x_p) + T(x_p),

so 0 ∈ f(p, x_p) + T(x_p) and thus x_p ∈ Σ(p), which is therefore nonempty. We have graph Σ = {(p, x) ∈ U × X_δ | 0 ∈ f(p, x) + T(x)}; this is closed in U × X_δ by joint continuity of f and closure of T. However, the range of Σ is contained in the compact set X_δ; thus by [9, Lemma 4.4] Σ is actually upper semicontinuous from U to X_δ. If x₀ ∈ X₀ then by (i) one has 0 ∈ Lf_{x₀}(x₀) + T(x₀) = f(p₀, x₀) + T(x₀), so x₀ ∈ Σ(p₀) and thus Σ(p₀) ⊇ X₀. On the other hand, if x ∈ Σ(p₀) then x ∈ X_δ and 0 ∈ f(p₀, x) + T(x); therefore

Lf_{π(x)}(x) − f(p₀, x) ∈ Lf_{π(x)}(x) + T(x),

so that x ∈ F_{p₀}(x). As x ∈ X_δ, we have from (2.4) with p = p₀ that
d[x, X₀] ≤ d[F_{p₀}(x), X₀] ≤ λ‖Lf_{π(x)}(x) − f(p₀, x)‖.

But from (2.3) with p = p₀, we find that

‖Lf_{π(x)}(x) − f(p₀, x)‖ ≤ β(δ)‖x − π(x)‖ = β(δ)d[x, X₀].

Thus

d[x, X₀] ≤ λβ(δ)d[x, X₀] ≤ ½d[x, X₀],

implying that x ∈ X₀ since X₀ is closed. Thus we actually have Σ(p₀) = X₀.

Now take any ε > 0; find δ_ε ∈ (0, δ] such that for σ ∈ [0, δ_ε] one has λβ(σ) ≤ ½ε/(λ + ε). One can show that the function

τ(p) := max{‖f₂(p, x) − f₂(p₀, x)‖ | x ∈ X₀ + δ_εB}

is well defined on P and is continuous at p₀; choose a neighborhood U_ε(p₀) ⊆ U so that if p ∈ U_ε we have Σ(p) ⊆ Σ(p₀) + δ_εB and λτ(p) ≤ ½ε/(λ + ε). Now choose
any p ∈ U_ε and any x ∈ Σ(p). Using (2.4) and the fact that x ∈ F_p(x), we have

d[x, Σ(p₀)] ≤ d[F_p(x), X₀] ≤ λ‖Lf_{π(x)}(x) − f(p, x)‖
  ≤ λ‖h(x) − h(π(x))‖ + λ‖h(π(x))‖ + λ‖f(p₀, x) − Lf_{π(x)}(x)‖,  (2.5)

where h(x) := f(p, x) − f(p₀, x). If we define, as before, x_τ := τx + (1 − τ)π(x), we have

‖h(x) − h(π(x))‖ ≤ ‖x − π(x)‖ sup{‖h′(x_τ)‖ | 0 < τ < 1}.

But h′(x_τ) = f₂(p, x_τ) − f₂(p₀, x_τ), so

‖h(x) − h(π(x))‖ ≤ τ(p)‖x − π(x)‖.

Thus, using (2.2), (2.5) and the fact that ‖h(π(x))‖ ≤ α₀(p), we have

d[x, Σ(p₀)] ≤ λτ(p)‖x − π(x)‖ + λα₀(p) + λβ(δ_ε)‖x − π(x)‖
  ≤ [ε/(λ + ε)]‖x − π(x)‖ + λα₀(p).

But ‖x − π(x)‖ = d[x, Σ(p₀)], so if λ > 0 we obtain [λ/(λ + ε)]d[x, Σ(p₀)] ≤ λα₀(p) and thus

d[x, Σ(p₀)] ≤ (λ + ε)α₀(p).  (2.6)
On the other hand, if λ = 0, then (2.5) implies that d[x, Σ(p₀)] = 0, in which case (2.6) holds trivially. In either case, therefore,

Σ(p) ⊆ Σ(p₀) + (λ + ε)α₀(p)B,

which completes the proof.

Verification of the hypotheses of this theorem in a particular case may be difficult; this is particularly true of (ii) and (iii). It is therefore desirable to look for classes of problems for which this verification may be easier. In the next section we exhibit such a class for hypothesis (ii); we do so for (iii) in the following proposition.
Proposition 1. In Theorem 1, the hypothesis (iii) may be replaced by

(iii)′ f₂(p₀, x₀) is positive semidefinite and T is maximal monotone.

Proof. We shall show that (iii)′, together with the other hypotheses of Theorem 1, implies (iii). Choose any x₀ ∈ X₀; under (iii)′ the function Lf_{x₀} will be a maximal monotone operator. As T is also maximal monotone and as dom Lf_{x₀} (the effective domain of Lf_{x₀}) is all of Rⁿ, we have from [1, Corollary 2.7] that Q(x₀) is maximal monotone; hence so is Q(x₀)⁻¹. The set Q(x₀)⁻¹(0) is then
convex, so that (i) implies that Q(x₀)⁻¹(0) = X₀. It follows that for y ∈ ηB, X_γ ∩ Q(x₀)⁻¹(y) ⊆ X₀ + λ‖y‖B (by (ii)). Now let α ∈ (0, η] with λα < γ. If y ∈ αB, the convexity of Q(x₀)⁻¹(y) implies that X_γ ∩ Q(x₀)⁻¹(y) = Q(x₀)⁻¹(y), so Q(x₀)⁻¹ is locally U.L.(λ) at 0. But this, together with the boundedness of Q(x₀)⁻¹(0), shows that Q(x₀)⁻¹ is locally bounded at 0; in fact, it must be locally bounded at every point of int αB, since the image of some ball around such a point will be contained in the image of αB, which in turn is contained in the bounded set X_{λα} = X₀ + λαB. But then from [12, Theorem 1] we have that int αB cannot contain any boundary point of dom Q(x₀)⁻¹; however, as int αB meets dom Q(x₀)⁻¹ (at 0) and is connected, we finally conclude that int αB ⊆ int dom Q(x₀)⁻¹. Thus, for each y with ‖y‖ < α the set Q(x₀)⁻¹(y) is nonempty, convex and contained in X_{λα} ⊆ X_γ. Now let η₀ be any positive number smaller than α. As hypothesis (ii) of Theorem 1 was true for η, and as α ≤ η, that hypothesis will be satisfied also for η₀; as we have just seen, hypothesis (iii) also holds for η₀, and this proves Proposition 1.
The hypothesis (iii)′ is certainly simpler than is (iii); however, (iii) covers a more general class of problems. For example, consider the linear generalized equation

0 ∈ −αx + β + ∂ψ_{[−1,1]}(x),

where α > 0. This does not satisfy (iii)′; however, if |β| ≠ α then each of its solutions (one if |β| > α, three if |β| < α) can be analyzed under (iii). If |β| = α then the solution at −sgn β can be so analyzed, but the solution at sgn β cannot (indeed, the conclusions of Theorem 1 fail for that solution).
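The solution counts just quoted follow from the normal cone of [−1, 1]: interior solutions solve αx = β, while the endpoints ±1 are solutions exactly when β ≤ α (at +1) or β ≥ −α (at −1). A short sketch of this case analysis (written here for illustration):

```python
# Enumerating the solutions of 0 in -alpha*x + beta + d psi_{[-1,1]}(x)
# via the normal cone of [-1, 1]: {0} in the interior, [0, +inf) at +1,
# (-inf, 0] at -1.
def solutions(alpha, beta):
    sols = set()
    if abs(beta) < alpha:       # interior stationary point alpha*x = beta
        sols.add(beta / alpha)
    if beta <= alpha:           # at +1: alpha - beta must lie in [0, +inf)
        sols.add(1.0)
    if beta >= -alpha:          # at -1: -alpha - beta must lie in (-inf, 0]
        sols.add(-1.0)
    return sorted(sols)

assert len(solutions(1.0, 0.5)) == 3        # |beta| < alpha: three solutions
assert solutions(1.0, 2.0) == [-1.0]        # |beta| > alpha: one solution
assert solutions(1.0, 1.0) == [-1.0, 1.0]   # |beta| = alpha: two solutions
print("solution counts match the discussion of hypothesis (iii)")
```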
3. Polyhedral multifunctions

In the last section, we exhibited a class of problems for which hypothesis (iii) of Theorem 1 always held. Here we do somewhat the same thing for hypothesis (ii): we show that for a class of multifunctions important in applications to optimization and equilibrium problems, local upper Lipschitz continuity holds at each point of the range space. The problem of verifying hypothesis (ii), in the case of such functions, then reduces to that of showing that the Lipschitz constants are uniformly bounded and that the continuity holds on a fixed neighborhood for each function in the family considered. For the application given in Section 4 this is trivial; some cases in which it is non-trivial are treated in Part II.
Definition 2. A multifunction Q : Rⁿ → Rᵐ is polyhedral if its graph is the union of a finite (possibly empty) collection of polyhedral convex sets (called components).
Here we use "polyhedral convex set" as in [13, Section 19]. It is clear that a polyhedral multifunction is always closed, and that its inverse is likewise polyhedral. Further, one can show without difficulty that the class of polyhedral multifunctions is closed under scalar multiplication, (finite) addition, and (finite) composition. The following proposition shows that they have good properties also with respect to upper Lipschitz continuity. For brevity, we omit the proofs of this proposition and the next; they may be found in [10].
Proposition 2. Let F be a polyhedral multifunction from Rⁿ into Rᵐ. Then there exists a constant λ such that F is locally U.L.(λ) at each x₀ ∈ Rⁿ.
It is worth pointing out that λ depends only on F and not on x₀, although of course the size of the neighborhood of x₀ within which the continuity holds will in general depend on x₀. The importance of polyhedral multifunctions for applications is illustrated by the following fact, in the statement of which we use the concepts of subdifferential and of a polyhedral convex function (one whose epigraph is a polyhedral convex set), which are discussed further in [13].
Proposition 3. Let f be a polyhedral convex function from Rⁿ into (−∞, +∞]. Then the subdifferential ∂f is a polyhedral multifunction.
It follows from this proposition that subdifferentials of polyhedral convex functions display the upper Lipschitz continuity required in Theorem 1. In view of our earlier remarks about polyhedral multifunctions, this behavior is not lost if we combine these subdifferentials in various ways with other polyhedral multifunctions. For example, let C be a nonempty polyhedral convex set in Rⁿ and let ψ_C : Rⁿ → (−∞, +∞] be its indicator function, defined by

ψ_C(x) := 0 if x ∈ C,  +∞ if x ∉ C.

It is readily verified that ψ_C is a polyhedral convex function. Now, if A is a linear transformation from Rⁿ into itself and a ∈ Rⁿ, then the operator Ax + a + ∂ψ_C(x) and its inverse are, by Propositions 2 and 3, everywhere locally upper Lipschitzian. Hence, linear generalized equations have good continuity properties with respect to perturbations of the right-hand side; we shall exploit this fact in the next section. This discussion also shows that, if the operator T in Theorem 1 is polyhedral, then the linearized operators Lf_{x₀} + T have at least some of the continuity properties required in hypothesis (ii) of that theorem; it is still necessary to prove uniformity, but this is trivial if X₀ is a singleton, while in general it can often be done by using the structure of the problem (e.g., in nonlinear programming: see Part II of this paper).
4. An application: stability of a linear generalized equation
To illustrate an application of Theorem 1, we specialize it to analyze the behavior of the solution set of the linear generalized equation

0 ∈ Ax + a + ∂ψ_C(x),  (4.1)

where A is an n × n matrix, a ∈ Rⁿ, and C is a nonempty polyhedral convex set in Rⁿ. Such problems include, as special cases, the problems of linear and quadratic programming and the linear complementarity problem. We shall characterize stability of the solution set of (4.1) when the matrix A is positive semidefinite (but not necessarily symmetric); a more general (but more complicated) result could be obtained by dropping the assumption of positive semidefiniteness but assuming hypothesis (iii) of Theorem 1.

Theorem 2. Let A be a positive semidefinite n × n matrix, C be a nonempty polyhedral convex set in Rⁿ and a ∈ Rⁿ. Then the following are equivalent:
(a) The solution set of (4.1) is nonempty and bounded.
(b) There exists ε₀ > 0 such that for each n × n matrix A′ and each a′ ∈ Rⁿ with

ε′ := max{‖A′ − A‖, ‖a′ − a‖} < ε₀,  (4.2)

the set S(A′, a′) := {x | 0 ∈ A′x + a′ + ∂ψ_C(x)} is nonempty.
Further, suppose these conditions hold; let μ be a bound on S(A, a), and λ be a local upper Lipschitz constant for [A(·) + a + ∂ψ_C(·)]⁻¹ at 0 (which exists by the results of Section 3). Then for any open bounded set Ψ containing S(A, a) there is some ε₁ > 0 such that for each A′, a′ with max{‖A′ − A‖, ‖a′ − a‖} < ε₁ we have

∅ ≠ S(A′, a′) ∩ Ψ ⊆ S(A, a) + λε′(1 − λε′)⁻¹(1 + μ)B.  (4.3)

Finally, if (A′, a′) are restricted to values for which S(A′, a′) is known to be connected (in particular, if A′ is restricted to be positive semidefinite), then Ψ can be replaced by Rⁿ.
Proof. (b ⟹ a) If (b) holds then in particular S(A, a′) is nonempty for all a′ in some ball about a. This means that 0 belongs to the interior of the range of the operator A(·) + a + ∂ψ_C(·), which is maximal monotone by [1, Corollary 2.7]. Accordingly, the inverse of this operator is locally bounded at 0 [1, Proposition 2.9], and so in particular S(A, a) is bounded.

(a ⟹ b) We apply Theorem 1, taking P to be the normed linear space of pairs (A′, a′) of n × n matrices and points of Rⁿ, with the distance from (A′, a′) to (A″, a″) given by max{‖A′ − A″‖, ‖a′ − a″‖}; we take p₀ := (A, a), T := ∂ψ_C, and f[(A′, a′), x] := A′x + a′. The set X₀ is then S(A, a); we let Ω be any open bounded set containing X₀, and since Lf_{x₀}(x) = Ax + a for any x₀, it is clear that the hypotheses are satisfied (note that Proposition 1 implies that (iii) holds). We
then find that for some δ > 0, ε₀ > 0 and all (A′, a′) with ε′ < ε₀, we have S(A′, a′) ∩ [S(A, a) + δB] nonempty, which proves (b).

Now choose Ψ; without loss of generality we can suppose that Ω was taken to be this Ψ. As Ψ is bounded, we can find ε₁ ∈ (0, ε₀] with λε₁ < 1 and such that for each x ∈ Ψ, ε₁(1 + ‖x‖) ≤ η, where η is the parameter appearing in Theorem 1. Now pick any (A′, a′) with ε′ < ε₁; by the above discussion S(A′, a′) ∩ Ψ is nonempty, and we take x′ to be any point of that intersection. We know that 0 ∈ A′x′ + a′ + ∂ψ_C(x′), which is equivalent to

x′ ∈ [A(·) + a + ∂ψ_C(·)]⁻¹[(A − A′)x′ + (a − a′)].

But since x′ ∈ Ψ,

‖(A − A′)x′ + (a − a′)‖ ≤ max{‖A − A′‖, ‖a − a′‖}(1 + ‖x′‖) ≤ ε₁(1 + ‖x′‖) ≤ η,

and so by upper Lipschitz continuity, d[x′, S(A, a)] ≤ λ‖(A − A′)x′ + (a − a′)‖. Now let x₀ be the closest point to x′ in S(A, a); then

‖(A − A′)x′ + (a − a′)‖ ≤ ‖(A − A′)x₀ + (a − a′)‖ + ‖(A − A′)(x′ − x₀)‖ ≤ ε′(1 + μ) + ε′‖x′ − x₀‖.

Accordingly, as ‖x′ − x₀‖ = d[x′, S(A, a)] we have

d[x′, S(A, a)] ≤ λε′(1 + μ) + λε′d[x′, S(A, a)],

yielding d[x′, S(A, a)] ≤ λε′(1 − λε′)⁻¹(1 + μ). Since x′ was arbitrary in S(A′, a′) ∩ Ψ, we have (4.3).

Finally, we observe that for all small ε′, S(A′, a′) ∩ Ψ is contained in S(A, a) + δB, which is contained in Ψ. If S(A′, a′) also met the complement of Ψ then it would be disconnected; thus if S(A′, a′) is connected it must lie entirely in Ψ, so that we may replace Ψ by Rⁿ in (4.3). In particular, if A′ is positive semidefinite then A′(·) + a′ + ∂ψ_C(·) is maximal monotone, so that S(A′, a′) is convex as the inverse image of 0 under this operator. This completes the proof.

One might wonder, since the boundedness of Ψ is used at only one place in the proof, whether a refinement of the technique would permit replacement of Ψ by Rⁿ in all cases. The following example shows that this cannot be done even for n = 1: take C = R₊, A = [0] and a = [1], so that the problem is

0 ∈ [0]x + [1] + ∂ψ_{R₊}(x),
whose solution set is S([0], [1]) = {0}. However, it is readily checked that for any ε > 0, S([−ε], [1]) = {0, ε⁻¹}; thus we cannot take Ψ = R in this case.

Theorem 2 provides, in particular, a complete stability theory for convex quadratic programming (including linear programming) and for linear complementarity problems with positive semidefinite matrices; this extends earlier work of Daniel [3] on strictly convex quadratic programming, and of the author [11] on linear programming. Stability results for more general nonlinear programming problems are developed in Part II of this paper.

It might be worth pointing out that the strong form of Theorem 2 (i.e., with A′ restricted to be positive semidefinite) can sometimes be shown to hold because of the form of the problem. For example, consider the quadratic programming problem

minimize
    ½⟨x, Qx⟩ + ⟨q, x⟩ + ⟨p, y⟩

subject to

    Bx + Dy ≤ d    (4.4)

(we could also have added equality constraints, constrained variables, etc., but have omitted these for simplicity). Here Q is m × m, B is r × m and D is r × s. The formulation of (4.4) as a generalized equation is (taking Q to be symmetric)

    0 ∈ [  Q    0   B^T ] [x]   [q]
        [  0    0   D^T ] [y] + [p] + ∂ψ_C(x, y, u),    (4.5)
        [ −B   −D    0  ] [u]   [d]

where C = R^m × R^s × R^r_+. The matrix shown in (4.5) is then the matrix A of Theorem 2; it is positive semidefinite if and only if Q is positive semidefinite (i.e., if and only if the problem (4.4) is convex). Now, if Q is actually positive definite, then for all small perturbations of the data of (4.4) (i.e., of Q, q, p, B, D, and d) the matrix in (4.5) will remain positive semidefinite and the strong form of Theorem 2 will hold. The point here is that the structure of the problem prevents the type of perturbation which could destroy the positive semidefiniteness of A. This comment, of course, applies in particular to all linear programming problems [11].
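The observation that the matrix in (4.5) inherits positive semidefiniteness from Q alone is easy to check numerically: the off-diagonal blocks (B^T, D^T) and (−B, −D) are skew-symmetric and cancel in the symmetric part. A minimal numpy sketch (the function names are ours, not Robinson's):

```python
import numpy as np

def ge_matrix(Q, B, D):
    """Assemble the matrix A of the generalized equation (4.5)
    from the data of the quadratic program (4.4)."""
    m = Q.shape[0]
    r, s = D.shape
    Z1 = np.zeros((m, s)); Z2 = np.zeros((s, s)); Z3 = np.zeros((r, r))
    return np.block([[Q,    Z1,  B.T],
                     [Z1.T, Z2,  D.T],
                     [-B,   -D,  Z3]])

def is_psd(A, tol=1e-10):
    """x^T A x >= 0 for all x iff the symmetric part of A is PSD;
    the skew blocks of (4.5) drop out of the symmetric part."""
    S = (A + A.T) / 2
    return bool(np.linalg.eigvalsh(S).min() >= -tol)
```

With any B and D, `is_psd(ge_matrix(Q, B, D))` agrees with `is_psd` applied to Q itself, which is exactly the claim in the text.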
References

[1] H. Brézis, Opérateurs maximaux monotones (North-Holland, Amsterdam, 1973).
[2] R.W. Cottle, "Nonlinear programs with positively bounded Jacobians", SIAM Journal on Applied Mathematics 14 (1966) 147-158.
[3] J.W. Daniel, "Stability of the solution of definite quadratic programs", Mathematical Programming 5 (1973) 41-53.
[4] G.B. Dantzig and R.W. Cottle, "Positive (semi-) definite programming", in: J. Abadie, ed., Nonlinear programming (North-Holland, Amsterdam, 1968) 55-73.
[5] S. Kakutani, "A generalization of Brouwer's fixed point theorem", Duke Mathematical Journal 8 (1941) 457-459.
[6] O.L. Mangasarian, Nonlinear programming (McGraw-Hill, New York, 1969).
[7] J.J. Moré, "Coercivity conditions in nonlinear complementarity problems", SIAM Review 16 (1974) 1-16.
[8] J.J. Moré, "Classes of functions and feasibility conditions in nonlinear complementarity problems", Mathematical Programming 6 (1974) 327-338.
[9] H. Nikaido, Convex structures and economic theory (Academic Press, New York and London, 1968).
[10] S.M. Robinson, "An implicit-function theorem for generalized variational inequalities", Technical Summary Report No. 1672, Mathematics Research Center, University of Wisconsin-Madison, 1976; available from National Technical Information Service under Accession No. AD A031952.
[11] S.M. Robinson, "A characterization of stability in linear programming", Operations Research 25 (1977) 435-447.
[12] R.T. Rockafellar, "Local boundedness of nonlinear, monotone operators", Michigan Mathematical Journal 16 (1969) 397-407.
[13] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
[14] R.T. Rockafellar, "Monotone operators and the proximal point algorithm", SIAM Journal on Control and Optimization 14 (1976) 877-898.
[15] H. Scarf, The computation of economic equilibria (Yale University Press, New Haven and London, 1973).
Mathematical Programming Study 10 (1979) 142-157. North-Holland Publishing Company
THE FIXED POINT APPROACH TO NONLINEAR PROGRAMMING*

R. SAIGAL

Northwestern University, Evanston, IL 60201, USA
Received 21 December 1977 Revised manuscript received 7 July 1978 In this paper we consider the application of the recent algorithms that compute fixed points in unbounded regions to the nonlinear programming problem. It is shown that these algorithms solve the inequality constrained problem with functions that are not necessarily differentiable. The application to convex and piecewise linear problems is also discussed. Key words: Nonlinear Programming, Fixed Points, Piecewise Linear, Nonconvex Programming
1. Introduction

In this paper we consider the problem

    min g_0(x),    (1.1)
    g_i(x) ≤ 0,  i = 1, …, m,    (1.2)

where g_0 and the g_i are arbitrary functions on a set X ⊂ R^n, the n-dimensional Euclidean space. The set X may be discrete, but for simplification we assume that the convex hull of X is R^n, and that there exists a subdivision of R^n with vertices in X. One such example of X is the grid of integers, for which efficient triangulation procedures exist; see Todd [17].

Our approach is to consider piecewise linear approximations g'_i, i = 0, …, m, instead, and then to solve the continuous problem by the fixed point algorithms. We thus obtain an approximate solution to (1.1-2). Such an approach has been successfully used in [9]. We note that the g'_i are not differentiable. Since they are piecewise linear, a notion of a generalized subdifferential can be readily defined. This is the same as the generalized gradient of Clarke [1] and, for convex g_i, the same as the subdifferential of convex functions, Rockafellar [10]. Since the fixed point algorithms of Eaves and Saigal [4] and Merrill [7] can successfully find fixed points of certain point-to-set mappings, we formulate this problem as such a point-to-set mapping problem, which can then be solved by these algorithms.

* This research was partially supported by Grant No. MCS 77-03472 from the National Science Foundation.

Hansen [5], Hansen and Scarf [6], and Eaves [3] had recognized the potential
of applying these methods to nonlinear programming, but the full potential was explored by Merrill [7]. In Section 3, we present extensions of several of his results for the convex case. Traditional descent type methods for solving this problem are summarized in Mifflin [8]. Also, in [8], a steepest descent type algorithm, using the mapping of [7], is presented. In Section 2, we present a brief overview of the fixed point algorithms; in Section 3 these algorithms are applied to the constrained and unconstrained convex problems; in Section 4 we introduce piecewise linear mappings and establish the necessary and sufficient conditions for local minimization of the constrained and the unconstrained problems; in Section 5 the application of the algorithms is discussed; and in Section 6 we discuss the computational aspects. Finally, in the appendix, we present the computational experience of solving some fairly large nondifferentiable problems.
2. The algorithms

We now give a brief description of the algorithm of Eaves and Saigal [4] implemented on the subdivision J_3 of R^n × (0, D]. We will assume that the nonlinear programming problem is being solved by this algorithm. The triangulation J_3 of R^n × (0, D] has vertices in R^n × {D·2^{-k}} for k = 0, 1, 2, …. Also, v = (v_1, …, v_{n+1}) is a vertex if v_{n+1} = D·2^{-k} for some integer k and v_i/v_{n+1} is an integer for each i. In case, for a vertex v, each v_i/v_{n+1} is an odd integer, v is called a central vertex. Any simplex in J_3 then has a unique representation by a triplet (v, π, s) where v is a central vertex, π is a permutation of {1, …, n + 1} and s is an n-vector with s_i ∈ {−1, +1}. A complete description of J_3 can be found in Todd [17].

Given a point-to-set mapping l from R^n into nonempty subsets of R^n, and a 1-1 affine mapping r from R^n into R^n, we say an n-simplex σ = (v^1, …, v^{n+1}) is
(a) r-complete if 0 ∈ hull{r(σ)},
(b) l ∪ r-complete if 0 ∈ hull{r(σ) ∪ l(σ)},
(c) l-complete if 0 ∈ hull{l(σ)}.

These algorithms, starting with a unique r-complete simplex σ_0 containing the unique zero of r, generate a sequence of l ∪ r-complete simplexes σ_0, σ_1, σ_2, …, σ_k, …. In case these simplexes lie in a bounded region, it can be readily shown that if they are from J_3, there is a subsequence σ_{m_1}, σ_{m_2}, …, σ_{m_k}, … of l-complete simplexes such that σ_{m_k} ⊂ R^n × {D·2^{-k}}, and so the diameter of σ_{m_k} approaches zero as k approaches ∞. In case the simplexes lie in a bounded region, we will say that the algorithm has succeeded. The justification for this is the following.

Theorem 2.1. Let σ_1, σ_2, …, σ_k, … be a sequence of l-complete simplexes which lie in a bounded set B, and let the diameter ε_k of σ_k approach 0 as k approaches ∞. In addition, let l be an upper semi-continuous point-to-set mapping and, for each x, let l(x) be a nonempty, compact and convex subset of R^n. Then, if x̄ is a cluster point of {x_k} with x_k ∈ σ_k, then 0 ∈ l(x̄).

Proof. Since the σ_k lie in a bounded set B, under the hypothesis on l, l(B̄) is compact, hence l(B) is bounded (B̄ is the closure of B). Now, as σ_k = (v^1_k, …, v^{n+1}_k) is l-complete, there exist y*_{i,k} ∈ l(v^i_k) and λ_{i,k} ≥ 0, Σ_i λ_{i,k} = 1, such that Σ_i λ_{i,k} y*_{i,k} = 0. Since 0 ≤ λ_{i,k} ≤ 1 and the y*_{i,k} lie in the bounded set l(B), on some common subsequence λ_{i,k} → λ_i for all i and y*_{i,k} → y*_i for all i. Thus

    Σ_i λ_i y*_i = 0,  Σ_i λ_i = 1,  λ_i ≥ 0.
Also, as dia(σ_k) approaches 0, on some common subsequence v^i_k → x̄ for all i. Since y*_{i,k} ∈ l(v^i_k), using the upper semi-continuity of l we have y*_i ∈ l(x̄), and since l(x̄) is convex, we have our result.

Starting with some initial simplex σ_0 of size ε_0, if the simplexes generated by the algorithm do not lie in a bounded region, we will say that the algorithm has failed. In this case, for each bounded set D containing x_0, there exists an l ∪ r-complete simplex σ = (v^1, …, v^{n+1}) of size ε ≤ ε_0 such that σ ⊄ D and, for some x ∈ σ, ⟨x − x_0, v^i − x_0⟩ > 0 for all i. We will also say that the algorithm has generated l ∪ r-complete simplexes sufficiently far from x_0 to express the above relation, and that the algorithm has generated σ sufficiently far from a set D if for some x ∈ σ, ⟨x − x_0, v^i − x_1⟩ > 0 for every x_0, x_1 in D.

In subsequent sections, where we consider applications to nonlinear programming, we will frequently choose r(x) = x − x_0, and define a point-to-set mapping l such that if, for some x, 0 ∈ l(x), then x is a solution to our problem.

3. Convex case
In this section, we will consider the application of fixed point algorithms for solving (1.1-2) when the underlying functions are convex, not necessarily differentiable. We will make the simplifying assumption that the functions are defined over all of R^n, and that they are finite.

3.1. Unconstrained case

We now consider the problem of minimizing g_0 when the set {x: g_0(x) ≤ g_0(x_0)} is bounded for some x_0, and the function g_0 is convex. The subdifferential set ∂g_0(x) of a convex function g_0 at x is the set of all vectors x* in R^n such that

    g_0(y) ≥ g_0(x) + ⟨x*, y − x⟩ for all y in R^n,    (3.1)

and under our assumption this set is nonempty, closed, bounded and convex. Also, as a point-to-set mapping, it is upper semi-continuous [10]. A trivial consequence of (3.1) is the following theorem.
Theorem 3.1. Under the above conditions on g_0, x̄ solves (1.1) if and only if 0 ∈ ∂g_0(x̄).

We now show that the algorithms of Section 2, implemented with

    r(x) = x − x_0,  l(x) = ∂g_0(x)

for an arbitrary starting point x_0 and initial grid size ε_0, will converge to a solution of (1.1). Let B(x, δ) = {y: ‖y − x‖ < δ} and M(x, ε) = sup{g_0(y): y ∈ B(x, ε)}.
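The completeness tests of Section 2, deciding whether 0 lies in the convex hull of the n + 1 label vectors attached to a simplex, reduce for a single simplex to a small linear solve. A minimal numpy sketch (function and variable names are ours, not the paper's; the algorithms themselves track completeness by pivoting rather than by refactoring a system at every step):

```python
import numpy as np

def l_complete(Y, tol=1e-12):
    """Check whether 0 is in hull{y*_1, ..., y*_{n+1}} for the n+1
    label vectors y*_i (the rows of Y, each in R^n).

    For labels in general position the barycentric weights solve the
    square system  Y^T lam = 0,  sum(lam) = 1;  the simplex is
    l-complete exactly when all weights are nonnegative.
    """
    n1, n = Y.shape
    assert n1 == n + 1, "a simplex in R^n carries n + 1 labels"
    A = np.vstack([Y.T, np.ones(n1)])   # n equations Y^T lam = 0, plus sum = 1
    b = np.append(np.zeros(n), 1.0)
    lam = np.linalg.solve(A, b)         # unique when labels are affinely independent
    return bool((lam >= -tol).all()), lam
```

For example, the labels y*_1 = (1, 0), y*_2 = (−1, 1), y*_3 = (0, −1) give weights λ = (1/3, 1/3, 1/3), so that simplex is l-complete, while labels lying in a common open half-space give some negative weight.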
Theorem 3.2. Let l and r be as defined above, and let g_0 satisfy the above conditions. Then, for each ε_0 > 0, starting with the unique r-complete simplex containing x_0, the fixed point algorithms will generate a sequence whose cluster points x̄ satisfy 0 ∈ l(x̄).

Proof. Assume that the algorithm fails and thus does not generate simplexes in a bounded region. Now, define D = {x: g_0(x) ≤ M(x_0, ε_0)}. Since D is bounded, we can find an l ∪ r-complete simplex σ = (v^1, …, v^{n+1}) of size ε ≤ ε_0 sufficiently far from x_0. Hence, for some x ∈ σ, ⟨v^i − x_0, x − x_0⟩ > 0 and v^i ∉ D for each i = 1, …, n + 1. Now, as g_0 is convex, by (3.1), for every y* ∈ ∂g_0(v^i) and every i,

    g_0(v^i − x + x_0) ≥ g_0(v^i) + ⟨x_0 − x, y*⟩.

Since v^i − x + x_0 ∈ D, we have ⟨x_0 − x, y*⟩ < 0 for all y* ∈ ∂g_0(v^i) and every i. Hence, using Farkas' lemma, we claim that σ cannot be l ∪ r-complete, and we have a contradiction. The result now follows from Theorem 2.1.

Now, let σ = (v^1, …, v^{n+1}) be an l-complete simplex of diameter ε > 0. Then there exist y*_i ∈ ∂g_0(v^i) and λ_i ≥ 0, i = 1, …, n + 1, Σλ_i = 1, such that Σλ_i y*_i = 0. We can then prove, as done by Merrill [7], that:
Theorem 3.3. Let σ be as above and let x̄ be a solution to (1.1). Then there is an x ∈ σ such that

    g_0(x̄) ≥ g_0(x) − Σ_i λ_i⟨v^i, y*_i⟩,    (3.2)

and M(x̄, ε) ≥ g_0(x).

Proof. Let x = Σ_i λ_i v^i. From (3.1),

    g_0(x̄) ≥ g_0(v^i) + ⟨x̄ − v^i, y*_i⟩.    (3.3)
Hence, multiplying (3.3) by λ_i, adding, and using Σλ_i y*_i = 0 and the convexity of g_0,

    g_0(x̄) ≥ Σ_i λ_i g_0(v^i) − Σ_i λ_i⟨v^i, y*_i⟩ ≥ g_0(x) − Σ_i λ_i⟨v^i, y*_i⟩,

which is (3.2). Similarly, replacing x̄ by v^i − x + x̄ in (3.1) and adding,

    Σ_i λ_i g_0(v^i − x + x̄) ≥ Σ_i λ_i g_0(v^i) ≥ g_0(x).

But as v^i − x + x̄ ∈ B(x̄, ε), we have our result.

Note that in Theorem 3.3, (3.2) gives a computable lower bound on the minimum value of g_0(x) and can thus be used as a stopping rule. Also, (3.3) shows that the algorithm is converging to a minimum.

3.2. Constrained case

We now consider the problem (1.1-2) when the functions g_i are convex. Define s(x) = max{g_i(x): 1 ≤ i ≤ m}, and note that s is also a convex function. We now assume that the set {x: s(x) ≤ s(x_0)} is bounded for some x_0 and that, if {x: s(x) ≤ 0} is non-empty, then s(x) < 0 for some x. Now, define the mapping
    l(x) = ∂g_0(x)                  if s(x) < 0,
           hull{∂g_0(x) ∪ ∂s(x)}    if s(x) = 0,    (3.4)
           ∂s(x)                    if s(x) > 0.
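When g_0 and the g_i are available as pointwise maxima of affine pieces, the mapping (3.4) can be realized directly: the subdifferential of such a function at x is the hull of the gradients of the pieces active there. A sketch under that representation (the (A, b) encoding and names are our assumptions, not from the paper):

```python
import numpy as np

def subdiff(A, b, x, tol=1e-9):
    """Generators of the subdifferential at x of the convex PL function
    max_k (<A[k], x> + b[k]): the rows of A whose piece attains the max."""
    vals = A @ x + b
    return A[vals >= vals.max() - tol]

def l_map(g0, gs, x, tol=1e-9):
    """Generators of l(x) as defined in (3.4), with s(x) = max_i g_i(x);
    g0 and every g in gs is an (A, b) pair of affine pieces."""
    s_val = max((A @ x + b).max() for A, b in gs)
    ds = np.vstack([subdiff(A, b, x, tol) for A, b in gs
                    if (A @ x + b).max() >= s_val - tol])  # generators of ds(x)
    if s_val < -tol:
        return subdiff(*g0, x, tol)                        # dg_0(x)
    if s_val > tol:
        return ds                                          # ds(x)
    return np.vstack([subdiff(*g0, x, tol), ds])           # hull of the union
```

For instance, with g_0(x) = |x| and the single constraint g_1(x) = x − 1, the map returns the generators of ∂s at an infeasible point, of ∂g_0 at an interior point, and of the combined hull on the boundary s(x) = 0.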
Theorem 3.4. Let x̄ be such that 0 ∈ l(x̄). Then x̄ solves (1.1-2) or indicates that (1.2) has no solution.

Proof. There are three cases.

Case (i): s(x̄) > 0. In this case 0 ∈ ∂s(x̄), and thus x̄ is a global minimizer of s; hence the constraint set (1.2) is empty.

Case (ii): s(x̄) < 0. In this case 0 ∈ ∂g_0(x̄) and hence x̄ is a global minimizer of g_0. Since it also satisfies (1.2), x̄ solves (1.1-2).

Case (iii): s(x̄) = 0. In this case, there are z* ∈ ∂g_0(x̄), y* ∈ ∂s(x̄) and ρ > 0 such that ρz* + y* = 0; here ρ > 0, since the contrary implies that 0 ∈ ∂s(x̄), so that s(x) < 0 would be impossible. Let I(x̄) = {i: g_i(x̄) = 0}. Then

    ∂s(x̄) = hull{∪_{i ∈ I(x̄)} ∂g_i(x̄)},

and so there are numbers λ_i ≥ 0, i ∈ I(x̄), Σλ_i = 1, and y*_i ∈ ∂g_i(x̄) such that y* = Σλ_i y*_i. Hence ρz* + Σλ_i y*_i = 0. Now, let y satisfy (1.2). Then, from (3.1),

    g_0(y) ≥ g_0(x̄) + ⟨z*, y − x̄⟩,
    g_i(y) ≥ g_i(x̄) + ⟨y*_i, y − x̄⟩,  i ∈ I(x̄).

Hence, as g_i(y) ≤ 0 and g_i(x̄) = 0 for i ∈ I(x̄),

    ρg_0(y) ≥ ρg_0(y) + Σλ_i g_i(y) ≥ ρg_0(x̄) + ⟨ρz* + Σλ_i y*_i, y − x̄⟩ = ρg_0(x̄),

and hence x̄ solves (1.1-2).

We now show that the algorithm initiated with r(x) = x − x_0 for arbitrary x_0 and l(x) as in (3.4) will compute an x̄ such that 0 ∈ l(x̄).
Theorem 3.5. Let s satisfy the above assumptions, and let the algorithm implement l and r as above. Then, for arbitrary ε_0 > 0, starting with a unique r-complete simplex of size ε_0, the algorithm will generate a sequence whose cluster points x̄ satisfy 0 ∈ l(x̄), and thus solve (1.1-2).

Proof. Let

    M(x_0, ε) = max{sup{s(x): x ∈ B(x_0, ε)}, 0} and D = {x: s(x) ≤ M(x_0, ε)}.

By assumption, D is bounded. Now, assume that the algorithm fails. Hence, it generates a simplex σ = (v^1, …, v^{n+1}) of diameter ε > 0 such that σ is l ∪ r-complete and sufficiently far from D; i.e., σ ⊄ D and, for every x ∈ σ, ⟨v^i − x_0, x − x_0⟩ > 0. Also s(v^i) > 0 for all i. Now, consider the point v^i − x + x_0 ∈ D. Then

    s(v^i − x + x_0) ≥ s(v^i) + ⟨x_0 − x, y*⟩ for all y* ∈ ∂s(v^i).

Since v^i ∉ D, we have ⟨x − x_0, y*⟩ > 0 for all y* ∈ ∂s(v^i) and all i; and, from Farkas' lemma, σ cannot be l ∪ r-complete, a contradiction. Hence, as l is upper semi-continuous and the sets l(x) are non-empty compact convex subsets of R^n, from Theorem 2.1 we have our result.

We now assume that there is no solution to (1.2). Hence s(x) > 0 for all x, and thus (3.4) reduces to l(x) = ∂s(x). A consequence of Theorem 3.3 is the following. (This result also appears in Merrill [7].)
Theorem 3.6. Let s(x) > 0 for all x, and let {x: s(x) ≤ s(x_0)} be bounded for some x_0. Then the algorithm will detect the infeasibility of (1.2) in a finite number of iterations.
Proof. For each ε > 0, the algorithms compute an l ∪ r-complete simplex in a finite number of iterations. Also, since s(x) > 0 for all x, the algorithm will attempt to minimize s(x). Now, let σ = (v^1, …, v^{n+1}) be an l-complete simplex of size ε > 0 found by the algorithm. Then there are y*_i ∈ ∂s(v^i) such that Σλ_i y*_i = 0, Σλ_i = 1, λ_i ≥ 0 has a solution. Also, from Theorem 3.3, if x̄ minimizes s and x = Σλ_i v^i,

    s(x̄) ≥ s(x) − Σ_i λ_i⟨v^i, y*_i⟩ = s(x) − Σ_i λ_i⟨v^i − x, y*_i⟩.

Now, define D = {x: s(x) ≤ M(x̄, ε_0)}, where M(x̄, ε_0) = sup{s(x): x ∈ B(x̄, ε_0)}. From Theorem 3.3, x ∈ D and σ ⊂ B(D, ε). Define N ≥ ‖y*‖ for all y* ∈ ∂s(x), x ∈ B(D, ε_0). Then

    |Σ_i λ_i⟨v^i − x, y*_i⟩| ≤ Nε;

so as ε → 0, since s(x) > 0, for some sufficiently small ε > 0 we have s(x) − Σ_i λ_i⟨v^i, y*_i⟩ > 0. The computed lower bound on the minimum value of s is then positive, and hence we are done.
In addition, we can obtain a lower bound on the optimal value of the objective function in this case as well. Let σ = (v^1, …, v^r, …, v^{n+1}) be l-complete, let v^1, …, v^r be labeled by y*_i ∈ ∂g_0(v^i) and let v^{r+1}, …, v^{n+1} be labeled by y*_i ∈ ∂s(v^i). Hence s(v^i) < 0 for i = 1, …, r and s(v^i) ≥ 0 for i = r + 1, …, n + 1. Also, let

    Σλ_i y*_i = 0,  Σλ_i = 1,  λ_i ≥ 0,  θ = Σ_{i=1}^{r} λ_i  and  x̄ = (1/θ) Σ_{i=1}^{r} λ_i v^i.

Then we can prove:

Theorem 3.7. Let v^i, y*_i, θ, x̄ be as above, and let x̂ solve (1.1-2). Then

    g_0(x̂) ≥ g_0(x̄) − (1/θ) Σ_{i=1}^{n+1} λ_i⟨v^i, y*_i⟩.

Proof. Using (3.1) we get, for i = 1, …, r,

    g_0(x̂) ≥ g_0(v^i) + ⟨x̂ − v^i, y*_i⟩,

and for i = r + 1, …, n + 1 we get

    s(x̂) ≥ s(v^i) + ⟨x̂ − v^i, y*_i⟩.

Hence, multiplying these inequalities by λ_i and adding, and using s(x̂) ≤ 0, s(v^i) ≥ 0 for i > r, Σ_i λ_i y*_i = 0 and the convexity of g_0,

    θ g_0(x̂) ≥ θ g_0(x̂) + Σ_{i=r+1}^{n+1} λ_i s(x̂)
             ≥ Σ_{i=1}^{r} λ_i g_0(v^i) + Σ_{i=r+1}^{n+1} λ_i s(v^i) − Σ_{i=1}^{n+1} λ_i⟨v^i, y*_i⟩
             ≥ θ g_0(x̄) − Σ_{i=1}^{n+1} λ_i⟨v^i, y*_i⟩,

and we have our result.
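The bound of Theorem 3.7 is directly computable from the data carried by an l-complete simplex. A numpy sketch (the array layout and names are our assumptions):

```python
import numpy as np

def tm37_lower_bound(V, Y, lam, r, g0):
    """The Theorem 3.7 lower bound on the optimal value of (1.1-2).

    V   : (n+1, n) vertices v^i of an l-complete simplex
    Y   : (n+1, n) labels y*_i (in dg_0(v^i) for i <= r, in ds(v^i) after)
    lam : (n+1,) weights with lam >= 0, sum(lam) = 1, sum_i lam_i y*_i = 0
    r   : number of vertices labeled through g_0
    g0  : the objective function
    """
    theta = lam[:r].sum()                                 # theta = sum_{i<=r} lam_i
    x_bar = (lam[:r, None] * V[:r]).sum(axis=0) / theta   # x_bar = (1/theta) sum lam_i v^i
    corr = (lam * (V * Y).sum(axis=1)).sum() / theta      # (1/theta) sum lam_i <v^i, y*_i>
    return g0(x_bar) - corr
```

As a one-dimensional illustration, for min |x| subject to x − 1 ≤ 0 (optimal value 0), vertices −0.1 and 0.1 labeled by the g_0-subgradients −1 and +1 with λ = (½, ½) give the valid lower bound −0.1. (In this tiny example all vertices happen to be g_0-labeled, so the constraint never enters.)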
4. Piecewise linear functions and nonlinear programming

In this section we establish the notation and prove some basic results for nonlinear programs with piecewise linear functions.

Cells and manifolds

A cell is the convex hull of a finite number of points and half lines (half lines are sets of the type {x: x = a + tb, t ≥ 0}, where a and b are fixed vectors in R^n). The dimension of a cell is the maximum number of linearly independent points in the cell. We will call an n-dimensional cell an n-cell. Let τ be a subset of an n-cell σ. If x, y ∈ σ, 0 < λ < 1 and (1 − λ)x + λy ∈ τ imply that x, y are in τ, then τ is called a face of the cell σ. A simple fact is that faces are cells. Also, faces that are (n − 1)-cells are called facets of the cell, and those that are 0-cells are called vertices of the cell.

Let ℳ ≠ ∅ be a collection of n-cells in R^n, and let M = ∪_{σ ∈ ℳ} σ. We call (M, ℳ) a subdivided n-manifold if:
(4.1) any two n-cells of ℳ that meet, do so on a common face;
(4.2) each (n − 1)-face of a cell lies in at most two n-cells;
(4.3) each x in M lies in a finite number of n-cells in ℳ.
If (M, ℳ) is a subdivided n-manifold for some ℳ, we call M an n-manifold.

Piecewise linear functions

Let M be an n-manifold; then the function g: M → R is called piecewise linear on a subdivision ℳ of M if:
(4.4) g is continuous;
(4.5) given a cell σ in ℳ, there exists an affine function g_σ: R^n → R such that g|_σ(x) = g_σ(x) (i.e., g restricted to σ is g_σ).
Generalized subdifferentials

Let M be an n-manifold, and let ℳ be its subdivision. Let g: M → R be a piecewise linear function. Then, for each x ∈ M, we define a generalized subdifferential set ∂g(x) as follows. From (4.3), x lies in a finite number of n-cells, say σ_1, σ_2, …, σ_r in ℳ. Let ∇g_{σ_i} = a_i (where ∇f is the gradient vector of f). Then we define

    ∂g(x) = hull{a_1, …, a_r},

and we note that if g, in addition, is convex, then ∂g(x) is the subdifferential of g at x, Rockafellar [10]; and, as g is locally Lipschitz continuous, ∂g(x) is the generalized gradient of Clarke [1]. In that case the theorem below is known, [2], but we will use the piecewise linearity of g to establish it.

Theorem 4.1. If x̄ is a local minimum of g_0, then 0 ∈ ∂g_0(x̄).

Proof. Assume x̄ is a local minimum but 0 ∉ ∂g_0(x̄). Now, let x̄ ∈ σ_1 ∩ σ_2 ∩ ⋯ ∩ σ_r. Then ∂g_0(x̄) = hull{a_1, …, a_r}. Hence, from Farkas' lemma, there is a z ≠ 0 such that ⟨z, a_i⟩ < 0 for i = 1, …, r. Let ε be sufficiently small so that B(x̄, ε) ⊂ ∪_j σ_j. Then x̄ + θz ∈ B(x̄, ε) for sufficiently small θ > 0. Assume x̄ + θz ∈ σ_j for some j. Hence

    g_0(x̄ + θz) = ⟨a_j, x̄⟩ + θ⟨a_j, z⟩ − γ_j = g_0(x̄) + θ⟨a_j, z⟩ < g_0(x̄),

and we have a contradiction to the fact that x̄ is a local minimum.

Given a point-to-set mapping F from R^n to nonempty subsets of R^n, we say F is weakly monotone at x̄ with respect to x̄* ∈ F(x̄) on Γ if there is an ε > 0 such that for all x ∈ B(x̄, ε) ∩ Γ,

    ⟨x − x̄, y* − x̄*⟩ ≥ 0 for all y* in F(x).
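A one-dimensional example shows why the condition of Theorem 4.1 alone cannot characterize local minima, and why the weak monotonicity just defined is needed: for g(x) = −|x|, the two incident cells (−∞, 0] and [0, ∞) carry gradients +1 and −1, so 0 ∈ ∂g(0) = hull{+1, −1} even though 0 is a local maximum. A small illustrative sketch (names are ours):

```python
import numpy as np

# g(x) = -|x| is piecewise linear on the subdivision {(-inf, 0], [0, inf)}:
# g equals x on the first cell and -x on the second.
cell_gradients_at_zero = np.array([[1.0], [-1.0]])   # gradients on the two cells

# dg(0) = hull{+1, -1} = [-1, 1]; in one dimension the hull-membership
# test for 0 is just a sign check on the generators.
zero_in_subdiff = (cell_gradients_at_zero.min() <= 0.0
                   <= cell_gradients_at_zero.max())

# Yet x = 0 maximizes g: the necessary condition of Theorem 4.1 holds at
# a local maximum too, which is why the characterization below must add
# the weak monotonicity of dg_0.
g = lambda x: -abs(x)
is_local_min = all(g(0.0) <= g(t) for t in (-0.01, 0.01))
```

Here `zero_in_subdiff` is true while `is_local_min` is false, separating the two conditions.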
We can then prove:

Theorem 4.2. x̄ is a local minimum of g_0 if and only if 0 ∈ ∂g_0(x̄) and ∂g_0 is weakly monotone at x̄ with respect to 0 on R^n.

Proof. Let x̄ lie in the cells σ_1, σ_2, …, σ_r. Then ∂g_0(x̄) = hull{a_1, …, a_r}. For some sufficiently small ε > 0, let B(x̄, ε) ⊂ ∪_j σ_j. To see the 'if' part, let 0 ∈ ∂g_0(x̄) and let ∂g_0 be weakly monotone with respect to zero at x̄. Hence, for some ε > 0, for all x ∈ B(x̄, ε) we have

    ⟨x − x̄, a_i⟩ ≥ 0, where a_i ∈ ∂g_0(x) ⊂ ∂g_0(x̄).
Hence,

    g_0(x) − g_0(x̄) = ⟨a_i, x⟩ − γ_i − ⟨a_i, x̄⟩ + γ_i = ⟨x − x̄, a_i⟩ ≥ 0,

and so x̄ is a local minimum of g_0. To see the 'only if' part, let 0 ∈ ∂g_0(x̄) and ∂g_0 not be weakly monotone with respect to 0. Then, for a sufficiently small ε > 0 such that B(x̄, ε) ⊂ ∪_j σ_j, there is an x ∈ B(x̄, ε) and an a_i ∈ ∂g_0(x) such that ⟨x − x̄, a_i⟩ < 0. Since ∂g_0(x) ⊂ ∂g_0(x̄), we have

    g_0(x) − g_0(x̄) = ⟨a_i, x⟩ − γ_i − ⟨a_i, x̄⟩ + γ_i = ⟨x − x̄, a_i⟩ < 0,

which is a contradiction.

The constrained problem
Let g_i: R^n → R be piecewise linear functions on subdivided manifolds (R^n, ℳ_i), respectively, for each i = 0, …, m. We now consider the constrained minimization problem (1.1-2). For a generic point x in R^n we define σ^i_1, σ^i_2, …, σ^i_{r_i} as the n-cells of ℳ_i in which x lies, and g_i|_{σ^i_j}(y) = ⟨a^i_j, y⟩ − γ^i_j for each i = 0, …, m. Also note that, by definition, r_i is finite, and that there exists ε > 0 such that B(x, ε) ⊂ ∪_j σ^i_j for each i. We are now ready to establish the necessary conditions for x̄ to be a local minimum of (1.1-2). This theorem also appears in Clarke [2].

Theorem 4.3. Let x̄ be a local minimum of g_0 over all x satisfying (1.2). Then:
(i) there exist λ_i ≥ 0 such that λ_i g_i(x̄) = 0, i = 1, …, m;
(ii) there exist y* ∈ ∂g_0(x̄) and z*_i ∈ ∂g_i(x̄) such that

    0 = y* + Σ_{i=1}^{m} λ_i z*_i.
Proof. Let x̄ be a local minimum, I(x̄) = {i: g_i(x̄) = 0},

    C = ∪_{i ∈ I(x̄)} ∂g_i(x̄),

and

    cone(C) = {y: y = Σ_i λ_i y*_i, y*_i ∈ C, λ_i ≥ 0}.

Also, let 0 ∉ ∂g_0(x̄) + cone(C). (It can readily be confirmed that (i) and (ii) hold if and only if 0 ∈ ∂g_0(x̄) + cone(C).) Then, from Farkas' lemma, since both ∂g_0(x̄) and C are convex combinations of a finite number of vectors, there exists a z such that

    ⟨z, y*⟩ < 0 for all y* ∈ ∂g_0(x̄),
    ⟨z, y*⟩ ≤ 0 for all y* ∈ C.

Now, consider x = x̄ + θz for θ > 0 sufficiently small that g_i(x) < 0 for i ∉ I(x̄) and x ∈ B(x̄, ε). Hence, for some a^0 ∈ ∂g_0(x̄) with a^0 ∈ ∂g_0(x),

    g_0(x) − g_0(x̄) = ⟨a^0, x̄⟩ + θ⟨a^0, z⟩ − γ^0 − ⟨a^0, x̄⟩ + γ^0 = θ⟨a^0, z⟩ < 0.

Also, for i ∈ I(x̄), there is an a^i ∈ ∂g_i(x̄) such that a^i ∈ ∂g_i(x). Hence

    g_i(x) − g_i(x̄) = ⟨a^i, x⟩ − γ^i − ⟨a^i, x̄⟩ + γ^i = θ⟨z, a^i⟩ ≤ 0.

Since g_i(x̄) = 0, the point x is feasible and g_0(x) < g_0(x̄), a contradiction to the fact that x̄ is a local minimum.

We now prove a sufficiency condition.
Theorem 4.4. Let x̄ be a point such that:
(i) there exist λ_i ≥ 0 for which λ_i g_i(x̄) = 0, i = 1, …, m;
(ii) defining the map

    F(x) = ∂g_0(x) + Σ_{i=1}^{m} λ_i ∂g_i(x),

we have 0 ∈ F(x̄);
(iii) F is weakly monotone at x̄ with respect to 0 on the set Γ = {x: g_i(x) ≤ 0, i = 1, …, m}.
Then x̄ is a local minimum of g_0 on Γ.

Proof. Let x ∈ B(x̄, ε) ∩ Γ, with ε sufficiently small so that B(x̄, ε) ⊂ ∪_j σ^i_j for each i = 0, …, m. Then, as λ_i g_i(x) ≤ 0 and λ_i g_i(x̄) = 0,

    g_0(x) − g_0(x̄) ≥ g_0(x) + Σ_{i=1}^{m} λ_i g_i(x) − g_0(x̄) − Σ_{i=1}^{m} λ_i g_i(x̄) = ⟨x − x̄, a^0 + Σ_i λ_i a^i⟩ ≥ 0,

since a^0 + Σ_i λ_i a^i ∈ F(x) and F is weakly monotone with respect to 0 at x̄, and so x̄ is a local minimum.
5. The fixed point approach to PL nonlinear programming

We will consider the application of the fixed point algorithm of [4] to the case where g_0 is piecewise linear on some subdivision of R^n, and the g_i are convex
functions. In this case, the mapping l of (3.4) is applicable. As is evident from Theorem 3.5, in this case we will prove that the algorithms will find a "stationary point" x̄ such that 0 ∈ l(x̄). In certain special cases the progression of the algorithm will indicate whether x̄ is a local minimum, Saari and Saigal [11]; for the general case considered here, x̄ may be a local maximum or a saddle point of the function g_0. That the algorithm will compute a stationary point can be established in a manner similar to the proof of Theorem 3.5. Since s is convex, the part of Theorem 3.4 pertaining to the nonexistence of a solution to (1.2) also carries through.

The convergence of the algorithms can also be proved under the relaxed hypothesis that s be convex outside some bounded region; i.e., if D is a bounded set containing x_0, then for each x ∉ D and y* ∈ ∂s(x),

    s(z) ≥ s(x) + ⟨y*, z − x⟩ for all z ∉ D.
Then, starting with r(x) = x − x_0 and l as defined in (3.4), we can prove:

Theorem 5.1. Let s be convex outside a bounded region D, and let r and l be as above. Then, for any ε_0 > 0, starting with the unique r-complete simplex containing x_0, the algorithm will generate a sequence whose cluster points x̄ satisfy 0 ∈ l(x̄).
Proof. Let N = max{0, sup{s(x): x ∈ B(x_1, ε_0) ∪ D}} for an arbitrary x_1 such that B(x_1, ε_0) ∩ D = ∅, and let D̄ = {x: s(x) ≤ N}. By assumption D̄ is bounded, B(x_1, ε_0) and D are subsets of D̄, and s is convex outside D̄. Now, assume that, for some ε_0 > 0, the algorithm fails. Then there is an l ∪ r-complete simplex σ = (v^1, …, v^{n+1}) of diameter ≤ ε_0 sufficiently far from D̄; i.e., σ ⊄ D̄ and, for some x ∈ σ, ⟨v^i − x_0, x − x_1⟩ > 0. Also s(v^i) > 0 for all i. Now consider the point v^i − x + x_1 ∈ D̄; by assumption, v^i − x + x_1 ∉ D. Hence

    s(v^i − x + x_1) ≥ s(v^i) + ⟨x_1 − x, y*⟩ for all y* ∈ ∂s(v^i),

and as v^i ∉ D̄ and v^i − x + x_1 ∈ D̄ we have ⟨x − x_1, y*⟩ > 0 for all y* ∈ ∂s(v^i), for all i. Thus σ is not l ∪ r-complete, a contradiction. The result now follows from Theorem 2.1.
6. Computational considerations

As is evident from Sections 3 and 5, the convergence of the fixed point algorithms can be established under some general conditions on the problem, and differentiability is not necessary. Computational experience indicates that the computational burden increases when the underlying mappings are not smooth. For smooth mappings, under the usual conditions, the fixed point algorithms can be made to converge quadratically, Saigal [15]. This can be observed by comparing the solution of four nondifferentiable nonlinear programming problems implementing the mapping (3.4), presented in the appendix, Tables A.1-4, with the solution of a smooth problem of eighty variables in Table A.5. On such a problem, reported in Netravali and Saigal [9], the growth of the number of function evaluations with the number of variables was tested. The results were as anticipated by the works of Saigal [12] and Todd [17]. It was predicted in these works that the function evaluations grow as O(n^2), where n is the number of variables. (See [9, 4.1].)

Table A.1
Constrained minimization problem with piecewise linear objective function and one piecewise linear constraint in seven variables.

Grid of search | Number of function evaluations | Number of simplexes searched
8.0  |  34 |   121
4.0  | 111 |   160
2.0  |  28 |    54
1.0  | 286 |   699
0.5  | 539 | 1,034
0.25 | 487 |   918

Table A.2
Unconstrained minimization of a piecewise linear convex function of 43 variables.

Grid of search | Number of function evaluations | Number of simplexes searched
6.55 |   951 | 1,431
3.27 |   469 |   469
1.63 |   400 |   400
0.81 |   526 |   526
0.41 |   623 |   623
0.20 | 1,081 | 1,081
0.10 |   885 |   885

Table A.3

Grid of search | Number of function evaluations | Number of simplexes searched
1.93  | 1,895 | 3,872
0.97  | 1,398 | 1,398
0.48  |   608 |   608
0.24  |   232 |   232
0.24  |   218 |   218
0.12  |   433 |   433
0.12  |   189 |   189
0.12  |   377 |   377
0.06  |   679 |   679
0.06  |   222 |   222
0.06  |   259 |   259
0.06  |   130 |   130
0.03  |   346 |   346
0.15  |   254 |   254
0.07  |    90 |    90
0.07  |   171 |   171
0.07  |    59 |    59
0.003 |   262 |   262

Table A.4

Grid of search | Number of function evaluations | Number of simplexes searched
1.94  | 1,140 | 1,759
0.96  |    42 |    42
0.48  |    53 |    53
0.24  |    52 |    52
0.02  |    16 |    16
0.06  |    16 |    16
0.03  |    16 |    16
0.015 |    16 |    16
0.007 |    16 |    16
0.004 |    16 |    16

Table A.5
Zero finding problem for a smooth function of 80 variables. The accelerated algorithm has been used to solve this problem.

Grid of search | Number of function evaluations | Number of simplexes searched
8.74     | 1,573 | 1,750
4.47     |   149 |   149
2.24     |   106 |   106
1.18     |    97 |    97
0.56     |    88 |    88
0.24     |    84 |    84
0.14     |    81 |    81
0.03     |    84 |    84
0.002    |    81 |    81
0.000009 |    82 |    82
Appendix

We now give some computational experience with solving nondifferentiable optimization problems in a fairly large number of variables. For comparison
purposes, we also give the results of solving an eighty-variable smooth problem (where the convergence has been accelerated).
Problem 1. This is a 7-variable problem. It is a version of the problem considered by Netravali and Saigal [9]. The value of entropy on the entropy constraint is 2.7, and this is the 19th run in the series of runs done on this problem.
Problem 2. This is a 43-variable problem considered in W.B. Elsner, "A descent algorithm for the multihour sizing of traffic networks," Bell System Technical Journal 56 (1977) 1405-1430. This run was made on a piecewise linear version, while the problem formulated by Elsner was piecewise smooth. The function is convex.
Problem 3. This is the following 15-variable convex piecewise smooth problem:

min_x max_{1≤j≤35} f_j(x)

where

f_j(x) = 2 Σ_{i=1}^{5} c_{ij} x_{10+i} + 3 d_j x²_{10+j} + e_j − Σ_{i=1}^{10} a_{ij} x_i,   j = 1, …, 5,

f_j(x) = −x_{j−5},   j = 6, …, 20,

and

f_j(x) = x_{j−20} − 10,   j = 21, …, 35,

and the data a_{ij}, c_{ij}, d_j, e_j are the same as those for problem 10 in the appendix of D.M. Himmelblau, Applied Nonlinear Programming (McGraw-Hill, New York, 1972). The starting point is x_i = 0.001, i ≠ 7, x₇ = 60.
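The objective of Problem 3 is the pointwise maximum of the 35 pieces above. The sketch below shows how such an objective is assembled; the arrays a, c, d, e are random placeholders standing in for Himmelblau's problem-10 data, which are not reproduced here, and the function name F is our own:

```python
import random

# Placeholder data (the real a_ij, c_ij, d_j, e_j come from problem 10
# of Himmelblau's appendix; these random values are for illustration only).
random.seed(0)
a = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(10)]   # a[i][j]
c = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]    # c[i][j]
d = [random.uniform(0, 1) for _ in range(5)]
e = [random.uniform(-1, 1) for _ in range(5)]

def F(x):
    """Evaluate max_{1<=j<=35} f_j(x) for a 15-vector x (0-based indexing)."""
    vals = []
    for j in range(5):                 # the 5 smooth pieces f_1..f_5
        s = 2 * sum(c[i][j] * x[10 + i] for i in range(5))
        s += 3 * d[j] * x[10 + j] ** 2 + e[j]
        s -= sum(a[i][j] * x[i] for i in range(10))
        vals.append(s)
    vals += [-x[j] for j in range(15)]        # pieces f_6..f_20:  -x_j
    vals += [x[j] - 10 for j in range(15)]    # pieces f_21..f_35: x_j - 10
    return max(vals)
```

At the stated starting point the piece x₇ − 10 already contributes 50, so F there is at least 50 whatever the data.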
Problem 4. This is the following 15-variable minimization problem:

min −Σ_{i=1}^{10} b_i x_i + Σ_{j=1}^{5} Σ_{i=1}^{5} c_{ij} x_{10+i} x_{10+j} + 2 Σ_{j=1}^{5} d_j x³_{10+j}

subject to

f_j(x) ≤ 0,   j = 1, …, 35,

where the f_j are the same as those for Problem 3. The data b_i can be obtained from problem 10 of D.M. Himmelblau, Applied Nonlinear Programming (McGraw-Hill, New York, 1972). The starting point is x_i = 0.001, i ≠ 7, x₇ = 60.0.
Problem 5. This is the 80-variable problem considered by Kellogg, Li and Yorke, "A constructive proof of Brouwer's fixed point theorem and computational results," SIAM Journal on Numerical Analysis 13 (1976) 473-483. The results of the above five problems are summarized in Tables A.1-5, respectively.
References

[1] F.H. Clarke, "Generalized gradients and applications," Transactions of the American Mathematical Society 205 (1975) 247-262.
[2] F.H. Clarke, "A new approach to Lagrange multipliers," Mathematics of Operations Research 1 (1976) 165-174.
[3] B.C. Eaves, "Nonlinear programming via Kakutani fixed points," Working Paper No. 294, Center for Research in Management Science, University of California, Berkeley, 1970.
[4] B.C. Eaves and R. Saigal, "Homotopies for computation of fixed points on unbounded regions," Mathematical Programming 8 (1972) 134-145.
[5] T. Hansen, "A fixed point algorithm for approximating the optimal solution of a concave programming problem," Cowles Foundation Discussion Paper No. 277, Yale University, New Haven, Connecticut, 1969.
[6] T. Hansen and H. Scarf, "On the applications of a recent combinatorial algorithm," Cowles Foundation Discussion Paper No. 272, Yale University, New Haven, Connecticut, 1969.
[7] O.H. Merrill, "Applications and extensions of an algorithm that computes fixed points of certain upper semi-continuous point to set mappings," Ph.D. Dissertation, University of Michigan, Ann Arbor, 1972.
[8] R. Mifflin, "An algorithm for constrained optimization with semismooth functions," Mathematics of Operations Research 2 (1977) 191-207.
[9] A. Netravali and R. Saigal, "Optimum quantizer design using a fixed point algorithm," The Bell System Technical Journal 55 (1976) 1423-1435.
[10] R.T. Rockafellar, Convex analysis (Princeton University Press, Princeton, NJ, 1970).
[11] D.G. Saari and R. Saigal, in preparation.
[12] R. Saigal, "Investigations into the efficiency of fixed point algorithms," in: S. Karamardian, ed., Fixed points: algorithms and applications (Academic Press, New York, 1977).
[13] R. Saigal, "Fixed point computing methods," Encyclopedia of Computer Science and Technology, Vol. 8 (Marcel Dekker, New York, 1977).
[14] R. Saigal, "On paths generated by fixed point algorithms," Mathematics of Operations Research 1 (1976) 359-380.
[15] R. Saigal, "On the convergence rate of algorithms for solving equations that are based on methods of complementary pivoting," Mathematics of Operations Research 2 (1977) 108-124.
[16] H. Scarf, "The approximation of fixed points of a continuous mapping," SIAM Journal on Applied Mathematics 15 (1967) 1328-1343.
[17] M.J. Todd, "On triangulations for computing fixed points," Mathematical Programming 10 (1976) 322-346.
[18] M.J. Todd, The computation of fixed points and applications (Springer-Verlag, Berlin-Heidelberg, 1976).
[19] M.J. Todd, "New fixed point algorithms for economic equilibria and constrained optimization," Tech. Report No. 362, School of Operations Research, Cornell University, Ithaca, New York, 1977.
[20] P. Wolfe, "A method of conjugate subgradients for minimizing nondifferentiable functions," in: M.L. Balinski and P. Wolfe, eds., Nondifferentiable optimization, Mathematical Programming Study 3 (North-Holland, Amsterdam, 1975) 145-173.
Mathematical Programming Study 10 (1979) 158-171. North-Holland Publishing Company
CONVERGENCE ANALYSIS FOR TWO-LEVEL ALGORITHMS OF MATHEMATICAL PROGRAMMING*
J. SZYMANOWSKI and A. RUSZCZYNSKI
Politechnika Warszawska, Warsaw, Poland

Received 12 July 1977
Revised manuscript received 30 March 1978

A general scheme, covering a wide class of two-level methods, is formulated. The possibilities for handling truncation errors and stopping rules on both levels are described. The convergence of the algorithm is proved under general assumptions, and computational results for an example of a minimax problem are given.
Key words: Convergence Analysis, Truncation Error, Two-level Algorithm, Stopping Rule, Mathematical Programming.
1. Introduction

There is a wide class of mathematical programming methods which are in essence two-level methods. These methods consist in the existence of some supervisory (upper-level) algorithm which invokes subroutines solving auxiliary (lower-level) optimization problems. We will illustrate this with three typical examples.

(i) The minimax problem

min_{x∈X} max_{y∈Y} f(x, y),    (1)

where X ⊂ Rⁿ, Y ⊂ Rᵐ, f: Rⁿ × Rᵐ → R¹, is usually solved in a two-level scheme. The lower-level problem is as follows:

max_{y∈Y} f(x, y).    (2)

Solutions ŷ(x) of this problem are introduced to the upper-level problem

min_{x∈X} f(x, ŷ(x)).    (3)
* Revised version of the paper presented at the IX International Symposium on Mathematical Programming, Budapest, 1976.
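To make the scheme (1)-(3) concrete, here is a minimal sketch in which, purely for illustration, both levels are solved by exhaustive search over finite candidate sets; any pair of optimization routines could play these roles, and the function names are our own:

```python
def lower_level(f, x, Y):
    """Problem (2): maximize f(x, .) over the candidate set Y, returning y_hat(x)."""
    return max(Y, key=lambda y: f(x, y))

def two_level_minimax(f, X, Y):
    """Problem (3): minimize x -> f(x, y_hat(x)) over the candidate set X."""
    return min(X, key=lambda x: f(x, lower_level(f, x, Y)))
```

For f(x, y) = (x − y)² on X = Y = {−1, 0, 1}, the inner maximum is 4 at x = ±1 and 1 at x = 0, so the scheme returns x = 0.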
J. Szymanowski, A. Ruszczynski/ Two-level algorithms
(ii) The price method is used, in the simplest case, for the solution of problems of the form

min Σ_{i=1}^{N} f_i(x_i),    (4)

subject to

Σ_{i=1}^{N} h_j^i(x_i) ≤ 0,   j = 1, …, m,   x_i ∈ X_i,

where X_i ⊂ R^{n_i}, f_i: R^{n_i} → R¹, h_j^i: R^{n_i} → R^{m_j}.

In this method upper-level variables p = (p₁, p₂, …, p_m), p_j ∈ R^{m_j}, are introduced and N lower-level problems are formulated:

min_{x_i∈X_i} [L_i(x_i, p) = f_i(x_i) + Σ_{j=1}^{m} ⟨p_j, h_j^i(x_i)⟩],   i = 1, …, N.    (5)

Solutions x̂_i(p) of these problems are used in the upper-level problem

max_{p≥0} Σ_{i=1}^{N} L_i(x̂_i(p), p).    (6)
(iii) The primal method is used, in the simplest case, for the solution of problems of the form

min Σ_{i=1}^{N} f_i(x, y_i),    (7)

subject to the constraints

(x, y_i) ∈ Y_i,   x ∈ X,

where X ⊂ Rⁿ, Y_i ⊂ Rⁿ × R^{m_i}, f_i: Rⁿ × R^{m_i} → R¹.

Independent lower-level problems are formulated (at fixed x):

min_{y_i: (x, y_i)∈Y_i} f_i(x, y_i),   i = 1, 2, …, N.    (8)

Solutions ŷ_i(x) found in this way are introduced to the upper-level problem

min_{x∈X} Σ_{i=1}^{N} f_i(x, ŷ_i(x)).    (9)
Of course the above transformations are not always possible; conditions under which they are applicable are discussed in numerous references. An important feature of the above-mentioned methods is that the solution of the upper-level problems (3), (6) and (9) calls for multiple executions of the lower-level programs (2),
(5), (8) at different values of the upper-level variables. Usually it is assumed that the lower-level solutions are unique and are computed precisely. These assumptions can hardly be satisfied in practice, because the computation of lower-level solutions would require an infinite number of function evaluations. Therefore truncation of the lower-level algorithms has to be used, and obvious questions arise, such as: What is the effect of inaccuracy in the lower-level problems on the properties of the method? Is it worthwhile to solve the lower-level problems always with maximum (i.e. required in the solution) accuracy?

Analogous questions arose in the analysis of general minimization algorithms with inaccurate line search [2, 6]. Let us note, however, that the analogy is only superficial, because in the latter case, in spite of the inaccurate line search, we have a clear criterion of improvement, namely the decrease of the objective function. In the case of two-level methods, inaccurate solution of the lower-level problems results in a deformation of the function observed by the upper level, which is connected with specific computational and theoretical difficulties. These problems are currently under investigation [1, 4].

In Section 2 of this paper a general scheme of a two-level method is formulated. The scheme covers a wide class of two-level methods, and in particular the three cases sketched in this introduction. Next, the assumptions about the stop tests used in the truncation are made precise. In Section 3 a general algorithm for handling the stop conditions on both levels is described; it is an algorithm organizing the cooperation of any two methods in a two-level scheme. In Section 4 the convergence of the method is proved under general assumptions about the point-to-set maps describing the upper- and lower-level algorithms. Section 5 covers the analysis of the cooperation from the point of view of computational cost in the special case of linearly convergent methods.
In Section 6 some computational results for an example of a minimax problem are presented.
2. Formulation of the problem and relevant assumptions

We now arrange the analysed two-level methods into the frame of the following scheme. Let the lower-level problem be of the form

min_{m∈M(v)} q(v, m),    (10)

where q: Rⁿ × Rʳ → R¹ and M: Rⁿ → 2^{Rʳ}. Let us denote the solution of (10) for fixed v by m̂(v). These solutions are used in the upper-level problem

min_{v∈V} g(v, m̂(v)),    (11)

where g: Rⁿ × Rʳ → R¹ and V is a closed subset of Rⁿ. We assume that the point-to-
set mapping M is such that for v ∈ V the set M(v) is non-empty and the problem (10) has a unique solution m̂(v). It is evident that the methods discussed in the Introduction are included in this scheme. For example, the minimax problem (1) may be written in the form (10), (11) with v = x, m = y, q(v, m) = −f(x, y), M(v) = Y, g(v, m) = f(x, y), V = X. The scheme (10), (11) is rather flexible and covers as well two-level methods for the solution of sparse systems of equations, etc. Suppose that the following conditions for the lower- and upper-level algorithms are satisfied.
Properties of the lower-level algorithm

(H1) The lower-level problems (10) are solved by a minimization algorithm generating, for fixed v, a sequence of points of the space Rʳ convergent to the point m̂(v).

(H2) The lower-level algorithm is provided with a correct stop test dependent on a parameter ε > 0. For each v ∈ V and each ε > 0 the algorithm fulfils this stop criterion after a finite number of steps. The point in Rʳ found in this way will be denoted by m̂(v, ε).

(H3) There exist a constant ε̄ > 0 and a function φ: R⁺ → R⁺ with lim_{α→0⁺} φ(α) = 0 such that for all v ∈ V and 0 < ε ≤ ε̄:
(i) m̂(v, ε) ∈ M(v),
(ii) ‖m̂(v, ε) − m̂(v)‖ ≤ φ(ε).

So we assume about the lower-level algorithm that tightening the stop criterion results in a better approximation of the exact solution m̂(v) by m̂(v, ε). In (H1) we assume the convergence of the lower-level algorithm. Of course this assumption may be fulfilled when the function q and the point-to-set mapping which describes the algorithm satisfy appropriate conditions. These conditions are discussed in several references [2, 4, 6, 8] and will not be analysed here.
Properties of the upper-level algorithm

In our problem the upper-level algorithm has to minimize the function g(v, m̂(v)). If the lower-level problem (10) were solved precisely, the convergence of the whole two-level method would follow from the assumption that the function g(v, m̂(v)) is sufficiently regular. However, the lower-level minimization has to be truncated after a finite number of iterations, which results in sending to the upper level some approximations m̂(v, ε) of the solutions m̂(v). Thus for different ε the upper-level algorithm observes different functions

g_ε(v) = g(v, m̂(v, ε)).    (12)
Therefore we shall consider a family of upper-level algorithms dependent on the observed functions. Consequently, we have to consider the set of approximate problems of the form

min_{v∈V} g_ε(v).    (13)

(H4) There exists a constant ε̂ ∈ (0, ε̄] such that for any function g_ε: V → R¹ with 0 < ε ≤ ε̂ the upper-level algorithm defines some point-to-set mapping

B_ε: V → 2^V    (14)

and B_ε(u) ≠ ∅ for all u ∈ V.

(H5) The upper-level algorithm is provided with a stop criterion dependent on a parameter δ > 0, defining for each function g_ε with 0 < ε ≤ ε̂ the set Ω_ε(δ) of approximate solutions of the problem (13). There exists a function ψ: R⁺ → R⁺ with lim_{δ→0⁺} ψ(δ) = 0 such that for all δ > 0, all 0 < ε ≤ min(ε̂, δ) and all u ∈ Ω_ε(δ)

g_ε(u) − inf_{v∈V} g_ε(v) ≤ ψ(δ).    (15)

(H6) There exists a constant δ̄ > 0 such that for all 0 < δ ≤ δ̄ and all 0 < ε ≤ min(ε̂, δ):
(i) Ω_ε(δ) ≠ ∅;
(ii) for every u ∈ V and every sequence {u^k}_{k∈N} with u⁰ = u and u^{k+1} ∈ B_ε(u^k), there exists k₀ ≥ 0 such that u^{k₀} ∈ Ω_ε(δ).

Let us note that we do not assume that the problems (13) may be solved precisely by the upper-level algorithm; solutions of these problems may not even exist. We assume only that it is possible to find, in a finite number of steps, a point belonging to Ω_ε(δ) with the stop parameter δ greater than the error ε. The method of proper handling of the stop criteria at the upper and lower levels will be specified in the next section.
3. An algorithm

Let us observe that, when minimizing the function g_ε(v), one can expect a simultaneous decrease of the function g(v, m̂(v)) only when the improvement in the values of the function g_ε(v) is of higher order than its deviation from g(v, m̂(v)). Further minimization of g_ε(v) is then beside the purpose, as we are in a region where the approximation of the function g(v, m̂(v)) by g_ε(v) is no longer sufficient. It is then necessary to decrease ε and continue the minimization, starting from the last point obtained. The above intuitive considerations are the basis for the construction of the following accuracy selection algorithm:

Step 0. Select an initial point v⁰ ∈ V, an initial accuracy ε⁰ ∈ (0, min(ε̂, δ̄)], and a constant Λ ∈ (0, 1). Set k = 0.
Step 1. Set u⁰ = v^k.
Step 2. Compute m̂(u⁰, ε^k) and g_{ε^k}(u⁰) = g(u⁰, m̂(u⁰, ε^k)).
Step 3. Set j = 0.
Step 4. Compute u^{j+1} ∈ B_{ε^k}(u^j).
Step 5. If g_{ε^k}(u^{j+1}) ≤ g_{ε^k}(u⁰) − ε^k, then go to Step 6; otherwise go to Step 7.
Step 6. Set v^{k+1} = u^{j+1}, u⁰ = v^{k+1}, ε^{k+1} = ε^k, k = k + 1 and go to Step 3.
Step 7. If u^{j+1} ∈ Ω_{ε^k}(ε^k), then go to Step 8; otherwise set u^j = u^{j+1}, j = j + 1 and go to Step 4.
Step 8. Choose λ_k ∈ (0, Λ], set v^{k+1} = v^k, ε^{k+1} = λ_k ε^k, k = k + 1 and go to Step 1.

The above algorithm is based on the construction of two sequences, {v^k} and {u^j}. The sequence {v^k} is the fundamental sequence of the upper level, and its convergence to the solution of problem (11) will be proved below. The sequence {u^j} is an auxiliary one; its aim is to test whether it is possible to reduce the function g_{ε^k} by more than ε^k (Step 5). If so, the successive element of the sequence {v^k} is constructed. On the other hand, if the stop test on the upper level is activated and the proper decrease of the function g_{ε^k} has not yet been achieved (Step 7), we conclude that we are in the region where the error of g_{ε^k}(v) with respect to g(v, m̂(v)) prevails, and we decrease ε (Step 8).

The above algorithm describes the operation of a two-level method in a very condensed manner, giving a detailed account only of the fragments relevant to the handling of the stop tests. In practice, the implementation of Step 4 requires iterative invocations of the lower level and the iterative solution of problem (10). Since all these operations are carried out at the same ε^k, they were omitted in order to present the idea of the algorithm clearly. An important feature of the described method of cooperation between the upper- and lower-level algorithms is that the lower-level problem (10), at points v distant from the solution, is solved only roughly, avoiding superfluous computing expenses. Moreover, each step of the algorithm may be executed in a finite number of iterations. Such an algorithm is often called an implementable algorithm [6].
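The steps above can be sketched compactly as follows. Everything specific in the sketch is an assumption for illustration: the upper-level map B_ε is taken to be a crude finite-difference gradient step on a scalar observed function g_ε, and the Step 5 decrease test also serves as the upper-level stop criterion (as in assumption (iii) of Section 5), so a failed decrease triggers Step 8 directly:

```python
def accuracy_selection(g_eps, v0, eps0, lam=0.5, eps_min=1e-8, step=0.2):
    """Sketch of the accuracy selection algorithm (Steps 0-8).

    g_eps(v, eps) is the observed function g_eps(v); the upper-level map
    B_eps is an assumed stand-in: one gradient step with a fixed step size,
    using a central finite difference. lam plays the role of lambda_k.
    """
    v, eps = v0, eps0
    while eps > eps_min:                 # outer loop over accuracy levels
        u = v                            # Step 1: u^0 = v^k
        while True:
            # Step 4: one upper-level iteration u -> u_next on g_eps
            h = 1e-6
            grad = (g_eps(u + h, eps) - g_eps(u - h, eps)) / (2 * h)
            u_next = u - step * grad
            # Step 5: did we improve g_eps by more than eps?
            if g_eps(u, eps) - g_eps(u_next, eps) >= eps:
                v = u_next               # Step 6: accept, keep the same eps
                u = u_next
            else:
                break                    # Steps 7-8: stop test fires
        eps = lam * eps                  # Step 8: tighten accuracy, keep v
    return v
```

On a toy problem with a hypothetical inexact lower level m̂(v, ε) = v + ε and g(v, m) = v² + m², i.e. g_ε(v) = v² + (v + ε)², the true minimizer of g(v, m̂(v)) = 2v² is v = 0, and the sketch recovers it.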
Let us note that no instruction "Stop" appears in the algorithm. In practice, of course, one has to define the final accuracy ε_min and to supplement Step 8 of the algorithm with the test: "if ε ≤ ε_min, then truncate the upper-level sequence and choose v^k as an approximation of the solution of the problem". From the considerations of the next section (formula (27)) it follows that the point (v^k, m̂(v^k, ε_min)) satisfies the inequality

g(v^k, m̂(v^k, ε_min)) < min_{v∈V} g(v, m̂(v)) + O(ε_min),

where O(ε) → 0 when ε → 0. From the above estimate it is evident that one can find an ε_min such that the required accuracy of the solution of problem (11) is obtained.

It is worth mentioning that the algorithm described in this section is to a certain extent related to the general approach to mathematical programming
algorithms developed in [3]. General convergence theorems from [3] are based on the notion of C-decreasing families of point-to-set maps. In our case, however, it seems rather difficult to introduce an appropriate ordering between the maps describing subsequent iterations at different ε. Therefore the method described above does not fall into the general framework of [3]. We shall now derive the proof of convergence of the sequence {v^k} to the solution of problem (11).
4. Convergence analysis

Theorem. Let the assumptions (H1)-(H6) from Section 2 be satisfied, as well as the following additional ones:

(H7) The function g(v, m̂(v)) is continuous with respect to v and bounded from below on V.

(H8) The function g(v, m) is uniformly continuous with respect to m, i.e. there exists a nondecreasing function γ: R⁺ → R⁺ with lim_{α→0⁺} γ(α) = 0 such that for all v ∈ V and all m¹, m² ∈ M(v)

|g(v, m¹) − g(v, m²)| ≤ γ(‖m¹ − m²‖).

Then each convergent subsequence {v^k}_{k∈K} of the sequence {v^k}_{k∈N} generated by the algorithm converges to a point v^∞ ∈ V such that

g(v^∞, m̂(v^∞)) = min_{v∈V} g(v, m̂(v)).    (16)
Proof. First we shall prove that if v^k → v^∞, v^∞ ∈ V, ε^k → 0, ε^k > 0, then

lim_{k→∞} g(v^k, m̂(v^k, ε^k)) = g(v^∞, m̂(v^∞)).    (17)

Indeed, it follows from (H8) and (H3)(ii) that

|g(v^k, m̂(v^k, ε^k)) − g(v^k, m̂(v^k))| ≤ γ(‖m̂(v^k, ε^k) − m̂(v^k)‖) ≤ γ(φ(ε^k)).

From (H7) we have

lim_{k→∞} g(v^k, m̂(v^k)) = g(v^∞, m̂(v^∞)),

and from (H3), (H8),

lim_{k→∞} γ(φ(ε^k)) = 0.

Consequently (17) is true. Now we shall consider two cases.

(A) The algorithm carries out Step 6 a finite number of times. Then there exists
k₀ ≥ 0 such that v^k = v^{k₀} for k ≥ k₀, and Step 7 is carried out for k ≥ k₀. It follows from (H6)(ii) that for each ε^k (k ≥ k₀), after a finite number of iterations, we find a point u such that

u ∈ Ω_{ε^k}(ε^k),   g_{ε^k}(u) > g_{ε^k}(v^{k₀}) − ε^k.    (18)

Thus Step 8 is carried out an infinite number of times and ε^k → 0. Moreover, by virtue of (H5),

g_{ε^k}(u) ≤ inf_{v∈V} g_{ε^k}(v) + ψ(ε^k).    (19)

It follows from (18), (19) that for any v ∈ V

g_{ε^k}(v^{k₀}) ≤ g_{ε^k}(v) + ψ(ε^k) + ε^k.    (20)

After transition to the limit with k → ∞, in view of (17), we obtain
g(v^{k₀}, m̂(v^{k₀})) ≤ g(v, m̂(v)) for any v ∈ V. Thus (16) is true in this case.

(B) Step 6 is carried out an infinite number of times. Let a subsequence {v^k}_{k∈K} of the sequence {v^k}_{k∈N} converge to v^∞. If Step 8 were carried out only a finite number of times, then there would exist k₀ ≥ 0 such that for k ≥ k₀

ε^k = ε^{k₀},   g(v^{k+1}, m̂(v^{k+1}, ε^{k₀})) ≤ g(v^k, m̂(v^k, ε^{k₀})) − ε^{k₀}.

Hence the sequence {g(v^k, m̂(v^k, ε^{k₀}))} would be unbounded. It follows from (H8) and (H3)(ii) that for k → ∞, k ∈ K,

g(v^k, m̂(v^k)) → −∞,

which contradicts (H7). Thus Step 8 is carried out an infinite number of times. Consequently, ε^k → 0. Let l(k) be the greatest of the numbers l < k such that in the l-th iteration Step 8 was executed. We have already proved that l(k) → ∞ and ε^{l(k)} → 0 when k → ∞. It follows from the definition of the number l(k) that ε^i = ε^{l(k)+1} for l(k) < i ≤ k. Hence

g(v^k, m̂(v^k, ε^k)) < g(v^{l(k)+1}, m̂(v^{l(k)+1}, ε^{l(k)+1})).    (21)

Since for l(k) Step 8 was carried out, it follows from (H5) and (H6)(ii) (in a similar way to that which led to inequality (20)) that

g(v^{l(k)}, m̂(v^{l(k)}, ε^{l(k)})) ≤ inf_{v∈V} g(v, m̂(v, ε^{l(k)})) + ψ(ε^{l(k)}) + ε^{l(k)}    (22)

and

v^{l(k)+1} = v^{l(k)}.    (23)
From (21) and (23) we obtain

g(v^k, m̂(v^k, ε^k)) ≤ g(v^{l(k)}, m̂(v^{l(k)}, ε^{l(k)+1})).    (24)

Next, by virtue of (H8) and (H3),

g(v^{l(k)}, m̂(v^{l(k)}, ε^{l(k)+1})) ≤ g(v^{l(k)}, m̂(v^{l(k)}, ε^{l(k)})) + γ(‖m̂(v^{l(k)}, ε^{l(k)+1}) − m̂(v^{l(k)}, ε^{l(k)})‖)
≤ g(v^{l(k)}, m̂(v^{l(k)}, ε^{l(k)})) + γ(φ(ε^{l(k)+1}) + φ(ε^{l(k)})).

Taking into account (22) and (24), we conclude that for any v ∈ V

g(v^k, m̂(v^k, ε^k)) ≤ g(v, m̂(v, ε^{l(k)})) + γ(φ(ε^{l(k)+1}) + φ(ε^{l(k)})) + ψ(ε^{l(k)}) + ε^{l(k)}.    (25)

It follows from (H8) and (H3)(ii) that for any v ∈ V

g(v, m̂(v)) ≥ g(v, m̂(v, ε^{l(k)})) − γ(φ(ε^{l(k)})).    (26)

Thus, from (25) and (26), we obtain for any v ∈ V

g(v^k, m̂(v^k, ε^k)) ≤ g(v, m̂(v)) + γ(φ(ε^{l(k)})) + γ(φ(ε^{l(k)+1}) + φ(ε^{l(k)})) + ψ(ε^{l(k)}) + ε^{l(k)}.    (27)

After transition to the limit with k → ∞, k ∈ K, in (27), in view of (17), we obtain for any v ∈ V

g(v^∞, m̂(v^∞)) ≤ g(v, m̂(v)).

Since V is closed, v^∞ ∈ V and (16) is true. The theorem has been proved.
Let us briefly consider the assumptions of the theorem. Assumptions (H1)-(H6) were discussed in Section 2 and their meaning will be illustrated in the numerical example (Section 6). An important question connected with the numerical implementation is the equilibration of the sensitivity of the stop tests of the upper and lower levels to the parameter ε. The above theorem guarantees convergence provided that assumptions (H1)-(H8) are fulfilled, but it is quite clear that such equilibration will result in substantial savings of computation. For this purpose it may be useful to base the tests on numbers σ₁(ε), σ₂(ε) instead of ε. If σ₁(ε) → 0 and σ₂(ε) → 0 when ε → 0, then such a transformation will not spoil the convergence, because the general assumptions of the theorem remain valid.
5. Computational cost for linearly convergent methods

An important point that should be clarified is the choice of the values of the parameters λ_k. If we assume that λ_k = λ for all k ≥ 0, then the existence of an optimal value of λ is evident, since:
(1) For λ close to 1 the stop criterion at the upper level will be fulfilled very often (Step 7). This results in many additional computations aimed at the rejection of the sequence {u^j} and the improvement of accuracy at the same point v^k.

(2) For λ close to 0 the stop criterion for the lower level will be tightened too fast. This results in an excessive cost of computing the function g_ε(v) at points v distant from the solution.

We shall illustrate the possibility of selecting an optimal λ for the simple case of linearly convergent algorithms. Let us assume that:
(i) the calculation of m̂(v, ε) calls for C₁|log ε| iterations of the lower-level algorithm;
(ii) the upper-level algorithm has a linear rate of convergence, i.e. there exists q < 1 such that g_ε(u^{j+1}) ≤ q g_ε(u^j);
(iii) the stop test at the upper level is of the form [u^{j+1} ∈ Ω_ε(ε)] ⇔ [g_ε(u^j) − g_ε(u^{j+1}) ≤ ε];
(iv) the transition from u^j to u^{j+1} calls for C₂ runs of the lower level;
(v) there exists γ > 0 such that for any v ∈ V and m¹, m² ∈ M(v), |g(v, m¹) − g(v, m²)| ≤ γ‖m¹ − m²‖.

Simple transformations lead to the following estimate of the cost of computations:

π(λ) ≈ d₁ + d₂|log λ| + d₃/|log λ|,

where the constants d₁, d₂, d₃ are positive and depend on C₁, C₂, q, γ, the initial accuracy ε⁰ and the final accuracy ε_min. It is evident that the function π(λ) has a minimum in the interval (0, 1). The plot of π(λ) is shown in Fig. 1 in the next section.
6. Computational experience

The accuracy selection algorithm described in the previous sections was tested on the following example of a minimax problem:

min_{x∈R²} max_{y∈R²} [f(x, y) = 100(x₂ − (x₁)²)² + 2x₁y₁ + 2x₂y₂ − (y₁)² − (y₂)² − (y₁ − y₂)⁴].    (28)

Let us observe that for x ≠ (0, 0), max_y f(x, y) > 0, and for x = (0, 0), max_y f(x, y) = 0. Thus the unique solution of (28) is (x, y) = (0, 0, 0, 0). Let us write the problem (28) in the general form introduced in Section 2. The lower-level problem is of the form

min_{m∈R²} [q(v, m) = (m₁)² + (m₂)² + (m₁ − m₂)⁴ − 2v₁m₁ − 2v₂m₂].    (29)

We denote, as before, the solutions of (29) by m̂(v). The upper-level problem is of the form

min_{v∈R²} [g(v, m̂(v)) = 100(v₂ − (v₁)²)² − q(v, m̂(v))].    (30)
168
J. Szymanowski, A. Ruszczynski[ Two-level algorithms
For solving the lower-level problem (29) the well-known variable metric method of Fletcher was used. The stop criterion was based on the norm of the gradient. Let us note that if the norm of the gradient is of the order ε, then the error of the observed function is of the order ε². In the accuracy selection algorithm described in Section 3 it is useful to have these two errors equilibrated (Step 5). Thus the stop test

‖∇_m q(v, m)‖ < η√ε    (31)

was used in Fletcher's method. The parameter η > 0 in (31) will be chosen later. The starting point for this algorithm was fixed at m⁰ = (0.5, 1.0). Fletcher's method, starting from m⁰, generated a sequence {m^k(v)} convergent to m̂(v). The first point of this sequence satisfying the stop test (31) was chosen as m̂(v, ε).
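The lower-level computation of m̂(v, ε) with the stop test (31) can be sketched as follows; plain gradient descent with a fixed step is used here as an assumed stand-in for Fletcher's variable metric method (q is convex in m, so this converges):

```python
import math

def grad_q(v, m):
    """Gradient in m of q(v, m) = m1^2 + m2^2 + (m1 - m2)^4 - 2 v1 m1 - 2 v2 m2."""
    c = 4.0 * (m[0] - m[1]) ** 3
    return (2.0 * m[0] + c - 2.0 * v[0],
            2.0 * m[1] - c - 2.0 * v[1])

def m_hat(v, eps, eta=0.1, step=0.1, m0=(0.5, 1.0)):
    """Inexact lower-level solution m_hat(v, eps): iterate until the stop
    test (31), ||grad_m q(v, m)|| < eta * sqrt(eps), is satisfied."""
    m = list(m0)
    while True:
        gr = grad_q(v, m)
        if math.hypot(gr[0], gr[1]) < eta * math.sqrt(eps):
            return tuple(m)
        m[0] -= step * gr[0]
        m[1] -= step * gr[1]
```

For v = (1, 1) the exact solution is m̂(v) = (1, 1) (where the gradient vanishes), and tightening ε drives m̂(v, ε) toward it, as required by (H3).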
~,(8) = {rE R2:
II [(v "))l
(32)
In order to fulfill assumption (H6), p a r a m e t e r T/ in (31) should be sufficiently small. In our case 7/= 0.1 is good enough. The starting point for the upper-level algorithm was fixed at v ~ = (5, 24). More details about implementation of Fletcher's and W B D methods, together with description of line search procedures etc., may be found in [9]. The accuracy selection algorithm was written in such a way that any minimization method may be used at the upper- or lower-level. Besides the minimization procedures one should determine the following parameters: e0: initial accuracy, c~n: final accuracy, A: accuracy reduction coefficient, a : rate of c o n v e r g e n c e multiplier. In Step 8 of the a c c u r a c y selection algorithm c k is updated according to the formula ek+~= A*r k, where A k = (a)*A. In our example we want to find a point v E R: in which [[Vvg(v, m(v))l[ < l0 -7. Since in the accuracy selection algorithm, formulated in Section 3, it was required that 8 in (32) be equal to c in (31), we choose CmJn= 10-14- It is reasonable to choose c ~ of the same range as the square of the norm of the gradient in v ~ In the c o m p u t a t i o n s c ~ was fixed at 104. N u m e r i c a l results
In order to investigate the properties of the accuracy selection algorithm, the influence of the parameter Λ on the cost of computations was tested. The parameter α was fixed at 1, which corresponds to linear convergence of the accuracy parameter ε^k. The
cost of computations was defined by the formula

TC = NF + 2·NG,

where
NF: number of function evaluations at the lower level,
NG: number of gradient evaluations at the lower level.
The results of computations for various Λ are collected in Table 1, where
NE: number of reductions of ε,

Table 1

Λ             NE    NI     TC
10⁻¹⁸          1   194   7329
10⁻¹⁵          2   195   6752
10⁻¹²          2   196   5482ᵃ
10⁻⁹           2   209   4823
10⁻⁷           3   206   4298
2.5×10⁻⁵       4   229   3894
10⁻³           6   239   3995
10⁻²           9   259   4019
4×10⁻²        13   269   4187ᵃ
2.5×10⁻¹      30   337   5001
superlinear    4   207   3264

ᵃ Required accuracy not obtained.

(The columns of Table 1 giving the solution obtained, x₁, x₂, y₁, y₂, and the objective value f are omitted here.)
Fig. 1. Influence of the parameter λ on the cost of computations: results of experiments (solid), and the theoretical curve π(λ) = 2400 + 250|log λ| + 2200/|log λ| (dashed).
NI: number of invocations of the lower-level procedure,
(x₁, x₂, y₁, y₂): solution obtained,
f: value of the objective function,
TC: total cost of computations.

The influence of the parameter Λ on the cost of computations is illustrated in Fig. 1. Let us note that the value Λ = 10⁻¹⁸ corresponds to the method in which the lower-level problems are solved with utmost accuracy; the cost of computations in this case is marked in Fig. 1 by a horizontal dotted line. Although the considerations of Section 5 were made under very simplifying assumptions, the experimental curve of cost versus λ is very similar to the theoretical one.

Additionally, one experiment was carried out in which the parameter α was less than 1. Such values of α correspond to superlinear convergence of the accuracy parameters ε^k. The results of this experiment are shown in the last row of Table 1 (Λ = 0.01, α = 0.02). The cost of computations is much less than in the case of a linear decrease of ε^k. Presumably, superlinear reduction of ε^k is better for superlinearly convergent algorithms such as variable metric methods.
7. Final conclusions

In this paper an analysis of two-level methods was carried out under the assumption that the lower-level problems are solved inaccurately. A general scheme of two-level methods, covering a wide class of existing methods, was used. This approach has a number of advantages.

(1) The convergence of these methods was proved under general assumptions about the stop criteria in the upper- and lower-level algorithms. Arbitrarily chosen minimization methods may be used in the scheme.

(2) An accuracy selection algorithm was described and investigated theoretically and numerically. It is not a new two-level method but an algorithm organizing the cooperation of minimization methods within the frame of a two-level method.

(3) The proposed algorithm is numerically valid, i.e. each of its steps requires a finite number of computations. Moreover, proper equilibration of the stop criteria results in substantial savings of computations. The numerical experiments confirmed the theoretical considerations.

An example of the application of the ideas suggested in this paper to the gradient projection method in minimax problems is presented in [7].
References [1] A. Auslender, "Minimization of convex functions with errors", IX International Symposium on Mathematical Programming, Budapest, 1976.
[2] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (1975) 308-331.
[3] J. Denel, "Nouvelles notions de continuité des applications multivoques et applications à l'optimisation", Publication No. 83 (mars 1977), Laboratoire de Calcul de l'Université de Lille I.
[4] R.R. Meyer, "The validity of a family of optimization methods", SIAM Journal on Control 8 (1970) 41-54.
[5] G. Pierra, "Crossing of algorithms in decomposition methods", IFAC Symposium on Large Scale Systems Theory and Applications, Udine, 1976.
[6] E. Polak, Computational methods in optimization (Academic Press, New York, 1971).
[7] J. Szymanowski and A. Ruszczyński, "An accuracy selection algorithm for the modified gradient projection method in minimax problems", 8th IFIP Conference on Optimization Techniques, Würzburg, 1977.
[8] W.I. Zangwill, Nonlinear programming: A unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
[9] "Methods for unconstrained optimization", Technical Report 1.2.03.2, Institute of Automatic Control, Technical University of Warsaw (1976).
Mathematical Programming Study 10 (1979) 172-190. North-Holland Publishing Company
A COMPARATIVE STUDY OF SEVERAL GENERAL CONVERGENCE CONDITIONS FOR ALGORITHMS MODELED BY POINT-TO-SET MAPS*

S. TISHYADHIGAMA and E. POLAK
University of California, Berkeley, CA, U.S.A.

and

R. KLESSIG
Bell Laboratories, Holmdel, NJ, U.S.A.
Received 29 November 1977 Revised manuscript received 7 April 1978 A general structure is established that allows the comparison of various conditions that are sufficient for convergence of algorithms that can be modeled as the recursive application of a point-to-set map. This structure is used to compare several earlier sufficient conditions as well as three new sets of sufficient conditions. One of the new sets of conditions is shown to be the most general in that all other sets of conditions imply this new set. This new set of conditions is also extended to the case where the point-to-set map can change from iteration to iteration. Key words: Optimization Algorithms, Convergence Conditions, Point-to-set Maps, Nonlinear Programming, Comparative Study.
1. Introduction

In recent years, the study of optimization algorithms has included a substantial effort to identify properties of algorithms that will guarantee their convergence in some sense, e.g. [1]-[29]. A number of these results have used an abstract algorithm model that consists of the recursive application of a point-to-set map. It is this type of result with which we are concerned in this paper and, in particular, with the results presented in [13], [16], [21], [24] and [29]. We have two purposes. First, we wish to introduce three new general convergence results. Second, we wish to identify the relationships among the general convergence results, including both our new results and previously published results.

In order to compare results, it is necessary to have a common framework. Unfortunately, different authors have used slightly different abstract algorithm models and have arrived at slightly different conclusions, partly because they have used somewhat different concepts of convergence. Thus, before a comparison can be made, it is necessary to establish a common framework and

* Research sponsored by the National Science Foundation Grant ENG73-O214-A01 and the National Science Foundation (RANN) Grant ENV76-05264.
S. Tishyadhigama, E. Polak, R. Klessig / Algorithms modeled by point-to-set maps
then to translate the various theories into this framework. Our approach to this task is as follows. In Section 2, we define an abstract algorithm model and formally define a concept of convergence for this model. Our new convergence results establish that certain conditions are sufficient for the algorithm model to be convergent in the sense of our concept of convergence. The earlier results use a similar approach, but occasionally differ from each other by the algorithm model and concept of convergence used. We take the essential features of these earlier sufficient conditions and use them to create analogous conditions that are sufficient in our present framework. We then establish relationships between the various sufficient conditions by showing which conditions imply other conditions.

In view of our approach to the interpretation of earlier work, we make no claim that, and the reader should not infer that, the contents of this paper fully describe the various earlier results. When we associate an author's name with a set of sufficient conditions, we mean that the original conditions from which we derived the conditions in question were first proposed by that author. The interested reader can find all of the new results stated in this paper in [26], which also shows how the sufficient conditions used in this paper are derived from the original sufficient conditions.

Section 3 contains the main results of this paper. These results are summarized by Fig. 1. Each box represents a set of conditions and an arrow indicates that the conditions at the tail of the arrow imply the conditions at the head. We have included in Section 3 results that show that under special conditions, some sets of sufficient conditions are equivalent. The most important of these special
[Figure: a diagram of boxes and arrows showing the implications among the sets of sufficient conditions of Section 3 — including those of G. Meyer, Polak, R. Meyer, Polyak (3.44) and Zangwill (3.47) — with the special cases "c is lower semicontinuous", "A is single valued" and "c is continuous" noted on the arrows.]

Fig. 1. Results of Section 3.
cases is when the cost (or surrogate cost) function, c, is continuous. The special cases are noted in Fig. 1.

In Section 4 we illustrate how the sufficient conditions presented in Section 3 can be modified to apply to an algorithm model which may use a different point-to-set map at each iteration. We do this by extending the most general sufficient conditions of Section 3. Finally, in the Appendix we present some counterexamples to show that there are meaningful differences between the sets of sufficient conditions.
2. Framework for comparison and preliminaries

In this section, we present an abstract algorithm model and define a concept of convergence. In addition, we present some results and notation that will be used extensively in the sequel.

(2.1) Definition. Ω is a Hausdorff topological space that satisfies the first axiom of countability. Δ ⊂ Ω is called the set of desirable points.

(2.2) Remark. The set Δ consists of points that we will accept as "solutions" to the problem being solved by the algorithm. For example, it may consist of all points satisfying a necessary condition of optimality. Ω is sometimes taken as the set of feasible points for a problem. Thus, Ω may be a subset of a larger topological space. If such is the case, the relative topology on Ω is used.

(2.3) Algorithm model. Let A : Ω → 2^Ω − ∅, where 2^Ω denotes the set of all subsets of Ω.
Step 0: Set i = 0. Choose any z_0 ∈ Ω.
Step 1: Choose any z_{i+1} ∈ A(z_i).
Step 2: Set i = i + 1 and go to Step 1.

(2.4) Remark. Algorithm Model (2.3) has no provision for stopping and thus always generates an infinite sequence. However, many algorithms have stopping tests and stop only when z_i ∈ Δ. This can be accounted for in (2.3) by defining A(z_i) = {z_i} whenever z_i satisfies the stopping condition. Thus our analyses are shortened because we do not have to consider the trivial finite sequence case.

We now state our concept of convergence.

(2.5) Definition. We say the Algorithm Model (2.3) is convergent if the accumulation points of any sequence {z_i} constructed by (2.3) are in Δ. (When we say that a sequence {y_i} converges, we still mean it in the usual sense, i.e., for some y, y_i → y as i → ∞.)
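Algorithm Model (2.3) is easy to render in code. The following sketch uses a hypothetical map A and an arbitrary selection rule (neither is from the paper) to show the recursion z_{i+1} ∈ A(z_i):

```python
import random

def run_algorithm_model(A, z0, steps):
    """Algorithm Model (2.3): recursively apply a point-to-set map A,
    choosing z_{i+1} arbitrarily from the set A(z_i)."""
    z = z0
    trajectory = [z0]
    for _ in range(steps):
        candidates = A(z)                      # A(z): nonempty set of successors
        z = random.choice(sorted(candidates))  # any selection rule is permitted
        trajectory.append(z)
    return trajectory

# Hypothetical example on Omega = R: A(z) = {z/2, z/3}. Every admissible
# selection shrinks |z| by at least half, so every run accumulates only at 0.
traj = run_algorithm_model(lambda z: {z / 2.0, z / 3.0}, z0=9.0, steps=60)
print(abs(traj[-1]) < 1e-9)
```

Whatever selections are made, the trajectory of this toy map has 0 as its only accumulation point, matching the notion of convergence in Definition (2.5) when Δ = {0}.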
(2.6) Remark. We make no assumption that {z_i} will have accumulation points. Thus, it is possible for (2.3) to be convergent and for {z_i} to have no accumulation points. For example, for an optimization problem with no solution, defining Δ as the set of solutions means that Δ = ∅, and the application of a convergent algorithm results in a sequence {z_i} that cannot have accumulation points.

The definitions (2.1), (2.3) and (2.5) constitute the common structure within which we shall carry out our analysis. To conclude this section, we establish some notation and state some results that will be useful later. All of the sufficient conditions in Section 3 assume the existence of a function c : Ω → R¹ and imply that c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Ω. In most applications, c is the cost function in the optimization problem. Since c is used frequently, we establish the following notational convention and state a lemma whose proof is straightforward and therefore omitted.

(2.7) Notation. The symbol c always represents a function c : Ω → R¹.

(2.8) Lemma. Suppose {z_i} ⊂ Ω is such that c(z_{i+1}) ≤ c(z_i), i = 0, 1, …. Then {c(z_i)} converges if and only if some subsequence of {c(z_i)} converges.
The following properties of first countable Hausdorff topological spaces are well known (see [31]) and are stated here for reference.

(2.9) Facts. (i) If z is an accumulation point of {z_i} ⊂ Ω, then there exists a subsequence of {z_i} converging to z.
(ii) For each z ∈ Ω, there exists a sequence of neighborhoods of z, {U_i}, such that U_{i+1} ⊂ U_i, i = 0, 1, …, and z_i ∈ U_i, i = 0, 1, …, implies that z_i → z.
(iii) For any sequence {z_i} ⊂ S ⊂ Ω with S compact, there exists a subsequence converging to a point in S.
3. Comparison of sufficient conditions

In this section we present a number of sets of sufficient conditions for Algorithm Model (2.3) to be convergent in the sense of (2.5). Three of these sets, (3.3), (3.18) and (3.38), are new, while the remaining sets have been extracted from previous results. We start by proving that (3.3) is sufficient. Then we establish the relationships among the various sets of conditions as indicated in Fig. 1. As can be seen from this figure, all conditions ultimately imply (3.3). Thus, all conditions presented are indeed sufficient.

(3.1) Definition. c(·) is said to be locally bounded from below at z if there exist a
neighborhood U of z and b ∈ R¹ (possibly depending on z) such that

(3.2)    c(z′) ≥ b ∀z′ ∈ U.
(3.3) Conditions. (i) c(·) is locally bounded from below on Ω − Δ.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) For each z ∈ Ω − Δ, if {x_i} ⊂ Ω is such that x_i → z and c(x_i) → c*, then there exists an integer N such that

(3.4)    c(y) < c* ∀y ∈ A(x_N).
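As a numeric illustration (not a proof), the strict-descent flavor of Conditions (3.3) can be spot-checked on a toy single-valued map; the map, cost and desirable set below are hypothetical choices, not taken from the paper:

```python
# Spot-check of monotone strict descent off the desirable set, in the spirit
# of Conditions (3.3), for the hypothetical map A(z) = {z/2} on Omega = R
# with cost(z) = z**2 and desirable set Delta = {0}.
def cost(z):
    return z * z

def A(z):
    return {z / 2.0}

# Off Delta, every successor strictly decreases the cost, which is the kind
# of behavior (3.3)(iii) demands near any non-desirable point.
points = [k / 10.0 for k in range(-50, 51) if k != 0]
print(all(cost(y) < cost(z) for z in points for y in A(z)))
```

For this map one can take c* = cost(z) along any sequence x_i → z, and the printed check confirms that each application of A drops strictly below that limit.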
(3.5) Theorem. If Conditions (3.3) hold, then Algorithm Model (2.3) is convergent.

Proof. First we note that (3.3)(iii) implies that c(z′) < c(z) ∀z′ ∈ A(z), z ∈ Ω − Δ and hence, together with (3.3)(ii), that

(3.6)    c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Ω.

Let {z_i} be any infinite sequence constructed by the Algorithm Model (2.3) and suppose that z* ∈ Ω − Δ is an accumulation point of {z_i}. We shall establish a contradiction. Let K ⊂ {0, 1, …} index a subsequence such that z_i →_K z*. Because of (3.3)(i) and (3.6), there exists a c* such that c(z_i) →_K c*. Lemma (2.8) then implies that c(z_i) → c* and it follows that

(3.7)    c(z_i) ≥ c*  for i = 0, 1, ….

On the other hand, using {z_i}_{i∈K} for {x_i} in (3.3)(iii) yields the existence of N such that c(z_{N+1}) < c*, which contradicts (3.7). Thus z* ∈ Δ and the proof is complete.

(3.8) Conditions. (i) c(·) is locally bounded from below on Ω − Δ.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) For each z ∈ Ω − Δ, if {x_i} ⊂ Ω is such that x_i → z and c(x_i) → c*, and y_i ∈ A(x_i) are such that c(y_i) → ĉ, then ĉ < c*.

(3.9) Theorem. Conditions (3.8) imply Conditions (3.3).
Proof. Suppose that Conditions (3.8) hold. Then Conditions (3.3)(i) and (3.3)(ii) hold since these are identical to (3.8)(i) and (3.8)(ii). Conditions (3.8)(ii) and (3.8)(iii) imply that c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Ω. Now let z ∈ Ω − Δ and let {x_i} ⊂ Ω be such that x_i → z and c(x_i) → c*. We now
assume that (3.3)(iii) does not hold and establish a contradiction. If (3.3)(iii) does not hold, there exist y_i ∈ A(x_i), i = 0, 1, …, such that c(y_i) ≥ c*, i = 0, 1, …. Thus, we have c(x_i) ≥ c(y_i) ≥ c*, i = 0, 1, …, which implies that lim c(y_i) = c* and this contradicts (3.8)(iii). The proof is now complete.

(3.10) Definition. The pair (c, A) is locally uniformly monotonic at z if there exist δ̂(z) > 0 (possibly depending on z) and a neighborhood U(z) of z such that

(3.11)    c(z″) − c(z′) ≤ −δ̂(z) ∀z″ ∈ A(z′), z′ ∈ U(z).
(3.12) Remark. Polak [21] was the first to use local uniform monotonicity. G. Meyer [13] later generalized the Polak conditions by using local boundedness from below of c(·) instead of boundedness from below.

(3.13) Conditions (G. Meyer [13]). (i) c(·) is locally bounded from below on Ω − Δ.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) The pair (c, A) is locally uniformly monotonic on Ω − Δ.

(3.14) Theorem. (a) Conditions (3.13) imply Conditions (3.8). (b) Conditions (3.8) with the additional assumption that c(·) is locally bounded imply Conditions (3.13).

Proof. (a) Suppose that Conditions (3.13) hold. Then Conditions (3.8)(i) and (3.8)(ii) hold since these are identical to Conditions (3.13)(i) and (3.13)(ii). Consider z ∈ Ω − Δ and let δ̂(z) and U(z) be as in Definition (3.10). Let {x_i}, {y_i}, c* and ĉ be as given by (3.8)(iii). Then (3.13)(iii) implies that there exists an integer N such that c(y_i) ≤ c(x_i) − δ̂(z) for all i ≥ N. Hence, ĉ = lim c(y_i) ≤ c* − δ̂(z) < c*, and (3.8)(iii) holds.

(b) Suppose that Conditions (3.8) hold and that c(·) is locally bounded. We assume that (3.13)(iii) does not hold at some z ∈ Ω − Δ and establish a contradiction. Let {U_i} be a sequence of neighborhoods of z satisfying (2.9)(ii). Then there exist δ_i → 0, x_i ∈ U_i and y_i ∈ A(x_i) such that

(3.15)    c(x_i) ≥ c(y_i) > c(x_i) − δ_i,  i = 0, 1, ….

Since c is locally bounded, there exists b ≥ 0 such that

(3.16)    |c(x_i)| ≤ b,  i = 0, 1, ….
Thus, there exist a subsequence {c(x_i)}_{i∈K} and c* such that c(x_i) →_K c*. (3.15) then implies that

(3.17)    lim_K c(y_i) = lim_K c(x_i) = c*.

Since x_i ∈ U_i, x_i → z and y_i ∈ A(x_i), (3.8)(iii) requires that lim_K c(y_i) < c*, which contradicts (3.17). Thus (3.13)(iii) must hold. The proof is now complete.

(3.18) Conditions. There exists δ : Ω → R⁺ with the following properties.
(i) c(·) is locally bounded from below on Ω − Δ.
(ii) c(z′) − c(z) ≤ −δ(z) ≤ 0 ∀z′ ∈ A(z), z ∈ Ω.
(iii) For each z ∈ Ω − Δ, if {x_i} ⊂ Ω is such that x_i → z, then Σ_{i=0}^∞ δ(x_i) = ∞.

(3.19) Lemma. Suppose z ∈ Ω and δ : Ω → R⁺ is such that Σ_{i=0}^∞ δ(x_i) = ∞ for all sequences {x_i} that converge to z. Then there exist V(z), a neighborhood of z, and δ̄ > 0 such that δ(z′) ≥ δ̄ for all z′ ∈ V(z).
Proof. Suppose the lemma is false. Then we can find {z_i} with z_i → z and δ(z_i) → 0. Define the map n by

(3.20)    n(i) = min{ j | j ≥ i + 1, δ(z_j) ≤ (½)^j }.

n is well defined since δ(z_i) → 0. Let the sequence {x_i} be defined as x_0 = z_0, x_1 = z_{n(0)}, x_2 = z_{n(n(0))}, x_3 = z_{n(n(n(0)))}, and so forth. Then Σ_{i=0}^∞ δ(x_i) ≤ δ(z_0) + Σ_{j=0}^∞ (½)^j < ∞. But x_i → z and thus, by the hypothesis, Σ_{i=0}^∞ δ(x_i) = ∞. So we have a contradiction and the lemma must be true.

(3.21) Theorem. Conditions (3.18) imply Conditions (3.13) and vice versa (i.e. (3.18) ⇔ (3.13)).

Proof. (⇒) Clearly, (3.18)(i) implies (3.13)(i) and (3.18)(ii) implies (3.13)(ii). Let z ∈ Ω − Δ. Conditions (3.18)(ii), (3.18)(iii) and Lemma (3.19) imply that there exist V(z), a neighborhood of z, and δ̄ > 0 such that

c(z″) − c(z′) ≤ −δ(z′) ≤ −δ̄  ∀z″ ∈ A(z′), z′ ∈ V(z).
Consequently, (c, A) is locally uniformly monotonic at z. Condition (3.13)(iii) is therefore established.

(⇐) Condition (3.13)(i) is identical to (3.18)(i). For each z ∈ Ω − Δ, let δ̂(z) > 0 and U(z) be as given in Definition (3.10). Define δ : Ω → R⁺ by

(3.22)    δ(z) ≜ inf{ c(z) − c(z′) | z′ ∈ A(z) }.

Conditions (3.13)(ii) and (3.13)(iii) imply that δ(z) ≥ 0 for all z ∈ Ω and thus
(3.18)(ii) holds. Now, for z ∈ Ω − Δ, δ(z′) ≥ δ̂(z) > 0 for all z′ ∈ U(z). Therefore, whenever x_i → z, Σ_{i=0}^∞ δ(x_i) = ∞ and (3.18)(iii) holds. The proof is now complete.

(3.23) Conditions (Polak [21]). (i) c(·) is either lower semicontinuous on Ω − Δ or bounded from below on Ω.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) (c, A) is locally uniformly monotonic on Ω − Δ.
(3.24) Theorem. (a) Conditions (3.23) imply Conditions (3.13). (b) Conditions (3.13) with the additional assumption that c(·) is lower semicontinuous imply Conditions (3.23).

Proof. (a) Clearly, (3.23)(i) implies that c(·) is locally bounded from below. Thus (a) is true because (3.23)(ii) and (3.23)(iii) are identical to (3.13)(ii) and (3.13)(iii). (b) When c(·) is lower semicontinuous, (3.23)(i) is satisfied and therefore Conditions (3.13) imply Conditions (3.23). Thus, (b) is true.

(3.25) Conditions (R. Meyer [16]). δ : Ω → R⁺ is such that:
(i) c(·) is either lower semicontinuous on Ω − Δ or bounded from below on Ω.
(ii) c(z′) ≤ c(z) − δ(z) ∀z′ ∈ A(z), z ∈ Ω.
(iii) For each z ∈ Ω, if z_i → z and δ(z_i) → 0, then δ(z) = 0.
(iv) {z′ ∈ Ω | δ(z′) = 0} ⊂ Δ.

(3.26) Theorem. (a) Conditions (3.25) imply Conditions (3.23). (b) Conditions (3.23) imply Conditions (3.25) when Δ is closed.

Proof. (a) Conditions (3.25)(i) and (3.23)(i) are identical. Since δ has only nonnegative values, (3.25)(ii) implies (3.23)(ii). Now suppose z ∈ Ω − Δ. By (3.25)(iv),

(3.27)    δ(z) > 0.
Let {U_i} be a sequence of neighborhoods of z satisfying (2.9)(ii). Assume that (3.25) holds, but (3.23)(iii) does not hold at z. Then there exist {z_i}, {y_i} and {δ_i} such that z_i ∈ U_i (and hence z_i → z), y_i ∈ A(z_i), δ_i → 0 and c(y_i) > c(z_i) − δ_i, i = 0, 1, …. Combining this inequality with (3.25)(ii) yields

c(z_i) − δ(z_i) ≥ c(y_i) > c(z_i) − δ_i,  i = 0, 1, ….
Rearranging, we have that 0 ≤ δ(z_i) < δ_i, i = 0, 1, …, which means that δ(z_i) → 0. But since z_i → z, (3.25)(iii) requires δ(z) = 0, which contradicts (3.27), and the proof of (a) is complete.

(b) As before, (3.23)(i) and (3.25)(i) are identical. Now define δ : Ω → R⁺ by

(3.28)    δ(z) = 0 if z ∈ Δ,  δ(z) = inf{ c(z) − c(z′) | z′ ∈ A(z) } if z ∈ Ω − Δ.
Then (3.23)(iii) implies that δ(z) ≥ 0 for all z ∈ Ω, and (3.23)(ii) along with (3.28) imply (3.25)(ii). Now consider z ∈ Ω − Δ. Because Δ is closed and because of (3.23)(iii), there exist a neighborhood U of z and δ̄ > 0 such that U ⊂ Ω − Δ and δ(z′) ≥ δ̄ > 0 for all z′ ∈ U. Thus if we have z_i → z* ∈ Ω and δ(z_i) → 0, we must have that z* ∈ Δ, which implies that δ(z*) = 0. Consequently, (3.25)(iii) holds. Also, Δ = {z′ ∈ Ω | δ(z′) = 0} and therefore (3.25)(iv) holds. The proof is now complete.

(3.29) Remark. If Conditions (3.23) hold, they will also hold if Δ is replaced by any Δ′ such that Δ ⊂ Δ′. Because of this latitude in selecting Δ in (3.23), Conditions (3.23) and (3.25) are not equivalent. However, if one chooses Δ to be as small as possible so that Conditions (3.23) hold for a given c(·) and A(·), then Corollary (3.30) shows that Conditions (3.23) are equivalent to Conditions (3.25).

(3.30) Corollary. Suppose
that Conditions (3.23) are satisfied. In addition, suppose that Δ = Ω − Â, where Â ≜ {z | (c, A) is locally uniformly monotonic at z}. (Conditions (3.23) only imply that Ω − Δ ⊂ Â.) Then Conditions (3.25) are satisfied.
Proof. In view of Theorem (3.26), it will suffice to show that Δ is closed. We assume that Δ is not closed and establish a contradiction. If Δ is not closed, there exists z*, an accumulation point of Δ, such that z* ∉ Δ (i.e. z* ∈ Â). Thus there exist a neighborhood N of z* and a δ̄ > 0 such that

c(z″) − c(z′) ≤ −δ̄ ∀z″ ∈ A(z′), z′ ∈ N.
Since N ∩ Δ ≠ ∅, we can choose a ẑ ∈ N ∩ Δ. But N is also a neighborhood of ẑ, and the above inequality implies that (c, A) is locally uniformly monotonic at ẑ. That is, ẑ ∈ Â, and we have ẑ ∈ Δ ∩ Â = ∅, which is a contradiction. Therefore, Δ is closed and the proof is complete.

(3.31) Conditions (G. Meyer [13]). (i) c(·) is lower semicontinuous on Ω − Δ.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) For each z ∈ Ω − Δ, there exists a neighborhood U of z such that c(z″) < c(z) ∀z″ ∈ A(z′), z′ ∈ U.
(3.32) Theorem. (a) Conditions (3.31) imply Conditions (3.3). (b) Conditions (3.3) with the additional assumption that c(·) is continuous imply Conditions (3.31).

Proof. (a) Assume that Conditions (3.31) hold. Then (3.3)(i) holds since lower semicontinuity implies local boundedness from below. Next, (3.3)(ii) is identical to (3.31)(ii). Let z ∈ Ω − Δ and let {x_i} ⊂ Ω be such that x_i → z and c(x_i) → c*. The lower semicontinuity of c(·) implies that

(3.33)    c(z) ≤ c* = lim c(x_i).

There must exist an integer N such that x_N ∈ U, where U is given by (3.31)(iii). Hence

(3.34)    c(y) < c(z) ∀y ∈ A(x_N).
Combining (3.33) and (3.34) shows that (3.3)(iii) holds.

(b) Assume that Conditions (3.3) hold and that c is continuous. Since we are assuming that c(·) is continuous, Condition (3.31)(i) holds. Condition (3.31)(ii) holds since it is identical to (3.3)(ii). Let z ∈ Ω − Δ and let {U_i} be a sequence of neighborhoods of z that satisfies (2.9)(ii). If (3.31)(iii) does not hold, we can find x_i ∈ U_i and y_i ∈ A(x_i) such that

(3.35)    c(y_i) ≥ c(z),  i = 0, 1, ….

By the construction of {U_i}, x_i → z, and therefore the continuity of c(·) implies that lim c(x_i) = c(z). But then (3.35) contradicts (3.3)(iii), so we must have that (3.31)(iii) holds. The proof is now complete.
(3.36) Definition. The composite map c(A(·)) is super upper semicontinuous at z if for each ε > 0, there exist ẑ ∈ A(z) and a neighborhood U of z such that

(3.37)    c(z″) < c(ẑ) + ε ∀z″ ∈ A(z′), z′ ∈ U.
(3.38) Conditions. (i) c(·) is lower semicontinuous on Ω − Δ.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) For each z ∈ Ω − Δ, there exists γ(z) > 0 (possibly depending on z) such that c(z′) ≤ c(z) − γ(z) ∀z′ ∈ A(z).
(iv) c(A(·)) is super upper semicontinuous on Ω − Δ.
(3.39) Theorem. (a) Conditions (3.38) imply Conditions (3.31). (b) Conditions (3.38) imply Conditions (3.23).
Proof. Suppose Conditions (3.38) hold. (a) Then Conditions (3.31)(i) and (3.31)(ii) hold because they are identical to (3.38)(i) and (3.38)(ii). Let z ∈ Ω − Δ and let γ(z) > 0 be as given in (3.38)(iii). Set ε = ½γ(z) in Definition (3.36) and denote the required neighborhood by U. Then, with ẑ ∈ A(z) as in (3.36),

(3.40)    c(z″) ≤ c(ẑ) + ½γ(z) ≤ c(z) − ½γ(z) < c(z) ∀z″ ∈ A(z′), z′ ∈ U,

and consequently, (3.31)(iii) holds.

(b) Condition (3.38)(i) clearly implies (3.23)(i) and (3.38)(ii) is identical to (3.23)(ii). Let z ∈ Ω − Δ and let γ(z) > 0 be as given by (3.38)(iii). Let ẑ ∈ A(z) and U be as given by Definition (3.36) with ε = ¼γ(z). Then we have
(3.41)    c(z″) ≤ c(ẑ) + ¼γ(z) ≤ c(z) − γ(z) + ¼γ(z) = c(z) − ¾γ(z) ∀z″ ∈ A(z′), z′ ∈ U.

Since c(·) is lower semicontinuous, there exists a neighborhood U₁ of z with U₁ ⊂ U such that

(3.42)    c(z) − ¼γ(z) ≤ c(z′) ∀z′ ∈ U₁.

Combining (3.41) and (3.42) yields

(3.43)    c(z″) ≤ c(z′) − ½γ(z) ∀z″ ∈ A(z′), z′ ∈ U₁.

Thus, (c, A) is locally uniformly monotonic on Ω − Δ, i.e., (3.23)(iii) holds. The proof is now complete.
(3.44) Conditions (Polyak [24]). (i) c(·) is lower semicontinuous on Ω − Δ.
(ii) A(·) is single valued (denoted by a(·)) ∀z ∈ Ω.
(iii) c(a(z)) ≤ c(z) ∀z ∈ Δ.
(iv) c(a(z)) < c(z) ∀z ∈ Ω − Δ.
(v) c(a(·)) is upper semicontinuous on Ω − Δ.
(3.45) Theorem. (a) Conditions (3.44) imply Conditions (3.38). (b) Conditions (3.38) imply Conditions (3.44) when A(·) is a single valued map.

Proof. First we note that when A(·) (= a(·)) is single valued, Definition (3.36) implies that c(a(·)) is upper semicontinuous. Thus for A(·) = a(·), (3.44)(i) and (3.38)(i) are equivalent, (3.44)(iii) and (3.38)(ii) are equivalent, and (3.44)(v) and (3.38)(iv) are equivalent. Furthermore, when A(·) = a(·), it is easy to see that (3.44)(iv) is equivalent to (3.38)(iii). Thus both (a) and (b) follow immediately. The proof is now complete.

(3.46) Definition. A(·) is said to be closed at z if z_i → z, y_i ∈ A(z_i) and y_i → y imply that y ∈ A(z).
(3.47) Conditions (Zangwill [29]). (i) c(·) is continuous on Ω.
(ii) c(z′) ≤ c(z) ∀z′ ∈ A(z), z ∈ Δ.
(iii) c(z′) < c(z) ∀z′ ∈ A(z), z ∈ Ω − Δ.
(iv) A is closed on Ω − Δ.
(v) For each z ∈ Ω − Δ, if x_i → z and y_i ∈ A(x_i), then {y_i} is compact.
(3.48) Theorem. Conditions (3.47) imply Conditions (3.38).
Proof. Suppose Conditions (3.47) hold. Condition (3.47)(i) clearly implies (3.38)(i). Condition (3.47)(ii) is identical to (3.38)(ii) and hence (3.38)(i), (ii) hold. Let z ∈ Ω − Δ and let {ξ_i} ⊂ A(z) be such that c(ξ_i) → sup{ c(ξ′) | ξ′ ∈ A(z) }. Condition (3.47)(v) implies that {ξ_i} is compact and hence there exist a subsequence {ξ_i}_{i∈K₁} and ξ* such that ξ_i →_{K₁} ξ*. Condition (3.47)(iv) implies that ξ* ∈ A(z) and thus, because c(·) is continuous ((3.47)(i)),

(3.49)    c(ξ*) = max{ c(ξ′) | ξ′ ∈ A(z) }.

Now (3.49) and (3.47)(iii) yield

(3.50)    c(z′) ≤ c(ξ*) < c(z) ∀z′ ∈ A(z),

and so we have

(3.51)    c(z′) ≤ c(z) − (c(z) − c(ξ*)) ≜ c(z) − γ(z) ∀z′ ∈ A(z)
with γ(z) > 0, which establishes (3.38)(iii).

To show that (3.38)(iv) holds, we assume the contrary and establish a contradiction. Let z ∈ Ω − Δ and let ξ* ∈ A(z) be as above. Let {U_i} be a sequence of neighborhoods of z satisfying (2.9)(ii). If (3.38)(iv) does not hold, we can find ε > 0, z′_i ∈ U_i and z″_i ∈ A(z′_i) such that

(3.52)    c(z″_i) > c(ξ*) + ε,  i = 0, 1, ….

By construction, z′_i → z. Condition (3.47)(v) then implies that {z″_i} is compact. Hence, there exist a subsequence {z″_i}_{i∈K₂} and z″* such that z″_i →_{K₂} z″*. From (3.52) we conclude that

(3.53)    c(z″*) ≥ c(ξ*) + ε.

Since A(·) is closed ((3.47)(iv)), z″* ∈ A(z), and (3.50) combines with (3.53) to yield

(3.54)    c(z″*) ≥ c(ξ*) + ε > c(ξ*) ≥ c(z″*),

and we have a contradiction. Consequently, (3.38)(iv) must hold and the proof is complete.
4. Extension to the time varying case

In this section, we modify Conditions (3.3) to apply to the case where the point-to-set map depends upon the iteration number, i. The other sufficient conditions can be extended in a similar fashion. These extensions are relatively straightforward. Therefore, we extend only Conditions (3.3) (the most general conditions) as an example of what can be done.

(4.1) Time varying algorithm model. Let A_i : Ω → 2^Ω − ∅ for i = 0, 1, ….
Step 0: Set i = 0. Choose any z_0 ∈ Ω.
Step 1: Choose any z_{i+1} ∈ A_i(z_i).
Step 2: Set i = i + 1 and go to Step 1.
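Model (4.1) differs from (2.3) only in that the map may change with i. A minimal sketch, with hypothetical maps A_i (the particular family below is an illustration, not from the paper):

```python
def run_time_varying(maps, z0):
    """Time-varying model (4.1): z_{i+1} is chosen from A_i(z_i), where the
    point-to-set map A_i may change with the iteration index i."""
    z = z0
    for A_i in maps:
        z = min(A_i(z))  # any selection from the (nonempty) set is allowed
    return z

# Hypothetical maps: A_i(z) = {z * i/(i+1)} shrinks z ever more slowly, yet
# the product of the factors i/(i+1), i = 1..n, is 1/(n+1) -> 0, so the
# iterates still tend toward the desirable point 0.
maps = [(lambda z, i=i: {z * i / (i + 1.0)}) for i in range(1, 2001)]
print(abs(run_time_varying(maps, 8.0)) < 0.01)
```

The per-step decrease here vanishes as i grows, which is exactly why the time-varying conditions (4.2) must control behavior uniformly beyond some index N₁ rather than at a single fixed map.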
(4.2) Conditions. (i) c(·) is locally bounded from below on Ω − Δ.
(ii) There exists an integer N₁ ≥ 0 such that c(z′) ≤ c(z) ∀z′ ∈ A_i(z), ∀z ∈ Ω, i ≥ N₁.
(iii) For each z ∈ Ω − Δ, if {x_i} ⊂ Ω is such that x_i → z and c(x_i) → c*, then there exists an integer N₂ ≥ N₁ such that c(y) < c* ∀y ∈ A_{N₂}(x_{N₂}).

(4.3) Theorem. If Conditions (4.2) hold, then Algorithm Model (4.1) is convergent (in the sense of Definition (2.5)).
Proof. Let z* be an accumulation point of {z_i}, the sequence constructed by (4.1). We assume that z* ∈ Ω − Δ and establish a contradiction. There exists a subsequence {z_i}_{i∈K} such that z_i →_K z*. Without loss of generality, we can also assume that {c(z_i)}_{i∈K} is monotonically decreasing because of (4.2)(ii). If z* ∈ Ω − Δ, (4.2)(i) implies that {c(z_i)}_{i∈K} is bounded from below and hence c(z_i) →_K c*. Lemma (2.8) can be applied to obtain that c(z_i) → c* and

(4.4)    c(z_i) ≥ c* ∀i ≥ N₁.

But, if z* ∈ Ω − Δ, (4.2)(iii) implies that

(4.5)    c(z_{N₂+1}) < c*,

which contradicts (4.4). Thus, we must have z* ∈ Δ and the proof is complete.
(4.6) Remark. It is immediately obvious that Conditions (3.3) imply Conditions (4.2) when A_i ≡ A for i = 0, 1, 2, ….

Appendix A. Selected counterexamples

The purpose of this Appendix is to show, by means of counterexamples, that certain of the implications not proved in Section 3 cannot, in fact, be proved. In
the first set of counterexamples c is continuous, Δ is closed and A is single valued. Under these restrictions, the sets of sufficient conditions aggregate into four equivalence classes (see Fig. 1). These are:
Class I: Conditions (3.3) and (3.31).
Class II: Conditions (3.8), (3.13), (3.18), (3.23) and (3.25).
Class III: Conditions (3.38) and (3.44).
Class IV: Conditions (3.47).
It is immediately evident that IV implies III, III implies II, and II implies I. We shall present counterexamples to show that the converse is false. The first two counterexamples will be constructed from the following optimization problem and algorithm.

(A.1) Problem. min{ c(z) | z ∈ R¹ }, where c : R¹ → R¹ is defined by

(A.2)    c(z) = −z − ¾ for z ≤ −½,  c(z) = z² − ½ for −½ < z < ½,  c(z) = z − ¾ for ½ ≤ z.
(A.3) Remark. c is continuously differentiable and

(A.4)    c′(z) = −1 for z ≤ −½,  c′(z) = 2z for −½ < z < ½,  c′(z) = 1 for ½ ≤ z.
(A.5) Algorithm. Data: z_0 ∈ R¹ arbitrary. Parameters: α ∈ [0, 1), β ∈ (0, 1), σ > 0.
Step 0: Set i = 0.
Step 1: If c′(z_i) = 0, set z_{i+1} = z_i. Else, set z_{i+1} = z_i − σβ^{j(z_i)} c′(z_i), where j(z_i) is the smallest nonnegative integer satisfying

(A.6)    c(z_i − σβ^{j(z_i)} c′(z_i)) − c(z_i) ≤ −ασβ^{j(z_i)} c′(z_i)².

Step 2: Set i = i + 1 and go to Step 1.

(A.7) Remark. When α ≠ 0, this algorithm is a version of the Armijo gradient method [32]. Our counterexamples are constructed by taking various values of the parameters α, β and σ.
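For concreteness, Algorithm (A.5) with the cost (A.2) can be sketched as follows. The piecewise c below is the reconstruction given in (A.2), and the parameter values α = 0.25, β = 0.5, σ = 1 are chosen here only so that the sketch descends to the desirable point 0; the counterexamples (A.9), (A.11) and (A.16) use other values:

```python
def c(z):
    # Piecewise cost from (A.2), as reconstructed above.
    if z <= -0.5:
        return -z - 0.75
    if z < 0.5:
        return z * z - 0.5
    return z - 0.75

def c_prime(z):
    # Its derivative (A.4).
    if z <= -0.5:
        return -1.0
    if z < 0.5:
        return 2.0 * z
    return 1.0

def armijo_step(z, alpha, beta, sigma):
    """One step of Algorithm (A.5): find the smallest j >= 0 with
    c(z - sigma*beta**j*c'(z)) - c(z) <= -alpha*sigma*beta**j*c'(z)**2."""
    g = c_prime(z)
    if g == 0.0:
        return z  # z is desirable (c'(z) = 0); the algorithm holds it fixed
    j = 0
    while c(z - sigma * beta**j * g) - c(z) > -alpha * sigma * beta**j * g * g:
        j += 1
    return z - sigma * beta**j * g

z = 0.6
for _ in range(50):
    z = armijo_step(z, alpha=0.25, beta=0.5, sigma=1.0)
print(abs(z) < 1e-6)
```

With α > 0 the sufficient-decrease test forces genuine progress at every step; the counterexamples exploit what can go wrong for other parameter settings.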
(A.8) Definition. Ω ≜ R¹, Δ ≜ {z | c′(z) = 0} = {0}, and

A(z) ≜ z if z ∈ Δ,  A(z) ≜ z − σβ^{j(z)} c′(z) if z ∈ Ω − Δ.
(A.9) Counterexample (I ⇏ II). We take α = 0, β = ½ and σ = 1 in (A.5) and show that Conditions (3.31) hold but Condition (3.8)(iii) does not hold. Straightforward
calculations show that in this case

(A.10)    a(z) = z + 1 for z ≤ −½,  a(z) = 0 for −½ < z < ½,  a(z) = z − 1 for ½ ≤ z.

If 0 < z < ½, there exists ε = ½ − z such that a(z′) = 0 for all z′ ∈ (z − ε, z + ε), and hence c(a(z′)) < c(z). If z ≥ ½, then −z < a(z′) < z for all z′ ∈ (z − ½, z + ½), which means that c(a(z′)) < c(z). Similar behavior occurs for z < 0, and thus condition (3.31)(iii) is satisfied at any z ≠ 0. It is also clear that (3.31)(i) and (3.31)(ii) hold.

Now we show that Condition (3.8)(iii) is not satisfied at z = ½. Let {z_i} be defined as z_i = ½ + 1/i for i = 1, 2, …. Clearly z_i → z and c(z_i) → c(z) = −¼. Also, by (A.10), y_i = a(z_i) = z_i − 1 ∈ [−½, ½] for i = 1, 2, …. Hence c(y_i) = (z_i − 1)² − ½ → (z − 1)² − ½ = −¼ = c(z) = lim c(z_i). However, (3.8)(iii) requires that lim c(y_i) < c(z). Consequently, we have shown that Conditions (3.31) do not imply Conditions (3.8).

(A.11) Counterexample (II ⇏ III). We take α = ⅓, β = ⅓ and σ = 3, and show that Conditions (3.23) are satisfied but that Condition (3.44)(v) does not hold. It is immediately evident that (3.23)(i) and (3.23)(ii) hold. Suppose z ∉ Δ, i.e. c′(z) ≠ 0. For a nonnegative integer j, we have, by use of the Taylor series expansion,
c(z - (])(~)Jc'(z)) - c(z) + (89
= - (~)(~)Jc'(z)c'(z) = (});[- c ' ( z Y
+
2=
(89
+ o((~)(~)~c'(z))
+ o((~)(~)Sc'(z))/(~)q
where o(x)/|x| → 0 as x → 0. Thus, since c′(z) ≠ 0, there exists an integer j̄ such that the right-hand side of (A.12) is strictly less than zero for j = j̄. Hence

(A.13)  c(z − σβ^j̄ c′(z)) − c(z) < −ασβ^j̄ c′(z)².

Since c and c′ are continuous, there exists an ε > 0 such that

(A.14)  c(z′ − σβ^j̄ c′(z′)) − c(z′) < −ασβ^j̄ c′(z′)² < −(½)ασβ^j̄ c′(z)²   ∀ |z′ − z| < ε.

We therefore can conclude that j(z′) ≤ j̄ for all |z′ − z| < ε, and hence

(A.15)  c(a(z′)) − c(z′) ≤ −ασβ^{j(z′)} c′(z′)² ≤ −(½)ασβ^j̄ c′(z)² < 0   ∀ |z′ − z| < ε.
Thus, we have established Condition (3.23)(iii).

To show that Condition (3.44)(v) does not hold, we show that c(a(·)) is not upper semicontinuous at z = −1. A straightforward calculation shows that j(z) = 1, a(z) = 0 and hence c(a(z)) = −¾. Now let z′ = z − ε for some ε > 0. It is easy to show that for ε > 0 sufficiently small, j(z′) = 0, a(z′) = ½ − ε and hence c(a(z′)) = (½ − ε)² − ¾ = ε² − ε − ½. Thus, there exists an ε̄ > 0 such that c(a(z − ε)) > −⅝ > −¾ = c(a(z)) for all ε ∈ (0, ε̄], and c(a(·)) is not upper semicontinuous at z = −1.

(A.16) Counterexample (III ⇏ IV). For this counterexample, we apply Algorithm (A.5) to a new function c, where

(A.17)  c(z) = { z    for z > 1,
                 z²   for −1 ≤ z ≤ 1,
                 −z   for z < −1.
This c is not differentiable at 1 and −1, so at these points we take c′(1) = lim_{z↑1} c′(z) = 2 and c′(−1) = lim_{z↓−1} c′(z) = −2 in Algorithm (A.5). We also select σ = ¾, α = ⅛ and β equal to any positive number. Under these conditions, it is easy to show that j(z) = 0 for all z and thus

(A.18)  a(z) = { z − ¾   for z > 1,
                 −z/2    for −1 ≤ z ≤ 1,
                 z + ¾   for z < −1.

We shall now show that Conditions (3.44) hold but that Condition (3.47)(iv) does not hold. It is clear that (3.44)(i)–(3.44)(iv) hold. Also, it is obvious that (3.44)(v) holds for all z except, possibly, at z = 1 and z = −1. We show that c(a(·)) is upper semicontinuous at z = 1, and thus, by symmetry, (3.44)(v) holds for all z. First, c(a(1)) = c(−½) = (−½)² = ¼. Now consider ε ∈ (0, ¾]. Then a(1 + ε) ∈ (¼, 1]. Thus, c(a(1 + ε)) = [a(1 + ε)]² = (1 + ε − ¾)² = (1 + ε)² − (3/2)(1 + ε) + 9/16. On the other hand, a(1 − ε) = −(1 − ε)/2 ∈ (−½, 0), and so c(a(1 − ε)) = ¼(ε − 1)². Consequently,

(A.19)  lim_{ε↓0} c(a(1 + ε)) = 1/16 < ¼ = c(a(1)) = lim_{ε↓0} c(a(1 − ε)).

We conclude from (A.19) that c(a(·)) is upper semicontinuous at 1 and thus everywhere. Therefore, we have shown that Conditions (3.44) hold. On the other hand, lim_{ε↓0} a(1 + ε) = ¼ but a(1) = −½, so that a(·) is not closed (i.e. not continuous) at 1 and (3.47)(iv) does not hold.²

In our last counterexample, we show that Conditions (3.3) do not imply Conditions (3.31). As can be seen from Fig. 1, we shall need a function c(·) that is not continuous. The following lower semicontinuous function will suffice.

(A.20)
c(z) = { z + 1   for z > 1,
         z²      for z ≤ 1.
² It is interesting to note that a counterexample can be constructed using a continuously differentiable function c. In particular, the function and algorithm used in Counterexample (A.11) can be used with suitably chosen values of α, β and σ. After a substantial amount of calculation, it can be shown that the resulting map a(·) is discontinuous at z = ±1 because j(z) is discontinuous at these points.
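The jump of a(·) at z = 1 in Counterexample (A.16) is easy to confirm numerically. The sketch below hardcodes piecewise forms of c and a(·) for that counterexample (c(z) = z for z > 1, z² for |z| ≤ 1, −z for z < −1, with a(z) = z − ¾, −z/2, z + ¾ on the corresponding pieces); treat these exact branch values as assumptions of this sketch.

```python
# Numeric check of Counterexample (A.16): c(a(.)) is upper semicontinuous at
# z = 1 even though a(.) itself jumps there. Branch values are assumptions.

def c(z):
    # assumed form of (A.17)
    if z > 1:
        return z
    if z < -1:
        return -z
    return z * z

def a(z):
    # assumed form of (A.18); the middle branch gives a(1) = -1/2
    if z > 1:
        return z - 0.75
    if z < -1:
        return z + 0.75
    return -z / 2.0

eps = 1e-9
right = c(a(1 + eps))   # tends to (1/4)**2 = 1/16 from the right
left = c(a(1 - eps))    # tends to 1/4 = c(a(1)) from the left
```

Since both one-sided limits are ≤ c(a(1)) = ¼, upper semicontinuity of c(a(·)) survives the jump of a(·), which is exactly the gap between Conditions (3.44) and (3.47) exploited here.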
(A.21) Counterexample ((3.3) ⇏ (3.31)). We apply Algorithm (A.5) to c as defined by (A.20), where we take c′(1) = lim_{z↑1} c′(z) = 2. We also select σ = 9/4, α = ⅛ and β = 2/9. After some computation, it can be shown that

(A.22)  j(z) = { 0   for z > 1,
                 1   for z ≤ 1,

and hence

(A.23)  a(z) = { z − 9/4   for z > 1,
                 0         for z ≤ 1.

We now show that Conditions (3.3) hold but Condition (3.31)(iii) does not hold. Clearly (3.3)(i) and (3.3)(ii) hold. First consider an arbitrary z ∈ {z′ | z′ ≠ 1, z′ ≠ 13/4, z′ ≠ 0}. Then c(a(·)) is continuous at z and c(a(z)) < c(z). Thus there exists ε > 0 such that |z′ − z| < ε implies that c(a(z′)) < c(z). Since z is a point of continuity of c, we conclude that (3.3)(iii) holds for all z ∉ Δ except possibly z = 1 or z = 13/4. Now consider z_i = 1 + 1/i and ẑ_i = 1 − 1/i for i = 1, 2, .... Then

(A.24)  z_i → 1,  ẑ_i → 1,

(A.25)  c(z_i) = 1 + 1/i + 1 → 2 ≜ c*,

(A.26)  c(ẑ_i) = (1 − 1/i)² → 1 ≜ ĉ*,

(A.27)  c(a(z_i)) = (1 + 1/i − 9/4)² → 25/16,

(A.28)  c(a(ẑ_i)) = c(0) = 0.

Thus, there exists an integer N such that

(A.29)  c(a(z_N)) < 2 = c*,

(A.30)  c(a(ẑ_N)) = 0 < 1 = ĉ*,

and (3.3)(iii) holds at z = 1. A similar argument can be used to show that (3.3)(iii) holds at z = 13/4. On the other hand, Condition (3.31)(iii) does not hold at z = 1. To see this, consider any ε ∈ (0, 1). Then a(1 + ε) = 1 + ε − 9/4 = ε − 5/4 ∈ (−5/4, −¼). Therefore,

(A.31)  c(a(1 + ε)) = (ε − 5/4)² = ε² − (5/2)ε + 25/16.

Thus, there exists an ε̄ > 0 such that

(A.32)  c(a(1 + ε)) > 1 = c(1)   ∀ ε ∈ (0, ε̄].

Consequently, Condition (3.31)(iii) does not hold at z = 1.
References

[1] J.W. Daniel, "Convergent step sizes for gradient-like feasible direction algorithms for constrained optimization", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 245-274.
[2] J. Dubois, "Theorems of convergence for improved nonlinear programming algorithms", Operations Research 21 (1) (1973) 328-332.
[3] B.C. Eaves and W.I. Zangwill, "Generalized cutting plane algorithms", SIAM Journal on Control 9 (4) (1971) 529-542.
[4] R.M. Elkin, "Convergence theorems for Gauss-Seidel and other minimization algorithms", Computer Science Center Tech. Rept. 68-59, University of Maryland, College Park, MD (1968).
[5] W.W. Hogan, "Point-to-set maps in mathematical programming", SIAM Review 15 (3) (1973) 591-603.
[6] W.W. Hogan, "Applications of general convergence theory for outer approximation algorithms", Mathematical Programming 5 (1973) 151-168.
[7] P. Huard, "Optimization algorithms and point-to-set maps", Mathematical Programming 8 (3) (1975) 308-331.
[8] R. Klessig, "A general theory of convergence for constrained optimization algorithms that use antizigzagging provisions", SIAM Journal on Control 12 (4) (1974) 598-608.
[9] R. Klessig and E. Polak, "An adaptive precision gradient method for optimal control", SIAM Journal on Control 11 (1) (1973) 80-93.
[10] D.G. Luenberger, Introduction to linear and nonlinear programming (Addison-Wesley, Reading, MA, 1973).
[11] M.L. Lenard, "Practical convergence conditions for unconstrained optimization", Mathematical Programming 4 (1973) 309-323.
[12] G.G.L. Meyer and E. Polak, "Abstract models for the synthesis of optimization algorithms", SIAM Journal on Control 9 (1971) 547-560.
[13] G.G.L. Meyer, "Convergence conditions for a type of algorithm model", Tech. Rept. 75-14, The Johns Hopkins University, Baltimore, MD (1975).
[14] G.G.L. Meyer, "Algorithm model for penalty-type iterative procedures", Journal of Computer and System Sciences 9 (1974) 20-30.
[15] R.R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms", Journal of Computer and System Sciences 12 (1976) 108-121.
[16] R.R. Meyer, "A comparison of the forcing functions and point-to-set mapping approaches to convergence analysis", SIAM Journal on Control 15 (4) (1977) 699-715.
[17] R.R. Meyer, "A convergence theory for a class of anti-jamming strategies", Tech. Rept. 1481, Mathematics Research Center, University of Wisconsin, Madison, WI (1974).
[18] H. Mukai and E. Polak, "On the use of approximations in algorithms for optimization problems with equality and inequality constraints", SIAM Journal on Numerical Analysis 15 (4) (1978) 674-693.
[19] J.M. Ortega and W.C. Rheinboldt, Iterative solution of nonlinear equations in several variables (Academic Press, New York, 1970).
[20] J.M. Ortega and W.C. Rheinboldt, "A general convergence result for unconstrained minimization methods", SIAM Journal on Numerical Analysis 9 (1) (1972) 40-43.
[21] E. Polak, "On the convergence of optimization algorithms", Revue Française d'Automatique, Informatique et Recherche Opérationnelle 16 (R1) (1969) 17-34.
[21a] E. Polak, Computational methods in optimization: a unified approach (Academic Press, New York, 1971).
[22] E. Polak, "On the implementation of conceptual algorithms", in: J.B. Rosen, O.L. Mangasarian and K. Ritter, eds., Nonlinear programming (Academic Press, New York, 1970) pp. 275-291.
[23] E. Polak, R.W.H. Sargent and D.J. Sebastian, "On the convergence of sequential minimization algorithms", Journal of Optimization Theory and Applications 14 (1974) 439-442.
[24] B.T. Polyak, "Gradient methods for the minimization of functionals", U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1963) 864-878.
[25] S.W. Rauch, "A convergence theory for a class of nonlinear programming problems", SIAM Journal on Numerical Analysis 10 (1) (1973) 207-228.
[26] S. Tishyadhigama, "General convergence theorems: their relationships and applications", Ph.D. Thesis, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA (1977).
[27] D.M. Topkis and A.F. Veinott, "On the convergence of some feasible direction algorithms for nonlinear programming", SIAM Journal on Control 5 (2) (1967) 268-279.
[28] P. Wolfe, "Convergence conditions for ascent methods", SIAM Review 11 (2) (1969) 226-235.
[29] W.I. Zangwill, Nonlinear programming: a unified approach (Prentice-Hall, Englewood Cliffs, NJ, 1969).
[30] W.I. Zangwill, "Convergence conditions for nonlinear programming algorithms", Management Science 16 (1) (1969) 1-13.
[31] J.L. Kelley, General topology (Van Nostrand, Princeton, NJ, 1955).
[32] L. Armijo, "Minimization of functions having continuous partial derivatives", Pacific Journal of Mathematics 16 (1966) 1-3.