Nonlinear Programming
SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists. Editor-in-Chief Robert E. O'Malley, Jr., University of Washington Editorial Board Richard A. Brualdi, University of Wisconsin-Madison Herbert B. Keller, California Institute of Technology Andrzej Z. Manitius, George Mason University Ingram Olkin, Stanford University Stanley Richardson, University of Edinburgh Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht Classics in Applied Mathematics C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods James M. Ortega, Numerical Analysis: A Second Course Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques F. H. Clarke, Optimization and Nonsmooth Analysis George F. Carrier and Carl E. Pearson, Ordinary Differential
Equations
Leo Breiman, Probability R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences Olvi L. Mangasarian, Nonlinear Programming *Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart Richard Bellman, Introduction to Matrix Analysis U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of InitialValue Problems in Differential-Algebraic Equations Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability *First time in print.
Classics in Applied Mathematics (continued) Cornelius Lanczos, Linear Differential
Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition Beresford N. Parlett, The Symmetric Eigenvalue Problem Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow Peter W. M. John, Statistical Design and Analysis of Experiments Tamer Bajar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition Emanuel Parzen, Stochastic Processes Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology James A. Murdock, Perturbations: Theory and Methods Ivar Ekeland and Roger Témam, Convex Analysis and Variational Problems Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications F. Natterer, The Mathematics of Computerized Tomography Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging R. Wong, Asymptotic Approximations of Integrals O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation David R. Brillinger, Time Series: Data Analysis and Theory Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems Philip Hartman, Ordinary Differential Equations, Second Edition Michael D. Intriligator, Mathematical Optimization and Economic Theory Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
This page intentionally left blank
Nonlinear Programming Otvi L Mangasarian University of Wisconsin Madison, Wisconsin
siajTL. Society for Industrial and Applied Mathematics Philadelphia
Library of Congress Cataloging-in-Publication Data Mangasarian, Olvi L., 1934Nonlinear programming / Olvi L. Mangasarian. p. cm. -- (Classics in applied mathematics ; 10) Originally published: New York : McGraw-Hill, 1969, in series: McGraw-Hill series in systems science. Includes bibliographical references and indexes. ISBN 0-89871-341-2 1. Nonlinear programming. I. Title. II. Series T57.8.M34 1994 519.7'6-dc20 94-36844 109876543 All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the Publisher. For information, write the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688. Copyright © 1994 by the Society for Industrial and Applied Mathematics. This SIAM edition is a corrected republication of the work first published in 1969 by the McGraw-Hill Book Company, New York, New York. Siam
is a registered trademark.
To
Josephine Mangasarian, my mother, and to Claire
This page intentionally left blank
Preface to the Classics Edition
Twenty-five years have passed since the original edition of this book appeared; however, the topics covered are still timely and currently taught at the University of Wisconsin as well as many other major institutions. At Wisconsin these topics are taught in a course jointly listed by the Computer Sciences, Industrial Engineering, and Statistics departments. Students from these and other disciplines regularly take this course. Each year I get a number of requests from the United States and abroad for copies of the book and for permission to reproduce reserve copies for libraries. I was therefore pleased when SIAM approached me with a proposal to reprint the book in its Classics series. I believe that this book is an appropriate choice for this series inasmuch as it is a concise, 'igorous, yet accessible account o ' the fundamentals of constrained optimization theory that is useful to both the beginning student as well as the active researcher. I am appreciative that SIAM has chosen to publish the book and to make the corrections that I supplied. I am especially grateful to Vickie Kearn and Ed Block for their friendly and professional handling of the publication process. My hope is that the mathematical programming community will benefit from this endeavor. Olvi L. Mangasarian
ix
This page intentionally left blank
Preface
This book is based on a course in nonlinear programming given in the Electrical Engineering and Computer Sciences Department and the Industrial Engineering and Operations Research Department of the University of California at Berkeley and in the Computer Sciences Department of the University of Wisconsin at Madison. The intent of the book is to cover the fundamental theory underlying nonlinear programming for the applied mathematician. The entire book could be used as a text for a one-semester course, or the first eight chapters for a one-quarter course. The course level would probably be advanced undergraduate or firstyear graduate. The only prerequisite would be a good course in advanced calculus or real analysis. (Linear programming is not a prerequisite.) All the results needed in the book are given in the Appendixes. I am indebted to J. Ben Rosen who first introduced me to the fascinating subject of nonlinear programming, to Lotfi A. Zadeh who originally suggested the writing of such a book, to Jean-Paul Jacob, Phillippe Rossi, and James W. Daniel who read the manuscript carefully and made numerous improvements, and to all my students whose questions and observations resulted in many changes. Olvi L. Mangasarian
xi
This page intentionally left blank
To the Reader
The following system of numbering and cross-referencing is used in this book. At the top of each page in the outer margin appear chapter and section numbers in boldface type; for example, 3.2 at the top of a page means that the discussion on that page is part of Chapter 3, Section 2. In addition, each item on the page (Definition, Theorem, Example, Comment, Remark, etc.) is given a number that appears in the lefthand margin; such items are numbered consecutively within each section. Item numbers and all cross-references in the text are in italic type. Cross-references are of the form "by Definition 5,4.3"; this means "by the definition which is item 3 of Section 4 in Chapter 5." Since the four appendixes are labeled A, B, C, and D, the reference "C.I.3" is to "item 3 in Section 1, Appendix C." When we refer in a section to an item within the same section, only the item number is given; thus "substituting in 7" means "substituting in Equation 7 of this section."
xiii
This page intentionally left blank
Contents Preface to the Classics Edition Preface To the Reader Chapter 1. The Nonlinear Programming Problem, Preliminary Concepts, and Notation 1. The nonlinear programming problem 2. Sets and symbols 3. Vectors 4. Matrices 5. Mappings and functions 6. Notation
Chapter 2. Linear Inequalities and Theorems of the Alternative 1. 2. 3. 4.
Introduction The optimalily criteria of linear programming: An application of Farkas' theorem Existence theorems for linear systems Theorems of the alternative
Chapter 3. Convex Sets in Rn 1. 2.
Convex sets and their properties Separation theorems for convex sets
Chapter 4. Convex and Concave Functions 1. 2.
Definitions and basic properties Some fundamental theorems for convex functions
Chapter 5. Saddlepoint Optimality Criteria of Nonlinear Programming Without Differentiability 1. 2. 3. 4.
The minimization and saddlepoint problems Some basic results for minimization and local minimization problems Sufficient optimality criteria Necessary optimality criteria
ix xi
xiii 1
1 3 6
8 11 13
16 16 18 21 27
38 38 46
54 55 63
69 70 72 74 76 XV
xvi
Contents
Chapter 6. Differentiable Convex and Concave Functions 1. 2. 3.
Differentidble convex and concave functions Differentiable strictly convex and concave functions Twice-differentiable convex and concave functions
83 83 87 88
4. Twice-differentiable strictly convex and Concave
functions functions
Chapter 7. Optimality Criteria in Nonlinear Programming with Differentiability The minimization problems and the Fritz John and Kuhn-Tucker stationary-point problems 2. Sufficient optimality criteria 3. Necessary optimality criteria
90
92
1.
93 96 97
Chapter 8. Duality in Nonlinear Programming
113
1. Duality in nonlinear programming 2. Duality in quadratic programming 3. Duality in linear programming
123 126
Chapter 9. Generalizations of Convex Functions: Quasiconvex, Strictly Quasiconvex, and Pseudoconvex Functions 1. 2. 3. 4.
Quasiconvex and quasiconcave functions Strictly quasiconvex and strictly quasiconcave functions Pseudoconvex and pseudoconcave functions Summary of properties and relations between quasiconvex, strictly quasiconvex, pseudoconvex, convex, and strictly convex functions 5. Warning 6. Problems
Chapter 10. Optimality and Duality for Generalized Convex and Concave Functions 1. Sufficient optimality criteria 2. Necessary optimality criteria 3. Duality
114
131 131 136 140
145 147 148
151 151 153 157
Contents
xvii
Chapter 11. Optimality and Duality in the Presence of Nonlinear Equality Constraints 1. Sufficient optimality criteria 2. "Minimum principle" necessary optimality criteria: X° not open 3. Fritz John and Kuhn-Tucker stationary-point necessary optimality criteria: X° open 4. Duality with nonlinear equality constraints
Appendix A. Vectors and Matrices 1. 2.
Vectors Matrices
Appendix B. Resume of Some Topological Properties of Rn 1. Open and closed sets 2. Sequences and bounds 3. Compact sets in Rn
Appendix C.
Continuous and Semicontinuous Functions, Minima and Infima
161 16% 162 170 174
177 177 179
182 182 185 188
191
1. Continuous and semicontinuous functions 191 2. Infimum (supremum) and minimum (maximum) of a 195 set of real numbers 3. Infimum (supremum) and minimum (maximum) of a 196 numerical function 4. Existence of a minimum and a maximum of a numerical function 198
Appendix D. Differentiable Functions, Mean-value and Implicit Function Theorems 1. Differentiable and twice-differentiable functions 2. Mean-value theorem and Taylor's theorem 3. Implicit function theorem
200 200 204 204
Bibliography
205
Name Index
215 217
Subject Index
This page intentionally left blank
Chapter One
The Nonlinear Programming Problem, Preliminary Concepts, and Notation
1. The nonlinear programming problem f The nonlinear programming problem that will concern us has three fundamental ingredients: a finite number of real variables, a finite number of constraints which the variables must satisfy, and a function of the variables which must be minimized (or maximized). Mathematically speaking we can state the problem as follows: Find specific values (xi, . . . ,xn), if they exist, of the variables (£1, • . . ,zn) that will satisfy the inequality constraints
the equality constraints
and minimize (or maximize) the o bjective function over all values of Xi, . . . ,xn satisfying 1 and 2. Here, Qi, hj, and 6 are numerical functions! of the variables x\, . . . ,xn, which are defined for all finite values of t In order to introduce the problem in the first section of the book, some undefined terms (function, real variable, constraints, etc.) must be interpreted intuitively for the time being. The problem will be stated rigorously at the end of this chapter (see 1.6.9 to 1.6.12). t The concept of a numerical function will be defined precisely in Sec. 1.5. For the present by a numerical function of xi, . . . , xn we mean a correspondence which assigns a single real number for each n-tuple of real values that the variables xi, . • • , xn assume.
1.1
Nonlinear Programming
the variables. The fundamental difference between this problem and that of the classical constrained minimization problem of the ordinary calculus [Courant 47, Fleming 65] f is the presence of the inequalities 1. As such, inequalities will play a crucial role in nonlinear programming and will be studied in some detail. As an example of the above problem consider the case shown in Fig. 1.1.1. Here we have n = 2 (two variables Xi,Xz), m = 3 (three inequality constraints), and A; = 1 (one equality constraint). Each curve in Fig. 1.1.1 is obtained by setting some numerical function equal to a real number such as B(xi,x2) — 5 or g*(x\,xt) = 0. The little arrows on the t This refers to the works by Courant, written in 1947, and by Fleming, written in 1965, as listed in the Bibliography at the back of the book. This system of references will be used throughout the book with one exception: [Gordan 73] refers to Gordan's paper written in 1873.
Fig. 1.1.1 A typical nonlinear programming problem in two variables (xi,xi).
a
Preliminary Concepts and Notations
1.3
curves Qi(x\,Xi) = 0 indicate the side in the direction of which g^ increases, and hence all (xi,xz) must lie on the opposite side of these curves if they are to satisfy 1. All such (xi,Xz) lie in the shaded area of Fig. 1.1.1. To satisfy 2, (2:1,2:2) must lie on the curve hi(xi,Xz) = 0. The solution to the problem is (£1,0:2). This is the point on the curve A 1(0:1,2:2) = 0 at which 6 assumes its lowest value over the set of all (x 1,2:2) satisfying ^(0:1,2:2) ^ 0, i = 1, 2, 3. In more complicated situations where n, m, and A; may be large, it will not be easy to solve the above problem. We shall then be concerned with obtaining necessary and/or sufficient conditions that a point (x\, . . . ,xn) must satisfy in order for it to solve the nonlinear programming problem 1 to 3. These optimality conditions form the crux of nonlinear programming. In dealing with problems of the above type we shall confine ourselves to minimization problems only. Maximization problems can be easily converted to minimization problems by employing the identity maximum B(x\, . . . ,£„) = —minimum [—0(2:1, . . . ,£„)] Problem Solve graphically as indicated in Fig. 1.1.1 the following nonlinear programming problem: minimize ( — x\ — Xt) subject to
2. Sets and symbols We shall use some symbols and elementary concepts from set theory [Anderson-Hall 63, Hamilton-Landin 61, Berge 63]. In particular a set F is a collection of objects of any kind which are by definition elements or points of T. For example if we let R (the reals or the real line) denote the set of all real numbers, then 7 is an element or point of R. We use the symbol G to denote the fact that an element belongs to a set. For example we write 7 G R- For simplicity we also write sometimes 5,7 G R instead of 5 £ R and 7 G R. If T and A are two sets, we say that F is contained in A, F is in A, F is a subset of A, or A contains F, if each element of F is also an element of A, and we write F C A or A 3 F. If F C A and A C F we write F = A. A slash across a symbol denotes its negation. Thus x @ F and F <£ A denote respectively that x is not an element of F and that F is not a subset 3
1.2
Nonlinear Programming
of A. The empty set is the set which contains no elements and is denoted by 0. We denote a set sometimes by {x,y,z}, if the set is formed by the elements x, y, z. Sometimes a set is characterized by a property that its elements must have, in which case we write [x | x satisfying property P\ For example the set of all nonnegative real numbers can be written as The set of elements belonging to either of two sets F or A is called the union of the sets F and A and is denoted by F U A. We have then The set of elements belonging to at least one of the sets of the (finite or infinite) family of sets (r,),e/ is called the union of the family and is denoted by U F,. Then
The set of elements belonging to both sets T and A is called the intersection of the sets F and A and is denoted by F P> A. We then have
The set of elements belonging to all the sets of the (finite or infinite) family of sets (F,)ie/, is called the intersection of the family and is denoted by r! F,. Then ie/
Two sets F and A are disjoint if they do not intersect, that is, if
r n A = 0.
The difference of the sets A and F is the set of those elements of A not contained in F and is denoted by A ~ F. We have then
In the above it is not assumed in general that F C A. If however F C A, then A ~ F is called the complement of F relative to A. The product of two sets F and A, denoted by F X A, is defined as the set of ordered pairs (x,y) of which x G T and y G A. We have then
4
Preliminary Concept! and Notations
1.1
Fig. 1.2.1 The product r X A of the sets r and A.
The product of n sets Fi, . . . , Tn, denoted by Ti X F 2 X • • * X F n , is defined as the set of ordered n-tuples (x\, . . . ,xn) of which x\ G I\ • • • , #n £ Tn. We have then If Ti = T2 = • • • = rn = T, then we write F" = T X r X • • • X T. If we let then Figure 1.2.1 depicts the set T X A. The set # 2 = R X R, which can be represented by points on a plane, is called the Euclidean plane. The following symbols will also be used: (Vx) reads for each x (3x) reads there exists an x such that => reads implies <= reads is implied by «=» reads is equivalent to (A slash [/] across any one of the last three symbols denotes their negation.) For example the statement "for each x there exists a y such that 6(x,y) = 1" can be written as The negation of the above statement can be automatically written as Frequently we shall refer to a certain relationship such as an equation or an inequality by a number or Roman numeral such as I or II. 5
1.8
Nonlinear Programming:
The notation I => II means relationship I implies relationship II. An overbar on I or II (I or II) denotes the negation of the relationship referred to by that numeral. Obviously then the statement that I =» II is logically equivalent to I <= II. Thus
3.
Vectors n-vector
An n-vector or n-dimensional vector x, for any positive integer n, is an n-tuple (xi, . . . ,xn) of real numbers. The real number x, is referred to as the ith component or element of the vector x. Rn
The n-dimensional (real) Euclidean space Rn, for any positive integer n, is the set of all n-vectors. The notation x £j Rn means that x is an element of Rn, and hence, x is an n-vector. Frequently we shall also refer to re as a point in Rn. R1, or simply R, is then the Euclidean line (the set of all real numbers), R* is the Euclidean plane (the set of all ordered pairs of real numbers), and R" = R X R X • • • X R (n times). Vector addition and multiplication by a real number Let x,y (E Rn and a G R- The sum x -\- y is defined by
x + y = (xi + 7/1, . . . ,xn + yn) and the multiplication by a real number ax is defined by ax = (axi, . . . ,axn)
Linear dependence and independence The vectors x1, . . . ,xm G Rn are said to be linearly independent if
otherwise they are linearly dependent. (Here and elsewhere 0 denotes the real number zero or a vector each element of which is zero.) Linear combination The vector x £ Rn is a linear combination of x1, . . . , xm G Rn if x = X1^1 4- • • • + \mxm 6
for some X1, . . . , \m G R
Preliminary Concepts and Notations
1.3
and it is a nonnegative linear combination of xl, . . . , xm if in addition to the above equality X 1 , . . . , \m ^ 0. The numbers X 1 , . . . , Xm are called weights. The above concepts involving vector addition and multiplication by a scalar define the vector space structure of Rn. They are not enough however to define the concept of distance. For that purpose we introduce the scalar product of two vectors. Scalar product The scalar product xy of two vectors x,y E Rn is defined by xy = xtfji + • • • + xnyn Norm of a vector The norm |ja;|| of a vector x E Rn is defined by
Cauchy-Schwarz inequah'ty Let x,y E Rn. Then where \xy\ is the absolute value of the real number xy. PROOF
Let x,y E Rn be fixed. For any a E R
Hence the roots of the quadratic equation in a xx(a)z + 2xya + yy = 0 cannot be distinct real numbers, and so (xy)2 ^ (xx)(yy) which implies the Cauchy-Schwarz inequality.
|
Distance between two points Let x,y E Rn- The nonnegative number S(x,y) = \\x — y\\ is called the distance between the two points x and y in Rn. Problem Establish the fact that Rn is a metric space by showing that 7
1.4
Nonlinear Programming
8(x,y) satisfies the following conditions
(triangle inequality) (Hint: Use the Cauchy-Schwarz inequality to establish the triangle inequality.) Angle between two vectors Let x and y be two nonzero vectors in Rn: The angle \l> between x and y is defined by the formula
This definition of angle agrees for n = 2,3 with the one in analytic geometry. The nonzero vectors x and y are orthogonal if xy = 0 (\l/ = 7T/2); form an acute angle with each other if xy ^ 0 (0 2g ^ f=j ir/2), a s/nc/ acwte an0/e if xy > 0 (0 ^ ^ < ?r/2), an obtuse angle if xy ^ 0 (7T/2 ^ ^ ^ TT), and a sin'd obtuse angle if xy < 0 (?r/2 < ^ ^ ?r).
4. Matrices Although our main concern is nonlinear problems, linear systems of the following type will be encountered very frequently: AnXj. + • • • + Alnxn - bi Amlxi + • ' ' + Amnxn - bm where Ai}^ and 6,, i = 1, . . . , m,j = 1, . . . , n, are given real numbers. We can abbreviate the above system by using the concepts of the previous section. If we let Ai denote an n-vector whose n components are AHJ j = 1, . . . , n, and if we let x G Rn, then the above system is equivalent to AiX — bi
i = 1, . . . , m
In 2 we interpret A& as the scalar product 1.3.6 of At and x. If we further let Ax denote an w-vector whose m components are A<x, i = 1, . . . , m, and b an w-vector whose m components are bi, then the equivalent systems 1 and 2 can be further simplified to
Ax = b 8
Preliminary Concepts and Notations
1.4
In order to be consistent with ordinary matrix theory notation, we define the m X n matrix A as follows
The ith row of the matrix A will be denoted by A» and will be an n-vector. Hence
The jth column of the matrix A will be denoted by A.> and will be an w-vector. Hence
The transpose of the matrix A is denoted by A' and is defined by
Obviously the t'th row of A is equal to the ith column of A', and the jth column of A is equal to the jth row of A'. Hence
The last equalities of 8 and 9 are to be taken as the definitions of A't and A'J respectively. Since Ay is the real number in the t'th row of the jth column of A, then if we define A^t as the real number in thejth row of the ith column of A', we have
The equivalent systems 1, 2, and 3 can be written still in another form as follows
9
1.4
Nonlinear Programming
Here A.j and b are vectors in Rm and x, are real numbers. The representation 2 can be interpreted as a problem in Rn whereas 11 can be interpreted as a problem in Rm. In 2 we are required to find an x £ Rn that makes the appropriate scalar products 6, (or angles, see 1.3.11} with the n-vectors Ai} i = 1, . . . , m. In 11, we are given the n + 1 vectors in Rm> A.j, j'• — 1, . . . , n and b, and we are required to find n weights xi, . . . , xn such that b is a linear combination of the vectors A.j. These two dual representations of the same linear system will be used in interpreting some of the important theorems of the alternative of the next chapter. The m X n matrix A of 4 can generate another linear system yA, defined as follows
where y £ Rn. Hence, yA is an n-dimensional vector whose jth component is given by In general we shall follow the convention of using upper case Latin letters to denote matrices. If A is an m X n matrix, and if we let
then we define the following sub matrices of A (which are matrices with rows and columns extracted respectively from the rows and columns of A)
Fig. 1.4.1 An m X n matrix and its submatrices. 10
Preliminary Concepts and Notations
1.8
jth column of ith row of It follows then that
and
Figure 1.4-1 depicts some of the above submatrices of A. Nonvacuous matrix A matrix A is said to be nonvacuous if it contains at least one element A^. An m X n matrix A with m ^ 1 and n ^ 1 is nonvacuous even if all its elements AH = 0.
5. Mappings and functions Mapping Let X and Y be two sets. A mapping F from X into Y is a correspondence which associates with every element x of X a subset of Y. For each x (E X, the set T(x) is called the image of x. The domain X* of F is the subset of points of X for which the image T(x) is nonempty, that is, The range T(X*) of F is the union of the images of all the points of X*, that is
EXAMPLEE
Let X = F = #, F(z) = {y | cos y = *}.
Then
Function A function f is a single-valued mapping from a set X into a set Y. That is for each x £ X, the image set/(x) consists of a single element of F. The domain of / is X, and we say that / is defined on X. The range of / is f(X) = U f(x). (For convenience we will write the image of a funcxex tion not as a set but as the unique element of that set.) 11
1.&
Nonlinear Programming
Numerical function A numerical function 6 is a function from a set X into #. In other words a numerical function is a correspondence which associates a real number with each element x of X. EXAMPLESS If If X= then d isd the familiar real EXAMPLESS X R, = R, then is the familiar realsingle-valued single-valued func func tion of a real variable, such as 0(x) = sin x. If X is the set of positive integers, then 6 assigns a real number for each positive integer, for example 6(x) = l/x\. If X = Rn, then d is the real single-valued function of n variables.
Vector function An m-dimensional vector function f is a function from a set X into Rm. In other words a vector function is a correspondence which associates a vector from Rm with each element x of Jf. The m components of the vector f(x) are denoted by fi(x), . . . , fm(x). Each/< is a numerical function on X. A vector function / has a certain property (for example continuity) whenever each of its components/, has that property. EXAMPLE If X = R, then d is the familiar real single-valued func of Rn. The m components /,, i' = 1, . . . , m, of / are numerical functions on Rn. Linear vector functions on Rn An m-dimensional vector function / defined on Rn is said to be linear if
f(x) = Ax + b where A is some fixed m X n matrix and b is some fixed vector in Rm. It follows that if / is a linear function on Rn then
(Conversely, the last two relations could be used to define a linear vector function on Rn, from which it could be shown that/(a;) = Ax -f 6 [Berge 63, p. 159].) If m = 1 in the above, then we have a numerical linear function B on Rn and where c is a fixed vector in Rn and y is a fixed real number. 12
Preliminary Concepts and Notations
1.6
Inequalities or equalities involving linear vector functions (or linear numerical functions) will be naturally called linear inequalities or equalities.
6.
Notation Vectors and real numbers
In general we shall follow the convention that small Latin letters will denote vectors such as a, b, c, x, y, z, or vector functions such as/, g, h. Exceptions will be the letters i, j, k, m, n, and sometimes others, which will denote integers. Small Greek letters will denote a real number (a point in R) such as a, ft, 7, £, rj, f, or a numerical function such as 8, >, \f/. Subscripts A small Latin letter with an integer subscript or a small Latin letter subscript will denote a component of a vector, in general, and on occasion will denote a vector. For example, if x G Rb, then x$ and zt denote respectively the third and z'th components of x. On the other hand we will have occasion to let x\ G Rm, £2 G Rm, etc., in which case this intent will be made explicit. Small Greek letters with integer or Latin subscripts will occasionally be used to denote real numbers such as Xi, X,. If x G Rn, K C N = {I, . . . ,n}, and K contains k elements each of which is distinct, then X^K is a vector in Rk with the components [xi | i G K-} and is denoted by XK. Thus a small Latin letter with a capital Latin letter subscript denotes a vector in a space of smaller or equal dimension to that of the space of the unsubscripted vector. Superscripts A small Latin or Greek letter with a superscript or an elevated symbol will denote a fixed vector or real number, for example x1, xz, x\ x, x, £*, I, etc. Exponentiation on the other hand will be distinguished by parentheses enclosing the quantity raised to a power, for example (z)2. Zero The number 0 will denote either the real number zero or a vector in Rn all components of which are zero. Matrices Matrices will be denoted by capital Latin letters as described in detail in a previous section, Sec. 1.4. 13
1.6
Nonlinear Programming
Sets Sets will always be denoted by capital Greek or Latin letters such as F, A, fl, R, I, X, Y. Capital letters with subscripts, such as TI, T2, I\, and capital letters with elevated symbols, such as r*, X° will also denote sets. (See also Sec. 1.2.) Ordering relations The following convention for equalities and inequalities will be used. If x,y £ Rn, then
If x ^ 0, x is said to be nonnegative, if x > 0 then x is said to be semipositive, and if x > 0 then x is said to be positive. The relations =, ^, >, > defined above are called ordering relations (in Rn). The nonlinear programming problem By using the notation introduced above, the nonlinear programming problem 1.1.1 to 1.1.3 can be rewritten in a slightly more general form as follows. Let X° C Rn, let g, h, and 6 be respectively an m-dimensional vector function, a ^-dimensional vector function, and a numerical function, all defined on X°. Then the problem becomes this: Find an x, if such exists, such that
The set X is called the feasible region, x the minimum solution, and Q(x) the minimum. All points x in the feasible region X are referred to as feasible points or simply as feasible. Another way of writing the same problem which is quite common in the literature is the following:
subject to
14
Preliminary Concepts and Notations
1.6
We favor the more precise and brief designation 9 of the problem instead of 10 to 12. Notice that if we let X° = Rn in the above problem, then we obtain the nonlinear programming problem 1.1.1 to 1.1.3. If X° = Rn and 6, g, and h are all linear functions on Rn, then problem 9 becomes a linear programming problem: Find an x, if such exists, such that
where 6, c, and d are given fixed vectors in Rn, Rm, and Rk respectively, and A and B are given fixed m X n and k X n matrices respectively. There exists a vast literature on the subject of linear programming [Dantzig 63, Gass 64, Hadley 62, Simmonard 66]. It should be remarked that problem 13 is equivalent to finding an x such that
When B and d are absent from this formulation, 14 becomes the standard dual form of the linear programming problem [Simmonard 66, p. 95].
16
Chapter Two Linear Inequalities and Theorems of the Alternative
1. Introduction It was mentioned in Chap. 1 that the presence of inequality constraints in a minimization problem constitutes the distinguishing feature between the minimization problems of the classical calculus and those of nonlinear programming. Although our main interest lies in nonlinear problems, and hence in nonlinear inequalities, linearization (that is, approximating nonlinear constraints by linear ones) will be frequently resorted to. This will lead us to linear inequalities. It is the purpose of this chapter to establish some fundamental theorems for linear inequalities which will be used throughout this work. (Needless to say, these fundamental theorems also play a crucial role in linear programming. See for example [Gale 60, chap. 2].) The type of theorem that will concern us in this chapter will involve two systems of linear inequalities and/or equalities, say systems I and II. A typical theorem of the alternative asserts that either system I has a solution, or that system II has a solution, but never both. The most famous theorem of this type is perhaps Farkas' theorem [Farkas 02, Tucker 56, Gale 60]. Farkas' theorem For each fixed p X n matrix A and each fixed vector b in Rn, either has a solution
or
Linear Inequalities and Theorems of the Alternative
a.i
Fig. 2.1.1 Geometric interpretation of Farkas' theorem: II' has solution, I' has no solution.
has a solution
but never both. We shall postpone a proof of Farkas' theorem until after we have given a geometric interpretation of it and applied it to get the necessary optimality conditions of linear programming. To give a geometric interpretation of Farkas' theorem, we rewrite I and II as follows
where A[f denotes the jih column of A' and Aj the jih row of A (see Sec. 1.4). System II' requires that the vector b be a nonnegative linear combination of the vectors AI to Ap. System I' requires that we find a vector x G Rn that makes an obtuse angle (^7r/2) with the vectors AI to Ap and a strictly acute angle (
Fig. 2.1.2 Geometric interpretation of Farkas' theorem: I' has solution, II' has no solution. 17
h.2
Nonlinear Programming
shows a simple case with n = 2, p = 3, in which IF has a solution, and hence by Farkas' theorem F cannot have a solution. Figure 2.1.2 shows a case with n = 2, p = 2, in which IF has no solution, and hence by Farkas' theorem F must have a solution.
2. The optimality criteria of linear programming: An application of Farkas' theorem As a typical example of the use and power of the theorems of the alternative we shall show how Farkas' theorem can be employed to derive necessary optimality conditions for the following linear programming problem: Find an x, if it exists, such that
where b and c are given fixed vectors in Rn and Rm respectively, and A is a given fixed m X n matrix. Optimality criteria of linear programming
(Necessity} Let x be a solution of the linear programming problem 1. Then there exists a u G Rm such that (x,u) satisfy (sual feasibility) (primal feasibility) (complementarity)
(Sufficiency) If some x (E Rn and some u G Rm satisfy 3 to 6, then x solves 1. PROOF (Necessity) Let x solve 1. Define the index sets P, Q, and M as follows:
and assume that P and Q contain p and q elements, respectively. (see 14.16)
Then
t These are standard terms of linear programming [Dantzig 63], which differ from the terminology of nonlinear programming (see Chap. 8). The complementarity condition usually refers to an equivalent form of 6: H(Ax — c) = 0. 18
Linear Inequalities and Theorems of the Alternative
2.3
it follows that for some real number 8 > 0, where e is an m-vector of ones. each x G Rn> we can find a real number a > 0 such that
Then for
and hence x -\- ax ^ X. Since x is the minimum solution of 1, we have that for each x G Rn there exists an a > 0 such that Hence bz ^ 0
for each x G Rn
which implies that b = O.f By taking u = 0 G ^m, the relations 3 to 6 are satisfied because b = 0. If P 5^ 0, then we assert that the system has no solution x G Rn- For if P did have a solution z, say, then ax would also be a solution of 9 for each a > 0. Now consider the point x + ax, where re is a solution of 9 and a > 0. Then
where in 12 the first inequality follows by defining e as a §-vector of ones and
and the second inequality of 12 holds for some a > 0 because — d < 0. But relations 10 to 12 imply that x -f ax G -^ and — b(x + az) < — bx, which contradicts the assumption that x is a solution of 1. Hence 9 has no solution x G Rn, and by Farkas' theorem 2.1.1 the system f To see this, take for a fixed i € M = {1, 2, .. . , n}, at,- = b{, x^i = 0, then bx ^ 0 implies that (b,; ) 2 ^ 0. Hence bt = 0. Repeating this process for each i € M, we get 6 = 0. 19
8.2
Nonlinear Programming:
must have a solution y G Rp-
If we let 0 G R", we have then
By defining u G Em such that u = (UP,WQ), where wp = ?/ G Rr, UQ = 0 G #*, condition /4 becomes conditions 4 and 5, and condition 15 becomes condition #. Condition 3 holds because x G X. This completes the necessity proof. (Sufficiency] Let x G Rn and u £ Rm satisfy 3 to 0, and let x G X. By 3 we have that x G -X". Now
We remark that Farkas' theorem was used only in establishing the necessity of the conditions 3 to 6, whereas the sufficiency of the conditions 3 to 6 required only elementary arguments. This is the typical situation in establishing optimality criteria in mathematical programming in general. Necessity requires some fairly sophisticated mathematical tool, such as a theorem of the alternative or a separation theorem for convex sets (see Chap. 3), whereas sufficiency can be established merely by using elementary manipulation of inequalities. The remaining sections of this chapter will be devoted to obtaining a rather comprehensive set of theorems of the alternative of the Farkas type. We shall follow Tucker [Tucker 56] in establishing these results. We begin first in the next section with some existence theorems for linear inequalities, from which all the theorems of the alternative follow. Problem Consider the following general form of the linear programming 30
Linear Inequalities and Theorems of the Alternative
1.3
problem: Find x G Rn> y €E R*, if they exist, such that
where 6, a, c, and d are given vectors in Rn, R', Rm, and Rk respectively, and A, D, B, and E are given m X n, m X t, k X n, and A; X ^ matrices respectively. Show that a necessary and sufficient condition for (x,y) to solve the above problem is that (x,y) and some u G #w and v £ Rk satisfy the following conditions:
(Hint: Replace the equality constraint Bx + Ey = d
by £z + £y ^ d
and
-5a: - Ey ^ -d and use Theorem 2.}
3. Existence theorems for linear systems We establish now some key theorems for the existence of certain types of solutions for linear systems and begin with a lemma due to Tucker [Tucker 56]. ai
Nonlinear Programming
1.3
Tucker's lemma For any given p X n matrix A, the systems
I
Ax ^ 0
and
II
A'y = 0, y ^ 0
possess solutions x and y satisfying
Aix + 2/1 > 0 PROOF The proof is by induction on p. For p = 1, if AI = 0, take y\ = 1, x — 0; if A\ j& 0, take yi = 0, x = A!. Now assume that the lemma is true for a matrix -4 of p rows and proceed to prove it for a matrix of p + 1 rows A:
By applying the lemma to A, we get x,y satisfying
we take
Then
which extends the lemma to A. However, if Ap+ix < 0, we apply the lemma a second time to the matrix B:
where
32
Linear Inequalities and Theorems of the Alternative
1.3
So or Bx = 0
This second use of the lemma yields v,u satisfying
Let
It follows from 5 and 7 that
then
and
where the last inequality follows from 7, the equality before from 6, the equality before from 10, the equality before from 11, and the equality before from 4- Finally from 4, H, 10, 6, and 7 we have AIW + HI = (Bi — \iA p+i)w -f ui = BIW + HI
Relations 5, 5, .?#, and 15 extend the lemma to A 23
>.3
Nonlinear Programming
From Tucker's lemma two important existence theorems follow. These theorems assert the existence of solutions of two linear systems that have a certain positivity property. First existence theorem [Tucker 56] For any given p X n matrix A, the systems and possess solutions x and y satisfying PROOF In Tucker's lemma the row AI played a special role. By renumbering the rows of A, any other row, say -4,-, could have played the same role. Hence, by Tucker's lemma 1, there exist zl G Rn, yi £ RP, i = I , . . . , p, such that
Define
Hence by 15, we have that
and for i = 1, . . . , p,
14
Linear Inequalities and Theorems of the Alternative
a.s
Fig. 2.3.1 Geometric interpretation of Theorem 14, n — 2, p = 3.
or
Ax + y > 0 A geometric interpretation of the above theorem can be given in the space of the rows A± G Rn, i = 1, . . . , p. The theorem states that one of the following alternatives must occur for any given p X n matrix A: (a) There exists a vector x which makes a strict acute angle (
and
possess solutions x G Rn, yi G Rpl, yt G Rfl satisfying Ax + yi> 0 25
s.s
Nonlinear Programming
PROOF We remark first that the requirement that A be nonvacuous is merely to ensure that the statement of the theorem is nonvacuous, that is, without a matrix A the theorem says nothing. We apply Theorem 14 to the systems
and
and obtain x, yi, zi, z-t, satisfying AX + 7/i > 0
Bx + 21 > 0
-fix + 22 > 0 Define now y2 = Zi — 22- We have then that x, 1/1, 1/2 satisfy Ax ^ 0 Bx = 0 A'yi + B'y2 = 0 t,! ^ 0 As + yi >
Corollary Lef A, B, C, and D be given pl X n, pz X n, p3 X n, and p* X n matrices, with A, B, or C nonvacuous. Then the systems and II
A'j/i + B'y, + C'y, + ^4 = 0, tfl ^ 0, y2 ^ 0, y3 Z 0
possess solutions
satisfying
Ax + t/i > 0 Bx + 2/2 > 0 and Cx + 1/3 > 0
This corollary to Theorem 17 will be used to derive the most general types of the theorems of the alternative of the next section. 16
Linear Inequalities and Theorems of the Alternative
1.4
4. Theorems of the alternative In this section we shall be concerned with establishing a series of theorems relating to the certain occurrence of one of two mutually exclusive events. The two events, denoted by I and II, will be the existence of solutions of two related systems of linear inequalities and/or equalities, f The prototype of the theorem of the alternative can be stated as follows: Either I or II, but never both. If we let I denote the nonoccurrence of I and similarly for IT, then we can state the following. Typical theorem of the alternative
I<^T! or equivalently I<=*II TYPICAL PROOF I =» II
(or equivalently I <= II)
and I => II
(or equivalently I <= II)
The proof that I => II is usually quite elementary, but the proof that I => II utilizes the existence theorems of the previous section. In the theorems to follow, certain obvious consistency conditions will not be stated explicitly for the sake of brevity. For example, it will be understood that certain matrices must have the same number of rows, that the dimensionality of certain vectors must be the same as the number of columns in certain matrices, etc. We begin now by establishing a fairly general theorem of the alternative due to Slater [Slater 51]. Slater's theorem of the alternative [Slater 51] Let A, B, C, and D be given matrices, with A and B being nonvacuous. Then either I
Ax > 0
Bx > 0
Cx ^ 0
Dx = 0 has a solution x
or f Occasionally we shall also refer to the systems of inequalities and equalities themselves as systems I and II. XT
3.4
Nonlinear Programming
has a solution
6w< never 6o£/i. PROOF (I => II) By assumption, I holds. We will now show that if II also holds, then a contradiction ensues. If both I and II hold, then we would have x, yi, yz, y*, y\ such that
xA'yi + xB'y* + xC'y, + xD'y* > 0 because xD'y4 = 0, xC'y3 ^ 0, and either xB'yz ^ 0 and xA'yl > 0, or xB'yz > 0 and xA'y\ ^ 0. But now we have a contradiction to the first equality of II. Hence I and II cannot hold simultaneously. Thus,
i=*n. (I => ID
=>II. We remark that in the above proof, the requirement that both A and B be nonvacuous was used essentially in establishing the fact that I => II. Corollary 18, which was used to prove that I =» II, can handle systems in which merely A or B are nonvacuous. By slightly modifying the above proof, the cases B vacuous and A vacuous lead respectively to Motzkin's theorem of the alternative (or transposition theorem, as Motzkin called it) [Motzkin 36] and Tucker's theorem of the alternative [Tucker 56]. Motzkin's theorem of the alternative [Motzkin 36] Let A, C, and D be given matrices, with A being nonvacuous. Then as
Linear Inequalities and Theorems of the Alternative
2.4
either Ax > 0
I
Cx ^ 0
Dx = 0
has a solution x
or has a solution
but never both. PROOF (I ==> II) such that
If both I and II hold, then we would have x, yit y3, y^
xA'yi + xC'y3 + xD'y4 > 0 because xD'y^ = 0, xC'y3 ^ 0, and xA'y\ > 0. But now we have a contradiction to the first equality of II. Hence, I and II cannot hold simultaneously. Thus, I => II.
(T=>H) I =* (Ax ^ 0, Cx ^ 0, Dx = 0> => {Ax ^ 0» (by Corollary 2.3.18)
=*!!. |
Tucker's theorem of the alternative [Tucker 56] either I
Let B, C, and D be given matrices, with B being nonvacuous. Bx > 0
Cx ^ 0
Then
Dx = 0 has a solution x
or
has a solution y2, y3,y4 but never both. The proof is similar to the proof of Motzkin's theorem 2. (The reader is urged to go through the proof himself here.) Slater [Slater 51] considered his theorem as the one providing the most general system I possible, because it involved all the ordering relations >, >, ^, =. Similarly we can derive another theorem which involves the most general system II possible, in which y\ > 0, yz > 0, j/a ^ 0, and 2/4 is unrestricted. 99
h2.4
Theorem of the alternative Let A, B, C, and D be given matrices, with A and B being nonvacuous. Then either
has a solution or
has a solution but never both. PROOF (I => !I) 7/3, 2/4 satisfying
If both I and II hold, then we would have x, ylt yt,
xA'yi + xB'y* + xC'y3 + zZ) / y 4 > 0 because xD'y\ = 0, xC'y3 ^ 0, and either xB'y2 ^ 0 and xA'y\ > 0, or z5'2/2 > 0 and xA'yi ^ 0. But now we have a contradiction to the first equality of II. Hence, I and II cannot hold simultaneously. Thus,
i=>n. (n=>i)
by Crollary 18 •=>!• We remark that if either A or 5 is vacuous, then we revert to Tucker's theorem 3 or Motzkin's theorem 2. We remark further that in all of the above theorems of the alternative the systems I are all homogeneous. Hence, by defining z = — x, the system I of, say, Slater's theorem 1 can be replaced by I'
Az < 0, Bz < 0, Cz ^ 0, Dz = 0 has a solution z
Similar remarks apply to theorems 2 through 480
Linear Inequalities and Theorems of the Alternative
1.4
The above theorems of the alternative subsume in essence all other theorems of this type. We derive below some of these theorems directly from the above ones. Gordan's theorem [Gordan 73] For each given matrix A, either I
Ax > 0 has a solution x
or
II
A'y = 0, y > 0 has a solution y
but never both. PROOF Follows directly from Motzkin's theorem 2, by deleting the matrices C and D. Geometrically we may interpret Gordan's theorem as follows. Either there exists a vector x which makes a strict acute angle (<7r/2) with all the row vectors of A, Fig. 2.4-1 a, or the origin can be expressed as a nontrivial, nonnegative linear combination of the rows of A, Fig. 24.1b. Farkas' theorem [Farkas 02] For each given p X n matrix A and each given vector b in Rn either I
Ax ^ 0
bx > 0 has a solution x G Rn
or
II
A'y = b, y ^ 0 has a solution y G Rp
but never both. PROOF
By Motzkin's theorem 2, either I holds or
IF 677 — A'y =3 = 0, ij > 0, j/s ^ 0 must have a solution r\ (E R and
Fig. 2.4.1 Geometric interpretation of Gordan's theorem.
si
8.4
Nonlinear Programming
but not both. Since r; G ]?, T? > 0 means 77 > 0. Dividing through by 17 and letting y = 1/3/77, we have that II' is equivalent to Stiemke's theorem [Stiemke 15] For each given matrix B, either I
Bx > 0 has a solution x
or
II
B'y = 0, y > 0 has a solution y
but never both. PROOF and D
Follows directly from Tucker's theorem 3, by deleting C
Nonhomogeneous Farkas' theorem [Duifin 56] For a given p X n matrix A, given vectors b (E Rn, c £ Rp, and a given scalar ft, either I
bx > /3, Ax ^ c has a solution x (E Rn
or
has a solution but never both. PROOF
The system I is equivalent to
has a solution By Motzkin's theorem #, then, either I' holds or
but not both. sa
Linear Inequalities and Theorems of the Alternative
8.4
Now we have either 171 > 0 or 171 = 0 (and hence fi > 0). B defining y — y^lt\\ in the first case and y = y3 in the second, we have that II' is equivalent to
II" is equivalent to II. Gale's theorem for linear equalities [Gale 60] For a given p X n matrix A and given vector c G Ep, either I
Ax = c has a solution x G Rn
or
II
A'y — 0, cy = 1 has a solution y G Rp
but never both. PROOF I'
The system I is equivalent to
£ > 0, - & + Ax = 0 has a solution £ G R, x G Rn By Motzkin's theorem 2, either I' holds or
has a solution but not both. Since y\ G R, y\> 0. By defining y = y*/y\, II' is equivalent to II. | Gale's theorem for linear inequalities (^) [Gale 60] For a given p X n matrix A and given vector c G Rp, either I
Ax ^ c has a solution x G Rn
or
II
A'y = 0, cy — — 1, y ^ 0 has a solution y G Rp
but never both. PROOF
The system I is equivalent to
has a solution 93
Nonlinear Programming
3.4
Table 2.4.1 Theorems of the Alternative! 1 Ax > 0, Bx > 0, Cx £ 0, Dx = 0 4'yi +B'y2 + C'y3 +D'yt = 0 (A and B nonvacuous) (Slater) y\ > 0, 2/2 ^ 0, 3/3 ^ 0 or ^! £ 0, i/2 > 0, 3/3 ^ 0
2 Ax > 0, Cx ^ 0, Dx = 0 (A nonvacuous) (Motzkin)
A'y, + C'y, + Z)'y4 = 0 J/i > 0, y, ^ 0
3 Bx > 0, Cx £ 0, Dx = 0 (B nonvacuous) (Tucker)
B'y2 + C"2/3 + D'yt = 0 y2 > 0, 2/3 ^ 0
4 Ax > 0, Bx £ 0, Cx £ 0, Dx = 0 A'yi + 5'2/2 + C"i/3 + D'yt = 0 or 2/i > 0, 2/2 > 0, yi ^ 0 Ax £ 0, Bx > 0, Cx £ 0, Dx = 0 (-4 and B nonvacuous)
5 Az > 0 (Gordan)
4'y = 0, y > 0
6 bx > 0, Ax ^ 0 (Farkas)
A'l/ = 6, y ^ 0
7 JSx > 0 (Stiemke)
B'y = 0, y > 0
* 6z > 0, 4z ^ c (Nonhomogeneous Farkas)
A'y = b, cy £ 0, y ^ 0 or A'y = 0, cy < 0, y ^ 0
9 4z = c (Gale)
A'y = 0, cy = 1
*0 4x ^ c (Gale)
A'y = 0, cy = -1, y ^ 0
ff
A'y = 0, cy = -1, y ^ 0 or A'y = 0, cy g 0, y > 0
Ax < c
tNo "or" appearing in the above table and in Problems 2.4.12 to 2.4.17 is an exclusive "or."
By Motzkin's theorem 2, either I' holds or
has a solution but not both. By defining y = y^/y\, II follows from II'. 34
Linear Inequalities and Theorems of the Alternative
3.4
Theorem for linear inequalities (<) For a given p X n matrix A and a given vector c E; Rp, either I
Ax < c has a solution x G Rn
or
has a solution but never both.
PROOF T
The system I is equivalent to
£ > 0, c£ - Ax > 0 has a solution £ £ R, x E R" By Slater's theorem 1, either I' holds or
but not both. If for the case when y\ > 0, y% ^ 0, we set y = 2/2/2/1, and for the case when y\ ^ 0, yz > 0, we set y = y 2 , then II is equivalent to II'. In the table above, Table 2.4-1, we give a convenient summary of all the above theorems of the alternative. Problems By using any of the above theorems 1 to 11, establish the validity of the following theorems of the alternative (12 to ./7): Either I holds, or II holds, but never both, where I and II are given below.
has a solution has a solution has a solution has a solution has a solution has a solution SB
2.4
Nonlinear Programming
has a solution
has a solution has a solution has a solution has a solution has a solution
Mnemonic hint In all the theorems of the alternative Mo 17 above, which involve homogeneous inequalities and/or homogeneous equalities, the following correspondence between the ordering relations, >, >, ^, =, occurs: Orderings appearing in I
36
Orderings appearing in II
Linear Inequalities and Theorem! of the Alternative
2.4
The asterisks indicate ordering relations which must be present in order for the correspondence to hold. The arrows indicate the direction in which the correspondence is valid; for example, —> indicat es that starting with the relations at the unpointed end of the arrow, the corresponding relations are those at the pointed end of the arrow. Problem Establish Motzkin's theorem 2 by starting with Farkas' theorem 6. (Hint: Let
has a solution pseudoconcave and use Farkas' theorem. e is a vector pof ones in the above.)
37
Chapter Three Convex Sets inR n
The purpose of this chapter is to introduce the fundamental concept of convex sets, to describe some properties of these sets, and to derive the basic separation theorems for convex sets. These separation theorems are the foundations on which many optimality conditions of nonlinear programming rest.
1. Convex sets and their properties In order to define the concept of a convex set, we begin by defining line and line segments through two points in Rn. Line Let x\x2 e R*. The line through xl and x* is defined as the set
{x\x = (1 - X)*1 + Xz2, X ER] or equivalently [x | x = pix1 + pzx2, pi, pz G R, Pi + Pz = 1}
If we rewrite the first definition in the equivalent form
[ x \ x = x1 -f X(z 2 - x1), X e # ) and consider the case when x G R*, it becomes obvious that the vector equation x = x1 + X(z 2 — x1) is the parametric equation of elementary analytic geometry of the line through x1 and x2, Fig. 8.1.1.
Convex Seta in 72"
8.1
Line segments Let xl, x2 £ Rn. We define the following line segments joining z1 z and x : (i) Closed line segment [xl,x*] = {x \ x = (1 - X)*1 + \x*, 0 g X ^ 1} (ii) Open line segment (xl,x2) = {x \ x = (1 - X)z l -f Xz 2 , 0 < X < 1} (iii) Closed-open line segment [xl,x*) = {x \ x — (1 — X)^1 + Xz2, 0 ^ X < 1} (iv) Open-closed line segment (xl,x*] = {x \ x = (1 — \)xl + Xz2, 0 < X ^ 1} Obviously [z^z2] is the portion of the straight line through a;1 and x which lies between and includes the points x1 and x2, Fig. 3.1.1. (z*,z2) does not include x1 or z2, (a:1,*;2) does not include a:2, and (xl,x*] does not include x1. z
Convex set A set F C Rn is a convex set if the closed line segmentf joining every two points of F is in T. Equivalently we have that a set F C Rn is convex if
Figure 3.1.2 depicts some con vex sets in R2, and Fig. 3.1.3 some nonconvex sets in Rz. It follows from 3 that Rn itself is convex, that the empty set is convex, and that all sets consisting each of one point are convex. The subsets of Rn defined below in 4. 5, and 6 are all convex sets n in R . This can be easily established by a direct verification of the definition 3 of a convex set. t It is obvious that the definition of a convex set would be unchanged if any of the other line segments denned in 2 were used here instead of the closed line segment.
Fig. 3.1.1 Line and line segment through x1 and x*. 39
Nonlinear Programming
8.1
Fig. 3.1.2 Convex sets.
Halfspace Let c G Rn, c 7* 0, and a G R- Then the set [x x G R", ex < a] is an open half space in Rn, and the set {x \ x G #n, cz ^ a j is a c/oserf halfspace in #". (Both halfspaces are convex sets.) Plane Letc G #n, c ?* 0, and a G #• Then the set {z | x G ^n, ex = a} is called a plane in .Rn. (Each plane in Rn is a convex set.) Subspace A set F C Rn is a subspace if
Each subspace of /2n contains the origin and is a convex set. The subspaces of R3 consist of 0, R3, the origin, and all straight lines and planes passing through the origin. Problem (i) Show that each open or closed ball Bt(x) = {x \ x G Rn, \\x - x\\ < 6} Bt(x) = {x | x G Rn, \\x - x\\ ^ e]
Fig. 3.1.3 Nonconvex sets. iO
Convex Sets in R*
3.1
around a point x G Rn is a convex set. (Hint: Use the triangle inequality 1.3.10 in the form \\x + y\\ ^ \\x\\ + \\y\\.) (ii) Show that the interior of a convex set is convex. Vertex Let F be a convex set in Rn. Each x G F for which there exist no two distinct xl,xz G F different from x such that x G [z1,^2], is called a vertex of F (or an extreme point of F). A convex set F C Rn may have no vertices (for example the plane {x | x G Rn, ex = a} and the open ball B\(f) have no vertices), a finite number of vertices (for example the set {x \ x G Rn, x ^ 0, ex = 1}, where e in an n-vector of ones, has the n vertices e\ i = 1, . . . , n, where & is an n-vector with ef = 1 and e? = 0, i 5^ j), or an infinite number of vertices (for example the closed ball J?\(z)C Rn has an infinite number of vertices given by [x \ x G R", \\x — x\\ = X}). Theorem // (Fi),-e/ is a family (finite or infinite] of convex sets in Rn, then their intersection C\ Ft- is a convex set. i& PROOF l
2
Let x^x* G H Ft, and let 0 ^ X ^ 1. Then for each i G /, iGI
x ,x G F,, and since F, is convex, (1 — \)xl + \x2 G Ft. Polytope and polyhedron A set in Rn which is the intersection of a finite number of closed halfspaces in Rn is called a polytope. If a polytope is bounded (that is, for each x in the polytope \\x\\ ^ a for some fixed a G R), it is called a polyhedron. It follows from the convexity of the halfspaces 4 and Theorem 9 that poly topes and polyhedra are convex sets. Convex combination A point b G Rn is said to be a convex combination of the vectors 0 , • • • , «TO G Rn if there exist m real numbers pit . . . , pm such that 1
b = pia1 + • • • + pmam, pi, . . . , pm ^ 0, pi + • • • 4- Pm = 1 Equivalently, if we define an m X n matrix A whose ith row is Ai = a', 41
3.1
Nonlinear Programming
and if we let p = (pit . . . ,pm) £ Rm and e be an m-vector of ones, then we have that 6 is a convex combination of the rows of A if (6 = A'p, p ^ 0, ep = 1) has a solution p E Rm Note that if 6 is a convex combination of two points a1,a2 £ Rn, then this is equivalent to saying that 6 £ [a^o2] (see 2). Simplex Let x°, xl, . . . , xm be m + 1 distinct points in Rn, with m ^ n. If the vectors xl — x°, . . . , xm — x° are linearly independent, then the set of all convex combinations of x°, xl, . . . , xm
is called an m-simplex in /2n with vertices x°, x1, . . . , xm. (A 0-simplex is a point, a 1-simplex is a closed line segment, a 2-simplex is a triangle, and a 3-simplex is a tetrahedron.) Theorem A set F C Rn i$ convex if and only if for each integer m ^ 1, every convex combination of any m points of T is in T. Equivalently, a necessary and sufficient condition for the set T to be convex is that for each integer m ^ 1
PROOF The sufficiency of 14 is trivial; take m — 2, then T is convex by 3. The necessity of 14 will be shown by induction. For m = 1, 14 holds trivially. For m — 2, 14 holds as a consequence of 3. Assume now that 14 holds for m, we will now show that it also holds for m -f 1. Let x\ x*, . . . , xn+l £ T Pit • • • , Pm+i Pi +
^ 0
• • • + Pm+1 = 1
If pm+i = 0, then pix1 + • • • -f pmxm £ F, since 14 holds for m. If pm+1 = 1, thenpis 1 + • • • + Pm-nzm+1 = zm+1 £ r. If 0 < pm+1 < 1, 48
Convex Seta in Rn
3.1
then we can write
Caratheodory's theorem [Caratheodory 07] Let F C Rn- If ^ is a convex combination of points of F, then x is a convex combination of n + I or fewer points of F. PROOF
Let
We will show now that if m > n -f 1, then x can be written as a convex combination of m — 1 points in F. (This would establish the theorem then, for we could repeatedly apply the result until £ is a convex combination of n -f 1 points of F.) If any Pi in the above expression is zero, then x is a convex combination of m — 1 or fewer points of T. So let each pi > 0. Since m > n + 1, there exist rit . . . , rm_l G R, not all zero, such that
Define 9» = Pi — otTi
for i = I , . . . , m
where a is some positive number chosen such that #,- ^ 0 for all i, and at least one g,-, say g*, is equal to 0. In particular we choose a such that
43
a.i
Nonlinear Programming
Fig. 3.1.4 A set T and its convex hull [T].
Then
and
Hence x is a convex combination of m — 1 points in F. Convex hull Let T C Rn. The convex hull of T, denoted by [F], is the intersection of all convex sets in Rn containing F. (By Theorem 9, the convex hull of any set F C Rn is convex. Figure 3.1.4 shows a hatched nonconvex set in Rz and its shaded convex hull.) Obviously if F is convex, then F = [F]. Theorem The convex hull [F] of a set F C Rn is equal to the set of all convex combinations of points of F. PROOF
Let A denote the latter set, that is,
If x\x* G A, then
44
Conrex Sets in Rn
3.1
Hence for 0 g X ^ 1
and
Thus \xl -f- (1 — X)o;2 G A, and A is convex. It is also clear that r C A. Since A is convex, then [T] C A. We also have by Theorem 13 that the convex set [T] containing T must also contain all convex combinations of points of I\ Hence A C [r], and A = [r]. Sum of two sets Let T,A C Rn. Their sum r -f A is denned by Product of a set with a real number Let T C Rn, and let X E R-
The product XT is defined by
Note that if X = -1 and r, A C Rn, then A + XF = A - T. Note that this is not the complement of F relative to A as defined in 1.2 and written as
A ~ r.
Theorem The sum Y + A of two convex sets T and A in Rn is a convex set. PROOF Let 21,*2 E T + A, then zl = xl + yl and zz = x2 + yz, where xl,x2 G T and y^y2 £ A. For 0 ^ X g 1
Hence r + A is convex. Theorem The product pY of a convex set T in Rn and the real number p is a convex set. 45
1.2
Nonlinear Programming
PROOF Let zl,z* £ »T, then z1 = »xl, z2 = ^x\ where xl,x2 £ T. 0 ^ X^ 1
For
Corollary // F and A are two convex sets in R", then F — A is a convex set.
2. Separation theorems for convex sets It is intuitively plausible that if we had two disjoint convex sets in Rn, then we could construct a plane such that one set would lie on one side of the plane and the other set on the other side. Despite its simplicity, this is a rather deep result and is not easy to prove. One version of this result, the Hahn-Banach theorem, can be established by only using the vector space properties 1.3.3 of Rn and not the topological properties induced by the norm ||xj! [Berge 03, Valentine 64]. We shall, however, use these topological properties of Rn (all summarized in Appendix B) in deriving the separation theorems for convex sets. In particular our method of proof will make use of Gordan's theorem of the alternative 2.4-5 and the finite intersection theorem of compact sets B.3.2 (iii). (Knowledge of the contents of Appendix B is assumed from here on.)
Separating plane The plane {x \ x £ Rn, ex = a } , c 7^ 0, is said to separate (strictly separate] two nonempty sets F and A in Rn if
If such a plane exists, the sets F and A are said to be separable (strictly separable). Figure 3.2.1 gives a simple illustration in R2 of two sets in Rn which are separable, but which are neither disjoint nor convex. It should be remarked that in general separability does not imply that the sets are disjoint (Fig. 3.2.1], nor is it true in general that two disjoint sets are separable (Fig. 3.2.2}. However, if the sets are nonempty, convex, and 46
Convex Sets In R?
3.2
Fig. 3.2.1 Separable but not disjoint sets.
disjoint, then they are separable, and in fact this is a separation theorem we intend to prove. Lemma Let fl be a nonempty convex set in Rn, not containing the origin 0. Then there exists a plane {x \x (~ Rn, ex = 0}, c 7* 0, separating £2 and 0, thatis,
PROOF
With every x G & we associate the nonempty closed set
Let x1, . . . , xm be any finite set of points in fl. It follows from the convexity of fl, Theorem 3.1.13, and from the fact that 0 ^ 12, that
Fig. 3.2.2 Disjoint but not separable sets. 47
3.2
Nonlinear Programming
or equivalently
Hence by Gordan's theorem 2.4-5 x^y > 0, i — 1, . . . , m has a solution y G: Rn Obviously y ^ 0, and we can take y such that yy = 1. Then
and hence
The sets (A^en are closed sets relative to the compact set {y \ y G Rn, yy = 1} [see B.1.8 and B.3.2(\}}, hence by the finite intersection theorem B.3.2(\\\) we have that C\ Ax ^ 0. Let c be any point in this interseczen tion. Then cc = 1 and cz ^ 0 for all x G ^- Hence {# | a: G #n, ex = 0} is the required separating plane. | It should be remarked that in the above lemma we did not impose any conditions on fl other than convexity. The following example shows that the above lemma cannot be strengthened to x G ^ =* ex > 0 without some extra assumptions. The set is convex and does not contain the origin, but there exists no plane {x | x £ R*, ex = 0} such that x E & =» ex > 0 (Fig. 3.8.3). If on the other hand we do assume that ft is closed (or even if we
Fig. 3.2.3 48
Convex Sets in Rn
3.9
assume less, namely that the origin is not a point of closure ft), then we can establish a stronger result, that is, there exists a plane which strictly separates the origin from ft (see Corollary 4 and Lemma 5 below). However, before doing this, we need to establish the following fundamental separation theorem. Separation theorem Let F and A fee two nonempty disjoint convex sets in R". Then there exists a plane {x | x G Rn, ex = a } , c ^ 0, which separates them, that is,
PROOF
The set
is convex by Corollary 3.1.22, and it does not contain the origin 0 because F C\ A = 0. By Lemma 2 above there exists a plane \x \ x G Rn, ex = 0}, c 7± 0, such that
or Hence Define
Then
We derive now from the above fundamental separation theorem a corollary, and from the corollary a lemma, Lemma 5. Lemma 5 will be used in establishing a strict separation theorem, Theorem 6, below. Corollary Let 12 be a nonempty convex set in Rn. If the origin 0 is not a point of closure of fl (or equivalently if the origin is not in the closure fi of fi), then 49
3.2
Nonlinear Programming
there exists a plane {x \ x G Rn, ex = a ] , c 7* 0, a > 0, strictly separating 12 and 0, and conversely. In other words
PROOF («=) Assume that there exist c ^ 0, a > 0 such that ex > a for all x G Q. If 0 G 12, then (see BJ.3 and B.I.6} there exists an x G f such that ||z|| < a/2||c||, and hence
which is a contradiction. Hence 0 ££ 12. (=>) Since 0 is not a point of closure of 12, there exists an open ball Bf(0) = \x \ x G Rn, \\x\\ < e\ around 0 such that B e (0) n 0 = 0 (see B.1.3). Since the ball /?€(0) is convex (see 3.1.7), it follows by Theorem 3 that there exists a plane {x \ x G Rn, ex = 7}, c 7* 0, such that
Since 5t(0) is an open ball, it must contain the nonzero vector Sc for some positive 5. Hence 7 ^ dec > 0. Let a = ^dcc > 0. Then
Lemma Let ft be a nonempty closed convex set in Rn. If 12 does not contain the origin, then there exists a plane \x \ x G Rn, ex = a } , c ^ 0, a > 0, strictly separating 12 and 0, and conversely. In other words
PROOF This lemma follows from Corollary 4 above by observing that the requirement that 12 be closed and not contain the origin 0 implies that 0 is not a point of closure of 12, that is, 0 ^ fl (see B.1.3, B.I.5 and B.1.6). Strict separation theorem Let F and A be two nonempty convex sets in Rn, with T compact and A closed. If T and A are disjoint, then there exists a plane {x \ x G Rn, 80
3.2
Convex Set* in R"
ex = a ] , c 5^ 0 which strictly separates them, and conversely words
In other
PROOF («=) If x E F C\ A, then ex < a < ex, a contradiction. (=») The set is convex by Corollary 3.1.22 and closed by Corollary B.3.8. Hence by Lemma 5 above there exists a plane \x | x G Rn, ex = /x}, c 5^ 0, ^i > 0, such that orr
Hence
Define
Then
The above separation theorems will be used to derive some fundamental theorems for convex functions in the next chapter, which in turn will be used in obtaining the fundamental Kuhn-Tucker saddlepoint optimality criteria of convex nonlinear programming in Chap. 5 and also the minimum principle necessary optimality condition of Chap. 11. We remark here that a theorem of the alternative, the Gordan theorem 8.4.6, was fundamental in deriving the above separation theorems. We can reverse the process and use the above separation theorems to derive theorems of the alternative. Thus to derive Gordan's theorem 2.4.5, namely that either A'y = 0, y > 0 has a solution y G Rm or Ax > 0 51
8.3
Nonlinear Programming
Fig. 3.2.4 Geometric reinterpretation of Gordan's theorem by using Lemma 6. (a) A'y = 0, y > 0 has solution; Ax > 0 has no solution; (b) Ax > 0 has solution; A'y = 0, y > 0 has no solution.
has a solution x G Rn, we observe that if e G Rm is a vector of ones, then
The last implication follows by taking y = &• G Rm, i = 1, . . . , m, where & has zeros for all elements except 1 for the ith element. Using the framework of Lemma 5 we can give a geometric reinterpretation of the Gordan's theorem as follows: Either the origin 0 £ R"
Fig. 3.2.5 Geometric interpretation of Lemma 5 02
Convex Bets in A"
3.2
is in the convex hull of the row vectors A I , . . . , An of the matrix A (A'y = 0, y > 0 has a solution, Fig. 8.2.4a), or it is not (in which case by Lemma 5 Ax > 0 has a solution x = c, Fig. 3.2.4V). More generally, if fi is any nonempty closed convex set in Rn, then either it contains the origin, Fig. 8.2.5a, or it does not (in which case by Lemma 5 there exists a vector c £ Rn which makes a strict acute angle with each x G fy Fig. 3.2.5V). Problem Establish Farkas' theorem 2.4-6 by using Theorem 6 above. (Hint: Observe that A'y = 6, y ^ 0 has no solution if and only if the sets {6} and \z \ z = A'y, y ^ 0} are disjoint. Then use Theorem 6.)
us
Chapter Four Convex and Concave Functions
In this chapter we introduce convex, concave, strictly convex, and strictly concave functions defined on subsets of Rn. Convex and concave functions are extremely important in nonlinear programming because they are among the few functions for which sufficient optimality criteria can be given (Chaps. 5 and 7), and they are the only functions for which necessary optimality conditions can be given without linearization (Kuhn-Tucker saddlepoint condition in Chap. 5). We give in this chapter some of the basic properties of convex and concave functions and obtain some fundamental theorems involving these functions. These theorems, derived by using the separation theorems for convex sets of Chap. 3, are akin to the theorems of the alternative derived in Chap. 2 for linear systems. In this sense convex and concave functions inherit some of the important properties of linear functions. These fundamental theorems will be used to derive the important saddlepoint necessary optimality condition of Chap. 5 and the minimum principle necessary optimality condition of Chap. 11. Finally it should be mentioned that no differentiability or explicit continuity requirements are made on the functions introduced in this chapter. A subsequent chapter, Chap. 6, will be devoted to differentiate convex and concave functions.
Convex and Concave Functiona
4.1
1. Definitions and basic properties Convex function A numerical function 6 denned on a set T C Rn is said to be convex at x G F (with respect to F) if
6 is said to be convex on T if it is convex at each x G F. Note that this definition of a convex function is slightly more general than the customary definition in the literature [Fenchel 53, Valentine 64, Berge-Ghouila Houri 65] in that (i) we define convexity at a point first and then convexity on a set, and (ii) we do not require F to be a convex set. This generalization will allow us to handle a somewhat wider class of problems later. It follows immediately from the above definition that a numerical function 6 defined on a convex set F is convex on F if and only if
Figure l+.l,l depicts two convex functions on convex subsets of Rn = R.
Fig. 4.1.1 Convex functions on subsets of Rn = R. (a) A convex function 8 on R; (b) A convex function 6 on r = [ — 1, <*>). 55
4.1
Nonlinear Programming
Concave function A numerical function 6 defined on a set F C Rn is said to be concave at x £i F (with respect to F) if
6 is said to be concave on F if it is concave at each x G r. Obviously 9 is concave at x £ F (concave on F) if and only if — 6 is convex at x (convex on F). Results obtained for convex functions can be changed into results for concave functions by the appropriate multiplication by — 1, and vice versa. It follows immediately from the above definition that a numerical function 6 defined on a convex set F is concave on F if and only if
Figure 4-1'•# depicts two concave functions on convex subsets of Rn = R. Problem Show that a linear function, d(x) = ex — a, x G Rn, is both convex and concave on Rn, and conversely. Strictly convex function A numerical function 0 defined on a set F C Rn is said to be strictly
Fig. 4.1.2 Concave functions on subsets of Rn = R. (a) A concave function 8 on R; (b) A concave function 6 on r = [0,1]. 86
Convex and Concave Functions
4.1
convex at x £ F (with respect to F) if
6 is said to be strictly convex on F if it is strictly convex at each x £ F. Strictly concave function A numerical function 6 defined on a set F C Rn is said to be strictly concave at x £ F (with respect to F) if
8 is said to be strictly concave on T if it is strictly concave at each x G T. Obviously a strictly convex (strictly concave) function on a set r C Rn is convex (concave) on F, but not conversely. For example a constant function on Rn is both convex and concave on Rn, but neither strictly convex nor strictly concave on Rn. In fact, it can be easily shown that all linear functions 0(z) = ex — a on Rn are neither strictly convex nor strictly concave on Rn. Hence, because of the linear portion, the function depicted in Fig. ^.l.la is not strictly convex on R, but the function of Fig. 4-1 -lb is strictly convex on [— 1, «). Both functions of Fig. 4-1.2 are strictly concave on their domains of definition. An n-dimensional vector function / defined on a set F in Rn is convex at x G F, convex on F, etc., if each of its components /,, i — 1, . . . , m, is convex at x G F, convex on F, etc. Theorem Let f = (/i, . . . ,/m) be an m-dimensional vector function defined on F C Rn- Iffis convex atx G T (convex on F), then each nonnegative linear combination of its components /,-
6(x) = pf(x)
p £0
is convex at x (convex on F). 67
4.1
Nonlinear Programming
PROOF
Let x £ F, 0 ^ X ^ 1, and let (1 - X)z -f Xz G F.
Then
Problem Let 0 be a numerical function defined on a convex set F C #"• Show that 6 is respectively convex, concave, strictly convex, or strictly concave on T if and only if for each a:1,a:2 G F, the numerical function \p defined on the line segment [0,1] by is respectively convex, concave, strictly convex, or strictly concave on [0,1]. Theorem For a numerical function 9 defined on a convex set T C Rn to be convex on F it is necessary and sufficient that its epigraph be a convex set in Rn+1. PROOF (Sufficiency) Assume that Ge is convex. Let xl,xz G T, then [xl,6(x1)] G Ge and [z2,0(£2)] G Ge. By the convexity of Ge we have that or
and hence 6 is convex on F. (Necessity) Assume that 6 is convex on F. Let xl,£l G G9 and xz,£2 G Ge. By the convexity of 6 on F we have that for 0 ^ X ^ 1
SB
Convex and Concave Functions
4.1
Hence' and Ge is a convex set in Rn+1. Corollary For a numerical function 6 defined on a convex set T C Rn to be concave on F it is necessary and sufficient that its hypograph be convex set in Rn+l. Figure J^.l.Sa depicts a convex function on T and its convex epigraph Ge. Figure 4-l-$b depicts a concave function on T and its convex hypograph Hf. Theorem Let 6 be a numerical function defined on a convex set T C Rn- A necessary but not sufficient condition for 6 to be convex on T is that the set be convex for each real number a. PROOF
Let B be convex on T and let xl,xz £ Aa. Then
Hence (1 — X)^ 1 -f- Xz 2 G A a , and Aa is convex.
Fig. 4.1.3 The convex epigraph Ge of a convex function and the convex hypograph He of a concave function. 59
Nonlinear Programming
4.1
We now show that if Aa is convex for each a, it does not follow that 6 is a convex function on F. Consider the function Q on R defined by Q(x) = (a;)3. 6 is not convex on R. However, the set is obviously convex for any a (see 3.1.4). Corollary Let d be a numerical function defined on the convex set F C Rn- A necessary but not sufficient condition for 0 to be concave on F is that the set be convex for each real number a. Figure ^A.^a depicts a convex function 0 on a convex set F C R" = R and the associated convex set A a . Figure 4-1-4b depicts a nonconvex function 6 and the associated convex set A a . Figure 4-1-4c depicts a concave function 6 on a convex set F C Rn = R and the assof>mt,prl r.nnvpv spf. fi...
Problem Let 9 be a numerical function defined on the convex set F C Rn. Show that a necessary and sufficient condition for 9 to be convex on F is that for each integer m ^ 1
Fig. 4.1.4 The convex sets Aa and fla of 10 and 11 associated with a function 6. 60
Convex and Concave Functions
(Hint: Use Theorems 8 and 3.1.13. inequality [Jensen 06].)
4-1
The above inequality is Jensen's
Theorem If (0t)i'e/ is a family (finite or infinite) of numerical functions which are convex and bounded from above on a convex set T C Rn, then the numerical function 6(x) = sup 0i(x) iei is a convex function on T.
PROOF
Since each 0,- is a convex function on F, their epigraphs
are convex sets in Rn+1 by Thearem 8, and hence their interaction
is also a convex set in Rn+1 by Theorem 3.1.9. But this convex intersection is the epigraph of 6. Hence 6 is a convex function on F by Theorem 8. | Corollary If (#i)ie/ is a family (finite or infinite) of numerical functions which are concave and bounded from below on a convex set T C Rn, then the numerical function
is a concave function on F. We end this section by remarking that a function 0 which is convex on a convex set F C Rn is not necessarily a continuous function. For example on the halfline T = {x \ x ^ R, x ^ — I } , the numerical function
is a convex function on F, but is obviously not continuous at x = — 1, Fig. 4-1-1 b. However, if F is an open convex set, then a convex function 6 on F is indeed continuous. This fact is established in the following theorem. 61
4,1
Nonlinear Programming
Theorem Let F be an open convex set in Rn. If 6 is a convex numerical function on T then 6 is continuous on T. PROOF [Fleming 65]f Let x° G T, and let a be the distance (see L3.9) from x° to the closest point in Rn not in T (a — + w if r = Rn). Let C be an n-cube with center x° and side length 25, that is By letting (n)*43 < a, we have that C C r. vertices of C. Let
Let V denote the set of 2"
/3 = max 6(x) x€V
By Theorem 10 the set A0 = [x \ x £ T, e(x) ^ ft] is convex. Since C is the convex hull of V (this can be easily shown by induction on u) and V C A0, it follows that C C Ap, by Theorem 3.1.13 (Fig. 4.1.5). Let x be any point such that 0 < \\x — x°\\ < d, and define x° + u, x° — u on the line through x° and x as in Fig. 4-1-5. Write x now as a convex combination of x° and x° + u, and x° as a convex combination of x and z° - M. If X = \\x - x°\\/8, then
t Fleming attributes this proof to F. J. Almgren.
Fig. 4.1.5 69
Convex and Concave Functions
4.9
Since 8 is convex on F
These inequalities give
Thus for any given e > 0 it follows that 1 6 ( x ) - B(x°)\ < e for all x satisfying [ft - 0(x°)] ||x - x°\\ < ed, and hence 6(x) is continuous at oj°. | Since the interior of each set F C Rn is open, it follows that if 6 is a convex function on a convex set F C Rn, it is continuous on its interior.
2.
Some fundamental theorems for convex functions
We saw in Chap. 2 that Farkas' theorem of the alternative played a crucial role in deriving the necessary optimality conditions of linear programming. In this section we shall derive what may be considered extensions of theorems of the alternative of Chap. 2 to convex and concave functions. These theorems in turn will play a similar crucial role in deriving the necessary optimality conditions of nonlinear programming in Chaps. 5 and 11. (In the remainder of this chapter various properties of continuous and semicontinuous functions will be used. For convenience, these results are summarized in Appendix C.) We begin by establishing a fundamental theorem for convex functions, the essence of which is given in [Fan-Glicksburg-Hoffman 57]. Theorem Let r be a nonempty convex set in Rn, let f be an m-dimensional convex vector function on T, and let h be a k-dimensional linear vector function on Rn. If
63
4.3
Nonlinear Programming
then there exist p G Rm and q G Rk such that
REMARK p ^ 0 and (p,q) ^ 0 docs not imply p > 0 and q ^ 0, but it docs imply p > 0 or q ^ 0 or both. However if we delete the linear equalities h(x) = 0, then p > 0. PROOF
Deh'ne the sets
and
By hypothesis A does not contain the origin 0 G Rm+k. for if ( y l , z l ) and (# 2 ,£ 2 ) are in A, then for 0 ^ X ^ 1
Also, A is convex,
and
Because A is a nonempty convex set not containing the origin, it follows by Lemma 3.2.2 that there exist /) G Rm, q G Rk, (p,q) ^ 0 such that Since each u, can be made as large as desired, p ^ 0. Let e > 0, u = f(x) + ee, v = h(x), x G T, where e is a vector of ones in Rm. Hence (u,v) G A(2) C A, and
or
Now, if
we get, by picking e such that epe < 8, that
64
Convex and Concave Functions
*-2
which is a contradiction to the fact that pf(x) + qh(x) ^ — epe for all x G r. Hence
If we observe that for an m-dimensional vector function / denned on T C Rn we have that
and
then the following corollary is a direct consequence of Theorem 1. Corollary Let T be a nonempty convex set in Rn, let f i , f t , fa be m1-, m2-, and m*-dimensional convex vector functions on T, and h a k~dimensional linear vector function on Rn. If
then there exist pi E Rm\ Pa G Rm*, Pz G Rm>, and q G Rk such that
We give now a generalization of Gordan's theorem of the alternative 2.4-.S to convex functions over an arbitrary convex set in Rn. Generalized Gordan theorem [Fan-Glicksburg-Hoffman 57] Let f be an m-dimensional convex vector function on the convex set T C Rn. Then either I
f(x) < 0 has a solution x G r £6
4.1
Nonlinear Programming
or
II
pf(x) ^ 0 for all x G T /or some p > 0, p £ Rm
but never both. PROOF (I => II) Let z £ T be a solution of f(x) < 0. Then for any p > 0 in Rm, pf(x) < 0, and hence II cannot hold. (I ==» II) This follows directly from Theorem 1 above by deleting h(x) = 0 from the theor To see that 3 is indeed a generalization of Gordan's theorem 2.4-5 we let f(x) = Ax, where A is an m X n matrix. Then
where the last equivalence follows by taking x = ±e\ i = 1, . . . , n, where ei £ Rn has zeros for all its elements except 1 for the ith element. In the same spirit, Theorem 1 above can be considered a partial generalization of Motzkin's theorem 24-% to convex functions. The generalization is partial (unlike 3 which is a complete generalization of Gordan's theorem), because the statement of Theorem 1 does not exclude the possibility of both systems having a solution, that is, there may exist an x £ T and p ^ 0, (p,q) ^ 0, such that f(x) < 0, h(x) = 0, and P/C*0 + qh(x) ^ 0 for all x G F. Similarly, Corollary 2 is a partial generalization of Slater's theorem 2.4-1- However, it is possible to sharpen Theorem 1 and make it a theorem of the alternative if we let T = Rn, h(x) = Bx — d and require that the rows of B be linearly independent. We obtain then the following result. Theorem Let f be a given m-dimensional convex function on Rn, let B be a given k X n matrix with linearly independent rows, and let d be a given k-dimensional vector. Then either I
f(x) < 0, Bx = d has a solution x £ Rn
or
pf(x) -f q(Bx - d) ^ 0 for all x £ Rn for some p > 0, p G Rm, qER" but never both.
II
66
Convex and Concave Functions
4.9
PROOF (I => II) Let x £ Rn be a solution of f(x) < 0 and Bx = d. Then for any p > 0 and q in # m and Rk respectively,
Hence II cannot hold. (I => II) If I has no solution then by Theorem 1 there exists p ^ 0, (p,q) ^ 0 such that
If p > 0, the theorem is proved. We assume the contrary, that p = 0, and exhibit a contradiction. If p = 0, then
We will show now that B'q = 0. For, if B'q ^ 0, then by picking x = — qB for the case when qd ^ 0, and x — 2(qd)qB/qBB'q for the case when qd < 0, we obtain that q(Bx — d) < 0. Hence B'q = 0 for some q 7* 0, which contradicts the assumption that the rows of B are linearly independent. We close this chapter by obtaining another fundamental theorem for a (possibly infinite) family of convex and linear functions. Theorem [Bohnenblust-Karlin-Shapley 50] Let r be a nonempty compact convex set in Rn and let (fi)i^M be a family (finite or infinite) of numerical functions which are convex and lower semicontinuous on T, and let (hi)^K be a family (finite or infinite) of linear numerical functions on Rn. If
then for some finite subfamily (/,-„ . . . ,/iJ of (fi)i£M and some finite subfamily (h^, . . . ,^,-J of (fti),-e/c there exist p £ Rm and q £ Rk such that
If K is empty, that is if all equalities hi(x) = 0 are deleted, then the last inequality above (^ 0) becomes a strict inequality (> 0). 67
Nonlinear Programming
4.2
PROOF
[Berge-Ghouila Houri 65] The system
has no solution x in T. [For if it did have a solution x, then /,(£) ^ « for all e > 0 and all i G M, and hi(x) — 0 for all i G K. This in turn implies that /,•(£) ^ 0 for all i G -fl/ and /i,(x) = 0 for all i E. K (for otherwise if fi(x) > 0 for some i G M, then picking e = Y^ji(x) > 0 would lead to a contradiction). This however contradicts the hypothesis of the theorem.] The sets are closed sets (because of the lower semicontinuity of /,, the linearity of hj, and the compactness of T, see Appendix C) contained in the compact set F, and their intersection is empty. Hence by the finite intersection theorem B.3.2(iii) there exist a finite number of such sets so that their intersection is empty. Thus we obtain indices (ii,iz, • • • ,im) G M, (ii,iz, . . . ,4) G K, and real numbers ei, e2, . . . , em > 0, such that t.Vip svat.pm
has no solution x G T. Hence by Corollary 2 there exist p G Rm, 3 G Rk such that and
from which the conclusion of the theorem follows if we observe that
68
Chapter Five Saddlepoint
Optimality Criteria of Nonlinear Programming Without Differentiability
The purpose of this chapter is to derive optimality criteria of the saddlepoint type for nonlinear programming problems. This type of optimality criterion is perhaps best illustrated by a simple example. Consider the problem of minimizing the function 6 on the set X = [x | x G R, -x + 2 £ O j , where Q(x) — (x)2. Obviously the solution is x = 2, and the minimum is 6(x) = 4. The saddlepoint optimality criterion for this problem is this: A necessary and sufficient condition that a; be a solution of the minimization problem is that there exists a real number u (here u = 4) such that for all x G R and all u G R, u ^ 0 6(x) + u(-x + 2) ^ 6(x) + u(-x + 2) ^6(x) +u(-x + 2)
It is easy to verify that the above inequalities are satisfied for x = 2, u = 4. Hence the function \f/ defined on R* by t(x,u) = 8(x) + u(-x + 2) has a saddlepoint at x = 2, n = 4, because it has a minimum at (x,u) with respect to x for all real x, and a maximum with respect to u for all real nonnegative u. For the above simple problem, the saddlepoint criterion happens to be both a necessary and a sufficient optimality criterion for x to be a solution of the minimization problem. This is not always the case. We shall show in this chapter that the above saddlepoint condition is a sufficient optimality condition without any convexity
6.1
Nonlinear Programming
requirements. However to establish the necessity of the above saddlepoint condition, we need not only convexity but also some sort of a regularity condition, a constraint qualification. This confirms earlier statements made to the effect that necessary optimality conditions are more complex and harder to establish. We shall develop the optimality criteria of this chapter without any differentiability assumptions on the functions involved. Subsequent chapters, Chaps. 7 and 11, will establish optimality criteria that involve differentiable functions.
1. The minimization and saddlepoint problems The optimality criteria of this chapter relate the solutions of a minimization problem, a local minimization problem, and two saddlepoint problems to each other. We define these problems below now. Let X° be a subset of Rn, let 0 and g be respectively a numerical function and an m-dimensional vector function defined on X°. The minimization problem (MP) Find an x, if it exists, such that
The set X is called the feasible region or the constraint set, x the minimum solution or solution, and 6(x) the minimum. All points x in the feasible region X are called feasible points. If X is a convex set, and if 6 is convex on ^T, the minimization problem MP is often called a convex programming problem or convex program, (We observe that the above minimization problem is a special case of the general minimization problem 1.6.9, where the additional ^-dimensional vector equality constraint h(x) = 0 was also present. The reason for this is that in the absence of differentiability there are no significant optimality criteria for problems with nonlinear equality constraints. Some results for linear equality constraints will be obtained however. See 5.8.2, 6.4.2, and 6.4.8.) The local minimization problem (LMP) Find an x in X, if it exists, such that for some open ball B&(x) around x with radius 6 > 0
70
Saddlepoint Optimality Criteria without Differentiability
8.1
The Fritz John saddlepoint problem (FJSP)
The Kuhn-Tucker saddlepoint problem (KTSP)
Remark If (z,fo,f) is a solution of FJSP and f 0 > 0, then (x,f/f0) is a solution of KTSP. Conversely, if (x,u) is a. solution of KTSP, then (x,l,u) is a solution of FJSP. Remark The numerical functions <(>(x,r0,r) and t(x,u) defined above are often called Lagrangian functions or simply Lagrangians, and the m-dimensional vectors f and u Lagrange multipliers or dual variables. These multipliers play a role in linear and nonlinear programming which is very similar to the role played by the Lagrange multipliers of the classical calculus where a function of several variables is to be minimized subject to equality constraints (see for example [Fleming 65]). Here, because we have inequality constraints, the Lagrange multipliers turn out to be nonnegative. When we shall consider equality constraints in 5.3.2, 5.4-2, and 5.4-8, the multiplier associated with these equalities will not be required to be nonnegative. Remark The right inequality of both saddlepoint problems, FJSP 3 and KTSP 4 and
71
5.2
Nonlinear Programming
can be interpreted as a minimum principle, akin to Pontryagin's maximum principle] [Pontryagin et al. 62]. Pontryagin's principle in its original form is a necessary optimality condition for the optimal control of systems described by ordinary differential equations. As such, it is a necessary optimality condition for a programming problem, not in Rn, but in some other space. More recently [Halkin 66, Canon et al. 66, Mangasarian-Fromovitz 67] a minimum principle has also been established for optimal control problems described by ordinary difference equations. This is a programming problem in Rn, which unfortunately is not convex in general, and hence the results of this chapter do not apply. However the optimality conditions of Chaps. 7 and 11, which are based mainly on linearization and not on convexity, do apply to optimal control problems described by nonlinear difference equations.
2.
Some basic results for minimization and local minimization problems
We establish now some basic results concerning the set of solutions of the minimization problem and relate the solutions of the minimization and local minimization problems to each other. Theorem Let X be a convex set, and let 6 be a convex function on X. of solutions of MP 5.1.1 is convex.
The set
REMARK A sufficient but not necessary condition for the convexity of X is that X° be a convex set and that g be convex on X°. This follows from 4.1.10 and 3.1.9. PROOF
Let xl and x* be solutions of MP.
That is,
It follows by the convexity of X and 0, that for O ^ X ^ 1 , ( 1 — X).tl-|Xz 2 G X, and
Hence (1 — X)^1 + Xz2 is also a solution of MP, and the set of solutions is convex. t Pontryagin gets a maximum principle instead of a minimum principle because his Lagrangian is the negative of the Lagrangian of nonlinear programming. 72
Saddlepoint Optimality Criteria without Differentiability
5.2
Uniqueness theorem Let X be convex and x be a solution of MP 5.1.1. If 8 is strictly convex at x, then x is the unique solution of MP. PROOF Let x ^ x be another solution of MP, that is, x £ X, and 0(z) = 6(x). Since -X" is convex, then (1 — X)x + Arc G X whenever 0 < X < 1, and by the strict convexity of 0 at x This contradicts the assumption that 8(x) is a minimum, and hence x cannot be another solution. Theorem Let X be convex, and let 6 be a nonconstant concave function on X. Then no interior point of X is a solution of MP 5.1.1, or equivalently any solution x of MP, if it exists, must be a boundary point of X. PROOF If MP 5.1.1 has no solution the theorem is trivially true. Let x be a solution of MP. Since 6 is not constant on X, there exists a point x G X such that 6(x) > 6(x). If z is an interior point of X, there exists a point y £j X such that for some X, 0 ^ X < 1 See Fig. 5.2.1.
Hence
and B(x] does not attain its minimum at an interior point z. Figure 5.2.2 shows a simple example of Theorem 3 in R. Theorem If x is a solution of MP 5.1.1, then it is also a solution of LMP 5.1.2. The converse is true if X is convex and 6 is convex at x.
Fig. 5.2.1 73
Nonlinear Programming
6.8
Fig. 5.2.2 A simple example of Theorem Sin R.
PROOF If x solves MP, then x solves LMP for any 8 > 0. To prove the converse now, assume that x solves LMP for some 8 > 0, and let X be convex and 6 be convex at x. Let y be any point in X distinct from x. Since X is convex, (1 — X)z + Xy G X for 0 < X ^ 1. By choosing X small enough, that is, 0 < X < 8/\\y — x\\ and X ^ 1, we have that Hence
from which it follows that
3. Sufficient optimality criteria The main sufficient optimality criteria developed here (1 and 2 below) require no convexity assumptions on the minimization problem MP 5.1.1. These criteria are quite straightforward to obtain and need no complicated machinery to derive. First results of this type were obtained in [Uzawa 58]. Sufficient optimality theorem // (x,u) is a solution of KTSP 5.1.4, then x is a solution of MP 5.1.1. If (x,f0,f) is a solution of FJSP 5.1.3, and fo > 0, then x is a solution of MP 5.1.1. PROOF The second statement of the theorem follows trivially from the first statement by Remark 5.1.5. Let (x,ii) be a solution of KTSP 6.14. Then for all u ^ 0 in Rm 74
Saddlepoint Optimality Criteria without Differentiability
5.8
and all x in X°
From the first inequality we have that For any j, I ^ j ^ m, let
It follows then that QJ(X) ^ 0. Repeating this for all j, we get that g(x) ^ 0, and hence x is a feasible point, that is, x G X. Now since u ^ 0 and g(x) ^ 0, we have that ug(x) ^ 0. But again from the first inequality of the saddlepoint problem we have, by setting u = 0, that ug(x) ^ 0. Hence ug(x) = 0. Let x be any point in X, then from the second inequality of the saddlepoint problem we get
Hence x is a solution of MP. | It should be remarked here that because no convexity assumptions were made in the above theorem, equality constraints can be handled by replacing them by two inequality constraints. That is, replace h(x) = 0 by h(x) g 0 and —h(x) ^ 0. Problem Consider the minimization problem where h is a A;-dimensional vector function on X° and all else is defined as in MP 5.1.1. Let and
Show that if there exist x £ X°, u £ Rm, u ^ 0, v £ Rk such that
75
5.4
Nonlinear Programming
or if there exist x G X°, f0 G R, f0 > 0, f G Rm, f ^ 0, s £ Rk such that
then x is a solution of the minimization problem. (Notice that v and s are not restricted in sign.) The question may be raised as to what sort of point is the point x if (x,f0,f) is a solution of FJSP 5.1.8 and we do not require that f0 > 0. An answer to this question is given by the following result. Corollary // (x,fo,f) is a solution of FJSP 5.1.3, then either x solves MP 5.1.1 or X has no interior relative to g(x] ^ 0, that is, {x \ x G -X"0, g(x) < 0} = 0. PROOF By the same argument as in the proof of Theorem 1 above we show that g(x) ^ 0 and rg(x) = 0. Now, if f 0 > 0, then x solves MP by Theorem 1. If f 0 = 0, then f > 0 and we have from the second inequality of FJSP 5.1.8 that Now, if the set {x \ x G X°, g(x) < 0} is nonempty, then for any element x in it fg(x) < 0, which contradicts the fact established above that fg(x) ^ 0 for all x G ^°- Hence {x \ x G X°, g(x) < 0} = 0. |
4. Necessary optimality criteria The situation with respect to necessary criteria is considerably more complicated than the situation with respect to sufficient optimality criteria. The two situations are compared in the following table:
Necessary criteria
Sufficient criteria
(a) Convexity needed
No convexity needed
(6) Consequence of separation theorem of Separation theorem of convex sets not convex sets needed needed (c) Regularity condition (constraint quali- No constraint qualification needed fication) needed in the more important necessary criterion (7 below) 76
Saddlepoint Optimality Criteria without Diflerentiability
5.4
We begin by establishing a necessary optimality criterion which does not require any regularity conditions. This necessary optimality criterion is similar in spirit to the necessary optimality criterion of Fritz John [John 48] (see also Chap. 7), which was derived for the case where the functions 0 and g were differentiate but not convex. We use no differentiability here, but instead we use convexity. The present criterion is a saddlepoint criterion, whereas Fritz John's is a gradient criterion. The main point of similarity is the presence of the multiplier f 0 in both criteria. Fritz John saddlepoint necessary optimality theorem [Uzawa 58, Karlin 59] Let X° be a convex set in Rn, and let 6 and g be convex on X°. If x is a solution of MP 5.1.1, then x and some f 0 G R, f G Rm, (?o,f) > 0 solve FJSP 5.1.3 and fg(x) = 0. PROOF
Because x solves MP
By Corollary 4.2.2 there exist f 0 E R, f £ Rm, (f0,f)
> 0 such that
By letting x = x in the above, we get that fg(x) ^ 0. But since f ^ 0 and g(x) ^ 0, we also have fg(x) ^ 0. Hence and
which is the second inequality of FJSP 5.1.3. We also have, because g(x) ^ 0, that and hence, since fg(x) = 0 which is thefirstinequality of FJSP 5.1. Problem Consider the minimization problem
77
5.4
Nonlinear Programming
where h is a fc-dimensional linear vector function on Rn, 6 and g are convex on X°, and all else is defined as in MP 5.1.1. Show that if I is a solution of the above problem, then x and some f0 G R, f G Rm, s G Rk, (f0,f) ^ 0, (f0,f,§) ?£ 0 satisfy fg(x) = 0, and
(Hint: Again use Corollary 4-#-#-) It should be remarked here that in the above necessary optimality criteria there is no guarantee that f 0 > 0. In cases where f 0 = 0 it is intuitively obvious that the necessary optimality criterion FJSP 5.1.3 does not say much about the minimization problem MP 5.1.1, because the function 8 has disappeared from 5.1.3 and any other function could have played its role. In order to exclude such cases, we have to introduce some regularity conditions. These regularity conditions are referred to in the literature as constraint qualifications. We shall have occasion to use a number of these constraint qualifications throughout this book. Some of these constraint qualifications (like the three introduced below) make use only of the convexity properties of the functions defining the feasible region X. Other constraint qualifications, to be introduced later, in Chap. 7 for example, make use mostly of the differentiability properties of the functions defining the feasible region X. Slater's constraint qualification [Slater 50] Let X° be a convex set in Rn. The m-dimensional convex vector function g on X° which defines the convex feasible region is said to satisfy Slater's constraint qualification (on X°) if there exists an x G X°suchthat0(:r) < 0. Karlin's constraint qualification [Karlin 59] Let X° be a convex set in Rn. The m-dimensional convex vector function g on X° which defines the convex feasible region is said to satisfy Karlin's constraint qualification (on X°) if there exists no p G Rm, P > 0 such that
78
Saddlepoint Optimality Criteria without Differentiability
6.4
The strict constraint qualification Let X° be a convex set in Rn. The m-dimensional convex vector function g on X° which defines the convex feasible region is said to satisfy the strict constraint qualification (on X°) if X contains at least two distinct points x1 and x 2 such that g is strictly convex at x1. Lemma Slater's constraint qualification 3 and Karlin's constraint qualification 4 are equivalent. The strict constraint qualification 5 implies Slater's and Karlin's constraint qualifications 3 and 4PROOF (3 <=> 4) pnnivalpnt.
By Gordan's generalized theorem 4.8.3, 3 and 4 are
Because g is strictly convex at x1, it follows from 4-1-4 that where the last inequality follows from the fact that g(xl) ^ 0 and g(xz) ^ 0. Thus g satisfies Slater's constraint qualification 3 and hence also Karlin's. We are ready now to derive the most important necessary optimality criterion without the use of differentiability. The theorem is widely known under the name Kuhn-Tucker [Kuhn-Tucker 51], even though Kuhn and Tucker required both convexity and differentiability in its derivation. The theorem in its present form, without any differentiability requirements, is attributed to Uzawa [Uzawa 58] and Karlin [Karlin 59]. Kuhn-Tucker saddlepoint necessary optimality theorem [Kuhn-Tucker 51, Uzawa 58, Karlin 59] Let X° be a convex set in Rn, let 6 and g be convex on X°, and let g satisfy Slater's constraint qualification 3, Karlin's constraint qualification 4, or the strict constraint qualification 5 on X°. If x is a solution of MP 5.1.1, then x and some u G Rm, u i; 0, solve KTSP 5.1.4 and ug(x) = 0. PROOF We first observe that by Lemma 6 above we need only establish the theorem under Karlin's constraint qualification. By Theorem 1, x and 79
5.4
Nonlinear Programming
some f0£R, f £ Rm, (f 0 ,f) > 0, solve FJSP 5.1.3 and fg(x) = 0. If f 0 > 0, then by Remark 5.1.5 we are done. If f0 = 0, then f > 0, and from the second inequality of FJSP 5.1.3 which contradicts Karlin's constraint qualification 4- Hence f 0 > 0. We summarize in Fig. 5.4-1 the relationships between the solutions of the various problems of this chapter. We end this section by deriving a Kuhn-Tucker saddlepoint necessary optimality criterion in the presence of linear equality constraints. In order to do this, we have to let the set X° of MP 5.1.1 be the entire space Rn. Kuhn-Tucker saddlepoint necessary optimality theorem in the presence of linear equality constraints [Uzawa 58] Let 6, g be respectively a numerical function and an m-dimensional vector function which are both convex on Rn. Let h be a k-dimensional linear vector function on Rn, that is, h(x) = Bx — d, where B is a k X n matrix, and d is a k-vector. Let x be a solution of the minimization problem
and let g and h satisfy any of the constraint qualifications: (i)
(Generalized Slater 3} g(x) < 0, Bx — d has a solution x £ Rn
Fig. 5.4.1 Relationships between the solutions of the local m i n i m i z a t i o n problem (LMP) 5.1.2, the minimization problem (MP) 5.1.1, the Fritz John saddlepoint problem (FJSP) 5.1.3, and the Kuhn-Tucker saddlepoint problem (KTSP) 6.1.4. 80
Saddlepoint Optimality Criteria without Differentiability
S.4
There exists no p > 0, p G Rm, 9 G Rk
(ii)
(Generalized Karlin 4) such that
(iii)
(Generalized strict 5) X contains at least two distinct points xl and x2 such that g is strictly convex at x1
Then x and some u G: Rm, u ^ 0, v G Rk satisfy ug(x) = 0, and
PROOF
We shall first establish the fact that
(iii) =* (i) => (") and then prove the theorem under (ii). [(iii) => (i)] Since g(xl) ^ 0, g(x*) g 0, Bxl = d, Bx* = d, we have for 0 < X < 1 that B[(l - \)xl + Xz2] = d and Hence (i) holds. [(i) ==> (ii)] If g ( x ) < 0 and Bx = d, then for any p > 0, p € RMI, and any q € Rk, Hence (ii) holds. We establish now the theorem under (ii). There will be no loss of generality if we assume that the rows B I , . . . , Bk of B are linearly independent, for suppose that some row, Bk say, is linearly dependent on BI, . . . , Bk-i, that is Bk = V s,-J?,-, where $1, . . . , Sk-i are fixed real numbers. Then
for any x satisfying B{X = di, i = 1, . . . , k — 1. But since x (— X, and BiX = di, i = I , . . . , k, it follows that V Mt — dk = 0 and .B*z — d* = 0 for any x satisfying B{x = d,-, i = 1, . . . , fc — 1. 81
6.4
Nonlinear Programming
Hence the equality constraint BkX = dk is redundant and can be dropped from the minimization problem without changing the solution x. Then, once we have established the theorem for the linearly independent rows of B, we can reintroduce the linearly dependent row Bk (without changing the minimization problem) and set Vk — 0 in the saddlcpoint problem. By 2 above, there exist f0 £ R, f £ Rm, s E Rk, (f0,f) ^ 0, (f0,f,s) T^ 0, which satisfy fg(x) = 0 and solve the saddlepoint problem of 2. If TO > 0, then u = f/f0) v — s/f0 solve the saddlepoint problem of the present theorem, and we are done. Suppose f 0 = 0. Then since fg(x) = 0 and Bx — d = 0, we have by the second inequality of the saddlepoint problem of 2 that
0 ^ fg(x) + s(Bx - d)
for all x £ Rn
which contradicts (ii) above, if f > 0. Now suppose that f = 0, then s ^ 0 and s(Bx — d) ^ 0 for all x in R". Hence (see last part of proof of 4-%-4) B's = 0, which contradicts the assumption that the rows of B are linearly independent. Thus f 0 > 0.
82
Chapter Six Differentiable Convex and Concave Functions
In this chapter we give some of the properties of differentiable and twice-differentiable convex and concave functions. Appendix D summarizes the results of differentiable and twice-differentiable functions which are needed in this chapter.
1. Differentiable convex and concave functions Let 8 be a numerical function denned on an open set T in Rn. We recall from Appendix D that if 6 is differentiable at x G T, then
where V6(x) is the n-dimensional gradient vector of 6 at x whose n components are the partial derivatives of 6 with respect to Zi, . . . , xn evaluated at x, and a is a numerical function of x. Theorem Let 8 be a numerical function defined on an open set T C Rn and let 8 be differentiable at x G T. If 8 is convex at x G T, then d(x) - 8(x) ^ V6(x)(x - x) for each x G T // 8 is concave at x G T, then 8(x) - 8(x) ^ V8(x)(x - x) for each x G T PROOF Let 8 be convex at x. Since T is open, there exists an open ball B&(x) around x, which is contained in T. Let x E T, and let x 5^ x. Then for some p., such
6.1
Nonlinear Programming
that 0 < n < 1 and /i < d/\\x — x\\, we have that
x = x + n(x - x) = (1 - n)x + /iz G B«(s) C r Since 5 is convex at a;, it follows from 4-1-1, the convexity of BS(X) (see 3.J.7), and the fact that x G Bt(x), that for 0 < X ^ 1 (1 - \)6(x) + X0(z) ^ 6[(l - \)x + Xz]
or
Since
taking the limit of the previous expression as X approaches zero gives Since 6 is convex at x, since ^ G T, and since f = (1 — n)x + /ux, we have by 4-1-1 that But since and fj. > 0, the last three relations give The proof for the concave case follows in a similar way to the above by using 4-1-® instead of 4-1-1. Theorem Let 8 be a numerical differentiable function on an open convex set F C Rn- & is convex on T if and only if 6 is concave on T if and only if
84
Diflerentiable Convex and Concave Functions
6.1
PROOF
(Necessity) Since 6 is convex (concave) at each xl G F, this part of the proof follows from Theorem 1 above. (Sufficiency) We shall establish the result for the convex case. The concave case follows in a similar way. Let xl,xz GE F, and let 0 ^ X ^ 1. Since F is convex, (1 — X)^1 -f Xz2 £ F. We have then
Multiplying the first inequality by (1 — X), the second one by X, and adding gives A geometric interpretation of the above results can be given as follows. For a differentiable convex function 6 on F, the linearization 6(x) 4- V0(x)(x — x) at x never overestimates 8(x) for any x in F, see Fig. 6.1.1. For a differentiable concave function 6 on F, the linearization 6(x) 4- V6(x)(x — x) at x never underestimates 6(x) for any x in F, see Fig. 6.1.2. Theorem Let 9 be a numerical differentiable function on an open convex set T C Rn- A necessary and sufficient condition that 6 be convex (concave) on F is that for each xl,x2 £ F
Fig. 6.1.1 Linearization of a convex function 0 never overestimates the function. SB
6.1
Nonlinear Programming
Fig. 6.1.2 Linearization of a concave function 6 never underestimates the function.
PROOF We shall establish the theorem for the convex case. cave case is similar.
The con-
(Necessity} Let 6 be convex on F and let xl,x2 (E F. By Theorem 2 we have that and
Adding these two inequalities gives
(Sufficiency) Letxl,x2 £ F. Then for 0 ^ X ^ 1, (1 - \)xl -f Xx2 E F. Now by the mean-value theorem D.2.1 we have for some X, 0 < X < 1 But by assumption or
Hence and by Theorem 2 above, 6 is convex on F. If / is an n-dimensional function on F C Rn, and [/(x2) — /(x1)] 86
Dlflerentiable Convex and Concave Function*
6.1
(x2 — x1) ^ 0 for all xl,x2 £ T, then / is said to be monotone on T. It is seen from the above theorem that a differentiable numerical function on the open convex set F C R" is convex if and only if V0 is monotone on F. There exists an extensive literature on monotone functions [Zarantonello 60, Browder 66, Minty 64, Opial 67], which is involved mainly with the solution of nonlinear equations in more general spaces than Rn. Some use of monotone functions in nonlinear programming has been made [Karamardian 66, Rockafellar 67b], but the full power of the theory of monotone functions has not, it seems, been exploited in nonlinear programming.
2. Differentiable strictly convex and concave functions All the results of the previous section extend directly to strictly convex and strictly concave functions by changing the inequalities ^ and ^ to strict inequalities > and <. In particular we have the following results. Theorem Let 6 be a numerical function defined on an open set F C Rn, and let 6 be differentiable at x G F- If 9 is strictly convex at x G F, then If 8 is strictly concave at x G F, then Theorem Let 0 be a numerical differentiable function on an open convex set F C Rn. 8 is strictly convex on F if and only if 8 is strictly concave on F if and only if Theorem Let 8 be a numerical differentiable function on an open convex set F C Rn- A necessary and sufficient condition that 8 be strictly convex (concave) on F is that for each x1^ G F, x1 5^ x2 The proofs of Theorems 2 and 3 above are essentially identical to the proofs of Theorems 6.1.2 and 6.1.3 respectively and are left to the 87
6.8
Nonlinear Programming
reader. The proof of Theorem 1 is not only different from the proof of Theorem 6.1.1 but also makes use of Theorem 6.1.1. We give this proof below. PROOF OF THEOREM 1 Let 0 be strictly convex at x.
Then
Since 6 is convex at x, it follows from Theorem 6.1.1 that We will now show that if equality holds in 5 for some x in F which is distinct from x, then a contradiction ensues. Let equality hold in 5 for x = £, x E T and x j* x. Then From 4 and 6 we have that
or
But by applying Theorem 6.1.1 again to the points x(X) = (1 — X)x + \£, which are in F, and x, we obtain that
which contradicts 7, since for small X, (1 — X)x + Xz G F because F is open. Hence equality cannot hold in 5 for any x in F which is distinct from x, and the theorem is established for the convex case. The concave case is similarly proved.
3. Twice-differentiable convex and concave functions Let 0 be a numerical function defined on some open set F in Rn. We recall from Appendix D that if 0 is twice differentiate at x G F, then
88
Differentiable Convex and Concave Function!
6.8
where V8(x) is the n-dimensional gradient vector of 6 at x, and V28(x) is the n X n Hessian matrix of 6 at x whose ijth element is d28(x)/dxi dxj, i, j = I , . . . , n. Theorem Let 6 be a numerical function defined on an open set T C Rn, and let 6 be twice differentiate at x G T. If 0 is convex at x, then V28(x) is positive semidefinite, that is, If 6 is concave at x, then V28(x) is negative semidefinite, that is,
PROOF
Let y G Rn-
Because F is open, there exists a X > 0 such that
By Theorem 6.1.1 it follows that But since 8 is twice differentiable at x
Hence
Taking the limit as X approaches zero, and recalling that lim 0(x,\y) = 0, we get that The concave case is established in a similar way. Theorem Let 8 be a numerical twice-differentiable function on an open convex set T C Rn- 8 is convex on T if and only if V20(z) is positive semidefinite on T, that is, for each x G T 8 is concave on T if and only if V28(x) is negative semidefinite on T, that is, for each x G T
89
8.4
Nonlinear Programming
PROOF (Necessity) Since 6 is convex (concave) at each x (E T, this part of the proof follows from Theorem 1 above. (Sufficiency) By Taylor's theorem D.2.2 we have that for any x1^ £ T
for some 8, 0 < 5 < 1. But the right-hand side of the above equality is nonnegative (nonpositive), because V 2 0(z) is positive (negative) semidefinite on T, and xl + &(xz — xl) G r. Hence the left-hand side is nonnegative (nonpositive), and by Theorem 6.1.2 6 is convex (concave) on T.
4. Twice-differentiable strictly convex and concave functions Not all the results of the previous section extend to strictly convex and strictly concave functions by replacing inequalities by strict inequalities. In fact we begin by establishing the following partially negative result. Theorem Let 6 be a numerical function defined on an open set T C Rn, and let 8 be twice differentiate atx^T. If 6 is strictly convex at x, then Vz6(x) is positive semidefinite, but not necessarily positive definite; that is, it is not necessarily true that
// 6 is strictly concave at x, then V29(x) is negative semidefinite, but not necessarily negative definite; that is, it is not necessarily true that
PROOF If 6 is strictly convex at x, then 0 is convex at x, and by Theorem 6.3.1 V 2 0(z) is positive semidefinite. That V 2 0(z) is not necessarily positive definite can be seen from the counterexample 6(x) = (x)4, x £ R. 6 is strictly convex on R but V20(z) = 12(x)2 is not positive definite since V20(0) = 0. The concave case is established similarly. 90
Diflerentiable Convex and Concave Functions
6.4
Theorem Let 6 be a numerical twice-differentiable function on an open convex set F C Rn- -4 nonnecessary but sufficient condition that 8 be strictly convex on T is that V 2 0(z) be positive definite on T; that is, for each x G T A nonnecessary but sufficient condition that 6 be strictly concave on T is that V 2 0(z) be negative definite on T; that is, for each x G F
PROOF The nonnecessity follows from Theorem 1 above. The sufficiency proof is essentially identical to the sufficiency proof of Theorem 6.3.8.
91
Chapter Seven Optimality Criteria in Nonlinear Programming With Differentiability
In Chap. 5 we developed optimality criteria without the use of any differentiability assumptions on the functions entering into the nonlinear programming problem. Many problems (for example linear programming and quadratic programming) involve differentiate functions. It is important then to develop optimality criteria that take advantage of this property. These criteria are merely extensions of the well-known and often abused optimality criterion of the classical calculus of "setting the derivatives equal to zero." As we did in Chap. 5, we shall develop necessary and sufficient optimality criteria. For the sufficient optimality criteria we shall need differentiability and convexity (in 5.3.1 no convexity was needed). For the necessary optimality criteria we shall need differentiability, and, depending on which criterion we are talking about, we shall or we shall not need a constraint qualification. Note that we do not need any convexity requirements in order to establish necessary optimality criteria here. This is unlike the necessary optimality criteria of Sec. 5.4, where convexity was crucially needed. Again, as is the case of Chap. 5, the main sufficient optimality criteria here are straightforward to establish. Only simple inequalities relating convex functions and their gradients are needed. The necessary optimality criteria, on the other hand, need more sophisticated arguments that involve theorems of the alternative, and also, some of them require some sort of constraint qualification.
Nonlinear equality constraints will be treated cursorily in this chapter. A subsequent chapter, Chap. 11, will be devoted to programming problems with nonlinear equality constraints.
1. The minimization problems, and the Fritz John and Kuhn-Tucker stationary-point problems The optimality criteria of this chapter relate the solutions of a minimization problem, a local minimization problem, and two stationary-point problems (the Fritz John and Kuhn-Tucker problems) to each other. The minimization and the local minimization problems considered here will be the same as the corresponding problems of Chap. 5, that is, problems 5.1.1 and 5.1.2, with the added differentiability assumption. The Fritz John and Kuhn-Tucker problems of this chapter (3 and 4 below) follow from the Fritz John and Kuhn-Tucker saddlepoint problems 5.1.3 and 5.1.4 if differentiability is assumed, and conversely the Fritz John and Kuhn-Tucker saddlepoint problems 5.1.3 and 5.1.4 follow from the Fritz John and Kuhn-Tucker problems (3 and 4 below) if convexity is assumed (see 7.3.8 below). Let X° be an open set in Rn, and let 6 and g be respectively a numerical function and an ??i-dimensional vector function both defined on X°. (In many nonlinear programming problems X° is Rn.) The minimization problem (MP) Find an x, if it exists, such that
The local minimization problem (LMP) Find an x in X, if it exists, such that for some open ball Bs(x) around x with radius 5 > 0
The Fritz John stationary-point problem (FJP) Find x £ -X"0, f0^R,f^Rm
if they exist, such that
93
Nonlinear Programming
7.2
(It is implicit in the above statement that 6 and g are differentiable at x.) The Kuhn-Tucker stationary-point problem (KTP) Find x G X°, u £E Rm if they exist, such that
(Again, it is implicit in the above statement that 8 and g are differentiable at x.) Remark If (x,f 0 ,f) is a solution of FJP 8, and f 0 > 0, then (x,f/f0) is a solution of KTP 4- Conversely, if (x,u) is a solution of KTP 4, then (z,l,w) is a solution of FJP 3. Remark The Lagrangian functions
(z,rQ,r) and $(x,u) defined above are precisely the same Lagrangian functions defined in Chap. 5 (see 5.1.3 and 5.1.4).
2.
Sufficient optimality criteria
The sufficient optimality criteria developed here (1 and 2 below), unlike the sufficient optimality criteria 5.3.1, depend heavily on convexity. Their derivation, however, is quite straightforward. Sufficient optimality theorem [Kuhn-Tucker 51] Let x G -X"0, let X° be open, and let 6 and g be differentiable and convex at x. If (x,u) is a solution of KTP 7.1.4, then x is a solution of MP 7.1.1. If (x,f0,f) is a solution of FJP 7.1.3, and f0 > 0, then x is a solution of MP 7.1.1. PROOF The second statement of the theorem follows trivially from the first statement by Remark 7.1.5. 94
7.2
Optlmality Criteria with Diflerentlability
Let (x,u) be a solution of KTP.
We have for any x in X that
Hence Since <7(x) ^ 0, x is in .X", and hence
It should be remarked here that because of the convexity requirements on g, nonlinear equality constraints of the type h(x) = 0 cannot be handled by the above theorem by replacing them by two inequalities h(x) ^ 0 and —h(x) 5£ 0. However, linear equality constraints can be handled by this procedure. Problem Let x G X°, let X° be open, let 6 and g be differentiable and convex at x, let B be a given k X n matrix, and let d be a given fc-dimensional vector. Show that if (f,w,y), x G X°, u G Rm, v G Rk is a solution of the following Kuhn-Tucker problem
then
An interesting case not covered by Theorem 1 above is the case when (x,fo,f) is a solution of FJP 7.1.3 but the requirement that f0 > 0 95
7.2
Nonlinear Programming
is not made, in order to guarantee that x be a solution of MP 7.1.1. This is given by the following theorem, where instead of requiring that f 0 > 0, we require that g be strictly convex at x. Sufficient optimality theorem Let x G X°, let, X° be open, let 6 be differentidble and convex at x, and let g be differentiate and strictly convex at x. If (x,f0,f) is a solution of FJP 7.1.3, then x solves MP 7.1.1. PROOF
Let (x,fQ,f) solve FJP.
/ = {i\Qi(x) = 0 }
Let
J = [i\gi(x) <0}
I\JJ = {1,2, . . . ,m}t
Since f ^ 0, g(x) ^ 0, and fg(x) — 0, we have that
?igi(x) = 0
for i — 1, . . . , m
and hence f, = 0
for i E J
Since r0V6(x) -f fVg(x) = 0 and (f 0 ,f) > 0, we have that
It follows then by Gordan's theorem 2.4-5 that
has a solution Consequently
has a solution for if it did have a solution x E X°, then x 5^ x, and 0 > tf(f) - 0(z) ^ V5(x)(f - x)
(by convexity of 0 at x and ff.^.jf)
0 ^ (7/(^) — ^/(x) > Vgi(x)(x — x)
(by strict convexity of g at £ and 6.2.1)
f Sometimes the constraints gi(x) ^ 0 are said to be active constraints at x, and pj(x) ^ 0 are said to be inactive constraints at x. 96
Optimality Criteria with Differentiability
which contradicts 4 if we set z = x — x. have from 5 that
7.3
Recalling that gi(x) = 0, we
Since g(x) ^ 0, x is in X, and hence x solves MP 7.1.1. Remark In 1 and 2 above, since u ^ 0, g(x) ^ 0, and ug(x) = 0, we have that iiigi(x) — 0 for i = 1, . . . , m, and hence iii = 0 for
Similarly in S we have already shown that fi — 0 for i G J• It is obvious then from the proofs above that the convexity of gi at x is all that is needed in 1 and 2 instead of the convexity of g at x (as was assumed in 1 and 2), and that the strict convexity of gi at x is all that is needed in 3 instead of the strict convexity of g at x (as was assumed in $). The more restrictive assumptions were made in 1, 2, and 8 to make the statements of the theorems simpler.
3. Necessary optimality criteria In the necessary optimality conditions to be derived now, convexity does not play any crucial role. The differentiability property of the functions is used to linearize the nonlinear programming problem, and then theorems of the alternative are used to obtain the necessary optimality conditions. Once again, to derive the more important necessary optimality conditions (7 below), constraint qualifications are needed. We begin with the following slight generalization of Abadie's linearization lemma, which establishes the nonexistence of a solution of a system of linear inequalities whenever the local minimization problem LMP 7.1.2 has a solution. Linearization lemma [Abadie 67] Let x be a solution of LMP 7.1.2, let X° be open, let d and g be differentiable at x, let concave 97
7.8
Nonlinear Programming
and let
W — {i') ffi(x) — 0, and ^, is not concave at x} Then the system
has no solution z in Rn. PROOF
Let
Hence Let x be a solution of LMP 7.1.2 with 5 = 5. We shall show that if z satisfies V6(x)z < 0, Vgw(x)z < 0, and Vgry(z)2 ^ 0 then a contradiction ensues. Let z satisfy these inequalities. Then, since X° is open, there exists a 5 > 0 such that Since 6 and g are differentiate at x (see D.I.3), we have that for 0 < 8 < I
where (i) If 6 is small enough (say 0 < 5 < 50), then and hence
then 98
(ii)
Similarly, for i G W and 5 small enough (say 0 < 8 < 5,),
Optimallty Criteria with Differentiability
7.3
and hence
that
(iii)
For i E V, we have, since < is concave at x and V0v(;r)z ^ 0,
(iv) For i E «/, 0i(z) < 0. Hence for small enough 5 (say 0 < 5 < 5,), we have and hence Let us call 5 the minimum of all the positive numbers 5, 5, 50, fi, (i = 1, . . . ,m) defined above. Then for any 5 in the interval 0 < 5 < 5 we have that
Hence, for 0 < 5 < 6, we have that x + 626 BI(X) r\Xand6(x + dz) < 6(x), which contradicts the assumption that x is a solution of LMP 7.1.2 with 5 = 5. Hence there exists no z in R" satisfying V0(£)z < 0, Vgr,,-(f )z < 0, and V0v(£)z^0. We are now ready to derive a series of necessary optimality criteria based on the above linearization lemma. We begin by a generalization of a result of Abadie which, for the case of a finite number of constraints, includes the classical result of Fritz John. Fritz John stationary-point necessary optimality theorem [John 48, Abadie 67] Let xbea solution of LMP 7.1.2 or of MP 7.1.1, let X° be open and let 8 and g be differentiable at x. Then there exists an f0 E R and anf£Rm such that (x,r0,f) solves FJP 7.1.3, and
99
7.S
Nonlinear Programming
where
PROOF
If x solves MP 7.1.1, then x solves LMP 7.1.2.
Let
and
By Lemma 1 above we have that
Hence by Motzkin's theorem 2.4-2 there exist f0, fw, fv such that
Since gw(x) = 0 and
and
Since x is in JT, ^(x) 5j 0. (f0,r» > 0
Hence (x,r 0 ,f) solves FJP 7.1.3, and
REMARK The above theorem becomes that of Abadie [Abadie 67] if we replace the concavity of gv at x and the nonconcavity of gw at x by the requirements that gv be linear on Rn and that gw be nonlinear on Rn. It might not be immediately obvious why the above result is more general than Abadie's. One way to see this is the following: Besides (f 0 , f) > 0, we try to establish in addition the semipositivity of the smallest subset of the components of (f 0 , f). For example if we can establish the semipositivity of fo (and hence its positivity), then this is the sharpest necessary 100
ttttttH
optimality condition we can get (see 7 below). In the above theorem we establish the semipositivity of (f 0 , f^), where W is the set of indices of the active constraints which are not concave at x. Abadie establishes the semipositivity of a larger-subset of components of (f 0 , f), namely that of (f 0 ,ftf), where N is the set of indices of the active constraints which are nonlinear on Rn. On the other hand, the above theorem becomes that of Fritz John [John 48] if we drop the result that ( r 0 , f w ) > 0. REMARK If X° is convex and gv is concave on X°, then the above theorem certainly holds. However, the concavity of j/v does not (unless gv is linear) make the set {x \ x E -X"0, gv(x) ^ 0} convex (see J^.1.10 and 4.1.11). Again, just as in the case for the saddlepoint necessary optimality criteria (see Section 5.4), there is no guarantee that f 0 > 0 in Theorem 2 above. This can occur, for example, when the solution x of LMP 7.1.2 or of MP 7.1.1 is at a cusp, Fig. 7.3.la, or when the feasible region x is degenerate and is made up of one point only, Fig. 7.3.1b. In cases when f0 — 0, the function 6 disappears from the Fritz John problem 7.1.3, and we have a degenerate case. As was done in Sec. 5.4, it is possible to exclude such cases by introducing restrictions on the constraints g(x) ^ 0. These restrictions are again called, as in Sec. 5.4, constraint qualifications. In addition to Slater's constraint qualification 5.4-3, Karlin's constraint qualification 5.4-4, and the strict constraint qualification 5.4-5, we shall introduce the Kuhn-Tucker constraint qualification 3, the Arrow-Hurwicz-
Fig. 7.3.1 Examples of minimization problems in which f 0 of the Fritz John problem 7.1.3 is zero.
101
7.3
Nonlinear Programming
Uzawa constraint qualification 4, and the reverse convex constraint qualification 5. The Kuhn-Tucker constraint qualification [Kuhn-Tucker 51] Let X° be an open set in Rn, let g be an w-dimensional vector function defined on X°, and let
g is said to satisfy the Kuhn-Tucker constraint qualification at x G: X if g is differentiate at x and if
where
The Arrow-Hurwicz-Uzawa constraint qualification [Arrow et al. 61] Let X° be an open set in Rn, let g be an w-dimensional vector function defined on X°, and let
0 is said to satisfy the Arrow-Hurwicz-Uzawa constraint qualification at x €E X if g is differentiate at x and if
where and
108
Optimality Criteria with Differentiability
7.3
The reverse convex constraint qualification [Arrow et al. 61] Let X° be an open set in Rn, let g be an w-dimensional vector function defined on X°, and let g is said to satisfy the reverse convex constraint qualification at x G X if g is differentiate at x, and if for each i G / either p, is concave at x or gi is linear on Rn, where (The reason for the adjective reverse convex is that if A" 0 is a convex set and gi€j is concave on X°, then the set {x | x € X°, giej(x) ^ 0} is not convex, but its complement relative to X°, that is, {x \ x € X°, g^i(x) > 0}, is a convex set. For example if X° = R" and g(x) = — xx + 1, then g satisfies the reverse constraint qualification at each x € {x \ x € Rn, xx ^ 1}.) Before deriving the fundamental Kuhn-Tucker necessary optimality conditions (7 below), we relate the three constraint qualifications just introduced to each other and to the constraint qualifications of Chap. 5. Lemma Let X° be an open set in Rn, let g be an m-diniensional vector function defined X°, and let
(i) (ii) (iii)
// g satisfies the reverse convex constraint qualification 5 at x, then g satisfies the Arrow-Hurwicz-Uzawa constraint qualification 4 at x// g satisfies the reverse convex constraint qualification 5 at x, then g satisfies the Kuhn-Tucker constraint qualification 3 at x. Let X° be convex, let g be convex on X{\ and let g be differentiable at x. If g satisfies Slater's constraint qualification 5.4-3 on X°, Karlin's constraint qualification 5.4-4 on A'°, or the strict constraint qualification 5.4-5 on X°, then g satisjies the Arrow-Hurwicz-Lzawa constraint qualification 4 at x.
PROOF (i) Let g satisfy the reverse constraint qualification 5 at x. Then the set W defined in 4 is empty, and 2 = 0 satisfies Hence g satisfies the Arrow-Hurwicz-Uzawa constraint qualification 4 at x. 103
7.3
Nonlinear Programming
(ii)
Let g satisfy the reverse constraint qualification 5 at x,
Define Let y be any vector in Rn satisfying Define for some
Obviously conditions (a) and (c) of the Kuhn-Tucker constraint qualification 3 are satisfied. We will now show that condition (6) is also satisfied. Since X° is open, and since gi is concave and differentiate at x, we have that
and for 0 ^ T ^ 1 and 0 ^ X < X. Hence for 0 ^ X < X and 0 ^ r ^ 1, we have that and for i G «/, we have that
where the last inequality holds because gj(x) < 0 and lim ati(x,\Ty) = 0. Hence gj[e(r)\ < 0, for 0 ^ r ^ 1. Since gi[e(r)] ^ 0 for 0 ^ r ^ 1, we have that
and condition (6) of the Kuhn-Tucker constraint qualification 3 is satisfied, (iii) By Lemma 5.4-6 we have that Slater's constraint qualification 5.4-3 and Karlin's constraint qualification 5.4-4 are equivalent and that the strict constraint qualification 5.4-5 implies both Slater's and Karlin's constraint qualification. Hence we need only establish the present lemma under Slater's constraint qualification. If g satisfies Slater's constraint 104
Optimality Criteria with Differentiability
qualification on X°, there exists an x G X° such that g(x) < 0. is differentiable at x, we have by 6.1.1 that
7.3
Since g
where Hence by taking z = x — x, we have that Vgf/(z)z > 0, and the ArrowHurwicz-Uzawa constraint qualification 4 is satisfied at x. The results of the above lemma are summarized in Fig. 7.3.2. We are now ready to derive the fundamental necessary optimality criterion of nonlinear programming, the Kuhn-Tucker necessary optimality criterion. We shall establish the result under all the constraint qualifications introduced. In view of Lemma 6 above, we need only establish the result under the Kuhn-Tucker constraint qualification 3 and the Arrow-Hurwicz-Uzawa constraint qualification 4- (A somewhat involved proof [Abadie 67] shows that a special case of the Arrow-Hurwicz-Uzawa constraint qualification, where gv are the linear active constraints and gw are the nonlinear active constraints, implies the Kuhn-Tucker constraint qualification. We take here a different and somewhat simpler approach and show that either the Arrow-Hurwicz-Uzawa or the KuhnTucker constraint qualification is adequate for establishing the necessary optimality conditions we are after.) Kuhn-Tucker stationary-point necessary optimality theorem [Kuhn-Tucker 51] Let X° be an open subset of Rn, let 6 and g be defined on X°, let x solve LMP 7.1.2 or MP 7.1.1, let 6 and g be differentiable at x, and let g
Fig. 7.3.2 Relations between the various constraint qualifications (CQ). IDS
7.3
Nonlinear Programming
satisfy (i) the Kuhn-Tucker constraint qualification 3 at x, or (ii) the Arrow-Hurwicz-Uzawa constraint qualification 4 at x, or (iii) the reverse convex constraint qualification 5 at x, or (iv) Slater's constraint qualification 5.4-3 on X°, or (v) Karlin's constraint qualification 5.4-4 on X°, or (vi) the strict constraint qualification 5-4-5 on X°. Then there exists a u (E Rm such that (x,u) solves KTP 7.1-4PROOF In view of Lemma 6 above we need only establish the theorem under assumptions (i) or (ii). (i) Let x solve LMP 7.1.8 with 5 = 5. Let We have to consider two cases: the case when 7 is empty and the case when 7 is nonempty. (7 = 0) Let y be any vector in Rn such that yy = 1. Then Since Qi(x) < 0 and lim ai(x,8y) = 0, it follows that for small enough 5, «-»o say 0 < 8 < 8 < 5, gt(x + fy) < 0 and x + Sy E ^°. But since x solves LMP 7.1.2, we have that
for 0 < 5 < 5. Hence
Since lim a(x,dy) = 0, we have on taking the limit of the above expression *-»o as 5 approaches zero that
Since y is an arbitrary vector in 72n satisfying yy — 1, we conclude from this last inequality, by taking y = ± e', where ei G Rn is a vector with one in the ith position and zero elsewhere, that
Hence x and u = 0 satisfy KTP 7.1.4. 106
7.3
Optimality Criteria with Differentiability
(/ 7^ 0) Let g satisfy the Kuhn-Tucker constraint qualification 3 at x, and let y G Rn satisfy Vgi(x)y ^ 0. Hence by 3, there exists an n-dimensional vector function e defined on [0,1] such that e(0) = x, e(r) G X for 0 ^ T ^ 1, e is differentiate at T = 0, and de(Q)/dr = \y for some X > 0. Hence for 0 f£ T ^ 1
where lim Ti(0,r) = 0. Hence by taking T small enough, say 0 < T < f < 1, we have that e(r) G Bi(x). x solves LMP 7.1.2, we have that
Since e(r) G X for 0 ^ T ^ 1 and
Hence b}^ the chain rule D.I.6 and the differentiability of 6 at x and of e at 0, we have for 0 < T < f that
where lim /3(0,r) = 0.
Hence
Taking the limit as T approaches zero gives
Since e(0) = x and de(0)/dr = Ay for some X > 0, we have that
Hence we have shown that or that
Hence by Motzkin's theorem 2.4-2 there exists an fo and an f/ such that
107
7.8
Nonlinear Programming
Since f 0 is a real number, f 0 > 0 means f0 > 0. By defining
we have that
and since x £ X, we have that
Hence (x,u) solves KTP 7J.4(ii) Let £ solve MP 7.1.1 or LMP 7J.0. Then by the Fritz John theorem 2 there exists an f0 £ R and an f G # m such that (x, r0, f) solves FJP 7.1.3, and where Define
and By Remark 7.1.5 we only have to show that f 0 > 0. Since (f0,fw) > 0, we have that f 0 > 0 if W is empty. Assume now that W is nonempty. We will now show by contradiction that f0 > 0. Suppose that f 0 = 0; then since fj = 0, we have that
Since g satisfies the Arrow-Hurwicz-Uzawa constraint qualification 4 at x, there exists a z £ #" such that
108
Optimality Criteria with Differentiability
7.S
Fig. 7.3.3 A geometric interpretation of the Kuhn-Tucker conditions 7.1.4.
Premultiplying these inequalities by fw and TV respectively gives which contradicts the fact that Hence f 0 > 0.
Fig. 7.3.4 Relationship between the solutions of the local minimization problem (LMP) 7.1.2, the minimization problem (MP) 7.1.1, the Fritz John problem (FJP) 7.1.S, and the Kuhn-Tucker problem (KTSP) 7.1.4. 109
7.S
Nonlinear Programming
Fig. 7.3.5 Relationships between the solutions of the local minimization problem (LMP) 7.1.2, the minimization problem (MP) 7.1.1, the Fritz John problem (FJP) 7.1.3, the Kuhn-Tucker problem (KTP) 7.14, the Fritz John saddlepoint problem (FJSP) 5.1.3, and the Kuhn-Tucker saddlepoint problem (KTSP) S.I.4.
A geometric interpretation of the Kuhn-Tucker conditions 7.1.4 can be given as follows. At x, there exists a nonnegative linear combination of the gradient of the objective function V6(x) (with positive weight) and the gradients of the active constraints ^gi(x), i G I, which is equal to zero. Figure 7.8.3 depicts such a situation. If, however, none of the constraint qualifications are satisfied, there may not exist such a nonnegative linear combination of V6(x) and V0/(x). In such cases V6(x) may have a zero weight, as for example in Fig. 7.3.1. The various optimality criteria of this chapter are related to each other in Fig. 7.3.4, and to the optimality criteria of Chap. 5 in Fig. 7.3.5. Problem Let X° be an open set in Rn, and let 6 and g be defined on X°. Find the conditions under which 110
Optimality Criteria with Differentiability
(i) (ii)
7.3
A solution (x, r 0 , f) of the Fritz John saddlepoint problem FJSP 5.1.3 is a solution of the Fritz John problem FJP 7.1.3, and conversely, A solution (x,u) of the Kuhn-Tucker saddlepoint problem KTSP 5.1.4 is a solution of the Kuhn-Tucker problem KTP 7.1.4, and conversely. (The above relationships are indicated by dotted lines in Pig. 7.3.5.} Problem
Let X° be an open set in Rn, let 0 be defined on X°, let x E X°, and let 6 be differentiable at x. Show that if
then
Under what conditions is the converse true? Give a geometric interpretation of the above conditions for the case when X° = R. Problem Let X° be an open set in Rn, let 8 be defined on X°, let x £ X°, and let 6 be differentiable at x. Show that if
then
Under what conditions is the converse true? Give a geometric interpretation of the above conditions for the case when X° = R. Problem (method of feasible directions [Zoutendijk 60]) Let one of the assumptions (ii) to (vi) of Theorem 7 hold, and let 6 and g be differentiable on X°. Suppose that we have a point x € X at which one of the last five constraint qualifications of 7 holds but at which the Kuhn-Tucker conditions 7.1.4 are not satisfied. Show that there exists a feasible direction z in Rn such that x + dz € X and 6(x + dz) < 6(x) for some 111
7.8
5 > 0.
Nonlinear Programming:
(Hint: Use the Farkas' theorem 2.4-6, the fact that
has no solution UT, and the differentiability property of Q and g.) a linear programming problem the solution of which will yield z.
State
Problem (general Kuhn-Tucker stationary-point necessary optimality criterion) Let X° be an open set in Rn* X R"*, let d, g, and h be respectively a numerical function, an w-dimensional vector function, and a /c-dimensional vector function defined on X°, let (x,y) G -X"0 solve the following general minimization problem
let 6, g, and h be differentiable at (x,y), and let the composite m + 2k + n 2 -dimensional vector function / defined on X° by f(x,y) = [g(x,y), h ( x , y ) , — h ( x , y ) , —y] satisfy one of the constraint qualifications 3, 4> or 5 at (x,y), or one of the constraint qualifications 5.4-3, 5.4-4* or 5.4-5 on X°. Show that there exist u G Rm, v G Rk such that (x,u,v} satisfies the following general Kuhn-Tucker necessary optimality conditions
(Hint: Redefine X as X = {(x,y) | (x,v) G X°, g(x,y) ^ 0, /i^y) g 0, -h(x,y} ^ 0, -y ^ 0} and then use Theorem 7.)
113
Chapter Eight Duality in Nonlinear Programming
Duality plays a crucial role in the theory and computational algorithms of linear programming [Dantzig 63, Simmonard 66]. The inception of duality theory in linear programming may be traced to the classical minmax theorem of von Neumann [von Neumann 28] and was first explicitly given by Gale, Kuhn, and Tucker [Gale et al. 51]. Duality in nonlinear programming is of a somewhat later vintage, beginning with the duality results of quadratic programming [Dennis 59]. In spirit, however, duality theory in nonlinear programming is related to the reciprocal principles of the calculus of variations, which have been known since as far back as 1927 [Trefftz 27, 28, Friedrichs 29, Courant-Hilbert 53, Lax 55, Mond-Hanson 67]. In recent years there has been an extensive interest in the duality theory of nonlinear programming as evidenced by the publication of some 50 or more papers on the subject. To cover the subject of all these papers would take a book by itself. Instead we shall concentrate on a few of the more basic results which are simpler to prove and which subsume in an obvious way the duality results of linear programming. The plan of this chapter is as follows. We first introduce the minimization problem and its dual and develop the duality results of nonlinear programming. We then apply these results to quadratic and linear programming problems.
8.1
Nonlinear Programming
1. Duality in nonlinear programming Let X° be an open set in Rn, and let 6 and g be respectively a numerical function and an ?n-dimensional vector function, both defined on X°. We define now the same minimization problem that we have been dealing with in the previous chapters and its dual. The (primal) minimization problem (MP) Find an x, if it exists, such that
The dual (maximization) problem (DP) Let 8 and g be differentiate on X°. they exist, such that
Find an x and a u G Rm, if
The duality results we are about to establish relate solutions x of MP and (x,u) of DP to each other. They also relate the objective functions 0 and \j/ to each other. We begin by giving a weak and easy-toderive duality theorem. Weak duality theorem [Wolfe 61] Let X° be open, and let 6 and g be differentiate
where X and Y are defined in 1 and 2. 114
on X°.
Then
Duality In Nonlinear Programming1
8.1
PROOF
We derive now one of the more important duality theorems of nonlinear programming. Wolfe's duality theorem [Wolfe 61] Let X° be an open set in Rn, let 8 and g be differentiable and convex on X°, let x solve MP 1, and let g satisfy any one of the six constraint qualifications of Theorem 7.3.7. Then there exists a u G Rm such that (x,u] solves DP 2 and
PROOF [Huard 62] By Theorem 7.3.7, there exists a u £ Rm such that (x,u) satisfies the Kuhn-Tucker conditions
Hence (x.u) G Y = \(x,u)
x £ Xn,u £ Rm, V8(x) + uVg(x) = 0, u ^ 0}
Now let (x,u) be an arbitrary element of the set Y.
Then
115
1.1
Nonlinear Programming
Hence
Since
ALTERNATE PROOF Again as above, by Theorem 7.3.7, there exists a u (E Rm such that (x,u) satisfies the Kuhn-Tucker conditions. Hence (x,u) E Y. Now where the last inequality follows from Theorem 3. Hence (x,u) solves DP 2. REMARK The constraint qualification in the above theorem is merely a sufficient condition for the theorem to hold. It may hold without satisfying such a constraint qualification, as evidenced by the following primal problem. Find x (E R2 such that
where a is some nonnegative number. The dual to this problem is: Find x G R2 and u G R* such that
When a = 1, the primal problem has one feasible (and hence minimal) point, Xi = X2 = 0. At this point none of the constraint qualifications are satisfied. However it can be verified (after a little calculation) that x = 0, u — 0 also solves the dual problem with a maximum of zero. Another important duality theorem is the converse of Theorem 4 above. In order to obtain such a theorem, we have to modify the hypotheses of Theorem 4- There are a number of such converse theo11*
Duality in Nonlinear Programming
8.1
rems [Hanson 61, Huard 62, Mangasarian 62, Mangasarian-Ponstein 65, Karamardian 67]. To cover all such theorems would be lengthy. Instead we shall confine ourselves to two converse theorems which hold under somewhat different assumptions and the new proofs of which are completely different from each other. Strict converse duality theoremf [Mangasarian 62] Let X° be an open set in Rn, let 9 and g be differentiable and convex on X°, let MP 1 have a solution x, and let g satisfy any one of the six constraint qualifications of Theorem 7.3.7. // (x,u) is a solution of DP 2, and if if/(x,u) is strictly convex at x, then A — x, that is, x solves MP 1, and
REMARK $(x,u) is strictly convex at x if either 0 is strictly convex at x or if for some i, wt > 0 and 0, is strictly convex at A. PROOF We shall assume that £ 9* x and exhibit a contradiction. Since x is a solution of MP 1, it follows from Theorem 4 above that there exists a u G Rm such that (x,u) solves DP 2. Hence
and Because (£,u) G Y, we have that Vxif/(£,u) = 0. Hence by the strict convexity of \l>(x,u) at x and 6.2.1 we have that It follows then that
or that But from Theorem 7.3.7 we have that ug(x) = 0, hence f We have used the adjective strict to distinguish the above converse duality theorem from other theorems, such as Dorn's converse duality theorem [Dorn 60] (see 8.2.6 below) and theorem 5.6 of [Mangasarian-Ponstein 65], in which the solution of the primal problem x need not equal x, where (x,u) is the solution of the dual problem. 117
8.1
Nonlinear Programming
which contradicts the facts that u ^ 0 and g(x) ^ 0. Hence x = x. We also have that
It should be remarked that the above theorem can be strengthened (see theorem 5.7 in [Mangasarian-Ponstein 65]) by dropping the assumptions that MP 1 has a solution and that g satisfies a constraint qualification, but retaining the strict convexity assumption on \j/(x,u}. The proof of this strengthened theorem is rather lengthy, and hence we content ourselves here with the above weaker version. We derive now another strict converse duality theorem under different hypotheses from those of Theorem 5 above. We drop the assumptions that MP 1 has a solution, that g satisfies a constraint qualification, and that t(x,u) is strictly convex at x. We add the assumptions that ^(x,u) is twice continuously differentiate at x and that the Hessian matrix Vx2\fs(x,u) is nonsingular at x. The Hanson-Huard strict converse duality theorem [Hanson 61, Huard 62] Let X° be an open set in Rn, and let 6 and g be differentiate on X°. Let (x,u) be a solution of DP 2, and let 6 and g be convex at x. If either (i) [Huard 62] *(/(x,u) is twice continuously differentiate at x and the n X n Hessian matrix Vx^(x,u) is nonsingular, or (ii) [Hanson 61] there exists an open set A C Rm containing u and an n-dimensional differentiate vector function e(u) on A such that and
then x solves MP 1, and B(x) = t(x,u) PROOF Since Vx^(x,u) = 0, we have that assumption (i) above implies, by the implicit function theorem D.3.1, assumption (ii) above. (Assumption (ii) does not necessarily imply assumption (i).) We establish the theorem now under assumption (ii). From (ii) we have that (x,u) = [e(u),u] G {[e(u),u] \u G A, u ^ 0} C Y 118
Duality in Nonlinear Programming
8.1
and since
we have then that
Since A is open and the constraint u ^ 0 is linear, we have by Theorem 7.3.7 (or more conveniently by 7.3.10) and the chain rule D.I.6 that
Since u G A. since V^Oc.w)
x = e(u)
= 0 for u G A, and since
the last two relations give
But since (x,u) G Y, we also have that
The last four relations and Theorem 7.2.1 imply that x is a solution of UP 1. We also have that We give now a theorem which shows when the dual problem has an objective function, unbounded from above, on the set of dual feasible points Y. This is a generalization of a theorem of Wolfe [Wolfe 61, theorem 3]. Unbounded dual theorem Let X° be an open set in Rn, and let 6 and g be differentiable on X°. If there exists a dual feasible point (xl,u1} such that then the dual problem has an unbounded objective function (from above) on the set of dual feasible points Y. 119
••1
Nonlinear Programming
PROOF By Theorem 24.10 of the alternative there exists a w 2 £ Rm such that
Since (a;1,!*1) is dual feasible, then
Consider now the point (x1, w1 -f- ru2) for any r > 0. We have that
and hence (x1, w1 + rw 2 ) is dual feasible for any T > 0. Since u2g(xl) = 1, we have that the dual objective function
tends to « as r tends to «. REMARK The above theorem does not apply if g is convex at xl and the primal problem has a feasible point, x*, say. For if we define z — x* — xl, then we have that and hence ^(x1) + Vg^z1)* ^ 0 has a solution 2 = x* — x1. EXAMPLE Consider the example in the Remark following Theorem 4 above, and let a = y±. Obviously then the primal problem has no feasible point. Consider now the dual feasible point Xi1 = x 2 x = 0, Ui1 = W2 1 = 1. We have then that
has no solution z G Rn, and hence by Theorem 7 above the dual objective function is unbounded from above on the dual feasible region Y. This can be easily seen if we let Xi = x 2 = 0 and let u\ — w 2 tend to ». Then ^(x,w) tends to w. no
Duality in Nonlinear Programming
8.1
Corollary (unbounded dual) Let X° = Rn, and let 6 and g be differentiable on Rn. Let the dual problem DP 2 have a feasible point (x1^1), let g be concave^ at xl or linear on Rn, and let the primal problem MP 1 have no feasible point. Then the dual problem has an unbounded objective function $(x,u) (from above) on the set of dual feasible points Y.
PROOF
We claim that
has no solution For if it did have a solution z £ Rn, then x* = xl + z would satisfy where the next-to-last inequality follows from 6.1.1. Hence which contradicts the assumption that X is empty. So, has no solution z in Rn, and by Theorem 7 the dual problem has an unbounded objective function (from above) on the set Y of dual feasible points. The case when g is linear in the above corollary is theorem 3 of [Wolfe 61]. We finally give a theorem which tells us when the primal problem has no local or global minimum. Theorem (no primal minimum) Let X° be an open set in Rn, let 0 and g be differentiable and concave^ on X°, and let X ?* 0. // the dual problem DP 2 has no feasible point, then neither the primal problem MP 1 nor the local minimization problem LMP 7.1.2 has a solution.
PROOF
Because the dual problem has no feasible point, we have that
has noi solution t Note that if g were concave on Rn, the set X would still not be convex in general, unless g were linear. % Note that the concavity of 8 and g does not, in general, produce a primal convex programming problem (unless 8 and g are linear).
iai
8.1
Nonlinear Programming
Hence by Farkas' theorem 2.4.6
has a solution
Let x (E X.
Then by 6.1.1 and the last two inequalities we have that
Hence by choosing 5 small enough, x + z G BI(X) C\ X for any 5 > 0 and x cannot be a solution of LMP 7.1.2 and, hence, not a solution of MP 1. We mentioned earlier that there is an extensive literature on the duality theory of nonlinear programming. The presentation we have followed here resembles that given in [Dorn 60, Wolfe 61, Hanson 61, Huard 62, Mangasarian 62], There are approaches to duality which use the conjugate function concept, which we have not touched upon here [Fenchel 53, Berge-Ghouila Houri 65, Dieter 65a, Rockafellar 63, 64, 67a, 676, 67c, 69, Whinston 65, 67]. There are also approaches that use a minmax theorem [Stoer 63, 64, Mangasarian-Ponstein 65, Karamardian 67]. There is also the theory of symmetric dual nonlinear programs [Dorn 61, Cottle 63, Dantzig et al. 65, Mond 65, Mond-Cottle 66, Stoer 64]. Duality relations relaxing convexity conditions have appeared in [Mangasarian 65, Karamardian 67, Rissanen 67, Jacob-Rossi 69] (see also Chap. 10). Duality in spaces more general than Rn has also appeared in [Hurwicz 58, Kretschmar 61, Rubinstein 63, Rockafellar 63, Brondsted 64, Moreau 65, Tyndall 65, 67, Dieter 65b, 66, Levinson 66, LarsenPolak 66, Gol'stein 67, Ritter 67, Varaiya 67, Hanson-Mond 67, Van Slyke-Wets 67, Hanson 68]. Problem Let X° be an open set in Rn> X R"1, let 6 and g be respectively a differentiate numerical function and a differentiable w-dimensional vector function on X°. Show that the dual to the following primal minimization problem
122
Duality in Nonlinear Programming
8.8
is the following maximization problem
State and prove the duality relations that hold for these dual problems and the conditions under which they hold.
2. Duality in quadratic programming We consider in this section programming problems with a quadratic objective function and linear constraints. Let b be an n-vector, c an m-vector, C a symmetricf n X n matrix, and A an m X ft matrix. We consider the following primal problem. The (primal) quadratic minimization problem (QMP) Find an x, if it exists, such that
According to 8.1.2 the dual to the above problem is given by
The constraint relation Cx — b + A'u = 0 implies that
Using this equation in the objective function, the dual problem becomes the following. t If C is not symmetric, we replace C by (C + C")/2 in QMP 1 because xCx - x[(C + C')/2]x. M
8.2
Nonlinear Programming
The quadratic dual (maximization) problem (QDP) Find an x G R" and a u G Rm, if they exist, such that
A whole group of duality theorems now follows from the theorems of the previous section. Weak duality theorem Let C be positive semidefinite (that is, xCx ^ 0 for all x in Rn).
Then
PROOF This theorem follows from Theorem 8.1.3 by observing that Y^xCx — bx is convex on Rn if the matrix C is positive semidefinite (see Theorem 6.3.8). Dorn's duality theorem [Dorn 60] Let C be positive semidefinite. If x solves QMP 1, then x and some u G Rm solve QDP 2, and the two extrema are equal. PROOF This theorem follows from Theorem 8.1.4 if we make the same observation as in the proof of Theorem 3 above. REMARK The dual quadratic programs / and 2 possess a nice feature not shared by the dual nonlinear programs 8.1.1 and 8.1.2. If C is positive semidefinite, then the objective function of QMP 1 is convex on Rn, and the objective function of QDP 2 is concave on Rn X Rm. Strict converse duality theorem Let C be positive definite (that is, xCx > 0 for all nonzero x in R"). If (x,u) solves QDP 2, then x solves QMP 1, and the two extrema are equal. PROOF This theorem follows from Theorem 8.1.6 by observing that if C is positive definite, then %xCx — bx is strictly convex (and hence convex) on Rn (see Theorem 6.4-2), and the Hessian matrix Vx*[%xCx — bx + u(Ax — c)], which is equal to C, is nonsingular (for if it were singu134
Duality in Nonlinear Programming
8.9
lar, then Cx = 0 for some x j* 0, and xCx = 0 for some x j* 0, which contradicts the positive definite assumption). Dorn's converse duality theorem [Dorn 60] Let C be positive semidefinite. If (x,u) solves QDP 2, then some x G Rn (not necessarily equal to x), satisfying C(x — x) = 0, solves QMP 1, and the two extrema are equal. REMARK We have not established here the counterpart of this theorem for the general nonlinear case. Such a theorem exists [Mangasarian-Ponstein 65, theorem 5.6]. Note the difference between 5 and 6 above. In 5 the same x appears in the solutions of 1 and 2. In 6 we merely have that C(x — x) = 0. PROOF By Theorem 7.3.7 (or more conveniently by 7.8.12) and the linearity of the constraints Cx + A'u = b, u ^ 0, there exists a v (E Rn such that (£,u,v) satisfies the following Kuhn-Tucker conditions
Substitution of the first relation into the second one gives
The last four relations, the assumption that C is positive semidefinite, and Theorems 6.3.2 and 7.2.3 imply that v solves QMP 1. Hence x = v solves QMP 1, and Cx = Cv = Cx. Now we show that the two extrema are equal.
115
Nonlinear Programming
8.8
Unbounded dual theorem Let Y 9* 0. Then (X = 0) =» (QDP # /ias an unbounded objective function from above on Y) PROOF
Follows from Corollary 8.1.8.
Theorem (no primal minimum) Let C be negative semidefinite, and let X 9* 0.
PROOF
Then
Follows from Theorem 8.1.9.
Problem Let b, a, c, and d be given vectors in Rn, Re, Rm, and Rk, respectively, and let A, D, B, E, C, and F be given ra X n, m X A /c X n, k X A n X n (symmetric), and / X t (symmetric) matrices, respectively. Show that the following are dual quadratic problems.
3. Duality in linear programming We outline briefly here the main duality results of linear programming and show how they follow from the results we have established. By deleting the matrix C from the dual quadratic problems 8.2.1 and 8.2.2, we obtain the following dual linear programs. 196
Duality in Nonlinear Programming
8.3
The (primal) linear minimization problem (LMP) Find an x, if it exists, such that
The dual linear (maximization) problem (LDP) Find a u, if it exists, such that
Note that the dual problem contains only the variable u. The nonlinear and quadratic dual problems 8.1.2 and 8.2.2 contain both the variables x and u. The dual linear program has the unique feature that it does not contain the primal variable x. We combine now all the fundamental duality results of linear programming in the following theorem. Duality theorem of linear programming [Gale et al. 51]
(ii) (\\\)
(x solves LMP 1) <=> (u solves LDP 2} => — bx = — cu Let Y 7* 0. Then (X = 0) <£=» ( — cu is unbounded from above on Y}
(iv)
Let X 7* 0.
Then
(Y = 0) *=$ ( — bx is unbounded from below on X)
REMARK Both of the sets X and Y may be empty, in which case none of the above results hold. For example take A = Q, c = —1,6=— 1. (i) Follows from 8.8.3. (ii) Follows from 8.24 and 8.2.6. (ill) The forward implication follows from 8.2.7. ward implication is equivalent logically to
PROOF
The back-
(X j* 0) =* {—cu is bounded from above on Y)
m
8.3
Nonlinear Programming
which follows from part (i) of this theorem, (iv) (=>) hs no solution
(<=)
The backward implication is equivalent to
(Bx that solves LMP 1} Since X ^ 0, we have that Ax ^ c has a solution x, which implies that A'v = 0, v 2i 0, cv < 0 has no solution v (for if it did have a solution v, then 0 = v-Az ^ cv, which contradicts cv < 0). Hence
(A'v = 0, v ^ 0) =» cv ^ 0 Similarly, since F 5^ 0, A'w = 6, u ^ 0 has a solution w, which implies that Ay ^ 0, 6y > 0 has no solution y (for if it did have a solution y, then 0 ^ wAy = by, which contradicts by > 0). Hence Ay ^ 0 => by ^ 0
We will now show that if LMP 1 has no solution, then a contradiction ensues. {LMP 1 has no solution)
has a solution
128
Duality In Nonlinear Programming
8.3
has a solution
has a solution
has a solution
has a solution
has a solution
Now, either er = 0 or a > 0. If a = 0, then Az ^ 0, A't = 0, t ^ 0, which implies (by making use of 4 and 5 above) that bz ^ 0, ct ^ 0, and bz — ct ^ 0. This contradicts bz — ct > 0 above. If 0, we have, upon normalization, that A't = b, Az :g c, — bz-\-ct 0 we arrive at a contradiction. Hence LMP I has a solution. 139
Nonlinear Programming
8.3
Problem Consider the primal linear minimization problem LMP 1. Show that if X T* 0, and if, for all x in X, — bx^a for some real number a, then LMP 1 has a solution x. (Hint: Use the facts that -X" 5^ 0 and that — bx < a has no solution x in X to show that there exists a dual feasible point, that is, a w 6E Y; then use Theorem 3, part (v).) Remark It is not true in general that if a linear function is bounded from below on a closed convex set, then it achieves its infimum on that set. For example, x% ^ 0 on the closed convex set {x \ x G R2, (2)~x> — xz ^ 0}; however x2 does not attain its infimum of zero on the set.f Problem 7 above shows that if a linear function is bounded from below on a poly tope, then it achieves its infimum on the polytope. Problem Show that the following are dual linear programs:
where 6, a, c, and d are given vectors in Rn, Rf, Rm, and Rh, respectively and A, D, B, and E are given m X n, m X •£, k X n, and k X I matrices, respectively. 11 am indebted to my colleague James W. Daniel for this example.
ISO
Chapter Nine Generalizations of Convex Functions: Quasiconvex, Strictly Quasiconvex and Pseudoconvex Functions
Beginning with Chap. 4, we have continually used the concepts of convex and concave functions in deriving optimality conditions and duality relations. Since not all properties of convex and concave functions are needed in establishing some of the previous results, a more general type of function will also work. For example, some results need only that the set Aa = {x | x £ T, 6(x) ^ a} be convex, where T is a convex set in Rn, 6 is a numerical function defined on T, and a is any real number. Now if 6 is a convex function on T, the convexity of A« is assured by Theorem 4.I.W. However, 0 need not be convex in order that Aa be convex. A function 0 which is quasiconvex on F has this property [Nikaido 54]. Another property of differentiable convex functions that was used in previous results was this: If V6(x)(x —• x) ^ 0, then 0(x) ^ 6(x). This property follows from 6.1.1, and has the obvious consequence that if V0(z) = 0, then B(x) ^ B(x). Not only convex functions have this property. Pseudoconvex functions [Tuy 64, Mangasarian 65], to be introduced in this chapter, also have this property. By using only the properties of convex functions that are needed in establishing some of the previous results, these results are extended to a larger class of functions.
1. Quasiconvex and quasiconcave functions Quasiconvex function A numerical function 6 de-
••1
Nonlinear Programming
fined on a set F C Rn is said to be quasiconvex at x £ r (with respect to F) if for each x G r such that Q(x) ^ 6(x), the function 6 assumes a value no larger than 0(x) on each point in the intersection of the closed line segment [x,x] and F, or equivalently
0 is said to be quasiconvex on F if it is quasiconvex at each x G r. Note that this definition of a quasiconvex function is slightly more general than the customary definition in the literature [Nikaido 54, BergeGhouila Houri 65, Martos 65] in that (i) we define quasiconvexity at a point first and then on a set, and (ii) we do not require F to be convex. This generalization will allow us to handle a somewhat wider class of problems. It follows from the above definition that a numerical function 0 defined on a convex set F is quasiconvex on F if and only if
Quasiconcave function A numerical function 6 defined concave at x G F (with respect to B(x) ^ Q(x), the function 6 assumes each point on the intersection of the equivalently
on a set F C Rn is said to be quasiF) if for each x G F such that a value no smaller than 8(x) on closed line segment [x,x] and F, or
6 is said to be quasiconcave on F if it is quasiconcave at each x G F. Obviously B is quasiconcave at x (on F) if and only if — 6 is quasiconvex at x (on F). Results obtained for quasiconvex functions can be changed into results for quasiconcave functions by the appropriate multiplication by —1, and vice versa. 188
Generalizations of Convex Functions
9.1
It follows from 2 that a numerical function 6 denned on a convex set F is quasiconcave on F if and only if
Figure 9.1.1 depicts a quasiconvex and a quasiconcave function
on R.
3Theorem Let 6 be a numerical function defined on a convex set T C Rn, let and let
Then (B quasiconvex on F) <=> (Aa is convex for each a G R)
and (9 quasiconcave on F) «=> (fta is convex for each a G R) PROOF We shall establish the theorem for the quasiconvex case. quasiconcave case follows from it.
The
Fig. 9.1.1 A quasiconvex and a quasiconcave function on R. (a) Quasiconvex 6 on Rn — R; (b) quasiconcave 6 on Rn = R. 133
9.1
Nonlinear Programming
(<=) Let xl,x* E T, e(x*) £ 8(xl), and 0 ^ X ^ 1. If we let a = 8(xl), then by the convexity of Aa we have that
and hence 6 is quasiconvex on F. (=>) Let 6 be quasiconvex on F, a be any real number, and let x1 2 and x be any two points in Aa (if Aa is empty, then it is convex). Without loss of generality let 0(x2) ^ 6(xl). Since xl,x^ £ Aa, we have that Since 6 is quasiconvex, and since F is convex, we have that for 0 fS X ^ 1 Hence (1 — X)^ 1 + Xz2 £ A a , and Aa is convex.
4Theorem (differentiate quasiconvex and quasiconcave functions) Let T be an open set in Rn, and let 8 be a numerical function defined on F Then
PROOF We shall prove the quasiconvex case only. The other case is similar. (=>) If x1 — xz, the implication is trivial. Assume then that 1 x T* x*. Since F is open, there exists an open ball B&(x1} around x1 which is contained in F. Then, for some ft such that 0 < ft < 1 and # < d/ 134
Generalizations of Convex Functions
9.1
we have that Then
and let We establish now the quasiconvexity of 0 on T by showing that ft is empty. We assume that there is an x G ft and show that a contradiction ensues. Since 0(z2) ^ Q(xl) < B(x) for x G 0, we have from the hypothesis of the theorem that
and
Since 0 < X < 1, it follows then that
1SS
9.2
Nonlinear Programming
Since 6(xl) < 0(x), and since 6 is continuous on F, the set S) is open tive to (xl,x*), it contains x, and there exists an x3 = (1 — ju)z + 0 < M ^ 1, such that x3 is a point of closure of ft, and such that 0(z3) = (see C.I.I and B.I.3}. By the mean-value theorem D.2.1 we have for some x G ft
relanxl, 6(x1) that
and since then Since x G fy the last relation above contradicts the equality established earlier, V0(x)(x 2 - x1} = 0 for all x E fi.
2. Strictly quasiconvex and strictly quasiconcave functions We begin this section by recalling the concepts of a, local minimum and maximum (see 5.1.2 or 7.1.2). Let 6 be a numerical function defined on the set F C Rn- A point x £ F is said to be a local minimum (maximum) of 6 if there exists an open ball BI(X) around x with radius 5 > 0 such that (local minimum) (local maximum) We remark that for a numerical function 6 which is quasiconvex (quasiconcave) on a convex set F, a local minimum (maximum) need not be a (global) minimum (maximum) over F. This fact is easily demonstrated by the numerical function 6 defined on R as follows:
It is easy to see that 0 is both quasiconvex and quasiconcave on R by verifying definitions 9.1.1 and 9.1.2. It is also easy to see that x =• % is both a local minimum and a local maximum, but certainly not a global minimum or maximum over R. 136
Generalizations of Convex Functions
9.3
We introduce now a type of function which is essentially a restriction of quasiconvex (quasiconcave) functions to functions with the property that a local minimum (maximum) is also a global minimum (maximum). Such functions have been independently introduced in [Hanson 64, Martos 65, Karamardian 67]. (These functions were called functionally convex by Hanson and explicitly quasiconvex by Martos.) Strictly quasiconvex function A numerical function 6 defined on a set F C Rn is said to be strictly quasiconvex at x G F (with respect to F) if for each x G F such that 6(x) < B(x) the function 8 assumes a lower value than 6(x) on each point in the intersection of the open line segment (x,x) and F, or equivalently
0 is said to be strictly quasiconvex on F if it is strictly quasiconvex at each x G FIt follows from the above definition that a numerical function 6 defined on a convex set F is strictly quasiconvex on F if and only if
Strictly quasiconcave function A numerical function 6 defined on a set F C Rn is said to be strictly quasiconcave at x (E F (with respect to F) if for each x G F such that B(x) < 6(x) the function 6 assumes a higher value than B(x) on each point in the intersection of the open line segment (x,x) and F, or equivalently
6 is said to be strictly quasiconcave on F if it is strictly quasiconcave at each x G F. 137
9.2
Nonlinear Programming
It follows from the above definition that a numerical function 6 defined on a convex set F is strictly quasiconcave on F if and only if
Obviously 6 is strictly quasiconcave at x (on F) if and only if — 0 is strictly quasiconvex at x (on F). Figure 9.2.1 depicts a strictly quasiconvex and a strictly quasiconcave function on R. The quasiconvex function of Fig. 9.1.la is not strictly quasiconvex, and the quasiconcave function of Fig. 9.Lib is not strictly quasiconcave. (Why?) We observe that a strictly quasiconvex function need not be quasiconvex. Consider the numerical function 8 defined on R as follows
This function is strictly quasiconvex on R but is not quasiconvex on R. For by taking x1 = -1, x* = 1, X = >^, we see that 0(z2) = B(x1}, but 0[(1 - X)^ 1 + Xz2] > 6(xl). If we require that 6 be lower semicontinuous, the above counterexample will be eliminated and every strictly quasiconvex function will also be quasiconvex. In fact we have the following result.
Fig. 9.2.1 Strictly quasiconvex and strictly quasiconcave functions, (a) Strictly quasiconvex function 6 on Rn = R; (6) strictly quasiconcave function 6 on Rn = R. 138
Generalizations of Convex Functions
9.3
Theorem [Karamardian 67] Let 6 be a lower (upper) semicontinuous numerical function defined on the convex set T in Rn. If 0 is strictly quasiconvex (strictly quasiconcave) on F, then 6 is quasiconvex (quasiconcave) on T, but not conversely. PROOF We prove only the quasiconvex case. Let 8 be strictly quasiconvex on F, and let xl and x 2 be in F. By 1 we have that
Hence if 6(xz) < 8(xl), we are done by 9.1.1. Assume now that 6(xz) = 6(xl). We will show (by contradiction) that there exists no £ G (xl,xz) such that 6(xl) < 8(x). This will then establish the quasiconvexity of 6 by 9.1.1. Let x £ (x^x*) such that 8(xl) < 8(x). Then Since 6 is lower semicontinuous on F, ft is open relative to (xl,x*) by C.1.2(\v}. Hence, there exists an £ G (xl,x) C\ ft. By the strict quasiconvexity of 6 and 1 we have that (since £ G ft and x G ft)
and which is a contradiction. Hence no such x exists and 6 is quasiconvex on T. That the converse is not true follows from the example given at the beginning of this section, which is quasiconvex on R but not strictly quasiconvex on R. (For take x1 = %, x2 = —%, X = ^!o> then 0(z2) < 6(x1), but 8[(l - \)xl + Xz 2 )] = 8(xl), which contradicts 1.) Theorem Let 8 be a numerical function defined on the convex set F in Rn, and let x G r be a local minimum (maximum). If 8 is strictly quasiconvex (strictly quasiconcave) at x, then 8(x) is a global minimum (maximum) of 8 on T. PROOF We prove the strictly quasiconvex case. mum, then there exists a ball BS(X) such that
If x is a local mini-
139
Nonlinear Programming
9.3
Assume now that there exists an x in F but not in BS(X) such that Q(x &(x). By the strict quasiconvexity of 6 at x and the convexity of F, we have that
and hence
which contradicts a previous inequality. In closing this section on strictly quasiconvex (strictly quasiconcave) functions, we remark that there does not seem to be a simple characterization of a differentiable strictly quasiconvex (strictly quasiconcave) function in terms of the gradient of the function [such as 9.1.4 for a quasiconvex (quasiconcave) function].
3. Pseudoconvex and pseudoconcave functions Let 6 be a differentiable function on Rn. If VQ(x) = 0, then it follows from 6.1.1 that Q(x) ^ 6(x) for all x in Rn if 6 is convex at x. If 8 is a differentiable strictly quasiconvex function (and hence by 9.2.3 also a quasiconvex function) on Rn, and if VB(x) = 0, it does not necessarily follow that Q(x) ^ 6(x) for all x in Rn. For example take the strictly quasiconvex function 6 defined on R by 6(x) = (x)3. For this function V0(0) = 0, but certainly 0(0) is not a minimum of 9 on R. In order to exclude such strictly quasiconvex functions that have inflection points with horizontal tangents, we introduce the concept of a pseudoconvex function. (The concept of a pseudoconvex function was introduced independently in [Mangasarian 65] and in a slightly different form under the name of semiconvex function in [Tuy 64]. Most of the results for pseudoconvex functions derived in this book are based on [Mangasarian 65].) Pseudoconvex function Let 6 be a numerical function defined on some open set in Rn containing the set F. 6 is said to be pseudoconvex at x G F (with respect 140
Generalizations of Convex Functions
9.3
to F) if it is differentiable at x and
8 is said to be pseudoconvex on T if it is pseudoconvex at each x G T. Pseudoconcave function Let 6 be a numerical function denned on some open set in Rn containing the set T. 9 is said to be pseudoconcave at x G T (with respect to F) if it is differentiable at x and
8 is said to be pseudoconcave on T if it is pseudoconcave at each :r £ F. Obviously 6 is pseudoconcave at x (on T) if and only if — 6 is pseudoconvex at x (on T). Figure 5.3J depicts a pseudoconvex function and a pseudoconcave function on R. We give now some results involving pseudoconvex and pseudoconcave functions. Theorem (minimum principle) Let 6 be a numerical function defined on some open set in Rn containing the set T, and let x G T.
Fig. 9.3.1 Pseudoconvex and pseudoconcave functions, (a) Pseudoconvex function 6 on Rn = R; (6) pseudoconcave function 6 on Rn = R. 141
9.3
Nonlinear Programming
(i)
Let F be convex, and let 6 be differentiate
(ii)
Let 6 be pseudoconvex (pseudoconcave) at x.
PROOF
(i)
Let x be any point in T.
at x.
Then
Then
Since T is convex
Since 6 is differentiate at x, and since 6(x) = min d(x), we have that
where
Hence By taking the limit as X approaches zero, we get The second implication of (i) follows similarly. (ii) For any x in T we have that V6(x)(x — x) ^ 0, hence by 1, Q(x] ^ 6(x) and 6(x) = min 6(x). The second implication follows *er from 2. Corollary Let 6 be a numerical function defined on the open convex set T in Rn. Let x £ T, and let 6 be differentiable at x. Then
142
Generalizations of Convex Functions
(ii)
Let & be pseudoconvex (pseudoconcave) at x.
9.3
Then
PROOF (i) We consider only the first implication. By 3(i) we have that Vd(x) (x — x) ^ 0 for all x E F. Since F is open, x = x — 5V0(z) E r for some 5 > 0. Hence — 8V8(x)V8(x) ^ 0, which implies that V8(x) = 0. (ii)
Follows trivially from S(ii).
We relate now pseudoconvex functions to strictly quasiconvex functions, quasiconvex functions, and convex functions. Theorem Let F be a convex set in Rn, and let 8 be a numerical Junction defined on some open set containing F. If 6 is pseudoconvex (pseudoconcave) on F, then 6 is strictly quasiconvex (strictly quasiconcave) on F and hence also quasiconvex (quasiconcave) on F. The converse is not true. PROOF Let 8 be pseudoconvex on F. We shall assume that 6 is not strictly quasiconvex on F and exhibit a contradiction. If 6 is not strictly quasiconvex on F, then it follows from 9.2.1 that there exist xl,x2 E F such
and for some x such that Hence there exists an x £ (xl,x*) such that
where By Theorem 8(1) above we have then that
143
9.3
Nonlinear Programming
and Since then
and Hence
and
But by the pseudo-convexity of 6 on T, it follows that
and hence
This last inequality contradicts the earlier statement that
Hence 6 is strictly quasiconvex on T and, by 9.2.3, is also quasiconvex on F. That the converse is not necessarily true can be seen from the example d(x) = (x)3, x G R, which is strictly quasiconvex on R but is not pseudoconvex on R. Theorem Let Q be a numerical function defined on some open T set in Rn, let x G T, and let 6 be differentiable at x. If 6 is convex (concave) at x, then d is pseudoconvex (pseudoconcave) at x, but not conversely. PROOF
144
Let 6 be convex at x.
By 6.1.1 we have that
9.4
Generalizations of Convex Functions
Hence
and 6 is pseudoconvex at x. That the converse is not necessarily true can be seen from the example
8 is not convex on R because Vz8(x) < 0 for x < 0. convex on R because V8(x) = I -f 3(z) 2 > 0 and
However, 6 is pseudo-
Theorem Let F be a convex set in Rn, and let 9 be a numerical function defined on some open set containing F. If 6 is pseudoconvex (pseudoconcave) on T, then each local minimum (maximum) of 6 on T is also a global minimum (maximum) of 8 on T. PROOF By Theorem 5 & is strictly quasiconvex (strictly quasiconcave on T. By Theorem 9.2.4 each local minimum (maximum) of 6 on T is also a global minimum (maximum) on F.
4. Summary of properties and relations between quasiconvex, strictly quasiconvex, pseudoconvex, convex, and strictly convex functions In Fig. 9.4.1 we summarize the relations between differentiable functions which are quasiconvex, strictly quasiconvex, pseudoconvex, convex, and strictly convex, all denned on an open convex set F in Rn. For convenience, we include statements which are logical equivalents of each other. For example, we include the two following equivalent definitions of a pseudoconvex function
or
145
9.4
Nonlinear Programming
In Fig. 9.4-1 certain inequalities hold only if xl 9* x2 and X ;* 0 or X 5* 1. All implications are represented by single arrows. Implications not so represented are not true in general. Figure 9.4-1 immediately shows that the class of quasiconvex functions is the largest class considered and the strictly convex class is the smallest. Figure 9.4-2 shows similar results for differentiate quasiconcave functions, etc.
Fig. 9.4.1 Properties and relations between differentiable quasiconvex, strictly quasiconvex, pseudoconvex, convex and strictly convex functions defined on an open convex set r C Rn146
Generalizations of Convex Functions
9.8
Fig. 9.4.2 Properties and relations between differentiable quasiconcave, strictly quasiconcave, pseudoconcave, concave and strictly concave functions defined on an open convex set T C Rn-
5.
Warning
It was shown in 4-1-6 that each nonnegative linear combination of convex functions is also convex. This is not true in general for quasiconvex, strictly quasiconvex, or pseudoconvex functions. We indicate the reason for the pseudoconvex functions 0 and 0 on Rn. If we define
147
Nonlinear Programming
9.6
then the requirement that does not ensure that and hence we cannot conclude that and so ^ may not be pseudoconvex on Rn. For example Q(x) — — x and (x) = x + (x)z are both pseudoconvex on R, but their sum \f/(x} = (x)3 is not pseudoconvex on R.
6. Problems Nonlinear fractional functions Let 0 and ^ be numerical functions defined on a set F C R", and let
Let T be open, let x £ F, and let > and ^ be differentiable at x. Show that d = <(>/$ is pseudoconvex at x if
and
(ii)
Let T be convex. Show that d = $/$ is quasiconvex on F if
and
148
9.6
Generalizations of Convex Functions
(Hint: Consider, for the case ^ > 0, the set
Nonlinear fractional functions Show that the words convex and concave can be interchanged throughout the above problem 1, except that the sentence "Let T be convex . . . " remains unchanged. Linear fractional functions Let a £ Rn, b E Rn, « £ R, ft £ Rdenned by
Show that the function 6
is both pseudoconvex and pseudoconcave (and hence also both quasiconvex and quasiconcave) on each convex set r C R" on which bx + /3 ^ 0. Bi-nonlinear functions Let 0 and cr be numerical functions defined on a convex set F £ Rn, and let o- 7^ 0 on r. (i)
Let F be open, and let 0 and a be differentiate on F. Show that 8 = 0cr is pseudoconvex on F if (0 is convex, ^ 0, or is concave, a- > 0, all on F) or
(0 is concave, 0 ^ 0, a is convex, tr < 0, all on F) Show that 8 is pseudoconcave on F if (0 is convex, 0 ^ 0, a is convex, a < 0, all on F) or
(0 is concave, 0 ^ 0, cr is concave, cr > 0, all on F) (ii)
If we drop the assumptions that F is open and that 0 and a are differentiate on F, show that either of the first two conditions of (i) make 6 quasiconvex on F and either of the last two conditions of (i) make 8 quasiconcave on F. (Hint: Show first that the reciprocal 149
9.6
Nonlinear Programming
of a positive concave function is a positive convex function, and that the reciprocal of a negative convex function is a negative concave function, then use the results of Prob. 1 above.) Let 6 be a positive numerical function defined on the open convex set F in Rn. Show that if log 0 is convex (concave) on T, then 6 is convex (pseudoconcave) on F.
160
Chapter Ten Optimality and Duality for Generalized Convex and Concave Functions
In this chapter we shall use the concepts of quasiconvexity (quasiconcavity) and pseudoconvexity (pseudoconcavity), introduced in the previous chapter, to derive sufficient optimality criteria, necessary optimality criteria, and duality relations for a class of nonlinear programming problems that involve differentiable functions. The pioneering work in this direction was that of [Arrow-Enthoven 61]. Subsequent work appeared in [Tuy 64, Martos 65, Mangasarian 65].
1. Sufficient optimality criteria We establish in this section two sufficient optimality criteria which are slight generalizations of the sufficiency result of [Mangasarian 65]. The present results also subsume the sufficiency results of [Arrow-Enthoven 61] and Theorem 7.2.1 of this book. Sufficient optimality theorem Let X° be an open set in Rn, and let 6 and g be respectively a numerical function and an m-dimensional vector function both defined on X°. Let x G X°, let
let 0 be pseudoconvex at x, and let QI be differentiable and guasiconvex at x. If there exists a u G Rm such that (x,u) satisfies the following conditions
10.1
Nonlinear Programming
then x is a solution of the following minimization problem
PROOF
Let
and hence Since it follows by the quasiconvexity of QI at x and Theorem 9.1.4 that and hence But since uj = 0, we also have that Addition of the last two relations gives But since [V6(x) + uVg(x)](x — x) ^ 0 for all x G X, the last inequality gives
which, by the pseudoconvexity of 6 at x, implies that
152
Generalized Optimality and Duality
10.2
Since g(x) ^ 0, we have that x £E X, and
The following theorem, which is a generalizaton of the KuhnTucker sufficient optimality criterion 7.8.1, follows trivially from the above theorem by observing that V6(x) -f uVg(x) = 0 implies that [V0(z) + uVg(x)](x - x) = 0 for all x 6 X. Kuhn-Tucker sufficient optimality theorem Theorem 1 above holds with the condition replacing the condition
2. Necessary optimality criteria The purpose of this section is to generalize the necessary optimality criteria of Sec. 7.3 by employing the concepts introduced in Chap. 9. We begin by extending Abadie's linearization lemma 7.3.1. Linearization lemma Let X° be an open set in Rn, and let 6 and g be respectively a numerical function and an m-dimensional vector function both defined on X°. Let x be a solution of LMP 7.1.2, let Q and g be differentiable at x, let P
=
\i I Qi(%) — 0>
an
d Qi is pseudoconcave at x\
and let
Q — H I 9i(%)
=
0>
d Qits °t pseudoconcave at x}
an
Then the system
has no solution z in Rn. PROOF The proof is identical to that of Lemma 7.3.1 except that V and W are replaced by P and Q and step (iii) of the proof of 7.3.1 is 168
10.2
Nonlinear Programming
modified as follows: (in') For i 6E P, we have by 9.3.2, since 0t is pseudoconcave at x and dVgp(x)z ^ 0, that We generalize now the Fritz John stationary-point necessary optimality theorem 7.3.2. Fritz John stationary-point necessary optimality theorem Let x be a solution of LMP 7.1.2 or of MP 7.1.1, let X° be open, and let 8 and g be differentiate at x. Then there exist an f 0 G # and an f (E Rm such that (x,f0,r) solves FJP 7.1.3 and where
PROOF The proof follows from Lemma 1 above in exactly the same way as Theorem 7.3.2 follows from Lemma 7.3.1. Again, as in Chap. 7, there is no guarantee that f 0 > 0 in the above theorem. To ensure this, we impose constraint qualifications. We give below less restrictive versions of constraint qualifications that were introduced earlier. The weak Arrow-Hurwicz-Uzawa constraint qualification (see 7.3.4)
Let X° be an open set in Rn, let g be an m-dimensional vector function defined on X°, and let g is said to satisfy the weak Arrow-Hurwicz-Uzawa constraint qualification at x G X if g is differentiable at x and has a solution z Gj Rn where and Qi is pseudoconcave at x} and
and Qi is not pseudoconcave at x \ 154
Generalized Optimality and Duality
10.3
The weak reverse convex constraint qualification (see 7.3.5) Let X° be an open set in Rn, let g be an m-dimensional vector function defined on X°, and let g is said to satisfy the weak reverse convex constraint qualification at x G X if g is differentiate at x and, for each i G I, either gr, is pseudoconcave at x, or 0, is linear on Rn, where
Slater's weak constraint qualification (see 5.4-3) Let X° be an open set in Rn, let g be an m-dimensional vector function defined on X°, and let
g is said to satisfy Slater's weak constraint qualification at x G X if g is differentiable at x, gi is pseudoconvex at x, and there exists an x G X such that gi(x) < 0 where
We observe here that each of the weak constraint qualifications 3, 4, and 5 above is essentially implied by the corresponding constraint qualification introduced earlier, that is, by 7.3.4, 7.3.5, and 5.4-3, respectively. Before deriving the Kuhn-Tucker necessary optimality conditions under the weakened constraint qualifications above, we relate the weakened constraint qualifications to each other and to the Kuhn-Tucker constraint qualification 7.3.3. Lemma Let X° be an open set in Rn, let g be an m-dimensional vector function defined on X°, and let
(i) // g satisfies the weak reverse convex constraint qualification 4 at x, then g satisfies the weak Arrow-Hurwicz-Uzawa constraint qualification 3 at x. (ii) // g satisfies the weak reverse convex constraint qualification 4 at x, then g satisfies the Kuhn-Tucker constraint qualification 7.3.3 at x. IBS
lo.a (iii)
Nonlinear Programming
// g satisfies Slater's weak constraint qualification 5 at x, then g satisfies the weak Arrow-Hurwicz-Uzawa constraint qualification 3 at x.
PROOF (i) We have here that Q is empty, and hence by taking z = 0, we immediately satisfy Vgp(x)z = 0, and the weak Arrow-Hurwicz-Uzawa constraint qualification 3 is satisfied at x. (ii) The proof is essentially identical to the proof of Lemma 7.3.6(ii), except that the line reading in the proof of 7.3.6(\\) is replaced by the following argument: (by pseudoconcavity of gj at x)
(iii) Let g satisfy Slater's weak constraint qualification 5 at x. Then there exists an x G X such that where But since gi is pseudoconvex at x, it follows from (the logical equivalent of) 9.8.1 that Hence by taking z — x — x, we have that Vgi(x)z > 0, and the weak Arrow-Hurwicz-Uzawa constraint qualification 3 is satisfied at x. We summarize the relations obtained above between the various constraint qualifications in Fig. 10,2.1. Kuhn-Tucker stationary-point necessary optimality theorem Let X° be an open set in Rn, let 9 and g be defined on X°, let x solve LMP 7.1.2 or MP 7.1.1, let 9 and g be differentiate at x, and let g satisfy (i) (ii) (iii) (iv) 166
The Kuhn-Tucker constraint qualification 7.3.3 at x, or the weak Arrow-Hurwicz-Uzawa constraint qualification 3 at x, or the weak reverse convex constraint qualification 4 at x, or Slater's weak constraint qualification 5 at x.
Generalized Optimality and Duality
10.3
Fig. 10.2.1 Relations between various constraint qualifications (CQ).
Then, there exists a u G Rm such that (x,u) solves KTP 7.1.4PROOF In view of Lemma 6 we need only establish the theorem under assumptions (i) and (ii) above. (i) This is the same as 7.3.7(1). (ii) This proof is identical with the proof of Theorem 7.3.7(ii) with the following modifications: V and W are replaced by P and Q, and Theorem 2 is used in the proof instead of Theorem 7.8.2.
3. Duality In this section we will extend the Hanson-Huard strict converse duality theorem 8.1.6 in two directions. In the first direction we will show that (x,u) need only be a local solution of the dual problem rather than a global solution. In the second direction we will relax the convexity of the objective function 6 to pseudoconvexity, and the convexity of the constraint function g to quasiconvexity. We will also show that neither the weak duality theorem 8.1,3 nor Wolfe's duality theorem 8.1.4 holds for a pseudoconvex 6. Strict converse duality theorem [Mangasarian 65] Let X° be an open set in Rn, and let 6 and g be differentiate Let (x,u) be a local solution of DP 8.1.2, that is,
on X°.
157
Nonlinear Programming
10.3
where BS(X,U) is an open ball in Rn X Rm with radius 5 > 0 around (x,u). Let 6 be pseudoconvex at x, and let g be quasiconvex at x. If either (i) (ii)
$(x,u) is twice continuously differentiate at x, and the n X n Hessian matrix Vxz^(x,u) is nonsingular, or there exists an open set A in Rm containing u and an n-dimensional differentiate vector function e on A such that
and
then A solves MP 8.1.1, and
If x solves MP 8.1.1, it does not follow that x and some u solve DP 8.1.2 (even if g is linear), nor does it follow that 0(x) ^ $(x,u) for any dual feasible point (x,u), that is, (£,u) £E Y. PROOF The proof of the first part of the theorem is essentially identical to the proof of Theorem 8.1.6, except that we invoke the sufficient optimality theorem 10.1.2 here, whereas we invoked the sufficient optimality theorem 7.2.1 in the proof of 8.1.6. (The assumption that (x,u) need only be a local solution of DP 8.1.2 does not change the proof of 8.1.6 and in fact could have been made in Theorem 8.1.6 also. It would have added nothing to that theorem, however, since under the assumptions of Theorem 8.1.6 each local maximum is also a global maximum. This, however, is not the case under the present assumptions, as can be seen from the example following this proof, in which a local maximum to the dual problem is not a global maximum.) We establish now the second part of the theorem by means of the following counterexample:
158
Generalized Optimallty and Duality
10.8
The solution of MP1 is obviously x = 1, whereas DPI has no maximum but has a supremum, which is equal to zero. (To see this, rewrite the dual problem in the equivalent form
and note that the quadratic equation 1 — 2x + 2(z)2 = 0 has only complex roots.) We also have dual feasible points (£,ti), such as £ = 10, U = 20e-100, such that 8(x) < iK£,fl). We give now an example where the first part of the above theorem applies and where the dual problem has no global maximum but a local maximum which is also a solution of the primal problem. Consider the primal minimization problem
and its dual
Setting v = I — u gives
and setting v = — [l/3(z)2], x j* 0, gives
Hence V<£(£) = 0 implies that x = — 1, and since V 2 0(z) < 0 for x < 0, > is strictly concave on {x \ x G R, x < 0}, and f = — 1 is a local maximum of 0. However :£ = — 1 is not a global maximum of on {x \ x G R, x > 0 or x < 0}, since
10.3
Nonlinear Programming
It is obvious also that x = — 1 solves the primal problem and that B(x) = — 2. Hence x = x, and 6(x) = t(x,u). We end this chapter with a corollary that follows directly from the proof of Theorem 8.1.6 or from the proof of Theorem 1 above. Corollary Let all the assumptions of Theorem 1 above hold, except that 6 need not be pseudoconvex at x nor g be quasiconvex at x. If (x,u) is a local solution of DP 8.1.2, then (x,u) is a Kuhn-Tucker point, that is, (x,u) solves KTP7.14.
160
Chapter Eleven Optimality and Duality in the Presence of Nonlinear Equality Constraints
Up until now we have dealt almost exclusively with minimization problems in which the feasible region X is determined by inequality constraints only, that is,
where X° is a subset of Rn, and g is an w-dimensional vector function defined on X°. (For exceptions see 6.8.8, 5.4-8, 7.2.2, and 7.3.18, where equality constraints are also considered.) In this chapter we shall examine in some detail minimization problems in which the feasible region X is determined by inequality and nonlinear equality constraints, that is,
where h is a fc-dimensional vector function defined on X°. In particular we shall obtain sufficient optimality criteria, necessary optimality criteria, and some duality results for the following minimization problem
We remark here that all the necessary optimality conditions of Chap. 7 that involved differentiable functions required that the set X° be open. We derive in this chapter an important necessary optimality condition of the maximum-principle type which does not require that X° be open.
11.1
1,
Nonlinear Programming
Sufficient optixnality criteria
The sufficient optimality criteria given here follow directly from Theorem 10.1.1 and Theorem 10.1.2 by observing that the equality constraint h(x) = 0 can be written as h(x) ^ 0 and — h(x) ^ 0, and that the negative of a quasiconcave function is quasiconvex. Sufficient optimality theorem Let X° be an open set in Rn, and let d, g, and h be respectively a numerical function, an m-dimensional vector function, and a k-dimensional vector function, all defined on X°. Let x G X°, let let 0 be pseudoconvex at x, let gr be differentiate and quasiconvex at x, and let h be differentiate, quasiconvex, and quasiconcave at x. If there exist u £j Rm and v £ Rk such that (x,u,v) satisfies the following conditions
then x is a solution of the following minimization problem
Kuhn-Tucker sufficient optimality theorem Theorem 1 above holds with the condition replacing the condition
2.
"Minimum-principle" necessary optimality criteria: X° not open
We consider in this section the minimization problem
162
Optimality and Duality with Nonlinear Equalities
11.2
We do not assume here that X° is an open subset of Rn, that g is convex on X°, or that h is linear on\ff n .f Without the assumption that X° is open, we cannot derive necessary optimality criteria of either the Fritz John type (7.3.2} or of the Kuhn-Tucker type (7.3.7). We can, however, derive necessary optimality criteria of the minimum-principle type, which, when X° is open, become necessary optimality criteria of the Fritz John or Kuhn-Tucker type. (The minimum-principle necessary optimality criterion we are about to derive has been referred to as a maximum principle in [Halkin 66, Halkin-Neustadt 66, Canon et al. 66, MangasarianFromovitz 67] because of its close relation to Pontryagin's maximum principle [Pontryagin et al. 62], which is the fundamental optimality condition of optimal control. We have a minimum principle here instead of a maximum principle because the adjoint variables of Pontryagin's maximum principle, which play the role of Lagrange multipliers, are the negative of our Lagrange multipliers f 0 , f, s, to be introduced below. Halkin [Halkin 66] has a maximum principle because he has a maximization problem instead of a minimization problem.) We point out now the main difference between a minimum-principle necessary optimality criterion and the main condition of the Fritz John necessary optimality criterion. The minimum principle requires that where x is a solution of the minimization problem, f 0 , f, and s are Lagrange multipliers and X° is the closure of X°. The Fritz John criterion requires that Obviously, the latter condition follows from the first one if X° is open. If X° is not open, the latter condition does not hold in general. We begin by establishing a generalization of a linearization lemma of [Mangasarian-Fromovitz 67a]. Linearization lemma Let X° be a convex set in Rn with a nonempty interior, int (X°), and let A be an open set in Rn. Let f be an ^-dimensional vector function, and let h be a k-dimensional vector function, both defined on some open set containing X°. Let t For the case when X° is convex, g is convex on X°, and h is linear on Rn, we have the necessary optimality conditions 5.4-8. 163
ll.X
Nonlinear Programming
Let f be differentiable at x, let h have continuous first partial derivatives at x, and let Vhj(x), j = I , . . . , k, be linearly independent, that is,
Then
PROOF k = n: The linear independence of Vhj(x), j = 1, . . , , k, is equivalent to the nonsingularity of Vh(x). Hence Vf(x)(x — x) < 0 cannot hold, because Vh(x)(x — x) = 0 implies that x — x = 0. k > n: This case is excluded because of the assumption that Vhj(x),j j = 1, . . . , k, are linearly independent, which implies that k ^ n. 0 < k < n: We shall establish the lemma by proving the following backward implication
which is the logical equivalent of the implication of the lemma. Since h(x) = 0, and since Vhj(x), j = 1, . . . , k, are linearly independent, we have, by the implicit function theorem D.3.1, that there exist: (i) a partition (XJ,XK) of x such that xi G Rn~k, XK £ Rk, (ii) an open set Si in Rn~k containing xr, and (iii) a fc-dimensional differentiable vector function e on ft such that
164
Optlmality and Duality with Nonlinear Equalities
11.1
and
VXKh(x) is nonsingular Since h and e are differentiable, and since h[xi,e(xi)] = 0 for all XT E fy we have, by the chain rule D.L6, that Postmultiplication by (XT — XT) gives But by assumption Vh(x)(£ — x) = 0, or equivalently
Hence the last two distinct equalities and the nonsingularity of VXKh(x) imply that
Now, since ft is open, since XT (E ft, and since e is differentiable at x, there exists a 50 > 0 such that for S < do and
where c is a /c-dimensional vector function that tends to zero as 5 tends to zero. By combining the last two equalities and the fact that XK = e(xi), we have that for 5 < 50 But this relation and the fact that / is differentiable at x imply that there exists a 61 such that 0 < 81 < 80 and such that for 5 < di
188
11.1
Nonlinear Programming
where ||a;/,x/c|| denotes (xixi + XKXR)^, and where 6 is an /-dimensional vector function which tends to zero as 6 tends to zero. Since c tends to zero as 5 tends to zero, and since (by assumption) Vf(x)(x — x) < 0, that is, it follows that there exists a 52 such that 0 < 52 < 5i, and such that for 0 < 5 < 82, the quantity in the curly brackets { } in the expression next to the last is strictly negative. Hence (since /(£) = 0) we have that and, since h[xi,e(xi}} — 0 for XT (E 12, we also have that Now, we have already shown that for 5 < 80 Because x G int (-X"0), and because c tends to zero as 8 tends to zero, there exists a 83 such that 0 < 83 < 52 and 53 < 1, and such that Since x (— X°, it follows from the last two relations and the convexity of X ° that Since x G A, and since A is open, it follows from the expression established above for e[xi -+- 8(xi — Xi}} that there exists a 5 4 such that 0 < 54 < 53 and such that Hence, combining the above results, we have that any x given by satisfies f(x] < 0, h(£) = 0 and is in X° C\ A. We have thus established the lemma for the case when n > k > 0. k = 0: This is the case where there is no h(x) — 0. We prove here that We have now that for 0 < 8 < 80
166
Optimality and Duality with Nonlinear Equalities
11.9
Since b tends to zero as 5 tends to zero, and since Vf(x) (x — x) < 0, there exists a 6 > 0 such that
Since x G X° C\ A, and since X° is convex and A is open, there exists a £ < 0 such that £ < 8 and $ < 1, and such that
Hence any x = x + 8(x — x ) . 0 < 5 < §. satisfies f(£] < 0 and is in
x°nA.
We derive now another lemma from the above one by using the fundamental theorem for convex functions, Theorem ^$.1 (which is a consequence of the separation theorem for convex sets, Theorem 3.2.8). This will be the key lemma in deriving the minimum-principle necessary optimality criterion. Lemma Let X° be a convex set in Rn with a nonempty interior, int (X°~), and let A be an open set in Rn. Let f be an ^-dimensional vector function, and let h be a k-dimensional vector function, both defined on some open set containing X°. Let
Let f be differentiable at x. Then
at x, and let h have continuous first partial derivatives
PROOF If Vhj(x), j = 1, . . . , k, are linearly dependent, then there exists a q 9^ 0 such that qVh(x) = 0, and the lemma is trivially satisfied by p = 0, q ^ 0, and qVh(x)(x - x) = 0 for all x £ X°. If Vhj(x),j = 1, ... , k, are linearly independent, then by Lemma 1 above we have that
has solution has aa solution 167
11.3
Nonlinear Programming
Since the interior, int (X°), of a convex set is convex (see 3.1.7), it follows by Theorem 4-2.1 that there exist p G R*, q G Rk, P ^ 0, (p,q) ^ 0 such that and since the above expression is continuous in x, and in fact linear, the above inequality holds also on the closure X° of X°. We are now ready to derive the fundamental necessary optimality criterion of this chapter. Minimum-principle necessary optimality theorem Let X° be a convex set in Rn with a nonempty interior: int (X°). Let 6 be a numerical function, let g be an m-dimensional vector function, and let h be a k-dimensional vector function, all defined on some open set containing X°. Let x be a solution of
Let 6 and g be differentidble at x, and let h have continuous first partial derivatives at x. Then there exist f 0 G R> f G Rm, s G Rk such that the following conditions are satisfied
REMARK The first of the above four relations is called a minimum principle because it is a necessary optimality condition for f06(x) -frg(x) -f- sh(x) to have a minimum at x (see Theorem 9.3.3). If in addition we have that f08 + fg + sh is pseudoconvex or convex at x (this, in general, is not the case under the assumptions of the above theorem), then the minimum-principle condition implies that fod(x) + fg(x) -f- sh(x) = min f08(x) + fg(x) -f- sh(x) x£.X<>
The above condition becomes then equivalent to the saddlepoint condition of Problem 6.4.2. PROOF
168
Let
Optimality and Duality with Nonlinear Equalities
11.2
Let m/ and mj denote the number of elements in the sets / and J respectively, so mi -\- mj = m. Since g is defined on some open set containing X°, and since g is differentiate at x, there exists a 8 > 0 such that for i 6E J and \\x — x \ < 8
Hence the set is an open set in Rn. Now, we have that the system
for if it did have a solution, then x would not be a solution of the minimization problem. But we also have that Hence by Lemma 2 above, there exist f 0
such that
By defining f / = 0 G: Emj and observing that and
the theorem is established. It should be remarked that the restriction that the convex set X° have a nonempty interior is a mild restriction, since in general a convex set without an interior is equivalent to the intersection of a convex set with a nonempty interior and a linear manifold \x \ x £ Rn, h(x] = 0} (h linear on Rn). Since we allow equalities h(x) = 0 in Theorem 3 above, 169
11,8
Nonlinear Programming
convex sets with empty interiors can, in effect, be handled by the above results. The convexity requirement on X° is of course a restriction which cannot be dispensed with easily. (See however [Halkin 66, Canon et al. 66].) If we replace the convexity requirement on X° by the requirement that XQ be open, a stronger necessary optimality condition than the above one can be obtained. In effect this will be an extension of the Fritz John stationary-point necessary optimality theorem 7.3.2 to the case of nonlinear equalities. We shall give this result in the next section of this chapter.
3. Fritz John and Kuhn-Tucker stationary-point necessary optimality criteria: X° open We derive in this section necessary optimality criteria of the Fritz John and Kuhn-Tucker types from the minimum principle of the previous section. Fritz John stationary-point necessary optimality theorem [Mangasarian-Fromovitz 67a] Let X° be an open set in Rn. Let 6 be a numerical function on X°, let g be an m-dimensional vector function on X°, and let h be a k-dimensional vector function on X°. Let x be a (global) solution of the minimization problem
or a local solution thereof, that is,
where B&(x) is an open ball around x with radius 8. Let 6 and g be differentiable at x, and let h have continuous first partial derivatives at x. Then there exist fo (E R, r G Rm, s G Rk such that the following conditions are satisfied
PROOF ITU
Let x be a global or local solution of the minimization problem.
Optlmality and Duality with Nonlinear Equalities
11.8
In either case (since X° is open) there exists an open ball Bp(x) around x with radius p such that Bp(x) C BI(X) C. X°, and
Since Bp(x) is a convex set with a nonempty interior, we have by the minimum principle 11.2.3 that there exist f 0 E R, ? £ Rm, s £ Rk such that
Since for some small positive f we have from the first inequality above that To derive Kuhn-Tucker conditions from the above, we need to impose constraint qualifications on the problem. The Kuhn-Tucker constraint qualification (see 7.3.3) Let X° be an open set in Rn, let g and h be w-dimensional and /c-dimensional vector functions on X°, and let
g and h are said to satisfy the Kuhn-Tucker constraint qualification at x G X if g and h are differentiate at x and
there exists an n-dimsional vector funtion on the interval suuch that
is differentiable at where
171
11.3
Nonlinear Programming
The weak Arrow-Hurwicz-Uzawa constraint qualification (see 10.8.3) Let X° be an open set in Rn, let g and h be m-dimensional and /c-dimensional vector functions on X°, and let
g and h are said to satisfy the weak Arrow-Hurwicz-Uzawa constraint qualification at x (E X if g and h are differentiate at x, h is both pseudoconvex and pseudoconcave at x, and
has a solution
where
ispseudoconcave pseudoconcave
The weak reverse convex constraint qualification (see 10.2.4) Let X° be an open set in Rn, let g and & be m-dimensional and /c-dimensional vector functions on X°, and let
g and h are said to satisfy the weak reverse constraint qualification at x G X if a and /i are differentiate at x, and if for each
either Qi is pseudoconcave at x or linear on Rn, and /i is both pseudoconvex and pseudoconcave at x. The modified Arrow-Hurwicz-Uzawa constraint qualification [Mangasarian-Fromovitz 67a] Let X° be an open set in Rn, let g and h be w-dimensional and fc-dimensional vector functions on X°, and let
g and /i are said to satisfy the modified Arrow-Hurwicz-Uzawa constraint qualification at x G X if g is differentiable at x, A is continuously differen172
Optimality and Duality with Nonlinear Equalities
11.3
liable at x, Vhi(x), i — 1, . . . , k, are linearly independent, and
has a solution where
Kuhn-Tucker stationary-point necessary optimality theorem Let X° be an open set in Rn, and let 9, g, and h be respectively a numerical function, an m-dimensional vector function, and a k-dimensional vector function, all defined on X°. Let x be a (global) solution of the minimization problem
or a local solution thereof, that is,
where B$(x) is some open ball around x with radius 5. Let d, g, and h be differentiate at x, and let g and h satisfy (i) (ii) (m) (iv)
the Kuhn-Tucker constraint qualification 2 at x, or the weak Arrow-Hurwicz-Uzawa constraint qualification 3 at x, or the weak reverse convex constraint qualification 4 at x, or the modified Arrow-Hurwicz-Uzawa constraint qualification 5 at x.
Then there exist u G Rm and a v G Rk such that
PROOF (i)-(ii)-(iii) These parts of the theorem follow from Theorem 10.2.7, parts (i), (ii), and (iii), by replacing h(x) = 0 in the above theorem by h(x) ^ 0 and -h(x) ^ 0. 173
U.*
Nonlinear Programming
(iv) All we have to show here is that f 0 > 0 in the theorem 1 above, for then we have that x,w = f/f 0 and v = the Kuhn-Tucker conditions above. We assume now that exhibit a contradiction. If 7 = [i | Qi(x) = 0} is empty, that is, there are no straints, then f — 0 (since fg(x) = 0, f ^ 0 and g(x) ^ 0). Theorem 1
Fritz John s/f0 satisfy f 0 = 0 and active conHence by
which contradicts the assumption of 5 that Vhi(x), i = 1, . . . , k, are linearly independent. (If there are no equality constraints h(x) — 0, then (fo,f) = 0 contradicts the condition (f 0 ,f) ^ 0 of Theorem 1, since there is no s.) If / is not empty, then by Theorem 1 we have that
If fj = 0, then s 5^ 0, and we have a contradiction to the assumption of 5 that Vhi(x), i = 1, . . . , k, are linearly independent (if there is no s, then fi = 0 implies (f 0 ,f) = 0, which contradicts the condition (f 0 , f) 5^ 0 of Theorem /). If rz 9* 0, then fj > 0. But by 5 there exists a 2 such that Hence which contradicts the equality above that
4. Duality with nonlinear equality constraints The following strict converse duality results follow respectively from Theorem 10.3.1 and Corollary 10.3.2 by replacing the equality constraints h(x) = 0 by h(x) £ 0 and -h(x) ^ 0. Strict converse duality theorem Let X° be an open set in Rn, and let 6, g, and h be respectively a numerical function, an m-dimensional vector function, and a k-dimensional vector function, all defined and differentiable on X°. Let (x,u,v) be a (global) 174
Optimallty and Duality with Nonlinear Equalities
11.4
solution of the dual problem
or a local solution of the dual problem, that is,
where BS(X,U,&) is an open ball in Rn X Rm X Rk with radius 8 > 0 around (£,w,$). Let 6 be pseudoconvex at x, let g be quasiconvex at x, and let h be both quasiconvex and quasiconcave at x. If either (i) (ii)
\l/(x,u,6) is twice continuously differentiate at x and the n X n Hessian matrix Vxz\l/(x,u,D) is nonsingular, or there exists an open set A in Rm X Rk containing (u,v} and an n-dimensional differentiate vector function e on A such that
and
then A solves the minimization problem
and
If x solves the minimization pro blem 3, it does not follow that x and some u and v solve the dual problem 2 (unless X° is convex, 6 and g are convex on X°, and h is linear on Rn), nor does it follow that &(x) ^ \l/(£,u,v) for any dual feasible point (£,#,£), that is, (£,u,v) £ Y. 175
11.4
Nonlinear Programming
Corollary Let all the assumptions of Theorem 1 above hold, except that 6 need not be pseudoconvex at x, nor g quasiconvex at x, nor h both quasiconvex and quasiconcave at x. If (x,u,v) is a local solution of the dual problem 2, then (£,ufi} is a Kuhn-Tucker point, that is,
176
Appendix A Vectors and Matrices
We collect in this appendix some of the bare essentials of real linear algebra which are needed in this book. For more detail the reader is referred to [Gale 60, Halmos 58]. We assume here that the reader is familiar with the notation and definitions of Chap. 1.
1. Vectors Fundamental theorem of vector spaces Let each of the vectors y°, yl, . . . , ym in Rn be a linear combination of the vectors xl, xz, . . . , xm in Rn; then y°, yl, . . . , ym are linearly dependent.
PROOF The proof will be by induction on m. If m = 1, then
then y° = yl = 0, and y° and yl are linearly dependent. If not, then po T* 0, say. Then and T/° and y1 are linearly dependent because p0 ^ 0. Now assume that the theorem holds for TO = k — 1, and we will show that it also holds for m = k. By hypothesis we have that
If all pi' are zero then all y3' are zero and hence linearly dependent. Assume now that at least one p*' is not zero, say pi°. Define
A.I
Nonlinear Programming
Then each of the k vectors zj is a linear combination of the k — 1 vectors x*, . . . , xk, and hence by the induction hypothesis, the zj are linearly dependent, that is, there exist numbers qi, . . . , qk, not all zero, such that
which shows that the y' are linearly dependent. Corollary Any n + 1 vectors in Rn are linearly dependent. PROOF Let e ' b e the vector in Rn with zero for each element except the ith, which is 1. Then any vector x in Rn is a linear combination of the ei for x = V xte{. The corollary then follows from Theorem 1 above. Corollary Any m vectors in Rn are linearly dependent if m > n. PROOF By Corollary 2 any n + 1 vectors of the m vectors are linearly dependent, hence the m vectors are linearly dependent. Corollary Each system of n homogeneous linear equations in m unknowns, m > n, has a nonzero solution. PROOF Consider the matrix equation Ax = 0 where A is any n X m matrix. The m columns Aj of A are vectors in Rn and hence by Corollary 3 above are linearly dependent. Hence there exists an x ^ 0 such that Ax = 0. Basis of a subset of Rn Let S be a subset of Rn, and let r be the maximum number of linearly independent vectors which can be chosen from S. Any set of r linearly independent vectors of S is called a basis for S. 178
A.2
Appendix A
Basis theorem The linearly independent vectors xl, . . . , xr are a basis for a subset n S of R if and only if every vector y in S is a linear combination of the x\ PROOF [Gale 60] Suppose every y in S is a linear combination of the x\ Then S contains no larger set of linearly independent vectors, for any set of more than r vectors must be dependent since they are combinations of the x\ Therefore r is the maximum number of linearly independent vectors which can be chosen from S, and hence the xi are a basis for S. Conversely, suppose that the xi are a basis for S. Then by definition, r is the number of vectors in the largest set of linearly independent vectors that can be found in S. So if y is in S, then x1, . . . , xr, y are linearly dependent, that is,
and po T^ 0, for otherwise the x{ would be linearly dependent.
Hence
which is the desired linear combination.
2.
Matrices
Row and column rank of matrix The maximum number of linearly independent rows of a matrix is called the row rank of the matrix, and the maximum number of linearly independent columns of a matrix is called the column rank of the matrix. Rank theorem For any matrix A, the row rank and column rank are equal. PROOF [Gale 60] Let r be the row rank of A, s its column rank, and suppose that r < s. Choose a row basis for A, which we may assume consists of the rows AI, . . . , Ar (renumbering rows and columns of A clearly does not affect its row or column rank), and a column basis, which we assume consists of the columns A.i, . . . , A.,. Let
179
A.a
Nonlinear Programming:
and note that the equations are r equations in s (> r) unknowns and hence have a nonzero solution y by Corollary A.I4- Also since AI, . . . , Ar are a row basis, it follows 1
from the basis theorem A.1.6 that for all k, Ak — Y pikAi for some numbers pik. Hence
and for all k
which is equivalent to
This shows that the columns A.I, . . . , A.t of A are linearly dependent, which contradicts the assumption that they were a basis. This contradiction shows that r ^ s. The same argument applied to the transpose of A shows that s ^ r. Hence r = s. Rank of a matrix The rank of a matrix is the column or row rank of the matrix (which are equal by the above theorem). Corollary Let A be an r X n matrix with rank r (r ^ ri). Then for any b in Rr, the system Ax = b has a solution x in R". PROOF By Theorem 2 above, the column rank of A is r, and thus if A.i, . . . , A.r, say, are a column basis, we have r linearly independent vectors in Rr, and the vector b £ Rr can be expressed as a linear combination of them, that is,
By defining Xj• = 0 for.; = r + 1, . . . , n, we have that 6 = Ax. 180
Appendix A
A.2
Nonsingular matrix An n X n matrix is said to be nonsingular if it has rank n. Semidefinite matrix An n X n matrix A is said to be positive semidefinite if xAx ^ 0 for all x in Rn and negative semidefinite if xAx ^ 0 for all x in Rn. Definite matrix An n X n matrix A is said to be positive definite if and negative definite if Obviously the negative of a positive semidefinite (definite) matrix is a negative semidefinite (definite) matrix and conversely. Also, each positive (negative) definite matrix is also positive (negative) semidefinite. Proposition Each positive or negative definite matrix is nonsingular. PROOF Let A be an n X n positive or negative definite matrix. If A is singular, then its rank is less than n, and there exists an x G R", x -^ 0, such that Ax = 0. Hence xAx = 0 for some x ?** 0, which contradicts the assumption that A is positive or negative definite. Hence A is nonsingular.
181
Appendix B Resume' of Some Topological Properties ofR n
We collect here some well-known topological properties of R" which can be found in greater detail in any modern book on real analysis or general topology [Buck 65, Fleming 65, Rudin 64, Berge 63, Simmons 63J.
1.
Open and closed sets Open ball
Given a point x° £ Rn and a real number X > 0, the set
is called an open ball around x° with radius X. (Sometimes the subscript X from B^(x°) will be dropped for convenience.) Interior point A point x is said to be an interior point of the set T C Rn if there exists a number e > 0 such that Bt(x) C T. Point of closure A point x is said to be a point of closure of the set F C Rn if for each number e > 0, Note that a point of closure of T need not be in T. For example 0 E R is a point of closure of the infinite set T = {1,H>M> • • • } > but 0 £ T. On the other hand, every point in the set F is also a point of closure of T. In other words, a point of closure x° of a set T is a point such that there exist points in T that are arbitrarily close to it (closeness being measured by the distance 5(x,x°) = \\x — x°\\, see 1.8.9).
Appendix B
B.I
Open set A set T C Rn such that every point of F is an interior point is said to be open. Closed set A set T C Rn such that every point of closure of F is in F is said to be closed. Closure of a set The closure T of a set T C Rn is the set of points of closure of F. Obviously F C T, and for a closed set F = T. Interior of a set The interior int (F) of a set F C Rn is the set of interior points of F. Obviously int (F) C F, and for an open set F = int (F). Relatively open (closed) sets Let F and A be two sets such that F C A C Rn- F is said to be open (closed) relative to A if F = A C\ Q, where 0 is some open (closed) set in Rn. Obviously an open (closed) set F in Rn is open (closed) relative to n R . If F C A, and if F is open (closed), then F is open (closed) relative to A for F = A r> F. Problem Show that: (i) Every open ball B\(a) in Rn is an open set. (ii) The closure B*(a) of an open ball B^(a) in Rn is and is closed. (The closure of an open ball is called a closed ball and is denoted by -Bx(a).) (iii) The interior of a closed ball #x(a) is the open ball B\(a). Theorem The family of open sets in Rn has the following properties: (i) Every union of open sets is open. (ii) Every finite intersection of open sets is open. (iii) The empty set 0 and Rn are open. us
B.I
Nonlinear Programming
PROOF (i) Let (I\-),-e/ be a family, finite or infinite, of open sets in Rn. If x £ r = U r,, then x £ I\ for some i £ /, and there exists an « > 0 such that
B.(x) C r, C r Hence £ is an interior point of T, and T is open. (ii) Let r,6/ be a finite family of open sets in Rn. If x £ F = O I\, then a: £ I\ for each i £ /. Because each F, is open, there exist c* > 0 such that Bt<(x) C I\
for V* £ 7
Take e = minimum e' > 0 (this is where the finiteness of / is used), then
B.(X) c r and F is open. (iii) Since the empty set 0 contains no points, we need not find an open ball surrounding any point, and hence 0 is open. The set Rn is open because for each x £ Rn, Bt(x) C Rn for all e > 0. The family of open sets in Rn, as defined in 4> is called a topology n in R . (In fact any family of sets—called open—which has properties (i), (ii), (iii) above is also called a topology in Rn. For example the sets 0, Rn also form a topology in Rn. We shall, however, be concerned here only with open sets as defined by 40 Theorem Let T C Rn-
PROOF
Then
Let x £ Rn.
Then
Hence
Corollary The complement (relative to Rn) of an open set in Rn is closed, and vice versa. 184
B.«
Appendix B
PROOF
T is closed *=> T = T
By using Corollary 12, the following theorem is a direct consequence of Theorem 10. Theorem The family of closed sets in Rn has the following properties: (i) (ii) (iii)
Every intersection of closed sets is closed. Every finite union of closed sets is closed. The empty set 0 and R" are closed. Problem Show that the interior of each set F C Rn is an open set. Problem Show that the closure of each set F C Rn is a closed set. Problem Let A be a fixed m X n matrix and b a fixed m-vector.
(i) (ii) (iii)
Show that the set {x \ x G Rn, Ax ^ b}, and hence also the set {a: | x G Rn, Ax = 6} are closed sets in Rn. Show that the set {x \ x G Rn, Ax < b} is an open set in Rn. Show that the set {x \ x £j Rn, Ax < b} is neither a closed nor an open set in Rn.
2. Sequences and bounds Sequence A sequence in a set X is a function / from the set / of all positive integers into the set X. If /(n) = xn G X for n G I, it is customary to denote the sequence / by the symbol [ x n \ or by x1, z2, . . . . For any sequence of positive integers n1, n2, . . . , such that n1 < n 2 < • • • , the sequence z"1, z"2, . . . is called a subsequence of x1, x2, . . . . If X = Rn, then x1, z2, . . . , is a sequence of points in Rn, and if X = R, then z1, z2, . . . , is the familiar sequence of real numbers. 18*
B.a
Nonlinear Programming
Limit point Let x1, x2, ..., be a sequence of points in Rn. A point x£ Rn is said to be a limit point of the sequence if
(In the literature, a limit point is sometimes called an accumulation point or a cluster point.} Limit Let x1, x2, . . . , be a sequence of points in Rn. A point x° G Rn is said to be a limit of the sequence, or we say that the sequence converges or tends to a limit x° if
Remark
Obviously a. limit of a sequence is also a limit point of the sequence, but the converse is not necessarily true. (For example the sequence 1, — 1, 1, — 1, . . . , has the limit points 1 and —1, but no limit. The sequence 1, 3^j, %, • • • > has a limit, and hence a limit point, zero.) Also note that if x° is a limit of xl, x2, . . . , then it is also a limit of each subsequence of x1, a;2, . . . . Theorem (i) // x is a limit point of the sequence x1, x2, . . . , then there exists a subsequence xn\ xn\ . . . , which has x as a limit, and conversely. (ii) // a sequence x1, x2, . . . , tends to a limit x°, then there can be no limit point (and hence no limit) other than x°. PROOF
(i) Let x be a limit point of x1, re2, . . . .
Then
and so on. The subsequence xn\ xn\ . . . , converges to x. 186
Appendix B
B.3
Conversely, if the subsequence xn\ xn\ . . . , converges to x, then for given e > 0 and n there exists an n* ^ n (in fact all n{ ^ n) such that ||xn< — x|| < t, and hence x is a limit point of xl, a;2, . . . . (ii) Let the sequence x1, x2, . . . , tend to the limit x°, and x ?* x°. Then for 0 < e < ||x — x°||, there exists an integer n° such that (by triangle inequality, see 1.3.10) and hence x is not a limit point of the sequence. Remark Note that a limit point (and hence a limit) of a sequence x1, x2, . . . , in Rn is a point of closure of any set F in Rn containing {x^x2, . . . }. Conversely, if x is a point of closure of the set F C Rn, then there exists a sequence x1, x2, . . . C F (and hence also a subsequence x"1, x"2, . . . ) such that x is a limit point of x1, x2, . . . (and hence a limit of l n2 , . . . )"» . T n , r4 X
Lower and upper bounds Let F be a nonempty set of real numbers.
Then
a is a lower bound of F <=> (x (E F => x ^ a) /3 is an upper bound of F <=» (x G F => x ^ /3) Greatest lower and least upper bounds Let F be a nonempty set of real numbers. A lower bound a is a greatest lower bound of F (infimum of F, or inf F) if no bigger number is a lower bound of F. An upper bound /3 is a least upper bound of F (supremum of F, or sup F) if no smaller number is an upper bound of F. Equivalently we have
We shall take the following axiom as one of the axioms of the real number system [Birkhoff-Maclane 53]. 1ST
B.S
Nonlinear Programming
Axiom Any nonempty set T of real numbers which has a lower (upper) bound has a greatest (least) lower (upper) bound. If the set F has no infimum (or equivalently by the above axiom if it has no lower bound), we say that F is unbounded from below and we write inf F = — «. Similarly if F has no supremum (or equivalently by the above axiom if it has no upper bound), we say T is unbounded from above and we write sup F = + ». Hence by augmenting the Euclidean line R by the two points + °° and — °° any nonempty set F will have an infimum, which may be — °°, and a supremum, which may be + °o. We shall follow the convention of writing inf 0 = -f » and sup 0 = — °°. We observe that neither inf F nor sup F need be in F. For example inf {iy2,lA, . . .} is 0, but 0 is not in the set |1,H,H» • • • } • Theorem Every bounded nondecreasing (nonincreasing) sequence of real numbers has a limit. PROOF Let xl, x2, . . . , be a bounded nondecreasing sequence of real numbers. By the above Axiom 9, the sequence has a least upper bound 0. Hence
(by 8) (because sequence is nondecreasing) (because 0 ^ xn for all n) The proof for a bounded nonincreasing sequence is similar. Cauchy convergence criterion A sequence xl, x2, . . . , in Rn converges to a limit XQ if and only if it is a Cauchy sequence, that is, for each e > 0 there exists an n* such that \\xm — xn\\ < efor each m,n ^ n*. For a proof see [Buck 65, Fleming 65, Rudiri 64].
3.
Compact sets in Rn
Bounded set A set F C Rn is bounded if there exists a real number a such that for each x £ F, ||x|| ^ a. 188
Appendix B
B.S
A set F C Rn which is both closed and bounded has some very interesting properties. In fact such sets constitute one of the most important classes of sets in analysis, the class of compact sets. We give in the theorem below a number of equivalent characterizations of a compact set in Rn. (These characterizations are not necessarily equivalent in spaces more general than Rn.) Compact sets (theorem-definition) A set T C Rn is said to be compact if it satisfies any of the following equivalent conditions: (i) (ii) (iii)
T is closed and bounded. (Bolzano-Weierstrass) Every sequence of points in T has a limit point in T. (Finite intersection property) For any family (r,)ier of sets, closed relative to T, it follows that
or equivalently
(iv)
(Heine-Borel) From every family (I\)t-e/ of open sets whose union VJ I\ contains T we can extract a finite subfamily T tl , . . . , Tim
te/
whose union F,, \J F,2 • • • W I\m contains T (or equivalently stated: every open covering of T has a finite subcovering).
We shall not give here a proof of the equivalence of the above four conditions; such a proof can be found in any of the references given at the beginning of this Appendix. An especially lucid proof is also given in chap. 2 of [Berge-Ghouila Houri 65]. Corollary Let T and A be sets in Rn which are respectively compact and closed. Then the sum ft is closed. PROOF Let x° belong to the closure of Q. Then there exists a sequence xl, x2, . . . , in 8 which converges to x° (see B.2.6). Then we can find 189
Nonlinear Programming
B.3
a sequence y1, yz, . . . , in T and a sequence 21, 22, . . . , in A such that z n = yn + 2"
forn = 1, 2, . . .
Since T is compact, there exists a subsequence i/"1, ynt, . . . , which converges to y° £ T. Then
Hence and ft is closed.
1*0
Appendix C Continuous and Semicontinuous Functions, Minima and Infima
We collect in this appendix some basic definitions and properties of continuous and semicontinuous functions defined on a subset of Rn. We also state some facts about the minima and maxima of such functions. More detailed discussions can be found in [Berge 63, BergeGhouila Houri 65.]
1. Continuous and semicontinuous functions Continuous function A numerical function 8 defined on a set r C Rn is said to be continuous at x° G r (with respect to T) if either of the two following equivalent conditions are satisfied: (i)
For each e > 0 there exists a 5 > 0 such that
(ii)
For each sequence xl, x*, . . . , in T converging to x°, lim 6(xn) = 0(lim xn) = d(x°)
n—»«o
n—»«
6 is said to be continuous on T (with respect to F) if it is continuous (with respect to F) at each point x° £ F, or equivalently if any of the following equivalent conditions hold: (iii) The sets
and
(iv)
are closed relative to F for each real a. The sets
C.I
Nonlinear Programming
and
(v)
are open relative to F for each real a. The epigraph of 6 and the hypograph of 6 are closed relative to F X R.
In each of the above conditions a pair of symmetric requirements can be found. By dropping one requirement from each pair, we arrive at the concept of semicontinuous functions. Lower semicontinuous function A numerical function 0 defined on a set F C Rn is said to be lower semicontinuous at x° G F (with respect to F) if either of the two following equivalent conditions is satisfied : (i)
For each e > 0 there exists a 5 > 0 such that
(ii)
For each sequence x1, x2, . . . , in F converging to x°,
where lim inf 8(x") denotes the infimum of the limit points of the n—+»
sequence of real numbers 8(x1}i 0(z2), . . . . 6 is said to be lower semicontinuous on T (with respect to F) if it is lower semicontinuous (with respect to F) at each point x° G F, or equivalently if any of the following equivalent conditions holds:
(iii)
The set
(iv)
is closed relative to F for each real a. The set
198
C.I
Appendix C
is open relative to F for each real a. (v)
The epigraph of 6 is closed relative to I' X R.
Upper semicontinuous function  A numerical function θ defined on a set Γ ⊂ Rⁿ is said to be upper semicontinuous at x⁰ ∈ Γ (with respect to Γ) if either of the two following equivalent conditions is satisfied:
(i) For each ε > 0 there exists a δ > 0 such that
    x ∈ Γ, ‖x − x⁰‖ < δ  ⟹  θ(x) < θ(x⁰) + ε
(ii) For each sequence x¹, x², . . . , in Γ converging to x⁰,
    lim sup_{n→∞} θ(xⁿ) ≤ θ(x⁰)
where lim sup_{n→∞} θ(xⁿ) denotes the supremum of the limit points of the sequence of real numbers θ(x¹), θ(x²), . . . .
θ is said to be upper semicontinuous on Γ (with respect to Γ) if it is upper semicontinuous (with respect to Γ) at each point x⁰ ∈ Γ, or equivalently if any of the following equivalent conditions holds:
(iii) The set
    {x : x ∈ Γ, θ(x) ≥ α}
is closed relative to Γ for each real α.
(iv) The set
    {x : x ∈ Γ, θ(x) < α}
is open relative to Γ for each real α.
(v) The hypograph of θ is closed relative to Γ × R.
Remark  θ is lower semicontinuous at x⁰ ∈ Γ (with respect to Γ) if and only
if −θ is upper semicontinuous at x⁰ ∈ Γ (with respect to Γ). θ is continuous at x⁰ ∈ Γ (with respect to Γ) if and only if it is both lower semicontinuous and upper semicontinuous at x⁰ ∈ Γ (with respect to Γ).
Examples  The function of Fig. C.1.1 is lower semicontinuous on R; the function of Fig. C.1.2 is upper semicontinuous on R.
Fig. C.1.1  A lower semicontinuous function on R.
Fig. C.1.2  An upper semicontinuous function on R.
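In the spirit of those figures, a minimal concrete pair (an illustration; the figures' own formulas are not reproduced here):
    θ(x) = 0 for x ≤ 0,  θ(x) = 1 for x > 0   is lower semicontinuous on R;
    φ(x) = 0 for x < 0,  φ(x) = 1 for x ≥ 0   is upper semicontinuous on R.
At x⁰ = 0 one checks condition (ii) of each definition: lim inf θ(xⁿ) ≥ 0 = θ(0) for every sequence xⁿ → 0, while lim sup φ(xⁿ) ≤ 1 = φ(0).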
Theorem  Let (θᵢ)_{i∈I} be a (finite or infinite) family of lower semicontinuous functions on Γ ⊂ Rⁿ. Its least upper bound
    θ(x) = sup_{i∈I} θᵢ(x)
is lower semicontinuous on Γ. If I is finite, then its greatest lower bound
    φ(x) = inf_{i∈I} θᵢ(x)
is also lower semicontinuous on Γ.
PROOF  The first part follows directly from (iii) above and Theorem B.1.3(i) if we observe that for any real λ the set
    {x : x ∈ Γ, θ(x) ≤ λ} = ⋂_{i∈I} {x : x ∈ Γ, θᵢ(x) ≤ λ}
is closed (relative to Γ). The second part follows from (iii) and Theorem B.1.3(ii) if we observe that for any real λ the set
    {x : x ∈ Γ, φ(x) ≤ λ} = ⋃_{i∈I} {x : x ∈ Γ, θᵢ(x) ≤ λ}
is closed (relative to Γ), a finite union of closed sets being closed.
Corollary  Let (θᵢ)_{i∈I} be a (finite or infinite) family of upper semicontinuous functions on Γ ⊂ Rⁿ. Its greatest lower bound
    φ(x) = inf_{i∈I} θᵢ(x)
is upper semicontinuous on Γ. If I is finite, then its least upper bound
    θ(x) = sup_{i∈I} θᵢ(x)
is also upper semicontinuous on Γ.
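Both assertions can be checked on standard one-dimensional illustrations (not from the text):
    θₙ(x) = min(1, n|x|), n = 1, 2, . . . , are continuous on R, and sup_n θₙ equals 0 at x = 0 and 1 elsewhere, a lower semicontinuous but discontinuous function;
    ψₙ(x) = max(0, 1 − n|x|) are continuous on R, yet inf_n ψₙ equals 1 at x = 0 and 0 elsewhere, which is not lower semicontinuous; hence the finiteness of I cannot be dropped in the second part.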
2. Infimum (supremum) and minimum (maximum) of a set of real numbers
We recall that in Appendix B (B.2.8) we defined the infimum α and the supremum β of a set of real numbers Γ as follows:
    α = inf Γ  ⟺  α ≤ x for each x ∈ Γ, and for each ε > 0 there exists an x ∈ Γ such that x < α + ε
and
    β = sup Γ  ⟺  β ≥ x for each x ∈ Γ, and for each ε > 0 there exists an x ∈ Γ such that x > β − ε
We also noted that neither α nor β need be in Γ. However when α and β are in Γ, they are called respectively min Γ and max Γ.
Minimum (maximum)  Let Γ be a set of real numbers. If (inf Γ) ∈ Γ, then inf Γ is called the minimum of Γ and is denoted by min Γ. If (sup Γ) ∈ Γ, then sup Γ is called the maximum of Γ and is denoted by max Γ. Equivalently
    min Γ = inf Γ whenever inf Γ ∈ Γ,  and  max Γ = sup Γ whenever sup Γ ∈ Γ
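A small illustration (not from the text): for
    Γ = {1, 1/2, 1/3, . . .}
we have sup Γ = 1 ∈ Γ, so max Γ = 1; but inf Γ = 0 ∉ Γ, so Γ has no minimum.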
3. Infimum (supremum) and minimum (maximum) of a numerical function
Bounded functions  A numerical function θ defined on the set Γ is said to be bounded from below on Γ if there exists a number α such that
    θ(x) ≥ α  for each x ∈ Γ
The number α is a lower bound of θ on Γ. θ is said to be bounded from above on Γ if there exists a number β such that
    θ(x) ≤ β  for each x ∈ Γ
The number β is an upper bound of θ on Γ.
Infimum of a numerical function  Let θ be a numerical function defined on the set Γ. If there is a number α such that
    θ(x) ≥ α  for each x ∈ Γ
and
    for each ε > 0 there exists an x ∈ Γ such that θ(x) < α + ε
then α is called the infimum of θ on Γ, and we write
    α = inf_{x∈Γ} θ(x)
Supremum of a numerical function  Let θ be a numerical function defined on the set Γ. If there is a number β such that
    θ(x) ≤ β  for each x ∈ Γ
and
    for each ε > 0 there exists an x ∈ Γ such that θ(x) > β − ε
then β is called the supremum of θ on Γ, and we write
    β = sup_{x∈Γ} θ(x)
If we admit the points ±∞, then every numerical function θ has a supremum and an infimum on the set Γ on which it is defined.
Examples
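For instance (illustrations consistent with the remarks below; the text's own worked examples are not reproduced):
    inf_{x∈R} e^{−x} = 0 and, admitting ±∞, sup_{x∈R} e^{−x} = +∞, neither attained;
    inf_{0<x<1} x = 0 and sup_{0<x<1} x = 1, again neither attained.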
Minimum of a numerical function  Let θ be a numerical function defined on the set Γ. If there exists an x̄ ∈ Γ such that
    θ(x̄) ≤ θ(x)  for each x ∈ Γ
then θ(x̄) is called the minimum of θ on Γ, and we write
    θ(x̄) = min_{x∈Γ} θ(x)
We make the following remarks:
(i) Not every numerical function has a minimum; for example, e^{−x} and x have no minima on R.
(ii) The minimum of a numerical function, if it exists, must be finite.
(iii) The minimum of a numerical function, if it exists, is an attained infimum, that is,
    min_{x∈Γ} θ(x) = θ(x̄) = inf_{x∈Γ} θ(x)  for some x̄ ∈ Γ
Maximum of a numerical function  Let θ be a numerical function defined on the set Γ. If there exists an x̄ ∈ Γ such that
    θ(x̄) ≥ θ(x)  for each x ∈ Γ
then θ(x̄) is called the maximum of θ on Γ, and we write
    θ(x̄) = max_{x∈Γ} θ(x)
Remarks similar to (i), (ii), and (iii) above, which applied to the minimum of a function, also apply here to the maximum of a function.
4. Existence of a minimum and a maximum of a numerical function
We give now sufficient conditions for a numerical function defined on a subset of Rⁿ to have a minimum and a maximum on that subset.
Theorem  A lower (upper) semicontinuous function θ defined on a compact (that is, closed and bounded) set Γ in Rⁿ is bounded from below (above) and attains in Γ the value
    inf_{x∈Γ} θ(x)   (sup_{x∈Γ} θ(x))
In other words there exists an x̄ ∈ Γ such that
    θ(x̄) = min_{x∈Γ} θ(x) = inf_{x∈Γ} θ(x)   (θ(x̄) = max_{x∈Γ} θ(x) = sup_{x∈Γ} θ(x))
PROOF  We prove the lower semicontinuous case. Let α = inf_{x∈Γ} θ(x). For each γ > α the set
    Γ_γ = {x : x ∈ Γ, θ(x) ≤ γ}
is not empty and is closed relative to Γ by C.1.2(iii). Hence by the finite-intersection theorem B.3.2(iii)
    ⋂_{γ>α} Γ_γ ≠ ∅
Choose x̄ in this intersection; then θ(x̄) ≤ γ for every γ > α, so θ(x̄) = α, and hence α is finite.
It should be remarked here that the above theorem cannot be strengthened by dropping the semicontinuity or the compactness assumptions. In other words, in order to ensure that
    inf_{x∈Γ} θ(x) = min_{x∈Γ} θ(x)
we cannot weaken the conditions that
(i) θ is lower semicontinuous on Γ,
(ii) Γ is closed, and
(iii) Γ is bounded.
We give examples below where the infimum is not attained whenever any one of the above conditions is violated.
(i) inf_{0≤x≤1} θ(x) = 0, but no minimum exists of this non-lower-semicontinuous function on the compact set {x : 0 ≤ x ≤ 1}.
(ii) inf_{0<x<1} θ(x) = 0, but no minimum exists of this continuous function on the open set {x : 0 < x < 1}.
(iii) inf_{x∈R} θ(x) = 0, but no minimum exists of this continuous function on the closed unbounded set R.
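Concrete functions fitting (i) through (iii) (standard illustrations, not necessarily those of the text):
    (i)  θ(x) = x for 0 < x ≤ 1, θ(0) = 1, on Γ = {x : 0 ≤ x ≤ 1}: θ is not lower semicontinuous at 0, and inf θ = 0 is not attained;
    (ii)  θ(x) = x on Γ = {x : 0 < x < 1}: inf θ = 0 is not attained on this non-closed set;
    (iii)  θ(x) = eˣ on Γ = R: inf θ = 0 is not attained on this unbounded set.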
Appendix D Differentiable Functions, Mean-value and Implicit Function Theorems
We summarize here the pertinent definitions and properties of differentiable functions and give the mean-value and implicit function theorems. For details the reader is referred to [Fleming 65, Rudin 64, Hestenes 66, Bartle 64].
1. Differentiable and twice-differentiable functions
Differentiable numerical function  Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. θ is said to be differentiable at x̄ if for all x ∈ Rⁿ such that x̄ + x ∈ Γ we have that
    θ(x̄ + x) = θ(x̄) + ∇θ(x̄)x + α(x̄, x)‖x‖
where ∇θ(x̄) is an n-dimensional bounded vector, and α is a numerical function of x such that
    lim_{x→0} α(x̄, x) = 0
θ is said to be differentiable on Γ if it is differentiable at each x̄ in Γ. [Obviously if θ is differentiable on the open set Γ, it is also differentiable on any subset Λ (open or not) of Γ. Hence when we say that θ is differentiable on some set Λ (open or not), we shall mean that θ is differentiable on some open set containing Λ.]
Partial derivatives and gradient of a numerical function  Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. θ is said to have a partial derivative at x̄ with respect to xᵢ, i = 1, . . . , n, if
    [θ(x̄ + δeᵢ) − θ(x̄)] / δ   (eᵢ = the ith unit vector of Rⁿ)
tends to a finite limit when δ tends to zero. This limit is called the partial derivative of θ with respect to xᵢ at x̄ and is denoted by ∂θ(x̄)/∂xᵢ. The n-dimensional vector of the partial derivatives of θ with respect to x₁, . . . , xₙ at x̄ is called the gradient of θ at x̄ and is denoted by ∇θ(x̄), that is,
    ∇θ(x̄) = (∂θ(x̄)/∂x₁, . . . , ∂θ(x̄)/∂xₙ)
Theorem  Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ.
(i) If θ is differentiable at x̄, then θ is continuous at x̄, and ∇θ(x̄) exists (but not conversely), and the vector ∇θ(x̄) appearing in the differentiability definition above is precisely the gradient of θ at x̄.
(ii) If θ has continuous partial derivatives at x̄ with respect to x₁, . . . , xₙ, that is, ∇θ(x̄) exists and ∇θ is continuous at x̄, then θ is differentiable at x̄.
We can summarize most of the above results as follows
    ∇θ exists and is continuous at x̄  ⟹  θ differentiable at x̄  ⟹  θ continuous at x̄ and ∇θ(x̄) exists
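The phrase "but not conversely" in (i) can be seen on a standard illustration (not the text's): the function
    θ(x₁, x₂) = x₁x₂/(x₁² + x₂²) for (x₁, x₂) ≠ (0, 0),  θ(0, 0) = 0
has ∂θ(0,0)/∂x₁ = ∂θ(0,0)/∂x₂ = 0, so ∇θ(0,0) exists; yet θ(t, t) = 1/2 for every t ≠ 0, so θ is not even continuous at the origin, let alone differentiable there.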
Differentiable vector function  Let f be an m-dimensional vector function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. f is said to be differentiable at x̄ (respectively on Γ) if each of its components f₁, . . . , f_m is differentiable at x̄ (respectively on Γ).
Partial derivatives and Jacobian of a vector function  Let f be an m-dimensional vector function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. f is said to have partial derivatives at x̄ with respect to x₁, . . . , xₙ if each of its components f₁, . . . , f_m has partial derivatives at x̄ with respect to x₁, . . . , xₙ. We write
    ∇f(x̄) = [∂fᵢ(x̄)/∂xⱼ]   i = 1, . . . , m,  j = 1, . . . , n
The m × n matrix ∇f(x̄) is called the Jacobian (matrix) of f at x̄.
Chain rule theorem  Let f be an m-dimensional vector function defined on an open set Γ in Rⁿ, and let ψ be a numerical function which is defined on Rᵐ. The numerical function θ defined on Γ by θ(x) = ψ[f(x)] is differentiable at x̄ ∈ Γ if f is differentiable at x̄ and if ψ is differentiable at ȳ = f(x̄), and
    ∇θ(x̄) = ∇ψ(ȳ)∇f(x̄)
Twice-differentiable numerical function and its Hessian  Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. θ is said to be twice differentiable at x̄ if for all x ∈ Rⁿ such that x̄ + x ∈ Γ we have that
    θ(x̄ + x) = θ(x̄) + ∇θ(x̄)x + ½x∇²θ(x̄)x + β(x̄, x)‖x‖²
where ∇²θ(x̄) is an n × n matrix of bounded elements, and β is a numerical function of x such that
    lim_{x→0} β(x̄, x) = 0
The n × n matrix ∇²θ(x̄) is called the Hessian (matrix) of θ at x̄, and its ijth element is written as
    ∂²θ(x̄)/∂xᵢ∂xⱼ   i, j = 1, . . . , n
Obviously if θ is twice differentiable at x̄, it must also be differentiable at x̄.
Theorem  Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. Then
(i) ∇θ differentiable at x̄  ⟹  θ twice differentiable at x̄
(ii) ∇θ has continuous partial derivatives at x̄  ⟹  θ twice differentiable at x̄
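A short worked instance of the chain rule (an illustration, not from the text): take ψ(y) = y₁² + y₂² on R² and f(x) = (x₁ + x₂, x₁x₂) on R². Then
    ∇ψ(y) = (2y₁, 2y₂),   ∇f(x) = [1  1; x₂  x₁]   (rows ∇f₁, ∇f₂)
so for θ(x) = ψ[f(x)] = (x₁ + x₂)² + x₁²x₂²,
    ∇θ(x) = ∇ψ(f(x))∇f(x) = (2(x₁ + x₂) + 2x₁x₂², 2(x₁ + x₂) + 2x₁²x₂)
which agrees with differentiating θ directly.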
Remark  The numbers ∂θ(x̄)/∂xᵢ, i = 1, . . . , n, are also called the first partial derivatives of θ at x̄, and ∂²θ(x̄)/∂xᵢ∂xⱼ, i, j = 1, . . . , n, are also called the second partial derivatives of θ at x̄. In an analogous way we can define kth partial derivatives of θ at x̄.
Remark  Let θ be a numerical function defined on an open set Γ ⊂ Rⁿ × Rᵏ which is differentiable at (x̄, ȳ) ∈ Γ. We define then
    ∇ₓθ(x̄, ȳ) = (∂θ(x̄, ȳ)/∂x₁, . . . , ∂θ(x̄, ȳ)/∂xₙ)
and
    ∇ᵧθ(x̄, ȳ) = (∂θ(x̄, ȳ)/∂y₁, . . . , ∂θ(x̄, ȳ)/∂yₖ)
Let f be an m-dimensional function defined on an open set Γ ⊂ Rⁿ × Rᵏ which is differentiable at (x̄, ȳ) ∈ Γ. We define then
    ∇ₓf(x̄, ȳ) = [∂fᵢ(x̄, ȳ)/∂xⱼ]   i = 1, . . . , m,  j = 1, . . . , n
and
    ∇ᵧf(x̄, ȳ) = [∂fᵢ(x̄, ȳ)/∂yⱼ]   i = 1, . . . , m,  j = 1, . . . , k
2. Mean-value theorem and Taylor's theorem
Mean-value theorem  Let θ be a differentiable numerical function on the open convex set Γ ⊂ Rⁿ, and let x¹, x² ∈ Γ. Then
    θ(x²) = θ(x¹) + ∇θ[x¹ + γ(x² − x¹)](x² − x¹)
for some real number γ, 0 < γ < 1.
Taylor's theorem (second order)  Let θ be a twice-differentiable numerical function on the open convex set Γ ⊂ Rⁿ, and let x¹, x² ∈ Γ. Then
    θ(x²) = θ(x¹) + ∇θ(x¹)(x² − x¹) + ½(x² − x¹)∇²θ[x¹ + γ(x² − x¹)](x² − x¹)
for some real number γ, 0 < γ < 1.
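To see the mean-value theorem in action on a one-variable illustration (not from the text): with θ(x) = x³, x¹ = 0, x² = 1,
    θ(1) − θ(0) = 1 = ∇θ(γ)(1 − 0) = 3γ²
so γ = 1/√3 ≈ 0.577, which indeed lies in (0, 1).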
3. Implicit function theorem
Implicit function theorem  Let f be an m-dimensional vector function defined on an open set Λ ⊂ Rⁿ × Rᵐ, let f have continuous first partial derivatives at (x̄, ȳ) ∈ Λ, let f(x̄, ȳ) = 0, and let ∇ᵧf(x̄, ȳ) be nonsingular. Then there exist a ball B_λ(x̄, ȳ) with radius λ > 0 in Rⁿ⁺ᵐ, an open set Γ ⊂ Rⁿ containing x̄, and an m-dimensional vector function e with continuous first partial derivatives on Γ such that
    ∇ᵧf(x, y) is nonsingular for all (x, y) ∈ B_λ(x̄, ȳ)
and
    ȳ = e(x̄)  and  f[x, e(x)] = 0 for all x ∈ Γ
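A familiar illustration of the theorem (not from the text): take f(x, y) = x² + y² − 1 on R × R with (x̄, ȳ) = (0, 1). Here f(0, 1) = 0 and ∇ᵧf(0, 1) = 2ȳ = 2 ≠ 0, and on Γ = {x : −1 < x < 1} the function e(x) = (1 − x²)^{1/2} satisfies e(0) = 1 and f[x, e(x)] = 0, as the theorem asserts.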
References
Abadie, J.: On the Kuhn-Tucker Theorem, in J. Abadie (ed.), "Nonlinear Programming," pp. 21-36, North Holland Publishing Company, Amsterdam, 1967.
Anderson, K. W., and D. W. Hall: "Sets, Sequences and Mappings, the Basic Concepts of Analysis," John Wiley & Sons, Inc., New York, 1963.
Arrow, K. J., and A. C. Enthoven: Quasiconcave Programming, Econometrica, 29: 779-800 (1961).
Arrow, K. J., L. Hurwicz, and H. Uzawa (eds.): "Studies in Linear and Nonlinear Programming," Stanford University Press, Stanford, Calif., 1958.
Arrow, K. J., L. Hurwicz, and H. Uzawa: Constraint Qualifications in Maximization Problems, Naval Research Logistics Quarterly, 8: 175-191 (1961).
Bartle, R. G.: "The Elements of Real Analysis," John Wiley & Sons, Inc., New York, 1964.
Berge, C.: "Topological Spaces," The MacMillan Company, New York, 1963.
Berge, C., and A. Ghouila Houri: "Programming, Games and Transportation Networks," John Wiley & Sons, Inc., New York, 1965.
Birkhoff, G., and S. MacLane: "A Survey of Modern Algebra," The MacMillan Company, New York, 1953.
Bohnenblust, H. F., S. Karlin, and L. S. Shapley: Solutions of Discrete, Two-person Games, in "Contributions to the Theory of Games," vol. I, Annals of Mathematics Studies Number 24, Princeton, 1950, pp. 51-72.
Bracken, J., and G. P. McCormick: "Selected Applications of Nonlinear Programming," John Wiley & Sons, Inc., New York, 1968.
Brondsted, A.: Conjugate Convex Functions in Topological Vector Spaces, Matematisk-fysiske Meddelelser udgivet af Det Kongelige Danske Videnskabernes Selskab, 34(2): 1-27 (1964).
Browder, F. E.: On the Unification of the Calculus of Variations and the Theory of Monotone Nonlinear Operators in Banach Spaces, Proc. Nat. Acad. Sci. U.S., 56: 419-425 (1966).
Buck, R. C.: "Advanced Calculus," McGraw-Hill Book Company, New York, 1965.
Canon, M., C. Cullum, and E. Polak: Constrained Minimization Problems in Finite-dimensional Spaces, Society for Industrial and Applied Mathematics Journal on Control, 4: 528-547 (1966).
Carathéodory, C.: Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen, Mathematische Annalen, 64: 95-115 (1907).
Charnes, A., and W. W. Cooper: "Management Models and Industrial Applications of Linear Programming," vols. I, II, John Wiley & Sons, Inc., New York, 1961.
Cottle, R. W.: Symmetric Dual Quadratic Programs, Quarterly of Applied Mathematics, 21: 237-243 (1963).
Courant, R.: "Differential and Integral Calculus," vol. II, 2d ed., rev., Interscience Publishers, New York, 1947.
Courant, R., and D. Hilbert: "Methods of Mathematical Physics," pp. 231-242, Interscience Publishers, New York, 1953.
Dantzig, G. B.: "Linear Programming and Extensions," Princeton University Press, Princeton, N.J., 1963.
Dantzig, G. B., E. Eisenberg, and R. W. Cottle: Symmetric Dual Nonlinear Programs, Pacific Journal of Mathematics, 15: 809-812 (1965).
Dennis, J. B.: "Mathematical Programming and Electrical Networks," John Wiley & Sons, Inc., New York, 1959.
Dieter, U.: Dualität bei konvexen Optimierungs- (Programmierungs-) Aufgaben, Unternehmensforschung, 9: 91-111 (1965a).
Dieter, U.: Dual Extremal Problems in Locally Convex Linear Spaces, Proceedings of the Colloquium on Convexity, Copenhagen, 52-57 (1965b).
Dieter, U.: Optimierungsaufgaben in topologischen Vektorräumen I: Dualitätstheorie, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 5: 89-117 (1966).
Dorn, W. S.: Duality in Quadratic Programming, Quarterly of Applied Mathematics, 18: 155-162 (1960).
Dorn, W. S.: Self-dual Quadratic Programs, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 9: 51-54 (1961).
Duffin, R. J.: Infinite Programs, in [Kuhn-Tucker 56], pp. 157-170.
Fan, K., I. Glicksburg, and A. J. Hoffman: Systems of Inequalities Involving Convex Functions, American Mathematical Society Proceedings, 8: 617-622 (1957).
Farkas, J.: Über die Theorie der einfachen Ungleichungen, Journal für die Reine und Angewandte Mathematik, 124: 1-24 (1902).
Fenchel, W.: "Convex Cones, Sets and Functions," Lecture notes, Princeton University, 1953, Armed Services Technical Information Agency, AD Number 22695.
Fiacco, A. V.: Second Order Sufficient Conditions for Weak and Strict Constrained Minima, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 16: 105-108 (1968).
Fiacco, A. V., and G. P. McCormick: "Nonlinear Programming: Sequential Unconstrained Minimization Techniques," John Wiley & Sons, Inc., New York, 1968.
Fleming, W. H.: "Functions of Several Variables," McGraw-Hill Book Company, New York, 1965.
Friedrichs, K. O.: Ein Verfahren der Variationsrechnung, das Minimum eines Integrals als das Maximum eines anderen Ausdruckes darzustellen, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 13-20 (1929).
Gale, D.: "The Theory of Linear Economic Models," McGraw-Hill Book Company, New York, 1960.
Gale, D., H. W. Kuhn, and A. W. Tucker: Linear Programming and the Theory of Games, in [Koopmans 51], pp. 317-329.
Gass, S.: "Linear Programming," 2d ed., McGraw-Hill Book Company, New York, 1964.
Gol'stein, E. G.: Dual Problems of Convex and Fractionally-convex Programming in Functional Spaces, Soviet Mathematics-Doklady (English translation), 8: 212-216 (1967).
Gordan, P.: Über die Auflösung linearer Gleichungen mit reellen Coefficienten, Mathematische Annalen, 6: 23-28 (1873).
Graves, R. L., and P. Wolfe (eds.): "Recent Advances in Mathematical Programming," McGraw-Hill Book Company, New York, 1963.
Hadley, G. F.: "Linear Programming," Addison-Wesley Publishing Company, Inc., Reading, Mass., 1962.
Hadley, G. F.: "Nonlinear and Dynamic Programming," Addison-Wesley Publishing Company, Inc., Reading, Mass., 1964.
Halkin, H.: A Maximum Principle of the Pontryagin Type for Systems Described by Nonlinear Difference Equations, Society for Industrial and Applied Mathematics Journal on Control, 4: 90-111 (1966).
Halkin, H., and L. W. Neustadt: General Necessary Conditions for Optimization Problems, Proc. Nat. Acad. Sci. U.S., 56: 1066-1071 (1966).
Halmos, P. R.: "Finite-dimensional Vector Spaces," D. Van Nostrand Company, Inc., Princeton, N.J., 1958.
Hamilton, N. T., and J. Landin: "Set Theory: The Structure of Arithmetic," Allyn and Bacon, Inc., Boston, 1961.
Hanson, M. A.: A Duality Theorem in Nonlinear Programming with Nonlinear Constraints, Australian Journal of Statistics, 3: 64-71 (1961).
Hanson, M. A.: Bounds for Functionally Convex Optimal Control Problems, Journal of Mathematical Analysis and Applications, 8: 84-89 (1964).
Hanson, M. A.: Duality for a Class of Infinite Programming Problems, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 16: 318-323 (1968).
Hanson, M. A., and B. Mond: Quadratic Programming in Complex Space, Journal of Mathematical Analysis and Applications, 23: 507-514 (1967).
Hestenes, M. R.: "Calculus of Variations and Optimal Control Theory," John Wiley & Sons, Inc., New York, 1966.
Hu, T. C.: "Integer Programming and Network Flows," Addison-Wesley Publishing Company, Inc., Reading, Mass., 1969.
Huard, P.: Dual Programs, IBM J. Res. Develop., 6: 137-139 (1962).
Hurwicz, L.: Programming in Linear Spaces, in [Arrow et al. 58], pp. 38-102.
Jacob, J.-P., and P. Rossi: "General Duality in Mathematical Programming," IBM Research Report, 1969.
Jensen, J. L. W. V.: Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica, 30: 175-193 (1906).
John, F.: Extremum Problems with Inequalities as Subsidiary Conditions, in K. O. Friedrichs, O. E. Neugebauer, and J. J. Stoker (eds.), "Studies and Essays: Courant Anniversary Volume," pp. 187-204, Interscience Publishers, New York, 1948.
Karamardian, S.: "Duality in Mathematical Programming," Operations Research Center, University of California, Berkeley, Report Number 66-2, 1966; Journal of Mathematical Analysis and Applications, 20: 344-358 (1967).
Karlin, S.: "Mathematical Methods and Theory in Games, Programming, and Economics," vols. I, II, Addison-Wesley Publishing Company, Inc., Reading, Mass., 1959.
Koopmans, T. C. (ed.): "Activity Analysis of Production and Allocation," John Wiley & Sons, Inc., New York, 1951.
Kowalik, J., and M. R. Osborne: "Methods for Unconstrained Optimization Problems," American Elsevier Publishing Company, Inc., New York, 1968.
Kretschmar, K. S.: Programmes in Paired Spaces, Canadian Journal of Mathematics, 13: 221-238 (1961).
Kuhn, H. W., and A. W. Tucker: Nonlinear Programming, in J. Neyman (ed.), "Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability," pp. 481-492, University of California Press, Berkeley, Calif., 1951.
Kuhn, H. W., and A. W. Tucker (eds.): "Linear Inequalities and Related Systems," Annals of Mathematics Studies Number 38, Princeton University Press, Princeton, N.J., 1956.
Künzi, H. P., W. Krelle, and W. Oettli: "Nonlinear Programming," Blaisdell Publishing Company, Waltham, Mass., 1966.
Larsen, A., and E. Polak: Some Sufficient Conditions for Continuous-linear-programming Problems, International Journal of Engineering Science, 4: 583-604 (1966).
Lax, P. D.: Reciprocal Extremal Problems in Function Theory, Communications on Pure and Applied Mathematics, 8: 437-453 (1955).
Levinson, N.: Linear Programming in Complex Space, Journal of Mathematical Analysis and Applications, 14: 44-62 (1966).
Levitin, E. S., and B. T. Polyak: Constrained Minimization Methods, USSR Computational Mathematics and Mathematical Physics (English translation), 6(5): 1-50 (1966).
McCormick, G. P.: Second Order Conditions for Constrained Minima, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 16: 641-652 (1967).
Mangasarian, O. L.: Duality in Nonlinear Programming, Quarterly of Applied Mathematics, 20: 300-302 (1962).
Mangasarian, O. L.: Pseudo-convex Functions, Society for Industrial and Applied Mathematics Journal on Control, 3: 281-290 (1965).
Mangasarian, O. L., and S. Fromovitz: A Maximum Principle in Mathematical Programming, in A. V. Balakrishnan and L. W. Neustadt (eds.), "Mathematical Theory of Control," pp. 85-95, Academic Press Inc., New York, 1967.
Mangasarian, O. L., and S. Fromovitz: The Fritz John Necessary Optimality Conditions in the Presence of Equality and Inequality Constraints, Journal of Mathematical Analysis and Applications, 17: 37-47 (1967a).
Mangasarian, O. L., and J. Ponstein: Minmax and Duality in Nonlinear Programming, Journal of Mathematical Analysis and Applications, 11: 504-518 (1965).
Martos, B.: The Direct Power of Adjacent Vertex Programming Methods, Management Science, 12: 241-252 (1965).
Minty, G. J.: On the Monotonicity of the Gradient of a Convex Function, Pacific Journal of Mathematics, 14: 243-247 (1964).
Mond, B.: A Symmetric Dual Theorem for Nonlinear Programs, Quarterly of Applied Mathematics, 23: 265-269 (1965).
Mond, B., and R. W. Cottle: Self-duality in Mathematical Programming, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 14: 420-423 (1966).
Mond, B., and M. A. Hanson: Duality for Variational Problems, Journal of Mathematical Analysis and Applications, 18: 355-364 (1967).
Moreau, J.-J.: Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France, 93: 273-299 (1965).
Motzkin, T. S.: "Beiträge zur Theorie der linearen Ungleichungen," Inaugural Dissertation, Basel, Jerusalem, 1936.
Nikaidô, H.: On von Neumann's Minimax Theorem, Pacific Journal of Mathematics, 4: 65-72 (1954).
Opial, Z.: "Nonexpansive and Monotone Mappings in Banach Spaces," Lecture Notes 67-1, Division of Applied Mathematics, Brown University, Providence, Rhode Island, 1967.
Ponstein, J.: Seven Types of Convexity, Society for Industrial and Applied Mathematics Review, 9: 115-119 (1967).
Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko: "The Mathematical Theory of Optimal Processes," John Wiley & Sons, Inc., New York, 1962.
Rissanen, J.: On Duality Without Convexity, Journal of Mathematical Analysis and Applications, 18: 269-275 (1967).
Ritter, K.: Duality for Nonlinear Programming in a Banach Space, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 15: 294-302 (1967).
Rockafellar, R. T.: "Convex Functions and Dual Extremum Problems," Doctoral dissertation, Harvard University, 1963.
Rockafellar, R. T.: Duality Theorems for Convex Functions, Bulletin of the American Mathematical Society, 70: 189-192 (1964).
Rockafellar, R. T.: Conjugates and Legendre Transforms of Convex Functions, Canadian Journal of Mathematics, 19: 200-205 (1967a).
Rockafellar, R. T.: Convex Programming and Systems of Elementary Monotonic Relations, Journal of Mathematical Analysis and Applications, 19: 543-564 (1967b).
Rockafellar, R. T.: Duality and Stability in Extremum Problems Involving Convex Functions, Pacific Journal of Mathematics, 21: 167-187 (1967c).
Rockafellar, R. T.: "Convex Analysis," Princeton University Press, Princeton, N.J., 1969.
Rosen, J. B.: The Gradient Projection Method for Nonlinear Programming, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 8: 181-217 (1960) and 9: 514-553 (1961).
Rubinstein, G. S.: Dual Extremum Problems, Soviet Mathematics-Doklady (English translation), 4: 1309-1312 (1963).
Rudin, W.: "Principles of Mathematical Analysis," 2d ed., McGraw-Hill Book Company, New York, 1964.
Saaty, T. L., and J. Bram: "Nonlinear Mathematics," McGraw-Hill Book Company, New York, 1964.
Simmonard, M.: "Linear Programming," Prentice-Hall, Inc., Englewood Cliffs, N.J., 1966.
Simmons, G. F.: "Introduction to Topology and Modern Analysis," McGraw-Hill Book Company, New York, 1963.
Slater, M.: "Lagrange Multipliers Revisited: A Contribution to Nonlinear Programming," Cowles Commission Discussion Paper, Mathematics 403, November, 1950.
Slater, M. L.: A Note on Motzkin's Transposition Theorem, Econometrica, 19: 185-186 (1951).
Stiemke, E.: Über positive Lösungen homogener linearer Gleichungen, Mathematische Annalen, 76: 340-342 (1915).
Stoer, J.: Duality in Nonlinear Programming and the Minimax Theorem, Numerische Mathematik, 5: 371-379 (1963).
Stoer, J.: Über einen Dualitätssatz der nichtlinearen Programmierung, Numerische Mathematik, 6: 55-58 (1964).
Trefftz, E.: "Ein Gegenstück zum Ritzschen Verfahren," Verhandlungen des Zweiten Internationalen Kongresses für Technische Mechanik, p. 131, Zürich, 1927.
Trefftz, E.: Konvergenz und Fehlerschätzung beim Ritzschen Verfahren, Mathematische Annalen, 100: 503-521 (1928).
Tucker, A. W.: Dual Systems of Homogeneous Linear Relations, in [Kuhn-Tucker 56], pp. 3-18.
Tuy, Hoang: Sur les inégalités linéaires, Colloquium Mathematicum, 13: 107-123 (1964).
Tyndall, W. F.: A Duality Theorem for a Class of Continuous Linear Programming Problems, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 13: 644-666 (1965).
Tyndall, W. F.: An Extended Duality Theorem for Continuous Linear Programming Problems, Notices of the American Mathematical Society, 14: 152-153 (1967).
Uzawa, H.: The Kuhn-Tucker Theorem in Concave Programming, in [Arrow et al. 58], pp. 32-37.
Vajda, S.: "Mathematical Programming," Addison-Wesley Publishing Company, Inc., Reading, Mass., 1961.
Valentine, F. A.: "Convex Sets," McGraw-Hill Book Company, New York, 1964.
Van Slyke, R. M., and R. J. B. Wets: "A Duality Theory for Abstract Mathematical Programs with Applications to Optimal Control Theory," Mathematical Note Number 538, Mathematics Research Laboratory, Boeing Scientific Research Laboratories, October, 1967.
Varaiya, P. P.: Nonlinear Programming in Banach Space, Society for Industrial and Applied Mathematics Journal on Applied Mathematics, 16: 284-293 (1967).
von Neumann, J.: Zur Theorie der Gesellschaftsspiele, Mathematische Annalen, 100: 295-320 (1928).
Whinston, A.: Conjugate Functions and Dual Programs, Naval Research Logistics Quarterly, 12: 315-322 (1965).
Whinston, A.: Some Applications of the Conjugate Function Theory to Duality, in [Abadie 67], pp. 75-96.
Wolfe, P.: A Duality Theorem for Nonlinear Programming, Quarterly of Applied Mathematics, 19: 239-244 (1961).
Zangwill, W. I.: "Nonlinear Programming: A Unified Approach," Prentice-Hall, Inc., Englewood Cliffs, N.J., 1969.
Zarantonello, E. H.: "Solving Functional Equations by Contractive Averaging," Mathematics Research Center, University of Wisconsin, Technical Summary Report Number 160, 1960.
Zoutendijk, G.: "Methods of Feasible Directions," Elsevier Publishing Company, Amsterdam, 1960.
Zukhovitskiy, S. I., and L. I. Avdeyeva: "Linear and Convex Programming," W. B. Saunders Company, Philadelphia, 1966.
Indexes
Name Index
Abadie, J., 97, 99, 100, 205  Almgren, F. J., 62  Anderson, K. W., 3, 205  Arrow, K. J., 102, 103, 151, 205  Avdeyeva, L. I., 212
Bartle, R. G., 200, 205  Berge, C., 3, 46, 55, 68, 122, 132, 182, 189, 191, 205  Birkhoff, G., 187, 205  Bohnenblust, H. F., 67, 205  Boltyanskii, V. G., 72, 163, 210  Bracken, J., 205  Bram, J., 211  Brondsted, A., 122, 205  Browder, F. E., 87, 205  Buck, R. C., 182, 188, 205
Canon, M., 72, 163, 170, 205  Carathéodory, C., 43, 205  Charnes, A., 206  Cooper, W. W., 206  Cottle, R. W., 122, 206, 210  Courant, R., 2, 113, 206  Cullum, C., 72, 163, 170, 205  Daniel, J. W., 130  Dantzig, G. B., 15, 18, 113, 122, 206  Dennis, J. B., 113, 206  Dieter, U., 122, 206  Dorn, W. S., 117, 122, 124, 125, 206  Duffin, R. J., 32, 206  Eisenberg, E., 122, 206  Enthoven, A. C., 151, 205
Fan, K., 63, 65, 206  Farkas, J., 16, 31, 206  Fenchel, W., 55, 122, 206  Fiacco, A. V., 206, 207  Fleming, W. H., 2, 62, 71, 182, 188, 200, 207  Friedrichs, K. O., 113, 207  Fromovitz, S., 72, 163, 170, 172, 209
Gale, D., 16, 33, 35, 113, 127, 177, 179, 207  Gamkrelidze, R. V., 72, 163, 210  Gass, S., 15, 207  Ghouila Houri, A., 55, 68, 122, 132, 189, 191, 205  Glicksburg, I., 63, 65, 206  Gol'stein, E. G., 122, 207  Gordan, P., 31, 207  Graves, R. L., 207
Hadley, G. F., 15, 207  Halkin, H., 72, 163, 170, 207  Hall, D. W., 3, 205  Halmos, P. R., 177, 207  Hamilton, N. T., 3, 207  Hanson, M. A., 113, 117, 118, 122, 137, 207, 208, 210  Hestenes, M. R., 200, 208  Hilbert, D., 113, 206  Hoffman, A. J., 63, 65, 206  Hu, T. C., 208  Huard, P., 115, 117, 118, 122, 208  Hurwicz, L., 102, 103, 122, 205, 208  Jacob, J.-P., 122, 208  Jensen, J. L. W. V., 61, 208  John, F., 77, 99, 208  Karamardian, S., 87, 117, 122, 137, 139, 208  Karlin, S., 67, 77-79, 205, 208  Koopmans, T. C., 208  Kowalik, J., 208  Krelle, W., 209  Kretschmar, K. S., 122, 208  Kuhn, H. W., 79, 94, 102, 105, 113, 127, 207-209  Künzi, H. P., 209
Landin, J., 3, 207  Larsen, A., 122, 209  Lax, P. D., 113, 209  Levinson, N., 122, 209  Levitin, E. S., 209
McCormick, G. P., 205, 207, 209  MacLane, S., 187, 205  Mangasarian, O. L., 72, 117, 118, 122, 131, 140, 151, 157, 163, 170, 172, 209  Martos, B., 132, 137, 151, 209  Minty, G. J., 87, 209  Mishchenko, E. F., 72, 163, 210  Mond, B., 113, 122, 208-210  Moreau, J.-J., 122, 210  Motzkin, T. S., 28, 210  Neustadt, L. W., 163, 207  Nikaidô, H., 131, 132, 210  Oettli, W., 209  Opial, Z., 87, 210  Osborne, M. R., 208  Polak, E., 72, 163, 170, 205, 209  Polyak, B. T., 209  Ponstein, J., 117, 118, 122, 209, 210  Pontryagin, L. S., 72, 163, 210  Rissanen, J., 122, 210  Ritter, K., 122, 210  Rockafellar, R. T., 87, 122, 210  Rosen, J. B., 211  Rossi, P., 122, 208  Rubinstein, G. S., 122, 211  Rudin, W., 182, 188, 200, 211
Saaty, T. L., 211  Shapley, L. S., 67, 205  Simmonard, M., 15, 113, 211  Simmons, G. F., 182, 211  Slater, M. L., 27, 78, 211  Stiemke, E., 32, 211  Stoer, J., 122, 211
Trefftz, E., 113, 211  Tucker, A. W., 16, 20, 21, 24, 25, 28, 29, 79, 94, 102, 105, 113, 127, 207-209, 211  Tuy, Hoang, 131, 140, 151, 211  Tyndall, W. F., 122, 211, 212  Uzawa, H., 77, 79, 80, 102, 103, 205, 212  Vajda, S., 212  Valentine, F. A., 46, 55, 212  Van Slyke, R. M., 122, 212  Varaiya, P. P., 122, 212  von Neumann, J., 113, 212  Wets, R. J. B., 122, 212  Whinston, A., 122, 212  Wolfe, P., 114, 115, 121, 122, 207, 212  Zangwill, W. I., 212  Zarantonello, E. H., 87, 212  Zoutendijk, G., 111, 212  Zukhovitskiy, S. I., 212
Subject Index
Alternative: table of theorems of, 34 theorems of, 27-37 Angle between two vectors, 8 Axiom of real number system, 188 Ball: closed, 183 open, 182 Basis, 178 Bi-nonlinear function, 149 Bolzano-Weierstrass theorem, 189 Bounded function, 196 Bounded set, 188 Bounds: greatest lower and least upper, 187 lower and upper, 187 Carathéodory's theorem, 43 Cauchy convergence, 188 Cauchy-Schwarz inequality, 7 Closed set, 183 Closure of a set, 183 Compact set, 189 Concave function, 56 differentiable, 83-91 minimum of, 73 strictly, 57 strictly concave: and differentiable, 87-88 and twice differentiable, 90-91 twice differentiable, 88-90 Concave functions, infimum of, 61 Constraint qualification: Arrow-Hurwicz-Uzawa, 102 modified, 172 weak, 154 with convexity, 78-81 with differentiability, 102-105, 154-156, 171-172 Karlin's, 78 Kuhn-Tucker, 102, 171
Constraint qualification: reverse convex, 103 weak, 155, 172 Slater's, 78 weak, 155 strict, 79 Constraint qualifications, relationships between, 79, 103, 155 Continuous function, 191-192 Convex: and concave functions, 54-68 generalized, 131-150 and pseudoconvex functions, relation between, 144, 146 Convex combinations, 41 Convex function, 55 continuity of, 62 differentiable, 83-91 at a point, 83 on a set, 84 strictly, 56 strictly convex: and differentiable, 87-88
and twice differentiable, 90-91 twice differentiable, 88-90 Convex functions: fundamental theorems for, 63-68 supremum of, 61 Convex hull, 44 Convex set, 39 Convex sets, 38-53 intersection of, 41 Differentiable function: numerical, 200 vector, 201 Distance between two points, 7 Dual linear problem (LDP), 127, 130 Dual (maximization) problem (DP), 114, 123 Dual problem, unbounded, 119, 121, 126, 127 Dual variables, 71
Duality, 113-130, 157-160, 174-176 in linear programming, 126-130 with nonlinear equality constraints, 174-176 in nonlinear programming, 114-123 in quadratic programming, 123-126 Duality theorem: Dorn's, 124 Dorn's strict converse, 125 Hanson-Huard strict converse, 118 linear programming, 127 strict converse, 117, 118, 124, 157, 160, 174, 176 weak, 114, 124 Wolfe's, 115 Equality constraints: linear, 80, 95 nonlinear, 75, 112, 161-176 Euclidean space Rⁿ, 6 Farkas' theorem, 16, 31, 37, 53 nonhomogeneous, 32 Feasible directions, method of, 111 Finite-intersection property, 189 Fractional function: linear, 149 nonlinear, 148 Fritz John saddlepoint necessary optimality theorem, 77 Fritz John saddlepoint problem (FJSP), 71 Fritz John stationary-point necessary optimality theorem, 99, 170 Fritz John stationary-point problem (FJP), 93 Function, 11 linear vector, 12 numerical, 12 vector, 12 Gale's theorems, 33 Gordan's theorem, 31
Gordan's theorem: generalized to convex functions, 65 Gradient of a function, 201 Halfspace, 40 Heine-Borel theorem, 189 Hessian matrix, 202 Implicit function theorem, 204 Inequalities, linear, 16-37 Infimum, 187, 195 of a numerical function, 196 Interior of a set, 183 Interior point, 182 Jacobian matrix, 202 Kuhn-Tucker saddlepoint necessary optimality theorem, 79 in the presence of equality constraints, 80 Kuhn-Tucker saddlepoint problem (KTSP), 71 Kuhn-Tucker stationary-point necessary optimality criteria, 105, 111, 112, 156, 173 Kuhn-Tucker stationary-point problem (KTP), 94 Kuhn-Tucker sufficient optimality theorem, 94, 153, 162 Lagrange multipliers, 71 Lagrangian function, 71 Limit, 186 Limit point, 186 Line, 38 segments, 39 Linear combination, 6 nonnegative, 7 Linear dependence, 6
Linear inequalities, 16-37 Linear minimization problem (LMP), 127, 130 Linear programming: duality, 126-130 optimality, 18-21 Linear systems, existence theorems, 21-26 Linearization lemma, 97, 153, 163 Local minimization problem (LMP), 70, 93 Mapping, 11 Matrices, 8-11 Matrix: nonvacuous, 11 submatrices of, 10 Maximum, 195 of a numerical function, 197 existence of, 198 Maximum principle (see Minimum principle) Mean-value theorem, 204 Metric space, 7 Minimization problem (MP), 70, 93 local, 70, 93 solution set, 72 uniqueness, 73 Minimum, 195 of a numerical function, 197 existence of, 198 Minimum principle, 72, 141-142, 162-170 Motzkin's theorem, 28 Negative definite matrix, 91, 181 Negative semidefinite matrix, 89, 90, 181 Nonlinear programming problem, 1-3, 14-15 Nonsingular matrix, 181 Norm of a vector, 7 Notation, 13-15 Numerical function, 12
Open set, 183 Optimality criteria: with differentiability, 92-112, 151-176 necessary, 97-112, 153-157, 162-174 sufficient, 94-97, 151-153, 162 necessary saddlepoint, 76-82 saddlepoint, 69-82 sufficient saddlepoint, 74-76 Partial derivatives: of a numerical function, 200-201 of a vector function, 201-202 Plane, 40 Point of closure, 182 Polyhedron, 41 Polytope, 41 Positive definite matrix, 91, 181 Positive semidefinite matrix, 89, 90, 181 Primal minimization problem (MP), 114, 122 Primal problem, no minimum, 121, 126, 127 Product of a convex set with a real number, 45 Pseudoconcave function, 141 Pseudoconvex and strictly quasiconvex functions, relation between, 143, 146 Pseudoconvex function, 140-141 Quadratic dual problem (QDP), 124 Quadratic minimization problem (QMP), 123 Quadratic programming, 123-126 Quasiconcave function, 132 differentiable, 134 strictly, 137 Quasiconvex and strictly quasiconvex functions, relation between, 139, 146
Quasiconvex function, 131 differentiable, 134 strictly, 137 Rank of a matrix: column, 179 row, 179 Relatively closed set, 183 Relatively open set, 183 Rⁿ, 6 Saddlepoint problem: Fritz John (FJSP), 71 Kuhn-Tucker (KTSP), 71 Scalar product, 7 Semicontinuous function: lower, 192-193 upper, 193 Separating plane, 46 Separation theorem, 49 strict, 50 Separation theorems, 46-53 Sequence, 185 Set, 3 complement of, 4 element of, 3 empty, 4 product with a real number, 45 subset of, 3 Set theoretic symbols, 5 Sets: difference of, 4 disjoint, 4
Sets: intersection of, 4 product of, 4-5 sum of, 45 union of, 4 Simplex, 42 Slater's theorem, 27 Stiemke's theorem, 32 Subspace, 40 Sum of two convex sets, 45 Supremum, 187, 195 of a numerical function, 196 Symbols, 5 Taylor's theorem, 204 Triangle inequality, 8, 41 Tucker's theorem, 29 Twice differentiable numerical function, 202 Uniqueness of minimum solution, 73 Vector, 6 addition, 6 multiplication by a scalar, 6 norm of, 7 Vector function, 12 Vector space, fundamental theorem of, 177 Vectors: angle between two, 8 scalar product of, 7 Vertex, 41