Applied Mathematical Sciences, Volume 45
Klaus Glashoff
Sven-Åke Gustafson

Linear Optimization and Approximation
An Introduction to the Theoretical Analysis and Numerical Treatment of Semi-infinite Programs

With 20 Illustrations

Springer-Verlag New York Heidelberg Berlin
Klaus Glashoff
Universität Hamburg
Institut für Angewandte Mathematik
Bundesstrasse 55
2 Hamburg 13
Federal Republic of Germany

Sven-Åke Gustafson
Department of Numerical Analysis and Computing Sciences
Royal Institute of Technology
S-10044 Stockholm 70
Sweden
and
Centre for Mathematical Analysis
Australian National University
P.O. Box 4
Canberra, ACT 2600
Australia
AMS Subject Classifications: 90C05, 49D35
Library of Congress Cataloging in Publication Data
Glashoff, Klaus, 1947-
Linear optimization and approximation.
(Applied mathematical sciences; v. 45)
Translation of: Einführung in die lineare Optimierung.
Includes bibliographical references and index.
1. Mathematical optimization. 2. Duality theory (Mathematics). I. Gustafson, Sven-Åke, 1938- . II. Title. III. Series: Applied mathematical sciences (Springer-Verlag New York Inc.); v. 45.
QA1.A647 vol. 45 [QA402.5] 510s [519.7'2] 83-647
Original edition © 1978 by Wissenschaftliche Buchgesellschaft, Darmstadt, West Germany. (First published in the series: "Die Mathematik. Einführungen in Gegenstand und Ergebnisse ihrer Teilgebiete und Nachbarwissenschaften.")
English edition © 1983 by Springer-Verlag New York Inc. All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, U.S.A. Printed and bound by R.R. Donnelley & Sons, Harrisonburg, VA. Printed in the United States of America.
987654321 ISBN 0-387-90857-9 ISBN 3-540-90857-9
Springer-Verlag New York Heidelberg Berlin Springer-Verlag Berlin Heidelberg New York
Preface
A linear optimization problem is the task of minimizing a linear real-valued function of finitely many variables subject to linear constraints; in general there may be infinitely many constraints. This book is devoted to such problems. Their mathematical properties are investigated and algorithms for their computational solution are presented. Applications are discussed in detail.

Linear optimization problems are encountered in many areas of applications. They have therefore been subject to mathematical analysis for a long time. We mention here only two classical topics from this area: the so-called uniform approximation of functions, which was used as a mathematical tool by Chebyshev in 1853 when he set out to design a crane, and the theory of systems of linear inequalities, which had already been studied by Fourier in 1823.
We will not treat the historical development of the theory of linear optimization in detail. However, we point out that the decisive breakthrough occurred in the middle of this century. It was urged on by the need to solve complicated decision problems where the optimal deployment of military and civilian resources had to be determined. The availability of electronic computers also played an important role. The principal computational scheme for the solution of linear optimization problems, the simplex algorithm, was established by Dantzig about 1950. In addition, the fundamental theorems on such problems were rapidly developed, based on earlier published results on the properties of systems of linear inequalities.

Since then, the interest of mathematicians and users in linear optimization has been sustained. New classes of practical applications are being introduced continually and special variants of the simplex algorithm and related schemes have been used for the computational treatment of practical problems of ever-growing size and complexity. The theory of "classical" linear optimization problems (with only finitely many linear constraints) had almost reached its final form around 1950; see e.g. the excellent book by A. Charnes, W. W. Cooper and A. Henderson (1953). Simultaneously there were great efforts devoted to the generalization and extension of the theory of linear optimization to new areas.
Thus nonlinear optimization problems were attacked at an early date. (This area plays only a marginal role in our book.) Here, connections were found with the classical theory of Lagrangian multipliers as well as to the duality principles of mechanics. The latter occurred in the framework of convex analysis.

At the same time the theory of infinite linear optimization came into being. It describes problems with infinitely many variables and constraints. This theory also found its final form rapidly; see the paper by R. J. Duffin (1956).
A special but important class of infinite linear optimization problems are those problems where the number of variables is finite but the number of linear inequality constraints is arbitrary, i.e. may be infinite. This type of problem, which constitutes a natural generalization of the classical linear optimization problem, appears in the solution of many concrete examples. We have already mentioned the calculation of uniform approximation of functions, which plays a major role in the construction of computer representations of mathematical expressions. Uniform approximation can also be successfully used in the numerical treatment of differential equations originating in physics and technological problems. Using an investigation by Haar from 1924 as a point of departure, A. Charnes, W. W. Cooper and K. O. Kortanek in 1962 gave the fundamental mathematical results for the last-mentioned class of linear optimization problems (with the exception of those questions which were already settled by Duffin's theory).

This class of optimization problems, often called semi-infinite programs, will be the main topic of the present book. The "classical" linear optimization problems, called linear programs, will occur naturally as a special case. Whether the number of inequality constraints is finite is a matter of minor importance in the mathematical theory of linear optimization problems. The great advantage of treating such a general class of problems, encompassing so many applications, need not, fortunately, be achieved by means of a correspondingly higher level of mathematical sophistication. In our account we have endeavored to use mathematical tools which are as simple as possible. To understand this book it is only necessary to master the fundamentals of linear algebra and n-dimensional analysis. (This theory is summarized in §2.) Since we have avoided all unnecessary mathematical abstractions, geometrical arguments have been used as much as possible. In this way we have escaped the temptation to complicate simple matters by introducing the heavy apparatus of functional analysis.

The central concept of our book is that of duality.
Duality theory is not investigated for its own sake but as an effective tool, in particular for the numerical treatment of linear optimization problems. Therefore all of Chapter II has been devoted to the concept of weak duality. We give some elementary arguments which serve to illustrate the fundamental ideas (primal and dual problems). This should give the reader a feeling for the numerical aspects of duality. In Chapter III we discuss some applications of weak duality to uniform approximation where the emphasis is again placed on numerical aspects.

The duality theory of linear optimization is investigated in Chapter IV. Here we prove theorems on the existence of solutions to the optimization problems considered. We also treat the so-called strong duality, i.e. the question of equality of the values of the primal and dual problems. The "geometric" formulation of the dual problem, introduced here, will be very useful for the presentation of the simplex algorithm which is described in the chapter to follow.
In Chapter V we describe in great detail the principle of the exchange step, which is the main building block of the simplex algorithm. Here we dispense with the computational technicalities which dominate many presentations of this scheme. The nature of the simplex algorithm can be explained very clearly using duality theory and the language of matrices and without relying on "simplex tableaux", which do not appear in our text.

In Chapter VI we treat the numerical realization of the simplex algorithm. It requires that a sequence of linear systems of equations be solved. Our presentation includes the stable variants of the simplex method which have been developed during the last decade.

In Chapter VII we present a method for the computational treatment of a general class of linear optimization problems with infinitely many constraints. This scheme was described for the first time in Gustafson (1970). Since then it has been successfully used for the solution of many practical problems, e.g. uniform approximation over multidimensional domains (also with additional linear side-conditions), calculation of quadrature rules, control problems, and so on.

In Chapter VIII we apply the ideas of the preceding three chapters to the special problem of uniform approximation over intervals. The classical Remez algorithm is studied and set into the general framework of linear optimization.
The concluding Chapter IX contains several worked examples designed to elucidate the general approach of this book. We also indicate that the ideas behind the computational schemes described in our book can be applied to an even more general class of problems.

The present text is a translated and extended version of Glashoff-Gustafson (1978). Chapters VIII and IX are completely new and Chapter IV is revised. More material has been added to Chapters III and VII. These changes and additions have been carried out by the second author, who is also responsible for the translation into English. Professor Harry Clarke, Asian Institute of Technology, Bangkok, has given valuable help with the latter task.

We hope that this book will provide theoretical and numerical insights which will help in the solution of practical problems from many disciplines. We also believe that we have clearly demonstrated our conviction that mathematical advances generally are inspired by work on real world problems.
Table of Contents

Preface

CHAPTER I. INTRODUCTION AND PRELIMINARIES
§1. Optimization Problems
§2. Some Mathematical Prerequisites
§3. Linear Optimization Problems

CHAPTER II. WEAK DUALITY
§4. Duality Lemma and Dual Problem
§5. State Diagrams and Duality Gaps

CHAPTER III. APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
§6. Uniform Approximation
§7. Polynomial Approximation

CHAPTER IV. DUALITY THEORY
§8. Geometric Interpretation of the Dual Problem
§9. Solvability of the Dual Problem
§10. Separation Theorem and Duality
§11. Supporting Hyperplanes and Duality

CHAPTER V. THE SIMPLEX ALGORITHM
§12. Basic Solutions and the Exchange Step
§13. The Simplex Algorithm and Discretization

CHAPTER VI. NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
§14. Stable Variants of the Simplex Algorithm
§15. Calculating a Basic Solution

CHAPTER VII. A GENERAL THREE-PHASE ALGORITHM
§16. Nonlinear Systems Derived From Optimality Conditions
§17. A General Computational Scheme

CHAPTER VIII. APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
§18. General Properties of Chebyshev Systems
§19. One-sided Approximation and Generalized Quadrature Rules of the Gaussian Type
§20. Computing the Best Approximation in the Uniform Norm

CHAPTER IX. EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
§21. A Control Problem with Distributed Parameters
§22. Operator Equations of Monotonic Type
§23. An Air Pollution Abatement Problem
§24. Nonlinear Semi-Infinite Programs

References

Index
Chapter I
Introduction and Preliminaries
§1. OPTIMIZATION PROBLEMS

Optimization problems are encountered in many branches of technology, in science, and in economics as well as in our daily life. They appear in so many different shapes that it is useless to attempt a uniform description of them or even try to classify them according to one principle or another. In the present section we will introduce a few general concepts which occur in all optimization problems. Simple examples will elucidate the presentation.
(1) Example: Siting of a power plant. Five major factories are located at P1, P2, ..., P5. A power plant to supply them with electricity is to be built and the problem is to determine the optimal site for this plant. The transmission of electrical energy is associated with energy losses which are proportional to the amount of transmitted energy and to the distance between power plant and energy consumer. One seeks to select the site of the plant so that the combined energy loss is rendered a minimum.

P1, P2, ..., P5 are represented by points in the plane with the coordinates P1 = (x1,y1), ..., P5 = (x5,y5). The distance between the two points P = (x,y) and P̄ = (x̄,ȳ) is given by

d(P,P̄) = {(x-x̄)² + (y-ȳ)²}^{1/2}.

Denote the transmitted energy quantities by E1, ..., E5. Our siting problem may now be formulated. We seek, within a given domain G of the plane, a point P̄ such that the following function assumes its minimal value at P̄:

E1 d(P̄,P1) + E2 d(P̄,P2) + ... + E5 d(P̄,P5).

In order to introduce some terminology we reformulate this task. We define the real-valued function f of two real variables x, y through

f(x,y) = E1{(x-x1)² + (y-y1)²}^{1/2} + ... + E5{(x-x5)² + (y-y5)²}^{1/2}.

We then arrive at the optimization problem: Determine numbers x̄, ȳ such that P̄ = (x̄,ȳ) ∈ G and

f(x̄,ȳ) ≤ f(x,y) for all (x,y) ∈ G.

Fig. 1.1. Siting of power plant
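To make the siting example concrete, here is a minimal computational sketch. The factory positions, the energy quantities E1, ..., E5, and the rectangular domain G are hypothetical illustration data of our own choosing, not taken from the text, and a crude grid search merely approximates a minimum point:

```python
import math

# Hypothetical illustration data: five factory positions and the
# transmitted energy quantities E1,...,E5 (not from the text).
factories = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
energies = [2.0, 1.0, 3.0, 1.5, 2.5]

def f(x, y):
    """Preference function: combined transmission loss for a plant at (x, y)."""
    return sum(E * math.hypot(x - xi, y - yi)
               for E, (xi, yi) in zip(energies, factories))

# Crude grid search over an assumed rectangular domain G = [0,4] x [0,5].
best = min((f(0.05 * i, 0.05 * j), 0.05 * i, 0.05 * j)
           for i in range(81) for j in range(101))
print("approximate minimum point:", best[1:], "value:", round(best[0], 3))
```

A finer grid, or a standard nonlinear minimizer started from the grid winner, would sharpen the approximation; the point here is only that f and G fully specify the problem.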
All important concepts associated with optimization problems may be illustrated by this example: f is called a preference function, G the permissible set, and the points of G are called permissible or feasible. Thus the optimization problem means that one should seek a permissible point such that f assumes its minimal value with respect to the permissible set. If such a point does exist, it is called an optimal point (for the problem considered), or optimal solution, or minimum point of f in G.
In the analysis of an optimization problem it is important to verify that an optimal solution does exist, i.e. that the problem is solvable. This is not always the case. As an illustration of this fact we note that the functions f1(x) = -x and f2(x) = e^{-x} do not have any minimum points in the set of all real numbers. On the other hand, if an optimization problem is solvable, a minimum point may not be unique. In many applications it is required to determine all minimum points which the preference function has in the permissible set.
It is of course of no use to formulate a task, appearing in economics or technology, as an optimization problem when this problem cannot be solved. A formulation as an optimization problem is thus advantageous only when the mathematical structure of this task can be investigated and suitable theoretical and computational tools can be brought to bear. Oftentimes, "applications" to economics or management are proposed whereby very complicated optimization problems are constructed, but it is not pointed out that neither theoretical nor numerical treatment of the problem appears to be within reach, now or in the near future. It should always be remembered that only some of the relevant factors can be incorporated when a decision problem is formulated as an optimization problem. There are always decision criteria which cannot be quantified and whose inclusion into a mathematical model is of doubtful value. Thus, in the siting problem discussed above, there are many political and ecological factors which cannot be accounted for in a mathematical model. This indicates that there is, in principle, a limit to what can be gained by the mathematization of social processes. This difficulty cannot, as a rule, be overcome by resorting to more complicated models (control theory, game theory, etc.) even if it sometimes may be concealed. The situation is quite different for technical systems. Since nowadays the mathematization and also the "optimization" of social processes are pushed forward with great energy, we find the critical remark above to be justified.
(2) Example: Production model. We consider a firm which produces or consumes n goods G1, ..., Gn (e.g. raw materials, labor, capital, environmental pollutants). An activity of the firm is represented by n numbers (a1,...,an) where a_r indicates the amount of good G_r which is produced or consumed when the activity is taking place with intensity 1 (measured in suitable units). We assume that the firm can select various activities P_s. Thus the firm's technology has the property that to each s in a fixed index set S (which may be finite or infinite) there are n numbers (a_1(s),...,a_n(s)). A production plan of the firm is defined by selecting a (finite) number q of activities P_{s_1}, ..., P_{s_q} and prescribing that they are carried out with the intensities x1, ..., xq, where x_i ≥ 0, i = 1,2,...,q. We assume that the production process is linear, i.e. for the given production plan the amount of good G_r which is produced or consumed is given by

a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq.

We shall further assume that the activity P_s causes the profit (or cost) b(s). Hence the profit achieved by the chosen production plan is given by

(3)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq.
The optimization problem of the firm is to maximize its profit by proper choice of its production plan, i.e. it must select finitely many activities P_{s_1}, ..., P_{s_q} and the corresponding intensities x1, x2, ..., xq such that the expression (3) assumes the greatest value possible.

The choice of activities and intensities is restricted by the fact that only finite amounts of the goods G1, ..., Gn are available. In practice this is true only for some of the goods, but for simplicity of presentation we want to assume that all goods can only be obtained in limited amounts:

(4)  a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq ≤ c_r,  r = 1,2,...,n.

Thus (4) defines n side-conditions which constrain the feasible activities and intensities. The optimization problem can thus be cast into the form: Determine a finite subset {s1,...,sq} of the index set S and the real numbers x1, ..., xq such that the expression (3) is rendered a maximum under the constraints (4) and the further side-conditions

(5)  x_i ≥ 0,  i = 1,2,...,q.
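For a finite index set, the problem (3), (4), (5) is an ordinary linear program that any LP solver can handle. The sketch below is our own illustration with invented data for n = 2 goods and q = 3 activities; it assumes scipy.optimize.linprog, which minimizes, so the profit (3) is negated:

```python
import numpy as np
from scipy.optimize import linprog

# Invented illustration data: n = 2 goods, q = 3 activities.
# a[r][i] = amount of good r consumed by activity i at unit intensity.
a = np.array([[1.0, 2.0, 1.0],      # good G1
              [3.0, 1.0, 2.0]])     # good G2
c_goods = np.array([10.0, 15.0])    # available amounts c_r, constraint (4)
b_profit = np.array([2.0, 3.0, 2.5])  # profits b(s_i) in expression (3)

# Maximize b^T x subject to a x <= c_goods and x >= 0, i.e. (3)-(5);
# linprog minimizes, so we pass the negated profit vector.
res = linprog(-b_profit, A_ub=a, b_ub=c_goods, bounds=[(0, None)] * 3)
print("optimal intensities x:", res.x, "maximal profit:", -res.fun)
```

The same pattern scales to any finite selection of activities; which activities are worth selecting at all is exactly the question the duality theory of Chapters II and IV addresses.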
(6) Remark. A maximization problem is transformed into an equivalent minimization problem by multiplying its preference function by -1.

The general optimization problem. Let M be a fixed set and let f be a real-valued function defined on M. We seek an element x̄ in M such that

f(x̄) ≤ f(x) for all x ∈ M.

M is called the feasible or permissible set and f is termed the preference function. We remark here that the feasible set is, as a rule, not explicitly given but is defined through side-conditions (often called constraints), as in Example (2).

(7) Definition. The number v given by

v = inf{f(x) | x ∈ M}

is called the value of the corresponding optimization problem.
If M is the empty set, i.e. there are no feasible points, the optimization problem is said to be inconsistent and we put v = +∞. If feasible points do exist we term the optimization problem feasible or consistent. If v = -∞, the optimization problem is said to be "unbounded from below". Thus every minimization problem must be in one and only one of the following three "states" IC, B, UB:

IC = Inconsistent; the feasible set is empty and the value of the problem is +∞.

B = Bounded; there are feasible points and the value is finite.

UB = Unbounded; there are feasible points, the preference function is unbounded from below, and the value is -∞.

The value of a maximization problem is -∞ in the state IC, finite in state B, and +∞ in the state UB.
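Remark (6) is what one uses constantly in computational practice: maximization routines are rarely needed, since minimizing the negated preference function does the same job. A minimal sketch (our own illustration, assuming scipy is available):

```python
from scipy.optimize import minimize_scalar

# Remark (6) in code: maximize g(x) = 1 - (x - 2)**2 over R
# by minimizing its negation -g.
res = minimize_scalar(lambda x: -(1.0 - (x - 2.0) ** 2))
print("maximum point:", res.x, "maximal value:", -res.fun)  # ~2.0 and ~1.0
```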
§2. SOME MATHEMATICAL PREREQUISITES

The successful study of this book requires knowledge of some elementary concepts of mathematical analysis as well as linear algebra. We shall summarize the notations and some mathematical tools in this section.

(1) Vectors. We denote the field of real numbers by R, and by R^n the n-dimensional space of all n-tuples of real numbers

(2)  x = (x1, x2, ..., xn).

In R^n, the usual vector space operations are defined: componentwise addition of vectors and multiplication by scalars (i.e. real numbers). We assume that the reader is familiar with the concepts of "linear independence", "basis", and "subspace". The zero vector of R^n is written 0. n-tuples of the form (2) are also referred to as "points".

(3) Matrices. An m × n matrix A (m, n ≥ 1) is a rectangular array of real numbers a_{ik} (i = 1,2,...,m, k = 1,2,...,n),

A = [ a11  a12  ...  a1n ]
    [ a21  a22  ...  a2n ]
    [ ..................  ]
    [ am1  am2  ...  amn ]

The numbers a_{ik} are termed the elements of the matrix A; a_{ik} is situated in row number i and column number k. To each given matrix A we define its transpose A^T by

A^T = [ a11  a21  ...  am1 ]
      [ a12  a22  ...  am2 ]
      [ ..................  ]
      [ a1n  a2n  ...  amn ]

Every vector x ∈ R^n may be considered an n × 1 matrix. In order to save space we write, instead of (2),

x^T = (x1, x2, ..., xn).

We note that (A^T)^T = A. The reader is supposed to know elementary matrix operations (addition and multiplication of matrices).

(4) Linear mappings. Every m × n matrix A defines a linear mapping of R^n into R^m whereby every vector x ∈ R^n is mapped onto a vector y ∈ R^m via

(5)  y = Ax.

Using the definition of matrix multiplication we find that the components of y are to be calculated according to

y_i = a_{i1}x1 + a_{i2}x2 + ... + a_{in}xn,  1 ≤ i ≤ m.

Denote the column vectors of A by a1, a2, ..., an. Then we find

(6)  Ax = a1 x1 + a2 x2 + ... + an xn.

Equation (6) thus means that the vector y in (5) is a linear combination of the column vectors of A.

(7) Linear systems of equations. The task of determining x in (5) is one of the fundamental problems of linear algebra. Now let a fixed y be given in (5). Then (5) is called a linear system of equations with n unknowns x1, x2, ..., xn and m equations. We assume that the solvability theory of (5) (existence and uniqueness of solutions) is known to the reader. An example: from (6) we conclude that (5) is solvable for each y ∈ R^m if the column vectors of A span all of R^m, i.e. if A has the rank m. It is equally simple to verify that (5) has at most one solution if the column vectors of A are linearly independent.

The case when A is a square matrix, n × n, is of particular interest. Then (5) has an equal number of equations and unknowns. Then the linear system Ax = y has a unique solution for each y ∈ R^n if and only if the column vectors a1, a2, ..., an of A form a basis of R^n, i.e. are linearly independent. Then the matrix A is said to be regular (or nonsingular). In this case there exists an n × n matrix A^{-1} with the properties

A^{-1}(Ax) = x,  A(A^{-1}x) = x,  all x ∈ R^n.

A^{-1} is called the inverse of A, and the linear system of equations (5) has the unique solution x = A^{-1}y.

(8) Hyperplanes. A vector y ∈ R^n, y ≠ 0, and a number η ∈ R are given. Then we denote by H(y;η) the hyperplane consisting of the set of all points x ∈ R^n such that

y^T x = y1 x1 + y2 x2 + ... + yn xn = η.

y is called the normal vector of the hyperplane. For any two vectors x and z in H(y;η) we have y^T(x-z) = 0. A hyperplane H(y;η) partitions R^n into three disjoint sets, namely H(y;η) and the two "open half-spaces"

A1 = {x | y^T x < η},  A2 = {x | y^T x > η}.

The linear system of equations (5) also admits the interpretation that the vector x must be in the intersection of the hyperplanes H(a^i; y_i) (i = 1,2,...,m), where the a^i here are the row vectors of the matrix A. Sets of the form A1 ∪ H(y;η) and A2 ∪ H(y;η) are termed closed half-spaces. They consist of all points x ∈ R^n such that y^T x ≤ η or y^T x ≥ η, respectively.

(9) Vector norms. We shall associate with each vector x ∈ R^n a real number ‖x‖. The mapping x → ‖x‖ shall obey the following laws:

(i)   ‖x‖ ≥ 0, all x ∈ R^n, and ‖x‖ = 0 for x = 0 only;
(ii)  ‖αx‖ = |α| ‖x‖, all x ∈ R^n, all α ∈ R;
(iii) ‖x+y‖ ≤ ‖x‖ + ‖y‖, all x ∈ R^n, y ∈ R^n.

Then ‖x‖ will be called the norm of the vector.

Exercise: Show that the following mapping defines a vector norm on R^n:

x → max{|x1|, |x2|, ..., |xn|}.

The most well-known norm is the Euclidean norm, which will be treated in the next subsection.

(10) Scalar product and Euclidean norm. The scalar product of two vectors x and y is defined to be the real number

x^T y = y^T x = x1 y1 + x2 y2 + ... + xn yn.

The real number

|x| = (x^T x)^{1/2} = (x1² + x2² + ... + xn²)^{1/2}

is called the Euclidean norm or length or absolute value of the vector x. The reader should verify that the mapping x → |x| defines a norm in the sense of (9). It is also easy to establish the "parallelogram law"

|x+y|² + |x-y|² = 2(|x|² + |y|²)  for all x, y ∈ R^n.

(11) Some topological fundamentals. We define the distance between two points x, y in R^n to be |x-y|. The set K_r(a), consisting of all points whose distance to a is less than r, a fixed positive number, is termed the open sphere with center a and radius r. Thus

K_r(a) = {x ∈ R^n | |x-a| < r}.

We are now in a position to introduce the fundamental topological structure of R^n. A point a is said to be an inner point of a subset A ⊂ R^n if there is a sphere K_r(a) which in its entirety belongs to A, K_r(a) ⊂ A. We will use the symbol Å for the set of all inner points of A. Å is also called the interior of A. A is termed open if A = Å. The point a is said to be a boundary point of the set A if every sphere K_r(a) contains both points in A and points which do not belong to A. The set of all boundary points of A is called the boundary of A and is denoted bd A. The union of A and its boundary is called the closure of A and is denoted Ā. The set A is said to be closed if A = Ā. The following relations always hold:

Å ⊂ A ⊂ Ā,  bd A = Ā \ Å.

The topological concepts introduced above have been defined using the Euclidean norm. This norm will be most often used in the sequel. However, one may define spheres in terms of other norms and in this way arrive at the fundamental topological concepts "inner points", "open sets", and so on, in the same manner as above. Fortunately it is possible to prove that all norms on R^n are equivalent in the sense that they generate the same topological structure on R^n: A set which is open with respect to one norm remains open with respect to all other norms. In order to establish this assertion one first verifies that if ‖·‖1 and ‖·‖2 are two norms on R^n there are two positive constants c and C such that

c‖x‖1 ≤ ‖x‖2 ≤ C‖x‖1  for all x ∈ R^n.

Based on these fundamental structures one can now define the main concepts of convergence of sequences and continuity of functions in the usual way. We suppose here the reader is familiar with these concepts.

(12) Compact sets. A subset A ⊂ R^n is said to be bounded when there is a real number r > 0 such that A ⊂ K_r(0). Closed bounded subsets of R^n will be termed compact. Compact subsets A of R^n have the following important property: Every infinite sequence {x_i}_{i≥1} of points in the set A has a convergent subsequence {x_{i_k}}_{k≥1}. If f: R^n → R^m is a continuous mapping, then the image f(A) of every compact set A is compact also. From this statement we immediately arrive at the following result, which also may be looked upon as an existence statement for optimization problems:

(13) Theorem of Weierstrass. Let A be a nonempty compact subset of R^n and f a real-valued continuous function defined on A. Then f assumes its maximum and minimum value on A, i.e. there exist points x̄ ∈ A and x̂ ∈ A such that

f(x̄) = max{f(x) | x ∈ A}  and  f(x̂) = min{f(x) | x ∈ A}.

It is recommended that the reader, as an exercise, carry out the proof of this simple but important theorem.
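For one concrete pair of norms, the equivalence inequality of (11) can be checked numerically: for the maximum norm of the exercise in (9) and the Euclidean norm of (10) the constants may be taken as c = 1 and C = √n, which is the classical estimate. A small sketch of our own, as illustration only:

```python
import math
import random

# Check c*||x||_1 <= ||x||_2 <= C*||x||_1 with ||.||_1 the maximum norm,
# ||.||_2 the Euclidean norm, c = 1 and C = sqrt(n), on random samples.
n = 5
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(n)]
    max_norm = max(abs(t) for t in x)
    eucl_norm = math.sqrt(sum(t * t for t in x))
    assert max_norm <= eucl_norm <= math.sqrt(n) * max_norm
print("equivalence constants c = 1, C = sqrt(n) confirmed on all samples")
```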
§3. LINEAR OPTIMIZATION PROBLEMS

An optimization problem shall be called a linear optimization problem (LOP) when the preference function is linear and the feasible domain is defined by linear constraint functions. Thus the preference function has the form

Σ_{r=1}^n c_r y_r,

where c is a fixed vector in R^n. The set of feasible vectors of an (LOP) will be defined as an intersection of half-spaces: Let S be a given index set (which may be finite or infinite). With each s ∈ S we associate a vector a_s ∈ R^n and a real number b_s. Then the set of feasible vectors of a linear optimization problem consists of all vectors y ∈ R^n lying in all half-spaces

(1)  {y | a_s^T y ≥ b_s},  s ∈ S.

We shall discuss two examples of sets of vectors defined by means of systems of linear inequalities. (In both cases we have n = 2.)

(2) Example. S = {1,2}; a1 = (2,3)^T, a2 = (-1,0)^T, b1 = 6, b2 = -3. In this case (1) becomes

2y1 + 3y2 ≥ 6,
-y1 ≥ -3.

This set is indicated in Figure 3.1 by the checkered area.

Fig. 3.1

(3) Example. Let S be the real interval [0,1]. (S now has infinitely many elements, in contrast to Example (2).) Let a_s = (1,s)^T and b_s = √s for all s ∈ [0,1]. The inequalities (1) then become

y1 + s y2 ≥ √s,  s ∈ [0,1].

The subset of the y1-y2-plane which is defined by these inequalities is drawn in Fig. 3.2. The two hyperplanes (in this case straight lines) y1 + s y2 = √s corresponding to s = 1 and s = 1/2 are marked in the figure.

Fig. 3.2. The checkered area is the set defined by means of the inequalities y1 + s y2 ≥ √s, s ∈ [0,1].

The "general" situation (for n = 2) is illustrated in Fig. 3.3. The hyperplanes corresponding to some particular a_s and b_s, s ∈ S, are indicated. S may be infinite; if so, it generates infinitely many hyperplanes.

Fig. 3.3

We note that the inequalities (1) may define bounded as well as unbounded subsets of R^n. Compare Fig. 3.2 with Fig. 3.3.

(4) Exercise. Set n = 2. Let S = {1,2,...}, and let a_s = (1,1/s)^T, b_s = 0, for s = 1,2,.... Draw the subset of the y1-y2-plane defined by (1). Show that this subset can be defined using two inequalities only!

(5) Exercise. Draw the subset of the y1-y2-plane defined through the infinitely many inequalities

-s y1 - √(1-s²) y2 ≥ -√(1-s²),  for every s ∈ [-1,1].

To summarize: A linear optimization problem is defined as follows:

Given: A vector c = (c1,c2,...,cn)^T ∈ R^n, a nonempty index set S, and for every s ∈ S a vector a_s ∈ R^n and a real number b_s.

Sought: A vector y ∈ R^n which solves the following problem (P):

(P)  Minimize c^T y subject to the constraints a_s^T y ≥ b_s, all s ∈ S.

We now introduce some alternative notations which will often be used in the sequel. We write a(s) instead of a_s and b(s) instead of b_s. Hence we arrive at the following two componentwise representations of the vector a(s) = a_s:

a_s = (a_{1s}, a_{2s}, ..., a_{ns})^T  and  a(s) = (a_1(s), a_2(s), ..., a_n(s))^T.

Thus the optimization problem (P) can also be written in the following form:

(P)  Minimize Σ_{r=1}^n c_r y_r subject to the constraints Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S.

One can use a particularly simple representation in the important special case when S has a finite number of elements, i.e. when (P) has only finitely many constraints. To discuss this case we put S = {s1, s2, ..., sm} where m ≥ 1. Then there occur m vectors a(s_i) (i = 1,2,...,m). The corresponding linear constraints take the following form

(6)  a_1(s1)y1 + a_2(s1)y2 + ... + a_n(s1)yn ≥ b(s1)
     a_1(s2)y1 + a_2(s2)y2 + ... + a_n(s2)yn ≥ b(s2)
     ...
     a_1(sm)y1 + a_2(sm)y2 + ... + a_n(sm)yn ≥ b(sm)

The nm numbers a_r(s_i) are combined into a matrix A with the vectors a(s_i) in its columns:

(7)  A = [ a_1(s1)  a_1(s2)  ...  a_1(sm) ]
         [ a_2(s1)  a_2(s2)  ...  a_2(sm) ]
         [ .............................. ]
         [ a_n(s1)  a_n(s2)  ...  a_n(sm) ]

If now the m numbers b(s_i), i = 1,2,...,m, are combined into the vector b = (b(s1), b(s2), ..., b(sm))^T, then the constraints (6) may be written

A^T y ≥ b.

On the other hand let a matrix A = (a_{rs}), (r = 1,2,...,n and s = 1,2,...,m) and a vector b = (b1,b2,...,bm)^T be given. Then the inequalities A^T y ≥ b become

a_{11}y1 + a_{21}y2 + ... + a_{n1}yn ≥ b1
a_{12}y1 + a_{22}y2 + ... + a_{n2}yn ≥ b2
...
a_{1m}y1 + a_{2m}y2 + ... + a_{nm}yn ≥ bm

This system of inequalities is expressed in the form of (6) by putting

S = {1,2,3,...,m}

and

a_r(s) = a_{rs}  for s = 1,2,...,m and r = 1,2,...,n.

(8) Example. Consider the system of inequalities

y1 + y2 ≥ 2
y1 + 3y2 ≤ 3
y1 ≥ 0
y2 ≥ 0.

The second inequality is multiplied by -1 and expressed in the form

-y1 - 3y2 ≥ -3.

In this case we have n = 2, m = 4. The matrix A becomes

A = [ 1  -1  1  0 ]
    [ 1  -3  0  1 ]

Every column corresponds to one constraint of the system of inequalities and the corresponding vector b is given by b = (2,-3,0,0)^T.

(9) Definition. A linear optimization problem with finitely many constraints will be called a linear program. Its standard form will be denoted (LP):

(LP)  Minimize c^T y under the constraints A^T y ≥ b.

Here A = (a_{rs}) is a given n by m matrix and b, c are given vectors in R^m and R^n respectively.

Linear programming, i.e. the algorithmic solution of linear optimization problems of the type (LP), is one of the most important areas of linear optimization. Therefore this special case will be treated separately and in detail in the sequel.

In the case that (1) defines infinitely many constraints (|S| = ∞)*, it may be advantageous to look upon the vectors a(s) as columns of a "matrix" A. This "matrix" has infinitely many columns. Consider the example of Exercise (4). Here we combine the vectors a(s) = (1,1/s)^T into the array

[ 1   1    1    1   ... ]
[ 1  1/2  1/3  1/4  ... ]

*We denote by |S| the number of elements of S. If S has infinitely many elements, we write |S| = ∞.

The vectors a(s) can always be arranged in this way when S contains countably many elements, but this representation fails in a more general situation, e.g. when S = [0,1]. However, also in this case it might be useful to write the vectors a(s) from (1) in a matrix-like arrangement. In the case S = [0,1] we may write

[ a_1(0) ... a_1(s) ... a_1(1) ]
[ a_2(0) ... a_2(s) ... a_2(1) ]
[ ............................ ]
[ a_n(0) ... a_n(s) ... a_n(1) ]
    ↑          ↑          ↑
   a(0)       a(s)       a(1)

(10) Definition. Consider a LOP of the type (P) and such that |S| = ∞ (i.e. there are infinitely many linear constraints). Select a finite subset {s1, s2, ..., sm} ⊂ S and form the matrix A from (7). The linear program hereby arising is called a discretization of the original LOP.

As an example we discuss the general LOP:

Minimize c^T y subject to the constraints Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S,

where |S| = ∞. A discretization of this task is defined by means of the linear program:

Minimize c^T y subject to the constraints Σ_{r=1}^n a_r(s_i) y_r ≥ b(s_i), i = 1,2,...,m.

Here, s1, s2, ..., sm are fixed elements in S.

(11) Example. Often problems of the type illustrated by Example (3) are discretized as follows. Select a natural number m ≥ 2, put h = 1/(m-1), s_i = (i-1)h and form the matrix A. In the case of (3) we get

A = [ 1      1          2         ...      1     ]
    [ 0   1/(m-1)    2/(m-1)   ...  (m-2)/(m-1)  1 ]

(12) Exercise. Denote by v(P) the value of Problem (P) and by vm(P) the value of a discretization of (P). Show that vm(P) ≤ v(P).

The method of discretization is very important both in theory and practice. We will return to this topic in §13. Provided that certain very general conditions are met, it is possible to show that for every linear optimization problem (P) there is a discretization with the same optimal solution as (P). These conditions are met in the practical applications discussed in this book. This statement is an important consequence of the duality theory of Chapter IV and indicates the important role of linear programming in the framework of linear optimization.

We mention here that in computational practice discretization is often used to calculate an approximate solution of a linear optimization problem with infinitely many constraints. The linear program thereby obtained is solved by means of the simplex algorithm (Chapters V and VI) which, after finitely many arithmetic operations, delivers a solution (or the information that none exists).

We shall now illustrate another useful way of studying a given LOP by means of diagrams. Consider again Example (3). We have a(s) = (1,s)^T, b(s) = √s for s ∈ [0,1]. Thus

a_1(s) = a_{1s} = 1,
a_2(s) = a_{2s} = s,
b(s) = b_s = √s.

Let c1 = 1 and c2 = 0. The constraints (1) are written

y1 + s y2 ≥ √s,  s ∈ [0,1].

They are illustrated in Fig. 3.2 but may also be represented geometrically as follows. (y1,y2) satisfies these constraints if the straight line

z(s) = y1 + s y2

lies above the graph of the function √s in the interval [0,1]. (See Fig. 3.4.) The corresponding LOP may be reformulated as the task to determine, among all such straight lines, the one which intersects the vertical axis at the lowest point.

(13) Exercise. Prove that the LOP above has the value 0 but no solution. Show also, by drawing a picture analogous to Fig. 3.4, that every discretization of this LOP has the value -∞ if the left boundary point of the interval [0,1] does not appear among the points of discretization s1, s2, ..., sm. Thus the linear program is unbounded from below in this case.

Fig. 3.4

(14) Example: Air pollution control. We consider the problem of maintaining a satisfactory air quality in an area S (e.g. a city). This goal shall be reached by regulating the emissions from the sources of pollutants in such a manner that the control costs are as small as possible. N sources have been identified and their positions and strengths are known. We consider here only the case of one pollutant, e.g. SO2. The concentration of the pollutant at a point s = (s1,s2)^T is given by

d(s) = Σ_{j=1}^N q_j V_j(s).

Here V_j is the transfer function which describes the contribution from the source with index j to the ambient concentration at the point s. V_j describes an annual mean and is hence time-independent. The transfer functions are calculated from meteorological dispersion models incorporating wind speed and direction, atmospheric stability, and several other geographical and meteorological variables. We shall assume that the transfer functions are known. q_j is the strength of source number j.

The number of pollutant sources is generally very great and therefore they cannot be regulated individually. Instead they are divided into n source classes G1, G2, ..., Gn and all sources in a given class are regulated in the same way. Thus all residential houses of a city may form one source class. The sources are now numbered so that all sources with indices between j_{r-1} + 1 and j_r comprise class number r (r = 1,2,...,n). Thus we have

0 = j_0 < j_1 < ... < j_n = N.

We now introduce

v_r(s) = Σ q_j V_j(s)  (r = 1,2,...,n),

where the summation is extended over all members of class r. The total concentration of the pollutant at point s is thus given by

Σ_{r=1}^n v_r(s).

One reduction strategy is now to reduce the emission of class G_r by the fraction E_r. Thus 0 ≤ E_r ≤ 1 (r = 1,2,...,n). Hence the total remaining concentration after regulation becomes

Σ_{r=1}^n (1 - E_r) v_r(s).

We require now that for each s ∈ S the value of this expression does not surpass a given limit g(s). g may be a legally imposed standard defining the highest acceptable concentration. We assume also that there are upper bounds e_r < 1 for the fractions E_r. (It is not technically possible to completely remove the emissions from the group G_r.) Therefore the numbers E1, E2, ..., En must meet the conditions:

(15)  0 ≤ E_r ≤ e_r,  r = 1,2,...,n,

(16)  Σ_{r=1}^n (1 - E_r) v_r(s) ≤ g(s),  s ∈ S.

The reduction of emissions entails costs, e.g. for the installation and maintenance of effluent filters in factories. We shall assume that these costs are defined by the linear function

(17)  K(E) = Σ_{r=1}^n c_r E_r,

where c1, c2, ..., cn are known numbers. The task of minimizing the cost function (17) under the constraints (15), (16) is a linear optimization problem:

Minimize Σ_{r=1}^n c_r E_r subject to the constraints

E_r ≥ 0,  r = 1,2,...,n,

-E_r ≥ -e_r,  r = 1,2,...,n,

(18)  Σ_{r=1}^n E_r v_r(s) ≥ -g(s) + Σ_{r=1}^n v_r(s),  s ∈ S.

Remark. The function d does not completely describe the air quality since the level of concentration changes irregularly with time. The reduction policy which is determined by considering the annual mean concentrations only is therefore a long-term regulation strategy which must be supplemented with suitable short-term measures to counteract temporary strong increases in ambient concentrations.

The above formulation of an optimization problem for environmental pollution control is based on work by Gorr and Kortanek. See e.g. Gorr, Gustafson and Kortanek (1972) and Gustafson and Kortanek (1975).
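The discretization procedure of Definition (10) is easy to carry out mechanically. The following sketch is our own illustration, assuming the scipy.optimize.linprog solver: it discretizes Example (3) (minimize y1 subject to y1 + s y2 ≥ √s, s ∈ [0,1], i.e. c1 = 1, c2 = 0) and reproduces the behavior claimed in Exercise (13): the discretized program is bounded when s = 0 is among the grid points and unbounded from below otherwise.

```python
import numpy as np
from scipy.optimize import linprog

def discretized_value(m, include_zero=True):
    """Value of a discretization of Example (3): minimize y1 subject to
    y1 + s_i*y2 >= sqrt(s_i) for m grid points s_i in [0,1]."""
    s = np.linspace(0.0 if include_zero else 1.0 / m, 1.0, m)
    # linprog expects A_ub @ y <= b_ub, so the >= constraints are negated.
    A_ub = -np.column_stack([np.ones(m), s])
    b_ub = -np.sqrt(s)
    res = linprog([1.0, 0.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None)])
    return res.fun if res.status == 0 else "unbounded from below"

print(discretized_value(11, include_zero=True))    # 0.0 = v(P) here
print(discretized_value(11, include_zero=False))   # unbounded, cf. Exercise (13)
```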
Chapter II

Weak Duality

The present chapter is very elementary in its entirety but is of decisive importance for understanding the material to follow. Here we lay the foundations for the theoretical as well as computational treatment of linear optimization problems. The simple examples are particularly designed in order to familiarize the reader with the structure of such problems as well as the central concept of duality, which plays a major role both in the theory and in all practical applications of linear optimization. A thorough study of these examples is the best preparation for the duality theory to be presented in Chapter IV and the algorithms of Chapters V through VIII.
§4. DUALITY LEMMA AND DUAL PROBLEM

We consider the optimization problem (P) which was introduced in §3. It can be written in the following compact form:

(P)  Minimize c^T y subject to a(s)^T y ≥ b(s), s ∈ S,

or alternatively

(P)  Minimize Σ_{r=1}^n c_r y_r subject to Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S.

One obtains an upper bound for the value v(P) as soon as a feasible vector y is available. According to the definition of v(P) we find immediately that

v(P) ≤ c^T y.

It is of great interest for numerical treatment to determine good lower bounds for v(P). This fact will be illustrated in many examples. The following fundamental lemma can be used for constructing such lower bounds.

(1) Duality lemma. Let the finite subset {s1, s2, ..., sq} of S, q ≥ 1, and the nonnegative numbers x1, x2, ..., xq be such that

(2)  c = a(s1)x1 + a(s2)x2 + ... + a(sq)xq.

Then the following inequality holds for every feasible vector y = (y1,...,yn)^T:

(3)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq ≤ c^T y.

Proof: We have assumed that y is feasible for (P). Then we find in particular

a(s_i)^T y ≥ b(s_i),  i = 1,2,...,q.

Since x_i ≥ 0, i = 1,2,...,q, we get

Σ_{i=1}^q b(s_i)x_i ≤ Σ_{i=1}^q (a(s_i)^T y)x_i = (Σ_{i=1}^q a(s_i)x_i)^T y.

The assertion now follows from (2).

Since (3) holds for every vector y which is feasible for (P), we immediately arrive at the following statement on lower bounds for the optimal value v(P). (Note that here we revert to the componentwise representation of the vectors a(s_i) and c.)

(4) Corollary. Let {s1,...,sq}, q ≥ 1, be a finite subset of the index set S and let the nonnegative numbers x1,...,xq satisfy

(5)  Σ_{i=1}^q a_r(s_i)x_i = c_r,  r = 1,2,...,n.

Then

(6)  Σ_{i=1}^q b(s_i)x_i ≤ v(P).

We remark already here that one is, of course, interested in obtaining the best possible lower bounds for v(P). We will show in later chapters that for large classes of problems it is possible to obtain arbitrarily good lower bounds by selecting the subset s1,...,sq and the numbers x_i properly.

(7) Example. We consider the LOP

Minimize y1 + (1/2)y2 subject to y1 + s y2 ≥ e^s, s ∈ [0,1].

We try now to determine a finite subset {s1,...,sq} of S and nonnegative numbers x1,...,xq such that the assumptions of the duality lemma are met. We take first q = 1 and seek a point s1 in the interval [0,1] and a nonnegative number x1 with the property (5):

x1 = 1,
s1 x1 = 1/2.

These equations have the unique solution x1 = 1, s1 = 1/2. From (6) we get

x1 e^{s1} = e^{1/2} = 1.648... ≤ v(P).

It is also easy to obtain a rough upper bound: One needs only to find numbers ȳ1, ȳ2 such that the straight line ȳ1 + s ȳ2 lies above the curve e^s throughout the interval [0,1]. (Draw a picture similar to Fig. 3.4.) This occurs e.g. for ȳ1 = 1, ȳ2 = 2. We get v(P) ≤ ȳ1 + (1/2)ȳ2 = 2. Hence we have arrived at the (not very good) bracketing

1.648 ≤ v(P) ≤ 2.

A better result is obtained by selecting q = 2. We then are faced with the equations (see (5)):

x1 + x2 = 1,
s1 x1 + s2 x2 = 1/2.

One possible solution is given by s1 = 0, s2 = 1, x1 = x2 = 1/2. From (6),

x1 e^{s1} + x2 e^{s2} = 1/2 + (1/2)e = 1.859... ≤ v(P).

(8) Exercise. Show that indeed v(P) = (1/2)(1+e) by determining a suitable upper bound.

(9) Example. Consider the linear program

Minimize 3y1 + y2

subject to the constraints of Example (8) in §3. We seek a lower bound for its optimal value. To obtain a representation (2) or (5) means that the vector c = (3,1)^T shall be written as a nonnegative linear combination of q columns of the matrix appearing in Example (8) in §3:

A = [ 1  -1  1  0 ]
    [ 1  -3  0  1 ]

Since c ∈ R², we take q = 2 and try at first to represent c as a nonnegative linear combination of the first two columns of A:

(1,1)^T x1 + (-1,-3)^T x2 = (3,1)^T.

The unique solution of this linear system of equations turns out to be x1 = 4, x2 = 1. From (6) we now get the lower bound 5 for the optimal value. (We had b = (2,-3,0,0)^T.) Determine graphically the optimal value and the solution of the linear program.

(10) Lemma. Let ȳ = (ȳ1,...,ȳn) be feasible for the problem (P). Assume also that the subset {s1,...,sq} of S and the nonnegative numbers x1,...,xq satisfy the assumption (2) of the duality lemma. If

(11)  Σ_{i=1}^q b(s_i)x_i = Σ_{r=1}^n c_r ȳ_r

is satisfied, then ȳ is an optimal solution to (P).

Proof: Since ȳ is feasible for (P) we have

v(P) ≤ Σ_{r=1}^n c_r ȳ_r.

On the other hand, from (11) and (6),

Σ_{r=1}^n c_r ȳ_r ≤ v(P).

The assertion follows.

(12) Linear programming. Consider now the particular problem

(LP)  Minimize c^T y subject to A^T y ≥ b,

where A has m column vectors a1, ..., am. In this case q ≤ m must hold, of course. Then every nonnegative solution x = (x1,...,xm)^T of the system

(13)  Ax = c

will give lower bounds for the value v(LP) of the form

(14)  b^T x ≤ v(LP).

Note that (13) can be written in the alternative form

c = Σ_{i=1}^m a_i x_i,

which corresponds to Equation (5), while (14) corresponds to the inequality (6).

A natural objective is to select the subset {s1,...,sq} and the nonnegative numbers x1,...,xq in order to maximize the lower bound for the value obtained from the duality lemma. We arrive at the

Dual problem (D): Determine a finite subset {s1,...,sq} ⊂ S and real numbers x1,...,xq such that the expression

(15)  Σ_{i=1}^q x_i b(s_i)

is maximized, subject to the constraints

(16)  Σ_{i=1}^q x_i a_r(s_i) = c_r,  r = 1,2,...,n,

(17)  x_i ≥ 0,  i = 1,2,...,q.

{s1,...,sq, x1,...,xq} is said to be feasible for (D) when s_i ∈ S, i = 1,2,...,q, and (16) and (17) hold.

The problem (D) appears to be very complicated since q, the number of "mass points", may be arbitrarily large. However, we will see in Chapter IV that q = n may be assumed in all problems of practical interest. (Then (D) is a nonlinear optimization problem with 2n variables.) But in our argument we shall start by allowing q to be arbitrarily large.

Denote by v(D) the value of (15) subject to (16) and (17). Then we conclude from the duality lemma (1) the

(18) Weak duality theorem. v(D) ≤ v(P).

The pair of problems (P) - (D) is called a dual pair. The transfer from the primal problem (P) to the dual problem (D) will be called dualization. The following reformulation of Lemma (10) will be useful when the results of the present section are applied to concrete problems.

(19) Lemma. Let y = (y1,...,yn)^T be feasible for (P) and {s1,...,sq, x1,...,xq} be feasible for (D). If

Σ_{i=1}^q b(s_i)x_i = Σ_{r=1}^n c_r y_r

holds, then y is a solution of (P) and {s1,...,sq, x1,...,xq} is a solution of (D).

(20) Complementary slackness lemma. Let y = (y1,...,yn)^T be feasible for (P) and {s1,...,sq, x1,...,xq} be feasible for (D). Assume also that the following relation holds:

(21)  x_i (Σ_{r=1}^n a_r(s_i)y_r - b(s_i)) = 0,  i = 1,...,q.

Then y is a solution of (P) and {s1,...,sq, x1,...,xq} is a solution of (D). Further, the values of (P) and (D) coincide.

Proof: In (21), x_i > 0 implies

Σ_{r=1}^n a_r(s_i)y_r = b(s_i),  i = 1,2,...,q.

Thus we have the following equation:

Σ_{i=1}^q b(s_i)x_i = Σ_{i=1}^q (Σ_{r=1}^n a_r(s_i)y_r)x_i = Σ_{r=1}^n (Σ_{i=1}^q a_r(s_i)x_i)y_r = Σ_{r=1}^n c_r y_r.

Here we have used the feasibility of {s1,...,sq, x1,...,xq}. The assertion now follows from Lemma (19).

(22) Example: Optimal production plan. In this subsection we return to the production model (2) in §1. There we considered n goods G1,...,Gn and the possible activities P_s (s ∈ S) which were described by the vectors

a(s) = (a_1(s),...,a_n(s))^T.

Here a_r(s) is a measure of the amount of good G_r which is consumed or produced when activity P_s is carried out with intensity 1. We had formulated an optimization problem (for maximization of profits) of the following form: Determine a finite subset {s1,...,sq} (q ≥ 1) of the index set S and real numbers {x1,...,xq} such that the expression

(23)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq

is maximized subject to the constraints

(24)  a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq ≤ c_r,  r = 1,...,n,

and

(25)  x_i ≥ 0,  i = 1,...,q.

In order to get an optimization problem of the type (D) we introduce slack variables ξ_r, r = 1,2,...,n. Then we write (24) - (25) in the following equivalent form

(26)  Σ_{i=1}^q a_r(s_i)x_i + ξ_r = c_r,  r = 1,2,...,n,

(27)  x_i ≥ 0 (i = 1,...,q),  ξ_r ≥ 0 (r = 1,...,n).

This may be interpreted as meaning that the activities P_s, s ∈ S, are supplemented with the so-called disposal activities P̄_r, r = 1,...,n.

(28) The corresponding primal problem. The maximization of the preference function (23) subject to the constraints (26), (27) is the dual of the following linear optimization problem:

(29)  Minimize Σ_{r=1}^n c_r y_r

subject to the constraints

(30)  Σ_{r=1}^n a_r(s)y_r ≥ b(s),  s ∈ S,

(31)  y_r ≥ 0,  r = 1,...,n.

The variables y1,...,yn of this primal problem may be interpreted as the prices of the goods G1,...,Gn, and the number

(32)  Σ_{r=1}^n a_r(s)y_r

indicates the cost which arises when the activity P_s (s ∈ S) is carried out with intensity 1. Thus a "price system" y1,...,yn is feasible (i.e. meets the conditions (30) - (31)) when all prices are nonnegative and when the cost (32) for no s ∈ S is below the revenue b(s) resulting when the activity P_s is carried out with unit intensity. The complementary slackness lemma (20) now assumes the following form:

(33) Let {s1,...,sq, x1,...,xq} be a feasible production plan with x_i > 0 for i = 1,...,q and let y be a feasible price vector. These production plans and price vectors are optimal if

(34)  Σ_{r=1}^n a_r(s_i)y_r = b(s_i),  i = 1,...,q,

and

(35)  y_r ξ_r = 0,  r = 1,...,n,

with

ξ_r = c_r - Σ_{i=1}^q a_r(s_i)x_i,  r = 1,...,n.

The conditions (34) and (35) admit an excellent economic interpretation: A feasible production plan and a feasible price vector are optimal if i) the cost per unit intensity of each activity P_s occurring in the production plan is equal to the corresponding revenue b(s), and if ii) the prices y_r of goods G_r which are not exhausted (i.e. ξ_r > 0) are zero.

By means of the tools developed in Chapter IV we will be able to give conditions which ensure that the problem (23) - (25) of finding an optimal production plan is solvable. We shall also demonstrate that there is then an optimal production plan involving at most n activities. This result is true even if there are arbitrarily many possible activities.

The study of production models of the same kind as, and similar to, that of problem (23) - (25) has greatly stimulated the development of linear programming. The whole theory of Chapter IV as well as the simplex algorithm of Chapter V can be motivated with concepts from economics. This is expounded in the book by Hildenbrand and Hildenbrand (1975) and the reader is referred to this text.

(36) Duality for linear programming. We now investigate the important special case of linear programming, i.e. when the index set S is finite, S = {1,2,...,m}. Then (P) takes the special form (see (9), §3):

(LP)  Minimize Σ_{r=1}^n c_r y_r subject to A^T y ≥ b.

We recall that the constraints of (LP) may be written in the form

a_i^T y ≥ b_i,  i = 1,...,m,

where a1,...,am are the column vectors of the matrix A, and

A = [ a11  a12  ...  a1m ]                    [ b1 ]
    [ a21  a22  ...  a2m ]  = [ a1 a2 ... am ],  b = [ b2 ]
    [ ................. ]                     [ .. ]
    [ an1  an2  ...  anm ]                    [ bm ]

In this case there are only finitely many vectors a_i (i = 1,...,m) and x_i = 0 is permitted by the constraints of the dual problem. Therefore we may put q = m from the outset and replace (16), (17) by

Σ_{i=1}^m a_i x_i = c,  x_i ≥ 0 for i = 1,...,m.

Using matrices we get with x = (x1,...,xm)^T

Ax = c,  x ≥ 0.

Therefore we define the dual linear program to be the optimization problem

(LD)  Maximize Σ_{i=1}^m b_i x_i = b^T x subject to Ax = c, x ≥ 0.

This is a problem with a linear preference function, linear equality constraints, and positivity requirements for all variables. It is a very important fact that problems of the type (LP) through simple transformations can be brought into the form (LD) and vice versa. This is not possible for general problems of the type (P) and (D).

(37) The transformation (LP) → (LD). A vector y ∈ R^n meets the constraints

A^T y ≥ b

of (LP) if and only if there is a vector z ∈ R^m such that

(38)  A^T y - z = b,  z ≥ 0.

(Such a z is called a slack vector.) This system of equations and inequalities to be satisfied by the vector (y,z) ∈ R^{n+m} does not have the same form as the constraints of (LD) since only some of the n+m variables, namely z1,...,zm, must be nonnegative. This is remedied by splitting up y in the following way. Consider the system

(39)  A^T y⁺ - A^T y⁻ - z = b,  y⁺ ≥ 0, y⁻ ≥ 0, z ≥ 0,

where y⁺ ∈ R^n, y⁻ ∈ R^n, z ∈ R^m. We show that (39) and (38) are equivalent. If y⁺, y⁻ and z satisfy (39), then the vectors y = y⁺ - y⁻ and z satisfy (38). To prove the converse note that every vector y ∈ R^n may be written

(40)  y = y⁺ - y⁻  with  y⁺ ≥ 0, y⁻ ≥ 0.

Thus from any solution (y,z) of (38) we may construct a solution (y⁺,y⁻,z) of (39). A representation (40) of y may be obtained by putting

(41)  y_r⁺ = max(y_r, 0),  y_r⁻ = -min(y_r, 0),  r = 1,...,n.

But the representation y = y⁺ - y⁻ is not the only possible one of the type (40). Let

(42)  y_r⁺ = max(y_r, 0) + α_r,  y_r⁻ = -min(y_r, 0) + α_r,

where the α_r are arbitrary nonnegative numbers. Then y = y⁺ - y⁻ is also a representation of the type (40) and it is easy to show that all representations of the type (40) may be constructed from (42). We observe now that

c^T y = c^T y⁺ - c^T y⁻

holds for all representations of the type (42). Therefore it follows that the program (LP) is equivalent to the following optimization problem of type (LD):

(L̂D)  Maximize -(c^T y⁺ - c^T y⁻) subject to

(A^T, -A^T, I_m) (y⁺, y⁻, z)^T = b,  (y⁺, y⁻, z)^T ≥ 0.

(43) The transformation (LD) → (L̂P). We rewrite the constraints of (LD),

Ax = c,  x ≥ 0,

in the equivalent form

Ax ≥ c,  -Ax ≥ -c,  x ≥ 0.

Then we obtain from (LD) the following optimization problem of type (LP):

(L̂P)  Minimize -b^T x subject to

[ A   ]       [ c   ]
[ -A  ] x  ≥  [ -c  ]
[ I_m ]       [ 0_m ]

(44) Definition. We define the double dualization of the linear program (LP) to be the following process: First the linear program (LP) is dualized, giving (LD). Then the transformation (43) (LD) → (L̂P) is carried out. Lastly, the linear program (L̂P) is dualized.

We see immediately that (L̂D) is the dual of (L̂P). But we have already shown that (L̂P) and (LD) are equivalent. Thus we arrive at the important result:

(45) Theorem. If the linear program (LP) undergoes a double dualization, an optimization problem equivalent to (LP) results.

(46) Exercise. Consider the two optimization problems

Minimize c^T y subject to Ay ≥ b, y ≥ 0, y ∈ R^n,

and

Maximize b^T x subject to A^T x ≤ c, x ≥ 0, x ∈ R^m.

In what sense can they be said to form a dual pair? Carry out suitable transformations which bring them into the form (LP) or (LD).

§5. STATE DIAGRAMS AND DUALITY GAPS

Using the simple weak duality theorem (18) of §4, we may immediately derive a first classification table for the dual pair (P) - (D). (Results of the type v(P) = v(D) are called strong duality theorems. They are given in Chapter IV.) We recall that every minimization problem of the type (P) must be in one and only one of the three states (see (7), §1)

(P)  IC (Inconsistent; there are no feasible vectors y. By definition we have v(P) = +∞.)
     B  (Bounded; there are feasible vectors y and v(P) is finite.)
     UB (Unbounded; there are feasible vectors y such that the preference function is arbitrarily small, i.e. v(P) = -∞.)

By the same token, the dual problem must be in one and only one of the three states indicated below. (Observe that (D) is a maximization problem.)

(D)  IC (Inconsistent: v(D) = -∞.)
     B  (Bounded: v(D) finite.)
     UB (Unbounded: v(D) = +∞.)

The statement of the duality theorem (18) of §4 may be represented by the state diagram below. Combinations of states of the dual pair (P) - (D) which are impossible by (18) of §4 are marked with a cross in the diagram. (The reader should verify that these combinations cannot occur.)

(1) State diagram for the dual pair (P) - (D).

         P:  IC   B   UB
  D: IC       1   2    4
     B        3   5    x
     UB       6   x    x

The Case 5 is of main interest for the applications. Then (P) and (D) are both bounded. This occurs when both problems are feasible.

It is possible to construct simple examples to demonstrate that all the Cases 1, 2, 3, 4, 5, and 6, which are not excluded by the weak duality theorem, do in fact occur in practice.

We will show later that the Cases 2 and 3 do not occur in linear programming, i.e. linear optimization problems of type (LP). It is often possible to introduce "reasonable" assumptions on general linear optimization problems in order to insure that Cases 2 and 3 do not materialize. We shall treat this topic in detail in Chapter IV. Nevertheless, we illustrate Cases 2 and 3 of the state diagram by means of two examples constructed for the purpose.

(2)
lustrate Cases 2 and 3 of the state diagram by means of two examples constructed for the purpose. (2)
Example.
Minimize
(P)
yl
n = 2, S = [0,1].
subject to the constraints
(P) has feasible vectors, for we may take all feasible vectors
y = (yl,y2)T
syl + s2y2 > s2,
yl = 0, y2 = 1.
must satisfy
yl > 0.
Furthermore,
This fact is
easily illustrated by means of a diagram similar to Fig. 3.4.
we get
v(P) = 0 and Problem (P) is hence in State B.
s E S.
Therefore
II.
32
WEAK DUALITY
The corresponding dual problem (D) reads q
Maximize
sixi i=l
subject to the constraints q
sixi = 1
(3)
sixi = 0
(4)
i=1
i=1
si E [0,1]
i = 1,...,q
for
xi>0
q > 1.
and
By (4), for
The inconsistency of (D) is shown as follows: we must have
= 0
x. i
or
s. I
= 0
x. > 0
since
I-
and
i = 1,...,q
s2 > 0.
i-
But then
(D) is therefore in State IC and we have thus
(3) cannot be satisfied.
an instance of Case 2 in diagram Cl). (5)
(P)
Since
n = 1, S = [0,1]
Example.
Minimize
0
s(sy - 1) > 0, each feasible
s2 y1 > s
sy1 - 1 > 0
subject to the constraints
yI
for all
s2y1 > s,
s E S.
must satisfy
yI
This is not possible for any number
s E [0,1].
yl,
implying that (P) is in State IC.
The dual problem is
q
sx i i
Maximize
q subject to the constraints
s?xi = 0,
(D)
s. E [0,1],
x. > 0, for
i = 1,...,q
(D) is feasible and for each permissible lows that
si = 0
or
xi = 0
for
(q > 1),
{s1,...,sq, xl,...,xq}
i = 1,...,q.
it fol-
Thus (D) is in State B,
hence we have an instance of Case 3 in diagram (1). We have already mentioned that we shall in Chapter IV establish theorems proving
v(P) = v(D)
Thus we will prove that
is true given certain general assumptions.
v(LP) = v(LD)
always holds for linear program-
ming if at least one of the problems is feasible.
However, at the end of
this section we shall give examples of linear optimization problems which are in Case 5 of the diagram (1); i.e. where both the primal and dual problems are bounded, but where
v(P)
and
v(D)
do not coincide.
5.
State Diagrams and Duality Gaps
(6)
Definition.
Let a dual pair (P)
33
-
(D) be given.
The number
6(P,D) = v(P) - v(D) We introduce here the convention
is called the defect.
for all real numbers
If
c.
6(P,D) > 0, we say that a duality gap has
occurred.
The following diagram gives the values of the defect corresponding to all states of the dual pair. the state diagram (1).
This diagram is obtained directly from
(The impossible states which are marked with a
cross in (1) are omitted.) (7)
Defect diagram. (P) (D)IC
B
UB
+m
+_
0
+_
d
IC
I
B
(8)
y1
stands for a
nonnegative number,
Consider the following problem of type (P):
Example.
Minimize
d
0
0
UB
Here
subject to the constraints
syl + s2 y2 > 0, yl
s E [0,1]
> -10.
Here it is natural to look upon the index set as consisting of two different subsets since the constraints are generated by the vector a(s) _ (s,s2)T,
s E [0,1],
a(2) = (1,0)T (The notation
a(2)
is chosen arbitrarily.)
The reader should verify
that the constraints of (P) may be written in the form
a(s)Ty > b(s), where
S = [0,1] U {2} and
sES
II.
34
WEAK DUALITY
s E [0,1]
0,
b(s) = s = 2.
-10,
In the formulation of the corresponding dual problem we encounter infinitely We may represent them in the "matrix"
a(s) E R2.
many column vectors (see also §3) 0
...
S
...
1
0
...
s2 ...
1
t
t
0
t
t
a(l) a(2)
a(s)
a(s)
1
s E [0,1] . The dual problem can now be formulated at once. imply that the vector
combination of the vectors
qcl
slj si
i=1
x
+
1
a(s), s E S:
'j -q = lj q
0
The constraints of (D)
can be represented as a nonnegative linear
(1,0)
,
x1
0
..,x q
>0
(9)
(10)
sl,.... sq-1 E (0,1].
The second of the two equations summarized in (9) is q-l
2
s ixi = 0.
i=1
Because of (10) we must therefore have Therefore
xi = 0
or
si = 0, i = 1,...,q-1.
is necessary in order to satisfy (9)
xq = 1
- (10).
But then
the value of the dual preference function becomes q
b(si)xi = -10. i=1
Thus we conclude
v(D) = -10. We now determine
v(P).
sy1 + s2 y2 > 0,
we get
yI > 0.
s E [0,1]
0
s E [0,1],
(sY1 + s2Y2 = s(Y1 + sy2)
implies
Therefore
Since
yI > 0.)
and
yl + sy2 > 0, all
5.
State Diagrams and Duality Gaps
We now note that every vector (P).
35
(0,N,2)T E R2
with
is optimal for
y2 > 0
Thus we conclude
v(P) = 0. We have thus shown that the dual pair (P)
-
(D) has the duality gap
d(P,D) = 10.
Here we have an instance of Case 5 of the state diagram (1) or the defect diagram (7) with fect
d
d = 10.
From this example we also realize that the de-
may be made arbitrarily large by appropriately choosing the con-
straints for (P). Exercise.
(11)
Minimize
Consider problem (7) of §4:
yl + 2 y2
subject to
yl + sy2 > es,
s E [0,1].
Show that both the primal problem and its dual are solvable and that no duality gap occurs.
Hint:
Use for the dual
q = 2
and
sl = 0, s2 = 1.
Up to now we have not studied the solvability of (P) and (D).
(12)
This matter will be discussed in Chapter IV in connection with duality theory.
Exercise.
(13)
Minimize
-y1
a)
Consider the linear optimization problem
subject to the constraints
-yI > -1
(P)
-syI - y2 > 0,
s = 1,2,3,...
Formulate the corresponding dual problem (D) and show that there is a duality gap b)
6(P,D) = 1.
Show that the problem (P) in a) is equivalent to the task:
Minimize
-yI
subject to
-y1 > 0 -YI - Y2 > 0.
Form the dual and show that no duality gap occurs. (14)
The example of the preceding exercise shows clearly
Remark.
that the dual (D) of a certain linear optimization problem (P) depends not only on the preference function and the set of feasible vectors but also on the formulation of (P), i.e. on the manner in which the set of feasible vectors is described through linear inequalities. (15)
equality
Exercise.
yl > 0
Consider again the Examples (2) and (5).
is added to the constraints of (P) in (2).
The in-
Show that
36
II.
WEAK DUALITY
the corresponding dual pair is an instance of Case 5 of (1) and that no duality gap occurs.
Analogously, the inequality
the constraints of Example (5).
0
yl > 1
is added to
Show that the duality gap now "disappears"
(Case 6).
The question now arises whether the duality gap, when it occurs, is caused by an "unfavorable" choice of inequalities n
ar(s)yr > b(s),
s E S,
r=1
to describe the set of feasible vectors of (P).
Is it possible that there
always is an equivalent system of inequalities n
r=1
2r(s)yr > b(s),
sES
describing the same set of vectors and such that no duality gap appears? The answer is yes.
The existence of an equivalent, but for the pur-
pose of duality theory "better", system of inequalities is demonstrated in a paper by Charnes, Cooper and Kortanek (1962). (1975).)
(See also Eckhardt
However, there are no simple methods to transform systems of in-
equalities to remove duality gaps. questions further.
Therefore we will not discuss these
Instead, we shall in Chapter IV give simple conditions
which insure that for a given linear optimization problem no duality gap occurs.
Chapter III
Applications of Weak Duality in Uniform Approximation
Uniform approximation of functions is one of the most important applications of linear optimization.
Both the theory and the computational
treatment of linear optimization problems have been greatly influenced by the development of the theory of approximation.
In the first section of this chapter the general problem of uniform approximation will be formulated as a linear optimization problem. corresponding dual is derived.
The
The rest of the chapter will be devoted
to the special case of polynomial approximation.
Some classical problems
which admit an exact solution in closed form are also studied.
§6.
UNIFORM APPROXIMATION Let
be an arbitrary set and
T
which is defined on T tions
v
r
T + R, r = 1,...,n
:
f: T + R
and bounded there.
a real-valued function
The real-valued bounded func-
are also given.
The problem of linear uniform approximation is to determine a linear combination n r=l
yrvr
which best approximates
f
in the sense that the following expression is
minimized:
n sup tET
I
I
yrvr(t) - f(t)1
r=l
37
38
III.
(1)
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
The problem of uniform approximation:
n
Minimize
sup tET
(PA)
yrvr(t) - f(t)
I
I
r=l
over all vectors
y = (y1,...,yn)T E R.
An equivalent formulation is Minimize
over all vectors
yn+1
(y,yn+l)T E R"1,
subject to the constraints
nn
all
yrvr(t) - f(t)l S Yn+1'
t E T.
r=1
We note that for real numbers Iml
a
and
g
the inequality
< a
is equivalent to the two inequalities -a > -B
a > -S Therefore the approximation problem (PA) may be rewritten in the following form:
Minimize n
r=1
yn+l
subject to the constraints
vr(t)yr + yn+l > f(t), all t E T
nn
r=1
(2)
all
vr(t)yr + yn+l > -f(t),
(3)
t E T.
(4)
This problem now has the form of a linear optimization problem (P) in Rn+l
provided the index set
an(s))T
are properly defined.
S
and the functions
a(s) = (al(s),...,
There are two different kinds of vectors
since the vectors
a(s)
I
vl(t) l
1
and
-vl(t) l
,
t E T,
correspond to the conditions (3) and (4) respectively.
(5)
The constraints
of the dual of the problem (2) - (4) imply that the vector
6.
Uniform Approximation
39
0 1
c =
E Rn+100
1
which appears in the preference function of (2), must be expressed as a nonnegative linear combination of finitely many of the vectors (5). Hence the dual problem corresponding to (2)
-
(4) takes the form (compare
with §4, (15) - (17)): {t+,...,t++}, {t...... t
Determine two subsets
q_
and real numbers
x1,...,x++
x1,...,x
q
+
f(t+)x± 1
i=1
-
1
f(t )x
i=1
1
T(q+ + q- > 1)
of
}
q
such that the expression
-
q (6) 1
is maximized, subject to the constraints q
q
+
+
vr(ti)xi
q+
+
r = 1,...,n,
(7)
q-
x + i=1
vr(ti)xi = 0,
x = 1, i=1
1
(8)
1
x. > 0,
i = 1,...,q ,
(9)
x. > 0,
i = 1,...,q
(10)
1 -
This dual problem can be written in an equivalent, but simpler form. (11)
The dual problem (DA).
(q > 1) and real numbers
Determine a subset
xl,x2,...,xq
{t1,.... tq}
of
T
such that the expression
q
f(ti)x
(12)
i=1
is maximized, subject to the constraints q
(13)
r = 1,...,n,
= 0, i=1 v r(t.)x. 1 1 Ixil < 1.
(14)
i=1 (15)
Lemma.
The optimization problems (6) - (10) and (12) - (14) are
equivalent in the following sense: +
+
x1x +, x1,...,x ...... q
q
}
For every
satisfying (7)
{tl,...,t
+
q
-
,
tl,.... t
(10) one may construct
q
40
III.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
{t1,...,tq, x1,...,xq}
satisfying (13), (14) such that the values of the
preference functions (6) and (12) coincide, and vice-versa. Let a solution of (7)
Proof:
sume that
(10) be given.
-
T+ = (tl,...,t++} put
q = q + +
We may as well as-
We discuss first the case when the sets
x+ > 0, xi > 0.
and T = (ti,...,t q q q_, (t1,...,tq}
= T+ U T-
Then we just
are disjoint.
}
and
xj,
if
ti = tj
for a
tj E T
I -X.,
if
ti = t.
for a
t. E T ,
It is easy to verify that (13), (14) are satisfied and that (6) and (12) have the same value.
In the remaining case when k, R
point in common, there are indices tk = t-,
with
xk
then we remove T
and
T
have a
min(xk,xR) = d > 0.
Then we replace
from
T+
such that
from
tk
xk - d
and
with
xi
T+, but if instead
xi - d.
xk-d = 0
If now
xi - d = 0, tk
is removed
This transformation does not change the value of the preference
.
function (6), and the equations (7),
(9), (10) continue to hold.
But in-
stead of (8) we get q x
X. <
+
1.
i=1
1
i=1
1 -
The sets
T
and
T
will become disjoint after a finite number of the
transformations described above and a suitable solution of (DA) is constructed by the procedure given earlier.
the assertion we let set
q
be feasible for (DA).
Now
= q, t1 = ti, i = 1,...,q, and
= q
x+ =
To verify the remaining part of
{tl,...,tq, x1,...,xq}
max(O,xi) _ (Ixil + xi)/2,
xi = -min(O,xi) _ (Ixij
- xi)/2,
i = 1,...,q.
The rest of the argument is straightforward.
Note that in order to
satisfy (8) it might be necessary to replace
x+
xi + c, where
c > 0
with
xi + c, xi
with
is chosen so that the condition (8) is met.
All duality results which have been derived for the dual pair (2) (4),
(6)
- (10) may be applied to the pair of problems (PA), (DA) from
(1) and (11) to give corresponding statements.
However, many of these
-
6.
Uniform Approximation
41
theorems may be shown directly for the pair (PA) - (DA).
This is true,
e.g. for the duality lemma which could be based on (1) of §4:
numbers
Let the finite subset
Lemma.
(16)
xi,...,xq
(tl,...,t } a T 4
and the real
be such that
q
r = 1,...,n
vr(ti)xi = 0,
(17)
i=1
q L
i=l
IxiI < 1.
(18)
-
Then the following relation holds for any
q
y E Rn:
n
i=1
(19)
Yrvr(t) - f(t)I.
f(t)x1 < suPI I tET r=1
Proof:
From (17) we conclude
yrvr(ti))xi = 0. G ( 1 i=1 r=1 Thus q
n
q f(t1.)x1 .
_
i=1
((
Sf(t 1.)
-
i=1
I
yrvr (t
i)}x.
1
r=1 n
tq
If(ti) -
L
Yrvr(ti)I
Ixil
r=1
i=1
q
n
< suplf(t) tET
`i
r=l
n
< sup I f (t) tET
I Ix.I
Yr vr (t) I
i=l
1
I
I yrvr (t) r=1
which is the desired result. (20)
Show that the left hand side of (19) may be replaced
Exercise.
by q q I
L
f(ti)xiI
i=l (21)
Remark.
If
any choice of elements
q > n+l, then (17) has a nontrivial solution for tl,...,tq
in
T.
underdetermined linear system of equations
Indeed, (17) then gives the
42
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
v1(t1) ... vl (t q) v2(tl) ... v2(tq)
l vn(tl)
vn(tq)
22
0
2
t 0 j q
J
and setting
x =
Ixii)-1X,
(
(22)
i=1
the vector
now meets the constraints (17), (18) of (DA).
x E Rq Example.
(23)
The function
mated by a straight line
f(t) = et
y1 + y2t
is to be uniformly approxi-
over the interval
T = (-1,1].
Thus
we need to solve the problem: sup let - yl - y2tl. tET
Minimize (yl,y2)
We want to apply Lemma (16).
We select
q = 3
and set
tl = -1, t2 = 0,
The system of equations (17) then becomes
t3 = 1.
xl + x2 + -X1
3=0
3 = 0.
+
The general solution of this system is given by
21 = a X2 = -2a X3=a
a
where
is arbitrary.
The "normalization" (22) gives
x = (4, -2, 4)T,
which together with
t1 = -1, t2 = 0, t3 = 1
Thus we may conclude from (16) that if
(DA).
straight line over the interval 1
we -1 -
1
1
+
meets the constraints of et
is approximated by a
[-1,1], then the error will be at least
Z 0.27.
An upper bound for the smallest possible approximation error is obtained by taking
6.
Uniform Approximation
43
yl + y2t = 1.36 + t.
Then sup
let - 1.36 - tj
= 0.36.
tE[-1,1] The function
Exercise.
(24)
approximated over the interval
f(t) = 1/(2+t)
[-1,1]
is to be uniformly
by a straight line
y1 + y2t.
Determine a lower bound for the value of the corresponding approximation problem by proceeding as in (23). puts
t1 = -1, t2 = 0, t3 = 1.) x1, x2, x3
for
optimally for (DA).
q = 3
Hint:
t2 = T.
xl, x2, x3
and
One gets the same linear system (The lower bound is
Consider the same example as in (24) with
t1 = -1, t3 = 1, but set
Let
Hint:
as in the preceding example.
Exercise.
(25)
(Thus one selects again
Then try to determine and
T
0.083.)
q = 3.
t2 = T
become the variables of
the following optimization problem:
Maximize
1+T + 3
xl +
subject to the constraints
xl+x2+x3= 0, (26)
-xl + TX2 + x3 = 0, 1x11 + 1x21 + Ix3l = 1,
(27)
-1 < T < 1.
(28)
Assume that
and
xl
are positive and
x3
xl - x2 + x3 = 1.
c omes
press
x1, x2
and
x3
x2
negative.
Then (27) be-
This relation is used together with (26) to exas (linear) functions of
T.
We then enter these
expressions into the preference function and maximize with respect to This gives the lower bound
T.
0.0893.
The following simple lemma may be useful when one wants to show that a certain vector
y
is an optimal solution of (PA).
An illustrative
example is given in (31). (29)
and
Lemma.
Let
{t1....It q
,
q > 1, satisfy q
vr(ti)xi = 0, i=1 q
I 1xil = 1. i=l
r = 1,...,n,
xl,... x }, where q
ti E T, i = 1,...,q,
44
Let
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
y E Rn
and define
n
yn+l = sup f Ct) - I yrvr(t) tET
r=1
Assume also that the following relations hold for
i = 1,...,q:
Either
x.I = 0 or n YrvrCti) = yn+1 sgn xi
f(ti) -
where
sgn xi = xi/Ixil.
(30)
I r=1
is an optimal solution of (DA)
Then we may assert: {t1,...ItgI x11 ...,xq} and
of (PA), and the values of (PA) and (DA) coincide.
y
Proof:
f(t i)x
i=l
=
=
I y(
I f(t i )xi - r=l r i=1 vr (t i )x. L i i=l I i=l
f(ti) -
I Yrvr (ti)}x
i.
r=1
Applying (30) we get
q
qqC
f(ti)xi = Yn+1
iLl xi
qqC
sgn(xi) = Yn+l iLllxil n
= suplf(t) - I yrvr(t)I. tET r=l
The statement now follows from Lemma (16). (31)
Example.
The function
mated over the interval
[0,2]
f(t) = t2
is to be uniformly approxi-
with a linear combination of the functions
v1(t) = t, v2(t) = exp(t). Andreasson and Watson (1976) give as the solution of this approximation problem the following coefficients
of
vl
and
v2:
We want to use Lemma (29) to verify that these values of
yl
and
y2
yl = 0.18423256,
y1
and
y2
y2 = 0.41863122.
are optimal (within the precision shown).
One first establishes that the
error function
t2
- ylt - y2 exp(t)
assumes its minimum and maximum values at t2 = 2.00000000:
tl = 0.40637574
and
6.
Uniform Approximation
ti 2
- y1t1 - y2 exp(t1) = -0.53824531,
2
t2
45
- y1t2 - y2
exp(t2) =
0.53824531.
The dual constraints from (29) read (with
q = 2)
tixl + t2x2 = 0,
exp(tI)x1 + exp(t2)x2 = 0, Ix1I + Ix2I = 1.
We put
sgn x1 = 1
and
Then two of the
sgn x2 = -1.
equations above
become tlxl + t2x2 = 0,
-xl + x2 = 1. tI = 0.40637574
Entering
x1 = -0.83112540
and
and
into these equations we obtain
t2 = 2
It is now easy to check that all
x2 = 0.16887459.
conditions of Lemma (29) are met.
Thus the proposed solution is indeed
optimal.
We conclude this section by showing that the approximation problem is solvable under fairly general conditions. Theorem.
(32)
that the functions on
T.
T c Rk
be nonempty and compact and assume also
f, v1,...,vn
are continuous and linearly independent
Let
Then the linear approximation problem (PA) is solvable; i.e. there
is a vector
y E Rn
max If(t) tET
such that
n - Iy v (t)I = min max If(t) r=l
r r
yERn tET
I
r=l
y v (t)I. r r
We may write "max" instead of "sup" in the formulation of
Note.
(PA) since the functions
f, v1,...,vn
are continuous and
and hence the error function n
y v
f
r r
rI l
assumes its maximum and its minimum. Proof:
We define a norm on
n IIyNIv = maxi I yrvr(t)I tET r=l
Putting
n -
y = 0
we get
Rn
by
T
is compact
46
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
n
max If(t) -
E
r=1
tET
y v (t) l = max lf(t) l
rr
= A.
tET
Hence the optimum value of (PA) lies in the interval
[O,A].
the minimization we need only to consider those vectors
y
Because of which satisfy
n
max If(t) - I yrvr(t)l < A. tET r=1
(33)
Using the triangle inequality on (33) we find
n
n
II Yrvr(t)l < If(t) r=1
Yrvr(t)l + lf(t)l `_ 2A.
E
r=1
Thus we need only to minimize over those vectors
y E Rn
such that
IIYlly < 2A;
i.e. a compact subset of
Rn.
Since the preference function of (PA),
n
y -+ max If(t) - I yrvr(t)l, tET r=1
is continuous, the existence of an optimal solution follows by Weierstrass' theorem (see (13), 52).
V.
POLYNOMIAL APPROXIMATION This section is devoted to the study of (PA) in the case when
a real interval and the function nomial.
f
is
T
is
to be approximated by a poly-
Then major simplifications are possible and one can, for example,
calculate lower bounds for the error of the best approximation without treating the dual problem explicitly.
Some special approximation prob-
lems admitting an optimal solution in closed form are also treated.
We
now prove: (1)
Lemma.
(x1,. ..,xn+l)
Let
tI < t2 < ... < to+l
be fixed real numbers and let
be a nontrivial solution of the homogeneous linear system
of equations n+l
r 1 (2)
i=1
Then xi xi+l < 0,
i = 1,...,n.
7.
Polynomial Approximation
Pn
the uniquely determined polynomial
Proof:
Let
be a fixed integer such that
i
Yrt
Pn(t) _
47
1 < i < n.
Denote by
r-1
r=1
satisfying
j =i
1,
Pn(t.)
D,
(See Fig. 7.1.)
= 1,...,n+1,
j
That such a
i,
j
j +
does exist is an immediate consequence
Pn
of the fact that the so-called Vandermonde matrix is nonsingular. (3) below.)
From (2),
P (t.)xi _ i=1
n+l
n
n+l n
(See
i
y
L
L
r i-1
r=l
Pn
Due to the construction of
tr-1 X. = 0. i
i
this relation gives
xi + Pn(ti+l)xi+l = 0. Pn
cannot vanish in
[ti,ti+l ]; if it did, Pn
Therefore
which is impossible. (3)
Exercise.
Vandermonde matrix
Let
V
Pn(ti+l) > 0
t1 < t2 < ... < to
by
1
Fig. 7.1
would have
n
and we conclude be given.
zeros,
xixi+l < 0.
Define the
48
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
1
tI
t2
...
to
t2
t2
...
t2
2
1
n
V(tl, ..,tn) = to-1
to-1
to-1 .
.
.
n
2
1
It can be shown that det V(tl,...,tn) > 0.
(4)
Use (2) to obtain the expression det V(t1,...,ti-1, ti+1"* 'Itn+l) xi = -xi+1
et V t1,...,ti-I, ti, ti+2....,tn+1
This combined with (4) gives an alternative proof of Lemma (1). We remark here that a result corresponding to Lemma (1) may be established not only for tems
l,t,...,tn-1, but also for general Chebyshev sys-
The theorems to follow which depend on Lemma (1) can
v1,...,vn.
also be generalized.
See Chapter VIII.
The following theorem, which is due to De La Vallee-Poussin, is important since it can be used for calculating lower bounds for the error of the best possible approximation without solving the linear system (2) explicitly. (S)
of degree
Theorem.
Let
< n, and let
f
be continuous on
(a,s], let
a < tI < t2 <...< to+1 < S
be
P
n
be a polynomial
points such
that {f(ti) - P(ti)}.{f(ti+1) - P(ti+l)} < 0,
(See Fig. 7.2.)
min I f (ti)
i
i = 1,...,n.
(6)
Then
-P
)I < An <
max I f (t) -
a
P(t)j,
(7)
where n
An = infl max If(t) - I yER a
A
n
yrtr-ll.
denotes the smallest error which can be achieved when
approximated by polynomials of degree
< n.
f
is
7.
Polynomial Approximation
49
degree P < n = 3 n + 1 = 4
t3
t2
t1
Fig. 7.2
Proof:
The right-hand inequality in (7) is obvious.
Let
pl' " "pn+l
be a nontrivial solution of the system n+1 E
r-1 ti Pi = 0,
r = 1,...,n.
i=1
By Lemma (1) we may assume
pipi+l < 0, i = 1,...,n.
Now put
n+l
xi = Pi{ El I pjI }-1. In this way we get a feasible solution to the dual problem since n+l
tr lx. = 0,
r = 1, ..,n, (8)
n+l G
lxil = 1.
i=1
By (16) of §6 (weak duality) we also have n+l
f(ti)xi < An
(9)
i=1
We now define 6i = f(ti) - P(ti); by assumption (6), 6i6i+l < 0.
If the signs of all numbers
xi
changed simultaneously, the constraints of (DA) are still met.
are
Therefore
50
III.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
we can always achieve X161 > 0
(10)
since we also have n+l
n+l
n+l
+ L
i=1
Applying (8) and (10) we find that
xixi+1 < 0.
xilf(ti) - P(ti)I
E
f(ti)Xi
X
xi6i
i=1
i=1
n+ > min I6iI
Ix.
E
= minlf(ti) - P(ti)I.
i=1
I
i
An application of (9) now gives the desired result. Corollary.
(11)
that there are
n+l
Let
P
points
be a polynomial of degree a < tI < t2 <...< to+l <-a
< n
and such
with the proper-
ties
I6iI = If(ti) - P(ti)I = and
Then
6 .6 1.+1 < 0,
i = 1,...,n+l, (12)
i = 1,...,n.
1
is a polynomial of degree
P
the uniform norm. f - P
max If(t) - P(t)I, a
< n
which best approximates
f
in
The conditions (12) state that the error function
alternates in sign at
t1,...,tn+1
and assumes its largest ab-
solute value at these points. Remark.
(13)
In the special case when 1611=1621=...=16n+ll'
we get n+l
f(ti)xi = minlf(ti) - P(ti)I i=1
i
Hence (7) and (9) give the same lower bound for the attainable approximation error in this case.
We shall show in Chapter IV that a strong dual-
ity theorem can be established for the dual pair (PA) and (DA); i.e. no duality gap occurs.
This entails the use of Theorem (5) for constructing
arbitrarily good lower bounds for
An
by choosing
suitably. (14)
Determination of a polynomial satisfying (6).
t2 <...< to+1 < a
be given.
Define the function
6
a < tI <
Let
by
6(ti)
We now seek a polynomial
P
of degree
< n
and a constant
a
such that
7.
Polynomial Approximation
51
i = 1,...,n+l.
PCti) = f(ti) + ed(ti),
(15) is a linear system of equations with as unknowns.
(15)
and the coefficients of
a
Using (4) it is easy to demonstrate that
and
P
P
are
c
uniquely determined by (15). and
P
scheme.
a
are efficiently calculated using a so-called difference
(We assume that divided differences are familiar to the reader.
Otherwise see e.g. Dahlquist and Bjorck, (1974), p. 277.)
Since
P[tl,...,tn+1] = 0, (15) gives at once e = -f[t1,...,tn+1]/5[tl,...,tn+l],
where we use the customary notations for divided differences.
P
may be
represented in the "Newton" form n-1
P(t) = P[t1] + P[t1,t2](t-t1) +...+ P[t1,t2,...,tn]
II (t-ti).
i=1
The divided differences appearing in this formula are easily computed from the intermediate results obtained when calculating (5),
lei
is a lower bound for
(16)
Numerical example.
tl = 0, t2 = 1/2, and
ti 0
f(ti)
t3 = 1.
f[ti,ti+l]
Let
[a,$] = [0,1], f(t) _ (l+t)-1, n = 2,
The difference schemes for f[t1,t2,t3]
1/3
(17)
t2 E (0,1)
are:
6[tl,t2,t3]
-8
1
-4
1/2
-1
e = 1/24; i.e. the function
mated in the uniform norm over less than
6
4
2/3
We get at once
and
-1
1
-1/3 1
f
d[ti,ti+l]
8(ti)
-2/3 1/2
By Theorem
c.
An.
[0,1]
1/(l+t)
cannot be approxi-
by a straight line with an error
1/24 '= 0.0417.
Exercise.
Take
t1 = 0, t3 = 1, and show by optimizing over
(see also exercise (25), §6) that
A
2
= (3-j)/4 z 0.0429.
We now discuss some special approximation problems which nevertheless are of general interest.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
52
(18)
Exercise.
f
Let
have two continuous derivatives on Denote by
f"(t) > 0, t E [a,s].
and be such that
which interpolates
a
at the endpoints
f
and
t
[a,8]
the straight line Put
$.
6= max If(t) - k(t) 1. a
Next use (11) to show that the straight line which approximates
f
best
in the uniform norm has the representation
k(t) - 6/2 and that the approximation error is (19)
Exercise.
t1
=
a+S 2
-
and
tl
in (18) and show that the straight
f(t) = t2
Put
line which best approximates this function at
6/2.
in the uniform norm interpolates
f(t) = t2
t2, where
1 2T (0-')'
a+S
t2 =
2
+
2T
Show also that the approximation error is
8 (a-B)2. We will next treat the more general problem of approximating in the uniform norm by a polynomial of degree
f(t) = to
In order to represent
< n.
the solution in a concise form we introduce the Chebyshev polynomials. (20)
Definition.
The Chebyshev polynomials
TO,T1,...
are defined
through
T0(t) = 1,
T1(t) = t (21)
TnCt) = 2t Tn-1(t) - Tn-2(t), (22)
n = 2,3,...
Show that the recurrence relation (21) is satisfied
Exercise.
by
(23)
Tn(t) = cos(n arccos t). Hint:
Use the addition theorem (24)
cos(A+B) = 2 cos A cos B - cos (A-B). We now prove: (25)
Theorem.
Let
[a,8]
be a given interval.
n
min max t...... t_ a
I
II
i=l
(t-ti)I = 2(,_a)n/4n
Then
7.
Polynomial Approximation
53
The minimum is assumed for
ti =a+8 2 +
2
8-a cos 9i, where
Bi =
i-1/2 n
i = 1,...,n.
7r,
(26)
Also,
2(8-a)n T (2t-a-0 ). 4n n 8-a
T1
i=1
i
Proof:
Consider the approximation problem n
max stn -
Minimize
tE [a, 8]
yERn
We next determine
yrtr-ll
I
(27)
r=1
y
through the condition
n to
= Qn(t), ytr-1 r
E
(28)
r=1
where
Qn (t) = 2 (8-a)n T (2t-a-8) 4n
n
(29)
8-a
and apply (11) to verify that
y
is a solution of (27).
We first note that 2t-a-$ 8-a
t
maps
on
[a,8]
[-1,1].
Using the recurrence relation (21) for
verify that the coefficient of that
ITn(t)l < 1
and thus
to
in
Q.
is
1.
IQn(t)I < 2(8-a)n/4n.
Tn, we
By (23) we conclude
We also find that
Qn(ti) = (-1)i-12(8-a)n/4n, where 2+8
t* = 1
and
2
IQn(t)j
+
8-a
cos
2
(i-1)"
n
,
i = 1,
assumes its maximum value at
, n+l ti.
(30)
,
Hence the conclusion
follows from (11).
Using (28) and (29) we conclude that the polynomial < n
which best approximates
to
in the uniform norm on
Pn
of degree
[a,8]
is given
by P (t) = to _ 2(8-a)n T (2t-a-8). n
(Note that
vanishes.)
4n
n
8-a
when the right hand side is expanded the coefficient of
(31)
to
54
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
(32)
Exercise. -N-1
ti = cos
ii,
Let
i = 1,...,N+1,
and
si = cos 1 N/2,
i = 1,...,N.
Show that the Chebyshev polynomials satisfy the following orthogonality relations: 1
N N
I Tm(si)Tn(si)
I 'Tm(ti)Tn(ti)
m = n = 0
0<m=n
j 1/2,
i=1
1N+1
,
=
i1
0
,
1
,
m = n = 0
1/2,
JI
0
,
(33)
m#n, m
0<m=n
(34)
m#n, m
Here the notation " means that the factor of the first and the last term in (34).
1/2
should be placed in front
Note also that
TN(ti)
and TN(si) = 0. We next treat an approximation problem which sometimes occurs in the study of iteration processes in numerical linear algebra. (35)
Theorem.
Let
[a,8]
be a bounded interval such that
0 ¢ (a,6].
Consider the problem max IP(t)I a
Minimize
(36)
< n
over all polynomials of degree
such that
P (O) = 1.
(37)
The optimal solution is given by P(t) = Tn(2tt=a-B)/T.(a±s)
Proof:
We can write
P(t) = 1 - ylt - y2t2 since
P(O) = 1.
min y1,...,yn
P
(38)
in the form yntn
(39)
The problem (36), (37) may then be written max 11 - yit -...- Yntnl a
(40)
7.
Polynomial Approximation
55
and we recognize (40) as an instance of (PA) in (1) of §6. determine subsets x1,...,xq
Its dual reads:
(q > 1), and real numbers
{ti,.... tq} c :[a,01
such that
q xi
(41)
i=1
is maximized subject to the constraints q
xi ti
= 0,
1
r = 2,...,n+l
(42)
i=1
(43)
Ixil < 1. i=1
See (11) of §6. (40) and (41)
-
We shall construct feasible solutions to the two problems (43), and then use (29) of §6 to verify that these solu-
tions are optimal. In (41) ti =
SZa +
(43) we put
Ba z
q = n+l,
cos 6i,
xl = 2n Tn(cos 01)'
0.
=
il 2n
xn+1
i = 1,...,n+l
iT,
Tn(cos 6n+1)
(44)
Tn(cos 6i = 2,...,n.
xi = n
Condition (42) is now met by (33) since we may express combination of We observe that 'l,...,yn
Since
T0....,Tr-1. P
ITn(ti)I = 1,
in (38) is of the form (39).
by (39) for this particular polynomial
tr-1
as a linear
(43) is also satisfied.
Next we define P.
By (38),
P(ti) _ (-1)1-1/Tn(a±0)
Now (44) gives (-1)i-1
xl _
1
2n'
xn+1
(-1)n 2n
'
2,...,n.
n
xi
Hence (30) of §6 is also met, establishing optimality of the polynomial (38).
We next discuss the problem of constructing polynomials of degree < n
which approximate a function
approaches are conceivable: determine the polynomial P(ti) = f(ti),
P
f
on a bounded interval
i) select
n
of degree
< n
i = 1,2,.-.,n,
points
[a,0].
t1 < t2 <...< t
n
Two and
satisfying (45)
56
select
ii)
Q
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
n+l
of degree
points
sI < s2 <...< sn+l
degree
min max If(t) - Q(t)I, Q sl,...ISn+l
< n
of degree
Q < n.
(46)
Show that there is one and only one polynomial
Exercise.
(47)
and determine the polynomial
which solves the problem
< n
satisfying (45).
Hint:
tions which must be satisfied by the coefficients of The construction of verify that cide with
is described in (14).
Q
interpolates
Q
t1..... tn
f
in
n
P
Derive a linear system of equaP.
It is now easy to
points which generally do not coin-
We next state an expression for the ap-
in (45).
proximation error.
n
be a closed bounded interval and let
[a,s]
n points with
be
t1,...,tn have
Let
Lemma.
(48)
a < tI < t2 <...< to < a.
continuous derivatives
the polynomial of degree
< n
f',...,f(n)
on
satisfying (45).
[a,$]
Further, let
f
and denote by
P
Then
f(t) = P(t) + R(t),
(49)
where n
(n)
I
R(t) = nf with the unknown point the points
and
t
(t-ti),
TI
i=1
(50)
lying in a subinterval of
E
t1,...,tn.
depends on
In general,
containing
[a,R] t.
The proof of this result is given in Dahlquist-Bjorck (1974), p. 100. Using (49) and (50) we get n
If(t) - P(t) I _ n; If(n) (t) I
amaxsI1IT
(t-t) I
(51)
.
The approximation error is thus bounded by an expression containing a factor which is independent of
A natural approach is to make this
f.
second factor as small as possible in the uniform norm.
We may here di-
rectly apply Theorem (25) to determine the appropriate choice of tl,.... tn
ti =
To select
in (45); namely,
a26
si
interpolates
+
-S -_a
cos LIZ-2 n,
i = 1,2,...,n.
in (46) we argue as follows. f
at the points
t
,
1
n t. i=1(t-1)
_
2t-a-S
2(8-a)n T 4n
n(
0-a )'
of (26).
We assume that Then
(52)
P
in (49)
7.
Polynomial Approximation
57
The maximum of the absolute value of this function is assumed at S. =
azs
Sz-a
+
cos
(1
n )
(53)
i = 1,...,n+1,
and these points are entered into (46). (54)
f(t) = to degree
Exercise.
Consider again the problem of approximating
over a closed bounded interval
< n.
(a,B)
by a polynomial of
Verify that the two approaches i) and ii) above give the
same results, if we select
ti
in (45) according to (52) and
s i
in
(46) according to (53). (55)
Exercise.
Assume again that
in (45) is given by (52) and
t i
si
in (46) by (53).
We determine the polynomials
P
and
Q
in the form
n-1 c qrl
r
Q =
E
drgr,
r= O
where
2t-a-p 4r(t) = T r(
B-a
),
r = 0,...,n-1.
Use the orthogonality relations (33) and (34) to derive expressions for the coefficients
cr
and
dr.
Show also that the number
c
of (15)
is given by n+1
(-1)1 lf(s
n
i=1
thus obtaining a lower bound for the achievable approximation error.
Chapter IV
Duality Theory
A major topic of this chapter is the derivation of "strong" duality results, i.e. theorems which specify when
v(D) = v(P).
Another important
topic is the existence of solutions to the problems (P) and (D).
We shall
give two strong duality theorems, namely (9) of §10 and (7) of §11.
They
can be used to verify strong duality in most linear optimization problems occurring in practice.
§8 is of independent interest since it gives a geometric representation of the dual problem (D) that also is helpful for the understanding of the numerical procedures to be described in Chapter V, VI, and VII.
§8.
GEOMETRIC INTERPRETATION OF THE DUAL PROBLEM At first we introduce the concept of a convex set and the special
case of a convex cone.
Their elementary properties are discussed.
A very
simple geometric representation of the dual problem will be given. (1)
The set
Definition.
following property: line segment between
a1 E K,
a1 a1
and and
K c Rn a2
is said to be convex if it has the
belonging to
a2
lies in
K.
implies that the entire
K
This may be written:
a2 E K
Aa1 + (1-A)a2 E K,
By induction on
q
A E [0,1].
we easily establish that if
al,...,aq E K. then q
A.a. E K
i=111
58
K
is convex and
8.
Geometric Interpretation of the Dual Problem
59
convex
nonconvex
Fig. 8.1
if
A l + A 2+ ... + A q= 1 and ai > 0,
i = 1,...,q.
See Fig. 8.1. Definition.
(2)
Let
vectors
x E Rn
q
A be an arbitrary set of vectors in
We
Rn.
Conv (A), to be the set of all
A, denoted
define the convex hull of
admitting a representation
aiai (q > 1)
x= i=1 where q
ai = 1
i=1 and A.
> 0,
i = 1,...,q,
Thus the convex hull of
ai E A,
i = 1,...,q.
A, Conv (A), consists of all possible convex
combinations
q
x=
i=1
A.a., 1 1
A. 1-> 0,
q
of finitely many vectors from large.
A. = 1,
i=1 1 A.
q>1
The number
(3)
q
can be arbitrarily
The verification of the following statements is straightforward:
Conv (A) is convex for any set
A; a convex set which contains
A must
contain all convex combinations (3); Conv (A) is the smallest convex set having
A
as a subset.
See also Fig. 8.2.
IV.
60
DUALITY THEORY
Fig. 8.2
Fig. 8.3
(4)
that if
A convex cone is a convex set with the property
Definition. then
x E C
where
form a convex cone of the convex set
Let
K
be
(5)
which we shall denote
C
Let
A
be an arbitrary subset of and the notation
for the conic hull of the convex hull of
CC(A)
Instead of
We shall
Cone (Cony (A)) Cone (Cony (A))
by forming all nonnegative multiples of all
convex combinations (3) of elements of
z =
A.
Rn.
CC(A).
we shall sometimes write By (5), we obtain
Cone (K) is
K.
use the word convex conic hull of A
CC(A) = {z
Cone (K), the conic hull
It is straightforward to verify that
K.
Definition.
L
xiai,
xi > 0,
i=1
ai E A,
See Fig. 8.3.
A> 0
x E K,
the smallest convex cone containing (6)
A > 0.
for all
Then all vectors
a convex set.
y = Ax
Ax E C
i = 1,...,q,
A.
i = 1,...,4, (7)
q > 1}.
B.
Geometric Znterpretatign of the Dual Problem
61
Fig. 8.4
Thus the convex conic hull consists of all nonnegative linear combinations of elements of the set
We shall apply the concepts introduced above
A.
to the set of vectors which occurs by the formulation in §4 of the dual pair (P)
-
The constraints of the primal problem
(D).
a(s)Ty > b(s),
s E S,
can be expressed in terms of the set of vectors
AS = {a(s)
I
s E S} c Rn.
(8)
Combining (16) of §4 and (17) of §4 with (7) we find that Xi.... x
}
{s1,...Isq,
is feasible for the dual problem if and only if the vector
c
q
may be written as a nonnegative linear combination of the vectors a(s1),...,a(s
q)
x1,...,x
with coefficients
solutions if and only if Since the convex cone
c
lies in CC(AS)
q.
Thus (D) has feasible
CC(AS).
will play a major role in our presenta-
tion we shall introduce a special notation. (9)
Definition.
The convex conic hull of
AS
will be denoted
Mn
and called the moment cone of the optimization problem (P), Mn = CC(AS).
The words "moment cone" are traditional and will not be elaborated upon. From the remarks preceding the last definition we get the following statement:
62
DUALITY THEORY
IV.
1
Fig. 8.5
The dual problem (D) is feasible if and only if
Lemma.
(10)
c E Mn Example.
(12)
s
a(s) _ The sets
[:2
,
Put
n = 2, S =
s E [0,1].
AS, Conv (AS), and
Consider now the Minimize
and
[0,1]
Mn = CC(AS)
are indicated in Fig. 8.5.
optimization problem:
yI + 2 y2
subject to
(P) syl + s2y2 > es - 1,
Here
c = (1,1/2)T.
s E [0,1].
We see from Fig. 8.S that
the dual (D) of (P) above is feasible.
(Exercise:
megative combination of two suitable vectors (13)
Exercise.
c
a(sI)
is in
Mn
express
and
and that c
as a non-
a(s2)!).
Consider the same example as in (12) but with the
modification Minimize
yl.
Is the corresponding dual feasible? We have hitherto permitted an arbitrarily large natural number in the representation (7) of the convex hull.
set of a finite-dimensional vector space
Rn
However, CC(A)
q
is a sub-
and one might conjecture
Problem
Geometric Interpretation of t -e
8.
q = n
that at most
tion of a vector
vectors
from
a. i
A
are required for the representa-
(Try some simple examples!)
CC(A).
in
z
63
We now prove a general statement to this effect. Reduction Theorem.
(14)
Let the vector
negative linear combination of the
i
vectors
q
(p > 1) be a non-
z E Rp
z1,.... z
RP (q > 1),
in 9
e. q i = 1,...,q.
xi > 0,
xizi,
z =
(15)
i=1
Then
admits a representation
z
q
z =
x .zi.,
I
x
i
i=1
i = 1,...,q.
> 0,
(16)
i
such that at most set of vectors
.
of the numbers
p
with
z.
xi
x. > 0, (z.
are nonzero and such that the x. > 0}, is linearly independent.
I
The proof is constructive and we will show how to arrive at a
Proof:
representation (16) from (15) by means of finitely many arithmetic operaalready are linearly independent then
zl,...,zq
tions. If the numbers
q < p
and
are uniquely determined by (15) and (16), and there
x. = x,
is nothing more to prove.
We assume therefore that
z1,...,zq
are lin-
We will demonstrate how to reduce the number of positive
aely dependent.
terms in (15) step by step until the corresponding vectors become linearly
The linear dependence of z1-.,z
independent.
means that there are q
numbers
a,,...,aq
such that
q a
i
i=1
z
= 0.
i
(17)
Hence we have for each
with
r
ar
i 0
a.
z
r
1
=
Z1 .
.
itr ar
Entering this relation into (15) we get q Z = i=1 i+r
a.
(xi - xr al)z.. r
(18)
Hence we have got a representation of of the vectors
zl,...,z
q.
z
as a linear combination of
We must now also show that
r
so that (18) becomes a nonnegative linear combination, i.e.
q-1
can be chosen
64
a.
x. - x 1 > 0,
r aT -
1
We now select
r
i
such that
ar > 0.
in (17) are nonpositive we multiply (17) by
ar
(If all
x. > 0
a
and
1 -
r
> 0
-1.)
Since
we conclude that
a.
xi - xr al > 0
if
ai < 0.
r
The condition (19) is thus met when
We next discuss the case
Then (19) implies
ai > 0. x,
ai < 0.
x
>
r
a. -ar
.
i
This condition and consequently (19) is certainly met if we determine
r
such that x
a
r
x,
a i > 0}.
min{
=
a.
Then (18) expresses vectors
z
(21)
as a nonnegative linear combination of the
zl'...,zr-1' zr+l'...,z
q-1
This procedure may be repeated until
q.
we have determined a representation (16) such that those vectors which belong to nonnegative coefficients Example.
(22)
zl = I i I' Then
z
xi
(p = 2, q = 4).
Z2 = I Z I,
Z3 =
are linearly independent. Let
(2),
z4 =
V,
(2), z = 1724 1
admits the representation
z=41 z1+21 z2+41 z3+41z4. This relation corresponds to (15). of course.
(23) The vectors are linearly dependent,
Thus we have, for example,
zl = 31 z2 + 31 z3 or
-3z1 + z2 + z3 = 0.
Geometric Interpretation of the Dual Problem
8.
This corresponds to (17) with ar > 0, r = 2
must have
al = -3, a2 = 1, a3 = 1, a4 = 0. r = 3
or
65
meets
this condition.
Since we
By (21) we
must next determine the smaller of the quotients
x2 a2
x3 a3
and
1
Thus we should take
1 =T .
r = 3
and (18) gives
z= z1+ 41 z2+ 41 z4. Thus
no longer appears in (24), in contrast to the representation
z3
(23).
(24)
Carry out another reduction step, this time on (24), and obtain
as a nonnegative linear combination of two of the vectors
z
z1, z2, z4.
Is it possible to carry the reduction even further? Exercise.
(25)
z E Conv (A) c Rn
Prove the Lemma of Caratheodory:
Every vector
admits a representation
n+l x a
z
i=1
i
i
where n+l
xl,...,xn+l > 0
al,...,an+l E A,
and
X. = 1. i=1
From the Reduction Theorem (14) we obtain the following result: Theorem.
(26)
{s1,.... s
Let
,
q
x1, ..,x
q
}
with
q > 1
be feasible
for (D); i.e. q
r = 1,...,n,
ar(si)xi = cr,
(27)
i=1
and x. > 0,
i = 1,...,q.
Then there is a subset
{s1 ,...,si } n 1 with the properties
xl ,...,x.
of
{s1....,sq}
and numbers
n
1
..,si
{si
,
n
1
..,xi }
xi
is also feasible to (D); i.e.
n
1
n ar(s
j=1 3
i j )x i j
= c,
r
r = 1,-..,n,
(28)
66
The vectors,
a(s.
which belong to positive numbers
)
J
In Theorem (26) we have tacitly assumed that elements, i.e. at least
S
constraints occur by (P).
n
(In many applications
quirement is met.
are linearly
x,
J
independent.
or
DUALITY THEORY
IV.
S
has at least
As a rule, this re-
has infinitely many elements
is the result of "sufficiently fine" discretization and
S
n
ISI
be-
comes very large.)
One can always achieve OTy > 0
ISI
> n
by adding the trivial constraint
to (P) sufficiently many times.
This operation does not change
M .
n
However, we cannot conclude from Theorem (26) that we only need to consider feasible solutions (D) with
{s1,...,s
q
and that we can put
q < n
lation of the dual problem.
,
x1,
q = n
.. x
q
}
of the dual problem
from the start in the formu-
It is quite possible that by the transition
from (27) to (28) by means of the Reduction Theorem (14) it happens that
q
n n L
b(si )xi
j
j
b(si)xi.
<
i=1
j
If one wants to make sure that the value of the dual preference function does not change, then one must apply the Reduction Theorem on the
n+l
equations q
i=1
b(si)xi = co, (29)
q
r = 1,...,n.
ar(si)xi = cr, i=1
We obtain then the important result that
n+l
points
"are enough".
s. J
Thus we may put (30)
from the start in the formulation of (D).
q = n+l
The dual problem (D). n+l
b(si)xi
Maximize i=1
subject to the constraints n+l
ar(si)xi = cr, i=1 sl,.... sn+l E S,
xl,.... xn+1 > 0.
r = 1,...,n,
8.
Geometric Interpretation of the Dual Problem
67
We will show in (7) of §12 that if (D) has a solution then we can even put
q = n
from the outset.
From the preceding argument, in particular (29), we are led to intro-
Mn+l C
duce yet another moment cone b(s)
to the vector
a(s)
Rn+l
We adjoin the real number
and consider the vectors
b (s) al(s) R+1
a(s) =
(b(s),aI(s),...,an(s))T E
=
an(s)J Then we can write (29) in the form q
'a(si)xi = (c0,cl,...,cn)T.
(31)
i=1
Following the pattern of (8), we let Rn+1
AS = {a(s)
I
s E S} c
and can then define
Mn+1'
The moment cone
Definition.
(32)
Mn+I
associated with the opti-
mization problem (P) is the convex conic hull of
AS;
n+l = CC(AS). By the definition of the convex conic hull (see (7)) every vector i E Mn+1
admits the representation q
a(si)xi,
z =
(33)
xi > 0.
i=1
By comparison with (31) we realize that
{s,...,sq, xl,...,xq)
sible for (D) with the corresponding value
c0
is fea-
of the dual preference
function if and only if (cO,cl,...,cn)
(We may put
T
q = n+1
E Mn+1'
(34)
in (31) and (33) by the Reduction Theorem.)
From (34) we obtain a "geometric" formulation of the dual problem. It will be fundamental for the discussion to follow. (35)
A "geometric" formulation of the dual problem.
Maximize
c0
68
IV.
DUALITY THEORY
1
Fig. 8.6
subject to the constraint (c0,...,cn)T E n+1'
It is, at least in principle, clear how to get from a solution
of (35) to a dual solution 6
xl,...,xn+l}
(c0,c)
(and vice versa).
1
Since
(c0,cl,...,cn)T E
n+1'
n+l
bCsi)xi
c0
i=1 and n+l
aCsi)xi,
c = i=1
where
xi
{sl'"
are nonnegative numbers and n+l'
x1......x1}
si E S, i = 1,...,n+1.
is hence a solution to (D).
Fig. 8.6 gives a geometric illustration of the dual problem (35). We seek that point
(c0,c)T
{(c0,cl,...,cn),
which belongs to
Mn+I
of the straight line
co E R}
and whose first component is as large as possible.
We mention also the special case of linear programming. Rn+l
Mn+1 = {z E
z =
m C
i=1
Let
There we find
aixi = Ax,
x = (xl,...,xm)T > 01.
(36)
9.
Soivabi,11ty of the Dual Problem
b1
all
A =
b2
b
a12
alm
69
m
a
a21
a22
and
ant
anm
+
+
+
al
a2
a
2m
m
We can now write the condition (34) in the form
The dual problem (LD) is then equivalent to the following problem (we write
for
x0
c0)
Minimize
T -x0 + b x = 0,
subject to
x0
Ax = c, (Xi,...,xm)
§9.
T
> 0.
SOLVABILITY OF THE DUAL PROBLEM The following important theorem on the solvability of the dual prob-
lem (D) is an immediate consequence of the formulation (35) of §8. (1)
Theorem.
Let a given linear optimization problem be such that
is closed and the dual problem (D) is bounded (i.e. it is in state
Mn+l "B" - see Diagram (1), §5). Proof:
We note that
Then problem (D) has a solution. v(D)
is the maximum of the continuous function
f given by f (z0,
, zn) = z0,
defined on the closed and bounded set I
Mn+l n {(z0,z)T Here
c E Rn
v(D)-l < z0 < v(D), z = c).
is the vector appearing in the preference function of (P).
Theorem (1) is very useful since there are simple criteria for ascertaining that
M1+1
is closed.
They are applicable for important
classes of linear optimization problems. dual pair (P)
We shall show in §10 that the
- (D) has no duality gap under the assumptions of Theorem (1).
IV.
70
DUALITY THEORY
Quite often we shall encounter a special class of problems where the index set
S
a1,...an, b
and the functions
which appear in the con-
straints of (P), n
ar(s)yr > b(x),
s E S,
r=1 satisfy the following assumptions:
General assumptions on
(2)
and the real-valued functions
(P).
S
al,...,an,b
is a compact subset of which are defined on
Rk
are
S
continuous there.
This assumption is valid for the examples (3) of §3 and (7) of §4 k = 1) but not for (4) of §3.
(with
gramming (2) holds trivially; since tion on
is continuous.
S
For the special case of linear proS
is finite every real-valued func-
We can then always assume that
S = {1,...,m} c R. (3)
Definition.
If there is a vectory = (yl,...,ym)T E Rn
such
that n
r=1
ar(s)yr > b(s),
s E S,
(4)
then (P) is said to meet the Slater condition.
If (P) satisfies (4) then
we also call (P) superconsistent since (4) is a sharpening of the state-
ment thaty is feasible for (P). Suppose now that Assumption (2) is satisfied.
tion (4) is met if one of the functions
al (s) = 1, Indeed,
a1,...,an
Then the Slater condiis constant, e.g.
s E S.
(4) is met if we take
y = (y1, 0,...,0)T.
where yl > max b(s). sES
This is possible since
b
is continuous on a compact set.
(Compare (13)
of §2.) (5)
Remark.
The Slater condition is an example of the so-called
regularity conditions which are introduced in the theory of optimization and which play a major role in the derivation on existence of solutions. tion in §11.
of theorems on duality and
We shall encounter another regularity condi-
9.
Solvability of the Dual Problem
(6)
Exercise.
dition (4).
Consider (P) given Assumption (2) and the Slater con-
Show that the set of vectors feasible to (P) has interior
Hint: Lety E Rn
points.
71
satisfy (4).
y with
such that all vectors
Iy-yI < E
Show that there is an
E > 0
are feasible for (4).
The two theorems to follow can be used to establish the existence of solutions of the dual of most linear optimization problems encountered in practice. (7)
Suppose that Assumption (2) is satisfied and that (P)
Theorem.
meets the Slater condition.
Then the moment cone
is closed.
Mn+l
In order to carry out the proof of this theorem we need the following result which is of independent interest. (8)
Lemma.
A c Rp
Let
be a compact set.
Then its convex hull,
Conv (A), is also compact.
By (25) of §8, Conv (A) is generated by means of all possible
Proof:
linear combinations p+l aixi i=1
where and
a1,...,ap+1 E A where the set
D c Rp+l
(xl,...,xp+l) E D,
is defined by p+l
Rp+1 I
D = {x E
xl > 0,
i = 1,...,p+1, and
xi = 1}. i=1
Hence Conv (A) is the image of the compact set
AXA
x
...
X
(p+l times)
AxD
under the continuous mapping p+l
(al,...,ap+1, xi,...,xp+1) 3
L
aixi.
i=1
Since
A
was compact, Conv (A) must be compact as well.
(See the remark
after (12) of §2.) Proof of Theorem (7):
will show that then M1+1 = CC(AS).
z
Let
z
must be in
be an arbitrary vector in Mn+l
also.
Mn+l'
We
By Definition (32) of §8,
DUALITY THEORY
IV.
72
Thus to
we may associate a sequence
z E Mn+I
a sequence of nonnegative numbers
{hi}i>1
in
and
Conv(AS)
such that
{ai}i>1
z = lim aihi.
(9)
j-+_ The set
AS
is compact since
is compact and
S
By Lemma (8), Conv (AS)
tinuous.
subsequence of
is compact.
We may therefore pick a
which converges to a vector
{hi}i>1
are con-
a1,.... an, b
h E Conv (AS).
we may as well assume from the outset that the sequence
{hi}i>1
Thus
in (9)
is such that lim hi = h,
h E Conv(AS).
i-M.
If now the sequence
{ai}i>1
that it converges to
A > 0.
is bounded we can in the same way assume Then we obtain
z = lim a.h. = lim X. lim h. = ah h E Conv (AS), a > 0, it follows that
and from
We next consider the remaining case when
as was to be established. {ai}i>i
is unbounded.
z = ah E CC(AS) = n+1
Then we may assume, if necessary by using a suit-
able subsequence, that > 0, A.>0
1 = 1,2,...,
and lim 1/a. = 0. i-+°° 1
Thus we get
i- i
h = Iim hi = Iim
i-
aihi = 1im al i-MD
This means that the null vector of are S
nonnegative numbers
q > 1
1im aihi = 0z = 0.
1 1-b° Rn+1
lies in
a1,...,aq
and
Conv (AS). q
points
Hence there
s1,...Isq
in
such that q
a(si)ai
0 = i=1
and q
ai = 1.
(10)
i=1
From the definition of
a(s)
(see
(30)- (31) of 48) this implies that
9.
Solvability of the Dual Problem
73
q
b(si)ai
0=
i=1 and q
ar(si)ai,
0 =
r = 1,...,n.
i=1 Let
y E Rn
The last two equations now give
be an arbitrary vector.
q
n \\
0 =
yrar(si)
ail
b(si)J
Since problem (P) is required to meet the Slater condition there is a such that
y' E Rn nC
Yrar(si) - b(si) > 0,
i = 1,...,q.
r=1
If we now put ... = aq = 0
ity that
y = y
is unbounded.
Example.
s2 y1 > s,
Here we have
This rules out the possibil-
Hence we have established the theorem.
Consider the constraint
n = 1, S = [0,1], aI(s) = s2, b(s) = s. a1(0) = b(0) = 0.
(xI,O)T, x1 > 0
(13)
aI = a2 =
s E [0,1]
tion is not met since vectors
ai > 0, that
must hold, contradicting (10).
{Ai)i>1
(12)
in (11) we get, since
Exercise.
are in
Mn+1
Mn+1
The Slater condi-
is not closed since the
but not in
Mn+1'
Consider the problem of uniform approximation over
a compact set, discussed in 96.
Show that the Slater condition is met.
In §4 we showed that if (P) and (D) are consistent, then (D) has a finite value.
Combining (1) and (7) we get the following statement on the exist-
ence of solutions to (D). (14) i)
ii)
iii)
Theorem.
Let the dual pair (P)
-
(D) have the properties
Assumption (2) is satisfied, (D) is feasible,
(P) meets the Slater condition.
Then (D) is solvable.
This theorem will be sharpened significantly in 12 of 910. We now treat linear programming and show that the corresponding mo-
ment cone n+1
as defined in (36) of §8 is closed in this case.
DUALITY THEORY
IV.
74
We shall say that cones of the form C = {z E Rp
(x1,...,xm) > 0}
z = Ax,
I
are finitely generated.
In the case of linear programming, Mn+I
finitely generated, and the following theorem establishes that
is
Mn+1
is
closed.
Then the rows of A
p.
be a convergent sequence in
Rp
are linearly independent. C
is closed.
p x m matrix
We consider first the case when the
Proof:
rank
Every finitely generated cone in
Theorem.
(15)
Let now
A has {zj}j>1
such that
zJ + z
(16)
We want to show that
z
is also in
Every
C.
nonnegative linear combination of at most vectors of
z)
can be written as a
linearly independent column
p
A, by the Reduction Theorem (14) of §8.
We may now, for each
j, supplement this set of column vectors by picking suitable column vectors from the remaining ones to get a basis for each
vector
xj
- Rp
z) = A j xj, Here
A.
I. c {1,...,m}
an index set
j > 1
p
elements and a
such that
x> > 0.
is formed of the columns from
xj = A-Izl,
Then there is for
Rp.
containing
A
corresponding to
Ij.
Thus
j > 1.
However, there are only finitely many matrices these a fixed matrix
A
and a subsequence
Aj.
{j(k)}k>l
Hence there is among of natural numbers
such that
xi (k) = A -1 z3 (k) ,
k > 1.
Hence we get from (16)
xj (k) + x = A -1 z. Since
xj(k)
> 0
we must have
x > 0.
The relation
z = A x
then implies that
z E C
which was the desired conclusion.
the remaining case when the rank of A
that the rows of A
We now treat
is less than
p.
We may assume
are ordered such that the first
p1
rows are linearly
Separation Theorem and Duality
10.
75
independent (1 < pI < p) and the remaining rows are linear combinations of the first A = 0
(We have, of course, excluded the trivial case
ones.
p1
from further consideration.)
Then every
1
zI E
(z1,z2)T,
z =
Rp,
z E C
may be written
1 Rp-p,
z2 E
where
zl =Ax, xERm, x> 0,
(17)
z2 = Bz1.
(18)
and
Here
is a
A
pI X m matrix and
define the cone
B
a
(p-p1) x pI
matrix.
We next
associated with (17) and argue as above and use (18)
to arrive at the desired result
z =
Combining Theorems (1S) and Cl) we conclude that (LP) is solvable when (LD) is bounded.
We saw in (37) of §4 that every problem in the form of
(LP) may be transformed into an equivalent problem in the form of (LD).
Hence a corresponding existence theorem is valid for (LP) as well.
This
fact we summarize in the (19)
Theorem.
Consider the dual pair (LP)
-
(LD) of linear pro-
If both of these problems are consistent then they both have solu-
grams. tions.
In the next section we shall also show that no duality gap can occur under the assumptions of Theorem (19).
§10.
SEPARATION THEOREM AND DUALITY We shall start this section by developing a fundamental tool to be
used in the proof of strong duality theorems, namely the statement that a point outside a closed convex set in
RP
may be "separated" from this set
by a hyperplane in the sense of the following definition. (1)
Rp
and
Definition.
z f M
Let
M be a nonempty, closed and convex subset of
a fixed point.
H(y;n) ={xERp is said to separate
I
z
The hyperplane
yTx=n) from M
if
IV.
76
Separating hyperplane
Fig. 10.1.
yTx < r1 < yTz,
DUALITY THEORY
x E M.
From geometric considerations (see Fig. 10.1) one is led to believe that a vector
which defines a separating hyperplane is obtained by determin-
y
ing the projection
of
z0
M
on
z
and putting
y = z - z0.
This will
We will therefore first show the
turn out to be the correct procedure.
existence of a unique projection point.
(See (4).)
To give a motivation for the argument to follow we shall first indicate the fundamental role of the concept of separating hyperplanes in the theory of the dual pair (P)
- (D).
Assume that the hyperplane n Rn+1
I
zryr = 0}
H(y;0) = {z E r=0
separates the moment cone n+l Mn+I
from the point
lies on one side of the hyperplane.
v 4 Mi+1.
Thus all of
Hence
n 0 >
I
zryr, all
(z0,...,zn) E n+1
r=0
In particular, since
Mn+1 = CC(AS)
we have
z = a(s) = (b(s), a1(s),...,an(s))T E Mn+1 for all
s E S.
Thus we find from (2) that
(2)
10.
Separation Theorem and Duality
77
n 0 > b(s)y0 +
s E S.
a,(s)yr,
£
r=1 If
holds, then the last relation takes the form
y0 > 0 n
r=1
-y
a (s) r > b(s), r
Hence the vector
be feasible for (P).
y
Let
sing through the origin such that Projection Theorem.
(4)
set and let vector
0<
z
z0 E M
Mn+l
M c RP
Let
Give a hyperplane pas-
is on one side of this hyperplane.
be a nonempty, closed, convex
be a fixed point outside of
M.
which lies "closest" to
That is, z0
Iz - z0I <
Proof:
is feasible for (P).
y = (-y1/y0' ...,-yn/y0)
Exercise.
(3)
s E S.
y0
Since
Iz
- xI, all
M
is closed and
z.
Then there is exactly one is such that
x E M. z E M we find
p = inf Iz - XI > 0. xEM
Obviously, it is sufficient to search for the vector
in the set
z0
M=Mn {xERP I Iz -xl <2p}. assumes its minimum
Now the continuous real-valued function
x - Iz - xI
value on the bounded and closed set
Hence there is a
M.
z0 E M
such
that
Iz - z01 < Iz - xI,
x E M.
From the construction of
(5)
(5) holds for all
M,
lish the uniqueness of the projection point there is a vector
zl # z0
z2 = (z1 + z0)/2.
Z2I2 =
4I(z-z0)
Iz
+
=
implying
x E M. The parallelogram law from (10) of §2 gives
+ (z_zl)I2 <
4I(z-z0)
+ (z_zl)I2
lzo_zlI2 = 4I(z-z0) + (z_zl)I2 +
2(Iz_zol2
+
1z-z21 < Iz-z0I.
uniqueness is established.
We must now estab-
Assume therefore that
such that
Iz - z1I < Iz - xI, all We now put
x E M.
z0.
I(z_z0) -
(z_zl)I2
Iz_zll2) = I=-zoi2. This contradicts the construction of
z0, hence
78
IV.
Separation Theorem.
(6)
Let
set.
put
y = z-z0
and
n = (z-z0)Tz0
T
y x < n < yTz,
x E M;
i.e. the hyperplane
H(y;n)
Proof:
number.
M c Rp
Let
be a nonempty, closed, convex
be a fixed point whose projection on
(E M
z
M
is
z0.
If we
we get (7)
separates
z
from
M.
0 < n < 1
x E M be an arbitrary vector and
Let
DUALITY THEORY
be a fixed
Then
(1-11)z0 + ux = z0 + u(x-z0) E M.
We also find that
Iz-z012
< Iz - (zo +
u(x-z0))I2
= Iz-z012 - 2u(z-z0)T(x-z0) + u21x-z012, giving (z-Z0) T(x-z0) < 2 111x-z012
Letting
u - 0
we arrive at
(z-z0)T(x-z0) < 0,
establishing the leftmost inequality in (7).
The other inequality re-
sults from the relation Iz-z012
0 <
=
T (z-z0)T(z-z0) = y z
-
T yTz0 = y z - n,
concluding the proof.
Suppose now that the assumptions of Theorem (6) hold, but specialize
M
to be a convex cone.
Then
x E M
implies that
Ax E M
for all
A > 0.
From (7) we then get
yT (ax) < n,
A > 0,
yTx < n/x,
a > 0.
or
Letting
A - m we conclude
yTx < 0,
Thus if M
x E M.
is a convex cone we may put
(7) in the form
n = 0
from the start and write
Separation Theorem and Duality
10.
T y x < 0 < yTz,
79
x E M.
(8)
Now we can use the Separation Theorem to establish the duality result which was promised earlier. First Duality Theorem.
(9)
Consider the dual pair (P) - (D) and
make the following assumptions: i)
ii)
The dual problem is consistent and has a finite value The moment cone
Mn+I
v(D);
is closed.
Then (P) is consistent as well and
v(P) = v(D); i.e. there is no duality gap. Proof:
Moreover, (D) is solvable.
We have already shown that (D) is solvable (Theorem (1) of
Thus we have
§9).
(cO,cl,...,cn)T
E n+l'
but (co + e, cl,.... cn) 4 Mi+1
for any
e > 0.
Since
Mn+1
is closed we may invoke the Separation
Theorem (6) and conclude that there is a hyperplane in arates
(co + e,c) T
is a vector
from the convex cone
Mn+I
Rn+l
(see (8)).
which sepHence there
Rn+l,
different from
(y0,yl,...,yn)T E
n
0, such that
n xryr < 0 < Y0(c0 + e) +
r=0
crYr,
r=1
(10)
(x0,xl,.... xn)T E Mn+l'
In (10) we now put (x0,x1,...,xn)T = Cc O,cl,...,cn)T E Mn+1
and obtain y0e > 0. > 0
Since
we must hence have
(x0,xl,...,xn) (s E S
relation
T
y0 > 0.
If we now set
= (b(s),aI(s),...,an(s))
T
E AS c Mn+1'
is arbitrary) we find from the leftmost inequality in (10) the
80
DUALITY THEORY
IV.
n
r=1
ar(s)(-Yr/YO) > b(s),
s E S.
Hence the vector Y = (-Yl/Y0, -Y2/YO...... y ly0) E Rn
is feasible for (P).
The right inequality in (10) implies
n
cr(-Yr/YO) < co + C. r=1
We now arrive at the following chain of inequalities:
n
v(P) <
cryr
r=1 The first inequality follows from the fact thaty is feasible for (P) and the last is a consequence of the weak duality theorem (18) of §4. Thus
v(P) - e < v(D) < v(P) for any (11)
e > 0, proving the theorem. Exercise.
the moment cone
Mn+l
The ray
Hint:
Consider again Example (8) of B. R3
in
{(0,a,0)T
I
Draw a picture of
and show that this cone is not closed.
A > O)
lies in
Mn+1
but not in
Mn+l'
In many applications the General Assumption of (2) of §9 is met: S
is a compact subset of
continuous on
Rk
and the functions
a,,.... an
and
b
are
We combine the Theorems (7) and (14) of §9 with (9)
S.
and arrive at the following useful result: (12)
Theorem.
Consider the dual pair (P) - (D) and make the assump-
tions i)
ii)
iii)
General Assumption (2) of §9; (D) is consistent;
(P) meets the Slater condition.
Then (D) is solvable and the values of (P) and (D) coincide. We discuss also the case of linear programming, i.e. the dual pair (LP)
Minimize
cTy,
ATy > b
(LD)
Maximize
bTx,
Ax = c,
x > 0.
10.
Separation Theorem and Duality
81
Theorem (9) and Theorem (19) of §9 deliver the entire duality theory of linear programming.
We have by Theorem (9) that if (LD) is consistent
and bounded then (LP) is consistent also and the values of (LD) and (LP) coincide.
Using the transformations (37) of §4 we may also conclude that
if (LP) is consistent and bounded then (LD) is consistent as well and the values of the two problems coincide.
From this argument we obtain the
following state and defect diagrams for linear programs.
(Compare also
with (1) of §5 and (7) of §5.) State and defect diagrams for linear programming.
(13)
`LP)
(LP)IC IC
(LD)
IC
UB
B
(LD)
0
UB
6
State diagram
0
Defect diagram
Duality theorem for linear programming.
(14)
A dual pair (LP) -
i)
0
B
5
UB
UB
IC
4
1
B
B
(LD) is in one and only one of the states
1, 4, 5, 6 of the state diagram (13).
All states are realized.
If both programs are consistent (i.e. if state 5 is realized)
ii)
then both problems are solvable and no duality gap occurs. The reader should construct simple examples (n = 1
or
n = 2) to
show that all the states 1, 4, 5, 6 can be realized. We recall once more that the First Duality Theorem (9) plays a fundamental role
for the argument of this Section.
this theorem we may conclude that solution.
v(D) = v(P)
However, the assumptions
Under the assumptions of as well as that (D) has a
i) and ii) of Theorem (9) do not
imply the solvability of (P), as is illustrated by the example in Exer-
cise (13) of B. (15)
(1) of §6.
Exercise.
Show that
Consider the problem of uniform approximation of v(DA) = v(PA)
and that the dual problem is sol-
vable. (16)
Exercise.
We replace the dual (D) by the "modified dual" (D')
as follows: (D')
Maximize
c0
when
(Compare with (35) of §8.) v(P)
(c0,c)T E Mn+1'
Show that the weak duality inequality
is valid for the modified dual pair (P) - (D!).
v(D') <
Show also that when
82
IV.
DUALITY THEORY
v(D') is finite then (D') is always solvable and that we always have v(P) = v(D').
Exercise.
(17)
A c Rp
Let
c E CC(A)
a E A
(The modified problem (D') is of theoretical interest only.)
Use the Separation Theorem (6) to show Farkas' Lemma:
be a nonempty set and
c E Rn
if and only if all vectors
also satisfy
cTy > 0.
y
Then
a fixed vector. such that
aTy > 0
Specialize to the case when
for all
A
has finitely
many elements. (18)
Remark.
The duality theorem (12) can be sharpened somewhat.
(A corresponding statement is true for the First Duality Theorem.)
One
can show that the assertions of (12) remain true if we replace the assumption (ii) by ii')
is finite.
v(P)
It is easy to establish that ii) and iii) imply ii').
A proof of this
sharpened version of (12) is to be found in Glashoff (1979).
For easy
reference we sum up the result, which is quite useful for many applications. (19)
Theorem.
Consider the dual pair (P) - (D).
Make the follow-
ing assumptions: i)
ii)
iii)
General assumption (2) of §9 v(P)
is finite;
(P) meets the Slater condition.
Then (D) is solvable and the value of (P) and (D) coincide.
§11.
SUPPORTING HYPERPLANES AND DUALITY In this section we shall prove a theorem which could be said to be a
kind of "dual" to Theorem (9) of §10:
from the consistency and bounded-
ness of (D) follows the strong duality result
v(P) = v(D)
as well as
the solvability of (P) provided certain regularity conditions are met. For this purpose we will need a corollary to the Separation Theorem (6) of §10 which states that a supporting hyperplane passes through each boundary point of a convex set. (1)
let
Definition.
Let
M be a nonempty convex subset of
z E M be a fixed point. H(y;n) = {x E RP
I
yTx
(See Fig. 11.1.).
The hyperplane
= Ti)
is said to be a supporting hyperplane to
M
at
z
if
Rp
and
Supporting Hyperplanes and Duality
11.
Fig. 11.1
83
Supporting hyperplane
yTx
Lemma.
Let
be in
z
no supporting hyperplanes to Proof:
M, the interior of at
M.
Then there are
z.
has a supporting hyperplane
M
Assume
M
M.
H(y;n)
at
z.
Since
0
there is a s> 0
z E M z
such that
= z+ayEM.
A
We find that y
T z
X = yTz + ayTy < n = Y
ayTy < 0,
which contradicts
1 > 0
and
Thus we reach the desired conclu-
yTy > 0.
sion. (3)
Theorem.
Let
M be a nonempty convex subset of
Rp
and let
0
be on the boundary of M hyperplane to
M
at
z.
(z E bd M = M91).
Then there is a supporting
z
84
IV.
For every nonempty convex subset
Proof:
DUALITY THEORY
the following
M c :RP
statement holds:
bdM=bdR. This elementary property of convex sets follows from the fact that 0
bd M = MOM 2
since
0
M = M.
(4)
We shall show the truth of (4) in (22) - (26) at the end of §11.
z E bd M be a fixed point.
There is a sequence
{z
}
Now let
of points such
i
that
zi f M
points on
and
lim zi = z.
and the closed convex set
zi
M by
zio.
Putting
yix < yi zi, Since
We apply the Separation Theorem to the
yi = zi - zio
x E M,
zi
we get
i = 1,2,...
zi fE M, yi 4 0, i = 1,...,
Denote the projection of
M.
.
setting
i = 1,2,...,
yi = Yi/IYil,
we get 1Yil = 1
and
yix < yi zi,
x E M,
i = 1,2,...
(5)
Consider the set
B = {y E Rp B
of
I
Jyj = 1}.
is closed and bounded, hence compact. {yi}i-1
which converges to a point
Therefore there is a subsequence y E B.
Applying (5) to this sub-
sequence and passing to the limit we get yTx < yTz,
x E M,
which proves the assertion of the theorem sincey E B Definition.
(6)
and hence
0.
The dual problem (D) is termed superconsistent if
0
c E M . n (7)
Second Duality Theorem.
Consider the dual pair (P)
the assumptions i)
v(D) is finite; 0
ii)
(D) is superconsistent, i.e.
c E Mn
-
(D).
Make
11.
Supporting Hyperplanes and Duality
Fig. 11.2.
85
The cones Mn+l and M'n+l
Then (P) is solvable and v(P) = v(D). Proof:
Both (P) and (D) are feasible.
Hence the values
are finite due to the weak duality lemma.
v(D)
v(P)
and
We set as usual
c0 = v(D).
(8)
T c (Otherwise lies on the boundary of Mn+1' 0' 1'" ''cn) we could find a vector (c0,cl,...,cn)T with c0 > c0 but still feasible
The vector
(c
to (D), a fact which would contradict (8).)
For the purpose of carrying
out the proof we now introduce the following convex cone (see also Fig. 11.2):
Mn+1 - {(zz0,z ,...,z ) T
such that
I
there is
'z0 < z0,
(z0,zl,...,zn)T E M1+1
21 = zl,.... in = zn}.
We find at once that (20,cl,...,cn)
T
E bd Mn+1'
By (3) there is a nontrivial supporting hyerplane to (c0,cl,...,cn)T; i.e. there is a vector
y = (y0'y)T =
M
n+1
at (y0'yl'... yn)T # 0
such that yTz < 0 = y0c0 + yTc,
z E Mn+1'
(9)
86
DUALITY THEORY
IV.
We have used here the fact that n+l (9) implies, since
is a convex cone.
AS c CC(AS) = Mn+1
(See (8) of 410.)
c Mn+l'
n
YOb(s) +
ar(s)yr < 0,
E
s E S.
(10)
r=1
We now show that
From the definition of
y0 > 0.
T
Mn+1
it follows that
a > 0.
We therefore get from (9) y0c0 - y0A + yTc < 0.
y0c0 + yTc = 0
Since
A > 0,
-y0A < 0,
and hence
Y0 > 0.
Putting
We must now rule out the possibility
nnp
Yrzr < L r=1
z E Mn.
crYr,
r=1
is the projection of
the condition
Mn
at
c.
(Since
y # 0
y0 > 0.
r = 1,...,n,
and obtain, from (10), n ar(s)yr > b(s),
s E S.
r=1
Thus
T
(yl'" ''yn)
is feasible for (P) and hence n
v(D) < v(P) <
cryr'
r=1
By (9) we conclude that nc
crYr = c0 = v(D). r=1
defined through
and
y0 = 0
But this contradicts the fact that
Hence we have established that
Yr = -Yr/YO,
Rn+l
Therefore, (11) means that there is a nontrivial
z0 = 0.
(y1,...,yn) # 0.)
(Lemma (2)).
on the subspace of
Mn+1
supporting hyperplane to have
y0 = 0.
y0 = 0, we get from (9) that
n
Mn
we find that
we must
c E Mn
We now let
Supporting Hyperplanes and Duality
11.
Hence we have shown that
v(P) = v(D)
87
and
T
(y
Yn)
1
solves the prob-
lem (P).
The Second Duality Theorem just established can be applied to the problem of uniform approximation defined in (1) of §6. ately (without requiring the set a1,...,an
and
T
We obtain immedi-
to be compact or the functions
to be continuous) that
b
v(DA) = v(PA) (strong duality) and that the primal problem has a solution (see also (15) of §10):
Consider the approximation problem (PA) of (1) of §6.
Theorem.
(12)
Let
v1,...,vn
be linearly independent on
T; i.e.
n
t E T
yrvr(t) = 0, r=1
Then (PA) is solvable and the values of
implies
yl = y2 = ... = yn = 0. (PA) and (DA) coincide.
We will show that the linear optimization problem which is
Proof:
We must verify that
equivalent to (PA) satisfies the assumptions of (7). the vector vex cone
c = (0,...,0,1) T
of (2) of §6 lies in the interior of the con-
M which is generated by the vectors
(v 1(t),...,vn(t),l)T, (-vl(t)...... vn(t),l)T
c E M, for we can pick an arbitrary
Note that c =
t E T.
E T
(13)
and write
2(-v1(t),...,-vn(t),l)T.
Z(vl(t),...,vn(t),l)T + 0
We next assume that
and show that a contradiction results.
M
c
If
0
c E M4h1
M
at
c E bd M
then
and by (3) there is a supporting hyperplane to
Hence there is a vector
c.
T
y z < 0 = yTc,
(We can put
n = 0
(y1'' " 'yn'yn+l)T # 0
z E M.
since
M
(14)
is a cone.
we find from (14) that
c = (0,...,0,1) T
such that
See (8) of §10.)
Yn+1 = 0
Since
and hence
n
yrzr < 0,
z E M.
(15)
r=1
We observe that we know
(y1,...,yn)T + 0.
(Y1'. ''yn'yn+1)T + 0.
and arrive at
We have just seen that
Yn+l = 0
but
We now enter the vectors (13) into (15)
DUALITY THEORY
IV.
88
n yrvr(t) = 0,
t E T,
r=1
contradicting the linear independence of vl,...,vn
on
T. 0
c E M.
There is a simple way of imposing the condition Consider the problem
Regularization.
(16)
n (P)
Minimize
cryr,
E
a(s)Ty > b(s),
s E S.
r=1
Assume now that we know a solutions of (P) and a number that
F > 0
such
Then we supplement the constraints of (P)
JyrI < F, r = 1,...,n.
with the conditions r = 1,...,n.
l>rI < F,
the (equivalent) linear constraints
These may also be written as Yr > -F,
-Yr > -F,
-
r = 1,...,n.
Thus we get a modified ("regularized") problem: n
Minimize
(PF)
subject to
cryr
E
a(s)Ty > b(s),
s E S,
r=1
r = 1,...n where er = (0,...,0,1,0,...,0)T E Rn. r
rth component
The vectors which define the constraints of vectors
er
PF
include all the unit
as well as all the negative unit vectors
-er.
Hence we find
in this case that
Mn=Rn and the regularity condition 0
c E M
n
is trivially met.
By means of the duality theorem just proved, we find
that the dual pair (PF)
-
(DF) has no duality gap.
The solvability of
(P F) is also a consequence of this duality theorem but can alternatively be established from the fact that the constraints of (P F) define a compact subset of
Rn.
Supporting Hyperplanes and Duality
11.
89
It is known from the Reduction Theorem (14) of 98 that every admits the following representation:
c E Mn = CC(AS) qqC
c=
a(si)xi,
L
i=1
q < n,
are linearly
a(s1),...,a(sq)
and
sl,...,sq E S, x1,...,xq > 0
where
The representation (17) is generally not unique; i.e.
independent vectors. c
(17)
can have different representations (17) and the value of A representation (17) with
be unique.
Lemma.
(18)
Let
c
q = n
q
need not
is said to be maximal.
have a maximal representation; i.e.
n (19)
a(si)xi,
c = i=1
x
i = 1,...,n,
> 0,
are linearly independent.
a(sI),...,a(sn)
Then
(20)
i
c
lies in the interior
Proof:
Let
Mn
of
(21)
M .
n
have the representation (19), which we write as fol-
c
lows:
c = A(sl,. ,sn)x, A(sl,...,sn)
where the matrix
x = A(si,...,sn) Let now
s1,...,sn
-1 c.
be fixed.
Then the components
x1,...,xn
be looked upon as continuous functions of the vector then conclude that there is an E
a(s1),...,a(sn).
has the column vectors
is nonsingular by (21), so
A(sl,...,sn)
in the neighborhood
in (19) may
From (20) we
with the property that all vectors
a > 0
Ic - El < e
c.
are such that
xl,...,xn > 0, where
1' c.
x = A(sl,
,sn)
Thus the vector c = A(sl,...,sn)x also lies in
That is, c
M .
n
Hence there is a neighborhood of
is in the interior
c
which lies in
M
G
Mn
of
Mn, which is the desired result.
We remark that the converse of Lemma (18) is false. we consider the following 4 vectors in
R3:
As an example
n
IV.
90
DUALITY THEORY
al = CO,0,l)T
Put
a2 =
Cl,0,l)T
a3 =
T
a4 =
T
It is easy to establish, e.g. by drawing a suit-
c = (1/2, 1/2, 1) T.
able picture, that vectors
lation that
is in the interior of the moment cone formed by the
c
a1,...,a4.
Nevertheless one verifies by straightforward calcu-
has no representation (19) - (21) with
c
q = n = 3.
We conclude this section by showing, as promised above, that 2
0
M = M M c Rp.
holds for nonempty convex sets
The proof will be carried out in
three steps (see also Eggleston (1958)). Lemma.
(22)
Let
M c RP
be a nonempty set in
RP
with nonempty 0
interior
Let
M.
xl
and
x2
M
be two points in
such that
x2 E M.
Consider the line segment [xl,x2l = {x = Axl + (1-A)x2
Then all of
1
A, E [0,1]}.
[xl,x2], except possibly the endpoint
x1, belongs to the
0
M
interior
of
M.
Since
Proof:
M
is convex, [xl,x2] c M.
is a sphere, K6(x2), 6 > 0, with
x2 E M
implies that there
(see (11) of §2).
Kd(x2)
Let
c :M
y # xl
be a point in
[x1,x2].
We want to show that there exists
r > 0
such that
Kr(y) c M
(23) 0
and hence
y E M
as asserted.
Put
y = Axl + }ix2
where Let
(24)
A > 0, U > 0, A + p = 1. z E KU6(y).
Then
1z - yj < p6,
or, by (24), (z -
Since
(Ax1 + U x2)1 < p6.
p > 0
we find that
We verify now that (23) holds for
r = p6.
91
Supporting Fiyperplanes and Duality
11.
I(z - Axl)/U - x21 < 6; (z - Ax1)/p
i.e.
lies in
and hence in
K6(x2)
M.
Consider next the
identity z = Axl + u(z - Axl)/p.
Due to the convexity of
M,
must also belong to
z
M, proving (23) and
hence the assertion. Lemma.
(25)
assumption
Lemma (22) remains true when the
The assertion of
is replaced by the weaker requirement
x1 E M
x1 E M. 0
0
with
y E [x1,x2]
M
x2 E M
Since
Proof:
there is a
6 > 0
and let
y # xl, y # x2
zl
such that
K6(x2) c M.
Let
be an arbitrary point in
such that
Izl - xl1 < dlxl - yi/1x2 - YI Define
through the relation
z2
22 - x2 = - (zl - x1)Ix2 - YI/Ixl - yl-
Then we obtain 1z2 - x21 < d,
z2 E K6(x2) c M.
i.e.
Next we find that y = Ax2 + pxl = Az2 + uzl, where A = Ixl - YI/{Ixl - YI + 1x2 = y')
and
u=1=Ix2-uI/flxl-uI+Ix2-pI}. Hence
2
Then
'T-
Theorem.
(26) M.
Lemma (22) now delivers the desired result.
y E [z2,zl].
Let
M c RP
be a convex set with nonempty interior
M = M.
0
Proof:
Since
M c M
we get
showing
x E M, x
M
Select an arbitrary y + x, with tion.
xl E M.
x E [xl,y].
Since
by °
0
implies x¢ M. 0
M c M
We establish that
M c M. 0
0
Assume that 0
x E M- M
and
x E M there is also a point
x E M.
y E M,
By Lemma (25) x E M, contradicting the assump-
Chapter V
The Simplex Algorithm
This and the next chapter are devoted to the presentation of the simplex algorithm for the numerical solution of linear optimization problems. This very important scheme was developed by Dantzig around 1950.
We will
see that the simplex algorithm consists of a sequence of exchange steps. A special algorithm, related to the simplex algorithm and also based on exchange steps, was used in 1934 by Remez for the calculation of best approximations in the uniform norm.
His procedure is described in Cheney
(1966).
We will not prove the convergence of the simplex algorithm here.
For
the case of finitely many constraints (linear programming) the convergence has been established a fairly long time ago (Charnes, Cooper and Henderson (1953), p. 62).
The general case is much more difficult and
has not been studied until recently.
In this chapter we shall give a general description of the simplex algorithm and Chapter VI will be devoted to its numerical realization. For easy reference we state here Problem (P), which is to be treated by means of the simplex algorithm:
n (P)
Minimize
n cryr
subject to
r=1
I
ar(s)yr > b(s),
s E S.
r=1
In this and the next chapter we shall require that (P) is solvable, if bounded, and that no duality gap occurs.
that this situation occurs when n+l
We have shown in Chapter IV, §10
is closed (e.g. the case of linear
programming) or when the Slater condition is met. then be written in the following form:
92
The dual problem can
12.
Basic Solutions and the Exchange Step
n
n
Maximize
(D)
93
i£1 a(s):.=:, r =
subject to
b(si)xi
s. E S,
x.
> 0,
1,...,n,
i = 12 ..,n
In the future we shall write a feasible solution to
(see (7), of §12).
this problem in the form
{a,x}.
c S
Here, a =
and
x = (xi,...,xn) E Rn.
BASIC SOLUTIONS AND THE EXCHANGE STEP
§12.
We write the constraints of (D) in the form n
a(si)xi = C,
(1)
i=1
a = {s1....Isn) c S,
x = (x1,
are
Here, a(sl),...,a(sn)
n
..,xn)T > 0.
of those vectors in
(2)
Rn
which appear in
the constraints of (P):
a(s)Ty > b(s), (3)
Definition.
Let
{a,x}
a(sl),...,a(sn)
Also, let
hold.
s E S. be feasible for (D), i.e. (1) and (2) be linearly independent.
Then
{a,x)
will be called a basic solution to (1).
Thus if
{a,x}
is a basic solution then the linear system of equa-
tions (1) has the unique solution
x.
We shall also write this system in
the form A(sl,...,sn)x = c. is the
Here, A(sl,...,sn)
(4)
n x n
matrix having the columns
a(s1),...,
a(sn): a1(s1)
...
an(sn)
a2(sI)
...
a2(sn)
A(sl,.... sn) =
(5)
an(sl)
...
an(sn) J
Hence if
{a,x}
A(sl,...Isn)
is
is a basic solution then the rank of this basis matrix n
and we have
x = A(sl,.... sn)-lc and
x > 0.
94
V.
We shall require that among the vectors
Requirement.
(6)
THE SIMPLEX ALGORITHM
a(s),
s E S, there is always a subset of n
linearly independent vectors.
implies that if
must hold.)
{sl,...,sq), x1,...,xq
solution
(This
Then there is a
Let the dual problem (D) be solvable.
Lemma.
(7)
n < m
ISI = m, then
q <.n, xi > 0, i = 1,...,q,
such that
and the vectors a(si),
i = 1,...,q
are linearly independent. Proof:
Let (D) have the value
Then we have the relations
v(D).
q
xibCsi) = c0 = v(D),
(8)
xia(si) = c,
(9)
i=1
q i=1
i = 1,...,q.
x. > 0,
Thus the vector vectors
E Rn+l
(c0,...,c
is a convex combination of the
(b(si),aI(si),...,an(si))T E Rn+1
is not unique.
q
The representation (8), (9)
Using the reduction theorem (14) of §8 we conclude that
among the representations (8), (9) there is at least one such that q < n+l, xi > 0, 1 = 1,...,q dent.
cone
Mn+l
a(s1),...,a(sq)
Mn+1'
are linearly indepen-
We consider therefore the moment
q < n.
which is defined as in (32) of §8.
lies on the boundary of have
and
We now want to show that
(c0,...,cn)T
The vector
By Lemma (18) of §11 we must therefore
q < n, which is the desired conclusion. We can now state and prove an important result. (10) Theorem (Existence of optimal basic solutions).
problem (D) be solvable.
Let the dual
Among the solutions there is a basic solution,
i.e. an optimal basic solution. Proof:
The proof is an immediate consequence of Lemma (7).
always a solution pendent vectors
{s1,...Isq}, xl,...,xq a(s1),...,a(sq), q < n.
ready established.
We discuss the case
xq+l = xq+2 = ... = X and select
sq+l,...,sn E S
linearly independent. (6).)
Thus
of (D) with If
q
There is
linearly inde-
q = n, the assertion is al-
q < n.
Then we put
= 0
such that the vectors
a(sI),...,a(sn)
are
(This is always possible due to the requirement
12.
Basic Solutions and the Exchange Step
a = {sl,...Isn}
and
95
x = (xl,...,xq0,...,0)T E Rn
define an optimal basic solution.
(This basic solution is "degenerate"
in the sense of Definition (39) below.) Definition.
(11)
The subset
a = {sl,...Isn} c S
elements is called a basic set if the matrix
with exactly
n
is nonsingu-
A(sl,...,sn)
lar and the system of equations
A(sl,...,sn)x = c has a nonnegative solution
is of course a basic solu-
{a,x}
(Then
x.
tion of (D).) The simplex algorithm consists of a sequence of exchange steps. each step a basic set is given and one constructs a new basic set and the corresponding vector
One seeks to achieve:
n
n E
x' E Rn.
In
a' c S
b(si)xi <
i=1
E
i=1
{a',x'}
i.e. that
b(sl)x!; 1
(12)
is a better basic solution than
in the sense
{(Y,x}
that the preference function of (D) assumes a larger value. In the following we are going to split this exchange step into six substeps, each of which will be discussed in detail.
Special attention
will be devoted to the question of determining when an improvement (12) is possible.
The numerical considerations associated with the simplex algorithm will be dealt with in §14.
In order to start the simplex algorithm an
initial basic solution {a0,x0}
must be known.
In §15 we shall describe
how to construct an initial basic solution.
We assume now that we are given a basic set basic solution
{a,x}.
Thus
x
a
and the corresponding
is the unique solution of (4).
We have already stated that the simplex algorithm also delivers approximate solutions to the primal problem (P).
The following simple com-
plementary slackness theorem indicates how the basic set sociated with a vector
Complementary slackness theorem.
(13)
and
{&,x}
a
may be as-
y E Rn.
Then
feasible for (D).
y
and
Let {o,x}
y
be feasible for (P) are optimal for (P)
and (D) respectively if and only if
xi
Cn
r=1
a(.)
1
- b(si)} = 0, J
i = 1,...,n.
(14)
96
THE SIMPLEX ALGORITHM
V.
Proof:
mality of
We showed in (20) of §4 that (14) is sufficient for the optiy
and
The necessity is an easy consequence of the
{0,x}.
relation
n i=l
n
b(si)x i = v(D) = v(P) =
E r=1
combined with the dual constraints. v(P) = v(D)
cryr We recall that we have assumed
in this entire chapter.
The statement of the complementary slackness theorem can also be phrased thusly:
{&,x}
and
y
are optimal for the Problems (P) and (D)
respectively if and only if they satisfy the following systems of equations and inequalities: Primal constraints n
ar(s)yr > b(s),
s E S.
(15)
r = 1,...,n
(16)
r=1
Dual constraints n
ar(si)xi = cr, i=1
xi > 0,
i = 1,...,n.
Complementary slackness conditions
rn xiS
i = 1,...,n.
ar(si)yr - b(si)} = 0,
Our given basic solution Starting from
{a,x}
{a,x}
(17)
must of course satisfy (16).
we determine a vector
is satisfied as well by selecting
y
y E Rn
such that (15)
as the solution of the equations
n
ar(si)yr = b(si),
i = 1,...,n.
r=1
This system has a unique solution
y
since the system can be written
AT(sl,.. ,sn)y = b(si,.. ,sn). Here
AT(sl,...,sn)
is the transpose of the matrix
and
b(sl,....sn) = (b(sI),....b(sn))T E R.
(18)
A(sl,...,sn)
in (5)
12.
Basic Solutions and the Exchange Step
A(sl,...,sn) AT(sl,...,sn)
97
is nonsingular by the definition of basic solution.
Hence
Thus
has the same property.
y = AT(s1....,sn)-1b(sl,...,sn)
is uniquely determined by (18). Exchange Substeps (El) and (E2).
(19)
The basic set
a = (sl,...,sn} C S is given.
Compute the unique nonnegative solution
(El)
x
to the linear sys-
tem of equations A(sl,...,sn)x = c.
Determine the unique solution
(E2)
y
to the linear system of equa-
tions
AT(s1,...,sn)y = b(sl,...,sn). If
also satisfies
y n
ar(s)yr > b(s),
s E S,
r=1
then
y
is optimal for (P) and
{a,x}
optimal for (D).
We assume now
that we are given a basic set a = {sl,...,sn} C S such that the vector tions (15).
y
calculated in (E2) does not meet all the condi-
Then
{a,x,y}
is not a solution to the system (15)
- (17).
We describe now how to find
an approximate solution {a',x',Y'}
to the
system of equations and inequalities (15) - (17) which is better
in the sense of (12).
The basic sets
except one in common.
Thus if
and
a
a'
will have all elements
a = {sl, ..,sn},
then exactly one s' E S
si, i = 1,...,n, say
which did not belong to
a.
sr, will be exchanged for an
Hence
98
THE SIMPLEX ALGORITHM
V.
a' = {sl' .... sr-1's''
S! = si,
sr+1'...,sn} =
iTr
,
S'r = S'. Alternatively,
a' = (a U (s'}) . {sr} r E
for some
included in
We describe first how to select {a,x,y}
a'.
s' E S
to be
are hence given.
(20)
Exchange Substep (E3).
(E3)
Determine
s' E S
such that
n
r=1
ar(s')Yr < b(s').
If no such {a,x,y}
s'
(21)
exists, then the computation is stopped here, since
solves (15) - (17).
This means that we include in the basic set is such that a primal constraint is violated.
a'
a point
which
s'
This fact entails that
s' f a. There remains to determine a member i.e. will be replaced by
si E a
which shall leave
a,
s'.
(22)
Exchange Substep (E4).
(E4)
Compute the solution
d E Rn
of the linear system of equations
(23)
A(si,...,sn)d = a(s') i.e.
n
a(si)di = a(s'). i=1
(23) thus expresses the "new" vector the "old" vectors
a(si), si E a.
from the following argument.
a(s')
as a linear combination of
The meaning of the vector
d
is clear
Consider the set
a U {s'} = {s1,...,sn,s'} C :S.
It consists of
n+l
elements.
(24)
Introduce the n+l-dimensional vector
(x1 - Adi,...,xn - AdnA) (25)
_ (xi(A),...,xn(A), xn+l(a))T E Rn+l
12.
Basic Solutions and the Exchange Step
(A E R
is arbitrary).
{a U {s'}, x(A)}
The value of the dual preference function for
will be denoted by
c0(A):
n
n
b (si) xi (a) + b (s') A =
c0 (A) _
b (si) (xi-Adi) + b (s') A.
i=1
If we put
99
i=1
A = 0, we get
c0(0) =
nn
L
b(si)xi,
i=1
the "old" value of the dual preference function. (26)
The following relation is true for all
Lemma.
A:
(27)
c0(A) = c0(0) + XACs'), where
n
A(s') = b(s') - I ar(s')yr > 0. r=1
(Compare (21).) Using (18) and (23) we have
Proof:
n cOCX) =
b(si)xi + A{b(s') -
i=1
n
b(si)di}
i=1
= c0(0) + A{b(s') - b(sl.... ,sn)Td}
= c0(0) + A{b(s') - yTA(sl,...Isn)d) = c0(0) + A{b(s') - yTa(s')} = c0(0) + AA(s'). Since not
A(s') > 0, the value of the dual preference function for smaller than that for
is feasible for all
x = x(0).
Therefore, if
x(A)
is
{a U {s'}, x(A)}
A > 0, then the value of the dual preference function
can be made arbitrarily large.
This should mean that (D) is unbounded,
entailing that (P) is inconsistent.
This case is dealt with in the follow-
ing lemma. (28)
Lemma.
d. < 0,
Let the unique solution vector
d
of (23) be such that
i = 1,...,n.
(29)
Then (D) is unbounded and hence (P) is inconsistent. Proof:
We note first that (23), (24) and (25) imply that the equality
constraints of the dual problem are met independently of (29).
Thus
100
V.
THE SIMPLEX ALGORITHM
n
r = 1,...,n,
ar(si)xi()L) + ar(sl)xn+1(A) = cr, i=1
and this equation is true for all real i =
xi(A) > 0, A > 0.
for all
If (29) holds as well, then
A.
Letting
A -* +m, by (27) we conclude that
c0(A) +
establishing the assertion. It is now clear how to select One calculates the maximal
A
when some of the
di
are positive.
such that
A
i = 1,...,n.
xi(A) = xi - Adi > 0,
(30)
Then one need only consider those indices
i
such that
di > 0.
If
di > 0, then (30) is equivalent to A < xi/di.
Thus
a = min {xi/di, di > 0}
It is also clear that at least one of the
meets all the conditions (30).
xi(a), i = 1,...,n
components
of the vector
x(a)
will vanish.
Indeed,
if
a = xr/dr
r E {l,...,n},
for an
(31)
then we get
x
xr (a) = xr - dT dr = 0.
(32)
r
sr
The corresponding element
is removed from the basic set.
Hence we
put {a u {s'}} . {s
al
}
r s1,
s
sl'
sr+1, ..,sn}
'sr-1'sr' sr+1'
..,sn}
and x' = (x1(A),...,xr-1(a),X, xr+1(a),...,xn(a))T xr
xr _
(x1
d
r
d1,...,xr-1
d
r
xr dr-1,
dn)T. n - dr r
d
r
xdr
xr+l -
r
dr+1,...,
(33)
12.
Basic Solutions and the Exchange Step
Use (27) and (32) to verify once more that
Exercise.
(34)
101
{a',x'}
is feasible for (D) and that
n
n
x
b(sl)xi =
b(si)xi + dr A(s').
i=1
r
i=1
(31) does not necessarily determine the index
r E {1,...,n}
uniquely.
We summarize the process above (i.e. the determination of which element
sr
a) as follows:
to remove from
(35)
Exchange Substeps (ES) and (E6).
Let
d
be the unique solu-
tion of (23) in Substep (E4). (ES)
di < 0, i = 1,...,n, then (D) is unbounded and (P) is in-
If
The computations are stopped.
consistent. (E6)
If there is a positive
di, then select an
with
r E (1,...,n}
and such that
dr > 0 x
dr = min{xi/di, di > 0}. r
Next put
a' = {a U {s!}} . {s r
Now the fundamental question arises whether the "new" set set.
a'
In that case one can repeat the process from Substep
instead of
a).
is a basic
(El) (with
a'
Thus one gets an iterative scheme, the simplex algorithm.
We now prove (36)
Lemma.
step (E6). Proof:
Let
Then
a'
s'
be found via Substep (E3) and
Sr
via Sub-
is a basic set.
To facilitate the presentation we renumber the vectors
a(si), i = 1,...,n
r = 1.
so that
Thus we must show that
a(sp), a(s2),...,a(sn)
are linearly independent.
(37)
Since
a
is a basic set the vectors (38)
aCs2),...,a(sn)
must be linearly independent. linearly dependent.
Then
of the vectors in (38);
a(s') =
nn
E
i=2
a(si)Pi.
Assume that
a(s')
a(s'), a(s2),...,a(sn)
are
can be written as a linear combination
102
V.
THE SIMPLEX ALGORITHM
Comparing with (23) we find that dI = 0,
d2 = P2,...,dn = Pn.
This contradicts the fact that we have assumed selected such that
dr > 0.
A( sl,...,sr-1's
r = 1
since
r
is always
The system of equations ,sn )x
= c
'sr+1'
has a unique nonnegative solution
x'
since the index
in Substep (E6) precisely according to that criterion.
r
was selected
(See also (33).)
Thus Lemma (36) guarantees that one can return to Substep El with the new basic set
a', provided no interruption occurs in Substeps (E3) or
As stated earlier, the goal is to increase the dual preference
(ES).
function, i.e. to achieve that b(slI...Isn)Tx < b(sl,...,sn)x' holds at each simplex step. under all circumstances.
Unfortunately this cannot be provided for
That is, if
T T xr b(si,...,sn) x' = b(sl,...,sn) x + A(s').
d r
and
and
s'
A(s') > 0
s
are chosen such that
r
and
dr > 0
then it is quite possible that
xr = 0 holds.
Then the value of the dual preference function would remain con-
stant during the transfer from the basic set
a
to the new basic set
a'.
Such an exchange would appear not to be worthwhile. (39)
A basic solution
Definition.
xi > 0, i = 1,...,n.
If at least one
{a,x}
is termed regular if
xi = 0, then the basic solution is
called degenerate. (40)
Exercise.
Minimize
6
1
We are given the following optimization problem (P) 1 +
1
r-1
r- )
yr
subject to
r=l (P)
6 E
r=1
sr-lyr > es,
s E [-I'll.
Basic Solutions and the Exchange Step
12.
103
The corresponding dual problem reads q
Maximize
s.
i=1
1
i
i=1
( 1)r-1r
1 +
=
sr-lx
C
(D)
subject to
e lx.
E
= 1,...,6,
i
i = 1,...,q.
xi > 0,
Verify the statements below. i)
Put
q = 7
a(l)
a(1) = {sl,...,s7}, x(1) E]R7
and define
-
=
,
0,
, 1},
,
x(1) = (1/12, 5/18, 5/12, 4/9, 5/12, 5/18,
Then
{0(1),x(1)} ii)
Let
x
(2)
1/5, 19
2S
25
25
25
19
1}, T
(144' 48' 72' 72' 48' 144)
{a(2),x(2)}
iii)
1/12)T
is feasible for (D) but is not a basic solution.
(2)
Then
by
is a regular basic solution.
Using the reduction process from (14) of 98, one may construct from
{0(1),x(1)}
a basic solution with the basic set
a(3) = {-l, -/, 0, 41-15,
315, 1},
x(3) = (0, 5/9, 8/9, 0, 5/9, 0)T Then
{a(3),x(3)}
is a degenerate basic solution.
We observe that when an optimization problem is such that all basic sets are regular then the dual preference function increases with each simplex step.
We now summarize all the Substeps of the exchange step for the linear optimization problems of type (P). (41)
The exchange step of the simplex algorithm.
Let a basic set
a = {s1....,sn} c S be given (the construction of an initial basic set is treated in §15). introduce the nonsingular matrix A(s1,...Isn)
We
104
V.
with the columns
THE SIMPLEX ALGORITHM
a(s1),...,a(sn), and the vector
b(sl,...,sn) _ (b(sI),...,b(sn))T. (El)
Determine
from
x E Rn
A(sl,...,sn)x = c. (E2)
Compute
from
y E Rn
AT(sl,.... sn)y = b(sl,...,sn). (E3)
Determine an
s' E S
such that
n
I ar(s')yr < b(s').
r=l
with this property exists, then
y
is optimal for (P) and
If no
s'
{a,x}
optimal for (D), and the calculations are stopped here. (E4)
Compute
d = (d1,...,dn)T E Rn
such that
A(sl,.... sn)d = a(s'). (E5)
If
d. < 0,
i = 1,...,n,
then (D) is unbounded and (P) is inconsistent, and the computations are stopped here. (E6)
r E {1,...,n}
Find x
such that
x.
dT = min
/di > 0}
di and put
a' = {a U {s'}}
.
{sr},
..'sr-1,s" sr+1'
i.e. .
,sn} _ {s',...,s'}.
{ s l '
Then
a'
is a basic set and the corresponding basic solution
satisfies x b(si,...,sn)Tx' = b(s1,...Isn)Tx + dT A(s'). r (42)
Remark.
The Substeps (El),
of linear systems of equations. the calculations efficiently.
(E2) and (E4) call for the solution
We have not yet described how to arrange
The different variants of the simplex al-
gorithm differ only in this respect.
Fundamental for the analysis of the
numerical properties of the various simplex algorithms is the recognition
13.
The Simplex Algorithm and Discretization
105
that at each simplex iteration linear systems of equations are solved, explicitly or implicitly. (43)
Remark.
We shall discuss this matter in §14.
We note that exactly one element is exchanged by the
transfer from the "old" basic set
to the "new" one
a
a'.
There are
other exchange procedures by which several elements are exchanged at each One extreme case is the so-called simultaneous exchange when all
step.
elements of a
are changed by the transfer to
a'
(see Judin and
We also mention in this context the Remez
Goldstein (1968), p. 506).
algorithm (see Cheney (1966), p. 97), where again the entire basic set is The computational effort is generally greater
exchanged at each step.
than by the exchange algorithm described above but on the other hand one hopes to achieve greater increases in the value of the dual preference function per iteration step.
§13.
THE SIMPLEX ALGORITHM AND DISCRETIZATION Let an initial basic set
aO = {s0,...,sO}
be known.
(See §15.)
If we now perform an exchange step and no interruption occurs in (E3) and (E5)
(in each of these cases there is no need to continue the computations),
then (E6) gives a new basic set
a' = {sl,...,sn}.
Substep (El) and start a new exchange step. the simplex algorithm.
Hence we can return to
In this way we have obtained
Thus we generate a sequence
1
a0
of basic sets,
k
k
k
= {s1,...,sn},
Note that
ak
and
k = 0,1,...
.
have all elements except exactly one in common.
ak+l
We also get a corresponding sequence of basic matrices
AO -). A, -.A2i ...
,
where Ak =
has the column vectors xk
a(sk),...,a(sn).
= Ac, k = 1,2,... kl
are such that
The corresponding vectors
106
THE SIMPLEX ALGORITHM
V.
T+jx
bix1 < b2T x2 < ... < bkxk <
< ... < v(D),
where k
k
bk = b(s1,...,sn).
The matrix
Remark.
(1)
Ak-1
differs from
only by one column
A.
vector!
We now want to describe in greater detail how to determine the vector which is to be included in the basis (Substep (E3)).
a(s')
general very many indices
s E S
There are in
such that
n
ar(s)Yr - b(s) < 0. r=1 If one wants to write a computer program for carrying out the exchange step, then one must given an unambiguous selection rule. The case of linear programming,
(2)
finite set.
In this case
ISI <
Usually one has the rule to select
s'
S
is a
at the minimum point
of the error function n
ar(s)yr - b(s). r=1 Thus we take an index value which renders the function
n
A(s) = b(s) - I ar(s)Yr r=1
a maximum.
of
Hence, in Exchange Substep (E3) we add to the basis an element
which is such that the primal constraints are violated as much as
S
possible.
Since
S
is finite we can determine an element
s'
which
has the property
A(s') > A(s),
sES
(3)
by means of finitely many arithmetic operations.
If
s'
is not uniquely
defined by (3), then we must introduce further conventions to make a unique choice possible.
If
interval, we take as
S s'
is an ordered set, e.g. a finite subset of a real the smallest index satisfying (3).
Thus the Substep (E3) of the Exchange step is completely specified for a finite index set.
For this class of linear optimization problems
one can establish a simple result on the convergence of the simplex algorithm.
13.
The Simplex Algorithm and Discretization
Consider the case
S = {1,...,m}
finitely many different basis sets (n = =ml
Ym,n
where
107
m > n.
a = {sl,.... sn}.
Then there are only Indeed, there are
m!
n!(m-n)!
different subsets of
S with n elements. Hence there are at most y m,n different basic solutions of the system occurring in the dual problem (LD)
Ax = c,
x > 0.
In principle, it is possible to solve the dual pair (LP) - (LD) by
means of calculating all these basic solutions and then to pick the one which assigns the highest value to dual preference function.
In practice this is not possible since the computational effort
thereby required is prohibitive even for modest values of m
and
n.
As
an example we mention that Y20,10 = 184756.
The decisive advantage of the simplex algorithm is the fact that a sequence of basic solutions is systematically generated in such a manner that the corresponding values of the dual preference function form a nondecreasing Therefore usually only a small fraction of the possible number
sequence.
of basic sets will be generated.
This is the reason for the efficiency
of the simplex algorithm of linear programming. (4)
Theorem.
Let
S
have finitely many elements; i.e. we consider
the dual pair (LP) - (LD) of linear programs. bounded.
Let (LD) be feasible and
Assume also that the simplex algorithm generates a sequence of
basic solutions such that the corresponding values of the dual preference function form a strictly increasing sequence.
Then the
simplex algorithm
delivers optimal solutions to (LP) and (LD) after finitely many iterations. Proof:
Since the values of the preference function corresponding to
the basic solutions which are generated by the simplex algorithm are strictly increasing, the same basic set cannot appear twice. simplex algorithm generates pairwise different basic sets.
Thus the Since there
are only finitely many basic sets the simplex algorithm must stop at an optimal solution after finitely many iterations. (5)
Remark.
If all the basic solutions which are generated by the
simplex algorithm are regular (see (39) of §12), then the preference function of the dual problem is strictly increasing.
Hence the simplex al-
gorithm must deliver an optimal solution after finitely many iterations.
108
THE SIMPLEX ALGORITHM
V.
If degenerate basic solutions occur, it is quite possible that the simplex algorithm "cycles", i.e. the same basic solutions reappear periodically and the value of the dual preference function remains constant without having reached its optimum. have been constructed.
Examples illustrating this phenomena
However, such "pathological" cases occur so rarely
that one generally does not bother with taking special precautions for dealing with them when constructing computer programs for practical use. It sometimes happens that degenerate basic sets do occur and thus one or several simplex steps are carried out through which the current value of the dual preference function does not increase, but normally the increase resumes without the use of any special devices for achieving this desired state of affairs.
The case of degeneracy and possible cycling is of course of great theoretical interest.
By means of a modification of Exchange Substep (E6)
the simplex algorithm may be altered so that the same basic set cannot reappear even if degeneracy occurs.
Then the simplex algorithm gives an
optimal solution after finitely many iterations in this more general situation as well.
The principle behind this modification is to introduce an
arbitrary small perturbation of the vector function.
c
in the primal preference
Hence we construct a perturbed problem such that no degenerate
basic solutions are generated by the simplex method.
Hence this perturbed
problem is solved after finitely many simplex iterations.
By construction
one can now determine an optimal solution of the original problem from the calculated optimal solution of the perturbed problem.
This so-called
e-method is described in Charnes, Cooper and Henderson (1953).
It uses
the so-called lexicographic ordering to modify Exchange Substep (E6). See also Hadley (1964) or Collatz and Wetterling (1971). It is much more difficult to prove a convergence statement of the form lim bkT xk = v(D)
kwhen there are infinitely many constraints.
Then the simplex algorithm
can not, in general, be shown to stop after finitely many iterations. Theoretical investigations of this case can be found in the book by Blum and Oettli (1975), p. 247-255 and in the writings by Carasso (1973) and Hofmann and Klostermair (1976). When
S
has infinitely many elements, then there is of course no
general procedure to find an tions on the index set
S
s'
satisfying (3).
and the functions
Without special assump-
ar, r = 1,...,n
and
b, it
13.
The Simplex Algorithm and Discretization
is not certain that an special case when continuous on mine an
s
S
109
with the property (3) exists.
s'
is a compact subset of
Rk
Even for the
a1,.... an,b
and
are
S, it is not possible to give a general method to deter-
where
A(s)
assumes its maximum value.
The case just men-
tioned has appeared several times before in our text. in uniform approximation problems.
It often occurs
In theoretical analysis (e.g. con-
vergence proofs) one often works with
s'
satisfying (3).
Some minor
But in practice
relaxations of this condition are sometimes introduced. one normally proceeds along the lines to be given below: (6)
Modification of the exchange substep (E3) when
a finite subset
{sl,...,sm} c :S
A(s') > A(s), (If
and determine an
s'
ISI
= -.
Select
such that
(7)
s E Sm.
is not uniquely determined by (7), then one proceeds as described
s'
in (2).) It is easy to realize that this corresponds to a discretization of (P) in the sense of (10) of §3. (Pm)
Minimize
c
T y
subject to
If we now start with a basis
Consider the linear program a(s)Ty > b(s),
a E Sm
s E Sm.
and use the selection rule from (2)
then the simplex algorithm applied to (P ) above delivers the same new m basis elements s' as when it is used on the continuous problem (P)
Minimize
T
c y
subject to
when one also starts from
a
s E S
a(s)Ty > b(s),
and determines
according to (7).
s'
The "rough" calculation of the new element
s'
to enter the basis
and (approximately) satisfying
A(s') > A(s),
sES
thus corresponds to a discretization of (P).
This gives us a reason to
discuss discretization of linear optimization problems with infinitely many constraints.
Discretization is very important, both in theory and in
practice.
Consider the problem n (P)
Minimize
cTy
subject to
I
ar(s)yr > b(s),
r=l
This problem is approximated by the linear program
s E S.
110
THE SIMPLEX ALGORITHM
V.
n
Minimize
(Pm)
cTy
subject to
ar(si)yr > b(si),
I
i = 1,...,m.
r=1
Here, {s1,...,sm}
is a fixed subset of
S.
We now give a useful interpretation of the discretized program (P M S
is assumed to be a subset of Definition.
(8)
wl,...,wm
Let
Rk.
T = {sl,.... s
be a subset of
S, and
be real-valued functions with the properties (i) and ii) be-
low:
i) wj (s) > 0, s E S, j = 1, ... ,m; w j (s
=
i#j
f 'o:,
i)
i,j = 1,...,m.
,
Suppose a real-valued function function
Lf: S a R
f
is defined on
S.
We define the new
by
m (Lf)(s) _
wj(s)f(sj).
I
j=l
Then
is called a positive interpolating operator with nodes
L (9)
sl,.... sm
Piecewise linear interpolation in one dimension;
Example.
S = [a,8], a = sl < s2 < ... < sm = 8.
Define
wj, j = 1,...,m
accord-
ing to:
to
a<s<s.
(s-s.
)/(s
-
- j-1
< s < s.
(only for
j > 1)
(only for
j < m)
j-
w (s)
See Fig. 13.1.
(sj+1-s)/(sj+l-sj)
sj L s < sj+1
0
sj+l < s <
This construction may be generalized to the "triangulation"
of multidimensional areas. functions
s.
)
wj
It is also possible to work with weighting
of a more general nature, e.g. piecewise polynomials of
degree higher than 1.
The following result motivates the use of positive interpolating operators. (10)
nodes
Theorem.
sl,.... sm.
Let
L
be a positive interpolating operator with
Then the linear optimization problem n
(PL)
Minimize
cTy
subject to
r=
(Lar)(s)yr > (Lb)(s),
s E S
=1
has the same feasible vectors
y
and hence the same solution as the dis-
The Simplex Algorithm and Discretization
13.
1
111
w3
------
b
s4
s3
SZ
S
m
Fig. 13.1
cretized problem (Pm). Proof:
a)
Let
(Lf)(si) = f(si), y
we find that
y meet the constraints of (P L).
Since
i = 1,...,m,
also satisfies the constraints of (P
m).
Assume on the other hand that
b)
n ar(si)yr > b(si),
E
i = 1,...,m.
r=1 wi(s) > 0, i = 1,...,m
Since
n
and
s E S, we get
m
E (Lar)(s)yr - (Lb)(s) _
r=1
i=1
nc
wi(s){
r=1
ll
yrar(si) - b(si)F 10 1
s E S, proving the assertion.
for all
The discretization (Pm) of (P) is equivalent to replacing (P) by a linear optimization problem with the same index set tions
ar, b
approximated by
Lar, Lb
S
respectively.
but with the funcIt is possible to
express the deviation of the optimal value of (P ) from that of (P) in
m
terms of the interpolation errors maxILa (s) - a (s)l, r sES r
maxlLb(s) - b(s)j.
r
sES
Compare Theorem (16)! (11)
be a grid.
Definition. Let
Let
S
be a subset of
Rk
and let
{sl,...Ism} c S
112
V.
h = h(s
Is - s.I. min 1 sES 1
.. ,s ) = max
m
1
Then
is called the roughness of the grid.
h
THE SIMPLEX ALGORITHM
Here,
I I
Euclidean distance in Exercise.
(12)
denotes the
Rk.
Consider the interpolating operator of (9).
that there is a constant
Show
such that
c
I f(s) - (Lf) (s) I < ch2
max
sE [a, s] when
f
is twice continuously differentiable.
Note.
Rk, k > 1.
Two numerical examples.
(13) a)
This result cannot be directly generalized to
Minimize
yI + y2/2 + y3/2 + y4/3 + y514 + y6/3
subject to the
constraints 2
yI + y2s + y3t + y4s2 + y5st + y6t IYrI < 10,
The index set
> es +t
,
s E [0,1],
t E [0,1]
r = 1,...,6. [0,1] x [0,1]
is replaced by the 25 points
si = 0.25 (i-1), tj = 0.25 (j-1), i = 1,...,5, of the grid is
2
h = 0.12S r2- c 0.1768.
j
= 1,...,5.
where
The roughness
The discretized problem is hence a
linear program with 6 variables and 37 constraints. of the simplex method.
(silt .)
It was solved by means
In the table below the solutions of the discretized
and the continuous problems are given.
The latter was solved with the
three-phase algorithm of Chapter VII. Discretized Problem Optimal value
2.41
Original Problem 2.44
Optimal solution yl
Y2
y3 Y4 y5 Y6
2.86
2.58
-4.69
-4.11
-4.69
-4.11
4.55
4.25
4.31
4.53
4.55
4.25
This example was solved by means of the computer codes of K. Fahlander
(1973).
13.
The Simplex Algorithm and Discretization
113
The following example gives an idea about how rapidly the dis-
b)
cretization error decreases when the grid is refined.
We consider the
problem 6
Minimize
6
subject to
E Yr r=l
E r=1
sr-ly
r >- 1/(1+s2),
We discretize this problem by replacing the index set SR = {s1
S
- -
0 < s < 1.
by the subset
i = 1,...,R}.
=
Using the simplex algorithm we got the results below.
(The original
problem was again solved by means of the three-phase algorithm of Chapter VII.)
Index Set
Roughness of Grid
S21
1/40
0.785 561 34
S41
1/80
0.785 568 72
S81
1/160
0.785 568 92
Optimal Value
0.785 569 11
S
(14)
Solution of linear optimization problems by discretizations. Si, k = 1,...
Select a sequence of finite subsets
of the index set
S
with the properties
h(SR) = max min
is - sl + 0
when
t +
(15)
sES s`ESR
and
R = 1,2,...
SR c SR+1'
.
The linear programs (PR) are solved by means of the simplex algorithm:
Minimize
(PR)
c
n
T
y
subject to
I
ar(s)yr > b(s),
s E S
r=1
An optimal basic solution to the dual (DR) can be used as the starting basic solution to Remark.
DR+l.
It is possible to prove that
lim v(PR) = v(P) R-
provided that the assumptions of the duality theorem (7) of §11 are met, the sequence of discretizations satisfies (15), S c Rk and the functions
al,...,an,b
are continuous on
S.
is a compact set,
114
V.
THE SIMPLEX ALGORITHM
The following simple theorem can often be used to estimate the difference between the optimal value of the discretized problem and that of the original problem. Theorem.
(16)
Let the linear optimization problem be such that there
is a vectors E Rn
P > 0
and a real number
satisfying
nn
a(s)TY =
L
ar(s)Yr > P,
s E S.
r=1
(si,...Ism)
Let
be a subset of
S.
The linear program arising when
is replaced by this subset is assumed to have a solution A
m
y(m).
S
Let
be such that
> 0 nn
ar(s)Yim) + Am > b(s),
s E S.
(18)
r=1
Then
v(P), the value of the linear optimization problem (P), can be
bracketed as follows: cTy(m) < v(P) < cTy(m) + Amp-1cTy Proof:
The leftmost inequality is well known.
See (12) of B.
show the other inequality we observe that the vector y = Y(m) + A m p-
1y
meets the conditions of (P).
We find from (17) and (18) that
n ar(s)Yr = r=1
E
ar(s)YTm) + AmP-1
r1
Hence we get v(P) < cTY = C
E
r=1
+
AMP-1cT
establishing the desired result.
ar(s)Yr > b(s),
s E S.
To
Chapter VI
Numerical Realization of the Simplex Algorithm
In this chapter we shall describe how to implement the simplex algorithm on a computer.
As stated earlier, this algorithm requires the
solution of a sequence of linear systems of equations.
We devote consid-
erable space to explaining how to solve such systems in a computationally efficient way.
In the last section we discuss the construction of a basic
solution with which one can start the simplex algorithm.
STABLE VARIANTS OF THE SIMPLEX ALGORITHM
§14.
Each exchange step of the simplex algorithm calls for the solution of three linear systems of equations.
In Substeps (El), (E2) and (E4)
we encounter A
xk k
= c,
(1) (2)
Tk
Aky
- bk,
Akdk = ak'
(3)
The meaning of the abbreviations with (41) of §12.
bk, ak
will be clear if we compare
We observe that the vector
ak
will not be known before
the system (2) is solved. In principle, one could solve the three systems (1), (2), and (3) straightforwardly in each exchange step of the simplex algorithm.
One
could use any of the standard methods (e.g. Gaussian elimination or Householder transformations) to calculate the vectors (1), (2), and (3) respectively.
xk, yk, and
dk
from
These and other numerical methods are
described in textbooks on numerical analysis, e.g. Dahlquist and Bjorck
115
116
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITH
VI.
(1974), Stoer (1976) and Stewart (1973).
Such a procedure can make sense in some cases, in particular when the number
is modest, say
n
n = 10.
quired grows rapidly with
n.
However, the computational effort re-
In a general case it increases as
n3.
Hence the total effort would be prohibitive for problems of a size often encountered in practice, i.e. with hundreds and thousands of variables, even if a large powerful computer is available.
Therefore several variants of the simplex algorithm have been developed in order to reduce the computational labor.
to exploit the fact that the matrices
Ak-1
and
The decisive idea is Ak
are closely related.
They differ only by one column vector.
We shall now discuss a variant of the simplex algorithm which is based on Gaussian elimination.
The rest of this section is not crucial for the
understanding of the simplex algorithm since it deals with the efficient and accurate solution of a sequence of linear systems of equations.
Hence
the reader may skip this topic during the first reading of the book without losing contact with the contents of succeeding sections.
We consider a linear system of equations of the form
where
A = (aik) (i,k = 1,...,n) is a fixed nonsingular matrix and
given vector.
b
a
In order to solve the system of equations one seeks to
determine a nonsingular matrix The product
R
of
F
and
with the following property:
F A,
FA = R
(5)
is an "upper triangular matrix" of the form r11
r12
..
r22
R =
rIn
with
r.. = 0,
i = 1,...,n.
O (5) is called a triangular factorization of the matrix (6)
A.
The factorization method for linear systems of equations.
pose a triangular factorization (5) is known.
Sup-
Then the system
Ax = b is equivalent to the system Rx = Fb.
(7)
14.
Stable Variants of the Simplex Algorithm
In order to solve
Ax = b
117
one first calculates the vector
b = Fb
and
then solves the system r11x1 + r12x2 + ... + rInxn r22x2 +
b1
b2
+ r2nxn
Rx =
= Fb. x r nnn
I bn
The last system is easily solved by means of back-substitution: x
n
= r-lb
nn n 1
rn-l,n-16n-1 - rn-l,nxn
xn-1
1
(8)
r12x2 - ... - rlnxn).
xl =
r11-
(9)
Solution of
ATx = b.
system of equations
The
ATx = b
(10)
which contains the transpose
of A
AT
factorization (5) is available.
can also be easily solved when a
Indeed, (10) is equivalent to the two
systems of equations RTy = b
(11)
x = FTy.
(12)
(This statement is verified by multiplying (12) by AT = RT(FT)-1.) solve (10) one starts by determining
y
To
from (11):
r11Y1 b2
r12y1 + r22y2
= b.
RTY =
l rlnyl + r2ny2 + ... + rnnyn Thus
y
y1,...,yn
b
I
n 1
is calculated by means of forward-substitution and one finds in analogy with (8).
from (12) without major effort. simplex algorithm.
The solution
x
is subsequently found
Consider now exchange step
k
of the
Let a triangular factorization
Fk Ak = Rk
of the basis matrix
Ak
be known.
Then the three linear systems of equa-
tions which appear in this exchange step,
118
VI.
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
Akxk = C,
T Akyk = bk, Akdk = ak,
may be solved as described in (6) and (9).
Numerical schemes for triangular factorization.
(13)
The most common
methods for calculating a triangular factorization of the type
FA= R Put
are based on the following idea.
A(1) = A A(2), ..,A(n-1)
and determine a sequence of matrices
according to the
rules
A(2) = F(1) A(1) A(3)
=
F(2) A(2)
F(2) F(1) A
F(n-1) A(n-1) = F(n-1)
A(n) = Here
=
F(1),...,F(n-1)
mined such that
...
F(1) A.
is another sequence of matrices which are detertake the form indicated below (here "x"
A(2), ..,A(n)
means that the element at this point may be different from 0)
A(2)
_
x
x
0
x
0
x
0
... ... ...
x
x
x
x
x
x
0
x
x
x
0
0
x
0
0
x
0
0
x
A(3) =
x
x
A(n)
_
x
x
x
x
... ...
x
...
... ... ... ... ...
x x x x
x
x x x
(14)
O X) Next we put F(n-1) i.e.
FA = R
A(n) = R.
...
The triangular factorization sought is then written
F(1) A = R,
14.
Stable Variants of the Simplex Algorithm
119
with F = F(n-1)...F(1).
Thus the original matrix A n-1
is brought to triangular form by means of
transformation steps. F(1),...,F(n-1)
Suitable matrices
F(1), i = 1,...,n-1
Gaussian elimination.
In the latter method one selects
are so-called elimination matrices and
G.
trices.
are orthogonal matrices, and the method based on
i = 1,...,n-1,
F(1) = G. Pi,
where
can be calculated in several dif-
We mention here the Householder transformations, in which
ferent ways.
(See below.)
Pi
permutation ma-
Due to space limitations we shall treat this method
only.
We
Triangular factorization by means of Gaussian elimination.
(15)
start by describing the first step of the method (13); i.e. the determination of
F(1) A Here
such that
F(1) =
A(2)
.
shall have the form (14).
A(2)
the idea of forming
A(2)
We borrow from Gaussian elimination
by subtracting suitable multiples of the first
row from the other rows of the matrix A
in order to render zero the ele-
ments of the first column in the second row, third row, etc.
We assume
first that all
+ 0.
The following "elimination matrix" has the desired effect: 1
-a21/a11 -a31/a11
-anl/all
O 1
0
0
1
(16)
0
1
One verifies this by means of straightforward calculation. one must proceed otherwise and exchange rows: ail # 0
all = 0,
If
one determines an element
and lets the first and the i-th rows change places.
The matrix
which results is then multiplied by an elimination matrix (16). In order to secure numerical stability, it is recommended to choose as the pivot element that element in the first row which has the largest absolute value:
120
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
jail) =
max k=1, ..,n
Iak1!.
Exercise (Permutation matrices).
(17)
Denote by
11 (i,k)
n x n
the
matrix 1
1
Row number i +
0 ...
1
1
1
Row number k -
... 0
1
1
.1 J
Thus we get
and
k
n
'k)
Show that
matrix.
of
be interchanging rows number II(i,k)A
Determine also
A.
n(i,k) = i
II(i,k)
i
and
k
in a unit
is obtained by exchanging rows number A R(1'k).
i
Finally, show that
(unit matrix).
We have thus constructed a matrix of type (14) by performing one step of Hence we obtain
the Gaussian elimination process.
A(2)
=
F(1) A
where
F(1) = G 1 Here
P1 (18)
P
1
.
is a permutation matrix and
GI
The general elimination step.
We now describe how to determine
x
x
x
0
x
.
A .
.
Let the matrix when
+1 .
an elimination matrix.
A(k)
x X
x x x
.
.
0
0
x
.
.
0
A (k) = (aid)) _
- k-th row
O x
4
k-th column We now perform the following operations:
A(k)
be given.
is of the form
14.
Stable Variants of the Simplex Algorithm
121
Consider the elements in column number
i)
the main diagonal of
A(k).
largest absolute value. ja(k)I Lk
=
which are on or below
Determine an element out of these which has
Let
be such an element, i.e.
a(k)
max la(k)I. ik k
Interchange rows number
ii)
k
k
and
k
of the matrix
A(k), i.e.
form the matrix
(k) P kA
where
(See Exercise (17).)
Pk = II(t,k)
Substeps i) and ii) are often
referred to as row-pivoting.
Consider row number
iii)
PkA.
in
k
of this row from all rows with numbers
Subtract suitable multiples
k+l,...,n
in such a manner that
all elements in the k-th column and below the main diagonal become zero. This means that we form A
where
(k+l)
= Gk 'k has the form
Gk
rl
0 1
+ k-th row
0 gk+l,k
(19)
1
0 gnk
with
Igikl < 1, i = k+l,...,n.
As a direct consequence of this scheme
we get (20)
Theorem.
Let
permutation matrices
A
be a nonsingular matrix.
Pl,...,Pn-l
Then there are
and elimination matrices
G 1,
...,Gn-l
such that Gn-1 Pn-1 ...
and
R
is an upper triangular matrix with
(21)
gi,k-1
G1 P1 A = R
Numerical realization.
of the elimination matrices
r.. # 0, i = 1,...,n.
One normally stores the elements Gk-l
in the positions of the matrices
V].
122
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
It is also
which are zeroed during the course of the computations.
A(k)
We shall here describe
necessary to keep track of the row interchanges.
a procedure which is advantageous to use in conjunction with the simplex algorithm, especially when one applies the "stable updating" to be discussed later.
We want to store explicitly the matrix
FG
Pn-1 ... GI P1
n-1
n x n
which is obtained by multiplying the Pn-1 Gn-1' one stores the
unit matrix by
This structure is exploited as follows. n x 2n
P11 G1
At the start
matrix
B = (A,I).
All row operations which are needed for the transfer from A(k+l)
A(k)
to
(row interchanges, additions of multiples of a certain row to
other rows) are carried out on all of
In this way we get the sequence
B.
of matrices
B(I) = B = (A, I) B(2) = GIP1B =
B(n) = G
P
n-1
(A(2),GIP1)
n-1
B(n-1)
_
(A(n),G
n-1
P
n-1" ' G 1 P 1)
(R , F) . Thus the matrix
has been replaced by the matrices
r
4
I
2
l
-3
2 3 5
R
and
F.
We want to factorize
Example.
(22)
A=
B
1
3 2
Thus we put
B= (A,1) =
4
2
3
1
0
0
2
3 5
1
0 0
1
0
0
1
-3
2
= B (1)
.
1
No row interchange is required in the first step since the element in the first column with the largest absolute value is in the first row.
Accord-
ing to iii) of (18) we subtract 1/2 times the first row from the second We then obtain
row and (-3/4) times the first row from the third row.
B(2) =
(A(2),GIP1l =
4 0 0
2 2
13/2
3 -1/2 17/4
1
-1/2 3/4
0
0
1
0
0
1
I.
14.
Stable Variants of the Simplex Algorithm
123
The second and third rows are now interchanged: 4 P2G(2)
_
(P2A(2), P2GIP1) =
2
0
13/2
0
2
3 17/4 -1/2
1
0
0
3/4 -1/2
0
1
1
0
I.
The last elimination step (subtraction of a suitable multiple of the second row from the third) gives
14 B(3) _ (R,F) =
2
13/2
0 0
0
3
0
1
17/4 -47/26
0
3/4 -19/26
0
1
1
-4/13
2 3
3
4
2
5
1 2
It is easy to check that 4 2
FA
-193/4 /26
0
-4/13
Exercise.
(23)
A=
3 2
1
6
1
3
1
1
1
(
0
3
17/4
0
i302
.
-47/26
R
Factorize the following matrix
Remarks.
(24)
-3
=
(a)
The factorization
FA = R is closely related to the so-called LR-decomposition of
A.
Thus one can
show that F = Gn-I ... G' P where n-1 ... Pi+l Gi Pi+l .. Pn-l'
Gr
i
i = 1,...,n-1,
Gnr-1
- Gn-1'
and p
Every
= Pn-1 ... Gi
GI G11
P1'
is again a matrix of type (19). -1 G' 2
(Why?)
Therefore
r-1
... G n-1 = L
is a lower triangular matrix, which is easily verified by means of straightforward calculation.
One obtains the decomposition
PA = LR where
L.. = 1, 1 = 1,...,n.
124
NUMERICAL RELIZATION OF THE SIMPLEX ALGORITHM
VI.
The method for calculating the factorization
(b)
FA = R
which we
have described above is numerically stable with respect to the round-offs which occur during the course of the computations. fact here.
We shall not prove this
This stability is the reason for the use of factorization
methods in "modern" realizations of the simplex algorithm. Consider now the k-th exchange step of the simplex algorithm. Ak
be the basis matrix.
The matrices
Fk
and
Rk
Let
in the factorization
FkAk=Rk The solutions
are calculated as described in (21).
xk, yk
dk
and
to
the linear system of equations Ak xk = c,
T Ak yk = b
k
Akdk=ak are determined as described in (6) and (9).
We have already said that one
should exploit the fact that two successive basis matrices
Ak
Ak
and
l
differ only in one column in order to get an efficient numerical realization of the simplex method.
We now show how to update a factorization of
Ak; i.e. to calculate the factorization of Ak+1 Modification techniques.
(25)
from that of
Ak.
By this we mean methods which allow
us to pass from a decomposition
FA = R of the
n x n
matrix A
to the corresponding decomposition
FA=R A arises from
where
A
as a result of "small changes"; e.g. change of a
row or column or the addition or deletion of a row or column. Use the same notations as before.
Let
A
be a fixed
n x n
matrix
and suppose the decomposition
(26)
FA = R is known.
We denote the column vectors of A
by
al,...,an:
A = (all ...,an). Let
a*
be a fixed vector.
We consider the matrix
A when the r-th column vector is removed from A added as the last column:
A
which arises from
and the vector
a*
is
14.
Stable Variants of the Simplex Algorithm
A = (al,..,ar-1'
125
ar+1°...,an,a ).
We seek the matrix FA = H.
Fail i = 1,...,n
The vectors
column vectors of
are known from (26) since they are the
Thus
R.
H = (Fal,...,Far-l, Far+1....,Fan, Fa*) and
H
is a matrix of the following form:
x x
x
x
x
FA= H= O x x
x
t
r-th column The first
r-1
columns of
are identical with those of
H
r-th through (n-l)-th columns of H coincide with the last of
R.
The last column of H
The matrix
H
is the vector
and the
R
n-r
can now be brought into upper triangular form by means
of a sequence of Gaussian elimination steps with row-pivoting. with (18).) ing rows.
columns
Fa*.
(Compare
Here one needs only to consider the exchange of two neighborOne thus obtains an upper triangular matrix
Gn-1 Pn-1 ..'
Each matrix
G. i
Gr Pr H = R.
R
through (27)
has the form 1
row i row i+l
1
-o-
0
gi
1
1
Pi
is either the unit matrix or the matrix which arises from the unit
matrix by interchanging rows number
i
and
i+l.
We have further that
126
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
Igil < I.
From (27) we get Gn-1 Pn-1 ...
r Pr FA = R.
Putting Gn-I ... Gr Pr F, we get the decomposition sought: FA = R.
Numerical realization of the modification.
(28)
tion
n x 2n
Let the factoriza-
be calculated according to (21) and given in the form of the
FA = R
matrix
(R,F).
One passes to the matrix (H,F)
We now apply
where the Hessenberg matrix is formed as described in (25).
Gaussian elimination to this matrix according to (18) and bring H upper triangular form.
The final result is then (after
n-r-1
into
elimina-
tion steps) the matrix (R,F)
and the desired decomposition of
is
AA
FFA = R.
The validity of the procedure just given is a consequence of the relations R
Pn-1 ...
Gr Pr) H
(G n-1
and f = (G'n-I Pn-1 ... Gr Pr) F. (29)
In (22) we calc ula ted the fac toriza tion
Example.
FA =
1
0
0
4
0
1
2
2 3
3
3/4
1
-4/13
-3
5
2
-19/26
4 0 0
1
2
13/2 0
Now we want to determine the corresponding decomposition
A=
4
3
1
2
1
2
-3
2
4
3
17/4 -47/26
FA = R
R.
with
A has arisen from A by replacing the second column by the third and the third by y the new column vector
a* = (1,2,4)
T
For simplicity, we work
14.
Stable Variants of the Simplex Algorithm
Thus we start with the matrix
with 3 decimal places.
(R,F) =
Entering
4
2
0 0
6.50
3
0
4 0
3 4.25
0
-1.81
0
0
1
0.750 -0.731
4.25 -1.81
0
1
-0.308
1
and following the rules of (28) we get
Fa* = (1, 4.75, 0.037) T
(H,F) =
127
1
0
1
0.750 -0.731
4.75 0.037
0
0
1
-0.308
1
We must add the second row, multi-
Only one elimination step is required.
plied by 1.81/4.25 = 0.426 to the third row.
(R,F) =
I
We then get
4
3
1
0
4.25
0
0
4.75 2.06
0
1
0.750 -0.412
0
0
1
1
0.118
which defines, within working precision, the factorization
FA = R.
Check 1
0
0
4
3
1
1
0.118 ,( -3
2
4
4
3
0.002
0.000
1
FA = I
-0.412
,
-
(
2.06
By simply counting the number of multiplications and divisions required,
factorization
such operations are necessary to determine the
n3/3
one finds that about
A
when
FA = R
on the order of magnitude of to carry out one modification.
is an
n2
n x n
matrix.
On the other hand,
multiplications and divisions are needed If we neglect the "administrative over-
head" of the computational program, the addition and subtractions, we may conclude that the use of modification techniques will entail substantial savings
for large
n.
However, for contemporary computers (1981) it is a
very rough approximation to neglect the time required for an addition or subtraction in comparison to that needed for a multiplication or division. In the present case the conclusion will not be altered even if we consider additions and subtractions as well.
will increase slowly with
n.
Normally, the administrative overhead
Modification techniques are almost a must
for treating large linear optimization problems within reasonable time. (30)
the
n x 2n
Summary.
matrix
B0 = (A0,I)
Let a starting basic matrix
A0
be given.
Define
128
VI.
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
and determine, as described in (21), the matrix (RO,F0)
such that
F0 R0=A0 (n-1
row-pivoting and elimination steps are required).
general step.
We discuss the
Suppose the matrix
Bk = (Rk, Fk)
The basis matrix
has been calculated.
Ak
has the factorization
Fk Ak = Rk.
If now the column vector
ar
is to be removed and the entering vector
a*
is determined, one calculates Bk+1 = (Rk+1' Fk+1) as described in (25) and (28).
the "new" basic matrix
In this way we find the factorization of
Ak+1'
Fk+l Ak+l = Rk+l.
415.
CALCULATING A BASIC SOLUTION In order to start the simplex algorithm we need a basic solution
(a0,x0}
We shall now describe how to construct such a starting
of (D).
solution.
We consider again the linear optimization problem (P)
where
Minimize S
cTy
subject to
a(s)Ty > b(s),
s E S.
is an arbitrary index set.
As in (16) of 411 we introduce the regularized problem (F > 0
is a
fixed real number) (P F)
Minimize
cTy
subject to
a(s)Ty > b(s),
eT
s E S
y > -F, r = 1,...,n.
-eT y > -F,
Here, er valent to
is the r-th unit vector and the last
2n
constraints are equi-
Calculating a Basic Solution
15.
129
r = 1,...,n.
lyrI < F,
If (P) has a solution
(1)
lyrl < F
such that
y
then
is a solution to
y
(P F) as well and the values of (P) and (PF) coincide.
Hence one can
solve (PF) instead of (P). The dual problem (DF).
(2)
the last
Let the dual variables associated with
constraints of (PF) be
2n
mr, mr, r = 1,...,n.
The dual
takes the form
n
q Maximize
(DF)
b(si)xi - F i=1
(mr + m ) r
I
r=1
subject to
q
ar(s i)xi
i=1
xi > 0,
+ m
r
- m = c r r
,
r = 1,...,n
i = 1,...,q
m r > 0, r = 1,..
m
r
n.
> 0,
The second term in the preference function of (DF), (m
nn
-F
r=1
r
+ m), r
(3)
may be interpreted as a "penalty" for violating the constraints q
ar(si)xi = cr,
r = 1,...,n.
i=1
If
is large enough, the constraints will be satisfied.
F
The advantage of considering the regularized problem stems from the fact that one can immediately find a basic solution of (DF). Construction of a basic solution of (DF).
(4)
Put
and con-
x0 = 0
sider the system e-
0+
mr - mr = cr,
r = 1,...,n.
(5)
We get a basic solution of (5) and hence also of (DF) by putting, for each r, one of the vectors cr > 0; otherwise
er
(-er)
or
follows:
n E
r=1
n
e
r mr
+
(-e r)
in the basis.
goes into the basis.
I(-e r)mr = c. r=l
We select
er
if
(5) can now be written as
130
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
We note that the basic solution is regular if c
# 0,
r
r = 1,...,n.
Otherwise it is degenerate.
The simplex algorithm can of course be
started in both cases. (6)
Remark on the value of the parameter F.
The starting method
described above can always be used when a suitable a priori estimate of the solutions of (D) is available.
If
F
is chosen too small, however,
then the solutions of (DF) are not feasible for (D).
Hence it is not
possible to start with the basic solution of (DF) given in (4) and to use the simplex algorithm to find a basic solution free from all the vectors er
and
-e r, r = 1,...,n, or with all the corresponding variables
my
equal to zero.
In this case one could of course increase
F
m+,
and con-
tinue with the simplex algorithm.
One arrives to the so-called two-phase method of the simplex proIf no "realistic" estimate (1) is available
cedure by arguing as follows. then one chooses
F
very large.
This means that the first term of the
preference function of (DF) has a relatively small influence. therefore be neglected.
It can
Then we consider instead the problem:
n
Maximize
(mr + mr)
-
r=1
(7)
subject to the constraints of (DF). (8)
Phase I of the simplex procedure.
The simplex algorithm is
applied to the following dual pair of linear optimization problems: (P1)
Minimize
subject to
cTy
a(s)Ty > 0,
s E S
eiry > -1,
-e Try > -1, (D1)
Maximize (9)
n - I (mr + m ) r r=1
Exercise.
r = 1,.-.,n-
subject to the constraints of (DF).
Confirm that (P1) and (D1) form a dual pair of linear
optimization problems.
Also discuss in what sense (P1) can be looked upon
as a limiting case of (P F) when
F -+
When (D1) is treated with the simplex algorithm one seeks to satisfy the constraints of (D).
If
v(D1) = 0, then (D) is consistent, and if
v(DI) < 0, then (D) is inconsistent.
15.
Calculating a BaSic Solution
131
We assume now that the simplex algorithm has delivered an optimal basic solution to (DI) after finitely many exchange steps and that the corresponding value of the dual preference function is zero.
Thus (D) is
feasible.
The basis vectors of this optimal basic solution are called
ai,
i = 1,...,n: ai E {a(s)
If now all
s E S} U {er
I
I
r = 1,...,n} U{-er
I
r = 1,...,n}.
are of the form
ai
i = 1,...,n,
ai = a(si),
i.e. none of the vectors
a
r
and
-e
appear in the basis, then one may
r
put a 0 = {s1,.. ,sn},
which is then a basic set for (P) algorithm to (P) cedure.
- (D).
We may now apply the simplex
- (D) and have thus entered Phase II of the simplex pro-
Hence this is always possible if the optimal value of (D I) is
zero and Phase I of the simplex procedure delivers a regular optimal basic solution.
er
or
(-er)
or
mr
If the value of (D I) is zero, then none of the vectors Mr can appear in the optimal basis with a positive weight
respectively.
We assume now that (D I) has the optimal value zero and a degenerate
optimal basic solution where at least one of the vectors form
er
or
as follows:
ai
is of the
Then one proceeds in Phase II of the simplex method
-er.
consider the modified dual problem: q
(DII)
Maximize
b(si)xi
subject to
i=1
q
+
ar(si)xi + mr - mr = cr,
r = 1,...,n
i=1
n
r=1
The constraint
(m+
r+mr) = 0
xi > 0,
i = 1,...,q
mr > 0,
mr > 0,
r = 1,...,n.
En=I (m+ + mr) = 0
has been introduced to insure that
r
every feasible solution of (DII) satisfies Therefore
DII
is equivalent to
D.
mr = 0, mr = 0, r = 1,...,n.
Every basic matrix of
DII
has, of
132
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
course, n+1
We now show how to construct a starting
column vectors.
basic set for D11
from an optimal basic solution of
DI.
Let an optimal basic matrix of (D I) contain the following
n
column
vectors:
al = a(s1),...,ak = a(sk),
ak+l = ei...,ak+i = eip -elk+l,...,an = -ein-k,
ak+t+1
where
and
k
A =
k
are fixed integers.
The basic matrix thus has the form
a(s1)...a(sk) 0 -1
00
Now let
be any vector out of the set
e
belong to the set
{ak+R+1'
and the following
(n+l) x (n+l)
...,an},
{ak+l.....an}.
Then
-e
cannot
since the basic matrix is nonsingular,
matrix is a starting basic matrix for
(DII) :
0...OAl...1
I
1
).
k-th column Exercise.
(10)
A
Show that
is nonsingular.
If the calculations are carried out as described in §14, then we leave Phase I with the basic matrix
A
factorized according to
FA = R.
This decomposition is recorded as the
n x 2n
matrix
(R, F) . We describe next how to find a corresponding factorization
FA= R, i.e. the (R,F)
(n+l) x 2(n+l)
matrix
15.
Calculating a Basic Solution
from
133
This is done by means of a method similar to the modification
(R,F).
techniques of §14.
We find from (11) that R
-Fe
0...0 1...I
1
0
F
A = 0
f
1
The matrix on the right is "almost" in triangular form. most
Consider the
form.
One needs at
permutation and elimination steps to bring it into triangular
n-k
(n+l) x 2(n+l)
R
-Fe
matrix
F (12)
0...0 1...1
After
0...0
1
Gaussian elimination steps, (12) is changed to assume the form
n-k
Ix xx
...
x
x
...
x )
O Hence we have the desired factorization
FA= R of the starting basis
matrix already in Phase I.
2(n+l)
an
It may be more practical to work with an
Remark.
(13)
A.
n x n
A
(n+l) x
We form it as follows, where
A0
is
basis matrix: 0
I
0
0
0...0
0
0
0
After factorizations we get R
0
F
0
0
0
0
0
(14)
It is now easy to supplement (14) to obtain (12).
Then the matrix
(R,F)
is calculated as earlier described and one may enter Phase II.
Chapter VII
A General Three-Phase Algorithm
In this chapter we shall describe a computational scheme for efficient numerical treatment of general linear optimization problems with infinitely many constraints.
For this purpose we shall derive a nonlinear
system of equations from whose solutions one constructs an optimal soluThe general scheme is then presented and
tion of the original problem.
its use is illustrated in several numerical examples. Thus we consider again the dual pair (P) - (D): (P)
Minimize
c
n
T
subject to
y
ar(s)yr > b(s),
s E S.
r=1 q (D)
Maximize
b(si)xi
subject to
i=1
q
ar(si)xi = cr,
r = 1,2,...,n,
i=1
In this chapter we shall require that (P) and (D) are solvable and that no duality gap occurs.
compact subset of S.
Rk
We shall further assume that and that
all...an, b
S
is a nonempty
are continuous functions on
Later, we shall also impose the condition that they have continuous
partial derivatives up to a certain order.
134
16.
§16.
Nonlinear Systems Derived from Optimality Conditions
135
NONLINEAR SYSTEMS DERIVED FROM OPTIMALITY CONDITIONS Theorem.
(1)
be an optimal solution to Problem (P) and let
y
Let
{s1,...,sq; x1,...,xq}
be an optimal solution to (D) with
1 < q < n
and such that
i = 1 , .. ,q.
x. > 0,
i
(2)
Put
n
f(s) = I ar(s)yr - b(s). r=1
Then
y1,...,yn, sl,...Isq, X1,.... xq
have the properties (3), (4) and
(5) below;
nn
i = 1,...,q;
ar(si)yr = b(si),
(3)
r=1 q
ar(si)xi = cr,
r = 1,...,n.
(4)
i=1
The function Proof:
f
assumes its minimal value at sl,...Isq.
(5)
(3) follows from (2) and the duality slackness conditions (4) expresses the fact that
(14) of §12.
feasible for (D).
f(s) > 0,
y
Since
{sl,...,sq, X11 ...,xq}
is
is a feasible vector for (P) we have
s E S.
(6)
By (3), f(si) = 0, i = 1,...,q, establishing (5). In the computational scheme to be described in this chapter, (3), (4), and (5) will be used for the calculation of xl,...,x
We shall assume that
.
y1,...,yn, sip...IsgI
can be determined, e.g. from a suf-
q
q
ficiently fine discretization of (P). and refer to
We shall call
x1,...,z
as the corresponding mass-points.
sl,...,sq
q Thus
masses q
is
the number of mass-points. (7)
Since we have assumed that
Remark.
each mass-point
si
S
corresponds to a vector with
si,...,si, (i = 1,...,q).
is a subset of k
Rk,
components
Thus
n + kq + q "unknowns"
1
yl
k
1
k
yn' sl,...,sl, sq,...,sq, x1,...,xq
the calculation of the primal and dual solutions. q + n
will appear in
(3) and (4) will give
136
A GENERAL THREE-PHASE ALGORITHM
VII.
equations which must be satisfied by these unknowns. equations will be derived from (5).
The "missing"
kq
Then we will get a system of equa-
tions with the same number of equations and unknowns.
Its solution is
then used to construct optimal solutions to (P) and (D). Exercise.
(8)
Show that
y1,...,yn, sl,...Isq, xl,...,xq
tions to (P) and (D), if they satisfy (2) Example.
(9)
-
ar, r = 1,...,n, and
Let the functions
tinuous partial derivatives of first order on i = 1,...,q, lie in the interior of of
= 0,
s.
asl
and
j = 1,...,k
S.
are solu-
(5).
S.
b
have con-
Assume also that
si,
Then (5) entails
i = 1
1
Therefore we get in this case the following system of
kq
equations
n
i = 1,...,q.
Var(si)yr = Vb(si),
(10)
r=1
(The gradient vector Vf
of a real-valued differentiable function
f
is here defined by
k
Of(s) = (
(s),...,af(s)
)
as
as
Thus we get, by combining (3), (4), and (10), a nonlinear system of equations with
n + (k+l)q
unknowns and the same number of equations.
This
system may be treated by means of one of the standard numerical schemes, e.g. the Newton-Raphson method.
See e.g. Dahlquist-Bjorck (1974) or Stoer
(1976). (11)
Remark.
It is well-known that the conditions (10) are neces-
sary for (5) but not sufficient.
Thus a solution to (3),
(4) and (10)
that also satisfies (2) may not satisfy (5) and hence may not be a solution to (P) and (D).
In order to establish that a candidate solution ob-
tained from the necessary conditions really solves the dual pair (P) (D), one must verify that the infinitely many primal conditions n ar(s)yr > b(s),
s E S,
r=1
are met.
Before we discuss the case when a mass-point is situated on the boundary of
S
we shall make some important observations about the
determination of the integer
q
by means of discretization.
Nonlinear Systems nerived from Optimality Conditions
16.
Let (PR) be a discretization of
P
137
be
(compare §13) and let {rrt,xt}
an optimal basic solution of the corresponding dual problem (DR); oR = {sit,...IsnR} c SR, xg = {x1g,...,xnR}T E Rn.
Denote by
qt
the number of positive components of
the basic solution is termed degenerate if to expect that if the grid q
Si
qt < n.
We recall that
xt.
It seems reasonable
is sufficiently fine then
qR = q
where
is the number of masspoints of a solution of (D).
However, several numerical examples have been solved where the contrary is true.
In almost all problems, one finds that
qt=n for all discretizations (P t) - (Dt), irrespective of the fineness of the grid.
Thus the discretized problems are generally not degenerate.
This
observation agrees with the theoretical result that the degenerate linear programs are, in a certain sense, more rare than the regular ones. On the other hand, the case q < n
is fairly common in optimization problem with infinitely many constraints. Nevertheless, the integer
q
can be determined from the solutions of dis-
cretized problems, as is illustrated in the following example. Example.
(12)
We want to solve the (primal) problem
8
Minimize
I
yr/r
r=1
subject to 8 r=l
syr r-1 > 1/(2-s), -
s E [0,1] = S.
We discretize and select the following subsets SR = {O,hg,2ht,...,l}
with
SR
of
S
(t > 2):
ht = 1/(t - 1).
The corresponding discretized Problems (Pt) - (DR) were solved on a computer by means of the simplex algorithm for
t = 21, 41, 81.
qt = 8
obtained in all three cases and the following basic sets emerged:
was
138
VII.
k = 21
R = 41
021
k = 81
041
0.0000
0.0000
1500 0.2000
0 1500
0
0.1750
0.1750 }
0 . 5000
0.5000 0.5250
0.5000 0.5125 }
0.5500
.
.
1
1625
8000 0.8500
0.
8250 0.8500
0 8250
1.0000
1.0000
1.0000
0.
Group
081
0.0000 0.
A GENERAL THREE-PHASE ALGORITHM
2
3
.
4
0.8375 }
5
We note that the eight numbers in each column may be divided into five The elements of Groups number 2,3,4 lie closely together and the
groups.
distances between the two elements in these groups get smaller with increasing
I.
It is reasonable to assume that q = 5
holds for the "continuous" problem.
This conjecture can be shown to be
true by means of the theory of Chapter VIII.
Now we shall demonstrate
how to derive a nonlinear system for the primal and dual problems of this particular example. q = 5
mass-points
There are 8 primal unknowns, namely sl,...Is5
y1,.... y8, and
with the corresponding masses
The results of the table above indicate
sl = 0
x1,...,x5.
s5 = 1, i.e. we
and
assume that these points lie on the boundary of the interval may also be concluded from the theory of Chapter VIII.
(0,1].
This
There remain the
8 dual unknowns s2, s3, s4, x1, x2, x3, x4, x5.
Hence there are in total 8 unknown numbers to determine. equations result from (3) and
n = 8
Now
equations from (4).
q = 5
The missing 3
equations are obtained from (5) and the observation that the "error function"
f
assumes its minimum value at
s2, s3, s4.
Hence its derivative
must vanish at these points, giving the 3 equations sought. the 16 equations (observe that
s1 = 0
and
s5 = 1) :
We give now
16.
Nonlinear Systems Derived from Optimality Conditions
- 1/2
yl
139
= 0
yl + s2y2 + s2y3 + ... + s2y8 - l/(2-s2) = 0 7
2
yl + s3y2 + s3y3 + ... + s3y8 - 1/(2-s3) = 0 yl + s4y2 + s4y3 + ... + s4y8 - 1/(2-s4) = 0
yl +
y3 + ... +
Y2 +
y8 - 1
xl+x2+x3+x4+x5 - 1
=0
0
s2x2 + s3x3 + s4x4 + x5 - 1/2 = 0
- 1/3 =
s2x 2 + s3x 3 + s2x 4 + x 5 4
0
7
s2x2 + s37 x3 + s4x4 + x5 - 1/8 = 0
y2 + 2s2y3 + ... + 7s6 y8
-
1/(2-s2)2 = 0
y2 + 2s3y3 + ... + 7s3 y8
-
1/(2-s3)2 = 0
y2 + 2s4y3 + ... + 7s6y8
-
1/(2-s4)2 = 0.
We recommend that the reader verify each of these equations. equation gives immediately
yl = 1/2
(The first
and this value can be entered into
the remaining equations decreasing the size of the system somewhat.)
We
write the above system in the form gl(Y,s,x) = 0 (13) g16(y,s,x)
= 0
where we use the notation (Y,s,x) = (Y1,---,Y8's2,s3,s4' x1,...,x5).
We next show how to construct an approximate solution from the solutions (y,s,x)
(yi;aR,xi)
of (Pt) - (DR).
(y,s,x)
to (13)
The approximation
may then be improved by an iterative scheme, e.g. the Newton-
Raphson method.
We put
yiI = yiR, i = 1,2,...,8
(here, Yt = (Ylt'-.-'y8t)); i.
xl£ = x1R,; xit = x2i-2,t + x2i-i,k.'
= 2,3,4; xst = x8t
sit = (x2i-2,Ls2i-2,L + x2i-1,k * s2i-l,k)/xit'
i = 2,3,4.
A GENERAL THREE-PHASE ALGORITHM
VII.
140
is the center of gravity of the mass-points belonging to Group
Thus
sik number i.
The 'goodness' of this approximation is expressed by the number pt =
Thus
p.
max Igi(yX,sR,xR)1i=1,...,16 is the maximum norm of the residual vector of the system (13).
We get the following table:
Group
£=21
i=41
xil
sil
xil
sil
0.048495
0
1
X=81
xil
sil
0.049853
0
0.049940
0
2
0.1 74643
0.277444
0.172802
0.272652
0.172742
0.272418
3
0.5 00004
0.348128
0.500003
0.354998
0.500004
0.355290
4
0.8 25362
0.277439
0.827202
0.272648
0.827262
0.272413
5
0.048494
1
Residual norm
1.40-10 -
pl
0.049852
1
3
1,16.10
0.049938
1
4
5.22-10 -
5
The exact mass-points and masses are given below: sl = 0
xI = 1/20
s2 = 0.5(l-/77) = 0.172673
x2 = 49/180 = 0.272222
s3 = 0.5
x3 = 16/45
s4 = 0.5(1+/) = 0.827327
x4 = 49/180 = 0.272222
s5 = 1
x5 = 1/20
= 0.05
= 0.355556
= 0.05.
It is generally true that very good approximate solutions to the nonlinear system of equations can be constructed by means of discretization, linear programming and clustering of mass-points by determining centers of gravity as described above.
Sometimes it is not even necessary to improve upon
this approximate solution by means of iterative methods. (14)
Use the same method as in the preceding example to
Exercise.
solve the problem 10
Minimize
E
r=l
10
y /r r
subject to
E
r=1
> -1/(1+s2),
sr ly
s E [0,1) = S.
r
Use for the discretization an equidistant grid with
N = 41
the corresponding linear program with the simplex algorithm.
points.
Solve
The approxi-
17.
A General Computational Scheme
141
mate solution found in this way may be compared with the true result which is as follows:
q = 6; the mass-points and masses are
S.
X.
0.037989
0.096417
0.190708
0.202986
0.427197
0.259604
0.686634
0.247041
0.897894
0.165077
1.000000
0.028876
i
The primal solution is y = (-1.000000, -5.837.10-5, 1001582, -0.020238, -0.856457, -0.612559, 2.622486, -2.606125, 1.188151, In the next section we shall describe a general computational scheme.
A GENERAL COMPUTATIONAL SCHEME
517.
Retain the general assumptions of the beginning of this chapter, inal,a2,...,an, and
cluding the requirement that
derivatives of the first and second order.
b
have continuous partial
We propose
A general computational scheme consisting of the three phases
(1)
1), ii) and iii) below.
The dual pair (P) - (D) is discretized; i.e. the infinite index
i)
set
is replaced by a finite subset.
S
The resulting dual pair of linear
programs is solved by means of the simplex method. ii)
The structure of the nonlinear system (3), (4), (5) of §16 is
determined from the calculated optimal solutions of the discretized problems.
A tolerance
a
is selected.
If among the mass-points of the solu-
tion of the discretized problem there are two mass-points with masses
xi
and
x.
such that the distance between
si si
and and
sj
is
s i
less than
c, then they are replaced by a mass-point
s
carrying mass
z
where x = xi + xi,
x = (xisi + x.s.)/i.
This procedure is repeated as long as there still are two mass-points lying closer to each other than
E.
A nonlinear system is now derived by combining (3), (4), (5) of 516.
142
VII.
iii)
A GENERAL THREE-PHASE ALGORITHM
The nonlinear system obtained in Phase ii) is solved by some If the calculated
numerical procedure, e.g. the Newton-Raphson method.
solution satisfies the feasibility conditions of (P) and (D), it is acOtherwise one reenters Phase i) with a refined grid.
cepted as optimal. Remark.
The scheme described above has been successfully applied to It is recommended to use a numerically stable
many practical problems.
realization of the simplex algorithm in Phase i), e.g. the version described in §14, which uses stable updating of the basic matrix. In Phase ii) we construct a nonlinear system by combining (3), (4),
Thus if
and (5) of §16.
is an interior point of
s i
If
tions from (5) of §16 as explained in (9) of §16.
point of
S, we get s i
k
equa-
is a boundary
S, one may proceed as explained in the Example (3) below if
S
A more general description can be formulated by
has a simple structure.
means of the so-called Kuhn-Tucker conditions if
S
is defined through a
set of inequalities: S = {s E Rk: hi(s) < 0,
j = 1,...,p}.
The reader is referred to Collatz and Wetterling (1971) for a discussion of this topic. (2)
If the tolerance
Remark.
a
is selected too large in Phase ii)
or the grid of Phase i) is not sufficiently fine then we may enter Phase iii) with the wrong nonlinear system and the Newton-Raphson iterations diverge or converge to a "solution" which does not define a feasible vector
y
In both cases one reenters Phase i) with a finer grid
of (P).
and reduces the tolerance
in Phase ii).
a
It is possible to show that
Phase iii) succeeds provided that the grid in Phase i) is sufficiently fine, a
in Phase ii) is small enough, and certain general regularity A general three-phase scheme for semi-infinite pro-
conditions are met.
grams of the type given above was first published in Gustafson (1970). (3)
Example.
Let
in the plane (k = 2).
S
be the set
[0,1] x [0,1], i.e. the unit square
We consider the case
n = 8.
Assume that after
carrying out Phase i) (solution of the discretized version of the dual pair (P) - (D)) we get a mass-point distribution as depicted in Fig. 17.1.
8 mass-points appear in 4 clusters containing 1, 2, 2, and 3 mass-points respectively.
Assume now that
a
each cluster with one mass-point. the 4 mass-points
is chosen such that Phase ii) replaces Thus
q = 4.
Hence we must determine
sl,...,s4' the corresponding masses
the 8 primal variables
y1,...,y8.
Since each
s i
x1,...,x4, and
is a two-dimensional
17.
A General Computational Scheme
143
S
Fig. 17.1
4.2+4+8 = 20
vector we have in total
unknowns.
Due to the character of
the solution of the discretized problem we assume that upper right corner, s2 interior of and
s1
on the left-hand boundary and
Thus we should have
S.
are the first and second components of
of remaining unknowns is 17. 8
22
r=1 as s
s1 = s1 = 1
a, (S )yr -
8
a (s.)y r=1 asi r 7 r
-
a2
as
and s1.
s1
is in the
s3, s4 s2 = 0
in the
where
s1
Therefore the total
(5) of §16 now gives the 5 equations
b(s2) = 0
8 j = 0, asi b(s)
i = 1,2
and
j = 3,4,
with (3) and (4) of §16 giving the remaining 4 and 8 equations.
Hence
we have constructed a nonlinear system of equations with the same number of equations as unknowns. (4)
Exercise.
equations when
S
Describe how to construct the nonlinear system of is the unit circle in the plane and some mass-points
are situated on the boundary. (5)
Example.
(13) of §13.
table below:
We consider the first example which was discussed in
Upon discretization we get the mass-points indicated in the
144
VII.
Coord. of s.
i
A GENERAL THREE-PHASE ALGORITHM
X.
1
1
1.00
0.0 0
0.0667
2
0.50
0.2 5
0.2667
3
0.25
0.5 0
0.2667
4
0.50
0.5 0
0.2000
5
0.00
1.0 0
0.0667
6
1.00
1.0 0
0.1333
2 s
55
1
s6
s3
X
s4
s2 1
®
>
Si
s
1
Fig. 17.2 (0 Mass-points of the discr. problem) (x Mass-points of the cont. problem)
Here the 6 mass-points appear in 4 clusters, one of which has 3 members, the other 3 having one mass-point each.
Phase ii) gives
q = 4
and the
following initial approximation for the solution to the dual problem (D): i
Coord. of mass-point si
1
1
0
0.0667
2
0.4091
0.4091
0.7334
3
0
1
0.0667
4
1
1
0.1333
Mass xi
Derive the corresponding nonlinear system of equations (it has
6+4+2
unknowns) and verify that it is satisfied by the optimal solution of the primal problem (given in (13) of §13) together with the following masses and mass-points:
17.
A General Computational Scheme
145
i
Coord. of mass-point i
1
1
0
0.083333
2
0.400000
0.400000
0.694444
3
0
1
0.083333
4
1
1
0.138888
Mass X. i
We shall also discuss the class of problems which were treated in §6, namely calculation of uniform approximations.
The computational scheme
which is described in this chapter has been very efficient for the solution of this class of problems, in particular by approximation on multipledimensional sets. Let
be a compact subset of
T
ments and let
v1,...,vn, and
f
(k > 1) with at least
Rk
be real-valued functions on
n+l T.
ele-
The
approximation problem reads n
I
(PA)
Minimize
yn+l
subject to
yrvr(t) - f(t)j < yn+l,
I
t E T.
r=1
The corresponding dual becomes q (DA)
Maximize
f(ti)xi
subject to
i=1 q
vr(ti)xi = 0,
r = 1,..-,n,
i=1 qq
< 1, i=1
{tl,...,tq} c T. (6)
Let the functions
Theorem.
v1,...' n
be linearly independent.
Then both the problems (PA) and (DA) are solvable and have a joint optiThere are always optimal solutions for (DA) such that
mal value.
xl # 0,...,xq # 0
with
1 < q
The proof is obtained by combining (15) of §6 with (12) of
§10 and (12) of §11.
The next theorem corresponds to (1) of §16 and its converse (see (8)
of §16). (7)
Theorem.
independent.
Then
Let the
n+l
functions
y1, ...,yn' yn+l
with
vl,...,vn, f Yn+i > 0
and
be linearly {tl,...,tq,
146
A GENERAL THREE-PHASE ALGORITHM
VII.
xi,...xq)
with
ti E T, xi # 0, i = 1,...,q (1 < q < n+l)
are optimal
for (PA) and (DA) if and only if the following relations hold: n yrvr(ti) = Yn+1 sgn xi,
f(ti) -
i = 1,...,q,
(8)
r = 1,...,n,
(9)
r=1 q
vr(ti)xi = 0, i=1
q lxii = 1.
I
(10)
i=1
n
The "error function"
I
yrvr - f
r=1
assumes its maximum or minimum value on
T
at each point
ti,...It
4.
Consider the linear optimization problem equivalent to (PA).
Proof:
Then the theorem is a direct consequence of (1) and (8) of §16.
It is
even easy to show that one direction of the statement follows from Lemma We verify here how (8) is derived from the optimality of
(29) of §6.
(t1,...ItqXll...,xq).
and
y1,. ..,yn
If
xi > 0
proof of Lemma (15) of §6, xi = xi, t+ = ti.
we write, as in the
The corresponding complemen-
tary slackness condition is
n
yv(ti) + yn+ r r
xi(
\\ i - f(ti)) = 0.
r=1
If
we put
xi < 0
xi = -xi, ti = ti
and get the complementary slackness
condition
n
\\
xi(- E yrvr(ti) + yn+i + f(ti)J = 0. r=1
Thus if
xi + 0
one of the following two equations is satisfied:
n
f(ti) -
E
Yrvr(ti) = Yn+l,
if
xi > 0;
yrvr(ti) _ -yn+l'
if
xi < 0.
r=1 n E
r=1
This is equivalent to (8).
We point out that the inequality
is a consequence of the linear independence of
Y.+1 > 0
v1,...,vn,f.
The numerical treatment of the dual pair (PA) - (DA) is analogous to that of (P)
-
(D).
Thus a three-phase computational scheme is used.
The
problem is discretized and an initial approximate solution is constructed,
17.
147
A General Computational Scheme
giving
In particular, one observes the sign of an
q.
the solution of the discretized problem. the equations (8) and (11). xl > 0
and
xi
resulting from
This is taken into account in
As an example consider the case
q = 2,
Then (8) and (10) give the equations
x2 < 0. n
f(tl) -
Yrvr(t1) = Yn+l'
E
r=1
n Yrvr(t2) = -Yn+l'
f(t2) r=I
xl - x2 = 1.
The conditions (11) give rise to equations in the same way as (5) of §16. Thus one determines from the results of the discretized problem whether a mass-point
in this case an extremal point of the error function) lies
t1
in the interior or at the boundary of
T.
Accordingly, one appends condi-
tions that partial derivatives must vanish at (12)
Example.
ti.
The following problem is treated in Andreasson and
The function
Watson (1976).
f(s,t) = exp(-s2 - t2)
is to be approximated in the uniform norm on the square
0 < s,t < 1
by
a linear combination of the functions vI(s,t) = 1,
v2(s,t) = s,
v3(s,t) = t,
v4(s,t) = 2s2-1
v5(s,t) = at, v6(s,t) = 2t2-1.
At first the problem is discretized; i.e. it is approximated by the task 6
min
Determine
max If(si,tk) I yrvr(si,tk)I r=1 i,k=1,...,5
(13)
where
4 i-1
t
k-1
k 4 =
k = 1,...,5.
,
The problem (13) is then reformulated as a linear program, as described in §6.
Then we get the task
Minimize
y7
subject to the linear constraints
148
A GENERAL THREE-PHASE ALGORITHM
VII.
6
vr(si,tk)yr + Y7 > f(si,tk)
(ilk = 1,...,5),
(14)
(ilk = 1,...,5).
(15)
r=1 6
- E vr(silt k)Yr + Y7 > -f(si'tk) r=1
This is a linear program with 7 variables and 50 constraints.
It was *
solved with the simplex algorithm which is described in §12-14
.
The
following solution emerged: 1.0358267
y1 =
Y2 = -0.38764207
y3 = -0.95174831 Y4 = -0.12398722
(16)
0.43169480
y5 = Y6 =
0.13390288
y7 =
0.025910991.
The optimal solution is displayed in Fig. 17.3.
Those vectors which appear
in the optimal basis matrix are marked with a ® or a 0 indicating the coordinates
si,tk
of the corresponding mass-point.
basis vector gives equality in (14).
Here, ® means that the
Thus the error function
n f
yrvr
assumes a maximum value there with respect to the point set 1,...,5}.
{(si,tk)/i,k =
A 9 sign means that the basis vector gives equality in (15) and
T 4
5
8
0
2
3
e
C0
7
1
1
e
1
j
6
Fig. 17.3
We thank Mr. Gerd Schuhfuss for solving this program, using a deskcalculator, an "HP 9825A".
17.
A General Computational Scheme
149
hence the error function assumes a minimum value with respect to the same The masses
set.
x.
xl =
0.28571429
x2 =
0.14285714
of the points indicated in Fig. 17.3 are as follows:
x3 = -0.19047619 x4 =
(17)
0.071428571
x5 = -0.095238095
x6 = -0.19047619 x7 = -0.023809524.
The number
q
of mass-points which should result after Phase ii) of the
three-phase algorithm would in this case depend critically on the choice of the tolerance
e.
In particular, should the three points in the lower
left part of the figure be combined into one or more than one points? To settle this question one could of course repeat Phase i); i.e. linear programming with a finer grid.
We take an alternative route.
We consider
again the error function: n`
g = f -
E
Yrvr
r=1
with
yl,...,y6
given by (16).
Thus
g
satisfies
We now determine the extremal points of
grid.
g
1g(s)1 < y7
on the
on the unit square
by means of a Newton-Raphson scheme and using appropriate grid-points as starting approximations.
Six local extrema were found.
are listed in the table below and marked in Fig. 17.4.
Their positions There, 9 means a
local minimum and ® a local maximum.
Coordinates of local extrema 0.288
0.000 4
1
0.000
1.000
1.000 2
0.725 5
0.590
(18)
1.000
0.000
0.867 6
3
0.118
0.000
We now proceed to the construction of the nonlinear system based on the assumption that the error curve corresponding to the solution of the continuous problem has its extremal points distributed in the same way as the function
g.
(If this assumption should turn out to be wrong, then Phase
150
VII.
A GENERAL THREE-PHASE ALGORITHM
Fig. 17.4
iii) will fail and we must return to Phase i) which will be repeated with a refined grid.)
Thus we put
(silt 1).... ,(s6,t6)
the square.
and assume that all extremal points
q = 6
(numbering as in Fig. 17.4) lie on the boundary of
In particular we have
s4 = 0, t4 = 1.
The following 18 un-
knowns remain to be determined: yl,...,y6,y7, x1,...,x6, s1,t2,t3,s5,s6. From (8),
(9), and (10) we get 13 equations.
(19)
We note that the sign of
the masses should be chosen as follows (see Fig. 17.4):
sgn xl = sgn x2 = sgn x4 = 1 sgn x3 = sgn x5 = sgn x6 = -1.
Hence the 6 equations from (18) are completely determined in a simple form and (10) now becomes
xl+x2 -x3+x4 -x5 -
x6 = 1.
The "missing" 5 equations are now generated from (11) since certain partial derivatives must vanish, e.g. 6
f
a
s 8
at
jf l
yr vrI(s1,0) = 0 r=1 6
(
jf
yrvr}(1,t2) =
0.
r
Thus we have got 18 equations for the determination of the 18 unknowns of (19).
An approximate solution was found as follows:
s1,t2,t3,s5,s6
17.
A General Computational Scheme
were taken from (18) x3
yl,...,y7
151
from (16) and
xl,x2,x4,x5,x6
from (17),
was calculated by combining the two masses numbered 3 and 7 of the
solution of the discretized problem. giving
Thus these two masses were added,
x3 = -0.21428571.
After four iterations the Newton-Raphson method delivered the resuits
0.98576860
yl =
y4 = -0.14461987
y2 = -0.34796776
y5 =
0.42457304
y3 = -0.90271418
y6 =
0.11293036
and the maximal deviation was y7 = 0.027274796.
This agrees with the results reported by Andreasson and Watson.
The solu-
tion of the dual problem is given in the table below: Mass-point numbers
Coordinates
0 . 27210827 1
0.25885317
0.00000000 00000000 0.62068986 1
2
.
0.15098041
0 00000000 .
3
-0.20873218
0.21767815 0 0 .
4
0.090166417
1.0
5
0.67690452 1.00000000
-0.13844199
83562113 0.00000000
-0.15282583
0
6
Remark.
(20)
Masses xi
s.,t.
.
The procedure described above is applicable to many
variants of the uniform approximation problem, e.g. when
y
must satisfy
finitely or infinitely many linear constraints besides those specified in (PA).
One example of such problems is one-sided approximation in the
uniform norm.
Further examples are to be found in Chapter IX.
We note that in many approximation problems q = n+l holds in (8),
case when
(9), and (10).
vl,...,vn
be treated in Chapter VIII.) tl,. .. ,tn+l
(An important class of such problems is the
form a Chebyshev system on If
q = n+l, then
S.
These problems will
yl,...,yn+l
and
may be determined from (8) and (11) without also calculating
152
VII.
xi,...,xn+i
A GENERAL THREE-PHASE ALGORITHM
from (9) and (10).
The mathematical properties of the system arising from (8)
-
(11)
have been investigated in Hettich (1976), where nonlinear approximation is treated as well.
Chapter VIII
Approximation Problems by Chebyshev Systems
This chapter will be devoted to the study of the problem pairs (P) (D) and (PA) - (DA) in a special but important case, namely when the moment generating functions
al,...,an
form a so-called Chebyshev system.
The most well-known instance of such a system is
ar(s) =
r = 1,...,n, on a closed and bounded real interval.
sr-1.
In all the linear
optimization problems to be treated in this chapter, the structure of the nonlinear system can be determined from the outset, which simplifies the numerical treatment considerably in comparison to a direct application of the three-phase algorithm.
In the first section we shall present some general properties of Chebyshev systems.
The reader will recognize many results from the theory
of polynomials in one variable.
The next section will be devoted to Prob-
lem (P) and the connection between one-sided approximation and certain generalized quadrature rules of the Gaussian type.
In the last section
we shall treat numerical calculation of the best approximations in the uniform norm.
§18.
GENERAL PROPERTIES OF CHEBYSHEV SYSTEMS (1)
Let the functions
bounded real interval tem over
[a,s].
ul,...,u ul,...,un
[a,s], if the determinant
153
be continuous on the closed and will be called a chebyshev sys-
154
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
u1(t1)
ul(tn)
....
U(tl,,tn) =
(2)
....
un(tI)
un(tn)
satisfies the relation U(tl,...,tn) > 0 Remark.
if
a < tl < t2 < ... < to < s.
The monominals
(3)
ur(t) = tr-l, r = 1,...,n
system over any real interval.
form a Chebyshev
From a numerical point of
See (3) of 57.
view it is often more advantageous to work with orthogonal polynomials instead of monomials.
We note that if
then we can determine constants
r-1
is a polynomial of degree
ar(t)
d r,
d
r
= +1
ur = drar, r = 1,...,n, is a Chebyshev system. ur = Tr-1
or
d
r
= -1, such that
The particular case
(see (20) of 57) occurs often in computational practice.
also give the following example: 0 < al < a2 < ... < an.
Put
Let
be real numbers such that
X.
ur(t) = eArt.
shev system over any real interval.
We
Then
ul,...,un
is a Cheby-
The reader is referred to Karlin and
Studden (1966) for further examples.
We will now show that many interesting results may be derived from the definiing relation (3). (4)
Theorem.
Let
ul,...,un
closed and bounded interval
ti, i = 1,...,n
let
y E Rn
[a,p].
form a Chebyshev system over the Let
w E Rn
be distinct points.
be a fixed vector and
Then there is a unique vector
satisfying
n
rr i
y u (t
r=1
Proof:
of unknowns.
)
= wi,
i = 1,...,n.
(5)
(5) is a linear system with
n
equations and the same number
We may assume that the equations are reordered such that
tl < t2 < ...< tn.
By (3) we conclude that the system (5) has a nonzero
determinant and hence a unique solution
y.
We have immediately (6)
Q
Corollary.
Let
ul,...,un
be as in (4) and define the function
by
n
nC
r=1
yrur
(7)
18.
General Properties of Chebyshev Sys=ers
where Q
yl,...,yn
are real numbers.
Then
155
has less than
Q
n
zeros if
is not identically zero.
Assume that
Proof:
Putting
tl,...,tn.
Q
is not identically zero but vanishes at in (5) we get the unique solu-
wi = 0, i = 1,...,n
yr = 0, r = 1,...,n, establishing the contradiction sought.
tion
If we put
ur(t) = tr-1, r = 1,...,n, then
of degree less than
Q
becomes a polynomial
Thus (6) is a generalization of the familiar
n.
statement that a polynomial of degree less
than
n
also has less than
n
In the same way, (4) generalizes the theorem that there is a unique
zeros.
polynomial of degree less than
n
which interpolates
n
given points.
In order to discuss the problems (P) - (D) we also need to intro-
(8)
duce zeros of multiplicity 2.
We shall see that some well-known results
on polynomials can easily be extended to linear combinations of functions which form a Chebyshev system.
For this purpose we introduce the deter-
minants jl .... in (9)
t1 .... tn where the symbols are defined by rules i), ii) and iii) below and the value is evaluated according to rules iv) and v).
a
i)
are integers and we have always
ii)
iii)
= 1; ji = 2
l
If
iv)
ji = 1
is possible only if
jl = j2 = ... = in = 1
jl'...,jn
or
ji = 2;
ji-1 = 1.
then
_ U(tl,...,tn)
tltn If there is a
v)
ji = 2
then we proceed as follows.
First we as-
sign to the determinant (9) the value given in Rule iv) above. Next we change all columns ments
u
r
(10)
[t
t
i-11 i
]
for
with
ji = 2
r = 1,...,n.
Example.
2
(t
i
t
j
t 2
3
so that the ele-
are replaced by the divided difference
ur(ti)
=
ul(tl)
ul(t2)
ul[t2,t3)
u2(tl)
u2(t2) u3(t2)
u2[t2,t3]
u3(tl)
u3[t2,t3]
156
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
We note that the determinants (2) and (9) generally have different numerical values (for the same points
t1,...,tn) if there is a
ji = 2.
But
(2) is positive if and only if (9) is positive. (9) may also be defined if two arguments coincide. ur
If the functions
are differentiable (which we assume from now on) then we define ur[ti,t.] = lim ur[t,ti] = u=(ti). tot.
1
We next introduce the determinant (11)
whose value is given by the rules a) - d) below.
a)
a < t1 < t2 < ... < to <0;
b)
if
ti = tthen ti = a
and
or
if all
d)
if two arguments
ti > ti-1
ti+2'
are distinct, then
c)
ti
ti+l<
or
ti+1 = B
U'(tl,...,tn) = U(tl,...,tn);
coincide then we put
ti
i1,...,jn
U'(t1,...,tn) = U tl ..* tn where 2
if
ti = ti-1
1
if
ti > ti-1'
li
(12)
Example.
U (tl,t2,t2) = U
(13)
1
1
2
tl
t2
t2
u1,...,un
The functions
system of order two over entiable on
[a,S]
[a,8]
if
u1(t1 )
u1(t2)
u2(t1)
u2(t2) u'(t2)
u3(t1)
u3(t2)
ul(t2)
u3(t2)
are said to form an extended Chebyshev u1,...,un
and all determinants
are continuously differ-
U'(t1,...,t
> 0
for
a
A function
f
which is continuously differentiable on
[a,g]
is said to have a zero of multiplicity 2 (also called a double zero) at t E [a,s], if
f(t) = 0
and
fl(t) = 0.
We can now extend the Corollary
(6) and state: (15)
Theorem.
order two over
Let
[a,s].
u1,...,un
form an extended Chebyshev system of
Let the linear combination
Q be given by (7).
157
General Properties of Chebyshev Sys`e-.s
18.
Assume also that
Q
is not identically zero.
Then
has less than
Q
n
[a,R], counted with multiplicity.
zeros in
We note that if
The proof is analogous to that of (6).
Proof:
has a double zero at
t, then the coefficient vector
y
of
Q
Q
must
satisfy the two equations
Q' (t) = 0.
QCt) = 0,
Thus if we assume that
Q
has
zeros counted with multiplicity we get
n
a linear system of equations whose right-hand side is zero and whose coefficient matrix has a determinant of the type (11).
Hence the conclusion
follows.
The interpolation statement (5) may be extended to the con-
Remark.
fluent case, i.e. when pairs of the points
ti
appearing in (15) are
allowed to coincide.
One could also introduce extended Chebyshev systems
of order higher than
2
and establish the corresponding results on inter-
polation and maximum number of zeros. Some results which will be needed in the sequel are given in the exercises below.
ul,...,un+1
Let
Exercise.
(16)
as well as
Chebyshev systems of the second order over
[a,R].
u1,...,un
be extended
Show that (17)
U'(tl,...,tn,t) = cn(un+1(t) - Q(t)) where
ul,...,un
un+1
is a linear combination of
has
n
Exercise (16).
Hint:
yrur(t)
r=1 n+l
ur(t) = tr-l
and require
ul,...,un+1
u(n) (t) > 01 n+l
satisfy the assumptions of
Use Rolle's theorem to show that no function of
n I
by the last column.
continuous derivatives and also satisfies
the form
un+1(t) -
U'(t1,...,tn,t)
Show that if we take
Exercise.
t E [a, s], then the functions
can have
Q
Q'(ti) = un+1(ti)
Expand the determinant (18)
that
and
i = 1,...,n;
ti = ti+l, then
if
ii)
t
such that
Q(ti) = un+I(ti),
i)
Hint:
is independent of
cn > 0
zeros in
[a,8].
158
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Exercise.
(19)
R(t) = un+1 -
Use the notation and assumptions of (16).
Let
Q(t).
Show the following results:
R(t) > 0, t E [a,$], in the two cases i) and ii) below: has
R
i)
And
double zeros in
n/2
R(a) = 0
ii)
and
R
even);
(n
double zeros in
(a,B)
odd).
(n
R(t) < 0, t E [a,B], in the two cases i) and ii) below: R(a) = R(8) = 0, R
i)
(a,B)
(n-l)/2
has
has
double zeros in
(n/2)-1
(a,B)
(n
even);
Hint:
and
R(B) = 0
ii)
R
has
double zeros in
(n-l)/2
(a,B)
odd).
(n
does not change sign at a double zero.
R
Exercise.
(20)
order two over
Let
[a,B].
be an extended Chebyshev system of
ul,...,un+I
Show that there is a linear combination
n C
n+1
(21)
rIl Yrur
which is strictly positive on (a,B)
in
t E [a,B], or
Hint:
-Q(t) > 0, t E (a,0].
Put
Q
If
and
a
B
has only double zeros then either
2Q = Q1 + Q.
where
are nonnegative linear combinations of the type of (21).
Q2
n
(a,B].
and possibly simple zeros at
odd and
§19.
n
Q(t) > 0, Q1
even should be discussed separately.
ONE-SIDED APPROXIMATION AND GENERALIZED QUADRATURE RULES OF THE GAUSSIAN TYPE In this section we shall study Problem (P) - (D), which
§3 and §4.
Here we study a special but important case.
is the closed and bounded interval Chebyshev system over ferentiable on
[a,B].
Instead of
S.
times we shall assume that the
[a,8]
and
instead of
b.
we defined in
The index set
a1....,an
ar
we shall write
n+l
functions S.
ur, as in §18.
ul,...,un, b
Then we shall write
Besides the dual pair (P) - (D) we shall study the two
n
n I
r=l
and
Some-
also form
problems
Maximize
cryr
S
form an extended
will be assumed to be continuously dif-
b
an extended Chebyshev set of order two over
(P2)
and
The cases
subject to
I
r=l
ur(s)yr < b(s),
s E S,
un+1
19.
One-Sided Approximation
159
q Minimize
(D2)
q subject to
xib(si)
xiu(si) = c,
i=1
i=1
Exercise.
(1)
i = 1,...,q.
> 0,
x Show that (P2)
- (D2) are a dual pair of linear
optimization problems. Let
Lemma.
(2)
ul,...,un
closed and bounded interval
form an extended Chebyshev set over the and let
S
b
be continuous there.
Then
(P) and (P2) meet the Slater condition.
Using the result of (20) of §18 we establish that there is a
Proof:
vector
such that
z E Rn n
Q(s) =
zrur(s) > 0,
I
s E S.
r=l
Put
d = min Q(s) sES
Next set A = d-1(1 + max b(s)) sES
and define the vector
n
r=1
yrur(s) = A
y = Az.
Then we get
n
r=1
zrur(s) > Ad > max b(s).
-
sES
Thus (P) meets the Slater condition.
The proof of the analogous statement
for (P2) is carried out in a similar way. (3)
Hence we may use (12) of §10 to conclude that if (D) is feasible
and the assumptions of Lemma (2) are met, then strong duality holds for (P)
- (D) and (P2) - (D2) respectively.
in the interior of the moment cone solution.
If we also require that
c
is
M , we can show that (P) has a unique n
(For a definition of this cone see §8.)
For this purpose we
shall give a simple characterization of interior and boundary points of the moment cone.
Our argument parallels that of Karlin-Studden (1966),
pp. 42-43. (4)
Let
T = (t1,...,tR}
we shall denote by the index of
be a subset of the interval T, ind (T), the integer
t
{sign(ti-a) + sign(B-ti)}.
ind (T) i=1
[a,B].
Then
160
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Thus
ind (T)
R
= un+1 - Q
of (19) of §18.
Z = {zl,...,zq}
Let
denote its zeros.
in such a manner that ind (Z) = n
Q
ect
Consider the function
Example.
(5)
ind (Z) =n and un+1(s) = es.
They are denoted
will be double.
has at most
We want to construct
and indicate the zeros of
n = 4
and
Then we can sel-
R(s) < 0, s E (m,8], or
S = [0,1], ur(s) = sr 1, r = 1,...,n,
By (18) of §18, R
counted with multiplicity.
n = 3
and
R(s) > 0, s E [a,s].
We discuss the special instance and
21 - 2, 2t - 1, 2t.
must assume one of the three values
n R
zeros in
The zeros in
R.
zi, i = 1,...
.
Z = {z1,z2}
ind(Z) = 4
R(s) > 0,
s E [0,1];
n = 4
Z = {0,z3,1}
ind(Z) = 4
R(s) < 0,
s E [0,1];
n = 3
Z = {0,z4}
ind(Z) = 3
R(s) > 0,
s E [0,1];
n = 3
Z = {z5,1}
ind(Z) = 3
R(s) < 0,
s E [0,1].
Let
be a given vector.
c E Mn
(0,1)
Thus
n = 4
(6)
[0,1],
explicitly for
If
L
cr =
xiur(s.),
r = 1,...,n,
xi > 0,
i = 1,...,q,
(7)
i=1
then we say that
has a representation involving the points
c
We define now the index of c as
ind(T)
where
T
SI,...,sq.
is the subset with
smallest index satisfying (7). (8)
1/3,1/4)
S = (0,1], ur(s) = sr-1, r
Example.
c = (1,1/2,
has the two representations
c = 6 u(0) +
u(1/2) + 3
and
u(1)
(9)
6
+1 c =2 1 u(1 1 + 12 u(1 2
(10)
AT
The index of the subsets appearing in (9) and (10) is 4. (11)
Hint: or
Show that for
Exercise.
c
from (8) we do have
ind (c) = 4.
all subsets with index 3 must be of one of the two forms
(t,l), where (12)
Lemma.
order two over
{0,t}
t E (0,1).
S.
Let
ul,...,un
A point
c E M
be an extended Chebyshev system of
is a boundary point of Mn
if and
19.
one-Sided Approximation
only if
ind(c) < n.
161
Every boundary point admits a unique representation
(7).
Proof:
c0 E Bd (Mn).
Let
Then there is a supporting hyperplane to
Mn at, atcpassing through
0
may find real constants
such that
Br
since
is a
Mn
onvex cone.
Thus we
n
r=1
Br>0
satisfying 0
nC
nC
Brcr > 0,
Brcr = 0 ,
r=1
(13)
c E Mn.
r 11
Now put
nn
Q =
E
r1
Brur
By (13) we get
Q(t) > 0, Since
t E [a, B). must have a representation (7).
c0 E Mn, c0
cror = r=1
Thus
xi
i=1
xiQ(ti) = 0.
Brur(si) _
I
r=1
i=1
Q(ti) = 0, i = 1....,
.
Theorem (15) of §18 can be reformulated to
the statement that the set of zeros of ind(c0) < n.
The numbers
k < n, since if t . . . . . . . . tn
k < n
Q
has an index
< n.
Hence
in (7) are uniquely determined as long as
xi
we add
tR+1", ..tn
are selected such that
to the sum in (7), where
tl,...,tn
are distinct.
We put
We next consider (7) as a linear system of equa-
... = xn = 0.
xR+1 tions with
We get
xi,...,xn
as unknowns.
It has a unique solution since its
determinant is positive. c E Mn
Assume conversely that a vector
with index
ents define a supporting hyperplane at boundary point of (14)
order
2
Theorem.
over
has a representation (7)
We construct a nonnegative function
< n.
Mn.
Let
c.
Q whose coeffici-
By (2) of §11, c
u1,...,un
S, and let
b
be an extended Chebyshev system of
have a continuous derivative on
(D) is feasible then (D) and (D2) have optimal solutions.
the interior of the moment cone tions.
must be a
This concludes the proof.
Mn
If
S.
c
If is in
then (P) and (P2) have unique solu-
162
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Proof:
Since the Slater condition is met in P and P2, Problems (D)
and (D2) have optimal solutions.
Let
Then there are nonnegative reals
xi
q
y
be the optimal value of (D).
and elements
ti E S
such that
x.u i r (t.)i = c r', r = 1, ..,n,
i=1
X
> 0,
i = 1,...,q.
0
If
c E M. then (P) has an optimal solution
by Theorem (7) of §11.
y
Put
n Q =
I
Yrur
r=1
Due to complementary slackness we have Q(ti) = b(ti),
i = 1,...,q.
(15)
Q(s) > b(s), s E S, we must also require
Since
Q'(ti) = b'(ti)
a < ti < a.
if
(16)
Combining (15) and (16) we get a linear system of equations with knowns and a number of equations amounting to we conclude from Lemma (12) that
c E Mn
ind (c) > n.
(16) uniquely determine the optimal solution (17)
Example.
If
c
ind (tl,...,tq).
where
b
yl
is at the boundary of
subject to
un-
Thus (15) and
y.
have any solution or there might be many solutions. Minimize
n
Since
Mn, then (P) may not Consider the problem
yl + y2s + y3s2 > b(s),
is a function continuously differentiable on
s E [-1,1] [-1,1].
The dual of this problem reads q
Maximize
xib(ti)
subject to
i=1
q
(18)
xi = 1, i=1 qq
xiti
(19)
0,
i=1
q
xit2 = 0,
xi > 0, i = 1,...,q,
i=1
(20)
-1 < ti < 1,
i = 1,...,q.
19.
One-Sided Approximation
163
Combining (18) and (20) we find that we must take Thus
ind (1,0,0)
= 2
Let now
for this problem.
Q(s) = yl + y2s + y3s2.
put
q = 1, t
= 0, x
3 1
T
y
y E R
= 1. 1
be given and
is optimal if and only if
Q(0) = b(0),
Q(s) > b(s),
s E [-1,1] .
Thus we must also have Hence a solution
Q'(0) = b'(0).
must satisfy
y
y2 = b'(0)
yl = b(0)
and (21)
-1 < s < 1.
y3s2 > b(s) - b(0) - sb'(O), y3
is generally not determined uniquely by (21).
we get the condition In the case
For
f(s) = exp(s)
y3 > e-2 z 0.718. Is13/2,
f(s) _
(21) gives the relation
-1 < s < 1
y3s2 > Is13/2,
which cannot be satisfied for any
Thus (P) has no solution in this
y.
case.
The conditions of Theorem (14) do not, however, guarantee the uniqueness of solutions to (D). (22)
(P)
This is illustrated by
Example.
Minimize
yl + 2 y2
subject to
yl + y2s > 1 + s cos 6ws,
0 < s < 1,
The dual of this problem reads q (D)
Maximize
xi (1 + si cos 6nsi)
subject to
i=1 qq
i=1
xi = 1,
qC
xisi = 1/2, i=1
xi > 0,
We can take
q = 1,Txl = 1,
problem, i.e. (1,2)
sl = 1/2.
Taking
E M2.
Slater condition is met.
i = 1,...,q.
si E [0,1],
Thus
ind (1,1 )T = 2
yl = 3, y2 = 0
By (14), (P) has a unique solution.
that
1 + s cos 61rs < 1 + s,
in this
we find that the We note
164
VIII.
with equality at
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
Hence an optimal solution to (D)
s = 0, 1/3, 2/3, 1.
is defined by the conditions 4
xi = 1,
i=1 4
xi31
xi > 0,
= z,
1 = 1,...,4.
i=1
These conditions do not determine Theorem.
(23)
Let
xi,...,x4
as well as
ul,...,un
Chebyshev systems of order two over have unique solutions.
determined if
S.
If
ul,...,un+I
(P) and (P2) have solutions which are uniquely
c E M . n
We now treat the case
are a direct consequence of Theorem (14).
c E bd Mn
be extended
c E Mn* then (D) and (D2)
The statements about the solutions of (P) and (P2) for
Proof:
c E An
uniquely.
and study the solutions of (D).
Let
c
have the representa-
tion q
cr =
xiur(ti),
r = 1,...,n.
(24)
i=1
If
c E bd Mn
then
and (24) is uniquely determined by
ind (c) < n
{tl,...Itq}
Then there is only one subset
such that the constraints of
(D) are met, so (D) has trivially a unique optimal solution.
show that (P) has a solution
Points
y.
c.
We next
are selected in
tq+l,...,tk
such a manner that ind {t1,...,t1} = n
and this set contains the endpoint
$.
Next, y
is determined from the
equations
where
T y u(ti) = un+1(ti),
i = 1...... ,
yTu,(ti) = un+I(ti),
ti + (a,
u(ti) _ (uI(ti),...,un(ti))T.
As shown in (19) of §18, y
meets the constraints of (P).
struction of a solution to (P2) proceeds in a similar manner. show that (D) has a unique solution if value of (D), A
c E Mn.
the optimal value of (D2).
is closed, the optimal values are attained.
Let
Then Also,
A
A < X.
The con-
We need to
be the optimal Since
Mn+1
One-Sided Approximation
19.
165
(ell ...,cnX) E bd Mn+1'
Hence it has a unique representation given by
4
4CC
xiu(ti) = c,
iLl
and we have
(25)
xiun+1(ti)
iLl
ind (tl,...,t-) < n. 4
(D2) is treated in the same way.
Thus
we have concluded the proof. If
Remark.
(26)
c E M
n
then
ind (c) > n.
Combining this know-
ledge with (25) we get ,t-} = n.
ind {tl,
4
If we discuss (D2) in the same way we shall find a representation
q qC
xiu(ti) = c,
iLl
iLl
xiun+1(ti) = A,
(27)
where ind (tl,...,t I = n. 4
Since (P) and (P 2) have unique solutions we must have
(27) defines two different representations of then (P) has a unique optimal solution
c E Mn
c.
y.
A < A.
Thus (25),
We note also that if Put
n
Q =
yrur
I
r=1
Then we must have Q(ti) = un+l(ti),
Q(t) > un+l(t). Therefore the right endpoint
S
must be in the subset
}.
(See
4
(19) of §18.)
Arguing in the same way we find that the set
{tl,...Itq}
is also uniquely determined and does not contain the endpoint
S.
Thus
e
if
c E Mn
then
subsets of index (28)
U1,...,un let
w
has two different representations associated with
n.
Generalized quadrature rules of the Gaussian type.
form an extended Chebyshev system of order two over
Let again
f
which are continuously differentiable over
(a,s]
and
[a,8]
be a continuous nonnegative function over the same interval.
functions fine
c
For
we de-
166
VIII.
(B
I(f) = J
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
f(s)w(s)ds.
a We want to construct mechanical quadrature rules of the form
q
1(f) Z
xif(si)
(29)
i=1
a < sl < s2 < ... < sq < B.
where
We want (29) to give exact results for
f = ur, r = 1,...,n.
Putting
(a
Cr = I(ur) = J
r = 1,...,n,
ur(s)w(s)ds,
(30)
a
we find that the weights
and the abscissas
xi
s.
must meet the condi-
tion
q r = 1,...,n.
xiur(si) = Cr'
(31)
i=1
q = n
If we put
in (31) and select
(31) as a linear system with
Si
x1,...,xn
arbitrarily we may consider as unknowns.
u1....,u
Since
form a Chebyshev system, the determinant of this system is positive and hence a unique solution exists.
We now show that there are exactly two rules (31) such that i = 1,...,q, and
ind (sl,... s
)
= n.
xi > 0,
These rules are called generalized
q
To establish this we need only show that
rules of the Gaussian type.
For
(cl,...,cn)T E Mn, since then we can apply the argument of (26).
we define for
N = 2,3,...
ur(a),
r = 1,...,n
the functions
urN
according to
s=a
urN (s) _ u{(N-1Na+1B) r
i-1
a +
(B-a) < s < a + 1(N a)
.
We find that lim urN(s) = ur(s),
N-
r = 1,...,n, (32)
rB
lim J
N- a
rN(s)w(s)ds = cr,
Put N =
cr
(B
a We find that
urN(s)w(s)ds.
r = 1,...,n.
19.
One-Sided Approximation
N
N
riur {(N-1Na+1B} S
i=1
r
167
where
is the integral of
Ci
over the interval
a + i(B-a)/N].
[a + (i-1)(B-a)/N, Thus
w
cN = (cN'...,cN)T E Mn , N = 1.....
M
Since
.
2
1
n
is closed, c E Mn
due to (32).
One-sided approximation.
(33)
Let
be as in (28).
ul,...,un[a,B]
We discuss now the problem of approximating the continuously differentiable function
f
from above by the linear combination
Q = yTu in such a manner that
fBIQ(s)
a
- f(s) Iw(s)ds
is minimized when continuous on
(34)
Q(s) > f(s), s E S.
[a, B].
Here
w
is a fixed function,
Q(s) > f(s), JQ(s) - f(s)I = Q(s) - f(s)
Since
and (34) becomes
IQ(s) - f(s) Iw(s)ds = cy T -
rB J
f(s)w(s)ds,
(35)
Ja
where
c
is given by (30).
of (35) is independent of
Since the integral on the right hand side y, our goal is to render the scalar product a
minimum subject to the constraint instance of (P).
We note that
Q(s) > f(s), s E S.
We recognize an
q, {sl,. .,sq}, x11 ...,xq
for the dual problem (D) if and only if
s1,...,s
q
and
is feasible
x1
..
xq
are
the abcissas and weights of a quadrature rule (with nonnegative weights) which is exact for
u1,...,un.
By complementary slackness the optimal
Q
must satisfy the equations Q(si) = f(si),
i = 1,...,q,
(36)
(si-a)(B-si)Q'(si) = f'(si) = 0, If the n+l functions
u1....,un,f
i = 1,...,q.
(37)
form an extended Chebyshev system of
order two then the optimal solutions of (D) and (D2) define generalized rules of the Gaussian type.
See (26).
168
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
We want to find the best polynomial approximations
Example.
(38)
from above and below to the function treat the cases For
n = 4
n = 3,4
and
on
et
for
w(t) = 1.
a = 0, a = 1.
For
See also (5).
rules have the abscissas
0, 2/3
respectively.
1/3, 1
Calculation of generalized quadrature rules of the Gaussian
b = un+1, where
and
u......un+l
gorithm is simplified considerably since It is also known that When
tq = a
and
[a,s].
The three-phase al-
is known from the outset.
q
must occur in the representation sought
is even we also have
n
ar = ur
are required to form ex-
ul,...,un
tended Chebyshev systems of order two over
c.
from
the two generalized Gaussian
n = 3 and
et
q = 3, sl = 0, s2 = 1/2, s3 = 1,
Such rules can be determined by solving (P), (D) for
type.
We
c = (1,1/2,1/3,1/4)T
Thus the best approximation to
above is found by solving (36), (37) with
for
Thus
there are two (generalized) rules of Gaussian type which can
be found from (9) and (10).
(39)
[0,1]
ur(s) = sr-l.
Thus the structure of
t1 = a.
the nonlinear system treated in Phase 3 is known from the outset and we know for certain whether a "correct" system has been constructed after We observe that
carrying out Phases 1 and 2.
s1,...,sq
and
x1,...,xq
can be found from the nonlinear system (4) of §16 which in this case has n
equations and
n
If one wants to solve (P) instead, y
unknowns.
can afterwards be found from the linear system resulting from combining For the important case
(3) and (5) of §16.
ur(s) = sr-1
special al-
gorithms have been developed.
§20.
COMPUTING THE BEST APPROXIMATION IN THE UNIFORM NORM In this section we shall treat the numerical solution of the dual
pair (PA) - (DA) when
the same interval.
form an extended Chebyshev system of
v1....,vn
order two over an interval
[a,8]
Instead of
and
f
is twice differentiable over
we shall write
yr
ur, r = 1,...,n.
We
write (PA) and (DA) as follows (see §6): n (PA)
Minimize
yn+1
subject to
I
I
yrur(t) - f(t)l < yn+1,
r=1 q (DA)
Maximize
xif(ti)
subject to
i=1 q
xiur(ti) = 0, i=1
q I
i=l
1xii = 1.
r = 1,...,n,
t E [a,$];
20.
Computing the Best Approximation in the Uniform Norm
In §7 we treated polynomial approximation; i.e. the case
169
ur(t) =
tr-1
We shall now show that many of the results obtained there may be easily extended to case of a general extended Chebyshev system of order two. a < tl < t2 < .,. < to+l < 5
Let
Lemma.
(1)
and let
be fixed real numbers
be a nontrivial solution of the homogeneous system
x1,...,xn+l
of equations n+l
ur(ti)xi = 0,
r = 1,.... n.
(2)
i=1
Then i = 1,...,n.
x .x 1.+1 < 0, 1
Proof:
Let
i
be a fixed integer such that
1 < i < n.
Let
nC
yrur,
P = r I=l
the linear combination which is uniquely determined by the conditions
P(t) _
1,
J
0,
j = i, j 1,...,n+l,
j + i,
(3)
j# i+l.
The determinant of the system of equations (3) is positive by the definition of Chebyshev systems.
The rest of the argument parallels the proof
of Lemma (1) of §7. (4)
Theorem.
Let
f
be continuous on
vl,...,vn
system on the same interval and a linear combination
P
a Chebyshev
be given:
n P = r Il yrvr.
Let further
a < tl < t2 < ... < to+l < a
{f(ti) - P(ti)}
.
be
{f(ti+1)-P(ti+1)} < 0,
n+l
points such that
i = 1,...,n.
(5)
Then
min
i
I f (ti) - P (ti) I < On <
max
a
If(t) - P(t)j,
(6)
where n
= inf n yERn Proof:
replace
tr-l
max If(t) a
-
I
r=l
y r r I.
The proof closely follows that of Theorem (5) of §7 if we with
ur
there.
170
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
(7)
Corollary.
Let
n p =
Yrur
E
r=1
be a linear combination such that there are
n+l
points
a < t1 < t2 <
with the properties
< to+l < $
lail = If(ti) - P(ti)l =
max If(t) - P(t)l, a
i = 1,...,n+l,
and i = 1,...,n.
d 1.+1 < 0,
d 1
Then f
is the linear combination of
P
which best approximates
in the uniform norm. (8)
Determination of a linear combination satisfying (5).
a < t1 < t2 < ... < to+l < B of
u1,...,un
be given.
and a constant
ul,...,un
a
Again let
We seek a linear combination
i = 1,...,n+1.
P(ti) = f(ti) - a(-l)1,
P
such that (10)
Putting
n r=nC
P =
L1 Yrur,
we get the linear system of equations n
Yrur(ti) + E(-1)1 = f(ti = 1,...,n+l.
(12)
r=1
There are
unknowns, namely
n+l
y1,...,yn, and
e, and the same number
Expanding the determinant of coefficients by its last
of equations.
column and using the defining property (2) of §18 of Chebyshev systems, we ascertain that (12) has a unique solution. For a general Chebyshev system, (12) may be solved numerically as described in §14 but if If
ur(t) = tr-l
ur = Tr-1, r = 1,...,n a+b ti
2
+
b-a Z
the method of (14) of §7 is faster.
and
cos -11n
ir,
i = 1,...,n+1,
then the orthogonality relations (34) of §7 can be used to solve (12) efficiently. By Theorem (4), (PA).
manner.
lel
from (10) is a lower bound for the value of
We next describe how to improve upon this bound in a systematic
20.
Computing the Best Approximation in the Uniform Norm
Lemma.
(13)
and let
u1,...,un
Let
be given and let
to+l < S
form a Chebyshev system over
be continuous on the same interval.
f
171
y E Rn
and
Let
[a,s]
a < t1 < t2 < ... <
be the solution of (12).
e
Put
n
R(t) = f(t) - I yrur(t) r=1
Now let
be such that
a < T1 < T2 < ... < Tn+I < $ R(Ti+l) < 0,
R(Ti)
i = 1,...,n;
(14)
IR(T)I > IR(ti)l,
i = 1,...,n+1;
(15)
IR(T.)I > JR(t.)I
for at least one
1
1
We define
z E Rn
(16)
i.
through
and
n
zrur(Ti) + e(-1)1 = f(T.),
i = 1,...,n+1.
(17)
r=1
Then
1E1
>
Proof:
i.
I
We determine
xi,...,xn+1
1,...,F,n+1
and
as the unique
solutions of the equations n+l
n+l
xiur (ti) = 0,
r = 1,...,n,
i=1
(-l)lxi = 1,
(18)
(-1)'Ei = 1.
(19)
i=1
n+l
n+l
iur(Ti) = 0,
r = 1,...,n,
i=1
i=1
Since the matrix of coefficients of (18) is the transpose of that of (12) This is true of (19) by the
the former system has a unique solution.
From Lemma (1) we conclude that
same argument.
i
xixi+l < 0,
i
i
n+l
n+l
n zrur(ti) + E
E
r=1
i=1
Using (19) we arrive at n+l e _
f(Ti). i=1
In the same way we find that
n+l (-1)'
i=1
we get
.
f(Ti)
=
i=1
172
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
n+l xif(ti).
e = i=1
Applying (18) and the definition of n+l
n+l E _
we obtain
R
x R(t
1=1
i
i=1
i
i
R(Ti).
(21)
All terms in the two sums (21) have the same sign due to (14) and (20). Therefore the desired conclusion
By passing from the set
Remark.
follows from (16).
> lal
Ial
{t1,...,tn+1)
to
{TI' ...,Tn+1)
as described in Lemma (13) above we perform a simplex-like exchange step and obtain an improvement of the lower bound for the obtainable approximaWe will now show that it is possible to carry out such an
tion error.
exchange as long as
Assume that
Use the same notation as in Lemma (13).
Lemma.
(22)
there is a
is not an optimal solution of (PA).
yl,...,yn' E
such that
t* E [a,b]
R(t*) > E.
(23)
Then there is a set Since
Proof:
by (12), it has
is continuous on
R
A.
placed by
t*.
ii)
iii)
If
if
t* > an+I.
Theorem.
only if there are
Proof:
then
replaces
t*
t*
t*
< t* < Ai+1'
replaces
A1; otherwise
Ial
Then
t*
replaces
if
R(t*)
t
replaces Ai+1.
R(Xn+l) > 0;
XI.
Then (14)
points
an+l
replaces
-
(16) are satisfied as claimed.
is an optimal solution of (PA) if and
y1,...,yn+l n+l
A.
R(Ai) > 0; otherwise
Then
(12) is satisfied with
(7).
such that
i
T. = A., i = 1,...,n+l. (24)
will be re-
A.
An+1'
R(t*)
otherwise Put
Next one of the
R(t*)R(al) > 0
replaces
There is an ai
such that
There are the three cases i), ii), iii):
t < al. t*
R(ti+1) < 0
R(ti)
i = 1,...,n.
= ti, i = 1,2,...,n+1.
First put
and
(a,8]
z1 < z2 < ... < zn
zeros
n
ti < zi < ti+l'
i)
meeting the conditions (14) - (16).
{T1,...,Tn+1)
a < t1 < t2
< ... < to+l < $
such that
= yn+1
If (12) is satisfied then optimality follows from Corollary
Assume on the other hand that
y1,...,yn+1
is an optimal solution
20.
Computing the Best Approximation in the Uniform Norm
of (PA).
173
Since (PA) and (DA) have the same optimal value and (DA) has a
solution we may write
q q
y.+1 =
xif(ti),
L
(25)
i =1
cq
r = 1,...n,
xiur(ti) = 0,
(26)
i=1 q 1I1 Ixil = 1.
(27)
We need only consider optimal basic solutions of (DA); i.e. we must have q < n+l.
rank
The homogeneous system (26) has a matrix of coefficients with
= min(q,n).
Hence it has nontrivial solutions only for
and (DA) has therefore no optimal solutions with
q < n+l.
is the only possibility for optimal basic solutions. by
yr
and summing over
i=1
x1
E r=1
q > n+1
Thus
q = n+l
Multiplying (26)
we find that
r
yrur(ti) = 0.
Thus (25) becomes n+l yn+1
n
xi{f(ti)
E
E
-
i=1
(28)
yrur(t.)}.
r=1
By Lemma (1) we have
xixi+l < 0.
Hence (27) entails
n+l
(-1)1xil = 1.
I
i=1
Entering this expression into (28) we arrive at n+l
n+l
X.(-1)lyn+ll =
I
i=1
n xi{f(ti)
I
yrur(ti)}I.
-
r=1
i=1
Since
If(t)
n -
E
r=1
yrur(t) I : yn+1'
t E [a, 6),
we must conclude that (12) is satisfied for
lei = yn+l, establishing the
desired result. (29)
Remark.
Theorem (24) can be used for deriving a nonlinear
system of equations to solve (PA) numerically.
equations with the unknowns y1"
" 'yn+l
and
(12) is a system of n+l tl,.." to+l'
The missing
174
n+l R
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
equations are derived by utilizing the fact that the error function of Lemma (13) must have a local extremum at Theorem.
(30)
order two over
Let
Then
f
be twice continuously differentiable
y1,...,yn+l
if and only if there is a set
nn
be an extended Chebyshev system of
ul,...,un
and let
[a,s]
on the same interval.
ti, i = 1,...,n+1.
is the optimal solution of (PA)
a < tl < t2 < ... < to+l < s
such that
yrur(ti) + (-1)IE = f(ti = 1,...,n+l,
(31)
r=1
n
11
(ti-a)(6-ti) { E YruT(ti) - f'(ti)} = 0, r=1
i = 1,...,n+l,
(32)
J1
(33)
yn+l = le"' Proof:
(31) and (33) follow from Theorem (24).
fact that the error function has a local extremum at
(32) expresses the ti.
If
ti E (a,s)
then the derivative of the error function must vanish. The three-phase algorithm is much simpler for (PA) with Chebyshev systems than in the general case.
q
no clustering occurs in Phase
In Phase 1 a discretized version of
(PA) is solved by means
2.
is set to
n+l
from the outset and
of an exchange algorithm based on Lemma (13).
For discretized problems convergence is guaranteed by the fact that only finitely many exchanges can take place and the calculated lower bound increases in each step. all
ti
To improve efficiency one generally exchanges
in each step and seeks to achieve
IR(ti)I > IR(ti)I.
The
classical Remez algorithm (see e.g. Cheney (1966)) requires that the maximum value of the error function on
[a,s]
be calculated at each step;
but this cannot be achieved by means of a finite number of arithmetic operations unless further assumptions are made about the structure of the function
f.
Chapter IX
Examples and Applications of Semi-Infinite Programming
In this chapter we shall illustrate how the techniques of semiinfinite programming can be used for the computational treatment of nontrivial problems in a practical context.
We remind the reader that im-
portant applications have been discussed elsewhere in the book, e.g. in §6, §7, §19 and §20.
§21.
A CONTROL PROBLEM WITH DISTRIBUTED PARAMETERS (1)
In this section we shall treat a problem of potential interest
for industry.
One wants to change the temperature of a metal body by
regulating the temperature of its environment.
This must be done within
a predetermined period of time and the temperature of the environment can only be varied between an upper and a lower value.
We shall discuss
a simple model problem which is solved in Glashoff and Gustafson (1976). Only one spatial coordinate occurs, but the solution to be presented here could possibly be applied to paralleliepipedic bodies having large extensions in the remaining two dimensions; i.e. when boundary effects can be neglected. (2)
Thus we consider a thin rod which can be heated symmetrically
at both ends but is thermally isolated from its surroundings everywhere else.
(The rod could be thought of as representing a cut through a plate
in its central part.
The two large surfaces of the plate are held at the
same temperature and heat flows into or out of the interior of the plate. The heat thus propagates perpendicularly to the large surfaces of the plate, not along the surfaces).
We select the coordinate. system so that
the endpoints of the rod are located at 175
-1
and
+1.
Inside the rod the
176
IX.
temperature is
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
y(x,t)
at the point
at the time
x
shall study the temperature of the rod for
t, -1 < x < 1.
We
We assume that
0 < t < T.
the temperature is governed by the heat diffusion equation, Yt(x,t) = Yxx(x,t) - q(x)y(x,t), where
-1 < x < 1, 0 < t < T,
(3)
is a given twice-differentiable function with
q
0 < x < 1.
q(x) = q(-x),
(4)
As usual, yt, yxx, etc. denote partial derivatives.
The temperature of
u, the temperature at the two endpoints.
the rod is controlled by varying
The transfer of heat from the rod to the surrounding medium (or vice versa) follows the law 0 < t < T
Byx(l,t) = u(t) - y(l,t), (right endpoint).
Here, $
(5)
An analogous equation holds for the left endpoint. Combining (3), (4) and (5) we realize
is a positive constant.
that
-1 < x < 1,
y(-x,t) = y(x,t), i.e. y
is an even function of
Yx(0,t) = 0,
0 < t < T;
Therefore we must have
x.
0 < t < T.
We need only consider
y(x,t)
the surrounding medium be
0 < x < 1.
for
u(t), 0 < t < T, and let
sulting temperature distribution in the rod at at
t = 0
Let the temperature of be the re-
y(x,T)
if the temperature
t = T
is
y(x,0) = 0,
-1 < x < 1.
Now let the desired temperature at
t = T
be
z(x)
where
z
is a con-
tinuous function with z(x) = z(-x).
We now want to compute a function mates
quire that
u which is such that
as closely as possible.
z(x)
u
y(x,T)
approxi-
For physical reasons we must re-
is a bounded function and introduce the constraint
0 < u(t) < 1,
0 < t < T.
For easy reference we collect the equations describing our control problem.
Yt(x,t) - Yxx(x,t) + q(x)y(x,t) = 0,
0 < x < 1,
0 < t < T,
(6)
177
A Control Problem with Distributed Parameters
21.
Ryx(l,t) + Y(l,t) = u(t),
0 < t < T,
(7)
yx(O,t) = 0,
0 < t < T,
(8)
y(x,0) = 0,
0 < x < 1,
(9)
0 < u(t) < 1,
0 < t < T.
(10)
If the control function is continuous, one can establish that the system - (9) has a classical solution
(6)
where
y(x,t)
0 < x < 1, 0 < t < T.
derivatives y
are continuous functions for yt, yxx is in fact continuous for 0 < x < 1, 0 < t < T.
continuous for
the linear control operator
y
y(x,T)
is
through
0 < x < 1,
(Lu)(x) = y(x,T), where
L
Thus
u, therefore, we can introduce
For continuous
0 < x < 1.
and its partial
y
We introduce the uni-
is the solution to the problem (6) - (9).
form norm on the space of functions continuous on
and formulate
[0,1]
our problem as follows: Minimize when
(11)
IILu - zIL
It can be
varies over all continuous functions satisfying (10).
u
shown that this problem does not in general have an optimal solution. Hence one extends the class of functions problem.
u
to get a solvable control
Here we take a
See Glashoff and Gustafson (1976) for details.
short cut to arrive more quickly at a computational treatment. (12)
We select an integer
and the fixed numbers
n > 1
t0,ti,...,tn,
where
0 = t0 < t I
< ...
Next we denote by
U
the class of piecewise constant functions
u
satis-
fying u(t) = ar,
u E U
Thus
u E U, then t = 0
tr-1 < t < tr,
(13)
is uniquely determined by the vector
(all ...,an)T.
If
can easily be calculated numerically since we start with
Lu
and calculate
the just computed y(x,t2), and so on. Minimize
r = l,...,n.
y(x,t1), 0 < x < 1
y(x,t1)
with
u(t) = al,
as an initial value for
y
Next we use
and determine
Therefore we can approximate (11) with the problem
JILu - zllm
over all
u E U.
We next introduce the nonnegative basis functions
(14)
yr
through
178
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
tr-1 < t < tr'
( 1 ,
r = 1,...,n.
vr(t)
yr E U, r = 1,...,n, and if u
Thus
(15)
otherwise
0,
is defined by (13) we get
n
u =
arvr.
E
(16)
r=1
Next we put
(17)
r = 1,....n,
wr = Lvr,
giving
n
Lu =
r=1
n
a Lv r
=
r
I a rwr .
r=1
Combining (15) and (16) we find that 0 < ar < 1,
u E U
meets (10) if and only if
r = 1,...,n.
(18)
Hence Problem (14) takes the form n
Minimize
11
E
arwr - Z11-
(19)
r=1
over all
a E Rn u = yr
putting
subject to (18). in (6)
- (9).
We observe that
wr
is determined by
Problem (18) may now be treated in analogy
to the approximation problems in Chapter III.
Thus we first recast it
into the form Minimize
when
an+1
n
OW(X) - z(x)1 < an+l'
0 < x < 1
(20)
r=1
0 < ar < 1,
r = 1,...,n.
(21)
By replacing (20) and (21) with equivalent simpler inequalities, we obtain
Minimize over all
a
(22)
n+ 1
al,...,an+l
subject to the constraints
n
arwr(x) + an+1 > z(x),
0 < x < 1,
(23)
r=1
n -
r=1
arwr(x) + an+l > -z(x),
0 < x < 1,
(24)
21.
A Control Problem with Distributed Parameters
a
r
> 0,
-ar > -1,
179
r = 1,...,n,
(25)
r = 1,...,n.
(26)
(22) - (26) is now a linear optimization problem of the type defined in §3.
The three-phase algorithm of Chapter VII applies.
The fact that the
inequality constraints appear in four disjoint groups makes the organization of the calculation somewhat laborious.
We present here a worked example from Glashoff and Gustafson
(27)
(1976).
In (5)
- (9), q(x) = 0, 0 < x < 1, S = 0.1
were selected.
Several values of T
only the case
ul,...
Let
form.
T = 0.3.
and
z(x) = 0.2
were treated but we discuss here
In this Example
wr(x)
may be determined in closed
be the positive roots of the equation
V tan u = 10.
Next determine
Akpk(x)
through
Akpk(x) = 2 sin uk(uk + cos uk sin Uk)-Icos 11kX.
Then
is determined from
wr(x)
Lur(x) = wr(x) n = 10
k=1 I
was chosen and
tr = 0.03
r,
tr
AkukPk(x) f ur(t)exp(-uk(T-t))dt. 0
were taken equidistant;
r = 0,...,10.
The problem (22) - (26) was discretized by means of an equidistant grid with 17 points
xi;
xi = (i-l)/17, Then (22)
i = 1,...,17.
- (26) was replaced by a linear program having 11 variables
a1,...,a11
and 54 constraints.
We note that
0 < ar < 1
The results in Table (32) below emerged.
only for
r = 5,8,9,10.
Next put
10
f (x) = where
I
arwr (x) - z (x) ,
(28)
r=1
a1,...,a10
is the calculated solution just obtained.
bility condition is in this case that
R(x)I < all,
0 < x < 1.
The feasi-
180
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
We find that
f
0, 0.3125, 0.6250,
has local extrema at the 5 gridpoints
0.8750, 1.
Thus we assume that (22) - (26) has an optimal solution
all...,a11
such that the function 10
f =
arwr - z
I
r=l
has local extrema at the endpoints
which we denote E1'2'3
0
and
1
and at
interior points
3
Thus we get the following 8 equations: .
If(0)I = all, If(l)I = all, I f(Ci) I = all, i = 1,2,3, fig i) = 0, i = 1,2,3.
(29)
(30) (31)
We use the result of the discretized problem as an approximation of the solution to the linear optimization problem (22)
or = ar
for
r = 1,2,3,4,6,7
and assume that
-
Thus we put
(26).
and
f
hav the "same
f
shape", i.e. that they have the same number and the same kind of local extrema, thus enabling us to remove the absolute value symbols and select correct signs in (29) and (30).
Thus the 8 equations (29)
-
(31)
The system is solved
have the 8 unknowns with the Newton-Raphson method.
Lastly, the optimality of the solution
hereby obtained is checked by verifying that the complementary slackness For this
conditions with respect to the dual of (22) - (26) are met.
particular problem it was possible to simplify the general three-phase algorithm due to the special structure of the error curve problem appears here only at the verification step.
f.
The dual
We also see from (32)
that the discretization error is rather small. (32)
Table.
Calculated results for
T = 0.3, n = 10,
17 equidistant gridpoints in Time interval
0 - 0.12 0.12-0.15 0.15-0.21 0.21-0.24 0.24-0.27 0.27-0.30 Optimal value (33)
Exercise.
Index r 1,2,3,4 5
6,7 8
9
10 11
[0,1]
Discretized problem
Continuous problem (20)-(24)
1
1
0.43638
0.43631 0
0
0.10848 0.23062 0.19959 1.069x10-4
0.10835 0.23068 0.19959 1.060x10-4
What could happen if the verification of the com-
plementary slackness conditions is left out?
Discuss in particular the
case when (22) - (26) is discretized with a fine grid!
22.
Operator Equations of Monoton=c Type
181
OPERATOR EQUATIONS OF MONOTONIC TYPE
§22.
(1)
function
We shall use the term operator equation for equations having a u
as unknown.
Such problems are often formulated as differen-
tial equations or integral equations.
If the unknown function occurs
linearly, then an approximate solution to the operator equation may be calculated by means of reformulating the given problem into an approximation problem of the type discussed in §6 and later in the book. (2)
example.
We illustrate the general idea by discussing the following Let
defined for
be a continuous function of two variables
K
0 < s < 1, 0 < t < 1.
tions which are defined on
Let
f
and
g
We seek a function
(0,1].
s
and
t,
be two given funcu
satisfying the
condition
u(0) = 1
(3)
and fulfilling the linear integro-differential equation 1
u'(t) + f(t)u(t) + 1
Let now
u1,...,un
be
ferentiable on
[0,1].
K(t,s)u(s)ds = g(t),
0 < t < 1.
(4)
0
n
given functions which are continuously dif-
We want to approximate the unknown function
u
with a linear combination n
yru
u = I r=1
(5)
r
The idea is to enter this approximation into (4) and to minimize the norm of the function n
Cn
YruT(t) + f(t) I r=1
1 fl
Cn
yrur(t) +
r=1
L
r=1
K(t,s)ur(s)ds - g(t),
Yr 0
0 < t < 1. Next put 1
vr(t) = ur(t) + f(t)ur(t) +
K(t,s)ur(s)ds,
r = 1,...,n.
(6)
0
If we want to approximate Minimize
yn+l
subject to the constraints
g
in the uniform norm, we get the task (7)
182
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
n
0
yrvr(t) - g(t)I < yn+
(8)
r=1 n
Yrur(0) = 1.
E
(9)
r=1
The problem (7)
The last relation comes from (3).
- (9) can then be re-
formulated as a linear optimization problem and solved by means of the general computational schemes of Chapter VII.
We notice the similarity
between the approach taken here and that applied in §21. direct relation between the value a solution (7)
There is no
in (7) and the deviation between
Yn+l
and an approximating linear combination (5).
u
The problem
- (9) is defined and can be solved numerically even if the original
task (3), (10)
(4) does not have a solution.
We shall now discuss a general class of operator equations
where the analysis can be carried much further and where the techniques of semi-infinite programming permit a systematic computational treatment. We refer here to the so-called operator equations of monotonic type.
See
A comprehensive account is given in Protter-Weinberger
e.g. Collatz (1952).
(1967) and we refer the interested reader to this text for the mathematiNumerical examples are also given
cal theory of this class of equations.
Here we shall illustrate the use of semi-infinite pro-
in Watson (1973).
gramming on a particular example. (11)
tion
u
We want to calculate the uniquely determined func-
Example.
of two variables satisfying
au+au=0 2
2
as2
at2
A = {(s,t), 0 < s < 1, 0 < t
on
s,t E bd A, the boundary of
u(s,t) = f(s,t),
Here, f
(12) (13)
A.
is a known continuous function.
(12) is a monotonic operator equation and one can show the following result:
v(s,t) < u(s,t) < w(s,t) whenever
and
v
w
(14)
are functions of two variables satisfying
v(s,t) < f(s,t) < w(s,t), (s,t) E bd A
waw<0
2
as2
2
at2 -
2
- as2
2
(15)
(s,t) EA.
(16)
ate'
Our goal is to construct functions
v
and
w
numerically.
Put
Operator Equations of Monotonic Type
22.
183
n
w(s,t) =
I
Yrgr(s,t),
r=1
where t L,
g1,...,gn
are defined by the expressions
respectively.
n=
z
yl,...,yn
1,s,t,s2,st,t2,...,stL-l
is an integer and
Here, L
(L+1) (L+2). We get
are constants to be determined.
2 aw+ aw_ n/ y f (s,t) rr as2 at2 r=1 2
where
fr
are calculated from
f4(s,t) = f6(s,t) = 2, etc.
Thus
gr.
fr(s,t) = 0, r = 1,2,3,5,
The conditions (15) and (16) imply
nn yrgr(s,t)
> f(s,t),
(s,t) E bd A,
r=1
n (s,t) E A.
yrfr(s,t) < 0, t=1
We want to find a "good" function
w, i.e. a function which satisfies the
right inequality in (15) and the left inequality in (16) as well as posTherefore we
sible.
minimize over all
(17)
Yn+1
subject to
y1,...,yn+1 n
f(s,t) + Yn+l >
yrgr(s,t)
> f(s,t),
(s,t) E bd A,
(18)
(s,t) E A.
(19)
r=1 nn
- Yn+l <
Yrfr(s,t) < 0, r=1
The problem (17) - (19) can easily be recast into the following equivalent linear optimization problem: Minimize over all
(20)
Yn+l
yl,...,yn
subject to
n yrgr(s,t)
> f(s,t),
(s,t) E bd A,
(21)
r=1 nn
yn+l
yrgr(s,t)
C
r=1
> -f(s,t),
(s,t) E bd A,
(22)
184
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
n
yrfr(s,t) > 0,
(s,t) E A
(23)
(s,t) E A.
(24)
r=1
n C
yn+1 +
yrfr(s,t) > 0,
r=1
This task can be treated with the computational schemes of Chapter VII. The construction of
is carried out in an analogous manner.
v
(14) can
now be used to calculate pointwise upper and lower bounds for the solution
u.
Exercise.
(25)
(24) is solvable.
Show that the linear optimization problem (20)
Hint:
-
Use Theorem (7) of §11.
AN AIR POLLUTION ABATEMENT PROBLEM
§23.
We shall resume the discussion of the air pollution control problem of (14) of §3 but now in a more general context.
We noted that pollutants
emitted from various sources, e.g. power plants, contaminate the air. Sooner or later they will reach the ground as fallout.
Thus sulfur com-
pounds from power plants burning fossil fuels may damage soils and acidify The pollutants are often transported
waters causing the death of fish.
long distances before they reach the ground.
Thus the severe acidifica-
tion of lakes in Scandinavia is caused, to a large extent, by industry in Similar phenomena have recently been
Great Britain and Central Europe.
In this section we shall develop a model
observed in the U.S. and Canada.
The main
which incorporates both air pollution and fallout on the ground.
difficulty associated with its application to practical problems is the construction of the transfer functions.
Much research is needed in this
area. (1)
Air pollution control model.
We use the same notation as in
(14) of §3 but include the fallout as well.
quality control area not coincide. V.
and
W.
S
and a fallout control area
With each source where
Thus we consider an air
j
F.
S
and
need
F
we associate the transfer functions
V (s), s E S, is the contribution from source
j
to
the annual mean concentration in the air of the pollutant considered at In the same way, W.(t), t E F, is the contribution from source
s E S.
to the fallout at fied.
t E F.
Let
N
sources with strengths
gj
j
be identi-
We assume that the combined annual mean pollutant concentration is
given by
23.
An Air Pollution Abatement Problem
N
gjVj(s),
185
s E S,
and that the total fallout is N
g.". (t),
t E F.
We assume that the contributions add up according to the principle of superposition.
The number of sources is fairly large and therefore we
combine them into classes as described in (14) of §3. class are regulated in the same way.
The sources in each
Upon performing this aggregation we
write the total concentration in the air at
s E S,
n
r=1
vr(s)
and the total fallout at
t E F,
n wr(t). r=1
Thus source-class
gives rise to the concentration contribution
r
wr.
and the fallout contribution
One reductions strategy is that the emission of class by the fraction
0 < Er < 1, r = 1,...,n.
Thus
Er.
yr
r
is reduced
Hence the total re-
maining concentration after regulation is given by n
(1-Er)vr(s) r=1
and the total fallout becomes n
(1-Er)wr(t). r=1
We now require that the remaining concentration and fallout do not surpass given levels posed.)
g
and
f.
(The standards
g
and
f
We also assume that there are upper bounds
for the fractions
Er.
emissions completely.)
may be legally imer < 1, r = 1,...,n,
(It may not be technically possible to remove the
Therefore the numbers
El,...,En
must meet the
conditions
0 < E r < er ,
r= 1,...,n,
(2)
ne
(1-Er)vr(s) < g(s), r=1
s E S,
(3)
186
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
n
t E F.
(1-Er)wr (t) < f(t),
(4)
r=1 The reduction of emissions entails costs, e.g. for purification of the exhausts or the use of more expensive fuels than otherwise would have been selected.
We shall assume here that the costs are defined by the linear
function
n
K(E) = where
I crEr, are known numbers.
cl,...,cn
function
(5)
r=1
K(E)
The task of minimizing this cost
subject to the constraints (2) -
(4) can be written as a
linear optimization problem as follows: n
Minimize
I
crEr
(6)
r=1
subject to the constraints E
r
r = 1,...,n,
> 0,
-Er > -e
(7)
r = 1,.... n,
(8)
r,
n
n
E v (s) > -g(s) + E V r r r=l r r=1 E
n r=1
Er wr (t) > -f(t) +
n E r=1
wr (t),
s E S,
(9)
t E F.
(10)
The constraint (9) admits the following interpretation:
the total reduc-
tion must amount at least to the difference between the concentration before reduction and the imposed standard. that
E
(10) is solvable if it is consistent.
ards are met by maximal reduction; i.e. n (1-er)vr(s) < g(s),
s E S.
(1-er)wr (t) < f(t),
t E F.
r=1
and n
Conditions (7) and (8) entail
is restricted to a compact subset of
r=1 The dual of (6) - (10) may be written: Maximize
Rn.
Thus the problem (6)
Consistency means that the stand-
-
23.
An Air Pollution Abatement Problem
T q1 -n a +
187
q2
xig0(si) +
Eif0(ti)
(11)
L-1
over all vectors
gl,g2, points
A E Rn, n E Rn, integers
S. E S, ti E T,
xi, Ei, subject to the constraints
and reals
ql
q2
i=1
r = 1,...,n,
n
Ei > 0,
i = 1,...,q2.
r
r = 1,...,n,
(12)
i=1
Ar > 0,
-
Eiwr(ti) = cr,
xivr(si) +
Ar - nr +
> 0,
-
xi .
> 0,
-
i = 1,...,q1, (13)
In (11) we have put
g0(s) _ -g(s) +
CnC
L
r=1 n
vr(s),
f0(t) _ -f(t) + I w(t). r=1
The complementary slackness conditions for the dual pair (6)
-
(10) and
(11) - (13) read ATE = 0,
(14)
nT(e-E) = 0,
(15)
n xi { I
Ervr(si) - g0(si)I = 0,
i = 1,...,q1,
(16)
Erwr(t
i = 1,...,q2.
(17)
r=1 n Ei {
-
f0(ti)} = 0,
r=1
The equations (14) - (17) which must be fulfilled for optimal solutions can be analyzed as follows. r = 1,...,n.
nn
0 < Er < er, we must have
Er = 0, then (15) entails
If e.g.
er - Er = er > 0.
Since
Further, if
xi > 0
nr = 0
Arnr = 0,
since
then
Ervr(si) = g0(si).
r=1
Thus the pollutant concentration reaches the highest possible value at si.
In the same way the level of fallout reaches the standard value at
ti
if
ql
points
Ei > 0.
Thus an optimal reductions strategy is associated with where the pollutant concentration reaches the
sl,...,sq 1
highest value and
q2
points
where the rate of fallout is
t1,...,t q2
188
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
the largest permissible.
The positions of these "critical points" are
determined when Problem (6)
- (10) is solved numerically.
For this purpose the general three-phase algorithm of Chapter VII may be used.
In Phase i), S
in (9) and
{s1,...IsN} c S
finite subsets
linear program is solved.
noted by
E*.
in (10) are replaced by the
{t1,...,tL} c T
and the resulting
Let the optimal solution then obtained be de-
We find that
n
r=1
F
and
n
E*v (s.) ? -g(s
rr
vr (s .),
+ r= =1
Cn
nCC
w(t r
E*wr(ti) > -f(tt) + rI l rL1
This means that with the reduction strategy tion and fallout are met on the grids.
E*, the standards for pollu-
They can hence be violated only
outside the grids and it is possible to derive bounds for how large the deviation can be.
r-r-
We recall that
0 < E* < e
Hence one can assess
< 1.
when it is worthwhile to carry out the remaining phases of the algorithm since the parameters of this problem, e.g. the transfer functions, are not very accurately determined.
524.
NONLINEAR SEMI-INFINITE PROGRAMS In this section we shall illustrate by examples how the computational
scheme of Chapter VII may be extended to problems which are not of the form of (P) (introduced in §3) or (D) (introduced in §4). We treat first the class of problems which arise when the preference function of (P) is replaced by a nonlinear convex function. Let the index set
we consider the following task: a1,.... an
and
be defined as in §3.
b
continuously differentiable on Minimize over all
R.
S
Suppose that
F
is convex and
Consider the problem
F(y)
y E Rn
Thus
and the functions
(1)
subject to the constraints
n yrar(s) > b(s),
s E S.
(2)
r=1
This problem may be reduced to the form of (P).
In our further develop-
ment we shall assume that (2) determines a compact subset of it will be denoted by
K.
Rn.
Then (1), (2) has an optimal solution
Here y*.
We
Nonlinear Semi-Infinite Programs
24.
189
shall now derive relations which can be used for the determination of
y*.
(1) and (2) may be written as: Yn+l
Minimize subject to
(3a)
yn+l = F(y),
y E K.
Let us now assume that a cube
(3b)
T = {x
Ixil < F,
i = 1,...,n}
I
with
is known
n E T, the linear function
Denote by
K E T.
n
1I(n,Y) = FCn) +
Fr(n) (Yr-T1r) r=1
where
stands for
Fr(n)
Since
.
2n
Oettli (1975))
F
is convex we have (c.f. Blum and
r
F (y) > n(n,Y),
y E K,
F (y) = sup H(n,Y) = n(Y,Y) nET
Hence (3) is equivalent to the problem Minimize
(4)
yn+l
subject to Yn+l
(n,Y)
n ET, yE K.
(2) gives the condition that
y E K.
(5)
Combining this with (4) and (5) we
finally arrive at the formulation Minimize
Y.+1
(6)
subject to
n yn+1
C L
n yrFr(n) ? F(n) -
r=1
nrFr(n),
n E T,
(7)
s E S.
(8)
r-1
n
Yrar(s) > b(s), r=1 (6)
-
(8) is a linear optimization problem of the type introduced in 93.
It can be solved by means of the general three-phase algorithm of Chapter VII.
An alternative is to discretize (1),
(2) directly.
This generalization may be carried even further.
We consider the
following problem: Program (PG).
Let
S
be a compact subset of
function of the two arguments
s,y, where
s E S
Rn
and
and let y E Rn.
g g
be a is
190
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
required to have the properties that the set
K = {y E Rn
g(s,y) < 0,
1
is nonempty and compact, and
s E S} g
(9)
is twice continuously differentiable on
S x K. Let
be twice continuously differentiable on
G
Then Program
Rn.
(PG) is the task:
Minimize
G(y)
y E Rn
over all
(10)
subject to the constraint
g(s,y) < 0,
Program (PG) has a solution since the continuous function
Remark. G
s E S.
is to be minimized over the compact subset
a special case of (PG) which occurs if T a (s)
y.
Since
G
K
is not assumed to be convex, G
G
many local minima on
of
Program (P) is
Rn.
is linear and
g(s,y) = b(s) -
may have arbitrarily
K, a fact which complicates the numerical treatment.
To a certain extent, a computational scheme for (PG) can be based on the experiences from (P), even if the implementation on a computer is much more difficult.
A natural idea is to discretize (PG), i.e. replace
S
by a finite
grid
T = {sl,...,sN}
and approximate (PG) by the task Minimize G(y)
(12)
y E Rn subject to the constraints
over all
g(sj,Y) < 0, Let now
L
(13)
j = 1,...,N.
be a positive interpolating operator with nodes
(see (8) of §13).
s1,...,sN
We define
N
Lg(s,y) =
I
are as in (8) of
w.
to conclude that Lg(s,y) < 0,
y
13.
We next invoke Theorem (10) of §13
satisfies (13) if and only if s E S.
Here the discretization (12), (13) of Program (PG) is equivalent to replacing
g(s,y)
by
Lg(s,y)
in (11).
For convergence results based on this
24.
Nonlinear Semi-Infinite Programs
191
fact see Gustafson (1981).
We note that the numerical solution of the discretized problem (12), (13) is a nontrivial task along with the
ties (13) are consistent and the set
K
verification that the inequaliof (9) is nonempty and compact.
These matters must be settled analytically, if possible,
This is in
sharp contrast to Problem (P) where the questions of the consistency and boundedness of the discretized problem are answered as a result of the simplex calculations. The problem (12), (13) may be treated using the algorithm in Han (1977) or the variant developed by Powell (1978).
an approximate solution
y
Thus we may calculate
It can be used to determine neces-
to (PG).
sary conditions which must be met by optimal solutions to (PG). ment parallels that in §16. Let
y*
i)
ii)
be an optimal solution to (PG).
g(s,y*) < 0,
s E S;
There are
points
q
g(sj,y*) = 0,
In the first case
y*
Our argu-
See also Gustafson (1981) and Watson (1981). Then two cases are possible:
such that
s E S
i = 1,...,q.
(14)
is a solution to the equation
VG(y) = 0.
(15)
But (15) may have other solutions besides
y*.
Thus one would need to
determine all solutions to (15) and seek out those which meet (11) and render
a minimum.
G
Next consider Case ii).
Put
f(s) = g(s,y*).
Then
f
has a local maximum at
(16)
sj, j = 1,...,q.
Arguing as in §16, we
derive conditions apart from (14) which must be met by
Hence
y*.
may be considered as a solution to the problem of minimizing to (14) and the constraints generated by the fact that
f
G
y*
subject
from (16) as-
sumes a maximum at s, j = 1,...,q. In the numerical treatment one approximates
y*
by
y, a calculated
optimal solution to (12), (13), and derives the constraints by replacing the unknown
y*
by the calculated
y
in (14) and (16).
at a nonlinear constrained optimization problem.
Hence we arrive
Using Lagrange multi-
pliers as described in Luenberger (1969), Chap. 9, we may derive a nonlinear system of equations which subsequently is solved numerically, e.g. by means of the Newton-Raphson method.
Thus we get a direct generaliza-
tion of the computational procedures described in Chapter VII.
An alterna-
192
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
tive approach is to apply the algorithms by Han and Powell which were mentioned earlier.
In either case an independent verification of the
optimality of the calculated solution is called for.
References
Andreasson, D. 0. and Watson, G. A.: Linear Chebyshev approximation without Chebyshev sets, BIT 16 (1976), 349-362. Bartels, R. H.: A penalty linear programming method using reducedgradient basis-exchange techniques, Linear Algebra and Appl. 29 (1980), 17-32.
Bartels, R. H. and Golub, G. H.: The simplex method of linear programming using LU-decompositions, CACM 12 (1969), 266-268.
Bartels, R. H., Stoer, J. and Zenger, Ch.: A realization of the simplex method based on triangular decompositions. In: "Linear Algebra", J. H. Wilkinson and C. Reinsch (Eds.), Springer-Verlag, BerlinHeidelberg-New York (1971). Blum, E. and Oettli, W.: Mathematische Optimierung, Springer-Verlag, Berlin-Heidelberg-New York (1975).
L'algorithme d'exchange en optimisation convexe, These, Carasso, C.: Grenoble (1973). Charnes, A. and Cooper, W. W.: Management Models and Industrial Applications of Linear Programming, Vols. I,II, J. Wiley & Sons, New York (1961).
Charnes, A., Cooper, W. W. and Henderson, A.: An Introduction to Linear Programming, J. Wiley $ Sons, New York (1953). Charnes, A., Cooper, W. W. and Kortanek, K. 0.: Duality, Haar programs and finite sequence spaces, Proc. Nat. Acad. Sci. U.S. 48 (1962), 783-786.
Charnes, A., Cooper, W. W. and Kortanek, K. 0.: Semi-infinite programs which have no duality gap, Management Science 12 (1965), 113-121. Cheney, E. W.: Introduction to Approximation Theory, McGraw-Hill, New York (1966). Collatz, L.:
Aufgaben monotoner Art, Arch. Math. 3 (1952), 366-376.
Approximation von Funktionen bei einer oder mehreren Veranderlichen, ZAMM 36 (1956), 198-211.
Collatz,.,L.:
Collatz, L. and Krabs, W.: Stuttgart, (1973).
Approximationstheorie, B. G. Teubner,
Collatz, L. and Wetterling, W.: Optimierungsaufgaben, Zweite Auflage, Springer-Verlag, Berlin-Heidelberg-New York (1971). 193
194
REFERENCES
0
Dahlquist, G. and Bjorck, A.: Numerical Methods, Prentice-Hall, Englewood Cliffs, New Jersey (1974).
Linear Programming $ Extensions, Princeton University Dantzig, G. B.: Press, Princeton, New Jersey (1963). In "Linear Inequalities and Related Duffin, R. J.: Infinite programs. Systems", H. W. Kuhn and A. W. Tucker (Eds.), Princeton University Press, Princeton, New Jersey (1956), 157-170. Eckhardt, U.: Theorems on the dimension of convex sets, Linear Algebra and Appl. 12 (1975), 63-76.
Eggleston, H. G.:
Convexity, Cambridge University Press, Cambridge (1958).
Fahlander, K.: Computer programs for semi-infinite optimization, TRITANA-7312, Department of Numerical Analysis and Computing Science, Royal Institute-of Technology, S-10044 Stockholm 70, Sweden.
Gill, P. E. and Murray, W.: A numerically stable form of the simplex algorithm, Linear Algebra and Appl. 7 (1973), 99-138. In: "SemiGlashoff, K.: Duality theory of semi-infinite programming. infinite programming", Proc. Int. Colloqu. Bonn. R. Hettich (Ed.), Lecture Notes in Control and Information Sciences 15, Springer-Verlag, Berlin-Heidelberg-New York (1979), 1-16.
Glashoff, K. and Gustafson, S.-A.: Numerical treatment of a parabolic boundary-value control problem, J. Opt. Th. Appl. 19 (1976), 645-663.
Einfuhrung in die Lineare Optimierung, Glashoff, K. and Gustafson, S.-A.: Wissenschaftliche Buchgesellschaft, Darmstadt, (1978). Gorr, W., Gustafson, S.-A. and Kortanek, K. 0.: Optimal control strategies for air quality standards and regulatory policies, Environment and Planning 4 (1972), 183-192. 0
Gustafson, S.-A.: On the computational solution of a class of generalized moment problems, SIAM J. Numer. Anal. 7 (1970), 343-357. 0
A general three-phase algorithm for nonlinear semi-infinite programming, in Y. P. Brans (Ed.), Operations Research '81, NorthHolland Publ. Co., Amsterdam-New York-Oxford (1981), 495-508.
Gustafson, S. -A.:
0
Numerical treatment of a class of Gustafson, S.-A. and Kortanek, K. 0.: semi-infinite programming problems, Nav. Res. Log. Quart. 20 (1973), 477-504. 0
On the calculation of optimal longGustafson, S.-A. and Kortanek, K. 0.: term air pollution abatement strategies for multiple-source areas, Proc. Sixth NATO/CCMS Expert Panel on Air Poll. Model., (1975). Linear Programming, Addison-Wesley Publ. Comp., Reading, Hadley, G.: Mass., 3rd printing (1964). Han, S. P.: A globally convergent method for nonlinear programming, J. Opt. Th. Appl. 22 (1977), 297-309.
A Newton-method for nonlinear Chebyshev approximation, In: Hettich, R.: "Approx. Theory", Proc. Int. Colloqu. Bonn, Lecture Notes Math., 556, Springer-Verlag, Berlin-Heidelberg-New York (1976), 222-236. "Semi-infinite Programming", Lecture Notes in Control Hettich, R. (Ed.): and Information Sciences 15, Springer-Verlag, Berlin-HeidelbergNew York (1979). Numerische Methoden der Approximation and Hettich, R. and Zencke, P.: semi-infiniten Optimierung, Teubner, Stuttgart, 1982.
References
195
Hildenbrand, K. and Hildenbrand, W.: Lineare ökonomische Modelle, Springer Hochschultext, Berlin-Heidelberg-New York (1975).
Hoffmann, K.-H. and Klostermair, A.: A semi-infinite linear programming procedure and applications to approximation problems in optimal control. Approx. Theory II, Proc. Int. Symp. Austin (1976), 379-389.
Judin, D. B. and Golstein, E. G.: Lineare Optimierung I, Akademie-Verlag, Berlin (1968).
Karlin, S. and Studden, W. J.: Tchebycheff Systems: with Applications in Analysis and Statistics, Interscience Publishers, New York-London-Sydney (1966).
Krabs, W.: Optimierung und Approximation, B. G. Teubner, Stuttgart (1975).
Lorentz, G. G.: Approximation of Functions, Holt, Rinehart and Winston, New York (1966).
Luenberger, D. G.: Optimization by Vector Space Methods, John Wiley & Sons, New York-London-Sydney-Toronto (1969).
Powell, M. J. D.: A fast algorithm for nonlinearly constrained optimization calculations. In: "Numerical Analysis", G. A. Watson (Ed.), Lecture Notes in Mathematics 630, Springer-Verlag, Berlin-Heidelberg-New York (1978).
Protter, M. H. and Weinberger, H. F.: Maximum Principles in Differential Equations, Prentice-Hall, Englewood Cliffs, New Jersey (1967).
Stewart, G. W.: Introduction to Matrix Computations, Academic Press, New York and London (1973).
Stoer, J.: Einführung in die Numerische Mathematik, 2. Auflage, Springer-Verlag, Berlin-Heidelberg-New York (1976).
Watson, G. A.: One-sided approximation and operator equations, J. Inst. Maths. Applic. 12 (1973), 197-208.
Watson, G. A.: On the best linear one-sided Chebyshev approximation, J. Approx. Theory 7 (1973), 48-58.
Watson, G. A.: Globally convergent methods for semi-infinite programming, Department of Mathematics, University of Dundee (1981).
Index
Absolute value for vector, 8
Activity, 3, 25
Air pollution, 17, 184
Annual mean concentration, 17
Andreasson, 44, 147
Basic set, 95
Basic solution, 93, 95
Björck, 51, 115
Bounded state, 5
Boundary point, 8
Carathéodory, 65
Center of a sphere, 8
Center of gravity, 140
Charnes, vi, 36, 92
Chebyshev, v
Chebyshev polynomial, 52
Chebyshev system, 48, 153
Closed half-space, 7
Closed set, 9
Compact set, 9
Complementary slackness lemma, 25
Complementary slackness theorem, 95
Consistent problem, 5
Constraint, 4
Conic hull, 60
Convex combination, 59
Convex cone, 60
Convex hull, 59
Convex conic hull, 60
Cooper, vi, 36, 92
Dantzig, v, 92
Dahlquist, 51, 115
Defect, 33
Defect diagram, 33, 81
Degenerate basic solution, 95, 137
Discretization, 15, 109, 113
Disposal-activity, 26
Distance, 8
Double dualization, 30
Distributed parameters, 175
Dual linear program, 28
Dual pair, 24
Dual problem, 24, 39, 66, 129
Duality, 27
Duality gap, 33, 36, 79
Duality lemma, 20
Dualization, 24
Duffin, vi
Eckhardt, 36
Elements of a matrix, 6
Equivalent norms, 9
Euclidean norm of a vector, 8
Exchange step, 93, 97
Extended Chebyshev system of order two, 156, 160, 168
Factorization method, 116
Fahlander, 112
Fall-out, 184
Farkas' Lemma, 82
Feasible point, 2
Feasible problem, 5
Feasible set, 4
Finitely generated, 74
First Duality Theorem, 79
Gaussian elimination, 115, 119
General Assumption, 70, 82
General optimization problem, 4
Generalized quadrature rule of the Gaussian type, 158
Gorr, 19
Hyperplane, 7
Henderson, vi, 92
Hildenbrand, 27
Householder transformation, 119
Inconsistent, 5
Index of a set, 159
Inner point of a set, 8
Integro-differential equation, 181
Intensity, 3
Interior of a set, 8
Inverse of a matrix, 7
Karlin, 159
Kortanek, vi, 19, 36
Length of a vector, 8
Linear combination, 61
Linear mapping, 6
Linear optimization problem, 10, 12, 45
Linear program, 14, 23
Linear programming, 14, 23, 27, 81, 106
Linear system of equations, 6
LR-decomposition, 123
Mass, 135
Mass-point, 135
Matrix, 5
Maximization problem, 4, 5, 31
Maximal representation, 89
Minimization problem, 4
Minimum point, 2
Moment cone, 61, 80
Monotonic type, 182
Nonsingular matrix, 7
Normal vector, 7
One-sided approximation, 167
Open half-space, 7
Open set, 8
Open sphere, 8
Operator equation, 181
Optimal point, 2
Optimal solution, 2
Permissible set, 2, 4
Pivot element, 119
Positive interpolating operator, 110, 190
Preference function, 2, 4
Production model, 3
Production plan, 3, 25
Projection Theorem, 77
Protter, 182
Radius of a sphere, 8
Rank of a matrix, 6
Reduction Theorem, 63
Regular basic solution, 102
Regular matrix, 7
Regularity condition, 70
Regularized problem, 128
Regularization, 88
Roughness of a grid, 112
Row pivoting, 121
Scalar product, 8
Second Duality Theorem, 84
Semi-infinite programs, vi
Separating hyperplane, 75, 76
Separation Theorem, 78
Side-condition, 4
Siting of power plant, 1
Slack vector, 28
Slater's condition, 70, 73, 80, 82
Solution, 2
Solvability, 69
Square matrix, 6
State diagram, 30, 81
Stewart, 116
Stoer, 116
Studden, 159
Superconsistent, 70, 84
Supporting hyperplane, 82, 83
Transfer function, 17
Transpose of a matrix, 6
Triangular factorization, 116, 118, 119
Uniform approximation, 37, 38, 73
Value of an optimization problem, 4, 15
Vandermonde matrix, 47
Vector, 5
Vector norm, 7
Watson, 44, 147
Weak duality theorem, 24
Weierstrass' theorem, 9
Weinberger, 182
Zero of multiplicity 2, 156