Applied Mathematical Sciences, Volume 45
Klaus Glashoff
Sven-Åke Gustafson

Linear Optimization and Approximation
An Introduction to the Theoretical Analysis and Numerical Treatment of Semi-infinite Programs

With 20 Illustrations

Springer-Verlag New York Heidelberg Berlin
Klaus Glashoff
Universität Hamburg
Institut für Angewandte Mathematik
Bundesstrasse 55
2 Hamburg 13
Federal Republic of Germany

Sven-Åke Gustafson
Department of Numerical Analysis and Computing Sciences
Royal Institute of Technology
S-10044 Stockholm 70
Sweden
and
Centre for Mathematical Analysis
Australian National University
P.O. Box 4
Canberra, ACT 2600
Australia
AMS Subject Classifications: 90C05, 49D35
Library of Congress Cataloging in Publication Data
Glashoff, Klaus, 1947-
Linear optimization and approximation.
(Applied mathematical sciences; v. 45)
Translation of: Einführung in die lineare Optimierung.
Includes bibliographical references and index.
1. Mathematical optimization. 2. Duality theory (Mathematics). I. Gustafson, Sven-Åke, 1938- . II. Title. III. Series: Applied mathematical sciences (Springer-Verlag New York Inc.); v. 45.
QA1.A647 vol. 45 [QA402.5] 510s [519.7'2] 83-647
Original edition © 1978 by Wissenschaftliche Buchgesellschaft, Darmstadt, West Germany. (First published in the series: "Die Mathematik. Einführungen in Gegenstand und Ergebnisse ihrer Teilgebiete und Nachbarwissenschaften.")
English edition © 1983 by Springer-Verlag New York Inc. All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, U.S.A. Printed and bound by R.R. Donnelley & Sons, Harrisonburg, VA. Printed in the United States of America.
987654321 ISBN 0-387-90857-9 ISBN 3-540-90857-9
Springer-Verlag New York Heidelberg Berlin Springer-Verlag Berlin Heidelberg New York
Preface
A linear optimization problem is the task of minimizing a linear real-valued function of finitely many variables subject to linear constraints; in general there may be infinitely many constraints. This book is devoted to such problems. Their mathematical properties are investigated and algorithms for their computational solution are presented. Applications are discussed in detail.

Linear optimization problems are encountered in many areas of applications. They have therefore been subject to mathematical analysis for a long time. We mention here only two classical topics from this area: the so-called uniform approximation of functions, which was used as a mathematical tool by Chebyshev in 1853 when he set out to design a crane, and the theory of systems of linear inequalities, which had already been studied by Fourier in 1823.
We will not treat the historical development of the theory of linear optimization in detail. However, we point out that the decisive breakthrough occurred in the middle of this century. It was urged on by the need to solve complicated decision problems where the optimal deployment of military and civilian resources had to be determined. The availability of electronic computers also played an important role. The principal computational scheme for the solution of linear optimization problems, the simplex algorithm, was established by Dantzig about 1950. In addition, the fundamental theorems on such problems were rapidly developed, based on earlier published results on the properties of systems of linear inequalities.

Since then, the interest of mathematicians and users in linear optimization has been sustained. New classes of practical applications are being introduced continually and special variants of the simplex algorithm and related schemes have been used for the computational treatment of practical problems of ever-growing size and complexity. The theory of "classical" linear optimization problems (with only finitely many linear constraints) had almost reached its final form around 1950; see e.g. the excellent book by A. Charnes, W. W. Cooper and A. Henderson (1953). Simultaneously there were great efforts devoted to the generalization and extension of the theory of linear optimization to new areas.
Thus nonlinear optimization problems were attacked at an early date. (This area plays only a marginal role in our book.) Here, connections were found with the classical theory of Lagrangian multipliers as well as to the duality principles of mechanics. The latter occurred in the framework of convex analysis.

At the same time the theory of infinite linear optimization came into being. It describes problems with infinitely many variables and constraints. This theory also found its final form rapidly; see the paper by R. J. Duffin (1956).
A special but important class of infinite linear optimization problems are those problems where the number of variables is finite but the number of linear inequality constraints is arbitrary, i.e. may be infinite. This type of problem, which constitutes a natural generalization of the classical linear optimization problem, appears in the solution of many concrete examples. We have already mentioned the calculation of uniform approximation of functions, which plays a major role in the construction of computer representations of mathematical expressions. Uniform approximation can also be successfully used in the numerical treatment of differential equations originating in physics and technological problems. Using an investigation by Haar from 1924 as a point of departure, A. Charnes, W. W. Cooper and K. O. Kortanek in 1962 gave the fundamental mathematical results for the last-mentioned class of linear optimization problems (with the exception of those questions which were already settled by Duffin's theory).

This class of optimization problems, often called semi-infinite programs, will be the main topic of the present book. The "classical" linear optimization problems, called linear programs, will occur naturally as a special case. Whether the number of inequality constraints is finite is a matter of minor importance in the mathematical theory of linear optimization problems. The great advantage of treating such a general class of problems, encompassing so many applications, need not, fortunately, be achieved by means of a correspondingly higher level of mathematical sophistication. In our account we have endeavored to use mathematical tools which are as simple as possible. To understand this book it is only necessary to master the fundamentals of linear algebra and n-dimensional analysis. (This theory is summarized in §2.) Since we have avoided all unnecessary mathematical abstractions, geometrical arguments have been used as much as possible. In this way we have escaped the temptation to complicate simple matters by introducing the heavy apparatus of functional analysis.

The central concept of our book is that of duality.
Duality theory is not investigated for its own sake but as an effective tool, in particular for the numerical treatment of linear optimization problems. Therefore all of Chapter II has been devoted to the concept of weak duality. We give some elementary arguments which serve to illustrate the fundamental ideas (primal and dual problems). This should give the reader a feeling for the numerical aspects of duality. In Chapter III we discuss some applications of weak duality to uniform approximation where the emphasis is again placed on numerical aspects.

The duality theory of linear optimization is investigated in Chapter IV. Here we prove theorems on the existence of solutions to the optimization problems considered. We also treat the so-called strong duality, i.e. the question of equality of the values of the primal and dual problems. The "geometric" formulation of the dual problem, introduced here, will be very useful for the presentation of the simplex algorithm which is described in the chapter to follow.
In Chapter V we describe in great detail the principle of the exchange step, which is the main building block of the simplex algorithm. Here we dispense with the computational technicalities which dominate many presentations of this scheme. The nature of the simplex algorithm can be explained very clearly using duality theory and the language of matrices and without relying on "simplex tableaux", which do not appear in our text.

In Chapter VI we treat the numerical realization of the simplex algorithm. It requires that a sequence of linear systems of equations be solved. Our presentation includes the stable variants of the simplex method which have been developed during the last decade.

In Chapter VII we present a method for the computational treatment of a general class of linear optimization problems with infinitely many constraints. This scheme was described for the first time in Gustafson (1970). Since then it has been successfully used for the solution of many practical problems, e.g. uniform approximation over multidimensional domains (also with additional linear side-conditions), calculation of quadrature rules, control problems, and so on.

In Chapter VIII we apply the ideas of the preceding three chapters to the special problem of uniform approximation over intervals. The classical Remez algorithm is studied and set into the general framework of linear optimization.
The concluding Chapter IX contains several worked examples designed to elucidate the general approach of this book. We also indicate that the ideas behind the computational schemes described in our book can be applied to an even more general class of problems.

The present text is a translated and extended version of Glashoff-Gustafson (1978). Chapters VIII and IX are completely new and Chapter IV is revised. More material has been added to Chapters III and VII. These changes and additions have been carried out by the second author, who is also responsible for the translation into English. Professor Harry Clarke, Asian Institute of Technology, Bangkok, has given valuable help with the latter task.

We hope that this book will provide theoretical and numerical insights which will help in the solution of practical problems from many disciplines. We also believe that we have clearly demonstrated our conviction that mathematical advances generally are inspired by work on real world problems.
Table of Contents

Preface

CHAPTER I. INTRODUCTION AND PRELIMINARIES
§1. Optimization Problems
§2. Some Mathematical Prerequisites
§3. Linear Optimization Problems

CHAPTER II. WEAK DUALITY
§4. Duality Lemma and Dual Problem
§5. State Diagrams and Duality Gaps

CHAPTER III. APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
§6. Uniform Approximation
§7. Polynomial Approximation

CHAPTER IV. DUALITY THEORY
§8. Geometric Interpretation of the Dual Problem
§9. Solvability of the Dual Problem
§10. Separation Theorem and Duality
§11. Supporting Hyperplanes and Duality

CHAPTER V. THE SIMPLEX ALGORITHM
§12. Basic Solutions and the Exchange Step
§13. The Simplex Algorithm and Discretization

CHAPTER VI. NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
§14. Stable Variants of the Simplex Algorithm
§15. Calculating a Basic Solution

CHAPTER VII. A GENERAL THREE-PHASE ALGORITHM
§16. Nonlinear Systems Derived From Optimality Conditions
§17. A General Computational Scheme

CHAPTER VIII. APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
§18. General Properties of Chebyshev Systems
§19. One-sided Approximation and Generalized Quadrature Rules of the Gaussian Type
§20. Computing the Best Approximation in the Uniform Norm

CHAPTER IX. EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
§21. A Control Problem with Distributed Parameters
§22. Operator Equations of Monotonic Type
§23. An Air Pollution Abatement Problem
§24. Nonlinear Semi-Infinite Programs

References

Index
Chapter I
Introduction and Preliminaries
§1. OPTIMIZATION PROBLEMS

Optimization problems are encountered in many branches of technology, in science, and in economics as well as in our daily life. They appear in so many different shapes that it is useless to attempt a uniform description of them or even try to classify them according to one principle or another. In the present section we will introduce a few general concepts which occur in all optimization problems. Simple examples will elucidate the presentation.
(1) Example: Siting of a power plant. Five major factories are located at P1, P2, ..., P5. A power plant to supply them with electricity is to be built and the problem is to determine the optimal site for this plant. The transmission of electrical energy is associated with energy losses which are proportional to the amount of transmitted energy and to the distance between power plant and energy consumer. One seeks to select the site of the plant so that the combined energy loss is rendered a minimum.

P1, P2, ..., P5 are represented by points in the plane with the coordinates P1 = (x1,y1), ..., P5 = (x5,y5). The distance between the two points P = (x,y) and P̄ = (x̄,ȳ) is given by

d(P,P̄) = {(x-x̄)² + (y-ȳ)²}^{1/2}.

Denote the transmitted energy quantities by E1, ..., E5. Our siting problem may now be formulated. We seek, within a given domain G of the plane, a point P̄ such that the following function assumes its minimal value at P̄:

E1 d(P̄,P1) + E2 d(P̄,P2) + ... + E5 d(P̄,P5).

In order to introduce some terminology we reformulate this task. We define the real-valued function f of two real variables x, y through

f(x,y) = E1{(x-x1)² + (y-y1)²}^{1/2} + ... + E5{(x-x5)² + (y-y5)²}^{1/2}.

We then arrive at the optimization problem: Determine numbers x̄, ȳ such that P̄ = (x̄,ȳ) ∈ G and

f(x̄,ȳ) ≤ f(x,y) for all (x,y) ∈ G.

Fig. 1.1. Siting of power plant
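To make the siting example concrete, here is a minimal computational sketch. The factory positions, the energy quantities E1, ..., E5, and the rectangular domain G are hypothetical illustration data of our own choosing, not taken from the text, and a crude grid search merely approximates a minimum point:

```python
import math

# Hypothetical illustration data: five factory positions and the
# transmitted energy quantities E1,...,E5 (not from the text).
factories = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
energies = [2.0, 1.0, 3.0, 1.5, 2.5]

def f(x, y):
    """Preference function: combined transmission loss for a plant at (x, y)."""
    return sum(E * math.hypot(x - xi, y - yi)
               for E, (xi, yi) in zip(energies, factories))

# Crude grid search over an assumed rectangular domain G = [0,4] x [0,5].
best = min((f(0.05 * i, 0.05 * j), 0.05 * i, 0.05 * j)
           for i in range(81) for j in range(101))
print("approximate minimum point:", best[1:], "value:", round(best[0], 3))
```

A finer grid, or a standard nonlinear minimizer started from the grid winner, would sharpen the approximation; the point here is only that f and G fully specify the problem.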
All important concepts associated with optimization problems may be illustrated by this example: f is called a preference function, G the permissible set, and the points of G are called permissible or feasible. Thus the optimization problem means that one should seek a permissible point such that f assumes its minimal value with respect to the permissible set. If such a point does exist, it is called an optimal point (for the problem considered), or optimal solution, or minimum point of f in G.
In the analysis of an optimization problem it is important to verify that an optimal solution does exist, i.e. that the problem is solvable. This is not always the case. As an illustration of this fact we note that the functions f1(x) = -x and f2(x) = e^{-x} do not have any minimum points in the set of all real numbers. On the other hand, if an optimization problem is solvable, a minimum point may not be unique. In many applications it is required to determine all minimum points which the preference function has in the permissible set.
It is of course of no use to formulate a task, appearing in economics or technology, as an optimization problem when this problem cannot be solved. A formulation as an optimization problem is thus advantageous only when the mathematical structure of this task can be investigated and suitable theoretical and computational tools can be brought to bear. Oftentimes, "applications" to economics or management are proposed whereby very complicated optimization problems are constructed, but it is not pointed out that neither theoretical nor numerical treatment of the problem appears to be within reach, now or in the near future. It should always be remembered that only some of the relevant factors can be incorporated when a decision problem is formulated as an optimization problem. There are always decision criteria which cannot be quantified and whose inclusion into a mathematical model is of doubtful value. Thus, in the siting problem discussed above, there are many political and ecological factors which cannot be accounted for in a mathematical model. This indicates that there is, in principle, a limit to what can be gained by the mathematization of social processes. This difficulty cannot, as a rule, be overcome by resorting to more complicated models (control theory, game theory, etc.) even if it sometimes may be concealed. The situation is quite different for technical systems. Since nowadays the mathematization and also the "optimization" of social processes are pushed forward with great energy, we find the critical remark above to be justified.
(2) Example: Production model. We consider a firm which produces or consumes n goods G1, ..., Gn (e.g. raw materials, labor, capital, environmental pollutants). An activity of the firm is represented by n numbers (a1,...,an) where a_r indicates the amount of good G_r which is produced or consumed when the activity is taking place with intensity 1 (measured in suitable units). We assume that the firm can select various activities P_s. Thus the firm's technology has the property that to each s in a fixed index set S (which may be finite or infinite) there are n numbers (a_1(s),...,a_n(s)). A production plan of the firm is defined by selecting a (finite) number q of activities P_{s_1}, ..., P_{s_q} and prescribing that they are carried out with the intensities x1, ..., xq, where x_i ≥ 0, i = 1,2,...,q. We assume that the production process is linear, i.e. for the given production plan the amount of good G_r which is produced or consumed is given by

a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq.

We shall further assume that the activity P_s causes the profit (or cost) b(s). Hence the profit achieved by the chosen production plan is given by

(3)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq.
The optimization problem of the firm is to maximize its profit by proper choice of its production plan, i.e. it must select finitely many activities P_{s_1}, ..., P_{s_q} and the corresponding intensities x1, x2, ..., xq such that the expression (3) assumes the greatest value possible.

The choice of activities and intensities is restricted by the fact that only finite amounts of the goods G1, ..., Gn are available. In practice this is true only for some of the goods, but for simplicity of presentation we want to assume that all goods can only be obtained in limited amounts:

(4)  a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq ≤ c_r,  r = 1,2,...,n.

Thus (4) defines n side-conditions which constrain the feasible activities and intensities. The optimization problem can thus be cast into the form: Determine a finite subset {s1,...,sq} of the index set S and the real numbers x1, ..., xq such that the expression (3) is rendered a maximum under the constraints (4) and the further side-conditions

(5)  x_i ≥ 0,  i = 1,2,...,q.
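For a finite index set, the problem (3), (4), (5) is an ordinary linear program that any LP solver can handle. The sketch below is our own illustration with invented data for n = 2 goods and q = 3 activities; it assumes scipy.optimize.linprog, which minimizes, so the profit (3) is negated:

```python
import numpy as np
from scipy.optimize import linprog

# Invented illustration data: n = 2 goods, q = 3 activities.
# a[r][i] = amount of good r consumed by activity i at unit intensity.
a = np.array([[1.0, 2.0, 1.0],      # good G1
              [3.0, 1.0, 2.0]])     # good G2
c_goods = np.array([10.0, 15.0])    # available amounts c_r, constraint (4)
b_profit = np.array([2.0, 3.0, 2.5])  # profits b(s_i) in expression (3)

# Maximize b^T x subject to a x <= c_goods and x >= 0, i.e. (3)-(5);
# linprog minimizes, so we pass the negated profit vector.
res = linprog(-b_profit, A_ub=a, b_ub=c_goods, bounds=[(0, None)] * 3)
print("optimal intensities x:", res.x, "maximal profit:", -res.fun)
```

The same pattern scales to any finite selection of activities; which activities are worth selecting at all is exactly the question the duality theory of Chapters II and IV addresses.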
(6) Remark. A maximization problem is transformed into an equivalent minimization problem by multiplying its preference function by -1.

The general optimization problem. Let M be a fixed set and let f be a real-valued function defined on M. We seek an element x̄ in M such that

f(x̄) ≤ f(x) for all x ∈ M.

M is called the feasible or permissible set and f is termed the preference function. We remark here that the feasible set is, as a rule, not explicitly given but is defined through side-conditions (often called constraints), as in Example (2).

(7) Definition. The number v given by

v = inf{f(x) | x ∈ M}

is called the value of the corresponding optimization problem.
If M is the empty set, i.e. there are no feasible points, the optimization problem is said to be inconsistent and we put v = +∞. If feasible points do exist we term the optimization problem feasible or consistent. If v = -∞, the optimization problem is said to be "unbounded from below". Thus every minimization problem must be in one and only one of the following three "states" IC, B, UB:

IC = Inconsistent; the feasible set is empty and the value of the problem is +∞.

B = Bounded; there are feasible points and the value is finite.

UB = Unbounded; there are feasible points, the preference function is unbounded from below, and the value is -∞.

The value of a maximization problem is -∞ in the state IC, finite in state B, and +∞ in the state UB.
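Remark (6) is what one uses constantly in computational practice: maximization routines are rarely needed, since minimizing the negated preference function does the same job. A minimal sketch (our own illustration, assuming scipy is available):

```python
from scipy.optimize import minimize_scalar

# Remark (6) in code: maximize g(x) = 1 - (x - 2)**2 over R
# by minimizing its negation -g.
res = minimize_scalar(lambda x: -(1.0 - (x - 2.0) ** 2))
print("maximum point:", res.x, "maximal value:", -res.fun)  # ~2.0 and ~1.0
```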
§2. SOME MATHEMATICAL PREREQUISITES

The successful study of this book requires knowledge of some elementary concepts of mathematical analysis as well as linear algebra. We shall summarize the notations and some mathematical tools in this section.

(1) Vectors. We denote the field of real numbers by R, and by R^n the n-dimensional space of all n-tuples of real numbers

(2)  x = (x1, x2, ..., xn).

In R^n, the usual vector space operations are defined: componentwise addition of vectors and multiplication by scalars (i.e. real numbers). We assume that the reader is familiar with the concepts of "linear independence", "basis", and "subspace". The zero vector of R^n is written 0. n-tuples of the form (2) are also referred to as "points".

(3) Matrices. An m × n matrix A (m, n ≥ 1) is a rectangular array of real numbers a_{ik} (i = 1,2,...,m, k = 1,2,...,n),

A = [ a11  a12  ...  a1n ]
    [ a21  a22  ...  a2n ]
    [ ..................  ]
    [ am1  am2  ...  amn ]

The numbers a_{ik} are termed the elements of the matrix A; a_{ik} is situated in row number i and column number k. To each given matrix A we define its transpose A^T by

A^T = [ a11  a21  ...  am1 ]
      [ a12  a22  ...  am2 ]
      [ ..................  ]
      [ a1n  a2n  ...  amn ]

Every vector x ∈ R^n may be considered an n × 1 matrix. In order to save space we write, instead of (2),

x^T = (x1, x2, ..., xn).

We note that (A^T)^T = A. The reader is supposed to know elementary matrix operations (addition and multiplication of matrices).

(4) Linear mappings. Every m × n matrix A defines a linear mapping of R^n into R^m whereby every vector x ∈ R^n is mapped onto a vector y ∈ R^m via

(5)  y = Ax.

Using the definition of matrix multiplication we find that the components of y are to be calculated according to

y_i = a_{i1}x1 + a_{i2}x2 + ... + a_{in}xn,  1 ≤ i ≤ m.

Denote the column vectors of A by a1, a2, ..., an. Then we find

(6)  Ax = a1 x1 + a2 x2 + ... + an xn.

Equation (6) thus means that the vector y in (5) is a linear combination of the column vectors of A.

(7) Linear systems of equations. The task of determining x in (5) is one of the fundamental problems of linear algebra. Now let a fixed y be given in (5). Then (5) is called a linear system of equations with n unknowns x1, x2, ..., xn and m equations. We assume that the solvability theory of (5) (existence and uniqueness of solutions) is known to the reader. An example: from (6) we conclude that (5) is solvable for each y ∈ R^m if the column vectors of A span all of R^m, i.e. if A has the rank m. It is equally simple to verify that (5) has at most one solution if the column vectors of A are linearly independent.

The case when A is a square matrix, n × n, is of particular interest. Then (5) has an equal number of equations and unknowns. Then the linear system Ax = y has a unique solution for each y ∈ R^n if and only if the column vectors a1, a2, ..., an of A form a basis of R^n, i.e. are linearly independent. Then the matrix A is said to be regular (or nonsingular). In this case there exists an n × n matrix A^{-1} with the properties

A^{-1}(Ax) = x,  A(A^{-1}x) = x,  all x ∈ R^n.

A^{-1} is called the inverse of A, and the linear system of equations (5) has the unique solution x = A^{-1}y.

(8) Hyperplanes. A vector y ∈ R^n, y ≠ 0, and a number η ∈ R are given. Then we denote by H(y;η) the hyperplane consisting of the set of all points x ∈ R^n such that

y^T x = y1 x1 + y2 x2 + ... + yn xn = η.

y is called the normal vector of the hyperplane. For any two vectors x and z in H(y;η) we have y^T(x-z) = 0. A hyperplane H(y;η) partitions R^n into three disjoint sets, namely H(y;η) and the two "open half-spaces"

A1 = {x | y^T x < η},  A2 = {x | y^T x > η}.

The linear system of equations (5) also admits the interpretation that the vector x must be in the intersection of the hyperplanes H(a^i; y_i) (i = 1,2,...,m), where the a^i here are the row vectors of the matrix A. Sets of the form A1 ∪ H(y;η) and A2 ∪ H(y;η) are termed closed half-spaces. They consist of all points x ∈ R^n such that y^T x ≤ η or y^T x ≥ η, respectively.

(9) Vector norms. We shall associate with each vector x ∈ R^n a real number ‖x‖. The mapping x → ‖x‖ shall obey the following laws:

(i)   ‖x‖ ≥ 0, all x ∈ R^n, and ‖x‖ = 0 for x = 0 only;
(ii)  ‖αx‖ = |α| ‖x‖, all x ∈ R^n, all α ∈ R;
(iii) ‖x+y‖ ≤ ‖x‖ + ‖y‖, all x ∈ R^n, y ∈ R^n.

Then ‖x‖ will be called the norm of the vector.

Exercise: Show that the following mapping defines a vector norm on R^n:

x → max{|x1|, |x2|, ..., |xn|}.

The most well-known norm is the Euclidean norm, which will be treated in the next subsection.

(10) Scalar product and Euclidean norm. The scalar product of two vectors x and y is defined to be the real number

x^T y = y^T x = x1 y1 + x2 y2 + ... + xn yn.

The real number

|x| = (x^T x)^{1/2} = (x1² + x2² + ... + xn²)^{1/2}

is called the Euclidean norm or length or absolute value of the vector x. The reader should verify that the mapping x → |x| defines a norm in the sense of (9). It is also easy to establish the "parallelogram law"

|x+y|² + |x-y|² = 2(|x|² + |y|²)  for all x, y ∈ R^n.

(11) Some topological fundamentals. We define the distance between two points x, y in R^n to be |x-y|. The set K_r(a), consisting of all points whose distance to a is less than r, a fixed positive number, is termed the open sphere with center a and radius r. Thus

K_r(a) = {x ∈ R^n | |x-a| < r}.

We are now in a position to introduce the fundamental topological structure of R^n. A point a is said to be an inner point of a subset A ⊂ R^n if there is a sphere K_r(a) which in its entirety belongs to A, K_r(a) ⊂ A. We will use the symbol Å for the set of all inner points of A. Å is also called the interior of A. A is termed open if A = Å. The point a is said to be a boundary point of the set A if every sphere K_r(a) contains both points in A and points which do not belong to A. The set of all boundary points of A is called the boundary of A and is denoted bd A. The union of A and its boundary is called the closure of A and is denoted Ā. The set A is said to be closed if A = Ā. The following relations always hold:

Å ⊂ A ⊂ Ā,  bd A = Ā \ Å.

The topological concepts introduced above have been defined using the Euclidean norm. This norm will be most often used in the sequel. However, one may define spheres in terms of other norms and in this way arrive at the fundamental topological concepts "inner points", "open sets", and so on, in the same manner as above. Fortunately it is possible to prove that all norms on R^n are equivalent in the sense that they generate the same topological structure on R^n: A set which is open with respect to one norm remains open with respect to all other norms. In order to establish this assertion one first verifies that if ‖·‖1 and ‖·‖2 are two norms on R^n there are two positive constants c and C such that

c‖x‖1 ≤ ‖x‖2 ≤ C‖x‖1  for all x ∈ R^n.

Based on these fundamental structures one can now define the main concepts of convergence of sequences and continuity of functions in the usual way. We suppose here the reader is familiar with these concepts.

(12) Compact sets. A subset A ⊂ R^n is said to be bounded when there is a real number r > 0 such that A ⊂ K_r(0). Closed bounded subsets of R^n will be termed compact. Compact subsets A of R^n have the following important property: Every infinite sequence {x_i}_{i≥1} of points in the set A has a convergent subsequence {x_{i_k}}_{k≥1}. If f: R^n → R^m is a continuous mapping, then the image f(A) of every compact set A is compact also. From this statement we immediately arrive at the following result, which also may be looked upon as an existence statement for optimization problems:

(13) Theorem of Weierstrass. Let A be a nonempty compact subset of R^n and f a real-valued continuous function defined on A. Then f assumes its maximum and minimum value on A, i.e. there exist points x̄ ∈ A and x̂ ∈ A such that

f(x̄) = max{f(x) | x ∈ A}  and  f(x̂) = min{f(x) | x ∈ A}.

It is recommended that the reader, as an exercise, carry out the proof of this simple but important theorem.
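For one concrete pair of norms, the equivalence inequality of (11) can be checked numerically: for the maximum norm of the exercise in (9) and the Euclidean norm of (10) the constants may be taken as c = 1 and C = √n, which is the classical estimate. A small sketch of our own, as illustration only:

```python
import math
import random

# Check c*||x||_1 <= ||x||_2 <= C*||x||_1 with ||.||_1 the maximum norm,
# ||.||_2 the Euclidean norm, c = 1 and C = sqrt(n), on random samples.
n = 5
for _ in range(1000):
    x = [random.uniform(-10.0, 10.0) for _ in range(n)]
    max_norm = max(abs(t) for t in x)
    eucl_norm = math.sqrt(sum(t * t for t in x))
    assert max_norm <= eucl_norm <= math.sqrt(n) * max_norm
print("equivalence constants c = 1, C = sqrt(n) confirmed on all samples")
```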
§3. LINEAR OPTIMIZATION PROBLEMS

An optimization problem shall be called a linear optimization problem (LOP) when the preference function is linear and the feasible domain is defined by linear constraint functions. Thus the preference function has the form

Σ_{r=1}^n c_r y_r,

where c is a fixed vector in R^n. The set of feasible vectors of an (LOP) will be defined as an intersection of half-spaces: Let S be a given index set (which may be finite or infinite). With each s ∈ S we associate a vector a_s ∈ R^n and a real number b_s. Then the set of feasible vectors of a linear optimization problem consists of all vectors y ∈ R^n lying in all half-spaces

(1)  {y | a_s^T y ≥ b_s},  s ∈ S.

We shall discuss two examples of sets of vectors defined by means of systems of linear inequalities. (In both cases we have n = 2.)

(2) Example. S = {1,2}; a1 = (2,3)^T, a2 = (-1,0)^T, b1 = 6, b2 = -3. In this case (1) becomes

2y1 + 3y2 ≥ 6,
-y1 ≥ -3.

This set is indicated in Figure 3.1 by the checkered area.

Fig. 3.1

(3) Example. Let S be the real interval [0,1]. (S now has infinitely many elements, in contrast to Example (2).) Let a_s = (1,s)^T and b_s = √s for all s ∈ [0,1]. The inequalities (1) then become

y1 + s y2 ≥ √s,  s ∈ [0,1].

The subset of the y1-y2-plane which is defined by these inequalities is drawn in Fig. 3.2. The two hyperplanes (in this case straight lines) y1 + s y2 = √s corresponding to s = 1 and s = 1/2 are marked in the figure.

Fig. 3.2. The checkered area is the set defined by means of the inequalities y1 + s y2 ≥ √s, s ∈ [0,1].

The "general" situation (for n = 2) is illustrated in Fig. 3.3. The hyperplanes corresponding to some particular a_s and b_s, s ∈ S, are indicated. S may be infinite; if so, it generates infinitely many hyperplanes.

Fig. 3.3

We note that the inequalities (1) may define bounded as well as unbounded subsets of R^n. Compare Fig. 3.2 with Fig. 3.3.

(4) Exercise. Set n = 2. Let S = {1,2,...}, and let a_s = (1,1/s)^T, b_s = 0, for s = 1,2,.... Draw the subset of the y1-y2-plane defined by (1). Show that this subset can be defined using two inequalities only!

(5) Exercise. Draw the subset of the y1-y2-plane defined through the infinitely many inequalities

-s y1 - √(1-s²) y2 ≥ -√(1-s²),  for every s ∈ [-1,1].

To summarize: A linear optimization problem is defined as follows:

Given: A vector c = (c1,c2,...,cn)^T ∈ R^n, a nonempty index set S, and for every s ∈ S a vector a_s ∈ R^n and a real number b_s.

Sought: A vector y ∈ R^n which solves the following problem (P):

(P)  Minimize c^T y subject to the constraints a_s^T y ≥ b_s, all s ∈ S.

We now introduce some alternative notations which will often be used in the sequel. We write a(s) instead of a_s and b(s) instead of b_s. Hence we arrive at the following two componentwise representations of the vector a(s) = a_s:

a_s = (a_{1s}, a_{2s}, ..., a_{ns})^T  and  a(s) = (a_1(s), a_2(s), ..., a_n(s))^T.

Thus the optimization problem (P) can also be written in the following form:

(P)  Minimize Σ_{r=1}^n c_r y_r subject to the constraints Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S.

One can use a particularly simple representation in the important special case when S has a finite number of elements, i.e. when (P) has only finitely many constraints. To discuss this case we put S = {s1, s2, ..., sm} where m ≥ 1. Then there occur m vectors a(s_i) (i = 1,2,...,m). The corresponding linear constraints take the following form

(6)  a_1(s1)y1 + a_2(s1)y2 + ... + a_n(s1)yn ≥ b(s1)
     a_1(s2)y1 + a_2(s2)y2 + ... + a_n(s2)yn ≥ b(s2)
     ...
     a_1(sm)y1 + a_2(sm)y2 + ... + a_n(sm)yn ≥ b(sm)

The nm numbers a_r(s_i) are combined into a matrix A with the vectors a(s_i) in its columns:

(7)  A = [ a_1(s1)  a_1(s2)  ...  a_1(sm) ]
         [ a_2(s1)  a_2(s2)  ...  a_2(sm) ]
         [ .............................. ]
         [ a_n(s1)  a_n(s2)  ...  a_n(sm) ]

If now the m numbers b(s_i), i = 1,2,...,m, are combined into the vector b = (b(s1), b(s2), ..., b(sm))^T, then the constraints (6) may be written

A^T y ≥ b.

On the other hand let a matrix A = (a_{rs}), (r = 1,2,...,n and s = 1,2,...,m) and a vector b = (b1,b2,...,bm)^T be given. Then the inequalities A^T y ≥ b become

a_{11}y1 + a_{21}y2 + ... + a_{n1}yn ≥ b1
a_{12}y1 + a_{22}y2 + ... + a_{n2}yn ≥ b2
...
a_{1m}y1 + a_{2m}y2 + ... + a_{nm}yn ≥ bm

This system of inequalities is expressed in the form of (6) by putting

S = {1,2,3,...,m}

and

a_r(s) = a_{rs}  for s = 1,2,...,m and r = 1,2,...,n.

(8) Example. Consider the system of inequalities

y1 + y2 ≥ 2
y1 + 3y2 ≤ 3
y1 ≥ 0
y2 ≥ 0.

The second inequality is multiplied by -1 and expressed in the form

-y1 - 3y2 ≥ -3.

In this case we have n = 2, m = 4. The matrix A becomes

A = [ 1  -1  1  0 ]
    [ 1  -3  0  1 ]

Every column corresponds to one constraint of the system of inequalities and the corresponding vector b is given by b = (2,-3,0,0)^T.

(9) Definition. A linear optimization problem with finitely many constraints will be called a linear program. Its standard form will be denoted (LP):

(LP)  Minimize c^T y under the constraints A^T y ≥ b.

Here A = (a_{rs}) is a given n by m matrix and b, c are given vectors in R^m and R^n respectively.

Linear programming, i.e. the algorithmic solution of linear optimization problems of the type (LP), is one of the most important areas of linear optimization. Therefore this special case will be treated separately and in detail in the sequel.

In the case that (1) defines infinitely many constraints (|S| = ∞)*, it may be advantageous to look upon the vectors a(s) as columns of a "matrix" A. This "matrix" has infinitely many columns. Consider the example of Exercise (4). Here we combine the vectors a(s) = (1,1/s)^T into the array

[ 1   1    1    1   ... ]
[ 1  1/2  1/3  1/4  ... ]

*We denote by |S| the number of elements of S. If S has infinitely many elements, we write |S| = ∞.

The vectors a(s) can always be arranged in this way when S contains countably many elements, but this representation fails in a more general situation, e.g. when S = [0,1]. However, also in this case it might be useful to write the vectors a(s) from (1) in a matrix-like arrangement. In the case S = [0,1] we may write

[ a_1(0) ... a_1(s) ... a_1(1) ]
[ a_2(0) ... a_2(s) ... a_2(1) ]
[ ............................ ]
[ a_n(0) ... a_n(s) ... a_n(1) ]
    ↑          ↑          ↑
   a(0)       a(s)       a(1)

(10) Definition. Consider a LOP of the type (P) and such that |S| = ∞ (i.e. there are infinitely many linear constraints). Select a finite subset {s1, s2, ..., sm} ⊂ S and form the matrix A from (7). The linear program hereby arising is called a discretization of the original LOP.

As an example we discuss the general LOP:

Minimize c^T y subject to the constraints Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S,

where |S| = ∞. A discretization of this task is defined by means of the linear program:

Minimize c^T y subject to the constraints Σ_{r=1}^n a_r(s_i) y_r ≥ b(s_i), i = 1,2,...,m.

Here, s1, s2, ..., sm are fixed elements in S.

(11) Example. Often problems of the type illustrated by Example (3) are discretized as follows. Select a natural number m ≥ 2, put h = 1/(m-1), s_i = (i-1)h and form the matrix A. In the case of (3) we get

A = [ 1      1          2         ...      1     ]
    [ 0   1/(m-1)    2/(m-1)   ...  (m-2)/(m-1)  1 ]

(12) Exercise. Denote by v(P) the value of Problem (P) and by vm(P) the value of a discretization of (P). Show that vm(P) ≤ v(P).

The method of discretization is very important both in theory and practice. We will return to this topic in §13. Provided that certain very general conditions are met, it is possible to show that for every linear optimization problem (P) there is a discretization with the same optimal solution as (P). These conditions are met in the practical applications discussed in this book. This statement is an important consequence of the duality theory of Chapter IV and indicates the important role of linear programming in the framework of linear optimization.

We mention here that in computational practice discretization is often used to calculate an approximate solution of a linear optimization problem with infinitely many constraints. The linear program thereby obtained is solved by means of the simplex algorithm (Chapters V and VI) which, after finitely many arithmetic operations, delivers a solution (or the information that none exists).

We shall now illustrate another useful way of studying a given LOP by means of diagrams. Consider again Example (3). We have a(s) = (1,s)^T, b(s) = √s for s ∈ [0,1]. Thus

a_1(s) = a_{1s} = 1,
a_2(s) = a_{2s} = s,
b(s) = b_s = √s.

Let c1 = 1 and c2 = 0. The constraints (1) are written

y1 + s y2 ≥ √s,  s ∈ [0,1].

They are illustrated in Fig. 3.2 but may also be represented geometrically as follows. (y1,y2) satisfies these constraints if the straight line

z(s) = y1 + s y2

lies above the graph of the function √s in the interval [0,1]. (See Fig. 3.4.) The corresponding LOP may be reformulated as the task to determine, among all such straight lines, the one which intersects the vertical axis at the lowest point.

(13) Exercise. Prove that the LOP above has the value 0 but no solution. Show also, by drawing a picture analogous to Fig. 3.4, that every discretization of this LOP has the value -∞ if the left boundary point of the interval [0,1] does not appear among the points of discretization s1, s2, ..., sm. Thus the linear program is unbounded from below in this case.

Fig. 3.4

(14) Example: Air pollution control. We consider the problem of maintaining a satisfactory air quality in an area S (e.g. a city). This goal shall be reached by regulating the emissions from the sources of pollutants in such a manner that the control costs are as small as possible. N sources have been identified and their positions and strengths are known. We consider here only the case of one pollutant, e.g. SO2. The concentration of the pollutant at a point s = (s1,s2)^T is given by

d(s) = Σ_{j=1}^N q_j V_j(s).

Here V_j is the transfer function which describes the contribution from the source with index j to the ambient concentration at the point s. V_j describes an annual mean and is hence time-independent. The transfer functions are calculated from meteorological dispersion models incorporating wind speed and direction, atmospheric stability, and several other geographical and meteorological variables. We shall assume that the transfer functions are known. q_j is the strength of source number j.

The number of pollutant sources is generally very great and therefore they cannot be regulated individually. Instead they are divided into n source classes G1, G2, ..., Gn and all sources in a given class are regulated in the same way. Thus all residential houses of a city may form one source class. The sources are now numbered so that all sources with indices between j_{r-1} + 1 and j_r comprise class number r (r = 1,2,...,n). Thus we have

0 = j_0 < j_1 < ... < j_n = N.

We now introduce

v_r(s) = Σ q_j V_j(s)  (r = 1,2,...,n),

where the summation is extended over all members of class r. The total concentration of the pollutant at point s is thus given by

Σ_{r=1}^n v_r(s).

One reduction strategy is now to reduce the emission of class G_r by the fraction E_r. Thus 0 ≤ E_r ≤ 1 (r = 1,2,...,n). Hence the total remaining concentration after regulation becomes

Σ_{r=1}^n (1 - E_r) v_r(s).

We require now that for each s ∈ S the value of this expression does not surpass a given limit g(s). g may be a legally imposed standard defining the highest acceptable concentration. We assume also that there are upper bounds e_r < 1 for the fractions E_r. (It is not technically possible to completely remove the emissions from the group G_r.) Therefore the numbers E1, E2, ..., En must meet the conditions:

(15)  0 ≤ E_r ≤ e_r,  r = 1,2,...,n,

(16)  Σ_{r=1}^n (1 - E_r) v_r(s) ≤ g(s),  s ∈ S.

The reduction of emissions entails costs, e.g. for the installation and maintenance of effluent filters in factories. We shall assume that these costs are defined by the linear function

(17)  K(E) = Σ_{r=1}^n c_r E_r,

where c1, c2, ..., cn are known numbers. The task of minimizing the cost function (17) under the constraints (15), (16) is a linear optimization problem:

Minimize Σ_{r=1}^n c_r E_r subject to the constraints

E_r ≥ 0,  r = 1,2,...,n,

-E_r ≥ -e_r,  r = 1,2,...,n,

(18)  Σ_{r=1}^n E_r v_r(s) ≥ -g(s) + Σ_{r=1}^n v_r(s),  s ∈ S.

Remark. The function d does not completely describe the air quality since the level of concentration changes irregularly with time. The reduction policy which is determined by considering the annual mean concentrations only is therefore a long-term regulation strategy which must be supplemented with suitable short-term measures to counteract temporary strong increases in ambient concentrations.

The above formulation of an optimization problem for environmental pollution control is based on work by Gorr and Kortanek. See e.g. Gorr, Gustafson and Kortanek (1972) and Gustafson and Kortanek (1975).
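The discretization procedure of Definition (10) is easy to carry out mechanically. The following sketch is our own illustration, assuming the scipy.optimize.linprog solver: it discretizes Example (3) (minimize y1 subject to y1 + s y2 ≥ √s, s ∈ [0,1], i.e. c1 = 1, c2 = 0) and reproduces the behavior claimed in Exercise (13): the discretized program is bounded when s = 0 is among the grid points and unbounded from below otherwise.

```python
import numpy as np
from scipy.optimize import linprog

def discretized_value(m, include_zero=True):
    """Value of a discretization of Example (3): minimize y1 subject to
    y1 + s_i*y2 >= sqrt(s_i) for m grid points s_i in [0,1]."""
    s = np.linspace(0.0 if include_zero else 1.0 / m, 1.0, m)
    # linprog expects A_ub @ y <= b_ub, so the >= constraints are negated.
    A_ub = -np.column_stack([np.ones(m), s])
    b_ub = -np.sqrt(s)
    res = linprog([1.0, 0.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None)])
    return res.fun if res.status == 0 else "unbounded from below"

print(discretized_value(11, include_zero=True))    # 0.0 = v(P) here
print(discretized_value(11, include_zero=False))   # unbounded, cf. Exercise (13)
```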
Chapter II

Weak Duality

The present chapter is very elementary in its entirety but is of decisive importance for understanding the material to follow. Here we lay the foundations for the theoretical as well as computational treatment of linear optimization problems. The simple examples are particularly designed in order to familiarize the reader with the structure of such problems as well as the central concept of duality, which plays a major role both in the theory and in all practical applications of linear optimization. A thorough study of these examples is the best preparation for the duality theory to be presented in Chapter IV and the algorithms of Chapters V through VIII.
§4. DUALITY LEMMA AND DUAL PROBLEM

We consider the optimization problem (P) which was introduced in §3. It can be written in the following compact form:

(P)  Minimize c^T y subject to a(s)^T y ≥ b(s), s ∈ S,

or alternatively

(P)  Minimize Σ_{r=1}^n c_r y_r subject to Σ_{r=1}^n a_r(s) y_r ≥ b(s), s ∈ S.

One obtains an upper bound for the value v(P) as soon as a feasible vector y is available. According to the definition of v(P) we find immediately that

v(P) ≤ c^T y.

It is of great interest for numerical treatment to determine good lower bounds for v(P). This fact will be illustrated in many examples. The following fundamental lemma can be used for constructing such lower bounds.

(1) Duality lemma. Let the finite subset {s1, s2, ..., sq} of S, q ≥ 1, and the nonnegative numbers x1, x2, ..., xq be such that

(2)  c = a(s1)x1 + a(s2)x2 + ... + a(sq)xq.

Then the following inequality holds for every feasible vector y = (y1,...,yn)^T:

(3)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq ≤ c^T y.

Proof: We have assumed that y is feasible for (P). Then we find in particular

a(s_i)^T y ≥ b(s_i),  i = 1,2,...,q.

Since x_i ≥ 0, i = 1,2,...,q, we get

Σ_{i=1}^q b(s_i)x_i ≤ Σ_{i=1}^q (a(s_i)^T y)x_i = (Σ_{i=1}^q a(s_i)x_i)^T y.

The assertion now follows from (2).

Since (3) holds for every vector y which is feasible for (P), we immediately arrive at the following statement on lower bounds for the optimal value v(P). (Note that here we revert to the componentwise representation of the vectors a(s_i) and c.)

(4) Corollary. Let {s1,...,sq}, q ≥ 1, be a finite subset of the index set S and let the nonnegative numbers x1,...,xq satisfy

(5)  Σ_{i=1}^q a_r(s_i)x_i = c_r,  r = 1,2,...,n.

Then

(6)  Σ_{i=1}^q b(s_i)x_i ≤ v(P).

We remark already here that one is, of course, interested in obtaining the best possible lower bounds for v(P). We will show in later chapters that for large classes of problems it is possible to obtain arbitrarily good lower bounds by selecting the subset s1,...,sq and the numbers x_i properly.

(7) Example. We consider the LOP

Minimize y1 + (1/2)y2 subject to y1 + s y2 ≥ e^s, s ∈ [0,1].

We try now to determine a finite subset {s1,...,sq} of S and nonnegative numbers x1,...,xq such that the assumptions of the duality lemma are met. We take first q = 1 and seek a point s1 in the interval [0,1] and a nonnegative number x1 with the property (5):

x1 = 1,
s1 x1 = 1/2.

These equations have the unique solution x1 = 1, s1 = 1/2. From (6) we get

x1 e^{s1} = e^{1/2} = 1.648... ≤ v(P).

It is also easy to obtain a rough upper bound: One needs only to find numbers ȳ1, ȳ2 such that the straight line ȳ1 + s ȳ2 lies above the curve e^s throughout the interval [0,1]. (Draw a picture similar to Fig. 3.4.) This occurs e.g. for ȳ1 = 1, ȳ2 = 2. We get v(P) ≤ ȳ1 + (1/2)ȳ2 = 2. Hence we have arrived at the (not very good) bracketing

1.648 ≤ v(P) ≤ 2.

A better result is obtained by selecting q = 2. We then are faced with the equations (see (5)):

x1 + x2 = 1,
s1 x1 + s2 x2 = 1/2.

One possible solution is given by s1 = 0, s2 = 1, x1 = x2 = 1/2. From (6),

x1 e^{s1} + x2 e^{s2} = 1/2 + (1/2)e = 1.859... ≤ v(P).

(8) Exercise. Show that indeed v(P) = (1/2)(1+e) by determining a suitable upper bound.

(9) Example. Consider the linear program

Minimize 3y1 + y2

subject to the constraints of Example (8) in §3. We seek a lower bound for its optimal value. To obtain a representation (2) or (5) means that the vector c = (3,1)^T shall be written as a nonnegative linear combination of q columns of the matrix appearing in Example (8) in §3:

A = [ 1  -1  1  0 ]
    [ 1  -3  0  1 ]

Since c ∈ R², we take q = 2 and try at first to represent c as a nonnegative linear combination of the first two columns of A:

(1,1)^T x1 + (-1,-3)^T x2 = (3,1)^T.

The unique solution of this linear system of equations turns out to be x1 = 4, x2 = 1. From (6) we now get the lower bound 5 for the optimal value. (We had b = (2,-3,0,0)^T.) Determine graphically the optimal value and the solution of the linear program.

(10) Lemma. Let ȳ = (ȳ1,...,ȳn) be feasible for the problem (P). Assume also that the subset {s1,...,sq} of S and the nonnegative numbers x1,...,xq satisfy the assumption (2) of the duality lemma. If

(11)  Σ_{i=1}^q b(s_i)x_i = Σ_{r=1}^n c_r ȳ_r

is satisfied, then ȳ is an optimal solution to (P).

Proof: Since ȳ is feasible for (P) we have

v(P) ≤ Σ_{r=1}^n c_r ȳ_r.

On the other hand, from (11) and (6),

Σ_{r=1}^n c_r ȳ_r ≤ v(P).

The assertion follows.

(12) Linear programming. Consider now the particular problem

(LP)  Minimize c^T y subject to A^T y ≥ b,

where A has m column vectors a1, ..., am. In this case q ≤ m must hold, of course. Then every nonnegative solution x = (x1,...,xm)^T of the system

(13)  Ax = c

will give lower bounds for the value v(LP) of the form

(14)  b^T x ≤ v(LP).

Note that (13) can be written in the alternative form

c = Σ_{i=1}^m a_i x_i,

which corresponds to Equation (5), while (14) corresponds to the inequality (6).

A natural objective is to select the subset {s1,...,sq} and the nonnegative numbers x1,...,xq in order to maximize the lower bound for the value obtained from the duality lemma. We arrive at the

Dual problem (D): Determine a finite subset {s1,...,sq} ⊂ S and real numbers x1,...,xq such that the expression

(15)  Σ_{i=1}^q x_i b(s_i)

is maximized, subject to the constraints

(16)  Σ_{i=1}^q x_i a_r(s_i) = c_r,  r = 1,2,...,n,

(17)  x_i ≥ 0,  i = 1,2,...,q.

{s1,...,sq, x1,...,xq} is said to be feasible for (D) when s_i ∈ S, i = 1,2,...,q, and (16) and (17) hold.

The problem (D) appears to be very complicated since q, the number of "mass points", may be arbitrarily large. However, we will see in Chapter IV that q = n may be assumed in all problems of practical interest. (Then (D) is a nonlinear optimization problem with 2n variables.) But in our argument we shall start by allowing q to be arbitrarily large.

Denote by v(D) the value of (15) subject to (16) and (17). Then we conclude from the duality lemma (1) the

(18) Weak duality theorem. v(D) ≤ v(P).

The pair of problems (P) - (D) is called a dual pair. The transfer from the primal problem (P) to the dual problem (D) will be called dualization. The following reformulation of Lemma (10) will be useful when the results of the present section are applied to concrete problems.

(19) Lemma. Let y = (y1,...,yn)^T be feasible for (P) and {s1,...,sq, x1,...,xq} be feasible for (D). If

Σ_{i=1}^q b(s_i)x_i = Σ_{r=1}^n c_r y_r

holds, then y is a solution of (P) and {s1,...,sq, x1,...,xq} is a solution of (D).

(20) Complementary slackness lemma. Let y = (y1,...,yn)^T be feasible for (P) and {s1,...,sq, x1,...,xq} be feasible for (D). Assume also that the following relation holds:

(21)  x_i (Σ_{r=1}^n a_r(s_i)y_r - b(s_i)) = 0,  i = 1,...,q.

Then y is a solution of (P) and {s1,...,sq, x1,...,xq} is a solution of (D). Further, the values of (P) and (D) coincide.

Proof: In (21), x_i > 0 implies

Σ_{r=1}^n a_r(s_i)y_r = b(s_i),  i = 1,2,...,q.

Thus we have the following equation:

Σ_{i=1}^q b(s_i)x_i = Σ_{i=1}^q (Σ_{r=1}^n a_r(s_i)y_r)x_i = Σ_{r=1}^n (Σ_{i=1}^q a_r(s_i)x_i)y_r = Σ_{r=1}^n c_r y_r.

Here we have used the feasibility of {s1,...,sq, x1,...,xq}. The assertion now follows from Lemma (19).

(22) Example: Optimal production plan. In this subsection we return to the production model (2) in §1. There we considered n goods G1,...,Gn and the possible activities P_s (s ∈ S) which were described by the vectors

a(s) = (a_1(s),...,a_n(s))^T.

Here a_r(s) is a measure of the amount of good G_r which is consumed or produced when activity P_s is carried out with intensity 1. We had formulated an optimization problem (for maximization of profits) of the following form: Determine a finite subset {s1,...,sq} (q ≥ 1) of the index set S and real numbers {x1,...,xq} such that the expression

(23)  b(s1)x1 + b(s2)x2 + ... + b(sq)xq

is maximized subject to the constraints

(24)  a_r(s1)x1 + a_r(s2)x2 + ... + a_r(sq)xq ≤ c_r,  r = 1,...,n,

and

(25)  x_i ≥ 0,  i = 1,...,q.

In order to get an optimization problem of the type (D) we introduce slack variables ξ_r, r = 1,2,...,n. Then we write (24) - (25) in the following equivalent form

(26)  Σ_{i=1}^q a_r(s_i)x_i + ξ_r = c_r,  r = 1,2,...,n,

(27)  x_i ≥ 0 (i = 1,...,q),  ξ_r ≥ 0 (r = 1,...,n).

This may be interpreted as meaning that the activities P_s, s ∈ S, are supplemented with the so-called disposal activities P̄_r, r = 1,...,n.

(28) The corresponding primal problem. The maximization of the preference function (23) subject to the constraints (26), (27) is the dual of the following linear optimization problem:

(29)  Minimize Σ_{r=1}^n c_r y_r

subject to the constraints

(30)  Σ_{r=1}^n a_r(s)y_r ≥ b(s),  s ∈ S,

(31)  y_r ≥ 0,  r = 1,...,n.

The variables y1,...,yn of this primal problem may be interpreted as the prices of the goods G1,...,Gn, and the number

(32)  Σ_{r=1}^n a_r(s)y_r

indicates the cost which arises when the activity P_s (s ∈ S) is carried out with intensity 1. Thus a "price system" y1,...,yn is feasible (i.e. meets the conditions (30) - (31)) when all prices are nonnegative and when the cost (32) for no s ∈ S is below the revenue b(s) resulting when the activity P_s is carried out with unit intensity. The complementary slackness lemma (20) now assumes the following form:

(33) Let {s1,...,sq, x1,...,xq} be a feasible production plan with x_i > 0 for i = 1,...,q and let y be a feasible price vector. These production plans and price vectors are optimal if

(34)  Σ_{r=1}^n a_r(s_i)y_r = b(s_i),  i = 1,...,q,

and

(35)  y_r ξ_r = 0,  r = 1,...,n,

with

ξ_r = c_r - Σ_{i=1}^q a_r(s_i)x_i,  r = 1,...,n.

The conditions (34) and (35) admit an excellent economic interpretation: A feasible production plan and a feasible price vector are optimal if i) the cost per unit intensity of each activity P_s occurring in the production plan is equal to the corresponding revenue b(s), and if ii) the prices y_r of goods G_r which are not exhausted (i.e. ξ_r > 0) are zero.

By means of the tools developed in Chapter IV we will be able to give conditions which ensure that the problem (23) - (25) of finding an optimal production plan is solvable. We shall also demonstrate that there is then an optimal production plan involving at most n activities. This result is true even if there are arbitrarily many possible activities.

The study of production models of the same kind as, and similar to, that of problem (23) - (25) has greatly stimulated the development of linear programming. The whole theory of Chapter IV as well as the simplex algorithm of Chapter V can be motivated with concepts from economics. This is expounded in the book by Hildenbrand and Hildenbrand (1975) and the reader is referred to this text.

(36) Duality for linear programming. We now investigate the important special case of linear programming, i.e. when the index set S is finite, S = {1,2,...,m}. Then (P) takes the special form (see (9), §3):

(LP)  Minimize Σ_{r=1}^n c_r y_r subject to A^T y ≥ b.

We recall that the constraints of (LP) may be written in the form

a_i^T y ≥ b_i,  i = 1,...,m,

where a1,...,am are the column vectors of the matrix A, and

A = [ a11  a12  ...  a1m ]                    [ b1 ]
    [ a21  a22  ...  a2m ]  = [ a1 a2 ... am ],  b = [ b2 ]
    [ ................. ]                     [ .. ]
    [ an1  an2  ...  anm ]                    [ bm ]

In this case there are only finitely many vectors a_i (i = 1,...,m) and x_i = 0 is permitted by the constraints of the dual problem. Therefore we may put q = m from the outset and replace (16), (17) by

Σ_{i=1}^m a_i x_i = c,  x_i ≥ 0 for i = 1,...,m.

Using matrices we get with x = (x1,...,xm)^T

Ax = c,  x ≥ 0.

Therefore we define the dual linear program to be the optimization problem

(LD)  Maximize Σ_{i=1}^m b_i x_i = b^T x subject to Ax = c, x ≥ 0.

This is a problem with a linear preference function, linear equality constraints, and positivity requirements for all variables. It is a very important fact that problems of the type (LP) through simple transformations can be brought into the form (LD) and vice versa. This is not possible for general problems of the type (P) and (D).

(37) The transformation (LP) → (LD). A vector y ∈ R^n meets the constraints

A^T y ≥ b

of (LP) if and only if there is a vector z ∈ R^m such that

(38)  A^T y - z = b,  z ≥ 0.

(Such a z is called a slack vector.) This system of equations and inequalities to be satisfied by the vector (y,z) ∈ R^{n+m} does not have the same form as the constraints of (LD) since only some of the n+m variables, namely z1,...,zm, must be nonnegative. This is remedied by splitting up y in the following way. Consider the system

(39)  A^T y⁺ - A^T y⁻ - z = b,  y⁺ ≥ 0, y⁻ ≥ 0, z ≥ 0,

where y⁺ ∈ R^n, y⁻ ∈ R^n, z ∈ R^m. We show that (39) and (38) are equivalent. If y⁺, y⁻ and z satisfy (39), then the vectors y = y⁺ - y⁻ and z satisfy (38). To prove the converse note that every vector y ∈ R^n may be written

(40)  y = y⁺ - y⁻  with  y⁺ ≥ 0, y⁻ ≥ 0.

Thus from any solution (y,z) of (38) we may construct a solution (y⁺,y⁻,z) of (39). A representation (40) of y may be obtained by putting

(41)  y_r⁺ = max(y_r, 0),  y_r⁻ = -min(y_r, 0),  r = 1,...,n.

But the representation y = y⁺ - y⁻ is not the only possible one of the type (40). Let

(42)  y_r⁺ = max(y_r, 0) + α_r,  y_r⁻ = -min(y_r, 0) + α_r,

where the α_r are arbitrary nonnegative numbers. Then y = y⁺ - y⁻ is also a representation of the type (40) and it is easy to show that all representations of the type (40) may be constructed from (42). We observe now that

c^T y = c^T y⁺ - c^T y⁻

holds for all representations of the type (42). Therefore it follows that the program (LP) is equivalent to the following optimization problem of type (LD):

(L̂D)  Maximize -(c^T y⁺ - c^T y⁻) subject to

(A^T, -A^T, I_m) (y⁺, y⁻, z)^T = b,  (y⁺, y⁻, z)^T ≥ 0.

(43) The transformation (LD) → (L̂P). We rewrite the constraints of (LD),

Ax = c,  x ≥ 0,

in the equivalent form

Ax ≥ c,  -Ax ≥ -c,  x ≥ 0.

Then we obtain from (LD) the following optimization problem of type (LP):

(L̂P)  Minimize -b^T x subject to

[ A   ]       [ c   ]
[ -A  ] x  ≥  [ -c  ]
[ I_m ]       [ 0_m ]

(44) Definition. We define the double dualization of the linear program (LP) to be the following process: First the linear program (LP) is dualized, giving (LD). Then the transformation (43) (LD) → (L̂P) is carried out. Lastly, the linear program (L̂P) is dualized.

We see immediately that (L̂D) is the dual of (L̂P). But we have already shown that (L̂P) and (LD) are equivalent. Thus we arrive at the important result:

(45) Theorem. If the linear program (LP) undergoes a double dualization, an optimization problem equivalent to (LP) results.

(46) Exercise. Consider the two optimization problems

Minimize c^T y subject to Ay ≥ b, y ≥ 0, y ∈ R^n,

and

Maximize b^T x subject to A^T x ≤ c, x ≥ 0, x ∈ R^m.

In what sense can they be said to form a dual pair? Carry out suitable transformations which bring them into the form (LP) or (LD).

§5. STATE DIAGRAMS AND DUALITY GAPS

Using the simple weak duality theorem (18) of §4, we may immediately derive a first classification table for the dual pair (P) - (D). (Results of the type v(P) = v(D) are called strong duality theorems. They are given in Chapter IV.) We recall that every minimization problem of the type (P) must be in one and only one of the three states (see (7), §1)

(P)  IC (Inconsistent; there are no feasible vectors y. By definition we have v(P) = +∞.)
     B  (Bounded; there are feasible vectors y and v(P) is finite.)
     UB (Unbounded; there are feasible vectors y such that the preference function is arbitrarily small, i.e. v(P) = -∞.)

By the same token, the dual problem must be in one and only one of the three states indicated below. (Observe that (D) is a maximization problem.)

(D)  IC (Inconsistent: v(D) = -∞.)
     B  (Bounded: v(D) finite.)
     UB (Unbounded: v(D) = +∞.)

The statement of the duality theorem (18) of §4 may be represented by the state diagram below. Combinations of states of the dual pair (P) - (D) which are impossible by (18) of §4 are marked with a cross in the diagram. (The reader should verify that these combinations cannot occur.)

(1) State diagram for the dual pair (P) - (D).

         P:  IC   B   UB
  D: IC       1   2    4
     B        3   5    x
     UB       6   x    x

The Case 5 is of main interest for the applications. Then (P) and (D) are both bounded. This occurs when both problems are feasible.

It is possible to construct simple examples to demonstrate that all the Cases 1, 2, 3, 4, 5, and 6, which are not excluded by the weak duality theorem, do in fact occur in practice.

We will show later that the Cases 2 and 3 do not occur in linear programming, i.e. linear optimization problems of type (LP). It is often possible to introduce "reasonable" assumptions on general linear optimization problems in order to insure that Cases 2 and 3 do not materialize. We shall treat this topic in detail in Chapter IV. Nevertheless, we illustrate Cases 2 and 3 of the state diagram by means of two examples constructed for the purpose.

(2)
lustrate Cases 2 and 3 of the state diagram by means of two examples constructed for the purpose. (2)
Example.
Minimize
(P)
yl
n = 2, S = [0,1].
subject to the constraints
(P) has feasible vectors, for we may take all feasible vectors
y = (yl,y2)T
syl + s2y2 > s2,
yl = 0, y2 = 1.
must satisfy
yl > 0.
Furthermore,
This fact is
easily illustrated by means of a diagram similar to Fig. 3.4.
we get
v(P) = 0 and Problem (P) is hence in State B.
s E S.
Therefore
II.
32
WEAK DUALITY
The corresponding dual problem (D) reads q
Maximize
sixi i=l
subject to the constraints q
sixi = 1
(3)
sixi = 0
(4)
i=1
i=1
si E [0,1]
i = 1,...,q
for
xi>0
q > 1.
and
By (4), for
The inconsistency of (D) is shown as follows: we must have
= 0
x. i
or
s. I
= 0
x. > 0
since
I-
and
i = 1,...,q
s2 > 0.
i-
But then
(D) is therefore in State IC and we have thus
(3) cannot be satisfied.
an instance of Case 2 in diagram Cl). (5)
(P)
Since
n = 1, S = [0,1]
Example.
Minimize
0
s(sy - 1) > 0, each feasible
s2 y1 > s
sy1 - 1 > 0
subject to the constraints
yI
for all
s2y1 > s,
s E S.
must satisfy
yI
This is not possible for any number
s E [0,1].
yl,
implying that (P) is in State IC.
The dual problem is
q
sx i i
Maximize
q subject to the constraints
s?xi = 0,
(D)
s. E [0,1],
x. > 0, for
i = 1,...,q
(D) is feasible and for each permissible lows that
si = 0
or
xi = 0
for
(q > 1),
{s1,...,sq, xl,...,xq}
i = 1,...,q.
it fol-
Thus (D) is in State B,
hence we have an instance of Case 3 in diagram (1). We have already mentioned that we shall in Chapter IV establish theorems proving
v(P) = v(D)
Thus we will prove that
is true given certain general assumptions.
v(LP) = v(LD)
always holds for linear program-
ming if at least one of the problems is feasible.
However, at the end of
this section we shall give examples of linear optimization problems which are in Case 5 of the diagram (1); i.e. where both the primal and dual problems are bounded, but where
v(P)
and
v(D)
do not coincide.
5.
State Diagrams and Duality Gaps
(6)
Definition.
Let a dual pair (P)
33
-
(D) be given.
The number
6(P,D) = v(P) - v(D) We introduce here the convention
is called the defect.
for all real numbers
If
c.
6(P,D) > 0, we say that a duality gap has
occurred.
The following diagram gives the values of the defect corresponding to all states of the dual pair. the state diagram (1).
This diagram is obtained directly from
(The impossible states which are marked with a
cross in (1) are omitted.) (7)
Defect diagram. (P) (D)IC
B
UB
+m
+_
0
+_
d
IC
I
B
(8)
y1
stands for a
nonnegative number,
Consider the following problem of type (P):
Example.
Minimize
d
0
0
UB
Here
subject to the constraints
syl + s2 y2 > 0, yl
s E [0,1]
> -10.
Here it is natural to look upon the index set as consisting of two different subsets since the constraints are generated by the vector a(s) _ (s,s2)T,
s E [0,1],
a(2) = (1,0)T (The notation
a(2)
is chosen arbitrarily.)
The reader should verify
that the constraints of (P) may be written in the form
a(s)Ty > b(s), where
S = [0,1] U {2} and
sES
II.
34
WEAK DUALITY
s E [0,1]
0,
b(s) = s = 2.
-10,
In the formulation of the corresponding dual problem we encounter infinitely We may represent them in the "matrix"
a(s) E R2.
many column vectors (see also §3) 0
...
S
...
1
0
...
s2 ...
1
t
t
0
t
t
a(l) a(2)
a(s)
a(s)
1
s E [0,1] . The dual problem can now be formulated at once. imply that the vector
combination of the vectors
qcl
slj si
i=1
x
+
1
a(s), s E S:
'j -q = lj q
0
The constraints of (D)
can be represented as a nonnegative linear
(1,0)
,
x1
0
..,x q
>0
(9)
(10)
sl,.... sq-1 E (0,1].
The second of the two equations summarized in (9) is q-l
2
s ixi = 0.
i=1
Because of (10) we must therefore have Therefore
xi = 0
or
si = 0, i = 1,...,q-1.
is necessary in order to satisfy (9)
xq = 1
- (10).
But then
the value of the dual preference function becomes q
b(si)xi = -10. i=1
Thus we conclude
v(D) = -10. We now determine
v(P).
sy1 + s2 y2 > 0,
we get
yI > 0.
s E [0,1]
0
s E [0,1],
(sY1 + s2Y2 = s(Y1 + sy2)
implies
Therefore
Since
yI > 0.)
and
yl + sy2 > 0, all
5.
State Diagrams and Duality Gaps
We now note that every vector (P).
35
(0,N,2)T E R2
with
is optimal for
y2 > 0
Thus we conclude
v(P) = 0. We have thus shown that the dual pair (P)
-
(D) has the duality gap
d(P,D) = 10.
Here we have an instance of Case 5 of the state diagram (1) or the defect diagram (7) with fect
d
d = 10.
From this example we also realize that the de-
may be made arbitrarily large by appropriately choosing the con-
straints for (P). Exercise.
(11)
Minimize
Consider problem (7) of §4:
yl + 2 y2
subject to
yl + sy2 > es,
s E [0,1].
Show that both the primal problem and its dual are solvable and that no duality gap occurs.
Hint:
Use for the dual
q = 2
and
sl = 0, s2 = 1.
Up to now we have not studied the solvability of (P) and (D).
(12)
This matter will be discussed in Chapter IV in connection with duality theory.
Exercise.
(13)
Minimize
-y1
a)
Consider the linear optimization problem
subject to the constraints
-yI > -1
(P)
-syI - y2 > 0,
s = 1,2,3,...
Formulate the corresponding dual problem (D) and show that there is a duality gap b)
6(P,D) = 1.
Show that the problem (P) in a) is equivalent to the task:
Minimize
-yI
subject to
-y1 > 0 -YI - Y2 > 0.
Form the dual and show that no duality gap occurs. (14)
The example of the preceding exercise shows clearly
Remark.
that the dual (D) of a certain linear optimization problem (P) depends not only on the preference function and the set of feasible vectors but also on the formulation of (P), i.e. on the manner in which the set of feasible vectors is described through linear inequalities. (15)
equality
Exercise.
yl > 0
Consider again the Examples (2) and (5).
is added to the constraints of (P) in (2).
The in-
Show that
36
II.
WEAK DUALITY
the corresponding dual pair is an instance of Case 5 of (1) and that no duality gap occurs.
Analogously, the inequality
the constraints of Example (5).
0
yl > 1
is added to
Show that the duality gap now "disappears"
(Case 6).
The question now arises whether the duality gap, when it occurs, is caused by an "unfavorable" choice of inequalities n
ar(s)yr > b(s),
s E S,
r=1
to describe the set of feasible vectors of (P).
Is it possible that there
always is an equivalent system of inequalities n
r=1
2r(s)yr > b(s),
sES
describing the same set of vectors and such that no duality gap appears? The answer is yes.
The existence of an equivalent, but for the pur-
pose of duality theory "better", system of inequalities is demonstrated in a paper by Charnes, Cooper and Kortanek (1962). (1975).)
(See also Eckhardt
However, there are no simple methods to transform systems of in-
equalities to remove duality gaps. questions further.
Therefore we will not discuss these
Instead, we shall in Chapter IV give simple conditions
which insure that for a given linear optimization problem no duality gap occurs.
Chapter III
Applications of Weak Duality in Uniform Approximation
Uniform approximation of functions is one of the most important applications of linear optimization.
Both the theory and the computational
treatment of linear optimization problems have been greatly influenced by the development of the theory of approximation.
In the first section of this chapter the general problem of uniform approximation will be formulated as a linear optimization problem. corresponding dual is derived.
The
The rest of the chapter will be devoted
to the special case of polynomial approximation.
Some classical problems
which admit an exact solution in closed form are also studied.
§6.
UNIFORM APPROXIMATION Let
be an arbitrary set and
T
which is defined on T tions
v
r
T + R, r = 1,...,n
:
f: T + R
and bounded there.
a real-valued function
The real-valued bounded func-
are also given.
The problem of linear uniform approximation is to determine a linear combination n r=l
yrvr
which best approximates
f
in the sense that the following expression is
minimized:
n sup tET
I
I
yrvr(t) - f(t)1
r=l
37
38
III.
(1)
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
The problem of uniform approximation:
n
Minimize
sup tET
(PA)
yrvr(t) - f(t)
I
I
r=l
over all vectors
y = (y1,...,yn)T E R.
An equivalent formulation is Minimize
over all vectors
yn+1
(y,yn+l)T E R"1,
subject to the constraints
nn
all
yrvr(t) - f(t)l S Yn+1'
t E T.
r=1
We note that for real numbers Iml
a
and
g
the inequality
< a
is equivalent to the two inequalities -a > -B
a > -S Therefore the approximation problem (PA) may be rewritten in the following form:
Minimize n
r=1
yn+l
subject to the constraints
vr(t)yr + yn+l > f(t), all t E T
nn
r=1
(2)
all
vr(t)yr + yn+l > -f(t),
(3)
t E T.
(4)
This problem now has the form of a linear optimization problem (P) in Rn+l
provided the index set
an(s))T
are properly defined.
S
and the functions
a(s) = (al(s),...,
There are two different kinds of vectors
since the vectors
a(s)
I
vl(t) l
1
and
-vl(t) l
,
t E T,
correspond to the conditions (3) and (4) respectively.
(5)
The constraints
of the dual of the problem (2) - (4) imply that the vector
6.
Uniform Approximation
39
0 1
c =
E Rn+100
1
which appears in the preference function of (2), must be expressed as a nonnegative linear combination of finitely many of the vectors (5). Hence the dual problem corresponding to (2)
-
(4) takes the form (compare
with §4, (15) - (17)): {t+,...,t++}, {t...... t
Determine two subsets
q_
and real numbers
x1,...,x++
x1,...,x
q
+
f(t+)x± 1
i=1
-
1
f(t )x
i=1
1
T(q+ + q- > 1)
of
}
q
such that the expression
-
q (6) 1
is maximized, subject to the constraints q
q
+
+
vr(ti)xi
q+
+
r = 1,...,n,
(7)
q-
x + i=1
vr(ti)xi = 0,
x = 1, i=1
1
(8)
1
x. > 0,
i = 1,...,q ,
(9)
x. > 0,
i = 1,...,q
(10)
1 -
This dual problem can be written in an equivalent, but simpler form. (11)
The dual problem (DA).
(q > 1) and real numbers
Determine a subset
xl,x2,...,xq
{t1,.... tq}
of
T
such that the expression
q
f(ti)x
(12)
i=1
is maximized, subject to the constraints q
(13)
r = 1,...,n,
= 0, i=1 v r(t.)x. 1 1 Ixil < 1.
(14)
i=1 (15)
Lemma.
The optimization problems (6) - (10) and (12) - (14) are
equivalent in the following sense: +
+
x1x +, x1,...,x ...... q
q
}
For every
satisfying (7)
{tl,...,t
+
q
-
,
tl,.... t
(10) one may construct
q
40
III.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
{t1,...,tq, x1,...,xq}
satisfying (13), (14) such that the values of the
preference functions (6) and (12) coincide, and vice-versa. Let a solution of (7)
Proof:
sume that
(10) be given.
-
T+ = (tl,...,t++} put
q = q + +
We may as well as-
We discuss first the case when the sets
x+ > 0, xi > 0.
and T = (ti,...,t q q q_, (t1,...,tq}
= T+ U T-
Then we just
are disjoint.
}
and
xj,
if
ti = tj
for a
tj E T
I -X.,
if
ti = t.
for a
t. E T ,
It is easy to verify that (13), (14) are satisfied and that (6) and (12) have the same value.
In the remaining case when k, R
point in common, there are indices tk = t-,
with
xk
then we remove T
and
T
have a
min(xk,xR) = d > 0.
Then we replace
from
T+
such that
from
tk
xk - d
and
with
xi
T+, but if instead
xi - d.
xk-d = 0
If now
xi - d = 0, tk
is removed
This transformation does not change the value of the preference
.
function (6), and the equations (7),
(9), (10) continue to hold.
But in-
stead of (8) we get q x
X. <
+
1.
i=1
1
i=1
1 -
The sets
T
and
T
will become disjoint after a finite number of the
transformations described above and a suitable solution of (DA) is constructed by the procedure given earlier.
the assertion we let set
q
be feasible for (DA).
Now
= q, t1 = ti, i = 1,...,q, and
= q
x+ =
To verify the remaining part of
{tl,...,tq, x1,...,xq}
max(O,xi) _ (Ixil + xi)/2,
xi = -min(O,xi) _ (Ixij
- xi)/2,
i = 1,...,q.
The rest of the argument is straightforward.
Note that in order to
satisfy (8) it might be necessary to replace
x+
xi + c, where
c > 0
with
xi + c, xi
with
is chosen so that the condition (8) is met.
All duality results which have been derived for the dual pair (2) (4),
(6)
- (10) may be applied to the pair of problems (PA), (DA) from
(1) and (11) to give corresponding statements.
However, many of these
-
6.
Uniform Approximation
41
theorems may be shown directly for the pair (PA) - (DA).
This is true,
e.g. for the duality lemma which could be based on (1) of §4:
numbers
Let the finite subset
Lemma.
(16)
xi,...,xq
(tl,...,t } a T 4
and the real
be such that
q
r = 1,...,n
vr(ti)xi = 0,
(17)
i=1
q L
i=l
IxiI < 1.
(18)
-
Then the following relation holds for any
q
y E Rn:
n
i=1
(19)
Yrvr(t) - f(t)I.
f(t)x1 < suPI I tET r=1
Proof:
From (17) we conclude
yrvr(ti))xi = 0. G ( 1 i=1 r=1 Thus q
n
q f(t1.)x1 .
_
i=1
((
Sf(t 1.)
-
i=1
I
yrvr (t
i)}x.
1
r=1 n
tq
If(ti) -
L
Yrvr(ti)I
Ixil
r=1
i=1
q
n
< suplf(t) tET
`i
r=l
n
< sup I f (t) tET
I Ix.I
Yr vr (t) I
i=l
1
I
I yrvr (t) r=1
which is the desired result. (20)
Show that the left hand side of (19) may be replaced
Exercise.
by q q I
L
f(ti)xiI
i=l (21)
Remark.
If
any choice of elements
q > n+l, then (17) has a nontrivial solution for tl,...,tq
in
T.
underdetermined linear system of equations
Indeed, (17) then gives the
42
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
v1(t1) ... vl (t q) v2(tl) ... v2(tq)
l vn(tl)
vn(tq)
22
0
2
t 0 j q
J
and setting
x =
Ixii)-1X,
(
(22)
i=1
the vector
now meets the constraints (17), (18) of (DA).
x E Rq Example.
(23)
The function
mated by a straight line
f(t) = et
y1 + y2t
is to be uniformly approxi-
over the interval
T = (-1,1].
Thus
we need to solve the problem: sup let - yl - y2tl. tET
Minimize (yl,y2)
We want to apply Lemma (16).
We select
q = 3
and set
tl = -1, t2 = 0,
The system of equations (17) then becomes
t3 = 1.
xl + x2 + -X1
3=0
3 = 0.
+
The general solution of this system is given by
21 = a X2 = -2a X3=a
a
where
is arbitrary.
The "normalization" (22) gives
x = (4, -2, 4)T,
which together with
t1 = -1, t2 = 0, t3 = 1
Thus we may conclude from (16) that if
(DA).
straight line over the interval 1
we -1 -
1
1
+
meets the constraints of et
is approximated by a
[-1,1], then the error will be at least
Z 0.27.
An upper bound for the smallest possible approximation error is obtained by taking
6.
Uniform Approximation
43
yl + y2t = 1.36 + t.
Then sup
let - 1.36 - tj
= 0.36.
tE[-1,1] The function
Exercise.
(24)
approximated over the interval
f(t) = 1/(2+t)
[-1,1]
is to be uniformly
by a straight line
y1 + y2t.
Determine a lower bound for the value of the corresponding approximation problem by proceeding as in (23). puts
t1 = -1, t2 = 0, t3 = 1.) x1, x2, x3
for
optimally for (DA).
q = 3
Hint:
t2 = T.
xl, x2, x3
and
One gets the same linear system (The lower bound is
Consider the same example as in (24) with
t1 = -1, t3 = 1, but set
Let
Hint:
as in the preceding example.
Exercise.
(25)
(Thus one selects again
Then try to determine and
T
0.083.)
q = 3.
t2 = T
become the variables of
the following optimization problem:
Maximize
1+T + 3
xl +
subject to the constraints
xl+x2+x3= 0, (26)
-xl + TX2 + x3 = 0, 1x11 + 1x21 + Ix3l = 1,
(27)
-1 < T < 1.
(28)
Assume that
and
xl
are positive and
x3
xl - x2 + x3 = 1.
c omes
press
x1, x2
and
x3
x2
negative.
Then (27) be-
This relation is used together with (26) to exas (linear) functions of
T.
We then enter these
expressions into the preference function and maximize with respect to This gives the lower bound
T.
0.0893.
The following simple lemma may be useful when one wants to show that a certain vector
y
is an optimal solution of (PA).
An illustrative
example is given in (31). (29)
and
Lemma.
Let
{t1....It q
,
q > 1, satisfy q
vr(ti)xi = 0, i=1 q
I 1xil = 1. i=l
r = 1,...,n,
xl,... x }, where q
ti E T, i = 1,...,q,
44
Let
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
y E Rn
and define
n
yn+l = sup f Ct) - I yrvr(t) tET
r=1
Assume also that the following relations hold for
i = 1,...,q:
Either
x.I = 0 or n YrvrCti) = yn+1 sgn xi
f(ti) -
where
sgn xi = xi/Ixil.
(30)
I r=1
is an optimal solution of (DA)
Then we may assert: {t1,...ItgI x11 ...,xq} and
of (PA), and the values of (PA) and (DA) coincide.
y
Proof:
f(t i)x
i=l
=
=
I y(
I f(t i )xi - r=l r i=1 vr (t i )x. L i i=l I i=l
f(ti) -
I Yrvr (ti)}x
i.
r=1
Applying (30) we get
q
qqC
f(ti)xi = Yn+1
iLl xi
qqC
sgn(xi) = Yn+l iLllxil n
= suplf(t) - I yrvr(t)I. tET r=l
The statement now follows from Lemma (16). (31)
Example.
The function
mated over the interval
[0,2]
f(t) = t2
is to be uniformly approxi-
with a linear combination of the functions
v1(t) = t, v2(t) = exp(t). Andreasson and Watson (1976) give as the solution of this approximation problem the following coefficients
of
vl
and
v2:
We want to use Lemma (29) to verify that these values of
yl
and
y2
yl = 0.18423256,
y1
and
y2
y2 = 0.41863122.
are optimal (within the precision shown).
One first establishes that the
error function
t2
- ylt - y2 exp(t)
assumes its minimum and maximum values at t2 = 2.00000000:
tl = 0.40637574
and
6.
Uniform Approximation
ti 2
- y1t1 - y2 exp(t1) = -0.53824531,
2
t2
45
- y1t2 - y2
exp(t2) =
0.53824531.
The dual constraints from (29) read (with
q = 2)
tixl + t2x2 = 0,
exp(tI)x1 + exp(t2)x2 = 0, Ix1I + Ix2I = 1.
We put
sgn x1 = 1
and
Then two of the
sgn x2 = -1.
equations above
become tlxl + t2x2 = 0,
-xl + x2 = 1. tI = 0.40637574
Entering
x1 = -0.83112540
and
and
into these equations we obtain
t2 = 2
It is now easy to check that all
x2 = 0.16887459.
conditions of Lemma (29) are met.
Thus the proposed solution is indeed
optimal.
We conclude this section by showing that the approximation problem is solvable under fairly general conditions. Theorem.
(32)
that the functions on
T.
T c Rk
be nonempty and compact and assume also
f, v1,...,vn
are continuous and linearly independent
Let
Then the linear approximation problem (PA) is solvable; i.e. there
is a vector
y E Rn
max If(t) tET
such that
n - Iy v (t)I = min max If(t) r=l
r r
yERn tET
I
r=l
y v (t)I. r r
We may write "max" instead of "sup" in the formulation of
Note.
(PA) since the functions
f, v1,...,vn
are continuous and
and hence the error function n
y v
f
r r
rI l
assumes its maximum and its minimum. Proof:
We define a norm on
n IIyNIv = maxi I yrvr(t)I tET r=l
Putting
n -
y = 0
we get
Rn
by
T
is compact
46
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
n
max If(t) -
E
r=1
tET
y v (t) l = max lf(t) l
rr
= A.
tET
Hence the optimum value of (PA) lies in the interval
[O,A].
the minimization we need only to consider those vectors
y
Because of which satisfy
n
max If(t) - I yrvr(t)l < A. tET r=1
(33)
Using the triangle inequality on (33) we find
n
n
II Yrvr(t)l < If(t) r=1
Yrvr(t)l + lf(t)l `_ 2A.
E
r=1
Thus we need only to minimize over those vectors
y E Rn
such that
IIYlly < 2A;
i.e. a compact subset of
Rn.
Since the preference function of (PA),
n
y -+ max If(t) - I yrvr(t)l, tET r=1
is continuous, the existence of an optimal solution follows by Weierstrass' theorem (see (13), 52).
V.
POLYNOMIAL APPROXIMATION This section is devoted to the study of (PA) in the case when
a real interval and the function nomial.
f
is
T
is
to be approximated by a poly-
Then major simplifications are possible and one can, for example,
calculate lower bounds for the error of the best approximation without treating the dual problem explicitly.
Some special approximation prob-
lems admitting an optimal solution in closed form are also treated.
We
now prove: (1)
Lemma.
(x1,. ..,xn+l)
Let
tI < t2 < ... < to+l
be fixed real numbers and let
be a nontrivial solution of the homogeneous linear system
of equations n+l
r 1 (2)
i=1
Then xi xi+l < 0,
i = 1,...,n.
7.
Polynomial Approximation
Pn
the uniquely determined polynomial
Proof:
Let
be a fixed integer such that
i
Yrt
Pn(t) _
47
1 < i < n.
Denote by
r-1
r=1
satisfying
j =i
1,
Pn(t.)
D,
(See Fig. 7.1.)
= 1,...,n+1,
j
That such a
i,
j
j +
does exist is an immediate consequence
Pn
of the fact that the so-called Vandermonde matrix is nonsingular. (3) below.)
From (2),
P (t.)xi _ i=1
n+l
n
n+l n
(See
i
y
L
L
r i-1
r=l
Pn
Due to the construction of
tr-1 X. = 0. i
i
this relation gives
xi + Pn(ti+l)xi+l = 0. Pn
cannot vanish in
[ti,ti+l ]; if it did, Pn
Therefore
which is impossible. (3)
Exercise.
Vandermonde matrix
Let
V
Pn(ti+l) > 0
t1 < t2 < ... < to
by
1
Fig. 7.1
would have
n
and we conclude be given.
zeros,
xixi+l < 0.
Define the
48
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
1
tI
t2
...
to
t2
t2
...
t2
2
1
n
V(tl, ..,tn) = to-1
to-1
to-1 .
.
.
n
2
1
It can be shown that det V(tl,...,tn) > 0.
(4)
Use (2) to obtain the expression det V(t1,...,ti-1, ti+1"* 'Itn+l) xi = -xi+1
et V t1,...,ti-I, ti, ti+2....,tn+1
This combined with (4) gives an alternative proof of Lemma (1). We remark here that a result corresponding to Lemma (1) may be established not only for tems
l,t,...,tn-1, but also for general Chebyshev sys-
The theorems to follow which depend on Lemma (1) can
v1,...,vn.
also be generalized.
See Chapter VIII.
The following theorem, which is due to De La Vallee-Poussin, is important since it can be used for calculating lower bounds for the error of the best possible approximation without solving the linear system (2) explicitly. (S)
of degree
Theorem.
Let
< n, and let
f
be continuous on
(a,s], let
a < tI < t2 <...< to+1 < S
be
P
n
be a polynomial
points such
that {f(ti) - P(ti)}.{f(ti+1) - P(ti+l)} < 0,
(See Fig. 7.2.)
min I f (ti)
i
i = 1,...,n.
(6)
Then
-P
)I < An <
max I f (t) -
a
P(t)j,
(7)
where n
An = infl max If(t) - I yER a
A
n
yrtr-ll.
denotes the smallest error which can be achieved when
approximated by polynomials of degree
< n.
f
is
7.
Polynomial Approximation
49
degree P < n = 3 n + 1 = 4
t3
t2
t1
Fig. 7.2
Proof:
The right-hand inequality in (7) is obvious.
Let
pl' " "pn+l
be a nontrivial solution of the system n+1 E
r-1 ti Pi = 0,
r = 1,...,n.
i=1
By Lemma (1) we may assume
pipi+l < 0, i = 1,...,n.
Now put
n+l
xi = Pi{ El I pjI }-1. In this way we get a feasible solution to the dual problem since n+l
tr lx. = 0,
r = 1, ..,n, (8)
n+l G
lxil = 1.
i=1
By (16) of §6 (weak duality) we also have n+l
f(ti)xi < An
(9)
i=1
We now define 6i = f(ti) - P(ti); by assumption (6), 6i6i+l < 0.
If the signs of all numbers
xi
changed simultaneously, the constraints of (DA) are still met.
are
Therefore
50
III.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
we can always achieve X161 > 0
(10)
since we also have n+l
n+l
n+l
+ L
i=1
Applying (8) and (10) we find that
xixi+1 < 0.
xilf(ti) - P(ti)I
E
f(ti)Xi
X
xi6i
i=1
i=1
n+ > min I6iI
Ix.
E
= minlf(ti) - P(ti)I.
i=1
I
i
An application of (9) now gives the desired result. Corollary.
(11)
that there are
n+l
Let
P
points
be a polynomial of degree a < tI < t2 <...< to+l <-a
< n
and such
with the proper-
ties
I6iI = If(ti) - P(ti)I = and
Then
6 .6 1.+1 < 0,
i = 1,...,n+l, (12)
i = 1,...,n.
1
is a polynomial of degree
P
the uniform norm. f - P
max If(t) - P(t)I, a
< n
which best approximates
f
in
The conditions (12) state that the error function
alternates in sign at
t1,...,tn+1
and assumes its largest ab-
solute value at these points. Remark.
(13)
In the special case when 1611=1621=...=16n+ll'
we get n+l
f(ti)xi = minlf(ti) - P(ti)I i=1
i
Hence (7) and (9) give the same lower bound for the attainable approximation error in this case.
We shall show in Chapter IV that a strong dual-
ity theorem can be established for the dual pair (PA) and (DA); i.e. no duality gap occurs.
This entails the use of Theorem (5) for constructing
arbitrarily good lower bounds for
An
by choosing
suitably. (14)
Determination of a polynomial satisfying (6).
t2 <...< to+1 < a
be given.
Define the function
6
a < tI <
Let
by
6(ti)
We now seek a polynomial
P
of degree
< n
and a constant
a
such that
7.
Polynomial Approximation
51
i = 1,...,n+l.
PCti) = f(ti) + ed(ti),
(15) is a linear system of equations with as unknowns.
(15)
and the coefficients of
a
Using (4) it is easy to demonstrate that
and
P
P
are
c
uniquely determined by (15). and
P
scheme.
a
are efficiently calculated using a so-called difference
(We assume that divided differences are familiar to the reader.
Otherwise see e.g. Dahlquist and Bjorck, (1974), p. 277.)
Since
P[tl,...,tn+1] = 0, (15) gives at once e = -f[t1,...,tn+1]/5[tl,...,tn+l],
where we use the customary notations for divided differences.
P
may be
represented in the "Newton" form n-1
P(t) = P[t1] + P[t1,t2](t-t1) +...+ P[t1,t2,...,tn]
II (t-ti).
i=1
The divided differences appearing in this formula are easily computed from the intermediate results obtained when calculating (5),
lei
is a lower bound for
(16)
Numerical example.
tl = 0, t2 = 1/2, and
ti 0
f(ti)
t3 = 1.
f[ti,ti+l]
Let
[a,$] = [0,1], f(t) _ (l+t)-1, n = 2,
The difference schemes for f[t1,t2,t3]
1/3
(17)
t2 E (0,1)
are:
6[tl,t2,t3]
-8
1
-4
1/2
-1
e = 1/24; i.e. the function
mated in the uniform norm over less than
6
4
2/3
We get at once
and
-1
1
-1/3 1
f
d[ti,ti+l]
8(ti)
-2/3 1/2
By Theorem
c.
An.
[0,1]
1/(l+t)
cannot be approxi-
by a straight line with an error
1/24 '= 0.0417.
Exercise.
Take
t1 = 0, t3 = 1, and show by optimizing over
(see also exercise (25), §6) that
A
2
= (3-j)/4 z 0.0429.
We now discuss some special approximation problems which nevertheless are of general interest.
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
52
(18)
Exercise.
f
Let
have two continuous derivatives on Denote by
f"(t) > 0, t E [a,s].
and be such that
which interpolates
a
at the endpoints
f
and
t
[a,8]
the straight line Put
$.
6= max If(t) - k(t) 1. a
Next use (11) to show that the straight line which approximates
f
best
in the uniform norm has the representation
k(t) - 6/2 and that the approximation error is (19)
Exercise.
t1
=
a+S 2
-
and
tl
in (18) and show that the straight
f(t) = t2
Put
line which best approximates this function at
6/2.
in the uniform norm interpolates
f(t) = t2
t2, where
1 2T (0-')'
a+S
t2 =
2
+
2T
Show also that the approximation error is
8 (a-B)2. We will next treat the more general problem of approximating in the uniform norm by a polynomial of degree
f(t) = to
In order to represent
< n.
the solution in a concise form we introduce the Chebyshev polynomials. (20)
Definition.
The Chebyshev polynomials
TO,T1,...
are defined
through
T0(t) = 1,
T1(t) = t (21)
TnCt) = 2t Tn-1(t) - Tn-2(t), (22)
n = 2,3,...
Show that the recurrence relation (21) is satisfied
Exercise.
by
(23)
Tn(t) = cos(n arccos t). Hint:
Use the addition theorem (24)
cos(A+B) = 2 cos A cos B - cos (A-B). We now prove: (25)
Theorem.
Let
[a,8]
be a given interval.
n
min max t...... t_ a
I
II
i=l
(t-ti)I = 2(,_a)n/4n
Then
7.
Polynomial Approximation
53
The minimum is assumed for
ti =a+8 2 +
2
8-a cos 9i, where
Bi =
i-1/2 n
i = 1,...,n.
7r,
(26)
Also,
2(8-a)n T (2t-a-0 ). 4n n 8-a
T1
i=1
i
Proof:
Consider the approximation problem n
max stn -
Minimize
tE [a, 8]
yERn
We next determine
yrtr-ll
I
(27)
r=1
y
through the condition
n to
= Qn(t), ytr-1 r
E
(28)
r=1
where
Qn (t) = 2 (8-a)n T (2t-a-8) 4n
n
(29)
8-a
and apply (11) to verify that
y
is a solution of (27).
We first note that 2t-a-$ 8-a
t
maps
on
[a,8]
[-1,1].
Using the recurrence relation (21) for
verify that the coefficient of that
ITn(t)l < 1
and thus
to
in
Q.
is
1.
IQn(t)I < 2(8-a)n/4n.
Tn, we
By (23) we conclude
We also find that
Qn(ti) = (-1)i-12(8-a)n/4n, where 2+8
t* = 1
and
2
IQn(t)j
+
8-a
cos
2
(i-1)"
n
,
i = 1,
assumes its maximum value at
, n+l ti.
(30)
,
Hence the conclusion
follows from (11).
Using (28) and (29) we conclude that the polynomial < n
which best approximates
to
in the uniform norm on
Pn
of degree
[a,8]
is given
by P (t) = to _ 2(8-a)n T (2t-a-8). n
(Note that
vanishes.)
4n
n
8-a
when the right hand side is expanded the coefficient of
(31)
to
54
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
(32)
Exercise. -N-1
ti = cos
ii,
Let
i = 1,...,N+1,
and
si = cos 1 N/2,
i = 1,...,N.
Show that the Chebyshev polynomials satisfy the following orthogonality relations: 1
N N
I Tm(si)Tn(si)
I 'Tm(ti)Tn(ti)
m = n = 0
0<m=n
j 1/2,
i=1
1N+1
,
=
i1
0
,
1
,
m = n = 0
1/2,
JI
0
,
(33)
m#n, m
0<m=n
(34)
m#n, m
Here the notation " means that the factor of the first and the last term in (34).
1/2
should be placed in front
Note also that
TN(ti)
and TN(si) = 0. We next treat an approximation problem which sometimes occurs in the study of iteration processes in numerical linear algebra. (35)
Theorem.
Let
[a,8]
be a bounded interval such that
0 ¢ (a,6].
Consider the problem max IP(t)I a
Minimize
(36)
< n
over all polynomials of degree
such that
P (O) = 1.
(37)
The optimal solution is given by P(t) = Tn(2tt=a-B)/T.(a±s)
Proof:
We can write
P(t) = 1 - ylt - y2t2 since
P(O) = 1.
min y1,...,yn
P
(38)
in the form yntn
(39)
The problem (36), (37) may then be written max 11 - yit -...- Yntnl a
(40)
7.
Polynomial Approximation
55
and we recognize (40) as an instance of (PA) in (1) of §6. determine subsets x1,...,xq
Its dual reads:
(q > 1), and real numbers
{ti,.... tq} c :[a,01
such that
q xi
(41)
i=1
is maximized subject to the constraints q
xi ti
= 0,
1
r = 2,...,n+l
(42)
i=1
(43)
Ixil < 1. i=1
See (11) of §6. (40) and (41)
-
We shall construct feasible solutions to the two problems (43), and then use (29) of §6 to verify that these solu-
tions are optimal. In (41) ti =
SZa +
(43) we put
Ba z
q = n+l,
cos 6i,
xl = 2n Tn(cos 01)'
0.
=
il 2n
xn+1
i = 1,...,n+l
iT,
Tn(cos 6n+1)
(44)
Tn(cos 6i = 2,...,n.
xi = n
Condition (42) is now met by (33) since we may express combination of We observe that 'l,...,yn
Since
T0....,Tr-1. P
ITn(ti)I = 1,
in (38) is of the form (39).
by (39) for this particular polynomial
tr-1
as a linear
(43) is also satisfied.
Next we define P.
By (38),
P(ti) _ (-1)1-1/Tn(a±0)
Now (44) gives (-1)i-1
xl _
1
2n'
xn+1
(-1)n 2n
'
2,...,n.
n
xi
Hence (30) of §6 is also met, establishing optimality of the polynomial (38).
We next discuss the problem of constructing polynomials of degree < n
which approximate a function
approaches are conceivable: determine the polynomial P(ti) = f(ti),
P
f
on a bounded interval
i) select
n
of degree
< n
i = 1,2,.-.,n,
points
[a,0].
t1 < t2 <...< t
n
Two and
satisfying (45)
56
select
ii)
Q
APPLICATIONS OF WEAK DUALITY IN UNIFORM APPROXIMATION
III.
n+l
of degree
points
sI < s2 <...< sn+l
degree
min max If(t) - Q(t)I, Q sl,...ISn+l
< n
of degree
Q < n.
(46)
Show that there is one and only one polynomial
Exercise.
(47)
and determine the polynomial
which solves the problem
< n
satisfying (45).
Hint:
tions which must be satisfied by the coefficients of The construction of verify that cide with
is described in (14).
Q
interpolates
Q
t1..... tn
f
in
n
P
Derive a linear system of equaP.
It is now easy to
points which generally do not coin-
We next state an expression for the ap-
in (45).
proximation error.
n
be a closed bounded interval and let
[a,s]
n points with
be
t1,...,tn have
Let
Lemma.
(48)
a < tI < t2 <...< to < a.
continuous derivatives
the polynomial of degree
< n
f',...,f(n)
on
satisfying (45).
[a,$]
Further, let
f
and denote by
P
Then
f(t) = P(t) + R(t),
(49)
where n
(n)
I
R(t) = nf with the unknown point the points
and
t
(t-ti),
TI
i=1
(50)
lying in a subinterval of
E
t1,...,tn.
depends on
In general,
containing
[a,R] t.
The proof of this result is given in Dahlquist-Bjorck (1974), p. 100. Using (49) and (50) we get n
If(t) - P(t) I _ n; If(n) (t) I
amaxsI1IT
(t-t) I
(51)
.
The approximation error is thus bounded by an expression containing a factor which is independent of
A natural approach is to make this
f.
second factor as small as possible in the uniform norm.
We may here di-
rectly apply Theorem (25) to determine the appropriate choice of tl,.... tn
ti =
To select
in (45); namely,
a26
si
interpolates
+
-S -_a
cos LIZ-2 n,
i = 1,2,...,n.
in (46) we argue as follows. f
at the points
t
,
1
n t. i=1(t-1)
_
2t-a-S
2(8-a)n T 4n
n(
0-a )'
of (26).
We assume that Then
(52)
P
in (49)
7.
Polynomial Approximation
57
The maximum of the absolute value of this function is assumed at S. =
azs
Sz-a
+
cos
(1
n )
(53)
i = 1,...,n+1,
and these points are entered into (46). (54)
f(t) = to degree
Exercise.
Consider again the problem of approximating
over a closed bounded interval
< n.
(a,B)
by a polynomial of
Verify that the two approaches i) and ii) above give the
same results, if we select
ti
in (45) according to (52) and
s i
in
(46) according to (53). (55)
Exercise.
Assume again that
in (45) is given by (52) and
t i
si
in (46) by (53).
We determine the polynomials
P
and
Q
in the form
n-1 c qrl
r
Q =
E
drgr,
r= O
where
2t-a-p 4r(t) = T r(
B-a
),
r = 0,...,n-1.
Use the orthogonality relations (33) and (34) to derive expressions for the coefficients
cr
and
dr.
Show also that the number
c
of (15)
is given by n+1
(-1)1 lf(s
n
i=1
thus obtaining a lower bound for the achievable approximation error.
Chapter IV
Duality Theory
A major topic of this chapter is the derivation of "strong" duality results, i.e. theorems which specify when
v(D) = v(P).
Another important
topic is the existence of solutions to the problems (P) and (D).
We shall
give two strong duality theorems, namely (9) of §10 and (7) of §11.
They
can be used to verify strong duality in most linear optimization problems occurring in practice.
§8 is of independent interest since it gives a geometric representation of the dual problem (D) that also is helpful for the understanding of the numerical procedures to be described in Chapter V, VI, and VII.
§8.
GEOMETRIC INTERPRETATION OF THE DUAL PROBLEM At first we introduce the concept of a convex set and the special
case of a convex cone.
Their elementary properties are discussed.
A very
simple geometric representation of the dual problem will be given. (1)
The set
Definition.
following property: line segment between
a1 E K,
a1 a1
and and
K c Rn a2
is said to be convex if it has the
belonging to
a2
lies in
K.
implies that the entire
K
This may be written:
a2 E K
Aa1 + (1-A)a2 E K,
By induction on
q
A E [0,1].
we easily establish that if
al,...,aq E K. then q
A.a. E K
i=111
58
K
is convex and
8.
Geometric Interpretation of the Dual Problem
59
convex
nonconvex
Fig. 8.1
if
A l + A 2+ ... + A q= 1 and ai > 0,
i = 1,...,q.
See Fig. 8.1. Definition.
(2)
Let
vectors
x E Rn
q
A be an arbitrary set of vectors in
We
Rn.
Conv (A), to be the set of all
A, denoted
define the convex hull of
admitting a representation
aiai (q > 1)
x= i=1 where q
ai = 1
i=1 and A.
> 0,
i = 1,...,q,
Thus the convex hull of
ai E A,
i = 1,...,q.
A, Conv (A), consists of all possible convex
combinations
q
x=
i=1
A.a., 1 1
A. 1-> 0,
q
of finitely many vectors from large.
A. = 1,
i=1 1 A.
q>1
The number
(3)
q
can be arbitrarily
The verification of the following statements is straightforward:
Conv (A) is convex for any set
A; a convex set which contains
A must
contain all convex combinations (3); Conv (A) is the smallest convex set having
A
as a subset.
See also Fig. 8.2.
IV.
60
DUALITY THEORY
Fig. 8.2
Fig. 8.3
(4)
that if
A convex cone is a convex set with the property
Definition. then
x E C
where
form a convex cone of the convex set
Let
K
be
(5)
which we shall denote
C
Let
A
be an arbitrary subset of and the notation
for the conic hull of the convex hull of
CC(A)
Instead of
We shall
Cone (Cony (A)) Cone (Cony (A))
by forming all nonnegative multiples of all
convex combinations (3) of elements of
z =
A.
Rn.
CC(A).
we shall sometimes write By (5), we obtain
Cone (K) is
K.
use the word convex conic hull of A
CC(A) = {z
Cone (K), the conic hull
It is straightforward to verify that
K.
Definition.
L
xiai,
xi > 0,
i=1
ai E A,
See Fig. 8.3.
A> 0
x E K,
the smallest convex cone containing (6)
A > 0.
for all
Then all vectors
a convex set.
y = Ax
Ax E C
i = 1,...,q,
A.
i = 1,...,4, (7)
q > 1}.
B.
Geometric Znterpretatign of the Dual Problem
61
Fig. 8.4
Thus the convex conic hull consists of all nonnegative linear combinations of elements of the set
We shall apply the concepts introduced above
A.
to the set of vectors which occurs by the formulation in §4 of the dual pair (P)
-
The constraints of the primal problem
(D).
a(s)Ty > b(s),
s E S,
can be expressed in terms of the set of vectors
AS = {a(s)
I
s E S} c Rn.
(8)
Combining (16) of §4 and (17) of §4 with (7) we find that Xi.... x
}
{s1,...Isq,
is feasible for the dual problem if and only if the vector
c
q
may be written as a nonnegative linear combination of the vectors a(s1),...,a(s
q)
x1,...,x
with coefficients
solutions if and only if Since the convex cone
c
lies in CC(AS)
q.
Thus (D) has feasible
CC(AS).
will play a major role in our presenta-
tion we shall introduce a special notation. (9)
Definition.
The convex conic hull of
AS
will be denoted
Mn
and called the moment cone of the optimization problem (P), Mn = CC(AS).
The words "moment cone" are traditional and will not be elaborated upon. From the remarks preceding the last definition we get the following statement:
62
DUALITY THEORY
IV.
1
Fig. 8.5
The dual problem (D) is feasible if and only if
Lemma.
(10)
c E Mn Example.
(12)
s
a(s) _ The sets
[:2
,
Put
n = 2, S =
s E [0,1].
AS, Conv (AS), and
Consider now the Minimize
and
[0,1]
Mn = CC(AS)
are indicated in Fig. 8.5.
optimization problem:
yI + 2 y2
subject to
(P) syl + s2y2 > es - 1,
Here
c = (1,1/2)T.
s E [0,1].
We see from Fig. 8.S that
the dual (D) of (P) above is feasible.
(Exercise:
megative combination of two suitable vectors (13)
Exercise.
c
a(sI)
is in
Mn
express
and
and that c
as a non-
a(s2)!).
Consider the same example as in (12) but with the
modification Minimize
yl.
Is the corresponding dual feasible? We have hitherto permitted an arbitrarily large natural number in the representation (7) of the convex hull.
set of a finite-dimensional vector space
Rn
However, CC(A)
q
is a sub-
and one might conjecture
Problem
Geometric Interpretation of t -e
8.
q = n
that at most
tion of a vector
vectors
from
a. i
A
are required for the representa-
(Try some simple examples!)
CC(A).
in
z
63
We now prove a general statement to this effect. Reduction Theorem.
(14)
Let the vector
negative linear combination of the
i
vectors
q
(p > 1) be a non-
z E Rp
z1,.... z
RP (q > 1),
in 9
e. q i = 1,...,q.
xi > 0,
xizi,
z =
(15)
i=1
Then
admits a representation
z
q
z =
x .zi.,
I
x
i
i=1
i = 1,...,q.
> 0,
(16)
i
such that at most set of vectors
.
of the numbers
p
with
z.
xi
x. > 0, (z.
are nonzero and such that the x. > 0}, is linearly independent.
I
The proof is constructive and we will show how to arrive at a
Proof:
representation (16) from (15) by means of finitely many arithmetic operaalready are linearly independent then
zl,...,zq
tions. If the numbers
q < p
and
are uniquely determined by (15) and (16), and there
x. = x,
is nothing more to prove.
We assume therefore that
z1,...,zq
are lin-
We will demonstrate how to reduce the number of positive
aely dependent.
terms in (15) step by step until the corresponding vectors become linearly
The linear dependence of z1-.,z
independent.
means that there are q
numbers
a,,...,aq
such that
q a
i
i=1
z
= 0.
i
(17)
Hence we have for each
with
r
ar
i 0
a.
z
r
1
=
Z1 .
.
itr ar
Entering this relation into (15) we get q Z = i=1 i+r
a.
(xi - xr al)z.. r
(18)
Hence we have got a representation of of the vectors
zl,...,z
q.
z
as a linear combination of
We must now also show that
r
so that (18) becomes a nonnegative linear combination, i.e.
q-1
can be chosen
64
a.
x. - x 1 > 0,
r aT -
1
We now select
r
i
such that
ar > 0.
in (17) are nonpositive we multiply (17) by
ar
(If all
x. > 0
a
and
1 -
r
> 0
-1.)
Since
we conclude that
a.
xi - xr al > 0
if
ai < 0.
r
The condition (19) is thus met when
We next discuss the case
Then (19) implies
ai > 0. x,
ai < 0.
x
>
r
a. -ar
.
i
This condition and consequently (19) is certainly met if we determine
r
such that x
a
r
x,
a i > 0}.
min{
=
a.
Then (18) expresses vectors
z
(21)
as a nonnegative linear combination of the
zl'...,zr-1' zr+l'...,z
q-1
This procedure may be repeated until
q.
we have determined a representation (16) such that those vectors which belong to nonnegative coefficients Example.
(22)
zl = I i I' Then
z
xi
(p = 2, q = 4).
Z2 = I Z I,
Z3 =
are linearly independent. Let
(2),
z4 =
V,
(2), z = 1724 1
admits the representation
z=41 z1+21 z2+41 z3+41z4. This relation corresponds to (15). of course.
(23) The vectors are linearly dependent,
Thus we have, for example,
zl = 31 z2 + 31 z3 or
-3z1 + z2 + z3 = 0.
Geometric Interpretation of the Dual Problem
8.
This corresponds to (17) with ar > 0, r = 2
must have
al = -3, a2 = 1, a3 = 1, a4 = 0. r = 3
or
65
meets
this condition.
Since we
By (21) we
must next determine the smaller of the quotients
x2 a2
x3 a3
and
1
Thus we should take
1 =T .
r = 3
and (18) gives
z= z1+ 41 z2+ 41 z4. Thus
no longer appears in (24), in contrast to the representation
z3
(23).
(24)
Carry out another reduction step, this time on (24), and obtain
as a nonnegative linear combination of two of the vectors
z
z1, z2, z4.
Is it possible to carry the reduction even further? Exercise.
(25)
z E Conv (A) c Rn
Prove the Lemma of Caratheodory:
Every vector
admits a representation
n+l x a
z
i=1
i
i
where n+l
xl,...,xn+l > 0
al,...,an+l E A,
and
X. = 1. i=1
From the Reduction Theorem (14) we obtain the following result: Theorem.
(26)
{s1,.... s
Let
,
q
x1, ..,x
q
}
with
q > 1
be feasible
for (D); i.e. q
r = 1,...,n,
ar(si)xi = cr,
(27)
i=1
and x. > 0,
i = 1,...,q.
Then there is a subset
{s1 ,...,si } n 1 with the properties
xl ,...,x.
of
{s1....,sq}
and numbers
n
1
..,si
{si
,
n
1
..,xi }
xi
is also feasible to (D); i.e.
n
1
n ar(s
j=1 3
i j )x i j
= c,
r
r = 1,-..,n,
(28)
66
The vectors,
a(s.
which belong to positive numbers
)
J
In Theorem (26) we have tacitly assumed that elements, i.e. at least
S
constraints occur by (P).
n
(In many applications
quirement is met.
are linearly
x,
J
independent.
or
DUALITY THEORY
IV.
S
has at least
As a rule, this re-
has infinitely many elements
is the result of "sufficiently fine" discretization and
S
n
ISI
be-
comes very large.)
One can always achieve OTy > 0
ISI
> n
by adding the trivial constraint
to (P) sufficiently many times.
This operation does not change
M .
n
However, we cannot conclude from Theorem (26) that we only need to consider feasible solutions (D) with
{s1,...,s
q
and that we can put
q < n
lation of the dual problem.
,
x1,
q = n
.. x
q
}
of the dual problem
from the start in the formu-
It is quite possible that by the transition
from (27) to (28) by means of the Reduction Theorem (14) it happens that
q
n n L
b(si )xi
j
j
b(si)xi.
<
i=1
j
If one wants to make sure that the value of the dual preference function does not change, then one must apply the Reduction Theorem on the
n+l
equations q
i=1
b(si)xi = co, (29)
q
r = 1,...,n.
ar(si)xi = cr, i=1
We obtain then the important result that
n+l
points
"are enough".
s. J
Thus we may put (30)
from the start in the formulation of (D).
q = n+l
The dual problem (D). n+l
b(si)xi
Maximize i=1
subject to the constraints n+l
ar(si)xi = cr, i=1 sl,.... sn+l E S,
xl,.... xn+1 > 0.
r = 1,...,n,
8.
Geometric Interpretation of the Dual Problem
67
We will show in (7) of §12 that if (D) has a solution then we can even put
q = n
from the outset.
From the preceding argument, in particular (29), we are led to intro-
Mn+l C
duce yet another moment cone b(s)
to the vector
a(s)
Rn+l
We adjoin the real number
and consider the vectors
b (s) al(s) R+1
a(s) =
(b(s),aI(s),...,an(s))T E
=
an(s)J Then we can write (29) in the form q
'a(si)xi = (c0,cl,...,cn)T.
(31)
i=1
Following the pattern of (8), we let Rn+1
AS = {a(s)
I
s E S} c
and can then define
Mn+1'
The moment cone
Definition.
(32)
Mn+I
associated with the opti-
mization problem (P) is the convex conic hull of
AS;
n+l = CC(AS). By the definition of the convex conic hull (see (7)) every vector i E Mn+1
admits the representation q
a(si)xi,
z =
(33)
xi > 0.
i=1
By comparison with (31) we realize that
{s,...,sq, xl,...,xq)
sible for (D) with the corresponding value
c0
is fea-
of the dual preference
function if and only if (cO,cl,...,cn)
(We may put
T
q = n+1
E Mn+1'
(34)
in (31) and (33) by the Reduction Theorem.)
From (34) we obtain a "geometric" formulation of the dual problem. It will be fundamental for the discussion to follow. (35)
A "geometric" formulation of the dual problem.
Maximize
c0
68
IV.
DUALITY THEORY
1
Fig. 8.6
subject to the constraint (c0,...,cn)T E n+1'
It is, at least in principle, clear how to get from a solution
of (35) to a dual solution 6
xl,...,xn+l}
(c0,c)
(and vice versa).
1
Since
(c0,cl,...,cn)T E
n+1'
n+l
bCsi)xi
c0
i=1 and n+l
aCsi)xi,
c = i=1
where
xi
{sl'"
are nonnegative numbers and n+l'
x1......x1}
si E S, i = 1,...,n+1.
is hence a solution to (D).
Fig. 8.6 gives a geometric illustration of the dual problem (35). We seek that point
(c0,c)T
{(c0,cl,...,cn),
which belongs to
Mn+I
of the straight line
co E R}
and whose first component is as large as possible.
We mention also the special case of linear programming. Rn+l
Mn+1 = {z E
z =
m C
i=1
Let
There we find
aixi = Ax,
x = (xl,...,xm)T > 01.
(36)
9.
Soivabi,11ty of the Dual Problem
b1
all
A =
b2
b
a12
alm
69
m
a
a21
a22
and
ant
anm
+
+
+
al
a2
a
2m
m
We can now write the condition (34) in the form
The dual problem (LD) is then equivalent to the following problem (we write
for
x0
c0)
Minimize
T -x0 + b x = 0,
subject to
x0
Ax = c, (Xi,...,xm)
§9.
T
> 0.
SOLVABILITY OF THE DUAL PROBLEM The following important theorem on the solvability of the dual prob-
lem (D) is an immediate consequence of the formulation (35) of §8. (1)
Theorem.
Let a given linear optimization problem be such that
is closed and the dual problem (D) is bounded (i.e. it is in state
Mn+l "B" - see Diagram (1), §5). Proof:
We note that
Then problem (D) has a solution. v(D)
is the maximum of the continuous function
f given by f (z0,
, zn) = z0,
defined on the closed and bounded set I
Mn+l n {(z0,z)T Here
c E Rn
v(D)-l < z0 < v(D), z = c).
is the vector appearing in the preference function of (P).
Theorem (1) is very useful since there are simple criteria for ascertaining that
M1+1
is closed.
They are applicable for important
classes of linear optimization problems. dual pair (P)
We shall show in §10 that the
- (D) has no duality gap under the assumptions of Theorem (1).
IV.
70
DUALITY THEORY
Quite often we shall encounter a special class of problems where the index set
S
a1,...an, b
and the functions
which appear in the con-
straints of (P), n
ar(s)yr > b(x),
s E S,
r=1 satisfy the following assumptions:
General assumptions on
(2)
and the real-valued functions
(P).
S
al,...,an,b
is a compact subset of which are defined on
Rk
are
S
continuous there.
This assumption is valid for the examples (3) of §3 and (7) of §4 k = 1) but not for (4) of §3.
(with
gramming (2) holds trivially; since tion on
is continuous.
S
For the special case of linear proS
is finite every real-valued func-
We can then always assume that
S = {1,...,m} c R. (3)
Definition.
If there is a vectory = (yl,...,ym)T E Rn
such
that n
r=1
ar(s)yr > b(s),
s E S,
(4)
then (P) is said to meet the Slater condition.
If (P) satisfies (4) then
we also call (P) superconsistent since (4) is a sharpening of the state-
ment thaty is feasible for (P). Suppose now that Assumption (2) is satisfied.
tion (4) is met if one of the functions
al (s) = 1, Indeed,
a1,...,an
Then the Slater condiis constant, e.g.
s E S.
(4) is met if we take
y = (y1, 0,...,0)T.
where yl > max b(s). sES
This is possible since
b
is continuous on a compact set.
(Compare (13)
of §2.) (5)
Remark.
The Slater condition is an example of the so-called
regularity conditions which are introduced in the theory of optimization and which play a major role in the derivation on existence of solutions. tion in §11.
of theorems on duality and
We shall encounter another regularity condi-
9.
Solvability of the Dual Problem
(6)
Exercise.
dition (4).
Consider (P) given Assumption (2) and the Slater con-
Show that the set of vectors feasible to (P) has interior
Hint: Lety E Rn
points.
71
satisfy (4).
y with
such that all vectors
Iy-yI < E
Show that there is an
E > 0
are feasible for (4).
The two theorems to follow can be used to establish the existence of solutions of the dual of most linear optimization problems encountered in practice. (7)
Suppose that Assumption (2) is satisfied and that (P)
Theorem.
meets the Slater condition.
Then the moment cone
is closed.
Mn+l
In order to carry out the proof of this theorem we need the following result which is of independent interest. (8)
Lemma.
A c Rp
Let
be a compact set.
Then its convex hull,
Conv (A), is also compact.
By (25) of §8, Conv (A) is generated by means of all possible
Proof:
linear combinations p+l aixi i=1
where and
a1,...,ap+1 E A where the set
D c Rp+l
(xl,...,xp+l) E D,
is defined by p+l
Rp+1 I
D = {x E
xl > 0,
i = 1,...,p+1, and
xi = 1}. i=1
Hence Conv (A) is the image of the compact set
AXA
x
...
X
(p+l times)
AxD
under the continuous mapping p+l
(al,...,ap+1, xi,...,xp+1) 3
L
aixi.
i=1
Since
A
was compact, Conv (A) must be compact as well.
(See the remark
after (12) of §2.) Proof of Theorem (7):
will show that then M1+1 = CC(AS).
z
Let
z
must be in
be an arbitrary vector in Mn+l
also.
Mn+l'
We
By Definition (32) of §8,
DUALITY THEORY
IV.
72
Thus to
we may associate a sequence
z E Mn+I
a sequence of nonnegative numbers
{hi}i>1
in
and
Conv(AS)
such that
{ai}i>1
z = lim aihi.
(9)
j-+_ The set
AS
is compact since
is compact and
S
By Lemma (8), Conv (AS)
tinuous.
subsequence of
is compact.
We may therefore pick a
which converges to a vector
{hi}i>1
are con-
a1,.... an, b
h E Conv (AS).
we may as well assume from the outset that the sequence
{hi}i>1
Thus
in (9)
is such that lim hi = h,
h E Conv(AS).
i-M.
If now the sequence
{ai}i>1
that it converges to
A > 0.
is bounded we can in the same way assume Then we obtain
z = lim a.h. = lim X. lim h. = ah h E Conv (AS), a > 0, it follows that
and from
We next consider the remaining case when
as was to be established. {ai}i>i
is unbounded.
z = ah E CC(AS) = n+1
Then we may assume, if necessary by using a suit-
able subsequence, that > 0, A.>0
1 = 1,2,...,
and lim 1/a. = 0. i-+°° 1
Thus we get
i- i
h = Iim hi = Iim
i-
aihi = 1im al i-MD
This means that the null vector of are S
nonnegative numbers
q > 1
1im aihi = 0z = 0.
1 1-b° Rn+1
lies in
a1,...,aq
and
Conv (AS). q
points
Hence there
s1,...Isq
in
such that q
a(si)ai
0 = i=1
and q
ai = 1.
(10)
i=1
From the definition of
a(s)
(see
(30)- (31) of 48) this implies that
9.
Solvability of the Dual Problem
73
q
b(si)ai
0=
i=1 and q
ar(si)ai,
0 =
r = 1,...,n.
i=1 Let
y E Rn
The last two equations now give
be an arbitrary vector.
q
n \\
0 =
yrar(si)
ail
b(si)J
Since problem (P) is required to meet the Slater condition there is a such that
y' E Rn nC
Yrar(si) - b(si) > 0,
i = 1,...,q.
r=1
If we now put ... = aq = 0
ity that
y = y
is unbounded.
Example.
s2 y1 > s,
Here we have
This rules out the possibil-
Hence we have established the theorem.
Consider the constraint
n = 1, S = [0,1], aI(s) = s2, b(s) = s. a1(0) = b(0) = 0.
(xI,O)T, x1 > 0
(13)
aI = a2 =
s E [0,1]
tion is not met since vectors
ai > 0, that
must hold, contradicting (10).
{Ai)i>1
(12)
in (11) we get, since
Exercise.
are in
Mn+1
Mn+1
The Slater condi-
is not closed since the
but not in
Mn+1'
Consider the problem of uniform approximation over
a compact set, discussed in 96.
Show that the Slater condition is met.
In §4 we showed that if (P) and (D) are consistent, then (D) has a finite value.
Combining (1) and (7) we get the following statement on the exist-
ence of solutions to (D). (14) i)
ii)
iii)
Theorem.
Let the dual pair (P)
-
(D) have the properties
Assumption (2) is satisfied, (D) is feasible,
(P) meets the Slater condition.
Then (D) is solvable.
This theorem will be sharpened significantly in 12 of 910. We now treat linear programming and show that the corresponding mo-
ment cone n+1
as defined in (36) of §8 is closed in this case.
DUALITY THEORY
IV.
74
We shall say that cones of the form C = {z E Rp
(x1,...,xm) > 0}
z = Ax,
I
are finitely generated.
In the case of linear programming, Mn+I
finitely generated, and the following theorem establishes that
is
Mn+1
is
closed.
Then the rows of A
p.
be a convergent sequence in
Rp
are linearly independent. C
is closed.
p x m matrix
We consider first the case when the
Proof:
rank
Every finitely generated cone in
Theorem.
(15)
Let now
A has {zj}j>1
such that
zJ + z
(16)
We want to show that
z
is also in
Every
C.
nonnegative linear combination of at most vectors of
z)
can be written as a
linearly independent column
p
A, by the Reduction Theorem (14) of §8.
We may now, for each
j, supplement this set of column vectors by picking suitable column vectors from the remaining ones to get a basis for each
vector
xj
- Rp
z) = A j xj, Here
A.
I. c {1,...,m}
an index set
j > 1
p
elements and a
such that
x> > 0.
is formed of the columns from
xj = A-Izl,
Then there is for
Rp.
containing
A
corresponding to
Ij.
Thus
j > 1.
However, there are only finitely many matrices these a fixed matrix
A
and a subsequence
Aj.
{j(k)}k>l
Hence there is among of natural numbers
such that
xi (k) = A -1 z3 (k) ,
k > 1.
Hence we get from (16)
xj (k) + x = A -1 z. Since
xj(k)
> 0
we must have
x > 0.
The relation
z = A x
then implies that
z E C
which was the desired conclusion.
the remaining case when the rank of A
that the rows of A
We now treat
is less than
p.
We may assume
are ordered such that the first
p1
rows are linearly
Separation Theorem and Duality
10.
75
independent (1 < pI < p) and the remaining rows are linear combinations of the first A = 0
(We have, of course, excluded the trivial case
ones.
p1
from further consideration.)
Then every
1
zI E
(z1,z2)T,
z =
Rp,
z E C
may be written
1 Rp-p,
z2 E
where
zl =Ax, xERm, x> 0,
(17)
z2 = Bz1.
(18)
and
Here
is a
A
pI X m matrix and
define the cone
B
a
(p-p1) x pI
matrix.
We next
associated with (17) and argue as above and use (18)
to arrive at the desired result
z =
Combining Theorems (1S) and Cl) we conclude that (LP) is solvable when (LD) is bounded.
We saw in (37) of §4 that every problem in the form of
(LP) may be transformed into an equivalent problem in the form of (LD).
Hence a corresponding existence theorem is valid for (LP) as well.
This
fact we summarize in the (19)
Theorem.
Consider the dual pair (LP)
-
(LD) of linear pro-
If both of these problems are consistent then they both have solu-
grams. tions.
In the next section we shall also show that no duality gap can occur under the assumptions of Theorem (19).
§10.
SEPARATION THEOREM AND DUALITY We shall start this section by developing a fundamental tool to be
used in the proof of strong duality theorems, namely the statement that a point outside a closed convex set in
RP
may be "separated" from this set
by a hyperplane in the sense of the following definition. (1)
Rp
and
Definition.
z f M
Let
M be a nonempty, closed and convex subset of
a fixed point.
H(y;n) ={xERp is said to separate
I
z
The hyperplane
yTx=n) from M
if
IV.
76
Separating hyperplane
Fig. 10.1.
yTx < r1 < yTz,
DUALITY THEORY
x E M.
From geometric considerations (see Fig. 10.1) one is led to believe that a vector
which defines a separating hyperplane is obtained by determin-
y
ing the projection
of
z0
M
on
z
and putting
y = z - z0.
This will
We will therefore first show the
turn out to be the correct procedure.
existence of a unique projection point.
(See (4).)
To give a motivation for the argument to follow we shall first indicate the fundamental role of the concept of separating hyperplanes in the theory of the dual pair (P)
- (D).
Assume that the hyperplane n Rn+1
I
zryr = 0}
H(y;0) = {z E r=0
separates the moment cone n+l Mn+I
from the point
lies on one side of the hyperplane.
v 4 Mi+1.
Thus all of
Hence
n 0 >
I
zryr, all
(z0,...,zn) E n+1
r=0
In particular, since
Mn+1 = CC(AS)
we have
z = a(s) = (b(s), a1(s),...,an(s))T E Mn+1 for all
s E S.
Thus we find from (2) that
(2)
10.
Separation Theorem and Duality
77
n 0 > b(s)y0 +
s E S.
a,(s)yr,
£
r=1 If
holds, then the last relation takes the form
y0 > 0 n
r=1
-y
a (s) r > b(s), r
Hence the vector
be feasible for (P).
y
Let
sing through the origin such that Projection Theorem.
(4)
set and let vector
0<
z
z0 E M
Mn+l
M c RP
Let
Give a hyperplane pas-
is on one side of this hyperplane.
be a nonempty, closed, convex
be a fixed point outside of
M.
which lies "closest" to
That is, z0
Iz - z0I <
Proof:
is feasible for (P).
y = (-y1/y0' ...,-yn/y0)
Exercise.
(3)
s E S.
y0
Since
Iz
- xI, all
M
is closed and
z.
Then there is exactly one is such that
x E M. z E M we find
p = inf Iz - XI > 0. xEM
Obviously, it is sufficient to search for the vector
in the set
z0
M=Mn {xERP I Iz -xl <2p}. assumes its minimum
Now the continuous real-valued function
x - Iz - xI
value on the bounded and closed set
Hence there is a
M.
z0 E M
such
that
Iz - z01 < Iz - xI,
x E M.
From the construction of
(5)
(5) holds for all
M,
lish the uniqueness of the projection point there is a vector
zl # z0
z2 = (z1 + z0)/2.
Z2I2 =
4I(z-z0)
Iz
+
=
implying
x E M. The parallelogram law from (10) of §2 gives
+ (z_zl)I2 <
4I(z-z0)
+ (z_zl)I2
lzo_zlI2 = 4I(z-z0) + (z_zl)I2 +
2(Iz_zol2
+
1z-z21 < Iz-z0I.
uniqueness is established.
We must now estab-
Assume therefore that
such that
Iz - z1I < Iz - xI, all We now put
x E M.
z0.
I(z_z0) -
(z_zl)I2
Iz_zll2) = I=-zoi2. This contradicts the construction of
z0, hence
78
IV.
Separation Theorem.
(6)
Let
set.
put
y = z-z0
and
n = (z-z0)Tz0
T
y x < n < yTz,
x E M;
i.e. the hyperplane
H(y;n)
Proof:
number.
M c Rp
Let
be a nonempty, closed, convex
be a fixed point whose projection on
(E M
z
M
is
z0.
If we
we get (7)
separates
z
from
M.
0 < n < 1
x E M be an arbitrary vector and
Let
DUALITY THEORY
be a fixed
Then
(1-11)z0 + ux = z0 + u(x-z0) E M.
We also find that
Iz-z012
< Iz - (zo +
u(x-z0))I2
= Iz-z012 - 2u(z-z0)T(x-z0) + u21x-z012, giving (z-Z0) T(x-z0) < 2 111x-z012
Letting
u - 0
we arrive at
(z-z0)T(x-z0) < 0,
establishing the leftmost inequality in (7).
The other inequality re-
sults from the relation Iz-z012
0 <
=
T (z-z0)T(z-z0) = y z
-
T yTz0 = y z - n,
concluding the proof.
Suppose now that the assumptions of Theorem (6) hold, but specialize
M
to be a convex cone.
Then
x E M
implies that
Ax E M
for all
A > 0.
From (7) we then get
yT (ax) < n,
A > 0,
yTx < n/x,
a > 0.
or
Letting
A - m we conclude
yTx < 0,
Thus if M
x E M.
is a convex cone we may put
(7) in the form
n = 0
from the start and write
Separation Theorem and Duality
10.
T y x < 0 < yTz,
79
x E M.
(8)
Now we can use the Separation Theorem to establish the duality result which was promised earlier. First Duality Theorem.
(9)
Consider the dual pair (P) - (D) and
make the following assumptions: i)
ii)
The dual problem is consistent and has a finite value The moment cone
Mn+I
v(D);
is closed.
Then (P) is consistent as well and
v(P) = v(D); i.e. there is no duality gap. Proof:
Moreover, (D) is solvable.
We have already shown that (D) is solvable (Theorem (1) of
Thus we have
§9).
(cO,cl,...,cn)T
E n+l'
but (co + e, cl,.... cn) 4 Mi+1
for any
e > 0.
Since
Mn+1
is closed we may invoke the Separation
Theorem (6) and conclude that there is a hyperplane in arates
(co + e,c) T
is a vector
from the convex cone
Mn+I
Rn+l
(see (8)).
which sepHence there
Rn+l,
different from
(y0,yl,...,yn)T E
n
0, such that
n xryr < 0 < Y0(c0 + e) +
r=0
crYr,
r=1
(10)
(x0,xl,.... xn)T E Mn+l'
In (10) we now put (x0,x1,...,xn)T = Cc O,cl,...,cn)T E Mn+1
and obtain y0e > 0. > 0
Since
we must hence have
(x0,xl,...,xn) (s E S
relation
T
y0 > 0.
If we now set
= (b(s),aI(s),...,an(s))
T
E AS c Mn+1'
is arbitrary) we find from the leftmost inequality in (10) the
80
DUALITY THEORY
IV.
n
r=1
ar(s)(-Yr/YO) > b(s),
s E S.
Hence the vector Y = (-Yl/Y0, -Y2/YO...... y ly0) E Rn
is feasible for (P).
The right inequality in (10) implies
n
cr(-Yr/YO) < co + C. r=1
We now arrive at the following chain of inequalities:
n
v(P) <
cryr
r=1 The first inequality follows from the fact thaty is feasible for (P) and the last is a consequence of the weak duality theorem (18) of §4. Thus
v(P) - e < v(D) < v(P) for any (11)
e > 0, proving the theorem. Exercise.
the moment cone
Mn+l
The ray
Hint:
Consider again Example (8) of B. R3
in
{(0,a,0)T
I
Draw a picture of
and show that this cone is not closed.
A > O)
lies in
Mn+1
but not in
Mn+l'
In many applications the General Assumption of (2) of §9 is met: S
is a compact subset of
continuous on
Rk
and the functions
a,,.... an
and
b
are
We combine the Theorems (7) and (14) of §9 with (9)
S.
and arrive at the following useful result: (12)
Theorem.
Consider the dual pair (P) - (D) and make the assump-
tions i)
ii)
iii)
General Assumption (2) of §9; (D) is consistent;
(P) meets the Slater condition.
Then (D) is solvable and the values of (P) and (D) coincide. We discuss also the case of linear programming, i.e. the dual pair (LP)
Minimize
cTy,
ATy > b
(LD)
Maximize
bTx,
Ax = c,
x > 0.
10.
Separation Theorem and Duality
81
Theorem (9) and Theorem (19) of §9 deliver the entire duality theory of linear programming.
We have by Theorem (9) that if (LD) is consistent
and bounded then (LP) is consistent also and the values of (LD) and (LP) coincide.
Using the transformations (37) of §4 we may also conclude that
if (LP) is consistent and bounded then (LD) is consistent as well and the values of the two problems coincide.
From this argument we obtain the
following state and defect diagrams for linear programs.
(Compare also
with (1) of §5 and (7) of §5.) State and defect diagrams for linear programming.
(13)
`LP)
(LP)IC IC
(LD)
IC
UB
B
(LD)
0
UB
6
State diagram
0
Defect diagram
Duality theorem for linear programming.
(14)
A dual pair (LP) -
i)
0
B
5
UB
UB
IC
4
1
B
B
(LD) is in one and only one of the states
1, 4, 5, 6 of the state diagram (13).
All states are realized.
If both programs are consistent (i.e. if state 5 is realized)
ii)
then both problems are solvable and no duality gap occurs. The reader should construct simple examples (n = 1
or
n = 2) to
show that all the states 1, 4, 5, 6 can be realized. We recall once more that the First Duality Theorem (9) plays a fundamental role
for the argument of this Section.
this theorem we may conclude that solution.
v(D) = v(P)
However, the assumptions
Under the assumptions of as well as that (D) has a
i) and ii) of Theorem (9) do not
imply the solvability of (P), as is illustrated by the example in Exer-
cise (13) of B. (15)
(1) of §6.
Exercise.
Show that
Consider the problem of uniform approximation of v(DA) = v(PA)
and that the dual problem is sol-
vable. (16)
Exercise.
We replace the dual (D) by the "modified dual" (D')
as follows: (D')
Maximize
c0
when
(Compare with (35) of §8.) v(P)
(c0,c)T E Mn+1'
Show that the weak duality inequality
is valid for the modified dual pair (P) - (D!).
v(D') <
Show also that when
82
IV.
DUALITY THEORY
v(D') is finite then (D') is always solvable and that we always have v(P) = v(D').
Exercise.
(17)
A c Rp
Let
c E CC(A)
a E A
(The modified problem (D') is of theoretical interest only.)
Use the Separation Theorem (6) to show Farkas' Lemma:
be a nonempty set and
c E Rn
if and only if all vectors
also satisfy
cTy > 0.
y
Then
a fixed vector. such that
aTy > 0
Specialize to the case when
for all
A
has finitely
many elements. (18)
Remark.
The duality theorem (12) can be sharpened somewhat.
(A corresponding statement is true for the First Duality Theorem.)
One
can show that the assertions of (12) remain true if we replace the assumption (ii) by ii')
is finite.
v(P)
It is easy to establish that ii) and iii) imply ii').
A proof of this
sharpened version of (12) is to be found in Glashoff (1979).
For easy
reference we sum up the result, which is quite useful for many applications. (19)
Theorem.
Consider the dual pair (P) - (D).
Make the follow-
ing assumptions: i)
ii)
iii)
General assumption (2) of §9 v(P)
is finite;
(P) meets the Slater condition.
Then (D) is solvable and the value of (P) and (D) coincide.
§11.
SUPPORTING HYPERPLANES AND DUALITY In this section we shall prove a theorem which could be said to be a
kind of "dual" to Theorem (9) of §10:
from the consistency and bounded-
ness of (D) follows the strong duality result
v(P) = v(D)
as well as
the solvability of (P) provided certain regularity conditions are met. For this purpose we will need a corollary to the Separation Theorem (6) of §10 which states that a supporting hyperplane passes through each boundary point of a convex set. (1)
let
Definition.
Let
M be a nonempty convex subset of
z E M be a fixed point. H(y;n) = {x E RP
I
yTx
(See Fig. 11.1.).
The hyperplane
= Ti)
is said to be a supporting hyperplane to
M
at
z
if
Rp
and
Supporting Hyperplanes and Duality
11.
Fig. 11.1
83
Supporting hyperplane
yTx
Lemma.
Let
be in
z
no supporting hyperplanes to Proof:
M, the interior of at
M.
Then there are
z.
has a supporting hyperplane
M
Assume
M
M.
H(y;n)
at
z.
Since
0
there is a s> 0
z E M z
such that
= z+ayEM.
A
We find that y
T z
X = yTz + ayTy < n = Y
ayTy < 0,
which contradicts
1 > 0
and
Thus we reach the desired conclu-
yTy > 0.
sion. (3)
Theorem.
Let
M be a nonempty convex subset of
Rp
and let
0
be on the boundary of M hyperplane to
M
at
z.
(z E bd M = M91).
Then there is a supporting
z
84
IV.
For every nonempty convex subset
Proof:
DUALITY THEORY
the following
M c :RP
statement holds:
bdM=bdR. This elementary property of convex sets follows from the fact that 0
bd M = MOM 2
since
0
M = M.
(4)
We shall show the truth of (4) in (22) - (26) at the end of §11.
z E bd M be a fixed point.
There is a sequence
{z
}
Now let
of points such
i
that
zi f M
points on
and
lim zi = z.
and the closed convex set
zi
M by
zio.
Putting
yix < yi zi, Since
We apply the Separation Theorem to the
yi = zi - zio
x E M,
zi
we get
i = 1,2,...
zi fE M, yi 4 0, i = 1,...,
Denote the projection of
M.
.
setting
i = 1,2,...,
yi = Yi/IYil,
we get 1Yil = 1
and
yix < yi zi,
x E M,
i = 1,2,...
(5)
Consider the set
B = {y E Rp B
of
I
Jyj = 1}.
is closed and bounded, hence compact. {yi}i-1
which converges to a point
Therefore there is a subsequence y E B.
Applying (5) to this sub-
sequence and passing to the limit we get yTx < yTz,
x E M,
which proves the assertion of the theorem sincey E B Definition.
(6)
and hence
0.
The dual problem (D) is termed superconsistent if
0
c E M . n (7)
Second Duality Theorem.
Consider the dual pair (P)
the assumptions i)
v(D) is finite; 0
ii)
(D) is superconsistent, i.e.
c E Mn
-
(D).
Make
11.
Supporting Hyperplanes and Duality
Fig. 11.2.
85
The cones Mn+l and M'n+l
Then (P) is solvable and v(P) = v(D). Proof:
Both (P) and (D) are feasible.
Hence the values
are finite due to the weak duality lemma.
v(D)
v(P)
and
We set as usual
c0 = v(D).
(8)
T c (Otherwise lies on the boundary of Mn+1' 0' 1'" ''cn) we could find a vector (c0,cl,...,cn)T with c0 > c0 but still feasible
The vector
(c
to (D), a fact which would contradict (8).)
For the purpose of carrying
out the proof we now introduce the following convex cone (see also Fig. 11.2):
Mn+1 - {(zz0,z ,...,z ) T
such that
I
there is
'z0 < z0,
(z0,zl,...,zn)T E M1+1
21 = zl,.... in = zn}.
We find at once that (20,cl,...,cn)
T
E bd Mn+1'
By (3) there is a nontrivial supporting hyerplane to (c0,cl,...,cn)T; i.e. there is a vector
y = (y0'y)T =
M
n+1
at (y0'yl'... yn)T # 0
such that yTz < 0 = y0c0 + yTc,
z E Mn+1'
(9)
86
DUALITY THEORY
IV.
We have used here the fact that n+l (9) implies, since
is a convex cone.
AS c CC(AS) = Mn+1
(See (8) of 410.)
c Mn+l'
n
YOb(s) +
ar(s)yr < 0,
E
s E S.
(10)
r=1
We now show that
From the definition of
y0 > 0.
T
Mn+1
it follows that
a > 0.
We therefore get from (9) y0c0 - y0A + yTc < 0.
y0c0 + yTc = 0
Since
A > 0,
-y0A < 0,
and hence
Y0 > 0.
Putting
We must now rule out the possibility
nnp
Yrzr < L r=1
z E Mn.
crYr,
r=1
is the projection of
the condition
Mn
at
c.
(Since
y # 0
y0 > 0.
r = 1,...,n,
and obtain, from (10), n ar(s)yr > b(s),
s E S.
r=1
Thus
T
(yl'" ''yn)
is feasible for (P) and hence n
v(D) < v(P) <
cryr'
r=1
By (9) we conclude that nc
crYr = c0 = v(D). r=1
defined through
and
y0 = 0
But this contradicts the fact that
Hence we have established that
Yr = -Yr/YO,
Rn+l
Therefore, (11) means that there is a nontrivial
z0 = 0.
(y1,...,yn) # 0.)
(Lemma (2)).
on the subspace of
Mn+1
supporting hyperplane to have
y0 = 0.
y0 = 0, we get from (9) that
n
Mn
we find that
we must
c E Mn
We now let
Supporting Hyperplanes and Duality
11.
Hence we have shown that
v(P) = v(D)
87
and
T
(y
Yn)
1
solves the prob-
lem (P).
The Second Duality Theorem just established can be applied to the problem of uniform approximation defined in (1) of §6. ately (without requiring the set a1,...,an
and
T
We obtain immedi-
to be compact or the functions
to be continuous) that
b
v(DA) = v(PA) (strong duality) and that the primal problem has a solution (see also (15) of §10):
Consider the approximation problem (PA) of (1) of §6.
Theorem.
(12)
Let
v1,...,vn
be linearly independent on
T; i.e.
n
t E T
yrvr(t) = 0, r=1
Then (PA) is solvable and the values of
implies
yl = y2 = ... = yn = 0. (PA) and (DA) coincide.
We will show that the linear optimization problem which is
Proof:
We must verify that
equivalent to (PA) satisfies the assumptions of (7). the vector vex cone
c = (0,...,0,1) T
of (2) of §6 lies in the interior of the con-
M which is generated by the vectors
(v 1(t),...,vn(t),l)T, (-vl(t)...... vn(t),l)T
c E M, for we can pick an arbitrary
Note that c =
t E T.
E T
(13)
and write
2(-v1(t),...,-vn(t),l)T.
Z(vl(t),...,vn(t),l)T + 0
We next assume that
and show that a contradiction results.
M
c
If
0
c E M4h1
M
at
c E bd M
then
and by (3) there is a supporting hyperplane to
Hence there is a vector
c.
T
y z < 0 = yTc,
(We can put
n = 0
(y1'' " 'yn'yn+l)T # 0
z E M.
since
M
(14)
is a cone.
we find from (14) that
c = (0,...,0,1) T
such that
See (8) of §10.)
Yn+1 = 0
Since
and hence
n
yrzr < 0,
z E M.
(15)
r=1
We observe that we know
(y1,...,yn)T + 0.
(Y1'. ''yn'yn+1)T + 0.
and arrive at
We have just seen that
Yn+l = 0
but
We now enter the vectors (13) into (15)
DUALITY THEORY
IV.
88
n yrvr(t) = 0,
t E T,
r=1
contradicting the linear independence of vl,...,vn
on
T. 0
c E M.
There is a simple way of imposing the condition Consider the problem
Regularization.
(16)
n (P)
Minimize
cryr,
E
a(s)Ty > b(s),
s E S.
r=1
Assume now that we know a solutions of (P) and a number that
F > 0
such
Then we supplement the constraints of (P)
JyrI < F, r = 1,...,n.
with the conditions r = 1,...,n.
l>rI < F,
the (equivalent) linear constraints
These may also be written as Yr > -F,
-Yr > -F,
-
r = 1,...,n.
Thus we get a modified ("regularized") problem: n
Minimize
(PF)
subject to
cryr
E
a(s)Ty > b(s),
s E S,
r=1
r = 1,...n where er = (0,...,0,1,0,...,0)T E Rn. r
rth component
The vectors which define the constraints of vectors
er
PF
include all the unit
as well as all the negative unit vectors
-er.
Hence we find
in this case that
Mn=Rn and the regularity condition 0
c E M
n
is trivially met.
By means of the duality theorem just proved, we find
that the dual pair (PF)
-
(DF) has no duality gap.
The solvability of
(P F) is also a consequence of this duality theorem but can alternatively be established from the fact that the constraints of (P F) define a compact subset of
Rn.
Supporting Hyperplanes and Duality
11.
89
It is known from the Reduction Theorem (14) of 98 that every admits the following representation:
c E Mn = CC(AS) qqC
c=
a(si)xi,
L
i=1
q < n,
are linearly
a(s1),...,a(sq)
and
sl,...,sq E S, x1,...,xq > 0
where
The representation (17) is generally not unique; i.e.
independent vectors. c
(17)
can have different representations (17) and the value of A representation (17) with
be unique.
Lemma.
(18)
Let
c
q = n
q
need not
is said to be maximal.
have a maximal representation; i.e.
n (19)
a(si)xi,
c = i=1
x
i = 1,...,n,
> 0,
are linearly independent.
a(sI),...,a(sn)
Then
(20)
i
c
lies in the interior
Proof:
Let
Mn
of
(21)
M .
n
have the representation (19), which we write as fol-
c
lows:
c = A(sl,. ,sn)x, A(sl,...,sn)
where the matrix
x = A(si,...,sn) Let now
s1,...,sn
-1 c.
be fixed.
Then the components
x1,...,xn
be looked upon as continuous functions of the vector then conclude that there is an E
a(s1),...,a(sn).
has the column vectors
is nonsingular by (21), so
A(sl,...,sn)
in the neighborhood
in (19) may
From (20) we
with the property that all vectors
a > 0
Ic - El < e
c.
are such that
xl,...,xn > 0, where
1' c.
x = A(sl,
,sn)
Thus the vector c = A(sl,...,sn)x also lies in
That is, c
M .
n
Hence there is a neighborhood of
is in the interior
c
which lies in
M
G
Mn
of
Mn, which is the desired result.
We remark that the converse of Lemma (18) is false. we consider the following 4 vectors in
R3:
As an example
n
IV.
90
DUALITY THEORY
al = CO,0,l)T
Put
a2 =
Cl,0,l)T
a3 =
T
a4 =
T
It is easy to establish, e.g. by drawing a suit-
c = (1/2, 1/2, 1) T.
able picture, that vectors
lation that
is in the interior of the moment cone formed by the
c
a1,...,a4.
Nevertheless one verifies by straightforward calcu-
has no representation (19) - (21) with
c
q = n = 3.
We conclude this section by showing, as promised above, that 2
0
M = M M c Rp.
holds for nonempty convex sets
The proof will be carried out in
three steps (see also Eggleston (1958)). Lemma.
(22)
Let
M c RP
be a nonempty set in
RP
with nonempty 0
interior
Let
M.
xl
and
x2
M
be two points in
such that
x2 E M.
Consider the line segment [xl,x2l = {x = Axl + (1-A)x2
Then all of
1
A, E [0,1]}.
[xl,x2], except possibly the endpoint
x1, belongs to the
0
M
interior
of
M.
Since
Proof:
M
is convex, [xl,x2] c M.
is a sphere, K6(x2), 6 > 0, with
x2 E M
implies that there
(see (11) of §2).
Kd(x2)
Let
c :M
y # xl
be a point in
[x1,x2].
We want to show that there exists
r > 0
such that
Kr(y) c M
(23) 0
and hence
y E M
as asserted.
Put
y = Axl + }ix2
where Let
(24)
A > 0, U > 0, A + p = 1. z E KU6(y).
Then
1z - yj < p6,
or, by (24), (z -
Since
(Ax1 + U x2)1 < p6.
p > 0
we find that
We verify now that (23) holds for
r = p6.
91
Supporting Fiyperplanes and Duality
11.
I(z - Axl)/U - x21 < 6; (z - Ax1)/p
i.e.
lies in
and hence in
K6(x2)
M.
Consider next the
identity z = Axl + u(z - Axl)/p.
Due to the convexity of
M,
must also belong to
z
M, proving (23) and
hence the assertion. Lemma.
(25)
assumption
Lemma (22) remains true when the
The assertion of
is replaced by the weaker requirement
x1 E M
x1 E M. 0
0
with
y E [x1,x2]
M
x2 E M
Since
Proof:
there is a
6 > 0
and let
y # xl, y # x2
zl
such that
K6(x2) c M.
Let
be an arbitrary point in
such that
Izl - xl1 < dlxl - yi/1x2 - YI Define
through the relation
z2
22 - x2 = - (zl - x1)Ix2 - YI/Ixl - yl-
Then we obtain 1z2 - x21 < d,
z2 E K6(x2) c M.
i.e.
Next we find that y = Ax2 + pxl = Az2 + uzl, where A = Ixl - YI/{Ixl - YI + 1x2 = y')
and
u=1=Ix2-uI/flxl-uI+Ix2-pI}. Hence
2
Then
'T-
Theorem.
(26) M.
Lemma (22) now delivers the desired result.
y E [z2,zl].
Let
M c RP
be a convex set with nonempty interior
M = M.
0
Proof:
Since
M c M
we get
showing
x E M, x
M
Select an arbitrary y + x, with tion.
xl E M.
x E [xl,y].
Since
by °
0
implies x¢ M. 0
M c M
We establish that
M c M. 0
0
Assume that 0
x E M- M
and
x E M there is also a point
x E M.
y E M,
By Lemma (25) x E M, contradicting the assump-
Chapter V
The Simplex Algorithm
This and the next chapter are devoted to the presentation of the simplex algorithm for the numerical solution of linear optimization problems. This very important scheme was developed by Dantzig around 1950.
We will
see that the simplex algorithm consists of a sequence of exchange steps. A special algorithm, related to the simplex algorithm and also based on exchange steps, was used in 1934 by Remez for the calculation of best approximations in the uniform norm.
His procedure is described in Cheney
(1966).
We will not prove the convergence of the simplex algorithm here.
For
the case of finitely many constraints (linear programming) the convergence has been established a fairly long time ago (Charnes, Cooper and Henderson (1953), p. 62).
The general case is much more difficult and
has not been studied until recently.
In this chapter we shall give a general description of the simplex algorithm and Chapter VI will be devoted to its numerical realization. For easy reference we state here Problem (P), which is to be treated by means of the simplex algorithm:
n (P)
Minimize
n cryr
subject to
r=1
I
ar(s)yr > b(s),
s E S.
r=1
In this and the next chapter we shall require that (P) is solvable, if bounded, and that no duality gap occurs.
that this situation occurs when n+l
We have shown in Chapter IV, §10
is closed (e.g. the case of linear
programming) or when the Slater condition is met. then be written in the following form:
92
The dual problem can
12.
Basic Solutions and the Exchange Step
n
n
Maximize
(D)
93
i£1 a(s):.=:, r =
subject to
b(si)xi
s. E S,
x.
> 0,
1,...,n,
i = 12 ..,n
In the future we shall write a feasible solution to
(see (7), of §12).
this problem in the form
{a,x}.
c S
Here, a =
and
x = (xi,...,xn) E Rn.
BASIC SOLUTIONS AND THE EXCHANGE STEP
§12.
We write the constraints of (D) in the form n
a(si)xi = C,
(1)
i=1
a = {s1....Isn) c S,
x = (x1,
are
Here, a(sl),...,a(sn)
n
..,xn)T > 0.
of those vectors in
(2)
Rn
which appear in
the constraints of (P):
a(s)Ty > b(s), (3)
Definition.
Let
{a,x}
a(sl),...,a(sn)
Also, let
hold.
s E S. be feasible for (D), i.e. (1) and (2) be linearly independent.
Then
{a,x)
will be called a basic solution to (1).
Thus if
{a,x}
is a basic solution then the linear system of equa-
tions (1) has the unique solution
x.
We shall also write this system in
the form A(sl,...,sn)x = c. is the
Here, A(sl,...,sn)
(4)
n x n
matrix having the columns
a(s1),...,
a(sn): a1(s1)
...
an(sn)
a2(sI)
...
a2(sn)
A(sl,.... sn) =
(5)
an(sl)
...
an(sn) J
Hence if
{a,x}
A(sl,...Isn)
is
is a basic solution then the rank of this basis matrix n
and we have
x = A(sl,.... sn)-lc and
x > 0.
94
V.
We shall require that among the vectors
Requirement.
(6)
THE SIMPLEX ALGORITHM
a(s),
s E S, there is always a subset of n
linearly independent vectors.
implies that if
must hold.)
{sl,...,sq), x1,...,xq
solution
(This
Then there is a
Let the dual problem (D) be solvable.
Lemma.
(7)
n < m
ISI = m, then
q <.n, xi > 0, i = 1,...,q,
such that
and the vectors a(si),
i = 1,...,q
are linearly independent. Proof:
Let (D) have the value
Then we have the relations
v(D).
q
xibCsi) = c0 = v(D),
(8)
xia(si) = c,
(9)
i=1
q i=1
i = 1,...,q.
x. > 0,
Thus the vector vectors
E Rn+l
(c0,...,c
is a convex combination of the
(b(si),aI(si),...,an(si))T E Rn+1
is not unique.
q
The representation (8), (9)
Using the reduction theorem (14) of §8 we conclude that
among the representations (8), (9) there is at least one such that q < n+l, xi > 0, 1 = 1,...,q dent.
cone
Mn+l
a(s1),...,a(sq)
Mn+1'
are linearly indepen-
We consider therefore the moment
q < n.
which is defined as in (32) of §8.
lies on the boundary of have
and
We now want to show that
(c0,...,cn)T
The vector
By Lemma (18) of §11 we must therefore
q < n, which is the desired conclusion. We can now state and prove an important result. (10) Theorem (Existence of optimal basic solutions).
problem (D) be solvable.
Let the dual
Among the solutions there is a basic solution,
i.e. an optimal basic solution. Proof:
The proof is an immediate consequence of Lemma (7).
always a solution pendent vectors
{s1,...Isq}, xl,...,xq a(s1),...,a(sq), q < n.
ready established.
We discuss the case
xq+l = xq+2 = ... = X and select
sq+l,...,sn E S
linearly independent. (6).)
Thus
of (D) with If
q
There is
linearly inde-
q = n, the assertion is al-
q < n.
Then we put
= 0
such that the vectors
a(sI),...,a(sn)
are
(This is always possible due to the requirement
12.
Basic Solutions and the Exchange Step
a = {sl,...Isn}
and
95
x = (xl,...,xq0,...,0)T E Rn
define an optimal basic solution.
(This basic solution is "degenerate"
in the sense of Definition (39) below.) Definition.
(11)
The subset
a = {sl,...Isn} c S
elements is called a basic set if the matrix
with exactly
n
is nonsingu-
A(sl,...,sn)
lar and the system of equations
A(sl,...,sn)x = c has a nonnegative solution
is of course a basic solu-
{a,x}
(Then
x.
tion of (D).) The simplex algorithm consists of a sequence of exchange steps. each step a basic set is given and one constructs a new basic set and the corresponding vector
One seeks to achieve:
n
n E
x' E Rn.
In
a' c S
b(si)xi <
i=1
E
i=1
{a',x'}
i.e. that
b(sl)x!; 1
(12)
is a better basic solution than
in the sense
{(Y,x}
that the preference function of (D) assumes a larger value. In the following we are going to split this exchange step into six substeps, each of which will be discussed in detail.
Special attention
will be devoted to the question of determining when an improvement (12) is possible.
The numerical considerations associated with the simplex algorithm will be dealt with in §14.
In order to start the simplex algorithm an
initial basic solution {a0,x0}
must be known.
In §15 we shall describe
how to construct an initial basic solution.
We assume now that we are given a basic set basic solution
{a,x}.
Thus
x
a
and the corresponding
is the unique solution of (4).
We have already stated that the simplex algorithm also delivers approximate solutions to the primal problem (P).
The following simple com-
plementary slackness theorem indicates how the basic set sociated with a vector
Complementary slackness theorem.
(13)
and
{&,x}
a
may be as-
y E Rn.
Then
feasible for (D).
y
and
Let {o,x}
y
be feasible for (P) are optimal for (P)
and (D) respectively if and only if
xi
Cn
r=1
a(.)
1
- b(si)} = 0, J
i = 1,...,n.
(14)
96
THE SIMPLEX ALGORITHM
V.
Proof:
mality of
We showed in (20) of §4 that (14) is sufficient for the optiy
and
The necessity is an easy consequence of the
{0,x}.
relation
n i=l
n
b(si)x i = v(D) = v(P) =
E r=1
combined with the dual constraints. v(P) = v(D)
cryr We recall that we have assumed
in this entire chapter.
The statement of the complementary slackness theorem can also be phrased thusly:
{&,x}
and
y
are optimal for the Problems (P) and (D)
respectively if and only if they satisfy the following systems of equations and inequalities: Primal constraints n
ar(s)yr > b(s),
s E S.
(15)
r = 1,...,n
(16)
r=1
Dual constraints n
ar(si)xi = cr, i=1
xi > 0,
i = 1,...,n.
Complementary slackness conditions
rn xiS
i = 1,...,n.
ar(si)yr - b(si)} = 0,
Our given basic solution Starting from
{a,x}
{a,x}
(17)
must of course satisfy (16).
we determine a vector
is satisfied as well by selecting
y
y E Rn
such that (15)
as the solution of the equations
n
ar(si)yr = b(si),
i = 1,...,n.
r=1
This system has a unique solution
y
since the system can be written
AT(sl,.. ,sn)y = b(si,.. ,sn). Here
AT(sl,...,sn)
is the transpose of the matrix
and
b(sl,....sn) = (b(sI),....b(sn))T E R.
(18)
A(sl,...,sn)
in (5)
12.
Basic Solutions and the Exchange Step
A(sl,...,sn) AT(sl,...,sn)
97
is nonsingular by the definition of basic solution.
Hence
Thus
has the same property.
y = AT(s1....,sn)-1b(sl,...,sn)
is uniquely determined by (18). Exchange Substeps (El) and (E2).
(19)
The basic set
a = (sl,...,sn} C S is given.
Compute the unique nonnegative solution
(El)
x
to the linear sys-
tem of equations A(sl,...,sn)x = c.
Determine the unique solution
(E2)
y
to the linear system of equa-
tions
AT(s1,...,sn)y = b(sl,...,sn). If
also satisfies
y n
ar(s)yr > b(s),
s E S,
r=1
then
y
is optimal for (P) and
{a,x}
optimal for (D).
We assume now
that we are given a basic set a = {sl,...,sn} C S such that the vector tions (15).
y
calculated in (E2) does not meet all the condi-
Then
{a,x,y}
is not a solution to the system (15)
- (17).
We describe now how to find
an approximate solution {a',x',Y'}
to the
system of equations and inequalities (15) - (17) which is better
in the sense of (12).
The basic sets
except one in common.
Thus if
and
a
a'
will have all elements
a = {sl, ..,sn},
then exactly one s' E S
si, i = 1,...,n, say
which did not belong to
a.
sr, will be exchanged for an
Hence
98
THE SIMPLEX ALGORITHM
V.
a' = {sl' .... sr-1's''
S! = si,
sr+1'...,sn} =
iTr
,
S'r = S'. Alternatively,
a' = (a U (s'}) . {sr} r E
for some
included in
We describe first how to select {a,x,y}
a'.
s' E S
to be
are hence given.
(20)
Exchange Substep (E3).
(E3)
Determine
s' E S
such that
n
r=1
ar(s')Yr < b(s').
If no such {a,x,y}
s'
(21)
exists, then the computation is stopped here, since
solves (15) - (17).
This means that we include in the basic set is such that a primal constraint is violated.
a'
a point
which
s'
This fact entails that
s' f a. There remains to determine a member i.e. will be replaced by
si E a
which shall leave
a,
s'.
(22)
Exchange Substep (E4).
(E4)
Compute the solution
d E Rn
of the linear system of equations
(23)
A(si,...,sn)d = a(s') i.e.
n
a(si)di = a(s'). i=1
(23) thus expresses the "new" vector the "old" vectors
a(si), si E a.
from the following argument.
a(s')
as a linear combination of
The meaning of the vector
d
is clear
Consider the set
a U {s'} = {s1,...,sn,s'} C :S.
It consists of
n+l
elements.
(24)
Introduce the n+l-dimensional vector
(x1 - Adi,...,xn - AdnA) (25)
_ (xi(A),...,xn(A), xn+l(a))T E Rn+l
12.
Basic Solutions and the Exchange Step
(A E R
is arbitrary).
{a U {s'}, x(A)}
The value of the dual preference function for
will be denoted by
c0(A):
n
n
b (si) xi (a) + b (s') A =
c0 (A) _
b (si) (xi-Adi) + b (s') A.
i=1
If we put
99
i=1
A = 0, we get
c0(0) =
nn
L
b(si)xi,
i=1
the "old" value of the dual preference function. (26)
The following relation is true for all
Lemma.
A:
(27)
c0(A) = c0(0) + XACs'), where
n
A(s') = b(s') - I ar(s')yr > 0. r=1
(Compare (21).) Using (18) and (23) we have
Proof:
n cOCX) =
b(si)xi + A{b(s') -
i=1
n
b(si)di}
i=1
= c0(0) + A{b(s') - b(sl.... ,sn)Td}
= c0(0) + A{b(s') - yTA(sl,...Isn)d) = c0(0) + A{b(s') - yTa(s')} = c0(0) + AA(s'). Since not
A(s') > 0, the value of the dual preference function for smaller than that for
is feasible for all
x = x(0).
Therefore, if
x(A)
is
{a U {s'}, x(A)}
A > 0, then the value of the dual preference function
can be made arbitrarily large.
This should mean that (D) is unbounded,
entailing that (P) is inconsistent.
This case is dealt with in the follow-
ing lemma. (28)
Lemma.
d. < 0,
Let the unique solution vector
d
of (23) be such that
i = 1,...,n.
(29)
Then (D) is unbounded and hence (P) is inconsistent. Proof:
We note first that (23), (24) and (25) imply that the equality
constraints of the dual problem are met independently of (29).
Thus
100
V.
THE SIMPLEX ALGORITHM
n
r = 1,...,n,
ar(si)xi()L) + ar(sl)xn+1(A) = cr, i=1
and this equation is true for all real i =
xi(A) > 0, A > 0.
for all
If (29) holds as well, then
A.
Letting
A -* +m, by (27) we conclude that
c0(A) +
establishing the assertion. It is now clear how to select One calculates the maximal
A
when some of the
di
are positive.
such that
A
i = 1,...,n.
xi(A) = xi - Adi > 0,
(30)
Then one need only consider those indices
i
such that
di > 0.
If
di > 0, then (30) is equivalent to A < xi/di.
Thus
a = min {xi/di, di > 0}
It is also clear that at least one of the
meets all the conditions (30).
xi(a), i = 1,...,n
components
of the vector
x(a)
will vanish.
Indeed,
if
a = xr/dr
r E {l,...,n},
for an
(31)
then we get
x
xr (a) = xr - dT dr = 0.
(32)
r
sr
The corresponding element
is removed from the basic set.
Hence we
put {a u {s'}} . {s
al
}
r s1,
s
sl'
sr+1, ..,sn}
'sr-1'sr' sr+1'
..,sn}
and x' = (x1(A),...,xr-1(a),X, xr+1(a),...,xn(a))T xr
xr _
(x1
d
r
d1,...,xr-1
d
r
xr dr-1,
dn)T. n - dr r
d
r
xdr
xr+l -
r
dr+1,...,
(33)
12.
Basic Solutions and the Exchange Step
Use (27) and (32) to verify once more that
Exercise.
(34)
101
{a',x'}
is feasible for (D) and that
n
n
x
b(sl)xi =
b(si)xi + dr A(s').
i=1
r
i=1
(31) does not necessarily determine the index
r E {1,...,n}
uniquely.
We summarize the process above (i.e. the determination of which element
sr
a) as follows:
to remove from
(35)
Exchange Substeps (ES) and (E6).
Let
d
be the unique solu-
tion of (23) in Substep (E4). (ES)
di < 0, i = 1,...,n, then (D) is unbounded and (P) is in-
If
The computations are stopped.
consistent. (E6)
If there is a positive
di, then select an
with
r E (1,...,n}
and such that
dr > 0 x
dr = min{xi/di, di > 0}. r
Next put
a' = {a U {s!}} . {s r
Now the fundamental question arises whether the "new" set set.
a'
In that case one can repeat the process from Substep
instead of
a).
is a basic
(El) (with
a'
Thus one gets an iterative scheme, the simplex algorithm.
We now prove (36)
Lemma.
step (E6). Proof:
Let
Then
a'
s'
be found via Substep (E3) and
Sr
via Sub-
is a basic set.
To facilitate the presentation we renumber the vectors
a(si), i = 1,...,n
r = 1.
so that
Thus we must show that
a(sp), a(s2),...,a(sn)
are linearly independent.
(37)
Since
a
is a basic set the vectors (38)
aCs2),...,a(sn)
must be linearly independent. linearly dependent.
Then
of the vectors in (38);
a(s') =
nn
E
i=2
a(si)Pi.
Assume that
a(s')
a(s'), a(s2),...,a(sn)
are
can be written as a linear combination
102
V.
THE SIMPLEX ALGORITHM
Comparing with (23) we find that dI = 0,
d2 = P2,...,dn = Pn.
This contradicts the fact that we have assumed selected such that
dr > 0.
A( sl,...,sr-1's
r = 1
since
r
is always
The system of equations ,sn )x
= c
'sr+1'
has a unique nonnegative solution
x'
since the index
in Substep (E6) precisely according to that criterion.
r
was selected
(See also (33).)
Thus Lemma (36) guarantees that one can return to Substep El with the new basic set
a', provided no interruption occurs in Substeps (E3) or
As stated earlier, the goal is to increase the dual preference
(ES).
function, i.e. to achieve that b(slI...Isn)Tx < b(sl,...,sn)x' holds at each simplex step. under all circumstances.
Unfortunately this cannot be provided for
That is, if
T T xr b(si,...,sn) x' = b(sl,...,sn) x + A(s').
d r
and
and
s'
A(s') > 0
s
are chosen such that
r
and
dr > 0
then it is quite possible that
xr = 0 holds.
Then the value of the dual preference function would remain con-
stant during the transfer from the basic set
a
to the new basic set
a'.
Such an exchange would appear not to be worthwhile. (39)
A basic solution
Definition.
xi > 0, i = 1,...,n.
If at least one
{a,x}
is termed regular if
xi = 0, then the basic solution is
called degenerate. (40)
Exercise.
Minimize
6
1
We are given the following optimization problem (P) 1 +
1
r-1
r- )
yr
subject to
r=l (P)
6 E
r=1
sr-lyr > es,
s E [-I'll.
Basic Solutions and the Exchange Step
12.
103
The corresponding dual problem reads q
Maximize
s.
i=1
1
i
i=1
( 1)r-1r
1 +
=
sr-lx
C
(D)
subject to
e lx.
E
= 1,...,6,
i
i = 1,...,q.
xi > 0,
Verify the statements below. i)
Put
q = 7
a(l)
a(1) = {sl,...,s7}, x(1) E]R7
and define
-
=
,
0,
, 1},
,
x(1) = (1/12, 5/18, 5/12, 4/9, 5/12, 5/18,
Then
{0(1),x(1)} ii)
Let
x
(2)
1/5, 19
2S
25
25
25
19
1}, T
(144' 48' 72' 72' 48' 144)
{a(2),x(2)}
iii)
1/12)T
is feasible for (D) but is not a basic solution.
(2)
Then
by
is a regular basic solution.
Using the reduction process from (14) of 98, one may construct from
{0(1),x(1)}
a basic solution with the basic set
a(3) = {-l, -/, 0, 41-15,
315, 1},
x(3) = (0, 5/9, 8/9, 0, 5/9, 0)T Then
{a(3),x(3)}
is a degenerate basic solution.
We observe that when an optimization problem is such that all basic sets are regular then the dual preference function increases with each simplex step.
We now summarize all the Substeps of the exchange step for the linear optimization problems of type (P). (41)
The exchange step of the simplex algorithm.
Let a basic set
a = {s1....,sn} c S be given (the construction of an initial basic set is treated in §15). introduce the nonsingular matrix A(s1,...Isn)
We
104
V.
with the columns
THE SIMPLEX ALGORITHM
a(s1),...,a(sn), and the vector
b(sl,...,sn) _ (b(sI),...,b(sn))T. (El)
Determine
from
x E Rn
A(sl,...,sn)x = c. (E2)
Compute
from
y E Rn
AT(sl,.... sn)y = b(sl,...,sn). (E3)
Determine an
s' E S
such that
n
I ar(s')yr < b(s').
r=l
with this property exists, then
y
is optimal for (P) and
If no
s'
{a,x}
optimal for (D), and the calculations are stopped here. (E4)
Compute
d = (d1,...,dn)T E Rn
such that
A(sl,.... sn)d = a(s'). (E5)
If
d. < 0,
i = 1,...,n,
then (D) is unbounded and (P) is inconsistent, and the computations are stopped here. (E6)
r E {1,...,n}
Find x
such that
x.
dT = min
/di > 0}
di and put
a' = {a U {s'}}
.
{sr},
..'sr-1,s" sr+1'
i.e. .
,sn} _ {s',...,s'}.
{ s l '
Then
a'
is a basic set and the corresponding basic solution
satisfies x b(si,...,sn)Tx' = b(s1,...Isn)Tx + dT A(s'). r (42)
Remark.
The Substeps (El),
of linear systems of equations. the calculations efficiently.
(E2) and (E4) call for the solution
We have not yet described how to arrange
The different variants of the simplex al-
gorithm differ only in this respect.
Fundamental for the analysis of the
numerical properties of the various simplex algorithms is the recognition
13.
The Simplex Algorithm and Discretization
105
that at each simplex iteration linear systems of equations are solved, explicitly or implicitly. (43)
Remark.
We shall discuss this matter in §14.
We note that exactly one element is exchanged by the
transfer from the "old" basic set
to the "new" one
a
a'.
There are
other exchange procedures by which several elements are exchanged at each One extreme case is the so-called simultaneous exchange when all
step.
elements of a
are changed by the transfer to
a'
(see Judin and
We also mention in this context the Remez
Goldstein (1968), p. 506).
algorithm (see Cheney (1966), p. 97), where again the entire basic set is The computational effort is generally greater
exchanged at each step.
than by the exchange algorithm described above but on the other hand one hopes to achieve greater increases in the value of the dual preference function per iteration step.
§13.
THE SIMPLEX ALGORITHM AND DISCRETIZATION Let an initial basic set
aO = {s0,...,sO}
be known.
(See §15.)
If we now perform an exchange step and no interruption occurs in (E3) and (E5)
(in each of these cases there is no need to continue the computations),
then (E6) gives a new basic set
a' = {sl,...,sn}.
Substep (El) and start a new exchange step. the simplex algorithm.
Hence we can return to
In this way we have obtained
Thus we generate a sequence
1
a0
of basic sets,
k
k
k
= {s1,...,sn},
Note that
ak
and
k = 0,1,...
.
have all elements except exactly one in common.
ak+l
We also get a corresponding sequence of basic matrices
AO -). A, -.A2i ...
,
where Ak =
has the column vectors xk
a(sk),...,a(sn).
= Ac, k = 1,2,... kl
are such that
The corresponding vectors
106
THE SIMPLEX ALGORITHM
V.
T+jx
bix1 < b2T x2 < ... < bkxk <
< ... < v(D),
where k
k
bk = b(s1,...,sn).
The matrix
Remark.
(1)
Ak-1
differs from
only by one column
A.
vector!
We now want to describe in greater detail how to determine the vector which is to be included in the basis (Substep (E3)).
a(s')
general very many indices
s E S
There are in
such that
n
ar(s)Yr - b(s) < 0. r=1 If one wants to write a computer program for carrying out the exchange step, then one must given an unambiguous selection rule. The case of linear programming,
(2)
finite set.
In this case
ISI <
Usually one has the rule to select
s'
S
is a
at the minimum point
of the error function n
ar(s)yr - b(s). r=1 Thus we take an index value which renders the function
n
A(s) = b(s) - I ar(s)Yr r=1
a maximum.
of
Hence, in Exchange Substep (E3) we add to the basis an element
which is such that the primal constraints are violated as much as
S
possible.
Since
S
is finite we can determine an element
s'
which
has the property
A(s') > A(s),
sES
(3)
by means of finitely many arithmetic operations.
If
s'
is not uniquely
defined by (3), then we must introduce further conventions to make a unique choice possible.
If
interval, we take as
S s'
is an ordered set, e.g. a finite subset of a real the smallest index satisfying (3).
Thus the Substep (E3) of the Exchange step is completely specified for a finite index set.
For this class of linear optimization problems
one can establish a simple result on the convergence of the simplex algorithm.
13.
The Simplex Algorithm and Discretization
Consider the case
S = {1,...,m}
finitely many different basis sets (n = =ml
Ym,n
where
107
m > n.
a = {sl,.... sn}.
Then there are only Indeed, there are
m!
n!(m-n)!
different subsets of
S with n elements. Hence there are at most y m,n different basic solutions of the system occurring in the dual problem (LD)
Ax = c,
x > 0.
In principle, it is possible to solve the dual pair (LP) - (LD) by
means of calculating all these basic solutions and then to pick the one which assigns the highest value to dual preference function.
In practice this is not possible since the computational effort
thereby required is prohibitive even for modest values of m
and
n.
As
an example we mention that Y20,10 = 184756.
The decisive advantage of the simplex algorithm is the fact that a sequence of basic solutions is systematically generated in such a manner that the corresponding values of the dual preference function form a nondecreasing Therefore usually only a small fraction of the possible number
sequence.
of basic sets will be generated.
This is the reason for the efficiency
of the simplex algorithm of linear programming. (4)
Theorem.
Let
S
have finitely many elements; i.e. we consider
the dual pair (LP) - (LD) of linear programs. bounded.
Let (LD) be feasible and
Assume also that the simplex algorithm generates a sequence of
basic solutions such that the corresponding values of the dual preference function form a strictly increasing sequence.
Then the
simplex algorithm
delivers optimal solutions to (LP) and (LD) after finitely many iterations. Proof:
Since the values of the preference function corresponding to
the basic solutions which are generated by the simplex algorithm are strictly increasing, the same basic set cannot appear twice. simplex algorithm generates pairwise different basic sets.
Thus the Since there
are only finitely many basic sets the simplex algorithm must stop at an optimal solution after finitely many iterations. (5)
Remark.
If all the basic solutions which are generated by the
simplex algorithm are regular (see (39) of §12), then the preference function of the dual problem is strictly increasing.
Hence the simplex al-
gorithm must deliver an optimal solution after finitely many iterations.
108
THE SIMPLEX ALGORITHM
V.
If degenerate basic solutions occur, it is quite possible that the simplex algorithm "cycles", i.e. the same basic solutions reappear periodically and the value of the dual preference function remains constant without having reached its optimum. have been constructed.
Examples illustrating this phenomena
However, such "pathological" cases occur so rarely
that one generally does not bother with taking special precautions for dealing with them when constructing computer programs for practical use. It sometimes happens that degenerate basic sets do occur and thus one or several simplex steps are carried out through which the current value of the dual preference function does not increase, but normally the increase resumes without the use of any special devices for achieving this desired state of affairs.
The case of degeneracy and possible cycling is of course of great theoretical interest.
By means of a modification of Exchange Substep (E6)
the simplex algorithm may be altered so that the same basic set cannot reappear even if degeneracy occurs.
Then the simplex algorithm gives an
optimal solution after finitely many iterations in this more general situation as well.
The principle behind this modification is to introduce an
arbitrary small perturbation of the vector function.
c
in the primal preference
Hence we construct a perturbed problem such that no degenerate
basic solutions are generated by the simplex method.
Hence this perturbed
problem is solved after finitely many simplex iterations.
By construction
one can now determine an optimal solution of the original problem from the calculated optimal solution of the perturbed problem.
This so-called
e-method is described in Charnes, Cooper and Henderson (1953).
It uses
the so-called lexicographic ordering to modify Exchange Substep (E6). See also Hadley (1964) or Collatz and Wetterling (1971). It is much more difficult to prove a convergence statement of the form lim bkT xk = v(D)
kwhen there are infinitely many constraints.
Then the simplex algorithm
can not, in general, be shown to stop after finitely many iterations. Theoretical investigations of this case can be found in the book by Blum and Oettli (1975), p. 247-255 and in the writings by Carasso (1973) and Hofmann and Klostermair (1976). When
S
has infinitely many elements, then there is of course no
general procedure to find an tions on the index set
S
s'
satisfying (3).
and the functions
Without special assump-
ar, r = 1,...,n
and
b, it
13.
The Simplex Algorithm and Discretization
is not certain that an special case when continuous on mine an
s
S
109
with the property (3) exists.
s'
is a compact subset of
Rk
Even for the
a1,.... an,b
and
are
S, it is not possible to give a general method to deter-
where
A(s)
assumes its maximum value.
The case just men-
tioned has appeared several times before in our text. in uniform approximation problems.
It often occurs
In theoretical analysis (e.g. con-
vergence proofs) one often works with
s'
satisfying (3).
Some minor
But in practice
relaxations of this condition are sometimes introduced. one normally proceeds along the lines to be given below: (6)
Modification of the exchange substep (E3) when
a finite subset
{sl,...,sm} c :S
A(s') > A(s), (If
and determine an
s'
ISI
= -.
Select
such that
(7)
s E Sm.
is not uniquely determined by (7), then one proceeds as described
s'
in (2).) It is easy to realize that this corresponds to a discretization of (P) in the sense of (10) of §3. (Pm)
Minimize
c
T y
subject to
If we now start with a basis
Consider the linear program a(s)Ty > b(s),
a E Sm
s E Sm.
and use the selection rule from (2)
then the simplex algorithm applied to (P ) above delivers the same new m basis elements s' as when it is used on the continuous problem (P)
Minimize
T
c y
subject to
when one also starts from
a
s E S
a(s)Ty > b(s),
and determines
according to (7).
s'
The "rough" calculation of the new element
s'
to enter the basis
and (approximately) satisfying
A(s') > A(s),
sES
thus corresponds to a discretization of (P).
This gives us a reason to
discuss discretization of linear optimization problems with infinitely many constraints.
Discretization is very important, both in theory and in
practice.
Consider the problem n (P)
Minimize
cTy
subject to
I
ar(s)yr > b(s),
r=l
This problem is approximated by the linear program
s E S.
110
THE SIMPLEX ALGORITHM
V.
n
Minimize
(Pm)
cTy
subject to
ar(si)yr > b(si),
I
i = 1,...,m.
r=1
Here, {s1,...,sm}
is a fixed subset of
S.
We now give a useful interpretation of the discretized program (P M S
is assumed to be a subset of Definition.
(8)
wl,...,wm
Let
Rk.
T = {sl,.... s
be a subset of
S, and
be real-valued functions with the properties (i) and ii) be-
low:
i) wj (s) > 0, s E S, j = 1, ... ,m; w j (s
=
i#j
f 'o:,
i)
i,j = 1,...,m.
,
Suppose a real-valued function function
Lf: S a R
f
is defined on
S.
We define the new
by
m (Lf)(s) _
wj(s)f(sj).
I
j=l
Then
is called a positive interpolating operator with nodes
L (9)
sl,.... sm
Piecewise linear interpolation in one dimension;
Example.
S = [a,8], a = sl < s2 < ... < sm = 8.
Define
wj, j = 1,...,m
accord-
ing to:
to
a<s<s.
(s-s.
)/(s
-
- j-1
< s < s.
(only for
j > 1)
(only for
j < m)
j-
w (s)
See Fig. 13.1.
(sj+1-s)/(sj+l-sj)
sj L s < sj+1
0
sj+l < s <
This construction may be generalized to the "triangulation"
of multidimensional areas. functions
s.
)
wj
It is also possible to work with weighting
of a more general nature, e.g. piecewise polynomials of
degree higher than 1.
The following result motivates the use of positive interpolating operators. (10)
nodes
Theorem.
sl,.... sm.
Let
L
be a positive interpolating operator with
Then the linear optimization problem n
(PL)
Minimize
cTy
subject to
r=
(Lar)(s)yr > (Lb)(s),
s E S
=1
has the same feasible vectors
y
and hence the same solution as the dis-
The Simplex Algorithm and Discretization
13.
1
111
w3
------
b
s4
s3
SZ
S
m
Fig. 13.1
cretized problem (Pm). Proof:
a)
Let
(Lf)(si) = f(si), y
we find that
y meet the constraints of (P L).
Since
i = 1,...,m,
also satisfies the constraints of (P
m).
Assume on the other hand that
b)
n ar(si)yr > b(si),
E
i = 1,...,m.
r=1 wi(s) > 0, i = 1,...,m
Since
n
and
s E S, we get
m
E (Lar)(s)yr - (Lb)(s) _
r=1
i=1
nc
wi(s){
r=1
ll
yrar(si) - b(si)F 10 1
s E S, proving the assertion.
for all
The discretization (Pm) of (P) is equivalent to replacing (P) by a linear optimization problem with the same index set tions
ar, b
approximated by
Lar, Lb
S
respectively.
but with the funcIt is possible to
express the deviation of the optimal value of (P ) from that of (P) in
m
terms of the interpolation errors maxILa (s) - a (s)l, r sES r
maxlLb(s) - b(s)j.
r
sES
Compare Theorem (16)! (11)
be a grid.
Definition. Let
Let
S
be a subset of
Rk
and let
{sl,...Ism} c S
112
V.
h = h(s
Is - s.I. min 1 sES 1
.. ,s ) = max
m
1
Then
is called the roughness of the grid.
h
THE SIMPLEX ALGORITHM
Here,
I I
Euclidean distance in Exercise.
(12)
denotes the
Rk.
Consider the interpolating operator of (9).
that there is a constant
Show
such that
c
I f(s) - (Lf) (s) I < ch2
max
sE [a, s] when
f
is twice continuously differentiable.
Note.
Rk, k > 1.
Two numerical examples.
(13) a)
This result cannot be directly generalized to
Minimize
yI + y2/2 + y3/2 + y4/3 + y514 + y6/3
subject to the
constraints 2
yI + y2s + y3t + y4s2 + y5st + y6t IYrI < 10,
The index set
> es +t
,
s E [0,1],
t E [0,1]
r = 1,...,6. [0,1] x [0,1]
is replaced by the 25 points
si = 0.25 (i-1), tj = 0.25 (j-1), i = 1,...,5, of the grid is
2
h = 0.12S r2- c 0.1768.
j
= 1,...,5.
where
The roughness
The discretized problem is hence a
linear program with 6 variables and 37 constraints. of the simplex method.
(silt .)
It was solved by means
In the table below the solutions of the discretized
and the continuous problems are given.
The latter was solved with the
three-phase algorithm of Chapter VII. Discretized Problem Optimal value
2.41
Original Problem 2.44
Optimal solution yl
Y2
y3 Y4 y5 Y6
2.86
2.58
-4.69
-4.11
-4.69
-4.11
4.55
4.25
4.31
4.53
4.55
4.25
This example was solved by means of the computer codes of K. Fahlander
(1973).
13.
The Simplex Algorithm and Discretization
113
The following example gives an idea about how rapidly the dis-
b)
cretization error decreases when the grid is refined.
We consider the
problem 6
Minimize
6
subject to
E Yr r=l
E r=1
sr-ly
r >- 1/(1+s2),
We discretize this problem by replacing the index set SR = {s1
S
- -
0 < s < 1.
by the subset
i = 1,...,R}.
=
Using the simplex algorithm we got the results below.
(The original
problem was again solved by means of the three-phase algorithm of Chapter VII.)
Index Set
Roughness of Grid
S21
1/40
0.785 561 34
S41
1/80
0.785 568 72
S81
1/160
0.785 568 92
Optimal Value
0.785 569 11
S
(14)
Solution of linear optimization problems by discretizations. Si, k = 1,...
Select a sequence of finite subsets
of the index set
S
with the properties
h(SR) = max min
is - sl + 0
when
t +
(15)
sES s`ESR
and
R = 1,2,...
SR c SR+1'
.
The linear programs (PR) are solved by means of the simplex algorithm:
Minimize
(PR)
c
n
T
y
subject to
I
ar(s)yr > b(s),
s E S
r=1
An optimal basic solution to the dual (DR) can be used as the starting basic solution to Remark.
DR+l.
It is possible to prove that
lim v(PR) = v(P) R-
provided that the assumptions of the duality theorem (7) of §11 are met, the sequence of discretizations satisfies (15), S c Rk and the functions
al,...,an,b
are continuous on
S.
is a compact set,
114
V.
THE SIMPLEX ALGORITHM
The following simple theorem can often be used to estimate the difference between the optimal value of the discretized problem and that of the original problem. Theorem.
(16)
Let the linear optimization problem be such that there
is a vectors E Rn
P > 0
and a real number
satisfying
nn
a(s)TY =
L
ar(s)Yr > P,
s E S.
r=1
(si,...Ism)
Let
be a subset of
S.
The linear program arising when
is replaced by this subset is assumed to have a solution A
m
y(m).
S
Let
be such that
> 0 nn
ar(s)Yim) + Am > b(s),
s E S.
(18)
r=1
Then
v(P), the value of the linear optimization problem (P), can be
bracketed as follows: cTy(m) < v(P) < cTy(m) + Amp-1cTy Proof:
The leftmost inequality is well known.
See (12) of B.
show the other inequality we observe that the vector y = Y(m) + A m p-
1y
meets the conditions of (P).
We find from (17) and (18) that
n ar(s)Yr = r=1
E
ar(s)YTm) + AmP-1
r1
Hence we get v(P) < cTY = C
E
r=1
+
AMP-1cT
establishing the desired result.
ar(s)Yr > b(s),
s E S.
To
Chapter VI
Numerical Realization of the Simplex Algorithm
In this chapter we shall describe how to implement the simplex algorithm on a computer.
As stated earlier, this algorithm requires the
solution of a sequence of linear systems of equations.
We devote consid-
erable space to explaining how to solve such systems in a computationally efficient way.
In the last section we discuss the construction of a basic
solution with which one can start the simplex algorithm.
STABLE VARIANTS OF THE SIMPLEX ALGORITHM
§14.
Each exchange step of the simplex algorithm calls for the solution of three linear systems of equations.
In Substeps (El), (E2) and (E4)
we encounter A
xk k
= c,
(1) (2)
Tk
Aky
- bk,
Akdk = ak'
(3)
The meaning of the abbreviations with (41) of §12.
bk, ak
will be clear if we compare
We observe that the vector
ak
will not be known before
the system (2) is solved. In principle, one could solve the three systems (1), (2), and (3) straightforwardly in each exchange step of the simplex algorithm.
One
could use any of the standard methods (e.g. Gaussian elimination or Householder transformations) to calculate the vectors (1), (2), and (3) respectively.
xk, yk, and
dk
from
These and other numerical methods are
described in textbooks on numerical analysis, e.g. Dahlquist and Bjorck
115
116
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITH
VI.
(1974), Stoer (1976) and Stewart (1973).
Such a procedure can make sense in some cases, in particular when the number
is modest, say
n
n = 10.
quired grows rapidly with
n.
However, the computational effort re-
In a general case it increases as
n3.
Hence the total effort would be prohibitive for problems of a size often encountered in practice, i.e. with hundreds and thousands of variables, even if a large powerful computer is available.
Therefore several variants of the simplex algorithm have been developed in order to reduce the computational labor.
to exploit the fact that the matrices
Ak-1
and
The decisive idea is Ak
are closely related.
They differ only by one column vector.
We shall now discuss a variant of the simplex algorithm which is based on Gaussian elimination.
The rest of this section is not crucial for the
understanding of the simplex algorithm since it deals with the efficient and accurate solution of a sequence of linear systems of equations.
Hence
the reader may skip this topic during the first reading of the book without losing contact with the contents of succeeding sections.
We consider a linear system of equations of the form
where
A = (aik) (i,k = 1,...,n) is a fixed nonsingular matrix and
given vector.
b
a
In order to solve the system of equations one seeks to
determine a nonsingular matrix The product
R
of
F
and
with the following property:
F A,
FA = R
(5)
is an "upper triangular matrix" of the form r11
r12
..
r22
R =
rIn
with
r.. = 0,
i = 1,...,n.
O (5) is called a triangular factorization of the matrix (6)
A.
The factorization method for linear systems of equations.
pose a triangular factorization (5) is known.
Sup-
Then the system
Ax = b is equivalent to the system Rx = Fb.
(7)
14.
Stable Variants of the Simplex Algorithm
In order to solve
Ax = b
117
one first calculates the vector
b = Fb
and
then solves the system r11x1 + r12x2 + ... + rInxn r22x2 +
b1
b2
+ r2nxn
Rx =
= Fb. x r nnn
I bn
The last system is easily solved by means of back-substitution: x
n
= r-lb
nn n 1
rn-l,n-16n-1 - rn-l,nxn
xn-1
1
(8)
r12x2 - ... - rlnxn).
xl =
r11-
(9)
Solution of
ATx = b.
system of equations
The
ATx = b
(10)
which contains the transpose
of A
AT
factorization (5) is available.
can also be easily solved when a
Indeed, (10) is equivalent to the two
systems of equations RTy = b
(11)
x = FTy.
(12)
(This statement is verified by multiplying (12) by AT = RT(FT)-1.) solve (10) one starts by determining
y
To
from (11):
r11Y1 b2
r12y1 + r22y2
= b.
RTY =
l rlnyl + r2ny2 + ... + rnnyn Thus
y
y1,...,yn
b
I
n 1
is calculated by means of forward-substitution and one finds in analogy with (8).
from (12) without major effort. simplex algorithm.
The solution
x
is subsequently found
Consider now exchange step
k
of the
Let a triangular factorization
Fk Ak = Rk
of the basis matrix
Ak
be known.
Then the three linear systems of equa-
tions which appear in this exchange step,
118
VI.
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
Akxk = C,
T Akyk = bk, Akdk = ak,
may be solved as described in (6) and (9).
Numerical schemes for triangular factorization.
(13)
The most common
methods for calculating a triangular factorization of the type
FA= R Put
are based on the following idea.
A(1) = A A(2), ..,A(n-1)
and determine a sequence of matrices
according to the
rules
A(2) = F(1) A(1) A(3)
=
F(2) A(2)
F(2) F(1) A
F(n-1) A(n-1) = F(n-1)
A(n) = Here
=
F(1),...,F(n-1)
mined such that
...
F(1) A.
is another sequence of matrices which are detertake the form indicated below (here "x"
A(2), ..,A(n)
means that the element at this point may be different from 0)
A(2)
_
x
x
0
x
0
x
0
... ... ...
x
x
x
x
x
x
0
x
x
x
0
0
x
0
0
x
0
0
x
A(3) =
x
x
A(n)
_
x
x
x
x
... ...
x
...
... ... ... ... ...
x x x x
x
x x x
(14)
O X) Next we put F(n-1) i.e.
FA = R
A(n) = R.
...
The triangular factorization sought is then written
F(1) A = R,
14.
Stable Variants of the Simplex Algorithm
119
with F = F(n-1)...F(1).
Thus the original matrix A n-1
is brought to triangular form by means of
transformation steps. F(1),...,F(n-1)
Suitable matrices
F(1), i = 1,...,n-1
Gaussian elimination.
In the latter method one selects
are so-called elimination matrices and
G.
trices.
are orthogonal matrices, and the method based on
i = 1,...,n-1,
F(1) = G. Pi,
where
can be calculated in several dif-
We mention here the Householder transformations, in which
ferent ways.
(See below.)
Pi
permutation ma-
Due to space limitations we shall treat this method
only.
We
Triangular factorization by means of Gaussian elimination.
(15)
start by describing the first step of the method (13); i.e. the determination of
F(1) A Here
such that
F(1) =
A(2)
.
shall have the form (14).
A(2)
the idea of forming
A(2)
We borrow from Gaussian elimination
by subtracting suitable multiples of the first
row from the other rows of the matrix A
in order to render zero the ele-
ments of the first column in the second row, third row, etc.
We assume
first that all
+ 0.
The following "elimination matrix" has the desired effect: 1
-a21/a11 -a31/a11
-anl/all
O 1
0
0
1
(16)
0
1
One verifies this by means of straightforward calculation. one must proceed otherwise and exchange rows: ail # 0
all = 0,
If
one determines an element
and lets the first and the i-th rows change places.
The matrix
which results is then multiplied by an elimination matrix (16). In order to secure numerical stability, it is recommended to choose as the pivot element that element in the first row which has the largest absolute value:
120
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
jail) =
max k=1, ..,n
Iak1!.
Exercise (Permutation matrices).
(17)
Denote by
11 (i,k)
n x n
the
matrix 1
1
Row number i +
0 ...
1
1
1
Row number k -
... 0
1
1
.1 J
Thus we get
and
k
n
'k)
Show that
matrix.
of
be interchanging rows number II(i,k)A
Determine also
A.
n(i,k) = i
II(i,k)
i
and
k
in a unit
is obtained by exchanging rows number A R(1'k).
i
Finally, show that
(unit matrix).
We have thus constructed a matrix of type (14) by performing one step of Hence we obtain
the Gaussian elimination process.
A(2)
=
F(1) A
where
F(1) = G 1 Here
P1 (18)
P
1
.
is a permutation matrix and
GI
The general elimination step.
We now describe how to determine
x
x
x
0
x
.
A .
.
Let the matrix when
+1 .
an elimination matrix.
A(k)
x X
x x x
.
.
0
0
x
.
.
0
A (k) = (aid)) _
- k-th row
O x
4
k-th column We now perform the following operations:
A(k)
be given.
is of the form
14.
Stable Variants of the Simplex Algorithm
121
Consider the elements in column number
i)
the main diagonal of
A(k).
largest absolute value. ja(k)I Lk
=
which are on or below
Determine an element out of these which has
Let
be such an element, i.e.
a(k)
max la(k)I. ik k
Interchange rows number
ii)
k
k
and
k
of the matrix
A(k), i.e.
form the matrix
(k) P kA
where
(See Exercise (17).)
Pk = II(t,k)
Substeps i) and ii) are often
referred to as row-pivoting.
Consider row number
iii)
PkA.
in
k
of this row from all rows with numbers
Subtract suitable multiples
k+l,...,n
in such a manner that
all elements in the k-th column and below the main diagonal become zero. This means that we form A
where
(k+l)
= Gk 'k has the form
Gk
rl
0 1
+ k-th row
0 gk+l,k
(19)
1
0 gnk
with
Igikl < 1, i = k+l,...,n.
As a direct consequence of this scheme
we get (20)
Theorem.
Let
permutation matrices
A
be a nonsingular matrix.
Pl,...,Pn-l
Then there are
and elimination matrices
G 1,
...,Gn-l
such that Gn-1 Pn-1 ...
and
R
is an upper triangular matrix with
(21)
gi,k-1
G1 P1 A = R
Numerical realization.
of the elimination matrices
r.. # 0, i = 1,...,n.
One normally stores the elements Gk-l
in the positions of the matrices
V].
122
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
It is also
which are zeroed during the course of the computations.
A(k)
We shall here describe
necessary to keep track of the row interchanges.
a procedure which is advantageous to use in conjunction with the simplex algorithm, especially when one applies the "stable updating" to be discussed later.
We want to store explicitly the matrix
FG
Pn-1 ... GI P1
n-1
n x n
which is obtained by multiplying the Pn-1 Gn-1' one stores the
unit matrix by
This structure is exploited as follows. n x 2n
P11 G1
At the start
matrix
B = (A,I).
All row operations which are needed for the transfer from A(k+l)
A(k)
to
(row interchanges, additions of multiples of a certain row to
other rows) are carried out on all of
In this way we get the sequence
B.
of matrices
B(I) = B = (A, I) B(2) = GIP1B =
B(n) = G
P
n-1
(A(2),GIP1)
n-1
B(n-1)
_
(A(n),G
n-1
P
n-1" ' G 1 P 1)
(R , F) . Thus the matrix
has been replaced by the matrices
r
4
I
2
l
-3
2 3 5
R
and
F.
We want to factorize
Example.
(22)
A=
B
1
3 2
Thus we put
B= (A,1) =
4
2
3
1
0
0
2
3 5
1
0 0
1
0
0
1
-3
2
= B (1)
.
1
No row interchange is required in the first step since the element in the first column with the largest absolute value is in the first row.
Accord-
ing to iii) of (18) we subtract 1/2 times the first row from the second We then obtain
row and (-3/4) times the first row from the third row.
B(2) =
(A(2),GIP1l =
4 0 0
2 2
13/2
3 -1/2 17/4
1
-1/2 3/4
0
0
1
0
0
1
I.
14.
Stable Variants of the Simplex Algorithm
123
The second and third rows are now interchanged: 4 P2G(2)
_
(P2A(2), P2GIP1) =
2
0
13/2
0
2
3 17/4 -1/2
1
0
0
3/4 -1/2
0
1
1
0
I.
The last elimination step (subtraction of a suitable multiple of the second row from the third) gives
14 B(3) _ (R,F) =
2
13/2
0 0
0
3
0
1
17/4 -47/26
0
3/4 -19/26
0
1
1
-4/13
2 3
3
4
2
5
1 2
It is easy to check that 4 2
FA
-193/4 /26
0
-4/13
Exercise.
(23)
A=
3 2
1
6
1
3
1
1
1
(
0
3
17/4
0
i302
.
-47/26
R
Factorize the following matrix
Remarks.
(24)
-3
=
(a)
The factorization
FA = R is closely related to the so-called LR-decomposition of
A.
Thus one can
show that F = Gn-I ... G' P where n-1 ... Pi+l Gi Pi+l .. Pn-l'
Gr
i
i = 1,...,n-1,
Gnr-1
- Gn-1'
and p
Every
= Pn-1 ... Gi
GI G11
P1'
is again a matrix of type (19). -1 G' 2
(Why?)
Therefore
r-1
... G n-1 = L
is a lower triangular matrix, which is easily verified by means of straightforward calculation.
One obtains the decomposition
PA = LR where
L.. = 1, 1 = 1,...,n.
124
NUMERICAL RELIZATION OF THE SIMPLEX ALGORITHM
VI.
The method for calculating the factorization
(b)
FA = R
which we
have described above is numerically stable with respect to the round-offs which occur during the course of the computations. fact here.
We shall not prove this
This stability is the reason for the use of factorization
methods in "modern" realizations of the simplex algorithm. Consider now the k-th exchange step of the simplex algorithm. Ak
be the basis matrix.
The matrices
Fk
and
Rk
Let
in the factorization
FkAk=Rk The solutions
are calculated as described in (21).
xk, yk
dk
and
to
the linear system of equations Ak xk = c,
T Ak yk = b
k
Akdk=ak are determined as described in (6) and (9).
We have already said that one
should exploit the fact that two successive basis matrices
Ak
Ak
and
l
differ only in one column in order to get an efficient numerical realization of the simplex method.
We now show how to update a factorization of
Ak; i.e. to calculate the factorization of Ak+1 Modification techniques.
(25)
from that of
Ak.
By this we mean methods which allow
us to pass from a decomposition
FA = R of the
n x n
matrix A
to the corresponding decomposition
FA=R A arises from
where
A
as a result of "small changes"; e.g. change of a
row or column or the addition or deletion of a row or column. Use the same notations as before.
Let
A
be a fixed
n x n
matrix
and suppose the decomposition
(26)
FA = R is known.
We denote the column vectors of A
by
al,...,an:
A = (all ...,an). Let
a*
be a fixed vector.
We consider the matrix
A when the r-th column vector is removed from A added as the last column:
A
which arises from
and the vector
a*
is
14.
Stable Variants of the Simplex Algorithm
A = (al,..,ar-1'
125
ar+1°...,an,a ).
We seek the matrix FA = H.
Fail i = 1,...,n
The vectors
column vectors of
are known from (26) since they are the
Thus
R.
H = (Fal,...,Far-l, Far+1....,Fan, Fa*) and
H
is a matrix of the following form:
x x
x
x
x
FA= H= O x x
x
t
r-th column The first
r-1
columns of
are identical with those of
H
r-th through (n-l)-th columns of H coincide with the last of
R.
The last column of H
The matrix
H
is the vector
and the
R
n-r
can now be brought into upper triangular form by means
of a sequence of Gaussian elimination steps with row-pivoting. with (18).) ing rows.
columns
Fa*.
(Compare
Here one needs only to consider the exchange of two neighborOne thus obtains an upper triangular matrix
Gn-1 Pn-1 ..'
Each matrix
G. i
Gr Pr H = R.
R
through (27)
has the form 1
row i row i+l
1
-o-
0
gi
1
1
Pi
is either the unit matrix or the matrix which arises from the unit
matrix by interchanging rows number
i
and
i+l.
We have further that
126
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
Igil < I.
From (27) we get Gn-1 Pn-1 ...
r Pr FA = R.
Putting Gn-I ... Gr Pr F, we get the decomposition sought: FA = R.
Numerical realization of the modification.
(28)
tion
n x 2n
Let the factoriza-
be calculated according to (21) and given in the form of the
FA = R
matrix
(R,F).
One passes to the matrix (H,F)
We now apply
where the Hessenberg matrix is formed as described in (25).
Gaussian elimination to this matrix according to (18) and bring H upper triangular form.
The final result is then (after
n-r-1
into
elimina-
tion steps) the matrix (R,F)
and the desired decomposition of
is
AA
FFA = R.
The validity of the procedure just given is a consequence of the relations R
Pn-1 ...
Gr Pr) H
(G n-1
and f = (G'n-I Pn-1 ... Gr Pr) F. (29)
In (22) we calc ula ted the fac toriza tion
Example.
FA =
1
0
0
4
0
1
2
2 3
3
3/4
1
-4/13
-3
5
2
-19/26
4 0 0
1
2
13/2 0
Now we want to determine the corresponding decomposition
A=
4
3
1
2
1
2
-3
2
4
3
17/4 -47/26
FA = R
R.
with
A has arisen from A by replacing the second column by the third and the third by y the new column vector
a* = (1,2,4)
T
For simplicity, we work
14.
Stable Variants of the Simplex Algorithm
Thus we start with the matrix
with 3 decimal places.
(R,F) =
Entering
4
2
0 0
6.50
3
0
4 0
3 4.25
0
-1.81
0
0
1
0.750 -0.731
4.25 -1.81
0
1
-0.308
1
and following the rules of (28) we get
Fa* = (1, 4.75, 0.037) T
(H,F) =
127
1
0
1
0.750 -0.731
4.75 0.037
0
0
1
-0.308
1
We must add the second row, multi-
Only one elimination step is required.
plied by 1.81/4.25 = 0.426 to the third row.
(R,F) =
I
We then get
4
3
1
0
4.25
0
0
4.75 2.06
0
1
0.750 -0.412
0
0
1
1
0.118
which defines, within working precision, the factorization
FA = R.
Check 1
0
0
4
3
1
1
0.118 ,( -3
2
4
4
3
0.002
0.000
1
FA = I
-0.412
,
-
(
2.06
By simply counting the number of multiplications and divisions required,
factorization
such operations are necessary to determine the
n3/3
one finds that about
A
when
FA = R
on the order of magnitude of to carry out one modification.
is an
n2
n x n
matrix.
On the other hand,
multiplications and divisions are needed If we neglect the "administrative over-
head" of the computational program, the addition and subtractions, we may conclude that the use of modification techniques will entail substantial savings
for large
n.
However, for contemporary computers (1981) it is a
very rough approximation to neglect the time required for an addition or subtraction in comparison to that needed for a multiplication or division. In the present case the conclusion will not be altered even if we consider additions and subtractions as well.
will increase slowly with
n.
Normally, the administrative overhead
Modification techniques are almost a must
for treating large linear optimization problems within reasonable time. (30)
the
n x 2n
Summary.
matrix
B0 = (A0,I)
Let a starting basic matrix
A0
be given.
Define
128
VI.
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
and determine, as described in (21), the matrix (RO,F0)
such that
F0 R0=A0 (n-1
row-pivoting and elimination steps are required).
general step.
We discuss the
Suppose the matrix
Bk = (Rk, Fk)
The basis matrix
has been calculated.
Ak
has the factorization
Fk Ak = Rk.
If now the column vector
ar
is to be removed and the entering vector
a*
is determined, one calculates Bk+1 = (Rk+1' Fk+1) as described in (25) and (28).
the "new" basic matrix
In this way we find the factorization of
Ak+1'
Fk+l Ak+l = Rk+l.
415.
CALCULATING A BASIC SOLUTION In order to start the simplex algorithm we need a basic solution
(a0,x0}
We shall now describe how to construct such a starting
of (D).
solution.
We consider again the linear optimization problem (P)
where
Minimize S
cTy
subject to
a(s)Ty > b(s),
s E S.
is an arbitrary index set.
As in (16) of 411 we introduce the regularized problem (F > 0
is a
fixed real number) (P F)
Minimize
cTy
subject to
a(s)Ty > b(s),
eT
s E S
y > -F, r = 1,...,n.
-eT y > -F,
Here, er valent to
is the r-th unit vector and the last
2n
constraints are equi-
Calculating a Basic Solution
15.
129
r = 1,...,n.
lyrI < F,
If (P) has a solution
(1)
lyrl < F
such that
y
then
is a solution to
y
(P F) as well and the values of (P) and (PF) coincide.
Hence one can
solve (PF) instead of (P). The dual problem (DF).
(2)
the last
Let the dual variables associated with
constraints of (PF) be
2n
mr, mr, r = 1,...,n.
The dual
takes the form
n
q Maximize
(DF)
b(si)xi - F i=1
(mr + m ) r
I
r=1
subject to
q
ar(s i)xi
i=1
xi > 0,
+ m
r
- m = c r r
,
r = 1,...,n
i = 1,...,q
m r > 0, r = 1,..
m
r
n.
> 0,
The second term in the preference function of (DF), (m
nn
-F
r=1
r
+ m), r
(3)
may be interpreted as a "penalty" for violating the constraints q
ar(si)xi = cr,
r = 1,...,n.
i=1
If
is large enough, the constraints will be satisfied.
F
The advantage of considering the regularized problem stems from the fact that one can immediately find a basic solution of (DF). Construction of a basic solution of (DF).
(4)
Put
and con-
x0 = 0
sider the system e-
0+
mr - mr = cr,
r = 1,...,n.
(5)
We get a basic solution of (5) and hence also of (DF) by putting, for each r, one of the vectors cr > 0; otherwise
er
(-er)
or
follows:
n E
r=1
n
e
r mr
+
(-e r)
in the basis.
goes into the basis.
I(-e r)mr = c. r=l
We select
er
if
(5) can now be written as
130
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
We note that the basic solution is regular if c
# 0,
r
r = 1,...,n.
Otherwise it is degenerate.
The simplex algorithm can of course be
started in both cases. (6)
Remark on the value of the parameter F.
The starting method
described above can always be used when a suitable a priori estimate of the solutions of (D) is available.
If
F
is chosen too small, however,
then the solutions of (DF) are not feasible for (D).
Hence it is not
possible to start with the basic solution of (DF) given in (4) and to use the simplex algorithm to find a basic solution free from all the vectors er
and
-e r, r = 1,...,n, or with all the corresponding variables
my
equal to zero.
In this case one could of course increase
F
m+,
and con-
tinue with the simplex algorithm.
One arrives to the so-called two-phase method of the simplex proIf no "realistic" estimate (1) is available
cedure by arguing as follows. then one chooses
F
very large.
This means that the first term of the
preference function of (DF) has a relatively small influence. therefore be neglected.
It can
Then we consider instead the problem:
n
Maximize
(mr + mr)
-
r=1
(7)
subject to the constraints of (DF). (8)
Phase I of the simplex procedure.
The simplex algorithm is
applied to the following dual pair of linear optimization problems: (P1)
Minimize
subject to
cTy
a(s)Ty > 0,
s E S
eiry > -1,
-e Try > -1, (D1)
Maximize (9)
n - I (mr + m ) r r=1
Exercise.
r = 1,.-.,n-
subject to the constraints of (DF).
Confirm that (P1) and (D1) form a dual pair of linear
optimization problems.
Also discuss in what sense (P1) can be looked upon
as a limiting case of (P F) when
F -+
When (D1) is treated with the simplex algorithm one seeks to satisfy the constraints of (D).
If
v(D1) = 0, then (D) is consistent, and if
v(DI) < 0, then (D) is inconsistent.
15.
Calculating a BaSic Solution
131
We assume now that the simplex algorithm has delivered an optimal basic solution to (DI) after finitely many exchange steps and that the corresponding value of the dual preference function is zero.
Thus (D) is
feasible.
The basis vectors of this optimal basic solution are called
ai,
i = 1,...,n: ai E {a(s)
If now all
s E S} U {er
I
I
r = 1,...,n} U{-er
I
r = 1,...,n}.
are of the form
ai
i = 1,...,n,
ai = a(si),
i.e. none of the vectors
a
r
and
-e
appear in the basis, then one may
r
put a 0 = {s1,.. ,sn},
which is then a basic set for (P) algorithm to (P) cedure.
- (D).
We may now apply the simplex
- (D) and have thus entered Phase II of the simplex pro-
Hence this is always possible if the optimal value of (D I) is
zero and Phase I of the simplex procedure delivers a regular optimal basic solution.
er
or
(-er)
or
mr
If the value of (D I) is zero, then none of the vectors Mr can appear in the optimal basis with a positive weight
respectively.
We assume now that (D I) has the optimal value zero and a degenerate
optimal basic solution where at least one of the vectors form
er
or
as follows:
ai
is of the
Then one proceeds in Phase II of the simplex method
-er.
consider the modified dual problem: q
(DII)
Maximize
b(si)xi
subject to
i=1
q
+
ar(si)xi + mr - mr = cr,
r = 1,...,n
i=1
n
r=1
The constraint
(m+
r+mr) = 0
xi > 0,
i = 1,...,q
mr > 0,
mr > 0,
r = 1,...,n.
En=I (m+ + mr) = 0
has been introduced to insure that
r
every feasible solution of (DII) satisfies Therefore
DII
is equivalent to
D.
mr = 0, mr = 0, r = 1,...,n.
Every basic matrix of
DII
has, of
132
NUMERICAL REALIZATION OF THE SIMPLEX ALGORITHM
VI.
course, n+1
We now show how to construct a starting
column vectors.
basic set for D11
from an optimal basic solution of
DI.
Let an optimal basic matrix of (D I) contain the following
n
column
vectors:
al = a(s1),...,ak = a(sk),
ak+l = ei...,ak+i = eip -elk+l,...,an = -ein-k,
ak+t+1
where
and
k
A =
k
are fixed integers.
The basic matrix thus has the form
a(s1)...a(sk) 0 -1
00
Now let
be any vector out of the set
e
belong to the set
{ak+R+1'
and the following
(n+l) x (n+l)
...,an},
{ak+l.....an}.
Then
-e
cannot
since the basic matrix is nonsingular,
matrix is a starting basic matrix for
(DII) :
0...OAl...1
I
1
).
k-th column Exercise.
(10)
A
Show that
is nonsingular.
If the calculations are carried out as described in §14, then we leave Phase I with the basic matrix
A
factorized according to
FA = R.
This decomposition is recorded as the
n x 2n
matrix
(R, F) . We describe next how to find a corresponding factorization
FA= R, i.e. the (R,F)
(n+l) x 2(n+l)
matrix
15.
Calculating a Basic Solution
from
133
This is done by means of a method similar to the modification
(R,F).
techniques of §14.
We find from (11) that R
-Fe
0...0 1...I
1
0
F
A = 0
f
1
The matrix on the right is "almost" in triangular form. most
Consider the
form.
One needs at
permutation and elimination steps to bring it into triangular
n-k
(n+l) x 2(n+l)
R
-Fe
matrix
F (12)
0...0 1...1
After
0...0
1
Gaussian elimination steps, (12) is changed to assume the form
n-k
Ix xx
...
x
x
...
x )
O Hence we have the desired factorization
FA= R of the starting basis
matrix already in Phase I.
2(n+l)
an
It may be more practical to work with an
Remark.
(13)
A.
n x n
A
(n+l) x
We form it as follows, where
A0
is
basis matrix: 0
I
0
0
0...0
0
0
0
After factorizations we get R
0
F
0
0
0
0
0
(14)
It is now easy to supplement (14) to obtain (12).
Then the matrix
(R,F)
is calculated as earlier described and one may enter Phase II.
Chapter VII
A General Three-Phase Algorithm
In this chapter we shall describe a computational scheme for efficient numerical treatment of general linear optimization problems with infinitely many constraints.
For this purpose we shall derive a nonlinear
system of equations from whose solutions one constructs an optimal soluThe general scheme is then presented and
tion of the original problem.
its use is illustrated in several numerical examples. Thus we consider again the dual pair (P) - (D): (P)
Minimize
c
n
T
subject to
y
ar(s)yr > b(s),
s E S.
r=1 q (D)
Maximize
b(si)xi
subject to
i=1
q
ar(si)xi = cr,
r = 1,2,...,n,
i=1
In this chapter we shall require that (P) and (D) are solvable and that no duality gap occurs.
compact subset of S.
Rk
We shall further assume that and that
all...an, b
S
is a nonempty
are continuous functions on
Later, we shall also impose the condition that they have continuous
partial derivatives up to a certain order.
134
16.
§16.
Nonlinear Systems Derived from Optimality Conditions
135
NONLINEAR SYSTEMS DERIVED FROM OPTIMALITY CONDITIONS Theorem.
(1)
be an optimal solution to Problem (P) and let
y
Let
{s1,...,sq; x1,...,xq}
be an optimal solution to (D) with
1 < q < n
and such that
i = 1 , .. ,q.
x. > 0,
i
(2)
Put
n
f(s) = I ar(s)yr - b(s). r=1
Then
y1,...,yn, sl,...Isq, X1,.... xq
have the properties (3), (4) and
(5) below;
nn
i = 1,...,q;
ar(si)yr = b(si),
(3)
r=1 q
ar(si)xi = cr,
r = 1,...,n.
(4)
i=1
The function Proof:
f
assumes its minimal value at sl,...Isq.
(5)
(3) follows from (2) and the duality slackness conditions (4) expresses the fact that
(14) of §12.
feasible for (D).
f(s) > 0,
y
Since
{sl,...,sq, X11 ...,xq}
is
is a feasible vector for (P) we have
s E S.
(6)
By (3), f(si) = 0, i = 1,...,q, establishing (5). In the computational scheme to be described in this chapter, (3), (4), and (5) will be used for the calculation of xl,...,x
We shall assume that
.
y1,...,yn, sip...IsgI
can be determined, e.g. from a suf-
q
q
ficiently fine discretization of (P). and refer to
We shall call
x1,...,z
as the corresponding mass-points.
sl,...,sq
q Thus
masses q
is
the number of mass-points. (7)
Since we have assumed that
Remark.
each mass-point
si
S
corresponds to a vector with
si,...,si, (i = 1,...,q).
is a subset of k
Rk,
components
Thus
n + kq + q "unknowns"
1
yl
k
1
k
yn' sl,...,sl, sq,...,sq, x1,...,xq
the calculation of the primal and dual solutions. q + n
will appear in
(3) and (4) will give
136
A GENERAL THREE-PHASE ALGORITHM
VII.
equations which must be satisfied by these unknowns. equations will be derived from (5).
The "missing"
kq
Then we will get a system of equa-
tions with the same number of equations and unknowns.
Its solution is
then used to construct optimal solutions to (P) and (D). Exercise.
(8)
Show that
y1,...,yn, sl,...Isq, xl,...,xq
tions to (P) and (D), if they satisfy (2) Example.
(9)
-
ar, r = 1,...,n, and
Let the functions
tinuous partial derivatives of first order on i = 1,...,q, lie in the interior of of
= 0,
s.
asl
and
j = 1,...,k
S.
are solu-
(5).
S.
b
have con-
Assume also that
si,
Then (5) entails
i = 1
1
Therefore we get in this case the following system of
kq
equations
n
i = 1,...,q.
Var(si)yr = Vb(si),
(10)
r=1
(The gradient vector Vf
of a real-valued differentiable function
f
is here defined by
k
Of(s) = (
(s),...,af(s)
)
as
as
Thus we get, by combining (3), (4), and (10), a nonlinear system of equations with
n + (k+l)q
unknowns and the same number of equations.
This
system may be treated by means of one of the standard numerical schemes, e.g. the Newton-Raphson method.
See e.g. Dahlquist-Bjorck (1974) or Stoer
(1976). (11)
Remark.
It is well-known that the conditions (10) are neces-
sary for (5) but not sufficient.
Thus a solution to (3),
(4) and (10)
that also satisfies (2) may not satisfy (5) and hence may not be a solution to (P) and (D).
In order to establish that a candidate solution ob-
tained from the necessary conditions really solves the dual pair (P) (D), one must verify that the infinitely many primal conditions n ar(s)yr > b(s),
s E S,
r=1
are met.
Before we discuss the case when a mass-point is situated on the boundary of
S
we shall make some important observations about the
determination of the integer
q
by means of discretization.
Nonlinear Systems nerived from Optimality Conditions
16.
Let (PR) be a discretization of
P
137
be
(compare §13) and let {rrt,xt}
an optimal basic solution of the corresponding dual problem (DR); oR = {sit,...IsnR} c SR, xg = {x1g,...,xnR}T E Rn.
Denote by
qt
the number of positive components of
the basic solution is termed degenerate if to expect that if the grid q
Si
qt < n.
We recall that
xt.
It seems reasonable
is sufficiently fine then
qR = q
where
is the number of masspoints of a solution of (D).
However, several numerical examples have been solved where the contrary is true.
In almost all problems, one finds that
qt=n for all discretizations (P t) - (Dt), irrespective of the fineness of the grid.
Thus the discretized problems are generally not degenerate.
This
observation agrees with the theoretical result that the degenerate linear programs are, in a certain sense, more rare than the regular ones. On the other hand, the case q < n
is fairly common in optimization problem with infinitely many constraints. Nevertheless, the integer
q
can be determined from the solutions of dis-
cretized problems, as is illustrated in the following example. Example.
(12)
We want to solve the (primal) problem
8
Minimize
I
yr/r
r=1
subject to 8 r=l
syr r-1 > 1/(2-s), -
s E [0,1] = S.
We discretize and select the following subsets SR = {O,hg,2ht,...,l}
with
SR
of
S
(t > 2):
ht = 1/(t - 1).
The corresponding discretized Problems (Pt) - (DR) were solved on a computer by means of the simplex algorithm for
t = 21, 41, 81.
qt = 8
obtained in all three cases and the following basic sets emerged:
was
138
VII.
k = 21
R = 41
021
k = 81
041
0.0000
0.0000
1500 0.2000
0 1500
0
0.1750
0.1750 }
0 . 5000
0.5000 0.5250
0.5000 0.5125 }
0.5500
.
.
1
1625
8000 0.8500
0.
8250 0.8500
0 8250
1.0000
1.0000
1.0000
0.
Group
081
0.0000 0.
A GENERAL THREE-PHASE ALGORITHM
2
3
.
4
0.8375 }
5
We note that the eight numbers in each column may be divided into five The elements of Groups number 2,3,4 lie closely together and the
groups.
distances between the two elements in these groups get smaller with increasing
I.
It is reasonable to assume that q = 5
holds for the "continuous" problem.
This conjecture can be shown to be
true by means of the theory of Chapter VIII.
Now we shall demonstrate
how to derive a nonlinear system for the primal and dual problems of this particular example. q = 5
mass-points
There are 8 primal unknowns, namely sl,...Is5
y1,.... y8, and
with the corresponding masses
The results of the table above indicate
sl = 0
x1,...,x5.
s5 = 1, i.e. we
and
assume that these points lie on the boundary of the interval may also be concluded from the theory of Chapter VIII.
(0,1].
This
There remain the
8 dual unknowns s2, s3, s4, x1, x2, x3, x4, x5.
Hence there are in total 8 unknown numbers to determine. equations result from (3) and
n = 8
Now
equations from (4).
q = 5
The missing 3
equations are obtained from (5) and the observation that the "error function"
f
assumes its minimum value at
s2, s3, s4.
Hence its derivative
must vanish at these points, giving the 3 equations sought. the 16 equations (observe that
s1 = 0
and
s5 = 1) :
We give now
16.
Nonlinear Systems Derived from Optimality Conditions
- 1/2
yl
139
= 0
yl + s2y2 + s2y3 + ... + s2y8 - l/(2-s2) = 0 7
2
yl + s3y2 + s3y3 + ... + s3y8 - 1/(2-s3) = 0 yl + s4y2 + s4y3 + ... + s4y8 - 1/(2-s4) = 0
yl +
y3 + ... +
Y2 +
y8 - 1
xl+x2+x3+x4+x5 - 1
=0
0
s2x2 + s3x3 + s4x4 + x5 - 1/2 = 0
- 1/3 =
s2x 2 + s3x 3 + s2x 4 + x 5 4
0
7
s2x2 + s37 x3 + s4x4 + x5 - 1/8 = 0
y2 + 2s2y3 + ... + 7s6 y8
-
1/(2-s2)2 = 0
y2 + 2s3y3 + ... + 7s3 y8
-
1/(2-s3)2 = 0
y2 + 2s4y3 + ... + 7s6y8
-
1/(2-s4)2 = 0.
We recommend that the reader verify each of these equations. equation gives immediately
yl = 1/2
(The first
and this value can be entered into
the remaining equations decreasing the size of the system somewhat.)
We
write the above system in the form gl(Y,s,x) = 0 (13) g16(y,s,x)
= 0
where we use the notation (Y,s,x) = (Y1,---,Y8's2,s3,s4' x1,...,x5).
We next show how to construct an approximate solution from the solutions (y,s,x)
(yi;aR,xi)
of (Pt) - (DR).
(y,s,x)
to (13)
The approximation
may then be improved by an iterative scheme, e.g. the Newton-
Raphson method.
We put
yiI = yiR, i = 1,2,...,8
(here, Yt = (Ylt'-.-'y8t)); i.
xl£ = x1R,; xit = x2i-2,t + x2i-i,k.'
= 2,3,4; xst = x8t
sit = (x2i-2,Ls2i-2,L + x2i-1,k * s2i-l,k)/xit'
i = 2,3,4.
A GENERAL THREE-PHASE ALGORITHM
VII.
140
is the center of gravity of the mass-points belonging to Group
Thus
sik number i.
The 'goodness' of this approximation is expressed by the number pt =
Thus
p.
max Igi(yX,sR,xR)1i=1,...,16 is the maximum norm of the residual vector of the system (13).
We get the following table:
Group
£=21
i=41
xil
sil
xil
sil
0.048495
0
1
X=81
xil
sil
0.049853
0
0.049940
0
2
0.1 74643
0.277444
0.172802
0.272652
0.172742
0.272418
3
0.5 00004
0.348128
0.500003
0.354998
0.500004
0.355290
4
0.8 25362
0.277439
0.827202
0.272648
0.827262
0.272413
5
0.048494
1
Residual norm
1.40-10 -
pl
0.049852
1
3
1,16.10
0.049938
1
4
5.22-10 -
5
The exact mass-points and masses are given below: sl = 0
xI = 1/20
s2 = 0.5(l-/77) = 0.172673
x2 = 49/180 = 0.272222
s3 = 0.5
x3 = 16/45
s4 = 0.5(1+/) = 0.827327
x4 = 49/180 = 0.272222
s5 = 1
x5 = 1/20
= 0.05
= 0.355556
= 0.05.
It is generally true that very good approximate solutions to the nonlinear system of equations can be constructed by means of discretization, linear programming and clustering of mass-points by determining centers of gravity as described above.
Sometimes it is not even necessary to improve upon
this approximate solution by means of iterative methods. (14)
Use the same method as in the preceding example to
Exercise.
solve the problem 10
Minimize
E
r=l
10
y /r r
subject to
E
r=1
> -1/(1+s2),
sr ly
s E [0,1) = S.
r
Use for the discretization an equidistant grid with
N = 41
the corresponding linear program with the simplex algorithm.
points.
Solve
The approxi-
17.
A General Computational Scheme
141
mate solution found in this way may be compared with the true result which is as follows:
q = 6; the mass-points and masses are
S.
X.
0.037989
0.096417
0.190708
0.202986
0.427197
0.259604
0.686634
0.247041
0.897894
0.165077
1.000000
0.028876
i
The primal solution is y = (-1.000000, -5.837.10-5, 1001582, -0.020238, -0.856457, -0.612559, 2.622486, -2.606125, 1.188151, In the next section we shall describe a general computational scheme.
A GENERAL COMPUTATIONAL SCHEME
517.
Retain the general assumptions of the beginning of this chapter, inal,a2,...,an, and
cluding the requirement that
derivatives of the first and second order.
b
have continuous partial
We propose
A general computational scheme consisting of the three phases
(1)
1), ii) and iii) below.
The dual pair (P) - (D) is discretized; i.e. the infinite index
i)
set
is replaced by a finite subset.
S
The resulting dual pair of linear
programs is solved by means of the simplex method. ii)
The structure of the nonlinear system (3), (4), (5) of §16 is
determined from the calculated optimal solutions of the discretized problems.
A tolerance
a
is selected.
If among the mass-points of the solu-
tion of the discretized problem there are two mass-points with masses
xi
and
x.
such that the distance between
si si
and and
sj
is
s i
less than
c, then they are replaced by a mass-point
s
carrying mass
z
where x = xi + xi,
x = (xisi + x.s.)/i.
This procedure is repeated as long as there still are two mass-points lying closer to each other than
E.
A nonlinear system is now derived by combining (3), (4), (5) of 516.
142
VII.
iii)
A GENERAL THREE-PHASE ALGORITHM
The nonlinear system obtained in Phase ii) is solved by some If the calculated
numerical procedure, e.g. the Newton-Raphson method.
solution satisfies the feasibility conditions of (P) and (D), it is acOtherwise one reenters Phase i) with a refined grid.
cepted as optimal. Remark.
The scheme described above has been successfully applied to It is recommended to use a numerically stable
many practical problems.
realization of the simplex algorithm in Phase i), e.g. the version described in §14, which uses stable updating of the basic matrix. In Phase ii) we construct a nonlinear system by combining (3), (4),
Thus if
and (5) of §16.
is an interior point of
s i
If
tions from (5) of §16 as explained in (9) of §16.
point of
S, we get s i
k
equa-
is a boundary
S, one may proceed as explained in the Example (3) below if
S
A more general description can be formulated by
has a simple structure.
means of the so-called Kuhn-Tucker conditions if
S
is defined through a
set of inequalities: S = {s E Rk: hi(s) < 0,
j = 1,...,p}.
The reader is referred to Collatz and Wetterling (1971) for a discussion of this topic. (2)
If the tolerance
Remark.
a
is selected too large in Phase ii)
or the grid of Phase i) is not sufficiently fine then we may enter Phase iii) with the wrong nonlinear system and the Newton-Raphson iterations diverge or converge to a "solution" which does not define a feasible vector
y
In both cases one reenters Phase i) with a finer grid
of (P).
and reduces the tolerance
in Phase ii).
a
It is possible to show that
Phase iii) succeeds provided that the grid in Phase i) is sufficiently fine, a
in Phase ii) is small enough, and certain general regularity A general three-phase scheme for semi-infinite pro-
conditions are met.
grams of the type given above was first published in Gustafson (1970). (3)
Example.
Let
in the plane (k = 2).
S
be the set
[0,1] x [0,1], i.e. the unit square
We consider the case
n = 8.
Assume that after
carrying out Phase i) (solution of the discretized version of the dual pair (P) - (D)) we get a mass-point distribution as depicted in Fig. 17.1.
8 mass-points appear in 4 clusters containing 1, 2, 2, and 3 mass-points respectively.
Assume now that
a
each cluster with one mass-point. the 4 mass-points
is chosen such that Phase ii) replaces Thus
q = 4.
Hence we must determine
sl,...,s4' the corresponding masses
the 8 primal variables
y1,...,y8.
Since each
s i
x1,...,x4, and
is a two-dimensional
17.
A General Computational Scheme
143
S
Fig. 17.1
4.2+4+8 = 20
vector we have in total
unknowns.
Due to the character of
the solution of the discretized problem we assume that upper right corner, s2 interior of and
s1
on the left-hand boundary and
Thus we should have
S.
are the first and second components of
of remaining unknowns is 17. 8
22
r=1 as s
s1 = s1 = 1
a, (S )yr -
8
a (s.)y r=1 asi r 7 r
-
a2
as
and s1.
s1
is in the
s3, s4 s2 = 0
in the
where
s1
Therefore the total
(5) of §16 now gives the 5 equations
b(s2) = 0
8 j = 0, asi b(s)
i = 1,2
and
j = 3,4,
with (3) and (4) of §16 giving the remaining 4 and 8 equations.
Hence
we have constructed a nonlinear system of equations with the same number of equations as unknowns. (4)
Exercise.
equations when
S
Describe how to construct the nonlinear system of is the unit circle in the plane and some mass-points
are situated on the boundary. (5)
Example.
(13) of §13.
table below:
We consider the first example which was discussed in
Upon discretization we get the mass-points indicated in the
144
VII.
Coord. of s.
i
A GENERAL THREE-PHASE ALGORITHM
X.
1
1
1.00
0.0 0
0.0667
2
0.50
0.2 5
0.2667
3
0.25
0.5 0
0.2667
4
0.50
0.5 0
0.2000
5
0.00
1.0 0
0.0667
6
1.00
1.0 0
0.1333
2 s
55
1
s6
s3
X
s4
s2 1
®
>
Si
s
1
Fig. 17.2 (0 Mass-points of the discr. problem) (x Mass-points of the cont. problem)
Here the 6 mass-points appear in 4 clusters, one of which has 3 members, the other 3 having one mass-point each.
Phase ii) gives
q = 4
and the
following initial approximation for the solution to the dual problem (D): i
Coord. of mass-point si
1
1
0
0.0667
2
0.4091
0.4091
0.7334
3
0
1
0.0667
4
1
1
0.1333
Mass xi
Derive the corresponding nonlinear system of equations (it has
6+4+2
unknowns) and verify that it is satisfied by the optimal solution of the primal problem (given in (13) of §13) together with the following masses and mass-points:
17.
A General Computational Scheme
145
i
Coord. of mass-point i
1
1
0
0.083333
2
0.400000
0.400000
0.694444
3
0
1
0.083333
4
1
1
0.138888
Mass X. i
We shall also discuss the class of problems which were treated in §6, namely calculation of uniform approximations.
The computational scheme
which is described in this chapter has been very efficient for the solution of this class of problems, in particular by approximation on multipledimensional sets. Let
be a compact subset of
T
ments and let
v1,...,vn, and
f
(k > 1) with at least
Rk
be real-valued functions on
n+l T.
ele-
The
approximation problem reads n
I
(PA)
Minimize
yn+l
subject to
yrvr(t) - f(t)j < yn+l,
I
t E T.
r=1
The corresponding dual becomes q (DA)
Maximize
f(ti)xi
subject to
i=1 q
vr(ti)xi = 0,
r = 1,..-,n,
i=1 qq
< 1, i=1
{tl,...,tq} c T. (6)
Let the functions
Theorem.
v1,...' n
be linearly independent.
Then both the problems (PA) and (DA) are solvable and have a joint optiThere are always optimal solutions for (DA) such that
mal value.
xl # 0,...,xq # 0
with
1 < q
The proof is obtained by combining (15) of §6 with (12) of
§10 and (12) of §11.
The next theorem corresponds to (1) of §16 and its converse (see (8)
of §16). (7)
Theorem.
independent.
Then
Let the
n+l
functions
y1, ...,yn' yn+l
with
vl,...,vn, f Yn+i > 0
and
be linearly {tl,...,tq,
146
A GENERAL THREE-PHASE ALGORITHM
VII.
xi,...xq)
with
ti E T, xi # 0, i = 1,...,q (1 < q < n+l)
are optimal
for (PA) and (DA) if and only if the following relations hold: n yrvr(ti) = Yn+1 sgn xi,
f(ti) -
i = 1,...,q,
(8)
r = 1,...,n,
(9)
r=1 q
vr(ti)xi = 0, i=1
q lxii = 1.
I
(10)
i=1
n
The "error function"
I
yrvr - f
r=1
assumes its maximum or minimum value on
T
at each point
ti,...It
4.
Consider the linear optimization problem equivalent to (PA).
Proof:
Then the theorem is a direct consequence of (1) and (8) of §16.
It is
even easy to show that one direction of the statement follows from Lemma We verify here how (8) is derived from the optimality of
(29) of §6.
(t1,...ItqXll...,xq).
and
y1,. ..,yn
If
xi > 0
proof of Lemma (15) of §6, xi = xi, t+ = ti.
we write, as in the
The corresponding complemen-
tary slackness condition is
n
yv(ti) + yn+ r r
xi(
\\ i - f(ti)) = 0.
r=1
If
we put
xi < 0
xi = -xi, ti = ti
and get the complementary slackness
condition
n
\\
xi(- E yrvr(ti) + yn+i + f(ti)J = 0. r=1
Thus if
xi + 0
one of the following two equations is satisfied:
n
f(ti) -
E
Yrvr(ti) = Yn+l,
if
xi > 0;
yrvr(ti) _ -yn+l'
if
xi < 0.
r=1 n E
r=1
This is equivalent to (8).
We point out that the inequality
is a consequence of the linear independence of
Y.+1 > 0
v1,...,vn,f.
The numerical treatment of the dual pair (PA) - (DA) is analogous to that of (P)
-
(D).
Thus a three-phase computational scheme is used.
The
problem is discretized and an initial approximate solution is constructed,
17.
147
A General Computational Scheme
giving
In particular, one observes the sign of an
q.
the solution of the discretized problem. the equations (8) and (11). xl > 0
and
xi
resulting from
This is taken into account in
As an example consider the case
q = 2,
Then (8) and (10) give the equations
x2 < 0. n
f(tl) -
Yrvr(t1) = Yn+l'
E
r=1
n Yrvr(t2) = -Yn+l'
f(t2) r=I
xl - x2 = 1.
The conditions (11) give rise to equations in the same way as (5) of §16. Thus one determines from the results of the discretized problem whether a mass-point
in this case an extremal point of the error function) lies
t1
in the interior or at the boundary of
T.
Accordingly, one appends condi-
tions that partial derivatives must vanish at (12)
Example.
ti.
The following problem is treated in Andreasson and
The function
Watson (1976).
f(s,t) = exp(-s2 - t2)
is to be approximated in the uniform norm on the square
0 < s,t < 1
by
a linear combination of the functions vI(s,t) = 1,
v2(s,t) = s,
v3(s,t) = t,
v4(s,t) = 2s2-1
v5(s,t) = at, v6(s,t) = 2t2-1.
At first the problem is discretized; i.e. it is approximated by the task 6
min
Determine
max If(si,tk) I yrvr(si,tk)I r=1 i,k=1,...,5
(13)
where
4 i-1
t
k-1
k 4 =
k = 1,...,5.
,
The problem (13) is then reformulated as a linear program, as described in §6.
Then we get the task
Minimize
y7
subject to the linear constraints
148
A GENERAL THREE-PHASE ALGORITHM
VII.
6
vr(si,tk)yr + Y7 > f(si,tk)
(ilk = 1,...,5),
(14)
(ilk = 1,...,5).
(15)
r=1 6
- E vr(silt k)Yr + Y7 > -f(si'tk) r=1
This is a linear program with 7 variables and 50 constraints.
It was *
solved with the simplex algorithm which is described in §12-14
.
The
following solution emerged: 1.0358267
y1 =
Y2 = -0.38764207
y3 = -0.95174831 Y4 = -0.12398722
(16)
0.43169480
y5 = Y6 =
0.13390288
y7 =
0.025910991.
The optimal solution is displayed in Fig. 17.3.
Those vectors which appear
in the optimal basis matrix are marked with a ® or a 0 indicating the coordinates
si,tk
of the corresponding mass-point.
basis vector gives equality in (14).
Here, ® means that the
Thus the error function
n f
yrvr
assumes a maximum value there with respect to the point set 1,...,5}.
{(si,tk)/i,k =
A 9 sign means that the basis vector gives equality in (15) and
T 4
5
8
0
2
3
e
C0
7
1
1
e
1
j
6
Fig. 17.3
We thank Mr. Gerd Schuhfuss for solving this program, using a deskcalculator, an "HP 9825A".
17.
A General Computational Scheme
149
hence the error function assumes a minimum value with respect to the same The masses
set.
x.
xl =
0.28571429
x2 =
0.14285714
of the points indicated in Fig. 17.3 are as follows:
x3 = -0.19047619 x4 =
(17)
0.071428571
x5 = -0.095238095
x6 = -0.19047619 x7 = -0.023809524.
The number
q
of mass-points which should result after Phase ii) of the
three-phase algorithm would in this case depend critically on the choice of the tolerance
e.
In particular, should the three points in the lower
left part of the figure be combined into one or more than one points? To settle this question one could of course repeat Phase i); i.e. linear programming with a finer grid.
We take an alternative route.
We consider
again the error function: n`
g = f -
E
Yrvr
r=1
with
yl,...,y6
given by (16).
Thus
g
satisfies
We now determine the extremal points of
grid.
g
1g(s)1 < y7
on the
on the unit square
by means of a Newton-Raphson scheme and using appropriate grid-points as starting approximations.
Six local extrema were found.
are listed in the table below and marked in Fig. 17.4.
Their positions There, 9 means a
local minimum and ® a local maximum.
Coordinates of local extrema 0.288
0.000 4
1
0.000
1.000
1.000 2
0.725 5
0.590
(18)
1.000
0.000
0.867 6
3
0.118
0.000
We now proceed to the construction of the nonlinear system based on the assumption that the error curve corresponding to the solution of the continuous problem has its extremal points distributed in the same way as the function
g.
(If this assumption should turn out to be wrong, then Phase
150
VII.
A GENERAL THREE-PHASE ALGORITHM
Fig. 17.4
iii) will fail and we must return to Phase i) which will be repeated with a refined grid.)
Thus we put
(silt 1).... ,(s6,t6)
the square.
and assume that all extremal points
q = 6
(numbering as in Fig. 17.4) lie on the boundary of
In particular we have
s4 = 0, t4 = 1.
The following 18 un-
knowns remain to be determined: yl,...,y6,y7, x1,...,x6, s1,t2,t3,s5,s6. From (8),
(9), and (10) we get 13 equations.
(19)
We note that the sign of
the masses should be chosen as follows (see Fig. 17.4):
sgn xl = sgn x2 = sgn x4 = 1 sgn x3 = sgn x5 = sgn x6 = -1.
Hence the 6 equations from (18) are completely determined in a simple form and (10) now becomes
xl+x2 -x3+x4 -x5 -
x6 = 1.
The "missing" 5 equations are now generated from (11) since certain partial derivatives must vanish, e.g. 6
f
a
s 8
at
jf l
yr vrI(s1,0) = 0 r=1 6
(
jf
yrvr}(1,t2) =
0.
r
Thus we have got 18 equations for the determination of the 18 unknowns of (19).
An approximate solution was found as follows:
s1,t2,t3,s5,s6
17.
A General Computational Scheme
were taken from (18) x3
yl,...,y7
151
from (16) and
xl,x2,x4,x5,x6
from (17),
was calculated by combining the two masses numbered 3 and 7 of the
solution of the discretized problem. giving
Thus these two masses were added,
x3 = -0.21428571.
After four iterations the Newton-Raphson method delivered the resuits
0.98576860
yl =
y4 = -0.14461987
y2 = -0.34796776
y5 =
0.42457304
y3 = -0.90271418
y6 =
0.11293036
and the maximal deviation was y7 = 0.027274796.
This agrees with the results reported by Andreasson and Watson.
The solu-
tion of the dual problem is given in the table below: Mass-point numbers
Coordinates
0 . 27210827 1
0.25885317
0.00000000 00000000 0.62068986 1
2
.
0.15098041
0 00000000 .
3
-0.20873218
0.21767815 0 0 .
4
0.090166417
1.0
5
0.67690452 1.00000000
-0.13844199
83562113 0.00000000
-0.15282583
0
6
Remark.
(20)
Masses xi
s.,t.
.
The procedure described above is applicable to many
variants of the uniform approximation problem, e.g. when
y
must satisfy
finitely or infinitely many linear constraints besides those specified in (PA).
One example of such problems is one-sided approximation in the
uniform norm.
Further examples are to be found in Chapter IX.
We note that in many approximation problems q = n+l holds in (8),
case when
(9), and (10).
vl,...,vn
be treated in Chapter VIII.) tl,. .. ,tn+l
(An important class of such problems is the
form a Chebyshev system on If
q = n+l, then
S.
These problems will
yl,...,yn+l
and
may be determined from (8) and (11) without also calculating
152
VII.
xi,...,xn+i
A GENERAL THREE-PHASE ALGORITHM
from (9) and (10).
The mathematical properties of the system arising from (8)
-
(11)
have been investigated in Hettich (1976), where nonlinear approximation is treated as well.
Chapter VIII
Approximation Problems by Chebyshev Systems
This chapter will be devoted to the study of the problem pairs (P) (D) and (PA) - (DA) in a special but important case, namely when the moment generating functions
al,...,an
form a so-called Chebyshev system.
The most well-known instance of such a system is
ar(s) =
r = 1,...,n, on a closed and bounded real interval.
sr-1.
In all the linear
optimization problems to be treated in this chapter, the structure of the nonlinear system can be determined from the outset, which simplifies the numerical treatment considerably in comparison to a direct application of the three-phase algorithm.
In the first section we shall present some general properties of Chebyshev systems.
The reader will recognize many results from the theory
of polynomials in one variable.
The next section will be devoted to Prob-
lem (P) and the connection between one-sided approximation and certain generalized quadrature rules of the Gaussian type.
In the last section
we shall treat numerical calculation of the best approximations in the uniform norm.
§18.
GENERAL PROPERTIES OF CHEBYSHEV SYSTEMS (1)
Let the functions
bounded real interval tem over
[a,s].
ul,...,u ul,...,un
[a,s], if the determinant
153
be continuous on the closed and will be called a chebyshev sys-
154
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
u1(t1)
ul(tn)
....
U(tl,,tn) =
(2)
....
un(tI)
un(tn)
satisfies the relation U(tl,...,tn) > 0 Remark.
if
a < tl < t2 < ... < to < s.
The monominals
(3)
ur(t) = tr-l, r = 1,...,n
system over any real interval.
form a Chebyshev
From a numerical point of
See (3) of 57.
view it is often more advantageous to work with orthogonal polynomials instead of monomials.
We note that if
then we can determine constants
r-1
is a polynomial of degree
ar(t)
d r,
d
r
= +1
ur = drar, r = 1,...,n, is a Chebyshev system. ur = Tr-1
or
d
r
= -1, such that
The particular case
(see (20) of 57) occurs often in computational practice.
also give the following example: 0 < al < a2 < ... < an.
Put
Let
be real numbers such that
X.
ur(t) = eArt.
shev system over any real interval.
We
Then
ul,...,un
is a Cheby-
The reader is referred to Karlin and
Studden (1966) for further examples.
We will now show that many interesting results may be derived from the definiing relation (3). (4)
Theorem.
Let
ul,...,un
closed and bounded interval
ti, i = 1,...,n
let
y E Rn
[a,p].
form a Chebyshev system over the Let
w E Rn
be distinct points.
be a fixed vector and
Then there is a unique vector
satisfying
n
rr i
y u (t
r=1
Proof:
of unknowns.
)
= wi,
i = 1,...,n.
(5)
(5) is a linear system with
n
equations and the same number
We may assume that the equations are reordered such that
tl < t2 < ...< tn.
By (3) we conclude that the system (5) has a nonzero
determinant and hence a unique solution
y.
We have immediately (6)
Q
Corollary.
Let
ul,...,un
be as in (4) and define the function
by
n
nC
r=1
yrur
(7)
18.
General Properties of Chebyshev Sys=ers
where Q
yl,...,yn
are real numbers.
Then
155
has less than
Q
n
zeros if
is not identically zero.
Assume that
Proof:
Putting
tl,...,tn.
Q
is not identically zero but vanishes at in (5) we get the unique solu-
wi = 0, i = 1,...,n
yr = 0, r = 1,...,n, establishing the contradiction sought.
tion
If we put
ur(t) = tr-1, r = 1,...,n, then
of degree less than
Q
becomes a polynomial
Thus (6) is a generalization of the familiar
n.
statement that a polynomial of degree less
than
n
also has less than
n
In the same way, (4) generalizes the theorem that there is a unique
zeros.
polynomial of degree less than
n
which interpolates
n
given points.
In order to discuss the problems (P) - (D) we also need to intro-
(8)
duce zeros of multiplicity 2.
We shall see that some well-known results
on polynomials can easily be extended to linear combinations of functions which form a Chebyshev system.
For this purpose we introduce the deter-
minants jl .... in (9)
t1 .... tn where the symbols are defined by rules i), ii) and iii) below and the value is evaluated according to rules iv) and v).
a
i)
are integers and we have always
ii)
iii)
= 1; ji = 2
l
If
iv)
ji = 1
is possible only if
jl = j2 = ... = in = 1
jl'...,jn
or
ji = 2;
ji-1 = 1.
then
_ U(tl,...,tn)
tltn If there is a
v)
ji = 2
then we proceed as follows.
First we as-
sign to the determinant (9) the value given in Rule iv) above. Next we change all columns ments
u
r
(10)
[t
t
i-11 i
]
for
with
ji = 2
r = 1,...,n.
Example.
2
(t
i
t
j
t 2
3
so that the ele-
are replaced by the divided difference
ur(ti)
=
ul(tl)
ul(t2)
ul[t2,t3)
u2(tl)
u2(t2) u3(t2)
u2[t2,t3]
u3(tl)
u3[t2,t3]
156
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
We note that the determinants (2) and (9) generally have different numerical values (for the same points
t1,...,tn) if there is a
ji = 2.
But
(2) is positive if and only if (9) is positive. (9) may also be defined if two arguments coincide. ur
If the functions
are differentiable (which we assume from now on) then we define ur[ti,t.] = lim ur[t,ti] = u=(ti). tot.
1
We next introduce the determinant (11)
whose value is given by the rules a) - d) below.
a)
a < t1 < t2 < ... < to <0;
b)
if
ti = tthen ti = a
and
or
if all
d)
if two arguments
ti > ti-1
ti+2'
are distinct, then
c)
ti
ti+l<
or
ti+1 = B
U'(tl,...,tn) = U(tl,...,tn);
coincide then we put
ti
i1,...,jn
U'(t1,...,tn) = U tl ..* tn where 2
if
ti = ti-1
1
if
ti > ti-1'
li
(12)
Example.
U (tl,t2,t2) = U
(13)
1
1
2
tl
t2
t2
u1,...,un
The functions
system of order two over entiable on
[a,S]
[a,8]
if
u1(t1 )
u1(t2)
u2(t1)
u2(t2) u'(t2)
u3(t1)
u3(t2)
ul(t2)
u3(t2)
are said to form an extended Chebyshev u1,...,un
and all determinants
are continuously differ-
U'(t1,...,t
> 0
for
a
A function
f
which is continuously differentiable on
[a,g]
is said to have a zero of multiplicity 2 (also called a double zero) at t E [a,s], if
f(t) = 0
and
fl(t) = 0.
We can now extend the Corollary
(6) and state: (15)
Theorem.
order two over
Let
[a,s].
u1,...,un
form an extended Chebyshev system of
Let the linear combination
Q be given by (7).
157
General Properties of Chebyshev Sys`e-.s
18.
Assume also that
Q
is not identically zero.
Then
has less than
Q
n
[a,R], counted with multiplicity.
zeros in
We note that if
The proof is analogous to that of (6).
Proof:
has a double zero at
t, then the coefficient vector
y
of
Q
Q
must
satisfy the two equations
Q' (t) = 0.
QCt) = 0,
Thus if we assume that
Q
has
zeros counted with multiplicity we get
n
a linear system of equations whose right-hand side is zero and whose coefficient matrix has a determinant of the type (11).
Hence the conclusion
follows.
The interpolation statement (5) may be extended to the con-
Remark.
fluent case, i.e. when pairs of the points
ti
appearing in (15) are
allowed to coincide.
One could also introduce extended Chebyshev systems
of order higher than
2
and establish the corresponding results on inter-
polation and maximum number of zeros. Some results which will be needed in the sequel are given in the exercises below.
ul,...,un+1
Let
Exercise.
(16)
as well as
Chebyshev systems of the second order over
[a,R].
u1,...,un
be extended
Show that (17)
U'(tl,...,tn,t) = cn(un+1(t) - Q(t)) where
ul,...,un
un+1
is a linear combination of
has
n
Exercise (16).
Hint:
yrur(t)
r=1 n+l
ur(t) = tr-l
and require
ul,...,un+1
u(n) (t) > 01 n+l
satisfy the assumptions of
Use Rolle's theorem to show that no function of
n I
by the last column.
continuous derivatives and also satisfies
the form
un+1(t) -
U'(t1,...,tn,t)
Show that if we take
Exercise.
t E [a, s], then the functions
can have
Q
Q'(ti) = un+1(ti)
Expand the determinant (18)
that
and
i = 1,...,n;
ti = ti+l, then
if
ii)
t
such that
Q(ti) = un+I(ti),
i)
Hint:
is independent of
cn > 0
zeros in
[a,8].
158
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Exercise.
(19)
R(t) = un+1 -
Use the notation and assumptions of (16).
Let
Q(t).
Show the following results:
R(t) > 0, t E [a,$], in the two cases i) and ii) below: has
R
i)
And
double zeros in
n/2
R(a) = 0
ii)
and
R
even);
(n
double zeros in
(a,B)
odd).
(n
R(t) < 0, t E [a,B], in the two cases i) and ii) below: R(a) = R(8) = 0, R
i)
(a,B)
(n-l)/2
has
has
double zeros in
(n/2)-1
(a,B)
(n
even);
Hint:
and
R(B) = 0
ii)
R
has
double zeros in
(n-l)/2
(a,B)
odd).
(n
does not change sign at a double zero.
R
Exercise.
(20)
order two over
Let
[a,B].
be an extended Chebyshev system of
ul,...,un+I
Show that there is a linear combination
n C
n+1
(21)
rIl Yrur
which is strictly positive on (a,B)
in
t E [a,B], or
Hint:
-Q(t) > 0, t E (a,0].
Put
Q
If
and
a
B
has only double zeros then either
2Q = Q1 + Q.
where
are nonnegative linear combinations of the type of (21).
Q2
n
(a,B].
and possibly simple zeros at
odd and
§19.
n
Q(t) > 0, Q1
even should be discussed separately.
ONE-SIDED APPROXIMATION AND GENERALIZED QUADRATURE RULES OF THE GAUSSIAN TYPE In this section we shall study Problem (P) - (D), which
§3 and §4.
Here we study a special but important case.
is the closed and bounded interval Chebyshev system over ferentiable on
[a,B].
Instead of
S.
times we shall assume that the
[a,8]
and
instead of
b.
we defined in
The index set
a1....,an
ar
we shall write
n+l
functions S.
ur, as in §18.
ul,...,un, b
Then we shall write
Besides the dual pair (P) - (D) we shall study the two
n
n I
r=l
and
Some-
also form
problems
Maximize
cryr
S
form an extended
will be assumed to be continuously dif-
b
an extended Chebyshev set of order two over
(P2)
and
The cases
subject to
I
r=l
ur(s)yr < b(s),
s E S,
un+1
19.
One-Sided Approximation
159
q Minimize
(D2)
q subject to
xib(si)
xiu(si) = c,
i=1
i=1
Exercise.
(1)
i = 1,...,q.
> 0,
x Show that (P2)
- (D2) are a dual pair of linear
optimization problems. Let
Lemma.
(2)
ul,...,un
closed and bounded interval
form an extended Chebyshev set over the and let
S
b
be continuous there.
Then
(P) and (P2) meet the Slater condition.
Using the result of (20) of §18 we establish that there is a
Proof:
vector
such that
z E Rn n
Q(s) =
zrur(s) > 0,
I
s E S.
r=l
Put
d = min Q(s) sES
Next set A = d-1(1 + max b(s)) sES
and define the vector
n
r=1
yrur(s) = A
y = Az.
Then we get
n
r=1
zrur(s) > Ad > max b(s).
-
sES
Thus (P) meets the Slater condition.
The proof of the analogous statement
for (P2) is carried out in a similar way. (3)
Hence we may use (12) of §10 to conclude that if (D) is feasible
and the assumptions of Lemma (2) are met, then strong duality holds for (P)
- (D) and (P2) - (D2) respectively.
in the interior of the moment cone solution.
If we also require that
c
is
M , we can show that (P) has a unique n
(For a definition of this cone see §8.)
For this purpose we
shall give a simple characterization of interior and boundary points of the moment cone.
Our argument parallels that of Karlin-Studden (1966),
pp. 42-43. (4)
Let
T = (t1,...,tR}
we shall denote by the index of
be a subset of the interval T, ind (T), the integer
t
{sign(ti-a) + sign(B-ti)}.
ind (T) i=1
[a,B].
Then
160
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Thus
ind (T)
R
= un+1 - Q
of (19) of §18.
Z = {zl,...,zq}
Let
denote its zeros.
in such a manner that ind (Z) = n
Q
ect
Consider the function
Example.
(5)
ind (Z) =n and un+1(s) = es.
They are denoted
will be double.
has at most
We want to construct
and indicate the zeros of
n = 4
and
Then we can sel-
R(s) < 0, s E (m,8], or
S = [0,1], ur(s) = sr 1, r = 1,...,n,
By (18) of §18, R
counted with multiplicity.
n = 3
and
R(s) > 0, s E [a,s].
We discuss the special instance and
21 - 2, 2t - 1, 2t.
must assume one of the three values
n R
zeros in
The zeros in
R.
zi, i = 1,...
.
Z = {z1,z2}
ind(Z) = 4
R(s) > 0,
s E [0,1];
n = 4
Z = {0,z3,1}
ind(Z) = 4
R(s) < 0,
s E [0,1];
n = 3
Z = {0,z4}
ind(Z) = 3
R(s) > 0,
s E [0,1];
n = 3
Z = {z5,1}
ind(Z) = 3
R(s) < 0,
s E [0,1].
Let
be a given vector.
c E Mn
(0,1)
Thus
n = 4
(6)
[0,1],
explicitly for
If
L
cr =
xiur(s.),
r = 1,...,n,
xi > 0,
i = 1,...,q,
(7)
i=1
then we say that
has a representation involving the points
c
We define now the index of c as
ind(T)
where
T
SI,...,sq.
is the subset with
smallest index satisfying (7). (8)
1/3,1/4)
S = (0,1], ur(s) = sr-1, r
Example.
c = (1,1/2,
has the two representations
c = 6 u(0) +
u(1/2) + 3
and
u(1)
(9)
6
+1 c =2 1 u(1 1 + 12 u(1 2
(10)
AT
The index of the subsets appearing in (9) and (10) is 4. (11)
Hint: or
Show that for
Exercise.
c
from (8) we do have
ind (c) = 4.
all subsets with index 3 must be of one of the two forms
(t,l), where (12)
Lemma.
order two over
{0,t}
t E (0,1).
S.
Let
ul,...,un
A point
c E M
be an extended Chebyshev system of
is a boundary point of Mn
if and
19.
one-Sided Approximation
only if
ind(c) < n.
161
Every boundary point admits a unique representation
(7).
Proof:
c0 E Bd (Mn).
Let
Then there is a supporting hyperplane to
Mn at, atcpassing through
0
may find real constants
such that
Br
since
is a
Mn
onvex cone.
Thus we
n
r=1
Br>0
satisfying 0
nC
nC
Brcr > 0,
Brcr = 0 ,
r=1
(13)
c E Mn.
r 11
Now put
nn
Q =
E
r1
Brur
By (13) we get
Q(t) > 0, Since
t E [a, B). must have a representation (7).
c0 E Mn, c0
cror = r=1
Thus
xi
i=1
xiQ(ti) = 0.
Brur(si) _
I
r=1
i=1
Q(ti) = 0, i = 1....,
.
Theorem (15) of §18 can be reformulated to
the statement that the set of zeros of ind(c0) < n.
The numbers
k < n, since if t . . . . . . . . tn
k < n
Q
has an index
< n.
Hence
in (7) are uniquely determined as long as
xi
we add
tR+1", ..tn
are selected such that
to the sum in (7), where
tl,...,tn
are distinct.
We put
We next consider (7) as a linear system of equa-
... = xn = 0.
xR+1 tions with
We get
xi,...,xn
as unknowns.
It has a unique solution since its
determinant is positive. c E Mn
Assume conversely that a vector
with index
ents define a supporting hyperplane at boundary point of (14)
order
2
Theorem.
over
has a representation (7)
We construct a nonnegative function
< n.
Mn.
Let
c.
Q whose coeffici-
By (2) of §11, c
u1,...,un
S, and let
b
be an extended Chebyshev system of
have a continuous derivative on
(D) is feasible then (D) and (D2) have optimal solutions.
the interior of the moment cone tions.
must be a
This concludes the proof.
Mn
If
S.
c
If is in
then (P) and (P2) have unique solu-
162
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
Proof:
Since the Slater condition is met in P and P2, Problems (D)
and (D2) have optimal solutions.
Let
Then there are nonnegative reals
xi
q
y
be the optimal value of (D).
and elements
ti E S
such that
x.u i r (t.)i = c r', r = 1, ..,n,
i=1
X
> 0,
i = 1,...,q.
0
If
c E M. then (P) has an optimal solution
by Theorem (7) of §11.
y
Put
n Q =
I
Yrur
r=1
Due to complementary slackness we have Q(ti) = b(ti),
i = 1,...,q.
(15)
Q(s) > b(s), s E S, we must also require
Since
Q'(ti) = b'(ti)
a < ti < a.
if
(16)
Combining (15) and (16) we get a linear system of equations with knowns and a number of equations amounting to we conclude from Lemma (12) that
c E Mn
ind (c) > n.
(16) uniquely determine the optimal solution (17)
Example.
If
c
ind (tl,...,tq).
where
b
yl
is at the boundary of
subject to
un-
Thus (15) and
y.
have any solution or there might be many solutions. Minimize
n
Since
Mn, then (P) may not Consider the problem
yl + y2s + y3s2 > b(s),
is a function continuously differentiable on
s E [-1,1] [-1,1].
The dual of this problem reads q
Maximize
xib(ti)
subject to
i=1
q
(18)
xi = 1, i=1 qq
xiti
(19)
0,
i=1
q
xit2 = 0,
xi > 0, i = 1,...,q,
i=1
(20)
-1 < ti < 1,
i = 1,...,q.
19.
One-Sided Approximation
163
Combining (18) and (20) we find that we must take Thus
ind (1,0,0)
= 2
Let now
for this problem.
Q(s) = yl + y2s + y3s2.
put
q = 1, t
= 0, x
3 1
T
y
y E R
= 1. 1
be given and
is optimal if and only if
Q(0) = b(0),
Q(s) > b(s),
s E [-1,1] .
Thus we must also have Hence a solution
Q'(0) = b'(0).
must satisfy
y
y2 = b'(0)
yl = b(0)
and (21)
-1 < s < 1.
y3s2 > b(s) - b(0) - sb'(O), y3
is generally not determined uniquely by (21).
we get the condition In the case
For
f(s) = exp(s)
y3 > e-2 z 0.718. Is13/2,
f(s) _
(21) gives the relation
-1 < s < 1
y3s2 > Is13/2,
which cannot be satisfied for any
Thus (P) has no solution in this
y.
case.
The conditions of Theorem (14) do not, however, guarantee the uniqueness of solutions to (D). (22)
(P)
This is illustrated by
Example.
Minimize
yl + 2 y2
subject to
yl + y2s > 1 + s cos 6ws,
0 < s < 1,
The dual of this problem reads q (D)
Maximize
xi (1 + si cos 6nsi)
subject to
i=1 qq
i=1
xi = 1,
qC
xisi = 1/2, i=1
xi > 0,
We can take
q = 1,Txl = 1,
problem, i.e. (1,2)
sl = 1/2.
Taking
E M2.
Slater condition is met.
i = 1,...,q.
si E [0,1],
Thus
ind (1,1 )T = 2
yl = 3, y2 = 0
By (14), (P) has a unique solution.
that
1 + s cos 61rs < 1 + s,
in this
we find that the We note
164
VIII.
with equality at
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
Hence an optimal solution to (D)
s = 0, 1/3, 2/3, 1.
is defined by the conditions 4
xi = 1,
i=1 4
xi31
xi > 0,
= z,
1 = 1,...,4.
i=1
These conditions do not determine Theorem.
(23)
Let
xi,...,x4
as well as
ul,...,un
Chebyshev systems of order two over have unique solutions.
determined if
S.
If
ul,...,un+I
(P) and (P2) have solutions which are uniquely
c E M . n
We now treat the case
are a direct consequence of Theorem (14).
c E bd Mn
be extended
c E Mn* then (D) and (D2)
The statements about the solutions of (P) and (P2) for
Proof:
c E An
uniquely.
and study the solutions of (D).
Let
c
have the representa-
tion q
cr =
xiur(ti),
r = 1,...,n.
(24)
i=1
If
c E bd Mn
then
and (24) is uniquely determined by
ind (c) < n
{tl,...Itq}
Then there is only one subset
such that the constraints of
(D) are met, so (D) has trivially a unique optimal solution.
show that (P) has a solution
Points
y.
c.
We next
are selected in
tq+l,...,tk
such a manner that ind {t1,...,t1} = n
and this set contains the endpoint
$.
Next, y
is determined from the
equations
where
T y u(ti) = un+1(ti),
i = 1...... ,
yTu,(ti) = un+I(ti),
ti + (a,
u(ti) _ (uI(ti),...,un(ti))T.
As shown in (19) of §18, y
meets the constraints of (P).
struction of a solution to (P2) proceeds in a similar manner. show that (D) has a unique solution if value of (D), A
c E Mn.
the optimal value of (D2).
is closed, the optimal values are attained.
Let
Then Also,
A
A < X.
The con-
We need to
be the optimal Since
Mn+1
One-Sided Approximation
19.
165
(ell ...,cnX) E bd Mn+1'
Hence it has a unique representation given by
4
4CC
xiu(ti) = c,
iLl
and we have
(25)
xiun+1(ti)
iLl
ind (tl,...,t-) < n. 4
(D2) is treated in the same way.
Thus
we have concluded the proof. If
Remark.
(26)
c E M
n
then
ind (c) > n.
Combining this know-
ledge with (25) we get ,t-} = n.
ind {tl,
4
If we discuss (D2) in the same way we shall find a representation
q qC
xiu(ti) = c,
iLl
iLl
xiun+1(ti) = A,
(27)
where ind (tl,...,t I = n. 4
Since (P) and (P 2) have unique solutions we must have
(27) defines two different representations of then (P) has a unique optimal solution
c E Mn
c.
y.
A < A.
Thus (25),
We note also that if Put
n
Q =
yrur
I
r=1
Then we must have Q(ti) = un+l(ti),
Q(t) > un+l(t). Therefore the right endpoint
S
must be in the subset
}.
(See
4
(19) of §18.)
Arguing in the same way we find that the set
{tl,...Itq}
is also uniquely determined and does not contain the endpoint
S.
Thus
e
if
c E Mn
then
subsets of index (28)
U1,...,un let
w
has two different representations associated with
n.
Generalized quadrature rules of the Gaussian type.
form an extended Chebyshev system of order two over
Let again
f
which are continuously differentiable over
(a,s]
and
[a,8]
be a continuous nonnegative function over the same interval.
functions fine
c
For
we de-
166
VIII.
(B
I(f) = J
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
f(s)w(s)ds.
a We want to construct mechanical quadrature rules of the form
q
1(f) Z
xif(si)
(29)
i=1
a < sl < s2 < ... < sq < B.
where
We want (29) to give exact results for
f = ur, r = 1,...,n.
Putting
(a
Cr = I(ur) = J
r = 1,...,n,
ur(s)w(s)ds,
(30)
a
we find that the weights
and the abscissas
xi
s.
must meet the condi-
tion
q r = 1,...,n.
xiur(si) = Cr'
(31)
i=1
q = n
If we put
in (31) and select
(31) as a linear system with
Si
x1,...,xn
arbitrarily we may consider as unknowns.
u1....,u
Since
form a Chebyshev system, the determinant of this system is positive and hence a unique solution exists.
We now show that there are exactly two rules (31) such that i = 1,...,q, and
ind (sl,... s
)
= n.
xi > 0,
These rules are called generalized
q
To establish this we need only show that
rules of the Gaussian type.
For
(cl,...,cn)T E Mn, since then we can apply the argument of (26).
we define for
N = 2,3,...
ur(a),
r = 1,...,n
the functions
urN
according to
s=a
urN (s) _ u{(N-1Na+1B) r
i-1
a +
(B-a) < s < a + 1(N a)
.
We find that lim urN(s) = ur(s),
N-
r = 1,...,n, (32)
rB
lim J
N- a
rN(s)w(s)ds = cr,
Put N =
cr
(B
a We find that
urN(s)w(s)ds.
r = 1,...,n.
19.
One-Sided Approximation
N
N
riur {(N-1Na+1B} S
i=1
r
167
where
is the integral of
Ci
over the interval
a + i(B-a)/N].
[a + (i-1)(B-a)/N, Thus
w
cN = (cN'...,cN)T E Mn , N = 1.....
M
Since
.
2
1
n
is closed, c E Mn
due to (32).
One-sided approximation.
(33)
Let
be as in (28).
ul,...,un[a,B]
We discuss now the problem of approximating the continuously differentiable function
f
from above by the linear combination
Q = yTu in such a manner that
fBIQ(s)
a
- f(s) Iw(s)ds
is minimized when continuous on
(34)
Q(s) > f(s), s E S.
[a, B].
Here
w
is a fixed function,
Q(s) > f(s), JQ(s) - f(s)I = Q(s) - f(s)
Since
and (34) becomes
IQ(s) - f(s) Iw(s)ds = cy T -
rB J
f(s)w(s)ds,
(35)
Ja
where
c
is given by (30).
of (35) is independent of
Since the integral on the right hand side y, our goal is to render the scalar product a
minimum subject to the constraint instance of (P).
We note that
Q(s) > f(s), s E S.
We recognize an
q, {sl,. .,sq}, x11 ...,xq
for the dual problem (D) if and only if
s1,...,s
q
and
is feasible
x1
..
xq
are
the abcissas and weights of a quadrature rule (with nonnegative weights) which is exact for
u1,...,un.
By complementary slackness the optimal
Q
must satisfy the equations Q(si) = f(si),
i = 1,...,q,
(36)
(si-a)(B-si)Q'(si) = f'(si) = 0, If the n+l functions
u1....,un,f
i = 1,...,q.
(37)
form an extended Chebyshev system of
order two then the optimal solutions of (D) and (D2) define generalized rules of the Gaussian type.
See (26).
168
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
We want to find the best polynomial approximations
Example.
(38)
from above and below to the function treat the cases For
n = 4
n = 3,4
and
on
et
for
w(t) = 1.
a = 0, a = 1.
For
See also (5).
rules have the abscissas
0, 2/3
respectively.
1/3, 1
Calculation of generalized quadrature rules of the Gaussian
b = un+1, where
and
u......un+l
gorithm is simplified considerably since It is also known that When
tq = a
and
[a,s].
The three-phase al-
is known from the outset.
q
must occur in the representation sought
is even we also have
n
ar = ur
are required to form ex-
ul,...,un
tended Chebyshev systems of order two over
c.
from
the two generalized Gaussian
n = 3 and
et
q = 3, sl = 0, s2 = 1/2, s3 = 1,
Such rules can be determined by solving (P), (D) for
type.
We
c = (1,1/2,1/3,1/4)T
Thus the best approximation to
above is found by solving (36), (37) with
for
Thus
there are two (generalized) rules of Gaussian type which can
be found from (9) and (10).
(39)
[0,1]
ur(s) = sr-l.
Thus the structure of
t1 = a.
the nonlinear system treated in Phase 3 is known from the outset and we know for certain whether a "correct" system has been constructed after We observe that
carrying out Phases 1 and 2.
s1,...,sq
and
x1,...,xq
can be found from the nonlinear system (4) of §16 which in this case has n
equations and
n
If one wants to solve (P) instead, y
unknowns.
can afterwards be found from the linear system resulting from combining For the important case
(3) and (5) of §16.
ur(s) = sr-1
special al-
gorithms have been developed.
§20.
COMPUTING THE BEST APPROXIMATION IN THE UNIFORM NORM In this section we shall treat the numerical solution of the dual
pair (PA) - (DA) when
the same interval.
form an extended Chebyshev system of
v1....,vn
order two over an interval
[a,8]
Instead of
and
f
is twice differentiable over
we shall write
yr
ur, r = 1,...,n.
We
write (PA) and (DA) as follows (see §6): n (PA)
Minimize
yn+1
subject to
I
I
yrur(t) - f(t)l < yn+1,
r=1 q (DA)
Maximize
xif(ti)
subject to
i=1 q
xiur(ti) = 0, i=1
q I
i=l
1xii = 1.
r = 1,...,n,
t E [a,$];
20.
Computing the Best Approximation in the Uniform Norm
In §7 we treated polynomial approximation; i.e. the case
169
ur(t) =
tr-1
We shall now show that many of the results obtained there may be easily extended to case of a general extended Chebyshev system of order two. a < tl < t2 < .,. < to+l < 5
Let
Lemma.
(1)
and let
be fixed real numbers
be a nontrivial solution of the homogeneous system
x1,...,xn+l
of equations n+l
ur(ti)xi = 0,
r = 1,.... n.
(2)
i=1
Then i = 1,...,n.
x .x 1.+1 < 0, 1
Proof:
Let
i
be a fixed integer such that
1 < i < n.
Let
nC
yrur,
P = r I=l
the linear combination which is uniquely determined by the conditions
P(t) _
1,
J
0,
j = i, j 1,...,n+l,
j + i,
(3)
j# i+l.
The determinant of the system of equations (3) is positive by the definition of Chebyshev systems.
The rest of the argument parallels the proof
of Lemma (1) of §7. (4)
Theorem.
Let
f
be continuous on
vl,...,vn
system on the same interval and a linear combination
P
a Chebyshev
be given:
n P = r Il yrvr.
Let further
a < tl < t2 < ... < to+l < a
{f(ti) - P(ti)}
.
be
{f(ti+1)-P(ti+1)} < 0,
n+l
points such that
i = 1,...,n.
(5)
Then
min
i
I f (ti) - P (ti) I < On <
max
a
If(t) - P(t)j,
(6)
where n
= inf n yERn Proof:
replace
tr-l
max If(t) a
-
I
r=l
y r r I.
The proof closely follows that of Theorem (5) of §7 if we with
ur
there.
170
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
(7)
Corollary.
Let
n p =
Yrur
E
r=1
be a linear combination such that there are
n+l
points
a < t1 < t2 <
with the properties
< to+l < $
lail = If(ti) - P(ti)l =
max If(t) - P(t)l, a
i = 1,...,n+l,
and i = 1,...,n.
d 1.+1 < 0,
d 1
Then f
is the linear combination of
P
which best approximates
in the uniform norm. (8)
Determination of a linear combination satisfying (5).
a < t1 < t2 < ... < to+l < B of
u1,...,un
be given.
and a constant
ul,...,un
a
Again let
We seek a linear combination
i = 1,...,n+1.
P(ti) = f(ti) - a(-l)1,
P
such that (10)
Putting
n r=nC
P =
L1 Yrur,
we get the linear system of equations n
Yrur(ti) + E(-1)1 = f(ti = 1,...,n+l.
(12)
r=1
There are
unknowns, namely
n+l
y1,...,yn, and
e, and the same number
Expanding the determinant of coefficients by its last
of equations.
column and using the defining property (2) of §18 of Chebyshev systems, we ascertain that (12) has a unique solution. For a general Chebyshev system, (12) may be solved numerically as described in §14 but if If
ur(t) = tr-l
ur = Tr-1, r = 1,...,n a+b ti
2
+
b-a Z
the method of (14) of §7 is faster.
and
cos -11n
ir,
i = 1,...,n+1,
then the orthogonality relations (34) of §7 can be used to solve (12) efficiently. By Theorem (4), (PA).
manner.
lel
from (10) is a lower bound for the value of
We next describe how to improve upon this bound in a systematic
20.
Computing the Best Approximation in the Uniform Norm
Lemma.
(13)
and let
u1,...,un
Let
be given and let
to+l < S
form a Chebyshev system over
be continuous on the same interval.
f
171
y E Rn
and
Let
[a,s]
a < t1 < t2 < ... <
be the solution of (12).
e
Put
n
R(t) = f(t) - I yrur(t) r=1
Now let
be such that
a < T1 < T2 < ... < Tn+I < $ R(Ti+l) < 0,
R(Ti)
i = 1,...,n;
(14)
IR(T)I > IR(ti)l,
i = 1,...,n+1;
(15)
IR(T.)I > JR(t.)I
for at least one
1
1
We define
z E Rn
(16)
i.
through
and
n
zrur(Ti) + e(-1)1 = f(T.),
i = 1,...,n+1.
(17)
r=1
Then
1E1
>
Proof:
i.
I
We determine
xi,...,xn+1
1,...,F,n+1
and
as the unique
solutions of the equations n+l
n+l
xiur (ti) = 0,
r = 1,...,n,
i=1
(-l)lxi = 1,
(18)
(-1)'Ei = 1.
(19)
i=1
n+l
n+l
iur(Ti) = 0,
r = 1,...,n,
i=1
i=1
Since the matrix of coefficients of (18) is the transpose of that of (12) This is true of (19) by the
the former system has a unique solution.
From Lemma (1) we conclude that
same argument.
i
xixi+l < 0,
i
i
n+l
n+l
n zrur(ti) + E
E
r=1
i=1
Using (19) we arrive at n+l e _
f(Ti). i=1
In the same way we find that
n+l (-1)'
i=1
we get
.
f(Ti)
=
i=1
172
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
n+l xif(ti).
e = i=1
Applying (18) and the definition of n+l
n+l E _
we obtain
R
x R(t
1=1
i
i=1
i
i
R(Ti).
(21)
All terms in the two sums (21) have the same sign due to (14) and (20). Therefore the desired conclusion
By passing from the set
Remark.
follows from (16).
> lal
Ial
{t1,...,tn+1)
to
{TI' ...,Tn+1)
as described in Lemma (13) above we perform a simplex-like exchange step and obtain an improvement of the lower bound for the obtainable approximaWe will now show that it is possible to carry out such an
tion error.
exchange as long as
Assume that
Use the same notation as in Lemma (13).
Lemma.
(22)
there is a
is not an optimal solution of (PA).
yl,...,yn' E
such that
t* E [a,b]
R(t*) > E.
(23)
Then there is a set Since
Proof:
by (12), it has
is continuous on
R
A.
placed by
t*.
ii)
iii)
If
if
t* > an+I.
Theorem.
only if there are
Proof:
then
replaces
t*
t*
t*
< t* < Ai+1'
replaces
A1; otherwise
Ial
Then
t*
replaces
if
R(t*)
t
replaces Ai+1.
R(Xn+l) > 0;
XI.
Then (14)
points
an+l
replaces
-
(16) are satisfied as claimed.
is an optimal solution of (PA) if and
y1,...,yn+l n+l
A.
R(Ai) > 0; otherwise
Then
(12) is satisfied with
(7).
such that
i
T. = A., i = 1,...,n+l. (24)
will be re-
A.
An+1'
R(t*)
otherwise Put
Next one of the
R(t*)R(al) > 0
replaces
There is an ai
such that
There are the three cases i), ii), iii):
t < al. t*
R(ti+1) < 0
R(ti)
i = 1,...,n.
= ti, i = 1,2,...,n+1.
First put
and
(a,8]
z1 < z2 < ... < zn
zeros
n
ti < zi < ti+l'
i)
meeting the conditions (14) - (16).
{T1,...,Tn+1)
a < t1 < t2
< ... < to+l < $
such that
= yn+1
If (12) is satisfied then optimality follows from Corollary
Assume on the other hand that
y1,...,yn+1
is an optimal solution
20.
Computing the Best Approximation in the Uniform Norm
of (PA).
173
Since (PA) and (DA) have the same optimal value and (DA) has a
solution we may write
q q
y.+1 =
xif(ti),
L
(25)
i =1
cq
r = 1,...n,
xiur(ti) = 0,
(26)
i=1 q 1I1 Ixil = 1.
(27)
We need only consider optimal basic solutions of (DA); i.e. we must have q < n+l.
rank
The homogeneous system (26) has a matrix of coefficients with
= min(q,n).
Hence it has nontrivial solutions only for
and (DA) has therefore no optimal solutions with
q < n+l.
is the only possibility for optimal basic solutions. by
yr
and summing over
i=1
x1
E r=1
q > n+1
Thus
q = n+l
Multiplying (26)
we find that
r
yrur(ti) = 0.
Thus (25) becomes n+l yn+1
n
xi{f(ti)
E
E
-
i=1
(28)
yrur(t.)}.
r=1
By Lemma (1) we have
xixi+l < 0.
Hence (27) entails
n+l
(-1)1xil = 1.
I
i=1
Entering this expression into (28) we arrive at n+l
n+l
X.(-1)lyn+ll =
I
i=1
n xi{f(ti)
I
yrur(ti)}I.
-
r=1
i=1
Since
If(t)
n -
E
r=1
yrur(t) I : yn+1'
t E [a, 6),
we must conclude that (12) is satisfied for
lei = yn+l, establishing the
desired result. (29)
Remark.
Theorem (24) can be used for deriving a nonlinear
system of equations to solve (PA) numerically.
equations with the unknowns y1"
" 'yn+l
and
(12) is a system of n+l tl,.." to+l'
The missing
174
n+l R
APPROXIMATION PROBLEMS BY CHEBYSHEV SYSTEMS
VIII.
equations are derived by utilizing the fact that the error function of Lemma (13) must have a local extremum at Theorem.
(30)
order two over
Let
Then
f
be twice continuously differentiable
y1,...,yn+l
if and only if there is a set
nn
be an extended Chebyshev system of
ul,...,un
and let
[a,s]
on the same interval.
ti, i = 1,...,n+1.
is the optimal solution of (PA)
a < tl < t2 < ... < to+l < s
such that
yrur(ti) + (-1)IE = f(ti = 1,...,n+l,
(31)
r=1
n
11
(ti-a)(6-ti) { E YruT(ti) - f'(ti)} = 0, r=1
i = 1,...,n+l,
(32)
J1
(33)
yn+l = le"' Proof:
(31) and (33) follow from Theorem (24).
fact that the error function has a local extremum at
(32) expresses the ti.
If
ti E (a,s)
then the derivative of the error function must vanish. The three-phase algorithm is much simpler for (PA) with Chebyshev systems than in the general case.
q
no clustering occurs in Phase
In Phase 1 a discretized version of
(PA) is solved by means
2.
is set to
n+l
from the outset and
of an exchange algorithm based on Lemma (13).
For discretized problems convergence is guaranteed by the fact that only finitely many exchanges can take place and the calculated lower bound increases in each step. all
ti
To improve efficiency one generally exchanges
in each step and seeks to achieve
IR(ti)I > IR(ti)I.
The
classical Remez algorithm (see e.g. Cheney (1966)) requires that the maximum value of the error function on
[a,s]
be calculated at each step;
but this cannot be achieved by means of a finite number of arithmetic operations unless further assumptions are made about the structure of the function
f.
Chapter IX
Examples and Applications of Semi-Infinite Programming
In this chapter we shall illustrate how the techniques of semiinfinite programming can be used for the computational treatment of nontrivial problems in a practical context.
We remind the reader that im-
portant applications have been discussed elsewhere in the book, e.g. in §6, §7, §19 and §20.
§21.
A CONTROL PROBLEM WITH DISTRIBUTED PARAMETERS (1)
In this section we shall treat a problem of potential interest
for industry.
One wants to change the temperature of a metal body by
regulating the temperature of its environment.
This must be done within
a predetermined period of time and the temperature of the environment can only be varied between an upper and a lower value.
We shall discuss
a simple model problem which is solved in Glashoff and Gustafson (1976). Only one spatial coordinate occurs, but the solution to be presented here could possibly be applied to paralleliepipedic bodies having large extensions in the remaining two dimensions; i.e. when boundary effects can be neglected. (2)
Thus we consider a thin rod which can be heated symmetrically
at both ends but is thermally isolated from its surroundings everywhere else.
(The rod could be thought of as representing a cut through a plate
in its central part.
The two large surfaces of the plate are held at the
same temperature and heat flows into or out of the interior of the plate. The heat thus propagates perpendicularly to the large surfaces of the plate, not along the surfaces).
We select the coordinate. system so that
the endpoints of the rod are located at 175
-1
and
+1.
Inside the rod the
176
IX.
temperature is
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
y(x,t)
at the point
at the time
x
shall study the temperature of the rod for
t, -1 < x < 1.
We
We assume that
0 < t < T.
the temperature is governed by the heat diffusion equation, Yt(x,t) = Yxx(x,t) - q(x)y(x,t), where
-1 < x < 1, 0 < t < T,
(3)
is a given twice-differentiable function with
q
0 < x < 1.
q(x) = q(-x),
(4)
As usual, yt, yxx, etc. denote partial derivatives.
The temperature of
u, the temperature at the two endpoints.
the rod is controlled by varying
The transfer of heat from the rod to the surrounding medium (or vice versa) follows the law 0 < t < T
Byx(l,t) = u(t) - y(l,t), (right endpoint).
Here, $
(5)
An analogous equation holds for the left endpoint. Combining (3), (4) and (5) we realize
is a positive constant.
that
-1 < x < 1,
y(-x,t) = y(x,t), i.e. y
is an even function of
Yx(0,t) = 0,
0 < t < T;
Therefore we must have
x.
0 < t < T.
We need only consider
y(x,t)
the surrounding medium be
0 < x < 1.
for
u(t), 0 < t < T, and let
sulting temperature distribution in the rod at at
t = 0
Let the temperature of be the re-
y(x,T)
if the temperature
t = T
is
y(x,0) = 0,
-1 < x < 1.
Now let the desired temperature at
t = T
be
z(x)
where
z
is a con-
tinuous function with z(x) = z(-x).
We now want to compute a function mates
quire that
u which is such that
as closely as possible.
z(x)
u
y(x,T)
approxi-
For physical reasons we must re-
is a bounded function and introduce the constraint
0 < u(t) < 1,
0 < t < T.
For easy reference we collect the equations describing our control problem.
Yt(x,t) - Yxx(x,t) + q(x)y(x,t) = 0,
0 < x < 1,
0 < t < T,
(6)
177
A Control Problem with Distributed Parameters
21.
Ryx(l,t) + Y(l,t) = u(t),
0 < t < T,
(7)
yx(O,t) = 0,
0 < t < T,
(8)
y(x,0) = 0,
0 < x < 1,
(9)
0 < u(t) < 1,
0 < t < T.
(10)
If the control function is continuous, one can establish that the system - (9) has a classical solution
(6)
where
y(x,t)
0 < x < 1, 0 < t < T.
derivatives y
are continuous functions for yt, yxx is in fact continuous for 0 < x < 1, 0 < t < T.
continuous for
the linear control operator
y
y(x,T)
is
through
0 < x < 1,
(Lu)(x) = y(x,T), where
L
Thus
u, therefore, we can introduce
For continuous
0 < x < 1.
and its partial
y
We introduce the uni-
is the solution to the problem (6) - (9).
form norm on the space of functions continuous on
and formulate
[0,1]
our problem as follows: Minimize when
(11)
IILu - zIL
It can be
varies over all continuous functions satisfying (10).
u
shown that this problem does not in general have an optimal solution. Hence one extends the class of functions problem.
u
to get a solvable control
Here we take a
See Glashoff and Gustafson (1976) for details.
short cut to arrive more quickly at a computational treatment. (12)
We select an integer
and the fixed numbers
n > 1
t0,ti,...,tn,
where
0 = t0 < t I
< ...
Next we denote by
U
the class of piecewise constant functions
u
satis-
fying u(t) = ar,
u E U
Thus
u E U, then t = 0
tr-1 < t < tr,
(13)
is uniquely determined by the vector
(all ...,an)T.
If
can easily be calculated numerically since we start with
Lu
and calculate
the just computed y(x,t2), and so on. Minimize
r = l,...,n.
y(x,t1), 0 < x < 1
y(x,t1)
with
u(t) = al,
as an initial value for
y
Next we use
and determine
Therefore we can approximate (11) with the problem
JILu - zllm
over all
u E U.
We next introduce the nonnegative basis functions
(14)
yr
through
178
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
tr-1 < t < tr'
( 1 ,
r = 1,...,n.
vr(t)
yr E U, r = 1,...,n, and if u
Thus
(15)
otherwise
0,
is defined by (13) we get
n
u =
arvr.
E
(16)
r=1
Next we put
(17)
r = 1,....n,
wr = Lvr,
giving
n
Lu =
r=1
n
a Lv r
=
r
I a rwr .
r=1
Combining (15) and (16) we find that 0 < ar < 1,
u E U
meets (10) if and only if
r = 1,...,n.
(18)
Hence Problem (14) takes the form n
Minimize
11
E
arwr - Z11-
(19)
r=1
over all
a E Rn u = yr
putting
subject to (18). in (6)
- (9).
We observe that
wr
is determined by
Problem (18) may now be treated in analogy
to the approximation problems in Chapter III.
Thus we first recast it
into the form Minimize
when
an+1
n
OW(X) - z(x)1 < an+l'
0 < x < 1
(20)
r=1
0 < ar < 1,
r = 1,...,n.
(21)
By replacing (20) and (21) with equivalent simpler inequalities, we obtain
Minimize over all
a
(22)
n+ 1
al,...,an+l
subject to the constraints
n
arwr(x) + an+1 > z(x),
0 < x < 1,
(23)
r=1
n -
r=1
arwr(x) + an+l > -z(x),
0 < x < 1,
(24)
21.
A Control Problem with Distributed Parameters
a
r
> 0,
-ar > -1,
179
r = 1,...,n,
(25)
r = 1,...,n.
(26)
(22) - (26) is now a linear optimization problem of the type defined in §3.
The three-phase algorithm of Chapter VII applies.
The fact that the
inequality constraints appear in four disjoint groups makes the organization of the calculation somewhat laborious.
We present here a worked example from Glashoff and Gustafson
(27)
(1976).
In (5)
- (9), q(x) = 0, 0 < x < 1, S = 0.1
were selected.
Several values of T
only the case
ul,...
Let
form.
T = 0.3.
and
z(x) = 0.2
were treated but we discuss here
In this Example
wr(x)
may be determined in closed
be the positive roots of the equation
V tan u = 10.
Next determine
Akpk(x)
through
Akpk(x) = 2 sin uk(uk + cos uk sin Uk)-Icos 11kX.
Then
is determined from
wr(x)
Lur(x) = wr(x) n = 10
k=1 I
was chosen and
tr = 0.03
r,
tr
AkukPk(x) f ur(t)exp(-uk(T-t))dt. 0
were taken equidistant;
r = 0,...,10.
The problem (22) - (26) was discretized by means of an equidistant grid with 17 points
xi;
xi = (i-l)/17, Then (22)
i = 1,...,17.
- (26) was replaced by a linear program having 11 variables
a1,...,a11
and 54 constraints.
We note that
0 < ar < 1
The results in Table (32) below emerged.
only for
r = 5,8,9,10.
Next put
10
f (x) = where
I
arwr (x) - z (x) ,
(28)
r=1
a1,...,a10
is the calculated solution just obtained.
bility condition is in this case that
R(x)I < all,
0 < x < 1.
The feasi-
180
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
We find that
f
0, 0.3125, 0.6250,
has local extrema at the 5 gridpoints
0.8750, 1.
Thus we assume that (22) - (26) has an optimal solution
all...,a11
such that the function 10
f =
arwr - z
I
r=l
has local extrema at the endpoints
which we denote E1'2'3
0
and
1
and at
interior points
3
Thus we get the following 8 equations: .
If(0)I = all, If(l)I = all, I f(Ci) I = all, i = 1,2,3, fig i) = 0, i = 1,2,3.
(29)
(30) (31)
We use the result of the discretized problem as an approximation of the solution to the linear optimization problem (22)
or = ar
for
r = 1,2,3,4,6,7
and assume that
-
Thus we put
(26).
and
f
hav the "same
f
shape", i.e. that they have the same number and the same kind of local extrema, thus enabling us to remove the absolute value symbols and select correct signs in (29) and (30).
Thus the 8 equations (29)
-
(31)
The system is solved
have the 8 unknowns with the Newton-Raphson method.
Lastly, the optimality of the solution
hereby obtained is checked by verifying that the complementary slackness For this
conditions with respect to the dual of (22) - (26) are met.
particular problem it was possible to simplify the general three-phase algorithm due to the special structure of the error curve problem appears here only at the verification step.
f.
The dual
We also see from (32)
that the discretization error is rather small. (32)
Table.
Calculated results for
T = 0.3, n = 10,
17 equidistant gridpoints in Time interval
0 - 0.12 0.12-0.15 0.15-0.21 0.21-0.24 0.24-0.27 0.27-0.30 Optimal value (33)
Exercise.
Index r 1,2,3,4 5
6,7 8
9
10 11
[0,1]
Discretized problem
Continuous problem (20)-(24)
1
1
0.43638
0.43631 0
0
0.10848 0.23062 0.19959 1.069x10-4
0.10835 0.23068 0.19959 1.060x10-4
What could happen if the verification of the com-
plementary slackness conditions is left out?
Discuss in particular the
case when (22) - (26) is discretized with a fine grid!
22.
Operator Equations of Monoton=c Type
181
OPERATOR EQUATIONS OF MONOTONIC TYPE
§22.
(1)
function
We shall use the term operator equation for equations having a u
as unknown.
Such problems are often formulated as differen-
tial equations or integral equations.
If the unknown function occurs
linearly, then an approximate solution to the operator equation may be calculated by means of reformulating the given problem into an approximation problem of the type discussed in §6 and later in the book. (2)
example.
We illustrate the general idea by discussing the following Let
defined for
be a continuous function of two variables
K
0 < s < 1, 0 < t < 1.
tions which are defined on
Let
f
and
g
We seek a function
(0,1].
s
and
t,
be two given funcu
satisfying the
condition
u(0) = 1
(3)
and fulfilling the linear integro-differential equation 1
u'(t) + f(t)u(t) + 1
Let now
u1,...,un
be
ferentiable on
[0,1].
K(t,s)u(s)ds = g(t),
0 < t < 1.
(4)
0
n
given functions which are continuously dif-
We want to approximate the unknown function
u
with a linear combination n
yru
u = I r=1
(5)
r
The idea is to enter this approximation into (4) and to minimize the norm of the function n
Cn
YruT(t) + f(t) I r=1
1 fl
Cn
yrur(t) +
r=1
L
r=1
K(t,s)ur(s)ds - g(t),
Yr 0
0 < t < 1. Next put 1
vr(t) = ur(t) + f(t)ur(t) +
K(t,s)ur(s)ds,
r = 1,...,n.
(6)
0
If we want to approximate Minimize
yn+l
subject to the constraints
g
in the uniform norm, we get the task (7)
182
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
n
0
yrvr(t) - g(t)I < yn+
(8)
r=1 n
Yrur(0) = 1.
E
(9)
r=1
The problem (7)
The last relation comes from (3).
- (9) can then be re-
formulated as a linear optimization problem and solved by means of the general computational schemes of Chapter VII.
We notice the similarity
between the approach taken here and that applied in §21. direct relation between the value a solution (7)
There is no
in (7) and the deviation between
Yn+l
and an approximating linear combination (5).
u
The problem
- (9) is defined and can be solved numerically even if the original
task (3), (10)
(4) does not have a solution.
We shall now discuss a general class of operator equations
where the analysis can be carried much further and where the techniques of semi-infinite programming permit a systematic computational treatment. We refer here to the so-called operator equations of monotonic type.
See
A comprehensive account is given in Protter-Weinberger
e.g. Collatz (1952).
(1967) and we refer the interested reader to this text for the mathematiNumerical examples are also given
cal theory of this class of equations.
Here we shall illustrate the use of semi-infinite pro-
in Watson (1973).
gramming on a particular example. (11)
tion
u
We want to calculate the uniquely determined func-
Example.
of two variables satisfying
au+au=0 2
2
as2
at2
A = {(s,t), 0 < s < 1, 0 < t
on
s,t E bd A, the boundary of
u(s,t) = f(s,t),
Here, f
(12) (13)
A.
is a known continuous function.
(12) is a monotonic operator equation and one can show the following result:
v(s,t) < u(s,t) < w(s,t) whenever
and
v
w
(14)
are functions of two variables satisfying
v(s,t) < f(s,t) < w(s,t), (s,t) E bd A
waw<0
2
as2
2
at2 -
2
- as2
2
(15)
(s,t) EA.
(16)
ate'
Our goal is to construct functions
v
and
w
numerically.
Put
Operator Equations of Monotonic Type
22.
183
n
w(s,t) =
I
Yrgr(s,t),
r=1
where t L,
g1,...,gn
are defined by the expressions
respectively.
n=
z
yl,...,yn
1,s,t,s2,st,t2,...,stL-l
is an integer and
Here, L
(L+1) (L+2). We get
are constants to be determined.
2 aw+ aw_ n/ y f (s,t) rr as2 at2 r=1 2
where
fr
are calculated from
f4(s,t) = f6(s,t) = 2, etc.
Thus
gr.
fr(s,t) = 0, r = 1,2,3,5,
The conditions (15) and (16) imply
nn yrgr(s,t)
> f(s,t),
(s,t) E bd A,
r=1
n (s,t) E A.
yrfr(s,t) < 0, t=1
We want to find a "good" function
w, i.e. a function which satisfies the
right inequality in (15) and the left inequality in (16) as well as posTherefore we
sible.
minimize over all
(17)
Yn+1
subject to
y1,...,yn+1 n
f(s,t) + Yn+l >
yrgr(s,t)
> f(s,t),
(s,t) E bd A,
(18)
(s,t) E A.
(19)
r=1 nn
- Yn+l <
Yrfr(s,t) < 0, r=1
The problem (17) - (19) can easily be recast into the following equivalent linear optimization problem: Minimize over all
(20)
Yn+l
yl,...,yn
subject to
n yrgr(s,t)
> f(s,t),
(s,t) E bd A,
(21)
r=1 nn
yn+l
yrgr(s,t)
C
r=1
> -f(s,t),
(s,t) E bd A,
(22)
184
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
n
yrfr(s,t) > 0,
(s,t) E A
(23)
(s,t) E A.
(24)
r=1
n C
yn+1 +
yrfr(s,t) > 0,
r=1
This task can be treated with the computational schemes of Chapter VII. The construction of
is carried out in an analogous manner.
v
(14) can
now be used to calculate pointwise upper and lower bounds for the solution
u.
Exercise.
(25)
(24) is solvable.
Show that the linear optimization problem (20)
Hint:
-
Use Theorem (7) of §11.
AN AIR POLLUTION ABATEMENT PROBLEM
§23.
We shall resume the discussion of the air pollution control problem of (14) of §3 but now in a more general context.
We noted that pollutants
emitted from various sources, e.g. power plants, contaminate the air. Sooner or later they will reach the ground as fallout.
Thus sulfur com-
pounds from power plants burning fossil fuels may damage soils and acidify The pollutants are often transported
waters causing the death of fish.
long distances before they reach the ground.
Thus the severe acidifica-
tion of lakes in Scandinavia is caused, to a large extent, by industry in Similar phenomena have recently been
Great Britain and Central Europe.
In this section we shall develop a model
observed in the U.S. and Canada.
The main
which incorporates both air pollution and fallout on the ground.
difficulty associated with its application to practical problems is the construction of the transfer functions.
Much research is needed in this
area. (1)
Air pollution control model.
We use the same notation as in
(14) of §3 but include the fallout as well.
quality control area not coincide. V.
and
W.
S
and a fallout control area
With each source where
Thus we consider an air
j
F.
S
and
need
F
we associate the transfer functions
V (s), s E S, is the contribution from source
j
to
the annual mean concentration in the air of the pollutant considered at In the same way, W.(t), t E F, is the contribution from source
s E S.
to the fallout at fied.
t E F.
Let
N
sources with strengths
gj
j
be identi-
We assume that the combined annual mean pollutant concentration is
given by
23.
An Air Pollution Abatement Problem
N
gjVj(s),
185
s E S,
and that the total fallout is N
g.". (t),
t E F.
We assume that the contributions add up according to the principle of superposition.
The number of sources is fairly large and therefore we
combine them into classes as described in (14) of §3. class are regulated in the same way.
The sources in each
Upon performing this aggregation we
write the total concentration in the air at
s E S,
n
r=1
vr(s)
and the total fallout at
t E F,
n wr(t). r=1
Thus source-class
gives rise to the concentration contribution
r
wr.
and the fallout contribution
One reductions strategy is that the emission of class by the fraction
0 < Er < 1, r = 1,...,n.
Thus
Er.
yr
r
is reduced
Hence the total re-
maining concentration after regulation is given by n
(1-Er)vr(s) r=1
and the total fallout becomes n
(1-Er)wr(t). r=1
We now require that the remaining concentration and fallout do not surpass given levels posed.)
g
and
f.
(The standards
g
and
f
We also assume that there are upper bounds
for the fractions
Er.
emissions completely.)
may be legally imer < 1, r = 1,...,n,
(It may not be technically possible to remove the
Therefore the numbers
El,...,En
must meet the
conditions
0 < E r < er ,
r= 1,...,n,
(2)
ne
(1-Er)vr(s) < g(s), r=1
s E S,
(3)
186
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
n
t E F.
(1-Er)wr (t) < f(t),
(4)
r=1 The reduction of emissions entails costs, e.g. for purification of the exhausts or the use of more expensive fuels than otherwise would have been selected.
We shall assume here that the costs are defined by the linear
function
n
K(E) = where
I crEr, are known numbers.
cl,...,cn
function
(5)
r=1
K(E)
The task of minimizing this cost
subject to the constraints (2) -
(4) can be written as a
linear optimization problem as follows: n
Minimize
I
crEr
(6)
r=1
subject to the constraints E
r
r = 1,...,n,
> 0,
-Er > -e
(7)
r = 1,.... n,
(8)
r,
n
n
E v (s) > -g(s) + E V r r r=l r r=1 E
n r=1
Er wr (t) > -f(t) +
n E r=1
wr (t),
s E S,
(9)
t E F.
(10)
The constraint (9) admits the following interpretation:
the total reduc-
tion must amount at least to the difference between the concentration before reduction and the imposed standard. that
E
(10) is solvable if it is consistent.
ards are met by maximal reduction; i.e. n (1-er)vr(s) < g(s),
s E S.
(1-er)wr (t) < f(t),
t E F.
r=1
and n
Conditions (7) and (8) entail
is restricted to a compact subset of
r=1 The dual of (6) - (10) may be written: Maximize
Rn.
Thus the problem (6)
Consistency means that the stand-
-
23.
An Air Pollution Abatement Problem
T q1 -n a +
187
q2
xig0(si) +
Eif0(ti)
(11)
L-1
over all vectors
gl,g2, points
A E Rn, n E Rn, integers
S. E S, ti E T,
xi, Ei, subject to the constraints
and reals
ql
q2
i=1
r = 1,...,n,
n
Ei > 0,
i = 1,...,q2.
r
r = 1,...,n,
(12)
i=1
Ar > 0,
-
Eiwr(ti) = cr,
xivr(si) +
Ar - nr +
> 0,
-
xi .
> 0,
-
i = 1,...,q1, (13)
In (11) we have put
g0(s) _ -g(s) +
CnC
L
r=1 n
vr(s),
f0(t) _ -f(t) + I w(t). r=1
The complementary slackness conditions for the dual pair (6)
-
(10) and
(11) - (13) read ATE = 0,
(14)
nT(e-E) = 0,
(15)
n xi { I
Ervr(si) - g0(si)I = 0,
i = 1,...,q1,
(16)
Erwr(t
i = 1,...,q2.
(17)
r=1 n Ei {
-
f0(ti)} = 0,
r=1
The equations (14) - (17) which must be fulfilled for optimal solutions can be analyzed as follows. r = 1,...,n.
nn
0 < Er < er, we must have
Er = 0, then (15) entails
If e.g.
er - Er = er > 0.
Since
Further, if
xi > 0
nr = 0
Arnr = 0,
since
then
Ervr(si) = g0(si).
r=1
Thus the pollutant concentration reaches the highest possible value at si.
In the same way the level of fallout reaches the standard value at
ti
if
ql
points
Ei > 0.
Thus an optimal reductions strategy is associated with where the pollutant concentration reaches the
sl,...,sq 1
highest value and
q2
points
where the rate of fallout is
t1,...,t q2
188
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
the largest permissible.
The positions of these "critical points" are
determined when Problem (6)
- (10) is solved numerically.
For this purpose the general three-phase algorithm of Chapter VII may be used.
In Phase i), S
in (9) and
{s1,...IsN} c S
finite subsets
linear program is solved.
noted by
E*.
in (10) are replaced by the
{t1,...,tL} c T
and the resulting
Let the optimal solution then obtained be de-
We find that
n
r=1
F
and
n
E*v (s.) ? -g(s
rr
vr (s .),
+ r= =1
Cn
nCC
w(t r
E*wr(ti) > -f(tt) + rI l rL1
This means that with the reduction strategy tion and fallout are met on the grids.
E*, the standards for pollu-
They can hence be violated only
outside the grids and it is possible to derive bounds for how large the deviation can be.
r-r-
We recall that
0 < E* < e
Hence one can assess
< 1.
when it is worthwhile to carry out the remaining phases of the algorithm since the parameters of this problem, e.g. the transfer functions, are not very accurately determined.
524.
NONLINEAR SEMI-INFINITE PROGRAMS In this section we shall illustrate by examples how the computational
scheme of Chapter VII may be extended to problems which are not of the form of (P) (introduced in §3) or (D) (introduced in §4). We treat first the class of problems which arise when the preference function of (P) is replaced by a nonlinear convex function. Let the index set
we consider the following task: a1,.... an
and
be defined as in §3.
b
continuously differentiable on Minimize over all
R.
S
Suppose that
F
is convex and
Consider the problem
F(y)
y E Rn
Thus
and the functions
(1)
subject to the constraints
n yrar(s) > b(s),
s E S.
(2)
r=1
This problem may be reduced to the form of (P).
In our further develop-
ment we shall assume that (2) determines a compact subset of it will be denoted by
K.
Rn.
Then (1), (2) has an optimal solution
Here y*.
We
Nonlinear Semi-Infinite Programs
24.
189
shall now derive relations which can be used for the determination of
y*.
(1) and (2) may be written as: Yn+l
Minimize subject to
(3a)
yn+l = F(y),
y E K.
Let us now assume that a cube
(3b)
T = {x
Ixil < F,
i = 1,...,n}
I
with
is known
n E T, the linear function
Denote by
K E T.
n
1I(n,Y) = FCn) +
Fr(n) (Yr-T1r) r=1
where
stands for
Fr(n)
Since
.
2n
Oettli (1975))
F
is convex we have (c.f. Blum and
r
F (y) > n(n,Y),
y E K,
F (y) = sup H(n,Y) = n(Y,Y) nET
Hence (3) is equivalent to the problem Minimize
(4)
yn+l
subject to Yn+l
(n,Y)
n ET, yE K.
(2) gives the condition that
y E K.
(5)
Combining this with (4) and (5) we
finally arrive at the formulation Minimize
Y.+1
(6)
subject to
n yn+1
C L
n yrFr(n) ? F(n) -
r=1
nrFr(n),
n E T,
(7)
s E S.
(8)
r-1
n
Yrar(s) > b(s), r=1 (6)
-
(8) is a linear optimization problem of the type introduced in 93.
It can be solved by means of the general three-phase algorithm of Chapter VII.
An alternative is to discretize (1),
(2) directly.
This generalization may be carried even further.
We consider the
following problem: Program (PG).
Let
S
be a compact subset of
function of the two arguments
s,y, where
s E S
Rn
and
and let y E Rn.
g g
be a is
190
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
IX.
required to have the properties that the set
K = {y E Rn
g(s,y) < 0,
1
is nonempty and compact, and
s E S} g
(9)
is twice continuously differentiable on
S x K. Let
be twice continuously differentiable on
G
Then Program
Rn.
(PG) is the task:
Minimize
G(y)
y E Rn
over all
(10)
subject to the constraint
g(s,y) < 0,
Program (PG) has a solution since the continuous function
Remark. G
s E S.
is to be minimized over the compact subset
a special case of (PG) which occurs if T a (s)
y.
Since
G
K
is not assumed to be convex, G
G
many local minima on
of
Program (P) is
Rn.
is linear and
g(s,y) = b(s) -
may have arbitrarily
K, a fact which complicates the numerical treatment.
To a certain extent, a computational scheme for (PG) can be based on the experiences from (P), even if the implementation on a computer is much more difficult.
A natural idea is to discretize (PG), i.e. replace
S
by a finite
grid
T = {sl,...,sN}
and approximate (PG) by the task Minimize G(y)
(12)
y E Rn subject to the constraints
over all
g(sj,Y) < 0, Let now
L
(13)
j = 1,...,N.
be a positive interpolating operator with nodes
(see (8) of §13).
s1,...,sN
We define
N
Lg(s,y) =
I
are as in (8) of
w.
to conclude that Lg(s,y) < 0,
y
13.
We next invoke Theorem (10) of §13
satisfies (13) if and only if s E S.
Here the discretization (12), (13) of Program (PG) is equivalent to replacing
g(s,y)
by
Lg(s,y)
in (11).
For convergence results based on this
24.
Nonlinear Semi-Infinite Programs
191
fact see Gustafson (1981).
We note that the numerical solution of the discretized problem (12), (13) is a nontrivial task along with the
ties (13) are consistent and the set
K
verification that the inequaliof (9) is nonempty and compact.
These matters must be settled analytically, if possible,
This is in
sharp contrast to Problem (P) where the questions of the consistency and boundedness of the discretized problem are answered as a result of the simplex calculations. The problem (12), (13) may be treated using the algorithm in Han (1977) or the variant developed by Powell (1978).
an approximate solution
y
Thus we may calculate
It can be used to determine neces-
to (PG).
sary conditions which must be met by optimal solutions to (PG). ment parallels that in §16. Let
y*
i)
ii)
be an optimal solution to (PG).
g(s,y*) < 0,
s E S;
There are
points
q
g(sj,y*) = 0,
In the first case
y*
Our argu-
See also Gustafson (1981) and Watson (1981). Then two cases are possible:
such that
s E S
i = 1,...,q.
(14)
is a solution to the equation
VG(y) = 0.
(15)
But (15) may have other solutions besides
y*.
Thus one would need to
determine all solutions to (15) and seek out those which meet (11) and render
a minimum.
G
Next consider Case ii).
Put
f(s) = g(s,y*).
Then
f
has a local maximum at
(16)
sj, j = 1,...,q.
Arguing as in §16, we
derive conditions apart from (14) which must be met by
Hence
y*.
may be considered as a solution to the problem of minimizing to (14) and the constraints generated by the fact that
f
G
y*
subject
from (16) as-
sumes a maximum at s, j = 1,...,q. In the numerical treatment one approximates
y*
by
y, a calculated
optimal solution to (12), (13), and derives the constraints by replacing the unknown
y*
by the calculated
y
in (14) and (16).
at a nonlinear constrained optimization problem.
Hence we arrive
Using Lagrange multi-
pliers as described in Luenberger (1969), Chap. 9, we may derive a nonlinear system of equations which subsequently is solved numerically, e.g. by means of the Newton-Raphson method.
Thus we get a direct generaliza-
tion of the computational procedures described in Chapter VII.
An alterna-
192
IX.
EXAMPLES AND APPLICATIONS OF SEMI-INFINITE PROGRAMMING
tive approach is to apply the algorithms by Han and Powell which were mentioned earlier.
In either case an independent verification of the
optimality of the calculated solution is called for.
References
Andreasson, D. 0. and Watson, G. A.: Linear Chebyshev approximation without Chebyshev sets, BIT 16 (1976), 349-362. Bartels, R. H.: A penalty linear programming method using reducedgradient basis-exchange techniques, Linear Algebra and Appl. 29 (1980), 17-32.
Bartels, R. H. and Golub, G. H.: The simplex method of linear programming using LU-decompositions, CACM 12 (1969), 266-268.
Bartels, R. H., Stoer, J. and Zenger, Ch.: A realization of the simplex method based on triangular decompositions. In: "Linear Algebra", J. H. Wilkinson and C. Reinsch (Eds.), Springer-Verlag, BerlinHeidelberg-New York (1971). Blum, E. and Oettli, W.: Mathematische Optimierung, Springer-Verlag, Berlin-Heidelberg-New York (1975).
L'algorithme d'exchange en optimisation convexe, These, Carasso, C.: Grenoble (1973). Charnes, A. and Cooper, W. W.: Management Models and Industrial Applications of Linear Programming, Vols. I,II, J. Wiley & Sons, New York (1961).
Charnes, A., Cooper, W. W. and Henderson, A.: An Introduction to Linear Programming, J. Wiley $ Sons, New York (1953). Charnes, A., Cooper, W. W. and Kortanek, K. 0.: Duality, Haar programs and finite sequence spaces, Proc. Nat. Acad. Sci. U.S. 48 (1962), 783-786.
Charnes, A., Cooper, W. W. and Kortanek, K. 0.: Semi-infinite programs which have no duality gap, Management Science 12 (1965), 113-121. Cheney, E. W.: Introduction to Approximation Theory, McGraw-Hill, New York (1966). Collatz, L.:
Aufgaben monotoner Art, Arch. Math. 3 (1952), 366-376.
Approximation von Funktionen bei einer oder mehreren Veranderlichen, ZAMM 36 (1956), 198-211.
Collatz,.,L.:
Collatz, L. and Krabs, W.: Stuttgart, (1973).
Approximationstheorie, B. G. Teubner,
Collatz, L. and Wetterling, W.: Optimierungsaufgaben, Zweite Auflage, Springer-Verlag, Berlin-Heidelberg-New York (1971). 193
194
REFERENCES
0
Dahlquist, G. and Bjorck, A.: Numerical Methods, Prentice-Hall, Englewood Cliffs, New Jersey (1974).
Linear Programming $ Extensions, Princeton University Dantzig, G. B.: Press, Princeton, New Jersey (1963). In "Linear Inequalities and Related Duffin, R. J.: Infinite programs. Systems", H. W. Kuhn and A. W. Tucker (Eds.), Princeton University Press, Princeton, New Jersey (1956), 157-170. Eckhardt, U.: Theorems on the dimension of convex sets, Linear Algebra and Appl. 12 (1975), 63-76.
Eggleston, H. G.:
Convexity, Cambridge University Press, Cambridge (1958).
Fahlander, K.: Computer programs for semi-infinite optimization, TRITANA-7312, Department of Numerical Analysis and Computing Science, Royal Institute-of Technology, S-10044 Stockholm 70, Sweden.
Gill, P. E. and Murray, W.: A numerically stable form of the simplex algorithm, Linear Algebra and Appl. 7 (1973), 99-138. In: "SemiGlashoff, K.: Duality theory of semi-infinite programming. infinite programming", Proc. Int. Colloqu. Bonn. R. Hettich (Ed.), Lecture Notes in Control and Information Sciences 15, Springer-Verlag, Berlin-Heidelberg-New York (1979), 1-16.
Glashoff, K. and Gustafson, S.-A.: Numerical treatment of a parabolic boundary-value control problem, J. Opt. Th. Appl. 19 (1976), 645-663.
Einfuhrung in die Lineare Optimierung, Glashoff, K. and Gustafson, S.-A.: Wissenschaftliche Buchgesellschaft, Darmstadt, (1978). Gorr, W., Gustafson, S.-A. and Kortanek, K. 0.: Optimal control strategies for air quality standards and regulatory policies, Environment and Planning 4 (1972), 183-192. 0
Gustafson, S.-A.: On the computational solution of a class of generalized moment problems, SIAM J. Numer. Anal. 7 (1970), 343-357. 0
A general three-phase algorithm for nonlinear semi-infinite programming, in Y. P. Brans (Ed.), Operations Research '81, NorthHolland Publ. Co., Amsterdam-New York-Oxford (1981), 495-508.
Gustafson, S. -A.:
0
Numerical treatment of a class of Gustafson, S.-A. and Kortanek, K. 0.: semi-infinite programming problems, Nav. Res. Log. Quart. 20 (1973), 477-504. 0
On the calculation of optimal longGustafson, S.-A. and Kortanek, K. 0.: term air pollution abatement strategies for multiple-source areas, Proc. Sixth NATO/CCMS Expert Panel on Air Poll. Model., (1975). Linear Programming, Addison-Wesley Publ. Comp., Reading, Hadley, G.: Mass., 3rd printing (1964). Han, S. P.: A globally convergent method for nonlinear programming, J. Opt. Th. Appl. 22 (1977), 297-309.
A Newton-method for nonlinear Chebyshev approximation, In: Hettich, R.: "Approx. Theory", Proc. Int. Colloqu. Bonn, Lecture Notes Math., 556, Springer-Verlag, Berlin-Heidelberg-New York (1976), 222-236. "Semi-infinite Programming", Lecture Notes in Control Hettich, R. (Ed.): and Information Sciences 15, Springer-Verlag, Berlin-HeidelbergNew York (1979). Numerische Methoden der Approximation and Hettich, R. and Zencke, P.: semi-infiniten Optimierung, Teubner, Stuttgart, 1982.
References
195
Hildenbrand, K. and Hildenbrand, W.: Lineare ökonomische Modelle, Springer Hochschultext, Berlin-Heidelberg-New York (1975).
Hoffmann, K.-H. and Klostermair, A.: A semi-infinite linear programming procedure and applications to approximation problems in optimal control. Approx. Theory II, Proc. Int. Symp. Austin (1976), 379-389.
Judin, D. B. and Golstein, E. G.: Lineare Optimierung I, Akademie-Verlag, Berlin (1968).
Karlin, S. and Studden, W. J.: Tchebycheff Systems: with Applications in Analysis and Statistics, Interscience Publishers, New York-London-Sydney (1966).
Krabs, W.: Optimierung und Approximation, B. G. Teubner, Stuttgart (1975).
Lorentz, G. G.: Approximation of Functions, Holt, Rinehart and Winston, New York (1966).
Luenberger, D. G.: Optimization by Vector Space Methods, John Wiley & Sons, New York-London-Sydney-Toronto (1969).
Powell, M. J. D.: A fast algorithm for nonlinearly constrained optimization calculations. In: "Numerical Analysis", G. A. Watson (Ed.), Lecture Notes in Mathematics 630, Springer-Verlag, Berlin-Heidelberg-New York (1978).
Protter, M. H. and Weinberger, H. F.: Maximum Principles in Differential Equations, Prentice-Hall, Englewood Cliffs, New Jersey (1967).
Stewart, G. W.: Introduction to Matrix Computations, Academic Press, New York and London (1973).
Stoer, J.: Einführung in die Numerische Mathematik, 2. Auflage, Springer-Verlag, Berlin-Heidelberg-New York (1976).
Watson, G. A.: One-sided approximation and operator equations, J. Inst. Maths. Applic. 12 (1973), 197-208.
Watson, G. A.: On the best linear one-sided Chebyshev approximation, J. Approx. Theory 7 (1973), 48-58.
Watson, G. A.: Globally convergent methods for semi-infinite programming, Department of Mathematics, University of Dundee (1981).
Index
Absolute value for vector, 8
Activity, 3, 25
Air pollution, 17, 184
Annual mean concentration, 17
Andreasson, 44, 147
Basic set, 95
Basic solution, 93, 95
Björck, 51, 115
Bounded state, 5
Boundary point, 8
Carathéodory, 65
Center of a sphere, 8
Center of gravity, 140
Charnes, vi, 36, 92
Chebyshev, v
Chebyshev polynomial, 52
Chebyshev system, 48, 153
Closed half-space, 7
Closed set, 9
Compact set, 9
Complementary slackness lemma, 25
Complementary slackness theorem, 95
Consistent problem, 5
Constraint, 4
Conic hull, 60
Convex combination, 59
Convex cone, 60
Convex hull, 59
Convex conic hull, 60
Cooper, vi, 36, 92
Dantzig, v, 92
Dahlquist, 51, 115
Defect, 33
Defect diagram, 33, 81
Degenerate basic solution, 95, 137
Discretization, 15, 109, 113
Disposal-activity, 26
Distance, 8
Double dualization, 30
Distributed parameters, 175
Dual linear program, 28
Dual pair, 24
Dual problem, 24, 39, 66, 129
Duality, 27
Duality gap, 33, 36, 79
Duality lemma, 20
Dualization, 24
Duffin, vi
Eckhardt, 36
Elements of a matrix, 6
Equivalent norms, 9
Euclidean norm of a vector, 8
Exchange step, 93, 97
Extended Chebyshev system of order two, 156, 160, 168
Factorization method, 116
Fahlander, 112
Fall-out, 184
Farkas' Lemma, 82
Feasible point, 2
Feasible problem, 5
Feasible set, 4
Finitely generated, 74
First Duality Theorem, 79
Gaussian elimination, 115, 119
General Assumption, 70, 82
General optimization problem, 4
Generalized quadrature rule of the Gaussian type, 158
Gorr, 19
Hyperplane, 7
Henderson, vi, 92
Hildenbrand, 27
Householder transformation, 119
Inconsistent, 5
Index of a set, 159
Inner point of a set, 8
Integro-differential equation, 181
Intensity, 3
Interior of a set, 8
Inverse of a matrix, 7
Karlin, 159
Kortanek, vi, 19, 36
Length of a vector, 8
Linear combination, 61
Linear mapping, 6
Linear optimization problem, 10, 12, 45
Linear program, 14, 23
Linear programming, 14, 23, 27, 81, 106
Linear system of equations, 6
LR-decomposition, 123
Mass, 135
Mass-point, 135
Matrix, 5
Maximization problem, 4, 5, 31
Maximal representation, 89
Minimization problem, 4
Minimum point, 2
Moment cone, 61, 80
Monotonic type, 182
Nonsingular matrix, 7
Normal vector, 7
One-sided approximation, 167
Open half-space, 7
Open set, 8
Open sphere, 8
Operator equation, 181
Optimal point, 2
Optimal solution, 2
Permissible set, 2, 4
Pivot element, 119
Positive interpolating operator, 110, 190
Preference function, 2, 4
Production model, 3
Production plan, 3, 25
Projection Theorem, 77
Protter, 182
Radius of a sphere, 8
Rank of a matrix, 6
Reduction Theorem, 63
Regular basic solution, 102
Regular matrix, 7
Regularity condition, 70
Regularized problem, 128
Regularization, 88
Roughness of a grid, 112
Row pivoting, 121
Scalar product, 8
Second Duality Theorem, 84
Semi-infinite programs, vi
Separating hyperplane, 75, 76
Separation Theorem, 78
Side-condition, 4
Siting of power plant, 1
Slack vector, 28
Slater's condition, 70, 73, 80, 82
Solution, 2
Solvability, 69
Square matrix, 6
State diagram, 30, 81
Stewart, 116
Stoer, 116
Studden, 159
Superconsistent, 70, 84
Supporting hyperplane, 82, 83
Transfer function, 17
Transpose of a matrix, 6
Triangular factorization, 116, 118, 119
Uniform approximation, 37, 38, 73
Value of an optimization problem, 4, 15
Vandermonde matrix, 47
Vector, 5
Vector norm, 7
Watson, 44, 147
Weak duality theorem, 24
Weierstrass' theorem, 9
Weinberger, 182
Zero of multiplicity 2, 156