OPTIMAL SOLUTION OF NONLINEAR EQUATIONS
OPTIMAL SOLUTION OF NONLINEAR EQUATIONS Krzysztof A. Sikorski
OXFORD UNIVERSITY PRESS
2001
OXFORD UNIVERSITY PRESS
Oxford New York Athens Auckland Bangkok Bogota Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris Sao Paulo Shanghai Singapore Taipei Tokyo Toronto Warsaw and associated companies in Berlin Ibadan
Copyright © 2001 by Oxford University Press, Inc. Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016. Oxford is a registered trademark of Oxford University Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press. Library of Congress Cataloging-in-Publication Data: Sikorski, Krzysztof A., 1953– . Optimal solution of nonlinear equations / Krzysztof A. Sikorski. p. cm. Includes bibliographical references and index. ISBN 0-19-510690-3. 1. Differential equations, Nonlinear—Numerical solutions. 2. Mathematical optimization. 3. Fixed point theory. 4. Topological degree. I. Title. QA377.S4918 2000 515'.355—dc21 99-045246
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
To my wife Elizabeth and our son John
Preface

The purpose of this monograph is to provide an overview of optimal computational methods for the solution of nonlinear equations, fixed points of contractive and noncontractive mappings, and for the computation of the topological degree. We analyze the worst case scenario here. This means that for a given error criterion and a tolerance ε, the methods guarantee the computation of an ε-approximation to the solution for every function in a given class F. Optimal methods solve the problem in the smallest possible time. We study several classes of functions, with special emphasis on tight complexity bounds and methods that are close to or achieve these bounds. In addition, pseudocodes and numerical tests of several methods are exhibited.

The monograph should be viewed as a report on work in progress. We provide several worst case results, list numerous open problems, and mention new work in average case analysis, as well as alternative models to be explored.

The work on optimal complexity algorithms for nonlinear problems had its inception in the work of Kiefer in 1953 and Traub in 1961. This stream of research complements the classical numerical work on convergence of iterative techniques as summarized by Ortega and Rheinboldt in 1970. In the annotations to chapter 2 we give a brief history and list references to this work. In the 1980s Traub and Wozniakowski initiated a general complexity theory for solving continuous problems. Our work on nonlinear problems fits into this field, presently known as Information-Based Complexity. In the late 1980s a new stream of research in algebraic complexity originated with the work of Blum, Shub, and Smale. Several optimality results and new algorithms for approximating zeros of systems of real and complex polynomial equations were derived. In addition, a new algebraic topological complexity theory over the reals was established.
We stress that the results we present strongly depend on the assumptions on the classes of functions considered and confirm the worst case behavior of numerical experiments. It may surprise some practitioners that, for example, the bisection method is proven to be optimal, even in the class of infinitely smooth functions changing sign at the endpoints of an interval. We stress that this is a worst case result. It confirms the fact that all iterative techniques based on linear information, in the worst case, exhibit only linear convergence. This was also the conclusion of extensive numerical experiments. On the contrary, an at least quadratic rate of convergence of hybrid methods was observed in many tests. This is indeed confirmed by the average case analysis of hybrid bisection-Newton and bisection-secant type methods carried out by Novak, Ritter, and Wozniakowski in 1995 (see the annotations to chapter 2). They also proved that such hybrid methods are optimal on the average.

We present a brief summary of all chapters. More detailed summaries are given in the introductions to each chapter. In chapter 1 the basic concepts are illustrated with a simple bisection example and then formalized in the language of Information-Based Complexity theory. Adversary arguments lead to lower bounds on errors of algorithms. The real number model with oracle calls is defined. Basic notions of optimal information and algorithms are specified. In chapter 2 we analyze optimal methods for solving scalar and multivariate nonlinear equations. We define various error criteria and then review optimality results in several classes of functions. An asymptotic analysis of one problem is also undertaken. In chapter 3 we discuss fixed point problems for the case of contractive functions. We first derive optimal algorithms for the univariate case under the relative and absolute error criteria. Then the multivariate case is analyzed.
Several algorithms are derived, including the circumscribed ellipsoid, centroid, and ball iteration methods. In chapter 4 we deal with fixed point problems for noncontractive functions. We exhibit the optimal bisection-envelope method for the univariate case. We prove that the multivariate problem has infinite worst case complexity under the absolute error criterion. The complexity under the residual error criterion is finite but exponential in 1/ε. In chapter 5 we derive lower and upper bounds on the complexity of computing the topological degree of Lipschitz functions. In the two-dimensional case the derived algorithm enjoys almost optimal cost. We list some numerical tests. In the multivariate case the lower and upper bounds are exponential.

Form of the text

Each section of the text ends with exercises. Exercises vary in difficulty: more difficult exercises are marked with an asterisk (*), and open research problems are marked with two asterisks (**). Each chapter closes with annotations, which indicate the source of the material and include historical remarks. A bibliography follows at the end of each chapter. We have chosen a special format for numbering theorems, lemmas, examples, corollaries, figures, and formulas. Namely, they are numbered consecutively on each page and have the page number attached to their name. For example, Theorem 99.1 is the name of the first theorem on page 99, Corollary 44.2 is the name of the second corollary on page 44, etc. We believe that this format will best serve the reader by providing a more structured and self-contained text.

Acknowledgments

I want to acknowledge many debts. H. Wozniakowski and F. Stenger introduced me to the field of numerical computation and computational complexity, with special emphasis on optimal algorithms for solving nonlinear equations. Collaboration with Henryk and Frank was always stimulating and productive. Several sections of the monograph present my joint work with H. Wozniakowski, T. Boult, and Ch. W. Tsay. These are precisely indicated in the annotations. New work with Z. Huang and L. Khachiyan is also reported in the annotations. L. Plaskota, F. Stenger, J. F. Traub, G. Wasilkowski, and H. Wozniakowski carefully read the manuscript and suggested valuable improvements. Z. Huang helped me with drawing all figures and provided several constructive comments.
Most of the research reported here was carried out in the splendid research environments of the Computer Science Departments at the University of Utah, University of Warsaw, Columbia University, and University of California at Berkeley. I would like to thank the National Science Foundation, IBM, and Amoco Corporations for supporting the research reported here.
K. Sikorski Salt Lake City and Warsaw 2000
Contents

1 Introduction 3
  1.1 Basic Concepts 3
  1.2 Formulation of the Problem 7
    1.2.1 Computational Methods 8
    1.2.2 Optimal Complexity Methods 16
    1.2.3 Asymptotic Setting 18
    1.2.4 Exercises 18
  1.3 Annotations 19
  Bibliography 20

2 Nonlinear Equations 23
  2.1 Univariate Problems 25
    2.1.1 Optimality of the Bisection Method 27
    2.1.2 Root Criterion in C 34
    2.1.3 Residual Criterion in W 37
    2.1.4 General Error Criterion in C and W 44
    2.1.5 Polynomial Equations 46
    2.1.6 Asymptotic Optimality of the Bisection Method 56
    2.1.7 Exercises 70
  2.2 Multivariate Problems 71
    2.2.1 Functions with Nonzero Topological Degree 71
    2.2.2 Lipschitz Functions 83
    2.2.3 Exercises 97
  2.3 Annotations 98
    2.3.1 Overview and Brief History 98
    2.3.2 Specific Comments 106
  Bibliography 108

3 Fixed Points-Contractive Functions 121
  3.1 Univariate Problems 122
    3.1.1 Relative Error Criterion 123
    3.1.2 Absolute Error Criterion 137
    3.1.3 Exercises 138
  3.2 Multivariate Problems 139
    3.2.1 A Constructive Lemma 140
    3.2.2 Ball Iteration 142
    3.2.3 Ellipsoid Iteration 144
    3.2.4 Centroid Method 155
    3.2.5 Numerical Tests 157
    3.2.6 Exercises 162
  3.3 Annotations 162
    3.3.1 Specific Comments 163
  Bibliography 165

4 Fixed Points-Noncontractive Functions 171
  4.1 Univariate Problems 173
    4.1.1 Minimal Cardinality Number 173
    4.1.2 The FPE-A Method 175
    4.1.3 Exercises 177
  4.2 Multivariate Problems 178
    4.2.1 Absolute Error Criterion 178
    4.2.2 Exercises 187
  4.3 Annotations 188
    4.3.1 General Comments 188
    4.3.2 Residual Error Criterion 188
    4.3.3 Specific Comments 190
  Bibliography 190

5 Topological Degree Computation 193
  5.1 Two-Dimensional Lipschitz Functions 194
    5.1.1 Basic Definitions 195
    5.1.2 Lower Bound on the Minimal Cardinality Number 196
    5.1.3 Minimal Cardinality Number 199
    5.1.4 Complexity of the Problem 207
    5.1.5 Numerical Experiments 208
    5.1.6 Exercises 211
  5.2 Lipschitz Functions in d Dimensions 211
    5.2.1 Basic Definitions 212
    5.2.2 Information N* 213
    5.2.3 Algorithm Using Information N* 214
    5.2.4 Lower Bound on the Minimal Cardinality Number 217
    5.2.5 Exercises 226
  5.3 Annotations 226
    5.3.1 Specific Comments 227
  Bibliography 228

Index 233
Chapter 1
Introduction This monograph is devoted to studying worst case complexity results and optimal or nearly optimal methods for the approximation of solutions of nonlinear equations, approximation of fixed points, and computation of the topological degree. The methods are "global" in nature. They guarantee that the computed solution is within a specified error from the exact solution for every function in a given class. A common approach in numerical analysis is to study the rate of convergence and/or locally convergent methods that require special assumptions on the location of initial points of iterations to be "sufficiently" close to the actual solutions. This approach is briefly reviewed in the annotations to chapter 2, as well as in section 2.1.6, dealing with the asymptotic analysis of the bisection method. Extensive literature exists describing the iterative approach, with several monographs published over the last 30 years. We do not attempt a complete review of this work. The reader interested in this classical approach should consult the monographs listed in the annotations to chapter 2.
1.1 Basic Concepts

We motivate our analysis and introduce basic notions in a simple example of zero finding for continuous functions with different signs at the endpoints of an interval.

Example 3.1 We want to approximate a zero of a function f from the class

F = {f : [0,1] → R : f(0) < 0, f(1) > 0, f continuous}.
By an approximate solution of this problem we understand any point x = x(f) such that the distance between x and some zero α of the function f, f(α) = 0, is at most equal to a given small positive number ε, |x − α| ≤ ε. To compute x we first gather some information on the function f by sampling f at n sequentially chosen points tᵢ in the interval [0,1]. Then, based on this information, we select x. To minimize the time complexity we must select the minimal number of sampling points that guarantees computing x(f) for any function f in the class F. This minimal number of samples (in the worst case) is called the information complexity of the problem. The cost of combining the samples is called the combinatory complexity of the algorithm constructing x(f). We observe that our problem can be easily solved by the standard bisection method. That is, we select the first evaluation point as the midpoint of [0,1], t₁ = 0.5. Then, if the function value f(t₁) is positive, the next interval containing a zero of f is [0, 0.5], and if negative, then [0.5, 1]. We compute t₂ as the center of the next interval containing a zero, and so on. This is illustrated in figure 4.1.

Figure 4.1: The bisection method.

After n evaluations we have an interval of length 2⁻ⁿ containing a zero of f. The best worst case approximation to a zero of f is now given by the midpoint of that interval. It is not difficult to carry out an induction showing that this approach is optimal in the worst case, i.e., that any other sampling procedure will produce an interval of length at least 2⁻ⁿ containing a zero of f. Therefore, the minimal number of function samples required to compute x(f) is given by n = ⌈log₂(1/(2ε))⌉ in the worst case. This minimal number is achieved by the bisection method, i.e., the bisection method is optimal in this class of functions. •

In section 2.1.1 we outline a generalization of this example to the case of smooth functions and methods based on evaluations of arbitrary linear functionals as information. It turns out that the bisection method remains optimal in that setting. Therefore, smoothness of functions and the extra power of general linear information do not help here in the worst case. It is customary in numerical practice to approximate zeros of univariate functions by using hybrid methods involving bisection, Newton type, and higher order interpolatory methods. These methods do converge with an asymptotically at least quadratic rate for functions having zeros of bounded multiplicity. We show in section 2.1.6
that again the linear rate of convergence of the bisection method cannot essentially be improved whenever functions are infinitely many times differentiable and have zeros of unbounded multiplicity. The results of section 2.1.1 indicate that the fast asymptotic convergence of hybrid methods may occur for some functions only after an arbitrarily large number of bisection steps have been accomplished. This has been observed, but not too often, in numerical tests. For many test functions the hybrid methods were far superior to plain bisection. These tests indicated the need for average case analysis of such methods. Recent results, as outlined in the annotations to chapter 2, show that indeed hybrid bisection-Newton type methods minimize the number of iterations in the average case.

We stress that the above results hold with respect to the absolute error criterion, since we insisted on computing solutions close to the actual zeros in the absolute sense. One may wish to relax the absolute error criterion and decide to compute a point x = x(f) at which the magnitude of a function f is small, |f(x)| ≤ ε. This is called the residual error criterion. In general the absolute and residual solutions are not related to each other. For example, in the above class it is impossible to satisfy the residual error criterion in the worst case, since the functions do not have a uniform upper bound on the derivatives. In some other
classes, like the Lipschitz class in section 2.2.2 or the W class in section 2.1.3, it is impossible to solve the problem under the absolute criterion, whereas the solution with the residual criterion is readily available. In the course of the text we will introduce various generalizations of these two basic criteria, including relative, relative residual, and arbitrary homogeneous bounds.

In terms of the above example we informally introduce basic concepts of our analysis. We denote the set of all zeros of a function f by S(f) and define the set of absolute ε-approximations to be

S(f, ε) = {x ∈ [0,1] : inf_{α ∈ S(f)} |x − α| ≤ ε}.

Then the problem is to compute an ε-approximation M(f) as an element of the set S(f, ε). We assume that the error tolerance ε < 0.5, since for ε ≥ 0.5 the trivial solution is M(f) = 0.5. The element M(f) is computed under the following assumptions. We use the real number model of computation. This means that we can perform the basic arithmetic operations (+, −, *, /) on real numbers at a unit cost. The cost of comparing real numbers is also taken as unity. The unit information operations, called oracle calls, are assumed to be function evaluations (we later generalize these to be arbitrary linear functional evaluations). The cost of computing f(t) for any argument t ∈ [0,1] is denoted by a constant c, and c is usually much larger than unity. By a method computing M(f) we understand any mapping that is composed of a finite number of oracle calls, arithmetic operations, and comparisons. We analyze methods that compute M(f) ∈ S(f, ε) for any function f in our class F. In the next section we make this definition more precise by splitting the computation of M(f) into two stages: information gathering (oracle calls) and combinatorial evaluation of M(f) (combinatory algorithmic stage). The worst case cost of a method M(·) is defined as the maximal cost of computing M(f) for any function f ∈ F.
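The bisection scheme from the example above can be sketched in a few lines. The function and variable names here are ours, not the book's; the sketch counts oracle calls to match the worst case bound n = ⌈log₂(1/(2ε))⌉:

```python
import math

def bisect(f, eps):
    """Approximate a zero of f on [0, 1], assuming f(0) < 0 < f(1).

    Performs the minimal worst case number of oracle calls
    n = ceil(log2(1/(2*eps))) and returns the midpoint of the
    final interval together with n.
    """
    n = math.ceil(math.log2(1.0 / (2.0 * eps)))  # information complexity
    a, b = 0.0, 1.0
    for _ in range(n):
        t = (a + b) / 2.0          # one oracle call f(t), cost c
        if f(t) > 0:
            b = t                  # sign change in [a, t]
        else:
            a = t                  # sign change in [t, b]
    return (a + b) / 2.0, n        # |x - alpha| <= eps for some zero alpha
```

For instance, for f(x) = x² − 0.5, whose zero in [0,1] is √0.5, the returned point lies within ε of the zero after n = ⌈log₂(1/(2ε))⌉ evaluations.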
Then, the worst case complexity of the problem is defined as the minimal cost among all methods that solve the problem. In section 2.1.1 we show that the worst case complexity of zero finding in the above class F with methods based on evaluations of arbitrary linear functionals is

comp(ε) = (c + d) ⌈log₂(1/(2ε))⌉

for some constant d ∈ [0, 3]. Since for typical functions f we have c ≫ 1, we essentially get

comp(ε) ≈ c ⌈log₂(1/(2ε))⌉.
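The bound above is easy to evaluate numerically. The following is our own illustrative sketch; the default d = 3 is the upper end of the constant's range named in the text:

```python
import math

def zero_finding_complexity(eps, c, d=3):
    """Worst case cost (c + d) * ceil(log2(1/(2*eps))) of zero finding
    in the class F; c is the oracle cost per sample, d the combinatory
    cost per bisection step."""
    m = math.ceil(math.log2(1.0 / (2.0 * eps)))  # number of oracle calls
    return (c + d) * m
```

For ε = 2⁻¹¹ and c = 100 this gives (100 + 3) · 10 = 1030, dominated by the oracle term c · m(ε), in line with comp(ε) ≈ c ⌈log₂(1/(2ε))⌉.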
Why the real number model of computation?

We carry out our analysis in the real number model since it is the most popular and practical model for scientific computation, numerical analysis, algebraic complexity, as well as computational geometry. The results in that model are essentially the same as in fixed precision floating point, whenever the methods investigated are numerically stable and the relative error tolerance ε is not too small compared with the product of the roundoff unit of the floating point arithmetic, the condition number, and the accumulation constant of the chosen method. Fortunately, many of the optimal methods are also numerically stable. The methods presented for the topological degree evaluation compute the correct result provided that the function evaluations have only correct signs! That is also the case with the bisection method in the above example. We do not attempt to give precise definitions of stability and floating point analysis of methods. In the annotations to this chapter we list several references devoted to such topics.
1.2 Formulation of the Problem

In this section we cast our problem in the formal language of computational complexity theory. We formalize the notion of computational methods and their cost. All cost analysis is carried out under the assumption of the real number model of computation. We include this section to make the presentation of the material complete and up to date, and to direct the reader's attention to Information-Based Complexity theory, which establishes a formal framework for studying optimal complexity methods for continuous problems.

Our problem is formally defined as follows. We let F be a subset of a linear space F̃ of functions defined on a domain D ⊂ Rᵈ, and let G be a subset of the real space Rᵈ. We let S be a given operator,

S : F × R₊ → 2ᴳ,

where 2ᴳ is the power set of G and R₊ is the set of nonnegative real numbers. We assume that the set S(f) = S(f, 0) is not empty and that S(f, ε) becomes smaller as we decrease ε, i.e.,

S(f, ε₁) ⊆ S(f, ε₂) whenever ε₁ ≤ ε₂,
for every element f ∈ F. We call S the solution operator. Now we formulate the problem as: for a given error tolerance ε ≥ 0 and any element f ∈ F, we wish to compute an ε-approximation M(f) to be an element of the set S(f, ε),

M(f) ∈ S(f, ε).
The above assumptions on S(f, ε) guarantee that the problem has a solution and that the set of solution elements does not increase when we decrease the error tolerance. These can be viewed as natural requirements. The most basic solution operator studied by us is defined in terms of the absolute error criterion

S(f, ε) = {x ∈ G : dist(x, S(f)) ≤ ε},

where dist(x, A) = inf_{a ∈ A} ‖x − a‖, for A ⊂ G, with a given norm ‖·‖ in Rᵈ. Throughout the book the solution operators will be defined by

S(f) = {α ∈ D : f(α) = 0}, which corresponds to the zero finding problem,

S(f) = {α ∈ D : f(α) = α}, which corresponds to the fixed point problem,

S(f) = {deg(f)}, which corresponds to the topological degree problem.

An important example of a solution operator defined in terms of the residual error criterion for the zero finding problem is given by

S(f, ε) = {x ∈ D : |f(x)| ≤ ε}.

Under this criterion we compute a point at which the magnitude of a function is at most equal to the prescribed error tolerance ε.

1.2.1 Computational Methods
We describe the computation of the element M(f) ∈ S(f, ε) as composed of the following two stages: gathering information on the element f (oracle calls stage) and combining the information to compute M(f) (combinatorial algorithmic stage). Below we describe each of these.
Information

To compute M(f) we need to gather some information on the element f, e.g., samples of a function f at a discrete set of arguments. We assume that in general we can compute unit information operations L(f), L : F̃ → R, where L is a linear functional. Examples of such functionals L are function and/or derivative evaluations, scalar products, integrals, etc. We denote the class of all such L's by Λ. For each f we compute a finite number n(f) of unit information operations, which constitute the information N(f) about f. The information N is called parallel (nonadaptive) iff

N(f) = [L₁(f), ..., Lₙ(f)],

where Lᵢ ∈ Λ for every i, and where the Lᵢ are linearly independent. The number of evaluations n is fixed for any function f and is called the cardinality of N. In the case of parallel information the functionals Lᵢ are given a priori, before any computation takes place. Such information can be easily implemented on a parallel computer with up to n processors, yielding optimal speed-up over sequential one-processor computation. The class of sequential (adaptive) information does not enjoy this desirable property; every Lᵢ evaluation depends on all previously computed values, and the total number of evaluations n(f) depends on the particular function f selected. The sequential information can therefore be defined as

N(f) = [y₁, ..., y_{n(f)}],

where

y₁ = L₁(f) and yᵢ = Lᵢ(f; y₁, ..., y_{i−1}) for i = 2, ..., n(f).
The number n(f) is determined as follows. We suppose that we have already computed y₁ = L₁(f), ..., yᵢ = Lᵢ(f; y₁, ..., y_{i−1}). Then we make a decision whether another evaluation is needed. The decision is made on the basis of the available knowledge about f. If the decision is "NO", n(f) becomes i and the computation is terminated. If the decision is "YES", then we choose L_{i+1}, and the whole process is repeated. This termination procedure can be modeled by defining Boolean functions

terᵢ : Rⁱ → {0, 1},

called termination functions. If terᵢ(y₁, ..., yᵢ) = 1 then we terminate the computation and set n(f) = i. Otherwise, if terᵢ(y₁, ..., yᵢ) = 0, we choose the (i + 1)st functional L_{i+1} and compute L_{i+1}(f; y₁, ..., yᵢ). This process is then repeated. Thus, n(f) is defined as

n(f) = min{i : terᵢ(y₁, ..., yᵢ) = 1}.
By convention, we define here min ∅ = +∞. The maximal number of functional evaluations is called the cardinality of N,

card(N) = sup_{f ∈ F} n(f).

In what follows, we assume that the cardinality card(N) is finite. This assumption is unrestrictive in practice, since if n(f) were unbounded then the cost of computing the information N(f) could be arbitrarily large. In some parts of our book we consider the sequential information with the number of evaluations n = n(f) given a priori, i.e., independent of the selection of a function f. This is justified by a common computational practice in which the user restricts a priori the total number of steps (evaluations) in a method. This type of sequential information is sometimes easier (less technical) to analyze. We will call it information with predefined cardinality. In the sequential information the next evaluation (functional) Lᵢ is defined with respect to all previously computed information operations given in the vector [y₁, ..., y_{i−1}]. Such a structure does not permit easy implementation on a parallel computer and is naturally suited for sequential one-processor implementation. Two important properties of the information are:

• It is partial, i.e., the operator N is many to one. This implies that knowing N(f) we cannot uniquely identify f.

• It is priced, i.e., each evaluation Lᵢ costs, say, c units of time, and the total N(f) therefore costs c · n(f).

We are interested in information that has minimal cost, i.e., the minimal number of L evaluations that are necessary to compute an ε-approximation M(f) for every f ∈ F. This minimal number of evaluations m(ε) will be called the information complexity of the problem.
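The adaptive scheme above can be sketched as a generic loop. Here `choose_L` and `ter` are hypothetical stand-ins of our own for the functional selection and the termination functions terᵢ:

```python
def sequential_information(f, choose_L, ter, n_max=1000):
    """Compute adaptive information N(f) = [y1, ..., y_n(f)].

    choose_L(y) selects the next functional L_{i+1} from the values
    computed so far; ter(y) plays the role of the termination
    function ter_i. n_max caps the cardinality, so card(N) stays
    finite as assumed in the text.
    """
    y = []
    for _ in range(n_max):
        L = choose_L(y)      # pick L_{i+1} based on y1, ..., yi
        y.append(L(f))       # one oracle call, cost c
        if ter(y):           # ter_i(y1, ..., yi) = 1: stop, n(f) = i
            break
    return y
```

Bisection, for example, fits this pattern: `choose_L` returns evaluation at the midpoint of the current sign-change interval, and `ter` stops after ⌈log₂(1/(2ε))⌉ values.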
Algorithms

Knowing the information N(f) we compute an approximation M(f) ∈ S(f, ε). This is accomplished by an algorithm φ defined as any transformation of the information N(f) into the set G, i.e.,

φ : N(F) → G.

We are interested in algorithms that use minimal resources (information with minimal cardinality and a small number of arithmetic operations and comparisons) to solve our problem. Such algorithms compute the ε-approximation M(f) = φ(N(f)), for every f ∈ F, and have smallest (or close to smallest) cost. The cost issues are further discussed below.

Remark We stress that classical numerical methods (algorithms) can be cast into our framework. The notions of information and algorithms are not separated there but rather are jointly used as a method or an algorithm. •

By a computational method M for solving the problem we understand combined information N and algorithm φ such that M(f) = φ(N(f)). In what follows we introduce concepts like radius and diameter of information, define the error of an algorithm, and define optimal and central algorithms. These notions will enable us to characterize the quality of computational methods.

Radius and diameter of information

The radius and diameter characterize the quality of information. The smaller the radius and diameter, the better we can approximate our problem. We assume that the solution operator is given as in 8.1, so the error is measured according to the absolute error criterion. The notions of radius and diameter with different error criteria are analyzed in the following chapters. We also assume for simplicity that the set S(f) = {s(f)} is a singleton, with the general case left as an exercise for the reader. We let y = N(f) be the n-dimensional information vector for some f ∈ F. The set of elements of F indistinguishable from f by N will be called V(y) (see Figure 12.1), i.e.,

V(y) = {f̃ ∈ F : N(f̃) = y}.

Figure 12.1: Local radius of information.
The local radius of information r(N, y) is defined as the radius of the set U(y) = {s(f̃) : f̃ ∈ V(y)}, which is composed of all elements s(f̃) for f̃ in V(y), i.e.,

r(N, y) = inf_{x ∈ G} sup_{f̃ ∈ V(y)} ‖x − s(f̃)‖.

The local diameter of information d(N, y) is the diameter of the set U(y), i.e.,

d(N, y) = sup_{f̃, f̂ ∈ V(y)} ‖s(f̃) − s(f̂)‖.

The (global) radius r(N) and diameter d(N) of the information N are defined as the worst case local radius and diameter, i.e.,

r(N) = sup_{y ∈ N(F)} r(N, y)

and

d(N) = sup_{y ∈ N(F)} d(N, y).
It is not difficult to prove (but is left as an exercise) the following relationship between the radii and diameters of information:

Lemma 13.1 For every vector y = N(f), f ∈ F, and any information N we have:

d(N, y)/2 ≤ r(N, y) ≤ d(N, y).

Moreover, if the elements s(f) of the sets S(f) are real numbers, then

r(N, y) = d(N, y)/2

and

r(N) = d(N)/2.
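In the real-valued case of Lemma 13.1, the relations can be checked directly on a finite stand-in for the set U(y) of indistinguishable solution elements (the names below are ours, for illustration only):

```python
def local_radius_diameter(U):
    """Radius, diameter, and center of a bounded set U of reals,
    a stand-in for U(y) = {s(f~) : f~ in V(y)}; in this case
    r = d/2, attained by the center (min + max)/2."""
    lo, hi = min(U), max(U)
    d = hi - lo               # local diameter d(N, y)
    r = d / 2.0               # local radius r(N, y)
    center = (lo + hi) / 2.0  # point attaining the radius
    return r, d, center
```

For bisection after n steps, U(y) is the final interval of length 2⁻ⁿ, so r(N, y) = 2⁻ⁿ⁻¹ and the best approximation is the interval's midpoint, as in Example 3.1.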
Errors of algorithms

It turns out that the local (global) radius of information is a sharp lower bound on the local (global) error of any algorithm using the information N. More precisely, we define the local error e(φ, y) and the global error e(φ) of any algorithm φ as:

e(φ, y) = sup_{f̃ ∈ V(y)} ‖φ(y) − s(f̃)‖

and

e(φ) = sup_{y ∈ N(F)} e(φ, y).

Theorem 13.1 We have:

r(N, y) = inf_{φ ∈ Φ(N)} e(φ, y)    (13.1)

and

r(N) = inf_{φ ∈ Φ(N)} e(φ),    (13.2)

where Φ(N) is the class of all algorithms using the information N.
Proof We will only prove 13.1, since the proof of 13.2 is similar. We let R = inf_{φ ∈ Φ(N)} e(φ, y) and L = r(N, y). We prove (i) L ≤ R and (ii) L ≥ R, which combined imply 13.1. To this end we take any algorithm φ and observe that

e(φ, y) = sup_{f̃ ∈ V(y)} ‖φ(y) − s(f̃)‖ ≥ inf_{x ∈ G} sup_{f̃ ∈ V(y)} ‖x − s(f̃)‖ = r(N, y),

which yields (i). Now we take any δ > 0 and c_δ ∈ G such that

sup_{f̃ ∈ V(y)} ‖c_δ − s(f̃)‖ ≤ r(N, y) + δ.

Then taking the algorithms φ_δ(y) = c_δ with δ → 0⁺ we obtain

R ≤ lim_{δ → 0⁺} e(φ_δ, y) ≤ r(N, y) = L,

which completes the proof. •

Theorem 13.1 indicates that optimal algorithms should have errors as small as possible, i.e., equal to the corresponding radii of information. The strongly (locally) and globally optimal error algorithms are therefore defined as follows: The algorithm φ* is a strongly (locally) optimal error algorithm iff

e(φ*, y) = r(N, y) for every y ∈ N(F).

The algorithm φ** is a (globally) optimal error algorithm iff

e(φ**) = r(N).
These definitions immediately imply that every strongly optimal algorithm is also globally optimal. It may happen though that the local error of a globally optimal algorithm is much larger than the local radius of information.
Central algorithms

We suppose now that for every y ∈ N(F) there exists c = c(y) which is the center of a smallest ball containing the set U(y). We then define the central algorithm φᶜ by

φᶜ(y) = c(y).

Lemma 15.1 An algorithm is central iff it is a strongly optimal error algorithm.
The proof of this lemma is left to the reader as an exercise at the end of this chapter.

Interpolatory algorithms

It is important to analyze algorithms that are "nearly" strongly optimal, or more precisely that compute a solution belonging to the set U(y). Such algorithms will be called interpolatory, since they provide an exact solution for some element f̃ that interpolates our unknown f with respect to the given information N. An algorithm φ is called interpolatory iff

φ(y) = s(f̃) for some f̃ ∈ V(y),

for every y ∈ N(F). It turns out that the local error of interpolatory algorithms is at most twice the local radius of information. This property is formulated in the following theorem.

Theorem 15.1 For every interpolatory algorithm φ, and every y ∈ N(F), we have

(i) e(φ, y) ≤ 2 r(N, y)

and

(ii) e(φ) ≤ 2 r(N).

Proof We only show (i), since the proof of (ii) is similar. To show (i) it is enough to prove that e(φ, y) ≤ d(N, y) (see Lemma 13.1 (i) and 13.1). We observe that

e(φ, y) = sup_{f̂ ∈ V(y)} ‖s(f̃) − s(f̂)‖ ≤ d(N, y),

which completes the proof. •
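The factor-of-two bound of Theorem 15.1 is easy to check on a finite real-valued stand-in for U(y), where the interpolatory algorithm may return any element of U(y) (a sketch with our own names):

```python
def interpolatory_local_error(U, pick):
    """Local error of the interpolatory choice pick in U (a finite
    stand-in for U(y)), together with the local radius; by
    Theorem 15.1 the error is at most twice the radius."""
    assert pick in U                      # interpolatory: pick = s(f~)
    err = max(abs(pick - s) for s in U)   # e(phi, y)
    radius = (max(U) - min(U)) / 2.0      # r(N, y) = d(N, y)/2 for reals
    return err, radius
```

Picking an extreme element of U(y) makes the error equal to the diameter, i.e., exactly twice the radius, so the constant 2 cannot be improved in general.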
1.2.2 Optimal Complexity Methods
We stress that to compute an e-approximation M(f) S ( f , e ) , for every / F, we need to utilize a method M — ( , N), such that the (global) error of the algorithm is at most e, e( ) e. We discuss here complexity issues of computational methods. We use the real number model of computation, which is defined by the following two postulates: 1. We assume that each information operation (oracle call) L;(•) costs c > 0 units of time. 2. We assume that multiplication by a scalar or addition (subtraction) of elements in the set G has unit cost. Since G C d, the unit cost corresponds to the cost of d arithmetic operations. We make this assumption to normalize our cost function. In addition we assume that comparisons and evaluations of certain elementary function also have unit cost. We define the worst case cost of a method M = ( , N), as:
where the cost(N(f)) is the cost of computing the information vector y = N ( f ) , and cost( (N(/))) is the cost of combining the information y to compute M(f) =
(y) with using of the algorithm . A method M° = ( , N°) is called an optimal complexity method if it guarantees to compute an e-approximation M(/) € S ( f , e ) , for every problem element F, with a minimal worst case cost, among all methods that belong to a specific class M, cost(M°) = comp(e), where such that
This minimal cost comp(ε) is called the ε-complexity of the problem S in the class 𝓜. Examples of classes of methods include the following:

1. information consisting of parallel evaluations of linear functionals combined with an arbitrary algorithm;

2. information consisting of parallel function samples combined with any algorithm;

3. information consisting of sequential function samples combined with an arbitrary algorithm, etc.
We recall that the information complexity m(ε) in the class Λ is the minimal number of samples Lᵢ(·) ∈ Λ in the information Nₙ = [L₁(·), ..., Lₙ(·)] that is needed to compute an ε-approximation for every f ∈ F. Formally,

m(ε) = min{ n : there exists Nₙ as above with r(Nₙ) ≤ ε }.
We now suppose that we can construct information N° consisting of n = m(ε) samples, such that the radius of N° is at most ε, r(N°) ≤ ε. To summarize, N° is such that:

1. The number of samples, n, equals m(ε).

2. The radius r(N°) ≤ ε.

3. The cost(N°(f)) for a worst f ∈ F is approximately equal to c·m(ε).
We further assume that there exists an algorithm φ° using the information N° such that:

4. The error of φ° is equal to the radius of information N°, e(φ°) = r(N°).

5. The cost of combining N°(f) with the algorithm φ° is much smaller than the cost of computing N°:

cost(φ°(N°(f))) ≪ cost(N°(f)) for every f ∈ F.
Under these assumptions the cost of the method M° = (φ°, N°) is approximately equal to c·m(ε),

cost(M°) ≈ c·m(ε).
We observe that to compute an ε-approximation M(f) ∈ S(f, ε) for every f ∈ F we have to use information Nₙ with the number of samples n ≥ m(ε). Therefore, the cost of any method M that solves our problem must be at least c·m(ε),

cost(M) ≥ c·m(ε).
These arguments imply that the method M° is an almost optimal complexity method in the class 𝓜 of all methods M = (φ, N), where N is any information consisting of evaluations Lᵢ(·) ∈ Λ and φ is an arbitrary algorithm. Furthermore, we conclude that the ε-complexity of the problem is approximately

comp(ε) ≈ c·m(ε),
which establishes an important relationship between the information complexity and the ε-complexity of the problem.

Conclusion Any method satisfying conditions (1)-(5) above is an almost optimal complexity method. The ε-complexity of the problem is approximately equal to c·m(ε). ∎

Fortunately, many important applications satisfy conditions (1)-(5). These include the zero finding problem in the case of opposite signs of a function at the endpoints of an interval, other zero finding problems as described in chapter 2, and many of the fixed point problems in chapters 3 and 4.
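As an illustration (ours, not from the monograph), here is a minimal Python sketch of this cost model for bisection on [0, 1], assuming an oracle cost c per evaluation and unit combinatory cost per step; the oracle term c·m(ε) dominates, so comp(ε) ≈ c·m(ε).

```python
import math

def bisection_cost(eps, c):
    """Worst case cost model for bisection on [0, 1]:
    m(eps) oracle calls at cost c each, plus O(1) arithmetic per step."""
    m = math.ceil(math.log2(1.0 / eps))   # information complexity m(eps)
    combinatory = 2 * m                   # one midpoint + one comparison per step
    return c * m + combinatory

eps, c = 1e-6, 100.0
total = bisection_cost(eps, c)
oracle_part = c * math.ceil(math.log2(1.0 / eps))
# condition 5: combinatory cost is much smaller than the oracle cost
assert total - oracle_part < 0.05 * oracle_part
```

With c = 100 and ε = 10⁻⁶ the oracle part is 100·20 = 2000 of the total 2040 units, matching the claim that comp(ε) is roughly c·m(ε).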
1.2.3 Asymptotic Setting
In classical numerical analysis the asymptotic analysis of methods is commonly undertaken. In this case we approximate an element s(f) ∈ S(f) by a sequence of methods Mₙ(f) = φₙ(Nₙ(f)), where Nₙ(f) consists of n, in general sequential, evaluations on f. In this setting we are interested in achieving the best possible speed of convergence of the sequence ‖s(f) − Mₙ(f)‖ for every function in the class F. Here the complexity analysis is of secondary importance. The problem is to find a sequence of algorithms φₙ and information Nₙ that guarantee the best speed of convergence. This setting will be analyzed in section 2.1.6 for the class of smooth univariate functions changing sign at the endpoints of an interval.
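For intuition (our own illustration, not part of the text): bisection converges linearly, gaining roughly one bit per evaluation, while Newton's method, for a simple zero and a good starting point, converges quadratically. A small Python comparison:

```python
import math

def bisection_errors(f, a, b, root, n):
    """Errors of the bisection midpoints; assumes f(a) <= 0 <= f(b)."""
    errs = []
    for _ in range(n):
        m = 0.5 * (a + b)
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
        errs.append(abs(0.5 * (a + b) - root))
    return errs

def newton_errors(f, df, x, root, n):
    """Errors of the Newton iterates x <- x - f(x)/f'(x)."""
    errs = []
    for _ in range(n):
        x = x - f(x) / df(x)
        errs.append(abs(x - root))
    return errs

f = lambda x: x * x - 2.0            # simple zero at sqrt(2)
df = lambda x: 2.0 * x
root = math.sqrt(2.0)
eb = bisection_errors(f, 1.0, 2.0, root, 8)
en = newton_errors(f, df, 1.5, root, 8)
# bisection: error halves per step; Newton: correct digits roughly double
```

After eight steps bisection is accurate to about 2⁻⁹, while Newton has long since reached machine precision, which is the asymptotic-rate distinction this section is about.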
1.2.4 Exercises
1. Carry out the proof of Lemma 13.1.

2. Verify (13.2).

3. Verify formally Lemma 15.1.

4. Carry out the proof of part (ii) of Theorem 15.1.
5. Suppose that the set S(f) is not a singleton, as was assumed on page 11. Derive the general formula for the local radius of information with respect to the absolute error criterion, which is given by:

r(N, y) = inf_{x} sup_{f ∈ F, N(f) = y} dist(x, S(f)), where dist(x, S(f)) = inf_{α ∈ S(f)} ‖x − α‖.
6. (*) Carry out the proofs of all major results in this chapter using the generalized definition of the radius of information from the previous exercise.
1.3 Annotations
This chapter is based on the theory outlined in several monographs: [4], [7]-[11], [6], [12]. In these the reader will find the history of research devoted to optimal information and algorithms for several important applications, as well as an exhaustive bibliography of the field presently known as Information-Based Complexity [8]-[11]. These monographs go well beyond the worst case model discussed in our book. They analyze extensions of the theory to different settings, including average, probabilistic, and asymptotic analysis [11]. Moreover, they fully develop the theory and applications for linear problems [11]. A formalization of the real number model is given by Blum, Shub, and Smale [1], [2]. In our analysis we use the extended real number model with the inclusion of oracle calls. This formal extension can be found in the paper of Novak [5]. The notion of numerical stability for solving nonlinear equations was introduced by Wozniakowski [13]. In a recent paper [14], Wozniakowski gives a complete floating point analysis of the bisection method, together with an informal review of several computational models used in scientific computation. An excellent monograph devoted to the stability of numerical algorithms was recently published by Higham [3]. The results of exercises 5 and 6 were further generalized in [10] to the case of sets without any norm.
Bibliography
[1] Blum, L., Shub, M., and Smale, S. On a theory of computation and complexity over the real numbers: NP completeness, recursive functions and universal machines. Bull. Amer. Math. Soc., 21: 1-46, 1989.

[2] Blum, L., Cucker, F., Shub, M., and Smale, S. Complexity and Real Computation. Springer-Verlag, New York, 1998.

[3] Higham, N. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.

[4] Kowalski, M., Sikorski, K., and Stenger, F. Selected Topics in Approximation and Computation. Oxford University Press, New York, 1995.

[5] Novak, E. The real number model in numerical analysis. J. Complexity, 11: 57-73, 1995.

[6] Novak, E. Deterministic and Stochastic Error Bounds in Numerical Analysis. Vol. 1349 of Lecture Notes in Math. Springer-Verlag, Berlin, 1988.

[7] Plaskota, L. Noisy Information and Computational Complexity. Cambridge University Press, New York, 1996.

[8] Traub, J. F., and Werschulz, A. G. Complexity and Information. Cambridge University Press, New York, 1998.

[9] Traub, J. F., and Wozniakowski, H. A General Theory of Optimal Algorithms. Academic Press, New York, 1980.

[10] Traub, J. F., Wasilkowski, G., and Wozniakowski, H. Information, Uncertainty, Complexity. Addison-Wesley, Reading, MA, 1983.

[11] Traub, J. F., Wasilkowski, G., and Wozniakowski, H. Information-Based Complexity. Academic Press, New York, 1988.

[12] Werschulz, A. G. The Computational Complexity of Differential and Integral Equations. Oxford University Press, New York, 1991.
[13] Wozniakowski, H. Numerical stability for solving nonlinear equations. Numer. Math., 27: 373-390, 1977.

[14] Wozniakowski, H. Why does information-based complexity use the real number model? Theor. Comp. Sci., 219: 451-466, 1999.
Chapter 2
Nonlinear Equations

In this chapter we address the problem of approximating zeros α of nonlinear functions f, f(α) = 0, where f ∈ F ⊂ {f : D ⊂ ℝ^d → ℝ^d}. In order to define our solution operators, we first review several error criteria that are commonly used to measure the quality of approximations to zeros of nonlinear equations. This is done for univariate functions f : [a, b] → ℝ. Straightforward generalizations to the multivariate case are based on replacing the magnitude function |·| by a specific norm. These are considered in section 2.2, where we review multivariate problems.

Error criteria
A number of error criteria are used in practice for approximating a zero α of f. For instance, one may wish to find a number x = x(f, ε) such that one of the following conditions is satisfied:

root (absolute) criterion: |x − α| ≤ ε for some α ∈ S(f),
relative root criterion: |x − α| ≤ ε|α| for some α ∈ S(f),
residual criterion: |f(x)| ≤ ε,
relative residual criterion: |f(x)| ≤ ε‖f‖,

where ε is a nonnegative real number specifying the error tolerance. A general formulation of the problem can be given in terms of a general error criterion as follows. We define E to be a nonnegative real function E : F × [a, b] → ℝ₊. Then the problem can be formulated by the solution operator S(f, ε) given by

S(f, ε) = { x ∈ [a, b] : E(f, x) ≤ ε }.
We call the function E a general error criterion. Examples of E are as follows:

E(f, x) = dist(x, S(f)) = inf_{α ∈ S(f)} |x − α|,

which corresponds to the root criterion;

E(f, x) = inf_{α ∈ S(f)} |x − α| / |α|,

which corresponds to the relative root criterion;

E(f, x) = |f(x)|,

which corresponds to the residual criterion; and

E(f, x) = |f(x)| / ‖f‖,

which defines the relative residual criterion.

Remark It is not difficult to generalize the definitions of the radius of information and of the error of an algorithm presented in section 1.2.1. Namely,

r(N, y) = inf_{x} sup_{f ∈ F, N(f) = y} E(f, x)   (24.1)

and

e(φ) = sup_{f ∈ F} E(f, φ(N(f))).

As an analog to the result in section 1.2.1 we can prove that the radius of information is a tight lower bound on the error of any algorithm φ,

e(φ) ≥ r(N) = sup_{y ∈ N(F)} r(N, y).

The proof of this result is left as an exercise for the reader. ∎
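To make the four criteria concrete, here is a small illustrative check (our own example, not from the text) evaluating each E(f, x) for f(x) = x² − 2 on [1, 2], whose only zero there is √2:

```python
import math

def E_root(x, zeros):
    """Root (absolute) criterion: distance from x to the zero set."""
    return min(abs(x - a) for a in zeros)

def E_rel_root(x, zeros):
    """Relative root criterion."""
    return min(abs(x - a) / abs(a) for a in zeros)

def E_residual(f, x):
    """Residual criterion."""
    return abs(f(x))

def E_rel_residual(f, x, f_norm):
    """Relative residual criterion (f_norm is a norm of f, assumed given)."""
    return abs(f(x)) / f_norm

f = lambda x: x * x - 2.0
zeros = [math.sqrt(2.0)]      # zeros of f in [1, 2]
f_norm = 2.0                  # sup norm of f on [1, 2]: |f(2)| = 2

x = 1.5
# |1.5 - sqrt(2)| ~ 0.0858, |f(1.5)| = 0.25: the same approximation can be
# acceptable under one criterion and rejected under another.
```

The same point x = 1.5 is a 0.1-approximation under the root criterion but not under the residual criterion, which is exactly why the choice of criterion matters.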
2.1 Univariate Problems
In this section we review several classes of univariate functions with emphasis on optimal methods and tight complexity bounds. In general we define the class of functions F as a subset of

F̃ = { f : [a, b] → ℝ },

where some restrictions on smoothness, additional properties (like sign change), and bounds on the norm are analyzed in the following sections.

In sections 2.1.1-2.1.4 we consider classes F ⊂ F̃ of functions which have some number of continuous derivatives, a nonempty set of zeros, and a bounded norm (or seminorm). The particular selection of the class depends on the error criterion and is specified below. We start our analysis in section 2.1.1 with the class C^∞ of infinitely many times differentiable functions changing sign at the endpoints of the interval [0, 1]. We consider the absolute error criterion and analyze methods based on sequential evaluations of arbitrary linear functionals. We show that in this class the bisection method is optimal. In sections 2.1.2-2.1.4 we do not assume that the functions change sign at the endpoints. In this case the zero finding problem under the absolute error criterion becomes unsolvable in the worst case; however, it can be solved in the residual sense. More precisely, in section 2.1.2 we analyze the absolute error criterion. We prove that there exists no method to find x, |x − α| ≤ ε, with ε < (b − a)/2, for the class F₁ of infinitely often differentiable functions with simple zeros, whose arbitrary seminorm is bounded by 1. We stress that this result holds independently of which and how many linear functionals are evaluated. The same result holds for the relative root criterion with ε < (b − a)/(b + a + 2δ) and a ≥ 0 (section 2.1.4). In section 2.1.3 we consider the residual error criterion and deal with the class F₂ of functions having zeros, whose rth derivative is bounded by one, r ≥ 1. We find almost optimal information and an algorithm. The analysis makes extensive use of Gelfand n-widths. The almost optimal information consists of n parallel function evaluations, and the algorithm is based on a pair of perfect splines interpolating f. This algorithm yields a point x such that f(x) = O(n^{−r}).
For small r we present a different algorithm and information that are also almost optimal and much easier to compute than the algorithm based on perfect splines. If n is large enough, n of order ε^{−1/r}, then the residual criterion is satisfied. By contrast, we prove that the relative residual criterion is never satisfied (section 2.1.4). In section 2.1.4 we also discuss a general error criterion and find a lower bound on the error of optimal algorithms in terms of Gelfand widths.

We compare the results for the root and residual criteria. We consider the class F₁ with the seminorm given by ‖f‖ = ‖f^{(r)}‖_∞. Then F₁ is a proper subset of F₂. This shows that for the class F₁ it is possible to satisfy the residual criterion while it is impossible to satisfy the root criterion. On the other hand, there exist classes of functions for which one draws the opposite conclusion. For example, for the class of continuous functions that have opposite signs at a and b, one can easily show that it is possible to satisfy the root criterion and impossible to satisfy the residual criterion. This shows how essential the specification of the error criterion is for solving nonlinear algebraic equations.

We finally summarize the results of sections 2.1.5-2.1.6. In section 2.1.5 we generalize the results of section 2.1.1 to the case of polynomials of (in general) unbounded degree that change sign at the endpoints of [a, b]. We show that even in this strongly restricted class the bisection method remains optimal. We also find an optimal parallel method, which is based on evaluations of a polynomial at equidistant points in [a, b]. In section 2.1.6 we analyze the asymptotic rate of convergence of methods approximating zeros of infinitely many times differentiable functions that change sign at the endpoints of [0, 1]. We show that for functions having zeros of infinite multiplicity we cannot essentially improve on the linear rate of convergence of the bisection method.
This result holds for methods based on arbitrary sequential evaluations of continuous linear functionals. When the multiplicity of zeros is bounded, there exist methods that converge asymptotically with at least quadratic rate. The general setting of the iterative model of computation is reviewed in the annotations to this chapter, where we give several references to the classical literature devoted to this model. In the annotations we also briefly review the average case model of computation and list references addressing univariate problems. The multivariate average case analysis is a widely open research area.

Complexity results

We briefly summarize complexity results for solving univariate problems. We recall that the worst case complexity is defined as the minimal (worst case) cost of a method that computes ε-approximations for all functions in the considered class. As in chapter 1, we assume that each function evaluation costs c units of time, and that the cost of arithmetic operations, comparisons, and evaluations of elementary functions is taken as unity (usually c ≫ 1). Our positive results for the bisection method in sections 2.1.1 and 2.1.5 show that the ε-complexity in the corresponding classes of functions is proportional to c·log(1/ε). The negative results in sections 2.1.2-2.1.4 for the root, relative root, and relative residual criteria show that the ε-complexity for these problems is infinite. For the residual criterion, the results of section 2.1.3 imply that for small r the ε-complexity is proportional to c·ε^{−1/r}. For large r, we can conclude that the ε-complexity is at least proportional to c·ε^{−1/r}. It may be much larger, since the algorithms based on perfect splines may require much more than c·ε^{−1/r} arithmetic operations.
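A quick numerical illustration (ours) of these two growth rates: c·log₂(1/ε) for the bisection-type problems versus c·ε^{−1/r} for the residual problem:

```python
import math

def comp_bisection(eps, c):
    """epsilon-complexity proportional to c * log(1/eps)."""
    return c * math.ceil(math.log2(1.0 / eps))

def comp_residual(eps, c, r):
    """epsilon-complexity proportional to c * eps**(-1/r)."""
    return c * eps ** (-1.0 / r)

c = 1.0
for eps in (1e-2, 1e-4, 1e-6):
    # log(1/eps) grows slowly; eps**(-1/r) grows polynomially in 1/eps
    print(eps, comp_bisection(eps, c), comp_residual(eps, c, r=2))
```

Tightening ε from 10⁻² to 10⁻⁶ triples the bisection cost but multiplies the residual-criterion cost by a factor of 100 (for r = 2), which is the qualitative gap summarized above.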
2.1.1 Optimality of the Bisection Method
We consider here a zero finding problem for smooth functions that change sign at the endpoints of an interval. The goal is to compute an ε-approximation with respect to the root criterion. This can be accomplished by using bisection, hybrid bisection-secant, or bisection-Newton type methods. We show that in the worst case the bisection method is optimal in the class of all methods using arbitrary sequential linear information. This holds even for infinitely many times differentiable functions having only simple zeros.
Formulation of the problem
We let F̃ = C^∞[0, 1] be the space of infinitely many times differentiable functions f : [0, 1] → ℝ. We let F = { f ∈ F̃ : f(0) < 0 < f(1), and there exists exactly one α = α(f) such that f(α) = 0 }. The solution operator S is defined as the map

f ↦ α(f),

with the singleton set S(f) = {α(f)}. We wish to compute an ε-approximation M = M(f), which belongs to the set S(f, ε),

|M(f) − α(f)| ≤ ε,

for every function f ∈ F. To accomplish this we use arbitrary sequential information with predefined cardinality n, N = Nₙ, as defined in chapter 1,
where y = [y₁, ..., yₙ], y₁ = L₁(f), yᵢ = Lᵢ(f; y₁, ..., y_{i−1}), and Lᵢ,f(·) = Lᵢ(·; y₁, ..., y_{i−1}) is a linear functional, i = 1, ..., n. The bisection information N_bis is in particular given by the functionals

Lᵢ(f; y₁, ..., y_{i−1}) = f(xᵢ)

for xᵢ = (a_{i−1} + b_{i−1})/2 with a₀ = 0, b₀ = 1, and

(aᵢ, bᵢ) = (a_{i−1}, xᵢ) if yᵢ ≥ 0,

and

(aᵢ, bᵢ) = (xᵢ, b_{i−1}) if yᵢ < 0.
Knowing Nₙ(f) we approximate α(f) by an algorithm φ,

M(f) = φ(Nₙ(f)).

As in chapter 1, the worst case error of an algorithm φ is

e(φ) = sup_{f ∈ F} |φ(Nₙ(f)) − α(f)|.
The radius of information is given by

r(Nₙ) = sup_{y ∈ Nₙ(F)} r(Nₙ, y),

and, as in the general case, it is a lower bound on the error of any algorithm,

e(φ) ≥ r(Nₙ) for every φ ∈ Φ(Nₙ),

where Φ(Nₙ) is the class of all algorithms using Nₙ. The bisection algorithm is defined as

φ_bis(Nₙ(f)) = (aₙ + bₙ)/2.

It is easy to verify that

e(φ_bis) = r(N_bis) = 2^{−(n+1)}.
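A runnable sketch (ours) of the bisection method M_bis = (φ_bis, N_bis) on [0, 1], assuming the oracle returns the values f(xᵢ):

```python
def bisect(f, n):
    """Bisection information N_bis and algorithm phi_bis on [0, 1]:
    n adaptive evaluations, then the midpoint of the final interval."""
    a, b = 0.0, 1.0
    for _ in range(n):
        x = 0.5 * (a + b)          # x_i = (a_{i-1} + b_{i-1}) / 2
        y = f(x)                   # oracle call L_i(f) = f(x_i)
        if y < 0.0:
            a = x                  # zero lies in [x_i, b_{i-1}]
        else:
            b = x                  # zero lies in [a_{i-1}, x_i]
    return 0.5 * (a + b)           # phi_bis(y) = (a_n + b_n) / 2

# f(0) < 0 < f(1) with the unique zero alpha = 0.3
x = bisect(lambda t: t - 0.3, 20)
# worst case error bound: |x - alpha| <= 2**-(n+1)
```

With n = 20 oracle calls the error is guaranteed below 2⁻²¹, matching e(φ_bis) = 2^{−(n+1)}.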
Optimality theorem

We show that the bisection method M_bis = (φ_bis, N_bis) is optimal, i.e.,

Theorem 29.1 The following equation holds:

inf_{N ∈ 𝓛, card(N) = n} r(N) = r(N_bis) = 2^{−(n+1)},

where 𝓛 is the class of all sequential information. To prove this theorem we need two lemmas. We first denote by Iᵢ any closed subintervals of [0, 1], i = 1, ..., k, and let

Cₖ = C(I₁, ..., Iₖ)

denote the set of functions in F̃ that vanish outside I₁ ∪ · · · ∪ Iₖ. We let Lᵢ : F̃ → ℝ be linearly independent linear functionals. Then we prove

Lemma 29.1 For every δ, 0 < δ < 1, and every family of intervals Iᵢ ⊂ [0, 1] with diameter diam(Iᵢ) = δ, i = 1, ..., k − 1, such that the functionals Lᵢ, i = 1, ..., k − 1, are linearly independent on C_{k−1}, there exists an interval Iₖ ⊂ [0, 1] with diam(Iₖ) = δ such that the functionals Lᵢ, i = 1, ..., k, are linearly independent on Cₖ.
Proof We suppose that such an interval Iₖ does not exist. Therefore, for every choice of Iₖ ⊂ [0, 1] with diam(Iₖ) = δ, the functionals L₁, ..., Lₖ are linearly dependent on Cₖ. Since L₁, ..., L_{k−1} are linearly independent on C_{k−1}, this yields

Lₖ = Σ_{i=1}^{k−1} aᵢ(Iₖ) Lᵢ on Cₖ.

We first assume that aᵢ(Iₖ) = aᵢ for every choice of Iₖ. We let Iⱼ = [cⱼ, c_{j+1}] for any numbers cⱼ, j = 1, ..., q, such that c₁ = 0, c_q = 1 − δ (q < ∞), and cⱼ < c_{j+1} ≤ cⱼ + δ. Then the Iⱼ form a δ-covering of [0, 1]. By the unity decomposition theorem we can decompose any f ∈ F̃ as

f = Σ_{j=1}^{q} fⱼ with fⱼ ∈ C(Iⱼ).

Therefore

Lₖ(f) = Σⱼ Lₖ(fⱼ) = Σⱼ Σ_{i=1}^{k−1} aᵢ Lᵢ(fⱼ) = Σ_{i=1}^{k−1} aᵢ Lᵢ(f)

for any f ∈ F̃. This equation contradicts the linear independence of L₁, ..., Lₖ on F̃. Therefore there must exist two intervals I¹ and I² such that

Lₖ = Σ_{i=1}^{k−1} βᵢ Lᵢ on C(I₁, ..., I_{k−1}, I¹)

and

Lₖ = Σ_{i=1}^{k−1} γᵢ Lᵢ on C(I₁, ..., I_{k−1}, I²),

with βⱼ ≠ γⱼ for some index j. This implies that

Σ_{i=1}^{k−1} (βᵢ − γᵢ) Lᵢ = 0 on C_{k−1}.

Since βⱼ − γⱼ ≠ 0, this contradicts the linear independence of L₁, ..., L_{k−1} on C_{k−1} and completes the proof. ∎

The next lemma guarantees the existence of specific worst case functions.
Lemma 31.1 For every adaptive information Nₙ and for every δ, 0 < δ < 2^{−n}/(n + 1), there exists a function fₙ ∈ F̃, points x_{1,n} and x_{2,n}, and intervals Iⱼ with diam(Iⱼ) = δ, j = 1, ..., n, such that

(i) L_{1,fₙ}, ..., L_{n,fₙ} are linearly independent on Cₙ,

(ii) x_{2,n} − x_{1,n} ≥ 2^{−n} − (n + 1)δ and dist(Iⱼ, [x_{1,n}, x_{2,n}]) ≥ δ/2 for every j,

(iii) fₙ is strictly increasing on [x_{1,n} − δ/2, x_{1,n}] and on [x_{2,n}, x_{2,n} + δ/2].
Proof We use induction on n. We suppose first that n = 1. Since L₁ is not 0 on F̃, then as in the proof of Lemma 29.1 we conclude that there exists an interval I₁ = [c, c + δ], I₁ ⊂ [0, 1], such that L₁ ≠ 0 on C₁ = C(I₁). We assume without loss of generality that c ≥ 0.5 − δ/2, and define the function f₁. We note that f₁ ∈ F̃ and that L_{1,f₁} = L₁ is linearly independent on C₁. By taking x_{1,1} = δ/2 and x_{2,1} = c − δ/2 we obtain

x_{2,1} − x_{1,1} = c − δ ≥ 2^{−1} − 2δ

and

dist(I₁, [x_{1,1}, x_{2,1}]) ≥ δ/2,

and f₁ is strictly increasing on [x_{1,1} − δ/2, x_{1,1}] and on [x_{2,1}, x_{2,1} + δ/2]. This completes the case n = 1. We now assume that our result holds for some n ≥ 1 with a function fₙ. The information N_{n+1} consisting of n + 1 evaluations yields a functional L_{n+1,fₙ}. Without loss of generality we assume
that L_{1,fₙ}, ..., L_{n,fₙ}, L_{n+1,fₙ} are linearly independent on F̃ (see also the annotations to this section). Then Lemma 29.1 implies that there exists an interval I_{n+1} ⊂ [0, 1] with diam(I_{n+1}) = δ such that L_{1,fₙ}, ..., L_{n+1,fₙ} are linearly independent on C_{n+1}. If dist(I_{n+1}, [x_{1,n}, x_{2,n}]) ≥ δ/2, then Lemma 31.1 holds for f_{n+1} = fₙ with the same points x_{1,n} and x_{2,n}. We therefore assume that
dist(I_{n+1}, [x_{1,n}, x_{2,n}]) < δ/2. We suppose without loss of generality that I_{n+1} = [c, c + δ] lies closer to x_{2,n}. Then we take a function g ∈ F̃ that vanishes on [0, c − δ/2] and is strictly increasing on [c − δ/2, 1], and a function h ∈ C_{n+1} such that

L_{i,fₙ}(g + h) = 0, i = 1, ..., n + 1.

Such a function h exists since the functionals L_{1,fₙ}, ..., L_{n+1,fₙ} are linearly independent on C_{n+1}. Therefore

Nₙ(fₙ + d(g + h)) = Nₙ(fₙ) for every constant d.

The functions g, h, and fₙ are illustrated in Figure 33.1. We now take a positive constant d so small that the monotonicity of fₙ on the intervals in (iii) is preserved, and define the function

f_{n+1} = fₙ + d(g + h),

which obviously belongs to F̃. Moreover, since Nₙ(f_{n+1}) = Nₙ(fₙ), we have L_{i,f_{n+1}} = L_{i,fₙ} for i = 1, ..., n + 1. We define x_{1,n+1} = x_{1,n} and x_{2,n+1} = c − δ/2. Then x_{2,n+1} − x_{1,n+1} > 2^{−n−1} − (n + 2)δ, and dist(Iⱼ, [x_{1,n+1}, x_{2,n+1}]) ≥ δ/2 for every j. The choice of the constant d and of the function g yields that f_{n+1} satisfies the assertions (ii) and (iii). Therefore Lemma 31.1 holds for the function f_{n+1}. ∎

We are now ready to prove the main result of this section.
Figure 33.1: The functions g, h, and fₙ.

Proof of Theorem 29.1 We construct two functions g₁ and g₂ from F such that N(g₁) = N(g₂) and

|α(g₁) − α(g₂)| > 2^{−n} − (n + 1)δ

for every δ, 0 < δ < 2^{−n}/(n + 1). Then the proof follows immediately from the definition of the radius of information on page 29, with δ approaching zero. To construct g₁ and g₂ we apply the technique presented in the proof of Lemma 31.1. Namely, we take the function fₙ and the functionals L_{i,fₙ} from Lemma 31.1, and define a function g ∈ F̃. We then take a function h ∈ Cₙ such that L_{i,fₙ}(g + h) = 0 for every i = 1, ..., n. As before, we choose a positive constant d sufficiently small, and define the functions g₁ and g₂.
Both g₁ and g₂ belong to F̃ and have only simple and single zeros, α(g₁) ∈ (x_{1,n} − δ/2, x_{1,n}) and α(g₂) ∈ (x_{2,n}, x_{2,n} + δ/2), since fₙ is strictly increasing on these intervals. Therefore we conclude that g₁ and g₂ belong to F. Moreover, we note that Nₙ(g₁) = Nₙ(g₂) and |α(g₁) − α(g₂)| > 2^{−n} − (n + 1)δ, which means that these functions satisfy all of the requirements. The proof is complete. ∎

Corollary 34.1 In order to compute an ε-approximation to the zero of every function in our class, one needs m(ε) = ⌈log₂ ε^{−1}⌉ function evaluations. The number m(ε) is minimal with that property, i.e., the information complexity of the problem is equal to m(ε).
2.1.2 Root Criterion in C^∞
We define F̃ = C^∞[a, b] to be the linear space of infinitely often differentiable functions, and set S(f) to be the set of zeros of f,

S(f) = { α ∈ [a, b] : f(α) = 0 }.

We let ‖·‖ be an arbitrary seminorm on F̃. We consider the subclass F₁ to be the set of functions in F̃ that have only simple zeros and whose seminorm is bounded by one, i.e., F₁ = { f ∈ F̃ : S(f) ≠ ∅, f′(α) ≠ 0 for every α ∈ S(f), and ‖f‖ ≤ 1 }. For a given ε and any function f ∈ F₁ we want to find an ε-approximation M = M(f) satisfying the root criterion,

dist(M(f), S(f)) ≤ ε.
We want to solve this problem by using arbitrary sequential information N and an arbitrary algorithm φ. For this specific problem we have

r(N, y) = inf_{x} sup_{f ∈ F₁, N(f) = y} dist(x, S(f)),

where dist(x, S(f)) = inf_{α ∈ S(f)} |x − α| for f ∈ F₁. As before, we get for every algorithm φ

e(φ) ≥ r(N).
We prove here a negative result showing that the problem is too difficult to be solved with finite cost whenever ε < (b − a)/2. As a consequence, no computational method exists that solves our problem in this class of functions. For positive results we need to consider a weaker error criterion, e.g., the residual error analyzed in the next section, or introduce more restrictions on the class of functions, as was done in section 2.1.1. The main result here is given in the following theorem.

Theorem 35.1 If card(N) = n < +∞, then

r(N) = (b − a)/2.
Proof We first set φ(N(f)) = (a + b)/2 for any information N and get e(φ) ≤ (b − a)/2, and as a consequence, r(N) ≤ (b − a)/2. To prove the reverse inequality we construct, for every γ, 0 < γ < (b − a)/2, two functions f₁ and f₂ from F₁ such that N(f₁) = N(f₂) and dist(S(f₁), S(f₂)) ≥ b − a − 2γ. Then the result of the theorem follows with γ tending to zero. We first construct the function f₁. We let n = card(N) and define the points

xᵢ = a + γ i/(n + 1)

for i = 0, 1, ..., n + 1, and the functions hᵢ, i = 1, 2, ..., n + 1. We first note that hᵢ ∈ F̃ and that max_{x ∈ [a,b]} |hᵢ(x)| = 1. Next we let d = max(‖1‖, max_{1≤i≤n+1} ‖hᵢ‖) and take a positive δ. If d > 0, we assume that δ ≤ 1/(4(n + 1)d). For a given vector c = [c₀, c₁, ..., c_{n+1}] ∈ ℝ^{n+2}, we assume that c₀ ∈ [1, 3] and that max_{1≤i≤n+1} |cᵢ| = 1, and define the corresponding functions H and f_c.
We observe that H ∈ F̃ and f_c ∈ F̃ for any considered vector c. We observe that f_c(xᵢ) = δ and that f_c((x_{k−1} + x_k)/2) = δ − c₀δ < 0. Therefore f_c has a zero, and it has at most 2(n + 1) zeros inside [a, a + γ], S(f_c) ⊂ [a, a + γ]. We further note that f_c′(x) = 0 iff x = xᵢ, x = (x_{i−1} + xᵢ)/2, x ∈ [x_{i−1}, xᵢ] if cᵢ = 0, or x ∈ [a + γ, b]. For every choice of cᵢ, i = 1, ..., n + 1, such that maxᵢ |cᵢ| = 1, there exists c₀ ∈ [1, 3] such that c₀ |H((x_{i−1} + xᵢ)/2)| ≠ δ for i = 1, ..., n + 1. We let c* = [c₀, c₁, ..., c_{n+1}]. This choice implies that the function f_{c*} has only simple zeros and belongs to F₁. We now choose the constants c₁, ..., c_{n+1}, maxᵢ |cᵢ| = 1, so that the function f_{c*} has the same information as the function δ̄, N(f_{c*}) = N(δ̄), where δ̄(x) = δ for x ∈ [a, b]. To this end we take cᵢ's as above that satisfy a suitable linear equation.
Then L₁(f_{c*}) = L₁(δ̄) = y₁. If ter₁(y₁) = 1, then N(f_{c*}) = N(δ̄) and we are done. If not, we assume inductively that for some j, 1 ≤ j ≤ n − 1, we have terᵢ(y₁, ..., yᵢ) = 0, where

yᵢ = Lᵢ(δ̄; y₁, ..., y_{i−1})

for i = 1, ..., j. We then take any cᵢ, i = 1, ..., n + 1, with maxᵢ |cᵢ| = 1, such that the following system of linear equations is satisfied:

Lᵢ(f_{c*}; y₁, ..., y_{i−1}) = yᵢ, i = 1, ..., j + 1.

Such a choice exists, since j + 1 < n + 1 implies that there exists a nonzero solution of the above homogeneous system. We then have Lᵢ(f_{c*}) = yᵢ for i = 1, ..., j + 1. If ter_{j+1}(y₁, ..., y_{j+1}) = 1, then N(f_{c*}) = N(δ̄) and we are done. Otherwise,
then j + 1 ≤ n(f_{c*}) ≤ card(N) = n. Thus j ≤ n − 2 and the construction can be repeated for j + 1. Since n(f_{c*}) ≤ n for any choice of cᵢ, i = 1, ..., n + 1, with maxᵢ |cᵢ| = 1, there exists an index j such that terⱼ(y₁, ..., yⱼ) = 1 and N(f_{c*}) = N(δ̄), as claimed. We define f₁ = f_{c*}. To construct f₂ we proceed as above with xᵢ replaced by xᵢ* = b − γ i/(n + 1), i = 0, 1, ..., n + 1. Then f₂ ∈ F₁, N(f₂) = N(δ̄), and S(f₂) ⊂ [b − γ, b]. This implies that dist(S(f₁), S(f₂)) ≥ b − a − 2γ. By noting that N(f₁) = N(f₂) we complete the proof. ∎

Theorem 35.1 states that the error of any algorithm is at least (b − a)/2. Therefore for ε < (b − a)/2 there exists no method for which the root criterion is satisfied.
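The adversary idea behind this proof can be visualized with a small, purely illustrative numeric sketch (ours, with simplified bump-function constructions): given any fixed set of sample points, one can build two smooth functions that agree at every sample point yet hide their zeros near opposite ends of [a, b].

```python
import math

def bump(t):
    """C-infinity bump supported on (-1, 1), normalized so bump(0) = 1."""
    return math.exp(1.0 - 1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def adversary_pair(sample_pts, a, b, delta=0.1):
    """Two smooth functions equal to delta at every sample point, one
    dipping below zero near a, the other near b (illustrative only)."""
    def dipped(center, width):
        return lambda x: delta - 2.0 * delta * bump((x - center) / width)

    def free_spot(lo, hi):
        # midpoint of the largest sample-free gap inside [lo, hi]
        pts = sorted([lo] + [p for p in sample_pts if lo < p < hi] + [hi])
        width, center = max((q - p, 0.5 * (p + q)) for p, q in zip(pts, pts[1:]))
        return center, 0.4 * width

    c1, w1 = free_spot(a, a + 0.1 * (b - a))
    c2, w2 = free_spot(b - 0.1 * (b - a), b)
    return dipped(c1, w1), dipped(c2, w2)

pts = [0.1, 0.3, 0.5, 0.7, 0.9]     # the sample points seen by the information
f1, f2 = adversary_pair(pts, 0.0, 1.0)
# identical information, yet f1 has its zeros near 0 and f2 near 1
assert all(abs(f1(p) - f2(p)) < 1e-15 for p in pts)
assert f1(0.05) < 0.0 and f2(0.95) < 0.0
```

No algorithm using only these samples can distinguish f1 from f2, so its error is at least half the distance between their zero sets, which is the mechanism of Theorem 35.1.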
2.1.3 Residual Criterion in W_r^∞
We define F̃ = W_r^∞ to be the space of functions f : [a, b] → ℝ whose (r − 1)st derivative is absolutely continuous and such that the infinity norm of the rth derivative is finite, ‖f^{(r)}‖_∞ < +∞, r ≥ 1. We let S(f) = { α ∈ [a, b] : f(α) = 0 } denote the set of all zeros of f, let W̃_r^∞ = { f ∈ W_r^∞[a, b] : ‖f^{(r)}‖_∞ ≤ 1 }, and define the class F₂ as

F₂ = { f ∈ W̃_r^∞ : S(f) ≠ ∅ }.

The solution operator is defined now with respect to the residual error criterion as

S(f, ε) = { x ∈ [a, b] : |f(x)| ≤ ε }.

The error of an algorithm is now given by

e(φ) = sup_{f ∈ F₂} |f(φ(N(f)))|,

and the radius of information, which as before gives a sharp lower bound on the error of any algorithm, is given by

r(N) = sup_{y} inf_{x} sup_{f ∈ F₂, N(f) = y} |f(x)|.

We define C = C[a, b] to be the space of continuous functions on [a, b] equipped with the infinity norm ‖f‖_C = max_{x ∈ [a,b]} |f(x)|. An essential role in our analysis is played by the Gelfand nth width d^n(W̃_r^∞, C) of the set W̃_r^∞ in the space C, i.e.,

d^n(W̃_r^∞, C) = inf_{L₁,...,Lₙ} sup{ ‖f‖_C : f ∈ W̃_r^∞, Lᵢ(f) = 0, i = 1, ..., n },
where L₁, ..., Lₙ are linear functionals. A classical result of approximation theory gives, asymptotically in n,

d^n(W̃_r^∞, C) ≈ K_r ((b − a)/(2n))^r,

where K_r is the Favard constant,

K_r = (4/π) Σ_{j=0}^{∞} ( (−1)^j / (2j + 1) )^{r+1}.

We first show that the radius r(N) of any information of cardinality n is bounded from below by d^{n+1}(W̃_r^∞, C).

Theorem 38.1 We let n = card(N). Then

r(N) ≥ d^{n+1}(W̃_r^∞, C).
Proof We let η ∈ (0, d^{n+1}), where d^{n+1} = d^{n+1}(W̃_r^∞, C), and define the constant function

δ̄(x) = d^{n+1} − η, x ∈ [a, b],

and take any sequential information N with card(N) = n. For any choice of linear functionals L₁, ..., Lⱼ, j ≤ n, and any point z, z ∈ [a, b], we define the functionals

L̄ᵢ = Lᵢ, i = 1, ..., j, and L̄_{j+1}(f) = f(z).

From the definition of the Gelfand nth width we easily conclude that

sup{ ‖f‖_C : f ∈ W̃_r^∞, L̄ᵢ(f) = 0, i = 1, ..., j + 1 } ≥ d^{n+1}.   (38.1)

Therefore there exists a function f*, depending on L₁, ..., Lⱼ and z, such that

Lᵢ(f*) = 0, i = 1, ..., j, f*(z) = 0,

and f*(y) ≥ ‖δ̄‖_C for some point y ∈ [a, b]. We note that f* belongs to F₂, and define a function g that depends on f*,

g(x) = g(x; f*) = δ̄(x) − f*(x).

We observe that g(z) = d^{n+1} − η > 0 and g(y) ≤ 0. Thus g also belongs to F₂.
We now let N be arbitrary sequential information with card(N) = n. We find a function f* such that N(g(·; f*)) = N(δ̄). This is accomplished in a manner similar to that in the proof of Theorem 35.1. We take L₁ as the first functional of N and pick any point z, say z = (a + b)/2. By using (38.1) with j = 1 we find the function f₁* depending on L₁ and z, and set g₁ = g(·; f₁*). Then we have L₁(g₁) = L₁(δ̄) = y₁. If ter₁(y₁) = 1, then N(g₁) = N(δ̄) and we are done. If not, we assume inductively that for some index j, 1 ≤ j ≤ n − 1, terᵢ(y₁, ..., yᵢ) = 0 for i = 1, ..., j, where yᵢ = Lᵢ(δ̄; y₁, ..., y_{i−1}) and Lᵢ is the ith functional of N. Now we take L₁, L₂(·; y₁), ..., L_{j+1}(·; y₁, ..., yⱼ) and z = (a + b)/2. By using (38.1) for these functionals we get the function fⱼ* and gⱼ = g(·; fⱼ*). Then we have

Lᵢ(gⱼ; y₁, ..., y_{i−1}) = Lᵢ(δ̄; y₁, ..., y_{i−1}) = yᵢ

for i = 1, ..., j + 1. If ter_{j+1}(y₁, ..., y_{j+1}) = 1, then N(gⱼ) = N(δ̄) and we are done. If not, then j + 1 ≤ n(gⱼ) ≤ card(N) = n. Thus j ≤ n − 2 and the whole construction can be repeated for j + 1. Since n(gⱼ) ≤ n for all j, we find an index j*, j* ≤ n, such that ter_{j*}(y₁, ..., y_{j*}) = 1 and N(g_{j*}) = N(δ̄). We now select any algorithm φ using N. By applying φ to the function g_{j*} we get

φ(N(g_{j*})) = φ(N(δ̄)).

As the final step of the proof, we use (38.1) with the functionals L₁, L₂(·; y₁), ..., L_{j*}(·; y₁, ..., y_{j*−1}) and z = φ(N(δ̄)). Then we take the function f* depending on all these functionals and z, and g*(x) = δ̄(x) − f*(x), for which we have N(g*) = N(δ̄). The algorithm φ applied to g* yields φ(N(g*)) = φ(N(δ̄)) = z. We then have

|g*(z)| = |δ̄(z) − f*(z)| = d^{n+1} − η.

Since η is arbitrary we get e(φ) ≥ d^{n+1}. This completes the proof. ∎

We now exhibit information N* of cardinality n and an algorithm φ* using N*, such that e(φ*) ≤ 2 d^n(W̃_r^∞, C). This algorithm
is defined in terms of perfect splines of degree r. Precisely, we assume that r ≤ n and define X_{n−r,r} as the class of perfect splines s : [a, b] → ℝ of degree r which have n − r knots, i.e., for every s in X_{n−r,r} there exist tᵢ = tᵢ(s), a < t₁ < · · · < t_{n−r} < b, and aᵢ = aᵢ(s) such that

s(x) = Σ_{i=0}^{r−1} aᵢ xⁱ + (1/r!) ( xʳ + 2 Σ_{j=1}^{n−r} (−1)ʲ (x − tⱼ)₊ʳ ).

There exists a unique (up to multiplication by −1) perfect spline s_{n−r,r} from X_{n−r,r} with the minimal norm, i.e.,

‖s_{n−r,r}‖_C = min{ ‖s‖_C : s ∈ X_{n−r,r} }.

The spline s_{n−r,r} has n distinct zeros x₁*, ..., xₙ* and ‖s_{n−r,r}‖_C ≤ 2 d^n(W̃_r^∞, C). We define the information N* as

N*(f) = [ f(x₁*), ..., f(xₙ*) ]
for f ∈ W̃_r^∞. We now define the algorithm φ* using N* as follows. We let u and v be perfect splines of degree r with n − r knots ηᵢ and ζᵢ, respectively, i = 1, ..., n − r, interpolating f at the xᵢ*, i.e.,

u(xᵢ*) = v(xᵢ*) = f(xᵢ*), i = 1, ..., n,

and such that ‖u^{(r)}‖_∞ = ‖v^{(r)}‖_∞ = 1. We define

f₋(x) = min{ u(x), v(x) } and f₊(x) = max{ u(x), v(x) }.

Then f₋ and f₊ are the envelopes for the family of functions from W̃_r^∞ having the same information as f, i.e.,

f₋(x) = inf{ f̃(x) : f̃ ∈ W̃_r^∞, N*(f̃) = N*(f) }

and

f₊(x) = sup{ f̃(x) : f̃ ∈ W̃_r^∞, N*(f̃) = N*(f) }.
We let x* ∈ [a, b] satisfy the equation

(f₋ + f₊)(x*) = 0.

Then the algorithm φ* is defined as

φ*(N*(f)) = x*.

We now prove
Theorem 41.1 The error of φ* is not greater than ‖s_{n−r,r}‖_C.

Proof We let f ∈ F₂ and let z be a zero of f. It is known that

f₊(x) − f₋(x) ≤ 2 ‖s_{n−r,r}‖_C

for every x ∈ [a, b] and every f. Therefore we have

f(x*) ≤ f₊(x*) = ½ ( f₊(x*) − f₋(x*) ) ≤ ‖s_{n−r,r}‖_C

and

f(x*) ≥ f₋(x*) = −½ ( f₊(x*) − f₋(x*) ) ≥ −‖s_{n−r,r}‖_C.

The proof is completed by taking the supremum over f ∈ F₂. ∎

As a conclusion from Theorems 38.1 and 41.1, we have the following corollary.
Corollary 41.1 The information N* and the algorithm φ* are almost optimal, i.e.,

r(N*) = cₙ d^n(W̃_r^∞, C)

and

e(φ*) = c̄ₙ d^n(W̃_r^∞, C)

for some constants cₙ and c̄ₙ from [1, 2].
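Since these bounds are expressed through d^n and hence through the Favard constant K_r, here is a quick numerical check (ours) of the series for K_r against the known closed forms K₁ = π/2 and K₂ = π²/8:

```python
import math

def favard(r, terms=200000):
    """Favard constant K_r = (4/pi) * sum_j ((-1)^j / (2j+1))**(r+1)."""
    s = sum(((-1) ** j / (2 * j + 1)) ** (r + 1) for j in range(terms))
    return 4.0 / math.pi * s

# known closed forms: K_1 = pi/2 ~ 1.5708, K_2 = pi**2/8 ~ 1.2337;
# K_r tends to 4/pi ~ 1.2732 as r grows
print(favard(1), math.pi / 2)
print(favard(2), math.pi ** 2 / 8)
```

The constants stay in a narrow band around 1.2-1.6, so the n^{−r} factor, not K_r, drives the width asymptotics.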
Information complexity bound

To guarantee that the residual criterion M(f) ∈ S(f, ε) is satisfied with M(f) = φ*(N*(f)), it is enough to choose n such that e(φ*) ≤ ε. Due to Corollary 41.1, it is enough to take n of order ε^{−1/r}. Furthermore, this n is almost the minimal number of evaluations for which the residual criterion is satisfied.

Algorithm with small combinatory cost

The almost optimal algorithm φ* is, in general, nonlinear, since the computation of φ* requires the solution of two nonlinear systems of size n − r to compute the knots of the perfect splines u and v. Therefore the combinatory cost of φ* may be large. We now construct information N** and an algorithm φ**, which are almost optimal and easy to compute. We let n = k·r, where k is a positive integer, let h = (b − a)/k, and [aᵢ, bᵢ] = [a + (i − 1)h, a + ih] for i = 1, ..., k. We then set

gᵢ(t) = (aᵢ + bᵢ)/2 + (bᵢ − aᵢ) t/2, t ∈ [−1, 1],
to be the linear transformations of [−1, 1] onto [aᵢ, bᵢ]. We denote x_{i,j} = gᵢ(zⱼ), where zⱼ = cos((2j − 1)π/(2r)), j = 1, ..., r, are the zeros of the Chebyshev polynomial of the first kind T_r. For any function f ∈ F₂ we define the information N** by

N**(f) = [ f(x_{1,1}), ..., f(x_{1,r}), ..., f(x_{k,1}), ..., f(x_{k,r}) ],

and define the interpolatory polynomials wᵢ of degree r − 1 satisfying

wᵢ(x_{i,j}) = f(x_{i,j}), j = 1, ..., r.

We know that

max_{x ∈ [aᵢ, bᵢ]} |f(x) − wᵢ(x)| ≤ (2/r!) (h/4)ʳ =: Δ,

since ‖f^{(r)}‖_∞ ≤ 1.
We note that, since f has a zero α ∈ [aᵢ, bᵢ] for some i, we have |wᵢ(α)| = |wᵢ(α) − f(α)| ≤ Δ, and therefore

min_{1≤i≤k} min_{x ∈ [aᵢ, bᵢ]} |wᵢ(x)| ≤ Δ.   (43.1)

We define the algorithm φ** as

φ**(N**(f)) = x**,

where x** is chosen from [a, b] such that

|w_{i*}(x**)| = min_{1≤i≤k} min_{x ∈ [aᵢ, bᵢ]} |wᵢ(x)|, x** ∈ [a_{i*}, b_{i*}].

Then |f(x**)| ≤ |f(x**) − w_{i*}(x**)| + |w_{i*}(x**)| ≤ 2Δ, and therefore e(φ**) ≤ 2Δ. We summarize this in

Corollary 43.1 The information N** and the algorithm φ** are almost optimal, since

r(N**) = cₙ d^n(W̃_r^∞, C) and e(φ**) = c̄ₙ d^n(W̃_r^∞, C),

where cₙ and c̄ₙ are in [1, B] for

B = 2Δ / d^n(W̃_r^∞, C).
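The nodes zⱼ above are the Chebyshev zeros; a quick check (our own), evaluating T_r via the identity T_r(z) = cos(r arccos z), that the formula zⱼ = cos((2j − 1)π/(2r)) indeed annihilates T_r, together with the mapping to a subinterval:

```python
import math

def chebyshev_zeros(r):
    """Zeros z_j = cos((2j-1)pi/(2r)), j = 1..r, of T_r on [-1, 1]."""
    return [math.cos((2 * j - 1) * math.pi / (2 * r)) for j in range(1, r + 1)]

def T(r, z):
    """Chebyshev polynomial of the first kind on [-1, 1]."""
    return math.cos(r * math.acos(z))

def map_to(a_i, b_i, z):
    """Linear transformation g_i of [-1, 1] onto [a_i, b_i]."""
    return 0.5 * (a_i + b_i) + 0.5 * (b_i - a_i) * z

r = 4
for z in chebyshev_zeros(r):
    assert abs(T(r, z)) < 1e-12      # z is a zero of T_r
# nodes x_{i,j} on a sample subinterval [a_i, b_i] = [0.25, 0.5]
nodes = [map_to(0.25, 0.5, z) for z in chebyshev_zeros(r)]
```

Interpolating at these nodes minimizes the worst case of the node polynomial, which is what produces the (h/4)ʳ factor in Δ.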
We note that for large r the factor B grows exponentially with r. For small r, say r ≤ 4, it is easy to implement the algorithm φ**. For instance, we may first compute f(x_{1,1}), ..., f(x_{1,r}) and check whether min_{1≤j≤r} |f(x_{1,j})| ≤ Δ. If this is satisfied, then we set x** = x_{1,j} realizing this minimum. If not, we construct w₁ and compute a point x₁ minimizing |w₁| on [a₁, b₁], |w₁(x₁)| = min_{x ∈ [a₁, b₁]} |w₁(x)|.
If |w_1(x_1)| ≤ Δ then we set x** = x_1. Otherwise we compute the next values of f at x_{2,1}, ..., x_{2,r} and repeat the above procedure. Equation (43.1) implies that there exists a point x_i ∈ [a_i, b_i] such that |w_i(x_i)| ≤ Δ for some i = 1, ..., k, where x_i is defined by |w_i(x_i)| = min_{x∈[a_i,b_i]} |w_i(x)|. This analysis implies that the worst case cost of the algorithm φ** is at most equal to the cost of finding k minimizers of polynomials of degree r − 1, i.e., for r ≤ 4 it is bounded by k·C_3, where C_3 denotes the number of arithmetic operations to find the minimizer of a cubic polynomial over a given interval.
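For concreteness, the procedure just described can be sketched in Python. This is a minimal illustration, not the book's implementation: the function names are hypothetical, f is assumed to be given as a callable, and the exact minimization of the degree r − 1 polynomials w_i is replaced here by dense sampling of |w_i| on each subinterval.

```python
import math

def chebyshev_knots(ai, bi, r):
    # images under g_i of the zeros z_j of the Chebyshev polynomial T_r
    return [(ai + bi) / 2 + (bi - ai) / 2 * math.cos((2 * j - 1) * math.pi / (2 * r))
            for j in range(1, r + 1)]

def lagrange_eval(xs, ys, x):
    # value at x of the degree r-1 interpolant w with w(xs[j]) = ys[j]
    total = 0.0
    for j in range(len(xs)):
        term = ys[j]
        for m in range(len(xs)):
            if m != j:
                term *= (x - xs[m]) / (xs[j] - xs[m])
        total += term
    return total

def phi_star_star(f, a, b, k, r, delta, grid=200):
    # scan the k subintervals; stop as soon as a point with |w_i| <= delta is found
    h = (b - a) / k
    best_x, best_val = a, float("inf")
    for i in range(k):
        ai, bi = a + i * h, a + (i + 1) * h
        xs = chebyshev_knots(ai, bi, r)
        ys = [f(x) for x in xs]
        for xj, yj in zip(xs, ys):          # cheap check at the knots first
            if abs(yj) < best_val:
                best_x, best_val = xj, abs(yj)
        if best_val <= delta:
            return best_x
        for t in range(grid + 1):           # stand-in for exact minimization of |w_i|
            x = ai + (bi - ai) * t / grid
            v = abs(lagrange_eval(xs, ys, x))
            if v < best_val:
                best_x, best_val = x, v
        if best_val <= delta:
            return best_x
    return best_x                           # best point found if delta was never met
```

For r = 4 each w_i is a cubic, so the grid search could be replaced by the closed-form minimization over the endpoints and the roots of w_i′, as in the cost bound k·C_3 above.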
2.1.4 General Error Criterion in C^∞ and W_r^∞
One may wish to solve a nonlinear equation by utilizing a different error criterion than the absolute or residual criteria analyzed in the previous sections. We recall that for the general error criterion E(f, x), as defined on page 23, the solution operator is given by

We first concentrate on the relative root criterion
and the class of functions F_1 as defined in section 2.1.2. We assume for simplicity that a > 0. In the proof of Theorem 35.1 we used two functions with the same information whose zeros were arbitrarily close to the endpoints of [a, b]. From this construction, we conclude that
whenever card(N) < +∞. We further note that the algorithm
has the error
Due to (24.1) specifying the radius of information in the general case we have

We note that for δ = 0, φ(N(f)) is the harmonic mean of a and b. Since the above equation holds for any information N of finite cardinality we conclude that:

Corollary 45.1 If ε < (b − a)/(a + b + 2δ) then there exists no method for which the relative root criterion is satisfied.

We now assume a special form of the general error criterion E. We define the class F_2 as in section 2.1.3. We denote F̃ = W_r^∞[a, b] and let

where L_i(·, x) : F̃ → ℝ is a linear functional. We assume that E is of the form
i.e., that the dependence on f is through A(f, x). We let d_{n+k+1}(W_r^∞, C) be the Gelfand (n + k + 1)st width (compare section 2.1.3). We generalize Theorem 38.1 by proving

Theorem 45.1
for all
We let E be s-homogeneous, i.e.,
We let
Then
Proof We sketch the proof since it is similar to the proof of Theorem 38.1. We assume for simplicity that d = d_{n+k+1} is finite. We take any η ∈ (0, d) and choose a function g̃ with norm d − η. We let g(x) = g̃(x) − f*(x), where the function f* is chosen from W_r^∞ so that

and we get

Since
Since φ and η are arbitrary, the theorem is proven. ∎
We illustrate this theorem by two examples. We first consider the relative residual criterion, i.e., E as defined on page 24 by
Then the dependence of E on f is characterized by
the parameter s = 0, and E(A(1, z), z) = +∞ for every choice of z. Therefore Theorem 45.1 yields

Corollary 46.1 For any information N of finite cardinality the radius r(N) = +∞.

Therefore there exists no method for which the relative residual criterion is satisfied, no matter how large the error parameter ε is. We now consider the error criterion defined by A(f, x) = f(x)
and

Then the operator E is s-homogeneous and Theorem 45.1 holds with k = 1 and E(A(1, z), z) = 1. By using Theorem 41.1 it is not difficult to verify that there exists information N such that r(N) ≤ 2^s (d_n)^s. This shows that Theorem 45.1 is essentially sharp for this case.
2.1.5 Polynomial Equations
In this section we study the class of univariate polynomials of unbounded degree and establish a theorem on constrained approximation of smooth functions by polynomials. We seek an ε-approximation M(p) in the absolute sense to a zero α_p of a real polynomial p, |M(p) − α_p| ≤ ε, in the interval [a, b]. We assume that the polynomial p belongs to the class F_1 of polynomials of bounded arbitrary seminorm having a root in [a, b], or to the class F_2 of polynomials that are nonpositive at a, nonnegative at b, and have exactly one simple zero in [a, b]. The information on p consists of n sequential evaluations of arbitrary linear functionals (information with predefined cardinality n). The point M(p) is constructed by an arbitrary algorithm using such information.
We show that if ε < (b − a)/2, then there exists no method for finding M(p) for every p ∈ F_1, no matter how large the number of evaluations n is. This is a stronger result than the one for smooth functions presented in section 2.1.2. For the class F_2 we can solve the problem for arbitrary ε and p. In the class of parallel information, the optimal information consists of evaluations of p at n equidistant points in [a, b]. In the class of sequential continuous information, the optimal information consists of n evaluations of p at the points generated by the bisection method. This is a stronger result than the one presented for C^∞ functions in section 2.1.1. To prove this result we establish a new theorem on constrained approximation of smooth functions by polynomials. More precisely, we prove that a smooth function can be arbitrarily well uniformly approximated by a polynomial that satisfies constraints given by n arbitrary continuous linear functionals. Our results indicate that the problem of finding an ε-approximation to a real zero of a real polynomial is essentially of the same complexity as the problem of finding an ε-approximation to a zero of an infinitely differentiable function. This makes the results of sections 2.1.1 and 2.1.4 stronger. We stress that we did not assume knowledge of the degree of the polynomial. The problem of finding an ε-approximation to a zero of a polynomial of known degree has been recently studied by many authors, as indicated in the annotations to this section.

Formulation of the problem
We define P = P[a, b] to be the set of all real polynomials on the interval [a, b] ⊂ ℝ. For p ∈ P, we let S(p) be the set of zeros of p in [a, b]. We also let C^∞ = C^∞[a, b] be the space of infinitely differentiable functions on [a, b]. We define two subclasses of P by

where ||·|| is an arbitrary seminorm in C^∞, and
For a given positive ε we define the solution operator

for every p ∈ P.
We recall that we want to find a point M(p) such that
for any polynomial p in the class F_i, i = 1, 2. To solve our problem we employ methods based on arbitrary sequential information N_n with a predefined number of evaluations n and arbitrary algorithms φ. The error of φ is now given by
or in terms of the local error
where
The radius of information is now defined as
where the local radius
We stress that the general proof technique used to show our results is based on an "adversary" principle. Namely, for every n, we first construct a function f_n that generates the information
where
y_1 = L_1(f_n), and y_i = L_i(f_n; y_1, ..., y_{i−1}), i = 2, ..., n. The functions in the null space of N_{n,f_n}(·) are those which can be used to perturb f_n and still have the algorithm using N_{n,f_n}(f_n) behave in exactly
the same way. Then we construct a worst pair of such "null" perturbations to fool our algorithm, i.e., we choose a pair h_1 and h_2 such that N_{n,f_n}(h_1) = N_{n,f_n}(h_2) = [0, 0, ..., 0] and such that the distance between the zeros of p_1 = f_n + h_1 and p_2 = f_n + h_2 is maximized.
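The adversary principle can be illustrated with a small Python experiment. As a simplifying assumption for this sketch (the actual proofs use sequential information), take the information to be parallel function values at n = 5 fixed points of [0, 1]; any multiple of h(x) = ∏(x − x_i) then lies in the null space of the information, and perturbing the small constant f_n ≡ δ by ±h yields two polynomials with identical information whose zeros lie near opposite ends of the interval.

```python
# five equidistant interior evaluation points of [0, 1]
nodes = [(i + 1) / 6 for i in range(5)]
delta = 1e-3                      # the constant function f_n = delta

def h(x):
    # polynomial vanishing at every evaluation point: a null-space perturbation
    prod = 1.0
    for xi in nodes:
        prod *= (x - xi)
    return prod

p1 = lambda x: delta + h(x)       # f_n + h_1
p2 = lambda x: delta - h(x)       # f_n + h_2 = f_n - h_1

def bracket_zero(p, lo, hi, iters=80):
    # bisection on a sign-changing bracket, used only to exhibit the zeros
    plo = p(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (p(mid) < 0) == (plo < 0):
            lo, plo = mid, p(mid)
        else:
            hi = mid
    return (lo + hi) / 2

z1 = bracket_zero(p1, 0.0, 1 / 6)     # p1(0) < 0 < p1(1/6)
z2 = bracket_zero(p2, 5 / 6, 1.0)     # p2(5/6) > 0 > p2(1)
```

Both polynomials return the value δ at every evaluation point, yet their zeros are more than 2/3 apart, so no algorithm using this information can have error below roughly half that distance.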
Results for the class F_1

We first show that there exists no method to solve our problem in the class F_1 whenever ε < (b − a)/2. We present a sketch of the proof, which is similar to the proof of Theorem 35.1. Namely, we prove

Theorem 49.1 For every n and every sequential information N_n,
Proof By setting φ(N_n(p)) ≡ (a + b)/2 we get e(φ) ≤ (b − a)/2. Thus r(N_n) ≤ (b − a)/2 as well. To prove the reverse inequality we construct, for every γ, 0 < γ < (b − a)/2, two polynomials p_1 and p_2 from F_1 such that N_n(p_1) = N_n(p_2) and dist(S(p_1), S(p_2)) > b − a − 2γ. Then the assertion of the theorem follows from (48.2) by taking γ → 0. Construction of the polynomials p_1 and p_2 is similar to the construction of the worst case functions in Theorem 35.1. We first define the functions, i = 1, ..., n + 1, by
where x_i = a + iγ/(n + 1), i = 0, 1, ..., n + 1, and γ is an arbitrary number, 0 < γ < (b − a)/2. We then let p_i be the polynomials that satisfy
We let δ be such that

and take a positive c such that
We will take the function f_n ≡ δ in Eq. (48.3), and h_1 = −h_2 = cp*, where p* is defined below. By applying N_n to the constant polynomial δ(x) ≡ δ we get the information N_{n,δ} (see Eq. (48.3))
We then let C = [c_1, ..., c_{n+1}] be a nonzero solution of the homogeneous linear system
We let

Then for

Then we define the polynomial p* by

we let
We observe that
and
Thus the polynomial p_c has a zero in [a, b]. The definition of p_c implies that S(p_c) ⊂ [a, a + γ]. The polynomial p_1 is defined by
for some c as above. To construct p_2 we proceed as above with x_i replaced by x_i* = b − iγ/(n + 1), i = 0, ..., n, as in the proof of Theorem 35.1. ∎

Class F_2: Optimal parallel information

We first analyze parallel information in the class F_2. We prove that the optimal information consists of evaluations of a polynomial at n equidistant points in [a, b]. We let N_par be the class of all parallel information with at most n evaluations. We define

where x_i = a + i(b − a)/(n + 1), i = 0, 1, ..., n + 1. We let p be an arbitrary polynomial from F_2 and the index j = j(N^0(p)) be such that p(x_j) ≤ 0 and p(x_{j+1}) > 0. Then it is clear that a zero of p lies in [x_j, x_{j+1}]. Thus Eq. (48.1) and (48.2) imply that
Then we prove

Theorem 51.1 The information N^0 is optimal in the class N_par, i.e.,
Proof For arbitrarily small δ and any information N ∈ N_par we construct two polynomials p_1 and p_2 from F_2 such that N(p_1) = N(p_2) and

Then Theorem 51.1 follows from Eq. (48.1), (48.2), and (51.1) with δ → 0. To construct p_1 and p_2 we let A = [a_0, ..., a_n] be a nonzero solution of the homogeneous linear system
Figure 52.1: Polynomials p(x) and t(x).

We define the polynomial
Since p is of degree not larger than n, there exists a subinterval [c, d] of [a, b], a < c < d < b, such that d − c ≥ (b − a)/(n + 1) − δ and p is of constant sign in [c, d]. Without loss of generality we suppose that p is positive in [c, d]; see Figure 52.1. We will take f_n(x) = t(x) and h_1(x) = −h_2(x) = p(x), where the polynomial t(x) is defined below. Namely, we take the polynomial t(x) = (1/√m)·g(x)^m, see Figure 52.1, where g is the linear transformation of [c, d] onto [−1, 1], g(x) = 2x/(d − c) − (d + c)/(d − c), and m is sufficiently large, so that the following inequalities hold:
The numbers m and η satisfying the above inequalities obviously exist. We define
and

Then we have, for i = 1, 2,

Moreover, each of the p_i has a single and simple zero, S(p_1) ⊂ [c − η, c] and S(p_2) ⊂ [d, d + η]. Thus p_i ∈ F_2, i = 1, 2. Since dist(S(p_1), S(p_2)) ≥ d − c ≥ (b − a)/(n + 1) − δ, the proof is completed. ∎

Class F_2: Optimal sequential information

We now prove that the bisection information N_bis, as defined in section 2.1.1, is optimal in the class of all sequential continuous information N_c. We first prove a theorem on constrained approximation of functions from C^∞ by polynomials.

Theorem 53.1 For every function f ∈ C^∞[a, b], information N ∈ N_c, and δ > 0, γ > 0, there exists a polynomial w ∈ P[a, b] such that
and
Proof We recall that N_{n,f}(·) = [L_{1,f}(·), ..., L_{n,f}(·)], as defined in Eq. (48.3). Without loss of generality we suppose that the functionals L_{i,f} are linearly independent on C^∞[a, b]. This assumption and the continuity of the L_{i,f} imply that they are also linearly independent on P[a, b]. Therefore there exist polynomials p_i ∈ P[a, b], i = 1, ..., n, such that

We consider a sequence of polynomials {w_m}_{m=1}^∞ such that
Since the L_{i,f} are continuous we get
For each w_m we define a polynomial v_m by
Then we have
and
and
Equations (53.1) and (53.2) imply that there exists an index m_0 such that for every m ≥ m_0 the following inequalities hold:
We define the polynomial w* by
Then
Moreover
and
which means that w* satisfies our theorem. ∎

In what follows we use the same notation and proof technique as in section 2.1.2. We assume that the reader is familiar with the results presented there. We are now ready to prove

Theorem 54.1 The bisection information N_bis is optimal in the class N_c, i.e.,
Proof For every η, 0 < η < (b − a)/(2^n n), and every information N ∈ N_c we construct two polynomials w_1 and w_2 from F_2 such that N(w_1) = N(w_2), and
Then the proof of Theorem 54.1 will follow from Eq. (48.1) and (48.2) with η tending to zero. We first consider the function f_n constructed by induction in Lemma 31.1 with f_1 in the proof replaced by
Then, as in the proof of Theorem 29.1, we construct two functions g_1 and g_2 in the class C^∞[a, b] such that N(g_1) = N(g_2), each of the g_i has exactly one simple zero α_i = S(g_i), and dist(α_1, α_2) ≥ (b − a)/2^n − (n + 1)η. The choice of f_1 guarantees that f_n(a) < 0 and f_n(b) > 0, which yields that g_i(a) < 0 and g_i(b) > 0 for i = 1, 2. We denote

The number γ_1 is positive since g_1 and g_2 are strictly increasing in the intervals
where the intervals are defined as in Lemma 31.1. We define neighborhoods of α_1 and α_2 such that

We let γ = γ_1/4 and set
and
The definition of these neighborhoods, i = 1, 2, implies that δ is positive. By applying Theorem 53.1 with the above δ and γ to the functions g_i we obtain polynomials w_i, each of them having exactly one simple zero, with the distance between those zeros not less than (b − a)/2^n − (n + 1)η, and N(w_1) = N(g_1) = N(g_2) = N(w_2). This completes the proof of Theorem 54.1. ∎
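The bisection information and algorithm referred to throughout these sections can be sketched as follows. The sketch assumes the sign convention f(a) ≤ 0 ≤ f(b) of the class F_2; after n midpoint evaluations the returned point is within (b − a)·2^{−(n+1)} of a zero.

```python
def bisection(f, a, b, n):
    """n midpoint evaluations of f with f(a) <= 0 <= f(b); the returned
    point is within (b - a) * 2**-(n + 1) of a zero of f."""
    for _ in range(n):
        m = (a + b) / 2
        if f(m) <= 0:
            a = m          # a zero remains in [m, b]
        else:
            b = m          # a zero remains in [a, m]
    return (a + b) / 2
```

For a function such as f(t) = t − 1/6 on [0, 1], whose zero has no finite binary expansion, the error actually behaves like c·2^{−n}; functions of this kind show that the worst case bound is attained up to a constant.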
2.1.6 Asymptotic Optimality of the Bisection Method
In this section we seek an approximation to a zero of an infinitely differentiable function. The error of the bisection method using n function evaluations is 2^{−(n+1)}. In section 2.1.1 we proved that for a fixed number n of evaluations, the bisection information and algorithm are optimal in the worst case setting. Thus nothing is lost by restricting oneself to function values. One may then ask whether bisection is also optimal, or nearly optimal, in the asymptotic worst case sense, that is, whether it possesses asymptotically nearly the best rate of convergence. Methods converging fast asymptotically, like Newton or secant type methods, are of course widely used in scientific computation. We prove that the answer to this question is positive for the class of functions having zeros of infinite multiplicity and information consisting of evaluations of continuous linear functionals. Assuming that every f has zeros with bounded multiplicity, hybrid methods are known which have at least quadratic rate of convergence as n tends to infinity (see the annotations).

Formulation of the problem

We let 𝓕 = C^∞[0, 1] be the space of infinitely differentiable real-valued functions on the interval I = [0, 1] with the metric ρ given
by
where
By S(f) = f^{−1}(0) we denote the set of all zeros of the function f. We seek an approximation to a zero of a function that belongs to
the class F:
i.e., every function in F has exactly one zero. To solve this problem, we use sequential information N with a possibly infinite number of evaluations, defined as in chapter 1 by
where
is an arbitrary linear functional and
Observe that L_{i,f}(·) depends on the previously computed values y_j, j = 1, ..., i − 1. By N_n(f) we denote
Note that the vector N_{n+1}(f) contains all components of N_n(f): N_{n+1}(f) = [N_n(f), L_{n+1,f}(f)]. That is, while increasing n, we use previously computed information. We may assume without loss of generality that the functionals in N(·) are linearly independent, i.e., L_{1,f}, ..., L_{n,f} are linearly independent for every f ∈ F, n = 1, 2, .... We denote by 𝓝 the class of all information operators of that form. Knowing N_n(f) we approximate S(f) by an algorithm. By an algorithm φ = {φ_n} we mean a sequence of arbitrary transformations φ_n : N_n(F) → I, n = 1, 2, .... We let Φ(N) be the class of all algorithms using information N. The nth error of φ for an element f is defined by
In the asymptotic setting we wish to find φ* and N* such that for any f in F the error e_n(N*, φ*, f) goes to zero as fast as possible
as n tends to infinity. The information N* and algorithm φ* are called nearly optimal iff for every N ∈ 𝓝, every φ ∈ Φ(N), and every sequence δ_n ↘ 0 (δ_n strictly, arbitrarily slowly, decreasing to 0) there exists f* ∈ F such that for every f ∈ F we have
This means that an arbitrary algorithm φ does not converge essentially faster for the function f* than the algorithm φ* for any function f. The bisection information N_bis is defined as in section 2.1.1 by
where

and
The bisection algorithm
is given by
It is known that for every
and that there exist functions f in F such that

for some c > 0, for example f_i(t) = t − i/6, i = 1, 2, 4, 5. In fact there exist infinitely many such functions. In section 2.1.1 we proved that for a fixed n
for every f ∈ F and φ ∈ Φ(N), i.e., that the bisection information and algorithm are optimal for the worst case model with a fixed number of functional evaluations. Here we show that the bisection information and algorithm are nearly optimal in the asymptotic worst case setting. More precisely, we assume that the information N is continuous, i.e.,
For an arbitrary sequence δ_n, δ_n ↘ 0, any N ∈ 𝓝, and any φ ∈ Φ(N) we define the set B = B(N, φ, δ_n) of functions from F such that the error e_n(N, φ, f) is essentially at least δ_n·2^{−n}, i.e.,
To prove near optimality of the bisection method, it is enough to show that the set B is not empty for any δ_n, N, and φ. Indeed, taking any f* ∈ B and any f ∈ F we have
We will show more by proving that the Lebesgue measure of the set S(B) of zeros of all functions from B equals one. This in particular implies that the set B is uncountable. Precisely, we define the set S(B) by
We prove

Theorem 59.1 For every continuous information N ∈ 𝓝, every algorithm φ ∈ Φ(N), and any sequence δ_n, δ_n ↘ 0, the Lebesgue measure μ of the set S(B) equals one, i.e.,
We remark that if the multiplicity m of a zero of f is finite, then it is possible to construct information N and an algorithm φ which guarantee asymptotically quadratic convergence, as explained in the annotations to this section. We do not know whether Theorem 59.1 holds with the sequence δ_n replaced by a constant c, 0 < c < 1, or whether the sequence δ_n is necessary. Therefore Theorem 59.1 does not preclude the existence of an algorithm φ_1 and information N_1, different from the bisection method, which for some c (perhaps small) satisfies e_n(N_1, φ_1, f) ≤ c·2^{−(n+1)} for all f ∈ F and large n, n ≥ K(f). The proof of Theorem 59.1 relies heavily on the assumption that the functions in F are of class C^∞. We do not know whether a similar asymptotic result holds for analytic or piecewise analytic functions. We stress that for a fixed n and continuous information the bisection method is optimal even for the class of polynomials of unbounded degree, as described in the previous section. In the following we present auxiliary lemmas and the proof of Theorem 59.1.

Auxiliary lemmas

We first prove a few auxiliary lemmas needed in the proof of Theorem 59.1. The first lemma is a slight reformulation of Lemma 29.1. Namely, for a closed set A ⊂ [0, 1], we let
Lemma 60.1 We let L_i : 𝓕 → ℝ, i = 1, ..., k, be linearly independent linear functionals. Then for every positive β and any family of closed intervals I_i ⊂ [0, 1], i = 1, ..., k − 1, such that L_1, ..., L_{k−1} are linearly independent on C(∪_{i=1}^{k−1} I_i), there exists a closed interval I_k ⊂ [0, 1] of length β such that L_1, ..., L_k are linearly independent on C(∪_{i=1}^{k} I_i).

In the next lemma, we construct a family of functions from 𝓕 needed in the proof of Theorem 59.1.
Lemma 60.2 For every ε, 0 < ε < 1/16, there exists a family of functions F_ε,
with the following properties:

(i) F_ε is tree structured, where f_{1,0} is the root of the tree,
and the functions on the nth level, n = 1, 2, ..., are constructed inductively in what follows. (ii) Every function f ∈ F_ε satisfies
for some [a_f*, a_f**] ⊂ [0, 1]. (iii) For every f = f_{i,n} there exist closed intervals I_1, I_2, ..., I_n ⊂ [0, 1] such that the functionals L_1, L_{2,f}, ..., L_{n,f} are linearly independent on C(∪_{i=1}^{n} I_i), and the distance dist(I_i, [a_f*, a_f**]) ≥ ε_n, i = 1, ..., n, where ε_n = ε·2^{−2n}, with the convention dist(∅, V) = +∞.

Proof (Construction) We let ε be a small positive number, 0 < ε < 1/16, ε_n = ε·2^{−2n}, and let {δ_n} be an arbitrary sequence monotonically decreasing to 0, δ_n ↘ 0. We define a sequence of indices {n_k}, k = 1, 2, ..., such that
We let
for any interval [a, b] ⊂ [0, 1]. The family F_ε is tree structured. Namely, at the root we have the function f_{1,0} defined in assertion (i). At the nth level of the tree we have c_n functions, where c_n ≤ 3·c_{n−1} if n = n_k + 1 for some k, and c_n ≤ 2·c_{n−1} otherwise. Thus the number c_n ≤ 3^n. We define the functions f_{i,n} inductively.
When n = 0 then f_{1,0} is defined in assertion (i). Obviously, a*_{f_{1,0}} = ε and a**_{f_{1,0}} = 1 − ε. Since there exist no intervals I_i in this case, then dist(∅, [ε, 1 − ε]) = +∞ ≥ ε_0 = ε. Thus f_{1,0} satisfies the induction basis. We now assume that all f_{j,k}, k = 0, ..., n − 1, have been constructed. We let k = n − 1 and let f = f_{i,n−1} be any function on the (n − 1)st level. The information N yields the functional L_{n,f}. Due to our assumption the functionals L_1, L_{2,f}, ..., L_{n,f} are linearly independent on 𝓕. Thus, Lemma 60.1 with β = 2ε_n and k = n yields an interval I_n, I_n = [m̄ − ε_n, m̄ + ε_n], such that L_1, L_{2,f}, ..., L_{n,f} are linearly independent on C(∪_{i=1}^{n} I_i), where I_1, ..., I_{n−1} are the intervals from assertion (iii) for the function f. Now we construct the functions on the nth level that are successors in F_ε to f. We let a* = a_f* and a** = a_f**. If a** − a* ≤ 6ε_n then we let f be a leaf of the tree and therefore the successors are not defined. If a** − a* > 6ε_n then we define the successors f_{i_j,n}, j ∈ {1, 2, 3}, i_j ∈ [1, 3^n], depending on whether n = n_k + 1 for some k. We let M = (a* + a**)/2 and define the auxiliary functions H_j, j ∈ {1, 2, 3}, by:
and
2.1. UNIVARIATE PROBLEMS
63
In the opposite case, i.e., when
we define the functions H_j as follows. We suppose first that a** − a* ≤ 10ε_n. Then we define the functions H_j, j = 1, 2, as in Eq. (62.1). Otherwise we have three cases. In the first two of these cases the functions H_j are defined as in Eq. (62.1) or (63.1), respectively; in the third case we define three functions. In case (1), we have
In case (2), we have
The functions H_1, H_2, and H_3 are illustrated in Figure 64.1. For any of the cases in Eq. (62.1) and (63.1) we let H_j ∈ C(∪_{i=1}^{n−1} I_i) be the solutions of
Such functions exist since the functionals L_{i,f}, i = 1, ..., n − 1, are linearly independent on C(∪_{i=1}^{n−1} I_i). We let
where c is a positive constant so small that
Figure 64.1: The functions H_j of Eq. (62.1a), (62.1b), and (63.1c), case 1.

and

We define
and note that S(f_{i_j,n}) = S(H_j) ∩ [a*, a**]. This, Eq. (64.1), and the choice of H_j imply that assertions (ii) and (iii) of the lemma are satisfied. ∎

The next lemma characterizes further properties of functions in F_ε:

Lemma 64.1 We let f_{i,n} be an arbitrary function in F_ε as constructed in Lemma 60.2. Then: (i) The length of the interval of zeros of f_{i,n} satisfies, for n_k < n ≤ n_{k+1},
(ii) For every n, the Lebesgue measure of the set ∪_i S(f_{i,n}) is at least
where
This in particular implies that
(iii) There exist infinite branches in the tree F_ε. (iv) The functions in every infinite branch in F_ε form a Cauchy sequence in 𝓕. (v) If n > m and f_{j,m} is a predecessor of f_{i,n} in F_ε, then N_m(f_{j,m}) = N_m(f_{i,n}).
Proof The construction in Lemma 60.2 implies that for all n, if f_{i,n} is a successor in F_ε to f_{j,n−1}, then μ(S(f_{i,n})) ≤ μ(S(f_{j,n−1})), and for n = n_k + 1, μ(S(f_{i,n}))
We show (ii) by simple induction. If n = 0 then S(f_{1,0}) = [ε, 1 − ε] and μ(S(f_{1,0})) = 1 − 2ε ≥ 1 − D_0·ε. We now assume that (ii) holds for n − 1, n ≥ 1. Then
since ε_n = ε·4^{−n} and the total number of functions on the (n − 1)st level is at most 3^{n−1}. By solving the recurrence relation for D_n one obtains D_n = 8 − 6·(3/4)^n, i.e., D_n ≤ 8 for all n, which completes the proof of (ii). Now we show (iii). We suppose to the contrary that all branches in F_ε are finite, i.e., that F_ε has at most n levels for some n. Then the tree F_ε has all leaves on at most the nth level. We recall that f_{i,n} is a leaf iff μ(S(f_{i,n})) ≤ 6ε_n. This yields
Equation (65.1) implies

which contradicts our assumption. We now show (iv). We first note that if f_{i,n} is a successor to f_{j,m} in F_ε, n > m, then the construction in Lemma 60.2 implies
where the h_{·,k} are the functions defined in Eq. (63.3) and the summation is taken along the branch of F_ε connecting f_{j,m} and f_{i,n}. Observe that Eq. (63.4) implies a bound on ||h_{·,k}^{(l)}|| for any 0 ≤ l ≤ k. Therefore
for any 0 ≤ l ≤ m. Consequently,
Since 2^{−(m−1)} is arbitrarily small for large m, the proof of (iv) is completed. Point (v) is an immediate consequence of Eq. (63.2) and (66.1). Indeed, by letting f = f_{j,m} in the construction of Lemma 60.2, formula (63.2) implies that
where
as in equation
Thus
i.e.,
This finally completes the proof of Lemma 64.1. ∎

Since 𝓕 is a Fréchet space, every Cauchy sequence in 𝓕 is convergent. Therefore Lemma 64.1 implies that the following class of functions F_0 is well defined:
where the f_{·,n} constitute the infinite branches in F_ε, and the limit is taken with respect to the metric ρ in 𝓕. In the following Lemma 67.1 we show that every function f ∈ F_0 has exactly one zero, and that f(0) < 0 and f(1) > 0. Moreover, we show that the set of zeros of all f from F_0 has Lebesgue measure arbitrarily close to 1.

Lemma 67.1 The set F_0 is a subset of F, i.e., (i) F_0 ⊂ F. (ii) The set of zeros of all functions f from F_0 has almost full measure as ε → 0. More precisely,
Proof We first show (i). We note that if f ∈ F_0, i.e., f = lim_n f_{·,n}, then
We will show that α_f is the only zero of f. This, combined with f_{·,n}(0) < 0 and f_{·,n}(1) > 0, implies (i).
Indeed, we take any a ≠ α_f, a ∈ [0, 1]. Since μ(S(f_{·,n})) → 0 as n → ∞ (compare assertion (i) of Lemma 64.1), there exists an index m ≥ 1 such that a ∉ [a*_{·,n} − ε_n, a**_{·,n} + ε_n] for n ≥ m. By using Eq. (64.1) and (66.1) with f_{i,n} = f and f_{j,m} = f_{·,m} (n = +∞) we get
which completes the proof of (i). We now show (ii). We define
Then the set of zeros of all functions from FQ is
We observe that S_{n+1} ⊂ S_n. This and Eq. (65.1) yield
which proves (ii). ∎

Proof of Theorem 59.1 To complete the proof of Theorem 59.1 we show that for every ε, 0 < ε < 1/16, every sequence δ_n ↘ 0, any N ∈ 𝓝, and any φ ∈ Φ(N), the measure μ(S(B) ∩ S(F_0)) ≥ 1 − 8ε, i.e.,

where S(A) denotes the set of zeros of all functions from A and B is defined as in Eq. (59.1). The proof is completed by taking ε → 0 in Eq. (68.0).
To show Eq. (68.0) we need only prove that μ(T) = 0, where
Indeed, μ(T) = 0 and Lemma 67.1 (ii) imply that
We now concentrate on the proof of

μ(T) = 0.    (69.1)
We let x_{i,n} = φ_n(N_n(f_{i,n})), i = 1, ..., c_n, for any function f_{i,n} on the nth level of F_ε from Lemma 60.2. Since the functionals in N_n are continuous, Lemma 64.1 implies that x_{i,n} = φ_n(N_n(f)) for any f ∈ F_0 such that f_{i,n} belongs to the branch {f_{·,n}} of F_ε with f = lim_n f_{·,n}. We let δ̂_n = 2^{−n}·c_n·δ_n. We observe that the definition of the sequence n_k in Lemma 60.2 implies that for n_k < n ≤ n_{k+1} we have
and
converges to zero
We define
and
We let M be a positive integer
We then observe that
where x_{k_n,n} = φ_n(N_n(f_{k_n,n})) and f_{k_n,n} runs along an arbitrary infinite branch in F_ε. Indeed, to show this we let
We take any x ∈ T_δ. Then there exist M and m such that x ∈ T_δ^M. Thus x ∈ ∩_{n=m}^∞ V_n^M, i.e., for all n ≥ m, |x − x_{k_n,n}| ≤ M·δ_n·2^{−n}, for some sequence x_{k_n,n} along a branch of F_ε. Thus x ∈ A. Conversely, if x ∈ A then there exist m and M such that |x − x_{k_n,n}| ≤ M·δ_n·2^{−n} for all n ≥ m and some x_{k_n,n}. This implies that x ∈ V_n^M(x_{k_n,n}) for all n ≥ m, i.e., x ∈ ∩_{n=m}^∞ V_n^M, which yields x ∈ T_δ^M and x ∈ T_δ, and completes the proof of Eq. (70.1).
Therefore
and
since δ̂_n → 0. This yields μ(T_δ) = 0, since T_δ is a countable union of sets of measure zero. Finally, since
where x_{k_n,n} = φ_n(N_n(f_{k_n,n})).
Exercises
7. Prove the result outlined in the Remark on page 24.

8. Verify the claims of the second example on page 46 dealing with the error criterion E(f, x) = |f(x)|^s.
9. (*) Generalize the results of section 2.1.1 to the case of information with the number of evaluations n(f) dependent on the particular function f.

10. (**) Verify whether the sequence δ_n is necessary in Theorem 59.1.

11. (**) Generalize the results of section 2.1.6 to the case of analytic or piecewise analytic functions.
2.2 Multivariate Problems
In this section we review optimal methods and derive lower complexity bounds for zero finding problems in the multivariate case. This problem is far more difficult than in the univariate case, and the results for general classes of functions are sparse at best. Moreover, the worst case analysis of the following sections indicates that the results are negative, i.e., that the problem is intractable (has exponential complexity) or even unsolvable (infinite complexity) in large classes of functions. Therefore additional restrictions and special properties of functions need to be analyzed to derive lower bounds or to obtain methods that enjoy polynomial complexity. The average case analysis, where there is hope for positive results, is wide open for future research.
2.2.1 Functions with Nonzero Topological Degree
In this section we analyze the class of bivariate infinitely differentiable functions with nonzero topological degree. We know that for univariate functions that change sign at the endpoints of [a, b] we can compute ε-approximations to zeros in the absolute sense, and that the bisection method is optimal (section 2.1.1). The sign change f(a)·f(b) < 0 is equivalent to the assumption that f has nonzero topological degree, since deg(f, [a, b], 0) = (sign(f(b)) − sign(f(a)))/2. If the degree of f is zero (no sign change), then in general there exists no method using linear information to solve this problem (section 2.1.2). Thus the degree determines whether we can or cannot solve the problem in the scalar case. The situation drastically changes when we add just one dimension. We show that in general it is impossible to find an ε-approximation in the absolute
sense to a zero of a bivariate smooth function with nonzero topological degree. More precisely, we assume that f is defined on the unit triangle T in ℝ² and that T is completely labeled under f. We consider arbitrary sequential information with a predefined number of evaluations n and an arbitrary algorithm as our method. We show that for arbitrary n and ε < diam(T)/2 there exists no method to solve the problem in the worst case. Our result indicates, in particular, that arbitrary continuation and/or simplicial continuation methods cannot approximate zeros of every function f to within ε < diam(T)/2, no matter how large the number of function and/or derivative evaluations n. We conclude that extra restrictions on f must be imposed to guarantee positive worst case results. We remark that the unit triangle serves as the domain of the functions for technical reasons and that the result holds for an arbitrary compact domain D in ℝ², with ε < diam(D)/2.

Formulation of the problem
We let T = {x ∈ ℝ² : x_i ≥ 0, i = 1, 2, x_1 + x_2 ≤ 1} be the unit triangle in ℝ², and 𝓕 = C^∞(T) be the class of infinitely differentiable functions f : T → ℝ². We let
where deg(/, T, 0) is the topological degree of / relative to T at 0, and Conv/(T) is the triangle with vertices /(), /((0,1)), and /((I, 0)). We say that T is completely labeled under / (or /-Sperner's triangle) iff 0 6 Conv / (T). We include the assumption 9 6 Conv/(T), since it makes our result stronger and is a typical assumption in the theory of simplicial continuation methods. Since the set of zeros of / is a singleton, 5 = (s }, the solution operator 5 is defined as
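Whether T is completely labeled under f is a finite geometric test on the three vertex values of f; a small sketch (the function names are ours, not the book's), using signed areas to decide whether the origin lies in the triangle spanned by the vertex images:

```python
def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def completely_labeled(fv0, fv1, fv2):
    """True iff the origin lies in the closed triangle with vertices
    fv0, fv1, fv2, i.e. iff 0 is in Conv_f(T)."""
    z = (0.0, 0.0)
    d0 = cross(fv0, fv1, z)
    d1 = cross(fv1, fv2, z)
    d2 = cross(fv2, fv0, z)
    has_neg = any(d < 0 for d in (d0, d1, d2))
    has_pos = any(d > 0 for d in (d0, d1, d2))
    return not (has_neg and has_pos)
```

The point of the theorem below is that this easily checkable assumption, unlike the sign change in one dimension, buys no worst case guarantee at all.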
We use arbitrary sequential information N with a predefined number n of evaluations and an arbitrary algorithm φ to solve our problem.
Since the error is measured by the absolute error criterion, we get

e(φ, N) = sup_{f∈F} ∥φ(N(f)) − S(f)∥₂.

The radius of information is now given by

r(N) = sup_{f∈F} rad(U(N(f))),    (73.1)

where rad(U(N(f))) is the radius of the set U(N(f)) of zeros of functions from F that share the same information with f,

U(N(f)) = { S(g) : g ∈ F, N(g) = N(f) }.

We prove that for any information N with an arbitrarily large number n of evaluations there exist two functions f and g in F having the same information, N(f) = N(g), such that ∥S(f) − S(g)∥₂ is arbitrarily close to diam(T). Combined with Eq. (73.1), this yields that the radius r(N) is at least diam(T)/2. By choosing the trivial algorithm φ(N(f)) = (0.5, 0.5), whose error is at most diam(T)/2, we get r(N) = diam(T)/2. Since the radius of information is a lower bound on the error of any algorithm, we conclude that there exists no method that solves our problem with error less than diam(T)/2. We formulate this in

Theorem 73.1 For every n and every sequential information N with n evaluations, the radius r(N) is equal to half the diameter of T, i.e.,

r(N) = diam(T)/2.
To prove this theorem we need a few auxiliary lemmas.

Auxiliary lemmas

The first lemma will be proved for an arbitrary number of dimensions. It is a generalization of Lemma 29.1 of Section 2.1.1. We let M be a compact region in ℝ^d, and let 𝓕 = C^∞(M) be the class of functions f : M → ℝ^d that are infinitely differentiable. We let C(∪_{i=1}^k B_i) denote the set of functions in 𝓕 that vanish outside ∪_{i=1}^k B_i, where the B_i are open balls in ℝ^d with respect to any norm ∥·∥. We also let L_i : 𝓕 → ℝ, i = 1, ..., k, be linearly independent linear functionals.
Lemma 74.1 For arbitrarily small positive ε and every family of balls B_i with B_i ∩ M ≠ ∅, i = 1, ..., k − 1, such that L_1, ..., L_{k−1} are linearly independent on C(∪_{i=1}^{k−1} B_i), there exists an open ball B_k with B_k ∩ M ≠ ∅ and diam(B_k) = ε such that L_1, ..., L_k are linearly independent on C(∪_{i=1}^k B_i).

Proof  We suppose that the lemma does not hold. Then for every B_k with B_k ∩ M ≠ ∅ and diam(B_k) = ε, the linear functionals L_1, ..., L_k are linearly dependent on C(∪_{i=1}^k B_i). Since by assumption L_1, ..., L_{k−1} are linearly independent on C(∪_{i=1}^{k−1} B_i), we must have

L_k(f) = Σ_{i=1}^{k−1} a_i(B_k) L_i(f)  for all f ∈ C(∪_{i=1}^k B_i).
We first assume that a_i(B_k) = a_i for all B_k (i.e., we let the constants of the summation be independent of the choice of B_k). Then we let B*_1, ..., B*_q, q < +∞, be an open ε-covering of M (i.e., each B*_j ⊆ ℝ^d is a ball with diam(B*_j) = ε, and M ⊆ ∪_{j=1}^q B*_j). This covering exists since M is compact. Then, by the partition of unity theorem, any function f ∈ C^∞(M) can be decomposed as

f = Σ_{j=1}^q f_j,  with f_j ∈ C(B*_j), j = 1, ..., q.

Therefore for all f ∈ C^∞(M) the above equation and the linearity of L_k imply

L_k(f) = Σ_{j=1}^q L_k(f_j) = Σ_{j=1}^q Σ_{i=1}^{k−1} a_i L_i(f_j) = Σ_{i=1}^{k−1} a_i L_i(f).

This equation contradicts the linear independence of L_1, ..., L_k on 𝓕. Therefore we must have a_i(B_k) ≢ a_i, so that there exist at least two balls B_{k,1} and B_{k,2} such that

L_k(f) = Σ_{i=1}^{k−1} β_i L_i(f)  for all f ∈ C(∪_{i=1}^{k−1} B_i ∪ B_{k,1})

and

L_k(f) = Σ_{i=1}^{k−1} γ_i L_i(f)  for all f ∈ C(∪_{i=1}^{k−1} B_i ∪ B_{k,2}),
where β_i = a_i(B_{k,1}), γ_i = a_i(B_{k,2}), and β_j ≠ γ_j for some j ∈ {1, ..., k − 1}. These equations imply

Σ_{i=1}^{k−1} (β_i − γ_i) L_i(f) = 0  for all f ∈ C(∪_{i=1}^{k−1} B_i).

Since β_j − γ_j ≠ 0, the last equation contradicts the linear independence of L_1, ..., L_{k−1} on C(∪_{i=1}^{k−1} B_i). Thus the lemma holds. ∎

The next lemma guarantees the existence of specific worst case functions.

Lemma 75.1 For every information N, every ε with 0 < ε < diam(T)/2, and every γ_n with 0 < γ_n ≤ a/2^{2n+3}, where a = √ε/8, there exist (1) a function F_n = (f_n^1, f_n^2) ∈ C^∞(T), (2) strips S_n^x, S_n^y defined below, and (3) balls B_i with diam(B_i) ≤ γ_i, i = 1, ..., k_n, where k_n is the maximal number of linearly independent functionals among L_{1,F_n}, ..., L_{n,F_n} on C^∞(T) (we denote these functionals by L*_1, ..., L*_{k_n}), such that:

(i) S_{n,2}^x − S_{n,1}^x ≥ a/2^{2n+1}, and

(ii) S_{n,2}^y − S_{n,1}^y ≥ a/2^{2n+1}

(i.e., the strips S_n^x and S_n^y are at least a/2^{2n+1} units wide);

(iii) dist(S_n^x, S_n^y) ≥ diam(T) − ε, where

dist(W, Z) = inf{ ∥w − z∥₂ : w ∈ W, z ∈ Z }

and dist(W, Z) = +∞ for W or Z empty;

(iv) dist(B_i, S_n^x) ≥ γ_n/2 and dist(B_i, S_n^y) ≥ γ_n/2, for i = 1, ..., k_n;

(v) the triangle T is an F_n-Sperner triangle;

(vi) F_n takes the prescribed form on the strips S_n^x and S_n^y specified in the proof.
Proof  We first define a function which is needed in the proof of this lemma. We let α and β (α < β) be fixed real numbers. We define the function

PL(x, α, β) = ( ∫_α^x ψ(t) dt ) / ( ∫_α^β ψ(t) dt ),  where ψ(t) = exp(−1/((t − α)(β − t))) for α < t < β and ψ(t) = 0 otherwise

(so that PL is C^∞, equals 0 for x ≤ α, equals 1 for x ≥ β, and its defining integrand ψ is symmetric about the midpoint (α + β)/2).

The proof goes by induction on n. We suppose first that n = 0, i.e., that we do not have any information. We construct a function F_0 which satisfies assertions (i)-(vi) of our lemma. We let

S_{0,1}^x = 1 − 2a,  S_{0,2}^x = 1 − a,  S_0^x = { (x, y) ∈ T : S_{0,1}^x ≤ x ≤ S_{0,2}^x },

and analogously S_0^y = { (x, y) ∈ T : 1 − 2a ≤ y ≤ 1 − a }, and define F_0 for all (x, y) ∈ T in terms of PL (see Figure 77.1). We note that the function F_0 satisfies assertions (i)-(vi) of our lemma. Indeed, the equation S_{0,2}^x − S_{0,1}^x = (1 − a) − (1 − 2a) = a ≥ a/2 implies (i). Similarly (ii) holds. The assertion (iii) holds since dist(S_0^x, S_0^y) = diam(T) − ε/2 − ε/2 = diam(T) − ε. We also observe that (iv) holds trivially since there is no B_i. To show (v) we note that F_0(0,0) = (1,1) and that F_0(1,0) and F_0(0,1) lie on opposite sides of the line y = x. Thus T is an F_0-Sperner triangle, since 0 ∈ Conv_{F_0}(T). Finally, condition (vi) holds by the definitions of F_0, S_0^x, and S_0^y.

We now assume that Lemma 75.1 holds for n − 1 with the function F_{n−1}. Then the information N with n evaluations yields the functional L_{n,F_{n−1}} (we recall that L_{1,F_0}(·) = L_1(·)). If L*_1, ..., L*_{k_{n−1}}, L_{n,F_{n−1}} are linearly dependent on C^∞(T), then F_n = F_{n−1} satisfies
Figure 77.1: Graph of F_0.

the lemma. Therefore we assume that they are linearly independent. We take γ_n ≤ min(γ_{n−1}/2, a/2^{2n+3}). Then by Lemma 74.1 there exists a ball B_{k_n} ⊆ T_{γ_n} = { z : dist(z, T) ≤ γ_n } with diam(B_{k_n}) = γ_n and k_n = k_{n−1} + 1, such that L*_1, ..., L*_{k_n} (with L*_{k_n} = L_{n,F_{n−1}}) are linearly independent on C(∪_{i=1}^{k_n} B_i). Two cases are possible:

Case (1). dist(B_{k_n}, S_{n−1}^x) ≥ γ_n/2 and dist(B_{k_n}, S_{n−1}^y) ≥ γ_n/2 (i.e., B_{k_n} is at least γ_n/2 away from both strips). Then letting F_n = F_{n−1}, S_n^x = S_{n−1}^x, and S_n^y = S_{n−1}^y, we conclude that (i)-(vi) are satisfied.

Case (2). dist(B_{k_n}, S_{n−1}^x) < γ_n/2 or dist(B_{k_n}, S_{n−1}^y) < γ_n/2 (i.e., B_{k_n} is within γ_n/2 of one of the strips). We note that it cannot be close to both strips at the same time, since they are separated by diam(T) − ε. We assume, without loss of generality, that dist(B_{k_n}, S_{n−1}^x) < γ_n/2. We let (b_{k_n}^x, b_{k_n}^y) be the center of the ball B_{k_n}. The two possible subcases are:

(a) b_{k_n}^x ≤ (S_{n−1,1}^x + S_{n−1,2}^x)/2 (i.e., the ball is centered in or to the left of the center of the strip S_{n−1}^x);

(b) b_{k_n}^x > (S_{n−1,1}^x + S_{n−1,2}^x)/2 (i.e., the ball is centered to the right of the center of the strip S_{n−1}^x).
We first consider the subcase (2a), and define the function h(x, y), built from PL, and let H_1(x, y) = (h(x, y), h(x, y)). We now take a function H^1 ∈ C(∪_{i=1}^{k_{n−1}} B_i) such that

L*_i(H^1) = −L*_i(H_1),  i = 1, ..., k_{n−1}.

Such a function exists since the functionals L*_i are linearly independent on C(∪_{i=1}^{k_{n−1}} B_i). Therefore

L_{j,F_{n−1}}(H_1 + H^1) = 0,  j = 1, ..., n,

since each L_{j,F_{n−1}} is a linear combination of the L*_i. We then denote (h_1, h_2) = H_1 + H^1, and choose a positive constant c small enough that F_{n−1} + c(h_1, h_2) preserves properties (i)-(iv) and (vi) (requirement (78.1)), and such that if there exists an index i such that B_i contains a vertex v of T, and if it is the case that

f_{n−1}^1(v) > f_{n−1}^2(v)  (respectively, <),

then it is also the case that

f_{n−1}^1(v) + c h_1(v) > f_{n−1}^2(v) + c h_2(v)  (respectively, <).    (78.2)

We note that requirement (78.2) is needed to guarantee that T is an (F_{n−1} + c(H_1 + H^1))-Sperner triangle. We finally let

F_n = F_{n−1} + c(H_1 + H^1),

S_n^x = { (x, y) ∈ T : (S_{n−1,1}^x + S_{n−1,2}^x)/2 + γ_n ≤ x ≤ S_{n−1,2}^x },

and S_n^y = S_{n−1}^y. Now we show that the properties (i)-(vi) are satisfied by these choices.
From the induction assumption, S_{n,1}^x ≥ 1 − 2a and S_{n,2}^x ≤ 1 − a. Moreover,

S_{n,2}^x − S_{n,1}^x = S_{n−1,2}^x − (S_{n−1,1}^x + S_{n−1,2}^x)/2 − γ_n = (S_{n−1,2}^x − S_{n−1,1}^x)/2 − γ_n ≥ a/(2·2^{2(n−1)+1}) − a/2^{2n+3} = a(2^{−2n} − 2^{−2n−3}) ≥ a/2^{2n+1}.

This implies (i). Property (ii) is obviously true since S_n^y = S_{n−1}^y. Property (iii) is satisfied since S_n^x ⊆ S_0^x and S_n^y ⊆ S_0^y, and (iii) is satisfied for n = 0. Property (iv) holds for B_i, i = 1, ..., k_{n−1}, from the induction assumption. For dist(B_{k_n}, S_n^y) we have dist(B_{k_n}, S_n^y) = dist(B_{k_n}, S_{n−1}^y) ≥ γ_n/2, since B_{k_n} is within γ_n/2 of S_{n−1}^x, which is at distance at least diam(T) − ε from S_{n−1}^y. (Since we are considering the subcase (2a) we know that (S_{n−1,1}^x + S_{n−1,2}^x)/2 ≥ b_{k_n}^x ≥ S_{n−1,1}^x − γ_n.) As for dist(B_{k_n}, S_n^x), we have b_{k_n}^x ≤ (S_{n−1,1}^x + S_{n−1,2}^x)/2 and diam(B_{k_n}) = γ_n. Thus we can conclude that dist(B_{k_n}, S_n^x) ≥ S_{n,1}^x − (b_{k_n}^x + γ_n/2) ≥ γ_n/2. Properties (v) and (vi) are satisfied since the choice of the constant c was small enough to meet requirements (78.1) and (78.2). These arguments imply that the lemma holds in the case (2a). In the case (2b), we proceed as in the case (2a), replacing the function h(x, y) by its mirror image with respect to the center of the strip S_{n−1}^x.
This finally completes the proof of Lemma 75.1. ∎

Proof of Theorem 73.1  The proof is based on the construction of two functions from the class F having the same information, with zeros separated by diam(T) − ε, for arbitrarily small positive ε. We begin by defining a function U : ℝ² → ℝ², U(x, y) = (u_1(x, y), u_2(x, y)), where (compare Figs. 81.1 and 81.2)
u_1 and u_2 are pieced together from PL transitions in x and y, respectively; their graphs are shown in Fig. 81.2.
We now take a function W : ℝ² → ℝ², W = (w_1(x, y), w_2(x, y)) ∈ C(∪_{i=1}^{k_n} B_i), such that N(W) = −N(U). Again the function W exists since the functionals L*_i are linearly independent on C(∪_{i=1}^{k_n} B_i). Next, we let F_n = (f_n^1, f_n^2) be the function from Lemma 75.1, and choose a constant c so small that the properties established in Lemma 75.1 are preserved for j = 1, 2, and such that if there exists an index j such that B_j contains a vertex v of T, and if f_n^1(v) > f_n^2(v) (respectively, <), then (f_n^1 + c(w_1 + u_1))(v) > (f_n^2 + c(w_2 + u_2))(v) (respectively, <). We define the function G¹ = F_n + c·(U + W). We then note that N(G¹) = N(F_n), that T is a G¹-Sperner triangle, and that G¹ has exactly one zero α, which is located inside the strip S_n^x at the intersection of the lines x = (S_{n,1}^x + S_{n,2}^x)/2 and y = (1 − S_{n,2}^x)/2. Thus, α = (α_1, α_2) = ((S_{n,1}^x + S_{n,2}^x)/2, (1 − S_{n,2}^x)/2). To see that α is a simple zero we calculate the Jacobian Jac(G¹) at α.
Figure 81.1: Graph of u_1 + F_0.
Figure 81.2: Graphs of the pieces defining u_1.
We observe that on S_n^x we have F_n = W = (0, 0), so we can reduce this equation to

Jac(G¹)|_α = c² Jac(U)|_α.

Using the fact that on S_n^x one partial derivative of each component of U vanishes, the Jacobian reduces to the product of the two remaining partial derivatives of u_1 and u_2 at α. By recalling the definition of U we note that in a sufficiently small neighborhood of α these derivatives are derivatives of PL transitions evaluated at the midpoints of their transition intervals. We note that this occurs where PL = 0.5; since the integrand defining PL is symmetric with respect to the midpoint of its interval, the value of the integral at the midpoint is half of the total integral. The derivative of PL at the midpoint equals c*, where c* is a nonzero constant. Therefore we can conclude that Jac(G¹)|_α ≠ 0 and that G¹ is a member of the class F.

Next, we similarly construct a function G² in F with one simple zero in S_n^y, such that N(G²) = N(F_n). Therefore N(G¹) = N(G²). This means that for arbitrarily small ε we have constructed two functions in our class with the same information, whose zeros are separated by at least diam(T) − ε. (We recall that these zeros lie in S_n^x and S_n^y, which by Lemma 75.1 are separated by diam(T) − ε.) Theorem 73.1 follows by taking the limit as ε → 0. ∎
2.2.2  Lipschitz Functions
In this section we analyze the class of Lipschitz continuous functions. We utilize the residual error criterion. We assume that the functions transform the d-dimensional unit cube B into the set of real numbers ℝ, or into the p-dimensional real space ℝ^p. We seek a point x* such that ∥f(x*)∥_∞ ≤ ε. We find an optimal algorithm which uses adaptive information with a predefined number of evaluations n, and find almost optimal information, which consists of n parallel function evaluations at equispaced points x_j in the cube B. We also prove that a simple search algorithm that yields a point x* = x_k such that ∥f(x_k)∥_∞ = min_{1≤j≤n} ∥f(x_j)∥_∞ is nearly optimal. The complexity of the problem turns out to be roughly equal to (K/ε)^d, where K is the Lipschitz constant.

Formulation of the problem

We define B = [0, 1]^d to be the d-dimensional unit cube in ℝ^d, and ∥x∥ = max_{1≤i≤d} |x_i|. We let 𝓕 be the class of functions f : B → ℝ satisfying the Lipschitz condition with constant K,

|f(x) − f(y)| ≤ K ∥x − y∥  for all x, y ∈ B.

We let F be the subclass of 𝓕 defined by F = { f ∈ 𝓕 : there exists a ∈ B such that f(a) = 0 }. The solution operator is now given by

S(f, ε) = { x ∈ B : |f(x)| ≤ ε }

for every f ∈ F. The problem is as follows: for any function f in F, find a residual ε-approximation to a zero of f, i.e., a point x* ∈ S(f, ε).

Remark  One may wish to solve the above problem in the class of functions g : B → ℝ^p, p > 1, satisfying the Lipschitz condition and having a zero in B. This problem is, however, equivalent to the case p = 1.
Indeed, for a given g : B → ℝ^p we define the function f : B → ℝ by

f(x) = ∥g(x)∥_∞.

This f satisfies the Lipschitz condition with the same constant as g. We also note that f has a zero in B, and that S(f, ε) = { x ∈ B : ∥g(x)∥ ≤ ε }. Therefore we may assume without loss of generality that p = 1. ∎

To solve our problem we use sequential methods with arbitrary linear information N of predefined cardinality n. We recall that for the residual error criterion the error of an algorithm is defined as

e(φ) = sup_{f∈F} |f(φ(N(f)))|,
and it can be expressed in terms of the local error e(φ, f) as

e(φ) = sup_{f∈F} e(φ, f),

where

e(φ, f) = sup{ |f̃(φ(N(f)))| : f̃ ∈ F, N(f̃) = N(f) }.
The radius of information in this case is given by

r(N) = sup_{f∈F} r(N, f),

where r(N, f) is the local radius of information given by

r(N, f) = inf_{x∈B} sup{ |f̃(x)| : f̃ ∈ F, N(f̃) = N(f) }.

In this section, we solve the following problems:

1. Find optimal information N^o and almost optimal information N^{ao}.
2. Find the minimal cardinality number m(ε).
3. Find optimal, strongly optimal, and almost optimal algorithms.
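The "simple search algorithm" announced at the start of this section can already be sketched (an illustration with our own names, not the book's pseudocode; the equispaced nodes anticipate the optimal covering derived below):

```python
import math
from itertools import product

def grid_min_residual(f, K, eps, d):
    """Residual eps-approximation for a Lipschitz-K function f with a zero
    in B = [0,1]^d: evaluate f at the A^d equispaced cube centers and return
    the node with the smallest |f|.  Some center lies within 1/(2A) of a
    zero (max norm), so the returned residual is at most K/(2A) <= eps."""
    A = math.ceil(K / (2 * eps))
    nodes = [tuple((2 * j - 1) / (2 * A) for j in js)
             for js in product(range(1, A + 1), repeat=d)]
    return min(nodes, key=lambda x: abs(f(x)))

# hypothetical example: f(x) = x1 + x2 - 0.5, Lipschitz with K = 2 in the max norm
x_star = grid_min_residual(lambda x: x[0] + x[1] - 0.5, K=2.0, eps=0.05, d=2)
```

The number of evaluations grows like (K/(2ε))^d, which is exactly the complexity behavior established in this section.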
Information consisting of function evaluations

We first analyze sequential information consisting of function evaluations,

N(f) = [ f(x_1), f(x_2), ..., f(x_n) ],

where x_1 ∈ B is some point chosen a priori, and x_i is an arbitrary transformation x_i : ℝ^{i−1} → B, i.e., x_i = x_i(f(x_1), ..., f(x_{i−1})). Our first goal is to derive formulas for the local radius of information.

Local radius of information

We first make several observations on the functions in our class. We let y = N(f), i.e., y_j = f(x_j), j = 1, ..., n, and define the set

Z = Z(y) = { z ∈ B : f̃(z) = 0 for some f̃ ∈ F with N(f̃) = y }.

Thus Z is the set of zeros of all functions f̃ in F that share the information with the function f. The definition of the class 𝓕 implies that for all x ∈ B and all f̃ such that N(f̃) = N(f),

|f̃(x) − y_j| ≤ K ∥x − x_j∥,  j = 1, ..., n.    (85.1)

We define the functions

g⁻(x) = max_{1≤j≤n} ( y_j − K ∥x − x_j∥ )  and  g⁺(x) = min_{1≤j≤n} ( y_j + K ∥x − x_j∥ ).

These definitions imply that g⁻ and g⁺ are lower and upper bounds for the functions f̃ that share the information with f, i.e., g⁻(x) ≤ f̃(x) ≤ g⁺(x) for x ∈ B and f̃ ∈ F such that N(f̃) = N(f). We let B(x, r) = { y ∈ ℝ^d : ∥y − x∥ ≤ r } be the cube with center x and radius r. It is clear that

Z ⊆ B − ∪_{j=1}^n Int( B(x_j, |y_j|/K) ),    (86.1)
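The envelopes g⁻ and g⁺ are directly computable from the data y = N(f); a small sketch (with hypothetical sample data) in the max norm:

```python
def lipschitz_envelopes(xs, ys, K):
    """Pointwise bounds g-(x) <= f(x) <= g+(x), valid for every Lipschitz-K
    function f (max norm) with f(xs[j]) = ys[j] for all j."""
    def dist(u, v):
        return max(abs(a - b) for a, b in zip(u, v))
    def g_minus(x):
        return max(y - K * dist(x, xj) for xj, y in zip(xs, ys))
    def g_plus(x):
        return min(y + K * dist(x, xj) for xj, y in zip(xs, ys))
    return g_minus, g_plus

# hypothetical data consistent with some Lipschitz-1 function
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [0.5, -0.25, 0.75]
gm, gp = lipschitz_envelopes(xs, ys, K=1.0)
```

The envelopes touch the data (gm = gp = y_j at each node x_j), and the set Z above is exactly the region where gm ≤ 0 ≤ gp.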
where Int(A) denotes the interior of the set A. We take any z ∈ B − ∪_{j=1}^n Int(B(x_j, |y_j|/K)) and define the function f̃ by

f̃(x) = max( g⁻(x), −K ∥x − z∥ ).

This f̃ satisfies the Lipschitz condition with the constant K. Moreover, Eq. (85.1) implies that for all j, K ∥z − x_j∥ ≥ |y_j|. Thus we get g⁻(z) ≤ 0 and f̃(z) = 0. Similarly N(f̃) = N(f), which means that f̃ ∈ F and z ∈ Z. We conclude that equality holds in (86.1),

Z = B − ∪_{j=1}^n Int( B(x_j, |y_j|/K) ).

We now define B(c_n, r_n) as a cube of minimal radius r_n containing the set Z, and set D_n = K r_n. We select any f̃ ∈ F such that N(f̃) = N(f). Then there exists a zero z ∈ Z of f̃ such that ∥c_n − z∥ ≤ r_n. Hence

|f̃(c_n)| = |f̃(c_n) − f̃(z)| ≤ K ∥c_n − z∥ ≤ D_n.    (86.2)

We also observe that

|f̃(x)| ≤ D_n + K ∥x − c_n∥  for all x ∈ B.

Define the functions f⁻ and f⁺ by

f⁻(x) = max( g⁻(x), −D_n − K ∥x − c_n∥ )  and  f⁺(x) = min( g⁺(x), D_n + K ∥x − c_n∥ ).

The above inequality and the bounds provided by g⁻ and g⁺ imply

f⁻(x) ≤ f̃(x) ≤ f⁺(x)    (86.3)

for all x ∈ B. This means that f⁻ and f⁺ are the envelopes of the functions f̃ with the above properties. We are now ready to prove

Lemma 86.1 We let I = { i : y_i > 0 } and J = { j : y_j < 0 }. Then for any function f ∈ F, the local radius of information is given by

r(N, f) = min{ |y_1|, ..., |y_n|, D_n, d_n },    (87.1)

where

d_n = min_{i∈I, j∈J} ( K ∥x_i − x_j∥ − y_i + y_j ) / 2,    (87.2)

assuming that the minimum over an empty set equals +∞.
Proof  If y_p = 0 for some p then obviously r(N, f) = 0. Thus we assume that y_p ≠ 0 for all p. We note that ∥x_i − x_j∥ > y_i/K − y_j/K for all i ∈ I and j ∈ J, which implies that d_n > 0. We denote D = min{ |y_1|, ..., |y_n|, D_n, d_n } and first prove that for any function f ∈ F we have

r(N, f) ≤ D.

By setting x = x_p in the definition of the local radius of information we get r(N, f) ≤ |y_p| = |f(x_p)|. In a similar way, by taking x = c_n, inequality (86.2) yields r(N, f) ≤ D_n. Thus it is enough to prove that

r(N, f) ≤ d_n

for any function f ∈ F. To this end, for nonempty I and J we choose i_0 ∈ I and j_0 ∈ J such that

d_n = ( K ∥x_{i_0} − x_{j_0}∥ − y_{i_0} + y_{j_0} ) / 2.

We define the point p by

p ∈ [x_{i_0}, x_{j_0}] ∩ ∂B( x_{i_0}, (y_{i_0} + d_n)/K ),

where [x_{i_0}, x_{j_0}] denotes the interval joining x_{i_0} and x_{j_0}, and ∂B(x, r) is the boundary of B(x, r). We now prove that

|f̃(p)| ≤ d_n  for every f̃ ∈ F with N(f̃) = N(f).

The definition of p yields K ∥p − x_{i_0}∥ = y_{i_0} + d_n and K ∥p − x_{j_0}∥ = K ∥x_{i_0} − x_{j_0}∥ − y_{i_0} − d_n. Therefore

f⁺(p) ≤ g⁺(p) ≤ y_{j_0} + K ∥p − x_{j_0}∥ = y_{j_0} + K ∥x_{i_0} − x_{j_0}∥ − y_{i_0} − d_n = 2d_n − d_n = d_n.

The definition of f⁺ yields the first inequality above. Similarly we can show that f⁻(p) ≥ −d_n. Thus inequality (86.3) yields |f̃(p)| ≤ d_n for every f̃ ∈ F such that N(f̃) = N(f). Therefore
the definition of the local radius of information immediately yields Eqs. (87.2) and (87.1). We now prove that r(N, f) ≥ D. To this end, for an arbitrary x_0 ∈ B we construct a function f̃ in F such that N(f̃) = N(f) and

|f̃(x_0)| ≥ D.    (88.1)

We define f̃ by

f̃(x) = max( f⁻(x), D − K ∥x − x_0∥ ).

This f̃ satisfies the Lipschitz condition with the constant K. We suppose first that x_0 ∉ ∪_{j∈J} B(x_j, (D − y_j)/K). If j ∈ J then ∥x_0 − x_j∥ > (D − y_j)/K. Thus D − K ∥x_j − x_0∥ < y_j. This implies that f̃(x_j) = f(x_j). If i ∈ I then D − K ∥x_i − x_0∥ ≤ y_i, so f̃(x_i) = f(x_i). Thus N(f̃) = N(f). From the definition of r_n there exists z ∈ Z(N(f)) such that D − K ∥z − x_0∥ ≤ 0. Obviously f⁻(z) ≤ 0, which yields f̃(z) ≤ 0. Thus f̃ has a zero in B, since f̃(x_0) ≥ D > 0. Therefore f̃ ∈ F, and inequality (88.1) holds. Similarly we can show Eq. (88.1) in the case x_0 ∈ ∪_{j∈J} B(x_j, (D − y_j)/K). We note that Eq. (88.1) yields

sup{ |f̃(x_0)| : f̃ ∈ F, N(f̃) = N(f) } ≥ D.

Since x_0 was chosen arbitrarily, we get r(N, f) ≥ D. This combined with Eq. (87.1) completes the proof of the lemma. ∎

Optimality results
In this part we derive formulas for optimal information and algorithms. We first assume that the number of evaluations n is of the form n = A^d − 1 for some integer A > 1. The case of general n will be discussed later. We let

R = 1/(2A)

and define the set X* by

X* = { ( (2j_1 − 1) R, ..., (2j_d − 1) R ) : j_i ∈ {1, ..., A}, i = 1, ..., d }.
The set X* has A^d = n + 1 elements. We let x*_1, ..., x*_{n+1} be the distinct elements of X*, i.e., X* = { x*_1, ..., x*_{n+1} }. We note that X* is the set of centers of the cubes B(x*_j, R) that form an optimal covering of B. Here optimal covering means that B ⊆ ∪_{j=1}^{n+1} B(x*_j, R), and for every points z_j such that B ⊆ ∪_{j=1}^{n+1} B(z_j, r) it may be shown that r ≥ R. We now fix i ∈ {1, 2, ..., n+1} and define the parallel information N_i by

N_i(f) = [ f(x*_1), ..., f(x*_{i−1}), f(x*_{i+1}), ..., f(x*_{n+1}) ].

We note that we do not compute f(x*_i), and therefore the number of evaluations is n. We are now ready to prove the optimality of the information N_i.

Theorem 89.1 For every i ∈ {1, 2, ..., n+1} the information N_i is optimal and

r(N_i) = inf_N r(N) = K/(2A),

where n = A^d − 1.

Proof  We denote v = KR = K/(2A). We start the proof by showing that

r(N_i) ≤ v.    (89.1)

We select an arbitrary f in F and let y_j = f(x*_j) for all j ≠ i. We first suppose that there exists an index j such that |y_j| ≤ v. Then Lemma 86.1 yields r(N_i, f) ≤ v. We can thus assume that |y_j| > v for all j. We let z ∈ Z = Z(N_i(f)). Then the definition of Z on page 85 implies z ∉ Int B(x*_j, |y_j|/K). Thus ∥z − x*_j∥ ≥ v/K = R for all j ≠ i, which means that z ∈ B(x*_i, R) and consequently Z ⊆ B(x*_i, R). From the definition of D_n we conclude that D_n ≤ v, which combined with Lemma 86.1 implies r(N_i, f) ≤ v. Hence for all f ∈ F we have r(N_i, f) ≤ v. By taking the supremum over f we get inequality (89.1).

We now show that for every information N there exists a function g ∈ F such that

r(N, g) ≥ v.

We recall that N(f) = [f(x_1), ..., f(x_n)], where x_1 is given a priori and x_i = x_i(f(x_1), ..., f(x_{i−1})). Define the function g by
g(x) = max( v − K min_{1≤j≤n} ∥x − z_j∥, 0 )

for any x ∈ B, where z_1 = x_1 and z_i = x_i(v, v, ..., v) (with i − 1 arguments v). Then

N(g) = (v, v, ..., v).
This g satisfies a Lipschitz condition with the constant K. To guarantee that g ∈ F it is enough to show that the set of zeros of g, S(g) = { z ∈ B : g(z) = 0 }, is not empty. The definition of g yields

S(g) = B − ∪_{j=1}^n Int B(z_j, R).

We suppose to the contrary that S(g) = ∅. This implies that B ⊆ ∪_{j=1}^n Int B(z_j, R), and Equation (86.1) then yields Z = ∅. Therefore it is enough to prove that Z = Z(N(g)) ≠ ∅. We shall show more by proving that

Vol(Z) ≥ A^{−d} > 0,

where Vol denotes the d-dimensional volume. To this end we observe that

Vol(Z) ≥ Vol(B) − Σ_{j=1}^n Vol( B(z_j, R) ) = 1 − n (2R)^d = 1 − (A^d − 1) A^{−d} = A^{−d}.

This yields that g has a zero and belongs to F. From the above inequality we conclude that the radius of Z is at least R. Thus the definition of D_n on page 86 yields D_n ≥ v. Finally we conclude from Lemma 86.1 that r(N, g) = v. This proves that r(N) ≥ v for any information N. Inequality (89.1) combined with this yields Theorem 89.1. ∎

Theorem 89.1 says that parallel information is optimal. Thus even if sequential evaluations are permitted, they do not help. The nodes of the optimal information are given a priori. Further discussion of parallel versus sequential methods is outlined in the annotations to this chapter.

Minimal cardinality number
We now want to find the minimal cardinality of information N such that r(N) ≤ ε, i.e., the minimal cardinality number m(ε). We note
that Theorem 89.1 states that N_i is optimal if the number of evaluations n is of the special form n = A^d − 1. We could not find the exact radius for arbitrary n. We can, however, prove the following result.

Theorem 91.1 The minimal cardinality number m(ε) is given by

m(ε) = ( ⌈K/(2ε)⌉ + a )^d − 1,

where a ∈ (−1, 0].

Proof  Theorem 89.1 states that r(N_i) = K/(2A) with n = A^d − 1. To guarantee that r(N_i) ≤ ε we choose the minimal n* = (A*)^d − 1 such that

K/(2A*) ≤ ε

for some integer A*. This yields A* = ⌈K/(2ε)⌉. We suppose that n ≤ q = (A* − 1)^d − 1. Then for arbitrary information N we have

r(N) ≥ K/(2(A* − 1)) > ε.

We now conclude that

(A* − 1)^d − 1 < m(ε) ≤ (A*)^d − 1.

This can be rewritten as m(ε) = (A* + a)^d − 1 with some a ∈ (−1, 0]. Hence, Theorem 91.1 is proven. ∎

We illustrate this theorem by selecting K = 2, which gives m(ε) essentially equal to (1/ε)^d. We note that m(ε) depends strongly on the dimension of the problem. Suppose that we could solve the problem using n = 10^6 function evaluations. Then the accuracy ε that can be guaranteed is no better than 10^{−6/d}. Thus for d = 1 we get ε ≥ 10^{−6}, for d = 3, ε ≥ 10^{−2}, and for d = 6, ε ≥ 10^{−1}. We now wish to develop optimal algorithms.

Optimal algorithms
We let N be any information, and define an algorithm φ by

φ(y) = x_p  if D = |y_p|,  φ(y) = c_n  if D = D_n,  and  φ(y) = p  if D = d_n,
where D, D_n, c_n, d_n, and p are defined as in Lemma 86.1. Then the definition of the local error of an algorithm and Lemma 86.1 imply that

e(φ, f) = r(N, f)  for every f ∈ F.

Therefore φ is a strongly optimal algorithm. The combinatory complexity of φ, i.e., the cost of computing φ(y) for a given y = N(f), may be large, since it requires the computation of a center c_n of the set Z(y). It is an interesting combinatorial problem to find the minimal cost of computing the center of the set B − ∪_{j=1}^n Int(B(x_j, b_j)). For the optimal information N_i we propose an algorithm that has combinatory cost linear in n. We recall that v = KR, and define φ* by

φ*(y) = x*_j  if |y_j| ≤ v,  and  φ*(y) = x*_i  otherwise,

where |y_j| = min{ |y_1|, ..., |y_{i−1}|, |y_{i+1}|, ..., |y_{n+1}| }. Thus the computation of w = φ*(y) for a given y = N_i(f) requires only n comparisons. The above definition of φ* yields

e(φ*, f) ≤ v  for every f ∈ F,

since, as in the proof of Eq. (89.1), either the local error at x*_j equals |y_j| ≤ v, or Z ⊆ B(x*_i, R) and D_n ≤ v. By Theorem 89.1, r(N_i) = v, which gives e(φ*) = r(N_i). Hence φ* is optimal. We summarize these results in the following theorem.

Theorem 92.1 The algorithm φ* is optimal, but not strongly optimal. The combinatory cost of φ* is equal to the cost of n comparisons.

We stress that to compute φ*(y) we need to know the Lipschitz constant K. Thus we propose a third algorithm, which is almost optimal, does not require the knowledge of K, and has combinatory cost linear in n. We define φ** by

φ**(y) = x*_j,

where |y_j| = min{ |y_1|, ..., |y_{i−1}|, |y_{i+1}|, ..., |y_{n+1}| }.

We first find the error of this algorithm. We let f be a function, f : B → ℝ, satisfying the Lipschitz condition with the constant K. If |f(x*_j)| > 2v for all j, then the set Z = { z ∈ B : f(z) = 0 } is
empty. Thus f ∉ F. This implies that for every f ∈ F there exists j such that |f(x*_j)| ≤ 2v. Hence

e(φ**) ≤ 2v.    (93.1)

We let

g(x) = 2v − K min_{j≠i} ∥x − x*_j∥.

Then g satisfies the Lipschitz condition with the constant K. Furthermore, g(x*_i) = 0, since there exists j_0 such that ∥x*_i − x*_{j_0}∥ = 2R and ∥x*_j − x*_i∥ ≥ 2R for all j ≠ i. This, Eq. (93.1), and the definition of the local error of an algorithm yield

e(φ**) = 2v = 2 r(N_i).

This equation says that φ** is almost optimal. The combinatory cost of φ** is equal to the cost of n − 1 comparisons. We summarize these results in the following theorem.

Theorem 93.1 (i) The algorithm φ** is almost optimal, since its error is twice the radius of information,

e(φ**) = 2 r(N_i),

but it is not strongly optimal. (ii) The computation of φ**(y) does not require the knowledge of the constant K. (iii) The combinatory cost of φ** is equal to the cost of n − 1 comparisons.

General information

So far we were dealing with information consisting of n sequential function evaluations. Here we study sequential information consisting of n arbitrary sequential linear functional evaluations. It is surprising that even in this general class 𝓝 the parallel information N_i is almost optimal. This is shown in the following theorem.

Theorem 93.2 We let n = A^d − 2 and n_1 = (A − 1)^d − 1 for some integer A > 1. Then

K/(2A) ≤ inf_{N∈𝓝} r(N) ≤ r(N_j) = K/(2(A − 1)),    (93.2)

where each N_j is composed of n_1 evaluations, j = 1, ..., (A − 1)^d.
Proof  Since n_1 ≤ n we get

r(N_j) = K/(2(A − 1)),  j = 1, ..., (A − 1)^d,

by Theorem 89.1. This establishes the two rightmost relations in Eq. (93.2). Therefore it is enough to show that

r(N) ≥ K/(2A)

for every N from 𝓝 with n evaluations. To this end we let R̄(x) = R = K/(2A), x ∈ B, be the constant function with value R. By applying the information N to the function R̄ we get the parallel information

N(R̄) = [ L_1(R̄), ..., L_n(R̄) ]

with fixed linear functionals L_j. We define

H_i(x) = max( 0, R − K ∥x − x*_i∥ )

for i = 1, ..., n + 2, where x*_1, ..., x*_{n+2} are the A^d = n + 2 centers of the optimal covering of B by cubes of radius 1/(2A). We let (c_1, ..., c_{n+1}) be a nonzero solution of the homogeneous system of n linear equations with n + 1 unknowns,

Σ_{i=1}^{n+1} c_i L_j(H_i) = 0,  j = 1, ..., n.

We let |c_k| = max_{1≤i≤n+1} |c_i| and define

f = R̄ − (1/c_k) Σ_{i=1}^{n+1} c_i H_i.

The function f satisfies the Lipschitz condition with the constant K and has a zero in B, since f(x*_k) = 0. Therefore f belongs to the class F. We also note that N(f) = N(R̄). We now choose an arbitrary point x_0 ∈ B. Then there exists an index i_0 such that x_0 ∈ B(x*_{i_0}, 1/(2A)).
As before, we let (c_1, ..., c_{i_0−1}, c_{i_0+1}, ..., c_{n+2}) be a nonzero solution of the system

Σ_{i≠i_0} c_i L_j(H_i) = 0,  j = 1, ..., n.

We let |c_k| = max{ |c_i| : i ≠ i_0 }, and define the function f_1 by

f_1 = R̄ − (1/c_k) Σ_{i≠i_0} c_i H_i.

We note that f_1(x) = R for x ∈ B(x*_{i_0}, 1/(2A)) and that f_1 belongs to F. It is crucial to notice that

N(f_1) = N(R̄) = N(f).

Thus for every information N ∈ 𝓝 we constructed a function f ∈ F, and for every x_0 ∈ B we constructed a function f_1 ∈ F, such that N(f) = N(f_1) and f_1(x_0) = R. The definition of the local radius of information now yields

r(N, f) ≥ R = K/(2A).

This proves the leftmost relation in Eq. (93.2), and completes the proof of Theorem 93.2. ∎

We obtain the following corollary on the minimal cardinality number m(ε) in the class 𝓝 of information.

Corollary 95.1 The minimal cardinality number m(ε) in the class 𝓝 is given by

m(ε) = ( K/(2ε) )^d ( 1 + o(1) )  as ε → 0.

Proof  Theorem 91.1 implies

m(ε) ≤ ( ⌈K/(2ε)⌉ )^d − 1.
We choose the maximal integer A such that K/(2A) ≥ ε, i.e., A = K/(2ε) − b for some b ∈ [0, 1]. Theorem 93.2 yields that r(N) ≥ K/(2A) for n = A^d − 2. Thus m(ε) has to satisfy

m(ε) ≥ A^d − 1 = ( K/(2ε) − b )^d − 1.

These bounds imply that m(ε) = (K/(2ε))^d (1 + o(1)), which completes the proof. ∎

Complexity of the problem

We summarize our results by developing complexity bounds for the solution of our problem. We recall that comp(ε) is the minimal cost of solving our problem; it is equal to the cost of computing information N with the number of evaluations equal to m(ε), plus the minimal combinatory cost of an optimal algorithm using N. We assume that the cost of each functional evaluation is c and that the basic arithmetic operations and comparisons cost unity. Furthermore, we assume that any optimal algorithm has to use each information sample y_j at least once, which yields that its combinatory cost must be at least m(ε) − 1. In particular, this implies that the algorithm φ* introduced earlier has almost minimal combinatory cost. We summarize this discussion in the following theorem.

Theorem 96.1 (i) The complexity of our problem is

comp(ε) = m(ε)(c + b),

where b ∈ [ 1 − 1/m(ε), 1 ].

(ii) The cost comp(M*) of the method M* = (φ*, N_i), which is the sum of the cost of computing the information N_i with m(ε) evaluations and the combinatory cost of φ*, is almost minimal, since

comp(M*) = m(ε)(c + b*),

where b* ∈ [ 1 − 1/m(ε), 1 ].

(iii) The cost comp(M**) of the method M** = (φ**, N_i), which does not require the knowledge of K, is

comp(M**) = m(ε/2)(c + b**),

where b** ∈ [ 1 − 1/m(ε/2), 1 ]

and m(ε/2) = (K/ε)^d (1 + o(1)). Thus, asymptotically,

comp(M**) = 2^d comp(ε) (1 + o(1)),  as ε → 0.
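The exponential dependence on d summarized in Theorem 96.1 is easy to tabulate; a small sketch (ours, not the book's) of the best guaranteed accuracy K/(2A) obtainable from a budget of n = A^d − 1 evaluations:

```python
def guaranteed_accuracy(n, d, K):
    """Best worst case residual accuracy achievable with n Lipschitz-K
    function evaluations on [0,1]^d, using the largest A with A^d - 1 <= n
    and the optimal radius K/(2A) of Theorem 89.1."""
    A = int(round((n + 1) ** (1.0 / d)))
    while A ** d > n + 1:          # guard against floating point overshoot
        A -= 1
    while (A + 1) ** d <= n + 1:   # ... and undershoot
        A += 1
    return K / (2 * A)

# curse of dimensionality: a million evaluations, K = 2
for d in (1, 3, 6):
    print(d, guaranteed_accuracy(10**6, d, K=2))
```

With 10^6 evaluations the guaranteed accuracy degrades from about 10^{-6} in dimension 1 to 10^{-2} in dimension 3 and 10^{-1} in dimension 6, as in the illustration after Theorem 91.1.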
Remarks  It is important to note that our negative complexity result depends significantly on the class of functions F. Indeed, if we take the following class F_1,

F_1 = { f : [0,1] → ℝ : K_1 |x − y| ≤ |f(x) − f(y)| ≤ K_2 |x − y| for all x, y ∈ [0,1], and there exists a ∈ [0,1] such that f(a) = 0 },

then we get different results. This F_1 is the class of functions satisfying a two-way Lipschitz condition with the constants K_1 and K_2, 0 < K_1 < K_2, and having a zero in [0,1]. Following our proofs we can show that

r(N_i) = (K_2 − K_1)/(2(n + 1))

for n ≥ 2 and i ∈ {1, ..., n + 1}. Therefore, m(ε) is not greater than n* = max( ⌈(K_2 − K_1)/(2ε)⌉ − 1, 2 ). We can prove that there exists an optimal algorithm using N_i with combinatory cost not greater than c_1 m(ε), where c_1 is a constant. Therefore, the complexity comp(ε) is not greater than n*(c + c_1). We note that for K_1 close enough to K_2 the complexity comp(ε) is essentially equal to 2c. This is intuitively obvious, since for K_1 tending to K_2 the class F_1 shrinks to a class consisting of linear functions, and a linear function is uniquely determined by its values at two distinct points. It is an open problem to generalize the above result for the class F_1 to the d-dimensional case. We conjecture that the complexity is roughly ( ⌈(K_2 − K_1)/(2ε)⌉ − 1 )^d (c + 1). ∎

2.2.3  Exercises
12. (*) Verify all claims presented in Remarks on page 97 for the one dimensional case d = 1.
13. (**) Verify the conjecture in Remarks on page 97 for the multivariate case d > 1.

14. (**) Find the minimal cost of computing the center c_n of the set Z = B − ∪_{j=1}^n Int(B(x_j, b_j)), defined on page 85, and needed in the implementation of the strongly optimal algorithm φ.
2.3  Annotations

2.3.1  Overview and brief history
Substantial literature treats the problem of finding worst case optimal methods for approximating zeros of nonlinear algebraic equations. A partial list for worst case results includes Booth [8], Chernousko [17], Eichorn [20], Gross and Johnson [29], Hyafil [32], Kiefer [40], [41], Micchelli and Miranker [53], Renegar [68]-[70], Schonhage [77], [78], Smale [100], Sikorski [93], [94], Sikorski and Wozniakowski [98], and Sukharev [107], who consider the scalar case; and Boult and Sikorski [9], Majstrovskij [49], Nemirovsky and Yudin [56], Nesterov and Nemirovsky [57], Renegar [70], [72], Shub and Smale [81], [82], Sikorski [94], and Todd [110], who consider the multivariate case. A partial list for the average case analysis of univariate problems includes Novak [58]-[60], Novak and Ritter [61], [62], Novak, Ritter, and Wozniakowski [63], Ritter [75], and Shub and Smale [80]. We briefly recall some of the worst case results. The iterative model of computation and selected average case results are briefly presented below.

The first optimality results are to be found in 1953, in the Master's thesis [41] of Kiefer. He considers the problem of approximating the maximum of scalar unimodal functions. The information consists of n sequential function evaluations. He proves that the Fibonacci search is the optimal method. In the 1960s researchers concentrated their attention on unimodal and convex classes of functions. Optimal algorithms for zeros of functions and their derivatives were derived, e.g., in the work of Gross and Johnson [29], Booth [8], Chernousko [17], and Eichorn [20]. Gross and Johnson consider the class of convex continuous functions changing sign at the endpoints of an interval. They study optimal sequential methods based on function evaluations. Booth studies the location of zeros of derivatives, Chernousko the location of
zeros of functions with bounded difference quotients, and Eichorn studies the zero finding problem and the approximation of a maximum in the class of unimodal or monotone nonincreasing functions. In the 1970s the focus shifted to classes of functions with bounded derivatives. Examples here include the work of Micchelli and Miranker [53] on envelope methods, Sukharev [107] on Lipschitz functions, and Majstrovskij [49] on the optimality of Newton's method.

An important contribution is the work of Nemirovsky and Yudin [56]. They study the multivariate minimization problem and derive formulas for the information complexity m(ε) in the class of sequential function and/or derivative evaluations for computing ε-approximations to the extremum in the residual sense. They find sharp estimates for m(ε) in the classes of convex and strongly convex functions, as well as nonconvex functions defined on a convex and/or compact set of dimension d. In the convex class they obtain m(ε) ≈ d log(1/ε), and in the nonconvex case m(ε) ≈ (1/ε)^{d/k} for k-times continuously differentiable functions. The authors only occasionally deal with the problem of combining these m(ε) evaluations to compute an approximate solution. In recent work, Nesterov and Nemirovsky [57] review interior-point polynomial methods in convex programming. This monograph complements the first book by describing several important algorithms constructing ε-approximations to extrema (in the residual sense) with the number of arithmetic operations bounded by p(m, d, D) log(C/ε), where p is a polynomial, m is the number of constraints of the domain of functions, d the dimension of the domain, D the dimension of the "coefficient" vector uniquely identifying problem instances, and C is a scale parameter dependent on the magnitudes of the coefficients. All of these algorithms enjoy almost optimal complexity, since their cost is polynomially proportional to the lower bounds on the information complexity.
It is not known, though, what the smallest degrees and/or coefficients of those polynomials are for computing ε-approximations for all problem instances in a given class. Comprehensive overviews of optimization methods are given by Dennis and Schnabel [19], Gill et al. [27], Fletcher [24], and Osborne [65], whereas complexity issues in optimization are addressed by Vavasis [124].

Average case analysis
Average case analysis of nonlinear zero finding problems is a new
CHAPTER 2. NONLINEAR EQUATIONS
and much-needed direction of research. Only univariate results are available so far. A partial list includes the papers of Novak [58]-[60], Novak and Ritter [61], [62], Novak, Ritter, and Wozniakowski [63], Ritter [75], and Shub and Smale [80]. A survey of average case results is given by Novak and Ritter in [62]. We review here the recent results of Novak, Ritter, and Wozniakowski presented in [63]. This work is representative of the current stream of research in average case analysis. We proved in this chapter that the bisection method is optimal in several classes of functions, and that the worst case complexity is of the order O(log(1/ε)). This, in particular, holds for the class of continuous functions changing sign at the endpoints of the interval,
and even for the C^∞ subclass of functions with only simple zeros. Computational practice suggests that methods exist that perform "better" than bisection on the "average." Indeed, this was the observation of Brent, Bus, and Dekker while testing their hybrid bisection/interpolatory type methods [13]-[18]. These algorithms are still considered the "best" practical algorithms for zero finding; see Kahaner, Moler, and Nash [36]. Novak, Ritter, and Wozniakowski prove that, indeed, a hybrid bisection-secant type method is much better than plain bisection, and that it is almost optimal on the average. In the average case setting here the error and cost are defined on the average with respect to a probability measure on the class F with r ≥ 2 and with preset boundary conditions at the endpoints. The probability measure is chosen as a conditional r-fold Wiener measure. The set of functions with only simple zeros then has full measure. The computational methods considered are based on sequential function and/or derivative evaluations as information (with varying cardinality). Therefore an adaptive stopping criterion is used for each function in the class F. The error criterion is assumed to be absolute or residual. The average cost of a method is defined as the average number of information samples, arithmetic operations, and comparisons used. Therefore, the average cost is at least proportional to the average number of information samples. The hybrid bisection-secant method indeed has average cost proportional to the number of function samples utilized. The computational complexity in the average case setting, comp^{av}(ε), is defined as the minimal average case cost of a method M = (φ, N) with average error at most ε. It turns out that comp^{av}(ε) is of order log log(1/ε) and that
a hybrid bisection-secant method M^{bis-sec} has almost minimal average complexity. The same result also holds for a hybrid bisection-Newton method. We stress that these results hold for methods using information with varying cardinality. All methods using information with predefined cardinality and average error at most ε must use a number of function evaluations at least proportional to log(1/ε), as shown by Ritter [75]. Therefore in this class the bisection method remains almost optimal (modulo a multiplicative constant). To achieve the average cost O(log log(1/ε)) the methods must use adaptive termination rules. These rules for the bisection-secant and bisection-Newton methods are quite simple. We just need to test whether the magnitude of the function is at most ε or the length of the interval containing a zero is at most 2ε. Moreover, these methods guarantee that the corresponding error is at most ε for every function f ∈ F. Thus the optimality of these methods is preserved whenever the error is measured in the worst case and the cost in the average case sense. We now give a more precise description of these results. Novak et al. study the zero finding problem in the class F = {f ∈ C^r[0,1] : f^{(i)}(0) = a_i, f^{(i)}(1) = b_i, i = 0, 1, ..., r}, where a_i, b_i are given boundary parameters, and r ≥ 2 is the smoothness of the functions. They assume f(0) = a_0 < 0 and f(1) = b_0 > 0 to guarantee the existence of a zero for each function. The space C^r[0,1] is equipped with the norm ||f|| = max_{0 ≤ i ≤ r} max_{t ∈ [0,1]} |f^{(i)}(t)|.
The average case setting is defined with respect to the r-fold Wiener measure P on the Borel σ-algebra on the closed subspace F of C^r[0,1]. Basic properties of F and some applications are given by Novak and Ritter in [62]. Our goal is to compute an ε-approximation for every function f ∈ F under the absolute or residual error criterion. Thus an ε-approximation is a point x ∈ [0,1] with |x - α| ≤ ε for some zero α of f (the absolute criterion) or with |f(x)| ≤ ε (the residual criterion).
The methods considered have the form M(f) = φ(N(f)),
where N is a general adaptive information operator with varying cardinality n = n(f); the number of evaluations n(f) is determined for each function f by an adaptive stopping rule, the evaluations are of f and its derivatives up to some fixed order k, 0 ≤ k ≤ r, for each method, and φ is an arbitrary algorithm. We recall that n(f) = min{i : ter_i(N_i(f)) = 1} for some termination functions ter_i as defined in general in chapter 1. We denote the error of the method M by e(M, f) = |M(f) - α| for a zero α of f with the absolute criterion, and by e(M, f) = |f(M(f))|
with the residual error criterion. We denote any of these errors by e(M_{n(f)}). Novak et al. construct a hybrid bisection-secant method M^{bis-sec} for which the following results hold (a precise flowchart of this method is given in Figure 104.1). The first result says that the worst case error of this method is at most ε, while the average cost is O(log log(1/ε)). Theorem 102.1 The method M^{bis-sec} has worst case error at most ε, where 0 < ε < 0.5, and the average number of evaluations is at most log log(1/ε) + A,
where A is a constant dependent only on the regularity r and the boundary conditions a_i, b_i. The second result says that the method M^{bis-sec} is almost optimal, even if we do not insist on the error bound holding for all f ∈ F but only on the average. Theorem 102.2 For any method M_n with average error at most ε,
for 0 < ε < 0.5, the average number of evaluations is bounded from below by a quantity of order log log(1/ε), where B is any constant with B > r + 0.5 and the constant G appearing in the bound depends only on B, the regularity r, and the boundary conditions a_i, b_i. We finally outline the method M^{bis-sec}. This method is a hybrid bisection-secant method that computes points x_i ∈ [0,1], at which the function f is evaluated, and subintervals [l_i, r_i] containing a zero of f. We always have f(l_i) < 0 < f(r_i), x_i ∈ [l_{i-1}, r_{i-1}], and [l_i, r_i] ⊂ [l_{i-1}, r_{i-1}]. Each function evaluation is preceded by the verification of the stopping rule. At the beginning, and later after each bisection step, the method takes two steps of the regula falsi, starting from the endpoints of the interval [l_{i-1}, r_{i-1}]. Then secant steps are performed as long as they are well defined, result in points in the current interval, and reduce the length of the current interval by at least a factor of 0.5 in every three steps. If any of these conditions is violated, a bisection step takes place. The termination functions and ε-approximations are given by ter_i = 1 iff r_i - l_i ≤ 2ε, with M_i = (l_i + r_i)/2,
in the case of the absolute error criterion, and ter_i = 1 iff |f(x_i)| ≤ ε, with M_i = x_i, in the case of the residual error criterion.
We further define the points sec(x, y) = x - f(x)(x - y)/(f(x) - f(y)) for the secant method, with sec(x, y) undefined whenever f(x) = f(y), and the points bis_i = (l_{i-1} + r_{i-1})/2 for the bisection method. In each step a new interval I_i = [l_i, r_i] containing a zero of f is computed: I_i = [x_i, r_{i-1}] if f(x_i) < 0, and I_i = [l_{i-1}, x_i] otherwise.
The complete method is summarized in the flowchart in Figure 104.1.
START: i = 0;
if a_0 > -b_0 then x_0 = 0 else x_0 = 1;
I_0 = [l_0, r_0] = [0, 1];
while ter_i = 0 do begin
  Two steps of regula falsi:
  for j = 1, 2 do begin
    i = i + 1; x_i = sec(l_{i-1}, r_{i-1});
    evaluate f(x_i); set [l_i, r_i] = I_i;
    if ter_i = 1 then return M_i; STOP
  end
  Secant steps:
  while r_i - l_i ≤ (r_{i-3} - l_{i-3})/2 do begin
    z = sec(x_i, x_{i-1});
    if z ∉ (l_i, r_i) or z undefined, then go to L;
    i = i + 1; x_i = z; evaluate f(x_i); set [l_i, r_i] = I_i;
  end;
  L: (bisection step) i = i + 1; x_i = bis_i;
  evaluate f(x_i); set [l_i, r_i] = I_i;
end;
return M_i; STOP
Figure 104.1. Flowchart of the bisection-secant method.
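For concreteness, the logic of Figure 104.1 can be transcribed into executable form. The following Python sketch is ours, not the book's: the name `bisection_secant`, the combined stopping test (stop when |f(x_i)| ≤ ε or r_i - l_i ≤ 2ε, as described in the text), and the simplified check of the three-step halving rule are all illustrative assumptions.

```python
def bisection_secant(f, eps):
    """Hybrid bisection-secant zero finder on [0, 1] for f(0) < 0 < f(1).

    An illustrative transcription of Figure 104.1 (details are ours):
    two regula falsi steps, then secant steps while they stay inside the
    bracket and halve it every three steps, else one bisection step.
    """
    def sec(x, y, fx, fy):
        # secant step through (x, f(x)) and (y, f(y)); None if undefined
        return None if fx == fy else x - fx * (x - y) / (fx - fy)

    l, r = 0.0, 1.0
    fl, fr = f(l), f(r)
    assert fl < 0.0 < fr
    xs, fs, lengths = [], [], [r - l]

    def record(x):
        # evaluate f at x, shrink the bracket keeping f(l) < 0 < f(r),
        # and apply the adaptive termination rule
        nonlocal l, r, fl, fr
        fx = f(x)
        if fx < 0.0:
            l, fl = x, fx
        else:
            r, fr = x, fx
        xs.append(x); fs.append(fx); lengths.append(r - l)
        if abs(fx) <= eps:           # residual test
            return x
        if r - l <= 2.0 * eps:       # bracket test (absolute criterion)
            return (l + r) / 2.0
        return None

    while True:
        for _ in range(2):           # two regula falsi steps
            m = record(sec(l, r, fl, fr))
            if m is not None:
                return m
        # secant steps while defined, inside the bracket, and the bracket
        # halves over every three consecutive evaluations
        while len(lengths) < 4 or lengths[-1] <= lengths[-4] / 2.0:
            z = sec(xs[-1], xs[-2], fs[-1], fs[-2])
            if z is None or not (l < z < r):
                break                # fall through to a bisection step
            m = record(z)
            if m is not None:
                return m
        m = record((l + r) / 2.0)    # bisection step
        if m is not None:
            return m
```

On f(x) = x² - 1/4 with eps = 10⁻⁹ the sketch returns a point close to the zero 0.5 after a handful of evaluations, illustrating the fast secant phase.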
Iterative model of computation

There has been much research on the iterative model, in which one constructs a sequence of points convergent to a zero of a function, and seeks methods with the fastest possible convergence for every function in a given class or on the average. Thus, this can be viewed as an asymptotic model, worst or average case, with stationary iterative information. Stationary iterative information means that a fixed set of linear functionals is used repeatedly in generating the sequence of points. For example, the Newton information consists of the evaluations of the function and its derivative at consecutive iterative points. A more formal definition of this model and sample results are given in section 2.1.6, where we analyze asymptotic almost optimality of the bisection method. A partial list of research papers is given by Traub [112], Ortega and Rheinboldt [64], Traub and Wozniakowski [116], and Kelley [39]. Although this monograph is not devoted to the iterative model, we present a brief summary of selected results. The study of complexity in the iterative model was initiated by the work of Traub [111], [112]. In the 1964 monograph, Traub considers iterations for approximating simple or multiple zeros of scalar nonlinear functions. The information is assumed to be values of the function and its derivatives. He introduces a classification of iterations, according to the information used, as one-point, one-point with memory, multipoint, and multipoint with memory. He proves a maximal order theorem for one-point iterations and introduces the idea of interpolatory iteration. He conjectures that memory always adds less than one to the order of one-point iterations. He proposes a number of complexity measures. Wozniakowski [130]-[134] generalizes the problem of maximal order iterations to multivariate and infinite dimensional cases.
He establishes the maximal order of interpolatory algorithms for the scalar case, and shows that memory does not in general increase the order in the multivariate case. He introduces the concept of order of information, which provides a general tool for establishing the maximal order of any given iteration. He shows that maximal order depends only on the information used by an algorithm and not on the structure of the algorithm (compare also Traub and Wozniakowski [113]-[116], [118]). Significant papers on optimal iterations include Brent [12]-[14], Brent, Winograd, and Wolfe [15], Kacewicz [33]-[35], Kung
[46], Kung and Traub [47], [48], Meersman [51], [52], Saari and Simon [76], Trojan [122], [123], and Wasilkowski [125]-[128]. Another stream of research investigates the approximate solution of scalar or multivariate polynomial equations. It is assumed there that complete information is available through knowledge of the degree of a polynomial and all its coefficients. Some papers deal with probabilistic settings and some with different models of computation. A sample list includes the work of Hirsch and Smale [31], Kim [42], McMullen [50], Murota [55], Renegar [68]-[74], Schonhage [77], and Smale [99], [100]. Pioneering work in the average case analysis of iterative methods for approximating polynomial zeros is due to Shub and Smale [81], who analyze the average behavior of a Newton-type method for zeros of complex polynomials. Shub and Smale show that, on the average, six starts of a modified Newton's method are sufficient to obtain a point z such that |f(z)| < ε, with cost proportional to d(d + log(1/ε)) for a complex normalized polynomial f of degree d. Renegar [68]-[70] investigates the simplicial continuation algorithm due to Kuhn et al. [45] as well as the multivariate Newton's method. In [70] he assumes the normalized Lebesgue measure on the coefficients of complex polynomials and proves that for sets of polynomials of large measure Kuhn's simplicial algorithm finds a point z with |f(z)| < ε and also |z - α| < ε, where f(α) = 0, within O(log(1/ε)) steps. In [68], [69] he generalizes these results to the multivariate case.

2.3.2 Specific Comments
Chapter 2 presents our results published in several research papers. Specifically, section 2.1.1 presents our results from the paper [93], sections 2.1.2-2.1.4 present the results of [98], section 2.1.5 presents the material from [96], section 2.1.6 from [97], section 2.2.1 from [9], and section 2.2.2 from [94].

Section 2.1.2

1. The classical results dealing with the Gelfand n-widths used in the analysis on page 38 can be found in the monograph of Tikhomirov [108].
2. The theory and properties of perfect splines as defined on page 40 can be found in Micchelli, Rivlin, and Winograd [54], Tikhomirov [108, pp. 130-135], and Traub and Wozniakowski [116,
p. 129]. Specifically, Eq. (41.1) can be verified in [108] and [54]. The nonlinear systems involved in the computation of the perfect spline s_{n-r,r} (compare pp. 40 and 42) are analyzed in Gaffney and Powell [25].

Section 2.1.3

1. The analysis of the generalized error criterion and the approach to Theorem 45.1 was suggested by Wasilkowski.

Section 2.1.5

1. The problem of finding zeros of polynomials of known degree has been studied in terms of complexity and optimality by Kuhn et al. [44], [45], Murota [55], Renegar [69], and Shub and Smale [80].
2. We stress that using the same proof technique as in the proof of Theorem 54.1 one can obtain Theorem 4.1 of [128] for the case of real polynomials and continuous information.

Section 2.1.6

1. The methods converging quadratically to zeros with bounded multiplicity (see page 56) are based on hybrid designs involving bisection, Newton- and secant-type methods, as well as higher order interpolatory methods; see Brent [12], Traub [112], and Ortega and Rheinboldt [64]. If the multiplicity m of a zero of f is finite, then it is possible to construct information N and an algorithm that guarantee asymptotically quadratic convergence. We can calculate m by using a combination of bisection and Newton's methods and applying Aitken's δ² formula; see [112, p. 129, App. D]. Knowing m we may use the modified Newton's method [112, p. 127], x_{i+1} = x_i - m f(x_i)/f'(x_i), which converges quadratically. For such information and algorithm, the set B contains functions with zeros of infinite multiplicity. Therefore, we cannot essentially beat the bisection only for functions having infinite multiplicity zeros.
2. We stress that we could not prove Theorem 59.1 with the sequence S_n replaced by a constant a, 0 < a < 1. We do not know whether the sequence S_n is necessary, as has been shown for linear problems [121, pp. 386-388].
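As an illustration of comment 1, here is a minimal Python sketch (ours, not from [112]) of the modified Newton iteration; the multiplicity m is assumed to be known in advance, e.g. estimated by the bisection/Newton combination with Aitken's formula mentioned above, and f'(x_i) ≠ 0 is assumed along the iteration.

```python
def modified_newton(f, df, x0, m, tol=1e-12, maxit=100):
    """Modified Newton iteration x_{i+1} = x_i - m * f(x_i) / f'(x_i).

    For a zero of known finite multiplicity m this restores the quadratic
    convergence that plain Newton's method (m = 1) loses when m > 1.
    """
    x = x0
    for _ in range(maxit):
        fx = f(x)
        if fx == 0.0:
            return x
        step = m * fx / df(x)   # assumes df(x) != 0 away from the zero
        x -= step
        if abs(step) <= tol:
            return x
    return x
```

For f(x) = (x - 1)³ with m = 3 and x_0 = 2 a single step already lands on the zero, whereas plain Newton (m = 1) converges only linearly with rate 2/3 on this function.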
Section 2.2.2
1. Our results of this section exhibit the superiority (in the worst case) of the T. Aird and J. Rice search procedure ZSRCH (IMSL library [2]) over Sobol's approach [101] for solving nonlinear equations in our class F of functions.
2. We now comment on sequential versus parallel information optimality issues. In this section we proved that parallel information is optimal in the class of all sequential information. There are a number of problems for which the same result holds. For instance, it is known that for linear problems sequential information does not help, as described in [121]. Cases are known of nonlinear problems for which sequential information does not help either, as outlined in the papers of Bakhvalov [6], Kiefer [41], Sukharev [106], [107], Traub et al. [121], as well as in section 2.1.2 of this book. For some nonlinear problems it may happen that sequential information is significantly better than parallel information, as described in sections 2.1.4 and 2.1.5 as well as by Sukharev [107], and Traub and Wozniakowski [116, ch. 10]. It may be noted that for the class
of Lipschitz functions on [0, 1] that are assumed only to possess a zero, which is similar to our class F for d = 1, Sukharev [107] proved that sequential information is much more powerful than parallel information. This means that the assumption of opposite signs at the endpoints is much stronger than the assumption of the existence of a zero. In the following bibliography we list a number of books and papers devoted to various complexity aspects and to the solution of nonlinear problems.
Bibliography

[1] Abaffy, J., and Spedicato, E. ABS Projection Algorithms: Mathematical Techniques for Linear and Nonlinear Algebraic Equations. Ellis Horwood, Chichester, U.K., 1989.
[2] Aird, T. J., and Rice, J. R. Systematic search in high dimensional sets. SIAM J. Numer. Anal., 14: 296-312, 1977. [3] Alefeld, G., Potra, F. A., and Shi, Y. On enclosing simple roots of nonlinear equations. Math. Computation, 61: 733-744, 1993. [4] Allgower, E., and Georg, K. Simplicial and continuation methods for approximating fixed points and solutions to systems of equations. SIAM Rev., 22, (1): 28-85, 1980. [5] Allgower, E., and Georg, K. Continuation Methods. Springer-Verlag, Berlin, New York, 1990. [6] Bakhvalov, N. S. On the optimality of linear methods for operator approximation in convex classes of functions. USSR Comp. Math. Math. Phys., 11: 244-249, 1971. [7] Blum, L., Shub, M., and Smale, S. On a theory of computation and complexity over the real numbers; NP completeness, recursive functions and universal machines. Bull. Amer. Math. Soc. (New Series), 21: 1-46, 1989. [8] Booth, R. S. Location of zeros of derivatives. SIAM J. Appl. Math., 15: 1495-1501, 1976. [9] Boult, T., and Sikorski, K. Can we approximate zeros of functions with nonzero topological degree? J. Complexity, 4: 317-329, 1987. [10] Boult, T., and Sikorski, K. Complexity of computing topological degree of Lipschitz functions in N-dimensions. J. Complexity, 2: 44-59, 1986. [11] Boult, T., and Sikorski, K. Complexity of computing topological degree of Lipschitz functions in two dimensions. SIAM J. Sci. Stat. Computing, 10, (4): 1-13, 1989. [12] Brent, R. P. Algorithms for Minimization Without Derivatives. Prentice Hall, Englewood Cliffs, NJ, 1973. [13] Brent, R. P. A class of optimal order zero-finding methods using derivative evaluations, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 59-73, 1976.
[14] Brent, R. P. Multiple precision zero-finding methods and the complexity of elementary function evaluation, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 151-176, 1976. [15] Brent, R. P., Winograd, S., and Wolfe, P. Optimal iterative processes for root finding. Numer. Math., 20: 327-341, 1973. [16] Bus, J. P. C., and Dekker, T. J. Two efficient algorithms with guaranteed convergence for finding a zero of a function. ACM Trans. Math. Software, 1: 330-345, 1975. [17] Chernousko, F. L. An optimal algorithm for finding the roots of an approximately computed function. USSR Comput. Math. and Math. Phys., 8: 1-24, 1968. [18] Dekker, T. J. Finding a zero by means of successive linear interpolation, in Constructive Aspects of the Fundamental Theorem of Algebra (B. Dejon and P. Henrici, Eds.). Wiley-Interscience, New York, 1969. [19] Dennis, J. E., and Schnabel, R. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM, Philadelphia, 1996. [20] Eichorn, B. H. On sequential search, in Selected Statistical Papers, Vol. 1. Math. Centrum, Amsterdam: 81-85, 1968. [21] Eiger, A., Sikorski, K., and Stenger, F. A bisection method for systems of nonlinear equations. ACM Trans. Math. Software, 10, (4): 367-377, 1984. [22] Eaves, B. C. A short course in solving equations with PL homotopies. SIAM-AMS Proc., 9: 73-143, 1976. [23] Eaves, B. C., Gould, F. J., Peitgen, H. O., and Todd, M. J. (Eds.) Homotopy Methods and Global Convergence. Academic Press, New York, 1970. [24] Fletcher, R. Practical Methods of Optimization. 2nd ed., Wiley, Chichester, U.K., 1987. [25] Gaffney, P. W., and Powell, M. J. D. Optimal interpolation in numerical analysis, in Lecture Notes in Mathematics, Vol.
506 (G. A. Watson, Ed.). Springer-Verlag, New York, Berlin: 90-100, 1976. [26] Garcia, C. B., and Zangwill, W. I. Pathways to Solutions, Fixed Points and Equilibria. Prentice-Hall, Englewood Cliffs, NJ, 1981. [27] Gill, P. E., Murray, W., and Wright, M. H. Practical Optimization. Academic Press, London, 1981. [28] Graf, S., Novak, E., and Papageorgiou, A. Bisection is not optimal on the average. Numer. Math., 55: 481-491, 1989. [29] Gross, O., and Johnson, S. M. Sequential minimax search for a zero of a convex function. MTAC, 13: 44-51, 1959. [30] Harvey, C., and Stenger, F. A two dimensional analogue to the method of bisections for solving nonlinear equations. Quart. Appl. Math., 33: 351-367, 1976. [31] Hirsch, M., and Smale, S. On algorithms for solving f(x) = 0. Comm. Pure Appl. Math., 32: 281-312, 1979. [32] Hyafil, L. Optimal Search for the Zero of the (n-1)-st Derivative. IRIA/LABORIA, Rep. No. 247, 1977. [33] Kacewicz, B. The use of integrals in the solution of nonlinear equations in N-dimensions, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 127-141, 1976. [34] Kacewicz, B. An integral interpolation iterative method for the solution of scalar equations. Numer. Math., 26: 355-365, 1976. [35] Kacewicz, B. Integrals with a kernel in the solution of nonlinear equations in N dimensions. J. Assoc. Comput. Mach., 26: 233-249, 1979. [36] Kahaner, D., Moler, C., and Nash, S. Numerical Methods and Software. Prentice Hall, Englewood Cliffs, NJ, 1989. [37] Kearfott, B. R. Computing the Degree of Maps and a Generalized Method of Bisection. Ph.D. dissertation, University of Utah, 1977.
[38] Kearfott, B. R. An efficient degree computation method for a generalized method of bisection. Numer. Math., 32: 109-127, 1979. [39] Kelley, C. T. Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia, 1995. [40] Kiefer, J. Sequential minimax search for a maximum. Proc. Amer. Math. Soc., 4: 502-505, 1953. [41] Kiefer, J. Optimum sequential search and approximation methods under minimum regularity assumptions. J. Soc. Indust. Appl. Math., 5: 105-136, 1957. [42] Kim, M. H. Ph.D. dissertation, CUNY Graduate School, 1985. [43] Ko, Ker-I. Applying techniques of discrete complexity theory to numerical computation, in Studies in Complexity Theory (R. V. Book, Ed.). Research Notes in Theoretical Computer Science. Pitman, London: 1-62, 1986. [44] Kuhn, H. W. Finding roots of polynomials by pivoting, in Fixed Points: Algorithms and Applications (S. Karamardian, Ed.). Academic Press, New York, San Francisco, London, 1977. [45] Kuhn, H. W., Wang, Z., and Xu, S. On the cost of computing roots of polynomials. Math. Programming, 28: 156-163, 1984. [46] Kung, H. T. The complexity of obtaining starting points for solving operator equations by Newton's method, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 33-57, 1976. [47] Kung, H. T., and Traub, J. F. Optimal order of one-point and multipoint iteration, in Complexity of Computation (R. H. Karp, Ed.). SIAM-AMS Proc., 7: 149-160, 1974. [48] Kung, H. T., and Traub, J. F. Optimal order and efficiency for iterations with two evaluations. SIAM J. Numer. Anal., 13: 84-99, 1976. [49] Majstrovskij, G. D. On the optimality of Newton's method. Soviet. Math. Dokl., 13: 832-840, 1972.
[50] McMullen, C. Families of Rational Maps and Iterative Root-Finding Algorithms. Ph.D. dissertation, Harvard University, 1985. [51] Meersman, R. On Maximal Order of Families of Iterations for Nonlinear Equations. Ph.D. dissertation, Vrije University, Brussels, 1976. [52] Meersman, R. Optimal use of information in certain iterative processes, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 109-125, 1976. [53] Micchelli, C. A., and Miranker, W. L. High order search methods for finding roots. J. Assoc. Comput. Mach., 22: 52-60, 1975. [54] Micchelli, C. A., Rivlin, T. J., and Winograd, S. The optimal recovery of smooth functions. Numer. Math., 26: 191-200, 1976. [55] Murota, K. Global convergence of a modified Newton's iteration for algebraic equations. SIAM J. Numer. Anal., 19: 793-799, 1982. [56] Nemirovsky, A., and Yudin, D. B. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983. [57] Nemirovsky, A., and Nesterov, Y. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994. [58] Novak, E. Average case results for zero finding. J. Complexity, 5: 489-501, 1989. [59] Novak, E. Determining zeroes of increasing Lipschitz functions. Aequationes Math., 41: 161-167, 1991. [60] Novak, E. The Bayesian approach to numerical problems: Results for zero finding. Proc. IMACS-GAMM Int. Symp. on Numerical Methods and Error Bounds. Akademie Verlag, Berlin, 164-171, 1995. [61] Novak, E., and Ritter, K. Average errors for zero finding: Lower bounds. Math. Zeitschrift, 211: 671-686, 1992.
[62] Novak, E., and Ritter, K. Some complexity results for zero finding for univariate functions. J. Complexity, 9: 15-40, 1993. [63] Novak, E., Ritter, K., and Wozniakowski, H. Average case optimality of a hybrid secant-bisection method. Math. Computation, 64: 1517-1539, 1995. [64] Ortega, J. M., and Rheinboldt, W. C. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970. [65] Osborne, M. R. Finite Algorithms in Optimization and Data Analysis. Wiley, Chichester, New York, 1985. [66] Powell, M. J. D. Convergence properties of a class of minimization algorithms, in Nonlinear Programming 2 (O. Mangasarian, R. Meyer, and S. Robinson, Eds.). Academic Press, New York, 1-27, 1975. [67] Prüfer, M., and Siegberg, H. W. On Computational Aspects of Topological Degree in R^n. Sonderforschungsbereich 72, Approximation und Optimierung, Universität Bonn, Preprint No. 252, 1980. [68] Renegar, J. On the complexity of a piecewise linear algorithm for approximating roots of complex polynomials. Math. Programming, 32: 301-318, 1985. [69] Renegar, J. On the cost of approximating all roots of a complex polynomial. Math. Programming, 32: 319-336, 1985. [70] Renegar, J. On the efficiency of Newton's method in approximating all zeros of a system of complex polynomials. Math. Operations Res., 121-148, 1987. [71] Renegar, J. On the worst case arithmetic complexity of approximating zeros of polynomials. J. Complexity, 3: 90-113, 1987. [72] Renegar, J. On the worst case arithmetic complexity of approximating zeros of systems of polynomials. SIAM J. Computing, 18: 350-370, 1989. [73] Renegar, J. On the computational complexity and geometry of the first-order theory of the reals: Parts I, II and III. J. Symbolic Computation, 13: 255-352, 1992.
[74] Renegar, J. On the computational complexity of approximating solutions for real algebraic formulae. SIAM J. Computing, 21: 1008-1025, 1992. [75] Ritter, K. Average errors for zero finding. Lower bounds for smooth or monotone functions. Aequationes Math., 48: 194-219, 1994. [76] Saari, D. G., and Simon, C. P. Effective price mechanisms. Econometrica, 46: 1097-1125, 1978. [77] Schonhage, A. The Fundamental Theorem of Algebra in Terms of Computational Complexity. Report, University of Tübingen, 1982. [78] Schonhage, A. Equation solving in terms of computational complexity, in Proc. Intern. Congress Math., Berkeley, 1986. [79] Shub, M. On the distance to the zero set of a homogeneous polynomial. J. Complexity, 5: 303-305, 1989. [80] Shub, M., and Smale, S. On the Average Cost of Solving Polynomial Equations. Proc. 1982 Rio Conference on Dynamical Systems, 1982. [81] Shub, M., and Smale, S. Computational complexity: On the geometry of polynomials and a theory of cost: Part I. Annales Scientifiques de l'École Normale Supérieure, 4e série, t. 18: 107-142, 1985. [82] Shub, M., and Smale, S. On the geometry of polynomials and a theory of cost: Part II. SIAM J. Computing, 15: 145-161, 1986. [83] Shub, M., and Smale, S. On the existence of generally convergent algorithms. J. Complexity, 2: 2-11, 1986. [84] Shub, M., and Smale, S. Complexity of Bezout's theorem I: Geometrical aspects. J. AMS, 6: 459-501, 1993. [85] Shub, M., and Smale, S. Complexity of Bezout's theorem II: Volumes and probabilities, in Computational Algebraic Geometry (F. Eyssette and A. Galligo, Eds.). Progress in Mathematics, Birkhäuser, 109: 267-285, 1993.
[86] Shub, M., and Smale, S. Complexity of Bezout's theorem III: Condition number and packing. J. Complexity, 9: 4-14, 1993. [87] Shub, M., and Smale, S. Complexity and Bezout's theorem V: Polynomial time. Theoret. Comput. Sci., 133: 141-164, 1994. [88] Shub, M., and Smale, S. On the intractability of Hilbert's Nullstellensatz and an algebraic version of "NP not equal to P"? Duke Math. J., 81: 47-54, 1995. [89] Shub, M., and Smale, S. Complexity of Bezout's theorem IV: Probability of success; Extensions. SINUM, 33: 128-148, 1996. [90] Shub, M., Buergisser, P., and Lickteig, T. Test complexity of generic polynomials. J. Complexity, 8: 203-215, 1992. [91] Shub, M., Tischler, D., and Williams, R. F. The Newtonian graph of a complex polynomial. SIAM J. Math. Anal., 19: 246-256, 1988. [92] Sikorski, K. A three dimensional analogue to the method of bisections for solving nonlinear equations. Math. Comp., 33: 722-738, 1979. [93] Sikorski, K. Bisection is optimal. Numer. Math., 40: 111-117, 1982. [94] Sikorski, K. Optimal solution of nonlinear equations satisfying a Lipschitz condition. Numer. Math., 43: 225-240, 1984. [95] Sikorski, K. Optimal solution of nonlinear equations. J. Complexity, 1: 197-209, 1985. [96] Sikorski, K. Study of linear information for classes of polynomial equations. Aequationes Mathematicae, 37: 1-14, 1989. [97] Sikorski, K., and Trojan, G. M. Asymptotic near optimality of the bisection method. Numer. Math., 57: 421-433, 1990. [98] Sikorski, K., and Wozniakowski, H. For which error criteria can we solve nonlinear equations? J. Complexity, 2: 163-178, 1986. [99] Smale, S. The fundamental theorem of algebra and complexity theory. Bull. Amer. Math. Soc., 4: 1-36, 1981.
[100] Smale, S. On the efficiency of algorithms of analysis. Bull. Amer. Math. Soc., 13: 87-121, 1985. [101] Sobol, I. M. On the systematic search in a hypercube. SIAM J. Numer. Anal., 16: 790-793, 1979. [102] Stenger, F. Computing the topological degree of a mapping in R^n. Numer. Math., 25: 23-38, 1975. [103] Stynes, M. An algorithm for numerical calculation of topological degree. Appl. Anal., 9: 63-77, 1979. [104] Stynes, M. A simplification of Stenger's topological degree formula. Numer. Math., 33: 147-156, 1979. [105] Stynes, M. On the construction of sufficient refinements for computation of topological degree. Numer. Math., 39: 453-462, 1981. [106] Sukharev, A. G. Optimal strategies of the search for an extremum. USSR Comp. Math. and Math. Phys., 11: 119-137, 1971. [107] Sukharev, A. G. Optimal search for a zero of function satisfying a Lipschitz condition. USSR Comp. Math. and Math. Phys., 16: 17-26, 1976. [108] Tikhomirov, V. M. Some Problems of Approximation Theory. Moscow State University [in Russian], 1976. [109] Todd, M. J. The computation of fixed points and applications, in Springer Lecture Notes in Economics and Mathematical Systems, Vol. 124. Springer-Verlag, Heidelberg/New York, 1976. [110] Todd, M. J. Optimal dissection of simplices. SIAM J. Appl. Math., 34, (4): 792-803, 1978. [111] Traub, J. F. On functional iteration and the calculation of roots, in Preprints of Papers 16th Nat. ACM Conf. Session 5A-1, Los Angeles, CA: 1-4, 1961. [112] Traub, J. F. Iterative Methods for the Solution of Equations. Prentice Hall, Englewood Cliffs, NJ, 1964. 2nd ed., Chelsea, New York, 1982.
[113] Traub, J. F., and Wozniakowski, H. Strict lower and upper bounds on iterative computational complexity, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 15-34, 1976. [114] Traub, J. F., and Wozniakowski, H. Optimal linear information for the solution of nonlinear operator equations, in Algorithms and Complexity, New Directions and Recent Results (J. F. Traub, Ed.). Academic Press, New York, 103-119, 1976. [115] Traub, J. F., and Wozniakowski, H. Convergence and complexity of Newton's iteration for operator equations. J. Assoc. Comput. Mach., 26: 250-258, 1979. [116] Traub, J. F., and Wozniakowski, H. A General Theory of Optimal Algorithms. Academic Press, New York, 1980. [117] Traub, J. F., and Wozniakowski, H. Convergence and complexity of interpolatory-Newton iterations in Banach space. Comput. Math. Appl., 6: 385-400, 1980. [118] Traub, J. F., and Wozniakowski, H. Optimal radius of convergence of interpolatory iterations for operator equations. Aequationes Math., 21, (2-3): 159-172, 1980. [119] Traub, J. F., and Wozniakowski, H. Information and computation, in Advances in Computers, Vol. 23. Academic Press, New York: 35-92, 1984. [120] Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information, Uncertainty, Complexity. Addison-Wesley, Reading, MA, 1983. [121] Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information-Based Complexity. Academic Press, New York, 1988. [122] Trojan, G. M. Tight bounds on the complexity index of one point iteration. Comput. Methods Appl., 6: 431-433, 1980. [123] Trojan, G. M. Lower bounds and fast algorithms for sequence acceleration. J. ACM, 31: 329-335, 1984. [124] Vavasis, S. Nonlinear Optimization, Complexity Issues. Oxford University Press, New York, 1991.
BIBLIOGRAPHY
119
[125] Wasilkowski, G. W. Can any stationary iteration using linear information be globally convergent? J. Assoc. Comput. Mach., 27: 263-269, 1980. [126] Wasilkowski, G. W. n-Evaluation conjecture for multipoint iterations for the solution of scalar nonlinear equations. J. Assoc. Comput. Mach., 28: 71-80, 1981. [127] Wasilkowski, G. W. The strength of nonstationary iteration. Aequationes Math., 24: 243-260, 1981. [128] Wasilkowski, G. W. Any iteration for polynomial equations using linear information has infinite complexity. Theoret. Comput. Set., 22: 195-208, 1983. [129] Wasilkowski, G. W. Average case optimality. J. Complexity, 1: 107-117, 1985. [130] Wozniakowski, H. On Nonlinear Iterative Processes in Numerical Methods, Ph.D. dissertaion, University of Warsaw [in Polish], 1972. [131] Wozniakowski, H. Maximal stationary iterative methods for the solution of operator equations. SIAM J. Numer. Anal., 11: 934-949,1974. [132] Wozniakowski, H. Generalized information and maximal order of iteration for operator equations. SIAM J. Numer. Anal., 12: 121-135, 1975. [133] Wozniakowski, H. Properties of maximal order methods for the solution of nonlinear equations. Z. Angew. Math. Mech., 55: 268-271, 1975. [134] Wozniakowski, H. Maximal order of multipoint iterations using n evaluations, in Analytic Computational Complexity (J. F. Traub, Ed.). Academic Press, New York: 75-107, 1976. [135] Wozniakowski, H. A survey of information based complexity. J. Complexity, 1: 11-44, 1985.
This page intentionally left blank
Chapter 3
Fixed Points-Contractive Functions

Fixed point computation has been an intensive research area since 1967, when Scarf introduced the simplicial algorithm to approximate fixed points. Several algorithms have been invented since then, including restart and homotopy methods. Most of these were designed to approximate fixed points of general maps and used the residual error criterion. In this chapter we consider the absolute and/or relative error criteria for contractive univariate and multivariate functions. The point of departure of our analysis is the classical Banach fixed point theorem. Namely, we consider a function $f : D \to D$, where $D$ is a closed subset of a Banach space $B$. We assume that $f$ is contractive with a factor $q < 1$, i.e.,
$$\|f(x) - f(y)\| \le q\,\|x - y\| \quad \text{for all } x, y \in D.$$
Then there exists a unique $\alpha = \alpha(f) \in D$ such that $\alpha$ is a fixed point of $f$, $f(\alpha) = \alpha$. We let
$$F = \{ f : D \to D : \|f(x) - f(y)\| \le q\,\|x - y\|, \ \forall\, x, y \in D \}$$
be the class of all such functions. The proof of Banach's theorem is based on a simple iteration algorithm defined as follows. We first choose any $x_0 \in D$, and let
$$x_{i+1} = f(x_i), \qquad i = 0, 1, \ldots. \tag{121.1}$$
Then
$$\|x_n - \alpha\| \le q^n\,\|x_0 - \alpha\|.$$
By assuming that $x_0 = 0 \in D$, we get the relative error of the $n$th approximation $x_n$ estimated by
$$\frac{\|x_n - \alpha\|}{\|\alpha\|} \le q^n.$$
This bound is sharp, i.e., there exists a function $f$ for which the relative error of $x_n$ is equal to $q^n$. Thus, to guarantee that the relative error is at most $\varepsilon$, $\varepsilon > 0$, we must perform
$$s(\varepsilon, q) = \left\lceil \frac{\log(1/\varepsilon)}{\log(1/q)} \right\rceil$$
steps, which require $s(\varepsilon, q)$ function evaluations. Here and elsewhere in this chapter, $\log(x)$ means logarithm base 2. By assuming that each function evaluation costs $c > 0$, the simple iteration algorithm computes an $\varepsilon$-approximation to $\alpha$ with cost $c \cdot s(\varepsilon, q)$. By an $\varepsilon$-approximation we mean here a point $x \in D$ such that $\|x - \alpha\| \le \varepsilon\,\|\alpha\|$. For many scientific applications, the contraction factor $q$ is very close to one. In this case, the cost of the simple iteration algorithm is huge. For example, by taking $\varepsilon = 10^{-4}$ and $q = 1 - 10^{-4}$, the cost is roughly $0.9 \cdot 10^5\, c$. Can we do better? That is, can we guarantee the same error estimate using fewer function evaluations? Or, what is the complexity (minimal cost) of computing a fixed point to within the error $\varepsilon$? This is the subject of this chapter for the contractive case ($q < 1$) and of the following chapter for noncontractive functions.
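The cost estimate above is easy to reproduce. The following short sketch (the function name is ours, not from the text) computes $s(\varepsilon, q) = \lceil \log(1/\varepsilon)/\log(1/q) \rceil$ for a few values of $q$:

```python
import math

def si_steps(eps: float, q: float) -> int:
    """Steps of simple iteration (121.1) needed so that q**n <= eps."""
    return math.ceil(math.log(1.0 / eps) / math.log(1.0 / q))

# For eps = 1e-4 and q = 1 - 1e-4 the cost is roughly 0.9e5 evaluations:
print(si_steps(1e-4, 1 - 1e-4))   # about 9.2e4
print(si_steps(1e-4, 0.5))        # only 14 when q is far from one
```

The second call illustrates that the difficulty is entirely driven by $q$ approaching one, not by the tolerance alone.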
3.1
Univariate Problems
In this section we study the complexity of approximating fixed points of univariate contractive functions. This is done for the relative error criterion in section 3.1.1, and for the absolute error criterion in section 3.1.2. We prove that in both cases similar methods based
on bisection-envelope hybrid constructions are almost optimal. Precisely, we define the class of functions
$$F = \{ f : [0,1] \to [0,1] : |f(x) - f(y)| \le q\,|x - y|, \ \forall\, x, y \in [0,1] \}$$
and the singleton set
$$S(f) = \{\alpha(f)\}, \quad \text{where } f(\alpha(f)) = \alpha(f).$$
We want to compute an $\varepsilon$-approximation $M(f)$ to the fixed point $\alpha(f)$ for any function $f \in F$ with respect to the relative error criterion
$$\frac{|M(f) - \alpha(f)|}{|\alpha(f)|} \le \varepsilon,$$
with the convention $0/0 = 0$, and the absolute error criterion
$$|M(f) - \alpha(f)| \le \varepsilon.$$
The information is assumed to be arbitrary sequential evaluations of function values in $[0,1]$.

3.1.1
Relative Error Criterion
In this section we use the relative error criterion and investigate information with varying cardinality $n$. By $m(\varepsilon, q)$ we denote the minimal cardinality number, i.e., the minimal number of function evaluations necessary to compute an $\varepsilon$-approximation in the relative sense for every function in the class $F$. We show that $m(\varepsilon, q)$ is much less than the number of function evaluations $s(\varepsilon, q)$ of the simple iteration. More precisely, we present a Fixed Point Envelope (FPE) method, which constructs a sequence $\{x_n\}$ of successive approximations to the fixed point $\alpha = \alpha(f)$ with the relative error $e_n = |x_n - \alpha|/|\alpha|$ satisfying the inequality $e_n \le e_n^*$ of Eq. (123.1), where $e_n^*$ is the worst case error of the $n$th step.
Obviously $e_n \to 0$ as $n \to \infty$, and for $q$ close to 1 and large $n$ we approximately have $e_n^* \le 0.5\, e_{n-1}^*$. Thus we have linear convergence with ratio $0.5$ instead of $q$ as in the simple iteration (121.1). We prove that the FPE method minimizes the number of function evaluations. We also show that it is an almost optimal complexity method.
We derive bounds on $m(\varepsilon, q)$ assuming that $\varepsilon \le q < 1$. From (123.1) we obtain that, asymptotically as $\varepsilon \to 0$, $m(\varepsilon, q)$ does not depend on $q$. On the other hand, for fixed $\varepsilon$ and $q \to 1$, $m(\varepsilon, q)$ goes to infinity pathologically slowly, as $\log\log(1/(1-q))$. We now compare $m(\varepsilon, q)$ with $s(\varepsilon, q)$ for $\varepsilon = 10^{-4}$ and $q = 1 - 10^{-4}$. We have $m(\varepsilon, q) \in [16, 18]$, and $s(\varepsilon, q)/m(\varepsilon, q) \approx 5000$.
Thus the FPE method is 5000 times more effective than the simple iteration for these data. The FPE method relies on constructing two envelopes that interpolate the already computed function values. Then the set of all possible fixed points of functions that share the same information is the interval $[a, b]$, where $a$ and $b$ are the fixed points of the envelopes. The best approximation is provided by the harmonic mean of $a$ and $b$. The next evaluation point is chosen so that it minimizes the relative error for a worst possible function value; it is a zero of a quadratic polynomial. We stress that the FPE method uses function evaluations at sequentially chosen points. In fact, we show that $n$ parallel function evaluations cannot provide an $\varepsilon$-approximation with $\varepsilon < q$, no matter how large $n$ is. Thus the fixed point problem with the relative error criterion is an example of a nonlinear problem for which sequential computation must be used. This is in contrast to linear problems, for which sequential computation is not more powerful than parallel information.

The FPE method
In this section we derive the FPE method. In what follows, by $t_n$ we denote the $n$th point at which the function $f$ is evaluated, and by $[a_n, b_n]$ we denote the smallest interval that contains the fixed points of all functions from the set $A_n$ of all functions that agree with the computed data, $A_n = \{ \tilde{f} \in F : \tilde{f}(t_i) = f(t_i),\ i = 1, \ldots, n \}$. We let $\underline{f}_n$ and $\bar{f}_n$ be the lower and upper envelopes for the set $A_n$. That is,
$$\underline{f}_n(x) = \max\Bigl(0,\ \max_{1 \le i \le n} \bigl(f(t_i) - q\,|x - t_i|\bigr)\Bigr)$$
and
$$\bar{f}_n(x) = \min\Bigl(1,\ \min_{1 \le i \le n} \bigl(f(t_i) + q\,|x - t_i|\bigr)\Bigr).$$
Clearly, $\underline{f}_n$ and $\bar{f}_n$ interpolate the data, $\underline{f}_n(t_i) = \bar{f}_n(t_i) = f(t_i)$, and they are piecewise linear functions with slopes $\pm q$. They belong to the class $F$ and
$$\underline{f}_n(x) \le f(x) \le \bar{f}_n(x), \qquad x \in [0, 1].$$
For $n = 1$, we set $t_1 = 0$ and compute $f_1 = f(t_1)$, as illustrated in Figure 126.1. The envelopes are given by
$$\underline{f}_1(x) = \max(0,\ f_1 - qx) \quad \text{and} \quad \bar{f}_1(x) = \min(1,\ f_1 + qx).$$
From this we easily conclude that $a_1 = f_1/(1+q)$ and $b_1 = \min(1,\ f_1/(1-q))$. The FPE method computes the first approximation $x_1$ such that the relative error between $x_1$ and the fixed points $\alpha$ from $[a_1, b_1]$ is minimal, i.e.,
$$\max_{\alpha \in [a_1, b_1]} \frac{|x_1 - \alpha|}{|\alpha|} = \min_{x} \max_{\alpha \in [a_1, b_1]} \frac{|x - \alpha|}{|\alpha|}.$$
This minimal error is achieved when
$$x_1 = \frac{2\,a_1 b_1}{a_1 + b_1},$$
the harmonic mean of $a_1$ and $b_1$. We observe that $(b_1 - a_1)/(b_1 + a_1) \le q$ for all $f_1 \in [0, 1]$ and that $(b_1 - a_1)/(b_1 + a_1) = q$ for $0 \le f_1 \le 1 - q$. We now suppose that $t_i$, $f_i$, and $[a_i, b_i]$ have already been computed for $i = 1, \ldots, n-1$. Obviously, if $t_i = f(t_i)$ for some $i$, then the fixed point of $f$ is exactly known. Thus, we assume that $t_i \ne f(t_i)$ for all $i$. We also assume inductively that $t_i \notin [a_i, b_i]$.
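The minimizing property of the harmonic mean is easy to verify numerically. In the sketch below (our own illustration, not from the text), no candidate point on a fine grid beats $2ab/(a+b)$, whose worst case relative error equals $(b-a)/(b+a)$:

```python
def worst_rel_err(x: float, a: float, b: float) -> float:
    """Worst case relative error of x over all fixed points alpha in [a, b]."""
    return max(abs(x - a) / a, abs(x - b) / b)

a, b = 0.2, 0.8
x_star = 2 * a * b / (a + b)          # harmonic mean, here 0.32

grid = [a + (b - a) * k / 1000 for k in range(1001)]
assert all(worst_rel_err(x_star, a, b) <= worst_rel_err(x, a, b) + 1e-12
           for x in grid)
print(worst_rel_err(x_star, a, b), (b - a) / (b + a))   # both equal 0.6
```

The balancing argument behind this is one line: $(x-a)/a = (b-x)/b$ forces $x = 2ab/(a+b)$.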
Figure 126.1: The functions $\underline{f}_1$ and $\bar{f}_1$.

The FPE method chooses the next evaluation point $t_n$ from $[a_{n-1}, b_{n-1}]$ in such a way that the relative error is minimized for the worst possible value of $f(t_n)$ that is consistent with the previously computed values. To derive the formula for $t_n$ we proceed as follows. We take any $x \in [a_{n-1}, b_{n-1}]$. Then the value of $f(x)$ belongs to the interval $[\underline{f}_{n-1}(x), \bar{f}_{n-1}(x)]$, where $\underline{f}_{n-1}$ and $\bar{f}_{n-1}$ are the lower and upper envelopes, respectively. For $f(x) > x$ we illustrate this in Figure 127.1. The next interval $[a, b]$ containing the fixed points is obtained by intersecting $[a_{n-1}, b_{n-1}]$ with the fixed point interval of the envelopes updated by the pair $(x, f(x))$. If $f(x) > x$, then we have
$$a = \max\left(a_{n-1},\ \frac{f(x) + q x}{1 + q}\right), \qquad b = \min\left(b_{n-1},\ \frac{f(x) - q x}{1 - q}\right).$$
We note that $x \notin [a, b]$ whenever $x \ne f(x)$. Knowing the interval $[a, b]$, the harmonic mean $2ab/(a+b)$ is the best approximation, and the relative error is $(b-a)/(b+a)$. Obviously, $a = a(x, f(x))$ and $b = b(x, f(x))$
Figure 127.1: The envelopes $\underline{f}_{n-1}$ and $\bar{f}_{n-1}$.

depend on the pair $(x, f(x))$. The point $t_n$ is chosen as the solution of the following minimization problem,
$$\max_{y} \frac{b(t_n, y) - a(t_n, y)}{b(t_n, y) + a(t_n, y)} \;=\; \min_{x \in [a_{n-1}, b_{n-1}]}\ \max_{y} \frac{b(x, y) - a(x, y)}{b(x, y) + a(x, y)},$$
where the inner maximum is over all values $y = f(x)$ consistent with the previous information, $y \in [\underline{f}_{n-1}(x), \bar{f}_{n-1}(x)]$. Using simple geometric arguments and the formulas for $a(x, f(x))$ and $b(x, f(x))$, one shows that the worst case value of $y$ balances the two extreme cases. Thus, the point $t_n$ is given as a zero of a quadratic polynomial, which leads to a closed-form formula for $t_n$.
We observe that for $q$ close to one, $t_n$ is close to the geometric mean $\sqrt{a_{n-1} b_{n-1}}$ of $a_{n-1}$ and $b_{n-1}$. On the other hand, for $q$ close to zero we have $t_n = 2a_{n-1}b_{n-1}/(a_{n-1} + b_{n-1})$, which is close to the harmonic mean of $a_{n-1}$ and $b_{n-1}$. Knowing $t_n$, we compute $f_n = f(t_n)$ and determine the new interval $[a_n, b_n]$ of fixed points using the previous formulas for $x = t_n$. Finally, we set
$$x_n = \frac{2\,a_n b_n}{a_n + b_n}$$
as the $n$th approximation. We summarize the FPE method, which terminates whenever the relative error does not exceed $\varepsilon$, in the following flowchart, ending with:

Stop: $x_n$ is an $\varepsilon$-approximation.
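A runnable sketch of the whole scheme is given below. It is our reconstruction, not the book's pseudocode: the interval update follows from the contraction inequality $|f(t) - \alpha| \le q\,|t - \alpha|$, and the next point is chosen by grid search instead of the closed-form quadratic.

```python
def fpe(f, q, eps, max_steps=100):
    """Approximate the fixed point of a q-contractive f: [0,1] -> [0,1]
    with relative error at most eps, using envelope intervals and
    harmonic-mean approximations (grid-search variant of the FPE method)."""

    def update(a, b, t, y):
        # alpha satisfies |y - alpha| <= q|t - alpha| and lies in [a, b]
        if y > t:
            return max(a, (y + q * t) / (1 + q)), min(b, (y - q * t) / (1 - q))
        if y < t:
            return max(a, (y - q * t) / (1 - q)), min(b, (y + q * t) / (1 + q))
        return y, y                    # t is the fixed point itself

    pts, (a, b), t = [], (0.0, 1.0), 0.0   # first evaluation at t_1 = 0
    for n in range(1, max_steps + 1):
        y = f(t)
        pts.append((t, y))
        a, b = update(a, b, t, y)
        x = 2 * a * b / (a + b) if a + b > 0 else 0.0   # harmonic mean
        if a == b or (b - a) / (b + a) <= eps:
            return x, n
        lo = lambda s: max(0.0, max(fi - q * abs(s - ti) for ti, fi in pts))
        hi = lambda s: min(1.0, min(fi + q * abs(s - ti) for ti, fi in pts))

        def worst(s):                  # worst new error over consistent values
            w = 0.0
            for k in range(11):
                yv = lo(s) + (hi(s) - lo(s)) * k / 10
                aa, bb = update(a, b, s, yv)
                if aa < bb:
                    w = max(w, (bb - aa) / (bb + aa))
            return w

        t = min((a + (b - a) * k / 100 for k in range(1, 100)), key=worst)
    return x, max_steps
```

For the worst case function $f(x) = 1 - q + qx$ with $q = 0.9$, this sketch reaches relative error $10^{-3}$ in a couple of dozen evaluations, versus about 66 for simple iteration.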
We now analyze the convergence of the sequence $\{x_n\}$. We let $e_n = |x_n - \alpha|/|\alpha|$, where $\alpha$ is the fixed point of $f$. Then $e_n \le e_n^*$, where $e_n^* = (b_n - a_n)/(b_n + a_n)$ is the worst case error of the $n$th step. Furthermore, it can happen that for some function $f$, $e_n = e_n^*$ for every $n$; indeed, this is the case for $f(x) = 1 - q + qx$. Using the formula for $e_n^*$, we get after several algebraic transformations,
the recursive bound (129.1) expressing $e_n^*$ in terms of $e_{n-1}^*$. Clearly, $e_n^* \to 0$. Furthermore, for large $n$ we have
$$e_n^* \approx \tfrac{1}{2}\, e_{n-1}^*.$$
Thus, for $q$ close to one, we have linear convergence with ratio almost $\tfrac{1}{2}$. This should be contrasted with the simple iteration algorithm, for which linear convergence has ratio $q$. This indicates that the FPE algorithm is much more efficient than simple iteration for $q$ close to one. To analyze how fast $e_n^*$ goes to zero, we denote by $k = k(q)$ the minimal number of steps needed to guarantee that $e_k^* \le \tfrac{1}{2}$. We show that $k(q)$ grows like $\log\log(1/(1-q))$. To this end, we assume inductively a closed form bound for $e_i^*$.
This holds for $i = 1$. From the formula for $e_{i+1}^*$ we obtain the claimed form of $e_i^*$, and thus
$$k(q) = O\bigl(\log\log(1/(1-q))\bigr),$$
as claimed; we also verified this numerically. To reduce the error to $\varepsilon$, $\varepsilon < \tfrac{1}{2}$, we have to perform $k + p$ steps, where $p$ is chosen so that $e_{k+p}^* \le \varepsilon$; since $e_n^* \approx \tfrac{1}{2}\, e_{n-1}^*$ for $n > k$, it suffices to take $p$ roughly $\log(1/\varepsilon)$. We summarize this analysis in the following theorem.
Theorem 131.1 For any $f$ in the class $F$, the FPE method computes $M(f) = x_n$ such that $|x_n - \alpha| \le \varepsilon\,|\alpha|$, $f(\alpha) = \alpha$, using $n = n(\varepsilon, q, f)$ function evaluations with
$$n(\varepsilon, q, f) \;\le\; \log\frac{1}{\varepsilon} + \log\log\frac{1}{1-q} + O(1).$$
Furthermore, for the function $f(x) = 1 - q + qx$ this bound is attained.

The speed of convergence of the FPE method "almost" does not depend on how close $q$ is to one, since the function $\log\log(1/(1-q))$ goes to infinity pathologically slowly as $q$ tends to 1. We tested the FPE method for the worst case function $f(x) = 1 - q + qx$ with $\varepsilon = 10^{-6}$ and $q = 1 - 10^{-i}$ for $i = 4, 6, 8$. The FPE method solved the problem using 24, 24, and 25 steps, respectively. We also compared the FPE method and simple iteration for the same function with $\varepsilon = 10^{-4}$ and $q = 1 - 10^{-6}$: the FPE method used 18 steps, whereas the simple iteration needed more than $9 \cdot 10^6$ steps. In general, the simple iteration method requires $\log(1/\varepsilon)/\log(1/q)$ steps, whereas the FPE method requires roughly $\log(1/\varepsilon) + \log\log(1/(1-q))$ steps. For fixed $\varepsilon$ and $q = 1 - \delta$ with $\delta \to 0$, this means about $(1/\delta)\log(1/\varepsilon)$ steps of the simple iteration versus $\log(1/\varepsilon) + \log\log(1/\delta)$ steps of the FPE method to reach the same accuracy.
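The step counts quoted here can be reproduced from the two formulas. The sketch below is our own, with the FPE count taken as the approximate bound $\log_2(1/\varepsilon) + \log_2\log_2(1/(1-q))$:

```python
import math

def si_steps(eps: float, q: float) -> int:
    """Simple iteration: smallest n with q**n <= eps."""
    return math.ceil(math.log(1 / eps) / math.log(1 / q))

def fpe_steps(eps: float, q: float) -> int:
    """Approximate FPE count log(1/eps) + loglog(1/(1-q)), logs base 2."""
    return math.ceil(math.log2(1 / eps) + math.log2(math.log2(1 / (1 - q))))

for i in (4, 6, 8):                        # eps = 1e-6, q = 1 - 10**-i
    print(fpe_steps(1e-6, 1 - 10.0 ** -i))  # 24, 25, 25: near the reported 24, 24, 25

print(fpe_steps(1e-4, 1 - 1e-6))           # 18, as in the text
print(si_steps(1e-4, 1 - 1e-6))            # about 9.2e6
```

The agreement with the reported experiments supports reading the missing bound in Theorem 131.1 as $\log(1/\varepsilon) + \log\log(1/(1-q))$ up to an additive constant.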
Optimality of the FPE method

We now establish optimality of the FPE method by showing that it minimizes the number of function evaluations necessary to compute an $\varepsilon$-approximation. We suppose that for a function $f \in F$, one constructs an $\varepsilon$-approximation $z(f) = z(\tau_1, f(\tau_1), \ldots, f(\tau_{n(f)}))$ to the fixed point $\alpha(f)$,
using an arbitrary sequential method. This means that $\tau_1$ is the point of the first function evaluation, $\tau_i$ depends on all previously computed values $f(\tau_1), \ldots, f(\tau_{i-1})$, and the total number $n(f)$ of function evaluations is determined by the selection of the termination functions $\mathrm{ter}_i : \mathbb{R}^i \to \{0, 1\}$,
$$n(f) = \min\{\, i : \mathrm{ter}_i(f(\tau_1), \ldots, f(\tau_i)) = 1 \,\}.$$
We recall that the FPE method computes an $\varepsilon$-approximation $x_{n^*}(f)$ to the fixed point $\alpha(f)$ for any $f \in F$, for
$$n^* = \min\{\, n : e_n^* \le \varepsilon \,\}.$$
Below, in Theorem 132.1, we show that the minimal number of function evaluations $m(\varepsilon, q)$ is equal to $\min\{ n : e_n^* \le \varepsilon \}$. This implies that the FPE method is optimal. As a consequence of that and of Theorem 131.1, we get for $\varepsilon < 0.5$ and $q > 1/\sqrt{2}$
$$m(\varepsilon, q) = \log\frac{1}{\varepsilon} + \log\log\frac{1}{1-q} + O(1).$$
We are ready to prove

Theorem 132.1 For any sequential method computing $z(f)$ with the relative error $|z(f) - \alpha(f)|/|\alpha(f)| \le \varepsilon$ for every function $f \in F$, the worst case number of function evaluations is at least $n_{\min} = \min\{ n : e_n^* \le \varepsilon \}$, i.e.,
$$\sup_{f \in F} n(f) \ge n_{\min}.$$
As an immediate consequence we obtain $m(\varepsilon, q) = n_{\min}$, i.e., optimality of the FPE method.
Proof The proof is by contradiction. We suppose to the contrary that $n(f) < \min\{ n : e_n^* \le \varepsilon \}$ for every function $f \in F$. Then for the first evaluation point $\tau_1$ we define the value $f_1$ to be assigned to $f(\tau_1)$.
The interval $[a_1, b_1]$ of fixed points of functions from the set $A_1 = \{ f \in F : f(\tau_1) = f_1 \}$ has the property that $\tau_1 \notin [a_1, b_1]$ and $(b_1 - a_1)/(b_1 + a_1) \ge q = e_1^*$. We assume inductively that $\tau_i$ and $f_i = f(\tau_i)$ have been computed for $i = 1, \ldots, n-1$. We let $\underline{f}_{n-1}$ and $\bar{f}_{n-1}$ denote the lower and upper envelopes for the set $A_{n-1} = \{ f \in F : f(\tau_i) = f_i,\ i = 1, \ldots, n-1 \}$. We let $[a_{n-1}, b_{n-1}]$ be the smallest interval containing the fixed points of functions from $A_{n-1}$, and we inductively assume that $(b_{n-1} - a_{n-1})/(b_{n-1} + a_{n-1}) \ge e_{n-1}^*$.
We let $\tau_L$ be the largest point from $\{0, \tau_1, \ldots, \tau_{n-1}\}$ that is not greater than $a_{n-1}$, and let $\tau_R$ be the smallest point from $\{\tau_1, \ldots, \tau_{n-1}, 1\}$ that is not less than $b_{n-1}$. This is illustrated in Figure 134.1. We define the point $t^*$ by the formula from the FPE method for the interval $[a_{n-1}, b_{n-1}]$. The points $L_n$ and $R_n$ are obtained by shifting the graphs of $\underline{f}_{n-1}$ and $\bar{f}_{n-1}$ to touch the points $b_{n-1}$ and $a_{n-1}$, respectively. We suppose that the point $\tau_n$ of the next evaluation is chosen based on the previous information $f(\tau_i)$, $i = 1, \ldots, n-1$. We assign the value $f_n = f(\tau_n)$ as follows.
Figure 134.1: Definition of $\tau_L$ and $\tau_R$.

We denote by $[a_n, b_n]$ the next interval of fixed points after $f(\tau_n)$ has been computed. It is easy to check that if (i) or (ii) holds, then $[a_n, b_n] = [a_{n-1}, b_{n-1}]$. If (iii) or (iv) holds, then $[a_n, b_n]$ is a proper subset of $[a_{n-1}, b_{n-1}]$. It turns out that $(b_n - a_n)/(b_n + a_n)$ is minimized if $\tau_n = t^*$. Thus we have
$$\frac{b_n - a_n}{b_n + a_n} \ge e_n^*.$$
We are now ready to construct a function $f^*$ from $F$ for which the relative error of $z(f^*)$ exceeds $\varepsilon$. We apply the construction presented above for $n = 1, 2, \ldots, k$. Here $k = \min\{ i : \mathrm{ter}_i(f_1, \ldots, f_i) = 1 \}$. Since $n(f) < \min\{ n : e_n^* \le \varepsilon \}$ for every $f \in F$, then $k \le \min\{ n : e_n^* \le \varepsilon \} - 1$. We consider the two envelopes $\underline{f}_k$ and $\bar{f}_k$. Since they share the same values at the points $\tau_i$, then $z(\underline{f}_k) = z(\bar{f}_k) = z(\tau_1, f_1, \ldots, f_k) = z_k$. We observe that the interval $[a_k, b_k]$ of fixed points is such that $(b_k - a_k)/(b_k + a_k) \ge e_k^* > \varepsilon$. Furthermore, $a_k = \underline{f}_k(a_k)$ and $b_k = \bar{f}_k(b_k)$. We define $f^*$ to be whichever of $\underline{f}_k$, $\bar{f}_k$ has its fixed point farther from $z_k$ in the relative sense.
Then
$$\frac{|z(f^*) - \alpha(f^*)|}{|\alpha(f^*)|} \;\ge\; \frac{b_k - a_k}{b_k + a_k} \;>\; \varepsilon.$$
This is a contradiction that completes the proof. ∎

Complexity of the problem
We now analyze the complexity of the problem. As before, we assume that each function evaluation costs $c$ units of time, and that arithmetic operations, comparisons, and square root computations have unit cost. The cost of the FPE method, $\mathrm{cost}(\mathrm{FPE}, \varepsilon, q)$, is at most $m(\varepsilon, q)(c + 21)$. By $\mathrm{comp}(\varepsilon, q)$ we denote the $\varepsilon$-complexity of the problem. From Theorems 131.1 and 132.1 we obtain

Theorem 135.1 We have
$$c \cdot m(\varepsilon, q) \;\le\; \mathrm{comp}(\varepsilon, q) \;\le\; \mathrm{cost}(\mathrm{FPE}, \varepsilon, q) \;\le\; m(\varepsilon, q)(c + 21).$$
Thus, for $c \gg 1$, the FPE method almost minimizes the complexity,
$$\mathrm{cost}(\mathrm{FPE}, \varepsilon, q) \approx \mathrm{comp}(\varepsilon, q) \approx c \cdot m(\varepsilon, q).$$
The essence of this theorem is that even for $q$ pathologically close to one, the $\varepsilon$-complexity for small $\varepsilon$ is roughly $c \cdot \log(1/\varepsilon)$. The bound $c \cdot \log(1/\varepsilon)$ on the $\varepsilon$-complexity can be concluded from our results in section 2.1.1. We proved there that for the absolute error criterion, $|x - \alpha(f)| \le \varepsilon$, one has to perform roughly $\log(1/\varepsilon)$ evaluations, even if arbitrary linear functionals are sequentially computed as information (optimality of the bisection method). Since the $\varepsilon$-complexity for the relative error is no smaller than the $\varepsilon$-complexity for the absolute error, we conclude that $\mathrm{comp}(\varepsilon, q)$ is roughly equal to $c \cdot \log(1/\varepsilon)$ even if arbitrary linear functional evaluations are permitted. This means that the FPE method remains almost optimal in this case as well.

Arbitrary interval [a, b]
We assumed that the domain $D = [0, 1]$. It is not difficult to verify that the analysis carries over for $D = [a, b]$, $0 \le a < b < +\infty$. The only changes in the FPE method are the starting values $a_0$ and $b_0$.
We should now set $a_0 = a$, $b_0 = b$. Then $e_1^* = q(b - t_1)/(b + q t_1)$. For $q$ close to one we have $t_1 \approx \sqrt{ab}$ and $e_1^* = q(1 - \sqrt{a/b})/(1 + \sqrt{a/b})$. For positive $a$, the number of steps $k$ needed to achieve $e_k^* = 0.5$ does not depend on $q$. Therefore, the term $\log\log(1/(1-q))$ disappears from the estimate of $n(f, \varepsilon, q)$. One can similarly analyze the domain $D = [a, b]$ for $-\infty \le a < b \le +\infty$. For example, we let $D = (-\infty, +\infty)$. Then it is enough to set $t_1 = 0$, compute $f(t_1)$, and determine the first interval $[a_1, b_1]$ of fixed points. We observe that if $f(t_1) > 0$ then $0 < a_1$ and the FPE method can be used without any modification. On the other hand, if $f(t_1) < 0$ then $b_1 < 0$, and the signs of the point $t_2$ as well as the subsequent points $t_n$ should be changed, i.e., the method is applied to the reflected problem.
It is clear that Theorem 131.1 also remains valid in this case.

Parallel information

The FPE method uses sequential function evaluations. We now show that sequential computation is crucial for the solution of the fixed point problem. Namely, we show that any parallel method based on evaluations of function values at $n$ predetermined points cannot produce an approximation with relative error less than $\varepsilon$, no matter how large $n$ is! Indeed, we assume that for a priori chosen points $\tau_i$, $0 \le \tau_1 < \tau_2 < \cdots < \tau_n \le 1$, we compute $f(\tau_1), \ldots, f(\tau_n)$. If $\tau_1 > 0$, then we set $f(\tau_i) = q\tau_i$, $i = 1, \ldots, n$. Then $z(f) = z(\tilde f) = z(\tau_1, q\tau_1, \ldots, q\tau_n)$ for every $f$, $\tilde f$ consistent with these data. In any case it is impossible to have the relative error no greater than $\varepsilon$ with $\varepsilon < q$.
We now assume that $\tau_1 = 0$. We consider two functions,
$f_1$ and $f_2$, both in $F$, that agree at all the points $\tau_i$, $i = 1, \ldots, n$, but whose fixed points are far apart in the relative sense; in particular, $\alpha(f_2) = \tau_2(1 - q)/2$. Then for any approximation $z = z(\tau_1, f_1(\tau_1), \ldots, f_1(\tau_n))$ we have
$$\max_{j = 1, 2}\ \frac{|z - \alpha(f_j)|}{|\alpha(f_j)|} \;\ge\; q.$$
This means that for $\varepsilon < q$ it is impossible to compute an $\varepsilon$-approximation using $n$ parallel function evaluations. This holds for arbitrarily large $n$. We conclude that the fixed point problem is an example for which sequential evaluations are necessary. Roughly speaking, $\log(1/\varepsilon)$ of them are enough to compute an $\varepsilon$-approximation, whereas an arbitrary number of parallel evaluations is not sufficient to solve the problem with $\varepsilon < q$.
3.1.2
Absolute Error Criterion
For univariate contractive functions we designed a Fixed Point Envelope (FPE-A) method. This method, like the FPE method described in the previous section, constructs two envelope functions that interpolate the already computed function values. Then the set of all possible fixed points of functions that coincide at all evaluation points is given by the interval of uncertainty $[a, b]$, where $a$ and $b$ are the fixed points of the envelopes. Given the initial interval of uncertainty $[0, 1]$, the method iteratively evaluates the function at the midpoints of intervals of uncertainty until the length of some interval is at most $2\varepsilon$. Then the midpoint of the last interval is an $\varepsilon$-approximation to the fixed point. The FPE-A method is given in Figure 138.1.
Figure 138.1: The FPE-A method: while $b - a > 2\varepsilon$: evaluate $f$ at the midpoint $t = (a + b)/2$; if $f(t) = t$, return $t$ as the fixed point; otherwise replace $[a, b]$ by its intersection with the envelope interval determined by $(t, f(t))$; on exit return the midpoint $(a + b)/2$ as an $\varepsilon$-approximation to $\alpha$.
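Since the pseudocode of Figure 138.1 did not survive reproduction, here is a runnable sketch of the scheme it describes (our reconstruction: midpoint evaluation plus the interval update implied by $|f(t) - \alpha| \le q\,|t - \alpha|$):

```python
def fpe_a(f, q, eps):
    """Fixed point of a q-contractive f: [0,1] -> [0,1] to within
    absolute error eps, evaluating f at midpoints of the interval
    of uncertainty."""
    a, b, evals = 0.0, 1.0, 0
    while b - a > 2 * eps:
        t = (a + b) / 2
        y = f(t)
        evals += 1
        if y == t:
            return t, evals                       # exact fixed point found
        if y > t:        # the fixed point lies to the right of t
            a, b = max(a, (y + q * t) / (1 + q)), min(b, (y - q * t) / (1 - q))
        else:            # the fixed point lies to the left of t
            a, b = max(a, (y - q * t) / (1 - q)), min(b, (y + q * t) / (1 + q))
    return (a + b) / 2, evals

x, n = fpe_a(lambda s: 0.5 + 0.4 * s, q=0.4, eps=1e-6)
# the fixed point of 0.5 + 0.4s is 5/6 = 0.83333...
```

Each step at least halves the interval (the new interval lies strictly on one side of the midpoint), and for $q < 1$ the envelope intersection typically shrinks it faster than plain bisection would.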
In section 4.1.1 we show that, for any $q$, the FPE-A method utilizes the minimal number of function evaluations to compute an $\varepsilon$-approximation to the fixed point of any $f$ in $F$. This minimal number $m(\varepsilon, q)$ for $q < 1$ is smaller than $\lceil \log(1/(2\varepsilon)) \rceil$, which is the number of function evaluations required by bisection. Obviously, $m(\varepsilon, q)$ is much less than $s(\varepsilon, q)$, the number of function evaluations of the simple iteration, when $q$ is close to 1.

3.1.3
Exercises
15. Derive formally Eq. (129.1).

16. (*) Carry out the analysis of sections 3.1.1 and 3.1.2 for the case of the residual error criterion.
17. (**) Generalize the results of sections 3.1.1 and 3.1.2 to a general homogeneous error criterion $E(A(f, x), x)$, as defined in Eq. (45.1).
3.2
Multivariate Problems
In this section we consider the problem of approximating fixed points of multivariate contractive functions. We utilize the absolute error criterion. We first recall the simple iteration method and then describe a ball iteration method. This method is not more efficient than simple iteration in the worst case; however, it is essentially faster for some test functions presented in section 3.2.6. It is impossible to essentially improve the efficiency of the simple iteration whenever the dimension of the domain of the functions is large. However, for a moderate dimension we derive a fixed point ellipsoid method, which is much more efficient than the simple iteration for mildly contractive functions. This method is based on the concept of dimensional reduction and on the Nemirovsky-Yudin-Shor construction of minimal volume ellipsoids that was originally developed for solving convex optimization problems. Finally, we describe a method based on the computation of centers of gravity. We believe that this method minimizes the number of function evaluations in moderate dimensional cases, although it is nonconstructive.

We let $B^d(0, 1)$ be the unit ball in the $d$-dimensional real space. We consider the class of contractive functions
$$F_d = \{\, f : B^d(0, 1) \to B^d(0, 1) : \|f(\mathbf{x}) - f(\mathbf{y})\| \le q\,\|\mathbf{x} - \mathbf{y}\| \,\}$$
with the factor $q < 1$, where $\|\cdot\|$ is the $l_2$-norm. Due to Banach's fixed point theorem, there exists exactly one fixed point $\alpha(f) \in B^d(0, 1)$ of any $f \in F_d$. We wish to compute an (absolute) $\varepsilon$-approximation $M(f)$ to the fixed point,
$$\|M(f) - \alpha(f)\| \le \varepsilon,$$
for $\varepsilon < 1$.
Simple iteration

The simple iteration method (SI), defined as in Eq. (121.1) by $\mathbf{x}_{i+1} =$
$f(\mathbf{x}_i)$ for $i = 0, 1, \ldots$, with $\mathbf{x}_0 = 0$, requires at most
$$s(\varepsilon, q) = \left\lceil \frac{\log(1/\varepsilon)}{\log(1/q)} \right\rceil$$
function evaluations (iterations) to compute an $\varepsilon$-approximation to the fixed point of any function from the class $F_d$. We note that $s(\varepsilon, q)$ is independent of the dimension $d$. The efficiency of SI cannot be essentially improved whenever $d \ge s(\varepsilon, q) - 1$. However, for small $d$ and $q$ very close to one, there exist methods more efficient than SI. For instance, for the univariate case, $d = 1$, we presented the FPE-A method and proved that it is optimal in terms of the required number of function evaluations (see the previous section). The FPE-A method is always better than bisection and is much faster than SI whenever $q$ is close to one.

3.2.1
A Constructive Lemma
We first prove the following constructive lemma that provides the basis for the bounding procedures of the ball iteration, the ellipsoid iteration, and the centroid methods outlined in the following sections.

Lemma 140.1 We let $f \in F_d$, and suppose that $A \subseteq B^d(0, 1)$ contains the fixed point $\alpha(f)$. Then for every $\mathbf{x} \in A$ we have $\alpha(f) \in A \cap B^d(\mathbf{c}, \gamma)$, where
$$\mathbf{c} = \mathbf{x} + \frac{f(\mathbf{x}) - \mathbf{x}}{1 - q^2} \quad \text{and} \quad \gamma = \frac{q\,\|f(\mathbf{x}) - \mathbf{x}\|}{1 - q^2}.$$

Proof We let $\mathbf{z} = \alpha(f) - \mathbf{x}$ and $\mathbf{a} = \mathbf{x} - f(\mathbf{x})$. Then the inequality $\|f(\mathbf{x}) - f(\alpha(f))\| \le q\,\|\mathbf{x} - \alpha(f)\|$ becomes $\|\mathbf{a} + \mathbf{z}\| \le q\,\|\mathbf{z}\|$, i.e.,
$$(1 - q^2)\,\|\mathbf{z}\|^2 + 2\,\mathbf{a}^T\mathbf{z} + \|\mathbf{a}\|^2 \le 0.$$
Figure 141.1: Location of the fixed point. Dividing both sides by (1 — q2) > 0, we obtain
That is, Replacing z and a back by a(f) - x and x - /(x), respectively, we have
Therefore a(f) belongs to A Ci Bd(c, 7) as claimed. • The lemma is illustrated in Figure 141.1, where H is the supporting hyperplane of Bd(c, 7) at b = (/(x) + *)/(! + ), and its normal is a = x — /(x). We note that b is the intersection point of the line segment [x,/(x)] and the ball Bd(c,^). Apparently, «(f) is also in the half-space H~, i.e., a T (a(f) — b) < 0.
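Lemma 140.1 can be exercised numerically; the sketch below (illustrative, with an affine test map of our choosing) checks that the fixed point indeed lands in $B^d(\mathbf{c}, \gamma)$. For this affine map the contraction bound is attained, so the distance equals $\gamma$ up to rounding.

```python
import random

def lemma_ball(x, fx, q):
    """Center c and radius gamma of the enclosing ball from Lemma 140.1."""
    c = [xi + (fi - xi) / (1 - q * q) for xi, fi in zip(x, fx)]
    delta = sum((fi - xi) ** 2 for fi, xi in zip(fx, x)) ** 0.5
    return c, q * delta / (1 - q * q)

random.seed(1)
q, d = 0.8, 3
for _ in range(200):
    p = [random.uniform(-0.1, 0.1) for _ in range(d)]          # fixed point
    f = lambda z: [q * (zi - pi) + pi for zi, pi in zip(z, p)]  # q-contractive
    x = [random.uniform(-0.5, 0.5) for _ in range(d)]
    c, gamma = lemma_ball(x, f(x), q)
    dist = sum((pi - ci) ** 2 for pi, ci in zip(p, c)) ** 0.5
    assert dist <= gamma + 1e-12     # alpha(f) = p lies in B(c, gamma)
```

The same two formulas for $\mathbf{c}$ and $\gamma$ reappear below in the stopping test of the ball iteration.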
3.2.2
Ball Iteration
In this section we describe the ball iteration method. We suppose that a ball $B^d(\mathbf{x}_i, \gamma_i)$ contains the fixed point of $f \in F_d$. Then the fixed point $\alpha(f)$ is in the intersection
$$\mathcal{B} = B^d(\mathbf{x}_i, \gamma_i) \cap B^d(\mathbf{c}, \gamma),$$
where $B^d(\mathbf{c}, \gamma)$ is given in Lemma 140.1 with $\mathbf{x} = \mathbf{x}_i$. We let $\delta_i = \|f(\mathbf{x}_i) - \mathbf{x}_i\|$. If $\delta_i = 0$, then $\mathbf{x}_i$ is the fixed point of $f$. We now suppose that $\delta_i \ne 0$. Then the triangle inequality gives $\|\mathbf{c} - \mathbf{x}_i\| \le \gamma_i + \gamma$, and since $\mathcal{B}$ is a nonempty set, this yields $\delta_i \le (1 + q)\gamma_i$. We let $B^d(\mathbf{x}_{i+1}, \gamma_{i+1})$ be the smallest ball that encloses $\mathcal{B}$. Then $\mathbf{x}_{i+1}$ and $\gamma_{i+1}$ can be calculated by the formulas (142.1)-(142.3): (i) if $\delta_i$ is small enough that $B^d(\mathbf{c}, \gamma) \subseteq B^d(\mathbf{x}_i, \gamma_i)$, then simply $\mathbf{x}_{i+1} = \mathbf{c}$ and $\gamma_{i+1} = \gamma$; (ii) otherwise the intersection is a proper lens, and the smallest enclosing ball is computed from the geometry of the two balls. Figure 143.1 (a), (b) illustrates case (i), and Figure 143.1 (c) depicts case (ii). The ball iteration (BI) method is defined as follows. Starting from the initial ball $B^d(\mathbf{x}_1, \gamma_1) = B^d(0, 1)$, we iteratively apply the
above formulas to construct minimal volume balls enclosing the fixed point, until we obtain a ball $B^d(\mathbf{x}_k, \gamma_k)$ with $\gamma_k \le \varepsilon$. Then $\mathbf{x}_k$ is an $\varepsilon$-approximation of the fixed point.

Figure 143.1: The minimal ball enclosing the intersection of two balls.

The method also checks whether the residual error satisfies the condition $\|f(\mathbf{x}_i) - \mathbf{x}_i\| \le (1 - q^2)\varepsilon/q$; if it does, then by Lemma 140.1 the point $\mathbf{x}_i + (f(\mathbf{x}_i) - \mathbf{x}_i)/(1 - q^2)$ is an $\varepsilon$-approximation. Figure 144.1 summarizes the ball iteration. We now analyze the number of iterations of the BI method needed to solve our problem. First, we view the above $\delta_i$ as a variable $\delta$ ranging from $0$ to $(1 + q)\gamma_i$. Since $\gamma_{i+1} = q\delta/(1 - q^2)$ increases linearly in $\delta$ over case (i), it is clear that the maximal value of $\gamma_{i+1}$ must occur in case (ii). There, setting the derivative $d\gamma_{i+1}/d\delta = 0$, we find the maximizing value $\delta_{\max}$, for which $\gamma_{i+1} \le q\,\gamma_i$. Therefore the radius of the $i$th ball constructed by the BI method satisfies $\gamma_i \le q^{\,i-1}$ for all $i \ge 1$. We conclude that the worst case number of iterations
of the BI method is also $\lceil \log(1/\varepsilon)/\log(1/q) \rceil$.

Figure 144.1: The ball iteration method: while $\gamma_i > \varepsilon$: if $\delta_i = \|f(\mathbf{x}_i) - \mathbf{x}_i\| \le (1 - q^2)\varepsilon/q$, return $\mathbf{x}_i + (f(\mathbf{x}_i) - \mathbf{x}_i)/(1 - q^2)$ as an $\varepsilon$-approximation and stop; otherwise compute $\mathbf{x}_{i+1}$ and $\gamma_{i+1}$ from Eqs. (142.1)-(142.3) and set $i := i + 1$; on exit return $\mathbf{x}_i$ as an $\varepsilon$-approximation.

We remark that when applied to univariate contractive functions the BI method coincides with the FPE-A iteration. This means that BI minimizes the number of function evaluations in the univariate case.
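The update formulas (142.1)-(142.3) did not survive reproduction, so the sketch below implements a simplified variant (our own): the next iterate is taken to be the Lemma 140.1 center $\mathbf{c}$ itself, and the guaranteed stopping test is the residual condition from Figure 144.1. In the worst case this simplification is not covered by the $\lceil \log(1/\varepsilon)/\log(1/q) \rceil$ bound (that needs the minimal enclosing balls); it only illustrates the residual test and the extrapolated center.

```python
def ball_iteration(f, q, x0, eps, max_iter=100000):
    """Simplified ball iteration for a q-contractive f; the returned point
    is certified by gamma = q*delta/(1-q^2) <= eps (Lemma 140.1)."""
    x = list(x0)
    for i in range(1, max_iter + 1):
        fx = f(x)
        delta = sum((a - b) ** 2 for a, b in zip(fx, x)) ** 0.5
        step = [(fi - xi) / (1 - q * q) for xi, fi in zip(x, fx)]
        if delta <= (1 - q * q) * eps / q:
            return [xi + si for xi, si in zip(x, step)], i   # center c
        x = [xi + si for xi, si in zip(x, step)]
    return x, max_iter

q = 0.9
p = [0.3, -0.2]
f = lambda z: [q * (zi - pi) + pi for zi, pi in zip(z, p)]
sol, n = ball_iteration(f, q, [0.0, 0.0], 1e-8)
# for this affine map the extrapolated step contracts with factor q/(1+q),
# so n is far smaller than the ~175 steps of plain simple iteration
```

For $d = 1$ the extrapolated center reduces to the harmonic/arithmetic constructions of the FPE-A analysis, consistent with the remark above.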
3.2.3
Ellipsoid Iteration
In this section we provide an efficient method for multivariate contractive functions of moderate dimension $d$. Several practical applications fall into this category, as listed in the annotations. The iteration, called the fixed point ellipsoid method (FP-EL), is based on a concept of dimensional reduction and on the Nemirovsky-Yudin-Shor construction of minimum volume circumscribed ellipsoids that was originally designed for solving convex optimization problems. The FP-EL method is outlined as follows. Starting from the initial ball $B^d(0, 1)$, the method evaluates the function $f$ at the center and determines which part of the ball contains the fixed point. It then constructs the ellipsoid of minimal volume that encloses this part. With the new ellipsoid, the method similarly constructs another ellipsoid
with smaller volume to enclose the fixed point, and so on. A sequence of ellipsoids that always enclose the fixed point is thus obtained. The volumes of these ellipsoids are geometrically decreasing. If the radius of some ellipsoid becomes less than $\varepsilon$, then the FP-EL method chooses the center of this ellipsoid as an $\varepsilon$-approximation to the fixed point. Otherwise, the ellipsoids become elongated. Once some ellipsoid is so flat that it can be well approximated by a hyperplane, the FP-EL method switches to this hyperplane to continue the fixed point approximation in the $(d-1)$-dimensional space. These bounding (of fixed points) and dimensional reduction steps are repeated in all dimensions except the one-dimensional case, in which the FPE-A iteration is used to approximate the one-dimensional fixed point. Finally, the FP-EL method determines an $\varepsilon$-approximation to the fixed point from the previous computations. We show that the number of function evaluations required by the FP-EL method to compute an $\varepsilon$-approximation to the fixed point of any function $f \in F_d$ is at most
$$O\!\left(d^2\left(\log\frac{1}{\varepsilon} + \log\frac{1}{1-q}\right)\right).$$
We observe that the constant in the big $O$ notation is independent of $\varepsilon$, $q$, and $d$. We stress that the number of function evaluations depends only weakly on $q$: even if $q$ is very close to one, $q = 1 - \delta$ with small $\delta$, the number of function evaluations is only proportional to $\log(1/\delta)$. It is clear that if $d$ is not too large and $q$ is close to 1, then the number of function evaluations of the FP-EL method is much smaller than the corresponding number for the simple iteration. This section is organized as follows. First, we discuss the dimensional reduction steps of the FP-EL method. Then we prove a lemma that is essential for the bounding steps. Next we present a complete version of the FP-EL iteration that combines both bounding and dimensional reduction steps. Further, we address implementation issues and report on some numerical tests. We list some open problems as exercises at the end of this section.
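The bounding ingredient can be illustrated with the standard Nemirovsky-Yudin-Shor central-cut update. The code below is our sketch of that single ingredient, not the full FP-EL method (no dimensional reduction): by Lemma 140.1 the ball $B^d(\mathbf{c}, \gamma)$, and hence the fixed point, lies in the half-space $\{\mathbf{z} : \mathbf{g}^T(\mathbf{z} - \mathbf{x}) \le 0\}$ with $\mathbf{g} = \mathbf{x} - f(\mathbf{x})$, so cutting the current ellipsoid through its center with normal $\mathbf{g}$ never discards the fixed point.

```python
import numpy as np

def central_cut(c, A, g):
    """Minimal-volume ellipsoid containing
    {z: (z-c)^T A^{-1} (z-c) <= 1} ∩ {z: g^T (z-c) <= 0},  for d >= 2."""
    d = len(c)
    gt = g / np.sqrt(g @ A @ g)            # normalize g in the A-metric
    Ag = A @ gt
    c_new = c - Ag / (d + 1)
    A_new = d * d / (d * d - 1.0) * (A - 2.0 / (d + 1) * np.outer(Ag, Ag))
    return c_new, A_new

def bounding_steps(f, steps, d):
    c, A = np.zeros(d), np.eye(d)          # start with the unit ball B^d(0,1)
    for _ in range(steps):
        g = c - f(c)                       # cut direction from Lemma 140.1
        if np.linalg.norm(g) < 1e-14:      # center is (numerically) fixed
            break
        c, A = central_cut(c, A, g)
    return c, A

q, p = 0.95, np.array([0.2, -0.1])
f = lambda z: q * (z - p) + p              # sample contraction, fixed point p
c, A = bounding_steps(f, 60, 2)
# the fixed point p stays inside: (p-c)^T A^{-1} (p-c) <= 1
```

Each cut shrinks the volume by a factor bounded away from one, which is the source of the geometric decrease mentioned above; the elongation that eventually triggers dimensional reduction is also visible in the spectrum of $A$.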
Dimensional reduction

We describe here the dimensional reduction procedure used in the FP-EL method. First, for any $f \in F_d$, we let $\tilde f : \mathbb{R}^d \to B^d(0, 1)$ be the extension of $f$ by the (nonexpansive) radial projection onto the unit ball,
$$\tilde f(\mathbf{x}) = f\!\left(\frac{\mathbf{x}}{\max(1, \|\mathbf{x}\|)}\right).$$
It can be verified that $\tilde f$ is contractive with the same factor $q$ as $f$ and that $\alpha(\tilde f) = \alpha(f)$ is its unique fixed point. Thus, we can use $\tilde f$ instead of $f$ for approximating the fixed point $\alpha(f)$; we write $f$ for $\tilde f$ in what follows. We suppose that a hyperplane $H = \{\mathbf{x} \in \mathbb{R}^d : \mathbf{x}^T\mathbf{e}_1 = a\}$, where $\mathbf{e}_1 = [1, 0, \ldots, 0]^T \in \mathbb{R}^d$, is given. The deflated function of $f$ on $H$ is defined as the $(d-1)$-dimensional function
$$g(\mathbf{x}) = [f_2(a, \mathbf{x}), \ldots, f_d(a, \mathbf{x})]^T, \qquad \mathbf{x} \in \mathbb{R}^{d-1}.$$
It maps $\mathbb{R}^{d-1}$ into $B^{d-1}(0, 1)$, since $\|g(\mathbf{x})\| \le \|f(a, \mathbf{x})\| \le 1$ for all $\mathbf{x} \in \mathbb{R}^{d-1}$. In addition, $g$ is contractive with the same factor $q$ as $f$, since
$$\|g(\mathbf{x}) - g(\mathbf{y})\| \le \|f(a, \mathbf{x}) - f(a, \mathbf{y})\| \le q\,\|\mathbf{x} - \mathbf{y}\|.$$
Consequently, $g$ has a unique fixed point $\alpha(g) \in B^{d-1}(0, 1)$.

Lemma 146.1 We let $g$ be the deflated function of $f$ on the hyperplane $\mathbf{x}^T\mathbf{e}_1 = a$. Then
$$\bigl\| [a, \alpha(g)]^T - \alpha(f) \bigr\| \;\le\; \frac{|a - \alpha(f)_1|}{\sqrt{1 - q^2}}.$$
Proof By definition, $\alpha(g)_i = f_{i+1}(a, \alpha(g))$ for $1 \le i \le d-1$. Thus, writing $\bar\alpha = [\alpha(f)_2, \ldots, \alpha(f)_d]^T$, we have
$$\|\alpha(g) - \bar\alpha\| = \bigl\| [f_2(a, \alpha(g)), \ldots, f_d(a, \alpha(g))]^T - [f_2(\alpha(f)), \ldots, f_d(\alpha(f))]^T \bigr\| \le q\, \bigl\| [a, \alpha(g)]^T - \alpha(f) \bigr\|.$$
That is,
$$\|\alpha(g) - \bar\alpha\|^2 \le q^2 \bigl( (a - \alpha(f)_1)^2 + \|\alpha(g) - \bar\alpha\|^2 \bigr),$$
or
$$\|\alpha(g) - \bar\alpha\|^2 \le \frac{q^2}{1 - q^2}\,(a - \alpha(f)_1)^2.$$
Hence, by adding $(a - \alpha(f)_1)^2$ to both sides, we obtain
$$\bigl\| [a, \alpha(g)]^T - \alpha(f) \bigr\|^2 \le \frac{(a - \alpha(f)_1)^2}{1 - q^2},$$
and the conclusion follows. ∎

This lemma provides the following dimensional reduction procedure. First, we suppose that $c_1$ has been computed such that $|c_1 - \alpha(f)_1| \le \eta_1$. We let $f^{[d-1]}$ be the deflated function of $f^{[d]} = f$ on the hyperplane $\mathbf{x}^T\mathbf{e}_1 = c_1$. Then we switch to the $(d-1)$-dimensional space to compute $c_2$ such that $|c_2 - \alpha(f^{[d-1]})_1| \le \eta_2$, and similarly define $f^{[d-2]}$, the $(d-2)$-dimensional deflated function of $f^{[d-1]}$. Following this, we obtain the deflated functions $f^{[d-1]}, \ldots, f^{[1]}$ and the corresponding $c_i$'s that satisfy $|c_i - \alpha(f^{[d-i+1]})_1| \le \eta_i$. The value of $c_d$ can be computed using the FPE-A iteration, so that $|c_d - \alpha(f^{[1]})| \le \eta_d$. We now show that the vector $\mathbf{c} = [c_1, c_2, \ldots, c_d]^T$ is an $\varepsilon$-approximation of $\alpha(f)$, i.e., $\|\mathbf{c} - \alpha(f)\| \le \varepsilon$, whenever the $\eta_i$'s are chosen sufficiently small. First, we let $\mathbf{a}^{(i)} = [c_1, \ldots, c_{i-1}, \alpha(f^{[d-i+1]})]^T$ for $1 \le i \le d$, so that $\mathbf{a}^{(1)} = \alpha(f)$. Then, with $\mathbf{a}^{(d+1)} = \mathbf{c}$,
$$\|\mathbf{c} - \alpha(f)\| \le \sum_{i=1}^{d} \|\mathbf{a}^{(i+1)} - \mathbf{a}^{(i)}\|.$$
Since a^{(i)} and a^{(i+1)} are related through the deflation of f^{[d−i+1]} on the hyperplane x^T e_1 = c_i, Lemma 146.1 yields

||a^{(i)} − a^{(i+1)}|| ≤ |c_i − α(f^{[d−i+1]})_1| / √(1 − q²) ≤ η_i / √(1 − q²),  i = 1, ..., d − 1.

Thus, we have

||c − α(f)|| ≤ Σ_{i=1}^{d−1} ||a^{(i)} − a^{(i+1)}|| + ||a^{(d)} − c|| ≤ Σ_{i=1}^{d−1} η_i / √(1 − q²) + η_d.  (148.1)

Taking, for instance,

η_i = ε √(1 − q²) / d,  i = 1, ..., d,  (148.2)

the above inequality implies that ||c − α(f)|| ≤ ε. We can also determine the needed accuracy of the η_i's dynamically. That is, we let ε_d = ε, and we suppose that c_1 was computed so that

|c_1 − α(f)_1| ≤ δ_1

with known δ_1. Then inequality (148.1) implies that it is enough to approximate the fixed point α(f^{[d−1]}) to within ε_{d−1} = ε_d − δ_1/√(1 − q²). Following this, we dynamically set ε_i for approximating α(f^{[i]}) for all i. This, together with Eq. (148.2), is used in the FP-EL method, as depicted in Figure 154.1.

We assumed so far that f was deflated on the specific hyperplane of the form x^T e_1 = a. This restriction can be easily removed, and we can similarly deflate f on any hyperplane. This is due to the fact that the l_2 norm is preserved under orthogonal transformations. The generalized deflation is described as follows. We take a hyperplane H : (x − u)^T d = 0 that contains the point u. The normal vector of H is d, with ||d|| = 1. We let Q be an orthogonal matrix such that Qd = e_1. For example, we can choose Q as a Householder matrix Q = I − 2vv^T with a properly selected vector v, ||v|| = 1. In the space transformed by Q we define the function h by h(·) = Q f(Q^T(·)). Then the fixed point of h is α(h) = Qα(f), since

h(Qα(f)) = Q f(Q^T Q α(f)) = Q f(α(f)) = Q α(f).
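Such a Householder matrix is easy to construct explicitly: for unit vectors d and e_1, taking v proportional to d − e_1 gives Qd = e_1. A minimal sketch (the degenerate-case guard is our own addition):

```python
import numpy as np

def householder_to_e1(d):
    """Return an orthogonal Q = I - 2 v v^T with Q d = e1,
    for a unit vector d."""
    d = np.asarray(d, dtype=float)
    n = d.size
    e1 = np.zeros(n)
    e1[0] = 1.0
    w = d - e1
    if np.linalg.norm(w) < 1e-14:       # d already equals e1
        return np.eye(n)
    v = w / np.linalg.norm(w)
    return np.eye(n) - 2.0 * np.outer(v, v)

d = np.array([1.0, 2.0, -2.0]) / 3.0    # a unit normal vector
Q = householder_to_e1(d)
```

Since ||d|| = ||e_1|| = 1, the reflection across the hyperplane orthogonal to v = (d − e_1)/||d − e_1|| swaps d and e_1, which is exactly the rotation of the normal direction onto the first coordinate axis needed by the deflation.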
We observe that if x is an ε-approximation to α(h), then ||Q^T x − α(f)|| = ||Q^T(x − α(h))|| = ||x − α(h)|| ≤ ε, i.e., Q^T x is an ε-approximation to α(f). Therefore, we can approximate α(h) instead of α(f), provided Q is known. We also note that in the transformed space, H becomes H̄ : x^T e_1 = c_1 = d^T u. We thus can define the deflated function of h on H̄ as before.
For i = d, d − 1, ..., 2, the (i − 1) dimensional deflated function f^{[i−1]} is obtained by deflating the transformed function

h^{[i]}(·) = Q_i f^{[i]}(Q_i^T(·)),  (149.1)

where the i × i orthogonal matrix Q_i rotates the desired direction onto the first coordinate axis of the i-dimensional space. In summary, we obtain the dimensional reduction scheme illustrated in Figure 150.1 for approximating the fixed point of f. As mentioned earlier, Q^T c is an ε-approximation of α(f) if the η_i's in this scheme are chosen such that the right-hand side of inequality (148.1) is at most ε. To estimate the number of function evaluations in the dimensional reduction procedure, we suppose that μ(i) function evaluations are used to complete step (*) in this scheme with appropriately chosen η_i's. Then, the procedure requires at most

Σ_{i=1}^{d} μ(i)  (149.2)

function evaluations to compute an ε-approximation to α(f). In what follows we show how to realize step (*) of the dimensional reduction scheme.
k := d; Q := identity matrix;
while k > 1 do
begin
    (*) Find a k-dimensional hyperplane H : (x − u)^T d = 0, ||d|| = 1, such that |(α(f^{[k]}) − u)^T d| ≤ η_{d−k+1};
    Find Q_k, the orthogonal matrix as in (149.1), that rotates the vector d onto the first coordinate axis of the k-dimensional space;
    Form the deflated function f^{[k−1]}, record c_{d−k+1} := d^T u, and update Q;
    k := k − 1;
end;
Use the FPE-A algorithm to find c_d such that |c_d − α(f^{[1]})| ≤ η_d;
return Q^T c as an ε-approximation of the fixed point α(f).

Figure 150.1: Dimensional reduction scheme.

Ellipsoid method
The construction of the FP-EL ellipsoid method is based on Lemma 140.1. We first observe the following corollary of this lemma.

Corollary 150.1 We let f be any function in F_d. Then, for any x ∈ B̄^d(0, 1),

a^T α(f) ≤ a^T b,

with

a = x − f(x)  and  b = (f(x) + q x)/(1 + q).

We are ready to present the FP-EL method. We initially evaluate f at x_0 = 0, the center of the ball E_0 = B̄^d(0, 1). By Corollary 150.1, the fixed point α(f) belongs to

Z = {x ∈ E_0 : a^T x ≤ a^T b},

where a = x_0 − f(x_0) and b = (f(x_0) + q x_0)/(1 + q). We then construct the ellipsoid E_1 of minimum volume to enclose Z, as depicted
in Figure 152.1. The construction of minimal volume ellipsoids, such as E_1, is given in the following theorem (see the annotations to this chapter).

Theorem 151.1 We let E_i = {x ∈ R^d : (x − x_i)^T A_i^{−1}(x − x_i) ≤ 1} be an ellipsoid defined by a symmetric and positive definite matrix A_i. The unique ellipsoid of minimum volume containing the ellipsoid section

E_i ∩ {x ∈ R^d : a^T x ≤ a^T x_i − ξ √(a^T A_i a)},  0 ≤ ξ < 1/d,

is given by

E_{i+1} = {x ∈ R^d : (x − x_{i+1})^T A_{i+1}^{−1}(x − x_{i+1}) ≤ 1},

with

x_{i+1} = x_i − τ A_i a / √(a^T A_i a),
A_{i+1} = δ (A_i − σ (A_i a)(A_i a)^T / (a^T A_i a)),

where

τ = (1 + dξ)/(d + 1),  δ = d²(1 − ξ²)/(d² − 1),  σ = 2(1 + dξ)/((d + 1)(1 + ξ)).

The volume ratio is given by

vol(E_{i+1}) / vol(E_i) = √(|A_{i+1}| / |A_i|) ≤ e^{−1/(2(d+1))},  (151.1)

where |A| denotes the determinant of A.

In our case, the set Z can be rewritten as

Z = E_0 ∩ {x ∈ R^d : a^T x ≤ a^T x_0 − ξ √(a^T A_0 a)}

with the initial x_0 = 0 and A_0 = I. Thus, the parameter ξ is given by

ξ = ||a|| / (1 + q),

since √(a^T A_0 a) = ||a|| and a^T x_0 − a^T b = ||a||² / (1 + q).
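The minimum-volume update can be sketched directly in code. The function below is our transcription of the standard deep-cut ellipsoid update with parameters τ, δ, σ as in the ellipsoid-method literature; treat it as an illustration rather than the book's exact routine.

```python
import numpy as np

def ellipsoid_cut(x, A, a, xi=0.0):
    """Minimum-volume ellipsoid containing
    {y : (y-x)^T A^{-1} (y-x) <= 1, a^T y <= a^T x - xi*sqrt(a^T A a)},
    for cut depth 0 <= xi < 1/d (xi = 0 is the central cut)."""
    d = x.size
    Aa = A @ a
    s = np.sqrt(a @ Aa)
    tau = (1.0 + d * xi) / (d + 1.0)
    delta = d**2 * (1.0 - xi**2) / (d**2 - 1.0)
    sigma = 2.0 * (1.0 + d * xi) / ((d + 1.0) * (1.0 + xi))
    x_new = x - tau * Aa / s
    A_new = delta * (A - sigma * np.outer(Aa, Aa) / (a @ Aa))
    return x_new, A_new

# Central cut of the unit ball in R^3 along a = e1.
x0, A0 = np.zeros(3), np.eye(3)
a = np.array([1.0, 0.0, 0.0])
x1, A1 = ellipsoid_cut(x0, A0, a)
ratio = np.sqrt(np.linalg.det(A1))   # volume ratio vol(E1)/vol(E0)
```

For d = 3 and a central cut, the volume ratio is 27/32 ≈ 0.844, below the bound e^{−1/(2(d+1))} ≈ 0.883, and boundary points of the half-ball (for example [0, 1, 0]) lie on the new ellipsoid.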
Figure 152.1: Minimum volume ellipsoid.

Obviously, the ellipsoid E_1 also contains α(f). We hence compute f at x_1, the center of E_1, and similarly construct another, smaller minimal volume ellipsoid E_2 to enclose the part of E_1 which contains the fixed point α(f). Following this idea, a sequence of ellipsoids {E_i} is constructed such that α(f) belongs to each E_i. The volumes of the ellipsoids E_i are geometrically decreasing, since each step reduces the volume at least by the factor e^{−1/(2(d+1))} from (151.1).
We stop constructing ellipsoids whenever (i) the radius of some ellipsoid E_m is not greater than ε, or (ii) the smallest semi-axis of E_m is at most η_1 = ε√(1 − q²)/d, as given by (148.2). The smallest semi-axis and the largest semi-axis of the ellipsoid E_m are given by √(λ_1(A_m)) and √(λ_d(A_m)), where λ_1(A_m) and λ_d(A_m) are the smallest and largest eigenvalues of A_m, respectively. Thus, case (i) is satisfied whenever √(λ_d(A_m)) ≤ ε. Then the center x_m of E_m is an ε-approximation of α(f).
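Both stopping tests require only the extreme eigenvalues of A_m, since the semi-axes of {x : (x − c)^T A^{−1}(x − c) ≤ 1} are the square roots of the eigenvalues of A. A small sketch (the example matrix and parameter values are arbitrary choices of ours):

```python
import numpy as np

def semi_axes(A):
    """Smallest and largest semi-axes of the ellipsoid
    {x : (x - c)^T A^{-1} (x - c) <= 1}."""
    lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
    return np.sqrt(lam[0]), np.sqrt(lam[-1])

A = np.diag([0.04, 0.25, 1.0])           # example ellipsoid matrix
small, large = semi_axes(A)              # 0.2 and 1.0

eps, q, d = 1e-1, 0.5, 3
stop_radius = large <= eps                           # case (i)
stop_deflate = small <= eps * np.sqrt(1 - q**2) / d  # case (ii)
```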
In case (ii) we have √(λ_1(A_m)) ≤ η_1. We let H : (x − x_m)^T d = 0, where d is an eigenvector of A_m corresponding to λ_1(A_m). Then |(α(f) − x_m)^T d| ≤ η_1, since E_m contains α(f). Therefore, we can use the dimensional reduction to approximate α(f) in the (d − 1) dimensional space. The ellipsoid method obtained in this way is presented in Figure 154.1. We comment on this method.

1. The inner while loop (steps 4–14) constructs minimum volume ellipsoids to enclose the fixed point. This part of the FP-EL method corresponds to step (*) in the dimensional reduction scheme.

2. The method dynamically determines the error tolerance for the deflation process. An ε-approximation to the fixed point α(f) is assured by step 19.

3. The matrix A_i is k × k, and so is Q̄ in the decomposition A_i = Q̄DQ̄^T (step 18). This leads to the updating of Q as in step 20.

We now analyze the efficiency of the FP-EL method. Since |A_0| = 1, Eq. (151.1) yields |A_i| ≤ e^{−i/(k+1)}. This implies that √(λ_1(A_i)) ≤ ε_k√(1 − q²)/k is satisfied for large i, and the inner loop terminates after at most μ(k) steps. We note that step 19 is executed only after the condition √(λ_k(A_i)) ≤ ε_k is met. It is easy to check that ε_k ≥ ε/d for all k. Therefore μ(k) = O(k²(ln(1/ε) + ln(1/(1 − q)) + ln d)).
[Figure 154.1 presents the FP-EL method in pseudocode (steps 1–27): the outer while loop runs over dimensions k = d, ..., 2, with Q initialized to the identity matrix; the inner while loop (steps 4–14) constructs the minimum volume ellipsoids; if √(λ_k(A_i)) ≤ ε_k, then step 19 returns Q^T x_i as an ε-approximation of α(f); otherwise step 18 decomposes A_i = Q̄DQ̄^T such that the first column of Q̄ corresponds to the smallest eigenvalue, step 20 partitions Q̄ in columns and updates Q, and the method deflates to dimension k − 1 (steps 21–25); finally, step 26 uses the FP-EN iteration to find c_d, and step 27 returns Q^T c as an ε-approximation to α(f).]

Figure 154.1: FP-EL method.

From Eq. (149.2) we conclude that the FP-EL method requires at most p(ε, q) function evaluations to compute an ε-approximation to the fixed point of any function from the class F_d, where

p(ε, q) = Σ_{k=1}^{d} μ(k) = O(d³(ln(1/ε) + ln(1/(1 − q)) + ln d)).
Obviously, p(ε, q) is much smaller than the number of function evaluations s(ε, q) of the SI method whenever the dimension d is not too large and q is close to 1. In this case, the FP-EL method is much more efficient than the simple iteration. This is also demonstrated in section 3.2.5, which summarizes the numerical tests.

3.2.4 Centroid Method
In this section we construct a ball centroid method (BC). This method is also based on Lemma 140.1 and the dimensional reduction. It fulfills the bounding step (*) of the dimensional reduction scheme in Figure 150.1 by the procedure listed in Figure 156.1, in which r(Z), for a convex set Z ⊂ B̄(0, 1), denotes the radius of the smallest ball containing Z, and ε_k, ε/d ≤ ε_k ≤ ε, is the required error tolerance for the kth dimension, as discussed in section 3.2.3. We define the sets S_i as in Figure 156.1. Then, we let S_i' = {x ∈ S_i : a^T(x − x̄_i) ≥ 0}, where x̄_i is the centroid (center of gravity) of S_i and a = f(x̄_i) − x̄_i. The ratio of volumes of the sets S_i' and S_i satisfies

vol(S_i') / vol(S_i) ≤ 1 − (k/(k + 1))^k ≤ 1 − 1/e.

Since S_{i+1} is a subset of S_i' (in Figure 156.1), we obtain

vol(S_{i+1}) ≤ (1 − 1/e) vol(S_i).

This yields vol(S_m) ≤ (1 − 1/e)^m vol(B^k(0, 1)). Moreover, we recall that the dimensional reduction can be performed after one component of the fixed point has been approximated to within
S_0 := B̄^k(0, 1); i := 0;
while the fixed point is not yet bounded accurately enough do
begin
    if r(S_i) ≤ ε_k then return the center of S_i as an approximation of α;
    Let x̄_i be the centroid of S_i;
    Evaluate f(x̄_i) and set S_{i+1} := {x ∈ S_i : (f(x̄_i) − x̄_i)^T (x − x̄_i) ≥ 0}; i := i + 1;
end;

Figure 156.1: The procedure used by the BC method to bound the fixed point.

absolute error. This is accomplished after at most m steps, whenever half of the minimum width w(S_m) of the set S_m is not greater than r*, where w(S_m) is the minimum distance between H_1 and H_2,
and H_1, H_2 are distinct, parallel (k − 1) dimensional hyperplanes supporting the set S_m. Using Steinhagen's theorem of convex analysis, the minimum width of S_m can be bounded in terms of its volume, which decreases geometrically. Therefore, we select the smallest number m such that this width bound guarantees w(S_m)/2 ≤ r*, for the BC method to switch to the (k − 1) dimensional problem (through the dimensional deflation). The last inequality yields that

m = O(k (ln(1/ε) + ln(1/(1 − q)) + ln d)).

Using (149.2), we conclude that the number of function evaluations in the BC method is at most

O(d² (ln(1/ε) + ln(1/(1 − q)) + ln d)).
It is worth pointing out that the above number is about d times smaller than the upper bound on the number of iterations in the FP-EL method. Hence the BC method provides a better upper bound on the complexity of computing ε-approximations for the class F_d. Nevertheless, the BC method is not suitable for implementation, since it is difficult to numerically determine the centroid x̄_i of S_i in Figure 156.1. In our recent paper we improve this bound to O(d(log(1/ε) + log(1/(1 − q)))). The latter bound can also be obtained by the interior ellipsoid algorithm (see the annotations).

3.2.5 Numerical Tests
Implementation of the FP-EL method requires the computation of minimum volume ellipsoids and of the eigensystems of the matrices that define these ellipsoids. It turns out that the formula for the construction of minimum volume ellipsoids given in Theorem 151.1, despite its elegance, is numerically unstable. The instability comes from roundoff error, which may cause the computed matrices A_i to become indefinite. Hence, the quantity a^T A_i a, whose square root is required, may turn out to be zero or negative. To avoid such instability, one must implement the algorithm in a numerically stable way, for example, by using Gill's rank-one matrix updating algorithm. We briefly comment on the computation of eigenvalues and eigenvectors in the inner loop of the FP-EL method (Figure 154.1, steps 4–14). We recall that

A_{i+1} = δ (A_i − σ (A_i a)(A_i a)^T / (a^T A_i a)).
This can be rewritten as a rank-one matrix updating,

A_{i+1}/δ = A_i + ρ z z^T,

with z = A_i a / √(a^T A_i a) and ρ = −σ. Using this special form, we can apply the Bunch–Nielsen–Sorensen algorithm to compute the eigensystem of B_{i+1} = A_{i+1}/δ provided that the eigensystem of B_i = Q_i D_i Q_i^T is known. Starting with both Q_0 and D_0 as the identity matrices, we sequentially apply this algorithm to update the minimum volume ellipsoids and their eigensystems at the same time. We implemented the FP-EL method in the ANSI C programming language. The program ran successfully for all our test functions. We
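As a sanity check on this rank-one structure, one can compare a direct eigendecomposition of B_i + ρzz^T with that of B_i; the Bunch–Nielsen–Sorensen algorithm computes the former from the latter via a secular equation, whereas the sketch below simply recomputes with a dense solver (all matrices are arbitrary examples of ours, not the book's data).

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric matrix with known eigensystem B = Q D Q^T.
M = rng.standard_normal((4, 4))
B = (M + M.T) / 2.0
lam, Q = np.linalg.eigh(B)

# Rank-one update B' = B + rho * z z^T with rho > 0.
z = rng.standard_normal(4)
rho = 0.7
B_new = B + rho * np.outer(z, z)
lam_new, Q_new = np.linalg.eigh(B_new)
# For rho > 0 each updated eigenvalue is at least the corresponding
# old one (eigenvalue interlacing), which the secular-equation
# solvers exploit.
```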
also compared its efficiency (CPU time and the number of function evaluations) with the SI and BI methods. We ran our tests on an HP 9000/300 computer. In all testing cases, the three methods use the same initial evaluation points. In addition, the FP-EL and BI methods also use the same initial balls. We use the condition

||x_i − x_{i−1}|| ≤ ε (1 − q)/q

as the stopping criterion for both the SI and BI methods, since it implies ||x_i − α(f)|| ≤ ε. We explain the format of the tables. The decimal numbers are CPU times measured in seconds. The integer numbers in parentheses are the numbers of function evaluations. In the FP-EL columns, if dimensional reductions do occur, the numbers of function evaluations are listed from the higher dimension to the lower dimension.

Test 1. The first testing function is a very simple affine mapping, where the constant vector s is randomly chosen from B^d(0, 1). Obviously, s is the unique fixed point of f. Table 158.1 shows the results for d = 4. Table 159.1 shows the results for varying d from 2 to 6. Both tables use B^d(0, 2) as the initial ball. It is surprising that the BI method is extremely efficient (CPU time < 0.001 sec.) in these cases.

Table 158.1: T1: d = 4 and ε = 10^{−6}
1−q       FP-EL          SIA                      BIA
10^{−1}   0.680 (328)    0.002 (122)              < 10^{−3} (19)
10^{−2}   0.680 (324)    0.040 (1273)             < 10^{−3} (20)
10^{−3}   0.720 (333)    0.400 (> 1.27 × 10^4)    < 10^{−3} (20)
10^{−4}   0.620 (298)    3.900 (> 1.27 × 10^5)    < 10^{−3} (20)
10^{−5}   0.660 (322)    39.30 (> 1.27 × 10^6)    < 10^{−3} (20)
Table 159.1: T1: q = 1 − 10^{−6} and ε = 10^{−6}

d   FP-EL          SIA                      BIA
2   0.028 (84)     232.3 (> 1.24 × 10^7)    < 10^{−3} (19)
3   0.240 (181)    338.3 (> 1.27 × 10^7)    < 10^{−3} (20)
4   0.710 (346)    421.9 (> 1.27 × 10^7)    < 10^{−3} (20)
5   1.500 (530)    513.2 (> 1.28 × 10^7)    < 10^{−3} (20)
6   2.900 (718)    604.0 (> 1.28 × 10^7)    < 10^{−3} (20)
Test 2. The testing function is constructed as follows. First, we let F(x) = Ax + G(x) − b, where the matrix A, the nonlinear term G, and the vector b depend on a parameter t. We test with properly chosen t such that f is contractive. The results are shown in Table 160.1.

Test 3. We define the periodic function g with
Table 160.1: T2: ε = 10^{−6}

1−q       FP-EL           SIA               BIA
10^{−2}   24.30 (2093)    0.025 (94)        0.040 (168)
10^{−3}   27.00 (2285)    0.085 (335)       0.180 (651)
10^{−4}   27.30 (2341)    0.290 (1150)      0.400 (1239)
10^{−5}   28.50 (2355)    1.000 (3897)      2.200 (6733)
10^{−6}   28.20 (2346)    3.300 (13120)     3.700 (11985)
10^{−7}   28.20 (2330)    11.50 (43977)     13.80 (44762)
for i = 1, 2 and 2m − 1 ≤ x_1, x_2 ≤ 2m + 1, where m is an integer. It is easy to check that g is contractive with the factor 0 < q < 1 and that (1, 1) is its unique fixed point. We test

Table 160.2: T3: ε = 10^{−6} with the initial ball B²([0, 0], 2)

1−q       FP-EL          SIA             BIA
10^{−1}   0.035 (111)    0.000 (44)      0.002 (46)
10^{−2}   0.040 (113)    0.002 (68)      0.008 (134)
10^{−3}   0.040 (113)    0.002 (82)      0.030 (449)
10^{−4}   0.040 (114)    0.004 (93)      0.100 (1459)
10^{−5}   0.040 (113)    0.000 (26)      0.150 (2065)
10^{−6}   0.040 (112)    0.004 (114)     0.770 (11518)

(a) θ = π/6.

1−q       FP-EL              SIA                 BIA
10^{−1}   0.035 (102)        0.004 (111)         0.002 (38)
10^{−2}   0.045 (111, 40)    0.040 (947)         0.026 (393)
10^{−3}   0.045 (118, 50)    0.314 (7245)        0.010 (122)
10^{−4}   0.045 (121)        1.800 (49664)       0.140 (2082)
10^{−5}   0.055 (133, 55)    10.07 (277745)      57.60 (780225)
10^{−6}   0.045 (136)        44.60 (1128360)     436.3 (5922527)

(b) θ = π/2.
with θ = π/6 and θ = π/2. Table 160.2 shows the results.

Test 4. We test the function
with the saw-like function
for i = 1, 2 and m ≤ x_i ≤ m + 1, where m is any integer. We note

Table 161.1: T4: ε = 10^{−6} and the initial ball is B²([0, 0], 2)

1−q       FP-EL          SIA                  BIA
10^{−1}   0.040 (112)    0.002 (85)           0.005 (94)
10^{−2}   0.040 (112)    0.038 (1002)         0.060 (993)
10^{−3}   0.045 (113)    0.400 (12434)        0.810 (11417)
10^{−4}   0.040 (113)    5.300 (147246)       8.800 (124188)
10^{−5}   0.045 (113)    64.00 (1701155)      92.90 (1341851)
10^{−6}   0.045 (113)    fail                 fail

(a) θ = π/4.

1−q       FP-EL          SIA                  BIA
10^{−1}   0.045 (113)    0.002 (88)           0.000 (19)
10^{−2}   0.045 (114)    0.030 (1088)         0.075 (1123)
10^{−3}   0.040 (114)    0.400 (13161)        0.900 (12192)
10^{−4}   0.040 (113)    5.500 (154630)       8.400 (128922)
10^{−5}   0.040 (113)    62.80 (1777443)      95.90 (1417180)
10^{−6}   0.040 (113)    fail                 fail

(b) θ = π/3.
that 0 < q < 1 is the contractive factor of f. Table 161.1 shows the results for θ = π/4 and θ = π/3. The above results indicate that for some functions with q very close to 1 (e.g., T3 and T4), the overhead of our implementation due to solving eigensystems and updating ellipsoids is negligible when compared with the costs of the BI and SI methods. Our tests also revealed that the cost of the FP-EL method depends very weakly on the contraction factor q. This agrees with the theoretical bound, which depends only on ln(1/(1 − q²)).

3.2.6 Exercises
18. (*) Find a selection of the η_i's that guarantees that the right-hand side of inequality (148.1) is at most ε and that minimizes the number of function evaluations in the FP-EL method.

19. (**) Find the minimal number m(ε, q) of function evaluations needed to compute an ε-approximation to the fixed point of any function from F_d, for a small and moderate dimension d.

20. (**) Analyze the average number of function evaluations necessary to solve the fixed point problem. The average number is defined with respect to a given probability measure on the class F_d. One may choose, for example, a Gaussian or Wiener-type measure.

21. (**) Analyze the average number of function evaluations in the ball iteration method.
3.3 Annotations
The constructive approach to the approximation of fixed points can be traced back to the classical work of Stefan Banach. In his dissertation in 1922 [3] he developed the Simple Iteration method for contractive functions in Banach spaces. Intensive research in this area started in the 1960s, when Scarf [32] demonstrated a simplicial algorithm to approximate fixed points of general continuous maps. Following this development, several new classes of algorithms were invented, including the restart methods of Merrill [24], [25], the homotopy continuation methods of Eaves and Saigal [7], [8], as well as Allgower and Georg [1], Garcia and Zangwill [9], Kojima and Yamamoto [19], and Todd
[44]. Most of these were designed to approximate fixed points of general continuous maps using the residual error criterion.

3.3.1 Specific Comments
Section 3.1

1. The fixed point computation is an example of a problem for which sequential computation must be used (with the relative error criterion), and the sequential computation is much more powerful than parallel computation (with the absolute error criterion). This is in contrast to linear problems, for which sequential computation is not more powerful than parallel computation, as explained by Traub et al. [46].

2. The FPE-A method for univariate contractive functions was developed by us in [38].

3. The ball iteration method and its numerical tests are exhibited in the work of Tsay [48] and also Xu and Shi [52].

4. Nemirovsky proved in [27] that it is impossible to essentially improve the efficiency of the simple iteration whenever the dimension of the domain of contractive functions is sufficiently large.

5. The circumscribed ellipsoid method was originally developed for convex optimization problems by Nemirovsky and Yudin [28] (it is known as the Nemirovsky–Yudin–Shor algorithm; see also Shor [34], [35]) and later applied to the linear programming problem by Khachiyan [17].

6. The centroid (center of gravity) method for solving convex optimization problems was designed by Levin [23] and independently discovered later by Newman [30]. The analysis of this method is based on results of convex analysis due to Grunbaum [12] (see also Mitiagin [26]).

7. The interior ellipsoid method for solving convex optimization problems was developed by Tarasov, Khachiyan, and Erlikh [43].

8. The proof of Lemma 140.1 was carried out in [40]. This lemma was also published in [38].
9. The quantities x_{i+1} and γ_{i+1} in the BI method are calculated according to the formulas given by Williamson [50].

10. For univariate contractive functions, Williamson [50] demonstrated that the BI method coincides with the FPE-A iteration.

Section 3.2

1. For many practical applications the dimension is small and the contraction factor is close to one. These include problems in economics, game theory, and electromagnetics, as listed, for example, in [13], [14].

2. The Householder orthogonal matrices are constructed as in [11].

3. The construction of minimal volume ellipsoids in the FP-EL method and the proof of Theorem 151.1 are given in Konig and Pallaschke [20].

4. The estimates of the ratio of volumes of consecutive sets in the BC method are given by Mitiagin [26].

5. Steinhagen's theorem is given by Juhnke [16].

6. Implementation of the FP-EL method requires the computation of minimum volume ellipsoids and the eigensystems of the matrices that define these ellipsoids. Bland et al. [4] showed that the formula for the construction of minimum volume ellipsoids given in Theorem 151.1 is numerically unstable. To avoid this instability, we implemented the algorithm in a numerically stable way, using Gill's rank-one matrix updating algorithm from [10].

7. The eigensystem of B_i is updated with the Bunch–Nielsen–Sorensen algorithm from [5].

8. The testing function T2 is taken from [52].

9. A general formulation of the average case analysis can be found in Traub et al. [46].
10. In our paper [15] we analyze the interior ellipsoid method (see [43]) for the computation of fixed points. We show that this constructive method enjoys the same bound as the (nonconstructive) centroid iteration, O(d(log(1/ε) + log(1/(1 − q)))), and that both methods do not require the dimensional deflation procedure. Therefore, we improve the bounds presented here by a factor of d. Our analysis also implies that the circumscribed ellipsoid algorithm does not need the dimensional deflation, and its number of iterations can be bounded by O(d²(log(1/ε) + log(1/(1 − q)))). The latter algorithm may become the most practical (when the function evaluations are not too expensive), since its implementation is much simpler than that of the interior ellipsoid procedure. We believe that the best possible bound on the number of function evaluations under the worst case analysis is of the form O(d(log(1/ε) + log(1/(1 − q)))), for ε → 0, q → 1⁻. In the following bibliography we list a number of books and papers devoted to fixed point computation.
Bibliography
[1] Allgower, E. L., and Georg, K. Numerical Continuation Methods. Springer-Verlag, New York, 1990.

[2] Allgower, E. L., and Georg, K. Simplicial and continuation methods for approximating fixed points and solutions to systems of equations. SIAM Rev., 22: 28–85, 1980.

[3] Banach, S. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fund. Math., 3: 133–181, 1922.

[4] Bland, R. G., Goldfarb, D., and Todd, M. J. The ellipsoid method: A survey. Operations Res., 6: 1039–1090, 1981.

[5] Bunch, J. R., Nielsen, C. P., and Sorensen, D. C. Rank-one modification of the symmetric eigenproblem. Numer. Math., 31–48, 1978.
[6] Chou, D. On the optimality of Krylov information. J. Complexity, 3: 26–40, 1987.

[7] Eaves, B. C. Homotopies for computation of fixed points. Math. Programming, 3: 1–22, 1972.

[8] Eaves, B. C., and Saigal, R. Homotopies for computation of fixed points on unbounded regions. Math. Programming, 225–237, 1972.

[9] Garcia, C. B., and Zangwill, W. I. Pathways to Solution, Fixed Point, and Equilibria. Prentice Hall, Englewood Cliffs, NJ, 1981.

[10] Gill, P. E., Murray, W., and Saunders, M. A. Methods for computing and modifying the LDV factors of a matrix. Math. Comput., 132: 1051–1077, 1975.

[11] Golub, G. H., and Van Loan, C. Matrix Computations. Johns Hopkins Press, Baltimore, MD, 1989.

[12] Grunbaum, B. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific J. Math., 10: 1257–1261, 1960.

[13] Howard, R. A. Dynamic Programming and Markov Processes. MIT Press, 1960.

[14] Howland, J. L., and Vaillancourt, R. Attractive cycle in the iteration of meromorphic functions. Numer. Math., 46: 323–337, 1985.

[15] Huang, Z., Khachiyan, L., and Sikorski, K. Approximating fixed points of weakly contractive mappings. J. Complexity, 15: 200–213, 1999.

[16] Juhnke, F. Inradius und Dicke konvexer Körper aus optimierungstheoretischer Sicht. Beiträge Algebra Geom., 27: 13–20, 1988.

[17] Khachiyan, L. G. Polynomial algorithm in linear programming. Soviet Math. Doklady, 20: 191–194, 1979.

[18] Kojima, M., and Yamamoto, Y. Variable dimension algorithms: basic theory, interpretation, and extensions of some existing methods. Math. Programming, 24: 177–215, 1982.
[19] Kojima, M., and Yamamoto, Y. A unified approach to the implementation of several restart fixed point algorithms and a new variable dimension algorithm. Math. Programming, 28: 288–328, 1984.

[20] Konig, H., and Pallaschke, D. On Khachian's algorithm and minimal ellipsoids. Numer. Math., 211–223, 1981.

[21] Kuhn, H. W. Approximate search for fixed points. In Computing Methods in Optimization Problems 2 (L. A. Zadeh, L. W. Neustadt, and A. V. Balakrishnan, Eds.). Academic Press, New York, 199–211, 1969.

[22] van der Laan, G., and Talman, A. J. J., Eds. The Computation and Modeling of Economic Equilibria. North-Holland, Amsterdam, 1987.

[23] Levin, A. On an algorithm for the minimization of convex functions [in Russian]. DAN USSR, 160 (6): 1244–1247, 1965.

[24] Merrill, O. H. Applications and Extensions of an Algorithm that Computes Fixed Points of a Certain Upper Semi-continuous Point to Set Mapping. Ph.D. dissertation, University of Michigan, Ann Arbor, 1972.

[25] Merrill, O. H. A summary of techniques for computing fixed points of continuous mappings. In Mathematical Topics in Economic Theory and Computation (R. H. Day and S. M. Robinson, Eds.). SIAM, Philadelphia, 130–149, 1972.

[26] Mitiagin, B. S. Two inequalities for volumes of convex bodies. Math. Notes, 5: 61–65, 1968.

[27] Nemirovsky, A. S. On optimality of Krylov's information when solving linear operator equations. J. Complexity, 7: 121–130, 1991.

[28] Nemirovsky, A. S., and Yudin, D. B. Informational complexity and effective methods of solution of convex extremal problems [in Russian]. Econom. and Math. Methods, 12 (1): 357–369, 1976.

[29] Nemirovsky, A. S., and Yudin, D. B. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
[30] Newman, D. J. Location of maximum on unimodal surfaces. J. ACM, 12: 395–398, 1965.

[31] Ortega, J. M., and Rheinboldt, W. C. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.

[32] Scarf, H. The approximation of fixed points of a continuous mapping. SIAM J. Appl. Math., 15: 1328–1343, 1967.

[33] Scarf, H., and Hansen, T. The Computation of Economic Equilibria. Yale University Press, New Haven, CT, 1973.

[34] Shor, N. Z. Cut-off method with space extension in convex programming problems. Cybernetics, 13: 94–96, 1977.

[35] Shor, N. Z. Minimization Methods for Nondifferentiable Functions. Springer-Verlag, Berlin, 1985.

[36] Sikorski, K. Bisection is optimal. Numer. Math., 40: 111–117, 1982.

[37] Sikorski, K. Optimal solution of nonlinear equations. J. Complexity, 1: 197–209, 1985.

[38] Sikorski, K. Fast algorithms for the computation of fixed points. In Robustness in Identification and Control (M. Milanese, R. Tempo, and A. Vicino, Eds.). Plenum Press, 49–59, 1989.

[39] Sikorski, K., and Wozniakowski, H. Complexity of fixed points I. J. Complexity, 3: 388–405, 1987.

[40] Sikorski, K., Tsay, Ch. W., and Wozniakowski, H. An ellipsoid algorithm for the computation of fixed points. J. Complexity, 9: 181–200, 1993.

[41] Smale, S. On the efficiency of algorithms of analysis. Bull. Amer. Math. Soc., 13: 87–121, 1985.

[42] Swaminathan, S., Ed. Fixed Point Theory and Its Applications. Academic Press, New York, 1976.

[43] Tarasov, S. P., Khachiyan, L. G., and Erlikh, I. I. The method of inscribed ellipsoids. Soviet Math. Dokl., 37: 226–230, 1988.
[44] Todd, M. J. The Computation of Fixed Points and Applications. Springer Lecture Notes in Economics and Mathematical Systems, 124, Springer-Verlag, New York, 1976.

[45] Traub, J. F. Iterative Methods for the Solution of Equations. Prentice Hall, Englewood Cliffs, NJ, 1964.

[46] Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information-Based Complexity. Academic Press, New York, 1988.

[47] Traub, J. F., and Wozniakowski, H. On the optimal solution of large linear systems. J. Assoc. Comput. Mach., 31: 545–559, 1984.

[48] Tsay, C. W. Fixed Point Computation and Parallel Algorithms for Solving Wave Equations. Ph.D. dissertation, University of Utah, 1994.

[49] Wasilkowski, G. W. Information of varying cardinality. J. Complexity, 2: 204–208, 1986.

[50] Williamson, T. E., Jr. Geometric estimation of fixed points of Lipschitzian mappings. J. Math. Anal. Appl., 62: 600–609, 1978.

[51] Wozniakowski, H. Information-based complexity. Ann. Rev. Comput. Sci., 1: 319–380, 1986.

[52] Xu, Z. B., and Shi, X. Z. A comparison of point and ball iterations in the contractive mapping case. Computing, 49: 75–85, 1992.
Chapter 4

Fixed Points — Noncontractive Functions

In this chapter we consider the approximation of fixed points of noncontractive functions with respect to the absolute error criterion. In this case the functions may have multiple fixed points and/or whole manifolds of fixed points. We analyze methods based on sequential function evaluations as information. The simple iteration usually does not converge in this case, and the problem becomes much more difficult to solve. We prove that even in the two-dimensional case the problem has infinite worst case complexity. This means that no methods exist that solve the problem with an arbitrarily small error tolerance for some "bad" functions. In the univariate case the problem is solvable, and a bisection envelope method is optimal. These results are in contrast with the solution under the residual error criterion. The problem then becomes solvable, although with exponential complexity, as outlined in the annotations. Therefore, simplicial and/or homotopy continuation methods and all methods based on function evaluations exhibit exponential worst case cost for solving the problem in the residual sense. These results indicate the need for average case analysis, since for many test functions the existing algorithms computed ε-approximations with cost polynomial in 1/ε. We define C_d as the class of Lipschitz functions that transform
the d-dimensional unit cube into itself,

C_d = {f : [0, 1]^d → [0, 1]^d : ||f(x) − f(y)|| ≤ q ||x − y|| for all x, y ∈ [0, 1]^d},

where q > 1. Note that due to Brouwer's fixed point theorem, any function in C_d has a fixed point in [0, 1]^d. The solution operator is now defined as S(f) = α(f), a fixed point of f, such that the computed approximation x satisfies ||x − α(f)|| ≤ ε. This means that we solve the problem under the absolute error criterion. We assume the information to be arbitrary sequential function evaluations. We establish the following complexity results. In the univariate case, we prove that the minimal cardinality number m(ε, q) is

m(ε, q) = ⌈log(1/ε) − 1⌉,  (172.1)

and in the multivariate case

m(ε, q) = +∞,  (172.2)

whenever ε < 0.5 and d ≥ 2. These imply that a bisection-envelope method is optimal in the univariate case and that the multivariate problem is unsolvable in the worst case. Throughout the chapter, ε is assumed to be less than 0.5, the norm is assumed to be the second norm, and all logarithms are assumed to have base 2.

Contents
In section 4.1.1 we prove Eq. (172.1). In section 4.1.2 we demonstrate that the FP-EN method is optimal. In section 4.2 we consider the multivariate problem. Namely, in section 4.2.1 we construct the worst case functions g and h for the class C_2 and show that both g and h belong to C_2. This completes the proof of Eq. (172.2) for the class C_2. We also establish Eq. (172.2) for arbitrary dimension d. For the residual error criterion we exhibit (in the annotations) a parallel grid search that is an almost optimal complexity method. The complexity in this case grows exponentially with 1/ε.
4.1 Univariate Problems
We show that the bisection and bisection-envelope based algorithms enjoy optimal complexity bounds in the one-dimensional case. To this end, for any given information N with card(N) < ⌈log(1/ε) − 1⌉, we construct two functions g, h ∈ C_1 such that N(g) = N(h) and such that the distance of their fixed points α(g) and α(h) is greater than 2ε, |α(g) − α(h)| > 2ε. This yields that no algorithm using such N can compute ε-approximations of fixed points for the class C_1 in the worst case. Since the bisection method needs exactly b(ε) = ⌈log(1/ε) − 1⌉ function evaluations to compute ε-approximations, it minimizes the number of function evaluations. The minimal cardinality number for the class C_1 is then given by m(ε, q) = ⌈log(1/ε) − 1⌉. In addition, we show that the FP-EN method, when adapted for the class C_1, is also optimal.

4.1.1 Minimal Cardinality Number
We suppose that N(f) = [/(^i),..., /(^fc)] is any information consisting of sequential evaluations of function values, and that card(W) = k < |"log(l/£) — 1]. This assumption implies that 2~k > 2e. Therefore, there exists a 5, 2~k > 8 > 0 such that 2~k - 26/(I + q) > 2e. In what follows we construct two functions g, h S C\ such that they have the same information, N(g) = N(h), and the distance of their fixed points is \a(g) — a(h)\ > 2e. For the sequentially chosen evaluation points a;,-, we use the following procedure to assign y8- = g(xi) = h(xi) for 1 < i < fe, where g and h are defined as follows.
[Assignment procedure (flowchart): a loop over i = 1, ..., k with a conditional assignment of the value y_i; the details are garbled in the extraction.]
CHAPTER 4. FIXED POINTS - NONCONTRACTIVE
We note that after this assignment, r − l ≥ 2^{−k}, each x_i is outside of the interval (l, r), and
We now define g [definition omitted] and h [definition omitted], where a(g) and a(h) are the unique fixed points of g and h, respectively (see Figure 175.1). We obtain |a(g) − a(h)| > 2ε.
It hence remains to show that g, h ∈ C_1, i.e., (i) they are q-Lipschitz and (ii) they map [0,1] into itself. The condition (i) is satisfied since they are piecewise linear functions and are Lipschitz with the constant q in each subinterval. We now show that 0 ≤ g(x) ≤ 1 for all 0 ≤ x ≤ 1. We check this for x ∈ [0, a(h)] and x ∈ [a(h), 1] separately. To this end, we observe that 0 < l < r < 1 and r − l ≥ 2^{−k} > δ. Then, for 0 ≤ x ≤ a(h), we have δ ≤ g(x) ≤ l + δ < r, and for a(h) ≤ x ≤ 1, we have l < r − δ ≤ g(x) ≤ x. That is, 0 ≤ g(x) ≤ 1. The proof for the function h is similar and omitted here. Consequently, ⌈log(1/ε) − 1⌉ is a lower bound on the number of function evaluations for any method computing ε-approximations in the class C_1. Since the bisection method needs exactly that many function evaluations, we obtain
Figure 175.1: The graphs of g(x) and h(x).

Theorem 175.1 The minimal cardinality number in the class C_1 is

m(ε, q) = ⌈log(1/ε) − 1⌉,

and the bisection method minimizes the number of f evaluations.
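As a concrete illustration of Theorem 175.1, bisection applied to the residual f(x) − x uses exactly ⌈log(1/ε) − 1⌉ evaluations. The following is a hedged sketch; the function name and the stopping rule are ours, not the book's pseudocode:

```python
import math

def bisection_fixed_point(f, eps):
    # bisection on the residual g(x) = f(x) - x for f : [0,1] -> [0,1];
    # g(0) >= 0 >= g(1), so the sign of g at the midpoint tells which
    # half of the current interval still contains a fixed point
    a, b = 0.0, 1.0
    n = math.ceil(math.log2(1.0 / eps) - 1.0)  # b(eps) evaluations
    for _ in range(n):
        x = 0.5 * (a + b)
        if f(x) >= x:
            a = x  # g(x) >= 0: a fixed point lies in the right half
        else:
            b = x  # g(x) < 0: a fixed point lies in the left half
    return 0.5 * (a + b)  # within eps of some fixed point
```

After n halvings the interval of uncertainty has length 2^{−n} ≤ 2ε, so its midpoint is within ε of a fixed point.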
4.1.2
The FPE-A Method
We now adapt the FPE-A method for the class C_1. We recall that the idea of the FPE-A method is to iteratively compute function values at the points that minimize the length of the next intervals of uncertainty in the worst case. For the class C_1, the evaluation point in each iteration is still the midpoint of the current interval of uncertainty. Following the formulas in chapter 2, we obtain the FPE-A method presented in Figure 176.1. The shrinking ratio of the lengths of the uncertainty intervals, (b_i − a_i)/(b_{i−1} − a_{i−1}), determines the efficiency of the FPE-A method. This ratio is analyzed in the following lemma.
[Pseudocode of the FPE-A method (Figure 176.1): a while loop that evaluates f at the midpoint of the current interval of uncertainty, returns it immediately if it is a fixed point, and otherwise shrinks the interval, finally returning the midpoint of the last interval as an ε-approximation; the details are garbled in the extraction.]

Figure 176.1: The FPE-A method for the class C_1.

Lemma 176.1 We let [a_i, b_i] be the ith interval of uncertainty generated by the FPE-A method. Then

(176.1)    b_i − a_i = (1/2)(1 − η_i)(b_{i−1} − a_{i−1}),

where η_i = 2δ_i/((1 + q)(b_{i−1} − a_{i−1})) and δ_i = |f(x_i) − x_i| for the midpoint x_i of [a_{i−1}, b_{i−1}].
Proof We first assume that y_i > x_i, where y_i = f(x_i). [The displayed computations of this case are lost in the extraction.] Then
where η_i = 2δ_i/((1 + q)(b_{i−1} − a_{i−1})) < 1. This proves Eq. (176.1) for y_i > x_i. The proof for the case y_i < x_i is similar and is omitted here. ∎

The above lemma yields
b_k − a_k = 2^{−k} ∏_{i=1}^{k} (1 − η_i) ≤ 2^{−k},

since [a_0, b_0] = [0, 1]. We note that η_j can be arbitrarily close to 0 (if f(x_j) is arbitrarily close to x_j), and therefore b_i − a_i can be arbitrarily close to 2^{−i}. Consequently, the FPE-A method needs m(ε, q) function evaluations in the worst case to compute ε-approximations in the class C_1. This means that FPE-A also minimizes the number of function evaluations.

4.1.3
Exercises
22. Verify that the function h defined on page 174 belongs to the class C_1.
23. Complete the proof of Lemma 176.1 by analyzing the case y_i < x_i.
24. (*) Carry out the analysis of different error criteria in the class C_1.
25. (**) Analyze general sequential information in the class C_1.
26. (**) Develop average case analysis and optimal methods in the class C_1.
4.2
Multivariate Problems
Several important application problems can be modeled as multivariate fixed point problems with noncontractive transformations. These include economic modeling and engineering applications, including the solution of nonlinear partial differential and integral equations as well as nonlinear operator equations. Various types of continuation methods were developed to solve general problems of this nature. These include "globally" convergent, simplicial, as well as Newton-type continuation algorithms. All of these algorithms follow homotopy curves leading to the fixed points. It was observed that in many cases these algorithms were not able to compute close approximations to the solutions and/or failed to produce any reasonable approximations. This was due to erratic behavior of the homotopy curves near singular points, resulting in branching and turning. In this section we show that, in fact, all methods based on sequential function evaluations must eventually fail to produce close approximations (in the absolute sense) to fixed points for some "difficult" functions. Therefore the problem with the absolute error criterion is unsolvable in the worst case. In contrast, approximations to fixed points in the residual sense can be computed with arbitrarily small error tolerance, although the cost of computing them can be exponentially high: in the worst case, any such method requires a number of iterations (function evaluations) proportional to (1/ε)^d. Therefore the problem is intractable for small ε in the worst case. In section 4.2.1 we show that the complexity is infinite in the case of the absolute error criterion. In the annotations we review the result of Hirsch, Papadimitriou, and Vavasis, showing that the complexity is finite but exponential in the case of the residual error criterion.

4.2.1
Absolute Error Criterion
In this section we consider the class C_2 and show that the minimal cardinality number in the class of all methods based on sequential function evaluations is infinite, m(ε, q) = +∞, even in the case of nonexpansive functions with q = 1. This indicates that there exist no methods that guarantee computing an ε-approximation for every function in this class with ε < 0.5. To this end, for any given information
N, we construct two functions g, h ∈ C_2, and then extend the construction to any d ≥ 2 and q ≥ 1, to establish Eq. (172.2) in the general case. We formulate this result in the following theorem.

Theorem 179.1 The minimal cardinality number m(ε, 1) is infinite whenever ε < 0.5.

Proof For any information N we let k = card(N) ≥ 1, N(f) = [f(x_1, y_1), ..., f(x_k, y_k)], where (x_i, y_i) are sequentially generated evaluation points. We split the proof into the following three steps:
1. Assign function values at the evaluation points (x_i, y_i) that are generated by N.
2. Define functions g and h such that N(g) = N(h) and the distance of their fixed points is ‖a(g) − a(h)‖ = 1.
3. Prove that both g and h belong to the class C_2.
We first discuss steps 1 and 2 of the proof.

Step 1. We choose a fixed δ < (1/2)^{k+1}, and we assign (a_i, b_i) = g(x_i, y_i) = h(x_i, y_i), for 1 ≤ i ≤ k, according to the following flowchart:
[Flowchart: a loop over i = 1, ..., k with a conditional assignment of (a_i, b_i); the details are garbled in the extraction.]
After this assignment, we have 0 < l < m = (l + r)/2 < r < 1 and r − l ≥ (1/2)^k. We let c = (r − l)/2. Then we have c ≥ (1/2)^{k+1} > δ and
We also note that all (x_i, y_i) are outside of the set (l, r) × [0,1], and that
Step 2. We divide the domain [0,1]^2 into six subdomains:
as illustrated in Figure 181.1(a). We define g(x, y) = (g_1(x, y), g_2(x, y)) by
where ξ is chosen to be the positive solution of the quadratic equation (185.1). (The purpose of such a particular choice for ξ is to assure that g and h are nonexpanding in both D_3 and D_4.) The function h(x, y) = (h_1(x, y), h_2(x, y)) is given by
We illustrate the functions g_1(x, y), g_2(x, y), and h_2(x, y) in Figure 181.1(b)-(d), respectively.
Figure 181.1: Illustration of the functions g(x, y) and h(x, y). (a) Partition of the domain [0,1]^2. (b) Graph of g_1(x, y) = h_1(x, y). (c) Graph of g_2(x, y). (d) Graph of h_2(x, y).
According to Eqs. (180.2)-(180.4), we get g(x_i, y_i) = h(x_i, y_i) = (a_i, b_i) for all 1 ≤ i ≤ k. This means that N(g) = N(h). By solving the equation g(x, y) = (x, y), we determine the unique fixed point of g, a(g) = (m, 1). Similarly, the unique fixed point of h equals a(h) = (m, 0). We easily conclude that the distance ‖a(g) − a(h)‖ = 1. We now proceed to verify the claims of step 3 of the proof.

Step 3. To complete the proof, we must show that both functions g and h belong to the class C_2. We first prove this statement for the function g and then apply that fact to show that h ∈ C_2. First, we observe that if (x, y) ∈ D_4 then there exists (x′, y) = (2m − x, y) ∈ D_3 such that
We apply this fact to simplify the proof. The following two lemmas imply that g ∈ C_2.

Lemma 182.1 We let the function g be defined as in Eq. (180.3). Then g maps [0,1]^2 into [0,1]^2.
Proof We show that
(a) 0 ≤ g_1(x, y) ≤ 1
and
(b) 0 ≤ g_2(x, y) ≤ 1
for any (x, y) ∈ [0,1]^2. The condition (a) is satisfied since we have:
• For l ≤ x ≤ r, g_1(x, y) monotonically increases from l + δ to r − δ for any y. Therefore
which along with Eq. (180.1) guarantees 0 ≤ g_1(x, y) ≤ 1.
Now we show that (b) is also satisfied. Since 0 ≤ g_2(x, y) = y ≤ 1 for any point (x, y) in D_1, D_2, D_5, or D_6, it remains to prove (b) for (x, y) ∈ D_3 or (x, y) ∈ D_4. If (x, y) ∈ D_3 then g_2(x, y) can be written as
We observe that for a fixed x, g_2 monotonically increases from (x − l)ξ/c to (x − l)/c as y changes from 0 to (x − l)/c. Since l + yc ≤ x ≤ m, we have
and
Therefore, 0 ≤ yξ ≤ g_2(x, y) ≤ 1. If (x, y) ∈ D_4 then, by Eq. (182.1), we get g(x, y) = (2m − g_1(x′, y), g_2(x′, y)), where (x′, y) = (2m − x, y) belongs to D_3. Since we proved that 0 ≤ g_2(x′, y) ≤ 1, we get 0 ≤ g_2(x, y) ≤ 1. Moreover, the corresponding bound for g_1(x, y) = 2m − g_1(x′, y) follows
by substituting m = (l + r)/2, and by (180.1). We have therefore completed the proof of Lemma 182.1. ∎

The proof that g is nonexpanding in [0,1]^2 can be simplified by showing that g is nonexpanding in each of the domains D_i, i = 1, ..., 6. To show this, we suppose that P ∈ D_1, Q ∈ D_2, and that g is nonexpanding in each D_i. We let R be the intersection point of the segment PQ and the line x = l. Since R ∈ D_1 ∩ D_2, we have
and
Then, by the triangle inequality, we get
This means that g is also nonexpanding in D_1 ∪ D_2. By repeating the same argument we conclude that g is nonexpanding in D_1 ∪ D_2 ∪ D_3, and finally in ∪_{i=1}^{6} D_i = [0,1]^2. Consequently, we only need to prove the following lemma to show that g is nonexpanding in [0,1]^2.

Lemma 184.1 We suppose that both P = (p_1, p_2) and Q = (q_1, q_2) belong to one of the subdomains D_1, D_2, ..., D_6. Then ‖g(P) − g(Q)‖ ≤ ‖P − Q‖.
Proof We assume without loss of generality that p_2 ≥ q_2, and we let Δx = p_1 − q_1 and Δy = p_2 − q_2 ≥ 0. We note that
Then
Therefore the remaining cases to analyze are: (i) P, Q ∈ D_3, and (ii) P, Q ∈ D_4.
Case (i). In this case we have
4.2. MULTIVARIATE
PROBLEMS
185
We consider two subcases: Δx ≤ 0 and Δx > 0. If Δx ≤ 0 then the term
due to 0 < ξ < 1 and Δy ≥ 0. Moreover, the other two terms on the right-hand side of Eq. (184.1) are already nonnegative, due to 0 < ξ < 1 and 0 < c < 1. We therefore obtain
which means that ‖g(P) − g(Q)‖ ≤ ‖P − Q‖. If Δx > 0, then the second component of g(P) − g(Q), (1 − ξ)Δy + ξΔx/c, is positive. If this component is not greater than Δy then ‖g(P) − g(Q)‖ ≤ ‖(Δx, Δy)‖, since the first component of g(P) − g(Q) is positive and less than Δx. Thus, we only need to consider (1 − ξ)Δy + ξΔx/c > Δy, which is equivalent to Δy < Δx/c. We obtain
Since ξ was chosen to be the positive root of the quadratic equation
we obtain
and hence
Case (ii). We assume that both P and Q belong to D_4. By Eq. (182.1), there exist P′ = (2m − p_1, p_2) and Q′ = (2m − q_1, q_2) in D_3 such that
and
Hence we obtain
Since g was already proved to be nonexpanding in D_3, we get
This inequality implies that ‖g(P) − g(Q)‖ ≤ ‖P − Q‖. ∎
We now apply the fact that g ∈ C_2 to show that h ∈ C_2.

Lemma 186.1 We let the function h be defined as in Eq. (180.4). Then h ∈ C_2 with q = 1. This means that h maps [0,1]^2 into [0,1]^2 and is nonexpanding in [0,1]^2.

Proof We select any (x, y) ∈ [0,1]^2, and note that the inequality 0 ≤ h_1(x, y) = g_1(x, y) ≤ 1 was already proved. We also note that (x, 1 − y) ∈ [0,1]^2 yields 0 ≤ h_2(x, y) = g_2(x, 1 − y) ≤ 1. Accordingly, h maps [0,1]^2 into itself. We now suppose that P = (p_1, p_2) and Q = (q_1, q_2).
Since g_1(x, y) is independent of the variable y, we get
and
Using the fact that g is nonexpanding in [0,1]^2, we obtain
which shows that h is nonexpanding in [0,1]^2 and completes the proof of Lemma 186.1. ∎

In summary, we have shown the existence of two functions g and h in C_2 such that N(g) = N(h) and ‖a(g) − a(h)‖ = 1. Since this holds for any N with card(N) ≥ 1, we have indeed proved Theorem 179.1.

Complexity in the class C_d for d > 2

The above negative complexity result also holds for any class C_d with d > 2. More precisely, since the class C_2 with q = 1 is a subset of the class C_2 with q ≥ 1, the infinite complexity is attained already for the class C_2 with q = 1. We now consider any d > 2. We observe that any f ∈ C_2 can be extended to f̃ ∈ C_d by letting f̃ act as f on the first two coordinates and as the identity on the remaining ones.
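The extension from C_2 to C_d can be written down directly; a minimal sketch (the helper name extend is ours):

```python
def extend(f2, d):
    # extend f2 : [0,1]^2 -> [0,1]^2 to [0,1]^d by acting as f2 on the
    # first two coordinates and as the identity on the rest; in the sup
    # norm this preserves the Lipschitz constant, and the images of two
    # fixed points of f2 stay the same distance apart
    def fd(x):
        assert len(x) == d
        a, b = f2((x[0], x[1]))
        return (a, b) + tuple(x[2:])
    return fd
```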
Following this, the constructions of g and h for the class C_2 can be easily adapted to the class C_d. As a result, the following theorem holds.

Theorem 187.1 The minimal cardinality number m(ε, q) = +∞ for any dimension d ≥ 2, whenever ε < 0.5.

4.2.2
Exercises
27. (**) Establish the infinite complexity result mentioned above in the class of general linear sequential information.
28. (**) Analyze different error criteria for noncontractive multivariate functions.
29. (**) Analyze possible restrictions of the class of functions which would guarantee positive results, i.e., existence of methods with polynomial (or smaller) complexity.
4.3
Annotations

4.3.1
General Comments
The FPE-A and the bisection methods have the same worst case cost for univariate noncontractive functions. The bisection method has the advantage that knowledge of the Lipschitz factor q is not required a priori. Nevertheless, with the additional knowledge of q, the FPE-A algorithm can be more efficient than bisection for some functions. This is evident by noting that after i iterations the length of the interval of uncertainty is always less than 2^{−i}, due to the nonzero values of the η_j's in Eq. (177.1). In fact, it can be shown that the FPE-A method has the smallest local error for every function in C_1, since it implements a central algorithm (as defined in chapter 1; see also [8]). The infinite complexity for the class C_d indicates that in order to obtain finite complexity results one should impose further restrictions on the class of functions. Indeed, this approach was taken by Natarajan in [4]. He defined a condition number K_ε of the problem. For the class of functions with K_ε bounded by 1/ε, his algorithm computes an ε-approximation with at most O(log(1/ε)(2K_ε A)^d) function evaluations, where A, L − 1 ≤ A ≤ L + 1, is the Lipschitz constant of the function f(x) − x. This means that in this case the complexity is finite, with an upper bound of the above form. We believe that the negative result for the class C_d also holds for general information consisting of sequential evaluations of arbitrary linear functionals. A proof technique to establish this could follow the proofs in Boult and Sikorski [1].

4.3.2
Residual Error Criterion
Hirsch, Papadimitriou, and Vavasis show in [2] that the fixed point problem has exponential complexity in the case of the residual error
criterion in the class of functions satisfying a Lipschitz condition with a constant larger than one. Namely, they assume the following class of functions
They consider the residual error criterion, i.e., for every f ∈ F they want to compute a point x = x(f) ∈ [0,1]^d such that ‖f(x) − x‖_∞ ≤ ε. They show that any method based on sequential function evaluations must use in the worst case at least n = (k(1/ε − 10)M)^{d−2} such evaluations, for d ≥ 3 and k ≥ 10^{−5}, where M is the upper bound on the Lipschitz constant of the functions g(x) = f(x) − x, and M ≤ L + 1. The upper bound on the complexity can be easily established by a simple uniform grid search method. Namely, we subdivide the unit cube [0,1]^d uniformly into ⌈j⌉^d subcubes of diameter diam = 1/⌈j⌉. Then we compute function values at the centers of those subcubes. Since for every f ∈ F there exists a cube C = C(x_c, r) in this subdivision, with center x_c and radius r = diam/2, containing a fixed point a = a(f) of the function f, we get
Therefore, the problem can be solved using m = ⌈j⌉^d function evaluations in the worst case. As a conclusion, we obtain:

Corollary 189.1 The minimal cardinality number m(ε, L) for solving this problem in the residual sense is given by
for ε → 0, where d − 2 ≤ p ≤ d, L > 1, and the dimension d ≥ 3. Hirsch et al. also show that in the two-dimensional case d = 2 a lower bound has the form (1/ε − 2.2)M/88 − 7. They outline a method, based on the computation of winding numbers, with the number of function evaluations bounded by 3M/ε in the worst case. Therefore, we have m(ε) = O(L/ε) in the two-dimensional case. Quite different results hold for the residual error criterion with the second norm in the case when the Lipschitz constant L = 1.
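The uniform grid search behind the upper bound can be sketched directly. The grid size n = ⌈(L + 1)/(2ε)⌉ below is our reading of the covering argument (g = f − x is (L + 1)-Lipschitz, and some cube center lies within 1/(2n) of a fixed point in the sup norm); it is not the book's code:

```python
import itertools
import math

def grid_search_fixed_point(f, d, eps, L):
    # residual eps-approximation of a fixed point of f : [0,1]^d -> [0,1]^d
    # (sup norm, Lipschitz constant L): evaluate f at the centers of n^d
    # subcubes of side 1/n; the cube containing a true fixed point a has
    # ||f(x_c) - x_c|| <= (L + 1) ||x_c - a|| <= (L + 1) / (2 n) <= eps
    n = math.ceil((L + 1.0) / (2.0 * eps))
    centers = [(2 * i + 1) / (2.0 * n) for i in range(n)]
    best, best_res = None, float("inf")
    for x in itertools.product(centers, repeat=d):
        fx = f(x)
        res = max(abs(fx[k] - x[k]) for k in range(d))
        if res < best_res:
            best, best_res = x, res
        if res <= eps:
            break  # early exit once the guarantee is met
    return best, best_res
```

The cost n^d = ⌈(L + 1)/(2ε)⌉^d exhibits exactly the exponential growth in 1/ε discussed above.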
Namely, our analysis in [3] implies that the upper bound on the minimal cardinality number is m(ε) = O(d log(1/ε)). This upper bound is realized by the interior ellipsoid algorithm. We formulate this in

Corollary 189.2 The minimal cardinality number m(ε, L) for the case of the second norm ‖·‖_2 and L = 1 is
It is interesting to note that in the case of a Lipschitz constant L greater than one and the infinity norm the complexity is exponential in 1/ε, whereas for L = 1 and the second norm the complexity is only logarithmic in 1/ε.

4.3.3
Specific Comments
Section 4.1 presents the material from [5] and from [6]. Section 4.2 follows our work presented in [7].
Bibliography
[1] Boult, T., and Sikorski, K. Can we approximate zeros of functions with nonzero topological degree? J. Complexity, 4: 317-329, 1987.
[2] Hirsch, M. D., Papadimitriou, C., and Vavasis, S. Exponential lower bounds for finding Brouwer fixed points. J. Complexity, 5: 379-416, 1989.
[3] Huang, Z., Khachiyan, L., and Sikorski, K. Approximating fixed points of weakly contractive mappings. J. Complexity, 15: 200-213, 1999.
[4] Natarajan, B. K. Condition-sensitive computation of approximate fixed points. J. Complexity, 9: 406-411, 1993.
[5] Sikorski, K. Fast algorithms for the computation of fixed points. In Robustness in Identification and Control (M. Milanese, R. Tempo, and A. Vicino, Eds.). Plenum Press, 49-59, 1989.
[6] Tsay, C. W. Fixed Point Computation and Parallel Algorithms for Solving Wave Equations. Ph.D. dissertation, University of Utah, 1994.
[7] Tsay, C. W., and Sikorski, K. A note on the complexity of fixed point computation for noncontractive maps. In Complexity in Numerical Optimization (P. Pardalos, Ed.). World Scientific, Singapore, 448-461, 1993.
[8] Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information-Based Complexity. Academic Press, New York, 1988.
Chapter 5
Topological Degree Computation

In this chapter we address the problem of computing the topological degree of Lipschitz functions. From the knowledge of the topological degree one may ascertain whether there exists a zero of a function inside the domain, knowledge that is practically and theoretically worthwhile. Namely, Kronecker's theorem states that if the topological degree is not zero, then there exists a zero of the function inside the domain. Under more restrictive assumptions one may also derive equivalence statements, i.e., nonzero degree is equivalent to the existence of a zero. By computing a sequence of domains with nonzero degrees and decreasing diameters, one can obtain a region with arbitrarily small diameter that contains at least one zero of the function. Such methods, called generalized bisections, have been implemented and tested by several authors, as described in the annotations to this chapter. These methods have been touted as appropriate when the function is not smooth or cannot be evaluated accurately. For such functions they yield close approximations to roots in many cases for which all other available methods failed (see annotations). The generalized bisection methods based on the degree computation are related to simplicial continuation methods. Their worst case complexity in general classes of functions is unbounded, as the results of section 2.1.2 indicate; however, for the tested functions they did converge. This suggests the need for an average case analysis of such methods. There are numerous applications of the degree computation in
nonlinear analysis. In addition to the existence of roots, the degree computation is used in methods for finding directions proceeding from bifurcation points in the solution of nonlinear functional differential equations, as well as others, as indicated in the annotations. Algorithms proposed for the degree computation were tested on a relatively small number of examples. The authors concluded that the degree of an arbitrary continuous function could be computed. It was observed, however, that the algorithms could require an unbounded number of function evaluations. This is why in our work we restrict the functions to a still relatively large class of functions satisfying the Lipschitz condition with a given constant K. In this class, we can compute the degree for every element using an a priori bounded number of function evaluations. We also derive lower bounds on the number of evaluations needed to solve this problem. The problem of computing the degree has a different flavor than the nonlinear zero finding problems discussed so far. Since the value of the degree is an integer, any ε-approximation with ε < 0.5 gives the exact value of the degree. In the first section we discuss the two-dimensional case. In the following section we obtain lower and upper bounds on the complexity of computing the topological degree in the d-dimensional case.
5.1
Two-Dimensional Lipschitz Functions
In this section we consider the class F of Lipschitz functions with constant K, defined on the unit square C = [0,1]^2, f : C → R^2, such that for every f ∈ F we have ‖f(x)‖_∞ ≥ m > 0 for all x ∈ ∂C, the boundary of C, and such that K/(4m) ≥ 1. We note that if K/(4m) < 0.5 then the functions in F do not have zeros and therefore the degree is zero for every such f. The case 0.5 ≤ K/(4m) < 1 is still open. The information is assumed to be n sequential function evaluations on the boundary of C. This specific information is assumed since the degree is uniquely determined by the function values on the boundary of C. In what follows, we derive an almost optimal complexity algorithm φ* using optimal information N* with n = m* = 4⌈K/(4m)⌉ function evaluations, where m* = m*(ε) for ε < 0.5 is the minimal cardinality number. The information N* is parallel and therefore can easily be implemented on a parallel computer, yielding an optimal linear speedup.
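Before turning to the optimal method, note that deg(f, C, θ) can also be approximated directly as the winding number of f along ∂C. The sketch below is our own illustration (it assumes the boundary sampling is fine enough that f rotates by less than π between consecutive points); it is not the algorithm developed in this section:

```python
import math

def winding_number(f, pts):
    # winding number of the closed curve t -> f(p(t)) around the origin,
    # for a counterclockwise-ordered sampling `pts` of the boundary;
    # f must be nonzero at every sample point
    vals = [f(p) for p in pts]
    total = 0.0
    for i in range(len(vals)):
        x1, y1 = vals[i]
        x2, y2 = vals[(i + 1) % len(vals)]
        # signed angle from f(p_i) to f(p_{i+1})
        total += math.atan2(x1 * y2 - y1 * x2, x1 * x2 + y1 * y2)
    return round(total / (2.0 * math.pi))
```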
5.1.1
Basic Definitions
We let θ = (0, 0) ∈ R^2, let I be the set of all integers, and for positive m and K we define
where ‖·‖_∞ is the infinity norm in R^2. Our problem is defined in terms of the solution operator
where deg(f, C, θ) is the topological degree of f relative to C at θ. We want to compute the exact integer value of the degree for every f ∈ F. The computational methods under consideration are M = (φ, N), where φ is any algorithm
and N is any information consisting of sequential function evaluations, where x_1 ∈ ∂C is given a priori and the points x_j are computed sequentially, for some transformations, j = 2, ..., n. By the minimal cardinality number we now understand the minimal number of evaluations n* for which there exists information with n* evaluations that identifies the degree exactly for any choice of a function f ∈ F. We define a segment [x, y] to be the closed counterclockwise ordered portion of ∂C with endpoints x and y, and by (x, y) we mean the interior of [x, y]. We define a partition P of ∂C to be either the empty set or a set {p_i}_{i=1}^{Q} of counterclockwise ordered points from ∂C such that
We now define an essential concept called sufficient refinement of the boundary of C.
Definition A nonempty partition P forms a sufficient refinement of the boundary of C with respect to the sign of a function f if and only if (p_i, p_{i+1}) ∩ (p_j, p_{j+1}) = ∅ for all i ≠ j, and on each [p_i, p_{i+1}] there exists a component of f, say f_s, that is of constant sign (i.e., ≠ 0) on [p_i, p_{i+1}], and the remaining component of f is nonzero at p_i and p_{i+1}. ∎

Given a sufficient refinement of the boundary of C, the topological degree can be computed by
(196.1)
where j_i is the index of the component of f that has a constant sign on
and
Description of the algorithm. The algorithm calculates the degree using Eq. (196.1). By utilizing the parallel information N* of cardinality 4⌈K/(4m)⌉, which will be defined later, the algorithm conceptually constructs a partition P that forms a sufficient refinement of ∂C with respect to the sign of f, for every f ∈ F. The algorithm calculates the sign of both components of f at each point of the partition P. A flowchart of this algorithm is given in Figure 197.1. We first derive a lower bound on the minimal cardinality number and then define the optimal information N*.

5.1.2
Lower Bound on the Minimal Cardinality Number

We show that the minimal cardinality number n* is greater than n_1 = 4⌈K/(4m)⌉ − 1. Namely, we prove the following lemma.
Figure 197.1: Flowchart of the algorithm to calculate the topological degree in two dimensions.
Lemma 198.1 For any sequential information N with n ≤ n_1 function evaluations, there exist two functions f_1 and f_2 in F such that N(f_1) = N(f_2)
and
|deg(f_1, C, θ)| ≠ |deg(f_2, C, θ)|.
Proof We let M = ⌊K/(2m)⌋, and define
with the notation x_{j±4M} = x_j for all j. Thus the points x_j subdivide the boundary of C into 4M segments [x_j, x_{j+1}] of size at least 2m/K, i.e., ‖x_j − x_{j+1}‖ = 1/M ≥ 2m/K. For a given point c ∈ R^2, we define the functions h_i : R^2 → R, i = 1, 2, by
for any point w ∈ R^2. For given points c_i ∈ R^2, i = 1, 2, we define the functions g_i : R^2 → R by
for every w ∈ R^2. We now define the function f_1 : C → R^2 by
We note that f_1 is a Lipschitz function with the constant K, as the minimum of such functions, and that ‖f_1(w)‖ = m for all w ∈ ∂C. Thus f_1 belongs to F (the first component of f_1 is illustrated in Figure 201.1). We also note that deg(f_1, C, θ) = 0, since f_1 has no zeros at all. We next generate the information induced by the function f_1,
for all f ∈ F, where w_i = w_i(f_1(w_1), ..., f_1(w_{i−1})). Since the number of evaluations n ≤ 4M − 1, we know that there exists a segment on the boundary of C, say [x_{j−1}, x_j], that was not sampled by that information, i.e., w_i ∉ [x_{j−1}, x_j] for all i = 1, ..., n. We assume without loss of generality that j is an odd number and define the function f_2 : C → R^2 by
where ζ and ξ are defined by

Range for j      Value for ζ         Value for ξ
0 < j < M        (−m/K, −m/K)        (m/K, m/K)
M < j < 2M       (m/K, −m/K)         (−m/K, m/K)
2M < j < 3M      (m/K, m/K)          (−m/K, −m/K)
3M < j < 4M      (−m/K, m/K)         (m/K, −m/K)
To verify that f_2 is a member of the class F, we note that f_2 is a Lipschitz function with the constant K, as a minimum of such functions. Moreover, for every w ∈ ∂C\[x_{j−1}, x_{j+1}] we have ‖f_2(w)‖ = m. For every w ∈ [x_{j−1}, x_{j+1}] the choice of the parameters ζ and ξ guarantees that ‖f_2(w)‖ = m, since either g_1 or g_2 is equal to +m or −m on that interval. Now we note that for every w ∈ [x_j, x_{j+1}] we have f_1(w) = (min(m, m − K‖w − x_j‖), max(−m, −3m + K‖w − x_j‖)) (compare Figure 200.1 for the case with 0 < j < M). We also note that f_1(w) = f_2(w) for all w from ∂C such that w ∉ [x_{j−1}, x_{j+1}]. This implies that N(f_1) = N(f_2). The choice of ζ and ξ yields that f_2 has exactly one zero in C (see Figure 200.1) with the Jacobian at that zero equal to K^2 or −K^2, which implies that |deg(f_2, C, θ)| = 1. This completes the proof of Lemma 198.1. ∎

5.1.3
Minimal Cardinality Number
Lemma 198.1 implies that the minimal cardinality number n* is greater than 4⌈K/(4m)⌉ − 1. In Lemma 200.1 below we show that
Figure 200.1: Zero of the function f_2.

n* = 4⌈K/(4m)⌉. This is accomplished by exhibiting parallel information with q = 4⌈K/(4m)⌉ evaluations, such that it generates a sufficient refinement of ∂C with respect to the sign of any f ∈ F, and therefore allows us to calculate the exact value of the degree. On page 196 we outlined the idea of an algorithm to compute the degree. Here, in Lemma 200.1, we exhibit the details of the algorithm and prove that it calculates the degree for any f ∈ F. We let f = (f_1, f_2) ∈ F, M = 4m/K − δ for arbitrarily small δ > 0, and let q = 4⌈1/M⌉. We observe that q = 4⌈K/(4m)⌉. We define x_0 = (0, 0) and x_j, j = 1, ..., q, to be counterclockwise (CCW) oriented points along the boundary of C such that ‖x_j − x_{j+1}‖ = M, j = 0, 1, ..., q, where x_{q+1} = x_1. We next define the information
and formulate

Lemma 200.1 For every f in F the information N*(f) uniquely determines the degree of f, i.e., for all f̄ ∈ F, N*(f) = N*(f̄) implies deg(f) = deg(f̄). This information generates a partition P of ∂C such that either P is empty and the degree of f is zero, or P forms a sufficient refinement of ∂C with respect to the sign of f.
Figure 201.1: First component of the function f_1 for the case m/K = 1/16.

Proof For every function f in F we build a partition P = {p_j}_{j=1}^{Q} of ∂C (where 0 ≤ Q ≤ 2q is the number of points in the partition, and Q depends on f) using only x_i and f(x_i), i = 1, ..., q, such that P is either empty (in which case the degree of f is zero) or P forms a sufficient refinement of ∂C with respect to the sign of f. First we construct the partition P. We assume that f_1(x_1) ≠ 0; if this is not satisfied then we interchange the roles of f_1 and f_2 in the definition below.

Definition of P
We first let i = j = 0 and P = ∅. While i ≤ q + 1, we iteratively define the partition P as follows. We let i = j and redefine j as the smallest index greater than i such that f_1(x_j) ≠ 0. Then we set P as
[The case analysis defining the points added to P in each step, including Eq. (202.4) and the auxiliary point p referenced in Lemma 204.1, is garbled beyond recovery in the extraction.]
We note that we do not always add the same number of points to the partition. Therefore we cannot tell in advance what the total number of points in the partition will be. We let Q be the number of points in the partition, i.e., P = ∪_{i=1}^{Q} {p_i}, where Q = 0 whenever P = ∅. First we show that if P = ∅ then deg(f, C, θ) = 0. We notice that the only information which results in P = ∅ has f_2(x_i) = 0 for all i, 1 ≤ i ≤ q. Thus the knowledge of the information N* is sufficient to determine whether P is empty. By combining the fact that f_2(x_i) = 0 for all i with the definition of the class F, we conclude that there exists no z such that |f_2(z)| ≥ m and ‖z − x_i‖ ≤ m/K. In addition, we know that if |f_2(z)| < m then |f_1(z)| ≥ m. Therefore there exists no z such that f_1(z) = 0 and ‖z − x_i‖ ≤ 2m/K. By recalling that ‖x_i − x_{i+1}‖ = M < 4m/K, we get f_1 ≠ 0 on the boundary of C. We can now use the Poincare-Bohl theorem to show that deg(f, C, θ) = 0. This theorem says that
deg(f, C, θ) = deg(g, C, θ) for all f, g ∈ F whenever the homotopy h(t, z) = t·f(z) + (1 − t)·g(z) is nonzero for every t ∈ [0,1] and every z ∈ ∂C. We recall that f_1 ≠ 0 on the boundary of C, and without loss of generality we assume that f_1(z) > 0 for all z ∈ ∂C. Then we let g(z) = (m, f_2(z)) and h(t, z) = t·f(z) + (1 − t)·g(z). Since f_1 > 0 on the boundary of C, the homotopy h(t, z) is nonzero for every t ∈ [0,1] and every z ∈ ∂C. By combining the fact that deg(g, C, θ) = 0 (since g has no zeros at all) with the Poincare-Bohl theorem, we conclude that if P = ∅ then deg(f, C, θ) = deg(g, C, θ) = 0.

For the remainder of this section we assume that P is not empty. In the above definition of the partition P we defined two types of segments: β = [β_l, β_r] segments where f_1 has a zero, and γ = [γ_l, γ_r] segments where f_1 does not have any zeros. The endpoints of the β and γ segments form our partition. For clarity in the following discussion we index those segments (each type independently) using the CCW order, i.e., we have β_j = [β_{j,l}, β_{j,r}], j = 1, ..., Q_1, and γ_j = [γ_{j,l}, γ_{j,r}], j = 1, ..., Q_2, where Q_1 + Q_2 = Q/2. We observe that the above definition results in the following properties:
1. For every j and any x ∈ β_j, there exists z ∈ β_j such that f_1(z) = 0 and ‖x − z‖ ≤ M/2. (This follows from the definition of β_l and β_r in the partition P, and the property ‖x_j − x_{j+1}‖ = M.)
2. For every i, f_1(x_i) = 0 iff there exists j such that x_i ∈ β_j.
3. [Statement garbled in the extraction.]
4. For every z ∈ ∂C such that f_1(z) = 0 there exists j such that
5. For every i, j, the intersection β_i ∩ γ_j = ∅.
6. For every i, j, x_i ∉ γ_j.
7. For every j, f_1 is not zero on γ_j, and f_2 is not zero on β_j.
8. For every i, x_i ∉ P.
9. A segment [x_j, x_{j+1}] contains no points of P iff
204 CHAPTER 5. TOPOLOGICAL DEGREE COMPUTATION

10. A segment [x_j, x_{j+1}] contains one point of P iff f₁(x_j) = 0 or f₂(x_j) = 0, but not both.

11. A segment [x_j, x_{j+1}] never contains more than two points of P.

12. We let z be a zero of f₁, f₁(z) = 0. By the definition of the class F we have |f₂(z)| ≥ m. Since it would take a distance greater than or equal to m/K from z for the magnitude |f₁(x)| to be at least m, and at least m/K more for f₂ to reach a zero, there exists no point y ∈ ∂C such that f₂(y) = 0 and ||y - z|| ≤ M/2 ≤ 2m/K.

We next show that P forms a sufficient refinement of ∂C with respect to the sign of f, for arbitrary f ∈ F. Specifically we show for each segment [L, R], L, R ∈ P, such that (L, R) ∩ P = ∅, that (204.1)
f₁ ≠ 0 on [L, R] or f₂ ≠ 0 on [L, R],
and (204.2)
f₁ and f₂ are nonzero at both L and R.
That is, by property 4 above we know that f₁ is of constant sign (and not zero) on all segments except when there exists i such that [L, R] = [β_{i,l}, β_{i,r}]. By properties 1 and 12 we have f₂ ≠ 0 on [β_{i,l}, β_{i,r}] for all i, which yields Eq. (204.1). By definition, we know that for all j, f₁(β_{j,l}) ≠ 0 and f₁(β_{j,r}) ≠ 0, which yields that f₁ ≠ 0 at all points of P. The fact that f₂ ≠ 0 at all points of P follows from property 7 and the above argument. Thus Eq. (204.2) holds, which completes the proof of Lemma 200.1. ∎

Now we proceed to formulate

Lemma 204.1 For every function f in F the following formulas hold for the signs of f₁ and f₂ at all points of the partition P from Lemma 200.1 contained in [x_i, x_{i+1}], for all i = 1, ..., q. If [x_i, x_{i+1}] contains exactly one point T of P then Eqs. (204.3)-(205.2) determine sign f_j(T), j = 1, 2,
and if [x_i, x_{i+1}] contains two points L and R of P then for j = 1, 2 Eq. (205.3) determines sign f_j(L) and sign f_j(R),
where p is given in the definition of P on page 201. Before proving the lemma we formulate the following corollary.

Corollary 205.1 We observe that properties 9-11 on page 203 imply that for every f in F we can calculate the number of points of P in each segment [x_i, x_{i+1}]. Thus, by using Eqs. (204.3)-(205.3), we can calculate the degree of every f in F using the formula given by Eq. (196.1).

Proof of Lemma 204.1 In the case of [x_i, x_{i+1}] containing exactly one point of the partition P we distinguish two cases. If f₁(x_i) = 0 and f₁(x_{i+1}) ≠ 0 then for some j, T = β_{j,r}. Equation (205.2) for the function f₁ is straightforward, and for the function f₂ it follows directly from Eq. (202.4) and property 7 on page 203. If f₁ has a zero in [x_i, x_{i+1}] then for some j, L = β_{j,l} and R = β_{j,r}. Therefore the
property 3 yields the formula for f₁. As in property 12, it follows that f₂ is of constant sign within M/2 ≤ 2m/K of a zero of f₁, which combined with Eqs. (202.1) and (202.2) and ||x_i - x_{i+1}|| = M yields the formula for f₂. Now we prove Eq. (205.3). If f₁ is not zero on [x_i, x_{i+1}] then Eq. (205.3) obviously holds for f₁, and since in this case L = γ_{j,l} and R = γ_{j,r} for some j, Eq. (205.3) for f₂ follows from Eq. (202.4) and the definition of the index p. We suppose now that f₁ has a zero in [x_i, x_{i+1}]. Then, as in Eq. (205.1), for some j, L = β_{j,l} and R = β_{j,r}. Thus property 3 on page 203 yields Eq. (205.3) for f₁. We now observe that all zeros of f₂ are in (x_i, L) ∪ (R, x_{i+1}), since L and R are both within M/2 of some zero of f₁. Thus sign f₂(L) = sign f₂(R) = sign f₂(α₁), where α₁ is an arbitrary zero of f₁ in [x_i, x_{i+1}]. To show Eq. (205.3) for f₂, we prove that sign f₂(x_p) = sign f₂(α₁). To this end we first assume that |f₂(x_i)| < m or |f₂(x_{i+1})| < m, and that p = i, i.e., |f₂(x_{i+1})| < |f₂(x_i)|. We suppose on the contrary that sign f₂(x_p) ≠ sign f₂(α₁). Thus there exists a zero α₂ of f₂, α₂ ∈ [x_i, α₁]. Then ||α₂ - x_i|| ≥ |f₂(x_i)|/K, and property 12 on page 203 yields that ||α₂ - α₁|| ≥ 2m/K. It takes a distance m/K from α₁ for |f₁| to become at least m, say at a point z, and a distance m/K - |f₂(x_{i+1})|/K from z for |f₂| to become less than m at x_{i+1}. The sum of these distances is obviously equal to ||x_i - x_{i+1}||. Therefore we have
which is a contradiction. The proof for the case p = i + 1 is essentially the same. We finally assume that |f₂(x_i)| ≥ m and |f₂(x_{i+1})| ≥ m. Then |f₁(x_i)| < m or |f₁(x_{i+1})| < m, since it is impossible for both |f₁| and |f₂| to be at least m at both x_i and x_{i+1} and to have zeros in [x_i, x_{i+1}]. As before we show that sign f₂(x_p) = sign f₂(α₁) for some zero α₁ of f₁. We suppose that the opposite holds and that p = i. Then there exists a zero α₂ of f₂ in [x_i, α₁]. Since in this case |f₁(x_i)| < m
we get
which is a contradiction. The proof for the case p = i + 1 is essentially the same. This finally completes the proof of Lemma 204.1. ∎

5.1.4
Complexity of the Problem
In this section we examine the complexity comp(deg₂), that is, the minimal cost of computing the topological degree for functions in our class. We then study the cost of our method M* = (φ*, N*) and conclude that it is an almost optimal complexity method. As before, we assume that all arithmetic operations (+, -, *, /), the absolute value | · |, each logical operation (and, or, not), and all comparisons (<, ≤, >, ≥, =, ≠) have unit cost, and that each function evaluation costs c units of time, where c may be much larger than unity. Lemma 198.1 implies that any method that solves our problem uses at least n* = 4⌊K/(4m)⌋ function evaluations. Thus a lower bound on the computational complexity is
comp(deg₂) ≥ n*·c + 2n* - 1,

since at least 2n* numbers are involved in the computation and combining them takes at least 2n* - 1 unit cost operations. The cost of our method M* is equal to the cost of computing the information plus the algebraic cost involved in combining these evaluations. The locations of the evaluation points can, for known K and m, be calculated once (precomputed), in which case the cost of computing the information is n*·c. In our implementation the algorithm calculates the location of the next information point based on the location of the current point. This approach results in an information cost of n*·(c + 17). The algebraic complexity of φ* is 31·n* + 12. Thus the cost of the method M*, as implemented, is n*·(c + 48) + 12. Using precomputation we can reduce the cost to n*·(c + 20) + 5. Therefore we conclude that M* is an almost optimal complexity method.
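To make the cost comparison concrete, the following sketch (an illustration, not the monograph's code; the constants 17, 31, 48, and 20 are the implementation figures quoted above) tabulates n* and the resulting bounds:

```python
def cost_estimates(K, m, c):
    # n* = 4 * floor(K/(4m)) function evaluations are necessary (Lemma 198.1);
    # the tiny additive guard compensates for floating-point floor error.
    n_star = 4 * int(K / (4 * m) + 1e-9)
    lower = n_star * c + 2 * n_star - 1     # lower bound on comp(deg2)
    implemented = n_star * (c + 48) + 12    # points generated sequentially
    precomputed = n_star * (c + 20) + 5     # evaluation points precomputed
    return n_star, lower, implemented, precomputed

# Function 1 of the tests below has K = 80 and m = 0.01, giving n* = 8000,
# which matches the 8000 function calls reported in Table 210.1.
print(cost_estimates(80.0, 0.01, c=10))
```

For evaluation cost c much larger than unity, the lower bound and both implementation costs are dominated by the common term n*·c, which is the sense in which M* is almost optimal.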
5.1.5
Numerical Experiments
We report here some numerical tests of the algorithm as implemented on an HP-9000 workstation (based on the Motorola 68020 processor with a floating-point coprocessor). The CPU timing was obtained by averaging the total CPU time of 10,000 successive calls to the algorithm, to reduce the effect of extraneous CPU time that is counted in the system timing figures. The algorithm was extended here to handle arbitrary polygonal domains in R². The extension is straightforward and is based on selecting points on the boundary of the polygon that are separated by the distance 1/⌊K/(2m)⌋ in the infinity norm. The following test functions were selected.

• Function 1. F₁ = (sin(11.5πx) + 0.01, cos(11πy)). This function has 121 zeros within the unit square, the minimum value of its infinity norm on the boundary is m = 0.01, and the Lipschitz constant is K = 2 × 11.5 × π < 80.0. The value of the degree is 1.

• Function 2. F₂ = (x² + y² - 0.5, 2xy - 0.5). This function has one zero of multiplicity 2 within the unit square, the minimum norm on the boundary is m = 0.5, and the Lipschitz constant is K = 4. The value of the degree is 0.

• Function 3. F₃ = (x² - 4y, y² - 2x + 4y), with the domain [-2, 2] × [-1/4, 1/4]. This function has exactly one zero in this domain. The minimum norm on the boundary is m > 0.4375 and the Lipschitz constant is K = 8. The degree is -1.

• Function 4. F₄ = (sin(1.5πx) + 0.01, -cos(πy)). This function has exactly one zero within the unit square. The minimum norm on the boundary is m = 0.01, and the Lipschitz constant is K = 3π < 10.0. The value of the degree is -1.

• Function 5. F₅ = (x² - y², 2xy). The domain was selected as [-1, 1] × [-1, 1]. The minimum norm on the boundary is m = 2(√2 - 1) > 0.8284, and the Lipschitz constant is K = 4.0. The value of the degree is 2.

• Function 6. F₆ = (x² - y², 2xy). The domain was selected as the triangle Δ = conv{(-4, -4), (4, -4), (0, 4)}.
The minimum norm on the boundary is m > 0.4 (computed numerically), and the Lipschitz constant is K = 32.0. The degree is 2.
• Function 7. F₇ = (x³ - 3xy², y³ - 3x²y). The domain was selected as [-1, 1] × [-1, 1]. The minimum norm on the boundary is m > 0.6666, and the Lipschitz constant is K = 12.0. The value of the degree is -3.

• Function 8. F₈ = (x³ - 3xy², -y³ + 3x²y) = z³, for z = x + iy. The domain was selected as the triangle Δ defined above. The minimum norm on the boundary is m > 4.3333, and the Lipschitz constant is K = 64.0. The value of the degree is 3.

• Function 9. F₉ = (x⁴ + y⁴ - 6x²y², 4xy(x² - y²)) = z⁴, for z = x + iy. The domain was selected as [-1, 1] × [-1, 1]. The minimum norm on the boundary is m > 0.7, and the Lipschitz constant is K = 36.0. The value of the degree is 4.

• Function 10. F₁₀ = F₉. The domain was selected as the triangle Δ. The minimum norm on the boundary is m > 7.0, and the Lipschitz constant is K = 256.0. The degree is 4.

These test functions were chosen to verify the algorithm in many different cases. We included oscillatory functions having several zeros (F₁), polynomials (F₂, F₃), oscillatory functions with exactly one zero (F₄), as well as functions tested by other researchers (F₅-F₁₀). The first set of tests is depicted in Table 210.1. In many of the tests we overestimated the Lipschitz constant K and underestimated the minimum boundary norm m. With such a choice of parameters the algorithm is guaranteed to compute the correct value of the degree, although the computation will involve more than the minimal number of function evaluations. On the other hand, if one underestimates K and/or overestimates m, the algorithm is only a heuristic and in general may not compute the correct value of the degree. In the second set of tests, reported in Table 210.2, we underestimated the Lipschitz constant K. We did not overestimate the minimum norm m, since the algorithm automatically checks that no information points violate this constraint.
With such a choice of parameters the degree was still computed correctly in many cases, even when the underestimation factor was 2 or even 100.
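As an independent cross-check of the degree values reported in the tables, the degree of a two-dimensional map can be approximated as the winding number of f along the boundary. The sketch below is illustrative only; it is not the method of this section, and its correctness requires a boundary step that is fine relative to m/K:

```python
import math

def winding_degree(f, corners, n=20000):
    # Traverse the polygonal boundary CCW and accumulate the change of
    # arg f; the topological degree is the total change divided by 2*pi.
    pts = []
    k = len(corners)
    for i in range(k):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % k]
        for j in range(n // k):
            t = j / (n // k)
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    total, prev = 0.0, None
    for (x, y) in pts + pts[:1]:
        f1, f2 = f(x, y)
        ang = math.atan2(f2, f1)
        if prev is not None:
            d = ang - prev
            if d <= -math.pi:      # unwrap the jump across the branch cut
                d += 2 * math.pi
            elif d > math.pi:
                d -= 2 * math.pi
            total += d
        prev = ang
    return round(total / (2 * math.pi))

F5 = lambda x, y: (x * x - y * y, 2 * x * y)         # z^2, degree 2
F7 = lambda x, y: (x**3 - 3*x*y*y, y**3 - 3*x*x*y)   # degree -3
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]  # CCW
print(winding_degree(F5, square), winding_degree(F7, square))  # 2 -3
```

Unlike the algorithm of this section, this check offers no worst case guarantee for a fixed sample size; it merely reproduces the tabulated degrees for the smooth test functions.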
Table 210.1: Timing Results for Correct Use of the Algorithm.

F No.  Domain               Deg   K      m       No. of F Calls  CPU (sec ×10⁻⁴)
1      [0,1]²               1     80.0   0.01    8000            59.3
1      [0,1]²               1     80.0   0.005   16000           99.4
2      [0,1]²               0     4.0    0.5     8               0.1
2      [0,1]²               0     4.0    0.05    80              0.5
3      [-2,2]×[-1/4,1/4]    -1    8.0    0.4375  40              0.5
3      [-2,2]×[-1/4,1/4]    -1    8.0    0.01    1800            11.0
4      [0,1]²               -1    10.0   0.01    1000            7.3
4      [0,1]²               -1    10.0   0.001   10000           64.2
5      [-1,1]²              2     4.0    0.8284  8               0.1
5      [-1,1]²              2     4.0    0.33    24              0.3
6      Δ                    2     32.0   0.4     400             2.5
6      Δ                    2     32.0   0.01    15998           93.5
7      [-1,1]²              -3    12.0   0.6667  36              0.2
7      [-1,1]²              -3    12.0   0.0066  3596            22.9
8      Δ                    3     64.0   4.33    73              0.5
8      Δ                    3     64.0   4.33    968             6.4
9      [-1,1]²              4     36.0   0.7     76              0.6
9      [-1,1]²              4     36.0   0.07    772             6.8
10     Δ                    4     256.0  7.0     183             1.5
10     Δ                    4     256.0  0.7     1828            13.6
Table 210.2: Timing Results for Heuristic Use of the Algorithm (* = Incorrect Values Computed).

F No.  Domain               Deg   K      m       No. of F Calls  CPU (sec ×10⁻⁴)
1      [0,1]²               1     0.8    0.005   160             1.1
2      [0,1]²               0     2.0    0.5     4               0.1
3      [-2,2]×[-1/4,1/4]    -1    2.0    0.4375  8               0.1
4      [0,1]²               -1    2.0    0.5     4               0.1
5      [-1,1]²              -1*   2.0    0.8284  4               0.1
6      Δ                    2     8.0    0.4     98              0.8
7      [-1,1]²              -3    6.0    0.6667  16              0.3
8      Δ                    3     16.0   4.33    18              0.3
9      [-1,1]²              4     16.0   0.7     34              0.5
10     Δ                    4     128    7.0     90              0.8
5.1.6 Exercises
30. (**) Generalize the results of section 5.1 to different classes of functions, e.g., functions with bounded derivatives of higher order, or functions with known numbers of sign changes of components on each subinterval of the boundary. 31. (**) Derive an efficient hybrid degree-bisection-Newton type method for approximating zeros of a "large" class of nonlinear algebraic equations.
5.2 Lipschitz Functions in d Dimensions
In this section we consider the class F of Lipschitz functions with constant K, defined on the unit cube C, f : C → R^d, such that for every f ∈ F we have ||f(x)||∞ ≥ m > 0 for all x ∈ ∂C, the boundary of C, and such that K/(8m) ≥ 1. We note that if K/(2m) < 1 then the functions in F do not have zeros, and therefore the degree is zero for every such f. The case 1 ≤ K/(2m) < 4 is open. The information is assumed to be n sequential function evaluations on the boundary of C. The topological degree is computed by an algorithm φ : N(F) → Z. In this section we solve the following problems.

1. We construct information N*, which uniquely determines the degree for every f in F. This information consists of
A function evaluations, where A is the number of grid vertices defined in section 5.2.2.

2. We exhibit an algorithm φ*, which uses the information N* to compute the degree.

3. We find a lower bound, roughly equal to 2d⌊K/(8m)⌋^{d-1}, on the minimal cardinality number n*.

We stress that the information N* is parallel, since the evaluation points are given a priori. By assuming that each function evaluation costs c units of time and that elementary arithmetic operations have unit cost, Eq. (211.1) yields a lower bound comp_low on the complexity comp(deg) of the problem.
If K/(8m) is large and/or d is large then the problem is intractable. For example, by taking K/(8m) = 10³ and d = 10, we get comp_low ≈ 1 × 10^28·(c + 10). The cost of the method M* = (φ*, N*) is roughly A(c + (d²/2)(d - 1)!). Thus for small dimension d, say d ≤ 5, and small K/(8m), say K/(8m) ≤ 10, M* computes the degree in time at most roughly 10⁵(c + 300).

5.2.1
Basic Definitions
We let C = [0, 1]^d be the unit cube in R^d, d ≥ 2, || · || = || · ||∞ the infinity norm in R^d, and θ = (0, ..., 0) ∈ R^d. For given positive m and K we define

F = {f : C → R^d : ||f(x) - f(y)|| ≤ K||x - y|| for all x, y ∈ C, ||f(x)|| ≥ m for every x ∈ ∂C, and K/(8m) ≥ 1}.

Our problem is defined by the solution operator that gives the value of the topological degree of f relative to C at θ. To solve this problem we apply sequential information N(f) = [f(x₁), ..., f(x_n)] with predefined cardinality n, where x₁ ∈ ∂C is given a priori, x_j = x_j(f(x₁), ..., f(x_{j-1})), and x_j is an arbitrary transformation x_j : R^{d(j-1)} → ∂C, j = 2, ..., n. If all of these functions are constant, i.e., all x_j are given a priori, then the information becomes parallel. By the minimal cardinality number n* we mean the minimal n for which there exists information N that uniquely determines the degree for any function in our class, i.e., N(f) = N(g) implies deg(f, C, θ) = deg(g, C, θ), for all f, g ∈ F. Knowing N(f) we compute the degree by an algorithm φ, deg(f, C, θ) = φ(N(f)). Below we exhibit an algorithm φ* using information N* (defined in the following section), which was developed by Kearfott and is based on his parity theorem.
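The distinction between sequential and parallel information can be made operational; in this hypothetical sketch, `next_point` plays the role of the transformations x_j (if it ignores the computed values, the information is parallel):

```python
def sequential_info(f, n, x1, next_point):
    # N(f) = [f(x1), ..., f(xn)]: the point x_{j+1} may depend on all
    # previously computed values f(x1), ..., f(xj).
    xs, vals = [x1], [f(x1)]
    for _ in range(n - 1):
        xs.append(next_point(xs, vals))  # adaptive choice of the next point
        vals.append(f(xs[-1]))
    return vals

# With x_{j+1} = x_j + 1 the points never depend on f, so this
# particular information is parallel:
print(sequential_info(lambda x: x * x, 3, 2, lambda xs, vals: xs[-1] + 1))  # [4, 9, 16]
```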
5.2.2 Information N*
In this section we prove that the computation of function values on a uniform grid on ∂C with diameter less than 2m/K uniquely determines the degree. To this end, we define M = ⌊K/(2m) + 1⌋ and R = 1/M. Then we subdivide each (d - 1) dimensional face of C into M^{d-1} equal cubes of diameter R, by subdividing each edge into M equal intervals of length R. In this way we obtain a subdivision of ∂C into 2dM^{d-1} cubes C_i of diameter R,

∂C = ∪_{i=1}^{2dM^{d-1}} C_i.    (213.1)
We let X = {x₁, ..., x_A} be the set of all vertices of the cubes C_i. We observe that R/2 < m/K. Then we define the information N*,

N*(f) = [f(x₁), ..., f(x_A)],
for any f ∈ F. We first show
Lemma 213.1 The information N* uniquely determines the degree for any function f ∈ F.

Proof To prove this lemma we use the Poincaré-Bohl homotopy theorem. We let
h(t, z) = t·f(z) + (1 - t)·g(z).

To conclude that deg(f, C, θ) = deg(g, C, θ) for all f, g ∈ F such that N*(f) = N*(g), it is enough to show that the homotopy h(t, z) is not zero for every t ∈ [0, 1] and z ∈ ∂C. To this end, we select an arbitrary z ∈ ∂C. Then there exists an x_j such that ||x_j - z|| ≤ R/2 < m/K. Since x_j ∈ ∂C and f ∈ F, we get ||f(x_j)|| = |f_i(x_j)| ≥ m for some i, 1 ≤ i ≤ d. Then we have |f_i(z) - f_i(x_j)| ≤ ||f(z) - f(x_j)|| ≤ K||z - x_j|| < m. This implies that f_i(z) ≠ 0 and sign(f_i(z)) = sign(f_i(x_j)). Since f(x_j) = g(x_j) and g ∈ F, then
g_i(z) ≠ 0 and sign g_i(z) = sign f_i(z). Therefore, for every t ∈ [0, 1] we have

h_i(t, z) = t·f_i(z) + (1 - t)·g_i(z) ≠ 0.
This inequality completes our proof. •
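The grid underlying N* is straightforward to construct explicitly. The sketch below (an illustration, not the book's code) enumerates the vertex set X for the unit cube [0,1]^d:

```python
from itertools import product

def boundary_grid(d, K, m):
    # M = floor(K/(2m) + 1) and R = 1/M: each (d-1) face of [0,1]^d is
    # split into M^(d-1) cubes of diameter R; X collects all their vertices.
    M = int(K / (2 * m) + 1)
    X = set()
    for axis in range(d):              # d pairs of opposite faces
        for side in (0.0, 1.0):
            for idx in product(range(M + 1), repeat=d - 1):
                v = [i / M for i in idx]
                v.insert(axis, side)   # pin one coordinate to the face
                X.add(tuple(v))
    return M, sorted(X)

M, X = boundary_grid(d=2, K=8.0, m=1.0)
print(M, len(X))  # 5 20: the unit square's boundary carries 4*M grid vertices
```

The cardinality A = |X| grows like 2dM^{d-1}, which is the source of the exponential dependence on d discussed below.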
5.2.3
Algorithm φ* Using Information N*
In this section we exhibit an algorithm φ* to compute the degree. First we show that the evaluation points x_i, i = 1, ..., A, yield an impartial refinement of ∂C relative to the sign of f, for every f ∈ F. Impartial refinement is defined as follows.

Definition If d = 1 then ∂[0, 1] = {0} ∪ {1} is impartially refined relative to the sign of f iff f(0) × f(1) < 0. If d > 1 then ∂C is impartially refined relative to the sign of f iff ∂C may be written as a union of a finite number of (d - 1) dimensional regions β₁, ..., β_q (by a (d - 1) dimensional region we mean a union of a finite number of (d - 1) dimensional simplices) in such a way that:
(214.1) the (d - 1) dimensional interiors of the regions β_i are pairwise disjoint;

(214.2) for every i = 1, ..., q there exists t ∈ {1, ..., d} such that f_t is of constant sign on β_i;

(214.3) …;

(214.4) if S_i is a (d - 1) simplex in β_i such that S_i has a (d - 2) face in ∂β_i, then this face is also a (d - 2) face of some (d - 1) simplex S_j in β_j, i ≠ j.
We now consider the subdivision (213.1) of ∂C into 2dM^{d-1} (d - 1) dimensional cubes C_i and subdivide in a standard way each C_i into (d - 1)! (d - 1) dimensional simplices. (Hereafter we shall use the term (d - 1) simplices.) This forms a simplicial subdivision of ∂C into 2dM^{d-1}(d - 1)! (d - 1) simplices (compare the annotations to this chapter),

∂C = Σ_{j=1}^{L} t_j S_j,    (215.1)
for t_j = ±1 and L = 2dM^{d-1}(d - 1)!, where the S_j are oriented (d - 1) simplices. We note that the vertices of S_j are uniquely determined by this subdivision and the evaluation points x_i. We now prove

Lemma 215.1 The subdivision given in Eq. (215.1) yields an impartial refinement of ∂C relative to the sign of f, for every f ∈ F.

Proof We construct the regions β_i satisfying properties (214.1)-(214.4) of the above definition. For an arbitrary f ∈ F and for each cube C_i in the subdivision (213.1) we choose a component f_j of f that is of constant sign on C_i. Such a component exists, since for some index j, |f_j(z_i)| ≥ m, where z_i is the center of C_i. Thus f_j is of constant sign on C_i, since the radius of C_i is less than m/K and f is in F. Indeed, |f_j(z) - f_j(z_i)| ≤ ||f(z) - f(z_i)|| ≤ K||z - z_i|| < m for ||z - z_i|| < m/K, which yields sign(f_j(z)) = sign(f_j(z_i)). Then we group the cubes C_i to form connected regions β_{j,1}, ..., β_{j,k_j}, such that some component f_j is of constant sign on all β_{j,l}, l = 1, ..., k_j, and β_{j,l₁} ≠ β_{j,l₂} for l₁ ≠ l₂. In this way we obtain a decomposition of ∂C that satisfies properties (214.1)-(214.3). Since each cube in every β_{j,l} is subdivided into (d - 1) simplices forming a simplicial subdivision of ∂C, property (214.4) is also met. This completes the proof. ∎

We now proceed to outline Kearfott's parity theorem, which makes possible the computation of the degree. We can use his parity theorem, since every impartial refinement of ∂C is also a sufficient refinement (see annotations). We first introduce the necessary notation. We let S = [S₁, ..., S_d] be a (d - 1) simplex in R^d with vertices S_i. The range matrix R(S, f) associated with S and any f ∈ F is a d × d matrix
216 CHAPTER 5. TOPOLOGICAL DEGREE
COMPUTATION
where the entries record the signs of the components of f at the vertices of S. The range matrix R(S, f) is called feasible iff …

We define the parity of the range matrix R(S, f) by

par(R(S, f)) = +1 if R(S, f) is feasible after an even permutation of rows,
               -1 if R(S, f) is feasible after an odd permutation of rows,
                0 otherwise.
We remark that the parity can be computed with roughly d²/2 comparisons. We define the algorithm φ* using N* by

φ*(N*(f)) = …,

where L and the t_jS_j are as in Eq. (215.1). Kearfott's parity theorem says that

deg(f, C, θ) = φ*(N*(f))

for every f ∈ F. We observe that the implementation of φ* requires computing the parities of L = 2dM^{d-1}(d - 1)! (d - 1) simplices. Thus the cost of the method M* = (φ*, N*) is at most roughly A·c + L·(d²/2), where c is the cost of one function evaluation and, as before, arithmetic operations and comparisons cost unity.
5.2.4 Lower Bound on the Minimal Cardinality Number
In this section we derive a lower bound on the number of function evaluations necessary to compute the degree for any function in the class F.

Theorem 216.1 For any information N with the number of evaluations n ≤ 2d⌊K/(8m)⌋^{d-1} - 1, there exist two functions f* and f** in F such that N(f*) = N(f**), |deg(f*, C, θ)| = 1, and deg(f**, C, θ) = 0.

We note that this theorem implies that the minimal cardinality number n* is at least 2d⌊K/(8m)⌋^{d-1}. This lower bound is exponential in the dimension d. Thus for large d and/or large K/(8m) the problem is intractable. In order to prove Theorem 216.1 we need the following lemma.

Lemma 217.1 We let H^d be a d-dimensional cube in C with diameter 8m/K ≤ 1 such that

B^d = H^d ∩ ∂C is a (d - 1) dimensional face of H^d, and the corresponding (d - 1) faces of H^d and C are parallel.    (217.1)

Then there exists a function f^d ∈ F, f^d = (f_1^d, ..., f_d^d), with exactly one zero a^d, such that
||a^d - b^d|| = dist(a^d, B^d) = m/K,    (217.2)

where b^d is the center of B^d,

…
This in particular implies that a^d is a simple zero;
… for every z ∈ C and every j.    (218.2)

Proof The proof of this lemma is quite technical. We use induction with respect to the dimension d ≥ 2. We first let d = 2 and let H² be the square satisfying property (217.1). Without loss of generality we assume that B² ⊂ [0, 1], B² = [b₁, b₂], so that b² = ((b₁ + b₂)/2, 0). We let c₁ = b² + (m/K, m/K) and c₂ = b² - (m/K, m/K). We first define the function f² : C → R²
by
as illustrated in Figure 219.1. We observe that f² satisfies the Lipschitz condition with constant K, and that a² = b² + (-m/K, m/K) is the unique zero of f². Thus dist(a², B²) = m/K and ||a² - b²|| = m/K, which implies Eq. (217.2). The definition of f² directly yields Eqs. (217.3), (217.4), (218.1), and (218.2). To show Eq. (217.5) we observe that
and
i = 1, 2. Thus the lemma holds for d = 2.

Induction step We now assume that the lemma holds in (d - 1) dimensions, and we prove it for d, for any d ≥ 3. We let H^d ⊂ C, diam(H^d) = 8m/K, be a d-dimensional cube for which Eq. (217.1) holds. Without loss of generality we assume that all points in B^d have the l-th component
Figure 219.1: Construction of the function f².

(l ≠ d) equal to 1. (If l = d then the same construction follows with the d-th dimension replaced by the first dimension.) We let H^{d-1} be the orthogonal projection of H^d onto the (d - 1) dimensional cube C′. The induction assumption implies that there exists a function f^{d-1} for H^{d-1} such that all properties (217.1)-(218.1) hold. We define (see Figure 221.1)
…
where the y_i are as defined above. We observe that the function g^d satisfies the Lipschitz condition with constant K, since it is obtained by taking the minimum of Lipschitz functions with constant K. We also note that the zero set of g^d is given by
Z₀ = {z ∈ C : there exists i such that …}. Finally, for any z ∈ C, z = (z₁, ..., z_d), we let z̄ = (z₁, ..., z_{d-1}) be the orthogonal projection of z onto the (d - 1) dimensional cube C′. We define
and
for i = 1, ..., d - 1. We now show that f^d is in F and satisfies Eqs. (217.2)-(218.2). We first check that f^d is continuous. Since for every z ∈ C \ Int(H^d) we have ||z - b^d|| ≥ 4m/K, then ψ_j(z) ≥ -2m + K||z - y_j|| ≥ -2m + K·(3m/K) = m, for j = 1, ..., 2^{d-2}. Therefore g^d(z) = m. This and the continuity of g^d implies that f_d^d is continuous. Thus we must only check the continuity of f_i^d, for i = 1, ..., d - 1, at all z ∈ C - H^d and z ∈ H^d such that z_d = b_d^d or z_d = b_d^d - 2m/K. We first let z ∈ C \ H^d. If z_d = b_d^d - 4m/K, then f_i^d(z) = min(m, max(f_i^{d-1}(z̄), m)) = m. If z_d = b_d^d + 4m/K then f_i^d(z) = min(m, max(f_i^{d-1}(z̄), 3m)) = m. If b_d^d - 4m/K < z_d < b_d^d + 4m/K then z̄ ∈ C′ - H^{d-1}, where C′ is the (d - 1) dimensional unit cube. The induction assumption now yields f_i^{d-1}(z̄) = m, which implies f_i^d(z) = m. For z ∈ H^d such that
Figure 221.1: Construction of the function f^d.

z_d = b_d^d we have f_i^d(z) = min(m, max(f_i^{d-1}(z̄), -m)) = f_i^{d-1}(z̄), which means that f_i^d is continuous there. For z ∈ H^d such that z_d = b_d^d - 2m/K we have f_i^d(z) = min(m, max(f_i^{d-1}(z̄), -m)) = f_i^{d-1}(z̄), which means that f_i^d is continuous there. This analysis shows that all the components f_i^d are continuous, which yields continuity of f^d. The function f^d satisfies the Lipschitz condition with constant K, since it is defined by taking minima and maxima of Lipschitz functions with constant K.
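The closing observation, that pointwise minima and maxima of K-Lipschitz functions are again K-Lipschitz, is easy to spot-check numerically (an illustration only; the functions below are arbitrary examples):

```python
import random

K = 3.0
f1 = lambda x: K * abs(x - 0.3)     # each piece is K-Lipschitz
f2 = lambda x: 1.0 - K * x
f3 = lambda x: K * x - 0.5
g = lambda x: min(max(f1(x), f2(x)), f3(x))

random.seed(0)
pairs = [(random.random(), random.random()) for _ in range(1000)]
ok = all(abs(g(x) - g(y)) <= K * abs(x - y) + 1e-12 for x, y in pairs)
print(ok)  # True
```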
We now show that a^d is the only zero of f^d. Obviously f^d can only have zeros inside H^d. We let z ∈ H^d be such that …
Then |z_d - (b_d^d + m/K)| ≥ 3m/K, so ||z - y_i|| ≥ 3m/K, for all y_i, i = 1, ..., 2^{d-2}. This yields that g^d(z) = min(m, ..., m) = m. Thus f^d has no zeros in this domain. We now take
Then |z_d - b_d^d| ≥ 2m/K, which combined with the induction assumption f_i^{d-1}(z̄) ≤ m yields f_i^d(z) = min(m, max(f_i^{d-1}(z̄), m)) = m. Thus f^d has no zeros in this domain. We now take …

In this domain, the induction assumption yields that the only zeros of f_j^d, j = 1, ..., d - 1, are of the form (a_1^{d-1}, ..., a_{d-1}^{d-1}, z_d). We observe that g^d is zero at only one of these points, namely at the point with z_d = b_d^d - m/K. Indeed, to verify that, we recall that |a_j^{d-1} - b_j^d| ≤ m/K, j = 1, ..., d - 1, and a_d^d = b_d^d - m/K. Thus, by definition (219.1), ||a^d - y_i|| = 2m/K for i = 1, ..., 2^{d-2}, and therefore g^d(a^d) = 0. For every z ∈ H^d such that b_d^d - m/K < z_d ≤ b_d^d there exists y_i, with i_q = 1 for a_q^{d-1} ≥ b_q^d and i_q = -1 for a_q^{d-1} < b_q^d, such that ||y_i - z|| < 2m/K. Thus g^d(z) < 0. For every z ∈ H^d such that b_d^d - 2m/K ≤ z_d < b_d^d - m/K, and for every y_i, we have ||y_i - z|| > 2m/K. Thus g^d(z) > 0. Therefore a^d is the only zero of f^d in this domain. For the domain … we shall take any z such that g^d(z) = 0, and show that there exists an index i ∈ {1, ..., d - 1} such that f_i^d(z) ≠ 0. To this end, we first observe that if |z_j - b_j^d| ≥ 2m/K for some j = 1, ..., d - 1, then ||z̄ - b^{d-1}|| ≥ 2m/K and from the induction assumption (218.2) there exists an index i such that f_i^{d-1}(z̄) = m, which implies f_i^d(z) = m, since |z_d - b_d^d| ≤ 2m/K. Thus we assume that |z_j - b_j^d| < 2m/K for every j = 1, ..., d - 1. We now take z such that g^d(z) = 0. This means that
we have (i) for every … and (ii) there exists j such that … We suppose that y_i = y(i₁, ..., i_{d-2}), where
and
where Q₁ ≠ ∅ and Q₁ ∪ Q₂ = {1, ..., j - 1, j + 1, ..., d - 1}. Thus for every q ∈ Q₁ we have
If there exists q ∈ Q₁ such that |z_q - b_q^d| = 3m/K, then … and Eq. (218.2) implies that f_i^d(z) = m for some i. Otherwise (i.e., if |z_q - b_q^d| = m/K for all q ∈ Q₁), z_q = b_q^d ± m/K. Then we take y(i₁, ..., i_{d-2}) such that the i_q are as above for q ∈ Q₂, and for q ∈ Q₁ we take i_q = +1 if z_q = b_q^d + m/K and i_q = -1 if z_q = b_q^d - m/K. This implies that ||y(i₁, ..., i_{d-2}) - z|| < 2m/K, which contradicts (i) and completes the proof of uniqueness of the zero of f^d.

We now show that Eq. (217.2) holds. Indeed, we have ||a^d - b^d|| = m/K, since a_d^d = b_d^d - m/K and |a_i^d - b_i^d| ≤ m/K for i = 1, ..., d - 1. We also note that dist(a^d, B^d) = dist(a^{d-1}, B^{d-1}) = m/K, which yields Eq. (217.2). Equations (217.3), (218.1), and (218.2) follow immediately from the definition of f^d and the continuity. We now show that Eq. (217.4) holds. Obviously Eq. (217.3) implies Eq. (217.4) in the case z ∈ ∂C \ B^d. Therefore we let z ∈ B^d and subdivide B^d into five regions B_i^d, i = 1, ..., 5, as depicted in Figure 221.1, where
and
Then we recall that Eq. (218.1) holds and:

(a) For any z ∈ B_1^d, by an argument similar to that following Eq. (222.2) we have f_i^d(z) = m, i = 1, ..., d - 1. Thus ||f^d(z)|| = m.

(b) For any z ∈ B_2^d, the same argument as that following Eq. (222.4) yields f_i^d(z) = m for some i ∈ {1, ..., d - 1}.

(c) For every z ∈ B_3^d we have |z_d - (b_d^d + m/K)| ≤ m/K, |z_i - b_i^d| ≤ 2m/K, and obviously z_l = b_l^d = 1. We let Q₁ = {i : z_i ≥ b_i^d} and Q₂ = {i : z_i < b_i^d}. Then for i ∈ Q₁ we have |z_i - (b_i^d + m/K)| ≤ m/K, and for i ∈ Q₂ we have |z_i - (b_i^d - m/K)| ≤ m/K. Thus for y(i₁, ..., i_{d-2}) such that i_q = 1 for q ∈ Q₁ and i_q = -1 for q ∈ Q₂ we get ||y(i₁, ..., i_{d-2}) - z|| ≤ m/K, which yields ψ_i(z) = -m for some i ∈ {1, ..., 2^{d-2}}. This means that g^d(z) = -m, and thus ||f^d(z)|| = m.

(d) For every z ∈ B_4^d, the induction assumption (217.4) and the definition of f^d yield ||f^{d-1}(z̄)|| = m, and therefore ||f^d(z)|| = m.

(e) For every z ∈ B_5^d, Eq. (222.1) implies that g^d(z) = m, and therefore ||f^d(z)|| = m.

In this way we proved that for every z ∈ ∂C we have ||f^d(z)|| = m, which completes the proof of Eq. (217.4). To prove Eq. (217.5) we note that for z close to a^d, by the induction assumption and the definition of f_i^d, we have ∂f_i^d(z)/∂z_d = 0, i = 1, ..., d - 1. Thus we need only show that ∂g^d/∂z_i at a^d equals ±Kδ_{i,d}, i = 1, ..., d. We let y(i₁, ..., i_{d-2}) be such that for a_q^d ≥ b_q^d we have i_q = +1 and for a_q^d < b_q^d we have i_q = -1. Then |a_q^d - (b_q^d + i_q·m/K)| ≤ m/K, since ||a^d - b^d|| ≤ m/K, and obviously
For z in a small neighborhood of a^d we have
Therefore
and
which shows Eq. (217.5). We now show Eq. (218.2). We first observe that Eq. (217.3) implies Eq. (218.2) for z ∈ C \ H^d. For z ∈ H^d such that z_d ≤ b_d^d - 2m/K or z_d ≥ b_d^d + 2m/K, Eq. (218.2) follows directly from the proofs following Eqs. (222.1) and (222.2). For z ∈ H^d such that b_d^d - 2m/K < z_d ≤ b_d^d and ||z - b^d|| ≥ 2m/K, we have ||z̄ - b^{d-1}|| ≥ 2m/K, and then by the induction assumption there exists an index j such that f_j^{d-1}(z̄) = m, so that f_j^d(z) = m. For z ∈ H^d such that b_d^d ≤ z_d < b_d^d + 2m/K and ||z - b^d|| ≥ 2m/K, as in Eq. (222.4), we have ||z̄ - b^{d-1}|| ≥ 2m/K and by the induction assumption there exists an index j such that f_j^{d-1}(z̄) = m, which combined with the definition of f^d yields f_j^d(z) = m, and completes the proof of Eq. (218.2). The function f^d is in F since it satisfies the Lipschitz condition with constant K and its norm is exactly m on the boundary of C (compare Eq. (217.4)). This finally completes the proof of Lemma 217.1. ∎

We are now ready to prove Theorem 216.1.

Proof of Theorem 216.1 We first let P = ⌊K/(8m)⌋ and show that for every f ∈ F and every sequential information N(f) = [f(x₁), ..., f(x_n)], with n ≤ 2dP^{d-1} - 1, there exists a cube H^d ⊂ C with diam(H^d) = 8m/K, satisfying Eq. (217.1), and such that no point x_i belongs to B^d. Indeed, we subdivide the boundary of C into 2dP^{d-1} (d - 1) cubes of diameter 1/P by subdividing uniformly each (d - 1) face of C into P^{d-1} (d - 1) cubes. Then, since n ≤ 2dP^{d-1} - 1, there exists at least one (d - 1) cube in this subdivision, say B̃^d, that does not contain any of the points x_i. Since diam(B̃^d) = 1/P ≥ 8m/K, we take as B^d any (d - 1) cube of diameter 8m/K contained in B̃^d, with faces parallel to the corresponding
faces of B̃^d. This B^d is obviously a (d - 1) face of a cube H^d satisfying Eq. (217.1). We let f**(z) = [m, m, ..., m] for every z ∈ C, and let H^d be constructed as above for the function f**. We let f* = f^d from Lemma 217.1 for this cube H^d. We observe that
since for every x_i we have f**(x_i) = f*(x_i) = [m, m, ..., m]. Moreover, there exists a unique zero a^d of f*. We let D be an open neighborhood of a^d such that f* is continuously differentiable in D. Then, since a^d is a simple zero of f*, we get deg(f*, D, θ) = ±1. Also, since f* has no zeros in C \ D, we get deg(f*, C \ D, θ) = 0. Therefore by the additivity of the degree we get

deg(f*, C, θ) = deg(f*, D, θ) + deg(f*, C \ D, θ) = ±1.    (226.1)
Obviously deg(f**, C, 0) = 0, since f** has no zeros in C; combined with Eq. (226.1), this shows that two functions with identical information may have different degrees, which completes the proof of Theorem 216.1. ∎

5.2.5 Exercises
32. (**) Generalize the results of Section 5.2 to different classes of functions, e.g., functions with bounded derivatives of higher order or piecewise analytic functions.

33. (**) Carry out the analysis of Section 5.2 for information with uniformly bounded noise in each function evaluation.

34. (**) Carry out an average case analysis of the topological degree computation.

35. (**) Derive efficient algorithms based on the degree computation for solving nonsmooth nonlinear (and possibly linear) programming problems.
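The cardinality bound behind Theorem 216.1 is concrete: with P = ⌊K/(8m)⌋, any sequential information of cardinality at most 2dP^{d−1} − 1 leaves the degree undetermined. A minimal sketch of how this count scales with the dimension d (the function name below is ours, not the text's):

```python
import math

def degree_eval_lower_bound(d, K, m):
    """Size of the boundary subdivision used in the adversary argument:
    2*d*P**(d-1) cubes of diameter 1/P, with P = floor(K/(8*m)).
    Fewer function evaluations than this leave two functions with
    identical information but different topological degrees."""
    P = math.floor(K / (8.0 * m))
    return 2 * d * P ** (d - 1)

# The count grows exponentially in d for fixed K/m:
for d in (2, 3, 5):
    print(d, degree_eval_lower_bound(d, K=1000.0, m=1.0))
```

For K/m = 1000 this gives 500 evaluations in dimension 2 but already 93750 in dimension 3, illustrating the exponential cost of degree computation in the Lipschitz class.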
5.3 Annotations
The topological degree, also known as the Brouwer degree, is a generalization of the concept of winding number from complex analysis. It can be computed numerically by approximating the integral formulations due to Kronecker (see [20]) or Heinz [11]. More practical
algorithms were developed via determinant formulas by Stenger [27] and later, by computation of signs based on relating these formulas to Sperner simplexes, by Kearfott [14]–[19] and Stynes [28]–[30]. We proved in [5]–[6] that the latter algorithms enjoy almost optimal complexity in the class of Lipschitz functions. More recently, algorithms for degree computation based on interval arithmetic were proposed by Aberth [1]. The degree computation is an important tool in nonlinear analysis, root finding, and nonlinear optimization for functions of low smoothness, when the derivative is singular at critical points. This stems from the two main properties of the degree: (i) it depends only on the function values on the boundary of the domain; and (ii) it is continuous with respect to small perturbations in the function values. In fact, the degree is computed correctly by our algorithms whenever the signs of the function values are computed correctly. A good summary of applications of the degree computation is given in the recent work of Kearfott [19]. Several algorithms based on the degree computation have been proposed for zero-finding problems. These include generalized bisection methods [7], [14]–[17], [24], [34], as well as simplicial continuation methods [2]. In many test cases these algorithms were able to find approximate solutions when all other tested methods failed [10], [17]. The problem of computing the topological degree using noisy information (compare Exercise 33) was recently studied by Yokoyama [35], [36].

5.3.1 Specific Comments
Section 5.1

1. This section presents the work of Boult and Sikorski [6].
2. The formula (196.1) for computing the topological degree in the two-dimensional case was given originally by Stenger [27] (see also Kearfott [14] and Stynes [28]).
3. The test functions F5–F10 were taken from [14].

Section 5.2

1. This section presents the work of Boult and Sikorski [5].
2. The standard simplicial subdivision of an arbitrary cube (used in (215.1)) is given by Jeppson in [12].
3. The simplicial subdivision in Eq. (215.1) was developed by Stynes [28] and Kearfott [14].
4. The explicit formulas for the vertices of the simplices S_j in Eq. (215.1) are given by Allgower et al. in [3] and also by Jeppson [12].
5. The parity theorem was given by Kearfott in [14].
6. Stynes proved that every impartial refinement of the boundary of C is also a sufficient refinement (Theorem 3.3 of [28]).
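To make the boundary-only character of the degree concrete, here is an illustrative sketch (not Stenger's determinant formula itself, which works with signs of function values rather than angles): in two dimensions the degree relative to 0 equals the winding number of f along the positively oriented boundary, which can be approximated by summing angle increments of f at sampled boundary points. All names below are ours.

```python
import math

def degree_2d(f, vertices, samples_per_edge=200):
    """Approximate deg(f, P, 0) for a map f: R^2 -> R^2 over the
    polygon P with the given counterclockwise vertex list, as the
    winding number of f along the boundary (f must not vanish there)."""
    # Sample the boundary edge by edge.
    pts = []
    k = len(vertices)
    for i in range(k):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % k]
        for t in range(samples_per_edge):
            s = t / samples_per_edge
            pts.append((x0 + s * (x1 - x0), y0 + s * (y1 - y0)))
    # Sum the wrapped increments of arg f(z) around the closed boundary.
    total = 0.0
    fx, fy = f(*pts[-1])
    prev = math.atan2(fy, fx)
    for p in pts:
        fx, fy = f(*p)
        ang = math.atan2(fy, fx)
        d = ang - prev
        while d <= -math.pi:  # wrap the increment into (-pi, pi]
            d += 2.0 * math.pi
        while d > math.pi:
            d -= 2.0 * math.pi
        total += d
        prev = ang
    return round(total / (2.0 * math.pi))

square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
print(degree_2d(lambda x, y: (x, y), square))                      # identity: 1
print(degree_2d(lambda x, y: (x * x - y * y, 2 * x * y), square))  # z^2: 2
```

The sampling must be fine enough that the argument of f changes by less than π between consecutive boundary points; this mirrors the "sufficient refinement" requirement in comment 6 above.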
Bibliography
[1] Aberth, O. Computation of topological degree using interval arithmetic and applications. Math. Comp., 62(205): 171–178, 1994.
[2] Allgower, E. L., and Georg, K. Simplicial and continuation methods for approximating fixed points and solutions to systems of equations. SIAM Rev., 22: 28–85, 1980.
[3] Allgower, E. L., Keller, C. L., and Reeves, T. E. A Program for the Numerical Approximation of a Fixed Point of an Arbitrary Continuous Mapping of the n-Cube into Itself. Aerospace Research Laboratories, Tech. Rep. 71-0257, Wright-Patterson Air Force Base, OH, 1971.
[4] Boult, T. E. Information Based Complexity in Nonlinear Equations and Computer Vision. Ph.D. dissertation, Comp. Sci. Dept., Columbia University, New York, 1986.
[5] Boult, T. E., and Sikorski, K. Complexity of computing topological degree of Lipschitz functions in N dimensions. J. Complexity, 2: 44–59, 1986.
[6] Boult, T. E., and Sikorski, K. An optimal complexity algorithm for computing the topological degree in two dimensions. SIAM J. Sci. Stat. Comp., 10(4): 686–698, 1989.
[7] Eiger, A., Sikorski, K., and Stenger, F. A method of bisection for solving n nonlinear equations. ACM Trans. Math. Software, 10: 376–377, 1984.
[8] Hansen, E. Global optimization using interval analysis: the multidimensional case. Numer. Math., 34: 247–270, 1980.
[9] Hansen, E., and Sengupta, S. Bounding solutions of systems of equations using interval analysis. BIT, 21: 203–211, 1981.
[10] Harvey, C., and Stenger, F. A two-dimensional analogue to the method of bisection for solving nonlinear equations. Quart. Appl. Math., 33: 351–368, 1976.
[11] Heinz, E. An elementary analytic theory of the degree of mapping in n-dimensional space. J. Math. Mechanics, 8(2): 231–247, 1959.
[12] Jeppson, M. M. A search for the fixed points of a continuous mapping, in Mathematical Topics in Economic Theory and Computation (R. H. Day and S. M. Robinson, Eds.). SIAM, Philadelphia, 122–129, 1972.
[13] Kearfott, R. B. Computing the Degree of Maps and a Generalized Method of Bisection. Ph.D. dissertation, University of Utah, Salt Lake City, 1977.
[14] Kearfott, R. B. An efficient degree-computation method for a generalized method of bisection. Numer. Math., 32: 109–127, 1979.
[15] Kearfott, R. B. On a general technique for finding directions proceeding from bifurcation points, in Numerical Methods for Bifurcation Points (T. Küpper, H. D. Mittleman, and H. Weber, Eds.). Birkhäuser, Boston, 210–218, 1984.
[16] Kearfott, R. B. Abstract generalized bisection and a cost bound. Math. Comp., 49: 187–202, 1987.
[17] Kearfott, R. B. Some tests of generalized bisection. ACM ToMS, 13(3): 197–220, 1987.
[18] Kearfott, R. B. Interval Newton/generalized bisection when there are singularities near roots. Ann. Operations Res., 25: 181-196, 1990.
[19] Kearfott, R. B. Rigorous Global Search: Continuous Problems. Kluwer Academic Publishers, Boston, London, Dordrecht, 1996.
[20] Ortega, J. M., and Rheinboldt, W. C. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.
[21] Peitgen, H. O., and Prüfer, M. A constructive approach to global bifurcation, the Schauder continuation method and coincidence problems, in Approximation of Fixed Points and Functional Differential Equations (H. O. Peitgen, Ed.). Springer Lecture Notes, Springer-Verlag, New York, 1979.
[22] Prüfer, M., and Siegberg, H. W. On computational aspects of topological degree in R^n. Preprint No. 252, Sonderforschungsbereich 72, Approximation und Optimierung, Universität Bonn, 1978.
[23] Prüfer, M. Calculating global bifurcation, in Continuation Methods (H. Wacker, Ed.). Academic Press, New York, 187–214, 1978.
[24] Sikorski, K. A three-dimensional analogue to the method of bisection for solving nonlinear equations. Math. Comp., 33: 722–738, 1979.
[25] Sikorski, K. Bisection is optimal. Numer. Math., 40: 111–117, 1982.
[26] Sikorski, K. Minimal Number of Function Evaluations for Computing Topological Degree in Two Dimensions. Tech. Rep., Department of Computer Science, Columbia University, 1984.
[27] Stenger, F. An algorithm for computing the topological degree of a mapping in R^n. Numer. Math., 25: 23–28, 1976.
[28] Stynes, M. An algorithm for numerical calculation of topological degree. Appl. Anal., 9: 63–77, 1979.
[29] Stynes, M. A simplification of Stenger's topological degree formula. Numer. Math., 33: 147–156, 1979.
[30] Stynes, M. On the construction of sufficient refinements for computation of topological degree. Numer. Math., 37: 453–462, 1981.
[31] Traub, J. F., Wasilkowski, G. W., and Wozniakowski, H. Information, Uncertainty, and Complexity. Addison-Wesley, Reading, MA, 1983.
[32] Traub, J. F., and Wozniakowski, H. A General Theory of Optimal Algorithms. Academic Press, New York, 1980.
[33] Traub, J. F., and Wozniakowski, H. Information and computation. Adv. Computers, 23: 35–92, 1984.
[34] Vrahatis, M. N. Solving systems of nonlinear equations using the nonzero value of the topological degree. ACM ToMS, 14(4): 312–336, 1988.
[35] Yokoyama, M. Computing topological degree using noisy information. J. Complexity, 6: 379–388, 1990.
[36] Yokoyama, M. Kearfott's algorithm with noisy information derives topological degree. J. Complexity, 13: 272–278, 1997.
Index

bisection method, 4, 5, 27 bisection-envelope algorithms, 173 bisection-envelope hybrid method, 123 bisection-Newton method, 101 bisection-secant method, 101, 103, 104 Bland, R. G., 164, 165 Blum, L., 19, 20, 109 Booth, R. S., 98, 109 Boult, T., 98, 109, 188, 190,
Abaffy, J., 108 Aberth, O., 226, 228 absolute error criterion, 5, 8, 123 adaptive stopping criterion, 99 adversary principle, 48 Aird, T. J., 108, 109 Aitken, 107 Alefeld, G., 109 algorithm, 11 simple iteration, 121 Allgower, E., 109, 162, 165, 228 almost optimal complexity method, 18 asymptotic optimality, 56 asymptotic worst case, 56 asymptotically the best rate of convergence, 56 average case analysis, 99 average case complexity, 100
Brent, R. P., 99, 106, 107, 109 Brouwer degree, 226 Brouwer's fixed point theorem, 172 Buergisser, P., 116 Bunch, J. R., 165 Bunch-Nielsen-Sorensen algorithm, 164 Bus, J. C. P., 99, 110
Bakhvalov, N. S., 108, 109 ball iteration, 140, 142, 144 Banach, 162, 165 Banach's fixed point theorem, 121, 139 Banach space, 121 bisection algorithm, 29, 58 information, 28, 55, 58
cardinality of information, 9, 10, center of gravity method, 139 central algorithm, 15, 188 centroid method, 140, 155 Chebyshev polynomial, 42 Chernousko, F.L., 98, 110 Chou, D., 166 circumscribed ellipsoids, 144
Boult, T. (cont.), 227, 228
combinatorial algorithmic stage, 6, 9 combinatory complexity, 4 completely labeled simplex under f, 72 complexity of the problem, 6, 135 computational method, 6, 8, 11 condition number, 188 constrained approximation by polynomials, 53 continuation method, 72 contraction factor, 121, 122 contractive function, 121 convex function, 99 Cucker, F., 20 Dekker, T. J., 99, 110 Dennis, J. E., 99, 110 diameter of information (local and global), 12 dimensional reduction, 139, 146 ε-approximation, 6, 8 ε-complexity of the problem, 16, 18 Eaves, B. C., 110, 162, 166 Eichorn, B. H., 98, 110 Eiger, A., 110, 229 ellipsoid iteration, 140, 144, 150 envelope functions, 124 envelopes of the family of functions, 41 Erlikh, I. I., 163, 168 error criterion, general, 23
relative residual, 23 relative root, 23 residual, 23 root (absolute), 23 error of an algorithm, 13 local and global, 13 error tolerance, 8 extremum, 99 Fibonacci search, 98 fixed point, 121 fixed point ellipsoid method, 139 Fixed Point Envelope method, 123, 137 fixed point problem, 8 Fletcher, R., 99, 110 FPE method, 124, 131 FPE-A method, 137 Gaffney, P. W., 107, 110 Garcia, C. B., 111, 162, 166 Gelfand nth width, 37, 45, 107 general error criterion, 23, 24, 44
Georg, K., 109, 165, 228 Gill, P. E., 99, 111, 166 global error of an algorithm, 13 Goldfarb, D., 165 Golub, G. H., 165 Gould, F. J., 110 Graf, S., 111 Gross, O., 98, 99, 111 Grunbaum, B., 163, 166 Hansen, E., 229
Hansen, T., 168 harmonic mean, 124
Harvey, C., 111, 229 Heinz, E., 226, 229 Higham, N., 19, 20 Hirsch, M. D., 106, 111, 178, 189, 190 homogeneous linear system, 50, 51 Howard, R. A., 166 Howland, J. L., 166 Huang, Z., 166, 190 Hyafil, L., 98, 111 impartial refinement, 214 infinitely differentiable functions, 47, 56 information, 4, 9 complexity, 4 complexity m(ε), 17 complexity of the problem, 10 gathering stage, 6 priced, 10 with predefined cardinality, 10 with varying cardinality, 99 Information-Based Complexity, 7, 19 interior point methods, 99 interpolatory algorithm, 15 methods, 107 iterative model, 105 Jacobian, 80 Jeppson, M. M., 227, 229 Johnson, S. M., 98, 99, 111
Juhnke, F., 164, 166 Kacewicz, B., 106, 111 Kahaner, D., 99, 111 Kearfott, R. B., 111, 212, 215, 227 Kearfott's parity theorem, 216 Keller, C. L., 228 Kelley, C. T., 105, 112 Khachiyan, L., 163, 166, 168, 190 Kiefer, J., 98, 108, 112 Kim, M. H., 106, 112 Ko, Ker-I, 112 Kojima, M., 162, 166, 167 König, H., 164, 167 Kowalski, M., 20 Kronecker, 226 Kronecker's theorem, 193 Kuhn, H. W., 106, 112, 167 Kung, H. T., 106, 112 Lebesgue measure, 59 Levin, A., 167 Lickteig, T., 116 linearly independent functionals, 57 Lipschitz condition, 189 Lipschitz functions, 83, 194, 211 local diameter of information, 12 local error of algorithm, 13 local radius of information, 12 Majstrovskij, G. D., 98, 99, 112 McMullen, C., 106, 113 Meersman, R., 106, 113 Merril, O. H., 162, 167
metric, 56 Micchelli, C. A., 98, 99, 107, 113 minimal cardinality number, 90, 95, 123, 173, 179, 187, 189, 195, 196, 199, 216 minimal volume ellipsoid, 151, 157 Miranker, W. L., 98, 99, 113 Mitiagin, B. S., 163, 164, 167 Moler, C., 99, 111 Murota, K., 106, 107, 113 Murray, W., 111, 166 Nash, S., 99, 111 Natarajan, B. K., 188, 191 nearly optimal information and algorithm, 58 Nemirovsky, A. S., 98, 99, 113, 163, 167 Nemirovsky-Yudin-Shor method, 139, 144 Nesterov, Y., 113 Newman, D. J., 163, 168 Nielsen, C. P., 165 Novak, E., 19, 20, 98, 99, 111, 113 numerically stable method, 7 optimal complexity method, 16 optimal error algorithm (strongly (locally) and globally), 14 optimization problems, 99 oracle calls, 6, 8 Ortega, J. M., 105, 107, 114 Osborne, M. R., 99, 114
Pallaschke, D., 164, 167 Papadimitriou, C., 178, 189, 190 Papageorgiou, A., 111 parallel (nonadaptive) information, 9 parity of the range matrix, 216 partial information, 10 Peitgen, H. O., 110, 230 perfect splines, 40 Plaskota, L., 20 Poincaré-Bohl Theorem, 203 polynomial equations, 46 Potra, F. A., 109 Powell, M. J. D., 107, 110, 114 Prüfer, M., 114, 230 radius of information (local and global), 12, 29 range matrix, 215 real number model of computation, 6, 7 Reeves, T. E., 228 relative error criterion, 123 residual criterion, 24, 46 root criterion, 24, 44 Renegar, J., 98, 106, 107, 114 residual error criterion, 5, 8, 24, 37 Rheinboldt, W. C., 105, 107, 114, 168, 230 Rice, J. R., 108, 109 Ritter, K., 98, 99, 101, 113, 115 Rivlin, T. J., 107, 113 root criterion, 24, 34 s-homogeneous error, 45
Saari, D. G., 106, 115 Saigal, R., 162, 166 Saunders, M. A., 166 Scarf, H., 121, 162, 168 Schnabel, R., 99, 110 Schönhage, A., 98, 106, 115 Sengupta, S., 229 sequential (adaptive) information, 9 Shi, X. Z., 163, 169 Shi, Y., 109 Shor, N. Z., 163, 168 Shub, M., 19, 20, 98, 99, 106, 107, 109, 115, 116 Siegberg, H. W., 114, 230 Sikorski, K., 20, 98, 109, 110, 116, 166, 168, 188, 190, 191, 227, 228, 230 Simon, C. P., 106, 115 simple iteration method, 121, 131, 139, 162 simplicial continuation method, 71 Smale, S., 19, 20, 98, 99, 106, 107, 111, 115, 116, 168 Sobol, I. M., 108, 117 solution operator, 8 Sorensen, D. C., 165 Spedicato, E., 108 Sperner's triangle, 72 Steinhagen, 164 Stenger, F., 20, 110, 117, 226, 227, 229, 230 strongly optimal algorithm, 91 Stynes, M., 117, 226, 230 sufficient refinement, 196 Sukharev, A. G., 98, 99, 108, 117 Swaminathan, S., 168
Talman, A.J.J., 167 Tarasov, S. P., 163, 168 termination functions, 10 procedure, 9 Tikhomirov, V. M., 107, 117 Tischler, D., 116 Todd, M., 98, 107, 110, 117, 162, 165, 169 topological degree, 8, 71, 72, 195, 211 nonzero, 71 of Lipschitz functions, 193 Traub, J. F., 20, 105, 106, 107, 108, 112, 117, 118, 162, 164, 169, 191, 231 Trojan, G. M., 106, 116, 118 Tsay, Ch. W., 163, 169, 191 unit cube, 83 unit information operations, 6, 9 univariate polynomials, 46 Vaillancourt, R., 166 van der Laan, G., 167 Van Loan, C., 166 Vavasis, S., 99, 118, 178, 189, 190 Vrahatis, M. N., 231 Wang, Z., 112 Wasilkowski, G., 20, 106, 107, 118, 169, 191, 231 Werschulz, A. G., 20, 21 Wiener measure, 99, 101 Williams, R. F., 116
Williamson, Jr. T. E., 163, 169 Winograd, S., 106, 107, 110, 113 Wolfe, P., 106, 110 worst case complexity, 6 worst case cost, 16 Wozniakowski, H., 19, 20, 21, 98, 99, 105, 106, 107, 108, 116, 118, 119, 169, 231 Wright, M. H., 111
Xu, S., 112 Xu, Z. B., 163, 169 Yamamoto, Y., 162, 166, 167 Yokoyama, M., 227, 231 Yudin, D. B., 99, 113, 163, 167 Zangwill, W. I., 111, 162, 166 zero finding problem, 8 zeros of nonlinear functions, 23