Selected Topics in Approximation and Computation
Marek A. Kowalski UNIVERSITY OF WARSAW
Krzysztof A. Sikorski UNIVERSITY OF UTAH
Frank Stenger
UNIVERSITY OF UTAH
New York Oxford OXFORD UNIVERSITY PRESS 1995
Oxford University Press Oxford New York Athens Auckland Bangkok Bombay Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto and associated companies in Berlin Ibadan
Copyright © 1995 by Oxford University Press, Inc. Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016. Oxford is a registered trademark of Oxford University Press, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Kowalski, Marek A., 1953-
Selected topics in approximation and computation / Marek A. Kowalski, Krzysztof A. Sikorski, Frank Stenger.
p. cm.
Includes bibliographical references (p. - ) and index.
ISBN 0-19-508059-9
1. Approximation theory. I. Sikorski, Krzysztof A., 1953- . II. Stenger, Frank. III. Title.
QA221.K69 1995 511'.4-dc20 95-6224
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
To our wives and children
Preface

This text covers classical basic results of approximation theory. It also contains new developments in the theory of moments and Sinc approximation, as well as n-widths, s-numbers, and the relationship of these concepts to computational complexity. In addition, the text contains several computational algorithms. Chapter 1 covers basic concepts of classical approximation. In Section 1.1, the classical and basic concepts of approximation theory are couched in the language of functional analysis. Thus, it is most convenient to cover the concepts of existence and uniqueness of best approximation in a normed space setting. From the point of view of both theory and application, the inner product space is an important normed space in which to study best approximation. In this space it is convenient to study approximation in various polynomial settings, such as via classical orthogonal polynomials and via Cardinal, or Sinc, approximation. These concepts are covered in Section 1.2 of the text. Concepts of approximation in the uniform norm are covered in Section 1.3. These concepts are also conveniently presented in a Banach space setting. Important examples from a classical standpoint include polynomial approximation with respect to the uniform norm. Chapter 2 deals with spline methods of approximation. Polynomials are historically the most popular tools of approximation since they are easy to compute. However, an interpolatory polynomial of high degree does not do a good job of approximating arbitrary data, since it is then nearly always the case that the polynomial also has large overshoots and undershoots between data points. On the other hand, splines, which are piecewise polynomials discussed in Chapter 2, are very convenient for approximating data, particularly data that is contaminated with noise. This is especially true of the practically important B-splines, which have variation-diminishing properties when used to approximate data (i.e., the total variation of the spline approximant is no more than that of the
data). Thus splines provide particularly useful methods of approximation in the important areas of computer-aided geometric design and for representing computer graphics displays. Sinc methods, discussed in Chapter 3, are ideal for approximating functions that may have singularities at end-points of an interval. Spline methods are ideal for the approximation of data, and polynomial methods are ideal for approximating analytic functions that have no singularities on the interval of approximation. For example, if a function is analytic in a region containing an interval I, then we can achieve an O(exp{-bn}) error in a degree n polynomial approximation of the function on I, where b is some positive constant. This rapid exponential rate of decrease of the error reduces to a drastically slow O(n^{-b}) rate in the case when the function has a singularity on I. On the other hand, using n-point Sinc approximation, we can achieve an O(exp{-b n^{1/2}}) error of approximation whether or not the function has an end-point singularity on I. Also, it turns out that Sinc methods provide simple-to-use, accurate approximation tools for every operation of calculus, including the approximation of Hilbert transforms, the approximation of derivatives, the approximation of definite and indefinite integration, the approximation and inversion of Laplace transforms, and the approximation of definite and indefinite convolution integrals. While an O(exp{-b' n^{1/2}}) error is also possible via spline approximation, by suitable choice of both the mesh and the degree of the spline on each subinterval, the constant b' in this exponential rate of convergence is usually not as large as the corresponding constant b for Sinc approximation. In Chapter 4 we present a family of simple rational functions, which make possible the explicit and arbitrarily accurate rational approximation of the filter, the step, and impulse functions.
Moment problems are conveniently discussed in the setting of approximation theory in Chapter 5. Included among the well known moment problems are the discrete and continuous moment problems named after Hausdorff, Stieltjes, and Hamburger, as well as the discrete and continuous trigonometric moment problems. We also include the Sinc moment problem, whose solution in the appropriate Sinc space is relatively easy and devoid of the difficulties that one encounters using the usual monomial bases. Chapter 6 deals with some rather "deep" concepts of approximation: n-widths and s-numbers. An important and historic problem of approximation theory is to achieve a practically accurate approximation to a function f by a polynomial of a certain degree. Next, one might want to know the error of best approximation of the function f by polynomials of degree n. Following this, we could identify a class of functions F to which the function f to be approximated belongs, and we could then determine the maximum error of best approximation, as f is varied throughout F. Finally, we could vary our tools of approximation, i.e., the classes of n-dimensional subspaces (e.g., not just polynomials), and so deduce the best method of approximation of functions in F. Thus we are able to explore limits of approximability, a knowledge which is both practically and theoretically worthwhile. Chapter 7 discusses optimal methods of approximation and optimal algorithms for general, nonlinear approximation problems. It relates approximability with the amount of work required to achieve a certain accuracy. These concepts are discussed in the general setting of normed spaces. Later, they are connected with splines and also with the concepts of n-widths and s-numbers discussed in the previous chapter. Chapter 8 illustrates applications of the approximation theory of the previous chapters. There we discuss the solution of Burgers' equation, the approximation of band-limited signals, and a nonlinear zero-finding problem. Each section of the text ends with a set of exercises. Each chapter closes with annotations, which include historical remarks that indicate the source of the material. References follow at the end of each chapter. Exercises are numbered from 1 to 180 globally throughout the text. Theorems, lemmas, corollaries, examples, and figures are numbered consecutively on each page, and have the page number attached to their name. For example, Lemma 105.1 is the name of the first lemma on page 105, and similarly, Example 45.2 is the name of the second example on page 45. There are no references to the literature inside of the text. All references are discussed in the annotations to each chapter. Numbered formulas are almost eliminated to provide more structured text.
In the few remaining cases they are numbered again according to the page system described above (i.e., formula 15.1 refers to the first formula on page 15). We believe that the special format chosen will best serve the reader by providing a more structured and self-contained text.
We are very grateful for the criticisms of and remarks on this manuscript given to us by L. Plaskota, G. Wasilkowski, and H. Wozniakowski. We also wish to acknowledge the support of the IBM Corporation (Sikorski, Stenger), the NSF and the University of Utah under the ACERC center (Sikorski), and the Committee for Scientific Research - KBN (Kowalski) during parts of the duration of this project.

M. A. Kowalski, K. Sikorski, and F. Stenger, December 1994
Contents

1 Classical Approximation 1
1.1 General results 1
1.1.1 Exercises 12
1.2 Approximation in unitary spaces 13
1.2.1 Computing the best approximation 17
1.2.2 Completeness of orthogonal systems 20
1.2.3 Examples of orthogonal systems 21
1.2.4 Remarks on convergence of Fourier series 34
1.2.5 Exercises 36
1.3 Uniform approximation 39
1.3.1 Chebyshev subspaces 42
1.3.2 Maximal functionals 47
1.3.3 The Remez algorithm 56
1.3.4 The Korovkin operators 58
1.3.5 Quality of polynomial approximations 63
1.3.6 Converse theorems in polynomial approximation 66
1.3.7 Projection operators 72
1.3.8 Exercises 83
1.4 Annotations 87
1.5 References 89

2 Splines 93
2.1 Polynomial splines 93
2.1.1 Exercises 102
2.2 B-splines 103
2.2.1 General spline interpolation 109
2.2.2 Exercises 110
2.3 General splines 111
2.3.1 Exercises 114
2.4 Annotations 114
2.5 References 115

3 Sinc Approximation 117
3.1 Basic definitions 117
3.1.1 Exercises 125
3.2 Interpolation and quadrature 126
3.2.1 Exercises 132
3.3 Approximation of derivatives on Γ 134
3.3.1 Exercises 136
3.4 Sinc indefinite integral over Γ 136
3.4.1 Exercises 139
3.5 Sinc indefinite convolution over Γ 139
3.5.1 Derivation and justification of procedure 141
3.5.2 Multidimensional indefinite convolutions 146
3.5.3 Two dimensional convolution 147
3.5.4 Exercises 149
3.6 Annotations 150
3.7 References 150

4 Explicit Sinc-Like Methods 153
4.1 Positive base approximation 153
4.1.1 Exercises 158
4.2 Approximation via elliptic functions 158
4.2.1 Exercises 160
4.3 Heaviside, filter, and delta functions 161
4.3.1 Heaviside function 162
4.3.2 The filter or characteristic function 163
4.3.3 The impulse or delta function 164
4.3.4 Exercises 166
4.4 Annotations 166
4.5 References 166

5 Moment Problems 169
5.1 Duality with approximation 170
5.1.1 Exercises 175
5.2 The moment problem in the space C_0(D) 175
5.3 Classical moment problems 178
5.3.1 Exercises 185
5.4 Density and determinateness 189
5.4.1 Exercises 203
5.5 A Sinc moment problem 205
5.5.1 Exercises 206
5.6 Multivariate orthogonal polynomials 206
5.6.1 Exercises 218
5.7 Annotations 219
5.8 References 220

6 n-Widths and s-Numbers 223
6.1 n-Widths 223
6.1.1 Relationships between n-widths 229
6.1.2 Algebraic versions of a_n and c_n 235
6.1.3 Exercises 236
6.2 s-Numbers 237
6.2.1 s-Numbers and singular values 240
6.2.2 Relationships between s-numbers 246
6.2.3 Exercises 255
6.3 Annotations 255
6.4 References 256

7 Optimal Approximation Methods 259
7.1 A general approximation problem 262
7.1.1 Radius of information - optimal algorithms 264
7.1.2 Exercises 270
7.2 Linear problems 270
7.2.1 Optimal information 276
7.2.2 Relations to n-widths 281
7.2.3 Exercises 285
7.3 Parallel versus sequential methods 286
7.3.1 Exercises 290
7.4 Linear and spline algorithms 291
7.4.1 Spline algorithms 295
7.4.2 Relations to linear Kolmogorov n-widths 302
7.4.3 Exercises 304
7.5 s-Numbers, minimal errors 304
7.5.1 Exercises 309
7.6 Optimal methods 310
7.6.1 Optimal complexity methods for linear problems 312
7.6.2 Exercises 314
7.7 Annotations 314
7.8 References 316

8 Applications 319
8.1 Sinc solution of Burgers' equation 319
8.2 Signal recovery 321
8.2.1 Formulation of the problem 321
8.2.2 Relations to n-widths 322
8.2.3 Algorithms and their errors 325
8.2.4 Asymptotics of minimal cost 332
8.2.5 Exercises 333
8.3 Bisection method 334
8.3.1 Formulation of the problem 334
8.3.2 Optimality theorem 335
8.3.3 Exercises 340
8.4 Annotations 340
8.5 References 340

Index 343
Chapter 1
Classical Approximation

In this chapter we acquaint the reader with the theory of approximation of elements of normed spaces by elements of their finite dimensional subspaces. The theory of best approximation was originated between 1850 and 1860 by Chebyshev. His results and ideas have been extended and complemented in the 20th century by other eminent mathematicians, such as Bernstein, Jackson, and Kolmogorov. Initially, we present the classical theory of best approximation in the setting of normed spaces. Next, we discuss best approximation in unitary (inner product) spaces, and we present several practically important examples. Finally, we give a reasonably complete presentation of best uniform approximation, along with examples, the Remez algorithm, and converse theorems about best approximation.
1.1 General results
The goal of this section is to present some general results on approximation in normed spaces. Let F be a linear space (over the field of complex or real numbers) endowed with the norm || · || and let V be a finite dimensional subspace of F. Suppose that f is an element of F. We wish to determine an element v in V satisfying

||f - v|| = inf_{g ∈ V} ||f - g|| = e(f, V).

At this moment it is natural to ask whether such an element v exists. It turns out that the answer is positive regardless of the structure of the spaces F and V. Namely, the following theorem holds.
Theorem 2.1 Under the above assumptions, the set

β(f, V) = {v ∈ V : ||f - v|| = e(f, V)}

is nonempty, convex, and compact.

Proof Since 0 ∈ V we have

e(f, V) ≤ ||f - 0|| = ||f||.

Let us now consider an arbitrary element g in V satisfying ||g|| > 2||f||. We note that

||f - g|| ≥ ||g|| - ||f|| > ||f|| ≥ e(f, V).

Hence, g ∉ β(f, V) and β(f, V) ⊂ B = {g ∈ V : ||g|| ≤ 2||f||}. Consequently, e(f, V) = inf_{g ∈ B} ||f - g||. We remind the reader that any bounded and closed subset of a finite dimensional space is compact and that any continuous function on a compact set attains its infimum on the set. Since B is compact and the function

g ↦ ||f - g||

is continuous, we get the existence of a v ∈ V such that ||f - v|| = e(f, V). Thus, β(f, V) is nonempty. For arbitrary elements v_1, v_2 ∈ β(f, V) and for any number α ∈ [0, 1] we have

||f - (α v_1 + (1 - α) v_2)|| ≤ α ||f - v_1|| + (1 - α) ||f - v_2|| = e(f, V).

Hence, the set β(f, V) is convex. Since it is also closed, the proof is complete. ∎

Henceforth, an element v ∈ β(f, V) will be called a best approximation of f with respect to V, and the quantity e(f, V) will be referred to as the best approximation error of f with respect to V. Given a positive number r, we denote by B(f, r) the ball of radius r about f. From the proof of Theorem 2.1 we get the following result.
Corollary 3.1 The best approximation error e(f, V) can be rewritten as

e(f, V) = min{r ≥ 0 : B(f, r) ∩ V ≠ ∅}.

Moreover, if the above minimum is attained at r = ρ, then

β(f, V) = B(f, ρ) ∩ V.
Some properties of an element f ∈ F can be carried over to its best approximation, as shown in the following theorem.

Theorem 3.1 If f ∈ F and A : F → F is a linear operator such that ||A|| := sup_{||g|| ≤ 1} ||Ag|| ≤ 1, A(V) ⊂ V and Af = f, then there is a best approximation h of f with respect to V satisfying Ah = h.

Proof Indeed, for any best approximation g ∈ V of f we have

||f - Ag|| = ||Af - Ag|| ≤ ||A|| ||f - g|| ≤ ||f - g|| = e(f, V).

Thus, the operator A continuously maps the set β(f, V) into itself. If β(f, V) is a singleton, then, of course, Ag = g. Otherwise, we use the Brouwer fixed point theorem, which reads as follows. Let D be a convex and compact subset of R^n and let F : D → D be a continuous mapping. Then there exists a point x in D such that F(x) = x. Since the dimension n is arbitrary and R^{2n} can be regarded as C^n, the Brouwer theorem readily extends to subsets of C^n. Let K be the field of scalars in F (K = R or K = C) and let the elements v_1, v_2, ..., v_n form a basis of V. Since A(V) ⊂ V, we have

A(c_1 v_1 + c_2 v_2 + ... + c_n v_n) = d_1 v_1 + d_2 v_2 + ... + d_n v_n,

where c_k, d_k ∈ K and each d_k depends linearly and continuously on c_1, c_2, ..., c_n. Thus, given a point (c_1, c_2, ..., c_n) ∈ K^n the equation above defines a linear and continuous mapping F of K^n into K^n. By Theorem 2.1, the set β(f, V) is nonempty, convex, and compact, and so is the set

D = {(c_1, c_2, ..., c_n) ∈ K^n : c_1 v_1 + ... + c_n v_n ∈ β(f, V)}.
Since A(β(f, V)) ⊂ β(f, V), we get F(D) ⊂ D. Thus, by the Brouwer theorem, there exists a point x = (x_1, x_2, ..., x_n) ∈ D such that F(x) = x. Finally, setting h = Σ_{k=1}^n x_k v_k we obtain h ∈ β(f, V) and Ah = h. The proof is complete. ∎

Let us now consider the following two examples that illustrate the problem of uniqueness of best approximation.

Example 4.1 Let F be the Cartesian plane R^2 with the norm

||(x, y)|| = (x^2 + y^2)^{1/2}.

Let V be an arbitrary one dimensional subspace of F and let f be any element in F. Thus, any ball in F and any such subspace V represent on the plane a disk and a straight line containing the origin, respectively. The situation is shown in Figure 4.1. From this figure and Corollary 3.1 we see that there exists a unique best approximation element to each f ∈ F with respect to V. ∎

Figure 4.1: Approximation on the Cartesian plane

Example 4.2 As in the previous example let F = R^2. We redefine the norm in F by the equation

||(x, y)|| = |x| + |y|.

Thus, balls in F are now squares on the plane. Two sides of these squares are always parallel to the straight lines y = ±x (see Figure 5.1). Let V be a one dimensional subspace of F. Since any point in V is clearly its unique best approximation with respect to V, we assume that the element f to be approximated is in F \ V.
Based on Corollary 3.1 and Figure 5.2 we see that β(f, V) is a proper closed segment of V if the slope of V is ±π/4 (i.e., if V is one of the lines y = ±x), and that β(f, V) is a singleton if the slope of V is different from ±π/4. ∎

Figure 5.1: Unit ball in R^2 with the norm ||(x, y)|| = |x| + |y|
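The tie described in Example 4.2 is easy to observe numerically. The following sketch (an illustration in Python, assuming the numpy package; the grid search is ours, not part of the classical material) minimizes ||f - t v|| over a grid of coefficients t for f = (1, 0) and V spanned by v = (1, 1):

```python
import numpy as np

def err_1norm(f, v, t):
    # the norm of Example 4.2: ||(x, y)|| = |x| + |y|
    r = f - t * v
    return abs(r[0]) + abs(r[1])

f = np.array([1.0, 0.0])           # element to be approximated
v = np.array([1.0, 1.0])           # V = span{v} has slope pi/4
ts = np.linspace(-1.0, 2.0, 3001)
errs = np.array([err_1norm(f, v, t) for t in ts])
best = errs.min()
ties = ts[np.isclose(errs, best)]  # all grid points attaining the minimum
# every t in [0, 1] attains the minimum: beta(f, V) is a whole segment
print(best, ties.min(), ties.max())
```

Replacing v by a direction whose slope differs from ±π/4, e.g. v = (1, 2), the same search returns a single minimizer, in agreement with the discussion above.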
Figure 5.2: Sets of best approximations

The examples above show that the uniqueness of best approximation depends on the geometry of the space F (induced by the norm), as well as on the subspace V and the element f. We should like to point out that the approximation problem in Example 4.2 has many solutions because the unit sphere
contains intervals, i.e., sets of the form

[f, g] = {α f + (1 - α) g : 0 ≤ α ≤ 1},

where f, g ∈ F and f ≠ g. We shall now characterize all normed spaces F such that the approximation problem has a unique solution regardless of the choice of V and f.

Theorem 6.1 The following conditions are equivalent:

(a) For any finite dimensional subspace V of F and any element f of F there exists a unique best approximation of f with respect to V.

(b) For any two distinct elements f and g such that ||f|| = ||g|| = 1 we have ||f + g|| < 2.

(c) The unit sphere ∂B(0, 1) in F does not contain any interval.

Proof We shall first prove the equivalence between (b) and (c). We assume that the condition (c) is not satisfied. Hence, the sphere ∂B(0, 1) contains some interval, say [f, g]. Then ||f|| = ||g|| = 1, f ≠ g and

||(f + g)/2|| = 1.

Thus, ||f + g|| = 2 and (b) is not satisfied. We now assume that the condition (c) holds. Hence, for any distinct elements f and g of ∂B(0, 1) the interval [f, g] is not contained in ∂B(0, 1), i.e., ||α f + (1 - α) g|| < 1 for some α, 0 < α < 1. If α = 1/2, then ||f + g|| < 2. Let α ≠ 1/2. Since [f, g] ⊂ B(0, 1), we have ||(1 - α) f + α g|| ≤ 1. Consequently,

||f + g|| = ||(α f + (1 - α) g) + ((1 - α) f + α g)|| ≤ ||α f + (1 - α) g|| + ||(1 - α) f + α g|| < 2.

Hence (b) follows. This proves that (b) and (c) are equivalent. We now show that (b) implies (a). Assume that some element f ∈ F has two different best approximations h_1 and h_2 with respect to some finite dimensional subspace V of F. Then

f - (h_1 + h_2)/2 = e(f, V) (g_1 + g_2)/2,

where

g_1 = (f - h_1)/e(f, V),   g_2 = (f - h_2)/e(f, V).

Since g_1 ≠ g_2 and ||g_1|| = ||g_2|| = 1 the condition (b) yields
||f - (h_1 + h_2)/2|| = e(f, V) ||(g_1 + g_2)/2|| < e(f, V),

which contradicts the definition of e(f, V). Thus, (b) implies (a). It remains to show that when (c) does not hold, we can find an element f in F possessing many best approximations with respect to some finite dimensional subspace V of F. Indeed, if an interval [g, h] is contained in ∂B(0, 1), then for any number α ∈ [0, 1] we have

||-h - α(g - h)|| = ||(1 - α) h + α g|| = 1.
Thus, for 0 < a < 1 the elements a(g — h) are best approximations of —h with respect to the space spanned by g — h. The proof is complete. • We shall call the normed space T strictly convex if the conditions of Theorem 6.1 are satisfied. The spaces lp and L p (0,1) for 1 < p < oo are classical examples of strictly convex spaces (see Exercises at the end of this section). Of course, the space defined in Example 4.2 is not strictly convex. Let us also note that the spaces Za(0,1), Loo(0,1) and (7(0,1) are not strictly convex. We shall now provide some important facts on the best approximation error e(f,V). Theorem 7.1
(a) For any elements f and g in F the following properties hold:

e(f + g, V) ≤ e(f, V) + e(g, V),   e(α f, V) = |α| e(f, V),   |e(f, V) - e(g, V)| ≤ ||f - g||.

Thus, the mapping F ∋ f ↦ e(f, V) is continuous and defines a seminorm in F.

(b) Let L be a linear functional on F satisfying the conditions

(i) ||L|| := sup_{||g|| ≤ 1} |L(g)| ≤ 1,   (ii) L(v) = 0 for all v ∈ V.

Then for any f in F we have

|L(f)| ≤ e(f, V).

Moreover, for each f in F there is a linear functional L on F having the properties (i), (ii), such that

|L(f)| = e(f, V).
Proof An easy verification of the statement (a) is left to the reader as an exercise. In order to prove (b), let f ∈ F and let h be a best approximation of f with respect to V. Then,

|L(f)| = |L(f - h)| ≤ ||L|| ||f - h|| ≤ e(f, V).

We now show how to choose L for which e(f, V) = |L(f)|. To this end we use the Hahn-Banach theorem, which follows. If V is an arbitrary linear subspace of the normed space F and λ : V → C is a linear functional, then there exists a linear functional Λ : F → C such that ||Λ|| = ||λ|| and Λ(f) = λ(f) for all f ∈ V. Firstly, let us note that when e(f, V) = 0, the zero functional does the job. Hence, we assume that e(f, V) > 0 and consider the linear space V_0 spanned by f and V. The elements of V_0 have the form α f + g, where α is a scalar and g ∈ V. We now define a linear functional L_0 on V_0 by requiring that

L_0(α f + g) = α e(f, V).

Since ||α f + g|| ≥ |α| e(f, V), we have

|L_0(α f + g)| ≤ ||α f + g||,   i.e., ||L_0|| ≤ 1,

and

L_0(f) = e(f, V),   L_0(v) = 0 for all v ∈ V.

Now, from the Hahn-Banach theorem we conclude the existence of a linear functional L on the space F such that

||L|| = ||L_0|| ≤ 1   and   L(g) = L_0(g) for all g ∈ V_0.

Hence, the proof is complete. ∎

It is useful to know whether the best approximation depends continuously on the element to be approximated. It turns out that the continuity is a consequence of the uniqueness.
Theorem 9.1 Let V be a finite dimensional subspace of F, such that each element f in F has a unique best approximation b(f) with respect to V. Then the mapping

F ∋ f ↦ b(f) ∈ V

is continuous.

Proof Let us assume to the contrary that for some elements f_j ∈ F, j = 0, 1, 2, ..., such that

lim_{j→∞} f_j = f_0,

the sequence {b(f_j)}_{j=1}^∞ does not converge to the element b(f_0). Since this sequence is clearly bounded and belongs to a finite dimensional space, it contains a subsequence {b(f_{j_k})}_{k=1}^∞ that converges to an element v ∈ V, v ≠ b(f_0). From the assertion (a) of Theorem 7.1 it follows that

lim_{k→∞} e(f_{j_k}, V) = e(f_0, V).

On the other hand, by the triangle inequality, we have

||f_0 - b(f_{j_k})|| ≤ ||f_0 - f_{j_k}|| + ||f_{j_k} - b(f_{j_k})|| = ||f_0 - f_{j_k}|| + e(f_{j_k}, V).

Thus, it now follows that the element v ∈ V best approximates f_0, ||f_0 - v|| = e(f_0, V). Indeed,

||f_0 - v|| = lim_{k→∞} ||f_0 - b(f_{j_k})|| ≤ lim_{k→∞} (||f_0 - f_{j_k}|| + e(f_{j_k}, V)) = e(f_0, V).

This fact contradicts the uniqueness of b(f_0) and completes the proof. ∎
We recall that a subset W of F is said to be linearly dense in F if each element of F can be approximated to arbitrary accuracy by a linear combination of elements in W. Then the linear subspace span W spanned by all elements in W is called dense in F.

Theorem 9.2 A subset W ⊂ F is linearly dense in F iff any linear and continuous functional L : F → C that vanishes on W must vanish identically.

Proof If a continuous functional L : F → C vanishes on W, then L vanishes on span W and, by continuity, on the closure of span W. Assuming that W is linearly dense in F we get L = 0, since the closure of span W is all of F.

We assume now that W is not linearly dense in F. Thus, for some f ∈ F we have E := inf_{v ∈ span W} ||f - v|| > 0. We define a linear functional λ : {α f - v : α ∈ C, v ∈ span W} → C by the equation λ(α f - v) = α E. Since |α| E = inf_{v ∈ span W} ||α f - v||, we get

|λ(α f - v)| = |α| E ≤ ||α f - v||,   i.e., ||λ|| ≤ 1.

By the Hahn-Banach theorem, there exists a linear functional Λ : F → C such that ||Λ|| = ||λ|| and Λ(g) = λ(g) for all g in the domain of λ. Hence, Λ vanishes on W but does not vanish identically since Λ(f) = E > 0. This completes the proof. ∎

We close this section with showing that for some elements f in a Banach space F the approximation errors e(f, V_k) can converge to zero arbitrarily slowly.

Theorem 10.1 Let F be a Banach space and let

V_0 ⊂ V_1 ⊂ V_2 ⊂ ...

be its finite dimensional distinct subspaces. Then, for every sequence of positive numbers α_0, α_1, α_2, ... satisfying

α_0 > α_1 > α_2 > ...,   lim_{k→∞} α_k = 0,

there exists an element f in F such that

e(f, V_k) = α_k,   k = 0, 1, 2, ....

Proof We first show that for every nonnegative integer n we can find an element f_n ∈ F such that

e(f_n, V_k) = α_k,   k = 0, 1, ..., n.
To this end, we pick k ∈ N and g ∈ F \ V_k. Then e(g, V_k) ≠ 0 and for g_k = α_k e(g, V_k)^{-1} g we have

e(g_k, V_k) = α_k.

Let v_k be a best approximation to g_k, ||g_k - v_k|| = e(g_k, V_k) = α_k. We now note that for an arbitrary h_k ∈ V_k \ V_{k-1} the function

φ(x) = e(g_k - v_k + x h_k, V_{k-1})

is continuous, lim_{x→∞} φ(x) = ∞ and φ(0) = e(g_k, V_k). Thus, there exists a number x_k such that φ(x_k) = α_{k-1}. We now set

g_{k-1} = g_k - v_k + x_k h_k,

and we note that e(g_{k-1}, V_{k-1}) = α_{k-1} and e(g_{k-1}, V_k) = α_k. Repeating the argument above for k = n, n - 1, ..., 1 we obtain a sequence of elements g_n, g_{n-1}, ..., g_0 such that

e(g_0, V_k) = α_k,   k = 0, 1, ..., n.

Hence, f_n = g_0 is the desired element. Let h_{k,n} ∈ V_k be a best approximation of f_n with respect to V_k. We have ||f_n - h_{k,n}|| = e(f_n, V_k) = α_k for k ≤ n. Since

||h_{k,n}|| ≤ ||f_n|| + ||f_n - h_{k,n}|| = ||f_n|| + α_k,

each sequence {h_{k,n}}_{n=0}^∞ is bounded and contains a convergent subsequence {h_{k,n_j(k)}}_{j=0}^∞. Without loss of generality we assume that {n_j(k+1)}_{j=0}^∞ is a subsequence of {n_j(k)}_{j=0}^∞. We now set

h_k = lim_{j→∞} h_{k,n_j(k)},

and note that corresponding to each k there exists an integer s_k such that

||h_k - h_{k,n_j(k)}|| ≤ α_k for all j ≥ s_k.

Of course, we may assume henceforth that the sequence {s_k}_{k=0}^∞ is increasing. We have

||f_{n_j(k)} - h_{k,n_j(k)}|| = α_k for n_j(k) ≥ k,

and

||h_k - h_{k,n_j(k)}|| ≤ α_k for j ≥ s_k.
Since {n_p(j)}_{p=0}^∞ is a subsequence of {n_p(k)}_{p=0}^∞ for j ≥ k, we also have

||f_{n_j(j)} - h_{k,n_j(j)}|| = α_k for j ≥ k,

and

||h_k - h_{k,n_j(j)}|| ≤ α_k for j ≥ max(k, s_k).

Let us note that

||f_{n_j(j)} - f_{n_p(p)}|| ≤ ||f_{n_j(j)} - h_{k,n_j(j)}|| + ||h_{k,n_j(j)} - h_k|| + ||h_k - h_{k,n_p(p)}|| + ||h_{k,n_p(p)} - f_{n_p(p)}||.

Thus, for j, p ≥ max(k, s_k) we obtain

||f_{n_j(j)} - f_{n_p(p)}|| ≤ 4 α_k,

and consequently {f_{n_j(j)}}_{j=0}^∞ is a Cauchy sequence. Since F is a Banach space, the limit f = lim_{j→∞} f_{n_j(j)} exists in F. Actually, f is the desired element since for k = 0, 1, 2, ... we have

e(f, V_k) = lim_{j→∞} e(f_{n_j(j)}, V_k) = α_k.
This concludes the proof. ∎

1.1.1 Exercises
1. When is the set of all elements v ∈ V satisfying the equation ||f - v|| = e(f, V) denumerable?

2. Show that if V is a finite dimensional subspace of a normed space F and V ≠ F, then there exists an element h in F, h ≠ 0, such that e(h, V) = ||h||.

3. Prove Corollary 3.1.

4. Let F be the space L_p(-1, 1), where 1 ≤ p ≤ ∞. Let V be a subspace of F consisting of all algebraic polynomials of degree ≤ n < ∞. Show that any even or odd function in F has a best approximation in V that is even or odd, respectively.

5. Prove that all one dimensional normed spaces are strictly convex.
6. Assume that ε > 0 and that a finite dimensional space F with a norm || · || is not strictly convex. Show that it is always possible to introduce another norm ||| · ||| that makes the space strictly convex and such that the quantity

sup_{f ≠ 0} | ||f|| - |||f||| | / ||f||

is not larger than the given positive number ε. What is a geometric interpretation of this fact?

7. Show that F is strictly convex iff for any f and g in F the condition ||f + g|| = ||f|| + ||g|| implies that f and g are linearly dependent.

8. Show that the space C(0, 1) is not strictly convex and show that the spaces l_p and L_p(0, 1) for 1 ≤ p ≤ ∞ are strictly convex iff 1 < p < ∞.

9. Prove that when dim F = dim V = ∞, Theorem 7.1 remains valid but a best approximation may not exist.

10. Prove the assertion (a) of Theorem 7.1.

11. Given a linear subspace V of F consider the linear quotient space F/V of all abstraction classes [f] corresponding to the equivalence relation {(f, g) ∈ F × F : f - g ∈ V}. Show that ||[f]|| := e(f, V) is a norm in F/V. Then prove that the operator Q : F → F/V, Qf := [f], maps the open unit ball of F onto the open unit ball of F/V. (This operator is called the canonical quotient mapping.)
1.2 Approximation in unitary spaces
Throughout this section we assume that F is a unitary space. Thus, the norm || · || in F is induced by an inner product ⟨·, ·⟩, i.e.,

||f|| = ⟨f, f⟩^{1/2}.

Two elements f, g ∈ F \ {0} are said to be orthogonal iff ⟨f, g⟩ = 0. Let V be a finite dimensional subspace of F. An element a ∈ F and the subspace V are said to be orthogonal iff

⟨a, v⟩ = 0 for all v ∈ V.
Let us note that any two elements f and g in F satisfy the parallelogram law:

||f + g||^2 + ||f - g||^2 = 2 ||f||^2 + 2 ||g||^2,

which is shown in Figure 14.1.

Figure 14.1: Parallelogram law

Indeed,

||f + g||^2 + ||f - g||^2 = ⟨f + g, f + g⟩ + ⟨f - g, f - g⟩ = 2 ⟨f, f⟩ + 2 ⟨g, g⟩ = 2 ||f||^2 + 2 ||g||^2.

Since the parallelogram law implies the condition (b) of Theorem 6.1, we see that all unitary spaces are strictly convex. In particular, given a finite dimensional subspace V of a unitary space F and given an element f in F there exists a unique element g in V satisfying

||f - g|| = e(f, V).

The following theorem provides a very useful characterization of g.

Theorem 14.1 The element g is a best approximation of f with respect to V iff f - g is orthogonal to the subspace V.

Proof If f - g is orthogonal to V, then for any v ∈ V we have v - g ∈ V and

||f - v||^2 = ||(f - g) - (v - g)||^2 = ||f - g||^2 + ||v - g||^2 ≥ ||f - g||^2.
Consequently, e(f, V) = ||f - g||. Let us now assume that f - g is not orthogonal to V. Then ⟨f - g, h_0⟩ = a ≠ 0 for some h_0 ∈ V, h_0 ≠ 0. In order to complete the proof it is enough to find an element v ∈ V such that ||f - v|| < ||f - g||. We define

g_0 = g + (a / ||h_0||^2) h_0

and note that

||f - g_0||^2 = ||f - g||^2 - |a|^2 / ||h_0||^2 < ||f - g||^2.

Thus, v = g_0 is the desired element. The proof is complete. ∎

Theorem 14.1 says that for any f ∈ F the best approximation P(f) of f with respect to V is the orthogonal projection of f on V.

Corollary 15.1 The operator F ∋ f ↦ P(f) is linear and ||P|| = 1. Moreover, for any f ∈ F we have

||f||^2 = ||P(f)||^2 + ||f - P(f)||^2.

We omit an easy proof of this corollary. Let us assume that g_1, g_2, ..., g_n are linearly independent elements of V and that

V = span{g_1, g_2, ..., g_n}.

Then the best approximation of f ∈ F with respect to V takes the form

P(f) = Σ_{k=1}^n α_k g_k.    (15.1)

By Theorem 14.1 the numbers α_k are determined by the linear equations

⟨f - Σ_{j=1}^n α_j g_j, g_k⟩ = 0,   k = 1, 2, ..., n.
Thus, we have the linear system

Σ_{j=1}^n ⟨g_j, g_k⟩ α_j = ⟨f, g_k⟩,   k = 1, 2, ..., n.    (16.1)

The matrix of this system is denoted by G(g_1, g_2, ..., g_n) and called the Gram matrix. Since the best approximation of f is unique, this matrix is nonsingular. We denote its determinant by Δ(g_1, g_2, ..., g_n). One may easily show that G = G(g_1, g_2, ..., g_n) is Hermitian, G = G^H, and positive definite,

x^H G x > 0 for all x ≠ 0.

Here, H stands for the conjugate transpose operation. When the kth column of G(g_1, g_2, ..., g_n) is replaced by the right-hand-side vector of the equation (16.1), the determinant of the resulting matrix will be denoted by Δ_k(f; g_1, g_2, ..., g_n).

Corollary 16.1 We have

P(f) = Σ_{k=1}^n (Δ_k(f; g_1, ..., g_n) / Δ(g_1, ..., g_n)) g_k

and

e(f, V)^2 = Δ(f, g_1, ..., g_n) / Δ(g_1, ..., g_n),

where Δ(f, g_1, ..., g_n) denotes the Gram determinant of the elements f, g_1, ..., g_n.

Proof It is easy to verify that the equations (15.1) and (16.1), taken together with Cramer's formula, give the desired form of P(f). Since the elements P(f) and f - P(f) are orthogonal we get

e(f, V)^2 = ||f - P(f)||^2 = ⟨f - P(f), f⟩ = ||f||^2 - Σ_{k=1}^n α_k ⟨g_k, f⟩.

Expanding the determinant Δ(f, g_1, ..., g_n) in terms of the first row we obtain the numerator of the fraction above. This observation completes the proof. ∎
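Corollary 16.1 and Theorem 14.1 can be illustrated numerically. The sketch below (Python with numpy; the choice f(x) = e^x and the monomial basis g_k(x) = x^{k-1} on L_2(0, 1) are ours, for illustration only) solves the system (16.1), checks that f - P(f) is orthogonal to each g_k, and compares e(f, V)^2 with the ratio of Gram determinants:

```python
import numpy as np

n = 4                                # V = polynomials of degree < 4
e = np.e
# Gram matrix of g_k(x) = x^(k-1) on L2(0,1): <g_j, g_k> = 1/(j+k+1)
G = np.array([[1.0 / (j + k + 1) for j in range(n)] for k in range(n)])
# right-hand side d_k = <f, g_k> for f(x) = exp(x), via I_k = e - k*I_{k-1}
I = [e - 1.0]
for k in range(1, n):
    I.append(e - k * I[k - 1])
d = np.array(I)
a = np.linalg.solve(G, d)            # coefficients alpha_k of P(f), system (16.1)
residual = d - G @ a                 # <f - P(f), g_k>: vanishes by Theorem 14.1
norm_f_sq = (e * e - 1.0) / 2.0      # ||f||^2 = int_0^1 exp(2x) dx
err_sq = norm_f_sq - d @ a           # e(f,V)^2 = ||f||^2 - sum alpha_k <g_k, f>
# Gram matrix of (f, g_1, ..., g_n) for the determinant formula
A = np.block([[np.array([[norm_f_sq]]), d[None, :]],
              [d[:, None], G]])
ratio = np.linalg.det(A) / np.linalg.det(G)
print(err_sq, ratio)
```

The two printed values agree, as the corollary predicts, and both are positive and small: a cubic polynomial already approximates e^x on (0, 1) quite well.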
Remark Let us assume that F is the Cartesian space R^{n+1} with the ordinary inner product. Then the last identity in Corollary 16.1 has a nice geometric interpretation: the height of the (n + 1) dimensional parallelepiped spanned by the vectors g_1, g_2, ..., g_n and f is precisely the ratio of the (n + 1) dimensional volume of the parallelepiped and the n dimensional volume of the parallelepiped's base spanned by the vectors g_1, g_2, ..., g_n. ∎

1.2.1 Computing the best approximation
There are two general ways for determining P(f). They are both based on the equations (15.1) and (16.1). The first one is to solve the linear system (16.1) and then represent P(f) by means of the formula (15.1). Let G = G(g_1, g_2, ..., g_n) and let d denote the right-hand side of the system (16.1). We assume that this system is solved by an algorithm in floating point arithmetic. Let s(G, d) denote the computed approximation of the solution a = G^{-1} d and let ||G|| = sup_{||x||_2 = 1} ||Gx||_2, where ||x||_2 = (Σ_{k=1}^n |x_k|^2)^{1/2}. Then, based on the sensitivity of a to perturbations of G and d, one obtains the following rule. We should not expect the error Er = ||a - s(G, d)||_2 to be less than ε ||G|| ||G^{-1}|| ||a||_2, where ε is the relative arithmetic precision. The quantity cond(G) = ||G|| ||G^{-1}|| is called the condition number of G. We also recall that the algorithm is said to be stable if

||a - s(G, d)||_2 ≤ K ε cond(G) ||a||_2,

where K is a constant that may depend on n. In order to solve the system Ga = d we recommend finding the decomposition G = LL^H, where L is a lower triangular matrix,¹ and then solving the triangular systems Ly = d and L^H a = y. The following algorithm does it in situ, i.e., the computed L and a are stored in the lower triangular part of the matrix G and in the vector d, respectively.
¹The matrix L exists since G is Hermitian and positive definite.
for k := 1 to n do
begin
  g_{k,k} := sqrt(g_{k,k});
  for i := k + 1 to n do g_{i,k} := g_{i,k} / g_{k,k};
  for j := k + 1 to n do
    for i := j to n do g_{i,j} := g_{i,j} − g_{i,k} conj(g_{j,k})
end;
for k := 1 to n do
  d_k := (d_k − Σ_{i=1}^{k−1} g_{k,i} d_i) / g_{k,k};
for k := n downto 1 do
  d_k := (d_k − Σ_{i=k+1}^{n} conj(g_{i,k}) d_i) / g_{k,k}
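The same in-place scheme can be transcribed into Python. This is an illustrative sketch (not the book's code), restricted for simplicity to a real symmetric positive definite matrix; the factor L and the solution overwrite G and d as described above.

```python
import math

def cholesky_solve(G, d):
    """Solve G a = d in situ for a real symmetric positive definite G.

    G is overwritten by its Cholesky factor L (lower triangle) and
    d is overwritten by the solution a, mirroring the text's algorithm.
    """
    n = len(G)
    # Factorization G = L L^T, L stored in the lower triangle of G.
    for k in range(n):
        G[k][k] = math.sqrt(G[k][k])
        for i in range(k + 1, n):
            G[i][k] /= G[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                G[i][j] -= G[i][k] * G[j][k]
    # Forward substitution: L y = d.
    for k in range(n):
        d[k] = (d[k] - sum(G[k][i] * d[i] for i in range(k))) / G[k][k]
    # Back substitution: L^T a = y.
    for k in range(n - 1, -1, -1):
        d[k] = (d[k] - sum(G[i][k] * d[i] for i in range(k + 1, n))) / G[k][k]
    return d

G = [[4.0, 2.0], [2.0, 3.0]]
d = [8.0, 8.0]
a = cholesky_solve(G, d)   # a is now approximately [1.0, 2.0]
```

For a complex Hermitian G one would additionally conjugate g_{j,k} in the update step and g_{i,k} in the back substitution, as in the pseudocode.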
This algorithm is known in the literature as the Cholesky algorithm. It is stable, and requires approximately n³/6 multiplications. The only numerical trouble that may arise in the use of this algorithm (or any other stable algorithm) is a large condition number of the matrix G. The following example illustrates this obstacle.

Example 18.1 Let us set F = L₂(0, 1) and let g_k be the monomial of degree k − 1, i.e., g_k(x) = x^{k−1}. Then the matrix G(g_1, g_2, ..., g_n) coincides with the Hilbert matrix

    H_n = [1/(i + j − 1)]_{i,j=1}^{n},

which is known to have a huge condition number cond(H_n), even for small values of n. For instance: cond(H_4) ≈ 1.55 · 10⁴, cond(H_6) ≈ 1.5 · 10⁷, cond(H_10) ≈ 1.6 · 10¹³. •

The second method for computing the element P(f) is motivated by the observation that in the case when the elements g_1, g_2, ..., g_n are orthonormal the equation (15.1) takes the form

    P(f) = Σ_{k=1}^{n} (f, g_k) g_k.
When the elements g_k are not orthonormal, instead of solving the system (16.1) we may replace the elements g_1, g_2, ..., g_n with some orthonormal elements v_1, v_2, ..., v_n that span the same space V
and then to use the representation (15.1). The following algorithm can be used to produce the desired orthonormal basis of V.

for i := 1 to n do
begin
  v_i := g_{i,i} / ||g_{i,i}||;
  for k := i + 1 to n do g_{i+1,k} := g_{i,k} − (g_{i,k}, v_i) v_i
end

Here we assume that g_{1,i} = g_i for i = 1, 2, ..., n.
Indeed, by induction one easily proves that for any i = 2, 3, ..., n the spaces

    V_i = span{v_1, v_2, ..., v_{i−1}}

and

    W_i = span{g_{i,i}, g_{i,i+1}, ..., g_{i,n}}

are orthogonal (i.e., (v, w) = 0 for v ∈ V_i and w ∈ W_i) and taken together span the space V. Moreover, each v_k is clearly of unit norm. Thus, the elements v_k, k = 1, 2, ..., n, form an orthonormal basis of V. In many applications, when round-off errors are present, this algorithm does an excellent job. For instance, it proves to be stable for orthogonalization of vectors in C^m and for orthogonalization of polynomials in L₂(a, b). Nevertheless, we should like to warn against taking its good numerical properties for granted. The algorithm above is known in the literature as the modified Gram-Schmidt algorithm.

Remark The (original) Gram-Schmidt algorithm reads as follows.
for i := 1 to n do
begin
  w_i := g_i − Σ_{k=1}^{i−1} (g_i, v_k) v_k;
  v_i := w_i / ||w_i||
end
Let g be a sequence of n linearly independent elements of C^m and let GS(g) denote the output v = {v_i}_{i=1}^{n} of the Gram-Schmidt algorithm. In contrast to the modified algorithm, v := GS(g) is unstable. However, it turns out that the second iteration, w := GS(GS(g)), leads to stable results. •
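As an illustration, here is a minimal Python sketch of the modified Gram-Schmidt algorithm for vectors in R^m with the ordinary inner product; the function name mgs is ours, not the book's.

```python
def mgs(vectors):
    """Modified Gram-Schmidt: orthonormalize a list of real vectors.

    Each remaining vector is re-orthogonalized against v_i as soon as
    v_i is produced, which is what distinguishes the modified from the
    classical Gram-Schmidt algorithm.
    """
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    work = [list(w) for w in vectors]
    basis = []
    for i in range(len(work)):
        norm = dot(work[i], work[i]) ** 0.5
        vi = [x / norm for x in work[i]]
        basis.append(vi)
        for k in range(i + 1, len(work)):
            c = dot(work[k], vi)
            work[k] = [x - c * y for x, y in zip(work[k], vi)]
    return basis

v = mgs([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

The returned vectors are orthonormal up to rounding error; on nearly dependent inputs this variant loses orthogonality far more slowly than the classical loop.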
Let us note that the essential cost of both methods of computing P(f) consists of approximately n²/2 inner product evaluations.² Thus, in a practical situation, the choice between the methods should be motivated by the needs and numerical properties of each method.

1.2.2 Completeness of orthogonal systems
Let us set

    V_n = span{v_1, v_2, ..., v_n},  n = 1, 2, ...,

where {v_k}_{k=1}^∞ is an orthogonal sequence of nonzero elements of F. Let us also suppose that W is a linear subspace of F. It is natural to ask if any element in W can be approximated to arbitrary accuracy with elements of the spaces V_n, i.e., whether

    lim_{n→∞} e(u, V_n) = 0  for every u ∈ W.

If this property is satisfied, the orthogonal sequence is called complete in W. The key to the completeness is the Bessel inequality, which states that

    Σ_{k=1}^{∞} |(u, v_k)|² / ||v_k||² ≤ ||u||².

This inequality can be easily proven as follows. The best approximation of any u ∈ W with respect to V_n takes the form

    P_n(u) = Σ_{k=1}^{n} ((u, v_k) / (v_k, v_k)) v_k.

Using Corollary 15.1 we now get

    0 ≤ ||u − P_n(u)||² = ||u||² − Σ_{k=1}^{n} |(u, v_k)|² / ||v_k||².

Letting n tend to ∞ we get the Bessel inequality. Moreover, we arrive at the following conclusion.

Corollary 20.1 The orthogonal sequence {v_k}_{k=1}^∞ is complete in W iff

    Σ_{k=1}^{∞} |(u, v_k)|² / ||v_k||² = ||u||²  for every u ∈ W,
²When the inner product is given through an integral (like that in Example 18.1), its evaluation is a matter of choosing a quadrature formula.
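In the spirit of this footnote, here is a sketch of evaluating an inner product of L₂(0, 1) by the composite Simpson rule; the helper name inner_product and the parameter m are our own choices.

```python
def inner_product(f, g, a=0.0, b=1.0, m=200):
    """Approximate (f, g) = integral of f(x) g(x) over [a, b]
    by the composite Simpson rule with 2*m subintervals."""
    n = 2 * m
    h = (b - a) / n
    s = 0.0
    for i in range(n + 1):
        x = a + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)   # Simpson weights
        s += w * f(x) * g(x)
    return s * h / 3.0

# Gram entry (g_1, g_2) for g_1(x) = x, g_2(x) = x^2 on (0, 1): exactly 1/4,
# since Simpson's rule integrates cubics exactly.
val = inner_product(lambda x: x, lambda x: x * x)   # val is 0.25 up to rounding
```

For smooth integrands like monomials this already reproduces the Hilbert-matrix entries of Example 18.1 to near machine precision.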
or equivalently if

    u = Σ_{k=1}^{∞} ((u, v_k) / (v_k, v_k)) v_k  for every u ∈ W.

The infinite series in the second equation of Corollary 20.1 is called the Fourier series of u.

1.2.3 Examples of orthogonal systems
For many unitary spaces F that occur in applications we are fortunate to find explicitly an infinite system (sequence) of orthogonal elements v_k ∈ F. We shall now survey properties of four important classes of orthogonal systems.

Example 1: Trigonometric functions

We begin with classical examples of orthonormal systems in the space F = L₂(−π, π) equipped with the inner product

    (f, g) = ∫_{−π}^{π} f(t) conj(g(t)) dt.
Given numbers k = 0, 1, ... and l = 0, ±1, ±2, ..., let us define the following functions in L₂(−π, π):

    u_0(t) = 1/√(2π),  u_k(t) = cos(kt)/√π (k ≥ 1),  v_k(t) = sin(kt)/√π (k ≥ 1),  w_l(t) = e^{ilt}/√(2π),

where i = √−1. It is easy to verify that

    (u_j, u_k) = δ_{j,k},  (v_j, v_k) = δ_{j,k},  (u_j, v_k) = 0,

and

    (w_l, w_m) = δ_{l,m}

for any j, k = 0, 1, 2, ... and any l, m = 0, ±1, ±2, .... Here, δ_{p,q} denotes the Kronecker delta, i.e.,

    δ_{p,q} = 1 if p = q, and δ_{p,q} = 0 otherwise.
Thus, the systems

    A = {u_0, u_1, v_1, u_2, v_2, ...}  and  B = {w_l : l = 0, ±1, ±2, ...}

are orthonormal. Moreover, for any function f in L₁(−π, π) the truncated Fourier series

    s_n f = Σ_{k=0}^{n} (f, u_k) u_k + Σ_{k=1}^{n} (f, v_k) v_k

and

    Σ_{l=−n}^{n} (f, w_l) w_l
are identical. We shall now list (without proof) some other properties of the above defined systems A and B.

Theorem 22.1 Let t, x ∈ R and n ∈ N be arbitrary.

(a) The orthonormal systems A and B are complete in L₂(−π, π).

(b) For an arbitrary function f in L₁(−π, π) we have

    (s_n f)(t) = (1/π) ∫_{−π}^{π} f(x) D_n(t − x) dx,

where D_n is the Dirichlet kernel,

    D_n(x) = sin((n + 1/2) x) / (2 sin(x/2)).

(c) If f is a 2π-periodic function on R such that f ∈ L₁(−π, π), and for some point τ ∈ R the derivative f′(τ) exists, or for some positive number ε, f is monotone in the intervals (τ − ε, τ) and (τ, τ + ε), then the sequence s_n f(τ) converges to (f(τ + 0) + f(τ − 0))/2.

(d) There exists a function f in L₁(−π, π) such that s_n(f) does not converge to f in the L₁-norm.

(e) There exists a continuous 2π-periodic function f on R such that the sequence {s_n f(t)}_{n=0}^∞ is unbounded.

(f) Given a function g in L₁(−π, π), let us define
the Fejér means

    F_n g = (1/(n + 1)) Σ_{k=0}^{n} s_k g,  n = 0, 1, ....

Then F_n g admits the representation

    (F_n g)(t) = (1/π) ∫_{−π}^{π} g(x) K_n(t − x) dx,

where the Fejér kernel

    K_n(x) = (1/(2(n + 1))) (sin((n + 1)x/2) / sin(x/2))²

is nonnegative.
(g) For any number p ≥ 1 and any function g in L_p(−π, π) the sequence {F_n g}_{n=0}^∞ converges to g in the L_p-norm. Moreover, if g is a continuous function on the interval [−π, π], then the sequence {F_n g}_{n=0}^∞ converges uniformly to g.

Example 2: Sinc functions

Given a positive number a, let us consider the class W(a) of entire functions f : C → C satisfying the conditions
    f|_R ∈ L₂(−∞, ∞)

and

    |f(z)| ≤ K e^{a|z|},  z ∈ C,

where K is a positive constant independent of z, which may depend on f.³ It is easy to see that W(a) is a linear space and that

    <f, g> = ∫_{−∞}^{∞} f(t) conj(g(t)) dt

defines an inner product in W(a). We now recall the Paley-Wiener theorem, which reads as follows. The space W(a) with the inner product <·, ·> is a Hilbert space of functions f : C → C that admit a unique representation

    f(z) = (1/(2π)) ∫_{−a}^{a} F(x) e^{ixz} dx,  F ∈ L₂(−a, a).
³Functions satisfying this inequality are said to be of exponential type a.
Let us note that F is the Fourier transform of f restricted to the real line, i.e.,

    F(x) = ∫_{−∞}^{∞} f(t) e^{−ixt} dt,  x ∈ [−a, a].

Let (·, ·) denote the inner product of the space L₂(−a, a). We also recall the Parseval theorem, which states that if G and H are the Fourier transforms of functions g ∈ W(a) and h ∈ W(a), respectively, then

    <g, h> = (1/(2π)) (G, H).
We are now ready to derive some additional properties of W(a). Given a function f ∈ W(a) and a number z ∈ C we have

    f(z) = (1/(2π)) ∫_{−a}^{a} F(x) e^{ixz} dx = <f, K(·, z)>.

Thus, the function

    K(t, z) = sin(a(t − z̄)) / (π(t − z̄))

is a reproducing kernel of W(a), f(z) = <f, K(·, z)>. For any number h ∈ (0, π/a], the equation

    E_z(x) = e^{ixz},  −π/h < x < π/h,

defines a function in L₂(−π/h, π/h). From the assertion (a) of Theorem 22.1 it follows that the sequence of functions V_k(t) = e^{−khit} (k = 0, ±1, ±2, ...) is a complete orthogonal system in L₂(−π/h, π/h). Thus,

    E_z = Σ_{k=−∞}^{∞} c_k(z) V_k,

where

    c_k(z) = (E_z, V_k) / (V_k, V_k).
Thus, for f ∈ W(a) and z ∈ C we have

    f(z) = Σ_{k=−∞}^{∞} f(kh) S(k, h)(z),

where

    S(k, h)(z) = sin(π(z − kh)/h) / (π(z − kh)/h).

We shall now show that the functions S(k, h) are orthogonal in the space L₂(−∞, ∞). Indeed, for any integer numbers k and l, the Fourier transform of S(k, h) restricted to the real line equals h e^{−ikhx} for |x| ≤ π/h and vanishes elsewhere.
Thus, by the Parseval theorem we obtain

    ∫_{−∞}^{∞} S(k, h)(t) S(l, h)(t) dt = h δ_{k,l},

where δ_{k,l} is the Kronecker delta. The results of this subsection yield the following theorem.

Theorem 25.1

(a) W(a) is a reproducing kernel Hilbert space.

(b) Given a positive number h, the sequence {S(k, h)/√h}_{k=−∞}^∞ is orthonormal in L₂(−∞, ∞) and complete in W(π/h).

(c) If f ∈ W(a), h ∈ (0, π/a] and z ∈ C, then

    f(z) = Σ_{k=−∞}^{∞} f(kh) S(k, h)(z).

(d) If f ∈ W(a) and h ∈ (0, π/a], then

    ∫_{−∞}^{∞} |f(t)|² dt = h Σ_{k=−∞}^{∞} |f(kh)|²

and

    ∫_{−∞}^{∞} f(t) dt = h Σ_{k=−∞}^{∞} f(kh).
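The cardinal series in assertion (c) of Theorem 25.1 can be tested numerically. A sketch, assuming f(t) = sin(t)/t, which belongs to W(1), with h = 1 and a symmetric truncation of the series:

```python
import math

def sinc(x):
    # S(0, 1) evaluated at x: sin(pi x)/(pi x), with the limit value 1 at x = 0.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def cardinal(f, h, t, N):
    # Truncated cardinal series: sum over k = -N..N of f(kh) S(k, h)(t).
    return sum(f(k * h) * sinc((t - k * h) / h) for k in range(-N, N + 1))

f = lambda t: 1.0 if t == 0 else math.sin(t) / t   # f is in W(1)
approx = cardinal(f, 1.0, 0.5, 2000)
exact = math.sin(0.5) / 0.5
```

Because this particular f decays only like 1/|t|, the truncation error of the cardinal series decays slowly (roughly like 1/N); the approximation above agrees with the exact value to a few decimal places, not to machine precision.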
The functions in the class W(a) are often called signals of bandwidth [−a, a], and the functions S(k, h) are called sinc functions.

Example 3: Prolate spheroidal wave functions
It is well known that for any positive number c, the values of the parameter κ for which the differential equation

    (1 − t²) u″(t) − 2t u′(t) + (κ − c² t²) u(t) = 0

has a nonzero solution can be ordered to form a positive, strictly increasing sequence

    κ_0(c) < κ_1(c) < κ_2(c) < ....

Moreover, for κ = κ_k(c), there exists a unique function u_k(c, t), u_k(c, ·) : [−1, 1] → R, satisfying the differential equation and such that u_k(0, t) = P_k(t), where P_k is the kth Legendre polynomial, i.e.,

    P_k(t) = (1/(2^k k!)) (d^k/dt^k) (t² − 1)^k.

The functions u_k(c, ·) are known to possess the following additional properties.

(A) Each function u_k(c, ·) depends continuously on c, and for fixed c it can be extended to an entire function.

(B) The functions u_k(c, ·), k = 0, 1, ..., are orthogonal and complete in the space L₂(−1, 1).

(C) Each function u_k(c, ·) has exactly k simple zeros in the interval (−1, 1).

(D) The functions u_k(c, ·) satisfy the eigenrelations
and
Given positive numbers a and τ, we set

    c = aτ,

and define the prolate spheroidal wave functions ψ_k : [−τ, τ] → R by

    ψ_k(t) = u_k(c, t/τ),  k = 0, 1, ....

Then the properties (A)-(D) can be restated in terms of ψ_k, to read as follows.

Theorem 27.1

(a) Each function ψ_k depends continuously on a and on τ, and for fixed a and τ it can be extended to an entire function.

(b) The functions λ_k^{−1/2} ψ_k, k = 0, 1, ..., form an orthonormal system that is complete in L₂(−τ, τ).

(c) Each function ψ_k has exactly k simple zeros on the interval (−τ, τ).

(d) The functions ψ_k satisfy the eigenrelations

and

where β_k = β_k(aτ), λ_k = λ_k(aτ), and λ_k ↘ 0 as k → ∞.

(e) For p = ⌊2aτ/π⌋ − 1 and q = ⌊2aτ/π⌋ + 1 the eigenvalues λ_p(aτ) and λ_q(aτ) enjoy the bounds

Substituting s = τa^{−1}u in the first equation of (d) we get the corresponding eigenrelation for the functions

    X_k(u) = γ_k ψ_k(τa^{−1}u),

with suitable normalizing constants γ_k depending on a and β_k. Using the second equation of (d) and the assertion (a) of Theorem 25.1 we find that
Thus, taking these identities together with (b), we get:

Corollary 28.1 Each function ψ_k, k = 0, 1, ..., is a signal of bandwidth [−a, a]. Moreover, the set {ψ_k}_{k=0}^∞ is orthonormal in L₂(−∞, ∞) and complete in W(a).

Example 4: Orthogonal polynomials

Let μ be a finite positive measure defined on the field of all Borel subsets of R. Let us denote by L₂(R, μ) the unitary space of all functions f : R → R that are square integrable over R with respect to the measure μ. The inner product in L₂(R, μ) is defined by the equation

    (f, g) = ∫_R f(t) g(t) dμ(t).

For some measures μ, the space L₂(R, μ) contains the class Π of all polynomials as a dense linear subspace. For instance, this is the case when the measure μ is supported on a compact subset of R. Then there exists a complete orthogonal system of polynomials in L₂(R, μ) (see Exercise 19 at the end of this section). A necessary and sufficient condition on the measure μ for which polynomials are dense in L₂(R, μ) will be given in the next chapter. As we display in the following theorem, systems of orthogonal polynomials in L₂(R, μ) share remarkable common properties.

Theorem 28.1 Let Π ⊂ L₂(R, μ), and let a polynomial that vanishes almost everywhere with respect to μ be the zero polynomial. Then the following properties are satisfied.

(a) There exists a unique sequence of polynomials {v_k}_{k=0}^∞ such that each v_k is of degree k with leading coefficient 1 and (v_j, v_k) = 0 for j ≠ k.

(b) Let {p_k}_{k=0}^∞ be a sequence of polynomials satisfying the conditions deg p_k = k and (p_j, p_k) = 0 for j ≠ k. Then {p_k}_{k=0}^∞ = {c_k v_k}_{k=0}^∞ for some sequence of real nonzero numbers c_k. Here, v_k are the polynomials in the assertion (a).
Let {p_k}_{k=0}^∞ satisfy the conditions of the statement (b) and let γ_k = (p_k, p_k). Then:

(c) For each k = 1, 2, 3, ... there are unique real numbers a_k, b_k, and c_k such that

    p_{k+1}(x) = (a_k x + b_k) p_k(x) − c_k p_{k−1}(x)

and

    c_k = (a_k / a_{k−1}) (γ_k / γ_{k−1}).

(d) Given k = 0, 1, 2, ... and given real numbers x ≠ y, we have

    Σ_{j=0}^{k} p_j(x) p_j(y) / γ_j = (p_{k+1}(x) p_k(y) − p_k(x) p_{k+1}(y)) / (a_k γ_k (x − y)),

where the numbers a_k are from the assertion (c).

(e) For k = 1, 2, 3, ... the polynomial p_k has precisely k simple roots. Moreover, p′_k(α) p_{k+1}(α) is different from zero and has the same sign for every root α of the polynomial p_k. Thus, the roots of the polynomials p_k and p_{k+1} interlace.

The formula in the property (c) expressing p_{k+1} in terms of p_k and p_{k−1} is called the three term recurrence relation. The identity in (d) is called the Christoffel-Darboux summation formula.

Remarks

1. Let L : Π → R be a linear functional such that L(p²) > 0 for any nonzero polynomial p. We may then introduce an inner product in Π by the formula

    (p, q) = L(pq).

Then the opening assumption of Theorem 28.1 can be ignored and the assertions (a)-(e) remain valid. Let {P_k}_{k=0}^∞ be a sequence of orthonormal polynomials with respect to L, i.e.,

    deg P_k = k  and  L(P_j P_k) = δ_{j,k}.

We say that q is a quasi-orthogonal polynomial of order n ≥ 2 (with respect to L) if the degree of q is at most n and L(q P_k) = 0 for k = 0, 1, ..., n − 2. (We shall need quasi-orthogonal polynomials in Chapter 5.) Their basic properties follow.
• A polynomial q of degree n ≥ 2 is quasi-orthogonal with respect to L iff q = α P_n + β P_{n−1}, where |α| + |β| > 0.
• Given n ∈ N and x ∈ R, there exists a quasi-orthogonal polynomial q of degree n + 1 such that q(x) = 0 iff P_n(x) ≠ 0.
• All roots of quasi-orthogonal polynomials are real and simple.

2. The three term recurrence relation has the following important practical consequences.
• The orthogonal polynomials p_0, p_1, ..., p_n can be fully represented on a computer through the real numbers p_0, α, β (where p_0(x) = p_0, p_1(x) = αx + β) and the coefficients a_k, b_k, c_k, k = 1, 2, ..., n − 1.
• The knowledge of the polynomials p_0, p_1 and the numbers a_k, b_k, c_k, k = 1, 2, ..., yields an easy recursive algorithm for sampling the polynomials p_k at a given point.
• It turns out that the polynomials q_n that are orthogonal with respect to the inner product

    (f, g) = ∫ f(x) g(x) ρ(x) dx

can be explicitly found when any of the following conditions are met:

(i) ρ(x) = (1 − x)^α (1 + x)^β on (−1, 1), with α, β > −1;
(ii) ρ(x) = x^α e^{−x} on (0, ∞), with α > −1;
(iii) ρ(x) = e^{−x²} on (−∞, ∞).

The polynomials q_n induced by those three inner products are called the Jacobi polynomials for the case of (i), the Laguerre polynomials for the case of (ii), the Hermite polynomials for the case of (iii), and they are traditionally denoted by P_n^{(α,β)}, L_n^{(α)}, and H_n, respectively. They are also referred to as classical orthogonal polynomials. The following theorem provides their additional common properties.

Theorem 30.1

(a) The sequence {q_n}_{n=0}^∞ is a complete orthogonal system in the L₂-space with weight ρ.

(b) The polynomials q_n satisfy the following differential relations.
and

where n = 0, 1, ... and the quantities δ, K_n, λ_n are listed in the table on page 33.

Two sequences of the Jacobi polynomials are of special importance. They are

    P_m^{(−1/2,−1/2)}  and  P_m^{(1/2,1/2)},  m = 0, 1, ....

Given a nonnegative integer m, let us define

    T_m = (2^{max(0,m−1)} / s_m) P_m^{(−1/2,−1/2)}  and  U_m = (2^m / σ_m) P_m^{(1/2,1/2)},

where s_m and σ_m are the leading coefficients of P_m^{(−1/2,−1/2)} and P_m^{(1/2,1/2)}, respectively. The polynomials T_m and U_m are called the Chebyshev polynomials of the first kind and the Chebyshev polynomials of the second kind, respectively. Their basic properties are listed in the next theorem.

Theorem 31.1 For any numbers m, n = 0, 1, 2, ... and x ∈ R we have:
(c) T_{m+1}(x) = 2x T_m(x) − T_{m−1}(x), T_1(x) = x, T_0(x) = 1, and

    T_m(cos t) = cos(mt).

(d) T_m has precisely m roots on the interval (−1, 1), and its jth root is

    x_j = cos((2j − 1)π / (2m)),  j = 1, 2, ..., m.

(e) The leading coefficient of T_m is 2^{max(0, m−1)}.
(h) U_{m+1}(x) = 2x U_m(x) − U_{m−1}(x), U_1(x) = 2x,
U_{−1}(x) = 0, U_0(x) = 1, and
(i) U_m has precisely m roots on the interval (−1, 1), and its jth root is given by the formula

    x_j = cos(jπ / (m + 1)),  j = 1, 2, ..., m.
(j) The leading coefficient of U_m is 2^m.
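The recurrence in (c) gives a simple evaluation scheme, and the classical identity T_m(cos θ) = cos(mθ) provides a check; a sketch (the function name is ours):

```python
import math

def chebyshev_T(m, x):
    # Evaluate T_m(x) via the three term recurrence of property (c).
    if m == 0:
        return 1.0
    t_prev, t = 1.0, x            # T_0(x), T_1(x)
    for _ in range(m - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

theta = 0.7
val = chebyshev_T(5, math.cos(theta))         # equals cos(5 * 0.7) up to rounding
root = chebyshev_T(4, math.cos(math.pi / 8))  # first root of T_4, so approximately 0
```

The same loop, with the initial values U_0 = 1 and U_1 = 2x, evaluates the second-kind polynomials of property (h).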
Some applications of orthogonal polynomials are discussed in the next chapter, where we consider moment problems and density of polynomials in the space L₂(R, μ). In Chapter 2 we shall also exhibit an algebraic characterization of orthogonal polynomials in many variables.
Table 33.1: Properties of classical orthogonal polynomials
1.2.4 Remarks on convergence of Fourier series
Let K be a compact subset of R^m and let μ be a positive measure on K. We assume that there is a complete orthonormal system in the space L₂(K, μ) consisting of continuous functions on K, say {f_n}_{n=1}^∞. We aim to study the behavior of the Fourier series

    S f_0 = Σ_{n=1}^{∞} a_n f_n,  a_n = (f_0, f_n),

in the supremum norm ||f||_∞ = sup_{x∈K} |f(x)|. Here, f_0 is a given continuous function on K. For simplicity we shall denote the nth Fourier coefficient of the above series by a_n. We remind the reader that

    (Σ_{n=k+1}^{∞} |a_n|²)^{1/2}

is the best approximation error e(f_0, span{f_1, f_2, ..., f_k}) measured in the norm || · || of the space L₂(K, μ). Moreover, we have

    ||f_0||² = Σ_{n=1}^{∞} |a_n|².
Henceforth, we assume without loss of generality that ||f_0|| = 1. We now present a theorem on estimating the error of the Fourier approximation in the supremum norm.

Theorem 34.1 For k = 1, 2, 3, ... the following inequality holds:

    ||f_0 − Σ_{n=1}^{k} a_n f_n||_∞ ≤ M_k e(f_0, span{f_1, ..., f_k}),

where M_k is the constant determined in the proof below.
Proof We begin with determining the maximal value M_k of the function

    F(α_0, α_1, ..., α_k) = ||Σ_{n=0}^{k} α_n f_n||_∞

subject to the condition ||Σ_{n=0}^{k} α_n f_n|| = 1. As the functions f_n are orthonormal, we can rewrite this condition in the form
By using the Lagrange method we get the system of equations

where (α_0, α_1, ..., α_k) is a point of the maximum of F and λ is the Lagrange multiplier. After standard algebraic manipulations, this system takes the form

For any s in K and for arbitrary numbers γ_n we have, by the Schwarz inequality,

    |Σ_{n=0}^{k} γ_n f_n(s)| ≤ (Σ_{n=0}^{k} |γ_n|²)^{1/2} (Σ_{n=0}^{k} |f_n(s)|²)^{1/2}.

On the other hand, we also have

Thus, combining these inequalities and taking the supremum with respect to s ∈ K, we get
Now, by setting γ_0 = 1, γ_n = −a_n (n = 1, 2, ..., k), we finally get the desired inequality. •

We shall now show when the series S f_0 converges uniformly.

Theorem 36.1 If

    Σ_{n=1}^{∞} |a_n| ||f_n||_∞ < ∞,

then the series S f_0 converges uniformly on K to a continuous function g_0, which agrees μ-almost everywhere with the function f_0.

Proof Given an integer n and given an s in K we have

    |Σ_{m=n+1}^{∞} a_m f_m(s)| ≤ Σ_{m=n+1}^{∞} |a_m| ||f_m||_∞,

and consequently the series S f_0 is absolutely and uniformly convergent on K to some continuous function g_0. Since the series converges to the function f_0 in the norm || · ||, we see that f_0 coincides μ-almost everywhere with g_0, as claimed. •
1.2.5 Exercises
12. Find the best approximation of the function f(x) = x² with respect to the linear subspace of L₂(0, 1) spanned by the functions g_1(x) = e^x and g_2(x) = e^{2x}.

13. Show that the minimum

is attained when a = −8/21, b = 8/7 and c = 8/35.
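Exercise 12 can be explored numerically by forming the 2 × 2 Gram system (16.1) and checking the orthogonality of the residual. This sketch uses Simpson quadrature for the integrals and is an illustration, not a closed-form solution.

```python
import math

def ip(f, g, n=400):
    # (f, g) = integral of f g over (0, 1), composite Simpson rule, n even.
    h = 1.0 / n
    s = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        s += w * f(i * h) * g(i * h)
    return s * h / 3.0

f  = lambda x: x * x
g1 = math.exp
g2 = lambda x: math.exp(2.0 * x)

# Normal equations G a = d with Gram matrix G and d_k = (f, g_k),
# solved by Cramer's rule for this 2 x 2 case.
G = [[ip(g1, g1), ip(g1, g2)], [ip(g2, g1), ip(g2, g2)]]
d = [ip(f, g1), ip(f, g2)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
a = [(d[0] * G[1][1] - d[1] * G[0][1]) / det,
     (G[0][0] * d[1] - G[1][0] * d[0]) / det]

# The residual f - P(f) must be orthogonal to the basis (Theorem 14.1).
r = lambda x: f(x) - a[0] * g1(x) - a[1] * g2(x)
```

The two inner products (r, g_1) and (r, g_2) vanish to within rounding error, which is exactly the characterization of the best approximation used throughout this section.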
14. Find an example of a nonunitary normed space F such that for some of its one dimensional subspaces V, the set of best approximations of any element s ∈ F with respect to V consists of precisely one element, which depends linearly and continuously on s.

15. Prove Corollary 15.1.

16. Suppose that F = L₂(0, 1), f(x) = x^m, and g_k(x) = x^{p_k}, where 0 < p_1 < p_2 < ... . Let V_n = span{g_1, g_2, ..., g_n}. Using Corollary 16.1 and the Cauchy determinant formula

    det[1/(x_i + y_j)]_{i,j=1}^{n} = Π_{1≤i<j≤n} (x_j − x_i)(y_j − y_i) / Π_{i,j=1}^{n} (x_i + y_j),

show that

    e(f, V_n) = (1/√(2m + 1)) Π_{k=1}^{n} |m − p_k| / (m + p_k + 1).

17. Let 0 < m < p_1 < p_2 < ... . Show that

    lim_{n→∞} Π_{k=1}^{n} |m − p_k| / (m + p_k + 1) = 0  iff  Σ_{k=1}^{∞} 1/p_k = ∞.

18. Let us recall that the Weierstrass theorem establishes the density of algebraic polynomials in the space C(0, 1). Since C(0, 1) is a dense subspace of L₂(0, 1), algebraic polynomials are also dense in L₂(0, 1). Use this fact and the results in Exercises 16 and 17 to show that the space spanned by the functions g_k(x) = x^{p_k} (0 < p_1 < p_2 < ...) is dense in L₂(0, 1) iff the series Σ_{k=1}^{∞} 1/p_k is divergent. This result is known in the literature as the Müntz theorem.

19. Let F be a normed space. Show that F is a unitary space if any two elements f and g in F obey the parallelogram law (see p. 14).

20. Show that the Gram matrix in (16.1) is positive definite.
21. A normed space F is said to be separable if it contains a denumerable set T such that any element of F can be approximated with elements of T to arbitrary accuracy. Use the modified Gram-Schmidt algorithm to show that any separable unitary space contains a complete orthonormal system.

22. Let the operator F_n and the functions u_n and v_n be those in Theorem 22.1. Show that F_n transforms the class of all nonnegative continuous functions on [−π, π] into itself. Then find the functions F_n(u_0), F_n(u_1), and F_n(v_1).

23. Derive Theorem 27.1 from the properties (A)-(D).

24. Prove the properties (a)-(c) of quasi-orthogonal polynomials given in Remark 1 on page 29.

25. Find min ∫_{−1}^{1} |x^n − p(x)|² dx, where the minimum is taken over all polynomials p of degree at most n − 1.

26. Show that for the Chebyshev polynomials we have

and

Here, both equations involve n × n matrices.

27. Prove that

    |x| = 2/π + (4/π) Σ_{k=1}^{∞} (−1)^{k+1} T_{2k}(x) / (4k² − 1)

for x ∈ [−1, 1], the convergence of the infinite series being uniform.
28. Using the three term recurrence formula for the polynomials T_n, derive the following algorithm that, for given numbers a_0, a_1, ..., a_n, computes the numbers b_0, b_1, ..., b_n such that

    Σ_{k=0}^{n} a_k T_k(x) = Σ_{k=0}^{n} b_k x^k.

for j := 2 to n do
  for k := n downto j do
  begin
    a_{k−2} := a_{k−2} − a_k;
    a_k := 2 a_k
  end

When this algorithm is completed, b_k = a_k for 0 ≤ k ≤ n.
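A Python transcription of the conversion in Exercise 28; the in-place updates below are a reconstruction consistent with the recurrence T_k = 2x T_{k−1} − T_{k−2}, checked against T_2 and T_3.

```python
def cheb_to_power(a):
    """Convert coefficients of sum a_k T_k(x) into power-basis coefficients,
    using the in-place scheme of Exercise 28."""
    a = list(a)
    n = len(a) - 1
    for j in range(2, n + 1):
        for k in range(n, j - 1, -1):
            a[k - 2] -= a[k]   # the -T_{k-2} part of the recurrence
            a[k] *= 2          # the 2x T_{k-1} part doubles the top coefficient
    return a

# T_2(x) = 2x^2 - 1 and T_3(x) = 4x^3 - 3x:
print(cheb_to_power([0, 0, 1]))      # [-1, 0, 2]
print(cheb_to_power([0, 0, 0, 1]))   # [0, -3, 0, 4]
```

The inverse transformation asked for in Exercise 29 reverses both the loop order and the two updates.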
30. Assume that / is a continuous function on [— 1, 1] whose Fourier series, S — J]fco akTk^ satisfies the following condition.
where 7 > 1 and $ < oo. Show that the series S converges uniformly to /. 31. Show that if any open subset GJ of K has a, positive measure /i(w), then the functions g^ and /o in Theorem 36.1 are equal everywhere on K.
1.3 Uniform approximation

This section deals with uniform approximation, i.e., approximation in the space C(B) of complex valued continuous functions f defined on a compact subset B of R^m. The norm is assumed to be

    ||f|| = max_{x ∈ B} |f(x)|.

We begin the discussion with a classical result on characterization of best approximating functions.
Theorem 40.1 Let V be a finite dimensional subspace of C(B). Let functions f and v_0 belong to C(B) and V, respectively, and let

    D = {x ∈ B : |f(x) − v_0(x)| = ||f − v_0||}.

Then the function v_0 is a best approximation of f with respect to V, ||f − v_0|| = e(f, V), iff every function v in V satisfies the condition

    min_{x ∈ D} ℜ((f(x) − v_0(x)) conj(v(x))) ≤ 0.

Proof Without loss of generality we assume that ||f − v_0|| > 0. Suppose that there is a function v_1 in V that approximates f better than v_0, i.e.,

    ||f − v_1|| < ||f − v_0||.

Thus, for every x ∈ D,

    2ℜ((f(x) − v_0(x)) conj(v_1(x) − v_0(x))) ≥ |f(x) − v_0(x)|² − |f(x) − v_1(x)|² > 0.

This shows that the condition of the theorem is not satisfied for v = v_1 − v_0. The next step is to show the opposite implication. Assume that

    ℜ((f(x) − v_0(x)) conj(v_2(x))) > 0  for every x ∈ D,

for some nonzero function v_2 in V. Due to continuity of f, v_0, and v_2, there exists an open subset U of B such that D ⊂ U and

    ℜ((f(x) − v_0(x)) conj(v_2(x))) > 0  for every x ∈ U.

This choice of U guarantees that the maximum

    ||f − v_0|| = max_{x ∈ B} |f(x) − v_0(x)|

cannot be attained on the closed subset B \ U. Thus,

    max_{x ∈ B \ U} |f(x) − v_0(x)| < ||f − v_0||.
We use this property to define a function v_3 ∈ V which approximates f better than v_0. Namely, we set

    v_3 = v_0 + ε v_2

with a sufficiently small number ε > 0. Let us note now that: for x ∈ U,

    |f(x) − v_3(x)|² = |f(x) − v_0(x)|² − 2εℜ((f(x) − v_0(x)) conj(v_2(x))) + ε² |v_2(x)|²,

while for x ∈ B \ U,

    |f(x) − v_3(x)| ≤ max_{y ∈ B \ U} |f(y) − v_0(y)| + ε ||v_2||.

Consequently, for ε small enough, ||f − v_3|| < ||f − v_0||. This completes the proof. •

We shall now comment on the uniqueness problem for approximation in C(B). First, let us note that the space C(B) is not strictly convex, unless the set B consists of precisely one point. Indeed, when B contains two different points, x and y, then there are continuous functions f_1 and f_2 such that

    f_1 ≡ 1,  0 ≤ f_2 ≤ 1,  f_2(x) = 1,  f_2(y) = 0.

When these functions are restricted to B we have

    ||f_1|| = ||f_2|| = 1  and  ||f_1 + f_2|| = 2.

Since f_1 and f_2 are different on B, the space C(B) is clearly not strictly convex. Thus, the uniqueness of best approximation with respect to a finite dimensional subspace V of C(B) cannot be taken for granted, unless special assumptions are made on V.
1.3.1 Chebyshev subspaces
Given n linearly independent functions g_1, g_2, ..., g_n in C(B), let us set V = span{g_1, g_2, ..., g_n}. We say that these functions form a Chebyshev system, or that V is a Chebyshev subspace, if any nonzero function in V has less than n zeros on B. It turns out that this condition implies the uniqueness of best approximation with elements of V. Before proving this we show two useful lemmas on Chebyshev systems.

Lemma 42.1 Functions g_1, g_2, ..., g_n in C(B) form a Chebyshev system iff for arbitrary distinct points x_1, x_2, ..., x_n in B the matrix

    [g_j(x_k)]_{j,k=1}^{n}
is nonsingular.

Proof The matrix above is nonsingular iff the homogeneous linear system

    Σ_{k=1}^{n} c_k g_k(x_j) = 0,  j = 1, 2, ..., n,
has only the trivial solution. In other words, any function Σ_{k=1}^{n} c_k g_k with zeros at the points x_1, x_2, ..., x_n vanishes identically. Since these points are arbitrary and distinct, this means that the functions g_1, g_2, ..., g_n form a Chebyshev system, as claimed. •

Lemma 42.2 Let V be an arbitrary n-dimensional Chebyshev subspace of C(B) and let f ∈ C(B) \ V. If g is a best approximation of f with respect to V, then the set

    D = {x ∈ B : |f(x) − g(x)| = ||f − g||}
contains at least n + 1 points.

Proof Assume to the contrary that D = {x_1, x_2, ..., x_p}, where p ≤ n. Let {g_1, g_2, ..., g_n} be a basis of V. From Lemma 42.1 we conclude that the matrix [g_j(x_k)], j = 1, ..., n, k = 1, ..., p, is of full rank. Thus, since p ≤ n, the linear system

    Σ_{j=1}^{n} c_j g_j(x_k) = f(x_k) − g(x_k),  k = 1, 2, ..., p,
has a solution. For h = Σ_{j=1}^{n} c_j g_j ∈ V we have

    ℜ((f(x_k) − g(x_k)) conj(h(x_k))) = |f(x_k) − g(x_k)|² = ||f − g||² > 0,  k = 1, 2, ..., p.

Since e(f, V) > 0, this contradicts Theorem 40.1 and completes the proof. •
We are now in a position to prove the uniqueness property of approximation in C(B).

Theorem 43.1 Let V be a finite dimensional subspace of C(B). Then any function in C(B) has a unique best approximation with respect to V iff V is a Chebyshev subspace.

Proof Let us assume that for some function f ∈ C(B) the set A of best approximations of f with respect to V contains two different elements h_1 and h_2. (We remind the reader that A is always nonempty and convex.) We now define a function h_0 and a set D by the equations

    h_0 = (h_1 + h_2)/2

and

    D = {x ∈ B : |f(x) − h_0(x)| = ||f − h_0||},

respectively. Of course h_0 ∈ A. Consequently, for each point x ∈ D we have

    |f(x) − h_0(x)| = E,

where E = e(f, V). Since

    f(x) − h_0(x) = ((f(x) − h_1(x)) + (f(x) − h_2(x)))/2,  |f(x) − h_1(x)| ≤ E,  |f(x) − h_2(x)| ≤ E,

and | · | is a strictly convex norm in C, we get f(x) − h_1(x) = f(x) − h_2(x). Hence, the functions h_1 and h_2 are identical on D. Assuming that V is a Chebyshev subspace and invoking Lemma 42.2, we conclude that the function h_1 − h_2 has at least 1 + dim V distinct zeros. Since h_1 − h_2 ∈ V and h_1 ≠ h_2, this contradicts that V is a Chebyshev subspace.

In order to complete the proof we now suppose that V is not a Chebyshev subspace, and we construct a function in C(B) having infinitely many best approximations with respect to V.
There is a function g in V that does not vanish identically and has n = dim V distinct roots x_1, x_2, ..., x_n in B. Let {g_1, g_2, ..., g_n} be a basis of V. Then the matrix

    [g_j(x_k)]_{j,k=1}^{n}

is singular. Thus, for some complex numbers β_k we have

    Σ_{k=1}^{n} β_k g_j(x_k) = 0,  j = 1, 2, ..., n,

and Σ_{k=1}^{n} |β_k| ≠ 0. From this we obtain

    Σ_{k=1}^{n} β_k v(x_k) = 0  for every v ∈ V.

Without loss of generality we assume that ℜ(β_k) ≠ 0 if β_k ≠ 0, k = 1, 2, ..., n. Let us now define sets R_1, R_2, and a function s : {x_1, x_2, ..., x_n} → {−1, 0, 1} as

    R_1 = {x_k : ℜ(β_k) > 0},  R_2 = {x_k : ℜ(β_k) < 0},

and

    s(x_k) = 1 for x_k ∈ R_1,  s(x_k) = −1 for x_k ∈ R_2,  s(x_k) = 0 otherwise.

The function s can be extended to a function S ∈ C(B) such that ||S|| = 1. We now consider the function

    f_0 = S (1 − |g| / ||g||).

Let us note that

    e(f_0, V) ≥ 1.

Indeed, the assumption e(f_0, V) < 1 yields a function v ∈ V such that

    |f_0(x_k) − v(x_k)| < 1
for k = 1, 2, ..., n. Hence, if ℜ(β_k) ≠ 0, then

    |s(x_k) − v(x_k)| < 1,

and consequently ℜ(β_k v(x_k)) > 0. Thus,

    ℜ(Σ_{k=1}^{n} β_k v(x_k)) > 0,

which contradicts that Σ_{k=1}^{n} β_k v(x_k) = 0. We are now ready to show that f_0 has infinitely many best approximations with respect to V. For a given number ε ∈ [−1/2, 1/2] we define a function h_ε ∈ V by the equation

    h_ε = ε g / ||g||.

Then

    |f_0 − h_ε| ≤ |S| (1 − |g|/||g||) + |ε| |g|/||g|| ≤ 1 − |g|/(2||g||) ≤ 1.

Thus, ||f_0 − h_ε|| ≤ 1. Since e(f_0, V) ≥ 1, we finally get

    ||f_0 − h_ε|| = e(f_0, V)  for every ε ∈ [−1/2, 1/2].
This completes the proof. •

Example 45.1 A natural and important example of an n-dimensional Chebyshev subspace of C(a, b), −∞ < a < b < ∞, is the space Π_{n−1} of all algebraic polynomials with degree not larger than n − 1,

    Π_{n−1} = span{1, x, x², ..., x^{n−1}}.

This is an immediate conclusion from the fact that any nonzero polynomial of degree at most n − 1 has at most n − 1 zeros. •

Example 45.2 Given a positive number ε < 2π, let us consider the linear space Π_n^ε of all trigonometric polynomials defined on [−π, π − ε] and having degree not larger than n,

    Π_n^ε = span{1, cos(x), sin(x), ..., cos(nx), sin(nx)}.

It can be easily verified that any element of this space is of the form

    e^{−inx} p(e^{ix}),
where i = √−1 and p is an algebraic polynomial of degree not larger than 2n. For x ∈ [−π, π − ε] there is a one-to-one correspondence between the numbers x and e^{ix}. It now follows that any nonzero element of Π_n^ε has at most 2n roots, and that Π_n^ε is a (2n+1)-dimensional Chebyshev subspace of C(−π, π − ε). •

A Chebyshev subspace of C(B) is said to be real if it is spanned by a system of real valued functions. Thus, the subspaces in Examples 45.1 and 45.2 are both real. Actually, there are infinitely many real Chebyshev subspaces of C(a, b). However, it turns out that the existence of real Chebyshev subspaces of C(B) cannot be taken for granted if B ⊂ R^m and m > 1.
Theorem 46.1 Let B be a compact subset of R^m, m ≥ 2, and let T denote a triod, i.e., the union of three arcs emanating from a common point and otherwise disjoint. If the set B contains a subset that is homeomorphic to the set T, then the space C(B) does not contain real Chebyshev subspaces of dimension n > 1.

Proof We assume on the contrary that there is a homeomorphism H : T → Γ, Γ ⊂ B, and that V is a real Chebyshev subspace of C(B), dim V = n > 1. We also set p, q, and r to be the images under H of the three endpoints of T. The curves T and Γ are shown in Figure 46.1. Without loss of generality we may assume that the set B \ Γ consists of infinitely many points. We now consider a basis g_1, g_2, ..., g_n of V, and we let ξ_3, ξ_4, ..., ξ_n denote arbitrary n − 2 distinct points in B \ Γ. According to Lemma 42.1 the determinant

    det[g_j(x_k)]_{j,k=1}^{n},  with x_3, ..., x_n fixed at ξ_3, ..., ξ_n,

is different from zero for all distinct points x_1, x_2, ..., x_n in B. Let us now consider the following three functions:
Figure 46.1: The curves T and Γ
and

These functions are continuous and they do not change sign on their domains. Since a(r) = b(q) and b(p) = c(r), they must be of the same sign. On the other hand, we have c(q) = −a(p). This contradiction completes the proof. •

As an immediate consequence of Theorem 46.1 we obtain the following corollary.

Corollary 47.1 Let B be a compact subset of R^m with an open interior. If m > 1, then there is no real Chebyshev subspace of C(B) whose dimension is larger than one.

1.3.2 Maximal functionals
We shall now construct a linear functional satisfying the conditions (i) and (ii) of Theorem 7.1 for F = C(B) and V being a Chebyshev subspace of F.

Lemma 47.1 Let B be a compact subset of R^m and let V be a Chebyshev subspace of C(B) of dimension n. Then for any n + 1 distinct points x_j in B there is a unique linear functional L : C(B) → C of the form

    L(f) = Σ_{j=1}^{n+1} λ_j f(x_j),
satisfying the following requirements.

(i) L(v) = 0 for every v ∈ V.
(ii) Σ_{j=1}^{n+1} |λ_j| = 1.
(iii) All numbers λ_j are different from zero and the number λ_1 is real and positive.

Proof Let {h_1, h_2, ..., h_n} be a basis of the space V. From Lemma 42.1 it follows that for arbitrary distinct points x_1, x_2, ..., x_{n+1} in B the solutions (α_1, α_2, ..., α_{n+1}) of the homogeneous linear system

    Σ_{j=1}^{n+1} α_j h_k(x_j) = 0,  k = 1, 2, ..., n,

form a one dimensional subspace of C^{n+1}, and none of their coordinates is zero, except for the case of the trivial solution. This shows that there is precisely one solution (λ_1, λ_2, ..., λ_{n+1}) of the system satisfying the conditions

    Σ_{j=1}^{n+1} |λ_j| = 1  and  λ_1 > 0.
form a one dimensional subspace of C71"*"1 and none of their coordinates is zero, except for the case of the trivial solution. This shows that there is precisely one solution (Ai, A 2 , . . . , A n ) of the system satisfying the conditions
These properties of the numbers Aj taken together with the fact that the norm of the linear functional
is Y^J=I I-M yield the desired result. • Corollary 48.1 For a given continuous function f £ C(B) the quantity
depends continuously on the points Xj. Moreover, if for a fixed index k the point x^ tends to the point x^+i, the value of L(f] approaches zero, i.e.,
Proof The continuity assertion follows immediately from the form of the functional L. Lemma 42.1 implies the existence of a function h in the Chebyshev subspace V such that

    h(x_j) = f(x_j)  for all j ≠ k.

Since L(V) = {0} and Σ_{j=1}^{n+1} |λ_j| = 1, we now get

    |L(f)| = |L(f − h)| = |λ_k| |f(x_k) − h(x_k)| ≤ |f(x_k) − h(x_k)|.

Thus L(f) → 0 when x_k → x_{k+1}, because f and h are continuous functions. The proof is complete. •

The functional in Lemma 47.1 is called the maximal functional for the subspace V associated with the points x_1, x_2, ..., x_{n+1}. The following lemma states an important feature of this functional.

Lemma 49.1 Let f ∈ C(B), let V ⊂ C(B) be a Chebyshev subspace, and let L(g) = Σ_{j=1}^{n+1} λ_j g(x_j) be the maximal functional for V associated with the points x_1, x_2, ..., x_{n+1}. Then there exists a unique function h ∈ V such that

    f(x_j) − h(x_j) = L(f) λ̄_j / |λ_j|,  j = 1, 2, ..., n + 1.
Moreover, the function h is the best approximation of f with respect to V on the set B = {xi, x%,..., xn+\}, i.e.,
Proof Given a basis {/ifc}^=i of V we consider the the linear system
Multiplying the j'th equation by Aj and then summing up the resulting identities we prove that oen+i = L(f). After eliminating a n+1 the system reduces to
Since V is a Chebyshev subspace, Lemma 42.1 shows that the last system has a unique solution (a_1, a_2, ..., a_n). Thus the existence and uniqueness of h follow. We now assume to the contrary that h is not the best approximation. Then Theorem 40.1, applied to the space C({x_1, ..., x_{n+1}}), yields a function h_0 in V such that

    |f(x_j) − h(x_j) − h_0(x_j)| < |L(f)|

for j = 1, 2, ..., n + 1. Consequently,

    |L(f)| = |L(f − h − h_0)| ≤ Σ_{j=1}^{n+1} |λ_j| |f(x_j) − h(x_j) − h_0(x_j)| < |L(f)|.

This contradiction completes the proof. •

Let C(a, b) denote the normed space of all continuous and real-valued functions defined on the interval [a, b].

Corollary 50.1 For an n-dimensional Chebyshev subspace V of C(a, b) the coefficients λ_j of the maximal functional L are real and have the following property:

    λ_j λ_{j+1} < 0,   j = 1, 2, ..., n.

Proof Since the space C(a, b) does not contain complex-valued functions, the coefficients λ_j are real. By Lemma 47.1 it remains to show that the signs of λ_j and λ_{j+1} are opposite for 1 ≤ j ≤ n. From Lemma 42.1 we know that there exists a function h in V whose zeros are precisely the points x_k ∈ {x_1, x_2, ..., x_{n+1}} \ {x_j, x_{j+1}}, and such that h(x_j) > 0. We have h(x_{j+1}) > 0, because otherwise h would vanish at some additional point of the interval (x_j, x_{j+1}). Thus, since

    0 = L(h) = λ_j h(x_j) + λ_{j+1} h(x_{j+1}),

we conclude that the quantities λ_j and λ_{j+1} are of opposite signs, as claimed. •
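For the concrete choice V = Π_{n−1} in C(a, b) (an illustration of ours, not code from the text), the maximal functional has an explicit form: the divided-difference weights w_j = 1/Π_{k≠j}(x_j − x_k), normalized so that Σ|w_j| = 1, annihilate every polynomial of degree below n, alternate in sign for ordered points, and have a positive first coefficient — exactly the requirements of Lemma 47.1 and Corollary 50.1.

```python
def maximal_functional(xs):
    """Weights lambda_j with sum |lambda_j| = 1 that annihilate all
    polynomials of degree < len(xs) - 1 (divided-difference weights)."""
    w = []
    for j, xj in enumerate(xs):
        p = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                p *= (xj - xk)
        w.append(1.0 / p)
    s = sum(abs(v) for v in w)
    return [v / s for v in w]

xs = [0.0, 0.3, 1.1, 2.0, 2.5]      # n + 1 = 5 points, so dim V = n = 4
lam = maximal_functional(xs)
# L annihilates Pi_3, its coefficients alternate in sign, lam[0] > 0
L = lambda g: sum(l * g(x) for l, x in zip(lam, xs))
```

The sign alternation is immediate from the formula: consecutive products Π_{k≠j}(x_j − x_k) differ by one negative factor.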
Theorem 51.1 Assume that for an n-dimensional Chebyshev subspace V of C(a, b) and some functions f ∈ C(a, b) and g ∈ V there exist n + 1 points x_j,

    a ≤ x_1 < x_2 < ... < x_{n+1} ≤ b,

such that

    s (−1)^j ( f(x_j) − g(x_j) ) > 0,   j = 1, 2, ..., n + 1,

where either s = 1 or s = −1. Then

    e(f, V) ≥ min_{1≤j≤n+1} |f(x_j) − g(x_j)|.

Proof Indeed, since e(f, V) = e(f − g, V), by Theorem 7.1 we have

    e(f, V) ≥ |L(f − g)|,

where L is the maximal functional for V associated with the points x_j. Corollary 50.1 implies that the products λ_j (f(x_j) − g(x_j)) have a constant sign for 1 ≤ j ≤ n + 1. This and Σ_{j=1}^{n+1} |λ_j| = 1 yield

    |L(f − g)| = Σ_{j=1}^{n+1} |λ_j| |f(x_j) − g(x_j)| ≥ min_{1≤j≤n+1} |f(x_j) − g(x_j)|.

Thus, the theorem follows. •

We are now ready to prove a result which, in addition to Theorem 40.1, characterizes best approximations of functions in C(a, b).

Theorem 51.2 Let V be an n-dimensional Chebyshev subspace of C(a, b) and let f and g be arbitrary functions in C(a, b) and V, respectively. Then g is the best approximation of f in V iff there exist n + 1 points x_j,

    a ≤ x_1 < x_2 < ... < x_{n+1} ≤ b,

such that

    s (−1)^j ( f(x_j) − g(x_j) ) = ||f − g||,   j = 1, 2, ..., n + 1,

where either s = 1 or s = −1.
Proof Assume first that the points x_j exist and that the function g is not the best approximation. Then for some function h ∈ V we have

    ||f − h|| < ||f − g||.

Consequently, the difference h − g = (f − g) − (f − h) has the same sign as f − g at each point x_j. Since these signs alternate, we conclude from the continuity of h − g that h − g has n distinct roots in the interval [a, b], which is a contradiction, since V is a Chebyshev subspace and h − g ∈ V.

Let us now suppose that the function g ∈ V is the best approximation of f. Given n + 1 distinct points ξ_1, ξ_2, ..., ξ_{n+1} ∈ [a, b] we set

    M(ξ_1, ξ_2, ..., ξ_{n+1}) = |L(f)|,

where L is the maximal functional for V associated with the points ξ_j. According to Corollary 48.1 the maximum

    M = max { M(ξ_1, ..., ξ_{n+1}) : a ≤ ξ_1 < ξ_2 < ... < ξ_{n+1} ≤ b }

is attained at some distinct points ξ_j = x_j. Without loss of generality we may assume that M > 0.

In order to complete the proof we need to show that the function g coincides with the best approximation h of f with respect to V on the set {x_1, x_2, ..., x_{n+1}}. To this end, we consider the quantity

    M = |L_0(f)|,

where L_0 is the maximal functional corresponding to the points x_j, and we assume to the contrary that

    |f(x) − h(x)| > M

for some point x ∈ [a, b]. Let us now note that it is always possible to replace one point of the set {x_1, ..., x_{n+1}} with the point x in such a way that for each point x'_j of the resulting set B' we have

    s (−1)^j ( f(x'_j) − h(x'_j) ) ≥ M,   with strict inequality for x'_j = x,
where either s = 1 or s = −1. We now consider the value

    M_1 = |L_1(f)|,

where L_1 is the maximal functional corresponding to the set B'. By Lemma 49.1 and Corollary 50.1 the products λ'_j (f(x'_j) − h(x'_j)) are of the same sign. Thus, since h ∈ V and Σ_{j=1}^{n+1} |λ'_j| = 1, we have

    M_1 = Σ_{j=1}^{n+1} |λ'_j| |f(x'_j) − h(x'_j)| > M.

Since |L_0(f)| = M is the maximal value over all admissible point sets, this is a contradiction. The proof is complete. •

The sequence {x_j}_{j=1}^{n+1} in Theorem 51.2 is called an alternant of f with respect to V. We shall now exhibit three examples of determining best polynomial approximations to continuous functions.

Example 53.1 Let B = {x_0, x_1, ..., x_{n+1}}, where the x_j are distinct real numbers, and let V = Π_n be the space of polynomials of degree at most n. For a given function f : B → R we wish to find an explicit form of the best approximation h of f with respect to V on the set B. According to Lemma 49.1 and Corollary 50.1, the polynomial h is uniquely determined by the conditions

    h(x_j) = f(x_j) − ζ σ(x_j),   j = 0, 1, ..., n + 1,

where ζ is either e(f, V) or −e(f, V) and σ is an arbitrary function such that σ(x_j) = (−1)^j. We now see that h is the Lagrange interpolatory polynomial fitting the data (x_j, f(x_j) − ζ(−1)^j). Thus, we have

    h(x) = Σ_{j=0}^{n+1} ( f(x_j) − ζ(−1)^j ) P(x) / ( P'(x_j)(x − x_j) ),

where P(x) = Π_{j=0}^{n+1} (x − x_j). Since the degree of the polynomial h does not exceed n, the divided difference

    h[x_0, x_1, ..., x_{n+1}] = Σ_{j=0}^{n+1} ( f(x_j) − ζ(−1)^j ) / P'(x_j)
vanishes. Consequently, we get

    ζ = ( Σ_{j=0}^{n+1} f(x_j)/P'(x_j) ) / ( Σ_{j=0}^{n+1} (−1)^j/P'(x_j) ),

and the desired explicit form of the polynomial h follows.

We close this example with an alternate and simpler form of ζ in the case when the set B consists of the extremal points of the Chebyshev polynomial T_{n+1}, i.e., when

    x_j = cos( jπ/(n+1) ),   j = 0, 1, ..., n + 1.

Since x_0 = −x_{n+1} = 1 and x_1, x_2, ..., x_n are the roots of the polynomial U_n(x) = (1 − x²)^{−1/2} sin((n+1) arccos(x)), we see that P(x) is, up to a constant factor, (x² − 1) U_n(x). After standard manipulation, taking this together with the definition of the divided differences, we obtain a simpler equation for the number ζ:

    ζ = (1/(n+1)) Σ″_{j=0}^{n+1} (−1)^j f(x_j),

where the double prime indicates that the first and last terms of the sum are halved. Details of the derivation are left to the reader. •
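Example 53.1 translates directly into code. The sketch below (our own, with hypothetical helper names) computes ζ as a ratio of divided differences and builds h by Lagrange interpolation of the shifted data; the error at the nodes then equioscillates with magnitude |ζ|.

```python
def divided_difference(xs, ys):
    """Top divided difference f[x_0,...,x_m] via the Newton table."""
    coef = list(ys)
    m = len(xs)
    for k in range(1, m):
        for i in range(m - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef[m - 1]

def lagrange(xs, ys, t):
    """Evaluate the interpolating polynomial of the data (xs, ys) at t."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        w = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                w *= (t - xk) / (xj - xk)
        total += yj * w
    return total

def best_on_points(f, xs):
    """Best max-norm approximation of f on the n+2 points xs by a
    polynomial of degree <= n; returns (h as a callable, zeta)."""
    fs = [f(x) for x in xs]
    sign = [(-1.0) ** j for j in range(len(xs))]
    zeta = divided_difference(xs, fs) / divided_difference(xs, sign)
    data = [fj - zeta * sj for fj, sj in zip(fs, sign)]
    return (lambda t: lagrange(xs, data, t)), zeta

import math
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]        # n = 3, five points
h, zeta = best_on_points(math.exp, xs)
errs = [math.exp(x) - h(x) for x in xs]  # equals zeta * (-1)^j at node j
```

Since the divided difference of the shifted data over all n + 2 points vanishes by construction, the interpolant automatically has degree at most n.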
Example 54.1 Let us now determine the polynomial p(x) = Ax + B ∈ Π_1 which is the best approximation of a given function f ∈ C(a, b) whose second derivative has a constant sign on the interval [a, b]. (Thus, f is either convex or concave.) Since dim(Π_1) = 2, the alternant consists of three points a ≤ x_1 < x_2 < x_3 ≤ b. Obviously, the second derivative of the difference f − p has a constant sign. Thus, f − p has at most one extremal point in the interval (a, b). This shows that

    x_1 = a   and   x_3 = b.

Taking these equations together with the conditions of Theorem 51.2 we get

    f(a) − p(a) = −( f(x_2) − p(x_2) ) = f(b) − p(b).

Consequently,

    A = ( f(b) − f(a) ) / (b − a),   B = ( f(a) + f(x_2) )/2 − A (a + x_2)/2,
where x_2 is the unique solution of the equation

    f'(x_2) = ( f(b) − f(a) ) / (b − a).

A geometric interpretation of this result is given in Figure 55.1.

Figure 55.1: Best approximation of a convex function
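The formulas of Example 54.1 can be checked numerically. The sketch below (our own; it assumes f convex with increasing derivative, so bisection can solve f'(x_2) = A) computes the best line for f = exp on [0, 1] and verifies the three-point equioscillation.

```python
import math

def best_line(f, fprime, a, b, tol=1e-12):
    """Best uniform linear approximation Ax + B of a convex f on [a, b]
    (as in Example 54.1): secant slope, x2 solving f'(x2) = A by
    bisection, and B chosen to level the error."""
    A = (f(b) - f(a)) / (b - a)
    lo, hi = a, b                        # f' is increasing for convex f
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fprime(mid) < A:
            lo = mid
        else:
            hi = mid
    x2 = 0.5 * (lo + hi)
    B = 0.5 * (f(a) + f(x2)) - 0.5 * A * (a + x2)
    return A, B, x2

A, B, x2 = best_line(math.exp, math.exp, 0.0, 1.0)
err = lambda x: math.exp(x) - (A * x + B)
# alternant: err(0) = -err(x2) = err(1), each of magnitude e(f, Pi_1)
```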
Example 55.1 Given an integer n ≥ 2 we shall now find the best approximation P ∈ Π_{n−1} of the monomial M_n(x) = x^n on the interval [−1, 1]. We set

    w(x) = x^n − 2^{1−n} T_n(x),

where T_n is the nth Chebyshev polynomial of the first kind. The assertion (e) of Theorem 31.1 implies that w is a polynomial of degree at most n − 1. Since M_n − w = 2^{1−n} T_n, we conclude that the alternant for M_n consists of the n + 1 points

    x_j = cos( jπ/n ),   j = 0, 1, ..., n.

From Theorems 31.1 and 51.2 we thus have P = w and

    e(M_n, Π_{n−1}) = ||M_n − w|| = 2^{1−n} ||T_n|| = 2^{1−n}.
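A small numerical confirmation of Example 55.1 (our own sketch): building T_n from its three-term recurrence, the leading coefficient 2^{n−1} cancels the x^n term of M_n exactly, and the error 2^{1−n} T_n has sup norm 2^{1−n} on [−1, 1].

```python
def cheb_coeffs(n):
    """Coefficients (ascending powers) of T_n via T_{k+1} = 2x T_k - T_{k-1}."""
    t0, t1 = [1.0], [0.0, 1.0]
    for _ in range(n - 1):
        t2 = [0.0] + [2.0 * c for c in t1]     # multiply by 2x
        for i, c in enumerate(t0):
            t2[i] -= c                          # subtract T_{k-1}
        t0, t1 = t1, t2
    return t1 if n >= 1 else t0

def polyval(coeffs, x):
    s = 0.0
    for c in reversed(coeffs):
        s = s * x + c
    return s

n = 6
tn = cheb_coeffs(n)
scale = 2.0 ** (1 - n)
# w = x^n - scale*T_n has degree <= n-1 because scale * tn[n] == 1,
# and the error of the best approximation is scale * T_n:
max_err = max(abs(scale * polyval(tn, i / 1000.0)) for i in range(-1000, 1001))
```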
1.3.3 The Remez algorithm
The proof of Theorem 51.2 yields a general algorithm for computing the best approximation. In Figure 57.1 we present a schematic diagram of such an algorithm, which is due to Remez. Given a function f in C(a, b) and an n-dimensional Chebyshev subspace V of C(a, b), the algorithm determines an h in V such that ||f − h|| = e(f, V). It can be shown that the Remez algorithm converges regardless of the initial approximation of the alternant chosen. However, its convergence may be arbitrarily slow if a poor choice of alternant is made.

In practice one often gives up on determining the point z at which the function |φ| attains its global maximum. Instead one finds a z such that

    |φ(z)| ≥ max { |φ(a + j(b − a)/k)| : j = 0, 1, ..., k },

where k is a sufficiently large natural number that may depend on φ. The stopping condition |φ(z)| ≥ y is then replaced by |φ(z)| ≥ y − ε, where ε is a small positive number (tolerance).

From the proof of Theorem 51.2 it follows that the points x_j can always be exchanged for points ξ_j in such a way that the sign alternation of φ is preserved and the point z joins the new reference set. This leads to a version of the general algorithm called the first algorithm of Remez. Another version is obtained by noting that the conditions h(x_j) + y(−1)^j = f(x_j) imply that the function φ has a zero u_j in each interval (x_j, x_{j+1}), j = 1, 2, ..., n. Thus, one selects the point ξ_j as a sufficiently accurate solution of the following minima and maxima problem:

    ξ_j = argmax { s(−1)^j φ(x) : u_{j−1} ≤ x ≤ u_j }.

Here u_0 = a, u_{n+1} = b, and j = 1, 2, ..., n + 1. This version is called the second algorithm of Remez. It is linearly convergent for any f ∈ C(a, b); under some conditions on the smoothness of f it achieves quadratic convergence.

We should like to remark that, instead of determining the best approximation, one often finds an explicit approximation that is almost as accurate as the optimal one and requires small computational effort. Methods of constructing such approximations will be discussed on pages 72-83.
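A minimal single-exchange sketch of the first Remez algorithm (ours, not the book's flow chart; it uses a grid search for z and a tiny Gaussian-elimination helper): each pass solves the levelled system on the current n + 2 point reference and swaps in the worst grid point while preserving the sign pattern.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting (small helper)."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def remez(f, n, a, b, passes=20):
    """Best uniform approximation of f on [a, b] from Pi_n."""
    ref = sorted(0.5 * (a + b) + 0.5 * (b - a) * math.cos(j * math.pi / (n + 1))
                 for j in range(n + 2))          # Chebyshev extrema as start
    grid = [a + (b - a) * i / 2000.0 for i in range(2001)]
    for _ in range(passes):
        # levelled system: sum_k c_k x_j^k + (-1)^j E = f(x_j)
        A = [[x ** k for k in range(n + 1)] + [(-1.0) ** j]
             for j, x in enumerate(ref)]
        *coef, E = solve(A, [f(x) for x in ref])
        err = lambda x: f(x) - sum(c * x ** k for k, c in enumerate(coef))
        z = max(grid, key=lambda x: abs(err(x)))
        same = [j for j in range(n + 2) if err(ref[j]) * err(z) > 0]
        if not same or abs(err(z)) <= abs(E) * (1 + 1e-12):
            break                                # levelled error is global
        k = min(same, key=lambda j: abs(ref[j] - z))
        ref[k] = z                               # exchange, same error sign
        ref.sort()
    return coef, abs(E)

coef, E = remez(math.exp, 2, -1.0, 1.0)          # best quadratic for exp
```

By Theorem 51.1 the levelled |E| never exceeds e(f, V), so the returned value is a certified lower bound even before full convergence.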
Figure 57.1: Diagram of the Remez algorithm
1.3.4 The Korovkin operators
Some modifications of Fourier series can be used for uniform approximation. An example of such a modification is contained in the assertions (f) and (g) of Theorem 22.1. We shall now discuss how the Fourier technique can be used for approximating continuous 2π-periodic functions f : R → C by trigonometric polynomials. Let C(−π, π) denote the linear space of such functions, equipped with the norm

    ||f|| = max_{x∈[−π,π]} |f(x)|.

Let us choose a triangular array of real numbers {c_{k,n}}, n = 1, 2, ..., k = 1, ..., n, such that the trigonometric polynomials

    y_n(t) = 1/(2π) + (1/π) Σ_{k=1}^{n} c_{k,n} cos(kt)

are nonnegative on the real line for every n. We now define the Korovkin operator K_n : C(−π, π) → C(−π, π) by the equation

    (K_n f)(x) = ∫_{−π}^{π} f(u) y_n(x − u) du,

where f ∈ C(−π, π) and x ∈ R. It is easy to verify that

    (K_n f)(x) = A_0/2 + Σ_{k=1}^{n} c_{k,n} ( A_k cos(kx) + B_k sin(kx) ).

Here A_k = (1/π) ∫_{−π}^{π} f(u) cos(ku) du and B_k = (1/π) ∫_{−π}^{π} f(u) sin(ku) du. Thus, K_n f can be regarded as a modified nth Fourier sum for the function f. As an immediate consequence of the definition of K_n we obtain the following corollary.

Corollary 58.1 A function f in C(−π, π) satisfies the equation

    ∫_{−π}^{π} f(x) dx = 0

iff ∫_{−π}^{π} (K_n f)(x) dx = 0 for every n.
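A concrete admissible array (our own example; the text keeps {c_{k,n}} general) is the Fejér choice c_{k,n} = 1 − k/(n+1), which makes y_n the nonnegative Fejér kernel; K_n f is then the Cesàro mean of the Fourier sums. The sketch below computes A_k, B_k by the periodic trapezoid rule and checks uniform closeness for a continuous 2π-periodic function.

```python
import math

def fourier_coeffs(f, n, m=4000):
    """A_k, B_k for k = 0..n via the periodic trapezoid rule."""
    A = [0.0] * (n + 1)
    B = [0.0] * (n + 1)
    for i in range(m):
        u = -math.pi + 2.0 * math.pi * i / m
        fu = f(u)
        for k in range(n + 1):
            A[k] += fu * math.cos(k * u)
            B[k] += fu * math.sin(k * u)
    h = 2.0 / m                       # step/pi = (2*pi/m)/pi
    return [x * h for x in A], [x * h for x in B]

def korovkin_fejer(f, n):
    """K_n f with the Fejer choice c_{k,n} = 1 - k/(n+1)."""
    A, B = fourier_coeffs(f, n)
    def Knf(x):
        s = A[0] / 2.0
        for k in range(1, n + 1):
            c = 1.0 - k / (n + 1.0)
            s += c * (A[k] * math.cos(k * x) + B[k] * math.sin(k * x))
        return s
    return Knf

f = lambda t: abs(math.sin(t))        # continuous and 2*pi-periodic
K = korovkin_fejer(f, 40)
dev = max(abs(f(x) - K(x)) for x in [-math.pi + i * 0.01 for i in range(629)])
```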
We are now ready to prove the following result.
Theorem 59.1 Let K_n be the Korovkin operator corresponding to the array {c_{k,n}}. Then for every δ > 0,

    ||K_n f − f|| ≤ ( 1 + (π/δ) √((1 − c_{1,n})/2) ) ω(f, δ),

where ω(f, ·) is the modulus of continuity of the function f, i.e.,

    ω(f, δ) = sup { |f(s) − f(t)| : s, t ∈ R, |s − t| ≤ δ }.

Proof We remind the reader that all functions in C(−π, π) are uniformly continuous. Thus, for a fixed positive number ε and a fixed function f ∈ C(−π, π) there exists a positive number δ₀ such that

    |s − t| ≤ δ₀  implies  |f(s) − f(t)| ≤ ε,
    |s − t| > δ₀  implies  |f(s) − f(t)| ≤ 2||f|| sin²((s − t)/2) / sin²(δ₀/2).

These implications taken together yield

    |f(s) − f(t)| ≤ ε + ( 2||f|| / sin²(δ₀/2) ) sin²((s − t)/2)

for any s and t in R. Let us define functions f_0, f_1, and f_2 by the equations

    f_0(x) = 1,   f_1(x) = cos(x),   f_2(x) = sin(x),

where x ∈ R. From the definition of the operator K_n it easily follows that

    K_n f_0 = f_0,   K_n f_1 = c_{1,n} f_1,   K_n f_2 = c_{1,n} f_2.

In addition, we have

    sin²((s − t)/2) = ( f_0(s) − f_1(t) f_1(s) − f_2(t) f_2(s) ) / 2
for arbitrary s and t. Let us note now that the operator K_n is monotone, i.e., |(K_n g)(x)| ≤ (K_n h)(x) for x ∈ R provided that |g(y)| ≤ h(y) for y ∈ R. Thus, fixing t and choosing

    g(s) = f(s) − f(t),   h(s) = ε + ( 2||f|| / sin²(δ₀/2) ) sin²((s − t)/2),

we get

    |(K_n f)(t) − f(t)| = |(K_n g)(t)| ≤ (K_n h)(t) = ε + ( ||f|| / sin²(δ₀/2) ) (1 − c_{1,n}).

Consequently, the uniform convergence of K_n f to f follows (provided c_{1,n} → 1), because ε can be arbitrarily small.

We shall now estimate the quantity Q = |K_n f(x) − f(x)|, where x ∈ [−π, π] is such that Q = ||K_n f − f||. By the definition of K_n and ω(f, ·) we have

    Q ≤ ∫_{−π}^{π} |f(x − u) − f(x)| y_n(u) du ≤ ∫_{−π}^{π} ω(f, |u|) y_n(u) du.

We remind the reader that for any positive numbers μ and δ the modulus of continuity ω(f, ·) satisfies the inequality

    ω(f, μ) ≤ (1 + μ/δ) ω(f, δ).
Thus,

    Q ≤ ω(f, δ) ( 1 + I_n/δ ),

where I_n = ∫_{−π}^{π} |u| y_n(u) du. Since |u| ≤ π sin(|u|/2) on [−π, π], we have

    I_n ≤ π ∫_{−π}^{π} sin(|u|/2) y_n(u) du.

By the Schwarz inequality it follows that

    I_n ≤ π ( ∫_{−π}^{π} sin²(u/2) y_n(u) du )^{1/2} ( ∫_{−π}^{π} y_n(u) du )^{1/2} = π √((1 − c_{1,n})/2).   (61.1)

Taking this inequality together with the last upper bound on Q we finally get

    Q ≤ ( 1 + (π/δ) √((1 − c_{1,n})/2) ) ω(f, δ),

which completes the proof. •

The estimate in Theorem 59.1 can be sharpened by choosing appropriate numbers δ and c_{1,n}. To this end, we shall need the following lemma.

Lemma 61.1 The maximal magnitude M_n of the quantity c_{1,n}, subject to the condition that the polynomial y_n be nonnegative on the real line, is

    M_n = cos( π/(n + 2) ).
Proof We begin by recalling the following result: every even trigonometric polynomial p of degree n that is nonnegative on the real axis can be represented as

    p(t) = | Σ_{k=0}^{n} a_k e^{ikt} |²,

where the a_k are real numbers. For p = y_n, comparing the constant terms and the coefficients of cos(t), we get

    Σ_{k=0}^{n} a_k² = 1/(2π)   and   c_{1,n} = 2π Σ_{k=0}^{n−1} a_k a_{k+1}.

Consequently, we have to maximize 2π Σ_{k=0}^{n−1} a_k a_{k+1} subject to this constraint. We shall find M_n using the Lagrange multiplier technique. To this end, we define

    F(a_0, ..., a_n, λ) = Σ_{k=0}^{n−1} a_k a_{k+1} − λ ( Σ_{k=0}^{n} a_k² − 1/(2π) )

and note that the corresponding system of normal equations takes the form

    a_{k−1} + a_{k+1} = 2λ a_k,   k = 0, 1, ..., n,

with the convention a_{−1} = a_{n+1} = 0. Multiplying the kth equation by a_k and summing, we see that these equations imply c_{1,n} = λ at every critical point. Comparing the normal equations with the three-term recurrence relation for the Chebyshev polynomials of the second kind (see Theorem 31.1), we easily note that the general form of the a_k is

    a_k = a_0 U_k(λ),

where λ is any root of U_{n+1}. Since the largest root of U_{n+1} is cos(π/(n + 2)), the lemma follows. •
1.3.5 Quality of polynomial approximations
Trigonometric and algebraic polynomials are convenient tools for approximating continuous functions. It is therefore important to know how well they can approximate. Given functions f ∈ C(−π, π) and g ∈ C(−1, 1), let us consider the best approximation errors

    E_n^T(f) = inf { ||f − p|| : p ∈ Π_n^T }   and   E_n(g) = inf { |||g − q||| : q ∈ Π_n }.

(Dealing with E_n^T(f) we shall sometimes assume in addition that ∫_{−π}^{π} f(x) dx = 0.) Using Theorem 59.1, Lemma 61.1, and Chebyshev polynomials we can easily derive upper bounds on these errors.

Theorem 63.1 There exists a positive constant K ≤ 1 + π²/2 such that

    E_n^T(f) ≤ K ω(f, 1/n)   and   E_n(g) ≤ K ω(g, 1/n).

Moreover, if ∫_{−π}^{π} f(x) dx = 0, then the first estimate holds also for the error of best approximation of f by trigonometric polynomials in Π_n^T with zero mean.

Proof Let us select a Korovkin operator K_n corresponding to the maximal value M_n of the parameter c_{1,n}. Then by the definition of E_n^T(f) and by Theorem 59.1 we have

    E_n^T(f) ≤ ||K_n f − f|| ≤ ( 1 + (π/δ) √((1 − M_n)/2) ) ω(f, δ),

where δ is an arbitrary positive number and || · || is the norm in C(−π, π). Setting δ = 1/n and using Lemma 61.1 we now get

    E_n^T(f) ≤ ( 1 + nπ sin( π/(2(n + 2)) ) ) ω(f, 1/n) ≤ K ω(f, 1/n),

where K ≤ 1 + π²/2. According to Corollary 58.1, when ∫_{−π}^{π} f(x) dx = 0 the polynomial K_n f has zero mean, so we may use it in the zero-mean variant of the estimate as well.
It remains to prove the estimate on E_n(g). To this end, we define an auxiliary function φ ∈ C(−π, π) by the equation

    φ(t) = g(cos(t)).

Since φ is an even function, its best approximation G ∈ Π_n^T is an even trigonometric polynomial (see Theorem 3.1). Consequently,

    G(t) = Σ_{k=0}^{n} α_k cos(kt)

for some coefficients α_k ∈ R and all t ∈ R. On the other hand, we have

    ω(φ, δ) ≤ ω(g, δ),

since |cos(s) − cos(t)| ≤ |s − t|. Now, by the form of G and by Theorem 31.1 we see that

    G(arccos(x)) = Σ_{k=0}^{n} α_k T_k(x).

Thus, G(arccos(·)) ∈ Π_n and therefore

    E_n(g) ≤ E_n^T(φ) ≤ K ω(φ, 1/n) ≤ K ω(g, 1/n),

which completes the proof. •

Given numbers M > 0 and α ∈ (0, 1], let Lip_M α be defined as the class of all functions f : D → R satisfying the inequality

    |f(s) − f(t)| ≤ M |s − t|^α,   s, t ∈ D,

on the domain D ⊂ R of f. Let us also set

    Lip α = ∪_{M>0} Lip_M α.

We shall now discuss the quality of polynomial approximations to functions whose derivatives belong to Lip_M α.
Theorem 65.1 Let the functions f ∈ C(−π, π) and g ∈ C(−1, 1) be k times differentiable on the intervals [−π, π] and [−1, 1], respectively. Furthermore, let f^{(k)} and g^{(k)} lie in Lip_M α. Then for n = 1, 2, ... and m = k, k + 1, ... we have

    E_n^T(f) ≤ K^{k+1} M n^{−k−α}

and

    E_m(g) ≤ K^{k+1} M / ( A_{m,k} (m − k + 1)^α ),

where K is the constant in Theorem 63.1 and where

    A_{m,k} = m (m − 1) ··· (m − k + 1).

Proof First, let us note that for arbitrary functions

    φ ∈ C(−π, π)   and   γ ∈ C(−1, 1)

we have, by Theorem 63.1,

    E_n^T(φ) ≤ K ω(φ, 1/n)   and   E_m(γ) ≤ K ω(γ, 1/m).

Let us now assume that φ and γ are both differentiable on their domains. We recall that then φ ∈ Lip_{||φ'||} 1 and γ ∈ Lip_{|||γ'|||} 1. Here, || · || and ||| · ||| are the norms in the spaces C(−π, π) and C(−1, 1), respectively. Consequently, we have

    E_n^T(φ) ≤ (K/n) ||φ'||   and   E_m(γ) ≤ (K/m) |||γ'|||.

But

    E_n^T(φ − u) = E_n^T(φ)   and   E_m(γ − v) = E_m(γ)

for any polynomials u ∈ Π_n^T and v ∈ Π_m, so

    E_n^T(φ) ≤ (K/n) ||φ' − u'||   and   E_m(γ) ≤ (K/m) |||γ' − v'|||.
Minimizing over u and v we obtain

    E_n^T(φ) ≤ (K/n) E_n^T(φ')   and   E_m(γ) ≤ (K/m) E_{m−1}(γ').

Let us note that if ∫_{−π}^{π} φ(x) dx = 0, then the first inequality remains valid when the error of φ' is measured over zero-mean polynomials (derivatives of trigonometric polynomials have zero mean), provided that φ is at least twice differentiable on [−π, π]. We are now ready to derive the desired estimates on E_n^T(f) and E_m(g). The foregoing discussion, applied k times, results in the following inequalities:

    E_n^T(f) ≤ (K/n)^k E_n^T(f^{(k)}) ≤ (K/n)^k K ω(f^{(k)}, 1/n) ≤ K^{k+1} M n^{−k−α},

    E_m(g) ≤ (K^k / A_{m,k}) E_{m−k}(g^{(k)}) ≤ K^{k+1} M / ( A_{m,k} (m − k + 1)^α ).

Thus, the theorem follows. •

1.3.6 Converse theorems in polynomial approximation
What can we say about the smoothness of functions f ∈ C(−π, π) and g ∈ C(−1, 1) if for some numbers k ∈ N ∪ {0} and α ∈ (0, 1] the approximation errors E_n^T(f) and E_n(g) are both of order O(n^{−k−α}) as n → ∞? It turns out that we can say a lot. The key to answering this question is the following result.

Lemma 66.1 Given polynomials p ∈ Π_n^T, q ∈ Π_n and given numbers x ∈ [−π, π], y ∈ (a, b), the derivatives p'(x) and q'(y) satisfy the inequalities

    |p'(x)| ≤ n ||p||   and   |q'(y)| ≤ n |||q||| / √((b − y)(y − a)).

Here, || · || and ||| · ||| are the norms in C(−π, π) and C(a, b), respectively.
Proof Let us assume to the contrary that ||p'|| > n ||p||. In order to show that this is impossible we consider the trigonometric polynomial

    P(t) = p(t) − ( p'(ξ)/n ) sin( n(t − ξ) ),

where ξ ∈ [−π, π) is an extremal point of p', i.e., |p'(ξ)| = ||p'||. It is easy to verify that

    sign P( ξ + (2k + 1)π/(2n) ) = −sign( p'(ξ) ) (−1)^k

for k = 0, 1, ..., 2n. Thus, P has at least one root in each interval

    ( ξ + (2j + 1)π/(2n), ξ + (2j + 3)π/(2n) ),

where j = 0, 1, ..., 2n − 1. This property taken together with the Rolle theorem and the periodicity of P results in the conclusion that the derivative P' has at least 2n distinct roots in the interval [−π, π). By the definition of ξ we also have

    P'(ξ) = p'(ξ) − p'(ξ) = 0,

which shows that ξ is a double zero of P'. Consequently, P' has at least 2n + 1 roots, counting multiplicities. Since P' ∈ Π_n^T, it now follows that the polynomial P is a constant, contradicting the alternation of its signs. Hence ||p'|| ≤ n ||p||.

In order to prove the desired estimate on q'(y) we set

    T(x) = q( ( (b − a) cos(x) + (a + b) )/2 ),

where y = ( (b − a) cos(x) + (a + b) )/2. Since T lies in Π_n^T, it follows from the proven part of the lemma that

    |T'(x)| ≤ n |||q|||.

Let us note that

    |T'(x)| = |q'(y)| (b − a) |sin(x)|/2   and   |sin(x)| = 2 √((b − y)(y − a)) / (b − a).

Consequently,

    |q'(y)| ≤ n |||q||| / √((b − y)(y − a)),

as claimed. •

We now answer the opening question of this subsection.
Theorem 68.1 Let k ∈ N ∪ {0}, α ∈ (0, 1], and let W denote the class of all functions f : D → C (D ⊂ R) whose modulus of continuity ω(f, ·) satisfies the relation

    ω(f, δ) = O( δ |ln δ| )   as δ → 0+.

If for some functions f ∈ C(−π, π) and g ∈ C(−1, 1) the approximation errors E_n^T(f) and E_n(g) are both of order O(n^{−k−α}) as n → ∞, then the derivatives f^{(k)} and g^{(k)} exist on the intervals [−π, π] and [a, b] ⊂ (−1, 1), respectively. Furthermore, they both belong to the class Lip α if α ∈ (0, 1), and to the class W if α = 1.

Proof We shall restrict our attention to proving the statement for f. A proof for g can be obtained in a similar way and is left to the reader. By the hypothesis, for any natural number n there exists a polynomial p_n ∈ Π_n^T such that

    ||f − p_n|| ≤ A₁ n^{−k−α},

where A₁ is a positive constant independent of n. Let us define polynomials s_j by the equations

    s_0 = p_1,   s_j = p_{2^j} − p_{2^{j−1}},

where j ∈ N. Then

    f = Σ_{j=0}^{∞} s_j,

the convergence of the infinite series being absolute and uniform on [−π, π], and

    ||s_j|| ≤ ||f − p_{2^j}|| + ||f − p_{2^{j−1}}|| ≤ A₂ 2^{−j(k+α)}

with A₂ independent of j. By repeated use of Lemma 66.1 it follows that

    ||s_j^{(l)}|| ≤ 2^{jl} ||s_j|| ≤ A₂ 2^{−j(k+α−l)}.

Thus, for l = 1, 2, ..., k the series Σ_{j=0}^{∞} s_j^{(l)} converges absolutely and uniformly on [−π, π]. In particular, we have

    f^{(k)} = Σ_{j=0}^{∞} s_j^{(k)}
and

    ||f^{(k)} − p_{2^m}^{(k)}|| ≤ Σ_{j>m} ||s_j^{(k)}|| ≤ A₂' 2^{−mα},

where A₂' > 0 is independent of m. Hence, for m ∈ N we have

    E_{2^m}^T( f^{(k)} ) ≤ A₂' 2^{−mα}.

This implies that for some constant A₃ and all n ∈ N

    E_n^T( f^{(k)} ) ≤ A₃ n^{−α}.

In order to simplify notation we set φ = f^{(k)}. It remains to show that

    φ ∈ Lip α  if α ∈ (0, 1),   and   φ ∈ W  if α = 1.

To this end, let us consider two arguments x and y in [−π, π] such that

    |x − y| = δ,   where 0 < δ < 1/2.

Let us also pick an integer r ≤ ln(2/δ)/ln(2) such that

    2^{−r} ≤ δ < 2^{−r+1}.

Then

    |φ(x) − φ(y)| ≤ Σ_{j=0}^{r} |s_j^{(k)}(x) − s_j^{(k)}(y)| + 2 Σ_{j=r+1}^{∞} ||s_j^{(k)}||

and

    |s_j^{(k)}(x) − s_j^{(k)}(y)| ≤ δ ||s_j^{(k+1)}|| ≤ A₄ δ 2^{j(1−α)},

where the constant A₄ > 0 does not depend on j. Consequently,

    |φ(x) − φ(y)| ≤ A₄ δ Σ_{j=0}^{r} 2^{j(1−α)} + A₄' 2^{−rα}.
Here,

    Σ_{j=0}^{r} 2^{j(1−α)} ≤ A 2^{r(1−α)} ≤ A (2/δ)^{1−α}   if α ∈ (0, 1),
    Σ_{j=0}^{r} 2^{j(1−α)} = r + 1 ≤ 2 ln(2/δ)/ln(2)   if α = 1.

Thus, we finally get the inequality

    |φ(x) − φ(y)| ≤ A₅ δ^α   (α ∈ (0, 1)),    |φ(x) − φ(y)| ≤ A₅ δ ln(2/δ)   (α = 1),

where A₅ > 0 is independent of δ. Now, the desired result follows easily. •

The following example shows that we cannot expect f^{(k)} ∈ Lip α when E_n^T(f) = O(n^{−k−α}) with α = 1.

Example 70.1 Let f be a function in C^k(−π, π) such that

    f^{(k)}(x) = Σ_{j=1}^{∞} 2^{−j} cos(2^j x).

Then

    E_n^T(f) = O(n^{−k−1}).

On the other hand, we have

    ω(f^{(k)}, δ) ≠ O(δ).

(We leave a verification of the last relation to the reader.) Consequently, f^{(k)} ∉ Lip 1. •

Remark In contrast to the trigonometric case, we cannot claim that the derivatives g^{(k)}(−1) and g^{(k)}(1) exist if E_n(g) is of order O(n^{−k−α}) as n → ∞. In fact, the situation is even worse, since these
derivatives may fail to exist even if E_n(g) = 0 for sufficiently large n. The function g(x) = |1 − x²| illustrates this obstacle. •

As far as trigonometric approximation is concerned, and when α ∈ (0, 1), Theorem 68.1 is a full converse of Theorem 65.1. Since the class W is essentially larger than the class Lip 1, a gap appears if α = 1. In order to close this gap we shall now provide a complete characterization of functions in C(−π, π) that are approximable by polynomials in Π_n^T to within O(n^{−k−1}).

Theorem 71.1 Let k ∈ N ∪ {0} and let Z denote the class of all functions φ ∈ C(−π, π) satisfying the inequality

    |φ(x + h) − 2φ(x) + φ(x − h)| ≤ M h,

where x ∈ [−π, π], h ∈ (0, 1), and M is a positive constant depending on φ at most. Then

    E_n^T(f) = O(n^{−k−1})   as n → ∞

holds for a given function f in C(−π, π) iff f^{(k)} exists and belongs to the class Z.

Proof Let us assume that E_n^T(f) = O(n^{−k−1}) as n → ∞. Proceeding as in the proof of Theorem 68.1 we conclude that f^{(k)} exists and can be represented by a uniformly convergent series

    f^{(k)} = Σ_{j=0}^{∞} s_j^{(k)},

where s_j is a trigonometric polynomial of degree 2^j such that

    ||s_j^{(k)}|| ≤ A 2^{−j}   and   ||s_j^{(k+2)}|| ≤ A 2^{j},

with a constant A > 0 independent of j. Since the second difference

    s_j^{(k)}(x + h) − 2 s_j^{(k)}(x) + s_j^{(k)}(x − h)

coincides with h² times the second derivative of s_j^{(k)} evaluated at some point of the interval (x − h, x + h), we get

    |s_j^{(k)}(x + h) − 2 s_j^{(k)}(x) + s_j^{(k)}(x − h)| ≤ K min( 2^{−j}, h² 2^{j} ),
where K > 0 is a constant independent of h and j. Let us now select an integer m such that 1/h ≤ 2^m < 2/h. Then

    |f^{(k)}(x + h) − 2 f^{(k)}(x) + f^{(k)}(x − h)| ≤ K ( h² Σ_{j=0}^{m} 2^j + Σ_{j=m+1}^{∞} 2^{−j} ) ≤ K' h.

Thus, f^{(k)} ∈ Z.

Let us now assume that φ = f^{(k)} is well defined and belongs to the class Z. Proceeding as in the proof of Theorem 65.1 we get

    E_n^T(f) = O( n^{−k} E_n^T(φ) ).

In order to complete the proof it is now enough to show that

    E_n^T(φ) = O(1/n).

To this end, let (K_n φ)(x) = ∫_{−π}^{π} φ(x − u) y_n(u) du be the Korovkin operator corresponding to the maximal value M_n of the parameter c_{1,n}. Since the kernel y_n is even, we have

    (K_n φ)(x) − φ(x) = ∫_0^{π} ( φ(x + u) − 2φ(x) + φ(x − u) ) y_n(u) du.

On the other hand, by the inequality (61.1) and Lemma 61.1, for n → ∞ we get

    ∫_0^{π} u y_n(u) du ≤ (π/2) √((1 − M_n)/2) = O(1/n).

Thus, since φ ∈ Z,

    E_n^T(φ) ≤ ||K_n φ − φ|| ≤ C ∫_0^{π} u y_n(u) du = O(1/n)

for some constant C, and the theorem follows. •

1.3.7 Projection operators
Theoretically, the problem of determining polynomials that best approximate functions in C(a, b) can be solved with arbitrary accuracy
(for instance via the Remez algorithm). However, it turns out that any method for solving this problem (even with limited precision) may require vast amounts of computation unless additional smoothness assumptions are made on the functions to be approximated. Thus, one attempts to develop relatively inexpensive methods leading to approximations that are almost as good as optimal ones. In pursuing such methods for approximation with polynomials in Π_n one additionally imposes the following natural requirements.

• The method should work for any function in C(a, b).
• Polynomials produced by the method should depend linearly and continuously on the function being approximated.
• If the function being approximated is a polynomial of degree at most n, the method should reconstruct the function.

These three conditions restrict our attention to methods of the form P_n : C(a, b) → Π_n, where P_n is a linear and continuous operator such that

    P_n(q) = q   for every q ∈ Π_n.

Operators satisfying the above conditions will be referred to as projection operators. Their complete characterization is provided by the following theorem.

Theorem 73.1 (i) A mapping P_n : C(a, b) → Π_n is a projection operator iff there exist polynomials

    q_0, q_1, ..., q_n ∈ Π_n

and functions of bounded variation

    μ_0, μ_1, ..., μ_n

such that

    (P_n f)(x) = Σ_{j=0}^{n} q_j(x) ∫_a^b f(t) dμ_j(t),   f ∈ C(a, b),

and

    Σ_{j=0}^{n} q_j(x) ∫_a^b q(t) dμ_j(t) = q(x)   for every q ∈ Π_n and x ∈ [a, b].

(ii) The norm of any projection operator P_n : C(a, b) → Π_n is given by the equation

    ||P_n|| = max_{x∈[a,b]} V_n(x),
where ||| · ||| is the norm in C(a, b) and, for a fixed x ∈ [a, b], V_n(x) is the total variation of the function Σ_{j=0}^{n} q_j(x) μ_j, i.e.,

    V_n(x) = var( Σ_{j=0}^{n} q_j(x) μ_j ).

Here, the polynomials q_j and the functions μ_j are those in (i).

(iii) If P_n : C(a, b) → Π_n is a projection operator, then

    |||f − P_n(f)||| ≤ ( 1 + ||P_n|| ) E_n(f)   for every f ∈ C(a, b).
Proof Let us assume that P_n : C(a, b) → Π_n is a projection operator. Then for any f ∈ C(a, b) we have

    (P_n f)(x) = Σ_{j=0}^{n} m_j(x) L_j(f),

where m_j(x) = x^j and the L_j are linear continuous functionals. We now recall the Riesz representation theorem: every linear continuous functional L : C(a, b) → C is of the form

    L(f) = ∫_a^b f(t) dμ(t),

where μ is a function of bounded variation, and ||L|| = var(μ). It now follows that

    (P_n f)(x) = Σ_{j=0}^{n} m_j(x) ∫_a^b f(t) dμ_j(t),

where the μ_j are functions of bounded variation. Consequently, since we must have P_n(q) = q for any q ∈ Π_n, we get

    Σ_{j=0}^{n} m_j(x) ∫_a^b q(t) dμ_j(t) = q(x).
Hence, the conditions in assertion (i) hold with q_j = m_j. Conversely, these conditions imply that the mapping P_n is linear and continuous and satisfies the equation P_n(q) = q for any q ∈ Π_n. Thus, it is clearly a projection operator.

Let us now consider the projection operator

    (P_n f)(x) = Σ_{j=0}^{n} q_j(x) ∫_a^b f(t) dμ_j(t).

For a given x ∈ [a, b] the mapping

    f ↦ (P_n f)(x)

defines a linear continuous functional whose norm is V_n(x). Thus,

    ||P_n|| ≤ max_{x∈[a,b]} V_n(x),

and, since for each x the bound V_n(x) is attained in the limit by suitable f with |||f||| ≤ 1, equality holds. This proves assertion (ii).

Next, let f be an arbitrary function in C(a, b). Then for a polynomial p ∈ Π_n such that |||f − p||| = E_n(f) we have

    |||f − P_n(f)||| = |||(f − p) − P_n(f − p)||| ≤ ( 1 + ||P_n|| ) |||f − p|||.

Consequently,

    |||f − P_n(f)||| ≤ ( 1 + ||P_n|| ) E_n(f),

which is assertion (iii).
We shall now show that the factor 1 + ||P_n|| in this inequality is, in general, best possible. To this end, let us pick an arbitrary positive number ε and note that there exists a function g ∈ C(a, b) with |||g||| ≤ 1 such that

    |||P_n(g)||| ≥ ||P_n|| − ε.
Thus, there exists a closed subinterval I ⊂ [a, b] of positive length such that the inequality

    |(P_n g)(x)| ≥ ||P_n|| − 2ε

holds on I. Let ξ be an interior point of the interval I and let a function h ∈ C(a, b), |||h||| ≤ 1, be chosen in such a way that h coincides with g outside a small neighbourhood of ξ contained in I, while h(ξ) has modulus one and sign opposite to that of (P_n g)(ξ), and P_n(h) remains uniformly close to P_n(g) on [a, b]. Hence

    |h(ξ) − (P_n h)(ξ)| ≥ 1 + ||P_n|| − 3ε,

and consequently, estimating |||h − P_n(h)||| from below by |h(ξ) − (P_n h)(ξ)| and noting that E_n(h) ≤ |||h||| ≤ 1, we have

    |||h − P_n(h)||| ≥ ( 1 + ||P_n|| − 3ε ) E_n(h).
Since ε > 0 is arbitrary, it now follows that the constant 1 + ||P_n|| in assertion (iii) cannot, in general, be replaced by a smaller one. The proof is complete. •

By virtue of Theorem 73.1, projection operators having small norms are of special practical interest. Below we shall define three projection operators A_n, B_n, and C_n, which are related to Chebyshev polynomials and possess almost minimal norms. Since a linear transformation of the interval [a, b] onto another interval [c, d] does not affect norms of projection operators, we shall restrict our attention to the case [a, b] = [−1, 1]. For brevity, we shall adopt the compact notation

    Σ'_{j=0}^{m} a_j   and   Σ″_{j=0}^{m} a_j
to describe the sums

    (1/2) a_0 + a_1 + ... + a_m   and   (1/2) a_0 + a_1 + ... + a_{m−1} + (1/2) a_m,

respectively. Given a function f ∈ C(−1, 1) let us set

    a_j(f) = (2/π) ∫_{−1}^{1} f(x) T_j(x) (1 − x²)^{−1/2} dx,

    b_{m,j}(f) = (2/m) Σ_{k=1}^{m} f(t_{m,k}) T_j(t_{m,k}),

and

    c_{m,j}(f) = (2/m) Σ″_{k=0}^{m} f(u_{m,k}) T_j(u_{m,k}).

Here,

    t_{m,k} = cos( (2k − 1)π/(2m) )   and   u_{m,k} = cos( kπ/m )

are the roots and the extremal points of the Chebyshev polynomial T_m, respectively. We now define:

    A_n f = Σ'_{j=0}^{n} a_j(f) T_j,   B_n f = Σ'_{j=0}^{n} b_{n+1,j}(f) T_j,   C_n f = Σ″_{j=0}^{n} c_{n,j}(f) T_j.
Remark Let us note that A_n f is the nth partial sum of the Fourier expansion of f in terms of the Chebyshev polynomials T_j. It can be shown that B_n f and C_n f are the Lagrange interpolatory polynomials of f associated with the nodes

    t_{n+1,1}, ..., t_{n+1,n+1}   and   u_{n,0}, ..., u_{n,n},

respectively. (See the exercises at the end of this section.) It is easy to see that a map transforming functions in C(−1, 1) into their interpolatory polynomials associated with fixed nodes

    −1 ≤ x_0 < x_1 < ... < x_n ≤ 1

is a projection operator. We should, however, warn against taking good approximation properties of these operators for granted. It turns out that some selections of nodes lead to practically useless operators. For instance, if the nodes are uniformly distributed on the interval [−5, 5], i.e., x_j = −5 + 10j/(n + 1), then the corresponding Lagrange polynomials of the function f(x) = (1 + x²)^{−1} fail to converge to f(x) at each point of the interval (3.63, 5]. This fact is known in the literature as the Runge example. •

In order to present good approximation properties of the projection operators A_n, B_n, and C_n we need some auxiliary preparations. Let C⁰(−π, π) denote the linear space of all 2π-periodic and even functions g : R → C endowed with the norm

    ||g||₀ = max_{t∈[−π,π]} |g(t)|.

Let Π_n⁰ stand for the linear subspace of C⁰(−π, π) consisting of even trigonometric polynomials of degree ≤ n, i.e.,

    Π_n⁰ = span{ 1, cos(t), cos(2t), ..., cos(nt) }.

Given a function f ∈ C(−1, 1) let f⁰ be defined by

    f⁰(t) = f(cos(t)).
It is easy to verify that the mapping

    f ↦ f⁰

is an isometric isomorphism between the spaces C(−1, 1) and C⁰(−π, π). Moreover, if P_n : C(−1, 1) → Π_n is a projection, the equation

    P_n⁰ f⁰ = (P_n f)⁰

defines an operator P_n⁰ : C⁰(−π, π) → Π_n⁰ such that P_n⁰ h = h for every h ∈ Π_n⁰ and

    ||P_n⁰||₀ = sup_{||g||₀=1} ||P_n⁰ g||₀ = ||P_n||.

Given a real number x, let S_x : C⁰(−π, π) → C⁰(−π, π) be defined by the equation

    (S_x g)(t) = ( g(t + x) + g(t − x) ) / 2,

where g is an arbitrary function in C⁰(−π, π) and t is any real number. We are now in a position to formulate the following result.

Theorem 79.1 Let n be an arbitrary natural number. Then

(a) There exists a projection operator P_n : C(−1, 1) → Π_n of minimal norm.

(b) For every projection operator P_n : C(−1, 1) → Π_n we have

    ||P_n|| ≥ || (1/2)(A_0 + A_n) ||.
A complete proof of this theorem is very space consuming. Therefore, we shall confine ourselves to proving the assertion (b). References concerning Theorem 79.1 are given in the Annotations at the end of this chapter.

Proof of assertion (b) Let P_n : C(−1, 1) → Π_n be an arbitrary projection operator. We shall first show the identity

    (1/π) ∫_0^π ( S_x ( P_n⁰ ( S_x g ) ) )(t) dx = ( (1/2)(A_0⁰ + A_n⁰) g )(t).
Since the set {cos(kt)}_{k=0}^{∞} is linearly dense in C⁰(−π, π) and the operators P_n⁰ and S_x are continuous, it suffices to prove this identity for every g of the form g(t) = cos(kt). To this end, note that for such g,

    ( (1/2)(A_0⁰ + A_n⁰) g )(t) = cos(kt) if k = 0,   (1/2) cos(kt) if 1 ≤ k ≤ n,   0 if k > n.

On the other hand, we also have

    S_x cos(k·)(t) = cos(kx) cos(kt)

and

    P_n⁰ cos(k·) = cos(k·) for k ≤ n,   P_n⁰ cos(k·)(t) = Σ_{j=0}^{n} c_{j,k} cos(jt) for k > n.

Here, the c_{j,k} are some constants independent of t. Denoting by I the right-hand side of the identity to be proven, for k ≤ n we get

    (1/π) ∫_0^π ( S_x P_n⁰ S_x cos(k·) )(t) dx = ( (1/π) ∫_0^π cos²(kx) dx ) cos(kt) = I.
For k > n we obtain

    (1/π) ∫_0^π ( S_x P_n⁰ S_x cos(k·) )(t) dx = Σ_{j=0}^{n} c_{j,k} ( (1/π) ∫_0^π cos(kx) cos(jx) dx ) cos(jt) = 0 = I.

Thus, the desired identity is established.

The proven identity and the mean value theorem for integrals imply that given t ∈ R and g ∈ C⁰(−π, π) there exists a y ∈ [0, π] such that

    ( (1/2)(A_0⁰ + A_n⁰) g )(t) = ( S_y ( P_n⁰ ( S_y g ) ) )(t).

Consequently, if t is chosen in such a way that

    | ( (1/2)(A_0⁰ + A_n⁰) g )(t) | = || (1/2)(A_0⁰ + A_n⁰) g ||₀,

we get

    || (1/2)(A_0⁰ + A_n⁰) g ||₀ ≤ || P_n⁰ ( S_y g ) ||₀ ≤ || P_n⁰ ||₀ || S_y g ||₀ ≤ || P_n⁰ ||₀ || g ||₀,

since ||S_y||₀ is clearly not larger than 1. As g ∈ C⁰(−π, π) can be chosen arbitrarily and ||P_n⁰||₀ = ||P_n||, it now follows that

    || (1/2)(A_0⁰ + A_n⁰) ||₀ ≤ || P_n ||,

or equivalently that

    || P_n || ≥ || (1/2)(A_0 + A_n) ||.

The proof of the assertion (b) is complete.
•
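To make the size of these projection norms concrete: for an interpolation projection, a standard computation (not carried out in the text) identifies its norm with the Lebesgue constant max_x Σ_j |l_j(x)| of the Lagrange basis. The sketch below (ours) evaluates this for n = 10 at the Chebyshev roots used by B_n — the result should match the value 2.49 listed in Table 82.1 — and at equidistant nodes, where it is roughly an order of magnitude larger (the Runge phenomenon in operator-norm form).

```python
import math

def lebesgue_constant(nodes, grid):
    """max over the grid of sum_j |l_j(x)| for the Lagrange basis l_j."""
    best = 0.0
    for t in grid:
        s = 0.0
        for j, xj in enumerate(nodes):
            w = 1.0
            for k, xk in enumerate(nodes):
                if k != j:
                    w *= (t - xk) / (xj - xk)
            s += abs(w)
        best = max(best, s)
    return best

n = 10
grid = [-1 + i / 2000.0 for i in range(4001)]
cheb_nodes = [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]
equi_nodes = [-1 + 2.0 * j / n for j in range(n + 1)]
lam_cheb = lebesgue_constant(cheb_nodes, grid)   # norm of B_10
lam_equi = lebesgue_constant(equi_nodes, grid)   # equidistant interpolation
```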
Remark Let us note that (1/2)(A_0 + A_n) is not a projection operator and, therefore, the assertion (b) of Theorem 79.1 does not imply the existence of a projection operator of minimal norm. •

In the table below we list approximations (to within 10⁻²) of the norms ||A_n||, ||B_n||, ||C_n||, and ||(1/2)(A_0 + A_n)|| for several values of n.
    n                    1      10     20     50     100    500    1000   5000
    ||A_n||              1.44   2.22   2.49   2.86   3.14   3.79   4.07   4.72
    ||B_n||              1.41   2.49   2.90   3.47   3.90   4.92   5.36   6.38
    ||C_n||              1.00   2.43   2.87   3.45   3.89   4.92   5.36   6.38
    ||(A_0 + A_n)/2||    1.00   1.24   1.37   1.54   1.68   2.00   2.14   2.47

Table 82.1: Norms of projection operators

We close this section with a result on approximation of functions having absolutely convergent Chebyshev series.

Theorem 82.1 Let the Chebyshev series
    Σ'_{j=0}^{∞} a_j(f) T_j

of a function f ∈ C(−1, 1) converge absolutely. Then for every nonnegative integer number n we have

    | Σ_{l=0}^{∞} a_{(2l+1)(n+1)}(f) | ≤ E_n(f) ≤ Σ_{j=n+1}^{∞} |a_j(f)|.

Proof Since A_n f ∈ Π_n, ||T_j|| = 1 for j = 0, 1, ..., and the Chebyshev series converges absolutely, we get

    E_n(f) ≤ ||f − A_n f|| = || Σ_{j=n+1}^{∞} a_j(f) T_j || ≤ Σ_{j=n+1}^{∞} |a_j(f)|.

Thus, it remains to derive the lower bound on E_n(f). To this end, let us note that E_n(f) cannot be smaller than the error ξ(f) of best approximation of f with respect to Π_n on the set

    { u_{n+1,0}, u_{n+1,1}, ..., u_{n+1,n+1} }

of extremal points of T_{n+1}. Invoking the form of this error obtained in Example 53.1 we get

    ξ(f) = | (1/(n+1)) Σ″_{j=0}^{n+1} (−1)^j f(u_{n+1,j}) |,

where

    u_{n+1,j} = cos( jπ/(n+1) ).
Since

    f(u_{n+1,j}) = Σ'_{k=0}^{∞} a_k(f) T_k(u_{n+1,j}),

we see that

    ξ(f) = | Σ'_{k=0}^{∞} a_k(f) ζ_k |,

where

    ζ_k = (1/(n+1)) Σ″_{j=0}^{n+1} (−1)^j T_k(u_{n+1,j}).

Let us note that for any integer numbers k and l such that 2l(n + 1) − k ≥ 0 we have

    T_{2l(n+1)−k}(u_{n+1,j}) = T_k(u_{n+1,j}),

and consequently ζ_{2l(n+1)−k} = ζ_k. On the other hand, each ζ_k can be interpreted as the error ξ(T_k). Thus, it is clear that

    ζ_k = 0   for k = 0, 1, ..., n.

Now, from the former identity it follows that ζ_k = ζ_{n+1} if k is an odd multiple of n + 1 and ζ_k = 0 otherwise. Since T_{n+1}(u_{n+1,j}) = (−1)^j, it is easy to verify that ζ_{n+1} = 1. This completes the picture of the numbers ζ_k and results in the desired form of the quantity ξ:

    ξ(f) = | Σ_{l=0}^{∞} a_{(2l+1)(n+1)}(f) |.   •
Theorem 82.1 implies that if the Chebyshev coefficient |a_{n+1}(f)| is much larger than Σ_{j=n+2}^{∞} |a_j(f)|, then the best approximation of the function f in Π_n is almost identical with the polynomial A_n(f).

1.3.8 Exercises
32. Determine an algebraic polynomial p of degree < 3 such that the quantity
is as small as possible.
33. Find a trigonometric polynomial g of degree ≤ 3 that best approximates the function sin(x/2) in the supremum norm on the interval [−π, π].

34. Show that among polynomials of degree precisely n whose leading coefficient equals 1 the polynomial 2^{−max(0,n−1)} T_n has the smallest supremum norm on the interval [−1, 1].

35. Determine a polynomial q ∈ Π_n such that

36. In the class of polynomials p of degree ≤ n such that p(0) = 1 find a polynomial p* whose supremum norm on the interval [1, 2] is as small as possible.

37. Let f ∈ C(a, b) be a function whose nth derivative is strictly positive on the interval (a, b). Show that

    span{ 1, x, ..., x^{n−1}, f }

is an (n + 1)-dimensional Chebyshev subspace of C(a, b).

38. Assume that none of the distinct points α_1, α_2, ..., α_n belongs to the interval [a, b] and prove that

    span{ (x − α_1)^{−1}, (x − α_2)^{−1}, ..., (x − α_n)^{−1} }

is an n-dimensional Chebyshev subspace of C(a, b).

39. Show that for every even function f ∈ C⁰(−π, π) we have E_n^T(f) = E_n(g), where g(x) = f(arccos(x)) for x ∈ [−1, 1].

40. Show that the following mappings are Korovkin operators. (These operators bear the names of de la Vallée Poussin and Jackson, respectively.)
41. Let $Y$ denote the set consisting of all nonnegative functions in the space $C(a,b)$ and let $m_i$ stand for the $i$th monomial, i.e., $m_i(x) = x^i$, $i = 0, 1, \ldots$. A linear operator $K : C(a,b) \to C(a,b)$ is said to be positive if $KY \subseteq Y$. Follow the proof idea of Theorem 59.1 to obtain the following result of Korovkin. If $\{K_j\}_{j=1}^{\infty}$ is a sequence of positive operators such that
then
42. Show that the mapping
is a positive operator satisfying the equations $B_n m_0 = m_0$, $B_n m_1 = m_1$, and $B_n m_2 = m_2 + \frac{1}{n}(m_1 - m_2)$. Then obtain the Weierstrass theorem from the result in Exercise 41. The operator $B_n$ is called the Bernstein operator. 43. Prove that given a polynomial $q \in \Pi_m$ and a natural number $n > m$ we have
where the polynomials $r_1, r_2, \ldots, r_{m-1} \in \Pi_m$ do not depend on $n$. Hint: First prove the assertion for monomials. To accomplish this, apply $m$ times the linear operator $D : \Pi \to \Pi$,
to both sides of the equation
and then substitute $y = 1 - x$.
44. Use the result in Exercise 18 on page 37 to prove the following theorem of Müntz. The space spanned by the functions $g_k(x) = x^{p_k}$ $(0 \le p_1 < p_2 < \cdots)$ is dense in $C(0,1)$ iff $p_1 = 0$ and the series $\sum_{k=2}^{\infty} 1/p_k$ is divergent. 45. Let $\lambda_j$ be the $j$th coefficient of the maximal functional $L$ in Corollary 50.1. Set $\gamma_k = \sum_{j=1}^{k} \lambda_j$ $(k = 1, 2, \ldots, n+1)$ and show that $\mathrm{sign}(\gamma_k) = \mathrm{sign}(\lambda_k)$ and $\sum_{k=1}^{n+1} |\gamma_k| = 1/2$. Hint: Note that $\sum_{j=1}^{n+1} \lambda_j = 0$. Thus, by the Abel summation formula, for every $f \in C(a,b)$ we have
46. Using the result in Exercise 45 prove that under the assumptions of Theorem 51.1 we have
47. Let $f$ and $g$ be arbitrary functions in $C(-1,1)$, and let $x_0, x_1, \ldots, x_{n+1}$ be an alternant of $f$ with respect to $\Pi_n$. Assume that $P_n, Q_n \in \Pi_n$ are such that $\|f - P_n\| = E_n(f)$, and $Q_n$ best approximates the function $g$ on the alternant. Assume also that $L$ is the maximal functional for $\Pi_n$ associated with the points $x_k$. Show that the function
vanishes at each point $x_k$. Next, using the Rolle theorem, prove that if the functions $f$ and $g$ are $(n+1)$ times differentiable on $[-1,1]$, then there is a point $\xi \in (-1,1)$ such that
48. Prove that for any two functions $f$ and $g$ possessing derivatives of order $n+1$ on the interval $[-1,1]$ we have $E_n(f) \le E_n(g)$, provided that $|f^{(n+1)}(x)| \le |g^{(n+1)}(x)|$ for all $x \in [-1,1]$. Hint: By perturbing the function $g$ appropriately, show that without loss of generality one may assume that $|f^{(n+1)}(x)| < |g^{(n+1)}(x)|$ for all $x \in [-1,1]$. Then use the last result in Exercise 47.
49. Prove that for any positive numbers $\lambda$ and $\delta$ the modulus of continuity $\omega(f, \cdot)$ satisfies the inequality
50. Suppose that the functions $f, g \in C(-1,1)$ and $h \in C(-\pi,\pi)$ satisfy the equation

What can we say about the smoothness of these functions? 51. Let $f(x) = |x|$ and $g(x) = \sqrt{1-x^2}$ for all $x \in [-1,1]$. Show that $E_{2n}(f) = E_{2n}(g) = O(n^{-1})$ as $n \to \infty$, $f \in \mathrm{Lip}\ 1$, and $g \in \mathrm{Lip}\ 1/2$ on the interval $[-1,1]$. Why do these results not contradict Theorem 68.1? 52. Prove that a function $f \in C(-\pi,\pi)$ has derivatives of all orders iff $\lim_{n\to\infty} n^k E_n^*(f) = 0$ for every number $k$. 53. Let $B_n$ and $C_n$ be the projection operators defined on page 77, and let $f \in C(-1,1)$. Show that $B_n f$ and $C_n f$ are the Lagrange interpolatory polynomials of $f$ associated with the nodes $t_{n+1,1}, t_{n+1,2}, \ldots, t_{n+1,n+1}$,
and $u_{n-1,0}, u_{n-1,1}, \ldots, u_{n-1,n}$,
respectively. 54. Given a function $f \in C(-1,1)$, let $A_n f$, $B_n f$, $C_n f$, and $a_j(f)$ be defined as on page 77. Prove that $B_n f$ and $C_n f$ can be derived from $A_n f$ by applying suitable quadrature formulas to the integrals $a_j(f)$.
1.4 Annotations
The theory of linear approximation in normed spaces is a very well developed area. Its full presentation would have been so large a task as to delay the completion of this book indefinitely. Thus, we confined ourselves to results on the characterization, uniqueness, basic properties, and construction of best approximations. The selection of the material for this chapter is based on the first author's notes for the courses on methods of approximation he taught at the University of Utah in 1987 and at the University of Warsaw in 1989-92. Probably there is no single book covering the same material, but the significant influence of the monographs [1], [12], [17], and [26] is acknowledged.
Specific comments

Section 1.1: Proofs of the Brouwer fixed point theorem and the Hahn-Banach theorem can be found in [13] and [20], respectively. Theorem 10.1 bears the name of Bernstein, who proved it for $F = C(0,1)$ and $V_k = P_{k-1}$; see [26].
Section 1.2: We refer the reader to [7] for the orthogonalization algorithms and more information on the numerical solution of linear equations. A detailed study of trigonometric Fourier series is given in [27]. Although the Sinc functions are not as classical as the trigonometric ones, they lead to orthogonal series of equal practical interest; see [24]. As indicated in [23], [8], and [22], prolate spheroidal wave functions are a natural and convenient tool of signal processing. See [22] for their additional properties. Theorems 28.1, 30.1, and 31.1 are just illustrations of the theory of orthogonal polynomials [25]. In fact, there are many other classes of inner products for which the corresponding orthogonal polynomials can be found explicitly [3]. The idea of the proofs of Theorems 34.1 and 36.1 was communicated to the first author by Dr. Sawori.

Section 1.3: Theorems 40.1 and 43.1 belong to Kolmogorov and to Haar, respectively (see [12]). Theorem 46.1 admits a far-reaching generalization. It can be shown that $C(B)$ contains a real Chebyshev subspace $V$ of dimension larger than one iff the compact set $B$ is homeomorphic to a subset of the unit circle $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$. This result is due to Mairhuber; see [11]. Its extension to Chebyshev systems over the complex field is given in [21]. A detailed study of Chebyshev systems and their applications is presented in [6]. Theorems 51.1 and 51.2 bear the names of de la Vallée-Poussin and Chebyshev, respectively. The Remez algorithm admits various modifications; see [12] and [18]. An implementation of this algorithm for best approximation by polynomials is presented in [14]. Theorems 63.1 and 65.1 are due to Jackson. Their derivation from the Korovkin theorem (Theorem 59.1) is taken from [12]. The estimate $K \le 1 + \pi^2/2$ can be improved. Actually, the smallest value
of the constant $K$ can be found; see [1]. The converse Theorems 68.1 and 71.1 belong to Bernstein and Zygmund, respectively. We would like to mention that there are many generalizations of these results to various approximation settings, e.g., approximation by polynomials, splines, and rational functions in $L_p$ spaces. The interested reader is referred to [4], [16], [18], and [26]. The material on projection operators is based on [15]. A more abstract and detailed presentation of this subject is given in [10]. Proofs of the assertions (a) and (b) of Theorem 79.1 can be found in [5] and [15], respectively.
1.5 References
[1] N.I. Ahieser. Lectures on the Theory of Approximation. Ungar, New York, 1956.
[2] E.W. Cheney. Introduction to Approximation Theory. McGraw-Hill, New York, 1966.
[3] T.S. Chihara. An Introduction to Orthogonal Polynomials. Gordon and Breach, New York, 1978.
[4] V.K. Dzjadik. Introduction to the Theory of Uniform Approximation of Functions by Polynomials. Nauka, Moscow, 1977. In Russian.
[5] J.R. Isbell and Z. Semadeni. Projection constants and spaces of continuous functions. Trans. Am. Math. Soc., 1(107): 38-48, 1963.
[6] S. Karlin and W.J. Studden. Tchebycheff Systems with Applications in Analysis and Statistics. Interscience, New York, 1966.
[7] D. Kincaid and W. Cheney. Numerical Analysis. Brooks/Cole Publishing Company, 1990.
[8] H.J. Landau and H.O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty, II. Bell System Tech. J., 40: 65-84, 1961.
[9] H.J. Landau and H.O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty, III. Bell System Tech. J., 41: 1295-1336, 1962.
[10] G. Lewicki and W. Odyniec. Minimal Projections in Banach Spaces. Springer-Verlag, Berlin, 1990. Lect. Notes in Math. 1449.
[11] J. Mairhuber. On Haar's theorem concerning Chebysheff problems having a unique solution. Proc. Am. Math. Soc., 7: 609-15, 1956.
[12] G. Meinardus. Approximation of Functions: Theory and Numerical Methods. Springer-Verlag, New York, 1967.
[13] J.M. Ortega and W.C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, London, 1970.
[14] S. Paszkowski. Determination of the best polynomial in the sense of uniform approximation by the second algorithm of Remez. Zastosowania Matematyki, XII(1): 107-22, 1971.
[15] S. Paszkowski. Zastosowania Numeryczne Wielomianow i Szeregow Czebyszewa. PWN, Warszawa, 1975. In Polish.
[16] P.P. Petrushev and V.A. Popov. Rational Approximation of Real Functions. Cambridge University Press, Cambridge, 1987.
[17] M.J.D. Powell. Approximation Theory and Methods. Cambridge University Press, Cambridge, 1981.
[18] J.R. Rice. The Approximation of Functions, Vols. I, II. Addison-Wesley, Reading, Mass., 1964, 1969.
[19] T.J. Rivlin. An Introduction to the Approximation of Functions. Blaisdell, Waltham, Mass., 1969.
[20] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 1974.
[21] I.J. Schoenberg and C.T. Yang. On the unicity of solutions of problems of best approximation. Ann. Mat. Pura Appl., 54(4): 1-12, 1961.
[22] D. Slepian. Some asymptotic expansions for prolate spheroidal wave functions. J. Math. Phys., 44: 99-143, 1965.
[23] D. Slepian and H.O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty, I. Bell System Tech. J., 40: 43-64, 1961.
[24] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, New York, 1993.
[25] G. Szegő. Orthogonal Polynomials. AMS Colloquium Publications, v. XXIII, Providence, 1975.
[26] A.F. Timan. Theory of Approximation of Functions of a Real Variable. Pergamon Press, Oxford, 1963.
[27] A. Zygmund. Trigonometric Series, Vols. I, II. Cambridge University Press, Cambridge, 1968.
Chapter 2

Splines

Spline functions are important approximation tools in numerous applications for which high-degree polynomial methods perform poorly, such as computer graphics and geometric modelling, as well as various engineering problems, especially those involving the graphing of numerical solutions and noisy data. Algorithms based on spline functions enjoy minimal approximation errors and minimal complexity bounds in wide classes of problems. In this chapter we provide a brief introduction to basic classes of polynomial splines, B-splines, and abstract splines. Further study of spline algorithms as applied to linear problems is outlined in Chapter 7.
2.1 Polynomial splines
In this section we define polynomial spline functions, exhibit their interpolatory properties, and construct algorithms to compute them. It turns out that these splines provide interpolating curves that do not exhibit the large oscillations associated with high-degree interpolatory polynomials. This is why they find applications in univariate curve matching in computer graphics. We let $x_j$, $j = 1, \ldots, n$, be distinct points in the interval $[a,b]$,
We say that $S : [a,b] \to \mathbb{R}$ is a spline function of degree $k$ with the knot sequence $x_1, \ldots, x_n$ if: (1) $S$ is a polynomial of degree at most $k$ in every subinterval $[x_i, x_{i+1}]$, $i = 0, \ldots, n$, where $x_0 = a$, $x_{n+1} = b$; and (2) $S$ and all its derivatives of order at most $k-1$ are continuous on $[a,b]$, i.e., $S \in C^{k-1}[a,b]$.
Moreover, a spline function $S$ of degree $2k-1$ that is given by a polynomial of degree at most $k-1$ on $[a, x_1] \cup [x_n, b]$ is called a natural spline. We let $S(k; x_1, \ldots, x_n)$ be the linear space of all splines $S$ of degree $k$ with the knot sequence $x_1, \ldots, x_n$ in $[a,b]$. We first show a representation lemma for splines. Lemma 94.1 A function $S : [a,b] \to \mathbb{R}$ belongs to $S(k; x_1, \ldots, x_n)$ iff
where $p_k(x)$ is a polynomial of degree $k$, and
Proof We first show that a function $S$ given as above is a spline. Indeed, for any index $j$ and $x \in [x_j, x_{j+1})$ the function $S$ is a polynomial of degree $k$, and it is also $(k-1)$-times continuously differentiable for $x \in [a,b]$, i.e., it is a spline. To prove the only if part of the lemma, we take any spline function $S$. Then the $k$-th derivative of $S$ is a piecewise constant function. Letting $t_j = S^{(k)}(x_j+) - S^{(k)}(x_j-)$, we observe that there exists a constant $c$ such that $S^{(k)}$ can be written as
After integrating this expression $k$ times we obtain the required form of $S$ with $a_j = t_j/k!$. • Lemma 94.1 yields the following corollary. Corollary 94.1 A function $S : [a,b] \to \mathbb{R}$ is a natural spline of degree $2k-1$ with the knot sequence $x_1, \ldots, x_n$ iff
where $p_{k-1}$ is a polynomial of degree $k-1$, and
An easy proof of this corollary is left to the reader as an exercise. In the remainder of this section we shall be interested in interpolation and approximation properties of spline functions. We suppose that $[a,b]$ is a finite interval, and that the real numbers $y_i$, $i = 1, \ldots, n$, are given. We wish to construct a spline function $S$ that interpolates this data, i.e., $S(x_i) = y_i$. Below we often assume that the values $y_i$ are samples of a real function $f : [a,b] \to \mathbb{R}$ at the knots $x_i$, $y_i = f(x_i)$. This interpolation problem does not have a unique solution in the class of all splines. However, it does possess a unique solution in the class of natural splines, in the sense of the following theorem. Theorem 95.1 For every choice of the knot sequence $x_i$ and the real numbers $y_i$, $i = 1, \ldots, n$, and for every integer $k$, $1 \le k \le n$, there exists a unique natural spline $S$ of degree $2k-1$ that interpolates the data $x_i, y_i$, i.e., $S(x_i) = y_i$. To prove this theorem we need the following lemma. Lemma 95.1 Let $f \in C^{k-1}(a,b)$ be such that $f^{(k)}$ is continuous in $[a,b] \setminus \{x_1, \ldots, x_n\}$. If $S$ is a natural spline of degree $2k-1$, then
where $a_j$ are the coefficients of the spline $S$ as represented in Corollary 94.1. Proof We first integrate by parts the left-hand side of the above equation $k$ times to get:
since $S^{(k+p)}(a) = S^{(k+p)}(b) = 0$ for $p \ge 0$. Corollary 94.1 implies that for $x \in (x_i, x_{i+1})$ we have
Therefore, by taking into account that $\sum_{j=1}^{n} a_j = 0$ we get
This completes the proof. • We are now ready to prove the theorem. Proof of Theorem 95.1 We first consider the homogeneous problem $S(x_i) = 0$, $i = 1, \ldots, n$, where $S$ is a natural spline of degree $2k-1$. Since $S$ satisfies the assumptions of Lemma 95.1, by setting $f = S$ we get
Therefore $S$ is a polynomial of degree at most $k-1$. Since $S$ has at least $n \ge k$ zeros $x_i$, $S(x) = 0$ for every $x$. This problem can be rewritten as a homogeneous system of $n+k$ linear equations for the coefficients $a_j$ and the coefficients of the polynomial $p_{k-1}$ in Corollary 94.1, as follows.
The matrix of this system is nonsingular, since the system has only the zero solution. Now we observe that the original problem $S(x_i) = y_i$ satisfies the same system with the right-hand side of the $i$-th equation being $y_i$. Therefore, it has a unique solution, since the matrix is nonsingular. • We shall now exhibit an important extremal property of natural splines. Theorem 96.1 Let $S$ be the unique natural spline of degree $2k-1$ interpolating the data $(x_i, y_i)$, $i = 1, \ldots, n$, with $1 \le k \le n$, and let $f$ be any function with bounded $L_2$ norm of its $k$-th derivative such that $f(x_i) = y_i$. Then the $L_2$ norm of $S^{(k)}$ is minimal in this class of
functions, i.e.,
Equality is attained here iff $f = S$ almost everywhere. Proof We take any function $f$ as above. By applying Lemma 95.1 to the function $f - S$ we get:
This means that the functions $S^{(k)}$ and $f^{(k)} - S^{(k)}$ are orthogonal. We observe that if $h$ and $g$ belong to $L_2(a,b)$ and are orthogonal,
then
Moreover, the above integrals are equal iff $g = 0$ almost everywhere in $[a,b]$. Indeed, this result is a direct consequence of the orthogonality of $h$ and $g$ and the formula $(h+g)^2 = h^2 + 2hg + g^2$. By applying this to $h = S^{(k)}$ and $g = f^{(k)} - S^{(k)}$ we get:
In addition, equality holds iff $f^{(k)}(x) - S^{(k)}(x) = 0$ almost everywhere, which means that $f - S$ is almost everywhere a polynomial of degree $k-1$. By taking into account that $f(x_i) = S(x_i)$, $i = 1, \ldots, n$, and $k \le n$, we obtain $f = S$ almost everywhere. This completes the proof. • We now restrict our attention to the most widely used natural cubic spline $S_3$, and construct an algorithm for computing it. We denote $h_i \stackrel{\mathrm{def}}{=} x_{i+1} - x_i$, $i = 1, \ldots, n-1$, and $z_j \stackrel{\mathrm{def}}{=} S_3''(x_j)$, $j = 1, \ldots, n$. Then we observe that the second derivative of $S_3$ is a linear function in each subinterval $[x_i, x_{i+1}]$. Therefore, for any $x \in [x_i, x_{i+1}]$ we get
By integrating the above formula twice we obtain $S_3$ in the form
with coefficients $A$ and $B$ to be determined below. From the interpolatory conditions $S_3(x_i) = y_i$ and $S_3(x_{i+1}) = y_{i+1}$ we get a system of linear equations for $A$ and $B$.
By solving this system we get
and
After simple algebraic transformations we obtain:
The function $S_3$ derived here is a cubic spline if it has a continuous first derivative. We must thus have the left and right derivatives of $S_3$ equal at all points $x_i$, $i = 2, \ldots, n-1$, i.e.,
By using this condition in the equation (98.1) we get a system of $n-2$ linear equations for the unknown $z_i$'s.
Since $S_3$ is a natural spline, we have $z_1 = z_n = 0$. Using this fact and simplifying the above formula we get the following system of linear
equations for the $z_i$'s:
where $u_i = 2(h_i + h_{i+1})$, $v_i = 6(b_i - b_{i-1})$, and $b_i = h_i^{-1}(y_{i+1} - y_i)$. To solve this tridiagonal system of equations we may use Gaussian elimination without pivoting. We now suppose that all $z_i$'s have been computed. How do we then evaluate the spline $S_3$ at a given argument $x$? To accomplish this we first identify the index $i$ for which $x \in [x_i, x_{i+1})$, and we then compute the value $S_3(x)$ from Newton's form of $S_3$ using Horner's algorithm:
where the coefficients $C$, $D$, and $E$ are determined from the equation (98.1) as
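The computation just described, solving the tridiagonal system for the $z_i$ by elimination without pivoting and then evaluating the Newton form by Horner's rule, can be sketched in Python. This is a minimal sketch under the formulas above; the function name and the 0-based indexing are ours, not the book's:

```python
def natural_cubic_spline(xs, ys):
    """Natural cubic spline through (xs[i], ys[i]); xs strictly increasing.

    Solves the tridiagonal system for the interior second derivatives z_i
    (z_1 = z_n = 0 for a natural spline) by Gaussian elimination without
    pivoting, then returns a callable evaluating S_3 by Horner's rule.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]           # knot spacings
    b = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # first divided differences
    u = [0.0] * n
    v = [0.0] * n
    z = [0.0] * n                                           # z[0] = z[n-1] = 0 (natural)
    if n > 2:
        u[1] = 2.0 * (h[0] + h[1])
        v[1] = 6.0 * (b[1] - b[0])
        for i in range(2, n - 1):                           # forward elimination
            u[i] = 2.0 * (h[i - 1] + h[i]) - h[i - 1] ** 2 / u[i - 1]
            v[i] = 6.0 * (b[i] - b[i - 1]) - h[i - 1] * v[i - 1] / u[i - 1]
        for i in range(n - 2, 0, -1):                       # back substitution
            z[i] = (v[i] - h[i] * z[i + 1]) / u[i]

    def S(x):
        i = n - 2
        for j in range(n - 1):                              # locate x in [xs[i], xs[i+1])
            if x < xs[j + 1]:
                i = j
                break
        C = b[i] - h[i] * (2.0 * z[i] + z[i + 1]) / 6.0
        D = z[i] / 2.0
        E = (z[i + 1] - z[i]) / (6.0 * h[i])
        t = x - xs[i]
        return ys[i] + t * (C + t * (D + t * E))            # Horner evaluation

    return S
```

For data sampled from a straight line all interior $z_i$ vanish, so the spline reproduces the line exactly; this is a convenient sanity check for an implementation.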
The spline $S_3$ interpolating the given data $x_i, y_i$ has the following properties. Lemma 99.1 Let $f : [a,b] \to \mathbb{R}$ be any function interpolating the data $x_i, y_i$ ($f(x_i) = y_i$). Then: (i) If the function $f$ is continuously differentiable on $[x_1, x_n]$, then
where $M = \max_{x \in [x_1, x_n]} |f'(x)|$; (ii) if the function $f$ is twice continuously differentiable on $[x_1, x_n]$, then
and
Proof We first prove the assertion (i). To this end we use the form of $S_3$ derived on page 99, and for $x \in [x_i, x_{i+1}]$ we write
where $R_1 = f(x_i) - f(x) + h_i^{-1}[f(x_{i+1}) - f(x_i)](x - x_i)$ and $R_2 = -(h_i/6)(2z_i + z_{i+1})(x - x_i) + D(x - x_i)^2 + E(x - x_i)^3$. By using the formulas for $D$ and $E$ we obtain:
and
Therefore,
We now let $r_i = 6(b_i - b_{i-1})/(h_{i-1} + h_i)$ and observe that the linear equations at the top of page 99 imply:
This estimate yields
The last inequality finally yields
We now easily get
which completes the proof of (i). We next prove the assertion (ii). To this end we set $M_1 = \max_{x \in [x_1, x_n]} |f''(x)|$, we take any $x \in [x_i, x_{i+1}]$, and use the Taylor series expansions of $f(x)$ and $f(x_{i+1})$ at $x_i$ to get
where $\eta_i, \xi_i \in [x_i, x_{i+1}]$. In a similar way we estimate $(b_i - b_{i-1})/(h_{i-1} + h_i)$:
where $f[x_{i-1}, x_i, x_{i+1}]$ is the second divided difference of $f$ taken with respect to $x_{i-1}, x_i$, and $x_{i+1}$, and $\xi_i \in [x_{i-1}, x_{i+1}]$. We now obtain

and finally,
This completes the proof of (ii). We now observe that (iii) has been shown in the general case in Theorem 96.1. •
The last result of Lemma 99.1 says that the spline $S_3$ is the smoothest function, in the sense of minimizing the $L_2$ norm of the second derivative, in the class $C^2(a,b)$ of all functions $f$ interpolating the data $(x_i, y_i)$. The expression $f''(x)(1 + f'(x)^2)^{-3/2}$ is the curvature of the function $f$ at $x$; whenever $f'(x)$ is much smaller than 1, the second derivative $f''(x)$ closely approximates the curvature. In this case the $L_2$ norm of the second derivative gives a good approximation to the total curvature of the function $f$ on $[a,b]$. This is why we say that the spline $S_3$ minimizes the total curvature, and why $S_3$ is widely used in graphics applications. By assuming that the knots are equispaced, i.e., $x_{i+1} - x_i = h$ for every $i = 1, \ldots, n-1$, the parts (i) and (ii) of Lemma 99.1 show that the spline $S_3$ uniformly approximates any given $f$ to within an error of $O(h)$ in case (i) and to within an error of $O(h^2)$ in case (ii), as $h \to 0$. This is in sharp contrast with high-degree interpolatory polynomials based on equispaced nodes, which may uniformly diverge from the interpolated function, as exhibited in the Runge example on page 78. This is why we should use splines rather than high-degree interpolatory polynomials to approximate data.

2.1.1 Exercises
55. Show that the dimension of the spline space $S(k; x_1, \ldots, x_n)$ is $k + n + 1$. Hint: Show that the functions $1, x, x^2, \ldots, x^k, (x - x_1)_+^k, \ldots, (x - x_n)_+^k$ form a basis in this space. 56. Prove Corollary 94.1. 57. Verify that the function $S_3$ given by the formula (98.1) satisfies: (i) $S_3(x_i) = y_i$, $i = 1, \ldots, n$; (ii) $S_3'$ and $S_3''$ are continuous at $x_i$, $i = 2, \ldots, n-1$; (iii) $S_3''(x_1) = S_3''(x_n) = 0$. 58. Derive the formulas for the coefficients of Newton's form of the spline $S_3$ given on page 99. 59. Implement the algorithm for computing the spline $S_3$. Test it for several values of $n$ and various data $x_i, y_i$. Compare the uniform error $\|f - S_3\|_\infty$ with its estimates for functions $f$ satisfying the assumptions of Lemma 99.1.
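Complementing the experiments suggested in Exercise 59, the divergence of equispaced polynomial interpolation remarked at the end of this section can be observed numerically. A small sketch, assuming the classical Runge function $f(x) = 1/(1+25x^2)$ (the specific function of the example on page 78 is not reproduced here, so this choice is our assumption); `newton_interpolant` and `max_error` are our own helper names:

```python
def newton_interpolant(xs, ys):
    """Interpolating polynomial in Newton form via a divided-difference table."""
    n = len(xs)
    c = list(ys)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])

    def p(x):
        acc = c[-1]
        for i in range(n - 2, -1, -1):   # Horner-like evaluation of the Newton form
            acc = acc * (x - xs[i]) + c[i]
        return acc

    return p

def runge(x):
    # Assumed test function; its interpolants at equispaced nodes diverge.
    return 1.0 / (1.0 + 25.0 * x * x)

def max_error(n):
    """Uniform error of degree-n interpolation at n+1 equispaced nodes on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]
    p = newton_interpolant(xs, [runge(x) for x in xs])
    grid = [-1.0 + 2.0 * i / 400 for i in range(401)]
    return max(abs(p(x) - runge(x)) for x in grid)
```

Raising the degree makes the equispaced interpolant worse near the endpoints (`max_error(20)` far exceeds `max_error(10)`), whereas by Lemma 99.1 the cubic spline error shrinks like $O(h^2)$ for this smooth $f$.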
2.2 B-splines
In this section we introduce B-splines, which play an important role in numerical computation, and specifically in Computer Aided Geometric Design. They are defined as divided differences of truncated power functions $(x - x_i)_+$. We show some properties of B-splines, including the spanning of the space of all splines with preassigned knots and the partition of unity, and we derive recurrence formulas for computing them. Let $\{x_i\}_{i=-\infty}^{\infty}$ be an increasing sequence of real numbers ($x_i < x_{i+1}$). We define the $i$-th B-spline $B(x; x_i, \ldots, x_{i+k})$ of degree $k-1$ with knots $x_i, \ldots, x_{i+k}$ as the $k$-th divided difference of the truncated power function $p(t,x) = (t - x)_+^{k-1}$, i.e.,
In this formula $p[x_i, \ldots, x_{i+k}](x)$ is the $k$-th divided difference of the function $p(\cdot, x)$, taken with respect to the first argument at the $k+1$ points $x_i, \ldots, x_{i+k}$ and expressed as a function of $x$. We often simplify the notation by writing
whenever the knot sequence $\{x_i\}$ is fixed. Clearly, the B-spline is a well defined function for all $x \ne x_i$. If $x = x_j$ for some index $j$, then we define

We remark that the assumption of an infinite knot sequence $\{x_i\}$ is not necessary in the definition of B-splines. It is, however, convenient in the formulation of several results of this section. Let us note that the function $B_{i,k}(x)$ is a spline as defined on page 93. Indeed, the divided difference defining $B_{i,k}(x)$ can be written as (see page 53):
where the $a_j$ are some coefficients. The last equation shows that $B_{i,k}(x)$ is a piecewise polynomial of degree $k-1$, and that it belongs to $C^{k-2}(\mathbb{R})$. Hence $B_{i,k}(x)$ is a spline. Below we outline several properties of B-splines.
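The definition can be exercised directly. A minimal sketch with our own helper names, evaluating $B_{i,k}(x)$ as the $k$-th divided difference of $t \mapsto (t-x)_+^{k-1}$ over the knots $x_i, \ldots, x_{i+k}$:

```python
def divided_difference(ts, vals):
    """Highest-order divided difference of the data (ts[j], vals[j])."""
    c = list(vals)
    n = len(ts)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (ts[i] - ts[i - j])
    return c[-1]

def bspline_dd(knots, i, k, x):
    """B_{i,k}(x): k-th divided difference of t -> (t - x)_+^(k-1) over knots[i..i+k]."""
    pts = knots[i:i + k + 1]
    # truncated power in t; explicitly zero for t <= x (avoids 0**0 == 1 for k = 1)
    vals = [(t - x) ** (k - 1) if t > x else 0.0 for t in pts]
    return divided_difference(pts, vals)
```

For knots $0,1,2,3$ and $k = 3$ this yields a quadratic B-spline that vanishes for $x$ outside $[0,3]$ and is positive inside, matching the support and positivity properties established below.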
Lemma 104.1 (i) The spline $B_{i,k}(x)$ vanishes outside of the interval $[x_i, x_{i+k}]$. (ii) $B_{i,k}(x)$ is positive for $x \in (x_i, x_{i+k})$.
Proof To show (i) we fix $x \notin [x_i, x_{i+k}]$. Then the function $p(t,x)$ is the polynomial $(t-x)^{k-1}$ of degree $k-1$ in $t$ for $x < x_i$, and is zero for $x > x_{i+k}$. Therefore, the $k$-th divided difference of $p(\cdot, x)$ is zero, i.e., $B_{i,k}(x) = 0$. We now observe that (ii) is satisfied for $k = 1$, since
Given $k \ge 2$ and $x \in (x_i, x_{i+k})$, we consider the function $g(t) = p(t,x)$, and observe that the B-spline $B_{i,k}(x)$ is the coefficient of $t^k$ in the Lagrange interpolatory polynomial $L_k(t)$ that interpolates $g$ at $x_i, \ldots, x_{i+k}$ (which is the property of divided differences). Since the polynomial $L_k(t)$ is not zero and not equal to the polynomial $(t-x)^{k-1}$, the difference $L_k(t) - g(t)$ has only $k+1$ distinct zeros $x_i, \ldots, x_{i+k}$. From the continuity of $L_k^{(j)}(t) - g^{(j)}(t)$ for $j = 0, \ldots, k-2$ and the intermediate value theorem we conclude that $L_k^{(k-1)}(t) - g^{(k-1)}(t)$ has two sign changes in $(x_i, x_{i+k})$. Since $g^{(k-1)}(t) = 0$ for $t < x$ and $g^{(k-1)}(t) = (k-1)!$ for $t > x$, the function $L_k^{(k-1)}(t)$ is of the form $At + B$, where $A = B_{i,k}(x)\,k!$. For the above condition to be satisfied the coefficient $A$ must be positive, which shows that $B_{i,k}(x) > 0$ as claimed. • We next show that the B-splines add up to one for any given argument $x \in \mathbb{R}$. Lemma 104.2 For every $x \in \mathbb{R}$ we have
Proof We first find the index $j$ such that $x \in [x_j, x_{j+1})$. We assume that $x \ne x_j$, and observe that only a finite number of terms appear in the above summation, namely
since the other terms are equal to zero. From the recurrence relation defining divided differences,
we obtain
Therefore,
The last equation holds since, for $t > x_{j+1}$, the function $p(t,x)$ is a polynomial in $t$ of degree $k-1$, i.e., its $(k-1)$-st divided difference is equal to its leading coefficient, which is one. For $t < x_j$ we have $p(t,x) = 0$; hence the divided difference is also zero. Finally, the right-hand continuity of the B-spline implies that this result holds whenever $x = x_j$. • A sequence of nonnegative functions that satisfy Lemma 104.2 is called a partition of unity. Remark In the first exercise on page 102 we indicated that the functions $x^i$, $i = 0, \ldots, k-1$, $(x - x_j)_+^{k-1}$, $j = 1, \ldots, n$, form a basis of the spline space $S(k-1; x_1, \ldots, x_n)$. It turns out that this basis is ill conditioned for the numerical representation of arbitrary splines. In contrast, the B-splines form a well conditioned basis. This is why we resort to the B-spline representation of general splines. • We now consider an interval $[a,b]$ such that $x_0 < a < x_1$ and $x_n < b < x_{n+1}$, and we define the splines
We then have Theorem 105.1 The B-splines $B_i(x)$, $i = 1, \ldots, n+k$, form a basis of the space $S(k-1; x_1, \ldots, x_n)$ on the interval $[a,b]$.
To prove this theorem we need the following lemmas. Lemma 106.1 Let $s$ be any spline in $S(k-1; x_1, \ldots, x_k)$. If $s(x) = 0$ for all $x \notin [x_1, x_k]$, then $s(x) = 0$ for every $x \in \mathbb{R}$. Proof Lemma 94.1 implies that $s$ can be written as:
where $p(x)$ is a polynomial of degree $k-1$. For $x < x_1$ we have $p(x) = 0$, since $s(x) = 0$. Consequently $s(x) = \sum_{j=1}^{k} a_j (x - x_j)_+^{k-1}$. We now take the points $x_{k+1}, \ldots, x_{2k}$ (recall that $\{x_i\}$ is an infinite sequence of knots), and define the polynomials
Then for
we get
since $s(x) = 0$ for $x > x_k$. As $F(p)$ depends linearly on $p$, and the polynomials $p_i(\cdot)$ form a basis of the space $\Pi_{k-1}$ of all polynomials $q$ of degree at most $k-1$ (see Exercise 62 on page 110), $F(q) = 0$ for any such $q$. By taking the Lagrange interpolatory polynomials $q_m$ of degree $k-1$ such that $q_m(x_j) = \delta_{mj}$, $m, j = 1, \ldots, k$, we obtain
This shows that all coefficients $a_j$ vanish, i.e., $s(x) = 0$ for every $x \in \mathbb{R}$. •
Lemma 106.2 The B-splines $B_i(x)$, $i = 1, \ldots, n$, defined in Theorem 105.1 are linearly independent on $\mathbb{R}$.
Proof We use induction on $n$. The lemma is true when $n = 1$, since $B_1(x)$ is nonzero on $\mathbb{R}$. Suppose now that the lemma is valid for $n-1$, and consider the linear combination
Then suppose that the splines $B_i$ are linearly dependent on $\mathbb{R}$. Hence there exists an index $i_0$ such that $a_{i_0} \ne 0$, and
We now take any $x \in (x_{n-1}, x_n)$, and observe that $0 = f(x) = a_n B_n(x)$. Since $B_n(x) > 0$ for such $x$, we must have $a_n = 0$. From the induction hypothesis we get $a_i = 0$ for $i = 1, \ldots, n-1$, which contradicts $a_{i_0} \ne 0$. • We are finally ready to prove Theorem 105.1. Proof of Theorem 105.1 We only need to prove that the splines $B_i(x)$, $i = 1, \ldots, n+k$, are linearly independent on $[a,b]$, since the dimension of the space $S(k-1; x_1, \ldots, x_n)$ is $n+k$ (see Exercise 55 on page 102). We assume the contrary, i.e., that they are linearly dependent on the interval $[a,b]$, which implies that there exists a nonzero coefficient $a_{i_0} \ne 0$, with $1 \le i_0 \le n+k$, such that
vanishes for every $x \in [a,b]$. We observe that $f(x) = 0$ for $x \in [x_0, x_{n+1})$, since $f$ is a polynomial on $[x_0, x_1)$ and on $[x_n, x_{n+1})$. We now set
Since $g$ is a spline of degree $k-1$ with knots $x_{1-k}, x_{2-k}, \ldots, x_1$, Lemma 106.1 tells us that $g(x) = 0$ for every $x \in \mathbb{R}$. This implies that $f(x) = 0$ for every $x \in (-\infty, x_0)$. Similarly we get $f(x) = 0$ for all $x \in [x_{n+1}, \infty)$. Consequently $f(x) = 0$ for every $x \in \mathbb{R}$. Now by Lemma 106.2 the $B_i$'s are linearly independent on $\mathbb{R}$, and so $a_i = 0$ for every $i = 1, \ldots, n+k$. This contradicts our assumption and completes the proof. •
We next develop a recurrence formula for computing B-splines. To this end we use Steffensen's rule for the divided difference of the product of two functions $f$ and $g$, in the form
We first normalize the spline $B_{i,k}(x)$ as
and we define
so that $(t-x)_+^{k-1} = f(t)\,g(t)$. By using the fact that $f[x_i, x_{i+1}] = 1$, and that the second and higher divided differences of $f$ are zero, we obtain
Rewriting, we have
By multiplying both sides of this expression by $x_{i+k} - x_i$, we get
This formula gives the spline $B_{i,k}(x)$ as a convex linear combination of $B_{i,k-1}(x)$ and $B_{i+1,k-1}(x)$. It thus enables us to compute B-splines at any given $x$ by the following stable algorithm. For a given
argument $x$ we first locate the index $j$ such that $x \in [x_j, x_{j+1})$. If $i < j - k$ or $i > j + 1$, then $B_{i,k}(x) = 0$, since $x$ is outside of the support of $B_{i,k}$. In the opposite case we compute $B_{i,k}(x)$ from the recurrence.
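A minimal sketch of this recurrence, written for the normalized splines (assuming the normalization $N_{i,k} = (x_{i+k} - x_i)B_{i,k}$ introduced above; the function name is ours):

```python
def N(knots, i, k, x):
    """Normalized B-spline of degree k-1 via the two-term recurrence.

    Assumes a strictly increasing knot sequence, so no denominator vanishes.
    For arguments inside the support only nonnegative numbers are added and
    multiplied, which is the source of the algorithm's stability.
    """
    if k == 1:                         # characteristic function of [x_i, x_{i+1})
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = (x - knots[i]) / (knots[i + k - 1] - knots[i]) * N(knots, i, k - 1, x)
    right = (knots[i + k] - x) / (knots[i + k] - knots[i + 1]) * N(knots, i + 1, k - 1, x)
    return left + right
```

Summing $N_{i,k}(x)$ over all indices whose support contains $x$ recovers the partition of unity of Lemma 104.2, a convenient numerical check.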
This method of computing $B_{i,k}(x)$ has excellent numerical stability properties, since we only add and multiply nonnegative numbers at each step.

2.2.1 General spline interpolation
Now we briefly outline a general spline interpolation problem. A particular case of this problem, with restriction to the class of natural splines, was studied in the previous section. We let the splines $B_{i,k}(x)$, $i = 1, \ldots, N$, with $N = n + k$, be defined as in Theorem 105.1, and we let $t_1 < t_2 < \cdots < t_N$ be any interpolation points in $[a,b]$. A general interpolation problem can thus be formulated as follows: for a given set of real numbers $y_i$, $i = 1, \ldots, N$, find a spline $s \in S(k; x_1, \ldots, x_n)$ that satisfies:

The existence and uniqueness of the spline $s$ depends on the location of the knot sequence $x_i$ versus the interpolation sequence $t_i$. It is a subject of extensive study, as outlined in the Annotations to this chapter. Here we summarize one result. Since the spline $s$ can be written as a linear combination of the B-splines $B_{i,k}$, i.e.,
we get a system of linear equations for the unknown coefficients $a_i$, in the form
The matrix of this system, $A = [B_{i,k}(t_j)]_{i,j=1}^{N}$, has a band structure, since each of the splines $B_{i,k}$ has finite support $[x_i, x_{i+k}]$. Therefore the interpolation problem has a unique solution iff this matrix is nonsingular. It turns out that the following result holds. Lemma 110.1 The matrix $A$ is nonsingular iff $t_i \in [x_i, x_{i+k}]$ for every $i = 1, \ldots, N$, i.e., whenever all its diagonal elements $B_{i,k}(t_i)$ are positive. In addition, for $m = 1, \ldots, N$, all $m \times m$ submatrices of $A$ have nonnegative determinants. This implies that Gaussian elimination without pivoting is a numerically stable algorithm for solving this system. These results are referred to in the Annotations to this chapter.

2.2.2 Exercises
60. Prove Steffensen's formula for the divided difference of the product of two functions, which we used to derive the recurrence relation for the computation of B-splines on page 108, i.e.,

Hint: Use induction on the number of points $n$, and the fact that $h[x_0, \ldots, x_n] = f[x_1, \ldots, x_n]$, where $h(x) = (x - x_0) f(x)$. This fact can be established as follows: we first observe that $f[x_1, \ldots, x_n]$ is the leading coefficient of the polynomial $p_{n-1}(x)$ of degree $n-1$ interpolating $f$ at $x_1, \ldots, x_n$. The polynomial $q_n(x) = (x - x_0) p_{n-1}(x)$ interpolates $h(x)$ at $x_0, \ldots, x_n$. Since $q_n(x)$ has the same leading coefficient $c$ as $p_{n-1}(x)$, and since $c = h[x_0, \ldots, x_n]$, the above result follows. 61. Implement the general interpolation algorithm outlined on page 109. Generate the matrix of the system by using the recursive algorithm for the computation of B-splines from page 109. Exploit the band structure of the linear system while solving it with Gaussian elimination without pivoting. Test your algorithm for several choices of the interpolation points, degrees of splines, and dimensions $N$. 62. Define the polynomials $p_i(x) = (x_i - x)^{k-1}$, $x \in \mathbb{R}$, where the $x_i$ are distinct real numbers, $i = 1, \ldots, k$. Show that these polynomials form a basis of the space $\Pi_{k-1}$ of real polynomials of degree at most $k-1$.
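In the spirit of Exercise 61, here is a sketch of the general interpolation scheme: the collocation matrix is generated from the B-spline recurrence (in normalized form, an assumption on our part) and the system is solved by Gaussian elimination without pivoting. For brevity the elimination below works on the full matrix rather than exploiting the band structure; all names and the uniform-knot test setup are our own:

```python
def N(knots, i, k, x):
    """Normalized B-spline of degree k-1 (two-term recurrence, increasing knots)."""
    if k == 1:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = (x - knots[i]) / (knots[i + k - 1] - knots[i]) * N(knots, i, k - 1, x)
    right = (knots[i + k] - x) / (knots[i + k] - knots[i + 1]) * N(knots, i + 1, k - 1, x)
    return left + right

def solve_without_pivoting(A, y):
    """Gaussian elimination with no row exchanges (stable for these matrices)."""
    n = len(y)
    A = [row[:] for row in A]
    y = y[:]
    for p in range(n):
        for r in range(p + 1, n):
            m = A[r][p] / A[p][p]
            for c in range(p, n):
                A[r][c] -= m * A[p][c]
            y[r] -= m * y[p]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (y[r] - s) / A[r][r]
    return a

def interpolate_spline(knots, k, ts, ys):
    """Coefficients a_i with sum_i a_i N_{i,k}(t_j) = y_j for all j."""
    m = len(ts)
    A = [[N(knots, i, k, t) for i in range(m)] for t in ts]
    return solve_without_pivoting(A, ys)
```

With uniform knots and each $t_j$ at the center of the support of the $j$-th spline, the diagonal entries are positive, the condition of Lemma 110.1 holds, and the computed spline reproduces the data to machine precision.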
2.3 General splines
In this section we define and outline general properties of splines in linear spaces. We assume that $F$ is a linear space with a field of scalars $K = \mathbb{R}$ or $\mathbb{C}$, and that $N : F \to K^n$ is a linear operator. We also assume that a linear operator $T : F \to H$ is given, where $H$ is a normed linear space. Given a vector $y \in K^n$, we define a spline interpolating $y$ with respect to $N$ as an element $\sigma = \sigma(y)$ of the space $F$ that satisfies the conditions

$N(\sigma) = y \qquad\text{and}\qquad \|T\sigma\| = \min\{\|Tf\| : f \in F,\ N(f) = y\}.$

Several properties of spline elements are characterized below. We first define the set $P(x)$ of all elements $h$ in $\ker N$ that minimize the distance $\|Th - x\|$. Then the following lemma holds.

Lemma 111.1 Let $y$ be a vector in $K^n$, and let $f \in F$ be such that $N(f) = y$.
(i) A spline interpolating $y$ exists iff the set $P(Tf)$ is not empty.
(ii) An element $\sigma$ with $N(\sigma) = y$ is a spline iff $f - \sigma \in P(Tf)$.
(iii) The spline interpolating $y$ is unique iff $P(Tf)$ is a singleton and $\ker N \cap \ker T = \{0\}$.

Proof We first prove (i). If a spline $\sigma$ exists, set $h_0 = f - \sigma$; then $h_0 \in \ker N$ and, for every $h \in \ker N$,

$\|Th_0 - Tf\| = \|T\sigma\| \le \|T(f - h)\| = \|Th - Tf\|,$

which means that $h_0 \in P(Tf)$. We now assume that $P(Tf) \ne \emptyset$ for some $f$, and we select an element $h \in P(Tf)$. Since $N(h) = 0$, for $\sigma = f - h$ we get $N(\sigma) = y$ and $\|T\sigma\| = \|Tf - Th\| \le \|Tf - Th'\| = \|T(f - h')\|$ for every $h' \in \ker N$.
This shows that $\sigma$ is a spline interpolating $y$. We next prove (ii). We suppose first that $\sigma$ is a spline with $N(\sigma) = y$. Then any $f$ with $N(f) = y$ satisfies $f = \sigma + h_0$ for some $h_0 \in \ker N$. Hence $N(f) = N(\sigma) = y$, and for every $h \in \ker N$,

$\|T(f - \sigma) - Tf\| = \|T\sigma\| \le \|T(f - h)\| = \|Th - Tf\|.$

This means that $f - \sigma \in P(Tf)$. Next, we prove the "only if" part of assertion (ii). Since $f - \sigma \in P(Tf)$, we have $N(f - \sigma) = 0$, and for every $g$ with $N(g) = y$,

$\|T\sigma\| = \|T(f - \sigma) - Tf\| \le \|T(f - g) - Tf\| = \|Tg\|.$

Thus $\sigma$ is a spline. We now prove assertion (iii). We first assume that $N(f) = y$, and observe that if there exist two different elements $h_1, h_2 \in P(Tf)$, then (proceeding as in the proof of part (i)) we conclude that $\sigma_1 = f - h_1$ and $\sigma_2 = f - h_2$ are distinct splines, i.e., the spline is not unique. To prove the "only if" part, we suppose that there exist two distinct splines $\sigma_1 \ne \sigma_2$ interpolating a given vector $y = N(f)$ for some $f$ such that the set $P(Tf)$ is not empty. Assertion (ii) then yields that $f - \sigma_1 \in P(Tf)$ and that $f - \sigma_2 \in P(Tf)$. As $f - \sigma_1 \ne f - \sigma_2$, the set $P(Tf)$ is not a singleton. Finally, we suppose that there exists an $s \ne 0$, $s \in \ker N \cap \ker T$. Assertion (i) then implies that for some $f$ such that $N(f) = y$, the set $P(Tf)$ is not empty. We now select $h \in P(Tf)$, and we consider the spline $\sigma = f - h$. By defining $\sigma_1 = \sigma + s$, we get $\sigma \ne \sigma_1$, $T\sigma = T\sigma_1$, and $N(\sigma_1) = y$. This shows that $\sigma_1$ is also a spline, and contradicts the uniqueness of the spline element. □

In what follows, we suppose that $H$ is a Hilbert space, and that $T(\ker N)$ is a closed subspace of $H$. Then Lemma 111.1 implies that the spline $\sigma = \sigma(y)$ exists for $y = N(f)$, since the set $P(Tf)$ is not empty for every $f$. Below we prove two important properties of splines.

Lemma 112.1 An element $\sigma = \sigma(y)$ is a spline iff $N(\sigma) = y$ and $T\sigma$ is orthogonal to $T(\ker N)$, i.e., $(T\sigma, Th) = 0$ for every $h \in \ker N$.
Proof We first prove that if $\sigma(y)$ is a spline, then $T\sigma$ is orthogonal to $T(\ker N)$. To this end, for any $h \in \ker N$ and any real constant $\alpha$, we have $N(\sigma + \alpha h) = N(\sigma) = y$. Moreover,

$\|T\sigma\|^2 \le \|T(\sigma + \alpha h)\|^2 = \|T\sigma\|^2 + 2\alpha\,\Re(T\sigma, Th) + \alpha^2\|Th\|^2.$

Thus $0 \le \alpha^2\|Th\|^2 + 2\alpha\,\Re(T\sigma, Th)$. Consequently $\Re(T\sigma, Th) = 0$, since otherwise for sufficiently small $|\alpha|$ of appropriate sign the above inequality is not satisfied; replacing $h$ by $ih$ in the complex case yields $(T\sigma, Th) = 0$. This shows that $T\sigma$ is orthogonal to $T(\ker N)$. We now prove the "only if" part of the lemma. We take any $f \in F$, and we assume that $\sigma$ is such that $N(\sigma) = y = N(f)$, and that $(T\sigma, Th) = 0$ for every $h \in \ker N$. Then, since $f - \sigma \in \ker N$,

$\|Tf\|^2 = \|T\sigma + T(f - \sigma)\|^2 = \|T\sigma\|^2 + \|T(f - \sigma)\|^2 \ge \|T\sigma\|^2,$
which implies that $\sigma$ is a spline. □

The next lemma gives two more properties of splines.

Lemma 113.1 The spline element $\sigma = \sigma(y)$
(i) depends linearly on $y$, and
(ii) is a center of symmetry of the set $V(y) = \{f \in F : N(f) = y \text{ and } \|Tf\| \le 1\}$. Here center of symmetry means that $2\sigma - f \in V(y)$ for any $f \in V(y)$.

Proof We leave the proof of (i) as an exercise for the reader. To show (ii) we need to verify that $2\sigma - f \in V(y)$ for every $f \in V(y)$. Clearly $N(2\sigma - f) = 2y - y = y$. Moreover, since $f - \sigma \in \ker N$ and $T\sigma$ is orthogonal to $T(\ker N)$,

$\|T(2\sigma - f)\|^2 = \|T\sigma - T(f - \sigma)\|^2 = \|T\sigma\|^2 + \|T(f - \sigma)\|^2 = \|Tf\|^2 \le 1,$

which implies that $2\sigma - f \in V(y)$. □
2.3.1 Exercises
63. Show that the spline element is homogeneous, i.e., that if $\sigma(y)$ is the spline interpolating $y$, then $\sigma(\lambda y) = \lambda\sigma(y)$ for every scalar $\lambda \in K$.
2.4 Annotations
Spline functions were introduced by Schoenberg in his famous paper [17]. An extensive history of the field can be found in the monographs [19] and [20]. Several earlier developments of spline theory, specifically in the univariate case, are summarized in [1], [10], and [8]. The field of spline theory and applications has been an active research area over the last 40 years. This research gave rise to hundreds of papers and several monographs, some of which are listed below. We should specifically mention important applications in the area of Computer Aided Geometric Modeling that are summarized in [3] and [9]. Sections 2.1 and 2.2 of this chapter are based on the classical theory of univariate polynomial splines and B-splines as outlined in the monographs [6], [7], [10], and [20]. Section 2.3 is based on papers [2] and [12], which deal with generalizations of splines to arbitrary linear spaces.

Specific comments

Sections 2.1 and 2.2 deal with splines with single knots only. These splines have been generalized to the case of multiple knots, as summarized in [6].

Section 2.1: The minimization property of natural splines as expressed in Theorem 96.1 was first shown by Holladay [11].

Section 2.2: The proof of the positivity of $B_{i,k}$ on $[x_i, x_{i+k}]$ follows the idea of Bojanov [5]. In a more classical approach, the recurrence formulas for B-splines are used to show that property by induction.
The nonsingularity of the matrix $A$ (see Lemma 110.1) in the general interpolation problem has been characterized in [18]. The total positivity of this matrix is studied in [14] and [15]. The study of existence and uniqueness of the solution to the interpolation problem as outlined on page 109 is equivalent to the analysis of nonsingularity of the collocation matrix $[B_{i,k}(t_j)]$. This is summarized nicely in [6] (Chapter 4). The location of the knot sequence versus the interpolatory sequence is also studied in [6].
2.5 References
[1] J.H. Ahlberg, E.N. Nilson, and J.L. Walsh. The Theory of Splines and Their Applications. Academic Press, New York, 1967.
[2] P.M. Anselone and P.J. Laurent. A general method for the construction of interpolating or smoothing spline functions. Numer. Math., 12: 66-82, 1968.
[3] R. Bartels, J. Beatty, and B. Barsky. An Introduction to Splines for Use in Computer Graphics and Geometric Modeling. Morgan Kaufmann, 1987.
[4] E.O. Böhmer. Spline-Funktionen. Teubner, Stuttgart, 1974.
[5] B.D. Bojanov. On the total positivity of the truncated power kernel. Colloq. Math., 60/61: 594-600, 1990.
[6] B.D. Bojanov, H.A. Hakopian, and A.A. Sahakian. Spline Functions and Multivariate Interpolations. Kluwer, Boston, 1993.
[7] C. de Boor. A Practical Guide to Splines. Springer-Verlag, New York, 1978.
[8] Z. Ciesielski. The Theory of Spline Functions (in Polish). Gdansk University Press, 1976.
[9] G. Farin. Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide. Academic Press, San Diego, 1988.
[10] T.N.E. Greville. Introduction to spline functions. In Theory and Applications of Spline Functions, ed. by T.N.E. Greville. Academic Press, New York, 1969.
[11] J.C. Holladay. Smoothest curve approximation. Math. Tables Aids Computation, 11: 233-43, 1957.
[12] R. Holmes. R-splines in Banach spaces: I. Interpolation of linear manifolds. J. Math. Anal. Appl., 40: 574-93, 1972.
[13] J. Jankowska and M. Jankowski. Survey of Numerical Methods, Part I (in Polish). WNT, Warsaw, 1981.
[14] S. Karlin. Total Positivity, Volume I. Stanford University Press, 1968.
[15] S. Karlin. Total positivity, interpolation by splines, and Green's functions of differential operators. J. Approximation Theory, 4: 91-112, 1971.
[16] N.P. Korneichuk. Splines in Approximation Theory (in Russian). Nauka, Moscow, 1984.
[17] I.J. Schoenberg. Contributions to the problem of approximation of equidistant data by analytic functions. Quart. Appl. Math., 4: 45-99, 112-41, 1946.
[18] I.J. Schoenberg and A. Whitney. On Pólya frequency functions, III: The positivity of translation determinants with application to the interpolation problem by spline curves. Trans. Amer. Math. Soc., 74: 246-59, 1953.
[19] I.J. Schoenberg. Cardinal Spline Interpolation. CBMS 12, SIAM, Philadelphia, 1973.
[20] L. Schumaker. Spline Functions: Basic Theory. Wiley, New York, 1981.
[21] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer-Verlag, New York, 1993.
[22] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1991.
Chapter 3

Sinc Approximation

Sinc methods are a self-contained family of methods of approximation, which have several advantages over classical methods of approximation in the presence of end-point singularities, in the case of a semi-infinite or infinite interval of approximation, or in the presence of a boundary layer. For example, the error of approximation of a function $f$ over a finite interval $[a,b]$ by a polynomial of degree $n$, with $f$ analytic in some domain containing $[a,b]$, is $O(e^{-cn})$, where $c$ is some positive constant independent of $n$. This rapid rate of convergence drops considerably when $f$ has a singularity at an end point of $[a,b]$. For example, the error of best approximation of $(x - a)^{\alpha}$, with $0 < \alpha < 1$, by a polynomial of degree $n$ on the interval $[a,b]$ is $O(n^{-\alpha})$. On the other hand, the error of best $n$-term Sinc approximation is $O(e^{-\gamma n^{1/2}})$ in each of these cases, where $\gamma$ is a constant independent of $n$. In this chapter we introduce Sinc methods for approximating the operations of calculus, such as differentiation and indefinite integration. We also introduce methods of approximation and inversion of Laplace and Hilbert transforms, and methods of approximating definite and indefinite convolutions.
3.1 Basic definitions

Sinc functions are defined in Chapter 1 on pages 23-25. Let us at once introduce some basic definitions of Sinc approximation. Let $d$ denote a positive constant, and let $\mathcal{D}_d$ denote the region

$\mathcal{D}_d = \{z \in \mathbb{C} : |\Im z| < d\}.$
Let $\mathcal{D}$ be a simply connected domain having boundary $\partial\mathcal{D}$. Let $a$ and $b$ denote two distinct points of $\partial\mathcal{D}$, and let $\phi$ denote a conformal map of $\mathcal{D}$ onto $\mathcal{D}_d$, such that $\phi(a) = -\infty$ and $\phi(b) = \infty$. Let $\psi = \phi^{-1}$ denote the inverse map, and let $\Gamma$ be defined by

$\Gamma = \{\psi(x) : x \in \mathbb{R}\}.$

Given $\phi$, $\psi$, and a positive number $h$, let us define the Sinc points $z_k$ by

$z_k = \psi(kh), \qquad k = 0, \pm 1, \pm 2, \dots.$

Let us also define $\rho$ by $\rho = e^{\phi}$. Let $\mathrm{Hol}(\mathcal{D})$ denote the class of all functions $F$ that are analytic in $\mathcal{D}$. Let $1 \le p \le \infty$, and let $\mathbf{H}^p(\mathcal{D})$ denote the family of all functions $F \in \mathrm{Hol}(\mathcal{D})$ such that the norm $N_p(F, \mathcal{D})$ is finite. In the case $p = 1$ we shall simply write $N(F, \mathcal{D})$ instead of $N_1(F, \mathcal{D})$. Corresponding to positive numbers $\alpha$ and $\beta$, let $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ denote the family of all functions $F \in \mathrm{Hol}(\mathcal{D})$ for which there exists a constant $C$ such that

$|F(z)| \le C\,\frac{|\rho(z)|^{\alpha}}{(1 + |\rho(z)|)^{\alpha+\beta}}$

for all $z$ in $\mathcal{D}$. We shall simply denote $\mathbf{L}_{\alpha,\alpha}(\mathcal{D})$ by $\mathbf{L}_{\alpha}(\mathcal{D})$. It is convenient to define yet another important family of functions, $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$, with $0 < \alpha < 1$, $0 < \beta < 1$, and $0 < d < \pi$. The family $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$ consists of all those functions $F \in \mathrm{Hol}(\mathcal{D}) \cap C(\bar{\mathcal{D}})$ such that $G \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, where $G = F - \mathcal{L}F$, and where $\mathcal{L}F$ is defined by

$(\mathcal{L}F)(z) = \frac{F(a) + \rho(z)\,F(b)}{1 + \rho(z)}.$

We shall denote $\mathbf{M}_{\alpha,\alpha}(\mathcal{D})$ by $\mathbf{M}_{\alpha}(\mathcal{D})$. It may be shown that $\mathbf{M}_{\alpha}(\mathcal{D})$ contains the class $\mathrm{Lip}_{\alpha}(\bar{\mathcal{D}}) \cap \mathrm{Hol}(\mathcal{D})$ for the case of the bounded regions that we shall encounter in the
examples of this chapter. We remark, also, that if $G = F - \mathcal{L}F$, with $\mathcal{L}$ defined as above, then the term $\mathcal{L}F$ satisfies the relations

$\lim_{z\to a}(\mathcal{L}F)(z) = F(a), \qquad \lim_{z\to b}(\mathcal{L}F)(z) = F(b),$

where the limits are taken from within $\mathcal{D}$, and this is why we require $0 < \alpha, \beta < 1$ and $0 < d < \pi$ in the definition of the class $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$; these restrictions ensure that $\mathcal{L}F$ is well defined and analytic in $\mathcal{D}$. The classes $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ and $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$ are important to Sinc methods: they are easy to recognize, they house the solutions of most differential and integral equation problems arising in applications, and they guarantee the rapid convergence of Sinc approximation described in the theorems that follow. Let us next state and prove some properties of the spaces $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ and $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$, which are simple consequences of the above definitions.

Theorem 119.1 Let $\alpha \in (0,1]$, $\beta \in (0,1]$, $d' \in (0,\pi)$, let $\mathcal{D}' = \psi(\mathcal{D}_{d'})$, and for some fixed $d \in (0,d')$, let $\mathcal{D} = \psi(\mathcal{D}_d)$. Let $F \in \mathrm{Hol}(\mathcal{D}')$, and let $\mathcal{I}F$ denote the indefinite integral of $F$.
1. If $F \in \mathbf{H}^{\infty}(\mathcal{D}')$, then $F'/\phi' \in \mathbf{H}^{\infty}(\mathcal{D})$.
2. If $F \in \mathbf{H}^{\infty}(\mathcal{D}')$, and if $(1/\phi')'$ is uniformly bounded in $\mathcal{D}'$, then $F^{(n)}/[\phi']^{n} \in \mathbf{H}^{\infty}(\mathcal{D})$, $n = 2, 3, \dots$.
3. If $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D}')$, then $F'/\phi' \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$.
4. If $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D}')$, and if $(1/\phi')'$ is uniformly bounded in $\mathcal{D}'$, then $F^{(n)}/[\phi']^{n} \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, $n = 2, 3, \dots$.
5. If $F \in \mathbf{H}^{1}(\mathcal{D})$, then $\mathcal{I}F \in \mathbf{H}^{\infty}(\mathcal{D})$.
6. If $F'/\phi' \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, then $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$.
7. If $F \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, then $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$.

Proof
Part 1: At the outset we consider the case $\mathcal{D} = \mathcal{D}_d$. For $0 < d < d'$, Cauchy's integral formula for the $n$th derivative of $f \in \mathrm{Hol}(\mathcal{D}_{d'})$ at the center $z$ of a disc of radius $d' - d$ lying in $\mathcal{D}_{d'}$ yields

$|f^{(n)}(z)| \le \frac{n!\,\sup_{w\in\mathcal{D}_{d'}}|f(w)|}{(d' - d)^{n}}$
for all $z \in \mathcal{D}_d$. After setting $f = F \circ \psi$ and taking $n = 1$, we arrive at the statement of Part 1 of the theorem.

Part 2: This part of the proof follows by induction, upon inspection of the identity

$\frac{F^{(k+1)}}{[\phi']^{k+1}} = \frac{1}{\phi'}\left(\frac{F^{(k)}}{[\phi']^{k}}\right)' - k\left(\frac{1}{\phi'}\right)'\frac{F^{(k)}}{[\phi']^{k}}.$

For if we take $0 < d < d'' < d'$ and, if for some $k \ge 1$, $F^{(k)}/[\phi']^{k}$ is uniformly bounded in $\mathcal{D}'' = \psi(\mathcal{D}_{d''})$, then, by Part 1 of the theorem, the term $(F^{(k)}/[\phi']^{k})'/\phi'$ is uniformly bounded on $\mathcal{D} = \psi(\mathcal{D}_d)$. If the term $(1/\phi')'$, which is analytic in $\mathcal{D}'$, is uniformly bounded in $\mathcal{D}''$, then the second term on the right-hand side of the identity is also in $\mathbf{H}^{\infty}(\mathcal{D})$. This proves Part 2.

Part 3: We again proceed as in Part 1 of this theorem, by returning first to the case of $\mathcal{D}_d$. We assume initially that $f \in \mathbf{L}_{\alpha,\beta}(\mathcal{D}_{d'})$. Then for all $z \in \mathcal{D}_{d'}$ we have

$|f(z)| \le C\,\frac{|e^{z}|^{\alpha}}{(1 + |e^{z}|)^{\alpha+\beta}},$

with $C$ a constant. Then, upon bounding the $n$th derivative of $f$ at $z$ using Cauchy's formula, via integration about a disc with center $z \in \mathcal{D}_d$ and radius $d' - d$ lying in $\mathcal{D}_{d'}$, we find a bound of the same form, with $\gamma = \max(\alpha,\beta)$ entering the constant. Hence $f^{(n)} \in \mathbf{L}_{\alpha,\beta}(\mathcal{D}_d)$. Next, for the case when $f \in \mathbf{M}_{\alpha,\beta}(\mathcal{D}_{d'})$, the term $\mathcal{L}f$ defined above takes the form

$(\mathcal{L}f)(z) = \frac{f(-\infty) + e^{z}f(\infty)}{1 + e^{z}}.$

Hence if $n$ is a positive integer, then $(\mathcal{L}f)^{(n)}(z) = O(e^{-|\Re z|})$ as $z \to \pm\infty$ in $\mathcal{D}$.

Part 4: This part follows at once by using the results of the proof of Part 3 above, and proceeding as in the proof of Part 2.

Part 5: This is straightforward, since the indefinite integral of $F$ is always bounded in $\mathcal{D}$ by a constant plus $N_1(F,\mathcal{D})$.

Part 6: Again, we must first consider the case of $f \in \mathbf{L}_{\alpha,\beta}(\mathcal{D}_d)$. Recalling the definition of the class $\mathbf{L}_{\alpha,\beta}(\mathcal{D}_d)$, we find for the case of $\Re z > 0$, $z = x + iy$, and $g = \mathcal{I}f$ that
Similarly, for the case of $\Re z < 0$, a bound of the corresponding form holds. Thus Part 6 of the theorem follows at once for the case of $\mathcal{D} = \mathcal{D}_d$. The case of an arbitrary simply connected region $\mathcal{D}$ then also follows from this result, by taking $F = f \circ \psi$.

Part 7: Clearly $F \in \mathrm{Hol}(\mathcal{D})$. Hence we have $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$, provided that $\int_{\partial\mathcal{D}} |\phi'(z)F(z)\,dz| < \infty$. Now, if $z \in \partial\mathcal{D}$, then $\rho(z) = e^{\phi(z)} = e^{t \pm id}$, where $t = \Re\phi(z)$ ranges over $\mathbb{R}$. Thus $d\rho(z) = \phi'(z)\rho(z)\,dz = e^{t\pm id}\,dt = \rho(z)\,dt$, i.e., $dz = \{1/\phi'(z)\}\,dt$. Hence, if $F \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, then

$\int_{\partial\mathcal{D}} |\phi'(z)F(z)\,dz| \le C\int_{-\infty}^{\infty} \frac{e^{\alpha t}}{(1 + e^{t})^{\alpha+\beta}}\,dt < \infty.$

This completes the proof. □
Example 121.1 If $(a,b) = \mathbb{R}$, and if $\mathcal{D}$ is the strip defined on page 117, $\mathcal{D} = \mathcal{D}_d$, then $\phi(z) = z$.

Figure 121.1: The strip

The class $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ is the class of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that if $z \in \mathcal{D}$ and $\Re z < 0$, then $|f(z)| \le c\,e^{-\alpha|z|}$, while if $z \in \mathcal{D}$ and $\Re z > 0$, then $|f(z)| \le c\,e^{-\beta|z|}$. Thus, this map allows for exponential
decay at both $x = -\infty$ and $x = \infty$. The Sinc points $z_j$ are defined by $z_j = jh$. ■
Example 122.1 If $(a,b) = (0,1)$, and if $\mathcal{D}$ is the eye-shaped region $\mathcal{D} = \{z \in \mathbb{C} : |\arg[z/(1-z)]| < d\}$, then $\phi(z) = \log[z/(1-z)]$, and $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ is the class of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that for all $z \in \mathcal{D}$, $|f(z)| \le c\,|z|^{\alpha}|1-z|^{\beta}$.

Figure 122.1: The eye-shaped region

In this case, if, e.g., $\delta = \max\{\alpha,\beta\}$, and a function $w$ is such that $w \in \mathrm{Hol}(\mathcal{D})$ and $w \in \mathrm{Lip}_{\delta}(\bar{\mathcal{D}})$, then $w \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$. The Sinc points are $z_j = e^{jh}/(1 + e^{jh})$.
Example 122.2 If $(a,b) = (0,\infty)$, and if $\mathcal{D}$ is the sector $\mathcal{D} = \{z \in \mathbb{C} : |\arg(z)| < d\}$, then $\phi(z) = \log(z)$, and the class $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ is the class of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that if $z \in \mathcal{D}$ and $|z| \le 1$, then $|f(z)| \le c\,|z|^{\alpha}$, while if $z \in \mathcal{D}$ and $|z| > 1$, then $|f(z)| \le c\,|z|^{-\beta}$.
Figure 123.1: The sector

This map allows for algebraic decay at both $x = 0$ and $x = \infty$. The Sinc points $z_j$ are defined by $z_j = e^{jh}$, and $1/\phi'(z_j) = e^{jh}$. ■

Example 123.1 If $(a,b) = (0,\infty)$, and if $\mathcal{D}$ is the bullet-shaped region $\mathcal{D} = \{z \in \mathbb{C} : |\arg(\sinh(z))| < d\}$, then $\phi(z) = \log(\sinh(z))$, and $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ is the class of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that if $z \in \mathcal{D}$ and $|z| \le 1$, then $|f(z)| \le c\,|z|^{\alpha}$, while if $z \in \mathcal{D}$ and $|z| > 1$, then $|f(z)| \le c\,\exp\{-\beta|z|\}$.

Figure 123.2: The bullet-shaped region

This map allows for algebraic decay at $x = 0$ and exponential decay at $x = \infty$. The Sinc points $z_j$ are defined by $z_j = \log[e^{jh} + (1 + e^{2jh})^{1/2}]$, and $1/\phi'(z_j) = (1 + e^{-2jh})^{-1/2}$. ■
Example 124.1 If $(a,b) = \mathbb{R}$, and if $\mathcal{D}$ is the hourglass-shaped region, take $\phi(z) = \log[z + (1 + z^2)^{1/2}]$; the class $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ is the class of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that if $z \in \mathcal{D}$ and $\Re z < 0$, then $|f(z)| \le c\,(1 + |z|)^{-\alpha}$, while if $z \in \mathcal{D}$ and $\Re z > 0$, then $|f(z)| \le c\,(1 + |z|)^{-\beta}$.

Figure 124.1: The hourglass-shaped region

This map allows for algebraic decay at both $x = -\infty$ and $x = \infty$. The Sinc points $z_j$ are defined by $z_j = \sinh(jh)$, and $1/\phi'(z_j) = \cosh(jh)$. ■

Example 124.2 If $(a,b) = \mathbb{R}$, and if $\mathcal{D}$ is the funnel-shaped region, take $\phi(z) = \log\{\sinh[z + (1 + z^2)^{1/2}]\}$. The class $\mathbf{L}_{\alpha,\beta}(\mathcal{D})$ consists of all functions $f \in \mathrm{Hol}(\mathcal{D})$ such that if $z \in \mathcal{D}$ and $\Re z < 0$, then $|f(z)| \le c\,(1 + |z|)^{-\alpha}$, while if $z \in \mathcal{D}$ and $\Re z > 0$, then $|f(z)| \le c\,e^{-\beta|z|}$. This map thus allows for algebraic decay at $x = -\infty$ and exponential decay at $x = \infty$. The Sinc points $z_j$ are defined by $z_j = (1/2)[t_j - 1/t_j]$, where $t_j = \log[e^{jh} + (1 + e^{2jh})^{1/2}]$, and $1/\phi'(z_j) = (1/2)(1 + 1/t_j^2)(1 + e^{-2jh})^{-1/2}$. ■
Figure 125.1: The funnel-shaped region

3.1.1 Exercises

65. Prove, with reference to Examples 121.1-124.2, that $(1/\phi')'$ is uniformly bounded in the region $\mathcal{D} = \psi(\mathcal{D}_d)$ for each of the following cases.
(a) For all $d > 0$ if $\phi(z) = z$.
(b) For all $d \in (0,\pi)$ if $\phi(z) = \log[z/(1-z)]$.
(c) For all $d > 0$ if $\phi(z) = \log(z)$.
(d) For all $d \in (0,\pi/2)$ if $\phi(z) = \log[\sinh(z)]$.
(e) For all $d \in (0,\pi/2)$ if $\phi(z) = \log[z + \sqrt{1 + z^2}]$.
(f) For all $d \in (0,\pi/2)$ if $\phi(z) = \log\{\sinh[z + \sqrt{1 + z^2}]\}$.
67. Let 'Hf denote the Hilbert. transform of /, i.e.,
Prove that if 0 < « < ft' < 1, 0 < c/ < d' < IT, with P' = V>(TV), and if / e L a .(*>'), then '«/ € M 0 (P), for F defined in each of the examples 121.1, 122.1. and 123.1.
3.2 Interpolation and quadrature
In this section we shall derive the basic formulas for Sinc interpolation and quadrature. Let $h$ denote a positive number, let $k$ be an integer, and let $S(k,h)(z)$ be defined as on page 25. We start with the following theorem.

Theorem 126.1 Let $F \in \mathbf{H}^{1}(\mathcal{D})$.
(a) For all $z$ in $\mathcal{D}$,

$\frac{F(z)}{\phi'(z)} - \sum_{k=-\infty}^{\infty} \frac{F(z_k)}{\phi'(z_k)}\,S(k,h)\circ\phi(z) = \frac{\sin[\pi\phi(z)/h]}{2\pi i}\int_{\partial\mathcal{D}} \frac{F(\zeta)\,d\zeta}{[\phi(\zeta) - \phi(z)]\,\sin[\pi\phi(\zeta)/h]}.$

(b) Furthermore,

$\int_{\Gamma} F(z)\,dz = h\sum_{k=-\infty}^{\infty} \frac{F(z_k)}{\phi'(z_k)} + E(F),$

where $E(F)$ is obtained by multiplying the right-hand side of (a) by $\phi'(z)$ and integrating over $\Gamma$.

Proof Let $0 < \delta < d$, let $n \in \mathbb{N}$, and let $B(n,\delta)$ and $\mathcal{D}(n,\delta)$ be defined by

$B(n,\delta) = \{w \in \mathbb{C} : |\Re w| < (n + \tfrac12)h,\ |\Im w| < \delta\}, \qquad \mathcal{D}(n,\delta) = \psi(B(n,\delta)).$

Let $z$ be fixed in $\mathcal{D}$, set $\phi(z) = u + iv$, where $u$ and $v$ are real, and also $\phi(\zeta) = \xi + i\eta$, with $\xi$ and $\eta$ real. Thus, it follows that if $n$ is sufficiently large and $\delta$ is sufficiently close to $d$, then $z \in \mathcal{D}(n,\delta)$. In this case $|\phi(z) - \phi(\zeta)| \ge \min\{(n + \tfrac12)h - |u|,\ \delta - |v|\} > 0$ for $\zeta \in \partial\mathcal{D}(n,\delta)$, which implies, by our assumptions on $F$, that the contour integral

$E(n,\delta,F)(z) = \frac{\sin[\pi\phi(z)/h]}{2\pi i}\int_{\partial\mathcal{D}(n,\delta)} \frac{F(\zeta)\,d\zeta}{[\phi(\zeta) - \phi(z)]\,\sin[\pi\phi(\zeta)/h]}$

is bounded. Along the parts of $\partial\mathcal{D}(n,\delta)$ that are images under $\psi$ of the vertical segments of $\partial B(n,\delta)$, $\phi(\zeta) = \pm(n + \tfrac12)h + i\eta$, and therefore $|\sin(\pi\phi(\zeta)/h)| = |(-1)^{n}\cosh(\pi\eta/h)| \ge 1$; thus the contribution to $E(n,\delta,F)$ of the integral along these segments is bounded by a constant multiple of the integral of $|F(\zeta)\,d\zeta|$ along these segments.
Along the horizontal segments, we have $|\phi(z) - \phi(\zeta)| = [(u - \xi)^2 + (v \mp \delta)^2]^{1/2}$. The following upper and lower estimates for the function $\sin[\pi\phi(\zeta)/h]$, for $\zeta$ on the horizontal portions of $\partial B(n,\delta)$, will often be used:

$\tfrac12 e^{\pi\delta/h}\left(1 - e^{-2\pi\delta/h}\right) \le |\sin[\pi(\xi \pm i\delta)/h]| \le \tfrac12 e^{\pi\delta/h}\left(1 + e^{-2\pi\delta/h}\right).$

Hence $E(n,\delta,F)(z)$ is bounded along these portions as well. We now let $n \to \infty$ and $\delta \to d$ in $B(n,\delta)$ to conclude that $E(n,\delta,F)(z)$ approaches the right-hand side of the equation in (a). On the other hand, it is readily seen, by computing the residues at the poles of the integrand in the definition of $E(n,\delta,F)$, that

$E(n,\delta,F)(z) = \frac{F(z)}{\phi'(z)} - \sum_{|k|\le n} \frac{F(z_k)}{\phi'(z_k)}\,S(k,h)\circ\phi(z).$

We may also let $n \to \infty$ and $\delta \to d$ in this expression, since we have already shown that this limit exists. This completes the proof of part (a). We next prove part (b). We multiply both sides of the equation in (a) by $\phi'(z)$, integrate over $\Gamma$, interchange the order of integration in the double integral, and use the identities

$\int_{\Gamma} S(k,h)\circ\phi(z)\,\phi'(z)\,dz = \int_{-\infty}^{\infty} \frac{\sin[\pi(\xi - kh)/h]}{\pi(\xi - kh)/h}\,d\xi = h, \qquad k \in \mathbb{Z},$
which follow readily after use of the transformation $\xi = \phi(z)$. □

We remark here that Theorem 126.1 shows that the infinite-sum Sinc interpolation of $F/\phi'$ converges in all of $\mathcal{D}$ as $h \to 0$. It is straightforward to obtain a method for interpolating $F$ itself (rather than $F/\phi'$) if we assume that $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$, and if we replace $F$ by $\phi'F$ in Theorem 126.1.

Theorem 128.1 Let $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$ and $z \in \mathcal{D}$. Then there exists a constant $c$, independent of $h$, such that
(a) the interpolation error $\left|F(z) - \sum_{k=-\infty}^{\infty} F(z_k)\,S(k,h)\circ\phi(z)\right|$ is bounded by $c\,e^{-\pi d/h}$, and
(b) the quadrature error $\left|\int_{\Gamma} F(z)\,dz - h\sum_{k=-\infty}^{\infty} F(z_k)/\phi'(z_k)\right|$ is bounded by $c\,e^{-2\pi d/h}$.

Proof Since $z \in \Gamma$ and $\zeta \in \partial\mathcal{D}$, it follows that $|\phi(z) - \phi(\zeta)| \ge d$. Also, applying the identity of the proof of Theorem 126.1 to the equations in that theorem, we readily obtain the desired bounds. □

Theorem 128.2 Let $F \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, let $N$ be a positive integer, and let $h$ and an integer $M$ be selected by the formulas

$M = [\beta N/\alpha], \qquad h = \left[\frac{\pi d}{\beta N}\right]^{1/2},$

where $[\cdot]$ denotes the greatest integer function. Then there exists a constant $C$, which is independent of $N$, such that

$\sup_{x\in\Gamma}\left|F(x) - \sum_{k=-M}^{N} F(z_k)\,S(k,h)\circ\phi(x)\right| \le C\,N^{1/2}\,e^{-(\pi d\beta N)^{1/2}}.$
Proof We may write

$\left|F(x) - \sum_{k=-M}^{N} F(z_k)\,S(k,h)\circ\phi(x)\right| \le S_0 + |S_1| + |S_2|,$

where $S_1$ and $S_2$ denote the truncated tails $\sum_{k<-M}$ and $\sum_{k>N}$ of the infinite Sinc expansion, and where $S_0$ is the bound in inequality (a) of Theorem 128.1. But since for all $x \in \mathbb{R}$ and $k \in \mathbb{Z}$ we have

$S(k,h)(x) = \frac{\sin[\pi(x - kh)/h]}{\pi(x - kh)/h},$

it then follows that $S(k,h)(x)$ is bounded by 1 on $\mathbb{R}$. Therefore, since $F \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, and since $\rho(z_k) = e^{kh}$, we get

$|S_1| \le \frac{C_2}{\alpha h}\,e^{-\alpha Mh},$

where $C_2$ is a constant independent of $N$. Similarly, we get $|S_2| \le C_3/(\beta h)\,e^{-\beta Nh}$, with $C_3$ a constant independent of $N$. Now, recalling that $M = [\beta N/\alpha]$, it follows that there are positive constants $C_4$ and $C_5$, independent of $N$, such that for all positive integers $N$ we have $C_4 e^{-\beta Nh} \le C_2 e^{-\alpha Mh} \le C_5 e^{-\beta Nh}$. Hence there exists a constant $C_6$, independent of $N$, such that

$|S_1| + |S_2| \le \frac{C_6}{h}\,e^{-\beta Nh}.$

The theorem now follows easily, by taking $h$ as described above. □
We next bound the error of $(M + N + 1)$-point Sinc quadrature of $F$ over $\Gamma$. We omit the proof, since it is similar to the proof of Theorem 128.2.

Theorem 129.1 Let $F/\phi' \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, let $N$ be a positive integer, and set $M = [\beta N/\alpha]$, where $[\cdot]$ denotes the greatest integer function. Let $h$ be selected by the formula

$h = \left[\frac{2\pi d}{\beta N}\right]^{1/2}.$

Then there exists a constant $C$, independent of $N$, such that

$\left|\int_{\Gamma} F(x)\,dx - h\sum_{k=-M}^{N} \frac{F(z_k)}{\phi'(z_k)}\right| \le C\,e^{-(2\pi d\beta N)^{1/2}}.$
It may happen that $F \in \mathbf{H}^{1}(\mathcal{D})$, but $\phi'F \notin \mathbf{H}^{1}(\mathcal{D})$. If the limits $F(a)$ and $F(b)$, taken along $\Gamma$, exist and are bounded, it may happen that $\phi'G \in \mathbf{H}^{1}(\mathcal{D})$, with $G = F - \mathcal{L}F$, defined as on page 118. These conditions are all satisfied, for example, if $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$. This interpolation device is often useful in applications. Indeed, a basis of functions $\omega_j$, consisting of $S(j,h)\circ\phi$ for $j = -M+1,\dots,N-1$ together with boundary-adapted combinations $\omega_{-M}$ and $\omega_{N}$, is suitable for approximation in $\mathbf{M}_{\alpha,\beta}(\mathcal{D})$.

These basis functions satisfy the properties $\omega_j(z_k) = 1$ if $j = k$, while $\omega_j(z_k) = 0$ if $j \ne k$. These basis functions suffice for purposes of uniform-norm approximation of $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$ over $\Gamma$, via the formula

$F(x) \approx \sum_{k=-M}^{N} F(z_k)\,\omega_k(x).$

It is readily verified that, upon selecting $h$ as in Theorem 128.2, the error of this formula is again bounded by $C\,N^{1/2}\exp(-(\pi d\beta N)^{1/2})$. Indeed, this bound remains unaltered if the factors $(1 + e^{-Mh})$ and $(1 + e^{-Nh})$ on the right-hand sides of the expressions defining $\omega_{-M}$ and $\omega_N$ are replaced by 1. It is convenient to introduce a symbolic notation at this point. Letting $M$, $N$, $z_k$, and $\omega_k$ be defined as above, we set $m = M + N + 1$, and, given any function $F$ defined on $\Gamma$, we define the operator $V_m$ by $V_m F = (F(z_{-M}), \dots, F(z_N))^{T}$, where the superscript $T$ denotes the transpose. Also, given a vector $c = (c_{-M}, \dots, c_N)^{T}$, we define the interpolation operator $\Pi_m$ by $\Pi_m c = \sum_{k=-M}^{N} c_k\,\omega_k$. We can then replace the above approximating equation by its convenient operator form

$F \approx \Pi_m V_m F.$

It may at times be convenient to replace either $\omega_{-M}$ or $\omega_N$ by an ordinary Sinc basis function $S(j,h)\circ\phi$ as defined above, if it is desirable that the final approximation should vanish at an end point of $\Gamma$, since the Sinc basis functions $S(j,h)\circ\phi$ vanish at both end points of $\Gamma$. For example, if a function is known to vanish at the end point $a$ of $\Gamma$, then the
approximation of $F$ given above will have the same feature, as well as the same order of accuracy, if we define $\omega_{-M}$ by $\omega_{-M} = S(-M,h)\circ\phi$.

We can also establish other variants of the above approximation results, such as convergence in all of $\mathcal{D}$, or convergence in the Hilbert space defined by a suitable inner product on $\Gamma$, invoking the results of the following lemma. These results are left to the Exercises at the end of this section.

Lemma 131.1 Let $x \in \mathbb{R}$, $y \in \mathbb{R}$, and let $z = x + iy$. Then

$|S(k,h)(z)| \le e^{\pi|y|/h}, \qquad \sum_{k=-\infty}^{\infty} |S(k,h)(z)|^2 \le e^{2\pi|y|/h}.$

Proof Since

$S(k,h)(z) = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} e^{i(z - kh)t}\,dt,$

the proof of the first inequality in Lemma 131.1 follows at once by bounding the factor $e^{-yt}$ in the integrand by $e^{\pi|y|/h}$ and then integrating the result over $(-\pi/h, \pi/h)$. The remaining part follows by Parseval's theorem, after expressing $e^{izt}$ as a Fourier series in $t$ over $(-\pi/h, \pi/h)$. □

One result facilitated by Lemma 131.1 is the following, which justifies collocation, and the symbolic replacement of operations of calculus by approximate algebraic Sinc expressions.
Theorem 132.1 Let $f \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, let $N$ be a positive integer, set $M = [\beta N/\alpha]$, where $[\cdot]$ denotes the greatest integer function, and $h = (\pi d/(\beta N))^{1/2}$. Let $\varepsilon_N$ be defined by

$\varepsilon_N = \sup_{x\in\Gamma}\left|f(x) - \sum_{k=-M}^{N} f(z_k)\,\omega_k(x)\right|,$

with $\omega_k$ defined as above. Let $\{c_k\}_{k=-M}^{N}$ be a sequence of numbers, and set

$\delta_N = \left(\sum_{k=-M}^{N} |f(z_k) - c_k|^2\right)^{1/2}.$

Then

$\sup_{x\in\Gamma}\left|f(x) - \sum_{k=-M}^{N} c_k\,\omega_k(x)\right| \le \varepsilon_N + \delta_N.$

Proof We may assume without loss of generality that $\omega_k = S(k,h)\circ\phi$. Then the above bound follows directly by application of the triangle inequality and the results of the previous lemma. We have

$\left|f(x) - \sum_{k=-M}^{N} c_k\,\omega_k(x)\right| \le \left|f(x) - \sum_{k=-M}^{N} f(z_k)\,\omega_k(x)\right| + \left|\sum_{k=-M}^{N} [f(z_k) - c_k]\,\omega_k(x)\right|.$

Now, the first absolute-value term is bounded by $\varepsilon_N$, whereas, by Schwarz's inequality, the second term is bounded on $\Gamma$ by

$\delta_N\left(\sum_{k=-M}^{N} |S(k,h)\circ\phi(x)|^2\right)^{1/2}.$

By Lemma 131.1 we have $\left(\sum_{k=-M}^{N} |S(k,h)\circ\phi(x)|^2\right)^{1/2} \le 1$ for $x \in \Gamma$; hence the second term is bounded by $\delta_N$. □

3.2.1 Exercises
68. Let $f \in \mathbf{H}^{1}(\mathcal{D}_d)$ and let $z \in \mathcal{D}_d$. Show, with reference to Theorem 119.1, that
69. Let $F \in \mathrm{Hol}(\mathcal{D}) \cap \mathrm{Lip}_{\alpha}(\bar{\mathcal{D}})$, with $\mathcal{D}$ defined as in Example 121.1, and with $\alpha \in (0,1]$. Let $G$ be defined as on page 118. Prove that there exists a constant $C$ such that for all $z \in \mathcal{D}$ we have

70. Let $R > 1$, and let $a \in (1/R, R)$.
(a) Set $g(y) = (1 - y^2)^{a/2}$. The function $g \in \mathbf{L}_{a}(\mathcal{D})$, with $\mathcal{D}$ defined as in Example 122.1, with $(a,b) = (-1,1)$ and $d = \pi$. Prove, by taking $h = [\pi^2/(aN)]^{1/2}$, $u(x) = (1 - x^2)^{1/2}\operatorname{sgn}(x)$, $v(x) = \log\{[1 + u(x)]/[1 - u(x)]\}$, and $w_k(x) = S(k,h)\circ v(x) + S(-k,h)\circ v(x)$, that there exists a constant $C$, independent of $N$, such that
(b) Set $G(u) = g(\tanh(u)) = [\cosh(u)]^{-a}$, and note that $G \in \mathrm{Hol}(\mathcal{D})$, with $\mathcal{D}$ defined as in Example 123.1, with $d = \pi/2$, and also that $G(\sinh(kh)) = O(\exp\{-(a/2)\,e^{|k|h}\})$, $k \to \pm\infty$. Conclude, by taking $h = \log(N)/N$, that there exists a constant $C$, which is independent of $N$, such that
(c) Use the result of part (b) of this problem to construct an $(N+1)$-basis approximation to $x^{a}$ on $[-1,1]$, which is accurate to within an error bounded by $C\exp\{-cN/\log(N)\}$, with $c$ a positive constant.
71. Prove, with reference to Theorem 128.1, that if $F \in \mathbf{H}^{1}(\mathcal{D})$, then there exists a constant $c$ that is independent of $h$ and $k$, such that
3.3 Approximation of derivatives on $\Gamma$

Let us note that by the integral definition of $S(k,h)(z)$ we have

$S(k,h)(z) = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} e^{i(z - kh)t}\,dt.$

It thus follows by assertion (2) of Theorem 119.1 that if $(1/\phi')'$ is uniformly bounded in $\mathcal{D} = \psi(\mathcal{D}_d)$, where $0 < d < d'$, then for all $x \in \Gamma$ we have

$\left|\left(\frac{d}{dx}\right)^{n} S(k,h)\circ\phi(x)\right| \le \frac{c_n}{h^{n}},$

where $c_n$ is a constant depending only on $n$. Let us assume that $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D}')$, let $N$ be a positive integer, set $M = [\beta N/\alpha]$, and $h = \{\pi d/(\beta N)\}^{1/2}$. Let $\omega_j$ be defined as in the previous section. We then arrive at the following result, upon differentiating the contour-integral expression for the difference between $F$ and its infinite-sum Sinc expansion, and then truncating the infinite sum as in Theorem 128.2.

Theorem 134.1 Let $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D}')$, let $N$ be a positive integer, set $M = [\beta N/\alpha]$, and $h = \{\pi d/(\beta N)\}^{1/2}$. Let $\omega_j$ be defined as on page 130. If $(1/\phi')'$ is uniformly bounded in $\mathcal{D}'$, then, corresponding to
every integer $m = 0, 1, 2, \dots$, there exists a constant $c_m$, depending only on $m$, such that for every $n = 0, 1, \dots, m$, we have

$\sup_{x\in\Gamma}\left|\left(\frac{d}{dx}\right)^{n}\left[F(x) - \sum_{j=-M}^{N} F(z_j)\,\omega_j(x)\right]\right| \le c_m\,N^{(n+1)/2}\,e^{-(\pi d\beta N)^{1/2}}.$

Proof We shall assume without loss of generality that $\beta = \alpha$, so that $M = N$, and that $F \in \mathbf{L}_{\alpha}(\mathcal{D}')$, leaving the case of $F \in \mathbf{M}_{\alpha}(\mathcal{D}')$ to the problems at the end of this section. For the sake of this proof we therefore take $\omega_j = S(j,h)\circ\phi$. Define $K_n$ and $\omega_{n,j}$, for $j = -N,\dots,N$, as the $n$th derivatives of the contour-integral kernel and of the basis functions $S(j,h)\circ\phi$, respectively.

Since $F \in \mathbf{L}_{\alpha}(\mathcal{D})$, it follows that $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$. Hence we arrive at the corresponding identity, from which it follows that the $n$th derivative of the error can be bounded in terms of the functions $h^{n}\omega_{n,j}$ and a contour integral $J$.

Let us first bound the functions $h^{n}\omega_{n,j}(x)$, for $x \in \Gamma$. To this end, we first note that if we set $\Phi(x) = (\pi/h)\phi(x)$, then, under the assumption that $(1/\phi')'$ is uniformly bounded in $\mathcal{D}'$, it follows that $|(1/\Phi'(x))'| \le ch$, uniformly on $\Gamma$, with $c$ a constant that depends neither on $h$ nor on $x$. Therefore, since $S(j,h)\circ\phi$ has the integral representation given above,
the uniform boundedness of $h^{n}\omega_{n,j}(x)$ follows by applying Schwarz's inequality to bound this integral. Upon noting that for $x \in \Gamma$ and $z \in \partial\mathcal{D}$ we have $|\phi(x) - \phi(z)| \ge d$, it follows that $K_n(x,z)$ is uniformly bounded for $x \in \Gamma$ and $z \in \partial\mathcal{D}$. Since $\phi'F \in \mathbf{H}^{1}(\mathcal{D})$, it follows that $J$ is bounded by $c_1\exp(-\pi d/h)$, with $c_1$ a constant that is independent of $h$. Now, since $|h^{n}\omega_{n,j}(x)| \le c$ for all $x \in \Gamma$, with $c$ a constant independent of $h$ and $x$, and since $F \in \mathbf{L}_{\alpha}(\mathcal{D})$, so that $|F(z_k)| \le c_2\exp(-\alpha|k|h)$, with $c_2$ a constant independent of $k$ and $h$, the stated bound follows.

The result of Theorem 134.1 now follows, upon selecting $h$ as stated in the theorem. □
3.3.1 Exercises

72. Prove Theorem 134.1 for the case of $F \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$.

73. Give an explicit approximation of the derivative $F'(x)$ for the case when $x = z_k$, $k = -M, \dots, N$, both in general and for each of Examples 121.1 to 124.2. Use the notation introduced in this section.
3.4 Sinc indefinite integral over $\Gamma$

It will be convenient to use the notation

$\sigma_k = \int_0^{k} \frac{\sin(\pi t)}{\pi t}\,dt, \qquad e_k = \frac{1}{2} + \sigma_k, \qquad k \in \mathbb{Z}.$

We shall here assume that $F/\phi' \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, with $0 < \alpha < 1$, $0 < \beta < 1$, and $0 < d < \pi$, so that if $G(z) = \int_a^{z} F(t)\,dt$, then $G \in \mathbf{M}_{\alpha,\beta}(\mathcal{D})$. We again select a positive integer $N$, and we define $M$ by $M = [\beta N/\alpha]$. We again select $h$ as in Theorem 128.2, the Sinc
points $z_j$ as on page 118; we set $m = M + N + 1$, and we define a Toeplitz matrix $I^{(-1)}$ of order $m$ by $I^{(-1)} = [e_{i-j}]$, with $e_{i-j}$ denoting the $(i,j)$th element of $I^{(-1)}$. If $u$ is an arbitrary function defined on $(a,b)$, we define a diagonal matrix $D(u)$ and operators $V_m u$ and $\Pi_m u$ as on page 130. We also define matrices $A_m$ and $B_m$, and, letting $x \in \Gamma$, we define operators $J$, $J'$, $J_m$, and $J'_m$ by

Thus, we can state the following theorem, the result of which enables us to collocate indefinite integrals over $\Gamma$.

Theorem 137.1 If $F/\phi' \in \mathbf{L}_{\alpha,\beta}(\mathcal{D})$, then there exists a constant $C_1$, which is independent of $N$, such that

$\|JF - J_m F\| \le C_1\,N^{1/2}\,e^{-(\pi d\alpha N)^{1/2}},$

where $\|\cdot\|$ denotes the sup norm over $\mathbb{R}$.

Proof We may assume without loss of generality that $\Gamma = \mathbb{R}$ and that $\phi(z) = z$, i.e., that $\mathcal{D} = \mathcal{D}_d = \{z \in \mathbb{C} : |\Im z| < d\}$. In addition, for the sake of simplicity, we shall take $\beta = \alpha$, so that in our proof we have $M = N$. We shall use the notation
If we define g by
then G = Jg is given explicitly by
Moreover, by assertion (6) of Theorem 119.1, it follows that $G \in \mathbf{M}_{\alpha}(\mathcal{D}_d)$. Hence we shall assume, without loss of generality, that

$F \stackrel{\mathrm{def}}{=} Jf, \qquad f \in \mathbf{L}_{\alpha,\beta}(\mathcal{D}_d).$

Since $J_m f = \mathcal{C}_N J\,\mathcal{C}_N f$, we need to bound the difference $Jf - J_m f$. Now, since $Jf \in \mathbf{M}_{\alpha}(\mathcal{D}_d)$, it follows by Theorem 128.2 that $\varepsilon_1 \le c_1\varepsilon$, where $\varepsilon = N^{1/2}e^{-(\pi d\alpha N)^{1/2}}$, and where $c_1$ is a constant independent of $N$. Next, by the integral representation of $f - \mathcal{C}f$ given in Exercise 68 on page 133, together with the bound of Exercise 74 on page 139, we may bound $J(f - \mathcal{C}f)$, and then, by Exercise 75 on page 139, we arrive at the desired bound for $\varepsilon_2$.
Next, it can be verified (see Exercise 76, page 139) that if $x \in \mathbb{R}$, then $|(JS(k,h))(x)| \le 1.1\,h$. Since $f \in \mathbf{L}_{\alpha}(\mathcal{D})$, we have for $x \in \mathbb{R}$ a corresponding bound, with $c_2$ and $c_3$ constants independent of $N$. Thus, we find that $\varepsilon_3 \le c_4\,N^{-1/2}\log(N)\,\varepsilon$, upon using the above norm bound for $\|\mathcal{C}_N\|$. The statement of Theorem 137.1 follows upon adding the above bounds for $\varepsilon_j$, $j = 1, 2, 3$. □
3.4.1
Exercises
74. Prove that
75. Prove that
76. Prove that if x ∈ ℝ, then |(J S(k, h))(x)| ≤ 1.1 h.

3.5
Sinc indefinite convolution over Γ
For purposes of describing Sinc indefinite convolution, we shall use the notation of Sinc indefinite integration, except that we shall restrict ourselves to contours Γ = (a, b), with (a, b) a subinterval of the real line ℝ. This restriction is made solely to simplify the proof of the procedure. We also expect the final approximation procedure to be effective for curvilinear contours. The model convolution integrals that we shall consider take the form
where x ∈ (a, b). An effective method for approximating both p and q will provide an effective method of approximating either of the definite integral convolutions
The method for approximating p and q that we describe in this section provides a powerful technique for solving any initial or boundary value problem whose solution can be expressed in terms of definite or indefinite one- or multidimensional convolution-type integrals, or integral equations. Included among these are integral equations that many consider difficult, such as Abel's integral equation, and integral equations to which the classical Wiener-Hopf method is theoretically applicable. Sinc collocation of p and q is possible under the following assumptions. We assume that the "Laplace transform",

exists for all s ∈ Ω⁺ = {s ∈ C : ℜs > 0}. Let P(τ, x) be defined by

We assume that (i) P(τ, ·) ∈ M_{α,β}(D′), uniformly for τ ∈ [0, b − a]; and that (ii) f ∈ L_{α_f,β_f}(D_f), with α_f and β_f some positive numbers, D_f some simply connected domain of analyticity of f, φ_f a conformal transformation of D_f onto D_d, and such that φ_f : (0, c] → ℝ. Under these assumptions, we have Theorem 140.1 If the above assumptions are satisfied, then there exists a constant C_1, independent of N, such that
We shall motivate this result and give a brief proof of it in the next section. Then, we illustrate the power of this result when it is applied to the approximation of a multidimensional convolution. In Chapter 8 we shall illustrate the use of this result to obtain an approximate solution of Burgers' equation.
3.5.1
Derivation and justification of procedure
Let us briefly motivate the proof of Theorem 140.1. We shall consider only the case of p(x), since the case of q(x) can be treated in exactly the same way. Letting J be defined as above, by (Jw)(x) = ∫_a^x w(t) dt, with w ∈ L_1(a, b), it follows that for every positive integer n,

Assuming that the length |b − a| of the interval (a, b) is finite, it is then convenient to take ||w|| = (b − a)^{−1} ∫_a^b |w(t)| dt, since this choice of norm yields the simple inequality

By using the inversion formula for the Laplace transform, and then converting to the "Laplace transform" by replacing s by 1/s (a transformation that conformally maps the right half plane Ω⁺ onto itself), the definition of p(x) may also be written as
Thus, it follows from the definition of J that
By analytic continuation as a function of s, it follows that the identity
holds not only for all s ∈ C such that |s| > b − a, but in the larger resolvent set of J, excluding the point s = 0. Here, the resolvent set of J is the set {s ∈ C : (s − J)^{−1} exists}. The resolvent set of J can be more closely identified, upon setting
It follows, in this notation, that
Hence, the resolvent set of J includes the set {s ∈ C : ℜs < 0}, as well as the set {s ∈ C : |s| > b − a}. The above formulas for p and w yield the Dunford-type integral

If (a, b) is a finite interval, then it follows that F(s) is bounded for all sufficiently large s ∈ C. It is also well known that the classical Laplace transform f̂(s) = ∫_0^∞ e^{−st} f(t) dt → 0 as s → ∞, so that F(0) = f̂(∞) = 0. It now follows that for bounded intervals (a, b) we have

It may be readily shown that this result also holds for unbounded intervals (a, b). Now, letting V_m, U_m, and J_m be defined as in the previous section, it follows that since J_m g ≈ Jg, we may expect that

We remark here that it may be readily shown (see Exercise 78, page 149) that every eigenvalue of the matrices A_m and B_m, defined as in the previous section, lies in the closure of the right half plane. Indeed, it has been shown by direct computation that all eigenvalues of the matrices A_m and B_m lie in the open right half plane for 1 ≤ m ≤ 513, and hence the matrices F(A_m) and F(B_m) are well defined for all such values of m, and may be evaluated in the usual way, via diagonalization of A_m and B_m.
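The evaluation of F(A_m) via diagonalization can be made concrete. The sketch below is an illustration under assumed conventions, not code from the text: it takes the kernel f(t) = e^{−t} on (0, 1), whose classical Laplace transform is 1/(1 + s), so that the modified transform obtained by replacing s with 1/s is F(s) = s/(1 + s); it also assumes A_m = h I^(-1) D(1/φ′) with entries e_k = 1/2 + (1/π) Si(πk) and φ(z) = log(z/(1 − z)).

```python
import numpy as np

def sine_integral(x, n=20001):
    # Si(x) = int_0^x sin(t)/t dt; odd function, trapezoidal rule suffices here.
    if x < 0:
        return -sine_integral(-x, n)
    if x == 0:
        return 0.0
    t = np.linspace(1e-12, x, n)
    return np.trapz(np.sin(t) / t, t)

M = N = 16
h = np.pi / np.sqrt(2 * N)
m = M + N + 1
sigma = [0.5 + sine_integral(np.pi * k) / np.pi for k in range(-m + 1, m)]
I1 = np.array([[sigma[i - j + m - 1] for j in range(m)] for i in range(m)])

kk = np.arange(-M, N + 1)
z = np.exp(kk * h) / (1.0 + np.exp(kk * h))   # Sinc points of (0, 1)
A = h * I1 @ np.diag(z * (1.0 - z))           # assumed A_m = h I^(-1) D(1/phi')

# Kernel f(t) = exp(-t): classical transform 1/(1 + s), hence F(s) = s/(1 + s)
# after the substitution s -> 1/s described in the text.
lam, X = np.linalg.eig(A)                     # eigenvalues lie in the right half plane
FA = (X * (lam / (1.0 + lam))) @ np.linalg.inv(X)   # F(A_m) via diagonalization

g = np.ones(m)
p = (FA @ g).real                             # approximates int_0^x e^{-(x-t)} dt
err = np.max(np.abs(p - (1.0 - np.exp(-z))))
print(err)
```

With g ≡ 1 the indefinite convolution is p(x) = ∫_0^x e^{−(x−t)} dt = 1 − e^{−x}, and F(A_m)g reproduces it at the Sinc points.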
Let us now outline the proof of Theorem 140.1. To this end, we state a sequence of lemmas, whose proofs are left to the problems at the end of this section. Let X denote the space of functions g ∈ Hol(D) that are absolutely integrable over the boundary of D, and let Y denote the space of all functions w ∈ Hol(D) that are uniformly bounded in D. These spaces will be normed by ||g||_X = sup_{x∈(a,b)} |g(x)/φ′(x)| and ||w||_Y = sup_{x∈(a,b)} |w(x)|, respectively. It is readily seen from Theorem 119.1 that if g ∈ X, then Jg ∈ Y. On the other hand, if D and D′ are related as in Theorem 119.1, and if u ∈ Y(D′), then

Lemma 143.1 Let u ∈ Y, and let z_j, h, M, and N be defined as in Section 3.2. If h = γ/N^{1/2}, where γ is a positive constant, then as N → ∞,
Proof The proof is left to Exercise 79 on page 149.
Lemma 143.2 Let g ∈ X, and let h = γ/N^{1/2}, for some fixed positive constant γ. Let J and J_m be defined as above. Then, as N → ∞,
Proof The proof of Lemma 143.2 is left to Exercise 80 on page 149. •
Lemma 143.3 Let ℜs ≤ −c < 0. Then, for g ∈ X, we have (a) As m → ∞,
Proof We leave the proof to Problem 81 on page 149.
Lemma 144.1 Let the conditions of Lemma 143.2 be satisfied, and for a given g ∈ X, let F(J)g ∈ Y. Then, as m → ∞,

Moreover, there exists a constant c_1, which is independent of N, such that

Proof The proof is left to Problem 82 on page 149. • We are now in a position to complete the proof of Theorem 140.1. Proof We shall only prove the approximation for p(x), since the proof for the approximation of q(x) is almost exactly the same. Upon setting
and using the above Dunford integral expression for F(J), we get
Thus, it follows that
Replacing s by 1/s in this expression, and recalling that F(s) = f̂(1/s), we get
Now it readily follows that if g ∈ X, then for any s ∉ Ω⁺, s ≠ 0, we have
and applying this result once more, we get
where the subscript τ on Δ indicates that Δ operates with respect to the variable τ. We now apply the operation (2πi)^{−1} ∮ [...] ds to each side of the above equation, to get
Let us write
where
By assumption, P(τ, ·) ∈ M_{α,β}(D′) uniformly with respect to τ ∈ [0, b − a], and it therefore follows that P_τ(τ, ·)/φ′ ∈ L_{α,β}(D), uniformly for τ ∈ [0, b − a], where D is defined in terms of D′ as at the outset of Section 3.1. Hence it follows immediately, according to Theorem 119.1, that there exists a constant c_1, independent of N, such that
We next bound E_p(x). In essence, it suffices to show that for all τ ∈ [0, b − a] and z ∈ D, we have the inequality

with c_2 a constant independent of τ and z. For, if the last inequality holds, then by Lemma 143.1 we get

with c_3 a constant independent of τ, z, and N. It then follows that

Hence, it remains to derive the bound on |P_τ(τ, z)|. Since P(τ, ·) ∈ M_{α,β}(D′) uniformly with respect to τ ∈ [0, b − a], it follows that P_τ(τ, ·)/φ′ ∈ L_{α,β}(D) uniformly with respect to τ ∈ [0, b − a]. Since for τ ∈ [0, b − a] we also have |P_τ(τ, z)/φ′(z)| ≤ c_1 ρ^α/[1 + ρ]^{α+β} uniformly for z ∈ D, with c_1 a constant independent of τ and z, the bound follows. •
3.5.2
Multidimensional indefinite convolutions
Perhaps the most important application of the result of Theorem 140.1 is its remarkable efficiency for collocating multidimensional convolution integrals, especially those involving integral formulations of partial differential equations. These include potential theory problems, scattering problems, and Navier-Stokes problems. We illustrate here the Sinc collocation of a two-dimensional convolution integral. The example that follows also illustrates the ease of adaptation to parallel computation of the resulting multidimensional convolution algorithm.
3.5.3
Two dimensional convolution
Here we illustrate the approximation of a convolution integral of the form
where the approximation is sought over the region
and with (a_j, b_j) ⊂ ℝ. We assume that the mappings φ_j : D_j → D_d have been determined. We furthermore assume that positive integers N_j and M_j, as well as positive numbers h_j (j = 1, 2), have been selected. Then we set m_j = M_j + N_j + 1, and we define the Sinc points by z_i^(j) = φ_j^{−1}(i h_j), for i = −M_j, ..., N_j, j = 1, 2. Next, we determine matrices A_j, X_j, S_j, and X_j^{−1}, such that

where We require the two-dimensional "Laplace transform"

which we assume to exist for all s^(j) ∈ Ω⁺, with Ω⁺ denoting the right half plane. It can then be shown that the values p_{i,j}, which approximate p(z_i^(1), z_j^(2)), can be computed via the following succinct algorithm. In this algorithm we use the notation h_{i,·} = (h_{i,−M_2}, ..., h_{i,N_2})^T and h_{·,j} = (h_{−M_1,j}, ..., h_{N_1,j})^T. We again emphasize the obvious ease of adaptation of this algorithm to parallel computation.

Algorithm
1. Form the arrays z_i^(j) and 1/φ_j′(x) at x = z_i^(j), for j = 1, 2 and i = −M_j, ..., N_j, and then form the block of numbers [g_{i,j}] = [g(z_i^(1), z_j^(2))].
2. Determine Aj, Sj, and Xj for j = 1,2.
3. Form h_{·,j} = X_1^{−1} g_{·,j}, j = −M_2, ..., N_2.
4. Form k_{i,·} = X_2^{−1} h_{i,·}, i = −M_1, ..., N_1.
5. Form r_{i,j} = F(s_i^(1), s_j^(2)) k_{i,j}, i = −M_1, ..., N_1, j = −M_2, ..., N_2.
6. Form q_{i,·} = X_2 r_{i,·}, i = −M_1, ..., N_1.
7. Form p_{·,j} = X_1 q_{·,j}, j = −M_2, ..., N_2.
Remark It is unnecessary to compute the matrices X_1^{−1} and X_2^{−1} in steps 3 and 4 of this algorithm, since the vectors h_{·,j} and k_{i,·} can be found via the LU factorization of the matrices X_1 and X_2. • Thus, starting with the rectangular array [g_{i,j}], the algorithm transforms this into the rectangular array [p_{i,j}]. We shall denote the result of this algorithm via the simple notation
Once the numbers p_{i,j} have been computed, we can then use these numbers to approximate p on the region B via the use of a Sinc basis; upon setting ρ_j = e^{φ_j}, we can define the functions
We then get the approximation
To get an idea of the complexity of the above procedure, we make the simplifying assumption that M_j = N_j = N, for j = 1, 2.
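Steps 1-7 above can be condensed into matrix form, P = X_1 [F ∘ (X_1^{−1} G X_2^{−T})] X_2^T, where ∘ denotes entrywise multiplication by F(s_i^(1), s_j^(2)). The sketch below is an illustration under assumed conventions, not code from the text: it takes the separable kernel f(t_1, t_2) = e^{−t_1−t_2}, so that F(s_1, s_2) = (s_1/(1 + s_1))(s_2/(1 + s_2)); both intervals are (0, 1) with φ(z) = log(z/(1 − z)); and the matrices A_j are assumed to be h I^(-1) D(1/φ′) with entries e_k = 1/2 + (1/π) Si(πk).

```python
import numpy as np

def sine_integral(x, n=20001):
    # Si(x); odd; trapezoidal rule on a fine grid is adequate here.
    if x < 0:
        return -sine_integral(-x, n)
    if x == 0:
        return 0.0
    t = np.linspace(1e-12, x, n)
    return np.trapz(np.sin(t) / t, t)

def conv_matrix(Npts):
    # Assumed A_j = h I^(-1) D(1/phi') on (0, 1), phi(z) = log(z/(1 - z)).
    h = np.pi / np.sqrt(2 * Npts)
    m = 2 * Npts + 1
    sig = [0.5 + sine_integral(np.pi * k) / np.pi for k in range(-m + 1, m)]
    I1 = np.array([[sig[i - j + m - 1] for j in range(m)] for i in range(m)])
    kk = np.arange(-Npts, Npts + 1)
    z = np.exp(kk * h) / (1.0 + np.exp(kk * h))
    return h * I1 @ np.diag(z * (1.0 - z)), z

F1 = lambda s: s / (1.0 + s)        # one-dimensional factor of the assumed transform

A1, z1 = conv_matrix(12)
A2, z2 = conv_matrix(12)
lam1, X1 = np.linalg.eig(A1)        # A_j = X_j S_j X_j^{-1}
lam2, X2 = np.linalg.eig(A2)

G = np.ones((z1.size, z2.size), dtype=complex)   # g = 1 at the Sinc-point grid

H = np.linalg.solve(X1, G)                       # step 3: h_{.,j} = X_1^{-1} g_{.,j}
K = np.linalg.solve(X2, H.T).T                   # step 4: k_{i,.} = X_2^{-1} h_{i,.}
R = F1(lam1)[:, None] * F1(lam2)[None, :] * K    # step 5: multiply by F(s_i, s_j)
Q = (X2 @ R.T).T                                 # step 6: q_{i,.} = X_2 r_{i,.}
P = (X1 @ Q).real                                # step 7: p_{.,j} = X_1 q_{.,j}

exact = (1.0 - np.exp(-z1))[:, None] * (1.0 - np.exp(-z2))[None, :]
err = np.max(np.abs(P - exact))
print(err)
```

With g ≡ 1 and this separable kernel, the exact convolution factors as p(x, y) = (1 − e^{−x})(1 − e^{−y}), which the algorithm reproduces at the grid of Sinc points.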
We may readily deduce that if the above two-dimensional "Laplace transform" F is known explicitly, or if the evaluation of this transform can be reduced to the evaluation of a one-dimensional integral, then the complexity, i.e., the total amount of work required to achieve an error ε when carrying out the computations of the above algorithm (to approximate p(x, y) at (2N + 1)² points) on a sequential machine, is O([log(1/ε)]⁶). The above algorithm extends readily to ν dimensions, in which case the complexity for evaluating a ν-dimensional convolution integral (at (2N + 1)^ν points) by the above algorithm to within an error of ε is of the order of [log(1/ε)]^{2ν+2}.

3.5.4
Exercises
77. Let u be defined and integrable over (a, b), and set
78. Let A_m and B_m be defined as on page 137, and prove that each eigenvalue of A_m and B_m lies in the closed right half plane. Hint: Show first that I^(-1) = H + S, where H is the matrix of order m for which each entry is the number 1/2, and where S is a skew symmetric matrix; then show that if c is an arbitrary complex vector of order m, then the real part of c*I^(-1)c is nonnegative. Conclude from this result that the real parts of the eigenvalues of I^(-1), and hence also of A_m and B_m, are nonnegative.
79. Use the result of Theorem 134.1 to prove Lemma 143.1.
80. Prove Lemma 143.2.
81. Prove Lemma 143.3.
82. Prove Lemma 144.1.
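The hint of Exercise 78 can be checked numerically. Assuming the entries e_k = 1/2 + (1/π) Si(πk) (an assumption of this sketch, not a formula quoted from the text), the all-1/2 matrix H and the remainder S = I^(-1) − H, whose entries (1/π) Si(π(i − j)) are odd in i − j and hence skew symmetric, give Re c*I^(-1)c = (1/2)|Σ_i c_i|² ≥ 0.

```python
import numpy as np

def sine_integral(x, n=20001):
    # Si(x); odd; trapezoidal rule on a fine grid is adequate here.
    if x < 0:
        return -sine_integral(-x, n)
    if x == 0:
        return 0.0
    t = np.linspace(1e-12, x, n)
    return np.trapz(np.sin(t) / t, t)

m = 21
sig = [0.5 + sine_integral(np.pi * k) / np.pi for k in range(-m + 1, m)]
Iinv = np.array([[sig[i - j + m - 1] for j in range(m)] for i in range(m)])

H = np.full((m, m), 0.5)            # every entry equals 1/2
S = Iinv - H                        # S_{ij} = (1/pi) Si(pi (i - j)): odd in i - j

skew_defect = np.max(np.abs(S + S.T))       # ~ 0, so S is skew symmetric
rng = np.random.default_rng(0)
c = rng.standard_normal(m) + 1j * rng.standard_normal(m)
re_quad = (np.conj(c) @ Iinv @ c).real      # = (1/2) |sum_i c_i|^2 >= 0
min_re_eig = np.min(np.linalg.eigvals(Iinv).real)
print(skew_defect, re_quad, min_re_eig)
```

The skew part contributes a purely imaginary amount to c*I^(-1)c, so the nonnegativity of the real part, and hence of the real parts of the eigenvalues, rests entirely on H.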
3.6
Annotations
The Cardinal series representation owes its origin to Borel [1], [2]. Shortly thereafter, Whittaker [24] studied its analytic function properties. In [3], [4] Davis revived the use of the contour integral as a means to examine the error of trigonometric approximation. In [5], [15], and [17] the contour integral approach was used to examine the error of quadrature, and in [16] to study the error of approximation via the Cardinal series. The contour integral approach was also used effectively in [6]-[13] to study the error of cardinal interpolation, quadrature, and polynomial interpolation. The research on the adaptation of the cardinal function to other intervals was carried out mainly in [18]-[23]. Excellent applications of Sinc methods for solving partial differential equations may be found in [14], and similarly for integral equations, in [23].
3.7
References
[1] E. Borel. Sur l'interpolation. C.R. Acad. Sci. Paris, 124: 673-76, 1897.
[2] E. Borel. Mémoire sur les séries divergentes. Ann. École Norm. Sup., 16: 9-131, 1899.
[3] P.J. Davis. Errors of Numerical Approximation for Analytic Functions. J. Rational Mech. Anal., 2: 303-13, 1953.
[4] P.J. Davis. On the Numerical Integration of Periodic Analytic Functions. In On Numerical Approximation, edited by R. Langer, University of Wisconsin Press, Madison, 45-49, 1959.
[5] G. Hämmerlin. Über ableitungsfreie Schranken für Quadraturfehler. Numer. Math., 5: 226-33, 1961.
[6] R. Kress. Interpolation auf einem unendlichen Intervall. Computing, 6: 274-88, 1970.
[7] R. Kress. Über die numerische Berechnung konjugierter Funktionen. Computing, 10: 177-87, 1972.
[8] R. Kress. Ein ableitungsfreies Restglied für die trigonometrische Interpolation periodischer analytischer Funktionen. Numer. Math., 16: 389-96, 1971.
[9] R. Kress. Zur numerischen Integration periodischer Funktionen nach der Rechteckregel. Numer. Math., 20: 87-92, 1972.
[10] R. Kress. On general Hermite trigonometric interpolation. Numer. Math., 20: 125-38, 1972.
[11] R. Kress. On error norms of the trapezoidal rule. SIAM J. Numer. Anal., 15: 433-43, 1978.
[12] R. Kress. Zur Quadratur unendlicher Integrale bei analytischen Funktionen. Computing, 13: 267-77, 1974.
[13] R. Kress. Linear Integral Equations. Applied Mathematical Sciences, v. 82, Springer-Verlag, New York, 1982.
[14] J. Lund and K.L. Bowers. Sinc Methods for Quadrature and Differential Equations. SIAM, Philadelphia, 1992.
[15] J. McNamee. Error-bounds for the evaluation of integrals by the Euler-MacLaurin formula and by repeated Gauss-type formulae. Math. Comp., 18: 368-81, 1964.
[16] J. McNamee, F. Stenger, and E.L. Whitney. Whittaker's cardinal function in retrospect. Math. Comp., 25: 141-54, 1971.
[17] E. Martensen. Zur numerischen Auswertung unendlicher Integrale. ZAMM, 48: T83-T85, 1968.
[18] F. Stenger. Integration formulae based on the trapezoidal formula. J. Inst. Maths Applics, 12: 103-14, 1973.
[19] F. Stenger. Approximations via Whittaker's cardinal function. J. Approx. Theory, 17: 222-40, 1976.
[20] F. Stenger. Remarks on integration formulae based on the trapezoidal formula. J. Inst. Maths Applics, 19: 145-47, 1977.
[21] F. Stenger. Numerical methods based on Whittaker cardinal, or Sinc functions. SIAM Review, 23: 165-224, 1981.
[22] F. Stenger. Explicit, nearly optimal, linear rational approximations with preassigned poles. Math. Comp., 47: 225-52, 1986.
[23] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, New York, 1993.
[24] E.T. Whittaker. On the functions which are represented by the expansions of the interpolation-theory. Proc. Roy. Soc. Edinburgh, 35: 181-94, 1915.
Chapter 4
Explicit Sinc-Like Methods

In this chapter we derive several methods of approximation using the function values {f(kh)}_{k=−∞}^{∞}. We present a family of simple rational functions which make possible the explicit and arbitrarily accurate rational approximation of the filter, the step (Heaviside), and the impulse (delta) functions. The chief advantage of these methods is that they make it possible to write down a simple and explicit rational approximation corresponding to any desired accuracy. Also, the three families of approximations are very simply connected with one another: the filter is related to the Heaviside function via an elementary transformation, and the impulse is the derivative of the Heaviside function. Thus, these methods make it possible for us to approximate generalized functions.
4.1
Positive base approximation
In this section we discuss various methods, some of which are new, for approximating a function f(t) using the values f(0), f(±h), f(±2h), ..., where h > 0. Some of the given formulas are interpolating, i.e., the approximant actually takes on the values of f at the points kh, k ∈ Z, where Z = {0, ±1, ±2, ...}, whereas most others merely approximate f. However, only those that approximate f do in fact converge to f as h → 0. We give bounds on the error of approximation as simple functions of h, and these bounds illustrate the rate of convergence of the approximants to f as h → 0. Let h > 0, and corresponding to an arbitrary number β ∈ (0, 1), let us set
and
In order to prove convergence of F_j to f, let us recall the definition of the modulus of continuity of f:
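In its standard form the modulus of continuity is ω(f, δ) = sup_{|t|≤δ} sup_{x∈ℝ} |f(x + t) − f(x)|, and it can be explored numerically. The sketch below discretizes both suprema over finite grids, so it yields only an approximation to the true modulus; the window [a, b] and grid sizes are assumptions of the sketch.

```python
import numpy as np

def modulus_of_continuity(f, delta, a=-5.0, b=5.0, nx=2001, nt=201):
    # Discrete estimate of omega(f, delta) = sup_{|t| <= delta} sup_x |f(x+t) - f(x)|.
    # The suprema are taken over finite grids on [a, b], so this is a sketch only.
    x = np.linspace(a, b, nx)
    best = 0.0
    for t in np.linspace(0.0, delta, nt):
        best = max(best, float(np.max(np.abs(f(x + t) - f(x)))))
    return best

# For f = sin, omega(f, delta) = 2 sin(delta/2), close to delta for small delta.
w = modulus_of_continuity(np.sin, 0.1)
print(w)
```

For a uniformly continuous f the estimate tends to 0 with δ, which is exactly the property the convergence proofs below exploit.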
Theorem 154.1 Let α and β be constants such that 0 < α < β < 1, and let f be uniformly bounded on ℝ. Then there exist constants A_j and B_j, j = 1, 2, 3, 4, such that
where
Proof We may write
with
Hence, we write
with
We now complete the proof only for the case of j = 1, leaving the remaining cases to the problems at the end of this section. The Poisson summation formula, with the notation
states
and
Finally, for the case of ε_1, since g_1(t) > 0 and g_1(−t) = g_1(t), we have

The proof for the case of the approximation F_1(f, β, h, t) is complete. • Let us mention, at this point, the identity

where [·] denotes the greatest integer function, which yields yet another explicit expression for F_1(f, β, h), for the case when f is periodic. More specifically, if f is continuous and periodic on ℝ, with period T, then by taking h = T/N, where N is a positive integer, we get
The formulas for the F_j on page 154 will of course be more useful in practice in the cases when either f has compact support, or else when f(t) decreases to zero rapidly as t → ±∞. Let us note, furthermore, that in such cases, we can: (a) explicitly express the Fourier transform of each F_j; and (b) explicitly express the indefinite integrals of F_1 and F_2. Moreover, if f has compact support, then F_3(f, β, h) is an entire function. Just as for the case of F_1(f, β, h), we can obtain another explicit (and useful!) expression for F_4(f, β, h) in the case when f is periodic on ℝ. In this case we may use the identity

to deduce that if f has period T on ℝ, then by taking h = T/N, where N is a positive integer, we get

We remark also that if f(t) → 0 as t → ±∞, then both the Fourier and Hilbert transforms of F_4(f, β, h) may be explicitly expressed:
Finally, using the identity
we find that if f is periodic, with period T on ℝ, then by taking h = T/N, where N is a positive integer, we get
4.1.1
Exercises
83. Use the result

to prove Theorem 154.1 for the case of j = 2.
84. Use the result

to prove Theorem 154.1 for the case of j = 3.
85. Use the result

to prove Theorem 154.1 for the case of j = 4.
4.2
Approximation via elliptic functions
Elliptic functions also provide explicit methods of interpolation that are just as accurate as Sinc methods, and yet they are at times more convenient to use, such as for the inversion of Laplace transforms. For example, the functions sin(πz/h) and Φ_h(z) have zeros at the same points z = e^{jh}, j = 0, ±1, ±2, ..., where
Let 0 < k < 1. The following elliptic function notation, i.e.,
then enables us to show that
Theorem 159.1 Let D denote the sector in the complex plane, D = {z ∈ C : |arg z| < d}, with 0 < d < π. Let g ∈ H²(D). Then, for all s ∈ D, we have
with
Moreover, if s = r e^{iθ}, with r > 0 and |θ| < d, then
where
and where N_1(g, D, s) is uniformly bounded for s ∈ (0, ∞). Proof The proof of this theorem may be carried out using the results of the problems at the end of this section. We therefore omit the proof. • The results of this theorem may be readily extended to the general setting of Section 3.1. We thus state the following result, leaving the proof to the problems at the end of this section. Theorem 159.2 Let D, Γ, z_j, ρ, and L_{α,β}(D) be defined as in Section 3.1. Let g ∈ L_{α,β}(D), let N be a positive integer, let M = [βN/α], where [·] denotes the greatest integer function, and let h = (πd/(βN))^{1/2}. Then there exists a constant C, independent of N, such that

An important feature of Theorem 159.2 is that since the function Φ_h only has poles and zeros in the entire complex plane, after
approximating a Laplace transform g(s) = (Lf)(s) = ∫_0^∞ e^{−st} f(t) dt, we can apply the Bromwich inversion formula

to get an explicit approximation of f by evaluating the inverse transform via the use of residues. The result is summarized in the following theorem. Theorem 160.1 Given g ∈ H²(Ω⁺), where Ω⁺ denotes the right half plane, there exists a unique f ∈ L²(0, ∞) such that f = L^{−1}g. Let δ_h(d) be defined by the formula
Then, for every positive number c, we have
References to the proof are given in the Annotations.

4.2.1
Exercises
86. Let Φ_h be defined as on page 158. Prove that
(a) |Φ_h(ix)| = 1 for all x ∈ ℝ;
(b) Φ_h(−z) = 1/Φ_h(z) for all z ∈ C;
(c) |Φ_h(x)| ≤ k^{1/2} for all x ∈ (0, ∞);
(d) Φ_h′(e^{jh}) = k^{1/2} K (−1)^j e^{jh}/π for all j ∈ Z;
(e) k^{1/2} ≤ |Φ_h(e^{(j+1/2)h} e^{iθ})| ≤ k^{−1/2} for all θ ∈ ℝ and j ∈ Z.
87. (a) Use Poisson's summation formula to show that if z = t e^{iθ}, with t > 0 and |θ| = d, and with 0 < d < π, then
with
(b) Prove that
88. Use the definitions on page 158 to establish the relations
89. Consider the function u(s) = log |Φ_h(s)| defined in the region
(a) Use the results of the previous problems to show that u(s) = 0 if arg(s) = π/2, and that u(s) ≤ log(k^{1/2}) if arg(s) = 0.
(b) Show that if v(s) = (1 − 2θ/π) log(k^{1/2}), then v is also harmonic in P, and moreover, for all s ∈ P, we have u(s) ≤ v(s).
(c) Conclude from part (b) of this problem, and Exercise 88, that for any s = r e^{iθ} in the right half plane, we have |Φ_h(s)| ≤ k^{1/2−|θ|/π} ≤ 2 e^{−πd(1/2−|θ|/π)/h}.
90. Prove Theorem 159.2.
91. Prove Theorem 160.1.
92. Prove that the Laplace transform (LH_h)(s) of
is
4.3
Heaviside, filter, and delta functions
In this section we derive simple explicit rational approximations of the Heaviside, filter, and delta functions.
4.3.1
Heaviside function
For a ∈ ℝ, we define the function sgn(a) as
If h > 0 and if N denotes a positive integer, we define the Heaviside function H, and functions H_N and H̄_N, on ℝ by

Theorem 162.1 If h > 0 and w ∈ ℝ, then
and
Moreover, by taking
we have for all w ∈ ℝ,
Proof By the Poisson summation formula we can derive the identity
Then, we clearly have
Moreover, in view of the above definitions of H̄_N and H_N, we also have

Hence, the theorem follows after the above selection of h. •
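The qualitative behavior noted in the remark that follows, namely the value 1/2 at the origin and the monotone approach to H(w) on either side, is easy to visualize. The sketch below uses the elementary logistic function 1/(1 + e^{−w/h}) as a stand-in; it is not the rational approximant H_N constructed in this section, only an illustration of the same qualitative features.

```python
import numpy as np

def step_approx(w, h):
    # Logistic stand-in for a smooth approximation of the Heaviside function.
    return 1.0 / (1.0 + np.exp(-w / h))

h = 0.05
w = np.linspace(-1.0, 1.0, 401)
s = step_approx(w, h)

at_zero = step_approx(0.0, h)        # exactly 1/2, as for H_N(0) in the text
monotone = bool(np.all(np.diff(s) > 0))
err_away = max(step_approx(-0.5, h), 1.0 - step_approx(0.5, h))
print(at_zero, monotone, err_away)
```

As with the approximants of this section, the error is largest near w = 0 and decays rapidly away from it as h decreases.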
We remark that although the bounds on |H(w) − H_N(w)| and |H(w) − H̄_N(w)| are bad near w = 0, we have H_N(0) = H(0) = 1/2, and H_N(w) varies monotonically between H(w) and 1/2 for w < 0, and between 1/2 and H(w) for w > 0.

4.3.2
The filter or characteristic function
We define the filter function X(w), for w ∈ ℝ, by
We also set
Paraphrasing Theorem 162.1, we get Theorem 163.1 If h > 0 and w ∈ ℝ, then
and
Moreover, by taking
The proof is the same as that of Theorem 162.1, and we therefore omit it.

4.3.3
The impulse or delta function

In this section we examine the derivative of the function H_N. Clearly the derivative of the Heaviside function H(w) is the delta function, which has many applications to generalized functions. As may be expected, H_N′ must in some way resemble H′. We now state some results on this resemblance for purposes of computational applications. Theorem 164.1 The derivative of H_N is given by
Setting

and letting h be selected as in Theorem 163.1, we have
uniformly on ℝ, and also

uniformly on ℝ − [−ε, ε], where ε > 0 is arbitrary. Furthermore, if f is continuous and bounded on ℝ, then

where C(N) = O(1/√N) as N → ∞, and where ω(f, δ) is the modulus of continuity of f.
4.3. HEAVISIDE, FILTER, AND DELTA FUNCTIONS
165
Proof Let f ( x ; w ) = e™ ( ff ~'^2 and let n = N + 1. We note that
which coincides with Riemann's approximation of the integral
After some computation, we get
Thus,
Consequently,
as N → ∞, and with the bound on the extreme right-hand side uniformly valid on ℝ. The considerations in the case of β_N H_N′(β_N w) are similar to those of α_N H_N′(α_N w), and we leave them to an exercise at the end of this section. We now note that
166
BIBLIOGRAPHY
where C(N) = O(1/√N) as N → ∞. On the other hand, the integral on the extreme right-hand side is f(0) + θ_N, with θ_N bounded by ω(f, √a_N) + (2M/π)√a_N, and with M bounding f on ℝ. •

4.3.4
Exercises
93. Verify the inequality

94. Show that

uniformly on ℝ − [−ε, ε], where ε > 0 is arbitrary.
4.4
Annotations
The material of this chapter is taken from [1], [2], [3], [4], and [5].
Specific comments
Section 4.1: The positive base methods were originally derived in [3]; some of these procedures have effective extensions to radial basis approximations.
Section 4.2: The interpolation methods presented in this section were introduced in [4]. Complete proofs of Theorems 159.1, 159.2, and 160.1 can be found in [5] on pages 255-60.
Section 4.3: The Heaviside, delta, and filter approximations by rational functions were first derived in [1]. These ideas go back to [2].
4.5
References
[1] Y. Ikebe, M. Kowalski, and F. Stenger. Rational approximation of the step, filter, and impulse functions. In Asymptotic and Computational Analysis, edited by R. Wong, Marcel Dekker, New York, pages 441-54, 1990.
[2] F. Stenger. An analytic function which is an approximate characteristic function. SIAM J. Numer. Anal., 12: 239-54, 1975.
[3] F. Stenger. Explicit approximate methods for computational control theory. In Computation and Control, edited by K.L. Bowers and J. Lund, Birkhäuser, Basel, pages 299-316, 1989.
[4] F. Stenger. Explicit, nearly optimal, linear rational approximations with preassigned poles. Math. Comp., 47: 225-52, 1986.
[5] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, New York, 1993.
Chapter 5
Moment Problems

In this chapter we present the classical moment problems as they have been mathematically defined. Moment problems are the simplest way to describe inverse problems mathematically. These problems were originally posed with moments being integrals of monomials. Such moment problems are ill-posed, and present considerable computational difficulty. On the other hand, moment problems whose moments are integrals of orthogonal bases can be dealt with much more easily computationally. The main problem of the theory of moments is that of finding necessary and sufficient conditions on real numbers μ_i, i ∈ I, and elements p_i of a normed space X such that there exists a linear functional L ∈ X* satisfying
Here, I is an arbitrary set and X* denotes the space of all linear and continuous functionals L : X → ℝ, equipped with the norm

where B(1) is the unit ball in X.¹ Frequently, an a priori bound on the norm of L is assumed, and certain positivity properties of L are also required. We shall confine ourselves to the following requirements:

where B ∈ (0, ∞] and F is a cone in X, i.e., a subset of X such that λf ∈ F for all f ∈ F and λ > 0.

¹The space X* is said to be dual to X.
In order to avoid triviality, we shall assume that at least one of the numbers μ_i is different from zero. More specifically, we shall require that

Although the moment problem formulated above might not seem to be related to approximation, it turns out that it is dual to some approximation problem; the theory we presented in Chapter 1 corresponds to the case when I is finite and F = {0}.
5.1
Duality with approximation
Since μ_0 ≠ 0 and no assumptions on the elements p_i were made, we may reformulate the moment problem as follows.
(MP) Find a functional L in X* such that

Here q_0 = p_0/μ_0, q_t = p_t − μ_t q_0 for t ∈ T = I \ {0}, and δ_{t,0} is the Kronecker delta, i.e., δ_{0,0} = 1 and δ_{t,0} = 0 for t ∈ T. In order to derive a necessary and sufficient condition for the existence of L, we shall need the Mazur-Orlicz theorem, which reads as follows. Let Y be an arbitrary linear space and let ω : Y → ℝ satisfy the conditions

for all x, y ∈ Y and all α > 0. Given a set J, let f : J → ℝ be a bounded function and let g : J → Y be an arbitrary function. Then the following statements are equivalent.
(C1) There exists a linear functional L : Y → ℝ such that
(C2) The inequality
holds for arbitrary n ∈ N, j_1, j_2, ..., j_n ∈ J, and for arbitrary nonnegative numbers α_1, α_2, ..., α_n. Let us now consider the following approximation problem.
(AP) Find an element v ∈ span{q_t}_{t∈T} − F such that

Here, ||·|| is the norm on X and the symbol '−' stands for the algebraic subtraction of subsets of the linear space X, i.e., for A, B ⊂ X we have A − B = {a − b : a ∈ A, b ∈ B}. Given a subset W ⊂ X and an element f ∈ X, we shall denote inf_{w∈W} ||f − w|| by e(f, W). This extends the definition of the best approximation error used in Chapter 1. We are now in a position to show the following duality theorem.
Theorem 171.1 The moment problem (MP) has a solution iff the approximation problem (AP) has no solution, i.e.,

Here, 1/B is to mean 0 if B = ∞. Proof Let us assume that a linear functional L satisfies the conditions (i), (ii), and (iii). Equivalently, we may rewrite these conditions in the form

It is now easily seen that for a fixed m, upon defining ω(x) = m||x|| and choosing suitably the set J and the functions f : J → {−1, 0, 1}, g : J → X, we may write these inequalities in the form (a), (b). Thus, by the Mazur-Orlicz theorem, we conclude that L exists iff for some m ∈ (0, B) the inequality

holds for all positive numbers α_1, α_2 and for all elements w ∈ span{p_t}_{t∈T} and h ∈ F. Of course, we may assume that a = α_1 − α_2 > 0 and set
Thus, w + h = −av, and we see that L exists iff for some m ∈ (0, B),

is satisfied regardless of v ∈ span{q_t}_{t∈T} − F. This means that a necessary and sufficient condition for the existence of L is
as claimed. • Corollary 172.1 We have
where the infimum is taken over all functionals C 6 X* the conditions (i) and (ii).
satisfying
Remarks 1. Let I? be a closed subset of Tln and let Co(D) denote a normecl space of continuous functions / : D^TZ vanishing at infinity, i.e., for any e > 0 there exists a compact subset K C D such that |/(s)| < e for all s € D \ K. The norm in C0(D) is defined by |/|| = sups££, |/(s)|. In the case when D is compact we shall write C(D) instead o f C 0 ( D ) . Let us now recall the Riesz theorem on characterization of functionals in Co(D)*: Given a functional L € Co(D)* there exists a unique regular Borel measure /j, on D such that
Moreover, the functional $L$ and the measure $\mu$ satisfy the condition $\|L\| = |\mu|$, where $|\mu|$ is the total variation of $\mu$. Thus, for $X = C_0(D)$ and arbitrary functions $p_i \in C_0(D)$, the moment problem (MP) can be reformulated as follows. Find a regular Borel measure $\mu$ on $D$ such that
2. Let $\mu$ be a finite nonnegative Borel measure on $\mathcal{R}^n$. Given $1 \le p < \infty$ let $L_p(\mathcal{R}^n, \mu)$ denote the linear space of all functions $f : \mathcal{R}^n \to \mathcal{R}$ whose norm
is finite. (Two functions in $L_p(\mathcal{R}^n, \mu)$ whose difference vanishes $\mu$-almost everywhere are considered to be the same.) We now recall the Riesz–Steinhaus theorem on the characterization of functionals in $L_p(\mathcal{R}^n, \mu)^*$. Let $1 \le p < \infty$. For every $L \in L_p(\mathcal{R}^n, \mu)^*$ there exists a unique function $\lambda \in L_q(\mathcal{R}^n, \mu)$, where $1/p + 1/q = 1$, such that
Moreover, the functional $L$ and the function $\lambda$ satisfy the relation $\|L\| = \|\lambda\|_q$. Hence, for $X = L_p(\mathcal{R}^n, \mu)$ and arbitrary functions $p_i \in L_p(\mathcal{R}^n, \mu)$ the moment problem (MP) can be reformulated as follows. Find a function $\lambda \in L_q(\mathcal{R}^n, \mu)$, $1/p + 1/q = 1$, such that
3. Let us note that by choosing suitably the cone $F$ in the moment problem (I)–(III) we can model the support of the resulting measure $\mu$. For instance, if $F$ in the problem (I)–(III) consists of all functions in $C_0(D)$ that are nonnegative on a subset $W \subset D$, then the measure $\mu$ has to be nonnegative and supported on a subset of $W$. A similar remark, concerning the support of $\lambda$, also applies to the moment problem (J)–(JJJ).

4. Given two points $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_n)$ in $\mathcal{R}^n$, by an interval we mean the set
The collection of all such intervals is denoted by $\mathcal{I}_n$.
Let $\phi$ be a function of intervals, i.e., a real-valued mapping defined on $\mathcal{I}_n$ and satisfying the condition
for all disjoint intervals $I_1, I_2, \ldots, I_n \in \mathcal{I}_n$. An interval $I \in \mathcal{I}_n$ is said to be an interval of continuity of $\phi$ if $\lim_{k \to \infty} \phi(I_k) = \phi(I)$ for any sequence of intervals $I_k$ such that either $I_k \uparrow I$ or $I_k \downarrow I$. We recall that there exists a unique Borel measure on $\mathcal{R}^n$ coinciding with $\phi$ over all intervals of continuity of $\phi$. For this reason we shall denote the measure by $\phi$, the symbol for the function of intervals. We shall also write $d\phi$ for the integrator corresponding to that measure. A sequence of functions of intervals $\{\phi_k\}_{k=1}^{\infty}$ is said to converge to $\phi$ if $\lim_{k \to \infty} \phi_k(I) = \phi(I)$ for all intervals of continuity of $\phi$. We are now in a position to recall two theorems of Helly, which are frequently used in solving moment problems with $D = \mathcal{R}^n$. For every uniformly bounded sequence $\{\phi_k\}_{k=1}^{\infty}$ of functions of intervals there exists a subsequence $\{\phi_{k_j}\}_{j=1}^{\infty}$ and a function of intervals $\phi$ to which the subsequence converges. Let $f : \mathcal{R}^n \to \mathcal{R}$ be a continuous function for which the condition $I_j \uparrow \mathcal{R}^n$ implies that $\int_{\mathcal{R}^n \setminus I_j} f(x)\, d\phi_k(x) \to 0$ uniformly in $k$. Let $\{\phi_k\}_{k=1}^{\infty}$ be a uniformly bounded sequence of functions of intervals that converges to $\phi$. Then
In the univariate case, a positive finite Borel measure on $\mathcal{R}$ can be defined in terms of a distribution function, i.e., a bounded nondecreasing function $f : \mathcal{R} \to \mathcal{R}$. We recall that for any such function there exists a unique positive Borel measure $\phi$ on $\mathcal{R}$ such that $\phi((-\infty, t]) = f(t) - \lim_{x \to -\infty} f(x)$ whenever $t$ is a continuity point of $f$. For this reason two distribution functions are said to be substantially equal if they have the same points of continuity and their values at these points differ only by a constant. The integrator corresponding to a distribution $f$ is denoted by $df$. Finally, we remark that the Helly theorems can be readily restated in terms of distributions instead of functions of intervals. •
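As a computational aside (ours, not the book's): integration against a distribution function can be sketched numerically. The script below, with illustrative jump locations and an integrand chosen by us, approximates a Riemann–Stieltjes integral $\int g\, df$ by a left-point sum and compares it with the exact value for a purely discrete distribution function.

```python
import numpy as np

def riemann_stieltjes(g, F, a, b, n=200_000):
    """Left-point Riemann-Stieltjes sum of g against the distribution F on (a, b]."""
    t = np.linspace(a, b, n + 1)
    return float(np.sum(g(t[:-1]) * np.diff(F(t))))

# A purely discrete distribution function with jumps w_i at points x_i
# (both arrays are illustrative choices, not taken from the text).
x = np.array([-1.0, 0.5, 2.0])   # jump locations
w = np.array([0.2, 0.5, 0.3])    # jump sizes: a positive measure of total mass 1

def F(t):
    # F(t) equals the sum of the jumps located at points <= t
    return (np.asarray(t)[..., None] >= x).astype(float) @ w

approx = riemann_stieltjes(np.cos, F, -5.0, 5.0)
exact = float(np.dot(w, np.cos(x)))  # integral against the discrete measure
print(abs(approx - exact) < 1e-3)
```

For a distribution with a density part the same sum applies; only the exact value would involve an ordinary integral as well.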
5.1.1 Exercises
95. The moment problem (MP) is said to be determinate if it has a unique solution $\mathcal{L}$ in $X^*$. Show that if the elements $p_i$ are linearly dense in $X$, then the moment problem (MP) is either unsolvable or determinate.

96. Let $u_0, u_1, \ldots, u_n$ be linearly independent elements of a unitary space $U$. Define a linear functional $\ell$ on $\mathrm{span}\{u_k\}_{k=0}^{n}$ by $\ell(u_k) = \delta_{0k}$ for $k = 0, 1, \ldots, n$. Show that $\|\ell\|$ equals the first diagonal coefficient of the Gram matrix $G(u_0, u_1, \ldots, u_n)$.

97. Let $\{\mu_i\}_{i=0}^{n}$ and $\{p_i\}_{i=0}^{n}$ be finite sequences of real numbers and linearly independent elements of $X$, respectively. Assuming that $\dim X > n$, find a necessary and sufficient condition on a positive number $B$ such that there exists a functional $L \in X^*$ satisfying the equations $\|L\| = B$ and $L(p_i) = \mu_i$ for $i = 0, 1, \ldots, n$.

98. Let $\mathcal{F}$ be a normed space of real-valued functions such that for all $f, g \in \mathcal{F}$ the product $fg$ belongs to $\mathcal{F}$. Assume that for some sequence $\{f_i\}_{i=0}^{\infty}$ of functions in $\mathcal{F}$ there exists a functional $L \in \mathcal{F}^*$ such that
Show that
99. Prove that under the assumption of the previous exercise there exists a positive number $M$ such that for arbitrary real numbers $p_0, p_1, \ldots, p_n$ we have
where $\|\cdot\|$ is the norm in $\mathcal{F}$.
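A numerical experiment related to Exercise 96 (a sketch under our own modeling assumptions: the unitary space is taken to be Euclidean $\mathcal{R}^6$, and the dimensions and random basis are ours). The Riesz representer of the functional $\ell(u_k) = \delta_{0k}$ on the span solves $Gc = e_0$ with the Gram matrix $G$, so $\|\ell\|^2 = (G^{-1})_{00}$; how this value matches the exercise's phrasing depends on the book's normalization of the Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 3
U = rng.standard_normal((d, n + 1))        # columns u_0, ..., u_n (independent a.s.)
G = U.T @ U                                 # Gram matrix G(u_0, ..., u_n)

# Riesz representer r of l on span{u_k}: <u_k, r> = delta_{0k}, r = U c, G c = e_0
c = np.linalg.solve(G, np.eye(n + 1)[:, 0])
r = U @ c
norm_sq = float(r @ r)                      # ||l||^2 restricted to the span
assert np.isclose(norm_sq, np.linalg.inv(G)[0, 0])

# Monte Carlo check: |l(x)| <= ||l|| ||x|| for x in the span, equality at x = r
X = U @ rng.standard_normal((n + 1, 1000))
ratios = np.abs(r @ X) / np.linalg.norm(X, axis=0)
assert ratios.max() <= np.sqrt(norm_sq) + 1e-12
print("||l||^2 =", norm_sq)
```

The minimal-norm extension of $\ell$ to all of $U$ has the same norm, which is why the Gram matrix alone determines it.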
5.2
The moment problem in the space $C_0(D)$
We shall now derive a number of solvability conditions for the moment problem (I)–(III) with
Theorem 176.1 If $B < \infty$, then the following statements are equivalent.

(a) There exists a nonnegative regular Borel measure $\mu$ on $D$ such that
(b) $e(q_0, \mathrm{span}\{q_i\}_{i \in T} - C_0(D)^+) \ge 1/B$.

(c) $\sup\{\inf_{s \in D}(w(s) - q_0(s)) : w \in \mathrm{span}\{q_i\}_{i \in T}\} \le -1/B$.
If $B < \infty$ and $q_0 \in C_0(D)^+$, these conditions are equivalent to the following.

(d) Let $\gamma > 0$ be fixed. For any function $w \in \mathrm{span}\{p_i\}_{i \in I}$, $w = \sum_{k=1}^{m} a_k p_{i_k}$, the inequality $w(s) \ge -\gamma$ on $D$ implies
If $B = \infty$ and $q_0 \in C_0(D)^+$, the conditions (a), (b), and (c) are equivalent to the following.

(d') For any function $w \in \mathrm{span}\{p_i\}_{i \in I}$, $w = \sum_{k=1}^{m} a_k p_{i_k}$, the inequality $w(s) \ge 0$ on $D$ implies
If $\inf_{s \in D} p_0(s) > 0$ and $B = \infty$, the conditions (a), (b), and (c) are equivalent to the inequality

(c') $\sup\{\inf_{s \in D} w(s) : w \in \mathrm{span}\{q_i\}_{i \in T}\} < 0$.

Moreover, if $D$ is convex, they are equivalent to the statement

(e) Each function in $\mathrm{span}\{q_i\}_{i \in T}$ has at least one zero on $D$.

Proof "(a) ⇔ (b)" This equivalence is a straightforward consequence of Theorem 171.1 and Remark 1 on page 172. "(b) ⇔ (c)" The condition (b) means that for all functions $w \in \mathrm{span}\{q_i\}_{i \in T}$ and $f \in F$ there exists an $s \in D$ such that
where $\varepsilon \ge 1/B$. We need to prove that this can be restated as follows: for every $w \in \mathrm{span}\{q_i\}_{i \in T}$ there is an $s \in D$ satisfying the inequality with $\varepsilon \ge 1/B$. Indeed, $w(s) - q_0(s) \le -\varepsilon$ implies that $w(s) - q_0(s) - f(s) \le -\varepsilon$ for all $f \in F$. Conversely, letting $f(s) = \max(0, w(s) - q_0(s))$ in the last inequality we get the preceding one. Thus, the equivalence (b) ⇔ (c) is proven. We assume that $B < \infty$ and $q_0 \in F$. "(c) ⇔ (d)" Let $\gamma$ be a fixed positive number. We suppose first that the condition (d) does not hold, i.e., that there exists a function $u$ in $\mathrm{span}\{p_i\}_{i \in I}$ violating (d) with $u(s) \ge -\gamma$ on $D$. Thus, for $w = u/(B\gamma)$ we have $w(s) - q_0(s) \ge -1/B$. This yields
which contradicts (c). Conversely, from the inequality above it follows that $w(s) - q_0(s) \ge -1/B$ on $D$ for some function $w$ from $\mathrm{span}\{q_i\}_{i \in T}$. Substituting $\alpha_0 = -B\gamma$ and $u(s) = B\gamma\, w(s)$ we get $u(s) + \alpha_0 q_0(s) \ge -\gamma$. Hence, the equivalence "(c) ⇔ (d)" is proven. We assume now that $B = \infty$ and $q_0 \in F$. "(c) ⇔ (d')" By the equivalence "(a) ⇔ (c)" the inequality (c) implies the existence of a functional $\mathcal{L} \in C_0(D)^*$ such that $\mathcal{L}(q_i) = \delta_{i0}$ for $i \in I$, which immediately gives the desired implication (c) ⇒ (d'). Conversely, let us assume that (c) does not hold, i.e.,
Letting $t \to \infty$ we get the condition (c') as claimed. We assume additionally that $D$ is convex. "(c') ⇔ (e)" It is easy to note that (e) implies (c'). Conversely, from (c') we get
Since $D$ is convex, this and (c') result in (e). This proves the equivalence "(c') ⇔ (e)" and completes the proof. •

In many applications one looks for a nonnegative Borel measure $\eta$ on $D$ such that
where the functions $P_i : D \to \mathcal{R}$ are continuous and do not necessarily vanish at infinity. Let us note that Theorem 176.1 can be applied provided that there exists a function $g : D \to \mathcal{R}$ satisfying
Indeed, then for $B = \infty$ and $p_i = P_i/g$ the condition (a) of Theorem 176.1 is a restatement of our requirements on $\eta$. Let us also note that the function $g$ can be easily removed from the resulting equivalence (a) ⇔ (d'). Thus, we obtain the following corollary, which will be extensively used in the next section to derive solvability conditions of the classical moment problems.

Corollary 178.1 If the function $g$ exists, a necessary and sufficient condition for the existence of a nonnegative regular Borel measure $\eta$ on $D$ satisfying (a') is that for any function $w \in \mathrm{span}\{P_i\}_{i \in I}$, $w = \sum_{k=1}^{m} a_k P_{i_k}$, the inequality $w(s) \ge 0$ on $D$ implies
5.3
Classical moment problems
We shall now review solvability conditions of the following classical moment problems. From now on, nonnegative regular Borel measures will be called, for brevity, positive measures.
Discrete trigonometric moment problem Given a sequence of complex numbers $\{t_k\}_{k=-\infty}^{\infty}$, find a positive measure $\sigma$ on the interval $[-\pi, \pi]$ such that

$$t_k = \int_{-\pi}^{\pi} e^{-iks}\, d\sigma(s), \qquad k = 0, \pm 1, \pm 2, \ldots \tag{179.1}$$
Continuous trigonometric moment problem Given a continuous function $t : (-a, a) \to C$, $0 < a < \infty$, find a positive measure $\sigma$ on $\mathcal{R}$ such that

$$t(x) = \int_{-\infty}^{\infty} e^{-ixs}\, d\sigma(s), \qquad x \in (-a, a). \tag{179.2}$$
Discrete Hausdorff moment problem Given a sequence of reals $\{f_k\}_{k=0}^{\infty}$, find a positive measure $\sigma$ on the interval $[0, 1]$ such that

$$f_k = \int_0^1 x^k\, d\sigma(x), \qquad k = 0, 1, 2, \ldots \tag{179.3}$$
Continuous Hausdorff moment problem Given a continuous function $f : (-\infty, 0) \to \mathcal{R}$, find a positive measure $\sigma$ on $(-\infty, 0)$ satisfying
Discrete Stieltjes moment problem Given a sequence of reals $\{s_k\}_{k=0}^{\infty}$, find a positive measure $\sigma$ on the interval $(0, \infty)$ such that

$$s_k = \int_0^{\infty} x^k\, d\sigma(x), \qquad k = 0, 1, 2, \ldots \tag{179.5}$$
Continuous Stieltjes moment problem Given a continuous function $s : (a, b) \to \mathcal{R}$, $-\infty < a < b < \infty$, find a positive measure $\sigma$ on $\mathcal{R}$ satisfying
Discrete Hamburger moment problem Given a sequence of reals $\{h_k\}_{k=0}^{\infty}$, find a positive measure $\sigma$ on $\mathcal{R}$ such that

$$h_k = \int_{-\infty}^{\infty} x^k\, d\sigma(x), \qquad k = 0, 1, 2, \ldots \tag{180.1}$$
Continuous Hamburger moment problem Given a continuous function $h : [0, a) \to \mathcal{R}$ such that $h(0) = 0$ and $0 < a < \infty$, find a positive measure $\sigma$ on $\mathcal{R}$ satisfying
The names of these problems suggest some relationships between the discrete and the continuous ones. Noting the similarities between problems (179.1) and (179.2), (179.3) and (179.4), and (179.5) and (179.6) is relatively easy; see the Exercises at the end of this section. In contrast, problems (180.1) and (180.2) seem not to be tied together at all. Thus, we shall take time to indicate that (180.2) relates to (180.1) through the function

$$g(z) = \int_{-\infty}^{\infty} \frac{d\sigma(t)}{t - z},$$
where $\sigma$ is a positive measure on $\mathcal{R}$. Indeed, assuming that $|t/z| < 1$, we have
and consequently,
where $h_k$ is given by (180.1). Integrating by parts one shows that for $zt(z - t) \ne 0$,
We leave the verification to the reader as an exercise. Using the Fubini theorem we may now rewrite g(z) as follows:
where $h(x)$ is given by (180.2). Thus, the function $g(z)$ generates both moment problems (180.1) and (180.2). We remark that $g$ uniquely determines the measure $\sigma$; see Exercise 101 on page 186. We shall now list solvability conditions of the classical moment problems.

Theorem 181.1 (a) The discrete trigonometric moment problem (179.1) has a solution iff for all complex numbers $c_0, c_1, \ldots, c_n$ the sequence $\{t_k\}_{k=-\infty}^{\infty}$ satisfies the condition

$$\sum_{j=0}^{n} \sum_{k=0}^{n} c_j \bar{c}_k\, t_{j-k} \ge 0.$$
(b) The continuous trigonometric moment problem (179.2) has a solution iff for all numbers $x_0, x_1, \ldots, x_n \in [0, a)$ and $c_0, c_1, \ldots, c_n \in C$ the function $t : (-a, a) \to C$ satisfies the condition

$$\sum_{j=0}^{n} \sum_{k=0}^{n} c_j \bar{c}_k\, t(x_j - x_k) \ge 0.$$
(c) The discrete Hausdorff moment problem (179.3) has a solution iff for $n, k = 0, 1, 2, \ldots$ the sequence $\{f_j\}_{j=0}^{\infty}$ satisfies the condition

$$\sum_{j=0}^{n} (-1)^j \binom{n}{j} f_{k+j} \ge 0.$$
(d) The continuous Hausdorff moment problem (179.4) has a solution iff for all $n = 0, 1, 2, \ldots$ and all numbers $h > 0$ and $x < -nh$ the function $f : (-\infty, 0) \to \mathcal{R}$ satisfies the condition

$$\sum_{j=0}^{n} (-1)^{n-j} \binom{n}{j} f(x + jh) \ge 0.$$
(e) The discrete Stieltjes moment problem (179.5) has a solution iff for all real numbers $p_0, p_1, \ldots, p_n$ the sequence $\{s_k\}_{k=0}^{\infty}$ satisfies the condition

$$\sum_{j=0}^{n} \sum_{k=0}^{n} p_j p_k\, s_{j+k} \ge 0 \quad \text{and} \quad \sum_{j=0}^{n} \sum_{k=0}^{n} p_j p_k\, s_{j+k+1} \ge 0.$$
(f) The continuous Stieltjes moment problem (179.6) has a solution iff for all real numbers $x_0, x_1, \ldots, x_n, p_0, p_1, \ldots, p_n$ such that $x_j + x_k \in (a, b)$ for all $j, k$, the function $s : (a, b) \to \mathcal{R}$ satisfies the condition
(g) The discrete Hamburger moment problem (180.1) has a solution iff for all numbers $p_0, p_1, \ldots, p_n$ the sequence $\{h_k\}_{k=0}^{\infty}$ satisfies the condition

$$\sum_{j=0}^{n} \sum_{k=0}^{n} p_j p_k\, h_{j+k} \ge 0.$$
(h) The continuous Hamburger moment problem (180.2) has a solution iff for all numbers $x_0, x_1, \ldots, x_n \in [0, a/2)$ and $p_0, p_1, \ldots, p_n \in \mathcal{R}$ the function $h : [0, a) \to \mathcal{R}$ satisfies the condition
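The conditions of Theorem 181.1 for the discrete problems are countable families of finite-dimensional positivity tests, so they are easy to probe numerically. A sketch (all moment sequences below are our own illustrative choices): Gaussian moments for the Hamburger condition (g), the moments $k!$ of $e^{-x}\,dx$ on $(0, \infty)$ for the two Hankel forms behind the Stieltjes condition (e), point-mass trigonometric moments for (a), and $f_k = 1/(k+1)$, the moments of Lebesgue measure on $[0, 1]$, for the finite-difference condition (c).

```python
import numpy as np
from math import comb, factorial

def min_eig(M):
    return float(np.linalg.eigvalsh(M).min())

def hankel(seq, m):
    return np.array([[seq[j + k] for k in range(m)] for j in range(m)])

# (g) Hamburger: moments of the standard Gaussian, h_{2j} = (2j-1)!!, odd ones 0
h = [1, 0, 1, 0, 3, 0, 15, 0, 105]
assert min_eig(hankel(h, 5)) > -1e-9

# (e) Stieltjes: moments of e^{-x} dx on (0, inf) are s_k = k!
s = [factorial(k) for k in range(9)]
assert min_eig(hankel(s, 4)) > -1e-9        # quadratic form in s_{j+k}
assert min_eig(hankel(s[1:], 4)) > -1e-9    # shifted form in s_{j+k+1}

# (a) trigonometric: t_k = exp(-ik*theta), the moments of a unit point mass
theta, m = 0.7, 5
T = np.array([[np.exp(-1j * (j - k) * theta) for k in range(m)] for j in range(m)])
assert min_eig(T) > -1e-9                   # Toeplitz matrix (t_{j-k}) is PSD

# (c) Hausdorff: f_k = 1/(k+1) is a completely monotone sequence
f = [1.0 / (k + 1) for k in range(12)]
for n in range(6):
    for k in range(6):
        assert sum((-1) ** j * comb(n, j) * f[k + j] for j in range(n + 1)) >= 0

print("all positivity conditions verified")
```

Of course a numerical test of finitely many forms cannot prove solvability; it can only fail to refute it, which is the practical use of these criteria.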
The proof is based on Theorem 176.1 and on representations of nonnegative polynomials. We shall prove below the assertions (a), (b), (c), and (d). Proofs of (e), (f), (g), and (h) will be outlined in a series of exercises.

Proofs (a) The discrete trigonometric moment problem (179.1) can be reformulated as follows. Find a positive measure $\sigma$ on the interval $[-\pi, \pi]$ such that
for $k \in Z = \{0, \pm 1, \pm 2, \ldots\}$. Thus, by Theorem 176.1 (condition (d')), the measure $\sigma$ exists iff the implication
holds for all $m, n \in \mathcal{N}$, all $u_k, j_k \in Z$, and all $\alpha_k, \beta_k \in \mathcal{R}$. We recall the following theorem. Every trigonometric polynomial $p(t)$ that is nonnegative on the interval $[-\pi, \pi]$ can be represented as

$$p(t) = \Big| \sum_{k=0}^{n} c_k e^{ikt} \Big|^2$$
for some complex numbers $c_0, c_1, \ldots, c_n$. The proof of the assertion (a) is complete.

(b) The continuous trigonometric moment problem (179.2) can be rewritten as follows. Find a positive measure $\sigma$ on the real axis such that
for all $x \in (-a, a)$. Since the function $t$ is assumed to be continuous, we may restrict ourselves to rational arguments $x$, i.e., $x = n/d$, where $n \in Z$ and $d \in \mathcal{N}$. Noting that the functions $\mathcal{R} \ni t \mapsto \cos(tx)/(1 + |t|)$ and $\mathcal{R} \ni t \mapsto \sin(tx)/(1 + |t|)$ are members of $C_0(\mathcal{R})$, we use Corollary 178.1 to state that the measure $\sigma$ exists iff the implication
holds for all $u, w, d_k \in \mathcal{N}$, $n_k \in Z$, and all $\alpha_k, \beta_k \in \mathcal{R}$ such that $n_k/d_k \in (-a, a)$. The function $P(t)$ above can be regarded as a trigonometric polynomial of the variable $y = t \prod_{k=1}^{m} d_k^{-1}$, where
$m = \max(u, w)$. Thus, the assumption $P(t) \ge 0$ on $\mathcal{R}$ leads to the conclusion that $P(t)$ can be represented as
where $x_k = k \prod_{j=1}^{m} d_j^{-1} \ge 0$ and $c_k \in C$. As the $d_k$ are arbitrary natural numbers, the implication holds iff
for all numbers $x_0, x_1, \ldots, x_n \in [0, a)$ and $c_0, c_1, \ldots, c_n \in C$. The proof of the assertion (b) is complete.

(c) According to Theorem 176.1 (condition (d')), the discrete Hausdorff moment problem (179.3) has a solution iff for arbitrary reals $\alpha_1, \alpha_2, \ldots, \alpha_m$ and for arbitrary nonnegative integers $k_1, k_2, \ldots, k_m$ the condition
implies the inequality $\lambda(q) = \sum_{j=1}^{m} \alpha_j f_{k_j} \ge 0$. For sufficiently large $M$, the polynomial $q(x)$ above can be written as
where the polynomials $r_1, r_2, \ldots, r_{M-1}$ do not depend on $M$; see Exercise 43 on page 85. Since the polynomials $v_{k,n}(x) = x^k(1 - x)^n$ are nonnegative on $[0, 1]$ and the quantity $\lambda(q)$ depends continuously on $q$, the implication above holds iff $\lambda(v_{k,n}) \ge 0$ for $k, n = 0, 1, 2, \ldots$. Noting that
we complete the proof of (c).

(d) The function $f$ is assumed to be continuous. Thus, without loss of generality, we confine ourselves to rational arguments $x$ in (179.4), i.e., $x = n/d$, where $n \in Z$ and $d \in \mathcal{N}$. By Theorem 176.1 (condition (d')), the continuous Hausdorff moment problem has a solution iff the implication
holds for arbitrary numbers $u, n_k, d_k \in \mathcal{N}$. For fixed parameters $\alpha_k, n_k, d_k$ the function $Q(x)$ can be regarded as an algebraic polynomial of the variable $y_h = e^{-xh}$, where $h = \prod_{j=1}^{u} d_j^{-1}$. Thus, proceeding as in the proof of the assertion (c) we arrive at the conclusion that the last condition reduces to the requirement that
for $h > 0$ and $k, n = 0, 1, 2, \ldots$. Here, the functions $V_{k,n,h}$ are defined by the formula
Hence, $\lambda(V_{k,n,h}) = \sum_{j=0}^{n} \binom{n}{j} (-1)^{n-j} f(h(j - n - k))$. Now, the assertion (d) follows easily. Proofs of the assertions (e), (f), (g), and (h) are outlined in a series of exercises. •

5.3.1 Exercises
100. Show that for $zt(z - t) \ne 0$ we have
101. Show that the mapping
is well defined for any function $\sigma$ of bounded variation on $\mathcal{R}$ and has the following properties.

1. The function $g$ is analytic in the upper and in the lower half-planes and $g(\bar z) = \overline{g(z)}$.

2. The integrator $d\sigma$ is uniquely determined by $g$. Moreover,
The last equation is known in the literature as the Stieltjes inversion formula.

102. Show that given a real number $a \ne 0$, there exists a positive measure $\sigma$ on $\mathcal{R}$ such that
for $j = 0, 1, 2, \ldots$. Prove that we may require the measure to be supported on
103. Given a positive measure $\sigma$ on $[0, \infty)$, find a positive measure $\sigma_1$ on $\mathcal{R}$ such that
where $n = 0, 1, 2, \ldots$.

104. Explain the relationships between the moment problems (179.1) and (179.2), (179.3) and (179.4), and (179.5) and (179.6).

105. Let $\{t_k\}_{k=0}^{\infty}$ be a sequence of reals. Based on the proof of the assertion (a) of Theorem 181.1, derive a necessary and sufficient condition for the sequence to be of the form
where $\sigma$ is a positive measure on $[-\pi, \pi]$.

106. Let $f : [0, \infty) \to \mathcal{R}$ be a continuous function. Based on the proof of the assertion (b) of Theorem 181.1, show that the function can be represented as
where $\sigma$ is a positive measure on $[0, \infty)$, iff for arbitrary real numbers $x_0, x_1, \ldots, x_n, r_0, r_1, \ldots, r_n$ it satisfies the condition
107. It is known that for any algebraic polynomial $p$, the condition $p(x) \ge 0$ on $[0, 1]$ implies that
where $t$, $u$, $v$, and $w$ are algebraic polynomials with real coefficients. Taking this fact together with Theorem 176.1 (condition (d')), prove that given a finite sequence of reals $\{\mu_i\}_{i=0}^{n}$ there exists a positive measure $\sigma$ on $[0, 1]$ such that
iff for arbitrary reals $x_1, x_2, \ldots, x_{\lceil n/2 \rceil}$ the sequence satisfies the following condition.
108. Any algebraic polynomial $p(x)$ that is nonnegative on $[0, \infty)$ can be written as $p(x) = [u(x)]^2 + x\,[v(x)]^2$, where $u$ and $v$ are algebraic polynomials with real coefficients. Using this fact and Corollary 178.1, prove the statements (e) and (f) of Theorem 181.1.

109. Based on the discussion on page 180, indicate a likeness between the discrete Stieltjes moment problem and the problem of the existence of a positive measure $\sigma$ on $[0, a)$ such that
Here, $a$ is a fixed positive number and $s : [0, a) \to \mathcal{R}$ is a given function that vanishes at $0$. Then show that such a measure exists iff the inequality
is satisfied for all real numbers $x_0, x_1, \ldots, x_n, r_0, r_1, \ldots, r_n$ such that $|x_j - x_k| < a$.
110. Any algebraic polynomial $p(x)$ that is nonnegative on the real axis can be represented as $p(x) = [u(x)]^2 + [v(x)]^2$, where $u$ and $v$ are algebraic polynomials with real coefficients. Use this fact and Corollary 178.1 to derive the statement (g) of Theorem 181.1.

111. The representation of nonnegative polynomials in one variable quoted in the previous exercise does not carry over to the multivariate case. Show this by proving that the polynomial $p(x, y) = x^2 y^2 (x^2 + y^2 - 1) + 1$ cannot be written as a finite sum of squares of real-valued polynomials, and that $\min_{x, y \in \mathcal{R}} p(x, y) = 26/27$.

112. Assume that $\sigma$ is a solution of the discrete Hamburger moment problem (180.1) and that for some numbers $h_{-2}, h_{-1}$ the determinants
are nonnegative for all $n \ge 0$. Prove that the measure $\sigma_1$,
is also a solution to this moment problem.

113. Use the Helly theorems (see page 174) to show that if the assertion (h) of Theorem 181.1 is valid for all $a \in (0, \infty)$, then it is also valid for $a = \infty$.

114. Given a positive measure $\sigma$ on $\mathcal{R}$ and $0 < a \le \infty$, let a function $h : [0, a) \to \mathcal{R}$ be defined by (180.2). Show that
if $x_0, x_1, \ldots, x_n \in [0, a/2)$ and $p_0, p_1, \ldots, p_n \in \mathcal{R}$. Thus, the assertion (h) gives a necessary condition for the solvability of the continuous Hamburger moment problem.
115. Given a positive (and finite) number $a$, let $h : [0, a) \to \mathcal{R}$ be a continuous function satisfying the condition in the assertion (h) and such that $h(0) = 0$. For $m = 1, 2, \ldots$ consider the set
and the linear space $L_m$ spanned by the functions
for $x \in R_m$. Show that there exists a linear functional $\sigma_m : L_m \to \mathcal{R}$ such that
and that $\sigma_m(p) \ge 0$ if
116. Use Corollary 178.1 to show that for each $m$ the functional $\sigma_m$ in the previous exercise is of the form $\sigma_m(p) = \int_{-\infty}^{B_m} p(t)\, d\sigma_m(t)$, where $\sigma_m$ is a positive measure on $(-\infty, B_m]$ and $p \in L_m$. Consequently,
117. Use the Helly theorems and the result in the previous exercise to show that the assertion (h) of Theorem 181.1 gives a sufficient condition for the solvability of the continuous Hamburger moment problem.
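The value $26/27$ in Exercise 111 is easy to corroborate numerically (a sketch; the search grid and its range are our own choices): the polynomial attains $26/27$ at $x^2 = y^2 = 1/3$, and a coarse global search over a box containing the minimizers never finds a smaller value.

```python
import numpy as np

def p(x, y):
    return x**2 * y**2 * (x**2 + y**2 - 1) + 1

# Value at the critical point x^2 = y^2 = 1/3:
x0 = 1 / np.sqrt(3)
assert np.isclose(p(x0, x0), 26 / 27)

# Grid search over a box containing the minimizers (p -> +inf far away):
g = np.linspace(-2, 2, 801)
X, Y = np.meshgrid(g, g)
assert p(X, Y).min() >= 26 / 27 - 1e-9
print("minimum candidate:", p(X, Y).min())
```

Of course this does not prove that $p$ is not a sum of squares; that part of the exercise requires an algebraic argument.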
5.4
Density and determinateness
Let $\sigma$ be a positive measure on $\mathcal{R}^n$ and let $\{p_j\}_{j \in I}$ be a family of functions in $L_1(\mathcal{R}^n, d\sigma)$. We shall now show a relation between the linear density of the functions $p_j$ in $L_1(\mathcal{R}^n, d\sigma)$ and a certain property of $\sigma$. To this end, we set $\mu_j = \int_{\mathcal{R}^n} p_j(x)\, d\sigma(x)$ and we consider the moment problem of finding positive measures $\mu$ different from $\sigma$ and such that
We shall say that $\sigma$ is V-extremal with respect to the family $\{p_j\}_{j \in I}$ if, given two solutions $\sigma_1$ and $\sigma_2$ of the moment problem above, the measure $\sigma$ cannot be written as $\sigma = \alpha \sigma_1 + (1 - \alpha)\sigma_2$ with $\alpha \in (0, 1)$. We are now ready to prove the following result.

Theorem 190.1 The functions $p_j$ are linearly dense in $L_1(\mathcal{R}^n, d\sigma)$, i.e.,
iff the measure $\sigma$ is V-extremal.

Proof Let us assume that $\mathrm{span}\{p_j\}_{j \in I} \ne L_1(\mathcal{R}^n, d\sigma)$. This is equivalent to the existence of a nontrivial continuous linear functional $L$ on $L_1(\mathcal{R}^n, d\sigma)$ such that $L(p_j) = 0$ for all $j \in I$. By the Riesz–Steinhaus theorem this functional must be of the form $L(f) = \int_{\mathcal{R}^n} f(x)\rho(x)\, d\sigma(x)$, where $\rho \in L_\infty(\mathcal{R}^n, d\sigma)$. Without loss of generality we assume that $0 < \|\rho\|_\infty \le 1$. Let us now note that the measures $\sigma_1$ and $\sigma_2$ defined by
are solutions to the moment problem and satisfy $\sigma = \frac{1}{2}\sigma_1 + \frac{1}{2}\sigma_2$. Hence, the measure $\sigma$ is not V-extremal. In order to show the opposite implication we assume that the measure $\sigma$ is not V-extremal, i.e.,
where $\alpha \in (0, 1)$ and $\sigma_1, \sigma_2$ are solutions of the moment problem above. Then the functional

vanishes on $\mathrm{span}\{p_j\}_{j \in I}$. Moreover, the inequalities

show that it is continuous. Thus,
This completes the proof. •

The characterization of measures $\sigma$ such that the set $\{p_j\}_{j \in I}$ is linearly dense in the space $L_p(\mathcal{R}^n, d\sigma)$, where $p > 1$, is an open problem.
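The quadrature construction of Theorem 191.1 below replaces the functional $L$ on $\Pi_{2m}$ by a discrete positive measure. Its best-known special case, Gauss–Legendre quadrature for $L(p) = \int_{-1}^{1} p(x)\,dx$, can be checked directly (a sketch; the choice $m = 5$ is arbitrary):

```python
import numpy as np

m = 5
nodes, weights = np.polynomial.legendre.leggauss(m)  # m-point Gauss rule
assert np.all(weights > 0)                           # a positive discrete measure

for k in range(2 * m):                               # exact through degree 2m - 1
    exact = 2.0 / (k + 1) if k % 2 == 0 else 0.0     # int_{-1}^{1} x^k dx
    assert abs(float(np.sum(weights * nodes**k)) - exact) < 1e-12
print("moments 0 through 2m-1 reproduced")
```

The positivity of the weights, proved in Theorem 191.1(a) via the squared Lagrange polynomials, is what makes the quadrature itself a solution of a truncated moment problem.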
See the Annotations for references to some partial results. In the one-dimensional case, when $p = 2$ and the $p_j$ are monomials, $p_j(x) = x^j$, $x \in \mathcal{R}$, the characterization is known and is due to M. Riesz. We shall now take time to derive the Riesz result, since it establishes important ties of the density problem with quasi-orthogonal polynomials and with the discrete Hamburger moment problem. We begin with some definitions. Let $L : \Pi \to \mathcal{R}$ be a linear functional. A distribution function $f : \mathcal{R} \to \mathcal{R}$ (see p. 174) is called a representation of $L$ if
If all representations of $L$ are substantially equal to $f$, we shall say that $f$ is determinate. The problem of finding $f$ is actually the discrete Hamburger moment problem (180.1) with $h_k = L(x^k)$. If the representation $f$ is determinate, that moment problem and the corresponding measure $\sigma$ in (180.1) are also called determinate; otherwise, they are called indeterminate. In order to guarantee the existence of $f$ we assume in what follows that the functional $L$ satisfies the condition
We denote by $\{P_k\}_{k=0}^{\infty}$ the sequence of orthonormal polynomials with respect to $L$,
We shall now show that a representation of the functional $L$ can be obtained based on the following properties of quasi-orthogonal polynomials.

Theorem 191.1 (a) Let $q$ be a quasi-orthogonal polynomial of degree $m + 1$ with respect to $L$ and let $x_0 < x_1 < \ldots < x_m$ be the zeros of $q$. Then there exist positive numbers $C_0, C_1, \ldots, C_m$ such that the following quadrature formula

holds for all $p \in \Pi_{2m}$.
(This means that $L(p) \ge 0$ if $p \ge 0$ on $\mathcal{R}$; see Exercise 110 on page 188.)
(b) Given a real number $x$ such that $P_m(x) \ne 0$, let $q$ be a quasi-orthogonal polynomial of degree $m + 1$ such that $q(x) = 0$. Then the quantities
and
are both equal to the coefficient $C_k$ of the quadrature formula above with $x_k = x$.

(c) If $f$ is any representation of the functional $L$, then under the assumptions of (a) we have
for $k = 0, 1, \ldots, m$.

Proof (a) Since the zeros $x_j$ are distinct, the Vandermonde matrix $[x_j^k]_{j,k=0}^{m}$ is nonsingular. Thus, we may define numbers $C_j$, $0 \le j \le m$, by the linear equations

We now let

This definition guarantees that the quadrature formula $L(p) = I(p)$ holds for all $p \in \Pi_m$. In order to show that it is actually valid for all $p \in \Pi_{2m}$, we pick any polynomial $p \in \Pi_{2m}$ and note that it can be written as $p = sq + r$, where $q$ is the quasi-orthogonal polynomial, $s \in \Pi_{m-1}$ and $r \in \Pi_m$. Since $L(sq) = I(sq) = 0$ we get $L(p) = I(p)$. It remains to show that the numbers $C_k$ are positive. To this end, let us consider the Lagrange interpolatory polynomial $w_k \in \Pi_m$ such that
where $0 \le k \le m$ is a given integer and $\delta_{k,j}$ stands for the Kronecker delta. Since $w_k^2 \in \Pi_{2m}$ we have
This completes the proof of the assertion (a).

(b) Since $P_m(x) \ne 0$, the polynomial $q$ exists (see page 29) and consequently the quadrature in (a) is well defined. Assuming that $x = x_k$ we have $L(p^2) = I(p^2) \ge C_k$ for any polynomial $p \in \Pi_m$ such that $p(x) = 1$. Moreover, this estimate is sharp for the polynomial $p = w_k$ defined in the proof of (a). Consequently, we get
Based on the observation that
we complete the proof of (b) by showing that the explicit value of the minimum $M_m(x)$ is $\Lambda_m(x)$. The detailed verification is left to the reader as an exercise.

(c) Given an integer $0 \le k \le m$, let us consider the Hermite interpolatory polynomials $g, h \in \Pi_{2(m-1)}$ defined (uniquely) by the conditions
and
These conditions imply that $g \le 1 \le h$ on $(-\infty, x_k]$ and $g \le 0 \le h$ on $[x_k, \infty)$; see Figure 194.1. Moreover, since $f$ is a representation of $L$, we have
We now complete the proof by noting that
Figure 194.1: Graphs of the polynomials $g$ and $h$

Remark If the quasi-orthogonal polynomial $q$ in the assertion (a) of Theorem 191.1 coincides with $P_{m+1}$ up to a constant factor, then the corresponding quadrature formula holds for all polynomials $p \in \Pi_{2m+1}$ and coincides with the classical Gauss quadrature formula. The result (c) is known as the Chebyshev inequalities. •

Given a distribution function $f$ and a point $x \in \mathcal{R}$ we define the jump of $f$ at $x$ by the equation
Corresponding to each m, the quadrature formula in Theorem 191.1 can be written in the form
where $f_m$ is a distribution function whose nonzero jumps are located at the points $x_j$, with $\mathrm{jump}_{f_m}(x_j) = C_j$ for $j = 0, 1, \ldots, m$. This function can be explicitly defined by the equation
Taking this observation together with the Helly theorems (see page 174) we may obtain a representation of the functional $L$. Indeed, the sequence $\{f_m\}_{m=1}^{\infty}$ is uniformly bounded by $L(1)$ and, according to the first theorem of Helly, it contains a subsequence $\{f_{m_k}\}_{k=1}^{\infty}$ converging to a distribution function $f$. Moreover, if $p \in \Pi$ and the $I_j$ are finite intervals such that $I_j \uparrow \mathcal{R}$, then for sufficiently large $j$ we have $\int_{I_j} p(x)\, df_{m_k}(x) = \int_{\mathcal{R}} p(x)\, df_{m_k}(x)$. Thus, by the second theorem of Helly we get
and consequently $f$ is a representation of $L$. We are now ready to prove the following theorem.

Theorem 195.1 Given a number $x \in \mathcal{R}$ let
(a) If $\Lambda(x) > 0$, then there exists a representation $f$ of the functional $L$ such that $\mathrm{jump}_f(x) = \Lambda(x)$. Moreover, no other representation of $L$ may have a larger jump at $x$.

(b) If $f$ is a determinate representation of the functional $L$ and $x$ is a continuity point of $f$, then $\Lambda(x) = 0$. Conversely, if $\Lambda(x) = 0$ for all $x \in \mathcal{R}$, then $L$ has a continuous and determinate representation $f$.

Proof Since the zeros of orthogonal polynomials interlace, $P_m(x) \ne 0$ for infinitely many $m$. Consequently, there are infinitely many quasi-orthogonal polynomials $q_m$ of degree $m$ such that $q_m(x) = 0$. Hence, according to the assertions (a) and (b) of Theorem 191.1, the distribution functions $f_m$ above can be chosen in such a way that $\mathrm{jump}_{f_m}(x) = \Lambda_m(x)$. Thus, repeating the Helly argument, we may get a representation $f$ of $L$ such that
The fact that no other representation has a larger jump at $x$ follows easily from the Chebyshev inequalities. This proves the statement (a). The first part of (b) is an immediate consequence of (a). It remains to show that $L$ has a continuous and determinate representation $f$ if $\Lambda(x) = 0$ for all $x \in \mathcal{R}$. To this end, we pick an
$x \in \mathcal{R}$ and consider the quadrature formula corresponding to the quasi-orthogonal polynomials $q_m$. If $f$ and $g$ are representations of $L$ and the $C_k^m$ are the coefficients of this quadrature, then
and
Here $k$ is chosen in such a way that $C_k^m = \Lambda_m(x)$. Since $\Lambda(x) = 0$, by letting $m \to \infty$ we get
Thus, noting that $x$ is arbitrary, we arrive at the conclusion that the distribution functions $f$ and $g$ are substantially equal and continuous. This completes the proof. •

In addition to the statement (b) of Theorem 195.1 we quote the following useful result.

Theorem 196.1 If $\Lambda(x_0) = 0$ for some $x_0 \in \mathcal{R}$, then the functional $L$ has a determinate representation.

Proof We have already shown the existence of a representation $f$ of $L$. Thus, only the uniqueness of $f$ requires a proof. We assume that $F$ is also a representation of $L$. Let $q_m$ be a quasi-orthogonal polynomial of degree $m + 1$ such that $q_m(x_0) = 0$. We recall that for infinitely many $m \in \mathcal{N}$ the polynomial $q_m$ exists and has $m + 1$ real roots. We assume that $x_0 = \xi_{k,m}$. Given a nonnegative integer $n$ let us consider the function
and the Hermite interpolatory polynomials $G, H \in \Pi_{2(m-1)}$ defined (uniquely) by the conditions
and
These conditions imply that $G \le \phi_n \le H$ on $(-\infty, z]$ and $G \le 0 \le H$ on $[z, \infty)$; see Figure 197.1. Thus, we have
Figure 197.1: Graphs of the polynomials G and H
and
By using the corresponding quadrature formulas for $L(G)$ and $L(H)$, we get $L(H) - L(G) = \Lambda_m(x_0)$. Since $\Lambda_m(x_0) \to \Lambda(x_0) = 0$ as $m \to \infty$, we arrive at the conclusion that the functions $f$ and $F$ are continuous at $x_0$ and
Now, we use the substitution $y = e^{x - x_0}$ to obtain the equations
Here $\tilde f(y) = f(x_0 + \ln(y))$ and $\tilde F(y) = F(x_0 + \ln(y))$ for all $y \in (0, 1)$. Since monomials are linearly dense in the space $C(0, 1)$ we get
which means that the functions $f$ and $F$ are substantially equal on the interval $(-\infty, x_0)$. It remains to show that they are also substantially equal on the interval $(x_0, \infty)$. To this end, we consider the symmetry with respect to the vertical axis $x = x_0$ and apply similar arguments to the symmetric images of the functions $\phi_m$ and the polynomials $G$, $H$. This completes the proof. •

Let $\mathcal{P}$ denote the class of all linear functionals $L : \Pi \to \mathcal{R}$ satisfying the condition (191.1). Taking the last two theorems together with the assertion (b) of Theorem 191.1 we get the following result.

Theorem 198.1 Let $L$ and $L_1$ be functionals in $\mathcal{P}$. Let $u$ be a determinate representation of $L$ and let $v$ be a representation of $L_1$ such that
where $M$ is a positive constant. Then the representation $v$ is also determinate.

Proof Indeed, by Theorem 195.1 we have
Here the $P_j$ are the orthonormal polynomials corresponding to the functional $L$ and $x$ is an arbitrary continuity point of the representation $u$. Given $m \in \mathcal{N}$ we now consider the quantity
where the $Q_j$ are the orthonormal polynomials corresponding to the functional $L_1$. Since the polynomial $K_m = \Lambda_m(x) \sum_{j=0}^{m} P_j(x) P_j$ is
of degree $m$ and $K_m(x) = 1$, by the assertion (b) of Theorem 191.1 we get

On the other hand, we have
Thus, $0 \le V_m(x) \le \Lambda_m(x)$ for all $m \in \mathcal{N}$, and consequently
Now, by Theorem 196.1 we immediately get the determinateness of $v$. •

We are now ready to derive the main result of this section.

Theorem 199.1 Given a functional $L \in \mathcal{P}$ let $f$ be its fixed representation and let $\Phi$ be a distribution function such that
Let U(f) ⊂ L₂(R, f) be the closure (taken in the space L₂(R, f)) of the set of all polynomials with complex coefficients. Then: (a) If the distribution function Φ is determinate, a is a continuity point of f, and b ≠ 0, then the function
can be approximated with a polynomial to arbitrary accuracy, i.e., r ∈ U(f). (b) Polynomials are dense in L₂(R, f), i.e., U(f) = L₂(R, f), iff the distribution function Φ is determinate. Proof (a) Let us consider the distribution function
Since b ≠ 0, there exists a positive constant M such that 1/((t − a)² + b²) ≤ M for all t ∈ R. Moreover, the linear functional
satisfies the condition (191.1). Consequently, since the distribution Φ is determinate, by Theorem 198.1 we immediately get the determinateness of g. Let Q_k ∈ Π_k, k = 0, 1, 2, …, be the orthonormal polynomials corresponding to the functional L₁. For m = 0, 1, 2, … we set
We now note that a, as a continuity point of f, is also a continuity point of g. Thus, by the assertion (b) of Theorem 195.1, we get
By the Christoffel-Darboux formula (see page 29) it follows that k_m is a quasi-orthogonal polynomial of degree m. Hence, its zeros ρ₁, ρ₂, …, ρ_m are real and simple. Denoting by c_m the leading coefficient of k_m, we obtain
We are now in a position to show that r is the limit function of the polynomials p_m defined by the equations k_m(a + ib)⁻¹ k_m(z) = 1 − (z − a − ib) p_m(z),
for all z ∈ C.
Indeed, when m → ∞ we have
This completes the proof of the statement (a). (b) If polynomials are dense in L₂(R, f), then the distribution f is determinate (see Exercise 95 on page 175). Consequently, by Theorem 198.1, the distribution Φ is also determinate. We now assume that Φ is determinate. Let u be any function in L₂(R, f) and let a be a continuity point of f. In order to show the density of polynomials in L₂(R, f), it suffices to prove that u can be
approximated to arbitrary accuracy by a rational function R of the form
where A, B, b_k, c_k ∈ R and b_k ≠ 0. This follows readily from the proven part of the theorem. Since the set of differentiable functions in C₀(R) is linearly dense in L₂(R, f), we may also assume that the function u is real-valued, differentiable, and lim_{x→∞} u(x) = lim_{x→−∞} u(x) = 0. For x ∈ R we define
and
Thus, the functions u₁, u₂ are symmetric with respect to a and belong to C₀(R). Moreover, we have u(x) = u₁(x) + (x − a)u₂(x). We now use the substitution y(x) = 1/((x − a)² + b²), where b ≠ 0. Since the function y maps each of the intervals [a, ∞] and [−∞, a] onto the interval [0, b⁻²], we have
where v₁ and v₂ are functions in C[0, b⁻²] such that v₁(0) = v₂(0) = 0. According to the Weierstrass theorem, given a positive number ε there are polynomials p₁ and p₂ satisfying the condition
Consequently, setting p(x) = p₁(y(x)) + (x − a)p₂(y(x)), we get
The function p is clearly rational. More specifically, we have
where n is a positive integer and P is a polynomial of degree at most 2n + 1. We may now select distinct positive numbers b₁, b₂, …, b_n such that
for all x ∈ R. (The verification of this fact is left to the reader.) We finally note that the rational function
can be rewritten in the desired form and satisfies the condition
Since ε can be arbitrarily small, we get u ∈ U(f). This completes the proof. •
If f is a determinate distribution, then the corresponding distribution Φ is also determinate. The converse implication is not true in general. When Φ is determinate and f is indeterminate, we say that the distribution f is extremal. We close this section by quoting, without proof, some additional facts on determinate and indeterminate distributions. Theorem 202.1 Let f be a representation of a functional L ∈ P. (a) Let {P_k}_{k=0}^∞ be the sequence of orthonormal polynomials corresponding to the functional L and let
Then a necessary and sufficient condition that f be determinate is that at least one of the series
shall diverge. (b) Let h_k = ∫_R t^k df(t) for k = 0, 1, 2, …. Then a sufficient condition that f be determinate is that the series ∑_{k=0}^∞ h_{2k}^{−1/(2k)} shall diverge. (c) Let f be indeterminate and let H denote the set of all functions α (including the case α ≡ ±∞) analytic in the half plane Im(z) > 0 and satisfying the condition Im(α(z)) ≤ 0 when Im(z) > 0. Then the function
is of the form
where α(f; ·) ∈ H and A, B, C, and D are entire functions of order 1, depending only on the functional L, such that A(z)D(z) − B(z)C(z) = 1. When α(f; z) ≡ α ∈ R ∪ {±∞}, the numerator and the denominator of the function g each have infinitely many zeros that are all real, simple, and interlacing. (d) The distribution f is extremal iff the function α(f; ·) reduces to a constant α ∈ R ∪ {±∞}. Furthermore, any extremal distribution is discrete, i.e., it coincides with a step function. The jumps of an extremal distribution are located at the zeros of the denominator of the corresponding function g. (e) Let R(L) be the set of all representations of a functional L ∈ P.³ If R(L) is not a singleton, then the set R(L) contains infinitely many absolutely continuous distributions. Moreover, the mapping R(L) ∋ f ↦ α(f; ·) ∈ H is a one-to-one correspondence between the sets R(L) and H. 5.4.1
Exercises
118. Assume that a distribution f is of bounded support, i.e., f is constant outside a finite interval. Show that f is determinate. 119. Show that the Gaussian distribution
is determinate. Hint: Without loss of generality assume that m = 0 and σ = 1/2. Then consider the series ∑_{k=0}^∞ h_k(0)², where h_k are the normalized Hermite polynomials, h_k = H_k/‖H_k‖; see page 33. 120. Substituting ln(t) = x + k/2, show that
³ We maintain the assumption that two substantially equal distributions are considered to be identical.
Then deduce that the distribution
is indeterminate and that polynomials are not dense in the spaces L₁(R, f) and L₂(R, f). 121. Based on Exercise 101, find the inverse of the function
defined in Theorem 202.1. 122. Assuming that P_j are orthonormal polynomials corresponding to a functional L ∈ P, define
Prove that
Hint: First show that L(k_n(x, ·)p) = p(x) for all polynomials p of degree at most n and for all x ∈ R. 123. Based on Exercise 122 and the definition of the polynomials Q_k given in Theorem 202.1, show that
124. Assuming that L ∈ P and the series
both converge, show that the corresponding discrete Hamburger moment problem is indeterminate. Hint: By Theorem 195.1 the functional L has a representation f such that jump f(0) = Λ(0) > 0. Define h₋₂ = ∑_{k=1}^∞ Q_k(0)⁻¹, h₋₁ = 0 and combine the results in Exercises 112 and 123 to show that L also has a representation f₁ that is continuous at 0. 125. Show that if a functional L ∈ P admits an indeterminate representation, then the set R(L) contains a discrete and nonextremal distribution.
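The divergence criterion in part (b) of Theorem 202.1 (Carleman's condition) can be tested numerically for the Gaussian distribution of Exercise 119. The sketch below is our own illustration, not part of the text: it assumes the standard normal even moments h_{2k} = (2k)!/(2^k k!) and checks that the partial sums of ∑ h_{2k}^{−1/(2k)} keep growing, consistent with divergence and hence determinateness.

```python
import math

def log_even_moment(k):
    # log of h_{2k} for the standard normal: h_{2k} = (2k)!/(2^k k!) = (2k-1)!!
    return math.lgamma(2 * k + 1) - k * math.log(2.0) - math.lgamma(k + 1)

def carleman_partial_sum(n_terms):
    # partial sum of the Carleman series  sum_{k>=1} h_{2k}^(-1/(2k))
    return sum(math.exp(-log_even_moment(k) / (2 * k)) for k in range(1, n_terms + 1))

# Stirling's formula gives h_{2k}^(1/(2k)) ~ sqrt(2k/e), so the k-th term decays
# only like k^(-1/2) and the series diverges: partial sums grow like sqrt(N).
print(carleman_partial_sum(100), carleman_partial_sum(1000))
```

By contrast, a lognormal-type distribution (Exercise 120) has even moments growing so fast that this series converges; the criterion is then inconclusive, and that distribution is in fact indeterminate.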
5.5
A Sinc moment problem
In this section we consider a moment problem of the form
where N (resp., Z) denotes the set of nonnegative integers (resp., the set of all integers) and where Γ is defined on page 118. Theorem 205.1 Let w ∈ M_{α,β}(D), and assume that
where c is a constant that is independent of N, and ‖·‖ is the sup norm taken over Γ. Proof In this proof, for simplicity, we shall denote positive constants independent of N by c_j, 1 ≤ j ≤ 5. Since w ∈ M_{α,β}(D), it follows from Theorem 119.1 that dw(t) = w′(t) dt, with w′/φ′ ∈ L_{α,β}(D). Now, approximating the indefinite integral of w′ along the lines of Theorem 137.1, we get
Now, setting μ_j = ∫_Γ w′(t) S(j, h) ∘ φ(t) dt, we have, from Exercise 71 on page 134, that |v_j| = |μ_j − h w′(z_j)/φ′(z_j)| ≤ c₂ ε_N/N. Hence, it follows that ‖v‖ ≤ c₃ ε_N/N, where v = [v₋N, …, v_N]^T. Moreover, ‖I^(−1)‖ = O(N), and thus ‖I^(−1)v‖ ≤ c₄ ε_N. Hence, by Exercise 75 on page 139, it follows that ‖h I^(−1)v‖ ≤ c₅ ε_N log N. Finally,
as claimed. • Remark Here we may note that, from a stability standpoint, we have ‖I^(−1)‖ = O(N), whereas ‖v‖₂ = O(1), resulting from the identity ∑_{k∈Z} S(k, h)²(x) = 1 for all x ∈ R. Thus this procedure is only mildly ill-posed. Since M_{α,β}(D) is a Banach space, it follows from the above inequalities that the sequence of approximations is a Cauchy sequence, which converges to a unique w ∈ M_{α,β}. Well-posed methods of approximation of other moment problems of the form ∫_Γ t_k(x) dw(x) = μ_k can thus be obtained via the constructive approximation of the functions S(k, h) ∘ φ with a linear combination of the t_k, whatever these functions may be. •
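The identity ∑_{k∈Z} S(k, h)²(x) = 1, invoked in the remark to explain ‖v‖₂ = O(1), is easy to check numerically. The sketch below is an illustration only (the step size h and truncation bound K are arbitrary choices); truncating to |k| ≤ K leaves a tail of order 1/K, so the computed sum is only approximately 1.

```python
import math

def sinc_basis(k, h, x):
    # S(k, h)(x) = sin(pi (x/h - k)) / (pi (x/h - k)), equal to 1 at x = k h
    t = math.pi * (x / h - k)
    return 1.0 if t == 0.0 else math.sin(t) / t

def sum_of_squares(x, h=0.5, K=5000):
    # truncation of the identity  sum_{k in Z} S(k, h)^2(x) = 1
    return sum(sinc_basis(k, h, x) ** 2 for k in range(-K, K + 1))

for x in (0.0, 0.3, 1.7):
    print(x, sum_of_squares(x))  # each sum is close to 1
```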
5.5.1
Exercises
126. Show that ‖v‖₂ = O(1). 127. Prove that the sequence of approximations converges to a unique w ∈ M_{α,β}. 128. Show that, under the assumptions of Theorem 205.1, the procedure of construction of w requires O(N log(N)) arithmetic operations to achieve a result accurate to within ε = O(ε_N).
5.6
Multivariate orthogonal polynomials
Orthogonal polynomials in one real variable x satisfy the following three-term recurrence relation
where p₋₁ = 0, p_k is of the kth degree, and c_{k+1} a_k/a_{k+1} > 0 for all k; see Theorem 28.1. The converse is also true, and it is formulated in the following statement.
If polynomials {p_k(x)}_{k=0}^∞ are defined by the recurrence above, then there exists a positive measure α on the real axis such that
This statement is known in the literature as the Favard theorem. In this section we combine some algebraic techniques with moment problem methods to characterize orthogonal polynomials in several variables. We begin the discussion by introducing some notation. In the sequel, we shall denote by Π(n) the linear space of all polynomials with complex coefficients in n real variables. Given p ∈ Π(n), we shall denote by deg p the total degree of the polynomial p. We recall that any basis of Π(n) contains exactly r_k^n = (n+k−1 choose k) polynomials of degree k. By Π_k(n) we shall mean the subspace of Π(n) consisting of polynomials of degree at most k. Let
where T stands for the transpose and p₁, p₂, …, p_r are polynomials in the variables x₁, x₂, …, x_n. Then xP is to mean the polynomial vector
For example, if n = 2 and P = [x₁, x₁x₂]^T, then
Given a basis P of Π(n), we may arrange the polynomials in P into vectors P₀, P₁, P₂, …, such that for a fixed k the vector P_k consists of exactly r_k^n polynomials of degree k. We say that a linear functional L : Π(n) → C defines a quasi-inner product in Π(n) if there exists a basis Q of Π(n) such that
for all polynomials p and q in Q. Given such a functional L and given a matrix A = [p_{j,k}] whose entries p_{j,k} belong to Π(n), we denote by L(A) the matrix of numbers L(p_{j,k}). The last equation says that L(Q_j Q_k^T) = 0 when j ≠ k and that L(Q_j Q_j^T) is a diagonal nonsingular matrix for j ≥ 0.
For an arbitrary matrix M whose number of columns is a multiple of n, we define
where M = [M₁ | M₂ | … | M_n] and all blocks M_k are of identical dimensions. We are now ready to formulate a result that relates orthogonality in Π(n) to suitable implicit recurrence relations. Theorem 208.1 Let P be an arbitrary basis of Π(n). There exists a linear functional L which defines a quasi-inner product in Π(n) and satisfies
iff for k = 0, 1, 2, … there exist unique matrices A_k, B_k, C_k such that
and for a fixed sequence of matrices D₀, D₁, … satisfying D_k A_k = I, where I is the unit matrix, the recursion
produces nonsingular matrices I_k. Moreover, we have I_k = L(P_k P_k^T), and the factorizations I_k = T_k M_k T_k^T, where M_k are diagonal matrices, induce the vectors Q_k = T_k⁻¹ P_k whose polynomial coefficients are orthogonal with respect to L, i.e.,
Proof We begin the proof by deducing orthogonality from the relations (a), (b), and (c). To this end, let us define a linear functional L on Π(n) by the equations
Since P is a basis of Π(n), the functional L is well-defined. Let us assume that L(P_i P_j^T) = 0 for every i, j such that 0 ≤ i ≤ k and j > i. Here, k is a fixed nonnegative integer. According to (a) and (b), A_k is an n r_k^n × r_{k+1}^n matrix of full rank. Thus, A_k has a left inverse D_k, D_k A_k = I. Taking this observation together with (b) and (c), for l ≥ k + 1 we get
which by induction proves that the matrices I_j are symmetric, I_j = L(P_j P_j^T), and that
The symmetry and nonsingularity of I_k imply that I_k = T_k M_k T_k^T, where M_k and T_k are some matrices such that M_k is diagonal and det T_k ≠ 0. Let us define a basis Q of Π(n) by
Then for each k we have
This shows that L defines a quasi-inner product in Π(n). We shall now deduce the recurrence relations from orthogonality. Since P is a basis of Π(n), for each k ≥ 0 the polynomial vector xP_k can be written in the form
where H_{k,j} are suitable n r_k^n × r_j^n matrices. These matrices are, of course, unique. We shall show that rank H_{k,k+1} = r_{k+1}^n. To this end, we assume to the contrary that every r_{k+1}^n rows of H_{k,k+1} are
linearly dependent. Thus, any set Y of r_{k+1}^n polynomial coefficients of the vector xP_k is linearly dependent with respect to the space Π_k(n), i.e., span{Y} ∩ Π_k(n) ≠ {0}. On the other hand, it is easy to verify that
where the right-hand side denotes the linear space spanned by all elements of Π_k(n) and all polynomial coefficients of xP_k. Thus, we obtain the contradiction. Let us now assume that L defines a quasi-inner product in Π(n) and satisfies the conditions
Without loss of generality we require that L(P₀ P₀^T) = [1]. Thus, L(P_j P^T) = 0 if the vector P consists of polynomials in Π_{j−1}(n). For some basis Q of Π(n) we have L(Q_j Q_k^T) = 0 when j ≠ k and L(Q_k Q_k^T) = M_k, where M_k is a diagonal nonsingular matrix (k ≥ 0). For each k we have P_k = ∑_{j=0}^k T_{k,j} Q_j, where T_{k,j} are suitable matrices such that det T_{k,k} ≠ 0. Since L(P_k Q_j^T) = 0 for j ≤ k − 1, we arrive at the conclusion that T_{k,j} = 0 for j ≤ k − 1. Hence, P_k = T_{k,k} Q_k and the matrices L(P_k P_k^T) = T_{k,k} M_k T_{k,k}^T are symmetric and nonsingular. We are now ready to show that H_{k,j} = 0 for k ≥ 0 and j ≤ k − 2. Indeed, for such k and j we have
and
Hence H_{k,j} = 0, as claimed. Upon defining A_k = H_{k,k+1}, B_k = H_{k,k}, and C_k = H_{k,k−1}, we obtain (a) and (b). In order to get (c) with I_k = L(P_k P_k^T), we may repeat the induction argument from the first part of the proof. The theorem is proven. • Remarks
1. For n > 1 the left inverse matrix D_k of A_k is not unique. In order to see this better we consider the singular value decomposition of A_k:
Here, H stands for the conjugate transpose, U_k, V_k are orthogonal matrices, and Σ_k is a diagonal positive definite matrix. It is easy to verify that the factorization
with an arbitrary r_{k+1}^n × (n r_k^n − r_{k+1}^n) matrix X_k, describes all possible choices of D_k. Therefore, (b) leads to infinitely many explicit recurrence formulas of the form
Here, E_k = −D_k B_k, F_k = −D_k C_k, and k = 0, 1, …. This shows that in order to construct P_{k+1} from P_k and P_{k−1} it is sufficient to compute r_{k+1}^n((n + 1)r_k^n + r_{k−1}^n) entries of the matrices D_k, E_k, and F_k. Actually, D_k can be chosen in such a way that it possesses exactly r_{k+1}^n nonvanishing columns; see Exercise 130 on page 218. Then it is sufficient to determine only r_{k+1}^n(r_{k+1}^n + r_k^n + r_{k−1}^n) of such entries. 2. Since rank A_k = rank D_k, it follows by induction that I_k is nonsingular iff rank C_{k+1} = r_k^n. Let us also note that actually the matrices I_k are independent of the choice of the matrices D_j. 3. If the basis P consists of real polynomials, then the corresponding matrices T_k and M_k can be chosen to be real. Moreover, the equation (f, g) = L(f ḡ) defines an inner product in Π(n) iff the matrices I_k featuring in (c) are positive definite. 4. When n = 1, the polynomial vectors and matrices in Theorem 208.1 can be replaced with their unique entries, establishing an equivalence between three-term recurrence relations and orthogonality. In particular, the theorem says that polynomials {p_k}_{k=0}^∞ defined through the recursion
where p₋₁ = 0 and c_{k+1} a_k/a_{k+1} = I_{k+1}/I_k ≠ 0 for all k, are orthogonal with respect to a linear functional L. If I_{k+1}/I_k is positive for every k, the functional defines an inner product in Π(1), i.e., L(p p̄) > 0 for all p ∈ Π(1) \ {0}. Since any polynomial in one variable that is nonnegative on the real axis can be represented as a sum of squares of polynomials, we see that L(p) ≥ 0 if p ≥ 0 on R.
Thus, by Corollary 178.1 the functional L takes the form
where μ is a positive measure on R. This immediately gives the Favard theorem. • We shall now consider the problem of integral representations of linear functionals on the space of polynomials in n variables. To this end, we shall take time for technical preparation. Let us consider two bases P = {p_j}_{j=1}^∞ and Q = {q_k}_{k=1}^∞ of the space Π(n). We order the elements of P and Q in such a way that
Each polynomial q_k can be written in the form q_k = ∑_j c_{k,j} p_j, where the coefficients c_{k,j} vanish for sufficiently large j. We shall now prove the following useful lemma. Lemma 212.1 Let j be a positive integer. Let us assume that for some positive measure μ on R^n the basis Q is orthonormal in the space L₂(R^n, μ). Then a necessary and sufficient condition for the existence of a function ρ in L₂(R^n, μ) such that
is the convergence of the series ∑_{k=1}^∞ |c_{k,j}|². Here, δ_{j,k} is the Kronecker delta. Proof We define a linear functional y on Π(n) by the equations
Given a polynomial w = ∑_k a_k q_k, a_k ∈ C, we have
By applying the Schwarz inequality we get
where ‖w‖ is the norm in L₂(R^n, μ). Since this estimate becomes exact when a_k = c̄_{k,j}, we have
This shows that the functional y is continuous with respect to the norm ‖·‖ iff the series ∑_{k=1}^∞ |c_{k,j}|² converges. By the Hahn-Banach theorem, there exists a norm-preserving extension L of y to the space L₂(R^n, μ). We conclude the proof using the Riesz theorem to represent the functional L in the form L(f) = ∫_{R^n} f(x) ρ dμ(x), with ρ ∈ L₂(R^n, μ).
•
We shall now show that, corresponding to the basis P, one may select an orthonormal basis Q of a simple structure such that the convergence condition in Lemma 212.1 is satisfied. Lemma 213.1 There exists a distribution function φ : R → R and real polynomials {q_k}_{k=0}^∞ that are orthonormal with respect to φ, i.e.,
and such that the basis
has the following property. For the r_i^n × r_k^n matrices G_{i,k} defined by the expansion
we have ‖G_{i,k}‖_∞ ≤ 2^(−i+1) ‖G_{0,0}‖_∞. Here, i, k = 0, 1, 2, … and ‖·‖_∞ denotes the infinity matrix norm.⁴ Proof We shall define the polynomials q_k through the recursion
where x ∈ R, q₋₁(x) = 0, q₀(x) = 1, and α_{k+1}, γ_{k+1} are suitably chosen positive numbers, k = 0, 1, …. Let us observe that for the polynomial vectors Q_k induced by the basis
⁴ Given an n × m matrix A = [a_{i,j}], ‖A‖_∞ is defined as max_{i≤n} ∑_{j=1}^m |a_{i,j}|.
we have
Here, the matrices D_k and F_k are chosen in such a way that they have exactly one nonvanishing entry in each row. These nonzero elements are from the sets
and
respectively. Hence, ‖D_k‖_∞ = max{α_i : i = k + 1, k, …, ⌈(k + 1)/n⌉} and ‖F_k‖_∞ = max{γ_i : i = k + 1, k, …, ⌈(k + 1)/n⌉}. Let us now define matrices A_{i,k}, C_{i,k} by the expansions
We note that xQ_k = ∑_i [G_{k,i}] xP_i, where [G_{k,i}] stands for the n-block matrix
Substituting the expansions above into the recurrence relation for the vectors Q_k, we obtain
where k ≥ 0, j = 0, …, k + 1, and G_{s,t} = 0 for s < 0 or t > s. Thus, we get
We now select the coefficients α_k and γ_k in such a way that they satisfy the following relations.
Thus, we have
Hence,
Finally, let us note that, by the Favard theorem, the polynomials are orthonormal with respect to some positive measure on R.
This completes the proof. • Corresponding to a distribution φ on R, we shall denote by φ^(n) the n-dimensional distribution (positive measure) on R^n such that
We are now in a position to demonstrate an integral representation of linear functionals on Π(n). Theorem 215.1 Given a linear functional L : Π(n) → C, there is an indeterminate distribution φ on R and a function ρ ∈ L₂(R^n, φ^(n)) such that
Moreover, the measure φ can be chosen in such a way that either (i) φ is absolutely continuous, and the representation above is satisfied for infinitely many functions ρ in L₂(R^n, φ^(n)), or (ii) φ is discrete, and the representation holds for infinitely many functions ρ in L₂(R^n, φ^(n)), or
(iii) φ is discrete, and the representation holds for a unique function ρ in L₂(R^n, φ^(n)). Proof Let L : Π(n) → C be a linear functional. Without loss of generality we assume that L does not vanish identically. Let P* = {p*_k}_{k=1}^∞ be a fixed basis of the space Π(n) with deg p*_{j+1} ≥ deg p*_j, and let j be the smallest integer such that L(p*_j) ≠ 0. We now define a new basis P = {p_k}_{k=1}^∞ of Π(n) by the equations
Thus, we have
Corresponding to the basis P, let us consider the measure φ and the basis Q induced by Lemma 213.1. Now, from Lemma 212.1 we get the existence of a function ρ ∈ L₂(R^n, φ^(n)) such that
Thus the desired representation follows readily. From the proof of Lemma 213.1 it follows that the measure φ can be chosen as any solution of the discrete Hamburger moment problem associated with orthogonal polynomials q_k that satisfy the three-term recurrence relation
where x ∈ R, q₋₁(x) = 0, q₀(x) = 1, 0 < α_{k+1} ≤ α_k/8, and k = 0, 1, …. This implies that the series ∑_{k=0}^∞ q_k(0)² converges. The function ρ is unique iff polynomials are dense, i.e., iff the closure of Π(n) equals L₂(R^n, φ^(n)). The last condition reduces to the density of polynomials (in one variable) in the space L₂(R, φ); see Exercise 133 on page 219. From Theorem 202.1 it follows that: • The condition (i) is satisfied when φ is chosen to be absolutely continuous. • The condition (ii) holds when φ is chosen to be discrete and nonextremal.
• The condition (iii) is satisfied when φ is chosen to be discrete and extremal. The proof is complete. • In what follows we assume that the algebraic conditions of Theorem 208.1 are satisfied with positive definite matrices I_k, k = 0, 1, …. Then the linear functional L induced by this theorem defines an inner product in Π(n), i.e.,
In Remark 3 on page 211 we indicated that for n = 1 this condition implies that the functional L takes the form
where f ∈ Π(n) and μ is a positive measure on R^n. Unfortunately, for n > 1 the implication is no longer valid (for references see the Annotations at the end of this chapter). A major obstacle is the fact that for n > 1 a nonnegative polynomial in Π(n) may not be representable as a sum of squares of real polynomials. An example of such a polynomial is given in Exercise 111 on page 188. Thus, in order to extend the result in Remark 3, page 211, to dimensions n > 1 we need an additional assumption. Corresponding to the basis P in Theorem 208.1, let us define the sets
We say that the zeros of polynomials in P are not dispersed if for every positive integer k there exists a compact and convex subset S_k of R^n such that each polynomial in P_k has at least one zero in S_k. We are now ready to prove the following theorem. Theorem 217.1 The linear functional L in Theorem 208.1 has an integral representation
with a positive measure μ iff the zeros of polynomials in the basis P are not dispersed. Proof Without loss of generality we assume that the constant polynomial in the basis P equals 1. Thus, the functional L is completely defined on Π(n) by the conditions
When L takes the integral form, we use the Chakalov theorem, which reads as follows. For each k ∈ N there is a positive measure φ_k on R^n such that its support s_k consists of at most r_k^n points and
Since L(P_k) = {0}, we conclude that any polynomial in P_k has at least one zero on the convex hull of s_k. Conversely, if the zeros of polynomials in P are not dispersed, then by Theorem 176.1 (the equivalence (a) ⇔ (e)) it follows that for each k ∈ N there exists a positive measure μ_k such that
Taking this together with the theorems of Helly, we get the existence of a positive measure μ on R^n such that
This gives the desired integral representation and completes the proof. •
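In one variable, the orthonormal polynomials whose existence the Favard theorem guarantees can be produced explicitly once a measure is fixed. The following sketch is our own illustration (the discrete measure is hypothetical): it orthonormalizes 1, x, x², … against a finite discrete measure by modified Gram-Schmidt and checks orthonormality of the result; by Theorem 28.1 such polynomials automatically satisfy a three-term recurrence.

```python
# nodes and positive weights of a hypothetical discrete measure on R
nodes = [-2.0, -1.0, 0.0, 1.5, 2.5]
weights = [0.2, 0.3, 0.1, 0.25, 0.15]

def inner(u, v):
    # <u, v> = sum_i w_i u(t_i) v(t_i)
    return sum(w * u(t) * v(t) for t, w in zip(nodes, weights))

def orthonormal_polynomials(m):
    # modified Gram-Schmidt on 1, x, ..., x^(m-1); returns callables q_0, ..., q_{m-1}
    qs = []
    for deg in range(m):
        p = lambda x, d=deg: x ** d
        for q in qs:
            c = inner(p, q)
            p = lambda x, p=p, q=q, c=c: p(x) - c * q(x)
        norm = inner(p, p) ** 0.5
        qs.append(lambda x, p=p, norm=norm: p(x) / norm)
    return qs

qs = orthonormal_polynomials(4)
gram = [[inner(qi, qj) for qj in qs] for qi in qs]  # approximately the identity matrix
```

Five distinct nodes support polynomials up to degree four, so the Gram matrix of the first four orthonormal polynomials equals the identity up to rounding.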
5.6.1
Exercises
129. Given a fixed basis P of Π(n), show that P is orthonormal with respect to a linear functional iff for k = 0, 1, 2, … the basis satisfies the conditions (a) and (b) of Theorem 208.1 with matrices A_k and C_{k+1} such that C_{k+1} = bp(A_k^T). 130. Prove Remark 2 on page 211 and reformulate the condition (b) of Theorem 208.1 without using matrices I_k. 131. Show that the matrix D_k in the recurrence formula (211.1) can be chosen in such a way that it possesses exactly r_{k+1}^n nonvanishing columns, corresponding to those polynomial coefficients of the vector xP_k which are linearly independent with respect to the space Π_{k+1}(n). 132. Assume that the matrices I_k in Theorem 208.1 are symmetric and positive definite. Prove that the matrices T_k and M_k such
that I_k = T_k M_k T_k^H can be chosen in such a way that either • T_k is real and lower triangular, M_k is real and diagonal, or • T_k is complex and M_k is the unit matrix. 133. Show the equivalence: the closure of Π(n) equals L₂(R^n, φ^(n)) iff the closure of Π(1) equals L₂(R, φ).

5.7

Annotations
The material selection for this chapter is motivated by the ties of the theory of moments with the theory of approximation. We focused our attention on duality results, classical existence theorems, density questions, and orthogonal polynomials. We should like to stress that we have barely scratched the surface of the subject. The theory of moments is very broad and, besides approximation, connects to diverse areas of mathematics such as functional analysis, differential equations, numerical analysis, probability, statistics, and signal processing, just to mention a few. For theoretical results, some interplays, and the history of the subject the reader is referred to the monographs [1], [19], and [25]. An overview of applications is given in the series of articles [4], [7], [10], [11], [20], and [24]. The approach presented in this chapter is based on [18] and has not been published in a textbook before. Specific comments Section 5.1: Theorem 171.1 is taken from [18]. For I = {0, 1, …, n}, V = {0}, and linearly independent elements p₀, p₁, …, p_n the moment problem (PM) reduces to the L-problem studied in [2] and [19]. Corollary 171.1 is then a restatement of the duality principle formulated therein and can be taken as a departure point for studying classical approximation. A short proof of the Mazur-Orlicz theorem can be found in [26]. The Riesz theorem and the Riesz-Steinhaus theorem are proven in [23]. Proofs of the Helly theorems are given in [5]. Section 5.2: This section is based on [18]. Section 5.3: The classical moment problems listed in Section 5.3 are also discussed in [1]. However, the proofs of solvability conditions in Theorem 181.1 do not follow [1] and, instead, they are based on
220
BIBLIOGRAPHY
Corollary 178.1 and representations of nonnegative polynomials that are derived in [22] and [25]. Section 5.4: Most of the material of this section is based on [8]. Theorem 202.1 summarizes the most important results on determinate and indeterminate distributions that are proven in [25]. The reader interested in the density of multivariate polynomials in L_p spaces is referred to [3] and [9]. Section 5.5: The results of this section are new. Section 5.6: This section is based on the series of articles [13], [14], [16], and [15]. Other general results on multivariate orthogonal polynomials are presented in [17] and [28]. Interesting examples of orthogonal polynomials in two variables are discussed in [12] and [27]. A linear functional on Π(n), n > 1, satisfying the condition
may fail to possess an integral representation with a positive measure. This fact is proven in [6]. The example in Exercise 111 on page 188 of a positive polynomial that is not representable as a sum of squares of real polynomials is taken from [6]. For the Chakalov theorem we refer to [21].
5.8
References
[1] N.I. Akhieser. The Classical Moment Problem and Some Related Questions in Analysis. Oliver and Boyd, Edinburgh, London, 1965. [2] N.I. Akhieser and M.G. Krein. Some Questions in the Theory of Moments. AMS, Providence, R.I., 1962. [3] C. Berg and J.R.P. Christensen. Density questions in the classical theory of moments. Ann. Inst. Fourier Grenoble, 3(31): 99-114, 1981. [4] C. Berg. The multidimensional moment problem and semigroups. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 110-24, 1987.
[5] S. Bochner. Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math. Ann., 108: 378-410, 1933. [6] J.R.P. Christensen, C. Berg, and C.U. Jensen. A remark on the multidimensional moment problem. Math. Ann., 243: 163-69, 1979. [7] P. Diaconis. Application of the method of moments in probability and statistics. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 125-42, 1987. [8] G. Freud. Orthogonal Polynomials. Pergamon Press, Oxford, New York, 1971. [9] B. Fuglede. The multidimensional moment problem. Expo. Math., 1: 47-65, 1983.
[10] T. Kailath. Signal processing applications of some moment problems. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 71-109, 1987. [11] J.H.B. Kemperman. Geometry of the moment problem. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 16-53, 1987. [12] T. Koornwinder. Two-variable analogues of the classical orthogonal polynomials. In R.A. Askey, editor, Theory and Applications of Special Functions. Academic Press, 1975. [13] M.A. Kowalski. Orthogonality and recursion formulas for polynomials in n variables. SIAM J. Math. Anal., 13: 316-23, 1982. [14] M.A. Kowalski. The recursion formulas for orthogonal polynomials in n variables. SIAM J. Math. Anal., 13: 309-15, 1982. [15] M.A. Kowalski. A note on the general multivariate moment problem. In Constructive Theory of Functions '84, pages 493-99, 1984. [16] M.A. Kowalski. Algebraic characterization of orthogonality in the space of polynomials. Lecture Notes in Math., 1171: 101-10, 1985. [17] M.A. Kowalski. Moments of square-integrable functions. J. Math. Anal. Appl., 127(1): 237-45, 1987. [18] M.A. Kowalski and Z. Sawori. The moment problem in the space C₀(S). Mh. Math., 97: 47-53, 98: 225, 1984.
[19] M.G. Krein and A.A. Nudel'man. The Markov Moment Problem and Extremal Problems. AMS, Providence, R.I., 1977. [20] H.J. Landau. Classical background of the moment problem. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 1-15, 1987. [21] I.P. Mysovskikh. On Chakalov's theorem. USSR Comp. Math., 15: 221-27, 1975. [22] G. Polya. Aufgaben und Lehrsätze aus der Analysis. Dover Publications, New York, 1945. [23] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 1974. [24] D. Sarason. Moment problems and operators in Hilbert space. Proc. Symp. Appl. Math., 37 (Moments in Mathematics): 54-70, 1987. [25] J.A. Shohat and J.D. Tamarkin. The Problem of Moments. AMS, New York, 1943.
[26] R. Sikorski. On a theorem of Mazur and Orlicz. Studia Math., 13: 180-82, 1953. [27] P.K. Suetin. Orthogonal Polynomials in Two Variables. Nauka, Moscow, 1988. In Russian. [28] Y. Xu. On multivariate orthogonal polynomials. SIAM J. Math. Anal., 24(3): 783-94, 1993.
Chapter 6
Introduction to n-Widths and s-Numbers

n-Widths and s-numbers provide conceptual generalizations of the classical concepts of best approximation. These concepts give a new idea of best approximation, and they also play an important role in the better understanding of best approximation and complexity. In Chapter 1 the reader was presented with general results on best approximation of a single element f of a normed space F with respect to a finite-dimensional subspace V of F. When f varies within a set A ⊂ F, there is no a priori reason to believe that a particular choice of V will be good for all elements of A. One may let the subspace V vary within F and choose the one for which the supremum sup_{a∈A} e(a, V) is as small as possible. This idea and its rigorous mathematical formulation were announced by Kolmogorov in 1936. It has received much attention in the last thirty years and proved to be very fruitful. The followers, notably Tichomirov and Pietsch, enriched mathematics with a new theory, currently known as the theory of n-widths and s-numbers. The subsequent sections introduce the reader to this research.
6.1
n-Widths
Let F be a normed linear space and let V be a linear subspace of F of dimension at most n − 1 (n ∈ N). We maintain the assumption that the field of scalars in F is either R or C. Suppose that, given a subset A of F, we wish to approximate, as closely as possible, any member of A with elements from V. Then it is reasonable to measure the quality of approximation by the "worst case" best approximation error sup_{a ∈ A} e(a, V). We may also ask how accurately members of A can be approximated with elements from subspaces of F having dimension less than n. This question suggests considering the quantity

    d_n(A, F) = inf { sup_{a ∈ A} e(a, V) : V a linear subspace of F, dim V ≤ n − 1 }
and to find a subspace V_0 attaining the infimum. The number d_n(A, F) is called the Kolmogorov n-width of A in F, and V_0 (if it exists) is called its extremal subspace.

Remark In the literature one often assumes that dim V ≤ n in the definition of the Kolmogorov n-widths. •

We shall list below basic properties of the Kolmogorov n-widths.

Theorem 224.1 Let F, A, and n be as above. Then:
(a) We have

    d_n(A, F) = d_n(cb(A), F),

where cb(A) is the convex balanced hull of the closure of A, i.e., the set of all finite sums Σ_k λ_k a_k with a_k in the closure of A and Σ_k |λ_k| ≤ 1.
(b) For any subset D of A, we have

    d_n(D, F) ≤ d_n(A, F).

(c) For any linear subspace E of F containing the set A we have

    d_n(A, F) ≤ d_n(A, E).
The proof is left to the reader as an exercise.

Example 224.1 Let B(r) be the ball of radius r > 0 in F, i.e., B(r) = { x ∈ F : ||x|| ≤ r }. For any subspace V of F such that dim V ≤ n − 1 there exists an element x ∈ ∂B(r) such that ||x|| = e(x, V) (see Exercise 2 on page 12). Thus d_n(B(r), F) ≥ r, and as

    d_n(B(r), F) ≤ sup_{x ∈ B(r)} ||x|| = r,

we arrive at the conclusion that d_n(B(r), F) = r for every integer n ≤ dim F. •
A far-reaching generalization of this example can be obtained using the Borsuk-Ulam theorem on antipodes, which reads as follows. Let D be a bounded, open, and balanced subset¹ of R^m and let T : ∂D → R^{m−1} be a continuous mapping. Then there is a point d in ∂D such that T(d) = T(−d). This result gives an important tool for obtaining nontrivial lower bounds on the Kolmogorov n-widths:

Theorem 225.1 Let V be an n-dimensional subspace of F and let B_V(r) be the ball of radius r > 0 in V, i.e., B_V(r) = { v ∈ V : ||v|| ≤ r }.
Then d_k(B_V(r), F) = r for k = 1, 2, ..., n and consequently d_n(A, F) ≥ r if B_V(r) ⊆ A.

Proof By assertion (b) of Theorem 224.1 and the obvious inequalities

    r ≥ d_1(B_V(r), F) ≥ d_2(B_V(r), F) ≥ ... ≥ d_n(B_V(r), F),

we need only prove that d_n(B_V(r), F) ≥ r. We shall show this inequality assuming that the field of scalars in F is C. The adaptation to the case of real scalars is left to the reader. Let {v_1, v_2, ..., v_n} be a basis of V and let {y_1, y_2, ..., y_{n−1}} be arbitrary linearly independent elements in F. It is enough to show that there exists an element x = Σ_{k=1}^n (a_k + i b_k) v_k in ∂B_V(r) such that

    e(x, span{y_1, y_2, ..., y_{n−1}}) = ||x|| = r,

where i = √−1. To this end, we assume without loss of generality that F is the minimal linear space containing the elements y_k and V, i.e., F = span({y_1, ..., y_{n−1}} ∪ V). Thus dim(F) < ∞, and by the result in Exercise 6 (page 13) we may also confine ourselves to the case when F is strictly convex.

¹A subset D of a linear space is said to be balanced if d ∈ D implies −d ∈ D.
We are now in a position to use the Borsuk-Ulam theorem. It is easy to verify that

    D = { (α_1, β_1, ..., α_n, β_n) ∈ R^{2n} : || Σ_{k=1}^n (α_k + iβ_k) v_k || < r }

is a bounded, open, and balanced subset of R^{2n}. We now set

    T(α_1, β_1, ..., α_n, β_n) = (γ_1, δ_1, ..., γ_{n−1}, δ_{n−1}),

where Σ_{k=1}^{n−1} (γ_k + iδ_k) y_k is the optimal approximation of Σ_{k=1}^n (α_k + iβ_k) v_k with respect to the space spanned by the elements y_k. By virtue of Theorems 3.1, 6.1, and 7.1, T is a well-defined, odd, and continuous mapping of ∂D into R^{2n−2}. From the Borsuk-Ulam theorem we now conclude that there exists a 2n-tuple (a_1, b_1, ..., a_n, b_n) in ∂D such that T(a_1, b_1, ..., a_n, b_n) = T(−a_1, −b_1, ..., −a_n, −b_n); since T is odd, this common value is zero. Thus,

    x = Σ_{k=1}^n (a_k + i b_k) v_k
is the desired element. •

We shall now show the following consequence of Theorem 225.1.

Corollary 226.1 Let H be a separable Hilbert space of infinite dimension and let

    G = { f ∈ H : Σ_{k=1}^∞ |β_k ⟨f, v_k⟩|² ≤ 1 },

where {v_k}_{k=1}^∞ is a complete orthonormal system in H and where the numbers β_k are such that

    0 < |β_1| ≤ |β_2| ≤ |β_3| ≤ ... .

Then

    d_n(G, H) = 1/|β_n|,

and span{v_1, v_2, ..., v_{n−1}} is an extremal subspace for d_n(G, H).
Proof We shall first show that d_n(G, H) ≥ 1/|β_n|. According to Theorem 225.1 it is enough to prove that

    B_n ⊆ G, where B_n is the ball of radius 1/|β_n| in span{v_1, v_2, ..., v_n}.

To this end, we pick an arbitrary f ∈ B_n and note that

    Σ_{k=1}^∞ |β_k ⟨f, v_k⟩|² = Σ_{k=1}^n |β_k|² |⟨f, v_k⟩|² ≤ |β_n|² ||f||² ≤ 1.

Thus f ∈ G and the inclusion follows. On the other hand, we have

    sup_{f ∈ G} e(f, span{v_1, ..., v_{n−1}})² = sup_{f ∈ G} Σ_{k=n}^∞ |⟨f, v_k⟩|² ≤ (1/|β_n|²) sup_{f ∈ G} Σ_{k=n}^∞ |β_k ⟨f, v_k⟩|² ≤ 1/|β_n|².

We see that d_n(G, H) = 1/|β_n| and span{v_1, v_2, ..., v_{n−1}} is an extremal subspace for d_n(G, H). The proof is complete. •

We illustrate this corollary by the following example.

Example 227.1 Let a, T, and E be positive numbers. Let W(a) be the space of all signals of bandwidth [−a, a], as defined on page 23. We denote by W(a; T) the subspace of L²(−T, T) consisting of the restrictions of signals in W(a) to the interval [−T, T], and define

    J(a, T, E) = { f ∈ W(a; T) : ||f||²_{2,∞} ≤ E },

where ||f||²_{2,∞} = ∫_{−∞}^{∞} |f(t)|² dt. We remark that for f ∈ W(a; T) the quantity ||f||²_{2,∞} has a physical interpretation as the energy of the signal f. Based on properties of the prolate spheroidal wave functions, one finds the exact values of the widths d_n(J(a, T, E), L²(−T, T)) and shows that the span of the first n − 1 prolate spheroidal wave functions
•
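The finite-dimensional content of Corollary 226.1 is easy to check numerically. The sketch below is an illustration, not part of the text: the truncation dimension and the sample values of β_k are arbitrary choices. It realizes G as an ellipsoid in R^m with semi-axes 1/|β_k| and verifies that the distance from G to span{e_1, ..., e_{n−1}} never exceeds 1/|β_n| and is attained at the point (1/β_n) e_n.

```python
import numpy as np

def dist_to_span(x, n):
    """Euclidean distance from x to span{e_1, ..., e_{n-1}} in R^m."""
    return np.linalg.norm(x[n - 1:])

def width_estimate(beta, n, trials=2000, seed=0):
    """Sampled sup, over the ellipsoid {x : sum (beta_k x_k)^2 <= 1},
    of the distance to span{e_1, ..., e_{n-1}}."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        y = rng.standard_normal(len(beta))
        y /= np.linalg.norm(y)      # random point on the unit sphere
        x = y / beta                # lies on the boundary of the ellipsoid
        best = max(best, dist_to_span(x, n))
    return best

beta = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # |beta_1| <= |beta_2| <= ...
n = 3
# The point (1/beta_n) e_n lies in G and realizes the width 1/|beta_n|.
extremal = np.zeros(len(beta))
extremal[n - 1] = 1.0 / beta[n - 1]
```

Here the corollary predicts the width 1/|β_3| = 0.25: no sampled point of the ellipsoid lies farther than 0.25 from the coordinate subspace, while the extremal point attains that distance exactly.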
The following result characterizes precompact sets in terms of convergence of their Kolmogorov n-widths to zero.

Theorem 228.1 A set A ⊆ F is precompact iff it is bounded and lim_{n→∞} d_n(A, F) = 0.

Proof Let us assume that A is a precompact subset of F. This means that, given a positive number ε, there exists an ε-net for A, i.e., a finite set N = N(ε) = {s_1, s_2, ..., s_{m−1}} satisfying

    sup_{a ∈ A} min_{1 ≤ j ≤ m−1} ||a − s_j|| ≤ ε.

Thus A is clearly bounded and

    d_m(A, F) ≤ sup_{a ∈ A} e(a, span N) ≤ ε.

Since ε can be chosen arbitrarily small, and the widths do not increase with n, we get lim_{n→∞} d_n(A, F) = 0. Conversely, let us assume now that this equation holds for some bounded subset A of F. Hence, for any positive number δ there exists a finite dimensional subspace V of F such that sup_{a ∈ A} e(a, V) ≤ δ/2. Since A is bounded, there exists a compact subset K of V satisfying

    sup_{a ∈ A} e(a, K) ≤ δ/2.

Consequently, any δ/2-net for K is a δ-net for A, and the precompactness of A follows. The proof is complete. •

We shall now introduce more quantities related to the approximability of elements from A ⊆ F by members of finite dimensional subspaces of F. These are:
• The Bernstein n-width of A in F:

    b_n(A, F) = sup { r ≥ 0 : B_V(r) ⊆ A for some n-dimensional subspace V of F }.

Here V is any n-dimensional subspace of F and B_V(r) is the ball in V of radius r, centered at zero.

• The Kolmogorov linear n-width of A in F:

    a_n(A, F) = inf_{P_n} sup_{a ∈ A} ||a − P_n(a)||.

The infimum above is taken over all linear continuous operators P_n : F → F such that rank P_n := dim P_n(F) ≤ n − 1.

• The Gelfand n-width of A in F:

    c_n(A, F) = inf_L sup { ||a|| : a ∈ A, L(a) = 0 }.

Here L is an arbitrary continuous linear operator on F assuming values in C^{n−1} (or in R^{n−1} if the field of scalars in F is R). In the definitions above, suprema taken over the empty set are to mean zero.

Remark Our notation for these three n-widths differs slightly from the literature standard. Most authors write b_{n−1}(A, F) instead of b_n(A, F), λ_{n−1}(A, F) instead of a_n(A, F), and d^{n−1}(A, F) instead of c_n(A, F). •

6.1.1
Relationships between n-widths
From now on we assume that the set A is closed, convex, and balanced. This is not restrictive for d_n(A, F) and a_n(A, F), and it simplifies the further presentation of relations between the four n-widths we have defined. We remark that in general

    b_n(A, F) ≤ d_n(A, F) ≤ a_n(A, F) and b_n(A, F) ≤ c_n(A, F) ≤ a_n(A, F),

but the equalities cannot be taken for granted. As an easy consequence of the definitions above we get

Theorem 229.1 If A ⊆ B ⊆ F and n ≥ 0, then
    d_n(A, F) ≤ d_n(B, F),  a_n(A, F) ≤ a_n(B, F),

and

    b_n(A, F) ≤ b_n(B, F),  c_n(A, F) ≤ c_n(B, F).

Let us note that the Bernstein n-width of A can be regarded as the best lower bound on d_n(A, F) yielded by Theorem 225.1. If

    b_n(A, F) = sup { r ≥ 0 : B_V(r) ⊆ A }

for some n-dimensional subspace V, then V is called an extremal subspace for b_n(A, F). The Kolmogorov linear n-width of A may be derived as d_n(A, F) when we confine ourselves to linear approximation methods. Thus,

    d_n(A, F) ≤ a_n(A, F).

Moreover, the following theorem holds.

Theorem 230.1 For any linear subspace E of F containing the set A and for any n ≥ 0 we have

    a_n(A, F) = a_n(A, E).

Proof By the definition of the Kolmogorov linear n-widths, the proof reduces to showing that each continuous operator P on E such that rank P = k < n can be linearly extended to the entire space F preserving the rank and continuity. To this end, we note that P is of the form

    P(g) = Σ_{j=1}^k L_j(g) g_j,

where g_j ∈ E and the L_j are continuous linear functionals on E. Thus, the Hahn-Banach argument immediately gives the desired extension and completes the proof. •

The Gelfand n-width c_n(A, F) brings us to the problem of continuous linear constraints on elements in A that maximally reduce the elements' largest norm. If

    c_n(A, F) = sup { ||a|| : a ∈ A, L(a) = 0 }

for some continuous linear operator L : F → C^{n−1}, the best constraints are the components of L equated to zero, and ker L is called an extremal subspace for c_n(A, F).
We remind the reader that a linear subspace V of F is of codimension k, codim(V) = k, if V = ker L for some linear operator L : F → C^k whose components are linearly independent. (We require that L : F → R^k when the normed space F is defined over the reals.) If the subspace V is closed, we may additionally assume that L is continuous. Thus,

    c_n(A, F) = inf { sup { ||a|| : a ∈ A ∩ V } : V a closed subspace of F, codim(V) ≤ n − 1 }.

When calculating c_n(A, F) we may replace F by span(A), the smallest linear subspace of F containing all elements of A. More generally, we have:

Lemma 231.1 If A ⊆ F_1 and F_1 is a linear subspace of F (endowed with the same norm as F), then c_n(A, F) = c_n(A, F_1).

Proof The Hahn-Banach theorem implies that any linear and continuous operator L : F_1 → C^{n−1} can be extended to the entire space F preserving linearity and continuity. Thus, c_n(A, F) ≤ c_n(A, F_1). The converse inequality is obvious. •

This lemma allows us to get the following analogue of Theorem 225.1 for the Gelfand n-widths.

Theorem 231.1 Let V be an n-dimensional subspace of F. Then for r > 0 and k = 1, 2, ..., n we have

    c_k(B_V(r), F) = r.

Proof We clearly have c_k(B_V(r), F) ≤ r. Thus, by Lemma 231.1 it suffices to prove that c_k(B_V(r), V) ≥ r, which reduces to showing that for any choice of k − 1 < n linear functionals L_1, L_2, ..., L_{k−1} on V there is an s ∈ B_V(r) such that

    ||s|| = r and L_j(s) = 0 for j = 1, 2, ..., k − 1.

We complete the proof by noting that the condition dim V = n guarantees the existence of such an s. •

We shall now present basic relations between the n-widths.

Theorem 231.2 For any integer number n we have:
(a) d_n(A, F), c_n(A, F) ∈ [b_n(A, F), a_n(A, F)].
(b) If A is a proper subset of an n-dimensional subspace V of F, then b_n(A, F) = d_n(A, F) = c_n(A, F) = a_n(A, F) = inf { ||a|| : a ∈ ∂A }.
(c) If F is a unitary space, then d_n(A, F) = a_n(A, F).

Proof (a) From the definitions of the n-widths d_n, b_n, and a_n and from Theorem 225.1 it readily follows that b_n(A, F) ≤ d_n(A, F) ≤ a_n(A, F). It remains to show that b_n(A, F) ≤ c_n(A, F) ≤ a_n(A, F). Let us note that if P : F → F is a continuous linear operator and rank P < n, then

    sup_{a ∈ A} ||a − P(a)|| ≥ sup { ||a|| : a ∈ A, P(a) = 0 } ≥ c_n(A, F).

Thus, a_n(A, F) ≥ c_n(A, F). We also have c_n(A, F) ≥ b_n(A, F) since, by Theorem 231.1, c_n(B_V(r), F) ≥ r. The proof of the assertion (a) is complete.

(b) Let V be an arbitrary n-dimensional subspace of F and let r be a positive number. The assumptions of (b) and the definition of the Bernstein n-width yield:

    b_n(A, F) ≥ inf { ||a|| : a ∈ ∂A }.

Thus, by (a) we only need to show that

    a_n(A, F) ≤ inf { ||a|| : a ∈ ∂A }.

We now invoke the Mazur theorem on supporting hyperplanes, which states as follows. Corresponding to each element m on the boundary of a proper convex subset D of V there exists a nontrivial linear continuous functional ℓ : V → R such that

    ℓ(d) ≤ ℓ(m) for all d ∈ D.

The geometric interpretation of this fact is that the hyperplane H = { m + y : y ∈ ker ℓ } passes through the point m and supports the set D. For some element b in ∂A we have

    ||b|| = inf { ||a|| : a ∈ ∂A }.
This is illustrated in Figure 233.1.

Figure 233.1: The set A and its supporting hyperplane

Taking into account that A is balanced and letting D = A, m = b, we obtain a linear functional L such that

    |L(a)| ≤ L(b) for all a ∈ A.

If b ∈ ker L, then A ⊆ ker L and consequently a_n(A, F) = 0, since dim ker L = n − 1. Thus, we may assume that b ∉ ker L. Then for every v in V we have

    v = c_0(v) b + Σ_{k=1}^{n−1} c_k(v) v_k,

where {v_1, v_2, ..., v_{n−1}} is a basis of ker L and where the coefficients c_j = c_j(v) are continuous linear functionals on V. Since L(v) = c_0(v) L(b), we see that v ∈ A implies |c_0(v)| ≤ 1. By the Hahn-Banach theorem the functionals c_k can be extended to the entire space F preserving linearity and continuity. Then P, defined by P(a) = Σ_{k=1}^{n−1} c_k(a) v_k, is a continuous linear operator on F and rank P = n − 1. Moreover, for every a in A we have

    ||a − P(a)|| = |c_0(a)| ||b|| ≤ ||b||.

This yields a_n(A, F) ≤ ||b|| = inf { ||a|| : a ∈ ∂A }, and the assertion (b) follows.

(c) Each element s of a unitary space F has a unique optimal approximation P(s) with respect to each finite dimensional subspace
V. Moreover, the operator P is the orthogonal projection onto V; see Theorem 14.1 and Corollary 15.1. Thus, the assertion (c) follows easily. The proof is complete. •

We remark that neither of the inequalities c_n(A, F) ≤ d_n(A, F) and c_n(A, F) ≥ d_n(A, F) holds true in general. This fact is demonstrated in Exercises 136 and 137 on page 236. We illustrate the foregoing material with some results on n-widths of the Sobolev classes W_p^r,

    W_p^r = { f : f^{(r−1)} is absolutely continuous on [0, 1] and ||f^{(r)}|| ≤ 1 }.
Here r ∈ N, p ∈ [1, ∞], and || · || is the standard norm in L_p(0, 1). For any q ∈ [1, ∞] the class W_p^r is a subset of the space L_q = L_q(0, 1).

Theorem 234.1 The entries of Table 234.1 below indicate the asymptotic behaviour of the widths d_n(W_p^r, L_q), c_n(W_p^r, L_q), and a_n(W_p^r, L_q) for r ≥ 2, depending on several cases of relationships between p and q.

Table 234.1: n-Widths of Sobolev classes
The Θ-notation used in the table is defined as follows. Given reals v_n, w_n (n = 1, 2, ...), the equation v_n = Θ(w_n) is to mean that there exist positive constants c_0 and c_1 such that

    c_0 w_n ≤ v_n ≤ c_1 w_n for all n.

Thus, the constants corresponding to the table entries are independent of n, but they may (and generally do) depend on p and q. Theorem 234.1 summarizes results of many researchers obtained over three decades. We skip the proof since it requires lengthy preparations. We should like to remark that splines excel in the approximation of functions in the Sobolev classes.
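The rates summarized in Theorem 234.1 are realized by concrete linear methods such as spline interpolation. As a small numerical illustration (not taken from the text; the test function and knot counts are arbitrary choices), piecewise-linear interpolation on n equally spaced knots approximates a function with bounded second derivative at the rate Θ(n^{-2}) in the uniform norm, so doubling n divides the error by about four.

```python
import numpy as np

def pl_interp_error(f, n, grid=10_000):
    """Uniform-norm error of piecewise-linear interpolation of f
    on n equally spaced knots in [0, 1], estimated on a fine grid."""
    knots = np.linspace(0.0, 1.0, n)
    x = np.linspace(0.0, 1.0, grid)
    return np.max(np.abs(f(x) - np.interp(x, knots, f(knots))))

f = np.cos                      # |f''| <= 1 on [0, 1]
errs = [pl_interp_error(f, n) for n in (10, 20, 40, 80)]
# Doubling n should divide the error by roughly 4 (rate n^{-2}).
ratios = [errs[k] / errs[k + 1] for k in range(3)]
```

The observed ratios cluster near 4, consistent with the classical bound ||f − s_n||_∞ ≤ (h²/8) ||f''||_∞ for the mesh size h.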
6.1.2
Algebraic versions of a_n and c_n
We shall now take time to comment on some natural modifications of the Kolmogorov linear and Gelfand n-widths. Given a subset B of the normed space F, let us consider the following quantities.

• The algebraic version of Kolmogorov's linear n-width of B:

    A_n(B, F) = inf_{P_n} sup_{b ∈ B} ||b − P_n(b)||,

where P_n : F → F is an arbitrary linear operator such that rank P_n ≤ n − 1.

• The algebraic version of Gelfand's n-width of B:

    C_n(B, F) = inf_L sup { ||b|| : b ∈ B, L(b) = 0 },

where L is an arbitrary linear operator on F assuming values in C^{n−1} (or in R^{n−1} if the field of scalars in F is R). Thus, A_n(B, F) and C_n(B, F) are defined as a_n(B, F) and c_n(B, F) when dropping the requirement of continuity of the operators P_n and L.

According to the discussion on page 231, an alternate definition of C_n(B, F) is

    C_n(B, F) = inf { sup { ||b|| : b ∈ B ∩ V } : V a subspace of F, codim(V) ≤ n − 1 }.

We recall that the analogous formula for c_n(B, F) required considering closed subspaces V.
It can be easily verified that for B = cb(B), Lemma 231.1, Theorem 231.1, and Theorem 231.2 remain valid when the widths a_n(B, F) and c_n(B, F) are replaced with their algebraic versions. Moreover, we have

Theorem 236.1 Let B be a subset of a normed space F such that B = cb(B). Then:
(a) For all n ∈ N we have

    C_n(B, F) ≤ c_n(B, F)

and

    A_n(B, F) ≤ a_n(B, F).

(b) If B = K + E, where K = cb(K) is a compact subset of F and E is a finite dimensional subspace of F, then for all n > dim E we have

    A_n(B, F) = a_n(B, F) and C_n(B, F) = c_n(B, F).

We do not include the proof of Theorem 236.1 since it requires lengthy preparations. References concerning this theorem are given in the Annotations at the end of this chapter.

6.1.3
Exercises
134. Prove Theorem 224.1 and show that under its conditions we also have a_n(A, F) = a_n(cb(A), F).

135. Find a subset A of the Cartesian space R² such that

137. Let R̃ denote the space R³ with the norm

Consider the set A defined in the previous exercise and show that
138. Let {v_k}_{k=1}^∞ and {α_k}_{k=1}^∞ be a sequence of linearly independent elements of a normed space F and a nonincreasing sequence of reals, respectively. Define V_k = span{v_1, ..., v_{k−1}} and

Use Theorem 225.1 to show that d_n(A, F) = α_n and that V_n is an extremal subspace for d_n(A, F), n ≥ 2.

139. Let B ⊆ R^k be a compact set and let A be a subset of the space C(B). Suppose that there are n points x_j in B such that for every selection s of n signs s_j ∈ {−1, 1} (1 ≤ j ≤ n) there is a function f_s in A assuming these signs at the points. Show that

    d_n(A, C(B)) ≥ 1.

Hint: Given linearly independent elements v_1, v_2, ..., v_{n−1} of C(B), there are real numbers c_1, c_2, ..., c_n such that

    Σ_{k=1}^n c_k v_j(x_k) = 0 for j = 1, 2, ..., n − 1,

and Σ_{k=1}^n |c_k| = 1. Consider a function f in A such that f(x_k) c_k ≥ 0 and estimate e(f, span{v_1, v_2, ..., v_{n−1}}).

140. Show that lim_{n→∞} c_n(A, F) = lim_{n→∞} C_n(A, F) = 0 if A is a precompact subset of F.

Hint: Let {a_1, a_2, ..., a_n} be an ε-net in A. By the Hahn-Banach theorem, for each a_j there exists a continuous linear functional L_j on F such that L_j(a_j) = ||a_j|| and ||L_j|| = 1. Prove that ||a|| ≤ 2ε if a ∈ A and L_j(a) = 0 for 1 ≤ j ≤ n.

141. Show that for A = cb(A), Lemma 231.1, Theorem 231.1, and Theorem 231.2 remain valid when the widths a_n(A, F) and c_n(A, F) are replaced with their algebraic versions.
6.2
s-Numbers

Let E and F be two Banach spaces. Let L(E, F) denote the normed space of all continuous linear operators from E to F. Given a mapping T ∈ L(E, F), its norm is given by the equation ||T|| = sup { ||Tx|| : x ∈ B_E(1) }.
Let us define A(T) as the image of the unit ball B_E(1) under T, i.e.,

    A(T) = { Tx : x ∈ B_E(1) }.

It turns out that the Kolmogorov n-widths of A(T) have some remarkable properties as a function of T. Namely, if s_n(T) stands for d_n(A(T), F), then:

(a1) ||T|| = s_1(T) ≥ s_2(T) ≥ ... ≥ 0;
(a2) s_n(S + T) ≤ s_n(S) + ||T|| for all S, T ∈ L(E, F);
(a3) s_n(R T Q) ≤ ||R|| s_n(T) ||Q|| for all Q ∈ L(E_0, E) and R ∈ L(F, F_0);
(a4) s_n(T) = 0 whenever rank T < n;
(a5) s_n(I_V) = 1 for the identity operator I_V on any n-dimensional Banach space V.

In what follows the width d_n(A(T), F) will be called the n-th Kolmogorov number of the operator T and denoted for brevity by d_n(T). If s_n(T) is defined as any of the n-widths a_n(A(T), F), c_n(A(T), F), then the properties above are satisfied except for (a3). However, one may guarantee (a1)-(a5) by altering the definitions of these widths. Namely, instead of

    a_n(A(T), F) = inf { sup_{x ∈ B_E(1)} ||Tx − P(Tx)|| : P ∈ L(F, F), rank P ≤ n − 1 }

and

    c_n(A(T), F) = inf { sup { ||Tx|| : x ∈ B_E(1), L(Tx) = 0 } : L ∈ L(F, C^{n−1}) },

one respectively considers

    a_n(T) = inf { ||T − P|| : P ∈ L(E, F), rank P ≤ n − 1 }

and

    c_n(T) = inf { sup { ||Tx|| : x ∈ B_E(1), L(x) = 0 } : L ∈ L(E, C^{n−1}) }.
Then (a1)-(a5) hold for s_n(T) = a_n(T) and for s_n(T) = c_n(T). Another example featuring the properties (a1)-(a5) can be obtained by setting s_n(T) = h_n(T), where

    h_n(T) = sup { a_n(X T Y) : Y ∈ L(l_2, E), ||Y|| ≤ 1, X ∈ L(F, l_2), ||X|| ≤ 1 },

with Y ∈ L(l_2, E) and X ∈ L(F, l_2). Here l_2 stands for the Hilbert space of all complex sequences a = {a_k}_{k=1}^∞ satisfying the condition Σ_{k=1}^∞ |a_k|² < ∞. The inner product in l_2 is defined by

    ⟨a, b⟩ = Σ_{k=1}^∞ a_k b̄_k.

The quantities a_n(T), c_n(T), and h_n(T) will be respectively referred to as the n-th approximation number, the n-th Gelfand number, and the n-th Hilbert number of the operator T. Instead of propagating examples of quantities with the properties (a1)-(a5), we take (a1)-(a5) as axioms and look for their consequences. For the convenience of further considerations we shall introduce some definitions and notation.

• A mapping s,

    T ↦ { s_n(T) }_{n=1}^∞,

is said to be an s-scale if the conditions (a1)-(a5) are satisfied regardless of the Banach spaces E and F chosen. Then s_n(T) is called the n-th s-number of the operator T.

• Given an operator T ∈ L(E, F), an operator T* ∈ L(F*, E*) is called the adjoint of T if

    (T* f)(x) = f(T(x))

for every x ∈ E and every f ∈ F*. Here F* denotes the dual space of F, i.e., the space of continuous linear functionals on F. When F is a Hilbert space, its dual F* is isometrically isomorphic to F. For this reason, in the definition of adjoint operators we shall identify a Hilbert space with its dual.
• Let T be an operator in L(G, H), where G and H are Hilbert spaces. A positive number σ is called a singular value of T if T(g) = σh and T*(h) = σg for some elements g ∈ G and h ∈ H such that ||g|| = ||h|| = 1.
We begin discussing properties of s-scales with the following simple observation.

Lemma 240.1 If s is an arbitrary s-scale, then

    s_n(T) ≤ a_n(T) for every n ∈ N and every T ∈ L(E, F),

regardless of the chosen Banach spaces E and F.

Proof Let ε > 0, n ∈ N, and T ∈ L(E, F) be fixed. By the definition of a_n(T) there is an operator P ∈ L(E, F) such that

    rank P ≤ n − 1 and ||T − P|| ≤ a_n(T) + ε.

Let s be an arbitrary s-scale. By (a1), (a2), (a4), and the last inequality we get

    s_n(T) ≤ s_n(P) + ||T − P|| = ||T − P|| ≤ a_n(T) + ε.

Since ε is arbitrary, the lemma follows easily. •

Thus, Lemma 240.1 says that the approximation numbers yield the largest s-scale. Later we shall also determine the smallest s-scale.

6.2.1
s-Numbers and singular values
It turns out that s-numbers can be regarded as a generalization of singular values. We shall now focus our attention on this issue.

Example 240.1 Given nonnegative reals σ_1 ≥ σ_2 ≥ ..., let us consider the operator D ∈ L(l_2, l_2), D({a_j}_{j=1}^∞) = {σ_j a_j}_{j=1}^∞. It is easy to verify that the singular values of D are those σ_j that are positive. We shall now show that for any s-scale and n ∈ N we have

    s_n(D) = σ_n.

To this end we define operators J_n ∈ L(C^n, l_2) and Q_n ∈ L(l_2, C^n) by the equations

    J_n(a_1, ..., a_n) = (a_1, ..., a_n, 0, 0, ...),  Q_n({a_j}_{j=1}^∞) = (a_1, ..., a_n).
We shall first prove that s_n(D) ≥ σ_n. If σ_n = 0, then the inequality is trivial. Hence, we assume that σ_n > 0 and note that then the operator D_n = Q_n D J_n ∈ L(C^n, C^n) is invertible,

    D_n(a_1, ..., a_n) = (σ_1 a_1, ..., σ_n a_n),

and ||D_n^{−1}|| = σ_n^{−1}. Therefore, by (a5) and (a3) we get

    1 = s_n(I_{C^n}) = s_n(D_n^{−1} Q_n D J_n) ≤ ||D_n^{−1} Q_n|| s_n(D) ||J_n|| ≤ σ_n^{−1} s_n(D).

Thus, s_n(D) ≥ σ_n. In order to prove that s_n(D) ≤ σ_n we consider the operator P_n = J_{n−1} Q_{n−1} ∈ L(l_2, l_2). It is easily seen that

    D P_n({a_j}_{j=1}^∞) = (σ_1 a_1, ..., σ_{n−1} a_{n−1}, 0, 0, ...)

and rank P_n < n. Moreover, by (a1) we have s_1(D − D P_n) = ||D − D P_n|| = σ_n. Thus, by (a2) and (a4) we get

    s_n(D) ≤ s_n(D P_n) + ||D − D P_n|| = σ_n.

Finally, we obtain s_n(D) = σ_n, as claimed.
•
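In finite dimensions the two halves of the argument in Example 240.1 amount to the Eckart-Young theorem: the spectral-norm distance from a matrix to the operators of rank at most n − 1 equals its n-th singular value. The sketch below (an illustration with an arbitrarily chosen diagonal, not part of the text) checks this with NumPy for a truncated diagonal operator.

```python
import numpy as np

def approximation_number(A, n):
    """n-th approximation number of the matrix A in the spectral norm:
    distance from A to the matrices of rank <= n-1, computed by keeping
    the n-1 largest singular values (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A)
    P = U[:, : n - 1] @ np.diag(s[: n - 1]) @ Vt[: n - 1, :]
    return np.linalg.norm(A - P, ord=2)

sigma = np.array([5.0, 3.0, 2.0, 0.5])   # sigma_1 >= sigma_2 >= ...
D = np.diag(sigma)
vals = [approximation_number(D, n) for n in range(1, 5)]
```

Here vals reproduces σ_1, ..., σ_4, in accordance with s_n(D) = σ_n.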
We shall now deal with singular values of compact operators.

Lemma 241.1 If T ∈ L(H, G), T ≠ 0, is a compact operator acting between Hilbert spaces H and G, then σ = ||T|| is its singular value.

Proof Let us select arbitrary elements x_1, x_2, ... ∈ B_H(1) such that lim_{k→∞} ||T x_k|| = σ and define

    y_k = σ^{−1} T x_k, k = 1, 2, ... .

Since T is compact, there is a subsequence {y_{k_j}}_{j=1}^∞ that converges to an element y ∈ G. We have

    ||y|| = σ^{−1} lim_{j→∞} ||T x_{k_j}|| = 1

and

    ⟨x_{k_j}, T* y⟩ = ⟨T x_{k_j}, y⟩ = σ ⟨y_{k_j}, y⟩ → σ.
Thus, the subsequence {x_{k_j}}_{j=1}^∞ converges to x = σ^{−1} T* y. Indeed,

    ||x_{k_j} − σ^{−1} T* y||² = ||x_{k_j}||² − 2σ^{−1} Re⟨x_{k_j}, T* y⟩ + σ^{−2} ||T* y||² ≤ 2 − 2σ^{−1} Re⟨x_{k_j}, T* y⟩ → 0.

It is also clear that ||x|| = 1. Finally, from the equation y_{k_j} = σ^{−1} T x_{k_j}, by letting j → ∞ we obtain y = σ^{−1} T x, which completes the proof. •

We are now in a position to demonstrate the singular value decomposition of compact operators.

Lemma 242.1 Let H and G be Hilbert spaces. Corresponding to a compact operator T ∈ L(H, G), let us define D ∈ L(l_2, l_2) by the equation

    D({a_j}_{j=1}^∞) = {σ_j a_j}_{j=1}^∞,

where σ_1 ≥ σ_2 ≥ ... ≥ 0 and σ_k is either the k-th singular value of T or zero. Then σ_1 = ||T|| = ||D||, and for some operators U ∈ L(l_2, G), V ∈ L(l_2, H) such that ||U|| = ||V|| = 1 we have

    T = U D V* and D = U* T V.
Proof If T = 0 the lemma is obvious. Thus, we assume in the rest of the proof that ||T|| > 0. We shall describe the desired operators U, V, and D in terms of some elements x_1, x_2, ... ∈ H and y_1, y_2, ... ∈ G with the following orthogonality property:

    ⟨x_i, x_j⟩ = ⟨y_i, y_j⟩ = 0 for i ≠ j.

We shall obtain these elements by induction. By Lemma 241.1, σ_1 = ||T|| is a singular value of T. Hence, there exist elements x_1 ∈ H and y_1 ∈ G such that

    T x_1 = σ_1 y_1, T* y_1 = σ_1 x_1, and ||x_1|| = ||y_1|| = 1.

We assume now that we have already defined orthogonal elements

    x_1, x_2, ..., x_{n−1} ∈ H, y_1, y_2, ..., y_{n−1} ∈ G, ||x_j|| = ||y_j|| = 1,

and numbers σ_1, σ_2, ..., σ_{n−1} > 0 such that
    T x_j = σ_j y_j and T* y_j = σ_j x_j

for j = 1, 2, ..., n − 1. We now define operators V_n ∈ L(l_2, H), U_n ∈ L(l_2, G), and D_n ∈ L(l_2, l_2) by the equations

    V_n(a) = Σ_{j=1}^{n−1} a_j x_j,  U_n(a) = Σ_{j=1}^{n−1} a_j y_j,  D_n(a) = (σ_1 a_1, ..., σ_{n−1} a_{n−1}, 0, 0, ...),

where a = (a_1, a_2, ...) is an arbitrary sequence in l_2. It can be easily verified that ||V_n|| = ||U_n|| = 1,

    U_n D_n V_n*(h) = Σ_{j=1}^{n−1} σ_j ⟨h, x_j⟩ y_j for every h ∈ H,

and

    U_n* U_n(a) = V_n* V_n(a) = (a_1, ..., a_{n−1}, 0, 0, ...).

We now set σ_n = ||T − U_n D_n V_n*||. If σ_n > 0, by applying Lemma 241.1 to the operator

    T_n = T − U_n D_n V_n*

we obtain elements x_n ∈ H and y_n ∈ G such that

    T_n x_n = σ_n y_n,  T_n* y_n = σ_n x_n,  ||x_n|| = ||y_n|| = 1.

For j < n we have

    T_n x_j = T x_j − Σ_{k=1}^{n−1} σ_k ⟨x_j, x_k⟩ y_k = σ_j y_j − σ_j y_j = 0, and similarly T_n* y_j = 0,

and hence

    σ_n ⟨x_n, x_j⟩ = ⟨T_n* y_n, x_j⟩ = ⟨y_n, T_n x_j⟩ = 0,  σ_n ⟨y_n, y_j⟩ = ⟨T_n x_n, y_j⟩ = ⟨x_n, T_n* y_j⟩ = 0.

Thus, the elements x_1, x_2, ..., x_n and y_1, y_2, ..., y_n are orthogonal. Consequently,

    T x_n = T_n x_n + U_n D_n V_n*(x_n) = σ_n y_n and, likewise, T* y_n = σ_n x_n,

so σ_n is a singular value of T and the induction may proceed. In the next step we define σ_{n+1} = ||T − U_{n+1} D_{n+1} V_{n+1}*||. Let us note that for any h ∈ H we have

    (T − U_n D_n V_n*) h = (T − U_{n+1} D_{n+1} V_{n+1}*) h + σ_n ⟨h, x_n⟩ y_n.
Moreover, the elements (T − U_{n+1} D_{n+1} V_{n+1}*) h and σ_n ⟨h, x_n⟩ y_n are orthogonal, since

    ⟨(T − U_{n+1} D_{n+1} V_{n+1}*) h, y_n⟩ = ⟨h, T* y_n − V_{n+1} D_{n+1} U_{n+1}* y_n⟩ = ⟨h, σ_n x_n − σ_n x_n⟩ = 0.

Consequently,

    ||(T − U_n D_n V_n*) h||² = ||(T − U_{n+1} D_{n+1} V_{n+1}*) h||² + σ_n² |⟨h, x_n⟩|²,

and as h ∈ H is arbitrary, we get σ_{n+1} ≤ σ_n. If this procedure does not yield the desired representation of T after a finite number of steps, it gives us infinitely many positive singular values σ_1 ≥ σ_2 ≥ σ_3 ≥ ... with the corresponding elements x_j and y_j. Since T is a compact operator, the sequence {T x_j}_{j=1}^∞ contains a Cauchy subsequence. We also have

    ||T x_j − T x_k||² = ||σ_j y_j − σ_k y_k||² = σ_j² + σ_k² for j ≠ k.

Taking these two facts together we arrive at the conclusion that

    lim_{n→∞} σ_n = 0.

Let us finally note that the mappings V ∈ L(l_2, H), U ∈ L(l_2, G), and D ∈ L(l_2, l_2) defined by the equations

    V(a) = Σ_{j=1}^∞ a_j x_j,  U(a) = Σ_{j=1}^∞ a_j y_j,  D(a) = {σ_j a_j}_{j=1}^∞

are the limits of V_n, U_n, and D_n, respectively; in particular,

    ||T − U_n D_n V_n*|| = σ_n → 0.
Consequently, T = U D V* and D = U* T V, as claimed. •

We summarize the foregoing discussion in the following theorem.

Theorem 244.1 Let H, G be Hilbert spaces and let T ∈ L(H, G) be a compact operator. Then the s-numbers s_j(T) (j = 1, 2, ...) are uniquely determined by the axiomatic properties (a1)-(a5) on page 238. Moreover, s_j(T) = σ_j, where the quantities σ_j are those in Lemma 242.1.
Proof Indeed, by applying Lemma 242.1 to the operator T we get

    T = U D V* and D = U* T V.

Since ||U|| = ||U*|| = ||V|| = ||V*|| = 1, by (a3) we obtain

    s_n(T) ≤ s_n(D) and s_n(D) ≤ s_n(T), i.e., s_n(T) = s_n(D).

From Example 240.1 it now follows that s_n(T) = s_n(D) = σ_n. •

It turns out that Theorem 244.1 can be extended toward noncompact operators. Before stating this extension we shall present two results, interesting in themselves, that will be used in its proof.

Theorem 245.1 Let n be a positive integer and let T be an arbitrary operator in L(H, F). If H is a Hilbert space, then a_n(T) = c_n(T).

Proof We only need to show that a_n(T) ≤ c_n(T), since the converse inequality readily follows from Lemma 240.1. Given a closed subspace V of H such that codim V < n, we define M = T − T P_V, where P_V stands for the orthogonal projection of H onto V. It is clear that V ⊆ ker M. Thus, rank M = codim ker M ≤ codim V < n. Now we get

    a_n(T) ≤ ||T − M|| = ||T P_V|| = ||T|_V||.

Since V is arbitrary, this implies that a_n(T) ≤ c_n(T). The proof is complete. •
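Lemma 242.1 and Theorem 244.1 are the infinite-dimensional counterparts of the matrix singular value decomposition. A quick numerical check of the identities T = U D V* and D = U* T V, for an arbitrarily chosen complex matrix (an illustration, not part of the text), runs as follows.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# numpy returns T = U @ diag(s) @ Vh with orthonormal columns in U
# and orthonormal rows in Vh; s is sorted in nonincreasing order.
U, s, Vh = np.linalg.svd(T, full_matrices=False)
D = np.diag(s)
V = Vh.conj().T

T_rebuilt = U @ D @ V.conj().T     # T = U D V*
D_rebuilt = U.conj().T @ T @ V     # D = U* T V
```

The nonincreasing ordering of s matches σ_1 ≥ σ_2 ≥ ... in the lemma, and by Theorem 244.1 these values are the common s-numbers of T for every s-scale.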
Since V is arbitrary, this implies that an(T) < cn(T). The proof is complete. • We shall now state without proof the following property of the Gelfand s-numbers. Theorem 245.2 If £ and 'f are Banach spaces and T 6 £(£,/"), then
The interested reader will find references to the proof in Annotations. We are now ready to present the extension of Theorem 244.1.
Theorem 246.1 There exists a unique s-scale on the class of operators acting between Hilbert spaces.

Proof Let G and H be Hilbert spaces and let s be an s-scale on L(G, H). By Theorem 245.1, given n ∈ N and T ∈ L(G, H), we have a_n(T) = c_n(T). From Theorem 244.1 it follows that s_n(T|_V) = c_n(T|_V) for any finite dimensional subspace V of G. Thus, Theorem 245.2 yields

    a_n(T) = c_n(T) = sup { s_n(T|_V) : dim V < ∞, V ⊆ G } ≤ s_n(T).

We finally get s_n(T) = a_n(T), since we know from Lemma 240.1 that s_n(T) ≤ a_n(T). •

6.2.2
Relationships between s-numbers
For the reader's convenience we quote two important definitions concerning Banach spaces.

• We say that a Banach space F has the metric extension property if, given Banach spaces E_0, E, and given mappings J ∈ L(E_0, E), T ∈ L(E_0, F), the condition

    ||J(x)|| = ||x|| for all x ∈ E_0

yields the existence of an operator T̃ ∈ L(E, F) such that T = T̃ J and ||T̃|| = ||T||. Given an index set S, this property holds for the spaces l_∞(S). It can be shown that any Banach space can be identified with a subspace of l_∞(S) for some set S. In most applications of this property one selects E_0 as a subspace of E and defines J : E_0 → E as the identity operator (canonical embedding).
Figure 247.1: Diagram of the extension property

• We say that a Banach space E has the metric lifting property if, given Banach spaces F_0, F, and given a positive number ε, and given mappings Q ∈ L(F, F_0), T ∈ L(E, F_0), the condition

    Q({ f ∈ F : ||f|| < 1 }) = { g ∈ F_0 : ||g|| < 1 }

yields the existence of an operator T̃ ∈ L(E, F) such that T = Q T̃ and ||T̃|| ≤ (1 + ε) ||T||.
Figure 247.2: Diagram of the lifting property

This property holds true for the spaces l_1(S). One can prove that any Banach space can be identified with a quotient space of l_1(S) with a suitably chosen index set S. In most applications of the lifting
property one selects F_0 as the quotient space F/V, where V is a linear subspace of F. Then one defines Q ∈ L(F, F/V) as the canonical quotient mapping, Q(f) := [f]. See Exercise 11 on page 13. We are now ready to prove the following result.

Theorem 248.1 Let n be a positive integer and let T be an arbitrary operator in L(E, F). Then:
(a) a_n(T) = c_n(T) if E is a Hilbert space, or else if F has the metric extension property;
(b) a_n(T) = d_n(T) if F is a Hilbert space, or else if E has the metric lifting property.

Proof (a) The case when E is a Hilbert space has already been discussed in Theorem 245.1. We proceed to show that a_n(T) = c_n(T) when F has the metric extension property. To this end we consider an arbitrary closed subspace V of E with codim V < n and note that the operator T|_V has a norm-preserving extension T_0 ∈ L(E, F), i.e., T_0|_V = T|_V and ||T_0|| = ||T|_V||. We now define M = T − T_0. Since V ⊆ ker M, we have rank M = codim ker M ≤ codim V < n. Therefore we get

    a_n(T) ≤ ||T − M|| = ||T_0|| = ||T|_V||.

Since V is arbitrary, this yields a_n(T) ≤ c_n(T). The converse inequality readily follows from Lemma 240.1. Thus a_n(T) = c_n(T).

(b) We confine ourselves to showing that a_n(T) ≤ d_n(T), since the converse inequality easily obtains from the definitions of a_n and d_n. The best approximation operator in a Hilbert space is a linear projection; thus a_n(T) ≤ d_n(T) if F is a Hilbert space. Let us assume that E has the metric lifting property. Given a linear subspace V of F, we denote by Q_V the canonical quotient mapping from F onto F/V. Since the norm in F/V is defined as ||[f]|| = e(f, V), the Kolmogorov s-number d_n(T) can be rewritten in the form

    d_n(T) = inf { ||Q_V T|| : V a subspace of F, dim V ≤ n − 1 }.

We now select a positive number ε and a subspace V ⊆ F such that

    dim V ≤ n − 1 and ||Q_V T|| ≤ d_n(T) + ε.
As Q_V T ∈ L(E, F/V), by the lifting property there exists a mapping T̃ ∈ L(E, F) such that Q_V T̃ = Q_V T and ||T̃|| ≤ (1 + ε) ||Q_V T||. We note that rank(T − T̃) < n since Q_V(T − T̃) = 0. Consequently,

    a_n(T) ≤ ||T − (T − T̃)|| = ||T̃|| ≤ (1 + ε) ||Q_V T|| ≤ (1 + ε)(d_n(T) + ε).

Letting ε → 0 we finally get a_n(T) ≤ d_n(T). The proof is complete. •

Let V be a linear subspace of a Banach space F. We recall that an operator P ∈ L(F, F) is said to be a projection of F onto V if P(v) = v for all v ∈ V. When the dimension or codimension of V is finite, one proves that there exists a projection P of F onto V such that

    ||P|| ≤ √k if dim V = k, and ||P|| ≤ 1 + √k if codim V = k.

(We give references to the proof in the Annotations.) This result allows us to bound a_n(T) from above in terms of c_n(T) and d_n(T).

Theorem 249.1 Let E and F be arbitrary Banach spaces. Then

    a_n(T) ≤ (1 + √n) c_n(T) and a_n(T) ≤ (1 + √n) d_n(T) for every T ∈ L(E, F).

Proof We shall first show that a_n(T) ≤ (1 + √n) c_n(T). By the definition of c_n(T), given ε > 0 there exists a closed subspace W of E such that

    codim W < n and ||T|_W|| ≤ c_n(T) + ε.

We now take a projection P of E onto W with ||P|| ≤ 1 + √n, and we define Q = T − T P. Since W ⊆ ker Q, we get rank Q = codim ker Q ≤ codim W < n. Consequently,

    a_n(T) ≤ ||T − Q|| = ||T P|| ≤ ||T|_W|| ||P|| ≤ (1 + √n)(c_n(T) + ε).

Thus, letting ε → 0 we get a_n(T) ≤ (1 + √n) c_n(T). We shall now prove that a_n(T) ≤ (1 + √n) d_n(T). By the definition of d_n(T), given ε > 0 there exists a subspace V of F such that dim V < n and

    sup { e(Tx, V) : x ∈ B_E(1) } ≤ d_n(T) + ε.
Let us now consider the operator M = P T, where P is a projection of F onto V with ||P|| ≤ √n. We have rank M ≤ dim V < n and

    ||Tx − Mx|| = ||(I − P)(Tx − v)|| ≤ (1 + √n) ||Tx − v||

for all v ∈ V, x ∈ B_E(1). Hence,

    a_n(T) ≤ ||T − M|| ≤ (1 + √n)(d_n(T) + ε).

Thus, letting ε → 0 we finally get a_n(T) ≤ (1 + √n) d_n(T). This completes the proof. •

The concepts of Kolmogorov and Gelfand s-numbers are dual to each other. In order to explain this fact we need to recall some relevant notions and results from functional analysis. Let X be a normed space.

• A sequence of functionals λ_1, λ_2, λ_3, ... ∈ X* is said to converge to a functional λ ∈ X* in the weak* sense if

    lim_{n→∞} λ_n(x) = λ(x) for every x ∈ X.

This notion of convergence yields the so-called weak* topology on X*.

• Corresponding to each x ∈ X we may define a functional y_x ∈ X** by the equation

    y_x(λ) = λ(x), λ ∈ X*.

The mapping J : X → X**,

    J(x) = y_x,

is linear and isometric, i.e., ||x|| = ||Jx|| for all x ∈ X. It is called the canonical embedding of X into X**. The image J(X) consists of those functionals on X* that are continuous with respect to the weak* topology.

• The space X is called reflexive if the canonical embedding J : X → X** is invertible. For this reason, the bidual of a reflexive space can be regarded as the space itself. Examples of reflexive spaces include L_p(R^n, μ) for 1 < p < ∞.
Although the canonical embedding may fail to be invertible (which is the case e.g., for X = L\(R-n,/^)), it always obeys the principle of local reflexivity, which reads as follows. Given a finite dimensional subspace V of X** and e > 0 there exists a mapping M e L{V^X] such that \\M\\ < l + e and MJv = v for all v e X n J"1 (V). Let us now recall two important duality principles. The first one states that if V is a subspace of X and x G X, then
where the supremum is taken over all A 6 X* such that \(V) = 0 and ||A|| < 1. (Compare Exercise 10 on page 13.) By applying this to X* instead of X we obtain the second duality result. // W is a weak* closed subspace of X* and f 6 X*, then
where the supremum is taken over all x ∈ X such that ||x|| ≤ 1 and w(x) = 0 for every w ∈ W. We are now ready to present the duality between the Kolmogorov and Gelfand s-numbers.

Theorem 251.1 Let n be a positive integer and let T be an arbitrary operator in L(E, F), where E and F are Banach spaces. Then (a) d_n(T*) = c_n(T) and d_n(T) ≥ d_n(T**). Moreover, (b) d_n(T) = c_n(T*) if F is reflexive or else if T is compact.

Proof (a) Given arbitrary linear functionals L_1, L_2, ..., L_{n-1} in E* we define W = span{L_1, L_2, ..., L_{n-1}} and
Thus codim V < n and dim W < n. We have ||T|_V|| = sup_{x ∈ V, ||x|| ≤ 1} ||Tx||. For any x ∈ E and any λ ∈ F*, by the duality principles mentioned above, we obtain
and
Hence,
Now, by taking the infimum over L_1, L_2, ..., L_{n-1} we get d_n(T*) = c_n(T).
We shall now prove that d_n(T) ≥ d_n(T**). To this end we consider the canonical embeddings J_E : E → E**, J_F : F → F** and note that given x ∈ E and f ∈ F* it holds that
Thus, J_F T = T** J_E. Now, we select an arbitrary subspace V of F, dim V < n, and observe that for each x ∈ E and v ∈ V we have
Moreover, ||J_F(x)|| = ||x|| and dim J_F(V) < n. Consequently,
By taking the infimum over V we get d_n(T) ≥ d_n(T**), which completes the proof of assertion (a).

(b) In virtue of (a) it suffices to prove that d_n(T**) ≥ d_n(T). When F is a reflexive space we leave a straightforward verification of this inequality to the reader. It remains to prove it for compact operators T. Given ε > 0 there exists a subspace U of F** such that dim U < n and

Since T is compact, the image T(B_E(1)) has an ε-net, i.e., there exist points x_1, x_2, ..., x_k in B_E(1) satisfying the condition
We now define
Since dim Y is finite, by the principle of local reflexivity it follows that there exists an operator M ∈ L(Y, F) such that ||M|| ≤ 1 + ε and M J_F(T x_j) = T x_j for each 1 ≤ j ≤ k. As the canonical embedding is isometric, for each j we have
Hence, for some elements z_j ∈ U it also holds that

Taking into account that T** J_E = J_F T we now get
Consequently,
and, of course, dim M(U) < n. Since the points T x_j form the ε-net for T(B_E(1)), this yields
On the basis of this inequality, by taking the infimum over U and then letting ε→0 we finally get d_n(T) ≤ d_n(T**). The proof is complete. •

Remark From the proof of assertion (a) it readily follows that if a subspace W of E* is extremal for d_n(T*), then the subspace V(W) of E is extremal for c_n(T), i.e., dim W < n, codim V(W) < n and
Based on the weak* compactness of the unit ball of E*, one proves that there always exists an extremal subspace for d_n(T*), and hence also for c_n(T). •

We close this chapter by quoting (without proof) some recent results on s-numbers related to the Hardy spaces H_p and H_p^*. We remind the reader that
H_p is the linear space of analytic functions f in the unit disk Δ ⊂ C such that the norm
is finite; H_p^* = {fu : f ∈ H_p}, where u(z) = 1 + z^2 for z ∈ C, and the norm in H_p^* is defined by
The spaces H_p and H_p^*, endowed respectively with the norms ||·||_{H_p} and ||·||_{H_p^*}, are Banach spaces. It is easy to note that given numbers p and q such that 1 < q < p < ∞, the restriction of an arbitrary function f ∈ H_p to the interval (-1, 1) is an element of L_q = L_q(-1, 1). Moreover, the restriction of any function g ∈ H_p^* to (-1, 1) is a member of L_∞ = L_∞(-1, 1).

Theorem 254.1 Let I_{p,q} : H_p → L_q and I_p^* : H_p^* → L_∞ denote the restriction operators.
(a) For 1 < q < p < ∞ and n ∈ N there exist positive constants A and B such that
(b) For 1 < p < ∞ and n ∈ N there exist positive constants C and D such that
We should like to remark that the upper bound for a_n(I_p^*) obtains by constructing a rational approximation that depends linearly on g ∈ H_p^*.
6.2.3 Exercises
142. Verify the axioms (a1)-(a5) for (a) s_n = d_n; (b) s_n = a_n; (c) s_n = c_n; (d) s_n = h_n.
143. Assume that G and H are Hilbert spaces, T ∈ L(G, H) and that n ∈ N. Based on the singular value decomposition, find an operator P_n ∈ L(G, H) such that rank P_n < n and a_n(T) = ||T - P_n||.
144. Let n ∈ N and let s be an arbitrary s-scale. Show that s_n(T) = 0 iff rank T < n.

145. Let T ∈ L(E, F), where E and F are Banach spaces. Prove that h_n(T) = h_n(T*) for all n ∈ N.

146. Show that the following conditions are equivalent: 1. T is a compact operator; 2. lim_{n→∞} d_n(T) = 0; 3. lim_{n→∞} c_n(T) = 0. Hint: According to Schauder's theorem, T is compact iff T* is compact.

147. Prove that a_n(T) = a_n(T*) for any continuous linear operator acting from a Banach space into a Hilbert space.

148. Based on the proof technique of assertion (b) of Theorem 251.1, show that a_n(T) = a_n(T*) if T ∈ L(E, F) is a compact operator or else if the space F is reflexive.

149. Let F be a Banach space with the metric extension property, and let B = cb(B) be a subset of F. Show that

for all n ∈ N.
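Exercise 143 can be explored concretely in the finite-dimensional diagonal case, where the singular value decomposition is already given. The sketch below (plain Python; the helper name best_rank_approx and the sample data are ours) keeps the n - 1 largest diagonal entries of T and checks that the operator norm of the residual is the n-th singular value, i.e., a_n(T) = ||T - P_n||.

```python
def best_rank_approx(sigma, n):
    """Given the nonincreasing diagonal (singular values) of T, return the
    diagonal of P_n (rank P_n < n: keep the n - 1 largest entries) and the
    operator norm ||T - P_n|| (the largest discarded entry)."""
    assert all(sigma[i] >= sigma[i + 1] for i in range(len(sigma) - 1))
    kept = sigma[:n - 1] + [0.0] * (len(sigma) - n + 1)
    residual = [s - p for s, p in zip(sigma, kept)]
    return kept, max(residual, default=0.0)

# For a diagonal operator the approximation numbers are the singular values:
# a_n(T) = s_n, and a_n(T) = 0 once rank T < n (compare Exercise 144).
sigma = [5.0, 3.0, 2.0, 0.5]
assert [best_rank_approx(sigma, n)[1] for n in range(1, 6)] == [5.0, 3.0, 2.0, 0.5, 0.0]
```

In the general Hilbert space setting the same truncation of the singular value decomposition plays the role of P_n.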
6.3 Annotations
The majority of the material in this chapter is based on the monographs [17] and [20]. In the literature various n-widths and s-numbers are studied. Here we confine ourselves to those which we find
most important and will use in the next chapter to demonstrate their interplay with the theory of optimal algorithms. The reader interested in a more complete picture of the subject is referred to [1], [4], [5], [11], [14], [17], [19], [20], [22], [23], and [25].

Specific comments

Section 6.1: The Borsuk-Ulam theorem is proven in [12]. Its application to n-widths outlined in the proof of Theorem 225.1 is due to Tichomirov, see [24]. In the literature Theorem 225.1 is frequently called the Fundamental Theorem of Tichomirov. Example 227.1 is based on [16]. The Mazur theorem on supporting hyperplanes is proven in [21]. The proof of Theorem 234.1 is outlined in [20]. The earliest result listed in Table 234.1 is d_n(W_2^r, L_2) = Θ(n^{-r}). It belongs to Kolmogorov [10] and dates back to 1936. The completion of the table was done in the seventies and is due to Kashin [9], [8], Maiorov [15], and Höllig [6]. The reader is referred to [20] for a detailed history of this table. n-Widths of Sobolev classes of multivariate functions are studied in [1] and [23]. Theorem 236.1 is due to Heinrich, see [4].

Section 6.2: The axiomatic theory of s-numbers has been developed by Pietsch. For a detailed study of the subject and some ramifications we refer to [19] and [17]. The omitted proof of Theorem 245.2 can be found in [17]. The extension and lifting properties of Banach spaces are defined and exemplified in [18]. The existence of projections mentioned on page 249 has been established in [3] for codim V < n and in [7] for dim V < n. The principle of local reflexivity is due to Lindenstrauss and Rosenthal, see [13]. For the duality results quoted after this principle and for the existence of extremal subspaces for d_n(T*) we refer to [22]. Theorem 254.1 is due to Wilderotter [26] and improves the results of Burchard and Höllig [2].
6.4 References
[1] K.I. Babenko. Theoretical Background and Construction of Computational Algorithms for Mathematical-Physical Problems.
Nauka, Moscow, 1979. In Russian.

[2] H.G. Burchard and K. Höllig. n-Widths and entropy of Hp classes in Lq(-1, 1). SIAM J. Math. Anal., 16: 405-21, 1985.

[3] D.J.H. Garling and Y. Gordon. Relations between some constants associated with finite dimensional Banach spaces. Israel J. Math., 9: 346-61, 1971.

[4] S. Heinrich. On the relation between linear n-widths and approximation numbers. J. Approx. Theory, 3: 315-33, 1989.

[5] H.P. Helfrich. Optimale Approximation beschränkter Mengen in normierten Räumen. J. Approx. Theory, 4: 165-82, 1971.

[6] K. Höllig. Approximationszahlen von Sobolev-Einbettungen. Math. Ann., 242: 273-81, 1979.

[7] M. Kadec and S. Snobar. Certain functionals on the Minkowski compactum. Math. Notes, 10: 694-96, 1971.

[8] B.S. Kashin. Diameters of some finite-dimensional sets and classes of smooth functions. Math. USSR Izv., 11: 317-33, 1977.

[9] B.S. Kashin. On Kolmogorov diameters of octahedra. Soviet Math. Dokl., 15: 304-07, 1974.

[10] A. Kolmogoroff. Über die beste Annäherung von Funktionen einer gegebenen Funktionenklasse. Annals of Math., 37: 107-10, 1936.

[11] N.P. Korneičuk. Extremal Problems in Approximation Theory. Nauka, Moscow, 1976. In Russian.

[12] K. Kuratowski. Topology. Academic Press, 1968.

[13] J. Lindenstrauss and H.P. Rosenthal. The L_p spaces. Israel J. Math., 7: 325-49, 1969.

[14] G.G. Lorentz. Approximation of Functions. Holt, Rinehart, and Winston, New York, 1966.

[15] V.E. Maiorov. On linear diameters of Sobolev classes. Soviet Math. Dokl., 19: 1491-94, 1978.
[16] A.A. Melkman. n-Widths and optimal interpolation of time- and band-limited functions. In C.A. Micchelli and T.J. Rivlin, editors, Optimal Estimation in Approximation Theory, pages 55-68, Plenum Press, New York, 1977.

[17] A. Pietsch. Eigenvalues and s-Numbers. Geest and Portig, Leipzig, 1987.

[18] A. Pietsch. Operator Ideals. North-Holland, Amsterdam, 1980.

[19] A. Pietsch. s-Numbers of operators in Banach spaces. Studia Math., 51: 201-23, 1974.

[20] A. Pinkus. n-Widths in Approximation Theory. Springer-Verlag, Berlin, Heidelberg, New York, 1985.

[21] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, N.J., 1970.

[22] I. Singer. Best Approximation in Normed Spaces by Elements of Linear Subspaces. Springer-Verlag, Band 171, 1970.

[23] V. Temlyakov. Approximation of Functions with Bounded Mixed Derivatives. Nauka, Moscow, 1986. In Russian.

[24] V.M. Tichomirov. Diameters of sets in function spaces and the theory of best approximation. Russian Math. Surveys, 15: 75-111, 1960.

[25] V.M. Tichomirov. Some Problems in Approximation Theory. Nauka, Moscow, 1976.

[26] K. Wilderotter. n-Widths of Hp-spaces in Lq(-1, 1). Journal of Complexity, 8: 324-35, 1992.
Chapter 7
Optimal Approximation Methods

In this chapter we outline a theory of optimal computational methods for general, nonlinear approximation problems. We define the notions of optimal algorithms and information, and we analyze the classes of parallel and sequential methods. We put special emphasis on linear problems as well as on linear and spline algorithms. Several relationships between optimal methods and n-widths and s-numbers are also exhibited. We start with a simple example that illustrates the theoretical concepts introduced later. We suppose that for a given positive number ε, we wish to compute an ε-approximation M(f) to the integral ∫_0^1 f(t) dt, i.e.,
for some function f : [0,1] → R. We assume that f belongs to the class of absolutely continuous functions whose first derivative exists almost everywhere and is bounded by 1, ||f'||_{ess sup} ≤ 1. By a computational method we understand the sampling process (information) N(f) = [f(t_1),...,f(t_n)], where 0 ≤ t_1 < t_2 < ... < t_n ≤ 1, combined with an algorithm φ : R^n → R used to actually compute M(f) = φ(N(f)). We want to find a method that allows us to compute an ε-approximation for every function f in our class with minimal cost, measured by the number of function samples. For this we must identify an optimal location of the sampling points t_i, which minimizes the error of any method using n samples. To this end, we first observe
that all functions sharing given information y = N(f), for some f in our class, lie between two envelopes f+ and f-, which are piecewise linear functions with slope ±1 (Figure 260.1), where
Therefore, for a given set of sampling points t = [t_1,...,t_n] and samples y = N(f), the integrals of all functions in our class that share the information of f lie between the integrals of f+ and f-:
This implies that the local (worst case) error e(φ, N(f)) of any algorithm φ using N(f) must be

Figure 260.1: The envelopes f+ and f-
We define the global error of an algorithm as the worst case local error
The algorithm with minimal global error e(φ) is thus given by the trapezoid method
where φ° integrates a piecewise linear interpolant of the data N(f). The largest error of φ° occurs for the functions f such that N(f) = 0. This error is given by:
This error is called the radius r(N) of the information N. It depends only on the selection of the sampling points t_i. To minimize r(N) we need to find an optimal choice of the t_i's, or optimal information N. We accomplish this by taking partial derivatives of the function g(t_1,...,t_n) = r(N) with respect to each t_i and setting them equal to 0. By solving the resulting system of linear equations we obtain the optimal choice:
It turns out that the second derivative matrix g''(t) is positive definite for every t = [t_1,...,t_n], i.e., the function g(·) is convex and t° is its global minimum (a verification of this is left to the reader as an exercise). For this specific choice of points we get the minimal radius r(N°) equal to
To guarantee that φ°(N°(f)) computes an ε-approximation for every f we need to take n° sampling points t° such that 1/(4n°) ≤ ε, i.e., n° = ⌈1/(4ε)⌉ is the minimal number of points with this property.

Conclusion: Equispaced sampling combined with the trapezoid algorithm is an optimal method for the solution of our problem. To guarantee an ε-approximation for any function in our class we must sample f at n° = ⌈1/(4ε)⌉ equispaced points t°. The number n° is the minimal number with that property.
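This conclusion can be checked numerically. The sketch below (plain Python; the function names are ours) assumes, as derived above, that the optimal points are t_i = (2i - 1)/(2n) and that the radius of this information is 1/(4n); it integrates the piecewise-linear interpolant of the samples, extended as a constant on the two boundary strips, and verifies the error bound for a sample function with |f'| ≤ 1.

```python
import math

def optimal_info(f, n):
    """Sample f at the n optimal points t_i = (2i - 1) / (2n)."""
    return [f((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]

def trapezoid_algorithm(y, n):
    """Integrate the piecewise-linear interpolant of the samples y,
    extended as a constant on the boundary strips [0, t_1] and [t_n, 1].
    (For these points the method collapses to the midpoint rule h * sum(y).)"""
    h = 1.0 / n
    inner = sum((y[i] + y[i + 1]) * h / 2 for i in range(n - 1))
    return inner + (y[0] + y[-1]) * h / 2

n = 25
f = lambda t: math.sin(t) / 2            # |f'| <= 1/2 <= 1, so f is in the class
approx = trapezoid_algorithm(optimal_info(f, n), n)
exact = (1 - math.cos(1)) / 2
assert abs(approx - exact) <= 1 / (4 * n)   # error within the radius 1/(4n)

# the worst-case "sawtooth" vanishes at every sample point, so the method
# returns 0 while the true integral equals the radius 1/(4n)
g = lambda t: min(abs(t - (2 * i - 1) / (2 * n)) for i in range(1, n + 1))
assert trapezoid_algorithm(optimal_info(g, n), n) == 0.0
```

The sawtooth g realizes the worst case: it shares the zero information with the zero function, yet its integral equals the radius.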
We remark that we have analyzed here the worst case performance of all computational methods based on function samples. In what follows we outline methods based on much more general sampling procedures, applied to arbitrary (in general nonlinear) problems.
7.1 A general approximation problem — information and algorithms
We consider an approximation problem defined by an (in general) nonlinear operator S acting from a subset F of a linear space 𝓕 into a normed linear space G, S : F ⊂ 𝓕 → G. For every element f in F we want to compute an approximation M(f) to S(f). More precisely, given a tolerance ε > 0, we wish to compute M(f) within a distance of at most ε from S(f), i.e., satisfying the absolute error criterion
Assuming that the operator S is an embedding, S = I, our problem reduces to the approximation problem of finding M(f) which is within a distance ε from the element f. To compute M(f) we need to gather some information about the element f; e.g., if f is a function, we sample f at a discrete set of arguments. We assume that in general we can compute information operations L(f), L : 𝓕 → R or C, where L is a real or complex linear functional. Examples of such functionals L are function and/or derivative evaluations, scalar products, integrals, etc. We denote the class of all such L's by 𝓛. For each f we compute a finite number n of unit information operations, which constitute the information N about f. The information N is called nonadaptive (parallel) iff
where L_i ∈ 𝓛 for every f ∈ F, and where the L_i are linearly independent. We stress that the linear independence assumption does not reduce generality and is made to simplify some proofs. Sometimes we consider information consisting of continuous nonlinear functionals, and then we obviously drop this assumption. The number n is called the cardinality of N. In the case of nonadaptive information the functionals L_i are given a priori, before any computation takes place. Such information can easily be implemented on a parallel (even distributed memory) computer with up to n processors,
yielding optimal speed-up over sequential one-processor computation. The class of adaptive (sequential) information does not enjoy this desirable property: every L_i evaluation depends on all previously computed values. We stress that adaptive information can be much more powerful than nonadaptive information for some problems. The adaptive information is therefore defined as

where y_i(f) = [y_{i-1}(f), L_i(f; y_{i-1}(f))] ∈ R^i or C^i, y_1(f) = L_1(f), and L_i(·; y_{i-1}(f)) ∈ 𝓛, i = 2,...,n. For adaptive information, the next evaluation (functional) L_i is defined with respect to all previously computed information operations given in the vector y_{i-1}. Such a structure is not easily implementable on a parallel computer and is naturally suited for sequential single-processor implementation. Two general properties of the information follow.

(i) It is partial, i.e., the operator N is in general many-to-one. Thus, knowing N(f), we cannot uniquely identify f.

(ii) It is priced, i.e., each evaluation L_i costs, say, c units of time, and the total N(f) therefore costs c·n plus the cost of constructing all the functionals L_i in the adaptive case.

We are interested in information that has minimal cost. For many applications the cost of constructing the L_i's is negligible, and therefore the above problem reduces to finding information with minimal cardinality m(ε) that makes possible the computation of an ε-approximation M(f) for every f ∈ F. This minimal cardinality is called the information complexity of the problem in the class 𝓛. Section 7.6 further discusses cost issues and optimal complexity methods. Knowing the information vector N(f), we compute an approximation M(f) to S(f). This is accomplished by an algorithm φ, defined as any transformation of the information vector N(f) into the space G, i.e.,

We are interested in algorithms that use minimal resources (information with minimal cardinality and a small number of arithmetic operations) to solve our problem. Such algorithms φ compute
M(f) = φ(N(f)) for every f ∈ F, which satisfies the absolute error criterion, and have smallest (or close to smallest) cost (Section 7.6).

Remark We stress that classical numerical methods (algorithms) can be cast into our framework. The notions of information and algorithm are not separated there, but rather used jointly as a method or an algorithm. •

By a computational method M for solving the problem (S, F) we understand an algorithm φ and information N combined, M = (φ, N) (M(·) = φ(N(·))).

Remark (Need for restricting the set F) We consider an approximation problem given by the embedding operator S = I. To be able to find information with small (finite) cardinality for computing ε-approximations we need to assume that the set of problem elements F is essentially smaller than the space 𝓕. Indeed, Theorem 1.10 states that if F = 𝓕 then there exist f's in F such that the distance between f and any increasing sequence of finite-dimensional subspaces can decrease arbitrarily slowly. This means that if we take any ε > 0, any cardinality n, and any linear method using the information N, M(f) = Σ_{i=1}^n a_i L_i(f), where the a_i are linearly independent elements in 𝓕, then there exists an element f ∈ F such that the error ||M(f) - f|| > ε. Therefore in most applications we assume that the set F is a ball of finite radius, or a bounded convex subset of 𝓕. •

In the following we introduce the concepts of radius and diameter of information, define the error of an algorithm, and introduce optimal and central algorithms. These notions will enable us to characterize the quality of computational methods.

7.1.1
Radius of information — optimal algorithms
The radius and diameter will characterize the quality of information: the smaller the radius and diameter, the better the method that can be obtained. Let y = N(f) be the n-dimensional information vector for some f ∈ F. The set of elements in F indistinguishable from f by N will be denoted by V(y) (see Figure 265.1),
Figure 265.1: The sets V(y) and S(V(y))

We define the set S(V(y)) as the set of solutions for all problem elements that share the information vector y,
The local radius of information r(N, y) is defined as the radius of the set S(V(y)), i.e.,
The local diameter of information d(N, y) is the diameter of the set S(V(y)), i.e.,
The global radius r(N) and the global diameter d(N) of the information N are defined as the worst case local radius and the worst case local diameter, respectively, i.e.,
and
It is not difficult to prove the following relationships between the radii and diameters of information.
Lemma 266.1 For every vector y = N(f), f ∈ F, and any information N we have: (i) 1/2 d(N, y) ≤ r(N, y) ≤ d(N, y); (ii) 1/2 d(N) ≤ r(N) ≤ d(N).

The radius of information bounds from below the error of any algorithm φ using the information N. More precisely, we define the local error e(φ, y) and the global error e(φ) of any algorithm φ as:
and
Theorem 266.1 We have:
and
Proof We will only prove the first equation since the proof of the second is similar. Let A = inf_{φ ∈ Φ(N)} e(φ, y) and B = r(N, y). In order to prove that A = B we take any algorithm φ and observe that
This yields B ≤ A. Now we take any δ > 0 and c_δ ∈ G such that
Then the algorithm φ_δ(y) = c_δ, y ∈ N(F), is well defined (based on the axiom of choice). By taking the infimum with respect to the algorithms
Since δ is arbitrary we get A ≤ B, which completes the proof. •

Theorem 266.1 indicates that all algorithms have errors at least equal to the corresponding radii of information. The strongly (locally) and globally optimal error algorithms are therefore defined as follows. The algorithm φ° is a strongly (locally) optimal error algorithm iff
The algorithm φ°° is a (globally) optimal error algorithm iff
These definitions immediately imply that every strongly optimal algorithm is also globally optimal. It may, however, happen that the local error of a globally optimal algorithm is much larger than the local radius of information. We define B(c(y), r) as a smallest ball with the center c(y) and the radius r = r(N, y) containing the set S(V(y)). We let C(y) be the set of all centers of such balls. We now suppose that for every y ∈ N(F) the center set C(y) is not empty. We choose any element c = c(y) in C(y), and define the central algorithm φ_c(y) by
Lemma 267.1 An algorithm φ is central iff it is a strongly optimal error algorithm.

The proof of this lemma is left to the reader as an exercise at the end of this section. The next lemma characterizes the existence of central algorithms.
Lemma 268.1 If for every y ∈ N(F) there exists an element c = c(y) ∈ G such that the set S(V(y)) is symmetric with respect to c, i.e., z ∈ S(V(y)) implies 2c - z ∈ S(V(y)), then: (i) c is the center of the set S(V(y)); (ii) d(N, y) = 2 r(N, y).

Proof (i) We assume the contrary, i.e., that c is not a center of the set S(V(y)). Then there exists an element b ∈ G such that
Hence, for some z ∈ S(V(y)) we have sup_{w ∈ S(V(y))} ||b - w|| < ||c - z||. We define h = c - z. Since S(V(y)) is symmetric with respect to c, the element 2c - z = c + h belongs to S(V(y)). Therefore,
which is not possible.

(ii) If r(N, y) = +∞ then assertion (i) of Lemma 266.1 implies r(N, y) = d(N, y) = +∞, i.e., (ii) holds. Thus we assume now that r(N, y) is finite. Choosing a (small) positive δ and z_δ ∈ S(V(y)) such that ||c - z_δ|| ≥ r(N, y) - δ, and setting c - z_δ = h, we get z*_δ = c + h ∈ S(V(y)). Therefore, d(N, y) ≥ ||z*_δ - z_δ|| = 2||h|| = 2||c - z_δ|| ≥ 2 r(N, y) - 2δ. Taking δ→0+ we obtain d(N, y) ≥ 2 r(N, y). This combined with assertion (i) of Lemma 266.1 yields 2 r(N, y) ≥ d(N, y) ≥ 2 r(N, y). Hence, (ii) follows. •

It is important to analyze algorithms that are "nearly" strongly optimal or, more precisely, that compute a solution belonging to the set S(V(y)). Such algorithms will be called interpolatory, since they provide an exact solution for some element that interpolates our unknown f with respect to the given information N. An algorithm φ is called interpolatory iff
It turns out that the local error of interpolatory algorithms is at most twice the local radius of information. This property is formulated in the following theorem.
Theorem 269.1 For every interpolatory algorithm φ^I and every y ∈ N(F) we have: (i) e(φ^I, y) ≤ 2 r(N, y), and (ii) e(φ^I) ≤ 2 r(N).
Proof We only show (i) since the proof of (ii) is similar. To this end, it is enough to prove that e(φ^I, y) ≤ d(N, y) (see Lemma 266.1 and the first equation of Theorem 266.1). We observe that

which completes the proof. •
Figure 269.1: Local diameter of information

There exist interpolatory algorithms that are strongly optimal. Examples include the trapezoid algorithm discussed in the introductory example to this chapter as well as the spline algorithms exhibited in Section 7.4. These are commonly applied to numerical integration, approximation, and even nonlinear zero finding problems. There also exist interpolatory algorithms that are not optimal. The example that follows exhibits this phenomenon.
Example 270.1 Let S = I be the approximation problem, F = {f : [0,1] → R, f continuous, ||f||_∞ ≤ 1}, and let y = N(f) = [f(x_1),...,f(x_n)] be parallel information. We consider an interpolatory algorithm φ^I(y) = S(g) = g, where g ∈ V(y). Then e(φ^I, y) ≤ d(N, y) = 2 (as illustrated in Figure 269.1 on page 269), and for f = 1, y° = [1,...,1] we have e(φ^I, y°) = 2. Thus e(φ^I) = 2. Taking the non-interpolatory algorithm φ(y) = 0 one gets e(φ) = 1 = r(N), i.e., this φ is optimal. •
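Example 270.1 can be illustrated numerically. In the sketch below (plain Python; the steep "tent" interpolants and the grid check are our own construction), the data is y = 0 at three nodes: the zero algorithm is at distance 1 from every member of V(0), while two extreme interpolants of the same data are at distance 2 from each other, matching d(N, 0) = 2 and r(N) = 1.

```python
nodes = [0.25, 0.5, 0.75]

def tent(t, sign, eps):
    """Continuous, vanishes at every node, and equal to sign * 1 as soon as
    t is at distance >= eps from all nodes; for small eps these functions
    are members of V(0) for the class {f continuous, ||f||_inf <= 1}."""
    d = min(abs(t - x) for x in nodes)
    return sign * min(1.0, d / eps)

eps = 1e-3
grid = [k / 10000 for k in range(10001)]
# sup-distance (on a grid) between two extreme interpolants of y = 0:
diam = max(abs(tent(t, +1, eps) - tent(t, -1, eps)) for t in grid)
# error of the zero algorithm phi(y) = 0 against either extreme member:
err_zero = max(abs(tent(t, +1, eps)) for t in grid)
assert diam == 2.0 and err_zero == 1.0
```

An interpolatory algorithm returning tent(·, +1, eps) thus has local error 2, twice the radius, while the non-interpolatory zero algorithm attains the radius 1.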
7.1.2 Exercises
150. Verify that the largest error of the trapezoid algorithm φ° in the introductory example to this chapter occurs for the function f = 0. Compute the partial derivatives and solve the resulting tridiagonal system of linear equations for the optimal sampling points t°. Show that the second derivative matrix g''(t) is positive definite for every vector t = [t_1,...,t_n].

151. Prove the relationships between diameters and radii of information given in Lemma 266.1.

152. Using the definitions of strongly optimal and central algorithms, prove Lemma 267.1.

153. Provide formal proofs of the second equation in Theorem 266.1 and of assertion (ii) of Theorem 269.1.
7.2 Linear problems
Many important applications, e.g., approximation, integration, computation of derivatives, and interpolation (compare Chapter 8), are defined by a linear operator acting from a convex, bounded set F in 𝓕 into the space G. This is why the theory of problems defined by a linear operator is of special interest to us. Precisely, by a linear problem we understand the triple:
where S : 𝓕 → G is a linear operator, T : 𝓕 → H is a linear operator (called the restriction operator), H is a normed linear space, and N is information consisting of evaluations of linear functionals (chosen from a given class of all linear functionals). The class of problem elements is then defined as:
Such an F is a convex and balanced subset of 𝓕. Instead of defining F by a linear restriction operator T, we could, in an essentially equivalent way (see annotations to this chapter), define F to be a convex balanced subset of 𝓕. In this section we analyze linear problems with parallel information, and in Section 7.3 we show that this assumption is not restrictive. Actually, we show that sequential information does not provide much better means of approximating solutions than a specific choice of parallel information. This is especially important for the design of computational methods for multicomputers (Section 7.3). Assume now that N is parallel,
where the L_i are linear functionals chosen from a given class 𝓛. The radius of information establishes an intrinsic uncertainty in the solution of a problem due to the partial information N. It is closely related to the diameter of N: r(N) ≤ d(N) ≤ 2 r(N). For linear problems, the diameter d(N) can be easily obtained.

Lemma 271.1 For every linear problem (S, T, N) with parallel information, the diameter d(N) is given by

d(N) = 2 sup{ ||Sh|| : h ∈ ker N ∩ F }.
Proof Let c = 2 sup{ ||Sh|| : h ∈ ker N ∩ F }. We first show that d(N) ≤ c. To this end, we take any y ∈ N(F) and any two elements f_1, f_2 ∈ F such that N(f_1) = N(f_2) = y. Denoting h = 1/2 (f_1 - f_2) we get h ∈ ker N, and ||Th|| ≤ 1/2 (||Tf_1|| + ||Tf_2||) ≤ 1/2 (1 + 1) = 1, i.e., h ∈ ker N ∩ F. Therefore,
which implies that
Now, we show that d(N) ≥ c. To this end, we take any h ∈ ker N ∩ F, and denote f_1 = h, f_2 = -h. Then N(f_1) = N(f_2) = 0, and ||T f_i|| = ||Th|| ≤ 1, i.e., f_i ∈ F. This yields
Taking the supremum over h ∈ ker N ∩ F we get
which completes the proof. •

We note that the second part of the proof (d(N) ≥ c) shows that the diameter d(N) is equal to the local diameter d(N, 0) for the zero data vector y = 0. This indicates that the zero data is the worst case data in terms of local diameters of information.

Corollary 272.1 For every linear problem (S, T, N) with parallel information N the worst case data vector is given by the zero data:
We now state an important corollary of Lemma 271.1.

Corollary 272.2 For every linear problem (S, T, N) with parallel information N, if ker N ∩ ker T is not a subset of ker S, then the problem is unsolvable with finite error, i.e., r(N) = d(N) = +∞.

Proof We note that
with the notation 0/0 = 0. If there exists h ∈ ker N ∩ ker T such that S(h) ≠ 0, then the right-hand side of the equation above is infinite. This completes the proof. •

Below we give an example to illustrate Corollary 272.2.
Example 273.1 We let S = I, 𝓕 = C^r(0,1) = {f : [0,1] → R : f^(r) ∈ C(0,1)}, where r ≥ 2, and G = C(0,1). We define the restriction operator T : 𝓕 → C(0,1) by Tf = f^(r). The set F is now given by F = {f ∈ 𝓕 : ||f^(r)||_∞ ≤ 1}.
(i) We first consider parallel evaluations of function values as the information N, N(f) = [f(x_1),...,f(x_n)], for x_i ∈ [0,1], x_i ≠ x_j for i ≠ j. Corollary 272.2 indicates that if the number of evaluations n is less than the regularity r, then we cannot solve the problem with finite error. Indeed, for n < r the function h(t) = (t - x_1)···(t - x_n) has the properties h ∈ ker N ∩ ker T and h ∉ ker S.

(ii) We now suppose that we have parallel evaluations of arbitrary linear functionals as the information N, N(f) = [L_1(f),...,L_n(f)]. Does it still hold that r(N) = +∞ for n < r? Yes, it does. We justify this by showing the existence of an element w ∈ ker N ∩ ker T with w ∉ ker S. We let w(t) = Σ_{i=0}^n a_i t^i. By setting L_j(w) = Σ_{i=0}^n a_i L_j(t^i) = 0, j = 1,...,n, we obtain a homogeneous system of n linear equations with n + 1 unknowns a_i, i = 0,...,n. There obviously exists a nonzero solution to this system, say [a°_0,...,a°_n], with a°_{i_0} ≠ 0 for some i_0 ∈ {0,...,n}. Taking w°(t) = Σ_{i=0}^n a°_i t^i we get N(w°) = [0,...,0], Tw° = 0, and Sw° ≠ 0. One may ask whether the above negative results are due to the choice of linear information. As we will see below, this is not the case.

(iii) Now we suppose that the information N consists of parallel evaluations of arbitrary continuous nonlinear functionals. In this case we cannot apply Corollary 272.2, since it deals only with linear information. We can make use of the famous Borsuk-Ulam theorem on antipodes, which we formulate here in a much more general setting than on page 225. Let f ∈ 𝓕 and let h_1,...,h_{n+1} ∈ 𝓕 be linearly independent elements. We define a generalized parallelepiped P in the space 𝓕 by P = {g ∈ 𝓕 : g = f + Σ_{i=1}^{n+1} a_i h_i, a_i ∈ R, |a_i| ≤ 1}. We let N : P → R^n be any continuous transformation. Then there exist two elements f_1, f_2 belonging to opposite faces of P such that N(f_1) = N(f_2).
This means that f_j = f + Σ_{i=1}^{n+1} a_{i,j} h_i, |a_{i,j}| ≤ 1, j = 1, 2, and there exists i_0 ∈ {1,...,n+1} such that a_{i_0,1} = -a_{i_0,2} = ±1.
We now assume that n < r and N(f) = [L_1(f),...,L_n(f)] for continuous nonlinear functionals L_i, L_i : 𝓕 → R. Given an arbitrary increasing sequence of points t_j ∈ (0,1), j = 1,...,n+1, and a positive constant c, we define the n + 1 linearly independent elements h_i(t) = c Π_{j=1, j≠i}^{n+1} (t - t_j), i = 1,...,n+1. Since Th_i = 0 for every i, all the h_i's belong to the set F, and consequently all linear combinations of them also belong to F. The Borsuk-Ulam theorem implies that there exist f_1, f_2 belonging to opposite faces of P = {g ∈ 𝓕 : g = Σ_{i=1}^{n+1} a_i h_i, |a_i| ≤ 1} for which N(f_1) = N(f_2). Thus,
Clearly f_1 and f_2 belong to F, and
where 0 < b = min_{1≤i≤n+1} |Π_{j=1, j≠i} (t_i − t_j)|. By taking c → +∞ we obtain d(N) = +∞.
(iv) It can be shown for a large class of problems S that, by choosing N as arbitrary nonlinear information (not continuous!), only one evaluation of a nonlinear functional L is needed to guarantee d(N) = 0, i.e., the problem becomes trivial to solve (see Annotations to this chapter). •
We now characterize the existence of an optimal algorithm in terms of a specific choice of a mapping M : N(ℱ) → ℱ:
Theorem 274.1 If there exists a mapping M : N(ℱ) → ℱ such that

then d(N) = 2r(N) and the algorithm φ_0 = S ∘ M ∈ Φ(N) is optimal.
Proof Indeed, we have
Thus, e(φ_0) = r(N) = ½ d(N), as claimed. •
Corollary 275.1 Let ℱ be a unitary space over the field K = ℝ or K = ℂ and let F be a ball in ℱ centered at 0. Let the information operator N : ℱ → K^n be defined by
where ζ_1, ..., ζ_n are linearly independent elements of ℱ and ⟨·,·⟩ is the inner product in ℱ. Then d(N) = 2r(N) and the algorithm
is optimal. Here N* is the adjoint of N and G(ζ_1, ζ_2, ..., ζ_n) is the Gram matrix.
Proof For x = (x_1, x_2, ..., x_n) ∈ K^n we have N*x = Σ_{j=1}^{n} x_j ζ_j. Thus, N N* is the Gram matrix G(ζ_1, ζ_2, ..., ζ_n). We now set M = N*(N N*)^{−1} and note that for any f
and
We note that these equations imply that M(Nf) is the orthogonal projection of f onto (ker N)^⊥. Theorem 274.1 yields the equations ½ d(N) = r(N) = e(S ∘ M), which completes the proof. •
We shall now use this corollary to find an optimal algorithm for recovering band- and energy-limited signals from their samples.
Example 275.1 Given positive numbers a, τ, E, let J(a, τ, E) be the subset of W(a, τ) defined in Example 227.1. Thus, functions f in J(a, τ, E) are signals of bandwidth [−a, a] and of energy ||f||_{2,∞} at most E. We assume that they are given through the information operator N : W(a, τ) → ℂ^n,
We aim to find an optimal algorithm for recovering f from Nf, i.e., a mapping φ_0 : N(J(a, τ, E)) → L²[−τ, τ] such that
is as small as possible. Here, ||·||_{2,τ} is the norm in L²[−τ, τ]. To this end, we note first that by the Paley–Wiener and Parseval theorems (see p. 23), for each signal f ∈ J(a, τ, E) there exists a unique function x_f ∈ L²(−a, a) such that ||x_f||_2 ≤ E/(2π) and
where ⟨·,·⟩ is the inner product in L²(−a, a). Thus, setting
and
we obtain f = S x_f, Nf = Ñ x_f, and ||f||_{2,∞} ≤ E iff ||x_f||_2 ≤ E/(2π). Now, from Corollary 275.1 it follows that the algorithm
is optimal for recovering f from Nf. Since ⟨u_j, u_k⟩ = sin(a(t_j − t_k))/(π(t_j − t_k)) for j, k = 1, 2, ..., n, we get
where the coefficients c_1, c_2, ..., c_n are determined by the solution of the linear system
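A numerical sketch of this recovery algorithm follows. It assumes, as an illustration, the sinc kernel K(x) = sin(ax)/(πx) suggested by the Gram entries above; the bandwidth a and the sample points t_j are our own hypothetical choices, not values from the text:

```python
import numpy as np

# Sketch of the recovery algorithm of Example 275.1.  The approximant
# has the form f_hat(t) = sum_j c_j K(t - t_j), where G c = y and
# G[j, k] = K(t_j - t_k) = sin(a (t_j - t_k)) / (pi (t_j - t_k)).
a = np.pi                                        # bandwidth (illustrative)
t = np.array([-0.8, -0.3, 0.2, 0.7, 1.1])        # sample points (illustrative)

def K(x):
    # sin(a x)/(pi x); np.sinc handles the removable singularity at 0
    return (a / np.pi) * np.sinc(a * x / np.pi)

def recover(y):
    G = K(t[:, None] - t[None, :])               # Gram matrix of the kernel
    c = np.linalg.solve(G, y)
    return lambda s: c @ K(s - t)

# sanity check: an element of the span of the kernels (K centered at
# the first node) is recovered exactly from its samples
f = lambda s: K(s - t[0])
f_hat = recover(f(t))
assert abs(f_hat(0.4) - f(0.4)) < 1e-8
```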
7.2.1 Optimal information
By optimal information in a given class 𝓛 we understand information with minimal radius. More precisely, we let N be a nonadaptive information
for some class 𝓛 of permissible information operations. We then denote by 𝓛_n the class of all such information, and we define the n-th minimal radius in the class 𝓛_n by
We also denote by r(n) = r(n, 𝓛*) the n-th minimal radius in the class of all linear functionals 𝓛*. Similarly, we define the n-th minimal diameter in the class 𝓛 = 𝓛* as
The information N° is called optimal in the class 𝓛_n iff
In many applications it is easier to minimize the diameter of information than its radius. The resulting radius is then at most two times larger than the radius of optimal information. Therefore, we call information with minimal diameter almost optimal. Below we illustrate this concept by an example.
Example 277.1 We let ℱ = {f ∈ C(0,1) : f absolutely continuous} and Tf = f′. Thus, F = {f ∈ ℱ : ess sup_{t∈[0,1]} |f′(t)| ≤ 1}. We define S = I and assume that the space G = ℱ is endowed with the infinity norm, ||f||_∞ = sup_{t∈[0,1]} |f(t)|. •
We first want to find information with minimal diameter which consists of function values. We therefore take
Then, according to Lemma 271.1, the diameter d(N) is given by:
We let m = max{2x_1, x_2 − x_1, ..., x_n − x_{n−1}, 2(1 − x_n)}, and suppose without loss of generality that m = x_{i+1} − x_i. We define
Then g ∈ ker N, and d(N) ≥ 2||g||_∞ = 2m/2 = m. Now we choose an arbitrary h ∈ ker N such that ||h′||_{ess sup} ≤ 1. We let x ∈ [x_j, x_{j+1}],
with the notation x_0 = 0 and x_{n+1} = 1. We have h(x_j) = 0. Without loss of generality we assume x ≤ (x_j + x_{j+1})/2. Then we obtain
This implies ||h||_∞ ≤ m/2 and d(N) ≤ 2m/2 = m. Thus, d(N) = m. Clearly d(N) is minimized for the choice of equispaced points x_j^0 = (2j − 1)/(2n), j = 1, ..., n.
Conclusion The information N°(f) = [f(x_1^0), ..., f(x_n^0)] has minimal diameter d(N°) = 1/n in the class of all parallel function evaluations.
Can we do better by allowing "more powerful" information N consisting of parallel continuous nonlinear functionals? It turns out that we cannot! We justify this by again applying the Borsuk–Ulam theorem. Indeed, we take the functions:
with the notation x_0^0 = −1/(2n), x_{n+1}^0 = 1 + 1/(2n), i = 0, ..., n. Since these functions are linearly independent, the Borsuk–Ulam theorem implies that there exist functions g_j,
with |a_{i,j}| ≤ 1, such that N(g_1) = N(g_2), and for some i_0 we have a_{i_0,1} = −a_{i_0,2} = ±1. Therefore
Conclusion The information N° has minimal diameter in the class of all parallel evaluations of arbitrary nonlinear continuous functionals. •
From now on we assume that the information N consists of parallel evaluations of arbitrary linear functionals, i.e., 𝓛 = 𝓛*. We will characterize minimal diameters of such information by properties of the linear operator K = S T^{−1} : H → G, and then will relate the minimal diameters to the Gelfand n-widths of the solution set S(F) in the space G: C^{n+1}(S(F), G). We remark that in the case when the information N is continuous the above n-th width is called the Gelfand (n+1)-st s-number. This situation is outlined in detail in Section 7.5. We assume that the operator K is one-to-one (for the general case see annotations), and define the U-norm of the operator K, with U an arbitrary linear subspace of H, as
The n-th minimal norm of the operator K (the (n+1)-st Gelfand s-number of the operator K) is defined as
Below we prove that the n-th minimal radius of information r(n) is bounded from below by the n-th minimal norm k(n).
Lemma 279.1 The following estimate holds:
Proof Lemma 271.1 implies that
We set u = Th for any h ∈ ker N ∩ F. Then ||u|| ≤ 1, and u ∈ T(ker N). This yields
We have

since codim(T(ker N)) ≤ codim(ker N) = dim(N(ℱ)) ≤ n. Thus, r(N) ≥ ||K||_{T(ker N)} ≥ k(n). Since this inequality holds for arbitrary information N, by taking the infimum over N we get r(n) ≥ k(n). •
We observe that for any class 𝓛 ⊂ 𝓛* we get r(n, 𝓛) ≥ r(n), so that the above lemma implies r(n, 𝓛) ≥ k(n) as well. We can also obtain an upper bound on r(n) in terms of the n-th minimal norm of K. Namely, we show that r(n) is at most twice k(n).
Lemma 280.1 There exists information N° such that
Proof To prove this lemma it is enough to show the existence of information N° such that
Indeed, assume without loss of generality (see the Remark on page 253) that there exists an extremal subspace U° of H such that k(n) = ||K||_{U°}. We first decompose the space H as the direct sum H = U° ⊕ U°^⊥, where codim(U°) = dim(U°^⊥) = d ≤ n. This decomposition means that there exist elements u_i ∈ H and linear functionals L_i° defined on H, i = 1, ..., d, such that every element u ∈ U°^⊥ has the following decomposition:
We now define the information
and show that this information satisfies the equation d(N°) = 2k(n). Using Lemma 271.1 we obtain
which completes the proof. •
Lemma 280.1 gives qualitative estimates of the n-th minimal radius in the class of all functionals 𝓛*. It translates the problem of
finding the minimal radius into that of finding extremal subspaces of the operator K. This problem is well studied for various operators K in classical approximation theory. Moreover, if the information N° is such that r(N°) = d(N°)/2, then
i.e., it is the n-th optimal information. Indeed, Lemma 279.1 and Lemma 280.1 imply that
which yields the above equation. The information N° is specialized to the case when G and H are separable Hilbert spaces in the annotations to this chapter.

7.2.2 Relations to n-widths
The algebraic Gelfand n-width C^n(B, ℱ) of a set B in the linear space ℱ is defined as the maximal norm of an element in B ∩ V_{n−1}, minimized with respect to all choices of linear subspaces V_{n−1} ⊂ ℱ of codim(V_{n−1}) ≤ n − 1 (compare Section 7.1). In this section we do not assume that the linear operator Q in the formal definition of C^n(B, ℱ) is continuous, i.e.,
where Q is a linear operator, Q : ℱ → ℝ^{n−1} (ℂ^{n−1}). Equivalently, we can write C^n(B, ℱ) as:
since every V_{n−1} can be described as V_{n−1} = ker Q for Q as above. We stress that the case of continuous Q is analyzed in Section 7.5. We let C^{n+1} = C^{n+1}(S(F), G) be the algebraic Gelfand (n+1)-st width of the range of the solution operator in the space G. We shall show that C^{n+1} is closely related to the n-th minimal diameter d(n). We first decompose ℱ as the direct sum ℱ = ker S ⊕ ker S^⊥. This means that every element f ∈ ℱ has a unique decomposition f = g + h, for g ∈ ker S and h ∈ ker S^⊥. We define the constant q (q = q(F, S)) as
Since the set F is absorbing and balanced, the quantity q is well defined and nonnegative. In the following we show that q ≤ 1 or q = +∞. If ker S^⊥ ⊂ F, then for every f ∈ F we have ρh ∈ F for every choice of nonnegative ρ, i.e., q = +∞. Otherwise q ≤ 1. Indeed, if for every f ∈ F (f = g + h) we have h ∈ F, then q = 1 (the proof of this is left as Exercise 157 on page 285). If there exists f_0 ∈ F (f_0 = g_0 + h_0) such that h_0 ∉ F, then q ≤ 1. To show this, we suppose to the contrary that q > 1, i.e.,
Then there exists ρ > 1 such that ρh_0 ∈ F. We decompose h_0 as h_0 = (1/ρ)(ρh_0) + (1 − 1/ρ)·0. Since ρh_0 ∈ F, 0 ∈ F, and F is convex, we get h_0 ∈ F, which contradicts our assumption. The following example illustrates the case when q = 1.
Example 282.1 We take ℱ = K^n, S(f) = f_1 for f = [f_1, ..., f_n], and T = I. Then F = {f ∈ K^n : ||f||_2 ≤ 1} is the unit ball in K^n,
and
For every f ∈ F, f = g + h, we get
since |f_1| ≤ 1. If f = [1, 0, ..., 0] then sup{ρ ∈ ℝ : ρh ∈ F} = 1, which shows that q = 1 for this case. •
We are now ready to formulate the main result of this section.
Theorem 282.1 The n-th minimal diameter d(n) satisfies the following bounds:
with the notation 0·(+∞) = 0. Moreover, if r(n) = d(n)/2 and q = 1, then
Proof We first show d(n) ≤ 2C^{n+1}. To this end we take any linear subspace V_n of G such that codim(V_n) ≤ n. This means that
for some linear functionals L_i ∈ 𝓛*. Taking the information
we get ker N = {f ∈ ℱ : S(f) ∈ V_n}, i.e., S(ker N) = V_n. Lemma 271.1 implies
By taking the infimum with respect to all choices of V_n ⊂ G we get
We now proceed to show that
We split this proof into three cases:
In case (i) the inequality is trivially satisfied since d(n) ≥ 0. In case (ii) we assume for simplicity that S(ℱ) = G, with the general case S(ℱ) ≠ G left as an exercise. We denote N(f) = [L_1(f), ..., L_n(f)] for any L_i ∈ 𝓛*, and then decompose the space ℱ as the direct sum ℱ = ker S ⊕ ker S^⊥, i.e., any f ∈ ℱ has a unique decomposition f = g + h, g ∈ ker S, h ∈ ker S^⊥. By taking now any w ∈ G we get
for some f ∈ ℱ, h ∈ ker S^⊥, g ∈ ker S. We define the linear functionals L_i°(w) = L_i(h), i = 1, ..., n, and let
Then obviously codim(V_n) ≤ n, and
Now we take any small positive number t, t ∈ (0, q), and any element h ∈ ker S^⊥. Since S(h) ∈ S(ℱ) there exists f ∈ F, f = f_1 + f_2, such that S(f) = S(f_2) = S(h), which implies that f_2 = h. The definition of the number q for this specific choice of f yields the existence of a constant c = c(f) ≥ q − t such that c f_2 ∈ F. By the convexity of F we get (q − t) f_2 = ((q − t)/c)·(c f_2) + (1 − (q − t)/c)·0 ∈ F. Therefore
Taking the infimum with respect to N we obtain
Since t is an arbitrarily small positive number, we get 2C^{n+1} q ≤ d(n), which establishes the desired inequality in case (ii).
In case (iii) (q = +∞) we first assume that dim(G) ≤ n. Then for the subspace V_n = {0} we have codim(V_n) ≤ n, which implies C^{n+1} = 0. Then 2C^{n+1} q = 0·(+∞) = 0 ≤ d(n) trivially holds. In particular we have d(n) = 0, since d(n) ≤ 2C^{n+1}. Now we show that d(n) = +∞ if dim(G) > n. We first note that there exist n+1 linearly independent elements S(f_1), ..., S(f_{n+1}) ∈
G for some f_i ∈ ker S^⊥. Then the f_i's are linearly independent. Therefore, lin(f_1, ..., f_{n+1}) ⊂ F. Considering arbitrary information
where [c_1, ..., c_{n+1}] is a non-zero solution of the homogeneous system of n linear equations with n+1 unknowns:
Therefore f ∈ ker N ∩ ker S^⊥, and S(f) ≠ 0. Then for every c, cf ∈ F ∩ ker N and ||S(cf)|| = |c| ||S(f)|| → +∞ as |c| → +∞. This shows that d(N) = +∞ for every N, i.e., also d(n) = +∞. The inequality d(n) ≤ 2C^{n+1} implies C^{n+1} = +∞; thus the inequality 2qC^{n+1} ≤ d(n) holds in this case also. This completes the proof of the theorem. •
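The claim q = 1 of Example 282.1 can be checked numerically. In the sketch below (the helper names and sampling scheme are ours), random points of the unit ball are drawn and sup{ρ ≥ 0 : ρh ∈ F} = 1/|f_1| is evaluated; the infimum 1 is attained at the extreme point f = e_1:

```python
import numpy as np

# Numerical check of Example 282.1: F is the unit ball of R^n,
# S(f) = f_1, T = I.  Here ker S = {f : f_1 = 0}, its orthogonal
# complement is span(e_1), and the decomposition f = g + h has
# h = f_1 e_1.  Hence sup{rho >= 0 : rho h in F} = 1/|f_1| >= 1,
# with equality at the extreme point f = e_1, so q = 1.
n = 5
rng = np.random.default_rng(0)

def sup_rho(f):
    return 1.0 / abs(f[0])          # largest rho with ||rho * f_1 * e_1||_2 <= 1

e1 = np.eye(n)[0]
sups = [sup_rho(e1)]                # the extreme point f = e_1
for _ in range(1000):
    f = rng.normal(size=n)
    f *= rng.uniform(0.1, 1.0) / np.linalg.norm(f)   # random point of the ball
    if abs(f[0]) > 1e-12:
        sups.append(sup_rho(f))
q = min(sups)
assert abs(q - 1.0) < 1e-12         # q = 1, as claimed in the example
```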
7.2.3 Exercises
154. Give an example of a linear problem and an interpolatory algorithm that is not strongly optimal.
155. Prove the equation sup{||Sh|| : h ∈ ker N ∩ F} = sup{||Sh||/||Th|| : h ∈ ker N}, used in the proof of Corollary 272.2.
156. Give an example for which the constant q defined on page 282 is any number in [0, 1].
157. Show that q = 1 if, for every f ∈ F with decomposition f = g + h (g ∈ ker S, h ∈ ker S^⊥), the component h belongs to F.
158. Give an example of an optimal algorithm φ for which the local error e(φ, y) is much larger than the local radius of information r(N, y), for some data vector y.
7.3 Parallel versus sequential methods
Two models of computation are currently most popular: (i) sequential, where the code is executed on a single processor, and (ii) parallel/distributed, where the code is executed on a collection of processors connected to a shared memory, or where all processors have their own local memory and are interconnected via a specific network. Examples of sequential computers include scalar supercomputers and workstations. The parallel/distributed hardware has several implementations, ranging from shared memory machines (e.g., SGI Power Challenge), to massively parallel, distributed memory machines (Connection Machine CM-5, IBM SP-2, Intel Paragon, Cray T3D), to network-connected clusters of workstations. Each of the above hardware implementations requires specific programming styles to utilize its full potential. The traditional approach to solving large problems was based on running codes sequentially in "batch mode", i.e., without any on-line interaction between the programmer and the running software. This model has, to a large extent, changed in present day computing. Programmers and designers resort to interactive use of workstations for solving very large problems. Moreover, with the advance of parallel hardware, more large scale applications are being ported to one of the above mentioned architectures. Two parallel programming paradigms are most commonly used: (i) shared memory and (ii) distributed message passing. In the shared memory model the program runs in parallel on several processors while using shared memory data. This data is updated/stored sequentially during the execution. In the distributed message passing model the data is distributed among possibly thousands of processors and is passed through the interconnecting network whenever such needs arise in the running algorithm. The first approach (i) is only scalable to a small number of processors (memory access bottleneck), whereas the second (ii) has already been shown to be scalable to thousands of processors.
By scalability we understand here the possibility of achieving almost linear speed-up of the implementation on an arbitrarily large number of processors. Several important applications in engineering and physics have been demonstrated to achieve optimal linear speed-up over a one-processor implementation for the distributed model. We believe that future large scale numeric computation will mostly be advanced by innovative parallel algorithm design. This is why theoretical investigation of various classes of computational problems should address the issue of parallelization and large scale scalability.
In this section we show that the class of linear problems (S, T, N) is ideal from the point of view of parallel computation. Namely, we show that no matter which sequential information is chosen for the solution of a linear problem (S, T, N), there always exists parallel information of more or less the same quality. The computation of parallel information with n evaluations, n being an arbitrarily large number, can be easily implemented on a distributed machine to yield an optimal speed-up. This is guaranteed by the fact that this computation does not involve any communication between processors. For many problems (see Section 7.4) this information can also be easily combined by using optimal linear algorithms. For sequential information we have the following result.
Theorem 287.1 For every linear problem (S, T, N) and any choice of sequential information N^s there exists parallel information N^p with the same cardinality, such that:
Proof We let f be any element in F, and N^s be any sequential information:
where, as in Section 7.1,
We then define
i.e., N^p is generated by applying the sequential information N^s to the function f = 0. We first observe that N^p(h) = N^s(h) = 0 for every h ∈ ker N^p. Indeed, for any such h we have
for i = 2, ..., n. This implies that the local diameters d(N^p, 0) and d(N^s, 0) are equal:
Corollary 272.1 yields d(N^p) = d(N^p, 0). This equation combined with the last inequality completes the proof. •
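The proof's construction — freezing the nodes that the sequential information visits on f = 0 — can be illustrated with a toy adaptive rule (the rule itself is invented purely for illustration):

```python
import math

# Illustration of the proof of Theorem 287.1: parallel information N_p
# is obtained by running the sequential information N_s on f = 0 and
# freezing the nodes it selects.
def Ns(f):
    """Toy sequential information: the next node depends on the
    previously computed function value (an invented adaptive rule)."""
    y, x = [], 0.5
    for _ in range(4):
        y.append(f(x))
        x = x / 2 if y[-1] >= 0 else (1 + x) / 2
    return y

def nodes_of(info, f):
    seen = []                        # log the nodes that `info` visits on f
    def g(x):
        seen.append(x)
        return f(x)
    info(g)
    return seen

zero_nodes = nodes_of(Ns, lambda x: 0.0)     # nodes of N_s on f = 0

def Np(f):
    return [f(x) for x in zero_nodes]        # the parallel information

# For h vanishing at zero_nodes, N_s follows the same path as on f = 0,
# so N_p(h) = N_s(h) = [0, ..., 0] -- the key step of the proof.
h = lambda x: math.prod(x - z for z in zero_nodes)
assert Np(h) == [0.0] * 4 and Ns(h) == [0.0] * 4
```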
Corollary 288.1 Since the radius of information is at least equal to half the diameter, we obtain:
This means that the radius of the parallel information is at most twice the radius of the sequential one. Until recently, no linear problems have been known for which r(N^s) < r(N^p) (see Annotations). Moreover, for linear problems defined by a real linear functional S it holds that r(N^p, y) = ½ d(N^p, y) for every y ∈ N(F), i.e., r(N^p) = ½ d(N^p), which combined with the above inequality gives r(N^s) ≥ r(N^p).
Remark We stress that Theorem 287.1 addresses the worst case setting. It may happen that for some data vector y = N(f) a sequential method is much more powerful than the optimal parallel one. Indeed, we consider here the class F of problem elements from Example 277.1, i.e., the class of f : [0,1] → ℝ that are absolutely continuous, with ||f′||_{ess sup} ≤ 1. We define S(f) = ∫₀¹ f(t) dt and suppose that for some function f we find two points t_1 < t_2 such that f(t_1) − f(t_2) = t_1 − t_2. Then in the interval [t_1, t_2] the function f is the line interpolating (t_1, f(t_1)) and (t_2, f(t_2)), i.e., there is no need to sample f anymore in [t_1, t_2] since the integral can be computed exactly on this interval. An extreme example of a function f for which only two evaluations are needed to guarantee r(N, y) = 0 is f(x) = x with y = N(f) = [f(0), f(1)] = [0, 1]. We recall that the worst case optimal information for this problem is given by function samples at equispaced points in [0, 1]. •
All assumptions in Theorem 287.1 are necessary to guarantee its conclusions. Examples are known for which the theorem does not hold even if only one of the assumptions is violated. Here we give a nonlinear problem for which there exists sequential information that is exponentially more powerful than the optimal parallel one.
Example 289.1 We let F = {f : [0,1] → ℝ : f(0) < 0, f(1) > 0, f continuous, and f^{−1}(0) is a singleton}, and define S(f) = f^{−1}(0) to be the only zero of such f. We first analyze parallel information for the solution of this problem:
We let x_i = i/(n + 1), i = 0, ..., n+1, and define the functions:
i = 0, ..., n. Then the linear system
has a non-zero solution a = [a_0, ..., a_n]. We suppose that
and without loss of generality assume a_{i_0} > 0. We now take an arbitrarily small positive δ and define the function
By taking the functions h(x) = Σ_{i=0}^{n} a_i h_i(x),
we obtain:
and
290
CHAPTER 7.
OPTIMAL APPROXIMATION
METHODS
Therefore d(N) ≥ |Sf_1 − Sf_2| = 1/(n+1) − (4/3)δ. By taking δ → 0+ we obtain d(N) ≥ 1/(n+1). It is easy to verify that
For this problem we have d(N) = 2r(N), and even locally d(N, y) = 2r(N, y) for every choice of y = N(f), f ∈ F. This shows that function evaluations at equispaced points constitute optimal information in the class of all parallel information. It turns out that we can use the sequential bisection method to approximate zeros of functions in our class. It is left as an exercise to the reader to verify that in the class of sequential function evaluations N^s(f) = [f(x_1), ..., f(x_n)], where
and x_i is any function x_i : ℝ^{i−1} → [0,1], the optimal information is provided by the bisection information N^{bis}(f), where
and
Since d(N^{bis}) = 1/2^n = 2r(N^{bis}), we conclude that the sequential information is exponentially more powerful than the optimal parallel one. In addition, in Section 8.3 we show that the bisection information N^{bis} is even optimal in the class of all sequential information consisting of arbitrary linear functionals. •

7.3.1 Exercises
161. Show that the bisection information N^{bis} is optimal in the class of all sequential function evaluations (see Example 289.1).
162. Suppose we approximate a zero of a function f ∈ F = {f : [0,1] → ℝ : |f(x) − f(y)| ≤ K|x − y|, f^{−1}(0) not empty}, i.e., we want to compute a point x_ε ∈ S(f, ε) = {x ∈ [0,1] : dist(x, f^{−1}(0)) ≤ ε}, where dist(X, Y) = inf_{x∈X, y∈Y} ||x − y||.
(1) Generalize the definition of the diameter of information for this problem, and show that d(N) = 1 for every sequential evaluation of function values as information.
(2) Generalize the result from (1) to parallel and sequential evaluations of arbitrary linear functionals as information.
163. Suppose we want to compute an ε-approximation to a zero of a function f ∈ F as in Exercise 162 using the residual error criterion, i.e., a point x_ε ∈ S(f, ε), where S(f, ε) = {t ∈ [0,1] : |f(t)| ≤ ε}. Generalize the definition of the radius and diameter of information for this case, and
(1) Find optimal information in the class of parallel function evaluations.
(2) Show that sequential information is not more powerful than parallel information in the class of function evaluations.
(3) Generalize the results from (1) and (2) to arbitrary linear functionals as information.
164. Repeat Exercise 163 for the multivariate class of functions F = {f : [0,1]^d → ℝ^d : ||f(x) − f(y)||_∞ ≤ ||x − y||_∞, f having a zero in [0,1]^d}, where d ≥ 2.
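The bisection information of Example 289.1 can be sketched as follows; the test function below is an arbitrary member of the class F chosen for illustration:

```python
# Sketch of the bisection information of Example 289.1: n sequential
# function evaluations localize the zero of f (with f(0) < 0 < f(1),
# f continuous) to an interval of length 2^-n, so d(N_bis) = 2^-n,
# while parallel sampling only achieves about 1/(n+1).
def bisect_info(f, n):
    lo, hi = 0.0, 1.0                # invariant: f(lo) <= 0 < f(hi)
    for _ in range(n):
        mid = (lo + hi) / 2
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return lo, hi                    # the zero of f lies in [lo, hi]

f = lambda x: x - 0.3137             # illustrative member of the class F
n = 20
lo, hi = bisect_info(f, n)
assert hi - lo == 0.5 ** n           # the interval halves at every step
assert lo <= 0.3137 <= hi
```

The midpoint of the final interval approximates the zero with error at most 2^(−n−1), matching r(N^{bis}) = d(N^{bis})/2.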
7.4 Linear and spline algorithms
In previous sections we have mostly been studying properties of information, with special emphasis on the theory of linear problems. In this section we investigate efficient algorithms that combine given information to compute an approximation to the solution S(f). The "easiest", most natural way of combining given information y = N(f) = [L_1(f), ..., L_n(f)] is a linear combination of the L_i(f)'s, i.e., a linear algorithm:
where the g_j are some elements of the space G. Optimal linear algorithms enjoy almost minimal cost (Section 7.6) for given information. In what follows we study linear algorithms and their relationship with spline algorithms. We show that for the specific class of linear problems where the restriction operator T acts into a Hilbert space and T(ker N) is closed, there always exist optimal linear algorithms. Finally, in Section 7.6 we exhibit optimal complexity (cost) properties of linear algorithms. Many applications, like integration, approximation, interpolation, or derivative evaluation (in well known classes of functions)
enjoy having optimal linear algorithms. It is natural to ask whether all linear problems (S, T, N) have this property. Unfortunately, the general answer is negative. Examples are known of linear problems for which there exist no optimal linear algorithms (see annotations). Further investigation of this problem is undertaken in Section 7.5, where the relationship between optimal linear algorithms and s-numbers is outlined. Here we show that for the important class of linear problems defined by a linear functional S optimal linear algorithms always exist. The following result is known as Smolyak's theorem.
Theorem 292.1 Suppose we solve a linear problem (S, T, N) where the operator S is a real linear functional S : ℱ → ℝ, and N is parallel information, N(f) = [L_1(f), ..., L_n(f)], with L_i : ℱ → ℝ. For i = 1, 2, ..., n we define real functions r_i : ℝ → ℝ by
Then: (i) There exists a linear optimal algorithm for the solution of this problem. (ii) If the derivative r_i′(0) exists for every i = 1, ..., n, then the algorithm
is a unique optimal linear algorithm.
The results of Section 7.3 and Corollary 288.1 indicate that for every sequential information there exists parallel information of essentially the same radius, i.e., we do not lose generality by considering parallel information in this theorem.
Proof We first prove (i). If the radius r(N) = +∞, then any algorithm is optimal, in particular φ(N(f)) = 0 for every f, which is linear. We may thus assume that r(N) < +∞, and we consider the set A = {x(f) = [S(f), L_1(f), ..., L_n(f)] : f ∈ F} ⊂ ℝ^{n+1}.
The set A is convex and balanced. Indeed, we take any x_1 = x(f) and x_2 = x(g), x_1, x_2 ∈ A. Then for every t ∈ [0,1] the element tf + (1−t)g ∈ F, since F is convex. Thus x(tf + (1−t)g) = t x_1 + (1−t) x_2 ∈ A, which shows that A is convex. Since F is balanced, the same property carries over to A. We now let r = r(N) be the radius of information N, and we take the point b = [r, 0, ..., 0] ∈ ℝ^{n+1}. Since r(N) = ½ d(N) = sup{Sh : h ∈ ker N ∩ F}, the point b belongs to the boundary of A; see Figure 233.1. For every boundary point a of a convex set A there exists a supporting hyperplane of this set passing through a. Precisely, for a vector c = [c_0, ..., c_n]^T and a point a ∈ ∂A we define the hyperplane
where c^T denotes the transpose of the vector c. Then the Mazur theorem 6.1.1 implies that there exists a non-zero vector c such that for every point x ∈ A we have
i.e., the set A is located on "one side" of the hyperplane H(c, a). Now, we consider the supporting hyperplane H(c, b) passing through the point b. This means that c = [c_0, c_1, ..., c_n]^T is chosen in such a way that
or equivalently,
We first observe that the coefficient c_0 ≠ 0. Indeed, by supposing the contrary we get Σ_{i=1}^{n} c_i L_i(f) ≤ 0 for every f ∈ F. Since F is balanced we get Σ_{i=1}^{n} c_i L_i(f) ≥ 0 for every f ∈ F, which implies that Σ_{i=1}^{n} c_i L_i(f) = 0 for every f ∈ F. Since the L_i's are linearly independent, all c_i's must be equal to zero, which is a contradiction. Therefore c_0 ≠ 0. Since the set A is balanced, we have −b ∈ ∂A, and the supporting hyperplane through −b is given by c^T(x + b) = 0, i.e.,
Now, the inequalities (I) and (II), with d_j = −c_j/c_0, yield
i.e.,
The last inequality proves that the linear algorithm
is optimal, which completes the proof of (i).
Now we prove the second part (ii). To this end, we take any optimal linear algorithm
choose a small number δ, and consider a function f ∈ F such that
Then

We note that r(N) = sup{Sf : f ∈ ker N ∩ F} = r_i(0) for every i = 1, ..., n, which implies that r(N) is finite. The above inequality yields

We now choose δ to be positive and f such that Sf ≥ r_i(δ) − ε, for arbitrarily small positive ε. We obtain
By taking ε → 0+ we get
We now choose δ to be negative, and obtain
By combining these two inequalities we get:
We now let δ → 0 to conclude
Spline algorithms
Optimal algorithms have minimal global error over all problem elements. It may happen that for a specific choice of data y = N(f) the error of an optimal algorithm is significantly larger than the local radius of information r(N, y). A strongly optimal algorithm has the desirable property that for all f's its local error is equal to r(N, y). Unfortunately, strongly optimal algorithms are difficult to obtain for most applications. Therefore it is essential to study algorithms that are almost strongly optimal, i.e., whose local error is at most a (small) constant times larger than the local radius of information. We define the deviation of an algorithm φ as:
which is the worst case ratio of the local error of φ as compared to the local radius of information. Obviously strongly optimal algorithms have deviation equal to one, and for every algorithm φ the deviation dev(φ) is at least one. In this section we investigate conditions under which linear algorithms enjoy a finite and, possibly, small deviation. An important class of spline algorithms will play an essential role in providing this characteristic.
Suppose we solve a linear problem (S, T, N). We define a spline algorithm φ_s by
where
for any f ∈ F. This set contains all elements in the kernel of N that minimize the distance between the set T(ker N) and the element Tf. As a conclusion from Section 2.3, which describes properties of splines, we obtain:
Lemma 296.1 (a) The spline algorithm φ_s is well defined whenever the set P(Tf) is nonempty for every f ∈ F. (b) dev(φ_s) ≤ 2. (c) φ_s is a homogeneous algorithm, i.e., φ_s(cy) = c φ_s(y) for any constant c and vector y. (d) φ_s is uniquely defined (i.e., for every two splines σ_1, σ_2 interpolating the same y = N(f), and for every f ∈ F, it holds that Sσ_1 = Sσ_2) iff the set SP(Tf) is a singleton for every f ∈ F.
Proof Property (a) holds since a spline element exists whenever P(Tf) is not empty for every f ∈ F. To show (b) we observe that ||Tσ(y)|| ≤ ||Tf|| ≤ 1 for every y ∈ N(F), y = N(f), since ||Tσ(y)|| is minimal on the set V(y) = N^{−1}(y). Thus σ(y) ∈ V(y) as defined in Section 7.1. Property (c) follows from the homogeneity of the spline element itself. Finally, to show (d) we take two splines σ_1 and σ_2 interpolating the same y = N(f) for any f ∈ F. Then (Section 2.3) f − σ_i = h_i ∈ P(Tf), and Sσ_1 − Sσ_2 = Sh_2 − Sh_1. The last equation implies that SP(Tf) is a singleton iff Sσ_1 = Sσ_2, i.e., the spline algorithm φ_s is uniquely defined. •
In the following theorem we fully characterize algorithms with small (finite) deviation.
Theorem 296.1 We assume that the set SP(Tf) is a singleton for every f ∈ F, and that the radius of information r(N) is finite. Then every nonspline homogeneous algorithm has infinite deviation.
Proof We take any nonspline homogeneous algorithm φ and suppose that the global error e(φ) is finite. (In the case e(φ) = +∞ we immediately get dev(φ) = +∞, since r(N) < +∞.) We first observe that there exists a uniquely defined spline algorithm φ_s(y) = Sσ(y), since the set SP(Tf) is a singleton. If the spline σ(y) is in ker T, then the algorithm φ is equal to the spline algorithm, φ(y) = Sσ(y), for every y. Indeed, by taking any real c we get
Since this equation holds for any real c, we conclude that φ(y) = Sσ(y) for every y. The above argument shows that there exists y ∈ N(F) such that Tσ(y) ≠ 0 and φ(y) ≠ Sσ(y), since φ is a nonspline algorithm. We define ȳ = y/||Tσ(y)|| and consider the set B = S N^{−1}(ȳ). The set B is a singleton, containing only the element Sσ(ȳ), and we obtain
Finally we get e(φ, ȳ) ≠ 0 and
which completes the proof. • Since linear algorithms are homogeneous this theorem implies the following important corollary characterizing linear algorithms with small deviation. Corollary 297.1 If the set SP(Tf) is a singleton for every f 6 F, and r ( N ) < +00, then: I. If the spline algorithm (f)s is linear, then it is the only linear algorithm with finite deviation. 1. If the algorithm (f>s is nonlinear, then there exist no linear algorithms with finite deviation.
298
CHAPTER 7.
OPTIMAL APPROXIMATION
METHODS
We now show that the assumption that SP(Tf) is a singleton set is essential. Indeed, we give an example for which SP(Tf) is a subinterval of the real line, and for which there exist a central linear spline algorithm and a linear spline algorithm of deviation 2. In [10] a similar example is given (with S = I), for which there exist a central nonspline algorithm and a linear spline algorithm of deviation 2. Example 298.1 We let F = {f ∈ C[0,1] : ||f||∞ ≤ 1} be the class of continuous functions on the interval [0,1] with infinity norm bounded by 1. We consider here the integration problem S(f) = ∫₀¹ f(t) dt. These definitions imply that the operator T is the identity, T = I. We assume that the information consists of parallel function evaluations: y = N(f) = [f(x₁), ..., f(xₙ)], for some points 0 ≤ x₁ < x₂ < ⋯ < xₙ ≤ 1. We leave it as an exercise to the reader to verify that: 1. The local radius of information equals 1 for every choice of y = N(f): r(N, y) = 1. 2. The set S(N⁻¹(y)) is given by
These equations imply that the number 0 is the unique center of the set S(V(y)) = S(N⁻¹(y) ∩ F) for every y. We further observe that a spline interpolating y is any element in C[0,1] such that
Therefore there exist infinitely many splines interpolating any given y ≠ [0, ..., 0]. Thus, for all f in F such that y = N(f) ≠ [0, ..., 0] the set SP(Tf) is a nonempty interval:
The central linear algorithm φᶜ(y) = 0 is a strongly optimal algorithm with dev(φᶜ) = 1. We observe that φᶜ is a spline algorithm, since for every y there exists a spline
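The first claim of Example 298.1 — that finitely many bounded samples barely constrain the integral — can be made concrete. The sketch below is our own illustration (not from the text): it builds a continuous f with |f| ≤ 1 matching arbitrary sample data whose integral is close to +1, by cutting narrow tents into the constant function 1 at the sample points; reflecting the construction gives an integral close to −1, so r(N, y) = 1.

```python
import numpy as np

x = np.array([0.2, 0.5, 0.8])      # sample points in (0, 1)
y = np.array([-0.3, 0.9, 0.0])     # prescribed sample values, |y_i| <= 1
delta = 1e-3                       # tent half-width; tents must not overlap

# piecewise-linear f: equal to 1 away from the x_i, dipping linearly
# to y_i on (x_i - delta, x_i + delta)
bx = np.concatenate(([0.0], x - delta, x, x + delta, [1.0]))
by = np.concatenate(([1.0], np.ones_like(x), y, np.ones_like(x), [1.0]))
order = np.argsort(bx)
bx, by = bx[order], by[order]
f = lambda t: np.interp(t, bx, by)

assert np.allclose(f(x), y)        # f matches the information N(f) = y
grid = np.linspace(0.0, 1.0, 200001)
fv = f(grid)
integral = np.sum(0.5 * (fv[1:] + fv[:-1]) * np.diff(grid))
# each tent removes only area delta * (1 - y_i) from the integral of 1
assert integral > 1.0 - delta * np.sum(1.0 - y) - 1e-4
assert np.max(np.abs(fv)) <= 1.0 + 1e-12
```

Letting delta → 0 pushes the integral to 1 while keeping N(f) = y fixed, which is exactly why the local radius equals 1 and the center of S(V(y)) is 0.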
We now consider an arbitrary linear problem (S, T, N), assuming that the range H of the operator T is a Hilbert space, and that T(ker N) is a closed subset of H. We show that if the radius r(N) is finite, then there always exists a unique linear central spline algorithm. Theorem 299.1 Let us assume that H is a Hilbert space, that T(ker N) is a closed subset of H, and that the radius r(N) < +∞. Then the algorithm
where σᵢ is a spline interpolating
is the unique linear central spline algorithm. Moreover, the radius of information equals half the diameter,
and the local error of the algorithm φˢ is given by:
where σ(y) = Σᵢ₌₁ⁿ yᵢσᵢ is a spline interpolating y = N(f). Proof We split the proof of this theorem into the following steps. (i) We show that the algorithm φˢ is well defined, i.e., that the splines σᵢ exist, and that φˢ is a spline algorithm. We further show that φˢ is unique. (ii) We show that r(N) = ½ d(N). (iii) We show that φˢ is a central algorithm, and derive the above formula for its local error. We start by showing (i). To this end, we observe that the set Eᵢ = {f ∈ F : N(f) = eᵢ} is nonempty, since the Lᵢ's are linearly independent. We take any element g ∈ Eᵢ and observe that g − f ∈ ker N
for every f ∈ Eᵢ. Moreover, inf_{f∈Eᵢ} ||Tf|| = inf_{h∈ker N} ||Tg − Th||. We now decompose H as the direct sum H = T(ker N) ⊕ T(ker N)⊥. This decomposition is well defined since T(ker N) is a closed subset of H. We write Tg = (Tg)₁ + (Tg)₂ according to this decomposition, and obtain inf_{h∈ker N} ||Tg − Th|| = inf_{h∈ker N} ||(Tg)₁ + (Tg)₂ − Th|| = ||(Tg)₂||, which is achieved for some h ∈ ker N such that Th = (Tg)₁. We now define the element
which implies that Tu = 0. This shows that u ∈ ker N ∩ ker T. Since the radius r(N) is finite, Corollary 272.2 implies that ker S ⊃ ker N ∩ ker T ∋ u. Thus, 0 = Su = Sσ₁ − Sσ₂, i.e., Sσ₁ = Sσ₂, which shows that the spline algorithm φˢ is unique, and completes the proof of (i). We now show (ii). To this end, we take any f ∈ F and the spline σ = Σᵢ₌₁ⁿ Lᵢ(f)σᵢ. We define h = f − σ, and conclude that h ∈ ker N and that Th is orthogonal to Tσ, (Th, Tσ) = 0, where (·,·) is the scalar product in the space H (see Lemma 112.1). We first show that h belongs to F, i.e., that ||Th|| ≤ 1. Indeed
By using Lemma 271.1 we obtain:
Now, we take the supremum on the left-hand side of the last inequality to get
which shows r(N) = ½ d(N) and completes the proof of (ii). We finally show (iii). To prove that φˢ is a central algorithm we observe that the set N⁻¹(y) ∩ F is symmetric with respect to σ(y) (see Lemma 113.1). This implies that the set S(V(y)) = S(N⁻¹(y) ∩ F) is symmetric with respect to Sσ(y). Now, Lemma 268.1 implies that Sσ(y) is a center of the set S(V(y)), which shows that φˢ is a central algorithm. We further observe that the condition σ(y) + h ∈ V(y) can be equivalently stated as ||T(σ(y) + h)|| ≤ 1, or as ||Th||² ≤ 1 − ||Tσ(y)||². Hence,
where in the last equation we used the result from Exercise 155 on page 285. This completes the proof of our theorem. ∎ It has been proved recently (see Annotations) that the assumptions that H is a Hilbert space and that T(ker N) is a closed set in H are necessary to guarantee that the spline algorithm φˢ is linear for an arbitrary linear solution operator S. We finally specialize our results to the case when S is a linear functional. We show that the linear central spline algorithm coincides in this case with the unique optimal linear algorithm constructed in assertion (ii) of Smolyak's Theorem 292.1. Lemma 301.1 We suppose that all assumptions of Theorem 299.1 are satisfied and that the operator S is a linear functional. Then the unique optimal linear algorithm φ⁰(N(f)) = Σᵢ₌₁ⁿ rᵢ′(0)Lᵢ(f) given
in assertion (ii) of Smolyak's Theorem 292.1 coincides with the linear central spline algorithm
Proof We take rᵢ(x) as defined in Lemma 292.1, i.e.,
where Rᵢ = {f ∈ F : Lᵢ(f) = x, Lⱼ(f) = 0 for every j ≠ i}. Given any f ∈ Rᵢ we can write f = xσᵢ + h, where h is some element in ker N. Therefore 1 ≥ ||Tf||² = x²||Tσᵢ||² + ||Th||², since (Th, Tσᵢ) = 0. This (as in the proof of the above theorem) implies that
Therefore the derivative of rᵢ(x) exists at x = 0, and rᵢ′(0) = Sσᵢ, which combined with Smolyak's lemma completes the proof. ∎
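The construction of Theorem 299.1 can be sketched numerically in a finite-dimensional model (our illustration, with T = I, so H is a Euclidean space and T(ker N) is trivially closed): the splines σᵢ are minimal-norm interpolants of the unit information vectors, σ(y) = Σᵢ yᵢσᵢ, and φˢ(y) = Sσ(y).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
Nmat = rng.standard_normal((n, m))   # information N(f) = Nmat @ f
Smat = rng.standard_normal((2, m))   # solution operator S

pinv = np.linalg.pinv(Nmat)
sigmas = [pinv @ e for e in np.eye(n)]   # sigma_i interpolates the i-th unit vector

def phi_s(y):
    # linear central spline algorithm: phi_s(y) = S sigma(y), sigma(y) = sum y_i sigma_i
    return Smat @ sum(yi * s for yi, s in zip(y, sigmas))

f = rng.standard_normal(m)
y = Nmat @ f
sigma_y = pinv @ y                       # minimal-norm interpolant of y
assert np.allclose(sigma_y, sum(yi * s for yi, s in zip(y, sigmas)))
# T sigma(y) is orthogonal to T(ker N); here T = I and ker N = null space of Nmat
_, _, Vt = np.linalg.svd(Nmat)
kernel = Vt[n:].T                        # columns span ker N
assert np.allclose(kernel.T @ sigma_y, 0.0)
assert np.allclose(phi_s(y), Smat @ sigma_y)
```

The orthogonality check mirrors the decomposition H = T(ker N) ⊕ T(ker N)⊥ used in the proof of (ii): the spline part carries no component in T(ker N)⊥'s complement, which is what makes Sσ(y) the center of S(V(y)).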
7.4.2
Relations to linear Kolmogorov n-widths
In this subsection we show the close relationship between minimal errors of linear algorithms and linear Kolmogorov n-widths for linear problems. This establishes another important relationship between approximation theory and computational methods. We consider the linear problem (S, T, N) with parallel information N(f) = [L₁(f), ..., Lₙ(f)] for any f ∈ F, where the Lᵢ are arbitrary linear functionals. We denote by Φ(n) the class of all linear algorithms using parallel information with at most n information operations, and we let l(n) be the minimal error of all linear algorithms in the class Φ(n):
We shall say that an algorithm φ ∈ Φ(n) is an n-th optimal linear algorithm iff e(φ) = l(n). We remark that for problems for which
there exist optimal algorithms that are linear, we have l(n) = r(n), where r(n) is the n-th minimal radius (see page 277). We recall that the algebraic linear Kolmogorov n-width Aₙ(B, G) of the set B in the space G is defined as the minimal error of approximation of the identity operator by (n − 1)-dimensional linear operators on the set B (Chapter 6). We let the constant q = q(F, S) be defined as on page 282. Then we prove the following theorem. Theorem 303.1 Let Aₙ₊₁ = Aₙ₊₁(S(F), G) be the (n + 1)-st linear Kolmogorov n-width of the set S(F) in the space G. Then the minimal error of linear algorithms satisfies the following bounds:
with the notation 0·(+∞) = 0. Proof We first show that l(n) ≤ Aₙ₊₁. To this end, we take an arbitrary linear operator P : G → G with the range P(S(F)) in an n-dimensional subspace of G. This means that there exist linear functionals lᵢ : G → ℝ and elements gᵢ of G such that P(S(f)) = Σᵢ₌₁ⁿ lᵢ(Sf)gᵢ for any f ∈ F. We now define the information N(f) = [l₁(Sf), ..., lₙ(Sf)] and consider the linear algorithm φ(N(f)) = P(S(f)) = Σᵢ₌₁ⁿ lᵢ(Sf)gᵢ. Then obviously φ ∈ Φ(n), and e(φ) = sup_{f∈F} ||Sf − φ(N(f))|| = sup_{g∈S(F)} ||g − P(g)||. By taking the infimum with respect to all such linear operators P we obtain l(n) ≤ Aₙ₊₁, which completes the first part of the proof. We now proceed to show that qAₙ₊₁ ≤ l(n). As in the proof of Theorem 282.1 we split this proof into three cases:
In the first case (i) the estimate is trivially satisfied. In the second case (ii) we assume, as in the proof of Theorem 282.1, that S(𝓕) = G, and take any linear algorithm φ ∈ Φ(n): φ(N(f)) = Σᵢ₌₁ⁿ Lᵢ(f)gᵢ. We then define the linear operator P : G → G by P(v) = Σᵢ₌₁ⁿ Lᵢ(h)gᵢ for v = S(h), where f = g + h, and where g ∈ ker S, h ∈ (ker S)⊥ are the components of f in the direct sum decomposition of 𝓕 = ker S ⊕
(ker S)⊥. For this operator P we get P(S(F)) ⊂ span(g₁, ..., gₙ). Moreover,
for any f ∈ F. By taking a number t ∈ (0, q), we obtain from the definition of q that for every f ∈ F there exists a constant c = c(f) > q − t such that ch ∈ F. This immediately implies that
Since P, φ, and t were arbitrary, by taking the infimum and letting t → 0+ we obtain qAₙ₊₁ ≤ l(n), which completes the proof of (ii). In the third case (iii) we first assume that dim(G) ≤ n. Then Aₙ₊₁ = 0, and qAₙ₊₁ = 0 ≤ l(n). Therefore we assume that dim(G) > n. Then the last part of the proof of Theorem 282.1 yields d(n) = r(n) = +∞. Since l(n) ≥ r(n) we obtain l(n) = +∞, which means that qAₙ₊₁ ≤ l(n) also holds in this case. This finally completes the proof of the theorem. ∎ This theorem relates results of approximation theory to minimal errors of linear algorithms. For many important applications the linear Kolmogorov n-widths have been found. They characterize the behavior (rate of decrease) of the minimal errors of linear algorithms. It is also known that the Gelfand n-width is not greater than the linear Kolmogorov n-width, and for some applications they are equal, as outlined in Theorem 231.2.
7.4.3
Exercises
165. Verify all claims of Example 298.1. 166. Derive formulas for the functions rᵢ(·) defined in Smolyak's Lemma 292.1 for the introductory example to Chapter 7. Verify that the optimal linear algorithm derived in this example is given by Σᵢ₌₁ⁿ f(xᵢ)rᵢ′(0).
7.5
S-numbers and minimal errors of algorithms
In Sections 7.2 and 7.4 we outlined the basic relationships between optimal information and Gelfand n-widths, as well as between optimal linear
7.5. S-NUMBERS, MINIMAL ERRORS
305
algorithms and linear Kolmogorov n-widths. In both cases the n-widths establish tight lower and upper bounds on, respectively, the minimal diameters of linear information and the minimal errors of linear algorithms. In this section we show that s-numbers, as outlined in Chapter 6, have similar and even more general properties. We assume that the solution operator S is a bounded linear operator acting between arbitrary Banach spaces 𝓕 and G, S ∈ 𝓛(𝓕, G), and that the problem set F = B_𝓕 is the unit ball in 𝓕. The class of permissible information operations consists of parallel evaluations of continuous functionals Lᵢ (linear or nonlinear), Lᵢ : 𝓕 → ℝ or ℂ, i.e., the information N(·) = [L₁(·), ..., Lₙ(·)], N ∈ 𝓛ₙ. We let Φ = Φ(N) be a class of algorithms using N such that all linear algorithms belong to Φ. We define sₙ(S) to be the lower bound on the errors of all algorithms in Φ(N) using arbitrary N ∈ 𝓛ₙ₋₁:
for n = 2, 3, ..., with s₁(S) = ||S||. Then the following theorem holds. Theorem 305.1 The mapping s,
is an s-scale. Proof We need to check the conditions (a1)–(a5) in the definition of an s-scale (Section 6.2). To this end, observe that the above definition of sₙ immediately implies that 0 ≤ sₙ₊₁(S) ≤ sₙ(S) for n = 2, 3, ..., and that s₂(S) = inf_{N∈𝓛₁} inf_{φ∈Φ(N)} e(φ) ≤ e(φ ≡ 0) = sup_{f∈F} ||Sf − 0|| = ||S|| = s₁(S), which shows (a1). To show (a2) we take any operators X, Y ∈ 𝓛(𝓕, G), and observe that for any algorithms φ₁, φ₂, for X and Y respectively, we can apply the algorithm φ₁ + φ₂ for the operator X + Y. Formally
with the notation (φ₁ + φ₂)(y) = φ₁(y₁) + φ₂(y₂), where y₁ ∈ ℝⁿ⁻¹ or ℂⁿ⁻¹ and y₂ ∈ ℝⁿ⁻¹ or ℂⁿ⁻¹. The last equation shows (a2). To show (a3) we choose any Banach spaces E, 𝓕, G, H, and operators Z ∈ 𝓛(E, 𝓕), Y ∈ 𝓛(𝓕, G), and X ∈ 𝓛(G, H). We then choose a positive δ, any information N : B_𝓕 → ℝⁿ or ℂⁿ, and an algorithm φ ∈ Φ(N) such that
Then we define the information
and the corresponding algorithm. Then we have
By taking δ → 0+, (a3) follows. To show (a4) we take any operator S ∈ 𝓛(𝓕, G) with rank(S) < n. Then S(f) = Σᵢ Lᵢ(f)gᵢ for fewer than n linear functionals Lᵢ, so we define the information N(f) = [L₁(f), ..., Lₙ₋₁(f)]
and the algorithm φ(N(f)) = S(f). Since e(φ) = 0, we obtain sₙ(S) = 0 as needed. Finally we show (a5). We first observe that sₙ(Iₙ) ≤ s₁(Iₙ) = ||Iₙ|| = 1, for Iₙ the identity operator on ℓ₂ⁿ. We now choose
any continuous information N : ℝⁿ → ℝⁿ⁻¹. Then the Borsuk–Ulam theorem (as formulated in Chapter 6) implies that there exists f₀ on the boundary of the unit ball Bₙ of ℝⁿ such that N(f₀) = N(−f₀). Therefore for any algorithm φ we have
By taking the infimum on the left-hand side we get sₙ(Iₙ) ≥ 1, which shows (a5). The proof of the theorem is now complete. ∎ The following corollary holds. Corollary 307.1 The s-scale sₙ(S) generated by Theorem 305.1 is equal to: (1) the scale dₙ of Kolmogorov numbers, whenever Φ is the class of all linear algorithms and 𝓛ₙ is the class of arbitrary continuous information; (2) the scale aₙ of approximation numbers, whenever Φ is the class of all linear algorithms and 𝓛ₙ is the class of all linear continuous information. Therefore the scale dₙ establishes a lower bound on the errors of linear algorithms using arbitrary continuous information, and the scale aₙ establishes a lower bound on the errors of linear algorithms using arbitrary continuous linear information. We now take any class 𝓛ₙ of information and any continuous linear operator S as specified at the beginning of this section. We define the minimal half-diameters of information in this class by the sequence tₙ(S),
with t₁(S) = ||S||. By using a technique similar to that in the proof of Theorem 305.1, one can prove the following result. Lemma 307.1 The mapping t:
is an s-scale.
Remarks The scale tₙ(S) is equal to the scale cₙ(S) of Gelfand numbers (defined in Section 7.2) whenever 𝓛ₙ is the class of all linear continuous information. When 𝓛ₙ is the class of all continuous information, tₙ(S) coincides with the scale γₙ(S) of Babenko numbers (see Annotations for references). Therefore the scale cₙ(S) establishes a lower bound on the radius of arbitrary continuous linear information (since ½ d(N) ≤ r(N)), and the scale γₙ(S) establishes a lower bound on the radius of arbitrary continuous information. ∎ We summarize the above remarks in the following two tables.
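In the Hilbert-space case discussed next, all of these scales coincide, and in finite dimensions they reduce to the singular values of S. A numerical sketch (our illustration, assuming 𝓕 = ℝ⁷ and G = ℝ⁵): information consisting of the first n − 1 right-singular coefficients, combined with the truncated-SVD linear algorithm, achieves worst case error equal to the n-th singular value of S.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((5, 7))
U, sv, Vt = np.linalg.svd(S)

n = 3                                   # s_n allows n - 1 linear functionals
# information: L_i(f) = <f, v_i>, i = 1..n-1 (right singular vectors)
# algorithm:   phi(y) = sum_i y_i * sv_i * u_i  (truncated SVD)
def err_operator():
    P = Vt[: n - 1].T @ Vt[: n - 1]     # projector onto span(v_1..v_{n-1})
    return S @ (np.eye(7) - P)          # S f - phi(N(f)) = S (I - P) f

worst = np.linalg.norm(err_operator(), 2)   # sup over the unit ball F
assert np.isclose(worst, sv[n - 1])         # equals the n-th singular value
```

Since no algorithm using n − 1 functionals can do better on a Hilbert ball, this single number simultaneously realizes the Kolmogorov, approximation, and Gelfand scales in this setting.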
So far we have outlined lower bounds on minimal errors of algorithms in terms of s-numbers. Below we refer to results of Section 6.2 to give conditions under which the s-numbers provide upper bounds on these quantities. These allow us to answer general questions of the type: under what assumptions do optimal linear algorithms exist for linear problems, etc. In particular, if S ∈ 𝓛(𝓕, G) for Hilbert spaces 𝓕 and G, then the following result holds, since all s-scales are equal, as described in Theorem 246.1. Corollary 308.1 Let N be any information in the class of all continuous information 𝓛ₙ. Then: (1) Nonlinear information is not more powerful than linear information, since γₙ(S) = cₙ(S). (2) There exists an optimal linear algorithm for this problem, since cₙ = aₙ. (It is left as an exercise to the reader to construct this algorithm.) We remark that a result similar to assertion (2) of Corollary 308.1 has been proven in Theorem 299.1 under different assumptions on S, 𝓕, G, and the information class 𝓛ₙ. We end this section by stating a corollary of Theorem 248.1.
Corollary 309.1 Let S ∈ 𝓛(𝓕, G), where 𝓕 and G are arbitrary Banach spaces. Then: (1) If G has the metric extension property, and N is arbitrary continuous linear information, then there exists an optimal linear algorithm φ^lin using N. (2) If 𝓕 has the metric lifting property, δ is any positive number, N is any continuous information, and φ is any linear algorithm using N, then there exist linear continuous information Ñ and a linear algorithm φ̃ using Ñ, such that
Hence, nonlinear continuous information is not more powerful than linear continuous information for the solution of this problem, when one is restricted to using linear algorithms only. We remark that a result similar to assertion (1), with G = ℝ and different assumptions on 𝓕 and N, was given in Theorem 292.1. The proof of this corollary is a reformulation of the proof of Theorem 248.1. This is why we only outline the proof of assertion (1), leaving the proof of assertion (2) as an exercise. Proof We show only assertion (1). We choose any linear information N ∈ 𝓛ₙ. Then ker N is a linear subspace of 𝓕. We define J to be the canonical embedding J : ker N → 𝓕. Since G has the metric extension property, there exists an operator S̃ ∈ 𝓛(𝓕, G) such that S̃ restricted to ker N equals SJ, with ||S̃|| = ||SJ||. We define the linear algorithm φ(N(f)) = (S − S̃)(f), for any f ∈ 𝓕. Then the error of φ is given by:
where in the last equation we used Lemma 271.1. This completes the proof of assertion (1). ∎
7.5.1
Exercises
167. Prove Lemma 307.1. 168. Carry out the construction of the optimal linear algorithm in assertion (2) of Corollary 308.1.
169. Modify the proof of Theorem 248.1 to show assertion (2) of Corollary 309.1.
7.6
Optimal complexity methods
In this section we discuss complexity issues of approximation methods. We use the real number model of computation, which is defined by the following two postulates: 1. We assume that each information operation Lᵢ(·) costs c > 0 units. 2. We assume that multiplication by a scalar and addition (or subtraction) of elements in the space G have unit cost. In the case when G is a finite-dimensional space (dim(G) = k), the unit cost corresponds to the cost of k arithmetic operations. We make this assumption to normalize our cost functions. In addition, we assume that comparisons and evaluations of certain elementary functions also have unit cost. We define the worst case cost of a method M = (φ, N) as:
where cost(N(f)) is the cost of computing the information vector y = N(f), and cost(φ(N(f))) is the cost of combining the information y to compute M(f) = φ(y) using the algorithm φ. A method M⁰ = (φ⁰, N⁰) is called an optimal complexity method if it computes an ε-approximation for every problem element with minimal worst case cost among all methods that belong to a specific class 𝓜, i.e.,
comp(ε) = inf{cost(M) : M = (φ, N) ∈ 𝓜 such that e(φ) ≤ ε}. This minimal cost comp(ε) is called the ε-complexity of the problem S in the class 𝓜. Examples of classes of methods include the following. 1. Information consisting of parallel evaluations of arbitrary linear functionals combined with an arbitrary algorithm φ. 2. Information consisting of arbitrary parallel function samples combined with any linear algorithm.
7.6. OPTIMAL
METHODS
311
3. Information consisting of arbitrary sequential function samples combined with an arbitrary algorithm, etc. We recall that the information complexity m(ε) in the class 𝓛 (Section 7.1, p. 263) is the minimal number of samples Lᵢ(·) ∈ 𝓛 in the information Nₙ = [L₁(·), ..., Lₙ(·)] that is needed to compute an ε-approximation for every f ∈ F. Formally
We now suppose that we can construct information N⁰ consisting of n = m(ε) samples, such that the radius of N⁰ is at most ε, r(N⁰) ≤ ε. To summarize, N⁰ is such that: 1. The number of samples n equals m(ε). 2. The radius r(N⁰) ≤ ε. 3. The cost cost(N⁰(f)) for a worst f ∈ F equals c·m(ε). We further assume that there exists an algorithm φ such that: 4. The error e(φ) = r(N⁰). 5. The cost of combining N⁰(f) with the algorithm φ is much smaller than the cost of computing N⁰:
Under these assumptions the cost of the method M⁰ = (φ, N⁰) is approximately equal to c·m(ε),
We observe that to compute an ε-approximation for every f ∈ F we have to use information Nₙ with the number of samples n ≥ m(ε). Therefore the cost of any method M that solves our problem must be at least c·m(ε),
These arguments imply that the method M⁰ is an almost optimal complexity method in the class 𝓜 = (φ, N), where N is any
information consisting of evaluations Lᵢ(·) ∈ 𝓛, and φ is an arbitrary algorithm. Furthermore, we conclude that the ε-complexity of the problem is approximately
which establishes an important relationship between the information complexity and the ε-complexity of the problem. Conclusion Any method satisfying conditions (1)–(5) above is an almost optimal complexity method. The ε-complexity of the problem is approximately equal to c·m(ε). ∎ Fortunately, many important applications do satisfy conditions (1)–(5). In particular, the class of linear problems for which there exist optimal linear algorithms satisfies these assumptions. Below we specialize our observations to linear problems.
7.6.1
Optimal complexity methods for linear problems
It turns out that for any linear problem for which there exists an optimal linear algorithm φ using optimal parallel information N, the computational method M = (φ, N) enjoys almost minimal cost. This property is analyzed below. We first estimate the information complexity m(ε) in terms of the n-th minimal radii. Namely, since sequential information is not essentially more powerful than parallel information, Corollary 288.1 implies that
where r(n, 𝓛) is the n-th minimal radius in the class 𝓛; see page 277. We further observe that if the n-th minimal diameter
satisfies d(n, 𝓛) = 2r(n, 𝓛), then the information complexity m(ε) is given by:
The last condition holds, e.g., for linear problems defined by a real linear functional S, or in the case when H is a Hilbert space
and T(ker N) is a closed set, for every information in the class 𝓛ₙ. For these problems there also exist optimal linear algorithms using arbitrary parallel information, as outlined in Smolyak's Theorem and in Theorem 299.1. We now suppose that the information Nₙ has exactly n = m(ε) samples and that the radius r(Nₙ) is at most ε. Then the following lemma holds. Lemma 313.1 If there exists an optimal linear algorithm φ using Nₙ, then the method M = (φ, Nₙ) is an almost optimal complexity method. More precisely,
Proof Since every method M that solves our problem must use at least m(ε) samples, and since to combine m(ε) samples any algorithm has to use at least m(ε) − 1 unit cost operations, the worst case cost of M must be at least (c + 1)m(ε) − 1. Therefore the complexity comp(ε) must be at least (c + 1)m(ε) − 1, which proves the left inequality. Since the cost of an optimal linear algorithm φ(N(f)) = Σᵢ₌₁ⁿ Lᵢ(f)aᵢ is cost(φ(N(f))) = 2n − 1, we immediately obtain cost(M) = (c + 2)m(ε) − 1, which completes the proof. ∎ We illustrate this lemma using the introductory example to Chapter 7. In that example we analyzed parallel information consisting of function samples for the integration problem ∫₀¹ f(t) dt. We derived there the information complexity m(ε) of the problem. Moreover, we showed that the trapezoid method was the optimal linear algorithm,
Its cost was equal to n units (real arithmetic operations in this case). From this we conclude that the method M⁰ = (φ⁰, N⁰) is an almost optimal complexity method. Its cost is within 1 unit of the lower bound (c + 1)m(ε) − 1 on the cost of any method based on function evaluations. Hence, for this problem we know the ε-complexity to within the cost of one arithmetic operation.
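The operation count behind this conclusion can be checked directly. The sketch below is our illustration: it assumes a quadrature of the form h·Σ f(xᵢ), so that, unlike the general linear algorithm Σ Lᵢ(f)aᵢ with cost 2n − 1, the samples can be summed first (n − 1 additions) and scaled once (1 multiplication), giving combinatory cost n and total cost (c + 1)n.

```python
def trapezoid_cost(n, c):
    """Cost of the equal-weight quadrature method on n samples in the
    real-number model: c units per sample, 1 unit per arithmetic op."""
    info_cost = c * n            # n function evaluations
    combine_cost = (n - 1) + 1   # n-1 additions to sum, 1 multiplication by h
    return info_cost + combine_cost

c, n = 10, 100
cost = trapezoid_cost(n, c)
assert cost == (c + 1) * n               # total method cost
assert cost - ((c + 1) * n - 1) == 1     # within 1 unit of the lower bound
```

This is exactly the gap quoted in the text: any method needs cost at least (c + 1)m(ε) − 1, and the method above costs (c + 1)m(ε).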
7.6.2
Exercises
170. Derive a formula for the information complexity m(ε) for the problems defined in Exercises 163 and 164. 171. Find tight estimates on the complexity of the above problems. (Hint: derive optimal and/or strongly optimal algorithms.)
7.7
Annotations
Chapter 7 is based on the theory outlined in two monographs: [11] and [10]. The reader may find there the history of research devoted to finding optimal information and algorithms for several important applications, as well as an exhaustive bibliography of the field presently known as Information-Based Complexity. These monographs go far beyond the worst case model described here. They analyze in depth extensions of the theory to different settings, including average, probabilistic, and asymptotic analysis. Several results from current research papers are also included in our outline. Specifically, Section 7.5 is based on the paper [6], whereas most of the material in the other sections is based on the above monographs. The exercises after Section 7.3 are based on the papers [8] and [9]. Below we review several specific notes included in this chapter. Specific comments Section 7.1: It has been shown in the monograph [10] (p. 57) that every convex, balanced subset F of a linear space 𝓕 can be described by a specific choice of a linear restriction operator T (defined through the use of Minkowski's functional) in such a way that F₁ = {f ∈ 𝓕 : ||Tf|| < 1} ⊂ F ⊂ {f ∈ 𝓕 : ||Tf|| ≤ 1} = F₂. Now, instead of the linear problem S : F → G, one may consider the problem S₁ = S restricted to F₁, or S₂ = S restricted to F₂. Since for every choice of linear information the radii and diameters are the same for all problems S, S₁, and S₂, one can solve the problem S₂ without losing generality. We stress that in our formulation the cardinality n of adaptive information was independent of a particular problem element f. This can be generalized by allowing the number n = n(f) to depend on f and to be determined by an arbitrary termination criterion. It turns out that in such a formulation all worst case results for linear problems still hold (compare Sections 2.2 and 5.2 of [10]).
7.7. ANNOTATIONS
315
Section 7.2: Nonlinear information for the solution of an arbitrary problem has been studied in Chapter 7 of [11]. Here we cite two important results. 1. (Theorem 3.1 of [11] ). For every solution operator S and any choice of nonlinear information N — [L\, ...,Ln], n—1,2,... there exists nonlinear information N° = [Lj] consisting of only one nonlinear functional evaluation such that d(N) = d(N°). 2. (Theorem 3.3 of [11] ). For any solution operator S, such that the cardinality of the set S(F) is at most continuum, there exists a nonlinear functional L\ such that d(L^) — 0. Moreover if the cardinality of the set S(F) is larger than continuum, then the problem may not be solvable with finite error, i.e., d(Li) = +00, for any LI. The case when operator K is not one-one is treated in detail in Chapter 2 Section 4 of [11], pp. 33-38. We specialize the optimal information N° to the case when G and H are separable Hilbert spaces and the operator K = ST~: is compact (for the noncompact case we refer the reader to p. 69 of [10]). We define the operator K° = K*K : H-+H. Obviously, K° is self adjoint and compact. We let AI > A2 > A , - . . . > 0, 1 < i < dimH be the eigenvalues of K° corresponding to an orthonormal system of eigenvectors Xi, K°Xi = XiXi, (xi,Xj) = 5,-j, where (•, •) is the scalar product in the space H. We also formally set Xj = 0 and Xj = 0 whenever dimH < +00 and j > d\mH. We then define the number d < n as follows: If \n > 0 then d = n, otherwise d = min{j : Xj = 0} — 1, and the information
when d > 1, and N°(f) = 0 for d = 0. It turns out that JV o is an optimal information in the class of all linear functionals and r(N°) = vX+i. The proof of this is given in Theorem 5.3.2 of [10] on pp. 67-68. The above result tells that the functionals £„•(•) = (T-, Xi), which yield the Fourier coefficients of Tf in the eigenvalue decomposition of the operator K°, provide best information in the class of all linear functionals for this problem. Section 7.3: In [4] a linear problem is constructed for which sequential information is slightly better than parallel (r(Ns) < r(JV p )). In [10] pp. 61-63 several examples analyzing the power of best sequential and parallel information are given, as well as a brief history of this problem. We stress that the first results along these lines can already be found in the work of Kiefer [3] and Bakhvalov [2].
316
BIBLIOGRAPHY
Section 7.4: Several examples of this type are given in Section 5.5 of [10], pp. 80–87. Probably the most natural are given in the paper [12]. The authors consider there the inversion of a finite Laplace transform, and show that the error of every linear algorithm is infinite, even though the optimal radius of information goes to zero as n → ∞. The result referred to on page 301 has been proved in [5]. Section 7.5: This section is based on the paper [6]. For a definition of the Babenko s-scale, see [6] and [1].
7.8
References
[1] V.J. Babenko. Estimating the quality of computational algorithms. Computer Methods Appl. Mech. Eng. 7: 47–63 (Part 1), 135–52 (Part 2), 1976.
[2] N.S. Bakhvalov. On the optimality of linear methods for operator approximation in convex classes of functions. USSR Comput. Math. Math. Phys. 11: 244–49, 1971.
[3] J. Kiefer. Optimum sequential search and approximation methods under regularity assumptions. J. Soc. Indust. Appl. Math. 5: 105–36, 1957.
[4] M.A. Kon and E. Novak. The adaption problem for approximating linear operators. Bull. Amer. Math. Soc. (N.S.) 23: 159–65, 1990.
[5] M.A. Kon and R. Tempo. On linearity of spline algorithms. Report, Department of Mathematics, Boston University, 1987.
[6] P. Mathe. s-Numbers in Information Based Complexity. J. of Complexity 6: 41–66, 1990.
[7] K. Sikorski. Bisection is optimal. Numer. Math. 40: 111–17, 1982.
[8] K. Sikorski. Optimal solution of nonlinear equations satisfying a Lipschitz condition. Numer. Math. 43: 225–40, 1984.
[9] K. Sikorski and H. Wozniakowski. For which error criteria can we solve nonlinear equations? J. of Complexity 2: 163–78, 1986.
[10] J. Traub, G. Wasilkowski, and H. Wozniakowski. Information-Based Complexity. Academic Press, 1988.
[11] J. Traub and H. Wozniakowski. A General Theory of Optimal Algorithms. Academic Press, 1980.
[12] A.G. Werschulz and H. Wozniakowski. Are linear algorithms always good for linear problems? Aequationes Math. 31: 202–12, 1986.
Chapter 8
Applications
We shall now demonstrate some applications of the foregoing material. We discuss Burgers' equation, the simplest fluid dynamics conservation law problem, as well as the approximation of band-limited signals, and the bisection method for nonlinear zero-finding problems.
8.1
Sinc solution of Burgers' equation
In this section we illustrate the application of Sinc approximation to the approximate solution of Burgers' equation. Let ℝ denote the real line, and let u₀ denote a given function defined on ℝ. We shall illustrate an integral equation procedure for solving the Burgers equation problem
We accomplish this by first transforming this problem into the equivalent integral equation problem
which we discretize via the Sinc collocation procedure using the algorithm on page 147, and then we solve the resulting discretized system via Neumann iteration.
We take
Such a u_0 enables us to approximate an arbitrary continuous function on ℝ, by Theorem 154.1. Moreover, this choice of u_0 admits an explicit expression for the integral
so that we can now rewrite u(x, t) in the form
where
We note that it is possible to explicitly evaluate the "Laplace transform" of the convolution kernel in this equation, i.e.,
We now proceed to discretize the equation as outlined in Section 3.5. To this end, we select a = 1/2, b = 1, c = 0, φ_t(t) = log{sinh(t)}, φ_x(x) = x, d_t = π/2, α_t = β_t = 1/2, d_x = π/4, α_x = β_x = 1, and in this case it is convenient to take M_t = N_t = M_x = N_x = N. We thus form matrices
where Sx and St are diagonal matrices, and then proceed as in the algorithm on page 147, to reduce the integral equation problem to the nonlinear matrix problem
[u_{ij}] = F(A_x, B_t, [u_{ij}]) - F(A'_x, B_t, [u_{ij}]) + [v_{ij}], with notation as given for the algorithm on page 147, where the function v_{ij} may be evaluated a priori, via the formula v_{ij} = v(i h_x, z_j), and with z_j = log( e^{j h_t} + (1 + e^{2 j h_t})^{1/2} ). If a is sufficiently small we may solve this system by the Neumann iteration
for k = 0, 1, 2, …, starting with [u_{ij}^{(0)}] = [v_{ij}]. For example, with a = 1/2, and using the map φ_t(t) = log[sinh(t)], we achieved convergence in 4 iterations, for all values of N (between 10 and 30) that we attempted. We can also solve the above equation via Neumann iteration for larger values of a, if we restrict the time t to a finite interval (0, T), via the map φ_t(t) = log{t/(T - t)}.
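The Neumann iteration used above is a plain fixed-point sweep u^(k+1) = F(u^(k)) + v on the discretized system, stopped when successive iterates agree. A minimal sketch of the idea (the map G and the data below are illustrative stand-ins, not the Sinc-collocation operator of this section):

```python
import numpy as np

def neumann_iteration(G, v, u0, tol=1e-12, max_iter=100):
    """Iterate u_{k+1} = G(u_k) + v until successive iterates agree to tol."""
    u = u0
    for k in range(max_iter):
        u_next = G(u) + v
        if np.max(np.abs(u_next - u)) < tol:
            return u_next, k + 1
        u = u_next
    return u, max_iter

# Illustrative stand-in: a small quadratic perturbation, mimicking the mild
# nonlinearity of the discretized Burgers system for small a.
A = 0.1 * np.eye(4)
v = np.array([0.5, -0.2, 0.3, 0.1])
G = lambda u: A @ (u * u)
u, iters = neumann_iteration(G, v, np.zeros_like(v))
assert np.allclose(u, G(u) + v)   # u is a fixed point of the iteration
```

As in the text, convergence in only a handful of sweeps is typical when the nonlinear part is a small perturbation of the data term.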
8.2 Signal recovery

This section aims to demonstrate some applications of the results presented in Chapters 6 and 7 to the approximation of band- and energy-limited signals f ∈ W(σ, τ). Our approach is focused on finite information and follows the theory outlined in Chapter 7.

8.2.1 Formulation of the problem
We remind the reader that the space W(σ, τ) is equipped with the inner product (·, ·) and the norm ‖·‖_{2,τ} of the space L²(-τ, τ), and consists of functions f : ℝ → ℂ such that

where x_f ∈ L²(-σ, σ) is uniquely determined by f; see page 228. We assume that signals in the space W(σ, τ) are given through an information operator of the form N : W(σ, τ) → ℂ^n,

where L_i : W(σ, τ) → ℂ are linear functionals that are allowed to be chosen adaptively, i.e., L_i(·) = L_i(·; y_1, …, y_{i-1}) for i = 2, 3, …, n. All algorithms discussed in this section are assumed to be of the form γ(Nf) : N W(σ, τ) → L²(-τ, τ). The error e(γ) of an
algorithm γ is defined by its worst performance in a given subclass J of W(σ, τ), with respect to the norm ‖·‖_{2,τ},

The assumption J = W(σ, τ) always leads to an infinite error, since there is no limitation on the energy of the signals in W(σ, τ). Thus, we assume a finite bound E on the energy of the signals to be approximated. That is, we deal with the subclass
8.2.2 Relations to n-widths

We recall that the Kolmogorov n-width d_n(J(σ, τ, E), W(σ, τ)) is equal to √(E λ_{n-1}(στ)), where λ_{n-1}(στ) is the (n - 1)st eigenvalue of the integral operator
Moreover, we have
where P : L²(-τ, τ) → L²(-τ, τ) is the orthogonal projection on the subspace spanned by the eigenfunctions φ_0, φ_1, …, φ_{n-2}. (See Corollary 226.1 and Example 227.1.) Thus, by the definition of the Kolmogorov linear n-widths, we obtain

We shall now take time to show that these quantities also coincide with the corresponding Gelfand n-widths. For convenience we define ρ = στ. Theorem 322.1 Let (·, ·) be the inner product in L²(-τ, τ). Let ξ_1 < ξ_2 < ⋯ < ξ_{n-1} be the zeros of φ_{n-1} on the interval [-τ, τ] and let M : W(σ, τ) → ℂ^{n-1}, N : W(σ, τ) → ℂ^{n-1} be defined by the equations
Then ker M and ker N are extremal subspaces for both the widths c_n(J(σ, τ, E), W(σ, τ)) and C_n(J(σ, τ, E), W(σ, τ)). Moreover,

Proof We will prove the theorem for c_n(J(σ, τ, E), W(σ, τ)) only. The arguments we use are also valid for C_n(J(σ, τ, E), W(σ, τ)). By Theorem 231.2 we obtain

As c_n(J(σ, τ, E), W(σ, τ)) ≤ min{p, q}, it remains to show that p² = q² = E λ_{n-1}(ρ). We begin with evaluating p. As shown in Example 227.1,

Since the functions φ_j are orthonormal and complete in L²(-τ, τ), we get

We shall now prove that q² = E λ_{n-1}(ρ). To this end, we recall that for any f ∈ W(σ, τ) it holds that

see p. 24. For j = 0, 1, …, n - 1 we define functions K_j : ℝ² → ℝ by the recursion
By induction on j it follows that
for all f ∈ W(σ, τ) ∩ ker N_j, where N_j : W(σ, τ) → ℂ^j is defined by
The verification of this fact is deferred to the Exercises at the end of this section. Moreover, by the assertion (d) of Theorem 26.1 we also get
where ⟨·, ·⟩ is the inner product in L²(-∞, ∞) and H : W(σ, τ) ∩ ker N → L²(-∞, ∞) is defined by the equation
We now consider the quantity
Since this supremum can be rewritten as
c is the largest eigenvalue of H. By the definitions of q and c we clearly have c = q²/E. We also note that

Thus, we need to show that λ is the largest eigenvalue of H. We assume to the contrary that the supremum c is attained for f_0, H f_0 = c f_0, and λ < c. We now define
and note that c_τ = c/√(1 - c²) and the supremum c_τ is attained for g_0 = f_0/√(1 - ‖f_0‖²_{2,τ}). (The verification of these facts is left to the reader as an exercise.) Since

we see that (φ_{n-1}, g_0) = 0 and consequently g_0 has a zero ζ ∈ (-τ, τ) such that ζ ≠ ξ_j (j = 1, 2, …, n - 1). We now consider the function

As g_0 ∈ W(σ, τ) and g_0(ζ) = 0, we have ‖g_1‖_{2,∞} < ∞ and the function g_1 can be extended to an entire function of exponential type σ. Now, from the Paley-Wiener theorem (see p. 23) it follows that g_1 ∈ W(σ, τ). And, of course, g_1 ∈ ker N. On the other hand, we also have |g_1(t)| ≥ |g_0(t)| for t ∈ (-τ, τ) and |g_1(t)| ≤ |g_0(t)| for t ∈ ℝ\[-τ, τ]. Thus, the function

satisfies

Since ∫_{-τ}^{τ} |g_1(t)|² dt = 1, the function
8.2.3 Algorithms and their errors

Based on Theorems 266.1, 287.1, 303.1, and Example 275.1, these approximation results can be interpreted as follows. Theorem 325.1

(a) There is no linear information N : W(σ, τ) → ℂ^n and there is no algorithm γ : N J(σ, τ, E) → L²(-τ, τ) whose worst case error
satisfies e(γ) < √(E λ_n(ρ)). (b) Let the information operators M_n : W(σ, τ) → ℂ^n and N_n : W(σ, τ) → ℂ^n be defined by the equations

and

where f ∈ W(σ, τ) and ξ_{n,j} are the zeros of φ_n on [-τ, τ]. Let the algorithms α : M_n J(σ, τ, E) → L²(-τ, τ) and β : N_n J(σ, τ, E) → L²(-τ, τ) be defined as follows:
where the coefficients a_1, a_2, …, a_n are determined by the solution of the linear system
Then, we have
Thus, the algorithms α and β have the smallest possible worst case error, and they both use parallel information operators consisting, respectively, of n inner products and n signal samples at the zeros of the function φ_n. Any function f of bandwidth [-σ, σ] is uniquely determined by its samples f(kπ/σ), k = 0, ±1, ±2, …, taken at the rate

see Theorem 25.1. Let us note that the number of the samples f(kπ/σ) in the interval [-τ, τ] is ⌊2ρ/π⌋. One can easily verify that the truncated series involving only the samples of f from [-τ, τ] provides, in general, a poor approximation to f. In fact, the situation is much worse, since from assertion (d) of Theorem 26.1 and assertion (a) of Theorem 325.1 one gets:
Theorem 327.1 There is no algorithm for approximation of signals in the class J(σ, τ, E) that uses linear information N with n < ⌊2ρ/π⌋ - 1 and whose worst case error is smaller than √(E/2). Although the information operators M_n and N_n are optimal for the recovery, they are not easy to obtain. On the lookout for an alternative, we shall now discuss recovering signals f ∈ J(σ, τ, E) from the samples f(t_1), f(t_2), …, f(t_n), where t_1, t_2, …, t_n are arbitrary distinct points from the interval [-τ, τ]. We begin with estimating the error of the Lagrange interpolatory algorithm

where f[t_1, t_2, …, t_k] are the divided differences of f. Lemma 327.1 We have

Proof Any function f ∈ J(σ, τ, E) is of the form

where x_f ∈ B = {x ∈ L²(-σ, σ) : ‖x‖²_{2,σ} ≤ E/(2π)}. Thus, the remainder term of Lagrangian interpolation is

Hence,
Since sup_{t∈[-τ,τ]} ∏_{j=1}^{n} |t - t_j| ≤ (2τ)^n and |f[t_1, …, t_n]| ≤ sup_t |f^{(n-1)}(t)|/(n - 1)!, we get
Thus by the Stirling formula:
we obtain
as claimed. • Remark We should like to stress that the bound in Lemma 327.1 is independent of the nodes t_1, t_2, …, t_n. Much better estimates on e(L) are possible for special choices of nodes. For instance, if t_k are the Chebyshev points in [-τ, τ], i.e., if

then sup_{t∈[-τ,τ]} ∏_{j=1}^{n} |t - t_j| = 2(τ/2)^n and consequently
From Corollary 275.1 and Example 275.1 it follows that
is the minimal error of an algorithm using the information M_n. In order to compare this quantity to e(L) it is convenient to define θ_k = t_k/τ for k = 1, 2, …, n and to prove the following lemmas. Lemma 328.1 We have
Proof Let f ∈ J(σ, τ, E) ∩ ker N_n be arbitrary. Since f(t_1) = f(t_2) = ⋯ = f(t_n) = 0 and

the values f(t) coincide with the remainder term of Lagrangian interpolation of f, i.e.,

It can be easily verified by induction on n that

where θ = t/τ. Making the substitution w = ωτ in the integral above, we get

where y(w) = τ^{-1/2} x(w/τ). Consequently,

Since the operator x ↦ y maps the unit ball in L²(-σ, σ) onto the unit ball in L²(-ρ, ρ) and the function

vanishes at the points θ_1, θ_2, …, θ_n, we see that ‖f‖_{2,τ} = ‖g‖_{2,1}. We also have ‖f‖_{2,∞} = ‖g‖_{2,∞}. Hence, the lemma follows. •
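The claims surrounding Lemma 327.1 are easy to probe numerically: the divided-difference (Newton) form of the Lagrange interpolatory algorithm, the crude node-product bound (2τ)^n, and the value 2(τ/2)^n attained by the Chebyshev points. A sketch (the function sin 3t is an arbitrary smooth stand-in for a signal, and the grid sizes are ad hoc choices):

```python
import numpy as np

def divided_differences(t, y):
    """Coefficients f[t_1], f[t_1,t_2], ..., f[t_1,...,t_n] of the Newton form."""
    c = np.array(y, dtype=float)
    for j in range(1, len(t)):
        c[j:] = (c[j:] - c[j-1:-1]) / (t[j:] - t[:-j])
    return c

def newton_eval(t, c, x):
    """Horner-style evaluation of the Newton-form interpolant at x."""
    p = np.full_like(np.asarray(x, dtype=float), c[-1])
    for k in range(len(c) - 2, -1, -1):
        p = p * (x - t[k]) + c[k]
    return p

tau, n = 1.0, 12
# Chebyshev points in [-tau, tau]
cheb = tau * np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))

grid = np.linspace(-tau, tau, 20001)
prod_max = np.prod(np.abs(grid[:, None] - cheb[None, :]), axis=1).max()
assert prod_max <= (2 * tau) ** n                   # crude bound, any nodes
assert abs(prod_max - 2 * (tau / 2) ** n) < 1e-5    # Chebyshev value 2(tau/2)^n

f = lambda t: np.sin(3 * t)    # smooth stand-in for a band-limited signal
c = divided_differences(cheb, f(cheb))
err = np.max(np.abs(newton_eval(cheb, c, grid) - f(grid)))
assert err < 1e-6
```

The gap between (2τ)^n and 2(τ/2)^n is exactly the 4^n factor separating the generic bound of Lemma 327.1 from the Chebyshev estimate of the remark.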
Lemma 329.1 Let a = ρ/n and
Then

In particular, when n > 2ρ/π, this inequality yields

Proof According to the Paley-Wiener theorem, J(ρ, 1, E) consists of entire functions f of exponential type ρ satisfying ‖f‖²_{2,∞} ≤ E, restricted to the real line. The function h is entire, square integrable over the real line, and of exponential type ρ. (We defer the formal proof of these properties to the Exercises.) Moreover, h(θ_1) = h(θ_2) = ⋯ = h(θ_n) = 0 and if a < π/2, then

We also have

Let us now define the function h_0 to be the restriction to the real line of the complex mapping z ↦ √E h(z)/‖h‖_{2,∞}. Then h_0 ∈ J(ρ, 1, E) and h_0(θ_1) = h_0(θ_2) = ⋯ = h_0(θ_n) = 0. Consequently, we get

which coincides with the first bound of the lemma. It is easy to verify that
if a < π/2. We also have

and the infimum is attained when t_k are the zeros of the polynomial P_n. It now follows that for n > 2ρ/π it holds
Now, by the Stirling formula, we get
and
Hence, the lemma follows. •

Let
From Theorem 325.1 it follows that this infimum is attained and
Since r(a, r, E\ i i , . . . ,£ n ) < e(L), the remark after Lemma 327.1 and Lemmas 328.1, 329.1 yield the following result. Corollary 331.1 For n > 2p/?r we have
8.2.4 Asymptotics of minimal cost
Let ε be a positive number and let

This quantity can be interpreted as the minimal number of samples required to find an ε-accurate approximation to any signal in the class J(σ, τ, E). When dealing with computational costs of algorithms we shall assume that the cost of arithmetic operations (+, -, ×, /) and the cost of sampling any signal in J(σ, τ, E) are taken as unity and c, respectively. From these definitions it follows that the cost of any algorithm γ such that e(γ) ≤ ε must be at least proportional to c m(ε). We are now in a position to prove the following theorem. Theorem 332.1

(a) Regardless of the sizes of σ, τ, and E we have

(b) For sufficiently small ε > 0, Lagrangian interpolation with (log 1/ε)(log log 1/ε)^{-1}(1 + o(1)) arbitrary distinct nodes on the interval [-τ, τ] yields an ε-accurate approximation to any signal in J(σ, τ, E) with almost minimal cost. Proof Let T = {t_1, t_2, …, t_n} be a set of arbitrary distinct nodes on the interval [-τ, τ] and let L_T be the Lagrange interpolatory algorithm associated with these nodes. By Lemmas 327.1 and 329.1 there exist positive constants K_1 ≥ K_2 (dependent on σ, τ, and E, and independent of T) such that for sufficiently large n it holds
Thus, if w(ε) is either m(ε) or

then

Consequently,
After substituting w(ε) = h(ε)(log 1/ε)(log log 1/ε)^{-1}, we get

Set

Hence, h(ε) = 1 + o(1) as ε → 0+ and we see that
which proves assertion (a). For any t ∈ [-τ, τ] we have
where P(t) = ∏_{k=1}^{n} (t - t_k). Since the derivatives P'(t_k) can be precomputed, to get L_T(N_n f)(t) we need n measurements of f and at most 3n arithmetic operations. Thus, if n = m_h(ε), the total cost of evaluating L_T(N_n f)(t) is proportional to c m_h(ε) = c m(ε)(1 + o(1)). This proves (b) and completes the proof. •

8.2.5 Exercises
172. For j = 1, …, n - 1 let K_j and N_j be defined as in the proof of Theorem 322.1. Show that

173. Given a linear subspace V of W(σ, τ) let

and let the supremum c be attained by f_0 ∈ V. Show that c_τ = c/√(1 - c²) and that the supremum c_τ is attained by g_0 = f_0/√(1 - ‖f_0‖²_{2,τ}). 174. Prove Theorem 325.1.
175. Show that the function h in Lemma 329.1 is entire, square integrable over ℝ, and of exponential type ρ. 176. Prove that I_n = Θ(n^{-1/2}) as n → ∞.
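Before leaving signal recovery, the eigenvalue behavior behind the n-width and complexity results of this section can be observed numerically by discretizing an integral operator with the kernel sin σ(t - s)/(π(t - s)) on [-τ, τ], the kernel commonly associated with band- and time-limiting. A rough Nyström sketch (the grid size and thresholds are ad hoc assumptions, and this is only a crude discrete approximation of the operator of Section 8.2.2):

```python
import numpy as np

sigma, tau, m = 10.0, 1.0, 400
t = np.linspace(-tau, tau, m)
h = t[1] - t[0]
D = t[:, None] - t[None, :]
# sinc kernel sin(sigma*(t-s)) / (pi*(t-s)), with diagonal limit sigma/pi
K = np.where(D == 0.0, sigma / np.pi,
             np.sin(sigma * D) / (np.pi * np.where(D == 0.0, 1.0, D)))

# symmetric Nystrom discretization with trapezoidal weights
w = np.full(m, h)
w[0] = w[-1] = h / 2.0
sq = np.sqrt(w)
lam = np.sort(np.linalg.eigvalsh(sq[:, None] * K * sq[None, :]))[::-1]

rho = sigma * tau                   # rho = sigma*tau, as in the text
nyquist = int(2 * rho / np.pi)      # about 6 for these parameters
assert 0.9 < lam[0] < 1.01          # leading eigenvalues cluster near 1
assert lam[nyquist + 3] < 0.5       # sharp plunge just past 2*rho/pi
assert lam[2 * nyquist] < 0.1       # and rapid decay thereafter
```

The plunge of the eigenvalues just past 2ρ/π is exactly the mechanism behind Theorem 327.1: about ⌊2ρ/π⌋ pieces of linear information are unavoidable before the error can drop below √(E/2).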
8.3 Bisection method

We consider here a nonlinear zero finding problem for smooth functions that change sign at the end points of an interval. The goal is to compute an ε-approximation to a zero of such a function. This can be accomplished by using bisection, hybrid bisection-secant, or bisection-Newton type methods. We show that in the worst case the bisection method is optimal in the class of all methods using arbitrary adaptive linear information. This holds even for infinitely many times differentiable functions having only simple zeros.

8.3.1 Formulation of the problem
Let G = C^∞[0, 1] be the space of infinitely many times differentiable functions f : [0, 1] → ℝ. We let F be the class of functions f ∈ G that change sign at the endpoints, f(0) < 0 < f(1), and have exactly one z such that f(z) = 0, with f'(z) ≠ 0. The solution operator S is defined as

We wish to compute a point x = x(f) which is an ε-approximation to S(f) for every function f ∈ F,

To accomplish this we use arbitrary adaptive linear information N = N_n as defined in Chapter 7,

where y_1 = L_1(f), y_i = [y_{i-1}, L_i(f; y_{i-1})], and L_{i,f}(·) = L_i(·; y_{i-1}) is a linear functional, i = 1, …, n. The bisection information N^bis is in particular given by the functionals

for x_i = (a_{i-1} + b_{i-1})/2 with a_0 = 0, b_0 = 1, and
Knowing N(f) we approximate S(f) by an algorithm φ,

As in Chapter 7, the worst case error of an algorithm φ is

The radius of information is given by

and, as in general, it is a lower bound on the error of any algorithm φ,

where Φ is the class of all algorithms using N. The bisection algorithm is defined as

It is easy to verify that e(φ^bis) = r(N^bis) = 2^{-n-1} (see Exercise 177).
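The bisection information and algorithm above are straightforward to implement; a minimal sketch (using the sign of f(x_i) as the information, with the worst case error 2^(-n-1) just quoted as the guarantee):

```python
def bisection(f, n):
    """n bisection steps on [0, 1] for f with f(0) < 0 < f(1).
    Returns the midpoint of the final interval; its distance to the
    zero is at most 2**-(n+1)."""
    a, b = 0.0, 1.0
    for _ in range(n):
        x = (a + b) / 2.0
        if f(x) <= 0.0:      # the bisection information: the sign of f(x_i)
            a = x
        else:
            b = x
    return (a + b) / 2.0

# A function from the class F (smooth, sign change, one simple zero):
z = 1.0 / 3.0
f = lambda x: x - z
for n in range(1, 20):
    assert abs(bisection(f, n) - z) <= 2.0 ** (-(n + 1))
```

Each step halves the interval containing the zero, so after n evaluations the midpoint is within 2^(-n-1) of S(f), matching the radius of the bisection information.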
8.3.2 Optimality theorem

In this section we show that the bisection method M^bis = (φ^bis, N^bis) is optimal, i.e., Theorem 335.1 The following equation holds

where 𝓛 is the class of all adaptive linear information. To prove this theorem we need two lemmas. We first denote by I_i, i = 1, …, k, any closed subintervals of [0, 1], and

We let L_i : G → ℝ be linearly independent linear functionals. Then we prove
Lemma 336.1 For every δ, 0 < δ < 1, and every family of intervals I_i ⊂ [0, 1] with diameter diam(I_i) = δ, i = 1, …, k - 1, such that the functionals L_i, i = 1, …, k - 1, are linearly independent on C_{k-1}, there exists an interval I_k ⊂ [0, 1] with diam(I_k) = δ such that the functionals L_i, i = 1, …, k, are linearly independent on C_k. Proof We suppose that such an interval I_k does not exist. Therefore, for every choice of I_k ⊂ [0, 1] with diam(I_k) = δ, the functionals L_1, …, L_k are linearly dependent on C_k. Since L_1, …, L_{k-1} are linearly independent on C_{k-1}, this yields

We first assume that a_i(I_k) = a_i for every choice of I_k. We let I^j = [c_j, c_j + δ], for any numbers c_j, j = 1, …, q, such that c_1 = 0, c_q = 1 - δ (q < ∞), and c_j < c_{j+1} < c_j + δ. Then the I^j form a δ-covering of [0, 1],

By the unity decomposition theorem we can decompose any f ∈ G as

Therefore

This equation contradicts the linear independence of L_1, …, L_k on G. Therefore there must exist two intervals I¹ and I² such that

and a_j(I¹) ≠ a_j(I²) for some index j. This implies that
Since a_j(I¹) - a_j(I²) ≠ 0, this contradicts the linear independence of L_1, …, L_{k-1} on C_{k-1}, and completes the proof. • The next lemma guarantees the existence of specific worst case functions. Lemma 337.1 For every adaptive information N_n and for every δ, 0 < δ < 2^{-n}/(n + 1), there exists a function f_n ∈ G and points x_{1,n}, x_{2,n} such that

where x_{2,n} - x_{1,n} > 2^{-n} - (n + 1)δ, and

(iii) f_n is strictly increasing for

Proof We use induction on n. We suppose first that n = 1. Since L_1 is not 0 on G, then as in the proof of Lemma 336.1 we conclude that there exists an interval I_1 = [c, c + δ], I_1 ⊂ [0, 1], such that L_1 ≠ 0 on C_1 = C(I_1). We assume without loss of generality that c ≥ 0.5 - δ/2, and define

We note that f_1 ∈ G, and that L_{1,f_1} = L_1 is linearly independent on C_1. By taking x_{1,1} = δ/2 and x_{2,1} = c - δ/2 we obtain

and f_1 is strictly increasing on (x_{1,1} - δ/2, x_{1,1}) ∪ (x_{2,1}, x_{2,1} + δ/2). This completes the case n = 1. We now assume that the lemma holds for some n ≥ 1 with a function f_n. The information N_{n+1} consisting of n + 1 evaluations yields
a functional L_{n+1,f_n}. Without loss of generality we assume that L_{1,f_n}, …, L_{n,f_n}, L_{n+1,f_n} are linearly independent on G (see also the annotations to this section). Then Lemma 336.1 implies that there exists an interval I_{n+1} = [c, c + δ] such that L_{1,f_n}, …, L_{n+1,f_n} are linearly independent on C_{n+1}. If c > x_{2,n} + δ/2 or c < x_{1,n} - 3δ/2, then Lemma 337.1 holds for f_{n+1} = f_n, x_{1,n+1} = x_{1,n}, and x_{2,n+1} = x_{2,n}. We therefore assume that

and without loss of generality suppose that c ≥ (x_{1,n} + x_{2,n})/2 - δ/2. Then we set

We note that g ∈ G, and take a function h ∈ C_{n+1} such that

Such a function h exists since the functionals are linearly independent on C_{n+1}. Therefore

The functions g, h, and f_n are illustrated in Figure 338.1.

Figure 338.1: The functions g, h, and f_n

We now take a positive constant d so small that
We then define the function
which obviously belongs to G. Moreover, since N_n(f_{n+1}) = N_n(f_n), then L_{i,f_{n+1}} = L_{i,f_n} for i = 1, …, n + 1. We define x_{1,n+1} = x_{1,n} and x_{2,n+1} = c - δ/2. Then x_{2,n+1} - x_{1,n+1} > 2^{-n-1} - (n + 2)δ, and dist(I_j, [x_{1,n}, x_{2,n}]) ≥

for every δ, 0 < δ < 2^{-n-1}/(n + 1). Then the proof follows immediately from the definition of the radius of information on page 335, with δ approaching zero. To construct f_1 and f_2 we apply the technique presented in the proof of Lemma 337.1. Namely, we take the function f_n and the functionals L_{i,f_n} from Lemma 337.1, and define the function g ∈ G by

We then take a function h ∈ C_n such that L_{i,f_n}(g + h) = 0 for every i = 1, …, n. As before we choose a positive constant d so small that

and define the functions

Both f_1 and f_2 belong to G and have only simple and single zeros, S(f_1) ∈ (x_{1,n} - δ/2, x_{1,n}) and S(f_2) ∈ (x_{2,n}, x_{2,n} + δ/2), since f_n is strictly increasing on these intervals. Therefore we conclude that f_1 and f_2 belong to F. Moreover we note that N_n(f_1) = N_n(f_2) and |S(f_1) - S(f_2)| > 2^{-n-1} - (n + 1)δ, which means that these functions satisfy all of the requirements. The proof is complete. •
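Combined with e(φ^bis) = r(N^bis) = 2^(-n-1), the optimality theorem yields the evaluation count of Corollary 340.1 below: the smallest n with 2^(-n-1) ≤ ε is ⌈log_2(1/ε) - 1⌉. A quick numerical confirmation of that identity (a sketch):

```python
import math

def m_eps(eps):
    """Smallest n with 2**-(n+1) <= eps."""
    n = 0
    while 2.0 ** (-(n + 1)) > eps:
        n += 1
    return n

# The loop count agrees with the closed form ceil(log2(1/eps) - 1)
for k in range(1, 30):
    eps = 0.75 * 2.0 ** (-k)      # a generic eps strictly between powers of 2
    n = m_eps(eps)
    assert 2.0 ** (-(n + 1)) <= eps < 2.0 ** (-n)
    assert n == math.ceil(math.log2(1.0 / eps) - 1)
```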
Corollary 340.1 In order to compute an ε-approximation to the zero of every function in our class one needs m(ε) = ⌈log_2(1/ε) - 1⌉ function evaluations. The number m(ε) is minimal with that property, i.e., the information complexity of the problem is equal to m(ε).

8.3.3 Exercises
177. Show that e(φ^bis) = r(N^bis) = 2^{-n-1}. 178. Prove Corollary 340.1.
8.4 Annotations

This material illustrates applications of the approximation theory outlined in previous chapters. We discuss here the approximate solution of Burgers' equation, the approximation of band-limited signals, and a nonlinear zero finding problem.

Specific comments

Section 8.1: The method of solving Burgers' equation was published in [14].

Section 8.2: The material of this section is selected from [6] (the n-widths results) and from [4] and [5] (the cost analysis). For extensions and generalizations the interested reader is referred to the articles above and also to [1], [2], [3], [7], and [11].

Section 8.3: This section is based on the paper [9]. In that paper the linear independence of the functionals defining information was not assumed, and therefore the proofs were slightly more technical. It is important to stress that our result holds in the worst case setting. If one considers an average case setting with adaptive stopping rules, then a hybrid secant-bisection method turns out to be optimal. This is reported in the recent paper [8]. The bisection method was also studied from the point of view of the asymptotic rate of convergence. It turns out that the linear rate of convergence cannot be essentially improved, for functions with zeros of infinite multiplicity, as reported in [10].
8.5 References
[1] B.Z. Kacewicz and M.A. Kowalski. Approximating linear functionals on unitary spaces in the presence of bounded data errors
with applications to signal recovery. Int. J. of Adaptive Control and Signal Processing, (to appear). [2] B.Z. Kacewicz and M.A. Kowalski. Recovering signals from inaccurate data. Curves and Surfaces in Computer Vision and Graphics II, M.J. Silbermann and H.D. Tagare, Editors, Proc. SPIE, 1610: 68-74, 1992. [3] M.A. Kowalski. Optimal complexity recovery of band- and energy-limited signals. J. Complexity, 2: 239-54, 1986. [4] M.A. Kowalski. On approximation of band-limited signals. J. Complexity, 5: 283-302, 1989. [5] M.A. Kowalski and F. Stenger. Optimal complexity recovery of band- and energy-limited signals II. J. Complexity, 5: 45-59, 1989. [6] A.A. Melkman. n-Widths and optimal interpolation of time- and band-limited functions. In Optimal Estimation in Approximation Theory, edited by C.A. Micchelli and T.J. Rivlin, Plenum Press, New York, pages 55-68, 1977. [7] A.A. Melkman. n-Widths and optimal interpolation of time- and band-limited functions II. SIAM J. Math. Anal., 16: 803-13, 1985. [8] E. Novak, K. Ritter, and H. Wozniakowski. Average Case Optimality of a Hybrid Secant-Bisection Method. Math. Comp., (to appear). [9] K. Sikorski. Bisection is Optimal. Numer. Math., 40: 111-17, 1982. [10] K. Sikorski and J. Trojan. Asymptotic Near Optimality of the Bisection Method. Numer. Math., 57: 421-33, 1990. [11] D. Slepian. On bandwidth. Proc. IEEE, 64: 292-300, 1976. [12] F. Stenger. Explicit, nearly optimal, linear rational approximations with preassigned poles. Math. Comp., 47: 225-52, 1986. [13] F. Stenger. Numerical Methods Based on Sinc and Analytic Functions. Springer-Verlag, New York, 1993.
[14] F. Stenger, B. Barkey, and R. Vakili. Sinc Convolution Approximate Solution of Burgers' Equation. In Computation and Control III, edited by K.L. Bowers and J. Lund, Birkhäuser, Boston, pages 341-54, 1993.
Index

Abel summation formula, 86 absolute error criterion, 262 adaptive information, 263 adjoint operator, 239 Ahieser, N.I., 89 Ahlberg, J.H., 115 Akhieser, N.I., 89, 220 algebraic version of Kolmogorov's linear n-width, 235 algorithm, 263 almost strongly optimal, 295 linear, 291 optimal, 267, 302 almost optimal information, 277 alternant, 53 Anselone, P.M., 115 approximation numbers, 239 Askey, R., 221 Babenko, K.I., 256 Babenko, V.J., 316 Bakhvalov, N.S., 316 Banach, 8, 10, 12, 88, 90, 213, 230, 231, 233, 237, 238, 239, 240, 245, 246, 247, 249, 251, 255, 256 Banach space, 10 Barkey, B., 342 Barsky, B., 115
Bartels, R., 115 Beatty, J., 115 Berg, C., 220, 221 Bernstein, 1, 85, 88, 89, 229, 230,232 Bernstein n-width, 229 Bernstein operator, 85 Bernstein theorem (68.1), 89 Bernstein theorem (10.1), 88 Bessel inequality, 20 best approximation, 2 error, 2 Bochner, 221 Bohmer, E.G., 115 Bojanov, B.D., 114, 115 Borel, 28, 172, 173, 174, 176, 178, 188 Borel, E., 150 Borel subsets, 28 Borsuk, 225, 226, 256 Borsuk-Ulam theorem, 273 on antipodes, 225 Bowers, K.L., 151 Bromwich's inversion formula, 160 Brouwer, 3, 4, 88 fixed point theorem, 3 B-spline, 103 Bulirsch, R., 116 bullet-shaped, 123 Burchard, H.G., 257 Burgers' equation, 319
canonical embedding of X into X**, 250 canonical quotient mapping, 13, 248 cardinality, 262 Cauchy, 244 formula, 37 center of symmetry, 113 central algorithm, 267 Chakalov, 218, 220, 222 theorem, 218 Chebysheff, 90 Chebyshev, 1, 31, 38, 42, 43, 45, 46, 47, 49, 51, 52, 54, 55, 62, 63, 77, 78, 82, 83, 84, 88, 194, 195 inequalities, 194 polynomials, 31 subspace, 42 system, 42 theorem (51.2), 88 Cheney, E.W., 89 Chihara, T.S., 89 Cholesky, 18 algorithm, 18 Christensen, J.R.P., 220, 221 Christoffel, 29, 200 Christoffel-Darboux formula, 29 Ciesielski, Z., 115 classical orthogonal polynomials, 30 codimension, 231 compact operators, 241 complete sequence, 20 computational method, 264 condition number, 17 cone, 169 Continuous Hamburger moment problem, 180 Continuous Hausdorff moment
problem, 179 Continuous Stieltjes moment problem, 179 Continuous trigonometric moment problem, 179 convolution integral, 147 curvature, 102 Darboux, 29, 200 Davis, P.J., 150 deBoor, C., 115 delta function, 164 dense sets, 9 determinate distribution, 191 determinate measure, 191 determinate moment problem, 175 deviation of an algorithm, 295 Diaconis, P., 221 Discrete Hamburger moment problem, 180 Discrete Hausdorff moment problem, 179 Discrete Stieltjes moment problem, 179 Discrete trigonometric moment problem, 179 distributed model of computation, 286 distribution function, 174 divided difference, 53 dual space, 169 Dunford-type integral, 142
Dzjadik, V.K., 89 ε-complexity, 310 elliptic functions, 158 ε-net, 228 extremal distribution, 202
extremal property of splines, 96 extremal subspace, 224 for b_n, 230 for c_n, 230 eye-shaped region, 122
(globally) optimal error algorithm, 267 Gordon, Y., 257 Gram matrix, 16 Gram-Schmidt algorithm, 19 Greville, T.N.E., 115
Farin, G., 115 Favard, 207, 212, 215 theorem, 207 filter function, 163 first algorithm of Remez, 56 Fourier series, 21 Fourier transform, 24 Freud, G., 221 Fubini, 180 Fuglede, B., 221 function of intervals, 174 functions of exponential type σ, 23 functions vanishing at infinity, 172 fundamental theorem of Tichomirov, 256 funnel-shaped, 124
Haar, 90
Garling, D.J.H., 257 Gauss, 194, 203 quadrature formula, 194 Gelfand, 229, 230, 231, 235, 239, 245, 250, 251 numbers, 239 general approximation problem, 262 general spline interpolation, 109 general splines, 111 global diameter, 265 global error, 266 global radius, 265
theorem (43.1), 88 Hahn, 230, 231, 233, 237 Hahn-Banach theorem, 8 Hakopian, H.A., 115 Hämmerlin, G., 150 Hardy, 253 spaces, 253 Heinrich, S., 256, 257 Helfrich, H.-P., 256, 257 Helly theorems, 174 Hermite interpolatory polynomials, 193 Hermite polynomials, 30 Hermitian positive definite matrix, 16 Hilbert, 226, 239, 240, 241, 242, 244, 245, 246, 248, 255 matrix, 18 numbers, 239 transform, 125 Holladay, J.G., 114, 116 Höllig, K., 257 Holmes, R., 116 homogeneous algorithm, 296 Horner's algorithm, 99 hourglass-shaped, 124 Ikebe, Y., 166 impulse function, 164
indeterminate distribution, 191 indeterminate measure, 191 information, 262 almost optimal, 277 complexity, 263 operations, 262 interpolation problem, 95 interpolatory, 268 interval in ℝⁿ, 173 interval of continuity, 174 intervals, 6 Isbell, J.R., 89 Jackson operator, 84 Jackson theorems (63.1, 65.1), 88 Jacobi polynomials, 30 Jankowska, J., 116 Jankowski, M., 116 Jensen, C.U., 221 jump of a distribution function, 194 Kacewicz, B.Z., 340, 341 Kadec, M., 257 Kailath, T., 221 Karlin, S., 89, 116 Kashin, B.S., 256, 257 Kemperman, J.H.B., 221 Kiefer, J., 316 Kincaid, D., 89 knot sequence, 93 Kolmogoroff, A., 257 Kolmogorov, 223, 224, 225, 229, 230, 235, 238, 248, 250, 251, 256, 257 n-width, 224 theorem (40.1), 88
Kon, M.A., 316 Koornwinder, T., 221 Kornejčuk, N.P., 116, 257 Korovkin operators, 58 Korovkin theorem, 88 Kowalski, M.A., 166, 221, 340, 341 Krein, M.G., 220, 222 Kress, R., 150, 151 Kronecker delta, 21 Kuratowski, K., 257
Lagrange interpolatory polynomial, 53 Lagrange method, 35 Lagrange multiplier, 35 Laguerre polynomials, 30 Landau, H.J., 89, 222 "Laplace transform", 140, 142 Laurent, P.J., 115 Legendre polynomials, 26 Lewicki, G., 90
Lindenstrauss, J., 256, 257 linear algorithm, 291 linear problem, 270 linearly dense sets, 9 local diameter of information, 265 local error, 266 local radius of information, 265 Lorentz, G.G., 257 Lund, J., 151 Maiorov, V.E., 256, 257 Mairhuber, J., 90 theorem, 88 Martensen, E., 151 Mathe, P., 316 maximal functional, 49
Mazur, 232, 256 theorem on supporting hyperplanes, 232 Mazur-Orlicz theorem, 170 McNamee, J., 151 Meinardus, G., 90 Melkman, A.A., 257, 341 metric extension property, 246 metric lifting property, 247 Micchelli, C.A., 257, 341 modified Gram-Schmidt algorithm, 19 modulus of continuity, 59 moment problem, 170 Müntz theorem, 37, 86 Mysovskikh, I.P., 222 natural cubic spline, 97 natural spline, 94 Neumann's iteration, 321 Newton's form of cubic spline, 99 Nilson, E.N., 115 nonadaptive information, 262 not dispersed zeros, 217 Novak, E., 316 n-th minimal diameter, 277 n-th minimal radius, 277 Nudel'man, A.A., 222 Odyniec, W., 90 optimal complexity method, 310 optimal information, 277 optimal linear algorithm, 302 Ortega, J.M., 90 orthogonal elements, 13 orthogonal polynomials, 28 orthogonal projection, 15
orthonormal elements, 13 Paley-Wiener theorem, 23 parallel information, 262 parallel model of computation, 286 parallelogram law, 14 Parseval theorem, 24 partition of unity, 105 Paszkowski, S., 90 perturbed information, 295 Petrushev, P.P., 90 Pietsch, A., 223, 256, 258 Pinkus, A., 258 Poisson's summation formula, 155 Pollak, H.O., 89, 90 Polya, G., 222 Popov, V.A., 90 positive measures, 178 positive operators, 85 Powell, M.J.D., 90 precompact sets, 228 principle of local reflexivity, 251 projection operators, 73 prolate spheroidal wave functions, 27 quadrature formula, 191 quasi-inner product, 207 quasi-orthogonal polynomials, 29 quotient space, 13 rank of an operator, 229 real Chebyshev space, 46 reflexive, 250 regular Borel measure, 172 Remez, 90
representation of a functional, 191 restriction operator, 270 Rheinboldt, W.C., 90 Rice, J.R., 90 Riesz representation theorem, 74 Riesz theorem, 172 Riesz-Steinhaus theorem, 173 Rivlin, T.J., 90, 257, 341 Rockafellar, R.T., 258 Rosenthal, H.P., 256, 257 Rudin, W., 90, 222 Runge example, 78 Sahakian, A.A., 115 Sarso, D., 222 Sawori, Z., 221 Schauder's theorem, 255 Schoenberg, I.J., 90, 114, 116 Schumaker, L., 116 Schwarz inequality, 35 second algorithm of Remez, 56 sector, 122 Semadeni, Z., 89 separable normed space, 38 sequential information, 263 sequential model of computation, 286 Shohat, J.A., 222 signals of limited bandwidth, 26 Sikorski, K., 316, 341 Sikorski, R., 222 Silbermann, M.J., 341 sinc functions, 26 Sinc indefinite convolution, 139 indefinite integral, 136 points, 118 quadrature, 129 series interpolation, 128 Singer, I., 258 singular value decomposition, 210 of compact operators, 242 Slepian, D., 90, 341 Smolyak's theorem, 292 Snobar, S., 257 s-numbers, 239 Sobolev, 234, 256 spline algorithm, 296 spline function, 93 stable algorithm, 17 Steffensen's rule, 108 Stenger, F., 91, 151, 166, 167, 341, 342 Stieltjes inversion formula, 186 Stirling formula, 328 Stoer, J., 116 strictly convex normed space, 7 strip, 121 strongly (locally) optimal error algorithm, 267 Studden, W.J., 89 substantially equal distributions, 174 Suetin, P.K., 222 Szegö, G., 91 Tagare, H.G., 341 Tamarkin, J.D., 222 Tchebycheff, 89 Temlakov, V., 258 Tempo, R., 316
three term recurrence relation, 29 Tichomirov, V.M., 223, 256, 258 Timan, A.F., 91 Toeplitz matrix, 136 trapezoid methods, 261 Traub, J.F., 316 Trojan, J., 341 two dimensional "Laplace transform", 147 Ulam, S., 225, 226, 256 uniform approximation, 39 unit sphere, 5 unitary space, 13 Vakili, R., 342 Vallee Poussin operator, 84 Vallee-Poussin theorem (51.1), 88
Vandermonde matrix, 192 V-extremal measure, 190 Wahba, G., 116 Walsh, W., 115 Wasilkowski, G.W., 316 weak* convergence, 250 weak* topology, 250 Werschulz, A.G., 317 Whitney, A., 116 Whitney, E.L., 151 Whittaker, E.T., 152 Wilderotter, K., 258 worst case cost, 310 Wozniakowski, H., 316, 317
Xu, Y., 222 Yang, C.T., 90 Zygmund, A., 91 theorem (71.1), 89