Mariano Giaquinta Giuseppe Modica
Mathematical Analysis Linear and Metric Structures and Continuity
Birkhauser Boston • Basel • Berlin
Giuseppe Modica, Università degli Studi di Firenze, Dipartimento di Matematica Applicata, I-50139 Firenze, Italy
Mariano Giaquinta, Scuola Normale Superiore, Dipartimento di Matematica, I-56100 Pisa, Italy
Cover design by Alex Gerasev. Mathematics Subject Classification (2000): 00A35, 15-01, 32K99, 46L99, 32C18, 46E15, 46E20. Library of Congress Control Number: 2006927565. ISBN-10: 0-8176-4374-5, ISBN-13: 978-0-8176-4374-4. e-ISBN-10: 0-8176-4514-4, e-ISBN-13: 978-0-8176-4514-4. Printed on acid-free paper.
©2007 Birkhäuser Boston. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA) and the author, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. 9 8 7 6 5 4 3 2 1
www.birkhauser.com
Preface
One of the fundamental ideas of mathematical analysis is the notion of a function; we use it to describe and study relationships among variable quantities in a system and transformations of a system. We have already discussed real functions of one real variable and a few examples of functions of several variables¹, but there are many more examples of functions that the real world, physics, natural and social sciences, and mathematics have to offer:

(a) not only do we associate numbers and points to points, but we associate numbers or vectors to vectors,
(b) in the calculus of variations and in mechanics one associates an energy or action to each curve $y(t)$ connecting two points $(a, y(a))$ and $(b, y(b))$:
$$\Phi(y) := \int_a^b F\big(t, y(t), y'(t)\big)\,dt$$
in terms of the so-called Lagrangian $F(t, y, p)$,
(c) in the theory of integral equations one maps a function into a new function
$$x(s) \mapsto \int_a^b K(s, \tau)\, x(\tau)\, d\tau$$
by means of a kernel $K(s, \tau)$,
(d) in the theory of differential equations one considers transformations of a function $x(t)$ into the new function
$$t \mapsto \int_a^t f\big(s, x(s)\big)\, ds,$$
where $f(s, y)$ is given.

¹
in M. Giaquinta, G. Modica, Mathematical Analysis. Functions of One Variable, Birkhäuser, Boston, 2003, which we shall refer to as [GM1], and in M. Giaquinta, G. Modica, Mathematical Analysis. Approximation and Discrete Processes, Birkhäuser, Boston, 2004, which we shall refer to as [GM2].
Figure 0.1. Vito Volterra (1860-1940) and the frontispiece of his Leçons sur les fonctions de lignes.
Of course all the previous examples are covered by the abstract setting of functions or mappings from a set X (of numbers, points, functions, ...) with values in a set Y (of numbers, points, functions, ...). But in this general context we cannot grasp the richness and the specificity of the different situations, that is, the essential ingredients from the point of view of the question we want to study. In order to continue to treat these specificities in an abstract context in mathematics, but also use them in other fields, we proceed by identifying specific structures and studying the properties that only depend on these structures. In other words, we need to identify the relevant relationships among the elements of X and how these relationships reflect on the functions defined on X. Of course we may define many intermediate structures. In this volume we restrict ourselves to illustrating some particularly important structures: that of a linear or vector space (the setting in which we may consider linear combinations), that of a metric space (in which we axiomatize the notions of limit and continuity by means of a distance), that of a normed vector space (that combines linear and metric structures), that of a Banach space (where we may operate linearly and pass to the limit), and finally, that of a Hilbert space (that allows us to operate not only with the length of vectors, but also with the angles that they form). The study of spaces of functions and, in particular, of spaces of continuous functions, originating in Italy in the years 1870-1880 in the works of, among others, Vito Volterra (1860-1940), Giulio Ascoli (1843-1896), Cesare Arzelà (1847-1912) and Ulisse Dini (1845-1918), is especially relevant in the previous context. A descriptive diagram is the following:
sets
  → metric spaces, vector spaces
    → linear normed spaces
      → Banach spaces
        → Hilbert spaces

Accordingly, this book is divided into three parts. In the first part we study the linear structure. In the first three chapters we discuss basic ideas and results, including Jordan's canonical form of matrices, and in the fourth chapter we present the spectral theorem for self-adjoint and normal operators in finite dimensions. In the second part, we discuss the fundamental notions of general topology in the metric context in Chapters 5 and 6, continuous curves in Chapter 7, and finally, in Chapter 8 we illustrate the notions of homotopy and degree, and Brouwer's and Borsuk's theorems with a few applications to the topology of $\mathbb{R}^n$. In the third part, after some basic preliminaries, we discuss in Chapter 9 the Banach space of continuous functions, presenting some of the classical fixed point theorems that play a relevant role in the solvability of functional equations and, in particular, of differential equations. In Chapter 10 we deal with the theory of Hilbert spaces and the spectral theory of compact operators. Finally, in Chapter 11 we survey some of the important applications of the ideas and techniques that we previously developed to the study of geodesics, nonlinear ordinary differential and integral equations and trigonometric series. In conclusion, this volume² aims at studying continuity and its implications both in finite- and infinite-dimensional spaces. It may be regarded as a companion to [GM1] and [GM2], and as a reference book for multi-dimensional calculus, since it presents the abstract context in which concrete problems posed by multi-dimensional calculus find their natural setting. Though this volume discusses more advanced material than [GM1,2], we have tried to keep the same spirit, always providing examples and

²
This book is a translation and revised edition of M. Giaquinta, G. Modica, Analisi Matematica, III. Strutture lineari e metriche, continuità, Pitagora Ed., Bologna, 2000.
exercises to clarify the main presentation, omitting several technicalities and developments that we thought to be too advanced and supplying the text with several illustrations. We are greatly indebted to Cecilia Conti for her help in polishing our first draft and we warmly thank her. We would like to also thank Fabrizio Broglia and Roberto Conti for their comments when preparing the Italian edition; Laura Poggiolini, Marco Spadini and Umberto Tiberio for their comments and their invaluable help in catching errors and misprints and Stefan Hildebrandt for his comments and suggestions, especially those concerning the choice of illustrations. Our special thanks also go to all members of the editorial technical staff of Birkhäuser for the excellent quality of their work and especially to Avanti Paranjpye and the executive editor Ann Kostant. Note: We have tried to avoid misprints and errors. But, as most authors, we are imperfect authors. We will be very grateful to anybody who wants to inform us about errors or just misprints or wants to express criticism or other comments. Our e-mail addresses are
[email protected]
[email protected]
We shall try to keep up an errata corrige at the following webpages: http://www.sns.it/~giaquinta http://www.dma.unifi.it/~modica
Mariano Giaquinta Giuseppe Modica Pisa and Firenze October 2006
Contents
Preface
Part I. Linear Algebra

1. Vectors, Matrices and Linear Systems
  1.1 The Linear Spaces $\mathbb{R}^n$ and $\mathbb{C}^n$
    a. Linear combinations
    b. Basis
    c. Dimension
    d. Ordered basis
  1.2 Matrices and Linear Operators
    a. The algebra of matrices
    b. A few special matrices
    c. Matrices and linear operators
    d. Image and kernel
    e. Grassmann's formula
    f. Parametric and implicit equations of a subspace
  1.3 Matrices and Linear Systems
    a. Linear systems and the language of linear algebra
    b. The Gauss elimination method
    c. The Gauss elimination procedure for nonhomogeneous linear systems
  1.4 Determinants
  1.5 Exercises

2. Vector Spaces and Linear Maps
  2.1 Vector Spaces and Linear Maps
    a. Definition
    b. Subspaces, linear combinations and bases
    c. Linear maps
    d. Coordinates in a finite-dimensional vector space
    e. Matrices associated to a linear map
    f. The space $\mathcal{L}(X, Y)$
    g. Linear abstract equations
    h. Changing coordinates
    i. The associated matrix under changes of basis
    j. The dual space $\mathcal{L}(X, \mathbb{K})$
    k. The bidual space
    l. Adjoint or dual maps
  2.2 Eigenvectors and Similar Matrices
    2.2.1 Eigenvectors
      a. Eigenvectors and eigenvalues
      b. Similar matrices
      c. The characteristic polynomial
      d. Algebraic and geometric multiplicity
      e. Diagonalizable matrices
      f. Triangularizable matrices
    2.2.2 Complex matrices
      a. The Cayley-Hamilton theorem
      b. Factorization and invariant subspaces
      c. Generalized eigenvectors and the spectral theorem
      d. Jordan's canonical form
      e. Elementary divisors
  2.3 Exercises

3. Euclidean and Hermitian Spaces
  3.1 The Geometry of Euclidean and Hermitian Spaces
    a. Euclidean spaces
    b. Hermitian spaces
    c. Orthonormal basis and the Gram-Schmidt algorithm
    d. Isometries
    e. The projection theorem
    f. Orthogonal subspaces
    g. Riesz's theorem
    h. The adjoint operator
  3.2 Metrics on Real Vector Spaces
    a. Bilinear forms and linear operators
    b. Symmetric bilinear forms or metrics
    c. Sylvester's theorem
    d. Existence of g-orthogonal bases
    e. Congruent matrices
    f. Classification of real metrics
    g. Quadratic forms
    h. Reducing to a sum of squares
  3.3 Exercises

4. Self-Adjoint Operators
  4.1 Elements of Spectral Theory
    4.1.1 Self-adjoint operators
      a. Self-adjoint operators
      b. The spectral theorem
      c. Spectral resolution
      d. Quadratic forms
      e. Positive operators
      f. The operators $A^*A$ and $AA^*$
      g. Powers of a self-adjoint operator
    4.1.2 Normal operators
      a. Simultaneous spectral decompositions
      b. Normal operators on Hermitian spaces
      c. Normal operators on Euclidean spaces
    4.1.3 Some representation formulas
      a. The operator $A^*A$
      b. Singular value decomposition
      c. The Moore-Penrose inverse
  4.2 Some Applications
    4.2.1 The method of least squares
      a. The method of least squares
      b. The function of linear regression
    4.2.2 Trigonometric polynomials
      a. Spectrum and products
      b. Sampling of trigonometric polynomials
      c. The discrete Fourier transform
    4.2.3 Systems of difference equations
      a. Systems of linear difference equations
      b. Power of a matrix
    4.2.4 An ODE system: small oscillations
  4.3 Exercises
Part II. Metrics and Topology

5. Metric Spaces and Continuous Functions
  5.1 Metric Spaces
    5.1.1 Basic definitions
      a. Metrics
      b. Convergence
    5.1.2 Examples of metric spaces
      a. Metrics on finite-dimensional vector spaces
      b. Metrics on spaces of sequences
      c. Metrics on spaces of functions
    5.1.3 Continuity and limits in metric spaces
      a. Lipschitz-continuous maps between metric spaces
      b. Continuous maps in metric spaces
      c. Limits in metric spaces
      d. The junction property
    5.1.4 Functions from $\mathbb{R}^n$ into $\mathbb{R}^m$
      a. The vector space $C^0(A, \mathbb{R}^m)$
      b. Some nonlinear continuous transformations from $\mathbb{R}^n$ into $\mathbb{R}^N$
      c. The calculus of limits for functions of several variables
  5.2 The Topology of Metric Spaces
    5.2.1 Basic facts
      a. Open sets
      b. Closed sets
      c. Continuity
      d. Continuous real-valued maps
      e. The topology of a metric space
      f. Interior, exterior, adherent and boundary points
      g. Points of accumulation
      h. Subsets and relative topology
    5.2.2 A digression on general topology
      a. Topological spaces
      b. Topologizing a set
      c. Separation properties
  5.3 Completeness
    a. Complete metric spaces
    b. Completion of a metric space
    c. Equivalent metrics
    d. The nested sequence theorem
    e. Baire's theorem
  5.4 Exercises

6. Compactness and Connectedness
  6.1 Compactness
    6.1.1 Compact spaces
      a. Sequential compactness
      b. Compact sets in $\mathbb{R}^n$
      c. Coverings and $\varepsilon$-nets
    6.1.2 Continuous functions and compactness
      a. The Weierstrass theorem
      b. Continuity and compactness
      c. Continuity of the inverse function
    6.1.3 Semicontinuity and the Fréchet-Weierstrass theorem
  6.2 Extending Continuous Functions
    6.2.1 Uniformly continuous functions
    6.2.2 Extending uniformly continuous functions to the closure of their domains
    6.2.3 Extending continuous functions
      a. Lipschitz-continuous functions
    6.2.4 Tietze's theorem
  6.3 Connectedness
    6.3.1 Connected spaces
      a. Connected subsets
      b. Connected components
      c. Segment-connected sets in $\mathbb{R}^n$
      d. Path-connectedness
    6.3.2 Some applications
  6.4 Exercises

7. Curves
  7.1 Curves in $\mathbb{R}^n$
    7.1.1 Curves and trajectories
      a. The calculus
      b. Self-intersections
      c. Equivalent parametrizations
    7.1.2 Regular curves and tangent vectors
      a. Regular curves
      b. Tangent vectors
      c. Length of a curve
      d. Arc length and $C^1$-equivalence
    7.1.3 Some celebrated curves
      a. Spirals
      b. Conchoids
      c. Cissoids
      d. Algebraic curves
      e. The cycloid
      f. The catenary
  7.2 Curves in Metric Spaces
    a. Functions of bounded variation and rectifiable curves
    b. Lipschitz and intrinsic reparametrizations
    7.2.1 Real functions with bounded variation
      a. The Cantor-Vitali function
  7.3 Exercises

8. Some Topics from the Topology of $\mathbb{R}^n$
  8.1 Homotopy
    8.1.1 Homotopy of maps and sets
      a. Homotopy of maps
      b. Homotopy classes
      c. Homotopy equivalence of sets
      d. Relative homotopy
    8.1.2 Homotopy of loops
      a. The fundamental group with base point
      b. The group structure on $\pi_1(X, x_0)$
      c. Changing base point
      d. Invariance properties of the fundamental group
    8.1.3 Covering spaces
      a. Covering spaces
      b. Lifting of curves
      c. Universal coverings and homotopy
      d. A global invertibility result
    8.1.4 A few examples
      a. The fundamental group of $S^1$
      b. The fundamental group of the figure eight
      c. The fundamental group of $S^n$, $n \ge 2$
    8.1.5 Brouwer's degree
      a. The degree of maps $S^1 \to S^1$
      b. An integral formula for the degree
      c. Degree and inverse image
      d. The homological definition of degree for maps $S^1 \to S^1$
  8.2 Some Results on the Topology of $\mathbb{R}^n$
    8.2.1 Brouwer's theorem
      a. Brouwer's degree
      b. Extension of maps into $S^n$
      c. Brouwer's fixed point theorem
      d. Fixed points and solvability of equations in $\mathbb{R}^{n+1}$
      e. Fixed points and vector fields
    8.2.2 Borsuk's theorem
    8.2.3 Separation theorems
  8.3 Exercises
Part III. Continuity in Infinite-Dimensional Spaces

9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
  9.1 Linear Normed Spaces
    9.1.1 Definitions and basic facts
      a. Norms induced by inner and Hermitian products
      b. Equivalent norms
      c. Series in normed spaces
      d. Finite-dimensional normed linear spaces
    9.1.2 A few examples
      a. The space $\ell_p$, $1 \le p < \infty$
      b. A normed space that is not Banach
      c. Spaces of bounded functions
      d. The space $\ell_\infty(Y)$
  9.2 Spaces of Bounded and Continuous Functions
    9.2.1 Uniform convergence
      a. Uniform convergence
      b. Pointwise and uniform convergence
      c. A convergence diagram
      d. Uniform convergence on compact subsets
    9.2.2 A compactness theorem
      a. Equicontinuous functions
      b. The Ascoli-Arzelà theorem
  9.3 Approximation Theorems
    9.3.1 Weierstrass and Bernstein theorems
      a. Weierstrass's approximation theorem
      b. Bernstein's polynomials
      c. Weierstrass's approximation theorem for periodic functions
    9.3.2 Convolutions and Dirac approximations
      a. Convolution product
      b. Mollifiers
      c. Approximation of the Dirac mass
    9.3.3 The Stone-Weierstrass theorem
    9.3.4 The Yosida regularization
      a. Baire's approximation theorem
      b. Approximation in metric spaces
  9.4 Linear Operators
    9.4.1 Basic facts
      a. Continuous linear forms and hyperplanes
      b. The space of linear continuous maps
      c. Norms on matrices
      d. Pointwise and uniform convergence for operators
      e. The algebra $\mathrm{End}(X)$
      f. The exponential of an operator
    9.4.2 Fundamental theorems
      a. The principle of uniform boundedness
      b. The open mapping theorem
      c. The closed graph theorem
      d. The Hahn-Banach theorem
  9.5 Some General Principles for Solving Abstract Equations
    9.5.1 The Banach fixed point theorem
      a. The fixed point theorem
      b. The continuity method
    9.5.2 The Caccioppoli-Schauder fixed point theorem
      a. Compact maps
      b. The Caccioppoli-Schauder theorem
      c. The Leray-Schauder principle
    9.5.3 The method of super- and sub-solutions
      a. Ordered Banach spaces
      b. Fixed points via sub- and super-solutions
  9.6 Exercises

10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
  10.1 Hilbert Spaces
    10.1.1 Basic facts
      a. Definitions and examples
      b. Orthogonality
    10.1.2 Separable Hilbert spaces and basis
      a. Complete systems and basis
      b. Separable Hilbert spaces
      c. Fourier series and $\ell_2$
      d. Some orthonormal polynomials in $L^2$
  10.2 The Abstract Dirichlet's Principle and Orthogonality
    a. The abstract Dirichlet's principle
    b. Riesz's theorem
    c. The orthogonal projection theorem
    d. Projection operators
  10.3 Bilinear Forms
    10.3.1 Linear operators and bilinear forms
      a. Linear operators
      b. Adjoint operator
      c. Bilinear forms
    10.3.2 Coercive symmetric bilinear forms
      a. Inner products
      b. Green's operator
      c. Ritz's method
      d. Linear regression
    10.3.3 Coercive nonsymmetric bilinear forms
      a. The Lax-Milgram theorem
      b. Faedo-Galerkin method
  10.4 Linear Compact Operators
    10.4.1 Fredholm-Riesz-Schauder theory
      a. Linear compact operators
      b. The alternative theorem
      c. Some facts related to the alternative theorem
      d. The alternative theorem in Banach spaces
      e. The spectrum of compact operators
    10.4.2 Compact self-adjoint operators
      a. Self-adjoint operators
      b. Spectral theorem
      c. Compact normal operators
      d. The Courant-Hilbert-Schmidt theory
      e. Variational characterization of eigenvalues
  10.5 Exercises

11. Some Applications
  11.1 Two Minimum Problems
    11.1.1 Minimal geodesics in metric spaces
      a. Semicontinuity of the length
      b. Compactness
      c. Existence of minimal geodesics
    11.1.2 A minimum problem in a Hilbert space
      a. Weak convergence in Hilbert spaces
      b. Existence of minimizers of convex coercive functionals
  11.2 A Theorem by Gelfand and Kolmogorov
  11.3 Ordinary Differential Equations
    11.3.1 The Cauchy problem
      a. Velocities of class $C^k(D)$
      b. Local existence and uniqueness
      c. Continuation of solutions
      d. Systems of higher order equations
      e. Linear systems
      f. A direct approach to Cauchy problem for linear systems
      g. Continuous dependence on data
      h. The Peano theorem
    11.3.2 Boundary value problems
      a. The shooting method
      b. A maximum principle
      c. The method of super- and sub-solutions
      d. A theorem by Bernstein
  11.4 Linear Integral Equations
    11.4.1 Some motivations
      a. Integral form of second order equations
      b. Materials with memory
      c. Boundary value problems
      d. Equilibrium of an elastic thread
      e. Dynamics of an elastic thread
    11.4.2 Volterra integral equations
    11.4.3 Fredholm integral equations in $C^0$
  11.5 Fourier's Series
    11.5.1 Definitions and preliminaries
      a. Dirichlet's kernel
    11.5.2 Pointwise convergence
      a. The Riemann-Lebesgue theorem
      b. Regular functions and Dini test
    11.5.3 $L^2$-convergence and the energy equality
      a. Fourier's partial sums and orthogonality
      b. A first uniform convergence result
      c. Energy equality
    11.5.4 Uniform convergence
      a. A variant of the Riemann-Lebesgue theorem
      b. Uniform convergence for Dini-continuous functions
      c. Riemann's localization principles
    11.5.5 A few complementary facts
      a. The primitive of the Dirichlet kernel
      b. Gibbs's phenomenon
    11.5.6 The Dirichlet-Jordan theorem
      a. The Dirichlet-Jordan test
      b. Fejér example
    11.5.7 Fejér's sums
A. Mathematicians and Other Scientists

B. Bibliographical Notes

C. Index
Mathematical Analysis Linear and Metric Structures and Continuity
Part I
Linear Algebra
William R. Hamilton (1805-1865), James Joseph Sylvester (1814-1897) and Arthur Cayley (1821-1895).
1. Vectors, Matrices and Linear Systems
The early developments of linear algebra, and related to it those of vectorial analysis, are strongly tied, on the one hand, to the geometrical representation of complex numbers and the need for more abstraction and formalization in geometry and, on the other hand, to the newly developed theory of electromagnetism. The names of William R. Hamilton (1805-1865), August Möbius (1790-1868), Giusto Bellavitis (1803-1880), Adhémar de Saint Venant (1797-1886) and Hermann Grassmann (1808-1877) are connected with the beginning of linear algebra, while J. Willard Gibbs (1839-1903) and Oliver Heaviside (1850-1925) established the basis of modern vector analysis, motivated by the then recent Treatise on Electricity and Magnetism by James Clerk Maxwell (1831-1879). The subsequent formalization is more recent and relates to the developments of functional analysis and quantum mechanics. Today, linear algebra appears as a language and a collection of results that are particularly useful in mathematics and in applications. In fact, most modeling, whether done via linear programming, ordinary or partial differential equations or control theory, can be treated numerically by computers only after it has been transformed into a linear system; in the end most of the modeling on computers deals with linear systems. Our aim here is not to present an extensive account; for instance, we shall ignore the computational aspects (error estimations, conditioning, etc.), despite their relevance, but rather we shall focus on illustrating the language and collecting a number of useful results in a wider sense. There is a strict link between linear algebra and linear systems. For this reason in this chapter we shall begin by discussing linear systems in the context of vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$.
1.1 The Linear Spaces $\mathbb{R}^n$ and $\mathbb{C}^n$

a. Linear combinations

Let $\mathbb{K}$ be the field of real numbers or complex numbers. We denote by $\mathbb{K}^n$ the space of ordered $n$-tuples of elements of $\mathbb{K}$,
$$\mathbb{K}^n := \Big\{ x \,\Big|\, x := (x^1, x^2, \dots, x^n),\ x^i \in \mathbb{K},\ i = 1, \dots, n \Big\}.$$
The elements of $\mathbb{K}^n$ are often called points or vectors of $\mathbb{K}^n$; in the latter case we think of a point in $\mathbb{K}^n$ as the end-point of a vector applied at the origin. In this context the real or complex numbers are called scalars, as they allow us to regard a vector at different scales. We can sum points of $\mathbb{K}^n$, or multiply them by a scalar $\lambda$, by summing their coordinates or multiplying the coordinates by $\lambda$:
$$x + y := (x^1 + y^1,\ x^2 + y^2,\ \dots,\ x^n + y^n), \qquad \lambda x := (\lambda x^1,\ \lambda x^2,\ \dots,\ \lambda x^n)$$
if $x = (x^1, x^2, \dots, x^n)$, $y = (y^1, y^2, \dots, y^n)$, $\lambda \in \mathbb{K}$. Of course for all $x, y, z \in \mathbb{K}^n$ and all $\lambda, \mu \in \mathbb{K}$ we have

o $(x + y) + z = x + (y + z)$, $x + y = y + x$,
o $\lambda(x + y) = \lambda x + \lambda y$, $(\lambda + \mu)x = \lambda x + \mu x$, $(\lambda\mu)x = \lambda(\mu x)$,
o if $0 := (0, \dots, 0)$, then $x + 0 = 0 + x = x$,
o $1 \cdot x = x$ and, if $-x := (-1)x$, then $x + (-x) = 0$.

We write $x - y$ for $x + (-y)$ and, from now on, the zero vector $(0, \dots, 0)$ will be simply denoted by $0$.

1.1 Example. If we identify $\mathbb{R}^2$ with the plane of geometry via a Cartesian system, see [GM1], the sum of vectors in $\mathbb{R}^2$ corresponds to the sum of vectors according to the parallelogram law, and the multiplication of $x$ by a scalar $\lambda$ to a dilatation by a factor $|\lambda|$, in the same sense of $x$ if $\lambda > 0$ or in the opposite sense if $\lambda < 0$.
1.2 About the notation. A list of vectors in $\mathbb{K}^n$ will be denoted by a lower index, $y_1, y_2, \dots, y_k$, and a list of scalars with an upper index, $\lambda^1, \lambda^2, \dots, \lambda^k$. The components of a vector $x$ will be denoted by upper indices. In connection with the product rows by columns, see below, it is useful to display the components as a column
$$x = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}.$$
However, since this is not very convenient typographically, if not strictly necessary, we shall write instead $x = (x^1, x^2, \dots, x^n)$. Given $k$ scalars $\lambda^1, \lambda^2, \dots, \lambda^k$ and $k$ vectors $y_1, y_2, \dots, y_k$ of $\mathbb{K}^n$, we may form the linear combination vector of $y_1, y_2, \dots, y_k$ with coefficients $\lambda^1, \lambda^2, \dots, \lambda^k$ given by
$$\sum_{j=1}^{k} \lambda^j y_j \in \mathbb{K}^n.$$
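To make the componentwise operations and linear combinations concrete, here is a minimal numerical sketch in Python with NumPy; the specific vectors and coefficients are invented for illustration and are not taken from the text.

    import numpy as np

    # Two vectors of K^n (here K = R, n = 3) and two scalars.
    y1 = np.array([1.0, 0.0, 2.0])
    y2 = np.array([0.0, 1.0, -1.0])
    lam1, lam2 = 3.0, -2.0

    # Componentwise sum and multiplication by a scalar.
    s = y1 + y2          # (1, 1, 1)
    m = lam1 * y1        # (3, 0, 6)

    # The linear combination sum_j lambda^j y_j.
    x = lam1 * y1 + lam2 * y2
    print(s, m, x)       # [1. 1. 1.] [3. 0. 6.] [ 3. -2.  8.]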
1.3 Definition. (i) We say that W C lKn is a linear subspace, or simply a subspace of lK n , if all finite linear combinations of vectors in W belong to W.
Figure 1.1. Giusto Bellavitis (1803-1880) and a page from his Nuovo metodo di geometria analitica.
(ii) Given a subset $S \subset \mathbb{K}^n$, we call the span of $S$ the subset of $\mathbb{K}^n$, denoted by $\mathrm{Span}\, S$, of all finite linear combinations of vectors in $S$. If $W = \mathrm{Span}\, S$, we say that the elements of $S$ span $W$, or that $S$ is a set of generators for $W$.
(iii) We say that $k$ vectors $v_1, v_2, \dots, v_k \in \mathbb{K}^n$ are linearly dependent if there exist scalars $\lambda^1, \lambda^2, \dots, \lambda^k$, not all zero, such that $\sum_{j=1}^{k} \lambda^j v_j = 0$.
(iv) If $k$ vectors $v_1, v_2, \dots, v_k$ are not linearly dependent, that is,
$$\sum_{j=1}^{k} \lambda^j v_j = 0 \quad\text{implies}\quad \lambda^1 = \lambda^2 = \dots = \lambda^k = 0,$$
then $v_1, v_2, \dots, v_k$ are called linearly independent.
(v) A subset $S \subset \mathbb{K}^n$ is a set of linearly independent vectors if any finite choice of vectors $v_1, v_2, \dots, v_k \in S$ is made of linearly independent vectors.
(vi) Let $W$ be a linear subspace of $\mathbb{K}^n$. A subset $S \subset W$ of linearly independent vectors that form a set of generators for $W$ is called a basis of $W$.
6
1. Vectors, Matrices and Linear Systems
( •.. l
............ IIIDdrut n
_ JlI.utt _
,...,.,..
D ••
w_
lit
All
~u
barycentrische Calcul
_'-' ......... ,......."..,.../,111 "'...... tr-M."
s..-v-..., (&lnIl.,..~.) t ~ , MIl
41111
c..d1.0',. -.)
.~_~
401
..................... -.-Npe.
BGIf'.m.lu.t
G. . . .
r...
........ ,."........ ..,...eW._ ,•.. ,, po\K'o ...... ............... __ ,.,...,..
·
-..-.......... ..
'" __
, ...... ,
· , r..,.a. . - . ......... _
.......·.·11eriI • 1..
.. r ~
~
-... uth,..
~
• •• 1.,,,,
I ••••
r.
-t-k ~ ~ •. 4MM
_
r'WI .., . . . . . . , . •
..
.
,.n.w..
....... ~ ..•. . _
~4.CIf
F.I: .;;..,..\. .
flilril
;+ ....
·• •
~
.-..""'~.
.... .......-. ...
r~.,..~,..."....
AalUt
, r.,dlIlUl.d lItlbh'
h.'..... , .. ,h ...... h ..
".I,.J•
O' . . . . . . ~ .................... ~ .... 1J. . . "~
_tl·"
,
•
.
..,.NlIIllptr .. ~ ............ ', ..
L. l ...101 , .. "
r
I
J, •
I" ,.
r •• I.. '''I~
Figure 1.2. A page from Memoire sur les sommes et les differences geometriques by Adhemar de Saint Venant (1797-1886) and the frontispiece of the Barycentrische Calcul by August Mobius (1790-1868).
(iv) If k vectors vI,
V2, ... , vk are linearly independent, then necessarily are distinct and not zero. Moreover, any choice of h, 1 :S h:S k, yields a linearly independent set. (v) Let S c JKn. Then W := SpanS is a linear subspace of JKn. More explicitly W = Span S if and only if for every w E W there exist kEN, scalars .V, .x 2 , ... , .x k E JK and vectors V1, V2, ... , Vk E S such that w = 2:7=1 .xjVj. VI, V2,""
Vk
b. Basis We shall now discuss the crucial notion of basis. 1.4 Definition. Let C c JKn. A subset [3 c C is a maximal set of linearly independent vectors of C if [3 is a set of linearly independent vectors and, for every wEe \ [3, the set [3 U {w} is not a set of linearly independent vectors. 1.5 Proposition. Let W be a linear subspace of JKn. [3 is a basis of W if and only if [3 is a maximal set of linearly independent vectors of W. Proof. Let 13 be a basis of W. 13 is a set of linearly independent vectors. Moreover, since for every w E W there are k vectors VI, V2, , Vk E 13 and k scalars 11- 1 , 11- 2 , . .• , I1- k such that w = L::7=ll1-jvk; the vectors VI, V2, , Vk, ware not linearly independent, i.e., 13 U {w} is not a set of linearly independent vectors. Conversely, suppose that 13 is a maximal set of linearly independent vectors. To prove that 13 is a basis, it suffices to prove that Span 13 = Wand, actually, that W C
1.1 The Linear Spaces IRn and
en
7
SpanB. If wE W, by assumption B U {w} is not a set of linearly independent vectors. Then there exist VI, V2, ... , Vk E B and scalars a, >J, >.2, ... , >.k such that k
aw + L>.iVi = O. i=l
On the other hand a hence
'I 0,
since otherwise
VI, V2, ... , Vk
would be linearly dependent,
1 k . w = - - L>"vi, a i=1
o
and w E Span B. Therefore W C Span B.
Using Zorn's lemma, see e.g., [GM2], one can show that every subspace of a linear space, including of course the linear space itself, has a basis. In the present situation, Proposition 1.5 allows us to select a basis of a subspace W by selecting a maximal set of linearly independent vectors in W. For instance, if W = Span {VI, V2, ... , Vk}, we select the first nonzero element, say WI := VI, then successively choose W2, W3, ... , Wh in the list (VI, V2, ... , Vk) so that W2 is not a multiple of VI, and by induction, Wj is not a linear combination of WI, W2,"" Wj-I. This is not a very efficient method, but it works. For more efficient methods, see Exercises 1.46 and 3.28. 1.6 'If. Define the notion of minimal set of generators and show that it is equivalent to the notion of basis.
c. Dimension We shall now show that all bases of a subspace W of lK n have the same number of elements, called the dimension of W, denoted by dim W, and that dim W S; n. 1.7 Theorem. We have the following.
wkl
be a basis of a subspace W C lKn and let VI, V2,"" v p be p vectors of W that are linearly independent with p < n. Then one can complete the list VI, V2,"" v p with n - p vectors among WI, W2, ... , Wk to form a new basis of W. (ii) k vectors of lKn are always linearly dependent if k > n. (iii) All bases of a subspace W C lK n have the same number of elements k and k S; n.
(i) Let
{WI, W2, ... ,
Proof. (i) We proceed by induction on p = 1,2, ... , n. Let p = 1 and VI '10. Since WI, W2, ... , Wk is a basis of W, we have VI
with at least one of the 'I 0, hence
xl
xi
=
xlWI
+ ... + xkWk
(Ll)
not zero. Without loss of generality, we can assume that
8
1. Vectors, Matrices and Linear Systems
and the vectors Vi, W2, W3, ... , Wk span W. The vectors Vi, W2, W3, ... , Wk are also independent. In fact, if V, ),, 2 , ... , )" n are such that k
)"lVl
+L
)"jWj = 0,
j=2
then (1.1) yields k
)"lxlwl
+ L(xi)"l + )"i)Wi
= 0,
i=2
of 0; consequently 2:J=2 )"jWj = 0, hence ),,2 = .. , = = o. Assume now the inductive hypothesis, that is, that we can choose n - p vectors of the basis Wi = (1,0, ... ,0), W2 = (0,1, ... ,0), ... , W n = (0,0, ... ,1), say Wp+l, ... , Wn, such that {Vi, ... , Vp , Wp+l, ... , w n } is a basis of W. Let us prove the claim for p + 1 vectors. Since Vp+ 1 is independent of Vi, V2, ... , Vp, we infer by the induction hypothesis that and this implies V = 0 since xl )"k
k
p
Vp+l = LxiVi i=l
+
L
(1.2)
yjWj
j=p+l
where at least one of the yi is not zero. Assuming, without loss of generality, that yp+l of 0, we have 1 p+l Vp+l -
Wp+l:= Y
yj
k
L
P
p+l Wj -
j=p+2 Y
xi
L p+l Vi, i=l Y
and the vectors Vl, ... , Vp+l, Wp+2, ... , Wk span W. Let us finally show that these vectors are also independent. If k
p
L
i=l
)"i Vi
+ )"p+lvp+l +
)"jWj = 0,
L j=p+2
(1.2) yields k
p
L()"i
i=l
+ )"p+lxi)Vi + )"p+lyp+lwp+l +
L
()"j
+ )"p+lyj)Wj
= O.
j=p+2
Since {Vi, ... , vp, Wp+l, ... , w n } is a basis of W, because of the induction assumption, and yp+l of 0 by construction, we conclude that )"p+l = 0, and consequently )"i = 0 for all indices i. (ii) Assume that the Vi, V2, , Vk E ]](n are independent and k > n. By (i) we can complete the basis {Wi, W2, , w n } of ]](n to form a basis of Span {Vi, V2, ... , vd with k elements; this is a contradiction since {el, e2, ... , en} is already a basis of ]](n, hence a maximal system of linearly independent vectors of ]](n. (iii) follows as (ii). Let us prove that two bases of W have the same number of elements. Suppose that {Vi, V2, ... , v p } and {el, e2, ... , ed are two bases of W with p < k. By (i) we may complete Vi, V2, , v p with k - p vectors chosen among el, e2, ... , ek to form a new basis {Vi, V2, , vp, e p+ 1, ... , ek} of W; but this is a contradiction since {el, e2, ... , e p} is already a basis of W, hence a maximal system of linearly independent vectors of W, see Proposition 1.5. Similarly, and we leave it to the reader, one can prove that k ~ n. 0
1.8 Definition. The number of the elements of a (all) basis of a linear subspace W of JKn is called the dimension of Wand denoted by dim W.
1.1 The Linear Spaces IR n and
en
9
1.9 Corollary. The linear space JKn has dimension nand, ifW is a linear subspace of JK n , then dim W :::; n. Moreover,
(i) there are k linearly independent vectors Vl, V2, (ii) a set of k linearly independent vectors Vl, V2,
E W, E W is always
, Vk , Vk
a basis ofW, Vl, V2, ... , v p E W with p > k are always linearly dependent, (iv) ifvl' V2, ... , v p are p linearly independent vectors ofW, then p:::; k, (v) for every subspace V C JKn such that V C W we have dim V :::; k, (vi) let V, W be two subspaces of JKn; then V = W if and only if V C W and dim V = dim W.
(iii) any p vectors
1.10
-,r.
Prove Corollary 1.9.
d. Ordered basis Until now, a basis S of a linear subspace W of JKn is just a finite set of linearly independent generators of W; every x E W is a unique linear combination of the basis elements. Here, uniqueness means uniqueness of the value of each coefficient in front of each basis element. To be precise, one would write x= l:A(v)V. YES
It is customary to index the elements of S with natural numbers, i.e., to consider S as a list instead of as a set. We call any list made with the elements of a basis S an ordered basis. The order just introduced is then used to link the coefficients to the corresponding vectors by correspondingly indexing them. This leads to the simpler notation k
x= l:Aivi i=l
we have already tacitly used. Moreover,
1.11 Proposition. Let W be a linear subspace ofJKn of dimension k and let (Vl' V2, ... , Vk) be an ordered basis of W. Then for every x E W there is a unique vector A E JK k , A:= (A l , A2 , ... , Ak ) such that x = I:~=l Aivi. 1.12 Example. The list (el, e2, , en) of vectors of IK n given by el := (1,0, e2 := (0,1, ... ,0), ... , en = (0,0, ,1) is an ordered basis oflK n . In fact el, e2, are trivially linearly independent and span IK n since
,0), , en
for all x E IK n . (el, e2, ... , en) is called the canonical or standard basis of IK n . We shall always think of the canonical basis as an ordered basis.
10
1. Vectors, Matrices and Linear Systems
1.2 Matrices and Linear Operators Following Arthur Cayley (1821-1895) we now introduce the calculus of matrices. An m x n matrix A with entries in X is an ordered table of elements of X arranged in m rows and n columns. It is customary to index the rows from top to bottom from 1 to m and the columns from left to right from 1 to n. If {an denotes the element or entry in the ith row and the jth column, we write
or
A
A=[a;], i =1, ... ,m, j =1, ... ,n.
A;
A,
Given a matrix we write for the entry (i,j) of and denote the set of matrices with m rows and n columns with Mm,n(X), Usually X will be the field of real or complex numbers lK, but a priori one allows other entries.
1.13 Remark (Notation). The common agreement about the indices of the elements of a matrix is that the row is determined by the upper index and the column by the lower index. Later, we shall also consider matrices with two lower indices or two upper indices, A = [aij] or A = [a ij ]. In both cases the agreement is that the first index identifies the row. These agreements turn out to be particularly useful to keep computation under control. The three different types of notation correspond to different mathematical objects represented by those matrices. But for the moment we shall not worry about this. If A
= [a;]
E
Mp,n and B = [b;] E Mp,m are two matrices with the
same number of rows p, we denote by [A I B], or by
(0 0),
matrix with p rows and (n + m) columns defined by
[AIB]:=(0
~~ ("'~! .
aP1
a 21 a 22
1 an 2 an
b11 b21
b21
b~
b~ b )
aP2
aPn
bf
~
~
2
m
or shortly by
[AIB];:= {:lJ-n Similarly, given A
= [a;]
ifl:S;;j:S;;n
i
ifn+l:S;;j:S;;n+m
E Mp,n and B
= [b;]
=1, ... ,po
E Mq,n, we denote by
the
1.2 Matrices and Linear Operators
11
the (p + q) x n matrix C = [c~] defined by if 1 ::; i ::; p, if p + 1 ::; i ::; p + q.
a. The algebra of matrices Two matrices A := [a~], B = [b~] in Mm,n(IK) can be summed by setting i
= 1, ... ,m,
j
= 1, ... ,n.
Moreover, one can multiply a matrix A E Mm,n(IK) by a scalar A E K by setting that is, each entry of AA is the corresponding entry of A multiplied by A. Notice that the sum of matrices is meaningful if and only if both matrices have the same number of rows and columns. Putting the rows one after the other, we can identify Mm,n(K) with Knm as a set, and the operations on the matrices defined above correspond to the sum and the multiplication by a scalar in Knm. Thus Mm,n(K), endowed with the two operations (A, B) --t A + B and (A, A) --t AA, is essentially Knm. A basis for Mm,n(K) is the set of m x n matrices {I)} where I~ has entries 1 at the (i, j) position and zero otherwise.
1.14 Definition (Product of matrices). If the number of rows of A is the same as the number of the columns of B, A = [a~] E Mp,n(K), B = [b~] E Mn,q(K), we define the product matrix AB E Mp,q by setting n
where
i Cj
" ' akbj' i k = '~
k=l
Notice that if (ai, a~, ... , a~) is the ith row of A and (b], b;, ... , b'J) is the jth column of B, then
12
1. Vectors, Matrices and Linear Systems
where (AB)ij ..=
= ali bj1 + a2i b2j + . . . + ani bnj '
Cji
For this reason the product of matrices is called the product rows by columns. It is easily seen that the product of matrices is associative and distributive i.e., (AB)C = A(BC) =: ABC,
A(B
+ C) =
AB + AC
but, in general, it is not commutative, AB =I- BA, as simple examples show. Indeed, we may not be able to form BA even if AB is meaningful. Moreover, AB may equal 0, 0 being the matrix with zero in all entries, although A =I- 0 and B =I- o.
b. A few special matrices For future purposes, it is convenient to single out some special matrices. A square n x n matrix A with nonzero entries only on the principal diagonal is called a diagonal matrix and denoted by
o in short, A is diagonal iff A = [aJ] , a; := Aj symbol,
. {I
8'·
'=
J .
0
8; where 8; is the Kronecker
ifi=j if i =I- j.
The n x n matrix Id n := diag (1, ... ,1) = [8}] is called the identity matrix since for every A E Mp,n(JK) and B E Mn,q we have Ald n = A, IdnB = B. We say that A = [a;] is upper triangular if aJ = 0 for all (i,j) with i > j and lower triangular if aJ = 0 for all (i, j) with j < i. 1.15 Definition. We say that a n x n square matrix A E Mn,n(JK) is invertible if there exists B E Mn,n(lK) such that AB = Id n and BA = Id n . Since the inverse is trivially unique, we call B the inverse of A and we denote it by A- 1 .
1.2 Matrices and Linear Operators
13
1.16~. Show that an upper (lower) triangular matrix A = raj] E Mn,n(lK) is invertible if and only if ai =I 0 Vi = 1, ... , n. Show that, if A is invertible, A -1 is upper (lower) triangular.
1.17 Definition. LetA = [a~] E Mm,n(lK). The transpose AT of A is the matrix AT := [b~] E Mn,m(lK) where b~ = a{ Vi = 1, ... , n, Vj = 1, ... , m. We obtain AT from A by exchanging rows with columns, that is, writing the successive columns of A from left to right as successive rows from top to bottom. It is easily seen that
(i) (AT)T = A, (ii) (AA + p,Bf = AAT + p,B T VA,p, E OC, (iii) (ABf = B T AT VA,B, (iv) A is invertible if and only if AT is invertible and (A -1 f = (A T )-l. In particular, in the context of matrices with one upper and one lower index, the transposition operation exchanges upper and lower indices; thus in the case of row- and column-vectors we have
c. Matrices and linear operators A map A : OCn ----. OCm is said to be linear if
A(AX + J-lY)
= AA(x) + p,A(y) In particular A(O) = O. By induction it is easily seen that A is linear if and only if
A(
k
k
j=l
j=l
L AjVj) = L Aj A(vj)
for any k = 1,2,3, ... , for any V1, Vz, ... , Vk E oc n and scalars A1 , ... , Ak . Linear maps from OCn into OCm and m x n matrices are in one-t
----.
OC m be a linear map. Then the matrix
(1.3) where (e1, ez, ... , en) is the canonical basis of OC n , is the unique matrix such that
A(x) = Ax, x = (Xl, x Z , ... , x n ). (1.4) Conversely, if A E Mn,m(OC), then the linear map in (1.4) is a linear map from OCn into OCm .
14
1. Vectors, Matrices and Linear Systems
Proof. Assuming A is defined by (1.3), we have for all x = (Xl, n
A(x) = A(:~:>iei) i=l
2 X , ... ,
xn) E lK n
n
= LxiA(ei) = Ax. i=l
Actually, A is characterized by (1.4), since if A(x) = Bx 'Ix, then A(ei) = Bei Vi = 1, ... ,n, hence A and B have the same columns. Conversely, given A E Mm,n(lK), it is trivial to check that the map x --+ Ax is a 0 linear map from lK n into lK m .
1.19 Remark. The map A -> A that relates linear operators and matrices is tied to the (ordered) canonical basis of oc n and ocm . If A and A are related by (1.4), we refer to A and A respectively as the linear map associated to the matrix A and the matrix associated to the linear map A. If we denote by al, a2, ... , an the columns of A indexed from the left so that A = [al Ia2! ... Ian], then n
A(x)
= Ax = L Xi~,
I
2
X= ( X , X , ... ,X
n) ,
(1.5)
i=l
that is, for every x = (X!, x 2, ... , x n ), A (x) is the linear combination of vectors al, a2, ... , an of ocm with scalars Xl, x 2 , ... , x n as coefficients. Observe that A(el) = al, ... , A(en ) = an, where (el,e2, ... ,en ) is the canonical basis of OC n . 1.20 Proposition. Under the correspondence (1.4) between matrices and linear maps, the sum of two matrices corresponds to the sum of the associated operators, and the product of two matrices corresponds to the composition product of the associated operators. Proof. (i) Let A,B E Mm,n(lK) and let A(x) := Ax and B(x) := Bx. Then we have (A + B)(x) := A(x) + B(x) = Ax + Bx = (A + B)x. (ii) Let A E Mm,n(K), B E Mp,m(lK), A(x) := Ax and B(y) := By 'Ix E lK n ,
Vy E lKm . Then
(B
0
A)(x)
= B(A(x»
= B(A(x»
= B(Ax) = (BA)x. o
1.21 'If. Give a few examples of 2 x 3 and 3 x 2 matrices, their sums and their products (whenever this is possible). Show examples for which AB op BA and AB = 0 although A op 0 and B op O. Finally, show that Ax = 0 'Ix E lK n implies A = O. [Hint: Compare Exercises 1.76, 1.79 and 1.81.] 1.22
'If. Show that
--+
= ax + by 'Ix, y E JR..
JR. is linear if and only if there exist a, b E JR. such that
1.23 'If. Show that
1.2 Matrices and Linear Operators
15
Figure 1.3. Some linear transformations of the plane. In the figure the possible images of the square [0,1] x [O,lJ are in shadow.
d. Image and kernel Let A E Mm,n(JK) and let A(x) := Ax, x E OCn be the linear associated operator. The kernel and the image of A (or A) are respectively defined by ker A = ker A := {x E OC n IA(x) =
ImA = ImA:= {Y E OC m
o},
I:J x E OCn
such that A(x) =
y}.
Trivially, ker A is a linear subspace of the source space OC n , and it easy to see that the following three claims are equivalent: (i) A is injective, (ii) ker A = {O}, (iii) aI, a2, ... , an are linearly independent in OCm . If one of the previous claims holds, we say that A is nonsingular, although in the current literature nonsingular usually refers to square matrices. Also observe that A may be nonsingular only if m ::::: n. Also 1m A = 1m A is a linear subspace of the target space OCm , and by definition 1m A = Span {aI, a2, ... , an}. The dimension ofIm A = 1m A is called the rank of A (or of A) and is denoted by Rank A (or Rank A). By definition Rank A is the maximal number of linearly independent columns of A, in particular RankA :::; min(n, m). Moreover, it is easy to see that the following claims are equivalent
16
1. Vectors, Matrices and Linear Systems
(i) A is surjective, (ii) 1m A = ][{m, (iii) Rank A = m. Therefore A may be surjective only if m ::; n. The following theorem is crucial.
1.25 Theorem (Rank formula). For every matrix A E Mm,n(][{) we have dim 1m A = n - dimker A. Proof. Let (VI, V2, ... , Vk) be a basis of ker A. According to Theorem 1.7 we can choose (n - k) vectors ek+I, ... , en of the standard basis of OC n in such a way that VI, V2, ... , Vk, ek+lo ... ,en form a basis of OC n . Then one easily checks that (A(ek+I),'" ,A(en)) is a basis of ImA, thus concluding that dim 1m A = n - k. 0
A first trivial consequence of the rank formula is the following. 1.26 Corollary. Let A E Mm,n(][{)' (i) If m < n, then dimker A> a. (ii) If m ;::: n, then A is nonsingular, z.e., ker A = {a}, if and only if Rank A is maximal, Rank A = n. (iii) If m = n, i. e., A is a square matrix, then the following two equivalent claims hold: a) Let A(x) := Ax be the associated linear map. Then A is surjective if and only if A is injective. b) Ax = b is solvable for any choice of b E ][{m if and only if A(x) = a has zero as a unique solution. Proof. (i) From the rank formula we have dim ker A = n - dim 1m A 2: n - m
> O.
(ii) Again from the rank formula, dim 1m A = n - dim ker A = n = min(n, m). (iii) (a) Observe that A is injective if and only if ker A = {O}, equivalently if and only if dim ker A = 0, and that A is surjective if and only if 1m A = OC m , i.e., dim 1m A = m = n. The conclusion follows from the rank formula.
o
(iii) (b) The equivalence between (iii) (a) and (iii) (b) is trivial.
Notice that (i) and (ii) imply that A : surjective only if n = m. 1.27~.
][{n ----; ][{m
may be injective and
Show the following.
Proposition. Let A E Mn,n (OC) and A(x) := Ax. The following claims are equivalent:
(i) (ii) (iii) (iv) (v) (vi)
A is injective and surjective, A is nonsingular, i.e., ker A = {O}, A is surjective, there exists B E Mn,n(OC) such that BA = Id n , there exists B E Mn,n(OC) such that AB = Id n , A is invertible, i.e., there exists a matrix BE Mn,n(OC) such that BA Id n .
=
AB
=
1.2 Matrices and Linear Operators
17
An important and less trivial consequence of the rank formula is the following. 1.28 Theorem (Rank of the transpose). Let A E Mm,n. Then we have
(i) the maximum number of linearly independent columns and the maximum number of linearly independent rows are equal, z.e., RankA = Rank AT,
(ii) let p
:= RankA. Then there exists a nonsingular p x p square submatrix of A.
Proof. (i) Let A = [a;J, let aI, a2, ... , an be the columns of A and let p:= RankA. We assume without loss of generality that the first p columns of A are linearly independent and we define B as the mxp submatrix formed by these columns, B := [al I a21 ... lap]. Since the remaining columns of A depend linearly on the columns of B, we have Vk
= 1, ... , m,
Vj
= p + 1, ... , n
for some R = [r;] E Mp,n-p(lK). In terms of matrices,
hence Taking the transposes, we have AT E Mn,m(lK), BT E Mp,m(lK) and
(1.6)
Since [Id p IR]T is trivially injective, we infer that ker AT = ker B T , hence by the rank formula Rank AT = m - dimkerA T = m - dim ker B T = RankB T , and we conclude that Rank AT
= RankB T
::; min(m,p)
= p = RankA.
Finally, by applying the above to the matrix AT, we get the opposite inequality Rank A = Rank (AT)T ::; Rank AT, hence the conclusion. (ii) With the previous notation, we have Rank B T = Rank B = p. Thus B has a set of p independent rows. The submatrix S of B made by these rows is a square p X P matrix with RankS = RankS T = p, hence nonsingular. 0 1.29 -,r. Let A E Mm.n(lK), let A(x) := Ax and let (VI, V2, ... , v n ) be a basis of lK n . Show the following: (i) A is injective if and only if the vectors A(vI), A(V2), .. . , A(v n ) of lK m are linearly independent, (ii) A is surjective if and only if {A(VI),A(V2)"" ,A(vn )} spans lKm , (iii) A is bijective iff {A(VI),A(V2)' ... ,A(vn )} is a basis of lK m .
18
1. Vectors, Matrices and Linear Systems
e. Grassmann's formula Let U and V be two linear subspaces of OC n . Clearly, both U n V and U
+V
:= {x E OC
n
Ix = u + v for some u E U and v E V}
are linear subspaces of OC n . When U n V = {O}, we say that U + V is the direct sum of U and V and we write U EB V for U + V. If moreover U EB V = OCn , we say that U and V are supplementary subspaces. The following formula is very useful. 1.30 Proposition (Grassmann's formula). Let U and V be linear subspaces of OC n . Then dim(U
+ V) + dim(U n V) = dim U + dim V.
Proof. Let (Ul' U2,.' ., Uh) and (Vi, V2, ... , Vk) be two bases of U and V respectively. The vectors Ul, U2, ... , Uh, Vi, V2, ... , Vk span U + V, and a subset of them form a basis of U + V. In particular, dim(U + V) = RankL where L is the n x (h + k) matrix defined by L :=
[Ul
Moreover, a vector x = Z=~l such that
I.. ,I I- I... IUh
XiUi
E
OC n
Vi
is in
un V
Vk] .
if and only if there exist unique
yl, y2, ... , yk
x = X1Ul
+ ... xhUh
thus, if and only if the vector w longs to ker L. Consequently, the
= ylvl
+ ... + yk vk ,
:= (_xl, _x 2 , ... , _x h , yl, y2, .. . , yk) linear map > : OC h + k --; OCn ,
E OC h + k be-
h
>(x,y) :=
2::xiUi i=l
is injective and surjective from ker L onto Un V. It follows that dim(U n V) = dim ker L and, by the rank formula,
dim(U
n V) + dim(U + V)
= dim ker L + RankL = h + k = dimU + dim V. o
1.31 ,. Notice that the proof of Grassmann's formula is in fact a procedure to compute two bases of U + V and un V starting from two bases of U and V. The reader is invited to choose two subspaces U and V of ocn and to compute the basis of U + V and of Unv.
f. Parametric and implicit equations of a subspace 1.32 Parametric equation of a straight line in ocn . Let a 1= 0 and let q be two vectors in OC n . The parametric equation of a straight line through q and direction a is the map r : OC --; OC n given by r(>..) := >"a + q, >.. E IK. The image of r { x E lR,n 13>" such that x = >"a + q} is the straight line through q and direction a.
1.2 Matrices and Linear Operators
19
Figure 1.4. Straight line through q and direction a.
We have r(O) = q and r(l) = a + q. In other words, r(t) passes through q and a + q. Moreover, x is on the straight line passing through q and a + q if and only if there exists t E lK such that x = t a + b, or, more explicitly
XI=ta l + ql , x 2 =ta2 +q2, (1.7)
!
x n = tan
+ qn. := ta + q
In kinematics, lK = jR and the map t -+ r(t) gives the position at time t of a point moving with constant vector velocity a starting at q at time t = 0 on the straight line through q and a + q.
1.33 Implicit equation of a straight line in lK n • We want to find a representation of the straight line (1.7) which makes no use of the free parameter t. Since a i= 0, one of its components is nonzero. Assume for instance a l i= 0, we can solve the first equation in (1.7) to get t = (ql - xl )ja l and, substituting the result into the last (n - 1) equations, we find a system of (n - 1) constraints on the variable x = (xl, x 2 , ..• , x n ) E lKn ,
x3
l (ql_x )
a2 + q2 , = (ql-x l )a3 +q3 ,
X2 =
!
xn =
~
~
~an+qn. a
The previous linear system can be written as A(x - q) the matrix defined by -1
0 -1
0 0
-a~jal
0 0
0
-1
_an ja l
0
0
0
[ -a'ja' 3 l -a ja
A
=
=0
where A E Mn-l,n(lK) is
~l
1.34'. Show that there are several parametric equations of a given straight line. A parametric equation of the straight line through a and b E jRn is given by t -+ r(t) := a + t(b - a), t E lR. 1.35 Parametric and implicit equations of a 2-plane in lK 3 • Given two linearly independent vectors VI, V2 in jR3 and a point q E jR3, we call the parametric equation
20
1. Vectors, Matrices and Linear Systems
of the plane directed by VI, V2 and passing through q, the map 'I' : by '1'((0, ,8» := OVI + ,8v2 + q, or in matrix notation
I
'1'((0, ,8» = [VI V2] (;)
nc2 ~ nc3 d.efined
+ q.
Of course 'I' is linear iff q = 0. The 2-plane determined by this parametrization is defined by II:= 1m 'I' = Suppose VI
= (a, b, c)
and V2
{x E R.3Ix- q E ImA}.
= (d, e, f)
so that
Because of Theorem 1.28, there is a nonsingular 2 x 2 submatrix B of A and, without loss of generality, we can suppose that B =
(~ ~). We can then solve the system
°
in the unknown (0, ,8), thus finding and ,8 as linear functions of xl - ql and x 2 q2. Then, substituting into the third equation, we can eliminate (0,,8) from the last equation, obtaining an implicit equation, or constraint, on the independent variables, of the form r (xl _ ql) + S (x 2 _ q2) + t (x 3 _ q3) = 0, that describes the 2-plane without any further reference to the free parameters (0, ,8).
More generally, let W be a linear subspace of dimension k in JKn, also called a k-plane (through the origin) of JKn. If VI, V2, ... , Vk is a basis of W, we can write W = 1m L where
We call x -> L(x) := Lx the parametric equation of W generated by (VI, V2, ... , Vk). Of course a different basis of W yields a different parametrization. We can also write any subspace W of dimension k as W = ker A where A E Mn-k,n(JK). We call it an implicit representation of W. Notice that since ker A = W, we have Rank AT = Rank A = n - k by Theorem 1. 28 and the rank formula. Hence the rows of A are n - k linearly independent vectors of JKn. 1.36 Remark. A k-dimensional subspace of JKn is represented by means of k free parameters, i.e., the image of JKk through a nondegenerate parametric equation, or by a set of independent (n - k) constraints given by linearly independent scalar equations in the ambient variables.
1.2 Matrices and Linear Operators
21
1.37 Parametric and implicit representations. One can go back and forth from the parametric to the implicit representation in several ways. For instance, start with W = Im L where L ∈ M_{n,k}(K) has maximal rank, Rank L = k. By Theorem 1.28 there is a k × k nonsingular submatrix M of L. Assume that M is made by the first k rows of L, so that

\[
L = \begin{pmatrix} M\\ N \end{pmatrix}, \qquad N \in M_{n-k,k}(\mathbb{K}).
\]

Writing x as x = (x', x'') with x' ∈ K^k and x'' ∈ K^{n-k}, the parametric equation x = Lt, t ∈ K^k, writes as

\[
\begin{cases}
x' = Mt,\\
x'' = Nt.
\end{cases}
\tag{1.8}
\]

As M is invertible, t = M^{-1} x'. We then conclude that x ∈ Im L if and only if N M^{-1} x' = x''. The latter is an implicit equation for W, which we may write as Ax = 0 if we define A ∈ M_{n-k,n}(K) by

\[
A = \big[\, N M^{-1} \;\big|\; -\mathrm{Id}_{n-k}\,\big].
\]

Conversely, let W = ker A where A ∈ M_{n-k,n}(K) has Rank A = n - k. Select n - k independent columns, say the first n - k on the left, call B ∈ M_{n-k,n-k}(K) the square matrix made by these columns, and split x as x = (x', x'') where x' ∈ K^{n-k} and x'' ∈ K^k. Thus Ax = 0 rewrites as

\[
\big[\, B \;\big|\; C\,\big]\begin{pmatrix} x'\\ x''\end{pmatrix} = 0, \qquad\text{or}\qquad Bx' + Cx'' = 0.
\]

As B is invertible, the last equation rewrites as x' = -B^{-1}Cx''. Therefore x ∈ ker A if and only if

\[
x = \begin{pmatrix} -B^{-1}C\\ \mathrm{Id}_{k} \end{pmatrix} x'' =: L x'',
\]

i.e., W = Im L.
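The back-and-forth between the two representations can be illustrated numerically. The following NumPy sketch is ours (the function names are ad hoc); for simplicity it assumes that the leading square blocks M and B are already nonsingular, rather than first selecting suitable rows or columns as in the text.

```python
import numpy as np

def parametric_to_implicit(L):
    """Given L (n x k, rank k) with invertible top k x k block M,
    return A ((n-k) x n) with ker A = Im L, as A = [N M^{-1} | -Id]."""
    n, k = L.shape
    M, N = L[:k, :], L[k:, :]
    return np.hstack([N @ np.linalg.inv(M), -np.eye(n - k)])

def implicit_to_parametric(A):
    """Given A ((n-k) x n, full row rank) with invertible left block B,
    return L (n x k) with Im L = ker A, as L = [[-B^{-1}C], [Id]]."""
    m, n = A.shape              # m = n - k
    B, C = A[:, :m], A[:, m:]
    return np.vstack([-np.linalg.inv(B) @ C, np.eye(n - m)])

# a 2-plane in R^4 spanned by two columns
L = np.array([[1., 0.], [0., 1.], [2., 3.], [1., -1.]])
A = parametric_to_implicit(L)
print(np.allclose(A @ L, 0))                      # True: Im L lies in ker A
L2 = implicit_to_parametric(A)
print(np.linalg.matrix_rank(np.hstack([L, L2])))  # 2: both describe the same 2-plane
```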
1.3 Matrices and Linear Systems

a. Linear systems and the language of linear algebra

Matrices and linear operators are strongly tied to linear systems. A linear system of m equations and n unknowns has the form
\[
\begin{cases}
a^1_1 x^1 + a^1_2 x^2 + \cdots + a^1_n x^n = b^1,\\
a^2_1 x^1 + a^2_2 x^2 + \cdots + a^2_n x^n = b^2,\\
\;\;\vdots\\
a^m_1 x^1 + a^m_2 x^2 + \cdots + a^m_n x^n = b^m.
\end{cases}
\tag{1.9}
\]

The m-tuple (b^1, ..., b^m) is the given right-hand side, the n-tuple (x^1, ..., x^n) is the unknown and the numbers {a^i_j}, i = 1, ..., m, j = 1, ..., n, are given and called the coefficients of the system. If we think of the coefficients as the entries of a matrix A,

\[
A := [a^i_j] = \begin{pmatrix}
a^1_1 & a^1_2 & \cdots & a^1_n\\
a^2_1 & a^2_2 & \cdots & a^2_n\\
\vdots & & & \vdots\\
a^m_1 & a^m_2 & \cdots & a^m_n
\end{pmatrix},
\tag{1.10}
\]

and we set b := (b^1, b^2, ..., b^m) ∈ K^m, x := (x^1, x^2, ..., x^n) ∈ K^n, then the system can be written in a compact way as

\[
Ax = b.
\tag{1.11}
\]

Introducing the linear map A(x) := Ax, (1.9) can be seen as a functional equation

\[
A(x) = b
\tag{1.12}
\]

or, denoting by a_1, a_2, ..., a_n the n columns of A indexed from left to right, as

\[
x^1 a_1 + x^2 a_2 + \cdots + x^n a_n = b.
\tag{1.13}
\]

Thus the discussions of linear systems, linear independence, matrices and linear maps are essentially the same, in different languages. The next proposition collects these equivalences.

1.38 Proposition. With the previous notation we have:
(i) Ax is a linear combination of the columns of A.
(ii) The following three claims are equivalent:
  a) the system (1.11), or (1.9), is solvable, i.e., there exists x ∈ K^n such that Ax = b;
  b) b is a linear combination of a_1, a_2, ..., a_n;
  c) b ∈ Im A.
(iii) The following claims are equivalent:
  a) Ax = b has at most one solution,
  b) Ax = 0 implies x = 0,
  c) A(x) = 0 has a unique solution,
  d) ker A = {0},
  e) a_1, a_2, ..., a_n are linearly independent.
(iv) ker A is the set of all solutions of the system Ax = 0.
(v) Im A is the set of all b's such that the system Ax = b has at least one solution.
(vi) Let x_0 ∈ K^n be a solution of Ax_0 = b. Then the set of all solutions of Ax = b is the set

x_0 + ker A := {x ∈ K^n | x - x_0 ∈ ker A}.

With the previous notation, we see that b is linearly dependent on a_1, a_2, ..., a_n if and only if

Rank [a_1 | a_2 | ... | a_n] = Rank [a_1 | a_2 | ... | a_n | b].
Thus from Proposition 1.38 (ii) we infer the following.

1.39 Proposition (Rouché–Capelli). With the previous notation, the system (1.9) or (1.11) is solvable if and only if

Rank A = Rank [A | b].

The m × (n + 1) matrix

[A | b] := [a_1 | a_2 | ... | a_n | b]

is often called the complete matrix of the system (1.9).

1.40 ~. Prove all claims in this section.
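As a quick numerical illustration of the Rouché–Capelli criterion, here is a two-line check of ours using NumPy's rank routine (not a procedure from the text).

```python
import numpy as np

def is_solvable(A, b):
    """Rouche-Capelli test: Ax = b is solvable iff Rank A = Rank [A | b]."""
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

A = np.array([[1., 2.], [2., 4.]])
print(is_solvable(A, np.array([3., 6.])))   # True:  b lies in the span of the columns
print(is_solvable(A, np.array([3., 5.])))   # False: the rank of [A | b] jumps to 2
```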
1.41 Solving linear systems. Let us return to the problem of solving

Ax = b,   where A ∈ M_{m,n}(K), b ∈ K^m.

If n = m and A is nonsingular, then the unique solution is x_0 := A^{-1}b. In the general case, according to Proposition 1.39, the system is solvable if and only if Rank A = Rank [A | b], and if x_0 ∈ K^n is a solution, the set of all solutions is given by {x_0} + ker A. Let r := Rank A. Since Rank A^T = r, we may assume without loss of generality that the first r rows of A are linearly independent and the other rows depend linearly on the first r rows. Therefore, if we solve the system of r equations
\[
\begin{pmatrix}
a^1_1 & \cdots & a^1_n\\
\vdots & & \vdots\\
a^r_1 & \cdots & a^r_n
\end{pmatrix}
\begin{pmatrix} x^1\\ \vdots\\ x^n \end{pmatrix}
=
\begin{pmatrix} b^1\\ \vdots\\ b^r \end{pmatrix},
\tag{1.14}
\]

the remaining equations are automatically fulfilled. So it is enough to solve Ax = b in the case where A ∈ M_{r,n}(K) and Rank A^T = Rank A = r. We have two cases. If r = n, then A ∈ M_{r,r}(K) is nonsingular, consequently Ax = b has a unique solution x = A^{-1}b. If r < n, then A has r linearly independent columns, say the first r. Denote by R the r × r nonsingular matrix made by these columns, and decompose x = (x', x'') with x' ∈ K^r and x'' ∈ K^{n-r}. Then Ax = b writes as

\[
\big[\, R \;\big|\; S\,\big]\begin{pmatrix} x'\\ x''\end{pmatrix} = b,
\]

i.e., Rx' + Sx'' = b, or x' = R^{-1}(b - Sx''). Therefore,

\[
x = \begin{pmatrix} R^{-1}(b - Sx'')\\ x'' \end{pmatrix}
  = \begin{pmatrix} R^{-1}b\\ 0 \end{pmatrix}
  + \begin{pmatrix} -R^{-1}S\\ \mathrm{Id}_{n-r} \end{pmatrix} x''
  =: x_0 + L x'',
\]

concluding that the set of all solutions of the system Ax = b is

\[
\{x \mid x - x_0 \in \ker A\} = \{x \mid x - x_0 \in \operatorname{Im} L\}.
\]
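The column-splitting recipe above is easy to run numerically. Here is a small NumPy sketch of ours, which assumes for simplicity that A already has full row rank r and that its first r columns are linearly independent, so no row or column selection is needed.

```python
import numpy as np

def solve_by_splitting(A, b):
    """Assume A is r x n with rank r and its first r columns independent.
    Return a particular solution x0 and a matrix L with
    {solutions of Ax = b} = {x0 + L t : t in K^(n-r)}."""
    r, n = A.shape
    R, S = A[:, :r], A[:, r:]
    Rinv = np.linalg.inv(R)
    x0 = np.concatenate([Rinv @ b, np.zeros(n - r)])
    L = np.vstack([-Rinv @ S, np.eye(n - r)])
    return x0, L

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., -1.]])
b = np.array([3., 1.])
x0, L = solve_by_splitting(A, b)
t = np.array([2.0, -1.0])               # any choice of the free parameters
x = x0 + L @ t
print(np.allclose(A @ x, b))            # True
```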
b. The Gauss elimination method

As we have seen, linear algebra yields a proper language to discuss linear systems, and conversely, most of the constructions in linear algebra reduce to solving systems. Moreover, the proofs we have presented are constructive and become useful from a numerical point of view if one is able to efficiently solve the following two questions:

(i) find the solution of a nonsingular square system Ax = b,
(ii) given a set of vectors T ⊂ K^n, find a subset S ⊂ T such that Span S = Span T.

In this section we illustrate the classical Gauss elimination method which efficiently solves both questions.

1.42 Example. Let us begin with an example of how to solve a linear system. Consider the linear system

\[
\begin{cases}
6x + 18y + 6z = b_1,\\
3x + 8y + 6z = b_2,\\
2x + y + z = b_3,
\end{cases}
\qquad\text{or}\qquad Ax = b,
\]

where x := (x, y, z), b := (b_1, b_2, b_3) and

\[
A = \begin{pmatrix} 6 & 18 & 6\\ 3 & 8 & 6\\ 2 & 1 & 1 \end{pmatrix}.
\]
We subtract from the second and third equations the first one multiplied by 1/2 and 1/3 respectively to get the new equivalent system:

\[
\begin{cases}
6x + 18y + 6z = b_1,\\
3x + 8y + 6z - \tfrac{1}{2}(6x + 18y + 6z) = -\tfrac{1}{2} b_1 + b_2,\\
2x + y + z - \tfrac{1}{3}(6x + 18y + 6z) = -\tfrac{1}{3} b_1 + b_3,
\end{cases}
\]

i.e.,

\[
\begin{cases}
6x + 18y + 6z = b_1,\\
-y + 3z = -\tfrac{1}{2} b_1 + b_2,\\
-5y - z = -\tfrac{1}{3} b_1 + b_3.
\end{cases}
\tag{1.15}
\]
This essentially requires us to solve the system of the last two equations

\[
\begin{cases}
-y + 3z = -\tfrac{1}{2} b_1 + b_2,\\
-5y - z = -\tfrac{1}{3} b_1 + b_3.
\end{cases}
\]
We now apply the same argument to this last system, i.e., we subtract from the last equation the first one multiplied by 5 to get

\[
\begin{cases}
6x + 18y + 6z = b_1,\\
-y + 3z = -\tfrac{1}{2} b_1 + b_2,\\
-5y - z - 5(-y + 3z) = -\tfrac{1}{3} b_1 + b_3 - 5\big(-\tfrac{1}{2} b_1 + b_2\big),
\end{cases}
\]

i.e.,

\[
\begin{cases}
6x + 18y + 6z = b_1,\\
-y + 3z = -\tfrac{1}{2} b_1 + b_2,\\
-16z = \tfrac{13}{6} b_1 - 5 b_2 + b_3.
\end{cases}
\]
This system has exactly the same solutions as the original one and, moreover, it is easily solvable starting from the last equation. Finally, we notice that the previous method produced two matrices,

\[
U := \begin{pmatrix} 6 & 18 & 6\\ 0 & -1 & 3\\ 0 & 0 & -16 \end{pmatrix},
\qquad
L := \begin{pmatrix} 1 & 0 & 0\\ -\tfrac{1}{2} & 1 & 0\\ \tfrac{13}{6} & -5 & 1 \end{pmatrix}.
\]

U is upper triangular and L is lower triangular with 1 in the principal diagonal, so the original system Ax = b rewrites as

Ux = Lb.

Since L = [l^i_j] is invertible (l^i_i = 1 for all i) and x is arbitrary, we can rewrite the last formula as a decomposition formula for A,

A = L^{-1}U.
The algorithm we have just described in Example 1.42, that transforms the proposed 3 x 3 square system into a triangular system, extends to systems with an arbitrary number of unknowns and equations, and it is called the Gauss elimination method. Moreover, it is particularly efficient, but does have some drawbacks from a numerical point of view.
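The elimination of Example 1.42 is easy to transcribe into code. The following Python sketch is ours and deliberately naive: it performs no row exchanges, so it assumes that every pivot it meets is nonzero, exactly as in the example.

```python
import numpy as np

def gauss_lu(A):
    """Plain Gauss elimination without row exchanges.
    Returns U (upper triangular) and L (unit lower triangular, the
    accumulated row operations) with L @ A == U, i.e. A == inv(L) @ U."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]       # assumes a nonzero pivot
            U[i, :] -= m * U[j, :]
            L[i, :] -= m * L[j, :]
    return U, L

A = np.array([[6., 18., 6.], [3., 8., 6.], [2., 1., 1.]])
U, L = gauss_lu(A)
print(U)                      # [[6, 18, 6], [0, -1, 3], [0, 0, -16]]
print(L)                      # [[1, 0, 0], [-1/2, 1, 0], [13/6, -5, 1]]
print(np.allclose(L @ A, U))  # True
```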
Let

\[
Ax = 0
\tag{1.16}
\]

be a linear homogeneous system with m equations, n unknowns and a coefficient matrix given by

\[
A = \begin{pmatrix}
a^1_1 & a^1_2 & \cdots & a^1_n\\
a^2_1 & a^2_2 & \cdots & a^2_n\\
\vdots & & & \vdots\\
a^m_1 & a^m_2 & \cdots & a^m_n
\end{pmatrix}.
\]

Starting from the left, we denote by j_1 the index of the first column of A that has at least one nonzero element. Then we reorder the rows into a new matrix B of the same dimensions, in such a way that the element b^1_{j_1} is nonzero and all columns with index less than j_1 are zero,

\[
B = [b^i_j] = \begin{pmatrix}
0 & \cdots & 0 & b^1_{j_1} & * & \cdots & *\\
0 & \cdots & 0 & * & * & \cdots & *\\
\vdots & & \vdots & \vdots & & & \vdots\\
0 & \cdots & 0 & * & * & \cdots & *
\end{pmatrix},
\]

where * denotes the unspecified entries. We then set p_1 := b^1_{j_1}, and for i = 2, ..., m we subtract from the ith row of B the first row of B multiplied by b^i_{j_1}/p_1. The resulting matrix, call it A_1, therefore has the following form

\[
A_1 = \begin{pmatrix}
0 & \cdots & 0 & p_1 & * & \cdots & *\\
0 & \cdots & 0 & 0 & * & \cdots & *\\
\vdots & & \vdots & \vdots & & & \vdots\\
0 & \cdots & 0 & 0 & * & \cdots & *
\end{pmatrix},
\]

where p_1 ≠ 0, below p_1 all entries are zero and * denotes the unspecified entries. We then transform A_1 into A_2, A_2 into A_3, and so on, operating as previously but on the submatrices made of the rows of index larger than 1, 2, ..., respectively. The algorithm of course stops when there are no more rows and/or columns. The resulting matrix produced this way is not uniquely determined, as there is freedom in exchanging the rows. However, a resulting matrix, that we call a Gauss reduced matrix, is clearly upper triangular if A is a square matrix, m = n, and in general has the following stair-shaped form
Figure 1.5. Two pages of the Japanese mathematician Takakazu Seki (1642-1708) who apparently dealt with determinants before Gauss.
\[
G_A := \begin{pmatrix}
0 & p_1 & * & * & * & * & \cdots & * & *\\
0 & 0 & 0 & p_2 & * & * & \cdots & * & *\\
0 & 0 & 0 & 0 & 0 & p_3 & \cdots & * & *\\
\vdots & & & & & & \ddots & & \vdots\\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & p_r & *\\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0\\
\vdots & & & & & & & & \vdots\\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}
\tag{1.17}
\]

where * denotes the unspecified entries; the nonzero numbers p_1, p_2, ..., p_r are called the pivots of the stair-shaped matrix G_A. Finally, since

o multiplying one of the equations of the system Ax = 0 by a nonzero scalar,
o exchanging the order of the equations,
o summing a multiple of one equation and another equation,

produces a linear system with the same solutions as the initial system, and observing that the Gauss elimination procedure operates with transformations of this type, we conclude that G_A x = 0 has the same solutions as the initial system.

We now discuss the solvability of the system Lx = b when L is stair-shaped.

1.43 Proposition. Let L be a stair-shaped m × n matrix. Suppose that L has r pivots, r ≤ min(n, m). Then a basis of Im L is given by the r columns containing the pivots, and the system Lx = b, b = (b^1, b^2, ..., b^m)^T, has a solution if and only if b^{r+1} = ... = b^m = 0.

Proof. Since there are r pivots and at most one pivot per row, the last rows of L are identically zero, hence Im L ⊂ {b ∈ K^m | b = (b^1, b^2, ..., b^r, 0, ..., 0)}. Consequently,
Figure 1.6. Takakazu Seki (1642-1708).
dim Im L ≤ r. On the other hand the r columns that contain the pivots are in Im L and are linearly independent, hence Rank L = r, and Im L = {(b^1, b^2, ..., b^r, 0, ..., 0) | b^i ∈ K ∀ i = 1, ..., r}.  □
The Gauss elimination procedure preserves several properties of the original matrix A.

1.44 Theorem. Let A ∈ M_{m,n}(K) and let G_A be one of the matrices resulting from the Gauss elimination procedure. Then

(i) ker A = ker G_A,
(ii) Rank A = Rank G_A = number of pivots of G_A,
(iii) let j_1, j_2, ..., j_r be the indices of the columns of the pivots of G_A; then the columns of A with the same indices are linearly independent.

Proof. (i) is a rewriting of the equivalence of Ax = 0 with G_A x = 0.
(ii) Because of (i), the rank formula yields Rank A = Rank G_A, and Rank G_A equals the number of pivots by Proposition 1.43.
(iii) Let A = [a_1 | a_2 | ... | a_n] and let

B := [a_{j_1} | a_{j_2} | ... | a_{j_r}].

Following the Gauss elimination procedure we used on A, we easily see that the columns of B transform into the columns of the pivots, which are linearly independent. By (i), ker B = {0}, i.e., the columns of B are linearly independent.  □
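Theorem 1.44 (iii) translates directly into a procedure: row-reduce A, record the pivot columns, and keep the corresponding columns of A as a basis of Im A. The sketch below is ours; it uses row exchanges to pick a large pivot and a crude tolerance, both implementation choices not discussed in the text.

```python
import numpy as np

def pivot_columns(A, tol=1e-12):
    """Return the pivot-column indices of a Gauss reduced form of A.
    By Theorem 1.44 (iii) the corresponding columns of A span Im A."""
    U = A.astype(float).copy()
    m, n = U.shape
    pivots, row = [], 0
    for col in range(n):
        if row == m:
            break
        i = row + np.argmax(np.abs(U[row:, col]))      # row exchange
        if abs(U[i, col]) < tol:
            continue
        U[[row, i]] = U[[i, row]]
        U[row + 1:, :] -= np.outer(U[row + 1:, col] / U[row, col], U[row, :])
        pivots.append(col)
        row += 1
    return pivots

A = np.array([[1., 2., 3., 1.],
              [2., 4., 7., 4.],
              [1., 2., 4., 3.]])
cols = pivot_columns(A)
print(cols)                                  # [0, 2]: the matrix has rank 2
print(np.linalg.matrix_rank(A[:, cols]))     # 2: these columns are independent
```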
1.45 ~. Let A ∈ M_{m,n}(K). Show a procedure to find a basis of Im A and a basis of Im A^T.

1.46 ~. Let W = Span{v_1, v_2, ..., v_k} be a subspace of K^n. Show a procedure to find a basis of W among the vectors v_1, v_2, ..., v_k.

1.47 ~. Let A ∈ M_{m,n}(K). Show a procedure to find a basis of ker A.
1.48 ~. Let v_1, v_2, ..., v_k ∈ R^n be k linearly independent vectors. Show a procedure to complete them with n - k vectors of the canonical basis of R^n in order to form a new basis of R^n. [Hint: Apply the Gauss elimination procedure to the matrix

[v_1 | v_2 | ... | v_k | e_1 | e_2 | ... | e_n].]

1.49 ~. Show that A ∈ M_{n,n}(K) is invertible if and only if a Gauss reduced matrix of A has n pivots.
c. The Gauss elimination procedure for nonhomogeneous linear systems

Now consider the problem of solving A(x) = b, where A ∈ M_{m,n}(K), x ∈ K^n and b ∈ K^m. We can equivalently write it as

\[
\begin{pmatrix}
a^1_1 & a^1_2 & \cdots & a^1_n & 1 & 0 & \cdots & 0\\
a^2_1 & a^2_2 & \cdots & a^2_n & 0 & 1 & \cdots & 0\\
\vdots & & & \vdots & \vdots & & \ddots & \vdots\\
a^m_1 & a^m_2 & \cdots & a^m_n & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix}
x^1\\ \vdots\\ x^n\\ -b^1\\ \vdots\\ -b^m
\end{pmatrix} = 0.
\]

If one computes a Gauss reduced form of the m × (n + m) matrix

\[
B := [\,A \,|\, \mathrm{Id}_m\,] =
\begin{pmatrix}
a^1_1 & a^1_2 & \cdots & a^1_n & 1 & 0 & \cdots & 0\\
a^2_1 & a^2_2 & \cdots & a^2_n & 0 & 1 & \cdots & 0\\
\vdots & & & \vdots & \vdots & & \ddots & \vdots\\
a^m_1 & a^m_2 & \cdots & a^m_n & 0 & 0 & \cdots & 1
\end{pmatrix},
\]

we find, on account of Theorem 1.44, that

\[
G_B = [\,G_A \,|\, S\,],
\]

where G_A ∈ M_{m,n}(K) is a Gauss reduced matrix of A and S ∈ M_{m,m}(K). Moreover, if the elimination procedure has been carried out without any permutation of the rows, then S is a lower triangular matrix with 1 as entries in the principal diagonal, hence it is invertible. Since for every b the system Ax = b is equivalent to G_A x = Sb, we then have G_A x = Sb = SAx for all x ∈ K^n, thus concluding that

A = S^{-1} G_A.

In particular,
1.50 Proposition (LR decomposition). Let A ∈ M_{n,n}(K) be a square matrix. If the elimination procedure proceeds without any permutation of the rows, we can decompose A as A = LR, where R = G_A is the resulting Gauss reduced matrix and L is a suitable lower triangular matrix with 1 as entries in the principal diagonal.

In general, however, the permutation of the rows must be taken into account. For this purpose, let us fix some notation. Recall that a permutation of {1, ..., m} is a one-to-one map σ : {1, ..., m} → {1, ..., m}. The set of all permutations of m elements is denoted by P_m. For every permutation σ of m elements, define the associated permutation matrix R_σ ∈ M_{m,m}(K) by

R_σ := [e_{σ(1)} | e_{σ(2)} | ... | e_{σ(m)}],

where (e_1, e_2, ..., e_m) is the canonical basis of K^m. Let A ∈ M_{m,n}(K). If σ permutes the indices of the rows of A, then the resulting matrix is R_σ A.

Now denote by 𝒢(A) the Gauss reduced matrix, if it exists, obtained by the Gauss elimination procedure starting from the top row and proceeding without any permutation of the rows. Let G_A be a Gauss reduced form of A. Then G_A = 𝒢(R_σ A) for some permutation σ of m elements. Now fix a Gauss reduced form G_A of A, and let σ be such that G_A = 𝒢(R_σ A). Write Ax = y as (R_σ A)x = R_σ y = Id_m (R_σ y) and let

B := [R_σ A | Id_m].

Then B and R_σ A may be reduced without any permutation of the rows, hence, by the above,

G_B = [G_A | S],

where S is lower triangular with all entries in the principal diagonal equal to 1. Therefore G_A x = S R_σ y = S R_σ A x for all x, that is,

G_A = S R_σ A.    (1.18)

When A ∈ M_{n,n}(K) is a square matrix, (1.18) shows that A is invertible if and only if a Gauss reduced form G_A of A is invertible, and

A^{-1} = G_A^{-1} S R_σ.

In practice, let (e_1, e_2, ..., e_n) be the canonical basis of K^n and let A^{-1} =: [v_1 | v_2 | ... | v_n]. Let i = 1, ..., n. To compute v_i, we observe that v_i = A^{-1} e_i, i.e., A v_i = e_i. Thus, using the Gauss elimination procedure, from (1.18), v_i is a solution of G_A v_i = S R_σ e_i. Now, since G_A is upper triangular, this last system is easily solved by inductively computing the components of v_i starting from the last, upward.
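The inversion recipe just described can be sketched in a few lines of Python (ours, and only a rough illustration: the accumulated matrix plays the role of S R_σ, and no attention is paid to numerical robustness).

```python
import numpy as np

def inverse_by_elimination(A):
    """Reduce [A | Id] to [G_A | S] with row exchanges, then solve the
    triangular systems G_A v_i = (column i of S) by back substitution."""
    n = A.shape[0]
    G = A.astype(float).copy()
    S = np.eye(n)                      # accumulates the row operations
    for j in range(n):
        p = j + np.argmax(np.abs(G[j:, j]))          # choose a nonzero pivot
        G[[j, p]], S[[j, p]] = G[[p, j]].copy(), S[[p, j]].copy()
        for i in range(j + 1, n):
            m = G[i, j] / G[j, j]
            G[i, :] -= m * G[j, :]
            S[i, :] -= m * S[j, :]
    inv = np.zeros((n, n))
    for i in range(n):                 # back substitution, last component first
        rhs = S[:, i].copy()
        for k in range(n - 1, -1, -1):
            inv[k, i] = (rhs[k] - G[k, k + 1:] @ inv[k + 1:, i]) / G[k, k]
    return inv

A = np.array([[6., 18., 6.], [3., 8., 6.], [2., 1., 1.]])
print(np.allclose(inverse_by_elimination(A) @ A, np.eye(3)))   # True
```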
Figure 1.7. The area transformation.
1.4 Determinants

The notion of determinant originated with the observation of Gabriel Cramer (1704-1752) and Carl Friedrich Gauss (1777-1855) that the process of elimination of unknowns when solving a linear system of equations amounts to operating with the indices of the associated matrix. Developments of this observation due to Pierre-Simon Laplace (1749-1827), Alexandre Vandermonde (1735-1796), Joseph-Louis Lagrange (1736-1813) and Carl Friedrich Gauss (1777-1855), who introduced the word determinant, were then accomplished with a refined study of the properties of the determinant by Augustin-Louis Cauchy (1789-1857) and Jacques Binet (1786-1856). Here we illustrate the main properties of the determinant.

1.51 Determinant and area in R^2. Let

\[
A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}
\tag{1.19}
\]

be a 2 × 2 matrix. It is easily seen that A is not singular, i.e., the linear homogeneous system

\[
\begin{cases}
ax + by = 0,\\
cx + dy = 0
\end{cases}
\]

has zero, (x, y) = (0, 0), as its unique solution, if and only if ad - bc ≠ 0. The number ad - bc is the determinant of the matrix A,

\[
\det A = \det \begin{pmatrix} a & b\\ c & d \end{pmatrix} := ad - bc.
\]

One immediately notices the combinatorial characteristic of this definition: if A = [a^i_j], then det A = a^1_1 a^2_2 - a^1_2 a^2_1.

Let a := (a, c) and b := (b, d) be the two columns of the matrix A in (1.19). The elementary area of the parallelogram T spanned by a and b, with vertices (0, 0), (a, c), (b, d) and (a + b, c + d), is given by

Area(T) = |a| |b| |sin θ|,

where θ is the angle between a and b, irrespective of the orientation, see Figure 1.7. On the other hand, by Carnot's formula |a| |b| cos θ = ab + cd, hence
Figure 1.8. Frontispieces of two books on determinants respectively by Ernesto Pascal and Charles L. Dodgson, better known as Lewis Carroll.
\[
\mathrm{Area}(T)^2 = (a^2 + c^2)(b^2 + d^2)(1 - \cos^2\theta) = a^2b^2 + a^2d^2 + b^2c^2 + c^2d^2 - (ab + cd)^2 = a^2d^2 + b^2c^2 - 2abcd = (ad - bc)^2 = (\det A)^2,
\]

i.e.,

Area(T) = |det A|.

We may think of det A as the area of T with sign. In fact, the sign of det A may be used to define the sign of the angle formed by the vectors a and b: the angle from a to b is positively (negatively) oriented if det[a | b] > 0 (det[a | b] < 0). Angles with sign in geometry are also modelled by complex multiplication, identifying R^2 with C. Using the previous notation and setting z := a + ic, w := b + id, we have

\[
\bar z\, w = (a - ic)(b + id) = (ab + cd) + i(ad - bc) = (a \bullet b)_{\mathbb{R}^2} + i \det A.
\]
Let v_1, v_2 ∈ R^2. As we have seen, the determinant of the matrix [v_1 | v_2] is not zero iff v_1 and v_2 are linearly independent. Actually, for any n ≥ 1 there is a real function defined on n × n matrices that tells us whether the n columns of the matrix are linearly independent: the determinant. One of the simplest ways to define it is as follows.

We recall that a permutation of {1, ..., n} is a one-to-one map σ : {1, ..., n} → {1, ..., n}. The set of permutations of n objects, denoted by P_n, is a group with respect to the operation of composition. A permutation that exchanges two adjacent indices and leaves the other indices unchanged is called a transposition. Transpositions are elementary permutations in the sense that each permutation σ can be obtained by composing
subsequent transpositions. Of course, there are several ways to decompose a given permutation into elementary transpositions, but the evenness or oddness of the number of transpositions needed to realize a given permutation σ depends only on the permutation σ. We define the signature, or sign, of the permutation σ as the number

\[
(-1)^{\sigma} :=
\begin{cases}
+1 & \text{if } \sigma \text{ decomposes into an even number of transpositions},\\
-1 & \text{if } \sigma \text{ decomposes into an odd number of transpositions}.
\end{cases}
\]

1.52 Definition. Let A = [a^i_j] ∈ M_{n,n}(K), n ≥ 1. The determinant of A is then defined by

\[
\det A := \sum_{\sigma \in P_n} (-1)^{\sigma}\, a^{\sigma(1)}_1 a^{\sigma(2)}_2 \cdots a^{\sigma(n)}_n.
\tag{1.20}
\]
Notice that det A is a sum of products and each product contains just one element from each row and each column, and the sum, apart from the sign, is extended to all possible choices.

1.53 Example. Of course for the matrix A in (1.19) we again get det A = ad - bc. Going back to the area, one shows that, given 3 vectors v_1, v_2, v_3 ∈ R^3 and denoting by T the polyhedron generated by these vectors, we still have Vol_3(T) = |det[v_1 | v_2 | v_3]|.
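Formula (1.20) can be evaluated literally by summing over all permutations. The following Python sketch is ours; it is hopelessly inefficient beyond small n and is meant only to mirror the definition.

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """Signature (-1)^sigma, computed by counting inversions."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    """Evaluate (1.20): sum over sigma of (-1)^sigma * prod_j A[sigma(j), j]."""
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[j], j] for j in range(n)])
               for p in permutations(range(n)))

A = np.array([[6., 18., 6.], [3., 8., 6.], [2., 1., 1.]])
print(det_by_permutations(A), np.linalg.det(A))   # both 96.0 (up to rounding)
```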
For n vectors v_1, v_2, ..., v_n ∈ R^n, let

L := [v_1 | v_2 | ... | v_n]

and let L(x) := Lx. If Q is the unit cube of R^n,

Q := {x = (x^1, x^2, ..., x^n) | 0 ≤ x^i ≤ 1 ∀i},

we define the n-dimensional volume of T := L(Q) by

Vol_n(T) := |det L|.
It is useful to think of the determinant as a function of the columns of the matrix. In fact, we have the following.
1.54 Theorem. The determinant on n × n matrices is the unique function det : M_{n,n}(K) → K such that, when seen as a function of the columns, it is

(i) (LINEAR ON EACH FACTOR):

\[
\det [\, \cdots \,|\, a_i' + a_i'' \,|\, \cdots\,] = \det [\, \cdots \,|\, a_i' \,|\, \cdots\,] + \det [\, \cdots \,|\, a_i'' \,|\, \cdots\,],
\qquad
\det [\, \cdots \,|\, \lambda a_i \,|\, \cdots\,] = \lambda \det [\, \cdots \,|\, a_i \,|\, \cdots\,],
\]

for all a_i', a_i'' ∈ K^n, λ ∈ K, i = 1, ..., n,
(ii) (ALTERNATING): by exchanging two adjacent columns the determinant changes sign,

\[
\det [\, \cdots \,|\, a_i \,|\, a_{i+1} \,|\, \cdots\,] = - \det [\, \cdots \,|\, a_{i+1} \,|\, a_i \,|\, \cdots\,],
\]

(iii) (NORMALIZED): det Id_n = 1.
Notice that because of (i) the alternating property can be equivalently formulated by saying that det A = 0 if A has two equal columns.

Proof. Clearly the right-hand side of (1.20) fulfills the conditions (i), (ii), (iii). To prove uniqueness, suppose that D : M_{n,n}(K) → K fulfills (i), (ii), (iii) of Theorem 1.54. Write A = [a^i_j] ∈ M_{n,n}(K) as A = [a_1 | a_2 | ... | a_n] where

\[
a_i = \sum_{j=1}^{n} a^j_i\, e_j,
\]

(e_1, e_2, ..., e_n) being the canonical basis of K^n. Then by (i)

\[
D(A) = \sum_{\sigma(1), \dots, \sigma(n)} a^{\sigma(1)}_1 a^{\sigma(2)}_2 \cdots a^{\sigma(n)}_n\, D([\,e_{\sigma(1)} \,|\, \cdots \,|\, e_{\sigma(n)}\,]),
\]

where σ(1), σ(2), ..., σ(n) vary in {1, ..., n}. Since by (ii) D(A) = 0 if A has two equal columns, we infer that σ(i) ≠ σ(j) if i ≠ j, i.e., that σ is a permutation of (1, 2, ..., n). Since D([e_{σ(1)} | ... | e_{σ(n)}]) = (-1)^σ D([e_1 | ... | e_n]) and D([e_1 | ... | e_n]) = 1, we conclude that D(A) agrees with the right-hand side of (1.20), hence D(A) = det A.  □

The determinant can also be computed by means of an inductive formula.
The determinant can also be computed by means of an inductive formula.
1.55 Definition. Let A = [a;] E Mn,n(lK), n 2: 1. A r-minor of A is a r x r submatrix of A, that is a matrix obtained by choosing the common entries of a choice of r rows and r columns of A and relabeling the indices from 1 to r. For i,j = 1, ... , n we define the complementing (i,j)-minor of the matrix A, denoted by M{(A), as the (n -1) x (n -I)-minor obtained by removing the ith row and the jth column from A. 1.56 Theorem (Laplace). Let A E Mn,n(lK), n 2: 1. Then detA :=
A
if n = 1,
{ 2:7=1 (-I)J+1 a] det M](A) if n > 1.
(1.21 )
Proof. Denote by D(A) the right-hand side of (1.21). Let us prove that D(A) fulfills the conditions (i), (ii) and (iii) of Theorem 1.54, thus D(A) = det A. The conditions (i) and (ii) of Theorem 1.54 are trivially fulfilled by D(A). Let us also show that (iii) holds, i.e., if aj = aj+l for some j, then D(A) = O. We proceed by induction on j. By the induction step, det M); (A) = 0 for h 1= j,j + 1, hence D(A) = (-I)i+ l a] det M] (A) + (-I)ja]+l detM]+l(A). Since a] = a]+l' and, consequently, M](A) = M]+l(A), we conclude that D(A) = O. 0
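Formula (1.21) likewise translates into a short recursive routine (ours, again exponentially slow and intended only as an illustration of the formula).

```python
import numpy as np

def minor(A, i, j):
    """Complementing (i, j)-minor: drop row i and column j (0-based indices)."""
    return np.delete(np.delete(A, i, axis=0), j, axis=1)

def det_laplace(A):
    """Determinant by expansion along the first row, as in (1.21)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** j * A[0, j] * det_laplace(minor(A, 0, j))
               for j in range(n))

A = np.array([[6., 18., 6.], [3., 8., 6.], [2., 1., 1.]])
print(det_laplace(A), np.linalg.det(A))     # both 96.0 (up to rounding)
```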
From (1.20) we immediately infer the following.
1.57 Theorem (Determinant of the transpose). We have

det A^T = det A    for all A ∈ M_{n,n}(K).
One then shows the following important theorem.
1.58 Theorem (Binet's formula). Let A and B be two n × n matrices. Then det(BA) = det B det A.

Proof. Let A = [a^i_j] = [a_1 | ... | a_n], B = [b^i_j] = [b_1 | ... | b_n] and let (e_1, ..., e_n) be the canonical basis of K^n. Since the jth column of BA is

\[
\sum_{i=1}^{n} (BA)^i_j\, e_i = \sum_{i,r=1}^{n} b^i_r a^r_j\, e_i = \sum_{r=1}^{n} a^r_j\, b_r,
\]

we have

\[
\det(BA) = \det\Big(\Big[\sum_{r=1}^{n} a^r_1 b_r \,\Big|\, \cdots \,\Big|\, \sum_{r=1}^{n} a^r_n b_r\Big]\Big)
= \sum_{\sigma \in P_n} a^{\sigma(1)}_1 a^{\sigma(2)}_2 \cdots a^{\sigma(n)}_n \det[\,b_{\sigma(1)} \,|\, \cdots \,|\, b_{\sigma(n)}\,]
= \sum_{\sigma \in P_n} (-1)^{\sigma} a^{\sigma(1)}_1 a^{\sigma(2)}_2 \cdots a^{\sigma(n)}_n \det B = \det A \det B. \;\; \square
\]

As stated in the beginning, the determinant gives us a criterion to decide whether a matrix is nonsingular or, equivalently, whether n vectors are linearly independent.
1.59 Theorem. An n × n matrix A is nonsingular if and only if det A ≠ 0.

Proof. If A is nonsingular, there is a B ∈ M_{n,n}(K) such that AB = Id_n, see Exercise 1.27; by Binet's formula det A det B = 1. In particular det A ≠ 0. Conversely, if the columns of A are linearly dependent, then it is not difficult to see that det A = 0 by using Theorem 1.54.  □
Let A = [a^i_j] be an m × n matrix. We say that the characteristic of A is r if all p-minors with p > r have zero determinant and there exists an r-minor with nonzero determinant.

1.60 Theorem (Kronecker). The rank and the characteristic of a matrix are the same.

Proof. Let A ∈ M_{m,n}(K) and let r := Rank A. For any minor B, trivially Rank B ≤ Rank A = r, hence every p-minor is singular, i.e., has zero determinant, if p > r. On the other hand, Theorem 1.28 implies that there exists a nonsingular r-minor B of A, hence with det B ≠ 0.  □
The defining inductive formula (1.21) requires us to compute the determinant of the complementing minors of the elements of the first row; on account of the alternating property, we can use any row, and on account of Theorem 1.57, we can use any column. More precisely,
1.61 Theorem (Laplace's formulas). Let A be an n × n matrix. We have, for all h, k = 1, ..., n,

\[
\delta_{kh} \det A = \sum_{j=1}^{n} (-1)^{h+j}\, a^k_j \det M^h_j(A),
\qquad
\delta_{kh} \det A = \sum_{i=1}^{n} (-1)^{i+h}\, a^i_k \det M^i_h(A),
\]

where δ_{hk} is Kronecker's symbol.

1.62 ~. To compute the determinant of a square n × n matrix A we can use a Gauss reduced matrix G_A of A. Show that det A = (-1)^σ ∏_{i=1}^{n} (G_A)^i_i, where σ is the permutation of rows needed to compute G_A, and the product is the product of the pivots.
It is useful to rewrite Laplace's formulas using matrix multiplication. Denote by cof(A) = [c^i_j] the square n × n matrix, called the matrix of cofactors of A, defined by

c^i_j := (-1)^{i+j} det M^j_i(A).

Notice the exchange between the row and column indices: the (i, j) entry of cof(A) is (-1)^{i+j} times the determinant of the complementing (j, i)-minor. Using the cofactor matrix, Laplace's formulas in Theorem 1.61 rewrite in matrix form as follows.

1.63 Theorem (Laplace's formulas). Let A be an n × n matrix. Then we have

\[
\mathrm{cof}(A)\, A = A\, \mathrm{cof}(A) = \det A \,\mathrm{Id}_n.
\tag{1.22}
\]
We immediately infer the following.

1.64 Proposition. Let A = [a_1 | a_2 | ... | a_n] ∈ M_{n,n}(K) be nonsingular.

(i) We have

\[
A^{-1} = \frac{1}{\det A}\, \mathrm{cof}(A).
\]

(ii) (CRAMER'S RULE) The system Ax = b, b ∈ K^n, has a unique solution x = (x^1, x^2, ..., x^n) given by

\[
x^i = \frac{\det B_i}{\det A}, \qquad B_i := [\,a_1 \,|\, \cdots \,|\, a_{i-1} \,|\, b \,|\, a_{i+1} \,|\, \cdots \,|\, a_n\,].
\]
Proof. (i) follows immediately from (1.22). (ii) follows from (i), but it is better shown using linearity and the alternating property of the determinant. In fact, solving Ax = b is equivalent to finding x = (x^1, x^2, ..., x^n) such that b = Σ_{i=1}^n x^i a_i. Now, linearity and the alternating property of the determinant yield

\[
\det B_i = \det\Big[\,\cdots\,\Big|\,a_{i-1}\,\Big|\,\sum_{j=1}^{n} x^j a_j\,\Big|\,a_{i+1}\,\Big|\,\cdots\Big]
= \sum_{j=1}^{n} x^j \det\big[\,\cdots\,|\,a_{i-1}\,|\,a_j\,|\,a_{i+1}\,|\,\cdots\big].
\]

Since the only nonzero addend on the right-hand side is the one with j = i, we infer

\[
\det B_i = x^i \det[\,a_1\,|\,\cdots\,|\,a_{i-1}\,|\,a_i\,|\,a_{i+1}\,|\,\cdots\,|\,a_n\,] = x^i \det A. \;\;\square
\]

1.65 ~. Show that det cof(A) = (det A)^{n-1}.
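Both (1.22) and Cramer's rule are straightforward to verify numerically. The following sketch is ours and uses NumPy's determinant as a black box.

```python
import numpy as np

def cofactor_matrix(A):
    """cof(A)[i, j] = (-1)^(i+j) * det of the minor obtained by deleting
    row j and column i (note the index exchange, as in the text)."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, j, axis=0), i, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)
    return C

def cramer(A, b):
    """Cramer's rule: x^i = det(B_i)/det(A), B_i = A with column i replaced by b."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        B = A.copy()
        B[:, i] = b
        x[i] = np.linalg.det(B) / d
    return x

A = np.array([[6., 18., 6.], [3., 8., 6.], [2., 1., 1.]])
b = np.array([1., 2., 3.])
print(np.allclose(cofactor_matrix(A) @ A, np.linalg.det(A) * np.eye(3)))  # (1.22) holds
print(np.allclose(A @ cramer(A, b), b))                                   # True
```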
1.5 Exercises

1.66 ~. Find the values of x, y ∈ R for which the three vectors (1, 1, 1), (1, x, x^2), (1, y, y^2) form a basis of R^3.

1.67 ~. Let α_1, α_2 ∈ C be distinct and nonzero. Show that e^{α_1 t}, e^{α_2 t}, t ∈ R, are linearly independent over C. [Hint: See [GM2] Corollary 5.54.]

1.68 ~. Write the parametric equation of a straight line (i) through b = (1, 1, 1) and with direction a = (1, 0, 0), (ii) through a = (1, 1, 1) and b = (1, 0, 0).

1.69 ~. Describe in a parametric or implicit way in R^3,

o a straight line through two points,
o the intersection of a plane and a straight line,
o a straight line that is parallel to a given plane,
o a straight line on a plane,
o a plane through three points,
o a plane through a point containing a given straight line,
o a plane perpendicular to a straight line.

1.70 ~ Affine transformations. An affine transformation ...

1.71 ~. Let P_1 and P_2 be two (n - 1)-planes in R^n. Show that either P_1 = P_2 or P_1 ∩ P_2 = ∅ or P_1 ∩ P_2 has dimension n - 2.

1.72 ~. In R^4 find (i) two 2-planes through the origin that meet only at the origin, (ii) two 2-planes through the origin that meet along a straight line.

1.73 ~. In R^2 write the 2 × 2 matrix associated with the counterclockwise rotations of angles π/2, π, 3π/2, and, in general, θ ∈ R.
1. Vectors, Matrices and Linear Systems
1.14'. Write the matrix associated with the axial symmetry in IR3 and to plane symmetries. 1.15 ,. Write down explicit linear systems of 3,4,5 equations with 4 or 5 unknowns, and use the Gauss elimination procedure to solve them. 1.16'. Let A E Mn,n(lK). Show that if AB = 0 VB E Mn,n(lK), then A = O.
1.11'. Let A =
(~
-1 -1
~) and B =
-1
(:
2
~). Compute A + B, V2A + B.
1.18'. Let
A=
(~
-1
3 2
2 0
o
1
:) 1
~ (~ ~:).
B
2
5
1.19'. Let
Show that AB = O. 1.80'. Let A, B E Mn,n(lK). Show that if AB = 0 and A is invertible, then B = O. 1.81'. Let
B (0 i) :=
i
0
'
C (i 0). :=
0
-i
Show that o o o o
A2 = B2 = C2 = - Id, AB=-BA=C, BC = -CB = Id, CA=-AC=B.
1.82 ~. Let A, B ∈ M_{n,n}. We say that A is symmetric if A = A^T. Show that, if A and B are symmetric, then AB is symmetric if and only if A and B commute, i.e., AB = BA.

1.83 ~. Let M ∈ M_{n,n}(K) be an upper triangular matrix with all entries in the principal diagonal equal to 1. Suppose that for some k we have M^k = M M ⋯ M = Id_n. Show that M = Id_n.

1.84 ~. Let A, B ∈ M_{n,n}(K). In general AB ≠ BA. The n × n matrix [A, B] := AB - BA is called the commutator or the Lie bracket of A and B. Show that

(i) [A, B] = -[B, A],
(ii) (JACOBI'S IDENTITY) [[A, B], C] + [[B, C], A] + [[C, A], B] = 0,
(iii) the trace of [A, B] is zero. The trace of an n × n matrix A = [a^i_j] is defined as tr A := Σ_i a^i_i.
1.85 ~. Let A ∈ M_{n,n} be diagonal with pairwise distinct diagonal entries. Show that B is diagonal if and only if [A, B] = 0.

1.86 ~ Block matrices. Write an n × n matrix as

\[
A = \begin{pmatrix} A^1_1 & A^1_2\\ A^2_1 & A^2_2 \end{pmatrix},
\]

where A^1_1 is the submatrix of the first k rows and h columns, A^1_2 is the submatrix of the first k rows and the last n - h columns, etc. Show that

\[
\begin{pmatrix} A^1_1 & A^1_2\\ A^2_1 & A^2_2 \end{pmatrix}
\begin{pmatrix} B^1_1 & B^1_2\\ B^2_1 & B^2_2 \end{pmatrix}
=
\begin{pmatrix} A^1_1 B^1_1 + A^1_2 B^2_1 & A^1_1 B^1_2 + A^1_2 B^2_2\\ A^2_1 B^1_1 + A^2_2 B^2_1 & A^2_1 B^1_2 + A^2_2 B^2_2 \end{pmatrix}.
\]

1.87 ~. Let A ∈ M_{k,k}(K), B ∈ M_{n,n}(K) and

\[
C := \begin{pmatrix} A & 0\\ 0 & B \end{pmatrix}.
\]

Compute det C.

1.88 ~. Let A ∈ M_{k,k}(K), B ∈ M_{n,n}(K), C ∈ M_{k,n}(K) and

\[
M := \begin{pmatrix} A & C\\ 0 & B \end{pmatrix}.
\]

Compute det M.

1.89 ~ Vandermonde determinant. Let λ_1, λ_2, ..., λ_n ∈ K and

\[
A := \begin{pmatrix}
1 & 1 & 1 & \cdots & 1\\
\lambda_1 & \lambda_2 & \lambda_3 & \cdots & \lambda_n\\
\lambda_1^2 & \lambda_2^2 & \lambda_3^2 & \cdots & \lambda_n^2\\
\lambda_1^3 & \lambda_2^3 & \lambda_3^3 & \cdots & \lambda_n^3\\
\vdots & & & & \vdots\\
\lambda_1^{n-1} & \lambda_2^{n-1} & \lambda_3^{n-1} & \cdots & \lambda_n^{n-1}
\end{pmatrix}.
\]

Prove that det A = ∏_{i<j} (λ_i - λ_j). [Hint: Proceed by induction on n. Notice that det A is a polynomial in λ_n and use the principle of identity for polynomials.]

1.90 ~. Compute the rank of the following matrices

(! 1 1 3 4 3 -3 1 -2 nu 3 1 2 5 1 -1 2 3 ~) (111!)

1.91 ~. Solve the following linear systems

\[
\begin{cases}
3x - y + 2z + t = 1,\\
2x + 4y + 3z - 2t = 3,\\
x + 2y - z + 2t = 2,\\
x - 5y + 4z - 3t = 1,
\end{cases}
\qquad\qquad
\begin{cases}
2x + 2y - 3z + 3t = 3,\\
x + 2y - z + 3t = 2,\\
x - 3y + 2z + 2t = -4,\\
4x + y - 2z + 8t = 1.
\end{cases}
\]
2. Vector Spaces and Linear Maps
The linear structure of lK n is shared by several mathematical objects. We have already noticed that the set of m x n matrices satisfies the laws of sum and multiplication by scalars. The aim of this chapter is to introduce abstract language and illustrate some facts related to linear structure. In particular, we shall see that in every finite-dimensional vector space we can introduce the coordinates related to a basis and explain how the coordinates description of intrinsic objects changes when we change the coordinates, i.e., the basis.
2.1 Vector Spaces and Linear Maps a. Definition Let lK be a commutative field, here it will be either
~
or
2.1 Definition. A vector space over the field lK is a set X endowed with (i) an operation + : X x X -+ X, called the sum, that makes X a commutative group, i.e., a) (x+y)+z=x+(y+z), x+y=y+x, 'Vx,yEX, b) there exists an element 0 E X called the zero element, such that x+O=O+x=x'VxEX, c) for every x E X there exists -x E X such that x + (-x) = 0, (ii) an operation of multiplication by a scalar· : lK x X -+ X that associates to every A E lK and x E X an element of X denoted by AX such that a) A(X + y) = AX + Ay, (A + JL)x = AX + JLX, b) A(JLX) = (AJL)X, 1· x = x. In particular, (-l)x = -x "Ix E X; we therefore write x - y instead of
x
+ (-y).
The elements of a vector space over lK are called vectors, and the elements of lK are called scalars. The product of a vector by scalars allows us to regard a vector at all scales.
42
2. Vector Spaces and Linear Maps
2.2 Example. As we have seen, IK n for n :::: 1, and all the linear subspaces of IK n are vector spaces over IK. Also, the space of m x n matrices with entries in IK, Mm,n(IK), is a vector space over IK, with the two operations of sum of matrices and multiplication of a matrix by a scalar, see Section 1.2. 2.3 Example. Let X be any set. Then the class F(X, IK) of all functions <.p : X - t IK is a vector space with the two operations of sum and multiplication by scalars defined by
(<.p + 'lj;)(x) := <.p(x)
+ 'lj;(x),
(A<.p)(X) := A<.p(X)
'VxEX.
Several subclasses of functions are vector spaces, actually linear subspaces of F(X, IK). For instance, o the set CO([O, 1]' JR.) of all continuous functions <.p : [O,IJ - t JR., the set of kdifferentiable functions from [0,1] into JR., the set Ck([O, 1], JR.) of all functions with continuous derivatives up to the order k, the set COO([O, 1]' JR.) of infinitely differentiable functions, o the set of polynomials of degree less than k, the set of all polynomials, o the set of all complex trigonometric polynomials, o the set of Riemann summable functions in ]0,1[, o the set of all sequences with values in IK.
We now begin the study of properties that depend only on the linear structure of a vector space, independently of specific examples.
b. Subspaces, linear combinations and bases 2.4 Definition. A subset W of a vector space X is called a linear subspace, or shortly a subspace of X, if
(i) 0 E W, (ii) 'V x, yEW we have x + yEW, (iii) 'V x E W and 'V A E lK we have AX E W. Obviously the element 0 is the zero element of X and the operations of sum and multiplication by scalars are as those in X. In a vector space we may consider the finite linear combinations of elements of X with coefficients in lK, i.e., n
where AI, A2 ,00., An ElK and VI, V2,oo., V n E X. Notice that we have indexed both the vectors and the relative coefficients, and we use the standard notation on the indices: a list of vectors has lower indices and a list of coefficients has upper indices. It is readily seen that a subset W c X is a subspace of X if and only if all finite linear combinations of elements of X with coefficients in lK belong to W. Moreover, given a set SeX, the family of all finite linear combinations of elements of S is a subspace of X called the span of Sand denoted by Span S. We say that a finite number of vectors are linearly dependent if there are scalars, not all zero, such that
2.1 Vector Spaces and Linear Maps
Q
43
TER
..
_,......-....._--...
,..
TAil .Oy ..... Ul.tt AC,.DUIY.
OUJI.Uf 1100011 .Aii'D OaA.nOIl.ITlKJrr. _IMITtI, _ n_--. UlIIDOIf ..."1I1.D_c:o.,,,'t"JltlIlA,,,,,, c.IoIlIll"'~'~
I'"~
Figure 2.1. Arthur Cayley (1821-1895) and the Lectures on Quatemions by William R. Hamilton (1805-1865).
>.lVl
+ ... + >.n Vn = 0,
or, in other words, if one vector is a linear combination of the others. If n vectors are not linearly dependent, we say that they are linearly independent. More generally, we say that a set 5 of vectors is a set of linearly independent vectors whenever any finite list of elements of 5 is made by linearly independent vectors. Of course linearly independent vectors are distinct and nonzero.
2.5 Definition. Let X be a vector space. A set 5 of linearly independent vectors such that Span 5 = X is called a basis of X. A set A c X is a maximal independent set of X if A is a set of linearly independent vectors and, whenever we add to it a vector w E X\A, Au{ w} is not a set of linearly independent vectors. Thus a basis of X is a subset 5
c
X such that
(i) every x E X is a finite linear combination of some elements of 5. Equivalently, for every x E X there is a map >. : 5 --; K such that x = EVEs >.(v)v and >.(v) = 0 except for a finite number of elements (depending on x) of 5, (ii) each finite subset of 5 is a set of linearly independent vectors.
It is easy to prove that for every x E X the representation x = EVEs >.( v)v is unique if 5 is a basis of X. Using the same proof as in Proposition 1.5 we then infer 2.6 Proposition. Let X be a vector space over K. Then 5 of X if and only if 5 is a maximal independent set.
c
X is a basis
44
2. Vector Spaces and Linear Maps
Using Zorn's lemma, see [GM2], one can also show the following.
2.7 Theorem. Every vector space X has a basis. Moreover, two bases have the same cardinality. 2.8 Definition. A vector space X is finite dimensional if X has a finite basis. In the most interesting infinite-dimensional vector spaces, one can show the basis has nondenumerable cardinality. Later, we shall see that the introduction of the notion of limit, Le., of a new structure on X, improves the way of describing vectors. Instead of trying to see every x E X as a finite linear combination of elements of a nondenumerable basis, it is better to approximate it by a suitable sequence of finite linear combinations of a suitable countable set. For finite-dimensional vector spaces, Theorem 2.7 can be proved more directly, as we shall see later. 2.9'. Show that the space of all polynomials and CD([O, 1]' IR) are infinite-dimensional vector spaces.
c. Li1!ear maps 2.10 Definition. Let X and Y be two vector spaces over lK. A map rp : X -> Y is called lK-linear, or linear for short, if
rp(x + y)
= rp(x) + rp(y)
and
rp( >.x)
= >.rp(x)
for any x, y E X and>. E K A linear map that is injective and surjective is called a (linear) isomorphism. Of course, if rp : X
->
Y is linear, we have rp(O)
2.11 Proposition. Let rp : X rp(
->
= 0 and, by induction,
Y be linear. Then
k
k
i=l
i=l
L >.iei ) = L >.irp(ei)
for any>.!, >.2, ... , >.n ElK and el, e2,"" en EX. In particular, a linear map is fixed by the values it takes on a basis. The space of linear maps rp : X -> Y between two vector spaces X and Y, denoted by £(X, Y), is a vector space over lK with the operations of sum and multiplication by scalars defined in terms of the op~rations on Y by (rp + ~)(x) := rp(x) + ~(y), (>.rp)(x) = >.rp(x) for all rp, ~ E £(X, Y) and>. E R Notice also that the composition oflinear maps is again a linear map, and that, if rp : X -> Y is an isomorphism, then the inverse map rp-l : Y -> X is also an isomorphism. It is easy to check the following.
2.1 Vector Spaces and Linear Maps
2.12 Proposition. Let 'P : X
--+
45
Y be a linear map.
(i) If SeX spans W C X, then 'P(S) spans 'P(W). (ii) If el, e2, ... , en are linearly dependent in X, then 'P(el), ... , 'P(e n ) are linearly dependent in Y. (iii) 'P is injective if and only if any list (el' e2, ... , en) of linearly independent vectors in X is mapped into a list ('P(eI) , 'P(e2), ... , 'P(e n )) of linearly independent vectors in X. (iv) The following claims are equivalent a) 'P is an isomorphism, b) Sex is a basis of X if and only if 'P(S) is a basis of Y. 2.13~.
Show that the following maps are linear (i) the derivation map D : C 1 ([0,1]) ---+ CO ([0, 1]) that maps a C 1 -function into its derivative, I ---+ I'· (ii) the map that associates to every function of class CO([O, 1]) its integral over [0,1],
1---+
fa1 I(t) dt,
(iii) the primitive map CO ([0, 1]) ---+ C 1 ([O, 1]) that associates to every continuous function the primitive function
f
x
I(x)
---+
F(x)
:=
I(t) dt.
° 2.14 Definition. Let'P : X --+ Y be a linear map. The kernel of'P and the image of'P are respectively
I
ker'P:= {x E X 'P(x) = Im'P:= {y E Y
o},
I:J x EX: 'P(x) =
y}.
It is easily seen that ker'P is a linear subspace of the source space X and that ker'P = {o} if and only if 'P is injective. Also, Im'P is a linear subspace of the target space Y, and Im'P = Y if and only if 'P is surjective. If 1m 'P has finite dimension, its dimension is called the rank of 'P and denoted by Rank 'P. Of course 'P is surjective if and only if dim Y = Rank 'P, provided dim 1m 'P < +00. d. Coordinates in a finite-dimensional vector space Let X be a finite-dimensional vector space over lK and let (er, e2,"" en) be an ordered basis on X. Then every vector x E X writes uniquely as x = L~=l xiei, where Xl, X2,.·., Xn ElK. Then (el' e2, ... , en) defines a map E : X --+ lKn characterized by n
if and only if
X = Lxiei'
i=l
46
2. Vector Spaces and Linear Maps
h
7 £
£-1
Figure 2.2. Coordinate system in a finite-dimensional vector space.
It is trivial to verify that £ is linear, injective and surjective, hence an isomorphism, together with its inverse n
£-l(x)
= Lxiei'
x
= (xl,
x 2, ... , x n ).
i=l
We call £ the coordinate system related to the ordered basis (el' e2, .. ·, en) and refer to £(x) as to the coordinate vector of x with respect to the basis (el' e2, ... , en). Notice that £ maps ei to the ith vector ei of the canonical basis of lKn . Also notice that ordered bases and isomorphims £ : X -+ lKn are in one-to-one correspondence. In particular, any isomorphism £ : X -+ lKn is a coordinate system related to a suitable basis. In fact, the vectors el, e2, ... , en of X defined for i = 1, ... ,n by ei:= £-l(ei) form a basis of X by Proposition 2.12 (iii) and it is easy to check that n
if and only if
X
= Lxiei'
i=l
The use of a basis, or, equivalently, of a coordinate system, allows us to transfer definitions and results in lKn to similar definitions and claims in X. We have the following. 2.15 Proposition. Let X be a finite-dimensional vector space. (i) Let (er, e2, ... , en) be an ordered basis of X and let VI, V2, ... , vp E X, P ~ n, be p linearly independent vectors. Then we can choose n - p elements among el, e2, ... , en, say el, e2,"" e n - p, such that (VI, V2,"" vp,el, e2, .. ·, e n - p) is a basis of X. (ii) Assume that VI, V2, . .. , Vk spans X. Possibly eliminating some of the VI, V2, ... , Vk, we get a basis of X. (iii) Any two bases of X have the same number of elements.
The number of elements of a basis of a finite-dimensional space X is called the dimension of X and denoted by dim X. The following corollaries follow from Proposition 2.15.
2.1 Vector Spaces and Linear Maps
47
2.16 Corollary. Let X be a vector space of dimension n, let E : X _ OCn be a coordinate system on X and let W be a subspace of X. Then E(W) is a subspace ofOC n and dim W = dimE(W). 2.17 Corollary. Let X be a vector space of dimension n. Then
(i) n linearly independent vectors of X form a basis of X, (ii) if k > n, then k vectors of X are always linearly dependent, (iii) for every subspace W of X we have dim W ::; n, (iv) let V, W be two subspaces of X. Then V = W if and only if V C W and dim V = dim W. Let U and V be two subspaces of a vector space X. Then both U n V and U + V := { x E X I x = U + v, U E U, v E V} are linear subspaces of X. When Un V = {O}, we say that U + V is the direct sum of U and V and we write U EB V instead of U + V. Moreover if X = U EB V, we say that U and V are supplementary. Thus X = U EB V means that every x E X decomposes uniquely as x = u+w, u E U, v E V.
2.18 Corollary (Grassmann's formula). Let U and V be two finitedimensional subspaces of a vector space X. Then dim U + dim V
= dim(U n V) + dim(U + V).
2.19 'If. Show that every n-dimensional vector space is the direct sum of n subspaces of dimension 1. 2.20 'If. Let el, e2, ... , en be distinct vectors of X and let 1 < P < n. Then, trivially, Span {eI' e2, ... , ep} + Span{ep+I, ... ,en} = Span {eI' e2, ... , en}. Show that, if the ei's are linearly independent, then
2.21 'If. Let VI, V2 be two subspaces of a vector space V of finite dimension and assume that V = VI El7 V2. Then every vector v E V decomposes uniquely as v = VI + V2 with Vi E V. Show that the coordinate maps 1ri : V ---+ Vi, i = 1,2, 1ri(V) = Vi, are linear. 2.22 ,. Let 'P : X ---+ Y be an isomorphism from X onto Y. Show that dim X
= dim Y.
e. Matrices associated to a linear map Let X, Y be two vector spaces of dimension nand m respectively. We shall now show that every choice of an oriented basis, equivalently of a coordinate system, in X and Y yields an identification between linear maps and matrices. Let (el' e2, ... , en) be an oriented basis in X, (h, 12, ... , fm) be an oriented basis in Y and let E : X _ OCn , F : Y _ ocm be the corresponding
48
2. Vector Spaces and Linear Maps
Dl AlIIdeh.n
bre
'·011
1844
Die llneale Ausdehnungslehre
_
... ... _...... ••••• 11-0'11•
............
....... I'WIl,"".......
~
-
~
. ...
~
""fOII I,t.
Figure 2.3. Hermann Grassmann (1808-
1877) and his Ausdenungslehre.
coordinate systems. To every linear map f : X linear map L : ][(n -> ][(m defined by
->
Y one associates the (2.1)
see Figure 2.4, that maps the coordinates of a vector x EX, relative to the basis (el' e2, ... , en), into the coordinates of f(x) E Y, relative to the basis (fI, 12,···, 1m), and then, see Proposition 1.18, an m x n matrix L such that L(x) = Lx. We call Land L respectively, the map and the matrix associated to f using the coordinate systems £ and F, or, equivalently, using (eI, e2,···, en) and (fI, 12,·.·, 1m) as a basis in X and in Y, respectively. Since £-1 maps the ith vector ei of the canonical basis of ][(n to ei, L(ei) is the coordinate vector of £(ei) in the basis (fI, 12,···, 1m), hence L = [L;J where n
f(ej) =
L Ljk
(2.2)
i=l
Equivalently, see Proposition 1.18,
Since £ : X have
-> ][(n
and F : Y
£(ker £)
= ker L,
-> ][(m
are isomorphisms, we trivially
F(lm f)
= 1m L.
Hence, recalling Theorem 1.25, we have the following.
2.1 Vector Spaces and Linear Maps
49
.<....--/X~7 L Figure 2.4. The matrix associated to a linear map.
2.23 Theorem (Rank formula). Let £ : X --. Y be a linear map between linear spaces. If X is finite dimensional, then Rank £ = dimlm£
= dim X
- dimker£.
Proof. Let (el' e2, ... , en) be a basis of X. Then Ime = Span {e(el), ... ,e(en )}, hence dimlme < +00. Now choose a basis (ft, 12, ... , 1m) of Ime and consider the linear associated map L : IKn ---> IK m using the two bases (et, e2, . .. , en) on X and (ft, 12,· .. , 1m) on 1m e. Then Theorem 1.25 yields dimlme = dim 1m L = n - dimker L = n - dimkere.
o f. The space £(X, Y) Let X and Y be vector spaces of dimension nand m, and let (eI, e2,.··, en) and (fl, 12,···, fm) be two bases in X and Y. Then (2.2) defines a map M : .c(X, Y) --. Mm,n(lK) which is trivially injective and surjective. Since M is also linear, we deduce that .c(X, Y) and Mm,n(lK) are isomorphic. In particular, the vector space .c(X, Y) has dimension mn. A basis of .c(X, Y) is given by the mn maps {'f'~}, j = 1, ... , n, i = 1, ... ,m, defined in terms of the bases as if k
=1=
if k
= i,
i,
k
=
1, ... ,n.
'f';
The matrix associated to is the m x n matrix with all entries 0 except for the entry (i,j) where we have 1. Of course the matrix M(£) associated to £ depends on the coordinate systems we use on X and Y. When we want to emphasize such a dependence, we write
M[(£) to denote the matrix associated to £ : X --. Y using the coordinate systems t: on the source space and :F in the target space. The product of composition of linear maps corresponds to the product of composition of linear maps at the level of coordinates, hence to the product row by columns of the corresponding matrices. More precisely, we have the following.
50
2. Vector Spaces and Linear Maps
2.24 Proposition. Let
-+ Z -+ IK k
be two linear maps, be three systems of
rows by columns. Proof. In fact,
o A special case arises if X = Y = Z. In this case, the space £(X, X) of the linear maps from X into itself, also known as the space of endomorphisms of X and sometimes denoted by End (X), is closed under the operations of sum, multiplication by scalars and product of composition. We say that £(X, X) is an algebra with respect to these operations and, for any coordinate system £ on X, M~ : £(X, X) -+ Mn,n(IK) is an isomorphism of algebras. The set of isomorphisms from X into itself, called the automorphisms of X and denoted by Aut (X), is a group with respect to the composition. If dim X = nand £ : X -+ IK n is a coordinate system, then M~(Aut (X)) coincides with the group GL(n, IK) of all nonsingular n x n matrices,
I
GL(n, IK) := {L E Mn,n(IK) det L
:f- O}.
g. Linear abstract equations Let X, Y be two vector spaces over K A linear (abstract) equation in the unknown x is an equation of the form
=
y,
where
(2.3)
= 0 is
(i) the set of all solutions of the associate linear homogeneous equation
I
{ x E X x - Xo E ker
Taking into account the rank formula, we infer the following. 2.25 Corollary. Let X, Y be finite dimensional of dimension nand m respectively, and let
2.1 Vector Spaces and Linear Maps
51
(i) ifm < n, then dimkercp > 0, (ii) if m 2 n, then cp is injective iff Rank cp = n, (iii) if n = m, then cp is injective if and only if cp is surjective. The claim (iii) of Corollary 2.25 is one of the forms of Fredholm's alternative theorem: either cp(x) = y is solvable for every y E Y or cp(x) = 0 has a nonzero solution. 2.26 Example. A second order linear equation
ay" +by' +cy =
f,
a,b,c E IR,
can be seen as an abstract linear equation
f E CO(IR),
f by introducing the linear map
y .......
(2.4)
+ by' + cy.
(2.5)
f E CO(IR), see [GM1J,
°
(i) the set of all solutions of the associated homogeneous equation ay" +by' +cy = is a linear space, actually ker
Moreover, consider the map 'Y : 1R2 ....... C2(1R) that maps each (a, f3) E 1R2 to the unique solution of the initial value problem
ay" + by' + cy = 0, { yeO) = a, y' (0) = f3. It is easy to show that 'Y : 1R2 ....... C2(1R) is linear, and by definition, Im'Y = ker
°
h. Changing coordinates The coordinates of a vector depend on the chosen coordinate system. Let us discuss how they change. Let X be a vector space of dimension n, and let £ : X -+ ][{n and F : X -+ ][{n be two coordinate systems on X, that we label respectively as the old system and the new system. Denote by (e1' e2, ... , en) and (ft, 12, ... , fn) the bases associated respectively to the old coordinate system £ and to the new coordinate system F. The linear map L := F 0 £-1 : ][{n -+ ][{n maps the old £-coordinate vector of x E X to the new F-coordinate vector of x, see Figure 2.5. The matrix L associated to L in the basis (ell e2, ... , en) is
52
2. Vector Spaces and Linear Maps
L
Figure 2.5. Changing the basis.
We say that L or L changes coordinates from £ to :F. Remember that the ith column of L is the new .r-coordinate vector of the ith vector of the old basis, i.e, see (2.2), L = [Lj] where n ej
=
LL~k
(2.6)
i=l
Let m : X ....... X be the linear map defined by m(ei) := Ii Vi = 1, ... , m. Then the associated matrix M = [Mj] to m using the basis £ is M :=
[fUr) I£(12) 1···1 fUn)]
or, n
!J = LMjei' i=l
Therefore, comparing with (2.6), M = L -1. In conclusion (i) L maps the old coordinates to the new coordinates,
(ii) L -1 maps the old basis to the new basis. Thus L acts differently on the basis and on the coordinates. We say that the coordinates change in a contravariant way. This is nothing mysterious; for instance, 7000 g = 7 Kg: if the unit measure f 1 is, say 1000 times the unit measure el, then we expect that the number of units el associated to a measure will be 1/1000 of the number of units fl. 2.27 Example. Suppose we want to change from the canonical basis (er, e2) to a new one given by the vectors (1,2) and (3,4) in ]R2. The matrix that changes the basis from el = (1,0), e2 = (0,1) into !I = (el + 2 e2), h = (3 el + 4 e2) is
The old coordinates of a vector P can easily be obtained from the new ones, as P = x er
+ ye2 = ex !I + j3 h,
2.1 Vector Spaces and Linear Maps
53
thus, in the old coordinates,
and, conversely, we have
i. The associated matrix under changes of basis
Let X and Y be two vector spaces of dimension nand m. As before, the matrix associated to a linear map depends on the chosen coordinates on X and Y. Let £ : X ---... JKn, £' : X ---... JKn be two coordinate systems on X, and let F : X ---... JKn, F' : X ---... JKn be two coordinate systems on Y. Of course, for every map £ we have £ = Id y 0 £ 0 Id x , consequently
or, in other words, we can state the following.
2.28 Proposition. Given the previous notation, let R E Mn,n(JK) be the matrix that changes coordinates from £ to £', let S E Mm,m(JK) be the matrix that changes coordinates from F to F', let A A' be the matrices that represent £ respectively, in the systems of coordinates £ and F and in the systems £' and F'. Then
2.29 Corollary. Let A E Mn,n(JK) and let A: JKn ---... JKn be the associated linear operator A(x) := Ax. Let (f1 , f 2 , ... , fn ) be a basis ofJKn. Then the matrix associated to A using the basis (f1 , f 2 , ... , fn ), both in the source and the target JKn, is the matrix
A':= S-lAS where
Proof. Let (e1, e2, ... , en) be the canonical basis of OCn , then Se; = f; are the coordinates of f; in the basis (e1, e2, ... , en), ASe; = Af; is the coordinate vector of A (f;) in canonical coordinates and, finally, S -1 ASe; is the coordinate vector of A (f;) in the (f1, f2, ... , f n ) basis. 0
54
2. Vector Spaces and Linear Maps
j. The dual space £(X, lK) Linear maps from X into lK play a special role. Let X be a vector space over lK with dim X = n. Linear maps from X into lK are also called linear forms or coveetors, and the space of linear forms, £(X, lK), also denoted by X*, is called the dual space of X. Suppose X is finite dimensional. Then, as we have seen, X* has dimension nand, if (el' e2, ... , en) is a basis of X, then every linear form £ : X ----t lK is represented as a 1 x n matrix L that maps the coordinates x of x E X to £(x), i.e., n
X= Lxiei' x=(x l ,x2, ... ,xn )
if
Lx = £(x)
i=l
or, L = (ir
Ihi··· /lnl
and n
£(x)
=
\-I
vx
Llixi,
=
(1. X , X 2 , ••• ,
x n) .
i=l
Consider now the linear maps e l , e 2 , ... , en : X
----t
lK defined by
(2.7)
"fi,j = 1, ... ,n.
2.30 Proposition. We have = 1, ... ,n, the map e i of x, so that
(i) for i
:
X
----t
lK maps x E X to the ith coordinate
n
X= L
"fxEX,
ei(x)ei
i=l
(ii) (e\ e 2, ... , en) is a basis on X*, (iii) if x = L:~l xiei E X and £ = L:~=l liei E X*, then £(x) = L:~l lixi. Proof. (i) If x = 2::?=1 xiei, then ej(x) = 2::~1 xiej(ei) = xj. (ii) Let f. := 2::';=1 pje j . Then f.(e;) = Pi Vi. Thus, if f.(x) = 0 'Ix we trivially have Pi = 0 Vi. (iii) In fact,
f.(x)
=
= tljej(txiei)
(t1jej)(I>iei) j=l i=l n
n
= LL:Zjxiej(ei) i=l j=l
j=l n
n
i=l n
= LLljxioij = L1i xi . i=l j=l
i=l D
The system of linear maps (e l , e 2 , .•• , en) characterized by (2.7) is called the dual basis of (el' e2, ... , en) in X*.
2.1 Vector Spaces and Linear Maps
55
CALCOLO
GEO 1ETRICO _n~ll_
O,£lUI051 DELU IASIC> DEDOtllTA
_om.,,. _.. _ ,IAICO
....
" " ..,
"" •
U'ht"
_
IWIU1
T01UNO
-
PRATSI,LI ROCCA eDITORE
h~=_~
h!i!EU
,......,~ ..,'
. . =M..
'-i~.
'ISO
Figure 2.6. Giuseppe Peano (1858-1932) and the frontispiece of his Calcolo Geometrico.
2.31 Remark. Coordinates of vectors or covectors of a vector space X of dimension n are both n-tuples. However, to distinguish them, it is useful to index the coordinates of covectors with lower indices. We can reinforce this notation even more by writing the coordinates of vectors as column vectors and coordinates of covectors as row vectors.
k. The bidual space Of course, we may consider also the space of linear forms on X*, denoted by X** and called the bidual of X. Every v E X identifies a linear form on .c(X*,IK), by defining v** : X* -+ IK by v**(l) := l(v). The map , : X -+ X**, x -+ ,(x):= x** we have just defined is linear and injective. Since dimX** = dimX* = dimX, , is surjective, hence an isomorphism, and we call it natural since it does not depend on other structures on X, as does the choice of a basis. Since X** and X are naturally isomorphic, there is a "symmetry" between the two spaces X and X*. To emphasize this symmetry, it is usual to write instead of
if
< , >: X* x X
-+
IK,
<
>:=
2.32'. Let X be a vector space and let X' be its dual. A duality between X and X' is a map < , >: X' X X -+ lK that is
56
2. Vector Spaces and Linear Maps
(i) linear in each factor,
< 'P, ax + /3y > = a < 'P, x> +/3 < 'P, Y >, < a'P + /3'IjJ, x> = a < 'P, x> +/3 < 'IjJ, x>, for all a, /3 E IK, x, Y E X and 'P, 'IjJ E X*, (ii) nondegenemte i.e., if <'P,x>=OVx, then'P=O, if
< 'P, x > = 0 V'P, then x = O. -+ < 'P, x > that evaluates a linear map 'P : X
Show that the evaluation map ('P, x) at x E X is a duality.
-+
IK
1. Adjoint or dual maps Let X, Y be vector spaces, X* and Y* their duals and < , > x and < , >y the evaluation maps on X* x X and Y* x Y. For every linear map £ : X ---+ Y, one also has a map £* : Y* ---+ X* defined by
< £*(y*),x >:=< y*,£(x) >
Vx E X, Vy* E Y*.
It turns out that £* is linear. Now if (el' e2, , en) and (fI,···, fm) are bases in X and Y respectively, and (e l , e2, , en) and (J1, fm) are the dual bases in X* and Y*, then the associated matrices L = [LJ] E Mm,n(lK) and M = [MJ] E Mn,m(lK), associated respectively to £ and £*, are defined by
p, ... ,
n
m
£(ei) =
"L L? fh,
£*(Jh) =
"L Mh ei . i=l
h=l By duality,
Mh = < £*(Jh), ei > = < fh, £(ei) > = L? i.e., M = LT. Therefore we conclude that ifL is the matrix associated to £ in a given basis, then L T is the matrix associated to £* in the dual basis. We can now discuss how coordinate changes in X reflect on the dual space. Let X be a vector space of dimension n, X* its dual space, (el, ... , en), (El' E2, ... , En) two bases on X and (e l , e2, ... , en) and (E l , E2, ... , En) the corresponding dual bases on X*. Let £ : X ---+ X be the linear map defined by £(ei) := Ei Vi = 1, ... , n. Then by duality
< £*(Ei),ej > = < Ei,£(ej) > = < Ei,Ej > =
Oij
= < ei,ej >
Vi,j,
£*(Ei ) = ei Vi = 1, ... , n. If Land L T are the associated matrices to £ and £*, L changes basis from (el' e2, ... , en) to (El' E2, ... , En) in X, and L T changes basis in the dual space from (E l , E2 , ... , En) to (e l , e2, ... , en), n
Ei = "LLjei' i=l
n
ei
= "L(LT){E j . j=l
2.2 Eigenvectors and Similar Matrices
n
n
n
57
n
Lbiei = LLbi(LT){E j = Lai ei . i=l
i=l j=l
i=l
or
a=bL.
In other words, the coordinates in X* change according to the change of basis. We say that the change of coordinates in X* is covariant.
2.2 Eigenvectors and Similar Matrices Let A : X ~ X be a linear operator on a vector space. How can we describe the properties of A that are invariant by isomorphisms? Since isomorphims amount to changing basis, we can put it in another way. Suppose X is finite dimensional, dim X = n, then we may consider the matrix A associated to A using a basis (we use the same basis both in the source and the target X). But how can we catch the properties of A that are independent of the basis? One possibility is to try to choose an "optimal" basis in which, say, the matrix A takes the simplest form. As we have seen, if we choose two coordinate systems £ and F on X, and S is the matrix that changes coordinates from £ to F, then the matrices A and B that represent A respectively in the basis £ and F are related by
B = SAS- 1 . Therefore we are asking for a nonsingular matrix S such that S-l AS has the simplest possible form: this is the problem of reducing a matrix to a canonical form. Let us try to make the meaning of "simplest" for a matrix more precise. Suppose that in X there are two supplementary invariant subspaces under A
Then every x E X splits uniquely as x = Xl + X2 with Xl E W 1, X2 E W2, and A(x) = A(X1) + A(X2) with A(xt} E W 1 and A(X2) E W 2. In other words, A splits into two operators A 1 : W 1 ~ W 1, A : W 2 ~ W 2 that are the restrictions of A to W 1 and W 2 . Now suppose that dimX = n and let (e1' e2, ... , ek) and (ft, 12, ... , fn-k) be two bases respectively of W 1 and W 2 . Then the matrix associated to A in the basis (e1' e2,···, ek,ft, 12,···, fn-k) of X has the form
58
2. Vector Spaces and Linear Maps
where some of the entries are zero. If we pursue this approach, the optimum would be the decomposition of X into n supplementary invariant subspaces WI, W2,.'" W n under A of dimension 1,
In this case, A acts on each Wi as a dilation: A(x) = AiX Vx E Wi for some Ai E K Morever, if (el' e2, ... , en) is a basis of X such that ei E Wi for each i, then the matrix associated to A in this basis is the diagonal matrix
2.2.1 Eigenvectors a. Eigenvectors and eigenvalues As usual, lK denotes the field lR or C. 2.33 Definition. Let A : X - t X be a linear operator on a vector space X over lK. We say that x E X is an eigenvector of A if Ax = AX for some A E K If x is a nonzero eigenvector, the number A for which A(x) = AX is called an eigenvalue of A, or more precisely, the eigenvalue of A relative to x. The set of eigenvalues of A is called the spectrum of A. If A E Mn,n(lK), we refer to eigenvalues and eigenvectors of the associated linear operator A : lKn - t lKn , A(x) := Ax, as the eigenvalues and the eigenvectors of A. From the definition, A is an eigenvalue of A if and only if ker(A Id A) =1= {O}, equivalently, if and only if AId - A is not invertible. If A is an eigenvalue, the subspace of all eigenvectors with eigenvalue A
VA
:= {xEXIA(x) =AX}
=ker(AId-A)
is called the eigenspace of A relative to A. 2.34 Example. let X = Coo ([0, 1r]) be the linear space of smooth functions that vanish at 0 and 1r and let D2 : X -> X be the linear operator D 2 (J) := f" that maps every function f into its second derivative. Nonzero eigenvectors of the operator D 2 , that is, the nonidentically zero functions y E Coo [0, IJ such that D 2 y(x) = AY(X) for some A E ~, are called eigenfunctions. 2.35 Example. Let X be the set Pn of polynomials of degree less than n. Then, each P k C Pn k = 0, ... ,n is an invariant subspace for the operator of differentiation. It has zero as a unique eigenvalue.
2.2 Eigenvectors and Similar Matrices
59
2.36'. Show that the rotation in ]R2 by an angle () has no nonzero eigenvectors if 0,11", since in this case there are no invariant lines.
() f=
2.37 Definition. Let A : X ---? X be a linear operator on X. A subspace We X is invariant (under A) if A(W) c W. In the following proposition we collect some simple properties of eigenvectors.
2.38 Proposition. Let A : X
---?
X be a linear operator on X.
(i) x i= 0 is an eigenvector if and only if Span {x} is an invariant subspace under A. (ii) Let A be an eigenvector of A and let VA be the corresponding eigenspace. Then every subspace W C VA is an invariant subspace under A, i.e., A(W) C W. (iii) dim ker(A Id - A) > 0 if and only if A is an eigenvalue for A. (iv) Let W C X be an invariant subspace under A and let A be an eigenvalue for A 1w . Then A is an eigenvalue for A : X ---? X. (v) A is an eigenvalue for A if and only if 0 is an eigenvalue for AId - A. (vi) Let cp : X ---? Y be an isomorphism and let A : X ---? X be an operator. Then x E X is an eigenvector for A if and only if cp( x) is an eigenvector for cp 0 A 0 cp-l, and x and cp(x) have the same eigenvalue. (vii) Nonzero eigenvectors with different eigenvalues are linearly independent. Proof. (i), ... , (vi) are trivial. To prove (vii) we proceed by induction on the number k of eigenvectors. For k = 1 the claim is trivial. Now assume by induction that the claim holds for k - 1 nonzero eigenvectors, and let el, e2, ... , ek be such that ei f= 0 Vi = 1, ... ,k, A(ej) = Ajej Vj = 1, ... ,k with Aj f= Ai Vi f= j. Let (2.8) be a linear combination of q, e2, ... , ek' From (2.8), multiplying by Al and applying A we get alAlel alAlq
+ a2Ale2 + + a2A2e2 +
+ akAlek = 0, + akAkek = 0,
consequently k
L)Aj - Al)ajej = O. j=2 By the inductive assumption, aj(Aj - AI) = 0 Vj = 2, ... , n, hence aj = 0 for all j :2: 2. We then conclude from (2.8) that we also have al = 0, i.e., that q, e2, . .. , ek are linearly independent. 0
Let A : X ---? X be a linear operator on X of dimension n, and let A be the associated matrix in a coordinate system £ : X ---? JKn. Then (vi) implies that x E X is an eigenvector of A if and only if x := £(x) is an eigenvector for x ---? Ax and x and x have the same eigenvalue. From (vii) Proposition 2.38 we infer the following.
60
2. Vector Spaces and Linear Maps
2.39 Corollary. Let A : X -> X be a linear operator on a vector space X of dimension n. If A has n different eigenvalues, then X has a basis formed by eigenvectors of A.
b. Similar matrices Let A : X -> X be a linear operator on a vector space X of dimension n. As we have seen, if we fix a basis, we can represent A by an n x n matrix. If A and A' E Mn,n(lK) are two such matrices that represent A in two different bases (e1' e2,"" en) and (E1' E2, ... , En), then by Proposition 2.28 A' = S-l AS where S is the matrix that changes basis from (el' e2,.'" en) to (El' E2,"" En). 2.40 Definition. Two matrices A, B E M n n(lK) are said to be similar if there exists S E GL(n, lK) such that B = S-i AS. It turns out that the similarity relation is an equivalence relation on matrices, thus n x n matrices are partitioned into classes of similar matrices. Since matrices associated to a linear operator A : X -> X, dim X = n, are similar, we can associate to A a unique class of similar matrices. It follows that if a property is preserved by similarity equivalence, then it can be taken as a property of the linear operator to which the class is referred. For instance, let A : X -> X be a linear operator, and let A, B be such that B = S-lAS. By Binet's formula, we have
det B
= det S-l det A det S =
_1_ det A det S = det A. detS
Thus we may define the determinant of the linear map A : X
->
X by
detA:= detA where A is any matrix associated to A. c. The characteristic polynomial Let X be a vector space of dimension n, and let A : X operator. The function
A -> PA(A) := det(AId - A),
->
X be a linear
A E lK,
is called the characteristic polynomial of A. It can be computed by representing A by a matrix A in a coordinate system and computing PA(A) as the characteristic polynomial of any of the matrices A representing A,
PA(A) = PA(A) = det(AId - A). In particular, it follows that PA( ) : lK -> lK is a polynomial in A of degree n, and that the roots of PA(A) are the eigenvalues of A or of A. Moreover, we can state
2.2 Eigenvectors and Similar Matrices
61
2.41 Proposition. We have the following.
(i) Two similar matrices A, B have the same eigenvalues and the same characteristic polynomials.
(ii) If A has the form
where for i = 1, ... ,k, each block Ai is a square matrix of dimension k i with principal diagonal on the principal diagonal of A, then
PA(S) = PA1(S)' PA2(S)., ·PAk(S). (iii) We have det(sId - A) = sn - trAs n- 1
+ ... + (-l)ndetA
n
=
sn
+ 2)-1)k aks n-k k=1
where tr A := 2:7=1 Ai is the trace of the matrix A, and ak is the sum of the determinants of the k x k submatrices of A with principal diagonal on the principal diagonal of A. Proof. (i) If B = SAS-l, S E GL(n,IK), then sId - B = S(sId - A)S-I, hence det(sId - B) = detSdet(sId - A)(detS)-1 = det(sId - A). (ii) The matrix sId - A is a block matrix of the same form
sId-AI
0
o
o
I'Id-A, I
o
o
0
hence det(s Id - A) = n:=1 det(s Id - Ai)' (iii) We leave it to the reader. D
Notice that there exist matrices with the same eigenvalues that are not similar, see Exercise 2.73.
62
2. Vector Spaces and Linear Maps
d. Algebraic and geometric multiplicity 2.42 Definition. Let A : X ~ X be a linear operator, and let A E IK be an eigenvalue of A. We say that A has geometric multiplicity kEN if dimker(Ald - A) = k. Let PA(S) be the characteristic polynomial of A. We say that A has algebraic multiplicity k if where q(A)
1= o.
2.43 Proposition. Let A : X ~ X be a linear operator on a vector space of dimension n and let A be an eigenvalue of A of algebraic multiplicity m. Then dim ker(A Id - A) :::; m. Proof. Let us choose a basis (el, e2, ... , en) in X such that (el, e2, ... , ek) is a basis for VA := ker(.\ Id - A). The matrix A associated to A in this basis has the form
AId
C
o
D
A=
where the first block, AId, is a k x k matrix of dimension k = dim VA. Thus Proposition 2.41 (ii) yields PA(S) = det(sId - A) = (s - A)kpD(s), and the multiplicity of A is at least k. 0
e. Diagonizable matrices 2.44 Definition. We say that A E M n ,n(lK) is diagonizable, if A is similar to a diagonal matrix. 2.45 Theorem. Let A : X ~ X be a linear operator on a vector space of dimension n, and let (el' e2, ... , en) be a basis of X. The following claims are equivalent.
(i) el, e2, ... , en are eigenvectors of A and AI, A2, ... , An are the relative eigenvalues.
(ii) We have A(x) = 2:~=1 Aixiei for all x E X if x = 2:~1 xiei. (iii) The matrix that represents A in the basis (el' e2, ... , en) is diag (AI, A2, ... , An).
(iv) If A is the matrix associated to A in the basis (fl, fz,···, in), then
S-l AS = diag (AI, A2' ... , An) where S is the matrix that changes basis from (iI, fz,···, in) to (el' e2, ... , en), i.e., the ith column of S is the coordinate vector of the eigenvector ei in the basis (iI, fz,···, fn).
2.2 Eigenvectors and Similar Matrices
Proof. (i) ¢} (ii) by linearity and (iii) Finally (iii) ¢} (iv) by Corollary 2.29.
¢}
63
(i) since (iii) is equivalent to A(ei) = Aiei. 0
2.46 Corollary. Let A: X -- X be a linear operator on a vector space of dimension n. Then the following claims are equivalent.
(i) X splits as the direct sum of n one-dimensional invariant subspaces (under A), X
= WI EB··· EB Wn .
(ii) X has a basis made of eigenvectors of A. (iii) Let AI, A2,' .. , Ak be all distinct eigenvalues of A, and let VAl' ... , VAk be the corresponding eigenspaces. Then k
LdimVAi =n. i=l
(iv) If A is the matrix associated to A in a basis, then A is diagonizable. Proof. (i) implies (ii) since any nonzero vector in any of the Wi'S is an eigenvector. Denoting by (el, e2, ... , en) a basis of eigenvectors, the spaces Wi := Span {ei} are supplementary spaces of dimension one, hence (ii) implies (i). (iii) is a rewriting of (i) since for each eigenvalue A, VA is the direct sum of the Wi'S that have A as the corresponding eigenvalue. Finally (ii) and (iii) are equivalent by Theorem 2.45. 0
2.47 Linear equations. The existence of a basis of X of eigenvectors of an operator A : X -- X makes solving the linear equation A(x) = y trivial. Let (el' e2, ... , en) be a basis of X of eigenvectors of A and let AI, A2,"" An be the corresponding eigenvalues. Writing x, y E X in this basis, n X
=
Lxiei' i=l
=y
we rewrite the equation A(x)
as
n
L(AiXi - yi)ei i=l
i.e., as the diagonal system AlXl = yl, A2X2 = y2,
!
... , AnXn
Therefore
= yn.
=
0,
64
2. Vector Spaces and Linear Maps
(i) suppose that 0 solution
IS
not an eigenvalue, then A(x) n X
=
y has a unique
.
L
y'
):"e i .
i=1
'
(ii) let 0 be an eigenvalue, and let Vo = Span { el, e2, ... , ed. Then A(x) = y is solvable if and only if yl = ... = yk = 0 and a solution of A(x) = y is Xo := L:7=k+l By linearity, the space of all
fei.
solutions is the set {x E X
Ix -
Xo E ker A = Vo }.
2.48~.
Let A : X --+ X be a linear operator on a finite-dimensional space. Show that A is invertible if and only if 0 is not an eigenvalue for A. In this case show that 1/>.. is an eigenvalue for A-I if and only if>" is an eigenvalue for A.
f. Triangularizable matrices First, we notice that the eigenvalues of a triangular matrix are the entries of the principal diagonal. We can then state the following.
2.49 Theorem. Let A E Mn,n(lK). If the characteristic polynomial decomposes as a product of factors of first degree, i.e., if there are (not necessarily distinct) numbers AI, A2, ... , An E lK such that
then A is similar to an upper triangular matrix. Proof. Let us prove the following equivalent claim. Let A : X --+ X be a linear operator on a vector space of dimension n. If PA(>") factorizes as a product of factors of first degree, then there exists a basis (Ul' U2, ... , un) of X such that Span {uI}, Span {UI, U2}, Span {Ul, U2, U3}, ... , Span {Ul, U2, .. · Un} are invariant subspaces under A. In this case we have for the linear operator A(x) = Ax associated to A
(
a~ul,
AUI
= A(uI) =
AU2
= A(U2) = a~ul + a~u2,
~un = A(un) = a~ul + a~u2 + ... + a~Un,
i.e., the matrix A associated to A using the basis (UI, U2, ... , Un) is upper triangular,
A=
(
~~ .
o
o
2.2 Eigenvectors and Similar Matrices
65
We proceed by induction on the dimension of X. If dim X = 1, the claim is trivial. Now suppose that the claim holds for any linear operator on a vector space of dimension n - 1, and let us prove the claim for A. From
PA(A) = det(A Id - A) = (A - A1) ... (A - An), A1 is an eigenvalue of A, hence there is a corresponding nonzero eigenvalue U1 and Span {ut} is an invariant subspace under A. Now we complete {ut} as a basis by adding vectors V2, . .. Vn, and let B be the restriction of the operator A to Span {V2, ... Vn }. Let B be the matrix associated to B in the basis (V2, ... , v n ), and let A be the matrix associated to A in the basis (u 1, u2, . .. , Un). Then
A=
where
o
at = A1.
B
Thus
PA(A)
= PA(A) =
(A - A1) PB(A)
=
(A - Ad PB(A).
It follows that the characteristic polynomial of B is PB(A) = (A - A2) the inductive hypothesis, there exists a basis (U2, ... un) of Span {V2,
(A - An). By ,Vn} such that
Span {U2}, Span {U2, U3}, ... , Span {U2, .. . , un} are invariant subspaces under B, hence
are invariant subspaces under A.
o
2.2.2 Complex matrices When lK = te, a significant simplification arises. Because of the fundamental theorem of algebra, the characteristic polynomial PA(A) of every linear operator A : X - t X over a complex vector space X of dimension n, factorizes as product of n factors of first degree. In particular, A has n eigenvalues, if we count them with their multiplicities. From Theorem 2.49 we conclude the following at once.
2.50 Corollary. Let A E Mn,n(C) be a complex matrix. Then A is similar to an upper triangular matrix, that is, there exists a nonsingular matrix S E Mn,n(C) such that S-l AS is upper triangular. Moreover,
2.51 Corollary. LetA E Mn,n(C)be a matrix. Then A is diagonizable (as a complex matrix) if and only if the geometric and algebraic multiplicities of each eigenvalue agree.
66
2. Vector Spaces and Linear Maps
Proof. Let A1, A2, ... , Ak be the distinct eigenvalues of A, for each i = 1, ... k, let and VA; respectively be the algebraic multiplicity and the eigenspace of Ai. If dim VA; = mi Vi, then by the fundamental theorem of algebra
mi
n
k
2..= dim VA; = 2..= mi = i=l
n.
i=l
Hence A is diagonizable, by Corollary 2.46. Conversely, if A is diagonizable, then L:f=l dim VA; = n, hence by Proposition 2.43 dim VA; :::: mi, hence k
n
= 2..= mi 2':
i=l
k
2..= dim VA; i=l
= n. o
2.52 Remark (Real and complex eigenvalues). If A E Mn,n(J~), its eigenvalues are by definition the real solutions of the polynomial equation det(.\Id - A) = O. But A is also a matrix with complex entries, A E Mn,n(C) and it has as eigenvalues which are the complex solutions of det(.\Id - A) = O. It is customary to call eigenvalues of A the complex solutions of det(.\ Id - A) = 0 even if A has real entries, while the real solutions of the same equation, which are the eigenvalues of the real matrix A following Definition 2.33, are called real eigenvalues. The further developments we want to discuss depend on some relationships among polynomials and matrices that we now want to illustrate.
a. The Cayley-Hamilton theorem Given a polynomial f(t) = I:~=l aktk, to every n x n matrix A we can associate a new matrix f (A) defined by n
n
f(A):= aoId + Lak Ak =: LakAk,
k=l
k=O
if we set A 0 := Id. It is easily seen that, if a polynomial f(t) factors as f(t) = p(t)q(t), then the matrices p(A) and q(A) commute, and we have f(A) = p(A)q(A) = q(A)p(A).
2.53 Proposition. Let A E Mn(C), and let p(t) be a polynomial. Then
(i) if.\ is an eigenvalue of A, then p(.\) is an eigenvalue of p(A), (ii) if J.t is an eigenvalue of p(A), then J.t = p(.\) for some eigenvalue .\ ofA. Proof. (i) follows observing that A k , kEN, is an eigenvalue of A k if A is an eigenvalue
of A.
(ii) Since J1 is an eigenvalue of p(A), the matrix p(A) - J1 Id is singular. Let p(t) = be of degree k, ak l' O. By the fundamental theorem of algebra we have
L:f=l ai ti
2.2 Eigenvectors and Similar Matrices
67
n k
p(t) - J.t = ak
(t - ri),
i=l
hence p(A) - J.tld = akrr~=l(A - riId) and, since p(A) - J.tId is singular, at least one of its factors, say A - rl Id, is singular. Consequently, rl is an eigenvalue of A and trivially, p(rl) - J.t = O. 0
Now consider two polynomials P(t) := Lj Pjt j and Q(t) := Lk Qktk with n x n matrices as coefficients. Trivially, the product polynomial R(t) := P(t)Q(t) is given by R(t) := LPjQktj+k. j,k
2.54 Lemma. Using the previous notation, if A E Mn,n((C) commutes with the coefficients of Q(t), then R(A) = P(A)Q(A). Proof. In fact, R(A)
= L:::PjQkAJ+k = L:::(PjAj)(QkAk) j,k
=
j,k
(L:::PjAj) (L:::QkAk) j
= P(A)Q(A).
k
o
2.55 Theorem (Cayley-Hamilton). Let A E Mn,n((C) and let PA(S) be its characteristic polynomial, PA(S) := det(s Id - A). Then PA(A) = O. Proof. Set Q(s) := sId - A, sEC, and denote by cofQ(s) the matrix of cofactors of Q(s). By Laplace's formulas, see (1.22), cofQ(s) Q(s) = Q(s) cofQ(s) = det Q(s) Id = PA(S) Id. Since A trivially commutes with the coefficents Id and A of Q(s), Lemma 2.54 yields PA(A) = PA(A) Id = cofQ(A) Q(A) = cofQ(A)· 0 =
o. o
b. Factorization and invariant subspaces Given two polynomials PI, Pz with deg PI :::: deg Pz , we may divide PI by Pz , i.e., uniquely decompose PI as PI = QPz + R where deg R < deg Pz . This allows us to define the greatest common divisor (g.c.d.) of two polynomials that is defined up to a scalar factor, and compute it by Euclid's algorithm. Moreover, since complex polynomials factor with irreducible factors of degree 1, the g.c.d. of two complex polynomials is a constant polynomial if and only if the two polynomials have no common root. We also have 2.56 Lemma. Let p(t) and q(t) be two polynomials with no common zeros. Then there exist polynomials a(t) and b(t) such that a(t) p(t) + b(t) q(t) = 1 "It E C.
68
2. Vector Spaces and Linear Maps
We refer the readers to [GM2], but for their convenience we add the proof of Lemma 2.56 Proof. Let P := {r(t) := a(t)p(t)
+ b(t)q(t) Ia(t), b(t)
are polynOmials}
and let d = Cip + fJq be the nonzero polynomial of minimum degree in P. We claim that d divides both P and q. Otherwise, dividing p by d we would get a nonzero polynomial r := p - md and, since p and d are in P, r = p - md E P also, hence a contradiction, since r has degree strictly less than d. Then we claim that the degree of d is zero. Otherwise, d would have a root that should be common to p and q since d divides both p and q. In conclusion, d is a nonzero constant polynomial. 0
2.57 Proposition. For every polynomial p, the kernel of p(A) is an invariant subspace for A E Mn,n(C). Proof. Let w E ker p(A). Since t p(t) = p(t) t, we infer Ap(A) = p(A)A. Therefore
p(A)(Aw)
= (p(A) A)w = (Ap(A))w = Ap(A)w = AO = o. o
Hence Aw E kerp(A).
2.58 Proposition. Let p be the product of two coprime polynomials, p(t) = PI (t)p2(t), and let A E Mn,n(C). Then kerp(A) := kerpl(A) EB kerp2(A). Proof. By Lemma 2.56, there exist two polynomials aI, a2 such that al (t)PI (t) a2(t)p2(t) = 1. Hence
+
(2.9) Set WI := kerpI(A),
W2 := kerp2(A),
W:= kerp(A).
Now for every x E W, we have al (A)PI (A)x E W2 since P2(A)adA)PI(A)x = P2(A)(Id - a2 (A)P2 (A))x = (Id - a2 (A)p2 (A))p2 (A)x = al(A)pdA)P2(A)x = adA)p(A)x =
and, similarly, a2(A)p2(A)x E WI. Thus W = WI fact, if y E WI n W2, then by (2.9), we have y
+ W2.
o.
Finally W = WI EB W2. In
= al (A)PI (A)y + a2 (A)P2 (A)y = 0 + 0 = O. o
c. Generalized eigenvectors and the spectral theorem 2.59 Definition. Let A E Mn,n(C), and let ..\ be an eigenvalue of A of multiplicity k. We call generalized eigenvectors of A relative to the eigenvalue ..\ the elements of
W
:=
ker(..\Id - A)k.
Of course, (i) eigenvectors relative to ..\ are generalized eigenvectors relative to ..\, (ii) the spaces of generalized eigenvectors are invariant subspaces for A.
2.2 Eigenvectors and Similar Matrices
69
2.60 Theorem. Let A E Mn,n(C). Let AI, A2,"" Ak be the eigenvalues of A with multiplicities ml, m2, ... , mk and let WI, W2, ... , Wk be the subspaces of the relative generalized eigenvectors, Wi := ker(Ai Id - A). Then (i) the spaces WI, W 2, ... , W k are supplementary, consequently there is a basis of en of generalized eigenvectors of A, (ii) dim Wi = mi. Consequently, if we choose A' E Mn,n(JK) using a basis (el, e2,···, en) where the the first ml elements span WI, the following m2 elements span W 2 and the last mk elements span Wk. We can then write the matrix A' in the new basis similar to A where A' has the form
o A/ =
~ o
where for every i = 1, ... , k, the block Ai is a mi x mi matrix with Ai as the only eigenvector with multiplicity mi and, of course, (Ai Id - Ai)m i = O. Proof. (i) Clearly the polynomials Pl(S) := (AI - s)m 1 , P2(S) := (A2 - s)m 2 , ••. , Pk(S) := (Ak - s)m k factorize PA and are coprime. Set N i := PiCA) and notice that Wi = kerN i . Repeatedly applying Proposition 2.58, we then get
kerpA(A)
= ker(N l N2 ... N k ) = ker(N l ) EB ker(N2N3'" Nk) = ... = WI EB W2 EB ... EB Wk·
en.
(i) then follows from the Cayley-Hamilton theorem, kerpA(A) =
(ii) It remains to show that dim Wi = mi Vi. Let (et, e2, ... , en) be a basis such that the first hI elements span WI, the following h2 elements span W2 and the last hk elements span Wk. A is therefore similar to a block matrix
o A'=
o where the block Ai is a square matrix of dimension hi := dim Wi. On the other hand, the qi x qi matrix (Ai Id - Ai)m i = 0 hence all the eigenvalues of Ai Id - Ai are zero. Therefore Ai has a unique eigenvalue Ai with multiplicity hi, and PA i (s) := (s - Ai)h i . We then have k
PA(S) = PA'(S) =
k
I1 PA (s) = I1 (s i
i=l
A)h i
,
i=l
and the uniqueness of the factorization yields hi = mi. The rest of the claim is trivial.
o
70
2. Vector Spaces and Linear Maps
Another proof of dim Wi = mi goes as follows. First we show the following. 2.61 Lemma. If 0 is an eigenvalue of BE Mn.n(C) with multiplicity m, the 0 is an eigenvalue for Bm with multiplicity m.
Proof. The function 1 - Am, A E C, can be factorized as 1- Am = f1~(/ (wi - A) where w := e i 2." 1m is a root of unity (the two polynomials have the same degree and take the same values at the m roots of the unity and at 0). For z, tEe
hence
n
m-l
zmld - B m = zmld _B m =
(wizld - B).
i=O If we set qo(z):= f1~ol q(wiz), we have qo(O)
n
f=
0, and
m-l
PBm (zm) := det(zm Id - B m ) =
n
m-l
PB(WiZ) =
i=O
2
(wi z)mq(w i z) = zm qo(z).
i=O
On the other hand PBm = sTq1(r) for some ql with ql(O) following (2.10) PBm(S) = smq1(s), i.e., 0 is an eigenvalue of multiplicity m for Bm.
f=
(2.10) 0 and some r ;::: 1. Thus,
o
Another proof that dim Wi = mi in Theorem 2.60. Since
n n Lmi = LdimWi =dimX, i=l
i=l
it suffices to show that dim Wi ~ mi "Ii. Since 0 is an eigenvalue of B := Ai Id - A of multiplicity m := mi, 0 is an eigenvalue of multiplicity m for B m by Lemma 2.61. Since Wi is the eigenspace corresponding to the eigenvalue 0 of B m , it follows from Proposition 2.43 that dim Wi ~ m. 0
d. Jordan's canonical form 2.62 Definition. A matrix B E Mn,n(l[{) is said to be nilpotent if there exists k ;::: 0 such that B k = o. Let BE Mq,q(C) be a nilpotent matrix and let k be such that B k = 0, but B k- l =I- o. Fix a basis (el' e2, ... , e s ) of ker B, and, for each i = 1, ... , s, k . . 1 set et := ei and define e~, e;, ... ,ei i to solve the systems Be~ := e~- for j = 2,3 ... as long as possible. Let {e{}, j = 1, ... ,ki , i = 1, ... ,q, be the family of vectors obtained this way.
2.63 Theorem (Canonical form of a nilpotent matrix). Let B be a q x q nilpotent matrix. Using the previous notation, {e)} is a basis of C q. Consequently, if we write B with respect to this basis, we get a q X q matrix B' similar to B of the form
2.2 Eigenvectors and Similar Matrices
o o
o
~
B'=
71
(2.11)
o where each block B i has dimension k i and, if k i
0 1 0 ... 0 0 1 ... 0 0 0 ...
> 1, it has the form
0 0 0
(2.12)
1
0 0 0
... 0
The reduced matrix B' is called the canonical Jordan form of the nilpotent matrix B. Proof. The kernels Hj := ker Bj of Bj, j = 1, ... , k, form a strictly increasing sequence of subspaces {O} = Ho C HI C Hz C ... C Hk-l C H k := C q . The claim then follows by iteratively applying the following lemma.
o
2.64 Lemma. For j = 1,2, ... , k -1, let (el' e2,"" ep) be a basis of H j and let Xl, X2, , Xr be all possible solutions of BXj = ej, j = 1, ... ,po Then (el' e2, , ep,xI, X2,"" x r ) is a basis for H j + l . Proof. In fact, it is easily seen that el, ez, ... , e p , Xl, XZ,
o the vectors o {el)
e2, ... , ep,Xl, X2, ... , X r }
. .. , Xr
are linearly independent,
C Hj +1 ,
o the image of H j + l by B is contained in Hj. Thus r, which is the number of elements ei in the image of B, is the dimension of the image of Hj+l by B. The rank formula then yields dimHj+l
= dimHj + dim (1mB n Hj+l) = P + r. o
Now consider a generic matrix A E Mn,nUC), We first rewrite A using a basis of generalized eigenvectors to get a new matrix A' similar to A of the form
o A'=
5J o
o o
\Akl
(2.13)
72
2. Vector Spaces and Linear Maps
where each block Ai has the dimension of the algebraic multiplicity mi of the eigenvalue Ai and a unique eigenvalue Ai. Moreover, the matrix C i := Ai Id - Ai is nilpotent, and precisely C7'i = 0 and C7'l-l =I- O. Applying Theorem 2.63 to each C i , we then show that Ai is similar to Ai Id + B' where B' is as (2.11). Therefore, we conclude the following.
2.65 Theorem (Jordan's canonical form). Let AI, A2,"" Ak be all distinct eigenvalues of A E Mn,n(C). For every i = 1, ... , k
(i) let (Ui,l, ... , Ui,p,) be a basis of the eigenspace VAi (as we know, Pi S; ni), (ii) consider the generalized eigenvectors relative to Ai defined as follows: for any j = 1,2, ... ,Pi, a) set el,j := Ui,j, b) set e~j to be a solution of a= 2, ... ,
(2.14)
as long as the system (2.14) is solvable. c) denote by a(i,j) the number of solved systems plus 1. Then for every i = 1, ... , k the list (e~j) with j = 1, ... ,Pi and a = 1, ... , a(i,j) is a basis for the generalized eigenspace Wi relative to Ai. Hence the full list
(2.15)
i = 1.... ,k,j = 1, ... ,pi,a = 1, ... ,a(i,j)
is a basis of en. By the definition of the
{e~j}'
if we set
'- [1 2 1 2 1 2 ] S .e11,e11,· .. ,e12,e12, .. ·,e21,e21,'''' "
"
"
the matrix J := S-l AS, that represents x the form
-t
Ax in the basis (2.15), has \
J=
where i
J 1,1
0
0
0
0
0
IJ 1 I
0
0
0
0
0
B
0
0
0
0
0
0
Jk,Pk
,Pl
= 1, ... , k, j = 1, ... ,Pi,
Ji,j
has dimension a(i,j) and
2.2 Eigenvectors and Similar Matrices
Ai
Ji,j
=
if dim Ji,j
Ai
1
0
0
0
0
Ai
1
0
0
0
0
Ai
1
0
0
0
0
Ai
1
0
0
0
0
Ai
73
= 1,
otherwise.
A basis with the properties of the basis in (2.15) is called a Jordan basis, and the matrix J that represents A in a Jordan basis is called a canonical Jordan form of A. 2.66 Example. Find a canonical Jordan form of
A=
0 2 1 0 0
(i
0 0 2 1 0
~)
0 0 0 3 1
A is lower triangular hence the eigenvalues of A are 2 with multiplicity 3 and 3 with multiplictiy 2. We then have
A - 2Id
=
(1 ~ ; ;
J
A - 2Id has rank 4 since the columns of A of indices 1, 2, 3 and 5 are linearly independent. Therefore the eigenspace V2 has dimension 5 - 4 = 1 by the rank formula. We now compute a nonzero eigenvalue,
(A - 2Id) (;)
(.
u
~,)
x+t+u
(1) . 0
For instance, one eigenvector is 81 := (O,O,I,-I,I)T. We now compute the Jordan basis relative to this eigenvalue. We have eLl = 81 and it is possible to solve
:
( ) (~) z
+t
x+t+u
for instance,
82
:=
ei,l
-1
1
= (0,1,0, -1, 2)T is a solution. Hence we compute a solution of
74
2. Vector Spaces and Linear Maps
( : J (~J z +t x+t+u
-1 2
hence 83:= er,l = (1,0,0,-1,2)T. Looking now at the other eigenvalue, -1
A-31d~ ~
°
-1 1
(
° °
A is of rank 4 since the columns of indices 1, 2, 3 and 4 are linearly independent. Thus by the rank formula, the eigenspace relative to the eigenvalue 2 has dimension 1. We now compute an eigenvector with eigenvalue 2. We need to solve
and a nonzero solution is, for instance, 84 := (0,0,0,0, l)T. Finally, we compute Jordan's basis relative to this eigenvalue. A solution of
is given by
85
= e§,l = (0,0,0,1, O)T. Thus,
we conclude that the matrix
is nonsingular, since the columns are linearly independent, and by construction
2.2 Eigenvectors and Similar Matrices
75
e. Elementary divisors As we have seen, the characteristic polynomial det(s Id - A),
s E OC,
is invariant by similarity transformations. However, in general the equality of two characteristic polynomials does not imply that the two matrices be similar.
2.67 Example. The unique eigenvalue of the matrix AI-' =
(AOJL
0) is AO and has
AO
multiplicity 2. The corresponding eigenspace is given by the solutions of the system 0.X 1 +O.X 2 =0,
°. 1= °
{ JLX 1 +
x 2 = 0.
If JL 1= 0, then V"Q,I-' has dimension 1. Notice that Ao is diagonal, while AI-' is not diagonal. Moreover, Ao and AI-' with JL are not similar.
It would be interesting to find a complete set of invariants that characterizes the class of similarity of a matrix, without going explictly into Jordan's reduction algorithm. Here we mention a few results in this direction. Let A E Mn,n(C). The determinants of the minors of order k of the matrix sId - A form a subset D k of polynomials in the s variable. Denote by Dk(S) the g.c.d. of these polynomials whose coefficient of the maximal degree term is normalized to 1. Moreover set Do(s) := 1. Using Laplace's formula, one sees that D k - 1 (s) divides D k (s) for all k = 1, ... , n. The polynommts k
=
1, ... ,n,
are called the elementary divisors of A. They form a compl€te set of invariants that describe the complex similarity class of A. In fact, the following holds. 2.68 Theorem. The following claims are equivalent
(i) A and B are similar as complex matrices, (ii) A (,u%d B have the same Jordan's canonical form (up to permutations of rows and columns), (iii) A and B have the same elementary divisors.
76
2. Vector Spaces and Linear Maps
2.3 Exercises 2.69~. Write a few 3 x 3 real matrices and interpret them as linear maps from lR 3 into lR 3 . For each of these linear maps, choose a new basis of]R3 and write the associate matrix with respect to the new basis both in the source and the target lR3 . 2.70~. Let VI, V2, ... , Vn be finite-dimensional vector spaces, and let 10, II, linear maps such that
... ,In
be
Show that, ifIm (Ii) = ker(fi+1) Vi = 0, ... , n - 1, then I:?=1 (_I)i dim Vi = O. 2.71 ~. Consider lR as a vector space over iQi. Show that 1 and ~ are linearly independent if and only if ~ is irrational, ~ ~ iQi. Give reasons to support that lR as a vector space over iQi is not finite dimensional. 2. 72
~
I :X
-->
Lagrange multipliers. Let X, Y and Z be three vector spaces over IK and let Y, 9 : X --> Z be two linear maps. Show that ker 9 C ker I if and only if there exists a linear map f : Z --> Y such that 1:= fog. 2.73'. Show that the matrices
c ~), have the same eigenvalues but are not similar. 2.74'. Let .AI, .A2, ... , .An be the eigenvalues of A E Mn,n(C), possibly repeated with their multiplicities. Show that tr A = .AI + ... +.A n and det A = ;"1 . .A2 ... .An. 2. 75 ~. Show that p( s) = Sn the n x n matrix
+ an -1 Sn -1 + ... + ao 1
o
o
1
is the characteristic polynomial of
-a1
2.76~. Let A E Mk,k(IK), B E Mn,n(IK), C E Mk,n(IK). Compute the characteristic polynomial of the matrix
M:=(~ ~). 2.77'. Let f : cn --> cn be defined by f(ei) := ei+1 if i = 1, ... , n -1 and f(en) = e1, where e1, e2, ... , en is the canonical basis of en. Show that the associated matrix L is diagonizable and that the eigenvalues are all distinct. [Hint: Compute thJ characteristic polynomial.]
2.3 Exercises
77
2.78~. Let A E Mn,n(lR) and suppose A 2 = Id. Show that A is similar to \
o
-
Idn-k
for some k, 1::; k ::; n. [Hint: Consider the subspaces V+ := {x I Ax = x} and V_ := {x I Ax = -x} and show that V+ EEl V_ = lR n . J
2. 79~. Let A, BE Mn,n(lR) be two matrices such that A 2 = B 2 = Id and tr A = tr B. Show that A and B are similar. [Hint: Use Exercise 2.78.] 2.80~. Show that the diagonizable matrices span Mn,n(lR). [Hint: Consider the matrices Mij = diag (1, 2, ... , n) + Ei,j where Ei,j has value 1 at entry (i, j) and value zero otherwise.]
2.81 ~. Let A, B E Mn,n(lR) and let B be symmetric. Show that the polynomial -+ det(A + tB) has degree less than Rank B.
t
2.82~. Show that any linear operator A : lR n dimension 1 or 2.
-+
lR n has an invariant subspace of
2.83 ~ Fitting decomposition. Let f : X -+ X be a linear operator of a finitedimensional vector space and set fk := f 0 . . . 0 f k-times. Show that there exists k, 1 ::; k ::; n such that
(i) ker(fk) = ker(fk+ 1 ), (ii) 1m (fk) = 1m (fk+ 1 ), (iii) film (fk) : 1m (fk) -+ 1m (fk) is an isomorphism, (iv) f(ker fk) C ker(fk), (v) f, ker(fk) : ker(fk) -+ ker(fk) is nilpotent, (vi) V = ker(fk) EEl 1m (fk). 2.84~.
2.85
~.
A is nilpotent if and only if all its eigenvalues are zero.
Consider the linear operators in the linear space of polynomials A(P)(t) := pI (t),
B(P)(t) = tP(t).
Compute the operator AB - BA. 2.86~.
Let A, B be linear operators on lR n . Show that
(i) tr (AB) = tr (BA), (ii) AB - BA =J Id. 2.87~. Show that a linear operator C : lR 2 -+ lR 2 can be written as C = AB - BA where A, B : lR 2 -+ lR 2 are linear operators if and only if tr C = o.
78
2.88~.
2. Vector Spaces and Linear Maps
Show that the Jordan canonical form of the matrix
A{ with a~a5 ... a;:-l
a 21
a 31
a 0
a 32 a
0
0
a n2 a 3n
·"1 a
f. 0 is
[:
1 a 0
0 1 a
0
0
~)
3. Euclidean and Hermitian Spaces
3.1 The Geometry of Euclidean and Hermitian Spaces Until now we have introduced several different languages, linear independence, matrices and products, linear maps that are connected in several ways to linear systems and stated some results. The structure we used is essentially linearity. A new structure, the inner product, provides a richer framework that we shall illustrate in this chapter.
a. Euclidean spaces 3.1 Definition. Let X be a real vector space. An inner product on X is a map ( I ) : X x X ----- lR which is o (BILINEAR)
(x, y) ----- (xly) is linear in each factor, i.e., (Ax + fLylz) = A(xlz) (xlAy + fLz) = A(xly)
+ fL(ylz), + fL(xlz),
for all x,y,z E X, for all A,fL E R o (SYMMETRIC) (xly) = (ylx) for all x, y E o (POSITIVE DEFINITE) (xix) ~ 0 Vx and (xix)
x.
= 0 if and only if x = o.
The nonnegative real number
Ixl :=
J(xlx)
is called the norm of x EX.
A finite-dimensional vector space X with an inner product is called an Euclidean vector space, and the inner product of X is called the scalar product of X. 3.2 Example. The map ( I ) : ffi.n n
(xly) := x. y =
2:= xiyi, i=l
X
ffi.n
--->
ffi. defined by
80
3. Euclidean and Hermitian Spaces
is an inner product on JR n , called the standard scalar product of JR n , and JRn with this scalar product is an Euclidean space. In some sense, as we shall see later, see Proposition 3.25, it is the unique Euclidean space of dimension n. Other examples of inner products on jRn can be obtained by weighing the coordinates by nonnegative real numbers. Let A1, A2, . .. , An be positive real numbers. Then n
(xly)
:=
L
Aixiyi,
i=l
is an inner product on JRn. Other examples of inner products in infinite-dimensional vector spaces can be found in Chapter 10.
Let X be a vector space with an inner product. From the bilinearity of the inner product we deduce that
Ix + yl2 = (x + ylx + y) = (xix + y) + (ylx + y) = (xix) + 2(xly) + (yly) = Ixl 2 + 2(xly) + !y12
(3.1)
from which we infer the following.
3.3 Theorem. The following hold.
(i)
(PARALLELOGRAM IDENTITY)
Ix + Yl2 + Ix (ii) (POLARITY FORMULA)
We have
yl2 = 2 (lxl 2 + lyl2) We have
~ (Ix + yj2 -Ix _
(xly) =
Vx,y E X.
y12)
Vx,y E X,
hence we can get the scalar product of x and y by computing two norms. (iii) (CAUCHY-SCHWARZ INEQUALITY) The following inequality holds l(xly)1 :::;
Ixllyl,
Vx,y E Xj
moreover, (xly) = Ixllyl if and only if either y = 0 or x = >..y for some>" E JR, >.. ;:::: o. Proof. (i), (ii) follow trivially from (3.1). Let us prove (iii). If y = 0, the claim is trivial. If y =f 0, the function t -> Ix + tyl2, t E JR, is a second order nonnegative polynomial since
o~
Ix + tyl2 = (x
+ tylx + tV)
= (x
+ tylx) + (x + tylty)
= Ixl
2 + 2(xly) t + !y12 t 2;
hence its discriminant is nonpositive, thus ((xly))2 -lx[2Iy[2 ~ O. If (xly) = Ixllyl, then the discriminant of t -> Ix + tyl2 vanishes. If y =f 0, then for some t E JR we have Ix + ty[2 = 0, i.e., x = -tV. Finally, -t is nonnegative since
-t(yly)
= (xly) = Ixllyl
;::: o.
0
3.4 Definition. Let X be a vector space with an inner product. Two vectors x, y E X are said to be orthogonal, and we write x -.l y, if (xly) = o.
3.1 The Geometry of Euclidean and Hermitian Spaces
81
From (3.1) we immediately infer the following. 3.5 Proposition (Pythagorean theorem). Let X be a vector space with an inner product. Then two vectors x, y E X are orthogonal if and only if 3.6 Carnot's formula. Let x, y E 1R 2 be two nonzero vectors of 1R2, that we think of as the plane of Euclidean geometry with an orthogonal Cartesian reference. Setting x := (a, b), y := (c, d), and denoting by 0 the angle between Ox and Oy, it is easy to see that lxi, Iyl are the lengths of the two segments Ox and Oy, and that x.y := ac + bd = Ixllyl cosO. Thus (3.1) reads as Carnot's formula
Ix + yl2 = Ixl2
+ lyl2 + 21xllyl
cos o.
E IR n ,
In general, given two vectors x, y we have by Cauchy-Schwarz inequality Ix. y I :::::: Ixllyl, hence there exists a 0 E IR such that x.y Ixllyl =: cos O.
ois called the angle between x and y and denoted by xy. In this way (3.1) rewrites as Carnot's formula Ix + yl2 = Ixl 2 + lyl2 + 21xllyl cosO. Notice that the angle 0 is defined up to the sign, since cos 0 is an even function. 3.7 Proposition. Let X be a Euclidean vector space and let ( inner product. The norm of x EX,
I ) be
its
Ixl := J(xlx) is a function
I I:X
-+
IR with the following properties
(i) Ixl E IR+ Vx E X. (ii)
(iii) (iv)
(NONDEGENERACY) Ixl = 0 if and only if x = O. (I-HOMOGENEITY) = IAllxl VA E IR, Vx E X. (TRIANGULAR INEQUALITY) Ix yl :::::: Ixl Iyl Vx, y
I>.xl
+
+
EX.
Proof. (i), (ii), (iii) are trivial. (iv) follows from the Cauchy-Schwarz inequality since Ix + yl2 = Ixl 2 + lyl2 + 2(ylx) ::; [x[2 + ly[2 + 21(ylx)[
::; Ixl 2 + [Y12
+ 2 [x[ Iyl =
([x[
+ lyI)2). o
Finally, we call the distance between x and y E X the number d(x, y) := Ix - yl. It is trivial to check, using Proposition 3.7, that the distance function d : X x X -+ IR defined by d(x, y) := Ix - yl, has the following properties
82
3. Euclidean and Hermitian Spaces
(i) (NONDEGENERACY) d(x, y) ~ 0 Vx, Y E X and d(x, y) = 0 if and only if x = y. (ii) (SYMMETRY) d(x, y) = d(y, x) Vx, Y E X. (iii) (TRIANGULAR INEQUALITY) d(x,y)::::; d(x,z) +d(z,y) Vx,y,z E X. We refer to d as the distance in X induced by the inner product. 3.8 Inner products in coordinates. Let X be a Euclidean space, denote by ( I ) its inner product, and let (el' e2,"" en) be a basis of X. If x = L~=l xiei, y = L~=l yi ei E X, then by linearity
(xly) = Lxiyj(eil ej ). i,j
The matrix
G = [gij],
gij = (eilej)
is called the Gram matrix of the scalar product in the basis (el, e3,.·., en). Introducing the coordinate column vectors x = (xl, x 2, ... , x n )T and y = (yl, y2, ... , yn)T E IR.n and denoting by . e' the standard scalar product in IR.n, we have (xly) = xeGy = xTGy rows by columns. We notice that (i) G is symmetric, GT = G, since the scalar product is symmetric, (ii) G is positive definite, i.e., xTGx ~ 0 Vx E IR.n and xTGx = 0 if and only if x = 0, in particular, G is invertible. b. Hermitian spaces A similar structure exists on complex vector spaces. 3.9 Definition. Let X be a vector space over
C which is (i) (SESQUILINEAR), i.e.,
+ ,8wlz) = a(vlz) + ,8(wlz), (vlaw + ,8z) = a(vlw) + j3(vlz)
(av
Vv, w, z EX, Va,,8 E
Izi := ~
is called the norm of z E X.
3.10 Definition. A finite-dimensional complex space with a Hermitian product is called a Hermitian space.
3.1 The Geometry of Euclidean and Hermitian Spaces
83
3.11 Example. Of course the product (z, w) -> (zlw) := wz is a Hermitian product on e. More generally, the map ( I ) : x -> e defined by
en en
n
(zlw):= z.w
:=
Lzjw j j=l
en,
en.
is a Hermitian product on called the standard Hermitian product of As we shall see later, see Proposition 3.25, equipped with the standard Hermitian product is in a sense the only Hermitian space of dimension n.
en
Let X be a complex vector space with a Hermitian product ( the properties of the Hermitian product we deduce
1 ).
From
Iz + wl 2 = (z + wlz + w) = (zlz + w) + (wlz + w) 2 2 = (zlz) + (zlw) + (wlz) + (wlw) = Izl + Iwl + 2R(zlw)
(3.2)
from which we infer at once the following.
3.12 Theorem.
(i) We have
R(zlw) = ~ (Iz + wl 2 -lzl 2 -l wI2 )
Vz,w E X.
(ii) (PARALLELOGRAM IDENTITY) We have
Iz + wl 2 + Iz - wl 2
=
2(lzl 2 + Iw1 2 )
Vz,w E X.
(iii) (POLARITY FORMULA) We have 4(zlw)
= (Iz + wl 2 -Iz - w1 2 ) + i(lz + iwl 2 -Iz -
iWI2),
for all z,w E X. We therefore can compute the Hermitian product of z and w by computing four norms. (iv) (CAUCHY-SCHWARZ INEQUALITY) The following inequality holds Vz,w E X; l(zjw)1 ::; Izllwl, Izllwl if and only if either w =
moreover (zlw) = some A E JR., A 2: o.
0, or
z=
AW for
Proof. (i), (ii), (iii) follow trivially from (3.2). Let us prove (iv). Let z, w E X and te i6 , t, (J E R. From (3.2)
oX =
"It ER,
hence its discriminant is nonpositive, thus 1~(e-i6(zlw))1 :::;
Izllwl. Since (J is arbitrary, we conclude l(zlw)1 :::; Izllwl. The second part of the claim then follows as in the real case. If (zlw) = Izllwl, then the discriminant ofthe real polynomial t -> Iz + twl 2 , t E R, vanishes. If w =1= 0, for some t E R we have Iz + twl 2 = 0, i.e., z = -two Finally, -t is nonnegative since -t(wlw) = (zlw) = Izllwl ~ O. 0
84
3. Euclidean and Hermitian Spaces
3.13 4V. Let X be a complex vector space with a Hermitian product and let z, w E X. Show that l(zlw)1 = Izllwl if and only if either w = 0 or there exists A E C such that z = AW.
3.14 Definition. Let X be a complex vector space with a Hermitian product ( I ). Two vectors z, w E X are said to be orthogonal, and we write z J.. w, if (zlw) = o. From (3.2) we immediately infer the following. 3.15 Proposition (Pythagorean theorem). Let X be a complex vector space with a Hermitian product ( I ). If z, wE X are orthogonal, then
We see here a difference between the real and the complex cases. Contrary to the real case, two complex vectors, such that Iz+wl 2 = Iz1 2 + Iwl 2 holds, need not be orthogonal. For instance, choose X := C, (zlw) := wz, and let z = 1 and w = i. 3.16 Proposition. Let X be a complex vector space with a Hermitian product on it. The norm of z EX,
Izl:=~, is a real-valued function
(i) Izl E lR+ Vz E X. (ii) (NONDEGENERACY) (iii) (iv)
I I:X
-->
lR with the following properties
Izi = 0 if and only if z = O. IAzl IAlizl VA E C, Vz E X. Iz + wi :::; Izl + Iwl Vz, wE X.
(I-HOMOGENEITY) = (TRIANGULAR INEQUALITY)
Proof. (i), (ii), (iii) are trivial. (iv) follows from the Cauchy-Schwarz inequality since
Iz + wl 2
=
Izl 2+ Iwl 2+ 2~(zlw) ::::: Izl 2+ Iwl 2+ 21(zlw)1 ::::: Izl2 + Iwl 2+ 21zllwl = (izi + Iwlf)· o
Finally, we call distance between two points z, w of X the real number d(z, w) := Iz - wi. It is trivial to check, using Proposition 3.16, that the distance function d : X x X --> lR defined by d(z, w) := Iz - wi has the following properties (i) (NONDEGENERACY) d(z, w) ~ 0 Vz, w E X and d(z, w) = 0 if and only if z = w. (ii) (SYMMETRY) d(z, w) = d(w, z) Vz, wE X. (iii) (TRIANGULAR INEQUALITY) d(z,w):::; d(z,x)+d(x,w) Vw,x,z E X. We refer to d as to the distance on X induced by the Hermitian product.
3.1 The Geometry of Euclidean and Hermitian Spaces
85
3.17 Hermitian products in coordinates. If X is a Hermitian space, the Gram matrix associated to the Hermitian product is defined by setting
Using linearity n
(zlw) =
L
(ei[ej)ziwj = zTGw
i,j=I
I W2 , ... , Wn) E trn Z, Z 2 , ... , Z n) , W - (W, II..are th e coord'Inat e I'f Z -- (I vector columns of z and w in the basis (eI, e2, ... , en). Notice that T
(i) G is a Hermitian matrix, G = G, (ii) G is positive definite, i.e., zT Gz 2: 0 Vz E en and ZT Gz = 0 if and only if z = 0, in particular, G is invertible.
c. Orthonormal basis and the Gram-Schmidt algorithm 3.18 Definition. Let X be a Euclidean space with scalar product ( I ) or a Hermitian vector space with Hermitian product ( [ ). A system of vectors {ea } aEA C X is called orthonormal if Va, f3 E A. Orthonormal vectors are linearly independent. In particular, n orthonormal vectors in a Euclidean or Hermitian vector space of dimension n form a basis, called an orthonormal basis. 3.19 Example. The canonical basis (el, e2, ... , en) ofjRn is an orthonormal basis for the standard inner product in jRn. Similarly, the canonical basis (el, e2, ... , en) of en is an orthonormal basis for the standard Hermitian product in
en.
3.20'. Let ( [ ) be an inner (Hermitian) product on a Euclidean (Hermitian) space X of dimension n and let G be the associated Gram matrix in a basis (el, e2, ... , en). Show that G = Id n if and only if (er, e2, . .. , en) is orthonormal.
Starting from a denumerable system of linearly independent vectors, we can construct a new denumerable system of orthonormal vectors that span the same subspaces by means of the Gram-Schmidt algorithm. 3.21 Theorem (Gram-Schmidt). Let X be a real (complex) vector space with inner (Hermitian) product ( I ). Let VI, V2, ... , Vk, ... be a denumerable set of linearly independent vectors in X. Then there exist a set of orthonormal vectors WI, W2, ... , Wk, ... such that for each k = 1,2, ... Span { WI, W2, ... , Wk}
= Span { VI, V2,···, vk}.
86
3. Euclidean and Hermitian Spaces
Proof. We proceed by induction. In fact, the algorithm W~ = VI, W'
1
W'1·I
_
Wp -
Iw~I' Vp -
,<,P-I(
LJj=I
I .) .
V p W) W)'
W' W·p'-
never stops since
w~
#
P
Iw~I'
0 Vp = 1,2,3, ... and produces the claimed orthonormal basis.
o
3.22 Proposition (Pythagorean theorem). Let X be a real (complex) vector space with inner (Hermitian) product ( I ). Let (el' e2, ... , ek) be an orthonormal basis of X. Then k
= 2)xlej)ej
X
XEX,
i=l
that is the i th coordinate of x in the basis (e 1, e2, ... , en) is the cosine director (xlei) of x with respect to ei' Therefore we compute k
(xjy)
= ~)xlei) (ylei)
if X is Euclidean,
i=l
k
(xly)
= ~)xlei) (ylei)
if X is Hermitian,
i=l
so that in both cases Pythagoras's theorem holds: k
Ixl = (xix) = L l(xlei)1 2 . 2
i=l
Proof. In fact, by linearity, for j
(xlej)
= 1, ... , k
and x
= 2:r=I Xiei
we have
= Cf::xiei Iej) = I=xi(eilej) = I=xirSij = xj. i=l
i=l
i=l
Similarly, using linearity and assuming X is Hermitian, we have
(xly)
I
= (I=xiei (I=yjej) = i=l
j=l
n
=
L
I=
xiyj(eilej)
i,j=l
k
xiyjrSij = 2:>iif, i=l
i,j=l
hence, by the first part, n
(xly) = L(xlei) (ylei)' i=l
o
3.1 The Geometry of Euclidean and Hermitian Spaces
87
d. Isometries 3.23 Definition. Let X, Y be two real (complex) vector spaces with inner (Hermitian) products ( I )x and ( I )y. We say that a linear map A : X -> Y is an isometry if and only if IA(x)ly =
Ixlx
"Ix E X,
or, equivalently, compare the polar formula, if (A(x)IA(y))y
= (xly)X
"Ix,y E X.
Isometries are trivially injective, but not surjective. If there exists a surjective isometry between two Euclidean (Hermitian) spaces, then X and Yare said to be isometric. 3.24'. Let X, Y be two real (complex) vector spaces with inner (Hermitian) products ( I )x and ( I )y and let A : X ---> Y be a linear map. Show that the following claims are equivalent
(i) A is an isometry, (ii) B c X is an orthonormal basis if and only if A(B) is an orthonormal basis for A(X).
Let X be a real vector space with inner product ( I ) or a complex vector space with Hermitian product ( I ). Let (el' ez, ... , en) be a basis in X and £ : X -> OC n , (OC = lR. of OC = q be the corresponding system of coordinates. Proposition 3.22 implies that the following claims are equivalent.
(i) (e 1, ez, ... , en) is an orthonormal basis, (ii) £(x) = ((xlel),"" (xle n )), (iii) £ is an isometry between X and the Euclidean space lR.n with the standard scalar product (or en with the standard Hermitian product). In this way, the Gram-Schmidt algorithm yields the following.
3.25 Proposition. Let X be a real vector space with inner product ( I ) (or a complex vector space with Hermitian product ( I )) of dimension n. Then X is isometric to lR.n with the standard scalar product (respectively, to en with the standard Hermitian product), the isometry being the coordinate system associated to an orthonormal basis. In other words, using an orthonormal basis on X is the same as identifying X with lR.n (or with en) with the canonical inner (Hermitian) product. 3.26 Isometries in coordinates. Let us compute the matrix associated to an isometry R : X ---> Y between two Euclidean spaces of dimension nand m respectively, in an orthonormal basis (so that X and Yare respectively isometric to IR n (C n ) and IR m (C m ) by means of the associated coordinate system) . It is therefore sufficient to discuss real isometries R : IR n ---> IR m and complex isometries R : en ---> em. Let R : IR n ---> jRm be linear and let R E M m ,n(lR) be the associated matrix, R(x) = Rx, x E IR n . Denoting by (el, e2, ... , en) the canonical basis of IR n ,
88
3. Euclidean and Hermitian Spaces
ri = Rei Vi. Since (el, e2, ... , en) is orthonormal, R is an isometry if and only if are orthonormal. In particular, m 2 nand
(rr,
r2, ... , r n )
i.e., the matrix R is an orthogonal matrix,
When m = n, the isometries R : lR n --> lR n are necessarily surjective being injective, and form a group under composition. As above, we deduce that the group of isometries of lR n is isomorphic to the orthogonal group O(n) defined by
I
O(n) := {R E Mn,n(lR) RTR = Id n
}.
Observe that a square orthogonal matrix R is invertible with R- 1 = RT. If follows that RRT = Id and Idet RI = 1. Similarly, consider en as a Hermitian space with the standard Hermitian product. Let R : en --> em be linear and let R E Mm,n(C) be such that R(z) = Rz. Denoting by (el, e2, ... , en) the canonical basis of lR n ,
R
= [rr Ir21
I
... r n ]
ri = Rei Vi = 1, ... , m.
,
Since (er, e2, ... , en) is orthonormal, R is an isometry if and only if rr, r2, ... , r n are orthonormal. In particular, m 2 nand
i.e., the matrix R is a unitary matrix,
When m = n, the isometries R : en --> en are necessarily surjective being injective, moreover they form a group under composition. From the above, we deduce that the group of isometries of en is isomorphic to the unitary group U (n) defined by
Observe that a square unitary matrix R is invertible with R
RR
T
-1
= RT .
It follows that
= Id and I det RI = 1.
e. The projection theorem Let X be a real (complex) vector space with inner (Hermitian) product ( [ ) that is not necessarily finite dimensional, let V c X be a finitedimensional linear subspace of X of dimension k and let (e1, e2,"" ek) be an orthonormal basis of V. We say that x E X is orthogonal to V if (xlv) = 0 \::Iv E V. As (e1, e2, ... , ek) is a basis of V, x..l V if and only if (xlei) = o\::Ii = 1, ... ,k. For all x EX, the vector k
Pv(x)
:=
I)xlei)ei E V i=l
3.1 The Geometry of Euclidean and Hermitian Spaces
89
is called the orthogonal projection of x in V, and the map Pv : X ~ V, x ~ Pv(x), the projection map onto V. By Proposition 3.22, Pv(x) = x if x E V, hence ImP = V and p 2 = P. By Proposition 3.22 we also have IPv(x)1 2 = L:7=II(xleiW, The next theorem explains the name for Pv(x) and shows that in fact Pv(x) is well defined as it does not depend on the chosen basis (el' e2,···, ek)'
3.27 Theorem (of orthogonal projection). With the previous notation, there exists a unique z E V such that x - z is orthogonal to V, i.e., (x - zjv) = 0 Vv E V. Moreover, the following claims are equivalent.
(i) x - z is orthogonal to V, i.e., (x - zlv) = 0 Vv E V, (ii) z E V is the orthogonal projection of x onto V, z = Pv(x), (iii) z is the point in V of minimum distance from x, i.e.,
Ix - zl < Ix - vi
Vv E V, v =I- z.
In particular, Pv(x) is well defined as it does not depend on the chosen orthonormal basis and there is a unique minimizer of the function v ~ Ix - vi, v E V, the vector z = Pv(x). Proof. We first prove uniqueness. If Zl, Z2 E V are such that (x - zilv) then (Zl - Z2[V) = 0 Vv E V, in particular IZl - Z2!2 = o. (i)
= 0 for i = 1,2,
'* (ii). From (i) we have (xlei) = (Zlei) Vi = 1, ... ,k. By Proposition 3.22 k
k
Z = I)zlei)ei
= 2)x!ei)ei = Pv(x). i=l
i=l
This also shows existence of a point Z such that x - Z is orthogonal to V and that the definition of Pv(x) is independent of the chosen orthonormal basis (el, e2, ... , ek).
'* (i). If Z = Pv(x), we have for every j
(ii)
=
1, ... , k
k
(x - z[ej)
= (xlej) -
2)xlei)(ei[ej)
= (xlej)
- (x[ej)
= 0,
i=l
hence (x - zlv) = 0 Vv. (i)
'* (iii). Let v E V. Since (x Ix -
vl 2 =
Ix -
z[v) = 0 we have z
+z -
v[2 =
Ix -
zl2
+ Iz -
v1 2 ,
hence (iii). (iii) t =
'* (i). Let v E V. The function t
o.
Since
Ix -
z
+ tvl 2 = Ix -
->
Ix -
zl2
z
+ tv[2,
+ 2t~(x -
t E JR, has a minimum point at
zlv)
+ t 2 1v1 2 ,
necessarily ~(x - z[v) = O. If X is a real vector space, this means (x - zlv) = 0, hence (i). If X is a complex vector space, from ~(x - z[v) = 0 Vv E V, we also have ~(e-i(i(x - z[v)) = 0 '<;j() E JR Vv E V, hence (x - z[v) = 0 Vv E V and thus (ii). 0
We can discuss linear independence in terms of an orthogonal projection. In fact, for any finite-dimensional space V eX, x E V if and only if x - Pv(x) = 0, equivalently, the equation x - Pv(x) = 0 is an implicit equation that characterizes V as the kernel of Id - Pv.
90
3. Euclidean and Hermitian Spaces
3.28'. Let W = Span {VI, V2, ... , vd be a subspace of ]Kn. Describe a procedure that uses the orthogonal projection theorem to find a basis of W. 3.29'. Given A E Mm,n(lR), describe a procedure that uses the orthogonal projection theorem in order to select a maximal system of independent rows and columns of A. 3.30'. Let A E Mm,n(lR). Describe a procedure to find a basis of ker A. 3.31 ,. Given k linear independent vectors, choose among the vectors (el, e2, ... , en) of lR n (n - k) vectors that together with v!, V2, ... , Vk form a basis of lR n . 3.32 Projections in coordinates. Let X be a Euclidean (Hermitian) space of dimension n and let V C X be a subspace of dimension k. Let us compute the matrix associated to the orthogonal projection operator P v : X -> X in an orthonormal basis. Of course, it suffices to think of Pv as of the orthogonal projection on a subspace of lR n (en in the complex case). Let (el, e2, ... , en) be the canonical basis of lR n and V C lR n . Let VI, V2,.··, Vk be an orthonormal basis of V and denote by V = [vj J the n x k nonsingular matrix
so that $v_j = \sum_{i=1}^n v_j^i\, e_i$. Let $\mathbf{P}$ be the $n \times n$ matrix associated to the orthogonal projection onto $V$, $P_V(x) = \mathbf{P}\mathbf{x}$, or
$$\mathbf{P} = \Big[\mathbf{P}^1\ \Big|\ \mathbf{P}^2\ \Big|\ \cdots\ \Big|\ \mathbf{P}^n\Big], \qquad \mathbf{P}^i = \mathbf{P}\mathbf{e}_i, \quad i = 1, \dots, n.$$
Then
$$\mathbf{P}^i = P_V(e_i) = \sum_{j=1}^k (e_i \cdot v_j)\, v_j = \sum_{j=1}^k v_j^i\, v_j,$$
i.e.,
$$\mathbf{P} = \mathbf{V}\mathbf{V}^T. \tag{3.3}$$
The complex case is similar. With the same notation, instead of (3.3) we have
$$\mathbf{P}^i = P_V(e_i) = \sum_{j=1}^k (e_i|v_j)\, v_j = \sum_{j=1}^k \overline{v_j^i}\, v_j, \tag{3.4}$$
i.e.,
$$\mathbf{P} = \mathbf{V}\overline{\mathbf{V}}^T.$$
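A small NumPy check of (3.3), a sketch with an arbitrary subspace rather than anything taken from the book: build $\mathbf{P} = \mathbf{V}\mathbf{V}^T$ from an orthonormal basis of $V$ and verify the characteristic properties of an orthogonal projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
V, _ = np.linalg.qr(rng.standard_normal((n, k)))  # n x k matrix with orthonormal columns
P = V @ V.T                                       # formula (3.3)

assert np.allclose(P @ P, P)       # P^2 = P
assert np.allclose(P, P.T)         # P is self-adjoint
assert np.allclose(P @ V, V)       # P acts as the identity on V
```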
f. Orthogonal subspaces
Let $X$ be a real vector space with inner product $(\,|\,)$ or a complex vector space with Hermitian product $(\,|\,)$. Suppose $X$ is finite dimensional and let $W \subset X$ be a linear subspace of $X$. The subset
$$W^\perp := \{x \in X \mid (x|y) = 0\ \forall y \in W\}$$
is called the orthogonal of $W$ in $X$.
3.33 Proposition. We have
(i) $W^\perp$ is a linear subspace of $X$,
(ii) $W \cap W^\perp = \{0\}$,
(iii) $(W^\perp)^\perp = W$,
(iv) $W$ and $W^\perp$ are supplementary, hence $\dim W + \dim W^\perp = n$,
(v) if $P_W$ and $P_{W^\perp}$ are respectively the orthogonal projections onto $W$ and $W^\perp$, seen as linear maps from $X$ into itself, then $P_{W^\perp} = \operatorname{Id}_X - P_W$.

Proof. We prove (iv) and leave the rest to the reader. Let $(v_1, v_2, \dots, v_k)$ be a basis of $W$. Then we can complete $(v_1, v_2, \dots, v_k)$ with $n - k$ vectors of the canonical basis to get a new basis of $X$. The Gram-Schmidt procedure then yields an orthonormal basis $(w_1, w_2, \dots, w_n)$ of $X$ such that $W = \operatorname{Span}\{w_1, w_2, \dots, w_k\}$. On the other hand $w_{k+1}, \dots, w_n \in W^\perp$, hence $\dim W^\perp = n - k$. $\square$
g. Riesz's theorem

3.34 Theorem (Riesz). Let $X$ be a Euclidean or Hermitian space of dimension $n$. For any $L \in X^*$ there is a unique $x_L \in X$ such that
$$L(x) = (x|x_L) \qquad \forall x \in X. \tag{3.5}$$
Proof. Assume, for instance, that $X$ is Hermitian. Suppose $L \neq 0$, otherwise we choose $x_L = 0$, and observe that $\dim \operatorname{Im} L = 1$ and that $V := \ker L$ has dimension $n - 1$ if $\dim X = n$. Fix $x_0 \in V^\perp$ with $|x_0| = 1$; then every $x \in X$ decomposes as
$$x = x' + \lambda x_0, \qquad x' \in \ker L,\ \lambda = (x|x_0).$$
Consequently,
$$L(x) = L(x') + \lambda L(x_0) = (x|x_0)\,L(x_0) = \big(x\,\big|\,\overline{L(x_0)}\,x_0\big),$$
and the claim follows choosing $x_L := \overline{L(x_0)}\,x_0$. $\square$
The map $\beta : X^* \to X$, $L \mapsto x_L$, defined by the Riesz theorem is called the Riesz map. Notice that $\beta$ is linear if $X$ is Euclidean and antilinear if $X$ is Hermitian.

3.35 The Riesz map in coordinates. Let $X$ be a Euclidean (Hermitian) space with inner (Hermitian) product $(\,|\,)$, fix a basis and denote by $\mathbf{x} = (x^1, x^2, \dots, x^n)$ the coordinates of $x$, and by $G$ the Gram matrix of the inner (Hermitian) product. Let $L \in X^*$ and let $\mathbf{L}$ be the associated matrix, $L(x) = \mathbf{L}\mathbf{x}$. From (3.5)
$$\mathbf{L}\mathbf{x} = L(x) = (x|x_L) = \begin{cases} \mathbf{x}^T G\,\mathbf{x}_L & \text{if } X \text{ is Euclidean},\\ \mathbf{x}^T G\,\overline{\mathbf{x}_L} & \text{if } X \text{ is Hermitian},\end{cases}$$
i.e.,
$$G\mathbf{x}_L = \mathbf{L}^T \ \text{ or }\ \mathbf{x}_L = G^{-1}\mathbf{L}^T \quad \text{if } X \text{ is Euclidean}, \qquad G\overline{\mathbf{x}_L} = \mathbf{L}^T \ \text{ or }\ \mathbf{x}_L = \overline{G^{-1}\mathbf{L}^T} \quad \text{if } X \text{ is Hermitian}.$$
In particular, if the chosen basis $(e_1, e_2, \dots, e_n)$ is orthonormal, then $G = \operatorname{Id}$ and
$$\mathbf{x}_L = \mathbf{L}^T \quad \text{if } X \text{ is Euclidean}, \qquad \mathbf{x}_L = \overline{\mathbf{L}}^T \quad \text{if } X \text{ is Hermitian}.$$
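A minimal numerical sketch of 3.35 in the Euclidean case (the Gram matrix and the linear form below are arbitrary data chosen here, not examples from the book): the Riesz representative is obtained by solving $G\mathbf{x}_L = \mathbf{L}^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
G = M.T @ M + n * np.eye(n)        # a symmetric positive definite Gram matrix
L = rng.standard_normal(n)         # coordinates of a linear form, L(x) = L @ x

xL = np.linalg.solve(G, L)         # x_L = G^{-1} L^T

x = rng.standard_normal(n)
assert np.isclose(L @ x, x @ G @ xL)   # L(x) = (x | x_L)
```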
Figure 3.1. Dynamometer.
3.36 Example (Work and forces). Suppose a mass $m$ is fixed to a dynamometer. If $\theta$ is the inclination of the dynamometer, the dynamometer shows the number
$$L = mg\cos\theta, \tag{3.6}$$
where $g$ is a suitable constant. Notice that we need no coordinates in $\mathbb{R}^3$ to read the dynamometer. We may model the reading of the dynamometer as a map of the direction $v$ of the dynamometer, that is, as a map $L : S^2 \to \mathbb{R}$ from the unit sphere $S^2 = \{x \in \mathbb{R}^3 \mid |x| = 1\}$ of the ambient space $V$ into $\mathbb{R}$. Moreover, extending $L$ homogeneously to the entire space $V$ by setting $L(v) := |v|\,L(v/|v|)$, $v \in \mathbb{R}^3 \setminus \{0\}$, we see that such an extension is linear because of the simple dependence of $L$ on the inclination. Thus we can model the elementary work done on the mass $m$, i.e., the measurements made with the dynamometer, by a linear map $L : V \to \mathbb{R}$. Thinking of the ambient space $V$ as Euclidean, by Riesz's theorem we can represent $L$ as a scalar product, introducing a vector $F := x_L \in V$ such that
$$(v|F) = L(v) \qquad \forall v \in V.$$
We interpret such a vector as the force whose action on the mass produces the elementary work $L(v)$. Now fix a basis $(e_1, e_2, e_3)$ of $V$. If $\mathbf{F} = (F^1, F^2, F^3)^T$ is the column vector of the force coordinates and $\mathbf{L} = (L_1, L_2, L_3)$ is the $1 \times 3$ matrix of the coordinates of $L$ in the dual basis, that is, the three readings $L_i = L(e_i)$, $i = 1, 2, 3$, of the dynamometer in the directions $e_1, e_2, e_3$, then, as we have seen, $\mathbf{F} = G^{-1}\mathbf{L}^T$, $G$ being the Gram matrix of the scalar product. In particular, if $(e_1, e_2, e_3)$ is an orthonormal basis, $\mathbf{F} = \mathbf{L}^T$, i.e., $F^i = L_i$.
h. The adjoint operator
Let $X$, $Y$ be two vector spaces, both over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$, with inner (Hermitian) products $(\,|\,)_X$ and $(\,|\,)_Y$, and let $A : X \to Y$ be a linear map. For any $y \in Y$ the map $x \mapsto (A(x)|y)_Y$
defines a linear map on $X$, hence by Riesz's theorem there is a unique $A^*(y) \in X$ such that
$$(A(x)|y)_Y = (x|A^*(y))_X \qquad \forall x \in X,\ \forall y \in Y. \tag{3.7}$$
It is easily seen that the map $y \mapsto A^*(y)$ from $Y$ into $X$ defined by (3.7) is linear; it is called the adjoint of $A$. Moreover,
(i) let $A, B : X \to Y$ be two linear maps between two Euclidean or Hermitian spaces; then $(A + B)^* = A^* + B^*$,
(ii) $(\lambda A)^* = \lambda A^*$ if $\lambda \in \mathbb{R}$ and $A : X \to Y$ is a linear map between two Euclidean spaces,
(iii) $(\lambda A)^* = \overline{\lambda}\,A^*$ if $\lambda \in \mathbb{C}$ and $A : X \to Y$ is a linear map between two Hermitian spaces,
(iv) $(B \circ A)^* = A^* \circ B^*$ if $A : X \to Y$ and $B : Y \to Z$ are linear maps between Euclidean (Hermitian) spaces,
(v) $(A^*)^* = A$ if $A : X \to Y$ is a linear map.

3.37 ¶. Let $X$, $Y$ be vector spaces. We have already defined an adjoint $A : Y^* \to X^*$ with no use of inner or Hermitian products,
$$\langle A(y^*), x\rangle = \langle y^*, A(x)\rangle \qquad \forall x \in X,\ \forall y^* \in Y^*.$$
If $X$ and $Y$ are Euclidean (Hermitian) spaces, denote by $\beta_X : X^* \to X$, $\beta_Y : Y^* \to Y$ the Riesz isomorphisms and by $A^*$ the adjoint of $A$ defined by (3.7). Show that $A^* = \beta_X \circ A \circ \beta_Y^{-1}$.

3.38 The adjoint operator in coordinates. Let $X$, $Y$ be two Euclidean (Hermitian) spaces with inner (Hermitian) products $(\,|\,)_X$ and $(\,|\,)_Y$. Fix two bases in $X$ and $Y$, and denote the Gram matrices of the inner (Hermitian) products on $X$ and $Y$ respectively by $G$ and $H$. Denote by $\mathbf{x}$ the coordinates of a vector $x$. Let $A : X \to Y$ be a linear map, $A^*$ its adjoint, and let $\mathbf{A}$, $\mathbf{A}^*$ be respectively the associated matrices. Then we have
$$(A(x)|y)_Y = \mathbf{x}^T\mathbf{A}^T H\,\mathbf{y}, \qquad (x|A^*(y))_X = \mathbf{x}^T G\,\mathbf{A}^*\mathbf{y}$$
if $X$ and $Y$ are Euclidean, and
$$(A(x)|y)_Y = \mathbf{x}^T\mathbf{A}^T H\,\overline{\mathbf{y}}, \qquad (x|A^*(y))_X = \mathbf{x}^T G\,\overline{\mathbf{A}^*\mathbf{y}}$$
if $X$ and $Y$ are Hermitian. Therefore
$$G\mathbf{A}^* = \mathbf{A}^T H \quad \text{if } X \text{ and } Y \text{ are Euclidean}, \qquad G\overline{\mathbf{A}^*} = \mathbf{A}^T H \quad \text{if } X \text{ and } Y \text{ are Hermitian},$$
or, recalling that $G^T = G$, $(G^{-1})^T = G^{-1}$, $H^T = H$ if $X$ and $Y$ are Euclidean and that $\overline{G}^T = G$, $\overline{(G^{-1})}^T = G^{-1}$, $\overline{H}^T = H$ if $X$ and $Y$ are Hermitian, we find
$$\mathbf{A}^* = G^{-1}\mathbf{A}^T H \quad \text{if } X \text{ and } Y \text{ are Euclidean}, \qquad \mathbf{A}^* = \overline{G^{-1}\mathbf{A}^T H} \quad \text{if } X \text{ and } Y \text{ are Hermitian}. \tag{3.8}$$
In particular, $\mathbf{A}^* = \mathbf{A}^T$ in the Euclidean case, and $\mathbf{A}^* = \overline{\mathbf{A}}^T$ in the Hermitian case, if and only if the chosen bases in $X$ and $Y$ are orthonormal.
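A minimal sketch of (3.8) in the Euclidean case, with arbitrary Gram matrices and an arbitrary map (the data below are invented for illustration and are not taken from the text): the adjoint matrix satisfies $(Ax|y)_Y = (x|A^*y)_X$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 4
G = np.eye(n) + 0.1 * np.ones((n, n))     # Gram matrix on X (symmetric positive definite)
H = np.diag([1.0, 2.0, 3.0, 4.0])         # Gram matrix on Y
A = rng.standard_normal((m, n))           # matrix of A : X -> Y

A_star = np.linalg.solve(G, A.T @ H)      # A* = G^{-1} A^T H, formula (3.8)

x, y = rng.standard_normal(n), rng.standard_normal(m)
assert np.isclose((A @ x) @ H @ y, x @ G @ (A_star @ y))   # (Ax|y)_Y = (x|A*y)_X
```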
3.39 Theorem. Let $A : X \to Y$ be a linear operator between two Euclidean or two Hermitian spaces and let $A^* : Y \to X$ be its adjoint. Then $\operatorname{Rank} A^* = \operatorname{Rank} A$. Moreover,
$$(\operatorname{Im} A)^\perp = \ker A^*, \qquad (\operatorname{Im} A^*)^\perp = \ker A, \qquad \operatorname{Im} A = (\ker A^*)^\perp, \qquad \operatorname{Im} A^* = (\ker A)^\perp.$$

Proof. Fix two orthonormal bases on $X$ and $Y$, and let $\mathbf{A}$ be the matrix associated to $A$ in these bases. Then, see (3.8), the matrix associated to $A^*$ is $\mathbf{A}^T$ ($\overline{\mathbf{A}}^T$ in the Hermitian case), hence
$$\operatorname{Rank} A^* = \operatorname{Rank}\mathbf{A}^T = \operatorname{Rank}\mathbf{A} = \operatorname{Rank} A,$$
and
$$\dim(\ker A^*)^\perp = \dim Y - \dim\ker A^* = \operatorname{Rank} A^* = \operatorname{Rank} A = \dim\operatorname{Im} A.$$
On the other hand, $\operatorname{Im} A \subset (\ker A^*)^\perp$ since, if $y = A(x)$ and $A^*(v) = 0$, then $(y|v) = (A(x)|v) = (x|A^*(v)) = 0$. We then conclude that $(\ker A^*)^\perp = \operatorname{Im} A$. The other claims easily follow; in fact, they are all equivalent to $\operatorname{Im} A = (\ker A^*)^\perp$. $\square$
As an immediate consequence of Theorem 3.39 we have the following.
3.40 Theorem (The alternative theorem). Let $A : X \to Y$ be a linear operator between two Euclidean or two Hermitian spaces and let $A^* : Y \to X$ be its adjoint. Then $A_{|(\ker A)^\perp} : (\ker A)^\perp \to \operatorname{Im} A$ and $A^*_{|\operatorname{Im} A} : \operatorname{Im} A \to (\ker A)^\perp$ are injective and onto, hence isomorphisms. Moreover,
(i) $A(x) = y$ has at least a solution if and only if $y$ is orthogonal to $\ker A^*$,
(ii) $y$ is orthogonal to $\operatorname{Im} A$ if and only if $A^*(y) = 0$,
(iii) $A$ is injective if and only if $A^*$ is surjective,
(iv) $A$ is surjective if and only if $A^*$ is injective.

3.41 ¶. A more direct proof of the equality $\ker A = (\operatorname{Im} A^*)^\perp$ is the following. For simplicity, consider the real case. Clearly, it suffices to work in coordinates and, by choosing orthonormal bases, it is enough to show that $\ker\mathbf{A} = (\operatorname{Im}\mathbf{A}^T)^\perp$ for every matrix $\mathbf{A} \in M_{m,n}(\mathbb{R})$. Let $\mathbf{A} = (a^i_j) \in M_{m,n}(\mathbb{R})$ and let $\mathbf{a}^1, \mathbf{a}^2, \dots, \mathbf{a}^m$ be the rows of $\mathbf{A}$, equivalently the columns of $\mathbf{A}^T$.
Then
$$\mathbf{A}\mathbf{x} = \begin{pmatrix} a^1_1 x^1 + a^1_2 x^2 + \cdots + a^1_n x^n\\ a^2_1 x^1 + a^2_2 x^2 + \cdots + a^2_n x^n\\ \vdots\\ a^m_1 x^1 + a^m_2 x^2 + \cdots + a^m_n x^n \end{pmatrix} = \begin{pmatrix} \mathbf{a}^1 \cdot \mathbf{x}\\ \mathbf{a}^2 \cdot \mathbf{x}\\ \vdots\\ \mathbf{a}^m \cdot \mathbf{x} \end{pmatrix}.$$
Consequently, $\mathbf{x} \in \ker\mathbf{A}$ if and only if $\mathbf{a}^i \cdot \mathbf{x} = 0$ $\forall i = 1, \dots, m$, i.e.,
$$\ker\mathbf{A} = \big(\operatorname{Span}\{\mathbf{a}^1, \mathbf{a}^2, \dots, \mathbf{a}^m\}\big)^\perp = (\operatorname{Im}\mathbf{A}^T)^\perp. \tag{3.9}$$
3.2 Metrics on Real Vector Spaces

In this section we discuss bilinear forms on real vector spaces. One can develop similar considerations in the complex setting, but we refrain from doing so.
a. Bilinear forms and linear operators

3.42 Definition. Let $X$ be a real linear space. A bilinear form on $X$ is a map $b : X \times X \to \mathbb{R}$ that is linear in each factor, i.e.,
$$b(\alpha x + \beta y, z) = \alpha\,b(x, z) + \beta\,b(y, z), \qquad b(x, \alpha y + \beta z) = \alpha\,b(x, y) + \beta\,b(x, z),$$
for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{R}$.

We denote the space of all bilinear forms on $X$ by $\mathcal{B}(X)$. Observe that, if $b \in \mathcal{B}(X)$, then $b(0, x) = b(x, 0) = 0$ $\forall x \in X$. The class of bilinear forms becomes a vector space if we set
$$(b_1 + b_2)(x, y) := b_1(x, y) + b_2(x, y), \qquad (\lambda b)(x, y) := b(\lambda x, y) = b(x, \lambda y).$$
Suppose that $X$ is a linear space with an inner product denoted by $(\,|\,)$. If $b \in \mathcal{B}(X)$, then for every $y \in X$ the map $x \mapsto b(x, y)$ is linear; consequently, by Riesz's theorem there is a unique $B(y) \in X$ such that
$$b(x, y) = (x|B(y)) \qquad \forall x \in X. \tag{3.10}$$
It is easily seen that the map $y \mapsto B(y)$ from $X$ into $X$ defined by (3.10) is linear. Thus (3.10) defines a one-to-one correspondence between $\mathcal{B}(X)$ and the space of linear operators $\mathcal{L}(X, X)$, and it is easy to see that this correspondence is a linear isomorphism between $\mathcal{B}(X)$ and $\mathcal{L}(X, X)$.
Figure 3.2. Frontispiece and a page of the celebrated dissertation of G. F. Bernhard Riemann (1826-1866).
3.43 Bilinear forms in coordinates. Let $X$ be a finite-dimensional vector space and let $(e_1, e_2, \dots, e_n)$ be a basis of $X$. Let us denote by $\mathbf{B}$ the $n \times n$ matrix, sometimes called the Gram matrix of $b$,
$$\mathbf{B} := [b_{ij}], \qquad b_{ij} := b(e_i, e_j).$$
Recall that the first index from the left is the row index. Then, by linearity, if $\mathbf{x} = (x^1, x^2, \dots, x^n)^T$ and $\mathbf{y} = (y^1, y^2, \dots, y^n)^T \in \mathbb{R}^n$ are respectively the column vectors of the coordinates of $x$ and $y$, we have
$$b(x, y) = \sum_{i,j=1}^n b_{ij}\,x^i y^j = \mathbf{x}^T\cdot(\mathbf{B}\mathbf{y}) = \mathbf{x}^T\mathbf{B}\mathbf{y}.$$
In particular, a coordinate system induces a one-to-one correspondence between bilinear forms on $X$ and bilinear forms on $\mathbb{R}^n$. Notice that the entries of the matrix $\mathbf{B}$ have two lower indices that sum with the indices of the coordinates of the vectors $x$, $y$, which have upper indices. This also reminds us that $\mathbf{B}$ is not the matrix associated to a linear operator related to $b$. In fact, if $N$ is the linear operator associated to $b$,
$$b(x, y) = (x|N(y)) \qquad \forall x, y \in X,$$
then
$$\mathbf{y}^T\mathbf{B}^T\mathbf{x} = b(x, y) = (x|N(y)) = \mathbf{y}^T\mathbf{N}^T G\,\mathbf{x},$$
where we have denoted by $G$ the Gram matrix associated to the inner product on $X$, $G = [g_{ij}]$, $g_{ij} = (e_i|e_j)$, and by $\mathbf{N}$ the $n \times n$ matrix associated to $N : X \to X$ in the basis $(e_1, e_2, \dots, e_n)$. Thus $\mathbf{B}^T = \mathbf{N}^T G$ or, recalling that $G$ is symmetric and invertible, $\mathbf{N} = G^{-1}\mathbf{B}$.
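A two-line numerical check of the last identity (arbitrary data, not from the book): given the Gram matrices $G$ of the inner product and $\mathbf{B}$ of a bilinear form, the operator representing $b$ is $\mathbf{N} = G^{-1}\mathbf{B}$.

```python
import numpy as np

rng = np.random.default_rng(4)
G = np.array([[2., 1., 0.], [1., 2., 0.], [0., 0., 1.]])   # Gram matrix of the inner product
B = rng.standard_normal((3, 3))                            # Gram matrix of a bilinear form b

N = np.linalg.solve(G, B)                                  # N = G^{-1} B

x, y = rng.standard_normal(3), rng.standard_normal(3)
assert np.isclose(x @ B @ y, x @ G @ (N @ y))              # b(x, y) = (x | N y)
```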
b. Symmetric bilinear forms or metrics

3.44 Definition. Let $X$ be a real vector space. A bilinear form $b \in \mathcal{B}(X)$ is said to be
(i) symmetric, or a metric, if $b(x, y) = b(y, x)$ $\forall x, y \in X$,
(ii) antisymmetric if $b(x, y) = -b(y, x)$ $\forall x, y \in X$.
The space of symmetric bilinear forms is denoted by $\operatorname{Sym}(X)$.

3.45 ¶. Let $b \in \mathcal{B}(X)$. Show that $b_S(x, y) := \frac{1}{2}(b(x, y) + b(y, x))$, $x, y \in X$, is a symmetric bilinear form and $b_A(x, y) := \frac{1}{2}(b(x, y) - b(y, x))$, $x, y \in X$, is an antisymmetric bilinear form. In particular, one has the natural decomposition
$$b(x, y) = b_S(x, y) + b_A(x, y)$$
of $b$ into its symmetric and antisymmetric parts. Show that $b$ is symmetric if and only if $b = b_S$, and that $b$ is antisymmetric if and only if $b = b_A$.

3.46 ¶. Let $b \in \mathcal{B}(X)$ and let $\mathbf{B}$ be the associated Gram matrix. Show that $b$ is symmetric if and only if $\mathbf{B}^T = \mathbf{B}$.

3.47 ¶. Let $b \in \mathcal{B}(X)$ and let $N$ be the associated linear operator, see (3.10). Show that $N$ is self-adjoint, $N^* = N$, if and only if $b \in \operatorname{Sym}(X)$. Show that $N^* = -N$ if and only if $b$ is antisymmetric.
c. Sylvester's theorem

3.48 Definition. Let $X$ be a real vector space. We say that a metric on $X$, i.e., a bilinear symmetric form $g : X \times X \to \mathbb{R}$, is
(i) nondegenerate if $\forall x \in X$, $x \neq 0$, there is $y \in X$ such that $g(x, y) \neq 0$, and $\forall y \in X$, $y \neq 0$, there is $x \in X$ such that $g(x, y) \neq 0$,
(ii) positive definite if $g(x, x) > 0$ $\forall x \in X$, $x \neq 0$,
(iii) negative definite if $g(x, x) < 0$ $\forall x \in X$, $x \neq 0$.

3.49 ¶. Show that the scalar product $(x|y)$ on $X$ is a symmetric and nondegenerate bilinear form. We shall see later, Theorems 3.52 and 3.53, that any symmetric, nondegenerate and positive bilinear form on a finite-dimensional space is actually an inner product.

3.50 Definition. Let $X$ be a vector space of dimension $n$ and let $g \in \operatorname{Sym}(X)$ be a metric on $X$.
(i) We say that a basis $(e_1, e_2, \dots, e_n)$ is g-orthogonal if $g(e_i, e_j) = 0$ $\forall i, j = 1, \dots, n$, $i \neq j$.
(ii) The radical of $g$ is the linear space $\operatorname{rad}(g) := \{x \in X \mid g(x, y) = 0\ \forall y \in X\}$.
(iii) The rank of the metric $g$ is $r(g) := n - \dim\operatorname{rad}(g)$.
Figure 3.3. Jorgen Gram (1850--1916) and James Joseph Sylvester (1814-1897).
(iv) The signature of the metric $g$ is the triplet of numbers $(i_+(g), i_-(g), i_0(g))$ where
$i_+(g) :=$ the maximum of the dimensions of the subspaces $V \subset X$ on which $g$ is positive definite, $g(v, v) > 0$ $\forall v \in V$, $v \neq 0$,
$i_-(g) :=$ the maximum of the dimensions of the subspaces $V \subset X$ on which $g$ is negative definite, $g(v, v) < 0$ $\forall v \in V$, $v \neq 0$,
$i_0(g) := \dim\operatorname{rad}(g)$.
One immediately sees the following.
3.51 Proposition. We have
(i) the matrix associated to $g$ in a basis $(e_1, e_2, \dots, e_n)$ is diagonal if and only if $(e_1, e_2, \dots, e_n)$ is g-orthogonal,
(ii) $g$ is nondegenerate if and only if $\operatorname{rad}(g) = \{0\}$,
(iii) if $G$ is the matrix associated to $g$ in a basis, then $x \in \operatorname{rad}(g)$ if and only if its coordinate vector belongs to $\ker G$; thus $r(g) = \operatorname{Rank} G$ and $g$ is nondegenerate if and only if $G$ is nonsingular,
(iv) if $X$ is Euclidean and $G \in \mathcal{L}(X, X)$ is the linear operator associated to $g$ by $g(x, y) = (x|G(y))$ $\forall x, y \in X$, then $\operatorname{rad}(g) = \ker G$, hence $r(g) = \operatorname{Rank} G$ and $g$ is nondegenerate if and only if $G$ is invertible.
3.52 Theorem (Sylvester). Let $X$ be a finite-dimensional vector space and let $(e_1, e_2, \dots, e_n)$ be a g-orthogonal basis for a metric $g$ on $X$. Denote by $n_+$, $n_-$ and $n_0$ the numbers of elements in the basis such that, respectively, $g(e_i, e_i) > 0$, $g(e_i, e_i) < 0$, $g(e_i, e_i) = 0$. Then $n_+ = i_+(g)$, $n_- = i_-(g)$ and $n_0 = i_0(g)$. In particular, $n_+$, $n_-$, $n_0$ do not depend on the chosen g-orthogonal basis, and
$$i_+(g) + i_-(g) + i_0(g) = n.$$
Proof. Suppose that $g(e_i, e_i) > 0$ for $i = 1, \dots, n_+$. For each $v = \sum_{i=1}^{n_+} v^i e_i$ we have
$$g(v, v) = \sum_{i=1}^{n_+} |v^i|^2\, g(e_i, e_i) > 0,$$
hence $\dim\operatorname{Span}\{e_1, e_2, \dots, e_{n_+}\} \le i_+(g)$. On the other hand, if $W \subset X$ is a subspace of dimension $i_+(g)$ such that $g(v, v) > 0$ $\forall v \in W$, $v \neq 0$, we have
$$W \cap \operatorname{Span}\{e_{n_++1}, \dots, e_n\} = \{0\},$$
since $g(v, v) \le 0$ for all $v \in \operatorname{Span}\{e_{n_++1}, \dots, e_n\}$. Therefore we also have $i_+(g) \le n - (n - n_+) = n_+$. Similarly, one proves that $n_- = i_-(g)$. Finally, since $G := [g(e_i, e_j)]$ is the matrix associated to $g$ in the basis $(e_1, e_2, \dots, e_n)$, we have $i_0(g) = \dim\operatorname{rad}(g) = \dim\ker G$, and, since $G$ is diagonal, $\dim\ker G = n_0$. $\square$
d. Existence of g-orthogonal bases
The Gram-Schmidt algorithm yields the existence of an orthonormal basis in a Euclidean space $X$. We now see that a slight modification of the Gram-Schmidt algorithm allows us to construct, in a finite-dimensional space, a g-orthogonal basis for a given metric $g$.

3.53 Theorem (Gram-Schmidt). Let $g$ be a metric on a finite-dimensional real vector space $X$. Then $X$ has a g-orthogonal basis.

Proof. Let $r$ be the rank of $g$, $r := n - \dim\operatorname{rad}(g)$, and let $(w_1, w_2, \dots, w_{n-r})$ be a basis of $\operatorname{rad}(g)$. If $V$ denotes a supplementary subspace of $\operatorname{rad}(g)$, then $V$ is g-orthogonal to $\operatorname{rad}(g)$ and $\dim V = r$. Moreover, for every $v \in V$, $v \neq 0$, there is $z \in X$ such that $g(v, z) \neq 0$. Decomposing $z$ as $z = w + t$, $w \in V$, $t \in \operatorname{rad}(g)$, we then have $g(v, w) = g(v, w) + g(v, t) = g(v, z) \neq 0$, i.e., $g$ is nondegenerate on $V$. Since trivially $(w_1, w_2, \dots, w_{n-r})$ is g-orthogonal and $V$ is g-orthogonal to $(w_1, w_2, \dots, w_{n-r})$, in order to conclude it suffices to complete the basis $(w_1, w_2, \dots, w_{n-r})$ with a g-orthogonal basis of $V$; in other words, it suffices to prove the claim under the further assumption that $g$ be nondegenerate.

We proceed by induction on the dimension of $X$. Let $(f_1, f_2, \dots, f_n)$ be a basis of $X$. We claim that there exists $e_1 \in X$ with $g(e_1, e_1) \neq 0$. In fact, if for some $f_i$ we have $g(f_i, f_i) \neq 0$, we simply choose $e_1 := f_i$; otherwise, if $g(f_i, f_i) = 0$ for all $i$, for some $k \neq 1$ we must have $g(f_1, f_k) \neq 0$, since by assumption $\operatorname{rad}(g) = \{0\}$. In this case we choose $e_1 := f_1 + f_k$, as then
$$g(e_1, e_1) = g(f_1, f_1) + 2g(f_1, f_k) + g(f_k, f_k) = 2g(f_1, f_k) \neq 0.$$
Now it is easily seen that the subspace
$$V_1 := \{v \in X \mid g(e_1, v) = 0\}$$
supplements $\operatorname{Span}\{e_1\}$, and we find a basis $(v_2, \dots, v_n)$ of $V_1$ such that $g(v_j, e_1) = 0$ for all $j = 2, \dots, n$ by setting
$$v_j := f_j - \frac{g(f_j, e_1)}{g(e_1, e_1)}\, e_1.$$
Since $g$ is nondegenerate on $V_1$, by the induction assumption we find a g-orthogonal basis $(e_2, \dots, e_n)$ of $V_1$, and the vectors $(e_1, e_2, \dots, e_n)$ form a g-orthogonal basis of $X$. $\square$
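The following sketch implements the construction of the proof for a metric given by an explicit symmetric matrix $G$ on $\mathbb{R}^n$. It is an illustration written here (the function name, the tolerance, and the example matrix are arbitrary choices), not code from the book; it also handles the degenerate case by leaving radical vectors untouched.

```python
import numpy as np

def g_orthogonal_basis(G, tol=1e-12):
    """Columns of the returned matrix T form a g-orthogonal basis for the
    (possibly indefinite, possibly degenerate) symmetric matrix G,
    i.e. T.T @ G @ T is diagonal. Sketch of the proof of Theorem 3.53."""
    n = G.shape[0]
    basis = [np.eye(n)[:, i] for i in range(n)]        # start from the canonical basis
    result = []
    while basis:
        # find a vector with g(f_i, f_i) != 0
        i = next((i for i, v in enumerate(basis) if abs(v @ G @ v) > tol), None)
        if i is None:
            # all g(f_i, f_i) = 0: look for g(f_0, f_k) != 0 and replace f_0 by f_0 + f_k
            k = next((k for k in range(1, len(basis))
                      if abs(basis[0] @ G @ basis[k]) > tol), None)
            if k is None:                               # f_0 lies in the radical of g
                result.append(basis.pop(0))
                continue
            basis[0] = basis[0] + basis[k]
            i = 0
        e = basis.pop(i)
        result.append(e)
        gee = e @ G @ e
        # make the remaining vectors g-orthogonal to e
        basis = [v - (e @ G @ v) / gee * e for v in basis]
    return np.column_stack(result)

G = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 2.]])   # an indefinite example
T = g_orthogonal_basis(G)
D = T.T @ G @ T
assert np.allclose(D - np.diag(np.diag(D)), 0.0)           # off-diagonal entries vanish
```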
A variant of the Gram-Schmidt procedure is the following one, due to Carl Jacobi (1804-1851). Let $g : X \times X \to \mathbb{R}$ be a metric on $X$. Let $(f_1, f_2, \dots, f_n)$ be a basis of $X$ and let $G$ be the matrix associated to $g$ in this basis, $G = [g_{ij}]$, $g_{ij} = g(f_i, f_j)$. Set $\Delta_0 = 1$ and, for $k = 1, \dots, n$, $\Delta_k := \det G_k$, where $G_k$ is the $k \times k$ submatrix of the first $k$ rows and $k$ columns.

3.54 Proposition (Jacobi). If $\Delta_k \neq 0$ for all $k = 1, \dots, n$, there exists a g-orthogonal basis $(e_1, e_2, \dots, e_n)$ of $X$; moreover,
$$g(e_k, e_k) = \frac{\Delta_{k-1}}{\Delta_k}.$$

Proof. We look for a basis $(e_1, e_2, \dots, e_n)$ so that
$$e_1 = a_1^1 f_1, \qquad e_2 = a_2^1 f_1 + a_2^2 f_2, \qquad \dots, \qquad e_n = a_n^1 f_1 + \dots + a_n^n f_n,$$
or, equivalently,
$$e_k := \sum_{i=1}^k a_k^i f_i, \qquad k = 1, \dots, n, \tag{3.11}$$
as in the Gram-Schmidt procedure, such that $g(e_i, e_j) = 0$ for $i \neq j$. At first sight the system $g(e_i, e_j) = 0$, $i \neq j$, is a system in the unknowns $a_k^i$. However, if we impose that for all $k$
$$g(e_k, f_i) = 0 \qquad \forall i = 1, \dots, k-1, \tag{3.12}$$
then by linearity $g(e_k, e_i) = 0$ for $i < k$ and, by symmetry, $g(e_k, e_i) = 0$ for $i > k$. It suffices then to fulfill (3.12), i.e., to solve the system of $k-1$ equations in the $k$ unknowns $a_k^1, a_k^2, \dots, a_k^k$
$$\sum_{j=1}^k g(f_j, f_i)\, a_k^j = 0, \qquad \forall i = 1, \dots, k-1. \tag{3.13}$$
If we add the normalization condition
$$\sum_{j=1}^k g(f_j, f_k)\, a_k^j = 1, \tag{3.14}$$
we get a system of $k$ equations in $k$ unknowns of the type $G_k\mathbf{x} = \mathbf{b}$, where $G_k = [g_{ij}]$, $g_{ij} := g(f_i, f_j)$, $\mathbf{x} = (a_k^1, \dots, a_k^k)^T$ and $\mathbf{b} = (0, 0, \dots, 1)^T$. Since $\det G_k = \Delta_k$ and $\Delta_k \neq 0$ by assumption, the system is solvable. Since $k$ is arbitrary, we are able to find a g-orthogonal basis of type (3.11). It remains to compute $g(e_k, e_k)$. From (3.13) and (3.14) we get
$$g(e_k, e_k) = \sum_{i,j=1}^k a_k^i a_k^j\, g(f_i, f_j) = \sum_{j=1}^k a_k^j\Big(\sum_{i=1}^k g(f_j, f_i)\, a_k^i\Big) = \sum_{j=1}^k a_k^j\,\delta_{jk} = a_k^k,$$
and we compute $a_k^k$ by Cramer's formula,
$$a_k^k = \frac{\Delta_{k-1}}{\Delta_k}. \qquad\square$$
3.55 Remark. Notice that Jacobi's method is a rewriting of the Gram-Schmidt procedure in the case where $g(f_i, f_i) \neq 0$ for all $i$. In terms of the Gram matrix $G := [g(f_i, f_j)]$, we have also proved that
$$T^T G\,T = \operatorname{diag}\Big\{\frac{\Delta_{k-1}}{\Delta_k}\Big\}$$
for a suitable triangular matrix $T$.
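A quick numerical sketch of Proposition 3.54 and of Corollary 3.56 below (the matrix is an arbitrary example chosen here): the leading principal minors give the diagonal entries $\Delta_{k-1}/\Delta_k$ of a g-orthogonal Gram matrix, and the number of sign changes in $(1, \Delta_1, \dots, \Delta_n)$ counts the negative directions.

```python
import numpy as np

# A symmetric matrix representing a metric g in some basis (arbitrary example).
G = np.array([[2., 1., 0.],
              [1., -1., 2.],
              [0., 2., 3.]])
n = G.shape[0]

minors = [1.0] + [np.linalg.det(G[:k, :k]) for k in range(1, n + 1)]   # Delta_0, ..., Delta_n
diag = [minors[k - 1] / minors[k] for k in range(1, n + 1)]            # g(e_k, e_k)

# Sylvester (Corollary 3.56): i_-(g) = number of sign changes in (1, Delta_1, ..., Delta_n).
i_minus = sum(minors[k] * minors[k + 1] < 0 for k in range(n))
assert i_minus == sum(d < 0 for d in diag)                  # matches the Jacobi diagonal
assert i_minus == int(np.sum(np.linalg.eigvalsh(G) < 0))    # and the count of negative eigenvalues
```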
3.56 Corollary (Sylvester). Suppose that $\Delta_1, \dots, \Delta_n \neq 0$. Then the metric $g$ is nondegenerate. Moreover, $i_-(g)$ equals the number of changes of sign in the sequence $(1, \Delta_1, \Delta_2, \dots, \Delta_n)$. In particular, if $\Delta_k > 0$ for all $k$, then $g$ is positive definite.

Let $(e_1, e_2, \dots, e_n)$ be a g-orthogonal basis of $X$. By reordering the basis in such a way that
$$g(e_j, e_j)\ \begin{cases} > 0 & \text{if } j = 1, \dots, i_+(g),\\ < 0 & \text{if } j = i_+(g)+1, \dots, i_+(g)+i_-(g),\\ = 0 & \text{otherwise,}\end{cases}$$
and setting $f_j := e_j/\sqrt{|g(e_j, e_j)|}$ when $g(e_j, e_j) \neq 0$ and $f_j := e_j$ otherwise, we get
$$g(f_j, f_j) = \begin{cases} 1 & \text{if } j = 1, \dots, i_+(g),\\ -1 & \text{if } j = i_+(g)+1, \dots, i_+(g)+i_-(g),\\ 0 & \text{otherwise.}\end{cases}$$
e. Congruent matrices It is worth seeing now how the matrix associated to a bilinear form changes when we change bases. Let (el' e2, ... , en) and (II, 12,.··, fn) be two bases of X and let R be the matrix associated to the map R : X -+ X, R(ei) := fi in the basis (el' e2,"" en), that is
where r i is the column vector of the coordinates of fi in the basis (e 1, e2, ... , en). As we know, if x and x' are the column vectors of the coordinates of x respectively, in the basis (el' e2,"" en) and (II, 12,···, fn), then x = Rx'. Denote by Band B' the matrices associated to b respectively, in the coordinates (eI, e2, ... , en) and (II, 12,.··, fn). Then we have
b(x, y) = x,TB,y', b(x,y) = xTBy = X,TRTBRy', hence
B' =RTBR.
(3.15)
The previous argument can be of course reversed. If (3.15) holds, then B and B' are the Gram matrices of the same metric b on JRn in different coordinates b(x, y) = xTb'y = (RxfB(Ry). 3.57 Definition. Two matrices A, B E Mn,n(JR) are said to be congruent if there exists a nonsingular matrix R E Mn,n(JR) such that B = R TAR. It turns out that the congruence relation is an equivalence relation on matrices, thus the n x n matrices are partitioned into classes of congruent matrices. Since the matrices associated to a bilinear form in different basis are congruent, to any bilinear form corresponds a unique class of congruent matrices. The above then reads as saying that two matrices A, BE Mn,n(JR) are congruent if and only if they represent the same bilinear form in different coordinates. Thus, the existence of a g-orthogonal basis is equivalent to the following.
3.58 Theorem. A symmetric matrix A E Mn,n(JR) is congruent to a diagonal matrix. Moreover, Sylvester's theorem reads equivalently as the following. 3.59 Theorem. Two diagonal matrices I, J E Mn,n(JR) are congruent if and only if they have the same number of positive, negative and zero entries in the diagonal. If, moreover, a symmetric matrix A E Mn,n(JR) is congruent to
1=
Boo o 1- 1 o
Idb
0
0
0
then (a, b, n - a - b) is the signature of the metric yT Ax. Thus the existence of a g-orthogonal matrix in conjunction with Sylvester's theorem reads as the following.
3.60 Theorem. Two symmetric matrices A, B E Mn,n(JR) are congruent if and only if the metrics yTAx and yTBx on JRn have the same signature (a, b, r). In this case, A and B are congruent to
Go o 1- I
0
1=
0
Id b
0
0
0
f. Classification of real metrics Since reordering the basis elements is a linear change of coordinates, we can now reformulate Sylvester's theorem in conjunction with the existence of a g-orthonormal basis as follows. Let X, Y be two real vector spaces, and let g, h be two metrics respectively, on X and Y. We say that (X, g) and (Y, h) are isometric if and only if there is an isomorphism L : X ----* Y such that h(L(x), L(y)) = g(x, y) Vx, Y E X. Observing that two metrics are isometric if and only if, in coordinates, their Gram matrices are congruent, from Theorem 3.60 we infer the following.
3.61 Theorem. (X, g) and (Y, h) are isometric if and only if 9 and h have the same signature,
Moreover, if X has dimension n and the metric 9 on X has signature (a,b,r), a+b+r = n, then (X,g) is isometric to (lR n , h) where h(x,y):= xTHyand
Go
H=
o o
o
1- Id 1 0 b
0
o
According to the above, the metrics over a real finite-dimensional vector space X are classified, modulus isometries, by their signature. Some of them have names: (i) The Euclidean metric: i+(g) = n, i_(g) = io(g) a scalar product. (ii) The pseudoeuclidean metrics: L(g) = O. (iii) The Lorenz metric or Minkowski metric: i+(g) io(g) = O. (iv) The Artin metric i+(g) = L(g) = p, io(g) = O.
= 0; =
in this case 9 is
n - 1, L(g)
=
1,
3.62~. Show that a bilinear form 9 on a finite-dimensional space X is an inner product on X if and only if 9 is symmetric and positive definite.
g. Quadratic forms • Let X be a finite-dimensional vector space over JR and let b E B(X) be a bilinear form on X. The quadratic form ¢> : X --> JR associated to b is defined by ¢>(x) = b(x, x), x E X.
Observe that ¢> is fixed only by the symmetric part of b 1
bs(x, y) := 2(b(x, y)
+ b(y, x))
since b(x, x) = bs(x, x) Vx E X. Moreover one can recover bs from ¢> since bs is symmetric,
bs(x, y) =
~ (¢>(x + y) -
¢>(x) - ¢>(y)).
Another important relation between a bilinear form b E B(X) and its quadratic form ¢> is the following. Let x and v EX. Since
¢>( x + tv) = ¢>( x) we have
d
dt ¢>(x
+ t (b(x, v) + b( v, x)) + t 2 ¢>( v), + tV)lt==o
= 2 bs(x, v).
(3.16)
We refer to (3.16) saying that the symmetric part bs of b is the first variation of the associated quadratic form. 3.63 Homogeneous polynomials of degree two. Let B = [b ij ] E Mn,n(JR) and let n
L
b(x,y) := xTBy =
bijxiyj
i,j==l
be the bilinear form defined by B on JRn, (yl, y2, ... , yn). Clearly,
X
=
(xl, x 2 , ... , x n ), y
n
¢>(x) = b(x, x) = xTBx =
L
bijXiX
j
i,j==l
is a homogeneous polynomial of degree two. Conversely, any homogeneous polynomial of degree two
P(x) =
L
bijxiX j = xTBx
i,j=l,n iSj
defines a unique symmetric bilinear form in JRn by
b(x, y) :=
~ (P(x + y) -
with associated quadratic form P.
P(x) - P(Y))
3.2 Metrics on Real Vector Spaces
3.64 Example. Let (x, y) be the standard coordinates in mial
ax
2
+ bxy + cy2 =
]R2.
105
The quadratic polyno-
b~2) (:)
(x, y) (b;2
is the quadratic form of the metrics
3.65 Derivatives of a quadratic form. From (3.16) we can compute the partial derivatives of the quadratic form ¢(x) := xTGy. In fact, choosing v = eh, we have
hence, arranging the partial derivatives in a 1 x n matrix, called the Jacobian matrix of ¢,
I
a¢ D¢(x):= [ -a¢ 1 (x) 2 (x) ax ax
I... I-(x) a¢ ] ax n
we have or, taking the transpose,
'V¢(x) := (D¢(x))T = (G
+ GT)x.
h. Reducing to a sum of squares
Let $g$ be a metric on a real vector space $X$ of dimension $n$ and let $\phi$ be the associated quadratic form. Then, choosing a basis $(e_1, e_2, \dots, e_n)$, we have
$$\phi(x) = g(x, x) = \sum_{i=1}^n (x^i)^2\, g(e_i, e_i)$$
if and only if $(e_1, e_2, \dots, e_n)$ is g-orthogonal, and the number of positive, negative and zero coefficients is the signature of $g$. Thus, Sylvester's theorem in conjunction with the fact that we can always find a g-orthogonal basis can be rephrased as follows.
3.66 Theorem (Sylvester's law of inertia). Let ¢(x) = g(x, x) be the quadratic form associated to a metric 9 on an n-dimensional real vector space.
(i) There exists a basis (fl, 12, ... , in) of X such that i+(g)
¢(x)
L(g)
= L (X i )2 - L (x i )2, i=l
n X
i=l
where (i+(g), L(g), io(g)) is the signature of g.
=
Lxiii, i=l
(ii) If for some basis (el' e2, ... , en) n
n
¢(x)
= L ¢(ei)lx i I 2,
x:= Lxiei'
(3.17)
i=l
i=l
then the numbers n+, n_ and no respectively, of positive, negative and zero ¢( ei) 's are the signature (i+ (g), i - (g), i o(g)) of g. 3.67 Example. In order to reduce a quadratic form ¢J to the canonical form (3.17), we may use Gram-Schmidt's algorithm. Let us repeat it focusing this time on the change of coordinates instead of on the change of basis. Suppose we want to reduce to a sum of squares by changing coordinates, the quadratic form n
¢J(x) =
L
aijXjx\
i,j=l
where at least one of the aij'S is not zero. We first look for a coefficient akk that is not zero. If we find it, we go further, otherwise if all akk vanish, at least one of the mixed terms, say al2, is nonzero; the change of variables
transforms a12x l x 2 into al2((yl)2 - (y2)2), and since all = a22 = 0, in the new coordinates (yl, y2, ... , yn) the coefficient of (yl)2 is not zero. Thus, possibly after a linear change of variables, we write ¢J as
We now complete the square and set Y l = allyl { yJ = yJ
+ '2:,']=2
for j = 2, ... , n.
so that ¢J(x) = _1_ (allyl all
fyj,
+
t
a2j yj)2 j=2 2
+C
= _1_(yl)2 all
+C
where C contains only products of y2, ... , yn. The process can then be iterated. 3.68 Example. Show that Jacobi's method in Proposition 3.54 transforms ¢J in ~o 1 2 ¢J(x) = - ( x ) ~l
+
~l 2 2 -(x) ~2
+ ... + -~2 ( x n ) 2 , ~3
if x = I:~=l xiei, for a suitable g-orthogonal basis (el' e2,.··, en). 3.69 Example (Classification of conics). The conics in the plane are the zeros of a second degree polynomial in two variables P(x,y) := ax 2 + 2bxy
+ cy2 + dx + ey+ f
= 0,
(x,y) E ]R2,
(3.18)
where a, b, c, d, e, f E JR. Choose a new system of coordinates (X, Y), X = ax + (3y, Y = + 8y in which the quadratic part of P transforms into a sum of squares
"(x
ax 2 + bxy + cy2 = pX 2 + qy 2 , consequently, Pinto
pX 2
+ q y 2 + 2r X + 2s Y + f
= 0.
Now we can classify the conics in terms of the signs of p, q and the conic reduces to the straight line
If p
'I
2rX
°
+ 2sY + f
f.
If p, q are zero,
= 0.
and q = 0, then, completing the square, the conic becomes r p(X - XO)2 + 2sY + f = 0, Xo =-, p
i.e., a parabola with vertex in (Xo,O) and axis parallel to the axis of Y. Similarly, if p = and q 'I 0, the conic is a parabola with vertex in (0, Yo), Yo := slq, and axis parallel to the X axis. Finally, if pq 'I 0, completing the square, the conic is
°
p(X - XO)2
+ q(Y -
YO)2
+f
= 0,
Xo
= rip,
Yo
= slq,
i.e., it is o a hyperbola if f 'I and pq < 0, o two straight lines if f = and pq < 0, o an ellipse if sgn (I) = -sgn (p) and pq > 0, o a point if f = and pq > 0, o the empty set if sgn (I) = sgn (p) and pq > 0. Since we have operated with linear changes of coordinates that map straight lines into straight lines, ellipses into ellipses, and hyperbolas into hyperbolas, we conclude the following.
°
° °
3.70 Proposition. The conics in the plane are classified in terms of the signature of their quadmtic part and of the sign of the zero term. 3. 71 ~. The equation of a quadric i.e., of the zeros of a second order polynomial in n variables, see Figure 3.4 for n = 3, has the form
°
°
p
¢(x)='L X ; i=l
n
'L X; =1. i=p+l
(v) Suppose detA = 0. Since A = AT, we have kerA = (lmA)-L. Choosing a basis in which the first k elements generate 1m A and the last n - k ker A, then A writes as
([;J :)
Figure 3.4. Quadrics: (a) ellipsoid: $a^2x^2 + b^2y^2 + c^2z^2 = 1$; (b) point: $a^2x^2 + b^2y^2 + c^2z^2 = 0$; (c) imaginary ellipsoid: $a^2x^2 + b^2y^2 + c^2z^2 = -1$; (d) hyperboloid of one sheet: $a^2x^2 + b^2y^2 - c^2z^2 = 1$; (e) cone: $a^2x^2 + b^2y^2 - c^2z^2 = 0$; (f) hyperboloid of two sheets: $-a^2x^2 - b^2y^2 + c^2z^2 = 1$; (g) paraboloid: $a^2x^2 + b^2y^2 - 2cz = 0$, $c > 0$; (h) saddle: $a^2x^2 - b^2y^2 - 2cz = 0$, $c > 0$; (i) elliptic cylinder: $a^2x^2 + b^2y^2 = 1$; (j) straight line: $a^2x^2 + b^2y^2 = 0$; (k) imaginary straight line: $a^2x^2 + b^2y^2 = -1$; (l) hyperbolic cylinder: $a^2x^2 - b^2y^2 = 1$; (m) nonparallel planes: $a^2x^2 - b^2y^2 = 0$; (n) parabolic cylinder: $a^2x^2 - 2cz = 0$, $c > 0$; (o) parallel planes: $a^2x^2 = 1$; (p) plane: $a^2x^2 = 0$; (q) imaginary plane: $a^2x^2 = -1$.
in this new basis and the quadric can be written as
+ 2(b'lx') + 2(b"lx") + C2 = 0 ker A, x = x' + x", b = b' + b" and
¢(x) = (X')T A'x'
where X', b' E Im A, x", b" E Applying the argument in (iii) to
(x,)TA'x'+2b'.x'
det A'
=I
O.
+C2,
we may further transform the quadric into
¢(x)
= (x')T A'x' + C3 + 2b" .x" = 0,
and, writing y' := -2b" .x" - C3, that is, by means of an affine transformation that does not change the variable x', we end up with ¢( x) = (x') T A'x' - y' = o.
3.3 Exercises 3.72'. Starting from specific lines or planes expressed in parametric or implicit way in IR3, write o the straight line through the origin perpendicular to a plane, o the plane through the origin perpendicular to a straight line, o the distance of a point from a straight line and from a plane, o the distance between two straight lines, o the perpendicular straight line to two given nonintersecting lines, o the symmetric of a point with respect to a line and to a plane, o the symmetric of a line with respect to a plane. 3.73'. Let X, Y be two Euclidean spaces with inner products respectively, ( ( I )y. Show that X x Y is an Euclidean space with inner product
I )x
and
(Xl,X2), (Yl,Y2) E X x Y. Notice that X x {O} and {O} x Yare orthogonal subspaces of X x Y. 3.74'. Let X,Y E IRn . Show that x..l Y if and only if Ix - ayl 21xl Va E R 3.75'. The graph of the map A(x)
:=
Ax, A E Mm,n(IR) is defined as
GA:= {(x,y) Ix E IR n , y E IRk, Y = A(x)} C IR n x IRk. Show that G A is a linear subspace of IRn+k of dimension n and that it is generated by the column vectors of the (k + n) x n
Also show that the row vectors of the k x (n
+ k)
matrix
(08) generates the orthogonal to GA. 3.76'. Write in the standard basis of lR4 the matrices of the orthogonal projection on specific subspaces of dimension 2 and 3. 3.77 ,. Let X be Euclidean or Hermitian and let V, W be subspaces of X. Show that V.L n W.L = (V + W).L. 3.78'. Let f : Mn,n(lK) --. lK be a linear map such that f(AB) = f(BA) VA, B E Mn,n(lK). Show that there is >. E lK such that f(X) = >. tr X for all X E Mn,n(lK) where tr X := L~=l xl if X = [xj]. 3.79'. Show that the bilinear form b: Mn,n(lR) x Mn,n(lR) --.lR given by n
"T i b(A, B) := tr (A T B):= " L.)A B)i i=l
defines an inner product on the real vector space Mn,n(lR). Find the orthogonal of the symmetric matrices. 3.80'. Given n + 1 points Zl, Z2, ... , Zn+l in C, show that there exists a unique polynomial of degree at most n with prescribed values at Zl, Z2, .. " Zn+l. [Hint: If P n is the set of complex polynomials of degree at most n, consider the map >: P n --. C n + l given by >(P) := (P(Zl), P(Z2), ... ,P(Zn)).] 3.81 , Discrete integration. Let tl, t2, ... , t n be n points in fa, bJ C R Show that there are constants al, a2, .•. , an such that
for every polynomial of degree at most n - 1. 3.82'. Let
Q := [0, l]n =
{x E lR
n
I°~ Xi ~ 1,
i = 1, ... ,
n}
be the cube of side one in lR n . Show that its diagonal has length ,;n. Denote by the vertices of Q and by x := (1/2,1/2, ... ,1/2) the center of Q. Show that the balls around x that do not intersect the balls B(Xi, 1/2), i = 1, ... , 2 n , necessarily have radius at most R n := (,;n - 2)/2. Conclude that for n > 4, B(x, R n ) is not contained in Q. Xl, ... , X2n
3.83'. Give a few explicit metrics in lR 3 and find the corresponding orthogonal bases. 3.84'. Reduce a few explicit quadratic forms in lR 3 and lR 4 to their canonical form.
4. Self-Adjoint Operators
In this chapter, we deal with self-adjoint operators on a Euclidean or Hermitian space, and, more precisely, with the spectral theory for self-adjoint and normal operators. In the last section, we shall see methods and results of linear algebra at work on some specific examples and problems.
4.1 Elements of Spectral Theory

4.1.1 Self-adjoint operators

a. Self-adjoint operators

4.1 Definition. Let $X$ be a Euclidean or Hermitian space. A linear operator $A : X \to X$ is called self-adjoint if $A^* = A$.

As we have seen, if $\mathbf{A}$ is the matrix associated to $A$ in an orthonormal basis, then $\mathbf{A}^T$ and $\overline{\mathbf{A}}^T$ are the matrices associated to $A^*$ in the same basis, according to whether $X$ is Euclidean or Hermitian. In particular, $A$ is self-adjoint if and only if $\mathbf{A} = \mathbf{A}^T$ in the Euclidean case and $\mathbf{A} = \overline{\mathbf{A}}^T$ in the Hermitian case. Moreover, as a consequence of the alternative theorem we have
$$X = \ker A \oplus \operatorname{Im} A, \qquad \ker A \perp \operatorname{Im} A$$
if $A : X \to X$ is self-adjoint. Finally, notice that the self-adjoint operators form a linear subspace of $\mathcal{L}(X, X)$.

Typical examples of self-adjoint operators are the orthogonal projection operators. In fact, we have the following.

4.2 Proposition. Let $X$ be a Euclidean or Hermitian space and let $P : X \to X$ be a linear operator. $P$ is the orthogonal projection onto its image if and only if $P^* = P$ and $P \circ P = P$.
Proof. This follows, for instance, from 3.32. Here we present a more direct proof. Suppose $P$ is the orthogonal projection onto its image. Then for every $y \in X$, $(y - P(y)|z) = 0$ $\forall z \in \operatorname{Im} P$. Thus $y = P(y)$ if $y \in \operatorname{Im} P$, that is, $P(x) = P \circ P(x) = P^2(x)$ $\forall x \in X$. Moreover, for $x, y \in X$
$$0 = (x - P(x)|P(y)) = (x|P(y)) - (P(x)|P(y)), \qquad 0 = (P(x)|y - P(y)) = (P(x)|y) - (P(x)|P(y)),$$
hence
$$(P(x)|y) = (x|P(y)), \qquad \text{i.e.,} \quad P^* = P.$$
Conversely, if $P^* = P$ and $P^2 = P$, we have
$$(x - P(x)|P(z)) = (P^*(x - P(x))|z) = (P(x) - P^2(x)|z) = (P(x) - P(x)|z) = 0$$
for all $z \in X$. $\square$
b. The spectral theorem
The following theorem, as we shall see, yields a characterization of the self-adjoint operators.

4.3 Theorem (Spectral theorem). Let $A : X \to X$ be a self-adjoint operator on the Euclidean or Hermitian space $X$. Then $X$ has an orthonormal basis made of eigenvectors of $A$.

In order to prove Theorem 4.3 let us first state the following.

4.4 Proposition. Under the hypothesis of Theorem 4.3 we have
(i) $A$ has $n$ real eigenvalues, if counted with multiplicity,
(ii) if $V \subset X$ is an invariant subspace under $A$, then $V^\perp$ is also invariant under $A$,
(iii) eigenvectors corresponding to distinct eigenvalues are orthogonal.

Proof. (i) Assume $X$ is Hermitian and let $\mathbf{A} \in M_{n,n}(\mathbb{C})$ be the matrix associated to $A$ in an orthonormal basis. Then $\mathbf{A} = \overline{\mathbf{A}}^T$, and $\mathbf{A}$ has $n$ complex eigenvalues, if counted with multiplicity. Let $\mathbf{z} \in \mathbb{C}^n$ be an eigenvector with eigenvalue $\lambda \in \mathbb{C}$, so that $\mathbf{A}\mathbf{z} = \lambda\mathbf{z}$ and $\overline{\mathbf{A}}\,\overline{\mathbf{z}} = \overline{\lambda}\,\overline{\mathbf{z}}$. Consequently, if $\mathbf{A} = (a^i_j)$, $\mathbf{z} = (z^1, z^2, \dots, z^n)$, we have
$$\lambda|\mathbf{z}|^2 = \sum_{i=1}^n \lambda z^i\,\overline{z^i} = \sum_{i=1}^n (\mathbf{A}\mathbf{z})^i\,\overline{z^i} = \sum_{i,j=1}^n a^i_j\, z^j\,\overline{z^i}, \qquad \overline{\lambda}|\mathbf{z}|^2 = \sum_{i=1}^n \overline{\lambda}\,\overline{z^i}\, z^i = \sum_{i,j=1}^n \overline{a^i_j}\,\overline{z^j}\, z^i.$$
Since $a^i_j = \overline{a^j_i}$ for all $i, j = 1, \dots, n$, the two right-hand sides agree, hence $\lambda = \overline{\lambda}$, i.e., $\lambda \in \mathbb{R}$. In the Euclidean case $\overline{\mathbf{A}}^T = \mathbf{A}^T = \mathbf{A}$, and the same computation applies.
(ii) Let $w \in V^\perp$. For every $v \in V$ we have $(A(w)|v) = (w|A(v)) = 0$, since $A(v) \in V$ and $w \in V^\perp$. Thus $A(w) \perp V$.
(iii) Let $x$, $y$ be eigenvectors of $A$ with eigenvalues $\lambda$, $\mu$, respectively. Then $\lambda$ and $\mu$ are real and
$$(\lambda - \mu)(x|y) = (\lambda x|y) - (x|\mu y) = (A(x)|y) - (x|A(y)) = 0.$$
Thus $(x|y) = 0$ if $\lambda \neq \mu$. $\square$
Proof of Theorem 4.3. We proceed by induction on the dimension n of X. On account of Proposition 4.4 (i), the claim trivially holds if dim X = 1. Suppose the theorem has been proved for all self-adjoint operators on H when dim H = n - 1 and let us prove it for A. Because of (i) Proposition 4.4, all eigenvalues of A are real, hence there exists at least an eigenvector Ul of A with norm one. Let H := Span {Ul}l- and let B := AIR be the restriction of A to H. Because of (ii) Proposition 4.4, B(H) C H, hence B : H - t H is a linear operator on H (whose dimension is n - 1); moreover, B is self-adjoint, since it is the restriction to a subspace of a self-adjoint operator. Therefore, by the inductive assumption, there is an orthonormal basis (U2, ... , Un) of H made by eigenvectors of B, hence of A. Since U2, ... ,Un are orthogonal to U 1, (u 1, U2, ... , Un) is an orthonormal basis of X made by eigenvectors of A. 0
The next proposition expresses the existence of an orthonormal basis of eigenvectors in several different ways, see Theorem 2.45. We leave its simple proof to the reader.
4.5 Proposition. Let A : X -+ X be a linear operator on a Euclidean or Hermitian space X of dimension n. Let (UI' U2,"" un) be a basis of X and let AI, A2,"" An be real numbers. The following claims are equivalent
(i) (UI, U2, ... , Un) is an orthonormal basis of X and each Uj is an eigenvector of A with eigenvalue Aj, i. e., Vi,j=l, ... ,n,
(ii) (Ul, U2, ... , un) is an orthonormal basis and n
A(x)
=L
Aj(xluj) Uj
VxEX,
j=l
(iii) (UI, U2, ... , un) is an orthonormal basis and for all x, y
E X
if X is Euclidean, if X is Hermitian. Moreover, we have the following, compare with Theorem 2.45.
4.6 Proposition. Let A : X -+ X be a self-adjoint operator in a Euclidean or Hermitian space X of dimension n and let A E Mn,n(JK) be the matrix associated to A in a given orthonormal basis. Then A is similar to a diagonal matrix. More precisely, let (UI' U2,"" un) be a basis of X of eigenvectors of A, let AI, A2,"" An E IR be the corresponding eigenvalues and let S E M n ,n(lR) be the matrix that has the n-tuple of components of Ui in the given orthonormal basis as the i th column,
Then STS
=
Id
and
if X is Euclidean, and and if X is Hermitian. Proof. Since the columns of S are orthonormal, it follows that STS = Id if X is Euclidean or ST S = Id if X is Hermitian. The rest of the proof is contained in Theorem 2.45.
o
c. Spectral resolution Let A : X ----. X be a self-adjoint operator on a Euclidean or Hermitian space X of dimension n, let (UI' Uz, ... , un) be an orthonormal basis of eigenvectors of A, let )'1, Az, ... , Ak be the distinct eigenvalues of A and VI, Vz ,·.·, Vk the corresponding eigenspaces. Let P;, : X ----. Vi be the projector on Vi so that
P;,(x) =
L
(xIUj) Uj,
UjEVi
and by (ii) Proposition 4.4 k
A(x) = L AiPi(X). i=l
As we have seen, by (iii) Proposition 4.4, we have Vi -.L Yj if i =1= j and, by the spectral theorem, I:~=1 dim Vi = n. In other words, we can say that {Vih is a decomposition of X in orthogonal subspaces or state the following.
4.7 Theorem. Let A : X ----. X be self-adjoint on a Euclidean or Hermitian space X of dimension n. Then there exists a unique family of projectors PI, Pz , ... , Pk and distinct real numbers AI, Az, ... , Ak such that k
k
LP;' = Id i=l
and A
= LAiPi. i=l
Finally, we can easily complete the spectral theorem as follows.
4.8 Proposition. Let $X$ be a Euclidean or Hermitian space. A linear operator $A : X \to X$ is self-adjoint if and only if the eigenvalues of $A$ are real and there exists an orthonormal basis of $X$ made of eigenvectors of $A$.
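Numerically, the content of the spectral theorem for a real symmetric matrix is exactly what numpy.linalg.eigh computes. A minimal sketch with an arbitrary matrix (an illustration added here, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                       # a self-adjoint (symmetric) matrix

lam, U = np.linalg.eigh(A)              # real eigenvalues, orthonormal eigenvectors (columns of U)

assert np.allclose(U.T @ U, np.eye(4))          # orthonormal basis of eigenvectors
assert np.allclose(A, U @ np.diag(lam) @ U.T)   # A x = sum_j lambda_j (x|u_j) u_j
```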
d. Quadratic forms
To a self-adjoint operator $A : X \to X$ we may associate the bilinear form
$$a : X \times X \to \mathbb{K}, \qquad a(x, y) := (A(x)|y), \qquad x, y \in X,$$
which is symmetric if $X$ is Euclidean, and sesquilinear with $a(x, y) = \overline{a(y, x)}$ if $X$ is Hermitian.
(A(x)[x) =
L Ai[(xleiW
Vx E X.
(4.1)
i=l
In particular, if Amin and eigenvalues of A, then Amin
A max
are respectively, the smallest and largest
Ixl 2 :::; (A(x)[x)
:::;
A max
lxl 2
Vx E X.
Moreover, we have (A(x)[x) = Amin Ixl 2 (resp. (A(x)[x) = A max and only if x is an eigenvector with eigenvalue Amin (resp. AmaxJ.
Ix1 2 )
if
Proof. Proposition 4.5 yields (4.1) hence n
Amin L
n
[(xlei)1 2
:::;
(A(x)lx) :::; Amax L
i=l
I(x[ei)f,
i=l
and, since [x[2 = I:~11(xlei)12 "Ix E X, the first part of the claim is proved. Let us prove that (A(x)lx) = Aminlxf if and only if x is an eigenvector with eigenvalue Amin. If x is an eigenvector of A with eigenvalue Amin, then A(x) = Amin x hence (A(x)lx) = (AminX[X) = Amin Ix1 2. Conversely, suppose (el, e2, ... , en) is a basis of X made by eigenvectors of A and the eigenspace VA . is spanned by (q, e2, ... , ek)' From (A(x)lx) = Aminl x l2 , we infer that mm n
0= (A(x)[x) - Aminlxl2 = L(Ai - Amin)l(xlei)[2 i=l
and, as AiAmin 2: 0, we get that (xlei) = 0 Vi We proceed similarly for Amax.
> k,
thus x E VAmin •
o
All eigenvalues can, in fact, be characterized as in Theorem 4.9. Let us order the eigenvalues, counted with their multiplicity, as
and let (el' e2, vectors (el' e2,
, en) be an orthonormal basis of corresponding eigen, en), A(ei) = Aiei Vi = 1, ... ,n; finally, set
Since Vk , Wk are invariant subspaces under A and Vk.l = Wk+1, byapplying Theorem 4.9 to the restriction of (A(x)lx) on Vk and Wk, we find
(4.2)
Al = min (A(x)lx), Ixl=l
Ak
= max{ (A(x)lx) jlxl = 1,
x E Vk}
= min{ (A(x)lx) Ilxl = 1,
if k
x E Wk}
= 2, ... , n - 1,
An = max(A(x)lx). Ixl=l
Moreover, if S is a subspace of dimension n-k+ 1, we have SnVk then there is Xo E S n Vk with Ixol = 1; thus min{ (A(x)lx)
Ilxl = 1, XES} :S
=1=
{O},
(A(xo)lxo)
:S max{ (A(x)lx) Ilxl = 1, x
E
Vk } = Ak.
Since dim Wk = n - k + 1 and minxEwk(A(x)lx) = Ak, we conclude with the min-max characterization of eigenvalues that makes no reference to eigenvectors.
4.10 Proposition (Courant). Let A be a self-adjoint operator on a Euclidean or Hermitian space X of dimension n and let Al :S A2 :S ... :S An be the eigenvalues of A in nondecreasing order and counted with multiplicity. Then
Ak = . max min{(A(x)lx) dlmS=n-k+1
Ilxl = 1, XES}
= dlmS=k .min max{(A(x)lx)
Ilxl
=
1, XES}.
4.11 A variational algorithm for the eigenvectors. From (4.2) we know that
$$\lambda_k := \min\big\{(A(x)|x)\ \big|\ |x| = 1,\ x \in V_{k-1}^\perp\big\}, \qquad k = 1, \dots, n, \tag{4.3}$$
where $V_0 = \{0\}$. This yields an iterative procedure to compute the eigenvalues of $A$. For $j = 1$ define
$$\lambda_1 = \min_{|x|=1}(A(x)|x),$$
and for $j = 1, \dots, n-1$ set
$$\begin{cases} V_j := \text{the eigenspace of } \lambda_j,\\ W_j := (V_1 \oplus V_2 \oplus \cdots \oplus V_j)^\perp,\\ \lambda_{j+1} := \min\big\{(A(x)|x)\ \big|\ |x| = 1,\ x \in W_j\big\}.\end{cases}$$
A numerical sketch of this deflation procedure is given below.
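The sketch below (written here for illustration, with an arbitrary symmetric matrix) mimics the deflation structure of 4.11: each eigenvalue is obtained as the minimum of $(Ax|x)$ over the unit sphere of the orthogonal complement of the eigenvectors already found. For brevity, the inner minimization over each complement is computed with numpy.linalg.eigh; the point is the structure of the procedure, not a production eigensolver.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # a self-adjoint operator on R^5

eigenvalues, eigenvectors = [], []
Q = np.eye(5)                          # columns: orthonormal basis of W_j (initially all of R^5)
while Q.shape[1] > 0:
    B = Q.T @ A @ Q                    # restriction of (A x | x) to W_j
    w, V = np.linalg.eigh(B)
    lam, y = w[0], V[:, 0]             # lam = min over unit x in W_j of (A x | x)
    eigenvalues.append(lam)
    eigenvectors.append(Q @ y)
    Q = Q @ V[:, 1:]                   # W_{j+1}: orthogonal complement within W_j

assert np.allclose(eigenvalues, np.linalg.eigh(A)[0])   # eigenvalues in nondecreasing order
```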
Notice that such an algorithm yields an alternative proof of the spectral theorem. We shall see in Chapter 10 that this procedure extends to certain classes of self-adjoint operators in infinite-dimensional spaces. Finally, notice that Sylvester's theorem, Gram-Schmidt's procedure or the other variants for reducing a quadratic form to a canonical form, see Chapter 3, allow us to find the numbers of positive, negative and null eigenvalues (with multiplicity) without computing them explicitly. e. Positive operators A self-adjoint operator A : X --; X is called positive (resp. nonnegative) if the quadratic form ¢(x) := (Axlx) is positive for x =I- 0 (resp. nonnegative). From the results about metrics, see Corollary 3.56, or directly from Theorem 4.9, we have the following. 4.12 Proposition. Let A : X --; X be self-adjoint. A is positive (nonnegative) if and only if all eigenvalues of A are positive (nonnegative) or iff there is A > 0 (A 2: 0) such that (Axlx) 2: A[x[2. 4.13 Corollary. A : X --; X is positive self-adjoint if and only if a(x,y) = (A(x)ly) is an inner (Hermitian) product on X. 4.14 Proposition (Simultaneous diagonalization). Let A, M : X --; X be linear self-adjoint operators on X. Suppose M is positive. Then there exists a basis (el' e2, ... , en) of X and real numbers AI, A2,"" An such that (4.4) Proof. The metric g(x, y) :== (M(x)[y) is a scalar (Hermitian) product on X and the linear operator M-IA : X ....... X is self-adjoint with respect to 9 since g(M- 1A(x), y) == (M M-1A(x)ly) == (A(x)[y) == (x[A(y)
== (xjM M- 1 A(y») == (MxIM- 1 A(y) == g(x, M- 1 A(y». Therefore, M- 1 A has real eigenvalues and, by the spectral theorem, there is a gorthonormal basis of X made of eigenvectors of M- 1 A, M- 1 A(ej) == Ajej Vi,j == 1, ... , n.
o 4.15 Remark. We cannot drop the positivity assumption tion 4.14. For instance, if
and we have det(A Id - M- I A)
A:=
III
Proposi-
(~ ~)
= A2 + 1, hence M- 1 A has no real eig~nvalue.
4.16'. Show the following.
Proposition. Let X be a Euclidean space and let g, b : X x X ---> lR be two metrics on X. Suppose g is positive. Then there exists a basis of X that is both g-orthogonal and b-orthogonal. 4.17'. Let A, M be linear self-adjoint operators and let M be positive. Then M-l A is self-adjoint with respect to the inner product g(x,y) := (M(x)ly). Show that the eigenvalues AI, A2, ... , An of M-l A are iteratively given by .
Al =
mm
g(x,x)=l
geM
-1
. (A(x)lx) A(x»x = mm ( ()I) x,eo M x x
and for j = 1, ... , n - 1
Yj {
:= eigenspace of
M-l A relative to Aj,
Wj := (VI EEl V2 EEl··· EEl Yj).L, Aj+l := min{(A(x)lx) 1 (M(x)lx) = 1, x E Wj},
where V.L denotes the orthogonal to V with respect to the inner product g. 4.18'. Show the following.
Proposition. Let T be a linear operator on oc n . If T + T- is positive (nonnegative), then all eigenvalues of T have positive (nonnegative) real part.
f. The operators A * A and AA * Let A : X ~ Y be a linear operator between X and Y that we assume are either both Euclidean or both Hermitian. From now on we shall write Ax instead of A(x) for the sake of simplicity. As usual, A* : Y ~ X denotes the adjoint of A.
4.19 Proposition. The operator A* A: X
~
X is
(i) self-adjoint, (ii) nonnegative, (iii) Ax, A* Ax and (A* Axlx) are all nonzero or all zero, in particular A - A is positive if and only if A is injective, (iv) if U1, U2, , Un are eigenvectors of A- A respectively, with eigenvalues A1' A2, , An, then
in particular, if U1, U2, ... , Un are orthogonal to each other, then AU1' ... ,Aun are orthogonal to each other as well. Proof. (i) In fact, (A- A)-
= A- A-- = A- A.
(ii) and (iii) If Ax = 0, then trivially A- Ax = 0, and if A- Ax = 0, then (A- Axlx) On the other hand, (A- Axlx) = (AxIAx) = IAxl 2 hence Ax = 0 if (A- Axlx) = O. (iv) In fact, (AUiIAuj)
= (A- AUiluj) = Ai(Uiluj) = Ailuil28ij.
= O. 0
4.20 Proposition. The operator AA * : Y ----. Y is
(i) self-adjoint, (ii) nonnegative, (iii) A*x, AA*x and (AA*xlx) are either all nonzero or all zero, in particular AA * is positive if and only if ker A * = {O}, equivalently if and only if A is surjective. (iv) if U1, U2, ... , Un are eigenvectors of AA* with eigenvalues respectively, AI, A2' ... , An, then
in particular, if U1, U2, ... , Un are orthogonal to each other, then A*U1,.'" A*un are orthogonal to each other as well. Moreover, AA* and A* A have the same nonzero eigenvalues and
RankAA*
= RankA* A = Rank A = RankA*.
In particular, RankAA* = RankA* A
~
min(dimX,dim Y).
Proof. The claims (i) (ii) (iii) and (iv) are proved as in Proposition 4.19. To prove that A* A and AA* have the same nonzero eigenvalues, notice that if x E X, x # 0, is an eigenvalue for A* A with eigenvalue A # 0, A* Ax = AX, then Ax # by (iii) Proposition 4.19 and AA*(Ax) = A(A* Ax) = A(AX) = AAx, i.e., Ax is a nonzero eigenvector for AA* with the same eigenvalue A. Similarly, one proves that if yolO is an eigenvector for AA* with eigenvalue A # 0, then by (iii) A*y # and A*y is an eigenvector for A * A with eigenvalue A. Finally, from the alternative theorem, we have
°
°
RankAA*
= RankA* = Rank A = RankA* A. o
g. Powers of a self-adjoint operator Let A : X ----. X be self-adjoint. By the spectral theorem, there is an orthonormal basis (el, e2, ... , en) of X and real numbers AI, A2,"" An such that n Ax
=
LAj(x[ej)ej
"Ix E X.
j=l
By induction, one easily computes, using the eigenvectors el, e2,"" en and the eigenvalues AI, A2,"" An of A the k-power of A, Ak := Ao·· ·oA k times, 'Vk ~ 2, as n
Akx
=
L(Ai)k(xlei) ei
(4.5)
"Ix E X.
i=l
4.21 Proposition. Let A : X ----. X be self-adjoint and k
~
1. Then
(i) Ak is self-adjoint, (ii) A is an eigenvalue for A if and only if Ak is an eigenvalue for Ak,
(iii)
E X is an eigenvector of A with eigenvalue ..\ if and only if x is an eigenvector for A k with eigenvalue ..\k. In particular, the eigenspaces of A relative to ..\ and of Ak relative to ..\k coincide. (iv) If A is invertible, equivalently, if all eigenvalues of A are nonzero, then X
"Ix E X. 4.22~.
Let A: X
--->
X be self-adjoint. Show that
If p(t) = 2:;;=1 aktk is a polynomial of degree m, then (4.5) yields m
p(A)(x)
=L k=l
m
akAk(x)
n
n
=L
L ak..\j(xjej)ej k=l j=l
= LP(..\j)(xlej)ej.
(4.6)
j=l
4.23 Proposition. Let A : X --> X be a nonnegative self-adjoint operator and let kEN, k ?: 1. There exists a unique nonnegative self-adjoint operator B : X --> X such that B 2k = A. Moreover, B is positive if A is positive.
The operator B such that B 2k denoted by 2t0i'.
= A is called the 2kth root of A and is
Proof. If A(x) = ~'J=l Aj (x[ej )ej, (4.5) yields B 2k = A for n
B(x)
:=
L
2V\j(xlej)ej.
j=l
Uniqueness remains to be shown. Suppose Band C are self-adjoint, nonnegative and such that A = B 2k = C2k. Then B and C have the same eigenvalues and the same eigenspaces by Proposition 4.21, hence B = C. 0
In particular, if $A : X \to X$ is nonnegative and self-adjoint, the operator square root of $A$ is defined by
$$\sqrt{A}(x) := \sum_{j=1}^n \sqrt{\lambda_j}\,(x|e_j)\,e_j, \qquad x \in X,$$
if $A$ has the spectral decomposition $Ax = \sum_{j=1}^n \lambda_j (x|e_j)\,e_j$.
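A short numerical check of this construction for an arbitrary nonnegative matrix (an illustration added here, not from the text): the square root built from the spectral decomposition is self-adjoint and squares back to $A$.

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M.T @ M                            # a nonnegative self-adjoint matrix

lam, U = np.linalg.eigh(A)
sqrtA = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # sqrt(A) from the spectral decomposition

assert np.allclose(sqrtA @ sqrtA, A)
assert np.allclose(sqrtA, sqrtA.T)
```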
4.24~.
Prove Proposition 4.14 by noticing that, if A and M are self-adjoint and M is positive, then M-I/2 AM- 1 / 2 : X ---> X is well defined and self-adjoint.
4.25~.
Let A, B be self-adjoint and let A be positive. Show that B is positive if
S := AB + BA is positive. [Hint: Consider A-I/ 2 BA- 1 / 2 and apply Exercise 4.18.]
4.1.2 Normal operators a. Simultaneous spectral decompositions 4.26 Theorem. Let X be a Euclidean or Hermitian space. If A and B are two self-adjoint operators on X that commute, A=A*,
B=B*,
AB=BA,
then there exists an orthonormal basis (el' e2, ... , en) on X of eigenvectors of A and B, hence n Z
n
= ~)z!ei)ei' i=l
Az
= L Ai(zlei)ei,
n
Bz
i=l
=
LJ.li(z!ei)ei, i=l
AI, A2, ... , An E lR and J.ll, J.l2,"" J.ln E lR being the eigenvalues respectively of A and B. This is proved by induction as in Theorem 4.3 on account of the following.
4.27 Proposition. Under the hypoteses of Theorem 4.26, we have (i) A and B have a common nonzero eigenvector,
(ii) if V is invariant under A and B, then V 1- is invariant under A and B as well. Proof. (i) Let>. be all y E VA we have consequently, there (ii) For every w E (Bw[z) = (wIBz) =
an eigenvalue of A and let VA be the corresponding eigenspace. For ABy = BAy = >.By, i.e., By E VA' Thus VA is invariant under B, is an eigenvector w E VA of B IVA ' i.e., common to A and B. V.L and z E V, we have Az, Bz E V and (Aw[z) = (wIAz) = 0, 0, i.e., Aw,Bw E V.L. 0
4.28'. Show that two symmetric matrices A, B are simultaneously diagonizable if and only if they commute AB = BA.
b. Normal operators on Hermitian spaces A linear operator on a Euclidean or Hermitian space is called normal if N N* = N* N. Of course, if we fix an orthonormal basis in X, we may represent N with an n x n matrix N E Mn,n(C) and N is normal if and T only if NN = NT N if X is Hermitian or NN T = NTN if X is Euclidean. The class of normal operators, though not trivial from the algebraic point of view (it is not closed for the operations of sum and composition), is interesting as it contains several families of important operators as subclasses. For instance, self-adjoint operators N = N*, anti-self-adjoint operators N* = - N, and isometric operators, N* N = Id, are normal operators. Moreover, normal operators in a Hermitian space are exactly the ones that are diagonizable. In fact, we have the following.
4.29 Theorem (Spectral theorem). Let X be a Hermitian space of dimension n and let N : X ~ X be a linear operator. Then N is normal if and only if there exists an orthonormal basis of X made by eigenvectors ofN. Proof. Let (el, e2, ... , en) be an orthonormal basis of X made by eigenvectors of N. Then for every z E X n
n
N* z = L Aj (zlej )ej j=l
Nz = LAj(zlej)ej, j=l hence NN*z = Ej=IIAjI2(zlej)ej = N* Nz. Conversely, let
N +N* A:=---,
N - N* B:=---.
2 2i It is easily seen that A and B are self-adjoint and commute. Theorem 4.26 then yields a basis of orthonormal eigenvectors of A and B and therefore of eigenvectors of N :=
A+iBandN*=A-iB. 4.30'. Show that N : eigenspaces.
en
0 ~
en
is normal if and only if Nand N* have the same
c. Normal operators on Euclidean spaces Let us translate the information contained in the spectral theorem for normal operators on Hermitian spaces into information about normal operators on Euclidean spaces. In order to do that, let us first make a few remarks. As usual, in en we write z = x+iy, x, Y E IR n for z = (Xl +iYl,.·., X n + iYn)' If W is a subspace of IR n , the subspace of en
W e1 iW := { z E en I z = x
+ iy,
x, YEW}
is called the complexijied of W. Trivially, dimc(We1 iW) if V is a subspace of en, set
4.31 Lemma. A subspace V W if and only if V = V. Proof. If V = W EEl iW, trivially vectors
z+z
V
x:=-2-'
c
= dimIR W. Also,
en is the complexijied of a real subspace
= V. Conversely, if z E V is such that
z
-z
y:=~=
z E V,
the
(z/i) +~ 2
have real coordinates. Set
W :=
{x Ell~n Ix= z: z, zE V};
then it is easily seen that V = W EEl iW if V =
V.
o
For N : ℝⁿ → ℝⁿ we define its complexified as the (complex) linear operator N_ℂ : ℂⁿ → ℂⁿ defined by N_ℂ(z) := Nx + iNy if z = x + iy. Then we easily see that

(i) λ is an eigenvalue of N if and only if λ is an eigenvalue of N_ℂ,
(ii) N is, respectively, a self-adjoint, anti-self-adjoint, isometric or normal operator if and only if N_ℂ is, respectively, a self-adjoint, anti-self-adjoint, isometric or normal operator on ℂⁿ,
(iii) the eigenvalues of N_ℂ are either real or pairwise complex conjugate; in the latter case the conjugate eigenvalues λ and λ̄ have the same multiplicity.

4.32 Proposition. Let N : ℝⁿ → ℝⁿ be a normal operator. Every real eigenvalue λ of N of multiplicity k has an eigenspace V_λ of dimension k. In particular, V_λ has an orthonormal basis made of eigenvectors.

Proof. Let λ be a real eigenvalue for N_ℂ, N_ℂ z = λz. We have

  N_ℂ z̄ = Nx − iNy = conj(N_ℂ z) = conj(λz) = λ z̄,

i.e., z ∈ ℂⁿ is an eigenvector of N_ℂ with eigenvalue λ if and only if z̄ is also an eigenvector with eigenvalue λ. The eigenspace E_λ of N_ℂ relative to λ is then closed under conjugation and, by Lemma 4.31, E_λ = W_λ ⊕ iW_λ, where

  W_λ := { x ∈ ℝⁿ | x = (z + z̄)/2, z ∈ E_λ },

and dim_ℝ W_λ = dim_ℂ E_λ. Since N_ℂ is diagonalizable on ℂ and W_λ ⊂ V_λ, we have

  k = dim_ℂ E_λ = dim_ℝ W_λ ≤ dim_ℝ V_λ.

As dim V_λ ≤ k, see Proposition 2.43, the claim follows. □
4.33 Proposition. Let λ be a nonreal eigenvalue of the normal operator N : ℝⁿ → ℝⁿ with multiplicity k. Then there exist k planes of dimension 2 that are invariant under N. More precisely, if e_1, e_2, ..., e_k ∈ ℂⁿ are k orthonormal eigenvectors that span the eigenspace E_λ of N_ℂ relative to λ and we set

  u_{2j−1} := (e_j + ē_j)/√2,    u_{2j} := (e_j − ē_j)/(√2 i),

then u_1, u_2, ..., u_{2k} are orthonormal in ℝⁿ and, for j = 1, ..., k, the plane Span {u_{2j−1}, u_{2j}} is invariant under N; more precisely we have

  N(u_{2j−1}) = α u_{2j−1} − β u_{2j},
  N(u_{2j})   = β u_{2j−1} + α u_{2j},

where λ =: α + iβ.

Proof. Let E_λ, E_λ̄ be the eigenspaces of N_ℂ relative to λ and λ̄. Since N_ℂ is diagonalizable on ℂ, we have dim_ℂ E_λ = dim_ℂ E_λ̄ = k. On the other hand, for z ∈ E_λ,

  N_ℂ z̄ = Nx − iNy = conj(N_ℂ z) = λ̄ z̄.

Therefore z ∈ E_λ if and only if z̄ ∈ E_λ̄. The complex subspace F_λ := E_λ ⊕ E_λ̄ of ℂⁿ has dimension 2k and is closed under conjugation; Lemma 4.31 then yields F_λ = W_λ ⊕ iW_λ where

  W_λ := { x ∈ ℝⁿ | x = (z + z̄)/2, z ∈ E_λ }    and    dim_ℝ W_λ = dim_ℂ F_λ = 2k.      (4.7)

If (e_1, e_2, ..., e_k) is an orthonormal basis of E_λ, then (ē_1, ē_2, ..., ē_k) is an orthonormal basis of E_λ̄; since

  √2 e_j = u_{2j−1} + i u_{2j},    √2 ē_j = u_{2j−1} − i u_{2j},

we see that {u_j} is an orthonormal basis of W_λ. Finally, if λ := α + iβ, we compute

  N(u_{2j−1}) = N_ℂ( (e_j + ē_j)/√2 ) = (λ e_j + λ̄ ē_j)/√2 = ... = α u_{2j−1} − β u_{2j},
  N(u_{2j})   = N_ℂ( (e_j − ē_j)/(√2 i) ) = (λ e_j − λ̄ ē_j)/(√2 i) = ... = β u_{2j−1} + α u_{2j},

i.e., Span {u_{2j−1}, u_{2j}} is invariant under N. □
Observing that the eigenspaces relative to the real eigenvalues and the invariant planes relative to the complex conjugate eigenvalues are pairwise orthogonal, from Propositions 4.32 and 4.33 we infer the following.

4.34 Theorem. Let N be a normal operator on ℝⁿ. Then ℝⁿ is the direct sum of 1-dimensional and 2-dimensional subspaces that are pairwise orthogonal and invariant under N. In other words, there is an orthonormal basis such that the matrix N associated to N in this basis is block diagonal, with 1 × 1 and 2 × 2 blocks along the principal diagonal and zeros elsewhere. To each real eigenvalue λ of multiplicity k correspond k blocks (λ) of dimension 1 × 1. To each couple of complex conjugate eigenvalues λ, λ̄ of multiplicity k correspond k 2 × 2 blocks of the form

  (  α  β )
  ( −β  α )

where α + iβ := λ.
4.35 Corollary. Let N : ℝⁿ → ℝⁿ be a normal operator. Then

(i) N is self-adjoint if and only if all its eigenvalues are real,
(ii) N is anti-self-adjoint if and only if all its eigenvalues are purely imaginary (or zero),
(iii) N is an isometry if and only if all its eigenvalues have modulus one.

4.36 ¶. Show Corollary 4.35.
4.1.3 Some representation formulas

a. The operator A*A

Let A : X → Y be a linear operator between two Euclidean spaces or two Hermitian spaces and let A* : Y → X be its adjoint. As we have seen, A*A : X → X is self-adjoint, nonnegative and can be written as

  A*Ax = Σ_{i=1}^{n} λ_i (x|e_i) e_i,

where (e_1, e_2, ..., e_n) is a basis of X made of eigenvectors of A*A and, for each i = 1, ..., n, λ_i is the eigenvalue relative to e_i; accordingly, we also have

  (A*A)^{1/2} x := Σ_{i=1}^{n} μ_i (x|e_i) e_i,

where μ_i := √λ_i. The operator (A*A)^{1/2} and its eigenvalues μ_1, ..., μ_n, called the singular values of A, play an important role in the description of A.

4.37 ¶. Let A ∈ M_{m,n}(ℝ). Show that ‖A‖ := sup_{|x|=1} |Ax| is the greatest singular value of A. [Hint: |Ax|² = (A*Ax) • x.]
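The claim of Exercise 4.37 can be checked numerically; the following is a minimal sketch in Python/NumPy (not part of the original text; the matrix A and the sampling procedure are illustrative choices).

import numpy as np

# An arbitrary example matrix.
A = np.array([[2.0, 0.0, 1.0],
              [-1.0, 3.0, 0.5]])

# Singular values: square roots of the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)
singular_values = np.sqrt(np.clip(eigvals, 0.0, None))

# Estimate of sup_{|x|=1} |Ax| by sampling many random unit vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100000))
X /= np.linalg.norm(X, axis=0)
operator_norm_estimate = np.max(np.linalg.norm(A @ X, axis=0))

print(np.sort(singular_values))        # mu_1 <= ... <= mu_n
print(operator_norm_estimate)          # approaches the largest singular value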
4.38 Theorem (Polar decomposition). Let A : X → Y be an operator between two Euclidean or two Hermitian spaces.

(i) If dim X ≤ dim Y, then there exists an isometry U : X → Y, i.e., U*U = Id, such that A = U(A*A)^{1/2}. Moreover, if A = US with U*U = Id and S* = S, then S = (A*A)^{1/2} and U is uniquely defined on ker S^⊥ = ker A^⊥.
(ii) If dim X ≥ dim Y, then there exists an isometry U : Y → X, i.e., U*U = Id, such that A = (AA*)^{1/2} U*. Moreover, if A = SU* with U*U = Id and S* = S, then S = (AA*)^{1/2} and U is uniquely defined on ker S^⊥ = Im A.

Proof. Let us show (i). Set n := dim X and N := dim Y. First let us prove uniqueness. If A = US where U*U = Id and S* = S, then A*A = S*U*US = S*S = S², i.e., S = (A*A)^{1/2}. Now from A = U(A*A)^{1/2} we infer, for i = 1, ..., n,

  A(e_i) = U((A*A)^{1/2} e_i) = μ_i U(e_i)

if (e_1, e_2, ..., e_n) is an orthonormal basis of X of eigenvectors of (A*A)^{1/2} with relative eigenvalues μ_1, μ_2, ..., μ_n. Hence U(e_i) = (1/μ_i) A(e_i) if μ_i ≠ 0, i.e., U is uniquely defined by A on the direct sum of the eigenspaces relative to the nonzero eigenvalues of (A*A)^{1/2}, that is, on the orthogonal of ker(A*A)^{1/2} = ker A.

Now we shall exhibit U. The vectors A(e_1), ..., A(e_n) are orthogonal and |A(e_i)| = μ_i, as

  (A(e_i)|A(e_j)) = (A*A(e_i)|e_j) = μ_i² (e_i|e_j) = μ_i² δ_ij.

Let us reorder the eigenvectors and the corresponding eigenvalues in such a way that, for some k, 1 ≤ k ≤ n, the vectors A(e_1), ..., A(e_k) are not zero and A(e_{k+1}) = ... = A(e_n) = 0. For i = 1, ..., k we set v_i := A(e_i)/|A(e_i)| and we complete v_1, v_2, ..., v_k to form a new orthonormal basis (v_1, v_2, ..., v_N) of Y. Now consider U : X → Y defined by

  U(e_i) := v_i,    i = 1, ..., n.

By construction (U(e_i)|U(e_j)) = δ_ij, i.e., U*U = Id, and, since μ_i = |A(e_i)| = 0 for i > k, we conclude for every i = 1, ..., n

  U((A*A)^{1/2} e_i) = μ_i U(e_i) = μ_i v_i = A(e_i),

i.e., A = U(A*A)^{1/2}. (ii) follows by applying (i) to A*. □
b. Singular value decomposition

Combining the polar decomposition and the spectral theorem we deduce the so-called singular value decomposition of a matrix A. We discuss only the real case, since the complex one requires only a few straightforward changes. Let A ∈ M_{N,n}(ℝ) with n ≤ N. The polar decomposition yields

  A = U (A^T A)^{1/2}    with    U^T U = Id.

On the other hand, since A^T A is symmetric, the spectral theorem yields S ∈ M_{n,n}(ℝ) such that

  S^T S = Id,    A^T A = S^T diag(μ_1², μ_2², ..., μ_n²) S,

where μ_1², μ_2², ..., μ_n² are the squares of the singular values of A; consequently (A^T A)^{1/2} = S^T diag(μ_1, ..., μ_n) S. Recall that the ith column of S^T is the eigenvector of (A^T A)^{1/2} relative to the eigenvalue μ_i. In conclusion, if we set T := U S^T ∈ M_{N,n}(ℝ), then T^T T = Id, S^T S = Id and

  A = T diag(μ_1, μ_2, ..., μ_n) S.

This is the singular value decomposition of A, which is implemented in most computer libraries on linear algebra. Starting from the singular value decomposition of A we can easily compute, of course, (A^T A)^{1/2} and the polar decomposition of A.

4.39. We notice that the singular value decomposition can be written in a more symmetric form if we extend T to a square orthogonal matrix V ∈ M_{N,N}(ℝ), V^T V = Id, and extend diag(μ_1, μ_2, ..., μ_n) to an N × n matrix Δ by adding N − n null rows at the bottom. Then, again, A = V Δ S where V ∈ M_{N,N}(ℝ), V^T V = Id, S ∈ M_{n,n}(ℝ), S^T S = Id and

  Δ = ( μ_1  0    ...  0   )
      ( 0    μ_2  ...  0   )
      ( ...               )
      ( 0    0    ...  μ_n )
      ( 0    0    ...  0   )
      ( ...               )
      ( 0    0    ...  0   ).
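As a numerical illustration of the last remark, here is a minimal Python/NumPy sketch (not from the text; the random matrix is an arbitrary example) showing how the polar decomposition and (A^T A)^{1/2} are recovered from the singular value decomposition returned by a standard library routine.

import numpy as np

A = np.random.default_rng(1).normal(size=(5, 3))   # an arbitrary N x n example, N >= n

# numpy returns A = T @ diag(mu) @ S with T of size N x n and S of size n x n.
T, mu, S = np.linalg.svd(A, full_matrices=False)

sqrt_AtA = S.T @ np.diag(mu) @ S      # (A^T A)^{1/2}
U = T @ S                             # isometry of the polar decomposition A = U (A^T A)^{1/2}

print(np.allclose(A, T @ np.diag(mu) @ S))   # singular value decomposition
print(np.allclose(A, U @ sqrt_AtA))          # polar decomposition
print(np.allclose(U.T @ U, np.eye(3)))       # U^T U = Id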
c. The Moore-Penrose inverse
Let A : X → Y be a linear operator between two Euclidean or two Hermitian spaces of dimension, respectively, n and m. Denote by

  P : X → ker A^⊥    and    Q : Y → Im A

the orthogonal projection operators onto ker A^⊥ and Im A. Of course Ax = Qy has at least one solution x ∈ X for any y ∈ Y; equivalently, there exists x ∈ X such that y − Ax ⊥ Im A. Since the set of solutions of Ax = Qy is a translate of ker A, we conclude that there exists a unique x := A†y ∈ X such that

  { y − Ax ⊥ Im A,                      { Ax = Qy,
  {                    equivalently,    {                       (4.8)
  { x ∈ ker A^⊥,                        { x = Px.

The linear map A† : Y → X, y → A†y, defined this way, i.e., mapping y to the unique solution x of (4.8), is called the Moore–Penrose inverse of A : X → Y. From the definition

  AA† = Q,    A†A = P,    ker A† = Im A^⊥ = ker Q,    Im A† = ker A^⊥.
4.40 Proposition. A† is the unique linear map B : Y → X such that

  AB = Q,    BA = P    and    ker B = ker Q;                    (4.9)

moreover we have

  A*AA† = A†AA* = A*.                                           (4.10)

Proof. We prove that B = A† by showing that for all y ∈ Y the vector x := By satisfies (4.8). The first equality in (4.9) yields Ax = ABy = Qy and the last two imply x = By = BQy = BAx = Px. Finally, from AA† = Q and A†A = P, we infer that

  A*AA† = A*Q = A*,    A†AA* = PA* = A*,

using also that A*Q = A* and PA* = A*, since A and A* are such that Im A = (ker A*)^⊥ and Im A* = ker A^⊥. □
The identities in (4.10) allow us to compute A† easily when A is injective or surjective.

4.41 Corollary. Let A : X → Y be a linear map between Euclidean or Hermitian spaces of dimension n and m, respectively.

(i) If ker A = {0}, then n ≤ m, A*A is invertible and A† = (A*A)^{−1}A*; moreover, if A = U(A*A)^{1/2} is the polar decomposition of A, then A† = (A*A)^{−1/2}U*.
(ii) If ker A* = {0}, then n ≥ m, AA* is invertible and A† = A*(AA*)^{−1}; moreover, if A = (AA*)^{1/2}U* is the polar decomposition of A, then A† = U(AA*)^{−1/2}.
For more on the Moore-Penrose inverse, see Chapter 10.
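The defining relations (4.9)–(4.10) are easy to verify numerically. The following minimal Python/NumPy sketch (illustrative only; the random matrix is an arbitrary example) uses the library pseudoinverse and checks that AA† and A†A are the orthogonal projections Q and P and that (4.10) holds.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 6))          # an arbitrary example

A_dag = np.linalg.pinv(A)            # Moore-Penrose inverse

Q = A @ A_dag                        # orthogonal projection onto Im A
P = A_dag @ A                        # orthogonal projection onto ker(A)^perp = Im A*

print(np.allclose(Q @ Q, Q), np.allclose(Q.T, Q))     # Q is an orthogonal projection
print(np.allclose(P @ P, P), np.allclose(P.T, P))     # P is an orthogonal projection
print(np.allclose(A.T @ A @ A_dag, A.T))              # A* A A† = A*   (4.10)
print(np.allclose(A_dag @ A @ A.T, A.T))              # A† A A* = A*   (4.10)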
4.2 Some Applications

In this final section we illustrate methods of linear algebra in a few specific examples.
4.2.1 The method of least squares

a. The method of least squares

Suppose we have m experimental data y_1, y_2, ..., y_m from an experiment for which we have a mathematical model imposing that the data be functions of a parameter x varying in a set X; we formalize the model as a map φ : X → ℝᵐ. Then we introduce a cost function C = C(φ(x), y) that evaluates the error between the expected result when the parameter is x and the experimental data. Our problem then becomes that of finding a minimizer of the cost function C. If we choose

(i) the model of the data to be linear, i.e., X is a vector space of dimension n and φ = A : X → ℝᵐ is a linear operator,
(ii) as cost function the square distance between the expected and the experimental data,

  C(x) = |Ax − y|² = (Ax − y | Ax − y),                          (4.11)

we talk of the (linear) least squares problem.

4.42 Theorem. Let X and Y be Euclidean spaces, A : X → Y a linear map, y ∈ Y and C : X → ℝ the function

  C(x) := |Ax − y|²_Y,    x ∈ X.

The following claims are equivalent:
(i) x is a minimizer of C,
(ii) y − Ax ⊥ Im A,
(iii) x solves the canonical equation

  A*(Ax − y) = 0.                                                (4.12)

Consequently, C has at least a minimizer in X and the set of minimizers of C is a translate of ker A.

Proof. Clearly, minimizing C is equivalent to finding z = Ax ∈ Im A of least distance from y. By the orthogonal projection theorem, x is a minimizer if and only if Ax is the orthogonal projection of y onto Im A. We therefore deduce that a minimizer x ∈ X for C exists, that for two minimizers x_1, x_2 of C we have Ax_1 = Ax_2, i.e., x_1 − x_2 ∈ ker A, and that (i) and (ii) are equivalent. Finally, since Im A^⊥ = ker A*, (ii) and (iii) are clearly equivalent. □
4.43 Remark. The equation (4.12) expresses the fact that the function x → |Ax − b|² is stationary at a minimizer. In fact, compare 3.65, since ∇_x(z|x) = z and ∇_x(Lx|x) = 2Lx if L is self-adjoint, we have

  |Ax − b|² = |b|² − 2(b|Ax) + |Ax|²,
  ∇_x(b|Ax) = ∇_x(A*b|x) = A*b,
  ∇_x|Ax|² = ∇_x(A*Ax|x) = 2A*Ax,

hence ∇_x|b − Ax|² = 2A*(Ax − b).
As a consequence of Theorem 4.42, on account of (4.8), we can state the following.

4.44 Corollary. The unique minimizer of C(x) = |Ax − y|²_Y in Im A* = ker A^⊥ is x = A†y.
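A minimal Python/NumPy sketch of the linear least squares problem (illustrative data, not from the text): the minimizer obtained from the canonical equation (4.12) coincides with A†y of Corollary 4.44 when ker A = {0}.

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 3))          # overdetermined example: 8 data, 3 parameters
y = rng.normal(size=8)

# Canonical equation A*(Ax - y) = 0, i.e., A^T A x = A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Minimizer via the Moore-Penrose inverse (Corollary 4.44).
x_pinv = np.linalg.pinv(A) @ y

print(np.allclose(x_normal, x_pinv))              # same minimizer (here ker A = {0})
print(np.allclose(A.T @ (A @ x_pinv - y), 0.0))   # residual orthogonal to Im A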
b. The function of linear regression

Given m vectors x_1, x_2, ..., x_m in a Euclidean space X and m corresponding numbers y_1, y_2, ..., y_m, we want to find a linear map L : X → ℝ that minimizes the quantity

  F(L) := Σ_{i=1}^{m} |y_i − L(x_i)|².

This is in fact a dual formulation of the linear least squares problem. By Riesz's theorem, to every linear map L : X → ℝ corresponds a unique vector w_L ∈ X such that L(y) := (y|w_L), and conversely. Therefore, we need to find w ∈ X such that

  C(w) := Σ_{i=1}^{m} |y_i − (x_i|w)|² → min.

If y := (y_1, y_2, ..., y_m) ∈ ℝᵐ and A : X → ℝᵐ is the linear map

  A(w) := ((x_1|w), (x_2|w), ..., (x_m|w)),    w ∈ X,

we are again seeking a minimizer of C : X → ℝ,

  C(w) = |Aw − y|²,    w ∈ X.

Theorem 4.42 tells us that the set of minimizers is nonempty, that it is a translate of ker A, and that the unique minimizer of C in ker A^⊥ = Im A* is w := A†y. Notice that

  A*a = Σ_{i=1}^{m} a_i x_i,    a = (a_1, ..., a_m) ∈ ℝᵐ,

hence w ∈ Im A* = ker A^⊥ if and only if w is a linear combination of x_1, x_2, ..., x_m. We therefore conclude that A†y is the unique minimizer of the cost function C that is a linear combination of x_1, x_2, ..., x_m. The corresponding linear map L(x) := (x|A†y) is called the function of linear regression.
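A short Python/NumPy sketch of the regression vector w = A†y (illustrative synthetic data; the names X, w_true are arbitrary): the rows of the matrix X play the role of the vectors x_1, ..., x_m, so that the map A is represented by X itself.

import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 3
X = rng.normal(size=(m, n))                     # rows are the vectors x_1, ..., x_m
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=m)      # noisy data y_i ~ (x_i | w_true)

w = np.linalg.pinv(X) @ y                       # regression vector w = A† y

print(w)                                        # close to w_true
print(np.allclose(X.T @ (X @ w - y), 0.0))      # canonical equation A*(Aw - y) = 0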
4.2.2 Trigonometric polynomials

Let us reconsider, in the more abstract setting of vector spaces, some of the results about trigonometric polynomials, see e.g., Section 5.4.1 of [GM2]. Let P_{n,2π} be the class of trigonometric polynomials of degree n with complex coefficients,

  P_{n,2π} := { P(x) = Σ_{k=−n}^{n} c_k e^{ikx} | c_k ∈ ℂ, k = −n, ..., n }.

Recall that the vector (c_{−n}, ..., c_n) ∈ ℂ^{2n+1} is called the spectrum of P(x) = Σ_{k=−n}^{n} c_k e^{ikx}. Clearly, P_{n,2π} is a vector space over ℂ of dimension at most 2n+1. The function (P|Q) : P_{n,2π} × P_{n,2π} → ℂ defined by

  (P|Q) := (1/2π) ∫_{−π}^{π} P(t) Q̄(t) dt

is a Hermitian product on P_{n,2π} that makes P_{n,2π} a Hermitian space. Since

  (e^{ihx}|e^{ikx}) = (1/2π) ∫_{−π}^{π} e^{i(h−k)t} dt = δ_{hk},

see Lemma 5.45 of [GM2], we have the following.
4.45 Proposition. The trigonometric polynomials {e^{ikx}}_{k=−n,...,n} form an orthonormal set of 2n+1 vectors in P_{n,2π} and we have the following.

(i) P_{n,2π} is a Hermitian space of dimension 2n+1.
(ii) The map P_{n,2π} → ℂ^{2n+1} that maps a trigonometric polynomial to its spectrum is well defined, since it is the coordinate system in P_{n,2π} relative to the orthonormal basis {e^{ikx}}. In particular, it is a (complex) isometry of P_{n,2π} onto ℂ^{2n+1}.
(iii) (Fourier coefficients) For k = −n, ..., n we have

  c_k = (P|e^{ikx}) = (1/2π) ∫_{−π}^{π} P(t) e^{−ikt} dt.

(iv) (Energy identity)

  ‖P‖² := (P|P) = (1/2π) ∫_{−π}^{π} |P(t)|² dt = Σ_{k=−n}^{n} |(P|e^{ikx})|² = Σ_{k=−n}^{n} |c_k|².
a. Spectrum and products

Let P(x) = Σ_{k=−n}^{n} c_k e^{ikx} and Q(x) = Σ_{k=−n}^{n} d_k e^{ikx} be two trigonometric polynomials of order n. Their product is the trigonometric polynomial of order 2n

  P(x)Q(x) = ( Σ_{h=−n}^{n} c_h e^{ihx} )( Σ_{k=−n}^{n} d_k e^{ikx} ) = Σ_{h,k=−n}^{n} c_h d_k e^{i(h+k)x} = Σ_{p=−2n}^{2n} ( Σ_{h+k=p} c_h d_k ) e^{ipx}.

If we denote by {c_k} * {d_k} the product in the sense of Cauchy of the spectra of P and Q, we can state the following.
4.46 Proposition. The spectrum of P(x)Q(x) is the product in the sense of Cauchy of the spectra of P and Q.
4.47 Definition. The convolution product of P and Q is defined by

  P * Q(x) := (1/2π) ∫_{−π}^{π} P(x + t) Q̄(t) dt.

Notice that the operation (P, Q) → P * Q is linear in the first factor and antilinear in the second one. We have the following.

4.48 Proposition. P * Q is a trigonometric polynomial of degree n. Moreover, the spectrum of P * Q is the term-by-term product of the spectra of P and Q,

  P * Q(x) = Σ_{k=−n}^{n} c_k d̄_k e^{ikx}.

Proof. In fact, for h, k = −n, ..., n we have e^{ihx} * e^{ikx} = δ_{hk} e^{ikx}, hence

  P * Q(x) = Σ_{h=−n}^{n} Σ_{k=−n}^{n} c_h d̄_k δ_{hk} e^{ikx} = Σ_{k=−n}^{n} c_k d̄_k e^{ikx}.
□

b. Sampling of trigonometric polynomials

A trigonometric polynomial of degree n can be reconstructed from its values at a suitable choice of 2n + 1 points, see Section 5.4.1 of [GM2]. Set x_j := 2πj/(2n+1), j = −n, ..., n; then the sampling map C : P_{n,2π} → ℂ^{2n+1},

  C(P) := (P(x_{−n}), ..., P(x_n)),

is invertible; in fact, see Theorem 5.49 of [GM2],

  P(x) = (1/(2n+1)) Σ_{j=−n}^{n} P(x_j) D_n(x − x_j),

where D_n(t) is the Dirichlet kernel of order n,

  D_n(t) := Σ_{k=−n}^{n} e^{ikt} = 1 + 2 Σ_{k=1}^{n} cos kt.
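The reconstruction formula via the Dirichlet kernel can be checked numerically; here is a minimal Python/NumPy sketch (illustrative only: n and the random spectrum c are arbitrary choices, not data from the text).

import numpy as np

n = 3
rng = np.random.default_rng(5)
c = rng.normal(size=2*n+1) + 1j*rng.normal(size=2*n+1)   # spectrum c_{-n}, ..., c_n
ks = np.arange(-n, n+1)

def P(x):                          # P(x) = sum_k c_k e^{ikx}
    x = np.asarray(x, dtype=float)
    return np.sum(c * np.exp(1j * ks * x[..., None]), axis=-1)

def D(t):                          # Dirichlet kernel D_n(t) = 1 + 2 sum_{k=1}^n cos(kt)
    t = np.asarray(t, dtype=float)
    return 1.0 + 2.0*np.sum(np.cos(t[..., None]*np.arange(1, n+1)), axis=-1)

xj = 2*np.pi*np.arange(-n, n+1)/(2*n+1)       # the 2n+1 sampling points
x = np.linspace(-np.pi, np.pi, 9)
P_rec = np.sum(P(xj) * D(x[:, None] - xj), axis=-1) / (2*n+1)

print(np.allclose(P_rec, P(x)))               # True: P is reconstructed from its samples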
Figure 4.1. The scenario of trigonometric polynomials, spectra and samples.
4.49 Proposition. (1/√(2n+1)) C and its inverse √(2n+1) C^{−1} : ℂ^{2n+1} → P_{n,2π}, given by the reconstruction formula above, are isometries between P_{n,2π} and ℂ^{2n+1}.

Proof. In fact, C maps e^{ikt}, k = −n, ..., n, to an orthogonal basis of ℂ^{2n+1}:

  (C(e^{iht}) | C(e^{ikt})) = Σ_{j=−n}^{n} exp( i (2π(h−k)/(2n+1)) j ) = D_n( 2π(h−k)/(2n+1) ) = (2n+1) δ_{hk}. □
From the samples we can directly compute the spectrum of P.

4.50 Proposition. Let P(x) ∈ P_{n,2π} and x_j := 2πj/(2n+1), j = −n, ..., n. Then

  c_k = (1/2π) ∫_{−π}^{π} P(t) e^{−ikt} dt = (1/(2n+1)) Σ_{j=−n}^{n} P(x_j) e^{−ikx_j}.      (4.13)

Proof. Since (4.13) is linear in P, it suffices to prove it when P(x) = e^{ihx}, h = −n, ..., n. In this case we have (1/2π) ∫_{−π}^{π} P(t) e^{−ikt} dt = δ_{hk} and

  (1/(2n+1)) Σ_{j=−n}^{n} e^{i(h−k)x_j} = (1/(2n+1)) D_n(x_{h−k}) = δ_{hk},

since D_n(x_j) = 0 for j ≠ 0, j ∈ [−n, n], and D_n(0) = 2n+1. □
o
134
4. Self-Adjoint Operators
c. The discrete Fourier transform The relation between the values {P( tj)} of PEPn,21r at the 2n + 1 points tj and the spectrum P of P in the previous paragraph is a special case of the so-called discrete Fourier transform. For each positive integer N, consider the 21r-periodic function EN(t) : lR -+ e given by N-l
EN(t) := ~ e ~
ikt
{
=
N
if t is a multiple of 21r,
k=O
otherwise.
Let w = ei~ and let 1,w,w2 , ... ,w N hE Z we have
~
I:
w hk
=
k=O
(4.14)
iN'
l-e l-e i t
{I°
1
be the Nth roots of one. For
if h is a multiple of N,
(4.15)
otherwise,
in particular, N-l
1 ~ hk N ~w =8 hk
if - N
< h < N.
(4.16)
k=O
The discrete Fourier transform of order N, DFTN defined by DFTN(y) := Uy rows by column, where
U
= [U]],
eN
= 0, ... , N -
Vi,j
-+
1.
The inverse discrete Fourier transform of order N, IDFTN : eN is defined by IDFTN(z) := Vz where
Vi,j
eN, is
-+
eN,
= O,N-1.
4.51 Proposition. IDFTN is the inverse of DFTN . Moreover, the operators VNDFTN and J-NIDFTN are isometries of eN. Proof. In fact, by (4.16)
(VV)j
=~ N
N-I
L:
k=O
w-ikw kj
=~ N
N-I
L:
i.e., V = V-I and, by the definition of V and V,
itV-I.
w(j-i)k
= 8j,
k=O
uT = it V,
hence
UT =
it V
= 0
Notice that, according to their definitions, we need N² multiplications to compute DFT_N (or IDFT_N). There is an algorithm, which we shall not describe here, called the Fast Fourier Transform that, exploiting the redundancy of some of these multiplications, computes DFT_N (or IDFT_N) with a performance of O(N log N) operations. Let P(t) = Σ_{k=−n}^{n} c_k e^{ikt} ∈ P_{n,2π} and let N ≥ 2n + 1. A computation similar to the one in Proposition 4.50 shows that

  c_k = (1/2π) ∫_{−π}^{π} P(t) e^{−ikt} dt = (1/N) Σ_{j=0}^{N−1} P(x_j) e^{−ikx_j} = (DFT_N y)_k      (4.17)
where y := (P(xo), ... , P(XN)) and Xj := 't; j, -N < j < N. Thus the spectrum of P is the DFTN of its values at Xj if n < N /2. On the other hand, if z := (zo, ... , ZN-l) is the vector defined by if 0
~
k
~
n,
if n < k < N/2, if N/2 ~ k ~ N/2+n, if N /2 + n < k < N and we recall that IDFTN is the inverse of DFTN, we have
i.e., the values of P at the points x_j are the IDFT_N of the spectrum of P.

4.52 Frequency spectrum. In applications, the DFT_N and IDFT_N may appear in a slightly different form. If f is a T_0-periodic function, one takes T := T_0/N as the sampling period, so that t_j := (T_0/N) j = jT, j = 0, 1, ..., N − 1, are the sampling points, and DFT_N produces the sequence

  c_k := (1/N) Σ_{j=0}^{N−1} f(jT) e^{−i(2π/N)jk}.

In other words, the values {c_k} are regarded as the values of the component of frequency ν_k := k/T_0 = k/(NT), i.e., as the samples of the so-called frequency spectrum f̂ : ℝ → ℂ of f, defined by

  f̂(ν) := c_k if ν = ν_k = k/(NT),    f̂(ν) := 0 otherwise.

The discrete Fourier transform and its inverse then rewrite as
  f̂(k/(NT)) = (1/N) Σ_{j=0}^{N−1} f(jT) e^{−i(2π/N)jk},

  f(kT) = Σ_{j=0}^{N−1} f̂(j/(NT)) e^{i(2π/N)jk}.
4.2.3 Systems of difference equations

Linear difference equations of first and second order are discussed, e.g., in [GM2]. Here we shall discuss systems of linear difference equations.
a. Systems of linear difference equations

First let us consider systems of first order. Let A ∈ M_{k,k}(ℂ). The homogeneous linear recurrence for a sequence {X_n} in ℂ^k,

  X_{n+1} = A X_n,  n ≥ 0,    X_0 given,

has the unique solution X_n := Aⁿ X_0 for all n, as one can easily check.

4.53 Proposition. Given {F_n} in ℂ^k, the recurrence

  X_{n+1} = A X_n + F_{n+1},  n ≥ 0,    X_0 given,

has the solution

  X_n := Aⁿ X_0 + Σ_{j=0}^{n} A^{n−j} F_j,
where we assume F_0 := 0.

Proof. In fact, for n ≥ 0 we have

  X_{n+1} = A^{n+1} X_0 + Σ_{j=0}^{n+1} A^{n+1−j} F_j = A^{n+1} X_0 + Σ_{j=0}^{n} A^{n+1−j} F_j + F_{n+1}
          = A ( Aⁿ X_0 + Σ_{j=0}^{n} A^{n−j} F_j ) + F_{n+1} = A X_n + F_{n+1}. □
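A quick Python/NumPy sketch of Proposition 4.53 (illustrative data only): the closed formula agrees with the direct iteration of the recurrence.

import numpy as np

rng = np.random.default_rng(7)
k, n_steps = 3, 12
A = 0.5 * rng.normal(size=(k, k))
X0 = rng.normal(size=k)
F = rng.normal(size=(n_steps + 1, k))
F[0] = 0.0                                    # F_0 := 0 as in Proposition 4.53

# Direct iteration of X_{n+1} = A X_n + F_{n+1}.
X = X0.copy()
for n in range(n_steps):
    X = A @ X + F[n + 1]

# Closed formula X_n = A^n X_0 + sum_{j=0}^n A^{n-j} F_j.
An = np.linalg.matrix_power
X_formula = An(A, n_steps) @ X0 + sum(An(A, n_steps - j) @ F[j] for j in range(n_steps + 1))

print(np.allclose(X, X_formula))              # True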
4.54 Higher order linear difference equations. Every equation

  x_{n+k} + a_{k−1} x_{n+k−1} + ... + a_0 x_n = f_{n+1},    n ≥ 0,        (4.18)

can be transformed into a k × k system of difference equations of first order. In fact, if

  X_n := (x_n, x_{n+1}, ..., x_{n+k−1})^T ∈ ℂ^k,    F_n := (0, 0, ..., 0, f_n)^T ∈ ℂ^k,

and A is the k × k matrix

  A := (   0     1     0   ...    0      )
       (   0     0     1   ...    0      )
       (  ...                            )       (4.19)
       (   0     0     0   ...    1      )
       ( −a_0  −a_1  −a_2  ...  −a_{k−1} ),

it is easily seen that

  X_{n+1} = A X_n + F_{n+1}                                               (4.20)

and conversely, if {X_n} solves (4.20), then {x_n}, x_n := X_n^1 for all n, solves (4.18). In this way the theory of higher order linear difference equations is subsumed to that of first order systems. In this respect, one computes for the matrix A in (4.19) that

  det(λ Id − A) = λ^k + Σ_{j=0}^{k−1} a_j λ^j.
This polynomial in λ is called the characteristic polynomial of the difference equation (4.18).

b. Power of a matrix

Let us compute the powers of A in an efficient way. To do this we remark the following.

(i) If B is similar to A, A = S^{−1} B S for some S with det S ≠ 0, then A² = S^{−1} B S S^{−1} B S = S^{−1} B² S and, by induction, Aⁿ = S^{−1} Bⁿ S for every n.
(ii) If B is a block matrix with square blocks on the principal diagonal,

  B = ( B_1   0   ...   0  )
      (  0   B_2  ...   0  )
      ( ...               )
      (  0    0   ...  B_r ),

then

  Bⁿ = ( B_1ⁿ   0    ...   0   )
       (  0    B_2ⁿ  ...   0   )
       ( ...                  )
       (  0     0    ...  B_rⁿ ).
Let λ_1, λ_2, ..., λ_k be the distinct eigenvalues of A with multiplicities m_1, m_2, ..., m_k. For every i, let p_i be the dimension of the eigenspace relative to λ_i (the geometric multiplicity). Then, see Theorem 2.65, there exists a nonsingular matrix S ∈ M_{k,k}(ℂ) such that J := S^{−1} A S has the Jordan form

  J = diag ( J_{1,1}, ..., J_{1,p_1}, J_{2,1}, ..., J_{k,p_k} ),

where, for i = 1, ..., k and j = 1, ..., p_i, J_{i,j} = (λ_i) if J_{i,j} has dimension 1, and otherwise

  J_{i,j} = ( λ_i   1    0   ...   0  )
            (  0   λ_i   1   ...   0  )
            ( ...                     )
            (  0    0    0   ...   1  )
            (  0    0    0   ...  λ_i ).
Consequently Aⁿ = S Jⁿ S^{−1}, and

  Jⁿ = diag ( J_{1,1}ⁿ, J_{1,2}ⁿ, ..., J_{k,p_k}ⁿ ).
It remains to compute the power of each Jordan block. If J' = J_{i,j} = (λ) has dimension one, then J'ⁿ = (λⁿ). If instead J' = J_{i,j} is a block of dimension q at least two,

  J' = ( λ  1  0  ...  0 )
       ( 0  λ  1  ...  0 )
       ( ...            )
       ( 0  0  0  ...  1 )
       ( 0  0  0  ...  λ ),

then

  J' = λ Id + B,    B^i_j := δ_{i+1,j}.

Since

  (B^r)^i_j = δ_{r+i,j} if r < q,    B^r = 0 if r ≥ q,

we have B^q = 0. Thus Newton's binomial formula yields

  J'ⁿ = (λ Id + B)ⁿ = Σ_{r=0}^{q−1} (n choose r) λ^{n−r} B^r,

i.e.,

  J'ⁿ = ( λⁿ   (n choose 1) λ^{n−1}   (n choose 2) λ^{n−2}   ...   (n choose q−1) λ^{n−q+1} )
        (  0    λⁿ                     (n choose 1) λ^{n−1}   ...                          )
        ( ...                                                                              )
        (  0    0                       0                     ...   λⁿ                     ).
Notice that each element of Aⁿ = S Jⁿ S^{−1} has the form

  Σ_{j=1}^{k} λ_jⁿ p_j(n),

where λ_1, λ_2, ..., λ_k are the eigenvalues of A and p_j(t) is a polynomial of degree at most m_j − 1, m_j being the algebraic multiplicity of λ_j. It follows that for ρ > max_i |λ_i| there is a constant c_ρ such that every solution of X_{n+1} = A X_n satisfies

  |X_n| ≤ c_ρ ρⁿ |X_0|    for all n.

In particular we have the following.

4.55 Theorem. If all eigenvalues of A have modulus less than one, then every solution of X_{n+1} = A X_n converges to zero as n → +∞.

Proof. Fix σ > 0 such that max_{i=1,...,k} |λ_i| < σ < 1. As we have seen, there exists a constant c_σ such that if X_n is a solution of X_{n+1} = A X_n for all n, then |X_n| ≤ c_σ σⁿ |X_0| for all n. Since 0 < σ < 1, σⁿ → 0, and the claim is proved. □
4.56 Example (Fibonacci numbers). Consider the sequence of Fibonacci numbers

  f_{n+2} = f_{n+1} + f_n,  n ≥ 0,    f_0 = 0,  f_1 = 1,                  (4.21)

that is given by

  f_n = (1/√5) [ ((1+√5)/2)ⁿ − ((1−√5)/2)ⁿ ],    n ≥ 0,

see e.g., [GM2]. Let us find it again as an application of the above. Set

  F_n := ( f_n     )
         ( f_{n+1} );

then

  F_{n+1} = ( f_{n+1} ) = ( f_{n+1}       ) = A F_n,    F_0 = ( 0 ),
            ( f_{n+2} )   ( f_n + f_{n+1} )                   ( 1 )

where

  A = ( 0  1 )
      ( 1  1 ).                                                           (4.22)
The characteristic polynomial of A is det(λ Id − A) = λ(λ − 1) − 1, hence A has two distinct eigenvalues

  λ := (1 + √5)/2,    μ := (1 − √5)/2.

An eigenvector relative to λ is (1, λ) and an eigenvector relative to μ is (1, μ). The matrix A is diagonalizable as A = S Λ S^{−1} where

  S := ( 1  1 ),    Λ := ( λ  0 ).
       ( λ  μ )          ( 0  μ )

It follows that

  F_n = Aⁿ ( 0 ) = S Λⁿ S^{−1} ( 0 ) = (1/(λ−μ)) S (  λⁿ ) = (1/(λ−μ)) ( λⁿ − μⁿ         ).
           ( 1 )               ( 1 )               ( −μⁿ )              ( λ^{n+1} − μ^{n+1} )

Consequently,

  f_n = (λⁿ − μⁿ)/(λ − μ) = (1/√5) [ ((1+√5)/2)ⁿ − ((1−√5)/2)ⁿ ].
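A short Python/NumPy check of the example (a sketch, not part of the text): the matrix-power recursion and Binet's closed formula produce the same Fibonacci numbers.

import numpy as np

A = np.array([[0, 1], [1, 1]])

def fib_matrix(n):
    # f_n is the first component of F_n = A^n F_0, with F_0 = (0, 1)^T.
    return (np.linalg.matrix_power(A, n) @ np.array([0, 1]))[0]

lam = (1 + 5**0.5) / 2          # the eigenvalues of A
mu = (1 - 5**0.5) / 2
def fib_binet(n):
    return round((lam**n - mu**n) / (lam - mu))

print([int(fib_matrix(n)) for n in range(10)])                  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
print(all(fib_matrix(n) == fib_binet(n) for n in range(40)))    # True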
4.2.4 An ODE system: small oscillations

Let x_1, x_2, ..., x_N be N point masses in ℝ³, with respective nonzero masses m_1, m_2, ..., m_N. Assume that each point exerts a force on the other points according to Hooke's law, i.e., the force exerted by the mass at x_j on x_i is proportional to the distance of x_j from x_i and directed along the line through x_i and x_j, j ≠ i. By Newton's reaction law, the force exerted by x_i on x_j is equal and opposite in direction, f_{ji} = −f_{ij}; consequently the elastic constants k_{ij}, i ≠ j, satisfy the symmetry condition k_{ij} = k_{ji}. In conclusion, the total force exerted by the system on the mass at x_i is

  f_i = − Σ_{j≠i} k_{ij} (x_j − x_i) = − Σ_{j≠i} k_{ij} x_j + ( Σ_{j≠i} k_{ij} ) x_i = − Σ_{j=1}^{N} k_{ij} x_j,

where we set k_{ii} := − Σ_{j≠i} k_{ij}. Newton's equation then takes the form

  m_i x_i''(t) + Σ_{j=1}^{N} k_{ij} x_j(t) = 0,    i = 1, ..., N,          (4.23)

with the particularity that the jth component of the force on x_i depends only on the jth components of the positions x_1, ..., x_N. The system then splits into 3 systems of N equations of second order, one for each coordinate. If we use matrix notation, things simplify. Denote by

  M := diag { m_1, m_2, ..., m_N }
the positive diagonal matrix of the masses, by K := (k_{ij}) ∈ M_{N,N}(ℝ) the symmetric matrix of elastic constants, and by X(t) ∈ M_{N,3}(ℝ) the matrix of the coordinates of the points x_1, ..., x_N, x_i =: (x_i^1, x_i^2, x_i^3), i.e., the matrix whose jth column collects the jth coordinates of the points. Then (4.23) transforms into the system of equations

  M X''(t) + K X(t) = 0                                                   (4.24)

where the product is the product rows by columns. Finally, if X''(t) denotes the matrix of the second derivatives of the entries of X(t), the system (4.23) can be written as

  X''(t) + M^{−1} K X(t) = 0,                                             (4.25)

in the unknown X : ℝ → M_{N,3}(ℝ). Since M^{−1}K is symmetric, there is an orthonormal basis u_1, ..., u_N of ℝ^N made of eigenvectors of M^{−1}K and real numbers λ_1, λ_2, ..., λ_N such that

  M^{−1} K u_j = λ_j u_j,    j = 1, ..., N,

and notice that u_1, u_2, ..., u_N are pairwise orthonormal vectors since M is diagonal. Denoting by P_j the projection operator onto Span {u_j}, we also have

  M^{−1} K = Σ_{j=1}^{N} λ_j P_j.
Thus, projecting (4.25) onto Span {u_j} we find

  0 = P_j(0) = P_j( X'' + M^{−1} K X ) = (P_j X)'' + λ_j (P_j X)    for all j = 1, ..., N,

i.e., the system (4.25) splits into N second order equations, each in the unknowns of the matrix P_j X(t). Since K is positive, the eigenvalues are positive; consequently each element of the matrix P_j X(t) is a solution of the harmonic oscillator y'' + λ_j y = 0, whose solutions are

  y(t) = cos(√λ_j t) y(0) + (sin(√λ_j t)/√λ_j) y'(0),

hence

  P_j X(t) = cos(√λ_j t) P_j X(0) + (sin(√λ_j t)/√λ_j) P_j X'(0).

In conclusion, since Id = Σ_{j=1}^{N} P_j, we have
  X(t) = Σ_{j=1}^{N} P_j X(t) = Σ_{j=1}^{N} [ cos(√λ_j t) P_j X(0) + (sin(√λ_j t)/√λ_j) P_j X'(0) ].      (4.26)

The numbers √λ_1/(2π), ..., √λ_N/(2π) are called the proper frequencies of the system. We may also use a functional notation,

  sin A := Σ_{n=0}^{∞} (−1)ⁿ A^{2n+1}/(2n+1)!,    cos A := Σ_{n=0}^{∞} (−1)ⁿ A^{2n}/(2n)!,

and we can write (4.26) as

  X(t) = cos(t√A) X(0) + (sin(t√A)/√A) X'(0),

where A := M^{−1}K.
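As a small numerical illustration (a sketch with made-up masses and elastic constants, not data from the text), the proper frequencies are obtained from the eigenvalues of A = M^{−1}K.

import numpy as np

# Hypothetical data: three unit masses coupled by springs (illustrative values).
m = np.array([1.0, 1.0, 1.0])
K = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])            # symmetric, positive definite

M = np.diag(m)
A = np.linalg.inv(M) @ K                      # A = M^{-1} K

lam, U = np.linalg.eig(A)                     # eigenvalues lambda_j and eigenvectors u_j
proper_frequencies = np.sqrt(lam.real) / (2*np.pi)

print(np.sort(proper_frequencies))            # sqrt(lambda_j)/(2*pi)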
4.3 Exercises

4.57 ¶. Let A be an n × n matrix and let λ be its eigenvalue of greatest modulus. Show that |λ| ≤ sup_i ( |a^i_1| + |a^i_2| + ... + |a^i_n| ).

4.58 ¶ Gram matrix. Let {f_1, f_2, ..., f_m} be m vectors in ℝⁿ. The matrix G = [g_{ij}] ∈ M_{m,m}(ℝ) defined by g_{ij} = (f_i|f_j) is called Gram's matrix. Show that G is nonnegative, and that it is positive if and only if f_1, f_2, ..., f_m are linearly independent.

4.59 ¶. Let A, B : ℂⁿ → ℂⁿ be self-adjoint and let A be positive. Show that the eigenvalues of A^{−1}B are real. Show also that A^{−1}B is positive if B is positive.

4.60 ¶. Let A = [a^i_j] ∈ M_{n,n}(𝕂) be self-adjoint and positive. Show that det A ≤ (tr A / n)ⁿ and deduce det A ≤ Π_{i=1}^{n} a^i_i. [Hint: Use the inequality between the geometric and arithmetic means, see [GM1].]

4.61 ¶. Let A ∈ M_{n,n}(𝕂) and let a_1, a_2, ..., a_n ∈ 𝕂ⁿ be the columns of A. Prove Hadamard's formula |det A| ≤ Π_{i=1}^{n} |a_i|. [Hint: Consider H = A*A.]

4.62 ¶. Let A, B ∈ M_{n,n}(ℝ) be symmetric and suppose that A is positive. Then the numbers of positive, negative and zero eigenvalues, counted with their multiplicity, of AB and of B coincide.

4.63 ¶. Show that ‖N ∘ N‖ = ‖N‖² if N is normal.
4.64 , Discrete Fourier transform. Let T : eN -> eN be the cycling forward shifting operator T((zo, Zl, ... ,ZN-1)) := (zt, Z2, ... ,ZN -1, zo). Show that (i) T is self-adjoint, (ii) the N eigenvalues of T are the Nth roots of 1, (iii) the vectors
._
Uk·-
1 (1 ,w,w k 2k , ... ,w k(N-1)) , VN
·2"
W
:= etN', k = 0, ... ,N - 1,
form an orthonormal basis of eN of eigenvectors of T; finally the cosine directions (zluk) of z E eN with respect to the basis (uo, ... , UN,) are given by the Discrete Fourier transform of z. 4.65'. Let A, B : X -> X be two self-adjoint operators on a Euclidean or Hermitian space. Suppose that all eigenvalues of A - B are strictly positive. Order the eigenvalues ).,1, ).,2, , ).,n of A and J1.1, J1.2, ... , J1.n of B in a nondecreasing order. Show that ).,i < J1.i Vi = 1, , n. [Hint: Use the variational characterization of the eigenvalues.] 4.66'. Let A : X -> X be self-adjoint on a Euclidean or Hermitian space. Let ).,t, ).,2, ... , ).,n and J1.1, J1.2,· .. , J1.n be respectively, the eigenvalues and the singular values of A that we think of as ordered as 1).,11 ::; 1).,21 ::; ... ::; I).,nl and J1.1 ::; J1.2 ::; ... ::; J1.n· Show that I).,il = J1.i Vi = 1, ... , n. [Hint: A* A = A2.J 4.67'. Let A : X -> X be a linear operator on a Euclidean or Hermitian space. Let m, M be respectively the smallest and the greatest singular value of A. Show that m::; 1).,1 ::; M for any eigenvalue)., of A. 4.68'. Let A: X spaces. Show that
->
Y be a linear operator between two Euclidean or two Hermitian
(i) (A* A)1/2 maps ker A to {O}, (ii) (A* A)1/2 is an isomorphism from ker A.L onto itself, (iii) (AA*)1/2 is an isomorphism from ImA onto itself. 4.69'. Let A : X -> Y be a linear operator between two Euclidean or two Hermitian spaces. Let (ut, U2, ... , Un) and J1.1, J1.2, ... , J1.n, J1.i 2: 0 be such that (ut, U2,···, Un) is an orthonormal basis of X and (A* A)1/2 x = Li J1.i(xlui)ui' Show that
(i) AA*y = Ll"i#oJ1.i(yIAui)Aui Vy E Y, (ii) If B denotes the restriction of (A* A)1/2 to ker A.L, see Exercise 4.68, then "Ix E kerA.L,
(iii) If C denotes the restriction of (AA*)1/2 to 1m A, see Exercise 4.68, then C- 1y =
L
1 -(yIAui)Aui l"i#O J1.i
Vy E ImA.
4.70'. Let A E MN,n(lI(), N 2: n, with Rank A = n. Select n vectors Ut, U2, ... , Un E J!(n such that AU1, ... , AUn E J!(N are orthonormal. [Hint: Find U E Mn,n(J!() such that AU is an isometry.]
4.3 Exercises
145
4.71 ,. Let A E MN,n(IR) and A = ULlY, where U E O(N), Y E O(n). According to 4.39, show that At = y T Ll'UT where ...L
Ll'=
0
0
0
0
0
0
0 ...L
0
0
...L
0
0
0
0
0
0
0
0
1-'1
1-'2
I-'k
J1.1, J1.2, .•• , J1.k being the nonzero singular values of A.
4.72 ,. For u : IR.
->
IR. 2 , discuss the system of equations
-1) 2
u = O.
4.73'. Let A E Mn,n(IR.) be a symmetric matrix. Discuss the following systems of ODEs x' (t)
+ Ax(t) = 0, + Ax(t) =
- ix'(t)
x" (t)
+ Ax(t) =
0,
0, where Ais positive definite
and show that the solutions are given respectively by 17
cos(tv A)x(O)
+
sin(tv'A) , v'A x (0).
4.74'. Let A be symmetric. Show that for the solutions of x"(t) + Ax(t) = 0 the energy is conserved. Assuming A positive, show that Ix(t)1 ::; E/>-l where E is the energy of x(t) and >- the smallest eigenvalue of A. 4.75'. Let A be a Hermitian matrix. Show that Ix(t)1 = const if x(t) solves the Schrodinger equation ix' + Ax = O.
Part II
Metrics and Topology
Felix Hausdorff (1869-1942), Maurice Frechet (1878-1973) and Rene-Louis Baire (1874-1932).
5. Metric Spaces and Continuous Functions
The rethinking process of infinitesimal calculus, that was started with the definition of the limit of a sequence by Bernhard Bolzano (1781-1848) and Augustin-Louis Cauchy (1789-1857) at the beginning of the XIX century and was carried on with the introduction of the system of real numbers by Richard Dedekind (1831-1916) and Georg Cantor (1845-1918) and of the system of complex numbers with the parallel development of the theory of functions by Camille Jordan (1838-1922), Karl Weierstrass (18151897), J. Henri Poincare (1854-1912), G. F. Bernhard Riemann (18261866), Jacques Hadamard (1865-1963), Emile Borel (1871-1956), ReneLouis Baire (1874-1932), Henri Lebesgue (1875-1941) during the whole of the XIX and beginning of the XX century, led to the introduction of new concepts such as open and closed sets, the point of accumulation and the compact set. These notions found their natural collocation and their correct generalization in the notion of a metric space, introduced by Maurice Frechet (1878-1973) in 1906 and eventually developed by Felix Hausdorff (1869-1942) together with the more general notion of topological space. The intuitive notion of a "continuous function" probably dates back to the classical age. It corresponds to the notion of deformation without "tearing". A function from X to Y is more or less considered to be continuous if, when x varies slightly, the target point y = f(x) also varies slightly. The critical analysis of this intuitive idea also led, with Bernhard Bolzano (1781-1848) and Augustin-Louis Cauchy (1789-1857), to the correct definition of continuity and the limit of a function and to the study of the properties of continuous functions. We owe the theorem of intermediate values to Bolzano and Cauchy, around 1860 Karl Weierstrass proved that continuous functions take on maximum and minimum values in a closed and limited interval, and in 1870 Eduard Heine (1821-1881) studied uniform continuity. The notion of a continuous function also appears in the work of J. Henri Poincare (1854-1912) in an apparently totally different context, in the so-called analysis situs, which is today's topology and algebraic topology. For Henri Poincare, analysis situs is the science that enables us to know the qualitative properties of geometrical figures. Poincare referred to the properties that are preserved when geometrical figures undergo any kind of deformation except those that introduce tearing and glueing of points. An intuitive idea for some of these aspects may be provided by the following examples.
150
5. Metric Spaces and Continuous Functions
.ou.K1lJlIblItO:'oO(j.... PIIIDIt..U.T'IIOIlI.DU~'
...........(1I"' ..._
....
~
•• a.
GRl'XDZ'CGE D••
ME
1
E P CE AB TRAIT F£~U ....n _ H.....,....
LERRE
llH DORPF
'f ..... N:I
'''''15n co:. I1rH'n'il'U
Oo\ll'1l1.lMtI· U.UM
..-.nn_
. . . . . . . a.Uj <10.
0JJ0,,0,4.~.
I. ~' ......... ' "...... ~11 .1
O;U{,.,.l
LEIPZTG fl:IT" COil". IlL.
~O:ll
Figure 5.1. Frontispieces of Les espaces abstraits by Maurice Frechet (1878-1973) and of the Mengenlehre by Felix Hausdorff (1869-1942).
o Let us draw a disc on a rubber sheet. No matter how one pulls at the rubber sheet, without tearing it, the disc stays whole. Similarly, if one draws a ring, any way one pulls the rubber sheet without tearing or glueing any points, the central hole is preserved. Let us think of a loop of string that surrounds an infinite pole. In order to separate the string from the pole one has to break one of the two. Even more, if the string is wrappped several times around the pole, the linking number between string and pole is constant, regardless of the shape of the coils. o We have already seen Euler's formula for polyhedra in [GMl]. It is a remarkable formula whose context is not classical geometry. It was Poincare who extended it to all surfaces of the type of the sphere, Le., surfaces that can be obtained as continuous deformations of a sphere without tearing or glueing. o lR, lR 2 , lR3 are clearly different objects as linear vectorspaces. As we have seen, they have the same cardinality and are thus undistinguishable as sets. Therefore it is impossible to give meaning to the concept of dimension if one stays inside the theory of sets. One can show, instead, that their algebraic dimension is preserved by deformations without tearing or glueing. At the core of this analysis of geometrical figures we have the notion of a continuous deformation that corresponds to the notion of of a continuous one-to-one map whose inverse is also continuous, called homeomorphisms. We have already discussed some relevant properties of continuous functions f : lR --; lR e f : lR2 --; lR in [GMl] and [GM2]. Here we shall discuss continuity in a sufficiently general context, though not in the most general.
5.1 Metric Spaces
151
Poincare himself was convinced of the enormous importance of extending the methods and ideas of his analysis situs to more than three dimensions. .,. L 'analysis situs Ii plus de trois dimensions presente des difficultes enormes; il faut pour tenter de les surmonter etre bien persuade de l'extreme importance de cette science. Si cette importance n'est pas bien comprise de tout Ie monde, c 'est que tout Ie monde n'y a pas suffisamment rejiechi. 1
In the first twenty years of this century with the contribution, among others, of David Hilbert (1862-1943), Maurice Frechet (1878-1973), Felix Hausdorff (1869-1942), Pavel Alexandroff (1896-1982) and Pavel Urysohn (1898-1924), the fundamental role of the notion of an open set in the study of continuity was made clear, and general topology was developed as the study of the properties of geometrical figures that are invariant with respect to homeomorphisms, thus linking back to Euler who, in 1739, had solved the famous problem of Konigsberg's bridges with a topological method. There are innumerable successive applications, so much so that continuity and the structures related to it have become one of the most pervasive languages of mathematics. In this chapter and in the next, we shall discuss topological notions and continuity in the context of metric spaces.
5.1 Metric Spaces 5.1.1 Basic definitions a. Metrics 5.1 Definition. Let X be a set. A distance or metric on X is a map d : X x X -+ lR.+ for which the following conditions hold: (i) (IDENTITY) d(x,y) > 0 if x -=1= y E X, and d(x, x) = 0 "Ix E X. (ii) (SYMMETRY) d(x, y) = d(y, x) "Ix, y EX. (iii) (TRIANGLE INEQUALITY) d(x, y) :s: d(x, z) + d(z, y), V x, y, z E X. A metric space is a set X with a distance d. Formally we say that (X, d) is a metric space if X is a set and d is a distance on X.
The properties (i), (ii) and (iii) are often called metric axioms.
1
The analysis situs in more than three dimensions presents enormous difficulties; in order to overcome them one has to be strongly convinced of the extreme importance of this science. If its importance is not well understood by everyone, it is because they have not sufficiently thought about it.
152
5. Metric Spaces and Continuous Functions
A
B Figure 5.2. Time as distance.
5.2 Example. The Euclidean distance d(x, y) := Ix - yl, x, y E JR, is a distance on JR. On JR2 and JR3 distances are defined by the Euclidean distance, given for n = 2,3 by n
dE(x,y):=
{
~(Xi - Yi)2
} 1/2
,
where x := (Xl, X2), Y := (Y1, Y2) if n = 2, or x := (Xl, X2, X3), Y := (Y1, Y2, Y3) if n = 3. In other words, JR, JR2, JR3 are metric spaces with the Euclidean distance. 5.3 Example. Imagine JR3 as a union of strips ~n:= {(X1,X2,X3) In ~ X3 < n + I}, made by materials of different indices of refractions Vn. The time teA, B) needed for a light ray to go from A to B in JR3 defines a distance on JR3, see Figure 5.2. 5.4 Example. In the infinite cylinder C = {(x, Y, z) I x 2 + y 2 = I} C JR3, we may define a distance between two points P and Q as the minimal length of the line on C, or geodesic, connecting P and Q. Observe that we can always cut the cylinder along a directrix in such a way that the curve is not touched. If we unfold the cut cylinder to a plane, the distance between P and Q is the Euclidean distance of the two image points. 5.5~. Of course lOOlx - yl is also a distance on JR, only the scale factor has changed. More generally, if I: JR -> JR is an injective map, then d(x,y) := I/(x) - l(y)1 is again a distance on IR.
5.6 Definition. Let (X, d) be a metric space. The open ball or spherical open neighborhood centered at Xo E X of radius p > 0 is the set
B(xo, p)
:=
{x
E
X I d(x, xo) < p}.
Figure 5.3. Metrics on a cylinder and on the boundary of a cube.
5.1 Metric Spaces
153
Notice the strict inequality in the definition of B(x,r). In~, ~2, ~3 with the Euclidean metric, B(xo, r) is respectively, the open interval ]xor, Xo + r[, the open disc of center Xo E ~2 and radius r > 0, and the ball bounded by the sphere of ~3 of center Xo E ~3 and radius r > O. We say that a subset E C X of a metric space is bounded if it is contained in some open ball. The diameter of E C X is given by diamE := su p { d(x, y) I x,
YE E},
and, trivially, E is bounded iff diam E < +00. Despite the suggestive language, the open balls of a metric space need not be either round nor convex; however they have some of the usual properties of discs in ~2. For instance B(xo, r) C B(xo, s) Vxo E X and 0 < r < s, Ur>oB(xo,r) = X Vxo E X, nr>oB(xo, r) = {xo} Vxo E X, Vxo E X and Vz E B(xo, r) the open ball centered at z and radius p := r - d(xo, z) > 0 is contained in B(xo, r), (v) for every couple of balls B(x,r) and B(y,s) with a nonvoid intersection and Vz E B(x,r) n B(y,s), there exists t > 0 such that B(z, t) C B(x, r) n B(y, s), in fact t := min(r - d(x, z), s - d(y, z)), (vi) for every x, y E X with x i= y the balls B(x, rd and B(y, r2) are disjoint if rl + r2 S d(x, y).
(i) (ii) (iii) (iv)
5.7 -,r. Prove the previous claims. Notice how essential the strict inequality in the definition of B(xo, p) is.
b. Convergence A distance d on a set X allow us to define the notion of convergent sequence in X in a natural way. 5.8 Definition. Let (X, d) be a metric space. We say that the sequence {x n } C X converges to x E X, and we write X n -+ x, if d(xn,x) -+ 0 in ~, that is , if for any r > 0 there exists n such that d( x n , x) < r for all
n? n.
The metric axioms at once yield that the basic facts we know for limits of sequences of real numbers also hold for limits of sequences in an arbitrary metric space. We have (i) the limit of a sequence {x n } is unique, if it exists, (ii) if {x n } converges, then {x n } is bounded, (iii) computing the limit of {x n } consists in having a candidate x E X and then showing that the sequence of nonnegative real numbers { d( x n , x)} converges to zero, (iv) if X n -+ x, then any subsequence of {x n } has the same limit x.
154
5. Metric Spaces and Continuous Functions
Thus, the choice of a distance on a given set X suffices to pass to the limit in X (in the sense specified by the metric d). However, given a set X, there is no distance on X that is reasonably absolute (even in IR), but we may consider different distances in X. The corresponding convergences have different meanings and can be suited to treat specific problems. They all use the same general language, but the exact meaning of what convergence means is hidden in the definition of the distance. This flexibility makes the language of metric spaces useful in a large context.
5.1.2 Examples of metric spaces Relevant examples of distances are provided by linear vector spaces on the fields OC = IR or C in which we have defined a norm.
5.9 Definition. Let X be a linear space over OC = IR or C. A norm on X is a function II II : X ----+ IR+ satisfying the following properties
Vx X. Ilxll = 0 if and only if x = O. IIAxl1 IAlllxl1 Vx E X, VA E OC. (iv) Ilx + yll ::; Ilxll + Ilyll Vx, y E X. If 11·11 is a norm on X, we say that (X, II II) is a linear normed space or simply that X is a normed space with norm II II. Let X be a linear space with norm II II. It is easy to show that the (i) (ii) (iii)
Ilxll Ilxll
(FINITENESS) E IR E (IDENTITY) ~ 0 and (I-HOMOGENEITY) = (TRIANGLE INEQUALITY)
function d : X x X
----+
IR+ given by
d(x, y) :=
Ilx - yll,
x,yEX,
satisfies the metric axioms, hence defines a distance on X, called the natural distance in the normed space (X, II II). Obviously, such a distance is translation invariant, i.e., d(x + z, Y + z) = d(x, y) Vx, y, z E X. Trivial examples of metric spaces are provided by the nonempty subsets of a metric space. If A is a subset of a metric space (X, d), then the restriction of d to A x A is trivially, a distance on A. We say that A is a metric space with the induced distance from X. 5.10~. For instance, the cylinder C := {(x, y, z) E ]R31 x 2 + y2 = I} is a metric space with the Euclidean distance that, for x,y E C, yields d(x,y) :=length of the chord joining x and y. The geodesic distance d g of Example 5.4, that is the length of the shortest path in C between x and y, defines another distance. C with the geodesic distance d g has to be considered as another metric space different from C with the Euclidean distance. A simple calculation shows that 1r
Ilx-yll:S: dg(x,y):S: "2llx-yll.
We shall now illustrate a few examples of metric spaces.
5.1 Metric Spaces
155
Figure 5.4. The ball centered at (0,0) of radius 1 in JR2 respectively, for the metrics d1, dl.3, d2 and d7. The unit ball centered at (0,0) of radius one for the metric doc is the square] - 1, l[x] - 1,1[.
a. Metrics on finite-dimensional vector spaces 5.11'. As we have already seen, JRn with the Euclidean distance Ix - yl is a metric space. More generally, any Euclidean or Hermitian vector space X is a normed space with norm given by Ilxll := J(xlx) cf. Chapter 3. X is therefore a metric space with the induced distance
d(x,y):= Ilx - yll, called the Euclidean distance of X. 5.12 , oo-distance. Set for x = (Xl, X2, .. . , Xn) E JRn
Ilxll oc := max(l x 11, IX21,···, Ixnl). Show that X --> Ilxlloc is a norm on JRn. Hence, JRn, equipped with the distance doc(x,y) := Ilx - ylloc, is a metric space different from the standard JRn with the Euclidean distance of Exercise 5.11. In JR 2 , the unit ball centered at (0,0) of radius one for the metric doc is the square] - 1, l[x] - 1,1[, see Figure 5.4.
5.13' p-distance. Given a real number P 2: 1, we set for x E JRn n
Ilxll p := (:ElxiI P )
1/ p.
i=l
Show that Ilxll p is a norm on JR n , hence dp(x, y) := Ilx - yllp is a distance on JRn. Observe that II 112 and d2 are the Euclidean norm and distance in JRn. In JR2, the unit ball centered at (0,0) of radius one for the metric dp for some values of p is shown in Figure 5.4. [Hint: The triangle inequality for the p-norm is called Minkowski's discrete inequality
156
5. Metric Spaces and Continuous Functions
which follows for instance from Minkowski's inequality for integral norms, see [GM1]. Alternatively, we can proceed as follows. Suppose a and b are nonzero, otherwise the inequality is trivial, apply the convexity inequality f(>'x+(l->')y) :'S >'f(x)+(l->')f(y) to f(t) = t P with x := a;/lla[[p, y := b;/llb[[p, >. := Iiallp/(Ilalip + [[blip), and sum on i from 1 to n, to get
lI a + bllp <1.] Iialip + Ilbll p 5.14' Product spaces. Let (XI,d(1)), (X2,d(2)), ... (Xn,d(n)) be n metric spaces and let Y = Xl X X2 X ... X X n be the Cartesian product of Xl, ... , X n . Show that each of the functions defined on X x Y by
dp(X, y) := (2:~=1 d(i) (Xi, y;)P riP, if p { d oo (x, y) := max { d(i) (Xi, Yi) = 1, ... , n}
Ii
> 1,
for x = (xl, x 2 , ... , xn), y = (y l , y2, ... , yn) E Y, are distances on Y. Notice that if Xl = ... = X n =]R with the Euclidean distance, Y is ]Rn, then the distances dp(x,y) are just the distances in Exercises 5.13 and 5.12. Also show that if {xd C Y, Xk := (xl;, x~, . .. xi:) V'k, and x = (xl, x 2 , . .. , x n ), then the following claims are equivalent. (i) There exists p :::: 1 such that dp(Xk' x)
->
0,
(ii) dp(Xk, x) -> 0 Vp :::: 1, (iii) doo(Xk,X) -> 0 Vp:::: 1, (iv) V'i=l, ... ,ndi(x~,xi)->O. 5.15 , Discrete distance. Let X be any set. The discrete distance on X is given by
d(x, y) =
{I
o
if xi- y, if x = y.
Show that the balls for the discrete distance are
I
B(x,r) = {y E X d(x,y)
< r}
=
{~}
if r :'S 1, if r :::: 1,
and that convergent sequences with respect to the discrete distance reduce to sequences that are definitively constant.
5.16' Codes distance. Let X be a set that we think of as a set of symbols, and let X n = X X X x ... x X the space of ordered words on n symbols. Given two words x = (Xl, X2, ... , xn) and y = (Yl, Y2, ... , Yn) E X n , let
d(x, y) :=
#{ i IXi i- Yi}
be the number of bits in x and y that are different. Show that d(x, y) defines a distance in Characterize the balls of relative to that distance. [Hint: Write d(x,y) = 2:~=1 d(Xi, Yi) where d is the discrete distance in X.J
xn.
xn
5.1 Metric Spaces
157
Tl IIEllaRlE E COIIUNICAZIONL
~IATEM
CIRC LO
... .IiI .... '.. ' ... III(""")')
TICO
01 PALERMO 11llrlOOVCT10II'
.............. '_ --, • QI,,,. . . -
,.... .. _
.... --,..-,--..-1111.......
_
........ -,..,..
rAooot-.~
.......-
_
__ 11
" - " - . ..
llOlV_ _
..
,
....
Jt~(~
_
--
_~
........ .-...,•
.......1I...-, ...1I ....
_ ...... ............ ........ - - ..--...-,...,..-..0 r..
....
.)...-_
1-._
.. .....--"--..... -~"
,.. •
_,.....
ea.,-
~IIen
c..,.
_ , - ...
-
,....,.
__
__
'n.._""f_ .....
.......
., ........ - -
...
-
_ _ I ..........
-" _~( ............. _...... _ r.,. .. c-.. _4
Gucer"
TOMO XXIt
..-
....--
--.~._-_
(It
DanTDn G •
..-.
1fI'I_
&.0_
...-e.-
,_J- •
-...
-..
-
~·-
....-.
__ -)
U q(4l.a_ "--of;
1
PALEawo, UQ, DIU" fOcurA
---.... ..
""'--
_"'-,,~
-'''--
.....
Figure 5.5. The first page of the These at the Faculty of Sciences of Paris by Maurice Frechet (1878-1973), published on the Rendiconti del Circolo Matematico di Palermo.
b. Metrics on spaces of sequences We now introduce some distances and norms on infinite-dimensional vector spaces. 5.11 Example (£00 space). Consider the space of all real (or complex) sequences x:= (Xl, ... ). For x = {x n }, y:= {Yn}, set
IIxlloo := sup n
Ixnl,
doo(x, y) := Ilx - ylloo.
It is easy to show that X -+ IIxlioo satisfies the axioms of a norm apart from the fact that it may take the value +00. Thus X -+ Ilxlioo is a norm in the space
£00 :=
{x = {Xn} 111x ll 00 < +oo},
that is, on the linear vector space of bounded sequences. Consequently,
is a distance on £00' called the uniform distance. Convergence of {xd in the uniform norm, called the uniform convergence, amounts to
IIxk - xll oo = sup Ixi.: i
xii
-+
0
as k
-+
c
£00 to x E £00 (5.1)
00,
where Xk = (xL x~, ... ) and x = (xl, x2 , ... ). Notice that the uniform convergence in (5.1) is stronger than the pointwise convergence as k For instance, let t.p(t)
:=
-+ 00.
te- t , t E R+, and consider the sequence of sequences {xd
where Xi:= {xi}n, xi:= t.p(~). Then Vi we have xi.: = ie-ilk
I
i )k i = 0,1,... Ilxk - 01100 = sup { ,/-'
}=
-+
0 as k
~1 '" O.
-+
00, while
158
5. Metric Spaces and Continuous Functions
Of course JRn with the metric d oo in Exercise 5.12 is a subset of f. oo endowed with the induced metric d oo . This follows from the identification (x1, ... ,x n )
(x1, ... ,xn,O, ... ,O, ... ).
+-+
5.18 Example (f.p spaces, p 2: 1). Consider the space of all real (or complex) sequences x:= (Xl, ... ). For 1 ::; p::; 00, x = {x n } and y:= {Yn} set
Trivially, IIxllip = 0 if and only if any element of the sequence x is zero, moreover Minkowski's inequality holds as it follows from Exercise 5.13 (passing to the limit as n -+ 00 in Minkowski's inequality in JRn). Thus II Ili p satisfies the metric axioms apart from the fact that it may take the value +00. Hence, II 1lip is a norm in the linear space of sequences f. p :=
{x =
{x n } IlIxllip
< +oo}.
Consequently, dip(X,y) := IIx - yllip is a distance on f. p . Convergence of {Xk} C f. p to x E f. p amounts to 00
L 14 - xil
P -+
0
as k
-+ 00,
i=l
where Xk = (xLx~, ... ) and x = (x l ,x 2 , ... ). Notice that JRn with the metric dp in Exercise 5.13 is a subset of f. p endowed with the induced metric dip' This follows for instance from the identification
Finally, observe that IIxlliq ::; Ilxllip Yx if 1::; P ::; q, hence
Since there exist sequences x = {x n } such that Ilxlliq < q, as for instance
<
+00
while Ilxllip =
+00
if
P
._ {~}l/P ,
X.-
n
the inclusions (5.2) are strict if 1 < P < q. The case p = 2 is particularly relevant since the f.2 norm is induced by the scalar product 00
(XIY)i2 := Lxiyi, i=l
f.2
is called the Hilbert coordinate space, and the set
the Hilbert cube.
5.1 Metric Spaces
159
I
Figure
5.6. Tubular neighborhood of the graph of I.
c. Metrics on spaces of functions The language of metric spaces is particularly relevant in dealing with different types of convergences of functions. As examples of metric spaces of functions, we then introduce a few normed spaces that are relevant in the sequel. 5.19 Example (Continuous functions). Denote by CO([O, 1]) the space of all continuous functions I : [O,IJ -+ JR.. For I : [0,1] -+ JR. set
11/1100,[0,1]:= sup I/(x)l· XE[O,l)
We have (i) 11/1100,[0,lJ
< +00 by Weierstrass's theorem, (ii) 11f1100,[0,l) = a iff I(x) = a "Ix, (iii) IIA 11100,[0,1) = IAlll/lloo,[o,l], (iv) III + glloo,[o,l) ::; 11/1100,[0,1) + Ilglloo,[o,l]' To prove (iv) for instance, observe that for all x E [0,1], we have I/(x)
+ g(x)1 ::; I/(x)1 + Ig(x)1 ::; 11/1100,[0,1] + Ilglloo,[o,l]
hence the right-hand side is an upperbound for the values of I + g. The map I -+ 11/1100,[0,1] is then a norm on CO([O, 1]), called the unilorm or infinity norm. Consequently CO ([0, 1]) is a normed space and a metric space with the unilorm distance
I,g E CO([O, 1]).
doo(f,g) := III - glloo [0 1] = max I/(t) - g(t)l, "
tE[O,l]
I
In this space, the ball B(f, €) of center functions 9 E CO([O, 1]) such that
Ig(x) - l(x)1
and radius € >
<€
a is the
set of all continuous
V x E [O,IJ
or the family of all continuous functions with graphs in the tubular neighborhood of radius € of the graph of I
U(f, €) := {(x, y) I x E [0,1], y E JR.,
Iy -
l(x)1
< €},
(5.3)
see Figure 5.6. The unilorm convergence in CO ([0, 1]), that is the convergence in the uniform norm, of Ud c CO([O, 1]) to I E CO([O, 1]) amounts to computing Mk
:= IIIk - 11100 [0 1J = max IIk(t) - l(t)1 "
for every k = 1,2, ... and to showing that Mk
tEIO,l) -+
a as
k
-+
+00.
160
5. Metric Spaces and Continuous Functions
k
-1
Figure 5.7. The function Ik in (5.4).
5.20 Example (Functions of class C 1 ([0, 1])). Denote by C 1 ([0, 1]) the space of all functions f: [0, 1] ~ lR of class Cr, see [GMl]. For f E C 1 ([0, 1]), set IlfIIC1([0,1]):=
+
sup [f(x)[ xE[O,l]
sup If'(x)! = Ilfl[oo,[o,l]
+ [If'IIoo,[o,l]'
xE[O,l]
It is easy to check that f ~ IIfII C 1([0,1]) is a norm in the vector space C1([0, 1]). Consequently, d C 1([0,1])(f,g):= Ilf - gl ([0,1])
Icr
defines a distance in C 1 ([0,1]). In this case, a function 9 E C 1 has a distance less than I" from f if Ilf - glloo,[o,lJ + IIf' - g'IIoo,[o,l] < 1"; equivalently, if the graph of 9 is in the tubular neighborhood U(f, q) of the graph of f, and the graph of g' is in the tubular neighborhood U(f', 1"2) of f' with q + 1"2 = 1", see (5.3). Moreover, convergence in the C1([0, 1])-norm of Ud c C 1([0, 1]) to f E C1([0, 1]), [Ifk - fIIC1([0,1]) ~ 0, amounts to
fk ~f { f~ ~f'
uniformly in [0,1], uniformly in [O,IJ.
Figures 5.8 and 5.9 show graphs of Lipschitz functions and functions of class C1 ([0, 1]) that are closer and closer to zero in the uniform norm, but with uniform norm of the derivatives larger than one.
5.21 Example (Integral metrics). Another norm and corresponding distance in CO ([0, 1]) is given by the distance in the mean
1
1
1
IIfl[L1([0,1]) :=
[f(t)[ dt,
d L 1([0,1])(f,g):= [1f-gllL1(0,1):= ![f-gl dX .
° 5.22 ~. Show that the L1- norm in CO ([0, 1]) satisfies the norm axioms. Convergence with respect to the L 1 -distance differs from the uniform one. For instance, for k = 1, 2, . .. set
Ik(x) =
{-ka
3
(IX I -
-b)
< -b, -b ::; Ixl < 1.
if 0< Ixl if
(5.4)
We have IIlklloo,[o,lj = 1(0) = k ~ +00 while IllkIIL1([0,1]) = 1/(2k) ~ 0, cf. Figure 5.7. More generally, the LP([O, 1])-norm, 1 ::; p < 00, on CO([O, 1]), is defined by
5.1 Metric Spaces
161
Figure 5.8. The Lebesgue example.
rl
11/[[LP(o,l):= ( Jo I/(x)[P dx
1---> 1[/IILP(o,l)
It turns out that
) lip
.
satisfies the axioms of a norm, hence lip
1
d LP ([o,1])(I, g) :=
III -
gliLv([o,l]) :=
(! II -
glP dx )
°
is a distance in CO([O, 1]); it is called the U([O, 1])-distance. 5.23'. Show that the LP([O, 1])-norm in CO([O, 1]) satisfies the norm axioms. [Hint: The triangle inequality is in fact Minkowski's inequality, see [GMl].]
5.1.3 Continuity and limits in metric spaces a. Lipschitz-continuous maps between metric spaces 5.24 Definition. Let (X, d x ) and (Y, dy ) be two metric spaces and let o < a :::; 1. We say that a junction f : X ----+ Y is a-Halder-continuous ij there exists L > 0 such that
dy(l(x),j(y)):::; Ldx(x,y)O:,
't/ x,yEX.
(5.5)
I-Holder-continuous junctions are also called Lipschitz continuous. The smallest constant L jor which (5.5) holds is called the a-Holder constant of f, often denoted by [f]o:. When a = 1, the I-Holder constant is also called the Lipschitz constant oj f and denoted by [f] 1, Lip f or Lip (I). 5.25 Example (The distance function). Let (X, d) be a metric space. For any xo E X, the function I(x) := d(x,xo) : X ---> lR is a Lipschitz-continuous function with Lip (I) = 1. In fact, from the triangle inequality, we get
I/(y) - I(x)[ = Id(y, xo) - d(x, xo)1 hence
I is Lipschitz continuous with Lip (I)
s d(x, y)
'tx,y E X,
s 1. Choosing x =
Xo, we have
I/(y) - l(xo)1 = Id(y, xo) - d(xo, xo)[ = dey, xo), thus Lip (I)
~
1.
162
5. Metric Spaces and Continuous Functions
Figure 5.9. On the left, the sequence A(x) := k- 1 cos(kx) that converges uniformly to zero with slopes equibounded by one. On the right, 9k(X) := k- 1 cos(k 2 x), that converges uniformly to zero, but with slopes that diverge to infinity. Given any function fECI ([0,1]), a similar phenomenon occurs for the sequences A(x) := f(kx)/k, 9k(X) = f(k 2 x)/k.
5.26 ¶ Distance from a set. Let $(X,d)$ be a metric space. The distance function from $x\in X$ to a nonempty subset $A\subset X$ is defined by
$$d(x,A) := \inf\{d(x,y) \mid y\in A\}.$$
It is easy to show that $f(x) := d(x,A) : X\to\mathbb{R}$ is a Lipschitz-continuous function with
$$\operatorname{Lip}(f) = \begin{cases} 0 & \text{if } d(x,A) = 0\ \forall x,\\ 1 & \text{otherwise.}\end{cases}$$
If $d(x,A)$ is identically zero, the claim is trivial. On the other hand, for any $x,y\in X$ and $z\in A$ we have $d(x,z)\le d(x,y)+d(y,z)$; hence, taking the infimum in $z$,
$$d(x,A) - d(y,A) \le d(x,y)$$
and, interchanging $x$ and $y$, $|d(x,A)-d(y,A)|\le d(x,y)$; that is, $x\mapsto d(x,A)$ is Lipschitz continuous with Lipschitz constant not greater than one. If instead there exists $x\notin A$ with $d(x,A)>0$, there exists a sequence $\{z_n\}\subset A$ such that $d(x,z_n) < \left(1+\frac1n\right) d(x,A)$. Therefore
$$|d(x,A) - d(z_n,A)| = d(x,A) > \frac{n}{n+1}\, d(x,z_n),$$
from which we infer that the Lipschitz constant of $x\mapsto d(x,A)$ cannot be smaller than one.
b. Continuous maps in metric spaces

The notion of continuity that we introduced in [GM1], [GM2] for functions of one real variable can be extended to the context of the abstract metric structure. In fact, by paraphrasing the definition of continuity of functions $f:\mathbb{R}\to\mathbb{R}$, we get the following.
5.27 Definition. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. We say that $f:X\to Y$ is continuous at $x_0$ if $\forall\epsilon>0$ there exists $\delta>0$ such that $d_Y(f(x),f(x_0))<\epsilon$ whenever $d_X(x,x_0)<\delta$, i.e.,
$$\forall\epsilon>0\ \exists\delta>0 \text{ such that } f(B_X(x_0,\delta)) \subset B_Y(f(x_0),\epsilon). \tag{5.6}$$
We say that $f:X\to Y$ is continuous in $E\subset X$ if $f$ is continuous at every point $x_0\in E$. When $E = X$ and $f:X\to Y$ is continuous at any point of $X$, we simply say that $f:X\to Y$ is continuous.

5.28 ¶. Show that $\alpha$-Hölder-continuous functions, $0<\alpha\le1$, in particular Lipschitz-continuous functions, between two metric spaces are continuous.
Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and $E\subset X$. Since $E$ is a metric space with the distance induced by $X$, Definition 5.27 also applies to the function $f:E\to Y$. Thus $f:E\to Y$ is continuous at $x_0\in E$ if
$$\forall\epsilon>0\ \exists\delta>0 \text{ such that } f(B_X(x_0,\delta)\cap E) \subset B_Y(f(x_0),\epsilon), \tag{5.7}$$
and we say that $f:E\to Y$ is continuous if $f:E\to Y$ is continuous at any point $x_0\in E$.

5.29 Remark. As in the case of functions of one real variable, the domain of the function $f$ is relevant in order to decide whether $f$ is continuous or not. For instance, $f:X\to Y$ is continuous in $E\subset X$ if, for every $x_0\in E$,
$$\forall\epsilon>0\ \exists\delta>0 \text{ such that } f(B_X(x_0,\delta)) \subset B_Y(f(x_0),\epsilon), \tag{5.8}$$
while the restriction $f_{|E}:E\to Y$ of $f$ to $E$ is continuous in $E$ if, for every $x_0\in E$,
$$\forall\epsilon>0\ \exists\delta>0 \text{ such that } f(B_X(x_0,\delta)\cap E) \subset B_Y(f(x_0),\epsilon). \tag{5.9}$$
We deduce that the restriction property holds: if $f:X\to Y$ is continuous in $E$, then its restriction $f_{|E}:E\to Y$ is continuous. The converse is in general false, as simple examples show.
5.30 Proposition. Let $X, Y, Z$ be three metric spaces and $x_0\in X$. If $f:X\to Y$ is continuous at $x_0$ and $g:Y\to Z$ is continuous at $f(x_0)$, then $g\circ f:X\to Z$ is continuous at $x_0$. In particular, the composition of two continuous functions is continuous.

Proof. Let $\epsilon>0$. Since $g$ is continuous at $f(x_0)$, there exists $\sigma>0$ such that $g(B_Y(f(x_0),\sigma)) \subset B_Z(g(f(x_0)),\epsilon)$. Since $f$ is continuous at $x_0$, there exists $\delta>0$ such that $f(B_X(x_0,\delta)) \subset B_Y(f(x_0),\sigma)$; consequently
$$g\circ f(B_X(x_0,\delta)) \subset g(B_Y(f(x_0),\sigma)) \subset B_Z(g\circ f(x_0),\epsilon). \qquad\square$$

Continuity can be expressed in terms of convergent sequences. As in the proof of Theorem 2.46 of [GM2], one shows the following.

5.31 Theorem. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. $f:X\to Y$ is continuous at $x_0\in X$ if and only if $f(x_n)\to f(x_0)$ in $(Y,d_Y)$ whenever $\{x_n\}\subset X$, $x_n\to x_0$ in $(X,d_X)$.
c. Limits in metric spaces

Related to the notion of continuity is the notion of limit. Again, we want to rephrase $f(x)\to y_0$ as $x\to x_0$. For that we need $f$ to be defined near $x_0$, but not necessarily at $x_0$. For this purpose we introduce the following.

5.32 Definition. Let $X$ be a metric space and $A\subset X$. We say that $x_0\in X$ is an accumulation point of $A$ if each ball centered at $x_0$ contains at least one point of $A$ distinct from $x_0$,
$$\forall r>0 \qquad B(x_0,r)\cap A\setminus\{x_0\} \ne \emptyset.$$
Accumulation points are also called cluster points.

5.33 ¶. Consider $\mathbb{R}$ with the Euclidean metric. Show that
(i) the set of accumulation points of $A := ]a,b[$, $B = [a,b]$, $C = [a,b[$ is the closed interval $[a,b]$,
(ii) the set of accumulation points of $A := ]0,1[\cup\{2\}$, $B = [0,1]\cup\{2\}$, $C = [0,1[\cup\{2\}$ is the closed interval $[0,1]$,
(iii) the set of accumulation points of the rational numbers and of the irrational numbers is the whole of $\mathbb{R}$.
We shall return to this notion, but for the moment the definition suffices.
5.34 Definition. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, let $E\subset X$ and let $x_0\in X$ be a point of accumulation of $E$. Given $f:E\setminus\{x_0\}\to Y$, we say that $y_0\in Y$ is the limit of $f(x)$ as $x\to x_0$, $x\in E$, and we write
$$f(x)\to y_0 \text{ as } x\to x_0, \qquad\text{or}\qquad \lim_{\substack{x\to x_0\\ x\in E}} f(x) = y_0,$$
if for any $\epsilon>0$ there exists $\delta>0$ such that $d_Y(f(x),y_0)<\epsilon$ whenever $x\in E$ and $0<d_X(x,x_0)<\delta$. Equivalently,
$$\forall\epsilon>0\ \exists\delta>0 \text{ such that } f(B_X(x_0,\delta)\cap E\setminus\{x_0\}) \subset B_Y(y_0,\epsilon).$$
Notice that, while in order to deal with the continuity of $f$ at $x_0$ we only need $f$ to be defined at $x_0$, when we deal with the notion of limit we only need $x_0$ to be a point of accumulation of $E$. These two requirements are unrelated, since not all points of $E$ are points of accumulation and not all points of accumulation of $E$ are in $E$, see, e.g., Exercise 5.33. Moreover, the condition $0<d_X(x,x_0)$ in the definition of limit expresses the fact that we can disregard the value of $f$ at $x_0$ (in case $f$ is defined at $x_0$). Also notice that the limit is unique if it exists and that limits are preserved by restriction. To be precise, we have the following.
5.35 Proposition. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. Suppose $F\subset E\subset X$ and let $x_0\in X$ be a point of accumulation of $F$. If $f(x)\to y$ as $x\to x_0$, $x\in E$, then $f(x)\to y$ as $x\to x_0$, $x\in F$.

5.36 ¶. As for functions of one variable, the notions of limit and continuity are strongly related. Show the following.
Proposition. Let $X$ and $Y$ be two metric spaces, $E\subset X$ and $x_0\in X$.
(i) If $x_0$ belongs to $E$ and is not a point of accumulation of $E$, then every function $f:E\to Y$ is continuous at $x_0$.
(ii) Suppose that $x_0$ belongs to $E$ and is a point of accumulation of $E$. Then
a) $f:E\to Y$ is continuous at $x_0$ if and only if $f(x)\to f(x_0)$ as $x\to x_0$, $x\in E$,
b) $f(x)\to y$ as $x\to x_0$, $x\in E$, if and only if the function $g:E\cup\{x_0\}\to Y$ defined by
$$g(x) := \begin{cases} f(x) & \text{if } x\in E\setminus\{x_0\},\\ y & \text{if } x = x_0\end{cases}$$
is continuous at $x_0$.
We conclude with a change of variable theorem for limits, see e.g. Proposition 2.27 of [GM1] and Example 2.49 of [GM2].

5.37 Proposition. Let $X, Y, Z$ be metric spaces, $E\subset X$ and let $x_0$ be a point of accumulation of $E$. Let $f:E\to Y$, $g:f(E)\to Z$ be two functions and suppose that $y_0$ is an accumulation point of $f(E)$. If
(i) $g(y)\to L$ as $y\to y_0$, $y\in f(E)$,
(ii) $f(x)\to y_0$ as $x\to x_0$, $x\in E$,
(iii) either $f(x_0) = y_0$, or $f(x)\ne y_0$ for all $x\in E$ with $x\ne x_0$,
then $g(f(x))\to L$ as $x\to x_0$, $x\in E$.
d. The junction property

A property we have just hinted at in the case of real functions is the junction property, see Section 2.1.2 of [GM1], which is more significant for functions of several variables. Let $X$ be a set. We say that a family $\{U_\alpha\}$ of subsets of a metric space is locally finite at a point $x_0\in X$ if there exists $r>0$ such that $B(x_0,r)$ meets at most a finite number of the $U_\alpha$'s.

5.38 Proposition. Let $(X,d_X)$, $(Y,d_Y)$ be metric spaces, $f:X\to Y$ a function, $x_0\in X$, and let $\{U_\alpha\}$ be a family of subsets of $X$ locally finite at $x_0$.
(i) Suppose that $x_0$ is a point of accumulation of $U_\alpha$ and that $f(x)\to y$ as $x\to x_0$, $x\in U_\alpha$, for all $\alpha$. Then $f(x)\to y$ as $x\to x_0$, $x\in X$.
(ii) If $x_0\in\bigcap_\alpha U_\alpha$ and $f:U_\alpha\subset X\to Y$ is continuous at $x_0$ for all $\alpha$, then $f:X\to Y$ is continuous at $x_0$.

5.39 ¶. Prove Proposition 5.38.

5.40 Example. An assumption on the covering is necessary in order that the conclusions of Proposition 5.38 hold. Set $A := \{(x,y) \mid x^2 < y < 2x^2\}\subset\mathbb{R}^2$ and
$$f(x,y) := \begin{cases} 1 & \text{if } (x,y)\in A,\\ 0 & \text{otherwise.}\end{cases}$$
The function $f$ is discontinuous at $x_0 := (0,0)$, since its oscillation is one in every ball centered at $x_0$. Denote by $U_m$ the straight line through the origin
$$U_m := \{(x,y) \mid y = mx\}, \quad m\in\mathbb{R}, \qquad U_\infty := \{(x,y) \mid x = 0\}.$$
The $U_\alpha$'s, $\alpha\in\mathbb{R}\cup\{\infty\}$, form a covering of $\mathbb{R}^2$ that is not locally finite at $x_0$ and, for any $\alpha\in\mathbb{R}\cup\{\infty\}$, the restriction of $f$ to $U_\alpha$ is zero near the origin. In particular, each restriction $f_{|U_\alpha}:U_\alpha\to\mathbb{R}$ is continuous at the origin.
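The failure in Example 5.40 can also be seen numerically: along every line through the origin $f$ vanishes near $0$, while the points $(1/n, 1.5/n^2)$ lie in $A$, converge to the origin and give $f = 1$. A small sketch; the sampled slopes and points are arbitrary choices:

```python
def f(x, y):
    return 1.0 if x**2 < y < 2 * x**2 else 0.0

# along any line y = m x the function vanishes for small x ...
for m in (0.0, 1.0, 10.0, 100.0):
    print(m, [f(x, m * x) for x in (1e-1, 1e-3, 1e-6)])

# ... but between the two parabolas f is identically 1 arbitrarily close to (0,0)
print([f(1 / n, 1.5 / n**2) for n in (10, 1000, 100000)])
```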
5.1.4 Functions from $\mathbb{R}^n$ into $\mathbb{R}^m$

It is important to become acquainted with the limit notion we have just introduced in an abstract context. For this purpose, in this section we shall focus on mappings between Euclidean spaces and illustrate with a few examples some of the abstract notions previously introduced.
a. The vector space $C^0(A,\mathbb{R}^m)$

Denote by $e^i:\mathbb{R}^n\to\mathbb{R}$ the linear map that maps $x = (x^1,x^2,\dots,x^n)\in\mathbb{R}^n$ into its $i$th component, $e^i(x) := x^i$. Any map $f:X\to\mathbb{R}^n$ from a set $X$ into $\mathbb{R}^n$ writes as an $n$-tuple of real-valued functions $f(x) = (f^1(x),\dots,f^n(x))$, where for any $i=1,\dots,n$ the function $f^i:X\to\mathbb{R}$ is given by $f^i(x) := e^i(f(x))$. From
$$|y_1|, |y_2|, \dots, |y_n| \le |y| \le \sum_{i=1}^n|y_i|, \qquad y\in\mathbb{R}^n,$$
we readily infer the following.
5.41 Proposition. The following claims hold.
(i) The maps $e^i:\mathbb{R}^n\to\mathbb{R}$, $i=1,\dots,n$, are Lipschitz continuous.
(ii) Let $(X,d)$ be a metric space. Then
a) $f:X\to\mathbb{R}^n$ is continuous at $x_0\in X$ if and only if all its components $f^1, f^2, \dots, f^n$ are continuous at $x_0$,
b) if $f,g:X\to\mathbb{R}^n$ are continuous at $x_0$, then $f+g:X\to\mathbb{R}^n$ is continuous at $x_0$,
c) if $f:X\to\mathbb{R}^n$ and $\lambda:X\to\mathbb{R}$ are continuous at $x_0$, then the map $\lambda f:X\to\mathbb{R}^n$ defined by $\lambda f(x) := \lambda(x)f(x)$ is continuous at $x_0$.

5.42 Example. The function $f:\mathbb{R}^3\to\mathbb{R}$, $f(x,y,z) := \sin(x^2y+z^2)$, is continuous in $\mathbb{R}^3$. In fact, if $x_0 := (x_0,y_0,z_0)$, then the coordinate functions $x = (x,y,z)\mapsto x$, $x\mapsto y$, $x\mapsto z$ are continuous at $x_0$ by Proposition 5.41. By Proposition 5.41 (ii) c), $x\mapsto x^2y$ and $x\mapsto z^2$ are continuous at $x_0$, and by Proposition 5.41 (ii) b), $x\mapsto x^2y+z^2$ is continuous at $x_0$. Finally, $\sin(x^2y+z^2)$ is continuous since $\sin$ is continuous.
5.43 Definition. Let $X$ and $Y$ be two metric spaces. We denote by $C^0(X,Y)$ the class of all continuous functions $f:X\to Y$.
As a consequence of Proposition 5.41, $C^0(X,\mathbb{R}^m)$ is a vector space. Moreover, if $\lambda\in C^0(X,\mathbb{R})$ and $f\in C^0(X,\mathbb{R}^m)$, then $\lambda f:X\to\mathbb{R}^m$ given by $\lambda f(x) := \lambda(x)f(x)$, $x\in X$, belongs to $C^0(X,\mathbb{R}^m)$. In particular,

5.44 Corollary. Polynomials in $n$ variables belong to $C^0(\mathbb{R}^n,\mathbb{R})$. Therefore, maps $f:\mathbb{R}^n\to\mathbb{R}^m$ whose components are polynomials in $n$ variables are continuous. In particular, linear maps $L\in\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$ are continuous. It is worth noticing that in fact
5.45 Proposition. Let $L:\mathbb{R}^n\to\mathbb{R}^m$ be linear. Then $L$ is Lipschitz continuous in $\mathbb{R}^n$.

Proof. As $L$ is linear, we have
$$\operatorname{Lip}(L) := \sup_{\substack{x,y\in\mathbb{R}^n\\ x\ne y}} \frac{\|L(x)-L(y)\|_{\mathbb{R}^m}}{\|x-y\|_{\mathbb{R}^n}} = \sup_{\substack{x,y\in\mathbb{R}^n\\ x\ne y}} \frac{\|L(x-y)\|_{\mathbb{R}^m}}{\|x-y\|_{\mathbb{R}^n}} = \sup_{0\ne z\in\mathbb{R}^n} \frac{\|L(z)\|_{\mathbb{R}^m}}{\|z\|_{\mathbb{R}^n}} =: \|L\|.$$
Let us prove that $\|L\| < +\infty$. Since $L$ is continuous at zero by Corollary 5.44, there exists $\delta>0$ such that $\|L(w)\| < 1$ whenever $\|w\| < \delta$. For any nonzero $z\in\mathbb{R}^n$, set $w := \frac{\delta}{2\|z\|}z$. Since $\|w\| < \delta$, we have $\|L(w)\| < 1$. Therefore, writing $z = \frac{2\|z\|}{\delta}w$ and using the linearity of $L$,
$$\|L(z)\| = \left\|\frac{2\|z\|}{\delta}L(w)\right\| = \frac{2\|z\|}{\delta}\|L(w)\| < \frac{2}{\delta}\|z\|,$$
hence $\|L\| \le \frac{2}{\delta} < +\infty$. $\square$
For a more detailed description of linear maps in normed spaces, see Chapters 9 and 10.
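Numerically, the quantity $\|L\|$ appearing in the proof of Proposition 5.45 can be estimated by sampling unit vectors; for the Euclidean norms it coincides with the largest singular value of the matrix representing $L$, which the sketch below uses as a cross-check. This is only an illustration, with a randomly chosen matrix, and is not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 2))                 # a linear map L : R^2 -> R^3

# estimate sup_{z != 0} |L z| / |z| by sampling directions on the unit circle
z = rng.standard_normal((2, 100_000))
z /= np.linalg.norm(z, axis=0)
ratios = np.linalg.norm(L @ z, axis=0)          # |Lz| with |z| = 1
print("sampled sup |Lz|/|z|  :", ratios.max())
print("largest singular value:", np.linalg.svd(L, compute_uv=False)[0])
```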
b. Some nonlinear continuous transformations from $\mathbb{R}^n$ into $\mathbb{R}^N$

We now present a few examples of nonlinear continuous transformations between Euclidean spaces.

5.46 Example. For $k = 0,1,\dots$ consider the map $u_k : ]-1,1[\to\mathbb{R}^2$ given by
$$u_k(t) = \begin{cases} (\cos kt, \sin kt) & \text{if } t\in]0,2\pi/k[,\\ (1,0) & \text{otherwise.}\end{cases}$$
This is a Lipschitz function whose graph is given in Figure 5.10. Notice that the graph of $u_k = \{(t,u_k(t))\}$ is a curve that "converges" as $k\to\infty$ to a horizontal line plus a vertical circle at $0$. Compare with the function $\operatorname{sgn} x$ from $\mathbb{R}$ to $\mathbb{R}$.
Figure 5.10. The function $u_k$ in Example 5.46.
5.47 Example (Stereographic projection). Let
$$S^n := \{x\in\mathbb{R}^{n+1} \mid |x| = 1\}$$
be the unit sphere in $\mathbb{R}^{n+1}$. If $x = (x_1,\dots,x_n,x_{n+1})\in\mathbb{R}^{n+1}$, let us denote the coordinates of $x$ by $(y,z)$ where $y = (x_1,x_2,\dots,x_n)\in\mathbb{R}^n$ and $z = x_{n+1}\in\mathbb{R}$. With this notation, $S^n = \{(y,z)\in\mathbb{R}^n\times\mathbb{R} \mid |y|^2+z^2 = 1\}$. Furthermore, denote by $P_S = (0,-1)\in S^n$ the South pole of $S^n$. The stereographic projection (from the South pole) is the map that projects the sphere from the South pole onto the plane $\{z = 0\}$,
$$\sigma : S^n\setminus\{P_S\}\subset\mathbb{R}^{n+1}\to\mathbb{R}^n, \qquad (y,z)\mapsto \frac{y}{1+z}.$$
It is easily seen that $\sigma$ is injective, surjective and continuous with a continuous inverse given by
$$\sigma^{-1} : \mathbb{R}^n\to S^n\setminus\{P_S\}, \qquad \sigma^{-1}(x) := \left(\frac{2x}{1+|x|^2}, \frac{1-|x|^2}{1+|x|^2}\right),$$
that maps $x\in\mathbb{R}^n$ into the point of $S^n$ lying on the segment joining the South pole of $S^n$ with $x$, see Figure 5.11.
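The two formulas can be checked numerically, say for $n = 2$: the image of $\sigma^{-1}$ lies on the unit sphere and $\sigma\circ\sigma^{-1}$ is the identity. A minimal sketch with random test points of our own choosing:

```python
import numpy as np

def sigma(y, z):                       # S^n \ {P_S} -> R^n
    return y / (1.0 + z)

def sigma_inv(x):                      # R^n -> S^n \ {P_S}
    s = np.dot(x, x)
    return 2 * x / (1 + s), (1 - s) / (1 + s)

rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(2)
    y, z = sigma_inv(x)
    # the image lies on the unit sphere and sigma inverts sigma_inv
    print(np.dot(y, y) + z**2, np.allclose(sigma(y, z), x))
```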
5.48 Example (Polar coordinates). The transformation
$$\sigma : \Sigma := \{(\rho,\theta) \mid \rho>0,\ 0\le\theta<2\pi\}\to\mathbb{R}^2, \qquad (\rho,\theta)\mapsto(\rho\cos\theta,\rho\sin\theta)$$
defines a map that is injective and continuous with range $\mathbb{R}^2\setminus\{0\}$. The extension of the map to the third coordinate
$$\widetilde\sigma : \Sigma\times\mathbb{R}\to\mathbb{R}^2\times\mathbb{R}\simeq\mathbb{R}^3, \qquad (\rho,\theta,z)\mapsto(\rho\cos\theta,\rho\sin\theta,z)$$
defines the so-called cylindrical coordinates in $\mathbb{R}^3$.
5.49 Example (Spherical coordinates). The representation of points $(x,y,z)\in\mathbb{R}^3$ as
$$x = \rho\sin\varphi\cos\theta, \qquad y = \rho\sin\varphi\sin\theta, \qquad z = \rho\cos\varphi,$$
see Figure 5.12, defines the spherical coordinates in $\mathbb{R}^3$. This in turn defines a continuous transformation $(\rho,\theta,\varphi)\mapsto(x,y,z)$ of $\{(\rho,\theta,\varphi) \mid \rho>0,\ 0\le\theta<2\pi,\ 0\le\varphi\le\pi\}$ into $\mathbb{R}^3$.
Figure 5.11. The stereographic projection from the South pole.
Complex-valued functions of one complex variable provide examples of transformations of the plane.

5.50 Example ($w = z^n$). The map $z\mapsto z^n$ defines a continuous transformation of $\mathbb{C}$ to $\mathbb{C}$. The inverse image of each nonzero $w\in\mathbb{C}$ is made of $n$ distinct points, given by the $n$-th roots of $w$; those $n$ points collapse to zero when $w = 0$. If we write the transformation $w = z^n$ as $|w| = |z|^n$, $\operatorname{Arg} w = n\operatorname{Arg} z$, we see, identifying $\mathbb{C}$ with $\mathbb{R}^2$, that the circle of radius $r$ and center $0$ is mapped onto the circle of radius $r^n$ and center $0$. Moreover, if a point goes clockwise along the circle, then the normalized image point $w/|w|$ goes along the unit circle clockwise $n$ times.
5.51 ¶. The map $z\mapsto w = z^n$ restricted to $\{\varphi_0 < \operatorname{Arg} z < \varphi_1\}$ with $0 < \varphi_1 - \varphi_0 \le 2\pi/n$ is injective and continuous.

5.52 ¶. Show that the map $z\mapsto w = z^2$ maps the family of lines parallel to the axes (but not the axes themselves) into two families of parabolas having the real axis as common axis and the origin as common focus, see Figure 5.13.
5.53 Example (The Joukowski function). This is the map
$$\lambda(z) := \frac12\left(z+\frac1z\right), \qquad z\ne0,$$
Figure 5.12. Spherical coordinates.
Figure 5.13. The transformation w = z2 maps families of lines parallel to the axes, except for the axes, into two families of parabolas with the common axis as the real axis and the common foci at the origin.
which appears in several problems of aerodynamics. It is a continuous function defined in $\mathbb{C}\setminus\{0\}$. Since $\lambda(z) = \lambda(1/z)$, every point $w\ne\pm1,0$ has at most, and in fact exactly, two distinct inverse images $z_1, z_2$, satisfying $z_1z_2 = 1$.
5.54 ¶. Show that $\lambda(z) = \frac12(z+1/z)$ is one-to-one from $\{|z|<1,\ z\ne0\}$ or from $\{|z|>1\}$ onto the complement of the segment $\{w \mid \Im w = 0,\ -1\le\Re w\le1\}$. $\lambda$ maps the family of circles $\{z \mid |z| = r\}$, $0<r<1$, into a family of confocal ellipses and the diameters $z = te^{i\alpha}$, $-1<t<1$, $0<\alpha<\pi$, into a family of confocal hyperbolas, see Figure 5.14.

5.55 Example (The Möbius transformations). These maps, defined by
$$L(z) := \frac{az+b}{cz+d}, \qquad ad-bc\ne0, \tag{5.10}$$
are continuous and injective from $\mathbb{C}\setminus\{-d/c\}$ into $\mathbb{C}\setminus\{a/c\}$ and have several relevant properties that we list below, asking the reader to show that they hold.
5.56 ¶. Show the following.
(i) $L(z)\to a/c$ as $|z|\to\infty$ and $|L(z)|\to\infty$ as $z\to-d/c$. Because of this, we write $L(\infty) = a/c$, $L(-d/c) = \infty$ and say that $L$ is continuous from $\mathbb{C}\cup\{\infty\}$ into itself.
(ii) Every rational function, i.e., the quotient of two complex polynomials, defines a continuous transformation of $\mathbb{C}\cup\{\infty\}$ into itself, as in (i).
(iii) The Möbius transformations $L(z)$ in (5.10) are the only rational functions from $\mathbb{C}\cup\{\infty\}$ into itself that are injective.
(iv) The Möbius transformations $(a_iz+b_i)/(c_iz+d_i)$, $i = 1,2$, are identical if and only if $(a_1,b_1,c_1,d_1)$ is a nonzero multiple of $(a_2,b_2,c_2,d_2)$.
(v) The Möbius transformations form a group $G$ with respect to the composition of maps; the subset $H\subset G$, $H := \{z,\ 1-z,\ 1/z,\ 1/(1-z),\ (z-1)/z,\ z/(z-1)\}$, is a subgroup of $G$.
(vi) A Möbius transformation maps straight lines and circles into straight lines and circles (show this first for the map $1/z$, taking into account that the equations of straight lines and circles have the form $A(x^2+y^2)+2Bx+2Cy+D = 0$ if $z = x+iy$).
(vii) The map in (5.10) maps circles and straight lines through $-d/c$ into straight lines and any other straight line or circle into a circle.
Figure 5.14. The Joukowski function maps circles $|z| = r$, $0<r<1$, and diameters $z = te^{i\alpha}$, $0\le t<1$, $0<\alpha<2\pi$, respectively into a family of ellipses and a family of confocal hyperbolas.
(viii) The only Möbius transformation with at least three fixed points is the identity $z$. Two Möbius transformations are equal if they agree at three distinct points. There is a unique Möbius transformation that maps three distinct points $z_1,z_2,z_3\in\mathbb{C}\cup\{\infty\}$ into three distinct points $w_1,w_2,w_3\in\mathbb{C}\cup\{\infty\}$.
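Property (v) reflects the fact that composition of Möbius transformations corresponds to multiplication of their coefficient matrices. The following sketch checks this identification numerically at a few random points; it is an illustration under that standard correspondence, not part of the text.

```python
import numpy as np

def moebius(M):
    a, b, c, d = M.ravel()
    return lambda z: (a * z + b) / (c * z + d)

rng = np.random.default_rng(2)
M1 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
M2 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

L1, L2, L12 = moebius(M1), moebius(M2), moebius(M1 @ M2)
z = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(np.allclose(L1(L2(z)), L12(z)))           # composition <-> matrix product
```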
5.57 Example (Exponential and logarithm). The complex function $z\mapsto\exp z$, see [GM2], is continuous from $\mathbb{C}$ to $\mathbb{C}$, periodic of period $2\pi i$ and with image $\mathbb{C}\setminus\{0\}$. In particular, $e^z$ does not vanish and every nonzero $w$ has infinitely many preimages.

5.58 ¶. Taking into account what we have proved in [GM2], show the following.
(i) $w = e^z$ is injective with a continuous inverse in every strip parallel to the real axis of width $h\le2\pi$, and its image is the interior of an angle of $h$ radians with vertex at the origin;
(ii) $w = e^z$ maps every straight line which is not parallel to the axes into a logarithmic spiral, see Chapter 7.
c. The calculus of limits for functions of several variables

Though we may have appeared pedantic, we have always insisted on specifying the domain $E\subset X$ in which the independent variables vary. This is in fact particularly relevant when dealing with limits and continuity of functions of several variables, as in this case there are several reasonable ways of approaching a point $x_0$. Different choices may, and in general do, lead to different answers concerning the existence and/or the equality of the limits
$$\lim_{\substack{x\to x_0\\ x\in E}} f(x) \qquad\text{and}\qquad \lim_{\substack{x\to x_0\\ x\in F}} f(x).$$
Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, $f:X\to Y$ and $x_0\in X$ a point of accumulation of $X$.
Figure 5.15. The function in Example 5.59.
(i) If we find two sets $E_1$, $E_2$ such that $x_0$ is an accumulation point of both $E_1$ and $E_2$, and the restrictions $f:E_1\subset X\to Y$ and $f:E_2\subset X\to Y$ of $f$ have different limits, then $f$ has no limit as $x\to x_0$, $x\in E_1\cup E_2$.
(ii) If we want to show that $f(x)$ has a limit as $x\to x_0$, we may
a) guess a possible limit $y_0\in Y$, for instance computing the limit $y_0$ of a suitable restriction of $f$,
b) show that the real-valued function $x\mapsto d_Y(f(x),y_0)$ converges to zero as $x\to x_0$, for instance proving that
$$d_Y(f(x),y_0) \le h(x) \qquad \text{for all } x\in X,\ x\ne x_0,$$
where $h:X\to\mathbb{R}$ is such that $h(x)\to0$ as $x\to x_0$.

5.59 Example. Let $f:\mathbb{R}^2\setminus\{(0,0)\}\to\mathbb{R}$ be defined by $f(x,y) := xy/(x^2+y^2)$ for $(x,y)\ne(0,0)$. Let us show that $f$ has no limit as $(x,y)\to(0,0)$. By contradiction, suppose that $f(x,y)\to L\in\mathbb{R}$ as $(x,y)\to(0,0)$. Then for any sequence $\{(x_n,y_n)\}\subset\mathbb{R}^2\setminus\{(0,0)\}$ converging to $(0,0)$ we find $f(x_n,y_n)\to L$. Choosing $(x_n,y_n) := (1/n, k/n)$, we have
$$f\left(\frac1n,\frac kn\right) = \frac{k}{1+k^2},$$
hence, as $n\to\infty$, $L = k/(1+k^2)$. Since $k$ is arbitrary, we have a contradiction. This is even more evident if we observe that $f$ is positively homogeneous of degree $0$, i.e., $f(\lambda x,\lambda y) = f(x,y)$ for all $\lambda>0$, i.e., $f$ is constant along half-lines from the origin, see Figure 5.15. It is then clear that $f$ has a limit at $(0,0)$ if and only if $f$ is constant, which is not the case. Notice that from the inequality $2xy\le x^2+y^2$ we can easily infer that $|f(x,y)|\le1/2$ for all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$, i.e., that $f$ is a bounded function.
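Numerically, the dependence on the slope $k$ in Example 5.59 is immediate; a small check, with slopes and sample points chosen arbitrarily:

```python
def f(x, y):
    return x * y / (x**2 + y**2)

for k in (0.0, 0.5, 1.0, 2.0):
    # approach the origin along the points (1/n, k/n)
    values = [f(1 / n, k / n) for n in (10, 1000, 100000)]
    print(k, values, "->", k / (1 + k**2))
```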
5.60 Example. Let $f(x,y) := \sin(x^2y)/(x^2+y^2)$ for $(x,y)\ne(0,0)$. In this case $(1/n,0)\to(0,0)$ and $f(1/n,0) = 0$. Thus $0$ is the only possible limit as $(x,y)\to(0,0)$; and, in fact, it is, since
$$|f(x,y) - 0| = \frac{|\sin(x^2y)|}{x^2+y^2} \le \frac{|x|\,|xy|}{x^2+y^2} \le \frac{|x|}{2} \to 0$$
as $(x,y)\to(0,0)$. Here we used $|\sin t|\le|t|$ $\forall t$, $2|x|\,|y|\le x^2+y^2$ $\forall(x,y)$, and that $(x,y)\mapsto|x|$ is a continuous map in $\mathbb{R}^2$, see Proposition 5.41.
173
We can also consider the restriction of f to continuous paths from xo, i.e., choose a map cp : [0,1] -+ JR.2 that is continuous at least at with cp(o) = Xo and cp(t) =f- Xo for t =f- and compute, if possible
°
°
lim f(cp(t)).
t---.o+
Such limits mayor may not exist and their values depend on the chosen path, for a fixed f. Of course, if lim f(x)
X---+XO
= L,
xEE
then, on account of the restriction property and of the change of variable theorem, and lim f(x) = L X---+XQ xEF
respectively for any FeE of which Xo remains a point of accumulation and for any continuous path in E, cp([O, 1]) C E. 5.61 Example. Let us reconsider the function
f : JR2 \ {(O, On
xy f(x,y) := --2---2 x +y
JR,
->
which is continuous in JR2 \ {(O, On. Suppose that we move from zero along the straight line {(x, y) I y = mx, x E JR} that we parametrize by x -> (x, mx). Then
f(
m
= f(x,mx) = - - 2 l+m
m -> - - 2 '
l+m
as x
->
0,
in particular, the previous limit depends on m, hence f(x, y) has no limit as (x, y)
->
(0,0).
Set E := {(x, y) I x E JR, 0
AX 2 }.
lim
(x,y)-(O,O) (x,y)EE
We instead have
f(x, y) = O.
In fact, in this case
0<
~
3
<
- x 2 + y2 -
Alxl =Alxl x2
inE.
5.62 Example. The function f(x,y) =
{~ ; +y
if (x, y) E JR2 \ {(O, On, if (x, y) = (0,0)
is continuous in JR2 \ {(O, On but is not continuous at (0,0). Restricting line through (0,0) parametrized as
f(
=
However, restricting (x, QX 2 ), gives
f
f(at, btl
=
a 2bt 3 a 2t 2 + b2t 2
=
a2b a 2 + b2 t
along the graph of the function y = QX
4
I(x, ax) == x 4 +Q 2 X 4
Q
----.. 1
+ ~2' ~
->
as t
0
QX
2
as x
f
->
to a straight
O.
parametrized as cp(x) := ->
0,
Figure 5.16. Kazimierz Kuratowski (1896-1980) and the frontispiece of the first volume of his Topologie.
thus $f$ has no limit as $(x,y)\to(0,0)$. Let us now consider the restriction of $f$ to the set
$$E := \{(x,y) \mid x\ge0,\ |y|\le x^3\}.$$
We have
$$\lim_{\substack{(x,y)\to(0,0)\\ (x,y)\in E}} f(x,y) = 0.$$
In fact,
$$\left|\frac{x^2y}{x^4+y^2} - 0\right| = \frac{x^2|y|}{x^4+y^2} \le \frac{|x|\,x^4}{x^4+y^2} \le |x| \to 0,$$
since $|y|\le x^3$ for $(x,y)\in E$.
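The different behavior along straight lines and along the parabolas $y = \alpha x^2$ can again be checked numerically; a sketch, with arbitrarily sampled parameters:

```python
def f(x, y):
    return x**2 * y / (x**4 + y**2) if (x, y) != (0, 0) else 0.0

ts = (1e-1, 1e-3, 1e-5)
print("line y = x:       ", [f(t, t) for t in ts])          # -> 0
print("parabola y = x^2: ", [f(t, t**2) for t in ts])        # == 1/2 for every t != 0
print("parabola y = 3x^2:", [f(t, 3 * t**2) for t in ts])    # == 3/10
```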
We conclude by observing that for functions $f:\mathbb{R}^n\to\mathbb{R}$ the expression
$$\lim_{x\to x_0} f(x) = +\infty$$
means that $\forall M\in\mathbb{R}$ there is $\delta>0$ such that $f(x) > M$ for all $x\in B(x_0,\delta)\setminus\{x_0\}$.
5.2 The Topology of Metric Spaces

In this section we introduce some families of subsets of a metric space $X$ that are defined by the metric structure, namely the families of open and closed sets. Recall that if $X$ is a set, $\mathcal{P}(X)$ denotes the set of all subsets of $X$: $A\in\mathcal{P}(X)$ if and only if $A\subset X$.
5.2.1 Basic facts

a. Open sets

5.63 Definition. A subset $A$ of a metric space $(X,d)$ is called an open set if for all $x\in A$ there exists a ball centered at $x$ contained in $A$, i.e.,
$$\forall x\in A\ \exists r_x>0 \text{ such that } B(x,r_x)\subset A. \tag{5.11}$$
5.64 Proposition. A subset $A$ of a metric space $X$ is open if and only if either $A$ is empty or it is a union of open balls.

Proof. Let $A$ be open. Then either $A$ is empty or $A$ is trivially a union of open balls, $A = \cup_{x\in A}B(x,r_x)$. Conversely, (5.11) trivially holds if $A = \emptyset$. If instead $x\in A\ne\emptyset$, since we assume that $A$ is a union of balls, there are $y\in X$ and $\rho>0$ such that $x\in B(y,\rho)\subset A$. Thus $y\in A$ and, setting $r := \rho - d(x,y)$, we have $r>0$ and, by the triangle inequality, $B(x,r)\subset B(y,\rho)\subset A$. $\square$
In particular,
5.65 Corollary. The open balls of a metric space X are open sets. 5.66'. Let (X, d) be a metric space and r 2: O. Show that {y E X I d(y, x) open set in X.
> r}
is an
5.67'. Let (X,d) be a metric space. Show that {xn} C X converges to x E X if and only if, for any open set A such that x E A, there exists n such that X n E A for all
n2:n.
The following is also easily seen.
5.68 Proposition. Let (X, d) be a metric space. Then
(i) 0 and X are open sets, (ii) if {An,} is a family of open sets, then U",A", is an open set, too, (iii) if AI, A 2 , .•. ,An are finitely many open sets, then n~I Ai is open. 5.69'. By considering the open sets {] -~, ~[I n E l\I}, show that the intersection of infinitely many open sets needs not be an open set.
b. Closed sets Recall that the complement of A C X is the set
5.70 Definition. Let X be a metric space. F C X is called a closed set if Fe = X \ F is an open set.
176
5. Metric Spaces and Continuous Functions
The de Morgan formulas
together with Proposition 5.68 yield at once the following. 5.71 Proposition. Let X be a metric space. Then
(i) 0 and X are closed sets, (ii) the intersection of any family of closed sets is a closed set, (iii) the union of finitely many closed sets is a closed set. 5.72'. Show that [a, b], [a, +oo[ and [-a, +oo[ are closed sets in JR, while [0,1[ is neither closed nor open. 5.73 ,. Show that the set {~
In =
1,2, ... } is neither closed nor open.
5.74'. Show that any finite subset of a metric space is a closed set. 5.75'. Show that the closed ball {x E Xld(x,xo) X I d(x, xo) = r} are closed sets.
:S r}, and its boundary {x E
One may characterize closed sets in terms of convergent sequences. 5.76 Proposition. Let (X, d) be a metric space. A set F c X is a closed set if and only if every convergent sequence with values in F converges to a point of F. Proof. Suppose that F is closed and that {xd C F converges to x E X. Let us prove that x E F. Assuming on the contrary, that x rt F, there exists r > 0 such that B(x,r) n F = 0. As {x n } C F, we have d(xn,x) ~ r "In, a contradiction since d(xn,x) -> O. Conversely, suppose that, whenever {xd C F and Xk -> x, we have x E F, but F is not closed. Thus X \ F is not open, hence there exists a point x E X \ F such that Vr > 0 B(x, r) n F f= 0. Choosing r = 1, ~' ~' ... , we inductively construct a sequence {x n } C F such that d(xn,x) < ~' hence converging to x. Thus x E F by assumption, but x E X \ F by construction, a contradiction. 0
c. Continuity 5.77 Theorem. Let (X, d x ) and (Y, dy ) be two metric spaces and f X ~ Y. Then the following claims are equivalent
(i) f is continuous, (ii) f-l(B) is an open set in X for any open ball B ofY, (iii) f-l(A) is an open set in X for any open set A in Y, (iv) f-l(F) is a closed set in X for any closed set F in Y.
5.2 The Topology of Metric Spaces
177
Proof. (i) => (ii). Let B be an open ball in Y and let x be a point in f- 1 (B). Since f(x) E B, there exists a ball By (f(x), E) C B. Since f is continuous at x, there exists 8> 0 such that f(Bx(x,8)) C By(f(X),E) C B that is Bx(x,8) C f-l(B). As x is arbitrary, f-l(B) is an open set in X. (ii) => (i) Suppose f-l(B) is open for any open ball B of Y. Then, given xo, f-l(By(f(XO,E))) is open, hence there is 8 > 0 such that Bx(xo,8) is contained in f-l(B y (f(XO,E))), i.e., f(B x (xo,8)) C By(f(XO),E), hence f is continuous at xo· (ii) and (iii) are equivalent since f- 1 (UiA;) = Ui/-l (Ai) for any family {Ai} of subsets of X. (iii) and (iv) are equivalent on account of the de Morgan formulas.
o
5.78~. Let f,g : X - t Y be two continuous functions between metric spaces. Show that the set {x E X I f(x) = g(x)} is closed.
5. 79
~.
It is convenient to set
Definition. Let (X, d) be a metric space. U C X is said to be a neighborhood of Xo E X if there exists an open set A of X such that Xo E A CU. In particular o B(xo,r) is a neighborhood of any x E B(xo,r), o A is open if and only if A is a neighborhood of any point of A. Let (X, d), (Y, d) be two metric spaces let Xo E X and let f : X - t Y. Show that f is continuous at Xo if and only if the inverse image of an open neighborhood of f(xo) is an open neighborhood of xo.
Finally, we state a junction rule for continuous functions, see Proposition 5.38.
5.80 Proposition. Let (X, d) be a metric space, and let {UaJ be a covering of X. Suppose that either all Un's are open sets or all Un's are closed and for any x E X there is an open ball that intersects only finitely many Un' Then (i) A c X is an open (closed) set in X if and only if each A n Un is an open (closed) set in Uc" (ii) Let Y be another metric space and let f : X -+ Y. Then f is continuous if and only if all the restrictions
d. Continuous real-valued maps Let (X, d) be a metric space and f : X -+ R From Theorem 5.77 we find that f : X -+ JR is continuous if and only if f-I(]a, b[) is an open set for every bounded interval la, b[c R Moreover, 5.82 Corollary. Let f : X X and let t E JR. Then
-+
JR be a function defined on a metric space
178
5. Metric Spaces and Continuous Functions
(i) {x E X I f(x) > t}, {x E X I f(x) < t} are open sets, (ii) {x E X I f(x) 2: t}, {x E X lJ(x) ::; t} and {x E X I f(x) closed sets. 5.83 Proposition. Let (X, d) be a metric space. Then F set of X if and only if F = {xld(x,F) = O}.
c
= t} are
X is a closed
Proof. By Corollary 5.82, {x I d(x, F) = O} is closed, x -> d(x, F) being Lipschitz continuous, see Example 5.25. Therefore F = {x I d(x, F) = O} implies that F is closed. Conversely assume that F is closed and that there exists x If- F such that d(x, F) = O. Since F is closed by assumption, there exists r > 0 such that B(x, r) n F = 0. But then d(x, F)·:::: r > 0, a contradiction. 0 5.84~.
Prove the following
Proposition. Let (X, d) be a metric space. Then
(i) F C X is a closed set if and only if there exists a continuous function f : X such that F = {x E X I f(x) ::; O}, (ii) A C X is an open set if and only if there exists a continuous function f : X such that A = {x E X I f(x) < O}. Actually f can be chosen to be a Lipschitz-continuous function.
->
lR
->
lR
[Hint: If F is closed, choose f(x) := d(x, F), while if A is an open set, choose f(x) -d(x, X \ A).]
e. The topology of a metric space 5.85 Definition. The topology of a metric space X is the family P(X) of its open sets.
=
TX C
It may happen that different distances d 1 and d 2 on the same set X that define different families of balls produce the same family of open sets for the same reason that a ball is union of infinitely many squares and a square is union of infinitely many balls. We say that the two distances are topologically equivalent if (X,d 1) and (X,d 2 ) have the same topology, i.e., the same family of open sets. The following proposition yields necessary and sufficient conditions in order that two distances be topologically equivalent.
5.86 Proposition. Let d 1 , d 2 be two distances in X and let B1(x,r) and B 2 (x, r) be the corresponding balls of center x and radius r. The following claims are equivalent
(i) d 1 and d 2 are topologically equivalent, (ii) every ball B1(x,r) is open for d 2 and every ball B 2 (x,r) is open for d1 . (iii) "Ix E X and r > 0 there are rx,px > 0 such that B 2 (x,r X ) C B1(x,r) and B1(x,px) C B 2 (x,r), (iv) the identity map i : X ~ X is a homeomorphism between the metric spaces (X, dt) and (X, d 2 ).
5.2 The Topology of Metric Spaces
179
:' Y"
. . ... A
'
.. ~.. :
"
z
Figure 5.17. x is an interior point to A, y is a boundary point to A and z is an exterior point to A. x and yare adherent points to A and z is not.
5.87~. Show that the distances in Rn d oo and d p Vp ~ 1, see Exercise 5.13, are all topologically equivalent to the Euclidean distance d2. If we substitute R n with the infinitely-dimensional vector space of sequences l'r, the three distances give rise to different open sets.
We say that a property of X is a topological property of X if it can be expressed only in terms of set operations and open sets. For instance, being an open or closed set, the closure of or the boundary of, or a convergent sequence are topological properties of X, see Section 5.2.2 for more. As we have seen, f is continuous if and only if the inverse image of open sets is open. A trivial consequence, for instance, is that the composition of continuous functions is continuous, see Proposition 5.30. Also we see that the continuity of f : X -+ Y is strongly related to the topologies TX:=
{A c X IA open in X},
Ty
:=
{A
c Y I A open in Y},
respectively on X and Y, and in fact it depends on the metrics only through and Ty. In other words being a continuous function f : X -+ Y is a topological property of X and Y.
TX
f. Interior, exterior, adherent and boundary points 5.88 Definition. Let X be a metric space and A eX. We say that Xo E X is interior to A if there is an open ball B(xo, r) such that B(xo, r) C A; we say that Xo is exterior to A if Xo is interior to X \ A; we say that Xo is adherent to A if it is not interior to X \ A; finally, we say that Xo is a boundary point of A if Xo is neither interior to A nor interior to X \ A. o
The set of interior points to A is denoted by A or by int A, the set of adherent points of A, called also the closure of A, is denoted by A or by cl (A), and finally the set of boundary points to A is called the boundary of A and is denoted by GA. 5.89~.
Let (X, d) be a metric space and B(xo, r) be an open ball of X. Show that (i) every point of B(xo,r) is interior to B(xo,r), Le., intB(xo,r) = B(xo,r), (ii) every point x such that d(x,xo) = r is a boundary point to B(xo,r), Le.,
8B(xo, r)
= {x I d(x, xo) = r}, >r
(iii) every point x with d(x, xo)
is exterior to B(xo, r),
180
5. Metric Spaces and Continuous Functions
(iv) every point x such that d(x,xo)::; r is adherent to B(xo,r), i.e., cI(B(xo,r)) = {x I d(x,xo)::; r}. Let X be a metric space and A c X. Show that (i) intA C A, (ii) int A is an open set and actually the largest open set contained in A,
5.90~.
int A = u {
u IU open U C A},
(iii) A is open if and only if A = int A.
5.91 ~. Let X be a metric space and A C X. Show that (i) A c 11, (ii) 11 is closed and actually the smallest closed set that contains A, i.e., cI (A) =
(iii) A is closed if and only if A = (iv) 11 = {x E X Id(x, A) = O}.
n{ F IF closed,
F:) A},
11,
5.92~.
Let X be a metric space and A C X. Show that (i) &A = &(X \ A), (ii) &A n int A = 0, 11 = &A U int A, &A = 11 \ int A, (iii) &A = 11 n Ac, in particular &A is a closed set, (iv) &&A = 0, 11 = 11, intintA = intA, (v) A is closed if and only if &A c A, (vi) A is open if and only if &A n A = 0.
5.93~. Let (X, dx) and (Y, dy) be metric spaces and following claims are equivalent (i) f: X - t Y is continuous, (ii) f(11) c f(A) for all A C X, (iii) f-l(B) c f-l(B) for all BeY.
f :X
-t
Y. Show that the
g. Points of accumulation Let A c X be a subset of a metric space. The set of points of accumulation, or cluster points, of A, denoted by VA, is sometimes called the derived of A. Trivially VA c A, and the set of adherent points to A that are not points of accumulation of A, I(A) := A \ VA, are the points x E A such that B(x,r) nA = {x} for some r > o. These points are contained in A,
I(A) =
A \ VA c A
and are called isolated points of A. 5.94'. Show that VA C
11 and that
A is closed if and only if VA C A.
5.95 Proposition. Let (X, d) be a metric space, F have
c X and x EX. We
5.2 The Topology of Metric Spaces
(i)
181
is adherent to F if and only if there exists a sequence {x n } C F that converges to x, (ii) x is an accumulation point for F if and only if there exists a sequence {x n } C F taking distinct values in F that converges to x; in particular, a) x is an accumulation point for F if and only if there exists a sequence {x n } C F \ {x} that converges to x, b) in every open set containing an accumulation point for F there are infinitely many distinct points of F. X
Proof. (i) If there is a sequence {Xn} C F that converges to x E X, in every neighborhood of x there is at least a point of F, hence x is adherent to F. Conversely, if x is adherent to F, there is a Xn E B(x,~) n F for each n, hence {Xn} C F and Xn --> x. (ii) If moreover x is a point of accumulation of F, we can choose X n E F \ {x} and moreover Xn E B(x,rn), rn := min(d(x,xn_l), ~). The sequence {x n } has the desired properties. 0
h. Subsets and relative topology Let (X,d) be a metric space and Y C X. Then (Y,d) is a metric space, too. The family of open sets in Y induced by the distance d is called the relative topology of Y. We want to compare the topology of X and the relative topology of Y. The open ball in Y with center x E Y and radius r > 0 is
I
By(x,r) := {y E Y d(y, x) <
r} = Bx(x,r) n Y.
5.96 Proposition. Let (X, d) be a metric space and let Y eX. Then
(i) B is open in Y if and only if there exists an open set A C X in X such that B = AnY, (ii) B is closed in Y if and only if there exists a closed set A in X such that B = AnY. Proof. Since (ii) follows at once from (i), we prove (i). Suppose that A is open in X and let x be a point in AnY. Since A is open in X, there exists a ball Bx(x, r) C A, hence By(x,r) = Bx(x,r) nYc AnY. Thus AnY is open in Y. Conversely, suppose that B is open in Y. Then for any x E B there is a ball By(x, rx) = Bx(x, r x ) nBc B. The set A := U{Bx(x, r x ) I x E B} is an open set in X and AnY = B. 0
Also the notions of interior, exterior, adherent and boundary points, in (Y, d) are related to the same notions in (X, d), and whenever we want to emphasize the dependence on Y of the interior, closure, derived and boundary sets we write inty(A), A y , VyA, OyA instead of intA, A, VA, GA.
5.91 Proposition. For any A C Y we have (i) inty(A) = intx(A) nY, (ii) A y = Ax nY,
182
5. Metric Spaces and Continuous Functions
8A
A
Figure 5.18. 8A and 8y A.
(iii) V y A = VxA n Y, (iv) OvA
= oxA \
oxY.
5.98 ,. Let Y := [O,I[e lit The open balls of Yare the subsets of the type {y E [0, 1[ Ily - xl < r}. If x is not zero and r is sufficiently small, {y Ily - xl < r} n [0, 1[ is again an open interval with center x, ]x - r, x + r[. But, if x = 0, then for r < 1
By(O, r) := [0, r[. Notice that x = 0 is an interior point of Y (for the relative topology of Y), but it is a boundary point for the topology of X. This is in agreement with the intuition: in the first case we are considering Y as a space in itself and nothing exists outside it, every point is an interior point and 8y Y = 0; in the second case Y is a subset of IR and 0 is at the frontier between Y and IR \ Y. 5.99'. Prove the claims of this paragraph that we have not proved.
5.2.2 A digression on general topology a. Topological spaces As a further step of abstraction, following Felix Hausdorff (1869-1942) and Kazimierz Kuratowski (1896-1980), we can separate the topological structure of open sets from the metric structure, giving a set-definition of open sets in terms of their properties. 5.100 Definition. Let X be a set. A topology in X is a distinct family of subsets T C P(X), called open sets, such that o 0, X E T, o if {A,,} C T, then u"A" E T, o if AI, A 2 , ... , An E T, then
nk=l A k
E T.
A set X endowed with a topology is called a topological space. Sometimes we write it as (X, T).
5.101 Definition. A function f : X --> Y between topological spaces (X, TX) and (Y, Ty) is said to be continuous if f-I(B) E TX whenever
5.2 The Topology of Metric Spaces
183
B E Ty. f : X -+ Y is said to be a homeomorphism if f is both injective and surjective and both f and f- 1 are continuous, or, in other words A E TX if and only if f(A) E Ty. Two topological spaces are said to be homeomorphic if and only if there exists a homeomorphism between them.
Proposition 5.68 then reads as follows.
5.102 Proposition. Let (X, d) be a metric space. Then the family formed by the empty set and by the sets that are the union of open balls of X is a topology on X, called the topology on X induced by the metric d. The topological structure is more flexible than the metric structure, and allows us to greatly enlarge the notion of the space on which we can operate with continuous deformations. This is in fact necessary if one wants to deal with qualitative properties of geometric figures, in the old terminology, with analysis situs. We shall not dwell on these topics nor with the systematic analysis of different topologies that one can introduce on a set, Le., on the study of general topology. However, it is proper to distinguish between metric properties and topological properties. According to Felix Klein (1849-1925) a geometry is the study of the properties of figures or spaces that are invariant under the action of a certain set of transformations. For instance, Euclidean plane geometry is the study of the plane figures and of their properties that are invariant under similarity transformations. Given a metric space (X, d), a property of an object defined in terms of the set operations in X and of the metric of X is a metric property of X, for instance whether {x n } C X is convergent or not is a metric property of X. More generally, in the class of metric spaces, the natural transformations are those h : (X, dx ) -+ (Y, dy ) that are one-to-one and do not change the distances dy(h(x), h(x)) = dx(x, y). Also two metric spaces (X, d) and (Y, d) are said to be isometric if there exists an isometry between them. A metric invariant is a predicate defined on a class of metric spaces that is true (respectively, false) for all spaces isometric with (X, d) whenever it is true (false) for (X, d). With this languange, the metric properties that make sense for a class of metric spaces, being evidently preserved by isometries, are metric invariants. And the Geometry of Metric Spaces, that is the study of metric spaces, of their metric properties, is in fact the study of metric invariants. 5.103~.
Let (X, dl) and (Y, d2) be two metric spaces and denote them respectively, by B 1 (x, r) and B2(X, r) the ball centered at x and radius r respectively, for the metrics dl and d2. Show that a one-to-one map h : X --t Y is an isometry if and only if the action of h preserves the balls, i.e.,
h(Br(x,r))
= B2(h(x),r)
'Ix E X, Vr
> O.
Similarly, given a topological space (X, TX), a property of an object defined in terms of the set operations and open sets of X is called a topological property of X, for instance being an open or closed subset, being
184
5. Metric Spaces and Continuous Functions
the closure or boundary of a subset, or being a convergent sequence in X are topological properties of X. In the class of topological spaces, the natural group of transformations is the group of homeomorphisms, that are precisely all the one-toone maps whose actions preserve the open sets. Two topological spaces are said homeomorphic if there is a homeomorphism from one to the other. A topological invariant is a predicate defined on a class of topological spaces that is true (false) in any topological space that is homeomorphic to X whenever it is true (false) on X. With this language, topological properties that make sense for a class of topological spaces, being evidently preserved by the homeomorphims, are topological invariants. And the topology, that is the study of objects and of their properties that are preserved by the action of homeomorphisms, is in fact the study of topological invariants.
b. Topologizing a set On a set X we may introduce several topologies, that is subsets of P(X). Since such subsets are ordered by inclusion, topologies are partially ordered by inclusion. On one hand, we may consider the indiscrete topology 7 = {0, X} in which no other sets than 0 and X are open, thus there are no "small" neighborhoods. On the other hand, we can consider the discrete topology in which any subset is an open set, 7 = P(X), thus any point is an open set. There is a kind of general procedure to introduce a topology in such a way that the sets of a given family E C P(X) are all open sets. Of course we can take the discrete topology but what is significant is the smallest family of subsets 7 that contains E and is closed with respect to finite intersections and arbitrary unions. This is called the coarser topology or the weaker topology for which E C 7. It is unique and can be obtained adding possibly to E the empty set, X, the finite intersections of elements of E and the arbitrary union of these finite intersections. This previous construction is necessary, but in general it is quite complicated and E loses control on 7, since 7 builds up from finite intersections of elements of E. However, if the family E has the following property, as for instance it happens for the balls of a metric space, this can be avoided. A basis B of X is a family of subsets of X with the following property: for every couple Ua and U(3 E B there is U, E B such that U, C Ua n U(3. We have the following. 5.104 Proposition. Let B = {Ua } be a basis for X. Then the family consisting of 0, X and all the unions of members of B is the weaker topology in X containing B.
7
c. Separation properties It is worth noticing that several separation properties that are trivial in a metric space do not hold, in general, in a topological space. The following claims, o sets consisting of a single points are closed,
5.3 Completeness
185
o for any two distinct points x and y E X there exist disjoint open sets A and B such that x E A and y E B, o for any x E X and closed set F c X there exist disjoint open sets A and B such that x E A and FeB. o for any pair of disjoint closed sets E and F there exist disjoint open sets A and B such that E C A and FeB, are all true in a metric space, but do not hold in the indiscrete topology. A topological space is called a Hausdorff topological space if (ii) holds, regular if (iii) holds and normal if (iv) holds. It is easy to show that (i) and (iv) imply (iii), (i) and (iii) imply (ii), and (ii) implies (i). We conclude by stating a theorem that ensures that a topological space be metrizable, i.e., when we can introduce a metric on it so that the topology is the one induced by the metric.
5.105 Theorem (Uryshon). A topological space X with a countable basis is metrizable if and only if it is regular.
5.3 Completeness a. Complete metric spaces 5.106 Definition. A sequence {x n } with values in a metric space (X, d) is a Cauchy sequence if 'tiE
> 0 :J
1/
such that d(x n , x m
)
< E 'tin, m
~ 1/.
It is easily seen that
5.107 Proposition. In a metric space (i) every convergent sequence is a Cauchy sequence, (ii) any subsequence of a Cauchy sequence is again a Cauchy sequence, (iii) if {Xk n } is a subsequence of a Cauchy sequence {x n } such that Xk n --+ xo, then Xn --+ Xo.
5.108 Definition. A metric space (X, d) is called complete if every Cauchy sequence converges in X. By definition, a Cauchy sequence and a complete metric space are metric invariants. With Definition 5.108, Theorems 2.35 and 4.23 of [GM2] read as JR., JR. 2 , C are complete metric spaces. Moreover, since n
IX11, IX21,···, Ixnl :::; Ilxll :::; L IXil i=l
186
5. Metric Spaces and Continuous Functions
en
{Xk} c JR.n or is a convergent sequence (respectively, Cauchy sequence) if and only ifthe sequences of coordinates {xi}, i = 1, ... , n are convergent sequences (Cauchy sequences). Thus 5.109 Theorem. For all n 2 1, JR.n and metric are complete metric spaces.
en
endowed with the Euclidean
b. Completion of a metric space Several useful metric spaces are complete. Notice that closed sets of a complete metric space are complete metric spaces with the induced distance. However, there are noncomplete metric spaces. The simplest significant examples are of course the open intervals of JR. and the set of rational numbers with the Euclidean distance. Let X be a metric space. A complete metric space X* is called a completion of X if (i) ~ is isometric to a subsp~ce X of X*, (ii) X is dense in X*, i.e., clX = X*. We have the following.
5.110 Theorem (Hausdorff). Every metric space X has a completion and any two completions of X are isometric. Though every noncomplete metric space can be regarded as a subspace of its completion, it is worth remarking that from an effective point of view the real problem is to realize a suited handy model of this completion. For instance, the Hausdorff model, when applied to rationals, constructs the completion as equivalence class of Cauchy sequences of rationals, instead of real numbers. In the same way, the Hausdorff procedure applied to a metric space X of functions produces a space of equivalence classes of Cauchy sequences. It would be desirable to obtain another class of functions as completion, instead. But this can be far from trivial. For instance a space offunctions that is the completion of CO ([0, 1]) with the L 1 ([0, 1]) distance can be obtained by the Lebesgue integration theory. 5.111 'If. Show that a closed set F of a complete metric space is complete with the induced metric. 5.112 'If. Let (X,d) be a metric space and A C X. Show that the closure of A is a completion of A.
Proof of Theorem 5.110. In fact we outline the main steps leaving to the reader the task of completing the arguments. (i) We consider the family of all Cauchy sequences of X and we say that two Cauchy sequences {Yn} and {zn} are equivalent if d(Yn, Zn) - t 0 (i.e., if, a posteriori, {Yn} and {Zn} "have the same limit"). Denote by X the set of equivalence classes obtained this way. Given two classes of equivalence Y and Z in X, let {Yn} and {Zn} be two representatives respectively of Y and Z. Then one sees
5.3 Completeness
187
Figure 5.19. Felix Hausdorff (1869-1942) and Rene-Louis Baire (1874-1932).
(i) {d(zn, Yn)} is a Cauchy sequence of real numbers, hence converges to a real number. Moreover, such a limit does not depend on the representatives {Yn} e {Zn} of Y and Z, so that (ii) d(Y, Z) is a distance in X. (ii) Let X be the subspace of X of the equivalence classes of the constant sequences with values in X. It turns out that X is isometric to X. Let Y E X and let {Yn} be a representative of Y. Denote by Y v the class of all Cauchy sequences that are equivalent to the constant sequence {Zn} where Zn := Yv "In. Then it is easily seen that Y v -. Y in X and that X is dense in X. (iii) Let {Yv } be a Cauchy sequence in X. For all 1I we choose Zv E X such that d(Yv , Zv) < 1/1l and we let Zv E X be a representative of Zv. Then we see that {zv} is a Cauchy sequence in X and, if Z is the equivalence class of {zv}, then Y v ---> Z. This proves that X is complete. (iv) It remains to prove that any two completions are isometric. Suppose that X and X are two completions of X. With the above notation, we find
XC X
and
Xc
X
that are isometric and one-to-one with X. Therefore
are isometric and in a one-to-one correspondence. Because
X is dense in X
dense respectively in X and X it is not difficult to extend the isometry i : an isometry between X and X.
X a~d and
X
X_ are
X -. X to 0
c. Equivalent metrics Completeness is a metric invariant and not a topological invariant. This means that isometric spaces are both complete or noncomplete and that there exist metric spaces X and Y that are homeomorphic, but X is complete and Y is noncomplete. In fact, homeomorphisms preserve convergent sequences but not Cauchy sequences. 5.113 Example. Consider X := lR endowed with the Euclidean metric and Y := lR endowed with the distance
d(x,y)
=
11 :\xl - 1:IY'!'
188
5. Metric Spaces and Continuous Functions
X and Yare homeomorphic, a homeomorphism being given by the map h(x) := Hlxl' x ERIn particular both distances give rise to the same converging sequences. However the sequence {n} is not a Cauchy sequence for the Euclidean distance, but it is a Cauchy sequence for the metric d since for n, mEN, m ~ n
min
d(m, n) = -n- - - - S 1 - - 1+m 1+n I l+n
->
0
per v
-> 00.
Since {n} does not converge in (JR., d), Y = (JR., d) is not complete.
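The phenomenon in Example 5.113 is easy to observe numerically: the increments of the sequence $x_n = n$ shrink for the distance $d$, while the Euclidean increments stay equal to one. A minimal sketch, assuming the distance written above:

```python
def d(x, y):
    # the bounded distance of Example 5.113
    return abs(x / (1 + abs(x)) - y / (1 + abs(y)))

for n in (1, 10, 100, 1000):
    print(n, abs((n + 1) - n), round(d(n + 1, n), 8))   # Euclidean increment vs d-increment
```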
Homeomorphic, but nonisometric spaces can sometimes have the same Cauchy sequences. A sufficient condition ensuring that Cauchy sequences with respect to different metrics on the same set X are the same, is that the two metrics be equivalent, i.e., there exist constants AI, A2 > Osuch that
5.114~. Show that two metric spaces which are equivalent are also topologically equivalent, compare Proposition 5.86.
d. The nested sequence theorem
An extension of Cantor's principle or the nested intervals theorem in see [GM2], holds in a complete metric space.
~,
5.115 Proposition. Let {Ed be a monotone-decreasing sequence of nonempty sets, i.e., 0 =j:. Ek+l C E k Vk = 0,1, ... , in a complete metric space X. If diam (Ek ) ---.. 0, then there exists one and only one point x E X with
the following property: any ball centered at x contains one, and therefore infinitely many of the Ek 'so Moreover, if all the Ek are closed, then nkEk = {x}.
As a speciai case we have the following. 5.116 Corollary. In a complete metric space a sequence of nested closed
balls with diameters that converge to zero have a unique common point. Notice that the conclusion of Corollary 5.116 does not hold if the diameters do not converge to zero: for Ek .- [k, +oo[C ~ we have 0 =j:. Ek+l C Ek and nkEk = 0. 5.117~.
Prove Proposition 5.115.
e. Baire's theorem 5.118 Theorem (Baire). Let X be a complete metric space that can be
written as a denumerable union of closed sets {Ei }, X = least one of the Ei's contains a ball of X.
U~l E i .
Then at
5.3 Completeness
189
Proof. Suppose that none of the Ei's contains a ball of X and let Xl 1. EI; Since Ei is closed, there is rl such that cl(B(xI,rl)) n EI = 0. Inside cl(B(xI,rI/2)) there is now X2 1. E2 (otherwise cl (B(XI' rI/2)) C E2 which is a contradiction) and r2 such that cl(B(x2,r2)) n E2 = 0, also we may choose r2 < rI/2. Iterating this procedure we find a monotonic-decreasing family of closed balls {B(xk,rk)}, cl(B(XI,q))::l cl(B(x2,r2))::l'" such that cl(B(xn,rn)) n En = 0. Thus the common point to all these balls, that exists by Corollary 5.116, would not belong to any of the En, a contradiction. 0
An equivalent formulation is the following. 5.119 Proposition. In a complete metric space, the denumerable intersection of open dense sets of a complete metric space X is dense in X. 5.120 Definition. A subset A of a metric space X is said nowhere dense if its closure has no interior point, int cl (A)
= 0,
equivalently, if X \ A is dense in X. A set is called meager or of the first category if it can be written as a countable union of nowhere dense sets. If a set is not of the first category, then we say that it is of the second category. 5.121 Proposition. In a complete metric space a meager set has no interior point, or, equivalently, its complement is dense. Proof. Let {An} be a family such that int cl An = 0. Suppose there is an open set U with U C unA n . From U C UnA n C UnA n we deduce nnA n C C U C • Baire's theorem, see Proposition 5.119, then implies that U C is dense. Since U C is closed, we conclude that UC = X i.e., U = 0. 0
5.122 Corollary. A complete metric space is a set of second category. This form of Baire's theorem is often used to prove existence results or to show that a certain property is generic, Le., holds for "almost all points" in the sense of the category, Le., that the set X \ {x E X Ip(x) holds} is a meager set. In this way one can show 2 , see also Chapters 9 and 10, the following. 5.123 Proposition. The class of continuous functions on the interval [0, 1] which have infinite right-derivative at every point, are of second category in COUO,I]) with the uniform distance; in particular, there exist continuous functions that are nowhere differentiable. Finally we notice that, though for a meager set A we have int A may have intclA -=I- 0: consider A := Ql c R 2
See, e.g., J. Dugundji, Topology, Allyn and Bacon Inc., Boston.
= 0, we
190
5. Metric Spaces and Continuous Functions
5.4 Exercises 5.124'. Show that Ix - yl2 is not a distance in JR. 5.125'. Let (X, d) be a metric space and M
dl (x, y)
:= min(M, d(x,
y)),
> O.
Show that the functions
d2(X, y)
:=
d(x, y)/(l
+ d(x, y))
are also distances in (X, d) that give rise to the same topology.
5.126 ¶. Plot the balls of the following metric in ℂ:
    d(z, w) := |z − w|   if arg z = arg w or z = w,
    d(z, w) := |z| + |w|  otherwise.
5.127 ¶. Let (X, d_X) be a metric space. Show that, if f : [0, +∞[ → [0, +∞[ is concave and f(0) = 0, then d(x, y) := f(d_X(x, y)) is a distance on X; in particular d_X^α(x, y) is a distance for any α, 0 < α ≤ 1. Notice that, instead, ‖·‖^α, 0 < α < 1, is not a norm if ‖·‖ is a norm.
5.128 ¶. Let f : (X, d_X) → (Y, d_Y) be α-Hölder continuous. Show that f : (X, d_X^α) → (Y, d_Y) is Lipschitz continuous.
5.129 ¶. Let S be the space of all sequences of real numbers. Show that the function d : S × S → ℝ given by
    d(x, y) := Σ_{n=1}^∞ 2^{-n} |x_n − y_n| / (1 + |x_n − y_n|),
if x = {x_n}, y = {y_n}, is a distance on S.
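A small computational sketch of this metric (our own helper; the series is evaluated only up to a finite number of terms, which is harmless since the tail beyond the N-th term is smaller than 2^{-N}):

    def seq_dist(x, y, terms=50):
        # x, y are callables n -> x_n (n = 1, 2, ...)
        s = 0.0
        for n in range(1, terms + 1):
            t = abs(x(n) - y(n))
            s += 2.0 ** (-n) * t / (1.0 + t)
        return s

    # the distance between the unbounded sequence x_n = n and the zero sequence is finite (< 1):
    print(seq_dist(lambda n: n, lambda n: 0.0))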
5.130 ¶ Constancy of sign. Let X and Y be metric spaces, let F be a closed set of Y, let x₀ be a point of accumulation of X and f : X → Y. If f(x) → y as x → x₀ and y ∉ F, then there exists δ > 0 such that f(B(x₀, δ) \ {x₀}) ∩ F = ∅.

5.131 ¶. Let (X, d) and (Y, δ) be two metric spaces and let X × Y be the product metric space. Show that the projection maps π(x, y) = x and π̃(x, y) = y
(i) are continuous,
(ii) map open sets into open sets,
(iii) but, in general, do not map closed sets into closed sets.

5.132 ¶ Continuity of operations on functions. Let ∗ : Y × Z → W be a map which we think of as an operation. Given f : X → Y and g : X → Z, we may then define the map f ∗ g : X → W by (f ∗ g)(x) := f(x) ∗ g(x), x ∈ X. Suppose that X, Y, Z, W are metric spaces and consider Y × Z as the product metric space with the distance as in Exercise 5.131. Show that if f, g are continuous at x₀ and ∗ is continuous at (f(x₀), g(x₀)), then f ∗ g is continuous at x₀.
5.133 ¶. Show that
(i) the parametric equation of straight lines in ℝⁿ, t ↦ at + b, a, b ∈ ℝⁿ, is a continuous function,
(ii) the parametric equation of the helix in ℝ³, t ↦ (cos t, sin t, t), t ∈ ℝ, is a continuous function.

5.134 ¶. Let (X, d_X) and (Y, d_Y) be two metric spaces, E ⊂ X, let x₀ be a point of accumulation of E and f : E ⊂ X → Y. Show that f(x) → y₀ as x → x₀, x ∈ E, if and only if ∀ε > 0 ∃δ > 0 such that f(E ∩ B(x₀, δ) \ {x₀}) ⊂ B(y₀, ε).
5.135 ¶. Show that the scalar product in ℝⁿ, (x|y) := Σ_{i=1}^n x^i y^i, for x = (x¹, x², ..., xⁿ), y = (y¹, y², ..., yⁿ) ∈ ℝⁿ, is a continuous function of the 2n variables (x, y) ∈ ℝ²ⁿ.

5.136 ¶. Find the maximal domain of definition of the following functions and decide whether they are continuous there:
    xz/(1 + y²),    √(x − log y).

5.137 ¶. Decide whether the following functions, each defined by the indicated expression for (x, y) ≠ (0,0) and equal to 0 at (0,0), are continuous.

5.138 ¶. Compute, if they exist, the limits as (x, y) → (0,0) of
    log²(1 + xy)/(x² + y²),    x sin(x² + 3y²)/(x² + y²),
    sin x (1 − cos x)/(x² + y²),    x² sin² y / sin(x² + y²).
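As a sample of the kind of estimate involved, here is a worked sketch for the first limit of Exercise 5.138 (the only ingredients are the elementary bounds |log(1 + t)| ≤ 2|t| for |t| ≤ 1/2 and 2|xy| ≤ x² + y²): for (x, y) close to (0,0),

    0 ≤ log²(1 + xy)/(x² + y²) ≤ 4x²y²/(x² + y²) ≤ (x² + y²) → 0,

since x²y² ≤ (x² + y²)²/4; hence the first limit exists and equals 0.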
5.139 ¶. Consider ℝ with the Euclidean topology. Show that
(i) if A := ]a, b[, we have int A = A, cl A = [a, b] and ∂A = {a, b},
(ii) if A := [a, b[, we have int A = ]a, b[, cl A = [a, b] and ∂A = {a, b},
(iii) if A := [a, +∞[, we have int A = ]a, +∞[, cl A = A and ∂A = {a},
(iv) if A := ℚ ⊂ ℝ, we have int A = ∅, cl A = ℝ and ∂A = ℝ,
(v) if A := {(x, y) ∈ ℝ² | x = y}, we have int A = ∅, cl A = A and ∂A = A,
(vi) if A := ℕ ⊂ ℝ, we have int A = ∅, cl A = A and ∂A = A.
5.140 ¶. Let (X, d) be a metric space and {A_i} be a family of subsets of X. Show that

5.141 ¶. Prove the following

Theorem. Any open set A of ℝ is either empty or a finite or denumerable union of disjoint open intervals with endpoints that do not belong to A.
[Hint: Show that (i) ∀x ∈ A there is an interval ]ξ, η[ with x ∈ ]ξ, η[ and ξ, η ∉ A, (ii) if two such intervals ]ξ₁, η₁[ and ]ξ₂, η₂[ have a common point in A and endpoints not in A, then they are equal, (iii) since each of those intervals contains a rational, they are at most countably many.]

Show that the previous theorem does not hold in ℝⁿ. Show that we instead have the following.

Theorem. Every open set A ⊂ ℝⁿ is the union of a finite or countable family of cubes with disjoint interiors.
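For open subsets of ℝ given concretely as finite unions of open intervals, the decomposition of the theorem in Exercise 5.141 can actually be computed by sorting and merging; a minimal sketch (the helper is ours, and intervals touching only at an endpoint are kept separate, since that endpoint is not in the set):

    def components(intervals):
        # given open intervals (a, b), return the connected components of their union
        out = []
        for a, b in sorted(i for i in intervals if i[0] < i[1]):
            if out and a < out[-1][1]:                 # overlaps the last component
                out[-1] = (out[-1][0], max(out[-1][1], b))
            else:
                out.append((a, b))
        return out

    print(components([(0, 1), (0.5, 2), (3, 4), (3.5, 3.7)]))   # [(0, 2), (3, 4)]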
5.142 ¶. Prove the following theorem, see Exercise 5.141.

Theorem. Every closed set F ⊂ ℝ can be obtained by taking out from ℝ a finite or countable family of disjoint open intervals.
5.143 ¶. Let X be a metric space.
(i) Show that x₀ ∈ X is an interior point of A ⊂ X if and only if there is an open set U such that x₀ ∈ U ⊂ A.
(ii) Using only open sets, express that x₀ is an exterior point of A, an adherent point of A and a boundary point of A.

5.144 ¶. Let X be a metric space. Show that A is open if and only if any sequence {x_n} that converges to x₀ ∈ A is definitively in A, i.e., ∃ n̄ such that x_n ∈ A ∀n ≥ n̄.

5.145 ¶. Let (X, d_X) and (Y, d_Y) be two metric spaces and let f : X → Y be a continuous map. Show that
(i) if y₀ ∈ Y is an interior point of B ⊂ Y and f(x₀) = y₀, then x₀ is an interior point of f⁻¹(B),
(ii) if x₀ ∈ X is adherent to A ⊂ X, then f(x₀) is adherent to f(A),
(iii) if x₀ ∈ X is a boundary point of A ⊂ X, then f(x₀) is a boundary point of f(A),
(iv) if x₀ ∈ X is a point of accumulation of A ⊂ X and f is injective, then f(x₀) is a point of accumulation of f(A).

5.146 ¶. Let X be a metric space and A ⊂ X. Show that x₀ ∈ X is an accumulation point of A if and only if for every open set U with x₀ ∈ U we have U ∩ A \ {x₀} ≠ ∅. Show also that being an accumulation point of a set is a topological notion. [Hint: Use (iv) of Exercise 5.145.]

5.147 ¶. Let X be a metric space and A ⊂ X. Show that x is a point of accumulation of A if and only if x ∈ cl(A \ {x}).

5.148 ¶. Let X be a metric space. A set A ⊂ X without points of accumulation in X is called discrete. A set without isolated points is called perfect. Of course every point of a discrete set is isolated. Show that the converse is false: a set consisting of isolated points need not be discrete.

5.149 ¶. Let X be a metric space. Recall that a set D ⊂ X is dense in X if cl D = X. Show that the following claims are equivalent:
(i) D is dense in X,
(ii) every nonempty open set intersects D,
(iii) D^c = X \ D has no interior points,
(iv) every open ball B(x, r) intersects D.
5.150 ¶. ℚ is dense in ℝ, i.e., cl ℚ = ℝ, and ∂ℚ = ℝ. Show that ℝ \ ℚ is dense in ℝ. Show that the set E of points of ℝⁿ with rational coordinates and its complement are dense in ℝⁿ.
5.151 ¶. Let Γ be an additive subgroup of ℝ. Show that either Γ is dense in ℝ or Γ is the subgroup of integer multiples of a fixed real number.
5.152 ¶. Let X be a metric space. Show that x_n → x if and only if for every open set A with x ∈ A there is n̄ such that x_n ∈ A ∀n ≥ n̄. In particular, the notion of convergence is a topological notion.

5.153 ¶. The notion of a convergent sequence makes sense in a topological space. One says that {x_n} ⊂ X converges to x ∈ X if for every open set A with x ∈ A there is n̄ such that x_n ∈ A ∀n ≥ n̄. However, in this generality limits are not unique. If in X we consider the indiscrete topology T = {∅, X}, every sequence with values in X converges to any point of X. Show that limits of converging sequences are unique in a Hausdorff topological space. Finally, let us notice that in an arbitrary topological space closed sets cannot be characterized in terms of limits of sequences, see Proposition 5.76.
5.154~.
Let (X, T) be a topological space. A set F C X is called sequentially closed with respect to T if every convergent sequence with values in F has limit in F. Show that the family of sequentially closed sets satisfies the axioms of closed sets. Consequently there is a topology (a priori different from T) for which the closed sets are the family of sequentially closed sets.
5.155 ¶. Let X be a metric space and A ⊂ X. Show that diam cl A = diam A and diam int A ≤ diam A, but that in general diam int A ≠ diam A.
5.156 ¶. Let ∅ ≠ E ⊂ ℝ be bounded from above. Show that sup E ∈ cl E; if sup E ∉ E, then sup E is a point of accumulation of E; finally, show that max E and min E exist if E is nonempty, bounded and closed.

5.157 ¶. Let X be a metric space. Show that ∂A = ∅ iff A is both open and closed. Show that in ℝⁿ we have ∂A = ∅ iff A = ∅ or A = ℝⁿ.

5.158 ¶. Let X be a metric space. Show that ∂ int A ⊂ ∂A, and that it may happen that ∂ int A ≠ ∂A.
5.159 ¶. Sometimes one says that A is a regular open set if A = int cl A, and that C is a regular closed set if C = cl int C. Give examples of regular and nonregular open and closed sets in ℝ² and ℝ³. Show the following:
(i) The interior of a closed set is a regular open set, the closure of an open set is a regular closed set.
(ii) The complement of a regular open (closed) set is a regular closed (open) set.
(iii) If A and B are regular open sets, then A ∩ B is a regular open set; if C and D are regular closed sets, then C ∪ D is a regular closed set.

5.160 ¶. Let X be a metric space. A subset D ⊂ X is dense in X if and only if for every x ∈ X we can find a sequence {x_n} with values in D such that x_n → x.
5.161 ~. Let (X, d) and (Y, d) be two metric spaces. Show that (i) if f : X --> Y is continuous, then f : E C X --> Y is continuous in E with the induced metric,
Figure 5.20. A Cauchy sequence in C⁰([0,1]) with the L¹-metric, with a noncontinuous "limit".
(ii) f : X → Y is continuous if and only if f : X → f(X) is continuous.
5.162 ¶. Let X and Y be two metric spaces and let f : X → Y. Show that f is continuous if and only if ∂f⁻¹(A) ⊂ f⁻¹(∂A) for every A ⊂ Y.
5.163 ¶ Open and closed maps. Let (X, d_X) and (Y, d_Y) be two metric spaces. A map f : X → Y is called open (respectively, closed) if the image of an open (respectively, closed) set of X is an open (respectively, closed) set in Y. Show that
(i) the coordinate maps π_i : ℝⁿ → ℝ, x = (x₁, x₂, ..., x_n) ↦ x_i, i = 1, ..., n, are open maps but not closed maps,
(ii) similarly, the coordinate maps of a product, π_X : X × Y → X and π_Y : X × Y → Y given by π_X(x, y) = x, π_Y(x, y) = y, are open but in general not closed maps,
(iii) f : X → Y is an open map if and only if f(int A) ⊂ int f(A) ∀A ⊂ X,
(iv) f : X → Y is a closed map if and only if cl f(A) ⊂ f(cl A) ∀A ⊂ X.

5.164 ¶. Let f : X → Y be injective. Show that f is an open map if and only if it is a closed map.
5.165~. A metric space (X, dx) is called topologically complete if there exists a distance d in X topologically equivalent to d x for which (X, d) is complete. Show that being topologically complete is a topological invariant.
5.166 ¶. Let (X, d) be a metric space. Show that the following two claims are equivalent:
(i) (X, d) is a complete metric space;
(ii) if {F_α} is a family of closed sets of X such that (a) any finite subfamily of {F_α} has nonempty intersection and (b) inf{diam F_α} = 0, then ∩_α F_α is nonempty and consists of exactly one point.
5.167 ¶. Show that the irrational numbers in [0,1] cannot be written as a countable union of closed sets of [0,1]. [Hint: Suppose they can, so that [0,1] = ∪_{r∈ℚ∩[0,1]} {r} ∪ ∪_i E_i, and use Baire's theorem.]

5.168 ¶. Show that a complete metric space made of countably many points has at least one isolated point. In particular, a complete metric space without isolated points is not countable. Notice that, if x_n → x_∞ in ℝ, then A := {x_n | n = 1, 2, ...} ∪ {x_∞} with the induced distance is a countable complete metric space.
5.169 ¶. Show that C⁰([0,1]) with the L¹-metric is not complete. [Hint: Consider the sequence in Figure 5.20.]

5.170 ¶. Show that X = {n | n = 0, 1, 2, ...} and Y = {1/n | n = 1, 2, ...} are homeomorphic as subspaces of ℝ, but X is complete, while Y is not complete.
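A numerical illustration of the hint in Exercise 5.169 (a hedged sketch: the ramp functions below are our own stand-ins for the sequence of Figure 5.20). Consecutive terms become L¹-close, so the sequence is Cauchy, while its pointwise limit is the discontinuous step function, hence no continuous L¹-limit exists.

    import numpy as np

    def ramp(n, x):
        # continuous: 0 on [0, 1/2], linear on [1/2, 1/2 + 1/n], 1 afterwards
        return np.clip(n * (x - 0.5), 0.0, 1.0)

    x = np.linspace(0.0, 1.0, 200001)
    def l1(g, h):                       # L^1 distance approximated by a Riemann sum on [0,1]
        return float(np.abs(g - h).mean())

    for n, m in [(10, 20), (50, 100), (200, 400)]:
        print(n, m, l1(ramp(n, x), ramp(m, x)))   # tends to 0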
6. Compactness and Connectedness
In this chapter we shall discuss, still in the metric context, two important topological invariants: compactness and connectedness.
6.1 Compactness

Let E be a subset of ℝ². We ask ourselves whether there exists a point x₀ ∈ E of maximal distance from the origin. Of course E needs to be bounded, sup_{x∈E} d(0, x) < +∞, if we want a positive answer, and it is easily seen that if E is not closed, our question may have a negative answer, for instance if E = B(0, 1). Assuming E bounded and closed, how can we prove existence? We can find a maximizing sequence, i.e., a sequence {x_k} ⊂ E such that
    d(0, x_k) → sup_{x∈E} d(0, x),
and our question has a positive answer if {x_k} converges or, at least, if {x_k} has a subsequence that converges to some point x₀ ∈ E. In fact, in this case, d(0, x_{n_k}) → d(0, x₀), since x ↦ d(0, x) is continuous, and d(0, x_{n_k}) → sup_{x∈E} d(0, x), too, thus concluding that
    d(0, x₀) = sup_{x∈E} d(0, x).
6.1.1 Compact spaces

a. Sequential compactness

6.1 Definition. Let (X, d) be a metric space. A subset K ⊂ X is said to be sequentially compact if every sequence {x_k} ⊂ K has a subsequence {x_{n_k}} that converges to a point of K.

Necessary conditions for compactness are stated in the following.
Figure 6.1. Bernhard Bolzano (1781–1848) and the frontispiece of the work where the Bolzano–Weierstrass theorem appears.
6.2 Proposition. We have:
(i) any sequentially compact metric space (X, d) is complete;
(ii) any sequentially compact subset of a metric space (X, d) is bounded, closed, and complete with the induced metric.

Proof. (i) Let {x_k} ⊂ X be a Cauchy sequence. Sequential compactness allows us to extract a convergent subsequence; since {x_k} is Cauchy, the entire sequence converges, see Proposition 5.107.
(ii) Let K be sequentially compact. Every point x ∈ cl K is the limit of a sequence with values in K; by assumption x ∈ K, thus cl K = K and K is closed. Suppose that K is not bounded. Then there is a sequence {x_n} ⊂ K such that d(x_i, x_j) > 1 for all i ≠ j. Such a sequence has no convergent subsequence, a contradiction. Finally, K is complete by (i). □
b. Compact sets in ℝⁿ

In general, bounded and closed sets of a metric space are not sequentially compact. However we have

6.3 Theorem. In ℝⁿ, n ≥ 1, a set is sequentially compact if and only if it is closed and bounded.

This follows from
6.4 Theorem (Bolzano-Weierstrass). Any infinite and bounded subset E of jRn, n ::::: 1, has at least a point of accumulation.
Proof. Since E is bounded, there is a cube C₀ of side L such that E ⊂ C₀,
    C₀ := [a₁^(0), b₁^(0)] × ⋯ × [a_n^(0), b_n^(0)],    b_k^(0) − a_k^(0) = L.
Since E is infinite, if we divide C₀ into 2ⁿ equal subcubes, one of them, C₁ := [a₁^(1), b₁^(1)] × ⋯ × [a_n^(1), b_n^(1)], b_k^(1) − a_k^(1) = L/2, contains infinitely many elements of E. By induction, we divide C_i into 2ⁿ equal subcubes with no common interiors, and choose one of them, C_{i+1}, that contains infinitely many elements of E. If
    C_i := [a₁^(i), b₁^(i)] × ⋯ × [a_n^(i), b_n^(i)],    b_k^(i) − a_k^(i) = L/2^i,
the vertices of C_i converge,
    a_k^(i) → a_k^∞,    b_k^(i) → b_k^∞,    and    a_k^∞ = b_k^∞,
since for each k = 1, ..., n the sequences {a_k^(i)} and {b_k^(i)} are real-valued Cauchy sequences. The point a := (a₁^∞, ..., a_n^∞) is then an accumulation point of E, since for any r > 0 we have C_i ⊂ B(a, r) for i sufficiently large. □
Another useful consequence of the Bolzano–Weierstrass theorem is

6.5 Theorem. Any bounded sequence {x_k} of ℝⁿ has a convergent subsequence.

Proof. If {x_k} takes finitely many values, then at least one of them, say a, is taken infinitely often. If {p_k}_{k∈ℕ} are the indices such that x_{p_k} = a, then {x_{p_k}} converges, since it is constant. Assume now that {x_k} takes infinitely many values. The Bolzano–Weierstrass theorem yields a point of accumulation x_∞ of these values. Now we choose p₁ as the first index for which |x_{p₁} − x_∞| < 1, p₂ as the first index greater than p₁ such that |x_{p₂} − x_∞| < 1/2 and so on: then {x_{p_k}} is a subsequence of {x_n} and x_{p_k} → x_∞. □
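A computational caricature of the halving argument (a hedged sketch with toy code of our own: for a finite sample of a bounded sequence in ℝ², "contains infinitely many terms" is replaced by "contains the most sample points", and the nested squares shrink towards an approximate accumulation point):

    import math

    N = 100000
    pts = [((-1.0) ** k, math.sin(k)) for k in range(N)]   # a bounded sequence in R^2

    lo, hi = [-1.0, -1.0], [1.0, 1.0]
    sample = pts
    for step in range(12):
        mid = [(l + h) / 2 for l, h in zip(lo, hi)]
        best = None
        for quad in range(4):                              # the 2^n = 4 subsquares
            nlo = [lo[i] if (quad >> i) & 1 == 0 else mid[i] for i in range(2)]
            nhi = [mid[i] if (quad >> i) & 1 == 0 else hi[i] for i in range(2)]
            inside = [p for p in sample
                      if all(nlo[i] <= p[i] <= nhi[i] for i in range(2))]
            if best is None or len(inside) > len(best[0]):
                best = (inside, nlo, nhi)
        sample, lo, hi = best
    print("nested squares shrink towards", [(l + h) / 2 for l, h in zip(lo, hi)])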
c. Coverings and ε-nets

There are other ways to express compactness. Let A be a subset of a metric space X. A covering of A is a family A = {A_α} of subsets of X such that A ⊂ ∪_α A_α. We have already said that A = {A_α} is an open covering of A if each A_α is an open set, and that {A_α} is a finite covering of A if the number of the A_α's is finite. A subcovering of a covering A of A is a subfamily of A that is still a covering of A.

6.6 Definition. We say that a subset A of a metric space X is totally bounded if for any ε > 0 there is a finite number of balls B(x_i, ε), i = 1, 2, ..., N, of radius ε, each centered at x_i ∈ X, such that A ⊂ ∪_{i=1}^N B(x_i, ε).

For a given ε > 0, the corresponding balls are said to form an ε-covering of A, and their centers, characterized by the fact that each point of A has distance less than ε from some of the x_i's, form a set {x_i} called an ε-net for A. With this terminology, A is totally bounded iff for every ε > 0 there exists an ε-net for A. Notice also that A ⊂ X is totally bounded if and only if for every ε > 0 there exists a finite covering {A_i} of A with sets having diam A_i < ε.
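A greedy construction of an ε-net for a finite set of points in the plane (a minimal sketch; the function eps_net is ours and only treats finite samples, but it mirrors the definition: every point ends up within ε of some chosen center):

    import math, random

    def eps_net(points, eps):
        centers = []
        for p in points:
            if all(math.dist(p, c) >= eps for c in centers):
                centers.append(p)
        return centers

    random.seed(1)
    cloud = [(random.random(), random.random()) for _ in range(1000)]
    net = eps_net(cloud, 0.1)
    assert all(any(math.dist(p, c) < 0.1 for c in net) for p in cloud)
    print(len(net), "centers suffice for eps = 0.1")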
6.7 Definition. We say that a subset K of a metric space is compact if every open covering of K contains a finite subcovering.

We have the following.

6.8 Theorem. Let X be a metric space. The following claims are equivalent:
(i) X is sequentially compact;
(ii) X is complete and totally bounded;
(iii) X is compact.

The implication (ii) ⇒ (i) is known as the Hausdorff criterion and the implication (i) ⇒ (iii) as the finite covering lemma.
Proof. (i) ⇒ (ii) By Proposition 6.2, X is complete. Suppose X is not totally bounded. Then for some r > 0 no finite family of balls of radius r can cover X. Start with x₁ ∈ X; since B(x₁, r) does not cover X, there is x₂ ∈ X such that d(x₂, x₁) ≥ r. Since {B(x₁, r), B(x₂, r)} does not cover X either, there is x₃ ∈ X such that d(x₃, x₁) ≥ r and d(x₃, x₂) ≥ r. By induction, we construct a sequence {x_k} such that d(x_i, x_j) ≥ r ∀i > j, hence ∀i ≠ j. Such a sequence has no convergent subsequence, and this contradicts the assumption.

(ii) ⇒ (iii) By contradiction, suppose that X has an open covering A = {A_α} with no finite subcovering. Since X is totally bounded, there exists a finite covering {C_i} of X,
    ∪_{i=1}^n C_i = X,    such that    diam C_i < 1,    i = 1, ..., n.
By the assumption, there exists at least one index k₁ such that A has no finite subcovering of C_{k₁}. Of course X₁ := C_{k₁} is a metric space which is totally bounded; therefore we can cover C_{k₁} with finitely many sets of diameter less than 1/2, and A has no finite subcovering of one of them, which we call X₂. By induction, we construct a sequence {X_k} of subsets of X with X_{k+1} ⊂ X_k and diam X_k < 2^{1−k}, such that none of them can be covered by finitely many open sets of A. Now we choose for each k a point x_k ∈ X_k. Since {x_k} is trivially a Cauchy sequence and X is complete, {x_k} converges to some x₀ ∈ X. Let A₀ ∈ A be such that x₀ ∈ A₀ and let r be such that B(x₀, r) ⊂ A₀ (A₀ is an open set). For k sufficiently large we then have d(x, x₀) < r for all x ∈ X_k, i.e., X_k ⊂ B(x₀, r) ⊂ A₀. In conclusion, X_k is covered by one open set of A, a contradiction since by construction no finite subfamily of A could cover X_k.

(iii) ⇒ (i) If, by contradiction, {x_k} has no convergent subsequence, then {x_k} is an infinite set without points of accumulation in X. For every x ∈ X there is a ball B(x, r_x) centered at x that contains at most one point of {x_k}. The family of these balls F := {B(x, r_x)}_{x∈X} is an open covering of X with no finite subcovering of {x_k}, hence of X, contradicting the assumption. □

6.9 Remark. Clearly the notions of compactness and sequential compactness are topological notions. They have a meaning in the more general setting of topological spaces, while the notion of totally bounded set is just a metric notion. We shall not deal with compactness in topological spaces. We only mention that compactness and sequential compactness are not equivalent in the context of topological spaces.

6.10 ¶. Let X be a metric space. Show that any closed subset of a compact set is compact.
6.11 ,.. Let X be a metric space. Show that finite unions and generic intersections of compact sets are compact. 6.12 ,.. Show that a finite set is compact.
6.1.2 Continuous functions and compactness

a. The Weierstrass theorem

As in [GM2], continuity of f : K → ℝ and compactness of K yield existence of a minimizer.

6.13 Definition. Let f : X → ℝ. Points x₋, x₊ ∈ X such that
    f(x₋) = inf_{x∈X} f(x),    f(x₊) = sup_{x∈X} f(x)
are called, respectively, a minimum point or a minimizer and a maximum point or a maximizer of f : X → ℝ. A sequence {x_k} ⊂ X such that f(x_k) → inf_{x∈X} f(x) (resp. f(x_k) → sup_{x∈X} f(x)) is called a minimizing sequence (resp. a maximizing sequence).

Notice that any function f : X → ℝ defined on a set X has a minimizing and a maximizing sequence. In fact, because of the properties of the infimum, there exists a sequence {y_k} ⊂ f(X) such that y_k → inf_{x∈X} f(x) (which may be −∞), and for each k there exists a point x_k ∈ X such that f(x_k) = y_k, hence f(x_k) → inf_{x∈X} f(x).
6.14 Theorem (Weierstrass). Let f : X → ℝ be a continuous real-valued function defined on a compact metric space. Then f achieves its maximum and minimum values, i.e., there exist x₋, x₊ ∈ X such that
    f(x₋) = inf_{x∈X} f(x),    f(x₊) = sup_{x∈X} f(x).

Proof. Let us prove the existence of a minimizer. Let {x_k} ⊂ X be a minimizing sequence. Since X is compact, it has a subsequence {x_{n_k}} that converges to some x₋ ∈ X. By continuity of f, f(x_{n_k}) → f(x₋), while by restriction f(x_{n_k}) → inf_{x∈X} f(x). The uniqueness of the limit yields inf_{x∈X} f(x) = f(x₋). □
In fact, we proved that, if f : X - t IR is continuous and X is compact, then any minimizing (resp. maximizing) sequence has a subsequence that converges to a minimum (resp. maximum) point.
b. Continuity and compactness

Compactness and sequential compactness are topological invariants. In fact, we have the following.

6.15 Theorem. Let f : X → Y be a continuous function between two metric spaces. If X is compact, then f(X) is compact.

Proof. Let {V_α} be an open covering of f(X). Since f is continuous, {f⁻¹(V_α)} is an open covering of X. Consequently, there are indices α₁, ..., α_N such that X ⊂ ∪_{i=1}^N f⁻¹(V_{α_i}), hence f(X) ⊂ ∪_{i=1}^N V_{α_i}, i.e., f(X) is compact. □

Another proof of Theorem 6.15. Let us prove that f(X) is sequentially compact whenever X is sequentially compact. If {y_n} ⊂ f(X) and {x_n} ⊂ X is such that f(x_n) = y_n ∀n, since X is sequentially compact, a subsequence {x_{k_n}} of {x_n} converges to a point x₀ ∈ X. By continuity, the subsequence {f(x_{k_n})} of {y_n} converges to f(x₀) ∈ f(X). Then Theorem 6.8 applies. □
6.16 ¶. Infer Theorem 6.14 from Theorem 6.15.

6.17 ¶. Suppose that E is a noncompact metric space. Show that there exist
(i) f : E → ℝ continuous and unbounded,
(ii) f : E → ℝ continuous and bounded without maximizers and/or minimizers.
c. Continuity of the inverse function

Compactness also plays an important role in dealing with the continuity of the inverse function of invertible maps.

6.18 Theorem. Let f : X → Y be a continuous function between two metric spaces. If X is compact, then f is a closed function. In particular, if f is injective, then the inverse function f⁻¹ : f(X) → X is continuous.

Proof. Let F ⊂ X be a closed set. Since X is compact, F is compact. From Theorem 6.15 we then infer that f(F) is compact, hence closed. Suppose f injective and let g : f(X) → X be the inverse of f. We then have g⁻¹(E) = f(E) ∀E ⊂ X, hence g⁻¹(F) is a closed set if F is a closed set in X. □
6.19 Corollary. Let f : X -., Y be a one-to-one, continuous map between two metric spaces. If X is compact, then f is a homeomorphism. 6.20 Example. The following example shows that the assumption of compactness in Theorem 6.18 cannot be avoided. Let X = [0,271"[, Y be the unit circle of C centered at the origin and f(t) := e it , t E X. Clearly f(t) = cost + isint is continuous and injective, but its inverse function f- 1 is not continuous at the point (1,0) = f(O).
6.1.3 Semicontinuity and the Fréchet–Weierstrass theorem

Going through the proof of Weierstrass's theorem we see that a weaker assumption suffices to prove existence of a minimizer. In fact, if instead of the continuity of f we assume¹
    f(x₋) ≤ liminf_{k→∞} f(x_k)    whenever {x_k} is such that x_k → x₋,    (6.1)
then for any convergent subsequence {x_{n_k}} of a minimizing sequence, x_{n_k} → x₀, we have
    inf_{x∈X} f(x) ≤ f(x₀) ≤ liminf_{k→∞} f(x_{n_k}) = lim_{k→∞} f(x_{n_k}) = inf_{x∈X} f(x),
i.e., again f(x₀) = inf_{x∈X} f(x). We therefore introduce the following definitions.
6.21 Definition. We say that a function f : X → ℝ defined on a metric space X is sequentially lower semicontinuous at x ∈ X, s.l.s.c. for short, if
    f(x) ≤ liminf_{k→∞} f(x_k)    whenever {x_k} ⊂ X is such that x_k → x.
6.22 Definition. We say that a subset E of a metric space X is relatively compact if its closure cl E is compact.

6.23 Definition. Let X be a metric space. We say that f : X → ℝ is coercive if for all t ∈ ℝ the level sets of f,
    {x ∈ X | f(x) ≤ t},
are relatively compact.
6.24 Theorem (Frechet-Weierstrass). Let X be a metric space and let f : X ----+ lR be bounded from below, coercive and sequentially lower semicontinuous. Then f takes its minimum value.
1
See Exercises 6.26 and 6.28 for the definition of lim inf and related information.
Figure 6.2. Lebesgue's example of a sequence of curves of length √2 that converges in the uniform distance to a curve of length 1.
6.25 Example. There are many interesting examples of functions that are semicontinuous but not continuous: a typical example is the length of a curve. Though we postpone details, Lebesgue's example in Figure 6.2 shows that the function length, defined on the space of piecewise linear curves with the uniform distance, is not continuous. In fact length(f_k) = √2, f_k(x) → f_∞(x) := 0 uniformly in [0,1], and length(f_∞) = 1 < √2. We shall prove later that in fact the length functional is sequentially lower semicontinuous.

6.26 ¶. We say that f : X → ℝ is lower semicontinuous, for short l.s.c., if for all t ∈ ℝ the sets {x ∈ X | f(x) ≤ t} are closed. Sequential lower semicontinuity and lower semicontinuity are topological concepts; they turn out to be different, in general. Show that if X is a metric space, then f is lower semicontinuous if and only if f is sequentially lower semicontinuous.
if x is a point of accumulation of X and (i) "1m < '- 30 such that f(y) > m if y E B(x,o) \ {xo}, (ii) "1m> '- "10> 0 3 Y6 E B(x,o) \ {x} such that f(Y6) < m. Show that the lim inf always exists and is given by liminf f(y) = sup y~x
inf
r>OB(x,r)\{x}
Similarly we can define the lim sup of
f :X
f(y) = lim
inf
r~OB(x,r)\{x}
-+
f(y)·
JR, so that
limsupf(y) = -liminf(-f(x». y
x
y-+x
Explicitly define it and show that
limsupf(y) = y~x
Finally, show that
f :X
-+
lim
sup
r~O+ B(x,r)\{x}
f(y).
JR is sequentially lower semicontinuous if and only if "Ix E X f(x) :::; lim inf f(y). y~x
6.28 ¶. Let f : X → ℝ be defined on a metric space X. Show that
(i) liminf_{y→x} f(y) ≤ limsup_{y→x} f(y),
(ii) f(x) ≤ liminf_{y→x} f(y) if and only if −f(x) ≥ limsup_{y→x} (−f(y)), hence f is lower semicontinuous if and only if −f is upper semicontinuous,
(iii) f(y) → ℓ as y → x if and only if liminf_{y→x} f(y) = limsup_{y→x} f(y) = ℓ,
(iv) liminf_{y→x} f + liminf_{y→x} g ≤ liminf_{y→x} (f + g),
(v) liminf_{x→x₀} f(g(x)) = f(liminf_{x→x₀} g(x)), if either f is continuous at L := liminf_{x→x₀} g(x) or f(x) ≠ L for any x ≠ x₀,
(vi) f is bounded from below in a neighborhood of x if and only if liminf_{y→x} f > −∞.
6.2 Extending Continuous Functions

6.2.1 Uniformly continuous functions

6.29 Definition. Let (X, d_X) and (Y, d_Y) be two metric spaces. We say that f : X → Y is uniformly continuous in X if for any ε > 0 there exists δ > 0 such that d_Y(f(x), f(y)) < ε for all x, y ∈ X with d_X(x, y) < δ.

6.30 Remark. Uniform continuity is a global property, in contrast with continuity (at all points), which is a local property. A comparison is worthwhile:
(i) f : X → Y is continuous if ∀x₀ ∈ X, ∀ε > 0 ∃δ > 0 (in principle δ depends on ε and x₀) such that d_Y(f(x), f(x₀)) < ε whenever d_X(x, x₀) < δ;
(ii) f : X → Y is uniformly continuous in X if ∀ε > 0 ∃δ > 0 (in this case δ depends on ε but not on x₀) such that d_Y(f(x), f(x₀)) < ε whenever d_X(x, x₀) < δ.

Of course, if f is uniformly continuous in X, then f is continuous in X and uniformly continuous on any subset of X. Moreover, if {U_α} is a finite partition of X and each f|_{U_α} : U_α → Y is uniformly continuous in U_α, then f : X → Y is uniformly continuous in X.

6.31 ¶. Show that Lipschitz-continuous and, more generally, Hölder-continuous functions, see Definition 5.24, are uniformly continuous.

6.32 ¶. Show that f : X → Y is not uniformly continuous in X if and only if there exist two sequences {x_n}, {y_n} ⊂ X and ε₀ > 0 such that d_X(x_n, y_n) → 0 and d_Y(f(x_n), f(y_n)) ≥ ε₀ ∀n.
6.33 ¶. Show that
(i) x², x ∈ ℝ, is not uniformly continuous in ℝ,
(ii) 1/x is not uniformly continuous in ]0, 1],
(iii) sin x², x ∈ ℝ, is not uniformly continuous in ℝ.
Using directly Lagrange's theorem, show that
(iv) x², x ∈ [0, 1], is uniformly continuous in [0, 1],
(v) e^{−x}, x ∈ ℝ, is uniformly continuous in [0, +∞[.

6.34 ¶. Let X, Y be two metric spaces and let f : X → Y be uniformly continuous. Show that the image of a Cauchy sequence is a Cauchy sequence in Y.
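For item (i) of Exercise 6.33, the two-sequences criterion of Exercise 6.32 gives an explicit witness; a minimal numerical sketch (the choice x_n = n, y_n = n + 1/n is ours):

    # |x_n - y_n| = 1/n -> 0, while |x_n^2 - y_n^2| = 2 + 1/n^2 stays >= 2,
    # so f(x) = x^2 is not uniformly continuous on R.
    for n in (1, 10, 100, 1000):
        x, y = n, n + 1.0 / n
        print(n, abs(x - y), abs(x * x - y * y))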
6.35 Theorem (Heine-Cantor-Borel). Let f : X - t Y be a continuous map between metric spaces. If X is compact, then f is uniformly continuous.
Proof. By contradiction, suppose that f is not uniformly continuous. Then there are ε₀ > 0 and two sequences {x_n}, {y_n} ⊂ X such that
    d_X(x_n, y_n) < 1/n    and    d_Y(f(x_n), f(y_n)) ≥ ε₀.    (6.2)
Since X is compact, {x_n} has a convergent subsequence, x_{k_n} → x, x ∈ X. The first inequality in (6.2) yields that {y_{k_n}} converges to x, too. On account of the continuity of f, d_Y(f(x_{k_n}), f(x)) → 0 and d_Y(f(y_{k_n}), f(x)) → 0, hence d_Y(f(x_{k_n}), f(y_{k_n})) → 0: this contradicts the second inequality in (6.2). □
6.2.2 Extending uniformly continuous functions to the closure of their domains

Let X, Y be metric spaces, E ⊂ X and f : E → Y a continuous function. Under which conditions is there a continuous extension of f over cl E, i.e., a continuous g : cl E → Y such that g = f in E? Notice that we do not want to change the target Y. Of course, such an extension may not exist, for instance if E = ]0, 1] and f(x) = 1/x, x ∈ ]0, 1]. On the other hand, if it exists, it is unique. In fact, if g₁ and g₂ : cl E → Y are two continuous extensions, then Σ := {x ∈ cl E | g₁(x) = g₂(x)} is closed and contains E, hence Σ = cl E.
6.36 Theorem. Let X and Y be two metric spaces. Suppose that Y is complete and that f : E ⊂ X → Y is a uniformly continuous map. Then f extends uniquely to a continuous function on cl E; moreover the extension is uniformly continuous in cl E.

Proof. First we observe:
(i) since f is uniformly continuous in E, if {x_n} is a Cauchy sequence in E, then {f(x_n)} is a Cauchy sequence in Y, hence it converges in Y;
(ii) since f is uniformly continuous, if {x_n}, {y_n} ⊂ E are such that x_n → x and y_n → x for some x ∈ X, then the Cauchy sequences {f(x_n)} and {f(y_n)} have the same limit.
Define F : cl E → Y as follows. For any x ∈ cl E, let {x_n} ⊂ E be such that x_n → x. Define
    F(x) := lim_{n→∞} f(x_n).
We then leave to the reader the task of proving that (i) F is well defined, i.e., its definition makes sense, since for any x the value F(x) is independent of the chosen sequence {x_n} converging to x, (ii) F(x) = f(x) ∀x ∈ E, (iii) F is uniformly continuous in cl E, (iv) F extends f, i.e., F(x) = f(x) ∀x ∈ E. □

6.37 ¶. As a special case of Theorem 6.36, we notice that a function f : E ⊂ X → Y which is uniformly continuous on a dense subset E ⊂ X extends to a uniformly continuous function defined on the whole of X.
6.2.3 Extending continuous functions Let X, Y be metric spaces, E C X and f : E -> Y be a continuous function. Under which conditions can f be extended to a continuous function F : X -> Y? This is a basic question for continuous maps.
a. Lipschitz-continuous functions

We first consider real-valued Lipschitz-continuous maps, f : E ⊂ X → ℝ.
6.38 Theorem (McShane). Let (X, d) be a metric space, E ⊂ X and let f : E → ℝ be a Lipschitz map. Then there exists a Lipschitz-continuous map F : X → ℝ with the same Lipschitz constant as f, which extends f.

Proof. Let L := Lip(f). For x ∈ X let us define
    F(x) := inf_{y∈E} ( f(y) + L d(x, y) )
and show that it has the required properties. For x ∈ E we clearly have F(x) ≤ f(x), while, f being Lipschitz,
    f(x) ≤ f(y) + L d(x, y)    ∀y ∈ E,
i.e., f(x) ≤ F(x), thus concluding that F(x) = f(x) ∀x ∈ E. Moreover, for x, y ∈ X we have
    F(x) ≤ inf_{z∈E} ( f(z) + L d(y, z) ) + L d(x, y) = F(y) + L d(x, y)
and similarly
    F(y) ≤ F(x) + L d(x, y).
Hence F is Lipschitz continuous with Lip(F) ≤ L. As F is an extension of f, we trivially have Lip(F) ≥ L, thus concluding that Lip(F) = L. □
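A discrete illustration of McShane's formula (a hedged sketch: E is a small finite subset of ℝ with the usual distance, and the names E, f, L, F are ours):

    E = [-1.0, 0.0, 2.0]
    f = {-1.0: 1.0, 0.0: 0.0, 2.0: 1.0}          # Lipschitz on E with constant L = 1
    L = 1.0

    def F(x):
        # F(x) = inf_{y in E} ( f(y) + L*|x - y| )
        return min(f[y] + L * abs(x - y) for y in E)

    for x in E:
        assert abs(F(x) - f[x]) < 1e-12           # F extends f on E
    for x in (-3.0, -0.5, 1.0, 5.0):
        print(x, F(x))                            # 1-Lipschitz values off E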
The previous theorem allows us to extend vector-valued Lipschitz-continuous maps f : E ⊂ X → ℝᵐ, but the componentwise extension will have, in principle, a Lipschitz constant not exceeding √m Lip(f). Actually, a more elaborate argument allows us to prove the following.
6.39 Theorem (Kirszbraun). Let f : E ⊂ ℝⁿ → ℝᵐ be a Lipschitz-continuous map. Then f has an extension F : ℝⁿ → ℝᵐ such that Lip F = Lip f.

In fact there exist several extensions of Kirszbraun's theorem that we will not discuss. We only mention that it may fail if either ℝⁿ or ℝᵐ is remetrized by some norm not induced by an inner product.

6.40 ¶ (Federer). Let X be ℝ² with the infinity norm ‖x‖_∞ = max(|x¹|, |x²|) and consider the map f : A ⊂ X → ℝ², where A := {(−1, 1), (1, −1), (1, 1)} and
    f(−1, 1) := (−1, 0),    f(1, −1) := (1, 0),    f(1, 1) := (0, √3).
Show that Lip(f) = 1, but f has no 1-Lipschitz extension to A ∪ {(0, 0)}.
6.2.4 Tietze's theorem

An extension of Theorem 6.38 holds for continuous functions on a closed domain.

6.41 Theorem (Tietze). Let X be a metric space, let E ⊂ X be a closed subset of X, and let f be a continuous function from E into [−1, 1] (respectively, ℝ). Then f has a continuous extension from X into [−1, 1] (respectively, ℝ).

Actually we have the following.

6.42 Theorem (Dugundji). Let X be a metric space, E a closed subset of X and let f be a continuous function from E into ℝⁿ. Then f has a continuous extension from X into ℝⁿ; moreover the range of the extension is contained in the convex hull of f(E).

We recall that the convex hull of a subset E ⊂ ℝⁿ is the intersection of all convex sets that contain E.
Proof of Tietze's theorem. First assume that f is bounded. Then it is not restrictive to assume that inf_E f = 1 and sup_E f = 2. We shall prove that the function
    F(x) := f(x)                              if x ∈ E,
    F(x) := inf_{y∈E} ( f(y) d(x, y) ) / d(x, E)    if x ∉ E,
is a continuous extension of f and 1 ≤ F(x) ≤ 2 ∀x ∈ X. Since the last claim is trivial, we need to prove that F is continuous in X. Decompose X = int E ∪ (X \ E) ∪ ∂E. If x₀ ∈ int E, then F is continuous at x₀ by assumption. Let x₀ ∈ X \ E. In this case x ↦ d(x, E) is continuous and strictly positive in an open neighborhood of x₀, therefore it suffices to prove that h(x) := inf_{y∈E} ( f(y) d(x, y) ) is continuous at x₀. We notice that for y ∈ E and x, x₀ ∈ X we have
    f(y) d(x, y) ≤ f(y) d(x₀, y) + f(y) d(x, x₀) ≤ f(y) d(x₀, y) + 2 d(x, x₀),
hence
    h(x) ≤ h(x₀) + 2 d(x, x₀)
and, exchanging x with x₀,
    |h(x) − h(x₀)| ≤ 2 d(x, x₀).
This proves continuity of h at x₀. Let x₀ ∈ ∂E. For ε > 0 let r > 0 be such that |f(y) − f(x₀)| < ε provided d(y, x₀) < r and y ∈ E. For x ∈ B(x₀, r/4) we have
    inf_{E∩B(x₀,r/4)} ( f(y) d(x, y) ) ≤ f(x₀) d(x, x₀) ≤ 2·(r/4) = r/2
and
    inf_{E\B(x₀,r)} ( f(y) d(x, y) ) ≥ d(x₀, y) − d(x, x₀) ≥ (3/4) r.
Therefore we find, for x with d(x₀, x) < r/4,
    h(x) = inf_{y∈E} ( f(y) d(x, y) ) = inf_{y∈E∩B(x₀,r)} f(y) d(x, y)
and
    d(x, E) = d(x, E ∩ B(x₀, r)).
On the other hand, for y ∈ E ∩ B(x₀, r) we have |f(x₀) − f(y)| < ε, hence
    (f(x₀) − ε) d(x, E) ≤ h(x) ≤ (f(x₀) + ε) d(x, E)
if x ∈ B(x₀, r/4), i.e., h(x)/d(x, E) is within ε of f(x₀), so F is continuous at x₀. Finally, if f is not bounded, we apply the above to g := φ ∘ f, φ being a homeomorphism φ : ℝ → ]0, 2[, for instance φ(x) = 1 + x/(1 + |x|). If G extends g continuously, then F := φ⁻¹ ∘ G continuously extends f. □
6.43 Remark. The extension F : X → ℝ of f : E ⊂ X → ℝ provided by Tietze's theorem is Lipschitz continuous outside E.
Sketch of the proof of Theorem 6.42, assuming X = ℝⁿ and E ⊂ X compact. Choose a countable dense set {e_k}_k in E and for x ∉ E and k = 1, 2, ... set
    φ_k(x) := max{ 2 − |x − e_k| / d(x, E), 0 }.
The function
    f̃(x) := f(x)                                                  if x ∈ E,
    f̃(x) := ( Σ_{k≥1} 2^{-k} φ_k(x) f(e_k) ) / ( Σ_{k≥1} 2^{-k} φ_k(x) )    if x ∉ E,
defines a continuous extension of f; moreover f̃(ℝⁿ) is contained in the convex hull of f(E). □
6.44 ¶. Let E and F be two disjoint nonempty closed sets of a metric space (X, d). Check that the function f : X → [0, 1] given by
    f(x) = d(x, E) / ( d(x, E) + d(x, F) )
is continuous in X, has values in [0, 1], f(x) = 0 ∀x ∈ E and f(x) = 1 ∀x ∈ F.
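A concrete instance of this separating function for two finite (hence closed) disjoint sets in the plane; a minimal sketch with helper names of our own choosing:

    import math

    E = [(0.0, 0.0), (0.0, 1.0)]
    F = [(3.0, 0.0), (3.0, 1.0)]

    def dist_to(S, x):
        return min(math.dist(x, s) for s in S)

    def u(x):                              # u(x) = d(x,E) / (d(x,E) + d(x,F))
        dE, dF = dist_to(E, x), dist_to(F, x)
        return dE / (dE + dF)

    print([u(p) for p in E])               # [0.0, 0.0]
    print([u(p) for p in F])               # [1.0, 1.0]
    print(u((1.5, 0.5)))                   # an intermediate value in ]0,1[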
6.45 ¶. Let E and F be two disjoint nonempty closed sets of a metric space (X, d). Using the function f in Exercise 6.44, show that there exist two open sets A, B ⊂ X with A ∩ B = ∅, A ⊃ E and B ⊃ F.

Indeed, Exercise 6.45 has a converse.

6.46 Lemma (Urysohn lemma). Let X be a topological space such that each couple of disjoint closed sets can be separated by two open disjoint sets. Then, given a couple of disjoint closed sets E and F, there exists a continuous function f : X → [0, 1] such that f(x) = 1 ∀x ∈ E and f(x) = 0 ∀x ∈ F.

This lemma answers the problem of finding nontrivial continuous functions on a topological space and is a basic step in the construction of the so-called partition or decomposition of unity, a tool that allows us to pass from local to global constructions and vice versa. Since we shall not need these results in a general context, we refrain from further comments and refer the reader to one of the treatises in general topology.
Figure 6.3. The first page of the Preface of Topologie by Kazimierz Kuratowski (1896–1980) and the frontispiece of a classic in general topology.
6.3 Connectedness

Intuitively, a space is connected if it does not consist of two separate pieces.
6.3.1 Connected spaces

6.47 Definition. A metric space X is said to be connected if it is not the union of two nonempty disjoint open sets. A subset E ⊂ X is connected if it is connected as a subspace of X.

This can be formulated in other ways. For example, we say that two sets A and B of a metric space X are separated if both cl A ∩ B and A ∩ cl B are empty, i.e., no point of A is adherent to B and no point of B is adherent to A.

6.48 Proposition. Let X be a metric space. The following properties are equivalent.
(i) X is connected.
(ii) There are no nonempty closed sets F, G in X such that F ∩ G = ∅ and X = F ∪ G.
(iii) The only subsets of X both open and closed are ∅ and X.
(iv) X is not the union of two nonempty and separated subsets.
Proof. Trivially (i) ⇔ (ii) ⇔ (iii). Let us prove that (i) ⇒ (iv). By contradiction, suppose X = A ∪ B where A and B are nonempty and separated. From cl A ∩ B = ∅ and A ∪ B = X we infer cl A ⊂ B^c ⊂ A, hence A = cl A is closed; similarly B is closed, hence A = B^c is open, and so is B, concluding that X is not connected, a contradiction. Finally, let us prove that (iv) ⇒ (ii). By contradiction, assume that X is not connected. Then X = A ∪ B with A, B closed, disjoint and nonempty, thus cl A ∩ B = A ∩ B = ∅ and A ∩ cl B = A ∩ B = ∅. Thus X is the union of two nonempty separated sets, a contradiction. □
a. Connected subsets

6.49 Theorem. E ⊂ ℝ is connected if and only if E is an interval.

Proof. If E ⊂ ℝ is not an interval, there exist x, y ∈ E and z ∉ E with x < z < y. Thus the sets E₁ := E ∩ ]−∞, z[ and E₂ := E ∩ ]z, +∞[ are nonempty and separated. Since E = E₁ ∪ E₂, E is not connected, a contradiction. Conversely, if E is not connected, then E = A ∪ B with A and B nonempty and separated. Let x ∈ A and y ∈ B and, without loss of generality, suppose x < y. Define
    z := sup(A ∩ [x, y]).
We have z ∈ cl A, hence z ∉ B; in particular x ≤ z < y. If z ∉ A, then x < z < y and z ∉ E, i.e., E is not an interval. Otherwise, if z ∈ A, then z ∉ cl B and there exists z₁ such that z < z₁ < y and z₁ ∉ B; but then x < z₁ < y with z₁ ∉ E, thus, once again, E is not an interval. □
6.50 ¶. Show that the closure of a connected set is connected.

6.51 ¶. Show that any subset of ℚ having more than one point is not connected.

6.52 ¶. Let A and B be nonempty, disjoint and open sets of a metric space X such that X = A ∪ B, and let Y be a connected subset of X. Then either Y ⊂ A or Y ⊂ B.

6.53 ¶. Let {Y_α} be a family of connected subsets of X such that ∩_α Y_α ≠ ∅. Show that Y := ∪_α Y_α is connected. [Hint: Argue by contradiction.]

6.54 ¶. Let {Y_n} be a sequence of connected subsets such that Y_n ∩ Y_{n+1} ≠ ∅ ∀n. Then Y := ∪_n Y_n is connected.
b. Connected components Because of Exercise 6.53, the following definition makes sense. 6.55 Definition. Let X be a metric space. The connected component of X containing Xo E X is the largest connected subset C xo of X such that Xo E Cxo· 6.56 Proposition. Let X be a metric space. We have the following.
(i) The distinct connected components of the points of X form a partition ofX. (ii) Each connected component C C X is a closed set. (iii) If y E Cx, then C x = C y .
(iv) If Y ⊂ X is a nonempty open and closed subset of X, then Y is a connected component of X.

Observe that the connected components are not necessarily open. For instance, consider X = ℚ, for which C_x = {x} ∀x ∈ ℚ. Of particular interest are the locally connected metric spaces, i.e., spaces X for which for every x ∈ X there exists r_x > 0 such that B(x, r_x) is connected.
6.57 Proposition. Let X be metric space. The following claims are equivalent.
(i) Each connected component is open. (ii) X is locally connected. Proof. Each point in X has a connected open neighborhood by (i), hence (ii) holds. Let C be a connected component of X, let x E C and, by assumption, let B(x, rx) be a connected ball centered at x. As B(x, r x ) is connected, trivially B(x, r x ) C C, Le., C is open. 0
6.58 Corollary. Every convex set of JR.n is connected. Proof. In fact every convex set KeRn is the union of all segments joining a fixed point XQ E K to points x E K. Then Exercise 6.53 applies. 0
The class of all connected sets of a metric space X is a topological invariant. This follows at once from the following. 6.59 Theorem. Let f : X --+ Y be a continuous map between metric spaces. If X is connected, then f(X) C Y is connected. Proof. Assume by contradiction that f(X) is not connected. Then there exist nonempty open sets C, DeY such that C n D n f(X) = 0, (C U D) n f(X) = f(X). Since f is continuous, A := f- 1(C), B := f-1(D) are nonempty open sets in X, such that An B = 0 and Au B = X. A contradiction, since X is connected. 0
Since the intervals are the only connected subsets of ℝ, we again find the intermediate value theorem of [GM1] and, more generally,

6.60 Corollary. Let f : X → ℝ be a continuous function defined on a connected metric space. Then f takes all values between any two that it assumes.

c. Segment-connected sets in ℝⁿ

In ℝⁿ we can introduce a more restrictive notion of connectedness that in some respects is more natural. If x, y ∈ ℝⁿ, a polyline joining x to y is the union of finitely many segments of the type [x_i, x_{i+1}], i = 0, 1, ..., k−1, with x₀ = x and x_k = y, where x_i ∈ ℝⁿ and [x_i, x_{i+1}] denotes the segment joining x_i with x_{i+1}. It is easy to check that a polyline joining x to y can be seen as the image or trajectory of a piecewise linear function γ : [0, 1] → ℝⁿ. Notice that piecewise linear functions are Lipschitz continuous.

6.61 Definition. We say that A ⊂ ℝⁿ is segment-connected if each pair of its points can be joined by a polyline that lies entirely in A.

If A[x] denotes the set of all points that can be joined to x by a polyline in A, we see that A is segment-connected if and only if A = A[x]. Moreover we have the following.

6.62 Proposition. Any segment-connected A ⊂ ℝⁿ is connected.
Of course, not every connected set is segment-connected: indeed, a circle in ℝ² is connected but not segment-connected. However, we have the following.

6.63 Theorem. Let A be a nonempty open set of ℝⁿ. Then A is connected if and only if A is segment-connected.

Proof. Let x₀ ∈ A, let B := A[x₀] be the set of all points that can be connected with x₀ by a polyline in A and let C := A \ A[x₀]. We now prove that both B and C are open. Since A is connected, we infer A = A[x₀], hence A is segment-connected. Let x ∈ B. Since A is open, there exists B(x, r) ⊂ A. Since x is connected with x₀ by a polyline, adding a further segment we can connect each point of B(x, r) with x₀ by a polyline. Therefore B(x, r) ⊂ B if x ∈ B, i.e., B is open. Similarly, if x ∈ C, let B(x, r) ⊂ A. No point of B(x, r) can be connected with x₀ by a polyline, since otherwise, adding a further segment, we could connect x with x₀. So B(x, r) ⊂ C if x ∈ C, i.e., C is open. □
d. Path-connectedness

Another notion of connection that makes sense in a topological space is joining by paths. Let X be a metric space. A path or a curve in X joining x with y is a continuous function f : [0, 1] → X with f(0) = x and f(1) = y. The image of the path is called the trajectory of the path.

6.64 Definition. A metric space X is said to be path-connected if any two points in X can be joined by a path.

Evidently ℝⁿ is path-connected. We have, as in Theorem 6.63, the following.

6.65 Proposition. Any path-connected metric space X is connected.

The converse is however false in general.

6.66 ¶. Consider the set A ⊂ ℝ², A = G ∪ I, where G is the graph of f(x) := sin 1/x, 0 < x < 1, and I = {0} × [−1, 1]. Show that A is connected but not path-connected.
Similarly to connected sets, if {A_α} ⊂ X are path-connected with ∩_α A_α ≠ ∅, then A := ∪_α A_α is path-connected. Because of this, one can define the path-connected component of X containing a given x₀ ∈ X as the maximal subset of X containing x₀ that is path-connected. However, examples show that the path-connected components are not closed, in general. But we have the following.
6.67 Proposition. Let X be metric space. The following claims are equivalent.
(i) Each path-connected component is open (hence closed). (ii) Each point of x has a path-connected open neighborhood. Proof. (ii) follows trivially from (i). Let C be a path-connected component of X, let x E C and by assumption let B(x, r x ) be a path-connected ball centered at x. Then trivially B(x, r x ) C C, i.e., C is open. Moreover C is also closed since X \ C is the union of the other path-connected components that are open sets, as we have proved. 0
6.68 Corollary. An open set A of ℝⁿ is connected if and only if it is path-connected.

Proof. Suppose that A is connected and let U ⊂ A be a nonempty open set. Each point x ∈ U then has a ball B(x, r) ⊂ U that is path-connected. By Proposition 6.67 any path-connected component C of A is open and closed in A. Since A is connected, C = A. □
6.3.2 Some applications Topological invariants can be used to prove that two given spaces are not homeomorphic.
6.69 Proposition. JR and JR n , n> 1, are not homeomorphic. Proof. Assume, by contradiction, that h : lR n - lR is a homeomorphism, and let Xo be a point oflR n . Then clearly lR n \ {xo} is connected, but h(lR n \ {xo}) = lR \ {h(xo)} is not connected, a contradiction. 0
Much more delicate is proving that
6.70 Theorem. JRn and JRm, n =j:. m, are not homeomorphic. The idea of the proof remains the same. It will be sufficient to have a topological invariant that distinguishes between different JRn. Similarly, one shows that [0, 1J and [0, l]n, n > 1, are not homeomorphic even if one-to-one correspondence exists. 6.71 ,. Show that for any one-to-one mapping h: [0, 1Jn continuous.
[0, 1J neither h nor h- 1 is
6.72 ¶. Show that the unit circle S¹ of ℝ² is not homeomorphic to ℝ. [Hint: Suppose h : S¹ → ℝ is a homeomorphism and let x₀ ∈ S¹. Then S¹ \ {x₀} is connected, while ℝ \ {h(x₀)} is not connected.]
6.73 Theorem. In ℝ each closed interval is homeomorphic to [−1, 1], each open interval is homeomorphic to ]−1, 1[ and each half-open interval is homeomorphic to ]−1, 1]. Moreover, no two of these three intervals are homeomorphic.

Proof. The first part is trivial. To prove the second part, it suffices to argue as in Proposition 6.69, removing respectively 2, 1, or 0 points from one of the three standard intervals, thus destroying connectedness. □
6.74~. Show that the unit ball sn := {x E ~n+11Ixl = I} in ~n+1 is connected and that Sl and sn, n > 1, are not homeomorphic. 6.75~.
and
~n \
Let A c ~n and let C C ~n be a connected set containing points of both A A. Show that C contains points of oA.
6. 76 ~. Show that the numbers of connected components and of path-connected components are topological invariants. Theorem. Let f : X ---> Y be a continuous function. The image of each connected (path-connected) component of X must lie in a connected component ofY. Moreover, if f is a homeomorphism, then f induces a one-to-one correspondence between connected (path-connected) components of X and Y. 6. 77 ~. In set theory, the following theorem of Cantor-Bernstein holds, see Theorem 3.58 of [GM2]. Theorem. If there exist injective maps X one-to-one map between X and Y.
--->
Y and Y
--->
X, then there exists a
This theorem becomes false if we also require continuity.

Theorem (Kuratowski). There may exist continuous and one-to-one maps f : X → Y and g : Y → X between metric spaces and yet X and Y are not homeomorphic.

[Hint: Let X, Y ⊂ ℝ be given by
    X = ]0, 1[ ∪ {2} ∪ ]3, 4[ ∪ {5} ∪ ⋯ ∪ ]3n, 3n+1[ ∪ {3n+2} ∪ ⋯
    Y = ]0, 1] ∪ ]3, 4[ ∪ {5} ∪ ⋯ ∪ ]3n, 3n+1[ ∪ {3n+2} ∪ ⋯
By Exercise 6.76, X and Y are not homeomorphic, since the component ]0, 1] of Y is not homeomorphic to any component of X; but the maps f : X → Y and g : Y → X given by
    f(x) := x if x ≠ 2,    f(x) := 1 if x = 2,
and
    g(x) := x/2 if x ∈ ]0, 1],    g(x) := (x − 2)/2 if x ∈ ]3, 4[,    g(x) := x − 3 otherwise,
are continuous and one-to-one.]
6.4 Exercises

6.78 ¶. Show that a continuous map between compact spaces need not be an open map, i.e., need not map open sets into open sets.
6.79 ¶. Show that an open set in ℝⁿ has at most countably many connected components. Show that this is no longer true for closed sets.
The distance between two subsets A and B of a metric space is defined by
d(A, B):= inf d(a, b). aEA bEB
Of course, the distance between two disjoint open sets or closed sets may be zero. Show that, if A is closed and B is compact, then d(A, B) > O. [Hint: Suppose :3 an, bn such that d(an, bn ) - t 0 ... J 6.81~. Let (X,dx) and (Y,dy) be metric spaces, and let (X x Y,dxxY) be their Cartesian product with one of the equivalent distances in Exercise 5.14. Let 1r : X X Y be the projection map onto the first component, 1r(x, y) := x. 1r is an open map, see Exercise 5.131. Assuming Y compact, show that 1r is a closed map, i.e., maps closed sets into closed sets. 6.82~. Let I: X - t Y be a map between two metric spaces and suppose Y is compact. Show that I is continuous if and only if its graph
Gf := {(x, y) E X x Y
Ix E X,
Y = I(x)}
is closed in X x Y endowed with one of the distances in Exercise 5.14. Show that, in general, the claim is false if Y is not compact. 6.83~. Let K be a compact set in ~2, and for every x E ~ set K x := {y E ~I K} and I(x) := diam K x , x E ~. Show that I is upper semicontinuous.
(x,y) E
6.84~. A map I : X - t Y is said to be proper if the inverse image of any compact set KeY is a compact set in X. Show that I is a closed map if it is continuous and proper.
6.85 ~. Show Theorem 6.35 using the finite covering property of X. [Hint: "IE > 0 to every x E X we can associate a il(x) > 0 such that dy(J(x),/(y)) < E/2 whenever y E X and dx(x,y) < il(x). From the open covering {B(x,il(x)n of X we can extract a finite subcovering {B(xi,rXini=l, ... ,N such that Xc B(Xl,il(Xl))U" .UB(xn,il(XN))· Set il := min{il(xI), ... , il(XN n.] 6.86~. Let I : E - t ~m be uniformly continuous on a bounded set E. Show that I(E) is bounded. [Hint: The closure of E is a compact set ... J 6.87~.
Show that (i) if I : X - t ~n and 9 : X - t ~n are uniformly continuous, then A E ~, are uniformly continuous, (ii) if I: X - t Y is uniformly continuous in A c X and B C X, then continuous in A U B.
6.88~. Let I, 9 : X uniformly continuous.
-t
~
1+ 9 I
and
AI,
is uniformly
be uniformly continuous. Give conditions such that
Ig
is
6.4 Exercises
217
6.89'. Show that the composition of uniformly continuous functions is uniformly continuous. 6.90'. Concerning maps 1 : [0, +oo[~ JR, show the following. (i) If 1 is continuous and I(x) ~ >.. E JR as x ~ +00, then 1 is uniformly continuous in [0,+00[. (ii) If 1 is continuous and has an asymptote, then 1 is uniformly continuous in [0, +00[. (iii) If 1 : [0, +oo[~ JR is uniformly continuous in [0, +00[, then there exists constants A and B such that I/(x)j S; Alxl + B "Ix 2: O. (iv) If 1 is bounded, then there exists a concave function w(t), t 2: 0, such that I/(x) - l(x)1 S; w(lx - yl) Vx,y 2: o. 6.91 ,. Let K C X be a compact subset of a metric space X and x E X \ K. Show that there exists y E K such that d(x, y) = d(x, K). 6.92 ,. Let X be a metric compact space and I(X) = X. [Hint: 1 2 , /3 , ... , are isometries.]
1 :X
~
X be an isometry. Show that
6.93 " . Show that the set of points of JR2 whose coordinates are not both rational, is connected. 6.94'. Let B be a, at most, countable subset of JR n , n > 1. Show that C := JRn \ B is segment-connected. [Hint: Assume that DEC and show that each x E C can be connected with the origin by a path contained in C, thus C is path-connected. Now if the segment [0, x] is contained in C we have reached the end of our proof, otherwise consider any segment R transversal to [0, x] and show that there is z E R such that the polyline [0, z] U [z, x] does not intersect B.] 6.95'. Let 1 : JRn ~ JR, n > 1, be continuous. Show that there are at most two points y E JR for which 1- 1 (y) is at most countable. [Hint: Take into account Exercise 6.94.J
7. Curves
The intuitive idea of a curve is that of the motion of a point in space. This idea summarizes the heritage of the ancient Greeks who used to think of curves as geometric figures characterized by specific geometric properties, as the conics, and the heritage of the XVIII century, when, with the development of mechanics, curves were thought of as the trace of a moving point.
7.1 Curves in ℝⁿ
7.1.1 Curves and trajectories
From a mathematical point of view, it is convenient to think of a curve as of a continuous map γ from an interval I of ℝ into ℝⁿ, γ ∈ C⁰(I, ℝⁿ). The image γ(I) of a curve γ ∈ C⁰(I, ℝⁿ) is called the trace or the trajectory of γ. We say that γ : I → ℝⁿ is a parametrization of Γ if γ(I) = Γ; intuitively, a curve is a (continuous) way to travel on Γ. If x, y ∈ ℝⁿ, a curve γ ∈ C⁰([a, b], ℝⁿ) such that γ(a) = x, γ(b) = y, is often called a path joining x and y. A curve is what in kinematics is called the motion law of a material point, and its image or trajectory is the "line" in ℝⁿ spanned when the point moves. If the basis in ℝⁿ is understood -as we shall do from now on, fixing the standard basis of ℝⁿ- a curve γ ∈ C⁰(I, ℝⁿ) writes as an n-tuple of continuous real-valued functions of one variable, γ(t) = (γ¹(t), γ²(t), ..., γⁿ(t)), γⁱ : I → ℝ, γⁱ(t) being the components of γ(t) ∀t ∈ I. Let k = 1, 2, ..., or ∞. We say that a curve γ ∈ Cᵏ(I, ℝⁿ) if all the components of γ are real-valued functions of class Cᵏ(I, ℝ), and that γ is a curve of class Cᵏ if γ ∈ Cᵏ(I, ℝⁿ). We also say that γ : [a, b] → ℝⁿ is a closed curve of class Cᵏ if γ is closed, i.e., γ(a) = γ(b), γ ∈ Cᵏ([a, b], ℝⁿ) and, moreover, the derivatives of order up to k of each component of γ at a and b coincide,

Dʲγⁱ(a) = Dʲγⁱ(b)   ∀i = 1, ..., n, ∀j = 1, ..., k.

If γ : I → ℝⁿ is of class C¹, the vector

γ'(t₀) := (γ¹'(t₀), γ²'(t₀), ..., γⁿ'(t₀))
is the derivative or the velocity vector of γ at t₀ ∈ I, and its modulus |γ'(t₀)| is the (scalar) velocity of γ at t₀. We also call γ'(t₀) the tangent vector to γ at t₀. Finally, if γ ∈ C²(I, ℝⁿ), the vector

γ''(t₀) := (γ¹''(t₀), γ²''(t₀), ..., γⁿ''(t₀))

is called the acceleration vector of γ at t₀.

7.1 Example (Segment). Let x and y be two distinct points in ℝⁿ. The curve s : ℝ → ℝⁿ given by

s(t) = (1 − t)x + ty = x + t(y − x),   t ∈ ℝ,

is an affine map, called the parametric equation of the line through x in the direction of y. Thus its trajectory is the line L ⊂ ℝⁿ through s(0) = x and s(1) = y, with constant vector velocity s'(t) = y − x. In kinematics, s(t) is the position of a point traveling on the straight line s(ℝ) with constant velocity |y − x|, assuming s(0) = x and s(1) = y. Therefore the restriction s|[0,1] of s,

s(t) = (1 − t)x + ty,   0 ≤ t ≤ 1,
describes the uniform motion of a point starting from x at time t = 0 and arriving in y at time t = 1 with constant speed |y − x|, and is called the parametric equation of the segment joining y to x.

7.2 Example (Uniform circular motion). The curve γ : ℝ → ℝ² given by γ(t) = (cos t, sin t) has as its trajectory the unit circle of ℝ², {(x, y) | x² + y² = 1}, with velocity one. In fact, γ'(t) = (−sin t, cos t), thus |γ'(t)| = 1 ∀t. γ describes the uniform circular motion of a point on the unit circle that starts at time t = 0 at (1, 0) and moves counterclockwise with angular velocity one, cf. [GM1]. Notice that γ' ⊥ γ and γ'' ⊥ γ', since

(γ'(t) | γ(t)) = (1/2) d|γ|²/dt (t) = 0,   (γ''(t) | γ'(t)) = (1/2) d|γ'|²/dt (t) = 0.

Finally, observe that the restriction of γ to [0, 2π[ runs on the unit circle once, since γ|[0,2π[ is injective. The uniform circular motion is better described looking at ℝ² as the Gauss plane of complex numbers, see [GM2]. Doing so, we substitute γ(t) with t → e^{it}, t ∈ ℝ, since e^{it} = cos t + i sin t.

7.3 Example (Graphs). Let f ∈ C⁰(I, ℝⁿ) be a curve. The graph of f,

G_f := {(x, y) ∈ I × ℝⁿ | x ∈ I, y = f(x)} ⊂ ℝⁿ⁺¹,

has the standard parametrization, still denoted by G_f, G_f : I → ℝⁿ⁺¹, G_f(t) := (t, f(t)), called the graph-curve of f. Observe that G_f is an injective map, in particular G_f is never a closed curve, G_f is of class Cᵏ if f is of class Cᵏ, k = 1, ..., ∞, and

G_f'(t) = (1, f'(t))

if f is of class C¹. A point that moves with the graph-curve law along the graph moves with the horizontal component of the velocity normalized to +1. Notice that |G_f'(t)| ≥ 1 ∀t.
Figure 7.1. A cylindrical helix.
7.4 Example (Cylindrical helix). If γ(t) = (a cos t, a sin t, bt), t ∈ ℝ, then γ'(t) = (−a sin t, a cos t, b), t ∈ ℝ. We see that the point γ(t) moves with constant (scalar) speed |γ'(t)| = √(a² + b²) along a helix, see Figure 7.1.
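The constant speed claimed in Example 7.4 can also be checked symbolically; the following short sketch (added here for illustration, not part of the original treatment) verifies it with sympy.

    # Symbolic check that the helix gamma(t) = (a cos t, a sin t, b t)
    # has constant speed |gamma'(t)| = sqrt(a^2 + b^2).
    import sympy as sp

    t, a, b = sp.symbols('t a b', positive=True)
    gamma = sp.Matrix([a*sp.cos(t), a*sp.sin(t), b*t])
    speed = sp.simplify(gamma.diff(t).norm())
    print(speed)   # -> sqrt(a**2 + b**2), independent of t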
7.5 Example (Different parametrizations). Different curves may have the same trace, as we have seen for uniform circular motion. As another example, the curves γ₁(t) := (t, 0), γ₂(t) := (t³, 0) and γ₃(t) := (t(t² − 1), 0), t ∈ ℝ, are different parametrizations of the abscissa-axis of ℝ²; of course, the three parametrizations give rise to different motions along the x-axis. Similarly, the curves σ₁(t) = (t³, t²) and σ₂(t) = (t, (t²)^{1/3}), t ∈ ℝ, are different parametrizations of (a) Figure 7.2. Notice that σ₁ is a C^∞-parametrization, while σ₂ is continuous but not of class C¹.

7.6 Example (Polar curves). Many curves are more conveniently described by a polar parametrization: instead of giving the evolution of the Cartesian coordinates of γ(t) := (x(t), y(t)), we give two real functions θ(t) and ρ(t) that describe, respectively, the angle of γ(t) measured from the positive part of the abscissa axis and the distance of γ(t) from the origin, so that in Cartesian coordinates

γ(t) = (ρ(t) cos θ(t), ρ(t) sin θ(t)).

If the independent variable t coincides with the angle θ, θ(t) = t, we obtain a polar curve γ(θ) = (ρ(θ) cos θ, ρ(θ) sin θ).
In the literature there are many classical curves that have been studied for their relevance in many questions. Listing them would be incredibly long, but we shall illustrate some of them in Section 7.1.3.
Figure 7.2. (a) γ(t) = (t³, t²), (b) γ(t) = (t³ − 4t, t² − 4).
a. The calculus
Essentially the entire calculus, with the exception of the mean value theorem, can be carried over to curves.

7.7 Definition. Let γ ∈ C⁰([a, b]; ℝⁿ), γ = (γ¹, γ², ..., γⁿ). The integral of γ on [a, b] is the vector in ℝⁿ

∫_a^b γ(s) ds := ( ∫_a^b γ¹(s) ds, ∫_a^b γ²(s) ds, ..., ∫_a^b γⁿ(s) ds ).

7.8 Proposition. If γ ∈ C⁰([a, b]; ℝⁿ), then |∫_a^b γ(s) ds| ≤ ∫_a^b |γ(s)| ds.

Proof. Suppose that ∫_a^b γ(s) ds ≠ 0, otherwise the claim is trivial. For all v = (v¹, v², ..., vⁿ) ∈ ℝⁿ we have

(v | ∫_a^b γ(s) ds) = ∫_a^b (v | γ(s)) ds;

using Cauchy's inequality we deduce

|(v | ∫_a^b γ(s) ds)| = |∫_a^b (v | γ(s)) ds| ≤ ∫_a^b |(v | γ(s))| ds ≤ |v| ∫_a^b |γ(s)| ds

for all v ∈ ℝⁿ. Therefore it suffices to choose v := ∫_a^b γ(s) ds to find the desired result. □
If γ ∈ C¹([a, b], ℝⁿ) and n > 1, the mean value theorem does not hold. Indeed, if γ(t) = (cos t, sin t), t ∈ [0, 2π], and s ∈ [0, 2π] is such that

0 = γ(2π) − γ(0) = 2π γ'(s),

we reach a contradiction, since |γ'(s)| = |(−sin s, cos s)| = 1. However, the fundamental theorem of calculus, applied to the components, yields the following.

7.9 Theorem. Let γ ∈ C¹([a, b]; ℝⁿ). Then

γ(b) − γ(a) = ∫_a^b γ'(s) ds.

Finally, we notice that Taylor's formula extends to curves simply by writing it for each component: for a curve γ of class Cᵏ near t₀,

γ(t₀ + h) = Σ_{j=0}^{k} γ^{(j)}(t₀) hʲ / j! + o(hᵏ)   as h → 0.    (7.1)
Figure 7.3. Some trajectories: from the left, (a) simple curve, (b) simple closed curve, (c), (d), (f) curves that are not simple.
b. Self-intersections
Traces of curves may have self-intersections, i.e., in general, curves are not injective. In (b) Figure 7.2 the trace of the curve γ(t) = (t³ − 4t, t² − 4), t ∈ ℝ, self-intersects at the origin. One defines the multiplicity of a curve γ ∈ C⁰(I, ℝⁿ) at x ∈ ℝⁿ as the number (possibly infinite) of t's such that γ(t) = x. Of course, the trace of γ is the set of points with multiplicity at least 1. We shall distinguish two cases.
(i) γ : I → ℝⁿ is not closed, i.e., γ(a) ≠ γ(b). In this case we say that γ is simple if γ is injective, i.e., all points of its trajectory have multiplicity 1. Notice that, if I = [a, b], then γ is simple if and only if γ is a homeomorphism of [a, b] onto γ([a, b]), [a, b] being compact, see Corollary 6.19. In contrast, if I is not compact, I and γ(I) in general are not homeomorphic. For instance, let I = [0, 2π[ and γ(t) := (cos t, sin t), t ∈ I, be the uniform circular motion. Then γ(I) is the unit circle, which is not homeomorphic to I, see Exercise 6.72.
(ii) γ is a closed curve, i.e., I = [a, b] and γ(a) = γ(b). In this case we say that γ is a simple closed curve if the restriction of γ to [a, b[ is injective or, equivalently, if all points of the trajectory of γ, but γ(a) = γ(b), have multiplicity 1.
A (closed) curve γ has self-intersections if it is not a (closed) simple curve.

7.10 ¶. Show that any closed curve γ : [a, b] → ℝⁿ can be seen as a continuous map from the unit circle S¹ ⊂ ℝ². Furthermore show that its trace is homeomorphic to S¹ if γ is simple.

7.11 ¶. Study the curves (x(t), y(t)), x(t) = 2t/(1 + t²), y(t) = (t² − 1)/(1 + t²), and x(t) = t²/(1 + t⁶), y(t) = t³/(1 + t⁶).
c. Equivalent parametrizations
Many properties of curves are independent of the choice of the parameter, that is, are invariant under homeomorphic changes of the parameter. This is the case for the multiplicity function and, as we shall see later, for the length. For this reason, it is convenient to introduce the following definition.
7.12 Definition. Let I, J be intervals and let γ ∈ C⁰(I, ℝⁿ) and δ ∈ C⁰(J, ℝⁿ). We say that δ is equivalent to γ if there is a continuous one-to-one map h : J → I such that

δ(s) = γ(h(s))   ∀ s ∈ J.

In other words, δ is equivalent to γ if δ reduces to γ modulo a continuous change of variable in the time axis. Since the inverse h⁻¹ : I → J of a continuous one-to-one map between intervals is also continuous, see [GM1], we have that γ is equivalent to δ iff δ is equivalent to γ. Actually one sees that the relation of equivalence among curves is an equivalence relation. Trivially, two equivalent curves have the same trace and the same multiplicity function; the converse is in general false.

7.13 Example. γ(t) = (cos t, sin t), t ∈ [0, 2π], and δ(t) = (cos t, sin t), t ∈ [0, 4π], have the same trace but are not equivalent since their multiplicity functions are different.

However, we have the following.

7.14 Theorem. Two simple curves with the same trace are equivalent.

Proof. Assume for simplicity that the two curves γ ∈ C⁰(I, ℝⁿ) and δ ∈ C⁰(J, ℝⁿ), I and J being intervals, are not closed. Set h := γ⁻¹ ∘ δ, which clearly is a one-to-one and continuous map from J to I. h is then a homeomorphism, see [GM1], and clearly

δ(t) = γ ∘ γ⁻¹ ∘ δ(t) = γ(h(t))   ∀ t ∈ J.  □

The notion of equivalence between curves can be made more precise.

7.15 Definition. Let γ ∈ C⁰(I, ℝⁿ) and δ ∈ C⁰(J, ℝⁿ) be two equivalent curves, and let h : J → I be a homeomorphism such that δ(t) = γ(h(t)) ∀t ∈ J. We say that γ and δ have the same orientation if h is monotone-increasing and opposite orientation if h is monotone-decreasing. Since every homeomorphism between intervals is either strictly increasing or strictly decreasing, see [GM1], two equivalent curves either have the same orientation or have opposite orientations. In this way, the set of curves can be partitioned into equivalence classes, and each class decomposes into two disjoint subclasses: equivalent curves with the same orientation and equivalent curves with opposite orientation.
7.1.2 Regular curves and tangent vectors
a. Regular curves
We say that a curve γ of class C¹ is regular if γ'(t) ≠ 0 ∀t. It is also convenient to reconsider the notion of equivalence in the category of curves of class C¹.
7.16 Definition. Let I, J be intervals. Two curves γ ∈ C¹(I, ℝⁿ), δ ∈ C¹(J, ℝⁿ) of class C¹ are C¹-equivalent if there exists a one-to-one map h : J → I of class C¹ with h'(s) ≠ 0 ∀s ∈ J such that δ(s) = γ(h(s)) ∀s ∈ J.

Clearly, C¹-equivalent curves have the same trace. We can prove that being C¹-equivalent is an equivalence relation between regular curves; actually we shall prove the following result after Proposition 7.37.

7.17 Theorem. Let γ and δ be two curves of class C¹, and suppose they are regular. Then γ and δ are C¹-equivalent if and only if they are C⁰-equivalent.

Since every function h of class C¹ with h' ≠ 0 ∀t is either strictly increasing or strictly decreasing, since h' cannot change sign, any two C¹-equivalent curves either have the same orientation or have opposite orientation. In this way the set of C¹-curves can be partitioned into equivalence classes, and each class decomposes into two disjoint subclasses: C¹-equivalent curves with the same orientation and C¹-equivalent curves with opposite orientation.
b. Tangent vectors
Let γ : I → ℝⁿ be a simple, regular curve of class C¹ and let Γ := γ(I) be its trace. If x ∈ Γ, there exists a unique t ∈ I such that γ(t) = x.

7.18 Definition. The space of tangent vectors to the trace Γ at x ∈ Γ is defined as the space of all multiples of γ'(t),

Tan_x Γ := Span {γ'(t)},   γ(t) = x.

The unit tangent vector to γ at x := γ(t) is defined by

n_γ(x) := γ'(t)/|γ'(t)|,   γ(t) = x.

Notice that the previous definition makes sense, since one proves that Span {γ'(t)}, where γ(t) = x, depends only on the trace of γ and on x. In fact, if γ : I → ℝⁿ and δ : J → ℝⁿ are two simple regular curves with the same trace Γ, then Theorems 7.14 and 7.17 yield that γ and δ are C¹-equivalent, i.e., there exists h : J → I one-to-one and of class C¹ with h'(s) ≠ 0 ∀s ∈ J such that δ(s) = γ(h(s)) ∀s ∈ J. Differentiating, we get

δ'(s) = h'(s) γ'(h(s)),

that is, δ'(s) and γ'(t) are multiples of each other when δ(s) = γ(t) =: x. Moreover,

n_δ(x) = δ'(s)/|δ'(s)| = sgn(h'(s)) γ'(h(s))/|γ'(h(s))| = sgn(h'(s)) n_γ(x),
Figure 7.4. A polygonal line inscribed on a curve.
that is, two C¹-equivalent curves with the same orientation have the same unit tangent vector, and two C¹-equivalent curves with opposite orientation have opposite unit tangent vectors. Remaining in a classical context, it is convenient to also introduce the families of piecewise-C¹ curves and piecewise regular curves.
7.19 Definition. A curve γ : [a, b] → ℝⁿ is said to be piecewise-C¹ (respectively, piecewise regular) if γ ∈ C⁰([a, b], ℝⁿ) and there exist finitely many points a = t₀ < t₁ < ... < t_N = b such that the restrictions γ|[t_{i−1}, t_i] are of class C¹ (respectively, regular) for all i = 1, ..., N.

We emphasize that in Definition 7.19 γ is required to be continuous everywhere in [a, b], while derivatives are required to exist everywhere except at finitely many points, where only left- and right-derivatives exist. Notice also that piecewise-C¹ curves are Lipschitz continuous.

7.20 ¶. Let γ : [a, b] → ℝⁿ be a piecewise regular curve. Show that every point in γ([a, b]) has finite multiplicity. Show a piecewise regular curve that has infinitely many points of multiplicity 2.

7.21 ¶. Show that

γ(b) − γ(a) = ∫_a^b γ'(s) ds

if γ : [a, b] → ℝⁿ is piecewise-C¹.
c. Length of a curve
Recall that a partition σ of [a, b] is a choice of finitely many points t₀, ..., t_N with a = t₀ < t₁ < ... < t_N = b. Denote by S the family of partitions of [a, b]. For each partition σ = {t₀, t₁, ..., t_N} ∈ S one computes the length of the polygonal line P(σ) joining the points γ(t₀), γ(t₁), ..., γ(t_N) in the listed order, Figure 7.4,

P(σ) := Σ_{i=1}^{N} |γ(t_i) − γ(t_{i−1})|.
Figure 7.5. The graph of f(x) = x sin(1/x), x ∈ [0, 1], is not rectifiable.
7.22 Definition. Let γ ∈ C⁰([a, b]; ℝⁿ). The length of γ is defined as

L(γ) := sup { P(σ) | σ ∈ S },

and we say that γ is rectifiable, or that γ has finite total variation, if L(γ) < +∞.

In other words, the length of a curve is the supremum of the lengths of all inscribed polygonals. The following is easily seen.

7.23 Proposition. If γ and δ are equivalent, then L(γ) = L(δ). In particular γ and δ are either both rectifiable or not, and the length of a simple curve depends only on its trace.

7.24 ¶. Prove Proposition 7.23.

7.25 ¶. Let γ : [a, b] → ℝⁿ be a curve and let a < c < b. Show that L(γ) = L(γ|[a,c]) + L(γ|[c,b]).

7.26 ¶. Show that if γ(t) = (cos t, sin t), t ∈ [0, 2π], we have L(γ) = 2π, while if γ(t) = (cos t, sin t), t ∈ [0, 4π], we have L(γ) = 4π.
7.27 Example. Curves γ ∈ C⁰([a, b]; ℝⁿ) need not be rectifiable, i.e., of finite length. Indeed the curve graph of f, γ(x) = (x, f(x)), where
Figure 7.6. A closed curve that is not rectifiable.
Figure 7.7. An approximation of the von Koch curve.
f(x) := x sin(1/x)  if x ∈ ]0, 1],   f(x) := 0  if x = 0,

has infinite length, see Figure 7.5. Indeed, if

x_n := 1/(nπ + π/2),   n ∈ ℕ,

the length of γ|[x_{n+1}, x_n] is larger than x_n |sin(1/x_n)| = x_n, hence for any n

L(γ) ≥ Σ_{k=1}^{n} x_k = Σ_{k=1}^{n} 1/(kπ + π/2),

i.e., L(γ) = ∞. Notice that γ belongs to C⁰([0, 1], ℝ²) ∩ C¹(]0, 1], ℝ²), but γ' is not bounded in a neighborhood of 0.
7.28 Example (The von Koch curve). Clearly a bounded region of the plane may be enclosed by a curve of arbitrarily large length, think of the coasts of Great Britain or of Figure 7.6. A curve of infinite length enclosing a finite area is the von Koch curve that is constructed as follows. Start from an equilateral triangle, replace the middle third of each line segment with the two sides of an equilateral triangle whose third side is the middle third that we want to remove. Then one iterates the procedure indefinitely. One can show that the iterated curves converge uniformly to a curve, called the von Koch curve, which (i) is a continuous simple curve, (ii) has infinite length and encloses a finite area, (iii) is not differentiable at any point.
7.29 ¶. Show that each iteration in the construction of von Koch's curve increases its length by a factor 4/3 and that, given any two points on the curve, the length of the arc between the two points is infinite. Finally, show that the surface enclosed by von Koch's curve is 8/5 of the surface of the initial triangle.
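A quick numerical illustration of Exercise 7.29 (a sketch added here, not part of the original text): iterating the construction multiplies the perimeter by 4/3 at each step, while the enclosed area approaches 8/5 of the area of the starting triangle.

    # Perimeter and area of the first iterates of the von Koch construction,
    # starting from an equilateral triangle of side 1.
    import math

    side = 1.0
    perimeter = 3 * side
    area = math.sqrt(3) / 4 * side ** 2
    n_sides, A0 = 3, area

    for k in range(1, 11):
        # every side spawns a new equilateral triangle on its middle third
        new_side = side / 3 ** k
        area += n_sides * math.sqrt(3) / 4 * new_side ** 2
        n_sides *= 4
        perimeter *= 4 / 3
        print(k, round(perimeter, 4), round(area / A0, 6))
    # the perimeter grows like (4/3)^k, and area/A0 tends to 8/5 = 1.6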
7.30 Example (The Peano curve). Continuous nonsimple curves may be quite pathological. Giuseppe Peano (1858-1932) showed in 1890 an example of a continuous curve,/, : [0,1] --> [0,1] x [0, IJ whose trace is the entire square: any such curve is called a Peano curve. Following David Hilbert (1862-1943), one such curve may be constructed as follows. Consider the sequence of continuous curves '/'i : [0,1] --> IR2 as in Figure 7.8. The curve at step i is obtained by modifying the curve at step (i - 1) in an interval of width 2- i and in a corresponding square of side 2- i on the target. The sequence of these curves therefore converges uniformly to a continuous curve, whose trace is the
Figure 7.8. Construction of a Peano curve according to Hilbert.
entire square. Of course, γ is not injective, otherwise we would conclude that [0, 1] is homeomorphic to the square [0, 1]², compare Proposition 6.69. Another way of constructing a Peano curve, closer to the original proof of Peano who used ternary representations of reals, is the following. Represent each x ∈ [0, 1] in its dyadic expansion, x = Σ_{i=1}^{∞} b_i/2ⁱ, b_i ∈ {0, 1}, choosing not to have representations ending with period 1. If x = Σ_{i=1}^{∞} b_i/2ⁱ ∈ [0, 1], set

γ(x) := ( Σ_{i=1}^{∞} b_{2i−1}/2ⁱ , Σ_{i=1}^{∞} b_{2i}/2ⁱ ).
Using the fact that the alignment "changes" by a small quantity if x varies in a sufficiently small interval, we easily infer that "Y is continuous. On the other hand, "Y is trivially surjective.
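The digit-interleaving map just described can be explored at finite precision; the following sketch (ours, for illustration only, with a hypothetical helper name) splits the first 2m binary digits of x into odd- and even-indexed digits and shows that the resulting points spread over the whole unit square.

    # Finite-precision version of the digit-interleaving map.
    def peano_like(x, m=20):
        digits = []
        for _ in range(2 * m):               # first 2m binary digits of x
            x *= 2
            d = int(x)
            digits.append(d)
            x -= d
        u = sum(d / 2 ** (i + 1) for i, d in enumerate(digits[0::2]))
        v = sum(d / 2 ** (i + 1) for i, d in enumerate(digits[1::2]))
        return u, v

    # sampling [0, 1[ finely, the images spread over the whole unit square
    points = [peano_like(k / 4096) for k in range(4096)]
    print(min(p[0] for p in points), max(p[0] for p in points),
          min(p[1] for p in points), max(p[1] for p in points))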
No pathological behavior occurs for curves of class C¹. In particular, there is a formula for computing their length.

7.31 Theorem. Let γ ∈ C¹([a, b]; ℝⁿ). Then γ is rectifiable and

L(γ) = ∫_a^b |γ'(s)| ds.
Proof. Let σ ∈ S be a partition of [a, b] and P(σ) the length of the polygonal line corresponding to σ. The fundamental theorem of calculus yields

γ(t_i) − γ(t_{i−1}) = ∫_{t_{i−1}}^{t_i} γ'(s) ds,

hence

|γ(t_i) − γ(t_{i−1})| ≤ ∫_{t_{i−1}}^{t_i} |γ'(s)| ds.

Summing over i, we conclude

P(σ) = Σ_{i=1}^{N} |γ(t_i) − γ(t_{i−1})| ≤ ∫_a^b |γ'(s)| ds

for σ arbitrary. This shows that γ is rectifiable, i.e., L(γ) = sup_σ P(σ) ≤ ∫_a^b |γ'(s)| ds < ∞. It remains to show that

∫_a^b |γ'(s)| ds ≤ L(γ),

or, equivalently, that for any ε > 0 there is a partition σ_ε such that

∫_a^b |γ'(s)| ds ≤ P(σ_ε) + ε (b − a).    (7.2)
Figure 7.9. The first page of the paper of Giuseppe Peano (1858-1932) appeared in Mathematische Annalen.
We observe that for every s ∈ [t_{i−1}, t_i] we have

|γ(t_i) − γ(t_{i−1})| = | ∫_{t_{i−1}}^{t_i} γ'(τ) dτ | ≥ |γ'(s)| (t_i − t_{i−1}) − ε (t_i − t_{i−1}),

consequently

|γ'(s)| ≤ |γ(t_i) − γ(t_{i−1})| / (t_i − t_{i−1}) + ε,

provided we choose the partition σ_ε := (t₀, t₁, ..., t_N) in such a way that

|γ'(s) − γ'(t)| ≤ ε   if s, t ∈ [t_{i−1}, t_i]    (7.3)

(such a choice is possible since γ' : [a, b] → ℝⁿ is uniformly continuous in [a, b] by the Heine-Cantor theorem, Theorem 6.35). The conclusion then follows integrating with respect to s on [t_{i−1}, t_i] and summing over i. □
Of course Theorem 7.31 also holds for piecewise-C¹ curves: if γ ∈ C⁰([a, b], ℝⁿ), a = t₀ < t₁ < ... < t_N = b and γ ∈ C¹([t_{i−1}, t_i]; ℝⁿ) ∀ i = 1, ..., N, then

L(γ) = Σ_{i=1}^{N} ∫_{t_{i−1}}^{t_i} |γ'(t)| dt = ∫_a^b |γ'(t)| dt < +∞.
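The two descriptions of the length, as a supremum of inscribed polygonal lines and as the integral of the speed, can be compared numerically; the following sketch (ours, not from the original text) does so for the unit circle, whose length is 2π.

    # Inscribed polygonal lengths P(sigma) versus the integral of |gamma'|
    # for gamma(t) = (cos t, sin t), t in [0, 2*pi].
    import math

    def gamma(t):
        return (math.cos(t), math.sin(t))

    def polygonal_length(n):
        ts = [2 * math.pi * i / n for i in range(n + 1)]
        return sum(math.dist(gamma(ts[i]), gamma(ts[i - 1])) for i in range(1, n + 1))

    for n in (4, 16, 64, 256, 1024):
        print(n, polygonal_length(n))
    print("integral of |gamma'|:", 2 * math.pi)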
7.32 Lipschitz curves. Lipschitz curves, i.e., curves γ : [a, b] → ℝⁿ for which there is L > 0 such that

|γ(t) − γ(s)| ≤ L |t − s|   ∀ t, s ∈ [a, b],
are rectifiable. In fact, for every partition σ with a = t₀ < t₁ < ... < t_N = b, we have

P(σ) = Σ_{i=1}^{N} |γ(t_i) − γ(t_{i−1})| ≤ L (b − a).
Quite a bit more complicated is the problem of finding an explicit formula for the length of a Lipschitz curve or, more generally, of a rectifiable curve. This was solved with the contributions of Henri Lebesgue (1875-1941), Giuseppe Vitali (1875-1932), Tibor Radó (1895-1965), Hans Rademacher (1892-1969) and Leonida Tonelli (1885-1946), using several results of a more refined theory of integration, known as Lebesgue integration theory.

7.33 ¶ The length formula holds for primitives. Let γ : [a, b] → ℝⁿ be a curve. Suppose there exists a Riemann integrable function ψ : [a, b] → ℝⁿ such that

γ(t) = γ(a) + ∫_a^t ψ(s) ds   ∀t ∈ [a, b].

Show that γ is rectifiable and L(γ) = ∫_a^b |ψ(t)| dt.
7.34 ¶. Show that two regular curves that are C¹-equivalent have the same length. [Hint: Use the formula of integration by substitution.]

7.35 Example (Length of graphs). Let f ∈ C¹([a, b], ℝ). The graph of f, G_f : [a, b] → ℝ², G_f(t) = (t, f(t)), is regular and G_f'(t) = (1, f'(t)). Thus the length of G_f is

L(G_f) = ∫_a^b √(1 + |f'(t)|²) dt.
7.36 Example (Length in polar coordinates). (i) Let ρ : [a, b] → ℝ₊, θ : [a, b] → ℝ be functions of class C¹ and let γ(t) = (x(t), y(t)) be the corresponding plane curve in polar coordinates, γ(t) = (ρ(t) cos θ(t), ρ(t) sin θ(t)). Since |γ'|² = x'² + y'² = ρ'² + ρ²θ'², we infer

L(γ) = ∫_a^b √(ρ'² + ρ²θ'²) dt;

in particular, for a polar curve γ(θ) = (ρ(θ) cos θ, ρ(θ) sin θ), we have

L(γ) = ∫_a^b √(ρ'² + ρ²) dθ.

(ii) Let ρ : [a, b] → ℝ₊, θ : [a, b] → ℝ and f : [a, b] → ℝ be of class C¹ and let γ(t) := (x(t), y(t), z(t)), t ∈ [a, b], be the curve in space defined by the cylindrical coordinates (ρ(t), θ(t), f(t)), i.e., γ(t) := (ρ(t) cos θ(t), ρ(t) sin θ(t), f(θ(t))). Since |γ'|² = ρ'² + ρ²θ'² + f'²θ'², we infer

L(γ) = ∫_a^b √(ρ'² + ρ²θ'² + f'²θ'²) dt.

(iii) For a curve in spherical coordinates (ρ(t), θ(t), φ(t)),

x(t) = ρ(t) sin φ(t) cos θ(t),   y(t) = ρ(t) sin φ(t) sin θ(t),   z(t) = ρ(t) cos φ(t),

the length is

L(γ) = ∫_a^b √(ρ'² + ρ²φ'² + ρ² sin²φ θ'²) dt.
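As a small numerical illustration of the polar formula of Example 7.36 (a sketch added here, not part of the original text), the length of one turn of the Archimedean spiral ρ(θ) = aθ computed from ∫√(ρ'² + ρ²) dθ agrees with the length of a fine inscribed polygonal line.

    # Length of the Archimedean spiral rho = a*theta, 0 <= theta <= 2*pi.
    import math

    a, N = 1.0, 200000
    thetas = [2 * math.pi * i / N for i in range(N + 1)]

    # trapezoidal rule for the integral of sqrt(a^2 + (a*theta)^2)
    f = [math.sqrt(a ** 2 + (a * t) ** 2) for t in thetas]
    integral = sum((f[i] + f[i - 1]) / 2 * (thetas[i] - thetas[i - 1]) for i in range(1, N + 1))

    # length of the inscribed polygonal line
    pts = [(a * t * math.cos(t), a * t * math.sin(t)) for t in thetas]
    poly = sum(math.dist(pts[i], pts[i - 1]) for i in range(1, N + 1))

    print(integral, poly)   # both approximately 21.256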
Figure 7.10. Arc length or curvilinear coordinate.
d. Arc length and C¹-equivalence
Let γ ∈ C¹([a, b]; ℝⁿ) be a curve of class C¹ and regular, γ'(t) ≠ 0 ∀t. The function s_γ : [a, b] → ℝ that for each t ∈ [a, b] gives the length of γ|[a,t],

s_γ(t) := L(γ|[a,t]) = ∫_a^t |γ'(s)| ds,

is called the arc length or curvilinear abscissa of γ. We have
(i) s_γ(t) is continuous, nondecreasing and maps [a, b] onto [0, L], L being the length of γ. Moreover s_γ is differentiable at every point and

s_γ'(t) = |γ'(t)|   ∀t ∈ [a, b];

(ii) since γ is regular, γ'(t) ≠ 0 ∀t ∈ [a, b], s_γ(t) is in fact strictly increasing; consequently, its inverse t_γ : [0, L] → [a, b] is strictly increasing, too, and, by the differentiation theorem of the inverse, see [GM1], t_γ is of class C¹ and

t_γ'(s) = 1/|γ'(t_γ(s))|   ∀ s ∈ [0, L].    (7.4)

With the previous notation, the reparametrization by arc length of γ is defined as the curve δ_γ : [0, L] → ℝⁿ given by

δ_γ(s) := γ(t_γ(s)),   s ∈ [0, L].

Differentiating, we get

δ_γ'(s) = γ'(t_γ(s)) t_γ'(s) = γ'(t_γ(s))/|γ'(t_γ(s))|,   hence |δ_γ'(s)| = 1.

As a consequence, the arc length reparametrization of a regular curve γ of length L is a curve δ : [0, L] → ℝⁿ that is C¹-equivalent to γ, has the same orientation of γ and for which |δ'(s)| = 1 ∀s ∈ [0, L]. It is actually the unique reparametrization with these properties.
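The reparametrization by arc length can also be carried out numerically; the following sketch (ours, with hypothetical helper names, not part of the original text) inverts an approximate s_γ by bisection for the parabola graph γ(t) = (t, t²), and the resulting sample points are (approximately) equally spaced along the trace.

    # Approximate arc length reparametrization of gamma(t) = (t, t^2), t in [0, 2].
    import math, bisect

    def gamma(t):
        return (t, t * t)

    N = 100000
    ts = [2 * i / N for i in range(N + 1)]
    pts = [gamma(t) for t in ts]

    # cumulative arc length s_gamma(t_i), built from the inscribed polygonal line
    s = [0.0]
    for i in range(1, N + 1):
        s.append(s[-1] + math.dist(pts[i], pts[i - 1]))
    L = s[-1]

    def delta(sv):
        # invert s_gamma approximately by bisection in the table s
        i = bisect.bisect_left(s, sv)
        return pts[min(i, N)]

    samples = [delta(k * L / 10) for k in range(11)]
    gaps = [math.dist(samples[k], samples[k - 1]) for k in range(1, 11)]
    print(L, gaps)   # the ten gaps are all close to L/10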
7.37 Proposition. Let φ : [a, b] → ℝⁿ and ψ : [c, d] → ℝⁿ be two C¹-equivalent curves with the same orientation, ψ(s) = φ(h(s)) ∀s ∈ [c, d], for some h : [c, d] → [a, b] of class C¹ with h' > 0, and length L. Then

s_ψ(t) = s_φ(h(t))   ∀t ∈ [c, d],   and   h(t_ψ(s)) = t_φ(s)   ∀s ∈ [0, L],

hence δ_φ(s) = δ_ψ(s) ∀s ∈ [0, L].
Figure 7.11. Maria Agnesi (1718-1799) and a page from the Editio princeps of the works of Archimedes of Syracuse (287BC-212BC).
Proof. If ψ(s) = φ(h(s)) with h' > 0, then for any t ∈ [c, d]

s_ψ(t) = ∫_c^t |ψ'(r)| dr = ∫_c^t |φ'(h(r))| h'(r) dr = ∫_a^{h(t)} |φ'(τ)| dτ = s_φ(h(t)).

Consequently h(t_ψ(s)) = t_φ(s) and δ_ψ(s) = ψ(t_ψ(s)) = φ(h(t_ψ(s))) = φ(t_φ(s)) = δ_φ(s) for all s ∈ [0, L]. □
Proof of Theorem 7.17. Assume that δ ∈ C¹([c, d], ℝⁿ), γ ∈ C¹([a, b], ℝⁿ), γ regular, h : [c, d] → [a, b] is continuous and increasing and δ(s) = γ(h(s)) ∀s ∈ [c, d]. Then the functions

β(s) := L(δ|[c,s]) = ∫_c^s |δ'(r)| dr,   s ∈ [c, d],
α(t) := L(γ|[a,t]) = ∫_a^t |γ'(r)| dr,   t ∈ [a, b],

are of class C¹ and β(s) = L(δ|[c,s]) = L(γ|[a,h(s)]) = α(h(s)) ∀s ∈ [c, d], see Proposition 7.37. Since γ is regular, α(t) is invertible with inverse of class C¹, hence h(s) = α⁻¹(β(s)) and h is of class C¹. □
7.1.3 Some celebrated curves Throughout the centuries, mathematicians, artists, scholars of natural sciences and layman have had an interest in plane curves, their variety of forms, and their occurrence in many natural phenomena. As a consequence
Figure 7.12. (a) Archimedes's spiral, (b) Fermat's spiral, (c) Hyperbolic spiral.
there is a large literature which attempts to classify plane curves according to their properties, focusing on their constructive aspects or by simply providing catalogs. In this section we shall present some of these famous curves.

a. Spirals
Spirals are probably among the best known curves, the first and simplest being the spiral of Archimedes. This is the curve described by a point that moves with constant velocity along a half-line that rotates with constant angular velocity about its origin. If the origin of the half-line is the origin of a Cartesian plane, we have ρ = vt, θ = ωt, thus the polar form of Archimedes's spiral is

ρ = aθ,   a := v/ω.

Other spirals are obtained assuming that the motion along the half-line is accelerated, for instance ρ = aθⁿ. All these spirals begin at the origin at θ = 0 and move away from the origin as θ increases.
Figure 7.13. (a) Lituus, (b) Logarithmic spiral, (c) Cayley's sextic.
Figure 7.14. (a) Cardioid, (b) Lemniscate, (c) L'Hospital cubic.
ARCHIMEDEAN SPIRALS. These are the curves defined by ρ = aθᵐ, m ∈ ℚ. Among them, see Figures 7.12 and 7.13, we mention
o Archimedes's spiral ρ = aθ,
o Fermat's spiral ρ² = a²θ,
o the hyperbolic or inverse spiral ρ = a/θ,
o the lituus ρ² = a²/θ,
o the logarithmic or equiangular spiral θ = log_A ρ, i.e., ρ = A^θ, A > 1. It is the spiralis mirabilis of Johann Bernoulli (1667-1748). It (actually, its tangent at every point) forms a constant angle with any ray from the origin, and every ray intersects the logarithmic spiral in a sequence of points with distances in a geometric progression. It is probably the spiral that one finds most frequently in nature, expressing growth proportional to the organism, as in shells, pine cones, sunflowers or in galaxies.
SINUSOIDAL SPIRALS. A large variety of curves is described by the sinusoidal spirals ρⁿ = aⁿ cos(nθ), n rational. For instance,
o Cayley's sextic ρ = 4a cos³(θ/3), see Figure 7.13, that we can also write in an implicit form as the set of points (x, y) such that 4(x² + y² − ax)³ = 27a²(x² + y²)²,
Figure 7.15. (a) Parabolic spiral a = 1, b = 0.7, (b) Euler's spiral.
Figure 7.16. (a) The conchoid, (b) The conchoid of Nicomedes a = 4, b = 2, (c) Limacon of Pascal a = b = 1.
o Cardioid ρ = 2a(1 + cos θ), see Figure 7.14, that we can write implicitly as the set of points (x, y) such that (x² + y² − 2ax)² = 4a²(x² + y²),
o Lemniscate of Bernoulli ρ² = a² cos(2θ), see Figure 7.14, equivalently the set of points (x, y) such that (x² + y²)² = a²(x² − y²),
o Cubic of de l'Hospital: ρ cos³(θ/3) = a, see Figure 7.14.
Other well-known spirals are, see Figure 7.15,
PARABOLIC SPIRALS. (ρ − a)² = b²θ,
EULER'S SPIRAL. γ(t) = (x(t), y(t)) where x(t) = ±∫_0^t (sin s)/√s ds, y(t) = ±∫_0^t (cos s)/√s ds, 0 < t < ∞.

b. Conchoids
According to Diadochus Proclus (411-485), Nicomedes (280BC-210BC) studied the problem of the trisection of an angle by means of the conchoids. Let O be a fixed point, and let ℓ be a line through O intersecting a trajectory C at a point Q. The locus of the points P₁ and P₂ on ℓ such that P₁Q = QP₂ = k = const is a conchoid of C with respect to O, see Figure 7.16.
CONCHOID OF NICOMEDES. It is the conchoid of a line with respect to a point not on the line. If the line ℓ is x = b and the point O is (0, 0), then the conchoid has parametric equations γ(t) = (x(t), y(t)) where

x(t) = b ± ab/√(t² + b²),   y(t) = t ± at/√(t² + b²),

i.e.,

x = b + a cos θ,   y = (b + a cos θ) tan θ,

by the change of variable t = b tan θ, see Figure 7.16. We can write it also in polar coordinates as

ρ(θ) = a + b/cos θ,
or as the set of points (x, y) such that

(x² + y²)(x − b)² = a²x².

LIMAÇON OF PASCAL. (Étienne Pascal (1588-1640), the father of Blaise Pascal.) It is the conchoid of a circle of radius a with respect to a point on the circle. If O is the origin and ρ = 2a cos θ is the polar equation of the circle of center (a, 0) through (0, 0), the polar equation of the limaçon is ρ = 2a cos θ + b, see Figure 7.16. Choosing b = 2a, the limaçon becomes a cardioid.
CONCHOID OF DÜRER. Let Q = (q, 0) and R = (0, r) be points such that q + r = b. The locus of points P and P', on the straight line through Q and R, with distance a from Q is Dürer's conchoid (Albrecht Dürer (1471-1528)), see Figure 7.18. Its Cartesian equation may be found by eliminating q and r from the equations

q + r = b,   (x − q)² + y² = a²,   rx + qy = qr.

c. Cissoids
Given two curves C₁ and C₂ and a fixed point O, we let Q₁ and Q₂ be the intersections of a line through O with C₁ and C₂, respectively. The locus of points P on such lines such that OP = OQ₂ − OQ₁ = Q₂Q₁ is the cissoid of C₁ and C₂ with respect to O, see (a) Figure 7.17. The cissoid of a circle and a tangent line with respect to a fixed point of the circle that is not opposite to the point of tangency is the cissoid of Diocles, introduced by Diocles (240BC-180BC) in his attempts at doubling the cube, see (b) Figure 7.17. If O is the origin and the circle has equation (x − a/2)² + y² = a²/4, the intersection points are C = a(1, tan θ), B = a cos θ (cos θ, sin θ), hence Diocles's cissoid has the Cartesian equation y²(a − x) = x³, or, equivalently, polar equation

ρ = a sin θ tan θ.
Figure 7.17. (a) The cissoid, (b) Cissoid of Diocles, (c) Folium of Descartes.
Figure 7.18. (a) Durer's conchoid, (b) Oval of Cassini, (c) The devil curve.
d. Algebraic curves
These are loci of zeros of polynomials. The degree of the polynomial may be taken as a measure of complexity: curves that are zeros of second order polynomials are well classified, see Example 3.69. We list here a few more algebraic curves, see Figure 7.19.
WITCH OF AGNESI. It has equation y(x² + a²) = a³ and it is the trace of the curve γ(t) = (x(t), y(t)) where x(t) = at, y(t) = a/(1 + t²), t ∈ ℝ.
STROPHOID OF BARROW. It has equation x(x − a)² = y²(2a − x) and it is the trace of the curve γ(t) = (x(t), y(t)) where x(t) = 2a cos²t, y(t) = a tan t (1 − 2 cos²t), t ∈ ℝ.
EIGHT CURVE or LEMNISCATE OF GERONO. It has equation x⁴ = a²(x² − y²) and it is the trace of the curve γ(t) = (x(t), y(t)) where x(t) = a cos t, y(t) = a sin t cos t, t ∈ ℝ.
CURVES OF LISSAJOUS. They are the traces of curves γ(t) = (x(t), y(t)) where x(t) = a sin(αt + d), y(t) = b sin t, t ∈ ℝ, in which each coordinate moves as a simple harmonic motion. One shows that such curves are algebraic closed curves iff α is rational.
FOLIUM OF DESCARTES. It has equation x³ + y³ = 3axy and arises as the trace of the curve γ(t) = (x(t), y(t)) where x(t) = 3at/(1 + t³), y(t) = 3at²/(1 + t³), t ∈ ℝ, see Figure 7.17.
DEVIL'S CURVE. It has equation y⁴ − x⁴ + ay² + bx² = 0, see Figure 7.18.
DOUBLE FOLIUM. It has equation (x² + y²)² = 4axy², see Figure 7.20.
TRIFOLIUM. It has equation (x² + y²)(y² + x(x + a)) = 4axy², see Figure 7.20.
OVALS OF CASSINI. They have equation (x² + y² + a²)² = b⁴ + 4a²x², see Figure 7.18.
ASTROID. It has equation x^{2/3} + y^{2/3} = a^{2/3}, see Figure 7.20.
e. The cycloid Nonrational curves are called transcendental. Among them one of the most famous is the cycloid. This is the trajectory described by a fixed point (the tyre valve) of a circle (a tyre) rolling on a line, see Figure 7.21.
Figure 7.19. Some algebraic curves: from the top-left (a) the witch of Agnesi, (b) the strophoid of Barrow, (c) the lemniscate of Gerono, (d) the Lissajous curve for n = 5, d = 1r/2.
If the center of the circle is C = (0, R), the radius is R, P = (0, 0) and we parametrize the movement with the angle θ that CP makes with the vertical through C, then P = P(θ), C = C(θ), the cycloid has period 2π, and we have

P(θ) − C(θ) = (−R sin θ, −R cos θ).

Since the circle rolls, C(θ) simply translates parallel to the x-axis, C(θ) = (Rθ, R). We then conclude that the cycloid is the trace of the curve γ : ℝ → ℝ² defined by

γ(θ) = (R(θ − sin θ), R(1 − cos θ)).
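From the parametrization above one has |γ'(θ)| = R√(2 − 2 cos θ), and the length of one arch of the cycloid, 0 ≤ θ ≤ 2π, turns out to be 8R; the sketch below (ours, added for illustration) checks this numerically.

    # Length of one arch of the cycloid with R = 1.
    import math

    R, N = 1.0, 100000
    h = 2 * math.pi / N
    speed = [R * math.sqrt(2 - 2 * math.cos(i * h)) for i in range(N + 1)]
    length = sum((speed[i] + speed[i - 1]) / 2 * h for i in range(1, N + 1))
    print(length, 8 * R)   # ~8.0 vs 8.0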
Figure 7.20. From the left: (a) the double folium, (b) the trifolium, (c) the astroid.
Figure 7.21. The cycloid.
The cycloid solves at least two important and celebrated problems. As we know, the pendulum is not isochronous, but it is approximately isochronous for small oscillations, see Section 6.3.1 of [GM1]. Christiaan Huygens (1629-1695) found that the isochronous curve is the cycloid. Johann Bernoulli (1667-1748) showed that the cycloid is also the curve of quickest descent, that is, the curve connecting two points on a vertical plane along which a movable point descends under the influence of gravitation in the quickest possible way. Other curves of the same nature as the cycloids are the epicycloids and the hypocycloids, see Figure 7.22. These are obtained from a circle that rolls around the inside or the outside of another circle (or another curve).
f. The catenary
Another celebrated transcendental curve is the catenary. It describes the form assumed by a perfectly flexible, inextensible chain of uniform density hanging from two supports, already discussed by Galileo Galilei (1564-1642). Answering a challenge of Jacob Bernoulli (1654-1705), it was proved
Figure 7.22. (a) The epicycloid x(t) = 9 cos t − cos(9t), y(t) = 9 sin t − sin(9t), (b) the hypocycloid x(t) = 8 cos t + 2 cos(4t), y(t) = 8 sin t − 2 sin(4t), (c) a catenary.
"" CHRISTIANI
HVGE
II
ZVLICH£MlI, CONST f.
HOROLOGIVM OSCILLATORIVM SI v E DE MOTV PENDVLOR VM AD HOROLOOIA APTATO C !oMONS T I.ATIO~:ES (;lCJHTLICJJ,
(" I
~I.,
'.
,.("-Jt" <- -
'I !
Figure 7.23. The pendulum clock from the Horologium Oscillatorium of Christiaan Huygens (1629-1695).
by Gottfried von Leibniz (1646-1716) and Christiaan Huygens (1629-1695) that the equation of the catenary hung at the same height at both sides is
y = (a/2)(e^{x/a} + e^{−x/a}) = a cosh(x/a).
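Since √(1 + y'²) = cosh(x/a) for the catenary, its length over [−b, b] is 2a sinh(b/a); the following short computation (ours, added for illustration) confirms this numerically.

    # Length of the catenary y = a*cosh(x/a) over [-b, b].
    import math

    a, b, N = 2.0, 3.0, 100000
    h = 2 * b / N
    xs = [-b + i * h for i in range(N + 1)]
    integrand = [math.cosh(x / a) for x in xs]       # sqrt(1 + sinh^2) = cosh
    length = sum((integrand[i] + integrand[i - 1]) / 2 * h for i in range(1, N + 1))
    print(length, 2 * a * math.sinh(b / a))          # both ~ 8.517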
7.2 Curves in Metric Spaces Of course we may also consider curves in a general metric space X, as continuity is the only requirement. Let us start introducing the notion of total variation, a notion essentially due to Camille Jordan (1838-1922).
a. Functions of bounded variation and rectifiable curves
Let X be a metric space and f : [a, b] ⊂ ℝ → X be any map. Denote by S the family of finite partitions σ = {t₀, ..., t_N} with a = t₀ < t₁ < ... < t_N = b of the interval [a, b] and, in correspondence to each partition σ, set |σ| := max_{i=1,...,N}(t_i − t_{i−1}) and

V_σ(f) := Σ_{i=0}^{N−1} d(f(t_i), f(t_{i+1})),

that we have denoted by P(σ) in the case of curves into ℝⁿ.
Figure 7.24. Blaise Pascal (1623-1662) and the frontispiece of his Lettres de Dettonville about the Roulettes.
7.38 Definition. The total variation of a map f : [a, b] → X is the number (possibly +∞)

V(f, [a, b]) := sup_{σ∈S} V_σ(f).

We say that f has bounded total variation if V(f, [a, b]) < ∞. When the curve f : [a, b] → X is continuous, V(f, [a, b]) is called the length of f, and curves with bounded total variation, that is with finite length, are called rectifiable.
Either directly or repeating the arguments used in studying the length of curves into ℝⁿ, it is easy to show the following.
7.39 Proposition. We have
(i) if [a, b] ⊂ [c, d], then V(f, [a, b]) ≤ V(f, [c, d]),
(ii) V(f, [a, b]) ≥ d(f(a), f(b)) and, if f is real-valued and increasing, then V(f, [a, b]) = f(b) − f(a),
(iii) every Lipschitz-continuous function f : [a, b] → X has bounded total variation and V(f, [a, b]) ≤ Lip(f)(b − a),
(iv) the total variation is a subadditive set-function, meaning

V(f, [a, b]) ≤ V(f, [a, c]) + V(f, [c, b])   if a < c < b;

moreover, if f is continuous at c, then V(f, [a, b]) = V(f, [a, c]) + V(f, [c, b]),
(v) V(f, [a, b]) = lim_{|σ|→0⁺} V_σ(f, [a, b]).
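A small numerical sketch (ours, not from the text) of Definition 7.38 and of item (v): on uniform partitions the sums V_σ(f) stabilize when f has bounded total variation, and keep growing otherwise.

    # Approximate total variation over uniform partitions.
    import math

    def variation(f, a, b, n):
        ts = [a + (b - a) * i / n for i in range(n + 1)]
        return sum(abs(f(ts[i + 1]) - f(ts[i])) for i in range(n))

    f = math.sin                                          # V(sin, [0, 2*pi]) = 4
    g = lambda x: x * math.sin(1 / x) if x > 0 else 0.0   # not of bounded variation

    for n in (10, 100, 1000, 10000):
        print(n, variation(f, 0.0, 2 * math.pi, n), variation(g, 0.0, 1.0, n))
    # the first column stabilizes near 4, the second keeps growing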
Figure 7.25. Frontispieces of two editions of 1532 and of 1606 of Institutionum geometricorum by Albrecht Dürer (1471-1528).
7.40 ¶. Let f : [a, b] → X × X where X is a metric space. Show that f has bounded variation if and only if the two components of f = (f₁, f₂), f₁, f₂ : [a, b] → X, have bounded variation.

We say that two curves φ : [a, b] → X and ψ : [c, d] → X into X are equivalent if there exists a homeomorphism α : [c, d] → [a, b] such that ψ(s) = φ(α(s)) ∀s ∈ [c, d]. From the definitions we have the following.

7.41 Proposition. Two equivalent curves have the same total variation.

From (iv) and (v) of Proposition 7.39 we also have the following.

7.42 Proposition. Let φ : [a, b] → X be a rectifiable (continuous) curve. Then the real-valued function t → V(φ, [a, t]), t ∈ [a, b], is continuous and increasing.

7.43 ¶. Prove the claims in Propositions 7.39, 7.41 and 7.42.
b. Lipschitz and intrinsic reparametrizations We saw that every regular Euclidean curve may be reparametrized with velocity one. For curves in an arbitrary metric space we have 7.44 Theorem. Let'Y: [a, b] ~ X be a simple rectifiable curve on a metric space X of length L. Then there exists a homeomorphism a : [0, L] ~ [a, b] such that 'Y 0 a : [0, L] ~ X is Lipschitz continuous with Lipschitz constant one.
Figure 7.26. The sets Ek of the middle third Cantor set.
We call that parametrization of the trace of γ the intrinsic parametrization of γ([a, b]).

Proof. Let x ∈ [a, b] and V(x) := V(γ, [a, x]). We have L = V(γ, [a, b]) and, on account of Proposition 7.42, V(x) is continuous and increasing. Since γ is simple, V(x) is strictly increasing, hence a homeomorphism between [a, b] and [0, L]. Set σ := V⁻¹. We then infer for 0 ≤ x < y ≤ L

d(γ(σ(y)), γ(σ(x))) ≤ V(γ ∘ σ, [x, y]) = V(γ, [σ(x), σ(y)]) = V(σ(y)) − V(σ(x)) = y − x,

i.e., γ ∘ σ is Lipschitz continuous with Lipschitz constant one. □
7.2.1 Real functions with bounded variation
It is worth adding a few more comments about the class of real-valued functions f : [a, b] → ℝ with bounded total variation, denoted by BV([a, b]).

7.45 Theorem. We have
(i) BV([a, b]) is a linear space and ‖f‖ := |f(a)| + V(f, [a, b]) is a norm on it,
(ii) BV([a, b]) contains the convex cone of increasing functions,
(iii) every f ∈ BV([a, b]) is the difference of two increasing functions.
Proof. We leave to the reader the task of proving (i) and (ii) and we prove (iii). For f ∈ BV([a, b]) and t ∈ [a, b] set

φ(t) := V(f, [a, t]),   ψ(t) := φ(t) + f(t),   t ∈ [a, b].

For x, y ∈ [a, b], x < y, we have

ψ(y) − ψ(x) = [φ(y) − φ(x)] + [f(y) − f(x)];

now the subadditivity of the total variation yields

φ(y) − φ(x) = V(f, [x, y]) ≥ |f(y) − f(x)|,

in particular ψ(y) − ψ(x) ≥ 0. Therefore φ and ψ are both increasing with bounded total variation, and f(t) = ψ(t) − φ(t) ∀t. □
A surprising consequence is the following.
7.46 Corollary. Every function in BV([a, b]) has left- and right-limits at every point of [a, b].
Figure 7.27. An approximate Cantor-Vitali function.
If we reread the proof of (iii) Theorem 7.45, on account of Proposition 7.42 we infer
7.47 Proposition. Every continuous function f : [a, b] → ℝ with finite total variation is the difference of two continuous increasing functions.

a. The Cantor-Vitali function
The Cantor ternary set is defined, see [GM2], as C = ∩_k E_k, where E₀ := [0, 1], E₁ is obtained from E₀ by removing the open middle third of E₀, and E_{k+1} by removing from each interval of E_k its open middle third. Define for k = 0, 1, ... and j = 1, ..., 2ᵏ the base points

b_{0,1} := 0,   b_{k+1,j} := (1/3) b_{k,j}  if j = 1, ..., 2ᵏ,   b_{k+1,j} := 2/3 + (1/3) b_{k,j−2ᵏ}  if j = 2ᵏ + 1, ..., 2ᵏ⁺¹;

then the intervals that have been removed from E_{k−1} to get E_k at step k are

I_{k−1,j} := b_{k−1,j} + 3^{−k+1} ]1/3, 2/3[,   j = 1, ..., 2^{k−1},

and the intervals whose union is E_k are

[b_{k,j}, b_{k,j} + 3^{−k}],   j = 1, ..., 2ᵏ.

Therefore

C = ∩_k E_k = [0, 1] \ ∪_{k≥1} ∪_{j=1}^{2^{k−1}} I_{k−1,j}.
Strongly related to Cantor's set is the Cantor-Vitali function, introduced by Giuseppe Vitali (1875-1932). To define it, we first consider the approximate Cantor-Vitali functions V_k : [0, 1] → ℝ defined inductively by
V₀(x) := x,   V_{k+1}(x) := (1/2) V_k(3x)  if x ∈ [0, 1/3],   V_{k+1}(x) := 1/2  if x ∈ [1/3, 2/3],   V_{k+1}(x) := 1/2 + (1/2) V_k(3x − 2)  if x ∈ [2/3, 1],

see Figure 7.27. One easily checks that for k = 0, 1, ...
(i) We have V_k(0) = 0, V_k(1) = 1, V_k(b_{k,j}) = (j − 1)/2ᵏ, V_k(b_{k,j} + 3^{−k}) = j/2ᵏ, and

V_k(x) = (2j − 1)/2^{m+1}   if x ∈ I_{m,j},  m = 0, ..., k − 1,  j = 1, ..., 2ᵐ.

(ii) We have

V_k(x) = (3/2)ᵏ ∫_0^x χ_{E_k}(t) dt,

where χ_{E_k} is the characteristic function of the set E_k that we used to define the Cantor ternary set.
(iii) We have

|V_k(x) − V_k(y)| ≤ |x − y|^α   ∀x, y ∈ [0, 1],

where α = log 2/log 3; in particular the V_k's are equi-Hölder. In fact, by symmetry it suffices to prove the claim for x, y ∈ [0, 3^{−k}], where V_k is linear with slope (3/2)ᵏ. For 0 ≤ x ≤ y ≤ 3^{−k} we have

0 ≤ V_k(y) − V_k(x) = (3/2)ᵏ (y − x) ≤ (3/2)ᵏ (3^{−k})^{1−α} (y − x)^α = (y − x)^α,

as 2·3^{−α} = 1.
(iv) We have

|V_{k+1}(x) − V_k(x)| ≤ 2^{−k}   ∀x ∈ [0, 1].
V is not decreasing, hence it has bounded total variation, in each interval of [0, 1] \ Ek, V(x) = Vk(X) is constant, in particular V is differentiable outside the Cantor set with V'(x) = tlx E [0,1] \ C, o V([O, 1]) = [0,1], and V maps [0,1] \ C into the denumerable set
o
o
°
D :=
{y E IR Iy= 1k' j = 0,1, ... , 2k, kEN},
hence V maps C onto [0,1] \ D.
7.3 Exercises
247
7.48 Homeomorphisms do not preserve fractal dimensions. The function 'P: [0,1] ----> [0,2], 'P(x) := x + V(x), is continuous and strictly increasing, hence a homeomorphism between [0,1] and [0,2]. In Theorem 8.109 we shall see that the algebraic dimension of IR n is a topological invariant, that is, IR n and IR m are homeomorphic if and only if n = m. This is not true in general for the fractal dimension, see Chapter 8 of [GM2]. In fact, 'P maps the complement of Cantor's set in [0, 1] into the countable union of intervals oftotal measure 1, HI ('P([O, 1]) \ C) = 1, hence HI('P(C)) = 1 and dimrt('P(C)) = 1, while dimrt(C) = log 2/ log 3. 7.49,.. Let f : lR n ---> lR n be a Lipschitz-continuous map with Lipschitz inverse. Show that f preserves the fractal dimension, dimrt(f(A)) = dimrt A.[Hint: Recall that }tk(f(A))::; Lip(f)}tk(A), see Section 8.2.4 of [GM2].]
7.3 Exercises 7.50 ,.. We invite the reader to study some of the curves described in this chapter, try to convince himself that the figures are quite reasonable, and compute the lengths of some of those curves and, when possible, the enclosed areas.
f : [0,2]
7.51 ,.. Compute the total variation of the following functions
x(x
2
f(x)
:=
f(x)
:= 3X[O,lJ(X)
7.52 ,.. Let g(x) =
-
+ 2X[1,2j (x),
where
..;x, x E [0,1], and let f I
{
f,
9 and go
lR
1),
f(x) = Show that
--->
f
·f
: [0,1] E
[1
XA(X):=
--->
1 X
o
otherwise.
0
if x E A, if x 'I:- A.
lR be given by
1
1 ]
n' n + ;;:"2"
;;:"2"
I {
,
have bounded total variation.
7.53,.. Let f,g E BV([O, 1]). Show that min(f,g), max(f,g),
IfI E BV([O, 1]).
7.54". Show that the Cantor middle third set C is compact and perfect, i.e., int (C) =
0.
8. Some Topics from the Topology of JRn
As we have already stated, topology is the study of the properties shared by a geometric figure and all its bi-continuous transformations, Le., the study of invariants by homeomorphisms. Its origin dates back to the problem of Konigsberg bridges and Euler's theorem about polyhedra, to Riemann's work on the geometric representation of functions, to Betti's work on the notion of multiconnectivity and, most of all, to the work of J. Henri Poincare (1854-1912). Starting from his research on differential equations in mechanics, Poincare introduced relevant topological notions and, in particular, the idea of associating to a geometric figure (using a rule that is common to all figures) an algebraic object, such as a group, that is a topological invariant for the figure and that one could compute. The fundamental group and homology groups are two important examples of algebraic objects introduced by Poincare: this is the beginning of combinatorial or algebraic topology. With the development of what we call today general topology due to, among others, Rene-Louis Baire (1874-1932), Maurice Frechet (1878-1973), Frigyes Riesz (1880-1956), Felix Hausdorff (18691942), Kazimierz Kuratowski (1896-1980), and the interaction between general and algebraic topology due to L. E. Brouwer (1881-1966), James Alexander (1888-1971), Solomon Lefschetz (1884-1972), Pavel Alexandroff (1896-1982), Pavel Urysohn (1898-1924), Heinz Hopf (1894-1971), L. Agranovich Lyusternik (1899-1981), Lev G. Schnirelmann (1905-1938), Harald Marston Morse (1892-1977), Eduard Cech (1893-1960), the study of topology in a wide sense is consolidated and in fact receives new incentives thanks to the work of Jean Leray (1906-1998), Elie Cartan (18691951), Georges de Rham (1903-1990). Clearly, even a short introduction to these topics would deviate us from our course; therefore we shall confine ourselves to illustrating some fundamental notions and basic results related to the topology of IR n , to the notion of dimension and, most of all, to the existence of fixed points.
250
8. Some Topics from the Topology of JRn
t .1
_
Q3 Oy -_/:-.----
- . - ..
o _ _X
+-_ x
:. - .....
'. ' ..
Figure 8.1. A homotopy.
8.1 Homotopy In this section we shall briefly discuss the different flavors of the notion of homotopy. They correspond to the intuitive idea of continuous deformation of one object into another.
8.1.1 Homotopy of maps and sets a. Homotopy of maps In the following, the ambient spaces X, Y, Z will be metric spaces. 8.1 Definition. Two continuous maps f, 9 : X -4 Yare called homotopic if there exists a continuous map H : [0,1] x X -4 Y such that H(O, x) = f(x), H(l,x) = g(x) V x E X. In this case we say that H establishes or is a homotopy of f to g. It is easy to show that the homotopy relation to Y is an equivalence relation, i.e., it is
(i)
(REFLEXIVE)
f
rv
f
rv
9 on maps from X
f.
(ii) (SYMMETRIC) f rv 9 iff 9 rv f. (iii) (TRANSITIVE) if f rv 9 and 9 rv h, then
f
rv
h.
Therefore CO(X, Y) can be partioned into classes of homotopic functions. It is worth noticing that, since
(8.1) we have the following. 8.2 Proposition. f and 9 E CO(X, Y) are homotopic if and only if they belong to the same path-connected component of CO(X, Y) endowed with uniform distance. The subsets of CO(X, Y) of homotopy equivalent maps are the path-connected components of the metric space CO(X, Y) with uniform distance.
8.1 Homotopy
DIE WISSE
Pavd
SCHAFT
251
.....
rge-mt AJe.baodroy
~.~..-t!_
HUAUIGIlIl. 'RO'. DK.. WILHELM WESTPHAL UHDU
E1 nfilh runa
Topologia comhinatoria
In die Itomblnatortsdte Topoloale Or. ......... . - ....... Kart Rt'idtmtlnrr
-,aISDR. YI!WEQ f: SOHI'f. IJt .... UHSCHWEIO lUI
...,
Figure 8.2. Frontispieces of the introduction to combinatorial topology by Kurt Reidemeister (1893-1971) and Pavel Alexandroff (1896-1982) in its Italian translation.
8.3~. Let X, Y be metric spaces. Show the equality (8.1), which we understand as an isometry of metric spaces.
8.4~. Let Y be a convex subset of a normed linear space. Then every continuous map f : X - t Y from an arbitrary metric space X is homotopic to a constant. In particular, constant maps are homotopic to each other. [Hint: Fix YO E Y and consider the homotopy H : [0,1] x X - t Y given by H(t, x) := tyO + (1 - t)f(x).] 8.5~.
Let X be a convex set of a normed linear space. Then every continuous map
f : X - t Y into an arbitrary metric space is homotopic to a constant function. [Hint: Fix xo E X and consider the homotopy H : [O,IJ x X - t Y given by H(t, x) := f(txo + (1 - t)x).] 8.6~.
Two constant maps are homotopic iff their values can be connected by a path.
8.7~.
Let X be a linear normed space. Show that the homotopy classes of maps Y correspond to the path-connected components of Y.
f :X
-t
According to Exercises 8.4, 8.5 and 8.6, all maps into lR. n or defined on lR. n are homotopic to constant maps. However, this is not always the case for maps from or into sn := {x Illxll = I}, the unit sphere of lR.n +1 . 8.8 Proposition. We have --4 sn be two continuous maps such that f(x) and g(x) are never antipodal, i.e., g(x) f= - f(x) \/x E X, then f and 9 are homotopic; in particular, if f : X --4 sn is not onto, then f is homotopic to a constant.
(i) Let f,g : X
252
8. Some Topics from the Topology of IR n
Figure 8.3. The figure suggests a homotopy of closed curves, that is a continuous family of closed paths, from a knotted loop to Sl. But, it can be proved that there is no family of homeomorphisms of the ambient space IR3 that, starting from the identity, deforms the initial knotted loop into Sl.
n
sn
(ii) Let B +! := {x E IR n +!llxl ~ I}. A continuous map f : --> Y is homotopic to a constant if and only if f has a continuous extension F: B n + 1 --> Y. Proof. (i) Since f(x) and g(x) are never antipodal, the segment tg(x) t E [0,1], never goes through the origin; a homotopy of f to 9 is then H(t,x):= t g (x)+(I-t)f(x), Itg(x) + (1 - t)f(x)1
+ (1
- t)f(x),
(t,x) E [O,IJ xX.
Notice that y ---+ ~ is the radial projection from IR n+1 onto the sphere sn, hence H(t, x) is the radial projection onto the sphere of the segment tg(x) + (1 - t)f(x), t E [0,1]. The second part of the claim follows by choosing Yo E sn \ f(X) and g(x) := -Yo. (ii) If F : Bn+1 ---+ Y is a continuous function such that F(x) = f(x) "Ix E sn, then the map H(t,x) := F(tx), (t,x) E [0,1] x sn, is continuous, hence a homotopy of H(O,x) = F(O) to H(I,x) = f(x). Conversely, if H: [0,1] x sn ---+ Y is a homotopy of a constant map g(x) = p E Y to f, H(O, x) = p, H(I, x) = f(x) "Ix E X, then the map F: B n + 1 ---+ Y defined by
F(x)
:=
{H(lxl,x/1X 1) p
is a continuous extension of
f to
if x -I 0, if x = 0
B n + 1 with values into Y.
o
b. Homotopy classes Denote by [X, Y] the set of homotopy classes of continuous maps f : X --> Y and by [I] E [X, Y] the equivalence class of f. The following two propositions collect some elementary facts. 8.9 Proposition. We have
(i)
Let f,1' : X --> Y, g,g' : Y --> Z be continuous and 9 rv g', then go f rv g' 0 1'. (ii) (RESTRICTION) If f,g: X --> Yare homotopic and A c X, then flA is homotopic to 91A as maps from A to Y. (COMPOSITION)
maps. If f
rv
I'
8.1 Homotopy
(iii)
253
(CARTESIAN PRODUCT) f, 9 : X ----> Y1 X Y 2 are homotopic if and only if 7ri 0 f and 7ri 0 9 are homotopic (with values in Yi) where i = 1,2 and 7ri, i = 1,2 denote the projections on the factors.
A trivial consequence of Proposition 8.9 is that the set [X, Y] is a topological invariant of both X and Y. In a sense [X, Y] gives the number of "different" ways that X can be mapped into Y, hence measures the "topological complexity" of Y relative to that of X. Let 'P : X ----> Y be a continuous map and let Z be a metric space. Then 'P defines a pull-back map 'P# : [Y, Z]
---->
[X, Z]
defined by 'P# [I] := [f 0 'P], as Proposition 8.9 yields that the homotopy class of f 0 'P depends on the homotopy class of f. Similarly 'P induces a push-forward map 'P# : [Z,X] ----> [Z, Y] defined by 'P# [g] := ['P
0
g].
8.10 Proposition. We have the following.
(i) Let 'P,"ljJ : X
----> Y be continuous and homotopic, cp rv"ljJ. Then cp# "ljJ# and 'P# = "ljJ#. (ii) Let 'P : X ----> Y and'TJ : Y ----> Z be continuous. Then
=
and
c. H<:motopy equivalence of sets 8.11 Definition. Two metric spaces X and Yare said homotopy equivalent, or are said to have the same homotopy type, if there exist two .continuous maps f : X ----> Y and 9 : Y ----> X such that 9 0 f rv Idx and fog rv Idy. If f : X ----> Y and 9 : Y ----> X define a homotopy equivalence between X and Y, then for every space Z we infer from Proposition 8.10
Similarly g#
0
f#
=
Id[z,x],
f#
0
g#
=
Id[z,y];
hence [Z, X] and [Z, Y] are in a one-to-one correspondence.
8.12 Definition. A space X is called contractible if it is homotopy equivalent to a space with only one point, equivalently, if the identity map i : X ----> X of X is homotopic to a constant map.
By definition if X is contractible to Xo EX, then X is homotopic equivalent to {xo}, hence [Z, X] and [X, Z] reduces to a point for any space Z.
254
8. Some Topics from the Topology of IR n
Figure 8.4. IRn is contractible.
8.13 Example. IR n is contractible. In fact, H(t, x) := (1 - t)x, (t, x) E [O,IJ x IR n , contracts IRn to the origin.
In general, describing the set [X, Y] is a very difficult task even for the simplest case of the homotopy of spheres, [Sk, sn], k, n ~ 1. However, the following may be useful. 8.14 Definition. Let X be a metric space. We say that A c X is a retract of X if there exists a continuous map p : X ---. A, called a retraction, such that p(x) = x 't:/x E A. Equivalently A is a retract of X if the identity map IdA : A ---. A extends to a continuous map r : X ---. A. We say that A c X is a deformation retract of X if A is a retract of X and the identity map Idx ---. X is homotopic to a retraction of X to A. Let A c X be a deformation retract of X and denote by i A : A ---. X the inclusion map. Since Idx : X ---. X is homotopic to the retraction map r : X ---. A, we have iA
0
r = r
rv
Idx,
hence A and X are homotopic equivalent. By the above, for every space Z we have [A, Z] = [X, Z] and [Z, A] = [Z, X] as sets, thus reducing the computation of [Z, A] and of [X, Z] respectively, to the smaller sets [Z, X] and [A,Z]. The following observation is useful. 8.15 Proposition. Let A c X be a subset of a metric space X. Then A is a deformation retract of X if and only if A is a retract of X and Idx : X ---. X is homotopic to a continuous map 9 : X ---. A.
Figure 8.5. 8 1 is a deformation retract of the torus T C
1R3.
8.1 Homotopy
Figure 8.6.
255
sn is a deformation retract of B n \ {O}.
Proof. It is enough to prove sufficiency. Let r : X [0,1] x X -> X be a homotopy of Idx to g, h(O,x) the map k(t,x) = {r(h(2t,X» h(2 - 2t, x)
if
->
A be a retraction and let h : h(l,x) = g(x) "Ix E X. Then
= x,
0::; t ::;
~,
if ~ ::; t ::; 1
is continuous since h(l,x) = r(h(l,x» "Ix and shows that Idx is homotopic to r: X->
A.
0
8.16'. Show that every point of a space X is a retract of X. 8.17 ,. Show that {O, 1} C JR is not a retract of JR. 8.18'. Show that a retract A C X of a space X is a closed set. 8.19'. The possibility of retracting X onto A is related to the possibility of extending continuous maps on A to continuous maps on X. Show Proposition. A c X is a retmct of X if and only if for any topological space Zany continuous map f : A -> Z extends to a continuous map F : X -> Z. 8.20'. Show that
sn is a deformation retract of B n+ 1 \
{O}, see Figure 8.6.
8.21 ,. With reference to Figure 8.8, show that M \ 8M is not a retract of M, but M and M \ 8M are homotopy equivalent since they have a deformation retract in common.
Figure 8.7. The first two figures are homotopy equivalent since they are both deformation retracts of the third figure.
256
8. Some Topics from the Topology of IR n
Figure 8.8. M\8M is not a retract of M, but M and M\8M are homotopy equivalent.
d. Relative homotopy Intuitively, see Figure 8.1, the maps Ht : X ----+ Y, t E [0,1] defined by Ht(x) := H(t,x), are a continuous family of continuous maps that deform f to g. In particular, it is important to note that, in considering homotopy of maps, the target space is relevant and must be kept fixed in the discussion. As we shall see in the sequel, maps with values in Y that are nonhomotopic may become homotopic when seen as maps with values in Z ::J Y. Also, it is worth considering homotopies of a suitable restricted type. For instance, when working with paths with fixed endpoints, it is better to consider homotopies such that for each t all curves x ----+ Ht(x) := H(t, x), x E [0,1], have the same fixed endpoints for all t E [0,1]. Similarly, when working with closed curves, it is worthwile considering homotopies H(t, x) such that every curve x ----+ Ht(x) := H(t,x) is closed for all t E [0,1]. 8.22 Definition. Let C c eO(X, Y). We say that f, gEe are homotopic relative to C if there exists a continuous map H[O, 1] x X ----+ Y such that H(O, x) = f(x), H(l, x) = g(x) and the curves x ----+ Ht(x) := H(t, x) belong to C for all t E [0, 1]. It is easy to check that the relative homotopy is an equivalence relation.
The set of relative homotopy classes with respect to C ⊂ C^0(X, Y) is denoted by [X, Y]_C. Some choices of the subset C ⊂ C^0(X, Y) are relevant.
(i) Let Z ⊂ Y and C := {f ∈ C^0(X, Y) | f(X) ⊂ Z}. In this case a homotopy relative to C is a homotopy of maps with values in Z.
(ii) Let X = [0,1], a, b ∈ Y and C := {f ∈ C^0(X, Y) | f(0) = a, f(1) = b}. Then a homotopy relative to C is called a homotopy with fixed endpoints.
(iii) Let X = [0,1] and let C := {f ∈ C^0([0,1], Y) | f(0) = f(1)} be the class of closed curves, or loops, in Y. In this case two curves homotopic relative to C are said to be loop-homotopic.
Recall that a closed curve γ : [0,1] → X can be reparametrized as a continuous map δ : S^1 → X from the unit circle S^1 ⊂ ℂ. Now let γ_1, γ_2 : [0,1] → X be two loops and let δ_1, δ_2 : S^1 → X be two corresponding reparametrizations on S^1. Then, recalling that homotopies are simply paths in the space of continuous maps, it is trivial to show that γ_1 and γ_2 are loop-homotopic if and only if δ_1 and δ_2 are homotopic as maps from S^1 into X. Therefore

    [[0,1], X]_C = [S^1, X].
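For maps into a convex target, homotopies of type (ii) are immediate: the straight-line homotopy between two paths with the same endpoints keeps those endpoints fixed. A minimal sketch (illustration only; the specific paths are made up for the example):

```python
def straight_line_homotopy(f, g):
    """H(t, x) = (1 - t) f(x) + t g(x); a homotopy whenever the target is convex."""
    def H(t, x):
        fx, gx = f(x), g(x)
        return tuple((1.0 - t) * a + t * b for a, b in zip(fx, gx))
    return H

# two paths in R^2 from (0, 0) to (1, 0)
f = lambda x: (x, 0.0)
g = lambda x: (x, x * (1.0 - x))

H = straight_line_homotopy(f, g)
for t in (0.0, 0.25, 0.5, 1.0):
    assert H(t, 0.0) == (0.0, 0.0) and H(t, 1.0) == (1.0, 0.0)
print("H is a homotopy with fixed endpoints between f and g")
```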
Finally, notice that the intuitive idea of continuous deformation has several subtle aspects, see Figure 8.3.
8.1.2 Homotopy of loops
a. The fundamental group with base point
Let X be a metric space and let x_0 ∈ X. It is convenient to consider loops γ : [0,1] → X with γ(0) = γ(1) = x_0. We call them loops with base point x_0. Also, one can introduce a restricted form of homotopy between loops with base point x_0 by considering loop-homotopies H(t, x) such that x → H(t, x) has base point x_0 for every t. We denote the corresponding homotopy equivalence relation and homotopy classes respectively by ∼_{x_0} and [ ]_{x_0}. Finally, π_1(X, x_0) denotes the set of classes, under loop-homotopy with base point x_0, of loops with base point x_0.
8.23 ¶. Show that π_1(X, x_0) reduces to a point if X is contractible and x_0 ∈ X. [Hint: Show that π_1(X, x_0) ⊂ [S^1, X].]
b. The group structure on π_1(X, x_0)
Given two loops f, g : [0,1] → X with base point x_0, their junction f * g is the loop

    f * g(t) := { f(2t)        if t ∈ [0, 1/2],
                { g(2t − 1)    if t ∈ [1/2, 1].

Since homotopies with fixed endpoints can be joined, too, we obtain a well-defined operation on π_1(X, x_0), still denoted by *,

    * : π_1(X, x_0) × π_1(X, x_0) → π_1(X, x_0),    [f]_{x_0} * [g]_{x_0} := [f * g]_{x_0}.

8.24 Proposition. The map * has the following properties.
(i) (ASSOCIATIVITY) Let f, g, h : [0,1] → X be three loops with base point x_0. Then ([f]_{x_0} * [g]_{x_0}) * [h]_{x_0} = [f]_{x_0} * ([g]_{x_0} * [h]_{x_0}).
(ii) (RIGHT AND LEFT IDENTITIES) Let f : [0,1] → X be a loop with base point x_0 and let e_{x_0} : [0,1] → X be the constant map, e_{x_0}(t) := x_0. Then

    [e_{x_0}]_{x_0} * [f]_{x_0} = [f]_{x_0} * [e_{x_0}]_{x_0} = [f]_{x_0}.

(iii) (INVERSE) Let e_{x_0} : [0,1] → X be the constant map e_{x_0}(t) := x_0 and, for a loop f : [0,1] → X with base point x_0, let f̄ : [0,1] → X be the reverse loop f̄(t) := f(1 − t). Then

    [f]_{x_0} * [f̄]_{x_0} = [f̄]_{x_0} * [f]_{x_0} = [e_{x_0}]_{x_0}.

In this way the junction of loops defines a natural group structure on π_1(X, x_0), where [f]_{x_0}^{−1} = [f̄]_{x_0}.

8.25 Definition. Let X be a space and x_0 ∈ X. The set π_1(X, x_0) of homotopy classes of loops with base point x_0 has a natural group structure induced by the junction operation of loops. We then call π_1(X, x_0) the fundamental group of X, or the first homotopy group of X, with base point x_0.
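The junction and the reverse loop are easy to realize concretely. The sketch below (an illustration only; loops are represented as Python functions on [0,1] with values in ℝ², a choice made just for this example) implements f * g and f̄ and checks that the results are again loops with the same base point.

```python
import math

def junction(f, g):
    """(f * g)(t): run f on [0, 1/2], then g on [1/2, 1]."""
    return lambda t: f(2.0 * t) if t <= 0.5 else g(2.0 * t - 1.0)

def reverse(f):
    """The reverse loop: t -> f(1 - t)."""
    return lambda t: f(1.0 - t)

# two loops in R^2 with base point (1, 0): once and twice around the origin
loop1 = lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))
loop2 = lambda t: (math.cos(4 * math.pi * t), math.sin(4 * math.pi * t))

base = (1.0, 0.0)
for loop in (junction(loop1, loop2), junction(loop1, reverse(loop1))):
    for t in (0.0, 1.0):
        assert max(abs(loop(t)[i] - base[i]) for i in range(2)) < 1e-12
print("junctions are again loops with base point (1, 0)")
```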
c. Changing base point
By definition π_1(X, x_0) depends on the base point x_0. However, if x_0, x_1 ∈ X, suppose that there exists a path α : [0,1] → X from x_0 to x_1 and let ᾱ : [0,1] → X, ᾱ(t) := α(1 − t), be the reverse path from x_1 to x_0. For every loop γ with base point x_0, the curve ᾱ * γ * α is a loop with base point x_1. Since evidently ᾱ * γ_1 * α ∼ ᾱ * γ_2 * α if γ_1 ∼ γ_2, the path α defines a map α_* : π_1(X, x_0) → π_1(X, x_1) by

    α_*([γ]_{x_0}) := [ᾱ * γ * α]_{x_1},    (8.2)

where we have denoted by [ ]_{x_0} and [ ]_{x_1} respectively the homotopy classes of loops with base point x_0 and x_1. It is trivial to see that α_* is a group isomorphism, thus concluding the following.

8.26 Proposition. π_1(X, x_0) and π_1(X, x_1) are isomorphic as groups for all x_0, x_1 ∈ X if X is path-connected.

Thus, for a path-connected space X, all the groups π_1(X, x_0), x_0 ∈ X, are the same group up to isomorphism. We call it the fundamental group, or the first homotopy group, of X, and we denote it by π_1(X). However, the map α_* defined by (8.2) depends explicitly on α. For convenience, let h_α := α_*. Examples show that in general h_α ≠ h_β even if α and β have the same endpoints, but for any two such paths and every loop γ with base point x_0 we have

    h_β([γ]_{x_0}) = c * h_α([γ]_{x_0}) * c^{−1},    c := [β̄ * α]_{x_1}.

This implies that
(i) h_α = h_β if α and β are homotopic with the same endpoints,
(ii) h_α is always the same map, independently of α, if π_1(X, x_1) is a commutative group.
Figure 8.9. Camille Jordan (1838-1922) and the frontispiece of the Japanese translation of the Lecture Notes on Elementary Topology and Geometry by I. M. Singer and J. A. Thorpe.
Thus, attaching a path to x_0 to any curve γ : S^1 → X, we can construct a loop with base point x_0 and, at the homotopy level, this construction is actually a map h : [S^1, X] → π_1(X, x_0). It is clear that h is one-to-one, since its inverse is just the natural map from π_1(X, x_0) into [S^1, X].

8.27 Proposition. Let X be path-connected. If π_1(X) is commutative, then the map h : [S^1, X] → π_1(X) described above is bijective.

8.28 Definition. We say that a space X is simply connected if X is path-connected and π_1(X, x_0) reduces to a point for some x_0 ∈ X (equivalently, for any x_0 ∈ X by Proposition 8.26).

8.29 ¶. Show that X is simply connected if X is path-connected and contractible.
d. Invariance properties of the fundamental group
Let us now look at the action of continuous maps on the fundamental group. Let X, Y be metric spaces and let x_0 ∈ X. To any continuous map f : X → Y one associates a map

    f_# : π_1(X, x_0) → π_1(Y, f(x_0))

defined by f_#([γ]_{x_0}) := [f ∘ γ]_{f(x_0)}. It is easy to see that the above definition makes sense, and that actually f_# is a group homomorphism.

8.30 Proposition. We have the following.
(i) Let f : X → Y and g : Y → Z be two continuous maps. Then (g ∘ f)_# = g_# ∘ f_#.
(ii) If Id : X → X is the identity map and x_0 ∈ X, then Id_# is the identity map on π_1(X, x_0).
(iii) Suppose Y is path-connected, and let F : [0,1] × X → Y be a homotopy of two maps f and g from X into Y. Then the curve α(t) := F(t, x_0), t ∈ [0,1], joins f(x_0) to g(x_0) and g_# = α_* ∘ f_#.

Proof. (i) and (ii) are trivial. To prove (iii), it is enough to show that g ∘ γ and ᾱ * (f ∘ γ) * α are homotopic, as loops with base point g(x_0), for every loop γ with base point x_0. A suitable homotopy is given by the map H : [0,1] × [0,1] → Y defined by

    H(t, x) := { ᾱ(4x)                             if 0 ≤ x ≤ (1 − t)/4,
               { F(t, γ((4x − 1 + t)/(2t + 2)))     if (1 − t)/4 ≤ x ≤ (t + 3)/4,
               { α(4x − 3)                          if (t + 3)/4 ≤ x ≤ 1.

Indeed H(0, ·) is a reparametrization of ᾱ * (f ∘ γ) * α, H(1, ·) = g ∘ γ, and H(t, 0) = H(t, 1) = g(x_0) for all t. □
Of course, Proposition 8.30 (i) and (ii) imply that a homeomorphism h : X → Y induces an isomorphism between π_1(X, x_0) and π_1(Y, h(x_0)). Therefore, on account of Proposition 8.26, the fundamental group is a topological invariant of path-connected spaces. Actually, from (iii) of Proposition 8.30 we infer the following.

8.31 Theorem. Let X, Y be two path-connected, homotopy equivalent spaces. Then π_1(X) and π_1(Y) are isomorphic.

Proof. Let f : X → Y, g : Y → X be continuous maps such that g ∘ f ≃ Id_X and f ∘ g ≃ Id_Y, and let x_0 ∈ X. Then we have two induced maps

    f_# : π_1(X, x_0) → π_1(Y, f(x_0)),    g_# : π_1(Y, f(x_0)) → π_1(X, g(f(x_0))).

Let H : [0,1] × X → X be the homotopy of Id_X to g ∘ f and let K : [0,1] × Y → Y be the homotopy of Id_Y to f ∘ g. If α_1(t) := H(t, x_0) and α_2(t) := K(t, f(x_0)), then by Proposition 8.30 we infer

    g_# ∘ f_# = (g ∘ f)_# = (α_1)_* ∘ (Id_X)_# = (α_1)_*,
    f_# ∘ g_# = (f ∘ g)_# = (α_2)_* ∘ (Id_Y)_# = (α_2)_*.

Since (α_1)_* and (α_2)_* are isomorphisms, f_# is injective and surjective. □
8.1.3 Covering spaces
a. Covering spaces
A useful tool to compute, at least in some cases, the fundamental group is the notion of covering space.
8.32 Definition. A covering of Y is a continuous map p : X → Y from a topological space X, called the total space, onto Y such that for every y ∈ Y there exists an open set U ⊂ Y containing y such that p^{−1}(U) = ∪_α V_α, where the V_α are pairwise disjoint open sets and p|_{V_α} is a homeomorphism between V_α and U. Each V_α is called a slice of p^{−1}(U).
Figure 8.10.
8.33 Example. Let Y be any space. Consider the disjoint union of k copies of Y, which we can write as the Cartesian product X := Y × {1, 2, ..., k}. Then the projection map p : X → Y, p((y, i)) := y, is a covering of Y.
8.34 Example. Let S^1 be the unit circle of ℂ. Then the circular motion p : ℝ → S^1, p(θ) := e^{i2πθ}, is a covering of S^1.
8.35 Example. Let X ⊂ ℝ^3 be the trace of the regular helix γ(t) := (cos t, sin t, t). Then p : X → S^1, where p : ℝ^3 → ℝ^2, p(x, y, z) := (x, y), is the orthogonal projection onto ℝ^2, is another covering of S^1.
8.36 ¶. Let p : X → Y be a covering of Y. Suppose that Y is connected and that for some point y_0 ∈ Y the set p^{−1}(y_0) is finite and contains k points. Show that p^{−1}(y) contains k points for all y ∈ Y. In this case, we say that p : X → Y is a k-fold covering of Y.
8.37 ¶. Show that p : ℝ_+ → S^1, p(t) := e^{it}, is not a covering of S^1.
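To make Definition 8.32 concrete for the covering of Example 8.34, the following sketch (an illustration only, not part of the text) samples the preimage under p(t) = e^{2πit} of the arc U of S^1 of angular half-width π/4 around the point 1 and exhibits it as a union of disjoint slices, one around each integer.

```python
import cmath

def p(t):
    """The covering map p : R -> S^1, p(t) = e^{2 pi i t}."""
    return cmath.exp(2j * cmath.pi * t)

def in_U(z, half_width=0.125):
    """U = open arc of S^1 around 1 of angular half-width 2*pi*half_width."""
    return abs(cmath.phase(z) / (2 * cmath.pi)) < half_width

# sample [-3, 3] and keep the points whose image lies in U
step = 1.0 / 1000.0
preimage = [n * step for n in range(-3000, 3001) if in_U(p(n * step))]

# split the preimage into maximal runs of consecutive sample points: the slices
slices, current = [], [preimage[0]]
for a, b in zip(preimage, preimage[1:]):
    if b - a > 2.5 * step:      # a gap: a new slice starts
        slices.append(current)
        current = []
    current.append(b)
slices.append(current)

print(f"{len(slices)} slices of the preimage of U met in [-3, 3]:")
for s in slices:
    print(f"  approx ({s[0]:+.3f}, {s[-1]:+.3f}), around the integer {round(s[0])}")
```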
8.38 ,.. Show that, if p : X ~ X and q : Y ~ Yare coverings respectively, of X and Y, then p x q : X X Y ~ X x Y, p x q(x, y) :== (p(x), q(y)), is a covering of X x Y. In particular, if p : ~ ~ Sl is defined by p(t) :== e i 2rrt, then the map p xp : ~ X ~ ~ Sl X Sl is a covering of the torus Sl x Sl. Figure 8.10 shows the covering map for the standard torus of ~3 that is homeomorphic to the torus Sl x Sl C ~4. 8.39". Think of Sl as a subset of iC. Show that the map p : Sl ~ s1, p(z) == z2, is a two-fold covering of Sl. More generally, show that the map Sl ~ Sl defined by p(t) :== zn is a Inl-covering of Sl if n E Z \ {O}. 8.40 .... Show that the map p : ~+ covering of Sl x ~+. 8.41 ,.. Show that the map p ~2 \
X
~ ~ ~+
X
Sl defined by p(s, 9) == (s, e i9 ) is a
~+ x ~ defined by p(p,9) ._ pei9 is a covering of
{O}.
b. Lifting of curves In connection with coverings the notion of (continuous) lift is crucial. 8.42 Definition. Let p : X ---+ Y be a covering of Y and let f : Z ---+ Y be a continuous map. A continuous map Z ---+ X such that pol = f is called a lift of f on X.
1:
262
8. Some Topics from the Topology of IRn
8.43 Example. Let p : IR ---t Sl be the covering of Sl given by p(t) = e i t. A lift of f : [0,1] ---t Sl is a continuous map h : [0, 1J ---t IR such that f(t) = e i h(t). Looking at t as a time variable, h(t) is the angular evolution of f(t) as f(t) moves on Sl. 8.44 Example. Not every function can be lifted. For instance, consider the covering p ; IR ---t Sl, p(t) = e i 27rt. Then the identity map on Sl cannot be lifted to a continuous map h ; Sl ---t R In fact, parametrizing maps from S1 as closed curves parametrized on [0,271"J, h would be periodic. On the other hand, if h was a lift of z = eit , we would have e it = e i h(t), which implies that h(t) = t+const, a contradiction.
However, curves can be lifted to curves that are not necessarily closed. Let X be a metric space. We say that X is locally path-connected if every point x E X has an open path-connected neighborhood U. 8.45 Proposition. Let p : X ~ Y be a covering of Y and let Xo EX. Suppose that X and Yare path-connected and locally path-connected. Then ~ Y with f3(0) = p(xo) has a unique continuous lift a : [0, 1] ~ X such that po a = f3 and a(O) = xo, (ii) for every continuous map k : [0,1] x [0, 1] ~ Y with k(O, 0) = p(xo), there is a unique continuous lift h : [0,1] x [0, 1] ~ X such that h(O,O) = Xo and p(h(t, s)) = k(t, s) for all (t, s) E [0,1] x [0,1].
(i) each curve f3 : [0, 1]
Proof. Step 1. Uniqueness in (i). Suppose that for the two curves a1, a2 we have p(a1 (t)) = p(a2(t)) Vt E [0,1] and a1 (0) = a2(0). The set E := {t I a1 (t) = a2(t)} is closed in [0, 1J; since p is a local homeomorphism, it is easily seen that E is also open in [0,1]. Therefore E = [0, 1J. Step 2. Existence in (i). We consider the subset E := {t E [0,1]13 a continuous curve
at :
[0, t]
---t
X
such that a(O) = Xo and p(at(O)) = [3(0) W E [0, t]} and shall prove that E is open and closed in [0,1] consequently, E = [0,1] as it is not empty. Let r E E and let U be an open neighborhood of ar(r) for which Plu is a homeomorphism. For a sufficiently small, a < ao, the curve s ---t ,(s) := (p/U)-1(f3(s)), s E [r, r + a], is continuous, ,(r) = ar(r) and p(a(s)) = f3(s), Vs E [r, r + a]. Therefore for the curve a" : [0, a] ---t X defined by
a,,(s) := {ar(s) ,(s)
if 0::; s ::; r, ifT<s::;r+a,
°
we have a,,(t) = f3(t) for all t E [0, a], Le., r + ao E E for some ao > if r E E, or, in other words, E is open in [0, 1J. We now prove that E is closed by showing that T ;= supE E E. Let {tn} C E be a nondecreasing sequence that converges to T and for every n, let an : [0, tn] ---t X be such that p(an(t)) = [3(t) \:It E [0, tn]. Because of the uniqueness ar(t) = as(t) for all t E [0, r] if s < r, consequently a continuous curve a : [0, T[---t X is defined so that p(a(t)) = [3(t) \:It E [0, T[. It remains to show that we can extend continuously a at T. Let V be an open neighborhood of [3(T) such that p-1(V) = UjUj where U a are pairwise disjoint open sets that are homeomorphic to V. Then f3(t) E V for t < t ::; T. Since a([t, T[) is connected and the Ua's are pairwise disjoint, we infer that a(t) must
8.1 Homotopy
263
belong to a unique UOq say Ul, for t < t < T. It suffices now to extend by continuity a by setting a(T) := (PIU1)-ICB(T)).
Step 3. (ii) Uniqueness follows from (i). In fact, if P E [0,1] x [0,1], ,,/p is the segment joining (0,0) to P and hl,h2: [0,1] -; [0,1] -; X are such that POhl = POh2 in [0, IJ x [0, 1] with hI (0,0) = h2(0, 0), from (i) we infer hI (P) = h2(P), as hlhp = h 2hp ' Let us prove existence. Again by (i) there is a curve a(t) with a(O) = xo and p(a(t)) = k(O, t) for all t, and, for each t, a curve s -; h(s, t) such that p(h(s, t)) = k(s, t) with h(O, t) = a(t). Of course k(O, 0) = a(O) = xo and it remains to show that h: [0,1] x [0,1] -; X is continuous. Set R s := [O,s[x[O,I] and Ro := {O} X [0,1]. Suppose h is not continuous and let (s, t) be a point in the closure of the points of discontinuity of h. Let U be an open and connected neighborhood of h(t, s) such that Plu is a homeomorphism. By lifting k!p(u) we find a rectangle R C ]R2 that has (s, t) as an interior point and a continuous function w : R -; U with w(t, s) = h(t, s) such that p(w(t,s)) = k(t,s) = p(h(t,s)) for all (t,s) E R s . Since wand h are continuous in R s , they agree in R s n R. On the other hand, both h(t, s) and w(t, s) lift the same function k(t, s), thus by (i) they agree, hence h(t, s) = w(t, s) is continuous in a neighborhood of (t, s): a contradiction. D
8.46 Proposition. Let X and Y be path-connected and locally path-connected metric spaces and let p : X → Y be a covering of Y. Let α, β : [0,1] → Y be two curves with α(0) = β(0) and α(1) = β(1) that are homotopic with fixed endpoints, and let a, b : [0,1] → X be their continuous lifts that start at the same point a(0) = b(0). Then a(1) = b(1), and a and b are homotopic with fixed endpoints.
Proof. From (i) of Proposition 8.45 we know that α, β can be lifted uniquely to two curves a, b : [0,1] → X with a(0) = b(0) = a_0, p(a_0) = α(0). Let k : [0,1] × [0,1] → Y be a homotopy between α and β, i.e., k(0, t) = α(t), k(1, t) = β(t), k(s, 0) = α(0) = β(0), k(s, 1) = α(1) = β(1). By (ii) of Proposition 8.45 we can lift k to h, so that p(h(s, t)) = k(s, t) and h(0, 0) = a(0) = b(0). Then h is a homotopy between a and b and, in particular, a(1) = h(0, 1) = h(1, 1) = b(1). □
8.47 Theorem. Let X and Y be path-connected and locally path-connected metric spaces and let p : X -+ Y be a covering of Y. If Y is simply connected, then p : X -+ Y is a homeomorphism. Proof. Suppose there are XI,X2 E X with p(XI) = p(X2)' Since X is connected, there is a curve a : [0,1] -; X with a(O) = Xl and a(l) = X2. Let b : [0,1] -; X be the constant curve b(t) = Xl. The image curves a(t) := p(a(t)) and (3(t) := p(b(t)) are closed curves, hence homotopic, Y being simply connected. Proposition 8.46 then yields X2 = a(l) = b(l) = Xl. D
8.48 Theorem. Let X and Y be path-connected and locally path-connected, and let p : X -+ Y be a covering of Y. Suppose that Z is pathconnected and simply connected. Then any continuous map f : Z -+ Y has a lift f : Z -+ X. More precisely, given Zo E Z and Xo EX, such that p(xo) = f(zo), there exists a unique continuous map f : Z -+ X such that f(zo)=xo andpof=f. Proof. Let z E Z and let "/ : [0,1] -; Z be a curve joining Zo to z. Then the curve a(t) := f(,,/(t)), t E [0,1], in Y has a lift to a curve a : [O,IJ -; X with a(O) = xo, see (i) Proposition 8.45, and Proposition 8.46 shows that a(l) depends on a(l) = f(z)
264
8. Some Topics from the Topology of IR n
and does not depend on the particular curve 'Y. Thus we define J(z) := a(I), and by definition f(zo) = Xo and poi = f. We leave to the reader to check that i is continuous.
o
c. Universal coverings and homotopy 8.49 Definition. Let Y be a path-connected and locally path-connected metric space. A covering p : X ---+ Y is said to be a universal covering of Y if X is path-connected, locally path-connected and simply connected. From Theorems 8.47 and 8.48 we immediately infer
8.50 Theorem (Universal property). Let X, Y, Z be path-connected and locally path-connected metric spaces. Let p : X ---+ Y, q : Z ---+ Y be two coverings of Y and suppose Z simply connected. Then q has a lift q: Z ---+ X which is also a covering of X. Moreover q is a homeomorphism if X is simply connected, too. The relevance of the universal covering space in computing the homotopy appears from the following.
8.51 Theorem. Let X and Y be path-connected and locally path-connected metric spaces and let p : X ---+ Y be the universal covering of Y. Then 'r:/yO E Y 1l"l(Y,YO) andp-l(yo) c X are one-to-one. Proof. Fix q E p-l(xO). For any curve a in Y with base point xo, denote by a: [0, IJ ---> X its lift with a(O) = q. Clearly a is a curve in X which ends at a(l) E p-l (xo). Moreover, if f3 is loop-homotopic to a in Y, then necessarily a(l) = b(I), so the map a ---> a(l) is actually a map
'Pq : 7q(Y,XO)
--->
p-l(xO).
Of course 'Pq is surjective since any curve in X with endpoints in p-l (xo) projects onto a closed loop in Y with base point xo. Moreover, if 'Pq(b]) = 'Pq([t5]), then the lifts c and d that start at the same point end at at the same point; consequently c and dare homotopic, as X is simply connected. Projecting the homotopy between c and d onto Y yields b) = [15]. 0
d. A global invertibility result Existence of a universal covering p : X ---+ Y of a space Y can be proved in the setting of topological spaces. Observe that if X and Yare pathconnected and locally path-connected, and if p ; X ---+ Y is a universal covering of Y, then Y is locally simply connected, Le., such that 'r:/y E Y there exists an open set V C Y containing y such that every loop in V with base point at x is homotopic (in Y) to the constant loop x. It can be proved in the context of topological spaces that any path-connected, locally path-connected and locally simply-connected Y has a universal covering p : X ---+ Y. We do not deal with such a general problem and confine ourselves to discussing whether a given continuous map f : X ---+ Y is a covering of Y.
8.1 Homotopy
265
Let X, Y be metric spaces. A continuous map f : X ---4 Y is a local homeomorphism if every x E X has an open neighborhood U such that flU is a homeomorphism onto its image. We say that f is a proper map if f-l(K) is compact in X for every compact KeY. Clearly a homeomorphism from X onto its image f(X) C Y is a local homeomorphism and a proper map. Also, if p : X ---4 Y is a covering of Y then p is a local homeomorphism. We have
8.52 Theorem. Let X be path-connected and locally path-connected and let f : X ---4 Y be a local homeomorphism and a proper map. Then X and f(X) are open, path-connected and locally path-connected and f : X ---4 f(X) is a covering of f(X). Before proving Theorem 8.52, let us introduce the Banach indicatrix of n ---4 lR as the map
f :X
Nf :Y
Evidently f(X)
#{ x E X I f(x) = Y}.
---4
N U {oo},
=
{y I Nf(Y) ~ I} and f is injective iff Nf(y) ::::: 1 'Vy.
Nf(Y) :=
8.53 Lemma. Let f : X ---4 Y be a local homeomorphism and a proper map. Then Nf is bounded and locally constant on f(X). Proof. Since f is a local homeomorphism, the set f-1(y) = {x E X I f(x) = y} is discrete and in fact f- 1 (y) is finite, since f is proper. Let N f (Y) = k and f- 1 (Ti) = {Xl, ... ,xd. Since f is a local homeomorphism, we can find open disjoint neighborhoods U1 of Xl, ... , Uk of Xk and an open neighborhood V of Ti such that flUj : Uj - t V are homeomorphisms. In particular, for every y E V there is a unique Xj E Uj such that f(xj) = y. It follows that Nf(y) 2': k Vy E V. We now show that for every Ti there exists a neighborhood W of Ti, W c V, such that N f (y) :::: k holds for all yEW. Suppose, in fact, that for a Ti there is no neighborhood W such that N(y) :::: k for yEW, then there is a sequence {yd C W, Yi - t Ti with N(Yi) > k, and points ~i f/c U1 U .,. U Uk with f(~i) = Yi· The set f-1( {yd U {Ti}) is compact since f is proper, thus possibly passing to a subsequence {~i} converges to a point ~ and necessarily ~ f/c U 1 U· .. U Uk; passing to the limit we also find f(~) = Ti: a contradiction since ~ is different from Xl, ... , Xk' 0 Proof of Theorem 8.52. From Lemma 8.53 we know that, for every y E Y, f-1(y) contains finitely many points {Xl, X2, ... , XN} where N is locally constant. If Ui, i = 1, ... , N, Ui " Xi and V " y are open and homeomorphic sets, we then set
Clearly V is open and to V.
f- 1 (V)
is a finite sum of disjoint open sets that are homeomorphic 0
As a consequence of Theorem 8.47 we then infer the following useful global invertibility theorem.
8.54 Theorem. Let X be path-connected and locally path-connected, and let f : X ---4 Y be a local homeomorphism that is proper. If f(X) is simply connected, then f is injective, hence a homeomorphism between X and f(X).
266
8. Some Topics from the Topology of JRn
Proof. f: X -> f(X) is a covering by Theorem 8.52. Theorem 8.47 then yields that f is one-to-one, hence a homeomorphism of X onto f(X). 0
8.1.4 A few examples a. The fundamental group of 8 1
The map p : ℝ → S^1, p(t) = e^{i2πt}, is a universal covering of S^1. Therefore, for any x_0 ∈ S^1, the fiber p^{−1}(x_0) can be identified with ℤ as a set. Therefore, see Theorem 8.51, one can construct an injective and surjective map

    φ_{x_0} : π_1(S^1, x_0) → ℤ

that maps [α] to the end value a(1) ∈ ℤ of the lift a of α with a(0) = 0. We have
8.55 Lemma. φ_{x_0} : π_1(S^1, x_0) → ℤ is a group isomorphism.
Proof. Let α, β be two loops in S^1 with base point x_0 and let a, b be their liftings with a(0) = b(0) = 0. If n := φ_0([α]) and m := φ_0([β]), we define c : [0,1] → ℝ by

    c(s) := { a(2s)            if s ∈ [0, 1/2],
            { n + b(2s − 1)    if s ∈ [1/2, 1].

It is not difficult to check that c is the lift of α * β with c(0) = 0, so that

    φ_0([α] * [β]) = φ_0([α * β]) = c(1) = n + m = φ_0([α]) + φ_0([β]).    □
Since φ_{x_0} is a group isomorphism and ℤ is commutative, π_1(S^1, x_0) is commutative, and there is a bijective map h : [S^1, S^1] → π_1(S^1, x_0), see Proposition 8.27. The composition map deg(γ) := φ_{x_0}(h([γ])) is called the degree on S^1, and by construction we have the following.
8.56 Theorem. Two maps f, g : S^1 → S^1 have the same degree if and only if they are homotopic.
Later we shall see that we can recover the degree mapping more directly. 8.57'. Show that the fundamental group of JR2 \ {O} is Z.
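A quick numerical illustration of Exercise 8.57 (a sketch only, not part of the text): the homotopy class of a loop in ℝ² \ {0} is detected by the number of turns it makes around the origin, which can be computed by accumulating the increments of the angle, i.e., by building the lift of the normalized loop discretely.

```python
import math

def winding_number(loop, n=2000):
    """Number of turns of loop(t) around 0, for a closed loop : [0,1] -> R^2 minus the origin."""
    total, prev = 0.0, math.atan2(loop(0.0)[1], loop(0.0)[0])
    for k in range(1, n + 1):
        x, y = loop(k / n)
        ang = math.atan2(y, x)
        d = ang - prev
        d = (d + math.pi) % (2 * math.pi) - math.pi   # increment reduced to [-pi, pi)
        total, prev = total + d, ang
    return round(total / (2 * math.pi))

def circle(d):
    return lambda t: (math.cos(2 * math.pi * d * t), math.sin(2 * math.pi * d * t))

shifted = lambda t: (2.0 + math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

print(winding_number(circle(1)), winding_number(circle(-3)), winding_number(shifted))
# expected output: 1 -3 0
```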
8.1 Homotopy
267
Figure 8.11. A figure eight.
b. The fundamental group of the figure eight The figure eight is the union of two circles A and B with a point Xo in common. If a is a loop based at Xo that goes clockwise once around A, and a-I is the loop that goes counterclockwise once around A, and similarly for b, b- I , then the cycle aba-Ib- 1 is a loop that cannot be unknotted in AU B while aa-1bb- 1 can. More precisely, one shows that the fundamental group of the figure eight is the noncommutative free group on the generators a and b. Indeed, this can be proved using the following special form of the so-called SeifertVan Kampen theorem. 8.58 Theorem. Suppose X = UUV, where U, V are open path-connected sets and U n V is path-connected and simply connected. Then for any Xo E Un V, 1fl(X, xo) is the free product of 1fl(U, xo) and 1fl(V, xo)· 8.59 4\[. Show that the fundamental group of JR2 \ {XO, Xl} is isomorphic to the fundamental group of the figure eight. 8.60 4\[. Show that 7q(X x Y,(XO,Yo» is isomorphic to 7r1(X,XO) x 7rl(Y,YO), in particular the fundamental group of the torus 8 1 x 8 1 is Z x Z. 8.61 4\[. Let X = Al U A2 U ... U An where each Ai is homeomorphic to 8 1 , and Ai nA j = {XO} if i ~ j. Show that 7rl (X, xo) is the free group on n generators (1<1, ... , an where ai is represented by a path that goes around Ai once. 8.62 lIf. Let X be the space obtained by removing n points of JR2. Show that 7rl (X, xo) is a free group on n generators aI, ... , an, where ai is represented by a closed path which goes around the ith hole once.
c. The fundamental group of sn, n ~ 2 The following result is also a consequence of Theorem 8.58. 8.63 Theorem. Let X = U U V where U and V are simply connected open sets of X and U n V is path-connected. Then X is simply connected, i.e., 1fl(X,XO) = O. As a consequence we have the following.
8.64 Proposition. The sphere sn 1fl(sn,XO) = 0 ifn ~ 2.
c IRn +1 is simply connected, i.e.,
268
8. Some Topics from the Topology of
jRn
Proof. Let PS and PN be respectively, the south pole and the north pole of the sphere. The stereographic projection from the south (north) pole establishes a homeomorphism between sn \ {p S} (respectively, sn \ {p N }) and jRn. Thus 71'1 (sn \ {p S}, xo) == 71'1(sn \ {PN},XO) == 0 Vxo 1= Ps,PN. By Theorem 8.63 it suffices to show that sn \ {PS,PN} is path-connected. For that we notice that the stereographic projection is a homeomorphism between sn \ {ps, PN} and jRn \ {O} which in turn is path-connected if n 22. 0
Since the fundamental groups of JR.n+1 \ {O} and sn are isomorphic, see Theorem 8.31, equivalently we can state
8.65 Proposition. JR.n \ {O} is simply connected il n > 2. 8.66". Show that jRn, n
> 2, and
jR2
are not homeomorphic.
8.1.5 Brouwer's degree a. The degree of maps 8 1 -+ 8 1 A more analytic presentation of the mapping degree for maps Sl -. Sl is the following. Think of Sl as the unit circle in the complex plane, so that the rotations of Sl write as complex multiplication, and represent 100p3 in Sl as maps 1 : Sl -. Sl or by 21l'-periodic functions B -. l(e i6 ), I: Sl -. Sl. 8.61 Lemma. Let 1 : Sl -. Sl be continuous. There exists a unique continuous function h : JR. -. JR. such that /(e i6 )
= 1(1)eih (6)
WE JR.,
{ h(O) = O.
(8.3)
Proof. Consider the covering p: IR --> SI of SI given by p(t) :== eit . The loop g(z) :== f(z)/f(l) has base point 1 E SI. Then by the lifting argument, Proposition 8.45, there exists a lift h : IR --> .IR such that (8.3) holds. The uniqueness follows directly from (8.3). In fact, if hI, h2 verify (8.3), then hI (0) - h2(0) == k(0)27l' where k(O) E Z. As hI and h2 are continuous, k(O) is constant, hence k(O) == k(O) == O. 0
Let 1 : Sl -. Sl be continuous and let h : JR. -. JR be as in (8.3). Of course, for every B we have
h(B + 21l') - h(B) = 2k(B)1l' for some integer k(B) E Z. Since h is continuous, k is continuous, hence constant. Observe that k = h(21l') - h(O) = h(21l') and k is independent of the initial point 1(1). In particular, 1 : SI -. Sl and 1/1(1) : Sl -. SI have the same degree.
8.1 Homotopy
269
8.68 Definition. Let f : 8 1 -+ 8 1 and let h be as in (8.3). There is a unique integer d E Z such that h(O + 2n) - h(O)
= d2n
veER
The number d is called the winding number, or degree, of the map f 8 1 -+ 8 1 , and it is denoted by deg(J).
fr : 8 1
8.69 Theorem. Two continuous maps fo, degree if and only if they are homotopic.
---->
8 1 have the same
Proof. Let f: 8 1 -+ 8 1 . We have already observed that fez) and f(z)/ f(l) have the same degree. On the other hand, fez) and f(z)/f(l) are also trivially homotopic. To prove the theorem it is therefore enough to consider maps fa, h with the same base point, say f(l) = 1. (i) Assume fa, h are homotopic with base point 1 E 8 1 . By the lifting argument, the liftings ho, hI of fa, h characterized by (8.3) have hI (21l") = h2(21l"), hence deg(h)
= hI (21l") -
hl(O)
= hI (21l") = h2(21l") = h2(21l") -
h2(0)
= deg(h)·
Conversely, let f : be of degree d and let h be given by (8.3). Then the map k: [0,1] X 8 1 -+ 8 1 defined by
8 1 -+
81
k(t, e) := exp (th(e) establishes a homotopy of f to the map cp : 8 h have the same degree d and base point 1 E same map cp(z) = zd.
+ d (1 -
1 -+ 1, 8 1 , then
8
t)e)
cp(z) = zd. Therefore, if fa and they are both homotopic to the 0
Finally we observe that deg(zd) = d Vd E Z and that, if the same base point, deg(g * f) = deg(g) + deg(J).
f
and g have
b. An integral formula for the degree Let f : 8 1 -+ 8 1 and let h : lR -+ lR be as in (8.3). Clearly, thinking of 0 as a time variable, h(O) is the angle evolution of the point f(e iO ) on the circle. The degree of f corresponds to the total angle evolution, that is to the number of revolutions that f(z) does as z goes around 8 1 once counterclockwise, counting the revolutions positively if f(z) goes counterclockwise and negatively if f(z) goes clockwise. Suppose f : [0,2n] -+ 8 1 is a loop of class Or, that is 0 -+ f(e iO ) is of class 0 1 , and let h: lR -+ lR be as in (8.3). Differentiating (8.3) we get ie io f'(e iO ) = if(l)eih(O)h'(O)
= if(eio)h'(O)
and taking the modulus Ih'(O)! = 1f'(e iO )[. Therefore, h' is the angular velocity of f(z) times ±l depending on the direction of motion of f(z) when z moves as eiO on the unit circle. In coordinates, writing f := fr +ifz, we have f' = If + if~, hence
We conclude using the fundamental theorem of calculus
270
8. Some Topics from the Topology of lR n
/
/
/
'-'
/ Figure 8.12. Counting the degree.
8.70 Proposition (Integral formula for the degree). Let f 8 1 be of class C 1 . Then the lift h of f in (8.3) is given by
81
--+
In particular 27r
deg(J)
=
~ r 21r i
=
2~
h' (B) dB
o
1
27r
iO
ie ( - f2(e i o)f{(e iO ) +
h(eiO)f~(eiO)) dB.
(8.5)
One can define the lifting and degree of smooth maps by (8.5), showing the homotopy invariance in the context of regular maps, and then extending the theory to continuous functions by an approximation procedure.
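As a complement, here is a small numerical sketch of the previous remark (assuming NumPy; the bookkeeping differs slightly from (8.5) in that derivatives are taken with respect to the angle θ rather than with respect to z). Writing f(e^{iθ}) = f_1(θ) + i f_2(θ) with |f| = 1, the angular velocity is h′(θ) = f_1 f_2′ − f_2 f_1′, and integrating it over [0, 2π] and dividing by 2π gives the degree.

```python
import numpy as np

def degree(f, n=4096):
    """Approximate degree of f : S^1 -> S^1, given as a function of the angle theta."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    w = f(theta)
    w = w / np.abs(w)                         # normalize so that |f| = 1
    f1, f2 = w.real, w.imag
    dtheta = 2.0 * np.pi / n
    # periodic central differences for the theta-derivatives
    f1p = (np.roll(f1, -1) - np.roll(f1, 1)) / (2.0 * dtheta)
    f2p = (np.roll(f2, -1) - np.roll(f2, 1)) / (2.0 * dtheta)
    hprime = f1 * f2p - f2 * f1p              # the angular velocity h'(theta)
    return int(round(hprime.sum() * dtheta / (2.0 * np.pi)))

print(degree(lambda th: np.exp(3j * th)))           # z^3 on S^1: degree 3
print(degree(lambda th: np.exp(-2j * th)))          # degree -2
print(degree(lambda th: np.exp(1j * np.sin(th))))   # null-homotopic: degree 0
```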
c. Degree and inverse image The degree of f : 8 1 --+ 8 1 is strongly related to the number of roots of the equation f (x) = y counted with a suitable sign. 8.71 Proposition. Let f : 8 1 --+ 8 1 be a continuous map with degree d E Z. For every y E 8 1 , there exist at least jdl points Xl, X2, ... , Xd in 8 1 such that f(Xi) = Y, i = 1, ... , d. Furthermore, if f : 8 1 --+ 8 1 goes around 8 1 never turning back, i.e., if f(e iO ) = eih(O) where h : [0,21r] --+ IR is strictly monotone, then the equation f(x) = y, Y E 81, has exactly Idl solutions. Proof. Let h : lR --t lR be as in (8.3) so that h(21l") = 21l" d, and let s E [0, 21l"[ be such that e is = y. For convenience suppose d > O. The intermediate value theorem yields d distinct points 81, fh, ... , 8d in [0, 21l"[ such that h(8l) = s, h(fh) = s + 21l", ... , h(8 d ) = S + 2(d - 1)1l", hence at least d distinct points Xl, X2, ... , Xd such that f(xj) = f(e iOj ) = eih(Oj) = e is = y, see Figure 8.12. They are of course exactly d points Xl, X2, ... , Xd if h is strictly monotone. 0
8.1 Homotopy
271
With the previous notation, suppose f : Sl ----> Sl is of class C 1 , let h be as in (8.3) and let y E Sl and s E [0, 27r[ be such that y = eis . Assume that y is chosen so that the equation f(x) = y has a finite number of solutions and set
#{ BE Sl Ih(B) = s N_(J, y) := #{ BE Sl I h(B) = s N+(J, y) :=
(mod 27r), h'(B)
> O},
(mod 27r), h'(B) < O}.
Then one sees that
(8.6) see Figure 8.12. 8.72 The fundamental theorem of algebra. Using the degree theory we can easily prove that every complex polynomial
P(z)
:=
zm
+ a1zm-1 + ... + am-1Z + ao Set S~ := {z II z I = p}. For p sufficiently large,
has at least a complex root. P maps S~ in 1R2 \ {O}. Also deg(PIIPI) homotopy
Pt(z) := zm + t(a1zm-1 of P(z) to zm, we have
= m.
In fact, by considering the
+ ... + ao)
(la11 - + ... +laol - )) Izl Izlm [0,1] provided Izi is large
IPt(z)I~lzlm ( 1-t
t E [0,1],
Vizi =!= O.
Thus IPt(z) I > 0 V t E enough, consequently Pt(z)/lPt(z)l, t E [0,1], z E Sl, establishes a homotopy of PIIPI to zm from S~ into Sl, and we conclude that
deg(PIIPI) 8.73 -,r. Show that Figure 8.12.]
f :81
->
= deg(zm) = m.
8 1 has at least d - 1 fixed points if deg f = d. [Hint: See
d. The homological definition of degree for maps 8 1 --+ 8 1 Let f : Sl ----> ~1 be a continuous map, where for convenience we have denoted the target space Sl by ~1. We fix in Sl and ~1 two orientations, for instance the counterclockwise orientation, and we divide Sl in small arcs whose images by f do not contain antipodal points (this is possible since f : Sl ----> ~1 is uniformly continuous) and let Zl, ... , Zn, Zn+l = Zl E Sl the points of such subdivision indexed according to the chosen orientation in Sl. For each i = 1 ... ,n we denote by (};i the minimal arc connecting f(Zi) with f(Zi+1). We give it the positive sign if f(Zi) precedes f(Zi+d with respect to the chosen orientation of ~1, negative otherwise. Finally, for ( E ~\ ( =!= f(Zi) V i, we denote by p(() and n(() the number of arcs (};i respectively, positive and negative that contain (. Then
p(() - n(() as we can see looking at the lift of
=
f.
deg(J) E Z
272
8. Some Topics from the Topology of JRn
--~
Figure 8.13. Frontispiece of lecture notes by Louis Nirenberg and a page from a paper by L. E. Brouwer (1881-1966).
8.2 Some Results on the Topology of ℝ^n
Though the presentation of these topics would require more space and advanced techniques, and in any case it leads us away from the main path, we think that it is worthwhile to present here some results that are relevant in the sequel. However, we shall confine ourselves to illustrating the ideas and refer to the literature for complete proofs and more details. In the next two paragraphs we collect a few relevant results on the topology of maps into S^n that we freely use in the rest of this section.
8.2.1 Brouwer's theorem a. Brouwer's degree A topological degree, called Brouwer's degree, can be defined for continuous maps I : sn --t sn, n 2: 2, either by extending the homological type arguments in the case n = lor, more generally, in terms of homology groups or, analytically, in terms of a sum with sign of the numbers of inverse images of a point, either pointwise or in the mean. Intuitively, one counts how many times the target sn is covered algebraically by the source sn via the map 1. 1 Both approaches require the development of more advanced and relevant techniques; we refer the reader e.g., to o J. Dugundji, Topology, Allyn and Bacon, 1966, o L. Nirenberg, Topics in Nonlinear Analysis, AMS-CIMS, New York, 2001, 1
8.2 Some Results on the Topology of IR n
273
In this way we end up with a map deg:
cO(sn, sn) --t Z
such that (i) deg( Id) = 1, (ii) deg(f) = 0 if f is constant, (iii) deg(f) = (_I)n+l if f(x) = -x, and we have the following.
8.74 Theorem (Brouwer). Let fo, motopic. Then deg(fo) = deg(fd.
h : sn
--t
sn
be continuous and ho-
Indeed the degree completely characterizes the homotopy classes of continuous maps from sn into sn. In fact, we have the following.
8.75 Theorem (Hopf). Two continuous maps of sn into itself are homotopic if and only if they have the same degree. Moreover, for each dE Z there is a map f : sn --t sn with deg(f) = d. A map f : sn --t sn C ~n+l is called antipodal if f( -x) = - f(x) "Ix E sn. For instance, Id: sn --t sn and - Id : sn --t sn are antipodal.
8.76 Theorem (Borsuk antipodal theorem). Let f : sn --t sn be a continuous antipodal map. Then deg(f) is odd; in particular f is not homotopic to a constant map. b. Extension of maps into sn The following two extension theorems for maps into sn are also crucial. We refer the reader e.g., to J. Dugundji, Topology, Allyn and Bacon, 1966. 8.77 Theorem. We have the following.
sn be a closed set. Every continuous map f : A --t sn extends to a continuous map F : sn --t sn. (ii) Let A c sn+l be closed and f : A --t sn be continuous. Pick a point Xi E Ui in every bounded connected component Ui of Ac := sn+l \ A. Then there is a continuous extension F : sn+l \ UdPi} --t sn. (i) Let A c
8.78 Theorem (Borsuk). Let A C ~k, k ?: 1, be closed and let f A --t sn be continuous. Then f can be extended to a continuous map F : ~k --t sn if and only if f is homotopic to a constant map. Observing that Ac has a unique unbounded connected component if A is a compact subset of ~n, and using the stereographic projection, it is not difficult to infer from Theorem 8.77 (i), (ii). o one of the several books on degree theory.
274
8. Some Topics from the Topology of jRn
8.79 Theorem. We have the following.
c jRn be compact. Then any continuous function from A into sn can be extended to a continuous F : jRn -+ sn. (ii) Let A c jRnH be compact and f : A -+ sn be continuous. Pick a point Pi E Vi in every connected component Vi of AC. Then f can be extended to a continuous map F : jRn+l \ Ui{Pi} -+ sn. (i) Let A
As a consequence of the Hopf theorem and Proposition 8.8 we immediately infer the following.
8.80 Proposition. A function f : sn -+ sn has a continuous extension F : cl (BnH) -+ sn if and only if deg(f) = O. 8.81 Corollary. Let f : sn -+ jRn+l \ {O} be a continuous map. Then there exists a continuous extension F : cl (B n+1 ) -+ jRn+l \ {O} of f if and only if deg(f Ilf!) = O. c. Brouwer's fixed point theorem Since the identity from sn into sn has degree one, and the constant maps have degree zero, from the homotopic invariance of the degree we conclude the following. 8.82 Theorem (Brouwer). The identity map Id : sn motopic to a constant map.
-+
sn is not ho-
In other words, we cannot peel an orange without piercing the peel. Brouwer's theorem, whose content is quite intuitive, at least in dimension n = 2, has several interesting and surprising consequences. In fact, we have the following.
8.83 Theorem. The following claims are equivalent (i) (BROUWER'S THEOREM) The identity map Id : sn -+ sn is not homotopic to a constant map. (ii) There is no continuous map F : B -+ sn, B = cl (B n+1 ), such that F(x) = x 'Ix E sn, that is, sn is not a retract of B. (iii) (BROWER'S FIXED POINT THEOREM, I) Every continuous map f : B -+ B, B := cl (BnH ), has a fixed point, i. e., there is at least one x E B such that f(x) = x.
*
Proof. (i) (ii) If F : B -+ sn is a continuous function with F(x) = x Vx E sn, then H(t, x) := F(tx), (t,x) E [0,1] x sn, is a homotopy of the identity to F(O). A contradiction.
*
(ii) (iii) Suppose that there is a continuous F: B -+ B such that F(x) # x for all x E B. Then, and we leave this to the reader, the map G : B -+ sn that maps x E B into the unique point of sn on the half-line from f(x) to x would be a continuous map from B in sn with G(x) = x Vx E sn, contradicting (ii).
8.2 Some Results on the Topology of IR n
sn
275
sn
-t between the identity (iii) =} (i) Suppose that there is a homotopy H : [0,1] x and a constant map, H(I,x) = x, H(O,x) = P E Then the function F : B - t defined by
sn.
F(x)
:=
H(IX[, -lxl) x {p
if x
sn
i= 0,
if x = 0,
would be a continuous extension of the identity on would have no fixed point.
sn
to B, hence -F(x) : B
-t
B 0
8.84 'If. Let U C IR n + 1 be a bounded open set. Prove that there exists no continuous retraction r : V - t 8U with rex) = x on 8U. [Hint: Let 0 E U, B(O, k) :::J V and consider the continuous map f : B(O, k) - t B(O, k) defined by
k rex)
f(x)
if x E
1~(x)1
:=
{
kj;"]
V, __
if x E B(O, k) \ u.]
d. Fixed points and solvability of equations in jRn+l Going through the proof of Theorem 8.83, we can deduce a number of results concerning the solvability of equations of the type F(x) = O. Let f : sn ....... jRn+l \ {O} be a continuous map. Since f never vanishes, the map f Ilfl continuously maps sn into sn. We call degree of f with respect to the origin the number deg(f,O) := deg
(I~I)'
8.85 Proposition. Let f : sn ....... jRn+l \ {O} be a continuous map with deg(f,O) =1= O. Then every extension F : B ....... jRn of f, B := cl (B n + 1 ), has a zero in B n +! . Proof. Suppose this is not true. Then there exists a continuous extension F : B - t IR n + 1 \ {O} of f. Hence F(x)/IF(x)1 is a continuous map from B into sn. According to Proposition 8.80, F(x)/lF(x)1 = f(x)/lf(x)1 has degree zero, a contradiction. 0
Let us illustrate a few situations in which Proposition 8.85 applies.
8.86 Proposition. Let F : cl (B n + 1 ) ....... jRn+l be a continuous map such that F (x) never points opposite to x for all x E sn. Then F (x) = 0 has a solution. Proof. Let f 'Ix E
sn, f
:=
Fisn :
sn
-t
IRn+l. Since, by assumption F(x)
+ .xx i=
0 V .x
:::
0,
has no zeros and therefore
h(t,x)
:=
tf(x)
+ (1- t)x,
t E [0, 1], x E Sn,
sn
sn
never vanishes. Hence h(t,x)/lh(t,x)1 is a homotopy of fllfl: -t to the identity map Id : -t It follows that deg(J, 0) = deg(J Ilfl) = 1. We conclude, on account of Proposition 8.85 that F, being an extension of f, has at least one zero in B n + 1 . 0
sn
sn.
276
8. Some Topics from the Topology of jRn
8.87 Theorem (Brouwer's fixed point, II). Let F : B -+ jRn, B := cl (Bn), be a continuous map with F(&B) c B. Then F has a fixed point. Proof. Set 4>(x) := x-F(x), x E B, and suppose 4>(x) oj 0 "Ix, otherwise we are through. In this case 4> never points opposite for each x E aB. Indeed, if x - F(x) + AX = 0 for some A;::: 0 and x E aB, then F(x) = (l+A)x. Now A> 0 is impossible since IF(x)1 ::; 1, and, if A = 0, then F(x) = x on aB which we have ruled out. Thus, F(x) - x = 0 has a solution inside B. 0
It is worth noticing that Brouwer's theorem still holds if we replace cl (B n ) with any set which is homeomorphic to the closed ball of jRn. Moreover, it also holds in the following form.
8.88 Theorem (Brouwer fixed point theorem, III). Every continuous map f : K -+ K from a convex compact set K into itself has a fixed point. Proof. According to Dugundji's theorem, Theorem 6.42, f has a continuous extension F: jRn --> jRn, whose image is contained in K, K being convex and closed. If B is a ball containing K, then F(B) C B and by Brouwer's fixed point theorem, Theorem 8.87, F has a fixed point x E B, Le., F(x) = x, and, since F(x) E K, we conclude that
xEK.
0
e. Fixed points and vector fields Every (n + 1)-dimensional vector field in a domain A c jRn+l may be regarded as a map cp : A c jRn+ 1 -+ jRn+ 1 , once we fix the coordinates. If r.p is continuous and nonzero, the degree of cp with respect to the origin is called the characteristic of the vector field. The Brower's degree properties and Proposition 8.8 then read in terms of vector fields as follows. 8.89 Proposition. We have the following.
(i) (ii)
(BROUWER) Let r.p be a nonvanishing vector field in cl (Bn+l). Then r.pjsn has characteristic zero. The outward normal to Bn+l at x E sn = &B n+1 is x. Therefore
the outward normal field to sn, x -+ x/lxi, x E jRn+l \ {O}, has characteristic one. (iii) The inward normal at x E sn is -x. Therefore the inward normal field to sn, x -+ -x/lxi, x E jRn+l \ {O}, has characteristic (_1)n+l. (iv) Let r.p and 'ljJ be two continuous nonvanishing vector fields on sn that are never opposite on sn. Then r.p and 'ljJ have the same characteristic. Let us draw some consequences. 8.90 Proposition. Each nonvanishing vector field on r.p : cl (B n+ 1 ) -+ jRn+l must contain at least an inward normal and an outward normal vector. Proof. In fact, ""Isn has characteristic zero by (i) f·roposition 8.89. Since ""Isn and the field of outward (inward) normals have different characteristics, we infer from Proposition 8.89 (iv) that ""Isn must contain an inward (outward) normal. 0
8.2 Some Results on the Topology of JRn
277
8.91 Theorem (Poincare-Brouwer). Every continuous nonvanishing vector field on an even-dimensional sphere s2n must contain at least one normal vector. In particular, there can be no continuous nonvanishing tangential vector fields to sn. Proof. By (ii), (iii) Proposition 8.89, the inward and outward normal vector fields in s2n have different characteristics. Since any unitary vector field must have characteristics differing from one of these two fields, the result follows from (iv) Proposition 8.89. 0
8.92 Proposition. Let f : s2n --. s2n be a continuous map. Then either f has a fixed point x = f(x), or there is an x E s2n such that f(x) = -x. Proof. Suppose f : s2n
--->
s2n has no fixed point. Then the vector field 9 : s2n
--->
s2n
given by g(x) := [ji~j::::~[, x E S2n, is continuous and of modulus one. Thus it contains a normal vector, i.e., f(x) - x = '\x for some x E sn and ,\ E JR. Since If(x)1 = Ixl = 1 we infer 1 = If(x)1 = 1,\ + ll1xl = 1,\ + 11, i.e., either ,\ = 0 or ,\ = -2. We cannot have ,\ = 0 since otherwise f(x) - x = 0 and x would be a fixed point. Thus necessarily ,\ = -2, i.e., f(x) = -x. 0
8.93'. Let 4> : JRn
--->
8.94'. Let ¢: JRn
--->
JRn be a continuous map that is coercive, that is (4)(x) I x) -'-'---'-:-'-:-'--"- ---> 00 uniformly as Ixl ---> 00. Ixl Show that 4> is onto JRn. [Hint: Show that for every x,y E JRn 4>(x) - y never points opposite to x for Ixl = R, R large.] IRn be a continuous map such that limsup 1¢(x)1 Ixl~+oo Ixl
<
l.
Show that 4> has a fixed point.
8.95'. Let us state another equivalent form of Brouwer's fixed point theorem.
Theorem (Miranda). Let f: Q:= {x E JRn Ilxil::; 1, i = 1, ... ,n} continuous map such that for i = 1, ... , n we have !i(Xl,
,Xi-l,-I,Xi+l, ... ,Xn )
--->
JRn be a
2': 0,
!i(Xl, ,Xi-I, I,Xi+l,··· ,xn )::; O. Then there is at least one x E Q such that f(x) = O. Show the equivalence between the above theorem and Brouwer's fixed point theorem. [Hint: To prove the theorem, first assume that strict inequalities hold. In this case show that for a suitable choice of E1, ... , En E JR the transformation
X;=Xi+Ei!i(X),
i=l, ... ,n,
maps Q into itself, and use Brouwer's theorem. In the general case, apply the above to f(x) - ox and let 0 tend to O. Conversely, if F maps Q into itself, consider the maps !i(x) = Fi(X) - Xi, i = 1, ... ,n.]
8.96'. Show that there is a nonvanishing tangent vector field on an odd-dimensional sphere s2n-l. [Hint: Think s2n-l C JR2n. Then the field X= (Xl,X2, ... ,X2n)
---t
(-X n +l,-X n +2, ... ,X2n,Xl,X2, ... ,X n )
defines a map from s2n-l into itself that has no fixed point.]
278
8. Some Topics from the Topology of
jRn
' - " ' - " " " ' - t..LA.
Figure 8.14. Karol Borsuk (1905-1982) and a page from one of his papers.
8.2.2 Borsuk's theorem Also Borsuk's theorem, Theorem 8.76, has interesting equivalent formulations and consequences. 8.97 Theorem. The following statements hold and are are equivalent.
(i)
(BoRSUK-ULAM)
sn-l.
There is no continuous antipodal map f :
sn
---t
(ii) Each continuous f : sn ---t ~n sends at least one pair of antipodal points to the same point. (iii) (LVUSTERNIK-SCHNIRELMANN) In each family ofn+l closed subsets covering sn at least one set must contain a pair of antipodal points. Proof. Borsuk's theorem => (i) If I: sn ...... sn-1 is a continuous antipodal map, and if we regard sn-1 as the equator of sn, sn-1 C sn, I would give us a nonsurjective map I : sn ...... sn, hence homotopic to a constant. On the other hand I has odd degree by Borsuk's theorem, a contradiction. (i) => (ii) Suppose that there is a continuous 9 : sn ...... Then the map I : sn ...... sn-1 defined by
jRn
such that g(x)
1= g( -x).
I(x):= g(-x) -g(x) Ig(-x) - g(x)1 would yield a continuous antipodal map. (ii) => (iii) Let H, ... , Fn + 1 be n + 1 closed sets covering sn and let a : sn ...... sn be the map a(x) = -x. Suppose that a(Fi ) n Fi = 0 for all i = 1, ... , n. Then we can find continuous functions gi : sn - [0,1] such that g;l(O) = Fi and g;l(l) = a(F.). Next we define g: sn ...... jRn as g(x) = (gl(X), ... ,gn(X». By the assumption there
8.2 Some Results on the Topology of IR n
is Xo E sn such that g;(xo) = g;(-xo) V i, thus xo
f:-U F; t=l
and xo
279
f:-U a(F;), 1=1
consequently xo E F n +l n a(Fn +l), a contradiction. (iii) =? (i) Let f : sn ---> sn-l be a continuous map. We decompose sn-l into (n + 1) closed sets Al, ... , A n +l each of which has diameter less than two; this is possible by projecting the boundary of an n-simplex enclosing the origin and sn-l. Defining F; := f- 1 (A;), i = 1, ... , n + 1, according to the assumption there is an xo and a k such that Xo E F k n a(Fk)' But then f(xo) and f( -xo) belong to Fk and so f cannot ~~~~. 0
8.98 Theorem.
jRn
is not homeomorphic to
m and let h : IR n
jRm
if n
=1=
m.
Proof. Suppose n > be a continuous map. Since n - 1 2: m, from (ii) Theorem 8.97 we conclude that hlsn-l : sn-l ---> IR m C IR n - 1 must send two antipodal points into the same point, so that h cannot be injective. 0 ---> IR m
8.99 Remark. As a curiosity, (ii) of Theorem 8.97 yields that at every instant there are two antipodal points in the earth with the same temperature and atmospheric pressure. 8.100'. Show that every continuous map f : sn is surjective.
--->
sn such that f(x)
~
f(-x) V x
8.2.3 Separation theorems 8.101 Definition. We say that a set A c jRn+l separates complement Ac := jRn+l \ A is not connected. 8.102 Theorem. Let A
c
jRn+l
jRn+l
if its
be compact. Then
(i) each connected component of jRn+l \ A is a path-connected open set, (ii) AC has exactly one unbounded connected component, (iii) the boundary of each connected component of Ac is contained in A, (iv) if A separates jRn+l, but no proper subset does so, then the boundary of each connected component of Ac is exactly A. Proof. (i) follows e.g., from Corollary 6.68, since connected components of AC are open sets. (ii) Let B be a closed ball such that B =:J A. Then B C is open, connected and C Thus B is contained in a unique connected component of AC.
C
B c AC,
(iii) Let U be any connected component of A C and x E au. We claim that x does not belong to any connected component of AC, consequently x f:- AC. In fact, x f:- U, and, if x was in some component V, there would exist B(x, E) C V. B(x, E) would then also intersect U, thus Un V ~ 0: a contradiction. (iv) Let U be any connected component of AC. Since A separates IR n + 1, there is another connected component V of A C and, because V C IR n +1 \ V, necessarily IR n +1 \ V ~ 0. Consequently IRn+l \ au splits as
au = U u (IR n + 1 \ V) which are disjoint and nonempty, so au separates IR n + 1 . Since closed, it follows from the hypotheses on A that au = A. IRn+l \
by (iii)
au c
A and is 0
280
8. Some Topics from the Topology of jRn
8.103 Theorem (Borsuk's separation theorem). Let A c jRn+l be compact. Then A separates jRn+l if and only if there exists a continuous map f : A --t sn that is not homotopic to a constant. Proof. Define the map ,apiA as x-p
x-+--.
Ix-pi
Assume that A separates jRn+l. Then jRn+l \ A has at least one bounded component U. Choosing any p E U we shall show that ,apIA cannot be extended to a continuous function on the closed set AUU, consequently on jRn+l; hence ,apIA is not homotopic to a constant map by Proposition 8.8. In fact, if F : AuU -+ sn were a continuous extension of ,apiA we choose R > 0 such that B(p, R) :J Au U and define g : B(p, R) -+ oB(p, R) as
g(x) := {
p +R x - p Ix - pi
if x E B(p, R) \
p+ RF(x)
if x E V.
u,
Then g would be continuous in B(p, R) and g = Id on oB(p, R): this contradicts Brouwer's theorem. Conversely, suppose that A does not separate A c. Then A C has exactly one connected component, which is necessarily unbounded. By Theorem 8.79, f extends to F : jRn+l -+ sn. Therefore F and consequently f = FIA are homotopic to a constant m~.
0
In particular, Borsuk's separation theorem tells us that the separation property is invariant by homeomorphisms.
8.104 Corollary. Let A be a compact set in jRn and let h : A --t jRn be a homeomorphism onto its image. Then A separates IR.n if and only if h(A) separates jRn. As a consequence we have the following.
8.105 Theorem (Jordan's separation theorem). A homeomorphic image of sn in IR.n+l separates IR.n+l, and no proper closed subset of sn does so. In particular h(sn) is the complete boundary of each connected component oflR.n+ 1 \ h(sn). It is instead much more difficult to prove the following general Jordan's theorem.
8.106 Theorem (Jordan). Let h : sn --t IR.n+l be a homeomorphism between sn and its image. Then IR.n+l \ h(sn) has exactly two connected components, each having h(sn) as its boundary. Jordan's theorem in the case n theorem. We also have
=
1 is also known as the Jordan curve
8.3 Exercises
281
8.107 Theorem (Jordan-Borsuk). Let K be a compact subset oflRn +1 such that IR n \ K has k connected components, and let h be a homeomorphism of K into its image on IR n +1. Then IR n +1 \ h(K) has k connected components. Particularly relevant is the following theorem that follows from Borsuk's separation theorem, Theorem 8.103.
8.108 Theorem (Brouwer's invariance domain theorem). Let U be an open set oflRn + 1 and let h : U c IR n + 1 --+ IR n +1 be a homeomorphism between U and its image. Then h(U) is an open set in IR n + 1 . Proof. Let y E h(U). We shall show that there is an open set W C IR n + 1 such that yEW C h(U). Set x = h-l(y), and B := B(x, €) so that Be U. Then (i) IRn + 1 \ h(B) is connected by Corollary 8.104 since B is homeomorphic to B(O, 1) and B(O, 1) does not separate IR n + l , (ii) h(B,8B) = h(B) \ h(8B) is connected since it is homeomorphic to B(x, E). By writing
IR n + 1
\
h(8B) = (IR n
\
h(B)) U h(B \ 8B)
n
we see that IR \ h(8B) is the union of two nonempty, disjoint connected sets, that are necessarily the connected components of IR n \ h(8B); since h(8B) is compact, they are also open in IR n + l . Thus we can take W := h(B \ 8B). 0
A trivial consequence of the domain invariance theorem is that if A is any subset of IR n + 1 and h : A --+ IR n +1 is a homeomorphism between A and its image h(A), then h maps the interior of A onto the interior of h(A) and the boundary of A onto the boundary of h(A). Using Theorem 8.108 we can also prove
8.109 Theorem. IR n and IR m are not homeomorphic if n
1:- m.
Proof. Suppose m > n. If IR n were homeomorphic to IR m , then the image of IR n into IR m under such homeomorphism would be open in IR m . However, the image is not open under the map (x!, ... ,x n ) ---> (Xl, ... ,xn,O, ... ,0). 0
8.3 Exercises 8.110 .. Euler's formula. Prove Euler's formula for convex polyhedra in 1R3 : V - E+ F = 2, where V := # vertices, E := # edges, F := # faces, see Theorem 6.60 of [GM1]. [Hint: By taking out a face, deform the polyhedral surface into a plane polyhedral surface for which V - E + F decreases by one. Thus it suffices to show that for the plane polyhedral surface we have V - E + F = 1. Triangularize the face, noticing that this does not change V - E + F; eliminate from the exterior the triangles, this does not change V - E + F again, reducing in this way to a single triangle for which V - E + F = 3 - 3 + 1 = 1.J
282
8.111
8. Some Topics from the Topology of IR n
~.
Prove
Proposition. Let Ll be an open set of iC. Ll is simply connected if and only if Ll is path-connected and iC \ Ll has no compact connected components. [Hint: Use Jordan's theorem to show that Llc has a bounded connected component if Ll is not simply connected. To prove the converse, use that IR2 \ {xo} is not simply connected.] 8.112~.
Prove
Theorem (Perron-Frobenius). Let A = [aij] be an n x n matrix with aij ;::: 0 Vi, j. Then A has an eigenvector x with nonnegative coordinates corresponding to a nonnegative eigenvalue. [Hint: If Ax = 0 for some XED:= {x E IR n I xi;::: 0 Vi, L~l xi = I} we have finished the proof. Otherwise f(x) := AX/(Li(Ax)i) has a fixed point in D.] 8.113~.
Prove
Theorem (Rouche). Let B = B(O, R) be a ball in IR n with center at the origin. Let f, g E GO(S) with Ig(x)1 < If(x)1 on aB. Then deg(f, 0) = deg(f + g, 0).
Part III
Continuity in Infinite-Dimensional Spaces
Vito Volterra (1860-1940), David Hilbert (1862-1943) and Stefan Banach (1892-1945).
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
The combination of the structure of a vector space with the structure of a metric space naturally produces the structure of a normed space and a Banach space, Le., of a complete linear normed space. The abstract definition of a linear normed space first appears around 1920 in the works of Stefan Banach (1892-1945), Hans Hahn (1879-1934) and Norbert Wiener (1894-1964). In fact, it is in these years that the Polish school around Banach discovered the principles and laid the foundation of what we now call linear functional analysis. Here we shall restrain ourselves to introducing some definitions and illustrating some basic facts in Sections 9.1 and 9.4. Important examples of Banach spaces are provided by spaces of continuous functions that playa relevant role in several problems. In Section 9.3 we shall discuss the completeness of these spaces, some compactness criteria for subsets of them, in particular the Ascoli-Arzela theorem, and finally the density of subspaces of smoother functions in the class of continuous functions, as the Stone- Weierstrass theorem. Finally, Section 9.5 is dedicated to establishing some principles that ensure the existence of solutions of functional equations in a general context. We shall discuss the fixed point theorems of Banach and of CaccioppoliSchauder, the Lemy-Schauder principle and the method of super- and subsolutions. Later, in ~hapter 11 we shall discuss some applications of these principles.
9.1 Linear Normed Spaces 9.1.1 Definitions and basic facts 9.1 Definition. Let X be a linear space over][{ = IR or C. A norm on X is a function II II : X --t IR+ satisfying the following properties
(i) (ii) (iii) (iv)
IIxll E R "Ix E X, Ilxll ~ 0 and Ilxll = 0 if and only if x = IIAxl1 = IAlllxl1 V x E X, V A E ][{, Ilx + yll s; Ilxll + Ilyll V x, y E X.
0,
286
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
MONOCRAFJE MATE~tA1YCZNE 1.0M.ITIT lU)AIC:fJNY ...... ., .. e . . . EII ....U II. It- a.t'a",TOWlltL • M.A.tVUlIWlCL W ffPf'OIJCI •• JUUltAfJ'
TOM I
THEORIE D
t •
OPERATIO S UNEAIRES
••• STEFAN BA .&O"Ul.oa
J,
ACH
1,'V1UYta1iTt D' Lv4w
J""V.llIICJ'1 rtllfD"JZ'O &:l:UV.'- Jf4l.0COWIJ
VAloSZ"".I,h
Figure 9.1. Stefan Banach (1892-1945) and the frontispiece of the The07'ie des operations lineaires.
If II II is a norm on X, we say that (X, II ID is a linear normed space or simply that X is a normed space with norm II II.
Let X be a linear space. A norm 1111 on X induces a natural distance on X, defined by d(x, y) :=
IIx - yll
'r:/x,yEX,
which is invariant by translations, i.e., d(x+z, y+z) = d(x, y) 'r:/x, y, z E X. Therefore, topological notions such as open sets, closed sets, compact sets, convergence of sequences, etc., and metric notions, such as completeness and Cauchy sequences, see Chapter 5, are well defined in a linear normed space. For instance, if X is a normed space with norm II II, we say that {x n } C X converges to x E X if Ilxn - xII ---. 0 as n ---. 00. Notice also that the norm II II : X ---. lR is a continuous function and actually a Lipschitzcontinuous function,
Illxll-llylll ~ Ilx - yll, see Example 5.25. 9.2 Definition. A real (complex) normed space (X, II II) that is complete with respect to the distance d(x, y) .- IIx - ylj is called a real (complex) Banach space.
9.3 Remark. By Hausdor:ff's theorem, see Chapter 5, every normed linear space X can be completed into a metric space, that is, X is homeomorphic to a dense subset of a complete metric space. Indeed, the completed metric space and the homeomorphism inherit the linear structure, as one easily
9.1 Linear Normed Spaces
287
sees. Thus every normed space X is isomorphic to a dense subset of a Banach space. 9.4 Example. With the notation above: (i) JR with the Euclidean norm Ixl is a Banach space. In fact, Ixl is a norm on JR, and Cauchy sequences converge in norm, compare Theorem 2.35 of [GM2]. (ii) JRn, n ~ 1, is a normed space with the Euclidean norm
see Example 3.2. It is also a Banach space, see Section 5.3. (L~llziI2)I/2, z (iii) Similarly, en is a Banach space with the norm Ilzll (zl, z2, ... , zn). 9.5 ~ Convex sets. In a linear space, we may consider convex subsets and convex functions. Definition. E C X is convex if >.x+ (1->')y E E for all x, y E E and for all >. E [0,1]. f: X ---t JR is called convex if f(>.x + (1- >.)y) :s; Vex) + (1- >.)f(y) for all x,y E X and all >. E [0,1]. Show that the balls B(xo,r) := {x E X convex.
Illx - xoll < r}
of a normed space X are
a. Norms induced by inner and Hermitian products Let X be a real (complex) linear space with an inner (Hermitian) product (xly). Then Ilxll := J(xlx) is a norm on X, see Propositions 3.7 and 3.16. But in general, norms on linear vector spaces are not induced by inner or Hermitian products. 9.6 Proposition. Let II II be a norm on a real (respectively, complex) normed linear space X. A necessary and sufficient condition for the existence of an inner (Hermitian) product ( I ) such that Ilxll = (xix) 'Vx E X is that the parallelogram law holds,
'Vx,y E X. 9.1~. Show Proposition 9.6. [Hint: First show that if ogram law holds. Conversely, in the real case set
IIxl1 2= (xix),
then the parallel-
and show that it is an inner product, while in the complex case, set
(xly)
:= ~(llx + Yl12 -llx 4
yW)
+ i~(llx + iyl12 -llx - iyI12), 4
and show that (xly) is a Hermitian product.
9.8~.
For p ~ 1, Ilxllv:= (L~=llxilvr/p, x = (xl, x 2 , ... , x n ), is a norm in JRn, d. Exercise 5.13. Show that it is induced by an inner product if and only if p = 2.
288
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
b. Equivalent norms 9.9 Definition. Two norms 11111 and 11112 on a linear vector space X are said to be equivalent if there exist two constants 0 < m < M such that
mIlxlll ::; IIxl12 ::; M Ilxlll
"Ix E X.
(9.1)
If 11111 and 11112 are equivalent, then trivially the normed spaces (X, II lid and (X, II 112) have the same convergent sequences (to the same limits) and the same Cauchy sequences. Therefore (X, 11111) is a Banach space if and only if (X, 1/ 112) is a Banach space. Since the induced distances are translation invariant, we have the following. 9.10 Proposition. Let II IiI and II 112 be two norms on a linear vector space X. The following statements are equivalent
(i) II 111 and II 112 are equivalent norms, (ii) the relative induced distances are topologically equivalent, (iii) for any {x n } C X, Ilxnlll -40 if and only if IIxnl12 -4 o. Proof. Obviously (i) => (ii) => (iii). Let us prove that (iii) => (i). (iii) implies that the identity map i : (X, II III) --+ (X, II 112) is continuous at O. Therefore there exists 8 > 0 such that IIzl12 :::; 1 if Ilzlli :::; 8. For x E JR, x # 0, if z := (8/llxllI))x we have IlzIII = 8 hence IIzl12 :::; 1, Le., IIxll2 :::; Ilxlli. Exchanging the role of II III and II 112 and repeating the argument, we also get the inequality Ilxlli :::; IIxl12 "Ix E X for 0 some 81 > 0, hence (i) is proved.
i
t
9.11 1. Let X and Y be two Banach spaces. Show that their Cartesian product, called the direct sum, is a Banach space with the norm II(x, Y)III,xXY := Ilxllx + Ilylly. Show that
II(x,Y)llp,XxY :=
y!llxll~ + Ilyllt,
p
~
1,
II(x,Y)lloo,XxY := max(llxllx, Ilylly), are equivalent norms.
c. Series in normed spaces In a linear vector space X, finite sums of elements of X are elements of X. Therefore, given a sequence { xn } in X, we can consider the series l:~=o xn , i.e., the sequence of partial sums {l:Z=o Xk }. If, moreover, X is a normed space, we can inquire about the convergence of series in X.
9.12 Definition. Let X be a normed vector space with norm 111/. A series l:~=o x n , X n E X, is said to be convergent in X if the sequence of its partial sums, Sn := l:Z=o Xk converges in X, i.e., there exists sEX such that II Sn - S II -4 O. In this case we write 00
s
=
LXk
inX
k=O
instead of Iisn
- sll -4 0
and s is said to be the sum of the series.
9.1 Linear Normed Spaces
289
9.13 Remark. Writing s = L~o Xk might make one forget that the sum of the series s is a limit. In dubious cases, for instance if more than one convergence is involved, it is worth specifying in which normed space (X, II IIx), equivalently with respect to which norm II Ilx, the limit has been computed by writing 00
s=
LXk k=O
or
in the norm X,
00
LXk in X, k=O or, even better, writing lis - LZ=oXkllx -. O. s=
9.14 Definition. Let X be a normed space with norm II II. We say that the series L;o X n , {x n } eX, is absolutely convergent if the series of the norms Ln=o Ilx nII converges in R
We have seen, compare Proposition 2.39 of [GM2] , that every absolutely convergent series in lR is convergent. In general, we have the following. 9.15 Proposition. Let X be a normed space with norm II II. Then all the absolutely convergent series of elements of X converge in X if and only if X is a Banach space. Moreover, if L~=o Xn is convergent, then
=o
Proof. Let X be a Banach space, and let 2: k Xk be absolutely convergent. The sequence of partial sums of 2:k=O [[xkll is a Cauchy sequence in JR., hence 2:k=p [[Xk[[ 0 as p, q 00. From the triangle inequality we infer that
-+
-+
q
[I
11-+
LXk[1 k=p
~
q
L Ilxk[[ k=p
-+
hence" 2:k=p Xk 0 as P, q 00, i.e., the sequence of partial sums of 2:~0 Xk is a Cauchy sequence in X. Consequently, it converges in norm in X, since X is a Banach space. Conversely, let {xd c X be a Cauchy sequence. By induction select nl such that Ilxn - xn111 < 1 if n ~ nl, then n2 > nl such that Ilxn - xn211 < 1/2 if n ~ n2 and so on. Then {Xnk} is a subsequence of {Xk} such that
[[Xnk+l - xnkll ~ T k
Vk,
and consequently the series 2:~1 (X nk + 1 - x nk ) is absolutely convergent, hence convergent to a point y E X by assumption, i.e., p
"L(Xnk + 1 k=l
-
Xnk) -
yll-+ 0,
as P
-+ +00.
290
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
Since this simply amounts to Ilx np - xii -> 0, x := y + X n1 ' {X nk } converges to x, and, as {Xn} is a Cauchy sequence, we conclude that in fact the entire sequence {x n } converges to x. Finally, the estimate follows from the triangle inequality q
q
II k=p LXkl1 ~ k=p L Ilxkll as q ->
00,
since we are able to pass to the limit as L~=o xk converges.
o
9.16 'If Commutativity. Let X be a Banach space and let {x n } C X be such that Ln Xn is absolutely convergent. Then Ln x
is absolutely convergent and
Lk=O Xk
=
L~=l (LkEln Xk).
d. Finite-dimensional normed linear spaces In a finite-dimensional vector space, there is only one topology induced by a norm: the Euclidean topology. In fact, if JK = lR or JK = C, we have the following. 9.18 Theorem. In JKn any two norms are equivalent. Proof. It suffices to prove that any norm p on lK n is equivalent to the Euclidean norm Let (€I, e2, . .. , en) be the standard basis of lK n . If x = (x 1, x 2 , ... , X n) and y = (yl, y2, ... , yn), we have
I I·
~
p(X - y) = p(i)x i - yi)e i ) i=l
t
Ix i - yilp(ei)
i=l
hence p : lKn -> lR+ is continuous. Since the unit ball B := {x E lKn Ilxl = I} of lKn is compact, we infer that p attains a maximum value M and a minimum value m on B. Since the minimum value is attained, we infer that m > 0, otherwise p would be zero at some point of B. Therefore 0 < m ~ p(x) ~ M on B, and, on account of the I-homogeneity of the norm,
mixi i.e.,
IIII
~ p(x) ~ M
Ixi
is equivalent to the Euclidean norm.
o
9.19 Corollary. Every finite-dimensional normed space X is a Banach space. In particular, any finite-dimensional subspace of X is closed and K c X is compact is and only if K is closed and bounded.
9.1 Linear Normed Spaces
291
Proof. Let p be a norm on X and let £ : IK n --t X be a coordinate map on X. Since £ is linear and nonsingular, po £ is a norm on IKn and £ is trivially an isometry between the two normed spaces (IK n , po £) and (X, p). Since po £ is equivalent to the Euclidean norm, (IK n , p 0 £) is a Banach space and therefore (X, p) is a Banach space, too. The second claim is obvious. 0 9.20~.
Let X be a normed space of dimension n. Then any system of coordinates IKn is a linear continuous map between IK n with the Euclidean metric and the normed space X.
£ :X
--t
A key ingredient in the proof of Theorem 9.18 is the fact that the closed unit ball in JRn is compact. This property is characteristic of finitedimensional spaces.
9.21 Theorem (Riesz). The closed unit ball of a normed linear space X is compact if and only if X is finite dimensional. For the proof we need the following lemma, due to Frigyes Riesz (18801956), which in this context plays the role of the orthogonal projection theorem in spaces with inner or Hermitian products, see Theorem 3.27 and Chapter 10.
9.22 Lemma. Let Y be a closed linear subspace of a normed space (X, II II)· Then there exists x E X such that Ilxll = 1 and Ilx - xii 2:: 1/2 V xE Y. Proof. Takexo E X\Y and define d:= inf{lly-xollly E Y}. We have d > 0, otherwise we could find {Yn} C Y with Yn --t xo and xo E Y since Y is closed. Take YO E Y with Ilyo - xoll ::; 2d and set x = II~g=~gll' Clearly Ilxll = 1 and Yo + yllxo - yoll E Y if y E Y, hence IIY- XII=lly-
xo-Yo 11= IlyIIXO-YOII-xo+YOII
Ilxo - yoll
Ilxo - yoll
>~=~.
- 2d
2
o Proof of Theorem 9.21. Let B := {x E X Illxll ::; 1}. If X has dimension n, and £ : IK n --t X is a system of coordinates, then £ is an isomorphism, hence a homeomorphism. Since B is bounded and closed, £-I(B) is also bounded and closed, hence compact in IK n , see Corollary 9.19. Therefore B = £(£-I(B)) is compact in X. We now prove that B is not compact if X has infinite dimension. Take Xl with IlxI11 = 1. By Lemma 9.22, we find X2 with IIx211 = 1 at distance at least 1/2 from the subspace Span {xl}, in particular Ilxl - x211 2': ~. Again by Lemma 9.22, we find X3 with IIx311 = 1 at distance at least 1/2 from Span {Xl, X2}, in particular IIx3 - Xlii 2': ~ and IIx3 - x211 2': ~. Iterating this procedure we construct a sequence {Xn} of points in the unit sphere such that Ilx; - xjll 2': ~ Vi,j, i of- j. Therefore {x n } has no convergent subsequence, hence the unit sphere is not compact. 0
9.23 Remark. We emphasize that, in any infinite-dimensional normed space we have constructed a sequence of unit vectors, a subsequence of which is never Cauchy.
292
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
9.1.2 A few examples In Sections 9.2 and 9.4 we shall discuss respectively, the relevant Banach spaces of linear continuous operators and of bounded continuous functions. Here we begin with a few examples. a. The space i p , 1 ::; P Let (Y,
< 00
lilly)
{~i} C Y
be a normed space and p E JR, p ~ 1. For a sequence ~ we define
=
Then the space of sequences
is a linear space with norm 11~llep(Y). Moreover, we have the following.
9.24 Proposition. £p(Y) is a Banach space if Y is a Banach space. Proof. Let {~d, ~k :=
{d
k
)},
be a Cauchy sequence in l'p(Y). Since for any i (9.2)
the sequence {~ik)h is a Cauchy sequence in Y, hence it has a limit ~i E Y, as k
-> 00.
We then set ~ := {~;} and prove that {~d converges to ~ in l'p(Y). Fix all n, m ~ no(€) we have
€
> 0,
then for
hence, for all r E N r
L IId
n
) -
d
m
)lIv < €p
i=1
and, since x
->
liz - xlly
is continuous in Y, as m
-> 00,
r
L IId
n
) -
~illv ::;
€p
i=l
for n ~ no(€) and all r. Letting r -> 00, we find II~n - ~[[ip(Y) ::; € for n ~ no, i.e., in l'p(Y). Finally, the triangle inequality shows that ~ E l'p(Y). D
~n -> ~
9.1 Linear Normed Spaces
293
b. A normed space that is not Banach The map
f
---t
Ilfll p = IIfll p,]a,b[
:=
(l
b
) lip
a If(t)IP dt
,
P 21,
defines a norm on the space of continuous functions CO([a, b], JR). Indeed, if Y is a linear normed space with norm II II y ,
l Ilf(t)llv b
Ilfll p = Ilflb(]a,b[,Y)
:=
dt,
p
21,
defines a norm on the space of continuous functions with values in Y. In fact, t ---t Ilf(t)llv is a continuous real-valued map, hence Riemann integrable, thus Ilfll p is well defined. Clearly Ilfll p = if and only if f(t) = Vt and Ilfll p is positively homogeneous of degree 1. It remains to prove the triangle inequality for f ---t II flip, called the Minkowski inequality,
°
°
vf, g E CO([a, b], Y). The claim is trivial if one of the two functions is zero. Otherwise, we use the convexity of ¢ : Y ---t JR where ¢(y) := IlylIV' i.e.,
¢(AX + (1 - A)Y) ::::; A¢(X) + (1 - A)¢(Y), with x = f(t)/llfll p, y integrate to get
= g(t)/llglip
Vx, Y E Y, VA E [0,1], (9.3)
and A = IIfll p/(llfll p + Ilgllp), and we
Ilgllp l Ilf(t) + g(t)llv b
Ilfll p
+1
a
dt ::::; 1,
that is Minkowski's inequality. It turns out that CO([a,b],JR) normed with II lip is not complete, see Example 9.25. Its completion is denoted by LP(]a, b[). Its characterization is one of the outcomes of the Lebesgue theory of integration. 9.25 Example. Define, see Figure 9.2,
In(t) =
The sequence
{
o
if - 1 :::; t :::; 0,
nt
if 0
1
if lin:::; t :::; 1
Un}
< t :::; lin,
converges to
Il/n -
I
and
in norm
III~ =
1/ n
1 a
II lip
I(t) =
{~
if - 1 :::; t :::; 0, if 0
< t :::;
1.
as
1 (1 - nt)P dt :::; n
->
O.
If 9 E C O ([-l, 1]) is the limit of Un}, then III - gllp = 0, consequently 9 [-1,0] and 9 = I = 1 on ]0, 1], a contradiction, since 9 is continuous.
= I = 0 on
294
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
1
lin Figure 9.2. Pointwise approximation of the Heaviside function.
c. Spaces of bounded functions Let A be any set and Y be a normed space with norm lilly. The uniform norm of a function f : A --+ Y is defined by the number (possibly infinity)
IlfIIB(A,Y) := sup Ilf(x)lIy· xEA
IlfIIB(A,Y) defines a norm on the space of bounded functions f : A B(A, Y) := {f : A
--+
Y IllfIIB(A,y) <
--+
Y
+oo}
which then becomes a normed space. The norm IlfIIB(A,Y) on B(A, Y) is also denoted by Ilflloo,A or even by Ilflloo when no confusion can arise. The topology induced on B(A, Y) by the uniform norm is called the topology of uniform convergence, see Example 5.19. In particular, we say that a sequence {fn} C B(A, Y) converges uniformly in A to f E B(A, Y) and we write uniformly in A, fn(x) --+ f(x) if
lIin - iIIB(A,Y)
--+
O.
9.26 Proposition. If Y is a Banach space, then B(A, Y) is a Banach space. Proof. Let Un} C B(A, Y) be a Cauchy sequence with respect to 111100' For any E > 0 there is a no such that Illn - Imlloo ::; E for all n,m ~ no. Therefore, for all x E A and n,m ~ no (9.4) Illn(x) - Im(x)lly ::; E. Consequently, for all x E A, {In (x)} is a Cauchy sequence in Y hence it converges to an element I(x) E Y. Letting m --+ 00 in (9.4), we find
Il/n(x) -
l(x)lly ::; E
i.e., III - Inlloo ::; E for n ~ no, hence 11/1100 uniformly in. B(A, Y) since E is arbitrary.
Vn
~
no and 'Ix E A,
::; Illnlloo +E, i.e., IE
B(A, Y) and In
--+
I 0
9.27 'If. Let Y be finite dimensional and let (€I, e2, ... , en) be a basis of Y. We can write I as I(x) = h(x)el + ... + In(x)en. Thus I E B(A, Y) if and only if all the components of I are bounded real functions.
9.2 Spaces of Bounded and Continuous Functions
295
d. The space loo(Y) A special case occurs when A = N. In this case B(A, Y) is the space of bounded sequences of Y, that we better denote by
fioo(Y)
:=
B(N, Y).
Therefore, by Proposition 9.26, fioo(Y) is a Banach space with the uniform norm 11~lleoo(Y) := 11~IIB(N,y) = sup II~illy, i
if Y is complete. 9.28'. Show that for 1 :::: p
< q < 00
we have
(i) £p(IR) C £q(IR) C £oo(IR), (ii) £p(IR) is a proper subspace of £q(IR), (iii) the identity map Id : £p(IR) ---+ £q(IR) is continuous, (iv) £1 (IR) is a dense subset of £q(IR) with respect to the convergence in £q(IR). 9.29'. Show that, if p, q ~ 1 and l/p + l/q
~
~ l~iT/il
::::
=
1I~II~p(lR) P
i=l
1, then
11T/lliq(lR) + --'-'---'-
(9.5)
q
for all {~n} E £p(IR) and {T/n} E £q(IR). Moreover, show that
11~llep(lR) = sup {! f: ~nT/nllliT/lleq(lR)
::::
1}.
(9.6)
n=l
[Hint: For proving (9.5) use the Young inequality ab:::: a P /p + bq /q. Using (9.5), show that ~ holds in (9.6). By a suitable choice b = b(a) and again using Young's inequality, finally show equality in (9.6).]
9.2 Spaces of Bounded and Continuous Functions In this section we discuss some basic properties of the space of continuous and bounded functions from a metric space into a Banach space.
9.2.1 Uniform convergence a. Uniform convergence Let X be a metric space and let Y be a normed space with norm II II y . Then, as we have seen in Proposition 9.26, the space B(X, Y) of bounded
296
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
functions from X into Y is a normed space with uniform norm, and B(X, Y) is a Banach space provided Y is a Banach space. We denote by Cb(X, Y) the subspace of B(X, Y) of bounded and continuous functions from X into Y, Cb(X, Y) := CO(X, Y) n B(X, Y).
Observe that, by the Weierstrass theorem Cb(X, Y) = C(X, Y) if X is compact, and that, trivially, Cb(X, Y) is a normed space with uniform norm. 9.30 Proposition. Cb(X, Y) is a closed subspace of B(X, Y). Proof. Let Un} C Cb(X, Y) be such that In -+ I uniformly. For any no = no(£) such that III - Inolloo,x < Eo It follows that
1l/lloo,x
~
£
> 0,
we choose
III - Inolloo,x + Il/nolloo,x < +00,
Le., I E B(X, Y). Moreover, since Ino is continuous, for a fixed Xo E X there exists 8> 0 such that Il/no(x) - Ino(xo)lly < £ whenever x E X and dx(x,xo) < 8. Thus, for d(x, xo) < 8, we deduce that
II/(x) - l(xo)IIy ~ IIf(x) - Ino(x)IIY i.e., I is continuous at Xo. In conclusion,
+ IIlno(x) - Ino(xo)IIy + Il/no(xo) - l(xo)IIy ~ 3£ o
IE CO(X, Y) n B(X, Y).
Immediate consequences are the following corollaries. 9.31 Corollary. The uniform limit of a sequence of continuous functions is continuous. 9.32 Corollary. Let X be a metric space and let Y be a Banach space. Then Cb(X, Y) with uniform norm is a Banach space. 9.33 1. Show that the space C1 ([a, bJ, JR) of real functions class C 1 , is a Banach space with the norm
11/11et:=
I/(x)1 +
sup XE[O,lJ
sup
I : [a, bJ
JR, which are of
1/'(x)l·
xE[O,l]
[Hint: If {/d is a Cauchy sequence in C1 ([a, b]), show that Ik Then passing to the limit in
h(x) - h(a) =
-+
l
x
-+
I, I~
-+
g, uniformly.
I~(t) dt,
show that I is differentiable and f'(x) = g(x) Vx.J 9.34~. Let X be a metric space and let Y be a complete metric space. Show that the space of bounded and continuous functions from X into Y, endowed with the metric
doo(f,g):= sup dy(f(x),g(x», xEX
is a complete metric space.
9.2 Spaces of Bounded and Continuous Functions
297
Figure 9.3. Consider a wave shaped function, e.g., f(x) = 1/(1 + x 2 ), and its translates fn(x) := 1/(1 + (x + n)2). Then Ilfnll= = 1, while fn(x) ---> 0 for all x E JR.
b. Pointwise and uniform convergence Let A be a set and let Y be a normed space normed by I Iy. We say that {fn}, fn : A -> Y, converges pointwise to f : A -> Y in A if
Ifn(x)
->
f(x)ly
->
\Ix
0
E
A,
while we say that Un} converges uniformly to f in A if
Ilfn -
flloo,A
->
O.
Since for all x E A Ilfn(x) - f(x)lly ~ Ilfn - flloo,x, uniform convergence trivially implies pointwise convergence while the converse is generally false. For instance, a sequence of continuous functions may converge pointwise to a discontinuous function, and in this case, the convergence cannot be uniform, as shown by the sequence fn(x) := x n, x E [0,1[, that converges to the function f which vanishes for all x E [0,1[, while f(l) = 1. Of course, a sequence of continuous functions may also converge pointwise and not uniformly to a continuous function, compare Figure 9.3. More explicitly, f n -> f pointwise in A if
\Ix
E
while,
A, \If> 0 :J n
fn
->
f
= n(x, f)
such that Ifn(x) - f(x)ly < f for all n 2': n,
uniformly in A if
\I f > 0 :J n = E such that Ifn(x) - f(x)ly <
f
for all n 2': n and all x E A.
Therefore, we have pointwise convergence or uniform convergence according to whether the index n depends on or is independent of the point
x. c. A convergence diagram For series of functions fn : A -> Y, we shall write 00
f(x)
=
L fn(x) n=l
\Ix E A
298
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
Absolute convergence in Y
Absolute convergence in B(A, Y)
'ix
Convergence in Y
Uniform convergence, Le., convergence in B(A, Y)
'ix
Figure 9.4. The relationships among the different notions of convergence for series of functions.
if the partial sums converge pointwise in A, and
f(x)
=
L
fn(x)
uniformly in A
n=l if the partial sums converge uniformly. Simply writing 2:::=1 fn(x) = f(x) is, in fact, ambiguous. Summarizing, we introduced four different types of convergence for series of functions from a set A into a normed space Y. More precisely, if {fn} C B(A, Y) and f E B(A, Y), we say that (i) (ii)
2:::=0 fn(x) converges pointwise to f if for all x E A 2:::=0 fn(x) = f(x) in Y, i.e., for all x E A, II 2::~=0 fn(x) - f(x)lly --+ 0 as p --+ 00, 2:::=0 fn(x) converges absolutely in Y for all x E A i.e., for any fixed x E A, the series of nonnegative real numbers 2:::=0 Ilfn(x)lly
converges, 2:::=0 fn(x) converges uniformly in A to f if 2:::=0 fn = fin B(A, Y), i.e., II 2::~=0 fn - fIIB(A,Y) --+ 0 as p --+ 00, (iv) 2:::=0 fn(x) converges absolutely in B(A, Y) if the series of nonnegative real numbers 2:::=0 IlfnIIB(A,Y) converges.
(iii)
Clearly (iv) implies (ii), and (iii) implies (i). Moreover, (iv) implies (iii) and (ii) implies (i) if Y is a Banach space; the other implications are false, see Example 9.35 below. 9.35 Example. Consider functions I : lR+ --> R Choosing In(x) := (_l)n In, we see that L~=l In (x) converges pointwise and uniformly, but not absolutely in lR or in B(lR, lR). Let I(x) := sin xix, x> 0, and, for any n E N, SinX
In(x)
:=
{
< x ::::: n + 1,
--
if n
o
otherwise.
x
Clln::::: Illnlloo : : : c2ln, I:nln does not converge absolutely in B(lR+,lR). But I(x) = L~=o In(x) converges pointwise 'ix E lR+ and also absolutely in lR for all x E lR+. Finally,
Since
9.2 Spaces of Bounded and Continuous Functions
I(x) -
P {SinX L In (x) = ----;-
if
n=O
otherwise,
0
299
x> p
hence P
III - n=O L Inl[(X) ~
C2 --->
as p
0
---> 00,
p
therefore 'L.n In converges uniformly, that is in B(IR+, lR). Here the convergence is uniform in B(lR+, lR) but not absolute in B(lR+, lR), because the functions In take their maxima at different points and the maximum of the sum is much smaller than the sum of the maxima.
9.36 Theorem (Dini). Let X be a compact metric space and let Un} be a monotonic sequence of functions fn : X -+ IR that converges pointwise to a continuous function f. Then f n converges uniformly to f· 9.37~. Show Dini's theorem. [Hint: Assuming that In converges by decreasing to 0, for all t > 0 and for all x E X there exists a neighborhood Vx of x such that Iln(x)1 < t \Ix E Vx for all n larger than some n(x). Then use the compactness of X. Alternatively, use the uniform continuity theorem, Theorem 6.35.J
9.38~.
Show a sequence {In} that converges pointwise to zero and does not converge uniformly in any interval of R [Hint: Choose an ordering of the rationals {rn} and consider the sequence In(x) := 'L.~=o
Let E be a dense subset of a metric space X, let Y be a Banach space and let
{In}, In : X ---> Y be a sequence of continuous functions that converges uniformly in E. Show that {fn} converges uniformly in X. 9.40 ~ Hausdorff distance. Let Z be a metric space with metric d bounded in Z x Z. In the space C of bounded closed sets of Z, we define the Hausdorff distance by
p(E, F) := sup {sup d(x, F), sup d(x, En. xEE
xEF
Show that p is a distance on C. Now suppose that X is a compact metric space and Y is a normed space. Show that {In} converges uniformly to I if and only if the graphs in X x Y of the In's converge to the graph of I with respect to the Hausdorff distance.
d. Uniform convergence on compact subsets 9.41 Example. We have seen in Theorem 7.14 of [GM2J that a power series with radius of convergence p > 0 converges totally, hence uniformly, on every disk of radius r < p. This does not mean that it converges uniformly in the open disk {[z[ < pl. For instance, the geometric series 'L.~=o x n has radius of convergence 1 and if Ixl < 1
~ n _ x p +1
1
l_x-L...,x -I-x' n=O consequently for all p,
1--x - n=O L xnl 1
sup
xE]-!,l[ 1 -
p
=
+00.
300
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
..
'
--ItO.... ... II
~"..",n.o M~
Caw,
...... _II" A«..,..,('O Oao-
,Nt CU4U
Wu.\ ... M.-ort&' . . .
..ltI...... ~."I ..'Il., •• lul
l.'A.,.,...
•• I..w.. ~"fIIM_Id4\111111 bAM l"M1e. mnlluNol" 1Md1k.~1 0•• oc. lIkaMo.I". tIllIlIMI. ~D431i
v
(",",
... "....'" .1.
Uf1. 4ll I\"u...'
• .,t,) ........' •
....,/.., -
.h'~."'1'Of1an
........._
. . ,rI1N .. W.Iai, 00"\.1 ..... 6) ,... r.c-f"~ COIDe
I
"'1
, 1J.,llt 'VI WI .... N IPtIlJI! .. pr,nc.,.. I IltOI'(IIttvaal rt~Hf
till"" l"'cMl:i-lu..
It'
..
41
f
'I'" run. ,\AI
an.
I ...
toGU.
1I"0Ct~
OIM ea·1 ".,t'UMlarl . . ,.....,. . . ,.,tln .t... ra.-
t_I ftfletlU di '''1 ur'" l".lrIU....... 4\Iojlrt'.. ~I",,~...n. I "110" loll"". -.(aJ•.• (llIfU1OGl lIe,t,. cootJau. d'.t. .or ...1l'lruen-11o.. to UI ........ '0 I.. OInt "'lIno z
"")-}l.~.".I)
.
•1. . . . C'Md,1 . . ." ' N .. MI
..$"',..J ala 1\I'I.do". IMMI 4tVO
~
_I
•
In",.. ~, .1 . . •
....,.. ... ~ ... "'I, eM I at a*a
""'j ""I.
I ....
allIM:W
" .. C:OtlIU.~ o.lla ...-r,,..UI: ". . a 41,. elM. per op! 119" ,..,.. ' oca1
COD~'f1"'.
'It
UI ftOntt
.tl
.~
lJero H-.nI rD-
WI lWl'lWto ..
eMI'I~
va
Figure 9.5. Giulio Ascoli (1843-1896) and the first page of Bulle sene di Iunzioni by Cesare Arzela (1847-1912).
9.42 Definition. Let A c JRn and let Y be a normed space. We say that a sequence of functions {fn}, fn : A -+ Y, converges uniformly on compact subsets of A to f : A -+ Y if for every compact subset K c A we have Ilfn - flloo,K -> 0 as n -> 00.
n
9.43'. Let In, I : -> lR be continuous functions defined on the closure of an open set n of Rn. Show that {fn} converges uniformly to I on if and only if {fn} converges uniformly to I in n.
n
9.44'. Let {In}, In : A C jRn -> Y, be a sequence of continuous functions that converges uniformly on compact subsets of A to I : A -> Y. Show that I is continuous.
9.2.2 A compactness theorem At the end of the nineteenth century, especially in the works of Vito Volterra (1860-1940), Giulio A~oli (1843-1896), Cesare Arzela (18471912) there appears the idea of considering functions :F whose values depend on the values of a function, the so-called funzioni di linee, functions of lines; one of the main motivations came from the calculus of variations. This eventually led to the notion of abstract spaces of Maurice FnJchet (1878-1973). In this context, a particularly relevant result is the compactness criteri<m now known as the Ascoli-Arzela theorem. a. Equicontinuous functions Let X be a metric space and Y a normed space.
9.2 Spaces of Bounded and Continuous Functions
301
9.45 Definition. We say that a subset F of l3(X, Y) is equibounded, or uniformly bounded, by some constant M > 0 if we have Ilf(x)lly ::; M 'ix E X, 'if E :F. We say that the family of functions F is equicontinuous if for all f > 0 there is 8 > a such that
Ilf(x) -
f(Y)lly <
'ix,y E X with dx(x,y) < 8, and 'if E:F.
f
9.46 Definition (Holder-continuous functions). Let X, Y be metric spaces. We say that a junction f : X ----t Y is Holder-continuous with exponent 0:, a < 0: ::; 1, if there is a constant M such that dy(f(x), f(y)) ::; Mdx(x, y)''', and we denote by CO,O(X, Y) the space of these functions.
Clearly the space CO,l (X, Y) is the space Lip (X, Y) of Lipschitzcontinuous functions from X into Y. On CO,O(X, Y) n l3(X, Y), 0<0: ::; 1, we introduce the norm
Ilflleo,,, =
sup
Ilf(x)lly +
xEX
sup x,yEX,xly
Ilf(x) - f(y)lly
(9.7)
Ilx-yll'X
and it is easy to show the following.
9.47 Proposition. CO,O(X, Y)nl3(X, Y) endowed with the norm (9.7) is a Banach space if Y is a Banach space. Bounded subsets with respect to the norm (9.7), of CO,O(X, Y) n l3(X, Y) provide examples of equicontinuous families. See the exercises at the end of this chapter for more on Holdercontinuous functions.
b. The Ascoli-Arzela theorem 9.48 Theorem (Ascoli-Arzela). Every sequence of functions Un} in CO ([a, b]) which is equibounded and equicontinuous has a subsequence that is uniformly convergent. More generally, we have the following.
9.49 Theorem. LetX be a compact metric space. A subsetF ofCO(X,IR) is relatively compact if and only if F is equibounded and equicontinuous. Proof. We recall that a subset of a metric space is relatively compact or precompact if and only if it is totally bounded, see Theorem 6.8. If F is relatively compact, then F is totally bounded and, in particular, equibounded in CO(X, JR.). For any € > 0, let iI, ... , fn, E .p be an €-net for F, i.e., \j f E F Ilf - /;11 < € for some Ii. Since the /;'s are uniformly continuous, there is a 8, such that dx(x,y)
< 8,
implies I/;(x) - /;(y)1
< €,
i=1,2, ... ,n E
o
302
9. Spaces of Continuous FUnctions, Banach Spaces and Abstract Equations
PROP. OESARE AR:/;ELA
..-..-. ..... ...:..-:-:...~ z:-; -= =.=.':=:'" "'L~~~.~,~a7."II~r.-:I~::":~~:' U. ,'I,•• n'l
1.. 11"""'- _ _ .. _ .... ,......-.~
.................... ""'
••
~"'l-.I,l .
' .." ••••• II.II , ~f.l. . _.. _.. _ . .. ~
.. ..III
_II~
~
11_6.1
1.L'".'I
I
.1 •••
II . ' - . . .... -
III . . . . : - . &IUl~ . . ~ ~l6IIlA ..... 00. ",......,u. f\Waay. P. 1"-.w.p ..... _ ......" ...
_.. _..... .-.
~""'1&dllJ1IrqlMl"",~",,,,,,,,,,.,,,,,,,,,,
__ ....... --
t
*-1NIIiml......... ~ .....
l~
=:-...:=.:..~W- ;.rJJ:~~ -lo·~~·.r:~_l~ ...... ...-..olI_ .. _ ftl'ItYO ....... _.........·
..--:::()ol ............
.. --- ----
.. ~ . a ~ w~
.....".,._.~UI.I.,
........ ~.-..---... :::::"~'':''~'';: =.:..~:-':-~ --..a.._.........
,. . . . . . iii LS!tellll
.....
n_
~
..... _ ~
_ _ jOII_
_..........
I\.;..r_ ....... _ a . -
,....1'..., 1_.41.d1.,.,.
..
IIl.a
.
. . ..".IIO
ftlUblJI. tl OII. . . llI&IIlI.ItndocIJ
,d....
. . ~ • • fIWJ; ~
..
~
~
.wlaPf¥W"""
1ioI~ ....
lllt.rr~ .......
w. .. ,. ....
po.-~,..
'"
tMeM.
,..
_I.,.t
~.tl.JWJ••
-E~:::.~=.?E=:~:. ......... ,.,n ....... '10
JI6M~..cl._ik1A
AIIIOWO , . '-11 1l.iulM1, It , • • n. v.wt. ...... I ...... ..-caJI • .,~,., .. t""""poIll1""""'."._~ ........ .au. IU u.-4 n.r..... .......,.,... Ie£" IaI\A4I'lI
.
_~'I'O.~
• •~ ...IAnJ_
.... , .
_
-.-._~--
Figure 9.6. Two pages respectively from Le curve limiti di una varieta data di curve by Giulio Ascoli (1843-1896) and from Bulle funzioni di linee by Cesare Arzela (18471912).
Given
f
Ilf - lioll < E. Then - lio(x)1 + llio(x) - lio(y)1 + Ilio(Y) - f(Y)1 :::: 211f - lioll + llio(x) - lio(y)1 :::: 3E,
E F we choose io, 1:::: io :::: n< in such a way that
If(x) - f(Y)1 :::: If(x)
for dx(x, y) < 8<, hence F is an equicontinuous family. Conversely, suppose F is equibounded and equicontinuous. Again by Theorem 6.8, it suffices to show that F is totally bounded. Let E > O. From the compactness of X and the equicontinuity of F, we infer that there exists a finite family of open balls B(Xi, ri) that cover X and such that If(y) - f(x;)1
<E
\;I y E B(Xi, ri),
\;If EF.
Since the set K := {f(Xi) 11:::: i :::: n, f E F} is bounded, we find Yl, Y2, ... , Yrn E JR such that K C Ur;iB(Yi, E). The set F is covered by the finite union of the sets
F",:= {f E
FI
f(x;) E B(Y"'(i),E)
< E,
i = 1, ... ,n},
with 1l" varying among the bijective maps 1l" : {I, ... , n} ---> {I, ... , n}. Therefore, it suffices to show that diamF", :::: 4E. Since for h, h E F", and x E B(Xi, ri) we have
+ Ih(Xi) - Y"'(ijl h(xi)1 + Ih(x;) - h(x)1
Ih(x) - h(x)1 :::: Ih(x) - h(Xi)1
+ IY"'(i)
-
:::: 4E,
the proof is concluded. 9.50~.
0
Notice that the sequence {fn} of wave shaped functions in Figure 9.3,
1 fn(x) := 1 + (x + n)2'
x E JR, n E 1\1,
is equicontinuous and equibounded, but not relatively compact.
9.3 Approximation Theorems
303
9.51 ~. Theorem 9.49 can be formulated in slightly more general forms that are proved to hold with the same proof of Theorem 9.49. Theorem. Let X be a compact metric space, and let Y be a Banach space. A subset Fe C(X, Y) is relatively compact if and only if F is equicontinuous and, for every X, the set F x of all values f (x) of f E F is relatively compact in Y. Theorem. Let X and Y be metric spaces. Suppose X is compact. A sequence {in} C C(X, Y) converges uniformly if and only if {fn} is equicontinuous and there exists a compact set KeY such that {fn(x)} is contained in a fJ-neighborhood of K for n sufficiently large. 9.52~.
Show the following.
Proposition. Let X, Y be two metric spaces and let 0 C X be compact. Then the subsets ofCo,c«O,Y) that are bounded in the II lieD,,, norm are relatively compact in
CO(O, Y).
9.3 Approximation Theorems In this section we deal with the following questions: Can we approximate a continuous function uniformly, and with given precision, by a polynomial? Under which conditions are classes of smooth functions dense with respect to the uniform convergence in the class of continuous functions?
9.3.1 Weierstrass and Bernstein theorems a. Weierstrass's approximation theorem In 1885 Karl Weierstrass (1815-1897) proved the following. 9.53 Theorem (Weierstrass, I). Every continuous function in a closed bounded interval [a, b] is the uniform limit of a sequence of polynomials. In particular, for every n there exists a polynomial Qn(x) (of degree d = d(n) sufficiently large) such that If(x) - Qn(x)1 ~ 2- n V x E [a, b]. If we set
n> 1, we therefore conclude that every continuous function f(x) can be written in a closed and bounded interval as the (infinite) sum of polynomials, 00
f(x) =
L n=O
Pn(x)
uniformly in [a, b].
304
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
We recall that, in general, a continuous function is not the sum of a power series, since the sum of a power series is at least a function of class COO, compare [GM2]. Many proofs of Weierstrass's theorem are nowadays available; in this section we shall illustrate some of them. This will allow us to discuss a number of facts that are individually relevant. A first proof of Theorem 9.53. We first observe, following Henri Lebesgue (1875-1941), that in order to approximate uniformly in [a, b] any continuous function, it suffices to approximate the function lxi, x E [-1,1]. In fact, any continuous function in [a, b] can be approximated, uniformly in [a, b], by continuous and piecewise linear functions. Thus it suffices to approximate continuous and piecewise linear functions. Let f(x) be one of such functions. Then there exist points xo = a < Xl < X2 < ... < Xr < Xr+l = b such that f'(x) takes a constant value d k in each interval]xk' Xk+l [. Then, in [a, b] we have r
f(x) = f(a)
+ L(dk
d-l = 0,
- dk-d'PXk(X),
k=O
where
1
'Pc (x) := max(x - c,O) = -((x - c)
2
+ Ix -
cl).
If we are able to approximate Ix - xkl, x E [a,b], uniformly by polynomials {Qk,n}, then the polynomials 1
r
Pn(x)
:=
f(a)
+ L(dk -
dk_l)-((x - Xk)
+ Qk,n(X))
2
k=O
approximate f(x) uniformly in [a, b]. By a linear change of variable, it then suffices to approximate Ixi uniformly in [-1, IJ. This can be done in several ways. For instance, noticing that if x E [-1,1], then 1 - Ixl solves the equation in Z
one considers the discrete process
zn+I(X) = {
~[z;(x) + (1- x 2)]
n
~
0,
zo(x) = 1.
It is then easily seen that the polynomials Zn (x) satisfy
(i) (ii) (iii) Since
°
Zn(X) ~ in [-1,1], zn(x) ~ Zn+I(X), Zn(X) converges pointwise to 1 -ixi if x E [-1,1]. 1 - Ixl is continuous, Dini's theorem, Theorem 9.36, yields that the polynomials
zn(x) converge uniformly to 1 -Ixl in [-1,1]. Alternatively one shows, using the binomial series, that 00
vT=""X =
L
n cnX ,
n=O
in] - 1,1[. Then one proves that the series converges absolutely in CO([-I, 1]), hence uniformly in [-1,1]. In fact, we observe that C n .- C~2)(-I)n is negative for n ~ 1 hence,
9.3 Approximation Theorems
u:co.
Obordlo
r.oal,tilobo ~ oopmnlor IIclHr l'mollm>on _ ntIla Vuilldorli-'
PROPRIETE E,TRRM LE
. . . . . ...= .,........':'T=,:=: .................... .... _,. ........ .. _....... ~:: ~
'\j\"ALYTIQ E
~
~--...
_ .... -........,.-,..0.- ............. ~.W;£W.~ntf.-fIA o. . . . ~
D'UU "AIIAILI .taul
................. _.r_. . __
11.1
....... ............ .....,.
-
+.,.._
~~
WI ft
.......-....
~
1l.4 . . . I-,.-~
~
Flit•
-
.................... .................. ....... - ... ;i;;f'-, .(T)•. hI
." I,..)
_...._. __.
, •••
.... -
U.·1\-.1II
'UII
./~
.......... ...=..,~II.JI ........... &0 ......... &,,'""'"""'"'
305
t»m.na .""......
GAV1"DU·nLLolJ If 0".
""
> .. ,,> ..,
-~
~--...-.
,
Figure 9.7. The first page of Weierstrass's paper on approximation by polynomials and the Le<;ons sur les proprietes extremales by Sergei Bernstein (188G-1968). p
p
L lenl = 2 - L n=O
p
Cn
= 2-
n=O
lim_ x-+l
L enx
n
n=O
00
:'::: 2 -
lim
L
CnX
x-tl- n=O
n
=2-
lim
V1=X = 2.
x-l-
Replacing 1 - x with x 2 , it follows that 00
Ixl =
L
cn (1 - x 2 )n
uniformly in [-1, IJ.
n=O
o 9.54
-,r.
Add details to the previous proof.
b. Bernstein's polynomials Another proof of Theorem 9.53, grounded in probablistic ideas, see Exercise 9.57, and giving explicit formulas for the approximating polynomials, is due to Sergei Bernstein (1880-1968). It is enough to consider functions defined in [0, 1] instead of in a generic interval [a, b]. 9.55 Definition. Let f E C1([0, 1]). Bernstein polynomials of fare
n
~
0.
306
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
9.56 Theorem (Bernstein). Bernstein's polynomials of f converge uniformly in [0,1] to f. Proof. We split the proof into three steps. Step 1. The following identities hold (9.8) n
:L(k - nx)2(n)X k (1 - x)n-k = nx(l- x). k=O k
(9.9)
The first is trivial: it follows from the binomial formula (a + b)n = E~ (~)akbn-k by choosing a = x and b = 1- x. The second needs some computation. Fix n 2:: 1. Starting from the identities
t t
t
k=O
k(n)yk = yD( (n)yk) = ny(y + l)n-l, k k=O k
k=O
k 2 C)yk = ny(ny + l)(y k
+ l)n-2,
we replace y by x/(I- x) and multiply each of the equalities by (1- x)n. It follows that
t
C)X k (1 - x)n-k = 1, k=O k
t
k=O
kC)x k (1 - x)n-k = nx, k
k=O
k 2 C)x k (1 - x)n-k = nx(nx + 1 - x). k
t
Multiplying each of the previous identities respectively, by n 2x 2 , -2nx and -1, and summing, we infer (9.9). Step 2. As x(1 - x) S ~, (9.9) yields
t Fix /)
> 0 and x
~x)2 (n)X k (1
(k -
k=O
n
k
_ x)n-k
S ~.
(9.10)
4n
E [0,1], and denote by ~n (x) the set of k in {O, 1, ... , n} such that
(9.10) then yields
:L
k C)x (1 - x)k S 1/)2' kE~n(x) k 4n
(9.11)
that is, for n large, the terms that mostly contribute to the sum in (9.8) are the ones with index k such that
9.3 Approximation Theorems
Step 3. Set M := sUPxE[O,l] If(x)1 and, given for [x - y[ < 8. Then we have Bn(x) - f(x) =
to [f(~)
E
> 0,
307
let 8 be such that If(x) - f(y)1
<E
- f(X)] G)x k (l- x)n-k
[f(~)-f(X)]G)xk(l-X)n-k
L kEt>n(x)
L [f(~)-f(x)]G)xk(l-X)n-k
+
kEI'n(x)
=: ~t>
where
rn For k E
r n(X),
:= {O, ... ,
i.e., Ik/n -
+ ~I'
n} \ tin = {k E {O, ... , n} II ~ - xl < 8}.
xl < 8,
on the other hand, if Ik/n - xl
~
we have [f(k/n) - f(x)[
< E,
hence
8, (9.11) yields
1 M < 2M - = -. I ~AI ~ 4n82 2n8 2
Therefore, we conclude for n large enough so that M/(2n8 2 )
IBn (x) - f(x)1 < 2E
:::; E,
uniformly in [0,1].
o 9.57 'If. The previous proof has the following probabilistic formulation. Let and let X n (p) be a random variable with binomial distribution
If f : [0,1]
--t
°:: ;
p :::; 1
lR is a function, the expectation of f(Xn(t)) is given by 00
E [f(Xn(t))] =
L f( ~) C)n1 - t)n-r n n
r=O
and one shows in the theory of probability that E [f(Xn(t))] converge uniformly to f.
c. Weierstrass's approximation theorem for periodic functions
We denote by C~ the class of continuous periodic functions in IR with period T > O. 9.58 Theorem (Weierstrass, II). Every function f E C~ is the uniform limit of trigonometric polynomials with period T.
308
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
In Section 9.3 we shall give a direct proof of this theorem and in Section 11.5 we shall give another proof of it: the Fejer theorem. It is worth noticing that, in general, a continuous function is neither the uniform nor the pointwise sum of a trigonometric series. Here we shall prove that the claims in Theorems 9.58 and 9.53 are equivalent. By a linear change of variable we may assume that T = 21f. First let us prove the following. 9.59 Lemma. Let f E C O([-1f,1f]) be even. Then for any an even trigonometric polynomial
E
> 0 there is
n
T(x) :=
I: ak coskx k=O
such that If(x) - T(x)1 <
V x E [-1f,1f].
E
Proof. We apply Theorem 9.53 to the continuous function g(y) := f(arccosy), y E
[-1,1], to obtain !f(arccosy) -
t
Ckykl
in [-1,1]'
k ck cos xl
in [O,n].
k=O hence !f(x) -
t
k=O n
L
To conclude, it suffices to notice that
Ck cos k x is an even polynomial.
o
k=O
Proof of Theorem 9.58. Let f E cg11"(JR). We consider the two even functions in [-n, n] f(x)
+ f(-x),
Then Lemma 9.59 yields for any f(x)
+ f( -x) =
T1 (x)
f
(f(x) - f(-x))sinx.
>0
+ 0<1 (x),
(f(x) - f( -x)) sin x = T2(X)
+ 0<2 (x)
T1(X) and T2(X) being two even trigonometric polynomials, and for the remainders 0<1 and 0<2 one has 10<1(x)l, 10<2(x)1 ::; f in [-n,n]. Multiplying the first equation by sin 2 x and the second by sinx and summing we find (9.12) for T3(X) a trigonometric polynomial and 110<31100,[-11",11"1 ::; 2f. The same argument applies to f(x - n /2), yielding
f( x -
~) sin 2 x
= T4(X)
+ 0<4(X)
where T4 is a trigonometric polynomial and 110<41100,[-11",11"] ::; 2f. By changing the variable x in x + ~, we then infer
(9.13) where T5(X) := T4(X+ ~) and 110<5 II 00, [-11" ,11"J ::; 2t:. Summing (9.12) and (9.13) we finally conclude the proof. 0
9.3 Approximation Theorems
309
9.60 Remark. Actually, the two Weiertrass theorems are equivalent. We have already proved Theorem 9.58 using Theorem 9.53. We now outline how to deduce the first Weierstrass theorem, Theorem 9.53, from Theorem 9.58, leaving the details to the reader. Given f E CO([-Jr, JrJ), the function g(x) := f(x) + f( -Jr) - f(Jr) x 2Jr satisfies g(Jr) = g( -Jr), hence g can be extended to a continuous periodic map of period 2Jr. According to Theorem 9.58, for any E > we find a trigonometric polynomial
°
n(,)
T,(x) :=
ao + 2:)ak coskx + bksinkx) k=l
with [g(x) - T,(x)1 :::; E for all x E [-Jr, Jr]. Next, we approximate sinkx and cos kx by polynomials (e.g., by Taylor polynomials), concluding that there is a polynomial Q,(x) with IT,(x) - Q,(x)1 < E "Ix E [-Jr, Jr], hence [g(x) - Q,(x)1 < 2E in [-1,1].
9.3.2 Convolutions and Dirac approximations We now introduce a procedure that allows us to find smooth approximations of functions.
a. Convolution product Here we confine ourselves to considering only continuous functions defined on the entire line. The choice of the entire line as a domain is not a restriction, since every continuous function on an interval [a, b] can be extended to a continuous function in lR and, actually for any 15 > 0, to a continuous function that vanishes outside [a - 15, b + 15]. f :
9.61 Example (Integral means). Let consider the mean function of f 1
rx H
lR
-+
f8(X) := 28 }x-8 f(~) df"
lR be continuous. For any 8
x E lR.
Simple consequences of the fundamental theorem of calculus are (i) f8(X) is Lipschitz continuous, (ii) h(x) -+ f(x) pointwise, while from the estimate
Ih(x) - f(x)[::;
sup [f(y) - f(x)1 ly-xl<8
and Theorem 6.35
(iii) f8(X)
-+
f(x) uniformly on every bounded interval of lR.
>
0
(9.14)
310
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
The above allows us, of course, to uniformly approximate continuous functions with Lipschitz-continuous functions on every bounded interval.
9.62 Definition. Let f, 9 : JR ---+ JR be two Riemann integrable functions. Suppose that g(x - t)f(t) is summable in JR for any x E R Then the function
9 * f(x)
:=
l
g(x - y)f(y) dy,
x E JR,
called the convolution product of f and g, is well defined. Clearly the map (j, g)
---+ 9
*f
(i) is a bilinear operator, (ii) 9 * f = - f * 9 since
9 * f(x) =
l
g(x - y)f(y) dy =
-l
g(z)f(x - z) dz = - f
* g(x),
(iii) if f and 9 are summable in [a, b] and f vanishes outside the interval [a, b], then 9 * f is well defined in JR and
Ig * f(x)1
:::;
l
b Ig(x - y)llf(y)1 dy :::; Ilglloo,[x-b,x-a]
l
b If(y)1 dy. (9.15)
9.63 Example. The function
18
in (9.14) is the convolution product of 1 and
g(x) =
{I
o
se se
8,
Ixl ::; Ixl > 8.
9.64 Example. If g(t) = :L~=o Cktk is a polynomial of degree n, then for any vanishes outside an interval,
1 that
is again a polynomial of degree n. 9.65 Example. If 1 = 0 outside [-n,n], then e ikt
* I(x)
1r
= [
1r
l(y)eik(X- y ) dy = 2nckeikx
i.e., e ikt * 1 is the kth harmonic component of the periodic extension of Section 11.5.
I,
compare
9.66 Theorem. Let 9 E Ck(JR), and let f be Riemann summable. Suppose that either f or 9 vanishes outside a bounded interval [a, b]. Then 9 * f E Ck(JR) and Dk(g * f)(x) = (Dkg) * f(x) "Ix E R
9.3 Approximation Theorems
311
Proof. We prove the claim when f = 0 outside [a, b], the other case 9 = 0 outside [a, b] is similar. By (9.15) we then have Ig * f(x)1 :::; Ilg[[oo,[x-b,x-ajll![[l
[[fliI
:=
l
b [f(y)[ dy,
hence
Ilg * f[[oo,[c,dj :::; [[g[[oo,[c-b,d-a] [[f[ll. (i) We now prove that 9
9 * f(x
+ h) -
*f
E C°(lR) if 9 E C°(lR). In fact,
= f(g(x - y + h) - g(x - y))f(y) dy = G * f(x),
9 * f(x)
where G(x) := g(x + h) - g(x). Therefore, using (9.15), we get
* f(x + h) - 9 * f(x)1 :::; Ilg(x + h) - g(x)lloo,[x-b,x-a] Ilflll ---+ 0 (9.16) as h ---+ 0 since [[g(t + h) - g(t)lloo,[x-b,x-a] ---+ 0, because of the uniform continuity of Ig
9 on compact sets.
(ii) Similarly, we prove that 9
*f
E Cl(lR) if
f E Cl(JR.). We have
9*f(X+h~-g*f(x) _ fg'(x-y)f(y)d y =
l (g(x-y+h~-9(X-Y)
-g'(x-y))f(y)dY
=H*f(x), where H(x)
:=
g(X+htg(x) - g'(x). Again, by (9.15),
Ig*f(X+h~-g*f(x) _ fg'(x-Y)f(y)d Y [ :::; Ilg(t+h) -g(t) -g'(t)11 IlfliI. h oo,[x-b,x-a] Since
Ig(x + h~ - g(x) _ g'(X)! = I~ l x+h(g'(y) - g'(x))! 1 (",+h
:::; h Jx :::;
Ig(y) - g(x)[ dy
sup 19'(y)-g'(x)I---+O ly-xl
ash---+O
because of the uniform continuity of g' on compact sets, we then conclude that 9 * f is differentiable at x and that (g * I)' (x) = 9' * f(x) "Ix E JR.. Finally (g * I)' = g' * f is continuous by (i).
(iii) The general case is then proved by induction.
0
9.67 Remark. Let f and 9 be summable and let one of them vanish outside a bounded interval. If f instead of 9 is of class Ck(JR.), then, recalling that 9 * f = - f * g, we infer from Theorem 9.66 that 9 * f E Ck(JR.) and Dk(g * f)(x) = 9 * (Dki)(x). Therefore if both f and 9 are of class Ck(JR.), then
Dk(g * f)(x) = (Dkg)
and, in general, 9
*f
* f(x) = 9 * (Dki)(x)
is as smooth as the smoother of f and g.
312
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
b. Mollifiers 9.68 Definition. A function k(x) E
= k( -x),
k(x)
k(x)
~ 0,
Coo(~)
such that
= 0 for Ixl ~
k(x)
1,
1
k(x) dx
=1
is called a smoothing kernel. 9.69~.
The function
exp (-~) I-x {o
if
Ixl < 1,
if
Ixl 2':
1
is COO(lR.), nonnegative, even and with finite integral. Hence the map k(x) := :%
Given a smoothing kernel k(x), we can generate the family
E-1k(~)
k,(x) :=
Trivially, k,( -x) k, E
= k,(x)
C,:"'(l~),
k,(x)
E
>
o.
and
> 0,
k,(x)
= 0 per Ixl ~
E,
1
k,(x) dx
=
1.
Also k,(x) = 0 for Ixl ~ E and Ilk,lloo = Ilklloo/E.
9.70 Definition. Given a smoothing kernel k, the mollifiers or smoothing operators S, are defined by Sd(x) := k,
* f(x)
=
1
k,(x - y)f(y) dy.
We have Sd(x)
l
x +,
= k, * f(x) = x-'
11
=-
E
x
k,(x - y)f(y) dy
+, kC - Y)f(Y) dy =
x-,
E
1 1
k(z) f(x - EZ) dz.
-1
Since the functions k, are of class Coo, the functions Sd(x), x E ~, are of class Coo by Theorem 9.66. Moreover, as shown by the next theorem, they converge to f in norms that are as strong as the differentiability of f; for instance, they converge uniformly or in norm C 1 if f is continuous or if f is of class C 1 , respectively.
f E \iE > 0;
9.71 Proposition. Let
(i) Sd
E Coo(~),
C°(l~). Then
9.3 Approximation Theorems
(ii) If f = 0 in [a, b], then Sd(x) = 0 in [a (iii) (Sd),(x) = (iv) Sd -+ f as
+ E, b -
313
E];
~ JJRk'(~)f(Y)dY; E
0 uniformly in any bounded interval [a, b].
-+
Moreover, if f E C1(lR.), then (Sd),(x) = (Sd')(x) \Ix E lR. and Sd'(x) f'(x) uniformly on any bounded interval [a,b].
-+
Proof. (i), (iii) follow from Theorem 9.66, and (ii) follows from the definition. (iv) If
f
E C°(lR) and x E JR we have
I!(x) - Sd(x)1
J
=I ~
K.(x - y)[f(y) - f(x)] dY !
= Ik. * (J -
f(x»(x)1
r
sup If(y) - f(x)1 ke(y) dy = sup If(y) - f(x)l· Iy-xl<. JJII. Iy-xl<'
Since f is uniformly continuous on bounded intervals in JR, SUPly_xl<.lf(y)- f(x)1 -+ 0, consequently IS. (x) - f(x)1 -+ 0 as € -+ 0 uniformly on compact sets of JR. If f E C 1 (JR), we have already proved in Theorem 9.66 that Sd is of class C 1 and that (Sd)'(x) = Sd'(x). Applying (iv) to Sd and Sd' we then reach the claim. 0
c. Approximation of the Dirac mass The family ike} is often referred to as an approximation of the Dirac delta. In applications, the Dirac 8 is often "defined" as a function vanishing at every point but zero and with the property that +00
1
-00
8(x) dx
= 1;
sometimes it is "defined" , with respect to convolution, as if it would operate like +00
1
-00
8(x - ~)f(~) d~ = f(x).
Of course, no such function exists in the classical sense; but it can be thought of as a linear operator from C~ (lR.) into lR.
We shall avoid dealing directly with 8, as the correct context for doing this is the theory of distributions, and we set 9.72 Definition. A sequence of nonnegative functions D n : lR. -+ lR. with the properties that for any interval [a, b] and for any p > 0 we have ( JB(O,p)
Dn(x) dx
-+
1,
1
[a,bl\B(O,p)
is called an approximation of the 8.
Dn(x)dx-+O, asn-+oo, (9.17)
314
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
,.,l~ ,:. ,:' ,.,
,:l
;A:
1: 1 /:\
~ o.
Figure 9.8. Approximations of the Dirac
9.73~. Let {D n } be an approximation of 0 and let Show that
lim
f be a continuous function in [a, b].
(b Dn(x - y)f(y)dx = f(x)
n-tooJa
'Ix E]a,b[.
It is easy to prove the following.
9.74 Theorem. Let {D n } be an approximation of 15. Suppose that each D n is continuous in JR. and let f be a continuous function in [a, b]. Then the functions fn(x) :=
l
b
Dn(x - y)f(y) dy,
x E [a, b],
converge uniformly to f in every interval [c, d] strictly contained in [a, b]. Theorem 9.74 uses, in an essential way, the fact that the approximations of 15 are nonnegative. For instance, the result in Theorem 9.74 does not hold for the sequence of the Dirichlet kernels, since the Fourier series of f does not converge to f if f is merely continuous, compare Section 11.5. 9.75~.
Prove Theorem 9.74.
9.76~.
Consider in [-1,1] the sequence of functions where
Show that for every p E]O, 1[ lim
t(1- t 2 )n dt p
n~(X) f~ (1 - t 2 )n dt
=0.
Infer that {D n } is an approximation of 0, hence the functions
1 1
fn(x) := C n
[1 - (t - x)2r f(t) dt
converge uniformly to f on compact sets of] - 1,1[. Finally, observing that the functions fn(x) are actually polynomials of degree not greater than 2n, called Stieltjes polynomials, deduce from the above Weierstrass's theorem, Theorem 9.53.
9.3 Approximation Theorems
cou.rCTJ~ 1II101O~o''IQI:lf'IIrU ~ I,.l flllQlJr, D I~ 'OIlD1 ..... U .''1'1• • III • • ....~
I. P.
315
ATAN 0
LECO~
TJ THEOR
L'APPROXHIATION DE FONCTION I)'U'I[
ARlA8LE RlIELLE
I
VoJum
U IFonll Al'PKOX1MATIO c. o .....
......
"n'
.
.
VALt..tZ POVUfN
"', _ ..' ..
--~
I
lItto'
1',....t.ot.,,)" ALEX)
• OItOII-4 "K\
p.~nIS
,
G~l,'JlUl!.ll·\Il,.. ~'~"
uc.. n
~
q.
'
C;",
llllrttlL
'"' ~ _ _ ,nU_IK_
ool.'...-........
tUr:OYIlI
l)
K
NCAR
rUlll.l
HI~C
LO
."Ifr tGlft
Figure 9.9. Frontispieces of L 'approximation des fonctions by Charles de la ValleePoussin (1866-1962) and of J. P. Natanson Constructive Function Theory, New York, 1964.
Consider the functions
Dn(t)
:=
C
n COS
2n
G),
t
E
[-n, n]
where, see 2.66 of [GM2],
1
:= - - - - - - ; - - , - -
Cn
J':TC cos2n (~) dt
1
(2n)!! 2n (2n - I)!!'
As proved below, we have the following.
9.77 Lemma. The sequence $\{D_n\}$ is an approximation of the $\delta$.

Hence, as a consequence of Theorem 9.74, we can state the following.

9.78 Theorem (de la Vallée Poussin). Let $f \in C^0([-\pi,\pi])$. The functions
$$T_n(x) := \frac{(2n)!!}{(2n-1)!!}\,\frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,\cos^{2n}\Bigl(\frac{t-x}{2}\Bigr)\,dt \tag{9.18}$$
converge uniformly to $f$ in every interval $[a,b]$ with $-\pi < a < b < \pi$.

Proof of Lemma 9.77. (i) Since $\cos t$ is decreasing in $[0,\pi/2]$, we have
$$\int_{\rho}^{\pi/2} \cos^{2n}(t)\,dt \le \Bigl(\frac{\pi}{2} - \rho\Bigr)\cos^{2n}(\rho) \le \frac{\pi}{2}\,\cos^{2n}(\rho);$$
on the other hand, since $\cos t$ is concave in $[0,\pi/2]$, we have $\cos t \ge 1 - 2t/\pi$, hence
$$\int_0^{\pi/2} \cos^{2n}(t)\,dt \ge \int_0^{\pi/2} \Bigl(1 - \frac{2t}{\pi}\Bigr)^{2n} dt = \frac{\pi}{2(2n+1)};$$
we therefore conclude that, as $n \to \infty$,
$$\frac{\int_{\rho}^{\pi/2} \cos^{2n}(t)\,dt}{\int_{0}^{\pi/2} \cos^{2n}(t)\,dt} \le (2n+1)\,\cos^{2n}(\rho) \to 0, \tag{9.19}$$
and the claim follows. $\square$

The functions $T_n(x)$ in (9.18) are often called de la Vallée Poussin integrals.

9.79 Remark. Let $f \in C^0(\mathbb{R})$ be a periodic function with period $2\pi$. Applying Theorem 9.78 to $g(x) := f(3x)$, $x \in [-\pi,\pi]$, we deduce the uniform convergence of $\{T_n(x)\}$ to $g(x)$ in $[-\pi/3,\pi/3]$, i.e., the uniform convergence of $\{T_n(x/3)\}$ to $f(x)$ in $[-\pi,\pi]$. Since the $T_n$'s are trigonometric polynomials of degree at most $2n$, we may deduce at once the second Weierstrass theorem from Theorem 9.78.
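The de la Vallée Poussin integrals (9.18) are straightforward to evaluate numerically. The following sketch is an added illustration (the test function, the grid and the values of $n$ are arbitrary choices, and the integral is replaced by a trapezoidal sum).

```python
import numpy as np
from math import pi

def dfact_ratio(n):
    """(2n)!! / (2n-1)!! computed as a floating-point product."""
    r = 1.0
    for k in range(1, n + 1):
        r *= 2.0 * k / (2.0 * k - 1.0)
    return r

def vallee_poussin(f, n, x, grid_size=4096):
    """T_n(x) as in (9.18), computed with the trapezoidal rule."""
    t = np.linspace(-pi, pi, grid_size)
    c_n = dfact_ratio(n) / (2.0 * pi)
    kernel = np.cos((t[None, :] - x[:, None]) / 2.0) ** (2 * n)
    return c_n * np.trapz(f(t)[None, :] * kernel, t, axis=1)

if __name__ == "__main__":
    f = lambda t: np.abs(np.sin(t))      # continuous and 2*pi-periodic
    x = np.linspace(-2.5, 2.5, 101)      # an interval [a, b] with -pi < a < b < pi
    for n in (4, 16, 64, 256):
        print(n, np.max(np.abs(vallee_poussin(f, n, x) - f(x))))
```

Since each $T_n$ is a trigonometric polynomial of degree at most $2n$, the decreasing errors printed above give a concrete instance of the second Weierstrass theorem discussed in Remark 9.79.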
9.3.3 The Stone-Weierstrass theorem

Weierstrass's theorems can be generalized to and seen as consequences of the following theorem, proved in 1937 by Marshall Stone (1903-1989) and known as the Stone-Weierstrass theorem. Let $X$ be a compact metric space and let $C(X) = C^0(X,\mathbb{R})$ be the Banach space of continuous functions with the uniform norm. An algebra of functions $A$ is a real (complex) linear space of functions $f : X \to \mathbb{R}$ (respectively, $f : X \to \mathbb{C}$) such that $fg \in A$ whenever $f$ and $g \in A$. We say that $A$ distinguishes between the points of $X$ if for any two distinct points $x$ and $y$ in $X$ there is a function $f$ in $A$ such that $f(x) \ne f(y)$. We say that $A$ contains the constants if the constant functions belong to $A$.
9.80 Theorem (Stone-Weierstrass). Let $X$ be a compact metric space and let $A$ be an algebra of continuous real-valued functions, $A \subset C^0(X,\mathbb{R})$. If $A$ contains the constants and if it also distinguishes between the points of $X$, then $A$ is dense in the Banach space $C^0(X,\mathbb{R})$.

Let $A$ be an algebra of bounded and continuous functions. As we have seen, the function $y \mapsto |y|$ can be approximated uniformly in $[-1,1]$ by polynomials. Consequently, if $f \in A$, by considering instead of $f$ the function $h := f/\|f\|_\infty$, we can approximate $x \mapsto |h(x)|$ uniformly by the functions $P_n(h(x))$, where $\{P_n\}$ is a sequence of polynomials. Since the $P_n(h(x))$'s belong to $A$, as $A$ is an algebra of functions and $f \in A$, we conclude that $|f|$ belongs to the uniform closure of $A$, and also
$$\max(f,g) := \frac{1}{2}\bigl(f + g + |f-g|\bigr), \qquad \min(f,g) := \frac{1}{2}\bigl(f + g - |f-g|\bigr)$$
are in the uniform closure of $A$, if both $f$ and $g$ are in the uniform closure of $A$. A linear space of functions $\mathcal{R}$ with the property that $\max(f,g)$ and $\min(f,g)$ are in $\mathcal{R}$ if $f$ and $g \in \mathcal{R}$ is called a linear lattice: the above can then be restated as the closure of $A$ is a linear lattice. To prove that $A$ is dense in $C^0(X,\mathbb{R})$, it therefore suffices to prove the following.

9.81 Theorem. Let $X$ be a compact metric space. A linear lattice $\mathcal{R} \subset C^0(X,\mathbb{R})$ is dense, provided it contains the constants and distinguishes between the points in $X$.

Proof. First we show that, for any $f \in C^0(X,\mathbb{R})$ and any couple of distinct points $x, y \in X$, we can find a function $\psi_{x,y} \in \mathcal{R}$ such that
$$\psi_{x,y}(x) = f(x), \qquad \psi_{x,y}(y) = f(y).$$
In fact, by hypothesis we can choose $w \in \mathcal{R}$ such that $w(x) \ne w(y)$; then the function
$$\psi_{x,y}(t) := \frac{\bigl(f(x) - f(y)\bigr)\,w(t) - \bigl(f(x)w(y) - f(y)w(x)\bigr)}{w(x) - w(y)}$$
has the required property.

Given $f \in C^0(X,\mathbb{R})$, $\epsilon > 0$ and $y \in X$, for every $x \in X$ we find a ball $B(x,r_x)$ such that $\psi_{x,y}(t) > f(t) - \epsilon$ $\forall t \in B(x,r_x)$. Since $X$ is compact, we can cover it by a finite number of these balls $\{B_{x_i}\}$ and we set $\varphi_y := \max_i \psi_{x_i,y}$. Then $\varphi_y(y) = f(y)$ and $\varphi_y \in \mathcal{R}$ since $\mathcal{R}$ is a lattice. We now let $y$ vary, and for any $y$ we find $B(y,r_y)$ such that $\varphi_y(t) \le f(t) + \epsilon$ $\forall t \in B(y,r_y)$. Again covering $X$ by a finite number of these balls $\{B(y_i,r_{y_i})\}$, and setting $\varphi := \min_i \varphi_{y_i}$, we conclude $\varphi \in \mathcal{R}$ and $|\varphi(t) - f(t)| < \epsilon$ $\forall t \in X$, i.e., the claim. $\square$
Of course real polynomials on $[a,b]$ form an algebra of continuous functions that contains the constants and distinguishes between the points of $[a,b]$. Thus the Stone-Weierstrass theorem implies the first Weierstrass theorem, and even more, we have the following.

9.82 Corollary. Every real-valued continuous function on a compact set $K \subset \mathbb{R}^n$ is the uniform limit in $K$ of a sequence of polynomials in $n$ variables.
Theorem 9.80 does not extend to algebras of complex-valued functions. In fact, in the theory of functions of complex variables one shows that the uniform limits of polynomials are actually analytic functions, and the map $z \mapsto \overline{z}$, which is continuous, is not analytic. However, we have the following.

9.83 Theorem. Let $A \subset C^0(X,\mathbb{C})$ be an algebra of continuous complex-valued functions defined on a compact metric space $X$. Suppose that $A$ distinguishes between the points in $X$, contains all constant functions and contains the conjugate $\overline{f}$ of $f$ if $f \in A$. Then $A$ is dense in $C^0(X,\mathbb{C})$.
Figure 9.10. René-Louis Baire (1874-1932) and the frontispiece of his Leçons sur les fonctions discontinues.
Proof. Denote by $A_0$ the subalgebra of $A$ of real-valued functions. Of course $\Re f = \frac{1}{2}(f + \overline{f})$ and $\Im f = \frac{1}{2i}(f - \overline{f})$ belong to $A_0$ if $f \in A$. Since $f(x) \ne f(y)$ implies that $\Re f(x) \ne \Re f(y)$ or $\Im f(x) \ne \Im f(y)$, $A_0$ also distinguishes between the points of $X$ and, trivially, contains the real constants. It follows that $A_0$ is dense in $C^0(X,\mathbb{R})$ and consequently, $A$ is dense in $C^0(X,\mathbb{C})$. $\square$
The real-valued trigonometric polynomials
$$a_0 + \sum_{k=1}^{n} \bigl(a_k \cos kx + b_k \sin kx\bigr) \tag{9.20}$$
form an algebra that distinguishes between the points of $[0,2\pi[$ and contains the constants. Thus, trigonometric polynomials are dense among continuous real-valued periodic functions of period $2\pi$. More generally, from Theorem 9.83 we infer the following.
9.84 Theorem. All continuous complex-valued functions defined on the unit sphere $\{z \in \mathbb{C} \mid |z| = 1\}$ are uniform limits of complex-valued trigonometric polynomials.
9.3.4 The Yosida regularization

a. Baire's approximation theorem

The next theorem relates semicontinuous functions to continuous functions.
9.85 Theorem (Baire). Let $X$ be a metric space and let $f : X \to \mathbb{R}$ be a function that is bounded from above and upper semicontinuous. Then there is a decreasing sequence of continuous, actually Lipschitz-continuous, functions $\{f_n\}$ such that $f_n(x) \to f(x)$ for all $x \in X$.

Proof. Consider the so-called Yosida regularization of $f$,
$$f_n(x) := \sup_{y \in X}\,\{f(y) - n\,d(y,x)\}.$$
Obviously $f(x) \le f_n(x) \le \sup f$ and $f_n(x) \ge f_{n+1}(x)$. We shall now show that each $f_n$ is Lipschitz continuous with Lipschitz constant not greater than $n$. Let $x, y \in X$ and assume that $f_n(x) \ge f_n(y)$. For all $\eta > 0$ there is $x' \in X$ such that
$$f_n(x) < f(x') - n\,d(x,x') + \eta,$$
hence
$$0 \le f_n(x) - f_n(y) \le f(x') - n\,d(x,x') + \eta - \bigl(f(x') - n\,d(y,x')\bigr) = n\bigl(d(y,x') - d(x,x')\bigr) + \eta \le n\,d(x,y) + \eta,$$
thus
$$|f_n(x) - f_n(y)| \le n\,d(x,y),$$
since $\eta$ is arbitrary.

Let us show that $f_n(x_0) \downarrow f(x_0)$. Denote by $M$ the supremum $\sup_{x \in X} f(x)$. Since $f(x_0) \ge \limsup_{x \to x_0} f(x)$, for any $\lambda > f(x_0)$ there is a spherical neighborhood $B(x_0,\delta)$ of $x_0$ such that $f(x) < \lambda$ $\forall x \in B(x_0,\delta)$, hence
$$f(x) - n\,d(x_0,x) \le \begin{cases} \lambda & \text{if } d(x,x_0) < \delta,\\ M - n\delta & \text{if } d(x,x_0) \ge \delta.\end{cases}$$
Then $f(x) - n\,d(x_0,x) \le \lambda$ $\forall x \in X$, provided $n$ is sufficiently large, $n \ge \frac{M - f(x_0)}{\delta}$, hence
$$f(x_0) \le f_n(x_0) = \sup_{x}\bigl(f(x) - n\,d(x_0,x)\bigr) \le \lambda.$$
Since $\lambda > f(x_0)$ is arbitrary, we conclude $f(x_0) = \lim_{n\to\infty} f_n(x_0)$. $\square$
Suppose that $X = \mathbb{R}^n$. An immediate consequence of Dini's theorem, Theorem 9.36, and of Baire's theorem, is the following.

9.86 Theorem. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function that is bounded from above and upper semicontinuous. Then there exists a sequence of Lipschitz-continuous functions that converges uniformly on compact sets to $f$.
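A one-dimensional illustration of the regularization used in the proof of Theorem 9.85: the sketch below is an added numerical experiment, in which the upper semicontinuous step function, the grid, and the replacement of the supremum over $X = \mathbb{R}$ by a maximum over the grid are all ad hoc choices. It shows that the $f_n$ decrease pointwise towards $f$ and have slope at most $n$.

```python
import numpy as np

def yosida_sup(f_vals, grid, n):
    """Discrete Yosida regularization f_n(x) = max_y ( f(y) - n*|y - x| )."""
    dist = np.abs(grid[None, :] - grid[:, None])   # |y - x| for all pairs of grid points
    return np.max(f_vals[None, :] - n * dist, axis=1)

if __name__ == "__main__":
    grid = np.linspace(-2.0, 2.0, 2001)
    f_vals = np.where(grid <= 0.0, 1.0, 0.0)   # upper semicontinuous step, bounded above
    i = np.argmin(np.abs(grid - 0.1))          # inspect the point x0 = 0.1, where f = 0
    for n in (1, 4, 16, 64):
        fn = yosida_sup(f_vals, grid, n)
        slope = np.max(np.abs(np.diff(fn)) / np.diff(grid))
        print(n, fn[i], slope)                 # fn(0.1) decreases to 0, slope stays <= n
```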
b. Approximation in metric spaces

Yosida regularization also turns out to be useful to approximate uniformly continuous functions from a metric (or normed) space into $\mathbb{R}$ by Lipschitz-continuous functions. Let $X$ be a normed space with norm $\|\cdot\|$.

9.87 Proposition. The class of uniformly continuous functions $f : X \to \mathbb{R}$ is closed with respect to uniform convergence.
Recall that the modulus of continuity of $f : X \to \mathbb{R}$ is defined for all $t \in [0,+\infty[$ by
$$\omega_f(t) := \sup\bigl\{|f(x) - f(y)| \bigm| x, y \in X,\ \|x - y\| \le t\bigr\}. \tag{9.21}$$
Clearly $f$ is uniformly continuous on $X$ if and only if $\omega_f(t) \to 0$ as $t \to 0$.

9.88 ¶. Prove Proposition 9.87.
Lipschitz-continuous functions from $X$ to $\mathbb{R}$ are of course uniformly continuous, therefore uniform limits of Lipschitz-continuous functions are uniformly continuous too, on account of Proposition 9.87. We shall now prove the converse, compare Example 9.61.

9.89 Theorem. Every uniformly continuous function $f : X \to \mathbb{R}$ is the uniform limit of a sequence of Lipschitz-continuous functions.

In order to prove Theorem 9.89, we introduce the function $\delta_f(s)$ that measures the uniform distance of $f$ from the class $\mathrm{Lip}_s(X)$ of Lipschitz-continuous functions $g : X \to \mathbb{R}$ with Lipschitz constant not greater than $s$,
$$\delta_f(s) := \inf\bigl\{\|f - g\|_\infty \bigm| g \in \mathrm{Lip}_s(X)\bigr\}.$$

9.90 ¶. Show that $s \mapsto \delta_f(s)$ is nonincreasing and that $f$ is the uniform limit of a sequence of Lipschitz-continuous functions if and only if $\delta_f(s) \to 0$ as $s \to \infty$.
Then we introduce the Yosida regularization of $f : X \to \mathbb{R}$ by
$$f_s(x) := \inf_{y \in X}\,\{f(y) + s\,\|x - y\|\}.$$

9.91 ¶. Show that
(i) $f_s$ is Lipschitz-continuous with Lipschitz constant $s$,
(ii) $f_s(x) \le f_t(x)$ $\forall x$ if $s < t$,
(iii) actually, $f_s$ is the largest $s$-Lipschitz-continuous function among the functions less than or equal to $f$.
9.92 Proposition. Let $f : X \to \mathbb{R}$ be a uniformly continuous function. Then
$$\delta_f(s) = \frac{1}{2}\sup_{t \ge 0}\,\{\omega_f(t) - st\}. \tag{9.22}$$
Moreover, the minimum distance of $f$ from $\mathrm{Lip}_s(X)$ is attained at $g_s(x) := f_s(x) + \delta_f(s)$, i.e., $\|f - g_s\|_\infty = \delta_f(s)$.

Proof. Let $g \in \mathrm{Lip}_s(X)$. Then
$$|f(x) - f(y)| \le 2\,\|f - g\|_\infty + s\,\|x - y\|.$$
For $x, y$ such that $\|x - y\| \le t$, by taking the infimum with respect to $g$, we infer
$$|f(x) - f(y)| \le 2\,\delta_f(s) + st,$$
and, taking the supremum in $x$ and $y$ with $\|x - y\| \le t$, we get
$$\omega_f(t) \le 2\,\delta_f(s) + st, \tag{9.23}$$
hence
$$\frac{1}{2}\sup_{t \ge 0}\,\{\omega_f(t) - st\} \le \delta_f(s).$$
Let us prove that the inequality (9.23) is actually an equality, together with the second part of the claim. For $x, y \in X$ we have
$$f(x) - f(y) \le \omega_f(\|x-y\|) = \bigl(\omega_f(\|x-y\|) - s\,\|x-y\|\bigr) + s\,\|x-y\| \le \sup_{t\ge 0}\,\{\omega_f(t) - st\} + s\,\|x-y\|.$$
By taking the supremum in $y$ we get
$$f(x) - f_s(x) \le \sup_{t\ge 0}\,\{\omega_f(t) - st\} \qquad \forall x \in X,$$
hence, recalling that $f_s \le f$ and using (9.23), we infer
$$\|f - f_s\|_\infty \le \sup_{t\ge 0}\,\{\omega_f(t) - st\} \le 2\,\delta_f(s). \tag{9.24}$$
Therefore, for $g_s(x) := f_s(x) + \delta_f(s)$, (9.24) gives
$$-\delta_f(s) \le f(x) - g_s(x) \le \delta_f(s) \qquad \forall x \in X,$$
and, since $g_s \in \mathrm{Lip}_s(X)$, we conclude $\|f - g_s\|_\infty = \delta_f(s)$. Moreover, by (9.24) the $s$-Lipschitz function $f_s + \frac{1}{2}\sup_{t\ge 0}\{\omega_f(t) - st\}$ has uniform distance from $f$ not greater than $\frac{1}{2}\sup_{t\ge 0}\{\omega_f(t) - st\}$, i.e.,
$$\delta_f(s) \le \frac{1}{2}\sup_{t\ge 0}\,\{\omega_f(t) - st\},$$
which together with (9.23) yields (9.22). $\square$

9.93 ¶. Show that if $f^s := -(-f)_s$, then $f^s \in \mathrm{Lip}_s(X)$ and $g^s(x) := f^s(x) - \delta_f(s)$ also realizes the minimum distance of $f$ from $\mathrm{Lip}_s(X)$.
Proof of Theorem 9.89. It is enough to prove that $\inf_{s>0} \delta_f(s) = 0$. First, we notice that $\omega_f(t)$ is nondecreasing and subadditive. In fact, if $x, y$ are such that $\|x - y\| \le a + b$ and we write
$$z := \frac{b}{a+b}\,x + \frac{a}{a+b}\,y,$$
then $\|z - x\| \le a$ and $\|z - y\| \le b$; consequently,
$$|f(x) - f(z)| \le \omega_f(a) \qquad \text{and} \qquad |f(y) - f(z)| \le \omega_f(b),$$
that yield at once $\omega_f(a+b) \le \omega_f(a) + \omega_f(b)$.

Next, we observe that for any $\epsilon > 0$ and $t = m\epsilon + \sigma$, $m \in \mathbb{N}$, $\sigma < \epsilon$, we have $m \le t/\epsilon$ and
$$\omega_f(t) \le m\,\omega_f(\epsilon) + \omega_f(\sigma) \le \frac{\omega_f(\epsilon)}{\epsilon}\,t + \omega_f(\epsilon).$$
Therefore
$$\delta_f\Bigl(\frac{\omega_f(\epsilon)}{\epsilon}\Bigr) = \frac{1}{2}\sup_{t\ge 0}\Bigl\{\omega_f(t) - \frac{\omega_f(\epsilon)}{\epsilon}\,t\Bigr\} \le \frac{1}{2}\sup_{t\ge 0}\Bigl\{\frac{\omega_f(\epsilon)}{\epsilon}\,t + \omega_f(\epsilon) - \frac{\omega_f(\epsilon)}{\epsilon}\,t\Bigr\} = \frac{1}{2}\,\omega_f(\epsilon).$$
From the last inequality we easily infer $\inf_{s>0} \delta_f(s) = 0$. $\square$
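The quantities in Proposition 9.92 and in the proof above can be observed numerically in a simple one-dimensional case. The sketch below is an added illustration: the choice $f(x) = \sqrt{|x|}$, the grid, and the replacement of infima and suprema by minima and maxima over the grid are ad hoc. For this $f$ one has $\omega_f(t) = \sqrt{t}$ and $\sup_t(\omega_f(t) - st) = 1/(4s)$, which the computed distance $\|f - f_s\|_\infty$ should approach, in the spirit of (9.24).

```python
import numpy as np

def inf_convolution(f_vals, grid, s):
    """Yosida regularization f_s(x) = min_y ( f(y) + s*|x - y| ) on a grid."""
    dist = np.abs(grid[None, :] - grid[:, None])
    return np.min(f_vals[None, :] + s * dist, axis=1)

if __name__ == "__main__":
    grid = np.linspace(-4.0, 4.0, 2001)
    f = np.sqrt(np.abs(grid))                 # uniformly continuous but not Lipschitz
    for s in (1.0, 2.0, 4.0, 8.0):
        fs = inf_convolution(f, grid, s)
        dist = np.max(f - fs)                 # = ||f - f_s||_inf, since f_s <= f
        print(s, dist, 1.0 / (4.0 * s))       # compare with sup_t(omega_f(t) - s t)
```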
9.4 Linear Operators

9.4.1 Basic facts

In finite-dimensional spaces, linear maps are continuous, but this is no longer true in infinite-dimensional normed spaces, see Example 9.96.

9.94 ¶. Show the following.

Proposition. Let $X$ and $Y$ be normed spaces. Suppose that $X$ is finite dimensional. Then every linear map $L : X \to Y$ is continuous.
The following proposition characterizes the continuous linear maps between normed spaces.

9.95 Proposition. Let $X$ and $Y$ be normed spaces and let $L : X \to Y$ be a linear map. Then the following conditions are equivalent:
(i) $L$ is continuous in $X$,
(ii) $L$ is continuous at $0$,
(iii) $L$ is bounded on the unit ball, i.e., there exists $K \ge 0$ such that $\|L(x)\|_Y \le K$ $\forall x$ with $\|x\|_X \le 1$,
(iv) there exists a constant $K$ such that $\|L(x)\|_Y \le K\,\|x\|_X$ $\forall x \in X$,
(v) $L$ is Lipschitz continuous.
Proof. If $L$ is continuous, then trivially (ii) holds. If (ii) holds, then there exists $\delta > 0$ such that $\|L(x)\|_Y \le 1$ provided $\|x\|_X < \delta$. Since $L$ is linear, this yields $\|L(x)\|_Y \le 1/\delta$ if $\|x\|_X \le 1$, i.e., (iii). Assuming (iii) and the linearity of $L$, we infer that for all $x \in X$, $x \ne 0$,
$$\frac{\|L(x)\|_Y}{\|x\|_X} = \Bigl\|L\Bigl(\frac{x}{\|x\|_X}\Bigr)\Bigr\|_Y \le K,$$
i.e., (iv). (iv) in turn implies (v) since
$$\|L(x) - L(y)\|_Y = \|L(x-y)\|_Y \le K\,\|x - y\|_X \qquad \forall x, y \in X,$$
and trivially, (v) implies (i). $\square$
9.96 Example. Let $X$ be a normed space, let $\{e_n\} \subset X$ be a countable system of independent vectors with $\|e_n\| = 1$, and let $Y \subset X$ be the subspace of finite linear combinations of $\{e_n\}$. Consider the operator $L : Y \to \mathbb{R}$ defined on $\{e_n\}$ by $L(e_n) := n$ $\forall n$ and extended linearly to $Y$. Evidently $L$ is linear and not bounded.
Linear maps between Banach spaces are often called linear operators.
a. Continuous linear forms and hyperplanes

Consider a linear map $L : X \to \mathbb{K}$ defined on a linear normed space $X$, often also called a linear form. If $L$ is not identically zero, we can find $\overline{x}$ such that $\overline{x} \notin \ker L$ and we can decompose every $x \in X$ as
$$x = \frac{L(x)}{L(\overline{x})}\,\overline{x} + \Bigl(x - \frac{L(x)}{L(\overline{x})}\,\overline{x}\Bigr);$$
in other words
$$X = \mathrm{Span}\,\{\overline{x}\} \oplus \ker L.$$
However it may happen that $\ker L$ is dense in $X$.
9.97 Proposition. Let $L : X \to \mathbb{R}$ be a linear map defined on a normed space $X$. Then $\ker L$ is closed if and only if $L$ is continuous.

Proof. Trivially, $\ker L := L^{-1}(0)$ is closed if $L$ is continuous. Conversely, suppose $\ker L$ is closed. If $\ker L = X$, then $L \equiv 0$, hence continuous. Otherwise we can choose $\overline{x}$ such that $L(\overline{x}) = 1$. Since $\ker L$ is closed, also $H := \overline{x} + \ker L$ is closed; since $0 \notin H$, we can then find a ball $B(0,r)$ such that $B(0,r) \cap H = \emptyset$. We now prove that $L$ is continuous by showing that $|L(x)| < 1$ $\forall x \in B(0,r)$. In fact, if $|L(x)| \ge 1$ for some $x \in B(0,r)$, then
$$\Bigl\|\frac{x}{L(x)}\Bigr\| = \frac{1}{|L(x)|}\,\|x\| < r, \qquad \text{while} \qquad L\Bigl(\frac{x}{L(x)}\Bigr) = 1.$$
Since $H = \{x \mid L(x) = 1\}$, we conclude that $x/L(x) \in H \cap B(0,r)$, a contradiction. $\square$
9.98 Corollary. If $L : X \to \mathbb{R}$ is a linear map on a normed space $X$, then $\ker L$ is either closed or dense in $X$. In fact, the closure of $\ker L$ is a linear subspace that may agree either with $\ker L$ or with $X$.
b. The space of linear continuous maps

For any linear map $L : X \to Y$ between two normed spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$, we define
$$\|L\|_{\mathcal{L}(X,Y)} := \sup_{\|x\|_X \le 1} \|L(x)\|_Y, \tag{9.25}$$
i.e.,
$$\|L\|_{\mathcal{L}(X,Y)} = \|L\|_{\infty,B}, \qquad B = \{x \in X \mid \|x\|_X = 1\},$$
or, equivalently,
$$\|L\|_{\mathcal{L}(X,Y)} = \inf\bigl\{K \in \mathbb{R} \bigm| \|L(x)\|_Y \le K\,\|x\|_X \ \forall x \in X\bigr\},$$
so that
$$\|L(x)\|_Y \le \|L\|_{\mathcal{L}(X,Y)}\,\|x\|_X \qquad \forall x \in X.$$
One can shorten this to
$$\|Lx\| \le \|L\|\,\|x\|$$
once the norms used to evaluate $x$, $Lx$ and $L$ are understood. From Proposition 9.95 it follows that $L : X \to Y$ is continuous if and only if $\|L\|_{\mathcal{L}(X,Y)} < +\infty$. For this reason, linear continuous maps from $X$ to $Y$ are also called bounded operators. It is now easily seen that $\|L\|_{\mathcal{L}(X,Y)}$ is a norm on the linear space $\mathcal{L}(X,Y)$ of linear continuous maps from $X$ into $Y$.
9.99 Theorem. Let $X$ be a normed space and let $Y$ be a Banach space. Then $\mathcal{L}(X,Y)$ is a Banach space.

Proof. Let $\{L_n\}$ be a Cauchy sequence in $\mathcal{L}(X,Y)$. For any $\epsilon > 0$ there is $n_0(\epsilon)$ such that $\|L_n - L_m\| < \epsilon$ for $n, m \ge n_0$. In particular, $\|L_n(x) - L_m(x)\|_Y \le \epsilon\,\|x\|_X$ for all $x \in X$, i.e., for every $x \in X$ the sequence $\{L_n(x)\}$ is a Cauchy sequence in $Y$, hence it converges to an element $L(x) \in Y$, $L_n(x) \to L(x)$ as $n \to \infty$. Letting $n$ tend to infinity in
$$L_n(x+y) = L_n(x) + L_n(y),$$
we see that $L$ is linear. Letting $m \to \infty$ in $\|L_n(x) - L_m(x)\|_Y \le \epsilon$, valid for $\|x\|_X \le 1$ and $n \ge n_0(\epsilon)$, we also find $\|L_n(x) - L(x)\|_Y \le \epsilon$ for $\|x\|_X \le 1$ and $n \ge n_0(\epsilon)$. This implies $\|L\| \le \|L_n\| + \epsilon$ and $\|L_n - L\| \le \epsilon$ for $n \ge n_0$, which in turn yields $L_n \to L$ in $\mathcal{L}(X,Y)$. $\square$
c. Norms on matrices

For any $n$, let $\mathbb{K} = \mathbb{R}$ ($\mathbb{K} = \mathbb{C}$) and consider $\mathbb{R}^n$ (respectively $\mathbb{C}^n$) as a Euclidean (Hermitian) space endowed with the standard Euclidean (Hermitian) product, and let $|\ |$ be the associated norm. Let $L : \mathbb{K}^n \to \mathbb{K}^m$ be linear, let $L \in M_{m,n}(\mathbb{K})$ be the associated matrix, $L(x) =: Lx$, and let $\mu_1, \mu_2, \dots, \mu_n$ be the singular values of $L$, that is, the eigenvalues of the matrix $(L^*L)^{1/2}$, ordered in increasing order. Then
$$\|L\|^2 = \sup_{|x|=1} |L(x)|^2 = \sup_{|x|=1} \bigl(L^*L(x)\,|\,x\bigr) = \mu_n^2.$$
Now define the $\ell^2$-norm of $L \in \mathcal{L}(\mathbb{K}^n,\mathbb{K}^m)$ by
$$\|L\|_2 := \Bigl(\sum_{i,j} |L^i_j|^2\Bigr)^{1/2}.$$
Of course, $\|\ \|_2$ and $\|\ \|$ are equivalent norms in $\mathcal{L}(\mathbb{K}^n,\mathbb{K}^m)$, since $\mathcal{L}(\mathbb{K}^n,\mathbb{K}^m)$ is finite dimensional. More precisely we compute
$$\|L\|_2^2 = \operatorname{tr}(L^*L) = \sum_{j=1}^{n} \mu_j^2,$$
and therefore we have the following.

9.100 Proposition. Let $L \in M_{m,n}(\mathbb{K})$. Then $\|L\|$ is the maximum of the singular values of $L$, and $\|L\| \le \|L\|_2 \le \sqrt{n}\,\|L\|$. Moreover, $\|L\| = \|L\|_2$ if and only if $\operatorname{Rank} L = 1$.

Proof. Let $\mu_1, \mu_2, \dots, \mu_n$ be the singular values of $L$ ordered in nondecreasing order. By the above, $\|L\| = \mu_n \le \|L\|_2 \le \sqrt{n}\,\|L\|$, while the equality $\|L\| = \|L\|_2$ is equivalent to $\mu_1 = \dots = \mu_{n-1} = 0$, and this happens if and only if $\operatorname{Rank} L = 1$. $\square$
9.101 ¶. Let $T : \ell^2 \to \mathbb{R}$ be a bounded operator and for $i = 1, 2, \dots$ let $e_i := \{\delta_{in}\}_n$. Then
$$\|T\|^2 = \sum_{i=1}^{\infty} |T(e_i)|^2.$$
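Proposition 9.100 is easy to check with a few lines of numerical linear algebra. The sketch below is an added illustration with randomly generated matrices of arbitrary sizes: the operator norm $\|L\|$ is the largest singular value and $\|L\|_2$ is the Frobenius norm, and the two coincide for a rank-one matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def check(L):
    op_norm = np.linalg.norm(L, 2)       # largest singular value, i.e., ||L||
    frob = np.linalg.norm(L, "fro")      # ||L||_2 in the notation of the text
    n = L.shape[1]
    print(f"||L|| = {op_norm:.4f}, ||L||_2 = {frob:.4f}, sqrt(n)*||L|| = {np.sqrt(n)*op_norm:.4f}")

check(rng.standard_normal((4, 3)))       # generic matrix: ||L|| < ||L||_2 < sqrt(n)*||L||
u = rng.standard_normal((4, 1))
v = rng.standard_normal((1, 3))
check(u @ v)                             # rank-one matrix: ||L|| equals ||L||_2
```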
d. Pointwise and uniform convergence for operators In .c(X, Y) we may define two notions of convergence. 9.102 Definition. Let {L n } C .c(X, Y).
(i) We say that {L n } converges pointwise to L ifVx E X we have IILn(x) - L(x)lly ~ 0,
(ii) we say that {L n } converges to L in norm or uniformly, if IIL n - LILc(x,Y)
=
sup
IILn(x) - L(x)lly ~ 0
as n
~
00.
[[x[[x:S 1
Trivially, L n ~ L pointwise if L n ~ L uniformly. But the converse is in general false and holds true if X is finite dimensional. 9.103 Example. Recall that a sequence {x n } is in f2(IR) if and only if2::~l x% < +00. For any n E N let e(n) := {oknh. Of course, [[e(n)[[2 = 1 "In. Consider the sequence of linear forms {L n } on f2(IR) defined by Ln({xd) = xn · For any x E f2(IR) we have 2::%"=1 x% < +00, hence Ln(x) = Xn ---+ 0 as n ---+ 00, i.e., L n -> 0 pointwise. On the other hand,
IILn and {L n
}
0[I.C( f 2,1R) = IILn II.C (f2,1R) =
does not converge uniformly to O.
sup ILn(x)l::::: Ln(e(n») l!x!12:'Ol
=1
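Example 9.103 can be made concrete by truncating $\ell^2$ to finitely many coordinates. The sketch below is only an added illustration; the truncation dimension and the particular element $x_k = 1/k$ are arbitrary choices. The value $L_n(x) = x_n$ tends to $0$, while $L_n(e^{(n)}) = 1$ for every $n$, so the norms $\|L_n\|_{\mathcal{L}(\ell^2,\mathbb{R})}$ do not tend to $0$.

```python
import numpy as np

N = 10_000                                # finite truncation as a stand-in for l^2
x = 1.0 / np.arange(1, N + 1)             # square-summable: sum of 1/k^2 is finite

for n in (10, 100, 1000, 10_000):
    e_n = np.zeros(N)
    e_n[n - 1] = 1.0                      # the unit vector e^(n)
    # L_n(y) = y_n: pointwise L_n(x) -> 0, but L_n(e^(n)) = 1 for every n,
    # hence ||L_n - 0|| >= 1 and there is no convergence in the operator norm.
    print(n, x[n - 1], e_n[n - 1])
```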
e. The algebra End (X) Let X, Y and Z be linear normed spaces and let f : X -+ Y and g : Y -+ Z be linear continuous operators. The composition g 0 f : X -+ Z is again a linear continuous operator from X to Z, and for every x E X we have
II(g
0
J)(x)llz ::;
Ilgllllf(x)lly ::; Ilgllllfllllxllx,
hence
Ilg fll ::; Ilgllllfll·
(9.26)
0
9.104 Example. In general Ilg 0 /11 < Ilgllll/ll. For instance, if X =]R2 and / and g are the orthogonal projections on the axes, we have II/II = Ilgll = 1 and fog = go/ = 0, hence Ilg 0 /11 = 11/ 0 gil = O. 9.105 Example. Let T:]R2 ->]R2 be defined by T(x) = Tx where T:=
o<
E
«
1. Then T- 1 (x) = T- 1 x where
IITII = 1, liT-III = liE and IITIIIIT- 1 11»
T- (1o 0). 1
=
G~),
We then compute
liE 1 = IIIdl1 = IIT- 1 oTII·
Let X be a Banach space with norm II Ilx, denote the Banach space .c(X,X) by End (X) and the norm on End (X) by 1111. The product of composition defines in End (X) a structure of algebra in which the product satisfies the inequality (9.26): this is expressed by saying that End (X) is a Banach algebra. Clearly, if L E End (X) and Ln = L 0 L 0 ••• 0 L, then by (9.26) we have (9.27) Again, in general, we may have a strict inequality. 9.106 Proposition. Let X be a Banach space and L E End (X). 1, then Id - L is invertible,
IfllLl1 <
00
( Id - L) -1 =
LL
in End (X)
n
n=O
and II(Id - L)-lll ::; l-~LII' In particular, for any y E X the equation x - Lx
=y
has a unique solution, x
Proof. The series
I:~=o L n
= 'L.':=o Lny
and
Ilxll ::; HILllllyll.
is absolutely convergent, since
00
00
IILnll :S ;
;
hence convergent. In particular,
IILli
8:= I:~=o L n
n
1 = 1 -liLli'
E End (X) and
11811
:S 1-IILII. Finally
n
( Id - L)
LL
k =
Id - L n+ 1 -> Id
in End (X)
k=O
o
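Proposition 9.106 is the operator form of the Neumann series, and for matrices it can be tested directly. The sketch below is an added illustration; the random matrix, the rescaling to $\|L\| = 0.4$ and the tolerance are arbitrary choices. It sums the series $\sum_n L^n$ and compares the result with the inverse of $\mathrm{Id} - L$ and with the bound $1/(1-\|L\|)$.

```python
import numpy as np

def neumann_inverse(L, tol=1e-12, max_terms=10_000):
    """Approximate (I - L)^(-1) by the partial sums of sum_{n>=0} L^n (needs ||L|| < 1)."""
    I = np.eye(L.shape[0])
    term, total = I.copy(), I.copy()
    for _ in range(max_terms):
        term = term @ L
        total += term
        if np.linalg.norm(term, 2) < tol:
            break
    return total

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
L = 0.4 * A / np.linalg.norm(A, 2)          # rescale so that ||L|| = 0.4 < 1
S = neumann_inverse(L)
print(np.linalg.norm(S - np.linalg.inv(np.eye(5) - L), 2))   # close to machine precision
print(np.linalg.norm(S, 2), 1.0 / (1.0 - 0.4))               # ||S|| <= 1/(1 - ||L||)
```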
f. The exponential of an operator Again by (9.27) we get, similarly to Proposition 9.106, the following.
9.107 Proposition. Let X be a Banach space and L E End (X). := ~~=o anz n be a power series with radius of convergence p> O. If IILII < p, then the series ~~=o anLn converges in End (X)
(i) Let f(z)
and defines a linear continuous operator 00
f(L) := I:>nLn E End (X). n=O
if
(ii) The series ~;::o L k converges in End (X) and defines the linear continuous operator 1
L k!L 00
L
e = exp(L):=
k
E End (X).
k=O
9.108'. Show the following.
Proposition. Let X be a Banach space and let A, BE End (X). Then we have
(i) (Id + ~) n (ii)
->
e A in End (X),
IleAIi ~ eIIAII,
(iii) If A and B commute, i.e., AB
=
BA, then
(iv) if A has an inverse, then (eA)-l = e- A , (v) if X is finite-dimensional, X = lR n , we have e PAP det e
A
1
= PeAp- 1 , = e trA ,
if P has an inverse, if A is symmetric.
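For matrices, the series in Proposition 9.107(ii) and the limit in Exercise 9.108(i) can be compared numerically. The sketch below is an added illustration; the random matrix, the truncation order and the value of $n$ are arbitrary choices. It sums the exponential series and checks it against $(\mathrm{Id} + A/n)^n$ and the bound $\|e^A\| \le e^{\|A\|}$.

```python
import numpy as np

def exp_series(A, terms=60):
    """exp(A) = sum_{k>=0} A^k / k!, truncated after a fixed number of terms."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result += term
    return result

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
E = exp_series(A)
n = 10_000
approx = np.linalg.matrix_power(np.eye(4) + A / n, n)       # (Id + A/n)^n tends to e^A
print(np.linalg.norm(E - approx, 2))                        # small for large n
print(np.linalg.norm(E, 2), np.exp(np.linalg.norm(A, 2)))   # ||e^A|| <= e^{||A||}
```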
9.4.2 Fundamental theorems In this subsection, we briefly illustrate four of the most important theorems about the structure of linear continuous operators on normed spaces. The first three, the principle of uniform boundedness, the open mapping theorem and the closed graph theorem are a consequence of Baire's category theorem, see Chapter 5, and are due to Stefan Banach (1892-1945); the fourth one, known as Hahn-Banach theorem, was proved independently by Hans Hahn (1879-1934) in 1926 and by Banach in 1929.
328
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
a. The principle of uniform boundedness The following important theorem is known as the Banach-Steinhaus theorem and also as the principle of uniform boundedness. 9.109 Theorem (Banach-Steinhaus). Let X be a Banach space and Y be a normed linear space. Let {To;} be a family of bounded linear operators from X to Y indexed on an arbitrary set A (possibly nondenumerable). If sup IITaxllY :::; C(x) <
+00
YxEX,
aEA
then there exists a constant C such that sup IITallL:(x,Y) :::; C <
+00.
aEA
Proof. Set
X
n :=
{x E X
I[[Taxlly ::; n
Va E
A}.
X n is closed and by hypothesis UnX n = X. By Baire's category theorem, it follows that there exists no E N, Xo E X and ro > 0 such that B(xo, ro) C X no , that is Va E
A,
hence Vz E B(O, 1), 1 IITa(z)lly ::; -IITa(roz + xo - xo)lly ro 1 no ::; -(no + IITa(xo)lly)::; ro
+ C(xo) ro
=: C.
o The following corollaries are trivial consequences.
9.110 Corollary. Let {Td be a sequence of bounded linear operators from a Banach space X into a normed space Y. Suppose that for each x E X limk---+oo Tkx =: Tx exists in Y. Then the limit operatorT is also a bounded linear operator from X to Y and we have
IITIiL:(x,Y) :::; liminf IITkliL:(x,Y)' k---+oo
9.111 Corollary. Let 1 :::; p < +00. Any linear operator from £p(JR.) into a normed linear space Y is continuous. Proof. In fact, the linear operators {L n Ln(~) :=
}
defined by
L«6, 6, ... , ~n,O,O, ... »
are clearly continuous and Banach-Steinhaus theorem.
Ln(~) --->
if~=(6,6,···)
L(O in £p. Therefore L is continuous by the 0
The following theorem, again due to Banach, is also a consequence of Baire's category theorem.
9.4 Linear Operators
329
9.112 Theorem. Given a sequence of bounded linear operators {Tk } from a Banach space X into a normed linear space Y, the set
{x E X II~~~f IITkxlly < +oo} either coincides with X or is a set of the first category of X. This in turn implies the following.
9.113 Corollary (Principle of condensation of singularities). For p = 1,2, ... , let {Tp,q}, q = 1,2, ... , be a sequence of bounded linear operators from a Banach space into a normed space Yp . Assume that for each p there exists x p E X such that limsuPq~OO IITp,qxpILqx,Yp) = 00. Then the set
{x E X
Ilim sup IITp,qll.qx,Yp) =
+00
for all p
= 1,2,3, ... }
q~oo
is of second category. The above principle gives a general method of finding functions with many singularities. For instance one can find in this way a continuous function x(t) of period 211' such that the partial sum of its Fourier expansion
satisfies the condition lim sup ISnf(t)1 =
00
n~oo
in a set P C [0,211'] which has the power of the continuum. 1
b. The open mapping theorem 9.114 Theorem (Banach's open mapping theorem). Let X and Y be Banach spaces and let T be a surjective bounded linear operator from X into Y. Then T is open, i. e., it maps open sets of X onto open sets of Y. Proof. We divide the proof into two steps. Step 1. First we prove that there is a 8
> 0 such
that
TBx(O, 1) :::> By(O, 28). Set X n := nTBx(O, 1). All X n are closed and, since T is surjective, U;;"=lXn = Y. By Baire's category theorem, see Theorem 5.118, it follows that for some n, X n has a nonvoid interior. By homogeneity, T(Bx(O, 1) has a nonvoid interior, too, i.e., there exists Yo E Y and 8 > 0 such that B y (yo,48) C T(Bx(O,l). By symmetry -Yo E TBx(O, 1), and, as TBx(O, 1) is convex, By(O, 28) C TBx(O, 1).
Step 2. We shall now prove that TB x (O,l):::> B y (0,8), that is the claim. Observe that by Step 1 and homogeneity
TBx(O,r):::> By(O, 28r) 1
Vr
> O.
(9.28)
For proofs we refer the interested reader to e.g., K. Yosida, Functional Analysis, Springer-Verlag, Berlin, 1964.
330
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
We want to prove that the equation Tx = y has a solution x E Bx(O,I) for any y E By(O, 8). Let y E Y be such that Ilylly < 8. By (9.28) there exists Xl E X such that IIx111x < 1/2 and IITx1 - ylly < 8/2. Similarly, considering the equation Tx = y - TX1, one can find X2 E X such that IIx211x < 1/4 and Ily - TX1 - TX211y < 8/4. By induction, we then construct points Xn E X such that Ilxnllx < 2- n and Ity - Lk=l TXk Ily < 8/2 k . Therefore the series LI:=l Xk is absolutely convergent in X with sum less than 1, hence it converges to some X E X with Ilxllx < 1, and Ity - Txlly = 0. 0
9.115 ~. Show the converse of the open mapping theorem: if T : X ---> Y is an open, bounded linear operator between Banach spaces, then T is surjective.
A trivial consequence of Theorem 9.114 is the following.
9.116 Corollary (Banach's continuous inverse theorem). Let X, Y be Banach spaces and let T : X ----+ Y be a surjective and one-to-one bounded linear operator. Then T- 1 is a bounded operator. 9.117 Remark. Let X and Y be Banach spaces and let T : X ----+ Y be a linear continuous operator. Often one says that the equation Tx = y is well posed if for any y E Y it has a unique solution x E X which depends continuously on y. Corollary 9.116 says that the equation Tx = y is wellposed if X and Yare Banach spaces and Tx = y is uniquely solvable 't:IyEY. c. The closed graph theorem Let X, Y be two Banach spaces. Then X x Y endowed with the norm
II(x, y)llxxY
:=
Ilxllx + Ilylly
is also a Banach space.
9.118 Theorem (Banach's closed graph theorem). Let X and Y be Banach spaces and let T : X ----+ Y be a linear operator. Then T : X ----+ Y is bounded if and only if its graph G T :=
{(x, y) E X
I
x Y y=
TX}
is closed in X x Y. Proof. If T is continuous, then trivially Gr is closed. Conversely, Gr is a closed linear subspace of X x Y, hence Gr is a Banach space with the induced norm of X x Y. The linear map 1r : Gr ---> X, 1r((x, Tx)) := x, is a bounded linear operator that is one-to-one and onto; hence, by the Banach continuous inverse theorem, the inverse map of 1r, 1r- 1 : X ---> G r , X --> (x,Tx), is a bounded linear operator, Le., Ilxllx + IITxlly ::; Cllxllx for some constant C. T is therefore bounded. 0
9.4 Linear Operators
331
Figure 9.11. Hans Hahn (1879-1934) and Hugo Steinhaus (1887-1972).
d. The Hahn-Banach theorem The Hahn-Banach theorem is one of the most important results in linear functional analysis. Basically, it allows one to extend to the whole space a bounded linear operator defined on a subspace in a controlled way. In particular, it enables us to show that the dual space, Le., the space of linear bounded forms on X, is rich. 9.119 Theorem (Hahn-Banach, analytical form). Let X be a real normed space and let p : X ----> 1R. be a sublinear functional, that is, satisfying p(x + y)
:s; p(x) + p(y),
VA> 0, Vx, Y E X.
p(AX) = Ap(X)
Let Y be a linear subspace of X and let f : Y ----> 1R. be a linear functional such that f(x) :s; p(x) Vx E Y. Then f can be extended to a linear functional F : X ----> 1R. satisfying F(x)
=
f(x) Vx E Y,
F(x)
:s; p(x)
Vx E X.
Proof. Denote by K the set of all pairs (Y"" g",) where Y", is a linear subspace of X such that Y", =:J Y and g", is a linear functional on Y", satisfying g",(x) = f(x) "Ix E X,
g",(x)
S p(x) "Ix
E Y",.
We define an order in K by (Ya,g",) S (Y/3,g/3) if Y", C Y/3 and g", = g/3 on Y",. Then K becomes a partially ordered set. Every totally ordered subset {(Y""g",)} clearly has an upperbound (Y',g') given by Y' = U/3Y/3, g' = g/3 on Y/3. Hence, by Zorn's lemma, see e.g., Section 3.3 of [GM2], there is a maximal element (Yo,go). If we show that Yo = X, then the proof is complete with F = go. We shall assume that Yo # X and derive a contradiction. Let Y1 rt- Yo and consider
Y1 := Span (Yo U {yI}) = {x = y
+ >'Y1
lYE Yo, >. E JR},
notice that Y E Yo and>. E JR are uniquely determined by x, otherwise we get Y1 E Yo. Define gl : Y1 ---> JR by gI(y + >'Y1) := gO (y) + >. c. If we can choose c in such a way that gl(Y
+ >'Y1) =
go(y)
+ >'C S p(y + >'Y1)
for all >. E JR, Y E Yo, then (Y1,9I) E K and (yo, go) contradicts the maximality of (yo, go).
S (Y1,gl), Y1 # yo. This
332
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
To choose c, we notice that for x, Y E Yo
90(Y) - 90(x)
=
90(Y - x) ::; p(y - x) ::; p(y + yI)
+ p(-Yl -
x).
Hence
-p( -Yl - x) - 90(X) ::; p(y + Yl) - 90(Y)· This implies that
A:= sup {-p( -Yl - x) - 90(X)}::; inf {p(y + Yl) - 90(Y)} xEYo
=:
yEYo
B.
Thus we can choose c such that A ::; c ::; B. Then
+ Yl) - 90(Y) p(-Yl - y) - 90(Y) ::; c
c ::; p(y
Multiplying the first inequality by>., >.
> 0,
V'y E Yo, V'y E Yo.
and the second by>., >. = 0
< 0,
and replacing
Y with Y/>' we conclude that for all >. # 0 and trivially for>. >.c::; p(y
+ Yl)
- 90(Y)'
o 9.120 Theorem (Hahn-Banach). Let X be a normed linear space of JK = ~ or JK = C and let Y be a linear subspace of X. Then for every f E .c(Y, JK) there exists F E .c(X, JK) such that F(x)
=
11F11c(x,oc) = Ilfllc(Y,oc).
f(x) \Ix E Y,
Proof. If X is a real normed space, then the assertion follows from Theorem 9.119 with p(x) = Ilfll.c(Y,lll.)llxllx. To prove that IIF(x)II.c(x,lll.) ::; Ilfll.c(Y,lll.), notice that F(x) = OIF(x)l, 0 := ±l, then
IF(x)1
= OF(x) = F(Ox)::; p(Ox) = Ilfll.ccY,lll.)IIOxlIx = Ilfll.c(Y,lll.)llxllx.
This shows 11F11.c(x,lll.) ::; Ilfll.c(Y,lll.)' The opposite inequality is obvious. Suppose now that X and Yare complex normed spaces. Consider the real-valued map
h(x)
:=
Rf(x),
x E Y.
h is a JR-linear bounded form on Y considered as a real normed space since Ih(x)1 ::; If(x)1 ::;
'Ix E Y,
Ilfll.c(y,q Ilxllx
thus the first part of the proof yields a JR-linear bounded map H : X H(x) = h(x) 'Ix E Y and IH(x)1 ::; Ilfll.c(Y,lll.)llxllx 'Ix E X. Now define
F(x)
:=
JR, such that
'Ix E X,
H(x) - iH(ix)
hence H(x) = RF(x). It is easily seen that F: X remains to show that
-+
-+
iC is a iC-linear map and extends
f. It
IF(x)1 ::;
Ilfll.c(y,qllxllx
For x E X, we can write F(x) = re i /3 with r
IF(x)1
:2: O.
'Ix E X. Hence
= r = R(e- i /3 F(x)) = RF(e- i /3 x ) = H(e- i /3 x ) ::; IIfll.ccY,lll.)Ile- i /3xllx ::; Ilfll.ccy,qllxllx· o
9.4 Linear Operators
333
Simple consequences are the following corollaries.
9.121 Corollary. Let X be a normed space and let x EX. Then there exists F E £(X, lR) such that F(x)
= Ilxllx,
11F11.C(x,IR) =
1.
9.122 Corollary. Let X be a normed space. Then for all x E X
IF E
Ilxllx = sup { F(x)
£(X, lR),
1IFII.qx
I}
9.123 Corollary. Let Y be a linear subspace of a normed linear space X. If Y is not dense in X, then there exists F E £(X, lR) F =1= 0, such that F(y) = 0 't/y E Y. 9.124~.
Prove Corollaries 9.121, 9.122 and 9.123.
We can give a geometric formulation to the Hahn-Banch theorem that is very useful. For the sake of simplicity from now on we shall assume that X is a real normed space, even though the following results hold also for complex normed spaces. A closed affine hyperplane in X is a set of the form
H where F E £(X, lR) and
:=
{x E X I F(x) = Q:}
Q: E R
H_ := {x E X I F(x)
It defines the two half-spaces
s:; Q:},
H+ := {x E X I F(x)
~ Q:}.
We say that H separates the sets A and B if and
9.125 Lemma (Gauge function). Let C c X be an open convex subset of the real normed space X and let 0 E C. Define
p( x) := inf{ Q: > 0 I; Then
(i) p is sublinear, (ii) 3M such that 0
s:; p(x) s:;
(iii) C:={xEXlp(x)
M
Ilxllx,
E
C}.
334
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
Proof. If B(O,r) C X, we clearly have p(x) ::; ~llxllx "Ix E X, that is (ii). Let us prove (iii). Suppose x E C. Since C is open, (1 + E)X E C, if E is small. Hence p(x) ::; l~€ < 1. Conversely, if p(x) < 1, there is a, 0 < a < 1, such that a-1x E C, hence x = a(a- 1x)+(I-a)0 E C. Finally, let us prove (i). Triviallyp(>.x) = >.p(x) for>. > o. For all x, y E X and E > 0 we know that x - - y - EC p(y) + E ' consequently,
(1 - t)y p(Y)+E
tx p(X)+E
---+---E
C
"It E [0,1].
In particular, for t : = -,----::"P...:.(X---,)-:-+--:-E_
p(x)
+ p(y) + 2E
we obtain
x+y E C. p(x) + p(y) + 2E This yields p(x + y) ::; p(x) + p(y) + 2E and the claim follows, since
E is
arbitrary.
0
9.126 Proposition. Let C C X be an open convex subset of the real normed space X and let x E X, x ~ C. Then there exists f E .c(X,IR) such that f(x) < f(x) "Ix E C. In particular, C and x are separated by the closed affine hyperplane {x I f (x) = f (x) }. Proof. By translation we can assume 0 E C and introduce the gauge function p(x) by Lemma 9.125. If Y := Span {x} and 9 : Span {x} -> IR is the linear map g(tx) := t, it is clear that g(x) ::; p(x) "Ix E Span {x}. By Theorem 9.119, there exists a linear extension f of 9 such that f(x) ::; p(x) "Ix E X. In particular, we have f(x) = 1 and f is bounded because of (ii) of Lemma 9.125. On the other hand, f(x) < 1 "Ix E C by 0 (iii) of Lemma 9.125.
9.127 Theorem (Hahn-Banach thereom, geometrical form). Let A and B be two nonempty disjoint convex sets of a real normed space X. Suppose A is open. Then A and B can be separated by a closed affine hyperplane.
I
Proof. Set C := A - B = {x - y x E A, y E B}. Trivially C is convex and open as C := UyEB(A - V); moreover, 0 'i C since An B = 0. By Proposition 9.126 there exists f E L(X, IR) such that f(z) < 0 Vz E C, i.e., f(x) < f(y) "Ix E A Vy E B. If we choose a such that sup f(x) ::; a::; inf f(y), xEA
yEB
the affine hyperplane {f(x) = a} separates A and B.
o
9.5 Some General Principles for Solving Abstract Equations In this final section we establish some fundamental principles concerning the solvability of abstract equations
9.5 Some General Principles for Solving Abstract Equations
335
Au=f where A : X ---. Y is a continuous function also called a continuous nonlinear operator between Banach spaces. These principles are fully appreciated for instance when dealing with the theory of ordinary or partial differential equations; however in Chapter 11 we shall illustrate some of their applications.
9.5.1 The Banach fixed point theorem Many problems take the form of finding a fixed point for a suitable transformation. For instance, if A maps X into X where X is a vector space, the equation Au = 0 is equivalent to Au + u = u, Le., to finding a fixed point for the operator A + Id. The contraction mapping theorem, proved by Stefan Banach (1892-1945) in 1922, an elementary version of which we saw in Theorem 8.48 in [GM2], is surely one of the simplest results that ensures the existence of a fixed point and also gives a procedure to determine it. The method has its origins in the method of successive approximations of Emile Picard (1856-1941) and may be regarded as an abstract formulation of it. Let {x n } be defined by
If {x n } converges to F(x) = x.
x and F
is continuous, then
x is a fixed
point of F,
a. The fixed point theorem Let X be a metric space. A map T : X ---. X is said to be k-contractive if d(T(x),T(y)):::; kd(x,y) Vx,y E X, or, in other words, ifT: X ---. X is Lipschitz continuous with Lipschitz constant
. ( ) LIp T
Ix, y, EX } = sup {d(Tx,TY) d(x, y)
less than or equal to k. If 0 :::; k < 1, T is often said simply a contraction or a contractive mapping. A point x E X for which Tx = x is called a fixed point for T. The contraction principle states that contractions have a unique fixed point.
9.128 Theorem (The fixed point theorem of Banach). Let X be a complete metric space and let T : X ---. X be k-contractive with 0 :::; k < 1. Then T has a unique fixed point. Moreover, given Xo EX, the sequence {x n } defined recursively by x n + I = T (x n ) converges with an exponential rate to the fixed-point, and the following estimates hold
336
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
Figure 9.12. The frontispiece of Leçons sur quelques équations fonctionnelles by Emile Picard (1856-1941) and the first page of a celebrated paper by Jean Leray (1906-1998) and Juliusz Schauder (1899-1943), which appeared in the Journal de Mathématiques in 1933.
Figure 9.12. The frontispiece of Ler;ons sur quelques equations fonctionnelles by Emile Picard (1856-1941) and the first page of a celebrated paper by Jean Leray (1906-1998) and Juliusz Schauder (1899-1943) appeared in Journal de Mathematiques in 1933.
Proof. The proof is as in Theorem 8.48 of [GM2]. First we prove uniqueness. If x, yare two fixed points, from d(x, y) = d(Tx, Ty) ::; kd(x, y), 0 ::; k < 1 we infer d(x, y) = 0, i.e., x=y. Then we prove existence. Choose any Xo E X and let Xn+l := T(xn), n ~ O. We have d(Xn+l,X n ) ::; kd(xn,xn-d ::; knd(Xl,XO) = knd(T(xo),xo),
hence for p
>n d(xp,xn ) ::;
p-l
p-l
j=n
j=n
kn
L d(Xj+l,Xj)::; L kjd(Xl,XO)::; 1- k d(Xl,XO).
Therefore d(xp,x n ) -> 0 as n,p -> 00, i.e., {Xn} is a Cauchy sequence, hence it has a limit x E X and x is a fixed point as it is easily seen passing to the limit in Xn+l = T(xn ). Finally, we leave to the reader the proof of the convergence estimates. 0
Notice that the second estimate in Theorem 9.128 allows us to evaluate the number of iterations that are sufficient to reach a desired accuracy; the second estimate allows us to evaluate the accuracy of X n as an approximate value of x in terms of d(xn+l'x n ), 9.129'. Show that T : X ToT
0 ... 0
->
X has a unique fixed point if its mth iterate T m = < 1. [Hint: x and Tx are both
T is a k-contractive mapping with 0 ::; k
fixed points of Tm.J
9.5 Some General Principles for Solving Abstract Equations
337
9.130~. Let X := CO([a, b]) and let
Tf(t)
:=
Show that
T m f(t) = (
1 )
m -1
l
f(s)ds,
a:::; t :::;
rt (t _ s)m-l f(s) ds
lia
b.
a:::; t:::;
b,
is a contractive map if m is sufficiently large.
9.131 Proposition. Let X be a Banach space and T : X -> X a kcontractive map with 0 ~ k < 1. Then Id - T is a bijection from X into itself, i.e., for every y E X the equation x - Tx = y has a unique solution, moreover (9.29) Lip ( Id - T) -1 ~ 1 ~ k . Proof. For any y E X the equation x - Tx = y is equivalent to x = Y + Tx =: F(x). Since F is k-contractive and k < 1, the fixed point theorem shows that x - Tx = y has a unique solution for any given y E X, i.e., Id - T is bijective. Finally, if x - Tx = y, then Ilxll:::; Ilx - Txll + IITxl1 :::; Ilyll + k Ilxll
i.e., Ilxll:::; l~kIIYII·
o
9.132~. Let X be a Banach space and T : X --+ X a Lipschitz-continuous map. Show that the equation Tx + J.LX = Y
is solvable for any y, provided 9.133~.
IJ.LI
is sufficiently large.
Let X be a Banach space and B : X x X
--+
IR a bilinear continuous form
such that
IB(x,y)1 :::; B
Ilxlillyll
V'x,y E X.
Show that the equation
x=y+B(x,x) has at least a solution if Ilyll < 1/(2B) and, in this case, show that there is a unique solution satisfying Ilxll :::; 1/(2B). [Hint: First look at the simplest case of X = IR and B(x,x):= Bx 2 .]
b. The continuity method The solvability of a linear equation L 1 x = y can be reduced to the solvability of a simpler equation Lox = y by means of the following. 9.134 Theorem (The continuity method). Let X be a Banach space, Y a normed space and La, L 1 two linear continuous functions from X to Y. For t E [0, 1] consider the family of linear continuous functions L t : X -> Y given by L t := (1 - t)L o + tL 1
and suppose that there exists a constant C such that the following a priori estimates hold
338
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
o.
II.
~
.........
~
fII
'NI
,..'-a--
.,. ........... "'....,.....,.,... ..............._ .. ~4""-
.............. #-
~~~
~.,
_,.....-...~
a au.ooo
NID 0
.--..a--... ....,....
~tMI.'O
.,..........
..*"-
.. _ ------... _..,_,......
6 t ~
. - . ..
,....~.,
_.~.,,--....--
n. ow.....--"'.""~ ....... ~ .. _*--.-..J ....w._ .....
(/IIIr._ ==.:..c.-...
-.
-.~---'
__
................... _.a.~.
'-~_
It ........... _ ~ --"'I A .... It.....
·.I.. _ _
.... A/t.
J.-
-"f'.
~
o..r, .. iIlIocnI~.....,a.~
"DoI~"""'~
~.~
..
.
~
....
-- ... _..... _.. -- _... ....... ....... _........, . . ~
,_ n..-6In ..""_
\IIt ..... _ " '
.
J--..,...,"'
...
n.1itIn. ._ .. ~ .. ~ - - .... _...-.""_ ......... ..-IrIr --..... . _..... - - - . ... ..,. L. .. J ,
..--a...-. .._ --:;;:;;;;; ,. u.'.":i.::-: -~.::.::-...:~ ..~~~::;II~;:~~ 1. • • , , "~" "'1'"
Iflo.
1 •••
n., ..·...... ~
"""-II
............_
........ 1- .. . . - - _ -. ...... _"n.-..______
~
~
U.~"".,."
"-
~_ . . . -~ . . T •• ';; •• II
Figure 9.13. George Birkhoff (1884-
A ••••• '
01 "' . .
III_
-....-
,,
....
.. ... -.-.__ ~
--.
a-&_ ....
1944) and a page from a paper by Birkhoff and Kellogg in Transactions, 1922.
(9.30)
Vx E X, Vt E [0,1].
Then, of course the functions L t , t E [0,1], are injective; moreover, L 1 is surjective if and only if L o is surjective. Proof. Injectivity follows from the linearity and (9.30). Suppose now that L s is surjective for some s. Then L s : X - Y is invertible and by (9.30) II(Ls)-111 ~ C. We shall now prove that the equation Ltx = y can be solved for any y E Y provided t is closed to s. For this we notice that Ltx = y is equivalent to
Lsx = y
+ (L s -
Lt)x = y
+ (t -
s)Lox - (t - s)Llx
which, in turn, is equivalent to
x = L;l y + (t - S)L;l(Lo - Ldx =: Tx since L s : X - Y has an inverse. Then we observe that IIL111), consequently T is a contractive map if
It - s\ ~
IITx - Tzlly
~
Cit -
sl(IILoll +
1
fJ
:=
C(IILoll + IILIII)'
and we conclude that L t is surjective for all t with It - sl < fJ. Since fJ is independent of s, starting from a surjective map Lo we successively find that L t with t E [0, fJ], [0,2fJ], ... is surjective. We therefore prove that LI is surjective in a finite number of steps. 0 9.135 Remark. Notice that the proof of Theorem 9.134 says that, assuming (9.30), the subset of [0, I)
S
:=
{s E [0, 1J IL
s :
X - Y is surjective}
is open and closed in [0,1]. Therefore S = [O,IJ provided S
of. 0.
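For matrices, the continuation argument in the proof of Theorem 9.134 can be carried out literally. The sketch below is an added illustration only: the matrices, the step size and the constant in the a priori bound are ad hoc choices arranged so that the estimate (9.30) holds along the chosen path with $C = 2$. Each step solves $L_t x = y$ by iterating the contraction $x \mapsto L_s^{-1}\bigl(y + (t-s)(L_0 - L_1)x\bigr)$.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 6
B = rng.standard_normal((m, m))
B /= np.linalg.norm(B, 2)                      # ||B|| = 1
L0, L1 = np.eye(m), np.eye(m) + 0.5 * B        # L_t = (1-t)L0 + tL1 = I + 0.5*t*B
y = rng.standard_normal(m)

def Lt(t):
    return (1 - t) * L0 + t * L1

# on this path ||L_t^{-1}|| <= 2, so the a priori estimate holds with C = 2
s, x, step = 0.0, np.linalg.solve(L0, y), 0.2
while s < 1.0:
    t = min(1.0, s + step)
    for _ in range(200):                       # contraction factor <= 2 * step * 0.5 = 0.2
        x = np.linalg.solve(Lt(s), y + (t - s) * (L0 - L1) @ x)
    s = t

print(np.linalg.norm(Lt(1.0) @ x - y))         # residual of L_1 x = y, near machine precision
```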
9.5 Some General Principles for Solving Abstract Equations
339
9.5.2 The Caccioppoli-Schauder fixed point theorem Compared to the fixed point theorem of Banach, the fixed point theorem of Caccioppoli and Schauder is more sophisticated: it extends the finitedimensional fixed point theorem of Brouwer to infinite-dimensional spaces. 9.136 Theorem (The fixed point theorem of Brouwer). Let K be a nonempty, compact and convex set of JRn and let f be a continuous map mapping K into itself. Then f has at least a fixed point in K. The generalization to infinite dimensions and to the abstract setting is due to Juliusz Schauder (1899~1943) at the beginning of the Twenties of the past century, however in specific situations it also appears in some of the works of George Birkhoff (1884-1944) and Oliver Kellogg (18781957) of 1922 and of Renato Caccioppoli (1904-1959) (independently from Juliusz Schauder) of the 1930's, in connection with the study of partial differential equations. Brouwer's theorem relies strongly on the continuity of the map f and in particular, on the property that those maps have of transforming bounded sets of a finite-dimensional linear space into relatively compact sets. As we have seen in Theorem 9.21, such a property is not valid anymore in infinite dimensions, thus we need to restrict ourselves to continuous maps that transform bounded sets into relatively compact sets. In fact, the following example shows that a fixed-point theorem such as Brouwer's cannot hold for continuous functions from the unit ball of an infinite-dimensional space into itself. 9.137 Example. Consider the map
Clearly
f : £2
--+
£2 given by
f maps the unit ball of £2 in itself, is continuous and has no fixed point.
a. Compact maps 9.138 Definition. Let X and Y be normed spaces. The (non)linear operator A : X ---. Y is called compact if (i) A is continuous, (ii) A maps bounded sets of X into relatively compact subsets ofY, equivalently fOT any bounded sequence {xd c X we can extract a subsequence {x nk } such that {Ax nk } is convergent. 9.139 Example. Consider the integral operator A: CO([a, b]) u E CO([a,b]) into Au(x) E CO([a,b]) defined by
Au(x) :=
lab F(x, y, u(y)) dy
--+
CO([a, b]) that maps
340
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
where F(x, y, u) is a continuous real-valued function in JR3. For r {(x,y,u)EJR 3 Ix,yE[a,bj, [ul:Sr} and
>
0 set Qr .-
I
M r := {u E CO([a, b]) Ilull oo :S r}. Proposition. A: M r
--t
CO ([a, b]) is a compact operator.
Proof. (i) First we prove that A : M r --t CO([a, bJ) is continuous. Fix £ > 0 and observe that, F being uniformly continuous in Qr, there exists 8 > 0 such that IF(x,y,u) - F(x,y,v)1 < if (x, y, u), (x, y, v) E B r and lu -
vi < 8.
Consequently, we have
[F(x,y,u(y)) - F(x,y,v(y))1 for u, v E M r with Ilu - vlloo,[a,b]
IIAu - Avlloo,[a,bI =
< 8,
£
<£
hence
sup lib [F(x, y, u(y)) - F(x, y, v(y))J dyl
xE[a,b]
a
:S £(b - a).
(9.31)
(ii) It remains to show that A maps bounded sets into relatively compact sets. To do that, it suffices to show that A(Mr ) is relatively compact in CO([a, bJ). We now check that A(Mr ) C CO([a,bJ) is a set of equibounded and equicontinuous functions. Then the Ascoli-Arzela theorem, Theorem 9.48, yields the required property. In fact, the equiboundedness of functions in A(Mr ) follows from
IIAulloo:S(b-a)
sup
IF(x,y,z)[,
(x,y,z)EQr
while the equicontinuity of functions in A(Mr
)
is just (9.31).
o
Compact operators arise as limits of maps with finite rank as shown by the following theorem. 9.140 Theorem. Let X and Y be Banach spaces and Me X a nonempty bounded set. We have
(i) If {An}, An : M
----* Y, is a sequence of compact operators that converges to A : M ----* Y in B(A, Y), i.e., [IAn - AIIB(A,Y) ----* 0 as n ----* 00, then A si compact. (ii) Suppose A : M ----* Y is compact. Then there exists a sequence {An} of continous operators An : X ----* Y such that !IAn - Alloo,M ----* 0 as n ----* 00 and each An has range in a finite-dimensional subspace of Y as well as in the convex envelope of A(M).
Proof. (i) Fix £ > 0 and choose n so that IIA n - Alloo,M < £. Since An(M) is relatively compact, we can cover An(M) with a finite number of balls An(M) C U{=l B(Xi, E), i = 1, ... ,/. Therefore A(M) C U{=l B(Xi, 2£), i.e., A(M) is totally bounded, hence A(M) has compact closure, compare Theorem 6.8. (ii) Since A(M) is relatively compact, for each n there is a ~-net, i.e., elements Yj E A(M), j = 1, ... , I n such that A(M) C Uf::;l B(Yj, lin), or, equivalently, .
1
mmIIAx-yjll:S J n
"Ix E M.
9.5 Some General Principles for Solving Abstract Equations
341
Figure 9.14. Renato Caccioppoli (1904-1959) and Carlo Miranda (1912-1982).
Define the so-called Schauder operators
An X
._ .-
Ef~l aj(x)Yj
xEM,
J
Ej~l aj(x)
where, for x E M and j = 1, ... , I n ,
aj(x):=
max{; -IIAx -
It is easily seen that the functions aj : M simultaneously; moreover
-> ~
YjII,O}. are continuous and do not vanish
o
the claim then easily follows.
b. The Caccioppoli-Schauder theorem 9.141 Theorem (Caccioppoli-Schauder). Let M c X be a closed, bounded, convex nonempty subset of a Banach space X. Every compact operator A : M ~ M from M into itself has at least a fixed point. Proof. Let Uo E M. Replacing U with U - Uo we may assume that 0 E M. From (ii) of Theorem 9.140 there are finite-dimensional subspaces X n C X and continuous operators An : M -> X n such that IIAu - Anull ~ and An(M) C co(A(M)). The subset M n := X n n M is bounded, closed, convex and nonempty (since 0 E M n ) and An(Mn ) C M n . Brouwer's theorem then yields a fixed point for An : M n -> M n , Le.,
s:
Un E M n ,
Anun
= Un,
hence, as the sequence {un} is bounded,
IIAun -
unll =
IIAun - Anunll
1 s: -llunll-> O. n
Since A is compact, passing to a subsequence still denoted by {Un}, we deduce that {Au n } converges to an element vEX. On the other hand v E M, since M is closed, and as n ---+ 00; IIUn - vii Ilv - Aunll + IIAun - unll-> 0 thus Un -> v and from AU n = Un Vn we infer Av = v taking into account the continuity of A. 0
s:
342
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
c. The Leray-Schauder principle A consequence of the Caccioppoli-Schauder theorem is the following principle, which is very useful in applications, proved by Jean Leray (19061998) and Juliusz Schauder (1899-1943) in 1934 in the more general context of the degree theory and often referred to as to the fixed point theorem of Helmut Schaefer (1925- ). 9.142 Theorem (Schaefer). Let X be a Banach space and A: X -+ X a compact operator. Suppose that the following a priori estimate holds: there is a positive number r > 0 such that, if u E X solves u
= tAu
for some 0 :::; t :::; 1,
then
Ilull < r. Then the equation v EX
v =Av has at least a solution. Proof. Let M := {u E X retraction on the ball, i.e.,
I Ilull :::::
r} and consider the composition B of A with the
Au Bu:=
{
I~:~I
if if
IIAull::::: r, IIAull2: r.
B maps M to M, is continuous and maps bounded sets in relatively compact sets, since A is compact. Therefore the Caccioppoli-Schauder theorem yields a fixed point u E M for B, Bu = u. Now, if IIAull ::::: r, u is also a fixed point for A; otherwise IIAul1 > rand u=Bu= _r_ Au = tAu
IIAill1
hence Ilull < r: it follows that also point for A.
IIBul1 < r,
r
'
t:= - -
<1
IIAul1 - ,
Le., u = Bu = Au and u is again a fixed 0
Theorems 9.134 and 9.142 may be regarded as special cases of a sort of general principle: a priori estimates on the possible solution yield existence of a solution.
9.5.3 The method of super- and sub-solutions In this section we state an abstract formulation of the following principle that reminds us of the intermediate value theorem: to find a solution, it often suffices to find a subsolution and a supersolution.
9.5 Some General Principles for Solving Abstract Equations
343
Figure 9.15. Juliusz Schauder (1899-1943) and Jean Leray (1906-1998).
a. Ordered Banach spaces 9.143 Definition. An order cone in a Banach space X is a subset X+ such that
(i) X+ is closed, convex nonempty and X+ i=- {O}, (ii) if u E X+ and A :2: 0, then AU E X+, (iii) if u E X+ and -u E X+, then u = O. An order cone X + C X defines a total order in X
u ::; v
v - u E X +,
if and only if
and we say that X is an ordered Banach space (by X+). In this case intervals in X are well defined
[u,w] := {v
E
I u::; v::; w}.
X
9.144 Definition. An order cone X+ is called normal if there is a number such that Ilull ::; cllvll whenever 0::; u ::; v.
c> 0
9.145 Example. In JR.n with the Euclidean norm the set X+ = is a normal order cone: 0 :S
X
JR.+.
:=
{(Xl, ... ,Xn)
:S y implies
Ixi
:S
I
Xi
2: 0 'Vi}
Iyl.
9.146 Example. In the Banach space CO([a, b]) with the uniform norm
C~([a, b])
:= {u E
I
CO([a, b]) u(x) 2: 0 'Vx E [a, b]}
is a normal order cone. 9.147~.
Show (i) (ii) (iii) (iv) (v)
Let u,v,w,un,Vn be elements of an order cone X+ of a Banach space X. that u:S v and v :S w imply u :S w, u:S v and v :S u imply u = v, if u :S v, then u + w :S v + wand AU :S AV 'VA 2: 0, 'Vw E X, if Un :S Vn , Un -+ U and Vn -+ v as n -+ 00, then U :S v, if X+ is normal, then U :S v :S w imply Ilv-ull:Scllw-ull
and
Ilw-vll:Scllw-ull·
344
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
b. Fixed points via sub- and super-solutions 9.148 Theorem. Let X be a Banach space ordered by a normal order cone, let uo, Vo E X and let A: [uo, vo] eX ----t X be a (possibly nonlinear) compact operator. Suppose that A is monotone increasing, i.e., Au ::; Av whenever u ::; v and that (i) Uo is a subsolution for the equation Au = u, i.e., Uo ::; Auo, (ii) Vo is a supersolution for the equation Au = u, i.e., Avo::; Vo· Then the two iterative processes and started respectively, from Uo and Vo converge respectively to solutions u_ and u+ of the equation u = Au. Moreover
Proof. By induction Ua
:S ... :S Un :S Vn :S ... :S Va,
since A is monotone. From (v) of Exercise 9.147 IIva - Un II
:S C Ilva - uall
'in,
Le., {Un} is bounded. As A is compact, there exists u_ E X such that for a subsequence {Uk n } of {Un} we have AUkn ---> u_ as n ---> 00. Finally u_ = Au_, since A is continuous. One operates similarly with {Vn}. 0
9.149 Remark. Notice that the conclusion of Theorem 9.148 still holds if we require that A be monotone on the sequences {Un} and {v n } defined by Un+l = AUn and Vn +l = AVn started respectively, at Uo and Vo instead of being monotone in [uo, vo].
9.6 Exercises 9.150,.. Show that in a normed space (X, 1111) the norm 1111: X continuous function with best Lipschitz constant one, Le.,
---> ~+
is a Lipschitz-
Illxll-llylll:s IIx - yll· 9.151 ,.. Show that the set E t := {x E X I f(x) space and f : X ---> ~ is convex. 9.152,.. Let X be a normed space with function.
:S t} is convex for all t if X is a normed
1111x. Show that x
--->
IlxlI P , P ~
1, is a convex
9.153,.. Convexity can replace the triangle inequality. Prove the following claim.
9.6 Exercises
345
Eberhard Zeidl ,
....
-
Applied Functional Analy is
_ _BKtlIS .. __ no... Hatm
Maln Principles Iwd Thcir Applications
FO erIO ~rie
ALYSE ELLE
el applications-
Figure 9.16. Frontispieces of two volumes on functional analysis.
Proposition. Let X be a linear space and let f : X
(i) (ii) (iii) Then
-> lR+ be a function such that f(x):::: 0, f(x) = iff x = 0, f is positively homogeneous of degree one: f()..x) = 1)..lf(x) Vx E X, V).. :::: 0, the set {x I f(x) ~ I} is convex. f(x) is a norm on X.
°
9.154'. Prove the following variant of Lemma 9.22.
Lemma (Riesz). Let Y be a closed proper linear subspace of a normed space X. Then, for every € > 0, there exists a point z E X with Ilzll = 1 such that liz - yll > 1 - € for allyEY. 9.155'. Show that BV([a, b]) is a Banach space with the norm
IIfIlBv:=
sup
If(x)1 + Vd'U)·
xE[a,b]
[Hint: Compare Chapter 7 for the involved definitions.] 9.156'. Show that in CO([a,b]) the norms
111100
e
[IIILP
are not equivalent.
9.157'. Show that in Cl([O, 1])
Ix(O)1 + llxl(t)! dt defines a norm, and that the convergence in this norm implies uniform convergence. 9.158'. Denote by Co the linear space of infinitesimal real sequences {xn} and by coo the linear subspace of Co of sequences with only finite many nonzero elements. Show that Co is closed in loo while coo is not closed.
346
9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
9.159 ¶. Recall, see e.g., Section 2.2 of [GM1], that the oscillation of a function f : ℝ → ℝ over an interval around x with radius δ is defined as
  ω_{f,δ}(x) := sup_{|y−x|<δ} |f(y) − f(x)|
and that f : ℝ → ℝ is continuous at x if and only if ω_{f,δ}(x) → 0 as δ → 0. Show that, if f is continuous, then ω_{f,δ}(x) → 0 as δ → 0 uniformly on every bounded interval of ℝ. [Hint: Use Theorem 6.35.]
9.160 ¶. Let f ∈ C¹(ℝ). Show that
  Δ_h f(x) := (f(x + h) − f(x)) / h → f′(x)   as h → 0
uniformly in every bounded interval of ℝ. [Hint: Use Theorem 6.35.]
9.161 ¶. Let f : ]x₀ − 1, x₀ + 1[ ⊂ ℝ → ℝ be differentiable at x₀. Show that the blow-up sequence {f_n},
  f_n(z) := (f(x₀ + z/n) − f(x₀)) / (1/n) → f′(x₀) z,
compare Section 3.1 of [GM1], converges uniformly on every bounded interval of ℝ. [Hint: Use Theorem 6.35.]
9.162 ¶. Compute, if it exists,
  lim_{n→∞} ∫₁² f_n(x) dx,   f_n(x) := (n/x) (e^{x/n} − 1).
9.163 ¶. Discuss the uniform convergence on [0,1] of the sequence of real functions
  f_n(x) := (1/n) (2 + sin(nx)) e^{(1 − cos(nx)) x}.
9.164 ¶. Discuss the uniform convergence of the real series
  Σ_{n=1}^∞ x log(1 + x/n²).
9.165 ¶. Show that {u ∈ C⁰([0,1]) | ∫₀¹ u(x) dx = 0} is a linear subspace of C⁰([0,1]) that is not closed.

9.166 ¶. Show that {u ∈ C⁰([0,1]) | u(0) = 1} is closed, convex and dense in C⁰([0,1]).

9.167 ¶. Show that {u ∈ C⁰(ℝ) | lim_{x→±∞} u(x) = 0} is a closed subspace of C⁰_b(ℝ).

9.168 ¶. Show that the subspace C⁰_c(ℝ) of C⁰_b(ℝ) of functions with compact support is not closed.

9.169 ¶. Let X be a compact metric space and F ⊂ C⁰(X). Show that F is equicontinuous if
(i) the functions in F are equi-Lipschitz, i.e., ∃ M such that
  |f(x) − f(y)| ≤ M d(x,y)   ∀x, y ∈ X, ∀f ∈ F,
or
(ii) the functions in F are equi-Hölder, i.e., ∃ M and α, 0 < α ≤ 1, such that
  |f(x) − f(y)| ≤ M d(x,y)^α   ∀x, y ∈ X, ∀f ∈ F.
9.170 ¶. Let F ⊂ C⁰([a,b]). Show that any of the following conditions implies equicontinuity of the family F:
(i) the functions in F are of class C¹ and there exists M > 0 such that
  |f′(x)| ≤ M   ∀x ∈ [a,b], ∀f ∈ F,
(ii) the functions in F are of class C¹ and there exist M > 0 and p > 1 such that
  ∫_a^b |f′(t)|^p dt ≤ M   ∀f ∈ F.

9.171 ¶. Let F ⊂ C⁰([a,b]) be a family of equicontinuous functions. Show that any of the following conditions implies equiboundedness of the functions in F:
(i) ∃ C and ∃ x₀ ∈ [a,b] such that |f(x₀)| ≤ C ∀f ∈ F,
(ii) ∃ C such that ∀f ∈ F ∃ x ∈ [a,b] with |f(x)| ≤ C,
(iii) ∃ C such that ∫_a^b |f(t)| dt ≤ C ∀f ∈ F.
9.172 ¶. Let Q be a set and let X be a metric space. Prove that a subset B of the space of bounded functions from Q into X with the uniform norm is relatively compact if and only if, for any ε > 0, there exists a finite partition Q₁, ..., Q_n of Q such that the total variation of every u ∈ B in every Q_i is not greater than ε.

9.173 ¶. Show that a subset K ⊂ ℓ_p, 1 ≤ p < ∞, is compact if and only if
(i) sup_{{x_n}∈K} Σ_{n=1}^∞ |x_n|^p < ∞,
(ii) ∀ε > 0 ∃ n_ε such that Σ_{n=n_ε}^∞ |x_n|^p ≤ ε for all {x_n} ∈ K.
9.174 ¶. Let X be a complete metric space with the property that the bounded and closed subsets of C⁰(X) are compact. Show that X consists of a finite number of points.

9.175 ¶ Hölder-continuous functions. Let Ω be a bounded open subset of ℝⁿ. According to Definition 9.46 the space of Hölder-continuous functions with exponent α in Ω, C^{0,α}(Ω), 0 < α ≤ 1 (also called Lipschitz-continuous functions if α = 1), is defined as the linear space of continuous functions in Ω such that
  ||u||_{0,α,Ω} := sup_Ω |u| + [u]_{0,α,Ω} < +∞,
where
  [u]_{0,α,Ω} := sup_{x,y∈Ω, x≠y} |u(x) − u(y)| / |x − y|^α.
One also defines C^{0,α}_{loc}(Ω) as the space of functions that belong to C^{0,α}(A) for all relatively compact open subsets A, A ⊂⊂ Ω. Show that C^{0,α}(Ω) is a Banach space with the norm ||·||_{0,α,Ω}.
9.176 ¶. Show that the space C^k([a,b]) is a Banach space with norm
  ||u||_k := Σ_{h=0}^k ||D^h u||_{∞,[a,b]}.
Define C^{k,α}([a,b]) as the linear space of functions in C^k([a,b]) with Hölder-continuous k-th derivative with exponent α such that
  ||u||_{k,α,[a,b]} := ||u||_{C^k([a,b])} + [D^k u]_{0,α,[a,b]} < +∞.
Show that C^{k,α}([a,b]) is a Banach space with norm ||·||_{k,α,[a,b]}.
9.177 ¶. Show that the immersion of C^{0,α}([a,b]) into C^{0,β}([a,b]) is compact if 0 < β < α ≤ 1. More generally, show that the immersion of C^{h,α}([a,b]) into C^{k,β}([a,b]) is compact if k + β < h + α.

9.178 ¶. Let Ω ⊂ ℝ² be defined by Ω := {(x,y) ∈ ℝ² | y < |x|^{1/2}, x² + y² < 1}. By considering the function
  u(x,y) := (sgn x) y^β  if y > 0,   u(x,y) := 0  if y ≤ 0,
where 1 < β < 2, show that u ∈ C¹(Ω), but u ∉ C^{0,α}(Ω) if β/2 < α ≤ 1.
9.179 ¶. Prove the following.

Proposition. Let Ω be a bounded open set in ℝⁿ satisfying one of the following conditions:
(i) Ω is convex,
(ii) Ω is star-shaped,
(iii) ∂Ω is locally the graph of a Lipschitz-continuous function.
Then C^{h,α}(Ω) ⊂ C^{k,β}(Ω) and the immersion is compact if k + β < h + α.
[Hint: Show that in all cases there exist a constant M and an integer n such that ∀x, y ∈ Ω there are at most n points z₁, z₂, ..., z_n with z₁ = x and z_n = y such that Σ_{i=1}^{n−1} |z_i − z_{i+1}| ≤ M |x − y|. Use Lagrange's mean value theorem.]

9.180 ¶. Show that the space of Lipschitz-continuous functions in [a,b] is dense in C⁰([a,b]). [Hint: Use the mean value theorem.]

9.181 ¶. Show that the space of Lipschitz-continuous functions in [a,b] with Lipschitz constant less than k agrees with the closure in C⁰([a,b]) of the functions of class C¹ with sup_x |f′(x)| ≤ k.
9.182 ¶. Let λ > 0. Show that
  {u ∈ C⁰([0,+∞[) | sup_{[0,+∞[} |u| e^{−λx} < +∞}
is complete with respect to the metric d(f,g) := sup_x { |f(x) − g(x)| e^{−λx} }.
9.183 ¶. Let f : [0,1] → [0,1] be a diffeomorphism with f′(x) > 0 ∀x ∈ [0,1]. Show that there exists a sequence of polynomials P_n(x), which are diffeomorphisms from [0,1] into [0,1], that converges uniformly to f in [0,1]. [Hint: Use Weierstrass's approximation theorem.]
9.184 ¶. Define for A = [a^i_j] ∈ M_{n,n}(𝕂), 𝕂 = ℝ or 𝕂 = ℂ,
  ||A|| := sup { |Ax| / |x| : x ≠ 0 }.
Show that
(i) |Ax| ≤ ||A|| |x| ∀x ∈ 𝕂ⁿ,
(ii) ||A|| = sup { (Ax|y) : |x| = |y| = 1 },
(iii) ||A||² ≤ Σ_{i,j=1}^n |a^i_j|² ≤ n ||A||²,
(iv) ||A*|| = ||A||,
(v) ||AB|| ≤ ||A|| ||B||.
9.185 ¶. Let A ∈ M_{n,n}(ℂ) and A(x) := Ax. Show that
(i) if ||z|| = |z|_∞ := max(|z₁|, ..., |z_n|), then
  ||A|| = sup_{||z||≤1} ||A(z)|| = max_i Σ_{j=1}^n |A^i_j|,
(ii) if ||z|| = |z|₁ := Σ_{i=1}^n |z_i|, then
  ||A|| = sup_{||z||≤1} ||A(z)|| = max_j Σ_{i=1}^n |A^i_j|.
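These identities are easy to check numerically; a small sketch (Python, with an arbitrary 3×3 matrix; numpy's built-in matrix norms are used for the induced operator norms):

```python
import numpy as np

A = np.array([[1.0, -2.0, 0.5],
              [0.0,  3.0, 1.0],
              [2.0, -1.0, 4.0]])

# Operator norm induced by the Euclidean norm (largest singular value).
print(np.linalg.norm(A, 2))

# Induced by |z|_inf: maximum absolute row sum, as in (i).
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())

# Induced by |z|_1: maximum absolute column sum, as in (ii).
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
```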
9.186 ¶. Let A, B ∈ M_{2,2}(ℝ) be given by
A (0o 1) =
0
'
Then AB ≠ BA. Compute exp(A), exp(B), exp(A) exp(B), exp(B) exp(A) and exp(A + B).
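A short numerical sketch of this kind of computation (Python). The pair below, with A the nilpotent matrix shown above and B = Aᵀ, is an illustrative non-commuting choice, and the truncated power series is a simple stand-in for the matrix exponential:

```python
import numpy as np

def expm(M, terms=20):
    """Matrix exponential via the truncated power series sum_k M^k / k!."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])   # assumed choice B = A^T, so AB != BA

print(expm(A) @ expm(B))   # differs from ...
print(expm(B) @ expm(A))   # ... this product, and both differ from ...
print(expm(A + B))         # ... exp(A+B), since A and B do not commute
```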
9.187 ¶. Define
  N(n) := {N ∈ End(ℂⁿ) | N is normal},
  U(n) := {N ∈ End(ℂⁿ) | N is unitary},
  H(n) := {N ∈ End(ℂⁿ) | N is self-adjoint},
  H⁺(n) := {N ∈ End(ℂⁿ) | N is self-adjoint and positive}.
Show that
(i) if N ∈ N(n) has spectral resolution N = Σ_j λ_j P_j, then exp(N) ∈ N(n) and has the spectral resolution exp(N) = Σ_j e^{λ_j} P_j,
(ii) exp is one-to-one from H(n) onto H⁺(n),
(iii) the operator H → exp(iH) maps H(n) onto U(n).
9.188 ¶. Let L ∈ End(ℂⁿ). Then Id − L is invertible if and only if 1 is not an eigenvalue of L. If L is normal, then L = Σ_{j=1}^n λ_j P_j, and we have
  (Id − L)^{-1} = Σ_{j=1}^n 1/(1 − λ_j) P_j.
If ||L|| < 1, then all eigenvalues have modulus smaller than one and
  (Id − L)^{-1} = Σ_{k=0}^∞ Σ_{j=1}^n λ_j^k P_j = Σ_{k=0}^∞ L^k.
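A numerical illustration of the Neumann series (Python; the 4×4 matrix L is an arbitrary example rescaled so that ||L|| < 1):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4))
L *= 0.9 / np.linalg.norm(L, 2)          # rescale so that ||L|| < 1

inverse = np.linalg.inv(np.eye(4) - L)   # (Id - L)^{-1}

partial = np.zeros((4, 4))
power = np.eye(4)
for k in range(200):                      # partial sums of sum_k L^k
    partial += power
    power = power @ L

print(np.linalg.norm(partial - inverse))  # small: the series converges to the inverse
```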
9.189~. Let T,T- 1 E End (X). Show that S E End (X) and then S-l exists, is a bounded operator and
II S-1
liS -
Til::; l/IITII- l
,
T-lll < liT-III - I-liS - Til liT-III
9.190 ¶. Let X and Y be Banach spaces. We denote by Isom(X,Y) the subset of all continuous isomorphisms from X onto Y, that is, the subset of 𝓛(X,Y) of linear continuous operators L : X → Y with continuous inverse. Prove the following.
Theorem. We have
(i) Isom(X,Y) is an open set of 𝓛(X,Y),
(ii) the map f → f⁻¹ from Isom(X,Y) into Isom(Y,X) is continuous.
[Hint: In the case of finite-dimensional spaces, it suffices to observe that the determinant is a continuous function.]

9.191 ¶. Show that, if
f
is linear and preserves the distances, then
f
E Isom (X, Y).
9.192 ~. Show that the linear map D : C l ([O, 1]) C CO ([0, 1]) -+ CO([O, 1]) that maps f to f' is not continuous with respect to the uniform convergence. Show that also the map from CO into CO with domain Cl
f
E C 1 ([0, 1]) C CO([O, 1])
-+
/'(1/2) E JR
is not continuous. In particular, notice that linear subspaces of a normed space are not necessarily closed. 9.193~. Fix a = {an} E £00 and consider the linear operator L : £1 anx n . Show that
-+
£1, (LX)n =
(i) IILII = Ilaliloo' (ii) L is injective iff an i- 0 "In, (iii) L is surjective and L -1 e continuous if and only if inf Ian I > O. 9.194~. Show that the equation 2u = cos u
+ 1 has a
unique solution in CO([O, 1]).
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
In a normed space, we can measure the length of a vector but not the angle formed by two vectors. This is instead possible in a Hilbert space, i.e., a Banach space whose norm is induced by an inner (or Hermitian) product. The inner (Hermitian) product allows us to measure the length of a vector, the distance between two vectors and the angle formed by them. The abstract theory of Hilbert spaces originated from the theory of integral equations of Vito Volterra (1860-1940) and Ivar Fredholm (1866-1927), successively developed by David Hilbert (1862-1943) and J. Henri Poincaré (1854-1912) and reformulated, mainly by Erhard Schmidt (1876-1959), as a theory of linear equations with infinitely many unknowns. The axiomatic presentation, based on the notion of inner product, appeared around the 1930s and is due to John von Neumann (1903-1957) in connection with the developments of quantum mechanics. In this chapter, we shall illustrate the geometry of Hilbert spaces. In Section 10.2 we discuss the orthogonality principles, in particular the projection theorem and the abstract Dirichlet principle. Then, in Section 10.4 we shall discuss the spectrum of compact operators, partially generalizing to infinite dimensions the theory of finite-dimensional eigenvalues, see Chapter 4.
10.1 Hilbert Spaces

A Hilbert space is a real (complex) Banach space whose norm is induced by an inner (Hermitian) product.
10.1.1 Basic facts

a. Definitions and examples

10.1 Definition. A real (complex) linear space, endowed with an inner or scalar (respectively Hermitian) product ( | ), is called a pre-Hilbert space.
Figure 10.1. David Hilbert (1862-1943) and the Theorie der linearen Integralgleichungen.
We have discussed algebraic properties of the inner and Hermitian products in Chapter 3. We recall, in particular, that in a pre-Hilbert space H the function
  ||x|| := √(x|x),   x ∈ H,   (10.1)
defines a norm on H for which the Cauchy-Schwarz inequality
  |(x|y)| ≤ ||x|| ||y||   ∀x, y ∈ H,
holds. Moreover, Carnot's theorem
  ||x + y||² = ||x||² + ||y||² + 2ℜ(x|y)   ∀x, y ∈ H
and the parallelogram law
  ||x + y||² + ||x − y||² = 2 ||x||² + 2 ||y||²
Vx,y E H hold. In Chapter 3 we also discussed the geometry of real and complex pre-Hilbert spaces of finite dimension. Here we add some considerations that are relevant for spaces of infinite dimension. A pre-Hilbert space H is naturally a normed space and has a natural topology induced by the inner product. In particular, if {x n } cHand x E H, then X n ~ x means that IIx n - xII = (x n - xlx n - X)1/2 ~ 0 as n ~ 00. As for any normed vector space, the norm is continuous. We also have the following.
10.2 Proposition. The inner (or Hermitian) product in a pre-Hilbert space H is continuous on H × H, i.e., if x_n → x and y_n → y in H, then (x_n|y_n) → (x|y). In particular, if (x|y) = 0 for all y in a dense subset Y of H, we have x = 0.
10.1 Hilbert Spaces
353
Proof. In fact
l(xnIYn) -
(xly)1
+ (xlYn - y)1 Ilxn - xllllYnl1 + IlxllllYn - yll;
= I(xn - xlYn) ~
the claim then follows since the sequence llYn II is bounded, since it is convergent. If Y is a dense subset of H, we find for any x E H a sequence {Yn} C Y such that Yn -> x. Taking the limit in (x I Yn) = 0, we get (x I x) = O. 0 10.3 ,. Differentiability of the inner product. Let u :]a, b[-> H be a map from an interval of lit into a pre-Hilbert space H. We can extend the notion of derivative in this context. We say that u is differentiable at to E]a, b[ if the limit
u'(to):= lim u(t) - u(to) E H t - to
t~O
exists. Check that Proposition. Ifu,v :]a,b[-> H are differentiable in Ja,b[, so is t
d
-(u(t) I v(t)) = (u'(t)lv(t)) dt
+ (u(t)lv'(t))
->
(u(t) I v(t)) and
"It E]a,b[.
10.4 Definition. A pre-Hilbert space H that is complete with respect to the induced norm, Ilxll := v(xlx), is called a Hilbert space. 10.5". Every pre-Hilbert space H, being a metric space, can be completed. Show that its completion ii is a Hilbert space with an inner product that agrees with the original one when restricted to H.
Exercise 10.5 and Theorem 9.21 yield at once the following.
10.6 Proposition. Every finite-dimensional pre-Hilbert space is complete, hence a Hilbert space. In particular, any finite-dimensional subspace of a pre-Hilbert space is complete, hence closed. The closed unitary ball of a Hilbert space H is compact if and only if H is finite dimensional. 10.7 Example. The space of square integrable real sequences
£2 = £2 (lit) :=
{x =
I
{Xn} Xn E lit,
flxil 2 < oo} i=l
is a Hilbert space with inner product (x I y) := L~l XiYi, compare Section 9.1.2. Similarly, the space of square integrable complex sequences
I
£2(iC) := {x = {xn } xn E C,
f I il
X 2
< oo}
i=l
is a Hilbert space with the Hermitian product (x I y) := L~l Xi'!};'
354
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.8 Example. In CO ([a, b]) b
(fIg) :=
f
f(x)g(x) dx
a
) 1/2 IlfI12:= ( fab If(tW dt .
defines an inner product with induced norm
As we have
seen in Section 9.1.2, CO([a, b]) is not complete with respect to this norm. Similarly
f
1
(f I g) :=
f(x)g(x) dx,
°
defines in CO ([0, 1]' iC) a pre-Hilbert structure for which CO([O, 1], iC) is not complete.
b. Orthogonality Two vectors x and y of a pre-Hilbert space are said to be orthogonal, and we write x .1 y, if (xly) = O. The Pythagorean theorem holds for pairwise orthogonal vectors Xl, X2, .•. , X n
Actually, if H is a real pre-Hilbert space, x .1 y if and only if [Ix
IIxW + Ily112.
+ y[1 2 =
A denumerable set of vectors {en} is called orthonormal if (ehlek) Vh, k. Of course, orthonormal vectors are linearly independent.
= 8hk
10.9 Example. Here are a few examples. (i) In £2, the sequence er = (1,0, ... ), e2 = (0,1, ... ), ... , is orthonormal. Notice that it is not a linear basis in the algebraic sense. (ii) In CO([a, b], lR) with the L2-inner product
f
b
(flg)£2 :=
f(x)g(x) dx
a
the triginometric system _1_, cos (n 27l'X ), sin (n 27l'X ), b-a b-a b-a
n= 1,2,00'
is orthonormal, compare Lemma 5.45 of [GM2]. b
(iii) In CO([a,b],iC) with the Hermitian L 2 -product (flg)L2 .-
f a
trigonometric system
1 ( i2k7l'x) --exp -- , b-a
forms again an orthonormal system.
b-a
k E Z,
f(x)g(x)dx, the
10.1 Hilbert Spaces
355
10.1.2 Separable Hilbert spaces and basis a. Complete systems and basis Let H be a pre-Hilbert space. We recall that a set E of vectors in Hare said to be linearly independent if any finite choice of vectors in E are linearly independent. A set E C H of linearly independent vectors such that any vector in H is a finite linear combination of vectors in E is called an algebraic basis of H. We say that a system of vectors {e"'}"'EA in a pre-Hilbert space H is complete if the smallest closed linear subspace that contains them is H, or equivalently, if all finite linear combinations of the {e",} are dense in H. Operatively, {e"'}"'EA cHis complete if for every x E H, there exists a sequence {x n } of finite linear combinations of the e", 's, Xn
=
L
p(~)e"'i
Ql,.··,OkEA
that converges to
X. 1
10.10 Definition. A complete denumerable system {en} of a pre-Hilbert space H of linearly independent vectors is called a basis of H.
b. Separable Hilbert spaces A metric space X is said to be separable if there exists a denumerable and dense family in X. Suppose now that H is a separable pre-Hilbert space, and {x n } is a denumerable dense subset of H; then necessarily {x n } is a complete system in H. Therefore, if we inductively eliminate from the family {x n } all elements that are linearly dependent on the preceding ones, we construct an at most denumerable basis of vectors {Yn} of H. Even more, applying the iterative process of Gram-Schmidt, see Chapter 3, to the basis {Yn}, we produce an at most denumerable orthonormal basis of H, thus concluding that every separable pre-Hilbert space has an at most denumerable orthonormal basis. The converse holds, too. If {en} is an at most denumerable complete system in H and, for all n, Vn is the family of the linear combinations of el, e2, ... , en with rational coefficients (or, in the complex case, with coefficients with rational real and imaginary parts), then Un Vn is dense in H. We therefore can state the following. 10.11 Theorem. A pre-Hilbert space H is separable if and only if it has an at most denumerable orthonormal basis. 1
Notice that a basis, in the sense just defined, need not be a basis in the algebraic sense. In fact, though every element in H is the limit of finite linear combinations of elements of {e",}, it need not be a finite linear combination of elements of {e",}. Actually, it is a theorem that any algebraic basis of an infinite-dimensional Banach space has a nondenumerable cardinality.
356
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
orU8E,PP£ VlTALI
GEOMETRIA
ERGI!SNISSE DER MATHEMATIK UNn IHRER GRENZGEBIIITE Ifu.A~~ON,,'I'(IoH
~~-~=·~·:~~~~~~~~~t:'~~: 'I(,SQ{NlnT·1l '1;f"I1, .. U_IJo,."
_ _ _ ",,,l.101£1l.·,,t.ntl
_
ELLO SPAZIO HILBERTIA 0
OR.IviED LlNEAR SPACES MAHLO
MDAY
.nINOItJl·VU.t.AC ..... U)l Ooml'-C.IS Ht:IDII!I,.U"O
....
-_. ...-._-------
BOLOGNA :srOOLA ZANIOB1l:Ltrt
Figure 10.2. Frontispieces of Geometria nello Spazio Hilbertiano by Giuseppe Vitali (1875-1932) and a volume on normed spaces.
10.12 Example. The following is an example of a nonseparable pre-Hilbert space: the space of all real functions f that are nonzero in at most a denumerable set of points {t;} (varying with J) and moreover satisfy 2:i f(t;)2 < 00 with inner product (x I y) = 2: x(t)y(t), the sum being restricted to points where x(t)y(t) i- O.
10.13 Remark. Using Zorn's lemma, one can show that every Hilbert space has an orthonormal basis (nondenumerable if the space is nonseparable); also there exist nonseparable pre-Hilbert spaces with no orthonormal basis. Let H be a separable Hilbert space, let {en} be an orthonormal basis on H and let Pn : H -+ H be the orthogonal projection on the finitedimensional subspace H n := Span {el' e2, ... , en}. If L : H -+ Y is a linear operator from H into a linear normed space Y, set Ln(x) := LoPn(x) 'r/x E H. Since the Ln's are obviously continuous, H n being finite dimensional, and IILn(x) - L(x)lly -+ 0 'r/x E H, we infer from the Banach-Steinhaus theorem the following. 10.14 Proposition. Any linear map L : H space into a normed space Y is bounded.
-+
Y from a separable Hilbert
Therefore linear unbounded operators on a separable pre-Hilbert space L : D -+ Yare necessarily defined only on a dense subset D ~ H of a separable Hilbert space. There exist instead noncontinuous linear operators from a nonseparable Hilbert space into lR.. 10.15 Example. Let X be the Banach space co of infinitesimal real sequences, cf. Exercise 9.158, and let f: X ---+ lR be defined by f((clil,a2, ... )):= a1. Then kerf =
10.1 Hilbert Spaces
357
{(an) E CO I a1 = O} is closed. To get an example of a dense hyperplane, let {en} be the element of co such that e k = Ok,n and let x O be the element of co given by x~ = lin, so that {x O, e 1, e 2 , •.. } is a linearly independent set in co. Denote by l3 a Hamel basis (i.e., an algebraic basis) in Co which contains {x O ,e 1 ,e2 , ... }, and set
l3 = where bi
#
{x o,e e 1
2
,
, ... } U {
bi
liE I}
x O , en for any i and n. Define 00
f: co
~
JR,
f(aox
o
+ '2: ane n + '2: a ibi ) = n=l
Since en E ker f Vn 10.16~.
aD·
iEI
2': 1, ker f is dense in Co but clearly ker f # Co.
Formulate similar examples in the Hilbert space of Example 10.12.
c. Fourier series and £2 We shall now show that there exist essentially only two separable Hilbert spaces: £2(lR) and £z(C). As we have seen, if H is a finite-dimensional pre-Hilbert space, and (el, e2, ... , en) is an orthonormal basis of H, we have n X
n
= ~)xlej) ej,
Z
Ilxli =
2: l(xlej)1
2
.
j=l
j=l
We now extend these formulas to separable Hilbert spaces. Let H be a separable pre-Hilbert space and let {en} be an orthonormal set of H. For x E H, the Fourier coefficients of x with respect to {en} are defined as the sequence {(xlej)}j, and the Fourier series ofx as the series 00
2:(x!ej)ej,
j=l whose partial n-sum is the orthogonal projection Pn(x) of x into the finitedimensional space Vn := Span {el, ez, ... , en}, n
Pn(x) = 2:(xlej) ej' j=l
Three questions naturally arise: what is the image of x E H?
Does the Fourier series of x converge? Does it converge to x? The rest of this section will answer these questions.
358
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.17 Proposition (Bessel's inequality). Let {en} be an orthonormal set in the pre-Hilbert space. Then 00
L
l(xlek)1
2
IIxl1 2
:::;
(10.2)
't/xEH.
k=l Proof. Since for all n the orthogonal projection of x on the finite-dimensional subspace Vn := Span {e1, e2, ... , en} is Pn(x) = L:~=o(xlek)ek, the Pythagorean theorem yields n
L l(xlek)1 2 = IlPn(X)[[2 = k=O When n
~ 00,
Ilx[1 2 -llx -
Pn (x)11 2 S;
Ilx11 2 . o
we get the Bessel inequality (10.2).
10.18 Proposition. Let {en} be an at most denumerable set in a preHilbert space H. The following claims are equivalent.
(i) {en} is complete. (ii) 't/x E H we have x = L~o(xiek)ek in H, equivalently Ilx-Pn(x)11
o as n -+ 00.
(iii) (PARSEVAL'S FORMULA), (iv) 't/x, y E H we have
IIxl1 2 = L~o l(xlekW 't/x E H
-+
holds.
00
(xly)
=
L(x!ej) (ylej). j=l
In this case x
=0
if (xiek)
= 0 't/k.
Proof. (i) {'? (ii). Suppose the set {en} is complete. For every x E Hand n E N, we find finite combinations of e1, e2, ... , en that converge to x, n
IIx - snll ~ o.
Sn:= L O
If Pn(x) = L:~=1 (xlek)ek is the orthogonal projection of x in Vn = Span {e1, .. . , en}, we have, as Sn E Vn , Ilx - Pn(x)11 S; Ilx - snll ~ 0,
therefore x = L:k=o(xlek)ek in H. The converse (ii) (ii) {'? (iii) follows from
=}
(i) is trivial.
n
L l(xlek)1 2 = IIPn (x)1I 2 = k=O
IIxl1 2 -llx -
Pn(x)11 2
when n ~ CXl. (ii) implies (iv) since the inner product is continuous, 00
(xly) = (L(xlei)ei i=l
00
I L(x!ej)ej) j=l
00
00
= L
(xlei) (ylej) (eilej) = L(xlej) (ylej). i,j=l j=l
and (iv) trivially implies (iii). Finally (iii) implies that x = 0 if (xlek)
= 0 Vk.
o
10.1 Hilbert Spaces
359
10.19 Proposition. Let H be a Hilbert space and let ~n} be an orthonormal set of H. Given any sequence {Ck} such that Lj=o ICkl2 < 00, then the series z=;:o cjej converges to H. If moreover {en} is complete, then 00
x = 2:)xlej) ej
\:IxE H.
j=l Proof. Define Xn := 2:.1=1 cjej' As
n+p
I!x n+p
-
xnjj2 =
L
Jej12,
j=n+1 {X n } is a Cauchy sequence in H, hence it converges to y := 2:~o cjej E H. On account of the continuity of the scalar product 00
(yjej) =
(L
00
Ci
eijej) =
i=l
L
Ci
(eijej) =
Cj
i=l
for all j. If x E Hand Cj := (xlej) Vj, then (x - yjej) = 0 Vj, and, since {en} is complete, Proposition 10.18 yields x = y. 0
Let H be a pre-Hilbert space. Let us explicitly interpret the previous results as information on the linear map defined by
xEH, that maps x E H into the sequence of its Fourier coefficients. o Bessel's inequality says that F(x) E £2 \:Ix E H and that F: H continuous, actually
--+
£2 is
00
jIF(x)lj~2 :=
L
j(xjej)j2 ::;
IjxW,
j=l
o if {en} is a complete orthonormal set in H, then Parseval's formula says that F : H --+ £2 is an isometry between H and its image F(H) C £2, in particular F : H --+ £2 is injective, o if H is complete and {en} is a complete orthonormal set, then, according to Proposition 10.19, - the series Z=;:1 cjej converges in H for every choice of the sequence {Cj} C £2, that is, F is surjective onto £2, - the inverse map of F, F- 1 : £2 --+ H, is given by 00
F- 1 ({Cj}) Therefore, we can state the following.
=
LCjej. j=l
360
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.20 Theorem. Every separable Hilbert space Hover lR (respectively over C) is isometric to £2(lR) (respectively to £2(C))' More precisely, given an orthonormal basis {en} C H, the coordinate map [ : £2(IK) ~ H (lK = lR if H is real, resp. IK = C if H is complex), given by 00
[({cd):= LCkek, k=O
is a surjective isometry of Hilbert spaces and its inverse maps any x E H into the sequence of the corresponding Fourier coefficients {(xlej)}j.
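The correspondence with ℓ₂ can be observed numerically. A minimal sketch (Python; the trigonometric orthonormal system of Example 10.9(ii) on [0,1] and the test function f(x) = x are illustrative choices, and the quadrature is only approximate):

```python
import numpy as np

# L^2(0,1) inner product approximated on a fine grid.
x = np.linspace(0.0, 1.0, 20001)
inner = lambda g, h: np.trapz(g * h, x)

f = x.copy()                                   # test function f(x) = x
basis = [np.ones_like(x)]                      # orthonormal trigonometric system
for k in range(1, 40):
    basis.append(np.sqrt(2) * np.cos(2 * np.pi * k * x))
    basis.append(np.sqrt(2) * np.sin(2 * np.pi * k * x))

coeffs = np.array([inner(f, e) for e in basis])   # Fourier coefficients (x|e_j)

# Bessel: sum of squared coefficients stays below ||f||^2 = 1/3 and approaches
# it as more terms are added (Parseval in the limit).
print(np.sum(coeffs**2), inner(f, f))

partial = sum(c * e for c, e in zip(coeffs, basis))
print(np.sqrt(inner(f - partial, f - partial)))   # ||f - P_n f||_{L^2} is already small
```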
Finally, we conclude with the following. 10.21 Theorem (Riesz-Fisher). Let H be a Hilbert space and let {en} be an at most denumerable orthonormal set of H. Then the following statements are equivalent.
(i) {en} is a basis of H. (ii) Vx E H we have x = 2::;:1 (xlej )ej. (iii) Ilx - Pn(x)11 ~ 0, where Pn is the orthogonal projection onto Vn := Span {e1, e2, ... , en}. (iv) (PARSEVAL'S FORMULA or ENERGY EQUALITY) Ilxll = 2::;:1 l(xlejW holds Vx E H. (v) (xly) = 2::;:1 (xlej) (Ylej). (vi) if (xlej) = 0 Vj then x = O. Proof. The equivalences of (i), (ii), (iii), (iv), (v) and of (i) '* (vi) were proved in Proposition 10.18. It remains to show that (vi) implies (i). Suppose that {en} is not complete. Then there is y E H with IIyl12 > L~11(Ylej)12, while, on the other hand, Bessel's inequality and Proposition 10.19 show that there is z E H such that z := L~o(ylej)ej, and by Parseval's formula, IIzl1 2 = L~1 l(ylej)1 2. Consequently IIzl1 2 < Ily112. But, on account of the continuity of the scalar product 00
(zlek)
00
= (2)ylej)ej j=1
lek)
= "L(ylej)(ejlek) = (ylek) j=1
Le., (y - zlek) = 0 Vk. Then by (vi) y = z, a contradiction.
o
d. Some orthonormal polynomials in L 2 Let I be an interval on lR and let p : I ~ lR be a continuous function that is positive in the interior of I and such that for all n 2: 0 jlt1np(t) dt <
+00.
The function p is often called a weight in I. The subspace ~ of C°(1, C) of functions x(t) such that j 1x(t)1 2p(t) dt <
00
10.1 Hilbert Spaces
361
is a linear space and
(xly)
:=
1
x(t)y(t)p(t) dt
defines a Hermitian product on it. This way Vp is a pre-Hilbert space. Also, one easily sees that the monomials {t n } n 2: 0, are linearly independent; Gram-Schmidt's orthonormalization process then produces orthonormal polynomials {Pn (t)} of degree n with respect to the weight p. Classical examples are o JACOBI POLYNOMIALS I n . They correspond to the choice
1:= [-1,1],
p(t)
o LEGENDRE POLYNOMIALS
:= (1 - t)Q(1
+ t)i3,
ex,j3 > -1.
Pn . They correspond to the choice ex = j3 =
in Jacobi polynomials I n , Le.,
p(t)
1= [-1,1]' o
:= 1.
CHEBYCHEV POLYNOMIALS Tn. They correspond to the choice ex -1/2 in Jacobi polynomials I n , i.e.
1= [-1,1]'
p(t):=
°
=
j3 =
1
~.
V
1- t 2
o LAGUERRE POLYNOMIALS Ln. They correspond to the choice
1= [0, +00]' o HERMITE POLYNOMIALS H n . They correspond to the choice
1:= [-00, +00]' One can show that the polynomials {In}, {Pn }, {Tn}, {L n }, {Hn } form respectively, a basis in Vp . Denoting by {R n } the system of orthonormal polynomials with respect to p(t) obtained by applying the Gram-Schmidt procedure to {t n }, n 2: 0, the Rn's have interesting properties. First, we explicitly notice the following properties o (AI) for all n, R n is orthogonal to any polynomial of degree less than n, o (A2) for all n the polynomial Rn(t) - tRn-1(t) has degree less than n, hence
(tRn-1IR n ) = (RnIR n ), o (A3) for all x, y, Z E Vp we have (xylz)
= (xyzll) = (xIYz).
10.22 Proposition (Zeros of R n ). Every R n has n real distinct roots in the interior of I.
362
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
Proof. Since JI Rn(t)p(t) dt = 0, it follows that R n changes sign at least once in I. Let tl < ... < t r be the points in int (1) in which R n changes sign. Let us show that r = n. Suppose r < n and let Q(t):= (t - tl)(t - tz) ... (t - t r ), then RnQ has constant sign, hence, (RnIQ)
f= 0, that contradicts property (A1).
0
10.23 Proposition (Recurrence relations). There exist two sequences {An}, {fln} of real numbers such that for n 2': 2
Proof. Since deg(Rn - tRn-l)
~
n - 1, then n-l
Rn(t) - tRn-r(t) =
L CiRi(t), i=O
and for i ~ n - 1, we have -(tRn-lIR;) = ci(RilR;). By (A3), we have (tRn-l!Ri) = (RilR;), hence, ifi+1 < n-1, then (tRn-llR;) = 0 from which Ci = 0 for i = n-2, n-l. For i = n - 2, property (A2) shows that
-(Rn-lIRn-d = cn-z(Rn-zIRn-z), hence cn-z
o
< O.
10.24'. Define
dn 2 _(t _l)n. 2 n n! dt n (i) Integrating by parts show that {Qn} is an orthogonal system in [-1,1] with respect to p(t) = 1, and that (QnIQn) = 2j(2n + 1). (ii) Show that Qn(l) = 1 and that Qn is given in terms of Legendre polynomias {Pn } by Qn(t) = Pn (t)jPn (l). Finally, compute Pn (l). (iii) Show that the polynomials {Qn} satisfy the recurrence relation Qn(t)
:= -
1
nQn = (2n - l)Qn-l - (n - 1)Qn-2 and solve the linear ODE
-d ( ( l -2t dQn) ) - +n(n+1)Qn=0. dt dt 10.25'. In Vp with p(t) := e- t and 1= [0, +00[' define 1 dn Q (t):= _et_(e-tt n ). n n! dtn
(i) Show that degQn = n and that {Qn} is a system of orthogonal polynomials in Vp . Then compute (QnIQn). (ii) Show that Qn(O) = 1, and, in terms of Laguerre polynomials,
Q (t) = Ln(t) . n
Ln(O)
Compute then Ln(O). (iii) Show that eat E Vp for all a ~ 0 and compute its Fourier coefficients "'tn,a with respect to {Qn}. (iv) Show that E"'tn,aQn(t) = eat in Vp .
10.2 The Abstract Dirichlet's Principle and Orthogonality
363
METHODE DER MATHEMATI CHEN PHYSIK
LI EAR OPERATOR PART I:
R. COURANT
1m>
':=:::':I~
D. HILBERT
':~~
_- _ -_....... - . - ------
GENERAL THEORY
..
KIUO. DtJ'JI'ftIlLD'" JACOI '1'. 1ICHW.uTZ: -,.._ _ ......,...,.... -"""'1-.nlInI
~
....... ........-...
..
-.n
ERSTER BAND
~
""""'.
w-..c......... ..-nG...... -..-._
'--eN' ",_.lo_
...-....,..
--..,....
...........,-.4~
1lIw ........... A....
...
,.,..,_~
~C-'~/Jl ....-#.
--
........
I ..........,..... ...........
jOtot<4.ll.h.1fJel;S """""'.~
B£R.LlN VERLAG VON JULIUS SPRINGE. 1f)1
Figure 10.3. The frontispieces of two classical monographs.
(v) Changing variable and using the Stone-Weierstrass theorem, show that {e- nt } is a basis in Vp • 10.26~.
Define the polynomials Qn(t) by
(i) Show that {Qn} is an orthogonal system in Vp with I = [0, +oo[ and p(t) = e- t Show that each Qn(t) is proportional to the Hermite polynomial H n . (ii) Show that Qo = 1, Ql = 2t and that for n 2': 2
2 .
Qn(t) = 2tQn-dt) - 2(n -1)Qn-l(t). (iii) Show that Qn satisfies Q~(t) - 2tQ~(t)
+ 2nQn(t)
= 0 and that Q~(t)
2nQn-l (t).
10.2 The Abstract Dirichlet's Principle and Orthogonality The aim of this section is to illustrate some aspects of the linear geometry of Hilbert spaces mainly in connection with the abstract formulation of the Dirichlet principle. In its concrete formulation, this principle has played a fundamental role in the geometric theory of functions by Riemann, in the theory of partial differential equations, for instance, when dealing with
364
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
gravitational or electromagnetic fields and in the calculus of variations. On the other hand, in its abstract formulation it turns out to be a simple orthogonal projection theorem.
a. The abstract Dirichlet's principle Let H be a real (complex) Hilbert space with scalar (respectively Hermitian) product ( I ) and norm Ilull := J(ulu). Let JK = IR if H is real or JK := C if H is complex. Recall that a linear continuous functional L on H is a linear map L : H -+ IR such that
IL(u)1 :S K
lIull
VUEH;
(10.3)
the smallest constant K for which (10.3) holds is called the norm of L, denoted IILII so that IL(u)1 :S
IILllllull, Vu E H,
and, see Section 9.4,
IILII =
sup IL(u)1
Ilull=l
IL(u)1
= sup -11-1-1 . u#O
u
We denote by H* = .c(H, JK) the space of linear continuous functionals on H, called the dual of H.
10.27 Theorem (Abstract Dirichlet's principle). Let H be a real or complex Hilbert space and let L E H*. The functional F : H -+ IR defined by F(u) :=
1
211ul12 -
R(L(u))
(10.4)
achieves a unique minimum point u in H, and every minimizing sequence, i. e., every sequence {Uk} C H such that F(Uk)
-+
inf F(v),
vEH
converges to u in H. Moreover u is characterized as the unique solution of the linear equation (10.5) Vip E H. (iplu) = L(ip) In particular
Ilull = IILII·
Proof. Let us prove that F has a minimum point. First we notice that F is bounded from below, since, recalling the inequality 2ab ::; a 2 + b2 , we have for all v E H
hence oX
:=
infvEH F(v) E JR. Then we observe that, by the parallelogram law,
10.2 The Abstract Dirichlet's Principle and Orthogonality
!llxl1 2 -
365
L(x)
XL
·····:····l··· Figure 10.4. The Dirichlet's principle.
2 1 2+ 2"llvl12 1 41 11u - vii 2 = 2"llu[[ - IIU+VI1 -2_ = F(u)
~(L(u)_~(L(v)) + 2~( L(U; V))
+ F(v)
(10.6)
(U+V)
- 2F - 2 - .
Thus, if {ukl is a minimizing sequence, by (10.6)
+ Uh) - 2F (Uk -2 -
1 4[[Uk - uhl[ 2 = F(Uk)
+ F(Uh) :::; F(Uk) + F(Uh)
- 2,\
--->
0
as h, k ---> 00. Therefore {Uk} is a Cauchy sequence in H and converges to some u E H; by continuity F(Uk) ---> F(u) hence F(u) = '\. This proves existence of the minimizer
u. If {vkl is another minimizing sequence for F, (10.6) yields [[Uk - vk II ---> 0, and this proves that the minimizer is unique and that every minimizing sequence converges to u in H. Let us show that u solves (10.5). Fix r.p E H, and consider the real function E ---> F(u + E
F(u + E
E
1 E2 + ~[(r.plu) = -11r.p112 2
= O. We deduce ~«
Vr.p E
(
F(v
L(
2
= F(v) + ~«
hence, as z = 0
H.
(10.7)
r.p E H
1 1 2+ ~(vlr.p) + _[[r.p[[2 + r.p) = -llvl1 -
2
Vr.p E H
- L(r.p))
~(L(v)) - ~(L(
1
+ -11r.p112 = 2
F(v)
1
+ -11r.p112, 2
hence F(v + JR. Finally, we infer from (10.5)
IILII =
sup IL(
I[
'1'#0
11r.p11
Iluli. o
366
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
ACAUUllt Ul
SCllNCBS Dt NONellt
L.£<;O O'A AL.Y K FO CTIO NEL.LE
Figure 10.5. Frigyes Riesz (1880-1956) and the frontispiece of a classical monograph.
b. Riesz's theorem In particular, we have proved the following. 10.28 Theorem (Riesz). For every linear continuous functional L E H* there exists a unique UL E H such that
L(cp) = (cpIUL) Moreover
lIuL/i =
Vcp E H.
(10.8)
IILII·
Actually, we have also proved that Riesz's theorem and the abstract Dirichlet's principle are equivalent. 10.29 Continuous dependence and Riesz's operator. If u solves the minimum problem (10.4), or equation (10.8), we have lIull = IILII· This implies that the solution of (10.4) or (10.8) depends continuously on L. In fact, if L n , L E H* and IIL n - LII --> 0, and if Un, U E H solve (cplu n ) = L(cp) and (cplu) = L(cp) Vcp E H, then
(cplu n
-
u) = (L n
-
L)(cp)
Vcp E H,
hence
lIu n
-
ull = IIL n
-
LII--> 0.
10.30 Riesz's operator. The map f : H* --> H that associates to each L E H* the solution UL of (10.8) is called Riesz's map. It is easily seen that f : H* --> H is linear and by Riesz's theorem we have Ilf(L)11 lIuL/i = liLli, i.e., not only is r continuous, but
10.2 The Abstract Dirichlet's Principle and Orthogonality
r : H*
10.31 Theorem. Riesz's map andH.
-+
367
H is an isometry between H*
c. The orthogonal projection theorem Let us now extend the orthogonal projection theorem onto finite-dimensional subspaces, see Chapter 3, to closed subspaces of a Hilbert space. Let H be a Hilbert space and V a subspace of H. If f E H, then the map L : V -+ JK, v -+ L(v) = Ulv) is a linear continuous operator on V with IILII ::; Ilfll since IUlv)1 ::; Ilfllllvll "Iv E V by the Cauchy-Schwarz inequality. Since a closed linear subspace V of a Hilbert space H is again a Hilbert space with the induced inner product, a simple consequence of Theorem 10.27 is the following. 10.32 Theorem (Projection theorem). Let V be a closed linear subspace of a Hilbert space H. Then for every f E H there is a unique point u E V of minimum distance from f, that is
Ilf - ull =
dist U, V) := inf{ If
- cpll cp E V}.
Moreover, u is characterized as the unique point such that f - u is orthogonal to V, i.e., U - ulcp) = 0 "Icp E V. Proof. We have for all v E V Ilv - 111 2 = IIvl1 2 - 2?R(vlf) + 11/11 2 . Theorem 10.27, when applied to F(v) := IIvl12 - 2?R(flv), v E V, yields existence of a unique minimizer u E V of IIv - IW, hence of v ---+ Ilv - III. The characterization of u given by Riesz's theorem states, in our case, that u is also the unique solution of
2(
= 2(
V
o Let V be a subspace of a Hilbert space H. We denote by V..L the class of vectors of H orthogonal to V V..L := {x E
HI(xlv) = 0 "Iv E V}.
Clearly V..L is a closed subspace of H.
10.33 Corollary. If V is a linear closed subspace of a Hilbert space H, then H = V EB V..L, i. e., every u E H uniquely decomposes as u = v + w, where v E V and w E V..L . 10.34'. Show that, if V is a linear subspace of a Hilbert space H, then V1- is closed and that (V 1-)1- is the closure of V.
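In finite dimensions the projection theorem reduces to least squares; a small sketch (Python, with arbitrary illustrative data: V is the span of three vectors in ℝ¹⁰):

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((10, 3))     # columns span the closed subspace V
f = rng.standard_normal(10)

coeff, *_ = np.linalg.lstsq(W, f, rcond=None)
Pf = W @ coeff                        # orthogonal projection of f onto V

print(np.max(np.abs(W.T @ (f - Pf))))         # f - Pf is orthogonal to V (~1e-15)

# Pf is the point of V of minimum distance from f.
for _ in range(3):
    v = W @ rng.standard_normal(3)
    print(np.linalg.norm(f - Pf) <= np.linalg.norm(f - v))   # True
```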
368
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.35~. Show that the orthogonal projection theorem is in fact equivalent to Riesz's theorem and, consequently, to the abstract Dirichlet's principle. [Hint: We give the scheme of the proof leaving it to the reader to add details. Uniqueness and IluL11 = IILII follow from (10.8). Let us prove the existence of a solution of (10.8). Suppose L is not identically zero, then ker L = L -1 ( {O}) is a linear closed proper subspace of Hand there exists uo E ker L.L such that uo i- 0 and L(uo) = 1. Since u - L(u)uo E ker L Vu E H, we have
u
= w+L(u)uo
with w E ker Land Uo E ker L.L.
Multiplying by uo, we then find L(u) = (u
11,,"09 2 ) '
d. Projection operators Let V be a linear closed subspace of a Hilbert space H. The projection theorem defines a linear continuous operator Pv : H ----+ H that maps f E H into its orthogonal projection Pv f E V; of course IIPvl1 S 1 and 1m (Pv ) = V. Also
P~ and the formula H
=
= Pv 0 Pv = Pv
V EB V -l can be written as
Id = Pv
+ PVl-,
PVPvl- = PVl-PV = O.
For the reader's convenience, we only prove that Pv(J + g) = Pv(J) + Pv(g). Trivially, f + 9 - Pv(J + g) 1.. V and f + 9 - Pv f - Pvg 1.. V; since there is a unique U E V such that f + 9 - h 1.. V, we conclude Pv(J + g) = Pv(J) + Pv(g). 10.36~. Let P : H -+ H be a linear operator such that p2 = P. Then P is continuous if and only if ker P and 1m P are closed.
10.37~. If V is a closed subspace of a Hilbert space Hand {en} is a denumerable orthonormal basis of V, then the orthogonal projection of x E H is given by 00
00
Px:= 2)Pxlej)ej = L(xlej)ej. j=1
j=1
10.3 Bilinear Forms From now on we shall only consider real vector spaces, though one could develop similar results for sesquilinear forms on complex vector spaces.
10.3 Bilinear Forms
369
10.3.1 Linear operators and bilinear forms a. Linear operators Let H be a real Hilbert space. As we know, the space £(H, H) of linear continuous operators, also called bounded operators from H into H, is a Banach space with the norm IITxl1
IITII=~~~~. If T E £(H, H), we denote by N(T) and R(T) respectively the kernel and the image or range of T. Since T is continuous, N(T) = T- 1 ( {O}) is closed in H, while in general R(T) is not closed. The restriction of T to N (T)J.. , T: N(T)J.. ---+ R(T) is of course a linear bijection, therefore, from Banach's open mapping theorem, d. Section 9.4, we infer the following.
10.38 Proposition. Let T E £(H, H). Then T has a closed range in H if and only if there exists C > 0 such that Ilxl! ::; C IITxl1 Vx E N(T)J.., that is, if and only if T- 1 : R(T) ---+ N(T)J.. is a bounded operator, or, equivalently, if and only if T : N(T)J.. ---+ R(T) is an isomorphism. b. Adjoint operator Let X, Y be two real Hilbert spaces endowed with their inner products ( I )x and ( I )y, and let T E £(X, Y). For any y E Y the map x ---+ (Txly)y is a linear continuous form on X, hence Riesz's theorem yields a unique element T*y E X such that (xIT*y)x
=
(Txly)y
Vx E X, Vy E Y.
(10.9)
It is easily seen that the map T* : Y ---+ X just defined is a linear operator called the adjoint of T. Moreover, from (10.9) T* is a bounded operator with IIT*II = IITII· Obviously, if S, T E £(H, H) (TS)*
=
S*T*,
(T*)*
= T.
10.39'. Suppose that P: H -> H is a linear continuous operator such that p 2 = P and P* = P. Show that V := P(H) is a closed subspace of H and that P is the orthogonal projection onto V.
An operator L : H T*
---+
H on a Hilbert space H is called self-adjoint if
= T, i.e., (xITy)
=
(Txly)
Vx,y E H.
It follows from (10.9) that R(T)J.. = N(T*). Consequently R(T) = N(T*)J.. and using the open mapping theorem, we conclude the following.
10.40 Corollary. Let T E £(H, H) be a bounded operator with closed range. Then we have
370
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
(i) The equation Tx = y is solvable if and only if y l- N(T*). (ii) T is an isomorphism between N(T).L and R(T) = N(T*).L. In particular the Moore-Penrose inverse Tt : H -+ H, defined by composing the orthogonal projection onto R(T) with the inverse of T1N(T).L, is a bounded operator. (iii) We have H = R(T) EB N(T*). Proof. (i) For x,y E H we have (yITx) by considering the orthogonals,
= (T*ylx).
Hence, R(T).L
= N(T*).
Therefore,
(ii) follows from the open mapping theorem and (iii) follows from (i) by considering the projections onto N(T*).L and N(T'). 0 10.41 ~. Let A E LeX, Y) be a bounded operator between Hilbert spaces. Show that (i) N(A) = N(A' A), (ii) R(A'):J R(A* A), (iii) R(A') = R(A* A) if and only if R(A) = N(A').L, i.e., if and only if R(A) is closed, (iv) if R(A' A) is closed, then R(A') and R(A) are closed. 10.42~. Let H be a Hilbert space and let T be a self-adjoint operator. Show that T is continuous. [Hint: Show that T has a closed graph.]
c. Bilinear forms Let H be a real vector space. A map 13 : H x H -+ JR which is linear on each factor is called a bilinear form. A bilinear form 13 : H x H -+ JR is called continuous or bounded if, for some constant A, we have
113(u, v)1
~ A Ilullllvll
and it is called coercive if there is 13(u, u) ;:::
'r/U,VE H,
>. > 0 such that
>'llul1 2
'r/uEH.
Finally, 13(u, v) is said to be symmetric if 13(u, v) = 13(v, u)
Any linear operator T : H
-+
'r/u,vEH.
H defines a bilinear form by
13(v, u) := (vITu),
and 13 is bounded if T is bounded since 113(v,u)1 ~ IITllllvllllull. Conversely, given a continuous bilinear form 13 : H x H -+ JR on a real Hilbert space H, (10.10) 'r/u,v EH, 113(u, v)1 ~ L Ilullllvll for any given u E H, the map v -+ 13(v, u) is a linear continuous operator on H, hence by Riesz's theorem, there exists Tu E H such that 13(v,u)
=
(vITu)
'r/vEH.
(10.11)
10.3 Bilinear Forms
IITII::; A since
It is easy to see that T is linear and, from (10.10) that jiTul1
2
= 13(Tu, u)
::;
371
AIITullllull.
Consequently, by (10.11), there is a complete equivalence between bilinear continuous forms on a real Hilbert space H and bounded linear operators from H into H. Also, by (10.11), coercive bilinear continuous forms correspond to bounded operators called coercive, i.e., such that for some A > 0 (uITu) 2: Allul1 2 Vu E H. Moreover, self-adjoint operators correspond to bilinear symmetric forms, in fact 13(v, u) -13(u, v)
=
(vITu) - (uITv)
=
(v[(T - T*)u)
VU,v E H.
10.3.2 Coercive symmetric bilinear forms a. Inner products Clearly, every symmetric continuous coercive bilinear form on H defines in H a new scalar product, which in turn induces a norm that is equivalent to the original, since A IluW
::; 13(u, u) ::; AIIul1 2
VUE H.
Replacing (ujv) with 13(u,v), Dirichlet's principle and Riesz's theorem read as follows. 10.43 Theorem. Let H be a real Hilbert space with inner product ( I ) and norm Ilull := ~ and let 13 : H x H -+ JR be a symmetric, continuous and coercive bilinear form on H, i.e., 13(u, v) = 13(v,u) andfor some A 2: A> 0
113(u,v)l::; Allullllvl[,
13(u,u) 2:
Allul1 2 ,
Vu,v E H;
finally, let L be a continuous linear form on H. Then the following equivalent claims hold:
(i)
(ABSTRACT DIRICHLET'S PRINCIPLE). The functional
1 F(u) := "213(u, u) - L(u)
has a unique minimizer U E H, every minimizing sequence converges to U, u in H, u solves 13(
(ii)
= L(
V
(RIESZ'S THEOREM) The equation
13(
UL
= L(
E H.
V
(10.12)
372
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
Moreover U = UL and
Ilud :::; tiILII·
The continuity estimate for UL follows from
~llud :::;
y'/3(UL' UL)
1
= sup
:::; JAIILII.
IL(u)1 #0 y'/3(u, u)
In terms of operators Theorem 10.43 may be rephrased as follows.
10.44 Theorem. Let T be a continuous, coercive, self-adjoint operator on H, i.e.,
IITull:::; IITllllull,
(uITu) 2':
Allul1 2 ,
Vu E H, T = T*. and IIT-111 :::; l/A.
A> 0,
Then T is invertible with continuous inverse, Proof. From coercivity we infer that
AIIul1 2 ::;
l(uITu)1 ::; IlullllTul1, (10.13) hence N(T) = {O} and T- l : R(T) ---> H is continuous. T is therefore an isomorphism between Hand R(T), and therefore R(T) is closed, R(T) = R(T) = N(T*).L = N(T).L = Hand (10.13) rewrites as liT-lull::; tllull Vu E H. 0
A variational proof of Theorem 10.44. For any y E H, consider the bounded operator ---> JR, L( JR
L :H
B(
B is bounded and symmetric, T being bounded and self-adjoint. Moreover, the coercivity implies that B(
AlluW::; B(u,u)::; IITllllullllull· By the abstract Dirichlet principle for the functional 1 F(u):= "2B(u,u) - L(u), u E H or Riesz's theorem, Theorem 10.43, we find x E H such that (
= B(
(
V
that is, Tx = y. Finally, from the coercivity assumption we infer that is,
IITII- l ::; t.
AIIxl1 2 ::;
l(xITx)1 ::;
IlxllllTxll, o
b. Green's operator Given a bilinear form in a real Hilbert space as above, the Green operator associated to /3 is the operator f B : H* -> H that maps L E H* into the unique solution UL,B E H of /3('1', UL,B) = L('P) V'P E H. It is easily seen that fB is linear and the estimate IluL,BII :::; t11L11 says that fB is continuous. Of course, if f is the Riesz operator and T : H -> H is an isomorphism such that /3(v, u) = (vITu), then fB = T- 1 0 f. 10.45'. Under the assumptions of Theorem 10.43, let K C H be a closed convex set of a real Hilbert space. Show that (i) the functional F(u) has a unique minimizer U E K, (ii) U is the unique solution u E K of the variational inequality
uEK,
B(u,u-v)::;L(v)
VvEK.
10.3 Bilinear Forms
373
c. Ritz's method The Dirichlet principle answers the question of the existence and uniqueness of the minimizer of F(u)
:=
1
:2B(u, u) - L(u)
and characterizes such a minimizer as the unique solution of B( v, uL) L(v) 'Iv E H. But, how can one compute UL? If H is a separable Hilbert space, there is an easy answer. In fact, since B( u, v) is an inner product, we can find a complete system in H which is orthonormal with respect to B,
B(ei, ej) = 8ij , such that every u E H uniquely writes as u = Z=;:lB(u,ej)ej, compare Theorem 10.20. If B(c.p,u) = L(c.p) , Vc.p E H, then B(ej,u) = L(ej), thus we have the following.
10.46 Theorem (Ritz's method). Let H be a separable real Hilbert space, B a symmetric coercive bilinear form, L E H* and {en} a complete orthonormal system with respect to B. Then L(v) = B(v,u) 'Iv E H has the unique solution 00
UL
= L L(ej)ej. j=l
This, of course, allows us to settle a procedure that, starting from a denumerable dense set of vectors {x n }, computes a system of orthonormal vectors with respect to B( , ) by the Gram-Schmidt method, and yields the approximations L(ej) ej of UL.
z=7=1
10.41~. With the notation of Theorem 10.46, show that for every integer N ~ 1, UN := 2:.7=1 L(ej )ej is the solution in Span {q, e2, ... , eN} of the system of N -linear equations
B(V,UN) = L(v), and the unique minimizer of 1 -B(v, v) - R(L(v)), 2 10.48~.
v E Span {e 1 , e2,···, eN}.
Show that the following error estimate for Ritz's method holds:
where F(u) := ~B(u, u) - L(u). [Hint: Compute F(u + v) - F(u).]
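A finite-dimensional sketch of Ritz's procedure (Python). Here H = ℝⁿ, B(u,v) = (Au|v) for an illustrative symmetric positive definite matrix A, and L(v) = (b|v); the B-orthonormal system is produced by Gram-Schmidt from the coordinate vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite: B(u,v) = (Au|v)
b = rng.standard_normal(n)           # L(v) = (b|v)

B = lambda u, v: u @ A @ v

# Gram-Schmidt of the coordinate vectors with respect to the inner product B.
E = []
for k in range(n):
    v = np.eye(n)[k]
    for e in E:
        v = v - B(v, e) * e
    E.append(v / np.sqrt(B(v, v)))

# Ritz: u = sum_j L(e_j) e_j solves B(v, u) = L(v) for all v, i.e. A u = b.
u = sum((b @ e) * e for e in E)
print(np.max(np.abs(A @ u - b)))     # ~ 1e-12
```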
374
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
d. Linear regression Let H, Y be Hilbert spaces and let A E 'c(H, Y). Given y E Y, we may find the minimum points u E H of the functional
F(u) := IIAu - yll~
uEH.
(10.14)
From the orthogonal projection theorem, we immediately infer the following.
10.49 Proposition. Let A E 'c(H, Y) be a bounded operator with closed range. Then the functional (10.14) has a minimum point U E H and, if u E H is another minimizer of (10.14), then u - U E N(A). Moreover, all minimum points are characterized as the points u E H such that Au - y 1R(A) i.e., as the solutions of
A*(Au - y)
=
O.
(10.15)
If N(A) = {O}, as N(A) = N(A* A), (10.15) has a unique solution, u = (A* A)-1 A*y. If N(A) i= {O}, the minimizer is not unique, so it is worth computing the minimizer of least norm, equivalently the only minimizer that belongs to N(A)-l, or the solution of
A*(AU - y) = 0, { U E N(A)-l.
(10.16)
Recall that, being R(A) closed, the map A1N(A)-L ---> R(A) is an isomorphism by the open mapping theorem. Consequently
At y := (A1N(A)-L ) -1 Qy,
Y E Y,
where Q is the orthogonal projection onto R(A), defines a bounded linear operator At : Y ---> H called the Moore-Penrose inverse of A. It is trivial to check that the solution U of (10.16) is U = At y . In the simplest case, N(A*) = {O}, we have R(A) = Y and (10.15) is equivalent to solving Au = y. Since we want to find a solution in N(A)-l, it is worth solving AA*z = y so that U = At y = A*(AA*)-1 y. In general, however, both AA* and A* A are singular and, in order to compute At y , we resort to an approximation argument. Consider the
penalized functional F>..(u) := IIAu - yll~
+ >. Ilull~
uEH,
(10.17)
where>' > 0, that we may also write as
F>..(u) =
IIyl12 -
2(Auly)y
+ IIAull~ + >. Ilull~·
Observing that L(u) := (Auly)y = (A*ylu)H belongs to 'c(H, IR) and that
B(v,u):= (AvIAu)y
+ >'(vlu)H = >'(vlu)H + (vIA* AU)H
10.3 Bilinear Forms
375
is a symmetric, bounded, coercive, bilinear form on H, it follows from the abstract Dirichlet principle, Theorem 10.43, that F).. has a unique minimizer u).. E H given by the unique solution of
Vr.p E H, i.e.,
(AId + A* A)u).. = A*y. We also get, multiplying both sides of (10.18) by u)..,
(10.18)
Allu)..llt + IIAu)..ll~ = (yIAu)..)y from which we infer the estimate independent on A (10.19) 10.50 Proposition. Let A E £(H, Y) be a bounded operator with closed range and for A > 0, let
u).. := (A Id + A* A)-l A*y E H,
be the unique minimizer of (10.17). Then {u)..} converges to At y in Hand
II (AId + A* A)
-1 -
At
11-+ 0
as A
Proof. Since R(A) is closed by hypothesis, there exists C
IlvllH ::; C IIAvlly
-+ 0+.
> 0,
such that
'Iv E N(A)-L.
(10.20)
Since AU>. = A*(y - Au>.) E R(A*) C N(A)-L, we get in particular from (10.19) and (10.20) lIu>.IIH ::; C IIAu>.lIy ::; C Ilylly, i.e., {u>.} is uniformly bounded in H. Let A, Jl > O. From (10.18) we have
-(AU>. - Jlup.) = A* A(u>. - Up.) from which we infer
IIA(u>. - up.)II} = (u>. - Up. 1AU>. - Jlup.)y ::; lIu>. - up.IIH ::; lIu>. -
up.IIH (IAlllu>. -
::; A Ilu>.
-
up.lI1-
up.IIH
+ IA -
+ IA - Jllllup.IIH Ilu>. -
jjAU>. -
Jlup.IIH
Jllllup.IIH) up.IIH.
Taking into account (10.20) and the boundedness of the up.'s we then infer
lIu>. -
up.ll1- ::; C 2 1IA(u>.
- up.)II}
::; C 2 A lIu>. - up.lI1-
+ C 3 1A -
Jllllylly lIu>. - up.IIH
that is, (10.21) provided 2C2 A < 1. For any {Ad, Ak ---+ 0+, we then infer from (10.21) that {U>'k} is a Cauchy sequence in N(A)-L, hence converges to u E N(A)-L. Passing to the limit in (10.18), we also get A*(Au - y) = 0, since {u>.} is bounded, i.e., u:= At y, as required. 0
376
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.3.3 Coercive nonsymmetric bilinear forms Riesz's theorem extends to nonsymmetric bilinear forms.
a. The Lax-Milgram theorem As for finite systems of linear equations in a finite number of unknowns, in order to solve Tx = y, it is often worth first solving TT*x = y or T*Tx = T*y, since TT* and TT* are self-adjoint. We proceed in this way to prove the following. 10.51 Theorem (Lax-Milgram). Let 13(u, v) be a continuous and coercive bilinear form on a Hilbert space H, i. e., there exists A 2': A > 0 such that 13(u, u) 2': A [lu[1
113(u, v)1 ::; A I[ul[ IIvll,
Then for all L E H* there exists a unique 13(v, UL)
= L(v)
UL E
2
'Vu,v E H.
H such that
"Iv E H;
(10.22)
moreover IluLI[ ::; l/AI[LII, i.e., Green's operator associated to 13, H* ----* H, rB(L) := UL, is continuous. Proof. Let T : H
-t
H be the continuous linear operator associated to B by B(v,u) = (vITu)
The bilinear form
rB
Vu,v E H.
B(u, v) := (TT*u I v) = (T*u [T*v)
is trivially continuous and symmetric; it is also coercive, in fact,
Riesz's theorem, Theorem 10.43, then yields a unique 'ilL E H such that
B(v,'ih) = L(v)
"Iv E H.
Thus UL := T*ih is a solution for (10.22). Uniqueness follows from the coercivity of B. 0
Equivalently we can state the following.
10.52 Theorem. Let T : H operator, I[Tull::; IITIII[ul[,
----*
H be a continuous and coercive linear (uITu) 2': AI[uW
'Vu E H
where A 2': A> O. Then T is injective and surjective; moreover its inverse T- l is a linear continuous and coercive operator with [IT-ll[ ::; A-I. 10.53~.
Show the equivalence of Theorems 10.51 and 10.52.
10.54~. Read Theorem 10.52 when H = ]Rn; in particular, interpret coercivity in terms of eigenvalues of the symmetric part of the matrix associated to T.
10.3 Bilinear Forms
377
b. Faedo-Galerkin method If H is a separable Hilbert space, the solution UL of the linear equation (10.22) can be approximated by a procedure similar to the one of Ritz. Let H be a separable Hilbert space and let {en} be a complete orthonormal system in H. For every integer N, we define VN := Span {e1, ... , eN} and let PN : H ~ H be the orthogonal projection on VN and UN to be the solution of the equation (10.23) . I.e.,.m coord'mat es,
UN :=
h L.."i=l x i ei were
",N
N
L B(ei, ej)xj = L(ei),
Vi
= 1, ... ,N.
j=l Notice that the system has a unique solution since the matrix B, B ij B(ei, ej) has N linearly independent columns as B is coercive.
10.55 Theorem (Faedo-Galerkin). The sequence in H.
{UN}
converges to
UL
Proof. We have >.jjUN -
uLII 2
~ B(UN - UL, UN - UL) = B(UN,UN)
+ B(UL,ud -
B(UN,ud - B(UL,UN)
= B(UL,UL -UN),
since B(UN,uL) = L(UN) = B(UN,UN). It suffices to show that for every 'I' E H (10.24)
N-too.
as
We first observe that the sequence {UN} is bounded in H by I[L[[/>. since>' [[UNW ~ B(UN, UN) = L(uN) ~ IILI[ lIuNIi. On the other hand, we infer from (10.22) that (10.25)
'i'P E H,
hence B('P, UN - uL) = B('P - PN'P, uN - ud
+ B(PN'P, UN
- UL)
= B('P - PN'P, uN - UL),
and IB('P, UN - UL)[ ~
AIluN -
UL[[
11'1' -
PN'PII ~
Then (10.24) follows since [['I' - PN'P[I -t 0 as N -t
00.
A
2):II L [[ 11'1' -
PN'P[['
o
378
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.4 Linear Compact Operators In Chapter 4 we presented a rather complete study of linear operators in finite-dimensional spaces. The study of linear operators in infinitedimensional spaces is more complicated. As we have seen, several important linear operators are not continuous, and moreover, linear continuous operators may have a nonclosed range: we may have {xd c H, {yd c Y such that TXk = Yk, Yk ~ Y E Y, but the equation Tx = Y has no solution. Here we shall confine ourselves to discussing compact perturbations of the identity for which we prove that the range or image is closed. We notice however, that for some applications this is not sufficient, and a spectral theory for both bounded and unbounded self-adjoint operators has been developed. But we shall not deal with these topics.
10.4.1 Fredholm-Riesz-Schauder theory a. Linear compact operators Let H be a real or complex Hilbert space. Recall, d. Chapter 9, 10.56 Definition. A linear operator K : H ~ H is said to be compact if and only if K is continuous and maps bounded sets into sets with compact closure. The set of compact operators in H is denoted by K(H, H). Therefore K : H ~ H is compact if and only if K is continuous and every bounded sequence {un} C H has a subsequence {Uh n } such that K(UhJ converges in H. Also K(H, H) c £(H,H). Moreover, every linear continuous operator with finite range is a compact operator, in particular every linear operator on H is compact if H has finite dimension. On the other hand, since the identity map on H is not compact if dim H = +00, we conclude that K(H, H) is a proper subset of £(H, H) if dim H = +00. Exercise 10.89 shows that compact operators need not have finitedimensional range. However, d. Theorem 9.140,
10.57 Theorem. K(H, H) is the closure of the space of the linear continuous operators of finite-dimensional range. Proof. Suppose that the sequence of linear continuous operators with finite-dimensional range {An} converges to A E L:.(H,H), IIA n - All ---t O. Then by (i) Theorem 9.140 A is compact. Conversely, suppose that A is compact, and let B be the unit ball of H. Then A(B) has compact closure, hence for all n there is a lin-net covering A(B), i.e., there are points Yl, Y2, .. . , YN E A(B), N = N(n), such that A(B) C Uf'=l B(Yj, lin). Define Vn := Span {Yl, Y2,· .. , YN}, let Pn : H ---t Vn be the orthogonal projection onto Vn and An := Pn 0 A. Clearly each An has finite-dimensional range, thus it suffices to prove that IIA n - All ---t O. For all x E B we find i E {1, 2, ... , N} such that IIAx - ydl :s; lin, hence, since PnYi = Yi and IlPnzll :s; Ilzll,
10.4 Linear Compact Operators
I"QUI.I'ID
379
PI: *,\'Or.'U,~
... _
. . . L... .......,.· ... 1ol
I.E
a......u.
Y rell£
D'EQUATJO' LI 'EAIRE A NB INFI 'T~ D·' CONNU&
PAIUll. ..
~
O"'GTtlt•• ·"I.L-l'~ IV"lliIlv.-un .. l'~ ..... Iou .... Q'·UIUI, U .. '''u ..,lIu••.•••. ~"Cn-:J,_-'''''
Figure 10.6. Marcel Riesz (1886-1969) and the frontispiece of a volume by Frigyes Riesz (1880-1956).
llJU
IlPn Ax - Axil::::; IlPnAx - PnYi11 + IlPnYi - Axil::::; 211Ax - Yill : : ; 2/n for all x E B.
D
10.58 Proposition. Let K E K(H, H). Then the adjoint K* of K is compact and AK and K A are compact provided A E £( H, H). Proof. The second part of the claim is trivial. We shall prove the first part. Let {Un} C H be a bounded sequence, Ilunll : : ; M. Then {K*u n } is also bounded, hence {KK*u n } has a bounded subsequence, still denoted by {KK*u n }, that converges. This implies that {K*u n } is a Cauchy sequence since IIK*Ui - K*uj112
= (K*(Ui
- uj)IK*(Ui - Uj))
::::; 2MIIKK*(Ui
= (Ui
- ujIKK*(Ui - Uj))
-uj)ll· D
b. The alternative theorem Let A E £(H, H) be a bounded operator with bounded inverse. A linear operator T E £(H, H) of the form T = A + K, where K E K(H, H), is called a compact perturbation of A. Typical examples are the compact perturbations of the identity, T = Id + K, K E K( H, H), to which we can always reduce T = A + K = A(Id + A-I K). The following theorem, that we already know in finite dimension, holds for compact perturbations of the identity. It is due to Frigyes Riesz (18801956) and extends previous results of Ivar Fredholm (1866-1927).
380
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
10.59 Theorem (Alternative). Let H be a Hilbert space and let T = A + K : H --; H be a compact perturbation of an operator A E £(H, H) with bounded inverse. Then
(i) R(T) is closed, (ii) N(T) and N(T*) are finite-dimensional linear subspaces; moreover, dimN(T) - dimN(T*)
= O.
The following lemma will be needed in the proof of the theorem.
10.60 Lemma. Let T = Id + K be a compact perturbation of the identity. If {x n } cHis a bounded sequence such that TX n --; y, then there exist a subsequence {Xk n } of {x n } and x E H such that Xk n
--;
x
and
Tx
= y.
Proof. Since {x n } is bounded and K is compact, we find a subsequence {Xk n } of {x n } and z E H such that KXkn ----+ Z. It follows that Xk n = TXkn - KXkn ----+ y - z =: x, and, since K is continuous, TXkn ----+ Tx = x + Kx = x - z = y. 0 Proof of Theorem 10.59. Since T = A + K = A( Id + A-I K), A has a bounded inverse, A-I K is compact, we shall assume without loss of generality that A = Id. Step 1. First we show that there is a constant C > 0 such that Vx E N(T).L.
Ilxll::; CllTxl1
(10.26)
Suppose this is not true. Then there exists a sequence {x n } C N(T).L such that Ilxnll = 1 and IIT(xn)ll----+ O. T(x n ) ----+ 0 and Lemma 10.60 yield a subsequence {Xk n } of {x n } and x E H such that Xk n ----+ x and Tx = O. The first condition yields x E N(T).L and IIxll = 1, while the second x E N(T). A contradiction. It follows from (10.26) that T is an isomorphism between the Hilbert space N(T).L and R(T), hence R(T) is complete, thus closed. This proves (i). Step 2. By Lemma 10.60 every bounded sequence in N(T) has a convergent subsequence. Riesz's theorem, Theorem 9.21, then yields that dim N(T) < +00. Similarly, one shows that dim N(T') < 00. The rest of the claim is trivial if K is self-adjoint. Otherwise, we may proceed as follows, also compare 10.62 below. We use the fact that every compact operator is the limit of operators with finitedimensional range, Theorem 10.57. First we assume T = Id+K, K of finite-dimensional range. In this case K: N(K).L ----+ R(K) is an isomorphism, in particular dimR(K') = dimN(K).L = dimR(K). Since T = Id + K, we have N(T) C R(K) and N(T') C R(K), hence N(T) and N(T') are finite-dimensional, and (ii) is proved. Let V := R(K) + R(K'). Trivially N(T), N(T') C V and T and T' map V into itself. The rank theorem then yields dimN(T) = dimN(T'). This proves the claim (ii) if K has finite-dimensional range. Step 3. Returning to the case K compact, by the approximation theorem, Theorem 10.57, there is a linear continuous operator Kl with finite-dimensional range such that 11K - KIll < 1. If Q:= Kl - K, the series L~1 Qj converges in £(H, H) and 00
(Id - Q) LQj = Id. j=1
In particular, Id - Q is invertible with bounded inverse L~1 Qj. Therefore we can write
T = Id+K = Id - Q+Kl = (Id - Q)(Id+ (Id - Q)-IKl) =: A(Id+ B) where B has finite-dimensional range; the claim (ii) then follows from Step 2.
0
10.4 Linear Compact Operators
381
c. Some facts related to the alternative theorem We collect here a few different proofs of some of the claims of the alternative theorem, since they are of interest by themselves. 10.61 R(Id + K) is closed. As we know, this is equivalent to R(T) = N(T*).1., i.e., to show that for every f E N(T*).1. the equation Tu := u + K(u) = f is solvable. To show this, we can use Riesz's theorem. Given f E N(T*).1., we try to solve TT*v = f, i.e.,
b(
= (
Vip E H,
(10.27)
where
b(
:=
(TT*vl
=
(T*v[T*
If v E H solves (10.27), then u := T*v solves Tu = f.
(Tul
Vip E H.
We notice that N(TT*) = N(T*), therefore the bilinear bounded form b(
Ilenl! = 1 and
Otherwise, there exists a sequence {en} C N(T*).1. with
b(en,en ) = lien
+ K*en 11
2
(10.28)
--t
0.
By Lemma 10.60, there exists e E H and a subsequence {qn} of {en} such that
Te
= e + Ke = 0;
in particular Ilell = 1, e E N(T*) and e E N(T), a contradiction. We then conclude that b(
b(
Vip E N(T*).1.,
Ilvll ::;
~11f[1. c
(10.29)
It remains to show that v solves (10.27). If P is the orthogonal projection of H into N(T*), then (10.29) is equivalent to
b(P
Vip E H.
On the other hand,
(
b(
since f and v are in N('J'*).1., hence b(
10.62 Another proof of dim N(T) = dim N(T*). Step 1. Let us prove the equality if Tor T* is injective. Let HI := R(T) and, by induction Hj+1 := T(Hj). H j is a nonincreasing sequence of closed subspaces of H. We claim that there exists n such that HTf = HnVn 2: n. If not, we can find {en} C R(H) with Ilenll = 1 and en E HnnH;;+I' Since for n > m, T(e n ), T(e m ), en E H m + l , em E H;;+I' and
Ken - Kem = (en we may infer
+ Ken) -
(em
+ Kern)
- en
2
+ em
liKen - Kern l1 2 = IIzl12 + Ileml1 2: 1: a contradiction, since {K(en)} has a convergent subsequence.
:=
z
+ ern,
382
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
If N(T) = {O} and HI = R(T) "I- H, then necessarily HHI "I- H j Vj since T is injective, and this is not possible, as we have seen. Hence H = R(T) and N(T*) =
R(T).l = {O}. If N(T*) = {O}, then repeating the above consideration for Id+K* we get N(T) = {O}. Step 2. Let us prove that dim N(T) ~ dim R(T).l. Assume that dim N(T) < dim R(T).l. Then there exists a linear continuous operator L that maps the finite-dimensional space N(T) into the finite-dimensional space R(T).l with L injective but not surjective. Let us extend L as a linear operator from H to R(T).l by setting Lx = 0 'Ix E N(T).l. Then L has a finite-dimensional range, thus it is compact. Now we claim that N(Id + L + K) = {O}. In fact u + Ku + Lu = 0 implies Tu = u + Ku = -Lu and, since Tu E R(T) and Lu E R(T).l, we infer Tu = Lu = 0, i.e., u E N(T) and u E N(T).l, since L is injective when restricted to N(T); in conclusion u = O. Step 1 then says that Id + K + L is surjective. This is a contradiction, since the equation u + Ku + Lu = v has no solution when u E R(T).l, v'f- R(L). Step 3. Replacing K by K* in the above proves that dimR(T).l = dimN(T*) ~ dimR(T*).l = dimN(T), which completes the proof.
10.63 Yet another proof of dim N(T) = dim N(T*). Let H be a separable Hilbert space, T = Id + K be a compact perturbation of the identity, and let {ei} be a complete orthonormal system for H, ordered in such a way that N(T) + N(T*) is generated by the first elements el, e2, ... , ek. Proposition. Let Vn = Span {el, e2, ... , en}, Pn be the orthogonal projection over Vn . Then there exists a constant I > 0 and an integer no such that 'In ~ no
Proof. Suppose the conclusion is not true; then for a sequence ni 'Pi E Vni n N(T).l we have
-t
00
of vectors (10.30)
By Lemma 10.60 for a subsequence {'Pk i } and 'P E H we then have K 'Pk i Pnx - t X as n - t 00, we infer
-t
-'P. Since
hence 'Pk i - t 'P in H, since 'Pi = PniT'Pi - PniK('Pi), and finally 'P + K'P particular II 'PI I = 1 and 'P E N(T) n N(T).l, a contradiction.
=
O. In 0
From the previous proposition, if {'PI, 'P2, . .. , 'Ps} is a family of linearly independent vectors, then Pn T( 'PI), ... , Pn T('Ps) are also linearly independent, at least for n large enough; on the other hand, since R(T) = N(T*).l, the vectors PnT('PI), ... , PnT('Ps) belong to PnR(T) = Vn n N(T*).l. Hence we have dim Vn n N(T*).1 ~ dim Vn n N(T).l for n large enough. Similarly one proves dim Vn n N(T).l ~ dim Vn n N(T*).l, hence
for n large enough. The claim then follows by considering the orthogonal complements.
10.4 Linear Compact Operators
383
d. The alternative theorem in Banach spaces The alternative theorem generalizes to the so-called Fredholm operators between Banach spaces X and Y of which compact perturbations of the identity are special cases. Let X be a real Banach space on IK = IR or IK = iC and X* := L:(X, IK) its dual space, which is a Banach space with the dual norm
Ilcpll =
sup Icp(x)l, Ilxll=l
VcpEX*.
If cp E X * and x EX, we often write < cp, x > for cp(x ). Clearly, the bilinear map < , >: X* x X ---t IK, defined by < cp,x >= cp(x), is continuous,
1< cp,x > I ~
Ilcpllllxll
Vcp E X*, "Ix E X.
In general, X* is not isomorphic to X, contrary to the case of Hilbert spaces. If X and Yare Banach spaces and if T : X ---t Y is a linear bounded operator, the dual or adjoint operator T* : y* ---t X* is defined by
< T*(cp), x >:=< cp, Tx > .
(10.31)
T* is continuous and IIT*II = IITII. 10.64~. Let TEL:(H, H), where H is a Hilbert space. We then have two notions of adjoint operators: as the operator T* : H ---t H in (10.9) Chapter 10 and as the operator T: H* ---t H* defined in (10.31). Show that, if C : H* ---t H is Riesz's operator, then T = C-l 0 T* 0 C.
For a subset V C X of a Banach space X, we define
Vi-
:= {cp E X* I < cp, x >= 0 "Ix E V}
called the annihilator of V. Notice that Vi- is closed in X*. We have
10.65 Lemma. Let T : X R(T*) = N(T)i-.
---t
X be a bounded linear operator. Then N (T*) = R(T) i- ,
The class of linear compact operators on a Banach space, denoted by K(X, X) is a closed subset of L:(X, X). But in general these operators are not limits of linear operators with finite-dimensional range, contrary to the case X = H, where H is a Hilbert space as shown by a famous example due to Lindemann and Strauss. Recall that we can always approximate K E K(X, X) by nonlinear operators with range contained in a finite-dimensional subspace, see Theorem 9.140. We can now state, but we omit the proof, the following result.
10.66 Theorem (Alternative). Let X be a Banach space and let T = A X be a compact perturbation of an isomorphism A : X ---t X. Then
+K
:X
---t
(i) R(T) is closed, (ii) N(T) and N(T*) have finite dimension, and dimN(T) = dimN(T*). Consequently, we have the following.
10.67 Corollary (Alternative). Let A, K E L:(X, X) where A is a linear isomorphism of X and K is compact. Then the equation Ax + K x = y is solvable if and only if y E N(T*)i-.
384
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
e. The spectrum of compact operators 10.68 Definition. Let H be a Hilbert space on JK, JK = JR. or JK = C, and let L E £( H, H) be a bounded linear operator on H. The resolvent p(L) of the operator L is defined as the set p(L)
=
{A E JK I (AId -
and its complement o-(L)
= JK \
L)-l is a bounded operator}
p(L) is called the spectrum of L.
By the open mapping theorem
0-( L)
:= {
A E JK IA Id - L is not injective or surjective}.
(10.32)
10.69 Definition. Let L E £(H, H). Then the pointwise spectrum of L is defined as o-p(L) :=
{A E JK lAId -
L is not injective}.
The points in o-p(L) are called eigenvalues of L, and the elements of
N (A Id - L) are called the eigenvectors of L corresponding to A. Of course, o-p(L) C o-(L) and, if dimH < +00, o-p(L) = o-(L) as, in this case, a linear operator is injective if and only if it is surjective. If dim H = +00, there exist, as we know, linear bounded operators which are injective but not surjective, hence, in general 0-p (L) =f 0-( L).
10.70 Remark. In the sequel we shall deal with compact operators L. For these operators the equality o-p(L) = o-(L) also follows from the alternative theorem of the previous section. As in the finite-dimensional case, see Proposition 4.5, eigenvectors corresponding to distinct eigenvalues are linearly independent. Moreover (10.33) because, if IAI > Id +
liLli,
then
II±LII < 1, therefore, see Proposition 9.106,
±L, equivalently AId + L, is invertible and 00
(AId + L) -1 = ~) -l)j An- j Lj , j=O
hence A E p(L). The following theorem gives a complete description of the spectrum of a linear compact operator.
10.4 Linear Compact Operators
10.71 Theorem. Let H be a Hilbert space with dim H K E K(H, H) be a compact operator. Then
+00
385
and let
(i) 0 E o-(K), (ii) K has either a finite number of eigenvalues or an infinite sequence of eigenvalues that converges to
o.
(iii) the eigenspaces corresponding to nonzero eigenvalues have finite dimenswn, (iv) if A =I=- 0 and A is not an eigenvalue for K, then AId - K is an isomorphism of H and (AId - K)-l is continuous,
(v) a(K) \ {O} = ap(K) \ {O}. Proof. (i) In fact R(K) ¥ H, since K is compact. (ii) From (10.33) the set of eigenvalues A is bounded, thus either A is finite or A has an accumulation point. Let us prove that in the latter case, A has only 0 as an accumulation point; we then conclude that A is denumerable, actually a sequence converging to zero. Suppose {An} is a sequence of nonzero eigenvalues with corresponding eigenvectors {Un} such that An -> A ¥ O. Set!Jon:= l/A n and V n := Span{uI, U2, ... , Un}, and notice that, if w := 2:::;'=1 CjUj E Vn , then AnW - Kw = 2:::;'=1 Cj(An - Aj)Uj E Vn - l . We now construct a new sequence {v n } with Ilvnll = 1 by choosing VI E VI and, for n ?: 2, Vn E V n n V nl._ 1 . Clearly Vn is an eigenvector corresponding to An and, according to the previous remark, Vn -!JonKvn E V n -l. For n > m we then find Vn -!JonKvn, !JomKvm E V n - 1 , Vn E V nl._ 1 and K(!Jonvn -!JomVm) = Vn - (vn -!JonKvn
+ !JomKvm)
=: Vn - z,
with Vn E Vn~1 and Z E V n - l . Thus we conclude
a contradiction, since {!JonUn} is bounded and K is compact. In conclusion A = O. (iii), (iv) are part of the claims of the alternative theorem, and (v) follows from (iv).
0
10.72 Remark. Actually, Theorem 10.71 holds under the more general assumption that H is a Banach space. In this case it is known as the Riesz-Schauder theorem.
10.4.2 Compact self-adjoint operators Let us discuss more specifically the spectral properties of linear self-adjoint operators. a. Self-adjoint operators 10.73 Proposition. Let H be a real Hilbert space and L : H bounded self-adjoint linear operator. Set
m:= inf (Lulu), lul=l
Then
M:= sup (Lulu). lul=l
----t
H be a
386
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
(i) eigenvectors corresponding to distinct eigenvalues are orthogonal, (ii) m, ME a(L), (iii) IILII
= sUPllull=ll(uILu)1 = max(lml, IMI)·
Also, if L is a bounded self-adjoint operator in a complex Hilbert space, then (uILu) E IR Vu E H, consequently all eigenvalues are real, moreover (i), (ii) and (iii) hold. Proof. (i) In fact Lu
= AU and Lv = J.tv,
A, J.t E JR, A t'- J.t yield
(A - J.t)(u I v) = (Lu I v) - (u I Lv) = O. We now prove that for all u E H IIMu - Lull
:s; I(Mu - Lulu)11/2,
2 Ilmu - Lull:S; I(mu - LuluW/ .
(10.34)
The bilinear form b(u, v) := (Mu - Lulv) is symmetric and nonnegative, b(u,u) ~ 0; the Cauchy-Schwarz inequality then yields
I(Mu - Lulv)1 :s; I(Mu - LuluW/ 21(Mv - LvlvW/ 2 :s; GI(Mu - LuluW/ 2 1Ivll· By choosing v = Mu - Lu, the first of (10.34) follows. A similar argument yields the second of (10.34). (ii) Let us prove that M E O"(L); similarly one proves that m E O"(L). Let {ud be a sequence such that Ilukll = 1 and (Lukluk) ---> M. Because of (10.34) MUk - LUk ---> 0 in H. If M is in the resolvent, then Mu - Lu is one-to-one and onto with continuous inverse because of the open mapping theorem. Thus
Uk
:=
(MId - L)-l(Muk - LUk)
--->
0,
that contradicts Iluk" = 1. (iii) Set a := sUPllull=l I(Lulu)l; of course max(lMI, 1m!) = a and a show that IILII :s; a. Since L is self-adjoint
41R(Lulv) = (L(u
+ v)lu + v) -
:s; IILII. Let us
(L(u - v)lu - v),
hence, according to the parallelogram law,
Replacing u and v with
EU,
Vl€
respectively, € > 0, we find
41(Lulv)1 :s; 2amin(€21IuI1 2 €
+ Ilv~12) €
= 4allullllvll.
Hence, if v := Lu, we have IILul1
2
:s; allullllLull,
Le.,
IILII:S; a.
In the complex case we have
(Lulu)
= (uIL*u) = (uILu) = (Lulu)
hence (Lulu) E JR. We leave to the reader the completion of the proof.
o
We notice that the proof of (iii) Proposition 10.73 uses the continuity of (MId - L)-l when M E p(L). If L is compact, this is a consequence of the alternative theorem and the open mapping theorem is not actually needed.
10.4 Linear Compact Operators
387
10.74 Corollary. Let L : H -+ H be a linear compact self-adjoint operator. Then there exists an eigenvalue A of L such that IILII = IAI. Proof. If L = 0, then A = 0 is an eigenvalue. If L i- 0, then IILII = max(lm!'IMI) i- 0 and M, mE a(L). Assuming IILII = 1M!, then M i- 0 and, according to Theorem 10.71, ME ap(L), i.e., M is an eigenvalue of L. Alternatively, we can proceed more directly as follows. Let {un} be a sequence with Ilunll = 1 such that (Lunlun) ---> M; then (Mu n - LUnlun) ---> 0, and by (10.34) MUn - LU n ---> 0 in H. Since L is compact, there is u E H and a subsequence Uk n of {Un} such that Uk n ---> u hence
Ilull =
Mu- Lu= 0,
1,
o
i.e., M is an eigenvalue for L.
b. Spectral theorem 10.75 Theorem (Spectral theorem). Let H be a real or complex Hilbert space and K a linear self-adjoint compact operator. Denote by W the family of finite linear combinations of eigenvectors of K corresponding to nonzero eigenvalues. Then W is dense in N(K).J.... In particular, N(K).J... has an at most denumerable orthonormal basis of eigenvectors of K. If Pj is the orthogonal projection on the eigenspace corresponding to the nonzero eigenvalue Aj, then 00
in £(H, H). Proof. We order the nonzero eigenvalues as Ai
i- Aj
for i
i- j,
and set N j := N(Aj Id - K) for the finite-dimensional eigenspace corresponding to Aj. According to Proposition 10.73
N j .1 N k
for j
i-
k
and
N(K).l Nj 'v'j,
hence N(K) C W-L. To prove that W is dense in N(K)-L, it suffices to show that W = N(K)-L or W-L = N(K). Define if K has no nonzero eigenvalues, if K has at least n nonzero eigenvalues, if K has only p
nonzero eigenvalues
and Vn := W,t. Trivially W-L = nnVn. Notice that, since K is self-adjoint K(W,t) C W,t if K(Wn ) C W n and the linear operator KlVn E .c(Vn , Vn ) is again compact and self-adjoint. Moreover, the spectrum of K lVn is made by the eigenvalues of K different from {AI, A2, ... , An}. Therefore by Corollary 10.74
IIKlVn
II = {I An+ll o
h~
if K at least n otherWIse.
+ 1 eigenvalues,
(10.35)
If K has a finite number of eigenvalues, then V = Vn and (10.35) yields K(Vn ) = {O}, Le., V = Vn C N(T).
388
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
If K has a denumerable set {An} of eigenvalues, then hence
IAnl
-> 0 by Theorem 10.71,
IIKWII ~ IIKWnl1 = IAn+11-> 0
and K(V) {O}, Le., V C N(K). Choosing an orthonormal set of eigenvectors in each eigenspace N j , we can produce an orthonormal system {en} of eigenvectors of K corresponding to nonzero eigenvalues, Le., such that
(eilej) = 8ij ,
Kej = Ajej,
that is complete in the closure W = N(K).L of W. Let us prove the last part of the claim. Let Pj and Qn be the orthogonal projections respectively, on N j and W n . Since the eigenspaces are orthogonal, we have K(Qn(x)) = 2::;'=1 AjPj(x) \/X E H, hence n
'L AjPj(X) = K(x - Qn(X)),
Kx -
j=l and therefore
IIKx - I:>-jPj(x)11 j=l
~ IIKWn 1IIIx - Qn(x)11 ~ IAn+111Ixll,
Le., k 11
K - 'LAjPjll j=l
the conclusion then follows since
IAnl -> 0 as
~ IA n +11;
n ->
o
00.
c. Compact normal operators A linear bounded operator T E £(H, H) in a complex Hilbert space H is called normal if T*T = TT*. It is easy to show that if T is normal, then
(i) N(T) = N(T*T) = N(TT*) = N(T*), (ii) N(T - AId) = N(T* - "XId), that is, T and T* have the same eigenspaces and conjugate eigenvalues. If T is normal, the operators
T+T* A:=--2-
T-T* B:= -2-i-
are self-adjoint and commute, AB = BA. Two linear compact self-adjoint operators that commute have the same spectral resolution, see Theorem 4.29 for the finite-dimensional case. 10.76 Theorem. Let H be a complex Hilbert space and A, B two linear
compact self-adjoint operators in H such that AB = BA. Then there exists a denumerable orthonormal system {en} which is complete in (N(A) n N(B)).L and made by common eigenvectors of A and B. If Aj and J.Lj are respectively, the eigenvalue of A and the eigenvalue of B relative to ej, and Pj : H ~ H is the orthogonal projection onto Span {ej}, Pjx:= (x[ej)ej, then 00 00 A= LAjPj , j=l
in £(H, H).
10.4 Linear Compact Operators
ZUI' Alnbrl
d~r
389
FlnkuoDalopuaUOntn ond Theori
del" Dormalt'D Optl'alOrtll. T.
J.... S,,1lUIlll
-,
\a hIM.
I .... K _ ......
111""-
l.o..~~ofCdIll.tdl"""~~~ t . . Dec ..... (H I-W) fill; don UtotcJ1u;w, lkr l . - . 1Illd. -. _\~~_)
.........
...w..
(a.1. ...
IIIl ~ ~II.
-., s.na.
.mJ-
:l.*b1lllftll~_~ ~~~ Qw..')
(.,t ib ....tlJul. . .
.....~
1l«I.
_~
...u-........
...
...
DIiIII." . . . ~paaAIC~ .. 1llI"DdalU '-(Dmpkullj ~ ~ tJo~ 'Pu ~ y6! ..mimt tA.ko.,;
.,
1~
~
.-u ~ .-M EMmM:~
lIlis
f.,.· '
!I,l
,.~
-
IIllIl
•
...--....K.,i ... ~1"'11MIJuc,_ _ . , . - . I»IIlqpt !bdl. ttO'J.1.
",,_t'tt.
au.
) . . """'O
.w......
'P
{.. .
c.... -~
~~l
IiIl . . . . . ~ D "" (ul _ o..a.: 4.,
•.•••,--.). .. WJiII
~_~_
"-w.na 1Uw'" fIlAMII
alI!~.w.
o,....,~
~
__
,.,..n..
~,.....,.,
1Il
Jf-II-" _/"."4'
..... .......lilMMa f{I)'" r:.~ _
rI ..... ~ \ . - v
w_.O.~
"'-) .Dd.t Ia . . . . . . ~.. ""
.. , ~ ...... :EMJ'~.An}.I ~a..,u. ....'\l..W_ac..lI,8oIMuM. liI'tl, Ma..
.........._&.._.,... ..
....
".......
U4·
-u
1(~'I+···+.. r.)-~.f,+.:.+.. lIf.
~(..p.~1'UrJm.~0nI,.
-.. .... r.... ..."....,ZolWm I.".,. ,f 011
........ ~
.. 0t0Ir-
...... 0pInI0r ...a-diU .w 'otc:-d- ~: 11M ~ S. IJI. jIda rwmioa f ~ " (6 ...... Il1o )--a-. ....
~ 1a. • ~ dIt E&J.we~ D.. IUd .. all ~eomuJ.~ ... ~ ~ 6 bldet D1lf l= ~ ~~t~'\UHl.6..". .
~
...,'UIl!w.._
....................... O.I_~ ~ ~ p l. . . (a. ..
WltJ"
.danIeOIq
a....Q
L"._~ . . ,..~d.-
""'-0 .........
........ _ . _ . . . , -w-_...l Qel!liuWJV..) 8J..apI J b.~" 'Hfdaeo o.~ '- ...... TrIl. ~ bIWUI ~ DoicI.t ~ Ib.nIl (WI ttl . . ~ap...~.dl.~"'~Speb:!II.
"),
_ .._
.,DIt~"",,,,,,,,1r_""'"
"'--,...,.~
..... W"
-' r....
~_v-~
--..........
"""'-r~
..._--
........ _M1 •
.---,.
Figure 10.7. Two pages from two papers by John von Neumann (1903-1957) in Mathe-
matische Annalen,
Proof. Let W be as in Theorem 10,75, As in the finite-dimensional case, see Proposition 4,27, for every eigenvalue A of A we find a basis of the corresponding eigenspace N(A Id - A) made by eigenvectors of B. By induction we then find a denumerable orthonormal system which is complete in Wand made of common eigenvectors {en} of A and B. By Theorem 10,75 then W = N(A)l- and {en} is a basis of N(A)l- of common eigenvectors of A and B, Now AB = BA implies that B(N(A)) C N(A), Therefore, applying the spectral theorem to B[N(A), we find further eigenvectors {Un} of B corresponding to nonzero eigenvalues that form a basis of N(A) n N(B)l-, The family {en} U {Un} is now a denumerable orthonormal set of eigenvectors common to A and B that is complete in (N(A) n N(B))l-, The second part of the claim easily follows by applying Theorem 10,75 to A and B,
0
10.77 Corollary. Let H be a complex Hilbert space and let T : H ----. H be a compact normal operator. Then there exists a denumerable basis {en} in H of common eigenvectors of T and T*. If Pj denotes the orthogonal projection on Span { ej} and Aj is the corresponding eigenvalue, then 00
T=LAjPj , j=l
00
T*
= LAjPj ,
in £(H,H).
j=l
Proof. Set A := (T + T*)j2 and B := (T - T*)j(2i), We can apply Theorem 10,76 and find a basis {en} in (N(A) n N(B))l-, i.e" a basis in ker(T)l- = ker(T*)l- made by common eigenvectors to T = A + iB and T* = A - iB. 0
390
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
d. The Courant-Hilbert-Schmidt theory In several instances one is led to discuss the existence and uniqueness of solutions in a Hilbert space H of equations of the type a(iP,u) - >"k(iP,u) = F(iP)
ViP E H
(10.36)
where F E H*, and a(iP, u), k(iP, u) are bounded bilinear forms in H. As we have seen, by Riesz's theorem, there exist bounded operators A, K E £(H, H) and f E H such that
a(iP, u) := (iPIAu),
k(iP, u) := (iPIKu),
for all u, iP E H. Then (10.36) reads equivalently as the linear equation in H (10.37) (A - >"K)u = f. With the previous notation suppose that
- A is continuous, self-adjoint and coercive on H, i.e., there exists v > 0 such that (10.38) VuEH, a(u, u) ~ v IIul1 2 - K is compact, self-adjoint and positive, i.e., Vu =I- 0, u E H.
k(u, u) = (uIKu) > 0
(10.39)
With these assumptions, the corresponding bilinear forms are continuous and symmetric; moreover a(v, u) defines an inner product in H equivalent to the original one (vlu) since
Finally, see Theorem 10.44, A has a continuous inverse. The operator A >..K is therefore a compact perturbation of an isomorphism, and, since A and K are self-adjoint, the alternative theorem yields the following.
10.78 Theorem. The equation Au + >..Ku = f has a solution if and only if f is orthogonal to the solutions of Au - >..Ku = o. Now we want to study the equation
Au->"Ku=O equivalently,
a(iP,u) - >"k(iP,u) = 0 which can be rewritten as 1
:x-u - A
-1
Ku = O.
With the assumptions we have made o A-I K is a linear compact operator,
ViP E H,
10.4 Linear Compact Operators
391
THE
THEORY OF
OU D
If'( TYO 'lOUIW13
""""'"
&«~UNrlOft"p'oI!ia~·.D'~"D
fIItw'lOU"
OOl'U !'"UILICUI01'l.$
Figure 10.8. Lord William Strutt Rayleigh (1842-1919) and the frontispiece of his Theory of Sound.
o A-IK is positive, since a(u,A-IKu) = (u[AA-IKu) = (uIKu) > 0 for u =I=- 0, o A-I K is self-adjoint with respect to the inner product a(v, u), since
a(v, A-I Ku)
= (v[Ku) = (uIKv) = a(u,A- I Kv).
10.79 Definition. We shall say that A =I=- 0 is an eigenvalue of (A, K) and that u is a eigenvector of (A, K) corresponding to A if 1/ A is an eigenvalue of A-I K and u is a corresponding eigenvector, i. e., a solution ofAu-AKu=O. The theory previously developed, when applied to the self-adjoint compact operator A-I K in the Hilbert space H with the inner product a(v, u), yields the following.
10.80 Theorem. Let H be an infinite-dimensional Hilbert space and let A and K E £(H, H) be self-adjoint, for A coercive and K compact. The equation Au- AKu = 0 has zero as its unique solution except for a sequence {An} of positive real numbers such that An ---.... +00. For any such An, the vector space of solutions of Au- AnKU = 0 is finite dimensional. Moreover, if W is the family of finite linear combinations of eigenvectors of (A, K), then W is dense in H. In particular, there exists a complete orthonormal system in H of eigenvectors of (A, K) such that Vi,j. Proof. The eigenvalues of A-I K are positive since A-I K is positive. Since A-I K is compact, A-I K has a denumerable sequence of eigenvalues {J.Ln} and J.Ln -+ 0+ and
392
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
the corresponding eigenspaces are finite dimensional by Theorem 10.71. Consequently Au - AKu = 0 has nonzero solutions for the sequence An = 1/J.l,n -> +00. The spectral theorem yields the density of W in H and the existence of an orthonormal basis of A -1 K with respect to the inner product a(v, u), 1 -1 -Uj - A KUj =
Aj
o.
Therefore, a(ui, Uj) = (jij, Ajk(Ui, Uj) = a(ui, Uj) = (jij and, if we set ej := J>:jUj, we conclude
o e. Variational characterization of eigenvalues 10.81 Theorem. Let H, A and K be as in Theorem 10.80. Let {en} be a basis in H of eigenvalues of (A, K) ordered in such a way that the corresponding eigenvalues An form a nondecreasing sequence {An}, An < An+l. Then each An is the minimum of the Rayleigh's quotients
u)}
. {a(u, A1 := mm -k ufO u,u() ' . {a(u, u) I ( ) . } An:=~~~ k(u,u) ku,ei =OVz=l, ... , n - l , if n
> 1.
Proof. For U E H write U = E~1 cjej so that k(u,ej) = Cj and k(u,u) = E~1 leil z . If U E Vn := {u I k(u, ei) = 0, Vi = 1, ... , n - I}, then Ci = 0 for i = 1, ... , n - 1, hence 00
k(u,u) =
L IcjlZ j=n
while 00
a(u,u) =
L
00
AjlcjlZ ::::: An
j=n
Therefore ~i:::j
L
ICjlZ = An k(u,u).
j=n
: : : An on Vn . On the other hand, en E Vn and a(en, en) = An keen, en~
Moreover with the previous notation and assumptions, we have the following.
10.82 Theorem (Min-max characterization). Denote by S a generic subspace of dimension n - 1. Then we have
An = mtxmin{ ~~~: ~~ u oJ 0, k(u, z) = 0 Vz E S}.
I
Proof. The inequality::::: follows from Theorem 10.81. Let S be a linear subspace of H of dimension n - 1 and Vn := Span {et, ez, ... , en}. Choose a nonzero vector Uo := E~=1 O
this is possible since dim S = n - 1. Then
10.5 Exercises
393
n
k(uo,uo):= La;, i=l
n
n
.Ai a; ~ .An La;
=L
a(uo, uo)
i=l
hence min{ a(u, u)
k(u,u)
= .An k(uo, uo),
i=l
I k(u, z) = °Vz E S} <
a(uo, uo) < .An. - k(uo,uo) -
o
10.5 Exercises 10.83~. Let P v : H -> H be the orthogonal projection onto the closed subspace Vof a Hilbert space H. Show that o Pv + P w is a projection in a closed subspace if and only if Pv P w = 0, and in this case Pv + P w = PViBW , o P v Pw is a projection on a closed subspace if and only if Pv Pw = Pw Pv, and in this case, Pv P w = P vnw .
1O.84~. Let S, T E £(H, H). Show that (S +T)* = S* +T*, (ST)* = T*S*, (.AT)* = XT*, Id* = Id, T** = T.
10.85~. Let T E £(H, H). Show that
IITI1 2 = IIT*11 2 = IITT*II = IIT*TII.
10.86~. Show that Hilbert's cube {x E
£211xnl ~ lin} is compact, while {x E 1} is not compact. Show also that Hilbert's cube has no interior points, i.e., its complement is dense.
£211x n l
~
10.87~.
Show the following.
Proposition. Let L E £(H, H) be a bounded self-adjoint operator on a real or complex Hilbert space and M:= sup (Lulu). m:= inf (Lulu), lul=l lul=l Then (i) ap(L) C [m, M], (ii) we have (Lulu) = M IluW (resp. (Lulu) = m Ilu[[2) if and only if u is an eigenvector of L corresponding to M (respectively mY,
[Hint: (i) Proceed by contradiction using Riesz's theorem; in the complex case, first show that ap(L) C R (ii) Use (10.34).] 10.88~.
Show the following.
Proposition. Let H be a Hilbert space, {.Aj} a sequence of nonzero real numbers converging to 0, {ej} an orthonormal set in Hand P j : H -> H the orthogonal projection onto Span {ej}. Show that the series Z=~1 .AjPj converges in £(H, H). Moreover, if 00
K:= L.AjPj j=l
then
in £(H,H),
(10.40)
394
(i) (ii) (iii) (iv) (v) (vi)
10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
Vx E H, Kx = E~lAj(xlej)ej, for all j, Aj is a nonzero eigenvalue of K and Kej = Ajej for all j, the sequence {Aj} is the set of all nonzero eigenvalues of K, K is self-adjoint and compact, {ej} is a basis of ker K.l., in particular ker K.l. is sepamble, if Ai 0 and A i Aj Vj, then (AId - K)-l is an isomorphism of H into itself.
If H is a complex Hilbert space and Aj E iC, then (i), (ii), (iii), (iv), (vi) still hold and K is compact and normal. [Hint: (vi) follows from (v) and the alternative theorem. Moreover, an explicit bound for (A Id - K)-l follows from (i), assuming H separable. Choose an orthonormal basis {Zn} in ker K. Then show that the equation (A Id K)x = y has a unique solution
(y-I X ek) k x = (AId - K) -1 = ~ L..J k=l A - Ak Then
Ixl2
=
" -(y 1 + 'L..J I z",)z",. '" A 2
~ I(y I ek)1 + '" I(y I z",)1 < (~ + ~) I 2
L..J IA - A 12 k=l k
A2
L..J '"
d2
-
A2
12 y,
if d := mink EN IA - Akl.] 10.89~.
T(ej) =
and let {en} be a basis of H. Consider the linear operator tej,Let H2 be1, separable i.e., j
00
1
T(u);= L --:(ulej)ej. j=l J Show that T is compact with a nonclosed range. [Hint: Show that T is the limit in £(H, H) of a sequence of linear operators with finite-dimensional range. Then show that v E R(T) if and only if E~ljl(vlej)12 < +00. Choose Vo ;= E~lj-3/2ej,
Vn := E~l j-3/2+l/n and show that Vo rf:- R(T), Vn E R(T) and IVn - vol 10.90
~.
-->
0.]
With the notation of Theorem 10.80 show the so-called completeness relations 00
k(u, v) = Lk(u,ei)k(v,ei)' i=l
00
a(u,v) = LAik(u,ei)k(v,ei)' i=l
11. Some Applications
In this chapter we shall illustrate some of the applications of the abstract principles we stated in the previous chapter to specific concrete problems. Our aim is to show the usefulness of the abstract language in formulating and answering specific questions, identifying their specific characteristics and recognizing common features of problems that a priori are very different. Of course, the abstract approach mostly follows and is motivated by concrete questions, but later we see the approach as the most direct way to understand many questions, and even the most natural. Clearly, the problems we are going to discuss deserve more careful and detailed study because of their relevance, but this is out of our present scope and, in any case, often not possible because of the limited topics we have so far developed. For instance, in this chapter, we shall only use uniform convergence, since the use of integral norms, besides being more complex, requires the notion of Lebesgue's integration, without which any presentation would sound artificial.
11.1 Two Minimum Problems 11.1.1 Minimal geodesics in metric spaces Let X be a connected metric space so that every two of its points can be connected by a continuous path. One of the simplest problems of calculus of variations is to find a continuous curve of minimal length connecting two given points. A first question is deciding when such a minimal connection exists. Here, we shall see how the Frechet-Weierstrass theorem, Theorem 6.24, and the Ascoli-Arzela theorem, Theorem 9.48, lead to an answer.
a. Semicontinuity of the length Let X be a metric space. Recall that I : X ----> JR is called lower semicontinuous if the level sets r f,A := {x I I(x) ::; ),} of I are closed for all ), E JR, equivalently if 1- 1 (])" +oo[) is open for all ), E R Observing that,
396
11. Some Applications
COMME TARII
DE LINEA BREVISS[?L\
ACADEMIAE
,.. S'·'1:'a'..~I~,,~'A;."~t~~
SCIENTIARVM
QVAE-
Ana.rt
IMPERIA LIS
Leonh. Evlcto.
PETROPOLITANAE
C
TO.\(VS UI. AO M"NVI\I cb bee %nlli.
VIQ.\'ElM)t t:ncQ,~,
'rita. •"",.
poaJur, III.e.... b "f"WIl bz~ \iiliIaUD .lUcop.aao a4.uGdqoodc:-.. qat ct'c I"tum r«bm. E;.; Iaoc: fic.,lc la. 't1Otl'U
leWSINt. l8 lCpc.riQc pbu Laum ~~ftl dM Ifdd.tkl paaa-. tltIlSCl)tc-f'/l die rdtUD l ~ .b aleceo 24 UtUftn _llvr. lJI tbpuklc (pbJeri-U • .itt qu. hau"AI 4tKi aos ,oedl, bl&lfor. C'OJrIClm ,.jLIII .rftJiAlillUm de cumbuD miX!. ~In, . . . .n. 4t1Q puo(h CO"ianlh. oJ. Q!tou fttUJI til "prritlt qeK-.o~c lill'C couacn, 'DUOOOll1, tiat t1l batlUltt:l £1 YPbtc. \'itlim.ll I que a ekco pu&do .d wd oqr.c: dYlc:llllr, ao:JJ:lom en &tncr.1ilct 4«cnnUw.lIm. Pn;,. ",1M b_ qurft'on.cm Cd. loll. IktdOIlUJ. fiptic;lAl rt nu-cr6Jtmhl:lH:mlfc ttqlAliollml;C[V1C d llllnlll btnt!imam 4tttflllwa4tm (0:.<: !it.
qtao«
,0&.1.
PETROPOLI
pu6ric.t x(ommOlbri poftic. SoJal qo cuam hoc preblernt. iolutloMJD11R bJl( dil'ctUtiOIiIl Up60
TYrys ACAJ)F..l,tUS (11 1)("
X.
lIetc 'fo!w••
;'11.
~ ~reo-
Figure 11.1. Frontispieces of Commentarii Petropoli vol. 3 (1732) and of the paper by Leonhard Euler (1707-1783) De linea brevissima.
if f = supdi, then f-l(],X, +ooD lowing.
= Ui!i-1(],X, +ooD,
we conclude the fol-
-> i, i E I, be a family of lower semicontinuous functions on a metric space X. Then f := SUPi Ii is a lower semicontinuous function.
11.1 Proposition. Let fi : X
Let (X, d) be a metric space. As we have seen, d. Example 6.25, the length functional L : CO([a, b], X) -> i, which for each continuous curve c.p : [a, b] -> X gives its length, is not a continuous functional with respect to uniform convergence in CO ([a, b], X). But we have the following.
11.2 Theorem (Semicontinuity). The length functional L(c.p) is lower semicontinuous in CO([a, b], X). Proof. Recall that we have L(
where Vs(f) := '£idx(f(ti),f(ti+l))' S = {to = a < tl < ... < tN = b}. Since the functional f - t Vs (f) is continuous for every fixed subdivision S of la, b], the result follows. 0
b. Compactness The intrinsic reparametrization theorem, Theorem 7.44, can be reformulated as: For every family of parametric curves {Ci } in X of length strictly less than k, there exists a family of curves {Cn parametrized in [0,1],
11.1 Two Minimum Problems
397
thus belonging in CO([O, 1]' X), such that C i and C{ are equivalent for all i E I. In particular they have the same length, and the curves C{ are equiLipschitz with Lipschitz constant less than k. Assuming X compact, the Ascoli-Arzelit theorem yields the following.
11.3 Theorem (Compactness). Let X be a compact metric space and let {Ci } be a family of parametrized curves of length strictly less than k. Then the family {Ci } is relatively compact with respect to uniform convergence. More precisely, one can reparametrize the curves Ci on [0,1] in such a way that they belong to Lip k([O, 1], X), and therefore {Cil is a relatively compact subset of CO ([0, 1]' X). c. Existence of minimal geodesics An immediate consequence of Theorems 11.2 and 11.3, on account of the Frechet-Weierstrass theorem is the following. 11.4 Theorem (Existence). Let X be an arc-connected compact metric space and P, Q two points of X. There exists a simple rectifiable curve of minimal length joining P to Q, provided there exists at least a rectifiable curve connecting P and Q. Proof. Since there exists at least a rectifiable curve connecting P and Q, A := inf{Lh') Ii' connecting P and Q} < +00. Let k > A and let K := {
-+
I
X
Q}.
By the Ascoli-Arzela theorem, K is compact in CO ([0, 1], X), hence there is
Q}
by the Frechet-Weierstrass theorem. The map
= IXI
- X2!
and is injective. In fact, if "I/;(XI) = "I/;(X2) with Xl < X2, deleting the loop corresponding to the interval ]XI' X2 [, we would still get a curve connecting P and Q, but of length strictly less than L(<po), contradicting (11.1). 0 11.5'. Show that the compactness assumption on X in Theorem 11.4 is necessary. In particular, discuss the cases when X equals the closed unit cube minus an interior open segment and minus a closed interior segment.
11.1.2 A minimum problem in a Hilbert space In this section we shall show how the theorem ensuring the existence of minimizers for quadratic coercive functionals generalizes to convex coercive functionals in a Hilbert space.
398
11. Some Applications
a. Weak convergence in Hilbert spaces Let X be a Banach space. We say that a sequence {x n } C X converges weakly to x EX, and we write Xn
~x,
if F(x n ) ---. F(x) \:IF E X*, i.e., for every linear continuous functional F : X ---. IR on X. On account ot the Riesz's representation theorem, we have the following.
11.6 Proposition. A sequence {un} in a Hilbert space converges weakly to u E H iff (unlv) ---. (ulv) \:Iv E H.
If H is finite dimensional, weak and strong convergence agree, since weak convergence amounts to the convergence of the components in an orthonormal basis. On the contrary, if H has infinite dimension, the two notions of convergences differ. In fact, while from the inequality
I(u n
Ilvllllun - ull Ilu n - ull ---.0, implies weak convergence
ulv)1 ::;
-
we get that strong convergence, Un ~ U; the opposite is not true. Consider, for instance, a denumerable orthonormal set {en} C H. Then Bessel inequality yields (enlv) ---. \:Iv E H, i.e., en ~ 0, while {en} does not converge since
°
\:In, m.
Weak convergence is one of the major tools in modern analysis. Here we only state one of its major useful issues.
11. 7 Theorem. Every bounded sequence in a separable Hilbert space has a subsequence that is weakly convergent. Proof. Let {xn}, Ixnl ::; M, be a bounded sequence in H, and {ei} be a basis of H. {Xn} has a subsequence {x~} such that (x~led --> 0<1. Similarly {x~} has a subsequence {x~} such that «x~le2) --> 0<2, and so on. Therefore by a Cantor diagonal procedure, we can find a subsequence {Xk n } of {Xn} such that (Xk n Ie;) --> O lR given by 00
T(y) := L(yle;)O
yEH.
i=l
T is linear and bounded, IITII ::; M as 00
IITII 2 = L
2 2 l(yleiWIO
i=l
hence the representation theorem of Riesz yields the existence of XT E H such that T(y) = (yIXT) Vy E Hand IIxTIl = IITII ::; M. In particular XT = L:~1 O
11.1 Two Minimum Problems
399
let Y be any vector in H. For any fixed E > 0 choose N sufficiently large so that for YN := 2:~l(Ylei)ei' we have Ily - YNII < E. Then
on the other hand (Znlei) Thus
-t
0, hence l(znIYN)1
< E for n "In
larger than some n = n(N, E).
2: n.
o 11.8 Remark. The last part of the proof actually shows that in a separable Hilbert space, weak convergence (X n - xlY) --+ 0 'Vy amounts to the convergence of the components (X n - x!ei) --+ 0 'Vi in an orthonormal basis
{ei}' 11.9~. Show that the compactness theorem, Theorem 10.52, holds in a generic Hilbert space which is not necessarily separable. [Hint: Apply Theorem 10.52 to the closure Ho of the family of finite combinations of {Xn}, which is a separable Hilbert space. Then find x E Ho and a subsequence {Xk n } such that (Xk n -xly) - t 0 Vy E Ho. Then, use the orthogonal projection theorem onto Ho to show that actually, (Xk n - xly) - t 0 Vy E H.]
11.10 Theorem (Banach-Saks). Every bounded sequence {v n } C H weakly convergent to v E H has a subsequence {Vk n } such that
1
n
- '"'" Vk· --+
n L.J i=1
'
in the norm of H.
v
Proof. Set Un := V n - v. Then for a positive M we have IIUnl1 extract from {un} a subsequence {Uk n } in such a way that
:S M for all n and we
Ukl := Ul, (Uk2Iukl)
< 1,
(Uk 3 Iuk l)' (Uk 3 I uk 2)
1
< 2'
Vi=l, ... ,p. Therefore
2 n n L..JL.J
1 n n L.t
= -2 "''''(ukluk)+ -2 "'(Uk j=l i<j
't
1
2
J
M2
Iud :s -n +-. J n
j=l
o
400
11. Some Applications
I,.HiOlilDA TON&J.1I1
Multiple Integrals in die Calculus
ofVariatloDS
FO DAM11JNTI 01
ALCQLO D1'lLI,E VARIAZlONJ
---
Qu,rlrt I,Nor",,),. w"""", .. ~ .....
IIOIOOY"
smoLA
~ArUlH£:ldd
Figure 11.2. Frontispieces of two classical monographs that, in particular, deal with semicontinuity on integral functionals.
b. Existence of minimizers of convex coercive functionals Let F : H ~ IR be a convex functional on a real Hilbert space H. This means that the function
cp(A)
:=
F(AU + (1 - A) v)
is convex in [0,1] for all u, v E H. A typical example of a convex functional is the quadratic functional
F(u)
1
= 211uI12 - L(u)
where L is a bounded linear form on H, that we have encountered dealing with the abstract Dirichlet principle. Then we have, compare Proposition 5.61 of [GM1], the following.
11.11 Proposition (Jensen's inequality). A junctional F: H convex if and only if for every finite convex combination m
I>:l:i = 1, (}:i 2: 0, i=l
of points
Ui
E H we have
~
ill
lR is
11.1 Two Minimum Problems
401
Proof. Clearly Jensen's inequality with two points amounts to convexity. So it suffices to prove it assuming F convex. We give a direct proof by induction on m. Assume the claim holds for m - 1 points. Set a := a1 + ... + a m -1 and am := 1 - a. If a = 0 or a = 1 the claim is proved, otherwise 0 < a < 1 and, if
then m-1
a·
0:::; --..!:. :::; 1, a
m
' " -ai = L... i=1 a
L: aiui = au + (1- a)Um,
1,
i=l
hence
F(
f
aiUi) = F(au
+ (1 -
aUm)) :::; aF(u)
+ (1 -
a)F(um) :::;
i=l
f
aiF(ui)
i=l
o
by the inductive assumption.
~ IR be a continuous, convex, bounded from below and coercive functional, meaning
11.12 Theorem. Let F : H
inf F(u)
uEH
> -00,
F(u)
~
+00
lui
as
~
+00.
Then F has a minimizer in H. Proof. Let {Un} be a minimizing sequence, F(u n ) --. infuEHF(u). Since for large n -00
<
inf F(u) :::; F(u n ):::; inf F(u) uEH uEH
+ 1,
the sequence {Un} is bounded. Using the Banach-Saks theorem we find u E H, and we can extract a subsequence {Uk n } of {Un} such that Uk n ->. u and 1
n
Vn := - (L:Uki) --. U n i=1
in the norm of H.
Jensen's inequality yields
thus F(v n ) --. infHF since F(UkJ --. infHF as i --. sequence, too. Finally inf F :::; F(u) = H
on account of the continuity of F.
00,
i.e., {Vn} is a minimizing
lim F(v n ) = F(u)
n----+(X)
o
11.13~. Show that Theorem 11.12 still holds if F is convex with values in JR, bounded from below and lower semicontinuous.
402
11. Some Applications
11.2 A Theorem by Gelfand and Kolmogorov In this section we shall prove that a topological space X is identified by the space of continuous functions on it. If we think of X as a geometric world and of a map from X into lR as an observable of X, we can say: if we know enough observables, say the continuous observables, then we know our world. Let us begin by proving the following. 11.14 Proposition. Every metric space (X, d) can be isometrically embedded in CO(X). Proof. Fix p E X and consider the map CO(X, IR) that maps a E X into fa: X --> IR defined by fa(x):= d(x, a) - d(x,p).
Trivially, fa E CO(X, IR) and Ifa(x) - fb(x)1 = Id(x,a) - d(x,b)1
i.e., IIfa - fblloo S d(a, b); on the other hand for x hence
S d(a, b),
= b we have
Ifa (b) - fb(b)[
= d(a, b), 0
11.15'. Show that every separable metric space (X, d) can be isometrically embedded in 100 , [Hint: Let {Xn} be a sequence in X and let 100 be given by
Let X be a topological space, see Chapter 5. The set CO(X, lR) is a linear space and actually a commutative algebra with identity, since the product of two continuous functions is continuous. Let Rand R' be two commutative algebras. A map 'P : R ---> R' is said to be a homomorphism from R into R' if 'P(a+b) = 'P(a)+'P(b) and 'P(ab) = 'P(a)'P(b). If, moreover, 'P is bijective we say that Rand R' are isomorphic. Clearly CO (X, lR) is completely determined by X, in the sense that every topological isomorphism 'P : X ---> Y determines an isomorphism of the commutative algebras CO(Y, lR) and CO(X, lR), the isomorphism being the composition product f ---> f 0 'P. If X is compact, the converse also holds. 11.16 Theorem (Gelfand-Kolmogorov). Let X be a compact topological space. Then CO(X,lR) determines X. We confine ourselves to outlining the proof of Theorem 11.16. An ideal I of the algebra R is a subset of R such that a, b E I =;. a - b E I and a E I, r E R =;. a . rEI. R is clearly the unique ideal that contains the identity of R. An ideal is called proper if I # R and maximal if it is not strictly contained in any proper ideal. Finally, we notice that RII is a field if and only if I is maximal. 11.17 Lemma. Let X be a compact topological space. I is a proper maximal ideal of CO(X) if and only if there is xo E X such that I = {f E CO(X) I f(xo) = O}.
11.3 Ordinary Differential Equations
403
Proof. For any f E I, the set f- 1(0) is closed and f-1(0) i' O. Otherwise Ilf, hence 1, belongs to I, and I is not proper. Let h, .. ., fn E I. The function f := L::~=l f1: is in I and f-1(0) = nfi- 1 (0) i' 0. Since X is compact, n{f-1(0) [f E I} i' 0. In particular, there is Xo E X such that f(xo) = 0 Vf E I. On the other hand, {f I f(xo) = O} is an ideal, hence I = {f I f(xo) = OJ. 0 The spectrum of a commutative algebra with unity is then defined by spec R:=
I
{I I
maximal ideal of
R}.
Trivially, if R is isomorphic to CO(X, JR), R ~ CO(X, JR), then also the maximal ideals of R and CO (X, JR) are in one-to-one correspondence, hence by Lemma 11.17 spec R ~ spec Co(X) ~
x.
To conclude the proof of Theorem 11.16, we need to introduce a topology on the space spec CO (X, JR) in such a way that spec CO (X, JR) ~ X becomes a topological isomorphism. For that, we notice that, if I is a maximal ideal of CO(X, JR), then CO(X, JR)II ~ JR, hence the so-called evaluation maps f(I), that map (J, I) into [fJ E CO(X,JR)II ~ JR, have sign. Now, if we fix the topology on specR ~ spec CO(X, JR) by choosing as a basis of neighborhoods the finite intersections of
I
U(J) := {I I proper maximal ideal with f(I)
> O},
it is not difficult to show that the isomorphism X -> spec CO(X, JR) is continuous. Since X is compact and the points in spec CO(X, JR) are separated by open neighborhoods, it follows that the isomorphism is actually a topological isomorphism. Theorem 11.16 has a stronger formulation that we shall not deal with, but that we want to state. A Banach space with a product which makes it an algebra such that IIxy[[ :; IIxlllly[[ is called a Banach algebra. An involution on a Banach algebra R is an operation x -> x* such that (x + y)* = x* + y*, (Ax)* = >':x*, (xy)* = (yx)* and (x*)* = x. A Banach algebra with an involution is called a CO-algebra. Examples of C*-algebras are: (i) the space of complex-valued continuous functions from a topological space with involution f -> 1, (ii) the space of linear bounded operators on a Hilbert space with the involution given by A -> A*, A* being the adjoint of A. Again, the space of proper maximal ideals of a commutative C*-algebra, endowed with a suitable topology, is called the spectrum of the algebra.
Theorem (Gelfand-Naimark). A C*-algebra is isometrically isomorphic to the algebra of complex-valued continuous functions on its spectrum.
11.3 Ordinary Differential Equations The Banach fixed point theorem in suitable spaces of continuous functions plays a key role in the study of existence, uniqueness and continuous dependence from the data of solutions of ordinary differential equations.
404
11. Some Applications
11.3.1 The Cauchy problem Let D be an open set in IR x IR n , n ~ 1, and F(t, y) : D C IR x IR n -+ IR n be a continuous function. A solution of the system of ordinary equations d
dtx(t) = F(t,x(t))
(11.2)
is the data of an interval ]Q,,8[C IR and a function x E C 1 (]Q,,8[;lRn ) such that (11.2) holds for all t E]Q, ,8[. In particular, (t, x(t)) should belong to D for all t E]Q, ,8[. Geometrically, if we interpret F(t, x) as a vector field in D, then x(t) is a solution of (11.2) if and only if its graph curve t -+ (t, x(t)) is of class C 1 , has trajectory in D, and velocity equals to (1, F(t, x(t))) for all t. For this reason, solutions of (11.2) are called integral curves of the
system. a. Velocities of class Ck(D) In the sequel, at times we need a fact that comes from the differential calculus for functions of several variables that we are not discussing in this volume. Let 0 C IR n be an open set. We say that a function f : 0 -+ IR is of class Ck(O), k ~ 1, if f possesses continuous partial derivatives up to order k. One can prove that, if f E Ck(O) and'Y : [a, b] -+ 0 is a C k curve in 0, then f 0'Y : [a, b] -+ IR is of class Ck([a, b]). For k = 1 we have the
chain rule
where Df(x) := (-i!dx), ~(x), ... ,-i!n(x)) is the matrix of partial derivatives of f and the product Dfb(t)h'(t) is the standard matrix product. A trivial consequence is that integral curves, when they exist, possess one derivative more than the function velocity F(t, x(t)). This is true by definition if F is merely continuous. If, moreover, F(t,x) E C k and x(t) E C 1 , we successively find from the equation x'(t) = F(t, x(t» that x'(t) E Cl, x'(t) E C2, ... , x'(t) E C k . In particular, if F(t, x) has continuous partial derivatives of any order, then the integral curves are Coo. It is worth noticing that if FE C1(D), then by the chain rule
8F x"(t) = 7jt(t,x(t))
+ DFx(t, x(t))x'(t),
where DFx is the matrix of partial derivatives with respect to the x's variables and the product DFx(t,x(t))x'(t) is understood as the matrix product. For the sequel, it is convenient to set
11.3 Ordinary Differential Equations
405
11.18 Definition. We say that a function F(t, x) : [0:,,8] x B(xo, b) -+ jRn is Lipschitz in x uniformly with respect to t if there exists L > 0 such that [F(t, x) - F(t, y)1 :::; L [x -
yl
V(t, x), (t, y) E [0:,,8] x B(xo, b).
(11.3)
Let D be an open set in jR x jRn. We say that a function F(t, x) : is locally Lipschitz in x uniformly with respect to t if for any D:= [0:,,8] x B(xo, b) strictly contained in D there exists L := L(o:,,8, xo, b) such that
D
-+ jRn
[F(t, x) - F(t, y)[ :::; L [x 11.19~. Show that the function f(t, x) in x uniformly with respect to t.
yl
V(t, x), (t, y)
ED.
= sgn(t)[x[, (t, x) E [-1,1] x [-1, 1] is Lipschitz
= [a, b]
x [c, d] be a closed rectangle in JR x JR. Show that, if for all f(t,x) has derivative fx(t,x) on [c,d] and (t,x) --> fx(t,x) is continuous in D, then f is Lipschitz in x uniformly with respect to t. [Hint: Use the mean value theorem.] 11.20
~.
Let D
t E [a,b], the function x
-->
11.21~. Show the following. Let D be an open set ofJR x JRn and let F(t, x) E CI(D). Then F is locally Lipschitz in x uniformly with respect to t. [Hint: For any (to, xo) E D, choose a, bE JR such that 5 := Ht, x) [It - tal < a, Ix - xol < b} is strictly contained in D. Then, for (t,XI), (t,X2) E 5, consider the curve ')'(s) := (t,(l- S)XI + SX2), S E [0,1] whose image is in 5 and apply the mean value theorem to F(')'(s)), s E [0,1].]
b. Local existence and uniqueness Assume (to,xo) E D. We seek a local solution
for some r
> 0 of the
Cauchy problem relative to the system (11.2), Le.,
= F(t,x(t)), = xo.
:tX(t) {
x(to)
(11.4)
We have the following. 11.22 Proposition. Let D be an open set in jR x jRn, n ::::: 1, and let F(t,x) : D -+ jRn be a continuous function. Then x(t) E C1([to - r,to + r], jRn) solves (11.4) if and only if x(t) belongs to CO([to - r, to + r], jRn) and satisfies the integral equation
J t
x(t)
= Xo +
F(T, X(T)) dT
to
"It E [to - r, to
+ r].
(11.5)
406
11. Some Applications
Proof. Set J:= [to -r, to +r]. If x E C1(I,JRN) solves (11.4), then by integration x satisfies (11.5). Conversely, if x E CO (I, JRn) and satisfies (11.5), then, by the fundamental theorem of calculus, x(t) is differentiable and x'(t) = F(t, x(t)) in f, in particular it has a continuous derivative. Moreover, (11.5) for t = to yields x(to) = x o . ' 0
Let us start with a local existence and uniqueness result.
11.23 Theorem (Picard-LindelOf). Let F(t, x) : D c IR x IRn ---> IRn be a continuous function with domain D := {(t,x) E IR x IRn lit - tal < a,
Ix - xol < b}.
Suppose
(i) F(t,x) is bounded in D, IF(t,x)l::::; M, (ii) F(t,x) is Lipschitz in x uniformly with respect to t, IF(t,x) - F(t,y)1 ::::; kllx -
yll
V(t,x), (B,y) E D.
Then the Cauchy problem (11.4) has a unique solution in [to - r, to where
r
+ r]
. ( b 1)
< mIll a, M'k .
Proof. Let r be as in the claim and J r := [to -r, to +r]. According to Proposition 11.22, we have to prove that the equation x(t) = Xo
+
1 t
F(r,x(r)) dr.
to
has a unique solution x(t) E C°(Ir, JRN). Let Yl,Y2 E C°(Ir,JR n ) be two solutions of (11.5). Then for all t E Jr
1 t
IY1(t) - Y2(t)1 ::;
IF(s,yl(s)) - F(S,Y2(S))1 ds::; kit - tolllYl - Y2l1oo,Ir
to
hence
11m -
Yllloo,Ir ::; krllYl - Yllloo,Ir'
Since kr < 1, then Yl = Y2 in fr. To show existence, we show that the map x
T[x](t)
:=
xo
+
t
lto
-->
Tx given by
F(r, x(r)) dr
is a contraction on
I
X := {x E C°(Ir, JRN) x(to) = Xo, Ix(t) - xol ::; b Vt E Jr}
that is closed in C°(Ir,JRN), hence a complete metric space. Clearly t continuous function in Jr , T[x](to) = Xo and
IT[xJ(t) - xol::;
t
lto
-->
T[x](t) is a
IF(r,x(r))1 dr::; M It I ::; Mr::; b,
therefore T maps X into itself. Moreover, it is a contraction; in fact t
IT[x](t) - T[y](t)1 ::;
J
I
IF(r, x(r)) - F(r, y(r))1 drl
a t
::; kl
J
Ix(r) - y(r)1 drl ::; kltllix - ylloo ::; kr Ilx - ylloo,Ir'
a The fixed point theorem of Banach, Theorel?,l 9.128, yields a (actually, a unique) fixed point T[x] = x in X. In other words, the equation (11.5) has a unique solution. 0
11.3 Ordinary Differential Equations
407
Taking into account the proof of the fixed point theorem we see that the solution x(t) of (11.4) is the uniform limit of Picard's successive approximations
J t
xo(t) := Xo,
and, for n
~ 1,
xn(t):= Xo
+
F(T,Xn_l(T»dT.
to
The Picard-Lindelof theorem allows us to discuss the uniqueness for the initial value problem (11.4).
11.24 Theorem (Uniqueness). Let D c JR x JRn be a bounded domain, let F(t, x) : D -+ JRn be a continuous function that is also locally Lipspchitz in x uniformly in t, and let (to, xo) E D. Then any two solutions Xl : 1-+ JRn, X2 : J -+ JRn defined respectively, on open intervals I and J containing to of the inital value problem x/(t) = F(t,x(t», { x(to) = xo, are equal on In J. Proof. It is enough to assume I C J. Define
Obviously to E E and E is closed relatively to I, as Xl, X2 are continuous. We now prove that E is open in I, concluding E = I since I is an interval, compare Chapter 5. Let t* E E, define x* := XI(t*) = X2(t*). Let a, bE IR+ be such that 15:= {(t, x) E D lit - t*1 < a, Ix - x*1 < b} is strictly contained in D. F being bounded and locally Lipschitz in x uniformly with respect to t in 15, the Picard-Lindelof theorem applies on D. Since XI(t) and X2(t) both solve the initial value problem starting at (t*, x*), we conclude that XI(t) = X2(t) on a small interval around x*. Thus E is open. 0
c. Continuation of solutions We have seen that the initial value problem has a solution that exists on a possibly small interval. Does a solution in a larger interval exist? As we have seen, given two solutions Xl : I -+ JR n , X2 : J -+ JRn of the same initial value problem, one can glue them together to form a new function x : I U J -+ JRn, that is again a solution of the same initial value problem but defined on a possibly larger interval. We say that x is a continuation of both Xl and X2. Therefore, Theorem 11.24 allows us to define the maximal solution, or simply the solution as the solution defined on the largest possible interval. 11.25 Lemma. Suppose that F : D c JR x JRn -+ JRn is continuous in D, and let x(t) be a solution of the initial value problem
408
11. Some Applications
x'(t) = F(t,x(t)), { x(to) = Xo in the bounded interval 'Y < t < 15; in particular (t, x(t)) E D 'tit Eh,J[. If F is bounded near (15, x(J)), then x(t) can be continuously extended on J. Moreover, if (15, x(J)) E D, then the extension is C 1 up to J. A similar result holds also at b, xb)). Proof. Suppose that IF(t, x)1 :'S M Vet, x) and let x(t), tEll, 8[, be a solution. For tl, t2 Ell, 8[ we have
Le., x is Lipschitz on 1I, 8[, therefore it can be continuusly extended to part of the claim follows from (11.5) to get for t < 8
h, 8]. The second
r
x(t) - x(8) = _1_ F(s, xes)) ds t-8 t-8}d and letting t
-->
8+ .
o
Now if, for instance, (15, x( 15)) is not on the boundary of D and we can solve the initial value problem with initial datum x(J) at to = 15, we can continue the solution in the C 1 sense because of Proposition 11.22, beyond the time 15, thus concluding the following. 11.26 Theorem (Continuation of solutions). Let F(t, x) be continuous in an open set D c JR x JRn and locally Lipschitz in x uniformly with respect to t. Then the unique (maximal) solution ofx'(t) = F(t,x(t)) with x(to) = Xo extends forwards and backwards till the closure of its graph eventually meets the boundary of D. More precisely, any (maximal) solution x(t) is defined on an interval ]0:,,8[ with the following property: for any given compact set K C A, there is 15 = J(K) > 0 such that (t,x(t)) tf. K for t 1c [0: + 15,,8 - 15].
Recalling Exercise 11.21, we get the following. 11.27 Corollary. LetD be an open domain inJRxJRn, and letF E C 1 (D). Then every (maximal) solution of x'(t) = F(t, x(t)) can be extended forwards and backwards till the closure of its graph eventually reaches aD. 11.28 Corollary. Let D :=]a, b[ xJRn (a and b may be respectively, +00 and -(0) and let F(t, x) : D ----> JRn be continuous and locally Lipschitz in x uniformly with respect to t. Then every locally bounded (maximal) solution of x' = F(t, x) is defined on the entire interval la, b[. Proof. Let Ix(t)1 :'S M. Should the maximal solution be defined on [a,,8J with, say ,8 < b, then the graph of x would be contained in the compact set [00,,8] x B(O, M) strictly contained in ]a,b[xlR n . This contradicts Theorem 11.26. 0
11.3 Ordinary Differential Equations
409
Of course, if F is bounded in D :=]a, b[ xlR n , all solutions of x' = F(t, x) are automatically locally bounded since their velocities are bounded, so the previous theorem applies. For a weaker condition and stronger result, see Exercise 11.33. 11.29 Example. Consider the initial value problem x' = x 2 , x(O) = 1, in D a := Ixl < a}. Since IFI :S a2 in D a , the continuation theorem applies. In fact, the maximal solution 1/(1 - t), t E] - 00,1 - ~[ has a graph that extends backwards till -00 and forward until it touches aDa.
{(t, x) It E JR,
d. Systems of higher order equations We notice that a differential equation of order n in normal form in the scalar unkown x(t)
n l n d ( d d ) dtn x(t) = F t, x(t), dt x(t), ... , dt n- l x(t)
(11.6)
can be written, by defining Xl (t)
:=
dn -
X(t),
... ,
l
Xn(t):= dt n- l x(t),
as the first order system x~ (t) =:
X2(t),
X~(t) =:
X3(t),
X~_l(t) =: X~(t)
Xn(t), =: F(t,Xl(t),X2(t), ... ,Xn(t))
or, compactly as,
y'(t) = F(t,y(t)) for the vector-valued unknown y(t) := (Xl(t), X2(t), ... , xn(t)) and F D c IR x IR n ---t IR n given by
F(t, Xl, ... , x n ) := (X2' X3, ... ,xn , f(t,
Xl (t),
X2(t), ... ,xn(t)).
Consequently, the Cauchy problem for (11.6) is
X(n)(t) = F(t, x(t), x'(t), x"(t), ... ,x(n-l)(t)), x(to) = xo, x'(to) = Xl, X"(to) = X2,
(11. 7)
410
11. Some Applications
Along the same line, the initial value problem for a system of higher order equations can be reformulated as a Cauchy problem for a system of first order equations, to which we can apply the theory just developed. e. Linear systems For linear systems
= A(t)x(t) + g(t),
x'(t)
where A(t) is an n x n matrix and g(t) E
jRn,
(11.8)
we have the following.
11.30 Theorem. Suppose that A(t) and g(t) are continuous in [a, b] and that to E [a, b] and Xo E jRn. Then the solution of (11.8) with initial value x(to) = Xo exists on the entire interval. Proof. Assume for simplicity that to E]a, b[. The field F(t, x) := A(t)x + get) is continuous in D :=]a,b[xjRn and locally Lipschitz in x uniformly with respect to t, /F(t,x) - F(t,y)l::;
IIA(t)llllx - yll
sup
< a < (3 < b, 'Ix, Y E
Va
jRn.
tE!Q,13j
Therefore, a solution of (11.8) exists in a small interval of time around to, according to Picard-Lindel6f theorem. To show that the solution can be continued on the whole interval Ja, b[, it suffices to show, according to Corollary 11.28, that x(t) is bounded. In fact, we have
1:
x(t) - x(to) =
J t
A(s)x(s)ds +
g(s)ds.
to
For t
> to
we then conclude that
Ix(t)1 ::; Ixol + max Igl(b -
a)
[a,b]
+
IIA(t)11
sup tE!a,bj
r Ix(s)1
ds,
lto
o
and the boundedness follows from Gronwall's inequality below.
11.31 Proposition (Gronwall's inequality). Suppose that k is a nonnegative constant and that f and 9 are two nonnegative continuous functions in [a,,8] such that t
f(t) S k
+
J
t
f(s)g(s) ds,
E
[a, ;3].
Q
Then
f (t) S k exp
(it
g(
S)dS) .
t
Proof. Set U(t)
:=
k + J f(s)g(s) ds. Then we have Q
f(t) ::; U(t), in particular
U ' (t) = f(t)g(t) ::; g(t)U(t),
Ural
= k,
11.3 Ordinary Differential Equations
411
ON TN_ EXIST£!'fC& AND PROPE.R1"la OF TH.&
toL-UTlOHS OF A C<TAlH DJPFERE:qJA1,. &QUAnOlf OF mE SRCOI'fD O~· I.~~n.-..w
...... (,I.) (I)
.,~c.IrwttI......,.~
, - , . . M .-..., at '--A. , -,. III • - _ f a t . - ...
, - " III . - ...;
,(.,,)r'-.
fCiI (D)
............... •
.1.{f(..,t:~]_/f,..•)
(II
, - , . at. __ ,
n. aMtMt ...... __"- •
0
" 0. J~.lt
.. ,c.-.,..
w'-
~
..I,11m. '""em«l)'" (D~ utb:lk-.>,,_ .., ..... >0./11.,)
_.11-. ..
.........
'"
""I
....;
,-0 ••- •.
N. ,)~ -
&~
III&., .. - .. fi.~
Si·S ..
UIII'IS~
.nllilll"
t_
00 II ... ......
..
....
V~ "-~~::~'~:i.-,,.lf,tS~:~l~" 1;d;;rI<M.. ~"·Ii~.
I
,~-~I<JI.lf-.I.
..a., ut,.
~ .. "'-"- .......... ......",J.., .. mL . . No-r..
M._:"OO
.
1 ... ~
m.".. . . ~
'-..._ ,..., .. .u..., - . - '~ .. -"'./fA_·"t1IttA. "'J(ff. j ....... 1.,....." .l~-._
, .. ~ ... '-/~>
-
.
/~_
"..~
(1. . .
t1 . .
..~~/" ftIt_
..
.. ,~ .. . . . , , - •• ~
Figure 11.3. Thomas Gronwall (18771932) and a page from one of his papers.
~ [U(t)ex p ( hence
U(t) exp ( -
f
f 9(S)dS)] ~
-
g(s)
0,
dS) - U(a) ~ O. o
11.32~. Let
w : [a, b]
---+
IRn be of class C 1 ([a, b]). Assume that
Iw'l(t) ~ a(t) Iw(t)1
+ b(t)
'It E [a, b]
where a(t), b(t) are nonnegative functions of class CO([a, b]). Show that
Iw(t)1
~
(lw(to)1
+
1:
b(s)ds)ex p
(1:
a(s)ds)
for every t, to E [a, b]. [Hint: Apply Gronwall's lemma to f(t) := Iw(t)I.] 11.33~. Let F(t, x) : I x IR n ---+ IR n be continuous and locally Lipschitz in x uniformly with respect to t. Suppose that there exist nonnegative continuous functions a(t) and b(t) such that IF(t,x)1 ~ a(t)lxl + b(t). Show that all the solutions of x' = F(t,x) can be extended to the entire interval I.
f. A direct approach to Cauchy problem for linear systems For the reader's convenience we shall give here a more direct approach to the uniqueness and existence of the solution of the initial value problem
to E [a, b], X(to) = X o, { X'(t) = A(t)X(t) + F(t)
(11.9)
Vt E [a,b]
412
11. Some Applications
where X o E JRn and the functions t -> A(t) and t -> F(t) are given continuous functions defined in [a, b] with values respectively, in Mn,n((C) and en. Recall that IIA(t)11 := sUPlxl=l IA(t)xl denotes the norm of the matrix A(t) and set M:= sup IIA(t)ll. tE[a,b]
As we have seen, see Proposition 11.22, X(t), t E [a, b] solves (11.9) if and only if t -> X(t) is of class CO([a, b]) and solves the integral equation
X(t) = X o +
it
(A(s)X(s)
+ F(s)) ds
(11.10)
to
that is, iff X(t) is a fixed point for the map
T: X(t)
t-+
T(X)(t)
:=
it
Xo+
(A(s)X(s)
+ F(s)) ds.
(11.11)
to
Let, > O. The function on CO([a, b], JRn) defined by IIXII,:= sup (IX(t)le-,lt-t ol ) tE[a,b]
is trivially a norm on CO([a, b]). Moreover, it is equivalent to the uniform norm on CO ([a, b], JRn) since e-,lb-aIIIXlloo,[a,bj :::; IIXII, :::; IIXlloo,[a,bj' Hence the space CO ([a, b], JRn) endowed with the norm 1111" that we denote by C" is a Banach space. 11.34 Proposition. Let T be the map in (11.11). Then T(C,)
V, 2
o.
c
C,
Moreover, T is a contraction map on C, if
, > M:= sup IIA(t)ll. tE[a,b]
Proof. In fact, 'IX, Y E 0"1 and t E [a, b], we have ITX(t) - TY(t)1 =
=
11: 11:
A(s)(X(s) - Y(s))
dsl
A(s)(X(s) - Y(s))e-"Ils-tole"lis-tol dsl
::; (t IIA(s)II(IX(s) _ Y(s)le-"Ils-tol)e"lls-tol ds
lto
::; MIIX - Ylh
r
lto
"( 11x
e"lls-tol ds::; M
Ylhe"llt-t ol .
Multiplying the last inequality by e-"Ilt-tol and taking the sup norm gives
IITX -TYII"I::; M 11X - Ylh· "(
o
11.3 Ordinary Differential Equations
413
11.35 Theorem. The initial value problem (11.9) has a unique solution X(t) of class G I ([a, b]), and
IX(t)l::; (IXol +
1:
IF(s)lds)ex p
(1: IIA(s)lldS).
Moreover, X(t) is the uniform limit in GO([a,b],lR n ) of the sequence {Xn(t)} of functions defined inductively by XO(t) := X o , { X n+ I (t) := X o +
1:
(A(s)Xn(s)
(11.12)
+ F(s)) ds.
Proof. Choose"y > M. Then T : Cooy -... Cooy is a contraction map. Therefore, by the Banach fixed point theorem T has a unique fixed point. Going into its proof, we get the approximations. Finally, the estimate on IX(t)1 follows from (11.10) and the Gronwall Lemma. 0
11.36 Remark. In the special case a = -00, b = +00, to Vt and A(t) = A constant, then (11.12) reduces to
= 0,
F(t)
=
°
hence the solution of the initial value problem for the homogeneous linear system with constant coefficients XI(t) = AX(t), { X(O) = X o
is Vt E IR uniformly on bounded sets of IR and
IX(t)1 ::; IXolexp (It - tolllAII)
VtER
g. Continuous dependence on data We now show that the local solution x(t; to, xo) of the Cauchy problem X' = {
x(to)
F(t,x),
= Xo
depends continuously on the initial point (to, xo), and in fact is continuous in (t,to,xo).
414
11. Some Applications
11.37 Theorem. Let F(t, x) and Fx(t, x) be bounded and continuous in a region D. Also suppose that in D we have
IF(t, x)1 :::; M,
> 0 there exists J > 0 such that Ix(t; to, xo) - x (t;to , xc)1 < E provided It - ~ < J and Ixo - xol < J and t,l are in a common interval of Then, for any
E
existence. Proof. Set ¢(t) := x(t; to, xo), 'IjJ(t) := x(t; tQ,£o). From
!
t
¢(t)
= Xo +
F(s, ¢(s)) ds,
'IjJ(t)
= £0 +
to
!
! t
F(s,'IjJ(s»)ds,
to ~
t
F(s, ¢(s») ds =
~
!
!
t
F(s, ¢(s) ds
+
~
F(s, ¢(s) ds
~
we infer ~
t
¢(t)-'ljJ(t)=xo-£o+ ![F(s,¢(S)-F(S,'lj;(S»]dS+! F(s,¢(s»ds, ~
~
hence I¢(t) - 'IjJ(t)1
~ Ix -
kl! t
+
£01
I¢(s) - 'IjJ(s) I dsl
+ Mlto -
tol
to t
~ 8 + kl! I¢(s) -
'IjJ(s) Idsl
+ M8.
to Gronwall's inequality then yields I¢(t) -'lj;(t)1 ~ 8(1
+ M)exp (kit -
Since 1'IjJ(t) -'lj;(t)1
~
I!
tol) ~ 8(1
t
IF(s,'IjJ(s»1 dsl
+ M)exp (k«(J -
a»).
~ Mit - tl ~ M8
t we conclude I¢(t) - 'IjJ(t)1 ~ I¢(t) - 'IjJ(t)1 + I'lj;(t) - 'IjJ(t)1 ~ 8(1 + M) exp (k«(J + 8M
a»
if It -
tl < 8.
o
11.38'. Let F(t,x) and G(t,x) be as in Theorem 11.37, and let ¢(t) and 'IjJ(t) be respectively, solutions of the Cauchy problems
X' = F(t,x), { x(to) = xo
and
Show that I¢(t) - 'IjJ(t)1 ~ (Ixo - £01 if IF(t, x) - G(t, x)1 < E.
+ E«(3 -
X' {
_= G(t~ x),
x(to)
a»
= xo·
exp (k(t - to»
11.3 Ordinary Differential Equations
415
h. The Peano theorem We shall now prove existence for the Cauchy problem (11.4) assuming only continuity on the velocity field F(t,x). As we know, in this case we cannot have uniqueness, see Example 6.16 of [GMl]. 11.39 Theorem (Peano). Let F(t, x) be a bounded continuous junction in a domain D, and let (to, xo) be a point in D. Then there exists at least one solution oj X' = F(t, x), {
= Xo·
x(to)
Proof. Let IF(t,x)1 ::::; M and B := {(t,x) E lR x lRn I It - tol strictly contained in D. If r < min{a, b/M} we have seen that
< a,
Ix - xol
< b}
be
J t
T[x](t) :=
F(T,X(T)) dT
to
maps the closed and convex set
X := {x E CO([xo - r, xo
+ r],lR n ) Ix(to)
= xo, Ix - xol ::::;
b}
in itself, see Theorem 11.23. The operator T is continuous; in fact, since F is uniformly continuous in B, "IE> 0 :3 7) such that
IF(t,x) - F(t,x')1
< E "It
E [a,b]
if Ix - x'i
< 7),
hence
IF(t,xn(t)) - F(t,xoo(t))1
<E
"It E [a,b]
for large enough n if xn(t) ---. xoo(t) uniformly. Then we have
I! t
IIT[xnJ - T[xooJlloo ::::;
IF(t, xn(t)) - F(t, xoo(t))1 dtl
< E (b - a).
to
Moreover
I! t
IT[x](t') - T[x](t)1 =
F(T, X(T)) dTI ::::; Mit - t'l,
t'
and we conclude by the Ascoli-Arzela theorem that T : X ---. X is compact. The Caccioppoli-Schauder theorem yields the existence of at least one fixed point x(t), x(t) = T[x](t); this concludes the proof. 0
Notice that the solutions can be continued, cf. Lemma 11.25, possibly in a nonunique way. Therefore any solution can be continued as a solution forwards and backwards in time till the closure of the graph of the extension eventually meets the boundary of the domain D. 11.40 .. Comparison principle. Let f : [a, b] x lR ---. lR be a function that is Lipschitz on each rectangle [a,b] x [-A,A] and let a(t),,B(t) be two functions such that
a(t)::::; ,B(t),
a'(t):S f(t,a(t)),
Show that every solution of
,B'(t) ~ f(t,(3(t))
"It E [a,b].
416
11. Some Applications
X'(t) = f(t,x(t)), { x(O) = Xo,
a(a) ::; Xo ::; (3(a),
satisfies a(t) ::; x(t) ::; (3(t) 'it E [a, b]. In particular, there is a solution that is defined on the entire interval. 11.41
~
Peano's phenomenon. Consider the Cauchy problem
x'(t) = f(t, x(t)),
x(to) = Xo
in [a, bj,
(11.13)
where f(t, x) is a continuous function. Show that (i) there exist a minimal and a maximal solution, i.e., ;!;.(t) and x(t) solutions of (11.13) such that for any other solution of (11.13) we have ;!;.(t) ::; x(t) ::; x(t), (ii) if the minimal and the maximal solutions of (11.13) exist in [to, to +8], show that through every point (to, xo) with t E [to, to + 8J and x E [;!;.(t) , x(t)] there passes a solution of (11.13). [Hint: To show existence of a maximal solution, show that, if Xn(t) solves x' = f(t, x) + ~,then, possibly passing to a subsequence, {xn} converges to a maximal solution.] 11.42~.
Study the following Cauchy problem passing to polar coordinates (p,O)
11.3.2 Boundary value problems For second order equations it is useful to consider, besides the initial value problem, so-called boundary value problems in which the values of u or u', or a combination of these values, are prescribed at the boundary of the interval. For instance, suppose we want to find the linear motion of a particle under the external force F(t,x(t),x'(t)) starting at time t = in Xo and ending at time t = 1 in Xl, i.e., we want to solve the Dirichlet problem,
°
xII(t) = F(t, x(t), x'(t)) { 11.43~.
in ]0, 1[,
x(o) = xo, x(l)
= Xl.
Check that the problem ff
{
x +x = x(O) = 0,
X(tl) =
°
Xl
(i) has a unique solution if tl of. mr, n E Z and Xl E JR, (ii) has infinite many solutions if tl = mr, n E Z and Xl (iii) has no solutions if tl = mr, n E Z and Xl of. 0.
= 0,
11.3 Ordinary Differential Equations
Discuss also the same problem for the equation x"
+ >"x =
417
O.
11.44 Theorem. Let F(t, x, y) be a continuous function in the domain D := {(t, x, y) It E [0,1], Ixl ~ a, lyl ~ a}. Moreover, suppose that F(t, x, y) is Lipschitz in (x, y) uniformly with respect to t, i.e., there exists J.l > such that
°
IF(t, Xl, Yl) - F(t, X2, Y2)1 ~ J.l (Ixl - x21
+ IYl -
Y2/)
for every (t,xl,Yd, (t,X2,Y2) E D. Then for IAI sufficiently small the problem X" = AF(t,x,x'), (11.14) { x(o) = x(l) = has a unique solution x(t) E 0 2([0,1]). Moreover Ix(t)1 ~ a and Ix'(t)1 ~ a "It E [0,1].
°
Proof. If x(t) solves x" = >"F(t, x(t), x'(t», then x'(t) = A and
+ >..
t
lo° F(r, x(r), x'(r» dr
= A
lot (t - r)F(r, x(r), x'(r» dr,
d + >..-
&
°
t
x(t) = At+B+>..l (t-r)F(r,x(r),x'(r»dr; the boundary conditions yield 1
B=O,
A+>..10 (I-r)F(r,x(r),x'(r))dr=0.
Thus, x(t) is of class C 2 ([0, 1]) and solves (11.14) if and only if x(t) is of class C1 ([0,1]) and solves
J t
x(t) = >..
(t - r)F(r, x(r), x'(r» dr
°
(11.15) 1
->..t 10 (I-r)F(r,x(r),x'(r))dr. Now consider the class
I
X := {x E C 1 ([0, 1]) x(O) = 0, sup Ix'(t)1 [0,1]
~
a}
endowed with the metric d(X1,X2):= sup Ixi(t) -x;(t)1 tE[O,l]
that is equivalent to the C1 metric IIx1 - x21Ioo,[0,l) + Ilxi - x;lloo,[O,lj' It is easily seen that (X, d) is a complete metric space and that the map x(t) --t T[x](t) given by t
T[x](t):= >..I (t - r)F(r, x(r), x'(r» dr - >..t
10
1
(1- r)F(r,x(r),x'(r» dr,
maps X into itself and is a contraction provided 1>"1 is sufficiently small. The Banach fixed point theorem then yields a unique solution x E X. On the other hand, (11.15) implies that any solution belongs to X if 1>"1 is suffciently small, hence the solution is ~~
0
418
11. Some Applications
a. The shooting method A natural approach to show existence of scalar solutions to the boundary value problem
= F(t,x,x') in ]0, I[, x(O) = 0, x(l) = x X"
{
(11.16)
consists in showing first existence of solutions y(t, A) of the initial value problem
= F(t, y, y') in [0, t] y(O) = 0, { y'(O) = A, yll
(11.17)
defined in the interval [0, t] , and then showing that the scalar equation yet, A) =
x
has at least a solution A; in this case the function y(t, X) clearly solves (11.16). Since yet, A) is continuous in A by Theorem 11.37, to solve the last equation it suffices to show that there are values Al and A2 such that yet, AI) < x < y(l, A2). This approach is usually referred to as the shooting method, introduced in 1905 by Carlo Severini (1872-1951).
11.45 Theorem. Let F(t, X, y) be a continuous function in a domain D. The problem (11.16) has at least a solution, provided that I and x/I are sufficiently small. Proof. Suppose IF(t, x, y)1 :::; M', choose M > M' and a sequence of Lipschitz functions Fdt, x, y) that converge uniformly to F(t, x, y) with Vk, Vt,x,y. Problem (11.17) for F k transforms into the Cauchy problem for the first order system
Z' = Gk(t, z), { z(O) = (0, A)
(11.18)
where z(t) = (x(t), y(t)) and Gk(X, z) = (y, Fk(t, x, y)). Now if b > 0 is chosen so that D := {(t, z) Iitl < a e Iz - (0, A)I < b} is in the domain of Gdt, z), and we proceed as in the proof of Peano's theorem, we find a solution Zk,A of (11.18) defined in [0, rJ with
r<min{a,
b }. b + IAI + M
(11.19)
Since Gk is a Lipschitz function, Zk is in fact the unique solution of (11.18) and depends continuously on A .- (0, A). If Xk,A(t) is the first component of Zk,A' we have, see Theorem 11.44, t
Xk,A (t) = At + / (t - r)F(r, Xk,A (r), X~,A (r)) dr, a
11.3 Ordinary Differential Equations
419
hence and in particular,
<x xk,>,(r) > x Xk,>,(r)
if Ar
+ r 2 M < x,
if Ar - r 2 M
(11.20)
> x.
It follows from (11.19) that the assumptions in (11.20) hold for two values of A if r and xlr are small enough, concluding that there is a solution Xk E C2([0,r]) to the boundary value problem
{
X~(t) = Fk(t,xk,X~), (11.21)
Xk(O) = 0, xk(r) =
x.
As in Theorem 11.44, we see that the family {Xk(t)} is equibounded with equicontinuous derivatives, thus, by the Ascoli-Arzela theorem, a subsequence converges to x in the space C1([0,r]), and passing to the limit in the integral form of (11.21), we see actually that x E C2([0,r]) and solves (11.16) in [O,r]. 0
b. A maximum principle Let u E C 2 (JO, 1[) n CO ([0, 1]), but [0,1] can be replaced by any bounded interval. If u has a local maximum point xo in the interior of [0, 1], then u'(XO)
=
° and
u"(xo) S; 0.
(11.22)
If, moreover, u satisfies the differential inequality u"
+ b(x)u' > 0,
(11.23)
°
then clearly (11.22) does not hold at points of ]0,1 [, thus the maximum of u is at or 1, that is, at the boundary of [0,1]. If we allow the nonstrict inequality u" + b(x)u' ?:
°
the constant functions that have maximum at every point, are allowed; but this is the only exception. In fact, we have the following.
11.46 Theorem (Maximum principle). Let u be a function of class C2 (JXl, X2[) n CO ([Xl ,X2]) that satisfies the differential inequality u"
+ b(x)u' ?:
°
where b(x) is a function that is bounded below. Then u is constant, if it has an interior maximum point. Proof. By contradiction, suppose Xo E]xI, X2 [ is an interior maximum point and u is not constant so that there is x such that u(x) < u(xo). Assume for instance x E]xo, X2[ and consider the function
z(x) := e"'(x-xo) - 1, where u is a positive constant to be chosen. Trivially z(x) z(x) > 0 in ]XO,X2[ and
<
0 in ]Xl,XO[, z(xo) = 0,
420
11. Some Applications
z"
+ b(x)z' =
(a 2 + b(x)a)eCt(x-x o )
>0
in
[Xl,
X2]
if a> max(O, - inf xE (xl,x2] b(x)). Also consider the function
+ EZ(X) w(xo) = u(xo),
w(x) := u(x)
where E > 0 has to be chosen. We have w(x) ~ u(x) ~ u(xo) = w(xo) for X < xo, and w(x) = u(x) + EZ(X) < u(xo) if E < u(x~)(:)(x). With the previous choices of a and E, the function w has an interior maximum point in ]XI,X2[, but w" + b(x)w > 0: a contradiction. 0 11.47~. In the previous proof, z(x) := eCt(x-xo) - 1 is one of the possible choices. Show for instance that z(x) := (x - XI)Ct - (xo - xd Ct does it as well.
11.48 Theorem. Let u E C 2 (jXI,X2[) solution of the differential inequality
u"(x)
+ b(x)u'(x)
n C I ([XI,X2]) be a nonconstant
::::: 0
in
]XI, X2[
where b(x) is bounded from below. Then, u'(xd < 0 if u has a maximum value at Xl and U'(X2) > 0 if u has maximum value at X2. Proof. As in Theorem 11.46 we find W'(XI) = u'(a) at
+ Ea ~ 0 if u
has maximum value 0
Xl.
Similarly we get the following.
11.49 Theorem (Maximum principle). Let b(x) and c(x) be two functions with b(x) bounded from below and c(x) sO in [XI,X2]' Suppose that u E C 2 OXI, X2 [) n CO ([Xl, X2]) satisfies the differential inequality
u"
+ b(x)u'(x) + c(x)u ::::: 0
in lXI, X2[.
Then
(i) either u is constant or u has no nonnegative maximum at an interior point, (ii) if u is not constant and has nonnegative maximum at Xl (respectively, at X2), then u'(xd < 0 (respectively, U'(X2) > 0). An immediate consequence is the following comparison and uniqueness theorem for the Dirichlet boundary value problem for linear second order equations.
11.50 Theorem (Comparison principle). Let UI and U2 be two functions in C 2 (]XI, X2[) n CO ([Xl , X2]) that solve the differential equation
u"(x)
+ b(x)u'(x) + c(x)u(x) =
f(x)
where b, c and f are bounded functions and c(x) SO.
(i) If UI (ii) if UI
::::: U2
= U2
at in
Xl
Xl
and X2, then and X2, then
UI ::::: U2 UI
= U2
in in
[Xl, X2], [Xl, X2].
11.51 ~. Add details to the proofs of Theorems 11.49 and 11.50. By considering the equations u" + u = 0 e u" - u = 0 show that Theorem 11.49 is optimal.
11.3 Ordinary Differential Equations
421
c. The method of super- and sub-solutions Consider the boundary value problem
-U" + AU = f(x) { u(O) = u(l) = 0.
in]O,I[,
(11.24)
The comparison principle, Theorem 11.50, says that it has at most one solution if A :::: 0, and, since we know the general integral, (11.24) has a unique solution. Let 9 be the Green operator that maps f E CO([O, 1]) to the unique C 2 ([0, 1]) solution of (11.24). 9 is trivially continuous; since C 2 ([0, 1]) embeds into CO ([0, 1]) compactly, 9 is compact from CO([O, 1]) into CO ([0, 1]); finally by the maximum principle, 9 is monotone: if f : : ; g, then 9f ::::; 9g. Consider now the boundary value problem -U" {
= f(x, u),
u(O) = u(l) =
°
where we assume f : [0, 1] x lR ----+ lR to be continuous, differentiable in U for every fixed x, with fu(x, u) continuous and bounded, Ifu(x, u)1 ::::; k \i(x, u) E [0,1] xR By choosing A sufficiently large, we see that f(x, U)+AU is increasing in U and we may apply to the problem
-U" + AU = f(x,u) {
u(O) = u(l) =
°
+ AU,
(11.25)
the argument in Theorem 11.46, inferring that, if 11 and u are respectively, a subsolution and a supersolution for -u" = f(x, u), i.e.,
-u" :::: f(x, u), u(O), u(l) ::::; 0,
{ u E C 2 ([0, 1]) then setting Tu := 9(J(x, u(x))
{
UO:= VO:=
+ AU(X)) and
='
for n :::: 1,
u,
the sequences {Un} and {v n } converge uniformly to a solution of Tu=u,
Le., to a function of class C 2 that solves (11.25). Hence we conclude
422
11. Some Applications
tn._ ron ~l.'I4.t
UlrTUU"lLU:l~<Jlld
fll"CU..CUM'II·\""to..'Uo.
eQUATION OU CAlC L OF
"Animo.'
CIUMR.E I,
t.~'1"fTNlIIpI""l·iJI11"I~,..1UI1.l••• ..wrirtntid!ei' ....i. .alttl .Ill u1<'tll.lti 'uiJl~u .. P"'..~dlli. It pl., _111111. &eI1Io t.
I.li. prllld,.n Nftlut.
JIl'tt,&I'l.....-n- /MIl ki "'I""'"'4&...
:~~:::::~~~,t:~~r:~.1':'::'~8~~~~~~
..etW1I ..oAIlH 11K. pro,.,.lll",f ok b I n-.,J.~ "rtJt.noilNu.l'i~ fit 1I••»O"I~ .... 1lI U'lL. (·).-'bl~ll .. tiMM '1fti"t_,leit 1b1l'nttW-Ilt'C\tar...l .. «IIf \f II. . . .nltt"•• llln'lonll.rvt 'r'l.II,.,... J,I_rhl.... rt.."-"IlIJI't('I'.tlltl.~ ..l"oI.I"dt.I~ .'IIIi1ium ..... _ prt'IItWlt PH, 1M. fttUIJNlI"I~ ... lidlr.dnolfil , !"R"".("'"rfl, "c.'\I.f..l~. "', rIfUI,", ..11hnor
nriLI., ttl
li-lJn .11• . , .
1b
tb<', "",ilI",J~:
...
c.kPolillro rV>JlI.- ••'hI
::·=o:r.=~.:~=H~~:=w~;~~~~;::.~~
flHJM- iH uoM
f.
1"'.1,,",
pi..
-o.I.."p.arIfot....NI..ol... oIirrtlet.
In_ . C')
':-'''',.1\0 ,.,.. y;...
"r.)
[/#1,1,
....),
.. Jttf,_ti... r~ ..~ .... aWol. (IMli..fI.,...r Wlr"""rea". ""'"rt.H nri.blH (udllh n1(lon pelik.11lkfot d. ".r" ...• J'.) tlJllt1 ,",.fel i"fh'lt'llJ'ft ... nh~llrtbwdR" 1
J':•...
~:~=:~"::::.=:·(~~ct::~-;::~.(~~
l'ItP...... lnl*rrle,.l1...111H ,.1.;' .. ~J. MIlII!lt-f ....f""'I P •• tKorw.~ri ptIr ",ppM U~J".; O·Ol b fQl'ftlot M\lt lI'IIMIHi .. ,..'
eo.
....II'" '"fMrtiln ..euLMJo 01. ""'e:-'.l .. f.lttt."p". c. tw Ki:ttoIJ4i perl!. Pal.\rtdCO)lpl .."...1'••'.lt1aan.11I flHldltoN/, .. tlJl~1ftt. Le mltlQl fII!.... 1I.1IJtCCIIle C4.... tu ''104I]VM' (";
.r:, ·... .r:I .... ""'-'.I(."()"
, .1".. ,.,... •r.RlMtIlf~oIf1....1ernjiz&r. • .,.J """. n ,... ... ,~II..,,.r.~',...r#.WI
~1';Imohl/~jNl'l.w.IBrr""'.
oIIi
Figure 11.4. Two pages from a paper by Sergei Bernstein (1880-1968).
11.52 Theorem. Let f(x, u) be a smooth function with Ifu(x, u)1 :::; k V(x, u). Assume that there exist a subsolution and a supersolution for
-u ll = f(x, u) { u(O) = u(l) = 0.
in [0,1],
Then there also exists a solution.
We also have the following.
11.53 Theorem. Let f(t,p) : [0, +oo[ x~ -. ~ be a function of class C 1 that is periodic of period p in t. If the equation x" (t) = f (t, x(t)) has a subsolution ;r,(t) and a supersolution x(t) that are periodic of period p with ;r,(t) :::; x(t) for all t, then it has also a solution in between, of period p. 11.54~.
Prove Theorem 11.53. [Hint: Follow the following scheme.
(i) Choose M so that f(t, x) - Mx is decreasing. (ii) Inductively define a sequence of p-periodic functions by xo(t) := :f.(t) and Xn +l (t), n 2: 0, as solution of X~+l(t) - MX n +l(t) = f(t, xn(t)) - MXn(t).
(iii) Show that Xn(t) ::; Xn+l(t) ::; x(t). (iv) Show that the sequences {x~} and {x~} are equibounded, in particular {Xn} and {x~} have subsequences that converge, and actually that {Xn}, {X~} and {x~} converge uniformly to x cx" x~, x~. (v) Finally, show that X oo is the solution we are looking for.]
11.3 Ordinary Differential Equations
423
d. A theorem by Bernstein We conclude our excursus in the field of ODEs by the following result. 11.55 Theorem (Bernstein). Let F(x, u,p) : [a, b] x ffi. x ffi. ----; ffi. be a continuous function such that
(i) there exists M > 0 such that uF(x, u, 0) > 0 if lui> M, (ii) there exist continuous nonnegative functions a(x, u) and b(x, u) such that IF(x, u,p)! :S a(x, u)lpl2
+ b(x, u)
V(x, u,p)
E
[a, b] x ffi. x R
Then the problem u ll
=
{ u(a)
F(x, u, u')
in
la, b[,
= u(b) = 0
has a solution.
The original theorem 1 by Bernstein, instead of (i), requires the stronger assumption that F be of class 0 1 and for some positive constant k one has Fu(x, u,p) 2: k > 0 for all (x, u,p). Its proof uses the shooting method. We shall instead use Schaefer's theorem, Theorem 9.142. Proof. As we have seen, the operator that maps every v E C 2 ([a, b]) into the solution of the problem u ff = F(x,v(x),v'(x)), { u(a)
= 0,
u(b)
=0
is compact. Therefore, according to Schaefer's theorem, it suffices to show that, under the assumptions of Theorem 11.55, there exists T > 0 such that, whenever the function v E C 2 ([a,b]) solves
v ff = >"F(x, v, v'), { v(a) = v(b) = 0, for some>.. E [0,1], then
IlvIIC2([a,bj) < T.
ESTIMATE OF Ilvll oo . Let Xo be a maximum point for v 2(x). We may assume xo E]a,b[, otherwise v == 0; therefore we have v' (xo) = 0 and 02:
::2
v 2 (x)lx=XQ = 2v'2(xO)
the assumption (i) then implies
+ 2v(xo)v
Iv(xo)1 : : :
ff
(xo) = >..v(xo)F(xo, v(xo), 0);
M, hence
Ilvll oo : : :
M.
ESTIMATE OF Ilv'lloo. Let fJ- be a positive constant and let A and B be bounds for a(x, u) and b(x, u) when x E [a, b] and lu(x)1 : : : M. Multiplying the equation for v by e-j.£v we find hence
(v'e-j.£V)'::::: >..Av,2 e -j.£v
+ >..Be-j.£v -
fJ- v ,2 e -j.£v
::::: >..Be-j.£v
if fJ- 2: A. Similarly, multiplying the equation for v by ej.£v, we find 1
S.N. BERNSTEIN, SUT les equations du calcul des variations, Ann. Sci. Ec. Norm. Sup. Paris 29 (1912) 481-485.
424
11. Some Applications
Sla UIIE CLASSl D'tOU.HIOKS FOIlCTIOUlLLES
IVA. ,aIDHOLIf
I~.
.......
IIIIM1q.-
tmQ.ll:l
1
-P' ._ "" ~ 4. "" Ip'"n• .u.r- i r'q-.tlllD _ .
bl&. I " .
. . . - .... fODdiG. ,t_1 de IIIaIIIbI
!II.-,tl,r"l., - ,(.)
(0)
'("I,l 1M #r.) "-t d.o
~ ...... A_a. & . . . . ~ mil ~ ... ..... Iqu..Wa ~"'t iI pIAlt .. 'I'Illt ~\I iI,...-iuI"~ C'_pnJ'_'l_p~d'.."....r~
Ib. ,-,*...u. (a) _ "..... ~ ........ DuIoI . . . NWo,lt . . .•... 1IhWa.. IUiilo dot l'''t-. . ~ ~
,.,.._,.....s.1in .. l'..uda
,(.) +/
(bJ
It-,"',!»" - ,(.~
'tuJ:.~I"AI'4q""''''''''
r.. .r~ .. ~ i.IltrodJli....... H.. ~ (~,'hf.).)/(,t.Il.. i~(.)" r...tJ~~).''=tlt .(.) + /fI.o,'Ir!'l't - ",••
1-)
j,q,-uo. 'l~ • ~ • • 1'.lIOII (_) • r--at .. _ 0 AIM£ III IlI
-_ ............ _I W..... I. I.,
llI~lu,,"uulhr ...
~
'h) 11II 0........ _
,lhn
Figure 11.5. Ivar Fredholm (1866-1927) and a page from one of his papers.
if Jl ~ A. Since v' vanishes at some point in la, b[, integrating we deduce for all x E [a, b] _>..Be- JLM (b - a) ~ v' eJLV ~ >"Be JLM (b - a),
therefore
IIv'lloo ~ c(A,B,M) since IIvll oo ~ M by step (i). IIv"lloo. This is now trivial, since from the equation we have
ESTIMATE OF
[v"(x)1 ~ >..[F(x,v(x),v'(x))1 ~ c(M),
F being continuous in [a, b] x [0, M] x [0, c(A, B, M)].
D
11.4 Linear Integral Equations 11.4.1 Some motivations In several instances we have encountered integral equations, as convolution operators or, when solving linear equations, as integral equations of the type
x(t) = Yo
+
it
f(s,y(s))ds;
to
for instance, the linear system x'(t) = A(t)x(t) can be written as
11.4 Linear Integral Equations
X(t) =
it
A(s)x(s) ds.
425
(11.26)
to
(11.26) is an example of Volterra's equation, Le., of equations of the form
J t
f(t) = ax(t)
+
(11.27)
k(t, T)X(T) dT.
o a. Integral form of second order equations The equation x"(t) - A(t)x(t) = 0, t E [a,,8], can be written as a Volterra equation. In fact, integrating, we get
x'(t) = Cl
+
it
A(s)x(s) ds
to
and, integrating again,
x(t) = Co + Cl(t - to)
+
it (iT it to
= Co + Cl (t - to) + =:
F(t)
+
it
A(s)x(s) dS) dT
to
(t - s)A(s)x(s) ds dT
(11.28)
to
(t - s)A(s)x(s) ds,
to
with F(t) := Co
+ Cl(t - to)
and G : [a,,8] x [a,,8]
G(t,s):=
{~t-S)A(S)
----+
~
given by
if s ::; t, otherwise.
b. Materials with memory Hooke's law states that the actual stress a is proportional to the actual strain E. At the end of 1800, Boltzmann and Volterra observed that the past history of the deformations of the body cannot always be neglected. In these cases the actual stress a depends not only on the actual strain, but on the whole of the deformations the body was subjected to in the past, hence at every instant t
a(t) = aE(t)
+ F[E(T)&J,
where F is a functional depending on all values of E(T), 0< T < t. In the linear context, Volterra proposed the following analytical model for F, t
F[E(T)&]:=
J
k(t,T)E(T)dT.
o
426
11. Some Applications
This leads to the study of equations of the type
J t
= aE(t) +
a(t)
k(t,T)E(T)dT,
o that are called Volterra's integral equations of first and second kind according to whether a = 0 or a =/:. O. c. Boundary value problems Consider the boundary value problem X" -
A(t)x
= 0, (11.29)
x(O) = a, {
x(L)
= b.
From (11.28) we infer x(t)
= Cl + C2t + i t (t - s)A(s)x(s) ds
and, taking into account the boundary conditions, Cl
= a,
b- a =- - -IlL (L - s)A(s)x(s) ds,
C2
L
L
0
we conclude that x(t)
= a + b - at _ L
~ L
l
[\L _ s)A(s)x(s) ds
10
+
[t (t - s)A(s)x(s) ds
10
tS (L-t) b-a lLt(L-S) =a+--tA(s)x(s)dsA(s)x(s)ds. L a L t L In other words, x(t) solves (11.29) if and only if x(t) solves the integral equation, called Fredholm equation, x(t)
= F(t) +
l
L G(t, s)x(s) ds
where F(t) := a + bLat and G : [0, L] x [0, L] -- lR is given by s(L - t) G(t,s):=
{
t(L;S)
se s ::; t, se t ::; s.
11.4 Linear Integral Equations
~
A
Co
427
B
Figure 11.6. An elastic thread.
d. Equilibrium of an elastic thread Consider an elastic thread of length £ which readily changes its shape, but which requires a force c d£ to increase its length by d£ according to Hooke's law. At rest, the position of the thread is horizontal (the segment AB) under the action of the tensile force To which is very large compared to any other force under consideration. If we apply a vertical force p at C for which x = ~, the thread will assume the form in Figure 11.6. Assume that 8 = CCo be very small compared to ACo and CoB (as a consequence of the smallness of p compared with To) and, disregarding terms of the order 82 (compared with e), the tension of the thread remains equal to To. Then the condition of equilibrium of forces is
8 T o~
+ T o-~- =p
Le.,
l- ~
8 = p(l - ~)~. Tol
Denoting by y(x) the vertical deflection at a point of abscissa x, we have
y(x) where
G(x,Op
=
x(l - ~)
G(x,~):=
{
(l
,!:o~)~ Tol
Now suppose that a continuously distribuited force with length density acts on the thread. By the principle of superposition the thread will assume the shape
p(~)
I
y(x) =
J
G(x, ~)p(~) d~.
(11.30)
o If we seek the distribution density p(~) so that the thread is in the shape y(x), we are led to study Fredholm's integral equation in (11.30).
e. Dynamics of an elastic thread Suppose now that a force, which varies with the time t and has density at ~ given by p(~) sinwt, w>O,
428
11. Some Applications
acts on the thread. Suppose that during the motion the abscissa of every point of the thread remains unchanged and that the thread oscillates according to
y = y(x) sinwt. Then we find that at time t the piece of thread between ~ and acted upon by the force p(~) sin(wt)~~ plus the force of inertia d2 y
-p(~)~~ dt 2
~
+ ~~
is
= p(~)Y(X)W2 sinwt~~,
where ~ is the density of mass of the thread at ~, and the equation (11.30) takes the form
JG(x,~)[p(~)sinwt+w2p(~)y(~)sinwt]d~. I
y(x)sinwt=
(11.31)
o If we set
J I
G(x, ~)p(~) d~
G(x, Op(~)
f(x),
=:
=:
k(x, ~),
o (11.31) takes the form of Fredholm equation I
y(x)
= >..
J
k(x, Oy(~) d~ + f(x).
(11.32)
o 11.56 'If. Show that, if in (11.32) we assume p(~) constant and solves yll(X) +w2cy(x) = f"(x),
f smooth, then y(x)
(11.33)
y(O) = 0,
{
y(l) = 0,
where c = piTa. Show also that, conversely, if y solves (11.33), then it also solves (11.32). 11.57
'If. In the case
p = const, show that the unique solution of (11.33) is I
SinpX! y(x) = - -1 .f" (~) sinp(l,~) dE. p slllpi
if sinpl
¥
a
x
+ -p1 ! f" (~) sinp(x -~) dE. a
0, p := w"jC. Instead, if sinpA = 0, i.e., p = Pk where k1r
Pk:=
T'
then (11.33) is solvable if and only if
k E OZ,
11.4 Linear Integral Equations
_.
429
OPERE MATEMATICHE Mrmoric:
....
It
Noa.
r\I.IUCAn: .. Q1.u. ....·~"~OIIUJ11t&1 NL
(Q(WA,~ JtA.lIOlAlZ DIU.&
alCtlQlC
v_ _ 1111·''''
Figure 11.7. Vito Volterra (1860---1940) and the frontispiece of the first volume of his collected works.
! I
!,,(€)sin1J.(1
-~) ~ =
0
o equivalently, iff I
! f(€)sin~~
=
o.
o In particular, if f(x) = 0 and 1J. = 1J.k, all solutions are given by
y(x) = CSin1J.kX
CElR
and the natuml oscillations of the thread are given by y = CSin1J.kXsinwkt.
Compare the above with the alternative theorem of Fredholm in Chapter 10.
11.4.2 Volterra integral equations A linear integral equation in the unknown x(t), t E [a, b] of the type
J b
x(t)
=
f(t) +
k(t, T)X(T) dT
a
where f(t) and k(t,x) are given functions, is called a Fredholm equation of second kind, while a Fredholm equation of the first kind has the form
430
11. Some Applications
J b
k(x, T)X(T) dT
=
f(t).
a
The function k(t, T) is called the kernel of the integral equation. If the kernel satisfies k(t, T) = 0 for t > T, the Fredholm equations of first and second kind are called Volterra equations. However it is convenient to treat Volterra equations separately.
11.58 Theorem. Let k(t, T) be a continuous kernel in [a, b] x [a, b] and let f E CO ([a, b]). Then the Volterra integral equation
x(t)
= f(t) + A
it
k(t,T)X(T)dT
has a unique solution in CO([a, bJ) for all values of A. Proof. The transformation b
T[xJ(t):= f(t)
+ A! k(t,r)x(r)dr a
maps CO([a, bJ) into itself. Moreover for all t E [a, b] we have
IT[Xl](t) - T[x2J(t)1 ~
IAI M(t
-
a)llxl -
(t
a)2
x21Ioo,[a,bj
hence
2 IT 2 [xl](t) - T [X2](t)1 ~ and by induction, if Tn := To· ..
0
IAI 2 M 2 ~llxl -
X2[[oo,[a,bj
T n times,
If n is sufficiently large, so that lAin M n (b - a)n In! < 1, we conclude that Tn is a contraction, hence it has a unique fixed point x E CO ([a, bJ). If n = 1 the proof is done, otherwise Tx is also a unique fixed point for Tn, so necessarily we again have Tx = x by uniqueness. 0
11.4.3 Fredholm integral equations in CO 11.59 Theorem. Let k(t, T) be a continuous kernel in [a, b] x [a, b] and let f E CO([a, bJ). The Fredholm integral equation
J b
x(t) = f(t)
+A
k(t, T)X(T) dT
a
has a unique solution x(t) in CO([a,bJ), provided
\,\1
is sufficiently small.
11.5 Fourier's Series
431
Proof. Trivially, the transformation b
T[x](t) := f(t)
+A
J
k(t, T)X(T) dT
a
maps CO([a, b]) into itself and is contractive for A close to zero, in fact, if M .max [k(t, T)[,
J b
IT[xl](t) - T[x2](t)1 ::;
IAI
Ik(t,T)llxl(T) - X2(T)\ dT
a
::; 1>.1 M(b -
a) [[Xl(t) - x2(t)lloo,[a,bJ
1
< 2"ll x l(t) - x2(t)lloo,[a,bj if
IAI M(b -
a)
< 1/2.
o
In order to understand what happens for large A, observe that the transformation
J b
T[x](t):= f(t)
+
k(t,T)x(T)dr
a
is linear, continuous and compact, see Example 9.139. The theorem in Remark 10.72 then yields the following.
Riesz~Schauder
11.60 Theorem. Let k(t, T) E CO([a, b] x [a, b]) and f E CO([a, b]). The equation b
AX(t)
= f(t) +
J
k(t, r)x(r) dr
(11.34)
a
has a set of eigenvalues A with the only accumulation point A = O. Each eigenvalue A =I- 0 has finite multiplicity and for any A, A =I- 0 and A tf- A, (11.34) has a unique solution.
Further information concerning the eigenvalue case requires the use of a different space norm, the integral norm II 112, and therefore a description of the completion L 2 ((a, b)) of CO((a, b)) that we have not yet treated.
11.5 Fourier's Series In 1747 Jean d'Alembert (1717-1783) showed that the general solution of the wave equation cPu
ot2
2f)2u = a
ox 2 '
(11.35)
432
11. Some Applications
THEORIE UlALrrlQUl!:
Dar tellbo.rkeit einer Function
DE LA CHALEUR,
durdJ eine bigooometrisehe Reihe
P.. M. FOURIER.
B. Rl.m ....
). PARIS, CHEZ, FIJ\IIlIUC OIDOT, .illB aT PILS, ~~'-U
n
I'1lDu.""'.~·~.T""'IUQ1tI n1
u.a. •• ·~
II.,
.822..
Figure 11.8. Frontispieces of two celebrated works by Joseph Fourier (1768-1830) and G. F. Bernhard Riemann (1826-1866).
that transforms into
fpu
-=0
aras
by the change of variables r
u(t, x)
= x + at, s = x - at, is given by = cp(x + at) + 1j1(x - at),
where cp and 1j1 are, in principle, generic functions. Slightly later, in 1753, Daniel Bernoulli (1700-1782) proposed a different approach. Starting with the observation of Brook Taylor (1685~1731) that the functions
. (n1l"x) (n1l"a(tg - (3)) , sm -g- cos
n = 1,2, ...
(11.36)
are solutions of the equation (11.35) and satisfy the boundary conditions = u(t, g) = 0, Bernoulli came to the conclusion that all solutions of (11.35) could be represented as superpositions of the tones in (11.36). An outcome of this was that every function could be represented as a sum of analytic functions, and, indeed,
u(t, 0)
L
~ 00 sin(2n + l)x 11" n=O 2n + 1
=
{I
0 -1
if 0 < x < 11",
= 11", if 11" < x < 211". if x
Bernoulli's result caused numerous disputes that lasted well into the nineteenth century that even included the notion of function and, eventually,
11.5 Fourier's Series
433
was clarified with the contributions of Joseph Fourier (1768-1830), Lejeune Dirichlet (1805-1859), G. F. Bernhard Riemann (1826-1866) and many other mathematicians. The methods developed in this context, in particular the idea that a physical system near its equilibrium position can be described as superposition of vibrations and the idea that space analysis can be transformed into a frequency analysis, turned out to be of fundamental relevance both in physics and mathematics.
11.5.1 Definitions and preliminaries We denote by L~1I" the space of complex-valued 2Jr-periodic functions in ~ that are summable on a period, for instance in [-Jr, Jr]. For k E Z, the kth Fourier coefficient of f E L~1I" is the complex number
111" f(t)e-'.kt dt 2Jr -11"
Ck = Ck(J) := - 1
often denoted by
f1 k)
or
rk.
11.61 Definition. The Fourier nth partial sum of f E L~1I" is the trigonometric polynomial of order n given by n
Snf(x):=
L
Ck eikx ,
x E~,
111" f(t)e- ikt dt. 2Jr -11"
Ck = - 1
k=-n
The Fourier series of f is the sequence of its Fourier partial sums and their limit
= Sf(x) =
n
'"' Ckeikx.- lim Snf(x) =
L...J
n--+oo
k=-oo
lim '"' Cke,kx n-+oo L..-J k=-n
If f E L~1I" is real-valued, then Vk E Z
since f(t)
=
f(t) and
i:
f(t)e
ikt
dt
=
i:
f(t)e- ikt dt
=
i:
f(t)e- ikt dt.
The partial sums of the Fourier series of a real-valued function have the form n
S n f( X )
= Co +
'"'( L..-J Cke ikx k=l
n
+ -Cke -ikx)
\0(2 cke ikx) , = Co + '"' L..-J::n k=l
434
11. Some Applications
Figure 11.9. The Dirichlet kernel with n = 5. Observe that the zeros of Dn(t) are equidistributed Xn := 2~~lj, j i' 2k1r, k E Z.
thus, decomposing Ck in its real and imaginary parts, Ck =: (ak - ib k )/2, that is, setting
ak
1/11"
:= -
1r
f(t) cos(kt) dt,
1/11" bk :=;: -11" f(t) sin(kt) dt,
-11"
we find the trigonometric series
Snf(x) = ~o
n
+ L ~((ak -
ibk)(cos(kx)
+ i sin(kx))
(11.37)
k=l
However, the complex notation is handier even for real-valued functions. 11.62'. Show that the operator ~ mapping every function in L~7l" into the sequence of its Fourier coefficients, 1---> {nk)}, has the following properties: it is linear (Af + J1.gnk) = >..nk) + J1.91k) V>..,J1. E iC, V/,g E L~7l"' (fg),- = (r* gl(k), see Proposition 4.46, (f * gnk) = nk)glk), see Proposition 4.48, if g(t) = I(-t), then glk) = n-k), if g(t) = I(t - 'P), then glk) = e-ik
11.5 Fourier's Series
435
a. Dirichlet's kernel The Dirichlet kernel or order n is defined by
Dn(x) := 1 + 2
n
n
k=l
k=-n
L cos(kx) = L
xER
As we have seen in Section 5.4 of [GM2], Dn(t) is a trigonometric polynomial of order nand 2n-periodic, Dn(t) is even,
-1
fO
2n -1r
Dn(t) dt = - 1
2n
i
1r
1 Dn(t) dt = -2'
0
and if t
2n+ 1 Dn(t) =
= 2kn, k E Z,
sin((n + 1/2)t)
if t =I- 2k7r. sin(t/2) The Fourier coefficients of {D n (t)} are trivially
{
Therefore it is not surprising that we have the following. 11.63 Lemma. For every
Snf(x) = -1
2n
i
f
E L~1r(IR) we have
1r (f(x
+ t) + f(x
VxER
- t))Dn(t) dt
0
Proof. In fact
Snf(x) =
t
qe ikx =
~ J1r 27l"
k=-n
f(t)eik(X-t) dt =
~ J1r 27l"
-1r
1 J1r f(t)Dn(t - x) dt = - 1 J1r-X f(x -1r 27l" -1r-X
= 27l"
= -1 J1r f(t + x)Dn(t) dt = -1 27l"
27l"
-1r
1 1r
(I(x
0
f(t)Dn(x - t) dt
-1r
+ t)Dn(t) dt
+ t) + f(x -
t»Dn(t) dt,
where we used, in the fourth equality, that Dn(t) is even and in the second to last equality that for a 27l"-periodic function we have
i
a
27r
+
1r
u(t) dt = [
1r
u(t) dt
\;fa E JR.
o Finally we explicitly notice that, though D1r Dn(t) dt
I:
= 2n,
we have
[Dn(t)[ dt = O(logn).
This prevents us from estimating the modulus of integrals involving Dn(t) by estimating the integral of the modulus.
436
11. Some Applications
ERIE TRIGONOMETRIOHE
IlEnl 1.E:8IsGOz..
PAilS. ""tn1lLU..VI1~...1l
UI'lllJllltIJ\oU4JW1IJI
Figure 11.10. The frontispieces of two volumes on trigonometric series by Henri Lebesgue (1875-1941) and Leonida Tonelli (1885-1946).
11.5.2 Pointwise convergence If P is a trigonometric polynomial, PEPn,21r' then P agrees with its Fourier series, P(x) = L:~=-n Ckeikx \:Ix E JR, see Section 5.4 of [GM2]. But this does not hold for every f E L~1r' Given f E L~1r' we then ask ourselves under which assumptions on f the Fourier series of f converges and converges to f. a. The Riemann-Lebesgue theorem The theorem below states that a rapidly oscillating function with a summable profile has an integral that converges to zero when the frequency of its oscillations tends to infinity, as a result of the compensation of positive and negative contributions due to oscillations, even though the L 1 norms are far from zero. 11.64 Theorem (Riemann-Lebesgue). Let f :]a, b[-+ JR be a Riemann summable function in la, b[. For every interval ]c, d[C]a, b[ we have as [,\1
-+ 00
uniformly with respect to c and d. Proof. (i) Assume first that f is a step function, and let a := {xc = a, XI, be a subdivision of la, b[ so that f(x) = ak on [Xk-l, Xk]. Then
... , X n
= b}
11.5 Fourier's Series
437
This proves the theorem in this case. (ii) Let I be summable in la, b[ and E > O. By truncating I suitably, we find a bounded Riemann integrable function h. such that I/(t) - h.(t)1 dt < E, and in turn a step function gf : (a, b) ~ lR with 2E and from
J:
J:
Ihf(t) - gf(t)1 dt
< E.
Consequently
J:
I/(t) - g.(t)! dt
<
we infer
lid
l(t)e iAt dtl ::;
lid
: ; lid
g.(t)e iAt dtl g.(t)e
iAt
dtl
+
lb
If(t) - g.(t)! dt
+ E. o
The conclusion then follows by applying part (i) to g•.
11.65 Corollary. Let f be Riemann summable in la, b[. Then
l f(s)sin((n+~)s)ds--*O d
uniformly with respect to the interval
as n
--* 00
lc, d[C]a, b[.
11.66'. Show the following. Proposition. Let
I
E L~7l"
Then we have asn~oo
lor every 8 > O. 11.61'. Show Theorem 11.64 integrating by parts if I is of class C 1 ([a, b]). 11.68'. Let I E L~7l' and let {Ck(f)} be the sequence of its Fourier coefficients. Show that !Ck(f)1 ~ 0 as k ~ ±oo.
b. Regular functions and Dini test 11.69 Definition. We say in this context that f E L~7l' is regular at x E IR if there exist real numbers L±(x) and M±(x) such that lim f(x
t~O+
lim f(x t~O+
+ t) = L+(x), + t) t
L+(x)
lim f(x+t) =L-(x),
(11.38)
t~O-
= M+(x),
lim f(x t--->O-
+ t) t
L+(x)
= M-(x).
438
11. Some Applications
Of course, if f is differentiable at x, then f is regular at x with L±(x) = f(x) and M±(x) = f'(x). Discontinuous functions with left and right limits at x and bounded slope near x are evidently regular at x. In particular square waves, sawtooth ramps and C 1 functions are regular at every x E R It is easy to see that if f is regular at x then the function t.p x (t ) .'-
f(x
+ t) + f(x - t) -
L+(x) - L-(x)
(11.39)
~--'---=--"----'-------=-'---'---'-
t
is bounded hence Riemann integrable in ]0, 1T]. 11.70 Definition. We say that a 21T-periodic piecewise-continuous map f : lR. ~ C is Dini-regular at x E lR. if there exist real numbers L±(x) such that
1I
71" f (x
+ t) + f (x - t) -
I t < +00.
L + (x) - L - (x) d
t
a
(11.40)
11.71 Theorem (Dini's test). Let f E L~7I"(lR.) be Dini-regular at x E lR. and let L + (x), L - (x) be as in (11.40). Then Snf(x) ~ (L + (x) + L - (x)) /2. Proof. We may assume that x E [-71", rr]. Since 2~ f~" Dn(t) dt = 2~ f o" Dn(t) dt = 1/2, we have Snf(x) - L+(x) + L -(x) = 2
~ f" (f(x + t) + f(x 2rr
= -1 2rr
io
1"
- t) - L + - L -)Dn(t) dt (11.41)
'Px(t)tDn(t)dt
0
where 'Px(t) is as in (11.39). Set h(t) := 'Px(t) sintt/2)' so that Ih(t)1 S; rr l'Px(t)1 in [0, rr] and consequently h(t) is summable. Since 'Px(t) t Dn(t) = h(t) sin((n + 1/2)t), (11.41), the Corollary 11.65 yields
Snf(x) -
L+(x)
+ L-(x) 2
1"
1 = h(t) sin((n 2rr 0
+ 1/2)t) dt -+ O. o
In particular, if f is continuous, 21T-periodic and satisfies the Dini condition at every x, then Snf(x) ~ f(x) \:Ix E lR. pointwise. 11.12 Example. Let 0 < a S; 1 and A C lR. Recall that a-Holder-continuous if there exists K > 0 such that
f :A
-+
IR is said to be
'7x,y EA.
If(x) - f(y)1 S; Klx - ylO
We claim that a 2rr-periodic a-Holder-continuous function on [a, b] satisfies the Dini test at every x E]a, b[. In fact, if 8 = 8x := min(lx - ai, Ix - bl), then
~" If (x + t) < 2K
-
I
f (x) : f (x - t) - f (x) dt S;
r" C Ho dt + 41Iflloo,[a,bl
io
8
~" ... dt +
< +00.
I." ... dt
11.5 Fourier's Series
11.73~. Show that the 27r-periodic extension of continuous.
/ft1,
439
t E [-7r,7r] is 1/2-H61der-
11. 74 Example. Show that, if f is continuous and satisfies the Dini test at x, then L+(x) = L-(x) = f(x). 11.75~. Show that the 27r-periodic extension of f(t) := 1/1og(1/ltl), t E [-7r,7r] does not satisfy the Dini test at O.
11.5.3 L 2 -convergence and the energy equality a. Fourier's partial sums and orthogonality Denote by
11/112 the quadratic mean over a period of I
II/II~ := 2~
I:
2
I/(t)1 dt,
and with L~1r the space of integrable functions with mitian bilinear form and the corresponding "norm"
(fIg) := - 1 /1r I(t)g(t) dt, 21r -1r
11/112:=
11/112 < 00.
1 /1r ( 21r -1r
2
I/(t)1 dt
The Her-
)
1/2
,
are not a Hermitian product and a norm in L~1r' since 11/112 = 0 does not imply I(t) = 0 'tit, but they do define a Hermitian product and a norm in L~1r n O°(lR), since 11/112 = 0 implies I = 0 if I is continous. Alternatively, we may identify functions I and g in L~1r if III - gl12 = 0, and again (fIg) and 11/112 define a Hermitian product and a norm on the equivalence classes of L~1r if, as it is usual, we still denote by L~1r the space of equivalence classes. It is easily seen that L~1r is a pre-Hilbert space with (fIg). Notice that two nonidentical continuous functions belong to different equivalence classes. Since eikx , k E Z, belong to L~1r and
we have the following.
11.76 Proposition. The trigonometric system {e ikt IkE Z} is an orthonormal system in L~1r.
440
11. Some Applications
Since
we have
n
Snf(x)
=
L
Uleikx)eikx,
x
E
JR.,
k=-n i.e., the Fourier series of f is the abstract Fourier series with respect to the trigonometric orthonormal system. Therefore the results of Section 10.1.2 apply, in particular the Bessel inequality holds 00
L
k=-oo
ICkl 2 ~ Ilfll~
as well as Proposition 10.18, in particular
Ilf -
Snfl12 ~ Ilf - Pl12
VP E
Pn,2n.
Recall also that for a trigonometric polynomial P E P n ,27r the Pythagorean theroem holds
b. A first uniform convergence result 11. 77 Theorem. Let f E 0 1 (JR.). Then Sn f
--t
f uniformly in R
Proof. Since Snf(x) -> f(x) "Ix, it suffices to show the uniform convergence of Snf· We notice that f' E L~1r and that, by integration by parts, Vk E Z, hence , if k =1= 0, Icd!)1 ::;
ick~')1 ::; Ick(J')1 2 + k12
where we have used the inequality labl ::; a 2 + b2 . Since 2":%"=-00 ICk(J')12 converges by Bessel's inequality, we therefore conclude that 2:%"=-00 ICk(J)1 converges, consequently
converges absolutely in C°(lR) since Ileikxlloo,1R = 1 Vk.
11.78'. Let as
Ikl-> 00.
f
E cn(lR) and let {cd be its Fourier coefficients. Show that
o
knlckl -> 0
For stronger results about uniform convergence of Fourier series see Section 11.5.4.
11.5 Fourier's Series
....
441
ZYOKti~D
TRIGO OMETRIC ERIE "'OLOWE I
AMBRIDGE
Figure 11.11. Antoni Zygmund (19001992) and the frontispiece of the first edition of volume I of his Trigonometric Series.
U
TU. UXI"IB'ITY PtUr:1I lUI
c. Energy equality We have, compare Chapter 9, the following. 11.19 Lemma. C 1 (JR) n L~1l" is dense in L~1l"' Proof. Let I E L~1l" and € > O. There is a Riemann integrable function h. with h. [[2 < € and a step function k. in [-7r, 7r] such that Ilk. II :S M. and Ilh. - k. Ih :S (7r€2)/M. where M. := [[h.[loo, consequently
III -
IIh. - k.lb2 =
-1
27r
/11" -11"
Ih. - k.1 2 dt:S -1
2M. /11" 27r_11"
Ih. - k.ldt:S € 2•
First, approximating k. by a Lipschitz function, then smoothing the edges, we find /. E Cl([-7r,7r]) with Ilk. -/.llz < L Finally we modify I. near 7r and -1r to obtain a new function 9. with 9.( -7r) = 9.(7r) = 9'( -7r) = g'(7r) = O. Extending g. to a periodic function in JR, we finally get 9. E Cl(JR) n L~11" and III - 9<112 < 4f. 0
Now we can state the following. 11.80 Theorem. For every f E L~1l" we have IISnf - fl12 ~ O. Therefore, the trigonometric system {e ikx }, k E Z, is orthonormal and complete in L~1l"; moreover, for any f E L~1l" the energy equality or Parseval's identity holds:
Proof. Given I E L~11" and € > 0, let 9 E Cl (JR.) n L~11" be such that III - 9112 < L Since Sn9 is a trigonometric polynomial of order at most n, and Snl is the point of minimal L~11" distance in L~11" from I we have
442
11. Some Applications
and the claim follows since Ilg stated in Proposition 10.18.
-
Snglloo -- 0 as n --
00.
The rest of the claim is now 0
11.81 ~. Show that, if the Fourier series of f E L~71' converges uniformly, then it converges to f. In particular, if the Fourier coefficients Ck of f satisfy
+00
L
ICkl < +00,
k=-oo then f(x) =
L:t:'-oo Cke ikx
in the sense of uniform convergence in R
11.5.4 Uniform convergence a. A variant of the Riemann-Lebesgue theorem Let us state a variant of the Riemann-Lebesgue theorem that is also related to the Dirichlet estimate for the series of products. 11.82 Proposition (Second theorem of mean value). Let f and 9 be Riemann integrable functions in la, b[. Suppose moreover that f is not decreasing, and denote by M and m respectively, the maximum and the minimum values of x ---+ g(t) dt, x E [a, b]. Then we have
J:
mf(b):S
l
b
f(t)g(t)dt:S Mf(b).
In particular, there exists c E]a, b[ such that
l
b
=
f(t)g(t) dt
f(b)
l
b
g(t) dt.
Proof. Choose a constant d such that g(t)+d > 0 in la, b[. If f is differentiable, the claim follows easily integrating by parts f(t)(g(t)+d) dt. The general case can be treated by approximation (but we have not developed the correct means yet) or using the formula of summation by parts, see Section 6.5 of [GM2]. For the reader's convenience we give the explicit computation. Let U = {xo = a,Xl, ,Xn = b} be a partition of [a,b]. Denote by ~k the interval [Xk-l,Xk] and set Uk := L:k=l f(Xk)(Xk - xk-d. We have
J:
00.
fa
Ei n
b
f(t)(g(t)
+ d) dt =
f(t)(g(t)
+ d)) dt ~
k
E n
f(Xk)(G(Xk-l) - G(Xk)
n-l
= f(Xl)G(XO)
+L
G(Xk)(f(Xk+d - f(Xk))
k=l
~ M (f(xd +
n-l
L (f(xk+d -
k=l
= Mf(b)
+ dUk·
f(Xk))
+ Mk
+ Mk
+ dUk
11.5 Fourier's Series
SERlIl
ALTRE
])I
443
FQUIIIEII
RAPPR[S~mlIDM AmITItH~
fUlIllONI 01 UNA VARlABlL£ REAtE ULlSS£ DINI
PIs•
.... Figure 11.12. Ulisse Dini (1845-1918) and the frontispiece of his Serie di Fourier.
Since
Uk --->
J: g(t) dt
as k
---> 00,
we infer
lab f(t)g(t) dt :s:; M f(b). Similarly, we get
J: f(t)g(t) dt
~ mf(b). The second part of the claim follows from the g(t) dt is continuous. 0
intermediate value theorem since
J:
From the Riemann-Lebesgue lemma, see Exercise 11.66, for any > 0 we have for every fixed x
f
E
L~1I' and 8
111' f(x + t)Dn(t) dt ---> 0
as n
---> 00.
For future use we prove the following.
11.83 Proposition. Let
f
E L~1I'
and 8> O. Then
111' f(x + t)Dn(t) dt ---> 0
as n
---> 00
uniformly in x E R Proof. Since 1/ sin(t/2) is decreasing in ]0, 7l'], the second theorem of mean value yields = ~(x) E [8,7l'] such that
~
r
Jo
f(x
On the other hand,
+ t)Dn(t) dt = -._1_ sm(8/2)
r~ f(x + t) sin((n + 1/2)t dt. Jo
11. Some Applications
444
r{ J(x
JJ
+ t) sin((n + 1/2)t dt =
= cos((n
+ 1/2)x)
l
{+x
r{+x J(t) sin((n + 1/2)(t - x)) dt
JJ+x
J(t) sin((n + 1/2)t) dt
J+x
- sin((n + 1/2)x)
l
{+x
J(t) cos((n + 1/2)t) dt
J+x
and the last two integrals converge uniformly to zero in [-11",11"], see Exercise 11.62. Thus JJ" I(x + t)Dn(t) dt -; 0 uniformly in [-11", 1I"J, hence in JR.. 0
b. Uniform convergence for Dini-continuous functions Let f E CO,a(lR)nL~1I" be a 27f-periodic and a-Halder-continuous function. It is easy to see that f is continuous and Dini-regular at every x ERIn fact, if 0 = Ox := min(lx - ai, Ix - bj), then
111" If(x + t) -
1°...
I
f(x) : f(x - t) - f(x) dt S
dt
+ 111" ... dt
S 2K 111" cl+ a dt + 41Ifllt[a,bl < +00. Therefore Snf(x) ~ f(x) \:Ix E lR by the Dini test theorem, Theorem 11.71. We have the following.
11.84 Theorem. If $f$ is $2\pi$-periodic and of class $C^{0,\alpha}(\mathbb{R})$, $0 < \alpha \le 1$, then $S_nf(x) \to f(x)$ uniformly in $\mathbb{R}$.

Proof. Let $\delta > 0$ to be chosen later. We have
\[
S_nf(x) - f(x) = \frac{1}{2\pi}\int_{|t|\le\delta}\big(f(x+t)-f(x)\big)D_n(t)\,dt
+ \frac{1}{2\pi}\int_{\delta\le|t|\le\pi}\big(f(x+t)-f(x)\big)D_n(t)\,dt
=: I_1(\delta,n,x) + I_2(\delta,n,x).
\]
Let $\epsilon > 0$. Since $f$ is $\alpha$-Hölder-continuous there exists $K > 0$ such that
\[
|f(x+t) - f(x)| \le K|t|^\alpha \qquad \forall x \in \mathbb{R},\ \forall t \in [0,2\pi],
\]
hence
\[
|I_1(\delta,n,x)| \le \frac{K}{\pi}\int_0^\delta t^\alpha\,\frac{|\sin((n+1/2)t)|}{\sin(t/2)}\,dt
\le 2K\int_0^\delta t^{\alpha-1}\,dt = \frac{2K}{\alpha}\,\delta^\alpha.
\]
We can therefore choose $\delta$ in such a way that $|I_1(\delta,n,x)| < \epsilon$ uniformly with respect to $x$ and $n$. On the other hand, $|I_2(\delta,n,x)| < \epsilon$ uniformly with respect to $x$ as $n \to +\infty$ by Proposition 11.83, concluding that
\[
|S_nf(x) - f(x)| \le 2\epsilon \qquad\text{uniformly in } x
\]
for $n$ sufficiently large. $\square$
With the same proof we also infer the following.
11.85 Theorem (Dini's test). Let $f \in C^0(\mathbb{R}) \cap L^2_{2\pi}$ be a $2\pi$-periodic and continuous function with modulus of continuity $\omega(\delta)$, $|f(x)-f(y)| \le \omega(\delta)$ if $|x-y| \le \delta$, such that $\omega(\delta)/\delta$ is summable in a neighborhood of $\delta = 0$. Then $S_nf \to f$ uniformly in $\mathbb{R}$.
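Theorems 11.84 and 11.85 are easy to observe numerically. Here is a small sketch, not from the book, for the Lipschitz (hence $C^{0,1}$) triangle wave $f(x) = |x|$ on $[-\pi,\pi]$, whose Fourier series is known in closed form; the grid and the values of $n$ are arbitrary choices of ours.

```python
# A small numerical sketch (ours, not from the book): the triangle wave
# f(x) = |x| on [-pi, pi] is Lipschitz, and its Fourier series is
# pi/2 - (4/pi) * sum over odd j of cos(jx)/j^2; the sup-norm error of the
# partial sums decreases, as Theorems 11.84-11.85 predict.
import numpy as np

x = np.linspace(-np.pi, np.pi, 2001)
f = np.abs(x)

def partial_sum(x, n):
    s = np.full_like(x, np.pi / 2)
    for j in range(1, n + 1, 2):              # only odd harmonics contribute
        s -= (4 / np.pi) * np.cos(j * x) / j**2
    return s

for n in (4, 16, 64, 256):
    print(n, np.max(np.abs(partial_sum(x, n) - f)))
```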
c. Riemann's localization principle

The convergence of Fourier's partial sums is a local property in the following sense.

11.86 Proposition. If $g, h \in L^2_{2\pi}$ and $g = h$ in a neighborhood of a point $x$, then $S_ng(x) - S_nh(x) \to 0$ as $n \to \infty$.

Proof. Assume $f := g - h$ vanishes in $[x-\delta, x+\delta]$, $\delta > 0$. Then, for every $t \in [0,\delta]$ we have $f(x+t) = f(x-t) = 0$, hence
\[
S_nf(x) = S_nf(x) - f(x) = \frac{1}{2\pi}\int_\delta^\pi \big(f(x+t)+f(x-t)\big)D_n(t)\,dt.
\]
Since $(f(x+t)+f(x-t))/\sin(t/2)$ is summable in $(\delta,\pi)$, the result follows from the Riemann–Lebesgue theorem. $\square$

11.87 Proposition. If $f \in L^2_{2\pi}$ and $f = 0$ in $]a,b[$, then $S_nf(x) \to 0$ uniformly on every interval $[c,d]$ with $a < c < d < b$.

Proof. Let us show that $S_nf(x) \to 0$ uniformly in $[a+\delta, b-\delta]$, $0 < \delta < (b-a)/2$. For $x \in [a+\delta, b-\delta]$ and $0 < t < \delta$ we have $f(x+t) = f(x-t) = 0$, hence
\[
S_nf(x) = \frac{1}{2\pi}\int_\delta^\pi \big(f(x+t)+f(x-t)\big)D_n(t)\,dt.
\]
The claim follows from Proposition 11.83. $\square$

The localization principle says that, when studying the pointwise convergence in an open interval $]a,b[$, or the uniform convergence in a closed interval inside $]a,b[$, of the Fourier series of a function $f$, we can modify $f$ outside of $]a,b[$. With this observation we easily get the following.

11.88 Corollary. Let $f \in L^2_{2\pi}$ be a function that is of class $C^1([a,b])$. Then $\{S_nf(x)\}$ converges uniformly to $f(x)$ in any interval strictly contained in $]a,b[$.
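A quick numerical sketch of the localization principle, ours and not the book's: the two sample functions below coincide near $0$ and differ away from it, and the difference of their partial sums at $0$ shrinks with $n$. The quadrature scheme and all parameters are arbitrary choices made for illustration.

```python
# Localization principle, numerically (ours, not from the book): g and h
# coincide on ]-1, 1[ and differ on ]1, pi], and S_n g(0) - S_n h(0) -> 0.
# Fourier coefficients are approximated by trapezoidal quadrature.
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)
g = np.where(np.abs(x) < 1.0, np.cos(x), 0.0)
h = g + np.where(x > 1.0, 1.0, 0.0)           # h = g near 0, h != g on ]1, pi]

def coeff(samples, k):
    return np.trapz(samples * np.exp(-1j * k * x), x) / (2 * np.pi)

def partial_sum_at_zero(samples, n):
    return np.real(sum(coeff(samples, k) for k in range(-n, n + 1)))

for n in (8, 32, 128):
    print(n, partial_sum_at_zero(g, n) - partial_sum_at_zero(h, n))
```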
11.5.5 A few complementary facts

a. The primitive of the Dirichlet kernel

Denote by $G_n(x)$ the primitive of the Dirichlet kernel,
\[
G_n(x) := \int_0^x D_n(t)\,dt.
\]
It is easy to realize that $G_n(x)$ is odd and nonnegative in $[0,\pi]$ and takes its maximum value in $[0,\pi]$ at the first zero $x_n := \frac{2\pi}{2n+1}$ of $D_n$. Thus
\[
\|G_n\|_{\infty,[0,\pi]} = G_n\Big(\frac{2\pi}{2n+1}\Big)
= \int_0^{2\pi/(2n+1)} \frac{\sin((n+1/2)s)}{\sin(s/2)}\,ds
= \frac{2}{2n+1}\int_0^\pi \frac{\sin s}{\sin(s/(2n+1))}\,ds \le 2\pi
\]
independently of $n$, since $\sin(s/(2n+1)) \ge \sin(s)/(2n+1)$ for $s \in [0,\pi]$.
Figure 11.13. The graph of $G_5(x)$ in $[-\pi,\pi]$.
In particular,
\[
\Big|\int_c^d D_n(t)\,dt\Big| \le 2\pi \qquad\text{for all } c,d \in [0,\pi]. \tag{11.42}
\]
Also, by Exercise 11.66, or directly by an integration by parts, it is easily seen that, given any $\delta > 0$, there is a constant $c(\delta)$ such that
\[
|G_n(\pi) - G_n(x)| = \Big|\int_x^\pi D_n(t)\,dt\Big| \le c(\delta)\,\frac{1}{n} \tag{11.43}
\]
for all $x \in [\delta,\pi]$. For future use we now show that
\[
\lim_{n\to\infty} G_n(x_n) = 2\int_0^\pi \frac{\sin s}{s}\,ds. \tag{11.44}
\]
In order to do that, we first notice that
\[
\frac{2}{\pi}\,t \le \sin t \le t, \qquad 0 \le t - \sin t \le \frac{1}{6}\,t^3, \qquad t \in \Big[0,\frac{\pi}{2}\Big],
\]
hence
\[
\Big|\frac{1}{\sin t} - \frac{1}{t}\Big| = \frac{t - \sin t}{t\sin t} \le \frac{\pi}{12}\,t, \qquad t \in\, \Big]0,\frac{\pi}{2}\Big].
\]
Applying this with $t$ replaced by $t/2$ we infer
\[
\Big|\int_0^{x_n} D_n(t)\,dt - \int_0^{x_n} \frac{\sin((n+1/2)t)}{t/2}\,dt\Big|
\le \frac{\pi}{12}\Big(\frac{2\pi}{2n+1}\Big)^2 \to 0 \tag{11.45}
\]
as $n \to \infty$. Equality (11.44) then follows as
\[
\int_0^{x_n} \frac{\sin((n+1/2)t)}{t/2}\,dt = \frac{2}{2n+1}\int_0^\pi \frac{\sin s}{s/(2n+1)}\,ds = 2\int_0^\pi \frac{\sin s}{s}\,ds. \tag{11.46}
\]
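Equality (11.44) can be checked directly on the computer. The following short sketch is ours, not the book's; it simply evaluates $G_n(x_n)$ by adaptive quadrature and compares with $2\int_0^\pi (\sin s)/s\,ds = 2\,\mathrm{Si}(\pi)$.

```python
# Numerical check of (11.44) (ours, not from the book): G_n at its first
# maximum x_n = 2*pi/(2n+1) approaches 2 * Si(pi) = 2 * integral_0^pi sin(s)/s ds.
import numpy as np
from scipy.integrate import quad
from scipy.special import sici

for n in (5, 50, 500):
    x_n = 2 * np.pi / (2 * n + 1)
    G_n, _ = quad(lambda t: np.sin((n + 0.5) * t) / np.sin(t / 2), 0, x_n)
    print(n, G_n)
print("limit:", 2 * sici(np.pi)[0])           # sici returns (Si, Ci); about 3.7038
```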
Figure 11.14. The sawtooth $h(x)$ and its Fourier partial sum of order 5 in $[-\pi,\pi]$.
b. Gibbs's phenomenon

Consider the $2\pi$-periodic function $h$ defined by periodically extending the function
\[
h(t) := \begin{cases} -\pi - t & \text{if } -\pi \le t < 0,\\ 0 & \text{if } t = 0,\\ \pi - t & \text{if } 0 < t \le \pi.\end{cases} \tag{11.47}
\]
Its Fourier coefficients are easily computed to be
\[
c_0 = 0, \qquad c_k := \frac{1}{ik} \quad\text{if } k \ne 0,
\]
hence
\[
S_nh(x) = \sum_{\substack{k=-n \\ k\ne 0}}^{n} \frac{e^{ikx}}{ik} = \int_0^x D_n(t)\,dt - x,
\]
or
\[
S_nh(x) = 2\sum_{k=1}^n \frac{\sin kx}{k}.
\]
In particular $S_nh(0) = 0$ for all $n$, and, by Dini's test, Theorem 11.71,
\[
2\sum_{k=1}^\infty \frac{\sin kx}{k} = h(x) \qquad\text{pointwise in } \mathbb{R}. \tag{11.48}
\]
The energy equality yields
\[
\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \tag{11.49}
\]
As we have already seen, we have the following, of which we give a direct proof.

11.89 Theorem. For any positive $\delta > 0$ the Fourier series of $h$ converges uniformly to $h$ in $[\delta,\pi]$.

Proof. We know that $S_nh(x)$ converges pointwise to $h$, therefore it suffices to show that
\[
\sum_{\substack{k=-\infty \\ k\ne 0}}^{\infty} \frac{e^{ikx}}{ik} \tag{11.50}
\]
converges uniformly in $[\delta,\pi]$. We apply Dirichlet's theorem for series of products, see Section 6.5 of [GM2], respectively, to the series with positive and negative indices with $a_k = 1/(ik)$ and $b_k := e^{ikx}$, to find
\[
|S_nh(x) - h(x)| = \Big|\sum_{|k|\ge n+1} \frac{e^{ikx}}{ik}\Big| \le \frac{4}{|1-e^{ix}|}\,\frac{1}{n+1},
\]
hence
\[
\|S_nh - h\|_{\infty,[\delta,\pi]} \le \frac{4}{|1-e^{i\delta}|}\,\frac{1}{n+1} \to 0 \qquad\text{as } n \to \infty.
\]
Alternatively, from (11.48) we infer
\[
S_nh(x) - h(x) = \int_0^x D_n(t)\,dt - \pi = -\int_x^\pi D_n(s)\,ds
\]
and, by (11.43),
\[
|S_nh(x) - h(x)| \le c(\delta)\,\frac{1}{n} \qquad\text{uniformly in } [\delta,\pi]. \qquad\square
\]
However, the Fourier series of $h$ does not converge uniformly in $[0,\pi]$.
11.90 Proposition. We have
\[
\|S_nh\|_{\infty,[0,\pi]} \to 2\int_0^\pi \frac{\sin s}{s}\,ds.
\]
Proof. Let $y_n$ be a point where $S_nh(x)$ attains its maximum value in $[0,\pi]$,
\[
M_n := \sup_{[0,\pi]} S_nh(x) = S_nh(y_n),
\]
and let $x_n := \frac{2\pi}{2n+1}$. Since $x_n$ is the maximum point of $G_n(x)$ and $S_nh(x) = G_n(x) - x$, we have
\[
G_n(y_n) - x_n \le G_n(x_n) - x_n = S_nh(x_n) \le S_nh(y_n) = G_n(y_n) - y_n \le G_n(x_n) - y_n.
\]
This implies $0 \le y_n \le x_n$ and $-x_n \le S_nh(y_n) - G_n(x_n) \le -y_n$, hence
\[
|M_n - G_n(x_n)| \le x_n = \frac{2\pi}{2n+1}. \tag{11.51}
\]
The conclusion follows from (11.44). $\square$
We can rewrite the statement in Proposition 11.90 as
\[
\|S_nh\|_{\infty,[0,\pi]} \to 2\int_0^\pi \frac{\sin s}{s}\,ds = \Big(\frac{2}{\pi}\int_0^\pi \frac{\sin s}{s}\,ds\Big)\,\|h\|_{\infty,[0,\pi]}.
\]
Since
\[
\frac{2}{\pi}\int_0^\pi \frac{\sin s}{s}\,ds = 1.17898\dots,
\]
we see that, while $S_nh(x) \to h(x)$ for all $x \in \mathbb{R}$, near $0$ the partial sum $S_nh$ always has a maximum which stays away from the maximum of $h$, that is $\|h\|_{\infty,[0,\pi]} = \pi$, by a positive quantity: this is the Gibbs phenomenon, which is in fact typical of Fourier series at jump points; but we shall not enter into this subject.
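The overshoot is easy to reproduce numerically. The following minimal sketch is ours, not the book's; it evaluates the partial sums of the sawtooth on a fine grid and reports their maximum.

```python
# Gibbs phenomenon for the sawtooth h of (11.47), numerically (ours, not from
# the book): max over [0, pi] of S_n h approaches 2*Si(pi), about 3.7038,
# which exceeds ||h||_inf = pi by roughly 18%.
import numpy as np

x = np.linspace(0, np.pi, 100001)

def S_n_h(x, n):
    s = np.zeros_like(x)
    for k in range(1, n + 1):
        s += 2 * np.sin(k * x) / k
    return s

for n in (10, 100, 1000):
    print(n, S_n_h(x, n).max())               # about 3.70 for large n, not pi
```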
11.5.6 The Dirichlet–Jordan theorem

The pointwise convergence of the Fourier series of a continuous or summable function is a subtle question and goes far beyond Dini's test, Theorem 11.71. An important result, proved by Lejeune Dirichlet (1805–1859), shows in fact that a $2\pi$-periodic function which has only a finite number of jumps and of maxima and minima has a Fourier series that converges pointwise to $(L^+ + L^-)/2$, where $L^\pm := \lim_{y\to x^\pm} f(y)$; in particular $S_nf(x) \to f(x)$ at the points of continuity. The same proof applies to functions with bounded variation, see Theorem 11.91. In 1876 Paul du Bois-Reymond (1831–1889) showed a continuous function whose Fourier series diverges at one point, and, therefore, that continuity alone does not suffice for the pointwise convergence of the Fourier series. We shall present a different example due to Lipót Fejér (1880–1959). Starting from this example one can show continuous functions whose Fourier series do not converge on a denumerable dense set, for instance, the rationals. In the 1920's Andrey Kolmogorov (1903–1987) showed a summable function with Fourier series divergent on a set with the power of the continuum, and Hugo Steinhaus (1887–1972) showed a continuous function whose Fourier series converges pointwise everywhere, but does not converge uniformly in any interval. Eventually, the question was clarified in 1966 by Lennart Carleson. Here we collect some complements.
a. The Dirichlet–Jordan test

11.91 Theorem (Dirichlet–Jordan). Let $f$ be a $2\pi$-periodic function with bounded total variation in $[a,b]$.
(i) For every $x \in\, ]a,b[$ we have $S_nf(x) \to (L^+ + L^-)/2$ where $L^\pm := \lim_{y\to x^\pm} f(y)$.
(ii) If $f$ is also continuous in $]a,b[$, then $S_nf(x) \to f(x)$ uniformly in any closed interval strictly contained in $]a,b[$.
Figure 11.15. The amplitude of the harmonics of $Q_{n,\mu}(x)$.
Proof. Let $[a,b]$ be an interval with $b - a < 2\pi$. Since every function with bounded variation in $[a,b]$ is the sum of an increasing function and of a decreasing function, we may also assume that $f$ is nondecreasing in $[a,b]$.

(i) Let $x \in\, ]a,b[$ and set $g_x(t) := f(x+t) - L^+ + f(x-t) - L^-$, where $L^\pm := \lim_{y\to x^\pm} f(y)$. We have
\[
S_nf(x) - \frac{L^+ + L^-}{2} = \frac{1}{2\pi}\int_0^\pi \big(f(x+t) - L^+ + f(x-t) - L^-\big)D_n(t)\,dt
\]
\[
= \frac{1}{2\pi}\int_0^\delta g_x(s)D_n(s)\,ds + \frac{1}{2\pi}\int_\delta^\pi g_x(s)D_n(s)\,ds =: I_1 + I_2, \tag{11.52}
\]
where $\delta > 0$ is to be chosen later. Since $f(x+t) - L^+$ and $-(f(x-t) - L^-)$ are nondecreasing near $t = 0$ and nonnegative, the second theorem of mean value and (11.42) yield
\[
|I_1| \le 2\pi\big(|f(x+\delta) - L^+| + |f(x-\delta) - L^-|\big), \tag{11.53}
\]
while (11.43) yields
\[
|I_2| \le c(\delta)\,\frac{1}{n}.
\]
Therefore, given $\epsilon > 0$, we can choose $\delta > 0$ in such a way that
\[
|f(x+\delta) - L^+| + |f(x-\delta) - L^-| < \epsilon \tag{11.54}
\]
to obtain from (11.53) and (11.54) that
\[
\limsup_{n\to\infty}\Big|S_nf(x) - \frac{L^+ + L^-}{2}\Big| \le 2\pi\epsilon.
\]
That proves the pointwise convergence at $x$.

(ii) In this case for every $x \in [a,b]$ we have $L^+ = L^- = f(x)$ and it suffices to estimate uniformly in $[a+\sigma, b-\sigma]$, $0 < \sigma < (b-a)/2$, the terms $I_1$ and $I_2$ in (11.52). Since $f$ is uniformly continuous in $[a,b]$, given $\epsilon > 0$, we can choose $\delta$, $0 < \delta < \sigma$, in such a way that $|f(x+\delta) - f(x)| + |f(x-\delta) - f(x)| < \epsilon$ uniformly with respect to $x$ in $[a+\sigma, b-\sigma]$, hence from (11.53) $|I_1| < 2\pi\epsilon$ uniformly in $[a+\sigma, b-\sigma]$. The uniform estimate of $|I_2|$ is instead the claim of Proposition 11.83. Finally, if $b - a > 2\pi$, it suffices to write $[a,b]$ as a finite union of intervals of length less than $2\pi$ and apply the above to them. $\square$
11.92 Remark. Notice that the Dirichlet–Jordan theorem is in fact a claim about monotone functions. Monotone functions are continuous except on a denumerable set of jump points, which is not necessarily discrete.
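The behavior at a jump predicted by Theorem 11.91 can be observed numerically. The following sketch is ours, not from the book; it uses the square wave $f(x) = \operatorname{sign}(\sin x)$, a function of bounded variation, and the sample evaluation points are arbitrary.

```python
# Dirichlet-Jordan in action (ours, not from the book): for the square wave
# f(x) = sign(sin x), S_n f(0) equals the mean value (L+ + L-)/2 = 0 of the
# jump at 0, while S_n f(1) approaches f(1) = 1.
import numpy as np

def square_partial_sum(x, n):
    k = np.arange(1, n + 1, 2)                # Fourier series: (4/pi) * sum over odd k of sin(kx)/k
    return (4 / np.pi) * np.sum(np.sin(k * x) / k)

for n in (11, 101, 1001):
    print(n, square_partial_sum(0.0, n), square_partial_sum(1.0, n))
```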
b. Fejér example

Let $\mu \in \mathbb{N}$ be a natural number to be chosen later. For every $n \in \mathbb{N}$ consider the trigonometric polynomial
\[
Q_{n,\mu}(x) := \sum_{k=1}^n \frac{\cos((n+\mu-k)x) - \cos((n+\mu+k)x)}{k} = 2\sin((n+\mu)x)\sum_{k=1}^n \frac{\sin kx}{k},
\]
see Figure 11.15. It is a cosine polynomial with harmonics of order $\mu, \mu+1, \dots, n+\mu-1, n+\mu+1, \dots, 2n+\mu$. Now choose
o a sequence $\{a_k\}$ of positive numbers in such a way that $\sum_{k=1}^\infty a_k < +\infty$,
o a sequence $\{n_k\}$ of nonnegative integers such that $a_k \log n_k$ does not converge to zero,
o a sequence $\{\mu_k\}$ of nonnegative integers such that $\mu_{k+1} > \mu_k + 2n_k$,
and set $Q_k(x) := Q_{n_k,\mu_k}(x)$. Since the sums $\sum_{k=1}^n \frac{\sin kt}{k}$ are equibounded, see (11.42) and (11.48), the polynomials $Q_{n,\mu}(x)$ are equibounded independently of $n, \mu \in \mathbb{N}$ and $x \in \mathbb{R}$. Consequently $\sum_{k=1}^\infty a_kQ_k(x)$ converges absolutely in $C^0(\mathbb{R})$ to a continuous function $f(x)$, $x \in \mathbb{R}$,
\[
f(x) = \sum_{k=1}^\infty a_kQ_{n_k,\mu_k}(x),
\]
which is $2\pi$-periodic and even, for $f$ is a sum of cosines. The Fourier series of $f$ is then a cosine series
\[
Sf(x) = \frac{c_0}{2} + \sum_{j=1}^\infty c_j\cos(jx).
\]
We now show that $S_nf(0)$ has no limit as $n \to \infty$. Since $f$ is a uniform limit, we can integrate term by term to get the Fourier coefficients
\[
c_j := \frac{1}{\pi}\int_{-\pi}^\pi f(t)\cos(jt)\,dt = \sum_{k=1}^\infty a_k\,\frac{1}{\pi}\int_{-\pi}^\pi Q_k(t)\cos(jt)\,dt.
\]
Because of the choice of the $\mu_k$, the harmonics of $Q_k$ and $Q_h$, $h \ne k$, are distinct; in particular
\[
\sum_{j=\mu_k}^{\mu_k+n_k-1} c_j = a_k\sum_{j=1}^{n_k} \frac{1}{j} \ge a_k\int_1^{n_k} \frac{dt}{t} = a_k\log n_k.
\]
Consequently, for the Fourier partial sums of $f$ at $0$ we deduce
\[
S_{\mu_k+n_k}f(0) - S_{\mu_k-1}f(0) = \sum_{j=\mu_k}^{\mu_k+n_k} c_j \ge a_k\log n_k.
\]
Therefore $S_nf(0)$ does not converge, because of our choice of $\{n_k\}$. A possible choice of the previous constants is
\[
a_k := 2^{-k}, \qquad n_k := 2^{2^k}, \qquad \mu_{k+1} := \mu_k + 2n_k + 1,
\]
which yields $a_k\log(n_k) = \log 2$.
Figure 11.16. Paul du Bois-Reymond (1831–1889) and Lipót Fejér (1880–1959).
11.5.7 Fejér's sums

Let $f$ be a continuous and $2\pi$-periodic function. The Fourier partial sums of $f$ need not provide a good approximation of $f$, neither uniformly nor pointwise; on the other hand $f$ can be approximated uniformly by trigonometric polynomials, see Theorem 9.58. A specific interesting approximation was pointed out by Lipót Fejér (1880–1959). Let $f \in L^2_{2\pi}$ and $S_nf(x) = \sum_{k=-n}^n c_ke^{ikx}$. Fejér's sums of $f$ are defined by
\[
F_nf(x) := \frac{1}{n+1}\sum_{k=0}^n S_kf(x).
\]
Trivially the $F_nf(x)$ are trigonometric polynomials of order $n$ that can be written as
\[
F_nf(x) = \frac{1}{n+1}\sum_{k=0}^n\sum_{j=-k}^k c_je^{ijx} = \frac{1}{n+1}\sum_{j=-n}^n (n+1-|j|)\,c_je^{ijx}.
\]
We have

11.93 Theorem (Fejér). Let $f \in L^2_{2\pi} \cap C^0(\mathbb{R})$. The Fejér sums $F_nf(x)$ converge to $f$ uniformly in $\mathbb{R}$.
Before proving Fejér's theorem, let us state a few properties of the Fejér kernel defined by
\[
F_n(x) := \frac{1}{n+1}\sum_{k=0}^n D_k(x),
\]
where $D_k$ denotes the Dirichlet kernel of order $k$.

11.94 Proposition. We have
\[
F_n(x) = \begin{cases} n+1 & \text{if } x = 2k\pi,\ k \in \mathbb{Z},\\[4pt] \dfrac{1}{n+1}\Big(\dfrac{\sin((n+1)x/2)}{\sin(x/2)}\Big)^2 & \text{otherwise.}\end{cases}
\]
Proof. Trivially
\[
F_n(0) = \frac{1}{n+1}\sum_{k=0}^n D_k(0) = \frac{1}{n+1}\sum_{k=0}^n (2k+1) = \frac{(n+1)^2}{n+1} = n+1.
\]
Observing that in
\[
F_n(x) = \frac{\sin(x/2) + \sin(3x/2) + \dots + \sin((2n+1)x/2)}{(n+1)\sin(x/2)}
\]
the expression in the numerator is the imaginary part of
\[
e^{ix/2} + e^{i3x/2} + \dots + e^{i(2n+1)x/2} = \frac{e^{ix/2}\big(e^{i(n+1)x} - 1\big)}{e^{ix} - 1}
= e^{i(n+1)x/2}\,\frac{2i\sin((n+1)x/2)}{2i\sin(x/2)}
= e^{i(n+1)x/2}\,\frac{\sin((n+1)x/2)}{\sin(x/2)},
\]
that is, $\sin^2((n+1)x/2)/\sin(x/2)$, we see that
\[
F_n(x) = \frac{1}{n+1}\Big(\frac{\sin((n+1)x/2)}{\sin(x/2)}\Big)^2. \qquad\square
\]
11.95 Proposition. Fejér's kernel has the following properties:
(i) $F_n(x) \ge 0$,
(ii) $F_n(x)$ is even,
(iii) $\frac{1}{2\pi}\int_{-\pi}^\pi F_n(t)\,dt = 1$,
(iv) $F_n(x)$ attains its maximum value at $2k\pi$, $k \in \mathbb{Z}$,
(v) for all $\delta > 0$, $F_n(x) \to 0$ uniformly in $[\delta,\pi]$ as $n \to \infty$,
(vi) there exists a constant $A > 0$ such that $F_n(x) \le \frac{A}{(n+1)x^2}$ for all $n \in \mathbb{N}$ and $x \ne 0$ in $[-\pi,\pi]$,
(vii) $\{F_n\}$ is an approximation of the Dirac mass $\delta$.

Proof. (i), (ii), (iii), (iv), (v) are trivial; (vi) follows from the estimate $\sin t \ge 2t/\pi$ in $]0,\pi/2]$. Finally (vii) follows from (iii) and (v). $\square$

Proof of Fejér's theorem, Theorem 11.93. First we observe that
\[
F_nf(x) - f(x) = \frac{1}{2\pi}\int_0^\pi \big(f(x+t) + f(x-t) - 2f(x)\big)F_n(t)\,dt.
\]
Thus, if we set $g(t) := f(x+t) + f(x-t) - 2f(x)$,
\[
F_nf(x) - f(x) = \frac{1}{2\pi}\int_0^\delta g(t)F_n(t)\,dt + \frac{1}{2\pi}\int_\delta^\pi g(t)F_n(t)\,dt =: I_1 + I_2.
\]
Now, given $\epsilon > 0$, we can choose $\delta$ so that $|f(x+t) + f(x-t) - 2f(x)| < 2\epsilon$ for all $t \in [0,\delta]$ uniformly in $x$, since $f$ is uniformly continuous. Hence
\[
|I_1| \le \frac{2\epsilon}{2\pi}\int_0^\delta F_n(t)\,dt \le \frac{2\epsilon}{2\pi}\int_0^\pi F_n(t)\,dt = \epsilon.
\]
On the other hand, by (vi), $|I_2| \le 4\|f\|_\infty\,\frac{A}{(n+1)\delta^2}$, hence
\[
|F_nf(x) - f(x)| \le \epsilon + 4\|f\|_\infty\,\frac{A}{(n+1)\delta^2}
\]
uniformly in $x$, and the right-hand side is smaller than $2\epsilon$ for $n$ sufficiently large. $\square$
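As a quick illustration of why Fejér sums are attractive, the following sketch, ours and not from the book, evaluates both $S_nh$ and $F_nh$ for the sawtooth $h$ of (11.47): the Cesàro averages never exceed $\|h\|_\infty = \pi$, while the partial sums exhibit the Gibbs overshoot; the grid and the order $n$ are arbitrary choices.

```python
# Fourier partial sums vs. Fejer sums for the sawtooth h of (11.47)
# (ours, not from the book): F_n h stays below pi, S_n h overshoots to ~3.70.
import numpy as np

x = np.linspace(0, np.pi, 20001)
n = 200
k = np.arange(1, n + 1)

terms = 2 * np.sin(np.outer(x, k)) / k        # 2 sin(kx)/k for k = 1..n at each x
partial = np.cumsum(terms, axis=1)            # partial[:, m-1] = S_m h(x)
fejer = partial.sum(axis=1) / (n + 1)         # F_n h = (S_0 + ... + S_n)/(n+1), S_0 h = 0

print("max S_n h:", partial[:, -1].max())     # about 3.70 > pi (Gibbs overshoot)
print("max F_n h:", fejer.max())              # <= pi, no overshoot
```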
A. Mathematicians and Other Scientists
Maria Agnesi (1718-1799) Pavel Alexandroff (1896-1982) James Alexander (1888-1971) Archimedes of Syracuse (287BC-212BC) Cesare Arzela (1847-1912) Giulio Ascoli (1843-1896) Rene-Louis Baire (1874-1932) Stefan Banach (1892-1945) Isaac Barrow (1630-1677) Giusto Bellavitis (1803-1880) Daniel Bernoulli (1700-1782) Jacob Bernoulli (1654-1705) Johann Bernoulli (1667-1748) Sergei Bernstein (1880-1968) Wilhelm Bessel (1784-1846) Jacques Binet (1786-1856) George Birkhoff (1884-1944) Bernhard Bolzano (1781-1848) Emile Borel (1871-1956) Karol Borsuk (1905-1982) L. E. Brouwer (1881-1966) Renato Caccioppoli (1904-1959) Georg Cantor (1845-1918) Alfredo Capelli (1855-1910) Lennart Carleson (1928- ) Lazare Carnot (1753-1823) Elie Cartan (1869-1951) Giovanni Cassini (1625-1712) Augustin-Louis Cauchy (1789-1857) Arthur Cayley (1821-1895) Eduard Cech (1893-1960) Pafnuty Chebyshev (1821-1894) Richard Courant (1888-1972) Gabriel Cramer (1704-1752) Jean d'Alembert (1717-1783) Georges de Rham (1903-1990) Richard Dedekind (1831-1916) Rene Descartes (1596-1650) Ulisse Dini (1845-1918) Diocles (240BC-180BC) Paul Dirac (1902-1984) Lejeune Dirichlet (1805-1859) Paul du Bois-Reymond (1831-1889) James Dugundji (1919-1985)
Albrecht Durer (1471-1528) Euclid of Alexandria (325BC-265BC) Leonhard Euler (1707-1783) Alessandro Faedo (1914-2001) Herbert Federer (1920- ) LipOt Fejer (1880-1959) Pierre de Fermat (1601-1665) Sir Ronald Fisher (1890-1962) Joseph Fourier (1768-1830) Maurice Frechet (1878-1973) Ivar Fredholm (1866-1927) Georg Frobenius (1849-1917) Boris Galerkin (1871-1945) Galileo Galilei (1564-1642) Carl Friedrich Gauss (1777-1855) Israel Moiseevitch Gelfand (1913- ) Camille-Christophe Gerono (1799-1891) J. Willard Gibbs (1839-1903) Jorgen Gram (1850-1916) Hermann Grassmann (1808-1877) George Green (1793-1841) Thomas Gronwall (1877-1932) Jacques Hadamard (1865-1963) Hans Hahn (1879-1934) Georg Hamel (1877-1954) William R. Hamilton (1805-1865) Felix Hausdorff (1869-1942) Oliver Heaviside (1850-1925) Eduard Heine (1821-1881) Charles Hermite (1822-1901) David Hilbert (1862-1943) Otto Holder (1859-1937) Robert Hooke (1635-1703) Heinz Hopf (1894-1971) Guillaume de I'Hopital (1661-1704) Christiaan Huygens (1629-1695) Carl Jacobi (1804-1851) Johan Jensen (1859-1925) Camille Jordan (1838-1922) Oliver Kellogg (1878-1957) Felix Klein (1849-1925) Helge von Koch (1870-1924) Andrey Kolmogorov (1903-1987) Leopold Kronecker (1823-1891)
Kazimierz Kuratowski (1896-1980) Joseph-Louis Lagrange (1736-1813) Edmond Laguerre (1834-1886) Pierre-Simon Laplace (1749--1827) Gaspar Lax (1487-1560) Henri Lebesgue (1875-1941) Solomon Lefschetz (1884-1972) Adrien-Marie Legendre (1752-1833) Gottfried von Leibniz (1646-1716) Jean Leray (1906-1998) Sophus Lie (1842-1899) Ernst Lindelof (1870-1946) Rudolf Lipschitz (1832-1903) Jules Lissajous (1822-1880) L. Agranovich Lyusternik (1899-1981) James Clerk Maxwell (1831-1879) Edward McShane (1904-1989) Arthur Milgram (1912-1961) Hermann Minkowski (1864-1909) Carlo Miranda (1912-1982) August Mobius (1790-1868) Harald Marston Morse (1892-1977) Mark Naimark (1909-1978) Nicomedes (280BC-21OBC) des Chenes M.- A. Parseval (1755-1836) Blaise Pascal (1623-1662) Etienne Pascal (1588-1640) Giuseppe Peano (1858-1932) Oskar Perron (1880-1975) Emile Picard (1856-1941) J. Henri Poincare (1854-1912) Diadochus Proclus (411-485) Pythagoras of Samos (580BC-520BC) Hans Rademacher (1892-1969) Tibor Rad6 (1895-1965) Lord William Strutt Rayleigh (1842-1919)
Kurt Reidemeister (1893-1971) G. F. Bernhard Riemann (1826-1866) Frigyes Riesz (1880-1956) Marcel Riesz (1886-1969) Eugene Rouche (1832-1910) Adhemar de Saint Venant (1797-1886) Stanislaw Saks (1897-1942) Helmut Schaefer (1925- ) Juliusz Schauder (1899-1943) Erhard Schmidt (1876-1959) Lev G. Schnirelmann (1905-1938) Hermann Schwarz (1843-1921) Karl Seifert (1907-1996) Takakazu Seki (1642-1708) Carlo Severini (1872-1951) Hugo Steinhaus (1887-1972) Thomas Jan Stieltjes (1856-1894) Marshall Stone (1903-1989) James Joseph Sylvester (1814-1897) Brook Taylor (1685-1731) Heinrich Tietze (1880-1964) Leonida Tonelli (1885-1946) Stanislaw Ulam (1909-1984) Pavel Urysohn (1898-1924) Charles de la Valloo-Poussin (1866-1962) Egbert van Kampen (1908-1942) Alexandre Vandermonde (1735-1796) Giuseppe Vitali (1875-1932) Vito Volterra (1860-1940) John von Neumann (1903-1957) Karl Weierstrass (1815-1897) Norbert Wiener (1894-1964) Kosaku Yosida (1909-1990) William Young (1863-1942) Nikolay Zhukovsky (1847-1921) Max Zorn (1906-1993) Antoni Zygmund (1900-1992)
There exist many web sites dedicated to the history of mathematics; we mention, e.g., http://www-history.mcs.st-and.ac.uk/-history.
B. Bibliographical Notes
We collect here a few suggestions for the readers interested in delving deeper into some of the topics treated in this volume. Concerning linear algebra the reader may consult
o P. D. Lax, Linear Algebra, Wiley & Sons, New York, 1997,
o S. Lang, Linear Algebra, Addison-Wesley, Reading, 1966,
o A. Quarteroni, R. Sacco, F. Saleri, Numerical Mathematics, Springer-Verlag, New York, 2000,
o G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, 1986.
Of course, curves and surfaces are discussed in many textbooks. We mention
o M. do Carmo, Differential Geometry of Curves and Surfaces, Prentice Hall Inc., New Jersey, 1976,
o A. Gray, Modern Differential Geometry of Curves and Surfaces, CRC Press, Boca Raton, 1993.
Concerning general topology and topology the reader may consult among the many volumes that are available
o J. Dugundji, Topology, Allyn and Bacon, Inc., Boston, 1966,
o K. Jänich, Topology, Springer-Verlag, Berlin, 1994,
o I. M. Singer, J. A. Thorpe, Lecture Notes on Elementary Topology and Geometry, Springer-Verlag, New York, 1967,
o J. W. Vick, Homology Theory. An Introduction to Algebraic Topology, Springer-Verlag, New York, 1994.
With special reference to degree theory and existence of fixed points we mention
o A. Granas, J. Dugundji, Fixed Point Theory, Springer-Verlag, New York, 2003,
o L. Nirenberg, Topics in Nonlinear Functional Analysis, Courant Institute of Mathematical Sciences, New York University, 1974.
The literature on Banach and Hilbert spaces, linear operators, spectral theory and linear and nonlinear functional analysis is incredibly wide. Here we mention only a few titles
o N. I. Akhiezer, I. M. Glazman, Theory of Linear Operators in Hilbert Spaces, Dover, New York, 1983,
o H. Brezis, Analyse Fonctionnelle, Masson, Paris, 1983,
o A. Friedman, Foundations of Modern Analysis, Dover, New York, 1970,
and also
o N. Dunford, J. Schwartz, Linear Operators, John Wiley, New York, 1988,
o K. Yosida, Functional Analysis, Springer-Verlag, Berlin, 1974,
as well as the celebrated
o R. Courant, D. Hilbert, Methods of Mathematical Physics, Interscience Publishers, 1953,
o F. Riesz, B. Sz. Nagy, Leçons d'Analyse Fonctionnelle, Gauthier-Villars, Paris, 1965.
C. Index
accumulation point, 164 algebra - End (X), 326 - ideal,402 - - maximal, 402 - - proper, 402 of functions, 316 - spectrum, 403 - with identity, 402 algorithm - Gram-Schmidt, 85, 99 ball - open, 152 Banach - algebra, 326, 403 - closed graph theorem, 330 continuous inverse theorem, 330 - fixed point theorem, 335 - indicatrix, 265 open mapping theorem, 329 space, 286 - - ordered, 343 basis, 43 - dual, 54 - orthonormal, 85 bilinear form - bounded, 370 - coercive, 370 bilinear forms, 95 - signature, 97 bracket - Lie, 38 Carnot's formula, 81 cluster point, 164 coefficients - Fourier, 433 compact set, 200 - relatively, 203 - sequentially, 197 conics, 106
connected - component, 211 - set, 210 continuity - for metric spaces, 163 continuity method, 337 contractible spaces, 253 convergence in a metric space, 153 - pointwise, 157, 297 - uniform, 157, 294, 296 -- on compact subsets, 310 - weak, 398 convex hull, 208 convolution, 309, 310 - integral means, 309 coordinates - cylindrical, 168 - polar, 168 - spherical, 168 covectors, 54 covering, 165, 199, 260 - locally finite, 165 - net, 199 criterion - Hausdorff, 200 cube - Hilbert, 158 curve, 219 arc length reparametrization, 232 closed, 219 - cylindrical helix, 221 - cylindrical representation - - length, 231 equivalent, 224 - intrinsic parametrization, 243 - length, 227 in cylindrical coordinates, 231 in polar coordinates, 231 in spherical coordinates, 231 minimal, 397 -- of graphs, 231
- - semicontinuity, 395 - Lipschitz-continuous, 230 - orientation, 224 parametrization, 219 Peano, 228 piecewise regular, 226 piecewise-C 1 , 226 polar representation, 221 - - length, 231 rectifiable, 227 - regular, 224 - self-intersection, 223 - simple, 223 - spherical representation - - length, 231 tangent vectors, 225 total variation, 241 trace, 219 trajectory, 219 - von Koch, 228 decomposition - polar, 125 - singular value, 126 definitively, 192 degree, 268 - integral formula, 269 - mapping - - degree, 268 - on Sl, 266 - with respect to a point, 275 delta - Dirac, 313 - - approximation, 313 - Kronecker, 12 dense set, 192 determinant, 33, 34 area, 31 Laplace's formula, 36 of a product, 35 of the transpose, 35 Vandermonde, 39 diameter, 153 Dini - regular, 438 - test, 438, 444 Dirichlet - problem, 416 discrete Fourier transform, 134, 144 - inverse, 134 distance, 81, 84, 151, 154, 161, 286 between sets, 216 codes, 156 discrete, 156 Euclidean, 155 from a set, 162 - Hausdorff, 299 in ep , 158
- in the mean, 160 - integral, 160 - uniform, 157, 159 duality, 55 eigenspace, 58 eigenvalue, 58, 384, 391 - min-max characterization, 392 - multiplicity - - algebraic, 62 - - geometric, 62 - real and complex, 66 - variational characterization, 115 eigenvector, 58 energy equality, 360, 441 example - Fejer,451 exponential operator, 327 Fejer - example, 451 - sums, 452 fixed point, 335 force, 92 forms - bilinear, 95 - linear, 54 - quadratic, 115 formula - Binet, 35 - Carnot, 81 - degree, 269 - Euler, 281 Grassmann, 18,47 Hadamard, 143 - inverse matrix, 30 Laplace, 36 - Parseval, 358, 441 - polarity, 80, 83 - rank,49 -- of matrices, 16 Fourier - coefficients, 357, 433 - series, 357, 433 - - uniform convergence, 444 Fredholm's alternative, 50 function, see map - Banach's indicatrix, 265 - bounded total variation, 244 - closed, 194, 216 - coercive, 203 - continuous, 163, 182 image of a compact, 202 - - image of a connected set, 212 - - inverse of, 202 - convex, 287 - exponential, 171 - Holder-continuous, 161
- homeomorphism, 182 - Joukowski, 169 - limit, 164 - Lipschitz-continuous, 161 - - extension, 207 - logarithm, 171 - lower semicontinuous, 204 - Mobius, 170 - open, 194, 216 - proper, 216 - sequentially semicontinuous, 203 - total variation, 241 - uniformly continuous, 205 functions - equibounded, 301 - equicontinuous, 301 - Holder-continuous, 301 - homotopic, 250 fundamental group, 258
- Bessel, 358, 440 - Cauchy-Schwarz, 80, 83, 352 - Gronwall, 410 - Jensen, 400 - Minkowski, 155, 158, 293 - triangular, 81, 84 - variational, 372 inner product - continuity, 352 integral - de la Vallee Poussin, 316 integral equations - Fredholm, 426, 428, 429 - Volterra, 425, 426, 429 invariant - metric, 183 - topological, 184 isolated point, 180 isometries, 87
gauge function, 333 geodesic, 152 - distance, 154 Gibbs phenomenon, 448 Green operator, 421 group - fundamental, 258 - linear, 50 - orthogonal, 88 - unitary, 88
kernel - de la Vallee-Poussin, 315 - Dirichlet, 435
Holder function, 161 Hausdorff criterion, 200 Hermitian product - continuity, 352 Hilbert space, 158, 353 - basis, 355 - complete system, 355 - dual,364 - Fourier series, 357 - pre, 351 - separable, 355 - weak convergence, 398 Hilbert's cube, 393 homeomorphism, 182 homotopy, 250 - equivalence, 253 - first group, 258 - relative, 256 - with fixed endpoints, 256 ideal,402 - maximal,402 - proper, 402 identity - Jacobi,38 - parallelogram, 80, 287 inequality
law - parallelogram, 287 least squares, 129 - canonical equation, 129 lemma - Gronwall, 410 - Riemann-Lebesgue, 436 - Uryshon, 209 liminf,204 limit point, 164 limsup, 204 linear - combination, 42 - equation, 50 - operator, 44 - - characteristic polynomial, 60 - - eigenspace, 58 - - eigenvalue, 58 - - eigenvector, 58 - subspace, 4 - systems, 22 - - Cramer's rule, 36 linear difference - equations -- of higher order, 137 linear difference equations - systems, 136 linear regression, 374 Lipschitz - constant, 161 - function, 161 map - affine, 37
- compact, 339 - linear, 44 affine, 37 - - associated matrix, 48 - - automorphism, 50 - - endomorphism, 50 graph, 109 image, 45 kernel, 45 rank,45 - proper, 265 - Riesz, 91, 367 mapping - degree, 268 matrix algebra, 11 - associated to a linear map, 48 - block, 39, 137 - characteristic of a, 35 cofactors, 36 complementing minor, 34 congruent to, 102 determinant, 33, 34 diagonal, 12 diagonizable, 60 eigenspace, 58 eigenvalue, 58 eigenvector, 58 Gauss reduced, 26 - - pivots, 26 Gram, 82, 85, 96, 101, 143 identity, 12 inverse, 12, 36 Jordan's -- basis, 72 - - canonical form, 70 - Jordan's formula, 137 - LR decomposition, 30 nilpotent, 69 nonsingular, 15 orthogonal, 88 polar form, 125 power, 137 product, 11 rank, 16 similar to, 60 - singular value decomposition, 126 singular values, 125 - spectrum, 58 - stair-shaped, 26 - - pivots, 26 symmetric, 38 trace, 38, 61 transpose, 13 triangular -- lower, 12 -- upper, 12 - unitary, 88
maximum point, 201 method - continuity, 337 Faedo-Galerkin, 377 - Gauss elimination, 25 - Gram-Schmidt, 106 - Jacobi, 100 - least squares, 128 - Picard, 335, 407 - Ritz, 373 - - error estimate, 373 - shooting, 418 - super- and sub-solutions, 344 - variational for the eigenvalues, 116, 118 metric,97 - Artin, 103 Euclidean, 103 invariant, 183 - Lorenz, 103 nondegenerate, 97 - positive, 97 pseudoeuclidean, 103 metric axions, 151 metric space, 151
0
1 , 160
- compact, 200 complete, 185 completion, 186 - connected, 210 - connected component, 211 - continuity in, 163 - convergence in, 153 immersion in lex>, 402 - immersion in Co, 402 - locally connected, 212 path-connected, 213 - sequentially compact, 197 metrics, 151 - equivalent, 188 - in a product space, 156 - topologically equivalent, 178 minimal geodesics, 397 minimizing sequence, 201 minimum point, 201 Minkowski - discrete inequality, 155 - inequality, 158 Minkowski inequality, 293 minor - complementing, 34 modulus of continuity, 320 mollifiers, 312 Moore-Penrose inverse, 369, 374 neighborhood, 177 norm, 79, 154, 285 - CO,O< 301 - 0 1 ,296
- equivalent norms, 288 - Loo, 294 - £p, 292 - LP, 293 - uniform, 294 - uniform or infinity, 159 normed space, 154, 285 - .C(X, Y), 324 - series, 288 - - absolute convergence, 289 normed spaces - convex sets, 287 numbers - Fibonacci, 140
ODE - Cauchy problem, 405 - comparison theorem, 420 - continuation of solutions, 408 - Gronwall's lemma, 410 - integral curves, 404 - maximum principle, 419, 420 - Picard approximations, 407 - shooting method, 418 operator - adjoint, 93, 369 - closed range, 369 - commuting, 388 compact, 378 - compact perturbation, 379 - eigenvalue, 384 - eigenvector, 384 - Green, 372, 421 - linear - - antisymmetric, 121 - - isometry, 121 - - normal, 121 - - positive, 117 - - self-adjoint, 121 - - symmetric, 121 - normal, 121, 388 - positive, 117 - powers, 119 - projection, 111, 368 - resolvent, 384 - Riesz, 366 - self-adjoint, 111, 369 - singular values, 125 - spectrum, 384 - - pointwise, 384 - square root, 120 operators - bounded, 324 - compact, 339 - exponential, 327 - pointwise convergence, 325 - Schauder, 341 - uniform convergence, 325
order cone, 343 parallegram law, 80, 83 parallelogram law, 352 path,219 Peano curve, 228 Peano's phenomenon, 416 perfect set, 192 phenomenon - Peano, 416 point - adherent, 179 - boundary, 179 - cluster, 164 - exterior, 179 - interior, 179 - isolated, 180 - limit, 164 - of accumulation, 164 polynomials - Bernstein, 306 - Hermite, 361 - Jacobi, 361 - Laguerre, 361 - Legendre, 361 - Stieltjes, 314 - Tchebychev, 361 principle - abstract Dirichlet's, 364, 371 - Cantor, 188 - maximum, 419, 420 - of condensation of singularities, 329 - of uniform boundedness, 328 - Riemann's localization, 445 problem - Dirichlet, 416 product - Hermitian, 82 - inner, 79 - scalar, 79 projection - stereographic, 168 quadratic forms, 104 quadrics, 107 rank, 16 - of the transpose, 17 Rayleigh's quotient, 392 resolvent, 384 retraction, 254 scalars, 4, 41 segment-connected set, 213 semicontinuous function - sequentially, 203 sequence - Cauchy, 185
- convergent, 153 series, 288 - Fourier, 357, 433 set - boundary of, 179 - bounded, 153 - - totally, 199 - closed, 175 - closure of, 179 - compact, 200 -- sequentially, 197 - complement of, 175 - connected, 210 -- in JR, 211 - convex, 287 - convex hull of, 208 - dense, 192 - derived of, 180 - discrete, 192 - interior, 179 - meager, 189 - neighborhood, 177 - nowhere dense, 189 - of the first category, 189 - of the second category, 189 - open, 175 - perfect, 192 regular closed, 193 - regular open, 193 - relatively compact, 203 - segment-connected, 213 - separated, 210 small oscillations, 141 - normal modes, 143 - proper frequencies, 143 smoothing kernel, 312 space -~, 346 - ()",Ct, 301 - C b , 296 - V, 293 - foo, 295 - f p , 292 - co, 356 - Co, 159
- contractible, 253 - foo, 157
- LP, 161 - f p , 158 - Hilbert, 353 - Hilbert's, 158 - locally path-connected, 262 - L2(]a, b[), 354 - f2, 353 - pre-Hilbert, 351 - simply connected, 259 - topologically complete, 194 spectral theorem, 387
spectrum, 58, 384 - characterization, 385 - pointwise, 384 subsolution, 344 subspace - orthogonal, 90 supersolution, 344 test - Dini, 438, 444 theorem - alternative, 94, 380, 383 - Baire, 188 - Baire of approximation, 319 - Banach's fixed point, 335 - Banach-Saks, 399 - Banach-Steinhaus, 328 - Bernstein, 306, 423 - Binet, 35 Bolzano-Weierstrass, 198 Borsuk,273 - Borsuk's separation, 280 - Borsuk-UIam, 278 - Brouwer, 273 - Brouwer's fixed point, 274, 276, 339 - Brouwer's invariance domain, 281 - Caccioppoli-Schauder, 341 - Cantor-Bernstein, 215 - Carnot, 81, 352 - Cayley-Hamilton, 67 - closed graph, 330 - comparison, 420 - continuation of solutions, 408 - continuous inverse, 330 - Courant, 116 - Cramer, 36 - de la Vallee Poussin, 315 - Dini, 299, 438 - Dirichlet-Jordan, 449 - Dugundji, 208 - existence - - of minimal geodesics, 397 - - of minimizers of convex coercive functionals, 401 - Fejer, 452 - finite covering, 200 - Frechet-Weierstrass, 203 - Fredholm, 94 - Fredholm's alternative, 50 - fundamental of algebra, 271 - Gelfand-Kolmogorov, 402 - Gelfand-Nairnark, 403 - generalized eigenvectors, 69 - Gram-Schmidt, 85 - Hahn-Banach, 331, 332, 334 - Hausdorff, 186 - Heine--Cantor-Borel, 205 - Hopf,273
-
intermediate value, 212 Jacobi, 100 Jordan, 280 Jordan's canonical form, 72 Jordan's separation, 280 Jordan-Borsuk, 281 Kirszbraun, 207 Kronecker, 35 Kuratowski,215 Lax-Milgram, 376 - Lyusternik-Schnirelmann, 278 - McShane, 207 - Miranda, 277 - nested sequence, 188 - open mapping, 329 - Peano, 415 - Perron-Frobenius, 282 - Picard-Lindelof, 406 - Poincare--Brouwer, 277 - polar decomposition, 125 - projection, 89, 367 - Pythagoras, 81, 84, 86, 354 - Riemann-Lebesgue, 436 - Riesz, 91, 291, 366, 371 - Riesz-Fisher, 360 - Riesz-Schauder, 385 - Rouche, 282 - Rouche--Capelli, 23 - Schaefer's fixed point, 342 - second mean value, 442 - Seifert-Van Kampen, 267 - simultaneous diagonalization, 117 - spectral, 112, 122, 385 - spectral resolution, 114 - stability for systems of linear difference equations, 140 - Stone-Weierstrass, 316 - Sylvester, 98, 101 - Tietze, 208 - Uryshon, 185 - Weierstrass, 201 - Weierstrass's approximation, 303 - Weierstrass's approximation for periodic functions, 307 theory - Courant-Hilbert-Schmidt, 389 - - completeness relations, 394 toplogical - invariant, 184 topological - property, 184 - space, 182 topological space - contractible, 253 - deformation retract, 254 - Hausdorff, 184 - retract, 254 - simply connected, 259
topology, 178, 182 - basis, 184 - discrete, 184 - indiscrete, 184 - of uniform convergence, 294 totally bounded set, 199 trigonometric polynomials, 130 - energy identity, 131 - Fourier coefficients, 131 - sampling, 132 tubular neighborhood, 159 variational - inequality, 372 vector space, 41 - IKn , 3 - automorphism, 50 - basis, 5, 43 - - canonical basis of IKn , 9 - - orthonormal, 85 - coordinate system, 46 - dimension, 8, 45 - direct sum, 18, 47 - dual, 54 - Euclidean, 79 -- norm, 81 - Hermitian, 82 -- norm, 84 - linear combination, 4, 42 - linear subspace, 4 - - implicit representation, 18 - - parametric representation, 18 - ordered basis, 9 - subspace, 42 -- supplementary, 47 - supplementary linear subspaces, 18 vectors, 41 - linearly dependent, 5 - linearly independent, 5, 42 - norm, 79 - orthogonal, 80, 84, 354 - orthonormal, 85 - span of, 42 von Koch curve, 228 work,92 Yosida regularization, 319, 320
Printed in the United States of America