REVIEWS OF
Algebra and Analysis for Engineers and Scientists
"This book is a useful compendium of the mathematics of (mostly) finite-dimensionallinear vector spaces (plus two final chapters on infinite-dimensional spaces), which do find increasing application in many branches of engineering and science .... The treatment is thorough; the book will certainly serve as a valuable reference." - A merican Scientist "The authors present topics in algebra and analysis for students in engineering and science .... Each chapter is organized to include a brief overview, detailed topical discussions and references for further study. Notes about the references guide the student to collateral reading. Theorems, definitions, and corollaries are illustrated with examples. The student is encouraged to prove some theorems and corollaries as models for proving others in exercises. In most chapters, the authors discuss constructs used to illustrate examples of applications. Discussions are tied together by frequent, well written notes. The tables and index are good. The type faces are nicely chosen. The text should prepare a student well in mathematical matters." - S cience Books and iF lms "This is an intermediate level text, with exercises, whose avowed purpose is to provide the science and engineering graduate student with an appropriate modern mathematical (analysis and algebra) background in a succinct, but nontrivial, manner. After some fundamentals, algebraic structures are introduced followed by linear spaces, matrices, metric spaces, normed and inner product spaces and linear operators.... While one can quarrel with the choice of specific topics and the omission of others, the book is quite thorough and can serve as a text, for self-study or as a reference." - M athematical Reviews "The authors designed a typical work from graduate mathematical lectures: formal definitions, theorems, corollaries, proofs, examples, and exercises. 
It is to be noted that problems to challenge students' comprehension are interspersed throughout each chapter rather than at the end." - C H O ICE
Printed in the USA
Anthony N. Michel • Charles J. Herget
Algebra and Analysis for Engineers and Scientists
Birkhäuser
Boston • Basel • Berlin
Anthony N. Michel
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
U.S.A.
Charles J. Herget
Herget Associates
P.O. Box 1425
Alameda, CA 94501
U.S.A.
Cover design by Dutton and Sherman, Hamden, CT.

Mathematics Subject Classification (2000): 03Exx, 03E20, 08-XX, 08-01, 15-XX, 15-01, 15A03, 15A04, 15A06, 15A09, 15A15, 15A18, 15A21, 15A57, 15A60, 15A63, 20-XX, 20-01, 26-XX, 26-01, 26Axx, 26A03, 26A15, 26Bxx, 34-XX, 34-01, 34Axx, 34A12, 34A30, 34H05, 46-XX, 46-01, 46A22, 46A50, 46A55, 46Bxx, 46B20, 46B25, 46Cxx, 46C05, 46Exx, 46N10, 46N20, 47-XX, 47-01, 47Axx, 47A05, 47A07, 47A10, 47A25, 47A30, 47A67, 47B15, 47H10, 47N20, 47N70, 54-XX, 54-01, 54A20, 54B05, 54Cxx, 54C05, 54C30, 54Dxx, 54D05, 54D30, 54D35, 54D45, 54E35, 54E50, 54E54, 93E10
Library of Congress Control Number: 2007931687
ISBN-13: 978-0-8176-4706-3
e-ISBN-13: 978-0-8176-4707-0
Printed on acid-free paper.

©2007 Birkhäuser Boston
Originally published as Mathematical Foundations in Engineering and Science by Prentice-Hall, Englewood Cliffs, NJ, 1981. A subsequent paperback edition under the title Applied Algebra and Functional Analysis was published by Dover, New York, 1993. For the Birkhäuser Boston printing, the authors have revised the original preface.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1
www.birkhauser.com
CONTENTS
PREFACE

CHAPTER 1: FUNDAMENTAL CONCEPTS
1.1 Sets
1.2 Functions
1.3 Relations and Equivalence Relations
1.4 Operations on Sets
1.5 Mathematical Systems Considered in This Book
1.6 References and Notes
References

CHAPTER 2: FUNDAMENTAL ALGEBRAIC STRUCTURES
2.1 Some Basic Structures of Algebra
    A. Semigroups and Groups
    B. Rings and Fields
    C. Modules, Vector Spaces, and Algebras
    D. Overview
2.2 Homomorphisms
2.3 Application to Polynomials
2.4 References and Notes
References

CHAPTER 3: VECTOR SPACES AND LINEAR TRANSFORMATIONS
3.1 Linear Spaces
3.2 Linear Subspaces and Direct Sums
3.3 Linear Independence, Bases, and Dimension
3.4 Linear Transformations
3.5 Linear Functionals
3.6 Bilinear Functionals
3.7 Projections
3.8 Notes and References
References

CHAPTER 4: FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES
4.1 Coordinate Representation of Vectors
4.2 Matrices
    A. Representation of Linear Transformations by Matrices
    B. Rank of a Matrix
    C. Properties of Matrices
4.3 Equivalence and Similarity
4.4 Determinants of Matrices
4.5 Eigenvalues and Eigenvectors
4.6 Some Canonical Forms of Matrices
4.7 Minimal Polynomials, Nilpotent Operators and the Jordan Canonical Form
    A. Minimal Polynomials
    B. Nilpotent Operators
    C. The Jordan Canonical Form
4.8 Bilinear Functionals and Congruence
4.9 Euclidean Vector Spaces
    A. Euclidean Spaces: Definition and Properties
    B. Orthogonal Bases
4.10 Linear Transformations on Euclidean Vector Spaces
    A. Orthogonal Transformations
    B. Adjoint Transformations
    C. Self-Adjoint Transformations
    D. Some Examples
    E. Further Properties of Orthogonal Transformations
4.11 Applications to Ordinary Differential Equations
    A. Initial-Value Problem: Definition
    B. Initial-Value Problem: Linear Systems
4.12 Notes and References
References

CHAPTER 5: METRIC SPACES
5.1 Definition of Metric Spaces
5.2 Some Inequalities
5.3 Examples of Important Metric Spaces
5.4 Open and Closed Sets
5.5 Complete Metric Spaces
5.6 Compactness
5.7 Continuous Functions
5.8 Some Important Results in Applications
5.9 Equivalent and Homeomorphic Metric Spaces. Topological Spaces
5.10 Applications
    A. Applications of the Contraction Mapping Principle
    B. Further Applications to Ordinary Differential Equations
5.11 References and Notes
References

CHAPTER 6: NORMED SPACES AND INNER PRODUCT SPACES
6.1 Normed Linear Spaces
6.2 Linear Subspaces
6.3 Infinite Series
6.4 Convex Sets
6.5 Linear Functionals
6.6 Finite-Dimensional Spaces
6.7 Geometric Aspects of Linear Functionals
6.8 Extension of Linear Functionals
6.9 Dual Space and Second Dual Space
6.10 Weak Convergence
6.11 Inner Product Spaces
6.12 Orthogonal Complements
6.13 Fourier Series
6.14 The Riesz Representation Theorem
6.15 Some Applications
    A. Approximation of Elements in Hilbert Space (Normal Equations)
    B. Random Variables
    C. Estimation of Random Variables
6.16 Notes and References
References

CHAPTER 7: LINEAR OPERATORS
7.1 Bounded Linear Transformations
7.2 Inverses
7.3 Conjugate and Adjoint Operators
7.4 Hermitian Operators
7.5 Other Linear Operators: Normal Operators, Projections, Unitary Operators, and Isometric Operators
7.6 The Spectrum of an Operator
7.7 Completely Continuous Operators
7.8 The Spectral Theorem for Completely Continuous Normal Operators
7.9 Differentiation of Operators
7.10 Some Applications
    A. Applications to Integral Equations
    B. An Example from Optimal Control
    C. Minimization of Functionals: Method of Steepest Descent
7.11 References and Notes
References

INDEX
PREFACE
This book evolved from a one-year sequence of courses offered by the authors at Iowa State University. The audience for this book typically included theoretically oriented first- or second-year graduate students in various engineering or science disciplines. Subsequently, while serving as Chair of the Department of Electrical Engineering, and later, as Dean of the College of Engineering at the University of Notre Dame, the first author continued using this book in courses aimed primarily at graduate students in control systems. Since administrative demands precluded the possibility of regularly scheduled classes, the Socratic method was used in guiding students in self study. This method of course delivery turned out to be very effective and satisfying to student and teacher alike. Feedback from colleagues and students suggests that this book has been used in a similar manner elsewhere.

The original objectives in writing this book were to provide the reader with appropriate mathematical background for graduate study in engineering or science; to provide the reader with appropriate prerequisites for more advanced subjects in mathematics; to allow the student in engineering or science to become familiar with a great deal of pertinent mathematics in a rapid and efficient manner without sacrificing rigor; to give the reader a unified overview of applicable mathematics, thus enabling him or her to choose additional courses in mathematics more intelligently; and to make it possible for the student to understand at an early stage of his or her graduate studies the mathematics used in the current literature (e.g., journal articles, monographs, and the like).

Whereas the objectives enumerated above for writing this book were certainly pertinent over twenty years ago, they are even more compelling today. The reasons for this are twofold. First, today's graduate students in engineering or science are expected to be more knowledgeable and sophisticated in mathematics than students in the past. Second, today's graduate students in engineering or science are expected to be familiar with a great deal of ancillary material (primarily in the computer science area), acquired in courses that did not even exist a couple of decades ago. In view of these added demands on the students' time, to become familiar with a great deal of mathematics in an efficient manner, without sacrificing rigor, seems essential.

Since the original publication of this book, progress in technology, and consequently, in applications of mathematics in engineering and science, has been phenomenal. However, it must be emphasized that the type of mathematics itself that is being utilized in these applications did not experience corresponding substantial changes. This is particularly the case for algebra and analysis at the intermediate level, as addressed in the present book. Accordingly, the material of the present book is as current today as it was at the time when this book first appeared. (Plus ça change, plus c'est la même chose. - Alphonse Karr, 1849.)

This book may be viewed as consisting essentially of three parts: set theory (Chapter 1), algebra (Chapters 2-4), and analysis (Chapters 5-7). Chapter 1 is a prerequisite for all subsequent chapters. Chapter 2 emphasizes abstract algebra (semigroups, groups, rings, etc.) and may essentially be skipped by those who are not interested in this topic. Chapter 3, which addresses linear spaces and linear transformations, is a prerequisite for Chapters 4, 6, and 7.
Chapter 4, which treats finite-dimensional vector spaces and linear transformations on such spaces (matrices), is required for Chapters 6 and 7. In Chapter 5, metric spaces are treated. This chapter is a prerequisite for the subsequent chapters. Finally, Chapters 6 and 7 consider Banach and Hilbert spaces and linear operators on such spaces, respectively.

The choice of applications in a book of this kind is subjective and will always be susceptible to criticisms. We have attempted to include applications of algebra and analysis that have broad appeal. These applications, which may be omitted without loss of continuity, are presented at the ends of Chapters 2, 4, 5, 6, and 7 and include topics dealing with ordinary differential equations, integral equations, applications of the contraction mapping principle, minimization of functionals, an example from optimal control, and estimation of random variables.

All exercises are an integral part of the text and are given when they arise, rather than at the end of a chapter. Their intent is to further the reader's understanding of the subject matter on hand.
The prerequisites for this book include the usual background in undergraduate mathematics offered to students in engineering or in the sciences at universities in the United States. Thus, in addition to graduate students, this book is suitable for advanced senior undergraduate students as well, and for self study by practitioners.

Concerning the labeling of items in the book, some comments are in order. Sections are assigned numerals that reflect the chapter and the section numbers. For example, Section 2.3 signifies the third section in the second chapter. Extensive sections are usually divided into subsections identified by upper-case letters A, B, C, etc. Equations, definitions, theorems, corollaries, lemmas, examples, exercises, figures, and special remarks are assigned monotonically increasing numerals which identify the chapter, section, and item number. For example, Theorem 4.4.7 denotes the seventh identified item in the fourth section of Chapter 4. This theorem is followed by Eq. (4.4.8), the eighth identified item in the same section. Within a given chapter, figures are identified by upper-case letters A, B, C, etc., while outside of the chapter, the same figure is identified by the above numbering scheme. Finally, the end of a proof or of an example is signified by the symbol •.
Suggested Course Outlines

Because of the flexibility described above, this book can be used either in a one-semester course or a two-semester course. In either case, mastery of the material presented will give the student an appreciation of the power and the beauty of the axiomatic method; will increase the student's ability to construct proofs; will enable the student to distinguish between purely algebraic and topological structures and combinations of such structures in mathematical systems; and of course, it will broaden the student's background in algebra and analysis.
A one-semester course

Chapters 1, 3, 4, 5, and Sections 6.1 and 6.11 in Chapter 6 can serve as the basis for a one-semester course, emphasizing basic aspects of Linear Algebra and Analysis in a metric space setting. The coverage of Chapter 1 should concentrate primarily on functions (Section 1.2) and relations and equivalence relations (Section 1.3), while the material concerning sets (Section 1.1) and operations on sets (Section 1.4) may be covered as reading assignments. On the other hand, Section 1.5 (on mathematical systems) merits formal coverage, since it gives the student a good overview of the book's aims and contents.
The material in this book has been organized so that Chapter 2, which addresses the important algebraic structures encountered in Abstract Algebra, may be omitted without any loss of continuity. In a one-semester course emphasizing Linear Algebra, this chapter may be omitted in its entirety.

In Chapter 3, which addresses general vector spaces and linear transformations, the material concerning linear spaces (Section 3.1), linear subspaces and direct sums (Section 3.2), linear independence and bases (Section 3.3), and linear transformations (Section 3.4) should be covered in its entirety, while selected topics on linear functionals (Section 3.5), bilinear functionals (Section 3.6), and projections (Section 3.7) should be deferred until they are required in Chapter 4.

Chapter 4 addresses finite-dimensional vector spaces and linear transformations (matrices) defined on such spaces. The material on determinants (Section 4.4) and some of the material concerning linear transformations on Euclidean vector spaces (Subsections 4.10D and 4.10E), as well as applications to ordinary differential equations (Section 4.11), may be omitted without any loss of continuity. The emphasis in this chapter should be on coordinate representations of vectors (Section 4.1), the representation of linear transformations by matrices and the properties of matrices (Section 4.2), equivalence and similarity of matrices (Section 4.3), eigenvalues and eigenvectors (Section 4.5), some canonical forms of matrices (Section 4.6), minimal polynomials, nilpotent operators and the Jordan canonical form (Section 4.7), bilinear functionals and congruence (Section 4.8), Euclidean vector spaces (Section 4.9), and linear transformations on Euclidean vector spaces (Subsections 4.10A, 4.10B, and 4.10C).

Chapter 5 addresses metric spaces, which constitute some of the most important topological spaces.
In a one-semester course, the emphasis in this chapter should be on the definition of metric space and the presentation of important classes of metric spaces (Sections 5.1 and 5.3), open and closed sets (Section 5.4), complete metric spaces (Section 5.5), compactness (Section 5.6), and continuous functions (Section 5.7). The development of many classes of metric spaces requires important inequalities, including the Hölder and the Minkowski inequalities for finite and infinite sums and for integrals. These are presented in Section 5.2 and need to be included in the course. Sections 5.8 and 5.10 address specific applications and may be omitted without any loss of continuity. However, time permitting, the material in Section 5.9, concerning equivalent and homeomorphic metric spaces and topological spaces, should be considered for inclusion in the course, since it provides the student a glimpse into other areas of mathematics.

To demonstrate mathematical systems endowed with both algebraic and topological structures, the one-semester course should include the material of Sections 6.1 and 6.11 in Chapter 6, concerning normed linear spaces (resp., Banach spaces) and inner product spaces (resp., Hilbert spaces), respectively.
A two-semester course

In addition to the material outlined above for a one-semester course, a two-semester course should include most of the material in Chapters 2, 6, and 7.

Chapter 2 addresses algebraic structures. The coverage of semigroups and groups, rings and fields, and modules, vector spaces and algebras (Section 2.1) should be in sufficient detail to give the student an appreciation of the various algebraic structures summarized in Figure B on page 61. Important mappings defined on these algebraic structures (homomorphisms) should also be emphasized (Section 2.2) in a two-semester course, as should the brief treatment of polynomials in Section 2.3.

The first ten sections of Chapter 6 address normed linear spaces (resp., Banach spaces) while the next four sections address inner product spaces (resp., Hilbert spaces). The last section of this chapter, which includes applications (to random variables and estimates of random variables), may be omitted without any loss of continuity. The material concerning normed linear spaces (Section 6.1), linear subspaces (Section 6.2), infinite series (Section 6.3), convex sets (Section 6.4), linear functionals (Section 6.5), finite-dimensional spaces (Section 6.6), inner product spaces (Section 6.11), orthogonal complements (Section 6.12), and Fourier series (Section 6.13) should be covered in its entirety. Coverage of the material on geometric aspects of linear functionals (Section 6.7), extensions of linear functionals (Section 6.8), dual space and second dual space (Section 6.9), weak convergence (Section 6.10), and the Riesz representation theorem (Section 6.14) should be selective and tailored to the availability of time and the students' areas of interest. (For example, students interested in optimization and estimation problems may want a detailed coverage of the Hahn-Banach theorem included in Section 6.8.)

Chapter 7 addresses (bounded) linear operators defined on Banach and Hilbert spaces. The first nine sections of this chapter should be covered in their entirety in a two-semester course.
The material of this chapter includes bounded linear transformations (Section 7.1), inverses (Section 7.2), conjugate and adjoint operators (Section 7.3), Hermitian operators (Section 7.4), normal, projection, unitary and isometric operators (Section 7.5), the spectrum of an operator (Section 7.6), completely continuous operators (Section 7.7), the spectral theorem for completely continuous normal operators (Section 7.8), and differentiation of (not necessarily linear and bounded) operators (Section 7.9). The last section, which includes applications to integral equations, an example from optimal control, and minimization of functionals by the method of steepest descent, may be omitted without loss of continuity. Both one-semester and two-semester courses offered by the present authors, based on this book, usually included a project conducted by each course participant to demonstrate the applicability of the course material. Each project
involved a formal presentation to the entire class at the end of the semester. The courses described above were also offered using the Socratic method, following the outlines given above. These courses typically involved half a dozen participants. While most of the material was self taught by the students themselves, the classroom meetings served as a forum for guidance, clarifications, and challenges by the teacher, usually resulting in lively discussions of the subject on hand not only among teacher and students, but also among students themselves.

For the current printing of this book, we have created a supplementary website of additional resources for students and instructors: http://Michel.Herget.net. Available at this website are additional current references concerning the subject matter of the book and a list of several areas of applications (including references). Since the latter reflects mostly the authors' interests, it is by definition rather subjective. Among several additional items, the website also includes some reviews of the present book. In this regard, the authors would like to invite readers to submit reviews of their own for inclusion into the website.

The present publication of Algebra and Analysis for Engineers and Scientists was made possible primarily because of Tom Grasso, Birkhäuser's Computational Sciences and Engineering Editor, whom we would like to thank for his considerations and professionalism.

Anthony N. Michel
Charles J. Herget
Summer 2007
1

FUNDAMENTAL CONCEPTS
In this chapter we present fundamental concepts required throughout the remainder of this book. We begin by considering sets in Section 1.1. In Section 1.2 we discuss functions; in Section 1.3 we introduce relations and equivalence relations; and in Section 1.4 we concern ourselves with operations on sets. In Section 1.5 we give a brief indication of the types of mathematical systems which we will consider in this book. The chapter concludes with a brief discussion of references.
1.1. SETS

Virtually every area of modern mathematics is developed by starting from an undefined object called a set. There are several reasons for doing this. One of these is to develop a mathematical discipline in a completely axiomatic and totally abstract manner. Another reason is to present a unified approach to what may seem to be highly diverse topics in mathematics. Our reason is the latter, for our interest is not in abstract mathematics for its own sake. However, by using abstraction, many of the underlying principles of modern mathematics are more clearly understood.

Thus, we begin by assuming that a set is a well defined collection of elements or objects. We denote sets by common capital letters A, B, C, etc., and elements or objects of sets by lower case letters a, b, c, etc. For example, we write A = {a, b, c} to indicate that A is the collection of elements a, b, c. If an element x belongs to a set A, we write x ∈ A. In this case we say that "x belongs to A," or "x is contained in A," or "x is a member of A," etc. If x is any element and if A is a set, then we assume that one knows whether x belongs to A or whether x does not belong to A. If x does not belong to A, we write x ∉ A.
To illustrate some of the concepts, we assume that the reader is familiar with the set of real numbers. Thus, if we say

R is the set of all real numbers,

then this is a well defined collection of objects. We point out that it is possible to characterize the set of real numbers in a purely abstract manner based on an axiomatic approach. We shall not do so here.

To illustrate a non-well defined collection of objects, consider the statement "the set of all tall people in Ames, Iowa." This is clearly not precise enough to be considered here.

We will agree that any set A may not contain any given element x more than once unless we explicitly say so. Moreover, we assume that the concept of "order" will play no role when representing elements of a set, unless we say so. Thus, the sets A = {a, b, c} and B = {c, b, a} are to be viewed as being exactly the same set.

We usually do not describe a set by listing every element between the curly brackets { } as we did for set A above. A convenient method of characterizing sets is as follows. Suppose that for each element x of a set A there is a statement P(x) which is either true or false. We may then define a set B which consists of all elements x ∈ A such that P(x) is true, and we may write

B = {x ∈ A: P(x) is true}.

For example, let A denote the set of all people who live in Ames, Iowa, and let B denote the set of all males who live in Ames. We can write, then,

B = {x ∈ A: x is a male}.
When it is clear which set x belongs to, we sometimes write {x: P(x) is true} (instead of, say, {x ∈ A: P(x) is true}).

It is also necessary to consider a set which has no members. Since a set is determined by its elements, there is only one such set, which is called the empty set, or the vacuous set, or the null set, or the void set, and which is denoted by ∅. Any set, A, consisting of one or more elements is said to be non-empty or non-void. If A is non-void we write A ≠ ∅.

If A and B are sets and if every element of B also belongs to A, then we say that B is a subset of A or A includes B, and we write B ⊂ A or A ⊃ B. Furthermore, if B ⊂ A and if there is an x ∈ A such that x ∉ B, then we say that B is a proper subset of A. Some texts make a distinction between proper subset and any subset by using the notation ⊂ and ⊆, respectively. We shall not use the symbol ⊆ in this book. We note that if A is any set, then ∅ ⊂ A. Also, ∅ ⊂ ∅. If B is not a subset of A, we write B ⊄ A or A ⊅ B.

1.1.1. Example. Let R denote the set of all real numbers, let Z denote the set of all integers, let J denote the set of all positive integers, and let Q denote the set of all rational numbers. We could alternately describe the set Z as Z = {x ∈ R: x is an integer}. Thus, for every x ∈ R, the statement x is an integer is either true or false. We frequently also specify sets such as J in the following obvious manner,

J = {x ∈ Z: x = 1, 2, ...}.

We can specify the set Q as

Q = {x ∈ R: x = p/q, p, q ∈ Z, q ≠ 0}.

It is clear that ∅ ⊂ J ⊂ Z ⊂ Q ⊂ R, and that each of these subsets is a proper subset. We note that ∅ ∉ J. •
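As an editorial aside (not part of the original text), the set-builder notation used in Example 1.1.1 has a direct computational analogue for finite sets: a Python set comprehension selects exactly those elements x of A for which a statement P(x) is true. The finite range standing in for Z below is an assumption made purely for illustration, since computer sets must be finite:

```python
# Set-builder notation B = {x in A : P(x) is true} as a comprehension.
A = set(range(-10, 11))               # a finite stand-in for the integers Z

J = {x for x in A if x >= 1}          # cf. J = {x in Z : x = 1, 2, ...}
evens = {x for x in A if x % 2 == 0}  # {x in A : x is even}

assert J == set(range(1, 11))
assert all(x % 2 == 0 for x in evens)
```

The predicate after `if` plays the role of P(x); elements for which it is false are simply excluded from the resulting set.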
We now wish to state what is meant by equality of sets.

1.1.2. Definition. Two sets, A and B, are said to be equal if A ⊂ B and B ⊂ A. In this case we write A = B. If two sets, A and B, are not equal, we write A ≠ B. If x and y denote the same element of a set, we say that they are equal and we write x = y. If x and y denote distinct elements of a set, we write x ≠ y.
We emphasize that all definitions are "if and only if" statements. Thus, in the above definition we should actually have said: A and B are equal if and only if A ⊂ B and B ⊂ A. Since this is always understood, hereafter all definitions will imply the "only if" portion. Thus, we simply say: two sets A and B are said to be equal if A ⊂ B and B ⊂ A. In Definition 1.1.2 we introduced two concepts of equality, one of equality of sets and one of equality of elements. We shall encounter many forms of equality throughout this book.
Now let X be a set and let A ⊂ X. The complement of subset A with respect to X is the set of elements of X which do not belong to A. We denote the complement of A with respect to X by C_X A. When it is clear that the complement is with respect to X, we simply say the complement of A (instead of the complement of A with respect to X), and simply write A⁻. Thus, we have

A⁻ = {x ∈ X: x ∉ A}.   (1.1.3)

In every discussion involving sets, we will always have a given fixed set in mind from which we take elements and subsets. We will call this set the universal set, and we will usually denote this set by X. Throughout the remainder of the present section, X always denotes an arbitrary non-void fixed set.

We now establish some properties of sets.

1.1.4. Theorem. Let A, B, and C be subsets of X. Then

(i) if A ⊂ B and B ⊂ C, then A ⊂ C;
(ii) X⁻ = ∅;
(iii) ∅⁻ = X;
(iv) (A⁻)⁻ = A;
(v) A ⊂ B if and only if A⁻ ⊃ B⁻; and
(vi) A = B if and only if A⁻ = B⁻.
Proof. To prove (i), first assume that A is non-void and let x ∈ A. Since A ⊂ B, x ∈ B, and since B ⊂ C, x ∈ C. Since x is arbitrary, every element of A is also an element of C, and so A ⊂ C. Finally, if A = ∅, then A ⊂ C follows trivially.

The proofs of parts (ii) and (iii) follow immediately from (1.1.3).

To prove (iv), we must show that A ⊂ (A⁻)⁻ and (A⁻)⁻ ⊂ A. If A = ∅, then clearly A ⊂ (A⁻)⁻. Now suppose that A is non-void. We note from (1.1.3) that

(A⁻)⁻ = {x ∈ X: x ∉ A⁻}.   (1.1.5)

If x ∈ A, it follows from (1.1.3) that x ∉ A⁻, and hence we have from (1.1.5) that x ∈ (A⁻)⁻. This proves that A ⊂ (A⁻)⁻. If (A⁻)⁻ = ∅, then A = ∅; otherwise we would have a contradiction by what we have already shown, i.e., A ⊂ (A⁻)⁻. So let us assume that (A⁻)⁻ ≠ ∅. If x ∈ (A⁻)⁻, it follows from (1.1.5) that x ∉ A⁻, and thus we have x ∈ A in view of (1.1.3). Hence, (A⁻)⁻ ⊂ A.

We leave the proofs of parts (v) and (vi) as an exercise. •
1.1.6. Exercise. Prove parts (v) and (vi) of Theorem 1.1.4.
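For a finite universal set, parts (iv) through (vi) of Theorem 1.1.4 can also be verified exhaustively by machine. The following Python sketch is an illustrative aside, not a substitute for the requested proofs; the helpers `complement` and `all_subsets` are ad hoc names introduced here, not notation from the text:

```python
from itertools import combinations

X = {1, 2, 3, 4}  # a small universal set chosen for illustration

def complement(A):
    # Complement of A with respect to the universal set X, as in (1.1.3).
    return X - A

def all_subsets(S):
    # Every subset of S, including the empty set and S itself.
    return [set(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]

for A in all_subsets(X):
    assert complement(complement(A)) == A                    # part (iv)
    for B in all_subsets(X):
        assert (A <= B) == (complement(A) >= complement(B))  # part (v)
        assert (A == B) == (complement(A) == complement(B))  # part (vi)
```

Such a check covers only one finite X, of course; the theorem itself holds for an arbitrary universal set, which is exactly why the containment arguments in the proof are needed.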
The proofs given in parts (i) and (iv) of Theorem 1.1.4 are intentionally quite detailed in order to demonstrate the exact procedure required to prove containment and equality of sets. Frequently, the manipulations required to prove some seemingly obvious statements are quite long. It is suggested that the reader carry out all the details in the manipulations of the above exercise and the exercises that follow.

Next, let A and B be subsets of X. We define the union of sets A and B, denoted by A ∪ B, as the set of all elements that are in A or B; i.e.,

A ∪ B = {x ∈ X: x ∈ A or x ∈ B}.

When we say x ∈ A or x ∈ B, we mean x is in either A or in B or in both A and B. This inclusive use of "or" is standard in mathematics and logic.

If A and B are subsets of X, we define their intersection to be the set of all elements which belong to both A and B, and denote the intersection by A ∩ B. Specifically,

A ∩ B = {x ∈ X: x ∈ A and x ∈ B}.
If the intersection of two sets A and B is empty, i.e., if A ∩ B = ∅, we say that A and B are disjoint. For example, let X = {1, 2, 3, 4, 5}, let A = {1, 2}, let B = {3, 4, 5}, let C = {2, 3}, and let D = {4, 5}. Then A⁻ = B, B⁻ = A, D ⊂ B, A ∪ B = X, A ∩ B = ∅, A ∪ C = {1, 2, 3}, B ∩ D = D, A ∩ C = {2}, etc.

In the next result we summarize some of the important properties of union and intersection of sets.
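The claims in the preceding example can be checked directly with Python's built-in set operations (an illustrative aside, not part of the original text; here `|`, `&`, `-`, and `<=` denote union, intersection, set difference, and inclusion, respectively):

```python
X = {1, 2, 3, 4, 5}
A, B, C, D = {1, 2}, {3, 4, 5}, {2, 3}, {4, 5}

assert X - A == B and X - B == A   # complements with respect to X
assert D <= B                      # D is a subset of B
assert A | B == X                  # A union B equals X
assert A & B == set()              # A and B are disjoint
assert A | C == {1, 2, 3}
assert B & D == D
assert A & C == {2}
```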
(ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii) (xiv) (xv) (xvi) (xvii)
An B AU B An 0 Au 0
Let A, B, and C be subsets of .X
An X = AuX = X ; A n A= Au A= A u A-
An A-
B n A; B U A; =
=
=
0; A; =
=
A;
=
A; A; X;
0;
An Be A; An B = A if and only if A c B; A c A U B; A = A u B if and only if B c A; (A n B) n C = An (B n C); (A U B) U C = A U (B U C); A n (B u C) = (A n B) u (A n C);
Then
Chapter 1 I uF ndamental Concepts
6
(xviii) (xi)x (x)x
(A
n
B) U
(A U B)(A n Bf
= =
C = (A U C) () (B U A- n B- ; and A- U B- .
C);
Proof. We only prove part (xviii) of this theorem, again as an illustration of the manipulations involved. We will first show that (A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C), and then we show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C).

Clearly, if (A ∩ B) ∪ C = ∅, the assertion is true. So let us assume that (A ∩ B) ∪ C ≠ ∅, and let x be any element of (A ∩ B) ∪ C. Then x ∈ A ∩ B or x ∈ C. Suppose x ∈ A ∩ B. Then x belongs to both A and B, and hence x ∈ A ∪ C and x ∈ B ∪ C. From this it follows that x ∈ (A ∪ C) ∩ (B ∪ C). On the other hand, let x ∈ C. Then x ∈ A ∪ C and x ∈ B ∪ C, and hence x ∈ (A ∪ C) ∩ (B ∪ C). Thus, if x ∈ (A ∩ B) ∪ C, then x ∈ (A ∪ C) ∩ (B ∪ C), and we have

(A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C).   (1.1.8)

To show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C) we need to prove the assertion only when (A ∪ C) ∩ (B ∪ C) ≠ ∅. So let x be any element of (A ∪ C) ∩ (B ∪ C). Then x ∈ A ∪ C and x ∈ B ∪ C. Since x ∈ A ∪ C, then x ∈ A or x ∈ C. Furthermore, x ∈ B ∪ C implies that x ∈ B or x ∈ C. We know that either x ∈ C or x ∉ C. If x ∈ C, then x ∈ (A ∩ B) ∪ C. If x ∉ C, then it follows from the above comments that x ∈ A and also x ∈ B. Then x ∈ A ∩ B, and hence x ∈ (A ∩ B) ∪ C. Thus, if x ∉ C, then x ∈ (A ∩ B) ∪ C. Since this exhausts all the possibilities, we conclude that

(A ∪ C) ∩ (B ∪ C) ⊂ (A ∩ B) ∪ C.   (1.1.9)

From (1.1.8) and (1.1.9) it follows that (A ∪ C) ∩ (B ∪ C) = (A ∩ B) ∪ C. ■

1.1.10. Exercise. Prove parts (i) through (xvii) and parts (xix) and (xx) of Theorem 1.1.7.
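Identities such as those of Theorem 1.1.7 can be spot-checked (not proved, of course) on small finite sets with Python's built-in set type; the universe X and the sets A, B, C below are our own sample choices, matching the example preceding the theorem.

```python
# Spot-check of several parts of Theorem 1.1.7 on finite sets.
X = {1, 2, 3, 4, 5}
A, B, C = {1, 2}, {3, 4, 5}, {2, 3}

# (xvii) and (xviii): the two distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert (A & B) | C == (A | C) & (B | C)

# (xix) and (xx): De Morgan's laws, complements taken relative to X
assert X - (A | B) == (X - A) & (X - B)
assert X - (A & B) == (X - A) | (X - B)
```

A check like this is a useful companion to the pencil-and-paper proofs, since a single counterexample would immediately expose a mis-stated identity.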
In view of part (xvi) of Theorem 1.1.7, there is no ambiguity in writing A ∪ B ∪ C. Extending this concept, let n be any positive integer and let A₁, A₂, ..., Aₙ denote subsets of X. The set A₁ ∪ A₂ ∪ ... ∪ Aₙ is defined to be the set of all x ∈ X which belong to at least one of the subsets Aᵢ, and we write

⋃_{i=1}^{n} Aᵢ = A₁ ∪ A₂ ∪ ... ∪ Aₙ = {x ∈ X : x ∈ Aᵢ for some i = 1, ..., n}.

Similarly, by part (xv) of Theorem 1.1.7, there is no ambiguity in writing A ∩ B ∩ C. We define

⋂_{i=1}^{n} Aᵢ = A₁ ∩ A₂ ∩ ... ∩ Aₙ = {x ∈ X : x ∈ Aᵢ for all i = 1, ..., n}.

That is, ⋂_{i=1}^{n} Aᵢ consists of those members of X which belong to all the subsets A₁, A₂, ..., Aₙ.

We will consider the union and the intersection of an infinite number of subsets Aᵢ at a later point in the present section.

The following is a generalization of parts (xix) and (xx) of Theorem 1.1.7.
1.1.11. Theorem. Let A₁, ..., Aₙ be subsets of X. Then

(i) [⋃_{i=1}^{n} Aᵢ]⁻ = ⋂_{i=1}^{n} Aᵢ⁻,   (1.1.12)

and

(ii) [⋂_{i=1}^{n} Aᵢ]⁻ = ⋃_{i=1}^{n} Aᵢ⁻.   (1.1.13)

1.1.14. Exercise. Prove Theorem 1.1.11.
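A finite-family spot-check of the laws (1.1.12) and (1.1.13); the universe X and the family of subsets below are sample choices of ours, not taken from the text.

```python
# Checking De Morgan's laws (1.1.12)/(1.1.13) for a finite family of sets.
X = set(range(10))
family = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]

union = set().union(*family)
inter = set(X).intersection(*family)

# (1.1.12): complement of the union = intersection of the complements
assert X - union == set(X).intersection(*[X - A for A in family])
# (1.1.13): complement of the intersection = union of the complements
assert X - inter == set().union(*[X - A for A in family])
```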
The results expressed in Eqs. (1.1.12) and (1.1.13) are usually referred to as De Morgan's laws. We will see later in this section that these laws hold under more general conditions.

Next, let A and B be two subsets of X. We define the difference of B and A, denoted (B − A), as the set of elements in B which are not in A, i.e.,

B − A = {x ∈ X : x ∈ B and x ∉ A}.

We note here that A is not required to be a subset of B. It is clear that

B − A = B ∩ A⁻.
Now let A and B again be subsets of the set X. The symmetric difference of A and B is denoted by A Δ B and is defined as

A Δ B = (A − B) ∪ (B − A).

The following properties follow immediately.

1.1.15. Theorem. Let A, B, and C denote subsets of X. Then
(i) A Δ B = B Δ A;
(ii) A Δ B = (A ∪ B) − (A ∩ B);
(iii) A Δ A = ∅;
(iv) A Δ ∅ = A;
(v) A Δ (B Δ C) = (A Δ B) Δ C;
(vi) A ∩ (B Δ C) = (A ∩ B) Δ (A ∩ C); and
(vii) A Δ B ⊂ (A Δ C) ∪ (C Δ B).

1.1.16. Exercise. Prove Theorem 1.1.15.
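Python's `^` operator computes exactly the symmetric difference, so the parts of Theorem 1.1.15 can be spot-checked directly; the three sets below are sample choices of ours.

```python
# Checking the parts of Theorem 1.1.15 with the symmetric-difference operator ^.
A, B, C = {1, 2, 3}, {3, 4}, {2, 4, 5}

assert A ^ B == B ^ A                     # (i)  commutativity
assert A ^ B == (A | B) - (A & B)         # (ii)
assert A ^ A == set()                     # (iii)
assert A ^ set() == A                     # (iv)
assert A ^ (B ^ C) == (A ^ B) ^ C         # (v)  associativity
assert A & (B ^ C) == (A & B) ^ (A & C)   # (vi) distributivity
assert (A ^ B) <= (A ^ C) | (C ^ B)       # (vii) inclusion (<= is subset)
```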
In passing, we point out that the use of Venn diagrams is highly useful in visualizing properties of sets; however, under no circumstances should such diagrams take the place of a proof. In Figure A we illustrate the concepts of union, intersection, difference, and symmetric difference of two sets, and the complement of a set, by making use of Venn diagrams. Here, the shaded regions represent the indicated sets.

1.1.17. Figure A. Venn diagrams.
1.1.18. Definition. A non-void set A is said to be finite if A contains n distinct elements, where n is some positive integer; such a set A is said to be of order n. The null set is defined to be finite with order zero. A set consisting of exactly one element, say A = {a}, is called a singleton or the singleton of a. If a set A is not finite, then we say that A is infinite.

In Section 1.2 we will further categorize infinite sets as being countable or uncountable.

Next, we need to consider sets whose elements are sets themselves. For example, if A, B, and C are subsets of X, then the collection 𝒜 = {A, B, C} is a set whose elements are A, B, and C. We usually call a set whose elements are subsets of X a family of subsets of X or a collection of subsets of X. We will usually employ a hierarchical system of notation where lower case letters, e.g., a, b, c, are elements of X; upper case letters, e.g., A, B, C, are subsets of X; and script letters, e.g., 𝒜, ℬ, 𝒞, are families of subsets of X. We could, of course, continue this process and consider a set whose elements are families of subsets, e.g., {𝒜, ℬ, 𝒞}.

In connection with the above comments, we point out that the empty set, ∅, is a subset of X. It is possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case, {∅} is a singleton. We see that ∅ ∈ {∅} and ∅ ⊂ {∅}. In principle, we could also consider sets made up of both elements of X and subsets of X. For example, if x ∈ X and A ⊂ X, then {x, A} is a valid set. However, we shall not make use of sets of this nature in this book.

There is a special family of subsets of X to which we give a special name.
1.1.19. Definition. Let A be any subset of X. We define the power class of A, or the power set of A, to be the family of all subsets of A. We denote the power class of A by 𝒫(A). Specifically,

𝒫(A) = {B : B ⊂ A}.

1.1.20. Example. The power class of the empty set is 𝒫(∅) = {∅}, i.e., the singleton of ∅. The power class of a singleton is 𝒫({a}) = {∅, {a}}. For the set A = {a, b}, 𝒫(A) = {∅, {a}, {b}, {a, b}}. In general, if A is a finite set with n elements, then 𝒫(A) contains 2ⁿ elements. ■
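The power class of a finite set can be enumerated with the standard library; the helper name `power_class` below is ours. Since Python sets are unhashable, the members are stored as `frozenset`s.

```python
from itertools import combinations

def power_class(A):
    """Return P(A) as a set of frozensets (one per subset of A)."""
    A = list(A)
    return {frozenset(c) for r in range(len(A) + 1)
                         for c in combinations(A, r)}

P = power_class({"a", "b"})
assert frozenset() in P                 # the empty set is always a member
assert len(P) == 2 ** 2                 # 2^n members for an n-element set
assert len(power_class(set())) == 1     # P(empty set) is the singleton {empty set}
```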
Before proceeding further, it should be pointed out that a free and uncritical use of set theory can lead to contradictions, and that set theory has had a careful development with various devices used to exclude the contradictions. Roughly speaking, contradictions arise when one uses sets which are "too big," such as trying to speak of a set which contains everything. In all of our subsequent discussions we will keep away from these contradictions by always having some set or space X fixed for a given discussion and by considering only sets whose elements are elements of X, or sets (collections) whose elements are subsets of X, or sets (families) whose elements are collections of subsets of X, etc.

Let us next consider ordered sets. Above, we defined set in such a manner that the ordering of the elements is immaterial, and furthermore that each element is distinct. Thus, if a and b are elements of X, then {a, b} = {b, a}; i.e., there is no preference given to a or b. Furthermore, we have {a, a, b} = {a, b}. In this case we sometimes speak of an unordered pair {a, b}. Frequently, we will need to consider the ordered pair (a, b) (a and b need not belong to the same set), where we distinguish between the first element a and the second element b. In this case (a, b) = (u, v) if and only if u = a and v = b. Thus, (a, b) ≠ (b, a) if a ≠ b. Also, we will consider ordered triplets (a, b, c), ordered quadruplets (a, b, c, d), etc., where we need to distinguish between the first element, second element, third element, fourth element, etc. Ordered pairs, ordered triplets, ordered quadruplets, etc., are examples of ordered sets.

We point out here that our characterization of ordered sets is not axiomatic, since we are assuming that the reader knows what is meant by the first element, second element, third element, etc. (However, it is possible to define ordered sets in a totally abstract fashion without assuming this simple fact. We shall forego these subtle distinctions and accept the preceding as a definition.)

Now let X and Y be two non-void sets. We define the Cartesian or direct product of X and Y, denoted by X × Y, as the set of all ordered pairs whose first element belongs to X and whose second element belongs to Y. Thus,

X × Y = {(x, y) : x ∈ X, y ∈ Y}.   (1.1.21)
Next, let X₁, ..., Xₙ denote n arbitrary non-void sets. We similarly define the (n-fold) Cartesian product of X₁, ..., Xₙ, denoted by X₁ × X₂ × ... × Xₙ, as

X₁ × X₂ × ... × Xₙ = {(x₁, x₂, ..., xₙ) : x₁ ∈ X₁, x₂ ∈ X₂, ..., xₙ ∈ Xₙ}.   (1.1.22)

We call xᵢ the ith element of the ordered set (x₁, ..., xₙ) ∈ X₁ × X₂ × ... × Xₙ, i = 1, ..., n. Here again, two ordered sets (x₁, ..., xₙ) and (y₁, ..., yₙ) are said to be equal if and only if xᵢ = yᵢ, i = 1, ..., n. In the following example, the symbol ≜ means equal by definition.
1.1.23. Example. Let R be the set of all real numbers. We denote the Cartesian product R × R by R² ≜ R × R. Thus, if x, y ∈ R, the ordered pair (x, y) ∈ R × R. We may interpret (x, y) geometrically as being the coordinates of a point in the plane, x being the first coordinate and y the second coordinate. ■
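For finite sets, the Cartesian product (1.1.21) can be enumerated with `itertools.product`; the small sets below are sample choices of ours, and the check confirms that A × B and B × A are in general different sets.

```python
from itertools import product

A = {0, 1}
B = {"a", "b", "c"}

AxB = set(product(A, B))   # all ordered pairs (x, y) with x in A, y in B
BxA = set(product(B, A))

assert len(AxB) == len(A) * len(B)
assert AxB != BxA          # order matters: (0, "a") is in A x B but not in B x A
```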
1.1.24. Example. Let A = {0, 1} and let B = {a, b, c}. Then

A × B = {(0, a), (0, b), (0, c), (1, a), (1, b), (1, c)}

and

B × A = {(a, 0), (a, 1), (b, 0), (b, 1), (c, 0), (c, 1)}.

From this example it follows that, in general, if A and B are distinct sets, then A × B ≠ B × A. ■

Next, we consider some generalizations to an ordered set. To this end, let I denote any non-void set which we call an index set. Now for each α ∈ I, suppose there is a unique A_α ⊂ X. We call {A_α : α ∈ I} an indexed family of sets. This notation requires some clarification. Strictly speaking, the set notation {A_α : α ∈ I} would normally indicate that none of the sets A_α, α ∈ I, may be repeated. However, in the case of an indexed family we agree to permit the possibility that the sets A_α, α ∈ I, need not be distinct.

We define an indexed set in a similar manner. Let I be an index set, and for each α ∈ I let there be a unique element x_α ∈ X. Then the set {x_α : α ∈ I} is called an indexed set. Here again, we agree to permit the possibility that the elements x_α, α ∈ I, need not be distinct. Clearly, if I is a finite non-void set, then an indexed set is simply an ordered set.

In the next definition, and throughout the remainder of this section, J denotes the set of positive integers.

1.1.25. Definition. A sequence is an indexed set whose index set is J. A sequence of sets is an indexed family of sets whose index set is J.

We usually abbreviate the sequence {xₙ ∈ X : n ∈ J} by {xₙ}, when no possibility for confusion exists. (Even though the same notation is used for the sequence {xₙ} and the singleton of xₙ, the meaning as to which is meant will always be clear from context.) Some authors write {xₙ}_{n=1}^{∞} to indicate that the index set of the sequence is J. Also, some authors allow the index set of a sequence to be finite.

We are now in a position to consider the following additional generalizations.
1.1.26. Definition. Let {A_α : α ∈ I} be an indexed family of sets, and let K be any subset of I. If K is non-void, we define

⋃_{α∈K} A_α = {x ∈ X : x ∈ A_α for some α ∈ K}

and

⋂_{α∈K} A_α = {x ∈ X : x ∈ A_α for all α ∈ K}.

If K = ∅, we define

⋃_{α∈∅} A_α = ∅ and ⋂_{α∈∅} A_α = X.
The union and intersection of families of sets which are not necessarily indexed are defined in a similar fashion. Thus, if ℱ is any non-void family of subsets of X, then we define

⋃_{F∈ℱ} F = {x ∈ X : x ∈ F for some F ∈ ℱ}

and

⋂_{F∈ℱ} F = {x ∈ X : x ∈ F for all F ∈ ℱ}.

When, in Definition 1.1.26, K is of the form K = {k, k + 1, k + 2, ...}, where k is an integer, we sometimes write ⋃_{n=k}^{∞} Aₙ and ⋂_{n=k}^{∞} Aₙ.

1.1.27. Example. Let X = R, the set of real numbers, and let I = {x ∈ R : 0 < x < 1}. Let A_α = {x ∈ R : 0 ≤ x < α} for all α ∈ I. Then

⋃_{α∈I} A_α = {x ∈ R : 0 ≤ x < 1} and ⋂_{α∈I} A_α = {0},

i.e., the intersection is the singleton containing only the element 0. ■
1.1.28. Example. Let X = R, the set of real numbers, and let I = J. Let

Aₙ = {x ∈ R : −n < x < n} and Bₙ = {x ∈ R : −1/n < x < 1 + 1/n}.

Then

⋃_{n=1}^{∞} Aₙ = R and ⋂_{n=1}^{∞} Aₙ = {x ∈ R : −1 < x < 1},

and

⋃_{n=1}^{∞} Bₙ = {x ∈ R : −1 < x < 2} and ⋂_{n=1}^{∞} Bₙ = {x ∈ R : 0 ≤ x ≤ 1}. ■
The reader is now in a position to prove the following results.

1.1.29. Theorem. Let {A_α : α ∈ I} be an indexed family of sets, let B be any subset of X, and let K be any subset of I. Then
(i) B ∩ [⋃_{α∈K} A_α] = ⋃_{α∈K} (B ∩ A_α);
(ii) B ∪ [⋂_{α∈K} A_α] = ⋂_{α∈K} (B ∪ A_α);
(iii) B − ⋃_{α∈K} A_α = ⋂_{α∈K} (B − A_α);
(iv) B − ⋂_{α∈K} A_α = ⋃_{α∈K} (B − A_α);
(v) [⋃_{α∈K} A_α]⁻ = ⋂_{α∈K} A_α⁻; and
(vi) [⋂_{α∈K} A_α]⁻ = ⋃_{α∈K} A_α⁻.

1.1.30. Exercise. Prove Theorem 1.1.29.
Parts (v) and (vi) of Theorem 1.1.29 are called De Morgan's laws.

We conclude the present section with the following:

1.1.31. Definition. Let ℱ be any family of subsets of X. ℱ is said to be a family of disjoint sets if A ∩ B = ∅ for all A, B ∈ ℱ such that A ≠ B. A sequence of sets {Eₙ} is said to be a sequence of disjoint sets if for every m, n ∈ J such that m ≠ n, Eₘ ∩ Eₙ = ∅.
1.2. FUNCTIONS
We first give the definition of a function in a set-theoretic manner. Then we discuss the meaning of function in more intuitive terms.

1.2.1. Definition. Let X and Y be non-void sets. A function f from X into Y is a subset of X × Y such that for every x ∈ X there is one and only one y ∈ Y (i.e., there is a unique y ∈ Y) such that (x, y) ∈ f. The set X is called the domain of f (or the domain of definition of f), and we say that f is defined on X. The set {y ∈ Y : (x, y) ∈ f for some x ∈ X} is called the range of f and is denoted by ℛ(f). For each x ∈ X, we call the unique y ∈ Y such that (x, y) ∈ f the value of f at x and denote it by f(x). We sometimes write f: X → Y to denote the function f from X into Y.

The terms mapping, map, operator, transformation, and function are used interchangeably. When using the term mapping, we usually say "a mapping of X into Y." Although the distinction between the words "of X" and "from X" is immaterial, as we shall see, the wording "into Y" becomes important as opposed to the wording "onto Y," which we will encounter later.

Sometimes it is convenient not to insist that the domain of definition of f be all of X; i.e., a function is sometimes defined on a subset of X rather than on all of X. In any case, the domain of definition of f is denoted by 𝔇(f) ⊂ X. Unless specified otherwise, we shall always assume that 𝔇(f) = X.

Intuitively, a function f is a "rule" whereby for each x ∈ X a unique y ∈ Y is assigned to x. When viewed in this manner, the term mapping is quite descriptive. However, defining a function as a "rule" involves usage of yet another undefined term.

Concerning functions, some additional comments are in order.

1. So-called "multivalued functions" are not allowed by the above definition. They will be treated later under the topic of relations (Section 1.3).
2. The set X (or Y) may be the Cartesian product of sets, e.g., X = X₁ × X₂ × ... × Xₙ. In this case we think of f as being a function of n variables. We write f(x₁, ..., xₙ) to denote the value of f at (x₁, ..., xₙ) ∈ X = X₁ × ... × Xₙ.
3. It is important that the distinction between a function and the value of a function be clearly understood. The value of a function, f(x), is an element of Y. The function f is a much larger entity, and it is to be thought of as a single object. Note that f ∈ 𝒫(X × Y) (the power set of X × Y), but not every element of 𝒫(X × Y) is a function. The set of all functions from X into Y is a subset of 𝒫(X × Y) and is sometimes denoted by Yˣ.
1.2.2. Example. Let A and B be the sets defined in Example 1.1.24. Let f be the subset of A × B given by f = {(0, a), (1, b)}. Then f is a function from A into B. We see that f(0) = a and f(1) = b. The range of f is the set {a, b}, which is a proper subset of B. ■

Although we have defined a function as being a set, we usually characterize a function according to a rule, as shown, for example, in the following.

1.2.3. Example. Let R denote the real numbers, and let f be a function from R into R whose value at each x ∈ R is given by f(x) = sin x. The function f is the sine function. Expressed explicitly as a set, we see that f = {(x, y) : y = sin x}. Note that the subset {(x, y) : x = sin y} ⊂ R × R is not a function. ■

The preceding example also illustrates the notion of the graph of a function. Let X and Y denote the set of real numbers, let X × Y denote their Cartesian product, and let f be a function from X into Y. The collection of ordered pairs (x, f(x)) in X × Y is called the graph of the function f. Thus, a subset G of X × Y is the graph of a function defined on X if and only if for each x ∈ X there is a unique ordered pair in G whose first element is x. In fact, the graph of a function and the function itself are one and the same thing.

Since functions are defined as sets, equality of functions is to be interpreted in the sense of equality of sets. With this in mind, the reader will have no difficulty in proving the following.

1.2.4. Theorem. Two mappings f and g of X into Y are equal if and only if f(x) = g(x) for every x ∈ X.

1.2.5. Exercise. Prove Theorem 1.2.4.
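For finite sets, Definition 1.2.1 can be exercised directly in Python: a function is literally a set of ordered pairs, and a dict is the idiomatic equivalent. The pairs below are those of Example 1.2.2.

```python
# Example 1.2.2's function as a literal set of ordered pairs, and as a dict.
f_pairs = {(0, "a"), (1, "b")}

# A subset of A x B is a function iff each first element occurs exactly once.
firsts = [x for (x, _) in f_pairs]
assert len(firsts) == len(set(firsts))

f = dict(f_pairs)                      # idiomatic dict form: x -> f(x)
assert f[0] == "a" and f[1] == "b"
assert set(f.values()) == {"a", "b"}   # the range, a proper subset of B
```

Equality of functions in the sense of Theorem 1.2.4 then coincides with equality of the underlying dicts.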
We now wish to further characterize and classify functions. If f is a function from X into Y, we denote the range of f by ℛ(f). In general, ℛ(f) ⊂ Y may or may not be a proper subset of Y. Thus, we have the following definition.

1.2.6. Definition. Let f be a function from X into Y. If ℛ(f) = Y, then f is said to be surjective or a surjection, and we say that f maps X onto Y. If f is a function such that for every x₁, x₂ ∈ X, f(x₁) = f(x₂) implies that x₁ = x₂, then f is said to be injective or a one-to-one mapping, or an injection. If f is both injective and surjective, we say that f is bijective or one-to-one and onto, or a bijection.

Let's go over this again. Every function f: X → Y is a mapping of X into Y. If the range of f happens to be all of Y, then we say f maps X onto Y. For each x ∈ X, there is always a unique y ∈ Y such that y = f(x). However, there may be distinct elements x₁ and x₂ in X such that f(x₁) = f(x₂). If there is a unique x ∈ X such that f(x) = y for each y ∈ ℛ(f), then we say that f is a one-to-one mapping. If f maps X onto Y and is one-to-one, we say that f is one-to-one and onto.

In Figure B an attempt is made to illustrate these concepts pictorially. In this figure the dots denote elements of sets and the arrows indicate the rules of the various functions. The reader should commit to memory the following associations: surjective ↔ onto; injective ↔ one-to-one; bijective ↔ one-to-one and onto. Frequently, the term one-to-one is abbreviated as (1-1).
1.2.7. Figure B. Illustration of different types of mappings (f₁: X → Y is into; f₂: X → Y is onto; f₃: X → Y is (1-1); f₄: X → Y is bijective).
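For functions on finite sets stored as dicts, the defining properties of Definition 1.2.6 can be tested mechanically; the helper names below (`is_injective`, etc.) are ours.

```python
# Testing the defining properties of Definition 1.2.6 for finite functions
# represented as dicts mapping each domain element to its value.
def is_injective(f):
    return len(set(f.values())) == len(f)      # no two x share the value f(x)

def is_surjective(f, Y):
    return set(f.values()) == set(Y)           # the range is all of Y

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

f = {1: "a", 2: "b", 3: "c"}
g = {1: "a", 2: "a", 3: "b"}
assert is_bijective(f, {"a", "b", "c"})
assert not is_injective(g) and is_surjective(g, {"a", "b"})
```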
We now prove the following important but obvious result.

1.2.8. Theorem. Let f be a function from X into Y, and let Z = ℛ(f), the range of f. Let g denote the set {(y, x) ∈ Z × X : (x, y) ∈ f}. Then, clearly, g is a subset of Z × X, and f is injective if and only if g is a function from Z into X.

Proof. Let f be injective, and let y ∈ Z. Since y ∈ ℛ(f), there is an x ∈ X such that (x, y) ∈ f, and hence (y, x) ∈ g. Now suppose there is another x₁ ∈ X such that (y, x₁) ∈ g. Then (x₁, y) ∈ f. Since f is injective and y = f(x) = f(x₁), this implies that x = x₁, and so x is unique. This means that g is a function from Z into X.

Conversely, suppose g is a function from Z into X. Let x₁, x₂ ∈ X be such that f(x₁) = f(x₂). This implies that (x₁, f(x₁)) and (x₂, f(x₂)) ∈ f, and so (f(x₁), x₁) and (f(x₂), x₂) ∈ g. Since f(x₁) = f(x₂) and g is a function, we must have x₁ = x₂. Therefore, f is injective. ■

The above result motivates the following definition.

1.2.9. Definition. Let f be an injective mapping of X into Y. Then we say that f has an inverse, and we call the mapping g defined in Theorem 1.2.8 the inverse of f. Hereafter, we will denote the inverse of f by f⁻¹.
Clearly, if f has an inverse, then f⁻¹ is a mapping from ℛ(f) onto X.

1.2.10. Theorem. Let f be an injective mapping of X into Y. Then
(i) f is a one-to-one mapping of X onto ℛ(f);
(ii) f⁻¹ is a one-to-one mapping of ℛ(f) onto X;
(iii) for every x ∈ X, f⁻¹(f(x)) = x; and
(iv) for every y ∈ ℛ(f), f(f⁻¹(y)) = y.

1.2.11. Exercise. Prove Theorem 1.2.10.
Note that in the above definition, the domain of f⁻¹ is ℛ(f), which need not be all of Y. Some texts insist that in order for a function f to have an inverse, it must be bijective. Thus, when reading the literature it is important to note which definition of f⁻¹ the author has in mind. (Note that an injective function f: X → Y is a bijective function from X onto ℛ(f).)

1.2.12. Example. Let X = Y = R, the set of real numbers. Let f: X → Y be given by f(x) = x³ for every x ∈ X. Then f is a (1-1) mapping of X onto Y, and f⁻¹(y) = y^{1/3} for all y ∈ Y. ■

1.2.13. Example. Let X = Y = J, the set of positive integers. Let f: X → Y be given by f(n) = n + 3 for all n ∈ J. Then f is a (1-1) mapping of X into Y. However, the range of f is ℛ(f) = {y ∈ Y : y ≥ 4} = {4, 5, ...} ≠ Y. Therefore, f has an inverse, f⁻¹, which is defined only on ℛ(f) and not on all of Y. In this case we have f⁻¹(y) = y − 3 for all y ∈ ℛ(f). ■
1.2.14. Example. Let X = Y = R, the set of all real numbers. Let f: X → Y be given by

f(x) = x/(1 + |x|) for all x ∈ X.

Then f is an injective mapping and ℛ(f) = {y ∈ Y : −1 < y < 1}. Also, f⁻¹ is a mapping from ℛ(f) into R given by

f⁻¹(y) = y/(1 − |y|) for all y ∈ ℛ(f). ■
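The inverse-function relations of Theorem 1.2.10 (iii) and (iv) can be spot-checked numerically; the sketch below uses the sample injective map f(x) = x/(1 + |x|), whose inverse on (−1, 1) is g(y) = y/(1 − |y|).

```python
# Numeric check of f_inv(f(x)) = x and f(f_inv(y)) = y for a sample injective map.
def f(x):
    return x / (1 + abs(x))

def f_inv(y):
    return y / (1 - abs(y))   # defined only on the range (-1, 1)

for x in [-10.0, -1.5, 0.0, 0.25, 7.0]:
    assert abs(f_inv(f(x)) - x) < 1e-9     # Theorem 1.2.10 (iii)
for y in [-0.9, -0.5, 0.0, 0.3, 0.99]:
    assert abs(f(f_inv(y)) - y) < 1e-9     # Theorem 1.2.10 (iv)
```

Note that `f_inv` is only meaningful on ℛ(f): feeding it a y with |y| ≥ 1 would not correspond to any point of the domain.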
Next, let X, Y, and Z be non-void sets. Suppose that f: X → Y and g: Y → Z. For each x ∈ X, we have f(x) ∈ Y and g(f(x)) ∈ Z. Since f and g are mappings from X into Y and from Y into Z, respectively, it follows that for each x ∈ X there is one and only one element g(f(x)) ∈ Z. Hence, the set

{(x, z) ∈ X × Z : z = g(f(x)), x ∈ X}   (1.2.15)

is a function from X into Z. We call this function the composite function of g and f and denote it by g ∘ f. The value of g ∘ f at x is given by

(g ∘ f)(x) = g ∘ f(x) ≜ g(f(x)).

In Figure C, a pictorial interpretation of a composite function is given.

1.2.17. Theorem. If f is a mapping of a set X onto a set Y and g is a mapping of the set Y onto a set Z, then g ∘ f is a mapping of X onto Z.

Proof. In order to show that g ∘ f is an onto mapping we must show that for any z ∈ Z there exists an x ∈ X such that g(f(x)) = z. If z ∈ Z, then since g is a mapping of Y onto Z, there is an element y ∈ Y such that g(y) = z. Furthermore, since f is a mapping of X onto Y, there is an x ∈ X such that f(x) = y. Since g ∘ f(x) = g(f(x)) = g(y) = z, it readily follows that g ∘ f is a mapping of X onto Z, which proves the theorem. ■
1.2.16. Figure C. Illustration of a composite function.
We also have:

1.2.18. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g is a (1-1) mapping of the set Y onto a set Z, then g ∘ f is a (1-1) mapping of X onto Z.

1.2.19. Exercise.
Prove Theorem 1.2.18.
Next we prove:

1.2.20. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g is a (1-1) mapping of Y onto a set Z, then (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.

Proof. Let z ∈ Z. Then there exists an x ∈ X such that g ∘ f(x) = z, and hence (g ∘ f)⁻¹(z) = x. Also, since g ∘ f(x) = g(f(x)) = z, it follows that g⁻¹(z) = f(x), from which we have f⁻¹(g⁻¹(z)) = x. But f⁻¹(g⁻¹(z)) = f⁻¹ ∘ g⁻¹(z), and since this is equal to x, we have f⁻¹ ∘ g⁻¹(z) = (g ∘ f)⁻¹(z). Since z is arbitrary, the theorem is proved. ■

Note carefully that in Theorem 1.2.20, f is a mapping of X onto Y. If it had simply been an injective mapping, the composite function f⁻¹ ∘ g⁻¹ might not be defined. That is, the range of g⁻¹ is Y; however, the domain of f⁻¹ is ℛ(f). Clearly, the domain of f⁻¹ must include the range of g⁻¹ in order that the composition f⁻¹ ∘ g⁻¹ be defined.
1= fer, )U ,
(s, w), (t, v), (u, )x .J
and C =
w { , x , y,
Chapter 1 I uF ndamental Concepts
18
We find it convenient to represent this function in the following way:
(r stU ) .
1=
u
v x
W
That is, the top row identifies the domain ofI and the bottom row contains each uniq u e element in the range of I directly below the appropriate element in the domain. Clearly, this representation can be used for any function defined on a finite set. In a similar fashion, let the function g : B - + C be defined as
g
= (U
W )X .
v
x
z
W
y
Clearly, both/and g are bijective. Also, go lis the (I- I ) mapping of A onto C given by
). y z (xr stU
g 0/=
W
F u rthermore,
uX ),
g- I
= (X
I- I
og- t
W
u v
Z W
y), x
(gof)- t
= (X r sZ
w Y
t
). u
Now
i.e.,f- t og- t = ( goltt
= (rX Wt sZ Y )' u
.•
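Computations of this kind mechanize nicely when finite maps are stored as dicts; the values below follow one reading of the tableaux of Example 1.2.21 and are easy to swap out.

```python
# Composition and inversion of finite maps as dictionary operations.
f = {"r": "u", "s": "w", "t": "v", "u": "x"}   # f: A -> B
g = {"u": "x", "v": "w", "w": "z", "x": "y"}   # g: B -> C

g_of_f = {a: g[f[a]] for a in f}               # (g o f)(a) = g(f(a))
f_inv = {v: k for k, v in f.items()}           # invert the bijection f
g_inv = {v: k for k, v in g.items()}           # invert the bijection g
finv_of_ginv = {c: f_inv[g_inv[c]] for c in g_inv}

assert g_of_f == {"r": "x", "s": "z", "t": "w", "u": "y"}
# Theorem 1.2.20: (g o f)^-1 = f^-1 o g^-1
assert {v: k for k, v in g_of_f.items()} == finv_of_ginv
```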
The reader can prove the next result readily.

1.2.22. Theorem. Let W, X, Y, and Z be non-void sets. If f is a mapping of set W into set X, if g is a mapping of X into set Y, and if h is a mapping of Y into set Z (sets W, X, Y, Z are not necessarily distinct), then h ∘ (g ∘ f) = (h ∘ g) ∘ f.

1.2.23. Exercise. Prove Theorem 1.2.22.
1.2.24. Example. Let A = {m, n, p, q}, B = {m, r, s}, C = {r, t, u, v}, and D = {w, x, y, z}, and define mappings f: A → B, g: B → C, and h: C → D by two-row tableaux as in Example 1.2.21. Computing the tableaux for g ∘ f and h ∘ g, and then those for h ∘ (g ∘ f) and (h ∘ g) ∘ f, one finds that

h ∘ (g ∘ f) = (h ∘ g) ∘ f;

i.e., h ∘ (g ∘ f) and (h ∘ g) ∘ f are the same mapping of A into D. ■
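Associativity of composition can be verified for any concrete choice of maps; the dict values below are hypothetical (chosen by us on the sets A, B, C, D of Example 1.2.24, not taken from the original tableaux).

```python
# Checking h o (g o f) = (h o g) o f with hypothetical maps on A, B, C, D.
f = {"m": "m", "n": "r", "p": "s", "q": "r"}   # f: A -> B (hypothetical values)
g = {"m": "r", "r": "t", "s": "v"}             # g: B -> C (hypothetical values)
h = {"r": "w", "t": "x", "u": "y", "v": "z"}   # h: C -> D (hypothetical values)

def compose(outer, inner):
    """Return outer o inner as a dict on the domain of inner."""
    return {a: outer[inner[a]] for a in inner}

assert compose(h, compose(g, f)) == compose(compose(h, g), f)
```

Since the proof of Theorem 1.2.22 is elementwise, any other choice of f, g, h would pass the same check.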
There is a special mapping which is so important that we give it a special name. We have:

1.2.25. Definition. Let X be a non-void set. Let e: X → X be defined by e(x) = x for all x ∈ X. We call e the identity function on X.

It is clear that the identity function is bijective.

1.2.26. Theorem. Let X and Y be non-void sets, and let f: X → Y. Let e_X, e_Y, and e₁ be the identity functions on X, Y, and ℛ(f), respectively. Then
(i) if f is injective, then f⁻¹ ∘ f = e_X and f ∘ f⁻¹ = e₁; and
(ii) f is bijective if and only if there is a g: Y → X such that g ∘ f = e_X and f ∘ g = e_Y.

Proof. Part (i) follows immediately from parts (iii) and (iv) of Theorem 1.2.10. The proof of part (ii) is left as an exercise. ■

1.2.27. Exercise. Prove part (ii) of Theorem 1.2.26.
Another special class of important functions are permutations.

1.2.28. Definition. A permutation on a set X is a (1-1) mapping of X onto X.

It is clear that the identity mapping on X is a permutation on X. For this reason it is sometimes called the identity permutation on X. It is also clear that the inverse of a permutation is also a permutation.

1.2.29. Exercise. Let X = {a, b, c}, and define f: X → X and g: X → X as

f = ( a b c        g = ( a b c
      c b a ),           b c a ).

Show that f, g, f⁻¹, and g⁻¹ are permutations on X.

1.2.30. Exercise. Let Z denote the set of integers, and let f: Z → Z be defined by f(n) = n + 3 for all n ∈ Z. Show that f and f⁻¹ are permutations on Z and that f⁻¹ ∘ f = f ∘ f⁻¹.

The reader can readily prove the following results.

1.2.31. Theorem. If f is a (1-1) mapping of a set A onto a set B and if g is a (1-1) mapping of the set B onto the set A, then g ∘ f is a permutation on A.

1.2.32. Corollary. If f and g are both permutations on a set A, then g ∘ f is a permutation on A.

1.2.33. Exercise. Prove Theorem 1.2.31 and Corollary 1.2.32.

1.2.34. Exercise. Show that if a set A consists of n elements, then there are exactly n! (n factorial) distinct permutations on A.
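The count asserted in Exercise 1.2.34 is easy to confirm numerically with the standard library (this is a check of the count, not a proof).

```python
from itertools import permutations
from math import factorial

# An n-element set has exactly n! distinct permutations.
A = {"a", "b", "c", "d"}
perms = list(permutations(A))   # each tuple is a reordering, i.e., a bijection of A onto A
assert len(perms) == factorial(len(A))   # 4! = 24
```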
Now let f be a mapping of a set X into a set Y. If X₁ is a subset of X, then for each element x′ ∈ X₁ there is a unique element f(x′) ∈ Y. Thus, f may be used to define a mapping f′ of X₁ into Y defined by

f′(x′) = f(x′) for all x′ ∈ X₁.   (1.2.35)

This motivates the following definition.

1.2.36. Definition. The mapping f′ of subset X₁ ⊂ X into Y of Eq. (1.2.35) is called the mapping of X₁ into Y induced by the mapping f: X → Y. In this case f′ is called the restriction of f to the set X₁.
We also have:

1.2.37. Definition. If f is a mapping of X₁ into Y and if X₁ ⊂ X, then any mapping f̄ of X into Y is said to be an extension of f if

f̄(x) = f(x)   (1.2.38)

for every x ∈ X₁.

Thus, if f̄ is an extension of f, then f is the mapping of the set X₁ ⊂ X into Y which is induced by the mapping f̄ of X into Y.

1.2.39. Example. Let X₁ = {u, v, x}, X = {u, v, x, y, z}, and Y = {n, p, q, r, s, t}. Clearly X₁ ⊂ X. Define f: X₁ → Y as

f = ( u v x
      n p q ).

Also, define f̄: X → Y and f̃: X → Y as

f̄ = ( u v x y z        f̃ = ( u v x y z
      n p q r s ),           n p q n t ).

Then f̄ and f̃ are two different extensions of f. Moreover, f is the mapping of X₁ into Y induced either by f̄ or f̃. In general, two distinct mappings may induce the same mapping on a subset. ■

Let us next consider the image and the inverse image of sets under mappings. Specifically, we have:

1.2.40. Definition. Let f be a function from a set X into a set Y. Let A ⊂ X, and let B ⊂ Y. We define the image of A under f, denoted by f(A), to be the set
f(A) = {y ∈ Y : y = f(x), x ∈ A}.

We define the inverse image of B under f, denoted by f⁻¹(B), to be the set

f⁻¹(B) = {x ∈ X : f(x) ∈ B}.

Note that f⁻¹(B) is always defined for any f: X → Y. That is, there is no implication here that f has an inverse. The notation is somewhat unfortunate in this respect. Note also that the range of f is f(X).

In the next result, some of the important properties of images and inverse images of functions are summarized.

1.2.41. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B, B₁, and B₂ be subsets of Y. Then
(i) if A₁ ⊂ A, then f(A₁) ⊂ f(A);
(ii) f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂);
(iii) f(A₁ ∩ A₂) ⊂ f(A₁) ∩ f(A₂);
(iv) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂);
(v) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂);
(vi) f⁻¹(B⁻) = [f⁻¹(B)]⁻;
(vii) f⁻¹[f(A)] ⊃ A; and
(viii) f[f⁻¹(B)] ⊂ B.

Proof. We prove parts (i) and (ii) to demonstrate the method of proof. The remaining parts are left as an exercise.

To prove part (i), let y ∈ f(A₁). Then there is an x ∈ A₁ such that y = f(x). But A₁ ⊂ A, and so x ∈ A. Hence, f(x) = y ∈ f(A). This proves that f(A₁) ⊂ f(A).

To prove part (ii), let y ∈ f(A₁ ∪ A₂). Then there is an x ∈ A₁ ∪ A₂ such that y = f(x). If x ∈ A₁, then f(x) = y ∈ f(A₁). If x ∈ A₂, then f(x) = y ∈ f(A₂). Since x is in A₁ or in A₂, f(x) must be in f(A₁) or f(A₂). Therefore, f(A₁ ∪ A₂) ⊂ f(A₁) ∪ f(A₂). To prove that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂), we note that A₁ ⊂ A₁ ∪ A₂. So by part (i), f(A₁) ⊂ f(A₁ ∪ A₂). Similarly, f(A₂) ⊂ f(A₁ ∪ A₂). From this it follows that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂). We conclude that f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂). ■
1.2.42. Exercise. Prove parts (iii) through (viii) of Theorem 1.2.41.
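The image and inverse-image operations of Definition 1.2.40 can be sketched for finite functions stored as dicts; the helper names `f_image` and `f_preimage` below are ours.

```python
# Image f(A) and inverse image f^{-1}(B) for a finite function given as a dict.
def f_image(f, A):
    return {f[x] for x in A}                 # f(A)

def f_preimage(f, B):
    return {x for x in f if f[x] in B}       # f^{-1}(B); f need not be invertible

f = {1: "a", 2: "a", 3: "b"}
assert f_image(f, {1, 2}) == {"a"}
assert f_preimage(f, {"a"}) == {1, 2}        # well defined even though f is not injective
assert f_preimage(f, {"z"}) == set()         # inverse image of a set missing the range
```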
We note that, in general, equality is not attained in parts (iii), (vii), and (viii) of Theorem 1.2.41. However, by considering special types of mappings we can obtain the following results for these cases.
1.2.43. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B be a subset of Y. Then
(i) f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all pairs of subsets A₁, A₂ of X if and only if f is injective;
(ii) f⁻¹[f(A)] = A for all A ⊂ X if and only if f is injective; and
(iii) f[f⁻¹(B)] = B for all B ⊂ Y if and only if f is surjective.
Proof. We will prove only part (i) and leave the proofs of parts (ii) and (iii) as an exercise.
To prove sufficiency, let f be injective and let A₁ and A₂ be subsets of X. In view of part (iii) of Theorem 1.2.41, we need only show that f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂). In doing so, let y ∈ f(A₁) ∩ f(A₂). Then y ∈ f(A₁) and y ∈ f(A₂). This means there is an x₁ ∈ A₁ and an x₂ ∈ A₂ such that y = f(x₁) = f(x₂). Since f is injective, x₁ = x₂. Hence, x₁ ∈ A₁ ∩ A₂. This implies that y ∈ f(A₁ ∩ A₂); i.e., f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂).
To prove necessity, assume that f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all subsets A₁ and A₂ of X. For purposes of contradiction, suppose there are x₁, x₂ ∈ X such that x₁ ≠ x₂ and f(x₁) = f(x₂). Let A₁ = {x₁} and A₂ = {x₂}; i.e., A₁ and A₂ are singletons of x₁ and x₂, respectively. Then A₁ ∩ A₂ = ∅, and so f(A₁ ∩ A₂) = ∅. However, f(A₁) = {y} and f(A₂) = {y}, and thus f(A₁) ∩ f(A₂) = {y} ≠ ∅. This contradicts the fact that f(A₁) ∩ f(A₂) = f(A₁ ∩ A₂) for all subsets A₁ and A₂ of X. Thus, f is injective. ■
1.2.44. Exercise. Prove parts (ii) and (iii) of Theorem 1.2.43. ■
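For finite sets, the image and inverse-image properties above can be checked by direct computation. The following Python sketch (the particular functions and sets are chosen arbitrarily for illustration) exhibits the strict inclusion in part (iii) of Theorem 1.2.41 for a non-injective map, and the equalities of Theorem 1.2.43 for an injective one.

```python
def image(f, A):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """f^{-1}(B) = {x in X : f(x) in B}, where X is the domain of f."""
    return {x for x in X if f(x) in B}

X = {-2, -1, 0, 1, 2}
f = lambda x: x * x          # not injective on X
A1, A2 = {-2, -1}, {1, 2}

# Part (iii) of Theorem 1.2.41: only inclusion holds in general.
assert image(f, A1 & A2) <= image(f, A1) & image(f, A2)
# Strict here: the left side is empty, the right side is {1, 4}.
assert image(f, A1 & A2) != image(f, A1) & image(f, A2)

g = lambda x: x + 10         # injective on X
# Theorem 1.2.43, parts (i) and (ii): equality for injective maps.
assert image(g, A1 & A2) == image(g, A1) & image(g, A2)
A = {-1, 0}
assert preimage(g, X, image(g, A)) == A
```

The helper names `image` and `preimage` are ad hoc; the book's f(A) and f⁻¹(B) notation has no counterpart in Python's standard library.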
Some of the preceding results can be extended to families of sets. For example, we have:
1.2.45. Theorem. Let f be a function from X into Y, let {A_α : α ∈ I} be an indexed family of sets in X, and let {B_α : α ∈ K} be an indexed family of sets in Y. Then
(i) f(∪_{α∈I} A_α) = ∪_{α∈I} f(A_α);
(ii) f(∩_{α∈I} A_α) ⊂ ∩_{α∈I} f(A_α);
(iii) f⁻¹(∪_{α∈K} B_α) = ∪_{α∈K} f⁻¹(B_α);
(iv) f⁻¹(∩_{α∈K} B_α) = ∩_{α∈K} f⁻¹(B_α); and
(v) if B ⊂ Y, f⁻¹(Bᶜ) = [f⁻¹(B)]ᶜ.

Proof. We prove parts (i) and (iii) and leave the proofs of the remaining parts as an exercise.
To prove part (i), let y ∈ f(∪_{α∈I} A_α). This means that there is an x ∈ ∪_{α∈I} A_α such that y = f(x). Thus, for some α ∈ I, x ∈ A_α. This implies that f(x) ∈ f(A_α), and so y ∈ f(A_α). Hence, y ∈ ∪_{α∈I} f(A_α). This shows that f(∪_{α∈I} A_α) ⊂ ∪_{α∈I} f(A_α).
To prove the converse, let y ∈ ∪_{α∈I} f(A_α). Then y ∈ f(A_α) for some α ∈ I. This means there is an x ∈ A_α such that f(x) = y. Now x ∈ ∪_{α∈I} A_α, and so f(x) = y ∈ f(∪_{α∈I} A_α). Therefore, ∪_{α∈I} f(A_α) ⊂ f(∪_{α∈I} A_α). This completes the proof of part (i).
To prove part (iii), let x ∈ f⁻¹(∪_{α∈K} B_α). This means that f(x) ∈ ∪_{α∈K} B_α. Thus, f(x) ∈ B_α for some α ∈ K. Hence, x ∈ f⁻¹(B_α), and so x ∈ ∪_{α∈K} f⁻¹(B_α). Therefore, f⁻¹(∪_{α∈K} B_α) ⊂ ∪_{α∈K} f⁻¹(B_α).
Conversely, let x ∈ ∪_{α∈K} f⁻¹(B_α). Then x ∈ f⁻¹(B_α) for some α ∈ K. Thus, f(x) ∈ B_α, and so f(x) ∈ ∪_{α∈K} B_α. Hence, x ∈ f⁻¹(∪_{α∈K} B_α). This means that ∪_{α∈K} f⁻¹(B_α) ⊂ f⁻¹(∪_{α∈K} B_α), which completes the proof of part (iii). ■

1.2.46. Exercise. Prove parts (ii), (iv), and (v) of Theorem 1.2.45.
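The indexed-family identities of Theorem 1.2.45 can likewise be checked numerically for a finite index set. In the sketch below (the function and the families are arbitrary illustrations), the inverse image commutes with both union and intersection, while the image is guaranteed to respect only the union.

```python
def image(f, A):
    return {f(x) for x in A}

def preimage(f, X, B):
    return {x for x in X if f(x) in B}

X = set(range(-3, 4))
f = lambda x: abs(x)

# Indexed families {A_alpha} in X and {B_alpha} in Y, alpha in a finite index set.
A = {1: {-3, -2}, 2: {-2, -1}, 3: {0, 1}}
B = {1: {0, 1}, 2: {1, 2}}

union_A = set().union(*A.values())
# Part (i): f(union of A_alpha) = union of f(A_alpha).
assert image(f, union_A) == set().union(*(image(f, A[a]) for a in A))

inter_A = set.intersection(*A.values())
# Part (ii): f(intersection of A_alpha) is contained in the intersection of f(A_alpha).
assert image(f, inter_A) <= set.intersection(*(image(f, A[a]) for a in A))

# Parts (iii) and (iv): the preimage commutes with union and intersection.
assert preimage(f, X, B[1] | B[2]) == preimage(f, X, B[1]) | preimage(f, X, B[2])
assert preimage(f, X, B[1] & B[2]) == preimage(f, X, B[1]) & preimage(f, X, B[2])
```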
Having introduced the concept of mapping, we are in a position to consider an important classification of infinite sets. We first consider the following definition.
1.2.47. Definition. Let A and B be any two sets. The set A is said to be equivalent to set B if there exists a bijective mapping of A onto B. Clearly, if A is equivalent to B, then B is equivalent to A.
1.2.48. Definition. Let J be the set of positive integers, and let A be any set. Then A is said to be countably infinite if A is equivalent to J. A set is said to be countable or denumerable if it is either finite or countably infinite. If a set is not countable, it is said to be uncountable. We have:
Chapter 1 / Fundamental Concepts
1.2.49. Theorem. Let J be the set of positive integers, and let I ⊂ J. If I is infinite, then I is equivalent to J.

Proof. We shall construct a bijective mapping, f, from J onto I. Let {Jₙ : n ∈ J} be the family of sets given by Jₙ = {1, 2, …, n} for n = 1, 2, …. Clearly, each Jₙ is finite and of order n. Therefore, Jₙ ∩ I is finite. Since I is infinite, I − Jₙ ≠ ∅ for all n. Let us now define f : J → I as follows. Let f(1) be the smallest integer in I. We now proceed inductively. Assume f(n) ∈ I has been defined, and let f(n + 1) be the smallest integer in I which is greater than f(n). Now f(n + 1) > f(n), and so f(n₁) > f(n₂) for any n₁ > n₂. This implies that f is injective. Next, we want to show that f is surjective. We do so by contradiction. Suppose that f(J) ≠ I. Since f(J) ⊂ I, this implies that I − f(J) ≠ ∅. Let q be the smallest integer in I − f(J). Then q ≠ f(1) because f(1) ∈ f(J), and so q > f(1). This implies that I ∩ J_{q−1} ≠ ∅. Since I ∩ J_{q−1} is non-void and finite, we may find the largest integer in this set, say r. It follows that r ≤ q − 1 < q. Now r is the largest integer in I which is less than q. But r < q implies that r ∈ f(J). This means there is an s ∈ J such that r = f(s). By definition of f, f(s + 1) = q. Hence, q ∈ f(J), and we have arrived at a contradiction. Thus, f is surjective. This completes the proof. ■
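The proof of Theorem 1.2.49 is constructive: f(n + 1) is the smallest element of I exceeding f(n). When I is given by a membership test, that construction can be sketched in Python as follows (the particular predicate used below, the set of perfect squares, is just an arbitrary infinite subset of J chosen for illustration):

```python
def enumerate_subset(member, count):
    """First `count` values of the bijection f : J -> I built in Theorem 1.2.49,
    where I = {n >= 1 : member(n)} is assumed to be infinite."""
    values, n = [], 0
    while len(values) < count:
        n += 1
        # f(k+1) is the smallest element of I greater than f(k).
        if member(n):
            values.append(n)
    return values

# I = the set of perfect squares, an infinite subset of the positive integers J.
assert enumerate_subset(lambda n: int(n ** 0.5) ** 2 == n, 5) == [1, 4, 9, 16, 25]
```

Because f lists I in strictly increasing order, the injectivity argument of the proof is visible directly in the output.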
We now have the following corollary.

1.2.50. Corollary. Let A ⊂ B ⊂ X. If B is a countable set, then A is countable.
Proof. If A is finite, then there is nothing to prove. So let us assume that A is infinite. This means that B is countably infinite, and so there exists a bijective mapping f : B → J. Let g be the restriction of f to A. Then for all x₁, x₂ ∈ A such that x₁ ≠ x₂, g(x₁) = f(x₁) ≠ f(x₂) = g(x₂). Thus, g is an injective mapping of A into J. By part (i) of Theorem 1.2.10, g is a bijective mapping of A onto g(A). This means A is equivalent to g(A), and thus g(A) is an infinite set. Since g(A) ⊂ J, g(A) is equivalent to J. Hence, there is a bijective mapping of g(A) onto J, which we call h. By Theorem 1.2.18, the composite mapping h ∘ g is a bijective mapping of A onto J. This means that J is equivalent to A. Therefore, A is countable. ■
We conclude the present section by considering the cardinality of sets. Specifically, if a set is finite, we say the cardinal number of the set is equal to the number of elements of the set. If two sets are countably infinite, then we say they have the same cardinal number, which we can define to be the cardinal number of the positive integers. More generally, two arbitrary sets are said to have the same cardinal number if we can establish a bijective mapping between the two sets (i.e., the sets are equivalent).
1.3. RELATIONS AND EQUIVALENCE RELATIONS
Throughout the present section, X denotes a non-void set.
We begin by introducing the notion of relation, which is a generalization of the concept of function.

1.3.1. Definition. Let X and Y be non-void sets. Any subset of X × Y is called a relation from X to Y. Any subset of X × X is called a relation in X.

1.3.2. Example. Let A = {u, v, x, y} and B = {a, b, c, d}. Let ρ = {(u, a), (v, b), (u, c), (x, a)}. Then ρ is a relation from A into B. It is clearly not a function from A into B (why?). ■

1.3.3. Example. Let X = Y = R, the set of real numbers. The set {(x, y) ∈ R × R : x ≤ y} is a relation in R. Also, the set {(x, y) ∈ R × R : x = sin y} is a relation in R. This shows that so-called multivalued functions are actually relations rather than mappings. ■
As in the case of mappings, it makes sense to speak of the domain and the range of a relation. We have:

1.3.4. Definition. Let ρ be a relation from X to Y. The subset of X,
{x ∈ X : (x, y) ∈ ρ, y ∈ Y},
is called the domain of ρ. The subset of Y,
{y ∈ Y : (x, y) ∈ ρ, x ∈ X},
is called the range of ρ.

Now let ρ be a relation from X to Y. Then, clearly, the set ρ⁻¹ ⊂ Y × X defined by
ρ⁻¹ = {(y, x) ∈ Y × X : (x, y) ∈ ρ ⊂ X × Y}
is a relation from Y to X. The relation ρ⁻¹ is called the inverse relation of ρ. Note that whereas the inverse of a function does not always exist, the inverse of a relation does always exist.

Next, we consider equivalence relations. Let ρ denote a relation in X; i.e., ρ ⊂ X × X. Then for any x, y ∈ X, either (x, y) ∈ ρ or (x, y) ∉ ρ, but not both. If (x, y) ∈ ρ, then we write x ρ y, and if (x, y) ∉ ρ, we write x ρ̸ y.
1.3.5. Definition. Let ρ be a relation in X.
(i) If x ρ x for all x ∈ X, then ρ is said to be reflexive;
(ii) if x ρ y implies y ρ x for all x, y ∈ X, then ρ is said to be symmetric; and
(iii) if for all x, y, z ∈ X, x ρ y and y ρ z implies x ρ z, then ρ is said to be transitive.

1.3.6. Example. Let R denote the set of real numbers. The relation in R given by {(x, y) : x < y} is transitive but not reflexive and not symmetric. The relation in R given by {(x, y) : x ≠ y} is symmetric but not reflexive and not transitive. ■
1.3.7. Example. Let ρ be the relation in 𝒫(X) defined by ρ = {(A, B) : A ⊂ B}. That is, A ρ B if and only if A ⊂ B. Then ρ is reflexive and transitive but not symmetric. ■

In the following, we use the symbol ~ to denote a relation in X. If (x, y) ∈ ~, then we write, as before, x ~ y.
1.3.8. Definition. Let ~ be a relation in X. Then ~ is said to be an equivalence relation in X if ~ is reflexive, symmetric, and transitive. If ~ is an equivalence relation and if x ~ y, we say that x is equivalent to y. In particular, the equivalence relation in X characterized by the statement "x ~ y if and only if x = y" is called the equals relation in X or the identity relation in X.

1.3.9. Example. Let X be a finite set, and let A, B, C ∈ 𝒫(X). Let ~ on 𝒫(X) be defined by saying that A ~ B if and only if A and B have the same number of elements. Clearly A ~ A. Also, if A ~ B then B ~ A. Furthermore, if A ~ B and B ~ C, then A ~ C. Hence, ~ is reflexive, symmetric, and transitive. Therefore, ~ is an equivalence relation in 𝒫(X). ■
1.3.10. Example. Let R² = R × R, the real plane. Let X be the family of all triangles in R². Then each of the following statements can be used to define an equivalence relation in X: "is similar to," "is congruent to," "has the same area as," and "has the same perimeter as." ■
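Whether a finite relation ρ ⊂ X × X is reflexive, symmetric, and transitive can be checked mechanically. The sketch below (helper names are ad hoc) tests the "same number of elements" relation of Example 1.3.9 on 𝒫(X) for a three-element X, and the relation "<" of Example 1.3.6 restricted to a finite set.

```python
from itertools import chain, combinations

def is_equivalence(rho, X):
    """rho is a set of ordered pairs over X; check Definition 1.3.8."""
    reflexive = all((x, x) in rho for x in X)
    symmetric = all((y, x) in rho for (x, y) in rho)
    transitive = all((x, w) in rho
                     for (x, y) in rho for (z, w) in rho if y == z)
    return reflexive and symmetric and transitive

base = {1, 2, 3}
power_set = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(base), r) for r in range(len(base) + 1))]

# A ~ B iff A and B have the same number of elements (Example 1.3.9).
same_size = {(A, B) for A in power_set for B in power_set if len(A) == len(B)}
assert is_equivalence(same_size, power_set)

# "<" on a finite chain is transitive but neither reflexive nor symmetric.
less_than = {(a, b) for a in base for b in base if a < b}
assert not is_equivalence(less_than, base)
```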
1.4. OPERATIONS ON SETS

In the present section we introduce the concept of an operation on a set, and we consider some of the properties of operations. Throughout this section, X denotes a non-void set.

1.4.1. Definition. A binary operation on X is a mapping of X × X into X. A ternary operation on X is a mapping of X × X × X into X.

We could proceed in an obvious manner and define an n-ary operation on X. Since our primary concern in this book will be with binary operations, we will henceforth simply say "an operation on X" when we actually mean a binary operation on X. If α : X × X → X is an operation, then we usually use the notation α(x, y) ≜ x α y.
1.4.2. Example. Let R denote the real numbers. Let f : R × R → R be given by f(x, y) = x + y for all x, y ∈ R, where x + y denotes the customary sum of x plus y (i.e., + denotes the usual operation of addition of real numbers). Then f is clearly an operation on R, in the sense of Definition 1.4.1. We could just as well have defined "+" as being the operation on R, i.e., + : R × R → R, where +(x, y) ≜ x + y. Similarly, the ordinary rules of subtraction and multiplication on R, "−" and "·", respectively, are also operations on R. Notice that division, "÷", is not an operation on R, because x ÷ y is not defined for all y ∈ R (i.e., x ÷ y is not defined for y = 0). However, if we let R# = R − {0}, then "÷" is an operation on R#. ■
1.4.3. Exercise. Show that if A is a set consisting of n distinct elements, then there exist exactly n^(n²) distinct operations on A.

1.4.4. Example. Let A = {a, b}. An example of an operation on A is the mapping α : A × A → A defined by
α(a, a) ≜ a α a = a,    α(a, b) ≜ a α b = b,
α(b, a) ≜ b α a = b,    α(b, b) ≜ b α b = a.
It is convenient to utilize the following operation table to define α:

    α | a  b
   ---+------
    a | a  b                                        (1.4.5)
    b | b  a
If, in general, α is an operation on an arbitrary finite set A, or sometimes even on a countably infinite set A, then we can construct an operation table as follows:

    α | ⋯  y  ⋯
   ---+---------
    ⋮ |
    x |   x α y
    ⋮ |

If A = {a, b}, as at the beginning of this example, then in addition to α given in (1.4.5) we can define, for example, the operations β, γ, and δ on A as

    β | a  b        γ | a  b        δ | a  b
   ---+------      ---+------      ---+------
    a | b  a        a | a  a        a | a  b
    b | b  a        b | b  b        b | a  b        ■
We now consider operations with important special properties.

1.4.6. Definition. An operation α on X is said to be commutative if x α y = y α x for all x, y ∈ X.
1.4.7. Definition. An operation α on X is said to be associative if (x α y) α z = x α (y α z) for all x, y, z ∈ X.

In the case of the real numbers R, the operations of addition and multiplication are both associative and commutative. The operation of subtraction is neither associative nor commutative.

1.4.8. Definition. If α and β are operations on X (not necessarily distinct), then
(i) α is said to be left distributive over β if
x α (y β z) = (x α y) β (x α z)
for every x, y, z ∈ X;
(ii) α is said to be right distributive over β if
(x β y) α z = (x α z) β (y α z)
for every x, y, z ∈ X; and
(iii) α is said to be distributive over β if α is both left and right distributive over β.
In Example 1.4.4, α is the only commutative operation. The operation β of Example 1.4.4 is not associative. The operations α, γ, and δ of this example are associative. In this example, γ is distributive over δ and δ is distributive over γ. In the case of the real numbers R, multiplication, "·", is distributive over addition, "+". The converse is not true.
1.4.9. Definition. If α is an operation on X, and if X₁ is a subset of X, then X₁ is said to be closed relative to α if for every x, y ∈ X₁, x α y ∈ X₁.

Clearly, every set is closed with respect to an operation on it. The set of all integers Z, which is a subset of the real numbers R, is closed with respect to the operations of addition and multiplication defined on R. The even integers are also closed with respect to both of these operations, whereas the odd integers are not a closed set relative to addition.
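Commutativity, associativity, and closure are all finite checks for a table-defined operation, obtained by exhausting all pairs or triples. The sketch below encodes the table (1.4.5) as a Python dictionary and also verifies the closure remarks about the even and odd integers on a finite range (helper names are ad hoc):

```python
from itertools import product

A = ['a', 'b']
alpha = {('a', 'a'): 'a', ('a', 'b'): 'b',
         ('b', 'a'): 'b', ('b', 'b'): 'a'}   # table (1.4.5)

is_commutative = all(alpha[x, y] == alpha[y, x] for x, y in product(A, A))
is_associative = all(alpha[alpha[x, y], z] == alpha[x, alpha[y, z]]
                     for x, y, z in product(A, A, A))
assert is_commutative and is_associative

# Closure (Definition 1.4.9): the even integers are closed under + and *,
# while the odd integers are not closed under +.
evens = range(-10, 11, 2)
assert all((x + y) % 2 == 0 and (x * y) % 2 == 0 for x in evens for y in evens)
odds = range(-9, 10, 2)
assert not all((x + y) % 2 == 1 for x in odds for y in odds)
```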
1.4.10. Definition. If a subset X₁ of X is closed relative to an operation α on X, then the operation α′ on X₁ defined by
α′(x, y) = x α′ y = x α y
for all x, y ∈ X₁ is called the operation on X₁ induced by α.

If X₁ = X, then α′ = α. If X₁ ⊂ X but X₁ ≠ X, then α′ ≠ α, since α′ and α are operations on different sets, namely X₁ and X, respectively. In general, an induced operation α′ differs from its predecessor α; however, it does inherit the essential properties which α possesses, as shown in the following result.
1.4.11. Theorem. Let α be an operation on X, let X₁ ⊂ X, where X₁ is closed relative to α, and let α′ be the operation on X₁ induced by α. Then
(i) if α is commutative, then α′ is commutative;
(ii) if α is associative, then α′ is associative; and
(iii) if β is an operation on X and X₁ is closed relative to β, and if α is left (right) distributive over β, then α′ is left (right) distributive over β′, where β′ is the operation on X₁ induced by β.
1.4.12. Exercise. Prove Theorem 1.4.11.
The operation α′ on a subset X₁ induced by an operation α on X will frequently be denoted by α, and we will refer to α as an operation on X₁. In such cases one must keep in mind that we are actually referring to the induced operation α′ and not to α.

1.4.13. Definition. Let X₁ be a subset of X. An operation ᾱ on X is called an extension of an operation α on X₁ if X₁ is closed relative to ᾱ and if α is equal to the operation on X₁ induced by ᾱ.
A given operation α on a subset X₁ of a set X may, in general, have many different extensions.

1.4.14. Example. Let X₁ = {a, b, c}, and let X = {a, b, c, d, e}. Define α on X₁ by a 3 × 3 operation table, and define ᾱ and β̄ on X by 5 × 5 operation tables whose entries for arguments in X₁ coincide with those of α. (The three tables are garbled in the source and are not reproduced here.)
Clearly, α is an operation on X₁, and ᾱ and β̄ are operations on X. Moreover, both ᾱ and β̄ (ᾱ ≠ β̄) are extensions of α. Also, α may be viewed as being induced by ᾱ and by β̄. ■
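A concrete instance of Definitions 1.4.10 and 1.4.13 (with sets chosen here for illustration, not taken from the example above): under addition mod 6 on X = {0, …, 5}, the subset X₁ = {0, 2, 4} is closed, so mod-6 addition induces an operation on X₁; equivalently, mod-6 addition is an extension of that induced operation. The inheritance claims of Theorem 1.4.11 can then be confirmed exhaustively:

```python
from itertools import product

X = range(6)
add6 = lambda x, y: (x + y) % 6        # an operation alpha on X

X1 = {0, 2, 4}
# X1 is closed relative to alpha, so alpha induces an operation alpha' on X1.
assert all(add6(x, y) in X1 for x, y in product(X1, X1))

# The induced operation inherits commutativity and associativity (Theorem 1.4.11).
assert all(add6(x, y) == add6(y, x) for x, y in product(X1, X1))
assert all(add6(add6(x, y), z) == add6(x, add6(y, z))
           for x, y, z in product(X1, X1, X1))
```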
1.5. MATHEMATICAL SYSTEMS CONSIDERED IN THIS BOOK
We will concern ourselves with several different types of mathematical systems in the subsequent chapters. Although it is possible to give an abstract definition of the term mathematical system, we will not do so. Instead, we will briefly indicate which types of mathematical systems we shall consider in this book.

1. In Chapter 2 we will begin by considering mathematical systems which are made up of an underlying set X and an operation α defined on X. We will identify such systems by writing {X; α}. We will be able to characterize a system {X; α} according to certain properties which X and α possess. Two important cases of such systems that we will consider are semigroups and groups. In Chapter 2 we will also consider mathematical systems consisting of a basic set X and two operations, say α and β, defined on X, where a special relation exists between α and β. We will identify such systems by writing {X; α, β}. Included among the mathematical systems of this kind which we will consider are rings and fields. In Chapter 2 we will also consider composite mathematical systems. Such systems are endowed with two underlying sets, say X and F, and possess a much more complex (algebraic) structure than semigroups, groups, rings, and fields. Composite systems which we will consider include modules, vector spaces over a field F (which are also called linear spaces), and algebras. In Chapter 2 we will also study various types of important mappings (e.g., homomorphisms and isomorphisms) defined on semigroups, groups, rings, etc. Mathematical systems of the type considered in Chapter 2 are sometimes called algebraic systems.

2. In Chapters 3 and 4 we will study in some detail vector spaces and special types of mappings on vector spaces, called linear transformations. An important class of linear transformations can be represented by matrices, which we will consider in Chapter 4. In this chapter we will also study in some detail important vector spaces, called Euclidean spaces.

3. Most of Chapter 5 is devoted to mathematical systems consisting of a basic set X and a function ρ : X × X → R (R denotes the real numbers), where ρ possesses certain properties (namely, the properties of distance
between points or elements in X). The function ρ is called a metric (or a distance function), and the pair {X; ρ} is called a metric space. In Chapter 5 we will also consider mathematical systems consisting of a basic set X and a family of subsets of X (called open sets) denoted by 𝒯. The pair {X; 𝒯} is called a topological space. It turns out that all metric spaces are in a certain sense topological spaces. We will also study functions and their properties on metric (topological) spaces in Chapter 5.

4. In Chapters 6 and 7 we will consider normed linear spaces, inner product spaces, and an important class of functions (linear operators) defined on such spaces. A normed linear space is a mathematical system consisting of a vector space X and a real-valued function, denoted by ‖·‖, which takes elements of X into R and which possesses the properties which characterize the "length" of a vector. We will denote normed spaces by {X; ‖·‖}. An inner product space consists of a vector space X (over the field of real numbers R or over the field of complex numbers C) and a function (·,·), which takes elements from X × X into R (or into C) and possesses certain properties which allow us to introduce, among other items, the concept of orthogonality. We will identify such mathematical systems by writing {X; (·,·)}.

It turns out that in a certain sense all inner product spaces are normed linear spaces, that all normed linear spaces are metric spaces, and, as indicated before, that all metric spaces are topological spaces. Since normed linear spaces and inner product spaces are also vector spaces, it should be clear that, in the case of such spaces, properties of algebraic systems (called algebraic structure) and properties of topological systems (called topological structure) are combined. A class of normed linear spaces which are very important are Banach spaces, and among the more important inner product spaces are Hilbert spaces. Such spaces will be considered in some detail in Chapter 6. Also, in Chapter 7, linear transformations defined on Banach and Hilbert spaces will be considered.

5. Applications are considered at the ends of Chapters 4, 5, and 7.
1.6. REFERENCES AND NOTES

A classic reference on set theory is the book by Hausdorff [1.5]. The many excellent references on the present topics include the elegant text by Hanneken [1.4], the standard reference by Halmos [1.3], as well as the books by Gleason [1.1] and Goldstein and Rosenbaum [1.2].
REFERENCES

[1.1] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[1.2] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[1.3] P. R. HALMOS, Naive Set Theory. Princeton, N.J.: D. Van Nostrand Company, Inc., 1960.
[1.4] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[1.5] F. HAUSDORFF, Mengenlehre. New York: Dover Publications, Inc., 1944.
2

ALGEBRAIC STRUCTURES

The subject matter of the previous chapter is concerned with set theoretic structure. We emphasized essential elements of set theory and introduced related concepts such as mappings, operations, and relations. In the present chapter we concern ourselves with algebraic structure. The material of this chapter usually falls under the heading of abstract algebra or modern algebra. In the next two chapters we will continue our investigation of algebraic structure. The topics of those chapters usually go under the heading of linear algebra.
This chapter is divided into three parts. The first section is concerned with some basic algebraic structures, including semigroups, groups, rings, fields, modules, vector spaces, and algebras. In the second section we study properties of special important mappings on the above structures, including homomorphisms, isomorphisms, endomorphisms, and automorphisms of semigroups, groups, and rings. Because of their importance in many areas of mathematics, as well as in applications, polynomials are considered in the third section. Some appropriate references for further reading are suggested at the end of the chapter.
The subject matter of the present chapter is widely used in pure as well as in applied mathematics, and it has found applications in diverse areas, such as modern physics, automata theory, systems engineering, information theory, graph theory, and the like.
Our presentation of modern algebra is by necessity very brief. However, mastery of the topics covered in the present chapter will provide the reader with the foundation required to make contact with the literature in applications, and it will enable the interested reader to pursue this subject further at a more advanced level.
2.1. SOME BASIC STRUCTURES OF ALGEBRA
We begin by developing some of the more important properties of mathematical systems {X; α}, where α is an operation on a non-void set X.

2.1.1. Definition. Let α be an operation on X. If for all x, y, z ∈ X, x α y = x α z implies that y = z, then we say that {X; α} possesses the left cancellation property. If x α y = z α y implies that x = z, then {X; α} is said to possess the right cancellation property. If {X; α} possesses both the left and right cancellation properties, then we say that the cancellation laws hold in {X; α}.

In the following exercise, some specific cases are given.

2.1.2. Exercise. Let X = {x, y} and let α, β, γ, and δ be defined as

    α | x  y        β | x  y        γ | x  y        δ | x  y
   ---+------      ---+------      ---+------      ---+------
    x | x  y        x | x  x        x | x  y        x | x  x
    y | y  x        y | y  x        y | x  y        y | y  y
Show that
(i) {X; β} possesses neither the right nor the left cancellation property;
(ii) {X; γ} possesses the left cancellation property but not the right cancellation property;
(iii) {X; δ} possesses the right cancellation property but not the left cancellation property; and
(iv) {X; α} possesses both the left and the right cancellation property.

In an arbitrary mathematical system {X; α} there are sometimes special elements in X which possess important properties relative to the operation α. We have:

2.1.3. Definition. Let α be an operation on a set X and let X contain an element e_r such that
x α e_r = x
for all x ∈ X. We call e_r a right identity element of X relative to α, or simply a right identity of the system {X; α}. If X contains an element e_l which satisfies the condition
e_l α x = x
for all x ∈ X, then e_l is called a left identity element of X relative to α, or simply a left identity of the system {X; α}.
2.1.4. Definition. An element e of a set X of X relative to an operation « on X if for every x
is called an identity element
e« x = x « e = x E
.X
2.1.5. Exercise.
X
Let
=
I-± h-oI I
+}
Does either ;X{
or ;X{
to, I}
and define the operations"" +
and"· " by
• .0 I
0
I
o
0 0
I
0
I
0
I
.} have an identity element?
Identity elements have the following properties.
2.1.6. Theorem. L e t«
be an operation on .X
has an identity element e, then e is unique. } « has a right identity e, and a left identity ee. then e, = et . (iii) If« is a commutative operation and if ;X{ } « has a right identity element e" then e, is also a left identity. (i) If { X ; (ii) If { X ;
}«
Proof To prove the first part, let e' and en be identity elements of { X ; .} « Then e' « en = e' and e' « en = en. Hence, e' = en. To prove the second part, note that since e, is a right identity, et« e, = et. Also, since et is a left identity, et « e , = e,. Thus, et = e,. To prove the last part, note that for all x E X we have x = x « e, =
e,« x.
•
In summary, if { X ; } « has an identity element, then that element is unique. F u rthermore, if { X ; } « has both a right identity and a left identity element, then these elements are equal, and in fact they are equal to the uniq u e identity element. Also, if { X ; } « has a right (or left) identity element and « is a commutative operation, then {X; } « has an identity element.
2.1.7. Definition. L e t« relative to «. If x
E
X,
be an operation on X and let e be an identity of X then x ' E X is called a right inverse of x relative to
Chapter 2 I Algebraic Structures
« provided that An element x "
x« E
x' =
e. of x relative to « if
X is called a left ia~erse
x"«
x =
e.
The following exercise shows that some elements may not possess any right or left inverses. Some other elements may possess several inverses of one kind and none of the other, and other elements may possess a number of inverses of both kinds. 2.1.8. Exercise.
Let X
=
,x{
«
y, u, v} and define ~
as
y u v x x y x y y x y y x x
u v
x
u v
y x
y
v
u
(i) Show that { X ; } « contains an identity element. (ii) Which elements possess neither left inverses nor right inverses? (iii) Which element has a left and a right inverse? A.
Semigroups and Groups
Of crucial importance are mathematical systems called semlgroups. Such mathematical systems serve as the natural setting for many important results in algebra and are used in several diverse areas of applications (e.g., qualitative analysis of dynamical systems, automata theory, etc.).
be an operation on .X 2.1.9. Deftnition. L e t« if « is an associative operation on .X
We call { X ;
}«
a semlgroup
Now let x, y, z ∈ X, and let α be an associative operation on X. Then x α (y α z) = (x α y) α z = u ∈ X. Henceforth, we will often simply write u = x α y α z. As a result of this convention we see that for x, y, u, v ∈ X,
x α y α u α v = x α y α (u α v) = x α (y α u) α v = (x α y) α (u α v) = (x α y) α u α v.    (2.1.10)
As a generalization of the above we have the so-called generalized associative law, which asserts that if x₁, x₂, …, xₙ are elements of a semigroup {X; α}, then any two products, each involving these elements in a particular order, are equal. This allows us to simply write x₁ α x₂ α ⋯ α xₙ.
In view of Theorem 2.1.6, part (i), if a semigroup has an identity element, then such an element is unique. We give a special name to such a semigroup.

2.1.11. Definition. A semigroup {X; α} is called a monoid if X contains an identity element relative to α.

Henceforth, the unique identity element of a monoid {X; α} will be denoted by e. Subsequently, we frequently single out elements of monoids which possess inverses.

2.1.12. Definition. Let {X; α} be a monoid. If x ∈ X possesses a right inverse x′ ∈ X, then x is called a right invertible element in X. If x ∈ X possesses a left inverse x″ ∈ X, then x is called a left invertible element in X. If x ∈ X is both right invertible and left invertible in X, then we say that x is an invertible element or a unit of X.

Clearly, if e ∈ X, then e is an invertible element.
2.1.13. Theorem. Let {X; α} be a monoid, and let x ∈ X. If there exists a left inverse of x, say x′, and a right inverse of x, say x″, then x′ = x″ and x′ is unique.

Proof. Since α is associative, we have (x′ α x) α x″ = x″ and x′ α (x α x″) = x′. Thus, x′ = x″. Now suppose there is another left inverse of x, say x‴. Then x‴ = x″, and therefore x‴ = x′. ■
Theorem 2.1.13 does not, in general, hold for arbitrary mathematical systems {X; α} with identity, as is evident from the following:
Exercise.
Let
X
= u{ , v, x , y} and define (X
u v x
u v
v v u u u u v x
x y
u
v
x v
x
(X
as
y
y
y
x
Use this operation table to demonstrate that Theorem 2.1.13 does not, in general, hold if "monoid {X; α}" is replaced by "system {X; α} with identity."

By Theorem 2.1.13, any invertible element of a monoid possesses a unique right inverse and a unique left inverse, and moreover these inverses are equal. This gives rise to the following.
2.1.15. Definition. Let {X; α} be a monoid. If x ∈ X has a left inverse and a right inverse, x′ and x″, respectively, then this unique element x′ = x″ is called the inverse of x and is denoted by x⁻¹.

Concerning inverses we have:

2.1.16. Theorem. Let {X; α} be a monoid.
(i) If x ∈ X has an inverse, x⁻¹, then x⁻¹ has an inverse (x⁻¹)⁻¹ = x.
(ii) If x, y ∈ X have inverses x⁻¹, y⁻¹, respectively, then x α y has an inverse, and moreover (x α y)⁻¹ = y⁻¹ α x⁻¹.
(iii) The identity element e ∈ X has an inverse e⁻¹, and e⁻¹ = e.
Proof. To prove the first part, note that x α x⁻¹ = e and x⁻¹ α x = e. Thus, x is both a left and a right inverse of x⁻¹, and (x⁻¹)⁻¹ = x.
To prove the second part, note that
(x α y) α (y⁻¹ α x⁻¹) = x α (y α y⁻¹) α x⁻¹ = x α x⁻¹ = e
and
(y⁻¹ α x⁻¹) α (x α y) = y⁻¹ α (x⁻¹ α x) α y = y⁻¹ α y = e.
The third part of the theorem follows trivially from e α e = e. ■
In the remainder of the present chapter we will often use the symbols "+" and "·" to denote operations in place of α, β, etc. We will call these "addition" and "multiplication." However, we strongly emphasize here that "+" and "·" will, in general, not denote addition and multiplication of real numbers but, instead, arbitrary operations. In cases where there exists an identity element relative to "+", we will denote this element by "0" and call it "zero." If there exists an identity element relative to "·", we will denote this element either by "1" or by e. Our usual notation for representing an identity relative to an arbitrary operation α will still be e. If in a system {X; +} an element x ∈ X possesses an inverse, we will denote this element by −x and we will call it "minus x". For example, if {X; +} is a semigroup, then we denote the inverse of an invertible element x ∈ X by −x, and in this case we have x + (−x) = (−x) + x = 0, and also −(−x) = x. Furthermore, if x, y ∈ X are invertible elements, then the "sum" x + y is also invertible, and −(x + y) = (−y) + (−x). Note, however, that unless "+" is commutative, −(x + y) ≠ (−x) + (−y). Finally, if x, y ∈ X and if y is an invertible element, then −y ∈ X. In this case we often will simply write x + (−y) = x − y.
2.1.17. Example. Let X = {0, 1, 2, 3}, and let the systems {X; +} and {X; ·} be defined by means of the operation tables

+ | 0 1 2 3
0 | 0 1 2 3
1 | 1 2 3 0
2 | 2 3 0 1
3 | 3 0 1 2

· | 0 1 2 3
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 0 2
3 | 0 3 2 1

The reader should readily show that the systems {X; +} and {X; ·} are monoids. In this case the operation "+" is called "addition mod 4" and "·" is called "multiplication mod 4." ■
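The claims in Example 2.1.17 can be verified by brute force. The following sketch is our own illustration (function names are ours); it checks closure, associativity, and the identity for both mod-4 tables.

```python
# Verify that addition mod 4 and multiplication mod 4 each make
# X = {0, 1, 2, 3} a monoid.

X = range(4)
add = lambda a, b: (a + b) % 4
mul = lambda a, b: (a * b) % 4

def is_monoid(op, identity):
    closed = all(op(a, b) in X for a in X for b in X)
    assoc = all(op(op(a, b), c) == op(a, op(b, c))
                for a in X for b in X for c in X)
    ident = all(op(identity, a) == a == op(a, identity) for a in X)
    return closed and assoc and ident

assert is_monoid(add, 0)   # {X; +} is a monoid with identity 0
assert is_monoid(mul, 1)   # {X; ·} is a monoid with identity 1
```

Under "·" the element 2 has no inverse (2 · b is never 1 mod 4), which is one reason these systems are monoids rather than groups.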
The most important special type of semigroup that we will encounter in this chapter is the group.

2.1.18. Definition. A group is a monoid in which every element is invertible; i.e., a group {X; α} is a semigroup with identity in which every element is invertible.

The set R of real numbers with the operation of addition is an example of a group. The set of real numbers with the operation of multiplication does not form a group, since the number zero does not have an inverse relative to multiplication. However, the latter system is a monoid. If we let R# = R − {0}, then {R#; ·} is a group.

Groups possess several important properties. Some of these are summarized in the next result.

2.1.19. Theorem. Let {X; α} be a group, and let e denote the identity element of X relative to α. Let x and y be arbitrary elements in X. Then
(i) if x α x = x, then x = e;
(ii) if z ∈ X and x α y = x α z, then y = z;
(iii) if z ∈ X and x α y = z α y, then x = z;
(iv) there exists a unique w ∈ X such that

w α x = y;    (2.1.20)

and
(v) there exists a unique z ∈ X such that

x α z = y.    (2.1.21)

Proof. To prove the first part, let x α x = x. Then x⁻¹ α (x α x) = x⁻¹ α x, and so (x⁻¹ α x) α x = e. This implies that x = e.

To prove the second part, let x α y = x α z. Then x⁻¹ α (x α y) = x⁻¹ α (x α z), and so (x⁻¹ α x) α y = (x⁻¹ α x) α z. This implies that y = z.

The proof of part (iii) is similar to that of part (ii).

To prove part (iv), let w = y α x⁻¹. Then w α x = (y α x⁻¹) α x = y α (x⁻¹ α x) = y. To show that w is unique, suppose there is a v ∈ X such that v α x = y. Then w α x = v α x. By part (iii), w = v.

The proof of the last part of the theorem is similar to the proof of part (iv). ■
In part (iv) of Theorem 2.1.19 the element w is called the left solution of Eq. (2.1.20), and in part (v) of this theorem the element z is called the right solution of Eq. (2.1.21).

We can classify groups in a variety of ways. Some of these classifications are as follows. Let {X; α} be a group. If the set X possesses a finite number of elements, then we speak of a finite group. If the operation α is commutative, then we have a commutative group, also called an abelian group. If α is not commutative, then we speak of a non-commutative group or a non-abelian group. Also, by the order of a group we understand the order of the set X.

Now let {X; α} be a semigroup and let X₁ be a non-void subset of X which is closed relative to α. Then by Theorem 1.4.11, the operation α₁ on X₁ induced by the associative operation α is also associative, and thus the mathematical system {X₁; α₁} is also a semigroup. The system {X₁; α₁} is called a subsystem of {X; α}. This gives rise to the following concept.
2.1.22. Definition. Let {X; α} be a semigroup, let X₁ be a non-void subset of X which is closed relative to α, and let α₁ be the operation on X₁ induced by α. The semigroup {X₁; α₁} is called a subsemigroup of {X; α}.

In order to simplify our notation, we will henceforth use the notation {X₁; α} to denote the subsemigroup {X₁; α₁} (i.e., we will suppress the subscript of α). The following result allows us to generate subsemigroups in a variety of ways.
2.1.23. Theorem. Let {X; α} be a semigroup and let Xᵢ ⊂ X for all i ∈ I, where I denotes some index set. Let Y = ∩_{i∈I} Xᵢ. If {Xᵢ; α} is a subsemigroup of {X; α} for every i ∈ I, and if Y is not empty, then {Y; α} is a subsemigroup of {X; α}.

Proof. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I, and so x α y ∈ Xᵢ for every i, and hence x α y ∈ Y. This implies that {Y; α} is a subsemigroup. ■

Now let W be any non-void subset of X, where {X; α} is a semigroup, and let

𝒴 = {Y : W ⊂ Y ⊂ X and {Y; α} is a subsemigroup of {X; α}}.

Then 𝒴 is non-empty, since X ∈ 𝒴. Also, let

G = ∩_{Y∈𝒴} Y.

Then W ⊂ G, and by Theorem 2.1.23, {G; α} is a subsemigroup of {X; α}. This subsemigroup is called the subsemigroup generated by W.
2.1.24. Theorem. Let {X; α} be a monoid with e its identity element, and let {X₁; α₁} be a subsemigroup of {X; α}. If e ∈ X₁, then e is an identity element of {X₁; α₁} and {X₁; α₁} is a monoid.

2.1.25. Exercise. Prove Theorem 2.1.24.
Next we define subgroup.

2.1.26. Definition. Let {X; α} be a semigroup, and let {X₁; α₁} be a subsemigroup of {X; α}. If {X₁; α₁} is a group, then {X₁; α₁} is called a subgroup of {X; α}. We denote this subgroup by {X₁; α}, and we say the set X₁ determines a subgroup of {X; α}.

We consider a specific example in the following:

2.1.27. Exercise. Let Z₆ = {0, 1, 2, 3, 4, 5} and define the operation + on Z₆ by means of the following operation table:

+ | 0 1 2 3 4 5
0 | 0 1 2 3 4 5
1 | 1 0 4 5 2 3
2 | 2 5 0 4 3 1
3 | 3 4 5 0 1 2
4 | 4 3 1 2 5 0
5 | 5 2 3 1 0 4

(a) Show that {Z₆; +} is a group.
(b) Let K = {0, 1}. Show that {K; +} is a subgroup of {Z₆; +}.
(c) Are there any other subgroups of {Z₆; +}?

We have seen in Theorem 2.1.24 that if e ∈ X₁ ⊂ X, then it is also an identity of the subsemigroup {X₁; α}. We can state something further.

2.1.28. Theorem. Let {X; α} be a group with identity element e, and let {X₁; α} be a subgroup of {X; α}. Then e₁ is the identity element of {X₁; α} if and only if e₁ = e.
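The facts requested in Exercise 2.1.27 can be checked by machine. The sketch below is our own illustration (the text leaves these as an exercise): it verifies the group axioms for the table by brute force and then enumerates every subset of Z₆ that is closed and contains the identity, which, for a finite group, is exactly the set of subgroups.

```python
from itertools import chain, combinations

# The operation table of Exercise 2.1.27, row a, column b.
T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 5, 0, 4, 3, 1],
     [3, 4, 5, 0, 1, 2],
     [4, 3, 1, 2, 5, 0],
     [5, 2, 3, 1, 0, 4]]

Z6 = range(6)
op = lambda a, b: T[a][b]

# Group axioms: associativity, identity 0, and inverses.
assert all(op(op(a, b), c) == op(a, op(b, c))
           for a in Z6 for b in Z6 for c in Z6)
assert all(op(0, a) == a == op(a, 0) for a in Z6)
assert all(any(op(a, b) == 0 for b in Z6) for a in Z6)

def is_subgroup(S):
    # In a finite group, closure plus the identity already forces inverses.
    return 0 in S and all(op(a, b) in S for a in S for b in S)

subgroups = [set(S) for S in chain.from_iterable(
                 combinations(range(6), k) for k in range(1, 7))
             if is_subgroup(set(S))]

assert {0, 1} in subgroups     # part (b)
assert len(subgroups) == 6     # part (c): four proper nontrivial-or-trivial
                               # subgroups besides {0,...,5} itself
```

Note that this group is not abelian (for instance, 1 + 2 ≠ 2 + 1 in the table), so "+" here is emphatically not addition mod 6.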
2.1.29. Exercise. Prove Theorem 2.1.28.
It should be noted that a semigroup {X; α} which has no identity element may contain a subgroup {X₁; α}, since it is possible for a subsystem to possess an identity element while the original system may not possess an identity. If {X; α} is a semigroup with an identity element and if {X₁; α} is a subgroup, then the identity element of X may or may not be the identity element of X₁. However, if {X; α} is a group, then the subgroup must satisfy the conditions given in the following:
2.1.30. Theorem. Let {X; α} be a group, and let X₁ be a non-empty subset of X. Then {X₁; α} is a subgroup if and only if
(i) e ∈ X₁;
(ii) for every x ∈ X₁, x⁻¹ ∈ X₁; and
(iii) for every x, y ∈ X₁, x α y ∈ X₁.
Proof. Assume that {X₁; α} is a subgroup. Then (i) follows from Theorem 2.1.28, and (ii) and (iii) follow from the definition of a group.

Conversely, assume that hypotheses (i), (ii), and (iii) hold. Condition (iii) implies that X₁ is closed relative to α, and therefore {X₁; α} is a subsemigroup. Condition (i) along with Theorem 2.1.24 implies that {X₁; α} is a monoid, and condition (ii) implies that {X₁; α} is a group. ■

Analogous to Theorem 2.1.23 we have:

2.1.31. Theorem. Let {X; α} be a group, and let Xᵢ ⊂ X for all i ∈ I, where I is some index set. Let Y = ∩_{i∈I} Xᵢ. If {Xᵢ; α} is a subgroup of {X; α} for every i ∈ I, then {Y; α} is a subgroup of {X; α}.
Proof. Since e ∈ Xᵢ for every i ∈ I, it follows that e ∈ Y. Therefore, Y is non-empty. Now let y ∈ Y. Then y ∈ Xᵢ for all i ∈ I, and thus y⁻¹ ∈ Xᵢ, so that y⁻¹ ∈ Y. Since y ∈ X, it follows that Y ⊂ X. Also, for every x, y ∈ Y, x, y ∈ Xᵢ for every i ∈ I, and thus x α y ∈ Xᵢ for every i, and hence x α y ∈ Y. Therefore, we conclude from Theorem 2.1.30 that {Y; α} is a subgroup of {X; α}. ■

A direct consequence of the above result is the following:

2.1.32. Corollary. Let {X; α} be a group, and let {X₁; α} and {X₂; α} be subgroups of {X; α}. Let X₃ = X₁ ∩ X₂. Then {X₃; α} is a subgroup of {X₁; α} and {X₂; α}.

2.1.33. Exercise. Prove Corollary 2.1.32.
We can define a generated subgroup in a similar manner as was done in the case of semigroups. To this end let W be any subset of X, where {X; α} is a group, and let

𝒴 = {Y : W ⊂ Y ⊂ X and {Y; α} is a subgroup of {X; α}}.

The set 𝒴 is clearly non-empty because X ∈ 𝒴. Now let

G = ∩_{Y∈𝒴} Y.

Then W ⊂ G, and by Theorem 2.1.31, {G; α} is a subgroup of {X; α}. This subgroup is called the subgroup generated by W.

2.1.34. Exercise. Let W be defined as above. Show that if {W; α} is a subgroup of {X; α}, then it is the subgroup generated by W.

Let us now consider the following:

2.1.35. Example. Let Z denote the set of integers, and let "+" denote the usual operation of addition of integers. Let W = {1}. If Y is any subset of Z such that {Y; +} is a subgroup of {Z; +} and W ⊂ Y, then Y = Z. To prove this statement, let n be any positive integer. Since Y is closed with respect to +, we must have 1 + 1 = 2 ∈ Y. Similarly, we must have 1 + 1 + ⋯ + 1 = n ∈ Y. Also, n⁻¹ = −n, and therefore all the negative integers are in Y. Also, n − n = 0 ∈ Y; i.e., Y = Z. Thus, G = ∩_{Y∈𝒴} Y = Z, and so the group {Z; +} is the subgroup generated by {1}. ■
The above is an example of a special class of generated subgroups, the so-called cyclic groups, which we will define after our next result.

2.1.36. Theorem. Let Z denote the set of all integers, and let {X; α} be a group. Let x ∈ X and define xᵏ = x α x α ⋯ α x (k times), for k a positive integer. Let x⁻ᵏ = (xᵏ)⁻¹, and let x⁰ = e. Let Y = {xᵏ : k ∈ Z}. Then {Y; α} is the subgroup of {X; α} generated by {x}.

Proof. We first show that {Y; α} is a subgroup of {X; α}. Clearly, Y ⊂ X and e ∈ Y, and for every y ∈ Y we have y⁻¹ ∈ Y. Also, for every x, y ∈ Y we have x α y ∈ Y. Thus, by Theorem 2.1.30, {Y; α} is a subgroup of {X; α}.

Next, we must show that {Y; α} is the subgroup generated by {x}. To do so, it suffices to show that Y ⊂ Yⱼ for every Yⱼ such that x ∈ Yⱼ and such that {Yⱼ; α} is a subgroup of {X; α}. But this is certainly true, since y ∈ Y implies y = xᵏ for some k ∈ Z. Since x ∈ Yⱼ, it follows that xᵏ ∈ Yⱼ, and therefore y ∈ Yⱼ. ■

The preceding result motivates the following:
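Theorem 2.1.36 can be seen in miniature in a finite group, where positive powers alone already reach every element of the generated subgroup. The following sketch is our own illustration, taking the group Z₈ under addition mod 8, so that the k-th "power" of x is kx mod 8.

```python
# Subgroup of Z_n (addition mod n) generated by a single element x.
def generated(x, n):
    Y, y = {0}, x % n
    while y not in Y:          # keep forming x, 2x, 3x, ... until it cycles
        Y.add(y)
        y = (y + x) % n
    return Y

assert generated(1, 8) == set(range(8))   # 1 generates all of Z8: cyclic
assert generated(2, 8) == {0, 2, 4, 6}    # 2 generates a proper subgroup
assert generated(6, 8) == {0, 2, 4, 6}    # inverses add nothing new here
```

Because the group is finite, the negative powers x⁻ᵏ required by the theorem are automatically among the positive ones, which is why the loop above suffices.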
2.1.37. Definition. Let {X; α} be a group. If there exists an element x ∈ X such that the subgroup generated by {x} is equal to {X; α}, then {X; α} is called the cyclic group generated by x.

By Theorem 2.1.36, we see that a cyclic group has elements of the form X = {…, x⁻³, x⁻², x⁻¹, e, x, x², …}. Now suppose there is some positive integer n such that xⁿ = e. Then we see that xⁿ⁺¹ = x. Similarly, x⁻ⁿ = e, and x⁻ⁿ⁺¹ = x. Thus, X = {e, x, …, xⁿ⁻¹}, and X is a finite set of order n. If there is no n such that xⁿ = e, then X is an infinite set.

We consider next another important class of groups, the so-called permutation groups. To this end let X be a non-empty set and let M(X) denote the set of all mappings of X into itself. Now, if α, β ∈ M(X), then it follows from (1.2.15) that the composite mapping β ∘ α also belongs to M(X), and we can define an operation on M(X) (i.e., a mapping from M(X) × M(X) into M(X)) by associating with each ordered pair (β, α) the element β ∘ α. We denote this operation by "·" and write

β · α = β ∘ α,    α, β ∈ M(X).    (2.1.38)
We call this operation "multiplication," we refer to β · α as the product of β and α, and we note that (β ∘ α)(x) = (β · α)(x) for all x ∈ X. We also note that "·" is associative, for if α, β, γ ∈ M(X), then

(α · β) · γ = (α ∘ β) ∘ γ = α ∘ (β ∘ γ) = α · (β · γ).

Thus, the system {M(X); ·} is a semigroup, which we call the semigroup of transformations on X.

Next, let us recall that a permutation on X is a one-to-one mapping of X onto X. Clearly, any permutation on X belongs to M(X). In particular, the identity permutation e: X → X, defined by

e(x) = x for all x ∈ X,

belongs to M(X). We thus can readily prove the following:

2.1.39. Theorem. {M(X); ·} is a monoid whose identity element is the identity permutation of M(X).

Proof. Let α ∈ M(X). Then (e · α)(x) = e(α(x)) = α(x) for every x ∈ X, and so e · α = α. Similarly, (α · e)(x) = α(e(x)) = α(x) for all x ∈ X, and so α · e = α. ■

Next, we prove:

2.1.40. Theorem. Let {M(X); ·} be the semigroup of transformations on the set X. An element α ∈ M(X) has an inverse in M(X) if and only if α is a permutation on X. Moreover, the inverse of a unit α is the inverse mapping α⁻¹ determined by the permutation α.
Proof. Suppose that α ∈ M(X) is a permutation on X. Then it follows from Theorem 1.2.10, part (ii), that α⁻¹ is a permutation on X, and hence α⁻¹ ∈ M(X). Since α ∘ α⁻¹ = α⁻¹ ∘ α = e, it follows that α · α⁻¹ = α⁻¹ · α = e, and thus α has an inverse.

Next, suppose that α has an inverse in M(X), and let α′ denote that inverse relative to "·". Then α′ ∈ M(X) and α · α′ = α′ · α = e. To show that α is a permutation on X we must show that α is a one-to-one mapping of X onto X. To prove that α is onto, we must show that for any x ∈ X there exists a y ∈ X such that α(y) = x. Since α′ ∈ M(X), it follows that α′(x) ∈ X for every x ∈ X and (α ∘ α′)(x) = e(x) = x. Letting y = α′(x), it follows that α is onto. To show that α is one-to-one, we assume that α(x) = α(y). Then α′(α(x)) = α′(α(y)), and since α′ ∘ α = e, we have

x = e(x) = (α′ ∘ α)(x) = (α′ ∘ α)(y) = e(y) = y.

Therefore, α is one-to-one. Hence, if α ∈ M(X) has an inverse, α⁻¹, it is a permutation on X. ■
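Theorem 2.1.40 can be confirmed exhaustively on a small set. The sketch below is our own illustration: it enumerates all 27 mappings of {0, 1, 2} into itself and checks that exactly the permutations have inverses under composition.

```python
from itertools import product

X = (0, 1, 2)
M = list(product(X, repeat=3))     # all 27 mappings of X into X, as tuples
e = (0, 1, 2)                      # identity mapping

def compose(p, q):
    return tuple(p[q[i]] for i in X)

def has_inverse(a):
    return any(compose(a, b) == e == compose(b, a) for b in M)

def is_permutation(a):
    return sorted(a) == list(X)    # one-to-one and onto

assert all(has_inverse(a) == is_permutation(a) for a in M)
assert sum(is_permutation(a) for a in M) == 6   # 3! permutations
```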
Henceforth, we employ the following notation: the set of all permutations on a given set X is denoted by P(X). As pointed out in Chapter 1, if a set X has n elements, then there are n! distinct permutations on X. The reader is now in a position to prove the following result.

2.1.41. Theorem. {P(X); ·} is a subgroup of {M(X); ·}.

2.1.42. Exercise. Prove Theorem 2.1.41.

The preceding result gives rise to a very important class of groups.
2.1.43. Definition. Any subgroup of the group {P(X); ·} is called a permutation group or a transformation group on X, and {P(X); ·} itself is called the permutation group or the transformation group on X. Occasionally, we speak of a permutation group on X, say {Y; ·}, without making reference to the set X. In such cases it is assumed that {Y; ·} is a subgroup of the permutation group P(X) for some set X.
2.1.44. Example. Let X = {x, y, z}. Then P(X) consists of 3! = 6 permutations, namely,

α₁ = (x y z; x y z),  α₂ = (x y z; x z y),  α₃ = (x y z; z y x),
α₄ = (x y z; y z x),  α₅ = (x y z; z x y),  α₆ = (x y z; y x z),

where the entries after each semicolon list the images of x, y, z, respectively. We can readily verify that α₁ = e. If X₁ = {e, α₂}, then {X₁; ·} is a subgroup of P(X) and hence a permutation group on X. Let X₂ = {e, α₄, α₅}. Then {X₂; ·} is also a permutation group on X. Note that {X₁; ·} is of order 2 and {X₂; ·} is of order 3. ■
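The subgroup claims in Example 2.1.44 can be checked directly. The sketch below is our own; the particular images assigned to α₂, α₄, and α₅ follow our reading of the example (α₂ a transposition, α₄ and α₅ the two three-cycles), and permutations are represented as dicts with composition (p · q)(s) = p(q(s)).

```python
def compose(p, q):
    return {s: p[q[s]] for s in q}

e  = {'x': 'x', 'y': 'y', 'z': 'z'}
a2 = {'x': 'x', 'y': 'z', 'z': 'y'}   # a transposition
a4 = {'x': 'y', 'y': 'z', 'z': 'x'}   # a three-cycle
a5 = {'x': 'z', 'y': 'x', 'z': 'y'}   # its inverse

# X1 = {e, a2} is closed under composition, hence a subgroup of order 2:
assert compose(a2, a2) == e
# X2 = {e, a4, a5} is closed, hence a subgroup of order 3:
assert compose(a4, a4) == a5 and compose(a5, a5) == a4
assert compose(a4, a5) == e == compose(a5, a4)
```

For a finite subset of a group, closure under the operation is all that needs checking (Theorem 2.1.30's remaining conditions then follow automatically).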
B. Rings and Fields
Thus far we have concerned ourselves with mathematical systems consisting of a set and an operation on the set. Presently we consider mathematical systems consisting of a basic set X with two operations α and β defined on the set, denoted by {X; α, β}. Associated with such systems there are two mathematical systems (called subsystems), {X; α} and {X; β}. By insisting that the systems {X; α} and {X; β} possess certain properties and that one of the operations be distributive over the other, we introduce the important mathematical systems known as rings. We then concern ourselves with special types of important rings called integral domains, division rings, and fields.

2.1.45. Definition. Let X be a non-empty set, and let α and β be operations on X. The set X together with the operations α and β on X, denoted by {X; α, β}, is called a ring if
(i) {X; α} is an abelian group;
(ii) {X; β} is a semigroup; and
(iii) β is distributive over α.
We refer to {X; α} as the group component of the ring, to {X; β} as the semigroup component of the ring, to α as the group operation of the ring, and to β as the semigroup operation of the ring. For convenience we often denote a ring {X; α, β} by X and simply refer to "ring X". For obvious reasons, we often use the symbols "+" and "·" ("addition" and "multiplication") in place of α and β, respectively. Thus, if X is a ring we may write {X; +, ·} and assume that {X; +} is the group component of X and {X; ·} is the semigroup component of X. We call {X; +} the additive group of ring X, {X; ·} the multiplicative semigroup of ring X, x + y the sum of x and y, and x · y the product of x and y. We use 0 ("zero") to denote the identity element of {X; +}. If {X; ·} has an identity element, we denote that identity by e. The inverse of an element x relative to "+" is denoted by −x. If x has an inverse relative to "·", we denote it by x⁻¹. Furthermore, we denote x + (−y) by x − y (the "difference of x and y") and (−x) · y by −xy. Note that the elements 0, e, −x, and x⁻¹ are unique.

Subsequently, we adopt the convention that when operations "+" and "·" appear mixed without parentheses to clarify the order of operation, the operation should be taken with respect to "·" first and then with respect to "+". For example,

x · y + z = (x · y) + z
and not x · (y + z); the latter would have to be written with parentheses. Thus, we have

x · (y + z) = (x · y) + (x · z) = x · y + x · z.

In general, the semigroup {X; ·} does not contain an identity. However, if it does, we have:

2.1.46. Definition. Let {X; +, ·} be a ring. If the semigroup {X; ·} has an identity element, we say that X is a ring with identity.
There should be no ambiguity concerning the above statement. The group {X; +} always has an identity, so if we say "ring with identity," we must refer to {X; ·}.

We note that it is always true that the operation "+" is commutative for a given ring. If in addition the operation "·" is also commutative, we have:

2.1.47. Definition. Let {X; +, ·} be a ring. If the operation "·" is commutative on the set X, then the ring X is called a commutative ring.

For rings we also have:

2.1.48. Definition. Let {X; +, ·} be a ring with identity. An element x ∈ X is called a unit of X if x has an inverse as an element of the semigroup {X; ·}. We denote this inverse of x by x⁻¹.

The reader can readily verify that the following examples are rings.

2.1.49. Exercise. Letting "+" and "·" denote the usual operations of addition and multiplication, show that {X; +, ·} is a commutative ring with identity if
(i) X is the set of integers;
(ii) X is the set of rational numbers; and
(iii) X is the set of real numbers.
2.1.50. Exercise. Let X = {0, 1} and define "+" and "·" by the following operation tables:

+ | 0 1
0 | 0 1
1 | 1 0

· | 0 1
0 | 0 0
1 | 0 1

Show that {X; +, ·} is a commutative ring with identity.
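On X = {0, 1} the two tables of Exercise 2.1.50 coincide with "exclusive or" and ordinary multiplication, so the ring axioms can be checked mechanically. The sketch below is our own illustration, not part of the text's exercise.

```python
X = (0, 1)
add = lambda a, b: a ^ b     # the "+" table: note 1 + 1 = 0
mul = lambda a, b: a & b     # the "·" table

# {X; +} is an abelian group with identity 0; each element is its own inverse.
assert all(add(a, 0) == a and add(a, a) == 0 for a in X)
assert all(add(a, b) == add(b, a) for a in X for b in X)
# {X; ·} is a commutative semigroup with identity 1.
assert all(mul(a, 1) == a for a in X)
assert all(mul(a, b) == mul(b, a) for a in X for b in X)
# "·" distributes over "+".
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in X for b in X for c in X)
```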
2.1.51. Exercise. Let {X; α} be an abelian group with identity element e. Define the operation β on X as x β y = e for every x, y ∈ X. Show that {X; α, β} is a ring.
For rings we have:

2.1.52. Theorem. If {X; +, ·} is a ring, then for every x, y ∈ X we have
(i) x + 0 = 0 + x = x;
(ii) −(x + y) = (−x) + (−y) = −x − y;
(iii) if x + y = 0, then y = −x;
(iv) −(−x) = x;
(v) 0 = x · 0 = 0 · x;
(vi) (−x) · y = −(x · y) = x · (−y); and
(vii) (−x) · (−y) = x · y.
Proof. Parts (i)–(iv) follow from the fact that {X; +} is an abelian group and from our notation convention.

To prove part (v), we note that since z + 0 = z for every z ∈ X, we have for every x ∈ X, 0 · x + 0 = 0 · x = (0 + 0) · x = 0 · x + 0 · x, and thus 0 = 0 · x. Also, x · 0 + 0 = x · 0 = x · (0 + 0) = x · 0 + x · 0, so that 0 = x · 0. Hence, 0 = x · 0 = 0 · x for every x ∈ X.

To prove part (vi), note that 0 · y = 0 for every y ∈ X, and since x + (−x) = 0 we have 0 = 0 · y = [x + (−x)] · y = x · y + (−x) · y. This implies that −(x · y) = (−x) · y, since −(x · y) is the additive inverse of x · y. Similarly, 0 = x · 0 = x · [y + (−y)] = x · y + x · (−y). This implies that x · (−y) = −(x · y). Thus, (−x) · y = −(x · y) = x · (−y).

Finally, to prove part (vii), we note that since −(−z) = z for every z ∈ X and since part (vi) holds for any x ∈ X, we obtain, replacing x by −x,

(−x) · (−y) = −[(−x) · y] = −[−(x · y)] = x · y. ■
Now let {X; +, ·} denote a ring for which the two operations are equal, i.e., "+" = "·". Then x + y = x · y for all x, y ∈ X. In particular, if y = 0, then x = x + 0 = x · 0 = 0 for all x ∈ X, and we conclude that 0 is the only element of the set X. This gives rise to:

2.1.53. Definition. A ring {X; +, ·} is called a trivial ring if X = {0}.
We next introduce:

2.1.54. Definition. Let {X; +, ·} be a ring. If there exist non-zero elements x, y ∈ X (not necessarily distinct) such that x · y = 0, then x and y are both called divisors of zero.

We have:

2.1.55. Theorem. Let {X; +, ·} be a ring, and let X# = X − {0}. Then X has no divisors of zero if and only if {X#; ·} is a subsemigroup of {X; ·}.

Proof. Assume that X has no divisors of zero. Then x, y ∈ X# implies x · y ≠ 0, so x · y ∈ X# and X# is a subsemigroup. Conversely, if x, y ∈ X# implies x · y ∈ X#, then x · y ≠ 0 whenever x ≠ 0 and y ≠ 0. ■

We now consider special types of rings called integral domains.

2.1.56. Definition. A ring {X; +, ·} is called an integral domain if it has no divisors of zero.
Our next result enables us to characterize integral domains in another equivalent fashion.

2.1.57. Theorem. A ring X is an integral domain if and only if for every x ≠ 0, the following three statements are equivalent for every y, z ∈ X:
(i) y = z;
(ii) x · y = x · z; and
(iii) y · x = z · x.

Proof. Assume that X is an integral domain. Clearly (i) implies (ii) and (iii). To show that (ii) implies (i), let x · y = x · z. Then x · (y − z) = 0. Since x ≠ 0 and X has no zero divisors, y − z = 0, or y = z. Thus, (ii) implies (i). Similarly, it follows that (iii) implies (i). This proves that (i), (ii), and (iii) are equivalent.

Conversely, assume that x ≠ 0 and that (i), (ii), and (iii) are equivalent. Let x · y = 0. Then x · 0 = x · y, and it follows that y must be zero, since (ii) implies (i). Thus, x · y ≠ 0 for y ≠ 0, and X has no zero divisors. ■
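Zero divisors and the cancellation property of Theorem 2.1.57 can be contrasted in the rings of integers mod 4 and mod 5. This sketch is our own illustration, not an example from the text.

```python
# Elements that are divisors of zero in the ring of integers mod n.
def zero_divisors(n):
    return {x for x in range(1, n) for y in range(1, n) if (x * y) % n == 0}

# Mod 4 is not an integral domain: 2 · 2 = 0, and cancellation fails,
# since 2 · 1 = 2 · 3 (both equal 2 mod 4) yet 1 != 3.
assert zero_divisors(4) == {2}
assert (2 * 1) % 4 == (2 * 3) % 4

# Mod 5 there are no zero divisors, so x · y = x · z with x != 0 forces y = z.
assert zero_divisors(5) == set()
assert all(y == z
           for x in range(1, 5) for y in range(5) for z in range(5)
           if (x * y) % 5 == (x * z) % 5)
```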
We now introduce divisors of elements.

2.1.58. Definition. Let {X; +, ·} be a commutative integral domain with identity, and let x, y ∈ X. We say y is a divisor of x if there exists an element z ∈ X such that x = y · z. If y is a divisor of x, we write y | x. If y | x, it is customary to say that y divides x.

2.1.59. Theorem. Let {X; +, ·} be a commutative integral domain with identity, and let x ∈ X. Then x is a unit of X if and only if x | e.

Proof. Let x | e. Then there is a z ∈ X such that e = x · z. Thus, z is an inverse of x, i.e., z = x⁻¹. Conversely, let x be a unit of X. Then there exists x⁻¹ ∈ X such that e = x · x⁻¹, and thus x | e. ■

We notice that if in an integral domain x · y = 0, then either x = 0 or y = 0. Now a divisor of zero cannot have an inverse. To show this, we let x and y be divisors of zero, i.e., x · y = 0. Suppose that y has an inverse. Then x · y · y⁻¹ = 0 · y⁻¹, or x = 0, which contradicts the fact that x and y are zero divisors. However, the fact that an element is not a zero divisor does not imply it has an inverse. If all of the elements except zero have an inverse, we have yet another special type of ring.
,+ .}
2.1.60. DeflDition. L e t {X; be a non-trivial ring, and let X # The ring X is called a division ring if { X # ; .} is a subgroup of {X;
=
X .}.
O { .J
In the case of division rings we have: 2.1.61. Theorem. identity.
Let { X ;
,+ .}
be a division ring. Then X
is a ring with
Proof L e t X # = X - O{ .J Then { X # ; .} has an identity element e. L e t x E .X Ifx E X # , then e • x = x • e = x. If x ~ X * , then x = 0 and 0 • e = e • 0 = O. Therefore, e is an identity element of .X • Of utmost importance is the following special type of ring. 2.1.62. Definition. L e t { X ; ,+ .} be a division ring. Then X field if the operation "." is commutative.
is called a
Because of the prominence of fields in mathematics as well as in applications, and because we will have occasion to make repeated use of fields, it may be worthwhile to restate the above definition, by listing all the properties of fields. 2.1.63. DefinitioD. L e t X be a set containing more than one element, and let there be two operations and "." defined on .X Then { X ; is a field provided that:
"+"
+
(i) x
(y
,x y, z
(ii)
+
+
E
z) = (x X (Le.,
+
+ "+"
,+ .}
+
y) z and x · (y • )z = (x • y) • z and"." are associative operations);
x y= y x and x · y = y • x for all x , y "." are commutative operations);
+
E
X
(Le.,
for all
"+"
and
(iii) there exists an element 0 E X such that 0 x = x for all x E X; (iv) for every x E X there exists an element - x E X such that x +(-x)=O; (v) x · (y + z) = x • y + x • z for all x , y, z E X (i.e., "." is distributive over "+ " ); (vi) there exists an element e*"O such that e • x = x for all x E X; and (vii) for any x 0, there exists an X - I E X such that x • (X - I ) = e.
*"
51
2.1. Some Basic Structures ofAlgebra
2.1.64. Example. Perhaps the most widely known field is the set of real numbers with the usual rules for addition and multiplication. ■

2.1.65. Exercise. Let Z denote the set of all integers and let "+" and "·" denote the usual operations of addition and multiplication on Z. Show that {Z; +, ·} is an integral domain, but not a division ring, and hence not a field.

The above example and exercise yield:

2.1.66. Definition. Let R denote the set of all real numbers, let Z denote the set of all integers, and let "+" and "·" denote the usual operations of addition and multiplication, respectively. We call {R; +, ·} the field of real numbers and {Z; +, ·} the ring of integers.
Another very important field is considered in the following:

2.1.67. Exercise. Let C = R × R, where R is given in Definition 2.1.66. For any x, y ∈ C, let x = (a, b) and y = (c, d), where a, b, c, d ∈ R. We define x = y if and only if a = c and b = d. Also, we define the operations "+" and "·" on C by

x + y = (a + c, b + d)

and

x · y = (ac − bd, ad + bc).

Show that {C; +, ·} is a field.

In view of the last exercise we have:

2.1.68. Definition. The field {C; +, ·} defined in Exercise 2.1.67 is called the field of complex numbers.
2.1.69. Exercise. Let Q denote the set of rational numbers, let P denote the set of irrational numbers, and let "+" and "·" denote the usual operations of addition and multiplication on P and Q.
(a) Discuss the system {Q; +, ·}.
(b) Discuss the system {P; +, ·}.
2.1.70. Exercise. (This exercise shows that the family of 2 × 2 matrices forms a ring but not a field.) Let {R; +, ·} denote the field of real numbers. Define M to be the set characterized as follows. If u, v ∈ M, then u and v are of the form

u = [a b]      v = [m n]
    [c d],         [p q],

where a, b, c, d and m, n, p, q ∈ R. Define the operations "+" and "·" on M by

u + v = [a b] + [m n] = [a+m  b+n]
        [c d]   [p q]   [c+p  d+q]

and

u · v = [a b] · [m n] = [a·m + b·p   a·n + b·q]
        [c d]   [p q]   [c·m + d·p   c·n + d·q].

(Note that in the preceding, the operations + and · defined on M are entirely different from the operations + and · for the field R.)

(a) Show that {M; +} is a monoid.
(b) Show that {M; +} is an abelian group.
(c) Show that {M; +, ·} is a ring.
(d) Show that {M; +, ·} has divisors of zero.
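Part (d) of Exercise 2.1.70 is the decisive obstruction to M being a field. The sketch below, our own illustration, exhibits a concrete pair of nonzero matrices whose product is the zero matrix; matrices are represented as tuples of rows.

```python
# 2 x 2 matrix product, following the "·" of Exercise 2.1.70.
def mat_mul(u, v):
    (a, b), (c, d) = u
    (m, n), (p, q) = v
    return ((a * m + b * p, a * n + b * q),
            (c * m + d * p, c * n + d * q))

zero = ((0, 0), (0, 0))
u = ((1, 0), (0, 0))
v = ((0, 0), (0, 1))

# Neither u nor v is the zero matrix, yet their product is:
assert u != zero and v != zero
assert mat_mul(u, v) == zero
```

By the remark following Theorem 2.1.59, a divisor of zero cannot have an inverse, so {M; +, ·} cannot be a division ring, let alone a field.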
Next, we introduce the concept of subring. 2.1.71. Definition. eL t X be a ring, and let Y be a non-void subset of X which is closed relative to both operations "+" and "." of the ring .X The set ,Y together with the (induced) operations "+" and ".", { Y ; ,+ ,} ' is called a subring of the ring X provided that { Y ; ,+ .} is itself a ring. In connection with the above definition we say that subset Y determines ,+ .}. We have:
the subring { Y ;
2.1.72. Theorem. If X is a ring then a non-void subset Y o f a subring of the ring X if and only if (i)
(ii)
Y is closed with respect to both operations" " + -x E Y whenever x E .Y
2.1.73. Exercise. sU ing
X
determines
and"· " ; and
Prove Theorem 2.1.72.
Having introduced the concept of subring, we now introduce subdomains.

2.1.74. Definition. Let X be a ring, and let Y be a subring of X. If Y is an integral domain, then it is called a subdomain of X.

We also define subfield in a natural way.

2.1.75. Definition. Let X be a ring, and let Y be a subring of X. If Y is a field, then it is called a subfield of X.
Before, we characterized a trivial ring as a ring for which the set X consists only of the 0 element. In the case of subrings we have:

2.1.76. Definition. Let {X; +, ·} be a ring, and let {Y; +, ·} be a subring. Then subring Y is called a trivial subring if either
(i) Y = {0}, or
(ii) Y = X.
E
X, x and y cannot be
F o r subfields we have:
"*
2.1.78. Theorem. Let X be a field, and let Y be a subring of .X is a subfield of X if and only if for every x E ,Y X 0, X - I E .Y 2.1.79.
Exercise.
Then Y
Prove Theorem 2.1.78.
For the intersection of arbitrary subrings we have the following:

2.1.80. Theorem. Let X be a ring, and let Xᵢ be a subring of X for each i ∈ I, where I is some index set. Let Y = ∩_{i∈I} Xᵢ. Then {Y; +, ·} is a subring of {X; +, ·}.

Proof. Since 0 ∈ Xᵢ for all i ∈ I, it follows that 0 ∈ Y and Y is non-empty. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I. Hence, x + y ∈ Xᵢ and x · y ∈ Xᵢ for all i ∈ I, so that Y is closed with respect to "+" and "·". Also, −x ∈ Xᵢ for every i ∈ I. Thus, by Theorem 2.1.72, Y is a subring of X. ■
Now let {X; +, ·} be a ring and let W be any subset of X. Also, let

𝒴 = {Y : W ⊂ Y ⊂ X and Y is a subring of X}.

Then 𝒴 is non-empty because X ∈ 𝒴. Now let

R = ∩_{Y∈𝒴} Y.

Then W ⊂ R and, by Theorem 2.1.80, {R; +, ·} is a subring of {X; +, ·}. This subring is called the subring generated by W.
C. Modules, Vector Spaces, and Algebras

Thus far we have considered mathematical systems consisting of a set X of elements and of mappings from X × X into X called operations on X. Since a mapping may be regarded as a set and since an operation is a mapping (see Chapter 1), the various components of the mathematical systems considered up to this point may be thought of as being derived from one set X.
Next, we concern ourselves with mathematical systems which are not restricted to possessing one single fundamental set. We have seen that a single set admits a number of basic derived sets. Clearly, the number of sets that may be derived from two sets, say X and Y, will increase considerably. For example, there are sets which may be generated by utilizing operations on X and Y, and then there are sets which may be derived from mappings of X × Y into X or into Y. Mathematical systems which possess several fundamental sets and operations on at least one of these sets may, at least in part, be analyzed by making use of the development given thus far in the present section. Indeed, one may view many such complex systems as a composite of simpler mathematical systems and refer to such systems simply as composite mathematical systems. Important examples of such systems include vector spaces, algebras, and modules.
2.1.81. Definition. Let {R; +, ·} be a ring with identity e, and let {X; +} be an abelian group. Let μ: R × X → X be any function satisfying the following four conditions for all α, β ∈ R and for all x, y ∈ X:
(i) μ(α + β, x) = μ(α, x) + μ(β, x);
(ii) μ(α, x + y) = μ(α, x) + μ(α, y);
(iii) μ(α, μ(β, x)) = μ(α · β, x); and
(iv) μ(e, x) = x.

Then the composite system {R, X, μ} is called a module.

Since the function μ is defined on R × X, the module defined above is sometimes called a left R-module. A right R-module is defined in an analogous manner. We will consider only left R-modules and simply refer to them as modules, or R-modules. The mapping μ: R × X → X is usually abbreviated by writing μ(α, x) = αx, i.e., in the same manner as "multiplication of α times x." Using this notation, conditions (i) to (iv) above become
(i) (α + β)x = αx + βx;
(ii) α(x + y) = αx + αy;
(iii) α(βx) = (α · β)x; and
(iv) ex = x,
respectively. We usually refer to the module {R, X, μ} by simply referring to X and calling it an R-module or a module over R.

To simplify notation, we used in the preceding the same operation symbol "+" for ring R as well as for group X. However, this should cause no confusion, since it will always be clear from context which operation is used. We will follow similar practices on numerous other occasions in this book.
2.1. Some Basic Structures of Algebra
2.1.82. Example. Let {Z; +, ·} denote the ring of integers, and let {X; +} be any abelian group. Define μ: Z × X → X by μ(n, x) = x + ... + x, where the summation includes x n times. We abbreviate this as μ(n, x) = nx and think of it as "n times x." The identity element in Z is 1, and we see that conditions (i) to (iv) in Definition 2.1.81 are satisfied. Thus, any abelian group may be viewed as a module over the ring of integers. ∎
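Example 2.1.82 can be checked mechanically. The following sketch takes the abelian group to be the integers mod 5 under addition (a hypothetical concrete choice, not from the text) and verifies conditions (i)–(iv) of Definition 2.1.81 for μ(n, x) = x + ... + x (n times):

```python
# Z-module structure on an abelian group: here X = {0,...,4} under + mod 5.
M = 5

def add(x, y):          # the group operation on X
    return (x + y) % M

def mu(n, x):           # mu(n, x) = x added to itself n times, n in Z
    result = 0          # identity element of the group
    for _ in range(abs(n)):
        result = add(result, x)
    if n < 0:           # negative n uses the group inverse
        result = (-result) % M
    return result

# Spot-check conditions (i)-(iv) of Definition 2.1.81 over small ranges.
ints, xs = range(-6, 7), range(M)
assert all(mu(a + b, x) == add(mu(a, x), mu(b, x))
           for a in ints for b in ints for x in xs)            # (i)
assert all(mu(a, add(x, y)) == add(mu(a, x), mu(a, y))
           for a in ints for x in xs for y in xs)              # (ii)
assert all(mu(a, mu(b, x)) == mu(a * b, x)
           for a in ints for b in ints for x in xs)            # (iii)
assert all(mu(1, x) == x for x in xs)                          # (iv)
print("Z-module axioms verified")
```

The same loop structure would work for any finite abelian group given its operation table.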
2.1.83. Example. Let {X; +, ·} be a ring with identity, and let R be a subring of X with e ∈ R. By defining μ: R × X → X as μ(α, x) = α · x, it is clear that X is an R-module. In particular, if R = X, we see that any ring with identity can be made into a module over itself. ∎

For modules we have:

2.1.84. Theorem. Let X be an R-module. Then for all α ∈ R and x ∈ X we have
(i) α0 = 0;
(ii) α(−x) = −(αx);
(iii) 0x = 0; and
(iv) (−α)x = −(αx).

Proof. To prove the first part, we note that for 0 ∈ X we have 0 + 0 = 0. Thus, α(0 + 0) = α0 + α0 = α0, and so α0 = 0. To prove the second part, note that for any x ∈ X we have x + (−x) = 0, and thus α(x + (−x)) = αx + α(−x) = α0 = 0. Therefore, α(−x) = −(αx). To prove the third part, observe that for 0 ∈ R we have 0 + 0 = 0. Hence, (0 + 0)x = 0x + 0x = 0x, and therefore 0x = 0. To prove the last part, note that since α + (−α) = 0 it follows that (α + (−α))x = 0x = 0. Therefore, αx + (−α)x = 0, and (−α)x = −(αx). ∎
We next introduce the important concept of vector space.
2.1.85. Definition. Let {F; +, ·} be a field, and let {X; +} be an abelian group. If X is an F-module, then X is called a vector space over F.
The notion of vector space, also called linear space, is among the most important concepts encountered in mathematics. We will devote the next two chapters and a large portion of the remainder of this book to vector spaces and to mappings on such spaces.
2.1.86. Theorem. Let {R; +, ·} be a ring, and let Rⁿ = R × ... × R; i.e., Rⁿ denotes the n-fold Cartesian product of R. We denote the element x ∈ Rⁿ by x = (x₁, ..., xₙ) and define the operation "+" on Rⁿ by
x + y = (x₁ + y₁, ..., xₙ + yₙ)
for all x, y ∈ Rⁿ. Also, we define μ: R × Rⁿ → Rⁿ by
αx = (αx₁, ..., αxₙ)
for all α ∈ R and x ∈ Rⁿ. Then Rⁿ is an R-module.

2.1.87. Exercise. Prove Theorem 2.1.86.
We also have:
2.1.88. Theorem. Let {F; +, ·} be a field, and let Fⁿ = F × ... × F be the n-fold Cartesian product of F. Denote the element x ∈ Fⁿ by x = (ξ₁, ξ₂, ..., ξₙ), denote y ∈ Fⁿ by y = (η₁, η₂, ..., ηₙ), and define the operation "+" on Fⁿ by
x + y = (ξ₁ + η₁, ..., ξₙ + ηₙ)
for all x, y ∈ Fⁿ. Also, define μ: F × Fⁿ → Fⁿ by
αx = (αξ₁, ..., αξₙ)
for all α ∈ F and x ∈ Fⁿ. Then Fⁿ is a vector space over F.

2.1.89. Exercise. Prove Theorem 2.1.88.
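A quick numerical spot check of Theorem 2.1.88, taking F to be the rational numbers (via Python's fractions.Fraction, an illustrative choice) and n = 3:

```python
# Componentwise "+" and scalar multiplication make F^n a vector space over F.
from fractions import Fraction

def vec_add(x, y):                      # "+" on F^n, componentwise
    return tuple(a + b for a, b in zip(x, y))

def scal_mul(alpha, x):                 # mu: F x F^n -> F^n
    return tuple(alpha * a for a in x)

x = (Fraction(1, 2), Fraction(2, 3), Fraction(0))
y = (Fraction(1, 3), Fraction(-1, 6), Fraction(5))
a, b = Fraction(3, 4), Fraction(-2, 5)

# Module conditions (i)-(iv) of Definition 2.1.81, with e = 1 in F:
assert scal_mul(a + b, x) == vec_add(scal_mul(a, x), scal_mul(b, x))
assert scal_mul(a, vec_add(x, y)) == vec_add(scal_mul(a, x), scal_mul(a, y))
assert scal_mul(a, scal_mul(b, x)) == scal_mul(a * b, x)
assert scal_mul(Fraction(1), x) == x
print("F^3 over Q: vector space axioms spot-checked")
```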
In view of Theorem 2.1.88 we have:

2.1.90. Definition. Let {F; +, ·} be a field. The vector space Fⁿ is called the vector space of n-tuples over F.
Another very important concept encountered in mathematics is that of an algebra. We have:
2.1.91. Definition. Let X be a vector space over a field F. Let a binary operation called "multiplication" and denoted by "·" be defined on X, satisfying the following axioms:
(i) x · (y + z) = x · y + x · z;
(ii) (x + y) · z = x · z + y · z; and
(iii) (αx) · (βy) = (α · β)(x · y)
for all x, y, z ∈ X and for all α, β ∈ F. Then X is called an algebra over F. If, in addition to the above axioms, the binary operation of multiplication is associative, then X is called an associative algebra. If the operation is commutative, then X is called a commutative algebra. If X has an identity element, then X is called an algebra with identity.

Note that in hypothesis (iii) the symbol "·" is used to denote two different operations. Thus, in the case of x · y the operation used is defined on X, while in the case of α · β the operation used is defined on F. The reader is cautioned that in some texts the term algebra means what we defined to be an associative algebra.

2.1.92. Exercise. Let {M; +, ·} denote the ring of 2 × 2 matrices defined in Exercise 2.1.70, and let {R; +, ·} be the field of real numbers. For u ∈ M given by
u = [a b]
    [c d],
where a, b, c, d ∈ R, define αu for α ∈ R by
αu = [αa αb]
     [αc αd].
Show that M is an associative algebra over R.

In some areas of application, so-called Lie algebras are of importance. We have:
2.1.93. Definition. A non-associative algebra R is called a Lie algebra if x · x = 0 for every x ∈ R and if
x · (y · z) + y · (z · x) + z · (x · y) = 0     (2.1.94)
for every x, y, z ∈ R. Equation (2.1.94) is called the Jacobi identity.
Let us now consider some specific cases of Lie algebras. Our first exercise shows that any associative algebra can be made into a Lie algebra.

2.1.95. Exercise. Let R be an associative algebra over F, and define the operation "∘" on R by
x ∘ y = x · y − y · x
for all x, y ∈ R (where "·" is the operation on the associative algebra R over F). Show that R with "∘" defined on it is a Lie algebra.

2.1.96. Example. In Exercise 2.1.70 we showed that the set of 2 × 2 matrices forms a ring but not a field, and in Exercise 2.1.92 we showed that this set forms an algebra over R, the field of real numbers. This set can be made into a Lie algebra by Exercise 2.1.95. ∎
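Exercise 2.1.95 together with Example 2.1.96 can be spot-checked numerically: on the associative algebra of 2 × 2 real matrices, take the new product of x and y to be x·y − y·x (the commutator). The sketch below, with illustrative helper names, verifies the two defining properties of Definition 2.1.93 for sample matrices:

```python
# The commutator on 2x2 matrices gives a Lie algebra (spot check).
def mat_mul(u, v):
    return [[sum(u[i][k] * v[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_sub(u, v):
    return [[u[i][j] - v[i][j] for j in range(2)] for i in range(2)]

def mat_add(u, v):
    return [[u[i][j] + v[i][j] for j in range(2)] for i in range(2)]

def bracket(u, v):              # the Lie product: u*v - v*u
    return mat_sub(mat_mul(u, v), mat_mul(v, u))

x = [[1, 2], [3, 4]]
y = [[0, 1], [1, 0]]
z = [[2, 0], [0, -1]]
zero = [[0, 0], [0, 0]]

assert bracket(x, x) == zero                      # x . x = 0
jacobi = mat_add(mat_add(bracket(x, bracket(y, z)),
                         bracket(y, bracket(z, x))),
                 bracket(z, bracket(x, y)))
assert jacobi == zero                             # Jacobi identity (2.1.94)
print("commutator Lie-algebra properties verified on samples")
```

Both identities in fact hold for arbitrary matrices, which is what the exercise asks one to prove.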
2.1.97. Exercise. Let X denote the usual "three-dimensional space," and let i, j, k denote the elements of X depicted in Figure A.

2.1.98. Figure A. Unit vectors i, j, k in three-dimensional space: i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).

Define the operation "×" on X by the following table:

×  |  i    j    k
i  |  0    k   −j
j  | −k    0    i
k  |  j   −i    0

i.e., "×" denotes the usual "cross product," also called "outer product," encountered in vector analysis. Show that X is a Lie algebra.

Let us next consider submodules.

2.1.99. Definition. Let {R; +, ·} be a ring with identity, and let {X; +} be an abelian group, where X is an R-module. Let {Y; +} be a subgroup of {X; +}. If Y is an R-module, then Y is called an R-submodule of X.
We can characterize submodules by the following:

2.1.100. Theorem. Let X be an R-module, and let Y be a non-empty subset of X. Then Y is an R-submodule if and only if
(i) {Y; +} is a subgroup of {X; +}; and
(ii) for all α ∈ R and x ∈ Y, we have αx ∈ Y.

Proof. We give the sufficiency part of the proof and leave the necessity part as an exercise. Let α, β ∈ R and let x ∈ Y. Then αx, βx, (α + β)x ∈ Y by hypothesis (ii). Since Y is a group, it follows that αx + βx ∈ Y, and since x ∈ X we have (α + β)x = αx + βx. Now let α ∈ R and let x, y ∈ Y. Then (x + y) ∈ Y and, also, α(x + y), αx, αy ∈ Y. Thus, α(x + y) = αx + αy, since Y is a subgroup of X. Now let α, β ∈ R, and let x ∈ Y. Then βx ∈ Y, and hence α(βx) ∈ Y. We have (α · β)x ∈ Y, and so α(βx) = (α · β)x. Also, since e ∈ R, we have ex ∈ Y for all x ∈ Y and, furthermore, since x ∈ X, we have ex = x. This proves that Y is an R-module and hence an R-submodule of X. ∎

2.1.101. Exercise. Prove the necessity part of the preceding theorem.
We next introduce the notion of vector subspace, also called linear subspace.

2.1.102. Definition. Let F be a field, and let X be a vector space over F. Let Y be a subset of X. If Y is an F-submodule of X, then Y is called a vector subspace.

Let us consider some specific cases.

2.1.103. Example. Let R be a ring, let X be an R-module, and let xᵢ ∈ X for i = 1, ..., n. Then the subset of X given by {x ∈ X : x = α₁x₁ + ... + αₙxₙ, αᵢ ∈ R} is an R-submodule of X. ∎

2.1.104. Example. Let F be a field, and let Fⁿ be the vector space of n-tuples over F. Let x₁ = (1, 0, ..., 0) and x₂ = (0, 1, 0, ..., 0). Then x₁, x₂ ∈ Fⁿ. Let Y = {x ∈ Fⁿ : x = α₁x₁ + α₂x₂, α₁, α₂ ∈ F}. Then Y is a vector subspace. We see that if x ∈ Y, then x is of the form x = (α₁, α₂, 0, ..., 0). ∎
We next prove:

2.1.105. Theorem. Let X be an R-module, and let 𝒴 denote a family of R-submodules of X; i.e., Yᵢ is a submodule of X for every Yᵢ ∈ 𝒴, where i ∈ I and I is some index set. Let Y = ∩_{i∈I} Yᵢ. Then Y is an R-submodule of X.

Proof. Since Yᵢ is a subgroup of X for all Yᵢ ∈ 𝒴, it follows that Y is a subgroup of X by Theorem 2.1.31. Now let α ∈ R and let y ∈ Y. Then y ∈ Yᵢ for all Yᵢ ∈ 𝒴. Hence, αy ∈ Yᵢ for all Yᵢ ∈ 𝒴, and so αy ∈ Y. Therefore, by Theorem 2.1.100, Y is an R-submodule of X. ∎
The above result gives rise to:
2.1.106. Definition. Let X be an R-module, and let W be a subset of X. Let 𝒴 be the family of subsets of X given by
𝒴 = {Y : W ⊆ Y ⊆ X and Y is an R-submodule of X}.
Let G = ∩_{Y∈𝒴} Y. Then G is called the R-submodule of X generated by W.
Let us next prove:

2.1.107. Theorem. Let X be an R-module, let x₁, ..., xₙ ∈ X, and let Y(x₁, ..., xₙ) denote the subset of X given by
Y(x₁, ..., xₙ) = {x ∈ X : x = α₁x₁ + ... + αₙxₙ, α₁, ..., αₙ ∈ R}.
Then Y(x₁, ..., xₙ) is an R-submodule of X.

Proof. For brevity let Y = Y(x₁, ..., xₙ). To show that Y is a subgroup of X we first note that 0 ∈ Y. Next, for x = α₁x₁ + ... + αₙxₙ ∈ Y, let y = (−α₁)x₁ + ... + (−αₙ)xₙ. Then y ∈ Y and x + y = 0, and hence y = −x. Next, let z = β₁x₁ + ... + βₙxₙ. Then x + z = (α₁ + β₁)x₁ + ... + (αₙ + βₙ)xₙ ∈ Y. Therefore, by Theorem 2.1.30, Y is a subgroup of X. Finally, note that for any a ∈ R,
ax = a(α₁x₁ + ... + αₙxₙ) = (a · α₁)x₁ + ... + (a · αₙ)xₙ ∈ Y.
Thus, by Theorem 2.1.100, Y is an R-submodule of X. ∎

We see that Y(x₁, ..., xₙ) belongs to the family 𝒴 of Definition 2.1.106 if we let W = {x₁, ..., xₙ}, in which case ∩_{Y'∈𝒴} Y' = Y(x₁, ..., xₙ). This leads to:

2.1.108. Definition. Let X be an R-module, let x₁, ..., xₙ ∈ X, and let Y(x₁, ..., xₙ) = {x ∈ X : x = α₁x₁ + ... + αₙxₙ, α₁, ..., αₙ ∈ R}. Then Y(x₁, ..., xₙ) is called the R-module of X generated by x₁, ..., xₙ.

Also of interest to us is:

2.1.109. Definition. Let X be an R-module. If there exist elements x₁, ..., xₙ ∈ X such that for every x ∈ X there exist α₁, ..., αₙ ∈ R such that x = α₁x₁ + ... + αₙxₙ, then X is said to be finitely generated, and x₁, ..., xₙ are called the generators of X.

It can happen that the set of coefficients {α₁, ..., αₙ} in the above definition is not unique. That is to say, for x ∈ X we may have x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ, where αᵢ ≠ βᵢ for some i. However, if it turns out that the above representation of x in terms of x₁, ..., xₙ is unique, then we have:
2.1.110. Definition. Let X be an R-module which is finitely generated. Let x₁, ..., xₙ be generators of X. If for every x ∈ X the relation
x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ
implies that αᵢ = βᵢ for all i = 1, ..., n, then the set {x₁, ..., xₙ} is called a basis for X.
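As a concrete illustration of Definition 2.1.110 (a hypothetical example, not from the text): in Q², the generators x₁ = (1, 0) and x₂ = (1, 1) form a basis, so each x has exactly one representation x = α₁x₁ + α₂x₂, and the coefficients can be solved for directly:

```python
# Unique representation with respect to a basis of Q^2.
from fractions import Fraction

x1, x2 = (Fraction(1), Fraction(0)), (Fraction(1), Fraction(1))

def coords(x):
    # Solve a1*x1 + a2*x2 = x by hand for these generators:
    # the second component forces a2 = x[1], then a1 = x[0] - a2.
    a2 = x[1]
    a1 = x[0] - a2
    return a1, a2

x = (Fraction(5), Fraction(2))
a1, a2 = coords(x)
# Verify the representation; it is unique because the 2x2 system
# determined by x1, x2 has exactly one solution.
assert (a1 * x1[0] + a2 * x2[0], a1 * x1[1] + a2 * x2[1]) == x
assert (a1, a2) == (Fraction(3), Fraction(2))
print("x =", a1, "* x1 +", a2, "* x2")
```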
D. Overview

We conclude this section with the flow chart of Figure B, which attempts to put into perspective most of the algebraic systems considered thus far.

2.1.111. Figure B. Some basic structures of algebra. [Flow chart relating, among others, the notions of commutative ring, integral domain, module, associative algebra, and commutative algebra.]
2.2. HOMOMORPHISMS
Thus far we have concerned ourselves with various aspects of different mathematical systems (e.g., semigroups, groups, rings, etc.). In the present section we study special types of mappings defined on such algebraic structures. We begin by first considering mappings on semigroups.

2.2.1. Definition. Let {X; α} and {Y; β} be two semigroups (not necessarily distinct). A mapping ρ of set X into set Y is called a homomorphism of the semigroup {X; α} into the semigroup {Y; β} if
ρ(x α y) = ρ(x) β ρ(y)     (2.2.2)
for every x, y ∈ X. The image of X under ρ, denoted by ρ(X), is called the homomorphic image of X. If x ∈ X, then ρ(x) is called the homomorphic image of x.

In Figure C, the significance of Eq. (2.2.2) is depicted pictorially. From this figure and from Eq. (2.2.2) it is evident why homomorphisms are said to "preserve the operations α and β."

2.2.3. Figure C. Homomorphism of semigroup {X; α} into semigroup {Y; β}.

In the above definition we have used arbitrary semigroups {X; α} and {Y; β}. As mentioned in Section 2.1, it is often convenient to use the symbol "+" for operations. When using the notation {X; +} and {Y; +} to denote two different semigroups, it should of course be understood that the operation + associated with set X will, in general, be different from the operation + associated with set Y. Since it will usually be clear from context which particular operation is being used, the same symbol will be employed for both semigroups (however, on rare occasions we may wish to distinguish between different operations on different sets). Using the notation {X; +} and {Y; +} in Definition 2.2.1, Eq. (2.2.2) now assumes the form
ρ(x + y) = ρ(x) + ρ(y)     (2.2.4)
for every x, y ∈ X. This relation looks very much like the "linearity property" which will be the central topic of a large portion of the remainder of this book, and with which the reader is no doubt familiar. However, we emphasize here that the definition of "linear" will be reserved for a later occasion, and that the term homomorphism is not to be taken as being synonymous with linear. Nevertheless, we will see that many of the subsequent results for homomorphisms will recur with appropriate counterparts throughout this book.

2.2.5. Example. Let R denote the set of real numbers, and let "+" and "·" denote the usual operations of addition and multiplication on R. Then {R; +} and {R; ·} are semigroups. Let
f(x) = eˣ
for all x ∈ R. Then f is a homomorphism from {R; +} to {R; ·}. ∎
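Example 2.2.5 can be spot-checked numerically; the identity eˣ⁺ʸ = eˣ·eʸ is exactly Eq. (2.2.2) with the operation on the domain read as "+" and on the range as "·":

```python
# f(x) = e^x maps {R; +} into {R; .} homomorphically.
import math

def f(x):
    return math.exp(x)

for x in (-1.5, 0.0, 0.3, 2.0):
    for y in (-0.7, 0.0, 1.1):
        # homomorphism property, up to floating-point rounding
        assert math.isclose(f(x + y), f(x) * f(y), rel_tol=1e-12)
print("f(x + y) == f(x) * f(y) verified on sample points")
```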
2.2.6. Exercise. Let {X; +} and {X; ·} denote the semigroups defined in Example 2.1.17. Let f: X → X be defined as follows: f(0) = 1, f(1) = 3, f(2) = 1, and f(3) = 3. Show that f is a homomorphism from {X; +} into {X; ·}.

In order to simplify our notation even further, we will often use the symbol "·" in the remainder of the present chapter to denote operations for semigroups (or groups), say {X; ·}, {Y; ·}, and we will often refer to these simply as semigroup (or group) X and Y, respectively. In this case, if ρ denotes a homomorphism of X into Y, we write
ρ(x · y) = ρ(x) · ρ(y)
for all x, y ∈ X.

In Chapter 1 we classified mappings as being into, onto, one-to-one and into, and one-to-one and onto. Now if ρ is a homomorphism of a semigroup X into a semigroup Y, we can also classify homomorphisms as being into, onto, one-to-one and into, and one-to-one and onto. This classification gives rise to the following concepts.

2.2.7. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y.
(i) If ρ is a mapping of X onto Y, we say that X and Y are homomorphic semigroups, and we refer to X as being homomorphic to Y.
(ii) If ρ is a one-to-one mapping of X into Y, then ρ is called an isomorphism of X into Y.
(iii) If ρ is a mapping which is onto and one-to-one, we say that semigroup X is isomorphic to semigroup Y.
(iv) If X = Y (i.e., ρ is a homomorphism of semigroup X into itself), then ρ is called an endomorphism.
(v) If X = Y and if ρ is an isomorphism (i.e., ρ is an isomorphism of semigroup X into itself), then ρ is called an automorphism of X.
We note that since all groups are semigroups, the concepts introduced in the above definition necessarily apply also to groups. In connection with isomorphic semigroups (or groups) a very important observation is in order. We first note that if a semigroup (or group) X is isomorphic to a semigroup Y, then there exists a mapping ρ from X into Y which is one-to-one and onto. Thus, the inverse of ρ, ρ⁻¹, exists, and we can associate with each element of X one and only one element of Y, and vice versa. Secondly, we note that ρ is a homomorphism, i.e., ρ preserves the properties of the respective operations associated with semigroup (or group) X and semigroup (or group) Y; or, to put it another way, under ρ the (algebraic) properties of semigroups (or groups) X and Y are preserved. Hence, it should be clear that isomorphic semigroups (or groups) are essentially indistinguishable, the homomorphism (which is one-to-one and onto in this case) amounting to a mere relabeling of the elements of one set by the elements of a second set. We will encounter this type of phenomenon on several other occasions in this book. We are now ready to prove several results.
2.2.8. Theorem. Let ρ be a homomorphism from a semigroup X into a semigroup Y. Then
(i) ρ(X) is a subsemigroup of Y;
(ii) if X has an identity element e, then ρ(e) is an identity element of ρ(X);
(iii) if X has an identity element e, and if x ∈ X has an inverse x⁻¹, then ρ(x) has an inverse in ρ(X) and, in fact, [ρ(x)]⁻¹ = ρ(x⁻¹);
(iv) if X₁ is a subsemigroup of X, then ρ(X₁) is a subsemigroup of ρ(X); and
(v) if Y₁ is a subsemigroup of ρ(X), then
X₁ = {x ∈ X : ρ(x) ∈ Y₁}
is a subsemigroup of X.

Proof. To prove the first part we must show that the subset ρ(X) of Y is closed relative to the operation "·" on Y. Now if x′, y′ ∈ ρ(X), then there exists at least one x ∈ X and at least one y ∈ X such that ρ(x) = x′ and ρ(y) = y′. Since ρ is a homomorphism, we have
x′ · y′ = ρ(x) · ρ(y) = ρ(x · y),
and since x · y ∈ X it follows that x′ · y′ ∈ ρ(X) because ρ(x · y) ∈ ρ(X). Thus, ρ(X) is closed and, hence, is a subsemigroup of Y.

To prove the second part, note that since e ∈ X we have ρ(e) ∈ ρ(X), and since for any x′ ∈ ρ(X) there exists x ∈ X such that ρ(x) = x′, we have
ρ(e) · x′ = ρ(e) · ρ(x) = ρ(e · x) = ρ(x) = x′.
Since this is true for every x′ ∈ ρ(X), it follows that ρ(e) is a left identity element of ρ(X). Similarly, we can show that x′ · ρ(e) = x′ for every x′ ∈ ρ(X). Thus, ρ(e) is an identity element of the subsemigroup ρ(X) of Y.

To prove the third part of the theorem, note that since ρ is a homomorphism, we have
ρ(x) · ρ(x⁻¹) = ρ(x · x⁻¹) = ρ(e)
and
ρ(x⁻¹) · ρ(x) = ρ(x⁻¹ · x) = ρ(e);
i.e., ρ(e) is an identity element of ρ(X). Also, since ρ(x⁻¹) ∈ ρ(X), ρ(x) has an inverse in ρ(X), and [ρ(x)]⁻¹ = ρ(x⁻¹).

The proofs of parts (iv) and (v) of this theorem are left as an exercise. ∎

2.2.9. Exercise. Complete the proof of Theorem 2.2.8.
We emphasize that although ρ(e) in the above theorem is an identity element of the subsemigroup ρ(X) of Y, it is not necessarily true that ρ(e) has to be an identity element of Y.

2.2.10. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y. If ρ(X) has an identity element, say e′, then the subset of X, Kρ, defined by
Kρ = {x ∈ X : ρ(x) = e′}
is called the kernel of the homomorphism ρ.

It turns out that Kρ is a semigroup; i.e., we have:

2.2.11. Theorem. Kρ is a subsemigroup of X.

2.2.12. Exercise. Prove Theorem 2.2.11.
Now let X and Y be groups (instead of semigroups, as above), and let ρ be a homomorphism of X into Y. We have:

2.2.13. Theorem. Let ρ be a homomorphism from a group X into a group Y. Then
(i) ρ(X) is a subgroup of Y; and
(ii) if e is the identity element of X, then ρ(e) is the identity element of Y.
Proof. To prove the first part, let e denote the identity element of X. By part (i) of Theorem 2.2.8, ρ(X) is a subsemigroup of Y; by part (ii) of Theorem 2.2.8, ρ(e) is an identity element of ρ(X); and by part (iii) of the same theorem, it follows that every element of ρ(X) has an inverse. Thus, ρ(X) is a subgroup of Y. The second part of this theorem follows from Theorem 2.1.28 and from part (ii) of Theorem 2.2.8. ∎

The following result is known as Cayley's theorem.

2.2.14. Theorem. Let {X; ·} be a group, and let {P(X); ∘} denote the permutation group on X. Then X is isomorphic to a subgroup of P(X).

Proof. For each a ∈ X, define the mapping f_a: X → X by f_a(x) = a · x for each x ∈ X. If x, y ∈ X and f_a(x) = f_a(y), then a · x = a · y, and so x = y. Hence, f_a is an injective mapping. Now let y ∈ X. Then a⁻¹ · y ∈ X, and f_a(a⁻¹ · y) = y. This implies that f_a is surjective. Hence, f_a is a one-to-one mapping of X onto X, which implies that f_a is a permutation on X; i.e., f_a ∈ P(X). Now define the function φ: X → P(X) by φ(a) = f_a for each a ∈ X. Let u, v ∈ X. For each x ∈ X, f_{u·v}(x) = (u · v) · x = u · (v · x) = f_u(v · x) = f_u(f_v(x)) = (f_u ∘ f_v)(x). Thus, f_{u·v} = f_u ∘ f_v for all u, v ∈ X. Since φ(u · v) = f_{u·v} and φ(u) ∘ φ(v) = f_u ∘ f_v, it follows that φ(u · v) = φ(u) ∘ φ(v), and so φ is a homomorphism. Suppose u, v ∈ X are such that φ(u) = φ(v). Then f_u = f_v, which implies that f_u(x) = f_v(x) for all x ∈ X. In particular, f_u(e) = f_v(e). Hence, u · e = v · e, so that u = v. This implies that φ is injective. It follows that φ is a one-to-one mapping of X onto φ(X). By Theorem 2.2.13, part (i), φ(X) is a subgroup of P(X). This completes the proof. ∎

We also have:

2.2.15. Theorem. Let ρ be a homomorphism of a semigroup X into a semigroup Y, and let ρ be an isomorphism of X with ρ(X). Then
(i) ρ⁻¹ is an isomorphism of ρ(X) with X; and
(ii) if ρ(X) contains an identity element e′, then ρ⁻¹(e′) = e is an identity element of X, and Kρ = {e} and K_{ρ⁻¹} = {e′} (Kρ denotes the kernel of the homomorphism ρ).

Proof. To prove the first part of the theorem, let x′, y′ ∈ ρ(X). Then there exist unique x, y ∈ X such that ρ(x) = x′ and ρ(y) = y′, and ρ⁻¹(x′) = x and ρ⁻¹(y′) = y. Since
ρ(x · y) = ρ(x) · ρ(y) = x′ · y′,
we have
ρ⁻¹(x′ · y′) = x · y = ρ⁻¹(x′) · ρ⁻¹(y′).
Since this is true for all x′, y′ ∈ ρ(X), it follows that ρ⁻¹ is an isomorphism of ρ(X) with X.

To prove the second part of the theorem, we first note that ρ(X) is a subsemigroup of Y by Theorem 2.2.8. It follows from Theorem 2.2.13 that e = ρ⁻¹(e′) is an identity element of X. Now let ρ(k) = e′. Since ρ(e) = e′, it follows that k = e and that Kρ = {e}. We can similarly show that K_{ρ⁻¹} = {e′}. ∎
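The construction in the proof of Cayley's theorem (Theorem 2.2.14) can be carried out concretely. The sketch below (a hypothetical instance, taking X to be the integers mod 4 under addition) builds φ(a) = f_a and checks that φ is an injective homomorphism into the permutations of X:

```python
# Cayley's theorem for Z_4: a |-> the permutation f_a(x) = a . x.
N = 4

def op(a, x):                      # the group operation on X
    return (a + x) % N

def phi(a):                        # phi(a) = f_a, stored as a value tuple
    return tuple(op(a, x) for x in range(N))

def compose(f, g):                 # (f o g)(x) = f(g(x)) on such tuples
    return tuple(f[g[x]] for x in range(N))

for u in range(N):
    for v in range(N):
        # homomorphism property: phi(u . v) = phi(u) o phi(v)
        assert phi(op(u, v)) == compose(phi(u), phi(v))

# phi is injective: distinct group elements give distinct permutations.
assert len({phi(a) for a in range(N)}) == N
print("Z_4 embeds in its permutation group:", [phi(a) for a in range(N)])
```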
From the above result we can now conclude that if a semigroup X is isomorphic to a semigroup Y, then the semigroup Y is isomorphic to the semigroup X. For endomorphisms and automorphisms we have:

2.2.16. Theorem. Let η and ψ be homomorphisms of a semigroup X into itself.
(i) If η and ψ are endomorphisms of X, then the composite mapping ψ ∘ η is likewise an endomorphism of X.
(ii) If η and ψ are automorphisms of X, then ψ ∘ η is an automorphism of X.
(iii) If η is an automorphism of X, then η⁻¹ is also an automorphism of X.
Proof. To prove the first part, note that η and ψ are both mappings of X into X, and thus ψ ∘ η is a mapping of X into X. Also, by definition, (ψ ∘ η)(x) = ψ(η(x)) for every x ∈ X. Now since η(x · y) = η(x) · η(y) and ψ(x · y) = ψ(x) · ψ(y) for every x, y ∈ X, we have
(ψ ∘ η)(x · y) = ψ(η(x · y)) = ψ(η(x) · η(y)) = ψ(η(x)) · ψ(η(y)) = (ψ ∘ η)(x) · (ψ ∘ η)(y).
This implies that the mapping ψ ∘ η is an endomorphism of X. The proofs of the second and third parts of this theorem are left as an exercise. ∎

2.2.17. Exercise. Complete the proof of the above theorem.
Let us next consider homomorphisms of rings. To this end let, henceforth, X and Y be arbitrary rings, and without loss of generality let the operations of these two rings be denoted by "+" and "·".
2.2.18. Definition. Let X and Y be two rings. A mapping ρ of set X into set Y is called a homomorphism of the ring X into the ring Y if
(i) ρ(x + y) = ρ(x) + ρ(y); and
(ii) ρ(x · y) = ρ(x) · ρ(y)
for every x, y ∈ X. The image of X in Y, denoted by ρ(X), is called the homomorphic image of X.
If a homomorphism ρ is a one-to-one mapping of a ring X into a ring Y, then ρ is called an isomorphism of X into Y. If the isomorphism ρ is an onto mapping of X into Y, then ρ is called an isomorphism of X with Y. Furthermore, if ρ is a homomorphism of X into X, then ρ is called an endomorphism of the ring X. Finally, an isomorphism of X with itself is called an automorphism of ring X.

The properties associated with homomorphisms of groups and semigroups can, of course, be utilized when discussing homomorphisms of rings.

2.2.19. Theorem. Let ρ be a homomorphism of a ring X into a ring Y.
(i) The homomorphic image ρ(X) is a subring of Y.
(ii) If X₁ is a subring of X, then ρ(X₁) is a subring of ρ(X).
(iii) Let Y₁ be a subring of ρ(X). Then the subset X₁ ⊆ X defined by
X₁ = {x ∈ X : ρ(x) ∈ Y₁}
is a subring of X.
(iv) Let Z be a ring and let ψ be a homomorphism of Y into Z. Then the composite mapping ψ ∘ ρ is a homomorphism of X into Z.

Proof. To prove the first part of the theorem, we note that the homomorphic image ρ(X) is clearly the homomorphic image of the group {X; +} and of the semigroup {X; ·}. Since this homomorphic image is a subgroup of {Y; +} and a subsemigroup of {Y; ·}, it follows from Theorem 2.1.72 that ρ(X) is a subring of Y. The proofs of the remaining parts of this theorem are left as an exercise. ∎

2.2.20. Exercise. Prove parts (ii), (iii), and (iv) of Theorem 2.2.19.
Analogous to Definition 2.2.10, we make the following definition.

2.2.21. Definition. If ρ is a homomorphism of a ring X into a ring Y, then the subset Kρ of X defined by
Kρ = {z ∈ X : ρ(z) = 0}
is called the kernel of the homomorphism ρ of the ring X into Y.

We close the present section by introducing one more concept.

2.2.22. Definition. Let {R; +, ·} be a ring with identity, and let X and Y be two R-modules. A mapping f: X → Y is called an R-homomorphism if, for all u, v ∈ X and α ∈ R, the relations
(i) f(u + v) = f(u) + f(v); and
(ii) f(αu) = αf(u)
hold.

In the next chapter we will consider in great detail a special class of vector spaces and homomorphisms, and for this reason we will not pursue this subject any further at this time.
2.3. APPLICATION TO POLYNOMIALS
Polynomials play an important role in many branches of mathematics as well as in science and engineering. In the present section we briefly consider applications of some of the concepts of the preceding sections to polynomials.

First, we wish to give an abstract definition for a polynomial function. Basically, we want this function to take the form
f(t) = a₀ + a₁t + ... + aₙtⁿ.
However, we are not looking for a way of defining the value of f(t) for each t, but instead we seek a definition of f in terms of the indexed set {a₀, ..., aₙ}. To this end we let the aᵢ belong to some field. More formally, let F be a field and define a set P as follows. If a ∈ P, then a denotes an infinite sequence of elements from F in which all except a finite number are zero. Thus, if a ∈ P, then
a = {a₀, a₁, ..., aₙ, 0, 0, ...}.
That is to say, there exists some integer n ≥ 0 such that aᵢ = 0 for all i > n. Now let b be another element of P, where
b = {b₀, b₁, ..., bₘ, 0, 0, ...}.
We say that a = b if and only if aᵢ = bᵢ for all i. We now define the operation "+" on P by
a + b = {a₀ + b₀, a₁ + b₁, ...}.
Thus, if n ≥ m, then aᵢ + bᵢ = 0 for all i > n, and P is clearly closed with respect to "+". Next, we define the operation "·" on P by
a · b = c = {c₀, c₁, ...},
where
cₖ = a₀bₖ + a₁bₖ₋₁ + ... + aₖb₀
for all k. In this case cₖ = 0 for all k > m + n, and P is also closed with respect to the operation "·".

Now let us define
0 = {0, 0, ...}.
Then 0 ∈ P and {P; +} is clearly an abelian group with identity 0. Next, define
e = {1, 0, 0, ...}.
Then e ∈ P and {P; ·} is obviously a monoid with e as its identity element. We can now easily prove the following:
2.3.1. Theorem. The mathematical system {P; +, ·} is a commutative ring with identity. It is called the ring of polynomials over the field F.

2.3.2. Exercise. Prove Theorem 2.3.1.
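The ring operations behind Theorem 2.3.1 are easy to realize concretely. A minimal sketch, taking F to be the rationals and representing an element of P by its finite list of coefficients (helper names are illustrative only):

```python
# Polynomials as coefficient lists: "+" componentwise, "." by convolution
# c_k = a_0*b_k + a_1*b_{k-1} + ... + a_k*b_0.
from fractions import Fraction

def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

F = Fraction
a = [F(1), F(2)]          # 1 + 2t
b = [F(0), F(1), F(3)]    # t + 3t^2

assert poly_add(a, b) == [F(1), F(3), F(3)]          # 1 + 3t + 3t^2
assert poly_mul(a, b) == [F(0), F(1), F(5), F(6)]    # t + 5t^2 + 6t^3
# commutativity of "." and the identity e = {1, 0, 0, ...}:
assert poly_mul(a, b) == poly_mul(b, a)
assert poly_mul(a, [F(1)]) == a
print("polynomial ring operations verified")
```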
Let us next complete the connection between our abstract characterization of polynomials and the function f(t) we originally introduced. To this end we let
t⁰ = {1, 0, 0, ...},
t¹ = {0, 1, 0, 0, ...},
t² = {0, 0, 1, 0, ...},
t³ = {0, 0, 0, 1, 0, ...},
and so forth.
At this point we still cannot give meaning to aᵢtⁱ, because aᵢ ∈ F and tⁱ ∈ P. However, if we make the obvious identification {aᵢ, 0, 0, ...} ∈ P, and if we denote this element simply by aᵢ ∈ P, then we have
f(t) = a₀ · t⁰ + a₁ · t¹ + ... + aₙ · tⁿ.
Thus, we can represent f(t) uniquely by the sequence {a₀, a₁, ..., aₙ, 0, ...}. By convention, we henceforth omit the symbol "·" and write, e.g.,
f(t) = a₀ + a₁t + ... + aₙtⁿ.
We assign t appearing in the argument of f(t) a special name.
2.3.3. Definition. Let {P; +, ·} be the polynomial ring over a field F. The element t ∈ P, t = {0, 1, 0, ...}, is called the indeterminate of P.

To simplify notation, we denote by F[t] the ring of polynomials over a field F, and we identify elements of F[t] (i.e., polynomials) by making use of the argument t, e.g., f(t) ∈ F[t].
2.3.4. Definition. Let f(t) ∈ F[t], and let f(t) = {f₀, f₁, ..., fₙ, ...}, where fᵢ ∈ F for all i. The polynomial f(t) is said to be of order n, or of degree n, if fₙ ≠ 0 and if fᵢ = 0 for all i > n. In this case we write deg f(t) = n, and we call fₙ the leading coefficient of f. If fₙ = 1 and fᵢ = 0 for all i > n, then f(t) is said to be monic.

If every coefficient of a polynomial f is zero, then f is called the zero polynomial. The order of the zero polynomial is not defined.
2.3.5. Theorem. Let f(t) be a polynomial of order n and let g(t) be a polynomial of order m. Then f(t)g(t) is a polynomial of order m + n.

Proof. Let f(t) = f₀ + f₁t + ... + fₙtⁿ, let g(t) = g₀ + g₁t + ... + gₘtᵐ, and let h(t) = f(t)g(t). Then
hₖ = f₀gₖ + f₁gₖ₋₁ + ... + fₖg₀.
Since fᵢ = 0 for i > n and gⱼ = 0 for j > m, the largest possible value of k such that hₖ is non-zero occurs for k = m + n; i.e.,
hₘ₊ₙ = fₙgₘ.
Since F is a field, fₙ and gₘ cannot be zero divisors, and thus hₘ₊ₙ ≠ 0. Therefore, hₘ₊ₙ ≠ 0 and hₖ = 0 for all k > m + n, so that deg f(t)g(t) = m + n. ∎
The reader can readily prove the next result. 2.3.6. Theorem. The ring F ( t) of polynomials over a field F is an integral domain. 2.3.7. Exercise. Prove Theorem 2.3.6. Our next result shows that, in general, we cannot go any further than integral domain for t[ F l. 2.3.8. Theorem. Let f(t) E t[F .] if and only if f(t) is of order zero.
Then f(t) has an inverse relative to "."
Proof
Let f(t) E t[F J be of order n, and assume that f(t) has an inverse relative to ".", denoted by f- I (t), which is of order m. Then f(t)f- I (t)
where e =
=
e,
{I, 0, 0, ... J is of order ez ro. By Theorem 2.3.5 the degree of + n = 0 and since m > 0 and n > 0, we must
+
f(t)f- 1 (t) is m n. Thus, m havem = n = O. Conversely, let f(t) = fo = = fo 1 = { f o· , 0, 0, ... J . •
f{ o, 0, 0, ... ,J where fo
*-
O. Then f- I (t)
In the case of polynomials of order zero we omit the notation t, and we say f(t) is a scalar. Thus, if c(t) is a polynomial of order zero, we have c(t) = c, where c ≠ 0. We see immediately that cf(t) = cf_0 + cf_1 t + ... + cf_n t^n for all f(t) ∈ F[t]. The following result, which we will require in Chapter 4, is sometimes called the division algorithm.
Chapter 2 | Algebraic Structures
2.3.9. Theorem. Let f(t), g(t) ∈ F[t] and assume that g(t) ≠ 0. Then there exist unique elements q(t) and r(t) in F[t] such that
f(t) = q(t)g(t) + r(t),   (2.3.10)
where either r(t) = 0 or deg r(t) < deg g(t).

Proof. If f(t) = 0 or if deg f(t) < deg g(t), then Eq. (2.3.10) is satisfied with q(t) = 0 and r(t) = f(t). If deg g(t) = 0, i.e., g(t) = c, then f(t) = [c^{-1} f(t)] · c, and Eq. (2.3.10) holds with q(t) = c^{-1} f(t) and r(t) = 0.
Assume now that deg f(t) ≥ deg g(t) ≥ 1. The proof is by induction on the degree of the polynomial f(t). Thus, let us assume that Eq. (2.3.10) holds for deg f(t) = n. We first prove our assertion for n = 1 and then for n + 1. Assume that deg f(t) = 1, i.e., f(t) = a_0 + a_1 t, where a_1 ≠ 0. We need only consider the case g(t) = b_0 + b_1 t, where b_1 ≠ 0. We readily see that Eq. (2.3.10) is satisfied with q(t) = a_1 b_1^{-1} and r(t) = a_0 − a_1 b_1^{-1} b_0. Now assume that Eq. (2.3.10) holds for deg f(t) = k, where k = 1, ..., n. We want to show that this implies the validity of Eq. (2.3.10) for deg f(t) = n + 1. Let
f(t) = a_0 + a_1 t + ... + a_{n+1} t^{n+1},
where a_{n+1} ≠ 0. Let deg g(t) = m. We may assume that 0 < m ≤ n + 1. Let g(t) = b_0 + b_1 t + ... + b_m t^m, where b_m ≠ 0. It is now readily verified that
f(t) = b_m^{-1} a_{n+1} t^{n+1-m} g(t) + [f(t) − b_m^{-1} a_{n+1} t^{n+1-m} g(t)].   (2.3.11)
Now let h(t) = f(t) − b_m^{-1} a_{n+1} t^{n+1-m} g(t). It can readily be verified that the coefficient of t^{n+1} in h(t) is 0. Hence, either h(t) = 0 or deg h(t) < n + 1. By our induction hypothesis, this implies there exist polynomials s(t) and r(t) such that h(t) = s(t)g(t) + r(t), where r(t) = 0 or deg r(t) < deg g(t). Substituting the expression for h(t) into Eq. (2.3.11), we have
f(t) = [b_m^{-1} a_{n+1} t^{n+1-m} + s(t)]g(t) + r(t).
Thus, Eq. (2.3.10) is satisfied, and the proof of the existence of q(t) and r(t) is complete. The proof of the uniqueness of q(t) and r(t) is left as an exercise. ∎

2.3.12. Exercise. Prove that q(t) and r(t) in Theorem 2.3.9 are unique.
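The leading-term elimination step in the proof of Theorem 2.3.9 is exactly what polynomial long division does. The following sketch (our own, not from the text; the name `poly_divmod` and the list representation are assumptions) repeats the step of Eq. (2.3.11) until the remainder has degree smaller than deg g(t).

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] or deg r < deg g.
    Polynomials are coefficient lists [a0, a1, ...] with no trailing zeros."""
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    while r and len(r) >= len(g):
        # leading-term step b_m^{-1} a_{n+1} t^{n+1-m} of Eq. (2.3.11)
        coeff = r[-1] / Fraction(g[-1])
        shift = len(r) - len(g)
        q[shift] = coeff
        for i, gi in enumerate(g):
            r[i + shift] -= coeff * Fraction(gi)
        while r and r[-1] == 0:     # strip the cancelled leading term
            r.pop()
    while q and q[-1] == 0:
        q.pop()
    return q, r

# t^3 - 1 = (t^2 + t + 1)(t - 1) + 0
q, r = poly_divmod([-1, 0, 0, 1], [-1, 1])
assert q == [1, 1, 1] and r == []
```

Since the coefficients live in a field (here the rationals), the division by the leading coefficient `g[-1]` is always possible; over a mere ring this step can fail.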
The preceding result motivates the following definition.

2.3.13. Definition. Let f(t) and g(t) be any non-zero polynomials. Let q(t) and r(t) be the unique polynomials such that f(t) = q(t)g(t) + r(t), where either r(t) = 0 or deg r(t) < deg g(t). We call q(t) the quotient and r(t) the remainder in the division of f(t) by g(t). If r(t) = 0, we say that g(t) divides f(t) or is a factor of f(t).
Next, we prove:

2.3.14. Theorem. Let F[t] denote the ring of polynomials over a field F. Let f(t) and g(t) be non-zero polynomials in F[t]. Then there exists a unique monic polynomial, d(t), such that (i) d(t) divides f(t) and g(t), and (ii) if d′(t) is any polynomial which divides f(t) and g(t), then d′(t) divides d(t).

Proof. Let
K[t] = {x(t) ∈ F[t] : x(t) = m(t)f(t) + n(t)g(t), where m(t), n(t) ∈ F[t]}.
We note that f(t), g(t) ∈ K[t]. Furthermore, if a(t), b(t) ∈ K[t], then a(t) + b(t) ∈ K[t] and a(t)b(t) ∈ K[t]. Also, if c is a scalar, then ca(t) ∈ K[t] for all a(t) ∈ K[t]. Now let d(t) be a polynomial of lowest degree in K[t]. Since all scalar multiples of d(t) belong to K[t], we may assume that d(t) is monic. We now show that for any h(t) ∈ K[t] there is a q(t) ∈ F[t] such that h(t) = d(t)q(t). To prove this, we know from Theorem 2.3.9 that there exist unique elements q(t) and r(t) in F[t] such that h(t) = q(t)d(t) + r(t), where either r(t) = 0 or deg r(t) < deg d(t). Since d(t) ∈ K[t] and q(t) ∈ F[t], it follows that q(t)d(t) ∈ K[t]. Also, since h(t) ∈ K[t], it follows that r(t) = h(t) − q(t)d(t) ∈ K[t]. Since d(t) is a polynomial of smallest degree in K[t], it follows that r(t) = 0. Hence, d(t) divides every polynomial in K[t]. To show that d(t) is unique, suppose d_1(t) is another monic polynomial in K[t] which divides every polynomial in K[t]. Then d(t) = a(t)d_1(t) and d_1(t) = b(t)d(t) for some a(t), b(t) ∈ F[t]. It can readily be verified that this is true only when a(t) = b(t) = 1. Now, since f(t), g(t) ∈ K[t], part (i) of the theorem has been proven. To prove part (ii), let a(t), b(t) ∈ F[t] be such that f(t) = a(t)d′(t) and g(t) = b(t)d′(t). Since d(t) ∈ K[t], there exist polynomials m(t), n(t) such that d(t) = m(t)f(t) + n(t)g(t). Hence,
d(t) = m(t)a(t)d′(t) + n(t)b(t)d′(t) = [m(t)a(t) + n(t)b(t)]d′(t).
This implies that d′(t) divides d(t) and completes the proof of the theorem. ∎
The polynomial d(t) in the preceding theorem is called the greatest common divisor of f(t) and g(t). If d(t) = 1, then f(t) and g(t) are said to be relatively prime.

2.3.15. Exercise. Show that if d(t) is the greatest common divisor of f(t) and g(t), then there exist polynomials m(t) and n(t) such that
d(t) = m(t)f(t) + n(t)g(t).
If f(t) and g(t) are relatively prime, then
1 = m(t)f(t) + n(t)g(t).
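The greatest common divisor of Theorem 2.3.14 can be computed by repeated application of the division algorithm (the Euclidean algorithm). The sketch below is our own illustration under the same coefficient-list convention as before; `poly_rem` and `poly_gcd` are names we introduce for this example.

```python
from fractions import Fraction

def poly_rem(f, g):
    """Remainder of f divided by g (coefficient lists, low degree to high)."""
    r = [Fraction(c) for c in f]
    while r and len(r) >= len(g):
        coeff = r[-1] / Fraction(g[-1])
        shift = len(r) - len(g)
        for i, gi in enumerate(g):
            r[i + shift] -= coeff * Fraction(gi)
        while r and r[-1] == 0:
            r.pop()
    return r

def poly_gcd(f, g):
    """Monic greatest common divisor, as in Theorem 2.3.14."""
    a, b = f, g
    while b:
        a, b = b, poly_rem(a, b)   # gcd(f, g) = gcd(g, f mod g)
    lead = a[-1]
    return [c / lead for c in a]   # normalize so the result is monic

f = [-1, 0, 1]   # t^2 - 1 = (t - 1)(t + 1)
g = [0, -1, 1]   # t^2 - t = t(t - 1)
assert poly_gcd(f, g) == [-1, 1]   # common factor t - 1
```

Normalizing by the leading coefficient at the end is what makes the answer the unique monic divisor promised by the theorem.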
Now let f(t) ∈ F[t] be of positive degree. If f(t) = g(t)h(t) implies that either g(t) is a scalar or h(t) is a scalar, then f(t) is said to be irreducible. We close the present section with a statement of the fundamental theorem of algebra.

2.3.16. Theorem. Let f(t) ∈ F[t] be a non-zero polynomial. Let R denote the field of real numbers and let C denote the field of complex numbers.
(i) If F = C, then f(t) can be written uniquely, except for order, as a product
f(t) = c(t − c_1)(t − c_2) ··· (t − c_n),
where c, c_1, ..., c_n ∈ C.
(ii) If F = R, then f(t) can be written uniquely, except for order, as a product f(t) = c f_1(t) f_2(t) ··· f_m(t), where c ∈ R and the f_1(t), ..., f_m(t) are monic irreducible polynomials of degree one or two.
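Part (i) of the theorem can be illustrated numerically. The snippet below is our own sketch using NumPy's root finder (not part of the text); `numpy.roots` takes coefficients from highest to lowest degree and returns the c_i of the factorization, here for a polynomial whose roots happen to be real.

```python
import numpy as np

# f(t) = t^3 - 6t^2 + 11t - 6 = (t - 1)(t - 2)(t - 3), so c = 1 and c_i = 1, 2, 3
roots = np.roots([1, -6, 11, -6])
assert sorted(int(round(r.real)) for r in roots) == [1, 2, 3]
```

Rounding is needed because the roots are computed in floating point; the theorem itself is exact.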
2.4. REFERENCES AND NOTES
There are many excellent texts on abstract algebra. For an introductory exposition of this subject refer, e.g., to Birkhoff and MacLane [2.1], Hanneken [2.2], Hu [2.3], Jacobson [2.4], and McCoy [2.6]. The books by Birkhoff and MacLane and by Jacobson are standard references. The texts by Hu and McCoy are very readable. The excellent presentation by Hanneken is concise, somewhat abstract, yet very readable. Polynomials over a field are treated extensively in these references. For a brief summary of the properties of polynomials over a field, refer also to Lipschutz [2.5].
REFERENCES

[2.1] G. BIRKHOFF and S. MACLANE, A Survey of Modern Algebra. New York: The Macmillan Company, 1965.
[2.2] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[2.3] S. T. HU, Elements of Modern Algebra. San Francisco, Calif.: Holden-Day, Inc., 1965.
[2.4] N. JACOBSON, Lectures in Abstract Algebra. New York: D. Van Nostrand Company, Inc., 1951.
[2.5] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[2.6] N. H. MCCOY, Fundamentals of Abstract Algebra. Boston: Allyn & Bacon, Inc., 1972.
3
VECTOR SPACES AND LINEAR TRANSFORMATIONS
In Chapter 1 we considered the set-theoretic structure of mathematical systems, and in Chapter 2 we developed to various degrees of complexity the algebraic structure of mathematical systems. One of the mathematical systems introduced in Chapter 2 was the linear or vector space, a concept of great importance in mathematics and its applications. In the present chapter we further examine properties of linear spaces. Then we consider special types of mappings defined on linear spaces, called linear transformations, and establish several important properties of linear transformations. In the next chapter we will concern ourselves with finite-dimensional vector spaces, and we will consider matrices, which are used to represent linear transformations on finite-dimensional vector spaces.
3.1. LINEAR SPACES
We begin by restating the definition of linear space.

3.1.1. Definition. Let X be a non-empty set, let F be a field, let "+" denote a mapping of X × X into X, and let "·" denote a mapping of F × X into X. Let the members x ∈ X be called vectors, let the elements α ∈ F be called scalars, let the operation "+" defined on X be called vector addition, and let the mapping "·" be called scalar multiplication or multiplication of vectors by scalars. Then for each x, y ∈ X there is a unique element, x + y ∈ X, called the sum of x and y, and for each x ∈ X and α ∈ F there is a unique element, α · x ≜ αx ∈ X, called the multiple of x by α. We say that the non-empty set X and the field F, along with the two mappings of vector addition and scalar multiplication, constitute a vector space or a linear space if the following axioms are satisfied:

(i) x + y = y + x for every x, y ∈ X;
(ii) x + (y + z) = (x + y) + z for every x, y, z ∈ X;
(iii) there is a unique vector in X, called the zero vector or the null vector or the origin, which is denoted by 0 and which has the property that 0 + x = x for all x ∈ X;
(iv) α(x + y) = αx + αy for all α ∈ F and for all x, y ∈ X;
(v) (α + β)x = αx + βx for all α, β ∈ F and for all x ∈ X;
(vi) (αβ)x = α(βx) for all α, β ∈ F and for all x ∈ X;
(vii) 0x = 0 for all x ∈ X; and
(viii) 1x = x for all x ∈ X.

The reader may find it instructive to review the axioms of a field, which are summarized in Definition 2.1.63. In (v) the "+" on the left-hand side denotes the operation of addition on F; the "+" on the right-hand side denotes vector addition. Also, in (vi) αβ ≜ α · β, where "·" denotes the operation of multiplication on F. In (vii) the symbol 0 on the left-hand side is a scalar; the same symbol on the right-hand side denotes a vector. The 1 on the left-hand side of (viii) is the identity element of F relative to "·".

To indicate the relationship between the set of vectors X and the underlying field F, we sometimes refer to a vector space X over field F. However, usually we speak of a vector space X without making explicit reference to the field F and to the operations of vector addition and scalar multiplication. If F is the field of real numbers we call our vector space a real vector space. Similarly, if F is the field of complex numbers, we speak of a complex vector space. Throughout this chapter we will usually use lower case Latin letters (e.g., x, y, z) to denote vectors (i.e., elements of X) and lower case Greek letters (e.g., α, β, γ) to denote scalars (i.e., elements of F).

If we agree to denote the element (−1)x ∈ X simply by −x, i.e., (−1)x ≜ −x, then we have x − x = 1x + (−1)x = (1 − 1)x = 0x = 0. Thus, if X is a vector space, then for every x ∈ X there is a unique vector, denoted −x, such that x − x = 0. There are several other elementary properties of vector spaces which are a direct consequence of the above axioms. Some of these are summarized below. The reader will have no difficulties in verifying these.
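The axioms can be spot-checked numerically for a concrete candidate space. The sketch below is our own illustration (not from the text): pairs of rationals with componentwise operations, with a few of the axioms tested on sample data; the helper names `vadd` and `smul` are assumptions of this example.

```python
from fractions import Fraction as Fr

def vadd(x, y):
    """Vector addition on F^2, componentwise."""
    return (x[0] + y[0], x[1] + y[1])

def smul(a, x):
    """Scalar multiplication on F^2."""
    return (a * x[0], a * x[1])

x, y = (Fr(1), Fr(2)), (Fr(-3), Fr(5))
a, b = Fr(2, 3), Fr(-1, 4)

assert vadd(x, y) == vadd(y, x)                              # axiom (i)
assert smul(a, vadd(x, y)) == vadd(smul(a, x), smul(a, y))   # axiom (iv)
assert smul(a + b, x) == vadd(smul(a, x), smul(b, x))        # axiom (v)
assert smul(Fr(0), x) == (Fr(0), Fr(0))                      # axiom (vii)
```

Of course, finitely many numerical checks do not prove the axioms; for spaces such as F^n they are verified once and for all from the field axioms, as the text does below.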
3.1.2. Theorem. Let X be a vector space. If x, y, z are elements in X and if α, β are any members of F, then the following hold:
(i) if αx = αy and α ≠ 0, then x = y;
(ii) if αx = βx and x ≠ 0, then α = β;
(iii) if x + y = x + z, then y = z;
(iv) α0 = 0;
(v) α(x − y) = αx − αy;
(vi) (α − β)x = αx − βx; and
(vii) x + y = 0 implies that x = −y.

3.1.3. Exercise. Prove Theorem 3.1.2.
We now consider several important examples of vector spaces.

3.1.4. Example. Let X be the set of all "arrows" in the "plane" emanating from a reference point which we call the origin or the zero vector or the null vector, and which we denote by 0. Let F denote the set of real numbers, and let vector addition and scalar multiplication be defined in the usual way, as shown in Figure A.

3.1.5. Figure A. [Vectors x and y, their sum x + y, and several scalar multiples of y.]

The reader can readily verify that, for the space described above, all the axioms of a linear space are satisfied, and hence X is a vector space. ∎

The purpose of the above example is to provide an intuitive idea of a linear space. We will utilize this space occasionally for purposes of motivation in our development. We must point out, however, that the terms "plane" and "arrows" were not formally defined, and thus the space X was not really properly defined. In the examples which follow, we give a more precise formulation of vector spaces.
3.1.6. Example. Let X = R denote the set of real numbers, and let F also denote the set of real numbers. We define vector addition to be the usual addition of real numbers and multiplication of vectors x ∈ R by scalars α ∈ F to be the usual multiplication of real numbers. It is a simple matter to show that this space is a linear space. ∎
3.1.7. Example. Let X = F^n denote the set of all ordered n-tuples of elements from field F. Thus, if x ∈ F^n, then x = (ξ_1, ξ_2, ..., ξ_n), where ξ_i ∈ F, i = 1, ..., n. With x, y ∈ F^n and α ∈ F, let vector addition and scalar multiplication be defined as
x + y = (ξ_1, ξ_2, ..., ξ_n) + (η_1, η_2, ..., η_n) = (ξ_1 + η_1, ξ_2 + η_2, ..., ξ_n + η_n)   (3.1.8)
and
αx = α(ξ_1, ξ_2, ..., ξ_n) = (αξ_1, αξ_2, ..., αξ_n).   (3.1.9)
It should be noted that the symbol "+" on the right-hand side of Eq. (3.1.8) denotes addition on the field F, and the symbol "+" on the left-hand side of Eq. (3.1.8) designates vector addition. (See Theorem 2.1.88.) In the present case the null vector is defined as 0 = (0, 0, ..., 0), and the vector −x is defined by −x = −(ξ_1, ξ_2, ..., ξ_n) = (−ξ_1, −ξ_2, ..., −ξ_n). Utilizing the properties of the field F, all axioms of Definition 3.1.1 are readily verified, and F^n is thus a vector space. We call this space the space F^n of n-tuples of elements of F. ∎

3.1.10. Example. In Example 3.1.7 let F = R, the field of real numbers. Then X = R^n denotes the set of all n-tuples of real numbers. We call the vector space R^n the n-dimensional real coordinate space. Similarly, in Example 3.1.7 let F = C, the field of complex numbers. Then X = C^n designates the set of all n-tuples of complex numbers. The linear space C^n is called the n-dimensional complex coordinate space. ∎

In the previous example we used the term dimension. At a later point in the present chapter the concept of dimension will be defined precisely and some of its properties will be examined in detail.
3.1.11. Example. Let X denote the set of all infinite sequences of real numbers of the form
x = (ξ_1, ξ_2, ..., ξ_k, ...),   (3.1.12)
let F denote the field of real numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). It is again an easy matter to show that this space is a vector space. We point out that this space, which we denote by R^∞, is simply the collection of all infinite sequences; i.e., there is no requirement that any type of convergence of the sequence be implied. ∎

3.1.13. Example. Let X = C^∞ denote the set of all infinite sequences of complex numbers of the form (3.1.12), let F represent the field of complex numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). Then C^∞ is a vector space. ∎

3.1.14. Example. Let X denote the set of all sequences of real numbers having only a finite number of non-zero terms. Thus, if x ∈ X, then
x = (ξ_1, ξ_2, ..., ξ_l, 0, 0, ...)   (3.1.15)
for some positive integer l. If we define vector addition similarly as in Eq. (3.1.8), if we define scalar multiplication similarly as in Eq. (3.1.9), and if we let F be the field of real numbers, then we can readily show that X is a real vector space. We call this space the space of finitely non-zero sequences. If X denotes the set of all sequences of complex numbers of the form (3.1.15), and if vector addition and scalar multiplication are defined similarly as in equations (3.1.8) and (3.1.9), respectively, then X is again a vector space (a complex vector space). ∎

3.1.16. Example. Let X be the set of infinite sequences of real numbers of the form (3.1.12), with the property that lim_{k→∞} ξ_k = 0. If F is the field of real numbers, if vector addition is defined similarly as in Eq. (3.1.8), and if scalar multiplication is defined similarly as in Eq. (3.1.9), then X is a vector space. This is so because the sum of two sequences which converge to zero also converges to zero, and because the scalar multiple of a sequence converging to zero also converges to zero. ∎

3.1.17. Example. Let X be the set of infinite sequences of real numbers of the form (3.1.12) which are bounded. If vector addition and scalar multiplication are again defined similarly as in (3.1.8) and (3.1.9), respectively, and if F denotes the field of real numbers, then X is a vector space.
This space is called the space of bounded real sequences. There also exists, of course, a complex counterpart to this space, the space of bounded complex sequences. ∎

3.1.18. Example. Let X denote the set of infinite sequences of real numbers of the form (3.1.12), with the property that
Σ_{i=1}^∞ |ξ_i| < ∞.
Let F be the field of real numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq. (3.1.9). Then X is a vector space. ∎
3.1.19. Example. Let X be the set of all real-valued continuous functions defined on the interval [a, b]. Thus, if x ∈ X, then x: [a, b] → R is a real, continuous function defined for all a ≤ t ≤ b. We note that x = y if and only if x(t) = y(t) for all t ∈ [a, b], and that the null vector is the function which is zero for all t ∈ [a, b]. Let F denote the field of real numbers, let α ∈ F, and let vector addition and scalar multiplication be defined pointwise by
(x + y)(t) = x(t) + y(t) for all t ∈ [a, b]   (3.1.20)
and
(αx)(t) = αx(t) for all t ∈ [a, b].   (3.1.21)
Then clearly x + y ∈ X whenever x, y ∈ X, αx ∈ X whenever α ∈ F and x ∈ X, and all the axioms of a vector space are satisfied. We call this vector space the space of real-valued continuous functions on [a, b], and we denote it by C[a, b]. ∎

3.1.22. Example. Let X be the set of all real-valued functions defined on the interval [a, b] such that
∫_a^b |x(t)| dt < ∞,
where integration is taken in the Riemann sense. Let F denote the field of real numbers, and let vector addition and scalar multiplication be defined as in equations (3.1.20) and (3.1.21), respectively. We can readily verify that X is a vector space. ∎

3.1.23. Example. Let X denote the set of all real-valued polynomials defined on the interval [a, b], let F be the field of real numbers, and let vector addition and scalar multiplication be defined as in equations (3.1.20) and (3.1.21), respectively. We note that the null vector is the function which is zero for all t ∈ [a, b], and also, if x(t) is a polynomial, then so is −x(t). Furthermore, we observe that the sum of two polynomials is again a polynomial, and that a scalar multiple of a polynomial is also a polynomial. We can now readily verify that X is a linear space. ∎

3.1.24. Example. Let X denote the set of real numbers between −a < 0 and +a > 0; i.e., if x ∈ X then x ∈ [−a, a]. Let F be the field of real numbers. Let vector addition and scalar multiplication be as defined in Example 3.1.6. Now, if α ∈ F is such that α > 1, then αa > a and αa ∉ X. From this it follows that X is not a vector space. ∎

Vector spaces such as those encountered in Examples 3.1.19, 3.1.22, and 3.1.23 are called function spaces. In Chapter 6 we will consider some additional linear spaces.
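The pointwise operations (3.1.20) and (3.1.21) can be mimicked directly with function closures. This is our own sketch (not from the text); the names `fadd` and `fscale` are assumptions of the example, and the "vectors" here are ordinary Python functions.

```python
def fadd(x, y):
    """Pointwise sum (x + y)(t) = x(t) + y(t), as in Eq. (3.1.20)."""
    return lambda t: x(t) + y(t)

def fscale(a, x):
    """Pointwise multiple (a x)(t) = a * x(t), as in Eq. (3.1.21)."""
    return lambda t: a * x(t)

x = lambda t: t * t          # a polynomial, hence continuous on [a, b]
y = lambda t: 3.0            # a constant function
z = fadd(fscale(2.0, x), y)  # z(t) = 2 t^2 + 3

assert z(0.0) == 3.0 and z(2.0) == 11.0
```

Note that equality of such vectors means equality of values at every t, which cannot be tested exhaustively in code; the closures only realize the algebraic operations.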
3.1.25. Exercise. Verify the assertions made in Examples 3.1.6, 3.1.7, 3.1.10, 3.1.11, 3.1.13, 3.1.14, 3.1.16, 3.1.17, 3.1.18, 3.1.19, 3.1.22, and 3.1.23.
3.2. LINEAR SUBSPACES AND DIRECT SUMS
We first introduce the notion of linear subspace. (See also Definition 2.1.102.)

3.2.1. Definition. A non-empty subset Y of a vector space X is called a linear manifold or a linear subspace in X if (i) x + y is in Y whenever x and y are in Y, and (ii) αx is in Y whenever α ∈ F and x ∈ Y.

It is an easy matter to verify that a linear manifold Y satisfies all the axioms of a vector space and may as such be regarded as a linear space itself.

3.2.2. Example. The set consisting of the null vector 0 is a linear subspace; i.e., the set Y = {0} is a linear subspace. Also, the vector space X is a linear subspace of itself. If a linear subspace Y is not all of X, then we say that Y is a proper subspace of X. ∎

3.2.3. Example. The set of all real-valued polynomials defined on the interval [a, b] (see Example 3.1.23) is a linear subspace of the vector space consisting of all real-valued continuous functions defined on the interval [a, b] (see Example 3.1.19). ∎

Concerning linear subspaces we now state and prove the following result.

3.2.4. Theorem. Let Y and Z be linear subspaces of a vector space X. The intersection of Y and Z, Y ∩ Z, is also a linear subspace of X.

Proof. Since Y and Z are linear subspaces, it follows that 0 ∈ Y and 0 ∈ Z, and thus 0 ∈ Y ∩ Z. Hence, Y ∩ Z is non-empty. Now let α, β ∈ F, let x, y ∈ Y, and let x, y ∈ Z. Then αx + βy ∈ Y and also αx + βy ∈ Z, because Y and Z are both linear subspaces. Hence, αx + βy ∈ Y ∩ Z, and Y ∩ Z is a linear subspace of X. ∎
We can extend the above theorem to a more general result.

3.2.5. Theorem. Let X be a vector space and let X_i be a linear subspace of X for every i ∈ I, where I denotes some index set. Then ∩_{i∈I} X_i is a linear subspace of X.

3.2.6. Exercise. Prove Theorem 3.2.5.
Now consider in the vector space of Example 3.1.4 the subsets Y and Z consisting of two lines intersecting at the origin 0, as shown in Figure B. Clearly, Y and Z are linear subspaces of the vector space X. On the other hand, the union of Y and Z, Y ∪ Z, obviously does not contain arbitrary sums αy + βz, where α, β ∈ F and y ∈ Y and z ∈ Z. From this it follows that if Y and Z are linear subspaces, then, in general, the union Y ∪ Z is not a linear subspace of X.

3.2.7. Figure B. [Two lines Y and Z intersecting at the origin 0.]
3.2.8. Definition. Let X be a linear space, and let Y and Z be arbitrary subsets of X. The sum of sets Y and Z, denoted by Y + Z, is the set of all vectors in X which are of the form y + z, where y ∈ Y and z ∈ Z.

The above concept is depicted pictorially in Figure C by utilizing the vector space of Example 3.1.4. With the aid of our next result we can generate various linear subspaces.

3.2.9. Figure C. Sum of Two Subsets.
3.2.10. Theorem. Let Y and Z be linear subspaces of a vector space X. Then their sum, Y + Z, is also a linear subspace of X.

3.2.11. Exercise. Prove Theorem 3.2.10.
Now let Y and Z be linear subspaces of a vector space X. If Y ∩ Z = {0}, we say that the spaces Y and Z are disjoint. We emphasize that this terminology is not consistent with that used in connection with sets. We now have:

3.2.12. Theorem. Let Y and Z be linear subspaces of a vector space X. Then for every x ∈ Y + Z there exist unique elements y ∈ Y and z ∈ Z such that x = y + z if and only if Y ∩ Z = {0}.

Proof. Let x ∈ Y + Z be such that x = y_1 + z_1 = y_2 + z_2, where y_1, y_2 ∈ Y and z_1, z_2 ∈ Z. Then clearly y_1 − y_2 = z_2 − z_1. Now y_1 − y_2 ∈ Y and z_2 − z_1 ∈ Z, and since by assumption Y ∩ Z = {0}, it follows that y_1 − y_2 = 0 and z_2 − z_1 = 0, i.e., y_1 = y_2 and z_1 = z_2. Thus, every x ∈ Y + Z has a unique representation x = y + z, where y ∈ Y and z ∈ Z, provided that Y ∩ Z = {0}.

Conversely, let us assume that for each x = y + z ∈ Y + Z the y ∈ Y and the z ∈ Z are uniquely determined. Let us further assume that the linear subspaces Y and Z are not disjoint. Then there exists a non-zero vector v ∈ Y ∩ Z. In this case we can write x = y + z = y + z + αv − αv = (y + αv) + (z − αv) for all α ∈ F. But this implies that y and z are not unique, which is a contradiction to our hypothesis. Hence, the spaces Y and Z must be disjoint. ∎
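For disjoint subspaces the decomposition of Theorem 3.2.12 can be computed explicitly. The sketch below is our own worked instance in R^2 (not from the text): Y and Z are the lines spanned by (1, 1) and (1, −1), and the unique y and z are found by solving a 2×2 linear system; the name `decompose` is an assumption of this example.

```python
from fractions import Fraction as Fr

# Y = span{(1, 1)}, Z = span{(1, -1)}: disjoint lines through the origin in R^2.
# Writing x = (a, b) as alpha*(1, 1) + beta*(1, -1) gives the system
#   alpha + beta = a,  alpha - beta = b  =>  alpha = (a+b)/2, beta = (a-b)/2.
def decompose(a, b):
    alpha = Fr(a + b, 2)
    beta = Fr(a - b, 2)
    return (alpha, alpha), (beta, -beta)   # (y in Y, z in Z)

y, z = decompose(3, 1)
assert y == (2, 2) and z == (1, -1)
assert (y[0] + z[0], y[1] + z[1]) == (3, 1)   # x = y + z, and uniquely so
```

Had the two lines coincided (non-disjoint subspaces), the system would be singular and the decomposition no longer unique, matching the converse direction of the theorem.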
Theorem 3.2.10 is readily extended to any number of linear subspaces of X. Specifically, if X_1, ..., X_r are linear subspaces of X, then X_1 + ... + X_r is also a linear subspace of X. This enables us to introduce the following:

3.2.13. Definition. Let X_1, ..., X_r be linear subspaces of the vector space X. The sum X_1 + ... + X_r is said to be a direct sum if for each x ∈ X_1 + ... + X_r there is a unique set of x_i ∈ X_i, i = 1, ..., r, such that x = x_1 + ... + x_r. We denote the direct sum of X_1, ..., X_r by X_1 ⊕ ... ⊕ X_r.
There is a connection between the Cartesian product of two vector spaces and their direct sum. Let Y and Z be two arbitrary linear spaces over the same field F, and let V = Y × Z. Thus, if v ∈ V, then v is the ordered pair
v = (y, z),
where y ∈ Y and z ∈ Z. Now let us define vector addition as
(y_1, z_1) + (y_2, z_2) = (y_1 + y_2, z_1 + z_2)   (3.2.14)
and scalar multiplication as
α(y, z) = (αy, αz),   (3.2.15)
where (y_1, z_1), (y_2, z_2) ∈ V = Y × Z and where α ∈ F. Noting that for each vector (y, z) ∈ V there is a vector −(y, z) = (−y, −z) ∈ V, and observing that (0, 0) = (y, z) − (y, z) for all elements in V, it is an easy matter to show that the space V = Y × Z is a linear space. We note that Y is not a linear subspace of V because, in fact, it is not even a subset of V. However, if we let
Y′ = {(y, 0): y ∈ Y}
and
Z′ = {(0, z): z ∈ Z},
then Y′ and Z′ are linear subspaces of V and V = Y′ ⊕ Z′. By abuse of notation, we frequently express this simply as V = Y ⊕ Z.

Once more, making use of Example 3.1.4, let Y and Z denote two lines intersecting at the origin 0, as shown in Figure D. The direct sum of the linear subspaces Y and Z is in this case the "entire plane."

3.2.16. Figure D. [The direct sum of two lines Y and Z through the origin is the entire plane.]
In order that a subset be a linear subspace of a vector space, it is necessary that this subset contain the null vector. Thus, in Figure D, the lines Y and Z passing through the origin 0 are linear subspaces of the plane (see Example 3.1.4). In many applications this requirement is too restrictive and a generalization is called for. We have:

3.2.17. Definition. Let Y be a linear subspace of a vector space X, and let x be a fixed vector in X. We call the translation
x + Y ≜ {z ∈ X : z = x + y, y ∈ Y}
a linear variety or a flat or an affine linear subspace of X.

In Figure E an example of a linear variety is given for the vector space of Example 3.1.4.

3.2.18. Figure E. [A linear variety x + Y.]

3.3. LINEAR INDEPENDENCE, BASES, AND DIMENSION
Throughout the remainder of this chapter and in the following chapter we use the following notation: {α_1, ..., α_n}, α_i ∈ F, denotes an indexed set of scalars, and {x_1, ..., x_n}, x_i ∈ X, denotes an indexed set of vectors.

Before introducing the notions of linear dependence and independence of a set of vectors in a linear space X, we first consider the following.
3.3.1. Definition. Let Y be a set in a linear space X (Y may be a finite set or an infinite set). We say that a vector x ∈ X is a finite linear combination of vectors in Y if there is a finite set of elements {y_1, y_2, ..., y_n} in Y and a finite set of scalars {α_1, α_2, ..., α_n} in F such that
x = α_1 y_1 + α_2 y_2 + ... + α_n y_n.   (3.3.2)

In Eq. (3.3.2) vector addition has been extended in an obvious way from the case of two vectors to the case of n vectors. In later chapters we will consider linear combinations which are not necessarily finite. The representation of x in Eq. (3.3.2) is, of course, not necessarily unique. Thus, in the case of Example 3.1.10, if X = R^2 and if x = (1, 1), then x can be represented as
x = y_1 + y_2 = 1(1, 0) + 1(0, 1)
or as
x = β_1 z_1 + β_2 z_2 = 2(1/2, 0) + 3(0, 1/3),
etc. This situation is depicted in Figure F.

3.3.3. Figure F. [Two representations of x = (1, 1): 1(1, 0) + 1(0, 1) and 2(1/2, 0) + 3(0, 1/3).]
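The two representations of x = (1, 1) shown in Figure F can be verified by exact arithmetic; this is our own one-line check (not from the text), done with rationals to avoid floating-point issues.

```python
from fractions import Fraction as Fr

# x = (1, 1) as a finite linear combination in two different ways
x1 = (1 * Fr(1) + 1 * Fr(0), 1 * Fr(0) + 1 * Fr(1))   # 1*(1,0) + 1*(0,1)
x2 = (2 * Fr(1, 2) + 3 * Fr(0), 2 * Fr(0) + 3 * Fr(1, 3))   # 2*(1/2,0) + 3*(0,1/3)
assert x1 == x2 == (1, 1)
```

The coefficients differ yet the vector is the same, which is precisely the non-uniqueness the text illustrates; uniqueness is recovered below for linearly independent sets (Theorem 3.3.14).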
3.3.4. Theorem. Let Y be a non-empty subset of a linear space X. Let V(Y) be the set of all finite linear combinations of the vectors from Y; i.e., y ∈ V(Y) if and only if there is some set of scalars {α_1, ..., α_m} and some finite subset {y_1, ..., y_m} of Y such that
y = α_1 y_1 + α_2 y_2 + ... + α_m y_m,
where m may be any positive integer. Then V(Y) is a linear subspace of X.

3.3.5. Exercise. Prove Theorem 3.3.4.

Our previous result motivates the following concepts.

3.3.6. Definition. We say the linear space V(Y) in Theorem 3.3.4 is the linear subspace generated by the set Y.
3.3.7. Definition. Let Z be a linear subspace of a vector space X. If there exists a set of vectors Y ⊂ X such that the linear space V(Y) generated by Y is Z, then we say Y spans Z.

If, in particular, the space of Example 3.1.4 is considered and if V and W are linear subspaces of X as depicted in Figure G, then the set Y = {e_1} spans W, the set Z = {e_2} spans V, and the set M = {e_1, e_2} spans the vector space X. The set N = {e_1, e_2, e_3} also spans the vector space X.
3.3.8. Figure G. V and W are Lines Intersecting at the Origin 0.
3.3.9. Exercise. Show that V(Y) is the smallest linear subspace of a vector space X containing the subset Y of X. Specifically, show that if Z is a linear subspace of X and if Z contains Y, then Z also contains V(Y).

And now the important notion of linear dependence.
3.3.10. Definition. Let {x_1, x_2, ..., x_m} be a finite non-empty set in a linear space X. If there exist scalars α_1, ..., α_m ∈ F, not all zero, such that
α_1 x_1 + ... + α_m x_m = 0,   (3.3.11)
then the set {x_1, x_2, ..., x_m} is said to be linearly dependent. If a set is not linearly dependent, then it is said to be linearly independent. In this case the relation (3.3.11) implies that α_1 = α_2 = ... = α_m = 0. An infinite set of vectors Y in X is said to be linearly independent if every finite subset of Y is linearly independent.

Note that the null vector cannot be contained in a set which is linearly independent. Also, if a set of vectors contains a linearly dependent subset, then the whole set is linearly dependent. If X denotes the space of Example 3.1.4, the set of vectors {y, z} in Figure H is linearly independent, while the set of vectors {u, v} is linearly dependent.

3.3.12. Figure H. Linearly Independent and Linearly Dependent Vectors.

3.3.13. Exercise.
Let X = C[a, b], the set of all real-valued continuous functions on [a, b], where b > a. As we saw in Example 3.1.19, this set forms a vector space. Let n be a fixed positive integer, and let us define x_i ∈ X for i = 0, 1, 2, ..., n, as follows. For all t ∈ [a, b], let
x_0(t) = 1
and
x_i(t) = t^i, i = 1, ..., n.
Let Y = {x_0, x_1, ..., x_n}. Then V(Y) is the set of all polynomials on [a, b] of degree less than or equal to n.
(a) Show that Y is a linearly independent set in X.
(b) Let X_i = {x_i}, i = 0, 1, ..., n; i.e., each X_i is a singleton subset of X. Show that
V(Y) = V(X_0) ⊕ V(X_1) ⊕ ... ⊕ V(X_n).
(c) Let z_0(t) = 1 for all t ∈ [a, b] and let
z_k(t) = 1 + t + ... + t^k
for all t ∈ [a, b] and k = 1, ..., n. Show that Z = {z_0, z_1, ..., z_n} is a linearly independent set in V(Y).
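Part (a) of the exercise can be checked numerically: evaluating 1, t, ..., t^n at n + 1 distinct points yields a Vandermonde matrix whose columns are the monomials, and full column rank means only the trivial combination vanishes at all those points. This is our own sketch using NumPy, not part of the text; the choice of sample points in [0, 1] is an assumption of the example.

```python
import numpy as np

n = 4
ts = np.linspace(0.0, 1.0, n + 1)            # n + 1 distinct points in [a, b] = [0, 1]
V = np.vander(ts, N=n + 1, increasing=True)  # V[j, i] = ts[j] ** i, column i is x_i(t) = t^i
# Full column rank: a0*1 + a1*t + ... + an*t^n = 0 at all points forces every a_i = 0,
# which is the linear-independence condition (3.3.11) for {x_0, ..., x_n}.
assert np.linalg.matrix_rank(V) == n + 1
```

A numerical rank test at finitely many points is evidence, not a proof; the exercise asks for the algebraic argument (a non-zero polynomial of degree at most n has at most n roots).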
3.3.14. Theorem. Let {x_1, ..., x_m} be a linearly independent set in a vector space X. If Σ_{i=1}^m α_i x_i = Σ_{i=1}^m β_i x_i, then α_i = β_i for all i = 1, 2, ..., m.

Proof. If Σ_{i=1}^m α_i x_i = Σ_{i=1}^m β_i x_i, then Σ_{i=1}^m (α_i − β_i) x_i = 0. Since the set {x_1, ..., x_m} is linearly independent, we have (α_i − β_i) = 0 for all i = 1, ..., m. Therefore, α_i = β_i for all i. ∎
The next result provides us with an alternate way of defining linear dependence.
3.3. Linear Independence, Bases, and Dimension

3.3.15. Theorem. A set of vectors {x₁, ..., xₘ} in a linear space X is linearly dependent if and only if for some index i, 1 ≤ i ≤ m, we can find scalars α₁, ..., αᵢ₋₁, αᵢ₊₁, ..., αₘ such that

xᵢ = α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ.  (3.3.16)

Proof. Assume that Eq. (3.3.16) is satisfied. Then

α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + (−1)xᵢ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ = 0.

Thus, αᵢ = −1 ≠ 0 is a non-trivial choice of coefficient for which Eq. (3.3.11) holds, and therefore the set {x₁, x₂, ..., xₘ} is linearly dependent.

Conversely, assume that the set {x₁, x₂, ..., xₘ} is linearly dependent. Then there exist coefficients α₁, ..., αₘ, which are not all zero, such that

α₁x₁ + α₂x₂ + ... + αₘxₘ = 0.  (3.3.17)

Suppose that index i is chosen such that αᵢ ≠ 0. Rearranging Eq. (3.3.17) to

−αᵢxᵢ = α₁x₁ + ... + αᵢ₋₁xᵢ₋₁ + αᵢ₊₁xᵢ₊₁ + ... + αₘxₘ  (3.3.18)

and multiplying both sides of Eq. (3.3.18) by −1/αᵢ, we obtain

xᵢ = β₁x₁ + ... + βᵢ₋₁xᵢ₋₁ + βᵢ₊₁xᵢ₊₁ + ... + βₘxₘ,

where βₖ = −αₖ/αᵢ, k = 1, ..., i − 1, i + 1, ..., m. This concludes our proof. ∎
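Theorem 3.3.15 suggests a computation: given a dependent set, recover the coefficients expressing one vector in terms of the rest. A minimal sketch with hypothetical vectors:

```python
import numpy as np

# Hypothetical dependent set: x3 = 2*x1 - x2 by construction.
x1 = np.array([1.0, 2.0, 0.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = 2.0 * x1 - x2

# Solve for x3 as a combination of the remaining vectors, as in Eq. (3.3.16).
A = np.column_stack([x1, x2])
alpha, *_ = np.linalg.lstsq(A, x3, rcond=None)
print(alpha)                      # recovers [ 2., -1.]
```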
The proof of the next result is left as an exercise.

3.3.19. Theorem. A finite non-empty set Y in a linear space X is linearly independent if and only if for each y ∈ V(Y), y ≠ 0, there is a unique finite subset of Y, say {x₁, x₂, ..., xₘ}, and a unique set of scalars {α₁, α₂, ..., αₘ}, such that

y = α₁x₁ + α₂x₂ + ... + αₘxₘ.

3.3.20. Exercise. Prove Theorem 3.3.19.
3.3.21. Exercise. Let Y be a finite set in a linear space X. Show that Y is linearly independent if and only if there is no proper subset Z of Y such that V(Z) = V(Y).

A concept which is of utmost importance in the study of vector spaces is that of basis of a linear space.

3.3.22. Definition. A set Y in a linear space X is called a Hamel basis, or simply a basis, for X if

(i) Y is linearly independent; and
(ii) the span of Y is the linear space X itself; i.e., V(Y) = X.
As an immediate consequence of this definition we have:

3.3.23. Theorem. Let X be a linear space, and let Y be a linearly independent set in X. Then Y is a basis for V(Y).

3.3.24. Exercise. Prove Theorem 3.3.23.
In order to introduce the notion of dimension of a vector space, we show that if a linear space X is generated by a finite number of linearly independent elements, then this number of elements must be unique. We first prove the following result.

3.3.25. Theorem. Let {x₁, x₂, ..., xₙ} be a basis for a linear space X. Then for each vector x ∈ X there exist unique scalars α₁, ..., αₙ such that

x = α₁x₁ + ... + αₙxₙ.
Proof. Since x₁, ..., xₙ span X, every vector x ∈ X can be expressed as a linear combination of them; i.e.,

x = α₁x₁ + α₂x₂ + ... + αₙxₙ

for some choice of scalars α₁, ..., αₙ. We now must show that these scalars are unique. To this end, suppose that

x = α₁x₁ + α₂x₂ + ... + αₙxₙ and x = β₁x₁ + β₂x₂ + ... + βₙxₙ.

Then

x + (−x) = (α₁x₁ + α₂x₂ + ... + αₙxₙ) + (−β₁x₁ − β₂x₂ − ... − βₙxₙ) = (α₁ − β₁)x₁ + ... + (αₙ − βₙ)xₙ = 0.

Since the vectors x₁, x₂, ..., xₙ form a basis for X, it follows that they are linearly independent, and therefore we must have (αᵢ − βᵢ) = 0 for i = 1, ..., n. From this it follows that α₁ = β₁, α₂ = β₂, ..., αₙ = βₙ. ∎

We also have:

3.3.26. Theorem. Let {x₁, x₂, ..., xₙ} be a basis for a vector space X, and let {y₁, ..., yₘ} be any linearly independent set of vectors. Then m ≤ n.
Proof. We need to consider only the case m ≥ n and prove that then we actually have m = n. Consider the set of vectors {y₁, x₁, ..., xₙ}. Since the vectors x₁, ..., xₙ span X, y₁ can be expressed as a linear combination of them. Thus, the set {y₁, x₁, ..., xₙ} is not linearly independent. Therefore, there exist scalars β₁, α₁, ..., αₙ, not all zero, such that

β₁y₁ + α₁x₁ + ... + αₙxₙ = 0.  (3.3.27)

If all the αᵢ are zero, then β₁ ≠ 0 and β₁y₁ = 0. Thus, we can write

β₁y₁ + 0·y₂ + ... + 0·yₘ = 0.

But this contradicts the hypothesis of the theorem and cannot happen, because the y₁, ..., yₘ are linearly independent. Therefore, at least one of the αᵢ ≠ 0. Renumbering the xᵢ, if necessary, we can assume that αₙ ≠ 0. Solving for xₙ, we now obtain

xₙ = (−β₁/αₙ)y₁ + (−α₁/αₙ)x₁ + ... + (−αₙ₋₁/αₙ)xₙ₋₁.  (3.3.28)

Now we show that the set {y₁, x₁, ..., xₙ₋₁} is also a basis for X. Since {x₁, ..., xₙ} is a basis for X, for each x ∈ X we have ξ₁, ..., ξₙ ∈ F such that

x = ξ₁x₁ + ... + ξₙxₙ.

Substituting (3.3.28) into the above expression we note that

x = ξ₁x₁ + ... + ξₙ₋₁xₙ₋₁ + ξₙ[(−β₁/αₙ)y₁ + (−α₁/αₙ)x₁ + ... + (−αₙ₋₁/αₙ)xₙ₋₁] = ηy₁ + η₁x₁ + ... + ηₙ₋₁xₙ₋₁,

where η and the ηᵢ are defined in an obvious way. In any case, every x ∈ X can be expressed as a linear combination of the set of vectors {y₁, x₁, ..., xₙ₋₁}, and thus this set must span X.

To show that this set is also linearly independent, let us assume that there are scalars λ, λ₁, ..., λₙ₋₁ such that

λy₁ + λ₁x₁ + ... + λₙ₋₁xₙ₋₁ + 0·xₙ = 0,  (3.3.29)

and assume that λ ≠ 0. Then

y₁ = (−λ₁/λ)x₁ + ... + (−λₙ₋₁/λ)xₙ₋₁ + 0·xₙ.

In view of Eq. (3.3.27) we have, since β₁ ≠ 0, the relation

y₁ = (−α₁/β₁)x₁ + ... + (−αₙ₋₁/β₁)xₙ₋₁ + (−αₙ/β₁)xₙ.  (3.3.30)

Now the term (−αₙ/β₁)xₙ in Eq. (3.3.30) is not zero, because we solved for xₙ in Eq. (3.3.28); yet the coefficient multiplying xₙ in Eq. (3.3.29) is zero. Since {x₁, ..., xₙ} is a basis, we have arrived at a contradiction, in view of Theorem 3.3.25. Therefore, we must have λ = 0. Thus, we have

λ₁x₁ + ... + λₙ₋₁xₙ₋₁ + 0·xₙ = 0,

and since {x₁, ..., xₙ} is a linearly independent set, it follows that λ₁ = 0, ..., λₙ₋₁ = 0. Therefore, the set {y₁, x₁, ..., xₙ₋₁} is indeed a basis for X.

By an argument similar to the preceding one we can show that the set {y₂, y₁, x₁, ..., xₙ₋₂} is a basis for X, that the set {y₃, y₂, y₁, x₁, ..., xₙ₋₃} is a basis for X, etc. Now if m > n, then we would not utilize yₙ₊₁ in this process. Since {yₙ, ..., y₁} is a basis by the preceding argument, there exist coefficients η₁, ..., ηₙ such that

yₙ₊₁ = ηₙyₙ + ... + η₁y₁.

But by Theorem 3.3.15 this means the yᵢ, i = 1, ..., n + 1, are linearly dependent, a contradiction to the hypothesis of our theorem. From this it now follows that if m ≥ n, then we must have m = n. This concludes the proof of the theorem. ∎

As a direct consequence of Theorem 3.3.26 we have:
3.3.31. Theorem. If a linear space X has a basis containing a finite number of vectors n, then any other basis for X consists of exactly n elements.
Proof. Let {x₁, ..., xₙ} be a basis for X, and let {y₁, ..., yₘ} also be a basis for X. Then in view of Theorem 3.3.26 we have m ≤ n. Interchanging the roles of the xᵢ and yᵢ, we also have n ≤ m. Hence, m = n. ∎
Our preceding result enables us to make the following definition.

3.3.32. Definition. If a linear space X has a basis consisting of a finite number of vectors, say {x₁, ..., xₙ}, then X is said to be a finite-dimensional vector space, and the dimension of X is n, abbreviated dim X = n. In this case we speak of an n-dimensional vector space. If X is not a finite-dimensional vector space, it is said to be an infinite-dimensional vector space. We will agree that the linear space consisting of the null vector is finite dimensional, and we will say that the dimension of this space is zero.

Our next result provides us with an alternate characterization of (finite) dimension of a linear space.

3.3.33. Theorem. Let X be a vector space which contains n linearly independent vectors. If every set of n + 1 vectors in X is linearly dependent, then X is finite dimensional and dim X = n.

Proof. Let {x₁, ..., xₙ} be a linearly independent set in X, and let x ∈ X. Then there exists a set of scalars {α₁, ..., αₙ₊₁}, not all zero, such that

α₁x₁ + ... + αₙxₙ + αₙ₊₁x = 0.

Now αₙ₊₁ ≠ 0, otherwise we would contradict the fact that x₁, ..., xₙ are linearly independent. Hence,

x = (−α₁/αₙ₊₁)x₁ + ... + (−αₙ/αₙ₊₁)xₙ

and x ∈ V({x₁, ..., xₙ}); i.e., {x₁, ..., xₙ} is a basis for X. Therefore, X is n-dimensional. ∎

From our preceding result follows:

3.3.34. Corollary. Let X be a vector space. If for given n every set of n + 1 vectors in X is linearly dependent, then X is finite dimensional and dim X ≤ n.
3.3.35. Exercise. Prove Corollary 3.3.34.
We are now in a position to speak of coordinates of a vector. We have:

3.3.36. Definition. Let X be a finite-dimensional vector space, and let {x₁, ..., xₙ} be a basis for X. Let x ∈ X be represented by

x = ξ₁x₁ + ... + ξₙxₙ.

The unique scalars ξ₁, ξ₂, ..., ξₙ are called the coordinates of x with respect to the basis {x₁, x₂, ..., xₙ}.
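In Rⁿ, the coordinates of Definition 3.3.36 are obtained by solving one linear system: writing the basis vectors as the columns of a matrix B, the coordinate vector ξ satisfies Bξ = x, and Theorem 3.3.25 guarantees that ξ is unique. A sketch with a hypothetical basis of R²:

```python
import numpy as np

# Hypothetical basis {b1, b2} of R^2 and a vector x to express in it.
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
x = np.array([3.0, 1.0])

# x = xi1*b1 + xi2*b2  <=>  B @ xi = x, with the basis vectors as columns.
B = np.column_stack([b1, b2])
xi = np.linalg.solve(B, x)
print(xi)                 # coordinates of x with respect to {b1, b2}: [2. 1.]
```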
It is possible to prove results similar to Theorems 3.3.26 and 3.3.31 for infinite-dimensional linear spaces. Since we will not make further use of these results in this book, their proofs will be omitted. In the following theorems, X is an arbitrary vector space (i.e., finite dimensional or infinite dimensional).

3.3.37. Theorem. If Y is a linearly independent set in a linear space X, then there exists a Hamel basis Z for X such that Y ⊂ Z.

3.3.38. Theorem. If Y and Z are Hamel bases for a linear space X, then Y and Z have the same cardinal number.
The notion of Hamel basis is not the only concept of basis with which we will deal. Such other concepts (to be specified later) reduce to Hamel basis on finite-dimensional vector spaces but differ significantly on infinite-dimensional spaces. We will find that on infinite-dimensional spaces the concept of Hamel basis is not very useful. However, in the case of finite-dimensional spaces the concept of Hamel basis is most crucial.

In view of the results presented thus far, the reader can readily prove the following facts.

3.3.39. Theorem. Let X be a finite-dimensional linear space with dim X = n.

(i) No linearly independent set in X contains more than n vectors.
(ii) A linearly independent set in X is a basis if and only if it contains exactly n vectors.
(iii) Every spanning or generating set for X contains a basis for X.
(iv) Every set of vectors which spans X contains at least n vectors.
(v) Every linearly independent set of vectors in X is contained in a basis for X.
(vi) If Y is a linear subspace of X, then Y is finite dimensional and dim Y ≤ n.
(vii) If Y is a linear subspace of X and if dim X = dim Y, then Y = X.

3.3.40. Exercise. Prove Theorem 3.3.39.
From Theorem 3.3.39 follows directly our next result.

3.3.41. Theorem. Let X be a finite-dimensional linear space of dimension n, and let Y be a collection of vectors in X. Then any two of the three conditions listed below imply the third condition:

(i) the vectors in Y are linearly independent;
(ii) the vectors in Y span X; and
(iii) the number of vectors in Y is n.
3.3.42. Exercise. Prove Theorem 3.3.41.
Another way of restating Theorem 3.3.41 is as follows: (a) the dimension of a finite-dimensional linear space X is equal to the smallest number of vectors that can be used to span X; and (b) the dimension of a finite-dimensional linear space X is the largest number of vectors that can be linearly independent in X.

For the direct sum of two linear subspaces we have the following result.

3.3.43. Theorem. Let X be a finite-dimensional vector space. If there exist linear subspaces Y and Z of X such that X = Y ⊕ Z, then dim(X) = dim(Y) + dim(Z).
Proof. Since X is finite dimensional, it follows from part (vi) of Theorem 3.3.39 that Y and Z are finite-dimensional linear spaces. Thus, there exists a basis, say {y₁, ..., yₙ}, for Y and a basis, say {z₁, ..., zₘ}, for Z. Let W = {y₁, ..., yₙ, z₁, ..., zₘ}. We must show that W is a linearly independent set in X and that V(W) = X. Now suppose that

α₁y₁ + ... + αₙyₙ + β₁z₁ + ... + βₘzₘ = 0.

Since the representation for 0 ∈ X must be unique in terms of its components in Y and Z, we must have

α₁y₁ + ... + αₙyₙ = 0 and β₁z₁ + ... + βₘzₘ = 0.

But this implies that α₁ = α₂ = ... = αₙ = β₁ = β₂ = ... = βₘ = 0. Thus, W is a linearly independent set in X. Since X is the direct sum of Y and Z, it is clear that W generates X. Thus, dim X = m + n. This completes the proof of the theorem. ∎

We conclude the present section with the following results.

3.3.44. Theorem. Let X be an n-dimensional vector space, and let {y₁, ..., yₘ} be a linearly independent set of vectors in X, where m < n. Then it is possible to form a basis for X consisting of n vectors x₁, ..., xₙ, where xᵢ = yᵢ for i = 1, ..., m.
Proof. Let {e₁, ..., eₙ} be a basis for X. Let S₁ be the set of vectors {y₁, ..., yₘ, e₁, ..., eₙ}, where {y₁, ..., yₘ} is a linearly independent set of vectors in X and where m < n. We note that S₁ spans X and is linearly dependent, since it contains more than n vectors. Now let

Σᵢ₌₁ᵐ αᵢyᵢ + Σᵢ₌₁ⁿ βᵢeᵢ = 0.

Then there must be some βⱼ ≠ 0, otherwise the linear independence of {y₁, ..., yₘ} would be contradicted. But this means that eⱼ is a linear combination of the set of vectors S₂ = {y₁, ..., yₘ, e₁, ..., eⱼ₋₁, eⱼ₊₁, ..., eₙ}; i.e., S₂ is the set S₁ with eⱼ eliminated. Clearly, S₂ still spans X. Now either S₂ contains n vectors or else it is a linearly dependent set. If it contains n vectors, then by Theorem 3.3.41 these vectors must be linearly independent, in which case S₂ is a basis for X; relabeling its remaining members eᵢ as xₘ₊₁, ..., xₙ, the theorem is proved. On the other hand, if S₂ contains more than n vectors, then we continue the above procedure to eliminate vectors from the remaining eᵢ's until exactly n − m of them are left. Letting e_{j1}, ..., e_{j(n−m)} be the remaining vectors and letting xₘ₊₁ = e_{j1}, ..., xₙ = e_{j(n−m)}, we have completed the proof of the theorem. ∎
3.3.45. Corollary. Let X be an n-dimensional vector space, and let Y be an m-dimensional subspace of X. Then there exists a subspace Z of X of dimension (n − m) such that X = Y ⊕ Z.

3.3.46. Exercise. Prove Corollary 3.3.45.
Referring to Figure 3.3.8, it is easy to see that the subspace Z in Corollary 3.3.45 need not be unique.
3.4. LINEAR TRANSFORMATIONS
Among the most important notions which we will encounter are special types of mappings on vector spaces, called linear transformations.

3.4.1. Definition. A mapping T of a linear space X into a linear space Y, where X and Y are vector spaces over the same field F, is called a linear transformation or linear operator provided that

(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all x ∈ X and for all α ∈ F.
A transformation which is not linear is called a non-linear transformation.

We will find it convenient to write T ∈ L(X, Y) to indicate that T is a linear transformation from a linear space X into a linear space Y (i.e., L(X, Y) denotes the set of all linear transformations from linear space X into linear space Y).

It follows immediately from the above definition that T is a linear transformation from a linear space X into a linear space Y if and only if

T(Σᵢ₌₁ⁿ αᵢxᵢ) = Σᵢ₌₁ⁿ αᵢT(xᵢ)

for all xᵢ ∈ X and for all αᵢ ∈ F, i = 1, ..., n. In engineering and science this is called the principle of superposition and is among the most important concepts in those disciplines.

3.4.2. Example. Let X = Y denote the space of real-valued continuous functions on the interval [a, b] as described in Example 3.1.19. Let T: X → Y be defined by

[Tx](t) = ∫ₐᵗ x(s) ds,  a ≤ t ≤ b,
where integration is in the Riemann sense. By the properties of integrals it follows readily that T is a linear transformation. ∎

3.4.3. Example. Let X = Cⁿ(a, b) denote the set of functions x(t) with n continuous derivatives on the interval (a, b), and let vector addition and scalar multiplication be defined by equations (3.1.20) and (3.1.21), respectively. It is readily verified that Cⁿ(a, b) is a linear space. Now let T: Cⁿ(a, b) → Cⁿ⁻¹(a, b) be defined by

[Tx](t) = dx(t)/dt.

From the properties of derivatives it follows that T is a linear transformation from Cⁿ(a, b) to Cⁿ⁻¹(a, b). ∎

3.4.4. Example. Let X denote the space of all complex-valued functions x(t) defined on the half-open interval [0, ∞) such that x(t) is Riemann integrable and such that
|x(t)| ≤ k·e^{at} for all t ∈ [0, ∞),

where k is some positive constant and a is any real number. Defining vector addition and scalar multiplication as in Eqs. (3.1.20) and (3.1.21), respectively, it is easily shown that X is a linear space. Now let Y denote the linear space of complex functions of a complex variable s (s = σ + iω, i = √−1). The reader can readily verify that the mapping T: X → Y defined by

[Tx](s) = ∫₀^∞ e^{−st} x(t) dt  (3.4.5)

is a linear transformation (called the Laplace transform of x(t)). ∎
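The linearity of the Laplace transform (3.4.5) can be spot-checked with a crude quadrature. The truncation point, step count, and test functions below are arbitrary choices made for illustration:

```python
import numpy as np

def laplace(x, s, T=60.0, N=200_000):
    """Approximate [Tx](s) = integral of exp(-s*t)*x(t) over [0, inf)
    by truncating to [0, T] and applying the trapezoidal rule."""
    t = np.linspace(0.0, T, N)
    y = np.exp(-s * t) * x(t)
    return float(np.sum((y[:-1] + y[1:]) * np.diff(t)) / 2.0)

s = 2.0
x1 = lambda t: np.exp(-t)          # exact transform: 1/(s + 1)
x2 = lambda t: np.exp(-3.0 * t)    # exact transform: 1/(s + 3)

lhs = laplace(lambda t: 4 * x1(t) + 5 * x2(t), s)
rhs = 4 * laplace(x1, s) + 5 * laplace(x2, s)
print(abs(lhs - rhs) < 1e-9)       # superposition holds
```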
3.4.6. Example. Let X be the space of real-valued continuous functions on [a, b] as described in Example 3.1.19. Let k(s, t) be a real-valued function defined for a ≤ s ≤ b, a ≤ t ≤ b, such that for each x ∈ X the Riemann integral

∫ₐᵇ k(s, t)x(t) dt  (3.4.7)

exists and defines a continuous function of s on [a, b]. Let T₁: X → X be defined by

[T₁x](s) = y(s) = ∫ₐᵇ k(s, t)x(t) dt.  (3.4.8)

It is readily shown that T₁ ∈ L(X, X). The equation (3.4.8) is called the Fredholm integral equation of the first type. ∎
3.4.9. Example. If in place of (3.4.8) we define T₂: X → X by

[T₂x](s) = y(s) = x(s) − ∫ₐᵇ k(s, t)x(t) dt,  (3.4.10)

then it is again readily shown that T₂ ∈ L(X, X). Equation (3.4.10) is known as the Fredholm integral equation of the second type. ∎

3.4.11. Example. In Examples 3.4.6 and 3.4.9, assume that k(s, t) = 0 when t > s. In place of (3.4.7) we now have

∫ₐˢ k(s, t)x(t) dt.  (3.4.12)

Equations (3.4.8) and (3.4.10) now become

[T₃x](s) = y(s) = ∫ₐˢ k(s, t)x(t) dt  (3.4.13)

and

[T₄x](s) = y(s) = x(s) − ∫ₐˢ k(s, t)x(t) dt,  (3.4.14)

respectively. Equations (3.4.13) and (3.4.14) are called Volterra integral equations (of the first type and the second type, respectively). Again, the mappings T₃ and T₄ are linear transformations from X into X. ∎
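On a grid, the Volterra operator (3.4.13) becomes multiplication by a lower-triangular matrix (the condition k(s, t) = 0 for t > s is exactly lower-triangularity), and linearity of T₃ reduces to linearity of matrix-vector multiplication. The kernel k ≡ 1 and the grid below are illustrative choices:

```python
import numpy as np

# Discretize [T3 x](s) on a uniform grid of [0, 1] with k(s, t) = 1 for t <= s:
# the quadrature weights form a lower-triangular matrix K.
N = 100
h = 1.0 / N
K = np.tril(np.ones((N, N))) * h        # k(s, t) = 0 for t > s

grid = np.linspace(0.0, 1.0, N)
x, z = np.sin(grid), np.cos(grid)

# T3(2x + 3z) = 2*T3(x) + 3*T3(z): linearity of the discretized operator.
lin_ok = np.allclose(K @ (2 * x + 3 * z), 2 * (K @ x) + 3 * (K @ z))
print(lin_ok)
```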
3.4.15. Example. Let X = C, the set of complex numbers. If x ∈ C, let x̄ denote the complex conjugate of x. Define T: X → X as

T(x) = x̄.

Then, clearly, T(x + y) = x̄ + ȳ = T(x) + T(y). Now if F = C, the field of complex numbers, and if α ∈ F, then T(αx) = ᾱx̄, while αT(x) = αx̄; hence T(αx) ≠ αT(x) in general. Therefore, T is not a linear transformation. ∎
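Example 3.4.15 is easy to replay in code: conjugation satisfies condition (i) of Definition 3.4.1 but fails condition (ii) once the scalar field is C.

```python
# Complex conjugation T(x) = conj(x) on X = C with F = C.
def T(x):
    return x.conjugate()

x, y, alpha = 1 + 2j, 3 - 1j, 1j

additive = T(x + y) == T(x) + T(y)           # condition (i): holds
homogeneous = T(alpha * x) == alpha * T(x)   # condition (ii): fails,
print(additive, homogeneous)                 # since conj(a*x) = conj(a)*conj(x)
```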
Example 3.4.15 demonstrates the important fact that condition (i) of Definition 3.4.1 does not imply condition (ii) of this definition.

Henceforth, when dealing with linear transformations T: X → Y, we will write Tx in place of T(x).

3.4.16. Definition. Let T ∈ L(X, Y). We call the set

𝔑(T) = {x ∈ X: Tx = 0}  (3.4.17)

the null space of T. The set

ℜ(T) = {y ∈ Y: y = Tx, x ∈ X}  (3.4.18)

is called the range space of T.

Since T0 = 0, it follows that 𝔑(T) and ℜ(T) are never empty. The next two important assertions are readily proved.

3.4.19. Theorem. Let T ∈ L(X, Y). Then

(i) the null space 𝔑(T) is a linear subspace of X; and
(ii) the range space ℜ(T) is a linear subspace of Y.

3.4.20. Exercise. Prove Theorem 3.4.19.
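For a matrix T acting on Rⁿ, both subspaces of Definition 3.4.16 are computable: the rank gives dim ℜ(T), and the right singular vectors belonging to zero singular values of the SVD span 𝔑(T). The matrix below is a hypothetical example, built rank-deficient on purpose; note that the two dimensions already add up to dim X, anticipating Eq. (3.4.26).

```python
import numpy as np

# Hypothetical T in L(R^4, R^3) with dependent columns: c3 = c1 + c2, c4 = 2*c1.
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 2.0, 2.0]])

rank = np.linalg.matrix_rank(A)     # dim R(T)
nullity = A.shape[1] - rank         # dim N(T)

# Orthonormal basis of N(T): right singular vectors for zero singular values.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T            # shape (4, nullity)

print(rank, nullity)                       # 2 2
print(np.allclose(A @ null_basis, 0.0))   # the basis really lies in N(T)
```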
For the dimension of the range space ℜ(T) we have:

3.4.21. Theorem. Let T ∈ L(X, Y). If X is finite dimensional with dimension n, then ℜ(T) is finite dimensional and dim ℜ(T) ≤ n.

Proof. We assume that ℜ(T) ≠ {0} and X ≠ {0}, for if ℜ(T) = {0} or X = {0}, then dim ℜ(T) = 0 and the theorem is proved. Thus, assume that n > 0 and let y₁, ..., yₙ₊₁ ∈ ℜ(T). Then there exist x₁, ..., xₙ₊₁ ∈ X such that Txᵢ = yᵢ for i = 1, ..., n + 1. Since X is of dimension n, there exist α₁, ..., αₙ₊₁ ∈ F, not all zero, such that

α₁x₁ + ... + αₙ₊₁xₙ₊₁ = 0.

This implies that

T(α₁x₁ + ... + αₙ₊₁xₙ₊₁) = 0,

or

α₁y₁ + ... + αₙ₊₁yₙ₊₁ = 0.

Therefore, by Corollary 3.3.34, ℜ(T) is finite dimensional and dim ℜ(T) ≤ n. ∎
3.4.22. Example. Let T: R² → R^∞, where R² and R^∞ are defined in Examples 3.1.10 and 3.1.11, respectively. For x ∈ R² we write x = (ξ₁, ξ₂). Define T by

T(ξ₁, ξ₂) = (0, ξ₁, 0, ξ₂, 0, 0, ...).

The mapping T is clearly a linear transformation. The vectors (0, 1, 0, ...) and (0, 0, 0, 1, 0, 0, ...) span ℜ(T), and dim ℜ(T) = 2 = dim R². ∎

We also have:

3.4.23. Theorem. Let T ∈ L(X, Y), and let X be finite dimensional. Let {y₁, ..., yₙ} be a basis for ℜ(T), and let xᵢ be such that Txᵢ = yᵢ for i = 1, ..., n. Then x₁, ..., xₙ are linearly independent in X.
3.4.24. Exercise. Prove Theorem 3.4.23.
Our next result, which as we will see is of utmost importance, is sometimes called the fundamental theorem of linear equations.

3.4.25. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, then

dim 𝔑(T) + dim ℜ(T) = dim X.  (3.4.26)

Proof. Let dim X = n, let dim 𝔑(T) = s, and let r = n − s. We must show that dim ℜ(T) = r. First, let us assume that 0 < s < n, and let {e₁, e₂, ..., eₙ} be a basis for X chosen in such a way that the last s vectors, eᵣ₊₁, eᵣ₊₂, ..., eₙ, form a basis for the linear subspace 𝔑(T) (see Theorem 3.3.44). Then the vectors Te₁, Te₂, ..., Teᵣ, Teᵣ₊₁, ..., Teₙ generate the linear subspace ℜ(T). But eᵣ₊₁, eᵣ₊₂, ..., eₙ are vectors in 𝔑(T), and thus Teᵣ₊₁ = 0, ..., Teₙ = 0. From this it now follows that the vectors Te₁, Te₂, ..., Teᵣ must generate ℜ(T). Now let f₁ = Te₁, f₂ = Te₂, ..., fᵣ = Teᵣ. We must show that the vectors {f₁, f₂, ..., fᵣ} are linearly independent and as such form a basis for ℜ(T).

Next, we observe that γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ ∈ ℜ(T). If the γ₁, γ₂, ..., γᵣ are chosen in such a fashion that γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ = 0, then

0 = γ₁f₁ + γ₂f₂ + ... + γᵣfᵣ = γ₁Te₁ + γ₂Te₂ + ... + γᵣTeᵣ = T(γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ),

and from this it follows that x = γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ ∈ 𝔑(T). Now, by assumption, the set {eᵣ₊₁, ..., eₙ} is a basis for 𝔑(T). Thus, there must exist scalars γᵣ₊₁, γᵣ₊₂, ..., γₙ such that

γ₁e₁ + γ₂e₂ + ... + γᵣeᵣ = γᵣ₊₁eᵣ₊₁ + ... + γₙeₙ.

This can be rewritten as

γ₁e₁ + ... + γᵣeᵣ − γᵣ₊₁eᵣ₊₁ − ... − γₙeₙ = 0.

But {e₁, e₂, ..., eₙ} is a basis for X. From this it follows that γ₁ = γ₂ = ... = γᵣ = γᵣ₊₁ = ... = γₙ = 0. Hence, f₁, f₂, ..., fᵣ are linearly independent, and therefore dim ℜ(T) = r.

If s = 0, the preceding proof remains valid if we let {e₁, ..., eₙ} be any basis for X and ignore the remarks about the vectors {eᵣ₊₁, ..., eₙ}. If s = n, then 𝔑(T) = X; hence ℜ(T) = {0}, and so dim ℜ(T) = 0. This concludes the proof of the theorem. ∎

Our preceding result gives rise to the next definition.

3.4.27. Definition. The rank ρ(T) of a linear transformation T of a finite-dimensional vector space X into a vector space Y is the dimension of the range space ℜ(T). The nullity ν(T) of the linear transformation T is the dimension of the null space 𝔑(T).

The reader is now in a position to prove the next result.

3.4.28. Theorem. Let T ∈ L(X, Y). Let X be finite dimensional, and let s = dim 𝔑(T). Let {x₁, ..., xₛ} be a basis for 𝔑(T). Then
(i) a vector x ∈ X satisfies the equation Tx = 0 if and only if x = α₁x₁ + ... + αₛxₛ for some set of scalars {α₁, ..., αₛ}; furthermore, for each x ∈ X such that Tx = 0 is satisfied, the set of scalars {α₁, ..., αₛ} is unique;
(ii) if y₀ is a fixed vector in Y, then Tx = y₀ holds for at least one x ∈ X (called a solution of the equation Tx = y₀) if and only if y₀ ∈ ℜ(T); and
(iii) if y₀ is any fixed vector in Y and if x₀ is some vector in X such that Tx₀ = y₀ (i.e., x₀ is a solution of the equation Tx = y₀), then a vector x ∈ X satisfies Tx = y₀ if and only if x = x₀ + β₁x₁ + ... + βₛxₛ for some set of scalars {β₁, β₂, ..., βₛ}; furthermore, for each x ∈ X such that Tx = y₀, the set of scalars {β₁, β₂, ..., βₛ} is unique.
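Part (iii) has the familiar "particular plus homogeneous" form, which a small sketch (hypothetical T and y₀) makes concrete: every x₀ + βn with n ∈ 𝔑(T) solves Tx = y₀.

```python
import numpy as np

# Hypothetical T in L(R^3, R^2) and a fixed right-hand side y0 in R(T).
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
y0 = np.array([2.0, 1.0])

x0, *_ = np.linalg.lstsq(A, y0, rcond=None)   # one particular solution
n = np.array([1.0, -1.0, 1.0])                # spans N(T): A @ n = 0

# Every x0 + beta*n is again a solution of T x = y0 (Theorem 3.4.28(iii)).
for beta in (-2.0, 0.0, 3.5):
    assert np.allclose(A @ (x0 + beta * n), y0)
print("ok")
```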
Exercise.
Prove Theorem 3.4.28.
Since a linear transformation T of a linear space X into a linear space Y is a mapping, we can distinguish, as in Chapter I, between linear transformations that are surjective (i.e., onto), injective (i.e., one-to-one), and bijective (i.e., onto and one-to-one). We will often be particularly interested in knowing when a linear transformation T has an inverse, which we denote by T- l . In this connection, the following terms are used interchangeably: T- I exists, T has an inverse, T is invertible, and Tis non-singular. Also,. a linear
transformation which is not non-singular is said to be singular. We recall that if T has an inverse, then
T⁻¹(Tx) = x for all x ∈ X  (3.4.30)

and

T(T⁻¹y) = y for all y ∈ ℜ(T).  (3.4.31)
The following theorem is a fundamental result concerning inverses of linear transformations.

3.4.32. Theorem. Let T ∈ L(X, Y).

(i) The inverse of T exists if and only if Tx = 0 implies x = 0.
(ii) If T⁻¹ exists, then T⁻¹ is a linear transformation from ℜ(T) onto X.

Proof. To prove part (i), assume first that Tx = 0 implies x = 0. Let x₁, x₂ ∈ X with Tx₁ = Tx₂. Then T(x₁ − x₂) = 0, and therefore x₁ − x₂ = 0. Thus, x₁ = x₂ and T has an inverse. Conversely, assume that T has an inverse. Let Tx = 0. Since T0 = 0, we have T0 = Tx. Since T has an inverse, x = 0.

To prove part (ii), assume that T⁻¹ exists. To establish the linearity of T⁻¹, let y₁ = Tx₁ and y₂ = Tx₂, where y₁, y₂ ∈ ℜ(T) and x₁, x₂ ∈ X. Then

T⁻¹(y₁ + y₂) = T⁻¹(Tx₁ + Tx₂) = T⁻¹T(x₁ + x₂) = x₁ + x₂ = T⁻¹(y₁) + T⁻¹(y₂).

Also, for α ∈ F we have

T⁻¹(αy₁) = T⁻¹(αTx₁) = T⁻¹(T(αx₁)) = αx₁ = αT⁻¹(y₁).

Thus, T⁻¹ is linear. It is also a mapping onto X, since every y ∈ ℜ(T) is the image of some x ∈ X. For, if x ∈ X, then there is a y ∈ ℜ(T) such that Tx = y; hence, x = T⁻¹y and x ∈ ℜ(T⁻¹). ∎

3.4.33. Example. Consider the linear transformation T: R² → R^∞ of Example 3.4.22. Since Tx = 0 implies x = 0, T has an inverse. We see that T is not a mapping of R² onto R^∞; however, T is clearly a one-to-one mapping of R² onto ℜ(T). ∎

For finite-dimensional vector spaces we have:

3.4.34. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, T has an inverse if and only if ℜ(T) has the same dimension as X; i.e., ρ(T) = dim X.

Proof. By Theorem 3.4.25 we have

dim 𝔑(T) + dim ℜ(T) = dim X.

Since T has an inverse if and only if 𝔑(T) = {0}, it follows that ρ(T) = dim X if and only if T has an inverse. ∎

For finite-dimensional linear spaces we also have:

3.4.35. Theorem. Let X and Y be finite-dimensional vector spaces of the same dimension, say dim X = dim Y = n. Let T ∈ L(X, Y). Then ℜ(T) = Y if and only if T has an inverse.

Proof. Assume that T has an inverse. By Theorem 3.4.34 we know that dim ℜ(T) = n. Thus, dim ℜ(T) = dim Y, and it follows from Theorem 3.3.39, part (vii), that ℜ(T) = Y.

Conversely, assume that ℜ(T) = Y. Let {y₁, y₂, ..., yₙ} be a basis for ℜ(T). Let xᵢ be such that Txᵢ = yᵢ for i = 1, ..., n. Then, by Theorem 3.4.23, the vectors x₁, ..., xₙ are linearly independent. Since the dimension of X is n, it follows that the vectors x₁, ..., xₙ span X. Now let Tx = 0 for some x ∈ X. We can represent x as x = α₁x₁ + ... + αₙxₙ. Hence, 0 = Tx = α₁y₁ + ... + αₙyₙ. Since the vectors y₁, ..., yₙ are linearly independent, we must have α₁ = ... = αₙ = 0, and thus x = 0. This implies that T has an inverse. ∎
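Theorems 3.4.34 and 3.4.63 reduce invertibility on finite-dimensional spaces to a rank computation. A sketch with hypothetical matrices:

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
B = A.copy()
B[:, 2] = B[:, 0] + B[:, 1]      # force a dependent column

def invertible(M):
    # rho(T) = dim X is equivalent to the existence of an inverse.
    return np.linalg.matrix_rank(M) == M.shape[0]

print(invertible(A), invertible(B))   # True False
```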
At this point we find it instructive to summarize the preceding results which characterize injective, surjective, and bijective linear transformations. In so doing, it is useful to keep Figure J in mind.

3.4.36. Figure J. Linear transformation T from vector space X into vector space Y.

3.4.37. Summary (Injective Linear Transformations). Let X and Y be vector spaces over the same field F, and let T ∈ L(X, Y). The following are equivalent:

(i) T is injective;
(ii) T has an inverse;
(iii) Tx = 0 implies x = 0;
(iv) for each y ∈ ℜ(T), there is a unique x ∈ X such that Tx = y;
(v) if Tx₁ = Tx₂, then x₁ = x₂; and
(vi) if x₁ ≠ x₂, then Tx₁ ≠ Tx₂.

If X is finite dimensional, then the following are equivalent:

(i) T is injective; and
(ii) ρ(T) = dim X.

3.4.38. Summary (Surjective Linear Transformations). Let X and Y be vector spaces over the same field F, and let T ∈ L(X, Y). The following are equivalent:

(i) T is surjective; and
(ii) for each y ∈ Y, there is an x ∈ X such that Tx = y.

If X and Y are finite dimensional, then the following are equivalent:

(i) T is surjective; and
(ii) dim Y = ρ(T).

3.4.39. Summary (Bijective Linear Transformations). Let X and Y be vector spaces over the same field F, and let T ∈ L(X, Y). The following are equivalent:

(i) T is bijective; and
(ii) for every y ∈ Y there is a unique x ∈ X such that Tx = y.

If X and Y are finite dimensional, then the following are equivalent:

(i) T is bijective; and
(ii) dim X = dim Y = ρ(T).

3.4.40. Summary (Injective, Surjective, and Bijective Linear Transformations). Let X and Y be finite-dimensional vector spaces over the same field F, and let dim X = dim Y. (Note: this is true if, e.g., X = Y.) The following are equivalent:

(i) T is injective;
(ii) T is surjective;
(iii) T is bijective; and
(iv) T has an inverse.

3.4.41. Exercise. Verify the assertions made in summaries (3.4.37)–(3.4.40).
104
eL t us next examine some of the properties of the set L ( X , )Y , the set of all linear transformations from a vector space X into a vector space .Y As before, we assumelhat X and Y a re linear spaces over the same field .F Let S, T E L ( X , Y), and define the sum of SandT by
+
(S
for all x
E
.X
Also, with /X
by a scalar /X as
E
T)x
t::.
E E
+
Tx
F and T E L ( X , (/XT)x
for all x that /XT
Sx
define multiplication of T
)Y ,
/XTx
t::.
(3.4.24 )
(3.4.34 )
+
.X It is an easy matter to show that (S T) E L ( X , )Y and also L(X, )Y . eL t us further note that there exists a zero element in
Y), called the ez ro transformation and denoted by 0, which is defined by
L(X,
Ox
= 0
(3.4.)4
)Y there corresponds a unique for all x E .X Moreover, to each T E L ( X , Y) defined by linear transformation - T E L ( X , ( - T)x
for all x E .X
= -
Tx
In this case it follows trivially that - T
+
(3.4.45)
T=
O.
3.4.64 . Exercise. eL t X be a finite-dimensional space, and let T E L ( X , )Y . Let e{ l> ... ,e.} be a basis for .X Then Te, = 0 for i = I, ... , n if and only if T = 0 (i.e., T is the ez ro transformation). With the above definitions it is now easy to establish the following result. 3.4.74 . Tbeorem. eL t X and Y be two linear spaces over the same field of scalars ,F and let L ( ,X Y) denote the set of all linear transformations from X into .Y Then L ( X , Y ) is itself a linear space over ,F called the space of linear transformations (here, vector addition is defined by Eq. (3.4.24 ) and multiplication of vectors by scalars is defined by Eq. (3.4.43». 3.4.84 .
Exercise.
Prove Theorem 3.4.74 .
Next, let us recall the definition of an algebra, considered in Chapter 2.

3.4.49. Definition. A set X is called an algebra if it is a linear space and if to each x, y ∈ X there corresponds an element in X, denoted by x·y and called the product of x times y, satisfying the following axioms:

(i) x·(y + z) = x·y + x·z for all x, y, z ∈ X;
(ii) (x + y)·z = x·z + y·z for all x, y, z ∈ X; and
(iii) (αx)·(βy) = (αβ)(x·y) for all x, y ∈ X and for all α, β ∈ F.

If in addition to the above,

(iv) (x·y)·z = x·(y·z) for all x, y, z ∈ X,
then X is called an associative algebra. If there exists an element i ∈ X such that i·x = x·i = x for every x ∈ X, then i is called the identity of the algebra. It can be readily shown that if i exists, then it is unique. Furthermore, if x·y = y·x for all x, y ∈ X, then X is said to be a commutative algebra. Finally, if Y is a subset of X (X is an algebra) and (a) if x + y ∈ Y whenever x, y ∈ Y, and (b) if αx ∈ Y whenever α ∈ F and x ∈ Y, and (c) if x·y ∈ Y whenever x, y ∈ Y, then Y is called a subalgebra of X.

Now let us return to the subject on hand. Let X, Y, and Z be linear spaces over F, and consider the vector spaces L(X, Y) and L(Y, Z). If S ∈ L(Y, Z) and if T ∈ L(X, Y), then we define the product ST as the mapping of X into Z characterized by

(ST)x = S(Tx)  (3.4.50)

for all x ∈ X. The reader can readily verify that ST ∈ L(X, Z).

Next, let X = Y = Z. If S, T, U ∈ L(X, X) and if α, β ∈ F, then it is easily shown that

S(TU) = (ST)U,  (3.4.51)
S(T + U) = ST + SU,  (3.4.52)
(S + T)U = SU + TU,  (3.4.53)

and

(αS)(βT) = (αβ)ST.  (3.4.54)
For example, to verify (3.4.52), we observe that

S[(T + U)x] = S[Tx + Ux] = (ST)x + (SU)x = (ST + SU)x

for all x ∈ X, and hence Eq. (3.4.52) follows.

We emphasize at this point that, in general, commutativity of linear transformations does not hold; i.e., in general,

ST ≠ TS.   (3.4.55)
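The failure of commutativity in (3.4.55) is easy to see concretely. The following is a small numerical sketch of ours (not from the text): linear transformations on R² are represented by 2×2 matrices, composition by the matrix product, and a single pair S, T already gives ST ≠ TS.

```python
# Linear transformations on R^2 as 2x2 matrices; composition = matrix product.
# A single concrete pair suffices to show that ST != TS in general.

def mat_mul(A, B):
    """Product of two 2x2 matrices (composition of the transformations)."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[1, 1],
     [0, 1]]
T = [[1, 0],
     [1, 1]]

ST = mat_mul(S, T)  # the transformation x -> S(Tx)
TS = mat_mul(T, S)  # the transformation x -> T(Sx)
```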
There is a special mapping from a linear space X into X, called the identity transformation, defined by

Ix = x   (3.4.56)

for all x ∈ X. We note that I is linear, i.e., I ∈ L(X, X), that I ≠ 0 if and only if X ≠ {0}, that I is unique, and that

TI = IT = T   (3.4.57)

for all T ∈ L(X, X). Also, we can readily verify that the transformation αI, α ∈ F, defined by

(αI)x = αIx = αx   (3.4.58)

is also a linear transformation. The above discussion gives rise to the following result.

3.4.59. Theorem. The set of linear transformations of a linear space X into X, denoted by L(X, X), is an associative algebra with identity I. This algebra is, in general, not commutative.

We further have:

3.4.60. Theorem. Let T ∈ L(X, X). If T is bijective, then T⁻¹ ∈ L(X, X) and

T⁻¹T = TT⁻¹ = I,   (3.4.61)

where I denotes the identity transformation defined in Eq. (3.4.56).

3.4.62. Exercise. Prove Theorem 3.4.60.
For invertible linear transformations defined on finite-dimensional linear spaces we have the following result.

3.4.63. Theorem. Let X be a finite-dimensional vector space, and let T ∈ L(X, X). Then the following are equivalent:

(i) T is invertible;
(ii) rank T = dim X;
(iii) T is one-to-one;
(iv) T is onto; and
(v) Tx = 0 implies x = 0.

3.4.64. Exercise. Prove Theorem 3.4.63.

Bijective linear transformations are further characterized by our next result.

3.4.65. Theorem. Let X be a linear space, let S, T, U ∈ L(X, X), and let I ∈ L(X, X) denote the identity transformation.

(i) If ST = US = I, then S is bijective and S⁻¹ = T = U.
(ii) If S and T are bijective, then ST is bijective, and (ST)⁻¹ = T⁻¹S⁻¹.
(iii) If S is bijective, then (S⁻¹)⁻¹ = S.
(iv) If S is bijective, then αS is bijective and (αS)⁻¹ = (1/α)S⁻¹ for all α ∈ F with α ≠ 0.

3.4.66. Exercise. Prove Theorem 3.4.65.
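Part (ii) of Theorem 3.4.65 can be checked numerically. The sketch below is ours: the transformations are stood in for by invertible 2×2 matrices, and exact rational arithmetic avoids any rounding questions.

```python
# Numerical sketch of (ST)^{-1} = T^{-1} S^{-1} for bijective S, T,
# with 2x2 matrices standing in for the transformations.
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(A):
    """Exact inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    d = Fraction(1) / det
    return [[ d * A[1][1], -d * A[0][1]],
            [-d * A[1][0],  d * A[0][0]]]

S = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
T = [[Fraction(1), Fraction(3)], [Fraction(0), Fraction(1)]]

lhs = mat_inv(mat_mul(S, T))           # (ST)^{-1}
rhs = mat_mul(mat_inv(T), mat_inv(S))  # T^{-1} S^{-1}
```

Note the reversal of order on the right-hand side; composing the inverses in the original order would fail precisely because of (3.4.55).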
With the aid of the above concepts and results we can now construct certain classes of functions of linear transformations. Since relation (3.4.51) allows us to write the product of three or more linear transformations without the use of parentheses, we can define Tⁿ, where T ∈ L(X, X) and n is a positive integer, as

Tⁿ ≜ T · T · ... · T (n times).   (3.4.67)

Similarly, if T⁻¹ is the inverse of T, then we can define T⁻ᵐ, where m is a positive integer, as

T⁻ᵐ ≜ (T⁻¹)ᵐ = T⁻¹ · T⁻¹ · ... · T⁻¹ (m times).   (3.4.68)

We then have

Tᵐ⁺ⁿ = T · T · ... · T (m + n times) = (T · ... · T) (n times) · (T · ... · T) (m times) = Tⁿ · Tᵐ.   (3.4.69)

In a similar fashion we have

(Tᵐ)ⁿ = Tᵐⁿ   (3.4.70)

and

T⁻ᵐⁿ = (T⁻ᵐ)ⁿ = (Tⁿ)⁻ᵐ,   (3.4.71)
where m and n are positive integers. Consistent with this notation we also have

T¹ = T   (3.4.72)

and

T⁰ = I.   (3.4.73)

We are now in a position to consider polynomials of linear transformations. Thus, if f(λ) is a polynomial, i.e.,

f(λ) = α₀ + α₁λ + ... + αₙλⁿ,   (3.4.74)

where α₀, α₁, ..., αₙ ∈ F, then by f(T) we mean

f(T) = α₀I + α₁T + ... + αₙTⁿ.   (3.4.75)

The reader is cautioned that the above concept can, in general, not be
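A short computational sketch of Eq. (3.4.75), in our own notation: f(T) is evaluated for a 2×2 matrix T by accumulating α₀I + α₁T + α₂T² term by term, using the convention T⁰ = I of Eq. (3.4.73).

```python
# Evaluating a polynomial f(lambda) = a0 + a1*lambda + ... at a matrix T:
# f(T) = a0*I + a1*T + ..., with T^0 = I.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]

I = [[1, 0], [0, 1]]

def poly_of_T(coeffs, T):
    """f(T) for f(lambda) = coeffs[0] + coeffs[1]*lambda + ..."""
    result = [[0, 0], [0, 0]]
    power = I  # current power of T, starting at T^0 = I
    for a in coeffs:
        result = mat_add(result, mat_scale(a, power))
        power = mat_mul(power, T)
    return result

T = [[1, 1], [0, 1]]
fT = poly_of_T([1, -2, 1], T)  # f(lambda) = 1 - 2*lambda + lambda^2 = (lambda - 1)^2
```

For this T, f(T) = (T − I)² is the zero transformation even though f has no repeated value at the entries of T, a first hint of the eigenvalue ideas taken up in Chapter 4.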
extended to functions of two or more linear transformations, because linear transformations in general do not commute.

Next, we consider the important concept of isomorphic linear spaces. In Chapter 2 we encountered the notion of isomorphisms of groups and rings. We saw that such mappings, if they exist, preserve the algebraic properties of groups and rings. Thus, in many cases two algebraic systems (such as groups or rings) may differ only in the nature of the elements of the underlying set and may thus be considered as being the same in all other respects. We now extend this concept to linear spaces.

3.4.76. Definition. Let X and Y be vector spaces over the same field F. If there exists T ∈ L(X, Y) such that T is a one-to-one mapping of X into Y, then T is said to be an isomorphism of X into Y. If, in addition, T maps X onto Y, then X and Y are said to be isomorphic.

Note that if X and Y are isomorphic, then clearly Y and X are isomorphic. Our next result shows that all n-dimensional linear spaces over the same field are isomorphic.

3.4.77. Theorem. Every n-dimensional vector space X over a field F is isomorphic to Fⁿ.

Proof. Let {e₁, ..., eₙ} be a basis for X. Then every x ∈ X has the unique representation

x = ξ₁e₁ + ... + ξₙeₙ,

where {ξ₁, ξ₂, ..., ξₙ} is a unique set of scalars (belonging to F). Now let us define a linear transformation T from X into Fⁿ by

Tx = (ξ₁, ξ₂, ..., ξₙ).

It is an easy matter to verify that T is a linear transformation of X onto Fⁿ, and that it is one-to-one (the reader is invited to do so). Thus, X is isomorphic to Fⁿ. ■

It is not difficult to establish the next result.

3.4.78. Theorem. Two finite-dimensional vector spaces X and Y over the same field F are isomorphic if and only if dim X = dim Y.

3.4.79. Exercise. Prove Theorem 3.4.78.

Theorem 3.4.77 points out the importance of the spaces Rⁿ and Cⁿ. Namely, every n-dimensional vector space over the field of real numbers is isomorphic to Rⁿ, and every n-dimensional vector space over the field of complex numbers is isomorphic to Cⁿ (see Example 3.1.10).
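The coordinate map of Theorem 3.4.77 can be made concrete. The sketch below is our own example: real polynomials of degree less than 3 form a 3-dimensional space with basis {1, t, t²}, and sending a polynomial to its coefficient tuple is the isomorphism onto R³; linearity of the map shows up as entrywise arithmetic on coordinates.

```python
# The space of real polynomials of degree < 3 is isomorphic to R^3 via
# p(t) = c0 + c1*t + c2*t^2  |->  (c0, c1, c2), the coordinates of p
# relative to the basis {1, t, t^2}.

def coords(p):
    """Coordinate tuple of p, where p is stored as [c0, c1, c2]."""
    return tuple(p)

def add(p, q):
    """Sum of two polynomials, coefficientwise."""
    return [a + b for a, b in zip(p, q)]

p = [1, 0, 2]    # 1 + 2t^2
q = [0, 3, -1]   # 3t - t^2

# T(p + q) = T(p) + T(q): the coordinate map respects vector addition.
lhs = coords(add(p, q))
rhs = tuple(a + b for a, b in zip(coords(p), coords(q)))
```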
3.5. LINEAR FUNCTIONALS

There is a special type of linear transformation which is so important that we give it a special name: linear functional. We showed in Example 3.1.7 that if F is a field, then Fⁿ is a vector space over F. If, in particular, n = 1, then we may view F as being a vector space over itself. This enables us to consider linear transformations of a vector space X over F into F.

3.5.1. Definition. Let X be a vector space over a field F. A mapping f of X into F is called a functional on X. If f is a linear transformation of X into F, then we call f a linear functional on X.

We cite some specific examples of linear functionals.

3.5.2. Example. Consider the space C[a, b]. Then the mapping

f₁(x) = ∫ₐᵇ x(s) ds,  x ∈ C[a, b],   (3.5.3)

is a linear functional on C[a, b]. Also, the function defined by

f₂(x) = x(s₀),  x ∈ C[a, b],  s₀ ∈ [a, b],   (3.5.4)

is also a linear functional on C[a, b]. Furthermore, the mapping

f₃(x) = ∫ₐᵇ x(s)x₀(s) ds,   (3.5.5)

where x₀ is a fixed element of C[a, b] and where x is any element in C[a, b], is also a linear functional on C[a, b]. ■

3.5.6. Example. Let X = Fⁿ, and denote x ∈ X by x = (ξ₁, ..., ξₙ). The mapping f₄ defined by

f₄(x) = ξ₁   (3.5.7)

is a linear functional on X. A more general form of f₄ is as follows. Let a = (α₁, ..., αₙ) ∈ X be fixed, and let x = (ξ₁, ..., ξₙ) be an arbitrary element of X. It is readily shown that the function

f₅(x) = Σᵢ₌₁ⁿ αᵢξᵢ   (3.5.8)

is a linear functional on X. ■
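A quick numerical check, in our own notation, of the functional in Eq. (3.5.8): with X = R³ and a fixed a ∈ X, f₅(x) = Σ αᵢξᵢ is additive and homogeneous.

```python
# The functional f5(x) = sum_i a_i * xi_i on R^3, checked for linearity.

a = (2, -1, 5)  # the fixed element a = (alpha_1, alpha_2, alpha_3)

def f5(x):
    return sum(ai * xi for ai, xi in zip(a, x))

x = (1, 4, 0)
y = (3, -2, 1)
alpha = 7

additive = f5(tuple(u + v for u, v in zip(x, y))) == f5(x) + f5(y)
homogeneous = f5(tuple(alpha * u for u in x)) == alpha * f5(x)
value = f5(x)
```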
3.5.9. Exercise. Show that the mappings (3.5.3), (3.5.4), (3.5.5), (3.5.7), and (3.5.8) are linear functionals.

Now let X be a linear space and let X' denote the set of all linear functionals on X. If f ∈ X' is evaluated at a point x ∈ X, we write f(x). Frequently we will also find the notation

f(x) ≜ ⟨x, f⟩   (3.5.10)

useful. In addition to Eq. (3.5.10), the notation x'(x) = f(x) or x'x is sometimes used. In this case Eq. (3.5.10) becomes

⟨x, f⟩ = ⟨x, x'⟩,   (3.5.11)

where x' is used in place of f. Now let f₁ = x'₁ and f₂ = x'₂ belong to X', and let α ∈ F. Let us define f₁ + f₂ = x'₁ + x'₂ and αf = αx' by

(f₁ + f₂)(x) = f₁(x) + f₂(x), i.e., ⟨x, x'₁ + x'₂⟩ ≜ ⟨x, x'₁⟩ + ⟨x, x'₂⟩,   (3.5.12)

and

(αf)(x) = αf(x), i.e., ⟨x, αx'⟩ ≜ α⟨x, x'⟩,   (3.5.13)

respectively. We denote the functional f = x' such that f(x) = x'(x) = 0 for all x ∈ X by 0. If f is a linear functional, then we note that

f(x₁ + x₂) = ⟨x₁ + x₂, x'⟩ = ⟨x₁, x'⟩ + ⟨x₂, x'⟩ = f(x₁) + f(x₂)   (3.5.14)

and also

f(αx) = ⟨αx, x'⟩ = α⟨x, x'⟩ = αf(x).   (3.5.15)
It is now a simple matter to prove the following:

3.5.16. Theorem. The space X' with vector addition and multiplication of vectors by scalars defined by Eqs. (3.5.12) and (3.5.13), respectively, is a vector space over F.

3.5.17. Exercise. Prove Theorem 3.5.16.

3.5.18. Definition. The linear space X' is called the algebraic conjugate of X.

Let us now examine some of the properties of X' for the case of finite-dimensional linear spaces. We have:

3.5.19. Theorem. Let X be a finite-dimensional vector space, and let {e₁, ..., eₙ} be a basis for X. If {α₁, ..., αₙ} is an arbitrary set of scalars, then there is a unique linear functional x' ∈ X' such that ⟨eᵢ, x'⟩ = αᵢ for i = 1, ..., n.

Proof. For every x ∈ X we have

x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ.

Now let x' ∈ X' be given by

⟨x, x'⟩ = Σᵢ₌₁ⁿ αᵢξᵢ.

If x = eᵢ for some i, we have ξᵢ = 1 and ξⱼ = 0 if j ≠ i. Thus, ⟨eᵢ, x'⟩ = αᵢ for i = 1, ..., n. To show that x' is unique, suppose there is an x̃' ∈ X' such that ⟨eᵢ, x̃'⟩ = αᵢ for i = 1, ..., n. It then follows that ⟨eᵢ, x̃'⟩ − ⟨eᵢ, x'⟩ = 0 for i = 1, ..., n, and so ⟨eᵢ, x̃' − x'⟩ = 0 for i = 1, ..., n. This implies x̃' − x' = 0; i.e., x̃' = x'. ■
In our next result and on several other occasions throughout this book, we make use of the Kronecker delta.

3.5.20. Definition. Let i, j = 1, ..., n. Then

δᵢⱼ = 1 if i = j, and δᵢⱼ = 0 if i ≠ j,   (3.5.21)

is called the Kronecker delta.

We now have:
3.5.22. Theorem. Let X be a finite-dimensional vector space. If {e₁, e₂, ..., eₙ} is a basis for X, then there is a unique basis {e'₁, e'₂, ..., e'ₙ} in X' with the property that ⟨eᵢ, e'ⱼ⟩ = δᵢⱼ. From this it follows that if X is n-dimensional, then so is X'.

Proof. From Theorem 3.5.19 it follows that for each j = 1, ..., n a unique e'ⱼ ∈ X' can be found such that ⟨eᵢ, e'ⱼ⟩ = δᵢⱼ. Thus, we only have to show that the set {e'₁, e'₂, ..., e'ₙ} is a linearly independent set which spans X'.

To show that {e'₁, e'₂, ..., e'ₙ} is linearly independent, let

β₁e'₁ + β₂e'₂ + ... + βₙe'ₙ = 0.

Then

0 = ⟨eⱼ, Σᵢ₌₁ⁿ βᵢe'ᵢ⟩ = Σᵢ₌₁ⁿ βᵢ⟨eⱼ, e'ᵢ⟩ = Σᵢ₌₁ⁿ βᵢδⱼᵢ = βⱼ,

and therefore we have β₁ = β₂ = ... = βₙ = 0. This proves that {e'₁, e'₂, ..., e'ₙ} is a linearly independent set.

To show that the set {e'₁, e'₂, ..., e'ₙ} spans X', let x' ∈ X' and define αᵢ = ⟨eᵢ, x'⟩. Let x = ξ₁e₁ + ... + ξₙeₙ. We then have

⟨x, x'⟩ = ξ₁⟨e₁, x'⟩ + ... + ξₙ⟨eₙ, x'⟩ = ξ₁α₁ + ... + ξₙαₙ.

Also,

⟨x, e'ⱼ⟩ = Σᵢ₌₁ⁿ ξᵢ⟨eᵢ, e'ⱼ⟩ = ξⱼ.

Combining the above relations we now have

⟨x, x'⟩ = α₁⟨x, e'₁⟩ + ... + αₙ⟨x, e'ₙ⟩ = ⟨x, α₁e'₁ + ... + αₙe'ₙ⟩.

From this it now follows that for any x' ∈ X' we have

x' = α₁e'₁ + ... + αₙe'ₙ,

which proves our theorem. ■
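For X = Rⁿ the dual basis of Theorem 3.5.22 can be computed directly. The sketch below is ours: representing each functional e'ⱼ by a row vector, the conditions ⟨eᵢ, e'ⱼ⟩ = δᵢⱼ say that the matrix whose rows are the e'ⱼ is the inverse of the matrix whose columns are the eᵢ; for n = 2 that inverse can be written down explicitly.

```python
# Dual basis in R^2: rows e1p, e2p satisfying <e_i, e'_j> = delta_ij,
# obtained by inverting the 2x2 matrix with columns e1, e2.
from fractions import Fraction

e1 = (Fraction(1), Fraction(1))
e2 = (Fraction(1), Fraction(2))

det = e1[0] * e2[1] - e2[0] * e1[1]
e1p = ( e2[1] / det, -e2[0] / det)  # row representing e'_1
e2p = (-e1[1] / det,  e1[0] / det)  # row representing e'_2

def pair(x, xp):
    """<x, x'> for a vector x and a functional (row) x'."""
    return sum(u * v for u, v in zip(x, xp))
```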
The previous result motivates the following definition.

3.5.23. Definition. The basis {e'₁, e'₂, ..., e'ₙ} of X' in Theorem 3.5.22 is called the dual basis of {e₁, e₂, ..., eₙ}.
We are now in a position to consider the algebraic transpose of a linear transformation. Let S be a linear transformation of a linear space X into a linear space Y, and let X' and Y' denote the algebraic conjugates of X and Y, respectively (the spaces X and Y need not be finite dimensional). For each y' ∈ Y' let us establish a correspondence with an element x' ∈ X' according to the rule

x'(x) = ⟨x, x'⟩ = ⟨Sx, y'⟩ = y'(Sx),   (3.5.24)

where x ∈ X. Let us denote the mapping defined in this way by Sᵀ: Sᵀy' = x', and let us rewrite Eq. (3.5.24) as

⟨x, Sᵀy'⟩ = ⟨Sx, y'⟩,  x ∈ X, y' ∈ Y',   (3.5.25)

to define Sᵀ. It should be noted that if S is a mapping of X into Y, then Sᵀ is a mapping of Y' into X', as depicted in Figure K. We now state the following formal definition.

3.5.26. Figure K. Transpose of a linear transformation. [Diagram: S maps X into Y; Sᵀ maps Y' into X'.]

3.5.27. Definition. Let S be a linear transformation of a linear space X into a linear space Y over the same field F, and let X' and Y' denote the algebraic conjugates of X and Y, respectively. A transformation Sᵀ from Y' into X' such that

⟨x, Sᵀy'⟩ = ⟨Sx, y'⟩

for all x ∈ X and all y' ∈ Y' is called the (algebraic) transpose of S.
We now show that Sᵀ is a linear transformation.

3.5.28. Theorem. Let S ∈ L(X, Y), and let Sᵀ be the transpose of S. Then Sᵀ is a linear transformation from Y' into X'.

Proof. Let α ∈ F, and let y'₁, y'₂ ∈ Y'. Then for all x ∈ X,

⟨x, Sᵀ(y'₁ + y'₂)⟩ = ⟨Sx, y'₁ + y'₂⟩ = ⟨Sx, y'₁⟩ + ⟨Sx, y'₂⟩ = ⟨x, Sᵀy'₁⟩ + ⟨x, Sᵀy'₂⟩.

Hence, Sᵀ(y'₁ + y'₂) = Sᵀ(y'₁) + Sᵀ(y'₂). Also,

⟨x, Sᵀ(αy'₁)⟩ = ⟨Sx, αy'₁⟩ = α⟨Sx, y'₁⟩ = α⟨x, Sᵀy'₁⟩ = ⟨x, αSᵀy'₁⟩.

Hence, Sᵀ(αy'₁) = αSᵀ(y'₁). Therefore, Sᵀ ∈ L(Y', X'). ■
The reader should now have no difficulties in proving the following results.

3.5.29. Theorem. Let R, S ∈ L(X, Y), and let T ∈ L(Y, Z). Let Rᵀ, Sᵀ, and Tᵀ be the transpose transformations of R, S, and T, respectively. Then

(i) (R + S)ᵀ = Rᵀ + Sᵀ; and
(ii) (TS)ᵀ = SᵀTᵀ.

3.5.30. Theorem. Let I denote the identity element of L(X, X). Then Iᵀ is the identity element of L(X', X').

3.5.31. Theorem. Let 0 be the null transformation in L(X, Y). Then 0ᵀ is the null transformation in L(Y', X').

3.5.32. Exercise. Prove Theorems 3.5.29 through 3.5.31.

We will consider an important class of transpose linear transformations in Chapter 4 (transpose of a matrix).
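The defining relation ⟨x, Sᵀy'⟩ = ⟨Sx, y'⟩ can be verified by hand in finite dimensions. The sketch below is ours, anticipating the matrix case of Chapter 4: with functionals on R² represented as row vectors, Sᵀ sends the row y' to the row y'S, and the two pairings agree.

```python
# Transpose on R^2: functionals are row vectors, S is a 2x2 matrix,
# and S^T acts on a row y' as y' -> y'S.  Check <x, S^T y'> = <Sx, y'>.

def mat_vec(A, x):
    return tuple(sum(A[i][k] * x[k] for k in range(2)) for i in range(2))

def row_mat(yp, A):
    """The row vector y'A, i.e., the functional S^T y' when A = S."""
    return tuple(sum(yp[k] * A[k][j] for k in range(2)) for j in range(2))

def pair(x, xp):
    return sum(u * v for u, v in zip(x, xp))

S = [[1, 2], [3, 4]]
x = (5, -1)
yp = (2, 7)

lhs = pair(x, row_mat(yp, S))  # <x, S^T y'>
rhs = pair(mat_vec(S, x), yp)  # <Sx, y'>
```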
3.6. BILINEAR FUNCTIONALS

In the present section we introduce the notion of bilinear functional and examine some of the properties of this concept. Throughout the present section we concern ourselves only with real vector spaces or complex vector spaces. Thus, if X is a linear space over a field F, it will be assumed that F is either the field of real numbers, R, or the field of complex numbers, C.

3.6.1. Definition. Let X be a vector space over C. A mapping g from X into C is said to be a conjugate functional if

g(αx + βy) = ᾱg(x) + β̄g(y)   (3.6.2)

for all x, y ∈ X and for all α, β ∈ C, where ᾱ denotes the complex conjugate of α and β̄ denotes the complex conjugate of β.

If in Definition 3.6.1 the complex vector space is replaced by a real linear space, then the concept of conjugate functional reduces to that of linear functional, for in this case Eq. (3.6.2) assumes the form

g(αx + βy) = αg(x) + βg(y)   (3.6.3)

for all x, y ∈ X and for all α, β ∈ R.

3.6.4. Definition. Let X be a vector space over C. A mapping g of X × X into C is called a bilinear functional or a bilinear form if

(i) for each fixed y, g(x, y) is a linear functional in x; and
(ii) for each fixed x, g(x, y) is a conjugate functional in y.

Thus, if g is a bilinear functional, then

(a) g(αx + βy, z) = αg(x, z) + βg(y, z); and
(b) g(x, αy + βz) = ᾱg(x, y) + β̄g(x, z)

for all x, y, z ∈ X and for all α, β ∈ C.
For the case of real linear spaces the definition of bilinear functional is modified in an obvious way by deleting in Definition 3.6.4 the symbol for complex conjugates. We leave it as an exercise to verify that the examples cited below are bilinear functionals.

3.6.5. Example. Let x, y ∈ C², where C² denotes the linear space of ordered pairs of complex numbers (if x, y ∈ C², then x = (ξ₁, ξ₂) and y = (η₁, η₂)). The function

g(x, y) = ξ₁η̄₁ + ξ₂η̄₂

is a bilinear functional. ■

3.6.6. Example. Let x, y ∈ R², where R² denotes the linear space of ordered pairs of real numbers (if x, y ∈ R², then x = (ξ₁, ξ₂) and y = (η₁, η₂)). Let θ denote the angle between x and y. The dot product of two vectors, defined by

g(x, y) = x · y = ξ₁η₁ + ξ₂η₂ = (ξ₁² + ξ₂²)^(1/2) (η₁² + η₂²)^(1/2) cos θ,

is a bilinear functional. ■

3.6.7. Example. Let X be an arbitrary linear space over C, and let L(x) and P(y) denote two linear functionals on X. The transformation

g(x, y) = L(x)P̄(y),

where the bar denotes complex conjugation, is a bilinear functional. ■

3.6.8. Example. Let X be any linear space over C, and let g be a bilinear functional. The transformation h defined by

h(x, y) = ḡ(y, x)

is a bilinear functional. ■

3.6.9. Exercise. Verify that the transformations given in Examples 3.6.5 through 3.6.8 are bilinear functionals.

We note that for any bilinear functional g we have g(0, y) = g(0 · 0, y) = 0 · g(0, y) = 0 for all y ∈ X. Also, g(x, 0) = 0 for all x ∈ X.

Frequently, we find it convenient to impose certain restrictions on bilinear functionals.
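In the spirit of Exercise 3.6.9, the properties (a) and (b) of Definition 3.6.4 can be spot-checked numerically. The sketch below is ours, for the functional of Example 3.6.5 on C²: linearity in the first argument and conjugate-linearity in the second.

```python
# g(x, y) = xi1*conj(eta1) + xi2*conj(eta2) on C^2:
# linear in x, conjugate-linear in y.

def g(x, y):
    return x[0] * y[0].conjugate() + x[1] * y[1].conjugate()

x = (1 + 2j, 3 - 1j)
y = (2 - 1j, 1j)
z = (4 + 0j, -2 + 2j)
alpha = 2 - 3j

linear_in_x = g((alpha * x[0], alpha * x[1]), y) == alpha * g(x, y)
conjugate_in_y = g(x, (alpha * y[0], alpha * y[1])) == alpha.conjugate() * g(x, y)
additive_in_x = g((x[0] + z[0], x[1] + z[1]), y) == g(x, y) + g(z, y)
```

The Gaussian-integer entries keep every product exact in floating point, so the equalities hold exactly here.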
3.6.10. Definition. Let X be a complex linear space. A bilinear functional g is said to be symmetric if g(x, y) = ḡ(y, x) for all x, y ∈ X, where the bar denotes complex conjugation. If g(x, x) ≥ 0 for all x ∈ X, then g is said to be positive. If g(x, x) > 0 for all x ≠ 0, then g is said to be strictly positive.

3.6.11. Definition. Let X be a complex vector space, and let g be a bilinear functional. We call the function ĝ: X → C defined by

ĝ(x) = g(x, x)

for all x ∈ X the quadratic form induced by g (we frequently omit the phrase "induced by g").

For example, if g(x, y) = ξ₁η̄₁ + ξ₂η̄₂, as in Example 3.6.5, then ĝ(x) = ξ₁ξ̄₁ + ξ₂ξ̄₂ = |ξ₁|² + |ξ₂|². This is a quadratic form as studied in analytic geometry. For real linear spaces, Definitions 3.6.10 and 3.6.11 are again modified in an obvious way by ignoring complex conjugates.
3.6.12. Theorem. If ĝ is the quadratic form induced by a bilinear functional g, then

(1/2)[g(x, y) + g(y, x)] = ĝ((x + y)/2) − ĝ((x − y)/2).

Proof. By direct expansion we have

ĝ((x + y)/2) = g((x + y)/2, (x + y)/2) = (1/4)[g(x, x) + g(x, y) + g(y, x) + g(y, y)],

and also,

ĝ((x − y)/2) = (1/4)[g(x, x) − g(x, y) − g(y, x) + g(y, y)].

Thus,

(1/2)[g(x, y) + g(y, x)] = ĝ((x + y)/2) − ĝ((x − y)/2). ■
Our next result is commonly referred to as polarization.

3.6.13. Theorem. If ĝ is the quadratic form induced by a bilinear form g on a complex vector space X, then

g(x, y) = ĝ((x + y)/2) − ĝ((x − y)/2) + iĝ((x + iy)/2) − iĝ((x − iy)/2)   (3.6.14)

for every x, y ∈ X (here i = √(−1)).

Proof. From the proof of the last theorem we have

ĝ((x + y)/2) = (1/4)[g(x, x) + g(x, y) + g(y, x) + g(y, y)]

and

ĝ((x − y)/2) = (1/4)[g(x, x) − g(x, y) − g(y, x) + g(y, y)].

Also,

iĝ((x + iy)/2) = (1/4)[ig(x, x) + g(x, y) − g(y, x) + ig(y, y)]

and

−iĝ((x − iy)/2) = (1/4)[−ig(x, x) + g(x, y) − g(y, x) − ig(y, y)].

After combining the above four expressions, Eq. (3.6.14) results. ■

The reader can prove the next result readily.
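Equation (3.6.14) can be tested numerically. The check below is ours, for the bilinear functional of Example 3.6.5 on C² and its induced quadratic form: the four quadratic-form evaluations recover g(x, y).

```python
# Polarization: recover g(x, y) from the induced quadratic form g_hat.

def g(x, y):
    return x[0] * y[0].conjugate() + x[1] * y[1].conjugate()

def g_hat(x):
    return g(x, x)

def half(u, v, s):
    """The vector (u + s*v)/2, componentwise."""
    return tuple((ui + s * vi) / 2 for ui, vi in zip(u, v))

x = (1 + 1j, 2 - 1j)
y = (3 + 0j, -1 + 2j)

recovered = (g_hat(half(x, y, 1)) - g_hat(half(x, y, -1))
             + 1j * g_hat(half(x, y, 1j)) - 1j * g_hat(half(x, y, -1j)))
```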
3.6.15. Theorem. Let X be a complex vector space. If two bilinear functionals g and h are such that ĝ = ĥ, then g = h.

3.6.16. Exercise. Prove Theorem 3.6.15.

For symmetric bilinear functionals we have:

3.6.17. Theorem. A bilinear functional g on a complex vector space X is symmetric if and only if ĝ is real (i.e., ĝ(x) is real for all x ∈ X).

Proof. Suppose that g is symmetric; i.e., suppose that

g(x, y) = ḡ(y, x)

for all x, y ∈ X. Setting x = y, we obtain ĝ(x) = g(x, x) = ḡ(x, x); i.e., ĝ(x) equals its own complex conjugate. But this implies that ĝ is real.

Conversely, if ĝ(x) is real for all x ∈ X, then for h(x, y) = ḡ(y, x) we have

ĥ(x) = ḡ(x, x) = g(x, x) = ĝ(x).

Since ĥ = ĝ, it now follows from Theorem 3.6.15 that h = g, and thus g(x, y) = ḡ(y, x). ■
Note that Theorems 3.6.13, 3.6.15, and 3.6.17 hold only for complex vector spaces. Theorem 3.6.15 implies that a bilinear form is uniquely determined by its induced quadratic form, and Theorem 3.6.13 gives an explicit connection between g and ĝ. In the case of real spaces, these conclusions do not follow.

3.6.18. Example. Let X = R² with x = (ξ₁, ξ₂) ∈ R² and y = (η₁, η₂) ∈ R². Define the bilinear functionals g and h by

g(x, y) = ξ₁η₁ + 2ξ₂η₁ + 4ξ₁η₂ + 2ξ₂η₂

and

h(x, y) = ξ₁η₁ + 3ξ₂η₁ + 3ξ₁η₂ + 2ξ₂η₂.

Then ĝ(x) = ĥ(x), but g ≠ h. Note that h is symmetric whereas g is not. ■
Using bilinear functionals, we now introduce the very important concept of inner product.

3.6.19. Definition. A strictly positive, symmetric bilinear functional g on a complex linear space X is called an inner product.

For the case of real linear spaces, the definition of inner product is identical to the above definition.

Since in a given discussion the particular bilinear functional g is always specified, we will write (x, y) in place of g(x, y) to denote an inner product. Utilizing this notation, the inner product can alternatively be defined as a rule which assigns a scalar (x, y) to every x, y ∈ X (X is a complex vector space), having the following properties:

(i) (x, x) > 0 for all x ≠ 0, and (x, x) = 0 if x = 0;
(ii) (x, y) is the complex conjugate of (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and for all α, β ∈ C; and
(iv) (x, αy + βz) = ᾱ(x, y) + β̄(x, z) for all x, y, z ∈ X and for all α, β ∈ C.

In the case of real linear spaces, the preceding characterization of inner product is identical, except, of course, that we omit conjugates in (i)–(iv). We are now in a position to introduce the concept of inner product space.

3.6.20. Definition. A complex (real) linear space X on which a complex (real) inner product, (·, ·), is defined is called a complex (real) inner product space. In general, we denote this space by {X; (·, ·)}. If the particular inner product is understood, we simply write X to denote such a space (and we usually speak of an inner product space rather than a complex or real inner product space).

It should be noted that if two different inner products are defined on the same linear space X, say (·, ·)₁ and (·, ·)₂, then we have two different inner product spaces, namely, {X; (·, ·)₁} and {X; (·, ·)₂}. Now let {X; (·, ·)'} be an inner product space, let Y be a linear subspace of X, and let (·, ·)'' denote the inner product on Y induced by the inner product on X; i.e.,

(x, y)' = (x, y)''   (3.6.21)

for all x, y ∈ Y ⊂ X. Then {Y; (·, ·)''} is an inner product space in its own right, and we say that Y is an inner product subspace of X.

Using the concept of inner product, we are in a position to introduce the notion of orthogonality. We have:

3.6.22. Definition. Let X be an inner product space. The vectors x, y ∈ X are said to be orthogonal if (x, y) = 0. In this case we write x ⊥ y. If a vector x ∈ X is orthogonal to every vector of a set Y ⊂ X, then x is said to be orthogonal to the set Y, and we write x ⊥ Y. If every vector of a set Y ⊂ X is orthogonal to every vector of a set Z ⊂ X, then the set Y is said to be orthogonal to the set Z, and we write Y ⊥ Z.

Clearly, if x is orthogonal to y, then y is orthogonal to x. Note that if x ≠ 0, then it is not possible that x ⊥ x, because (x, x) > 0 for all x ≠ 0. Also note that 0 ⊥ x for all x ∈ X.
Before closing the present section, let us consider a few specific examples.

3.6.23. Example. Let X = Rⁿ. For x = (ξ₁, ..., ξₙ) ∈ Rⁿ and y = (η₁, ..., ηₙ) ∈ Rⁿ, we can readily verify that

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ

is an inner product, and {X; (·, ·)} is a real inner product space. ■

3.6.24. Example. Let X = Cⁿ. For x = (ξ₁, ..., ξₙ) ∈ Cⁿ and y = (η₁, ..., ηₙ) ∈ Cⁿ, let

(x, y) = Σᵢ₌₁ⁿ ξᵢη̄ᵢ.

Then (x, y) is an inner product and {X; (·, ·)} is a complex inner product space. ■

3.6.25. Example. Let X denote the space of continuous complex-valued functions on the interval [0, 1]. The reader can readily show that for f, g ∈ X,

(f, g) = ∫₀¹ f(t)ḡ(t) dt

is an inner product. Now consider the family of functions {fₙ} defined by

fₙ(t) = e^(i2πnt),  t ∈ [0, 1],

n = 0, ±1, ±2, .... Clearly, fₙ ∈ X for all n. It is easily shown that (fₘ, fₙ) = 0 if m ≠ n. Thus, fₘ ⊥ fₙ if m ≠ n. ■

3.7. PROJECTIONS

In the present section we consider another special class of linear transformations, called projections. Such transformations, which utilize direct sums (introduced in Section 3.2) as their natural setting, will find wide applications in later parts of this book.

3.7.1. Definition. Let X be the direct sum of linear spaces X₁ and X₂; i.e., let X = X₁ ⊕ X₂. Let x = x₁ + x₂ be the unique representation of x ∈ X, where x₁ ∈ X₁ and x₂ ∈ X₂. We say that the projection on X₁ along X₂ is the transformation defined by

P(x) = x₁.

3.7.2. Figure L. Projection on X₁ along X₂. [Diagram: the plane X with lines X₁ and X₂ intersecting at the origin 0, and x = x₁ + x₂.]

Referring to Figure L, we note that elements in the plane X can uniquely be represented as x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂ (X₁ and X₂ are one-dimensional linear spaces represented by the indicated lines intersecting at the origin 0). In this case, a projection P can be defined as that
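A concrete instance of Definition 3.7.1, in our own numbers: take X = R², X₁ = span{(1, 0)}, and X₂ = span{(1, 1)}. Splitting x = x₁ + x₂ with x₁ ∈ X₁ and x₂ ∈ X₂ and keeping x₁ gives the projection on X₁ along X₂.

```python
# Projection on X1 = span{(1,0)} along X2 = span{(1,1)} in R^2.

def project_on_X1_along_X2(x):
    # Solve x = a*(1, 0) + b*(1, 1): b is forced by the second coordinate.
    b = x[1]
    a = x[0] - b
    return (a, 0)  # the X1-component x1

x = (5, 2)
x1 = project_on_X1_along_X2(x)
x2 = (x[0] - x1[0], x[1] - x1[1])  # the X2-component
```

Note that this is not the perpendicular drop of x onto the ξ₁-axis; the "shadow" is cast along the direction (1, 1).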
transformation which maps every point x in the plane X onto the subspace X₁ along the subspace X₂.

3.7.3. Theorem. Let X be the direct sum of two linear subspaces X₁ and X₂, and let P be the projection on X₁ along X₂. Then

(i) P ∈ L(X, X);
(ii) ℜ(P) = X₁; and
(iii) 𝔑(P) = X₂.

Proof. To prove the first part, note that if x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ X₁ and x₂, y₂ ∈ X₂, then clearly

P(αx + βy) = P((αx₁ + βy₁) + (αx₂ + βy₂)) = αx₁ + βy₁ = αP(x) + βP(y),

and therefore P is a linear transformation.

To prove the second part of the theorem, we note that from the definition of P it follows that ℜ(P) ⊂ X₁. Now assume that x₁ ∈ X₁. Then Px₁ = x₁, and thus x₁ ∈ ℜ(P). This implies that X₁ ⊂ ℜ(P) and proves that ℜ(P) = X₁.

To prove the last part of the theorem, let x₂ ∈ X₂. Then Px₂ = 0, so that X₂ ⊂ 𝔑(P). On the other hand, if x ∈ 𝔑(P), then Px = 0. Since x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂, it follows that x₁ = 0 and x ∈ X₂. Thus, X₂ ⊃ 𝔑(P). Therefore, X₂ = 𝔑(P). ■

Our next result enables us to characterize projections in an alternative way.

3.7.4. Theorem. Let P ∈ L(X, X). Then P is a projection on ℜ(P) along 𝔑(P) if and only if PP = P² = P.

Proof. Assume that P is the projection on the linear subspace X₁ of X along the linear subspace X₂, where X = X₁ ⊕ X₂. By the preceding theorem, X₁ = ℜ(P) and X₂ = 𝔑(P). For x ∈ X, we have x = x₁ + x₂, where x₁ ∈ X₁ and x₂ ∈ X₂. Then

P²x = P(Px) = Px₁ = x₁ = Px,

and thus P² = P.

Conversely, let us assume that P² = P. Let X₂ = 𝔑(P) and let X₁ = ℜ(P). Clearly, 𝔑(P) and ℜ(P) are linear subspaces of X. We must show that X = ℜ(P) ⊕ 𝔑(P) = X₁ ⊕ X₂. In particular, we must show that ℜ(P) ∩ 𝔑(P) = {0} and that ℜ(P) and 𝔑(P) span X. Now if y ∈ ℜ(P), there exists an x ∈ X such that Px = y. Thus, P²x = Py = Px = y. If y ∈ 𝔑(P), then Py = 0. Thus, if y is in both ℜ(P) and 𝔑(P), then we must have y = 0; i.e., ℜ(P) ∩ 𝔑(P) = {0}. Next, let x be an arbitrary element in X. Then we have

x = Px + (I − P)x.

Letting Px = x₁ and (I − P)x = x₂, we have Px₁ = P²x = Px = x₁, and also Px₂ = P(I − P)x = Px − P²x = Px − Px = 0; i.e., x₁ ∈ X₁ and x₂ ∈ X₂. From this it follows that X = X₁ ⊕ X₂ and that the projection on X₁ along X₂ is P. ■
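Theorem 3.7.4 is easy to check on a matrix. The sketch below is ours: the 2×2 matrix P represents the projection on span{(1, 0)} along span{(1, 1)} in R², and it satisfies P² = P, with every x splitting as Px + (I − P)x.

```python
# Idempotency P^2 = P for the matrix of the projection on span{(1,0)}
# along span{(1,1)}, and the splitting x = Px + (I - P)x.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(A, x):
    return tuple(sum(A[i][k] * x[k] for k in range(2)) for i in range(2))

P = [[1, -1],
     [0,  0]]

P2 = mat_mul(P, P)

x = (5, 2)
Px = mat_vec(P, x)
residual = (x[0] - Px[0], x[1] - Px[1])  # (I - P)x, lies in the null space
```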
The preceding result gives rise to the following:

3.7.5. Definition. Let P ∈ L(X, X). Then P is said to be idempotent if P² = P.

Now let P be the projection on a linear subspace X₁ along a linear subspace X₂. Then the projection on X₂ along X₁ is characterized in the following way.

3.7.6. Theorem. A linear transformation P is a projection on a linear subspace if and only if (I − P) is a projection. If P is the projection on X₁ along X₂, then (I − P) is the projection on X₂ along X₁.

3.7.7. Exercise. Prove Theorem 3.7.6.

In view of the preceding results there is no ambiguity in simply saying a transformation P is a projection (rather than P is a projection on X₁ along X₂). We emphasize here that if P is a projection, then

X = ℜ(P) ⊕ 𝔑(P).   (3.7.8)

This is not necessarily the case for arbitrary linear transformations T ∈ L(X, X), for, in general, ℜ(T) and 𝔑(T) need not be disjoint. For example, if there exists a vector x ∈ X such that Tx ≠ 0 and such that T²x = 0, then Tx ∈ ℜ(T) and Tx ∈ 𝔑(T).
Let us now consider:

3.7.9. Definition. Let T ∈ L(X, X). A linear subspace Y of a vector space X is said to be invariant under the linear transformation T if y ∈ Y implies that Ty ∈ Y.

Note that this definition does not imply that every element in Y can be written in the form z = Ty, with y ∈ Y. It is not even assumed that Ty ∈ Y implies y ∈ Y.

For invariant subspaces under a transformation T ∈ L(X, X) we can readily prove the following result.

3.7.10. Theorem. Let T ∈ L(X, X). Then

(i) X is an invariant subspace under T;
(ii) {0} is an invariant subspace under T;
(iii) ℜ(T) is an invariant subspace under T; and
(iv) 𝔑(T) is an invariant subspace under T.

3.7.11. Exercise. Prove Theorem 3.7.10.
Next we consider:

3.7.12. Definition. Let X be a linear space which is the direct sum of two linear subspaces Y and Z; i.e., X = Y ⊕ Z. If Y and Z are both invariant under a linear transformation T, then T is said to be reduced by Y and Z.

We are now in a position to prove the following result.

3.7.13. Theorem. Let Y and Z be two linear subspaces of a vector space X such that X = Y ⊕ Z. Let T ∈ L(X, X). Then T is reduced by Y and Z if and only if PT = TP, where P is the projection on Y along Z.

Proof. Assume that PT = TP. If y ∈ Y, then Ty = TPy = PTy, so that Ty ∈ Y and Y is invariant under T. Now let y ∈ Z. Then Py = 0 and PTy = TPy = T0 = 0. Thus, Ty ∈ Z and Z is also invariant under T. Hence, T is reduced by Y and Z.

Conversely, let us assume that T is reduced by Y and Z. If x ∈ X, then x = y + z, where y ∈ Y and z ∈ Z. Then Px = y and TPx = Ty ∈ Y. Hence, PTPx = Ty = TPx; i.e.,

PTPx = TPx   (3.7.14)

for all x ∈ X. On the other hand, since Y and Z are invariant under T, we have Tx = Ty + Tz with Ty ∈ Y and Tz ∈ Z. Hence, PTx = Ty = PTy = PTPx; i.e.,

PTPx = PTx   (3.7.15)

for all x ∈ X. Equations (3.7.14) and (3.7.15) imply that PT = TP. ■
We close the present section by considering the following special type of projection.

3.7.16. Definition. A projection P on an inner product space X is said to be an orthogonal projection if the range of P and the null space of P are orthogonal; i.e., if ℜ(P) ⊥ 𝔑(P).

We will consider examples and additional properties of projections in much greater detail in Chapters 4 and 7.
3.8. NOTES AND REFERENCES

The material of the present chapter, as well as that of the next chapter, is usually referred to as linear algebra. Thus, these two chapters should be viewed as one package. For this reason, applications (dealing with ordinary differential equations) are presented at the end of the next chapter.

There are many textbooks and reference works dealing with vector spaces and linear transformations. Some of these, which we have found to be very useful, are cited in the references for this chapter. The reader should consult these for further study.
REFERENCES

[3.1] P. R. HALMOS, Finite-Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[3.2] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1971.
[3.3] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[3.4] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1966.
4

FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES

In the present chapter we examine some of the properties of finite-dimensional linear spaces. We will show how elements of such spaces are represented by coordinate vectors and how linear transformations on such spaces are represented by means of matrices. We then will study some of the important properties of matrices. Also, we will investigate in some detail a special type of vector space, called the Euclidean space. This space is one of the most important spaces encountered in applied mathematics.

Throughout this chapter, {α₁, ..., αₙ}, αᵢ ∈ F, and {x₁, ..., xₙ}, xᵢ ∈ X, denote an indexed set of scalars and an indexed set of vectors, respectively.
4.1. COORDINATE REPRESENTATION OF VECTORS
Let X be a finite-dimensional linear space over a field F, and let {x₁, ..., xₙ} be a basis for X. Now if x ∈ X, then according to Theorem 3.3.25 and Definition 3.3.36, there exist unique scalars ξ₁, ..., ξₙ, called the coordinates of x with respect to this basis, such that

    x = ξ₁x₁ + ··· + ξₙxₙ.    (4.1.1)

This enables us to represent x unambiguously in terms of its coordinates as

    x = [ξ₁
         ⋮
         ξₙ],    (4.1.2)

or as

    xᵀ = (ξ₁, ..., ξₙ).    (4.1.3)

We call x (or xᵀ) the coordinate representation of the underlying object (vector) x with respect to the basis {x₁, ..., xₙ}. We call x a column vector and xᵀ a row vector. Also, we say that xᵀ is the transpose vector, or simply the transpose, of the vector x. Furthermore, we define (xᵀ)ᵀ to be x.
It is important to note that in the coordinate representation (4.1.2) or (4.1.3) of the vector (4.1.1), an "ordering" of the basis {x₁, ..., xₙ} is employed (i.e., the coefficient of xᵢ is the ith entry in Eqs. (4.1.2) and (4.1.3)). If the members of this basis were to be relabeled, thus specifying a different "ordering," then the corresponding coordinate representation of the vector x would have to be altered to reflect this change. However, this does not pose any difficulties, because in a given discussion we will always agree on a particular "ordering" of the basis vectors.
Now let α ∈ F. Then
    αx = α(ξ₁x₁ + ··· + ξₙxₙ) = (αξ₁)x₁ + ··· + (αξₙ)xₙ.    (4.1.4)

In view of Eqs. (4.1.1)-(4.1.4) it now follows that the coordinate representation of αx with respect to the basis {x₁, ..., xₙ} is given by

    αx = [αξ₁
          ⋮
          αξₙ],    (4.1.5)

or

    (αx)ᵀ = α(ξ₁, ..., ξₙ) = (αξ₁, ..., αξₙ).    (4.1.6)

Next, let y ∈ X, where

    y = η₁x₁ + ··· + ηₙxₙ.    (4.1.7)

The coordinate representation of y with respect to the basis {x₁, ..., xₙ} is, of course,
    y = [η₁
         ⋮
         ηₙ],    (4.1.8)

or

    yᵀ = (η₁, ..., ηₙ).    (4.1.9)

Now

    x + y = (ξ₁x₁ + ··· + ξₙxₙ) + (η₁x₁ + ··· + ηₙxₙ)
          = (ξ₁ + η₁)x₁ + ··· + (ξₙ + ηₙ)xₙ.    (4.1.10)

From Eq. (4.1.10) it now follows that the coordinate representation of the vector x + y ∈ X with respect to the basis {x₁, ..., xₙ} is given by

    x + y = [ξ₁     [η₁     [ξ₁ + η₁
             ⋮   +   ⋮   =   ⋮
             ξₙ]     ηₙ]     ξₙ + ηₙ],    (4.1.11)

or

    xᵀ + yᵀ = (ξ₁, ..., ξₙ) + (η₁, ..., ηₙ) = (ξ₁ + η₁, ..., ξₙ + ηₙ).    (4.1.12)
Next, let {u₁, ..., uₙ} and {v₁, ..., vₙ} be two different bases for the linear space X. Then clearly there exist two different but unique sets of scalars (i.e., coordinates) {α₁, ..., αₙ} and {β₁, ..., βₙ} such that

    x = α₁u₁ + ··· + αₙuₙ = β₁v₁ + ··· + βₙvₙ.    (4.1.13)

This enables us to represent the same vector x ∈ X with respect to two different bases in terms of two different but unique sets of coordinates, namely,

    [α₁              [β₁
     ⋮      and       ⋮
     αₙ]              βₙ].    (4.1.14)
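Numerically, the two coordinate vectors in (4.1.14) are obtained by solving the linear system (4.1.13) for the unknown coefficients. A minimal sketch (the helper name `coordinates` is ours, not the book's) using Gaussian elimination:

```python
def coordinates(basis, x):
    """Solve c1*v1 + ... + cn*vn = x for the coordinates (c1, ..., cn)
    by Gaussian elimination with partial pivoting."""
    n = len(basis)
    # Augmented matrix whose columns are the basis vectors.
    aug = [[float(basis[j][i]) for j in range(n)] + [float(x[i])]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            m = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= m * aug[col][c]
    coords = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * coords[c] for c in range(r + 1, n))
        coords[r] = (aug[r][n] - s) / aug[r][r]
    return coords

# The same vector x = (3, 5) with respect to two bases of R^2:
print(coordinates([(1, 0), (0, 1)], (3, 5)))   # natural basis
print(coordinates([(1, 0), (1, 1)], (3, 5)))   # a second basis
```

The same vector is represented by (3, 5) in the natural basis and by (−2, 5) in the second basis, illustrating the dependence of the coordinate vector on the choice of basis.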
The next two examples are intended to throw additional light on the above discussion.
4.1.15. Example. Let x = (ξ₁, ..., ξₙ) ∈ Rⁿ. Let u₁ = (1, 0, ..., 0), u₂ = (0, 1, 0, ..., 0), ..., uₙ = (0, ..., 0, 1). It is readily shown that the set {u₁, ..., uₙ} is a basis for Rⁿ. We call this basis the natural basis for Rⁿ. Noting that

    x = ξ₁u₁ + ··· + ξₙuₙ,    (4.1.16)

the unambiguous coordinate representation of x ∈ Rⁿ with respect to the natural basis of Rⁿ is

    x = [ξ₁
         ⋮
         ξₙ],    (4.1.17)

or

    xᵀ = (ξ₁, ..., ξₙ).

Moreover, the coordinate representations of the basis vectors u₁, u₂, ..., uₙ are

    u₁ = [1       u₂ = [0       ...,   uₙ = [0
          0             1                    0
          ⋮             ⋮                    ⋮
          0],           0],                  1],    (4.1.18)

respectively. We call the coordinates in Eq. (4.1.17) the natural coordinates of x ∈ Rⁿ. (The natural basis for Fⁿ and the natural coordinates of x ∈ Fⁿ are similarly defined.)
Next, consider the set of vectors {v₁, ..., vₙ} given by v₁ = (1, 0, ..., 0), v₂ = (1, 1, 0, ..., 0), ..., vₙ = (1, ..., 1). We see that the vectors {v₁, ..., vₙ} form a basis for Rⁿ. We can express the vector x given in Eq. (4.1.16) in terms of this basis by

    x = α₁v₁ + ··· + αₙvₙ,    (4.1.19)

where αₙ = ξₙ and αᵢ = ξᵢ − ξᵢ₊₁ for i = 1, 2, ..., n − 1. Thus, the coordinate representation of x relative to {v₁, ..., vₙ} is given by

    [α₁         [ξ₁ − ξ₂
     α₂          ξ₂ − ξ₃
     ⋮      =    ⋮
     αₙ₋₁        ξₙ₋₁ − ξₙ
     αₙ]         ξₙ].    (4.1.20)

Hence, we have represented the same vector x ∈ Rⁿ by two different coordinate vectors with respect to two different bases for Rⁿ. ∎

4.1.21. Example. Let X = C[a, b], the set of all real-valued continuous functions on the interval [a, b]. Let Y = {x₀, x₁, ..., xₙ} ⊂ X, where x₀(t) = 1 and xᵢ(t) = tⁱ for all t ∈ [a, b], i = 1, ..., n. As we saw in Exercise 3.3.13, Y is a linearly independent set in X, and as such it is a basis for V(Y).
Hence, for any y ∈ V(Y) there exists a unique set of scalars {η₀, η₁, ..., ηₙ} such that

    y = η₀x₀ + ··· + ηₙxₙ.    (4.1.22)

Since y is a polynomial in t we can write, more explicitly,

    y(t) = η₀ + η₁t + ··· + ηₙtⁿ,   t ∈ [a, b].    (4.1.23)

In the present example there is also a coordinate representation; i.e., we can represent y ∈ V(Y) by

    y = [η₀
         η₁
         ⋮
         ηₙ].    (4.1.24)

This representation is with respect to the basis {x₀, x₁, ..., xₙ} in V(Y). We could, of course, also have used another basis for V(Y). For example, let us choose the basis {z₀, z₁, ..., zₙ} for V(Y) given in Exercise 3.3.13. Then we have

    y = α₀z₀ + α₁z₁ + ··· + αₙzₙ,    (4.1.25)

where αₙ = ηₙ and αᵢ = ηᵢ − ηᵢ₊₁, i = 0, 1, ..., n − 1. Thus, y ∈ V(Y) may also be represented with respect to the basis {z₀, z₁, ..., zₙ} by

    [α₀         [η₀ − η₁
     α₁          η₁ − η₂
     ⋮      =    ⋮
     αₙ₋₁        ηₙ₋₁ − ηₙ
     αₙ]         ηₙ].    (4.1.26)

Thus, two different coordinate vectors were used above in representing the same vector y ∈ V(Y) with respect to two different bases for V(Y). ∎

Summarizing, we observe:

1. Every vector x belonging to an n-dimensional linear space X over a field F can be represented in terms of a coordinate vector x, or its transpose xᵀ, with respect to a given basis {e₁, ..., eₙ} ⊂ X. We note that xᵀ ∈ Fⁿ (the space Fⁿ is defined in Example 3.1.7). By convention we will henceforth also write x ∈ Fⁿ. To indicate the coordinate representation of x ∈ X by x ∈ Fⁿ, we write x ~ x.
2. In representing x by x, an "ordering" of the basis {e₁, ..., eₙ} ⊂ X is implied.
3. Usage of different bases for X results in different coordinate representations of x ∈ X.

4.2. MATRICES
In this section we will first concern ourselves with the representation of linear transformations on finite-dimensional vector spaces. Such representations of linear transformations are called matrices. We will then examine the properties of matrices in great detail.
Throughout the present section, X will denote an n-dimensional vector space and Y an m-dimensional vector space over the same field F.
A. Representation of Linear Transformations by Matrices

We first prove the following result.

4.2.1. Theorem. Let {e₁, e₂, ..., eₙ} be a basis for a linear space X.

(i) Let A be a linear transformation from X into a vector space Y, and set e₁′ = Ae₁, e₂′ = Ae₂, ..., eₙ′ = Aeₙ. If x is any vector in X and if (ξ₁, ξ₂, ..., ξₙ) are the coordinates of x with respect to {e₁, e₂, ..., eₙ}, then Ax = ξ₁e₁′ + ξ₂e₂′ + ··· + ξₙeₙ′.
(ii) Let {e₁′, e₂′, ..., eₙ′} be any set of vectors in Y. Then there exists a unique linear transformation A from X into Y such that Ae₁ = e₁′, Ae₂ = e₂′, ..., Aeₙ = eₙ′.

Proof. To prove (i) we note that

    Ax = A(ξ₁e₁ + ξ₂e₂ + ··· + ξₙeₙ) = ξ₁Ae₁ + ξ₂Ae₂ + ··· + ξₙAeₙ
       = ξ₁e₁′ + ξ₂e₂′ + ··· + ξₙeₙ′.

To prove (ii), we first observe that for each x ∈ X we have unique scalars ξ₁, ξ₂, ..., ξₙ such that

    x = ξ₁e₁ + ξ₂e₂ + ··· + ξₙeₙ.

Now define a mapping A from X into Y as

    A(x) = ξ₁e₁′ + ··· + ξₙeₙ′.

Clearly, A(eᵢ) = eᵢ′ for i = 1, ..., n. We first must show that A is linear. Given x = ξ₁e₁ + ξ₂e₂ + ··· + ξₙeₙ and y = η₁e₁ + η₂e₂ + ··· + ηₙeₙ, we have

    A(x + y) = A[(ξ₁ + η₁)e₁ + ··· + (ξₙ + ηₙ)eₙ]
             = (ξ₁ + η₁)e₁′ + ··· + (ξₙ + ηₙ)eₙ′.

On the other hand,

    A(x) = ξ₁e₁′ + ··· + ξₙeₙ′   and   A(y) = η₁e₁′ + ··· + ηₙeₙ′.

Thus,

    A(x) + A(y) = (ξ₁ + η₁)e₁′ + ··· + (ξₙ + ηₙ)eₙ′ = A(x + y).

In an identical way we establish that

    A(αx) = αA(x)

for all x ∈ X and all α ∈ F. It thus follows that A ∈ L(X, Y). To show that A is unique, suppose there exists a B ∈ L(X, Y) such that Beᵢ = eᵢ′ for i = 1, ..., n. It follows that (A − B)eᵢ = 0 for all i = 1, ..., n, and thus it follows from Exercise 3.4.64 that A = B. ∎
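Part (ii) of Theorem 4.2.1 can be mirrored computationally: prescribing the images of the basis vectors determines A at every x, by linearity. A small sketch (function names are ours), using the natural basis of Rⁿ:

```python
def make_transformation(images):
    """Given images[i] = A(e_i) for the natural basis {e_1, ..., e_n},
    return the map x -> A(x) = x_1*A(e_1) + ... + x_n*A(e_n)."""
    def A(x):
        m = len(images[0])
        return tuple(sum(x[i] * images[i][k] for i in range(len(images)))
                     for k in range(m))
    return A

# A maps e1 -> (1, 2) and e2 -> (0, 1); linearity then fixes A everywhere.
A = make_transformation([(1, 2), (0, 1)])
print(A((3, 4)))  # 3*(1, 2) + 4*(0, 1) = (3, 10)
```

Note that only the n images of the basis vectors are stored, yet the value of A is defined on all of Rⁿ, exactly as the theorem asserts.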
We point out that part (i) of Theorem 4.2.1 implies that a linear transformation is completely determined by knowing how it transforms the basis vectors in its domain, and part (ii) of Theorem 4.2.1 states that this linear transformation is uniquely determined in this way. We will utilize these facts in the following.
Now let X be an n-dimensional vector space, and let {e₁, e₂, ..., eₙ} be a basis for X. Let Y be an m-dimensional vector space, and let {f₁, f₂, ..., fₘ} be a basis for Y. Let A ∈ L(X, Y), and let eᵢ′ = Aeᵢ for i = 1, ..., n. Since {f₁, f₂, ..., fₘ} is a basis for Y, there are unique scalars {aᵢⱼ}, i = 1, ..., m, j = 1, ..., n, such that

    Ae₁ = e₁′ = a₁₁f₁ + a₂₁f₂ + ··· + aₘ₁fₘ
    Ae₂ = e₂′ = a₁₂f₁ + a₂₂f₂ + ··· + aₘ₂fₘ
    ......................................................
    Aeₙ = eₙ′ = a₁ₙf₁ + a₂ₙf₂ + ··· + aₘₙfₘ.    (4.2.2)

Now let x ∈ X. Then x has the unique representation

    x = ξ₁e₁ + ξ₂e₂ + ··· + ξₙeₙ

with respect to the basis {e₁, ..., eₙ}. In view of part (i) of Theorem 4.2.1 we have

    Ax = ξ₁e₁′ + ··· + ξₙeₙ′.    (4.2.3)

Since Ax ∈ Y, Ax has a unique representation with respect to the basis {f₁, ..., fₘ}, say,

    Ax = η₁f₁ + η₂f₂ + ··· + ηₘfₘ.    (4.2.4)
Combining Equations (4.2.2) and (4.2.3), we have

    Ax = ξ₁(a₁₁f₁ + ··· + aₘ₁fₘ) + ξ₂(a₁₂f₁ + ··· + aₘ₂fₘ)
         + ··· + ξₙ(a₁ₙf₁ + ··· + aₘₙfₘ).

Rearranging the last expression we have

    Ax = (a₁₁ξ₁ + a₁₂ξ₂ + ··· + a₁ₙξₙ)f₁ + (a₂₁ξ₁ + a₂₂ξ₂ + ··· + a₂ₙξₙ)f₂
         + ··· + (aₘ₁ξ₁ + aₘ₂ξ₂ + ··· + aₘₙξₙ)fₘ.

However, in view of the uniqueness of the representation in Eq. (4.2.4) we have

    η₁ = a₁₁ξ₁ + a₁₂ξ₂ + ··· + a₁ₙξₙ
    η₂ = a₂₁ξ₁ + a₂₂ξ₂ + ··· + a₂ₙξₙ
    ......................................................
    ηₘ = aₘ₁ξ₁ + aₘ₂ξ₂ + ··· + aₘₙξₙ.    (4.2.5)

This set of equations enables us to represent the linear transformation A from linear space X into linear space Y by the unique scalars {aᵢⱼ}, i = 1, ..., m, j = 1, ..., n. For convenience we let

    A = [aᵢⱼ] = [ a₁₁  a₁₂  ···  a₁ₙ
                  a₂₁  a₂₂  ···  a₂ₙ
                  ···
                  aₘ₁  aₘ₂  ···  aₘₙ ].    (4.2.6)

We see that once the bases {e₁, e₂, ..., eₙ}, {f₁, f₂, ..., fₘ} are fixed, we can represent the linear transformation A by the array of scalars in Eq. (4.2.6), which are uniquely determined by Eq. (4.2.2). In view of part (ii) of Theorem 4.2.1, the converse to the preceding also holds. Specifically, with the bases for X and Y still fixed, the array given in Eq. (4.2.6) is uniquely associated with the linear transformation A of X into Y.
The above discussion justifies the following important definition.
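Equation (4.2.2) says that the jth column of A consists of the coordinates of Aeⱼ with respect to {f₁, ..., fₘ}. For maps from Rⁿ to Rᵐ with natural bases on both sides, this gives a direct recipe (a sketch; the helper name is ours):

```python
def matrix_of(A, n):
    """Matrix of a linear map A: R^n -> R^m with respect to the natural
    bases: column j holds the coordinates of A(e_j), as in Eq. (4.2.2)."""
    cols = []
    for j in range(n):
        e = tuple(1 if i == j else 0 for i in range(n))
        cols.append(A(e))
    m = len(cols[0])
    # a[i][j] is the ith coordinate of A(e_j).
    return [[cols[j][i] for j in range(n)] for i in range(m)]

# The map (x1, x2) -> (x1 + 2*x2, 3*x1) has matrix [[1, 2], [3, 0]].
print(matrix_of(lambda x: (x[0] + 2 * x[1], 3 * x[0]), 2))
```

Changing either basis would change the columns, which is exactly why the matrix, unlike the transformation itself, depends on the chosen bases.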
4.2.7. Definition. The array given in Eq. (4.2.6) is called the matrix A of the linear transformation A from linear space X into linear space Y with respect to the basis {e₁, ..., eₙ} of X and the basis {f₁, ..., fₘ} of Y.

If, in Definition 4.2.7, X = Y, and if for both X and Y the same basis {e₁, ..., eₙ} is used, then we simply speak of the matrix A of the linear transformation A with respect to the basis {e₁, ..., eₙ}.
In Eq. (4.2.6), the scalars (aᵢ₁, aᵢ₂, ..., aᵢₙ) form the ith row of A, and the
scalars (a₁ⱼ, a₂ⱼ, ..., aₘⱼ) form the jth column of A. The scalar aᵢⱼ refers to that element of matrix A which can be found in the ith row and jth column of A. The array in Eq. (4.2.6) is said to be an (m × n) matrix. If m = n, we speak of a square matrix (i.e., an (n × n) matrix). In accordance with our discussion of Section 4.1, an (n × 1) matrix is called a column vector, column matrix, or n-vector, and a (1 × n) matrix is called a row vector. We say that two (m × n) matrices A = [aᵢⱼ] and B = [bᵢⱼ] are equal if and only if aᵢⱼ = bᵢⱼ for all i = 1, ..., m and for all j = 1, ..., n.
From the preceding discussion it should be clear that the same linear transformation A from linear space X into linear space Y may be represented by different matrices, depending on the particular choice of bases in X and Y. Since it is always clear from context which particular bases are being used, we usually don't refer to them explicitly, thus avoiding cumbersome notation.
Now let Aᵀ denote the transpose of A ∈ L(X, Y) (refer to Definition 3.5.27). Our next result provides the matrix representation of Aᵀ.

4.2.8. Theorem. Let A ∈ L(X, Y) and let A denote the matrix of A with respect to the bases {e₁, ..., eₙ} in X and {f₁, ..., fₘ} in Y. Let Xᶠ and Yᶠ be the algebraic conjugates of X and Y, respectively. Let Aᵀ ∈ L(Yᶠ, Xᶠ) be the transpose of A. Let {f₁′, ..., fₘ′} and {e₁′, ..., eₙ′} denote the dual bases of {f₁, ..., fₘ} and {e₁, ..., eₙ}, respectively. If the matrix A is given by Eq. (4.2.6), then the matrix Aᵀ of Aᵀ with respect to {f₁′, ..., fₘ′} of Yᶠ and {e₁′, ..., eₙ′} of Xᶠ is given by

    Aᵀ = [ a₁₁  a₂₁  ···  aₘ₁
           a₁₂  a₂₂  ···  aₘ₂
           ···
           a₁ₙ  a₂ₙ  ···  aₘₙ ].    (4.2.9)

Proof. Let B = [bᵢⱼ] denote the (n × m) matrix of the linear transformation Aᵀ with respect to the bases {f₁′, ..., fₘ′} and {e₁′, ..., eₙ′}. We want to show that B is the matrix in Eq. (4.2.9). By Eq. (4.2.2) we have

    Aeᵢ = Σₖ₌₁ᵐ aₖᵢfₖ

for i = 1, ..., n, and

    Aᵀfⱼ′ = Σₖ₌₁ⁿ bₖⱼeₖ′

for j = 1, ..., m. By Theorem 3.5.22, ⟨eᵢ, eⱼ′⟩ = δᵢⱼ and ⟨fᵢ, fⱼ′⟩ = δᵢⱼ. Therefore,

    ⟨eᵢ, Aᵀfⱼ′⟩ = ⟨eᵢ, Σₖ₌₁ⁿ bₖⱼeₖ′⟩ = bᵢⱼ.

Also,

    ⟨Aeᵢ, fⱼ′⟩ = ⟨Σₖ₌₁ᵐ aₖᵢfₖ, fⱼ′⟩ = aⱼᵢ.

Since ⟨Aeᵢ, fⱼ′⟩ = ⟨eᵢ, Aᵀfⱼ′⟩, it follows that bᵢⱼ = aⱼᵢ, which proves the theorem. ∎
The preceding result gives rise to the following concept.

4.2.10. Definition. The matrix Aᵀ in Eq. (4.2.9) is called the transpose of matrix A.
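In coordinates, Definition 4.2.10 amounts to interchanging rows and columns. A short sketch:

```python
def transpose(A):
    """Transpose of an (m x n) matrix given as a list of rows:
    entry (i, j) of the result is entry (j, i) of A."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(A))                   # [[1, 4], [2, 5], [3, 6]]
print(transpose(transpose(A)) == A)   # (A^T)^T = A, so this prints True
```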
Our next result follows trivially from the discussion leading up to Definition 4.2.7.

4.2.11. Theorem. Let A be a linear transformation of an n-dimensional vector space X into an m-dimensional vector space Y, and let y = Ax. Let the coordinates of x with respect to the basis {e₁, e₂, ..., eₙ} be (ξ₁, ξ₂, ..., ξₙ), and let the coordinates of y with respect to the basis {f₁, f₂, ..., fₘ} be (η₁, η₂, ..., ηₘ). Let

    A = [ a₁₁  a₁₂  ···  a₁ₙ
          a₂₁  a₂₂  ···  a₂ₙ
          ···
          aₘ₁  aₘ₂  ···  aₘₙ ]    (4.2.12)

be the matrix of A with respect to the bases {e₁, e₂, ..., eₙ} and {f₁, f₂, ..., fₘ}. Then

    a₁₁ξ₁ + a₁₂ξ₂ + ··· + a₁ₙξₙ = η₁
    a₂₁ξ₁ + a₂₂ξ₂ + ··· + a₂ₙξₙ = η₂
    ......................................................
    aₘ₁ξ₁ + aₘ₂ξ₂ + ··· + aₘₙξₙ = ηₘ,    (4.2.13)

or, equivalently,

    ηᵢ = Σⱼ₌₁ⁿ aᵢⱼξⱼ,   i = 1, ..., m.    (4.2.14)

4.2.15. Exercise. Prove Theorem 4.2.11.
Using matrix and vector notation, let us agree to express the system of linear equations given by Eq. (4.2.13) equivalently as

    [ a₁₁  a₁₂  ···  a₁ₙ     [ξ₁       [η₁
      a₂₁  a₂₂  ···  a₂ₙ      ξ₂   =    η₂
      ···                     ⋮         ⋮
      aₘ₁  aₘ₂  ···  aₘₙ ]    ξₙ]       ηₘ],    (4.2.16)

or, more succinctly, as

    Ax = y,    (4.2.17)

where xᵀ = (ξ₁, ξ₂, ..., ξₙ) and yᵀ = (η₁, η₂, ..., ηₘ). In terms of xᵀ, yᵀ, and Aᵀ, let us agree to express Eq. (4.2.13) equivalently as

    (ξ₁, ξ₂, ..., ξₙ) [ a₁₁  a₂₁  ···  aₘ₁
                        a₁₂  a₂₂  ···  aₘ₂
                        ···
                        a₁ₙ  a₂ₙ  ···  aₘₙ ]  =  (η₁, η₂, ..., ηₘ),    (4.2.18)

or, in short, as

    xᵀAᵀ = yᵀ.    (4.2.19)
We note that in Eq. (4.2.17), x ∈ Fⁿ, y ∈ Fᵐ, and A is an (m × n) matrix. From our discussion thus far it should be clear that we can utilize matrices to study systems of linear equations which are of the form of Eq. (4.2.13). It should also be clear that an (m × n) matrix A is nothing more than a unique representation of a linear transformation A of an n-dimensional vector space X into an m-dimensional vector space Y over the same field F. As such, A possesses all the properties of such transformations. We could, in fact, utilize matrices in place of general linear transformations to establish many facts concerning linear transformations defined on finite-dimensional linear spaces. However, since a given matrix is dependent upon the selection of two particular sets of bases (not necessarily distinct), such practice will, in general, be avoided whenever possible.
We emphasize that a matrix and a linear transformation are not one and the same thing. In many texts no distinction in symbols is made between linear transformations and their matrix representation. We will not follow this custom.
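The componentwise form (4.2.14), ηᵢ = Σⱼ aᵢⱼξⱼ, translates line for line into code. A sketch:

```python
def matvec(A, x):
    """Compute y = Ax componentwise: eta_i = sum_j a_ij * xi_j,
    which is exactly Eq. (4.2.14)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1, 2],
     [3, 4],
     [5, 6]]
print(matvec(A, [1, 1]))  # [3, 7, 11]
```

Here A is a (3 × 2) matrix, so x ∈ F² and y ∈ F³, matching the dimension convention of Eq. (4.2.17).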
B. Rank of a Matrix

We begin by proving the following result.

4.2.20. Theorem. Let A be a linear transformation from X into Y. Then A has rank r if and only if it is possible to choose a basis {e₁, e₂, ..., eₙ} for X and a basis {f₁, ..., fₘ} for Y such that the matrix A of A with respect to these bases is of the form

    A = [ 1  0  ···  0  0  ···  0
          0  1  ···  0  0  ···  0
          ⋮  ⋮       ⋮  ⋮       ⋮
          0  0  ···  1  0  ···  0
          0  0  ···  0  0  ···  0
          ⋮  ⋮       ⋮  ⋮       ⋮
          0  0  ···  0  0  ···  0 ],    (4.2.21)

where the first r diagonal entries are 1, all other entries are 0, m = dim Y, and n = dim X.

Proof. We choose a basis for X of the form {e₁, e₂, ..., eᵣ, eᵣ₊₁, ..., eₙ}, where {eᵣ₊₁, ..., eₙ} is a basis for 𝔑(A). If f₁ = Ae₁, f₂ = Ae₂, ..., fᵣ = Aeᵣ, then {f₁, f₂, ..., fᵣ} is a basis for ℜ(A), as we saw in the proof of Theorem 3.4.25. Now choose vectors fᵣ₊₁, ..., fₘ in Y such that the set of vectors {f₁, f₂, ..., fₘ} forms a basis for Y (see Theorem 3.3.44). Then

    f₁ = Ae₁  = (1)f₁ + (0)f₂ + ··· + (0)fᵣ + (0)fᵣ₊₁ + ··· + (0)fₘ,
    f₂ = Ae₂  = (0)f₁ + (1)f₂ + ··· + (0)fᵣ + (0)fᵣ₊₁ + ··· + (0)fₘ,
    .................................................................
    fᵣ = Aeᵣ  = (0)f₁ + (0)f₂ + ··· + (1)fᵣ + (0)fᵣ₊₁ + ··· + (0)fₘ,    (4.2.22)
    0  = Aeᵣ₊₁ = (0)f₁ + (0)f₂ + ··· + (0)fᵣ + (0)fᵣ₊₁ + ··· + (0)fₘ,
    .................................................................
    0  = Aeₙ  = (0)f₁ + (0)f₂ + ··· + (0)fᵣ + (0)fᵣ₊₁ + ··· + (0)fₘ.

The necessity is proven by applying Definition 4.2.7 (and also Eq. (4.2.2)) to the set of equations (4.2.22); the desired result given by Eq. (4.2.21) follows. Sufficiency follows from the fact that the basis for ℜ(A) contains r linearly independent vectors. ∎

A question of practical significance is the following: if A is the matrix of a linear transformation A from linear space X into linear space Y with respect to arbitrary bases {e₁, ..., eₙ} for X and {f₁, ..., fₘ} for Y, what is the rank of A in terms of matrix A? Let ℜ(A) be the subspace of Y generated by Ae₁, Ae₂, ..., Aeₙ. Then, in view of Eq. (4.2.2), the coordinate representation of Aeᵢ, i = 1, ..., n, in Y with respect to {f₁, ..., fₘ} is given by

    Ae₁ ~ [a₁₁          ...,   Aeₙ ~ [a₁ₙ
           a₂₁                        a₂ₙ
           ⋮                          ⋮
           aₘ₁],                      aₘₙ].

From this it follows that ℜ(A) consists of vectors y whose coordinate representation is

    y = η₁ [a₁₁         + ··· + ηₙ [a₁ₙ
            a₂₁                     a₂ₙ
            ⋮                       ⋮
            aₘ₁]                    aₘₙ],    (4.2.23)

where η₁, ..., ηₙ are scalars. Since every spanning or generating set of a linear space contains a basis, we are able to select from among the vectors Ae₁, Ae₂, ..., Aeₙ a basis for ℜ(A). Suppose that the set {Ae₁, Ae₂, ..., Aeₖ} is this basis. Then the vectors Ae₁, Ae₂, ..., Aeₖ are linearly independent, and the vectors Aeₖ₊₁, ..., Aeₙ are linear combinations of the vectors Ae₁, Ae₂, ..., Aeₖ. From this there now follows:

4.2.24. Theorem. Let A ∈ L(X, Y), and let A be the matrix of A with respect to the (arbitrary) basis {e₁, e₂, ..., eₙ} for X and with respect to the (arbitrary) basis {f₁, f₂, ..., fₘ} for Y. Let the coordinate representation of y = Ax be y = Ax. Then

(i) the rank of A is the number of vectors in the largest possible linearly independent set of columns of A; and
(ii) the rank of A is the number of vectors in the smallest possible set of columns of A which has the property that all columns not in it can be expressed as linear combinations of the columns in it.

In view of this result we make the following definition.

4.2.25. Definition. The rank of an (m × n) matrix A is the largest number of linearly independent columns of A.
C. Properties of Matrices

Now let X be an n-dimensional linear space, let Y be an m-dimensional linear space, let F be the field for X and Y, and let A and B be linear transformations of X into Y. Let A = [aᵢⱼ] be the matrix of A, and let B = [bᵢⱼ] be the matrix of B with respect to the bases {e₁, e₂, ..., eₙ} in X and {f₁, f₂, ..., fₘ} in Y. Using Eq. (3.4.24) as well as Definition 4.2.7, the reader can readily verify that the matrix of A + B, denoted by C ≜ A + B, is given by

    A + B = [aᵢⱼ] + [bᵢⱼ] = [aᵢⱼ + bᵢⱼ] = [cᵢⱼ] = C.    (4.2.26)

Using Eq. (3.4.34) and Definition 4.2.7, the reader can also easily show that the matrix of αA, denoted by D ≜ αA, is given by

    αA = [αaᵢⱼ] = [dᵢⱼ] = D.    (4.2.27)
From Eq. (4.2.26) we note that, in order to be able to add two matrices A and B, they must have the same number of rows and columns. In this case we say that A and B are comparable matrices. Also, from Eq. (4.2.27) it is clear that if A is an (m × n) matrix, then so is αA.
Next, let Z be an r-dimensional vector space, let A ∈ L(X, Y), and let B ∈ L(Y, Z). Let A be the matrix of A with respect to the basis {e₁, e₂, ..., eₙ} in X and with respect to the basis {f₁, f₂, ..., fₘ} in Y. Let B be the matrix of B with respect to the basis {f₁, f₂, ..., fₘ} in Y and with respect to the basis {g₁, g₂, ..., gᵣ} in Z. The product mapping BA as defined by Eq. (3.4.50) is a linear transformation of X into Z. We now ask: what is the matrix C of BA with respect to the bases {e₁, e₂, ..., eₙ} of X and {g₁, g₂, ..., gᵣ} of Z? By definition of matrices A and B (see Eq. (4.2.2)), we have

    Aeₖ = Σⱼ₌₁ᵐ aⱼₖfⱼ,   k = 1, ..., n,

and

    Bfⱼ = Σₗ₌₁ʳ bₗⱼgₗ,   j = 1, ..., m.

Now

    BAeₖ = B(Σⱼ₌₁ᵐ aⱼₖfⱼ) = Σⱼ₌₁ᵐ aⱼₖBfⱼ = Σₗ₌₁ʳ (Σⱼ₌₁ᵐ bₗⱼaⱼₖ)gₗ

for k = 1, ..., n. Thus, the matrix C of BA with respect to the bases {e₁, ..., eₙ} in X and {g₁, ..., gᵣ} in Z is C = [cᵢⱼ], where

    cᵢⱼ = Σₖ₌₁ᵐ bᵢₖaₖⱼ    (4.2.28)

for i = 1, ..., r and j = 1, ..., n. We write this as

    C = BA.    (4.2.29)
(4.2.29)
F r om the preceding discussion it is clear that two matrices A and B can be multiplied to form the product BA if and only if the number of columns ofB is equal to the number of rows of A. In this case we say that the matrices B and A are conformal matrices.
138
Chapter 4 I iF nite-Dimensional
Vector Spaces and Matrices
In arriving at Equations (4.2.28) and (4.2.29) we established the result given below. )Y with respect to the .4 2.30. Theorem. Let A be the matrix of A E L ( X , basis leu ez , ... , e.} in X and basis { l u! z , ,fill} in .Y Let B be the matrix of BEL ( ,Y Z) with respect to basis { I I' ,z ! ,fill} in Y and basis {g" g,z ... ,g,} in Z. Then BA is the matrix of BA. We now summarize the above discussion in the following definition. .4 2.31. let C =
Definition. Let A = a[ l' ] and B = b[ ll] be two m X n matrices, C[ II] be an n X r matrix, and let ~ E .F Then
(i) the som of A and B is the m x
n matrix
D= A + B
where
dll = a'l + bl' for all i = I, ... , m and for allj = 1, ... ,n; (ii) the product of matrix A by scalar ~ is the m x n matrix E=~A
where for all i
=
ell =
1, ... ,m and for allj =
~all
I, ... ,n; and
(iii) the product of matrix A and matrix C is the m x r matrix
G= A C,
where
for each i
=
I, ... , m and for eachj =
1, ... , r.
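The three operations of Definition 4.2.31 translate directly into loops (a sketch with throwaway helper names):

```python
def mat_add(A, B):
    """(i)  d_ij = a_ij + b_ij, for comparable matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(alpha, A):
    """(ii) e_ij = alpha * a_ij."""
    return [[alpha * a for a in row] for row in A]

def mat_mul(A, C):
    """(iii) g_ij = sum_k a_ik * c_kj, for conformal matrices."""
    n = len(C)
    return [[sum(A[i][k] * C[k][j] for k in range(n))
             for j in range(len(C[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))     # [[6, 8], [10, 12]]
print(mat_scale(2, A))   # [[2, 4], [6, 8]]
print(mat_mul(A, B))     # [[19, 22], [43, 50]]
```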
The properties of general linear transformations established in Section 3.4 hold, of course, in the case of their matrix representation. We summarize some of these in the remainder of the present section.

4.2.32. Theorem.

(i) Let A and B be (m × n) matrices, and let C be an (n × r) matrix. Then
    (A + B)C = AC + BC.    (4.2.33)
(ii) Let A be an (m × n) matrix, and let B and C be (n × r) matrices. Then
    A(B + C) = AB + AC.    (4.2.34)
(iii) Let A be an (m × n) matrix, let B be an (n × r) matrix, and let C be an (r × s) matrix. Then
    A(BC) = (AB)C.    (4.2.35)
(iv) Let α, β ∈ F, and let A be an (m × n) matrix. Then
    (α + β)A = αA + βA.    (4.2.36)
(v) Let α ∈ F, and let A and B be (m × n) matrices. Then
    α(A + B) = αA + αB.    (4.2.37)
(vi) Let α, β ∈ F, let A be an (m × n) matrix, and let B be an (n × r) matrix. Then
    (αA)(βB) = (αβ)(AB).    (4.2.38)
(vii) Let A and B be (m × n) matrices. Then
    A + B = B + A.    (4.2.39)
(viii) Let A, B, and C be (m × n) matrices. Then
    (A + B) + C = A + (B + C).    (4.2.40)
.4 2.41.
Theorem. L e t 0 E L ( X , Y ) be the zero transformation defined by Eq. (3.4.)4 . Then for any bases e{ l' ... , e.J and { f l' ... ,I.. J for X and ,Y respectively, the linear transformation 0 is represented by the (m x n) matrix (4.2.42)
The matrix 0 is called the Dull matrix.
.4 2.43.
Theorem. Let I E L ( X , X ) be the identity transformation defined by Eq. (3.4.56). L e t e{ l> ... , e.J be an arbitrary basis for .X Then the matrix representation of the linear transformation I from X into X with respect to the basis e{ l> ... , e.J is given by
I I is called the n x
.4 2.45.
Exercise.
~ ~
[ : .. ..: ..:.:.:..
:J
(4.2.4)4
n identity matrix. Prove Theorems 4.2.32,4.2.41,
and .4 2.43.
140
Chapter 4 I iF nite-Dimensional F o r any (m x
Vector Spaces and Matrices
n) matrix A we have
(4.2.46)
A+ O = O + A = A and for any (n X n) matrix B we have
(4.2.47)
BI= I B= B
where I is the (n x n) identity matrix. If A = a[ u] is a matrix of the linear transformation A, then correspondingly, - A is a matrix of the linear transformation - A , where
-A =
(- I )A =
all
012
ala
021
02 2
02"
0",2
a",,,
(- I ) _ 0 "' 1
- a ll
- 0 12
- a la
- 0 21
-au
- 0 211
- 0 "' 2
- a ",,,
(4.2.48)
= _ - a "' l
It follows immediately that A + (- A ) = 0, where 0 denotes the null matrix. By convention we usually write A + (- A) = A- A . Let A and B be (n X n) matrices. Then we have, in general,
AB*BA,
(4.2.49)
as was the case in Eq. (3.4.55). Nex t ,let A E L ( X , X ) and assume that A is non-singular. Let A- I denote the inverse of A. Then, by Theorem 3.4.60, ..4A1= A-1A = 1. Now if A is the (n x n) matrix of A with respect to the basis e{ l , • • ,ell} in ;X then there is an (n X n) matrix B of A- I with respect to the basis e{ u ... ,ell} in ,X such that (4.2.50) BA= A B= I . We call B the inverse of A and we denote it by A- I . In this connection we use the following terms interchangeably: A- I exists, A bas an inverse, A is invertible, or A is non-singular. If A is not non-singular, we say A is singnlar. With the aid of Theorem 3.4.63 the reader can readily establish the following result for matrices. 4.2.51.
Theorem. eL t A be an (n
(i) rank A = n; (ii) Ax = 0 implies x
=
0;
X
n) matrix. The following are equivalent:
.4 2.
Matrices
141
(iii) for every oY E "F , there is a unique X o E F " such that oY = (iv) the columns of A are linearly independent; and (v) A - I exists. 4.2.52.
Exercise.
Ax o;
Prove Theorem .4 2.51.
We have shown that we can represent n linear eq u ations by the matrix eq u ation (4.2.17). Now let A be a non-singular (n x n) matrix and consider the eq u ation y = Ax. (4.2.53)
If we premultiply both sides of this eq u ation by A - I we obtain x = A- I y ,
(4.2.54)
the solution to Eq. (4.2.53). Thus, knowledge of the inverse of A enables us to solve the system of linear eq u ations (4.2.53). In our next result, which is readily verified, some of the important properties of non-singular matrices are given. 4.2.55.
Theorem.
(i) An (n x n) non-singular matrix has one and only one inverse. (ii) IfA and B are non-singular (n x n) matrices, then (AB)-I = B-1 A- I .
(iii) If A and Bare (n x are A and D. 4.2.56.
Exercise.
n) matrices and if AB is non-singular, then so
Prove Theorem .4 2.55.
Our next theorem summarizes some of the important properties of the transpose of matrices. The proof of this theorem is a direct consequence of the definition of the transpose of a matrix (see Eq. (4.2.9». 4.2.57.
Theorem.
(i) F o r any matrix A, (AT)T = A. (ii) L e t A and B be conformal matrices. Then (AB)T = DTAT. (iii) L e t A be a non-singular matrix. Then (AT)-I = (A-I)T. (iv) L e t A be an (n X n) matrix. Then AT is non-singular if and only if A is non-singular. (v) Let A and B be comparable matrices. Then (A B)T = AT BT. (vi) L e t« E F and A be a matrix. Then (
+
4.2.58.
Exercise.
+
Prove Theorem .4 2.57.
Now let A be an (n X
n) matrix, and let m be a positive integer. Similarly
Chapter 4 I iF nite-Dimensional
142
Vector Spaces and Matrices
as in Eq. (3.4.67) we define the (n x n) matrix A'" by A'"
= -'! •
A • . ..•
A.
.
(4.2.59)
m times
and if A- I exists. then similarly as in Eq. (3.4.68). we define the (n x n) matrix A-"' as A-"' = (A-I)'" = A- I . A- I • . ..• A- I . (4.2.60) , . ~
m times
As in the case of Eqs. (3.4.69) through (3.4.71). the usual laws of exponents follow from the above definitions. Specifically. if A is an (n x n) matrix and if rand s are positive integers. then A' • A' (A' ) '
=
=
A' · '
A' · ' =
A" =
A" =
A' • A'. =
(A' ) ' .
and if A- I exists. then
(4.2.61) (4.2.62) (4.2.63)
Consistent with the above notation we have
Al = A
and
AO =
(4.2.64)
I.
(4.2.65)
We are now once more in a position to consider functions of linear transformations, where in the present case the linear transformations are represented by matrices. For example, if f(λ) is the polynomial in λ given in Eq. (3.4.74), and if A is any (n × n) matrix, then by f(A) we mean

    f(A) = α₀I + α₁A + ··· + αₙAⁿ.    (4.2.66)
4.2.67. Exercise. Let A ∈ L(X, X), and let A be the matrix of A with respect to the basis {e₁, ..., eₙ} in X. Let f(λ) be given by Eq. (3.4.74). Show that f(A) is the matrix of f(A) with respect to the basis {e₁, ..., eₙ}.
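The matrix polynomial (4.2.66) is conveniently evaluated by Horner's rule, which needs only repeated multiplication by A (a sketch; `coeffs` holds α₀, ..., αₙ):

```python
def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_of_matrix(coeffs, A):
    """Evaluate f(A) = a0*I + a1*A + ... + an*A^n by Horner's rule:
    f(A) = (...(an*I * A + a_{n-1}*I) * A + ...) * A + a0*I."""
    n = len(A)
    I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[coeffs[-1] * v for v in row] for row in I]
    for a in reversed(coeffs[:-1]):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += a
    return result

A = [[1, 1], [0, 1]]
# f(t) = t^2 - 2t + 1 = (t - 1)^2, so f(A) = (A - I)^2, the null matrix here.
print(poly_of_matrix([1, -2, 1], A))  # [[0, 0], [0, 0]]
```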
0 commutes with any A; A' commutes with Af. where p and q are positive integers; /II commutes with any A. where /I E F; and if A commutes with B and if A commutes with C. then A commutes with /lB + PC. where /I. P E .F
.4 2.
143
Matrices
.4 1.69.
Exercise.
Prove Theorem .4 2.68.
eL t us now consider some specific examples.
.4 2.70.
Example.
eL t F denote the field of real numbers, and let
[~ ~ ~l
A=
Then A+ If /X =
~ :J
B = :[ 3
o
3, then
let
Example.
and A - B
+
D
=[
- i , then
/XC Example.
-
[ l~ :~ ~
I'J~ .
18
9
o
3
Let F denote the field of complex numbers, let i² = −1, and let

    A = [ i      1 + i        B = [ 1 − i    2i
          2      3 − i ],           −3i      5  ].

Then

    A + B = [ 1        1 + 3i
              2 − 3i   8 − i  ].

If α = −i, then

    αA = [ 1     1 − i
           −2i   −1 − 3i ].  ∎

4.2.72. Example.
Let F denote the field of real numbers, let

    G = [ 1  2        H = [ 1
          3  4 ],           2 ].

Then

    GH = [ 5
           11 ].

Notice that in this case HG is not defined. ∎

4.2.73. Example.
Example.
Clearly, K L
10 13. 5] [ 10 IS 22
=
and K L
~J I[ I 7 12· 16J
=
_
Example.
Let
M
[~ ~J =
[~ ~J.
and N =
Then
[~
MN= i.e., MN = .4 2.75.
0, even though M t= =
Example.
~J
=0,
0 and N t= =
o. _
If A is as defined in Example .4 2.70, then
I 2] 2 4 AT=4560._ [ 163 I .4 2.76.
Example.
p~ Then
Let
5 [:
-6
7
~]
32 24 and
I
Q= -45 24
24
8
- 1 6-
-6
24 27 24
24 -2
24
24
24
I
.4 2.
145
Matrices 1 0
p· Q = Q · P = i.e., Q = .4 2.77.
o
Q- l .
p- I or, equivalently, P =
Example.
0 1
[
0 •
Consider the set of simultaneous linear equations

    4ξ₁ + 2ξ₂ + ξ₃ + 3ξ₄ = 0,
    6ξ₁ + 3ξ₂ + ξ₃ + 4ξ₄ = 0,    (4.2.78)
    2ξ₁ + ξ₂ + 0·ξ₃ + ξ₄ = 0.

Equation (4.2.78) can be rewritten as

    [ 4  2  1  3     [ξ₁       [0
      6  3  1  4      ξ₂   =    0
      2  1  0  1 ]    ξ₃        0].    (4.2.79)
                      ξ₄]

Let

    A = [ 4  2  1  3
          6  3  1  4
          2  1  0  1 ].    (4.2.80)
Matrix A is the coordinate representation of a linear transformation A ∈ L(X, Y). In this case dim X = 4 and dim Y = 3. Observe now that the first column of A is a linear combination of the second column of A. Also, by adding the third column of A to the second column we obtain the fourth column of A. It follows that A has only two linearly independent columns. Hence, the rank of A is 2. Now since dim X = dim 𝔑(A) + dim ℜ(A), the nullity of A is also 2. ∎

Next, we discuss briefly partitioned vectors and matrices. Such vectors and matrices arise in a natural way when linear transformations acting on the direct sum of linear spaces are considered.
Let X be an n-dimensional vector space, and let Y be an m-dimensional vector space. Suppose that X = U ⊕ W, where U is an r-dimensional linear subspace of X, and suppose that Y = R ⊕ Q, where R is a p-dimensional linear subspace of Y. Let A ∈ L(X, Y), let {e₁, ..., eₙ} be a basis for X such that {e₁, ..., eᵣ} is a basis for U, and let {f₁, ..., fₘ} be a basis for Y such that {f₁, ..., fₚ} is a basis for R. Let A be the matrix of A with respect to these bases. Now if x ∈ Fⁿ is the coordinate representation of x ∈ X with respect to the basis {e₁, ..., eₙ}, we can partition x into two components,
Chapter 4 / Finite-Dimensional Vector Spaces and Matrices

x = [u]
    [w],   (4.2.81)

where u ∈ Fʳ and w ∈ Fⁿ⁻ʳ. Similarly, we can express y ∈ Fᵐ as

y = [r]
    [q],   (4.2.82)

where y is the coordinate representation of y with respect to {f₁, ..., f_m}, and where r ∈ Fᵖ and q ∈ Fᵐ⁻ᵖ. We say the vector x in Eq. (4.2.81) is partitioned into the components u and w. Clearly, the vector u is determined by the coordinates of x corresponding to the basis vectors {e₁, ..., e_r} in U. We can similarly divide the matrix A into the partition

A = [A₁₁ | A₁₂]
    [A₂₁ | A₂₂],   (4.2.83)

where A₁₁ is a (p × r) matrix, A₁₂ is a (p × (n − r)) matrix, A₂₁ is an ((m − p) × r) matrix, and A₂₂ is an ((m − p) × (n − r)) matrix. In this case, the equation

y = Ax   (4.2.84)

is equivalent to the pair of equations

r = A₁₁u + A₁₂w,
q = A₂₁u + A₂₂w.   (4.2.85)
A matrix in the form of Eq. (4.2.83) is called a partitioned matrix. The matrices A₁₁, A₁₂, A₂₁, and A₂₂ are called submatrices of A. The generalization of partitioning the matrix A into more than four submatrices is accomplished in an obvious way, when the linear space X and/or the linear space Y are the direct sum of more than two linear subspaces.

Now let the linear spaces X and Y and the linear transformation A and the matrix A of A still be defined as in the preceding discussion. Let Z be a k-dimensional vector space (the spaces X, Y, and Z are vector spaces over the same field F). Let Z = M ⊕ N, where M is a j-dimensional linear subspace of Z. Let B ∈ L(Y, Z). In a manner analogous to our preceding discussion, we represent B by the partitioned matrix

B = [B₁₁ | B₁₂]
    [B₂₁ | B₂₂].   (4.2.86)

It is now a simple matter to show that the linear transformation BA ∈ L(X, Z) is represented by the partitioned matrix

BA = [B₁₁A₁₁ + B₁₂A₂₁ | B₁₁A₁₂ + B₁₂A₂₂]
     [B₂₁A₁₁ + B₂₂A₂₁ | B₂₁A₁₂ + B₂₂A₂₂].   (4.2.87)
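The block-multiplication formula (4.2.87) can be spot-checked numerically. The dimensions and random entries below are illustrative assumptions, not taken from the text.

```python
import random

random.seed(0)  # reproducible sample data

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def block(M, rows, cols):
    """Extract the submatrix with the given row/column index ranges."""
    return [[M[i][j] for j in cols] for i in rows]

# Hypothetical sizes: A is (m x n), B is (k x m); splits r, p, jj.
m, n, k, p, r, jj = 4, 5, 3, 2, 3, 2
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]
B = [[random.randint(-5, 5) for _ in range(m)] for _ in range(k)]

A11, A12 = block(A, range(p), range(r)), block(A, range(p), range(r, n))
A21, A22 = block(A, range(p, m), range(r)), block(A, range(p, m), range(r, n))
B11, B12 = block(B, range(jj), range(p)), block(B, range(jj), range(p, m))
B21, B22 = block(B, range(jj, k), range(p)), block(B, range(jj, k), range(p, m))

# Assemble BA block by block, following Eq. (4.2.87).
top = [lft + rgt for lft, rgt in zip(matadd(matmul(B11, A11), matmul(B12, A21)),
                                     matadd(matmul(B11, A12), matmul(B12, A22)))]
bot = [lft + rgt for lft, rgt in zip(matadd(matmul(B21, A11), matmul(B22, A21)),
                                     matadd(matmul(B21, A12), matmul(B22, A22)))]
blockwise = top + bot
print(blockwise == matmul(B, A))   # True: block formula agrees with BA
```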
We now prove:

4.2.88. Theorem. Let X be an n-dimensional vector space, and let P ∈ L(X, X). If P is a projection, then there exists a basis {e₁, ..., eₙ} for X such that the matrix P of P with respect to this basis is of the form

P = [I_r | 0]
    [0   | 0],   (4.2.89)

where I_r denotes the (r × r) identity matrix (i.e., the first r entries on the main diagonal of P are 1 and all remaining entries are 0), and where r = dim ℜ(P).

Proof. Since P is a projection we have, from Eq. (3.7.8),

X = ℜ(P) ⊕ 𝔑(P).

Now let r = dim ℜ(P), and let {e₁, ..., eₙ} be a basis for X such that {e₁, ..., e_r} is a basis for ℜ(P). Let P be the matrix of P with respect to this basis, and the theorem follows. ∎
We leave the next result as an exercise.

4.2.90. Theorem. Let X be a finite-dimensional vector space, and let A ∈ L(X, X). If W is a p-dimensional invariant subspace of X and if X = W ⊕ Z, then there exists a basis for X such that the matrix A of A with respect to this basis has the form

A = [A₁₁ | A₁₂]
    [0   | A₂₂],

where A₁₁ is a (p × p) matrix and the remaining submatrices are of appropriate dimension.

4.2.91. Exercise. Prove Theorem 4.2.90.

4.3. EQUIVALENCE AND SIMILARITY
From the previous section it is clear that a linear transformation A of a finite-dimensional vector space X into a finite-dimensional vector space Y can be represented by means of different matrices, depending on the particular choice of bases in X and Y. The choice of bases may in different cases result in matrices that are "easy" or "hard" to utilize. Many of the resulting "standard" forms of matrices, called canonical forms, arise because of practical considerations. Such canonical forms often exhibit inherent characteristics of the underlying transformation A. Before we can consider some of the more important canonical forms of matrices, we need to introduce several new concepts which are of great importance in their own right.

Throughout the present section, X and Y are finite-dimensional vector spaces over the same field F, dim X = n and dim Y = m. We begin our discussion with the following result.

4.3.1. Theorem. Let {e₁, ..., eₙ} be a basis for a linear space X, and let {e′₁, ..., e′ₙ} be a set of vectors in X given by

e′ᵢ = Σⱼ₌₁ⁿ pⱼᵢ eⱼ,   i = 1, ..., n,   (4.3.2)

where pᵢⱼ ∈ F for all i, j = 1, ..., n. The set {e′₁, ..., e′ₙ} forms a basis for X if and only if P = [pᵢⱼ] is non-singular.
Proof. Let {e′₁, ..., e′ₙ} be linearly independent, and let pⱼ denote the jth column vector of P. Let

Σᵢ₌₁ⁿ αᵢ pᵢ = 0

for some scalars α₁, ..., αₙ ∈ F. This implies that

Σᵢ₌₁ⁿ αᵢ pⱼᵢ = 0,   j = 1, ..., n.

Rearranging, we have

0 = Σⱼ₌₁ⁿ (Σᵢ₌₁ⁿ αᵢ pⱼᵢ) eⱼ = Σᵢ₌₁ⁿ αᵢ (Σⱼ₌₁ⁿ pⱼᵢ eⱼ) = Σᵢ₌₁ⁿ αᵢ e′ᵢ.

Since e′₁, ..., e′ₙ are linearly independent, it follows that α₁ = ... = αₙ = 0. Thus, the columns of P are linearly independent, and therefore P is non-singular.

Conversely, let P be non-singular; i.e., let {p₁, ..., pₙ} be a linearly independent set of vectors in Fⁿ. Let

Σᵢ₌₁ⁿ αᵢ e′ᵢ = 0

for some scalars α₁, ..., αₙ ∈ F. Then

0 = Σᵢ₌₁ⁿ αᵢ e′ᵢ = Σⱼ₌₁ⁿ (Σᵢ₌₁ⁿ αᵢ pⱼᵢ) eⱼ.

Since {e₁, ..., eₙ} is a linearly independent set, it follows that Σᵢ₌₁ⁿ αᵢ pⱼᵢ = 0 for j = 1, ..., n, and thus Σᵢ₌₁ⁿ αᵢ pᵢ = 0. Since {p₁, ..., pₙ} is a linearly independent set, it now follows that α₁ = ... = αₙ = 0, and therefore {e′₁, ..., e′ₙ} is a linearly independent set. ∎
The preceding result gives rise to:

4.3.3. Definition. The matrix P of Theorem 4.3.1 is called the matrix of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}.

We note that since P is non-singular, P⁻¹ exists. Thus, we can readily prove the next result.

4.3.4. Theorem. Let {e₁, ..., eₙ} and {e′₁, ..., e′ₙ} be two bases for X, and let P be the matrix of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}. Then P⁻¹ is the matrix of basis {e₁, ..., eₙ} with respect to the basis {e′₁, ..., e′ₙ}.

4.3.5. Exercise. Prove Theorem 4.3.4.
The next result is also easily verified.

4.3.6. Theorem. Let X be a linear space, and let the sets of vectors {e₁, ..., eₙ}, {e′₁, ..., e′ₙ}, and {e″₁, ..., e″ₙ} be bases for X. If P is the matrix of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}, and if Q is the matrix of basis {e″₁, ..., e″ₙ} with respect to basis {e′₁, ..., e′ₙ}, then PQ is the matrix of basis {e″₁, ..., e″ₙ} with respect to basis {e₁, ..., eₙ}.

4.3.7. Exercise. Prove Theorem 4.3.6.
We now prove:

4.3.8. Theorem. Let {e₁, ..., eₙ} and {e′₁, ..., e′ₙ} be two bases for a linear space X, and let P be the matrix of basis {e′₁, ..., e′ₙ} with respect to basis {e₁, ..., eₙ}. Let x ∈ X, let x denote the coordinate representation of x with respect to the basis {e₁, ..., eₙ}, and let x′ denote the coordinate representation of x with respect to the basis {e′₁, ..., e′ₙ}. Then Px′ = x.

Proof. Let xᵀ = (ξ₁, ..., ξₙ), and let (x′)ᵀ = (ξ′₁, ..., ξ′ₙ). Then

x = Σᵢ₌₁ⁿ ξᵢ eᵢ   and   x = Σᵢ₌₁ⁿ ξ′ᵢ e′ᵢ.

Thus,

x = Σᵢ₌₁ⁿ ξ′ᵢ e′ᵢ = Σᵢ₌₁ⁿ ξ′ᵢ [Σⱼ₌₁ⁿ pⱼᵢ eⱼ] = Σⱼ₌₁ⁿ (Σᵢ₌₁ⁿ pⱼᵢ ξ′ᵢ) eⱼ,

which implies that

ξⱼ = Σᵢ₌₁ⁿ pⱼᵢ ξ′ᵢ,   j = 1, ..., n.

Therefore, x = Px′. ∎
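Theorem 4.3.8 can be illustrated with a concrete change of basis in R². The basis vectors below are an arbitrary illustrative choice, not taken from the text.

```python
from fractions import Fraction

# Hypothetical new basis e1' = (1, 1), e2' = (1, 2), expressed in the
# natural basis of R^2; P has these coordinate vectors as its columns.
P = [[Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]

# A vector with coordinates x' = (3, -1) relative to {e1', e2'}.
xp = [Fraction(3), Fraction(-1)]

# x = P x' gives the coordinates of the same vector in the natural basis.
x = [sum(P[i][j] * xp[j] for j in range(2)) for i in range(2)]
print(x)   # [2, 1], since 3*(1, 1) + (-1)*(1, 2) = (2, 1)
```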
4.3.9. Exercise. Let X = Rⁿ, and let {u₁, ..., uₙ} be the natural basis for Rⁿ (see Example 4.1.15). Let {e₁, ..., eₙ} be another basis for Rⁿ, and let e₁, ..., eₙ be the coordinate representations of e₁, ..., eₙ, respectively, with respect to the natural basis. Show that the matrix of basis {e₁, ..., eₙ} with respect to basis {u₁, ..., uₙ} is given by P = [e₁, e₂, ..., eₙ], i.e., the matrix whose columns are the column vectors e₁, ..., eₙ.

4.3.10. Theorem. Let A ∈ L(X, Y), and let {e₁, ..., eₙ} and {f₁, ..., f_m} be bases for X and Y, respectively. Let A be the matrix of A with respect to the bases {e₁, ..., eₙ} in X and {f₁, ..., f_m} in Y. Let {e′₁, ..., e′ₙ} be another basis for X, and let the matrix of {e′₁, ..., e′ₙ} with respect to {e₁, ..., eₙ} be P. Let {f′₁, ..., f′_m} be another basis for Y, and let Q be the matrix of {f₁, ..., f_m} with respect to {f′₁, ..., f′_m}. Let A′ be the matrix of A with respect to the bases {e′₁, ..., e′ₙ} in X and {f′₁, ..., f′_m} in Y. Then
A′ = QAP.

Proof. We have

Ae′ᵢ = A(Σₖ₌₁ⁿ pₖᵢ eₖ) = Σₖ₌₁ⁿ pₖᵢ Aeₖ = Σₖ₌₁ⁿ pₖᵢ (Σₗ₌₁ᵐ aₗₖ fₗ)
     = Σₖ₌₁ⁿ pₖᵢ [Σₗ₌₁ᵐ aₗₖ (Σⱼ₌₁ᵐ qⱼₗ f′ⱼ)] = Σⱼ₌₁ᵐ (Σₗ₌₁ᵐ Σₖ₌₁ⁿ qⱼₗ aₗₖ pₖᵢ) f′ⱼ.

Now, by definition, Ae′ᵢ = Σⱼ₌₁ᵐ a′ⱼᵢ f′ⱼ. Since a matrix of a linear transformation is uniquely determined once the bases are specified, we conclude that

a′ⱼᵢ = Σₗ₌₁ᵐ Σₖ₌₁ⁿ qⱼₗ aₗₖ pₖᵢ

for j = 1, ..., m and i = 1, ..., n. Therefore, A′ = QAP. ∎
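The change-of-bases relation A′ = QAP amounts to the chain x = Px′, y = Ax, y′ = Qy. The particular matrices below are arbitrary illustrative choices (P and Q merely need to be non-singular).

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 0, 2],
     [3, -1, 1]]     # matrix of A w.r.t. {e_i}, {f_i}  (2 x 3)
P = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]      # matrix of {e_i'} w.r.t. {e_i}   (non-singular)
Q = [[2, 1],
     [1, 1]]         # matrix of {f_i} w.r.t. {f_i'}   (non-singular)

Aprime = matmul(matmul(Q, A), P)   # A' = QAP, Eq. (4.3.13)

xp = [1, 2, -1]                    # coordinates x' of some x in X
x = matvec(P, xp)                  # x = Px'
y = matvec(A, x)                   # y = Ax
yp = matvec(Q, y)                  # y' = Qy
print(yp == matvec(Aprime, xp))    # True: y' = A'x'
```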
In Figure A, Theorem 4.3.10 is depicted schematically: x = Px′, y = Ax, and y′ = Qy, so that y′ = A′x′.

4.3.11. Figure A. Schematic diagram of Theorem 4.3.10.
The preceding result motivates the following definition.

4.3.12. Definition. An (m × n) matrix A′ is said to be equivalent to an (m × n) matrix A if there exists an (m × m) non-singular matrix Q and an (n × n) non-singular matrix P such that

A′ = QAP.   (4.3.13)
If A′ is equivalent to A, we write A′ ~ A. Thus, an (m × n) matrix A′ is equivalent to an (m × n) matrix A if and only if A and A′ can be interpreted as both being matrices of the same linear transformation A of a linear space X into a linear space Y, but with respect to possibly different choices of bases. Our next result shows that ~ is reflexive, symmetric, and transitive, and as such is an equivalence relation.

4.3.14. Theorem. Let A, B, and C be (m × n) matrices. Then

(i) A is always equivalent to A;
(ii) if A is equivalent to B, then B is equivalent to A; and
(iii) if A is equivalent to B and B is equivalent to C, then A is equivalent to C.

4.3.15. Exercise. Prove Theorem 4.3.14.
The reader can prove the next result readily.

4.3.16. Theorem. Let A and B be (m × n) matrices. Then

(i) every matrix A is equivalent to a matrix of the form

[I_r | 0]
[0   | 0],   r = rank A,   (4.3.17)

i.e., the (m × n) matrix whose entries in positions (1, 1), ..., (r, r) are 1 and all of whose remaining entries are 0;

(ii) two (m × n) matrices A and B are equivalent if and only if they have the same rank; and

(iii) A and Aᵀ have the same rank.

4.3.18. Exercise. Prove Theorem 4.3.16.
Our definition of the rank of a matrix given in the last section (Definition 4.2.25) is sometimes called the column rank of a matrix. Sometimes, an analogous definition for the row rank of a matrix is also considered. The above theorem shows that the row rank of a matrix is equal to its column rank.
Next, let us consider the special case when X = Y. We have:

4.3.19. Theorem. Let A ∈ L(X, X), let {e₁, ..., eₙ} be a basis for X, and let A be the matrix of A with respect to {e₁, ..., eₙ}. Let {e′₁, ..., e′ₙ} be another basis for X whose matrix with respect to {e₁, ..., eₙ} is P. Let A′ be the matrix of A with respect to {e′₁, ..., e′ₙ}. Then

A′ = P⁻¹AP.   (4.3.20)
The meaning of the above theorem is depicted schematically in Figure B. The proof of this theorem is just a special application of Theorem 4.3.10.

4.3.21. Figure B. Schematic diagram of Theorem 4.3.19.
Theorem 4.3.19 gives rise to the following concept.

4.3.22. Definition. An (n × n) matrix A′ is said to be similar to an (n × n) matrix A if there exists an (n × n) non-singular matrix P such that

A′ = P⁻¹AP.   (4.3.23)

If A′ is similar to A, we write A′ ~ A. We call P a similarity transformation.
It is a simple matter to prove the following:

4.3.24. Theorem. Let A′ be similar to A; i.e., A′ = P⁻¹AP, where P is non-singular. Then A is similar to A′, and A = PA′P⁻¹.

In view of this result, there is no ambiguity in saying that two matrices are similar. To sum up, if two matrices A and A′ represent the same linear transformation A ∈ L(X, X), possibly with respect to two different bases for X, then A and A′ are similar matrices.
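A quick numerical illustration of similarity follows. The matrices are arbitrary illustrative choices; P is chosen with determinant 1 so that its inverse has integer entries.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A    = [[2, 1], [1, 3]]
P    = [[1, 1], [0, 1]]
Pinv = [[1, -1], [0, 1]]              # inverse of P (det P = 1)

Aprime = matmul(matmul(Pinv, A), P)   # A' = P^{-1} A P, Eq. (4.3.23)
print(Aprime)

# Similar matrices share trace and determinant (cf. Theorem 4.4.30
# in the next section: they have equal determinants).
tr  = lambda M: M[0][0] + M[1][1]
det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(tr(A) == tr(Aprime), det(A) == det(Aprime))
```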
Our next result shows that ~ as given in Definition 4.3.22 is an equivalence relation.

4.3.25. Theorem. Let A, B, and C be (n × n) matrices. Then

(i) A is similar to A;
(ii) if A is similar to B, then B is similar to A; and
(iii) if A is similar to B and if B is similar to C, then A is similar to C.

4.3.26. Exercise. Prove Theorem 4.3.25.
For similar matrices we also have the following result.

4.3.27. Theorem.

(i) If an (n × n) matrix A is similar to an (n × n) matrix B, then Aᵏ is similar to Bᵏ, where k is a positive integer.
(ii) Let

f(λ) = α₀ + α₁λ + ... + α_m λᵐ,   (4.3.28)

where α₀, ..., α_m ∈ F. Then

f(P⁻¹AP) = P⁻¹f(A)P.   (4.3.29)

This implies that if B is similar to A, then f(B) is similar to f(A). In fact, the same matrix P is involved.
(iii) Let A′ be similar to A, and let f(λ) denote the polynomial of Eq. (4.3.28). Then f(A) = 0 if and only if f(A′) = 0.
(iv) Let A ∈ L(X, X), and let A be the matrix of A with respect to a basis {e₁, ..., eₙ} in X. Let f(λ) denote the polynomial of Eq. (4.3.28). Then f(A) is the matrix of f(A) with respect to the basis {e₁, ..., eₙ}.
(v) Let A ∈ L(X, X), and let f(λ) denote the polynomial of Eq. (4.3.28). Let A be any matrix of A. Then f(A) = 0 if and only if f(A) = 0.

4.3.30. Exercise. Prove Theorem 4.3.27.
We can use results such as the preceding ones to good advantage. For example, let A′ denote the matrix

     [λ₁ 0  ...  0    0 ]
     [0  λ₂ ...  0    0 ]
A′ = [ ⋮              ⋮ ]   (4.3.31)
     [0  0  ...  λₙ₋₁ 0 ]
     [0  0  ...  0    λₙ].

Then

        [λ₁ᵏ 0   ...  0  ]
(A′)ᵏ = [0   λ₂ᵏ ...  0  ]
        [ ⋮           ⋮  ]
        [0   0   ...  λₙᵏ].

Now let f(λ) be given by Eq. (4.3.28). Then

f(A′) = α₀I + α₁A′ + ... + α_m(A′)ᵐ = [f(λ₁) 0     ...  0    ]
                                      [0     f(λ₂) ...  0    ]
                                      [ ⋮                ⋮   ]
                                      [0     0     ...  f(λₙ)].

We conclude the present section with the following definition.

4.3.32. Definition. We call a matrix of the form (4.3.31) a diagonal matrix. Specifically, a square (n × n) matrix A = [aᵢⱼ] is said to be a diagonal matrix if aᵢⱼ = 0 for all i ≠ j. In this case we write A = diag(a₁₁, a₂₂, ..., aₙₙ).
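For a diagonal matrix as in Eq. (4.3.31), powers and polynomials act entrywise on the diagonal. The sketch below confirms this with arbitrary sample values; the helper names are not from the text.

```python
def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_of_matrix(coeffs, A):
    """f(A) = a0*I + a1*A + ... for coeffs = [a0, a1, ...]."""
    n = len(A)
    result = diag([0] * n)
    power = diag([1] * n)          # A^0 = I
    for a in coeffs:
        result = [[result[i][j] + a * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = matmul(power, A)
    return result

lams = [2, -1, 3]                  # sample diagonal entries
coeffs = [1, 0, 1]                 # f(lambda) = 1 + lambda^2
f = lambda lam: 1 + lam ** 2

print(poly_of_matrix(coeffs, diag(lams)))
print(diag([f(lam) for lam in lams]))   # the same diagonal matrix
```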
4.4. DETERMINANTS OF MATRICES

At this point of our development we need to consider the important topic of determinants. After stating the definition of the determinant of a matrix, we explore some of the commonly used properties of determinants. We then characterize singular and non-singular linear transformations on finite-dimensional vector spaces in terms of determinants. Finally, we give a method of determining the inverse of non-singular matrices.

Let N = {1, 2, ..., n}. We recall (see Definition 1.2.28) that a permutation on N is a one-to-one mapping of N onto itself. For example, if σ denotes a permutation on N, then we can represent it as

σ = (1  2  ...  n )
    (j₁ j₂ ...  jₙ),

where jᵢ ∈ N for i = 1, ..., n and jᵢ ≠ jₖ for i ≠ k. Henceforth, we represent the σ given above, more compactly, as

σ = j₁j₂ ... jₙ.

Clearly, there are n! possible permutations on N. We let P(N) denote the set of all permutations on N, and we distinguish between odd and even permutations. Specifically, if there is an even number of pairs (i, k) such that i > k but i precedes k in σ, then we say that σ is even. Otherwise σ is said to be odd. Finally, we define the function sgn from P(N) into F by

sgn(σ) = +1 if σ is even,
         −1 if σ is odd,

for all σ ∈ P(N). Before giving the definition of the determinant of a matrix, let us consider a specific example.
4.4.1. Example. As indicated in the accompanying table, there are six permutations on N = {1, 2, 3}. In this table the odd and even permutations are identified and the function sgn is given.

σ    (j₁, j₂)  (j₁, j₃)  (j₂, j₃)  odd or even  sgn σ
123  (1, 2)    (1, 3)    (2, 3)    even         +1
132  (1, 3)    (1, 2)    (3, 2)    odd          −1
213  (2, 1)    (2, 3)    (1, 3)    odd          −1
231  (2, 3)    (2, 1)    (3, 1)    even         +1
312  (3, 1)    (3, 2)    (1, 2)    even         +1
321  (3, 2)    (3, 1)    (2, 1)    odd          −1
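The table of Example 4.4.1 can be reproduced mechanically by counting the pairs that occur out of order. This is a sketch using the standard library; the function name sgn is an assumption mirroring the text's notation.

```python
from itertools import permutations

def sgn(p):
    """+1 for an even permutation, -1 for an odd one, by counting
    the pairs (i, k) with i > k but i preceding k in p."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p))
                     if p[a] > p[b])
    return 1 if inversions % 2 == 0 else -1

for p in permutations((1, 2, 3)):
    print(p, sgn(p))
# six permutations in all; 123, 231, 312 are even, the others odd
```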
Now let A denote the (n × n) matrix

    [a₁₁ a₁₂ ... a₁ₙ]
A = [a₂₁ a₂₂ ... a₂ₙ]
    [ ⋮             ]
    [aₙ₁ aₙ₂ ... aₙₙ].

We form the product of n elements from A by taking one and only one element from each row and one and only one element from each column. We represent this product as

a₁ⱼ₁ · a₂ⱼ₂ · ... · aₙⱼₙ,

where σ = (j₁j₂ ... jₙ) ∈ P(N). It is possible to find n! such products, one for each σ ∈ P(N). We now define the determinant of A, denoted by det(A), by the sum

det(A) = Σ_{σ∈P(N)} sgn(σ) · a₁ⱼ₁ · a₂ⱼ₂ · ... · aₙⱼₙ,   (4.4.2)

where σ = j₁ ... jₙ. We also denote the determinant of A by writing

         |a₁₁ a₁₂ ... a₁ₙ|
det(A) = |a₂₁ a₂₂ ... a₂ₙ|
         | ⋮             |
         |aₙ₁ aₙ₂ ... aₙₙ|.   (4.4.3)
We now present some of the fundamental properties of determinants.

4.4.4. Theorem. Let A and B be (n × n) matrices.

(i) det(Aᵀ) = det(A).
(ii) If all elements of a column (or row) of A are zero, then det(A) = 0.
(iii) If B is the matrix obtained by multiplying every element in a column (or row) of A by a constant α, while all other columns of B are the same as those in A, then det(B) = α det(A).
(iv) If B is the same as A, except that two columns (or rows) are interchanged, then det(B) = −det(A).
(v) If two columns (or rows) of A are identical, then det(A) = 0.
(vi) If the columns (or rows) of A are linearly dependent, then det(A) = 0.

Proof. To prove the first part, we note first that each product in the sum given in Eq. (4.4.2) has as a factor one and only one element from each column and each row of A. Thus, transposing matrix A will not affect the n! products appearing in the summation. We now must check to see that the sign of each term is the same. For σ ∈ P(N), the term in det(A) corresponding to σ is sgn(σ) a₁ⱼ₁ a₂ⱼ₂ ... aₙⱼₙ. There is a product term in det(Aᵀ) of the form aⱼ′₁₁ aⱼ′₂₂ ... aⱼ′ₙₙ such that a₁ⱼ₁ a₂ⱼ₂ ... aₙⱼₙ = aⱼ′₁₁ aⱼ′₂₂ ... aⱼ′ₙₙ; the right-hand side of this equation is just a rearrangement of the left-hand side. The number of out-of-order pairs in j₁ ... jₙ is the same as the number of out-of-order pairs in j′₁ ... j′ₙ. Thus, if σ′ = (j′₁j′₂ ... j′ₙ), then sgn(σ′) = sgn(σ), which means det(Aᵀ) = det(A). Note that this result implies that any property below which is proved for columns holds equally well for rows.

To prove the second part, we note from Eq. (4.4.2) that if for some i, aᵢₖ = 0 for all k, then det(A) = 0. This proves that if every element in a row of A is zero, then det(A) = 0. By part (i) it follows that this result holds also for columns. ∎
4.4.5. Exercise. Prove parts (iii)-(vi) of Theorem 4.4.4.
We now introduce some additional concepts for determinants.

4.4.6. Definition. Let A = [aᵢⱼ] be an (n × n) matrix. If the ith row and jth column of A are deleted, the remaining (n − 1) rows and (n − 1) columns can be used to form another matrix Mᵢⱼ whose determinant is det(Mᵢⱼ). We call det(Mᵢⱼ) the minor of aᵢⱼ. If the diagonal elements of Mᵢⱼ are diagonal elements of A, i.e., i = j, then we speak of a principal minor of A. The cofactor of aᵢⱼ is defined as (−1)ⁱ⁺ʲ det(Mᵢⱼ).

For example, if A is a (3 × 3) matrix, then

         |a₁₁ a₁₂ a₁₃|
det(A) = |a₂₁ a₂₂ a₂₃|,
         |a₃₁ a₃₂ a₃₃|

the minor of element a₂₃ is

det(M₂₃) = |a₁₁ a₁₂|
           |a₃₁ a₃₂|,

and the cofactor of a₂₃ is

(−1)²⁺³ det(M₂₃) = −|a₁₁ a₁₂|
                    |a₃₁ a₃₂|.
The next result provides us with a convenient method of evaluating determinants.

4.4.7. Theorem. Let A be an (n × n) matrix, and let cᵢⱼ denote the cofactor of aᵢⱼ, i, j = 1, ..., n. Then the determinant of A is equal to the sum of the products of the elements of any column (or row) of A, each by its own cofactor. Specifically,

det(A) = Σᵢ₌₁ⁿ aᵢⱼ cᵢⱼ   (4.4.8)

for j = 1, ..., n, and

det(A) = Σⱼ₌₁ⁿ aᵢⱼ cᵢⱼ   (4.4.9)

for i = 1, ..., n.

For example, if A is a (2 × 2) matrix, then we have

det(A) = |a₁₁ a₁₂| = a₁₁a₂₂ − a₂₁a₁₂.
         |a₂₁ a₂₂|

If A is a (3 × 3) matrix, then we have

det(A) = a₁₁c₁₁ + a₁₂c₁₂ + a₁₃c₁₃.

In this case five other possibilities exist. For example, we also have

det(A) = a₁₁c₁₁ + a₂₁c₂₁ + a₃₁c₃₁.

4.4.10. Exercise. Prove Theorem 4.4.7.
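The expansion of Theorem 4.4.7 gives a recursive determinant routine. This is a sketch expanding along the first row; the helper names are assumptions.

```python
def minor(A, i, j):
    """Matrix M_ij of Definition 4.4.6: delete row i and column j."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Cofactor expansion along the first row (Theorem 4.4.7)."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j))
               for j in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
print(det(A))   # -3, agreeing with the permutation-sum definition (4.4.2)
```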
We also have:

4.4.11. Theorem. If the ith row of an (n × n) matrix A consists of elements of the form aᵢ₁ + a′ᵢ₁, aᵢ₂ + a′ᵢ₂, ..., aᵢₙ + a′ᵢₙ, i.e., if

    [a₁₁         ...  a₁ₙ        ]
A = [aᵢ₁ + a′ᵢ₁  ...  aᵢₙ + a′ᵢₙ ]
    [aₙ₁         ...  aₙₙ        ],

then

         |a₁₁ ... a₁ₙ|   |a₁₁  ... a₁ₙ |
det(A) = |aᵢ₁ ... aᵢₙ| + |a′ᵢ₁ ... a′ᵢₙ|,
         |aₙ₁ ... aₙₙ|   |aₙ₁  ... aₙₙ |

where the two determinants on the right are those of the matrices obtained from A by replacing its ith row with (aᵢ₁, ..., aᵢₙ) and with (a′ᵢ₁, ..., a′ᵢₙ), respectively.

4.4.12. Exercise. Prove Theorem 4.4.11.
In addition, we can prove:
Chapter 4 I iF nite-Dimensional
160
Vector Spaces and Matrices
.4 .4 15.
Theorem. Let A be an (n X n) matrix, and let c,/ denote the cofactor of 0 ,/, i,j = I, ... , n. Then the sum of products of the elements of any column (or row) by the corresponding cofactors of the elements of any other column (or row) is ez ro. That is,
• ~
a,/c ,k
1=1
and
= 0 for j
*' k
(4..4 16a) (4..4 16b)
.4 .4 17.
Exercise.
Prove Theorem .4 .4 15.
We can combine Eqs. (4.4.8) and (4.4.16a) to obtain

Σᵢ₌₁ⁿ aᵢⱼ cᵢₖ = det(A) δⱼₖ,   (4.4.18)

j, k = 1, ..., n, where δⱼₖ denotes the Kronecker delta. Similarly, we can combine Eqs. (4.4.9) and (4.4.16b) to obtain

Σⱼ₌₁ⁿ aᵢⱼ cₖⱼ = det(A) δᵢₖ,   (4.4.19)

i, k = 1, ..., n. We are now in a position to prove the following important result.
4.4.20. Theorem. Let A and B be (n × n) matrices. Then

det(AB) = det(A) det(B).   (4.4.21)

Proof. Let aᵢ denote the ith column of A, so that the jth column of AB is Σᵢ₌₁ⁿ bᵢⱼ aᵢ. By Theorem 4.4.11 and Theorem 4.4.4, part (iii), we have

det(AB) = Σ_{i₁=1}ⁿ ··· Σ_{iₙ=1}ⁿ b_{i₁1} b_{i₂2} ··· b_{iₙn} det[a_{i₁}, a_{i₂}, ..., a_{iₙ}].

This determinant will vanish whenever two or more of the indices iⱼ, j = 1, ..., n, are identical. Thus, we need to sum only over σ ∈ P(N). We have

det(AB) = Σ_{σ∈P(N)} b_{i₁1} b_{i₂2} ··· b_{iₙn} det[a_{i₁}, ..., a_{iₙ}],

where σ = i₁i₂ ··· iₙ and P(N) is the set of all permutations of N = {1, ..., n}. It is now straightforward to show that

det[a_{i₁}, ..., a_{iₙ}] = sgn(σ) det(A),

and hence it follows that det(AB) = det(A) det(B). ∎
Our next result is readily verified.

4.4.22. Theorem. Let I be the (n × n) identity matrix, and let 0 be the (n × n) zero matrix. Then det(I) = 1 and det(0) = 0.

4.4.23. Exercise. Prove Theorem 4.4.22.

The next theorem allows us to characterize non-singular matrices in terms of their determinants.

4.4.24. Theorem. An (n × n) matrix A is non-singular if and only if det(A) ≠ 0.

Proof. Suppose that A is non-singular. Then A⁻¹ exists and A⁻¹A = AA⁻¹ = I. From this it follows that det(A⁻¹A) = 1 ≠ 0, and thus, in view of Eq. (4.4.21), det(A⁻¹) ≠ 0 and det(A) ≠ 0.

Next, assume that A is singular. By Theorem 4.3.16, there exist non-singular matrices Q and P such that

A′ = QAP = [I_r | 0]
           [0   | 0],   r = rank A.

This shows that rank A < n, and det(A′) = 0. But

det(QAP) = [det(Q)] · [det(A)] · [det(P)] = 0,

and det(Q) ≠ 0 and det(P) ≠ 0. Therefore, if A is singular, then det(A) = 0. ∎
Let us now turn to the problem of finding the inverse A⁻¹ of a non-singular matrix A. In doing so, we need to introduce the classical adjoint of A.

4.4.25. Definition. Let A be an (n × n) matrix, and let cᵢⱼ be the cofactor of aᵢⱼ for i, j = 1, ..., n. Let C be the matrix formed by the cofactors of A, i.e., C = [cᵢⱼ]. The matrix Cᵀ is called the classical adjoint of A. We write adj(A) to denote the classical adjoint of A.

We now have:

4.4.26. Theorem. Let A be an (n × n) matrix. Then

A[adj(A)] = [adj(A)]A = [det(A)] · I.

Proof. The proof follows by direct computation, using Eqs. (4.4.18) and (4.4.19). ∎

As an immediate consequence of Theorem 4.4.26, we now have the following practical result.

4.4.27. Corollary. Let A be a non-singular (n × n) matrix. Then

A⁻¹ = (1/det(A)) adj(A).   (4.4.28)

4.4.29. Example. Consider a (3 × 3) matrix A with det(A) = −1. Then, by Eq. (4.4.28), A⁻¹ = (1/det(A)) adj(A) = −adj(A). ∎
-~
The proofs of the next two theorems are left as an exercise. .4 .4 30.
Theorem. If A and 8 are similar matrices, then det (A) =
det (8).
X). Let A be the matrix of A with respect .4 .4 31. Theorem. Let A E L ( X , to a basis {el>' .. ,e,,} in ,X and let A' be the matrix of A with respect to another basis fe;, ... , e:.} in .X Then det (A) = det (A').
.4 5.
Eigenvalues and Eigenvectors
.4 .4 32.
Exercise.
163
Prove Theorems .4 .4 30
and .4 .4 31.
In view of the preceding results, there is no ambiguity in the following definition.
.4 .4 33. Definition. The determinant of a linear transformation A of a finite-dimensional vector space X into X is the determinant of any matrix A representing it; i.e., det (A) Do det (A). The last result of the present section is a consequence of Theorems .4 .4 20 and .4 .4 24.
.4 .4 34.
Theorem. Let X be a finite-dimensional vector space, and let A, B E L ( X , X ) . Then A is non-singular if and only if det (A) O. Also, det (AB) = d[ et (A)] • d[ et (B)].
*"
4.5. EIGENVALUES AND EIGENVECTORS

In the present section we consider eigenvalues and eigenvectors of linear transformations defined on finite-dimensional vector spaces. Later, in Chapter 7, we will reconsider these concepts in a more general setting. Eigenvalues and eigenvectors play, of course, a crucial role in the study of linear transformations.

Throughout the present section, X denotes an n-dimensional vector space over a field F. Let A ∈ L(X, X), and let us assume that there exist sets of vectors {e₁, ..., eₙ} and {e′₁, ..., e′ₙ}, which are bases for X, such that

e′₁ = Ae₁ = λ₁e₁,
  ⋮                 (4.5.1)
e′ₙ = Aeₙ = λₙeₙ,

where λᵢ ∈ F, i = 1, ..., n. If this is the case, then the matrix A′ of A with respect to the given basis is

A′ = diag(λ₁, λ₂, ..., λₙ).

This motivates the following result.
4.5.2. Theorem. Let A ∈ L(X, X), and let λ ∈ F. Then the set of all x ∈ X such that

Ax = λx   (4.5.3)

is a linear subspace of X. In fact, it is the null space of the linear transformation (A − λI), where I is the identity element of L(X, X).

Proof. Since the zero vector satisfies Eq. (4.5.3) for any λ ∈ F, the set is non-void. If the zero vector is the only such vector, then we are done, for {0} is a linear subspace of X (of dimension zero). In any case, Eq. (4.5.3) holds if and only if (A − λI)x = 0. Thus, x belongs to the null space of A − λI, and it follows from Theorem 3.4.19 that the set of all x ∈ X satisfying Eq. (4.5.3) is a linear subspace of X. ∎

Henceforth we let

𝔑_λ = {x ∈ X : (A − λI)x = 0}.   (4.5.4)

The preceding result gives rise to several important concepts which we introduce in the following definition.
4.5.5. Definition. Let X, A ∈ L(X, X), and 𝔑_λ be defined as in Theorem 4.5.2 and Eq. (4.5.4). A scalar λ such that 𝔑_λ contains more than just the zero vector is called an eigenvalue of A (i.e., if there is an x ≠ 0 such that Ax = λx, then λ is called an eigenvalue of A). When λ is an eigenvalue of A, then each x ≠ 0 in 𝔑_λ is called an eigenvector of A corresponding to the eigenvalue λ. The dimension of the linear subspace 𝔑_λ is called the multiplicity of the eigenvalue λ. If 𝔑_λ is of dimension one, then λ is called a simple eigenvalue. The set of all eigenvalues of A is called the spectrum of A.

Some authors call an eigenvalue a proper value or a characteristic value or a latent value or a secular value. Similarly, other names for eigenvector are proper vector or characteristic vector. The space 𝔑_λ is called the λth proper subspace of X.

For matrices we give the following corresponding definition.

4.5.6. Definition. Let A be an (n × n) matrix whose elements belong to the field F. If there exist λ ∈ F and a non-zero vector x ∈ Fⁿ such that

Ax = λx,   (4.5.7)

then λ is called an eigenvalue of A and x is called an eigenvector of A corresponding to the eigenvalue λ.
then .t is called an eigenvalue of A and x is called an eigenvector of A corresponding to the eigenvalue .t. Our next result provides the connection between Definitions .4 5.5 and .4 5.6. .4 5.8. Theorem. Let A E L ( X , X ) , and let A be the matrix of A with respect to the basis e{ ., ... ,e,,}. Then A. is an eigenvalue of A if and only if.t is an eigenvalue of A. Also, x E X is an eigenvector of A corresponding to .t if
.4 5.
165
Eigenvalues and Eigenvectors
and only if the coordinate representation of x with respect to the basis e{ I' • • , e,,}, ,x is an eigenvector of A corresponding to 1. .4 5.9.
Exercise.
Prove Theorem 4.5.8.
Note that if x (or x) is an eigenvector of A (of A), then any non-zero multiple of x (of x) is also an eigenvector of A (of A).

In the next result, the proof of which is left as an exercise, we use determinants to characterize eigenvalues. We have:

4.5.10. Theorem. Let A ∈ L(X, X). Then λ ∈ F is an eigenvalue of A if and only if det(A − λI) = 0.

4.5.11. Exercise. Prove Theorem 4.5.10.
Let us next examine the equation

det(A − λI) = 0   (4.5.12)

in terms of the parameter λ. We ask: can we determine which values of λ, if any, satisfy Eq. (4.5.12)? Let {e₁, ..., eₙ} be an arbitrary basis for X, and let A be the matrix of A with respect to this basis. We then have

det(A − λI) = det(A − λI),   (4.5.13)

where the left member refers to the transformation A and the right member to its matrix A. The right-hand side of Eq. (4.5.13) may be rewritten as

              |(a₁₁ − λ)  a₁₂       ...  a₁ₙ      |
det(A − λI) = |a₂₁        (a₂₂ − λ) ...  a₂ₙ      |   (4.5.14)
              | ⋮                                 |
              |aₙ₁        aₙ₂       ...  (aₙₙ − λ)|.
It is clear from Eq. (4.4.2) that expansion of the determinant (4.5.14) yields a polynomial in λ of degree n. In order for λ to be an eigenvalue of A, it must (a) satisfy Eq. (4.5.12), and (b) belong to F. Requirement (b) warrants further comment: note that there is no guarantee that there exists λ ∈ F such that Eq. (4.5.12) is satisfied; equivalently, we have no assurance that the nth-order polynomial equation

det(A − λI) = 0

has any roots in F. There is, however, a special class of fields for which requirement (b) is automatically satisfied. We have:

4.5.15. Definition. A field F is said to be algebraically closed if for every polynomial p(λ) there is at least one λ ∈ F such that

p(λ) = 0.   (4.5.16)
Any λ which satisfies Eq. (4.5.16) is said to be a root of the polynomial equation (4.5.16). In particular, the field of complex numbers is algebraically closed, whereas the field of real numbers is not (e.g., consider the equation λ² + 1 = 0). There are other fields besides the field of complex numbers which are algebraically closed. However, since we will not develop these, we will restrict ourselves to the field of complex numbers, C, whenever the algebraic closure property of Definition 4.5.15 is required. When considering results that are valid for a vector space over an arbitrary field, we will (as before) make use of the symbol F or frequently (as before) make no reference to F at all.

We summarize the above discussion in the following theorem.

4.5.17. Theorem. Let A ∈ L(X, X). Then

(i) det(A − λI) is a polynomial of degree n in the parameter λ; i.e., there exist scalars α₀, α₁, ..., αₙ, depending only on A, such that

det(A − λI) = α₀ + α₁λ + α₂λ² + ... + αₙλⁿ   (4.5.18)

(note that α₀ = det(A) and αₙ = (−1)ⁿ);

(ii) the eigenvalues of A are precisely the roots of the equation det(A − λI) = 0; i.e., they are the roots of

α₀ + α₁λ + α₂λ² + ... + αₙλⁿ = 0; and   (4.5.19)

(iii) A has, at most, n distinct eigenvalues.

The above result motivates the following definition.
4.5.20. Definition. Let A ∈ L(X, X), and let A be a matrix of A. We call

det(A − λI) = det(A − λI) = α₀ + α₁λ + ... + αₙλⁿ   (4.5.21)

the characteristic polynomial of A (or of A), and

det(A − λI) = det(A − λI) = 0   (4.5.22)

the characteristic equation of A (or of A).

From the fundamental properties of polynomials over the field of complex numbers there now follows:
4.5.23. Theorem. If X is an n-dimensional vector space over C and if A ∈ L(X, X), then it is possible to write the characteristic polynomial of A in the form

det(A − λI) = (λ₁ − λ)^{m₁} (λ₂ − λ)^{m₂} ··· (λ_p − λ)^{m_p},   (4.5.24)

where λᵢ, i = 1, ..., p, are the distinct roots of Eq. (4.5.19) (i.e., λᵢ ≠ λⱼ for i ≠ j). In Eq. (4.5.24), mᵢ is called the algebraic multiplicity of the root λᵢ. The mᵢ are positive integers, and Σᵢ₌₁ᵖ mᵢ = n.
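The coefficients of the characteristic polynomial can be computed without symbolic algebra. The sketch below uses the Faddeev-LeVerrier recursion, an algorithm not mentioned in the text, and returns the coefficients of det(λI − A), which equals (−1)ⁿ det(A − λI). The sample matrix has the single eigenvalue 2 with algebraic multiplicity 2.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(lambda*I - A),
    via the Faddeev-LeVerrier recursion (an assumed helper)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(0)] * n + [Fraction(1)]   # monic in lambda^n
    M = [row[:] for row in I]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

A = [[2, 1],
     [0, 2]]
print(char_poly(A))   # lambda^2 - 4*lambda + 4 = (lambda - 2)^2
```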
Note the distinction between the concept of algebraic multiplicity of λi given in Theorem 4.5.23 and the multiplicity of λi as given in Definition 4.5.5. In general, these need not be the same, as will be seen later.

We now state and prove one of the most important results of linear algebra, the Cayley-Hamilton theorem.

4.5.25. Theorem. Let A be an (n × n) matrix, and let p(λ) = det(A - λI) be the characteristic polynomial of A. Then p(A) = 0.
Proof. Let the characteristic polynomial for A be

    p(λ) = α0 + α1λ + ... + αnλ^n.

Now let B(λ) be the classical adjoint of (A - λI). Since the elements bij(λ) of B(λ) are cofactors of the matrix A - λI, they are polynomials in λ of degree not more than n - 1. Thus,

    bij(λ) = βij0 + βij1 λ + ... + βij(n-1) λ^{n-1}.

Letting Bk = [βijk] for k = 0, 1, ..., n - 1, we have

    B(λ) = B0 + λB1 + ... + λ^{n-1} B_{n-1}.

By Theorem 4.4.26,

    (A - λI)B(λ) = [det(A - λI)]I.

Thus,

    (A - λI)[B0 + λB1 + ... + λ^{n-1} B_{n-1}] = (α0 + α1λ + ... + αnλ^n)I.

Expanding the left-hand side of this equation and equating like powers of λ, we have

    -B_{n-1} = αn I,  AB_{n-1} - B_{n-2} = α_{n-1} I,  ...,  AB1 - B0 = α1 I,  AB0 = α0 I.

Premultiplying the above matrix equations by A^n, A^{n-1}, ..., A, I, respectively, we have

    -A^n B_{n-1} = αn A^n,
    A^n B_{n-1} - A^{n-1} B_{n-2} = α_{n-1} A^{n-1},
    ...,
    A^2 B1 - A B0 = α1 A,
    A B0 = α0 I.

Adding these matrix equations, we obtain

    0 = α0 I + α1 A + ... + αn A^n = p(A),

which was to be shown. ∎

Chapter 4 / Finite-Dimensional Vector Spaces and Matrices
As an immediate consequence of the Cayley-Hamilton theorem, we have:

4.5.26. Theorem. Let A be an (n × n) matrix with characteristic polynomial given by Eq. (4.5.21). Then

(i) A^n = (-1)^{n+1}[α0 I + α1 A + ... + α_{n-1} A^{n-1}]; and

(ii) if f(λ) is any polynomial in λ, then there exist β0, β1, ..., β_{n-1} ∈ F such that

    f(A) = β0 I + β1 A + ... + β_{n-1} A^{n-1}.

Proof. Part (i) follows from Theorem 4.5.25 and from the fact that αn = (-1)^n. To prove part (ii), let f(λ) be any polynomial in λ and let p(λ) denote the characteristic polynomial of A. Then there exist two polynomials g(λ) and r(λ) (see Theorem 2.3.9) such that

    f(λ) = p(λ)g(λ) + r(λ),    (4.5.27)

where deg[r(λ)] ≤ n - 1. Using the fact that p(A) = 0, we have f(A) = r(A), and the theorem follows. ∎
The Cayley-Hamilton theorem holds also in the case of linear transformations. Specifically, we have the following result.

4.5.28. Theorem. Let A ∈ L(X, X), and let p(λ) denote the characteristic polynomial of A. Then p(A) = 0.

4.5.29. Exercise. Prove Theorem 4.5.28.

Let us now consider a specific example.

4.5.30. Example. Consider the matrix

    A = [1  1]
        [0  2].

Let us use Theorem 4.5.26 to evaluate A^37. Since n = 2, we assume that A^37 is of the form

    A^37 = β0 I + β1 A.

The characteristic polynomial of A is

    p(λ) = (1 - λ)(2 - λ),

and the eigenvalues of A are λ1 = 1 and λ2 = 2. In the present case f(λ) = λ^37, and r(λ) in Eq. (4.5.27) is

    r(λ) = β0 + β1 λ.

We must determine β0 and β1. Using the fact that p(λ1) = p(λ2) = 0, it follows that f(λ1) = r(λ1) and f(λ2) = r(λ2). Thus, we have

    β0 + β1 = 1^37 = 1,
    β0 + 2β1 = 2^37.

Hence, β1 = 2^37 - 1 and β0 = 2 - 2^37, or

    A^37 = (2 - 2^37)I + (2^37 - 1)A.

Therefore,

    A^37 = [1  2^37 - 1]
           [0  2^37   ]. ∎
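The arithmetic of Example 4.5.30 can be checked numerically. The following sketch (assuming NumPy is available) compares the Cayley-Hamilton reduction β0·I + β1·A against A^37 computed by repeated multiplication; int64 arithmetic is exact here since all entries stay well below 2^63.

```python
import numpy as np

# Matrix of Example 4.5.30; with n = 2, Theorem 4.5.26 lets us write
# A^37 = beta0*I + beta1*A for suitable scalars beta0, beta1.
A = np.array([[1, 1],
              [0, 2]], dtype=np.int64)

# From beta0 + beta1 = 1**37 and beta0 + 2*beta1 = 2**37:
beta1 = 2**37 - 1
beta0 = 2 - 2**37

A37_reduced = beta0 * np.eye(2, dtype=np.int64) + beta1 * A
A37_direct = np.linalg.matrix_power(A, 37)

print(np.array_equal(A37_reduced, A37_direct))  # True
```

The point of the reduction is that a degree-37 matrix power collapses to a degree-1 polynomial in A, at the cost of solving a small linear system for the βi.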
Before closing the present section, let us introduce another important concept for matrices.

4.5.31. Definition. If A is an (n × n) matrix, then the trace of A, denoted by trace A or by tr A, is defined as

    trace A = a11 + a22 + ... + ann    (4.5.32)
(i.e., the trace of a square matrix is the sum of its diagonal elements). It turns out that if F = C, the field of complex numbers, then there is a relationship between the trace, determinant, and eigenvalues of an (n X n) matrix A. We have:
4.5.33. Theorem. Let X be a vector space over C. Let A be a matrix of A ∈ L(X, X), and let det(A - λI) be given by Eq. (4.5.24). Then

(i) det(A) = Π(j=1..p) λj^mj;

(ii) trace(A) = Σ(j=1..p) mj λj;

(iii) if B is any matrix similar to A, then trace(B) = trace(A); and

(iv) if f(λ) denotes the polynomial

    f(λ) = a0 + a1λ + ... + aqλ^q,

then the roots of the characteristic polynomial of f(A) are f(λ1), ..., f(λp), and

    det[f(A) - λI] = [f(λ1) - λ]^m1 ... [f(λp) - λ]^mp.

4.5.34. Exercise. Prove Theorem 4.5.33.
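Parts (i)-(iii) of Theorem 4.5.33 are easy to check numerically. A minimal sketch, assuming NumPy is available (the matrix here is an arbitrary choice, not one from the text):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

eigvals = np.linalg.eigvals(A)

# (i) det(A) is the product of the eigenvalues (counted with multiplicity).
print(np.isclose(np.prod(eigvals), np.linalg.det(A)))

# (ii) trace(A) is the sum of the eigenvalues.
print(np.isclose(np.sum(eigvals), np.trace(A)))

# (iii) the trace is a similarity invariant: trace(P^-1 A P) = trace(A).
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
B = np.linalg.inv(P) @ A @ P
print(np.isclose(np.trace(B), np.trace(A)))
```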
4.6. SOME CANONICAL FORMS OF MATRICES
In the present section we investigate under which conditions a linear transformation of a vector space into itself can be represented by special types of matrices, namely, by (a) a diagonal matrix, (b) a so-called triangular matrix, and (c) a so-called "block diagonal matrix." We will also investigate when a linear transformation cannot be represented by a diagonal matrix. Throughout the present section X denotes an n-dimensional vector space over a field F.

4.6.1. Theorem. Let λ1, ..., λp be distinct eigenvalues of a linear transformation A ∈ L(X, X). Let e1 ≠ 0, ..., ep ≠ 0 be eigenvectors of A corresponding to λ1, ..., λp, respectively. Then the set {e1, ..., ep} is linearly independent.
Proof. The proof is by contradiction. Assume that the set {e1, ..., ep} is linearly dependent, so that there exist scalars α1, ..., αp, not all zero, such that

    α1 e1 + ... + αp ep = 0.

We assume that these scalars have been chosen in such a fashion that as few of them as possible are non-zero. Relabeling, if necessary, we thus have

    α1 e1 + ... + αr er = 0,    (4.6.2)

where α1 ≠ 0, ..., αr ≠ 0 and where r ≤ p is the smallest number for which we can get such an expression. Since λ1, ..., λr are eigenvalues and since e1, ..., er are eigenvectors, we have

    0 = A(0) = A(α1 e1 + ... + αr er) = α1 Ae1 + ... + αr Aer
      = (α1 λ1)e1 + ... + (αr λr)er.    (4.6.3)

Also,

    0 = λr · 0 = λr(α1 e1 + ... + αr er) = (α1 λr)e1 + ... + (αr λr)er.    (4.6.4)

Subtracting Eq. (4.6.4) from Eq. (4.6.3) we obtain

    0 = α1(λ1 - λr)e1 + ... + α_{r-1}(λ_{r-1} - λr)e_{r-1}.

Since by assumption the λi's are distinct, we have found an expression involving only (r - 1) vectors satisfying Eq. (4.6.2). But r was chosen to be the smallest number for which Eq. (4.6.2) holds. We have thus arrived at a contradiction, and our theorem is proved. ∎

We note that if, in the above theorem, A has n distinct eigenvalues, then the corresponding n eigenvectors span the linear space X (recall that dim X = n).
Our next result enables us to represent a linear transformation with n distinct eigenvalues in a very convenient form.

4.6.5. Theorem. Let A ∈ L(X, X). Assume that the characteristic polynomial of A has n distinct roots, so that

    det(A - λI) = (λ1 - λ)(λ2 - λ) ... (λn - λ),

where λ1, λ2, ..., λn are distinct eigenvalues. Then there exists a basis {e1', e2', ..., en'} of X such that ei' is an eigenvector corresponding to λi for i = 1, 2, ..., n. The matrix A' of A with respect to the basis {e1', e2', ..., en'} is

    A' = [λ1           0]
         [    λ2        ]
         [       ...    ]
         [0          λn].    (4.6.6)

Proof. Let ei' denote the eigenvector corresponding to the eigenvalue λi. In view of Theorem 4.6.1, the set {e1', e2', ..., en'} is linearly independent because λ1, λ2, ..., λn are all different. Moreover, since there are n of the ei', the set {e1', ..., en'} forms a basis for the n-dimensional space X. Also, from the definition of eigenvalue and eigenvector, we have

    Ae1' = λ1 e1',
    Ae2' = λ2 e2',
    ...
    Aen' = λn en'.    (4.6.7)

From Eq. (4.6.7) we obtain the desired matrix given in Eq. (4.6.6). ∎
The reader can readily prove the following useful result.

4.6.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect to a basis {e1, e2, ..., en}. If the characteristic polynomial

    det(A - λI) = α0 + α1λ + α2λ^2 + ... + αnλ^n

has n distinct roots λ1, ..., λn, then A is similar to the matrix A' of A with respect to a basis {e1', ..., en'}, where

    A' = [λ1           0]
         [    λ2        ]
         [       ...    ]
         [0          λn].    (4.6.9)

In this case there exists a non-singular matrix P such that

    A' = P^{-1} A P.    (4.6.10)

The matrix P is the matrix of the basis {e1', e2', ..., en'} with respect to the basis {e1, e2, ..., en}, and P^{-1} is the matrix of the basis {e1, ..., en} with respect to the basis {e1', ..., en'}. The matrix P can be constructed by letting its columns be eigenvectors of A corresponding to λ1, ..., λn, respectively. That is,

    P = [x1, x2, ..., xn],    (4.6.11)

where x1, x2, ..., xn are eigenvectors of A corresponding to the eigenvalues λ1, ..., λn, respectively.

The similarity transformation P given in Eq. (4.6.11) is called a modal matrix. If the conditions of Theorem 4.6.8 are satisfied and if, in particular, Eq. (4.6.9) holds, then we say that the matrix A has been diagonalized.

4.6.12. Exercise. Prove Theorem 4.6.8.
Let us now consider some specific examples.

4.6.13. Example. Let X be a two-dimensional vector space over the field of real numbers. Let A ∈ L(X, X), and let {e1, e2} be a basis for X. Suppose the matrix A of A with respect to this basis is given by

    A = [-2  1]
        [ 4  1].

The characteristic polynomial of A is

    p(λ) = det(A - λI) = λ^2 + λ - 6.

Now det(A - λI) = 0 if and only if λ^2 + λ - 6 = 0, or (λ - 2)(λ + 3) = 0. Thus, the eigenvalues of A are λ1 = 2 and λ2 = -3. To find an eigenvector corresponding to λ1, we solve the equation (A - λ1 I)x = 0, or

    [-4   1] [ξ1]   [0]
    [ 4  -1] [ξ2] = [0].

The last equation yields the equations

    -4ξ1 + ξ2 = 0,  4ξ1 - ξ2 = 0.

These are satisfied whenever ξ2 = 4ξ1. Thus, any vector of the form

    x = [ξ ]
        [4ξ],  ξ ≠ 0,

is an eigenvector of A corresponding to the eigenvalue λ1. For convenience, let us choose ξ = 1. Then

    x1 = [1]
         [4]

is an eigenvector. In a similar fashion we obtain an eigenvector x2 corresponding to λ2, given by

    x2 = [ 1]
         [-1].

The diagonal matrix A' given in Eq. (4.6.9) is, in the present case,

    A' = [λ1   0]   [2   0]
         [ 0  λ2] = [0  -3].

We can arrive at A' using Eq. (4.6.10). Specifically, let

    P = [x1, x2] = [1   1]
                   [4  -1].

Then

    P^{-1} = [0.2   0.2]
             [0.8  -0.2],

and

    P^{-1} A P = [2   0]
                 [0  -3] = A'.

By Eq. (4.3.2), the basis {e1', e2'} ⊂ X with respect to which A' represents A is given by

    e1' = Σi pi1 ei = e1 + 4e2,  e2' = Σi pi2 ei = e1 - e2.

In view of Theorem 4.3.8, if x is the coordinate representation of x with respect to {e1, e2}, then x' = P^{-1} x is the coordinate representation of x with respect to {e1', e2'}. The vectors e1', e2' are, of course, eigenvectors of A corresponding to λ1 and λ2, respectively. ∎
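The modal-matrix computation of Example 4.6.13 is easy to reproduce numerically; a sketch assuming NumPy is available:

```python
import numpy as np

# Matrix and eigenvectors from Example 4.6.13.
A = np.array([[-2.0, 1.0],
              [ 4.0, 1.0]])
x1 = np.array([1.0, 4.0])    # eigenvector for lambda_1 = 2
x2 = np.array([1.0, -1.0])   # eigenvector for lambda_2 = -3

# Modal matrix, Eq. (4.6.11), and the similarity transformation, Eq. (4.6.10).
P = np.column_stack([x1, x2])
A_prime = np.linalg.inv(P) @ A @ P

print(np.allclose(A @ x1, 2.0 * x1))               # True
print(np.allclose(A @ x2, -3.0 * x2))              # True
print(np.allclose(A_prime, np.diag([2.0, -3.0])))  # True
```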
When the algebraic multiplicity of one or more of the eigenvalues of a linear transformation is greater than one, then the linear transformation is said to have repeated eigenvalues. Unfortunately, in this case it is not always possible to represent the linear transformation by a diagonal matrix. To put it another way, if a square matrix has repeated eigenvalues, then it is not always possible to diagonalize it. However, from the preceding results of the present section it should be clear that a linear transformation with repeated eigenvalues can be represented by a diagonal matrix if the number of linearly independent eigenvectors corresponding to any eigenvalue is the same as the algebraic multiplicity of the eigenvalue. The following examples throw additional light on these comments.

4.6.14. Example. The characteristic equation of the matrix

    A = [1  3  -2]
        [0  4  -2]
        [0  3  -1]

is

    det(A - λI) = (1 - λ)^2 (2 - λ) = 0,

and the eigenvalues of A are λ1 = 1 and λ2 = 2. The algebraic multiplicity of λ1 is two. Corresponding to λ1 we can find two linearly independent eigenvectors, e.g.,

    x1 = [1]        x2 = [0]
         [0]  and        [2]
         [0]             [3].

Corresponding to λ2 we have an eigenvector

    x3 = [1]
         [1]
         [1].

Letting P denote a modal matrix, we have

    P = [1  0  1]
        [0  2  1]
        [0  3  1],

    P^{-1} = [1  -3   2]
             [0  -1   1]
             [0   3  -2],

and

    A' = P^{-1} A P = [1  0  0]
                      [0  1  0]
                      [0  0  2].

In this example, dim 𝔑_λ1 = 2, which happens to be the same as the algebraic multiplicity of λ1. For this reason we were able to diagonalize the matrix A. ∎

The next example shows that the multiplicity of an eigenvalue need not be the same as its algebraic multiplicity. In this case we are not able to diagonalize the matrix.
4.6.15. Example. The characteristic equation of the matrix

    A = [2  1  -2]
        [0  2  -1]
        [0  0   1]

is

    det(A - λI) = (1 - λ)(2 - λ)^2 = 0,

and the eigenvalues of A are λ1 = 1 and λ2 = 2. The algebraic multiplicity of λ2 is two. An eigenvector corresponding to λ1 is x1^T = (1, 1, 1). An eigenvector corresponding to λ2 must be of the form

    x = [α]
        [0]
        [0],  α ≠ 0.

Setting x^T = (1, 0, 0), we see that dim 𝔑_λ2 = 1, and thus we have not been able to determine a basis for R^3 consisting of eigenvectors. Consequently, we have not been able to diagonalize A. ∎

When a matrix cannot be diagonalized we seek, for practical reasons, to represent a linear transformation by a matrix which is as nearly diagonal as possible. Our next result provides the basis for representing linear transformations by such matrices, which we call block diagonal matrices. In the next section we will consider the "simplest" type of block diagonal matrix, called the Jordan canonical form.

4.6.16. Theorem. Let X be an n-dimensional vector space, and let A ∈ L(X, X). Let Y and Z be linear subspaces of X such that X = Y ⊕ Z and such that A is reduced by Y and Z. Then there exists a basis for X such that the matrix A of A with respect to this basis has the form

    A = [A1   0]
        [ 0  A2],

where dim Y = r, A1 is an (r × r) matrix, and A2 is an ((n - r) × (n - r)) matrix.

4.6.17. Exercise. Prove Theorem 4.6.16.
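The contrast between Examples 4.6.14 and 4.6.15 can be made concrete by computing geometric multiplicities (the dimension of the null space of A - λI) and comparing them with the algebraic multiplicities. A sketch assuming NumPy is available; `eig_null_dim` is a hypothetical helper written here for illustration, not a library routine:

```python
import numpy as np

def eig_null_dim(A, lam, tol=1e-9):
    """Geometric multiplicity: dim of the null space of (A - lam*I),
    counted as the number of negligible singular values."""
    n = A.shape[0]
    s = np.linalg.svd(A - lam * np.eye(n), compute_uv=False)
    return int(np.sum(s < tol))

# Example 4.6.14: eigenvalue 1 has algebraic multiplicity 2 and two
# independent eigenvectors, so the matrix is diagonalizable.
A = np.array([[1.0, 3.0, -2.0],
              [0.0, 4.0, -2.0],
              [0.0, 3.0, -1.0]])

# Example 4.6.15: eigenvalue 2 has algebraic multiplicity 2 but only
# one independent eigenvector, so the matrix is not diagonalizable.
B = np.array([[2.0, 1.0, -2.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 1.0]])

print(eig_null_dim(A, 1.0), eig_null_dim(A, 2.0))  # 2 1
print(eig_null_dim(B, 2.0), eig_null_dim(B, 1.0))  # 1 1
```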
We can generalize the preceding result. Suppose that X is the direct sum of linear subspaces X1, ..., Xp that are invariant under A ∈ L(X, X). We can define linear transformations Ai ∈ L(Xi, Xi), i = 1, ..., p, by Ai x = Ax for x ∈ Xi. That is to say, Ai is the restriction of A to Xi. We now can find for each Ai a matrix representation Ai, which leads us to the following result.

4.6.18. Theorem. Let X be a finite-dimensional vector space, and let A ∈ L(X, X). If X is the direct sum of p linear subspaces, X1, ..., Xp, which are invariant under A, then there exists a basis for X such that the matrix representation for A is in the block diagonal form given by

    A = [A1               0]
        [    A2            ]
        [        ...       ]
        [0              Ap].

Moreover, Ai is a matrix representation of Ai, the restriction of A to Xi, i = 1, ..., p. Also, det(A) = det(A1) det(A2) ... det(Ap).

4.6.19. Exercise. Prove Theorem 4.6.18.
From the preceding it is clear that, in order to carry out the block diagonalization of a matrix A, we need to find an appropriate set of invariant subspaces of X and, furthermore, to find a simple matrix representation on each of these subspaces.

4.6.20. Example. Let X be an n-dimensional vector space. If A ∈ L(X, X) has n distinct eigenvalues, λ1, ..., λn, and if we let

    𝔑j = {x : (A - λj I)x = 0},  j = 1, ..., n,

then 𝔑j is an invariant linear subspace under A and

    X = 𝔑1 ⊕ ... ⊕ 𝔑n.

For any x ∈ 𝔑j, we have Ax = λj x and hence Aj x = λj x for x ∈ 𝔑j. A basis for 𝔑j is any non-zero xj ∈ 𝔑j. Thus, with respect to this basis, Aj is represented by the matrix λj (in this case, simply a scalar). With respect to a basis of n linearly independent eigenvectors, {x1, ..., xn}, A is represented by Eq. (4.6.6). ∎

In addition to the diagonal form and the block diagonal form, there are many other useful forms for matrices to represent linear transformations on finite-dimensional vector spaces. One of these canonical forms involves triangular matrices, which we consider in the last result of the present section. We say that an (n × n) matrix is a triangular matrix if it either has the form

    [a11  a12  a13  ...  a1n]
    [ 0   a22  a23  ...  a2n]
    [          ...          ]
    [ 0    0    0   ...  ann]    (4.6.21)

or the form

    [a11   0    0   ...   0 ]
    [a21  a22   0   ...   0 ]
    [          ...          ]
    [an1  an2  an3  ...  ann].    (4.6.22)

In the case of Eq. (4.6.21) we speak of an upper triangular matrix, whereas in the case of Eq. (4.6.22) we say the matrix is in lower triangular form.
4.6.23. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Then there exists a basis for X such that A is represented by an upper triangular matrix.

Proof. We will show that if A is a matrix of A, then A is similar to an upper triangular matrix A'. Our proof is by induction on n. If n = 1, then the assertion is clearly true. Now assume that for n = k, and C any (k × k) matrix, there exists a non-singular matrix Q such that C' = Q^{-1} C Q is an upper triangular matrix. We now must show the validity of the assertion for n = k + 1. Let X be a (k + 1)-dimensional vector space over C. Let λ1 be an eigenvalue of A, and let f1 be a corresponding eigenvector. Let {f2, ..., f_{k+1}} be any set of vectors in X such that {f1, ..., f_{k+1}} is a basis for X. Let B be the matrix of A with respect to the basis {f1, ..., f_{k+1}}. Since Af1 = λ1 f1, B must be of the form

    B = [λ1  b12        ...  b1,k+1     ]
        [ 0  b22        ...  b2,k+1     ]
        [           ...                 ]
        [ 0  b_{k+1,2}  ...  b_{k+1,k+1}].

Now let C be the (k × k) matrix

    C = [b22        ...  b2,k+1     ]
        [        ...                ]
        [b_{k+1,2}  ...  b_{k+1,k+1}].

By our induction hypothesis, there exists a non-singular matrix Q such that C' = Q^{-1} C Q, where C' is an upper triangular matrix. Now let

    P = [1  0 ... 0]
        [0         ]
        [.    Q    ]
        [0         ].

By direct computation we have

    P^{-1} = [1  0  ...  0]
             [0           ]
             [.   Q^{-1}  ]
             [0           ]

and

    P^{-1} B P = [λ1  *  ...  *]   [λ1  *  ...  *]
                 [ 0           ]   [ 0           ]
                 [ .  Q^{-1}CQ ] = [ .     C'    ],
                 [ 0           ]   [ 0           ]

where the *'s denote elements which may be non-zero. Letting A' = P^{-1} B P, it follows that A' is upper triangular and is similar to B. Hence, any ((k + 1) × (k + 1)) matrix which represents A ∈ L(X, X) is similar to an upper triangular matrix, by Theorem 4.3.19. This completes the proof of the theorem. ∎

Note that if A is in the triangular form of either Eq. (4.6.21) or (4.6.22), then

    det(A - λI) = (a11 - λ)(a22 - λ) ... (ann - λ).

In this case the diagonal elements of A are the eigenvalues of A.
4.7. MINIMAL POLYNOMIALS, NILPOTENT OPERATORS, AND THE JORDAN CANONICAL FORM

In the present section we develop the Jordan canonical form of a matrix. To do so, we need to introduce the concepts of minimal polynomial and nilpotent operator and to study some of the properties of such polynomials and operators. Unless otherwise specified, X denotes an n-dimensional vector space over a field F throughout the present section.

A. Minimal Polynomials
For purposes of motivation, consider the matrix

    A = [1  3  -2]
        [0  4  -2]
        [0  3  -1].

The characteristic polynomial of A is

    p(λ) = (1 - λ)^2 (2 - λ),

and we know from the Cayley-Hamilton theorem that

    p(A) = 0.    (4.7.1)

Now let us consider the polynomial

    m(λ) = (1 - λ)(2 - λ) = 2 - 3λ + λ^2.

Then

    m(A) = 2I - 3A + A^2 = 0.    (4.7.2)
Thus, matrix A satisfies Eq. (4.7.2), which is of lower degree than Eq. (4.7.1), the characteristic equation of A.

Before stating our first result, we recall that an nth-order polynomial in λ is said to be monic if the coefficient of λ^n is unity (see Definition 2.3.4).

4.7.3. Theorem. Let A be an (n × n) matrix. Then there exists a unique polynomial m(λ) such that

(i) m(A) = 0;
(ii) m(λ) is monic; and
(iii) if m'(λ) is any other polynomial such that m'(A) = 0, then the degree of m(λ) is less than or equal to the degree of m'(λ) (i.e., m(λ) is of the lowest degree such that m(A) = 0).

Proof. We know that a polynomial, p(λ), exists such that p(A) = 0, namely, the characteristic polynomial. Furthermore, the degree of p(λ) is n. Thus, there exists a polynomial, say f(λ), of degree m ≤ n such that f(A) = 0. Let us choose m to be the lowest degree for which f(A) = 0. Since f(λ) is of degree m, we may divide f(λ) by the coefficient of λ^m, thus obtaining a monic polynomial, m(λ), such that m(A) = 0. To show that m(λ) is unique, suppose there is another monic polynomial m'(λ) of degree m such that m'(A) = 0. Then m(λ) - m'(λ) is a polynomial of degree less than m, and m(A) - m'(A) = 0; if m(λ) - m'(λ) were not the zero polynomial, this would contradict our assumption that m(λ) is a polynomial of lowest degree such that m(A) = 0. This completes the proof. ∎

The preceding result gives rise to the notion of minimal polynomial.

4.7.4. Definition. The polynomial m(λ) defined in Theorem 4.7.3 is called the minimal polynomial of A. Other names for minimal polynomial are minimum polynomial and reduced characteristic function.

In the following we will develop an explicit form for the minimal polynomial of A, which makes it possible to determine it systematically, rather than by trial and error. In the remainder of this section we let A denote an (n × n) matrix, we let p(λ) denote the characteristic polynomial of A, and we let m(λ) denote the minimal polynomial of A.

4.7.5. Theorem. Let f(λ) be any polynomial such that f(A) = 0. Then m(λ) divides f(λ).
Proof. Let ν denote the degree of m(λ). Then there exist polynomials q(λ) and r(λ) such that (see Theorem 2.3.9)

    f(λ) = q(λ)m(λ) + r(λ),

where deg[r(λ)] < ν or r(λ) = 0. Since f(A) = 0, we have

    0 = q(A)m(A) + r(A),

and hence r(A) = 0. This means r(λ) = 0, for otherwise we would have a contradiction to the fact that m(λ) is the minimal polynomial of A. Hence, f(λ) = q(λ)m(λ) and m(λ) divides f(λ). ∎

4.7.6. Corollary. The minimal polynomial of A, m(λ), divides the characteristic polynomial of A, p(λ).

4.7.7. Exercise. Prove Corollary 4.7.6.
Exercise.
We now prove:
.4 7.8.
Deorem. The polynomial pel) divides m [ (l)]".
Proof. We want to show that m [ (l)]" = p(l)q ( l) Let m(,t) be of degree 11 and be given by
for some polynomial
q(,t).
mel) =
l'
+
+ ... +
P.l· - '
P•.
Let us now define the matrices Bo, B., ... , B._. as Bo = I, B. = A + P.I, B1 = Al + P.A + P1I, ... , B._. = A· - t + PIA,- l + ... + P._ . I. Then Bo = I, B. - ABo = PtI, B1 - AB. = P1I, ... , B.- t - AB.- 1 = P.- t I , and - A B' _ I = P,I - [A' + PtA· - t + ... P,I)
= P,I - meA) = P.I.
Now let Then (A -
lI)B(l)
+
l' B o +
=
A,-tB 1 +
=
= A'B o + A·1- B [ t l' I
+
PtA,- I I
-
... + ABo]
+ ... +
AB'I_
+
-
A,-l[Bl
P,- t ll
+
o+
[l'-'AB
+
+ ... +
-
AB t]
A[B,-t
-
P,I =
l· - l AB.
+ .,.
AB,_t]
AB,_l]
m(l)I.
-
AB,_t
.4 7.
MinimolPolynomials
181
Taking the determinant of both sides of this equation we have [det (A -
).1)] • d[ et B().») =
But det B().) is a polynomial in )., say q().). p().)q().) = m [ ().)].•
m [ ()')» ft.
Thus, we have proved that
The next result establishes the form of the minimal polynomial.

4.7.9. Theorem. Let p(λ) be given by Eq. (4.5.24); i.e.,

    p(λ) = (λ1 - λ)^m1 (λ2 - λ)^m2 ... (λp - λ)^mp,

where m1, ..., mp are the algebraic multiplicities of the distinct eigenvalues λ1, ..., λp of A, respectively. Then

    m(λ) = (λ - λ1)^ν1 (λ - λ2)^ν2 ... (λ - λp)^νp,    (4.7.10)

where 1 ≤ νi ≤ mi for i = 1, ..., p.

4.7.11. Exercise. Prove Theorem 4.7.9. (Hint: Assume that m(λ) = (λ - μ1)^ν1 ... (λ - μs)^νs, and use Corollary 4.7.6 and Theorem 4.7.8.)

The only unknowns left to determine in the minimal polynomial of A are ν1, ..., νp in Eq. (4.7.10). These can be determined in several ways. Our next result is an immediate consequence of Theorem 4.3.27.

4.7.12. Theorem. Let A' be similar to A, and let m'(λ) be the minimal polynomial of A'. Then m'(λ) = m(λ).

This result justifies the following definition.

4.7.13. Definition. Let A ∈ L(X, X). The minimal polynomial of A is the minimal polynomial of any matrix A which represents A.

In order to develop the Jordan canonical form (for linear transformations with repeated eigenvalues), we need to establish several additional preliminary results which are important in their own right.
4.7.14. Theorem. Let A ∈ L(X, X), and let f(λ) be any polynomial in λ. Let 𝔑_f = {x : f(A)x = 0}. Then 𝔑_f is an invariant linear subspace of X under A.

Proof. The proof that 𝔑_f is a linear subspace of X is straightforward and is left as an exercise. To show that 𝔑_f is invariant under A, let x ∈ 𝔑_f, so that f(A)x = 0. We want to show that Ax ∈ 𝔑_f. Since A commutes with every polynomial in A, we have

    f(A)(Ax) = A f(A)x = A0 = 0,

which completes the proof. ∎

Before proceeding further, we establish some additional notation. Let λ1, ..., λp be distinct eigenvalues of A ∈ L(X, X). For j = 1, ..., p and for any positive integer q, let

    𝔑_j^q = {x : (A - λj I)^q x = 0}.    (4.7.15)

Note that this notation is consistent with that used in Example 4.6.20 if we define 𝔑_j^1 = 𝔑_j. Note also that, in view of Theorem 4.7.14, 𝔑_j^q is an invariant linear subspace of X under A. We will need the following result concerning the restriction of a linear transformation.
4.7.16. Theorem. Let A ∈ L(X, X). Let X1 and X2 be linear subspaces of X such that X = X1 ⊕ X2, and let A1 be the restriction of A to X1. Let f(λ) be any polynomial in λ. If A is reduced by X1 and X2, then for all x1 ∈ X1, f(A1)x1 = f(A)x1.

4.7.17. Exercise. Prove Theorem 4.7.16.
L(X, X). Let g(l) h(A) = I x E ~i'.
(i) X = ~'i' EEl ml; and (ii) (l - A I)" is the minimal polynomial for AI'
Proof By Theorem .4 7.14, ml and ~i' are invariant linear subspaces under A. Since g(l) and h(l) are relatively prime, there exist polynomials (q A) and r(l) such that (see Exercise 2.3.15) q ( l)g(l)
+
r(l)h(l)
=
1.
.4 7.
eH nce,
183
Minimal Polynomials
for the linear transformation A we have
+
(q A)g(A)
Thus, for x
E
,X
we have x
Now since h(A)q(A)g(A)x
=
=
=
r(A)h(A)
(q A)g(A)x
(q A)g(A)h(A)x
+
I.
(4.7.19)
r(A)h(A)x.
=
(q A)m(A)x
=
(q A)Ox
=
0,
it follows thatq(A)g(A)x E ml. We can similarly show that r(A)h(A)x Emi' . Thus, for every x E X we have x = XI + x 2 , where IX E mi' and X z E ml. Let us now show that this representation of x is unique. Let X = IX X 2 = x; + x~, where IX ' ;x E ml ' and 2X ' ~x E ml. Then
+
=
r(A)h(A)x
r(A)h(A)x
;x
Applying Eq. (4.7.19) to IX and
=
XI
and
r(A)h(A)x;. =
we get r(A)h(A)x l
=
;X
l
r(A)h(A)x;.
F r om this we conclude that XI = ;x . Similarly. we can show that X 2 = x~. Therefore. X = mi' EB ml. To prove the second part of the theorem, let A I be the restriction of A to mi' and let A2 be the restriction of A to ml. eL t ml(l) and m2(1) be the minimal polynomials for AI and A2• respectively. Since g(A I) = 0 and h(A 1 ) = O. it follows that ml(l) divides g(l) and m1 divides hell. by Theorem 4.7.5. eH nce, we can write
o.)
ml(l) =
and
m2(A)
=
(A -
ll)kt
(1 -
A2)lo' ... (1 -
A,)lo,.
where 0 < kl :::;;:vl for i = I• . .. • p. Now let fell = ml(A)mrlA). Then f(A) = m l(A)m 2(A). eL t X E X with X = IX + 2X ' where IX E mi' and 2X E ml. Then f(A)x
=
+
m l(A)m 2(A)x 2 = m 2(A)m.(A)x l O. But this implies that mel) dividesf(l) and 0
m l (A)m 2(A)x
l
Therefore,f(A) = i = I, ... ,po We thus conclude that kl proof of the theorem. _
=
VI
for i
=
<
=
O. VI
<
kl'
I, ...• P. which completes the
We are now in a position to prove the following important result, called the primary decomposition theorem. .4 7.20. Theorem. eL t X be an n-dimensional vector space over C. let AI' ...• A, be the distinct eigenvalues of A E L ( X . X ) . let the characteristic
184
Chapter 4 I m F ite-Dimensional
Vector Spaces and Matrices
p(A.) =
A.)-,'
(4.7.21)
A.,)".
(4.7.22)
polynomial of A be (A.I -
A.)"" ... (A., -
and let the minimal polynomial of A be m(A.) =
eL t
,x =
Then i=
(i) "X (ii) X =
:x {
(A. -
(A -
A. I ) " . • . (A. -
OJ,
A.,I)"x =
i=
I, ... ,po
I, ... ,p are invariant linear subspaces of X under A;
Et> •.. Et>
Xl
X,;
(iii) (A. - A.,)" is the minimal polynomial of A" where A, is the restriction of A to X,; and, (iv) dim ,X = m" i = I, ... ,po
Proof The proofs of parts (i), (ii), and (iii) follow from the preceding theorem by a simple induction argument and are left as an exercise. To prove the last part ofthe theorem, we first show that the only eigenvalue of A, E (L "X ,X ) is A." i = I, ... ,po eL t f) E "X v*" 0, and consider (A, - A.l)v = O. From part (iii) it follows that 0= (A, - A.,ly"V = (A, - 11I),·1- (A , - A.I/)v = (A, - 1,I),·I- (A. - A.,)v = (A. - A.,)(A, - A.,I),.- l (A, - A.,l)v (A. - l ,)l(A , =
A.,I),,-l v =
...
= (A. - A.,)"v.
From this we conclude that 1 = 1 " We can now find a matrix representation of A in the form given in Theorem .4 6.18. uF rthermore, from this theorem it follows that p(A.) =
det (A -
A./) =
D; det (A, -
A./).
Now since the only eigenvalue of A, is 1 the determinant of A, " be of the form det (A, - A.I) = (A., - A.)'t, where ,q =
dim ,X . Since p(A.) is given by Eq. (4.7.21), we must have (A. I -
A.)IIII .• •
(A., -
A.)III, =
(A. l -
A.)" ..• (A., -
from which we conclude that m, = "q Thus, dim ,X This concludes the proof of the theorem. _ .4 7.23.
A./ must
Exercise.
Prove parts (i)-i{ ii)
=
A.)t"
m i= "
1, ... ,po
of Theorem .4 7.20.
The preceding result shows that we can always represent A E L(X, X) by a matrix in block diagonal form, where the number of diagonal blocks
.4 7.
Nilpotent Operators
185
(in the matrix A of Theorem .4 6.18) is equal to the number of distinct eigenvalues of A. We will next find a convenient representation for each of the diagonal submatrices A" It may turn out that one or more of the submatrices A, will be diagonal. Our next result tells us specifically when A E L(X, X ) is representable by a diagonal matrix. .4 7.24. Theorem. Let X be an n-dimensional vector space over C, and X ) . eL t 1..... , 1" p < n, be the distinct eigenvalues of A. let A E L ( X , Then there exists a basis for X such that the matrix A of A with respect to this basis is diagonal if and only if the minimal polynomial for A is of the form mel) = (1 - A1 )(1 - Az ) • . • (A - A,). .4 7.25.
Prove Theorem .4 7.24.
Exercise.
.4 7.26. Exercise. .4 6.14 and .4 6.15.
Apply the above theorem to the matrices in Examples
B. Nilpotent Operators eL t us now proceed to find a representation for each of the A, E L ( X ,X ) " of in Theorem .4 7.20 so that the block diagonal matrix representation A E L(X, X ) (see Theorem .4 6.18) is as simple as possible. To accomplish this, we first need to define and examine so-called nilpotent operators. .4 7.27. DefiDition. eL t N E L ( X , X). Then N is said to be nilpotent if there exists an integer q > 0 such that N" = O. A nilpotent operator is said to be of index q if N" = 0 but N,,- I "* O. Recall now that Theorem .4 7.20 enables us to write X = X I EB X z EEl • X .• Furthermore, the linear transformation (A, - A,l) is nilpotent on ~. Ifwe let N, = A, - A,I, then A, = All + N,. Now 1,1 is clearly represented by a diagonal matrix. oH wever, the transformation N, forces the matrix representation of A, to be in general non-diagonal. So our next task is to seek a simple representation of the nilpotent operator N,. In the next few results, which are concerned with properties of nilpotent operators, we drop for convenience the subscript i.
EB
.4 7.28. T ' heorem. eL t N E L ( V, V), where V is an m-dimensional vector space. If N is a nilpotent linear transformation of index q and if x . E V is such that N,- l x 0, then the vectors x , Nx , ... , N,,- I x in V are linearly independent.
*"
Chapter 4 I iF nite-Dimensional
186
Vector Spaces and Matrices
Proof. We first note that if Nf- I X *- 0, then NJx *- 0 for j = q - I. Our proof is now by contradiction. Suppose that
~
1= 0
~
= -
= NJ+I[~
NJ x
l{ ,1 Nix l{ ,J
I=I+ J
Thus,
o. *- o. Then we can write
l{ ,INI X =
L e tj be the smallest integer such that l{ ,J NJx
(- ! t )NI- J - I (l,J
I=I+ J
*- O. =
X ]
NJ+l
y,
where y is defined in an obvious way. Now we can write
=
Nf- I X
=
Nf- J - I NJ x
Nf- J - I NJ + l
y
=
Nfy
= O.
We thus have arrived at a contradiction, which proves our result. Next, let us examine mations.
0, I, ... ,
the matrix
_
representation of nilpotent transfor-
.4 7.29. Theorem. Let V be a q-dimensional vector space, and let N E L ( V, V) be nilpotent of index .q Let mo E V be such that Nf-1m o *- o. Then the matrix N of N with respect to the basis { N f- I m o, NQ-2 mo , . .. ,mol in V is given by 0100 00 0010 00 N= . (4.7.30) 0000 01 0000 00
Proof.
By the previous theorem we know that {Nf-Im o,' .. ,mol is a linearly independent set. By hypothesis, there are q vectors in the set, and thus '{ N f- I m o, ... ,mol forms a basis for V. Let el = Nqm o for i = I, ... ,q . Then O, i= I Ne l
Hence,
Ne l
+
= 0 • et
Ne 2
=
Ne f
= 0 •e
+
I • et
t
+
=
{ el->J
.
0 • e2 + 0 • e2 + 0 • e2 +
2, ... ,q .
1=
+ ... +
+
0 . ef -
t
0 • ef -
1
I •e
t
f-
+
+
+
0 • ef 0 • eq
0 •e
f•
F r om Eq. (4.2.2) and Definition .4 2.7, it follows that the representation of Nis that given by Eq. (4.7.30). This completes the proofofthetheorem. -
.4 7.
187
Nilpotent Operators
The above theorem establishes the matrix representation of a nilpotent linear transformation of index q on a q-dimensional vector space. We will next determine the representation of a nilpotent operator of index v on a vector space of dimension m, where v < m. The following lemma shows that we can dismiss the case v > m. .4 7.31. eL mma. V = m. Then v <
Let N m.
E
v, where dim
L ( V, V) be nilpotent of index
*
Proof Assume x E V, N· x = 0, N- - I X 0, and v > m. Then, by Theorem 4.7.28, the vectors x , Nx , ... , N- - I x are linearly independent, which contradicts the fact that dim V = m. •
To prove the next theorem, we require the following result. .4 7.32. eL mma. eL t V be an m-dimensional vector space, let N V), let v be any positive integer, and let
=
= OJ, dim WI = = {x: N2X = OJ, dim W 2 =
WI
W2
{x:
W.
=
{x:
Nx
N' x
= OJ, dim
W.
=
E
L ( V,
II,
12 , I•.
Also, for any i such that I < i < v, let { e l' ... , em} be a basis for V such that e{ lt ... ,ed is a basis for WI' Then (i) WI C w2 C • . • C W.; and (ii) (e u " " e"_,, Ne,.+1> ... ,Ne, .. ,} is a linearly independent set of vectors in W,. To prove the first part, let x E WI for any i < v. Then NiX = O. eH nce, NI+ I X = 0, which implies x E W1+ 1 ' To prove the second part, let r = II- I and let t = 11+ I - II' We note that if x E WI+ I , then NI(Nx ) = 0, and so Nx E WI' This implies that Ne J E WI for j = II + I, ... ,11+1' This means that the set of vectors {el, ... ,e" NeH> ! ... , Ne"..} is in WI' We show that this set is linearly independent by contradiction. Assume there are scalars (XI" • ,(x , and PI' ... , PI> not all ez ro, such that
Proof
(Xle l
Since e{ l , • • be non-ez ro. eH nce,
+ ... +
(X,e,
+
PINe,,+1
+ ... +
p,Ne".,
= O.
,e,} is a linearly independent set, at least one of the PI must Rearranging the last equation we have
Chapter 4 I iF nite-Dimensional
188
Thus,
Vector Spaces and Matrices
+ ... +
fl,e,• ..> = 0, W,. If fl.e,,+! + ...
N' ( fl. e,,+.
and (fl.e,,+. + ... + fl,e".,) E + fl,e" • 1= = 0, it can be written as a linear combination of e., ... , e", which contradicts the fact that e{ ., . .. ,e".,} is a linearly independent set. If fl.e,,+. + ... + fl,e,•• , = 0, we contradict the fact that e { ., ... , e".,} is a linearly independent set. eH nce, weconcludethatlZ, = Ofori = I, ... , r andfl, = Ofori = I, ... , t. This completes the proof of the theorem. _ We are now in a position to consider the general representation of a nilpotent operator on a finite-dimensional vector space. .4 7.33. let N
Theorem. eL t V be an m-dimensional vector space over C, and L ( V, V) be nilpotent of index v. Let W. = {x: Nx = O}, ... , W. = {x: N· x = OJ, and let I, = dim W" i = I, ... ,v. Then there exists a basis for V such that the matrix N of N is of block diagonal form, E
N=:[ ' where
N,=
o
:],
(4.7.34)
N,
0100 0010
00 00
0000 0000
01 00
.
(4.7.35)
i = 1, ... ,r, where r = I., N, is a (k, x k,) matrix, I :::;; k,:::;; determined in the following way: there are
I. -
I._I
2/, -
1'1+
2/. -
11
-
(v 1,-.
(i
lI,
and k, is
X v) matrices,
x i) matrices, i = 2, ... ,v -
(I x
I, and
I) matrices.
The basis for V consists of strings of vectors of the form Proof By eL mma .4 7.32, W. c W1 C • • c W•. Let e{ ., ... , e.} be a basis for V such that {e., . .. ,e,.l is a basis for W,. We see that W. = V. Since N is nilpotent of index v, W._ 1 1= = W. and 1.-. < I•. We now proceed to select a new basis for V which yields the desired result. We find it convenient to use double subscripting of vectors. L e th .• = e,•.• .+ ,
.4 7.
189
Nilpotent Operators
•• ,/(/y- I v_ . ),y = e,y and let It. .- 1 = Nlt.., ... ,/(/.- 1 .- . ),.- 1 = NI(/._I .• • )•• , By Lemma .4 7.32, it follows that {el>'" ,e,._.,fl .• - I ,' " ,I<,.-, .• ,)•• - I } is a linearly independent subset of W._I> which mayor may not be a basis for W._ I' If it is not, we adjoin additional elements from W._> \ denoted by 1<,.-, .• • 1+ 1 .• - 1 "" ,/(/•.• -Iv • )•• >\- so as to form a basis for W._ I • Now let 11 .• 2- . = NII • - I ,I2.•• 2- . = NI2..• - I ' · · · ,1<, .• • ,- .• • ),.-2. = NI<, .• • ,_ .• • ).• _ I · By Lemma .4 7.32 it follows, as before, that e{ >\ ... , e,•.• ,/I.• 2- .,. .. ,1<, .• • I- .• • ).• 2- .} is a linearly independent set in W.-2.' If this set is not a basis we adjoin vectors from W.-2. so that we do have a basis. We denote the vectors that we adjoin by 1<".,-1 .• • 1+ 1 .• - 2 ., • • ,1<,. .• - 1 .,.) • 2- .' We continue in this manner until we have formed a basis for V. We express this basis in the manner indicated in Figure C.
Basis for f '." - -,
f(/.-I..-,I.
f"._" ,-
V
f(l.- I • ,),V- l , - - .
f(l._ , - / .-
2 ),v- l
f,,2' - - '
f(l.- I • ,I,2,- - - - - - - - - ,
f(/2- 1 ,),2
f,." ,-
f(l._ I ._ , ).
f(/2- 1,), 1. - - ,
.4 7.36.
,,- - - - - - - - - ,
f/"I'
F i gtn C. Basis for V.
The desired result follows now, for we have
NI; = 1./
eH nce, if we let XI = bottom to top, is
/{ ,./-0,>\
j>
j =
I I.
II.., we see that the first column in Figure
C reading
We see that each column of Figure C determines a string consisting of k, entries, where k, = v for i = I, ... , (I. - /._1)' Note that (/. - 1.-1) > 0, so there is at least one string. In general, the number of strings withj entries is (// - //-1) - (/J + I - //) = 2/} - I} + I - I} - I for j = 2, ... , v - I. Also, there are /1 - (12. - /1) = 2/ 1 - /" vectors, or strings with one entry. Finally, to show that the number of entries, NI, in N is /1' we see that
Chapter 4 I iF nite-Dimensional
190
Vector Spaces and Matrices
- I. - 1.- 2 ) + there are a total of(/. - I.- I ) + (2/'1+ (2/ 1 - 12 ) = II columns in the table of Figure C. This completes the proof of the theorem. _
... +
(2/ 2 -
II -
13 )
The reader should study Figure C to obtain an appreciation of the structure of the basis for the space V.
C. The oJ rdan
Canonical oF rm
We are finally now in a position to state and prove the result which establishes the Jordan canonical form of matrices. .4 7.37.
A E L(X,
Deorem. eL t X be an n-dimensional vector space over C, and let X ) . eL t the characteristic polynomial of A be
p(A) =
A)"" ... (A, -
(AI -
A)m.,
and let the minimal polynomial of A be m(A)
=
(A -
AI)" ... (A -
A,)",
where AI' ... ,A, are the distinct eigenvalues of A. eL t ,X
Then (i) (ii) (iii) (iv)
Xl>"" X
X,
=
x{
E
X:
(A -
A,I)"x
= OJ.
are invariant subspaces of X under A;
= IX EB ..• EB
X,;
dim ,X = m i = 1, ... ,p; and " there exists a basis for X such that the matrix A of A with respect to this basis is of the form AI A
where A, is an (m,
= X
[
0 ... 0]
~ ... ~.2 o
0
•
: : : •
~.
'
(4.7.38)
... A,
m,) matrix of the form
A, = 1,1 + N,
(4.7.39)
and where N, is the matrix of the nilpotent operator (A, of index V, on ,X given by Eq. (4.7.34) and Eq. (4.7.35).
liT)
Proof. Parts (i)-(iii) are restatements of the primary decomposition theorem (Theorem .4 7.20). From this theorem we also know that (1 - 1 ,)" is the minimal polynomial of A" the restriction of A to "X eH nce, if we let N, = A, - l,I, then N, is a nilpotent operator of index V, on "X We are thus able to represent N, as shown in Eq. (4.7.35). The completes the proof of the theorem. _
.4 7.
oJ rdan Canonical oF rm
191
A little extra work shows that the representation of A E L ( X . X ) by a matrix A of the form given in Eqs. (4.7.38) and (4.7.39) is unique. except for the order in which the block diagonals AI• . ..• Ap appear in A. .4 7.40. Definition. The matrix A of A E L ( X . X ) given by Eqs. (4.7.38) and (4.7.39) is called the Jordan canonical form of A. We conclude the present section with an example.
Example. Let X = R 7 • and let u{ I • • • u7 } be the natural basis for .4 7.41. X (see Example .4 I.15). L e t A E L ( X . X ) be represented by the matrix 3 0 o o 0 2 -1 2 1 -1 -6 0 2 -2 0 -1 1 3 0 o 0 o 0 1 o0 o 0 o 0 o 1 0 -I -I o 1 2 4 1 -I
A=
-1
0
o
1
1
1
o
0
with respect to u{ I , • . . • u7 } . L e t us find the matrix At which represents A in the J o rdan canonical form. We first find that the characteristic polynomial of A is Pel)
=
1)7.
(I -
This implies that 1 1 = I is the only distinct eigenvalue of A. Its algebraic multiplicity is m. = 7. In order to find the minimal polynomial of A. let N
=
),.1,
A-
where I is the identity operator in L ( X , respect to the natural basis in X is
o
-2
N= A - I =
o
o
2
I
-2
o
o o o
o
-1
-I
X).
The representation for N with
-I
1
0
o
1 -I
0 -6
-I
I
-I
1
1 0
0
o o
0
0
1
2
0
3 0o0 0
3 0
o o
0 0 4 0
Chapter 4 I iF nite-Dimensional
192
Vector Spaces and Matrices
We assume the minimal polynomial is of the form m(l) = (l - I» ' and proceed to find the smallest VI such that m(A - I ) = m(N) = O. We first obtain
o o
NZ
=
Next, we get that
-1
o
0 I
o o o
0 0 0
0 -I
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 3 0 o0 0 0 -3 0 0 3 0 o0 0 0 o0 0 o0
N3 = 0 ,
3. eH nce, N is a nilpotent operator of index 3. We see that We will now apply Theorem .4 7.33 to obtain a representation for N in this space. sU ing the notation of Theorem .4 7.33, we let WI = :x { Nx = OJ, Wz = :x { NZx = OJ, and W, = :x { N 3 x = 0). We see that N has three linearly independent rows. This means that the rank of N is 3, and so dim (WI) = II = .4 Similarly, the rank of NZ is I, and so dim (Wz ) = Iz = 6. Clearly, dim (W3) = 13 = 7. We can conclude that N will have a representation N' ofthe form in Eq. (4.7.34) with r = .4 Each of the N; will be of the form in Eq. (4.7.35). There will be 13 - Iz = 1 (3 x 3) matrix. There will be 2/ z - 13 - II = 1 (2 X 2) matrix, and 2/1 - Iz = 2 (l x I) matrices. eH nce, there is a basis for X such that N may be represented by the matrix
and so
X =
VI =
5't~.
o
N' =
1 0iO 0 0 0 001:0000 I o 0 0:0 0 0 0 r- -j·000:01:00 I I o 0 010 010 0 ·- - - r- · o 0 0 0 0:0:0 o 0 0 0 0 0:0 1
,-_
..
The corresponding basis will consist of strings of vectors of the form
NZx..
Nx . .
x..
Nx z , X z , x 3, x ... We will represent the vectors x .. X z , "x and x .. by x .. x z , "x and x .., their coordinate representations, respectively, with respect to the natural basis u{ .. ... , u,} in .X We begin by choosing XI E W3 such that X I 1= Wz ; i.e., we find an X I such that N 3x I = 0 but NZx I :# O. The vector x f = (0,
.4 7.
193
oJ rdan Canonical oF rm
1,0,0,0,0,0) will do. We see that (Nxl)T = (0,0, 1,0,0,0, - I ) and (N2IX )T = (- 1 ,0, I, - 1 ,0,0,0). Hence, NX I E Wz but NX I ~ WI and NZx l E WI' We see there will be only one string of length three, and so we next choose zX E Wz such that X z ~ WI' Also, the pair N { x l , }zx must be linearly independent. The vector x I = (1,0,0,0,0,0,0) will do. Now (NxZ)T = (- 2 ,0,2, - 2 ,0,0, - I ), and NX 2 E WI' We complete the basis { Zx l , Nx z , for X by selecting two more vectors, X 3 , x , E W., such that N X 3t x , } are linearly independent. The vectors x I = (0, 0, - I , - 2, I, 0, 0) and x r = (1, 3, I, 0, 0, I, 0) will suffice. It follows that the matrix P =
N [ xz
l,
Nx l , X I '
Nx z , X z , x
3,
x,]
is the matrix of the new basis with respect to the natural basis (see Exercise
.4 3.9).
The reader can readily show that N' = P - I NP,
where
-I
0 I P=
-I 0
0 0 -2 I 0 0 I 0 0 0 I 0 2 0 -I 0 0 -2 0 -2 0 0
0 0
I
0 0
p- l =
I I
2
4
3 I
0 0
0 I
0 0 0 0 0 I 0 -I 0 0 and
I
0 0
-2
2
0 0 I 3 -I 0 0 I 0 0 0 -3 0 I -I 0 0 -I -I -3 -2 -I I 0 0 0 -I 0 0 0 0 I 0 0 0 0 0 0 0 I 0
Finally the J o rdan canonical form for A is given by
A' =
N'
+
I.
(Recall that the matrix representation for [ i s the same for any basis in .X ) Thus,
Chapter 4 I iF nite-Dimensional
194
Vector Spaces and Matrices
1 1 0iO 0 0 0 I 011:0000 001:0000 t- - ·00 0: 1 1:0 0 0 0 0 :I 0 1 :I 0 0 o 0 0 0 I
A' =
I
I
o- o- T"i-l
'- -i-
0 0 0 0 0 OIl Again, the reader can show that A' =
P- I AP.
In general, it is more convenient as a check to show that PA'
= AP. •
.4 7.42. Exercise. eL t X = R' , and let u{ t , • • , u,} denote the natural X ) be represented by the matrix basis for .X Let A E L ( X ,
A=
05 -1 I 1 0 3 -I -1 1 0 0 4 0 0 0 1 1 -1 0 0 0 4 -1 0 0 0 0 1 3 0 0 0 0 1 3
Show that the Jordan canonical form of A is given by 4
1
04
A' =
0iO I
0
0
1:000 o 0 4:0 0 0 O-O - O - r- i- 4 l0
o
0_
I
I
I
1
1_ _
0 0:0 4 : 0 0 0 0 0 i2 ~
and find a basis for X for which A' represents A.
.4 8.
BILINEAR
N UF CTIONALS
AND CONGRUENCE
In the present section we consider the representation and some of the properties of bilinear functionals on real finite-dimensional vector spaces. (We will consider bilinear functionals defined on complex vector spaces in Chapter 6.)
.4 8.
Bilinear uF nctionals and Congruence
195
Throughout this section X is assumed to be an n-dimensional vector space over the field ofreal numbers. We recall that iffis a bilinear functional on a real vector space ,X then f: X x X - + Rand f(<x<
and
I
+
px 2,Y ) =
+
f(x , IY «
PY2) =
« f (x ,
for all ' J « .4 8.1. let
PIr
E
Rand ' J x
IY r
Definition. eL t e{
E
X,j
l , ... ,
2Y
YI'
Pf(x , 2Y ) E
.X
tl 1;1 «JPd(x
=
J, Ir"'i;,. PlrYIr)
Pf(x 2,y)
+
YI)
for all ,« pER and for all x, X I ' x 2, ,Y these properties we have, more generally,
f(ii x J «
+
« f (x l ,y)
=
As a consequence of
J, IY r)
I, ... , rand k =
I, ... ,s.
en) be a basis for the vector space X,
ftJ = f(e" eJ ), i,j =
and
I, ... , n.
The matrix F = lftJ ] is called the matrix of the bilinear functional f with respect to e{ l> ... , en)' Our first result provides us with the representation of bilinear functionals on finite-dimensional vector spaces. .4 8.2. Theorem. Let f be a bilinear functional on a vector space ,X and let fe I' . • . , e.l be a basis for .X eL t F be the matrix of the bilinear functional fwith respect to the basis fel> ... ,e.l. If X and yare arbitrary vectors in X and if x and yare their coordinate representation with respect to the basis fel , e2 , • • , e.l, then
Proof
+ ... +
•
•
~ :E fttl' l J ' 1'1= I- J We have x T = (el" .. ,en) and yT = ('II" .. ,'In)' Also, e.e. and Y = ' I le l + ... + I' .en· Therefore,
f(x , y) =
f(x , y) =
•
•
:E :E I- I I- J
T x yF
=
el' l J ( e l, eJ )
• •
= :E :E ftJel'lJ 1= I = J I =
X
(4.8.3) =
ele l
T x yF
which was to be shown. • Conversely, if we are given any (n X n) matrix F, we can use formula (4.8.3) to define the bilinear functional f whose matrix with respect to the given basis e{ I ' . . • , e.) is, in turn, F again. In general, it therefore follows that on finite-dimensional vector spaces, bilinear functionals correspond in a one-to-one fashion to matrices. The particular one-to-one correspondence depends on the particular basis chosen. Now recall that if X is a real vector space, thenfis said to be symmetric
Chapter 4 I iF nite-Dimensional
196 if f(x , y) = concept. 4.8.4.
f(y, x ) for all x , y
Definition.
skew symmetric if for all x, y
E
.X
E
Vector Spaces and Matrices
We also have the following related
A bilinear functional f on a vector space X is said to be
=
f(x , y)
.X
(4.8.5)
- f (y,x )
F o r symmetric and skew symmetric bilinear functionals we have the following result. 4.8.6. Theorem. L e t e{ t , • • ,eR } be a basis for ,X and let F be the matrix for a bilinear functionalfwith respect to e{ l>' .. ,e.}. Then
(i) f is symmetric if and only if F = T F ; (ii) f is skew symmetric if and only if F = - T F ; and (iii) for every bilinear functional f, there exists a unique symmetric bilinear functional f, and a unique skew symmetric bilinear functional f2 such that f= f , + f 2' We callft the symmetric part offandf2 the skew symmetric part off.
.4 8.7.
Exercise.
Prove Theorem .4 8;6.
The preceding result motivates the following definitions. 4.8.8.
Definition. An (n X
n) matrix F
is said to be
(i) symmetric if F = T F ; and (ii) skew symmetric if F = - F T . The next result is easily verified. .4 8.9. Theorem. Let f be a bilinear functional on ,X and let ft and f2 be the symmetric and skew symmetric parts off, respectively. Then and for all x , y 4.8.10. !(x)
E
f,(x , y) =
t[ f (x ,
y)
+
feY , )x ]
f2(X, y) =
t[ f (x ,
y) -
feY , )x ]
.X
Exercise.
Prove Theorem .4 8.9.
Now let us recall that the q u adratic form induced by f was defined as = f(x , x). On a real finite-dimensional vector space X we now have
.4 8.
197
Bilinear uF nctionals and CongrlUn! ce
F o r quadratic forms we have the following result. .4 8.11. Theorem. L e tJ a nd g be,bilinear functionals on .X The quadratic forms induced by J and g are equal if and only ifJ and g have the same symmetric part. In other words,! ( x ) = § ( x ) for all x E X if and only if ![J(x,
for all x , y
y)
+
=
J(Y,
)x ]
J(x,
x) -
! [ g (x ,
+
y)
g(y, )x ]
.X
E
Proof We note that J(x
-
y, x -
y) =
y) -
+
feY , y) -
f(x -
y, x -
y)].
g(y, y) -
g(x -
y, x -
y)]
F r om this it follows that ![J(x,
y)
+
Now if g(x , x ) = ![J(x,
y)
J(Y,
+
)x ]
! [ f (x , =
x ) , then
J(x, J(Y,
)x ]
= =
so that ![J(x,
x)
y) 1 +
! [ g (x ,
x)
! [ g (x ,
y)
(Y, )x ]
=
+ +
J(Y,
x)
+
J(x,
J(Y,
y).
g(y, x)],
! [ g (x ,
y)
+
(4.8.12)
g(y, )x .]
Conversely, assume that Eq. (4.8.12) holds for all ,x y E .X particular, if we let x = y, we have f(x , x ) = g(x , x ) for all x concludes our proof. _
Then, in .X This
E
F r om Theorem .4 8.11 the following useful result follows: when treating quadratic functionals, it suffices to work with symmetric bilinear functionals. We leave the proof of the next result as an exercise. .4 8.13. Theorem. A bilinear functional on a vector space X is skew symmetric if and only if J ( x , x ) = 0 for all x E .X .4 8.14.
Exercise.
Prove Theorem .4 8.13.
The next result enables us to introduce the concept of congruence. .4 8.15. Theorem. Let f be a bilinear functional on a vector space X, let e{ l> ... , e.l be a basis for ,X and let F be the matrix ofJ w ith respect to this be another basis whose matrix with respect to e{ l> basis. Let e{ ;, . .. ,e~l ... ,e.l is P. Then the matrix F ' of fwith respect to the basis e{ ;, .. . ,e.l
Chapter 4 I iF nite-Dimensional
198 is given by
P"F P . =
(4.8.16)
. p,/e,) = ..= /[ :/]p",p,/I(e", e,) = . . p",I",p,/.= 1(1" )~ .
Proof ~
F'
Vector Spaces and Matrices
Let F '
where, by definition,I:/
~ ~
Hence, F '
~ ~
"-I I-'
t:1
t:' t t:1
Then/(
=
t
"-I
P"F P .
p",e". _
We now have: 4.8.17. Definition. An (n x n) matrix F ' is said to be congruent to an (n X n) matrix F if there exists a non-singular matrix P such that
F' =
PTFP.
We express this congruence by writing F '
(4.8.18)
,." .F
Note that congruent matrices are also equivalent matrices. The next theorem shows that ,." in Definition 4.8.17 is reflexive, symmetric, and transitive, and as such it is an equivalence relation. 4.8.19.
Theorem.
Let A, B, and C be (n x
n) matrices. Then,
(i) A is congruent to A; (ii) if A is congruent to B, then B is congruent to A; and (iii) if A is congruent to Band B is congruent to C, then A is congruent toC. Proof Clearly A = ITAI, which proves the first part. To prove the second part, let A = PTBP, where P is non-singular. Then
B
= (PT)-IAP-I = (P-I)TA(P-I),
which proves the second part. Let A = PTBP and B = QTCQ, where P and Q are non-singular matrices. Then A = PTQTCQP = (QP)TC(QP), where QP is non-singular. This proves the third part. _ F o r practical reasons we are interested in determining the "nicest" (i.e., the simplest) matrix congruent to a given matrix, or what amounts to the same thing, the "nicest" (i.e., the most convenient) basis to use in expressing a given bilinear functional. If, in particular, we confine our interest to quadratic functionals, then it suffices, in view of Theorem .4 8.11, to consider symmetric bilinear functionals.
.4 8.
Bilinear uF nctionals and Congruence
199
We come now to the main result of this section, called Sylvester's theorem. .4 8.20. Theorem. L e t / be any symmetric bilinear functional on a real n-dimensional vector space .X Then there exists a basis {el' ... ,e.} of X such that the matrix of/with respect to this basis is of the form
+1
p}
0
+1
r
-1
n (4.8.21)
-1 0
0
o The integers rand p in the above matrix are uniquely determined by the bilinear form.
Proof. Since the proof of this theorem is somewhat long, it will be carried out in several steps. Step 1. We first show that there exists a basis v{ u ... , v.J of X such that /(v1, vJ) = 0 for i 1= = j. The proof of this step is by induction on the dimension of .X The statement is trivial if dim X = 1. Suppose that the assertion is true for dim X = n - l. Let / be a bilinear functional on ,X where dim X = n. L e t VI E X be such that /(v l , VI) 1= = O. There must be such a VI; otherwise, by Theorem .4 8.13, / would be skew symmetric, and we would conclude that/(x , y) = 0 for all x , y. Now let mz = x { E X : f(v l , x) = OJ. We now show that mz is a linear subspace of .X Let X u 2X E mz so that f(vl , X I ) = f(v u x 2) = O. Then f(v u X I + x 2 ) = f(vl , X I ) + f(v l , x 2) = 0 0 = O. Similarly, f(vt> « X I ) = 0 for all « E R. Therefore, mz is a linear Furthermore, mz 1= = X because VI ¢ mz. Hence, dim mz subspace of .X := ;;; n - 1. Now let dim mz = q < n - 1. Since / is a bilinear functional on mz, it follows by the induction hypothesis that there is a basis for mz consisting of a set of q vectors v{ 2 , • • , vf+tl such that f(v1, vJ) = 0 for i 1= = j, 2 < i, j < q + 1. Also, f(v l , vJ) = 0 for j = 2, ... ,q + I, by definition of mz.
+
Chapter 4 I iF nite-Dimensional
200
Vector Spaces and Matrices
uF rthermore, f(v VI) = f(v l , vJ eH nce, f(v VI) = f(v., v,} = 0 for i = 2, " ... ,q + l . It follows that f(v"vJ } = 0 for" i:# j and I~i,j
+
*
p
o -1
F=
o
-I
n- p
o o
.4 8.
Bilinear uF nctionals and Congruence
201
o -1 F'= -1
o
n- q
o
To prove that p = q we show that e l , • • ,e" e;+h ... ,e:. are linearly independent. F r om this it must follow that p (n - q ) < n, or p < .q By the same argument, q < p, and so p = .q L e t
+
Ylel
+
+
y,e, +
where 1' 1 E R, i = 1, ,p and 1' : above equation we have Then
+ ... +
"leI
= l , ... ,
f(x o, x o) =
Y~
R, i
=
, ~e:.
+
q
,Y e"
Ylel
0, =
1, ... ,n. Rewriting the
+ ... +
-(Y;+le;+1
+ ... + + ... + Y~ >
y~e:.)
A
x
o'
+ ... +
pY e p)
(r~+I~+I'
... ,y~e~)]
0,
ep}. On the other hand,
+ ... +
f[-(~+,~+,
=
E
= f(y,e,
f(x o, x o)
by choice of{ e
"pep =
+ ... +
;Y l+ e;+1
(- 1 )Z[ -
(,,~+
1)2 -
(,,~+z)Z
-
y~e:.),
-
.• •
-
(y~)Z]
<
0
+ ... +
by choice of{~+I" .. ,e~+R}' F r om this we conclude thaty~ '1~ = 0; i.e., 1' 1 = ... = 1' p = O. Hence, Y~+ I~+ I + ... + y~e~ = O. But the set {~+I" .. , e:,} is linearly independent, and thus Y~+I = ... = , ~ = O. Hence, ... ,e~ are linearly independent, and it follows the vectors el' ... ,ep , ~+t, thatp = .q To prove that r is unique, let r be the number of non-zero elements of F and let r' be the number of non-zero elements of F ' . By Theorem .4 8.15, F and F ' are congruent and hence equivalent. Thus, it follows from Theorem 4.3.16 that F and F ' must have the same rank, and therefore r = r'. This concludes the proof of the theorem. _
201
Chapter 4
I Finite-Dimensional
Vector Spaces and Matrices
Sylvester's theorem allows the following classification ofsymmetric bilinear functionals. .4 8.22. Definition. The integer r in Theorem .4 8.20 is called the rank of the symmetric bilinear functional f. The integer p is called the index of f. The integer n is called the order off. The integer s = 2p - r (i.e., the number of + l' s minus the number of - I s' ) is called the signature off. Since every real symmetric matrix is congruent to a unique matrix of the form (4.8.21), we define the index, order, and rank of a real symmetric matrix analogously as in Definition .4 8.22. Now let us recall that a bilinear functional f on a vector space X is said to be positive if f(x , x ) > 0 for all x E .X Also, a bilinear functional f is said to be strictly positive if f(x , x) > 0 for all x 0, x E X (it should be noted that f(x , x ) = 0 for x = 0). Our final result of the present section, which is a consequence of Theorem .4 8.20, enables us now to classify symmetric bilinear functionals.
"*
.4 8.23. Theorem. Let p, r, and n be defined as in Theorem .4 8.20. A symmetric bilinear functional on a real n-dimensional vector space X is (i) strictly positive if and only if p (ii) positive if and only if p = r. .4 8.24.
.4 9.
Exercise.
=
r
=
n; and
Prove Theorem .4 8.23.
EUCIL DEAN
VECTOR SPACES
A. Euclidean Spaces: Definition and Properties
Among the various linear spaces which we will encounter, the so-called Euclidean spaces are so important that we devote the next two sections to them. These spaces will allow us to make many generalizations to facts established in plane geometry, and they will enable us to consider several important special types of linear transformations. In order to characterize these spaces properly, we must make use of two important notions, that of the norm of a vector and that of the inner product of two vectors (refer to Section 3.6). In the real plane, these concepts are related to the length of a vector and to the angle between two vectors, respectively. Before considering the matter on hand, some preliminary remarks are in order. To begin with, we would like to point out that from a strictly logical point of view Euclidean spaces should actually be treated at a later point of
.4 9.
Euclidean Vector Spaces
203
our development. This is so because these spaces are specific examples of metric spaces (to be treated in the next chapter), of normed spaces (to be dealt with in Chapter 6), and of inner product spaces (also to be considered in Chapter 6). oH wever, there are several good reasons for considering Euclidean spaces and their properties at this point. These include: Euclidean spaces are so important in applications that the reader should be exposed to them as early as possible; these spaces and their properties will provide the motivation for subsequent topics treated in this book; and the material covered in the present section and in the next section (dealing with linear transformations defined on Euclidean spaces) constitutes a natural continuation and conclusion of the topics considered thus far in the present chapter. In order to provide proper motivation for the present section, it is useful to utilize certain facts from plane geometry to indicate the way. To this end let us consider the space R'- and let x = (' I ' ,,-) and y = ('11' 1' ,-) be vectors in R'.- Let IU{ > u,-} be the natural basis for R'.- Then the natural coordinate representation of x and y is x =
[~:J
and y =
:[ :J
(4.9.1)
respectively (see Example .4 1.15). The representation of these vectors in the plane is shown in Figure D. In this figure, Ix I, Iy I, and Ix - y Idenote the
.4 9.1.
iF gure D. eL ngth of vectors and angle between vectors.
lengths of vectors ,x y, and (x - y), respectively, and 8 represents the angle IlZ , and the length between x and y. The length of vector x is equal to (,f + of vector (x - y) is equal to { ( ' I - 1' 1)'- + (,,- - 1' ,-))- ' 1/2. By convention,
,n
Chapter 4 I iF nite-Dimensional
Vector Spaces and Matrices
we say in this case that "the distance from x to y" is equal to {(~I - I' I)Z + (~z - I' Z)Z}1/2, that "the distance from the origin 0 (the null vector) to x" is equal to (~f + ~DI/Z, and the like. Using the notation of the present chapter, we have (4.9.3) and
Ix - yl =
,J ( x
-
y)T(x -
= ,J ( y - )X T(y - )x = Iy - lx .
y)
(4.9.4)
The angle (J between vectors x and y can easily be characterized by its cosine, namely,
cos 8 =(~
Utilizing
17~+
~z7z)
(4.9.5)
Z·
""'~f + i ""'I' I + I' z the notation of the present chapter, we have
,J
cos (J =
T x x
XT~
(4.9.6)
yTy
It turns out that the real-valued function T x y, which we used in both Eqs. (4.9.3) and (4.9.6) to characterize the length of any vector x and the angle between any vectors x and y, is of fundamental importance. F o r this reason we denote it by a special symbol; i.e., we write (x, y)
Now if we let x
=
t:.
T x y.
(4.9.7)
yin Eq. (4.9.7), then in view of Eq. (4.9.3) we have
Ix I = ""'(x, x).
(4.9.8)
By inspection of Eq. (4.9.3) we note that
>
(x, x)
and
(x , x )
0 for all x * - O
=
=
0 for x
(4.9.9) (4.9.10)
O.
Also, from Eq. (4.9.7) we have (x, y) =
(4.9.11)
(y, x)
for all x and y. Moreover, for any vectors ,x y, and z and for any real scalars « and p we have, in view of Eq. (4.9.7), the relations (x
+
(x , y
and
+
y, )z = )z =
(x, )z (x, y)
+
+
(Y, )z ,
(4.9.12)
(x , )z ,
(4.9.13)
y) =
«(x,
y),
(4.9.14)
(x , « y ) =
«(x,
y).
(4.9.15)
(<,x<
In connection with Eq. (4.9.6) we can make several additional observa1; if x = - y , then cos tions. First, we note that if x = y, then cos (J = 8 = - 1 ; if x T = (~I> 0) and yT = (0, I' )z , then cos (J = 0; etc. It is easily
+
.4 9.
Euclidean Vector Spaces
+
verified, using Eq. (4.9.6), that cos (J assumes all values between 1 and - 1 ; i.e., - 1 < cos (J S 1. The above formulation agrees, of course, with our notions of length of a vector, distance between two vectors, and angle between two vectors. F r om Eqs. (4.9.9}-(4.9.l5) it is also apparent that relation (4.9.7) satisfies all the axioms of an inner product (see Section 3.6). U s ing the above discussion as motivation, let us now begin our treatment of Euclidean vector spaces. F i rst, we recall the definition of a real inner product: a bilinear functional f on a real vector space X is said to be an inner product on X if (i) f is symmetric and (ii) f is strictly positive. We also recall that a real vector space X on which an inner product is defined is called a real inner product space. We now have the following important
+
.4 9.16.
Definition. A real finite-dimensional vector space on which an inner product is defined is called a Euclidean space. A finite-dimensional vector space over the field of complex numbers on which an inner product is defined is called a unitary space.
We point out that some authors do not restrict Euclidean spaces to be finite dimensional. Although many of the results of unitary spaces are essentially identical to those of Euclidean spaces, we postpone our treatment of complex inner product spaces until Chapter 6, where we consider spaces that, in general, may be infinite dimensional. Throughout the remainder ofthe present section, X will denote an n-dimensional Euclidean space, unless otherwise specified. Since we will always be concerned with a given bilinear functional on ,X we will henceforth write (x, y) in place of f(x , y) to denote the inner product of x and y. Finally, for purposes of completeness, we give a summary of the axioms of a real inner product. We have
*'
(i) (x, x ) > 0 for all x 0 and (x, x ) = 0 if x = 0; (ii) (x, y) = (y, x ) for all x , y E X; (iii) (IXX py, )z = IX(,X )z P(y, )z for all x, y, z E X and all IX, PER; and (iv) (x, lXy pz) = IX(,X y) P(x, )z for all x , y E X and all IX, pER.
+
+
+
We note that Eqs. axioms.
.4 9.17. if y =
o.
+
(4.9.9}-(4.9.15)
are clearly in agreement with these
Theorem. The inner product (x, y) =
0 for all x
E
X if and only
Chapter 4 I iF nite-Dimensional
Vector Spaces and Matrices
Proof If y = 0, then y = 0 • x and (x, 0) = ( x , 0 • x ) = 0 • (x, x ) = 0 for allx E X . On the other hand, let (x , y) = 0 for all x E .X Then, in particular, it must be true that (x, y) = 0 if x = y. We thus have (y, y) = 0, which implies thaty = 0.. . The reader can prove the next results readily. .4 9.18. Corollary. L e t if and only if A = O.
A E L(X,
.4 9.19. y E X,
A, B E L ( X ,
Corollary. Let then A = B.
4.9.20. Corollary. x , y E R-, then A = 4.9.11.
Exercise.
X).
Then (x, Ay) =
0 for all ,x y E X
If (x, Ay) =
(x, By) for all ,x
X).
A be a real (n x n) matrix.
Let
o.
If x T Ay
=
0 for all
(~\t
.• . , ~_)
Prove Corollaries 4.9.18-4.9.20.
Of crucial importance is the notion of norm. We have:

4.9.22. Definition. For each x ∈ X, let |x| = (x, x)^{1/2}. We call |x| the norm of x.

Let us consider a specific case.

4.9.23. Example. Let X = Rⁿ and let x, y ∈ X, where x = (ξ₁, ..., ξₙ) and y = (η₁, ..., ηₙ). From Example 3.6.23 it follows that

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ    (4.9.24)

is an inner product on X. The coordinate representation of x and y with respect to the natural basis in Rⁿ is given by x and y, respectively (see Example 4.1.15). We thus have

(x, y) = xᵀy    (4.9.25)

and

|x| = (Σᵢ₌₁ⁿ ξᵢ²)^{1/2} = (xᵀx)^{1/2}.    (4.9.26)

The above example gives rise to:
4.9. Euclidean Vector Spaces
4.9.27. Definition. The vector space Rⁿ with the inner product defined in Eq. (4.9.24) is denoted by Eⁿ. The norm of x given by Eq. (4.9.26) is called the Euclidean norm on Rⁿ.

Relation (4.9.29) of the next result is called the Schwarz inequality.

4.9.28. Theorem. Let x and y be any elements of X. Then

|(x, y)| ≤ |x| · |y|,    (4.9.29)
where in Eq. (4.9.29) |(x, y)| denotes the absolute value of a real scalar and |x| denotes the norm of x.

Proof. For any x and y in X and for any real scalar α we have

(x + αy, x + αy) = (x, x) + α(x, y) + α(y, x) + α²(y, y) = (x, x) + 2α(x, y) + α²(y, y) ≥ 0.

Now assume first that y ≠ 0, and let

α = −(x, y)/(y, y).

Then

(x + αy, x + αy) = (x, x) − 2(x, y)²/(y, y) + (x, y)²(y, y)/(y, y)² = (x, x) − (x, y)²/(y, y) ≥ 0,

or

(x, x)(y, y) ≥ (x, y)².

Taking the square root of both sides, we have the desired inequality

|(x, y)| ≤ |x| · |y|.

To complete the proof, consider the case y = 0. Then (x, y) = 0, |y| = 0, and in this case the inequality follows trivially. ∎
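The argument just given can be replayed numerically. The Python sketch below is our own illustration (the helper names dot and norm are not the book's notation); it uses the Euclidean inner product of Eq. (4.9.24) and checks both the Schwarz inequality and the minimizing choice α = −(x, y)/(y, y) from the proof.

```python
import math
import random

def dot(x, y):
    # Euclidean inner product, Eq. (4.9.24)
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    # Euclidean norm |x| = (x, x)^(1/2), Eq. (4.9.26)
    return math.sqrt(dot(x, x))

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(5)]
y = [random.uniform(-1.0, 1.0) for _ in range(5)]

# Schwarz inequality (4.9.29): |(x, y)| <= |x| * |y|
assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12

# With alpha = -(x, y)/(y, y), the proof's quantity (x + alpha*y, x + alpha*y)
# equals (x, x) - (x, y)^2/(y, y), and it is non-negative.
alpha = -dot(x, y) / dot(y, y)
z = [a + alpha * b for a, b in zip(x, y)]
assert dot(z, z) >= -1e-12
assert math.isclose(dot(z, z), dot(x, x) - dot(x, y) ** 2 / dot(y, y), abs_tol=1e-12)
```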
4.9.30. Exercise. For x, y ∈ X, show that

|(x, y)| = |x| · |y|

if and only if x and y are linearly dependent.

In the next result we establish the axioms of a norm.
4.9.31. Theorem. For all x and y in X and for all real scalars α, the following hold:

(i) |x| > 0 unless x = 0, in which case |x| = 0;
(ii) |αx| = |α| · |x|, where |α| denotes the absolute value of the scalar α; and
(iii) |x + y| ≤ |x| + |y|.
Proof. The proof of part (i) follows from the definition of an inner product. To prove part (ii), we note that

|αx|² = (αx, αx) = α(x, αx) = α²(x, x) = α²|x|².

Taking the square root of both sides we have the desired relation |αx| = |α| · |x|.

To verify the last part of the theorem we note that

|x + y|² = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = |x|² + 2(x, y) + |y|².

Using the Schwarz inequality we obtain

|x + y|² ≤ |x|² + 2|x| · |y| + |y|² = (|x| + |y|)².

Taking the square root of both sides we have

|x + y| ≤ |x| + |y|,

which is the desired result. ∎

Part (iii) of Theorem 4.9.31 is called the triangle inequality. Part (ii) is called the homogeneous property of a norm. In Chapter 6 we will define functions on general vector spaces satisfying axioms (i), (ii), and (iii) of Theorem 4.9.31 without making use of inner products. In such cases we will speak of normed linear spaces (Euclidean spaces are examples of normed linear spaces).

Our next result is called the parallelogram law. Its meaning in the plane is evident from Figure E.
4.9.32. Figure E. Interpretation of the parallelogram law.

4.9.33. Theorem. For all x, y ∈ X the equality

|x + y|² + |x − y|² = 2|x|² + 2|y|²

holds.

4.9.34. Exercise. Prove Theorem 4.9.33.
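Theorem 4.9.33 is easy to confirm for concrete vectors. The short Python check below (our own helper dot, Euclidean inner product) evaluates both sides of the identity.

```python
import math
import random

def dot(x, y):
    # Euclidean inner product of two real vectors
    return sum(a * b for a, b in zip(x, y))

random.seed(1)
x = [random.uniform(-2.0, 2.0) for _ in range(4)]
y = [random.uniform(-2.0, 2.0) for _ in range(4)]

plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]

# Parallelogram law: |x + y|^2 + |x - y|^2 = 2|x|^2 + 2|y|^2
lhs = dot(plus, plus) + dot(minus, minus)
rhs = 2.0 * dot(x, x) + 2.0 * dot(y, y)
assert math.isclose(lhs, rhs)
```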
Generalizing Eq. (4.9.4), we define the distance between two vectors x and y of X as

ρ(x, y) = |x − y|.    (4.9.35)

It is not difficult for the reader to prove the next result.

4.9.36. Theorem. For all x, y, z ∈ X, the following hold:

(i) ρ(x, y) = ρ(y, x);
(ii) ρ(x, y) ≥ 0 and ρ(x, y) = 0 if and only if x = y; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y).

A function ρ(x, y) having properties (i), (ii), and (iii) of Theorem 4.9.36 is called a metric. Without making use of inner products, we will in Chapter 5 define such functions on non-empty sets (not necessarily linear spaces), and we will in such cases speak of metric spaces (Euclidean spaces are examples of metric spaces).

4.9.37. Exercise. Prove Theorem 4.9.36.

B. Orthogonal Bases
Following our discussion at the beginning of the present section further, we now recall the important concept of orthogonality, using inner products. In accordance with Definition 3.6.22, two vectors x, y ∈ X are said to be orthogonal (to one another) if (x, y) = 0. We recall that this is written as x ⊥ y. From the discussion at the beginning of this section it is clear that in the plane x ≠ 0 is orthogonal to y ≠ 0 if and only if the angle between x and y is some odd multiple of 90°. The reader has undoubtedly encountered a special case of our next result, known as the Pythagorean theorem.

4.9.38. Theorem. Let x, y ∈ X. If x ⊥ y, then

|x + y|² = |x|² + |y|².

Proof. Since by assumption x ⊥ y, we have (x, y) = 0. Thus,

|x + y|² = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = |x|² + |y|²,

which is the desired result. ∎

4.9.39. Definition. A vector x ∈ X is said to be a unit vector if |x| = 1.

Let us choose any vector y ≠ 0 and let z = (1/|y|)y. Then the norm of z is

|z| = |(1/|y|)y| = (1/|y|)|y| = 1;

i.e., z is a unit vector. This process is called normalizing the vector y.
Next, let {f₁, ..., fₙ} be an arbitrary basis for X and let F = [fᵢⱼ] denote the matrix of the inner product with respect to this basis; i.e., fᵢⱼ = (fᵢ, fⱼ) for all i and j. More specifically, F denotes the matrix of the bilinear functional f that is used in determining the inner product on X with respect to the indicated basis (see Definition 4.8.1). Let x and y denote the coordinate representations of x and y, respectively, with respect to {f₁, ..., fₙ}. Then we have, by Theorem 4.8.2,

(x, y) = xᵀFy = Σᵢ,ⱼ₌₁ⁿ fᵢⱼξᵢηⱼ.

Now by Theorems 4.8.20 and 4.8.23, since the inner product is symmetric and strictly positive, there exists a basis {e₁, ..., eₙ} for X such that the matrix of the inner product with respect to this basis is the (n × n) identity matrix I; i.e.,

(eᵢ, eⱼ) = δᵢⱼ = 0 if i ≠ j and δᵢⱼ = 1 if i = j.

This motivates the following:

4.9.40. Definition. If {e₁, ..., eₙ} is a basis for X such that (eᵢ, eⱼ) = 0 for all i ≠ j, i.e., if eᵢ ⊥ eⱼ for all i ≠ j, then {e₁, ..., eₙ} is called an orthogonal basis. If in addition (eᵢ, eᵢ) = 1, i.e., if |eᵢ| = 1 for all i, then {e₁, ..., eₙ} is said to be an orthonormal basis for X (thus, {e₁, ..., eₙ} is orthonormal if and only if (eᵢ, eⱼ) = δᵢⱼ).

Using the properties of inner products and the definitions of orthogonal and orthonormal bases, we are now in a position to establish several useful results.

4.9.41. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let x and y be arbitrary vectors in X, and let the coordinate representations of x and y with respect to this basis be xᵀ = (ξ₁, ..., ξₙ) and yᵀ = (η₁, ..., ηₙ), respectively. Then

(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ    (4.9.42)

and

|x| = (xᵀx)^{1/2} = (ξ₁² + ... + ξₙ²)^{1/2}.    (4.9.43)

Proof. From the above discussion we have

(x, y) = xᵀFy = Σᵢ,ⱼ₌₁ⁿ δᵢⱼξᵢηⱼ = Σᵢ₌₁ⁿ ξᵢηᵢ.

In particular, we have (x, x) = Σᵢ₌₁ⁿ ξᵢ². ∎
The reader should note that Eqs. (4.9.7) and (4.9.8) introduced at the beginning of this section are, of course, in agreement with Eqs. (4.9.42) and (4.9.43). (See also Example 4.9.23.)

Our next result enables us to determine the coordinates of a vector with respect to a given orthonormal basis.

4.9.44. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X and let x be an arbitrary vector. The coordinates of x with respect to {e₁, ..., eₙ} are given by the formulas

ξ₁ = (x, e₁), ..., ξₙ = (x, eₙ).

Proof. Since x = ξ₁e₁ + ... + ξₙeₙ, we have

(x, e₁) = (ξ₁e₁ + ... + ξₙeₙ, e₁) = ξ₁(e₁, e₁) + ... + ξₙ(eₙ, e₁) = ξ₁.

Repeating this procedure for (x, eᵢ), i = 2, ..., n, yields the desired result. ∎
= Z£
Let X and y =
(see Definition .4 9.27). (111,112)' Then
Let
,x y E £ 2 ,
el111 + 2{ 112' The natural basis for £ 2 is given by U I = (1,0) and U 2 = (0, I). Since (u l , u / ) = J ' I ' it follows that u{ 1, u 2} is an orthonormal basis for £2. F u rthermore, we have (x , y)
4.9.46. by
Example.
Let X
=
=
RZ, and let the inner product on RZ be defined
1{ 111 + 4~z11z· (4.9.47) (The reader may verify that this is indeed an inner product.) L e t u{ I , u z } denote the natural basis for RZ; i.e., U I = (1,0) and U z = (0,1). The matrix representation of the bilinear functional, which determines the above inner product with respect to the basis u{ l , uz} is (x, y) =
(x, y)
=
XT[~
~Jy,
where x and yare the coordinate vectors of x and y with respect to u{ l , U 2 J . We see that (UI> U 2 ) = 1 · 0 4 · 0 . 1 = 0; eL ., U I and U 2 are orthogonal with respect to the inner product (4.9.47). Note however that lUll = I and Iuz\ = 2; i.e., the vectors U I and U 2 are not orthonormal. Now let e l = (1,0) and e z = (0, t). Then it is readily verified that e{ l> e 2 } is an orthonormal basis for .X F u rthermore, for x = ~el + e~e2' we have
+
Chapter 4 I iF nite-Dimensional
211 e~
=
(x, e,) and e~
=
Vector Spaces and Matrices
(x, e 1 ). If we let
[~] =
x'
and y'
[:~J =
denote the coordinate representation of x and y, respectively, with respect to
eel, e1,} then
(x, y) =
(x)' Ty'.
This illustrates the fact that the norm of a vector must be interpreted with respect to the inner product used in determining the norm. _ Our next result allows us to represent vectors in X in a convenient way. .4 9.48. all x
E
Theorem. Let e{ X w e have _
,e.} be an orthogonal basis for .X
l , ••
(x , e,)
x - ( e , e, )e l l Proof.
Normalizing e l ,
i..}, where
e: = rile" x
=
i
(x , e~)e~
+
(x , eft)
(-eft, - ) e". e.
,n. By Theorem .4 9.44
I,
+
• • •
,e", we obtain the orthonormal basis
• •
=
+
+
Then for
{~,
... ,
we have
(x , e~)e~
(x, Tf-re,) (Tf-r) e, + ... + (x, ;' .le.)C:.I)e. (x , e,) + + (xle.1 , e.) .• • eft
= = =
~el
1
(x , e,) e (el> e,) l
+ ... +
(x , e.) e.. (e., e.)
_
We are now in a position to characterize inner products by means of Parseyal's identity, given in our next result. .4 9.49. Coronary. Let e{ " ... ,e,,} be an orthogonal basis for .X for any ,x y E X we have (x, y)
.4 9.50.
Exercise.
=
t. (x,
t1
Then
e,)(y, e,). (e" e,)
Verify Corollary .4 9.49.
Our next result establishes the linear independence of orthogonal vectors. We have: .4 9.51. Theorem. Suppose that X I ' • • ' X k are mutually orthogonal noni.e., ,x ...L X j ' i::l= j. Then X l " • ,x k are linearly indezero vectors in ;X pendent.
.4 9.
Euclidean Vector Spaces
Proof
213
Assume that for real scalars (X I ' (X I X
F o r arbitrary i
0=
(X/(X"
+ ... +
«XIX
I
+ ... +
(XkkX J
(XkkX
XI)
we have
, (Xk
= I, ... , k, we have
(0, XI) = =
I
••
=
= O. (X I (X
XI)
I,
+ ... +
(Xk(X
k , XI)
X I );
i.e., (XI(X I , XI) = O. This implies that linear independence of X I " • , X k •
(XI
= 0 for arbitrary i, which proves the
•
Note that the converse to the above theorem is not true. We leave the proofs of the next two results as an exercise. .4 9.52. Corollary. A set, of k non-zero mutually orthogonal vectors is a basis for X if and only if k = dim X = n. .4 9.53. Corollary. F o r X there exist not more than n mutually orthonormal vectors. (In this case we speak of a complete orthonormal set of vectors.) .4 9.54.
Exercise.
Prove Corollaries .4 9.52 and .4 9.53.
Our next result, which is called the Gram·Schmidt process, allows us to construct an orthonormal basis from an arbitrary basis. .4 9.55.
Theorem. eL t
ff., ... , fn} be an arbitrary basis for
gl =
fl,
g,. =
f,. -
g" 1 ="
-
(/,., el)e u .- 1
~
j= 1
(/",ej)ej,
el =
gl/lgII,
e,. =
g,./lg,.l,
elf
.X
Set
= g"/lg,,l·
Then e{ u ... , elf} is an orthonormal basis for .X .4 9.56. Exercise. Prove Theorem .4 9.55. To accomplish this, show that (e" ej ) = 0 for i :#' .j, that lell = 1 for i = 1, ... , n, and that e{ l , • • , elf} forms a basis for .X The next result is a direct consequence of Theorem .4 9.55 and Theorem
3.3.4.4
.4 9.57. Corollary. If e u ... , ekJ k < n, are mutually orthogonal non-ez ro vectors in ,X then we can find a set of vectors ek + I ' • . . , elf such that the set {el, ... , elf} forms a basis for .X Our next result is known as the Bessel inequality.
Clwpter 4 I iF nite-Dimensional
214
.4 9.58. Theorem. If { X I ' " ' x orthonormal vectors in ,X then
for all X
E
IJ,
k
=
is orthogonal to each "x i = Let
is an arbitrary set of mutually
k}
Moreover, the vector
.X
Y
Proof.
Vector Spaces and Matrices
X
:E (x, -
,x )x,
1= '
I, ... , k.
(x, ,x ). We have
=
I< x =
(x -
f. IJ,,X I2 = ~
o
~
(x, x ) -
Now since the vectors x
k
1J,tJ,
k
-
'X
lt • •
~
f. IJ,"X j
ftilJ j lJ
+
X
~I
-
k k ~I
t
IJ j X
J-I
j)
(I,,tJt,x ,
X j )'
are mutually orthonormal, we have
k
which proves the first part of the theorem. To prove the second part, we note that
In Theorem .4 9.58, let U denote the linear subspace of X which is spanned by the set of vectors x { lt • . • , x k } . Then clearly each vector Y defined in this theorem is orthogonal to each vector of ;U i.e., Y .l.. U (see Definition 3.6.22). Let us next consider: .4 9.59.
Theorem. Let
=
y.L
(i) Let j= (ii) y.L (iii) n = (iv) (Y.)L .L (v) X =
{II' I,
Y be a linear subspace of ,X
x{
E
X: (x, y) =
0 for all Y
Then x
,Ik} span .Y , k.
and let
.} Y
(4.9.60)
if and only if x ..1 I j for
y.L
E
E
is a linear subspace of .X dim X = dim Y + dim y.L.
(vi) Let X , Y and X
=
.Y
yEt>
y.L E
2,
2Y
.X E
If x
=
IX
y.L, then
+ X
2
and Y
=
YI
+
Y2'
where
X I ,Y I
E
Y
.4 9.
Euclidean Vector Spaces
215
and
Ixl
v- "lxti z =
+ IXlz .z
To prove the first part, note that if x E y.L, then x J . .fl"' " X J . .fk' since It E Y for i = 1, ... , k. On the other hand, let x J . ./' , i = 1, ... , k. Then for any Y E Y there exist scalars I' " i = 1, ... , k such that y = ndl + ... + I' ,.fk' eH nce,
Proof
(x, y) = Thus, x
E
(x ,
y.L.
t 'I,/')
I- '
f
=
t:1
' I ,(x , /,) =
O.
The remaining parts of the theorem are left as an exercise. .4 9.61.
Exercise.
_
Prove parts (ii) through (vi) of Theorem 4.9.59.
.4 9.62. Definition. Let Y be a linear subspace of .X The subspace y.L defined in Eq. (4.9.60) is called the orthogonal complement of .Y Before closing the present section we state and prove the following important result. .4 9.63. lbeorem. L e t/be y E X such that
a linear functional on .X
f(x )
for all x E .X
=
There exists a unique
(x , y)
(4.9.64)
Proof. If I(x ) = 0 for all x E X , then y = 0 is the unique vector such that Eq. (4.9.64) is satisfied for all x E ,X by Theorem 4.9.17. So let us suppose
that I(x )
*-
0 for some x
~
Then
,X and let
E
=
{x
E
X : /(x )
= OJ.
is a linear subspace of .X Let ~.L be the orthogonal complement Then it follows from Theorem 4.9.59 that X = ~ EB .~ L Furthercontains a non-ez ro vector. Let oY E ~.L and, without loss of more, .~ L generality, let oY be chosen in such a fashion that Iyo I = 1. Now let y = f(yo)Yo and for any x E X let X o = x - lXoY , where lX = f(x ) /f(yo)' Then f(x o) = 0, and thus X o E ~. We now have x = X o + lXoY , and ~
of~.
(x, y) = =
(xo'/(Yo)
• oY )
f(yo) • (x o, oY ) = lXf(yo) = f(x ) ;
+
+
(lXoY /' (Yo) lXf(yo)
• oY ) • (Yo. oY )
i.e., for all x E X , /{ x ) = (x , y). To show that y is unique, suppose that (x , IY ) = (x , yz ) for all x E .X Then (x , IY - yz ) = 0 for all x E .X But this implies that IY - Y z = 0, or IY = Yz· This completes the proof of the theorem. _
.4 10.
IL NEAR TRANSFORMATIONS ON EUCIL DEAN VECTOR SPACES Orthogonal Transformations
A.
In the present section we concern ourselves with special types of linear transformations defined on Euclidean vector spaces. We will have occasion to reconsider similar types of transformations again in Chapter 7, in a much more general setting. nU less otherwise specified, X willdenote an n-dimensional
Euclidean vector space throughout the present section.
The first special type of linear transformation defined on Euclidean vector spaces which we consider is the so-called "orthogonal transformation." Let {el, • . .
,e.l be an orthonormal basis for ,X
= t. PJleJ'
let e;
1="1
i=
1, ... ,
n, and let P denote the matrix determined by the real scalars P'' J The following question arises: when is the set {e~, ... , e.l also an orthonormal basis for X? To determine the desired properties of P, we consider
(t Pklek, t. PIJel)
(e;, eJ) =
In order that (1" ej)
k~1
=
~
0 for i
(e;, ej) =
~I
i.e., we require that
~
•
*"j and (e;, ej) = PklP' ) k'
PTP where, as usual, I denotes the n
.4 10.1. e;
X
=
t
I-J
Theorem. Let
=
{ e l' ...
X
= ~
~
•
~ PkIP/J t.1 I for i PklPkJ
=
(e ko e,).
=
j, we require that
6,J ,
= I,
n identity matrix. We summarize.
,e.l
be an orthonormal basis for .X
Let
,e.l is
an orthonormal basis for
Definition. A matrix P such that pT = = I, is called an orthogonal matrix.
p- I , i.e., such that p7' p
PJleJ'
i
=
1, ... ,n. Then {e~,
if and only if pT =
...
P- I .
This result gives rise to the following:
.4 10.2. =
p- I p
.4 10.3.
Exercise. Show that if P is an orthogonal matrix, then either det P = 1 or det P = - I . Also, show that if P and Q are (n X n) orthogonal matrices, then so is PQ. The nomenclature used in our next definition will become clear shortly.
216
.4 10.
217
Lineary Transformations on Euclidean Vector Spaces
.4 10.4. Definition. A linear transformation A from X into X is called an orthogonal linear transformation if (Ax, Ay) = (x, y) for all x, y E .X Let us now establish some of the properties of orthogonal transformations. .4 10.5. IAx l
Theorem. eL t A Ixl for all x E .X
=
E
L(X,
Then A is orthogonal if and only if
X).
Proof. If A is orthogonal, then (Ax, Ax ) = versely, if IAx I = Ix I for all x E ,X then IA(x
Also, IA(x
+
+
y)1 2 =
(A(x
yW
= Ix
E
.X
y), A(x
+
y»
2(Ax, Ay) 2(Ax, Ay)
+ yl2 = (x +
and therefore for all x , y
+
= IAxl 2 + = Ixl 2 +
+
y, x
(Ax, Ay) = _
=
+
and IAx
(x, x) (Ax
+
Ay, Ax
I = Ilx .
+
Con-
Ay)
IAyl2
lyl2.
+
= Ixl 2+
y)
2(x , y)
+ lyl2,
(x, y)
We note that if A is an orthogonal linear transformation, then x .1.. y for all ,x y E X if and only if Ax .1.. Ay. F o r (x, y) = 0 if and only if (Ax, Ay) = O. .4 10.6. Corollary. Every orthogonal linear transformation of X is non-singular.
Proof Let Ax = singular. _
O. Then
into X
IAxl = Ixl = O. Thus, x = 0 and A is non-
Our next result establishes the link between Definitions 4.10.2 and 4.10.4. Theorem. eL t e{ l' ... ,e.J be an orthonormal basis for .X Let X ) , and let A be the matrix of A with respect to this basis. Then A is orthogonal if and only if A is orthogonal.
.4 10.7. A
E
L(X,
Proof. Let x and y be arbitrary vectors in ,X and let x and y denote their coordinate representation, respectively, with respect to the basis e{ l, ... , e.J. Then Ax and Ay denote the coordinate representation of Ax and Ay, respectively, with respect to this basis. Now, (Ax, Ay) =
and
(Ax Y ( Ay)
= x
T
ATAy,
Chopter 4 I iF nite-Dimensional
118
Vector Spaces and Matrices
Now suppose that A is orthogonal. Then ATA = I and (Ax , Ay) = T x y = (x, y) for all x , y E .X On the other hand, if A is orthogonal, then (Ax , Ay) = T x ATAy = T x y = (x , y) for all x , y E .X Thus, T x ([ ATA - I )y] = O. Since this holds for all x , y E ,X we conclude from Corollary 4.9.20 that
= 0; i.e., ATA = I. •
I
ATA -
The next two results are left as an exercise. 4.10.8.
Corollary.
Let A E L ( X ,
X).
IfA is orthogonal, then det A =
±
1.
4.10.9. Corollary. Let A, BE L ( X , X ) . If A and B are orthogonal transformations, then AB is also an orthogonal linear transformation. 4.10.10.
Exercise.
Prove Corollaries .4 10.8 and .4 10.9.
F o r reasons that will become apparent later, we introduce the following convention. 4.10.11. Definition. L e t A E L ( X , X ) be an orthogonal linear transformation. Ifdet A = + I, then A is called a rotation. Ifdet A = - I , then A is called a reflection. B.
Adjoint Transformations
The next important class of linear transformations on Euclidean spaces which we consider are so-called adjoint linear transformations. Our next result enables us to introduce such transformations in a natural way. 4.10.12.
Theorem. L e t G E L ( X , X ) and defineg: X x X - + R by g(x , y) (x, Gy) for all x , y E .X Then g is a bilinear functional on .X Moreover, if {el>' .. , e.1 is an orthonormal basis for ,X then the matrix of g with respect to this basis, denoted by G, is the matrix of G with respect to {el>' • , e.l. Conversely, given an arbitrary bilinear functional g defined on ,X there X ) such that (x , Gy) = g(x , y) exists a unique linear transformation G E L ( X , for all x , y E .X
=
Proof. g(x l
+
Let G
E
=
x z , y)
L(X, (X I
+
X),
and let g(x , y) =
X Z,
Gy)
=
(X I '
Gy)
(x , Gy). Then
+
(x z , Gy)
=
+
g(x l ,y)
g(x z , Y ) .
Also, g(x, YI
+
=
yz ) =
(x, G(YI g(x, Y I )
+
+
yz »
=
g(x , yz)·
(x, GYI
+
Gyz)
=
(x , GYI)
+
(x , Gyz)
.4 10.
iL near Transformations on Euclidean Vector Spaces
119
Furthermore, and g(x, IX)Y
=
g(tU,
y)
=
(x, G(IX» Y
Gy)
(lX,X =
=
IX(,X
(x, IXG(y»
Gy) =
IX(,X
=
y),
IXg(X,
Gy)
=
IXg(X,
y),
where IX is a real scalar. Therefore, g is a bilinear functional. Next, let e{ ., ... ,e.} be an orthonormal basis for .X Then the matrix G of g with respect to this basis is determined by the elements g/j = g(e l, eJ). Now let G' = g[ ;J] be the matrix of G with respect to {e., . .. ,e.}. Then Ge J
=
t
k=.
g~Jek
for j =
I, ...
,n.
Hence,
(e lt Ge) =
(e k=t. l,
g~)ek) =
g;j.
Since glJ = g(e l , eJ ) = (e lt Ge J ) = g;J' it follows that G' = G; eL ., G is the matrix ofG. To prove the last part of the theorem, choose any orthonormal basis e[ ., ... ,e.} for .X Given a bilinear functional g defined on ,X let G = g[ lj] denote its matrix with respect to this basis, and let G be the linear transformation corresponding to G. Then (x, Gy) = g(x, y) by the identical argument given above. Finally, since the matrix of the bilinear functional and the matrix of the linear transformation were determined independently, this correspondence is unique. _ It should be noted that the correspondence between bilinear functionals and linear transformations determined by the relation (x, Gy) = g(x, y) for all x , y E X does not depend on the particular basis chosen for ;X however, it does depend on the way the inner product is chosen for X at the outset. Now let G E L ( X , X ) , set g(x, y) = (x, Gy), and let h(x, y) = g(y, x) = (y, Gx) = (Gx, y). By Theorem 4.10.12, there exists a unique linear transformation, denote it by G*, such that h(x, y) = (x, G*y) for all ,x y E .X We call the linear transformation G* E L ( X , X ) the adjoint of G.
4.10.13. Theorem.

(i) For each G ∈ L(X, X), there is a unique G* ∈ L(X, X) such that (x, G*y) = (Gx, y) for all x, y ∈ X.
(ii) Let {e₁, ..., eₙ} be an orthonormal basis for X, and let G be the matrix of the linear transformation G ∈ L(X, X) with respect to this basis. Let G* be the matrix of G* with respect to {e₁, ..., eₙ}. Then G* = Gᵀ.

Proof. The proof of the first part follows from the discussion preceding the present theorem.

To prove the second part, let {e₁, ..., eₙ} be an orthonormal basis for X, and let G* denote the matrix of G* with respect to this basis. Let x and y be the coordinate representations of x and y, respectively, with respect to this basis. Then

(x, G*y) = xᵀG*y

and

(Gx, y) = (Gx)ᵀy = xᵀGᵀy.

Thus, for all x and y we have xᵀ(G* − Gᵀ)y = 0. Hence, G* = Gᵀ. ∎
The above result allows the following equivalent definition of the adjoint linear transformation. .4 10.14. Definition. eL t G is defined by the formula for all x, y
E
L(X,
X).
(x , G*y)
.X
E
=
The adjoint transformation, G* (Gx , y)
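With coordinates taken against an orthonormal basis, the defining relation (x, G*y) = (Gx, y) becomes the matrix identity (Gx)ᵀy = xᵀGᵀy of Theorem 4.10.13(ii). A small Python check (the helpers and the sample matrix are ours):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

G = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -2.0]
y = [0.5, 3.0]

# (Gx, y) = (x, G^T y): the adjoint's matrix is the transpose.
assert dot(matvec(G, x), y) == dot(x, matvec(transpose(G), y))
```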
Although there is obviously great similarity between the adjoint linear transformation and the transpose of a linear transformation, it should be noted that these two transformations constitute different concepts. The differences between these will become more apparent in our subsequent discussion of linear transformations defined on complex vector spaces in Chapter 7.

Our next result includes some of the elementary properties of the adjoint of linear transformations. The reader should compare these with the properties of the transpose of linear transformations.

4.10.15. Theorem. Let A, B ∈ L(X, X), let A*, B* denote their respective adjoints, and let α be a real scalar. Then

(i) (A*)* = A;
(ii) (A + B)* = A* + B*;
(iii) (αA)* = αA*;
(iv) (AB)* = B*A*;
(v) I* = I, where I denotes the identity transformation;
(vi) 0* = 0, where 0 denotes the null transformation;
(vii) A is non-singular if and only if A* is non-singular; and
(viii) if A is non-singular, then (A*)⁻¹ = (A⁻¹)*.

4.10.16. Exercise. Prove Theorem 4.10.15.
Our next result enables us to characterize orthogonal transformations in terms of their adjoints.

4.10.17. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A* = A⁻¹.

Proof. We have (Ax, Ay) = (A*Ax, y). But A is orthogonal if and only if (Ax, Ay) = (x, y) for all x, y ∈ X. Therefore,

(A*Ax, y) = (x, y)

for all x and y. From this it follows that A*A = I, which implies that A* = A⁻¹. ∎
The proof of the next theorem is left as an exercise.

4.10.18. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A⁻¹ is orthogonal, and A⁻¹ is orthogonal if and only if A* is orthogonal.

4.10.19. Exercise. Prove Theorem 4.10.18.
C. Self-Adjoint Transformations

Using adjoints, we now introduce two additional important types of linear transformations.

4.10.20. Definition. Let A ∈ L(X, X). Then A is said to be self-adjoint if A* = A, and it is said to be skew-adjoint if A* = −A.

Some of the properties of such transformations are as follows.

4.10.21. Theorem. Let A ∈ L(X, X). Let {e₁, ..., eₙ} be an orthonormal basis for X, and let A be the matrix of A with respect to this basis. The following are equivalent:

(i) A is self-adjoint;
(ii) A is symmetric; and
(iii) (Ax, y) = (x, Ay) for all x, y ∈ X.

4.10.22. Theorem. Let A ∈ L(X, X), and let {e₁, ..., eₙ} be an orthonormal basis for X. Let A be the matrix of A with respect to this basis. The following are equivalent:

(i) A is skew-adjoint;
(ii) A is skew-symmetric (see Definition 4.8.8); and
(iii) (Ax, y) = −(x, Ay) for all x, y ∈ X.

4.10.23. Exercise. Prove Theorems 4.10.21 and 4.10.22.
The following corollary follows from part (iii) of Theorem 4.10.22.

4.10.24. Corollary. Let A be as defined in Theorem 4.10.22. Then the following are equivalent:

(i) A is skew-symmetric;
(ii) (x, Ax) = 0 for all x ∈ X; and
(iii) Ax ⊥ x for all x ∈ X.

Our next result enables us to represent arbitrary linear transformations as the sum of self-adjoint and skew-adjoint transformations.

4.10.25. Corollary. Let A ∈ L(X, X). Then there exist unique A₁, A₂ ∈ L(X, X) such that A = A₁ + A₂, where A₁ is self-adjoint and A₂ is skew-adjoint.

4.10.26. Exercise. Prove Corollaries 4.10.24 and 4.10.25.

4.10.27. Exercise. Show that every real n × n matrix can be written in one and only one way as the sum of a symmetric and a skew-symmetric matrix.

Our next result is applicable to real as well as complex vector spaces.

4.10.28. Theorem. Let X be a complex vector space. Then the eigenvalues of a real symmetric matrix A are all real. (If all eigenvalues of A are positive (negative), then A is called positive (negative) definite.)

Proof. Let λ = r + is denote an eigenvalue of A, where r and s are real numbers and where i = √−1. We must show that s = 0. Since λ is an eigenvalue, we know that the matrix (A − λI) is singular. So is the matrix

B = [A − (r + is)I][A − (r − is)I] = A² − (r − is)A − (r + is)A + (r + is)(r − is)I
  = A² − 2rA + (r² + s²)I = (A − rI)² + s²I.

Since B is singular, there exists an x ≠ 0 such that Bx = 0. Also,

0 = xᵀBx = xᵀ[(A − rI)² + s²I]x.

Since A and I are symmetric, (A − rI)ᵀ = Aᵀ − rIᵀ = A − rI. Therefore,

0 = xᵀ(A − rI)ᵀ(A − rI)x + s²xᵀx = yᵀy + s²xᵀx,

where y = (A − rI)x. Now yᵀy = Σᵢ₌₁ⁿ ηᵢ² ≥ 0 and xᵀx = Σᵢ₌₁ⁿ ξᵢ² > 0, because by assumption x ≠ 0. Thus, we have

0 = yᵀy + s²(xᵀx) ≥ 0 + s²xᵀx.

The only way that this last relation can hold is if s = 0. Therefore, λ = r and λ is real. ∎
X ) with Now let A be the matrix of the linear transformation A E L ( X , respect to some basis. If A is symmetric, then all its eigenvalues are real. In this case A is self-adjoint and all its eigenvalues are also real; in fact, the eigenvalues of A and A are identical. Thus, there exist uniq u e real scalars AI' ... , Apt P < n, such that
U)
det (A -
det (A =
(AI -
AI) =
A)""(Az -
A)'"'
... (A, -
A)'".'
(4.10.29)
We summarize these observations in the following: Corollary. Let A E L ( X , )X . If A is self-adjoint, then all eigenvalues of A are real and there exist uniq u e real numbers AI" • ,A" p < n, such that Eq. (4.10.29) holds.
.4 10.30.
i
=
As in Section 4.5, we say that in Corollary 4.10.30 the eigenvalues A" 1, ... ,p < n, have algebraic multiplicities m i = 1, ... ,p, respectively. " is the following result. Another direct consequence of Theorem 4.10.28
4.10.31. Corollary. Let least one eigenvalue.
.4 10.32.
Exercise.
A E L(X,
If A is self-adjoint, then A has at
X).
Prove Corollary 4.10.31.
Let us now examine some of the properties of the eigenvalues and eigenvectors of self-adjoint linear transformations. First, we have:
.4 10.33.
Theorem. Let A E L ( X , X ) be a self-adjoint transformation, and let AI" .. ,Ap , p < n, denote the distinct eigenvalues of A. If ,X is an eigenvector for A, and if XI is an eigenvector for AI' then ,x .1. XI for all i j.
*
Proof Assume that A, A,andconsider AX I = ,x 0 and x , O. We have
*
A,(X Thus,
*
"
Since A,
x,) =
(A,X
"
)JX =
* AI' we have (XI'
Now let A
E
L(X,
X),
(Ax
"
)JX =
(XI'
Ax /) =
(x"
AJX /) =
(A, -
AJ)(X"
)JX
0, which means ,x .1. xI'
=
IX ) =
=
A,X , and Ax,
*
AJ"X
where
Aix " )J x '
O. _
and let A, be an eigenvalue of A. Recall that
~,
Chapter 4 I iF nite-Dimensional
224
Vector Spaces and Matrices
denotes the null space of the linear transformation A -
m= l
x{
E
= OJ.
:X (A - A Il)x
Recall also that ml is a linear subspace of .X have immediately:
A,l, i.e., (4.10.34)
F r om Theorem .4 10.33 we now
X ) be a self-adjoint transformation, and .4 10.35. Corollary. Let A E L ( X , let AI and Aj be eigenvalues of A. If AI *- Aj , then ml ..1 mj •
.4 10.36.
Exercise.
Prove Corollary .4 10.35.
Making use of Theorem .4 9.59, we now prove the following important result. X ) be a self-adjoint transformation, and .4 10.37. lbeorem. Let A E L ( X , let A\, ... , A" p < n, denote the distinct eigenvalues of A. Then
dim X
= n=
dim m\
+
+ ... +
dim mz
dim m,.
Proof. Let $\dim \mathfrak{N}_1 = n_1$, and let $\{e_1, \ldots, e_{n_1}\}$ be an orthonormal basis for $\mathfrak{N}_1$. Next, let $\{e_{n_1+1}, \ldots, e_{n_1+n_2}\}$ be an orthonormal basis for $\mathfrak{N}_2$. We continue in this manner, finally letting $\{e_{n_1+\cdots+n_{p-1}+1}, \ldots, e_{n_1+\cdots+n_p}\}$ be an orthonormal basis for $\mathfrak{N}_p$. Let $n_1 + \cdots + n_p = m$. Since $\mathfrak{N}_i \perp \mathfrak{N}_j$, $i \neq j$, it follows that the vectors $e_1, \ldots, e_m$, relabeled in an obvious way, are orthonormal in $X$. We can conclude, by Corollary 4.9.52, that these vectors are a basis for $X$ if we can prove that $m = n$. Let $Y$ be the linear subspace of $X$ generated by the orthonormal vectors $e_1, \ldots, e_m$. Then $\{e_1, \ldots, e_m\}$ is an orthonormal basis for $Y$ and $\dim Y = m$. Since $\dim Y + \dim Y^\perp = \dim X = n$ (see Theorem 4.9.59), we need only prove that $\dim Y^\perp = 0$. To this end let $x$ be an arbitrary vector in $Y^\perp$. Then $(x, e_1) = 0, \ldots, (x, e_m) = 0$; i.e., $x \perp e_1, \ldots, x \perp e_m$. So, in particular, we have $x \perp \mathfrak{N}_i$, $i = 1, \ldots, p$. Now let $y$ be in $\mathfrak{N}_i$. Then
$$(Ax, y) = (x, Ay) = (x, \lambda_i y) = \lambda_i(x, y) = 0,$$
since $A$ is self-adjoint, since $y$ is in $\mathfrak{N}_i$, and since $x \perp \mathfrak{N}_i$. Thus, $Ax \perp \mathfrak{N}_i$ for $i = 1, \ldots, p$, and hence $Ax \perp e_l$, $l = 1, \ldots, m$. Thus, $Ax \perp Y$. Therefore, for each $x \in Y^\perp$ we also have $Ax \in Y^\perp$. Hence, $A$ induces a linear transformation, say $A'$, from $Y^\perp$ into $Y^\perp$, where $A'x = Ax$ for all $x \in Y^\perp$. Now $A'$ is a self-adjoint linear transformation from $Y^\perp$ into $Y^\perp$, because for all $x$ and $y$ in $Y^\perp$ we have
$$(A'x, y) = (Ax, y) = (x, Ay) = (x, A'y).$$
Assume now that $\dim Y^\perp > 0$. Then by Corollary 4.10.31, $A'$ has an eigenvalue, say $\lambda_0$, and a corresponding eigenvector $x_0 \neq 0$. Thus, $x_0 \neq 0$ is in $Y^\perp$ and $A'x_0 = Ax_0 = \lambda_0 x_0$; i.e., $\lambda_0$ is also an eigenvalue of $A$, say $\lambda_0 = \lambda_i$. So now it follows that $x_0 \in \mathfrak{N}_i$. But from above, $x_0 \in Y^\perp$, which implies that $x_0 \perp \mathfrak{N}_i$. This implies that $x_0 \perp x_0$, or $(x_0, x_0) = 0$, which in turn means $x_0 = 0$. But this contradicts our earlier assumption that $x_0 \neq 0$. Hence, we have arrived at a contradiction, and it therefore follows that $\dim Y^\perp = 0$. This proves the theorem. ∎
Our next result is a direct consequence of Theorem 4.10.37.

4.10.38. Corollary. Let $A \in L(X, X)$. If $A$ is self-adjoint, then

(i) there exists an orthonormal basis in $X$ such that the matrix of $A$ with respect to this basis is diagonal; and
(ii) for each eigenvalue $\lambda_i$ of $A$ we have $\dim \mathfrak{N}_i =$ multiplicity of $\lambda_i$.
Proof. As in the proof of Theorem 4.10.37 we choose an orthonormal basis $\{e_1, \ldots, e_m\}$, where $m = n$. We have $Ae_1 = \lambda_1 e_1, \ldots, Ae_{n_1} = \lambda_1 e_{n_1}$, $Ae_{n_1+1} = \lambda_2 e_{n_1+1}, \ldots, Ae_{n_1+\cdots+n_p} = \lambda_p e_{n_1+\cdots+n_p}$. Thus, the matrix $\mathbf{A}$ of $A$ with respect to $\{e_1, \ldots, e_n\}$ is the diagonal matrix
$$\mathbf{A} = \mathrm{diag}(\underbrace{\lambda_1, \ldots, \lambda_1}_{n_1}, \underbrace{\lambda_2, \ldots, \lambda_2}_{n_2}, \ldots, \underbrace{\lambda_p, \ldots, \lambda_p}_{n_p}).$$
To prove the second part, we note that the characteristic polynomial of $A$ is
$$\det(\mathbf{A} - \lambda\mathbf{I}) = (\lambda_1 - \lambda)^{n_1}(\lambda_2 - \lambda)^{n_2} \cdots (\lambda_p - \lambda)^{n_p},$$
and, hence, $n_i = \dim \mathfrak{N}_i =$ multiplicity of $\lambda_i$, $i = 1, \ldots, p$. ∎

Another consequence of Theorem 4.10.37 is the following:
4.10.39. Corollary. Let $\mathbf{A}$ be a real $(n \times n)$ symmetric matrix. Then there exists an orthogonal matrix $\mathbf{P}$ such that the matrix $\mathbf{A}'$ defined by $\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P} = \mathbf{P}^T\mathbf{A}\mathbf{P}$ is diagonal.

4.10.40. Exercise. Prove Corollary 4.10.39.
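Corollary 4.10.39 is exactly what numerical eigensolvers for symmetric matrices deliver. A quick check, assuming NumPy is available (the matrix is an arbitrary illustrative choice, not from the text):

```python
import numpy as np

# An arbitrary real symmetric matrix (illustrative choice).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns the eigenvalues and an orthogonal matrix P of eigenvectors.
w, P = np.linalg.eigh(A)

assert np.allclose(P.T @ P, np.eye(3))        # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(w))   # P^{-1} A P = P^T A P is diagonal
```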
For symmetric bilinear functionals defined on Euclidean vector spaces we have the following result.

4.10.41. Corollary. Let $f(x, y)$ be a symmetric bilinear functional on $X$. Then there exists an orthonormal basis for $X$ such that the matrix of $f$ with respect to this basis is diagonal.

Proof. By Theorem 4.10.12 there exists an $F \in L(X, X)$ such that $f(x, y) = (x, Fy)$ for all $x, y \in X$. Since $f$ is symmetric, $f(y, x) = f(x, y) = (y, Fx) = (x, Fy) = (Fx, y)$ for all $x, y \in X$, and thus, by Theorem 4.10.21, $F$ is self-adjoint. Hence, by Corollary 4.10.38, there is an orthonormal basis for $X$ such that the matrix of $F$ is diagonal. By Theorem 4.10.12, this matrix is also the representation of $f$ with respect to the same basis. ∎
The proof of the next result is left as an exercise.

4.10.42. Corollary. Let $f(x)$ be a quadratic form defined on $X$. Then there exists an orthonormal basis for $X$ such that if $x^T = (\xi_1, \ldots, \xi_n)$ is the coordinate representation of $x$ with respect to this basis, then $f(x) = \alpha_1\xi_1^2 + \cdots + \alpha_n\xi_n^2$ for some real scalars $\alpha_1, \ldots, \alpha_n$.

4.10.43. Exercise. Prove Corollary 4.10.42.
Next, we state and prove the spectral theorem for self-adjoint linear transformations. First, we recall that a transformation $P \in L(X, X)$ is a projection on a linear subspace of $X$ if and only if $P^2 = P$ (see Theorem 3.7.4). Also, for any projection $P$, $X = \mathfrak{R}(P) \oplus \mathfrak{N}(P)$, where $\mathfrak{R}(P)$ is the range of $P$ and $\mathfrak{N}(P)$ is the null space of $P$ (see Eq. (3.7.8)). Furthermore, recall that a projection $P$ is called an orthogonal projection if $\mathfrak{R}(P) \perp \mathfrak{N}(P)$ (see Definition 3.7.16).
4.10.44. Theorem. Let $A \in L(X, X)$ be a self-adjoint transformation, let $\lambda_1, \ldots, \lambda_p$ denote the distinct eigenvalues of $A$, and let $\mathfrak{N}_i$ be the null space of $A - \lambda_i I$ (see Eq. (4.10.34)). For each $i = 1, \ldots, p$, let $P_i$ denote the projection on $\mathfrak{N}_i$ along $\mathfrak{N}_i^\perp$. Then

(i) $P_i$ is an orthogonal projection for each $i = 1, \ldots, p$;
(ii) $P_iP_j = 0$ for $i \neq j$, $i, j = 1, \ldots, p$;
(iii) $\sum_{j=1}^{p} P_j = I$, where $I \in L(X, X)$ denotes the identity transformation; and
(iv) $A = \sum_{j=1}^{p} \lambda_j P_j$.
Proof. To prove the first part, note that $X = \mathfrak{N}_i \oplus \mathfrak{N}_i^\perp$, $i = 1, \ldots, p$, by Theorem 4.9.59. Thus, by Theorem 3.7.3, $\mathfrak{R}(P_i) = \mathfrak{N}_i$ and $\mathfrak{N}(P_i) = \mathfrak{N}_i^\perp$, and hence, $P_i$ is an orthogonal projection.

To prove the second part, let $i \neq j$ and let $x \in X$. Then $P_jx \triangleq x_j \in \mathfrak{N}_j$. Since $\mathfrak{R}(P_i) = \mathfrak{N}_i$ and since $\mathfrak{N}_i \perp \mathfrak{N}_j$, we must have $x_j \in \mathfrak{N}(P_i)$; i.e., $P_iP_jx = 0$ for all $x \in X$.

To prove the third part, let $P = \sum_{i=1}^{p} P_i$. We must show that $P = I$. To do so, we first show that $P$ is a projection. This follows immediately from the fact that for arbitrary $x \in X$,
$$P^2x = (P_1 + \cdots + P_p)(P_1x + \cdots + P_px) = P_1^2x + \cdots + P_p^2x,$$
because $P_iP_j = 0$ for $i \neq j$. Hence, $P^2x = (P_1 + \cdots + P_p)x = Px$, and thus $P$ is a projection. Next, we show that $\dim[\mathfrak{R}(P)] = n$. It is straightforward to show that
$$\dim[\mathfrak{R}(P)] = \sum_{i=1}^{p} \dim[\mathfrak{N}_i].$$
But by Theorem 4.10.37,
$$\sum_{i=1}^{p} \dim[\mathfrak{N}_i] = n,$$
and thus $\dim[\mathfrak{R}(P)] = n$. Since $X = \mathfrak{R}(P) \oplus \mathfrak{N}(P)$, we conclude that $\mathfrak{R}(P) = X$. Finally, since $P$ is a projection with range $X$, we conclude that $Px = x$ for all $x \in X$; i.e., $P = I$.

To prove the last part of the theorem, let $x \in X$. From part (iii) we have
$$x = P_1x + P_2x + \cdots + P_px.$$
Let $x_i = P_ix$ for $i = 1, \ldots, p$. Then $x_i \in \mathfrak{N}_i$ and
$$Ax = A(x_1 + \cdots + x_p) = Ax_1 + \cdots + Ax_p = \lambda_1x_1 + \cdots + \lambda_px_p = (\lambda_1P_1 + \cdots + \lambda_pP_p)x,$$
which concludes the proof of the theorem. ∎
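The conclusions of Theorem 4.10.44 can be checked numerically on a concrete symmetric matrix: group the orthonormal eigenvectors returned by an eigensolver by distinct eigenvalue, form the projections $P_i$, and verify (ii)–(iv). A sketch, assuming NumPy is available (the matrix and tolerances are illustrative choices):

```python
import numpy as np

# A symmetric matrix with a repeated eigenvalue, so one eigenspace is 2-dimensional.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])
w, V = np.linalg.eigh(A)          # orthonormal eigenvectors as columns of V

# Group eigenvectors by (numerically) distinct eigenvalue and form the
# orthogonal projection P_i onto each null space of A - lambda_i I.
distinct = np.unique(np.round(w, 8))
projs = []
for lam in distinct:
    cols = V[:, np.abs(w - lam) < 1e-8]
    projs.append(cols @ cols.T)   # P_i = sum of e e^T over the eigenspace basis

# (ii) P_i P_j = 0 for i != j, (iii) sum P_i = I, (iv) A = sum lambda_i P_i.
assert np.allclose(projs[0] @ projs[1], np.zeros((3, 3)))
assert np.allclose(sum(projs), np.eye(3))
assert np.allclose(sum(l * P for l, P in zip(distinct, projs)), A)
```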
Any set of linear transformations $\{P_1, \ldots, P_p\}$ satisfying parts (i)–(iii) of Theorem 4.10.44 is said to be a resolution of the identity in the setting of a Euclidean space. We shall give a more general definition of this concept in Chapter 7.

D. Some Examples
At this point it is appropriate to consider some specific cases.

4.10.45. Example. Let $X = E^2$, let $A \in L(X, X)$, and let $\{e_1, e_2\}$ be an arbitrary basis for $X$. Suppose that
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
is the matrix of $A$ with respect to the basis $\{e_1, e_2\}$. Let $x \in E^2$, and let $x^T = (\xi_1, \xi_2)$ denote the coordinate representation of $x$ with respect to this basis. Then $\mathbf{A}x$ is the coordinate representation of $Ax$ with respect to this basis, and we have
$$\mathbf{A}x = \begin{bmatrix} a_{11}\xi_1 + a_{12}\xi_2 \\ a_{21}\xi_1 + a_{22}\xi_2 \end{bmatrix} = y.$$
This transformation is depicted pictorially in Figure F. Now assume that $A$ is a self-adjoint linear transformation. Then there exists an orthonormal basis $\{e_1', e_2'\}$ such that
$$Ae_1' = \lambda_1 e_1', \qquad Ae_2' = \lambda_2 e_2',$$

4.10.46. Figure F

4.10.47. Figure G
where $\lambda_1$ and $\lambda_2$ denote the eigenvalues of $A$. Suppose that the coordinates of $x$ with respect to $\{e_1', e_2'\}$ are $\xi_1'$ and $\xi_2'$, respectively. Then
$$Ax = A(\xi_1'e_1' + \xi_2'e_2') = \xi_1'Ae_1' + \xi_2'Ae_2' = \xi_1'\lambda_1e_1' + \xi_2'\lambda_2e_2';$$
i.e., the coordinate representation of $Ax$ with respect to $\{e_1', e_2'\}$ is $(\lambda_1\xi_1', \lambda_2\xi_2')$. Thus, in order to determine $Ax$, we merely "stretch" or "compress" the coordinates $\xi_1'$, $\xi_2'$ along lines colinear with $e_1'$ and $e_2'$, respectively. This is illustrated in Figure G. ∎

4.10.48. Example. Consider a transformation $R$ from $E^2$ into $E^2$ which rotates vectors as shown in Figure H. By inspection we can characterize $R$, with respect to the indicated orthonormal basis $\{e_1, e_2\}$, as
$$Re_1 = \cos\theta\, e_1 + \sin\theta\, e_2, \qquad Re_2 = -\sin\theta\, e_1 + \cos\theta\, e_2.$$
4.10.49. Figure H (unit circle)
The reader can readily verify that $R$ is indeed a linear transformation. The matrix of $R$ with respect to this basis is
$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
By direct computation we can verify that
$$\mathbf{R}_\theta^T = \mathbf{R}_\theta^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},$$
and, moreover, that $\det \mathbf{R}_\theta = \cos^2\theta + \sin^2\theta = 1$. Thus, $R$ is indeed a rotation as defined in Definition 4.10.11. For the matrix $\mathbf{R}_\theta$ we also note that $\mathbf{R}_0 = \mathbf{I}$, $\mathbf{R}_\theta^{-1} = \mathbf{R}_{-\theta}$, and $\mathbf{R}_\theta\mathbf{R}_\phi = \mathbf{R}_{\theta+\phi}$. ∎
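The matrix identities just listed can be confirmed numerically; a sketch assuming NumPy is available (the angle values are arbitrary):

```python
import numpy as np

def R(theta):
    # Matrix of the rotation R_theta from Example 4.10.48.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

t, s = 0.7, 1.9
assert np.allclose(R(t).T, np.linalg.inv(R(t)))   # R_theta^T = R_theta^{-1}
assert np.isclose(np.linalg.det(R(t)), 1.0)       # det R_theta = 1
assert np.allclose(R(0.0), np.eye(2))             # R_0 = I
assert np.allclose(np.linalg.inv(R(t)), R(-t))    # R_theta^{-1} = R_{-theta}
assert np.allclose(R(t) @ R(s), R(t + s))         # R_theta R_phi = R_{theta+phi}
```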
4.10.50. Example. Consider now a transformation $A$ from $E^3$ into $E^3$, as depicted in Figure J. The vectors $e_1, e_2, e_3$ form an orthonormal basis for $E^3$.

4.10.51. Figure J (the plane $Z$ and the rotation about $e_3$ are indicated)

The plane $Z$ is spanned by $e_1$ and $e_2$. This transformation accomplishes a rotation about the vector $e_3$ in the plane $Z$. By inspection of Figure J it is clear that this transformation is characterized by the set of equations
$$\begin{aligned} Ae_1 &= \cos\theta\, e_1 + \sin\theta\, e_2 + 0 \cdot e_3, \\ Ae_2 &= -\sin\theta\, e_1 + \cos\theta\, e_2 + 0 \cdot e_3, \\ Ae_3 &= 0 \cdot e_1 + 0 \cdot e_2 + 1 \cdot e_3. \end{aligned}$$
The reader can readily verify that $A$ is a linear transformation. The matrix
of $A$ with respect to the basis $\{e_1, e_2, e_3\}$ is
$$\mathbf{A} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
For this transformation the following facts are immediately evident (assume $\sin\theta \neq 0$): (a) $e_3$ is an eigenvector with eigenvalue 1; (b) plane $Z$ is a linear subspace of $E^3$; (c) $Ax \in Z$ whenever $x \in Z$; (d) the set $Y_+$ is a linear subspace of $E^3$; (e) $Ax \in Y_+$ whenever $x \in Y_+$; (f) $Z \perp Y_+$; and (g) $\dim Y_+ = 1$, $\dim Z = 2$, and $\dim Y_+ + \dim Z = \dim E^3$. ∎
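Facts (a), (c), and (f) can be verified directly on the matrix above; a sketch assuming NumPy is available (the angle and test vector are arbitrary):

```python
import numpy as np

th = 0.6
A = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0,         0.0,        1.0]])

e3 = np.array([0.0, 0.0, 1.0])
assert np.allclose(A @ e3, e3)        # (a) e3 is an eigenvector with eigenvalue 1

x = np.array([1.5, -2.0, 0.0])        # x lies in the plane Z = span{e1, e2}
assert np.isclose((A @ x)[2], 0.0)    # (c) Ax stays in Z
assert np.isclose(np.dot(x, e3), 0.0) # (f) Z is orthogonal to span{e3}
```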
E. Further Properties of Orthogonal Transformations

The preceding example motivates several of our subsequent results. Let $A \in L(X, X)$. We recall that a linear subspace $Y$ of $X$ is invariant under $A$ if $Ax \in Y$ whenever $x \in Y$. We now prove the following:

4.10.52. Theorem. Let $A \in L(X, X)$ be an orthogonal transformation. Then

(i) the only possible real eigenvalues of $A$, if there are any, are $+1$ and $-1$;
(ii) if $Y$ is a linear subspace of $X$ which is invariant under $A$, then the restriction $A'$ of $A$ to $Y$ is an orthogonal transformation from $Y$ into $Y$; and
(iii) if $Y$ is a linear subspace of $X$ which is invariant under $A$, then $Y^\perp$ is also a linear subspace of $X$ which is invariant under $A$.
Proof. To prove the first part, assume that $A$ has a real eigenvalue, say $\lambda_0$. (The definition of eigenvalue of $A \in L(X, X)$ excludes the possibility of complex eigenvalues, since $X$ is a vector space over the field $R$ of real numbers.) Then $Ax = \lambda_0 x$ for some $x \neq 0$, and
$$|Ax| = |\lambda_0 x| = |\lambda_0|\,|x|.$$
But $|Ax| = |x|$, because $A$ is by assumption an orthogonal linear transformation. Therefore, $|\lambda_0| = 1$, and we have $\lambda_0 = +1$ or $-1$.

To prove the second part, assume that $Y$ is invariant under $A$. Then $Ax \in Y$ whenever $x \in Y$, and thus the restriction $A'$ of $A$ to $Y$, defined by $A'x = Ax$ for all $x$ in $Y$, is clearly a linear transformation of $Y$ into $Y$. Now, trivially, for all $x$ in $Y$ we have
$$|A'x| = |Ax| = |x|,$$
since $A \in L(X, X)$ is an orthogonal transformation. Therefore, $A'$ is an orthogonal transformation from $Y$ into $Y$.

To prove the last part, let $Y$ be an invariant subspace of $X$ under $A$. Then $x \in Y^\perp$ if and only if $x \perp y$ for all $y \in Y$. Suppose then that $x \in Y^\perp$ and consider $Ax$. Then for each $y \in Y$ we have
$$(Ax, y) = (x, A^*y) = (x, A^{-1}y),$$
because $A$ is orthogonal. But $A^{-1}y$ is also in $Y$, for the following reasons. The restriction $A'$ of $A$ to $Y$ is orthogonal on $Y$ by part (ii) and is therefore a non-singular transformation from $Y$ into $Y$. Hence, $(A')^{-1}$ exists and, moreover, $(A')^{-1}$ must be a transformation from $Y$ into $Y$. Thus, $(A')^{-1}y = A^{-1}y$ and $A^{-1}y$ is in $Y$. We finally have
$$(Ax, y) = (x, A^{-1}y) = 0$$
for each $y$ in $Y$. Thus, $Ax \in Y^\perp$ whenever $x \in Y^\perp$. This proves that $Y^\perp$ is invariant under $A$. ∎
We also have:

4.10.53. Theorem. Let $A \in L(X, X)$ be an orthogonal transformation, let $Y_+$ denote the set of all $x \in X$ such that $Ax = x$, and let $Y_-$ denote the set of all $x \in X$ such that $Ax = -x$. Then $Y_+$ and $Y_-$ are linear subspaces of $X$ and $Y_+ \perp Y_-$.

Proof. Since $Y_+ = \mathfrak{N}(A - I)$ and $Y_- = \mathfrak{N}(A + I)$, it follows that $Y_+$ and $Y_-$ are linear subspaces of $X$. Now let $x \in Y_+$ and let $y \in Y_-$. Then
$$(x, y) = (Ax, Ay) = (x, -y) = -(x, y),$$
which implies that $(x, y) = 0$. Therefore, $x \perp y$ and $Y_+ \perp Y_-$. ∎
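Theorem 4.10.53 can be illustrated with a concrete reflection: for $A = I - 2uu^T$ with $|u| = 1$ (a Householder reflection, used here only as an illustrative orthogonal transformation), $Y_-$ is the span of $u$ and $Y_+$ is the plane orthogonal to $u$. A sketch assuming NumPy is available:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0]) / 3.0            # a unit vector (illustrative)
A = np.eye(3) - 2.0 * np.outer(u, u)           # reflection: orthogonal and symmetric
assert np.allclose(A.T @ A, np.eye(3))         # A is orthogonal

# Y_- = span{u}: Au = -u.  Y_+ = plane orthogonal to u: Ax = x there.
assert np.allclose(A @ u, -u)
x = np.array([2.0, -1.0, 0.0])                 # x . u = 0, so x lies in Y_+
assert np.isclose(np.dot(x, u), 0.0)           # and indeed Y_+ is orthogonal to Y_-
assert np.allclose(A @ x, x)
```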
Using the above theorem we can now prove the following result.

4.10.54. Corollary. Let $A$, $Y_+$, and $Y_-$ be defined as in Theorem 4.10.53, and let $Z$ denote the set of all $x \in X$ such that $x \perp Y_+$ and $x \perp Y_-$. Then $Z$ is a linear subspace of $X$ and $\dim Y_+ + \dim Y_- + \dim Z = \dim X = n$. Furthermore, the restriction of $A$ to $Z$ has no (real) eigenvalues.

Proof. Let $\{e_1, \ldots, e_{n_1}\}$ be an orthonormal basis for $Y_+$, and let $\{e_{n_1+1}, \ldots, e_{n_1+n_2}\}$ be an orthonormal basis for $Y_-$, where $\dim Y_+ = n_1$ and $\dim Y_- = n_2$. Then the set $\{e_1, \ldots, e_{n_1+n_2}\}$ is orthonormal. Let $Y$ denote the linear subspace generated by $\{e_1, \ldots, e_{n_1+n_2}\}$. Then $\dim Y = n_1 + n_2$. By the definition of $Z$ and by Theorem 4.9.59 we have $Z = Y^\perp$, and thus $Z$ is a linear subspace of $X$. Therefore,
$$n = \dim X = \dim Y + \dim Y^\perp = n_1 + n_2 + \dim Z = \dim Y_+ + \dim Y_- + \dim Z,$$
which was to be shown.

To prove the second assertion, let $A'$ denote the restriction of $A$ to $Z$. Suppose there exists a non-zero vector $x \in Z$ such that $A'x = \lambda_0 x$. Since $A'$ is orthogonal by part (ii) of Theorem 4.10.52, we have $\lambda_0 = \pm 1$ by part (i) of Theorem 4.10.52. Thus, $x$ is either in $Y_+$ or in $Y_-$. But by assumption, $x \in Z$ and $Z \perp Y_+$ and $Z \perp Y_-$. Therefore, $x = 0$, a contradiction to our earlier assumption. Hence, the restriction $A'$ of $A$ to $Z$ cannot have a real eigenvalue. ∎
Our next result is concerned with orthogonal transformations on two-dimensional Euclidean spaces.

4.10.55. Theorem. Let $A \in L(X, X)$ be an orthogonal transformation, where $\dim X = 2$.

(i) If $\det A = +1$ (i.e., $A$ is a rotation), there exists some real $\theta$ such that for every orthonormal basis $\{e_1, e_2\}$ the corresponding matrix of $A$ is
$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{4.10.56}$$
(ii) If $\det A = -1$ (i.e., $A$ is a reflection), there exists some orthonormal basis $\{e_1, e_2\}$ such that the matrix of $A$ with respect to this basis is
$$\mathbf{Q} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \tag{4.10.57}$$

Proof. To prove the first part, assume that $\det A = +1$ and choose an arbitrary orthonormal basis $\{e_1, e_2\}$. Let
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
denote the matrix of $A$ with respect to this basis. Then, since $A$ is orthogonal, so is $\mathbf{A}$, and we have
$$\mathbf{A}^T\mathbf{A} = \mathbf{I} \tag{4.10.58}$$
and
$$\det \mathbf{A} = 1. \tag{4.10.59}$$
Solving Eqs. (4.10.58) and (4.10.59) (we leave the details to the reader) yields $a_{11} = \cos\theta$, $a_{12} = -\sin\theta$, $a_{21} = \sin\theta$, and $a_{22} = \cos\theta$.

To prove the second part, assume that $A$ is orthogonal and that $\det A = -1$. Consider the characteristic polynomial of $A$,
$$p(\lambda) = \lambda^2 + \alpha_1\lambda + \alpha_0.$$
Since $\det A = -1$ we have $\alpha_0 = -1$. Solving for $\lambda_1$ and $\lambda_2$ we have
$$\lambda_1, \lambda_2 = \frac{-\alpha_1 \pm \sqrt{\alpha_1^2 + 4}}{2},$$
which implies that both $\lambda_1$ and $\lambda_2$ are real and that $\lambda_1 \neq \lambda_2$. From Theorem 4.10.52 these eigenvalues are $+1$ and $-1$. Therefore, there exists an orthonormal basis such that the matrix of $A$ with respect to this basis is
$$\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \quad ∎$$
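Part (ii) can be checked numerically: a $2 \times 2$ orthogonal matrix with determinant $-1$ turns out to be symmetric and has eigenvalues $+1$ and $-1$. A sketch assuming NumPy is available (the angle is arbitrary):

```python
import numpy as np

th = 1.1                                       # arbitrary angle
Q = np.array([[np.cos(th),  np.sin(th)],
              [np.sin(th), -np.cos(th)]])      # orthogonal, det = -1
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.isclose(np.linalg.det(Q), -1.0)

w = np.linalg.eigvalsh(Q)                      # Q happens to be symmetric here
assert np.allclose(np.sort(w), [-1.0, 1.0])    # eigenvalues are -1 and +1
```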
In the above proof we have $e_1 \in Y_+$ and $e_2 \in Y_-$, in view of Theorem 4.10.53. Also, from the preceding theorem it is clear that if $A$ is orthogonal, then (a) if $\det A = 1$, then $\det(\mathbf{A} - \lambda\mathbf{I}) = 1 - 2\lambda\cos\theta + \lambda^2$, and (b) if $\det A = -1$, then $\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - 1$.

4.10.60. Theorem. Let $A \in L(X, X)$ be an orthogonal transformation having no (real) eigenvalues. Then there exist linear subspaces $Y_1, \ldots, Y_r$ of $X$ such that

(i) $\dim Y_i = 2$, $i = 1, \ldots, r$;
(ii) $Y_i \perp Y_j$ for all $i \neq j$;
(iii) $\dim Y_1 + \cdots + \dim Y_r = \dim X = n$; and
(iv) each subspace $Y_i$ is invariant under $A$; in fact, the restriction of $A$ to $Y_i$ is a non-trivial rotation (i.e., for the matrix given by Eq. (4.10.56) we have $\theta \neq k\pi$, $k = 0, 1, 2, \ldots$).
Proof. Since by assumption $A$ does not have any (real) eigenvalues, we have
$$\det(A - \lambda I) = (\alpha_1 + \beta_1\lambda + \lambda^2) \cdots (\alpha_r + \beta_r\lambda + \lambda^2),$$
where the $\alpha_i$, $\beta_i$, $i = 1, \ldots, r$, are real (i.e., $\det(A - \lambda I)$ does not have any linear factors $(\lambda_i - \lambda)$ with $\lambda_i$ real). Solving the first quadratic factor we have
$$\lambda_1 = \frac{-\beta_1 + \sqrt{\beta_1^2 - 4\alpha_1}}{2} \quad \text{and} \quad \lambda_2 = \frac{-\beta_1 - \sqrt{\beta_1^2 - 4\alpha_1}}{2},$$
where $\lambda_1$ and $\lambda_2$ are complex. By Theorem 4.5.33, part (iv), if $f(\cdot)$ is any polynomial function, then $f(\lambda_1)$ will be an eigenvalue of $f(A)$. In particular, if $f(\lambda) = \alpha_1 + \beta_1\lambda + \lambda^2$, we know that one of the eigenvalues of the linear transformation $\alpha_1 I + \beta_1 A + A^2$ will be $\alpha_1 + \beta_1\lambda_1 + \lambda_1^2 = 0$, by choice. Thus, the linear transformation $(\alpha_1 I + \beta_1 A + A^2)$ has 0 as an eigenvalue. Therefore, there exists a vector $f_1 \neq 0$ in $X$ such that
$$(\alpha_1 I + \beta_1 A + A^2)f_1 = 0 \cdot f_1 = 0. \tag{4.10.61}$$
Now let $f_2 = Af_1$. We assert that $f_1$ and $f_2$ are linearly independent. For if they were not, we would have $f_2 = \eta f_1 = Af_1$, where $\eta$ is a real scalar, and $f_1$ would be an eigenvector corresponding to a real eigenvalue $\eta$ of $A$, which is impossible by hypothesis. Next, let $Y_1$ be the linear subspace of $X$ generated by $f_1$ and $f_2$. Then $Y_1$ is two-dimensional. We now show that $Y_1$ is invariant under $A$. Let $x \in Y_1$. Then
$$x = \xi_1 f_1 + \xi_2 f_2$$
for some $\xi_1$ and $\xi_2$, and
$$Ax = \xi_1 Af_1 + \xi_2 Af_2 = \xi_1 Af_1 + \xi_2 A^2 f_1.$$
But from Eq. (4.10.61) it follows that
$$A^2 f_1 = -\alpha_1 f_1 - \beta_1 Af_1.$$
Thus,
$$Ax = \xi_1 Af_1 + \xi_2(-\alpha_1 f_1 - \beta_1 Af_1) = -\xi_2\alpha_1 f_1 + (\xi_1 - \xi_2\beta_1)Af_1 = -\xi_2\alpha_1 f_1 + (\xi_1 - \xi_2\beta_1)f_2,$$
which shows that $Ax \in Y_1$ whenever $x \in Y_1$. Thus, $Y_1$ is invariant under $A$.

By Theorem 4.10.52, the restriction $A'$ of $A$ to $Y_1$ is an orthogonal transformation from $Y_1$ into $Y_1$. This restriction cannot have any (real) eigenvalues, for then $A$ would also have (real) eigenvalues. From Theorem 4.10.55, $A'$ cannot be a reflection, for in that case $A'$ would have eigenvalues equal to $1$ and $-1$. Moreover, $A'$ cannot be a trivial rotation, for then the eigenvalues of $A'$ would be equal to $1$ if $\theta = 0°$ and $-1$ if $\theta = 180°$. But from Corollary 4.10.8 we know that if $A$ is orthogonal, then $\det A = \pm 1$. Therefore, it follows now from Theorem 4.10.55 that the restriction of $A$ to $Y_1$ is a non-trivial rotation.

Now let $Z_1 = Y_1^\perp$. Since $Y_1$ is invariant under $A$, so is $Z_1$, by Theorem 4.10.52, part (iii), and $\dim Z_1 = \dim X - 2$. The restriction $A_1$ of $A$ to $Z_1$ is an orthogonal transformation from $Z_1$ into $Z_1$, and it cannot have any (real) eigenvalues. Applying the argument already given for $A$ and $X$ now to $A_1$ and $Z_1$, we can conclude that there exists a two-dimensional linear subspace $Y_2$ of $Z_1$ such that the restriction of $A_1$ to $Y_2$ is a non-trivial rotation. Now since $Y_2$ is contained in $Z_1$ and since by definition $Z_1 = Y_1^\perp$, we have $Y_1 \perp Y_2$. Next, let $Z_2$ be the linear subspace which is orthogonal to both $Y_1$ and $Y_2$, and let $A_2$ be the restriction of $A$ to $Z_2$. Repeating the argument given thus far, we can conclude that there exists a two-dimensional linear subspace $Y_3$ of $Z_2$ such that the restriction of $A_2$ to $Y_3$ is a non-trivial rotation and such that $Y_2 \perp Y_3$ and $Y_1 \perp Y_3$. To conclude the proof of the theorem, we continue the above process until we have exhausted the original space $X$. ∎
Combining Theorems 4.10.53 and 4.10.60, we obtain the following:

4.10.62. Corollary. Let $A \in L(X, X)$ be an orthogonal linear transformation. Then there exist linear subspaces $Y_+$, $Y_-$, $Y_1, \ldots, Y_r$ of $X$ such that

(i) all of the above linear subspaces are orthogonal to one another;
(ii) $n = \dim X = \dim Y_+ + \dim Y_- + \dim Y_1 + \cdots + \dim Y_r$;
(iii) $x \in Y_+$ if and only if $Ax = x$;
(iv) $x \in Y_-$ if and only if $Ax = -x$; and
(v) the restriction of $A$ to each $Y_i$, $i = 1, \ldots, r$, is a non-trivial rotation.

Since in the above corollary the dimension of each $Y_i$, $i = 1, \ldots, r$, is two, we have the following additional result.

4.10.63. Corollary. If in Corollary 4.10.62 $\dim X$ is odd, then $A$ has a real eigenvalue.
We leave the proof of the next result as an exercise.

4.10.64. Theorem. If $A$ is an orthogonal transformation from $X$ into $X$, then the characteristic polynomial of $A$ is of the form
$$\det(A - \lambda I) = (1 - \lambda)^{n_+}(-1 - \lambda)^{n_-}(1 - 2\lambda\cos\theta_1 + \lambda^2) \cdots (1 - 2\lambda\cos\theta_r + \lambda^2),$$
where $n_+ = \dim Y_+$ and $n_- = \dim Y_-$ in the notation of Corollary 4.10.62. Moreover, there exists an orthonormal basis $\{e_1, \ldots, e_n\}$ of $X$ such that the matrix $\mathbf{A}$ of $A$ with respect to this basis is of the block diagonal form
$$\mathbf{A} = \mathrm{diag}\left( \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \ldots, \begin{bmatrix} \cos\theta_r & -\sin\theta_r \\ \sin\theta_r & \cos\theta_r \end{bmatrix}, \underbrace{-1, \ldots, -1}_{n_-}, \underbrace{1, \ldots, 1}_{n_+} \right).$$
4.10.65. Exercise. Prove Theorem 4.10.64.
In our next result the canonical form of skew-adjoint linear transformations is established.

4.10.66. Theorem. Let $A$ be a skew-adjoint linear transformation from $X$ into $X$. Then there exists an orthonormal basis $\{e_1, \ldots, e_n\}$ such that the matrix $\mathbf{A}$ of $A$ with respect to this basis is of the block diagonal form
$$\mathbf{A} = \mathrm{diag}\left( \begin{bmatrix} 0 & \nu_1 \\ -\nu_1 & 0 \end{bmatrix}, \ldots, \begin{bmatrix} 0 & \nu_r \\ -\nu_r & 0 \end{bmatrix}, 0, \ldots, 0 \right),$$
where the $\nu_i$, $i = 1, \ldots, r$, are real, where some of the $\nu_i$ may be zero, and where the remaining diagonal entries, if any, are zero.

4.10.67. Exercise. Prove Theorem 4.10.66.
Before closing the present section, we briefly introduce so-called "normal transformations." We will have quite a bit more to say about such transformations and their representation in Chapter 7.

4.10.68. Definition. A transformation $A \in L(X, X)$ is said to be a normal linear transformation if $A^*A = AA^*$.

Some of the properties of such transformations are as follows.

4.10.69. Theorem. Let $A \in L(X, X)$. Then

(i) if $A$ is a self-adjoint transformation, then it is also a normal transformation;
(ii) if $A$ is a skew-adjoint transformation, then it is also a normal transformation;
(iii) if $A$ is an orthogonal transformation, then it is also a normal transformation; and
(iv) if $A$ is a normal linear transformation, then there exists an orthonormal basis $\{e_1, \ldots, e_n\}$ of $X$ such that the matrix $\mathbf{A}$ of $A$ with respect to this basis is of the block diagonal form
$$\mathbf{A} = \mathrm{diag}\left( \begin{bmatrix} \alpha_1 & \beta_1 \\ -\beta_1 & \alpha_1 \end{bmatrix}, \ldots, \begin{bmatrix} \alpha_r & \beta_r \\ -\beta_r & \alpha_r \end{bmatrix}, \lambda_{2r+1}, \ldots, \lambda_n \right),$$
where the $\alpha_i$, $\beta_i$, and $\lambda_j$ are real.
The proofs of parts (i)–(iii) follow from the definitions of normal, self-adjoint, skew-adjoint, and orthogonal linear transformations. To prove part (iv), let $A = A_1 + A_2$, where $A_1 = \frac{1}{2}(A + A^*)$ and $A_2 = \frac{1}{2}(A - A^*)$, and note that $A_1$ is self-adjoint and $A_2$ is skew-adjoint. This representation is unique by Corollary 4.10.25. Making use of Theorem 4.10.66 and Corollary 4.10.38, we obtain the desired result. We leave the details of the proof of this theorem as an exercise.

4.10.70. Exercise. Prove Theorem 4.10.69.

4.11. APPLICATIONS TO ORDINARY DIFFERENTIAL EQUATIONS
In the present section we present applications of the material covered in the present chapter and the preceding chapter. Because of their importance in almost all branches of science and engineering, we consider some topics in ordinary differential equations. Specifically, we concern ourselves with initial-value problems described by ordinary differential equations. The present section is divided into two parts. In subsection A we define the initial-value problem, while in subsection B we treat linear initial-value problems. At the end of the next chapter, we will continue our discussion of ordinary differential equations.

A. Initial-Value Problem: Definition
Let $R$ denote the set of real numbers, and let $D \subset R^2$ be a domain (i.e., $D$ is an open and connected subset of $R^2$). We will call $R^2$ the $(t, x)$ plane. Let $f$ be a real-valued function which is defined and continuous on $D$, and let $\dot{x} \triangleq dx/dt$ (i.e., $\dot{x}$ denotes the derivative of $x$ with respect to $t$). We call
$$\dot{x} = f(t, x) \tag{4.11.1}$$
an ordinary differential equation of the first order. Let $T = (t_1, t_2) \subset R$ be an open interval which we call a $t$ interval (i.e., $T = (t_1, t_2) = \{t \in R : t_1 < t < t_2\}$). A real differentiable function $\varphi$ (if it exists) defined on $T$ such that the points $(t, \varphi(t)) \in D$ for all $t \in T$ and such that
$$\dot{\varphi}(t) = f(t, \varphi(t)) \tag{4.11.2}$$
for all $t \in T$ is called a solution of the differential equation (4.11.1).
4.11.3. Definition. Let $(\tau, \xi) \in D$. If $\varphi$ is a solution of the differential equation (4.11.1) and if $\varphi(\tau) = \xi$, then $\varphi$ is called a solution of the initial-value problem
$$\dot{x} = f(t, x), \qquad x(\tau) = \xi. \tag{4.11.4}$$
In Figure K a
typical solution of an initial-value problem is depicted.

4.11.5. Figure K. Typical solution of an initial-value problem (the solution passes through $(\tau, \xi)$, the slope of the line $L$ there is $f(\tau, \varphi(\tau))$, and the $t$ interval is $T = (t_1, t_2)$).
We can represent the initial-value problem given in Eq. (4.11.4) equivalently by means of the integral equation
$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds. \tag{4.11.6}$$
Here we say that two problems are equivalent if they have the same solution. To prove this equivalence, let $\varphi$ be a solution of the initial-value problem (4.11.4). Then $\varphi(\tau) = \xi$ and
$$\dot{\varphi}(t) = f(t, \varphi(t))$$
for all $t \in T$. Integrating from $\tau$ to $t$ we have
$$\int_\tau^t \dot{\varphi}(s)\, ds = \int_\tau^t f(s, \varphi(s))\, ds,$$
or
$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds.$$
Thus, $\varphi$ is a solution of the integral equation (4.11.6). Conversely, let $\varphi$ be a solution of the integral equation (4.11.6). Then $\varphi(\tau) = \xi$, and differentiating both sides of Eq. (4.11.6) with respect to $t$ we have $\dot{\varphi}(t) = f(t, \varphi(t))$, and thus $\varphi$ is a solution of the initial-value problem (4.11.4).

Next, we consider initial-value problems described by means of several first-order ordinary differential equations. Let $D \subset R^{n+1}$ be a domain (i.e., $D$ is an open and connected subset of $R^{n+1}$). We will call $R^{n+1}$ the $(t, x_1, \ldots, x_n)$ space. Let $f_1, \ldots, f_n$ be $n$ real-valued functions which are defined and continuous on $D$ (i.e., $f_i(t, x_1, \ldots, x_n)$, $i = 1, \ldots, n$, are defined for all points in $D$ and are continuous with respect to all arguments $t, x_1, \ldots, x_n$). We call
$$\dot{x}_i = f_i(t, x_1, \ldots, x_n), \qquad i = 1, \ldots, n, \tag{4.11.7}$$
a system of n ordinary differential equations of the first order. A set of $n$ real differentiable functions $\{\varphi_1, \ldots, \varphi_n\}$ (if it exists) defined on a real $t$ interval $T = (t_1, t_2) \subset R$ such that the points $(t, \varphi_1(t), \ldots, \varphi_n(t)) \in D$ for all $t \in T$ and such that
$$\dot{\varphi}_i(t) = f_i(t, \varphi_1(t), \ldots, \varphi_n(t)), \qquad i = 1, \ldots, n, \tag{4.11.8}$$
for all $t \in T$, is called a solution of the system of ordinary differential equations (4.11.7).
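The equivalence between the initial-value problem (4.11.4) and the integral equation (4.11.6) underlies the method of successive approximations: iterating $\varphi_{k+1}(t) = \xi + \int_\tau^t f(s, \varphi_k(s))\, ds$ drives $\varphi_k$ toward the solution. A minimal numerical sketch, assuming NumPy is available (the trapezoidal quadrature, iteration count, and test problem $\dot{x} = x$, $x(0) = 1$ are illustrative choices, not part of the text):

```python
import numpy as np

# Test problem: xdot = x, x(0) = 1, whose solution is exp(t).
f = lambda t, x: x
tau, xi = 0.0, 1.0
t = np.linspace(tau, 1.0, 201)

phi = np.full_like(t, xi)                      # phi_0(t) = xi
for _ in range(25):
    # phi_{k+1}(t) = xi + integral from tau to t of f(s, phi_k(s)) ds,
    # approximated with a cumulative trapezoidal rule on the grid.
    integrand = f(t, phi)
    integral = np.concatenate(([0.0], np.cumsum(
        0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
    phi = xi + integral

assert np.allclose(phi, np.exp(t), atol=1e-3)  # converged to the true solution
```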
4.11.9. Definition. Let $(\tau, \xi_1, \ldots, \xi_n) \in D$. If the set $\{\varphi_1, \ldots, \varphi_n\}$ is a solution of the system of equations (4.11.7) and if $(\varphi_1(\tau), \ldots, \varphi_n(\tau)) = (\xi_1, \ldots, \xi_n)$, then the set $\{\varphi_1, \ldots, \varphi_n\}$ is called a solution of the initial-value problem
$$\dot{x}_i = f_i(t, x_1, \ldots, x_n), \qquad x_i(\tau) = \xi_i, \qquad i = 1, \ldots, n. \tag{4.11.10}$$
It is convenient to use vector notation to represent Eq. (4.11.10). Let
$$\mathbf{f}(t, \mathbf{x}) = \begin{bmatrix} f_1(t, x_1, \ldots, x_n) \\ \vdots \\ f_n(t, x_1, \ldots, x_n) \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix},$$
and define $\dot{\mathbf{x}} = d\mathbf{x}/dt$ componentwise. We can express Eq. (4.11.10) equivalently as
$$\dot{\mathbf{x}} = \mathbf{f}(t, \mathbf{x}), \qquad \mathbf{x}(\tau) = \boldsymbol{\xi}. \tag{4.11.11}$$
If in Eq. (4.11.11) $\mathbf{f}(t, \mathbf{x})$ does not depend on $t$ (i.e., $\mathbf{f}(t, \mathbf{x}) = \mathbf{f}(\mathbf{x})$ for all $(t, \mathbf{x}) \in D$), then we have
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}). \tag{4.11.12}$$
In this case we speak of an autonomous system of first-order ordinary differential equations.

Of special importance are systems of first-order ordinary differential equations described by
$$\dot{\mathbf{x}} = A(t)\mathbf{x} + \mathbf{v}(t), \tag{4.11.13}$$
$$\dot{\mathbf{x}} = A(t)\mathbf{x}, \tag{4.11.14}$$
and
$$\dot{\mathbf{x}} = A\mathbf{x}, \tag{4.11.15}$$
where $\mathbf{x}$ is a real $n$-vector, $A(t) = [a_{ij}(t)]$ is a real $(n \times n)$ matrix with elements $a_{ij}(t)$ that are defined and continuous on a $t$ interval $T$, $A = [a_{ij}]$ is an $(n \times n)$ matrix with real constant coefficients, and $\mathbf{v}(t)$ is a real $n$-vector with components $v_i(t)$, $i = 1, \ldots, n$, which are defined and at least piecewise continuous on $T$. These equations are clearly a special case of Eq. (4.11.7). For example, if in Eq. (4.11.7) we let
$$f_i(t, x_1, \ldots, x_n) = f_i(t, \mathbf{x}) = \sum_{j=1}^{n} a_{ij}(t)x_j, \qquad i = 1, \ldots, n,$$
then Eq. (4.11.14) results. In the case of Eqs. (4.11.14) and (4.11.15) we speak of a linear homogeneous system of ordinary differential equations, in the case of Eq. (4.11.13) we have a linear non-homogeneous system of ordinary differential equations, and in the case of Eq. (4.11.15) we speak of a linear system of ordinary differential equations with constant coefficients.

Next, we consider initial-value problems described by means of nth-order ordinary differential equations. Let $f$ be a real function which is defined and
continuous in a domain $D$ of the real $(t, x, x^{(1)}, \ldots, x^{(n-1)})$ space, and let $x^{(k)} \triangleq d^kx/dt^k$. We call
$$x^{(n)} = f(t, x, x^{(1)}, \ldots, x^{(n-1)}) \tag{4.11.16}$$
an nth-order ordinary differential equation. A real function $\varphi$ (if it exists) which is defined on a $t$ interval $T = (t_1, t_2) \subset R$ and which has $n$ derivatives on $T$ is called a solution of Eq. (4.11.16) if $(t, \varphi(t), \ldots, \varphi^{(n-1)}(t)) \in D$ for all $t \in T$ and if
$$\varphi^{(n)}(t) = f(t, \varphi(t), \ldots, \varphi^{(n-1)}(t)) \tag{4.11.17}$$
for all $t \in T$.
Definition. eL t (r, and if rp(r) = of the initial value problem
e" ... ,e~)
e" ... ,
(4.11.16)
=
(X )~
=
rp(~-Il(r)
1(/, ,x x(ll, ...
eJ' ... ,x(I-~ l(r)
=
x ( r)
D. If ' I is a solution of Eq. then ' I is called a solution
E
e~, ,X(~-I»
}.
=
(4.1 1.19)
e~
Of particular interest are nth-order ordinary differential equations of the form
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \tag{4.11.20}$$
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0, \tag{4.11.21}$$
and
$$a_nx^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1x^{(1)} + a_0x = 0, \tag{4.11.22}$$
where $a_n(t), \ldots, a_0(t)$ are real continuous functions defined on the interval $T$, where $a_n(t) \neq 0$ for all $t \in T$, where $a_n, \ldots, a_0$ are real constants, where $a_n \neq 0$, and where $v(t)$ is a real function defined and piecewise continuous on $T$. We call Eq. (4.11.21) a linear homogeneous ordinary differential equation of order n, Eq. (4.11.20) a linear non-homogeneous ordinary differential equation of order n, and Eq. (4.11.22) a linear ordinary differential equation of order n with constant coefficients.

We now show that the theory of nth-order ordinary differential equations reduces to the theory of a system of n first-order ordinary differential equations. To this end, let in Eq. (4.11.19) $x = x_1$, and let
$$\begin{aligned} \dot{x}_1 &= x_2 = x^{(1)}, \\ \dot{x}_2 &= x_3 = x^{(2)}, \\ &\;\;\vdots \\ \dot{x}_{n-1} &= x_n = x^{(n-1)}, \\ \dot{x}_n &= f(t, x_1, \ldots, x_n) = x^{(n)}. \end{aligned} \tag{4.11.23}$$
This system of equations is clearly defined for all $(t, x_1, \ldots, x_n) \in D$. Now assume that the vector $\boldsymbol{\varphi}^T = (\varphi_1, \ldots, \varphi_n)$ is a solution of Eq. (4.11.23) on an
interval $T$. Since $\varphi_2 = \dot{\varphi}_1$, $\varphi_3 = \dot{\varphi}_2, \ldots, \varphi_n = \varphi_1^{(n-1)}$, and since
$$f(t, \varphi_1(t), \ldots, \varphi_n(t)) = f(t, \varphi_1(t), \ldots, \varphi_1^{(n-1)}(t)) = \varphi_1^{(n)}(t),$$
it follows that the first component $\varphi_1$ of the vector $\boldsymbol{\varphi}$ is a solution of Eq. (4.11.16) on the interval $T$. Conversely, assume that $\varphi_1$ is a solution of Eq. (4.11.16) on the interval $T$. Then the vector $\boldsymbol{\varphi}^T = (\varphi_1, \varphi_1^{(1)}, \ldots, \varphi_1^{(n-1)})$ is clearly a solution of the system of equations (4.11.23). Note that if $\varphi_1(\tau) = \xi_1, \ldots, \varphi_1^{(n-1)}(\tau) = \xi_n$, then the vector $\boldsymbol{\varphi}$ satisfies $\boldsymbol{\varphi}(\tau) = \boldsymbol{\xi}$, where $\boldsymbol{\xi}^T = (\xi_1, \ldots, \xi_n)$. The converse is also true.

Thus far we have concerned ourselves with initial-value problems characterized by real ordinary differential equations. It is possible to consider initial-value problems involving complex ordinary differential equations. For example, let $t$ be real and let $\mathbf{z}^T = (z_1, \ldots, z_n)$ be a complex vector (i.e., $z_k$ is of the form $u_k + iv_k$, where $u_k$ and $v_k$ are real and $i = \sqrt{-1}$). Let $D$ be a domain in the $(t, \mathbf{z})$ space, and let $f_1, \ldots, f_n$ be $n$ continuous complex-valued functions defined on $D$. Let $\mathbf{f}^T = (f_1, \ldots, f_n)$, and let $\dot{\mathbf{z}} = d\mathbf{z}/dt$. We call
$$\dot{\mathbf{z}} = \mathbf{f}(t, \mathbf{z}) \tag{4.11.24}$$
a system of n complex ordinary differential equations of the first order. A complex vector $\boldsymbol{\varphi}^T = (\varphi_1, \ldots, \varphi_n)$ which is defined and differentiable on a real $t$ interval $T = (\tau_1, \tau_2) \subset R$ such that the points $(t, \varphi_1(t), \ldots, \varphi_n(t)) \in D$ for all $t \in T$ and such that
$$\dot{\boldsymbol{\varphi}}(t) = \mathbf{f}(t, \boldsymbol{\varphi}(t))$$
for all $t \in T$, is called a solution of the system of equations (4.11.24). If in addition $(\tau, \xi_1, \ldots, \xi_n) \in D$ and if $(\varphi_1(\tau), \ldots, \varphi_n(\tau)) = (\xi_1, \ldots, \xi_n)$, then $\boldsymbol{\varphi}$ is said to be a solution of the initial-value problem
$$\dot{\mathbf{z}} = \mathbf{f}(t, \mathbf{z}), \qquad \mathbf{z}(\tau) = \boldsymbol{\xi}. \tag{4.11.25}$$
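Returning to the reduction (4.11.23): the construction is easy to exercise numerically. A sketch, assuming NumPy is available (the second-order test equation $x^{(2)} = -x$ and the Euler stepping scheme are illustrative choices, not part of the text):

```python
import numpy as np

# Reduction (4.11.23) for x'' = -x:  x1' = x2,  x2' = f(t, x1, x2) = -x1.
def F(t, x):
    return np.array([x[1], -x[0]])

t, h = 0.0, 1e-4
x = np.array([1.0, 0.0])                       # x(0) = 1, x'(0) = 0
while t < np.pi / 2:                           # crude Euler integration
    x = x + h * F(t, x)
    t += h

# The first component of the system tracks the solution cos(t) of the
# original second-order equation; at t = pi/2 it should be near 0.
assert abs(x[0]) < 1e-2
```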
Of particular interest in applications are initial-value problems characterized by complex linear ordinary differential equations having forms analogous to those given in equations (4.11.13)–(4.11.15). We can similarly consider initial-value problems described by complex nth-order ordinary differential equations.

Let us look now at some specific examples. The first example demonstrates that the solution to an initial-value problem may not be unique.

4.11.26. Example. Consider the initial-value problem
$$\dot{x} = x^{1/3}, \qquad x(0) = 0.$$
We can readily verify that this problem has infinitely many solutions passing
Chapter 4 / Finite-Dimensional Vector Spaces and Matrices
through the origin of the (t, x) plane, given by

φ_p(t) = 0 for 0 ≤ t ≤ p,  φ_p(t) = [(2/3)(t − p)]^{3/2} for p < t < ∞,

where p is any real number such that 0 ≤ p < ∞. ■
The next example shows that the t interval for which a solution to the initial-value problem exists may be restricted.

4.11.27. Example. Consider the initial-value problem

ẋ = x²,  x(t₁) = ξ,

where ξ is any real number. By direct computation we can verify that

φ(t) = ξ[1 − (t − t₁)ξ]⁻¹

is a solution of this problem. We note that if t = t₁ + 1/ξ, then the solution φ(t) is not defined. Thus, there is a restriction on the t interval for which a solution to the above problem exists. Namely, if ξ > 0, the above solution is valid over any interval (t₁, t₂) such that t₁ < t₂ < t₁ + 1/ξ. In this case we say the solution fails to exist for t ≥ t₁ + 1/ξ. On the other hand, if ξ < 0, the solution given above is valid for any t > t₁, and if ξ = 0 we say the solution exists on any interval (t₁, t₂). ■
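A quick numerical check of Example 4.11.27, with the illustrative choices t₁ = 0 and ξ = 2 (not singled out by the text): the sketch below confirms by finite differences that φ(t) = ξ[1 − (t − t₁)ξ]⁻¹ satisfies ẋ = x², matches the initial condition, and blows up as t approaches t₁ + 1/ξ = 0.5.

```python
# Spot-check of the closed-form solution of x' = x^2, x(t1) = xi,
# and its finite escape time t1 + 1/xi when xi > 0.

def phi(t, t1, xi):
    return xi / (1.0 - (t - t1) * xi)

t1, xi = 0.0, 2.0
h = 1e-6

# phi satisfies the ODE: a central difference of phi approximates phi^2
for t in [0.05, 0.2, 0.4]:
    dphi = (phi(t + h, t1, xi) - phi(t - h, t1, xi)) / (2 * h)
    assert abs(dphi - phi(t, t1, xi) ** 2) < 1e-3

# initial condition
assert phi(t1, t1, xi) == xi

# the solution grows without bound as t approaches t1 + 1/xi = 0.5
assert phi(0.4999, t1, xi) > 1000
```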
The preceding examples give rise to several important questions: When does an initial-value problem possess a solution? When is a solution unique? What is the extent of the interval over which such a solution exists? Is the solution continuously dependent on the initial condition ξ? At the end of the next chapter we will state and prove results which give answers to these questions.

B. Initial-Value Problem: Linear Systems
In the remainder of the present section we concern ourselves exclusively with initial-value problems described by linear ordinary differential equations. Let again T = (t₁, t₂) be a real t interval, let xᵀ = (x₁, ..., xₙ) denote an n-dimensional vector, let A = [a_ij] be a constant (n × n) matrix, let A(t) = [a_ij(t)] be an (n × n) matrix with elements a_ij(t) that are defined and continuous on the interval T, and let v(t)ᵀ = (v₁(t), ..., vₙ(t)) denote an n-vector with components v_i(t) that are defined and piecewise continuous on T. In the following we consider matrices and vectors with components which may be either real- or complex-valued. In the former case the field for the x space is the field of real numbers, while in the latter case the field for the x space is the field of complex numbers. Also, let

D = {(t, x): t ∈ T, x ∈ Rⁿ (or Cⁿ)}.   (4.11.28)

At first we consider systems of ordinary differential equations given by

ẋ = A(t)x + v(t),   (4.11.29)

ẋ = A(t)x,   (4.11.30)

and

ẋ = Ax.   (4.11.31)

In the applications section of the next chapter we will show that, with the above assumptions, equations (4.11.29)–(4.11.31) possess unique solutions for every (τ, ξ) ∈ D which exist over the entire interval T = (t₁, t₂) and which depend continuously on the initial conditions. This is an extremely important result in applications, where we usually require that T = (−∞, ∞).

4.11.32. Theorem. The set S of all solutions of Eq. (4.11.30) on T forms an n-dimensional vector space.
Proof. Let φ₁ and φ₂ be solutions of Eq. (4.11.30), let F denote the field for the x space, and let α₁, α₂ ∈ F. Since

d/dt [α₁φ₁(t) + α₂φ₂(t)] = α₁φ̇₁(t) + α₂φ̇₂(t) = α₁A(t)φ₁(t) + α₂A(t)φ₂(t) = A(t)[α₁φ₁(t) + α₂φ₂(t)],

it follows that α₁φ₁ + α₂φ₂ ∈ S whenever φ₁, φ₂ ∈ S and whenever α₁, α₂ ∈ F. Furthermore, the trivial solution φ = 0 defined by φ(t) = 0 for all t ∈ T is clearly in S, and for every η ∈ S there exists a φ = −η ∈ S such that η + φ = 0. It is now an easy matter to verify that all the axioms of a vector space are satisfied for S (we leave the details to the reader to verify).

Next, we must show that S is n-dimensional; i.e., we must find a set of solutions φ₁, ..., φₙ which is linearly independent and which spans S. Let ξ₁, ..., ξₙ be a set of linearly independent vectors in the n-dimensional x space. By the existence results which we will prove in the next chapter (and which we will accept here on faith), if τ ∈ T, there exist n solutions φ₁, ..., φₙ of Eq. (4.11.30) such that φᵢ(τ) = ξᵢ, i = 1, ..., n. We first show that these solutions are linearly independent. For purposes of contradiction, assume that these solutions are linearly dependent. Then there exist scalars α₁, ..., αₙ ∈ F, not all zero, such that

α₁φ₁(t) + ··· + αₙφₙ(t) = 0

for all t ∈ T. This implies that

α₁ξ₁ + ··· + αₙξₙ = α₁φ₁(τ) + ··· + αₙφₙ(τ) = 0.

But this last equation contradicts the assumption that the ξᵢ are linearly independent. Thus, the φᵢ, i = 1, ..., n, are linearly independent.

Finally, to show that these solutions span S, let φ be any solution of Eq. (4.11.30) on T such that φ(τ) = ξ. Then there exist unique scalars α₁, ..., αₙ ∈ F such that

ξ = α₁ξ₁ + ··· + αₙξₙ,

because the vectors ξᵢ, i = 1, ..., n, form a basis for the x space. It now follows that

ψ = α₁φ₁ + ··· + αₙφₙ

is a solution of Eq. (4.11.30) on T such that ψ(τ) = ξ. By the uniqueness results which we will prove in the next chapter (and which we accept here on faith),

φ = α₁φ₁ + ··· + αₙφₙ.

Since φ was chosen arbitrarily, it follows that the solutions φᵢ, i = 1, ..., n, span S. This concludes the proof. ■
The above result motivates the following two definitions.

4.11.33. Definition. A set of n linearly independent solutions of Eq. (4.11.30) on T is called a fundamental set of solutions of (4.11.30). An (n × n) matrix Ψ whose n columns are linearly independent solutions of Eq. (4.11.30) on T is called a fundamental matrix.

Thus, if {ψ₁, ..., ψₙ} is a set of n linearly independent solutions of Eq. (4.11.30) and if ψᵢᵀ = (ψ₁ᵢ, ..., ψₙᵢ), then

Ψ = [ψ₁ | ψ₂ | ··· | ψₙ] =
[ ψ₁₁  ψ₁₂  ···  ψ₁ₙ ]
[ ψ₂₁  ψ₂₂  ···  ψ₂ₙ ]
[  ⋮              ⋮  ]
[ ψₙ₁  ψₙ₂  ···  ψₙₙ ]

is a fundamental matrix.
In our next definition we employ the natural basis for the x space, given by

e₁ = (1, 0, ..., 0)ᵀ,  e₂ = (0, 1, 0, ..., 0)ᵀ,  ...,  eₙ = (0, ..., 0, 1)ᵀ.

4.11.34. Definition. A fundamental matrix Φ (for Eq. (4.11.30)) whose columns are determined by the linearly independent solutions φᵢ, i = 1, ..., n, with φᵢ(τ) = eᵢ, i = 1, ..., n, τ ∈ T, is called the state transition matrix Φ of Eq. (4.11.30).
Let X = [x_ij] be an (n × n) matrix, and define differentiation of X with respect to t ∈ T componentwise; i.e., Ẋ ≜ [ẋ_ij]. We now have:

4.11.35. Theorem. Let Ψ be a fundamental matrix of Eq. (4.11.30) and let X denote an (n × n) matrix. Then Ψ satisfies the matrix equation

Ẋ = A(t)X,  t ∈ T.   (4.11.36)

Proof. We have

Ψ̇ = [ψ̇₁ | ψ̇₂ | ··· | ψ̇ₙ] = [A(t)ψ₁ | A(t)ψ₂ | ··· | A(t)ψₙ] = A(t)[ψ₁ | ψ₂ | ··· | ψₙ] = A(t)Ψ. ■
We also have:

4.11.37. Theorem. If Ψ is a solution of the matrix equation (4.11.36) on T and if t, τ ∈ T, then

det Ψ(t) = det Ψ(τ) exp[∫_τ^t tr A(s) ds],  t ∈ T.   (4.11.38)

Proof. Recall that if C = [c_ij] is an (n × n) matrix, then tr C = Σ_{i=1}^{n} c_ii. Let Ψ = [ψ_ij] and A(t) = [a_ij(t)]. Then

ψ̇_ik = Σ_{r=1}^{n} a_ir(t) ψ_rk.

Now

d/dt (det Ψ) =
| ψ̇₁₁  ψ̇₁₂  ···  ψ̇₁ₙ |   | ψ₁₁  ψ₁₂  ···  ψ₁ₙ |           | ψ₁₁  ψ₁₂  ···  ψ₁ₙ |
| ψ₂₁  ψ₂₂  ···  ψ₂ₙ | + | ψ̇₂₁  ψ̇₂₂  ···  ψ̇₂ₙ | + ··· +  |  ⋮              ⋮  |   (4.11.39)
|  ⋮              ⋮  |   |  ⋮              ⋮  |           | ψ̇ₙ₁  ψ̇ₙ₂  ···  ψ̇ₙₙ |
| ψₙ₁  ψₙ₂  ···  ψₙₙ |   | ψₙ₁  ψₙ₂  ···  ψₙₙ |

Consider the first determinant on the right-hand side. Since ψ̇₁ₖ = a₁₁(t)ψ₁ₖ + a₁₂(t)ψ₂ₖ + ··· + a₁ₙ(t)ψₙₖ, this determinant is unchanged if we subtract from the first row a₁₂ times the second row plus a₁₃ times the third row, up to a₁ₙ times the nth row. This yields

| a₁₁(t)ψ₁₁  a₁₁(t)ψ₁₂  ···  a₁₁(t)ψ₁ₙ |
| ψ₂₁        ψ₂₂        ···  ψ₂ₙ       |  = a₁₁(t) det Ψ.
|  ⋮                            ⋮      |
| ψₙ₁        ψₙ₂        ···  ψₙₙ       |

Repeating the above procedure for the remaining determinants we get

d/dt [det Ψ(t)] = a₁₁(t) det Ψ(t) + a₂₂(t) det Ψ(t) + ··· + aₙₙ(t) det Ψ(t) = [tr A(t)] det Ψ(t).

This now implies

det Ψ(t) = det Ψ(τ) exp[∫_τ^t tr A(s) ds]

for all t ∈ T. ■

4.11.40. Exercise. Verify Eq. (4.11.39).
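The identity (4.11.38) lends itself to a direct numerical check. The sketch below (illustrative; pure Python, with a hand-picked 2 × 2 matrix A(t) that is not from the text) integrates Ψ̇ = A(t)Ψ by the classical Runge–Kutta scheme from Ψ(0) = I and compares det Ψ(1) with exp(∫₀¹ tr A(s) ds).

```python
# Numerical check of the Abel/Liouville formula (4.11.38) for one
# arbitrarily chosen 2x2 time-varying A(t) with tr A(t) = -t.
import math

def A(t):
    return [[0.0, 1.0], [-1.0, -t]]        # tr A(t) = -t

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(M, N, c=1.0):
    # returns M + c*N
    return [[M[i][j] + c * N[i][j] for j in range(2)] for i in range(2)]

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# integrate Psi' = A(t) Psi, Psi(0) = I, on [0, 1] with classical RK4
steps = 1000
h = 1.0 / steps
Psi = [[1.0, 0.0], [0.0, 1.0]]
for step in range(steps):
    t = step * h
    k1 = matmul(A(t), Psi)
    k2 = matmul(A(t + h / 2), add(Psi, k1, h / 2))
    k3 = matmul(A(t + h / 2), add(Psi, k2, h / 2))
    k4 = matmul(A(t + h), add(Psi, k3, h))
    Psi = add(Psi, add(add(k1, k4), add(k2, k3), 2.0), h / 6.0)

# integral of tr A = -t from 0 to 1 is -1/2, and det Psi(0) = 1
assert abs(det(Psi) - math.exp(-0.5)) < 1e-6
```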
We now prove:

4.11.41. Theorem. A solution Ψ of the matrix equation (4.11.36) is a fundamental matrix for Eq. (4.11.30) if and only if det Ψ(t) ≠ 0 for all t ∈ T.

Proof. Assume that Ψ = [ψ₁ | ψ₂ | ··· | ψₙ] is a fundamental matrix for Eq. (4.11.30), and let φ be a nontrivial solution of (4.11.30). By Theorem 4.11.32 there exist unique scalars α₁, ..., αₙ ∈ F, not all zero, such that

φ = α₁ψ₁ + ··· + αₙψₙ,

or

φ = Ψa,   (4.11.42)

where aᵀ = (α₁, ..., αₙ). Equation (4.11.42) constitutes a system of n linear equations with unknowns α₁, ..., αₙ at any τ ∈ T and has a unique solution for any choice of φ(τ). Hence, we have det Ψ(τ) ≠ 0, and it now follows from Theorem 4.11.37 that det Ψ(t) ≠ 0 for any t ∈ T.

Conversely, let Ψ be a solution of the matrix equation (4.11.36) and assume that det Ψ(t) ≠ 0 for all t ∈ T. Then the columns of Ψ are linearly independent for all t ∈ T. ■
The reader can readily prove the next result.

4.11.43. Theorem. Let Ψ be a fundamental matrix for Eq. (4.11.30), and let C be an arbitrary (n × n) non-singular constant matrix. Then ΨC is also a fundamental matrix for Eq. (4.11.30). Moreover, if Φ is any other fundamental matrix for Eq. (4.11.30), then there exists a constant (n × n) non-singular matrix P such that Φ = ΨP.

4.11.44. Exercise. Prove Theorem 4.11.43.
Now let R(t) = [r_ij(t)] be an arbitrary matrix such that the scalar-valued functions r_ij(t) are Riemann integrable on T. We define integration of R(t) componentwise; i.e.,

∫ R(t) dt = ∫ [r_ij(t)] dt ≜ [∫ r_ij(t) dt].

Integration of vectors is defined similarly.

In the next result we establish some of the properties of the state transition matrix Φ. Hereafter, in order to indicate the dependence of Φ on τ as well as on t, we will write Φ(t, τ). By Φ̇(t, τ) we mean ∂Φ(t, τ)/∂t.

4.11.45. Theorem. Let D be defined by Eq. (4.11.28), let τ ∈ T, let φ(τ) = ξ, let (τ, ξ) ∈ D, and let Φ(t, τ) denote the state transition matrix for Eq. (4.11.30) for all t ∈ T. Then

(i) Φ̇(t, τ) = A(t)Φ(t, τ) with Φ(τ, τ) = I, where I denotes the (n × n) identity matrix;
(ii) the unique solution φ of Eq. (4.11.30) is given by

φ(t) = Φ(t, τ)ξ   (4.11.46)

for all t ∈ T;
(iii) Φ(t, τ) is non-singular for all t ∈ T;
(iv) for any t, σ ∈ T we have Φ(t, τ) = Φ(t, σ)Φ(σ, τ);
(v) [Φ(t, τ)]⁻¹ ≜ Φ⁻¹(t, τ) = Φ(τ, t) for all t ∈ T; and
(vi) the unique solution of Eq. (4.11.29) is given by

φ(t) = Φ(t, τ)ξ + ∫_τ^t Φ(t, η)v(η) dη.   (4.11.47)
Proof. The first part of the theorem follows from the definition of the state transition matrix.

To prove the second part, assume that φ(t) = Φ(t, τ)ξ. Differentiating with respect to t we have

φ̇(t) = Φ̇(t, τ)ξ = A(t)Φ(t, τ)ξ = A(t)φ(t).

Furthermore, φ(τ) = Φ(τ, τ)ξ = ξ. From the uniqueness results (to be presented in the next chapter) it follows that the specified φ is indeed the solution of Eq. (4.11.30).

The third part of the theorem is a consequence of Theorem 4.11.41.

To prove the fourth part of the theorem we note that φ(t) = Φ(t, τ)ξ is the unique solution of Eq. (4.11.30) satisfying φ(τ) = ξ, and also that φ(σ) = Φ(σ, τ)ξ, σ ∈ T. Now consider the solution of Eq. (4.11.30) with initial condition given at σ in place of τ; i.e., φ(t) = Φ(t, σ)φ(σ). Then

φ(t) = Φ(t, τ)ξ = Φ(t, σ)Φ(σ, τ)ξ.

Since this equation holds for arbitrary ξ in the x space, we have

Φ(t, τ) = Φ(t, σ)Φ(σ, τ).

To prove the fifth part of the theorem we note that Φ⁻¹(t, τ) exists by part (iii). From part (iv) it now follows that

I = Φ(t, τ)Φ(τ, t),

where I denotes the (n × n) identity matrix. Thus,

Φ⁻¹(t, τ) = Φ(τ, t)

for all t ∈ T.

In the next chapter we will show that under the present assumptions, Eq. (4.11.29) possesses a unique solution for every (τ, ξ) ∈ D, where φ(τ) = ξ. Thus, to prove the last part of the theorem, we must show that the function (4.11.47) is this solution. Differentiating with respect to t we have

φ̇(t) = Φ̇(t, τ)ξ + Φ(t, t)v(t) + ∫_τ^t Φ̇(t, η)v(η) dη
     = A(t)Φ(t, τ)ξ + v(t) + ∫_τ^t A(t)Φ(t, η)v(η) dη
     = A(t)[Φ(t, τ)ξ + ∫_τ^t Φ(t, η)v(η) dη] + v(t)
     = A(t)φ(t) + v(t).

Also, φ(τ) = ξ. Therefore, φ is the unique solution of Eq. (4.11.29). ■
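As a scalar illustration of part (vi), take n = 1 and A(t) ≡ a, so that Φ(t, s) = e^{a(t−s)} and (4.11.47) reads x(t) = e^{a(t−τ)}ξ + ∫_τ^t e^{a(t−s)}v(s) ds. The sketch below (illustrative choices a = −1, v(t) = cos t, ξ = 2; not from the text) evaluates the integral by Simpson's rule and compares against the exact solution of ẋ = −x + cos t.

```python
# Scalar sanity check of the variation-of-constants formula (4.11.47).
import math

a, tau, xi = -1.0, 0.0, 2.0
v = math.cos                                   # forcing term v(t)

def x(t, steps=2000):
    # Simpson's rule for the integral term of (4.11.47)
    h = (t - tau) / steps
    total = 0.0
    for k in range(steps):
        s0, s1 = tau + k * h, tau + (k + 1) * h
        sm = 0.5 * (s0 + s1)
        f = lambda s: math.exp(a * (t - s)) * v(s)
        total += h / 6.0 * (f(s0) + 4.0 * f(sm) + f(s1))
    return math.exp(a * (t - tau)) * xi + total

# for a = -1, v = cos, the exact solution is
# x(t) = (xi - 1/2) e^{-t} + (cos t + sin t)/2
def exact(t):
    return (xi - 0.5) * math.exp(-t) + 0.5 * (math.cos(t) + math.sin(t))

for t in [0.3, 1.0, 2.5]:
    assert abs(x(t) - exact(t)) < 1e-8
```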
In engineering and physics, φ(t) is interpreted as representing the "state" at time t of a physical system described by appropriate ordinary differential equations. In Eq. (4.11.46), the matrix Φ(t, τ) relates the "states" of the system at the points t ∈ T and τ ∈ T; hence the name "state transition matrix."

Next, we wish to examine the properties of linear ordinary differential equations with constant coefficients given by Eq. (4.11.31). We require the following preliminary result.

4.11.48. Theorem. Let A be a constant (n × n) matrix (A may be real or complex). Let S_N(t) denote the matrix

S_N(t) = I + Σ_{k=1}^{N} (t^k/k!) A^k.

Then each element of the matrix S_N(t) converges absolutely and uniformly on any finite interval (−t₁, t₁), t₁ > 0, as N → ∞.

Proof.
Let a_ij^(k) denote the (i, j)th element of the matrix A^k, where i, j = 1, ..., n, and k = 0, 1, 2, .... Then the (i, j)th element of S_N(t) is equal to

δ_ij + Σ_{k=1}^{N} a_ij^(k) t^k/k!,

where δ_ij is the Kronecker delta. We now show that

|δ_ij + Σ_{k=1}^{∞} a_ij^(k) t^k/k!| < ∞ for all i, j.

Let m = max_i (Σ_{j=1}^{n} |a_ij|). Then m is a constant which depends on the elements of the matrix A. Since A^{k+1} = A·A^k, we have

max_{i,j} |a_ij^(k+1)| = max_{i,j} |Σ_{p=1}^{n} a_ip a_pj^(k)| ≤ (max_i Σ_{p=1}^{n} |a_ip|)(max_{p,j} |a_pj^(k)|) ≤ m · max_{i,j} |a_ij^(k)|.

Since max_{i,j} |a_ij^(1)| ≤ m, by induction it follows that

max_{i,j} |a_ij^(k)| ≤ m^k.

Then we have, for any t ∈ (−t₁, t₁), t₁ > 0, and for any i, j,

|a_ij^(k) t^k/k!| ≤ (m t₁)^k/k! ≜ M_k.

Since

Σ_{k=1}^{∞} M_k = Σ_{k=1}^{∞} (m t₁)^k/k! = e^{m t₁} − 1 < ∞,

we now have that

δ_ij + Σ_{k=1}^{∞} a_ij^(k) t^k/k!

is an absolutely and uniformly convergent series for each i, j over the interval (−t₁, t₁) by the Weierstrass M-test. ■
We are now in a position to consider the following:

4.11.49. Definition. Let A be a constant (n × n) matrix. We define e^{At} to be the matrix

e^{At} = I + Σ_{k=1}^{∞} (t^k/k!) A^k

for any −∞ < t < ∞.

We note immediately that e^{At}|_{t=0} = I. We now prove:

4.11.50. Theorem. Let T = (−∞, ∞), let τ ∈ T, and let A be a constant (n × n) matrix. Then

(i) the state transition matrix for Eq. (4.11.31) is given by

Φ(t, τ) = e^{A(t−τ)}

for all t ∈ T;
(ii) the matrix e^{At} is non-singular for all t ∈ T;
(iii) e^{At₁} e^{At₂} = e^{A(t₁+t₂)} for all t₁, t₂ ∈ T;
(iv) A e^{At} = e^{At} A for all t ∈ T; and
(v) (e^{At})⁻¹ = e^{−At} for all t ∈ T.
Proof. To prove the first part we must show that Φ(t, τ) satisfies the matrix equation

Φ̇(t, τ) = AΦ(t, τ)

for all t ∈ T, with Φ(τ, τ) = I. Now, by definition,

Φ(t, τ) = e^{A(t−τ)} = I + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^k.

In view of Theorem 4.11.48 we may differentiate the above series term by term. In doing so we obtain

d/dt [e^{A(t−τ)}] = A + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^{k+1} = A [I + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^k] = A e^{A(t−τ)},

and thus we have

Φ̇(t, τ) = AΦ(t, τ)

for all t ∈ T, with Φ(τ, τ) = e^{A(τ−τ)} = I. Therefore, e^{A(t−τ)} is the state transition matrix for Eq. (4.11.31).

The second part of the theorem is obvious.

To prove the third part of the theorem, we note that for any t₁, t₂ ∈ T we have Φ(t₁, −t₂) = e^{A(t₁+t₂)}, Φ(t₁, 0) = e^{At₁}, and Φ(0, −t₂) = e^{At₂}. Now Φ(t₁, −t₂) = Φ(t₁, 0)Φ(0, −t₂), which yields the desired result.

To prove the fourth part of the theorem we note that for all t ∈ T,

A (I + Σ_{k=1}^{∞} (t^k/k!) A^k) = A + Σ_{k=1}^{∞} (t^k/k!) A^{k+1} = (I + Σ_{k=1}^{∞} (t^k/k!) A^k) A.

Finally, to prove the last part of the theorem, note that for all t ∈ T,

e^{At} e^{A(−t)} = e^{A(t−t)} = I.

Therefore, (e^{At})⁻¹ = e^{−At}. ■
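Definition 4.11.49 can be implemented directly as a truncated power series; the sketch below uses it to spot-check parts (iii)–(v) of Theorem 4.11.50 for one arbitrarily chosen 2 × 2 matrix (the matrix A and the time values are illustrative, not from the text).

```python
# Truncated-series matrix exponential per Definition 4.11.49, used to
# check the semigroup, commutation, and inverse properties numerically.

def matmul(M, N):
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, t, terms=30):
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    P = [row[:] for row in S]                                  # holds A^k t^k / k!
    for k in range(1, terms):
        P = matmul(P, A)
        P = [[P[i][j] * t / k for j in range(n)] for i in range(n)]
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return S

def close(M, N, tol=1e-9):
    return all(abs(M[i][j] - N[i][j]) < tol
               for i in range(len(M)) for j in range(len(M)))

A = [[0.0, 1.0], [-2.0, -3.0]]

# (iii) semigroup property: e^{A t1} e^{A t2} = e^{A(t1+t2)}
assert close(matmul(expm(A, 0.3), expm(A, 0.5)), expm(A, 0.8))
# (iv) A commutes with e^{At}
assert close(matmul(A, expm(A, 0.7)), matmul(expm(A, 0.7), A))
# (v) (e^{At})^{-1} = e^{-At}
I = [[1.0, 0.0], [0.0, 1.0]]
assert close(matmul(expm(A, 0.7), expm(A, -0.7)), I)
```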
The following natural question arises: can we find an expression similar to e^{At} for the case when A = A(t), t ∈ T? The answer is, in general, no. However, there is a special case in which such a generalization is valid.

4.11.51. Theorem. If for Eq. (4.11.30) A(t₁)A(t₂) = A(t₂)A(t₁) for all t₁, t₂ ∈ T, then the state transition matrix Φ(t, τ) is given by

Φ(t, τ) = exp[∫_τ^t A(η) dη] = e^{B(t,τ)} = I + Σ_{k=1}^{∞} (1/k!) B^k(t, τ),

where B(t, τ) = ∫_τ^t A(η) dη.

4.11.52. Exercise. Prove Theorem 4.11.51.
We note that a sufficient condition for A(t₁) to commute with A(t₂) for all t₁, t₂ ∈ T is that A(t) be a diagonal matrix.

4.11.53. Exercise. Find the state transition matrix for

ẋ = A(t)x,

where A(t) is a given diagonal matrix.
The reader will find it instructive to verify the following additional results.

4.11.54. Exercise. Let Λ denote the (n × n) diagonal matrix

Λ = diag(λ₁, ..., λₙ).

Show that

e^{Λt} = diag(e^{λ₁t}, ..., e^{λₙt})

for all t ∈ T = (−∞, ∞).
4.11.55. Exercise. Let t ∈ T = (−∞, ∞), let τ ∈ T, and let ξ ∈ Rⁿ (or Cⁿ). Let A be the (n × n) matrix for Eq. (4.11.31), and let φ denote the unique solution of Eq. (4.11.31) with φ(τ) = ξ. Let P be a similarity transformation for A, and let B = P⁻¹AP.

(a) Show that e^{At} = P e^{Bt} P⁻¹ for all t ∈ T.
(b) Show that the unique solution of Eq. (4.11.31) is given by

φ = Pψ,

where ψ is the unique solution of the initial-value problem

ẏ = By

with

ψ(τ) = P⁻¹φ(τ) = P⁻¹ξ.
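Part (b) of Exercise 4.11.55 in miniature: with B = P⁻¹AP diagonal, e^{Bt} is known in closed form, so x(t) = P e^{Bt} P⁻¹ξ should solve ẋ = Ax with x(0) = ξ. The matrix A, its eigenvector matrix P, and ξ below are an illustrative choice, not from the text; the ODE is checked by a central difference.

```python
# x' = Ax solved via diagonalization: x(t) = P e^{Bt} P^{-1} xi.
import math

A = [[0.0, 1.0], [-2.0, -3.0]]           # eigenvalues -1 and -2
P = [[1.0, 1.0], [-1.0, -2.0]]           # columns: eigenvectors of A
Pinv = [[2.0, 1.0], [-1.0, -1.0]]        # inverse of P (det P = -1)

def sol(t, xi):
    y = [Pinv[0][0]*xi[0] + Pinv[0][1]*xi[1],
         Pinv[1][0]*xi[0] + Pinv[1][1]*xi[1]]           # y = P^{-1} xi
    y = [math.exp(-1.0*t)*y[0], math.exp(-2.0*t)*y[1]]  # apply e^{Bt}
    return [P[0][0]*y[0] + P[0][1]*y[1],
            P[1][0]*y[0] + P[1][1]*y[1]]                # back to x = P y

xi = [1.0, 0.5]
assert sol(0.0, xi) == xi                # initial condition

# x'(t) = A x(t), checked by a central difference at t = 0.4
h = 1e-6
xm, x0, xp = sol(0.4 - h, xi), sol(0.4, xi), sol(0.4 + h, xi)
for i in range(2):
    dx = (xp[i] - xm[i]) / (2*h)
    Ax = A[i][0]*x0[0] + A[i][1]*x0[1]
    assert abs(dx - Ax) < 1e-6
```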
4.11.56. Exercise. Let D be defined by Eq. (4.11.28). In Eq. (4.11.29), let A(t) = A for all t ∈ T; i.e.,

ẋ = Ax + v(t).   (4.11.57)

Let τ ∈ T, and let φ denote the unique solution of Eq. (4.11.57) with φ(τ) = ξ. Let P be a similarity transformation for A, and let B = P⁻¹AP. Show that the unique solution of Eq. (4.11.57) is given by

φ = Pψ,

where ψ is the unique solution of the initial-value problem

ẏ = By + P⁻¹v(t)

with (τ, ψ(τ)) ∈ D, t ∈ T.
4.11.58. Exercise. Let J denote the Jordan canonical form of the (n × n) matrix A of Eq. (4.11.31), and let M denote the non-singular (n × n) matrix which transforms A into J; i.e., J = M⁻¹AM. Then J is of the block diagonal form

J = diag(J₀, J₁, ..., Jₚ),

where J₀ = diag(λ₁, ..., λₖ) is a diagonal matrix, where for m = 1, ..., p,

J_m =
[ λ_{k+m}   1         0    ···  0        ]
[ 0         λ_{k+m}   1    ···  0        ]
[ ⋮                         ⋱   1        ]
[ 0         0         0    ···  λ_{k+m}  ]

is a (ν_m × ν_m) matrix with k + ν₁ + ··· + νₚ = n, and where λ₁, ..., λₖ, λ_{k+1}, ..., λ_{k+p} denote the (not necessarily distinct) eigenvalues of A. Show that

e^{Jt} = diag(e^{J₀t}, e^{J₁t}, ..., e^{Jₚt}),

where

e^{J₀t} = diag(e^{λ₁t}, ..., e^{λₖt})

and

e^{J_m t} = e^{λ_{k+m}t} ×
[ 1   t   t²/2!   ···   t^{ν_m−1}/(ν_m−1)! ]
[ 0   1   t       ···   t^{ν_m−2}/(ν_m−2)! ]
[ ⋮                ⋱     ⋮                 ]
[ 0   0   0       ···   1                  ],

m = 1, ..., p.
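For a single Jordan block the claimed closed form of e^{J_m t} can be compared against the defining power series. The sketch below does this for one 3 × 3 block; the eigenvalue and the time value are arbitrary choices.

```python
# Spot-check of Exercise 4.11.58 for a 3x3 Jordan block with eigenvalue lam:
# e^{Jt} = e^{lam t} * [[1, t, t^2/2!], [0, 1, t], [0, 0, 1]].
import math

lam, t = 0.5, 0.8
J = [[lam, 1.0, 0.0], [0.0, lam, 1.0], [0.0, 0.0, lam]]

def matmul(M, N):
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# truncated series I + sum_{k>=1} t^k/k! J^k
S = [[float(i == j) for j in range(3)] for i in range(3)]
P = [row[:] for row in S]
for k in range(1, 40):
    P = matmul(P, J)
    P = [[P[i][j] * t / k for j in range(3)] for i in range(3)]
    S = [[S[i][j] + P[i][j] for j in range(3)] for i in range(3)]

E = math.exp(lam * t)
closed = [[E, E * t, E * t * t / 2.0],
          [0.0, E, E * t],
          [0.0, 0.0, E]]
for i in range(3):
    for j in range(3):
        assert abs(S[i][j] - closed[i][j]) < 1e-9
```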
Next, we consider initial-value problems characterized by linear nth-order ordinary differential equations given by

aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ··· + a₁(t)x^(1) + a₀(t)x = v(t),   (4.11.59)

aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ··· + a₁(t)x^(1) + a₀(t)x = 0,   (4.11.60)

and

aₙx^(n) + aₙ₋₁x^(n−1) + ··· + a₁x^(1) + a₀x = 0.   (4.11.61)

In Eqs. (4.11.59) and (4.11.60), v(t) and aᵢ(t), i = 0, ..., n, are functions which are defined and continuous on a real t interval T, and in Eq. (4.11.61) the aᵢ, i = 0, ..., n, are constant coefficients. We assume that aₙ ≠ 0, that aₙ(t) ≠ 0 for any t ∈ T, and that v(t) is not identically zero. Furthermore, the coefficients aᵢ, aᵢ(t), i = 0, ..., n, may be either real or complex.

In accordance with Eq. (4.11.23), we can reduce the study of Eq. (4.11.60) to the study of the system of n first-order ordinary differential equations

ẋ = A(t)x,   (4.11.62)

where

A(t) =
[ 0             1             0             ···  0              ]
[ 0             0             1             ···  0              ]
[ ⋮                                          ⋱   ⋮              ]
[ 0             0             0             ···  1              ]
[ −a₀(t)/aₙ(t)  −a₁(t)/aₙ(t)  −a₂(t)/aₙ(t)  ···  −aₙ₋₁(t)/aₙ(t) ].   (4.11.63)

In this case the matrix A(t) is said to be in companion form. Since A(t) is continuous on T, there exists for all t ∈ T a unique solution φ to the initial-value problem

ẋ = A(t)x,  x(τ) = ξ = (ξ₁, ..., ξₙ)ᵀ,   (4.11.64)

where τ ∈ T and ξ ∈ Rⁿ (or Cⁿ) (this will be proved in the next chapter). Moreover, the first component φ₁ of φ is the solution of Eq. (4.11.60) satisfying

φ₁(τ) = ξ₁,  φ₁^(1)(τ) = ξ₂,  ...,  φ₁^(n−1)(τ) = ξₙ.
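The reduction (4.11.62)–(4.11.63) is mechanical, which makes it easy to automate. The helper below builds the companion matrix from a coefficient list; the function name and the list convention [a₀, a₁, ..., aₙ] are our own, not the book's.

```python
# Companion matrix of a_n x^(n) + ... + a_1 x^(1) + a_0 x = 0,
# following the pattern of Eq. (4.11.63) with constant coefficients.

def companion(coeffs):
    """coeffs = [a_0, a_1, ..., a_n] with a_n != 0."""
    an = coeffs[-1]
    n = len(coeffs) - 1          # order of the equation
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = 1.0        # superdiagonal of ones
    for j in range(n):
        A[n - 1][j] = -coeffs[j] / an   # last row: -a_j / a_n
    return A

# x^(2) + 3 x^(1) + 2 x = 0  ->  last row (-2, -3)
A = companion([2.0, 3.0, 1.0])
assert A == [[0.0, 1.0], [-2.0, -3.0]]
```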
Now let ψ₁, ..., ψₙ be solutions of Eq. (4.11.60). Then we can readily verify that the matrix

Ψ =
[ ψ₁         ψ₂         ···  ψₙ        ]
[ ψ₁^(1)     ψ₂^(1)     ···  ψₙ^(1)    ]
[ ⋮                           ⋮        ]
[ ψ₁^(n−1)   ψ₂^(n−1)   ···  ψₙ^(n−1)  ]   (4.11.65)

is a solution of the matrix equation

Ẋ = A(t)X,   (4.11.66)

where A(t) is defined by Eq. (4.11.63). We call the determinant of Ψ the Wronskian of Eq. (4.11.60) with respect to the solutions ψ₁, ..., ψₙ, and we denote it by

det Ψ = W(ψ₁, ..., ψₙ).   (4.11.67)

Note that for a fixed set of solutions ψ₁, ..., ψₙ (and for τ fixed), the Wronskian is a function of t. To indicate this, we write W(ψ₁, ..., ψₙ)(t). In view of Theorem 4.11.37 we have, for all t ∈ T,

W(ψ₁, ..., ψₙ)(t) = det Ψ(t) = det Ψ(τ) exp[∫_τ^t tr A(η) dη]
= W(ψ₁, ..., ψₙ)(τ) exp[−∫_τ^t (aₙ₋₁(η)/aₙ(η)) dη].   (4.11.68)

4.11.69. Example. Consider the second-order ordinary differential equation

t² x^(2) + t x^(1) − x = 0,  0 < t < ∞.   (4.11.70)

The functions ψ₁(t) = t and ψ₂(t) = 1/t are clearly solutions of Eq. (4.11.70). Consider now the matrix

Ψ(t) =
[ t   1/t    ]
[ 1   −1/t²  ].

Then

W(ψ₁, ψ₂)(t) = det Ψ(t) = −2/t,  t > 0.

Using the notation of Eq. (4.11.63), we have in the present case a₁(t)/a₂(t) = 1/t. From Eq. (4.11.68) we have, for any τ > 0,

W(ψ₁, ψ₂)(t) = det Ψ(t) = W(ψ₁, ψ₂)(τ) exp[−∫_τ^t (1/η) dη] = (−2/τ)(τ/t) = −2/t,  t > 0,

which checks. ■
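Example 4.11.69 can be replayed numerically. The check below confirms both the direct computation W(ψ₁, ψ₂)(t) = −2/t and the Abel-formula form of (4.11.68), using exp(−∫_τ^t dη/η) = τ/t; the sample points are arbitrary.

```python
# Wronskian of psi1(t) = t, psi2(t) = 1/t for t^2 x'' + t x' - x = 0.
import math

def W(t):
    psi1, dpsi1 = t, 1.0
    psi2, dpsi2 = 1.0 / t, -1.0 / t**2
    return psi1 * dpsi2 - psi2 * dpsi1

tau = 0.5
for t in [0.5, 1.0, 2.0, 5.0]:
    # direct computation: W(t) = -2/t
    assert abs(W(t) - (-2.0 / t)) < 1e-12
    # Abel formula (4.11.68): W(t) = W(tau) * exp(-(log t - log tau))
    assert abs(W(t) - W(tau) * math.exp(-(math.log(t) - math.log(tau)))) < 1e-12
```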
The reader will have no difficulty in proving the following:

4.11.71. Theorem. A set of n solutions of Eq. (4.11.60), ψ₁, ..., ψₙ, is linearly independent on a t interval T if and only if W(ψ₁, ..., ψₙ)(t) ≠ 0 for all t ∈ T. Moreover, every solution of Eq. (4.11.60) is a linear combination of any set of n linearly independent solutions.

4.11.72. Exercise. Prove Theorem 4.11.71.
We call a set of n solutions of Eq. (4.11.60), ψ₁, ..., ψₙ, which is linearly independent on T a fundamental set for Eq. (4.11.60).

Let us next turn our attention to the non-homogeneous linear nth-order ordinary differential equation (4.11.59). Without loss of generality, let us assume that aₙ(t) = 1 for all t ∈ T; i.e., let us consider

x^(n) + aₙ₋₁(t)x^(n−1) + ··· + a₁(t)x^(1) + a₀(t)x = v(t).   (4.11.73)

The study of this equation reduces to the study of the system of n first-order
ordinary differential equations

ẋ = A(t)x + b(t),   (4.11.74)

where

A(t) =
[ 0       1       0       ···  0         ]
[ 0       0       1       ···  0         ]
[ ⋮                         ⋱   ⋮        ]
[ 0       0       0       ···  1         ]
[ −a₀(t)  −a₁(t)  −a₂(t)  ···  −aₙ₋₁(t)  ],
b(t) = (0, ..., 0, v(t))ᵀ.   (4.11.75)

In the next chapter we will show that for all t ∈ T there exists a unique solution ζ to the initial-value problem

ẋ = A(t)x + b(t),  x(τ) = ξ = (ξ₁, ..., ξₙ)ᵀ,   (4.11.76)

where τ ∈ T and ξ ∈ Rⁿ (or Cⁿ). The first component ζ₁ of ζ is the solution of Eq. (4.11.59), with aₙ(t) = 1 for all t ∈ T, satisfying

ζ₁(τ) = ξ₁,  ζ₁^(1)(τ) = ξ₂,  ...,  ζ₁^(n−1)(τ) = ξₙ.
We now have:

4.11.77. Theorem. Let {ψ₁, ..., ψₙ} be a fundamental set for the equation

x^(n) + aₙ₋₁(t)x^(n−1) + ··· + a₁(t)x^(1) + a₀(t)x = 0.   (4.11.78)

Then the solution ζ of the equation

x^(n) + aₙ₋₁(t)x^(n−1) + ··· + a₁(t)x^(1) + a₀(t)x = v(t),   (4.11.79)

satisfying (ζ(τ), ζ^(1)(τ), ..., ζ^(n−1)(τ))ᵀ = ξ = (ξ₁, ..., ξₙ)ᵀ, τ ∈ T, ξ ∈ Rⁿ (or Cⁿ), is given by the expression

ζ(t) = ζₕ(t) + Σ_{i=1}^{n} ψᵢ(t) ∫_τ^t [Wᵢ(ψ₁, ..., ψₙ)(s)/W(ψ₁, ..., ψₙ)(s)] v(s) ds,  t ∈ T,   (4.11.80)

where ζₕ is the solution of Eq. (4.11.78) satisfying (ζₕ(τ), ζₕ^(1)(τ), ..., ζₕ^(n−1)(τ))ᵀ = ξ, and where Wᵢ(ψ₁, ..., ψₙ)(t) is obtained from W(ψ₁, ..., ψₙ)(t) by replacing the ith column of W(ψ₁, ..., ψₙ)(t) by (0, 0, ..., 1)ᵀ.

4.11.81. Exercise. Prove Theorem 4.11.77.
Let us consider a specific case.

4.11.82. Example. Consider the second-order ordinary differential equation

t² x^(2) + t x^(1) − x = b(t),  t > 0,   (4.11.83)

where b(t) is a real continuous function for all t > 0. This equation is equivalent to

x^(2) + (1/t) x^(1) − (1/t²) x = v(t),   (4.11.84)

where v(t) = b(t)/t². From Example 4.11.69 we have ψ₁(t) = t, ψ₂(t) = 1/t, and W(ψ₁, ψ₂)(t) = −2/t. Also,

W₁(ψ₁, ψ₂)(t) = det [ 0, 1/t; 1, −1/t² ] = −1/t.
Let us next focus our attention on linear nth-order ordinary differential equations with constant coefficients. Without loss of generality, let us assume that, in Eq. (4.11.61), aₙ = 1. We have

x^(n) + aₙ₋₁x^(n−1) + ··· + a₁x^(1) + a₀x = 0.   (4.11.85)

We call the algebraic equation

p(λ) = λⁿ + aₙ₋₁λⁿ⁻¹ + ··· + a₁λ + a₀ = 0   (4.11.86)

the characteristic equation of the differential equation (4.11.85). As was done before, we see that the study of Eq. (4.11.85) reduces to the study of the system of first-order ordinary differential equations given by

ẋ = Ax,   (4.11.87)

where

A =
[ 0    1    0    ···  0      ]
[ 0    0    1    ···  0      ]
[ ⋮                ⋱   ⋮     ]
[ 0    0    0    ···  1      ]
[ −a₀  −a₁  −a₂  ···  −aₙ₋₁ ].   (4.11.88)

We now show that the eigenvalues of the matrix A of Eq. (4.11.88) are precisely the roots of the characteristic equation (4.11.86). First we consider

det(A − λI) =
| −λ   1    0    ···  0        |
| 0   −λ    1    ···  0        |
| ⋮                 ⋱   ⋮      |
| 0    0    0    ···  1        |
| −a₀  −a₁  −a₂  ···  −aₙ₋₁−λ |.
Expanding this determinant with respect to its first column we obtain

det(A − λI) = −λ Δₙ₋₁(λ) + (−1)^{n+1}(−a₀) · 1,

where Δₙ₋₁(λ) denotes the determinant of the same companion form but of order (n − 1), involving the coefficients a₁, ..., aₙ₋₁, and where the minor multiplying −a₀ is the determinant of a triangular matrix with ones on its diagonal and hence equals 1. Using induction we arrive at the expression

det(A − λI) = (−1)ⁿ {λⁿ + aₙ₋₁λⁿ⁻¹ + ··· + a₁λ + a₀}.   (4.11.89)

It follows from Eq. (4.11.89) that λ is an eigenvalue of A if and only if λ is a root of the characteristic equation (4.11.86).
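The equivalence just established can be checked numerically: for a companion matrix of order 3 with characteristic polynomial (λ + 1)(λ + 2)(λ + 3), det(A − λI) should equal (−1)³ p(λ) at every λ, and should vanish exactly at the roots. The coefficients below are an illustrative choice.

```python
# Check of Eq. (4.11.89) for n = 3: det(A - lam*I) = (-1)^3 * p(lam),
# with p(lam) = lam^3 + 6 lam^2 + 11 lam + 6 (roots -1, -2, -3).

a0, a1, a2 = 6.0, 11.0, 6.0

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def char_det(lam):
    A = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [-a0, -a1, -a2]]
    M = [[A[i][j] - (lam if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    return det3(M)

def p(lam):
    return lam**3 + a2 * lam**2 + a1 * lam + a0

for lam in [-3.0, -1.0, 0.0, 0.7, 2.0]:
    assert abs(char_det(lam) - (-1.0)**3 * p(lam)) < 1e-9

# the roots of p make det(A - lam*I) vanish
assert abs(char_det(-2.0)) < 1e-9
```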
4.11.90. Exercise. Assume that the eigenvalues of the matrix A given in Eq. (4.11.88) are all real and distinct. Let Λ denote the diagonal matrix

Λ = diag(λ₁, ..., λₙ),   (4.11.91)

where λ₁, ..., λₙ denote the eigenvalues of A. Let V denote the Vandermonde matrix given by

V =
[ 1       1       ···  1       ]
[ λ₁      λ₂      ···  λₙ      ]
[ λ₁²     λ₂²     ···  λₙ²     ]
[ ⋮                      ⋮     ]
[ λ₁ⁿ⁻¹   λ₂ⁿ⁻¹   ···  λₙⁿ⁻¹  ].

(a) Show that V is non-singular.
(b) Show that Λ = V⁻¹AV.

Before closing the present section, let us consider so-called "adjoint systems." To this end let us consider once more Eq. (4.11.30); i.e.,

ẋ = A(t)x.   (4.11.92)
Let A*(t) denote the conjugate transpose of A(t). (That is, if A(t) = [a_ij(t)], then A*(t) = [ā_ij(t)]ᵀ = [ā_ji(t)], where ā_ij(t) denotes the complex conjugate of a_ij(t).) We call the system of linear first-order ordinary differential equations

ẏ = −A*(t)y   (4.11.93)

the adjoint system to (4.11.92).

4.11.94. Exercise. Let Ψ be a fundamental matrix of Eq. (4.11.92). Show that Φ is a fundamental matrix for Eq. (4.11.93) if and only if

Φ*Ψ = C,

where C is a constant non-singular matrix, and where Φ* denotes the conjugate transpose of Φ.

It is also possible to consider adjoint equations for linear nth-order ordinary differential equations. Let us for example consider Eq. (4.11.85), the study of which can be reduced to that of Eq. (4.11.87), with A specified by Eq. (4.11.88). Now consider the adjoint system to Eq. (4.11.87), given by

ẏ = −A*y,   (4.11.95)

where

−A* =
[ 0    0   ···   0    ā₀    ]
[ −1   0   ···   0    ā₁    ]
[ 0   −1   ···   0    ā₂    ]
[ ⋮               ⋮    ⋮    ]
[ 0    0   ···  −1    āₙ₋₁  ],   (4.11.96)

where āᵢ denotes the complex conjugate of aᵢ, i = 0, ..., n − 1. Equation (4.11.95) represents the system of equations

ẏ₁ = ā₀ yₙ,
ẏ₂ = −y₁ + ā₁ yₙ,
⋮
ẏₙ = −yₙ₋₁ + āₙ₋₁ yₙ.   (4.11.97)

Differentiating the last expression in Eq. (4.11.97) (n − 1) times, eliminating y₁, ..., yₙ₋₁, and letting yₙ = y, we obtain

(−1)ⁿ y^(n) + (−1)^{n−1} āₙ₋₁ y^(n−1) + ··· + (−1) ā₁ y^(1) + ā₀ y = 0.   (4.11.98)

Equation (4.11.98) is called the adjoint of Eq. (4.11.85).
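Exercise 4.11.94 suggests an easy numerical experiment: integrate Ψ̇ = A(t)Ψ and the adjoint Φ̇ = −A*(t)Φ side by side and watch Φ*Ψ stay constant in t. With the real, arbitrarily chosen A(t) below (so A* = Aᵀ), and both matrices started from I, the constant should be I.

```python
# Adjoint-system invariant: Phi^T Psi is constant when Psi' = A(t) Psi
# and Phi' = -A(t)^T Phi. Integrated side by side with RK4 on [0, 1].

def A(t):
    return [[0.0, 1.0 + t], [-1.0, 0.5 * t]]

def matT(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale_add(M, N, c):
    # returns M + c*N
    return [[M[i][j] + c * N[i][j] for j in range(2)] for i in range(2)]

def rk4(F, M, t, h):
    k1 = F(t, M)
    k2 = F(t + h/2, scale_add(M, k1, h/2))
    k3 = F(t + h/2, scale_add(M, k2, h/2))
    k4 = F(t + h, scale_add(M, k3, h))
    S = scale_add(scale_add(k1, k4, 1.0), scale_add(k2, k3, 1.0), 2.0)
    return scale_add(M, S, h/6)

steps, h = 1000, 1e-3
Psi = [[1.0, 0.0], [0.0, 1.0]]
Phi = [[1.0, 0.0], [0.0, 1.0]]
for step in range(steps):
    t = step * h
    Psi = rk4(lambda s, M: matmul(A(s), M), Psi, t, h)
    Phi = rk4(lambda s, M: [[-x for x in row] for row in matmul(matT(A(s)), M)],
              Phi, t, h)

C = matmul(matT(Phi), Psi)          # should equal Phi(0)^T Psi(0) = I
for i in range(2):
    for j in range(2):
        assert abs(C[i][j] - (1.0 if i == j else 0.0)) < 1e-6
```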
4.12. NOTES AND REFERENCES

There are many excellent texts on finite-dimensional vector spaces and matrices that can be used to supplement this chapter (see, e.g., [4.1], [4.2], [4.4], and [4.6]–[4.10]). References [4.1], [4.2], [4.6], and [4.10] include applications. (In particular, consult the references in [4.10] for a list of diversified areas of applications.) Excellent references on ordinary differential equations include [4.3], [4.5], and [4.11].

REFERENCES

[4.1] N. R. AMUNDSON, Mathematical Methods in Chemical Engineering: Matrices and Their Applications. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1966.
[4.2] R. E. BELLMAN, Introduction to Matrix Analysis. New York: McGraw-Hill Book Company, Inc., 1970.
[4.3] F. BRAUER and J. A. NOHEL, Qualitative Theory of Ordinary Differential Equations: An Introduction. New York: W. A. Benjamin, Inc., 1969.*
[4.4] E. T. BROWNE, Introduction to the Theory of Determinants and Matrices. Chapel Hill, N.C.: The University of North Carolina Press, 1958.
[4.5] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[4.6] F. R. GANTMACHER, Theory of Matrices. Vols. I, II. New York: Chelsea Publishing Company, 1959.
[4.7] P. R. HALMOS, Finite-Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[4.8] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1961.
[4.9] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[4.10] B. NOBLE, Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1969.
[4.11] L. S. PONTRYAGIN, Ordinary Differential Equations. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1962.

*Reprinted by Dover Publications, Inc., New York, 1989.
5

METRIC SPACES

Up to this point in our development we have concerned ourselves primarily with algebraic structure of mathematical systems. In the present chapter we focus our attention on topological structure. In doing so, we introduce the concepts of "distance" and "closeness." In the final two chapters we will consider mathematical systems endowed with algebraic as well as topological structure.

A generalization of the concept of "distance" is the notion of metric. Using the terminology from geometry, we will refer to elements of an arbitrary set X as points, and we will characterize a metric as a real-valued, non-negative function on X × X satisfying the properties of "distance" between two points of X. We will refer to a mathematical system consisting of a basic set X and a metric defined on it as a metric space. We emphasize that in the present chapter the underlying space X need not be a linear space.

In the first nine sections of the present chapter we establish several basic facts from the theory of metric spaces, while in the last section of the present chapter, which consists of two parts, we consider some applications of the material in the present chapter.
5.1. DEFINITION OF METRIC SPACE

We begin with the following definition of metric and metric space.

5.1.1. Definition. Let X be an arbitrary non-empty set, and let ρ be a real-valued function on X × X, i.e., ρ: X × X → R, where ρ has the following properties:

(i) ρ(x, y) ≥ 0 for all x, y ∈ X, and ρ(x, y) = 0 if and only if x = y;
(ii) ρ(x, y) = ρ(y, x) for all x, y ∈ X; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X.

The function ρ is called a metric on X, and the mathematical system consisting of ρ and X, {X; ρ}, is called a metric space.

The set X is often called the underlying set of the metric space, the elements of X are often called points, and ρ(x, y) is frequently called the distance from a point x ∈ X to a point y ∈ X. In view of axiom (i) the distance between two different points is a unique positive number and is equal to zero if and only if the two points coincide. Axiom (ii) indicates that the distance between points x and y is equal to the distance between points y and x. Axiom (iii) represents the well-known triangle inequality encountered, for example, in plane geometry. Clearly, if ρ is a metric for X and if α is any real positive number, then the function αρ(x, y) is also a metric for X. We are thus in a position to define infinitely many metrics on X.

The above definition of metric was motivated by our notion of distance. Our next result enables us to define a metric in an equivalent (and often convenient) way.

5.1.2. Theorem. Let ρ: X × X → R. Then ρ is a metric if and only if

(i) ρ(x, y) = 0 if and only if x = y; and
(ii) ρ(y, z) ≤ ρ(x, y) + ρ(x, z) for all x, y, z ∈ X.

Proof. The necessity is obvious. To prove sufficiency, let x, y, z ∈ X with y = z. Then 0 = ρ(y, y) ≤ 2ρ(x, y). Hence, ρ(x, y) ≥ 0 for all x, y ∈ X. Next, let z = x. Then ρ(y, x) ≤ ρ(x, y). Since x and y are arbitrary, we can reverse their roles and conclude ρ(x, y) ≤ ρ(y, x). Therefore, ρ(x, y) = ρ(y, x) for all x, y ∈ X. This proves that ρ is a metric. ■

Different metrics defined on the same underlying set X yield different metric spaces. In applications, the choice of a specific metric is often dictated by the particular problem on hand. If in a particular situation the metric ρ is understood, then we simply write X in place of {X; ρ} to denote the particular metric space under consideration. Let us now consider a few examples of metric spaces.
5.1. Definition of Metric Space
5.1.3. Example. Let X be the set of real numbers R, and let the function p on R × R be defined as

     p(x, y) = |x − y|                                               (5.1.4)

for all x, y ∈ R, where |x| denotes the absolute value of x. Now clearly p(x, y) = |x − y| = 0 if and only if x = y. Also, for all x, y, z ∈ R, we have

     p(y, z) = |y − z| = |(y − x) + (x − z)| ≤ |x − y| + |x − z| = p(x, y) + p(x, z).

Therefore, by Theorem 5.1.2, p is a metric and {R; p} is a metric space. We call p(x, y) defined by Eq. (5.1.4) the usual metric on R, and we call the metric space {R; p} the real line. ∎
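Definition 5.1.1 lends itself to a quick numerical spot check. The helper below is an illustrative sketch (the function names are ours, not the text's): it tests the metric axioms on a finite sample of points, which can refute a candidate metric but, of course, cannot prove that the axioms hold for all points.

```python
import itertools

def check_metric_axioms(p, points, tol=1e-12):
    """Test the axioms of Definition 5.1.1 on a finite sample of points."""
    for x, y in itertools.product(points, repeat=2):
        if p(x, y) < -tol:                      # axiom (i): non-negativity
            return False
        if (x == y) != (abs(p(x, y)) <= tol):   # axiom (i): p = 0 iff x = y
            return False
        if abs(p(x, y) - p(y, x)) > tol:        # axiom (ii): symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if p(x, y) > p(x, z) + p(z, y) + tol:   # axiom (iii): triangle inequality
            return False
    return True

usual = lambda x, y: abs(x - y)
print(check_metric_axioms(usual, [-2.5, 0.0, 1.0, 3.75]))              # True
print(check_metric_axioms(lambda x, y: (x - y) ** 2, [0.0, 1.0, 2.0]))  # False
```

The second call fails because (x − y)² violates the triangle inequality (4 > 1 + 1), anticipating Exercise 5.3.18.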
5.1.5. Example. Let X be the set of all complex numbers C. If z ∈ C, then z = a + ib, where i = √−1 and a, b are real numbers. Let z* = a − ib denote the complex conjugate of z, and define p as

     p(z_1, z_2) = [(z_1 − z_2)(z_1 − z_2)*]^{1/2}.                  (5.1.6)

It can readily be shown that {C; p} is a metric space. We call (5.1.6) the usual metric for C. ∎

5.1.7. Example. Let X be an arbitrary non-empty set, and define the function p on X × X as

     p(x, y) = 0 if x = y,   p(x, y) = 1 if x ≠ y.                   (5.1.8)

Clearly p(x, y) ≥ 0 for all x, y ∈ X, p(x, x) = 0 for all x ∈ X, and p(x, y) ≤ p(x, z) + p(z, y) for all x, y, z ∈ X. Therefore, (5.1.8) is a metric on X. The function defined in Eq. (5.1.8) is called the discrete metric and is important in analysis because it can be used to metrize any set X. ∎
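As an illustrative sketch of Eq. (5.1.8), the discrete metric can be coded for an arbitrary set of Python objects; the triangle inequality holds because the left side is at most 1, while the right side is 0 only when x = z = y, which forces the left side to 0 as well.

```python
import itertools

def discrete(x, y):
    """The discrete metric of Eq. (5.1.8): 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

# Spot-check the triangle inequality on a mixed sample of points.
sample = ["a", "b", (1, 2), 3.5]
ok = all(discrete(x, y) <= discrete(x, z) + discrete(z, y)
         for x, y, z in itertools.product(sample, repeat=3))
print(ok)  # True
```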
We distinguish between bounded and unbounded metric spaces.

5.1.9. Definition. Let {X; p} be a metric space. If there exists a positive number r such that p(x, y) < r for all x, y ∈ X, we say {X; p} is a bounded metric space. If {X; p} is not bounded, we say {X; p} is an unbounded metric space.

If {X; p} is an unbounded metric space, then p takes on arbitrarily large values. The metric spaces in Examples 5.1.3 and 5.1.5 are unbounded, whereas the metric space in Example 5.1.7 is clearly bounded.

5.1.10. Exercise. Let {X; p} be an arbitrary metric space. Define the function p_1: X × X → R by

     p_1(x, y) = p(x, y) / [1 + p(x, y)].                            (5.1.11)

Show that p_1(x, y) is a metric. Show that {X; p_1} is a bounded metric space, even though {X; p} may not be bounded. Thus, the function (5.1.11) can be used to generate a bounded metric space from any unbounded metric space. (Hint: Show that if f: R → R is given by f(t) = t/(1 + t), then f(t_1) < f(t_2) for all t_1, t_2 such that 0 ≤ t_1 < t_2.)
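The effect of Eq. (5.1.11) is easy to observe numerically. The following sketch (names are ours) wraps an unbounded metric and shows that the resulting distances never reach 1, so the new space is bounded.

```python
def bounded_version(p):
    """Given a metric p, return p1(x, y) = p(x, y) / (1 + p(x, y)), as in Eq. (5.1.11)."""
    return lambda x, y: p(x, y) / (1.0 + p(x, y))

usual = lambda x, y: abs(x - y)
p1 = bounded_version(usual)

print(p1(0.0, 3.0))        # 0.75
print(p1(0.0, 1e9) < 1.0)  # True: however far apart the points, p1 stays below 1
```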
Subsequently, we will call

     R* = R ∪ {−∞} ∪ {+∞}

the extended real numbers. In the following exercise, we define a useful metric on R*. This metric is, of course, not the only metric possible.

5.1.12. Exercise. Let X = R* and define the function f: R* → R as

     f(x) = x/(1 + |x|) for x ∈ R,   f(+∞) = 1,   f(−∞) = −1.

Let p*: R* × R* → R be defined by p*(x, y) = |f(x) − f(y)| for all x, y ∈ R*. Show that {R*; p*} is a bounded metric space. The function p* is called the usual metric for R*, and {R*; p*} is called the extended real line.
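The construction of Exercise 5.1.12 can be tried out directly. The sketch below assumes the standard choice f(x) = x/(1 + |x|) on R with f(±∞) = ±1, which is consistent with the hint of Exercise 5.1.10; the function and variable names are ours.

```python
import math

def f(x):
    """Order-preserving map of R* into [-1, 1] (assumed form from Exercise 5.1.12)."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return -1.0
    return x / (1.0 + abs(x))

def p_star(x, y):
    """p*(x, y) = |f(x) - f(y)|, a bounded metric on the extended reals."""
    return abs(f(x) - f(y))

print(p_star(-math.inf, math.inf))  # 2.0, the largest possible distance in {R*; p*}
```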
We will have occasion to use the next result.

5.1.13. Theorem. Let {X; p} be a metric space, and let x, y, and z be any elements of X. Then

     |p(x, z) − p(y, z)| ≤ p(x, y)                                   (5.1.14)

for all x, y, z ∈ X.

Proof. From axiom (iii) of Definition 5.1.1 it follows that

     p(x, z) ≤ p(y, x) + p(y, z)                                     (5.1.15)

and

     p(y, z) ≤ p(y, x) + p(x, z).                                    (5.1.16)

From (5.1.15) we have

     p(x, z) − p(y, z) ≤ p(y, x),                                    (5.1.17)

and from (5.1.16) we have

     −p(y, x) ≤ p(x, z) − p(y, z).                                   (5.1.18)

In view of axiom (ii) of Definition 5.1.1 we have p(x, y) = p(y, x), and thus relations (5.1.17) and (5.1.18) imply

     −p(x, y) ≤ p(x, z) − p(y, z) ≤ p(x, y).

This proves that |p(x, z) − p(y, z)| ≤ p(x, y) for all x, y, z ∈ X. ∎
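Inequality (5.1.14), the reverse triangle inequality, can be spot-checked numerically on the real line; this is an illustrative sketch, not a proof.

```python
import itertools
import random

random.seed(0)
p = lambda x, y: abs(x - y)  # the usual metric on R
pts = [random.uniform(-10, 10) for _ in range(20)]

# Check |p(x, z) - p(y, z)| <= p(x, y) for every triple in the sample.
holds = all(abs(p(x, z) - p(y, z)) <= p(x, y) + 1e-12
            for x, y, z in itertools.product(pts, repeat=3))
print(holds)  # True
```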
The notion of metric makes it possible to consider various geometric concepts. We have:

5.1.19. Definition. Let {X; p} be a metric space, and let Y be a non-void subset of X. If p(x, y) is bounded for all x, y ∈ Y, we define the diameter of set Y, denoted δ(Y) or diam (Y), as

     δ(Y) = sup {p(x, y): x, y ∈ Y}.

If p(x, y) is unbounded, we write δ(Y) = ∞, and we say that Y has infinite diameter, or Y is unbounded. If Y is empty, we define δ(Y) = 0.

5.1.20. Exercise. Show that if Y ⊂ Z ⊂ X, where {X; p} is a metric space, then δ(Y) ≤ δ(Z). Also, show that if Z is non-empty, then δ(Z) = 0 if and only if Z is a singleton.

We also have:

5.1.21. Definition. Let {X; p} be a metric space, and let Y and Z be two non-void subsets of X. We define the distance between sets Y and Z as

     d(Y, Z) = inf {p(y, z): y ∈ Y, z ∈ Z}.

Let p ∈ X and define

     d(p, Z) = inf {p(p, z): z ∈ Z}.

We call d(p, Z) the distance between point p and set Z.

Since p(y, z) = p(z, y) for all y ∈ Y and z ∈ Z, it follows that d(Y, Z) = d(Z, Y). We note that, in general, d(Y, Z) = 0 does not imply that Y and Z have points in common. For example, let X be the real line with the usual metric p. If Y = {x ∈ X: 0 < x < 1} and Z = {x ∈ X: 1 < x < 2}, then clearly d(Y, Z) = 0, even though Y ∩ Z = ∅. Similarly, d(p, Z) = 0 does not imply that p ∈ Z.
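For finite point sets, the supremum and infimum in Definitions 5.1.19 and 5.1.21 reduce to a maximum and a minimum, so both quantities are directly computable. The sketch below (names are ours) also illustrates that disjoint sets can be at distance arbitrarily close to 0.

```python
import itertools

def diameter(points, p):
    """delta(Y) for a finite set Y: the largest pairwise distance (0 if fewer than 2 points)."""
    return max((p(x, y) for x, y in itertools.combinations(points, 2)), default=0)

def set_distance(Y, Z, p):
    """d(Y, Z) for finite sets: the smallest cross-pair distance."""
    return min(p(y, z) for y in Y for z in Z)

p = lambda x, y: abs(x - y)
print(diameter([1.0, 4.0, 2.5], p))  # 3.0
print(diameter([7.0], p))            # 0   (a singleton has diameter 0)

# Finite samples from the disjoint sets Y = (0, 1) and Z = (1, 2):
print(set_distance([0.5, 0.999], [1.001, 1.5], p))  # about 0.002
```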
5.1.22. Theorem. Let {X; p} be a metric space, and let Y be any non-void subset of X. If p′ denotes the restriction of p to Y × Y, i.e., if

     p′(x, y) = p(x, y) for all x, y ∈ Y,

then {Y; p′} is a metric space.

5.1.23. Exercise. Prove Theorem 5.1.22.

We call p′ the metric induced by p on Y, and we say that {Y; p′} is a metric subspace of {X; p} or simply a subspace of X. Since usually there is no room for confusion, we drop the prime from p′ and simply denote the metric subspace by {Y; p}. We emphasize that any non-void subset of a metric space can be made into a metric subspace. This is not so in the case of linear subspaces. If Y ≠ X, then we speak of a proper subspace.
5.2. SOME INEQUALITIES
In order to present some of the important metric spaces that arise in applications, we first need to establish some important inequalities. These are summarized and proved in the following:

5.2.1. Theorem. Let R denote the set of real numbers, and let C denote the set of complex numbers.

(i) Let p, q ∈ R be such that 1 < p < ∞ and 1/p + 1/q = 1. Then for all α, β ∈ R such that α ≥ 0 and β ≥ 0, we have

     αβ ≤ α^p/p + β^q/q.                                             (5.2.2)

(ii) (Hölder's inequality) Let p, q ∈ R be such that 1 < p < ∞ and 1/p + 1/q = 1.

(a) Finite Sums. Let n be any positive integer, and let ξ_1, ..., ξ_n and η_1, ..., η_n belong either to R or to C. Then

     Σ_{i=1}^n |ξ_i η_i| ≤ (Σ_{i=1}^n |ξ_i|^p)^{1/p} (Σ_{i=1}^n |η_i|^q)^{1/q}.        (5.2.3)

(b) Infinite Sums. Let {ξ_i} and {η_i} be infinite sequences in either R or C. If Σ_{i=1}^∞ |ξ_i|^p < ∞ and Σ_{i=1}^∞ |η_i|^q < ∞, then

     Σ_{i=1}^∞ |ξ_i η_i| ≤ (Σ_{i=1}^∞ |ξ_i|^p)^{1/p} (Σ_{i=1}^∞ |η_i|^q)^{1/q}.        (5.2.4)

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If ∫_a^b |f(t)|^p dt < ∞ and ∫_a^b |g(t)|^q dt < ∞ (integration is in the Riemann sense), then

     ∫_a^b |f(t)g(t)| dt ≤ [∫_a^b |f(t)|^p dt]^{1/p} [∫_a^b |g(t)|^q dt]^{1/q}.        (5.2.5)

(iii) (Minkowski's inequality) Let p ∈ R, where 1 ≤ p < ∞.

(a) Finite Sums. Let n be any positive integer, and let ξ_1, ..., ξ_n and η_1, ..., η_n belong either to R or to C. Then

     [Σ_{i=1}^n |ξ_i ± η_i|^p]^{1/p} ≤ [Σ_{i=1}^n |ξ_i|^p]^{1/p} + [Σ_{i=1}^n |η_i|^p]^{1/p}.        (5.2.6)

(b) Infinite Sums. Let {ξ_i} and {η_i} be infinite sequences in either R or C. If Σ_{i=1}^∞ |ξ_i|^p < ∞ and Σ_{i=1}^∞ |η_i|^p < ∞, then

     [Σ_{i=1}^∞ |ξ_i ± η_i|^p]^{1/p} ≤ [Σ_{i=1}^∞ |ξ_i|^p]^{1/p} + [Σ_{i=1}^∞ |η_i|^p]^{1/p}.        (5.2.7)

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If ∫_a^b |f(t)|^p dt < ∞ and ∫_a^b |g(t)|^p dt < ∞, then

     [∫_a^b |f(t) ± g(t)|^p dt]^{1/p} ≤ [∫_a^b |f(t)|^p dt]^{1/p} + [∫_a^b |g(t)|^p dt]^{1/p}.        (5.2.8)

Proof. To prove part (i), consider the graph of η = ξ^{p−1} in the (ξ, η) plane, depicted in Figure A. Since 1/p + 1/q = 1 implies (p − 1)(q − 1) = 1, the same curve is described by ξ = η^{q−1}. Let q_1 and q_2 denote the two areas indicated in Figure A; i.e.,

     q_1 = ∫_0^α ξ^{p−1} dξ = α^p/p   and   q_2 = ∫_0^β η^{q−1} dη = β^q/q.

From Figure A it is clear that q_1 + q_2 ≥ αβ for any choice of α, β ≥ 0, and hence relation (5.2.2) follows.

[5.2.9. Figure A. The curve η = ξ^{p−1}: the area q_1 between the curve and the ξ-axis over [0, α], together with the area q_2 between the curve and the η-axis over [0, β], covers the rectangle of area αβ.]

To prove part (iia), we first note that if (Σ_{i=1}^n |ξ_i|^p)^{1/p} = 0 or (Σ_{i=1}^n |η_i|^q)^{1/q} = 0, then inequality (5.2.3) follows trivially. Therefore, we assume that (Σ_{i=1}^n |ξ_i|^p)^{1/p} ≠ 0 and (Σ_{i=1}^n |η_i|^q)^{1/q} ≠ 0. From (5.2.2), with α = |ξ_i|/(Σ_{j=1}^n |ξ_j|^p)^{1/p} and β = |η_i|/(Σ_{j=1}^n |η_j|^q)^{1/q}, we now have

     |ξ_i| |η_i| / [(Σ_j |ξ_j|^p)^{1/p} (Σ_j |η_j|^q)^{1/q}] ≤ (1/p) |ξ_i|^p / Σ_j |ξ_j|^p + (1/q) |η_i|^q / Σ_j |η_j|^q.

It now follows, upon summing over i from 1 to n, that

     Σ_{i=1}^n |ξ_i η_i| / [(Σ_j |ξ_j|^p)^{1/p} (Σ_j |η_j|^q)^{1/q}] ≤ 1/p + 1/q = 1,

which was to be proved.

To prove part (iib), we note that for any positive integer n,

     Σ_{i=1}^n |ξ_i η_i| ≤ (Σ_{i=1}^n |ξ_i|^p)^{1/p} (Σ_{i=1}^n |η_i|^q)^{1/q} ≤ (Σ_{i=1}^∞ |ξ_i|^p)^{1/p} (Σ_{i=1}^∞ |η_i|^q)^{1/q}.

If we let n → ∞ in the above inequality, then (5.2.4) follows. The proof of part (iic) is established in a similar fashion. We leave the details of the proof to the reader.

To prove part (iiia), we first note that if p = 1, then inequality (5.2.6) follows trivially. It therefore suffices to consider the case 1 < p < ∞. We observe that for any ξ_i and η_i we have

     (|ξ_i| + |η_i|)^p = (|ξ_i| + |η_i|)^{p−1} |ξ_i| + (|ξ_i| + |η_i|)^{p−1} |η_i|.

Summing the above identity with respect to i from 1 to n, we now have

     Σ_{i=1}^n (|ξ_i| + |η_i|)^p = Σ_{i=1}^n (|ξ_i| + |η_i|)^{p−1} |ξ_i| + Σ_{i=1}^n (|ξ_i| + |η_i|)^{p−1} |η_i|.

Applying the Hölder inequality (5.2.3) to each of the sums on the right side of the above relation and noting that (p − 1)q = p, we now obtain

     Σ_{i=1}^n (|ξ_i| + |η_i|)^p ≤ [Σ_{i=1}^n (|ξ_i| + |η_i|)^p]^{1/q} {[Σ_{i=1}^n |ξ_i|^p]^{1/p} + [Σ_{i=1}^n |η_i|^p]^{1/p}}.

If we assume that [Σ_{i=1}^n (|ξ_i| + |η_i|)^p]^{1/q} ≠ 0 and divide both sides of the above inequality by this term, we have, since 1 − 1/q = 1/p,

     [Σ_{i=1}^n (|ξ_i| + |η_i|)^p]^{1/p} ≤ [Σ_{i=1}^n |ξ_i|^p]^{1/p} + [Σ_{i=1}^n |η_i|^p]^{1/p}.

Since |ξ_i ± η_i| ≤ |ξ_i| + |η_i|, the desired result follows. We note that in case [Σ_{i=1}^n (|ξ_i| + |η_i|)^p]^{1/q} = 0, inequality (5.2.6) follows trivially. Applying the same reasoning as above, the reader can now prove the Minkowski inequality for infinite sums and for integrals. ∎

If in (5.2.3), (5.2.4), or (5.2.5) we let p = q = 2, then we speak of the Schwarz inequality for finite sums, infinite sums, and integrals, respectively.
5.2.10. Exercise. Prove Hölder's inequality for integrals (5.2.5), Minkowski's inequality for infinite sums (5.2.7), and Minkowski's inequality for integrals (5.2.8).
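The finite-sum forms (5.2.3) and (5.2.6) are easy to exercise numerically. The sketch below (a spot check, not a proof) evaluates both sides of Hölder's and Minkowski's inequalities on random vectors.

```python
import random

random.seed(1)
p = 3.0
q = p / (p - 1)  # conjugate exponent, so that 1/p + 1/q = 1
xs = [random.uniform(-1, 1) for _ in range(50)]
ys = [random.uniform(-1, 1) for _ in range(50)]

norm = lambda v, r: sum(abs(t) ** r for t in v) ** (1.0 / r)

# Hoelder (5.2.3):  sum |x_i y_i| <= ||x||_p * ||y||_q
holder = sum(abs(a * b) for a, b in zip(xs, ys)) <= norm(xs, p) * norm(ys, q) + 1e-12
# Minkowski (5.2.6):  ||x + y||_p <= ||x||_p + ||y||_p
minkowski = norm([a + b for a, b in zip(xs, ys)], p) <= norm(xs, p) + norm(ys, p) + 1e-12
print(holder, minkowski)  # True True
```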
5.3. EXAMPLES OF IMPORTANT METRIC SPACES
In the present section we consider specific examples of metric spaces which are very important in applications. It turns out that all of the spaces of this section are also vector spaces. As in Chapter 4, we denote elements x, y ∈ R^n (elements x, y ∈ C^n) by x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_n), respectively, where ξ_i, η_i ∈ R for i = 1, ..., n (where ξ_i, η_i ∈ C for i = 1, ..., n). Similarly, elements x, y ∈ R^∞ (elements x, y ∈ C^∞) are denoted by x = (ξ_1, ξ_2, ...) and y = (η_1, η_2, ...), respectively, where ξ_i, η_i ∈ R for all i (where ξ_i, η_i ∈ C for all i).

5.3.1. Example. Let X = R^n (let X = C^n), let 1 ≤ p < ∞, and let

     p_p(x, y) = [Σ_{i=1}^n |ξ_i − η_i|^p]^{1/p}.                    (5.3.2)

We now show that {R^n; p_p} ({C^n; p_p}) is a metric space. Axioms (i) and (ii) of Definition 5.1.1 are readily verified. To show that axiom (iii) is satisfied, let a, b, d ∈ R^n (let a, b, d ∈ C^n), where a = (α_1, ..., α_n), b = (β_1, ..., β_n), and d = (δ_1, ..., δ_n). Using the Minkowski inequality (5.2.6), we have

     p_p(a, d) = {Σ_{i=1}^n |α_i − δ_i|^p}^{1/p} = {Σ_{i=1}^n |(α_i − β_i) + (β_i − δ_i)|^p}^{1/p}
               ≤ {Σ_{i=1}^n |α_i − β_i|^p}^{1/p} + {Σ_{i=1}^n |β_i − δ_i|^p}^{1/p} = p_p(a, b) + p_p(b, d),

the triangle inequality. It thus follows that {R^n; p_p} ({C^n; p_p}) is a metric space; in fact, it is an unbounded metric space. We frequently abbreviate {R^n; p_p} by R_p^n and {C^n; p_p} by C_p^n. For the case p = 2, we call p_2 the Euclidean metric or the usual metric on R^n. ∎

5.3.3. Example.
For x, y ∈ R^n (for x, y ∈ C^n), let

     p_∞(x, y) = max {|ξ_1 − η_1|, ..., |ξ_n − η_n|}.                (5.3.4)

It is readily shown that {R^n; p_∞} ({C^n; p_∞}) is a metric space. ∎

5.3.5. Example. Let X = R^∞ (or X = C^∞), let 1 ≤ p < ∞, and define

     l_p = {x ∈ X: Σ_{i=1}^∞ |ξ_i|^p < ∞}.                           (5.3.6)

For x, y ∈ l_p, let

     p_p(x, y) = [Σ_{i=1}^∞ |ξ_i − η_i|^p]^{1/p}.                    (5.3.7)

We can readily verify that {l_p; p_p} is a metric space. ∎

5.3.8. Example. Let X = R^∞ (or X = C^∞), and let

     l_∞ = {x ∈ X: sup_i |ξ_i| < ∞}.                                 (5.3.9)

For x, y ∈ l_∞, define

     p_∞(x, y) = sup_i {|ξ_i − η_i|}.                                (5.3.10)

We can easily show that {l_∞; p_∞} is a metric space. ∎

5.3.11. Exercise. Use the inequalities of Section 5.2 to show that the spaces of Examples 5.3.3, 5.3.5, and 5.3.8 are metric spaces.

5.3.12. Example. Let [a, b], a < b, be an interval on the real line, and let C[a, b] be the set of all real-valued continuous functions defined on [a, b]. Let 1 ≤ p < ∞, and for x, y ∈ C[a, b] define

     p_p(x, y) = [∫_a^b |x(t) − y(t)|^p dt]^{1/p}.                   (5.3.13)
We now show that {C[a, b]; p_p} is a metric space. Clearly, p_p(x, y) = p_p(y, x), and p_p(x, y) ≥ 0 for all x, y ∈ C[a, b]. If x(t) = y(t) for all t ∈ [a, b], then p_p(x, y) = 0. To prove the converse of this statement, suppose that x(t) ≠ y(t) for some t ∈ [a, b]. Since x, y ∈ C[a, b], x − y ∈ C[a, b], and there is some interval in [a, b], i.e., a subinterval of [a, b], such that |x(t) − y(t)| > 0 for all t in that subinterval. Hence,

     [∫_a^b |x(t) − y(t)|^p dt]^{1/p} > 0.

Therefore, p_p(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b]. To show that the triangle inequality holds, let u, v, w ∈ C[a, b]. Then we have, from inequality (5.2.8) with f = u − v and g = v − w,

     p_p(u, w) = {∫_a^b |u(t) − w(t)|^p dt}^{1/p}
               = {∫_a^b |[u(t) − v(t)] + [v(t) − w(t)]|^p dt}^{1/p}
               ≤ {∫_a^b |u(t) − v(t)|^p dt}^{1/p} + {∫_a^b |v(t) − w(t)|^p dt}^{1/p}
               = p_p(u, v) + p_p(v, w),

the triangle inequality. It now follows that {C[a, b]; p_p} is a metric space. It is easy to see that this space is an unbounded metric space. ∎
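For concrete functions, the integral in Eq. (5.3.13) can be approximated by a Riemann sum. The sketch below (our own illustration, using a midpoint rule) computes the p = 2 distance between x(t) = t and y(t) = 0 on [0, 1], whose exact value is (∫_0^1 t² dt)^{1/2} = 1/√3 ≈ 0.5774.

```python
def lp_metric(x, y, a, b, p, n=20000):
    """Midpoint Riemann-sum approximation of Eq. (5.3.13):
    p_p(x, y) = [ integral_a^b |x(t) - y(t)|^p dt ]^(1/p)."""
    h = (b - a) / n
    s = sum(abs(x(a + (k + 0.5) * h) - y(a + (k + 0.5) * h)) ** p for k in range(n))
    return (s * h) ** (1.0 / p)

d = lp_metric(lambda t: t, lambda t: 0.0, 0.0, 1.0, 2)
print(round(d, 4))  # 0.5774
```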
5.3.14. Example. Let C[a, b] be defined as in the preceding example. For x, y ∈ C[a, b], let

     p_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.                          (5.3.15)

To show that {C[a, b]; p_∞} is a metric space, we first note that p_∞(x, y) = p_∞(y, x), that p_∞(x, y) ≥ 0 for all x, y, and that p_∞(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b]. To show that p_∞ satisfies the triangle inequality, we note that for any z ∈ C[a, b],

     p_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|
               = sup_{a≤t≤b} |x(t) − z(t) + z(t) − y(t)|
               ≤ sup_{a≤t≤b} {|x(t) − z(t)| + |z(t) − y(t)|}
               ≤ p_∞(x, z) + p_∞(z, y).

It thus follows that {C[a, b]; p_∞} is a metric space. ∎

In Figure B, several metrics considered in Section 5.1 and in the present section are depicted pictorially.

[5.3.16. Figure B. Illustration of various metrics: the usual metric p(x, y) = |x − y| on X = R; the metrics p_1(x, y) = |ξ_1 − η_1| + |ξ_2 − η_2|, p_2(x, y) = [(ξ_1 − η_1)² + (ξ_2 − η_2)²]^{1/2}, and p_∞(x, y) = max {|ξ_1 − η_1|, |ξ_2 − η_2|} on X = R²; and p_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)| on X = C[a, b].]
5.3.17. Exercise. Show that the metric defined in Eq. (5.3.4) is equivalent to

     p_∞(x, y) = lim_{p→∞} [Σ_{i=1}^n |ξ_i − η_i|^p]^{1/p}.

5.3.18. Exercise. Let X = R denote the set of real numbers, and define d(x, y) = (x − y)² for all x, y ∈ R. Show that the function d is not a metric. This illustrates the necessity for the exponent 1/p in Eq. (5.3.2).
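Both exercises can be explored numerically. In the sketch below (an illustration, not a proof), the p-metric of Eq. (5.3.2) is evaluated for increasing p and is seen to approach the max metric of Eq. (5.3.4), while d(x, y) = (x − y)² is seen to violate the triangle inequality.

```python
pm = lambda x, y, p: sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
for p in (1, 2, 8, 32):
    print(p, pm(x, y, p))
# 7.0, 5.0, ... decreasing toward max(3, 4) = 4, the p_inf distance of Eq. (5.3.4)

# Exercise 5.3.18: the triangle inequality fails for d(x, y) = (x - y)^2,
# since d(0, 2) = 4 > d(0, 1) + d(1, 2) = 2.
d = lambda u, v: (u - v) ** 2
print(d(0, 2) <= d(0, 1) + d(1, 2))  # False
```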
We conclude the present section by considering Cartesian products of metric spaces. Let {X; p_x} and {Y; p_y} be two metric spaces, and let Z = X × Y. Utilizing the metrics p_x and p_y, we can define metrics on Z in an infinite variety of ways. Some of the more interesting cases are given in the following:

5.3.19. Theorem. Let {X; p_x} and {Y; p_y} be metric spaces, let Z = X × Y, and let z_1 = (x_1, y_1) and z_2 = (x_2, y_2) be two points of Z = X × Y. Define the functions

     p_p(z_1, z_2) = {[p_x(x_1, x_2)]^p + [p_y(y_1, y_2)]^p}^{1/p},   1 ≤ p < ∞,

and

     p_∞(z_1, z_2) = max {p_x(x_1, x_2), p_y(y_1, y_2)}.

Then {Z; p_p} and {Z; p_∞} are metric spaces.

The spaces {Z; p_p} and {Z; p_∞} are examples of product (metric) spaces.

5.3.20. Exercise. Prove Theorem 5.3.19.
We can extend the above concept to the product of n metric spaces. We have:

5.3.21. Theorem. Let {X_1; p_1}, ..., {X_n; p_n} be n metric spaces, and let X = X_1 × ... × X_n = Π_{i=1}^n X_i. For x = (x_1, ..., x_n) ∈ X and y = (y_1, ..., y_n) ∈ X, define the functions

     p′(x, y) = Σ_{i=1}^n p_i(x_i, y_i)

and

     p″(x, y) = [Σ_{i=1}^n [p_i(x_i, y_i)]²]^{1/2}.

Then {X; p′} and {X; p″} are metric spaces.

5.3.22. Exercise. Prove Theorem 5.3.21.
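The metric p′ of Theorem 5.3.21 sums per-coordinate distances, and the component spaces need not be alike. The sketch below (names are ours) forms a product of the real line with a discrete space from Example 5.1.7.

```python
def product_metric(ms, x, y):
    """p'(x, y) = sum of the coordinate distances, as in Theorem 5.3.21.
    `ms` is a list of per-coordinate metrics."""
    return sum(m(a, b) for m, a, b in zip(ms, x, y))

usual = lambda a, b: abs(a - b)
disc = lambda a, b: 0 if a == b else 1   # discrete metric (Example 5.1.7)

# One real coordinate and one discrete coordinate:
print(product_metric([usual, disc], (1.5, "red"), (4.0, "blue")))  # 3.5
```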
5.4. OPEN AND CLOSED SETS

Having introduced the notion of metric, we are now in a position to consider several important fundamental concepts which we will need throughout the remainder of this book. In the present section {X; p} will denote an arbitrary metric space.

5.4.1. Definition. Let x_0 ∈ X and let r ∈ R, r > 0. An open sphere or open ball, denoted by S(x_0; r), is defined as the set S(x_0; r) = {x ∈ X: p(x, x_0) < r}. We call the fixed point x_0 the center and the number r the radius of S(x_0; r). For simplicity, we often call an open sphere simply a sphere.

The radius of a sphere is always positive and finite. In place of the terms ball or sphere we also use the term spherical neighborhood of x_0. In Figure C, spheres in several types of metric spaces considered in the previous sections are depicted. Note that in these figures the indicated spheres do not include boundaries.

5.4.3. Exercise. Describe the open sphere in R^n as a function of r if the metric is the discrete metric of Example 5.1.7.
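Exercise 5.4.3 can be previewed computationally. The sketch below (names are ours) lists the points of a finite sample that fall inside an open sphere; under the discrete metric the answer depends only on whether r exceeds 1.

```python
def open_sphere(center, r, points, p):
    """The open sphere S(center; r) restricted to a finite sample of points."""
    return {x for x in points if p(x, center) < r}

disc = lambda x, y: 0 if x == y else 1   # discrete metric (Example 5.1.7)
pts = {0, 1, 2, 3}
print(open_sphere(0, 0.5, pts, disc))  # {0}: for r <= 1 the sphere is just the center
print(open_sphere(0, 1.5, pts, disc))  # the whole set: for r > 1 every point is inside
```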
We can now categorize the points or elements of a metric space in several ways.

5.4.4. Definition. Let Y be a subset of X. A point x ∈ X is called a contact point or adherent point of set Y if every open sphere with center x contains at least one point of Y. The set of all adherent points of Y is called the closure of Y and is denoted by Ȳ.

We note that every point of Y is an adherent point of Y; however, there may be points not in Y which are also adherent points of Y.

5.4.5. Definition. Let Y be a subset of X, and let x ∈ X be an adherent point of Y. Then x is called an isolated point if there is a sphere with center x
[5.4.2. Figure C. Spheres S(x_0; r) in various metric spaces: in X = R with p(x, y) = |x − y|; in X = R² with p_2(x, y) = [(ξ_1 − η_1)² + (ξ_2 − η_2)²]^{1/2}, with p_1(x, y) = |ξ_1 − η_1| + |ξ_2 − η_2|, and with p_∞(x, y) = max {|ξ_1 − η_1|, |ξ_2 − η_2|}; and in X = C[a, b] with p_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.]
which contains no point of Y other than x itself. The point x is called a limit point or point of accumulation of set Y if every sphere with center at x contains an infinite number of points of Y. The set of all limit points of Y is called the derived set of Y and is denoted by Y′.

Our next result shows that adherent points are either limit points or isolated points.

5.4.6. Theorem. Let Y be a subset of X and let x ∈ X. If x is an adherent point of Y, then x is either a limit point or an isolated point.

Proof. We prove the theorem by assuming that x is an adherent point of Y but not an isolated point. We must then show that x is a limit point of Y. To do so, consider the family of spheres S(x; 1/n) for n = 1, 2, .... Let x_n ∈ S(x; 1/n) be such that x_n ∈ Y but x_n ≠ x for each n; such an x_n exists because x is an adherent point of Y but not an isolated point. Now suppose there are only a finite number of distinct such points x_n, say {x_1, ..., x_k}. If we let d = min_{1≤i≤k} p(x, x_i), then d > 0. But this contradicts the fact that there is an x_n ∈ S(x; 1/n) for every n = 1, 2, 3, .... Hence, there are infinitely many x_n, and thus x is a limit point of Y. ∎
We can now categorize adherent points of Y ⊂ X into the following three classes: (a) isolated points of Y, which always belong to Y; (b) points of accumulation which belong to Y; and (c) points of accumulation which do not belong to Y.

5.4.7. Example. Let X = R, let p be the usual metric, and let Y = {x ∈ R: 0 < x < 1, or x = 2}, as depicted in Figure D. The element x = 2 is an isolated point of Y, the elements 0 and 1 are adherent points of Y which do not belong to Y, and each point of the set {x ∈ R: 0 < x < 1} is a limit point of Y belonging to Y. ∎

[5.4.8. Figure D. The set Y = {x ∈ R: 0 < x < 1, or x = 2} of Example 5.4.7, shown on the number line as the open interval (0, 1) together with the isolated point 2.]
5.4.9. Example. Let {R; p} be the real line with the usual metric, and let Q be the set of rational numbers in R. For every x ∈ R, any open sphere S(x; r) contains a point in Q. Thus, every point in R is an adherent point of Q; i.e., R ⊂ Q̄. Since Q̄ ⊂ R, it follows that R = Q̄. Clearly, there are no isolated points in Q. Also, for any x ∈ R, every sphere S(x; r) contains
an infinite number of points in Q. Therefore, every point in R is a limit point of Q; i.e., R ⊂ Q′. This implies that Q′ = R. ∎

Let us now consider the following basic results.

5.4.10. Theorem. Let Y and Z be subsets of X, and let Ȳ and Z̄ denote the closures of Y and Z, respectively. Let Y′ be the derived set of Y. Then
(i) Y ⊂ Ȳ;
(ii) the closure of Ȳ is equal to Ȳ;
(iii) if Y ⊂ Z, then Ȳ ⊂ Z̄;
(iv) the closure of Y ∪ Z is equal to Ȳ ∪ Z̄;
(v) the closure of Y ∩ Z is contained in Ȳ ∩ Z̄; and
(vi) Ȳ = Y ∪ Y′.

Proof. To prove the first part, let x ∈ Y. Then x ∈ S(x; r) for every r > 0. Hence, x ∈ Ȳ. Therefore, Y ⊂ Ȳ.

To prove the second part, let x be an adherent point of Ȳ, and let r > 0. Then there is an x_1 ∈ Ȳ such that x_1 ∈ S(x; r), and hence p(x, x_1) = r_1 < r. Let r_0 = r − r_1 > 0. We now wish to show that S(x_1; r_0) ⊂ S(x; r). In doing so, let y ∈ S(x_1; r_0). Then p(y, x_1) < r_0. By the triangle inequality we have p(x, y) ≤ p(x, x_1) + p(x_1, y) < r_1 + (r − r_1) = r, and hence y ∈ S(x; r). Since x_1 ∈ Ȳ, the sphere S(x_1; r_0) contains a point x_2 ∈ Y. Thus, x_2 ∈ S(x; r). Since S(x; r) is an arbitrary spherical neighborhood of x, we have x ∈ Ȳ. This proves that every adherent point of Ȳ belongs to Ȳ. Also, in view of part (i), Ȳ is contained in its closure. Therefore, the closure of Ȳ is equal to Ȳ.

To prove the third part, let r > 0 and let x ∈ Ȳ. Then there is a y ∈ Y such that y ∈ S(x; r). Since Y ⊂ Z, y ∈ Z, and thus x is an adherent point of Z.

To prove the fourth part, let W = Y ∪ Z. Note that Y ⊂ W and Z ⊂ W. From part (iii) it now follows that Ȳ ⊂ W̄ and Z̄ ⊂ W̄. Thus, Ȳ ∪ Z̄ ⊂ W̄. To show that W̄ ⊂ Ȳ ∪ Z̄, let x ∈ W̄ and suppose that x ∉ Ȳ ∪ Z̄. Then there exist spheres S(x; r_1) and S(x; r_2) such that S(x; r_1) ∩ Y = ∅ and S(x; r_2) ∩ Z = ∅. Let r = min {r_1, r_2}. Then S(x; r) ∩ (Y ∪ Z) = ∅. But this is impossible since x ∈ W̄. Hence, x ∈ Ȳ ∪ Z̄, and thus W̄ ⊂ Ȳ ∪ Z̄.

The proof of the remainder of the theorem is left as an exercise. ∎

5.4.11. Exercise. Prove parts (v) and (vi) of Theorem 5.4.10.
We can further classify points and subsets of metric spaces.

5.4.12. Definition. Let Y be a subset of X, and let Y− denote the complement of Y. A point x ∈ X is called an interior point of the set Y if there exists a sphere S(x; r) such that S(x; r) ⊂ Y. The set of all interior points of set Y is called the interior of Y and is denoted by Y°. A point x ∈ X is an exterior point of Y if it is an interior point of the complement of Y. The exterior of Y is the set of all exterior points of set Y. The set of all points x ∈ X that belong to both Ȳ and the closure of Y− is called the frontier of set Y. The boundary of a set Y is the set of all points in the frontier of Y which belong to Y.

5.4.13. Example. Let {R; p} be the real line with the usual metric, and let Y = {y ∈ R: 0 < y ≤ 1} = (0, 1]. The interior of Y is the set (0, 1) = {y ∈ R: 0 < y < 1}. The exterior of Y is the set (−∞, 0) ∪ (1, +∞), Ȳ = {y ∈ R: 0 ≤ y ≤ 1} = [0, 1], and the closure of Y− = (−∞, 0] ∪ (1, +∞) is (−∞, 0] ∪ [1, +∞). Thus, the frontier of Y is the set {0, 1}, and the boundary of Y is the singleton {1}. ∎

We now introduce the following important concepts.

5.4.14. Definition. A subset Y of X is said to be an open subset of X if every point of Y is an interior point of Y; i.e., Y = Y°. A subset Z of X is said to be a closed subset of X if Z = Z̄.

When there is no room for confusion, we usually call Y an open set and Z a closed set. On occasions when we want to be very explicit, we will say that Y is open relative to {X; p} or with respect to {X; p}.

In our next result we establish some of the important properties of open sets.
5.4.15. Theorem.
(i) X and ∅ are open sets.
(ii) If {Y_α}_{α∈A} is an arbitrary family of open subsets of X, then ∪_{α∈A} Y_α is an open set.
(iii) The intersection of a finite number of open sets of X is open.

Proof. To prove the first part, note that for every x ∈ X, any sphere S(x; r) ⊂ X. Hence, every point in X is an interior point. Thus, X is open. Also, observe that ∅ has no points, and therefore every point of ∅ is (vacuously) an interior point of ∅. Hence, ∅ is an open subset of X.

To prove the second part, let {Y_α}_{α∈A} be a family of open sets in X, and let Y = ∪_{α∈A} Y_α. If Y_α is empty for every α ∈ A, then Y = ∅ is an open subset of X. Now suppose that Y ≠ ∅, and let x ∈ Y. Then x ∈ Y_α for some α ∈ A. Since Y_α is an open set, there is a sphere S(x; r) such that S(x; r) ⊂ Y_α. Hence, S(x; r) ⊂ Y, and thus x is an interior point of Y. Therefore, Y is an open set.

To prove the third part, let Y_1 and Y_2 be open subsets of X. If Y_1 ∩ Y_2 = ∅, then Y_1 ∩ Y_2 is open. So let us assume that Y_1 ∩ Y_2 ≠ ∅, and let x ∈ Y_1 ∩ Y_2. Since x ∈ Y_1, there is an r_1 > 0 such that x ∈ S(x; r_1) ⊂ Y_1. Similarly, there is an r_2 > 0 such that x ∈ S(x; r_2) ⊂ Y_2. Let r = min {r_1, r_2}. Then x ∈ S(x; r), where S(x; r) ⊂ S(x; r_1) and S(x; r) ⊂ S(x; r_2). Thus, S(x; r) ⊂ Y_1 ∩ Y_2, and x is an interior point of Y_1 ∩ Y_2. Hence, Y_1 ∩ Y_2 is an open subset of X. By induction, we can show that the intersection of any finite number of open subsets of X is open. ∎
We now make the following:

5.4.16. Definition. Let {X; p} be a metric space. The topology of X determined by p is defined to be the family of all open subsets of X.
In our next result we establish a connection between open and closed subsets of X.

5.4.17. Theorem.
(i) X and ∅ are closed sets.
(ii) If Y is an open subset of X, then Y− is closed.
(iii) If Z is a closed subset of X, then Z− is open.

Proof. The first part of this theorem follows immediately from the definitions of X, ∅, and closed set.

To prove the second part, let Y be any open subset of X. We may assume that Y ≠ ∅ and Y ≠ X. Let x be any adherent point of Y−. Then x cannot belong to Y, for if it did, then there would exist a sphere S(x; r) ⊂ Y, which is impossible. Therefore, every adherent point of Y− belongs to Y−, and thus Y− is closed if Y is open.

To prove the third part, let Z be any closed subset of X. Again, we may assume that Z ≠ ∅ and Z ≠ X. Let x ∈ Z−. Then there exists a sphere S(x; r) which contains no point of Z. This is so because if every such sphere would contain a point of Z, then x would be an adherent point of Z and consequently would belong to Z, since Z is closed. Thus, there is a sphere S(x; r) ⊂ Z−; i.e., x is an interior point of Z−. Since this holds for arbitrary x ∈ Z−, Z− is an open set. ∎

In the next result we present additional important properties of open sets.

5.4.18. Theorem.
(i) Every open sphere in X is an open set.
(ii) If Y is an open subset of X, then there is a family of open spheres, {S_α}_{α∈A}, such that Y = ∪_{α∈A} S_α.
(iii) The interior of any subset Y of X is the largest open set contained in Y.
Proof. To prove the first part, let S(x; r) be any open sphere in X. Let x_1 ∈ S(x; r), and let p(x, x_1) = r_1. If we let r_0 = r − r_1, then according to the proof of part (ii) of Theorem 5.4.10 we have S(x_1; r_0) ⊂ S(x; r). Hence, x_1 is an interior point of S(x; r). Since this is true for any x_1 ∈ S(x; r), it follows that S(x; r) is an open subset of X.

To prove the second part of the theorem, we first note that if Y = ∅, then Y is open and is the union of an empty family of spheres. So assume that Y ≠ ∅ and that Y is open. Then each point x ∈ Y is the center of a sphere S(x; r) ⊂ Y, and moreover Y is the union of the family of all such spheres.

The proof of the last part of the theorem is left as an exercise. ∎

5.4.19. Exercise. Prove part (iii) of Theorem 5.4.18.
Let {Y; p} be a subspace of a metric space {X; p}, and suppose that V is a subset of Y. It can happen that V may be an open subset of Y and at the same time not be an open subset of X. Thus, when a set is described as open, it is important to know in what space it is open. We have:
5.4.20. Theorem. Let {Y; p} be a metric subspace of {X; p}.
(i) A subset V ⊂ Y is open relative to {Y; p} if and only if there is a subset U ⊂ X such that U is open relative to {X; p} and V = Y ∩ U.
(ii) A subset G ⊂ Y is closed relative to {Y; p} if and only if there is a subset F of X such that F is closed relative to {X; p} and G = F ∩ Y.

Proof. Let S(x_0; r) = {x ∈ X: p(x, x_0) < r} and S′(x_0; r) = {x ∈ Y: p(x, x_0) < r}. Then S′(x_0; r) = Y ∩ S(x_0; r).

To prove the necessity of part (i), let V be an open set relative to {Y; p}, and let x ∈ V. Then there is a sphere S′(x; r) ⊂ V (r may depend on x). Now

     V = ∪_{x∈V} S′(x; r) = [∪_{x∈V} S(x; r)] ∩ Y.

By part (ii) of Theorem 5.4.15, U = ∪_{x∈V} S(x; r) is an open set in {X; p}.

To prove the sufficiency of part (i), let V = Y ∩ U, where U is an open subset of X. Let x ∈ V. Then x ∈ U, and hence there is a sphere S(x; r) ⊂ U. Thus, S′(x; r) = Y ∩ S(x; r) ⊂ Y ∩ U = V. This proves that x is an interior point of V and that V is an open subset of Y.

The proof of part (ii) of the theorem is left as an exercise. ∎

5.4.21. Exercise. Prove part (ii) of Theorem 5.4.20.

The first part of the preceding theorem may be stated in another equivalent way. Let T and T′ be the topologies of {X; p} and {Y; p}, respectively, generated by p. Then T′ = {Y ∩ U: U ∈ T}.

Let us now consider some specific examples.
5.4.22. Example. Let X = R, and let p be the usual metric on R; i.e., p(x, y) = |x − y|. Any set Y = (a, b) = {x: a < x < b} is an open subset of X. We call (a, b) an open interval on R. ∎

5.4.23. Example. We now show that the word "finite" is crucial in part (iii) of Theorem 5.4.15. Let {R; p} denote again the real line with the usual metric, and let a < b. If Y_n = {x ∈ R: a < x < b + 1/n}, then for each positive integer n, Y_n is an open subset of the real line. However, the set

     ∩_{n=1}^∞ Y_n = {x ∈ R: a < x ≤ b} = (a, b]

is not an open subset of R. (This can readily be verified, since every sphere S(b; r) contains a point greater than b and hence is not contained in ∩_{n=1}^∞ Y_n.) ∎
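Example 5.4.23 can be illustrated with a membership test for the sets Y_n = (a, b + 1/n); the names below are ours. The right endpoint b survives in every Y_n, yet no point even slightly to the right of b does, so the intersection is (a, b], which is not open.

```python
a, b = 0.0, 1.0
in_Yn = lambda x, n: a < x < b + 1.0 / n   # membership in Y_n = (a, b + 1/n)

# b itself belongs to every Y_n ...
print(all(in_Yn(b, n) for n in range(1, 1000)))          # True
# ... but b + 0.01 drops out once 1/n < 0.01, so it is not in the intersection:
print(all(in_Yn(b + 0.01, n) for n in range(1, 1000)))   # False
```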
In the above example, let Y = (a, b]. We saw that Y is not an open subset of R; i.e., b is not an interior point of Y. However, if we were to consider {Y; p} as a metric space by itself, then Y is an open set.

5.4.24. Example. Let {C[a, b]; p_∞} denote the metric space of Example 5.3.14. Let λ be an arbitrary finite positive number. Then the set of continuous functions satisfying the condition |x(t)| < λ for all a ≤ t ≤ b is an open subset of the metric space {C[a, b]; p_∞}. ∎

Theorems 5.4.15 and 5.4.17 tell us that the sets X and ∅ are both open and closed in any metric space. In some metric spaces there may be proper subsets of X which are both open and closed, as illustrated in the following example.

5.4.25. Example. Let X be the set of real numbers given by X = (−2, −1) ∪ (+1, +2), and let p(x, y) = |x − y| for x, y ∈ X. Then {X; p} is clearly a metric space. Let Y = (−2, −1) ⊂ X and Z = (+1, +2) ⊂ X. Note that both Y and Z are open subsets of X. However, Y− = Z and Z− = Y, and thus Y and Z are also closed subsets of X. Therefore, Y and Z are proper subsets of the metric space {X; p} which are both open and closed. (Note that in the preceding we are not viewing X as a subset of R. As such, X would be open. Considering {X; p} as our metric space, X is both open and closed.) ∎

5.4.26. Exercise. Let {X; p} be a metric space with p the discrete metric defined in Example 5.1.7. Show that every subset of X is both open and closed.

In our next result we summarize several important properties of closed sets.
5.4. Open and Closed Sets
5.4.27. Theorem.

(i) Every subset of X consisting of a finite number of elements is closed.
(ii) Let x₀ ∈ X, let r > 0, and let K(x₀; r) = {x ∈ X: p(x, x₀) ≤ r}. Then K(x₀; r) is closed.
(iii) A subset Y ⊂ X is closed if and only if Y̅ ⊂ Y.
(iv) A subset Y ⊂ X is closed if and only if Y′ ⊂ Y.
(v) Let {Y_α}_{α∈A} be any family of closed sets in X. Then ∩_{α∈A} Y_α is closed.
(vi) The union of a finite number of closed sets in X is closed.
(vii) The closure of a subset Y of X is the intersection of all closed sets containing Y.
Proof. Only the proof of part (v) is given. Let {Y_α}_{α∈A} be any family of closed subsets of X. Then {Y_α~}_{α∈A} is a family of open sets. Now

(∩_{α∈A} Y_α)~ = ∪_{α∈A} Y_α~

is an open set, and hence ∩_{α∈A} Y_α is a closed subset of X. ■

5.4.28. Exercise. Prove parts (i) to (iv), (vi), and (vii) of Theorem 5.4.27.
We now consider several specific examples of closed sets.

5.4.29. Example. Let X = R, and let p be the usual metric, p(x, y) = |x − y|. Any set Y = {x ∈ R: a ≤ x ≤ b}, where a < b, is a closed subset of R. We call Y a closed interval on R and denote it by [a, b]. ■

5.4.30. Example. We now show that the word "finite" is essential in part (vi) of Theorem 5.4.27. Let {R; p} denote the real line with the usual metric, and let a > 0. If Yₙ = {x ∈ R: 1/n ≤ x ≤ a} for each positive integer n, then Yₙ is a closed subset of the real line. However, the set

∪_{n=1}^∞ Yₙ = {x ∈ R: 0 < x ≤ a} = (0, a]

is not a closed subset of the real line, as can readily be verified, since 0 is an adherent point of (0, a]. ■
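The dual phenomenon can also be probed numerically. The sketch below (illustrative only, not part of the text) uses the sets of Example 5.4.30 with a = 1: each Yₙ = [1/n, a] is closed, but the union (0, a] omits its adherent point 0.

```python
# Illustrative sketch (not from the text): members of the union (0, a]
# get arbitrarily close to 0, yet 0 itself lies outside the union.
a = 1.0

def in_union(x):
    """Membership in the union of all Y_n = [1/n, a]; analytically (0, a]."""
    return 0 < x <= a

points = [1.0 / n for n in range(1, 10**4 + 1)]
assert all(in_union(p) for p in points)   # every 1/n belongs to the union ...
assert min(points) < 1e-3                 # ... and the points approach 0,
assert not in_union(0.0)                  # yet the adherent point 0 is excluded.
```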
5.4.31. Exercise. The set K(x₀; r) defined in part (ii) of Theorem 5.4.27 is sometimes called a closed sphere. It need not coincide with S̅(x₀; r), i.e., the closure of the open sphere S(x₀; r).

(i) Show that S̅(x₀; r) ⊂ K(x₀; r).
(ii) Let {X; p} be the discrete metric space defined in Example 5.1.7. Describe the sets S(x; 1), S̅(x; 1), and K(x; 1) for any x ∈ X, and conclude that, in general, S̅(x; 1) ≠ K(x; 1) if X contains more than one point.
(iii) Let X = (−∞, 0) ∪ J, where J denotes the set of positive integers, and let p(x, y) = |x − y|. Describe S(0; 1), S̅(0; 1), and K(0; 1), and conclude that S̅(0; 1) ≠ K(0; 1).

Chapter 5 / Metric Spaces
We are now in a position to introduce certain additional concepts which are important in analysis and applications.

5.4.32. Definition. Let Y and Z be subsets of X. The set Y is said to be dense in Z (or dense with respect to Z) if Y̅ ⊃ Z. The set Y is said to be everywhere dense in {X; p} (or simply, everywhere dense in X) if Y̅ = X. If the exterior of Y is everywhere dense in X, then Y is said to be nowhere dense in X. A subset Y of X is said to be dense-in-itself if every point of Y is a limit point of Y. A subset Y of X which is both closed and dense-in-itself is called a perfect set.

5.4.33. Definition. A metric space {X; p} is said to be separable if there is a countable subset Y in X which is everywhere dense in X.

The following result enables us to characterize separable metric spaces in an equivalent way. We have:

5.4.34. Theorem. A metric space {X; p} is separable if and only if there is a countable set S = {x₁, x₂, ...} ⊂ X such that for every x ∈ X and every given ε > 0 there is an xₙ ∈ S such that p(x, xₙ) < ε.

5.4.35. Exercise. Prove Theorem 5.4.34.
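The criterion of Theorem 5.4.34 can be illustrated on the real line with the countable set S = Q (a sketch, not part of the text): for any real x and ε > 0, a rational within ε of x is obtained by truncating a decimal expansion. The helper name below is invented for the illustration.

```python
import math
from fractions import Fraction

def rational_within(x, eps):
    """Return a rational q with |x - q| < eps (truncated decimal expansion)."""
    digits = max(0, -int(math.floor(math.log10(eps)))) + 1
    q = Fraction(round(x * 10**digits), 10**digits)
    assert abs(x - float(q)) < eps
    return q

q = rational_within(math.sqrt(2), 1e-6)
assert abs(math.sqrt(2) - float(q)) < 1e-6
```

Since such a q exists for every x and every ε, the countable set Q witnesses the separability of {R; p}.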
Let us now consider some specific cases.

5.4.36. Example. The real line with the usual metric is a separable space. As we saw in Example 5.4.9, if Q is the set of rational numbers, then Q̅ = R. ■

5.4.37. Example. Let {Rⁿ; pₚ} be the metric space defined in Example 5.3.1 (recall that 1 ≤ p < ∞). The set of vectors x = (ξ₁, ..., ξₙ) with rational coordinates (i.e., ξᵢ is a rational real number, i = 1, ..., n) is a denumerable everywhere dense set in Rⁿ and, therefore, {Rⁿ; pₚ} is a separable metric space. ■
5.4.38. Example. Let {lₚ; pₚ} be the metric space defined in Example 5.3.5 (recall that 1 ≤ p < ∞). We can show that this space is separable in the following manner. Let

Y = {y ∈ lₚ: y = (η₁, ..., ηₙ, 0, 0, ...) for some n, where ηᵢ is a rational real number, i = 1, ..., n}.

Then Y is a countable subset of lₚ. To show that it is everywhere dense, let ε > 0 and let x ∈ lₚ, where x = (ξ₁, ξ₂, ...). Choose n sufficiently large so that

Σ_{k=n+1}^∞ |ξₖ|^p < ε^p/2.

We can now find a y ∈ Y such that

Σ_{k=1}^n |ξₖ − ηₖ|^p < ε^p/2.

Hence,

Σ_{k=1}^∞ |ξₖ − ηₖ|^p < ε^p;

i.e., pₚ(x, y) < ε. By Theorem 5.4.34, {lₚ; pₚ} is separable. ■
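The density argument above can be mimicked numerically for p = 2 (an illustrative sketch, not from the text). We take x = (1, 1/2, 1/3, ...) ∈ l₂, use the elementary tail bound Σ_{k>n} 1/k² < 1/n to pick the truncation point, and note that here the retained coordinates are already rational.

```python
import math
from fractions import Fraction

# x_k = 1/k lies in l_2, since sum 1/k^2 converges (to pi^2/6).
eps = 1e-2
# Tail bound: sum_{k>n} 1/k^2 < 1/n, so choose n with 1/n < eps^2 / 2.
n = int(2 / eps**2) + 1
# y in Y: first n coordinates rationalized (1/k is already rational), rest zero.
y = [Fraction(1, k) for k in range(1, n + 1)]
head = sum((1.0 / k - float(y[k - 1]))**2 for k in range(1, n + 1))
tail = (math.pi**2) / 6 - sum(1.0 / k**2 for k in range(1, n + 1))
dist = math.sqrt(head + tail)   # p_2(x, y), split into head + tail terms
assert dist < eps
```

The truncated, rationalized y lands within ε of x in the l₂ metric, exactly as in the proof.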
In order to establish the separability of the space of continuous functions, it is necessary to use the Weierstrass approximation theorem, which we state without proof.

5.4.39. Theorem. Let C[a, b] be the space of real continuous functions on the interval [a, b], and let 𝒫 denote the family of all polynomials (defined on [a, b]). Let ε > 0, and let x ∈ C[a, b]. Then there is a p ∈ 𝒫 such that

sup_{a≤t≤b} |x(t) − p(t)| < ε. ■

5.4.40. Exercise. Using the Weierstrass approximation theorem, show that the metric spaces {C[a, b]; pₚ}, defined in Example 5.3.12, and {C[a, b]; p∞}, defined in Example 5.3.14, are separable.
5.4.41. Exercise. Show that the metric space {X; p}, where p is the discrete metric defined in Example 5.1.7, is separable if and only if X is a countable set.
We conclude the present section by considering an example of a metric space which is not separable.

5.4.42. Example. Let {l∞; p∞} be the metric space defined in Example 5.3.8. Let Y ⊂ l∞ denote the set

Y = {y ∈ l∞: y = (η₁, η₂, ...), where ηᵢ = 0 or 1}.

Clearly, Y ⊂ l∞. Now for every real number α ∈ [0, 1] there is a y ∈ Y, y = (η₁, η₂, ...), such that

α = Σ_{k=1}^∞ ηₖ(1/2)^k.

Thus, Y is an uncountable set. Notice now that for every y₁, y₂ ∈ Y, p∞(y₁, y₂) = 0 or 1. That is, p∞ restricted to Y is the discrete metric. It follows from Exercise 5.4.14 that Y cannot be separable and, consequently, {l∞; p∞} is not separable. ■
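The correspondence α = Σ ηₖ(1/2)^k underlying the uncountability argument can be sketched in code (an illustration, not part of the text): extract the binary digits ηₖ of α ∈ [0, 1] and reconstruct α to within 2⁻ⁿ.

```python
def binary_digits(alpha, n):
    """First n digits eta_k in alpha = sum eta_k (1/2)^k, with eta_k in {0, 1}."""
    digits, frac = [], alpha
    for _ in range(n):
        frac *= 2
        bit = int(frac)
        digits.append(bit)
        frac -= bit
    return digits

alpha = 0.6875  # = 0.1011 in binary
eta = binary_digits(alpha, 8)
approx = sum(b * 0.5**(k + 1) for k, b in enumerate(eta))
assert eta[:4] == [1, 0, 1, 1]
assert abs(alpha - approx) < 2**-8
```

Each α ∈ [0, 1] thus yields a distinct 0–1 sequence in Y, which is why Y is uncountable.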
5.5. COMPLETE METRIC SPACES
The set of real numbers R with the usual metric p defined on it has many remarkable properties, several of which are attributable to the so-called "completeness property" of this space. For this reason we speak of {R; p} as being a complete metric space. In the present section we consider general complete metric spaces. Throughout this section {X; p} is our underlying metric space, and J denotes the set of positive integers.

Before considering the completeness of metric spaces we need to consider a few facts about sequences on metric spaces (cf. Definition 1.1.25).

5.5.1. Definition. A sequence {xₙ} in a set Y ⊂ X is a function f: J → Y. Thus, if {xₙ} is a sequence in Y, then f(n) = xₙ for each n ∈ J.

5.5.2. Definition. Let {xₙ} be a sequence of points in X, and let x be a point of X. The sequence {xₙ} is said to converge to x if for every ε > 0 there is an integer N such that for all n ≥ N, p(x, xₙ) < ε (i.e., xₙ ∈ S(x; ε) for all n ≥ N). In general, N depends on ε; i.e., N = N(ε). We call x the limit of {xₙ}, and we usually write

lim xₙ = x,

or xₙ → x as n → ∞. If there is no x ∈ X to which the sequence converges, then we say that {xₙ} diverges.

Thus, xₙ → x if and only if the sequence of real numbers {p(xₙ, x)} converges to zero. In view of the above definition we note that for every ε > 0 there is a finite number N such that all terms of {xₙ} except the first (N − 1) terms must lie in the sphere with center x and radius ε. Hence, the convergence of a sequence depends on the infinite number of terms {x_{N+1}, x_{N+2}, ...}, and no amount of alteration of a finite number of terms of a divergent sequence can make it converge. Moreover, if a convergent sequence is changed by omitting or adding a finite number of terms, then the resulting sequence is still convergent to the same limit as the original sequence.

Note that in Definition 5.5.2 we called x the limit of the sequence {xₙ}. We will show that if {xₙ} has a limit in X, then that limit is unique.

5.5.3. Definition. Let {xₙ} be a sequence of points in X, where f(n) = xₙ for each n ∈ J. If the range of f is bounded, then {xₙ} is said to be a bounded sequence.
The range of f in the above definition may consist of a finite number of points or of an infinite number of points. Specifically, if the range of f consists of one point, then we speak of a constant sequence. Clearly, all constant sequences are convergent.
5.5.4. Example. Let {R; p} denote the set of real numbers with the usual metric. If n ∈ J, then the sequence {n²} diverges and is unbounded, and the range of this sequence is an infinite set. The sequence {(−1)ⁿ} diverges, is bounded, and its range is a finite set. The sequence {a + (−1)ⁿ/n} converges to a, is bounded, and its range is an infinite set. ■
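The three sequences of Example 5.5.4 behave as claimed when sampled (a quick numerical sketch, not part of the text):

```python
# Illustrative sketch (not from the text): sample the three sequences of
# Example 5.5.4 for n = 1, ..., 50 and check the claimed behavior.
a = 2.0
squares = [n**2 for n in range(1, 51)]          # {n^2}: diverges, unbounded
signs = [(-1)**n for n in range(1, 51)]         # {(-1)^n}: diverges, finite range
conv = [a + (-1)**n / n for n in range(1, 51)]  # {a + (-1)^n / n}: converges to a

assert max(squares) == squares[-1] and squares[-1] > squares[0]  # growing without bound
assert set(signs) == {-1, 1}                                     # range is a finite set
assert abs(conv[-1] - a) < 1 / 49                                # terms approach a
```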
5.5.5. Definition. Let {xₙ} be a sequence in X. Let n₁, n₂, ..., nₖ, ... be a sequence of positive integers which is strictly increasing; i.e., nⱼ > nₖ for all j > k. Then the sequence {x_{nₖ}} is called a subsequence of {xₙ}. If the subsequence {x_{nₖ}} converges, then its limit is called a subsequential limit of {xₙ}.

It turns out that many of the important properties of convergence on R can be extended to the setting of arbitrary metric spaces. In the next result several of these properties are summarized.
5.5.6. Theorem. Let {xₙ} be a sequence in X. Then

(i) there is at most one point x ∈ X such that lim xₙ = x;
(ii) if {xₙ} is convergent, then it is bounded;
(iii) {xₙ} converges to a point x ∈ X if and only if every sphere about x contains all but a finite number of terms in {xₙ};
(iv) {xₙ} converges to a point x ∈ X if and only if every subsequence of {xₙ} converges to x;
(v) if {xₙ} converges to x ∈ X and if y ∈ X, then lim p(xₙ, y) = p(x, y);
(vi) if {xₙ} converges to x ∈ X and if the sequence {yₙ} of X converges to y ∈ X, then lim p(xₙ, yₙ) = p(x, y); and
(vii) if {xₙ} converges to x ∈ X, and if there is a y ∈ X and a γ > 0 such that p(xₙ, y) ≤ γ for all n ∈ J, then p(x, y) ≤ γ.
Proof. To prove part (i), assume that lim xₙ = x and that lim xₙ = y. Then for every ε > 0 there are positive integers Nₓ and N_y such that p(xₙ, x) < ε/2 whenever n ≥ Nₓ and p(xₙ, y) < ε/2 whenever n ≥ N_y. If we let N = max(Nₓ, N_y), then it follows that

p(x, y) ≤ p(x, xₙ) + p(xₙ, y) < ε/2 + ε/2 = ε

whenever n ≥ N. Now ε is any positive number. Since the only non-negative number which is less than every positive number is zero, it follows that p(x, y) = 0 and therefore x = y.

To prove part (iii), assume that lim xₙ = x and let S(x; ε) be any sphere about x. Then there is a positive integer N such that the only terms of the sequence {xₙ} which are possibly not in S(x; ε) are the terms x₁, x₂, ..., x_{N−1}. Conversely, assume that every sphere about x contains all but a finite number of terms from the sequence {xₙ}. With ε > 0 specified, let M = max{n ∈ J: xₙ ∉ S(x; ε)}. If we set N = M + 1, then xₙ ∈ S(x; ε) for all n ≥ N, which was to be shown.

To prove part (v), we note from Theorem 5.1.13 that

|p(y, xₙ) − p(y, x)| ≤ p(x, xₙ).

By hypothesis, lim xₙ = x. Therefore, lim p(x, xₙ) = 0, and so lim |p(y, xₙ) − p(y, x)| = 0; i.e., lim p(y, xₙ) = p(y, x).

Finally, to prove part (vii), suppose to the contrary that p(x, y) > γ. Then δ = p(x, y) − γ > 0. Now γ − p(xₙ, y) ≥ 0 for all n ∈ J, and thus

0 < δ ≤ p(x, y) − p(xₙ, y) ≤ p(x, xₙ)

for all n ∈ J. But this is impossible, since lim xₙ = x. Thus, p(x, y) ≤ γ.

We leave the proofs of the remaining parts as an exercise. ■

5.5.7. Exercise. Prove parts (ii), (iv), and (vi) of Theorem 5.5.6.
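Parts (v) and (vii) can be spot-checked on the real line with the usual metric (an illustrative sketch, not part of the text). The example also shows why the conclusion of (vii) is "≤ γ" and cannot be sharpened to a strict inequality.

```python
# Illustrative sketch (not from the text): x_n = 1 + 1/n converges to x = 1.
p = lambda u, v: abs(u - v)
x, y = 1.0, 4.0
xs = [x + 1.0 / n for n in range(1, 200)]

# (v): p(x_n, y) -> p(x, y), via the reverse triangle inequality of the proof.
assert abs(p(xs[-1], y) - p(x, y)) < 1e-2
assert all(abs(p(y, xn) - p(y, x)) <= p(x, xn) + 1e-12 for xn in xs)

# (vii): p(x_n, y) = 3 - 1/n < 3 strictly for every n, yet the limit
# attains the bound: p(x, y) = 3. So "<" in the hypothesis gives only "<=".
gamma = 3.0
assert all(p(xn, y) < gamma for xn in xs)
assert p(x, y) <= gamma
```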
In Definition 5.4.5 we introduced the concept of limit point of a set Y ⊂ X. In Definition 5.5.2 we defined the limit of a sequence of points {xₙ} in X. These two concepts are closely related; however, the reader should carefully note the distinction between the two. The limit point of a set is strictly a property of the set itself. On the other hand, a sequence is not a set. Furthermore, the elements of a sequence are ordered and not necessarily distinct, while the elements of a set are not ordered but are distinct. However, the range of a sequence is a subset of X. We now give a result relating these concepts.

5.5.8. Theorem. Let Y be a subset of X. Then

(i) x ∈ X is an adherent point of Y if and only if there is a sequence {yₙ} in Y (i.e., yₙ ∈ Y for all n) such that lim yₙ = x;
(ii) x ∈ X is a limit point of the set Y if and only if there is a sequence {yₙ} of distinct points in Y such that lim yₙ = x; and
(iii) Y is closed if and only if for every convergent sequence {yₙ}, such that yₙ ∈ Y for all n, lim yₙ = x ∈ Y.
Proof. To prove part (i), assume that lim yₙ = x. Then every sphere about x contains at least one term of the sequence {yₙ} and, since every term of {yₙ} is a point of Y, it follows that x is an adherent point of Y. Conversely, assume that x is an adherent point of Y. Then every sphere about x contains at least one point of Y. Now let us choose for each positive integer n a point yₙ ∈ Y such that yₙ ∈ S(x; 1/n). Then it follows readily that the sequence {yₙ} chosen in this fashion converges to x. Specifically, if ε > 0 is given, then we choose a positive integer N such that 1/N < ε. Then for every n ≥ N we have yₙ ∈ S(x; 1/n) ⊂ S(x; ε). This concludes the proof of part (i).

To prove part (ii), assume that x is a limit point of the set Y. Then every sphere S(x; 1/n) contains an infinite number of points, and so we can choose a yₙ ∈ S(x; 1/n) such that yₙ ≠ yₘ for all m < n. The sequence {yₙ} consists of distinct points and converges to x. Conversely, if {yₙ} is a sequence of distinct points convergent to x and if S(x; ε) is any sphere with center at x, then by definition of convergence there is an N such that for all n ≥ N, yₙ ∈ S(x; ε). That is, there are infinitely many points of Y in S(x; ε).

To prove part (iii), assume that Y is closed and let {yₙ} be a convergent sequence with yₙ ∈ Y for all n and lim yₙ = x. We want to show that x ∈ Y. By part (i), x must be an adherent point of Y. Since Y is closed, x ∈ Y. Next, we prove the converse. Let x be an adherent point of Y. Then by part (i), there is a sequence {yₙ} in Y such that lim yₙ = x. By hypothesis, we must have x ∈ Y. Since Y contains all of its adherent points, it must be closed. ■
Statement (iii) of Theorem 5.5.8 is often used as an alternate way of defining a closed set.

The next theorem provides us with conditions under which a sequence is convergent in a product metric space.

5.5.9. Theorem. Let {X; pₓ} and {Y; p_y} be two metric spaces, let Z = X × Y, let p be any of the metrics defined on Z in Theorem 5.3.19, and let {Z; p} denote the product metric space of {X; pₓ} and {Y; p_y}. If z ∈ Z = X × Y, then z = (x, y), where x ∈ X and y ∈ Y. Let {xₙ} be a sequence in X, and let {yₙ} be a sequence in Y. Then,

(i) the sequence {(xₙ, yₙ)} converges in Z if and only if {xₙ} converges in X and {yₙ} converges in Y; and
(ii) lim (xₙ, yₙ) = (lim xₙ, lim yₙ) whenever this limit exists.

5.5.10. Exercise. Prove Theorem 5.5.9.
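The componentwise characterization in Theorem 5.5.9 can be sketched numerically for R × R with the Euclidean-type product metric (an illustration, not from the text):

```python
import math

# Illustrative sketch (not from the text): product metric on R x R.
px = lambda u, v: abs(u - v)
py = lambda u, v: abs(u - v)
rho = lambda z1, z2: math.sqrt(px(z1[0], z2[0])**2 + py(z1[1], z2[1])**2)

zs = [(1 + 1.0 / n, 2 - 1.0 / n) for n in range(1, 1001)]
z = (1.0, 2.0)

# Convergence in the product metric ...
assert rho(zs[-1], z) < 1e-2
# ... occurs exactly when each coordinate sequence converges:
assert px(zs[-1][0], z[0]) < 1e-2 and py(zs[-1][1], z[1]) < 1e-2
```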
In many situations the limit to which a given sequence may converge is unknown. The following concept enables us to consider the convergence of a sequence without knowing the limit to which the sequence may converge.

5.5.11. Definition. A sequence {xₙ} of points in a metric space {X; p} is said to be a Cauchy sequence or a fundamental sequence if for every ε > 0 there is an integer N such that p(xₙ, xₘ) < ε whenever m, n ≥ N.

The next result follows directly from the triangle inequality.
5.5.12. Theorem. Every convergent sequence in a metric space {X; p} is a Cauchy sequence.

Proof. Assume that lim xₙ = x. Then for arbitrary ε > 0 we can find an integer N such that p(xₙ, x) < ε/2 and p(xₘ, x) < ε/2 whenever m, n ≥ N. In view of the triangle inequality we now have

p(xₙ, xₘ) ≤ p(xₙ, x) + p(xₘ, x) < ε

whenever m, n ≥ N. This proves the theorem. ■

We emphasize that in an arbitrary metric space {X; p} a Cauchy sequence is not necessarily convergent.

5.5.13. Theorem. Let {xₙ} be a Cauchy sequence. Then {xₙ} is a bounded sequence.

Proof. We need to show that there is a constant γ such that 0 < γ < ∞ and such that p(xₘ, xₙ) ≤ γ for all m, n ∈ J. Letting ε = 1, we can find N such that p(xₘ, xₙ) < 1 whenever m, n ≥ N. Now let

λ = max{p(x₁, x₂), p(x₁, x₃), ..., p(x₁, x_N)}.

Then, by the triangle inequality,

p(x₁, xₙ) ≤ p(x₁, x_N) + p(x_N, xₙ) < λ + 1

if n ≥ N. Thus, for all n ∈ J, p(x₁, xₙ) ≤ λ + 1. Again, by the triangle inequality,

p(xₘ, xₙ) ≤ p(xₘ, x₁) + p(x₁, xₙ) ≤ 2(λ + 1)

for all m, n ∈ J. Thus, p(xₘ, xₙ) ≤ 2(λ + 1) and {xₙ} is a bounded sequence. ■
-
5.5.14. Theorem. If a Cauchy sequence ,x { ,} contains a convergent subsequence "x { .}, then the sequence ,x { ,} is convergent. 5.5.15.
Exercise.
Prove Theorem 5.5.14.
We now give the definition of complete metric space. 5.5.16. Definition. If every Cauchy sequence in a metric space ;X{ p} converges to an element in ,X then { X ; p} is said to be a complete metric space.
Complete metric spaces are of utmost importance in analysis and applications. We will have occasion to make extensive use of the properties of such spaces in the remainder of this book.

5.5.17. Example. Let X = (0, 1), and let p(x, y) = |x − y| for all x, y ∈ X. Let xₙ = 1/n for n ∈ J. Then the sequence {xₙ} is a Cauchy sequence, since |xₙ − xₘ| < 1/N for all n > m ≥ N. Since there is no x ∈ X to which {xₙ} converges, the metric space {X; p} is not complete. ■

5.5.18. Example. Let X = Q, the set of rational numbers, and let p(x, y) = |x − y|. Let

xₙ = 1 + 1/1! + 1/2! + ... + 1/n!

for n ∈ J. The sequence {xₙ} is Cauchy. Since there is no limit in Q to which {xₙ} converges, the metric space {Q; p} is not complete. ■

5.5.19. Example. Let R# = R − {0}, and let p(x, y) = |x − y| for all x, y ∈ R#. Let xₙ = 1/n, n ∈ J. The sequence {xₙ} is Cauchy; however, it does not converge to a limit in R#. Thus, {R#; p} is not complete.

Some further comments are in order here. If we view R# as a subset of R in the metric space {R; p} (p denotes the usual metric on R), then the sequence {xₙ} converges to zero; i.e., lim xₙ = 0. By Theorem 5.5.8, R# cannot be a closed subset of R. However, R# is a closed subset of the metric space {R#; p}, since it is the whole space. There is no contradiction here to Theorem 5.5.8, for the sequence {xₙ} does not converge to a limit in R#. Specifically, Theorem 5.5.8 states that if a sequence does converge to a limit, then the limit must belong to the space. The requirement for completeness is that every Cauchy sequence must converge to an element in the space. ■

We now consider several specific examples of important complete metric spaces.

5.5.20. Example. Let p denote the usual metric on R, the set of real numbers. The completeness of {R; p} is one of the fundamental results of analysis. ■

5.5.21. Example. Let {X; pₓ} and {Y; p_y} be arbitrary complete metric spaces. If Z = X × Y and if z ∈ Z, then z = (x, y), where x ∈ X and y ∈ Y (see Theorem 5.3.19). Define

p₂(z₁, z₂) = p₂((x₁, y₁), (x₂, y₂)) = {[pₓ(x₁, x₂)]² + [p_y(y₁, y₂)]²}^{1/2}.

It can readily be shown that the metric space {Z; p₂} is complete. ■
292
5.5.22. Exercise.
I Metric Spaces
Verify the completeness of Z { ; P2} in the above example.
5.5.23. Example. Let P be the usual metric defined on C, the set of complex numbers. tU ilizing Example 5.5.21 along with the completeness of R { ; p} (see Example 5.5.20), we can readily show that C { ; p} is a complete metric space. _ 5.5.24.
Exercise.
pl.
Verify the completeness of { C ;
5.5.25. Exercise. eL t X = R" (let X = C") denote the set of all real (of all complex) ordered n-tuples x = (~I' ... ,~,,). Let y = ('11J ... ,'1,,), let p,(x , y)
/ 1' 11'T "
-
= [~I~I
and let
I
sp <
00,
max I{ I~ - 1' 11 • . .. ,I~" - 1' "n. i.e.• p = 00. tU ilizing the completeness of the real line (of the complex plane), show that {R"; p,} = R;({C;'; p,} = C;) is a complete metric space for 1 S p S 00. In particular, show that if lX { } ' is a Cauchy sequence in R; (in C;), where lX ' = (~\kJ ... '~"l')' then {~/l'} is a Cauchy sequence in R (in C) for j = I, ... ,n, and lX { } ' converges to x, where x = (~I' ... ,~,,) and ~, = lim l'~ l' for j = 1, ... , n.
p..(x, y) =
5.5.26. Example. Let {I,; p,} be the metric space defined in Example 5.3.5. We now show that this space is a complete metric space. eL t
Let lX { } ' f
be a Cauchy sequence in I,. where lX ' E J such that
> O. Then there is an N p,(x"
lX )' =
[ .-L1.."
.., -
".1 -
~.l'
=
1] /'
(~lkJ
<
~2k'
••
,
~d'
••
).
f
for all k,j ~ N. This implies that ~"'l I < f for every m E J and all k,j ~ N. Thus, {~.l'} is a Cauchy sequence in R for every m E ,J and hence .~{ l}' is convergent to some limit, say lim ~ ..l' = ~. for m E .J Now let l' x = (~t, ~2' • • , ~'" • • ). We want to show that (i) x E I, and (ii) lim lX ' = .X Since lX { } ' is a Cauchy sequence, exists a " > 0 such that p,(O, lX )' =
k
we know by Theorem 5.5.13 that there
[ .~I ..
1~.k I'
1] /'
<"
for all k E .J Now let n be any positive integer, let p~ be the metric on R" defined in Exercise 5.5.25, and let ~x = {~\kJ ... '~"l'J. Then p~(x~, xj) < p,(x l" IX )' and thus {x~} is a Cauchy sequence in R;. It also follows that p~(O, x~) s" for all k E .J Now by Exercise 5.5.25, }~x{ converges to x ' ,
5.5.
where x '
p~(O,
293
Complete Metric Spaces
x')
=
(' I " .. ,' , ,). It follows from Theorem 5.5.6, part (vii), that
< ,,;
follows that x
i.e., E
t[ i
1',1'
)t/, <
".
Since this must hold for all n
I,. To show that lim X
k
k
E
I, it
= ,x let € > O. Then there is an
integer N such that p,(x } , X k ) < € for all k,j > N. Again, let n be any positive integer. Then we have p~(,~x )~x < € for all j, k > N. F o r fixed n, we conclude from Theorem 5.5.6, part (vii), that p~(X', x~) :::;; € for all k 2 N. eH nce,
[ ~ " 1,,,, "' s l
k'
I' IJ /' <
for all k > €
N, where N depends
only on € (and not on n). Since this must hold for all n E I, we conclude that p(x , x k } < € for all k > N. This implies that lim x k = X . _ k
5.5.27. Exercise. is complete.
Show that the discrete metric space of Example 5.1.7
5.5.28. Example. eL t e{ ra, bJ; p~) be the metric space defined in Example 5.3.14. Thus, era, bJ is the set of all continuous functions on a[ , bJ and y)
p~(x,
=
sup I(X I) -
• S/Sb
y(l) I.
We now show that e{ ra, bJ; p~) is a complete metric space. If ,x { ,} isa Cauchy sequence in era, bJ, then for each € > 0 there is an N such that I,x ,(I) - "X ,(I) I < € whenever m, n 2 N for all I E a[ , b]. Thus, for fixed I, the sequence ,X { ,(I}) converges to, say, oX (I}. Since t is arbitrary, the sequence offunctions {x,,( .)} converges pointwise to a function x o( .). Also, since N = N(€ ) is independent of I, the sequence ,x { ,( • )} converges uniformly to x o( • ). Now from the calculus we know that if a sequence of continuous functions ,x { ,( • )» converges uniformly to a function x o( • ), then x o( • ) is continuous. Therefore, every Cauchy sequence in e{ a[ , b); pool converges to an element in this space in the sense of the metric poo. Therefore, the metric space e{ a[ , bJ; pool is complete. _
5.5.29. Example. eL t e{ ra, bJ; pz} 5.3.12, with p = 2; i.e., pz(x,
:U
y) =
be the metric space defined in Example
(X [ I)
-
y(I)J2
dt}
lIZ.
We now show that this metric space is not complete. Without loss ofgenerality let the closed interval be [ - 1 , IJ. In particular, consider the sequence ,x { ,} of continuous functions defined by x , ,(t)=
{
< t:::;; 0
0,
-)
nt,
O:::;;t:::;;l! n
I,I! n :::;;t:::;;)
} ,
Chapter 5 I Metric Spaces
194 x ( t)
n= 3
n= 2 - ~ ' + f
~ -n=l
- l _ l - - f I~ - - l ..- - -
t
5.5.30. n = m >
for e{ ra, b]; P2}.
iF gw'e .F Sequence {x.}
1,2, .... This sequence is depicted pictorially in Figure n and note that P{ 2(X
.., .X )}2 =
=
(m -
,,)2 ill... t 2 dt
(m -
,,)2
o
3m2 n
< .!. <
fl/. (I -
+
1/..
.F
Now let
nt)2 dt
£
3n
whenever n > 1/(3£). Therefore, .x{ } is a Cauchy sequence. F o r purposes of contradiction, let us now assume that .x{ } converges to a continuous function x, where convergence is taken with respect to the metric P2' In other words, assume that
fl
Ix.(t) -
(x t)12 dt - -
0 as n - -
00.
This implies that the above integral with any limits between + I and - I also approaches ez ro as n - > 00. Since x.(t) = 0 whenever t E [ - 1 ,0] , we have
f
l
Ix.(t) -
(x t)12 dt
=
0
independent of n. From this it follows that the continuous function x is such that
and x(t)
r
= 0 whenever t
Choosing n
fl
E
[ - 1 ,0] .
Ix.(t) -
r
2dt
I x(t) 1
=
0,
Now if 0 <
x(t) 12 dt - -
a S I, then
0 as n - -
00.
> I/a, we have 11 -
x ( tW dt - -
0 as n - -
00.
Since this integral is independent of n it vanishes. Also, since x is continuous
5.5. Complete Metric Spaces
it follows that x(t) = 1 for t > a. Since a can be chosen arbitrarily close to ez ro, we end up with a function x such that x(t)
= {O,
I,
t t
E E
[ - 1 ,0] (0, I]
}.
Therefore, the Cauchy sequence .x [ J does not converge to a point in era, b], and the metric space is not complete. _ The completeness property of certain metric spaces is an essential and important property which we will use and encounter frequently in the remainder of this book. The preceding example demonstrates that not all metric spaces are complete. However, this space e[ ra, b]; pzJ is a subspace of a larger metric space which is complete. To discuss this complete metric space (i.e., the completion of e{ ra, b]; pz)} , it is necessary to make use of the eL besgue theory of measure and integration. F o r a thorough treatment of this theory, we refer the reader to the texts by Royden 5[ .9] and Taylor 5[ .10]. Although knowledge of this theory is not an essential requirement in the development of the subsequent results in this book, we will want to make reference to certain examples of important metric spaces which are defined in terms of the eL besgue integral. F o r this reason, we provide the following heuristic comments for those readers who are unfamiliar with this subject. The eL besgue measure space on the real numbers, R, consists of the triple R { , mr, lJ ,} where mr is a certain family of subsets of R, called the eL besgue measurable sets in R, and J l is a mapping, W mr - > R*, called eL besgue measure, which may be viewed as a generalization of the concept of length in R. While it is not possible to characterize mr without providing additional details concerning the eL besgue theory, it is quite simple to enumerate several important examples of elements in mr. F o r instance, mr contains all intervals of the form (a, b) = x { E R: a < x < b}, c[ , d) = x { E R: c < x < d}, (e,f] = x{ E R: e < x < f } , g[ , h] = x{ E R: g < x < h}, as well as all countable unions and intersections of such intervals. It is emphasized that mr does not include all subsets of R. Now if A E mr is an interval, then the measure of A, lJ (A), is the length of A. 
F o r example, if A = a[ , b], then lJ (A) = b - a. Also, if B is a countable union of disjoint intervals, then lJ (B) is the sum of the lengths of the disjoint intervals (this sum may be infinite). Of particular interest are subsets of R having measure ez ro. Essentially, this means it is possible to "cover" the set with an arbitrarily small subset of R. Thus, every subset of R containing at most a countable number of points has eL besgue measure equal to ez ro. F o r example, the set of rational numbers has eL besgue measure ez ro. (There are also uncountable subsets of R having eL besgue measure zero.) In connection with the above discussion, we say that a proposition P(x) is true almost everywhere (abbreviated a.e.) if the set S = [x E R: P(x) is
not true} has Lebesgue measure zero. For example, two functions f, g: R → R are said to be equal a.e. if the set S = {x ∈ R: f(x) ≠ g(x)} ∈ 𝔐 and if μ(S) = 0.

Let us now consider the integral of real-valued functions defined on the interval [a, b] ⊂ R. It can be shown that a bounded function f: [a, b] → R is Riemann integrable (where the Riemann integral is denoted, as usual, by ∫ₐᵇ f(x) dx) if and only if f is continuous almost everywhere on [a, b]. The class of Riemann integrable functions with a metric defined in the same manner as in Example 5.5.29 (for continuous functions on [a, b]) is not a complete metric space. However, as pointed out before, it is possible to generalize the concept of integral and make it applicable to a class of functions significantly larger than the class of functions which are continuous a.e. In doing so, we must consider the class of measurable functions. Specifically, a function f: R → R is said to be a Lebesgue measurable function if f⁻¹(𝒰) ∈ 𝔐 for every open set 𝒰 ⊂ R.

Now let f be a Lebesgue measurable function which is bounded on the interval [a, b], let M = sup{f(x) = y: x ∈ [a, b]}, and let m = inf{f(x) = y: x ∈ [a, b]}. In the Lebesgue approach to integration, the range of f is partitioned into intervals. (This is in contrast with the Riemann approach, where the domain of f is partitioned in developing the integral.) Specifically, let us divide the range of f into the n parts specified by m = y₀ < y₁ < ... < yₙ = M, and let Eₖ = {x ∈ [a, b]: yₖ₋₁ ≤ f(x) < yₖ}, k = 1, ..., n. The sum

Σ_{k=1}^n yₖ μ(Eₖ)

approximates the area under the graph of f, and it can serve as the definition of the integral of f between a and b, after an appropriate limiting process has been performed. Provided that this limit exists, it is called the Lebesgue integral of f over [a, b], and it is denoted by ∫_{[a,b]} f dμ. It can be shown that any bounded function f which is Riemann integrable over [a, b] is Lebesgue integrable over [a, b], and furthermore

∫_{[a,b]} f dμ = ∫ₐᵇ f(x) dx.

On the other hand, there are functions which are Lebesgue integrable but not Riemann integrable over [a, b]. For example, consider the function f: [a, b] → R defined by f(x) = 0 if x is rational and f(x) = 1 if x is irrational. This function is so erratic that the Riemann integral does not exist in this case. However, since the interval [a, b] = A ∪ B, where A = {x: f(x) = 1} and B = {x: f(x) = 0}, it follows from the preceding characterization of the Lebesgue integral that

∫_{[a,b]} f dμ = 1·μ(A) + 0·μ(B) = b − a.

Let us now consider an important class of complete metric spaces, given in the next example.
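Before moving on, the Riemann/Lebesgue contrast just described can be sketched numerically (an illustration, not part of the text): tagged Riemann sums for the function above give different values depending on whether the sample points are rational or irrational, so no common limit exists, while the Lebesgue value is b − a.

```python
# Illustrative sketch (not from the text): f(x) = 0 at rationals, 1 at
# irrationals on [a, b]. We track rationality of the tag symbolically.
a, b, n = 0.0, 1.0, 1000
h = (b - a) / n

def f(tag_is_rational):
    return 0.0 if tag_is_rational else 1.0

# Tags chosen at rational points (e.g., k/n): every sampled value is 0.
sum_rational = sum(f(True) * h for _ in range(n))
# Tags chosen at irrational points: every sampled value is 1.
sum_irrational = sum(f(False) * h for _ in range(n))

assert sum_rational == 0.0
assert abs(sum_irrational - (b - a)) < 1e-9
# The tagged sums disagree in the limit, so f is not Riemann integrable;
# the Lebesgue integral is 1 * mu(irrationals in [a, b]) = b - a.
```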
5.5.
5.5.31. Example. Let p ≥ 1 (p not necessarily an integer), let {R, 𝔐, μ} denote the Lebesgue measure space on the real numbers, and let [a, b] be a subset of R. Let ℒₚ[a, b] denote the family of functions f: R → R which are Lebesgue measurable and such that ∫_{[a,b]} |f|^p dμ exists and is finite.

We define an equivalence relation ~ on ℒₚ[a, b] by saying that f ~ g if f(x) = g(x) except on a subset of [a, b] having Lebesgue measure zero. Now denote the family of equivalence classes into which ℒₚ[a, b] is divided by Lₚ[a, b]. Specifically, let us denote the equivalence class [f] = {g ∈ ℒₚ[a, b]: g ~ f} for f ∈ ℒₚ[a, b]. Then Lₚ[a, b] = {[f]: f ∈ ℒₚ[a, b]}. Now let X = Lₚ[a, b] and define pₚ: X × X → R by

pₚ([f], [g]) = {∫_{[a,b]} |f − g|^p dμ}^{1/p}.     (5.5.32)

It can be shown that the value of pₚ([f], [g]) defined by Eq. (5.5.32) is the same for any f and g in the equivalence classes [f] and [g], respectively. Furthermore, pₚ satisfies all the axioms of a metric, and as such {Lₚ[a, b]; pₚ} is a metric space. One of the important results of the Lebesgue theory is that this space is complete.

It is important to note that the right-hand side of Eq. (5.5.32) cannot be used to define a metric on ℒₚ[a, b], since there are functions f ≠ g such that ∫_{[a,b]} |f − g|^p dμ = 0; however, in the literature the distinction between Lₚ[a, b] and ℒₚ[a, b] is usually suppressed. That is, we usually write f ∈ ℒₚ[a, b] instead of [f] ∈ Lₚ[a, b], where f ∈ ℒₚ[a, b]. Finally, in the particular case when p = 2, the space {C[a, b]; p₂} of Example 5.5.29 is a subspace of the space {L₂; p₂}. ■
of
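As an editorial aside, the p = 2 case of the metric (5.5.32) can be illustrated numerically. The Python sketch below (the quadrature rule, step count, and sample functions are our own choices, not part of the text) approximates ρ_2 for two continuous representatives, for which the Lebesgue and Riemann integrals agree:

```python
import math

def rho_p(f, g, a, b, p=2, n=100000):
    """Approximate [integral over [a,b] of |f - g|^p dmu]^(1/p) by a
    composite midpoint rule; adequate for continuous representatives."""
    h = (b - a) / n
    total = sum(abs(f(a + (k + 0.5) * h) - g(a + (k + 0.5) * h)) ** p
                for k in range(n))
    return (total * h) ** (1.0 / p)

# For f(t) = t and g = 0 on [0, 1], the exact distance is
# (int_0^1 t^2 dt)^(1/2) = 1/sqrt(3).
d = rho_p(lambda t: t, lambda t: 0.0, 0.0, 1.0)
print(d)
```

The approximation reproduces the exact value 1/√3 ≈ 0.5774 to high accuracy; note that a quadrature rule cannot, of course, distinguish two functions that differ only on a set of measure zero, which is precisely why L_p[a, b] is built from equivalence classes.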
Before closing the present section we consider some important general properties of complete metric spaces.

5.5.33. Theorem. Let {X; ρ} be a complete metric space, and let {Y; ρ} be a metric subspace of {X; ρ}. Then {Y; ρ} is complete if and only if Y is a closed subset of X.

Proof. Assume that {Y; ρ} is complete. To show that Y is a closed subset of X we must show that Y contains all of its adherent points. Let y be an adherent point of Y; i.e., let y ∈ Ȳ. Then each open sphere S(y; 1/n), n = 1, 2, ..., contains at least one point y_n in Y. Since ρ(y_n, y) < 1/n, it follows that the sequence {y_n} converges to y. Since {y_n} is a Cauchy sequence in the complete space {Y; ρ}, we have {y_n} converging to a point y' ∈ Y. But the limit of a sequence of points in a metric space is unique by Theorem 5.5.6. Therefore, y' = y; i.e., y ∈ Y and Y is closed.
Chapter 5 / Metric Spaces
Conversely, assume that Y is a closed subset of X. To show that the space {Y; ρ} is complete, let {y_n} be an arbitrary Cauchy sequence in {Y; ρ}. Then {y_n} is a Cauchy sequence in the complete metric space {X; ρ}, and as such it has a limit y ∈ X. However, in view of Theorem 5.5.8, part (iii), the closed subset Y of X contains all its adherent points. Therefore, {Y; ρ} is complete. ■

We emphasize that completeness and closure are not necessarily equivalent in arbitrary metric spaces. For example, a metric space is always closed, yet it is not necessarily complete. Before characterizing a complete metric space in an alternate way, we need to introduce the following concept.

5.5.34. Definition. A sequence {S_n} of subsets of a metric space {X; ρ} is called a nested sequence of sets if S_1 ⊃ S_2 ⊃ S_3 ⊃ ···.
We leave the proof of the last result of the present section as an exercise.

5.5.35. Theorem. Let {X; ρ} be a metric space. Then,

(i) {X; ρ} is complete if and only if every sequence of closed nested spheres in {X; ρ} with radii tending to zero has non-void intersection; and
(ii) if {X; ρ} is complete, and if {S_n} is a nested sequence of non-empty closed subsets of X such that lim_{n→∞} diam (S_n) = 0, then the intersection ∩_{n=1}^∞ S_n is not empty; in fact, it consists of a single point.

5.5.36. Exercise. Prove Theorem 5.5.35.

5.6. COMPACTNESS
We recall the Bolzano-Weierstrass theorem from the calculus: Every bounded, infinite subset of the real line (i.e., of the set of real numbers with the usual metric) has at least one point of accumulation. Thus, if Y is an arbitrary bounded infinite subset of R, then in view of this theorem we know that any sequence formed from elements of Y has a convergent subsequence. For example, let Y = [0, 2], and let {x_n} be the sequence of real numbers given by

x_n = [1 − (−1)^n]/2 + 1/n,  n = 1, 2, ....

Then the range of this sequence lies in Y and is thus bounded. Hence, the range has at least one accumulation point. It, in fact, has two.
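The two accumulation points can be exhibited numerically. In this Python sketch (the cutoff of 2000 terms is our own choice), the even-indexed and odd-indexed subsequences settle near 0 and 1, respectively:

```python
# The sequence x_n = [1 - (-1)^n]/2 + 1/n from the text, with Y = [0, 2].
def x(n):
    return (1 - (-1) ** n) / 2 + 1 / n

terms = [x(n) for n in range(1, 2001)]        # the range lies in Y = [0, 2]

evens = [x(n) for n in range(2, 2001, 2)]     # subsequence converging to 0
odds = [x(n) for n in range(1, 2001, 2)]      # subsequence converging to 1
print(evens[-1], odds[-1])
```

Every term stays inside [0, 2] (the first term x_1 = 2 attains the right endpoint), and the tails of the two subsequences cluster arbitrarily close to the two accumulation points 0 and 1.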
A theorem from the calculus which is closely related to the Bolzano-Weierstrass theorem is the Heine-Borel theorem. We need the following terminology.

5.6.1. Definition. Let Y be a set in a metric space {X; ρ}, and let A be an index set. A collection of sets {Y_α: α ∈ A} in {X; ρ} is called a covering of Y if Y ⊂ ∪_{α∈A} Y_α. A subcollection {Y_β: β ∈ B} of the covering {Y_α: α ∈ A}, i.e., B ⊂ A such that Y ⊂ ∪_{β∈B} Y_β, is called a subcovering of {Y_α: α ∈ A}. If all the members Y_α and Y_β are open sets, then we speak of an open covering and an open subcovering. If A is a finite set, then we speak of a finite covering. In general, A may be an uncountable set.

We now recall the Heine-Borel theorem as it applies to subsets of the real line (i.e., of R): Let Y be a closed and bounded subset of R. If {Y_α: α ∈ A} is any family of open sets on the real line which covers Y, then it is possible to find a finite subcovering of sets from {Y_α: α ∈ A}.

Many important properties of the real line follow from the Bolzano-Weierstrass theorem and from the Heine-Borel theorem. In general, these properties cannot be carried over directly to arbitrary metric spaces. The concept of compactness, to be introduced in the present section, will enable us to isolate those metric spaces which possess the Heine-Borel and Bolzano-Weierstrass property. Because of its close relationship to compactness, we first introduce the concept of total boundedness.

5.6.2. Definition. Let Y be any set in a metric space {X; ρ}, and let ε be an arbitrary positive number. A set S_ε in X is said to be an ε-net for Y if for any point y ∈ Y there exists at least one point s ∈ S_ε such that ρ(s, y) < ε. The ε-net S_ε is said to be finite if S_ε contains a finite number of points. A subset Y of X is said to be totally bounded if X contains a finite ε-net for Y for every ε > 0.

Some authors use the terminology ε-dense set for ε-net and precompact for totally bounded sets. An obvious equivalent characterization of total boundedness is contained in the following result.

5.6.3. Theorem. A subset Y ⊂ X is totally bounded if and only if Y can be covered by a finite number of spheres of radius ε for any ε > 0.

5.6.4. Exercise. Prove Theorem 5.6.3.

In Figure G a pictorial demonstration of the preceding concepts is given. If in this figure the size of ε were decreased, then, correspondingly, the
[5.6.5. Figure G. Total boundedness of a set Y. The figure shows a set X containing a subset Y; S_ε is the finite set consisting of the dots within the set X.]
number of elements in S_ε would increase. If for arbitrarily small ε the number of elements in S_ε remains finite, then we have a totally bounded set Y. Total boundedness is a stronger property than boundedness. We leave the proof of the next result as an exercise.

5.6.6. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. Then,

(i) if Y is totally bounded, then it is bounded;
(ii) if Y is totally bounded, then its closure Ȳ is totally bounded; and
(iii) if the metric space {X; ρ} is totally bounded, then it is separable.

5.6.7. Exercise. Prove Theorem 5.6.6.
We note, for example, that all finite sets (including the empty set) are totally bounded. Whereas all totally bounded sets are also bounded, the converse does, in general, not hold. We demonstrate this by means of the following example.

5.6.8. Example. Let {l_2; ρ_2} be the metric space defined in Example 5.3.5. Consider the subset Y ⊂ l_2 defined by

Y = {y ∈ l_2: Σ_{i=1}^∞ |η_i|² ≤ 1}.

We show that Y is bounded but not totally bounded. For any x, y ∈ Y we have, by the Minkowski inequality (5.2.7),

ρ_2(x, y) = [Σ_{i=1}^∞ |ξ_i − η_i|²]^{1/2} ≤ [Σ_{i=1}^∞ |ξ_i|²]^{1/2} + [Σ_{i=1}^∞ |η_i|²]^{1/2} ≤ 2.

Thus, Y is bounded. To show that Y is not totally bounded, consider the set of points E = {e_1, e_2, ...} ⊂ Y, where e_1 = (1, 0, 0, ...), e_2 = (0, 1, 0, ...), etc. Then ρ_2(e_i, e_j) = √2 for i ≠ j. Now suppose there is a finite ε-net for Y for, say, ε = 1/2. Let {s_1, ..., s_n} be the net S_ε. Now if e_j is such that ρ(e_j, s_i) < 1/2 for some i, then ρ(e_k, s_i) ≥ ρ(e_k, e_j) − ρ(e_j, s_i) > 1/2 for k ≠ j. Hence, there can be at most one element of the set E in each sphere S(s_i; 1/2) for i = 1, ..., n. Since there are infinitely many points in E and only a finite number of spheres S(s_i; 1/2), this contradicts the fact that S_ε is an ε-net. Hence, there is no finite ε-net for ε = 1/2, and Y is not totally bounded. ■

Let us now consider an example of a totally bounded set.

5.6.9. Example. Let {R^n; ρ_2} be the metric space defined in Example 5.3.1, and let Y be the subset of R^n defined by

Y = {y ∈ R^n: Σ_{i=1}^n |η_i|² ≤ 1}.

Clearly, Y is bounded. To show that Y is totally bounded, we construct an ε-net for Y for an arbitrary ε > 0. To this end, let N be a positive integer such that Nε > √n, and let S_ε be the set of all n-tuples given by

S_ε = {s = (σ_1, ..., σ_n): σ_i = m_i/N, where −N ≤ m_i ≤ N, m_i some integer, i = 1, ..., n}.

Then clearly S_ε ⊂ X and S_ε is finite. Now for any y = (η_1, ..., η_n) ∈ Y, there is an s ∈ S_ε such that |σ_i − η_i| ≤ 1/N for i = 1, ..., n. Thus,

ρ_2(y, s) ≤ [Σ_{i=1}^n (1/N)²]^{1/2} = √n/N < ε.

Therefore, S_ε is a finite ε-net. Since ε is arbitrary, Y is totally bounded. ■

In general, any bounded subset of R^n_2 = {R^n; ρ_2} is totally bounded.
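The grid construction of Example 5.6.9 can be carried out explicitly on a computer. The following Python sketch (the dimension, the value of ε, and the random sampling are our own illustrative choices) builds the net S_ε and checks that every sampled point of the unit ball Y lies within ε of some net point:

```python
import itertools, math, random

def eps_net(n, eps):
    """Grid points (m_1/N, ..., m_n/N), -N <= m_i <= N, with N*eps > sqrt(n),
    as in Example 5.6.9; the net lies in X = R^n."""
    N = int(math.sqrt(n) / eps) + 1          # guarantees N*eps > sqrt(n)
    coords = [m / N for m in range(-N, N + 1)]
    return list(itertools.product(coords, repeat=n))

def rho2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

n, eps = 2, 0.5
net = eps_net(n, eps)

random.seed(0)
ok = True
for _ in range(200):
    y = [random.uniform(-1, 1) for _ in range(n)]
    if rho2(y, [0.0] * n) <= 1:              # keep only points of Y
        ok = ok and min(rho2(y, s) for s in net) < eps
print(ok)
```

For n = 2 and ε = 1/2 the construction gives N = 3 and a 7 × 7 grid of 49 points; every point of Y is within √n/N ≈ 0.47 < ε of the grid, so the check succeeds.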
5.6.10. Exercise. Let {l_2; ρ_2} be the metric space defined in Example 5.3.5, and let Y ⊂ l_2 be the subset defined by

Y = {y ∈ l_2: |η_1| ≤ 1, |η_2| ≤ 1/2, ..., |η_n| ≤ (1/2)^{n−1}, ...}.

Show that Y is totally bounded.

In studying compactness of metric spaces, we will find it convenient to introduce the following concept.

5.6.11. Definition. A metric space {X; ρ} is said to be sequentially compact if every sequence of elements in X contains a subsequence which converges to some element x ∈ X. A set Y in the metric space {X; ρ} is said to be sequentially compact if the subspace {Y; ρ} is sequentially compact; i.e., every sequence in Y contains a subsequence which converges to a point in Y.

5.6.12. Example. Let X = (0, 1], and let ρ be the usual metric on the real line R. Consider the sequence {x_n}, where x_n = 1/n, n = 1, 2, .... This
sequence has no subsequence which converges to a point in X, and thus {X; ρ} is not sequentially compact. ■
We now define compactness.

5.6.13. Definition. A metric space {X; ρ} is said to be compact, or to possess the Heine-Borel property, if every open covering of {X; ρ} contains a finite open subcovering. A set Y in a metric space {X; ρ} is said to be compact if the subspace {Y; ρ} is compact.

Some authors use the term bicompact for Heine-Borel compactness and the term compact for what we call sequentially compact. As we shall see shortly, in the case of metric spaces, compactness and sequential compactness are equivalent, so no confusion should arise. We will also show that compact metric spaces can equivalently be characterized by means of the Bolzano-Weierstrass property, given by the following.

5.6.14. Definition. A metric space {X; ρ} possesses the Bolzano-Weierstrass property if every infinite subset of X has at least one point of accumulation. A set Y in X possesses the Bolzano-Weierstrass property if the subspace {Y; ρ} possesses the Bolzano-Weierstrass property.

Before setting out on proving the assertions made above, i.e., the equivalence of compactness, sequential compactness, and the Bolzano-Weierstrass property in metric spaces, a few comments concerning some of these concepts may be of benefit. Informally, we may view a sequentially compact metric space as having such an abundance of elements that no matter how we choose a sequence, there will always be a clustering of an infinite number of points around at least one point in the metric space. A similar interpretation can be made concerning metric spaces which possess the Bolzano-Weierstrass property.

Utilizing the concepts of sequential compactness and total boundedness, we first state and prove the following result.

5.6.15. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. The following properties hold:

(i) if Y is sequentially compact, then Y is bounded;
(ii) if Y is sequentially compact, then Y is closed;
(iii) if {X; ρ} is sequentially compact, then {X; ρ} is totally bounded;
(iv) if {X; ρ} is sequentially compact, then {X; ρ} is complete; and
(v) if {X; ρ} is totally bounded and complete, then it is sequentially compact.

Proof. To prove (i), assume that Y is a sequentially compact subset of X and assume, for purposes of contradiction, that Y is unbounded. Then we
can construct a sequence {y_n} with elements arbitrarily far apart. Specifically, let y_1 ∈ Y and choose y_2 ∈ Y such that ρ(y_1, y_2) > 1. Next, choose y_3 ∈ Y such that ρ(y_1, y_3) > 1 + ρ(y_1, y_2). Continuing this process, choose y_n ∈ Y such that ρ(y_1, y_n) > 1 + ρ(y_1, y_{n−1}). If m > n, then ρ(y_1, y_m) > 1 + ρ(y_1, y_n), and ρ(y_m, y_n) ≥ |ρ(y_1, y_m) − ρ(y_1, y_n)| > 1. But this implies that {y_n} contains no convergent subsequence. However, we assumed that Y is sequentially compact; i.e., every sequence in Y contains a convergent subsequence. Therefore, we have arrived at a contradiction. Hence, Y must be bounded. In the above argument we assumed that Y is an infinite set. We note that if Y is a finite set then there is nothing to prove.

To prove part (ii), let Ȳ denote the closure of Y, and assume that y ∈ Ȳ. Then there is a sequence of points {y_n} in Y which converges to y, and every subsequence of {y_n} converges to y, by Theorem 5.5.6, part (iv). But, by hypothesis, Y is sequentially compact. Thus, the sequence {y_n} in Y contains a subsequence which converges to some element in Y. Therefore, Y = Ȳ and Y is closed.

We now prove part (iii). Let {X; ρ} be a sequentially compact metric space, and let x_1 ∈ X. With ε > 0 fixed we choose, if possible, x_2 ∈ X such that ρ(x_1, x_2) > ε. Next, if possible choose x_3 ∈ X such that ρ(x_1, x_3) > ε and ρ(x_2, x_3) > ε. Continuing this process we have, for every n, ρ(x_n, x_1) > ε, ρ(x_n, x_2) > ε, ..., ρ(x_n, x_{n−1}) > ε. We now show that this process must ultimately terminate. Clearly, if {X; ρ} is a bounded metric space, then we can pick ε sufficiently large to terminate the process after the first step; i.e., there is no point x ∈ X such that ρ(x_1, x) ≥ ε. Now suppose that, in general, the process does not terminate. Then we have constructed a sequence {x_n} such that for any two members x_i, x_j of this sequence, we have ρ(x_i, x_j) > ε. But, by hypothesis, {X; ρ} is sequentially compact, and thus {x_n} contains a subsequence which is convergent to an element in X. Hence, we have arrived at a contradiction, and the process must terminate. Using this procedure we now have, for arbitrary ε > 0, a finite set of points {x_1, x_2, ..., x_l} such that the spheres S(x_n; ε), n = 1, ..., l, cover X; i.e., for any ε > 0, X contains a finite ε-net. Therefore, the metric space {X; ρ} is totally bounded.

We now prove part (iv) of the theorem. Let {x_n} be a Cauchy sequence. Then for every ε > 0 there is an integer l such that ρ(x_m, x_n) < ε whenever m > n > l. Since {X; ρ} is sequentially compact, the sequence {x_n} contains a subsequence {x_{l_n}} convergent to a point x ∈ X, so that lim_{n→∞} ρ(x_{l_n}, x) = 0. The sequence {l_n} is an increasing sequence and l_m ≥ m. It now follows that

ρ(x_n, x) ≤ ρ(x_n, x_{l_m}) + ρ(x_{l_m}, x) < ε + ρ(x_{l_m}, x)

whenever m > n > l. Letting m → ∞, we have 0 ≤ ρ(x_n, x) ≤ ε whenever n > l. Hence, the Cauchy sequence {x_n} converges to x ∈ X. Therefore, X is complete.

In connection with parts (iv) and (v) we note that a totally bounded metric
Chapter 5
304
I Metric Spaces
space is not necessarily sequentially compact. We leave the proof of part (v) as an exercise. ■

5.6.16. Exercise. Prove part (v) of Theorem 5.6.15.
Parts (iii), (iv) and (v) of the above theorem allow us to define a sequentially compact metric space, equivalently, as a metric space which is complete and totally bounded. We now show that a metric space is sequentially compact if and only if it satisfies the Bolzano-Weierstrass property.

5.6.17. Theorem. A metric space {X; ρ} is sequentially compact if and only if every infinite subset of X has at least one point of accumulation.

Proof. Assume that Y is an infinite subset of a sequentially compact metric space {X; ρ}. If {y_n} is any sequence of distinct points in Y, then {y_n} contains a convergent subsequence {y_{l_n}}, because {X; ρ} is sequentially compact. The limit y of the subsequence is a point of accumulation of Y.

Conversely, assume that {X; ρ} is a metric space such that every infinite subset Y of X has a point of accumulation. Let {y_n} be any sequence of points in X. If a point occurs an infinite number of times in {y_n}, then this sequence contains a convergent subsequence, namely a constant subsequence, and we are finished. If this is not the case, then we can assume that all elements of {y_n} are distinct. Let Z denote the set of all points y_n, n = 1, 2, .... By hypothesis, the infinite set Z has at least one point of accumulation. If z is such a point of accumulation, then we can choose a sequence of points of Z which converges to z (see Theorem 5.5.8, part (i)), and this sequence is a subsequence {y_{l_n}} of {y_n}. Therefore, {X; ρ} is sequentially compact. This concludes the proof. ■

Our next objective is to show that in metric spaces the concepts of compactness and sequential compactness are equivalent. In doing so we employ the following lemma, the proof of which is left as an exercise.

5.6.18. Lemma. Let {X; ρ} be a sequentially compact metric space. If {Y_α: α ∈ A} is an infinite open covering of {X; ρ}, then there exists a number ε > 0 such that every sphere in X of radius ε is contained in at least one of the open sets Y_α.

5.6.19. Exercise. Prove Lemma 5.6.18.

5.6.20. Theorem. A metric space {X; ρ} is compact if and only if it is sequentially compact.

Proof. From Theorem 5.6.17, a metric space is sequentially compact if and only if it has the Bolzano-Weierstrass property. Therefore, we first show
that every infinite subset of a compact metric space has a point of accumulation. Let {X; ρ} be a compact metric space, and let Y be an infinite subset of X. For purposes of contradiction, assume that Y has no point of accumulation. Then each x ∈ X is the center of a sphere which contains no point of Y, except possibly x itself. These spheres form an infinite open covering of X. But, by hypothesis, {X; ρ} is compact, and therefore we can choose from this infinite covering a finite number of spheres which also cover X. Now each sphere from this finite subcovering contains at most one point of Y, and therefore Y is finite. But this is contrary to our original assumption, and we have arrived at a contradiction. Therefore, Y has at least one point of accumulation, and {X; ρ} is sequentially compact.

Conversely, assume that {X; ρ} is a sequentially compact metric space, and let {Y_α: α ∈ A} be an arbitrary infinite open covering of X. From Lemma 5.6.18 there exists an ε > 0 such that every sphere in X of radius ε is contained in at least one of the open sets Y_α. Now, by hypothesis, {X; ρ} is sequentially compact and is therefore totally bounded by part (iii) of Theorem 5.6.15. Thus, with arbitrary ε fixed, we can find a finite ε-net, {x_1, x_2, ..., x_l}, such that X ⊂ ∪_{i=1}^l S(x_i; ε). Now in view of Lemma 5.6.18, S(x_i; ε) ⊂ Y_{α_i}, i = 1, ..., l, where the sets Y_{α_i} are from the family {Y_α: α ∈ A}. Hence, X ⊂ ∪_{i=1}^l Y_{α_i}, and X has a finite open subcovering chosen from the infinite open covering {Y_α: α ∈ A}. Therefore, the metric space {X; ρ} is compact. This proves the theorem. ■
There is yet another way of characterizing a compact metric space. Before doing so, we give the following definition.

5.6.21. Definition. Let {F_α: α ∈ A} be an infinite family of closed sets. The family {F_α: α ∈ A} is said to have the finite intersection property if for every finite set B ⊂ A the set ∩_{α∈B} F_α is not empty.

5.6.22. Theorem. A metric space {X; ρ} is compact if and only if every infinite family {F_α: α ∈ A} of closed sets in X with the finite intersection property has a nonvoid intersection; i.e., ∩_{α∈A} F_α ≠ ∅.

5.6.23. Exercise. Prove Theorem 5.6.22.
We now summarize the above results as follows.
5.6.24. Theorem. In a metric space {X; ρ} the following are equivalent:

(i) {X; ρ} is compact;
(ii) {X; ρ} is sequentially compact;
(iii) {X; ρ} possesses the Bolzano-Weierstrass property;
(iv) {X; ρ} is complete and totally bounded; and
(v) every infinite family of closed sets in {X; ρ} with the finite intersection property has a nonvoid intersection.
Concerning product spaces we offer the following exercise.

5.6.25. Exercise. Let {X_1; ρ_1}, {X_2; ρ_2}, ..., {X_n; ρ_n} be n compact metric spaces. Let X = X_1 × X_2 × ··· × X_n, and let

ρ(x, y) = ρ_1(x_1, y_1) + ··· + ρ_n(x_n, y_n),    (5.6.26)

where x_i, y_i ∈ X_i, i = 1, ..., n, and where x, y ∈ X. Show that the product space {X; ρ} is also a compact metric space.
The next result constitutes an important characterization of compact sets in the spaces R^n and C^n.

5.6.27. Theorem. Let {R^n; ρ_2} (let {C^n; ρ_2}) be the metric space defined in Example 5.3.1. A set Y ⊂ R^n (a set Y ⊂ C^n) is compact if and only if it is closed and bounded.

5.6.28. Exercise. Prove Theorem 5.6.27.
Recall that every non-void compact set in the real line R contains its infimum and its supremum. In general, it is not an easy task to apply the results of Theorem 5.6.24 to specific spaces in order to establish necessary and sufficient conditions for compactness. From the point of view of applications, criteria such as those established in Theorem 5.6.27 are much more desirable. We now give a condition which tells us when a subset of a metric space is compact. We have:

5.6.29. Theorem. Let {X; ρ} be a compact metric space, and let Y ⊂ X. If Y is closed, then Y is compact.

Proof. Let {Y_α: α ∈ A} be any open covering of Y; i.e., each Y_α is open relative to {Y; ρ}. Then, by Theorem 5.4.20, for each Y_α there is a U_α which is open relative to {X; ρ} such that Y_α = Y ∩ U_α. Since Y is closed, its complement Y~ is an open set in {X; ρ}. Also, since X = Y ∪ Y~, the family Y~ ∪ {U_α: α ∈ A} is an open covering of X. Since X is compact, it is possible to find a finite subcovering from this family; i.e., there is a finite set B ⊂ A such that X = Y~ ∪ [∪_{α∈B} U_α]. Since Y ⊂ ∪_{α∈B} U_α, Y = ∪_{α∈B} Y ∩ U_α; i.e., {Y_α: α ∈ B} covers Y. This implies that Y is compact. ■
We close the present section by introducing the concept of relative compactness.

5.6.30. Definition. Let {X; ρ} be a metric space and let Y ⊂ X. The subset Y is said to be relatively compact in X if Ȳ is a compact subset of X.

One of the essential features of a relatively compact set is that every sequence has a convergent subsequence, just as in the case of compact subsets; however, the limit of the subsequence need not be in the subset. Thus, we have the following result.

5.6.31. Theorem. Let {X; ρ} be a metric space and let Y ⊂ X. Then Y is relatively compact in X if and only if every sequence of elements in Y contains a subsequence which converges to some x ∈ X.

Proof. Let Y be relatively compact in X, and let {y_n} be any sequence in Y. Then {y_n} belongs to Ȳ also, and hence has a convergent subsequence in Ȳ, since Ȳ is sequentially compact. Hence, {y_n} contains a subsequence which converges to some x ∈ X.

Conversely, let {y_n} be a sequence in Ȳ. Then for each n = 1, 2, ..., there is an x_n ∈ Y such that ρ(x_n, y_n) < 1/n. Since {x_n} is a sequence in Y, it contains a convergent subsequence, say {x_{n_k}}, which converges to some x ∈ X. Since {x_{n_k}} is also in Ȳ, it follows from part (iii) of Theorem 5.5.8 that x ∈ Ȳ. Hence, Ȳ is sequentially compact, and so Y is relatively compact in X. ■
5.7. CONTINUOUS FUNCTIONS
Having introduced the concept of metric space, we are in a position to give a generalization of the concept of continuity of functions encountered in calculus.

5.7.1. Definition. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f: X → Y be a mapping of X into Y. The mapping f is said to be continuous at the point x_0 ∈ X if for every ε > 0 there is a δ > 0 such that ρ_y[f(x), f(x_0)] < ε whenever ρ_x(x, x_0) < δ. The mapping f is said to be continuous on X, or simply continuous, if it is continuous at each point x ∈ X.
We note that in the above definition the δ is dependent on the choice of x_0 and ε; i.e., δ = δ(ε, x_0). Now if for each ε > 0 there exists a δ = δ(ε) > 0 such that for any x_0 we have ρ_y[f(x), f(x_0)] < ε whenever ρ_x(x, x_0) < δ, then we say that the function f is uniformly continuous on X. Henceforth, if we simply say f is continuous, we mean f is continuous on X.

5.7.2. Example. Let {X; ρ_x} = R^n_2 and let {Y; ρ_y} = R^m_2 (see Example 5.3.1). Let A denote the real m × n matrix

A = [ a_11  a_12  ···  a_1n ]
    [  ·     ·          ·  ]
    [ a_m1  a_m2  ···  a_mn ].

We denote x ∈ R^n and y ∈ R^m by x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_m), respectively. Let us define the function f: R^n → R^m by

f(x) = Ax

for each x ∈ R^n. We now show that f is continuous on R^n. If x, x_0 ∈ R^n are such that y = f(x) and y_0 = f(x_0), then we have

[ρ_y(y, y_0)]² = Σ_{i=1}^m [Σ_{j=1}^n a_ij(ξ_j − ξ_0j)]².

Using the Schwarz inequality, it follows that

[ρ_y(y, y_0)]² ≤ [Σ_{i=1}^m Σ_{j=1}^n a²_ij][Σ_{j=1}^n (ξ_j − ξ_0j)²].

Now let M = [Σ_{i=1}^m Σ_{j=1}^n a²_ij]^{1/2} ≠ 0 (if M = 0 then we are done). Given any ε > 0 and choosing δ = ε/M, it follows that ρ_y(y, y_0) < ε whenever ρ_x(x, x_0) < δ, and any mapping f: R^n → R^m which is represented by a real, constant (m × n) matrix A is continuous on R^n. ■
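The bound of Example 5.7.2 is easy to check numerically. In the Python sketch below (the particular matrix and the random test points are our own choices), M is the square root of the sum of squared entries of A, and the inequality ρ_y(Ax, Ax_0) ≤ M ρ_x(x, x_0) is verified on random pairs:

```python
import math, random

A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]                        # a real 2 x 3 matrix

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rho2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# M = [sum over i, j of a_ij^2]^(1/2), as in the Schwarz estimate.
M = math.sqrt(sum(a * a for row in A for a in row))

random.seed(1)
ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    x0 = [random.uniform(-5, 5) for _ in range(3)]
    ok = ok and rho2(matvec(A, x), matvec(A, x0)) <= M * rho2(x, x0) + 1e-12
print(ok)
```

Because M dominates the operator norm of A, the inequality holds for every pair of points, not merely the sampled ones; the choice δ = ε/M therefore works uniformly, so the map is in fact uniformly continuous.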
5.7.3. Example. Let {X; ρ_x} = {Y; ρ_y} = {C[a, b]; ρ_2}, the metric space defined in Example 5.3.12, and let us define a function f: X → Y in the following way. For x ∈ X, y = f(x) is given by

y(t) = ∫_a^b k(t, s) x(s) ds,  t ∈ [a, b],

where k: R² → R is continuous in the usual sense, i.e., with respect to the metric spaces R²_2 and R_1. We now show that f is continuous on X. Let x, x_0 ∈ X and y, y_0 ∈ Y be such that y = f(x) and y_0 = f(x_0). Then

[ρ_y(y, y_0)]² = ∫_a^b {∫_a^b k(t, s)[x(s) − x_0(s)] ds}² dt.

It follows from Hölder's inequality for integrals (5.2.5) that

ρ_y(y, y_0) ≤ M ρ_x(x, x_0),

where M = [∫_a^b ∫_a^b k²(t, s) ds dt]^{1/2}. Hence, for any ε > 0, ρ_y(y, y_0) < ε whenever ρ_x(x, x_0) < δ, where δ = ε/M. ■
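A discrete analogue of the estimate in Example 5.7.3 can be checked numerically: replacing the integrals by Riemann sums over a uniform grid, the bound ρ_2(f(x), f(x_0)) ≤ M ρ_2(x, x_0) becomes an instance of the Cauchy-Schwarz inequality for finite sums and holds exactly. The kernel and sample functions below are our own illustrative choices on [a, b] = [0, 1]:

```python
import math

n = 200
h = 1.0 / n
grid = [(j + 0.5) * h for j in range(n)]      # midpoint grid on [0, 1]

def k(t, s):
    return math.sin(t + s) + t * s            # a continuous kernel on [0,1]^2

def apply_op(xs):
    """Discrete y(t_i) = h * sum_j k(t_i, s_j) x(s_j)."""
    return [h * sum(k(t, s) * xv for s, xv in zip(grid, xs)) for t in grid]

def rho2(u, v):
    """Discrete L2 distance: [h * sum (u_i - v_i)^2]^(1/2)."""
    return math.sqrt(h * sum((a - b) ** 2 for a, b in zip(u, v)))

x = [math.cos(3 * s) for s in grid]
x0 = [s * s for s in grid]

M = math.sqrt(h * h * sum(k(t, s) ** 2 for t in grid for s in grid))
lhs = rho2(apply_op(x), apply_op(x0))
rhs = M * rho2(x, x0)
print(lhs, rhs)
```

As the grid is refined, M converges to the double integral [∫∫ k²]^{1/2} of the text, so the discrete computation mirrors the continuous estimate.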
5.7.4. Example. Consider the metric space {C[a, b]; ρ_∞} defined in Example 5.3.14. Let C¹[a, b] be the subset of C[a, b] of all functions having continuous first derivatives (on (a, b)), and let {X; ρ_x} be the metric subspace {C¹[a, b]; ρ_∞}. Let {Y; ρ_y} = {C[a, b]; ρ_∞} and define the function f: X → Y as follows. For x ∈ X, y = f(x) is given by

y(t) = dx(t)/dt.

To show that f is not continuous, we show that for any δ > 0 there is a pair x, x_0 ∈ X such that ρ_x(x, x_0) < δ but ρ_y(f(x), f(x_0)) ≥ 1. Let x_0(t) = 0 for all t ∈ [a, b], and let x(t) = α sin ωt, α > 0, ω > 0. Then ρ(x_0, x) ≤ α. Now if y_0 = f(x_0) and y = f(x), then y_0(t) = 0 for all t ∈ [a, b] and y(t) = αω cos ωt. Hence, ρ(y_0, y) = αω, provided that ω is sufficiently large, i.e., so that cos ωt = ±1 for some t ∈ [a, b]. Now no matter what value of δ we choose, there is an x ∈ X such that ρ(x, x_0) < δ if we pick α < δ. However, ρ(y, y_0) = 1 if we let ω = 1/α. Therefore, f is not continuous on X. ■
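The failure of continuity in Example 5.7.4 can be seen numerically: as α shrinks (with ω = 1/α), the sup-metric distance of x(t) = α sin ωt from x_0 = 0 goes to zero while that of its derivative stays at 1. A Python sketch, with our own choice of interval [0, 1] and sampling:

```python
import math

a, b = 0.0, 1.0
ts = [a + (b - a) * i / 10000 for i in range(10001)]

for alpha in [0.1, 0.01, 0.001]:
    omega = 1.0 / alpha
    # sup-metric distances of x and its derivative from the zero function
    sup_x = max(abs(alpha * math.sin(omega * t)) for t in ts)
    sup_dx = max(abs(alpha * omega * math.cos(omega * t)) for t in ts)
    print(alpha, sup_x, sup_dx)
```

The printout shows sup_x ≤ α tending to 0 while sup_dx remains 1 (attained at t = 0, where cos ωt = 1), exhibiting the pair of points demanded in the example for every δ.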
We can interpret the notion of continuity of functions in the following equivalent way.

5.7.5. Theorem. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let f: X → Y. Then f is continuous at a point x_0 ∈ X if and only if for every ε > 0 there exists a δ > 0 such that

f(S(x_0; δ)) ⊂ S(f(x_0); ε).

5.7.6. Exercise. Prove Theorem 5.7.5.

Intuitively, Theorem 5.7.5 tells us that f is continuous at x_0 if f(x) is arbitrarily close to f(x_0) when x is sufficiently close to x_0. The concept of continuity is depicted in Figure H for the case where {X; ρ_x} = {Y; ρ_y} = R²_2.
[5.7.7. Figure H. Illustration of continuity.]
As we did in Chapter 1, we distinguish between mappings on metric spaces which are injective, surjective, or bijective. It turns out that the concepts of continuity and convergence of sequences are related. Our next result yields a connection between convergence and continuity.

5.7.8. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces. A function f: X → Y is continuous at a point x_0 ∈ X if and only if for every sequence {x_n} of points in X which converges to a point x_0, the corresponding sequence {f(x_n)} converges to the point f(x_0) in Y; i.e.,

lim f(x_n) = f(lim x_n) = f(x_0) whenever lim x_n = x_0.

Proof. Assume that f is continuous at a point x_0 ∈ X, and let {x_n} be a sequence such that lim x_n = x_0. Then for every ε > 0 there is a δ > 0 such that ρ_y(f(x), f(x_0)) < ε whenever ρ_x(x, x_0) < δ. Also, there is an N such that ρ_x(x_n, x_0) < δ whenever n > N. Hence, ρ_y(f(x_n), f(x_0)) < ε whenever n > N. Thus, if f is continuous at x_0 and if lim x_n = x_0, then lim f(x_n) = f(x_0).

Conversely, assume that f(x_n) → f(x_0) whenever x_n → x_0. For purposes of contradiction, assume that f is not continuous at x_0. Then there exists an ε > 0 such that for each δ > 0 there is an x with the property that ρ_x(x, x_0) < δ and ρ_y(f(x), f(x_0)) ≥ ε. This implies that for each positive integer n there is an x_n such that ρ_x(x_n, x_0) < 1/n and ρ_y(f(x_n), f(x_0)) ≥ ε for all n; i.e., x_n → x_0 but {f(x_n)} does not converge to f(x_0). But we assumed that f(x_n) → f(x_0) whenever x_n → x_0. Hence, we have arrived at a contradiction, and f must be continuous at x_0. This concludes the proof of our theorem. ■
Continuous mappings on metric spaces possess the following important properties.
5.7.9. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f be a mapping of X into Y. Then

(i) f is continuous on X if and only if the inverse image of each open subset of {Y; ρ_y} is open in {X; ρ_x}; and
(ii) f is continuous on X if and only if the inverse image of each closed subset of {Y; ρ_y} is closed in {X; ρ_x}.

Proof. Let f be continuous on X, and let V ≠ ∅ be an open subset of {Y; ρ_y}. Let U = f⁻¹(V). If U = ∅, then U is trivially open; so assume U ≠ ∅. Now let x ∈ U. Then there exists a unique y = f(x) ∈ V. Since V is open, there is a sphere S(y; ε) which is entirely contained in V. Since f is continuous at x, there is a sphere S(x; δ) such that its image f(S(x; δ)) is entirely contained in S(y; ε) and therefore in V. But from this it follows that S(x; δ) ⊂ U. Hence, every x ∈ U is the center of a sphere which is contained in U. Therefore, U is open.

Conversely, assume that the inverse image of each non-empty open subset of Y is open. For arbitrary x ∈ X we have y = f(x). Since S(y; ε) ⊂ Y is open, the set f⁻¹(S(y; ε)) is open for every ε > 0, and x ∈ f⁻¹(S(y; ε)). Hence, there is a sphere S(x; δ) such that S(x; δ) ⊂ f⁻¹(S(y; ε)). From this it follows that for every ε > 0 there is a δ > 0 such that f(S(x; δ)) ⊂ S(y; ε). Therefore, f is continuous at x. But x ∈ X was arbitrarily chosen. Hence, f is continuous on X. This concludes the proof of part (i). To prove part (ii), we utilize part (i) and take complements of open sets. ■

The reader is cautioned that the image of an open subset of X under a continuous mapping f: X → Y is not necessarily an open subset of Y. For example, let f: R → R be defined by f(x) = x² for every x ∈ R. Clearly, f is continuous on R. Yet the image of the open interval (−1, 1) is the interval [0, 1). But the interval [0, 1) is not open.

We leave the proof of the next result as an exercise to the reader.
5.7.10. Theorem. Let {X; ρ_x}, {Y; ρ_y}, and {Z; ρ_z} be metric spaces, let f be a mapping of X into Y, and let g be a mapping of Y into Z. If f is continuous on X and g is continuous on Y, then the composite mapping h = g ∘ f of X into Z is continuous on X.

5.7.11. Exercise. Prove Theorem 5.7.10.

For continuous mappings on compact spaces we state and prove the following result.
5.7.12. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f: X → Y be continuous on X.

(i) If {X; ρ_x} is compact, then f(X) is a compact subset of {Y; ρ_y}.
(ii) If U is a compact subset of the metric space {X; ρ_x}, then f(U) is a compact subset of the metric space {Y; ρ_y}.
(iii) If {X; ρ_x} is compact and if U is a closed subset of X, then f(U) is a closed subset of {Y; ρ_y}.
(iv) If {X; ρ_x} is compact, then f is uniformly continuous on X.
Proof. To prove part (i), let {y_n} be a sequence in f(X). Then there are points {x_n} in X such that y_n = f(x_n). Since {X; p_x} is compact we can find a subsequence {x_{n_k}} of {x_n} which converges to a point in X; i.e., x_{n_k} → x. In view of Theorem 5.7.8 we have, since f is continuous at x, f(x_{n_k}) → f(x) ∈ f(X). From this it follows that the sequence {y_n} has a convergent subsequence and f(X) is compact.
To prove part (ii), let U be a compact subset of X. Then {U; p_x} is a compact metric space. In view of part (i) it now follows that f(U) is also a compact subset of the metric space {Y; p_y}.
To prove part (iii), we first observe that a closed subset U of a compact metric space {X; p_x} is itself compact, and {U; p_x} is itself a compact metric space. In view of part (ii), f(U) is a compact subset of the metric space {Y; p_y} and as such is bounded and closed.
To prove part (iv), let ε > 0. For every x ∈ X there is some positive number η(x) such that f(S(x; 2η(x))) ⊂ S(f(x); ε/2). Now the family {S(x; η(x)): x ∈ X} is an open covering of X. Since X is compact, there is a finite set, say F ⊂ X, such that {S(x; η(x)): x ∈ F} is a covering of X. Now let δ = min {η(x): x ∈ F}. Since F is a finite set, δ is some positive number. Now let x, y ∈ X be such that p(x, y) < δ. Choose z ∈ F such that x ∈ S(z; η(z)). Since δ ≤ η(z), y ∈ S(z; 2η(z)). Since f(S(z; 2η(z))) ⊂ S(f(z); ε/2), it follows that f(x) and f(y) are in S(f(z); ε/2). Hence, p_y(f(x), f(y)) < ε. Since δ does not depend on x ∈ X, f is uniformly continuous on X. This completes the proof of the theorem. ■

Let us next consider some additional generalizations of concepts encountered in the calculus.

5.7.13. Definition. Let {X; p_x} and {Y; p_y} be metric spaces, and let {f_n} be a sequence of functions from X into Y. If {f_n(x)} converges at each x ∈ X, then we say that {f_n} is pointwise convergent.
In this case we write lim_n f_n = f, where f is defined for every x ∈ X. Equivalently, we say that the sequence {f_n} is pointwise convergent to a function f if for every ε > 0 and for every x ∈ X there is an integer N = N(ε, x) such that

p_y(f_n(x), f(x)) < ε

whenever n > N(ε, x). In general, N(ε, x) is not necessarily bounded. However, if N(ε, x) is bounded for all x ∈ X, then we say that the sequence {f_n} converges to f uniformly on X. Let M(ε) = sup_{x∈X} N(ε, x) < ∞. Equivalently, we say that the sequence {f_n} converges uniformly to f on X if for every ε > 0 there is an M(ε) such that

p_y(f_n(x), f(x)) < ε

whenever n > M(ε), for all x ∈ X.
In the next result a connection between uniform convergence of functions and continuity is established. (We used a special case of this result in the proof of Example 5.5.28.)

5.7.14. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces, and let {f_n} be a sequence of functions from X into Y such that f_n is continuous on X for each n. If the sequence {f_n} converges uniformly to f on X, then f is continuous on X.

Proof. Assume that the sequence {f_n} converges uniformly to f on X. Then for every ε > 0 there is an N such that p_y(f_n(x), f(x)) < ε whenever n > N, for all x ∈ X. If M > N is a fixed integer, then f_M is continuous on X. Letting x₀ ∈ X be fixed, we can find a δ > 0 such that p_y(f_M(x), f_M(x₀)) < ε whenever p_x(x, x₀) < δ. Therefore, we have

p_y(f(x), f(x₀)) ≤ p_y(f(x), f_M(x)) + p_y(f_M(x), f_M(x₀)) + p_y(f_M(x₀), f(x₀)) < 3ε

whenever p_x(x, x₀) < δ. From this it follows that f is continuous at x₀. Since x₀ was arbitrarily chosen, f is continuous at all x ∈ X. This proves the theorem. ■

The reader will recognize in the last result of the present section several generalizations from the calculus to real-valued functions defined on metric spaces.

5.7.15. Theorem. Let {X; p_x} be a metric space, and let {R; p} denote the real line R with the usual metric. Let f: X → R, and let U ⊂ X. If f is continuous on X and if U is a compact subset of {X; p_x}, then

(i) f is uniformly continuous on U;
(ii) f is bounded on U; and
(iii) if U ≠ ∅, f attains its infimum and supremum on U; i.e., there exist x₀, x₁ ∈ U such that f(x₀) = inf {f(x): x ∈ U} and f(x₁) = sup {f(x): x ∈ U}.
Proof. Part (i) follows from part (iv) of Theorem 5.7.12. Since U is a compact subset of X, it follows that f(U) is a compact subset of R. Thus, f(U) is bounded and closed. From this it follows that f is bounded. To prove part (iii), note that if U is a non-empty compact subset of {X; p_x}, then f(U) is a non-empty compact subset of R. This implies that f attains its infimum and supremum on U. ■
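Theorem 5.7.14 also explains a classical negative example: on [0, 1] the functions f_n(t) = tⁿ are continuous and converge pointwise to a discontinuous limit, so by the theorem the convergence cannot be uniform. The numerical sketch below (our own illustration, not from the text) estimates sup_t |f_n(t) − f(t)| on a grid and shows that it does not shrink as n grows:

```python
def f_n(t, n):
    return t ** n        # continuous on [0, 1] for every n

def f_limit(t):
    # pointwise limit: 0 for 0 <= t < 1, and 1 at t = 1 (discontinuous)
    return 1.0 if t == 1.0 else 0.0

def sup_error(n, grid=1000):
    # crude estimate of sup_t |f_n(t) - f_limit(t)| over a grid on [0, 1]
    pts = [i / grid for i in range(grid + 1)]
    return max(abs(f_n(t, n) - f_limit(t)) for t in pts)

# near t = 1 the error stays close to 1 for every n: no uniform convergence
errors = [sup_error(n) for n in (5, 50, 500)]
```

Since the pointwise limit is discontinuous while each f_n is continuous, Theorem 5.7.14 rules out uniform convergence, which is exactly what the stubbornly large sup errors reflect.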
5.8. SOME IMPORTANT RESULTS IN APPLICATIONS
In this section we present two results which are used widely in applications. The first of these is called the fixed point principle, while the second is known as the Ascoli-Arzela theorem. Both of these results are widely utilized, for example, in establishing existence and uniqueness of solutions of various types of equations (ordinary differential equations, integral equations, algebraic equations, functional differential equations, and the like).
We begin by considering a special class of continuous mappings on metric spaces, so-called contraction mappings.

5.8.1. Definition. Let {X; p} be a metric space and let f: X → X. The function f is said to be a contraction mapping if there exists a real number c such that 0 < c < 1 and

p(f(x), f(y)) ≤ c p(x, y)    (5.8.2)

for all x, y ∈ X.
The reader can readily verify the following result.

5.8.3. Theorem. Every contraction mapping is uniformly continuous on X.

5.8.4. Exercise. Prove Theorem 5.8.3.
The following result is known as the fixed point principle or the principle of contraction mappings.

5.8.5. Theorem. Let {X; p} be a complete metric space, and let f be a contraction mapping of X into X. Then

(i) there exists a unique point x₀ ∈ X such that

f(x₀) = x₀;    (5.8.6)

and (ii) for any x₁ ∈ X, the sequence {x_n} in X defined by

x_{n+1} = f(x_n),    n = 1, 2, ...    (5.8.7)

converges to the unique element x₀ given in (5.8.6).
The unique point x₀ satisfying Eq. (5.8.6) is called a fixed point of f. In this case we say that x₀ is obtained by the method of successive approximations.

Proof. We first show that if there is an x₀ ∈ X satisfying (5.8.6), then it must be unique. Suppose that x₀ and y₀ satisfy (5.8.6). Then by inequality (5.8.2) we have p(x₀, y₀) ≤ c p(x₀, y₀). Since 0 < c < 1, it follows that p(x₀, y₀) = 0 and therefore x₀ = y₀.
Now let x₁ be any point in X. We want to show that the sequence {x_n} generated by Eq. (5.8.7) is a Cauchy sequence. For any n > 1 we have p(x_{n+1}, x_n) ≤ c p(x_n, x_{n−1}). By induction we see that p(x_{n+1}, x_n) ≤ c^{n−1} p(x₂, x₁) for n = 1, 2, .... Thus, for any m > n we have

p(x_m, x_n) ≤ Σ_{k=n}^{m−1} p(x_{k+1}, x_k) ≤ c^{n−1} p(x₂, x₁)[1 + c + ... + c^{m−1−n}] ≤ (c^{n−1}/(1 − c)) p(x₂, x₁).

Since 0 < c < 1, the right-hand side of the above inequality can be made arbitrarily small by choosing n sufficiently large. Thus, {x_n} is a Cauchy sequence. Next, since {X; p} is complete, it follows that {x_n} converges; i.e., lim_n x_n exists. Let lim_n x_n = x. Now since f is continuous on X, we have lim_n f(x_n) = f(lim_n x_n) = f(x), and lim_n f(x_n) = lim_n x_{n+1} = x. Thus, f(x) = x, and we have proven the existence of a fixed point of f. Since we have already proven uniqueness, the proof is complete. ■

It may turn out that the composite function f^(n) ≜ f ∘ f ∘ ... ∘ f is a contraction mapping, whereas f is not. The following result shows that such a mapping still has a unique fixed point.
5.8.8. Corollary. Let {X; p} be a complete metric space, and let f: X → X be continuous on X. If the composite function f^(n) = f ∘ f ∘ ... ∘ f is a contraction mapping, then there is a unique point x₀ ∈ X such that

f(x₀) = x₀.    (5.8.9)

Moreover, the fixed point can be determined by the method of successive approximations (see Theorem 5.8.5).

5.8.10. Exercise. Prove Corollary 5.8.8.
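To make the method of successive approximations concrete, consider (our own illustration, not from the text) f(x) = cos x on [0, 1]: f maps [0, 1] into itself, and |f′(x)| = sin x ≤ sin 1 < 1 there, so f is a contraction on the complete metric space {[0, 1]; p}. Theorem 5.8.5 then guarantees a unique fixed point, which the iteration (5.8.7) locates; the helper name below is ours.

```python
import math

def successive_approximations(f, x1, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until successive iterates agree within tol."""
    x = x1
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# f(x) = cos x is a contraction on [0, 1] since |f'(x)| <= sin 1 < 1 there
x0 = successive_approximations(math.cos, 0.5)
# x0 satisfies the fixed-point equation cos(x0) = x0
residual = abs(math.cos(x0) - x0)
```

Starting from any x₁ ∈ [0, 1] produces the same limit, exactly as the uniqueness part of the theorem demands.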
We will consider several applications of the above results in the last section of this chapter. Before we can consider the Arzela-Ascoli theorem, we need to introduce the following concept.
5.8.11. Definition. Let C[a, b] denote the set of all continuous real-valued functions defined on the interval [a, b] of the real line R. A subset Y of C[a, b] is said to be equicontinuous on [a, b] if for every ε > 0 there exists a δ > 0 such that |x(t) − x(t₀)| < ε for all x ∈ Y and all t, t₀ such that |t − t₀| < δ. Note that in this definition δ depends only on ε, and not on x, t, or t₀.
We now state and prove the Arzela-Ascoli theorem.
5.8.12. Theorem. Let {C[a, b]; p_∞} be the metric space defined in Example 5.3.14. Let Y be a bounded subset of C[a, b]. If Y is equicontinuous on [a, b], then Y is relatively compact in C[a, b].
Proof. For each positive integer k, let us divide the interval [a, b] into k equal parts by the set of points V_k = {t_{0k}, t_{1k}, ..., t_{kk}} ⊂ [a, b]. That is, a = t_{0k} < t_{1k} < ... < t_{kk} = b, where t_{ik} = a + (i/k)(b − a), i = 0, 1, ..., k, and

[a, b] = ∪_{i=1}^{k} [t_{i−1,k}, t_{ik}]    for all k = 1, 2, ....

Since each V_k is a finite set, ∪_{k=1}^{∞} V_k is a countable set. For convenience of notation, let us denote this set by {τ₁, τ₂, ...}. The ordering of this set is immaterial.
Next, since Y is bounded, there is a γ > 0 such that p_∞(x, y) < γ for all x, y ∈ Y. Let x₀ be held fixed in Y, and let y ∈ Y be arbitrary. Let 0 ∈ C[a, b] be the function which is zero for all t ∈ [a, b]. Then p_∞(y, 0) ≤ p_∞(y, x₀) + p_∞(x₀, 0). Hence, p_∞(y, 0) < M for all y ∈ Y, where M = γ + p_∞(x₀, 0). This implies that sup_{t∈[a,b]} |y(t)| < M for all y ∈ Y.
Now let {y_n} be an arbitrary sequence in Y. We want to show that {y_n} contains a convergent subsequence. Since |y_n(τ₁)| < M for all n, the sequence of real numbers {y_n(τ₁)} contains a convergent subsequence which we shall call {y_{1n}(τ₁)}. Again, since |y_{1n}(τ₂)| < M for all n, the sequence of real numbers {y_{1n}(τ₂)} contains a convergent subsequence which we shall call {y_{2n}(τ₂)}. We see that {y_{2n}(τ₁)} is a subsequence of {y_{1n}(τ₁)}, and hence it is convergent. Proceeding in a similar fashion, we obtain sequences {y_{1n}}, {y_{2n}}, ... such that {y_{kn}} is a subsequence of {y_{jn}} for all k > j. Furthermore, each sequence is such that lim_n y_{kn}(τ_i) exists for each i such that 1 ≤ i ≤ k. Now let {x_n} be the diagonal sequence {y_{nn}}. Then {x_n} is a subsequence of each {y_{kn}}, and lim_n x_n(τ_i) exists for i = 1, 2, ....
We now wish to show that {x_n} is a Cauchy sequence in {C[a, b]; p_∞}. Let ε > 0 be given. Since Y is equicontinuous on [a, b], we can find a positive integer k such that |x_n(t) − x_n(t′)| < ε/3 for every n whenever |t − t′| < 1/k. Since {x_n(τ_i)} is a convergent sequence of real numbers, there exists a positive integer N such that |x_n(τ_i) − x_m(τ_i)| < ε/3 whenever m > N and n > N, for all τ_i ∈ V_k. Now, if t ∈ [a, b], there is some τ_i ∈ V_k such that |t − τ_i| < 1/k.
Hence, for all m > N and n > N, we have

|x_m(t) − x_n(t)| ≤ |x_m(t) − x_m(τ_i)| + |x_m(τ_i) − x_n(τ_i)| + |x_n(τ_i) − x_n(t)| < ε.
This implies that p_∞(x_m, x_n) < ε for all m, n > N. Therefore, {x_n} is a Cauchy sequence in C[a, b]. Since {C[a, b]; p_∞} is a complete metric space (see Example 5.5.28), {x_n} converges to some point in C[a, b]. This implies that {y_n} has a subsequence which converges to a point in C[a, b] and so, by Theorem 5.6.31, Y is relatively compact in C[a, b]. This completes the proof of the theorem. ■

Our next result follows directly from Theorem 5.8.12. It is sometimes referred to as Ascoli's lemma.

5.8.13. Corollary. Let {φ_n} be a sequence of functions in {C[a, b]; p_∞}. If {φ_n} is equicontinuous on [a, b] and uniformly bounded on [a, b] (i.e., there exists an M > 0 such that sup_{a≤t≤b} |φ_n(t)| < M for all n), then there exist a φ ∈ C[a, b] and a subsequence {φ_{n_k}} of {φ_n} such that {φ_{n_k}} converges to φ uniformly on [a, b].

5.8.14. Exercise. Prove Corollary 5.8.13.
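A numerical way to see what Definition 5.8.11 demands (an illustrative sketch of ours; `family_modulus` is a hypothetical helper): estimate the worst oscillation sup |x(t) − x(s)| over a family when |t − s| ≤ δ. For {sin(t)/n} a single δ works uniformly, since every member has Lipschitz constant at most 1, while for {sin(nt)} the oscillation over a fixed δ approaches 2 as n grows, so no single δ can serve the whole sequence.

```python
import math

def family_modulus(funcs, delta, a=0.0, b=math.pi, grid=2000):
    """Estimate sup |x(t) - x(s)| over all x in funcs and |t - s| <= delta."""
    step = (b - a) / grid
    worst = 0.0
    for x in funcs:
        for i in range(grid):
            t = a + i * step
            s = min(t + delta, b)
            worst = max(worst, abs(x(t) - x(s)))
    return worst

delta = 0.01
# Equicontinuous family: sin(t)/n has Lipschitz constant 1/n <= 1 for every n,
# so the oscillation over |t - s| <= delta never exceeds delta.
equi = [lambda t, n=n: math.sin(t) / n for n in range(1, 21)]
m_equi = family_modulus(equi, delta)

# Not equicontinuous: sin(n*t) sweeps through about n*delta radians within
# delta, so the oscillation grows toward 2 as n increases.
not_equi = [lambda t, n=n: math.sin(n * t) for n in range(1, 201)]
m_not = family_modulus(not_equi, delta)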
We close the present section with the following converse to Theorem 5.8.12.

5.8.15. Theorem. Let Y be a subset of C[a, b] which is relatively compact in the metric space {C[a, b]; p_∞}. Then Y is a bounded set and is equicontinuous on [a, b].

5.8.16. Exercise. Prove Theorem 5.8.15.

5.9. EQUIVALENT AND HOMEOMORPHIC METRIC SPACES. TOPOLOGICAL SPACES
It is possible that seemingly different metric spaces may exhibit properties which are very similar with regard to such concepts as open sets, limits of sequences, and continuity of functions. For example, for each p, 1 ≤ p ≤ ∞, the spaces R_p^n (see Examples 5.3.1, 5.3.3) are different metric spaces. However, it turns out that the family of all open sets is the same in all of these metric spaces for 1 ≤ p ≤ ∞ (e.g., the family of open sets in R_1^n is the same as the family of open sets in R_2^n, which is the same as the family of open sets in R_3^n, etc.). Furthermore, metric spaces which are not even defined on the same underlying set (e.g., the metric spaces {X; p_x} and {Y; p_y}, where X ≠ Y) may have many similar properties of the type mentioned above. We begin with equivalence of metric spaces defined on the same underlying set.
5.9.1. Definition. Let {X; p₁} and {X; p₂} be two metric spaces defined on the same underlying set X. Let 𝒯₁ and 𝒯₂ be the topology of X determined by p₁ and p₂, respectively. Then the metrics p₁ and p₂ are said to be equivalent metrics if 𝒯₁ = 𝒯₂.

Throughout the present section we use the notation

f: {X; p₁} → {Y; p₂}

to indicate a mapping from X into Y, where the metric on X is p₁ and the metric on Y is p₂. This distinction becomes important in the case where X = Y, i.e., in the case f: {X; p₁} → {X; p₂}. Let us denote by i the identity mapping from X onto X; i.e., i(x) = x for all x ∈ X. Clearly, i is a bijective mapping, and the inverse is simply i itself. However, since the domain and range of i may have different metrics associated with them, we shall write

i: {X; p₁} → {X; p₂}  and  i⁻¹: {X; p₂} → {X; p₁}.
With the foregoing statements in mind, we provide in the following theorem a number of equivalent statements to characterize equivalent metrics.

5.9.2. Theorem. Let {X; p₁}, {X; p₂}, and {Y; p₃} be metric spaces. Then the following statements are equivalent:

(i) p₁ and p₂ are equivalent metrics;
(ii) for any mapping f: X → Y, f: {X; p₁} → {Y; p₃} is continuous on X if and only if f: {X; p₂} → {Y; p₃} is continuous on X;
(iii) the mapping i: {X; p₁} → {X; p₂} is continuous on X, and the mapping i⁻¹: {X; p₂} → {X; p₁} is continuous on X; and
(iv) for any sequence {x_n} in X, {x_n} converges to a point x in {X; p₁} if and only if {x_n} converges to x in {X; p₂}.

Proof. To prove this theorem we show that statement (i) implies statement (ii); that statement (ii) implies statement (iii); that statement (iii) implies statement (iv); and that statement (iv) implies statement (i).
To show that (i) implies (ii), assume that p₁ and p₂ are equivalent metrics, and let f be any continuous mapping from {X; p₁} into {Y; p₃}. Let U be any open set in {Y; p₃}. Since f is continuous, f⁻¹(U) is an open set in {X; p₁}. Since p₁ and p₂ are equivalent metrics, f⁻¹(U) is also an open set in {X; p₂}. Hence, the mapping f: {X; p₂} → {Y; p₃} is continuous. The proof of the converse in statement (ii) is identical.
We now show that (ii) implies (iii). Clearly, the mapping i: {X; p₂} → {X; p₂} is continuous. Now assume the validity of statement (ii), and let {Y; p₃} = {X; p₂}. Then i: {X; p₁} → {X; p₂} is continuous. Again, it is clear that i⁻¹: {X; p₁} → {X; p₁} is continuous. Letting {Y; p₃} = {X; p₁} in statement (ii), it follows that i⁻¹: {X; p₂} → {X; p₁} is continuous.
Next, we show that (iii) implies (iv). Let i: {X; p₁} → {X; p₂} be continuous, and let the sequence {x_n} in the metric space {X; p₁} converge to x. By Theorem 5.7.8, lim_n i(x_n) = i(x); i.e., lim_n x_n = x in {X; p₂}. The converse is proven in the same manner.
Finally, we show that (iv) implies (i). Let U be an open set in {X; p₁}. Then its complement Ũ is closed in {X; p₁}. Now let {x_n} be a sequence in Ũ which converges to x in {X; p₁}. Then x ∈ Ũ by part (iii) of Theorem 5.5.8. By assumption, {x_n} converges to x in {X; p₂} also. Furthermore, since x ∈ Ũ, Ũ is closed in {X; p₂} by part (iii) of Theorem 5.5.8. Hence, U is open in {X; p₂}. Letting U be an open set in {X; p₂}, by the same reasoning we conclude that U is open in {X; p₁}. Thus, p₁ and p₂ are equivalent metrics. This concludes the proof of the theorem. ■

The next result establishes sufficient conditions for two metrics to be equivalent. These conditions are not necessary, however.
5.9.3. Theorem. Let {X; p₁} and {X; p₂} be two metric spaces. If there exist two positive real numbers, γ and λ, such that

γ p₂(x, y) ≤ p₁(x, y) ≤ λ p₂(x, y)

for all x, y ∈ X, then p₁ and p₂ are equivalent metrics.

5.9.4. Exercise. Prove Theorem 5.9.3.
Let us now consider some specific examples of equivalent metric spaces.

5.9.5. Exercise. Let {X; p} be any metric space. In Exercise 5.1.10 the reader showed that {X; p₁} is a metric space, where

p₁(x, y) = p(x, y)/(1 + p(x, y))

for all x, y ∈ X. Show that p and p₁ are equivalent metrics.
5.9.6. Theorem. Let {Rⁿ; p₁} = R_1^n and {Rⁿ; p₂} = R_2^n be the metric spaces defined in Example 5.3.1, and let {Rⁿ; p_∞} be the metric space defined in Example 5.3.3. Then

(i) p_∞(x, y) ≤ p₂(x, y) ≤ √n p_∞(x, y) for all x, y ∈ Rⁿ;
(ii) p_∞(x, y) ≤ p₁(x, y) ≤ n p_∞(x, y) for all x, y ∈ Rⁿ; and
(iii) p₁, p₂, and p_∞ are equivalent metrics.
5.9.7. Exercise. Prove Theorem 5.9.6.
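The inequalities of Theorem 5.9.6 are easy to spot-check numerically before proving them. The sketch below (our own illustration, not from the text) compares the three metrics on random points of R⁵:

```python
import math
import random

def p1(x, y):     # "taxicab" metric on R^n
    return sum(abs(a - b) for a, b in zip(x, y))

def p2(x, y):     # Euclidean metric on R^n
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def p_inf(x, y):  # maximum metric on R^n
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
n = 5
ok = True
for _ in range(100):
    x = [random.uniform(-10.0, 10.0) for _ in range(n)]
    y = [random.uniform(-10.0, 10.0) for _ in range(n)]
    # (i)  p_inf <= p2 <= sqrt(n) * p_inf
    # (ii) p_inf <= p1 <= n * p_inf
    ok = ok and p_inf(x, y) <= p2(x, y) <= math.sqrt(n) * p_inf(x, y)
    ok = ok and p_inf(x, y) <= p1(x, y) <= n * p_inf(x, y)
```

Together with Theorem 5.9.3, these two-sided bounds immediately give part (iii): all three metrics are equivalent.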
It can be shown that for the metric spaces {Rⁿ; p_p} and {Rⁿ; p_q}, p_p and p_q are equivalent metrics for any p, q such that 1 ≤ p < ∞, 1 ≤ q < ∞. In Example 5.1.12, we defined a metric p*, called the usual metric for R*. Up until now, it has not been apparent that there is any meaningful connection between p* and the usual metric for R. The following result shows that when p* is restricted to R, it is equivalent to the usual metric on R.

5.9.8. Theorem. Let {R; p} denote the real line with the usual metric, and let {R*; p*} denote the extended real line (see Exercise 5.1.12). Consider {R; p*}, which is a metric subspace of {R*; p*}. Then

(i) for the metric spaces {R; p} and {R; p*}, p and p* are equivalent metrics;
(ii) if U ⊂ R, then U is open in {R; p} if and only if U is open in {R*; p*}; and
(iii) if U is open in {R*; p*}, then U ∩ R, U − {+∞}, and U − {−∞} are open in {R*; p*}.
5.9.9. Exercise. Prove Theorem 5.9.8. (Hint: Use part (iii) of Theorem 5.9.2 to prove part (i) of this theorem.)
Our next example shows that i⁻¹ need not be continuous, even though i is continuous.
5.9.10. Example. Let X be any non-empty set, and let p₁ be the discrete metric on X (see Example 5.1.7). In Exercise 5.4.26 the reader was asked to show that every subset of X is open in {X; p₁}. Now let {X; p} be an arbitrary metric space with the same underlying set X. Clearly, i: {X; p₁} → {X; p} is continuous. However, i⁻¹: {X; p} → {X; p₁} is not continuous unless every subset of {X; p} is open. Since this is usually not true, i⁻¹ need not be continuous. ■

Next, we introduce the concepts of homeomorphism and homeomorphic metric spaces.

5.9.11. Definition. Two metric spaces {X; p_x} and {Y; p_y} are said to be homeomorphic if there exists a mapping φ: {X; p_x} → {Y; p_y} such that (i) φ is a bijective mapping of X onto Y, and (ii) E ⊂ X is open in {X; p_x} if and only if φ(E) is open in {Y; p_y}. The mapping φ is called a homeomorphism.

We immediately have the following generalization of Theorem 5.9.2.
p,.), and { Z ; p,) be metric spaces, and P.. ) onto { Y ; p,.). Then the following
5.9. Equivalent and oH meomorphic
Metric Spaces. Topological Spaces
321
(i) rp is a homeomorphism; Px} - + Z { ; Pz} is continuous on (ii) for any mapping f: X - + Z, f: ;X { X if and only iff0 rp-I: { Y ; py} - + { Z ; Pz} is continuous on ;Y (iii) rp: { X ; Px} - + { Y ; py} is continuous and rp-I: {Y; py} - + {X; Px} is continuous; and (iv) for any sequence x { n } in ,X x { n } converges to a point x in {X; Px} if and only if r{ p(x n )} converges to rp(x) in { Y ; py}. 5.9.13. Exercise.
Prove Theorem 5.9.12.
The connection between homeomorphic metric spaces defined on the same underlying set and equivalent metrics is provided by the next result.

5.9.14. Theorem. Let {X; p₁} and {X; p₂} be two metric spaces with the same underlying set X. Then p₁ and p₂ are equivalent if and only if the identity mapping i: {X; p₁} → {X; p₂} is a homeomorphism.

5.9.15. Exercise. Prove Theorem 5.9.14.

It is possible for {X; p₁} and {X; p₂} to be homeomorphic, even though p₁ and p₂ may not be equivalent.
There are important cases for which the metric relations between the elements of two distinct metric spaces are the same. In such cases only the nature of the elements of the metric spaces differ. Since this difference may be of no importance, such spaces may often be viewed as being essentially identical. Such metric spaces are said to be isometric. Specifically, we have:
5.9.16. Definition. Let {X; p_x} and {Y; p_y} be two metric spaces, and let φ: {X; p_x} → {Y; p_y} be a bijective mapping of X onto Y. The mapping φ is said to be an isometry if

p_x(x, y) = p_y(φ(x), φ(y))

for all x, y ∈ X. If such an isometry exists, then the metric spaces {X; p_x} and {Y; p_y} are said to be isometric.

5.9.17. Theorem. Let φ be an isometry. Then φ is a homeomorphism.

5.9.18. Exercise. Prove Theorem 5.9.17.
We close the present section by introducing the concept of a topological space. It turns out that metric spaces are special cases of such spaces.
In Theorem 5.4.15 we showed that, in the case of a metric space {X; p}, (i) the empty set ∅ and the entire space X are open; (ii) the union of an arbitrary collection of open sets is open; and (iii) the intersection of a finite collection of open sets is open. Examining the various proofs of the present chapter, we note that a great deal of the development of metric spaces is not a consequence of the metric but, rather, depends only on the properties of certain open and closed sets. Taking the notion of open set as basic (instead of the concept of distance, as in the case of metric spaces) and taking the aforementioned properties of open sets as postulates, we can form a mathematical structure which is much more general than the metric space.

5.9.19. Definition. Let X be a non-void set of points, and let 𝒯 be a family of subsets which we will call open. We call the pair {X; 𝒯} a topological space if the following hold:

(i) X ∈ 𝒯, ∅ ∈ 𝒯;
(ii) if U₁ ∈ 𝒯 and U₂ ∈ 𝒯, then U₁ ∩ U₂ ∈ 𝒯; and
(iii) for any index set A, if α ∈ A and U_α ∈ 𝒯, then ∪_{α∈A} U_α ∈ 𝒯.
The family 𝒯 is called the topology for the set X. The complement of an open set U ∈ 𝒯 with respect to X is called a closed set.
The reader can readily verify the following results.

5.9.20. Theorem. Let {X; 𝒯} be a topological space. Then
(i) ∅ is closed;
(ii) X is closed;
(iii) the union of a finite number of closed sets is closed; and
(iv) the intersection of an arbitrary collection of closed sets is closed.
Exercise.
Prove Theorem 5.9.20.
We close the present section by citing several specific examples of topological spaces.

5.9.22. Example. In view of Theorem 5.4.15, every metric space is a topological space. ■

5.9.23. Example. Let X = {x, y}, and let the open sets in X be the void set ∅, the set X itself, and the set {x}. If 𝒯 is defined in this way, then {X; 𝒯} is a topological space. In this case the closed sets are ∅, X, and {y}. ■

5.9.24. Example. Although many fundamental concepts carry over from metric spaces to topological spaces, it turns out that the concept of topological space is often too general. Therefore, it is convenient to suppose that certain topological spaces satisfy some additional conditions which are also true in metric spaces. These conditions, called the separation axioms, are imposed on topological spaces {X; 𝒯} to form the following important special cases:
T₁-spaces: A topological space {X; 𝒯} is called a T₁-space if every set consisting of a single point is closed. Equivalently, a space is called a T₁-space provided that if x and y are distinct points, there is an open set containing y but not x. Clearly, metric spaces satisfy the T₁-axiom.
T₂-spaces: A topological space {X; 𝒯} is called a T₂-space if for all distinct points x, y ∈ X there are disjoint open sets U_x and U_y such that x ∈ U_x and y ∈ U_y. T₂-spaces are also called Hausdorff spaces. All metric spaces are Hausdorff spaces. Also, all T₂-spaces are T₁-spaces. However, there are T₁-spaces which do not satisfy the T₂-separation axiom.
T₃-spaces: A topological space {X; 𝒯} is called a T₃-space if (i) it is a T₁-space, and (ii) given a closed set Y and a point x not in Y, there are disjoint open sets U₁ and U₂ such that x ∈ U₁ and Y ⊂ U₂. T₃-spaces are also called regular topological spaces. All metric spaces are T₃-spaces. All T₃-spaces are T₂-spaces; however, not all T₂-spaces are T₃-spaces.
T₄-spaces: A topological space {X; 𝒯} is called a T₄-space if (i) it is a T₁-space, and (ii) for each pair of disjoint closed sets Y₁, Y₂ in X there exists a pair of disjoint open sets U₁, U₂ such that Y₁ ⊂ U₁ and Y₂ ⊂ U₂. T₄-spaces are also called normal topological spaces. Such spaces are clearly T₃-spaces. However, there are T₃-spaces which are not normal topological spaces. On the other hand, all metric spaces are T₄-spaces. ■
5.10. APPLICATIONS

The present section consists of two parts (subsections A and B). In the first part we make extensive use of the contraction mapping principle to establish existence and uniqueness results for various types of equations. This part consists essentially of some specific examples. In the second part, we continue the discussion of Section 4.11, dealing with ordinary differential equations. Specifically, we will apply Ascoli's lemma, and we will answer the questions raised at the end of subsection 4.11A.
A. Applications of the Contraction Mapping Principle

In our first example we consider a scalar algebraic equation which may be linear or nonlinear.

5.10.1. Example. Consider the equation

x = f(x),    (5.10.2)

where f: [a, b] → [a, b] and where [a, b] is a closed interval of R. Let L > 0, and assume that f satisfies the condition

|f(x₂) − f(x₁)| ≤ L |x₂ − x₁|    (5.10.3)
for all x₁, x₂ ∈ [a, b]. In this case f is said to satisfy a Lipschitz condition, and L is called a Lipschitz constant. Now consider the complete metric space {R; p}, where p denotes the usual metric on the real line. Then {[a, b]; p} is a complete metric subspace of {R; p} (see Theorem 5.5.33). If in (5.10.3) we assume that L < 1, then f is clearly a contraction mapping, and Theorem 5.8.5 applies. It follows that if L < 1, then Eq. (5.10.2) possesses a unique solution. Specifically, if x₀ ∈ [a, b], then the sequence {x_n}, n = 1, 2, ..., determined by x_n = f(x_{n−1}) converges to the unique solution of Eq. (5.10.2). Note that if |df(x)/dx| = |f′(x)| ≤ c < 1 on the interval [a, b] (in this case f′(a) denotes the right-hand derivative of f at a, and f′(b) denotes the left-hand derivative of f at b), then f is clearly a contraction.
In Figures J and K the applicability of the contraction mapping principle is demonstrated pictorially. As indicated, the sequence {x_n} determined by successive approximations converges to the fixed point x.

5.10.4. Figure J. Successive approximations (convergent case).

5.10.5. Figure K. Successive approximations (convergent case).

In our next example we consider a system of linear equations.

5.10.6. Example.
Consider the system of n linear equations

ξ_i = Σ_{j=1}^{n} a_{ij} ξ_j + β_i,    i = 1, ..., n.    (5.10.7)

Assume that x = (ξ₁, ..., ξ_n) ∈ Rⁿ, b = (β₁, ..., β_n) ∈ Rⁿ, and a_{ij} ∈ R. Here the constants a_{ij}, β_i are known and the ξ_i are unknown. In the following we use the contraction mapping principle to determine conditions for the existence and uniqueness of solutions of Eq. (5.10.7). In doing so we consider different metric spaces. In all cases we let y = f(x) denote the mapping determined by the system of linear equations

η_i = Σ_{j=1}^{n} a_{ij} ξ_j + β_i,    i = 1, ..., n,

where y = (η₁, ..., η_n) ∈ Rⁿ.
First we consider the complete space {Rⁿ; p₁} = R_1^n. Let y′ = f(x′), y″ = f(x″), x′ = (ξ′₁, ..., ξ′_n), and x″ = (ξ″₁, ..., ξ″_n). We have

p₁(y′, y″) = p₁(f(x′), f(x″)) = Σ_{i=1}^{n} |Σ_{j=1}^{n} a_{ij} ξ′_j + β_i − Σ_{j=1}^{n} a_{ij} ξ″_j − β_i|
    ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} |a_{ij}| |ξ′_j − ξ″_j| ≤ {max_j Σ_{i=1}^{n} |a_{ij}|} p₁(x′, x″),

where in the preceding the Hölder inequality for finite sums was used (see Theorem 5.2.1). Clearly, f is a contraction if the inequality

Σ_{i=1}^{n} |a_{ij}| < 1    (5.10.8)

holds for all j. Thus, Eq. (5.10.7) possesses a unique solution if (5.10.8) holds for all j.
Next, we consider the complete space {Rⁿ; p₂} = R_2^n. We have

p₂²(y′, y″) = p₂²(f(x′), f(x″)) = Σ_{i=1}^{n} {Σ_{j=1}^{n} a_{ij}(ξ′_j − ξ″_j)}² ≤ {Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}²} p₂²(x′, x″),

where, in the preceding, the Schwarz inequality for finite sums was employed (see Theorem 5.2.1). It follows that f is a contraction, provided that the inequality

Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}² < 1    (5.10.9)
holds. Therefore, Eq. (5.10.7) possesses a unique solution if (5.10.9) is satisfied.
Lastly, let us consider the complete metric space {Rⁿ; p_∞} = R_∞^n. We have

p_∞(y′, y″) = p_∞(f(x′), f(x″)) = max_i |Σ_{j=1}^{n} a_{ij}(ξ′_j − ξ″_j)| ≤ {max_i Σ_{j=1}^{n} |a_{ij}|} p_∞(x′, x″).

Thus, f is a contraction if

max_i Σ_{j=1}^{n} |a_{ij}| < 1.    (5.10.10)
Hence, if (5.10.10) holds, then Eq. (5.10.7) has a unique solution. In summary, if any one of the conditions (5.10.8), (5.10.9), or (5.10.10) holds, then Eq. (5.10.7) possesses a unique solution, namely x. This solution can be determined by the successive approximation

ξ_i^(k) = Σ_{j=1}^{n} a_{ij} ξ_j^(k−1) + β_i,    k = 1, 2, ...,    (5.10.11)

for all i = 1, ..., n, with starting point x^(0) = (ξ₁^(0), ..., ξ_n^(0)). ■
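A hedged sketch of the iteration (5.10.11) for a small system (our own illustration; the matrix, right-hand side, and helper name are ours). The matrix below satisfies the row-sum condition (5.10.10), max_i Σ_j |a_ij| = 0.7 < 1, so f is a contraction with respect to the maximum metric p_∞ and the iteration converges to the unique solution of x = Ax + b:

```python
def iterate_linear(a, b, x0, num_iters=200):
    """Successive approximations x^(k)_i = sum_j a_ij x^(k-1)_j + b_i."""
    n = len(b)
    x = list(x0)
    for _ in range(n and num_iters):
        x = [sum(a[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
    return x

# Row-sum condition (5.10.10): max_i sum_j |a_ij| = max(0.7, 0.45) = 0.7 < 1
a = [[0.4, -0.3],
     [0.2, 0.25]]
b = [1.0, -2.0]

x = iterate_linear(a, b, [0.0, 0.0])
# residual of x = Ax + b at the computed point
residual = max(abs(sum(a[i][j] * x[j] for j in range(2)) + b[i] - x[i])
               for i in range(2))
```

With contraction constant 0.7, the error shrinks geometrically, so 200 iterations drive the residual to roundoff level; the same starting point x^(0) = 0 works for any choice by Theorem 5.8.5.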
Next, let us consider an integral eq u ation. L e t, E e[a, b) and let (K s, I) be a real-valued function 3.10.12. Example. which is continuous on the sq u are a[ , b) X a[ , b). L e t 1 E R. We call
=
ex s)
+
tp{s)
1
s: (K s,
t)x(t)dt
(5.10.13)
a Fredholm nOD-homogeneous linear integral equation of the secODd kind. In this eq u ation x is the unknown, (K s, t) and, are specified, and 1 is regarded as an arbitrary parameter. We now show that for all III sufficiently small, Eq. (5.10.13) has a uniq u e solution which is continuous on a[ , b]. To this end, consider the complete metric space e{ ra, b]; p-l, and let y = f(x ) denote the mapping determined by
yes) Clearly sup I(K s,
a5;t5;b a5;,5;b
y E era, b]. t) I. Then
=
+ ). s: (K s,
,(s)
We thus have f:
era, b]
S 111M(b -
p_ ( f(x l ),! ( x , »
t)x(t)dt. -+
a)p_ ( x
era, b].
Now let M =
l , x , ).
Therefore, if we choose 1 so that 111=
<
M(b
~
a)
(5.10.14)
5.10. Applications
3rT
then f is a contraction mapping. From Theorem 5.8.5 it now follows that Eq. (5.10.13) possesses a unique solution x ∈ C[a, b] if (5.10.14) holds. Starting at x_0 ∈ C[a, b], successive approximations to this solution are given by

    x_n(s) = ψ(s) + λ ∫_a^b K(s, t) x_{n−1}(t) dt,  n = 1, 2, 3, ....    (5.10.15)

∎
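The successive approximations (5.10.15) can be simulated by discretizing the integral. The kernel K, the function ψ, and the value of λ below are illustrative choices (not from the text) satisfying (5.10.14) with M = 1:

```python
import math

# Picard iteration (5.10.15) for the Fredholm equation
#   x(s) = psi(s) + lam * integral_a^b K(s, t) x(t) dt,
# discretized with the midpoint rule.  K, psi, and lam are hypothetical.

a, b, n = 0.0, 1.0, 100
h = (b - a) / n
s = [a + (i + 0.5) * h for i in range(n)]

K = lambda u, t: math.cos(u * t)     # |K| <= M = 1 on the square
psi = lambda u: u
lam = 0.5                            # |lam| < 1/(M(b - a)) = 1, i.e. (5.10.14)

x = [0.0] * n                        # starting point x_0 = 0
for _ in range(40):
    x = [psi(s[i]) + lam * h * sum(K(s[i], s[j]) * x[j] for j in range(n))
         for i in range(n)]

# Residual of the discretized equation in the sup metric.
residual = max(abs(x[i] - (psi(s[i]) + lam * h *
                           sum(K(s[i], s[j]) * x[j] for j in range(n))))
               for i in range(n))
```

Each sweep shrinks the error by at least the contraction factor |λ|M(b − a) = 0.5, so forty sweeps drive the residual below floating-point noise.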
Next, we consider yet another type of integral equation.
5.10.16. Example. Let ψ ∈ C[a, b], let K(s, t) be a real continuous function on the triangle a ≤ t ≤ s ≤ b, and let λ ∈ R. We call

    x(s) = ψ(s) + λ ∫_a^s K(s, t) x(t) dt,  a ≤ s ≤ b,    (5.10.17)

a linear Volterra integral equation. Here x is the unknown, K(s, t) and ψ are specified, and λ is an arbitrary parameter. We now show that, for all λ, Eq. (5.10.17) possesses a unique continuous solution. We consider again the complete metric space {C[a, b]; ρ∞}, and we let y = f(x) be the mapping determined by

    y(s) = ψ(s) + λ ∫_a^s K(s, t) x(t) dt.
Since the right-hand side of this expression is continuous, it follows that f: C[a, b] → C[a, b]. Moreover, since K is continuous, there is an M such that |K(s, t)| ≤ M. Let y_1 = f(x_1), and let y_2 = f(x_2). As in the preceding example, we have
    ρ∞(f(x_1), f(x_2)) = ρ∞(y_1, y_2) ≤ |λ| M (b − a) ρ∞(x_1, x_2).

Now let f^(n) denote the composite mapping f ∘ f ∘ ... ∘ f (n times), and let f^(n)(x) = y^(n). A little bit of algebra yields

    ρ∞(f^(n)(x_1), f^(n)(x_2)) = ρ∞(y_1^(n), y_2^(n)) ≤ (1/n!) |λ|^n M^n (b − a)^n ρ∞(x_1, x_2).    (5.10.18)

However,

    (1/n!) |λ|^n M^n (b − a)^n → 0 as n → ∞.
Thus, for an arbitrary value of λ, n can be chosen so large that

    k ≜ (1/n!) |λ|^n M^n (b − a)^n < 1.

Hence, we have

    ρ∞(f^(n)(x_1), f^(n)(x_2)) ≤ k ρ∞(x_1, x_2),  0 ≤ k < 1.

Therefore, the composite mapping f^(n) is a contraction mapping. It follows from Corollary 5.8.8 that Eq. (5.10.17) possesses a unique continuous solution for arbitrary λ. This solution can be determined by the method of successive approximations. ∎
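The factorial in (5.10.18) is what makes the Volterra result global in λ. A quick check with illustrative numbers (λ, M, and b − a are hypothetical, not from the text) shows the contraction constants of the iterates eventually dropping below one:

```python
import math

# Contraction constants (1/n!) |lam|^n M^n (b - a)^n of the composite maps f^(n)
# from inequality (5.10.18).  The values of lam, M, a, b are illustrative.

lam, M, a, b = 10.0, 2.0, 0.0, 1.0
c = lambda n: (abs(lam) * M * (b - a)) ** n / math.factorial(n)

constants = [c(n) for n in range(1, 60)]
n_star = next(n for n in range(1, 60) if c(n) < 1.0)  # first contractive iterate
```

Even though f itself has constant 20 here, f^(n) is a contraction for all sufficiently large n, which is exactly what Corollary 5.8.8 requires.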
5.10.19. Exercise. Verify inequality (5.10.18).
Next we consider initial-value problems characterized by scalar ordinary differential equations.

5.10.20. Example. Consider the initial-value problem

    ẋ = f(t, x),  x(τ) = ξ,    (5.10.21)
discussed in Section 4.11. We would like to determine conditions for the existence and uniqueness of a solution φ(t) of (5.10.21) for τ ≤ t ≤ T. Let k > 0, and assume that f satisfies the condition

    |f(t, x_1) − f(t, x_2)| ≤ k |x_1 − x_2|

for all t ∈ [τ, T] and for all x_1, x_2 ∈ R. In this case we say that f satisfies a Lipschitz condition in x, and we call k a Lipschitz constant. As was pointed out in Section 4.11, Eq. (5.10.21) is equivalent to the integral equation
    φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.    (5.10.22)

Consider now the complete metric space {C[τ, T]; ρ∞}, and let F(φ) be the mapping determined by

    F(φ)(t) = ξ + ∫_τ^t f(s, φ(s)) ds,  τ ≤ t ≤ T.

Then clearly F: C[τ, T] → C[τ, T]. Now

    ρ∞(F(φ_1), F(φ_2)) = sup_{τ≤t≤T} | ∫_τ^t [f(s, φ_1(s)) − f(s, φ_2(s))] ds |
      ≤ sup_{τ≤t≤T} ∫_τ^t k |φ_1(s) − φ_2(s)| ds ≤ k(T − τ) ρ∞(φ_1, φ_2).
Thus, F is a contraction if k < 1/(T − τ). Next, let F^(n) denote the composite mapping F ∘ F ∘ ... ∘ F. Similarly as in (5.10.18), the reader can verify that

    ρ∞(F^(n)(φ_1), F^(n)(φ_2)) ≤ (1/n!) k^n (T − τ)^n ρ∞(φ_1, φ_2).    (5.10.23)

Since (1/n!) k^n (T − τ)^n → 0 as n → ∞, it follows that for sufficiently large n,

    (1/n!) k^n (T − τ)^n < 1.

Therefore, F^(n) is a contraction. It now follows from Corollary 5.8.8 that Eq. (5.10.21) possesses a unique solution on [τ, T]. Furthermore, this solution can be obtained by the method of successive approximations. ∎
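The successive approximations generated by F can be computed once the integral is discretized. The following sketch runs the Picard iteration for the illustrative problem ẋ = x, x(0) = 1 (not an example from the text), whose solution is e^t:

```python
import math

# Picard iteration phi_{m+1} = F(phi_m), where
#   F(phi)(t) = xi + integral_tau^t f(s, phi(s)) ds,
# with the integral evaluated by the trapezoid rule.  Problem data are illustrative.

tau, T, xi, n = 0.0, 1.0, 1.0, 400
h = (T - tau) / n
t = [tau + i * h for i in range(n + 1)]
f = lambda s, x: x                   # Lipschitz in x with constant k = 1

phi = [xi] * (n + 1)                 # starting guess phi_0(t) = xi
for _ in range(30):
    vals = [f(t[i], phi[i]) for i in range(n + 1)]
    new, acc = [xi], 0.0
    for i in range(n):
        acc += 0.5 * h * (vals[i] + vals[i + 1])
        new.append(xi + acc)
    phi = new

err = max(abs(phi[i] - math.exp(t[i])) for i in range(n + 1))
```

The remaining error is dominated by the trapezoid rule, not by the iteration: the m-th Picard iterate already matches the exponential series through degree m.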
5.10.24. Exercise. Generalize Example 5.10.20 to the initial-value problem

    ẋ_i = f_i(t, x_1, ..., x_n),  x_i(τ) = ξ_i,  i = 1, ..., n,

which is discussed in Section 4.11B.
B. Further Applications to Ordinary Differential Equations
At the end of Section 4.11A we raised the following questions: (i) When does an initial-value problem possess solutions? (ii) When are these solutions unique? (iii) What is the extent of the interval over which such solutions exist? (iv) Are these solutions continuously dependent on initial conditions? In Example 5.10.20 we have already given a partial answer to the first two questions. In the remainder of the present section we refine the type of result given in Example 5.10.20, and we give an answer to the remaining items raised above.

As in the beginning of Section 4.11A, we call R^2 the (t, x) plane, we let D ⊂ R^2 denote an open connected set (i.e., D is a domain), we assume that f is a real-valued function which is defined and continuous on D, we call T = (t_1, t_2) ⊂ R a t interval, and we let φ denote a solution of the differential equation

    ẋ = f(t, x).    (5.10.25)

The reader should refer to Section 4.11A for the definition of solution φ. We first concern ourselves with the initial-value problem

    ẋ = f(t, x),  x(τ) = ξ,    (5.10.26)

characterized in Definition 4.11.3. Our first result is concerned with existence of solutions of this problem. It is convenient to establish this result in two stages, using the notion of ε-approximate solution of Eq. (5.10.25).

5.10.27. Definition. A function φ defined and continuous on a t interval T is called an ε-approximate solution of Eq. (5.10.25) on T if

(i) (t, φ(t)) ∈ D for all t ∈ T;
(ii) φ has a continuous derivative on T except possibly on a finite set S of points in T, where jump discontinuities are allowed; and
(iii) |φ̇(t) − f(t, φ(t))| ≤ ε for all t ∈ T − S.

If S is not empty, φ is said to have piecewise continuous derivatives on T. We now prove:

5.10.28. Theorem. In Eq. (5.10.25), let f be continuous on the rectangle

    D_0 = {(t, x): |t − τ| ≤ a, |x − ξ| ≤ b}.
Given any ε > 0, there exists an ε-approximate solution φ of Eq. (5.10.25) on an interval |t − τ| ≤ α ≤ a such that φ(τ) = ξ.

5.10.29. Figure L. Construction of an ε-approximate solution. [The figure shows the rectangle D_0 centered at (τ, ξ) and the triangular region bounded by the lines of slope ±M through (τ, ξ), which reach the horizontal boundary at τ ± b/M; α = min(a, b/M).]

Proof. Let

    M = max_{(t,x)∈D_0} |f(t, x)|,

and let α = min(a, b/M). Note that α = a if a < b/M and α = b/M if a > b/M (refer to Figure L). We will show that an ε-approximate solution exists on the interval [τ, τ + α]. The proof is similar for the interval [τ − α, τ]. In our proof we will construct an ε-approximate solution starting at (τ, ξ), consisting of a finite number of straight line segments joined end to end (see Figure L). Since f is continuous on the compact set D_0, it is uniformly continuous on D_0 (see Theorem 5.7.12). Hence, given ε > 0, there exists δ = δ(ε) > 0 such that |f(t, x) − f(t', x')| < ε whenever (t, x), (t', x') ∈ D_0, |t − t'| ≤ δ and |x − x'| ≤ δ. Now let τ = t_0 and τ + α = t_n. We divide the half-open interval (t_0, t_n] into n half-open subintervals (t_0, t_1], (t_1, t_2], ..., (t_{n−1}, t_n] in such a fashion that

    max_i |t_i − t_{i−1}| ≤ min(δ, δ/M).    (5.10.30)
Next, we construct a polygonal path consisting of n straight lines joined end to end, starting at the point (τ, ξ) ≜ (t_0, ξ_0) and having slopes equal to m_{i−1} = f(t_{i−1}, ξ_{i−1}) over the intervals (t_{i−1}, t_i], i = 1, ..., n, respectively, where ξ_i = ξ_{i−1} + m_{i−1} |t_i − t_{i−1}|. A typical polygonal path is shown in Figure L. Note that the graph of this path is confined to the triangular region in Figure L. Let us denote the polygonal path constructed in this way by φ. Note that φ is continuous on the interval [τ, τ + α], that φ is a piecewise linear function, and that φ is piecewise continuously differentiable. Indeed, we have φ(τ) = ξ_0 = ξ and

    φ(t) = φ(t_{i−1}) + f(t_{i−1}, φ(t_{i−1}))(t − t_{i−1}),  t_{i−1} < t ≤ t_i,  i = 1, ..., n.    (5.10.31)
Also note that

    |φ(t) − φ(t')| ≤ M |t − t'|    (5.10.32)

for all t, t' ∈ [τ, τ + α]. We now show that φ is an ε-approximate solution. Let t ∈ (t_{i−1}, t_i]. Then it follows from (5.10.30) and (5.10.32) that |φ(t) − φ(t_{i−1})| ≤ δ. Now since |f(t, x) − f(t', x')| < ε whenever (t, x), (t', x') ∈ D_0, |t − t'| ≤ δ, and |x − x'| ≤ δ, it follows from Eq. (5.10.31) that

    |φ̇(t) − f(t, φ(t))| = |f(t_{i−1}, φ(t_{i−1})) − f(t, φ(t))| < ε.

Therefore, the function φ is an ε-approximate solution on the interval |t − τ| ≤ α ≤ a. ∎
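The polygonal construction in this proof is exactly the Euler method. A sketch for the illustrative problem ẋ = t + x, x(0) = 1 (not an example from the text) shows the polygon tracking the exact solution 2e^t − t − 1 as the mesh shrinks:

```python
import math

# Euler polygon (5.10.31): on each subinterval the slope is frozen at
# f(t_{i-1}, xi_{i-1}).  The right-hand side and interval are illustrative.

f = lambda t, x: t + x
tau, alpha, n = 0.0, 1.0, 1000
h = alpha / n

ts, xs = [tau], [1.0]
for _ in range(n):         # xi_i = xi_{i-1} + f(t_{i-1}, xi_{i-1}) (t_i - t_{i-1})
    xs.append(xs[-1] + f(ts[-1], xs[-1]) * h)
    ts.append(ts[-1] + h)

exact = lambda t: 2.0 * math.exp(t) - t - 1.0
err = max(abs(xs[i] - exact(ts[i])) for i in range(n + 1))
```

Refining the partition makes the polygon an ε-approximate solution for arbitrarily small ε, which is the content of Theorem 5.10.28.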
We are now in a position to establish conditions for the existence of solutions of the initial-value problem (5.10.26).

5.10.33. Theorem. In Eq. (5.10.25), let f be continuous on the rectangle D_0 = {(t, x): |t − τ| ≤ a, |x − ξ| ≤ b}. Then the initial-value problem (5.10.26) has a solution on some t interval given by |t − τ| ≤ α ≤ a.
Proof. Let {ε_n}, n = 1, 2, ..., be a monotone decreasing sequence of positive numbers tending to zero (i.e., ε_n > 0, ε_{n+1} < ε_n, and lim_{n→∞} ε_n = 0). By Theorem 5.10.28, there exists for every ε_n an ε_n-approximate solution of Eq. (5.10.25), call it φ_n, on some interval |t − τ| ≤ α such that φ_n(τ) = ξ. Now for each φ_n it is true, by construction of φ_n, that

    |φ_n(t) − φ_n(t')| ≤ M |t − t'|.    (5.10.34)

This shows that {φ_n} is an equicontinuous set of functions (see Definition 5.8.11). Letting t' = τ in (5.10.34), we have |φ_n(t) − ξ| ≤ M |t − τ| ≤ Mα, and thus |φ_n(t)| ≤ |ξ| + Mα for all n and for all t ∈ [τ − α, τ + α]. Thus, the sequence {φ_n} is uniformly bounded. In view of the Ascoli lemma (see Corollary 5.8.13) there exists a subsequence {φ_{n_k}}, k = 1, 2, ..., of the sequence {φ_n} which converges uniformly on the interval [τ − α, τ + α] to a limit function φ.
This function is continuous (see Theorem 5.7.14) and, in addition,

    |φ(t) − φ(t')| ≤ M |t − t'|.
To complete the proof, we must show that φ is a solution of Eq. (5.10.26) or, equivalently, that φ satisfies the integral equation

    φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.    (5.10.35)
Let φ_n be an ε_n-approximate solution, let Δ_n(t) = φ̇_n(t) − f(t, φ_n(t)) at the points where φ_n is differentiable, and let Δ_n(t) = 0 at those points where φ_n is not differentiable. Then φ_n can be expressed in integral form as

    φ_n(t) = ξ + ∫_τ^t [f(s, φ_n(s)) + Δ_n(s)] ds.    (5.10.36)
Since φ_{n_k} is an ε_{n_k}-approximate solution, we have |Δ_{n_k}(t)| ≤ ε_{n_k}. Also, since f is uniformly continuous on D_0 and since φ_{n_k} → φ uniformly on [τ − α, τ + α] as k → ∞, it follows that |f(t, φ_{n_k}(t)) − f(t, φ(t))| ≤ ε_{n_k}' on the interval [τ − α, τ + α] whenever k is so large that |φ_{n_k}(t) − φ(t)| ≤ δ on [τ − α, τ + α], where ε_{n_k}' → 0 as k → ∞. Using Eq. (5.10.36) we now have

    | ∫_τ^t [f(s, φ_{n_k}(s)) + Δ_{n_k}(s)] ds − ∫_τ^t f(s, φ(s)) ds |
      ≤ ∫_τ^t |f(s, φ_{n_k}(s)) − f(s, φ(s))| ds + ∫_τ^t |Δ_{n_k}(s)| ds ≤ α(ε_{n_k}' + ε_{n_k}).

Therefore,

    ξ + ∫_τ^t [f(s, φ_{n_k}(s)) + Δ_{n_k}(s)] ds → ξ + ∫_τ^t f(s, φ(s)) ds.

It now follows that

    φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds,

which completes the proof. ∎
Using Theorem 5.10.33, the reader can readily prove the next result.

5.10.37. Corollary. In Eq. (5.10.25), let f be continuous on a domain D of the (t, x) plane, and let (τ, ξ) ∈ D. Then the initial-value problem (5.10.26) has a solution φ on some t interval containing τ.

5.10.38. Exercise. Prove Corollary 5.10.37.
Theorem 5.10.33 (along with Corollary 5.10.37) is known in the literature as the Cauchy-Peano existence theorem. Note that in these results the solution φ is not guaranteed to be unique. Next, we seek conditions under which uniqueness of solutions is assured. We require the following preliminary result, called the Gronwall inequality.

5.10.39. Theorem. Let r and k be real continuous functions on an interval [a, b]. Suppose r(t) ≥ 0 and k(t) ≥ 0 for all t ∈ [a, b], and let δ ≥ 0 be a given non-negative constant. If

    r(t) ≤ δ + ∫_a^t k(s) r(s) ds    (5.10.40)

for all t ∈ [a, b], then

    r(t) ≤ δ e^{∫_a^t k(s) ds}    (5.10.41)

for all t ∈ [a, b].

Proof. Let

    R(t) = δ + ∫_a^t k(s) r(s) ds.

Then r(t) ≤ R(t), R(a) = δ, and

    Ṙ(t) = k(t) r(t) ≤ k(t) R(t)    (5.10.42)

for all t ∈ [a, b]. Let K(t) = e^{−∫_a^t k(s) ds}. Then K̇(t) = −k(t) e^{−∫_a^t k(s) ds} = −K(t) k(t). Multiplying both sides of (5.10.42) by K(t), we have

    K(t) Ṙ(t) ≤ K(t) k(t) R(t)

or

    K(t) Ṙ(t) + K̇(t) R(t) ≤ 0

or

    (d/dt)[K(t) R(t)] ≤ 0.

Integrating this last expression from a to t, we obtain K(t) R(t) − K(a) R(a) ≤ 0, or K(t) R(t) − δ ≤ 0, or

    r(t) ≤ R(t) ≤ δ e^{∫_a^t k(s) ds},

which is the desired inequality. ∎
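A discrete version of the argument above can be checked numerically. With k and δ chosen purely for illustration, generating r from the integral recursion with equality makes the Gronwall bound hold at every grid point:

```python
import math

# Discrete check of the Gronwall inequality: r(t) <= delta + int_a^t k(s) r(s) ds
# implies r(t) <= delta * exp(int_a^t k(s) ds).  k and delta are illustrative.

a0, b0, n = 0.0, 1.0, 2000
h = (b0 - a0) / n
k = lambda t: 1.0 + 0.5 * math.sin(t)
delta = 0.3

r, K_int, acc = [delta], [0.0], 0.0
for i in range(n):
    t = a0 + i * h
    acc += k(t) * r[-1] * h          # left Riemann sum for int_a^t k r ds
    r.append(delta + acc)
    K_int.append(K_int[-1] + k(t) * h)

bound = [delta * math.exp(Ki) for Ki in K_int]
violations = sum(1 for ri, bi in zip(r, bound) if ri > bi + 1e-9)
```

Discretely, r advances by the factor (1 + k h) per step while the bound advances by e^{k h} ≥ 1 + k h, so no grid point can violate the inequality.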
In our next result we will require that the function f in Eq. (5.10.25) satisfy a Lipschitz condition

    |f(t, x') − f(t, x'')| ≤ k |x' − x''|

for all (t, x'), (t, x'') ∈ D.
5.10.43. Theorem. In Eq. (5.10.25), let f be continuous on a domain D of the (t, x) plane, and let f satisfy a Lipschitz condition with respect to x on D. Let (τ, ξ) ∈ D. Then the initial-value problem (5.10.26) has a unique solution on some t interval containing τ (i.e., if φ_1 and φ_2 are two solutions of Eq. (5.10.25) on an interval (a, b), if τ ∈ (a, b), and if φ_1(τ) = φ_2(τ) = ξ, then φ_1 = φ_2).

Proof. By Corollary 5.10.37, at least one solution exists on some interval (a, b), τ ∈ (a, b). Now suppose there is more than one solution, say φ_1 and φ_2, to the initial-value problem (5.10.26). Then

    φ_i(t) = ξ + ∫_τ^t f(s, φ_i(s)) ds,  i = 1, 2,

for all t ∈ (a, b), and

    φ_1(t) − φ_2(t) = ∫_τ^t [f(s, φ_1(s)) − f(s, φ_2(s))] ds.
Let r(t) = |φ_1(t) − φ_2(t)|, and let k > 0 denote the Lipschitz constant for f. In the following we consider the case when t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

    r(t) ≤ ∫_τ^t |f(s, φ_1(s)) − f(s, φ_2(s))| ds ≤ ∫_τ^t k |φ_1(s) − φ_2(s)| ds = ∫_τ^t k r(s) ds;

i.e.,

    r(t) ≤ δ + ∫_τ^t k r(s) ds

for all t ∈ [τ, b), where δ = 0. The conditions of Theorem 5.10.39 are clearly satisfied, and we have: if r(t) ≤ δ + ∫_τ^t k r(s) ds, then r(t) ≤ δ e^{∫_τ^t k ds}. Since in the present case δ = 0, it follows that r(t) = 0 for all t ∈ [τ, b). Therefore, |φ_1(t) − φ_2(t)| = 0 for all t ∈ [τ, b), and φ_1(t) = φ_2(t) for all t in this interval. ∎
Now suppose that in Eq. (5.10.25) f is continuous on some domain D of the (t, x) plane, and assume that f is bounded on D; i.e., suppose there exists a constant M > 0 such that

    sup_{(t,x)∈D} |f(t, x)| ≤ M.

Also, assume that τ ∈ (a, b), that (τ, ξ) ∈ D, and that the initial-value problem (5.10.26) has a solution φ on a t interval (a, b) such that (t, φ(t)) ∈ D for all t ∈ (a, b). Then the limits

    lim_{t→a+} φ(t) = φ(a+)  and  lim_{t→b−} φ(t) = φ(b−)

exist. To prove this, let t ∈ (a, b). Then

    φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.
If a < t_1 < t_2 < b, then

    |φ(t_1) − φ(t_2)| ≤ ∫_{t_1}^{t_2} |f(s, φ(s))| ds ≤ M |t_1 − t_2|.
Now let t_1 → b and t_2 → b. Then |t_1 − t_2| → 0, and therefore |φ(t_1) − φ(t_2)| → 0. This limiting process thus yields a convergent Cauchy sequence; i.e., φ(b−) exists. The existence of φ(a+) is similarly established.

Next, let us assume that the points (a, φ(a+)) and (b, φ(b−)) are in the domain D. We now show that the solution φ can be continued to the right of t = b. An identical procedure can be used to show that the solution φ can be continued to the left of t = a. We define a function
    φ̂(t) = φ(t) for t ∈ (a, b),  φ̂(b) = φ(b−).

Then

    φ̂(t) = ξ + ∫_τ^t f(s, φ̂(s)) ds

for all t ∈ (a, b]. Thus, the derivative of φ̂(t) exists on the interval (a, b), and the left-hand derivative of φ̂ at t = b is given by

    φ̂'(b−) = f(b, φ̂(b)).

Next, we consider the initial-value problem
    ẋ = f(t, x),  x(b) = φ(b−).
By Corollary 5.10.37, the differential equation ẋ = f(t, x) has a solution ψ which passes through the point (b, φ(b−)) and which exists on some interval [b, b + β], β > 0. Now let

    φ̃(t) = φ̂(t) for t ∈ (a, b],  φ̃(t) = ψ(t) for t ∈ [b, b + β].
To show that φ̃ is a solution of the differential equation on the interval (a, b + β], with φ̃(τ) = ξ, we must show that φ̃ is continuous at t = b. Since

    ψ(t) = φ(b−) + ∫_b^t f(s, ψ(s)) ds

and since

    φ̂(t) = ξ + ∫_τ^t f(s, φ̂(s)) ds,

we have

    φ̃(t) = ξ + ∫_τ^t f(s, φ̃(s)) ds

for all t ∈ (a, b + β]. The continuity of φ̃ in the last equation implies the continuity of f(s, φ̃(s)).
Differentiating the last equation, we have

    φ̃'(t) = f(t, φ̃(t))

for all t ∈ (a, b + β]. We call φ̃ a continuation of the solution φ to the interval (a, b + β]. If f satisfies a Lipschitz condition on D with respect to x, then φ̃ is unique, and we call φ̃ the continuation of φ to the interval (a, b + β]. We can repeat the above procedure of continuing solutions until the boundary of D is reached. Now let the domain D be, in particular, a rectangle, as shown in Figure M. It is important to notice that, in general, we cannot continue solutions over the entire t interval T shown in this figure.
5.10.44. Figure M. Continuation of a solution to the boundary of domain D. [The figure shows the rectangle D = {(t, x): T_1 < t < T_2, ξ_1 < x < ξ_2} and the t interval T = (T_1, T_2).]
We summarize the above discussion in the following:

5.10.45. Theorem. In Eq. (5.10.25), let f be continuous and bounded on a domain D of the (t, x) plane, and let (τ, ξ) ∈ D. Then all solutions of the initial-value problem (5.10.26) can be continued to the boundary of D.

We can readily extend Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45 to initial-value problems characterized by systems of n first-order ordinary differential equations, as given in Definition 4.11.9 and Eq. (4.11.11). In doing so we replace D ⊂ R^2 by D ⊂ R^{n+1}, x ∈ R by x ∈ R^n, f: D → R by f: D → R^n, the absolute value |x| by the quantity

    |x| = Σ_{i=1}^n |x_i|,    (5.10.46)

and the metric ρ(x, y) = |x − y| on R by the metric

    ρ(x, y) = Σ_{i=1}^n |x_i − y_i|

on R^n. (The reader can readily verify that the function given in Eq. (5.10.46) satisfies the axioms of a norm (see Theorem 4.9.31).) The definition of ε-approximate solution for the differential equation ẋ = f(t, x) is identical to that given in Definition 5.10.27, save that scalars are replaced by vectors (e.g., the scalar function φ is replaced by the n-vector valued function φ).
Also, the modifications involved in defining a Lipschitz condition for f(t, x) on D ⊂ R^{n+1} are obvious.

5.10.47. Exercise. For the ordinary differential equation

    ẋ = f(t, x)    (5.10.48)

and for the initial-value problem

    ẋ = f(t, x),  x(τ) = ξ,    (5.10.49)
characterized in Eq. (4.11.7) and Definition 4.11.9, respectively, state and prove results for existence, uniqueness, and continuation of solutions which are analogous to Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45.

In connection with Theorem 5.10.45 we noted that the solutions of initial-value problems described by non-linear ordinary differential equations can, in general, not be extended to the entire t interval T depicted in Figure M. We now show that in the case of initial-value problems characterized by linear ordinary differential equations it is possible to extend solutions to the entire interval T. First, we need some preliminary results. Let

    D = {(t, x): a < t < b, x ∈ R^n},    (5.10.50)

where the function |·| is defined in Eq. (5.10.46). Consider the set of linear equations

    ẋ_i = Σ_{j=1}^n a_ij(t) x_j ≜ f_i(t, x),  i = 1, ..., n,    (5.10.51)
where the a_ij(t), i, j = 1, ..., n, are assumed to be real and continuous functions defined on the interval [a, b]. We first show that f(t, x) = [f_1(t, x), ..., f_n(t, x)]^T satisfies a Lipschitz condition on D,

    |f(t, x') − f(t, x'')| ≤ k |x' − x''|

for all (t, x'), (t, x'') ∈ D, where x' = (x_1', ..., x_n')^T, x'' = (x_1'', ..., x_n'')^T, and

    k = max_{a≤t≤b} Σ_{i=1}^n Σ_{j=1}^n |a_ij(t)|.

Indeed, we have

    |f(t, x') − f(t, x'')| = Σ_{i=1}^n |f_i(t, x') − f_i(t, x'')|
      = Σ_{i=1}^n | Σ_{j=1}^n a_ij(t) x_j' − Σ_{j=1}^n a_ij(t) x_j'' |
      = Σ_{i=1}^n | Σ_{j=1}^n a_ij(t)(x_j' − x_j'') |
      ≤ Σ_{i=1}^n Σ_{j=1}^n |a_ij(t)| |x_j' − x_j''|
      ≤ k Σ_{j=1}^n |x_j' − x_j''| = k |x' − x''|.
Next, we prove the following:

5.10.52. Lemma. In Eq. (5.10.48), let f(t, x) = (f_1(t, x), ..., f_n(t, x))^T be continuous on a domain D ⊂ R^{n+1}, and let f(t, x) satisfy a Lipschitz condition on D with respect to x, with Lipschitz constant k. If φ_1 and φ_2 are unique solutions of the initial-value problem (5.10.49), with φ_1(τ) = ξ_1, φ_2(τ) = ξ_2, and with (τ, ξ_1), (τ, ξ_2) ∈ D, then

    |φ_1(t) − φ_2(t)| ≤ |ξ_1 − ξ_2| e^{k|t−τ|}    (5.10.53)

for all (t, φ_1(t)), (t, φ_2(t)) ∈ D.

Proof. We assume that t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

    φ_1(t) = ξ_1 + ∫_τ^t f(s, φ_1(s)) ds,
    φ_2(t) = ξ_2 + ∫_τ^t f(s, φ_2(s)) ds,

and

    |φ_1(t) − φ_2(t)| ≤ |ξ_1 − ξ_2| + k ∫_τ^t |φ_1(s) − φ_2(s)| ds.    (5.10.54)

Applying Theorem 5.10.39 to inequality (5.10.54) yields the desired inequality (5.10.53). ∎
We are now in a position to prove the following important result for systems of linear ordinary differential equations.

5.10.55. Theorem. Let D ⊂ R^{n+1} be given by Eq. (5.10.50), and let the real functions a_ij(t), i, j = 1, ..., n, be continuous on the t interval [a, b]. Then there exists a unique solution to the initial-value problem

    ẋ_i = Σ_{j=1}^n a_ij(t) x_j ≜ f_i(t, x),  x_i(τ) = ξ_i,  i = 1, ..., n,    (5.10.56)

with (τ, ξ_1, ..., ξ_n) ∈ D. This solution can be extended to the entire interval [a, b].

Proof.
Since the vector f(t, x) = (f_1(t, x), ..., f_n(t, x))^T is continuous on D, since f(t, x) satisfies a Lipschitz condition with respect to x on D, and since (τ, ξ) ∈ D (where ξ = (ξ_1, ..., ξ_n)^T), it follows from Theorem 5.10.43 (interpreted for systems of first-order ordinary differential equations) that the initial-value problem (5.10.56) has a unique solution ψ through the point
(τ, ξ) over some interval [c, d] ⊂ [a, b]. We must show that ψ can be continued to a unique solution φ over the entire interval [a, b]. Let φ̃ be any solution of Eq. (5.10.56) through (τ, ξ) which exists on some subinterval of [a, b]. Applying Lemma 5.10.52 to φ_1 = φ̃ and φ_2 = 0, we have

    |φ̃(t)| ≤ |ξ| e^{k|t−τ|}    (5.10.57)

for all t in the domain of definition of φ̃. For purposes of contradiction, suppose that ψ does not have a continuation to [a, b], and assume that ψ has a continuation φ̃ existing up to t' < b which cannot be continued beyond t'. But inequality (5.10.57) implies that the path (t, φ̃(t)) remains inside a closed bounded subset of D. It follows from Theorem 5.10.45, interpreted for systems of first-order ordinary differential equations, that φ̃ may be continued beyond t'. We thus have arrived at a contradiction, which proves that a continuation φ of ψ exists on the entire interval [a, b]. This continuation is unique because f(t, x) satisfies a Lipschitz condition with respect to x on D. ∎

5.10.58. Exercise. In Theorem 5.10.55, let a_ij(t), i, j = 1, ..., n, be continuous on the open interval (−∞, ∞). Show that the initial-value problem (5.10.56) possesses unique solutions for every (τ, ξ) ∈ R^{n+1}, which can be extended to the t interval (−∞, ∞).
5.10.59. Exercise. Let D ⊂ R^{n+1} be given by Eq. (5.10.50), and let the real functions a_ij(t), v_i(t), i, j = 1, ..., n, be continuous on the t interval [a, b]. Show that there exists a unique solution to the initial-value problem

    ẋ_i = Σ_{j=1}^n a_ij(t) x_j + v_i(t),  x_i(τ) = ξ_i,  i = 1, ..., n,    (5.10.60)

with (τ, ξ_1, ..., ξ_n) ∈ D. Show that this solution can be extended to the entire interval [a, b].
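The contrast behind Theorem 5.10.55 can also be seen numerically: a linear equation extends over any interval, whereas a non-linear one may escape to infinity in finite time. The equations and step counts below are illustrative (ẋ = x² with x(0) = 1 blows up at t = 1):

```python
# Euler stepping applied to a linear and a non-linear illustrative problem.
def euler(f, x0, t0, t1, n):
    h = (t1 - t0) / n
    x, t = x0, t0
    for _ in range(n):
        x += h * f(t, x)
        t += h
    return x

linear_val = euler(lambda t, x: x, 1.0, 0.0, 5.0, 20000)          # approx e**5
nonlinear_val = euler(lambda t, x: x * x, 1.0, 0.0, 0.999, 20000) # approx 1/(1 - t)
```

The linear solution stays near e^5 ≈ 148.4 on all of [0, 5], while the solution of ẋ = x², namely 1/(1 − t), already exceeds several hundred just before t = 1 and cannot be continued to t = 1.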
It is possible to relax the conditions on v_i(t), i = 1, ..., n, in the above exercise considerably. For example, it can be shown that if v_i(t) is piecewise continuous on [a, b], then the assertions of Exercise 5.10.59 still hold.

We now address ourselves to the last item of the present section. Consider the initial-value problem (5.10.49), which we characterized in Definition 4.11.9. Assume that f(t, x) satisfies a Lipschitz condition on a domain D ⊂ R^{n+1} and that (τ, ξ) ∈ D. Then the initial-value problem possesses a unique solution φ over some t interval containing τ. To indicate the dependence of φ on the initial point (τ, ξ), we write

    φ(t; τ, ξ),

where φ(τ; τ, ξ) = ξ. We now ask: What are the effects of different initial conditions on the solution of Eq. (5.10.48)? Our next result provides the answer.

5.10.61. Theorem. In Eq. (5.10.49), let f(t, x) satisfy a Lipschitz condition with respect to x on D ⊂ R^{n+1}. Let (τ, ξ) ∈ D. Then the unique solution φ(t; τ, ξ) of Eq. (5.10.49), existing on some bounded t interval containing τ, depends continuously on ξ on any such bounded interval. (This means that if ξ_n → ξ, then φ(t; τ, ξ_n) → φ(t; τ, ξ).)

Proof.
We have

    φ(t; τ, ξ_n) = ξ_n + ∫_τ^t f[s, φ(s; τ, ξ_n)] ds

and

    φ(t; τ, ξ) = ξ + ∫_τ^t f[s, φ(s; τ, ξ)] ds.

It follows that for t ≥ τ (the proof for t < τ is left as an exercise),

    |φ(t; τ, ξ_n) − φ(t; τ, ξ)| ≤ |ξ_n − ξ| + ∫_τ^t |f[s, φ(s; τ, ξ_n)] − f[s, φ(s; τ, ξ)]| ds
      ≤ |ξ_n − ξ| + k ∫_τ^t |φ(s; τ, ξ_n) − φ(s; τ, ξ)| ds,

where k denotes a Lipschitz constant for f(t, x). Using Theorem 5.10.39, we obtain

    |φ(t; τ, ξ_n) − φ(t; τ, ξ)| ≤ |ξ_n − ξ| e^{∫_τ^t k ds} = |ξ_n − ξ| e^{k(t−τ)}.

Thus, if ξ_n → ξ, then φ(t; τ, ξ_n) → φ(t; τ, ξ). ∎

It follows from the proof of the above theorem that the convergence is uniform with respect to t on any interval [a, b] on which the solutions are defined.
5.10.62. Example. The initial-value problem

    ẋ = 2x,  x(τ) = ξ,    (5.10.63)

where −∞ < τ < ∞ and −∞ < ξ < ∞, has the unique solution

    φ(t; τ, ξ) = ξ e^{2(t−τ)},  −∞ < t < ∞,

which depends continuously on the initial value ξ. ∎
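The bound obtained in the proof of Theorem 5.10.61 can be checked directly on Example 5.10.62, since φ(t; τ, ξ) = ξe^{2(t−τ)} is known in closed form (the grid and perturbation size below are illustrative):

```python
import math

# Dependence estimate |phi(t; tau, xi_n) - phi(t; tau, xi)| <= |xi_n - xi| e^{k(t - tau)}
# for x' = 2x, whose Lipschitz constant is k = 2.

k, tau = 2.0, 0.0
phi = lambda t, xi: xi * math.exp(2.0 * (t - tau))

xi, xi_n = 1.0, 1.001
ts = [i * 0.01 for i in range(101)]          # grid on [0, 1]
gaps = [abs(phi(t, xi_n) - phi(t, xi)) for t in ts]
bounds = [abs(xi_n - xi) * math.exp(k * (t - tau)) for t in ts]
ok = all(g <= b + 1e-12 for g, b in zip(gaps, bounds))
```

For this linear example the estimate is attained with equality, and the gap shrinks uniformly on [0, 1] as ξ_n → ξ.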
Thus far, in the present section, we have concerned ourselves with problems characterized by real ordinary differential equations. It is an easy matter to verify that all the existence, uniqueness, continuation, and dependence (on initial conditions) results proved in the present section are also valid for initial-value problems described by complex ordinary differential equations such as those given, e.g., in Eq. (4.11.25). In this case, the norm of a complex vector z = (z_1, ..., z_n)^T, z_k = u_k + i v_k, k = 1, ..., n, is given by

    |z| = Σ_{k=1}^n |z_k|,

where |z_k| = (u_k^2 + v_k^2)^{1/2}. The metric on C^n is in this case given by ρ(z_1, z_2) = |z_1 − z_2|.

5.11. REFERENCES AND NOTES
There are numerous excellent texts on metric spaces. Books which are especially readable include Copson [5.2], Gleason [5.3], Goldstein and Rosenbaum [5.4], Kantorovich and Akilov [5.5], Kolmogorov and Fomin [5.7], Naylor and Sell [5.8], and Royden [5.9]. Reference [5.8] includes some applications. The book by Kelley [5.6] is a standard reference on topology. An excellent reference on ordinary differential equations is the book by Coddington and Levinson [5.1].
REFERENCES

[5.1] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[5.2] E. T. COPSON, Metric Spaces. Cambridge, England: Cambridge University Press, 1968.
[5.3] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[5.4] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[5.5] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[5.6] J. KELLEY, General Topology. Princeton, N.J.: D. Van Nostrand Company, Inc., 1955.
[5.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vol. I. Albany, N.Y.: Graylock Press, 1957.
[5.8] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[5.9] H. L. ROYDEN, Real Analysis. New York: The Macmillan Company, 1965.
[5.10] A. E. TAYLOR, General Theory of Functions and Integration. New York: Blaisdell Publishing Company, 1965.
6
NORMED SPACES AND INNER PRODUCT SPACES
In Chapters 2-4 we concerned ourselves primarily with algebraic aspects of certain mathematical systems, while in Chapter 5 we addressed ourselves to topological properties of some mathematical systems. The stage is now set to combine topological and algebraic structures. In doing so, we arrive at linear topological spaces, namely normed linear spaces and inner product spaces, in general, and Banach spaces and Hilbert spaces, in particular. The properties of such spaces are the topic of the present chapter. In the next chapter we will study linear transformations defined on Banach and Hilbert spaces. The material of the present chapter and the next chapter constitutes part of a branch of mathematics called functional analysis.

Since normed linear spaces and inner product spaces are vector spaces as well as metric spaces, the results of Chapters 3 and 5 are applicable to the spaces considered in this chapter. Furthermore, since the Euclidean spaces considered in Chapter 4 are important examples of normed linear spaces and inner product spaces, the reader may find it useful to refer to Section 4.9 for proper motivation of the material to follow.

The present chapter consists of 16 sections. In the first 10 sections we consider some of the important general properties of normed linear spaces and Banach spaces. In sections 11 through 14 we examine some of the important general characteristics of inner product spaces and Hilbert spaces. (Inner product spaces are special types of normed linear spaces; Hilbert spaces are special cases of Banach spaces; Banach spaces are special kinds of normed linear spaces; and Hilbert spaces are special types of inner product spaces.) In Section 15, we consider two applications. This chapter is concluded with a brief discussion of pertinent references in the last section.
6.1. NORMED LINEAR SPACES
Throughout this chapter, R denotes the field of real numbers, C denotes the field of complex numbers, F denotes either R or C, and X denotes a vector space over F.

6.1.1. Definition. Let ||·|| denote a mapping from X into R which satisfies the following properties for every x, y ∈ X and every α ∈ F:

(i) ||x|| ≥ 0;
(ii) ||x|| = 0 if and only if x = 0;
(iii) ||αx|| = |α| ||x||; and
(iv) ||x + y|| ≤ ||x|| + ||y||.
The function ||·|| is called a norm on X, the mathematical system consisting of X and ||·||, {X; ||·||}, is called a normed linear space, and ||x|| is called the norm of x. If F = C we speak of a complex normed linear space, and if F = R we speak of a real normed linear space.

Different norms defined on the same linear space X yield different normed linear spaces. If in a given discussion it is clear which particular norm is being used, we simply write X in place of {X; ||·||} to denote the normed linear space under consideration. Properties (iii) and (iv) in Definition 6.1.1 are called the homogeneity property and the triangle inequality of a norm, respectively.

Let {X; ||·||} be a normed linear space and let x_i ∈ X, i = 1, ..., n. Repeated use of the triangle inequality yields
    ||x_1 + ··· + x_n|| ≤ ||x_1|| + ··· + ||x_n||.
The following result shows that every normed linear space has a metric associated with it, induced by the norm ||·||. Therefore, every normed linear space is also a metric space.
6.1.2. Theorem. Let {X; ||·||} be a normed linear space, and let ρ be a real-valued function defined on X × X given by ρ(x, y) = ||x − y|| for all x, y ∈ X. Then ρ is a metric on X and {X; ρ} is a metric space.

6.1.3. Exercise.
Prove Theorem 6.1.2.
This theorem tells us that all of the results in the previous chapter on metric spaces apply to normed linear spaces as well, provided we let ρ(x, y) = ||x − y||. We will adopt the convention that when using the terminology of metric spaces (e.g., completeness, compactness, convergence, continuity, etc.) in a normed linear space {X; ||·||}, we mean with respect to the metric space {X; ρ}, where ρ(x, y) = ||x − y||. Also, whenever we use metric space properties on F, i.e., on R or C, we mean with respect to the usual metric on R or C, respectively. With the foregoing in mind, we now introduce the following important concept.

6.1.4. Definition. A complete normed linear space is called a Banach space.

Thus, {X; ||·||} is a Banach space if and only if {X; ρ} is a complete metric space, where ρ(x, y) = ||x − y||.
6.1.5. Example. Let X = R^n, the space of n-tuples of real numbers, or let X = C^n, the space of n-tuples of complex numbers. From Example 3.1.10 we see that X is a vector space. For x ∈ X given by x = (ξ_1, ..., ξ_n), and for p ∈ R such that 1 ≤ p < ∞, define

    ||x||_p = [ |ξ_1|^p + ··· + |ξ_n|^p ]^{1/p}.

We can readily verify that ||·||_p satisfies the axioms of a norm. Axioms (i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv) is a direct consequence of Minkowski's inequality for finite sums (5.2.6). Letting ρ_p(x, y) = ||x − y||_p, then {X; ρ_p} is the metric space of Exercise 5.5.25. Since {X; ρ_p} is complete, it follows that {R^n; ||·||_p} and {C^n; ||·||_p} are Banach spaces. We may also define a norm on X by letting
IIxll .. = It can readily be verified that (R\ spaces (see Exercise 5.5.25). • 6.1.6. Example. Let ple 3.1.13), let I S p
<
= X
00,
I,
=
max
I~R~'
II . II..}
le,l· and (CR; II
. II..}
R" (see Example 3.1.11) or X and as in Example 5.3.5, let
x{
E
Define
IIxll, =
:X
f; le,l'
I~'
<
= C" (see Exam-
oo}.
(~ .. le,l' )1/' . /- 1
are also Banach
(6.1.7)
It is readily verified that II . II, is a norm on the linear space I,. Axioms (i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv), the triangle
Chapter 6 I Normed Spaces and Inner Product Spaces
346
inequality, follows from Minkowski' s inequality for infinite sums (5.2.7). Invoking Example 5.5.26, it also follows that l{ p; II . lip} is a Banach space. H e nceforth, when we simply refer to the Banach space Ip , we assume that the norm on this space is given by Eq. (6.1.7). L e tting p = 00 and I..
= x{
E
X:
sup
ne/1}
sup
ne/I},
/
(refer to Example 5.3.8), and defining
IIxII .. =
/
< oo}
(6.1.8)
it is readily verified that I{ .. ; II • II..} is also a Banach space. When we simply refer to the Banach space I.., we have in mind the norm given in Eq. (6.1.8). •
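As an illustration of Examples 6.1.5 and 6.1.6 (a numerical sketch only; the sample vectors and the exponents tested are arbitrary choices, and the computation is carried out on R^n rather than on the sequence spaces), one can check the norm axioms and observe that ||x||_p approaches ||x||_∞ as p grows:

```python
# Illustrative check of the p-norms of Example 6.1.5 on R^n.
# The sample vectors below are arbitrary choices.

def p_norm(x, p):
    """||x||_p = (sum |xi|^p)^(1/p) for 1 <= p < infinity."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def max_norm(x):
    """||x||_inf = max |xi|."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]
y = [-1.0, 2.0, 2.0]

for p in (1, 2, 3):
    # Axiom (iv): Minkowski's inequality ||x + y||_p <= ||x||_p + ||y||_p.
    s = [a + b for a, b in zip(x, y)]
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p)
    # Homogeneity: ||2x||_p = 2 ||x||_p.
    assert abs(p_norm([2 * a for a in x], p) - 2 * p_norm(x, p)) < 1e-12

# As p grows, ||x||_p decreases toward ||x||_inf.
assert abs(p_norm(x, 50) - max_norm(x)) < 1e-6
```

The last assertion reflects the fact that (Σ|ξ_i|^p)^(1/p) → max_i |ξ_i| as p → ∞, which is the sense in which the ∞-norm is the limiting member of the family.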
6.1.9. Example. (a) Let C[a, b] denote the linear space of real continuous functions on the interval [a, b], as given in Example 3.1.19. For x ∈ C[a, b] define

||x||_p = [∫_a^b |x(t)|^p dt]^(1/p),  1 ≤ p < ∞.

It is easily shown that {C[a, b]; ||·||_p} is a normed linear space. Axioms (i)-(iii) of Definition 6.1.1 follow trivially, while axiom (iv) follows from the Minkowski inequality for integrals (5.2.8). Let ρ_p(x, y) = ||x − y||_p. Then {C[a, b]; ρ_p} is a metric space which is not complete (see Example 5.5.29, where we considered the special case p = 2). It follows that {C[a, b]; ||·||_p} is not a Banach space. Next, define on the linear space C[a, b] the function ||·||_∞ by

||x||_∞ = sup_{t ∈ [a, b]} |x(t)|.

It is readily shown that {C[a, b]; ||·||_∞} is a normed linear space. Let ρ_∞(x, y) = ||x − y||_∞. In accordance with Example 5.5.28, {C[a, b]; ρ_∞} is a complete metric space, and thus {C[a, b]; ||·||_∞} is a Banach space.

The above discussion can be modified in an obvious way for the case where C[a, b] consists of complex-valued continuous functions defined on [a, b]. Here vector addition and multiplication of vectors by scalars are defined similarly as in Eqs. (3.1.20) and (3.1.21), respectively. Furthermore, it is easy to show that {C[a, b]; ||·||_p}, 1 ≤ p < ∞, and {C[a, b]; ||·||_∞} are normed linear spaces with norms defined similarly as above. Once more, the space {C[a, b]; ||·||_p}, 1 ≤ p < ∞, is not a Banach space, while the space {C[a, b]; ||·||_∞} is.

(b) The metric space {L_p[a, b]; ρ_p} was defined in Example 5.5.31. It can be shown that L_p[a, b] is a vector space over R. If we let

||f||_p = [∫_[a,b] |f|^p dμ]^(1/p),  p ≥ 1,

for f ∈ L_p[a, b], where the integral is the Lebesgue integral, then {L_p[a, b]; ||·||_p} is a Banach space, since {L_p[a, b]; ρ_p} is complete, where ρ_p(x, y) = ||x − y||_p. ■
6.1.10. Example. Let {X; ||·||_x} and {Y; ||·||_y} be two normed linear spaces over F, and let X × Y denote the Cartesian product of X and Y. Defining vector addition on X × Y by

(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)

and multiplication of vectors by scalars as

α(x, y) = (αx, αy),

we can readily show that X × Y is a linear space (see Eqs. (3.2.14), (3.2.15) and the related discussion). This space can be used to generate a normed linear space {X × Y; ||·||} by defining the norm ||·|| as

||(x, y)|| = ||x||_x + ||y||_y.

Furthermore, if {X; ||·||_x} and {Y; ||·||_y} are Banach spaces, then it is easily shown that {X × Y; ||·||} is also a Banach space. ■

6.1.11. Exercise. Verify the assertions made in Examples 6.1.5 through 6.1.10.
We note that in a normed linear space {X; ||·||} a sphere S(x_0; r) with center x_0 ∈ X and radius r > 0 is given by

S(x_0; r) = {x ∈ X : ||x − x_0|| < r}.   (6.1.12)

Referring to Theorem 5.4.27 and Exercise 5.4.31, recall that in a metric space the closure of a sphere, denoted by S̄(x_0; r), need not coincide with the closed sphere, denoted by K(x_0; r). In a normed linear space we have the following result.

6.1.13. Theorem. Let X be a normed linear space, let x_0 ∈ X, and let r > 0. Let S̄(x_0; r) denote the closure of the open sphere S(x_0; r) given by Eq. (6.1.12). Then S̄(x_0; r) = K(x_0; r), the closed sphere, where

K(x_0; r) = {x ∈ X : ||x − x_0|| ≤ r}.   (6.1.14)

Proof. By Exercise 5.4.31 we know that S̄(x_0; r) ⊂ K(x_0; r). Thus, we need only show that K(x_0; r) ⊂ S̄(x_0; r). It is clearly sufficient to show that {x ∈ X : ||x − x_0|| = r} ⊂ S̄(x_0; r). To do so, let x be such that ||x − x_0|| = r, and let 0 < ε < 1. Let y = εx_0 + (1 − ε)x. Then y − x_0 = (1 − ε)(x − x_0). Thus, ||y − x_0|| = |1 − ε|·||x − x_0|| < r, and so y ∈ S(x_0; r). Also, y − x = ε(x_0 − x). Therefore, ||y − x|| = εr. Since ε can be chosen arbitrarily small, this means that x ∈ S̄(x_0; r), which completes the proof. ■
Thus, in a normed linear space we may call S̄(x_0; r) the closed sphere given by Eq. (6.1.14).

When regarded as a function from X into R, a norm has the following important property.

6.1.15. Theorem. Let {X; ||·||} be a normed linear space. Then ||·|| is a continuous mapping of X into R.

Proof. We view ||·|| as a mapping from the metric space {X; ρ}, where ρ(x, y) = ||x − y||, into the real numbers with the usual metric for R. Thus, for given ε > 0, we wish to show that there is a δ > 0 such that ||x − y|| < δ implies | ||x|| − ||y|| | < ε. Now let z = x − y. Then x = z + y, and so ||x|| ≤ ||z|| + ||y||. This implies that ||x|| − ||y|| ≤ ||z||. Similarly, y = x − z, and so ||y|| ≤ ||x|| + ||−z|| = ||x|| + ||z||. Thus, ||y|| − ||x|| ≤ ||z||. It now follows that | ||x|| − ||y|| | ≤ ||z|| = ||x − y||. Letting δ = ε, the desired result follows. ■
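The inequality | ||x|| − ||y|| | ≤ ||x − y|| obtained in the above proof (the reverse triangle inequality) is easy to spot-check numerically; the following sketch uses the Euclidean norm on R³ and randomly chosen vectors, both arbitrary choices:

```python
import math
import random

def norm(x):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(xi * xi for xi in x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(3)]
    y = [random.uniform(-10, 10) for _ in range(3)]
    diff = [a - b for a, b in zip(x, y)]
    # Reverse triangle inequality: | ||x|| - ||y|| | <= ||x - y||.
    assert abs(norm(x) - norm(y)) <= norm(diff) + 1e-12
```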
In this chapter we will not always require that a particular normed linear space be a Banach space. Nonetheless, many important results of analysis require the completeness property. This is also true in applications. For example, in the solution of various types of equations (such as non-linear differential equations, integral equations, etc.), in optimization problems, in non-linear feedback problems, and in approximation theory, as well as in many other areas of application, we frequently obtain our desired solution in the form of a sequence generated by means of some iterative scheme. In such a sequence, each succeeding member is closer to the desired solution than its predecessor. Now even though the precise solution to which a sequence of this type may converge is unknown, it is usually imperative that the sequence converge to an element in the space which happens to be the setting of the particular problem in question.
6.2. LINEAR SUBSPACES
We now turn our attention briefly to linear subspaces of a normed linear space. We first recall Definition 3.2.1: a non-empty subset Y of a vector space X is called a linear subspace in X if (i) x + y ∈ Y whenever x and y are in Y, and (ii) αx ∈ Y whenever α ∈ F and x ∈ Y. Next, consider a normed linear space {X; ||·||}, let Y be a linear subspace in X, and let ||·||_1 denote the restriction of ||·|| to Y; i.e.,

||x||_1 = ||x|| for all x ∈ Y.

Then it is easy to show that {Y; ||·||_1} is also a normed linear space. We call ||·||_1 the norm induced by ||·|| on Y, and we say that {Y; ||·||_1} is a normed linear subspace of {X; ||·||}, or simply a linear subspace of X. Since there is usually no room for confusion, we drop the subscript and simply denote this subspace by {Y; ||·||}. In fact, when it is clear which norm is being used, we usually refer to the normed linear spaces X and Y. Our first result is an immediate consequence of Theorem 5.5.33.

6.2.1. Theorem. Let X be a Banach space, and let Y be a linear subspace of X. Then Y is a Banach space if and only if Y is closed.

In the following we give an example of a linear subspace of a Banach space which is not closed.

6.2.2. Example. Let X be the Banach space l_1 of Example 6.1.6, and let Y be the space of finitely non-zero sequences given in Example 3.1.14. It is easily shown that Y is a linear subspace of X. To show that Y is not closed, consider the sequence {y_n} in Y defined by

y_1 = (1, 0, 0, ...),
y_2 = (1, 1/2, 0, 0, ...),
y_3 = (1, 1/2, 1/4, 0, 0, ...),
...
y_n = (1, 1/2, ..., 1/2^(n−1), 0, 0, ...).

This sequence converges to the point x = (1, 1/2, ..., 1/2^(n−1), 1/2^n, ...) ∈ X. Since x ∉ Y, it follows from part (iii) of Theorem 5.5.8 that Y is not a closed subset of X. ■
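Example 6.2.2 can be checked numerically. With y_n as above, ||x − y_n||_1 is the tail Σ_{k≥n} 2^(−k) = 2^(1−n), which tends to 0; the sketch below approximates the tail by a long finite sum (the truncation length is an arbitrary choice):

```python
# Example 6.2.2, numerically: the truncations y_n of x = (1, 1/2, 1/4, ...)
# lie in Y (the finitely non-zero sequences), yet ||x - y_n||_1 -> 0, so x
# is a limit point of Y that is not in Y.  We work with the tail directly.

def tail_l1_norm(n, terms=200):
    """||x - y_n||_1 = sum_{k >= n} 2^(-k), approximated by a finite sum."""
    return sum(2.0 ** (-k) for k in range(n, terms))

# The tail equals 2^(1-n) in closed form; it shrinks to 0 as n grows.
for n in (1, 5, 10, 20):
    assert abs(tail_l1_norm(n) - 2.0 ** (1 - n)) < 1e-12

assert tail_l1_norm(30) < 1e-8
```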
Next, we prove:

6.2.3. Theorem. Let X be a Banach space, let Y be a linear subspace of X, and let Ȳ denote the closure of Y. Then Ȳ is a closed linear subspace of X.

Proof. Since Ȳ is closed, we only have to show that Ȳ is a linear subspace. Let x, y ∈ Ȳ, and let ε > 0. Then there exist elements x′, y′ ∈ Y such that ||x − x′|| < ε and ||y − y′|| < ε. Hence, for arbitrary α, β ∈ F, we have αx′ + βy′ ∈ Y. Now

||(αx + βy) − (αx′ + βy′)|| = ||α(x − x′) + β(y − y′)|| ≤ |α|·||x − x′|| + |β|·||y − y′|| < (|α| + |β|)ε.

Since ε > 0 is arbitrary, this implies that αx + βy is an adherent point of Y; i.e., αx + βy ∈ Ȳ. This completes the proof of the theorem. ■

We conclude this section with the following useful result.

6.2.4. Theorem. Let X be a normed linear space, and let Y be a linear subspace of X. If Y is an open subset of X, then Y = X.
Proof. Let x ∈ X. We wish to show that x ∈ Y. Since 0 ∈ Y, we may assume that x ≠ 0. Since Y is open and 0 ∈ Y, there is some ε > 0 such that the sphere S(0; ε) ⊂ Y. Let z = (ε/(2||x||))x. Then ||z|| = ε/2 < ε, and so z ∈ Y. Since Y is a linear subspace, it follows that x = (2||x||/ε)z ∈ Y. ■

6.3. INFINITE SERIES
Having defined a norm on a linear space, we are in a position to consider the concept of an infinite series in a meaningful way. Throughout this section we refer to a normed linear space {X; ||·||} simply as X.

6.3.1. Definition. Let {x_n} be a sequence of elements in X. For each positive integer m, let

y_m = x_1 + ... + x_m.

We call {y_m} the sequence of partial sums of {x_n}. If the sequence {y_m} converges to a limit y ∈ X, we say the infinite series

x_1 + x_2 + ... = Σ_{n=1}^∞ x_n

converges, and we write

y = Σ_{n=1}^∞ x_n.

We say the infinite series Σ_{n=1}^∞ x_n diverges if the sequence {y_m} diverges.
The following result yields sufficient conditions for an infinite series to converge.

6.3.2. Theorem. Let X be a Banach space, and let {x_n} be a sequence in X. If Σ_{n=1}^∞ ||x_n|| < ∞, then

(i) the infinite series Σ_{n=1}^∞ x_n converges; and
(ii) ||Σ_{n=1}^∞ x_n|| ≤ Σ_{n=1}^∞ ||x_n||.

Proof. To prove the first part, let y_m = x_1 + ... + x_m. If n > m, then

y_n − y_m = x_{m+1} + ... + x_n.

Since Σ_{n=1}^∞ ||x_n|| is a convergent infinite series of real numbers, the sequence of partial sums s_m = ||x_1|| + ... + ||x_m|| is Cauchy. Hence, given ε > 0, there is a positive integer N such that n > m > N implies |s_n − s_m| ≤ ε. But |s_n − s_m| ≥ ||y_n − y_m||, and so {y_m} is a Cauchy sequence. Since X is complete, {y_m} is convergent, and conclusion (i) follows.

To prove the second part, let y_m = x_1 + ... + x_m, and let y = lim_{m→∞} y_m = Σ_{n=1}^∞ x_n. Then for each positive integer m we have y = (y − y_m) + y_m and

||y|| ≤ ||y − y_m|| + Σ_{i=1}^m ||x_i||.

Taking the limit as m → ∞, we have ||Σ_{i=1}^∞ x_i|| ≤ Σ_{i=1}^∞ ||x_i||. ■
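As a concrete instance of Theorem 6.3.2 (an illustrative sketch; the Banach space R² with the Euclidean norm and the particular series are arbitrary choices), take x_n = (2^(−n), 3^(−n)). Then Σ ||x_n|| < ∞, the partial sums converge to (1, 1/2), and the norm of the sum is bounded by the sum of the norms:

```python
import math

def norm(v):
    """Euclidean norm on R^2."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

# x_n = (2^-n, 3^-n) in the Banach space R^2 with the Euclidean norm.
N = 60
partial = (0.0, 0.0)
norm_sum = 0.0
for n in range(1, N + 1):
    x_n = (2.0 ** (-n), 3.0 ** (-n))
    partial = (partial[0] + x_n[0], partial[1] + x_n[1])
    norm_sum += norm(x_n)

# (i) the partial sums converge, here to (1, 1/2);
assert norm((partial[0] - 1.0, partial[1] - 0.5)) < 1e-12
# (ii) ||sum x_n|| <= sum ||x_n||.
assert norm(partial) <= norm_sum
```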
6.4. CONVEX SETS
In the present section we consider the concepts of convexity and cones, which arise naturally in many applications. Throughout this section, X is a real normed linear space.

Let x and y be two elements of X. We call the set xy, defined by

xy = {z ∈ X : z = αx + (1 − α)y for some α ∈ R with 0 ≤ α ≤ 1},

the line segment joining x and y. Convex sets are now characterized as follows.

6.4.1. Definition. Let Y be a subset of X. Then Y is said to be convex if Y contains the line segment xy whenever x and y are two arbitrary points in Y. A convex set is called a convex body if it contains at least one interior point, i.e., if it completely contains some sphere.

In Figure A we depict a line segment xy, a convex set, and a non-convex set in R².

6.4.2. Figure A. A line segment, a convex set, and a non-convex set.
Note that an equivalent statement for Y to be convex is that if x, y ∈ Y, then αx + βy ∈ Y whenever α and β are positive constants such that α + β = 1. We cite a few examples.

6.4.3. Example. The empty set is convex. Also, a set consisting of one point is convex. In R³, a cube and a sphere are convex bodies, while a plane and a line segment are convex sets but not convex bodies. Any linear subspace of X is a convex set. Also, any linear variety of X (see Definition 3.2.17) is a convex set. ■

6.4.4. Example. Let Y and Z be convex sets in X, let α, β ∈ R, and let αY = {x ∈ X : x = αy, y ∈ Y}. Then the set αY + βZ is a convex set in X. ■

6.4.5. Exercise. Prove the assertions made in Examples 6.4.3 and 6.4.4.

6.4.6. Theorem. Let Y be a convex set in X, and let α, β ∈ R be positive scalars. Then (α + β)Y = αY + βY.

Proof. Regardless of convexity, if x ∈ (α + β)Y, then x = (α + β)y = αy + βy ∈ αY + βY, and thus (α + β)Y ⊂ αY + βY. Now let Y be convex, and let x = αy + βz, where y, z ∈ Y. Then

x = (α + β)[(α/(α + β))y + (β/(α + β))z] ∈ (α + β)Y,

because (α/(α + β)) + (β/(α + β)) = 1 and Y is convex. Therefore, x ∈ (α + β)Y, and thus αY + βY ⊂ (α + β)Y. This completes the proof. ■

We leave the proof of the next result as an exercise.

6.4.7. Theorem. Let C be an arbitrary collection of convex sets. Then the intersection ∩_{Y∈C} Y is also a convex set.

6.4.8. Exercise. Prove Theorem 6.4.7.

The preceding result gives rise to the following concept.

6.4.9. Definition. Let Y be any set in X. The convex hull of Y, also called the convex cover of Y, denoted by Y_c, is the intersection of all convex sets which contain Y.

We note that the convex hull of Y is the smallest convex set which contains Y. Examples of convex hulls of sets in R² are depicted in Figure B.

6.4.10. Figure B. Convex hulls.

6.4.11. Theorem. Let Y be any set in X. The convex hull of Y is the set of points expressible as α_1 y_1 + α_2 y_2 + ... + α_n y_n, where y_1, ..., y_n ∈ Y, where α_i > 0, i = 1, ..., n, where Σ_{i=1}^n α_i = 1, and where n is not fixed.

Proof. If Z is the set of elements expressible as described above, then clearly Z is convex. Moreover, Y ⊂ Z, and hence Y_c ⊂ Z. To show that Z ⊂ Y_c, we show that Z is contained in every convex set which contains Y. We do so by induction on the number of elements of Y that appear in the representation of an element of Z. Let U be a convex set with U ⊃ Y. If z = α_1 z_1 ∈ Z for which n = 1, then α_1 = 1 and z ∈ U. Now assume that an element of Z is in U if it is represented in terms of n − 1 elements of Y. Let z = α_1 z_1 + ... + α_n z_n be in Z, let β = α_1 + ... + α_{n−1}, let β_i = α_i/β, i = 1, ..., n − 1, and let u = β_1 z_1 + ... + β_{n−1} z_{n−1}. Then u ∈ U by the induction hypothesis. But z_n ∈ U, α_n = 1 − β, and z = βu + (1 − β)z_n ∈ U, since U is convex. This completes the induction, and thus Z ⊂ U, from which it follows that Z ⊂ Y_c. ■

6.4.12. Theorem. Let Y be a convex set in X. Then the closure of Y is also a convex set.

6.4.13. Exercise. Prove Theorem 6.4.12.

Since the intersection of any number of closed sets is always closed, it follows from Theorem 6.4.7 that the intersection of an arbitrary number of closed convex sets is also a closed convex set. We now consider some interesting aspects of norms in terms of convex sets.

6.4.14. Theorem. Any sphere in X is a convex set.

Proof. We consider, without loss of generality, the unit sphere

Y = {x ∈ X : ||x|| < 1}.

If x_0, y_0 ∈ Y, then ||x_0|| < 1 and ||y_0|| < 1. Now if α > 0 and β > 0, where α + β = 1, then ||αx_0 + βy_0|| ≤ ||αx_0|| + ||βy_0|| = α||x_0|| + β||y_0|| < 1, and thus αx_0 + βy_0 ∈ Y. ■

In view of Theorems 6.1.13, 6.4.12, and 6.4.14, it follows that a closed sphere K(x_0; r) is also convex. The following example, cast in R², is rather instructive.

6.4.15. Example. On R² we define the norms ||·||_p of Example 6.1.5. A moment's reflection reveals that in the case of ||·||_2, the unit sphere is a circle of radius 1; when the norm is ||·||_∞, the unit sphere is a square with vertices (1, 1), (1, −1), (−1, 1), (−1, −1); if the norm is ||·||_1, the unit sphere is the square with vertices (0, 1), (1, 0), (−1, 0), (0, −1). If for the unit sphere corresponding to ||·||_p we let p increase from 1 to ∞, then this sphere will deform in a continuous manner from the square corresponding to ||·||_1 to the square corresponding to ||·||_∞. This is depicted in Figure C. We note that in all cases the unit sphere is a convex set. For the case of the real-valued function

||x|| = (|ξ_1|^p + |ξ_2|^p)^(1/p),  0 < p < 1,   (6.4.16)

the set determined by ||x|| ≤ 1 is not convex. In particular, if p = 2/3, the set determined by ||x|| ≤ 1 yields the boundary and the interior of an astroid, as shown in Figure C. The reason for the non-convexity of this set is that the function (6.4.16) does not represent a norm. In particular, it can be shown that (6.4.16) does not satisfy the triangle inequality. ■

6.4.17. Figure C. Unit spheres for Example 6.4.15.

6.4.18. Exercise. Verify the assertions made in Example 6.4.15.
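The failure of the triangle inequality for the function (6.4.16) when 0 < p < 1 is easy to exhibit numerically; in the sketch below the test vectors x = (1, 0) and y = (0, 1) are arbitrary but standard choices:

```python
def f(x, p):
    """(|xi_1|^p + |xi_2|^p)^(1/p) as in (6.4.16); a norm only for p >= 1."""
    return (abs(x[0]) ** p + abs(x[1]) ** p) ** (1.0 / p)

p = 2.0 / 3.0
x, y = (1.0, 0.0), (0.0, 1.0)
s = (x[0] + y[0], x[1] + y[1])

# For p = 2/3: f(x) = f(y) = 1, while f(x + y) = 2^(3/2) > 2, so the
# triangle inequality f(x + y) <= f(x) + f(y) fails.
assert f(s, p) > f(x, p) + f(y, p)
# For p = 2 (a genuine norm) the inequality does hold on the same vectors.
assert f(s, 2.0) <= f(x, 2.0) + f(y, 2.0)
```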
We conclude this section by introducing the notion of a cone.

6.4.19. Definition. A set Y in X is called a cone with vertex at the origin if y ∈ Y implies that αy ∈ Y for all α ≥ 0. If Y is a cone with vertex at the origin, then the set x_0 + Y, x_0 ∈ X, is called a cone with vertex x_0. A convex cone is a set which is both convex and a cone.

In Figure D examples of cones are shown.

6.4.20. Figure D. (a) A cone; (b) a convex cone.

6.5. LINEAR FUNCTIONALS
Throughout this section X is a normed linear space.

We recall that a mapping f from X into F is called a functional on X (see Definition 3.5.1). If f is also linear, i.e., f(αx + βy) = αf(x) + βf(y) for all α, β ∈ F and all x, y ∈ X, then f is called a linear functional (refer to Definition 3.5.1). Recall further that X′, the set of all linear functionals on X, is a linear space over F (see Theorem 3.5.16). Let f ∈ X′ and x ∈ X. In accordance with Eq. (3.5.10), we use the notation

f(x) = <x, f>   (6.5.1)

to denote the value of f at x. Alternatively, we sometimes find it convenient to let x′ ∈ X′ denote a linear functional defined on X and write (see Eq. (3.5.11))

x′(x) = <x, x′>.   (6.5.2)

Invoking Definition 5.7.1, we note that continuity of a functional f at a point x_0 ∈ X means, in the present context, that for every ε > 0 there is a δ > 0 such that |f(x) − f(x_0)| < ε whenever ||x − x_0|| < δ. Our first result shows that if a linear functional on X is continuous at one point of X, then it is continuous at all points of X.

6.5.3. Theorem. If a linear functional f on X is continuous at some point x_0 ∈ X, then it is continuous for all x ∈ X.

Proof. If {y_n} is a sequence in X such that y_n → x_0, then f(y_n) → f(x_0) by Theorem 5.7.8. Now let {x_n} be a sequence in X converging to x ∈ X. Then the sequence {y_n} in X given by y_n = x_n − x + x_0 converges to x_0. By the linearity of f, we have f(x_n) − f(x) = f(y_n) − f(x_0). Since |f(y_n) − f(x_0)| → 0 as y_n → x_0, we have |f(x_n) − f(x)| → 0 as x_n → x, and therefore f is continuous at x ∈ X. Since x is arbitrary, the proof of the theorem is complete. ■
It is clear that if f is a linear functional and if f(x) ≠ 0 for some x ∈ X, then the range of f is all of F; i.e., R(f) = F. For linear functionals we define boundedness as follows.

6.5.4. Definition. A linear functional f on X is said to be bounded if there exists a real constant M > 0 such that

|f(x)| ≤ M||x||

for all x ∈ X. If f is not bounded, then it is said to be unbounded.
The following theorem shows that continuity and boundedness of linear functionals are equivalent.

6.5.5. Theorem. A linear functional f on a normed linear space X is bounded if and only if it is continuous.

Proof. Assume that f is bounded, and let M be such that |f(x)| ≤ M||x|| for all x ∈ X. If x_n → 0, then |f(x_n)| ≤ M||x_n|| → 0. Hence, f is continuous at x = 0. From Theorem 6.5.3 it follows that f is continuous for all x ∈ X.

Conversely, assume that f is continuous at x = 0 and hence at any x ∈ X. There is a δ > 0 such that |f(x)| ≤ 1 whenever ||x|| ≤ δ. Now for any x ≠ 0 we have ||δx/||x|||| = δ, and thus

|f(x)| = (||x||/δ)|f(δx/||x||)| ≤ ||x||/δ.

If we let M = 1/δ, then |f(x)| ≤ M||x||, and f is bounded. ■

We will see later, in Example 6.5.17, that there may exist linear functionals on a normed linear space which are unbounded. The class of linear functionals which are bounded has some interesting properties.
6.5.6. Theorem. Let X′ be the vector space of all linear functionals on X, and let X* denote the family of all bounded linear functionals on X. Define the function ||·||: X* → R by

||f|| = sup_{x≠0} |f(x)|/||x||   (6.5.7)

for f ∈ X*. Then

(i) X* is a linear subspace of X′;
(ii) the function ||·|| defined in Eq. (6.5.7) is a norm on X*; and
(iii) the normed space {X*; ||·||} is complete.

Proof. The proof of part (i) is straightforward and is left as an exercise.

To prove part (ii), note that if f ≠ 0, then ||f|| > 0, and if f = 0, then ||f|| = 0. Also, since

sup_{x≠0} |αf(x)|/||x|| = |α| sup_{x≠0} |f(x)|/||x||,

it follows that ||αf|| = |α|·||f||. Finally,

||f_1 + f_2|| = sup_{x≠0} |f_1(x) + f_2(x)|/||x|| ≤ sup_{x≠0} {|f_1(x)| + |f_2(x)|}/||x|| ≤ sup_{x≠0} |f_1(x)|/||x|| + sup_{x≠0} |f_2(x)|/||x|| = ||f_1|| + ||f_2||.

Hence, ||·|| satisfies the axioms of a norm.

To prove part (iii), let {x′_n} ⊂ X* be a Cauchy sequence. Then ||x′_n − x′_m|| → 0 as m, n → ∞. If we evaluate this sequence at any x ∈ X, then {x′_n(x)} is a Cauchy sequence of scalars, because |x′_n(x) − x′_m(x)| ≤ ||x′_n − x′_m||·||x||. This implies that for each x ∈ X there is a scalar x′(x) such that x′_n(x) → x′(x). We observe that

x′(αx + βy) = lim_{n→∞} x′_n(αx + βy) = lim_{n→∞} [αx′_n(x) + βx′_n(y)] = α lim_{n→∞} x′_n(x) + β lim_{n→∞} x′_n(y) = αx′(x) + βx′(y),

and thus x′ is a linear functional. Next we show that x′ is bounded. Since {x′_n} is a Cauchy sequence, for ε > 0 there is an M such that |x′_n(x) − x′_m(x)| ≤ ε||x|| for all m, n ≥ M and for all x ∈ X. But x′_n(x) → x′(x), and hence |x′(x) − x′_m(x)| ≤ ε||x|| for all m ≥ M. It now follows that

|x′(x)| = |x′(x) − x′_m(x) + x′_m(x)| ≤ |x′(x) − x′_m(x)| + |x′_m(x)| ≤ ε||x|| + ||x′_m||·||x||,

and thus x′ is a bounded linear functional. Finally, to show that x′_m → x′ ∈ X*, we note that |x′(x) − x′_m(x)| ≤ ε||x|| whenever m ≥ M, from which we have ||x′ − x′_m|| ≤ ε whenever m ≥ M. This proves the theorem. ■

6.5.8. Exercise. Prove part (i) of Theorem 6.5.6.
It is especially interesting to note that X* is a Banach space whether or not X is a Banach space. We are now in a position to make the following definition.

6.5.9. Definition. The set of all bounded linear functionals on a normed space X is called the normed conjugate space of X, or the normed dual of X, or simply the dual of X, and is denoted by X*. For f ∈ X* we call ||f||, defined by Eq. (6.5.7), the norm of f.

The next result states that the norm of a functional can be represented in various equivalent ways.

6.5.10. Theorem. Let f be a bounded linear functional on X, and let ||f|| be the norm of f. Then

(i) ||f|| = inf{M : |f(x)| ≤ M||x|| for all x ∈ X};
(ii) ||f|| = sup_{||x|| ≤ 1} |f(x)|; and
(iii) ||f|| = sup_{||x|| = 1} |f(x)|.

6.5.11. Exercise. Prove Theorem 6.5.10.
Let us now consider the norms of some specific linear functionals.

6.5.12. Example. Consider the normed linear space {C[a, b]; ||·||_∞}. The mapping

f(x) = ∫_a^b x(s) ds,  x ∈ C[a, b],

is a linear functional on C[a, b] (cf. Example 3.5.2). The norm of this functional equals (b − a), because

|f(x)| = |∫_a^b x(s) ds| ≤ (b − a) max_{a ≤ s ≤ b} |x(s)|. ■
6.5.13. Example. Consider the space {C[a, b]; ||·||_∞}, let x_0 be a fixed element of C[a, b], and let x be any element of C[a, b]. The mapping

f(x) = ∫_a^b x(s)x_0(s) ds

is a linear functional on C[a, b] (cf. Example 3.5.2). This functional is bounded, because

|f(x)| = |∫_a^b x(s)x_0(s) ds| ≤ (∫_a^b |x_0(s)| ds) ||x||_∞.

Since f is bounded and linear, it follows that it is continuous. We leave it to the reader to show that

||f|| = ∫_a^b |x_0(s)| ds. ■
6.5.14. Example. Let a = (α_1, ..., α_n) be a fixed element of F^n, and let x = (ξ_1, ..., ξ_n) denote an arbitrary element of F^n. Then if

f(x) = Σ_{i=1}^n α_i ξ_i,

it follows that f is a linear functional on F^n (cf. Example 3.5.6). Letting ||x|| = (|ξ_1|² + ... + |ξ_n|²)^(1/2), it follows from the Schwarz inequality (4.9.29) that

|f(x)| ≤ ||a||·||x||.   (6.5.15)

Thus, f is bounded and continuous. In order to determine the norm of f, we rewrite (6.5.15) as

sup_{x≠0} |f(x)|/||x|| ≤ ||a||,

from which it follows that ||f|| ≤ ||a||. Next, by setting x = a, we have |f(a)| = ||a||². Thus, |f(a)|/||a|| = ||a||, and therefore ||f|| = ||a||. ■
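Example 6.5.14 can be spot-checked numerically (a sketch over R³ with an arbitrarily chosen vector a; for real vectors the Schwarz inequality applies directly):

```python
import math
import random

# f(x) = sum a_i * x_i on R^3 with the Euclidean norm; Example 6.5.14
# asserts |f(x)| <= ||a||*||x||, with equality at x = a, so ||f|| = ||a||.
a = [2.0, -1.0, 3.0]

def f(x):
    return sum(ai * xi for ai, xi in zip(a, x))

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

random.seed(2)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in a]
    assert abs(f(x)) <= norm(a) * norm(x) + 1e-12   # Schwarz inequality

# Equality at x = a: |f(a)| = ||a||^2, so the supremum ||f|| equals ||a||.
assert abs(abs(f(a)) - norm(a) ** 2) < 1e-12
```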
6.5.16. Example. Analogous to the above example, let a = (α_1, α_2, ...) be a fixed element of the Banach space l_q (see Example 6.1.6), and let x = (ξ_1, ξ_2, ...) be an arbitrary element of l_p, where 1/p + 1/q = 1. It follows that if

f(x) = Σ_{i=1}^∞ α_i ξ_i,

then f is a linear functional on l_p. We can show that f is bounded by observing that

|f(x)| = |Σ_i α_i ξ_i| ≤ Σ_i |α_i ξ_i| ≤ ||a||·||x||,

which follows from Hölder's inequality for infinite sums (5.2.4). Thus, f is bounded and, hence, continuous. In a manner similar to that of Example 6.5.14, we can show that ||f|| = ||a||. ■
of an unbounded linear
Chapter 6 I Normed Spaces and Inner Product Spaces
360 6.5.17. (~"
~2'
IIxli =
Example. Consider the space X of finitely non-zero sequences x = '~8' 0, 0, ...) (cf. Example 3.1.14). Define II . II: X - + R as max lell· It is easy to show that ;X { II . II} is a normed linear space. ••
i
F u rthermore, it is readily verified that the mapping
is an unbounded linear functional on .X
•
6.5.18. Exercise. Verify the assertions made in Examples 6.5.12, 6.5.13, 6.5.14,6.5.16, and 6.5.17.
6.6.
IF NITE-DIMENSIONAL
SPACES
We now briefly turn our attention to finite-dimensional vector spaces. Throughout this section X denotes a normed linear space. We recall that if {x_1, ..., x_n} is a basis for a linear space X, then for each x ∈ X there is a unique set of scalars {ξ_1, ..., ξ_n} in F, called the coordinates of x with respect to this basis (see Definition 3.3.36). We now prove the following result.

6.6.1. Theorem. Let X be a finite-dimensional normed linear space, and let {x_1, ..., x_n} be a basis for X. For each x ∈ X, let the coordinates of x with respect to this basis be denoted by (ξ_1, ..., ξ_n) ∈ F^n. For i = 1, ..., n, define the linear functionals f_i: X → F by f_i(x) = ξ_i. Then each f_i is a continuous linear functional.

Proof. The proof that f_i is linear is straightforward. To show that f_i is a bounded linear functional, we let

S = {a = (α_1, ..., α_n) ∈ F^n : |α_1| + |α_2| + ... + |α_n| = 1}.

It is left as an exercise to show that S is a compact set in the metric space {F^n; ρ_1} (see Example 5.3.1). Now let us define the function g: S → R by

g(a) = ||α_1 x_1 + ... + α_n x_n||.

The reader can readily verify that g is a continuous function on S. Now let m = inf{g(a) : a ∈ S}. It follows from Theorem 5.7.15 that there is an a_0 ∈ S such that g(a_0) = m. Note that m ≠ 0, since {x_1, ..., x_n} is a basis for X and a_0 ≠ 0. Hence m > 0. It now follows that

||α_1 x_1 + ... + α_n x_n|| ≥ m

for every a = (α_1, ..., α_n) ∈ S. Since |α_1| + ... + |α_n| = 1 for a ∈ S, we see that

||α_1 x_1 + ... + α_n x_n|| ≥ m(|α_1| + ... + |α_n|)   (6.6.2)

for all a ∈ S. Next, for arbitrary x ∈ X with coordinates (ξ_1, ..., ξ_n) ∈ F^n, we let β = |ξ_1| + ... + |ξ_n|. First, we suppose that β > 0. Then

||x|| = ||ξ_1 x_1 + ... + ξ_n x_n|| = β ||(ξ_1/β)x_1 + ... + (ξ_n/β)x_n|| ≥ βm(|ξ_1/β| + ... + |ξ_n/β|) = m(|ξ_1| + ... + |ξ_n|),

where inequality (6.6.2) has been used. Therefore, if β ≠ 0, we have

|ξ_1| + ... + |ξ_n| ≤ (1/m)||x||.   (6.6.3)

Noting that inequality (6.6.3) is also true if β = 0, we conclude that this inequality is true for all x ∈ X. Since |f_i(x)| = |ξ_i| ≤ |ξ_1| + ... + |ξ_n|, i = 1, ..., n, we see that |f_i(x)| ≤ (1/m)||x|| for any x ∈ X. Hence, f_i is a bounded linear functional and, consequently, it is continuous. ■

6.6.4. Exercise. Prove that the set S and the function g have the properties asserted in the proof of Theorem 6.6.1.

The preceding theorem allows us to prove the following important result.

6.6.5. Theorem. Let X be a finite-dimensional normed linear space. Then X is complete.
Proof. Let {x_1, ..., x_n} be a basis for X, let {y_k} be a Cauchy sequence in X, and for each k let the coordinates of y_k with respect to {x_1, ..., x_n} be given by (η_1k, ..., η_nk). It follows from Theorem 6.6.1 that there is a constant M such that

|η_jk − η_ji| ≤ M||y_k − y_i||

for j = 1, ..., n and all i, k = 1, 2, .... Hence, each sequence {η_jk} is a Cauchy sequence in F, i.e., in R or C, and is therefore convergent. Let η_j0 = lim_k η_jk for j = 1, ..., n. If we let

y_0 = η_10 x_1 + ... + η_n0 x_n,

it follows that {y_k} converges to y_0. This proves that X is complete. ■
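The constant m in the proof of Theorem 6.6.1 can be approximated numerically for a concrete basis (a sketch in R² with the Euclidean norm; the deliberately skewed basis, the grid resolution, and the small slack factor compensating for the grid approximation are all arbitrary choices), after which inequality (6.6.3) can be checked on random coordinates:

```python
import math
import random

# Basis {x1, x2} of R^2 (an arbitrary, deliberately skewed choice).
x1, x2 = (1.0, 0.0), (1.0, 0.1)

def norm(v):
    return math.hypot(v[0], v[1])

def g(a1, a2):
    """g(a) = ||a1*x1 + a2*x2|| as in the proof of Theorem 6.6.1."""
    return norm((a1 * x1[0] + a2 * x2[0], a1 * x1[1] + a2 * x2[1]))

# Approximate m = inf{g(a) : |a1| + |a2| = 1} over a fine grid of S.
m = min(g(s1 * t, s2 * (1.0 - t))
        for t in [i / 2000.0 for i in range(2001)]
        for s1 in (1, -1) for s2 in (1, -1))
assert m > 0.0

# Check (6.6.3): |xi_1| + |xi_2| <= (1/m)*||x||; the 1.001 factor is slack
# for the grid approximation of m, which slightly overestimates the infimum.
random.seed(3)
for _ in range(200):
    c1, c2 = random.uniform(-5, 5), random.uniform(-5, 5)
    x = (c1 * x1[0] + c2 * x2[0], c1 * x1[1] + c2 * x2[1])
    assert abs(c1) + abs(c2) <= (1.0 / m) * norm(x) * 1.001
```

The nearly parallel basis vectors make m small, so the bound 1/m in (6.6.3) is large; for an orthonormal basis m would be on the order of 1.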
The next result follows from Theorems 6.6.5 and 6.2.1.

6.6.6. Theorem. Let X be a normed linear space, and let Y be a finite-dimensional linear subspace of X. Then (i) Y is complete, and (ii) Y is closed.
6.6.7. Exercise. Prove Theorem 6.6.6.

Our next result is an immediate consequence of Theorem 6.6.1.

6.6.8. Theorem. Let X be a finite-dimensional normed linear space, and let f be a linear functional on X. Then f is continuous.

6.6.9. Exercise. Prove Theorem 6.6.8.
We recall from Definition 5.6.30 and Theorem 5.6.31 that a subset Y of a metric space X is relatively compact if every sequence of elements in Y contains a subsequence which converges to an element in X. This property can be useful in characterizing finite-dimensional subspaces in an arbitrary normed linear space, as we shall see in the next theorem. Note also that, in view of Definition 5.1.19, a subset Y of a normed linear space X is bounded if and only if there is a λ > 0 such that ||y|| ≤ λ for all y ∈ Y.
6.6.10. Theorem. Let X be a normed linear space, and let Y be a linear subspace of X. Then Y is finite dimensional if and only if every bounded subset of Y is relatively compact.

Proof. (Necessity) Assume that Y is finite dimensional, and let {x_1, ..., x_n} be a basis for Y. Then for any y ∈ Y there is a unique set {α_1, ..., α_n} such that y = α_1 x_1 + ... + α_n x_n. Let A be a bounded subset of Y, and let {y_k} be a sequence in A. Then we can write y_k = α_{1k} x_1 + ... + α_{nk} x_n for k = 1, 2, .... There exists a λ > 0 such that ||y_k|| ≤ λ for all k. Consider |α_{1k}| + ... + |α_{nk}|. We wish to show that this sum is bounded. Suppose that it is not. Then for each positive integer m we can find a y_{k_m} such that |α_{1k_m}| + ... + |α_{nk_m}| = γ_m > m. Now let y'_{k_m} = (1/γ_m) y_{k_m}. It follows that

||y'_{k_m}|| = (1/γ_m) ||y_{k_m}|| ≤ λ/m → 0 as m → ∞.

On the other hand,

y'_{k_m} = α'_{1k_m} x_1 + ... + α'_{nk_m} x_n,

where α'_{ik_m} = α_{ik_m}/γ_m for i = 1, ..., n. Since |α'_{1k_m}| + ... + |α'_{nk_m}| = 1, the coordinates {(α'_{1k_m}, ..., α'_{nk_m})} form a bounded sequence in F^n and as such contain a convergent subsequence. Let (α'_{10}, ..., α'_{n0}) be the limit of such a convergent subsequence, whose indices we denote by k_{m_j}. If we let y'_0 = α'_{10} x_1 + ... + α'_{n0} x_n, then we have

||y'_0 − y'_{k_{m_j}}|| ≤ |α'_{10} − α'_{1k_{m_j}}| ||x_1|| + ... + |α'_{n0} − α'_{nk_{m_j}}| ||x_n|| → 0 as m_j → ∞.

Thus, y'_{k_{m_j}} → y'_0. Since y'_{k_{m_j}} → 0, it follows that y'_0 = 0. But this is impossible, because {x_1, ..., x_n} is a linearly independent set while |α'_{10}| + ... + |α'_{n0}| = 1. We conclude that the sum
|α_{1k}| + ... + |α_{nk}| is bounded. Consequently, there is a subsequence {(α_{1k_j}, ..., α_{nk_j})} which is convergent in F^n. Let (α_{10}, ..., α_{n0}) be the limit of the convergent subsequence, and let y_0 = α_{10} x_1 + ... + α_{n0} x_n. Then y_{k_j} → y_0. Thus, {y_k} contains a convergent subsequence, and this proves that A is relatively compact.

(Sufficiency) Assume that every bounded subset of Y is relatively compact. Let x_1 ∈ Y be such that ||x_1|| = 1, and let V_1 = V({x_1}) be the linear subspace generated by {x_1} (see Definition 3.3.6). If V_1 = Y, then we are done. If V_1 ≠ Y, let y_2 ∈ Y be such that y_2 ∉ V_1. Let d = inf_{x ∈ V_1} ||y_2 − x||. Since V_1 is closed by Theorem 6.6.6, we must have d > 0; otherwise y_2 ∈ V_1. For every η > 0 there is an x_0 ∈ V_1 such that d ≤ ||y_2 − x_0|| < d + η. Now let x_2 = (y_2 − x_0)/||y_2 − x_0||. Then x_2 ∉ V_1, ||x_2|| = 1, and

||x_2 − x|| = ||(y_2 − x_0)/||y_2 − x_0|| − x|| = ||y_2 − x'||/||y_2 − x_0|| ≥ d/(d + η) = 1 − η/(d + η),

where x' = x_0 + ||y_2 − x_0|| x ∈ V_1 for all x ∈ V_1. Since η is arbitrary, we can choose η so that ||x_2 − x|| > 1/2 for all x ∈ V_1. Now let V_2 be the linear subspace generated by {x_1, x_2}. If V_2 = Y, we are done. If not, we can proceed in the manner used above to select an x_3 ∉ V_2 with ||x_3|| = 1, ||x_1 − x_3|| > 1/2, and ||x_2 − x_3|| > 1/2. If we continue this process, then either V({x_1, ..., x_n}) = Y for some n, or else we obtain an infinite sequence {x_n} such that ||x_n|| = 1 and ||x_m − x_n|| > 1/2 for all m ≠ n. The second alternative is impossible, since {x_n} is a bounded sequence and as such must contain a convergent subsequence. This completes the proof. ■
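As a numerical aside (an illustration of ours, not part of the original development), the failure of relative compactness in infinite dimensions can be seen concretely: the unit vectors of l_2 form a bounded set, yet any two of them lie a distance √2 apart, so no subsequence can be Cauchy. A minimal Python sketch, truncating l_2 to finitely many coordinates:

```python
import math

def unit_vector(n, dim):
    # n-th standard unit vector, a truncated stand-in for e_n in l_2
    return [1.0 if i == n else 0.0 for i in range(dim)]

def norm(x):
    # norm induced by the l_2 inner product
    return math.sqrt(sum(c * c for c in x))

def distance(x, y):
    return norm([a - b for a, b in zip(x, y)])

dim = 50
es = [unit_vector(n, dim) for n in range(dim)]

# every e_n lies on the unit sphere, a bounded set ...
assert all(abs(norm(e) - 1.0) < 1e-12 for e in es)

# ... yet ||e_m - e_n|| = sqrt(2) whenever m != n, so no subsequence of
# {e_n} is Cauchy: the bounded set is not relatively compact
assert all(abs(distance(es[m], es[n]) - math.sqrt(2)) < 1e-12
           for m in range(dim) for n in range(dim) if m != n)
```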
6.7. GEOMETRIC ASPECTS OF LINEAR FUNCTIONALS
Throughout this section X denotes a real normed linear space. Before giving geometric interpretations of linear functionals, we introduce the notions of maximal subspace and hyperplane.

6.7.1. Definition. A linear subspace Y of a linear space X is called maximal if it is not all of X and if there exists no linear subspace Z of X such that Y ≠ Z, Z ≠ X, and Y ⊂ Z.

Recall that if Y is a linear subspace of X and if z ∈ X, then we call the set Z = z + Y a linear variety (see Definition 3.2.17). In this case we also say that Z is a translation of Y.
Chapter 6 / Normed Spaces and Inner Product Spaces
6.7.2. Definition. A hyperplane Y in a linear space X is a maximal linear variety; i.e., a linear variety resulting from the translation of a maximal linear subspace.

If a hyperplane Y contains the origin, then it is simply a maximal linear subspace, and all hyperplanes Z obtained by translating Y are said to be parallel to Y. The following theorem provides us with an important characterization of hyperplanes in terms of linear functionals.

6.7.3. Theorem. If f ≠ 0 is a linear functional on X and if α is any fixed scalar, then the set Y = {x: f(x) = α} is a hyperplane. It contains the origin 0 if and only if α = 0. Conversely, if Y is a hyperplane in a linear space X, then there is a linear functional f on X and a fixed scalar α such that Y = {x: f(x) = α}.
Proof. Consider the first part. Since f ≠ 0, there is an x_1 such that f(x_1) = β ≠ 0. If x_0 = (α/β)x_1, then f(x_0) = (α/β)f(x_1) = α, and thus x_0 ∈ Y. Let Y_0 = Y − x_0. It is readily verified that Y_0 = {x: f(x) = 0} and that Y_0 is a linear subspace, so that Y is a linear variety. Since Y_0 ≠ X, we can write every element of X as the sum of an element of Y_0 and a multiple of y, where y ∈ X − Y_0. Indeed, if x ∈ X, if y is any element in X − Y_0 (so that f(y) ≠ 0), and if

z = x − (f(x)/f(y)) y,

then f(z) = 0, and thus x = z + (f(x)/f(y)) y has the required form. Now assume that Y_1 is a linear subspace of X for which Y_0 ⊂ Y_1 and Y_1 ≠ Y_0. We can choose y ∈ Y_1 − Y_0, and the above argument shows that X ⊂ Y_1, so that Y_1 = X. This shows that Y_0 is maximal and that Y is a hyperplane. The assertion that Y contains 0 if and only if α = 0 follows readily.

Consider now the last part of the theorem. If Y is a hyperplane in X, then Y is the translation of a maximal linear subspace Z in X; i.e., Y = x_0 + Z, with x_0 fixed. If x_0 ∉ Z, then V(Z + x_0) = X, and if for x = αx_0 + z, z ∈ Z, we define f(x) = α, then Y = {x: f(x) = 1}. On the other hand, if x_0 ∈ Z, then we take x_1 ∉ Z, X = V(Z + x_1), Y = Z, and define, for x = αx_1 + z, f(x) = α. Then Y = {x: f(x) = 0}. This concludes the proof of the theorem. ■
In the proof of the above theorem we also established the following result:

6.7.4. Theorem. Let f ≠ 0 be a linear functional on the linear space X, and let Z = {x: f(x) = 0}. If x_0 ∈ X − Z, then every x ∈ X can be expressed as

x = (f(x)/f(x_0)) x_0 + z,  z ∈ Z.
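The decomposition of Theorem 6.7.4 is easy to check numerically. The sketch below is illustrative Python; the functional f(x) = 2ξ_1 − ξ_2 on R² and the points used are our own assumptions, not from the text.

```python
# Hypothetical data (not from the text): f(x) = 2*xi1 - xi2 on R^2.
def f(x):
    return 2.0 * x[0] - 1.0 * x[1]

x0 = (1.0, 0.0)            # f(x0) = 2 != 0, so x0 lies in X - Z
x = (3.0, 5.0)             # an arbitrary vector to decompose

alpha = f(x) / f(x0)       # the coefficient f(x)/f(x0) of Theorem 6.7.4
z = (x[0] - alpha * x0[0], x[1] - alpha * x0[1])

# z belongs to Z = {x : f(x) = 0}, and x = (f(x)/f(x0))*x0 + z
assert abs(f(z)) < 1e-12
assert all(abs(xi - (alpha * x0i + zi)) < 1e-12
           for xi, x0i, zi in zip(x, x0, z))
```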
The next result shows that it is possible to establish a unique correspondence between hyperplanes and linear functionals. This result follows readily from Theorem 6.7.3.

6.7.5. Theorem. Let Y be a hyperplane in a linear space X. If Y does not contain the origin, there is a unique linear functional f on X such that Y = {x: f(x) = 1}.

6.7.6. Exercise. Prove Theorem 6.7.5.
6.7.7. Theorem. Let Y be a maximal linear subspace in a Banach space X. Then either Ȳ = Y or Ȳ = X; i.e., either Y is closed or else Y is dense in X.

Proof. Since Y is a linear subspace, Ȳ is a linear subspace of X by Theorem 6.2.3. Now Y ⊂ Ȳ. Hence, if Y ≠ Ȳ, we must have Ȳ = X, since Y is a maximal linear subspace. ■

In the next result we will show that Y is closed if and only if the functional f associated with Y is bounded (i.e., continuous). Thus, corresponding to any hyperplane in a normed linear space there is a functional that is bounded whenever the hyperplane is closed, and vice versa.
6.7.8. Theorem. Let f be a non-zero linear functional on X, and let Y = {x: f(x) = α} be a hyperplane in X. Then Y is closed for every α if and only if f is bounded.
Proof. Assume first that f is bounded; then it is continuous. If {x_n} is a sequence in Y which converges to x ∈ X, then f(x_n) → f(x) = α, so that x ∈ Y, and thus Y is closed.

Conversely, let Z = {x: f(x) = 0} be closed. In view of Theorem 6.7.4, there exists an x_0 ∈ X − Z such that every element of X is of the form cx_0 + z with z ∈ Z. Now let {x_n} be a sequence in X such that x_n → x ∈ X. Then it is possible to express each x_n and x as x_n = c_n x_0 + z_n and x = c x_0 + z, where z_n, z ∈ Z. Let d = inf_{z' ∈ Z} ||x_0 − z'||. Since Z is closed, d > 0. Now

||x − x_n|| = ||(c − c_n)x_0 − (z_n − z)|| ≥ inf_{z' ∈ Z} ||(c − c_n)x_0 − z'|| = |c − c_n| d.

Thus c_n → c. Moreover, since f(x_n) = c_n f(x_0) + f(z_n) = c_n f(x_0) → c f(x_0) = f(x), it follows that f is continuous on X, and hence bounded. ■

We now introduce the concept of a half space.

6.7.9. Definition. Let f be a non-zero linear functional on X, and let α ∈ R. Let Y be the hyperplane given by Y = {x: f(x) = α}. Let Y_1, Y_2, Y_3,
and Y_4 be subsets of X defined by Y_1 = {x: f(x) < α}, Y_2 = {x: f(x) ≤ α}, Y_3 = {x: f(x) > α}, and Y_4 = {x: f(x) ≥ α}. Then each of the sets Y_1, Y_2, Y_3, and Y_4 is called a half space determined by Y. In addition, let Z_1 and Z_2 be subsets of X. We say that Y separates Z_1 and Z_2 if either (i) Z_1 ⊂ Y_2 and Z_2 ⊂ Y_4, or (ii) Z_1 ⊂ Y_4 and Z_2 ⊂ Y_2.

6.7.10. Exercise. Show that each of the sets Y_1, Y_2, Y_3, Y_4 in the preceding definition is convex. Also, show that if in the above definition f is continuous, then Y_1 and Y_3 are open sets in X, and Y_2 and Y_4 are closed sets in X.

In order to demonstrate some of the notions introduced, we conclude this section with the following example.

6.7.11. Example. Let X = R², let x = (ξ_1, ξ_2) ∈ X, let y = (η_1, η_2) be any fixed vector in X, and define the linear functional f on X as

f(x) = η_1ξ_1 + η_2ξ_2.

The set

Y_0 = {x ∈ R²: f(x) = η_1ξ_1 + η_2ξ_2 = 0}

is a line through the origin of R² which is normal to the vector y.

6.7.12. Figure E. Half spaces.

If x_1 ∉ Y_0, the hyperplane Y = x_1 + Y_0
is a linear variety which is parallel to Y_0. The hyperplane Y divides R² into two open half spaces Z_1 and Z_2, as depicted in Figure E. It should be noted that each x ∈ X can now be written as x = z + βy, z ∈ Y, where x ∈ Z_1 if β > 0 and x ∈ Z_2 if β < 0. ■
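The half spaces of Definition 6.7.9 are a one-line computation in the setting of the example above. An illustrative Python sketch, with y = (1, −2) and α = 3 chosen arbitrarily by us:

```python
# Assumed data (ours): y = (1, -2) and alpha = 3, so f(x) = xi1 - 2*xi2.
eta = (1.0, -2.0)
alpha = 3.0

def f(x):
    return eta[0] * x[0] + eta[1] * x[1]

def half_space(x):
    # Y1: f(x) < alpha, Y3: f(x) > alpha; the hyperplane Y is f(x) = alpha
    v = f(x)
    if v < alpha:
        return "Y1"
    if v > alpha:
        return "Y3"
    return "Y"

assert half_space((0.0, 0.0)) == "Y1"    # f = 0 < 3
assert half_space((7.0, 1.0)) == "Y3"    # f = 5 > 3
assert half_space((3.0, 0.0)) == "Y"     # f = 3, on the hyperplane
```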
6.8. EXTENSION OF LINEAR FUNCTIONALS
In this section we state and prove the Hahn-Banach theorem. This result is very important in analysis and has significant implications in applications. We point out that the present form of this theorem is not the most general version of the Hahn-Banach theorem. Throughout this section X will denote a real normed linear space.

6.8.1. Definition. Let Y be a linear subspace of X, let Z be a proper linear subspace of Y, let f be a bounded linear functional defined on Z, and let f̃ be a bounded linear functional defined on Y. If f̃(x) = f(x) whenever x ∈ Z, then f̃ is called an extension of f from Z to Y. If ||f||_Z = ||f̃||_Y, then f̃ is called a norm preserving extension of f.

We now prove the following version of the Hahn-Banach theorem.
6.8.2. Theorem. Every bounded linear functional f defined on a linear subspace Y of a real normed linear space X can be extended to the entire space X with preservation of norm. Specifically, one can find a bounded linear functional f̃ such that

(i) f̃(x) = f(x) for every x ∈ Y; and
(ii) ||f̃||_X = ||f||_Y.

Proof. Although this theorem is true for X not separable, we shall give the proof only for the case where X is separable (see Definition 5.4.33 for separability). We assume that Y is a proper linear subspace of X, for otherwise there is nothing to prove. Let x_1 ∈ X but x_1 ∉ Y, and let us define the subset

Y_1 = {x ∈ X: x = αx_1 + y, α ∈ R, y ∈ Y}.

It is straightforward to verify that Y_1 is a linear subspace of X, and furthermore that for each x ∈ Y_1 there is a unique α ∈ R and a unique y ∈ Y such that x = αx_1 + y. If an extension f̃ of f from Y to Y_1 exists, then it has the form

f̃(x) = αf̃(x_1) + f(y),
and if we let c = −f̃(x_1), then f̃(x) = f(y) − cα. From this it is clear that the extension is specified by prescribing the constant c. In order that the
norm of the functional not be increased when it is extended from Y to Y_1, we must find a c such that the inequality

|f(y) − αc| ≤ ||f|| ||y + αx_1||

holds for all y ∈ Y and all real α ≠ 0. If y ∈ Y, then z = y/α ∈ Y, and the above inequality can be written as

|f(αz) − αc| ≤ ||f|| ||αz + αx_1||,

or

|f(z) − c| ≤ ||f|| ||z + x_1||.

This inequality can be rewritten as

−||f|| ||z + x_1|| ≤ f(z) − c ≤ ||f|| ||z + x_1||

or, equivalently, as

f(z) − ||f|| ||z + x_1|| ≤ c ≤ f(z) + ||f|| ||z + x_1||   (6.8.3)

for all z ∈ Y. We now must show that such a number c does indeed always exist. To do this, it suffices to show that for any y_1, y_2 ∈ Y we have

c_1 ≜ f(y_1) − ||f|| ||y_1 + x_1|| ≤ f(y_2) + ||f|| ||y_2 + x_1|| ≜ c_2.   (6.8.4)

But this inequality follows directly from

f(y_1) − f(y_2) ≤ ||f|| ||y_1 − y_2|| = ||f|| ||y_1 + x_1 − x_1 − y_2|| ≤ ||f|| ||y_1 + x_1|| + ||f|| ||y_2 + x_1||.

In view of (6.8.3) and (6.8.4) it follows that a constant c with c_1 ≤ c ≤ c_2 exists. If we now let

f̃(x) = f(y) − αc,   x = αx_1 + y ∈ Y_1,
we have ||f̃|| = ||f||, and f̃ is an extension of f from Y to Y_1.

Next, since X is separable, it contains a denumerable everywhere dense set {x_1, x_2, ..., x_n, ...}. From this set of vectors we select, one at a time, a linearly independent subset of vectors {y_1, y_2, ..., y_n, ...} which belongs to X − Y. The set {y_1, y_2, ..., y_n, ...} together with the linear subspace Y generates a subspace W dense in X. Following the above procedure, we now extend the functional f to a functional f̃ on the subspace W by extending f from Y to Y_1, then to Y_2, etc., where

Y_1 = {x: x = αy_1 + y; y ∈ Y, α ∈ R},
Y_2 = {x: x = αy_2 + y; y ∈ Y_1, α ∈ R},

etc.
Finally, we extend f̃ from the dense subspace W to the space X. At the remaining points of X the functional is defined by continuity. If x ∈ X, then there exists a sequence {w_n} of vectors in W converging to x. By continuity, if lim_{n→∞} w_n = x, then f̃(x) = lim_{n→∞} f̃(w_n). The inequality |f̃(x)| ≤ ||f|| ||x|| follows from

|f̃(x)| = lim_{n→∞} |f̃(w_n)| ≤ lim_{n→∞} ||f|| ||w_n|| = ||f|| ||x||.

This completes the proof of the theorem. ■
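The one-step extension at the heart of the proof can be explored numerically. In the sketch below (illustrative Python; the space R² with the max norm, the subspace Y = {(t, 0)}, and f(t, 0) = t with ||f|| = 1 are choices of ours, not the text's), the interval of admissible constants from (6.8.3) and (6.8.4) is estimated on a grid, and a c chosen from it yields an extension that does not increase the norm.

```python
# Illustrative one-step Hahn-Banach extension in R^2 with the max norm.
# Assumed setup (ours): Y = {(t, 0)}, f(t, 0) = t, ||f|| = 1, x1 = (0, 1).

def sup_norm(v):
    return max(abs(v[0]), abs(v[1]))

norm_f = 1.0
ts = [k / 10.0 for k in range(-100, 101)]   # grid of t with z = (t, 0) in Y

# interval (6.8.3)/(6.8.4): f(z) - ||f|| ||z + x1|| <= c <= f(z) + ||f|| ||z + x1||,
# where z + x1 = (t, 1)
c1 = max(t - norm_f * sup_norm((t, 1.0)) for t in ts)
c2 = min(t + norm_f * sup_norm((t, 1.0)) for t in ts)
assert c1 <= c2                              # a valid c exists

c = 0.5 * (c1 + c2)                          # here c1 = c2 = 0

def f_ext(x):
    # x = (xi1, xi2) = xi2*(0, 1) + (xi1, 0), so f~(x) = f(y) - c*alpha
    return x[0] - c * x[1]

assert all(abs(f_ext((t, 0.0)) - t) < 1e-12 for t in ts)   # extends f
pts = [(a / 7.0, b / 7.0) for a in range(-21, 22) for b in range(-21, 22)]
assert all(abs(f_ext(p)) <= norm_f * sup_norm(p) + 1e-9 for p in pts)  # norm kept
```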
The next result is a direct consequence of Theorem 6.8.2.

6.8.5. Corollary. Let x_0 ∈ X, x_0 ≠ 0. Then there exists a bounded non-zero linear functional f defined on all of X such that f(x_0) = ||x_0|| and ||f|| = 1.

Proof. Let Y be the linear subspace of X given by Y = {y ∈ X: y = αx_0, α ∈ R}. For y ∈ Y, define f_0(y) = α||x_0||, where y = αx_0. Then ||y|| = |α|·||x_0||, and so

|f_0(y)|/||y|| = 1 for all y ∈ Y, y ≠ 0.

This implies that ||f_0|| = 1. The proof now follows from Theorem 6.8.2. ■
The next result is also a consequence of the Hahn-Banach theorem.

6.8.6. Corollary. Let x_0 ∈ X, x_0 ≠ 0, and let γ > 0. Then there exists a bounded non-zero linear functional f defined on all of X such that ||f|| = γ and f(x_0) = ||f||·||x_0||.

The above corollary guarantees the existence of non-trivial bounded linear functionals.

6.8.7. Exercise. Prove Corollary 6.8.6.

In the next example a geometric interpretation of Corollary 6.8.5 is given.
6.8.8. Example. Let x_0 ∈ X, x_0 ≠ 0, and let f be a linear functional defined on X such that f(x_0) = ||x_0|| and ||f|| = 1. Let K be the closed sphere given by K = {x ∈ X: ||x|| ≤ ||x_0||}. Now if x ∈ K, then f(x) ≤ |f(x)| ≤ ||f||·||x|| ≤ ||x_0||, and so x belongs to the half space {x ∈ X: f(x) ≤ ||x_0||}. Thus, the hyperplane {x ∈ X: f(x) = ||x_0||} is tangent to the closed sphere K (as illustrated in Figure F). ■
6.8.9. Figure F. Illustration of Corollary 6.8.5.
In closing this section, we mention two of the more important consequences of the Hahn-Banach theorem with significant practical implications. One of these states that given a convex set Y in X containing an interior point and given a fixed point not in the interior of Y, there is a hyperplane separating the fixed point and the convex set Y. The second of these asserts that if Y_1 and Y_2 are convex sets in X, if Y_1 has interior points, and if Y_2 contains no interior point of Y_1, then there is a closed hyperplane which separates Y_1 and Y_2.
6.9. DUAL SPACE AND SECOND DUAL SPACE
In this section we briefly reconsider the dual space X* (see Definition 6.5.9), and we introduce the dual space of X*, called the second dual space. Throughout this section X is a real normed linear space, and X^f is the algebraic conjugate of X. We begin by determining the dual spaces of some common normed linear spaces.

6.9.1. Example. Let X = R^n, let x = (ξ_1, ..., ξ_n) denote an arbitrary element of R^n, let a = (α_1, ..., α_n) be some fixed element of R^n, and let ||x|| = (ξ_1² + ... + ξ_n²)^{1/2}. Recall from Example 6.5.14 that the functional f(x) = α_1ξ_1 + ... + α_nξ_n is a bounded linear functional on X with ||f|| = ||a||. If we define a set of basis vectors in R^n as e_1 = (1, 0, ..., 0), ..., e_n = (0, ..., 0, 1), then x ∈ R^n may be expressed as x = Σ_{i=1}^n ξ_i e_i. If we let α_i = f(e_i), where f is any bounded linear functional on R^n, then
f(x) = Σ_{i=1}^n α_i ξ_i.

Thus, the dual space X* of R^n is itself the space R^n, in the sense that X* consists of all functionals of the form f(x) = Σ_{i=1}^n α_i ξ_i; i.e., to each f ∈ X* there corresponds an a = (α_1, ..., α_n) ∈ R^n. Furthermore, the norm on X* is given by

||f|| = ||a|| = (Σ_{i=1}^n α_i²)^{1/2}. ■

6.9.2. Exercise. Let X = R^n, where the norm of x = (ξ_1, ..., ξ_n) ∈ X is given by ||x|| = max_{1≤i≤n} |ξ_i| (see Example 6.1.5). Show that if f ∈ X*, then there is an a = (α_1, ..., α_n) ∈ R^n such that f(x) = α_1ξ_1 + ... + α_nξ_n, i.e., X* = R^n, and show that the norm on X* is given by ||f|| = Σ_{i=1}^n |α_i|.
6.9.3. Exercise. Let X = R^n, and define the norm of x = (ξ_1, ..., ξ_n) ∈ X by ||x|| = (|ξ_1|^p + ... + |ξ_n|^p)^{1/p}, where 1 < p < ∞ (see Example 6.1.5). Show that if f ∈ X*, then there is an a = (α_1, ..., α_n) ∈ R^n such that f(x) = α_1ξ_1 + ... + α_nξ_n, i.e., X* = R^n, and show that the norm on X* is given by

||f|| = (|α_1|^q + ... + |α_n|^q)^{1/q},

where q is such that 1/p + 1/q = 1.

6.9.4. Exercise. Let X be the space l_p, 1 ≤ p < ∞, defined in Example 6.1.6, and let q be such that 1/p + 1/q = 1. If p = 1, we take q = ∞. Show that the dual space of l_p is l_q. Specifically, show that every bounded linear functional on l_p is uniquely representable as

f(x) = Σ_{i=1}^∞ α_i ξ_i,

where a = (α_1, ..., α_k, ...) is an element of l_q. Also, show that every element a of l_q defines an element of (l_p)* in the same way, and that

||f|| = (Σ_{i=1}^∞ |α_i|^q)^{1/q}  if 1 < p < ∞,
||f|| = sup_i |α_i|                 if p = 1.
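The dual-norm formula of Exercise 6.9.3 can be spot-checked at the Hölder equality case: for x with ξ_i = sign(α_i)|α_i|^{q−1}, the ratio |f(x)|/||x||_p equals (Σ|α_i|^q)^{1/q}. An illustrative Python sketch, with p = 3 and an arbitrary a chosen by us:

```python
import math

# Assumed data (ours): p = 3 on R^4 and a fixed a defining f(x) = sum(alpha_i * xi_i).
p = 3.0
q = p / (p - 1.0)                        # conjugate exponent, 1/p + 1/q = 1

a = [2.0, -1.0, 0.5, 3.0]

def f(x):
    return sum(ai * xi for ai, xi in zip(a, x))

def pnorm(v, r):
    return sum(abs(c) ** r for c in v) ** (1.0 / r)

# Hoelder equality case: xi_i = sign(alpha_i) * |alpha_i|**(q - 1)
x = [math.copysign(abs(ai) ** (q - 1.0), ai) for ai in a]

# the ratio |f(x)| / ||x||_p attains the claimed dual norm (sum |alpha_i|^q)^(1/q)
ratio = abs(f(x)) / pnorm(x, p)
assert abs(ratio - pnorm(a, q)) < 1e-9
```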
Since X* is a normed linear space (see Theorem 6.5.6), it is possible to form the dual space of X*, which we will denote by X** and which will be referred to as the second dual space of X. As before, we will use the notation x'' for elements of X**, and we will write

x''(x') = ⟨x', x''⟩,

where x' ∈ X*. If X^f denotes the algebraic conjugate of X, then the reader can readily show that even though X* ⊂ X^f and X** ⊂ (X*)^f, in general X** is not a linear subspace of X^{ff}. Let us define a mapping J of X into X** by the relation

⟨x', Jx⟩ = ⟨x, x'⟩,  x ∈ X,  x' ∈ X*,   (6.9.5)

or, equivalently, by

Jx = x'',  x''(x') = x'(x).   (6.9.6)

We call this mapping J the canonical mapping of X into X**. The functional x'' defined on X* in this way is linear, because

x''(αx'_1 + βx'_2) = ⟨x, αx'_1 + βx'_2⟩ = α⟨x, x'_1⟩ + β⟨x, x'_2⟩ = αx''(x'_1) + βx''(x'_2),

and thus x'' ∈ (X*)^f. Since

|x''(x')| = |x'(x)| = |⟨x, x'⟩| ≤ ||x|| ||x'||,

it follows that ||x''|| ≤ ||x||, and thus x'' ∈ X**. We can actually show that ||x''|| = ||x||. This is obvious for x = 0. If x ≠ 0, then in view of Corollary 6.8.6 there exists a non-zero x' ∈ X* such that ⟨x, x'⟩ = ||x|| ||x'||, and thus ||x''|| = ||x||. From this it follows that the norm of every x ∈ X can be defined in two ways: as the norm of an element in X and as the norm of a linear functional on X*, i.e., as the norm of an element in X**. We summarize this discussion in the following result:

6.9.7. Theorem. X is isometric to some linear subspace in X**.
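For X = R^n with the Euclidean norm, where X* is identified with R^n as in Example 6.9.1, the isometry asserted in Theorem 6.9.7 can be checked directly: the supremum of |(Jx)(a)| over ||a|| ≤ 1 is attained at a = x/||x||. A small illustrative sketch (the vector x is our choice):

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

x = [3.0, -4.0, 12.0]          # arbitrary element of R^3; ||x|| = 13

def Jx(a):
    # canonical image of x: (Jx)(x') = <x, x'>, with x' identified with a in R^3
    return dot(x, a)

# the norming functional: a = x / ||x|| has ||a|| = 1 and (Jx)(a) = ||x||,
# while Cauchy-Schwarz gives |(Jx)(a)| <= ||x|| ||a||; hence ||Jx|| = ||x||
a = [xi / norm(x) for xi in x]
assert abs(norm(a) - 1.0) < 1e-12
assert abs(Jx(a) - norm(x)) < 1e-9
```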
If we agree not to distinguish between isometric spaces, then Theorem 6.9.7 can simply be stated as X ⊂ X**.

6.9.8. Definition. A normed linear space X is said to be reflexive if the canonical mapping (6.9.6), J: X → X**, is onto. If we again agree not to distinguish between isometric spaces, we write in this case X** = X. If X** ≠ X, then X is said to be irreflexive.

6.9.9. Example. The space R^n_p, 1 ≤ p ≤ ∞, is reflexive. ■

6.9.10. Example. The spaces l_p, 1 < p < ∞, are reflexive. ■

6.9.11. Example. The space l_1 is irreflexive. ■

6.9.12. Exercise. Prove the assertions made in Examples 6.9.9 through 6.9.11.

6.10. WEAK CONVERGENCE

Having introduced the normed dual space, we are now in a position to consider the notion of weak convergence, a concept which arises frequently in analysis and which plays an important role in certain applications. Throughout this section X denotes a normed linear space and X* is the dual space of X.
6.10.1. Definition. A sequence {x_n} of elements in X is said to converge weakly to the element x ∈ X if for every x' ∈ X*, ⟨x_n, x'⟩ → ⟨x, x'⟩. In this case we write x_n → x weakly. If a sequence {x_n} converges to x ∈ X, i.e., if ||x_n − x|| → 0 as n → ∞, then we call this convergence strong convergence, or convergence in norm, to distinguish it from weak convergence.

6.10.2. Theorem. Let {x_n} be a sequence in X which converges in norm to x ∈ X. Then {x_n} converges weakly to x.

Proof. Assume that ||x_n − x|| → 0 as n → ∞. Then for any x' ∈ X* we have

|⟨x_n, x'⟩ − ⟨x, x'⟩| ≤ ||x'|| ||x_n − x|| → 0 as n → ∞,

and thus x_n → x weakly. ■

Thus, strong convergence implies weak convergence. However, the converse is not true, in general, as the following example shows.

6.10.3. Example. Consider in l_2 the sequence of vectors x_1 = (1, 0, ..., 0, ...), x_2 = (0, 1, 0, ..., 0, ...), x_3 = (0, 0, 1, ..., 0, ...), .... To show that {x_n} converges weakly, we note that every x' ∈ l_2 = X* can be represented as the scalar product with some fixed vector y = (η_1, η_2, ..., η_n, ...); i.e., if x = (ξ_1, ξ_2, ..., ξ_n, ...), then

⟨x, x'⟩ = Σ_{i=1}^∞ ξ_i η_i

(see Exercise 6.9.4). For the case of the sequence {x_n} we now have ⟨x_n, x'⟩ = η_n, and since η_n → 0 as n → ∞ for every y ∈ l_2, it follows that ⟨x_n, x'⟩ → 0 as n → ∞ for every x' ∈ l_2. Thus, {x_n} converges to 0 weakly. However, x_n does not converge to 0 strongly, because ||x_n|| = 1 for every n. ■
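The example above can be mimicked with truncated sequences. The illustrative Python sketch below (the vector y is our choice) pairs the unit vectors x_n against a fixed y ∈ l_2: the values ⟨x_n, x'⟩ = η_n tend to 0, while ||x_n|| = 1 for every n.

```python
import math

N = 200                                      # truncation length (illustrative)
y = [1.0 / (k + 1) for k in range(N)]        # a fixed element of l_2

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def e(n):
    # n-th unit vector x_n, truncated to N coordinates
    return [1.0 if k == n else 0.0 for k in range(N)]

# <x_n, x'> = eta_n, which tends to 0 as n grows ...
vals = [inner(e(n), y) for n in range(N)]
assert vals[0] == 1.0
assert abs(vals[99] - 1.0 / 100.0) < 1e-12

# ... while ||x_n|| = 1 for every n: weak convergence to 0, but not strong
assert all(math.sqrt(inner(e(n), e(n))) == 1.0 for n in range(N))
```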
We leave the proof of the next result as an exercise for the reader.

6.10.4. Theorem. If X is finite dimensional, then weak and strong convergence are equivalent.

6.10.5. Exercise. Prove Theorem 6.10.4.
Analogous to the concept of weak convergence of elements of a normed linear space X, we can introduce the notion of weak convergence of elements of X*.

6.10.6. Definition. A sequence of functionals {x'_n} in X* converges weak-star (i.e., weak*) to the linear functional x' ∈ X* if for every x ∈ X we have ⟨x, x'_n⟩ → ⟨x, x'⟩. In this case we say that x'_n → x' weak*.
Since strong convergence in X* implies weak convergence in X*, it follows that if a sequence of linear functionals {x'_n} in X* converges to the linear functional x' ∈ X*, then x'_n → x' weak*. Let us consider an example.

6.10.7. Example. Let [a, b] be an interval on the real line containing the origin, i.e., a < 0 < b, and let {C[a, b]; ||·||_∞} be the Banach space of real-valued continuous functions as defined in Example 6.1.9. Let {φ_n} be a sequence of functions in C[a, b] satisfying the following conditions for n = 1, 2, ...:

(i) φ_n(t) ≥ 0 for all t ∈ [a, b];
(ii) φ_n(t) = 0 if |t| > 1/n and t ∈ [a, b]; and
(iii) ∫_a^b φ_n(t) dt = 1.

For each n = 1, 2, ..., we can define a continuous linear functional on X (see Example 6.5.13) by

⟨x, x'_n⟩ = ∫_a^b x(t)φ_n(t) dt,

where x ∈ C[a, b]. Now let x'_0 be defined on C[a, b] by

⟨x, x'_0⟩ = x(0)

for all x ∈ C[a, b]. It is clear that x'_0 ∈ X*. We now show that x'_n → x'_0 weak*. By the mean value theorem of the calculus, there is a t_n such that −1/n ≤ t_n ≤ 1/n and

∫_{−1/n}^{1/n} φ_n(t)x(t) dt = x(t_n) ∫_{−1/n}^{1/n} φ_n(t) dt = x(t_n)

for each n = 1, 2, ..., and x ∈ C[a, b]. Thus, ⟨x, x'_n⟩ → x(0) for every x ∈ C[a, b]; i.e., x'_n → x'_0 weak*. We see that the sequence of functions {φ_n} does not approach a limit in C[a, b]. In particular, there is no φ_0 ∈ C[a, b] such that

x(0) = ∫_a^b x(t)φ_0(t) dt.

Frequently, in applications, it is convenient to say the sequence {φ_n} converges to the so-called "δ function," which has this property. We see that the sequence {φ_n} converges to the δ function in the sense of weak* convergence. ■

6.10.8. Theorem. Let X be a separable normed linear space. Every bounded sequence of linear functionals in X* contains a weak* convergent subsequence.
Proof. Since X is separable, we can choose a denumerable everywhere dense set {x_1, x_2, ..., x_n, ...} in X. Now let {x'_n} be a sequence in X* which is bounded in norm. Then {⟨x_1, x'_n⟩} is a bounded sequence in either R or C. It now follows that we can select from {x'_n} a subsequence {x'_{n,1}} such that the sequence {⟨x_1, x'_{n,1}⟩} converges. Again, from the subsequence {x'_{n,1}} we can select another subsequence {x'_{n,2}} such that the sequence {⟨x_2, x'_{n,2}⟩} converges. Continuing this procedure, we obtain the sequences

x'_{1,1}, x'_{2,1}, ..., x'_{k,1}, ...
x'_{1,2}, x'_{2,2}, ..., x'_{k,2}, ...
. . .

By taking the diagonal of the above array, we obtain the subsequence of linear functionals x'_{1,1}, x'_{2,2}, x'_{3,3}, .... For this subsequence, the sequence x'_{1,1}(x_n), x'_{2,2}(x_n), x'_{3,3}(x_n), ... converges for all n. But then x'_{1,1}(x), x'_{2,2}(x), x'_{3,3}(x), ... converges for all x ∈ X. This completes the proof of the theorem. ■

The concepts of weak convergence and weak* convergence give rise to various generalizations, some of which we briefly mention. Let X be a normed linear space, and let X* be its normed dual. We call a set Y ⊂ X* weak* compact if every infinite sequence from Y contains a weak* convergent subsequence. We say that a functional f defined on X, which in general may be non-linear, is weakly continuous at a point x_0 ∈ X if for every ε > 0 there are a δ > 0 and a finite collection {x'_1, ..., x'_n} in X* such that |f(x) − f(x_0)| < ε for all x for which |⟨x − x_0, x'_i⟩| < δ for i = 1, 2, ..., n. We can define weak* continuity of a functional similarly by interchanging the roles of X and X*.
It can be shown that if X is a real normed linear space and X* is its normed dual, then any closed sphere in X* is weak* compact. The reader can readily show that if f is a weakly continuous functional, then x_n → x weakly implies that f(x_n) → f(x).
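The weak* convergence in Example 6.10.7 can be observed with a concrete choice of φ_n, e.g., the box functions φ_n(t) = n/2 on [−1/n, 1/n] (our assumption; any sequence satisfying conditions (i)-(iii) would do). In the illustrative Python sketch below, a midpoint Riemann sum approximates ⟨x, x'_n⟩, which approaches x(0):

```python
import math

def phi(n, t):
    # an assumed concrete choice: box functions, n/2 on [-1/n, 1/n], else 0;
    # these satisfy conditions (i)-(iii) of Example 6.10.7
    return n / 2.0 if abs(t) <= 1.0 / n else 0.0

def pairing(x, n, a=-1.0, b=1.0, steps=200000):
    # midpoint Riemann sum for <x, x_n'> = integral over [a, b] of x(t)*phi_n(t) dt
    h = (b - a) / steps
    return sum(x(a + (k + 0.5) * h) * phi(n, a + (k + 0.5) * h)
               for k in range(steps)) * h

x = math.cos                       # a continuous function on [-1, 1]
vals = [pairing(x, n) for n in (1, 10, 100)]

# <x, x_n'> -> x(0) = 1 as n grows: weak* convergence to the "delta functional"
assert abs(vals[-1] - 1.0) < 1e-3
```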
6.11. INNER PRODUCT SPACES
We recall (see Definition 3.6.19 and the discussion following this definition) that if X is a complex linear space, a function defined on X × X into C, which we denote by (x, y) for x, y ∈ X, is called an inner product if

(i) (x, x) > 0 for all x ≠ 0, and (x, x) = 0 if x = 0;
(ii) (x, y) is the complex conjugate of (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and for all α, β ∈ C; and
(iv) (x, αy + βz) = ᾱ(x, y) + β̄(x, z) for all x, y, z ∈ X and for all α, β ∈ C, where the bar denotes complex conjugation.
In the case of real linear spaces, the preceding characterization of an inner product is identical, except that we omit complex conjugates in (ii) and (iv). We call a complex (real) linear space X on which an inner product (·,·) is defined a complex (real) inner product space, which we denote by {X; (·,·)} (see Definition 3.6.20). If the particular inner product being used in a given discussion is understood, we simply write X to denote the inner product space. In accordance with our discussion following Definition 3.6.20, recall also that different inner products defined on the same linear space yield different inner product spaces. Finally, refer also to the discussion following Definition 3.6.20 for the characterization of an (inner product) subspace.

We have already extensively studied finite-dimensional real inner product spaces, i.e., Euclidean vector spaces, in Sections 4.9 and 4.10. Our subsequent presentation will be in a more general setting, where X need not be finite dimensional and where X may be a complex vector space. In fact, unless otherwise stated, {X; (·,·)} will denote in this section an arbitrary complex inner product space. Since the proofs of several of the following theorems are nearly identical to corresponding ones in Sections 4.9 and 4.10, we will leave such proofs as exercises.

One of our first objectives will be to show that every inner product space {X; (·,·)} has a norm associated with it which is induced by its inner product (·,·). We find it convenient to consider first the Schwarz inequality, given in the following theorem.

6.11.1. Theorem. For any x ∈ X, let us define the function ||·||: X → R by ||x|| = (x, x)^{1/2}. Then for all x, y ∈ X,

|(x, y)| ≤ ||x||·||y||.   (6.11.2)

6.11.3. Exercise. Prove Theorem 6.11.1 (see Theorem 4.9.28).

Using the above results, we can now readily show that the function ||·||: X → R defined by ||x|| = (x, x)^{1/2} is a norm.

6.11.4. Theorem. Let X be an inner product space. Then the function ||·||: X → R defined by

||x|| = (x, x)^{1/2}   (6.11.5)

is a norm; i.e., for every x, y ∈ X and for every α ∈ C, we have

(i) ||x|| ≥ 0;
(ii) ||x|| = 0 if and only if x = 0;
(iii) ||αx|| = |α| ||x||; and
(iv) ||x + y|| ≤ ||x|| + ||y||.

6.11.6. Exercise. Prove Theorem 6.11.4 (see Theorem 4.9.31).
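The Schwarz inequality (6.11.2) and the triangle inequality of Theorem 6.11.4 can be checked numerically for the standard inner product on C³ (the space and the vectors below are our choices, for illustration only):

```python
import math

def inner(x, y):
    # (x, y) = sum xi_i * conj(eta_i): conjugation on the second argument,
    # matching convention (iv) above
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    # induced norm ||x|| = (x, x)^(1/2); (x, x) is real and non-negative
    return math.sqrt(inner(x, x).real)

x = [1 + 2j, -3j, 0.5]
y = [2 - 1j, 1 + 1j, -4.0]

# Schwarz inequality (6.11.2): |(x, y)| <= ||x|| ||y||
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

# triangle inequality for the induced norm (6.11.5)
s = [a + b for a, b in zip(x, y)]
assert norm(s) <= norm(x) + norm(y) + 1e-12
```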
Theorem 6.11.4 allows us to view every inner product space as a normed linear space, provided that we use Eq. (6.11.5) to define the norm on X. Moreover, in view of Theorem 6.1.2, we may view every inner product space as a metric space, provided that we define the metric by ρ(x, y) = ||x − y||.

Subsequently, we adopt the convention that when using the properties and terminology of a normed linear space in connection with an inner product space, we mean the norm induced by the inner product, as given in Eq. (6.11.5). We are now in a position to make the following important definition.

6.11.7. Definition. A complete inner product space is called a Hilbert space.
Thus, every Hilbert space is also a Banach space (and also a complete metric space). Some authors insist that Hilbert spaces be infinite dimensional; we shall not follow that practice. An arbitrary inner product space (not necessarily complete) is sometimes also called a pre-Hilbert space.

6.11.8. Example. Let X be a finite-dimensional (real or complex) inner product space. It follows from Theorem 6.6.5 that X is a Hilbert space. ■

6.11.9. Example. Let l_2 be the (complex) linear space defined in Example 6.1.6. Let x = (ξ_1, ξ_2, ...) ∈ l_2 and y = (η_1, η_2, ...) ∈ l_2, and define (·,·): l_2 × l_2 → C as

(x, y) = Σ_{i=1}^∞ ξ_i η̄_i.

It can readily be shown that (·,·) is an inner product on l_2. Since l_2 is complete relative to the norm induced by this inner product (see Example 6.1.6), it follows that l_2 is a Hilbert space. ■
=
s:
x ( t)y(t) dt.
It is readily verified that this space is a pre- H i lbert space. In view of Example 6.1.9 this space is not complete relative to the norm II x II = (x, X)I/2., and hence it is not a H i lbert space. (b) We extend the space of real-valued functions, pL a[ , bJ, defined in Ex a mple 5.5.31 for the case p = 2, to complex-valued functions to be the set of all functions f: a[ , b] C such that f = u + iv for u, v E 2L .[a, b]. Denoting this space also by 2L .[a, b], we define
(f, g)
= r
G [ J .bl
fgdp,
for f, g ∈ L_2[a, b], where integration is in the Lebesgue sense. The space {L_2[a, b]; (·,·)} is a Hilbert space. ■

In the next example we consider the Cartesian product of Hilbert spaces.
defines an inner product on .X The norm induced on X b y is
Ilxll = where IIXlIII = X is a Hilbert
=
(x, )X I/2
d: IIIX
11f)1/2
I- I
X I)/'2. It is readily verified that X
(XI'
space. _
6.11.12. Exercise.
this inner product
is complete, and thus
Verify the assertions made in Example 6.1 1.11.
In Theorem 6.1.15 we saw that in a normed linear space {X; ||·||} the norm ||·|| is a continuous mapping of X into R. Our next result establishes the continuity of an inner product. In the following, x_n → x implies convergence with respect to the norm induced by the inner product (·,·) on X.

6.11.13. Theorem. Let {x_n} be a sequence in X such that x_n → x, where x ∈ X, and let {y_n} be a sequence in X. Then

(i) (z, x_n) → (z, x) for all z ∈ X;
(ii) (x_n, z) → (x, z) for all z ∈ X;
(iii) ||x_n|| → ||x||; and
(iv) if Σ_{n=1}^∞ y_n is convergent in X, then (Σ_{n=1}^∞ y_n, z) = Σ_{n=1}^∞ (y_n, z) for all z ∈ X.

6.11.14. Exercise. Prove Theorem 6.11.13.
Next, let us recall that two vectors x, y ∈ X are said to be orthogonal if (x, y) = 0 (see Definition 3.6.22). In this case we write x ⊥ y. If Y ⊂ X and x ∈ X is such that x ⊥ y for all y ∈ Y, then we write x ⊥ Y. Also, if Z ⊂ X and Y ⊂ X and if z ⊥ y for all z ∈ Z and all y ∈ Y, then we write Y ⊥ Z. Furthermore, observe that x ⊥ x implies that x = 0. Finally, the notion of inner product allows us to consider the concepts of alignment and colinearity of vectors.

6.11.15. Definition. Let X be an inner product space. The vectors x, y ∈ X are said to be colinear if (x, y) = ±||x||·||y|| and aligned if (x, y) = ||x||·||y||.
Our next result is proved by straightforward computation.

6.11.16. Theorem. For all x, y ∈ X we have

(i) ||x + y||² + ||x − y||² = 2||x||² + 2||y||²; and
(ii) if x ⊥ y, then ||x + y||² = ||x||² + ||y||².

6.11.17. Exercise. Prove Theorem 6.11.16.
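Both identities of Theorem 6.11.16 are easy to check numerically. The sketch below uses random vectors in C⁵ (our own test data); for part (ii) it first manufactures an orthogonal pair by projecting one vector off the other:

```python
import numpy as np

# Numerical check of Theorem 6.11.16 in C^5.  nsq(v) = ||v||^2 = (v, v).
rng = np.random.default_rng(0)
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = rng.normal(size=5) + 1j * rng.normal(size=5)
nsq = lambda v: np.vdot(v, v).real

# (i) parallelogram law
par_err = abs(nsq(x + y) + nsq(x - y) - 2 * nsq(x) - 2 * nsq(y))

# (ii) Pythagorean theorem: yp is y with its component along x removed
yp = y - (np.vdot(x, y) / np.vdot(x, x)) * x
orth = abs(np.vdot(x, yp))                       # yp ⟂ x
pyth_err = abs(nsq(x + yp) - nsq(x) - nsq(yp))
```
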
Parts (i) and (ii) of Theorem 6.11.16 are referred to as the parallelogram law and the Pythagorean theorem, respectively (refer to Theorems 4.9.33 and 4.9.38).

6.11.18. Definition. Let {x_α : α ∈ I} be an indexed set of elements in X, where I is an arbitrary index set (i.e., I is not necessarily the integers). Then {x_α : α ∈ I} is said to be an orthogonal set of vectors if x_α ⊥ x_β for all α, β ∈ I such that α ≠ β. A vector x ∈ X is called a unit vector if ||x|| = 1. An orthogonal set of vectors is called an orthonormal set if every element of the set is a unit vector. Finally, if {xᵢ} is a sequence of elements in X, we define an orthogonal sequence and an orthonormal sequence in an obvious manner.

Using an inductive process we can generalize part (ii) of Theorem 6.11.16 as follows.

6.11.19. Theorem. Let {x₁, …, xₙ} be a finite orthogonal set in X. Then

||∑_{j=1}^n xⱼ||² = ∑_{j=1}^n ||xⱼ||².

We note that if x ≠ 0 and if y = x/||x||, then ||y|| = 1. Hence, it is possible to convert every orthogonal set of vectors into an orthonormal set. Let us now consider a specific example.

6.11.20. Example. Let X denote the space of continuous complex-valued functions on the interval [0, 1]. In accordance with Example 6.11.10, we
define an inner product on X by

(f, g) = ∫_0^1 f(t) ḡ(t) dt.    (6.11.21)

We now show that the set of vectors defined by

fₙ(t) = e^{2πnti}, n = 0, ±1, ±2, …, i = √−1,    (6.11.22)

is an orthonormal set in X. Substituting Eq. (6.11.22) into Eq. (6.11.21), we obtain, for m ≠ n,

(fₙ, fₘ) = ∫_0^1 fₙ(t) f̄ₘ(t) dt = ∫_0^1 e^{2π(n−m)ti} dt = (e^{2π(n−m)i} − 1) / (2π(n − m)i).

Since e^{2πki} = cos 2πk + i sin 2πk = 1 for every integer k, we have

(fₙ, fₘ) = 0, m ≠ n;

i.e., if m ≠ n, then fₙ ⊥ fₘ. On the other hand,

(fₙ, fₙ) = ∫_0^1 e^{2π(n−n)ti} dt = 1;

i.e., if n = m, then (fₙ, fₙ) = 1 and ||fₙ|| = 1. ∎
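The orthonormality computed in Example 6.11.20 can be reproduced numerically. The sketch below approximates the integral (6.11.21) by a midpoint sum (the discretization is our own choice):

```python
import numpy as np

# f_n(t) = exp(2*pi*n*t*i) on [0, 1]; midpoint-rule inner product.
N = 4096
t = (np.arange(N) + 0.5) / N

def f(n):
    return np.exp(2j * np.pi * n * t)

def ip(u, v):
    return np.sum(u * np.conj(v)) / N

unit_err = abs(ip(f(3), f(3)) - 1.0)      # ||f_3|| = 1
orth_err = abs(ip(f(3), f(-2)))           # f_3 ⟂ f_{-2}
```

For exponentials on a uniform grid the midpoint sum is exact up to rounding, so both errors are at machine-precision level.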
The next result arises often in applications.

6.11.23. Theorem. If {x₁, …, xₙ} is a finite orthonormal set in X, then

(i) ∑_{i=1}^n |(x, xᵢ)|² ≤ ||x||² for all x ∈ X;    (6.11.24)

and

(ii) (x − ∑_{i=1}^n (x, xᵢ)xᵢ) ⊥ xⱼ for any j = 1, …, n.

6.11.25. Exercise. Prove Theorem 6.11.23 (see Theorem 4.9.58).
On passing to the limit as n → ∞ in (6.11.24), we obtain the following result.

6.11.26. Theorem. If {xᵢ} is any countable orthonormal set in X, then

∑_i |(x, xᵢ)|² ≤ ||x||²    (6.11.27)

for every x ∈ X.

The relationship (6.11.27) is known as the Bessel inequality. The scalars (x, xᵢ) are called the Fourier coefficients of x with respect to the orthonormal set {xᵢ}. The next result is a generalization of Theorem 4.9.17.
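The Bessel inequality (6.11.27) can be illustrated in a finite-dimensional setting. The sketch below (our own construction) takes the first k columns of a random unitary matrix as a partial orthonormal set in C⁸; the partial sums of squared Fourier coefficients stay below ||x||², with equality once the set is complete:

```python
import numpy as np

# Bessel's inequality checked in C^8: columns of a random unitary Q
# form an orthonormal set; any k of them give a partial orthonormal set.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8)))
x = rng.normal(size=8) + 1j * rng.normal(size=8)

norm_sq = np.vdot(x, x).real
partial_sums = [
    sum(abs(np.vdot(Q[:, i], x))**2 for i in range(k)) for k in range(1, 9)
]
```
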
6.11.28. Theorem. In an inner product space X we have (x, y) = 0 for all x ∈ X if and only if y = 0.

6.11.29. Exercise. Prove Theorem 6.11.28.
From our discussion thus far it should be clear that not every normed linear space can be made into an inner product space. The following theorem gives us sufficient conditions for which a normed linear space is also an inner product space.

6.11.30. Theorem. Let X be a normed linear space. If for all x, y ∈ X,

||x + y||² + ||x − y||² = 2(||x||² + ||y||²),    (6.11.31)

then it is possible to define an inner product on X by

(x, y) = ¼{||x + y||² − ||x − y||² + i||x + iy||² − i||x − iy||²}    (6.11.32)

for all x, y ∈ X, where i = √−1.

6.11.33. Exercise. Prove Theorem 6.11.30.

6.11.34. Corollary. If X is a real normed linear space whose norm satisfies Eq. (6.11.31) for all x, y ∈ X, then it is possible to define an inner product on X by

(x, y) = ¼{||x + y||² − ||x − y||²}

for all x, y ∈ X.

6.11.35. Exercise. Prove Corollary 6.11.34.

In view of part (i) of Theorem 6.11.16 and in view of Theorem 6.11.30, condition (6.11.31) is both necessary and sufficient for a normed linear space to be also an inner product space. Furthermore, it can also be shown that Eq. (6.11.32) uniquely defines the inner product on such a normed linear space. We conclude this section with the following exercise.

6.11.36. Exercise. Let l_p, 1 ≤ p < ∞, be the normed linear space defined in Example 6.1.6. Show that l_p is an inner product space if and only if p = 2.
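Both directions of this characterization can be illustrated numerically. The sketch below checks that the polarization formula (6.11.32) recovers the usual inner product on Cⁿ from its norm, and that the parallelogram law (6.11.31) fails for the l_p norm when p ≠ 2 (cf. Exercise 6.11.36); the specific vectors are our own choices:

```python
import numpy as np

nrm = np.linalg.norm

def polarize(x, y):
    # right-hand side of Eq. (6.11.32)
    return 0.25 * (nrm(x + y)**2 - nrm(x - y)**2
                   + 1j * nrm(x + 1j * y)**2 - 1j * nrm(x - 1j * y)**2)

x = np.array([1 + 1j, 2.0, -1j])
y = np.array([0.5, 1j, 1.0])
pol_err = abs(polarize(x, y) - np.vdot(y, x))   # (x, y) = sum x conj(y)

# parallelogram law gap for the l_p norm, p != 2 (unit basis vectors)
u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lp_gaps = [abs(nrm(u + v, p)**2 + nrm(u - v, p)**2
               - 2 * (nrm(u, p)**2 + nrm(v, p)**2)) for p in (1.0, 3.0)]
```
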
6.12. ORTHOGONAL COMPLEMENTS

In this section we establish some interesting structural properties of Hilbert spaces. Specifically, we will show that any vector x of a Hilbert space X can uniquely be represented as the sum of two vectors y and z, where y is in a subspace Y of X and z is orthogonal to Y. This is known as the projection theorem. In proving this theorem we employ the so-called "classical projection theorem," a result of great importance in its own right. This theorem extends the following familiar result to the case of (infinite-dimensional) Hilbert spaces: in three-dimensional Euclidean space the shortest distance between a point and a plane is along a vector through the point and perpendicular to the plane. Both the classical projection theorem and the projection theorem are of great importance in applications.

Throughout this section, {X; (·, ·)} is a complex inner product space.
6.12.1. Definition. Let Y be a non-void subset of X. The set of all vectors orthogonal to Y, denoted by Y⊥, is called the orthogonal complement of Y. The orthogonal complement of Y⊥ is denoted by (Y⊥)⊥ ≜ Y⊥⊥, the orthogonal complement of Y⊥⊥ is denoted by (Y⊥⊥)⊥ ≜ Y⊥⊥⊥, etc.

6.12.2. Example. Let X be the space E³ depicted in Figure G, and let Y be the x₁-axis. Then Y⊥ is the x₂x₃-plane, Y⊥⊥ is the x₁-axis, Y⊥⊥⊥ is again the x₂x₃-plane, etc. Thus, in the present case, Y⊥⊥ = Y, Y⊥⊥⊥ = Y⊥, Y⊥⊥⊥⊥ = Y⊥⊥, etc. ∎

6.12.3. Figure G. [The x₁-axis Y and its orthogonal complement Y⊥, the x₂x₃-plane, in E³.]
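Example 6.12.2 can be reproduced in coordinates. The sketch below (our own construction) computes orthogonal complements in R³ via the SVD null space and confirms that complementing twice returns the original axis:

```python
import numpy as np

# Y = the x1-axis; Y⊥ should be the x2x3-plane, and (Y⊥)⊥ = Y again.
# complement(B) returns an orthonormal basis of the orthogonal
# complement of the column span of B.
def complement(B):
    _, s, Vt = np.linalg.svd(B.T)
    rank = int(np.sum(s > 1e-12))
    return Vt[rank:].T            # null-space rows of B.T, as columns

e1 = np.array([[1.0], [0.0], [0.0]])
Yp = complement(e1)               # 3 x 2: spans the x2x3-plane
Ypp = complement(Yp)              # 3 x 1: back to the x1-axis
```
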
We now state and prove several properties of the orthogonal complement. The proof of the first result is left as an exercise.

6.12.4. Theorem. In an inner product space X, {0}⊥ = X and X⊥ = {0}.

6.12.5. Exercise. Prove Theorem 6.12.4.

6.12.6. Theorem. Let Y be a non-void subset of X. Then Y⊥ is a closed linear subspace of X.

Proof. If x, y ∈ Y⊥, then (x, z) = 0 and (y, z) = 0 for all z ∈ Y. Hence, (αx + βy, z) = α(x, z) + β(y, z) = 0, and thus (αx + βy) ⊥ z for all z ∈ Y, or (αx + βy) ∈ Y⊥. Therefore, Y⊥ is a linear subspace of X.

To show that Y⊥ is closed, assume that x₀ is a point of accumulation of Y⊥. Then there is a sequence {xₙ} from Y⊥ such that ||xₙ − x₀|| → 0 as n → ∞. By Theorem 6.11.13 we have 0 = (xₙ, z) → (x₀, z) as n → ∞ for all z ∈ Y. Therefore x₀ ∈ Y⊥ and Y⊥ is closed. ∎
Before considering the next result we require the following concept.

6.12.7. Definition. Let Y be a non-void subset of X, and let V(Y) be the linear subspace generated by Y (see Definition 3.3.6). Let V̄(Y) denote the closure of V(Y). We call V̄(Y) the closed linear subspace generated by Y.

Note that in view of Theorem 6.2.3, V̄(Y) is indeed a linear subspace of X.

6.12.8. Theorem. Let Y and Z be non-void subsets of X. Then

(i) either Y ∩ Y⊥ = ∅ or Y ∩ Y⊥ = {0};
(ii) Y ⊂ Y⊥⊥;
(iii) if Y ⊂ Z, then Z⊥ ⊂ Y⊥;
(iv) Y⊥ = Y⊥⊥⊥; and
(v) Y⊥⊥ is the smallest closed linear subspace of X which contains Y; i.e., Y⊥⊥ = V̄(Y).

Proof. To prove part (i), assume that Y ∩ Y⊥ ≠ ∅, and let x ∈ Y ∩ Y⊥. Then x ∈ Y and x ∈ Y⊥, and so (x, x) = 0. This implies that x = 0.

The proof of part (ii) is left as an exercise.

To prove part (iii), let y ∈ Z⊥. Then y ⊥ x for all x ∈ Z. Since Z ⊃ Y, it follows that y ⊥ x for all x ∈ Y. Thus, y ∈ Y⊥ whenever y ∈ Z⊥, and Y⊥ ⊃ Z⊥.

To prove part (iv) we note that, by part (ii) of this theorem, Y⊥ ⊂ (Y⊥)⊥⊥ = Y⊥⊥⊥. On the other hand, since Y ⊂ Y⊥⊥, by part (iii) of this theorem, Y⊥ ⊃ Y⊥⊥⊥. Thus, Y⊥ = Y⊥⊥⊥.

The proof of part (v) is also left as an exercise. ∎

6.12.9. Exercise. Prove parts (ii) and (v) of Theorem 6.12.8.
In view of part (iv) of the above theorem, we can write Y⊥ = Y⊥⊥⊥ = Y⊥⊥⊥⊥⊥ = ⋯, and Y⊥⊥ = Y⊥⊥⊥⊥ = Y⊥⊥⊥⊥⊥⊥ = ⋯. Before giving the classical projection theorem, we state and prove the following preliminary result.

6.12.10. Theorem. Let Y be a linear subspace of X, and let x be an arbitrary vector in X. Let

δ = inf{||y − x|| : y ∈ Y}.

If there exists a y₀ ∈ Y such that ||y₀ − x|| = δ, then y₀ is unique. Moreover, y₀ ∈ Y is the unique element in Y such that ||y₀ − x|| = δ if and only if (x − y₀) ⊥ Y.
Proof. Let us first show that if ||y₀ − x|| = δ, then (x − y₀) ⊥ Y. In doing so we assume to the contrary that there is a y ∈ Y not orthogonal to x − y₀. We also assume, without loss of generality, that y is a unit vector and that (x − y₀, y) = α ≠ 0. Defining a vector z ∈ Y as z = y₀ + αy, we have

||x − z||² = ||x − y₀ − αy||² = (x − y₀ − αy, x − y₀ − αy)
          = (x − y₀, x − y₀) − (x − y₀, αy) − (αy, x − y₀) + (αy, αy)
          = ||x − y₀||² − |α|² − |α|² + |α|²||y||²
          = ||x − y₀||² − |α|²;

i.e., ||x − z|| < ||x − y₀||. From this it follows that if x − y₀ is not orthogonal to every y ∈ Y, then ||y₀ − x|| ≠ δ. This completes the first part of the proof.

Next, assume that (x − y₀) ⊥ Y. We must show that y₀ is the unique vector such that ||x − y|| > ||x − y₀|| for all y ≠ y₀. For any y ∈ Y we have, in view of part (ii) of Theorem 6.11.16,

||x − y||² = ||x − y₀ + y₀ − y||² = ||x − y₀||² + ||y₀ − y||².

From this it follows that ||x − y|| > ||x − y₀|| for all y ≠ y₀. This completes the proof of the theorem. ∎
In Figure H the meaning of Theorem 6.12.10 is illustrated pictorially for a subset Y of E³.

6.12.11. Figure H. [The vector x, its best approximation y₀ in Y, and the error vector x − y₀ perpendicular to Y.]

The preceding theorem does not ensure the existence of the vector y₀. However, if we require in Theorem 6.12.10 that Y be a closed linear subspace in a Hilbert space X, then the existence of the unique vector y₀ is guaranteed.
This important result, which we will prove below, is called the classical projection theorem.

6.12.12. Theorem. Let X be a Hilbert space, and let Y be a closed linear subspace of X. Let x be an arbitrary vector in X, and let

δ = inf{||y − x|| : y ∈ Y}.

Then there exists a unique vector y₀ ∈ Y such that ||y₀ − x|| = δ. Moreover, y₀ ∈ Y is the unique vector such that ||y₀ − x|| = inf{||y − x|| : y ∈ Y} if and only if the vector (x − y₀) ⊥ Y.

Proof. In view of Theorem 6.12.10 we only have to establish the existence of a vector y₀ ∈ Y such that ||x − y₀|| = δ. Assume that x ∉ Y (if x ∈ Y, then x = y₀ and we are done). Since δ is the infimum of ||y − x|| for all y ∈ Y, there is a sequence {yₙ} in Y such that ||x − yₙ|| → δ as n → ∞. We now show that {yₙ} is a Cauchy sequence. By part (i) of Theorem 6.11.16 we have

||(yₘ − x) + (x − yₙ)||² + ||(yₘ − x) − (x − yₙ)||² = 2||yₘ − x||² + 2||x − yₙ||².

This equation yields, after some straightforward manipulations, the relation

||yₘ − yₙ||² = 2||yₘ − x||² + 2||x − yₙ||² − 4||x − (yₘ + yₙ)/2||².

Since Y is a linear subspace, it follows that for each yₘ, yₙ ∈ Y we have (yₘ + yₙ)/2 ∈ Y. Thus, ||x − (yₘ + yₙ)/2|| ≥ δ and

||yₘ − yₙ||² ≤ 2||yₘ − x||² + 2||x − yₙ||² − 4δ².

Also, since ||yₘ − x||² → δ² as m → ∞, it follows that ||yₘ − yₙ||² → 0 as m, n → ∞. Hence, {yₙ} is a Cauchy sequence. Since Y is a closed linear subspace of a Hilbert space, it is itself a Hilbert space, and as such {yₙ} has a limit y₀ ∈ Y. Finally, by the continuity of the norm (see Theorem 6.1.15), it follows that lim ||x − yₙ|| = ||x − y₀|| = δ. This proves the theorem. ∎

The next result is a consequence of the preceding theorem.

6.12.13. Theorem. If Y and Z are closed linear subspaces of a Hilbert space X, if Y ⊂ Z, and if Y ≠ Z, then there exists a non-zero vector in Z, say z, such that z ⊥ Y.

Proof. Let x be any vector in Z which is not in Y (there is one such vector by hypothesis). If we define δ as above, i.e., δ = inf{||y − x|| : y ∈ Y}, then there exists by Theorem 6.12.12 a vector y₀ ∈ Y such that ||x − y₀|| = δ. Now let z = y₀ − x. Then z ⊥ Y by Theorem 6.12.12. ∎
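The classical projection theorem can be exercised numerically in a finite-dimensional setting, where every subspace is closed. The sketch below (our own data, in R⁴) computes the minimizer via least squares and verifies both the orthogonality of the error vector and its minimality:

```python
import numpy as np

# Y = column span of A (a closed subspace of R^4); x arbitrary.
# The least-squares solution gives the unique minimizer y0 of ||x - y||.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 2))
x = rng.normal(size=4)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)
y0 = A @ coef
residual = A.T @ (x - y0)          # inner products (x - y0, y_i)

# distances from x to randomly chosen elements of Y, for comparison
others = [np.linalg.norm(x - A @ rng.normal(size=2)) for _ in range(100)]
```
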
From part (ii) of Theorem 6.12.8 we have, in general, Y⊥⊥ ⊃ Y. Under certain conditions equality holds.

6.12.14. Theorem. Let Y be a linear subspace of a Hilbert space X. Then Ȳ = Y⊥⊥.

Proof. From part (ii) of Theorem 6.12.8 we have Y ⊂ Y⊥⊥. Since Y⊥⊥ is closed by Theorem 6.12.6, it follows that Ȳ ⊂ Y⊥⊥. For purposes of contradiction, let us now assume that Ȳ ≠ Y⊥⊥. Then Theorem 6.12.13 establishes the existence of a vector z ∈ Y⊥⊥ such that z ≠ 0 and such that z ⊥ Ȳ. Thus, z ∈ Ȳ⊥. Since Y ⊂ Ȳ, it follows that z ∈ Y⊥. Therefore, we have z ∈ Y⊥ ∩ Y⊥⊥ and z ≠ 0, which is a contradiction to part (i) of Theorem 6.12.8. Hence, we must have Ȳ = Y⊥⊥. ∎

We note that if, in particular, Y is a closed linear subspace of X, then Y = Y⊥⊥. In connection with the next result, recall the definition of the sum of two subsets of X (see Definition 3.2.8).

6.12.15. Theorem. If Y and Z are closed linear subspaces of a Hilbert space X, and if Y ⊥ Z, then Y + Z is a closed linear subspace of X.

Proof. In view of Theorem 3.2.10, Y + Z is a linear subspace of X. To show that Y + Z is closed, it suffices to show that if u is a point of accumulation of Y + Z, then u = y + z for some y ∈ Y and some z ∈ Z. Let u be a point of accumulation of Y + Z. Then there is a sequence of vectors {uₙ} in Y + Z with ||uₙ − u|| → 0 as n → ∞, and in this sequence we have, for each n, uₙ = yₙ + zₙ with yₙ ∈ Y and zₙ ∈ Z. By the Pythagorean theorem (see Theorem 6.11.16) we have

||uₙ − uₘ||² = ||yₙ − yₘ + zₙ − zₘ||² = ||yₙ − yₘ||² + ||zₙ − zₘ||².

But ||uₙ − uₘ|| → 0 as m, n → ∞, because {uₙ}, having a limit, is a Cauchy sequence. Therefore, ||yₙ − yₘ|| → 0 and ||zₙ − zₘ|| → 0 as m, n → ∞. But this implies that the sequences {yₙ}, {zₙ} are also Cauchy sequences. Since Y and Z are closed, these sequences have limits y ∈ Y and z ∈ Z, respectively. Finally, we note that

||uₙ − (y + z)|| = ||yₙ − y + zₙ − z|| ≤ ||yₙ − y|| + ||zₙ − z|| → 0

as n → ∞. Therefore, since {uₙ} cannot approach two distinct limits, we have u = y + z. This completes the proof. ∎
Before proceeding to the next result, we recall from Definition 3.2.13 that a linear space X is the direct sum of two linear subspaces Y and Z if for every x ∈ X there is a unique y ∈ Y and a unique z ∈ Z such that x = y + z. We write, in this case, X = Y ⊕ Z. The following result is known as the projection theorem.

6.12.16. Theorem. If Y is a closed linear subspace of a Hilbert space X, then X = Y ⊕ Y⊥.

Proof. Let Z = Y + Y⊥. By hypothesis, Y is a closed linear subspace, and so is Y⊥ in view of Theorem 6.12.6. From the previous result it now follows that Z is also a closed linear subspace.

Next, we show that Z = X. Since Y ⊂ Z and Y⊥ ⊂ Z, it follows from part (iii) of Theorem 6.12.8 that Z⊥ ⊂ Y⊥ and also that Z⊥ ⊂ Y⊥⊥, so that Z⊥ ⊂ Y⊥ ∩ Y⊥⊥. But from part (i) of Theorem 6.12.8 we have Y⊥ ∩ Y⊥⊥ = {0}. Therefore, the zero vector is the only element in both Y⊥ and Y⊥⊥, and thus Z⊥ = {0}. Since Z is a closed linear subspace, we have from Theorems 6.12.4 and 6.12.14,

Z = Z⊥⊥ = (Z⊥)⊥ = {0}⊥ = X.

We have thus shown that we can represent every x ∈ X as the sum x = y + z, where y ∈ Y and z ∈ Y⊥. To show that this representation is unique we consider x = y₁ + z₁ and x = y₂ + z₂, where y₁, y₂ ∈ Y and z₁, z₂ ∈ Y⊥. Then (x − x) = 0 = y₁ − y₂ + z₁ − z₂, or y₁ − y₂ = z₂ − z₁. Now clearly (y₁ − y₂) ∈ Y and (z₂ − z₁) ∈ Y⊥. Since y₁ − y₂ = z₂ − z₁, we also have (y₁ − y₂) ∈ Y⊥ and (z₂ − z₁) ∈ Y. From this it follows that y₁ − y₂ = z₂ − z₁ = 0; i.e., y₁ = y₂ and z₁ = z₂. Therefore, the representation of x is unique. ∎

The above theorem allows us to write any vector x of a Hilbert space X as the sum of two vectors y and z; i.e., x = y + z, where y is in a closed linear subspace Y of X and z is in Y⊥. It is this theorem which gave rise to the expression orthogonal complement. If X is a Hilbert space, if Y is a closed linear subspace of X, and if x = y + z, where y ∈ Y and z ∈ Y⊥, then we define the mapping P as

Px = y.

We call the function P the projection of x onto Y. Note that P(Px) ≜ P²x = Py = y; i.e., P² = P. We will examine the properties of projections in greater detail in the next chapter. (Refer also to Definition 3.7.1 and Theorem 3.7.4.)
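For a finite-dimensional subspace the projection P has an explicit matrix. The sketch below uses the standard formula P = A(AᵀA)⁻¹Aᵀ for Y = range(A) in R⁵ (a standard construction, not derived in the text above) and checks idempotence and the orthogonal decomposition x = Px + (I − P)x:

```python
import numpy as np

# Y = range(A) in R^5; P projects orthogonally onto Y.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T

x = rng.normal(size=5)
y, z = P @ x, x - P @ x            # x = y + z with y ∈ Y, z ∈ Y⊥
```
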
6.13. FOURIER SERIES

In the previous section we examined some of the structural properties of Hilbert spaces. Presently, we will concern ourselves with the representation of elements in Hilbert space. We will see that the vectors of a Hilbert space can under certain conditions be represented as a linear combination of a finite or infinite number of vectors from an orthonormal set. In this connection we will touch upon the concept of basis in Hilbert space. The property which makes all this possible is, of course, the inner product.

Much of the material in this section is concerned with an abstract approach to the topic of Fourier series. Since the reader is probably already familiar with certain facets of Fourier analysis, he or she is now in a position to recognize the power and the beauty of the abstract approach.

Throughout this section {X; (·, ·)} is a complex inner product space, and convergence of an infinite series is to be understood in the sense of Definition 6.3.1.

We now consider the representation of a vector y of a finite-dimensional linear subspace Y in an inner product space.

6.13.1. Theorem. Let X be an inner product space, let {y₁, …, yₙ} be a finite orthonormal set in X, and let Y be the linear subspace of X generated by {y₁, …, yₙ}. Then the vectors {y₁, …, yₙ} form a basis for Y, and moreover, in the representation of a vector y ∈ Y by the sum

y = α₁y₁ + ⋯ + αₙyₙ,

the coefficients αᵢ are specified by

αᵢ = (y, yᵢ), i = 1, …, n.

6.13.2. Exercise. Prove Theorem 6.13.1. (Refer to Theorems 4.9.44 and 4.9.51.)
We now generalize the preceding result.

6.13.3. Theorem. Let X be a Hilbert space and let {xᵢ} be a countably infinite orthonormal sequence in X. A series ∑_{i=1}^∞ αᵢxᵢ is convergent to an element x ∈ X if and only if

∑_{i=1}^∞ |αᵢ|² < ∞.

In this case we have the relation

αᵢ = (x, xᵢ), i = 1, 2, ….

Proof. Let sₙ = ∑_{i=1}^n αᵢxᵢ. If n > m, then

||sₙ − sₘ||² = ||∑_{i=m+1}^n αᵢxᵢ||² = ∑_{i=m+1}^n |αᵢ|².

Assume that ∑_{i=1}^∞ |αᵢ|² < ∞. Then ∑_{i=m+1}^n |αᵢ|² → 0 as n, m → ∞. Therefore, {sₙ} is a Cauchy sequence, and as such it has a limit, say x, in the Hilbert space X. Thus lim sₙ = x. Conversely, if {sₙ} converges, then it is a Cauchy sequence and ||sₙ − sₘ|| → 0 as n, m → ∞. From this it follows that ∑_{i=m+1}^n |αᵢ|² → 0 as n, m → ∞, and hence ∑_{i=1}^∞ |αᵢ|² < ∞.

Now assume that ∑_{i=1}^∞ |αᵢ|² < ∞, and let x = lim sₙ. We must show that αᵢ = (x, xᵢ). From Theorem 6.13.1 we have αᵢ = (sₙ, xᵢ), i = 1, …, n. But sₙ → x, and hence by the continuity of the inner product we have (sₙ, xᵢ) → (x, xᵢ) as n → ∞. Therefore, αᵢ = (x, xᵢ), which completes the proof. ∎

In the next result we use the concept of closed linear subspace generated by a set (see Definition 6.12.7).

6.13.4. Theorem. Let {xᵢ} be an orthonormal sequence in a Hilbert space X, and let Y be the closed linear subspace generated by {xᵢ}. Corresponding to each x ∈ X the series

∑_{i=1}^∞ (x, xᵢ)xᵢ    (6.13.5)

converges to an element x̄ ∈ Y. Moreover, (x − x̄) ⊥ Y.

6.13.6. Exercise. Prove Theorem 6.13.4. (Hint: Utilize Theorems 6.11.26 and 6.13.3, and the continuity of the inner product.)
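Theorem 6.13.4 can be illustrated on a finite grid, where the sampled exponentials are exactly orthonormal. In the sketch below (the grid, the square wave, and the truncation range are our own choices), the truncated series (6.13.5) lies in the span Y of the chosen orthonormal vectors and the residual is orthogonal to Y:

```python
import numpy as np

# Grid of [0, 1) with the normalized inner product (u, v) = (1/N) sum u conj(v);
# the sampled exponentials exp(2*pi*i*k*t) are orthonormal in it.
N = 256
t = np.arange(N) / N
x = (t < 0.5).astype(float)                 # a square wave
basis = [np.exp(2j * np.pi * k * t) for k in range(-5, 6)]
ip = lambda u, v: np.sum(u * np.conj(v)) / N

xbar = sum(ip(x, b) * b for b in basis)     # truncated series (6.13.5)
resid_ips = [abs(ip(x - xbar, b)) for b in basis]   # (x - x̄, x_i), all ~ 0
```
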
A more general version of Theorem 6.13.4 can be established by replacing the orthonormal sequence {xᵢ} by an arbitrary orthonormal set Z.

In view of Theorem 6.13.4, any element x of a Hilbert space X can unambiguously be represented by a series of the form (6.13.5), provided that the closed linear subspace Y generated by the orthonormal sequence {xᵢ} is equal to the space X. The scalars (x, xᵢ) in (6.13.5) are called Fourier coefficients of x with respect to the {xᵢ}.

6.13.7. Definition. Let X be a Hilbert space. An orthonormal set Y in X is said to be complete if there exists no orthonormal set of which Y is a proper subset.

The next result enables us to characterize complete orthonormal sets.

6.13.8. Theorem. Let X be a Hilbert space, and let Y be an orthonormal set in X. Then the following statements are equivalent:

(i) Y is complete;
(ii) if (x, y) = 0 for all y ∈ Y, then x = 0; and
(iii) V̄(Y) = X.

6.13.9. Exercise. Prove Theorem 6.13.8 for the case where Y is an orthonormal sequence {xᵢ}.

As a specific example of a complete orthonormal set, we consider the set of elements e₁ = (1, 0, …, 0, …), e₂ = (0, 1, 0, …, 0, …), e₃ = (0, 0, 1, 0, …, 0, …), … in the Hilbert space l₂ (see Example 6.11.9). It is readily verified that Y = {eᵢ} is an orthonormal set in l₂. Now let x = (ξ₁, ξ₂, …, ξᵢ, …) ∈ l₂, and corresponding to x let

xₖ = ∑_{i=1}^k ξᵢeᵢ.

Then

||x − xₖ||² = ∑_{i=k+1}^∞ |ξᵢ|²,

and thus lim_{k→∞} ||x − xₖ|| = 0. Hence, V̄(Y) = l₂ and Y is complete by the preceding theorem.
Many of the subsequent results involving countable orthonormal sets may be shown to hold for uncountable orthonormal sets as well (refer to Definition 1.2.48). The proofs of these generalized results usually require a postulate known as Zorn's lemma. (Consult the references cited at the end of this chapter for a discussion of this lemma.) Although the proofs of such generalized results are not particularly difficult, they do involve an added level of abstraction which we do not wish to pursue in this book. In connection with generalized results of this type, it is also necessary to use the notion of cardinal number of a set, introduced at the end of Section 1.2.

The next result is known as Parseval's formula (refer also to Corollary 4.9.49).

6.13.10. Theorem. Let X be a Hilbert space and let the sequence {xᵢ} be orthonormal in X. Then

||x||² = ∑_{i=1}^∞ |(x, xᵢ)|²    (6.13.11)

for every x ∈ X if and only if the sequence {xᵢ} is complete.

Proof. Assume to the contrary that the sequence {xᵢ} is not complete. Then there exists some z ≠ 0 such that (z, xᵢ) = 0 for all i. Thus, there exists a z ∈ X such that

||z||² ≠ ∑_{i=1}^∞ |(z, xᵢ)|².

This proves the first part.

Now assume that the sequence {xᵢ} is complete. In view of Theorems 6.13.4 and 6.13.8 we have

x = ∑_{i=1}^∞ (x, xᵢ)xᵢ = ∑_{i=1}^∞ αᵢxᵢ.

Since {xᵢ} is orthonormal, we obtain

||x||² = (∑_{i=1}^∞ αᵢxᵢ, ∑_{j=1}^∞ αⱼxⱼ) = ∑_{i=1}^∞ ∑_{j=1}^∞ αᵢᾱⱼ(xᵢ, xⱼ) = ∑_{i=1}^∞ |αᵢ|².

This completes the proof. ∎
A more general version of Theorem 6.13.10 can be established by replacing the orthonormal sequence by an orthonormal set.

The next result, known as the Gram-Schmidt procedure, allows us to construct orthonormal sets in inner-product spaces (compare with Theorem 4.9.55).
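Before turning to the Gram-Schmidt procedure, Parseval's formula (6.13.11) can be illustrated numerically. The sketch below uses the rows of the unitary discrete Fourier transform as a complete orthonormal basis of Cᴺ (our finite-dimensional stand-in for a complete orthonormal sequence):

```python
import numpy as np

# Coefficients of x with respect to the unitary DFT basis; for this
# complete orthonormal set, ||x||^2 = sum |(x, x_i)|^2.
N = 128
rng = np.random.default_rng(4)
x = rng.normal(size=N) + 1j * rng.normal(size=N)
coeffs = np.fft.fft(x) / np.sqrt(N)         # unitary normalization
parseval_gap = abs(np.sum(np.abs(coeffs)**2) - np.vdot(x, x).real)
```
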
6.13.12. Theorem. Let X be an inner-product space, and let {xᵢ} be a finite or a countably infinite sequence of linearly independent vectors. Then there exists an orthonormal sequence {yᵢ} having the same cardinal number as the sequence {xᵢ} and generating the same linear subspace as {xᵢ}.

Proof. Since x₁ ≠ 0, let us define y₁ as

y₁ = x₁ / ||x₁||.

It is clear that y₁ and x₁ generate the same linear subspace. Next, let

z₂ = x₂ − (x₂, y₁)y₁.

Since

(z₂, y₁) = (x₂ − (x₂, y₁)y₁, y₁) = (x₂, y₁) − (x₂, y₁)(y₁, y₁) = 0,

it follows that z₂ ⊥ y₁. We now let y₂ = z₂/||z₂||. Note that z₂ ≠ 0, because x₂ and y₁ are linearly independent. Also, y₁ and y₂ generate the same linear subspace as x₁ and x₂, because y₂ is a linear combination of x₁ and x₂. Proceeding in the fashion described above, we define z₂, z₃, … and y₂, y₃, … recursively as

zₙ = xₙ − ∑_{i=1}^{n−1} (xₙ, yᵢ)yᵢ

and

yₙ = zₙ / ||zₙ||.

As before, we can readily verify that zₙ ⊥ yᵢ for all i < n, that zₙ ≠ 0, and that the {yᵢ}, i = 1, …, n, generate the same linear subspace as the {xᵢ}, i = 1, …, n. If the set {xᵢ} is finite, the process terminates. Otherwise it is continued indefinitely by induction.
The sequence {yᵢ} thus constructed can be put into a one-to-one correspondence with the sequence {xᵢ}. Therefore, these sequences have the same cardinal number. ∎

The following result can be established by use of Zorn's lemma.

6.13.13. Theorem. Let X be an inner product space containing a non-zero element. Then X contains a complete orthonormal set. If Y is any orthonormal set in X, then there is a complete orthonormal set containing Y as a subset.

Indeed, it is also possible to prove the following result: if Y₁ and Y₂ are two complete orthonormal sets in an inner product space, then Y₁ and Y₂ have the same cardinal number, so that a one-to-one mapping of set Y₁ onto set Y₂ can be established. This result, along with Theorem 6.13.13, allows us to conclude that with each Hilbert space X there is associated in a natural way a cardinal number κ. This, in turn, enables us to consider κ as the dimension of the Hilbert space X. For the case of finite-dimensional spaces this concept and the usual definition of dimension coincide. However, in general, these two notions are not to be viewed as one and the same concept.

Next, recall that in Chapter 5 we defined a metric space X to be separable if there is a countable subset everywhere dense in X (see Definition 5.4.33). Since normed linear spaces and inner product spaces are also metric spaces, we speak also of separable Banach spaces and separable Hilbert spaces. In the case of Hilbert spaces, we can characterize separability in the following equivalent way.

6.13.14. Theorem. A Hilbert space X is separable if and only if it contains a complete orthonormal sequence.

6.13.15. Exercise. Prove Theorem 6.13.14.
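The recursion in the proof of the Gram-Schmidt procedure (Theorem 6.13.12) translates directly into code. A sketch for vectors in C⁴, with random linearly independent inputs of our own choosing:

```python
import numpy as np

def gram_schmidt(xs):
    # xs: linearly independent vectors; returns orthonormal ys spanning
    # the same subspace (the recursion z_n = x_n - sum (x_n, y_i) y_i,
    # y_n = z_n / ||z_n|| of Theorem 6.13.12).
    ys = []
    for xv in xs:
        z = xv - sum(np.vdot(yv, xv) * yv for yv in ys)
        ys.append(z / np.linalg.norm(z))
    return ys

rng = np.random.default_rng(5)
xs = [rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(3)]
ys = gram_schmidt(xs)
gram = np.array([[np.vdot(a, b) for b in ys] for a in ys])  # should be I
```
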
Since in a separable Hilbert space X with a complete orthonormal sequence {xᵢ} one can represent every x ∈ X as

x = ∑_{i=1}^∞ (x, xᵢ)xᵢ,

we refer to a complete orthonormal sequence {xᵢ} in a separable Hilbert space X as a basis for X. Caution should be taken here not to confuse this concept with the definition of basis introduced in Chapter 3 (see Definitions 3.3.6 and 3.3.22). In that case we defined each x in a vector space to have a representation as a finite linear combination of vectors xᵢ. Indeed, the concept of Hamel basis (see Definition 3.3.22), which is a purely algebraic concept, is of very little value in spaces which are not finite dimensional. In such spaces, an orthonormal basis as defined above is much more useful.

We conclude this section with the following result.

6.13.16. Theorem. Let Y be an orthonormal set in a separable Hilbert space X. Then Y is either a finite set or a countably infinite set.

6.13.17. Exercise. Prove Theorem 6.13.16.

6.14. THE RIESZ REPRESENTATION THEOREM
In this section we state and prove an important result known as the Riesz representation theorem. A direct consequence of this theorem is that the dual space X* of a Hilbert space X is itself a Hilbert space. Throughout this section, {X; (·, ·)} is a Hilbert space.

We begin by first noting that for a fixed y ∈ X,

f(x) = (x, y)    (6.14.1)

is a linear functional in x. By means of (6.14.1), distinct vectors y ∈ X are associated with distinct functionals. From the Schwarz inequality we have

|(x, y)| ≤ ||x|| ||y||.

Hence, ||f|| ≤ ||y|| and f is bounded (i.e., f ∈ X*). From this it follows that if X is a Hilbert space, then bounded linear functionals are determined by the elements of X itself. In the next theorem we show that every element y of X determines a unique bounded linear functional f (i.e., a unique element of X*) of the form (6.14.1) and that ||f|| = ||y||. From this we conclude that the dual space X* of the Hilbert space X is itself a Hilbert space. (Compare the following with Theorem 4.9.63.)

6.14.2. Theorem. (Riesz) Let f be a bounded linear functional on X. Then there is a unique y ∈ X such that f(x) = (x, y) for all x ∈ X. Moreover, ||f|| = ||y||, and every y determines a unique element of the dual space X* in this way.
Proof. For fixed y ∈ X, define the linear functional f on X by Eq. (6.14.1). From the Schwarz inequality we have |f(x)| = |(x, y)| ≤ ||y|| ||x||, so that f is a bounded linear functional and ||f|| ≤ ||y||. Letting x = y, we have |f(y)| = |(y, y)| = ||y|| ||y||, from which it follows that ||f|| = ||y||.

Next, let f be a bounded linear functional defined on the Hilbert space X. Let Z be the set of all vectors z ∈ X such that f(z) = 0. By Theorem 3.4.19, Z is a linear subspace of X. Now let {zₙ} be a sequence of vectors in Z, and let x₀ ∈ X be a point of accumulation of {zₙ}. In view of the continuity of f we now have 0 = f(zₙ) → f(x₀) as n → ∞. Thus, x₀ ∈ Z and Z is closed. If Z = X, then for all x ∈ X we have f(x) = 0, and the equality f(x) = (x, y) = 0 for all x ∈ X holds if and only if y = 0.

Now consider the case Z ⊂ X, X ≠ Z. From above, Z is a closed linear subspace of X. We can therefore utilize Theorem 6.12.16 to represent X by the direct sum

X = Z ⊕ Z⊥.

Since Z ⊂ X and Z ≠ X, there exists in view of Theorem 6.12.13 a non-zero vector u ∈ X such that u ⊥ Z; i.e., u ∈ Z⊥. Also, since u ≠ 0 and since u ∈ Z⊥, it follows from part (i) of Theorem 6.12.8 that u ∉ Z, and hence f(u) ≠ 0. Since Z⊥ is a linear subspace of X, we may assume without loss of generality that f(u) = 1. We now show that u is a scalar multiple of our desired vector y in Eq. (6.14.1). For any fixed x ∈ X we can write

f(x − f(x)u) = f(x) − f(x)f(u) = f(x) − f(x) = 0,

and thus (x − f(x)u) ∈ Z. From before, we have u ⊥ Z, and hence (x − f(x)u, u) = 0, or (x, u) = f(x)||u||², or f(x) = (x, u/||u||²). Letting y = u/||u||² now yields the desired form

f(x) = (x, y).

To show that the vector y is unique, we assume that f(x) = (x, y′) and f(x) = (x, y″) for all x ∈ X. Then (x, y′) − (x, y″) = 0, or (x, y′ − y″) = 0 for all x ∈ X. It now follows from Theorem 6.11.28 that y′ = y″. This completes the proof of the theorem. ∎
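In Cⁿ the Riesz construction is transparent. The sketch below (with an arbitrarily chosen functional, our own data) recovers the representer y and confirms |f(y)| = ||y||², which is where the bound ||f|| ≤ ||y|| is attained:

```python
import numpy as np

# f(x) = sum_j c_j x_j is a bounded linear functional on C^4; its Riesz
# representer for (x, y) = sum x conj(y) is y = conj(c).
rng = np.random.default_rng(6)
c = rng.normal(size=4) + 1j * rng.normal(size=4)
f = lambda v: np.sum(c * v)
y = np.conj(c)

xs = [rng.normal(size=4) + 1j * rng.normal(size=4) for _ in range(20)]
rep_errs = [abs(f(v) - np.vdot(y, v)) for v in xs]      # f(x) = (x, y)
norm_match = abs(abs(f(y)) - np.linalg.norm(y)**2)      # |f(y)| = ||y||^2
```
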
6.14.3. Exercise. Show that every Hilbert space X is reflexive (refer to Definition 6.9.8).

6.14.4. Exercise. Two normed linear spaces over the same field are said to be congruent if they are isomorphic (see Definition 3.4.76) and isometric (see Definition 5.9.16). Let X be a Hilbert space. Show that X is congruent to X*.

6.15. SOME APPLICATIONS
We now consider two applications to some of the material of the present chapter. This section consists of three parts. In the first of these we consider the problem of approximating elements in a H i lbert space by elements in a finite-dimensional subspace. lit the second part we briefly consider random
6.15.
Some Applications
395
variables, while in the third part we concern ourselves with the estimation of random variables.
A. Approximation of Elements in Hilbert Space (Normal Equations)
In many applications it is necessary to approximate functions by simpler ones. This problem can often be implemented by approximating elements from an appropriate Hilbert space by elements belonging to a suitable linear subspace. In other words, we need to consider the problem of approximating a vector x in a Hilbert space X by a vector y₀ in a linear subspace Y of X. Let yᵢ ∈ X for i = 1, …, n, and let Y = V({yᵢ}) denote the linear subspace of X generated by {y₁, …, yₙ}. Since Y is finite dimensional, it is closed. Now for any fixed x ∈ X we wish to find that element of Y which minimizes ‖x − y‖ for all y ∈ Y. If y₀ ∈ Y is that element, then we say that y₀ approximates x. We call (x − y₀) the error vector and ‖x − y₀‖ the error. Since any vector in Y can be expressed as a linear combination y = α₁y₁ + … + αₙyₙ, our problem is reduced to finding the set of αᵢ, i = 1, …, n, for which the error ‖x − α₁y₁ − … − αₙyₙ‖ is minimized. But in view of the classical projection theorem (Theorem 6.12.12), the y₀ ∈ Y which minimizes the error is unique and, moreover, (x − y₀) ⊥ yᵢ, i = 1, …, n. From this we obtain the n simultaneous linear equations

    Gᵀ(y₁, …, yₙ) [α₁, …, αₙ]ᵀ = [(x, y₁), …, (x, yₙ)]ᵀ,          (6.15.1)

where in Eq. (6.15.1) Gᵀ(y₁, …, yₙ) is the transpose of the matrix

    G(y₁, …, yₙ) = [ (y₁, y₁)  …  (y₁, yₙ)
                     (y₂, y₁)  …  (y₂, yₙ)
                         ⋮              ⋮
                     (yₙ, y₁)  …  (yₙ, yₙ) ]          (6.15.2)

The matrix (6.15.2) is called the Gram matrix of y₁, …, yₙ. The determinant of (6.15.2) is called the Gram determinant and is denoted by Δ(y₁, …, yₙ). The equations (6.15.1) are called the normal equations. It is clear that in a real Hilbert space G(y₁, …, yₙ) = Gᵀ(y₁, …, yₙ), and that in a complex Hilbert space G(y₁, …, yₙ) = G̅ᵀ(y₁, …, yₙ). In order to approximate x ∈ X by y₀ ∈ Y we only need to solve Eq. (6.15.1) for the αᵢ, i = 1, …, n. The next result gives conditions under which Eq. (6.15.1) possesses a unique solution for the αᵢ.
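As a small numerical sketch of the normal equations (6.15.1), take X = R³ with the dot product and n = 2. The vectors below are illustrative choices, not from the text; for n = 2 the Gram system can be solved by Cramer's rule, and the computed error vector comes out orthogonal to y₁ and y₂, as the projection theorem requires:

```python
# Sketch of the normal equations (6.15.1) in the real Hilbert space R^3.
# The vectors x, y1, y2 are illustrative choices, not from the text.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

y1, y2 = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
x = [1.0, 2.0, 0.0]

# Gram matrix entries and right-hand side (x, y_i)
g11, g12 = dot(y1, y1), dot(y1, y2)
g21, g22 = dot(y2, y1), dot(y2, y2)
b1, b2 = dot(x, y1), dot(x, y2)

det = g11 * g22 - g12 * g21              # Gram determinant Delta(y1, y2)
a1 = (b1 * g22 - b2 * g12) / det         # Cramer's rule for the 2x2 system
a2 = (g11 * b2 - g21 * b1) / det

y0 = [a1 * u + a2 * v for u, v in zip(y1, y2)]   # best approximation of x in Y
err = [xi - y0i for xi, y0i in zip(x, y0)]       # error vector x - y0
# orthogonality: the error vector is perpendicular to y1 and y2
print(dot(err, y1), dot(err, y2))
```
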
Chapter 6 / Normed Spaces and Inner Product Spaces
6.15.3. Theorem. A set of elements {y₁, …, yₙ} of a Hilbert space X is linearly independent if and only if the Gram determinant Δ(y₁, …, yₙ) ≠ 0.

Proof. We prove this result by proving the equivalent statement: Δ(y₁, …, yₙ) = 0 if and only if the vectors {y₁, …, yₙ} are linearly dependent. Assume that {y₁, …, yₙ} is a set of linearly dependent vectors in X. Then there exists a set of scalars {α₁, …, αₙ}, not all zero, such that

    α₁y₁ + … + αₙyₙ = 0.          (6.15.4)

Taking the inner product of Eq. (6.15.4) with the vectors {y₁, …, yₙ} yields the n linear equations

    α₁(y₁, y₁) + … + αₙ(y₁, yₙ) = 0
            ⋮                                   (6.15.5)
    α₁(yₙ, y₁) + … + αₙ(yₙ, yₙ) = 0

Taking the {α₁, …, αₙ} as unknowns, we see that for a non-trivial solution (α₁, …, αₙ) to exist we must have Δ(y₁, …, yₙ) = 0. Conversely, assume that Δ(y₁, …, yₙ) = 0. Then a non-trivial solution (α₁, …, αₙ) exists for Eq. (6.15.5). After rewriting Eq. (6.15.5) as

    (Σᵢ₌₁ⁿ αᵢyᵢ, Σᵢ₌₁ⁿ αᵢyᵢ) = 0,

we obtain

    ‖Σᵢ₌₁ⁿ αᵢyᵢ‖² = 0,

which implies that Σᵢ₌₁ⁿ αᵢyᵢ = 0. Therefore, the set {y₁, …, yₙ} is linearly dependent. This completes the proof. ∎
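Theorem 6.15.3 can be illustrated numerically in R³: the Gram determinant of a dependent pair vanishes, that of an independent pair does not. The two pairs of vectors below are illustrative choices, not from the text.

```python
# Sketch of Theorem 6.15.3 in R^3: the 2x2 Gram determinant vanishes exactly
# when the two vectors are linearly dependent. Sample vectors are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_det2(y1, y2):
    """Delta(y1, y2): determinant of the 2x2 Gram matrix [(yi, yj)]."""
    return dot(y1, y1) * dot(y2, y2) - dot(y1, y2) * dot(y2, y1)

indep = gram_det2([1.0, 0.0, 1.0], [0.0, 1.0, 1.0])   # independent pair
dep = gram_det2([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])     # y2 = 2*y1, dependent
print(indep, dep)
```
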
The next result establishes an expression for the error ‖x − y₀‖. The proof of this result follows directly from the classical projection theorem.

6.15.6. Theorem. Let X be a Hilbert space, let x ∈ X, let {y₁, …, yₙ} be a set of linearly independent vectors in X, let Y be the linear subspace of X generated by {y₁, …, yₙ}, and let y₀ ∈ Y be such that

    ‖x − y₀‖ = min_{y ∈ Y} ‖x − y‖ = min ‖x − α₁y₁ − … − αₙyₙ‖.

Then

    ‖x − y₀‖² = Δ(y₁, …, yₙ, x) / Δ(y₁, …, yₙ),

where

    Δ(y₁, …, yₙ, x) = det [ (y₁, y₁)  …  (y₁, yₙ)  (y₁, x)
                            (y₂, y₁)  …  (y₂, yₙ)  (y₂, x)
                                ⋮              ⋮         ⋮
                            (yₙ, y₁)  …  (yₙ, yₙ)  (yₙ, x)
                            (x, y₁)   …  (x, yₙ)   (x, x)  ].

6.15.7. Exercise. Prove Theorem 6.15.6.

B. Random Variables
A rigorous development of the theory of probability is based on measure and integration theory. Since knowledge of this theory by the reader has not been assumed, a brief discussion of some essential concepts will now be given.

We begin by introducing some terminology. If Ω is a non-void set, a family of subsets, ℱ, of Ω is called a σ-algebra (or a σ-field) if (i) for all E, F ∈ ℱ we have E ∪ F ∈ ℱ and E − F ∈ ℱ, (ii) for any countable sequence of sets {Eₙ} in ℱ we have ⋃ₙ₌₁^∞ Eₙ ∈ ℱ, and (iii) Ω ∈ ℱ. It readily follows that a σ-algebra is a family of subsets of Ω which is closed under all countable set operations.

A function P: ℱ → R, where ℱ is a σ-algebra, is called a probability measure if (i) 0 ≤ P(E) ≤ 1 for all E ∈ ℱ, (ii) P(∅) = 0 and P(Ω) = 1, and (iii) for any countable collection of sets {Eₙ} in ℱ such that Eᵢ ∩ Eⱼ = ∅ if i ≠ j, we have P(⋃ₙ₌₁^∞ Eₙ) = Σₙ₌₁^∞ P(Eₙ).

A probability space is a triple {Ω, ℱ, P}, where Ω is a non-void set, ℱ is a σ-algebra of subsets of Ω, and P is a probability measure on ℱ. We call elements ω ∈ Ω outcomes (usually thought of as occurring at random), and we call elements E ∈ ℱ events. A function X: Ω → R is called a random variable if {ω: X(ω) ≤ x} ∈ ℱ for all x ∈ R. The set {ω: X(ω) ≤ x} is usually written in shorter form as {X ≤ x}. If X is a random variable, then the function F_X: R → R defined by F_X(x) = P{X ≤ x} for x ∈ R is called the distribution function of X. If Xᵢ, i = 1, …, n, are random variables, we define the random vector X as X = (X₁, …, Xₙ)ᵀ. Also, for x = (x₁, …, xₙ)ᵀ ∈ Rⁿ, the event {X₁ ≤ x₁, …, Xₙ ≤ xₙ} is defined to be {ω: X₁(ω) ≤ x₁} ∩ {ω: X₂(ω) ≤ x₂} ∩ … ∩ {ω: Xₙ(ω) ≤ xₙ}. Furthermore, for a random vector X, the function F_X: Rⁿ → R,
defined by F_X(x) = P{X₁ ≤ x₁, …, Xₙ ≤ xₙ}, is called the distribution function of X.

If X is a random variable and g is a function, g: R → R, such that the Stieltjes integral ∫₋∞^∞ g(x) dF_X(x) exists, then the expected value of g(X) is defined to be

    E{g(X)} = ∫₋∞^∞ g(x) dF_X(x).

Similarly, if X is a random vector and if g is a function, g: Rⁿ → R, such that ∫_{Rⁿ} g(x) dF_X(x) exists, then the expected value of g(X) is defined to be

    E{g(X)} = ∫_{Rⁿ} g(x) dF_X(x).

Some of the expected values of primary interest are E(X), the expected value of X; E(X²), the second moment of X; and E{[X − E(X)]²}, the variance of X.

If we let ℒ₂ denote the family of random variables defined on a probability space {Ω, ℱ, P} such that E(X²) < ∞, then this space is a vector space over R with the usual definition of addition and multiplication by a scalar. We say two random variables, X₁ and X₂, are equal almost surely if P{ω: X₁(ω) ≠ X₂(ω)} = 0. If we let L₂ denote the family of equivalence classes of all random variables which are almost surely equal (as in Example 5.5.31), then {L₂; (·,·)} is a real Hilbert space where the inner product is defined by

    (X, Y) = E(XY)   for X, Y ∈ L₂.

Throughout the remainder of this section, we let {Ω, ℱ, P} denote our underlying probability space, and we assume that all random variables belong to the Hilbert space L₂ with inner product (X, Y) = E(XY).

C. Estimation of Random Variables

The special class of estimation problems which we consider may be formulated as follows: given a set of random variables {Y₁, …, Yₘ}, find the best estimate of another random variable, X. The sense in which an estimate is "best" will be defined shortly. Here we view the set {Y₁, …, Yₘ} as observations and the random variable X as the unknown. For any mapping f: Rᵐ → R such that f(Y₁, …, Yₘ) ∈ L₂ for all observations {Y₁, …, Yₘ}, we call X̂ = f(Y₁, …, Yₘ) an estimate of X. If f is linear, we call X̂ a linear estimate.

Next, let f be linear; i.e., let f be a linear functional on Rᵐ. Then there is a vector aᵀ = (α₁, …, αₘ) ∈ Rᵐ such that f(y) = aᵀy for all yᵀ = (η₁, …, ηₘ) ∈ Rᵐ. Now a linear estimate, X̂ = α₁Y₁ + … + αₘYₘ, is called the best linear estimate of X, given {Y₁, …, Yₘ}, if E{[X − α₁Y₁ − … − αₘYₘ]²} is minimum with respect to a ∈ Rᵐ. The classical projection theorem (see Theorem 6.12.12) tells us that the best linear estimate of X is the projection of X onto the linear vector space V({Y₁, …, Yₘ}). Furthermore, Eq. (6.15.1) gives us the explicit form for αᵢ, i = 1, …, m. We are now in a position to summarize the above discussion in the following theorem, which is usually called the orthogonality principle.

6.15.8. Theorem. Let X, Y₁, …, Yₘ belong to L₂. Then X̂ = α₁Y₁ + … + αₘYₘ is the best linear estimate of X if and only if {α₁, …, αₘ} are such that E{[X − X̂]Yᵢ} = 0 for i = 1, …, m.

We also have the following result.

6.15.9. Corollary. Let X, Y₁, …, Yₘ belong to L₂. Let G = [γᵢⱼ], where γᵢⱼ = E{YᵢYⱼ} for i, j = 1, …, m, and let bᵀ = (β₁, …, βₘ) ∈ Rᵐ, where βᵢ = E{XYᵢ} for i = 1, …, m. If G is non-singular, then X̂ = α₁Y₁ + … + αₘYₘ is the best linear estimate of X if and only if aᵀ = bᵀG⁻¹.

6.15.10. Exercise. Prove Theorem 6.15.8 and Corollary 6.15.9.

Let us now consider a specific case.

6.15.11. Example. Let X, V₁, …, Vₘ be random variables in L₂ such that E{X} = E{Vᵢ} = E{XVᵢ} = 0 for i = 1, …, m, and let R = [ρᵢⱼ] be non-singular, where ρᵢⱼ = E[VᵢVⱼ] for i, j = 1, …, m. Suppose that the measurements {Y₁, …, Yₘ} of X are given by Yᵢ = X + Vᵢ for i = 1, …, m. Then we have E{YᵢYⱼ} = E{[X + Vᵢ][X + Vⱼ]} = σₓ² + ρᵢⱼ for i, j = 1, …, m, where σₓ² ≜ E{X²}. Also, E{XYᵢ} = E{X(X + Vᵢ)} = σₓ² for i = 1, …, m. Thus, G = [γᵢⱼ], where γᵢⱼ = σₓ² + ρᵢⱼ for i, j = 1, …, m; bᵀ = (β₁, …, βₘ), where βᵢ = σₓ² for i = 1, …, m; and aᵀ = bᵀG⁻¹. ∎

6.15.12. Exercise. In the preceding example, show that if ρᵢⱼ = σᵥ²δᵢⱼ for i, j = 1, …, m, where δᵢⱼ is the Kronecker delta, then

    αᵢ = σₓ² / (mσₓ² + σᵥ²)   for i = 1, …, m.

The next result provides us with a useful means for finding the best linear estimate of a random variable X, given a set of random variables {Y₁, …, Y_k}, if we already have the best linear estimate, given {Y₁, …, Y_{k−1}}.

6.15.13. Theorem. Let k ≥ 2, and let Y₁, …, Y_k be random variables in L₂. Let 𝒴ⱼ = V({Y₁, …, Yⱼ}), the linear vector space generated by the random variables {Y₁, …, Yⱼ}, for 1 ≤ j ≤ k. Let Ŷ_k(k − 1) denote the best linear estimate of Y_k, given {Y₁, …, Y_{k−1}}, and let Ỹ_k(k − 1) = Y_k − Ŷ_k(k − 1). Then 𝒴_k = 𝒴_{k−1} ⊕ V({Ỹ_k(k − 1)}).

Proof. By the classical projection theorem (see Theorem 6.12.12), Ỹ_k(k − 1) ⊥ 𝒴_{k−1}. Now for arbitrary Z ∈ 𝒴_k, we must have Z = c₁Y₁ + … + c_{k−1}Y_{k−1} + c_kY_k for some (c₁, …, c_k). We can rewrite this as Z = Z₁ + Z₂, where Z₁ = c₁Y₁ + … + c_{k−1}Y_{k−1} + c_kŶ_k(k − 1) and Z₂ = c_kỸ_k(k − 1). Since Z₁ ∈ 𝒴_{k−1} and Z₂ ⊥ 𝒴_{k−1}, it follows from Theorem 6.12.12 that Z₁ and Z₂ are unique. Since Z₁ ∈ 𝒴_{k−1} and Z₂ ∈ V({Ỹ_k(k − 1)}), the theorem is proved. ∎

We can extend the problem of estimation of (scalar) random variables to random vectors. Let X₁, …, Xₙ be random variables in L₂, and let X = (X₁, …, Xₙ)ᵀ be a random vector. Let Y₁, …, Yₘ be random variables in L₂. We call X̂ = (X̂₁, …, X̂ₙ)ᵀ the best linear estimate of X, given {Y₁, …, Yₘ}, if X̂ᵢ is the best linear estimate of Xᵢ, given {Y₁, …, Yₘ}, for i = 1, …, n. Clearly, the orthogonality principle must hold for each Xᵢ; i.e., we must have E{(Xᵢ − X̂ᵢ)Yⱼ} = 0 for i = 1, …, n and j = 1, …, m. In this case X̂ can be expressed as X̂ = AY, where A is an (n × m) matrix of real numbers and Y = (Y₁, …, Yₘ)ᵀ. Corollary 6.15.9 assumes now the following matrix form.

6.15.14. Theorem. Let X₁, …, Xₙ, Y₁, …, Yₘ be random variables in L₂. Let G = [γᵢⱼ], where γᵢⱼ = E{YᵢYⱼ} for i, j = 1, …, m, and let B = [βᵢⱼ], where βᵢⱼ = E{XᵢYⱼ} for i = 1, …, n. If G is non-singular, then X̂ = AY is the best linear estimate of X, given Y, if and only if A = BG⁻¹.

6.15.15. Exercise. Prove Theorem 6.15.14.

We note that B and G in the above theorem can be written in an alternate way. That is, we can say that

    X̂ = E{XYᵀ}[E{YYᵀ}]⁻¹Y          (6.15.16)

is the best linear estimate of X. By the expected value of a matrix of random variables, we mean the expected value of each element of the matrix.

In the remainder of this section we apply the preceding development to dynamic systems. Let J = {1, 2, …} denote the set of positive integers. We use the notation {X(k)} to denote a sequence of random vectors; i.e., X(k) is a random vector for each k ∈ J. Let {U(k)} be a sequence of random vectors, U(k) = [U₁(k), …, U_p(k)]ᵀ, with the properties

    E{U(k)} = 0          (6.15.17)
and
    E{U(k)Uᵀ(j)} = Q(k)δ_{jk}          (6.15.18)

for all j, k ∈ J, where Q(k) is a symmetric positive definite (p × p) matrix for all k ∈ J. Next, let {V(k)} be a sequence of random vectors, V(k) = [V₁(k), …, Vₘ(k)]ᵀ, with the properties

    E{V(k)} = 0          (6.15.19)
and
    E{V(k)Vᵀ(j)} = R(k)δ_{jk}          (6.15.20)

for all j, k ∈ J, where R(k) is a symmetric positive definite (m × m) matrix for all k ∈ J. Now let X(1) be a random vector, X(1) = [X₁(1), …, Xₙ(1)]ᵀ, with the properties

    E{X(1)} = 0          (6.15.21)
and
    E{X(1)Xᵀ(1)} = P(1),          (6.15.22)

where P(1) is an (n × n) symmetric positive definite matrix. We assume further that the relationships among the random vectors are such that

    E{U(k)Vᵀ(j)} = 0,          (6.15.23)
    E{X(1)Uᵀ(k)} = 0,          (6.15.24)
and
    E{X(1)Vᵀ(k)} = 0          (6.15.25)

for all k, j ∈ J. Next, let A(k) be a real (n × n) matrix for each k ∈ J, let B(k) be a real (n × p) matrix for each k ∈ J, and let C(k) be a real (m × n) matrix for each k ∈ J. We let {X(k)} and {Y(k)} be the sequences of random vectors generated by the difference equations

    X(k + 1) = A(k)X(k) + B(k)U(k)          (6.15.26)
and
    Y(k) = C(k)X(k) + V(k)          (6.15.27)

for k = 1, 2, ….

We are now in a position to consider the following estimation problem: given the set of observations {Y(1), …, Y(k)}, find the best linear estimate of the random vector X(k). We could view the observed random variables as a single random vector, say cᵀ = [Yᵀ(1), Yᵀ(2), …, Yᵀ(k)], and apply Theorem 6.15.14; however, it turns out that a rather elegant and significant algorithm exists for this problem, due to R. E. Kalman, which we consider next.

In the following, we adopt some additional convenient notation. For each k, j ∈ J, we let X̂(j | k) denote the best linear estimate of X(j), given {Y(1), …, Y(k)}. This notation is valid for j < k and j ≥ k; however, we shall limit our attention to the situation where j ≥ k. In the present context, a recursive algorithm means that X̂(k + 1 | k + 1) is a function only of X̂(k | k) and Y(k + 1). The following theorem, which is the last result of this section, provides the desired algorithm explicitly.
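As a numerical illustration of Corollary 6.15.9 and Exercise 6.15.12, the coefficients aᵀ = bᵀG⁻¹ can be computed directly for the measurement model Yᵢ = X + Vᵢ with uncorrelated noise, and compared with the closed form αᵢ = σₓ²/(mσₓ² + σᵥ²). The numbers m, σₓ², σᵥ² below are illustrative choices, not values from the text:

```python
# Sketch of Corollary 6.15.9 (a^T = b^T G^{-1}) for Example 6.15.11 /
# Exercise 6.15.12: Y_i = X + V_i, E{X^2} = sx2, E{V_i V_j} = sv2 * delta_ij.
# Since G is symmetric, solving G a = b gives the same coefficients.

def solve(G, b):
    """Solve G a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(G)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n + 1):
                A[r][c] -= f * A[k][c]
    a = [0.0] * n
    for k in range(n - 1, -1, -1):
        a[k] = (A[k][n] - sum(A[k][c] * a[c] for c in range(k + 1, n))) / A[k][k]
    return a

m, sx2, sv2 = 4, 2.0, 0.5        # illustrative values, not from the text
G = [[sx2 + (sv2 if i == j else 0.0) for j in range(m)] for i in range(m)]
b = [sx2] * m
a = solve(G, b)                  # coefficients of the best linear estimate
alpha = sx2 / (m * sx2 + sv2)    # closed form from Exercise 6.15.12
print(a, alpha)
```
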
6.15.28. Theorem (Kalman). Given the foregoing assumptions for the dynamic system described by Eqs. (6.15.26) and (6.15.27), the best linear estimate of X(k), given {Y(1), …, Y(k)}, is provided by the following set of difference equations:

    X̂(k | k) = X̂(k | k − 1) + K(k)[Y(k) − C(k)X̂(k | k − 1)]          (6.15.29)
and
    X̂(k + 1 | k) = A(k)X̂(k | k),          (6.15.30)

where

    K(k) = P(k | k − 1)Cᵀ(k)[C(k)P(k | k − 1)Cᵀ(k) + R(k)]⁻¹,          (6.15.31)
    P(k | k) = [I − K(k)C(k)]P(k | k − 1),          (6.15.32)
and
    P(k + 1 | k) = A(k)P(k | k)Aᵀ(k) + B(k)Q(k)Bᵀ(k)          (6.15.33)

for k = 1, 2, …, with initial conditions

    X̂(1 | 0) = 0   and   P(1 | 0) = P(1).
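Before turning to the proof, the recursion (6.15.29)–(6.15.33) can be exercised on a minimal scalar example (n = m = p = 1). The system constants a, b, c, q, r and the horizon are illustrative choices, not values from the text; note that the gain and variance recursions do not depend on the simulated data at all:

```python
import random

# Minimal scalar sketch of the recursion in Theorem 6.15.28, Eqs.
# (6.15.29)-(6.15.33). All numeric constants are illustrative choices.
random.seed(0)
a, b, c = 0.9, 1.0, 1.0          # A(k), B(k), C(k), taken constant
q, r, p1 = 0.04, 0.25, 1.0       # Q(k), R(k), P(1)

x = random.gauss(0.0, p1 ** 0.5) # X(1)
x_pred, p_pred = 0.0, p1         # initial conditions: X^(1|0) = 0, P(1|0) = P(1)
for k in range(1, 21):
    y = c * x + random.gauss(0.0, r ** 0.5)          # Eq. (6.15.27)
    kgain = p_pred * c / (c * p_pred * c + r)        # Eq. (6.15.31)
    x_filt = x_pred + kgain * (y - c * x_pred)       # Eq. (6.15.29)
    p_filt = (1.0 - kgain * c) * p_pred              # Eq. (6.15.32)
    x_pred = a * x_filt                              # Eq. (6.15.30)
    p_pred = a * p_filt * a + b * q * b              # Eq. (6.15.33)
    x = a * x + b * random.gauss(0.0, q ** 0.5)      # Eq. (6.15.26)

print(p_filt, kgain)   # filtered variance settles toward a steady state
```
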
Proof. Assume that X̂(k | k − 1) is known for k ∈ J. We may interpret X̂(1 | 0) as the best linear estimate of X(1), given no observations. We wish to find X̂(k | k) and X̂(k + 1 | k). It follows from Theorem 6.15.13 (extended to the case of random vectors) that there is a matrix K(k) such that X̂(k | k) = X̂(k | k − 1) + K(k)Ỹ(k | k − 1), where Ỹ(k | k − 1) = Y(k) − Ŷ(k | k − 1), and Ŷ(k | k − 1) is the best linear estimate of Y(k), given {Y(1), …, Y(k − 1)}. It follows immediately from Eqs. (6.15.23) and (6.15.27) and the orthogonality principle that Ŷ(k | k − 1) = C(k)X̂(k | k − 1). Thus, we have shown that Eq. (6.15.29) must be true.

In order to determine K(k), let X̃(k | k − 1) = X(k) − X̂(k | k − 1). Then it follows from Eqs. (6.15.26) and (6.15.29) that

    X̃(k | k) = X̃(k | k − 1) − K(k)[C(k)X̃(k | k − 1) + V(k)].

To satisfy the orthogonality principle, we must have E{X̃(k | k)Yᵀ(j)} = 0 for j = 1, …, k. We see that this is satisfied for any K(k) for j = 1, …, k − 1. In order to satisfy E{X̃(k | k)Yᵀ(k)} = 0, K(k) must satisfy

    0 = E{X̃(k | k − 1)Yᵀ(k)} − K(k)[C(k)E{X̃(k | k − 1)Yᵀ(k)} + E{V(k)Yᵀ(k)}].          (6.15.34)

Let us first consider the term

    E{X̃(k | k − 1)Yᵀ(k)} = E{X̃(k | k − 1)Xᵀ(k)Cᵀ(k) + X̃(k | k − 1)Vᵀ(k)}.          (6.15.35)

We observe that X(k), the solution to the difference equation (6.15.26) at (time) k, is a linear combination of X(1) and U(1), …, U(k − 1). In view of Eqs. (6.15.23) and (6.15.25) it follows that E{X(j)Vᵀ(k)} = 0 for all k, j ∈ J. Hence, E{X̃(k | k − 1)Vᵀ(k)} = 0, since X̃(k | k − 1) is a linear combination of X(k) and Y(1), …, Y(k − 1). Next, we consider the term

    E{X̃(k | k − 1)Xᵀ(k)} = E{X̃(k | k − 1)[X̃ᵀ(k | k − 1) + X̂ᵀ(k | k − 1)]} = P(k | k − 1),          (6.15.36)

where

    P(k | k − 1) ≜ E{X̃(k | k − 1)X̃ᵀ(k | k − 1)},

and E{X̃(k | k − 1)X̂ᵀ(k | k − 1)} = 0, since X̂(k | k − 1) is a linear combination of Y(1), …, Y(k − 1). Now consider

    E{V(k)Yᵀ(k)} = E{V(k)[Xᵀ(k)Cᵀ(k) + Vᵀ(k)]} = R(k).          (6.15.37)

Using Eqs. (6.15.35), (6.15.36), and (6.15.37), Eq. (6.15.34) becomes

    0 = P(k | k − 1)Cᵀ(k) − K(k)[C(k)P(k | k − 1)Cᵀ(k) + R(k)].          (6.15.38)

Solving for K(k), we obtain Eq. (6.15.31).

To obtain Eq. (6.15.32), let X̃(k | k) = X(k) − X̂(k | k) and P(k | k) = E{X̃(k | k)X̃ᵀ(k | k)}. In view of Eqs. (6.15.27) and (6.15.29) we have

    X̃(k | k) = X̃(k | k − 1) − K(k)[C(k)X̃(k | k − 1) + V(k)]
             = [I − K(k)C(k)]X̃(k | k − 1) − K(k)V(k).

From this it follows that

    P(k | k) = [I − K(k)C(k)]P(k | k − 1)[I − K(k)C(k)]ᵀ + K(k)R(k)Kᵀ(k)
             = [I − K(k)C(k)]P(k | k − 1) − {P(k | k − 1)Cᵀ(k) − K(k)[C(k)P(k | k − 1)Cᵀ(k) + R(k)]}Kᵀ(k).

Using Eq. (6.15.38), it follows that Eq. (6.15.32) must be true.

To show that X̂(k + 1 | k) is given by Eq. (6.15.30), we simply show that the orthogonality principle is satisfied. That is,

    E{[X(k + 1) − A(k)X̂(k | k)]Yᵀ(j)} = E{A(k)[X(k) − X̂(k | k)]Yᵀ(j)} + E{B(k)U(k)Yᵀ(j)} = 0

for j = 1, …, k. Finally, to verify Eq. (6.15.33), we have from Eqs. (6.15.26) and (6.15.30)

    X̃(k + 1 | k) = A(k)X̃(k | k) + B(k)U(k).

From this, Eq. (6.15.33) follows immediately. We note that X̂(1 | 0) = 0 and P(1 | 0) = P(1). This completes the proof. ∎
6.16. NOTES AND REFERENCES
The material of the present chapter as well as that of the next chapter constitutes part of what usually goes under the heading of functional analysis. Thus, these two chapters should be viewed as a whole rather than two separate parts. There are numerous excellent sources dealing with Hilbert and Banach spaces. We cite a representative sample of these which the reader should consult for further study. References [6.6]–[6.8], [6.10], and [6.12] are at an introductory or intermediate level, whereas references [6.2]–[6.4] and [6.13] are at a more advanced level. The books by Dunford and Schwartz and by Hille and Phillips are standard and encyclopedic references on functional analysis; the text by Yosida constitutes a concise treatment of this subject, while the monograph by Halmos contains a compact exposition on Hilbert space. The book by Taylor is a standard reference on functional analysis at the intermediate level. The texts by Kantorovich and Akilov, by Kolmogorov and Fomin, and by Liusternik and Sobolev are very readable presentations of this subject. The book by Naylor and Sell, which presents a very nice introduction to functional analysis, includes some interesting examples. For references with applications of functional analysis to specific areas, including those in Section 6.15, see, e.g., Byron and Fuller [6.1], Kalman et al. [6.5], Luenberger [6.9], and Porter [6.11].
REFERENCES

[6.1] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.*
[6.2] N. DUNFORD and J. SCHWARTZ, Linear Operators. Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[6.3] P. R. HALMOS, Introduction to Hilbert Space. New York: Chelsea Publishing Company, 1957.
[6.4] E. HILLE and R. S. PHILLIPS, Functional Analysis and Semi-Groups. Providence, R.I.: American Mathematical Society, 1957.
[6.5] R. E. KALMAN, P. L. FALB, and M. A. ARBIB, Topics in Mathematical System Theory. New York: McGraw-Hill Book Company, 1969.
[6.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[6.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[6.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[6.9] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[6.10] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[6.11] W. A. PORTER, Modern Foundations of Systems Engineering. New York: The Macmillan Company, 1966.
[6.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.
[6.13] K. YOSIDA, Functional Analysis. Berlin: Springer-Verlag, 1965.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.
7

LINEAR OPERATORS
In the present chapter we concern ourselves with linear operators defined on Banach and Hilbert spaces, and we study some of the important properties of such operators. We also consider selected applications in this chapter.

This chapter consists of ten parts. Throughout, we consider primarily bounded linear operators, which we introduce in the first section. In the second section we look at inverses of linear transformations, in section three we introduce conjugate and adjoint operators, and in section four we study hermitian operators. In the fifth section we present additional special linear transformations, including normal operators, projections, unitary operators, and isometric operators. The spectrum of an operator is considered in the sixth section, while completely continuous operators are introduced in the seventh section. In the eighth section we present one of the main results of the present chapter, the spectral theorem for completely continuous normal operators. Finally, in section nine we study differentiation of operators (which need not be linear) defined on Banach and Hilbert spaces. Section ten, which consists of three subsections, is devoted to selected topics in applications. Items touched upon include applications to integral equations, an example from optimal control, and minimization of functionals (method of steepest descent). The chapter is concluded with a brief discussion of pertinent references in the eleventh section.
7.1. BOUNDED LINEAR TRANSFORMATIONS
Throughout this section X and Y denote vector spaces over the same field F, where F is either R (the real numbers) or C (the complex numbers).

We begin by pointing to several concepts considered previously. Recall from Chapter 1 that a transformation or operator T is a mapping of a subset 𝔇(T) of X into Y. Unless specified to the contrary, we will assume that X = 𝔇(T). Since a transformation is a mapping we distinguish, as in Chapter 1, between operators which are onto or surjective, one-to-one or injective, and one-to-one and onto or bijective. If T is a transformation of X into Y we write T: X → Y. If x ∈ X we call y = T(x) the image of x in Y under T, and if V ⊂ X we define the image of set V in Y under T as the set

    T(V) = {y ∈ Y: y = T(v), v ∈ V ⊂ X}.

On the other hand, if W ⊂ Y, then the inverse image of set W under T is the set

    T⁻¹(W) = {x ∈ X: y = T(x) ∈ W ⊂ Y}.

We define the range of T, denoted ℜ(T), by

    ℜ(T) = {y ∈ Y: y = T(x), x ∈ X};

i.e., ℜ(T) = T(X). Recall that if a transformation T of X into Y is injective, then the inverse of T, denoted T⁻¹, exists (see Definition 1.2.9). Thus, if y = T(x) and if T is injective, then x = T⁻¹(y).

In Definition 3.4.1 we defined a linear operator (or a linear transformation) as a mapping T of X into Y having the property that

(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all α ∈ F and all x ∈ X.

As in Chapter 3, we denote the class of all linear transformations from X into Y by L(X, Y). Also, in the case of linear transformations we write Tx in place of T(x).
Of great importance are bounded linear operators, which turn out to be also continuous. We have the following definition.

7.1.1. Definition. Let X and Y be normed linear spaces. A linear operator T: X → Y is said to be bounded if there is a real number γ > 0 such that

    ‖Tx‖_Y ≤ γ ‖x‖_X

for all x ∈ X.

The notation ‖x‖_X indicates that the norm on X is used, while the notation ‖Tx‖_Y indicates that the norm on Y is employed. However, since the norms of the various spaces are usually understood, it is customary to drop the subscripts and simply write ‖x‖ and ‖Tx‖.
Chapter 7 / Linear Operators
Our first result allows us to characterize a bounded linear operator in an equivalent way.

7.1.2. Theorem. Let T ∈ L(X, Y). Then T is bounded if and only if T maps the unit sphere into a bounded subset of Y.

7.1.3. Exercise. Prove Theorem 7.1.2.
In Chapter 5 we introduced continuous functions (see Definition 5.7.1). The definition of continuity of an operator in the setting of normed linear spaces can now be rephrased as follows.

7.1.4. Definition. An operator T: X → Y (not necessarily linear) is said to be continuous at a point x₀ ∈ X if for every ε > 0 there is a δ > 0 such that

    ‖T(x) − T(x₀)‖ < ε

whenever ‖x − x₀‖ < δ.

The reader can readily prove the next result.

7.1.5. Theorem. Let T ∈ L(X, Y). If T is continuous at a single point x₀ ∈ X, then it is continuous at all x ∈ X.

7.1.6. Exercise. Prove Theorem 7.1.5.
In this chapter we will mainly concern ourselves with bounded linear operators. Our next result shows that in the case of linear operators boundedness and continuity are equivalent.

7.1.7. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if it is bounded.

Proof. Assume that T is bounded, and let γ be such that ‖Tx‖ ≤ γ ‖x‖ for all x ∈ X. Now consider a sequence {xₙ} in X such that xₙ → 0 as n → ∞. Then ‖Txₙ‖ ≤ γ ‖xₙ‖ → 0 as n → ∞, and hence T is continuous at the point 0 ∈ X. From Theorem 7.1.5 it follows that T is continuous at all points x ∈ X.

Conversely, assume that T is continuous at x = 0, and hence at all x ∈ X. Since T0 = 0 we can find a δ > 0 such that ‖Tx‖ < 1 whenever ‖x‖ ≤ δ. For any x ≠ 0 we have ‖δx/‖x‖‖ = δ, and hence

    ‖Tx‖ = ‖(‖x‖/δ) T(δx/‖x‖)‖ = (‖x‖/δ) ‖T(δx/‖x‖)‖ ≤ ‖x‖/δ.

If we let γ = 1/δ, then ‖Tx‖ ≤ γ ‖x‖, and T is bounded. ∎
Now let S, T ∈ L(X, Y). In Eq. (3.4.42) we defined the sum of linear operators (S + T) by

    (S + T)x = Sx + Tx,   x ∈ X,

and in Eq. (3.4.43) we defined multiplication of T by a scalar α ∈ F as

    (αT)x = α(Tx),   x ∈ X, α ∈ F.

We also recall (see Eq. (3.4.44)) that the zero transformation, 0, of X into Y is defined by 0x = 0 for all x ∈ X, and that the negative of a transformation T, denoted by −T, is defined by (−T)x = −Tx for all x ∈ X (see Eq. (3.4.45)). Furthermore, the identity transformation I ∈ L(X, X) is defined by Ix = x for all x ∈ X (see Eq. (3.4.56)). Referring to Theorem 3.4.74, we recall that L(X, Y) is a linear space over F.

Next, let X, Y, Z be vector spaces over F, and let S ∈ L(Y, Z) and T ∈ L(X, Y). The product of S and T, denoted by ST, was defined in Eq. (3.4.50) as the mapping of X into Z such that

    (ST)x = S(Tx),   x ∈ X.

It can readily be shown that ST ∈ L(X, Z). Furthermore, if X = Y = Z, then L(X, X) is an associative algebra with identity I (see Theorem 3.4.59). Note however that the algebra L(X, X) is, in general, not commutative because, in general,

    ST ≠ TS.
In the following, we will use the notation B(X, Y) to denote the set of all bounded linear transformations from X into Y; i.e.,

    B(X, Y) ≜ {T ∈ L(X, Y): T is bounded}.          (7.1.8)

The reader should have no difficulty in proving the next theorem.

7.1.9. Theorem. The space B(X, Y) is a linear space over F.

7.1.10. Exercise. Prove Theorem 7.1.9.
Next, we wish to define a norm on B(X, Y).

7.1.11. Definition. Let T ∈ B(X, Y). The norm of T, denoted ‖T‖, is defined by

    ‖T‖ = inf{γ: ‖Tx‖ ≤ γ‖x‖ for all x ∈ X}.          (7.1.12)

Note that ‖T‖ is finite and that

    ‖Tx‖ ≤ ‖T‖ · ‖x‖

for all x ∈ X. In proving that the function ‖·‖: B(X, Y) → R satisfies all the axioms of a norm (see Definition 6.1.1), we need the following result.
7.1.13. Theorem. Let T ∈ B(X, Y). Then ‖T‖ can equivalently be expressed in any one of the following forms:

(i) ‖T‖ = inf{γ: ‖Tx‖ ≤ γ‖x‖ for all x ∈ X};
(ii) ‖T‖ = sup{‖Tx‖/‖x‖: x ≠ 0};
(iii) ‖T‖ = sup{‖Tx‖: ‖x‖ ≤ 1, x ∈ X}; and
(iv) ‖T‖ = sup{‖Tx‖: ‖x‖ = 1, x ∈ X}.

7.1.14. Exercise. Prove Theorem 7.1.13.
We now show that the function ‖·‖ defined in Eq. (7.1.12) satisfies all the axioms of a norm.

7.1.15. Theorem. The linear space B(X, Y) is a normed linear space (with norm defined by Eq. (7.1.12)); i.e.,

(i) for every T ∈ B(X, Y), ‖T‖ ≥ 0, and ‖T‖ = 0 if and only if T = 0;
(ii) ‖S + T‖ ≤ ‖S‖ + ‖T‖ for every S, T ∈ B(X, Y); and
(iii) ‖αT‖ = |α| ‖T‖ for every T ∈ B(X, Y) and for every α ∈ F.

Proof. The proof of part (i) is obvious. To verify (ii) we note that

    ‖(S + T)x‖ = ‖Sx + Tx‖ ≤ ‖Sx‖ + ‖Tx‖ ≤ (‖S‖ + ‖T‖)‖x‖.

If x = 0, then we are finished. If x ≠ 0, then

    ‖S + T‖ = sup_{x ≠ 0} ‖(S + T)x‖/‖x‖ ≤ ‖S‖ + ‖T‖.
We leave the proof of part (iii), which is similar, as an exercise. ∎

For the space B(X, X) we have the following results.

7.1.16. Theorem. If S, T ∈ B(X, X), then ST ∈ B(X, X) and

    ‖ST‖ ≤ ‖S‖ · ‖T‖.

Proof. For each x ∈ X we have

    ‖(ST)x‖ = ‖S(Tx)‖ ≤ ‖S‖ · ‖Tx‖ ≤ ‖S‖ · ‖T‖ · ‖x‖,

which shows that ST ∈ B(X, X). If x ≠ 0, then

    ‖(ST)x‖/‖x‖ ≤ ‖S‖ · ‖T‖,

and hence ‖ST‖ = sup_{x ≠ 0} ‖(ST)x‖/‖x‖ ≤ ‖S‖ · ‖T‖, completing the proof. ∎
7.1.17. Theorem. Let I denote the identity operator on X. Then I ∈ B(X, X), and ‖I‖ = 1.

7.1.18. Exercise. Prove Theorem 7.1.17.
We now consider some specific cases.

7.1.19. Example. Let X = l₂, the Banach space of Example 6.1.6. For x = (ξ₁, ξ₂, …) ∈ X, let us define T: X → X by

    Tx = (0, ξ₂, ξ₃, …).

The reader can readily verify that T is a linear operator which is neither injective nor surjective. We see that

    ‖Tx‖² = Σᵢ₌₂^∞ |ξᵢ|² ≤ Σᵢ₌₁^∞ |ξᵢ|² = ‖x‖².

Thus, T is a bounded linear operator. To compute ‖T‖ we observe that ‖Tx‖ ≤ ‖x‖, which implies that ‖T‖ ≤ 1. Choosing, in particular, x = (0, 1, 0, …) ∈ X, we have ‖Tx‖ = ‖x‖ = 1 and

    1 = ‖Tx‖ ≤ ‖T‖ · ‖x‖ = ‖T‖.

Thus, it must be that ‖T‖ = 1. ∎
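This operator is easy to probe numerically on finitely supported l₂ sequences (represented as lists; the sample vectors are illustrative choices, not from the text): T never increases the norm, and the basis vector e₂ shows the bound γ = 1 is attained.

```python
from math import sqrt

# Numerical sketch of Example 7.1.19: T(x1, x2, x3, ...) = (0, x2, x3, ...)
# acting on finitely supported l2 sequences. Sample vectors are illustrative.

def T(x):
    """Zero out the first coordinate, keep the rest."""
    return [0.0] + x[1:]

def norm(x):
    """l2 norm of a finitely supported sequence."""
    return sqrt(sum(xi * xi for xi in x))

x = [3.0, -1.0, 4.0, 1.5]
print(norm(T(x)) <= norm(x))   # boundedness with gamma = 1

e2 = [0.0, 1.0, 0.0]
print(norm(T(e2)), norm(e2))   # both 1.0, so the bound is attained and ||T|| = 1
```
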
7.1.20. Example. Let X = C[a, b], and let ‖·‖_∞ be the norm on C[a, b] defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function, continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Define the operator T: X → X by

    [Tx](s) = ∫ₐᵇ k(s, t)x(t) dt

for x ∈ X. Then T ∈ L(X, X) (see Example 3.4.6), and

    ‖Tx‖ = sup_{a ≤ s ≤ b} |∫ₐᵇ k(s, t)x(t) dt| ≤ [sup_{a ≤ s ≤ b} ∫ₐᵇ |k(s, t)| dt] · [sup_{a ≤ t ≤ b} |x(t)|] = γ₀ ‖x‖,

where γ₀ = sup_{a ≤ s ≤ b} ∫ₐᵇ |k(s, t)| dt. This shows that T ∈ B(X, X) and that ‖T‖ ≤ γ₀. It can, in fact, be shown that ‖T‖ = γ₀. ∎
For norms of linear operators on finite-dimensional spaces, we have the following important result.

7.1.21. Theorem. Let $T \in L(X, Y)$. If $X$ is finite dimensional, then $T$ is continuous.

Proof. Let $\{x_1, \ldots, x_n\}$ be a basis for $X$. For each $x \in X$ there is a unique set of scalars $\{\xi_1, \ldots, \xi_n\}$ such that $x = \xi_1 x_1 + \cdots + \xi_n x_n$. If we define the linear functionals $f_i: X \to F$ by $f_i(x) = \xi_i$, $i = 1, \ldots, n$, then by Theorem 6.6.1 we know that each $f_i$ is a continuous linear functional. Thus, there exists a set of real numbers $\{\gamma_1, \ldots, \gamma_n\}$ such that $|f_i(x)| \le \gamma_i \|x\|$ for $i = 1, \ldots, n$. Now

$$Tx = \xi_1 T x_1 + \cdots + \xi_n T x_n.$$

If we let $\rho = \max_i \|Tx_i\|$ and $\gamma_0 = \max_i \gamma_i$, then it follows that $\|Tx\| \le n \rho \gamma_0 \|x\|$. Thus, $T$ is bounded and hence continuous. $\blacksquare$

Chapter 7 / Linear Operators
Next, we concern ourselves with various norms of linear transformations on the finite-dimensional space $R^n$.

7.1.22. Example. Let $X = R^n$, and let $\{u_1, \ldots, u_n\}$ be the natural basis for $R^n$ (see Example 4.1.15). For any $A \in L(X, X)$ there is an $n \times n$ matrix, say $A = [a_{ij}]$ (see Definition 4.2.7), which represents $A$ with respect to $\{u_1, \ldots, u_n\}$. Thus, if $Ax = y$, where $x = (\xi_1, \ldots, \xi_n) \in X$ and $y = (\eta_1, \ldots, \eta_n) \in X$, we may represent this transformation by $y = Ax$ (see Eq. (4.2.17)). In Example 6.1.5 we defined several norms on $R^n$, namely

$$\|x\|_p = \left[|\xi_1|^p + \cdots + |\xi_n|^p\right]^{1/p}, \quad 1 \le p < \infty,$$

and

$$\|x\|_\infty = \max_i \{|\xi_i|\}.$$

It turns out that different norms on $R^n$ give rise to different norms of the transformation $A$. (In this case we speak of the norm of $A$ induced by the norm defined on $R^n$.) In the present example we derive expressions for the norm of $A$ in terms of the elements of matrix $A$ when the norm on $R^n$ is given by $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$.

(i) Let $p = 1$; i.e., $\|x\| = |\xi_1| + \cdots + |\xi_n|$. Then

$$\|A\| = \max_{1 \le j \le n} \left\{\sum_{i=1}^n |a_{ij}|\right\} \triangleq \gamma_0.$$

To prove this, we see that

$$\|Ax\| = \sum_{i=1}^n \left|\sum_{j=1}^n a_{ij}\xi_j\right| \le \sum_{j=1}^n |\xi_j| \sum_{i=1}^n |a_{ij}| \le \left[\max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|\right] \cdot \|x\|.$$

From this it follows that $\|A\| \le \gamma_0$. To show that equality must hold, let $j_0$ be such that

$$\sum_{i=1}^n |a_{ij_0}| = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}| = \gamma_0,$$

and let $x_0 = (\xi_1, \ldots, \xi_n) \in R^n$ be given by $\xi_{j_0} = 1$ and $\xi_i = 0$ if $i \ne j_0$. Then $\|x_0\| = 1$ and $\|Ax_0\| = \gamma_0$, and so we conclude that $\|A\| = \gamma_0$.
(ii) Let $p = 2$; i.e., $\|x\| = (|\xi_1|^2 + \cdots + |\xi_n|^2)^{1/2}$. Let $A^T$ denote the transpose of $A$ (see Eq. (4.2.9)), and let $\{\lambda_1, \ldots, \lambda_k\}$ be the distinct eigenvalues of the matrix $A^T A$ (see Definition 4.5.6). Let $\lambda_0 = \max_j \lambda_j$. Then $\|A\| = \sqrt{\lambda_0}$.

To prove this we note first that by Theorem 4.10.28 the eigenvalues of $A^T A$ are all real. We show first that they are, in fact, non-negative. Let $\{x_1, \ldots, x_k\}$ be eigenvectors of $A^T A$ corresponding to the eigenvalues $\{\lambda_1, \ldots, \lambda_k\}$, respectively. Then for each $i = 1, \ldots, k$ we have $A^T A x_i = \lambda_i x_i$. Thus, $x_i^T A^T A x_i = \lambda_i x_i^T x_i$. From this it follows that

$$\lambda_i = \frac{x_i^T A^T A x_i}{x_i^T x_i} = \frac{\|Ax_i\|^2}{\|x_i\|^2} \ge 0.$$

For arbitrary $x \in X$ it follows from Theorem 4.10.44 that $x = x_1 + \cdots + x_k$, where $A^T A x_i = \lambda_i x_i$, $i = 1, \ldots, k$, and where the $x_i$ are mutually orthogonal. By Theorem 4.9.41 we have $\|Ax\|^2 = x^T A^T A x$. Thus,

$$\|Ax\|^2 = x^T A^T A x = \sum_{i=1}^k \lambda_i \|x_i\|^2 \le \lambda_0 \sum_{i=1}^k \|x_i\|^2 = \lambda_0 \|x\|^2,$$

from which it follows that $\|A\| \le \sqrt{\lambda_0}$. If we let $x$ be an eigenvector corresponding to $\lambda_0$, then we must have $\|Ax\|^2 = \lambda_0 \|x\|^2$, and so equality is achieved. Thus, $\|A\| = \sqrt{\lambda_0}$.

(iii) Let $\|x\| = \max_i \{|\xi_i|\}$. Then

$$\|A\| = \max_{1 \le i \le n} \left(\sum_{j=1}^n |a_{ij}|\right).$$

The proof of this part is left as an exercise. $\blacksquare$

7.1.23. Exercise. Prove part (iii) of Example 7.1.22.
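The three closed-form expressions of Example 7.1.22 can be checked against library routines. The sketch below (illustrative only, not from the text) computes the max column sum, $\sqrt{\lambda_0}$ from the eigenvalues of $A^T A$, and the max row sum, and compares them with NumPy's induced matrix norms.

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.0,  4.0, -1.0],
              [2.0,  1.0,  0.5]])

col_sum = np.abs(A).sum(axis=0).max()                 # part (i):  max column sum
spec = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())     # part (ii): sqrt(max eigenvalue of A^T A)
row_sum = np.abs(A).sum(axis=1).max()                 # part (iii): max row sum

# NumPy's ord=1, ord=2, ord=inf matrix norms are exactly these induced norms
assert np.isclose(col_sum, np.linalg.norm(A, 1))
assert np.isclose(spec, np.linalg.norm(A, 2))
assert np.isclose(row_sum, np.linalg.norm(A, np.inf))
print(col_sum, spec, row_sum)
```

Note that `eigvalsh` is appropriate here because $A^T A$ is symmetric, which is precisely the fact used in part (ii).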
Next, we prove the following important result concerning the completeness of $B(X, Y)$.

7.1.24. Theorem. If $Y$ is complete, then the normed linear space $B(X, Y)$ is also complete.

Proof. Let $\{T_n\}$ be a Cauchy sequence in the normed linear space $B(X, Y)$. Choose $N$ such that for a given $\varepsilon > 0$, $\|T_m - T_n\| < \varepsilon$ whenever $m > N$ and $n > N$. Since the $T_n$ are bounded we have for each $x \in X$,

$$\|T_m x - T_n x\| \le \|T_m - T_n\| \|x\| < \varepsilon \|x\|$$

whenever $m, n \ge N$. From this it follows that $\{T_n x\}$ is a Cauchy sequence in $Y$. But $Y$ is complete, by hypothesis. Therefore, $T_n x$ has a limit in $Y$ which depends on $x \in X$. Let us denote this limit by $Tx$; i.e., $\lim_{n \to \infty} T_n x = Tx$. To show that $T$ is linear we note that

$$T(x + y) = \lim T_n(x + y) = \lim T_n x + \lim T_n y = Tx + Ty$$

and

$$T(\alpha x) = \lim T_n(\alpha x) = \alpha \lim T_n x = \alpha Tx.$$

Thus, $T$ is a linear operator of $X$ into $Y$. We show next that $T$ is bounded and hence continuous. Since every Cauchy sequence in a normed linear space is bounded, it follows that the sequence $\{T_n\}$ is bounded, and thus $\|T_n\| \le M$ for all $n$, where $M$ is some constant. We have

$$\|Tx\| = \|\lim T_n x\| = \lim \|T_n x\| \le \left(\sup_n \|T_n\|\right)\|x\| \le M\|x\|.$$

This proves that $T$ is bounded and therefore continuous, and $T \in B(X, Y)$. Finally, we must show that $T_n \to T$ as $n \to \infty$ in the norm of $B(X, Y)$. From before, we have $\|T_m x - T_n x\| < \varepsilon \|x\|$ whenever $m, n > N$. If we let $n \to \infty$, then $\|T_m x - Tx\| \le \varepsilon \|x\|$ for every $x \in X$ provided that $m > N$. This implies that $\|T_m - T\| \le \varepsilon$ whenever $m > N$. But then $T_m \to T$ as $m \to \infty$ with respect to the norm defined on $B(X, Y)$. Therefore, $B(X, Y)$ is complete and the theorem is proved. $\blacksquare$

In Definition 3.4.16 we defined the null space of $T \in L(X, Y)$ as

$$\mathfrak{N}(T) = \{x \in X : Tx = 0\}. \tag{7.1.25}$$
We then showed that the range space $\mathfrak{R}(T)$ is a linear subspace of $Y$ and that $\mathfrak{N}(T)$ is a linear subspace of $X$. For the case of bounded linear transformations we have the following result.

7.1.26. Theorem. Let $T \in B(X, Y)$. Then $\mathfrak{N}(T)$ is a closed linear subspace of $X$.

Proof. $\mathfrak{N}(T)$ is a linear subspace of $X$ by Theorem 3.4.19. That it is closed follows from part (ii) of Theorem 5.7.9, since $\mathfrak{N}(T) = T^{-1}(\{0\})$ and since $\{0\}$ is a closed subset of $Y$. $\blacksquare$

We conclude this section with the following useful result for continuous linear transformations.

7.1.27. Theorem. Let $T \in L(X, Y)$. Then $T$ is continuous if and only if

$$T\left(\sum_{i=1}^{\infty} \alpha_i x_i\right) = \sum_{i=1}^{\infty} \alpha_i T x_i$$

for every convergent series $\sum_{i=1}^{\infty} \alpha_i x_i$ in $X$.

The proof of this theorem follows readily from Theorem 5.7.8. We leave the details as an exercise.

7.1.28. Exercise. Prove Theorem 7.1.27.
7.2. INVERSES

Throughout this section $X$ and $Y$ denote vector spaces over the same field $F$, where $F$ is either $R$ (the real numbers) or $C$ (the complex numbers). We recall that a linear operator $T: X \to Y$ has an inverse, $T^{-1}$, if it is injective, and if this is so, then $T^{-1}$ is a linear operator from $\mathfrak{R}(T)$ onto $X$ (see Theorem 3.4.32). We have the following result concerning the continuity of $T^{-1}$.

7.2.1. Theorem. Let $T \in L(X, Y)$. Then $T^{-1}$ exists and is bounded (i.e., $T^{-1} \in B(\mathfrak{R}(T), X)$) if and only if there is a constant $\alpha > 0$ such that $\|Tx\| \ge \alpha \|x\|$ for all $x \in X$. If this is so, then $\|T^{-1}\| \le 1/\alpha$.

Proof. Assume that there is a constant $\alpha > 0$ such that $\alpha \|x\| \le \|Tx\|$ for all $x \in X$. Then $Tx = 0$ implies $x = 0$, and $T^{-1}$ exists by Theorem 3.4.32. For $y \in \mathfrak{R}(T)$ there is an $x \in X$ such that $y = Tx$ and $T^{-1}y = x$. Thus,

$$\alpha \|T^{-1}y\| = \alpha \|x\| \le \|Tx\| = \|y\|,$$

or $\|T^{-1}y\| \le \frac{1}{\alpha}\|y\|$. Hence, $T^{-1}$ is bounded and $\|T^{-1}\| \le 1/\alpha$.

Conversely, assume that $T^{-1}$ exists and is bounded. Then, for $x \in X$ there is a $y \in \mathfrak{R}(T)$ such that $y = Tx$, and also $x = T^{-1}y$. Since $T^{-1}$ is bounded we have

$$\|x\| = \|T^{-1}y\| \le \|T^{-1}\| \|y\| = \|T^{-1}\| \|Tx\|,$$

or $\|Tx\| \ge \alpha \|x\|$ with $\alpha = 1/\|T^{-1}\|$. $\blacksquare$
The next result, called the Neumann expansion theorem, gives us important information concerning the existence of the inverse of a certain class of bounded linear transformations.

7.2.2. Theorem. Let $X$ be a Banach space, let $T \in B(X, X)$, let $I \in B(X, X)$ denote the identity operator, and let $\|T\| < 1$. Then the range of $(I - T)$ is $X$, the inverse of $(I - T)$ exists and is bounded, and it satisfies the inequality

$$\|(I - T)^{-1}\| \le \frac{1}{1 - \|T\|}. \tag{7.2.3}$$

Furthermore, the series $\sum_{n=0}^{\infty} T^n$ in $B(X, X)$ converges uniformly to $(I - T)^{-1}$ with respect to the norm of $B(X, X)$; i.e.,

$$(I - T)^{-1} = I + T + T^2 + \cdots + T^n + \cdots. \tag{7.2.4}$$

Proof. Since $\|T\| < 1$, it follows that the series $\sum_{n=0}^{\infty} \|T\|^n$ converges. In view of Theorem 7.1.16 we have $\|T^n\| \le \|T\|^n$, and hence the series $\sum_{n=0}^{\infty} T^n$ converges in the space $B(X, X)$, because this space is complete in view of Theorem 7.1.24. If we set

$$S = \sum_{n=0}^{\infty} T^n,$$

then

$$ST = TS = \sum_{n=0}^{\infty} T^{n+1}$$

and

$$S(I - T) = (I - T)S = I.$$

It now follows from Theorem 3.4.65 that $(I - T)^{-1}$ exists and is equal to $S$. Furthermore, $S \in B(X, X)$. The inequality (7.2.3) now follows readily and is left as an exercise. $\blacksquare$

7.2.5. Exercise. Prove inequality (7.2.3).
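For a matrix with $\|T\| < 1$, the Neumann expansion (7.2.4) and the bound (7.2.3) can be observed directly. The following sketch (my illustration, not part of the text) sums partial sums of $I + T + T^2 + \cdots$ and compares them with the computed inverse of $I - T$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(4, 4))
T *= 0.5 / np.linalg.norm(T, 2)        # rescale so that ||T|| = 1/2 < 1
I = np.eye(4)

# Partial sums of the Neumann series I + T + T^2 + ...
S = np.zeros_like(T)
power = I.copy()
for _ in range(60):
    S += power
    power = power @ T

inv = np.linalg.inv(I - T)
assert np.allclose(S, inv)                                   # Eq. (7.2.4)
assert np.linalg.norm(inv, 2) <= 1.0 / (1.0 - np.linalg.norm(T, 2)) + 1e-12  # Eq. (7.2.3)
print("Neumann series converged")
```

Since $\|T^n\| \le \|T\|^n = 2^{-n}$ here, sixty terms already put the truncation error far below floating-point precision.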
The next result, which is of great significance, is known as the Banach inverse theorem.

7.2.6. Theorem. Let $X$ and $Y$ be Banach spaces, and let $T \in B(X, Y)$. If $T$ is bijective, then $T^{-1}$ is bounded.

Proof. The proof of this theorem is rather lengthy and requires two preliminary results, which we state and prove separately.
7.2.7. Proposition. If $A$ is any subset of $X$ such that $\bar{A} = X$ ($\bar{A}$ denotes the closure of $A$), then any $x \in X$ such that $x \ne 0$ can be written in the form

$$x = x_1 + x_2 + \cdots + x_n + \cdots,$$

where $x_n \in A$ and $\|x_n\| \le 3\|x\|/2^n$, $n = 1, 2, \ldots$.

Proof. The sequence $\{x_n\}$ is constructed as follows. Let $x_1 \in A$ be such that $\|x - x_1\| < \frac{1}{2}\|x\|$. This can certainly be done since $\bar{A} = X$. Now choose $x_2 \in A$ such that $\|x - x_1 - x_2\| < \frac{1}{4}\|x\|$. We continue in this manner and obtain

$$\|x - x_1 - \cdots - x_n\| < \frac{1}{2^n}\|x\|.$$

We can always choose such an $x_n \in A$, because $\bar{A} = X$. By construction of $\{x_n\}$, $\|x - x_1 - \cdots - x_n\| \to 0$ as $n \to \infty$. Hence,

$$x = \sum_{k=1}^{\infty} x_k.$$

We now compute $\|x_n\|$. First, we see that

$$\|x_1\| = \|x_1 - x + x\| \le \|x_1 - x\| + \|x\| < \tfrac{3}{2}\|x\|,$$

$$\|x_2\| = \|x_2 + x_1 - x + x - x_1\| \le \|x - x_1 - x_2\| + \|x - x_1\| < \tfrac{3}{4}\|x\|,$$

and, in general,

$$\|x_n\| = \|x_n + \cdots + x_1 - x + x - x_1 - \cdots - x_{n-1}\| \le \|x - x_1 - \cdots - x_n\| + \|x - x_1 - \cdots - x_{n-1}\| < \frac{3}{2^n}\|x\|,$$

which proves the proposition. $\blacksquare$
7.2.8. Proposition. If $\{A_n\}$ is any countable collection of subsets of $X$ such that $X = \bigcup_{n=1}^{\infty} A_n$, then there is a sphere $S(x_0; \varepsilon) \subset X$ and a set $A_n$ such that $S(x_0; \varepsilon) \subset \bar{A}_n$.

Proof. The proof is by contradiction. Without loss of generality, assume that $A_1 \subset A_2 \subset A_3 \subset \cdots$. For purposes of contradiction assume that for every $x \in X$ and every $n$ there is an $\varepsilon_n > 0$ such that $S(x; \varepsilon_n) \cap A_n = \emptyset$. Now let $x_1 \in X$ and $\varepsilon_1 > 0$ be such that $S(x_1; \varepsilon_1) \cap A_1 = \emptyset$. Let $x_2 \in X$ and $\varepsilon_2 > 0$ be such that $S(x_2; \varepsilon_2) \subset S(x_1; \varepsilon_1)$ and $S(x_2; \varepsilon_2) \cap A_2 = \emptyset$. We see that it is possible to construct a sequence of closed nested spheres, $\{K_n\}$ (see Definition 5.5.34), in such a fashion that the diameter of these spheres, $\operatorname{diam}(K_n)$, converges to zero. In view of part (ii) of Theorem 5.5.35,

$$\bigcap_{n=1}^{\infty} K_n \ne \emptyset.$$

Let $x \in \bigcap_{n=1}^{\infty} K_n$. Then $x \notin A_n$ for all $n$. But this contradicts the fact that $X = \bigcup_{n=1}^{\infty} A_n$. This completes the proof of the proposition. $\blacksquare$
Proof of Theorem 7.2.6. Let

$$A_k = \{y \in Y : \|T^{-1}y\| \le k\|y\|\}, \quad k = 1, 2, \ldots.$$

Clearly, $Y = \bigcup_{k=1}^{\infty} A_k$. By Proposition 7.2.8 there is a sphere $S(y_0; \varepsilon) \subset Y$ and a set $A_n$ such that $S(y_0; \varepsilon) \subset \bar{A}_n$. We may assume that $y_0 \in A_n$. Let $\rho$ be such that $0 < \rho < \varepsilon$, and let us define the sets $B$ and $B_0$ by

$$B = \{y \in S(y_0; \varepsilon) : \rho \le \|y - y_0\|\}$$

and

$$B_0 = \{y \in Y : y = z - y_0,\ z \in B\}.$$

We now show that there is an $A_K$ such that $B_0 \subset \bar{A}_K$. Let $y - y_0 \in B_0$ with $y \in B \cap A_n$. Then

$$\|T^{-1}(y - y_0)\| \le \|T^{-1}y\| + \|T^{-1}y_0\| \le n[\|y\| + \|y_0\|] \le n[\|y - y_0\| + 2\|y_0\|] = n\|y - y_0\|\left[1 + \frac{2\|y_0\|}{\|y - y_0\|}\right] \le n\|y - y_0\|\left[1 + \frac{2\|y_0\|}{\rho}\right].$$

Now let $K$ be a positive integer such that

$$n\left[1 + \frac{2\|y_0\|}{\rho}\right] \le K.$$

It then follows that $y - y_0 \in A_K$, and it follows readily that $B_0 \subset \bar{A}_K$. Now let $y$ be an arbitrary element in $Y$. It is always possible to choose a real number $\lambda$ such that $\lambda y \in B_0$. Thus, there is a sequence $\{y_i\}$ such that $y_i \in A_K$ for all $i$ and $\lim_i y_i = \lambda y$; i.e., the sequence $\{y_i\}$ converges to $\lambda y$. We observe from the definition of $A_K$ that if $y_i \in A_K$, then $\lambda^{-1} y_i \in A_K$ for any real number $\lambda \ne 0$. Hence, $y = \lim_i \lambda^{-1} y_i$, and we have shown that $Y \subset \bar{A}_K$.

Finally, for arbitrary $y \in Y$ we can write, by Proposition 7.2.7,

$$y = y_1 + y_2 + \cdots + y_n + \cdots,$$

where $y_k \in A_K$ and $\|y_k\| \le 3\|y\|/2^k$. Let $x_k = T^{-1}y_k$, $k = 1, 2, \ldots$, and consider the infinite series $\sum_{k=1}^{\infty} x_k$. This series converges, since

$$\sum_{k=1}^{\infty} \|x_k\| \le \sum_{k=1}^{\infty} K\|y_k\| \le \sum_{k=1}^{\infty} \frac{3K\|y\|}{2^k} = 3K\|y\|.$$

Let $x = \sum_{k=1}^{\infty} x_k$. Since $T$ is continuous and since $\sum_{k=1}^{\infty} x_k$ converges, it follows that

$$Tx = T\left(\sum_{k=1}^{\infty} x_k\right) = \sum_{k=1}^{\infty} Tx_k = \sum_{k=1}^{\infty} y_k = y.$$

Hence, $x = T^{-1}y$, and therefore

$$\|T^{-1}y\| = \|x\| \le \sum_{k=1}^{\infty} \|x_k\| \le 3K\|y\|.$$

Thus $T^{-1}$ is bounded, which was to be proved. $\blacksquare$
Utilizing the principle of contraction mappings (see Theorem 5.8.5), we now establish results related to inverses which are important in applications. In the setting of normed linear spaces we can restate the definition of a contraction mapping as being a function $T: X \to X$ ($T$ is not necessarily linear) such that

$$\|T(x) - T(y)\| \le \alpha \|x - y\|$$

for all $x, y \in X$, with $0 \le \alpha < 1$. The principle of contraction mappings asserts that if $T$ is a contraction mapping, then the equation

$$T(x) = x$$

has one and only one solution $x \in X$. We now state and prove the following result.

7.2.9. Theorem. Let $X$ be a Banach space, let $T \in B(X, X)$, let $\lambda \in F$, and let $\lambda \ne 0$. Then

(i) if $|\lambda| > \|T\|$, then $Tx = \lambda x$ has a unique solution, namely $x = 0$;
(ii) if $|\lambda| > \|T\|$, then $(T - \lambda I)^{-1}$ exists and is continuous on $X$;
(iii) if $|\lambda| > \|T\|$, then for a given $y \in X$ there is one and only one vector $x \in X$ such that $(T - \lambda I)x = y$, and

$$x = -\left[\frac{y}{\lambda} + \frac{Ty}{\lambda^2} + \frac{T^2 y}{\lambda^3} + \cdots\right];$$

and
(iv) if $\|I - T\| < 1$, then $T^{-1}$ exists and is continuous on $X$.

Proof. (i) For any $x, y \in X$ we have

$$\|\lambda^{-1}Tx - \lambda^{-1}Ty\| = |\lambda^{-1}| \|T(x - y)\| \le |\lambda^{-1}| \|T\| \|x - y\|.$$

Thus, if $\|T\| < |\lambda|$, then $\lambda^{-1}T$ is a contraction mapping. In view of the principle of contraction mappings there is a unique $x \in X$ with $\lambda^{-1}Tx = x$, or $Tx = \lambda x$. The unique solution has to be $x = 0$, because $T0 = 0$.

(ii) Let $L = \frac{1}{\lambda}T$. Then $\|L\| = \frac{1}{|\lambda|}\|T\| < 1$. It now follows from Theorem 7.2.2 that $(L - I)^{-1}$ exists and is continuous on $X$. Thus, $(\lambda L - \lambda I)^{-1} = (T - \lambda I)^{-1}$ exists and is continuous on $X$. This completes the proof of part (ii). The proofs of the remaining parts are left as an exercise. $\blacksquare$

7.2.10. Exercise. Prove parts (iii) and (iv) of Theorem 7.2.9.
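The series in part (iii) of Theorem 7.2.9 is a practical way to solve $(T - \lambda I)x = y$ when $|\lambda| > \|T\|$. The sketch below (my illustration, not from the text) sums the terms $-T^k y/\lambda^{k+1}$ for a matrix $T$ and checks the result against the defining equation.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 4))
lam = 2.0 * np.linalg.norm(T, 2)   # ensures |lambda| > ||T||, so the series converges
y = rng.normal(size=4)

# x = -[ y/lam + T y/lam^2 + T^2 y/lam^3 + ... ]  (Theorem 7.2.9 (iii))
x = np.zeros(4)
term = y / lam
for _ in range(80):
    x -= term
    term = T @ term / lam

# (T - lam I) x = y, by telescoping of the partial sums
assert np.allclose((T - lam * np.eye(4)) @ x, y)
print("series solution checks out")
```

Each term shrinks by the factor $\|T\|/|\lambda| = 1/2$, so the truncation error after eighty terms is negligible.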
7.3. CONJUGATE AND ADJOINT OPERATORS

Associated with every bounded linear operator defined on a normed linear space is a transformation called its conjugate, and associated with every bounded linear operator defined on an inner product space is a transformation called its adjoint. These operators, which we consider in this section, are of utmost importance in analysis as well as in applications.

Throughout this section $X$ and $Y$ are normed linear spaces over $F$, where $F$ is either $R$ (the real numbers) or $C$ (the complex numbers). In some cases we may further assume that $X$ and $Y$ are inner product spaces, and in other instances we may require that $X$ and/or $Y$ be complete.

Let $X^f$ and $Y^f$ denote the algebraic conjugates of $X$ and $Y$, respectively (refer to Definition 3.5.18). Utilizing the notation of Section 3.5, we write $x' \in X^f$ and $y' \in Y^f$ to denote elements of these spaces. If $T \in L(X, Y)$, we defined the transpose of $T$, $T^T$, to be the mapping from $Y^f$ to $X^f$ determined by the equation

$$\langle x, T^T y' \rangle = \langle Tx, y' \rangle \quad \text{for all } x \in X,\ y' \in Y^f$$

(see Definition 3.5.27), and we showed that $T^T \in L(Y^f, X^f)$.

Now let us assume that $T: X \to Y$ is a bounded linear operator on $X$ into $Y$. Let $X^*$ and $Y^*$ denote the normed conjugate spaces of $X$ and $Y$, respectively (refer to Definition 6.5.9). If $y' \in Y^*$, then $y'(y) = \langle y, y' \rangle$ is defined for every $y \in Y$ and, in particular, it is defined for every $y = Tx$, $x \in X$. The quantity $\langle Tx, y' \rangle = y'(Tx)$ is a scalar for each $x \in X$. Writing $x'(x) = \langle Tx, y' \rangle = y'(Tx)$, we have defined a functional $x'$ on $X$. Since $y'$ is a linear transformation (it is a bounded linear functional) and since $T$ is a linear transformation (it is a bounded linear operator), it follows readily that $x'$ is a linear functional. Also, since $T$ is bounded, we have

$$|x'(x)| = |y'(Tx)| = |\langle Tx, y' \rangle| \le \|y'\| \|Tx\| \le \|y'\| \|T\| \|x\|,$$

and therefore $x'$ is a bounded linear functional and $x' \in X^*$. We have thus assigned to each functional $y' \in Y^*$ a functional $x' \in X^*$; i.e., we have established a linear operator which maps $Y^*$ into $X^*$. This operator is called the conjugate operator of the operator $T$ and is denoted by $T'$. We now have $x' = T'y'$. The definition of $T': Y^* \to X^*$ is usually expressed by the relation

$$\langle x, T'y' \rangle = \langle Tx, y' \rangle, \quad x \in X,\ y' \in Y^*.$$

Utilizing operator notation rather than bracket notation, the conjugate operator $T'$ satisfies the equation

$$x'(x) = y'(Tx) = (T'y')(x), \quad x \in X,$$

and we may therefore write $y'T = T'y'$, where $y'T$ denotes the functional on $X$ consisting of the operators $T$ and $y'$, and $T'y'$ is the functional obtained by operating on $y'$ by $T'$. The reader can readily show that $T'$ is unique and linear. If $Y^* = Y^f$, which is the case if $Y$ is finite dimensional, then the conjugate $T'$ and the transpose $T^T$ are identical concepts. However, since, in general, $Y^*$ is a proper subspace of $Y^f$, $T^T$ is an extension of $T'$ or, conversely, $T'$ is a restriction of $T^T$ to the space $Y^*$. We summarize the above discussion in the following definition and Figure A.

7.3.1. Figure A. (The operator $T$ maps $X$ into $Y$; the conjugate operator $T'$ maps $Y^*$ into $X^*$.)

7.3.2. Definition. Let $T$ be a bounded linear operator on $X$ into $Y$. The conjugate operator of $T$, $T': Y^* \to X^*$, is defined by the formula

$$\langle x, T'y' \rangle = \langle Tx, y' \rangle, \quad x \in X,\ y' \in Y^*.$$

7.3.3. Exercise. Show that the conjugate operator $T'$ is unique and linear.
Before exploring the properties of conjugate operators, we introduce another important operator which is closely related to the conjugate operator, the so-called "adjoint operator." In this case we focus our attention on Hilbert spaces.

Let $X$ and $Y$ denote Hilbert spaces, and let the symbol $(\cdot\,, \cdot)$ denote the inner product on both $X$ and $Y$. If $T$ is a bounded linear transformation on $X$ into $Y$, then in view of the above discussion there is a unique bounded linear operator from $Y^*$ into $X^*$, called the conjugate of $T$. But in view of Theorem 6.14.2, the dual spaces $X^*$, $Y^*$ may be identified with $X$ and $Y$, respectively, because $X$ and $Y$ are Hilbert spaces. This gives rise to a new type of bounded linear operator from $Y$ into $X$, called the adjoint of $T$, which we consider in place of $T'$.

Let $y_0 \in Y$ be fixed, and let $x'(x) = (x, x') = (Tx, y_0)$, where $T \in B(X, Y)$ and $x' \in X^*$. By Theorem 6.14.2 there is a unique $x_0 \in X$ such that $x'(x) = (x, x_0)$. Writing $x_0 = T^* y_0$, we define in this way a transformation of $Y$ into $X$. We call this transformation the adjoint of $T$. Dropping the subscript zero, we characterize the adjoint of $T$ by the formula

$$(Tx, y) = (x, T^*y), \quad x \in X,\ y \in Y.$$

We will now show that $T^*: Y \to X$ is linear, unique, and bounded. To prove linearity, let $x \in X$, $y_1, y_2 \in Y$, let $\alpha, \beta \in F$, and note that

$$(x, T^*(\alpha y_1 + \beta y_2)) = (Tx, \alpha y_1 + \beta y_2) = \bar{\alpha}(Tx, y_1) + \bar{\beta}(Tx, y_2) = \bar{\alpha}(x, T^*y_1) + \bar{\beta}(x, T^*y_2) = (x, \alpha T^*y_1 + \beta T^*y_2).$$

From this it follows that

$$T^*(\alpha y_1 + \beta y_2) = \alpha T^*y_1 + \beta T^*y_2,$$

and therefore $T^*$ is linear. To show that $T^*$ is unique we note that if $(x, T^*y) = (x, S^*y)$, then $(x, T^*y) - (x, S^*y) = 0$ implies $(x, (T^* - S^*)y) = 0$ for all $x \in X$. From this it follows that $(T^* - S^*)y \perp x$ for all $x \in X$, and thus $(T^* - S^*)y = 0$ for all $y \in Y$. Therefore, $T^* = S^*$. To verify that $T^*$ is bounded we observe that

$$\|T^*x\|^2 = |(T^*x, T^*x)| = |(T(T^*x), x)| \le \|T(T^*x)\| \|x\| \le \|T\| \|T^*x\| \|x\|,$$

and thus

$$\|T^*x\| \le \|T\| \|x\|.$$

From this it follows that $T^*$ is bounded and furthermore $\|T^*\| \le \|T\|$. We now give the following formal definition.

7.3.4. Definition. Let $X$ and $Y$ be Hilbert spaces, and let $T$ be a bounded linear operator on $X$ into $Y$. The adjoint operator $T^*: Y \to X$ is defined by the formula

$$(Tx, y) = (x, T^*y), \quad x \in X,\ y \in Y.$$

Summarizing the above discussion we have the following result.

7.3.5. Theorem. The adjoint operator $T^*$ given in Definition 7.3.4 is linear, unique, and bounded.

The reader is cautioned that many authors use the terms conjugate operator and adjoint operator interchangeably. Also, the symbol $T^*$ is used by many authors to denote both adjoint and conjugate operators. Some of the important properties of conjugate operators are summarized in the following result.
7.3.6. Theorem. Conjugate transformations have the following properties:

(i) $\|T'\| = \|T\|$;
(ii) $I' = I$, where $I$ is the identity operator on a normed linear space $X$;
(iii) $0' = 0$, where $0$ is the zero operator on a normed linear space $X$;
(iv) $(S + T)' = S' + T'$, where $S, T \in B(X, Y)$ and where $X, Y$ are normed linear spaces;
(v) $(\alpha T)' = \alpha T'$, where $T \in B(X, Y)$, $\alpha \in F$, and $X, Y$ are normed linear spaces;
(vi) $(ST)' = T'S'$, where $T \in B(X, Y)$, $S \in B(Y, Z)$, and $X, Y, Z$ are normed linear spaces; and
(vii) if $T^{-1}$ exists and if $T^{-1} \in B(Y, X)$, then $(T')^{-1}$ exists, and moreover $(T')^{-1} = (T^{-1})'$.

Proof. To prove part (i) we note that

$$|\langle x, T'y' \rangle| = |\langle Tx, y' \rangle| \le \|Tx\| \|y'\| \le \|T\| \|x\| \|y'\|,$$

from which it follows that $\|T'\| \le \|T\|$. Next, let $x_0 \in X$, $x_0 \ne 0$. In view of the Hahn-Banach theorem (see Corollary 6.8.5) there is a $y_0' \in Y^*$, $\|y_0'\| = 1$, such that $\langle Tx_0, y_0' \rangle = \|Tx_0\|$. Therefore,

$$\|Tx_0\| = |\langle x_0, T'y_0' \rangle| \le \|T'y_0'\| \|x_0\| \le \|T'\| \|x_0\|,$$

from which it follows that $\|T\| \le \|T'\|$. Therefore, $\|T\| = \|T'\|$.

The proofs of properties (ii)-(vi) are straightforward. To prove (iv), for example, we note that

$$\langle x, (S + T)'y' \rangle = \langle (S + T)x, y' \rangle = \langle Sx, y' \rangle + \langle Tx, y' \rangle = \langle x, S'y' \rangle + \langle x, T'y' \rangle = \langle x, (S' + T')y' \rangle.$$

From this it follows that $(S + T)' = S' + T'$.

To prove part (vii), assume that $T \in B(X, Y)$ has a bounded inverse $T^{-1}: Y \to X$. To show that $T': Y^* \to X^*$ has an inverse we must show that it is injective. Let $y_1', y_2' \in Y^*$ be such that $y_1' \ne y_2'$. Then

$$\langle x, T'y_1' \rangle - \langle x, T'y_2' \rangle = \langle Tx, y_1' - y_2' \rangle \ne 0$$

for some $x \in X$. From this it follows that $T'y_1' \ne T'y_2'$, and $T'$ is one-to-one. We can, in fact, show that $T'$ is onto. We note that for any $x' \in X^*$ and any $x \in X$, there is a $y \in Y$ with $Tx = y$, and we have

$$\langle x, x' \rangle = \langle T^{-1}y, x' \rangle = \langle y, (T^{-1})'x' \rangle = \langle Tx, (T^{-1})'x' \rangle = \langle x, T'(T^{-1})'x' \rangle.$$

From this it follows that $x' = T'(T^{-1})'x'$. This shows that $x' \in \mathfrak{R}(T')$ and that $(T')^{-1} = (T^{-1})'$. $\blacksquare$

7.3.7. Exercise. Prove parts (ii), (iii), (v), and (vi) of Theorem 7.3.6.
In the next theorem some of the important properties of adjoint operators are summarized.

7.3.8. Theorem. Let $X$, $Y$, and $Z$ be Hilbert spaces, and let $I$ and $0$ denote the identity and zero transformation on $X$, respectively. Then

(i) $\|T^*\| = \|T\|$, where $T \in B(X, Y)$;
(ii) $I^* = I$;
(iii) $0^* = 0$;
(iv) $(S + T)^* = S^* + T^*$, where $S, T \in B(X, Y)$;
(v) $(\alpha T)^* = \bar{\alpha}T^*$, where $T \in B(X, Y)$ and $\alpha \in F$;
(vi) $(ST)^* = T^*S^*$, where $T \in B(X, Y)$, $S \in B(Y, Z)$;
(vii) if $T^{-1} \in B(Y, X)$ exists, then $(T^*)^{-1} \in B(X, Y)$ exists, and moreover $(T^*)^{-1} = (T^{-1})^*$;
(viii) if for $T \in B(X, Y)$ we define $(T^*)^* = T^{**}$, then $T^{**} = T$; and
(ix) $\|T^*T\| = \|T\|^2$, where $T \in B(X, Y)$.

Proof. To prove part (i) we note that

$$\|T^*x\|^2 = |(T^*x, T^*x)| = |(T(T^*x), x)| \le \|T(T^*x)\| \|x\| \le \|T\| \|T^*x\| \|x\|,$$

or $\|T^*x\| \le \|T\| \|x\|$. From the last inequality it follows that $\|T^*\| \le \|T\|$. Reversing the roles of $T$ and $T^*$ we obtain

$$\|Tx\|^2 = |(Tx, Tx)| = |(T^*(Tx), x)| \le \|T^*(Tx)\| \|x\| \le \|T^*\| \|Tx\| \|x\|,$$

or $\|Tx\| \le \|T^*\| \|x\|$. From this it follows that $\|T\| \le \|T^*\|$, and therefore $\|T\| = \|T^*\|$.

The proofs of properties (ii)-(viii) are trivial. To prove part (ix), we first note that

$$\|T^*T\| \le \|T^*\| \|T\| = \|T\| \|T\| = \|T\|^2.$$

On the other hand,

$$\|Tx\|^2 = (Tx, Tx) = (T^*Tx, x) \le \|T^*Tx\| \|x\| \le \|T^*T\| \|x\| \|x\|.$$

Taking the square root on both sides of the above inequality we obtain

$$\|Tx\| \le \sqrt{\|T^*T\|}\,\|x\|,$$

and thus $\|T\| \le \sqrt{\|T^*T\|}$, or $\|T\|^2 \le \|T^*T\|$. Hence, $\|T^*T\| = \|T\|^2$. $\blacksquare$

7.3.9. Exercise. Prove parts (ii)-(viii) of Theorem 7.3.8.
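On $C^n$, where the adjoint is the conjugate transpose and the operator norm is the spectral norm, properties (i) and (ix) of Theorem 7.3.8 are easy to observe numerically. The sketch below (my illustration, not part of the text) checks them for a random complex matrix, including a rectangular one mapping $C^3$ into $C^5$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))
A_star = A.conj().T                    # adjoint = conjugate transpose on C^n

norm = lambda M: np.linalg.norm(M, 2)  # induced 2-norm (spectral norm)
assert np.isclose(norm(A_star), norm(A))           # property (i):  ||T*|| = ||T||
assert np.isclose(norm(A_star @ A), norm(A) ** 2)  # property (ix): ||T*T|| = ||T||^2
print(norm(A), norm(A_star @ A))
```

Property (ix) is the identity underlying Example 7.1.22(ii): $\|A\|^2$ equals the largest eigenvalue of the hermitian matrix $A^*A$.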
From the above discussion it is obvious that adjoint operators are distinct from conjugate operators even though many of their properties appear to be identical, especially for the case of real spaces. We now cite a few examples to illustrate some of the concepts considered above.

7.3.10. Example. Let $X = C^n$ be the Hilbert space with inner product defined in Example 3.6.24, and let $A \in L(X, X)$ be represented (with respect to the natural basis for $X$) by the $n \times n$ matrix $A = [a_{ij}]$. The transformation $y = Ax$ can be written in the form

$$\eta_i = \sum_{j=1}^n a_{ij}\xi_j, \quad i = 1, 2, \ldots, n,$$

where $\eta_i$ is the $i$th component of the vector $y \in X$. Let $A^*$ denote the adjoint of $A$ on the Hilbert space $X$, and let $A^*$ be represented by the $n \times n$ matrix $[a_{ij}^*]$. Now if $u = (u_1, \ldots, u_n) \in X$, then

$$(Ax, u) = (y, u) = \sum_{i=1}^n \eta_i \bar{u}_i = \sum_{i=1}^n \bar{u}_i \left(\sum_{j=1}^n a_{ij}\xi_j\right)$$

and

$$(x, A^*u) = \sum_{i=1}^n \xi_i \overline{\left(\sum_{j=1}^n a_{ij}^* u_j\right)}.$$

In order that $(Ax, u) = (x, A^*u)$ we must have $a_{ij}^* = \bar{a}_{ji}$; i.e., the matrix of $A^*$ is the transpose of the conjugate of the matrix of $A$. $\blacksquare$
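The defining identity $(Ax, u) = (x, A^*u)$ of Example 7.3.10 can be checked directly in floating point. This sketch (my illustration, not from the text) uses the inner product convention of the example, linear in the first argument and conjugate-linear in the second.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
x = rng.normal(size=n) + 1j * rng.normal(size=n)
u = rng.normal(size=n) + 1j * rng.normal(size=n)

# (a, b) = sum_i a_i * conj(b_i): linear in a, conjugate-linear in b
inner = lambda a, b: np.sum(a * b.conj())

A_star = A.conj().T    # a*_ij = conj(a_ji), as derived in Example 7.3.10
assert np.isclose(inner(A @ x, u), inner(x, A_star @ u))
print("adjoint is the conjugate transpose")
```

Any other candidate matrix for $A^*$ (e.g. the plain transpose) would fail this identity for complex data, which is the content of the computation in the example.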
7.3.11. Example. Let $X = Y = L_2[a, b]$, $a < b$ (see Example 6.11.10), and define the Fredholm operator $T$ by

$$y(t) = (Tx)(t) = \int_a^b k(s, t)x(s)\,ds, \quad t \in [a, b],$$

where it is assumed that the kernel function $k(s, t)$ is well enough behaved so that

$$\int_a^b \int_a^b |k(s, t)|^2\,dt\,ds < \infty.$$

Now if $u \in L_2[a, b]$, then

$$(Tx, u) = (y, u) = \int_a^b y(t)\overline{u(t)}\,dt = \int_a^b \left(\int_a^b k(s, t)x(s)\,ds\right)\overline{u(t)}\,dt = \int_a^b x(s)\overline{\left(\int_a^b \overline{k(s, t)}\,u(t)\,dt\right)}\,ds.$$

From this it follows that the adjoint $T^*$ of $T$ maps $u$ into the function

$$z(t) = (T^*u)(t) = \int_a^b \overline{k(t, s)}\,u(s)\,ds;$$

i.e., the adjoint of $T$ is obtained by interchanging the roles of $s$ and $t$ in the kernel and by utilizing the complex conjugate of $k$. $\blacksquare$
7.3.12. Exercise. Let $X = Y = l_2$ (see Example 6.1.6) and define $T: X \to Y$ by

$$T(\xi_1, \xi_2, \ldots, \xi_n, \ldots) = (0, \xi_1, \xi_2, \ldots, \xi_n, \ldots)$$

for all $x = (\xi_1, \xi_2, \ldots, \xi_n, \ldots) \in l_2$. Show that $T^*: l_2 \to l_2$, defined by

$$T^*(\eta_1, \eta_2, \ldots, \eta_n, \ldots) = (\eta_2, \eta_3, \ldots, \eta_n, \ldots)$$

for all $y = (\eta_1, \eta_2, \ldots, \eta_n, \ldots) \in l_2$, is the adjoint of $T$.
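A quick sanity check for Exercise 7.3.12: on finite truncations of real $l_2$ sequences the right shift and the left shift satisfy $(Tx, y) = (x, T^*y)$. The sketch below is my own illustration, not part of the text; truncation to 50 coordinates stands in for the infinite sequences.

```python
import numpy as np

def T(x):
    """Right shift: (xi_1, xi_2, ...) -> (0, xi_1, xi_2, ...)."""
    return np.concatenate(([0.0], x[:-1]))

def T_star(y):
    """Left shift: (eta_1, eta_2, ...) -> (eta_2, eta_3, ...)."""
    return np.concatenate((y[1:], [0.0]))

rng = np.random.default_rng(5)
x, y = rng.normal(size=50), rng.normal(size=50)
# (Tx, y) = (x, T*y) on truncated real sequences
assert np.isclose(np.dot(T(x), y), np.dot(x, T_star(y)))
print("shift adjoint verified")
```

Note that $T^*T = I$ while $TT^* \ne I$, a pair worth computing by hand alongside the exercise.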
Recalling the definition of orthogonal complement (refer to Definition 6.12.1), we have the following important results for bounded linear operators on Hilbert spaces.

7.3.13. Theorem. Let $T$ be a bounded linear operator on a Hilbert space $X$ into a Hilbert space $Y$. Then

(i) $\{\overline{\mathfrak{R}(T)}\}^{\perp} = \mathfrak{N}(T^*)$;
(ii) $\overline{\mathfrak{R}(T)} = \mathfrak{N}(T^*)^{\perp}$;
(iii) $\mathfrak{N}(T) = \{\overline{\mathfrak{R}(T^*)}\}^{\perp}$;
(iv) $\overline{\mathfrak{R}(T^*)} = \mathfrak{N}(T)^{\perp}$;
(v) $\mathfrak{N}(T^*) = \mathfrak{N}(TT^*)$; and
(vi) $\overline{\mathfrak{R}(T)} = \overline{\mathfrak{R}(TT^*)}$.

Proof. We prove (i) and (v) and leave the proofs of (ii)-(iv) and (vi) as an exercise.

To prove (i), we first show that $\mathfrak{R}(T)^{\perp} = \mathfrak{N}(T^*)$. Let $y \in \mathfrak{R}(T)^{\perp}$. Then $(Tx, y) = 0$ for all $x \in X$, and hence $(x, T^*y) = 0$ for all $x \in X$. This can be true only if $T^*y = 0$; i.e., $y \in \mathfrak{N}(T^*)$. On the other hand, if $y \in \mathfrak{N}(T^*)$, then $(x, T^*y) = 0$ for all $x \in X$. Thus, $(Tx, y) = 0$ for every $x \in X$, which implies that $y \in \mathfrak{R}(T)^{\perp}$. Now $\mathfrak{R}(T)$ need not be closed. However, by Theorem 6.12.14, $\overline{\mathfrak{R}(T)} = \mathfrak{R}(T)^{\perp\perp}$. Therefore, $\{\overline{\mathfrak{R}(T)}\}^{\perp} = \mathfrak{R}(T)^{\perp\perp\perp} = \mathfrak{R}(T)^{\perp} = \mathfrak{N}(T^*)$.

To prove (v), let $y \in \mathfrak{N}(T^*)$. Then $T^*y = 0$ and $TT^*y = 0$. This implies that $\mathfrak{N}(T^*) \subset \mathfrak{N}(TT^*)$. Next, let $y \in \mathfrak{N}(TT^*)$. Then $TT^*y = 0$ and $(y, TT^*y) = 0$. This implies that $(T^*y, T^*y) = 0$, so that $T^*y = 0$. Therefore, $y \in \mathfrak{N}(T^*)$ and $\mathfrak{N}(TT^*) \subset \mathfrak{N}(T^*)$, completing the proof of part (v). $\blacksquare$

7.3.14. Exercise. Prove parts (ii)-(iv) and (vi) of Theorem 7.3.13.
We conclude this section with the following results.

7.3.15. Theorem. Let $T \in B(X, X)$, where $X$ is a Hilbert space, and let $M$ and $N$ be subsets of $X$. Define $T(M)$ as

$$T(M) = \{y : y = Tx,\ x \in M\}.$$

If $T(M) \subset N$, then $T^*(N^{\perp}) \subset M^{\perp}$.

Proof. Let $z \perp N$. Then for $x \in M$ we have $(Tx, z) = 0 = (x, T^*z)$. Therefore, $T^*z \perp x$ for all $x \in M$ and $T^*z \in M^{\perp}$. $\blacksquare$

7.3.16. Theorem. Let $T \in B(X, X)$, where $X$ is a Hilbert space, and let $M$ and $N$ be closed linear subspaces of $X$. Then $T(M) \subset N$ if and only if $T^*(N^{\perp}) \subset M^{\perp}$.

Proof. If $T(M) \subset N$, then by Theorem 7.3.15, $T^*(N^{\perp}) \subset M^{\perp}$. Conversely, if $T^*(N^{\perp}) \subset M^{\perp}$, then by Theorem 7.3.15, $T^{**}(M^{\perp\perp}) \subset N^{\perp\perp}$. But $T^{**} = T$, and if $M$ and $N$ are closed linear subspaces, then $M^{\perp\perp} = M$ and $N^{\perp\perp} = N$. Therefore, $T(M) \subset N$. $\blacksquare$
7.4. HERMITIAN OPERATORS

Throughout this section $X$ denotes a complex Hilbert space. We shall be primarily concerned with operators $T \in B(X, X)$. By $T^*$ we shall always mean the adjoint of $T$. For our first result, recall the definition of a bilinear functional (Definition 3.6.4).

7.4.1. Theorem. Let $T \in B(X, X)$ and define the function $\varphi: X \times X \to C$ by $\varphi(x, y) = (Tx, y)$ for all $x, y \in X$. Then $\varphi$ is a bilinear functional.

7.4.2. Exercise. Prove Theorem 7.4.1.

Of central importance in this section is the following class of operators.

7.4.3. Definition. A bounded linear transformation $T \in B(X, X)$ is said to be hermitian if $T = T^*$.

Some authors call such transformations self-adjoint operators (see Definition 4.10.20). The next two results allow us to characterize a hermitian operator in an equivalent manner. The first of these involves symmetric bilinear forms (see Definition 3.6.10).

7.4.4. Theorem. Let $T \in B(X, X)$. Then $T$ is hermitian if and only if the bilinear functional $\varphi(x, y) = (Tx, y)$ is symmetric.

Proof. If $T^* = T$, then $\varphi(x, y) = (Tx, y) = (x, T^*y) = (x, Ty) = \overline{(Ty, x)} = \overline{\varphi(y, x)}$, and therefore $\varphi$ is symmetric. Conversely, assume that $\varphi(x, y) = \overline{\varphi(y, x)}$. Then $(Tx, y) = \overline{(Ty, x)} = (x, Ty)$; i.e., $(x, T^*y) = (x, Ty)$ for all $x, y \in X$. From this it follows that

$$(x, (T^* - T)y) = 0,$$

and thus $(T^* - T)y \perp x$ for all $x \in X$. This implies that $T^*y = Ty$ for all $y \in X$, or $T^* = T$. $\blacksquare$

7.4.5. Theorem. Let $T \in B(X, X)$. Then $T$ is hermitian if and only if $(Tx, x)$ is real for every $x \in X$.

Proof. If $T$ is hermitian, then $(Tx, x) = (x, T^*x) = (x, Tx) = \overline{(Tx, x)}$, which implies that $(Tx, x)$ is real. Conversely, suppose $(Tx, x)$ is real for every $x \in X$. Then $(x, Tx) = \overline{(Tx, x)} = (Tx, x)$. Now consider $(x, Ty)$ for arbitrary $x, y \in X$. It is easily verified that

$$(x + y, T(x + y)) - (x - y, T(x - y)) + i(x + iy, T(x + iy)) - i(x - iy, T(x - iy)) = 4(x, Ty) \tag{7.4.6}$$

and

$$(T(x + y), x + y) - (T(x - y), x - y) + i(T(x + iy), x + iy) - i(T(x - iy), x - iy) = 4(Tx, y), \tag{7.4.7}$$

where $i = \sqrt{-1}$. Since the left-hand sides of Eqs. (7.4.6) and (7.4.7) are equal, it follows that $(x, Ty) = (Tx, y)$ for all $x, y \in X$, and hence $T = T^*$. $\blacksquare$

The norm of a hermitian operator can be found as follows.

7.4.8. Theorem. Let $T \in B(X, X)$ be a hermitian operator. Then the norm of $T$ can be expressed in the following equivalent ways:

(i) $\|T\| = \sup\{|(Tx, x)| : \|x\| = 1\}$; and
(ii) $\|T\| = \sup\{|(Tx, y)| : \|x\| = \|y\| = 1\}$.

7.4.9. Exercise. Prove Theorem 7.4.8.
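Theorem 7.4.5 gives a very concrete test on $C^n$: the quadratic form $(Tx, x)$ is real for every $x$ exactly when the matrix is hermitian. The sketch below (my own illustration, not from the text) checks this for the hermitian part of a random matrix, and exhibits a small non-hermitian matrix whose quadratic form takes a non-real value.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (B + B.conj().T) / 2                  # hermitian: H = H*
x = rng.normal(size=4) + 1j * rng.normal(size=4)

# (Tx, x) under the convention (a, b) = sum_i a_i conj(b_i)
quad = lambda T, v: v.conj() @ T @ v
assert abs(quad(H, x).imag) < 1e-10       # hermitian => (Tx, x) real (Theorem 7.4.5)

N = np.array([[0.0, 1.0], [0.0, 0.0]], dtype=complex)   # not hermitian
z = np.array([1.0, 1.0j])
assert abs(quad(N, z).imag) > 0.5         # here (Nz, z) = 1j, which is not real
print("Theorem 7.4.5 illustrated")
```

The failure direction is the substantive half of the theorem: on a complex space, realness of $(Tx, x)$ for all $x$ forces $T = T^*$ via the polarization identities (7.4.6) and (7.4.7).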
In the next theorem, some of the more important properties of hermitian operators are given. 7.4.10. Theorem. eL t be a real scalar. Then (i) (ii) (iii) (iv) 7.4 . H .
S, T E B(X,
be hermitian operators, and let ex
X)
(S + T) is a hermitian operator; exT is a hermitian operator; if T is bijective, then T- I is hermitian; and ST is hermitian if and only if ST = TS. Exercise.
Prove Theorem 7.4.10.
Since in the case of hermitian operators (Tx, x) is real for all x ∈ X, the following definition concerning definiteness applies (recall Definition 3.6.10).

7.4.12. Definition. Let T ∈ B(X, X) be a hermitian operator. Then T is said to be positive if (Tx, x) ≥ 0 for all x ∈ X. In this case we write T ≥ 0. If (Tx, x) > 0 for all x ≠ 0, we say that T is strictly positive.

7.4.13. Definition. Let S, T ∈ B(X, X) be hermitian operators. If the hermitian operator T + (−S) = T − S ≥ 0, then we write T ≥ S or, equivalently, S ≤ T.

7.4.14. Theorem. Let S, T, U ∈ B(X, X) be hermitian operators, and let α be a real scalar. Then,
(i) if S ≥ 0 and T ≥ 0, then (S + T) ≥ 0;
(ii) if α > 0 and T ≥ 0, then αT ≥ 0;
(iii) if S ≤ T and T ≤ U, then S ≤ U; and
(iv) for any V ∈ B(X, X), if T ≥ 0, then V*TV ≥ 0. In particular, V*V ≥ 0.

Proof. The proofs of parts (i)–(iii) are obvious. For example, if S ≥ 0 and T ≥ 0, then (Sx, x) + (Tx, x) = ((S + T)x, x) ≥ 0, and so (S + T) ≥ 0. To prove part (iv), we note that (V*TVx, x) = (TVx, Vx) ≥ 0, since Vx = y is a vector in X and (Ty, y) ≥ 0 for all y ∈ X. If we consider, in particular, T = I = I*, then V*V ≥ 0. ∎
The proof of the next result follows by direct verification of the formulas involved.
Chapter 7 / Linear Operators

7.4.15. Theorem. Let A ∈ B(X, X), and let

U = ½[A + A*]   and   V = (1/(2i))[A − A*],

where i = √−1. Then
(i) U and V are hermitian operators; and
(ii) if A = C + iD, where C and D are hermitian, then C = U and D = V.

7.4.16. Exercise. Prove Theorem 7.4.15.
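Theorem 7.4.15 is the operator analogue of splitting a complex number into real and imaginary parts. In matrix form it reads U = ½(A + A*), V = (1/(2i))(A − A*), and a quick NumPy check (with an arbitrary illustrative matrix A) confirms both claims of the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

U = (A + A.conj().T) / 2          # "real" (hermitian) part of A
V = (A - A.conj().T) / (2j)       # "imaginary" part, also hermitian

assert np.allclose(U, U.conj().T)         # U is hermitian
assert np.allclose(V, V.conj().T)         # V is hermitian
assert np.allclose(A, U + 1j * V)         # A = U + iV
```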
Let us now consider some specific cases.

7.4.17. Example. Let X = Cⁿ with inner product given in Example 3.6.24. Let A ∈ B(X, X), and let {e₁, ..., eₙ} be any orthonormal basis for X. As we saw in Example 7.3.10, if A is represented by the matrix A, then A* is represented by the conjugate transpose A* = Āᵀ. In this case A is hermitian if and only if A = Āᵀ. ∎

7.4.18. Example. Let X = L₂[a, b] (see Example 6.11.10), and define T ∈ B(X, X) by y = Tx, where

y(t) = t x(t).

Then for any z ∈ X we have

(Tx, z) = ∫_a^b t x(t) z̄(t) dt = ∫_a^b x(t) (t z(t))‾ dt = (x, Tz) = (T*x, z).

Thus, T = T* and T is hermitian. ∎

7.4.19. Exercise. Let X = L₂[a, b], and define T: X → X by

(Tx)(t) = ∫_a^t x(s) ds.

Show that T* ≠ T and therefore T is not hermitian.

7.4.20. Exercise. Let X = L₂[a, b] and consider the Fredholm operator given in Example 7.3.11; i.e.,

(Tx)(t) = ∫_a^b k(s, t) x(s) ds,   t ∈ [a, b].

Show that T = T* if and only if k(t, s) = k̄(s, t).
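Discretizing the two operators makes the contrast between Examples 7.4.18 and 7.4.19 concrete: sampling (Tx)(t) = t x(t) on a grid gives a real diagonal (hence hermitian) matrix, while a crude quadrature of the Volterra operator (Tx)(t) = ∫_a^t x(s) ds gives a lower-triangular matrix that is visibly not hermitian. A sketch, with the grid and quadrature rule as illustrative choices:

```python
import numpy as np

n, a, b = 5, 0.0, 1.0
t = np.linspace(a, b, n)
h = (b - a) / (n - 1)

# Multiplication operator x(t) -> t*x(t): real diagonal, hermitian.
T_mult = np.diag(t)
print(np.allclose(T_mult, T_mult.conj().T))     # True

# Volterra operator x -> integral_a^t x(s) ds, crude left-endpoint rule:
# strictly lower-triangular, clearly not hermitian.
T_volt = h * np.tril(np.ones((n, n)), k=-1)
print(np.allclose(T_volt, T_volt.conj().T))     # False
```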
We conclude this section with the following result, which we will subsequently require.
7.4.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X) be a hermitian operator, and let λ ∈ R. Then there exists a real number γ > 0 such that γ||x|| ≤ ||(T − λI)x|| for all x ∈ X if and only if (T − λI) is bijective and (T − λI)⁻¹ ∈ B(X, X), in which case ||(T − λI)⁻¹|| ≤ 1/γ.

Proof. Let T_λ = T − λI. It follows from Theorem 7.4.10 that T_λ is also hermitian. To prove sufficiency, let T_λ⁻¹ ∈ B(X, X). It follows that for all y ∈ X, ||T_λ⁻¹y|| ≤ ||T_λ⁻¹|| ||y||. Letting y = T_λx and γ = ||T_λ⁻¹||⁻¹, we have γ||x|| ≤ ||T_λx|| for all x ∈ X. To prove necessity, let γ > 0 be such that γ||x|| ≤ ||T_λx|| for all x ∈ X. We see that T_λx = 0 implies x = 0; i.e., N(T_λ) = {0}, and so T_λ is injective. We next show that R(T_λ)‾ = X, where R(T_λ)‾ denotes the closure of the range of T_λ. It follows from Theorem 6.12.16 that X = R(T_λ)‾ ⊕ R(T_λ)⊥. From Theorem 7.3.13, we have R(T_λ)⊥ = N(T_λ*). Since T_λ is hermitian, N(T_λ*) = N(T_λ) = {0}. Hence, R(T_λ)‾ = X. We next show that R(T_λ)‾ = R(T_λ); i.e., the range of T_λ is closed. Let {yₙ} be a sequence in R(T_λ) such that yₙ → y. Then there is a sequence {xₙ} in X such that T_λxₙ = yₙ. For any positive integers m, n, γ||x_m − xₙ|| ≤ ||T_λx_m − T_λxₙ|| = ||y_m − yₙ||. Since {yₙ} is Cauchy, {xₙ} must also be Cauchy. Let xₙ → x. Then yₙ = T_λxₙ → T_λx = y. Thus, y ∈ R(T_λ), and so R(T_λ) is closed. This proves that T_λ is bijective. Finally, γ||T_λ⁻¹y|| ≤ ||y|| for all y ∈ X implies T_λ⁻¹ ∈ B(X, X) and ||T_λ⁻¹|| ≤ 1/γ. This completes the proof of the theorem. ∎
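In finite dimensions Theorem 7.4.21 can be illustrated directly: for a hermitian matrix T and a real λ not in the spectrum, the best constant γ with γ||x|| ≤ ||(T − λI)x|| is min_k |μ_k − λ| over the eigenvalues μ_k, and ||(T − λI)⁻¹|| = 1/γ. A sketch with an arbitrary illustrative matrix:

```python
import numpy as np

T = np.diag([1.0, 2.0, 5.0])      # hermitian, eigenvalues 1, 2, 5
lam = 3.0                         # real, not an eigenvalue

M = T - lam * np.eye(3)
gamma = min(abs(mu - lam) for mu in [1.0, 2.0, 5.0])   # best lower bound, here 1.0
inv_norm = np.linalg.norm(np.linalg.inv(M), 2)

# ||(T - lam I)^{-1}|| = 1/gamma, and gamma ||x|| <= ||(T - lam I) x||.
print(gamma, inv_norm)
x = np.array([1.0, -2.0, 0.5])
assert gamma * np.linalg.norm(x) <= np.linalg.norm(M @ x) + 1e-12
```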
7.5. OTHER LINEAR OPERATORS: NORMAL OPERATORS, PROJECTIONS, UNITARY OPERATORS, AND ISOMETRIC OPERATORS

In this section we consider additional important types of linear operators. Throughout this section X is a complex Hilbert space, T* denotes the adjoint of T ∈ B(X, X), and I ∈ B(X, X) denotes the identity operator.

7.5.1. Definition. An operator T ∈ B(X, X) is said to be a normal operator if T*T = TT*.

7.5.2. Definition. An operator T ∈ B(X, X) is said to be an isometric operator if T*T = I.

7.5.3. Definition. An operator T ∈ B(X, X) is said to be a unitary operator if T*T = TT* = I.

Our first result is for normal operators.
7.5.4. Theorem. Let T ∈ B(X, X). Let U, V ∈ B(X, X) be hermitian operators such that T = U + iV. Then T is normal if and only if UV = VU.

7.5.5. Exercise. Prove Theorem 7.5.4. Recall that U and V are unique by Theorem 7.4.15.
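Theorem 7.5.4 can be checked on matrices: writing T = U + iV with the hermitian parts of Theorem 7.4.15, T is normal exactly when U and V commute. A sketch using an illustrative normal matrix (a rotation) and a non-normal one (a nilpotent shift):

```python
import numpy as np

def parts(T):
    # Hermitian "real" and "imaginary" parts of Theorem 7.4.15.
    U = (T + T.conj().T) / 2
    V = (T - T.conj().T) / (2j)
    return U, V

def is_normal(T):
    return np.allclose(T @ T.conj().T, T.conj().T @ T)

R = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation by 90 degrees: normal
N = np.array([[0.0, 1.0], [0.0, 0.0]])    # nilpotent shift: not normal

for T in (R, N):
    U, V = parts(T)
    # T is normal  <=>  UV = VU
    print(is_normal(T), np.allclose(U @ V, V @ U))
```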
For the next result, recall that a linear subspace Y of X is invariant under a linear transformation T if T(Y) ⊂ Y (see Definition 3.7.9). Also, recall that a closed linear subspace Y of a Hilbert space X is itself a Hilbert space with inner product induced by the inner product on X (see Theorem 6.2.1).

7.5.6. Theorem. Let T ∈ B(X, X) be a normal operator, and let Y be a closed linear subspace of X which is invariant under T. Let T₁ be the restriction of T to Y. Then T₁ ∈ B(Y, Y) and T₁ is normal.

7.5.7. Exercise. Prove Theorem 7.5.6.
For isometric operators we have the following result.

7.5.8. Theorem. Let T ∈ B(X, X). Then the following are equivalent:
(i) T is isometric;
(ii) (Tx, Ty) = (x, y) for all x, y ∈ X; and
(iii) ||Tx − Ty|| = ||x − y|| for all x, y ∈ X.

Proof. If T is isometric, then (x, y) = (Ix, y) = (T*Tx, y) = (Tx, Ty) for all x, y ∈ X. Next, assume that (Tx, Ty) = (x, y). Then ||Tx − Ty||² = ||T(x − y)||² = (T(x − y), T(x − y)) = ((x − y), (x − y)) = ||x − y||²; i.e., ||Tx − Ty|| = ||x − y||. Finally, assume that ||Tx − Ty|| = ||x − y||. Then (T*Tx, x) = (Tx, Tx) = ||Tx||² = ||x||² = (x, x); i.e., (T*Tx, x) = (x, x) for all x ∈ X. But this implies that T*T = I; i.e., T is isometric. ∎
From Theorem 7.5.8 there follows the following corollary.

7.5.9. Corollary. If T ∈ B(X, X) is an isometric operator, then ||Tx|| = ||x|| for all x ∈ X and ||T|| = 1.

For unitary operators we have the following result.

7.5.10. Theorem. Let T ∈ B(X, X). Then the following are equivalent:
(i) T is unitary;
(ii) T* is unitary;
(iii) T and T* are isometric;
(iv) T is isometric and T* is injective;
(v) T is isometric and surjective; and
(vi) T is bijective and T⁻¹ = T*.

7.5.11. Exercise. Prove Theorem 7.5.10.
Before considering projections, let us briefly return to Section 3.7. Recall that if (a linear space) X is the direct sum of two linear subspaces X₁ and X₂, i.e., X = X₁ ⊕ X₂, then for each x ∈ X there exist unique x₁ ∈ X₁ and x₂ ∈ X₂ such that x = x₁ + x₂. We call the mapping P: X → X defined by Px = x₁ the projection on X₁ along X₂. Recall that P ∈ L(X, X), R(P) = X₁, and N(P) = X₂. Furthermore, recall that if P ∈ L(X, X) is such that P² = P, then P is said to be idempotent, and this condition is both necessary and sufficient for P to be a projection on R(P) along N(P) (see Theorem 3.7.4). Now if X is a Hilbert space and if X₁ = Y is a closed linear subspace of X, then X₂ = Y⊥ and X = Y ⊕ Y⊥ (see Theorem 6.12.16). If for this particular case P is the projection on Y along Y⊥, then P is an orthogonal projection (see Definition 3.7.16). In this case we shall simply call P the orthogonal projection on Y.
7.5.12. Theorem. Let Y be a closed linear subspace of X such that Y ≠ {0} and Y ≠ X. Let P be the orthogonal projection onto Y. Then
(i) P ∈ B(X, X);
(ii) ||P|| = 1; and
(iii) P* = P.

Proof. We know that P ∈ L(X, X). To show that P is bounded, let x = x₁ + x₂, where x₁ ∈ Y and x₂ ∈ Y⊥. Then ||Px|| = ||x₁|| ≤ ||x||. Hence, P is bounded and ||P|| ≤ 1. If x₂ = 0, then ||Px|| = ||x||, and so ||P|| = 1. To prove (iii), let x, y ∈ X be given by x = x₁ + x₂ and y = y₁ + y₂, respectively, where x₁, y₁ ∈ Y and x₂, y₂ ∈ Y⊥. Then (x, Py) = (x₁ + x₂, y₁) = (x₁, y₁) and (Px, y) = (x₁, y₁ + y₂) = (x₁, y₁). Thus, (x, Py) = (Px, y) for all x, y ∈ X. This implies that P = P*. ∎
From the above theorem it follows that an orthogonal projection is a hermitian operator.
7.5.13. Theorem. Let Y be a closed linear subspace of X, and let P be the orthogonal projection onto Y. If Y₁ = {x ∈ X: Px = x} and if Y₂ is the range of P, then Y = Y₁ = Y₂.

Proof. Since Px = x for every x ∈ Y, we have Y ⊂ Y₁; since Y₁ ⊂ Y₂ and since Y₂ ⊂ Y, it follows that Y = Y₁ = Y₂. ∎

7.5.14. Theorem. Let P ∈ L(X, X). If P is idempotent and hermitian, then

Y = {x ∈ X: Px = x}

is a closed linear subspace of X and P is the orthogonal projection onto Y.
Proof. Since P is a linear operator, we have P(αx + βy) = αPx + βPy. If x, y ∈ Y, then Px = x and Py = y, and it follows that P(αx + βy) = αx + βy. Therefore, (αx + βy) ∈ Y and Y is a linear subspace of X. We must show that Y is a closed linear subspace. First, however, we show that P is bounded and therefore continuous. Since

||Pz||² = (Pz, Pz) = (P*Pz, z) = (P²z, z) = (Pz, z) ≤ ||Pz|| ||z||,

we have ||Pz|| ≤ ||z|| and ||P|| = 1. To show that Y is a closed linear subspace of X, let x₀ be a point of accumulation of the space Y. Then there is a sequence of vectors {xₙ} in Y such that lim ||xₙ − x₀|| = 0. Since xₙ ∈ Y, we can put Pxₙ = xₙ, and we have ||Pxₙ − x₀|| → 0 as n → ∞. Since P is bounded, it is continuous, and thus we also have ||Pxₙ − Px₀|| → 0 as n → ∞, and hence x₀ ∈ Y. Finally, we must show that P is an orthogonal projection. Let x ∈ Y, and let y ∈ Y⊥. Then (Py, x) = (y, Px) = (y, x) = 0, since x ⊥ y. Therefore, Py ⊥ x and Py ∈ Y⊥. But P(Py) = Py, since P² = P, and thus Py ∈ Y. Therefore, it follows that Py = 0, because Py ∈ Y and Py ∈ Y⊥. Now let z = x + y ∈ X, where x ∈ Y and y ∈ Y⊥. Then Pz = Px + Py = x + 0 = x. Hence, P is an orthogonal projection onto Y. ∎
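Theorem 7.5.14 is the standard numerical test for an orthogonal projection matrix: P² = P and P* = P together force P to project orthogonally onto its range. The familiar least-squares projector P = A(A*A)⁻¹A* is a convenient illustration (A below is an arbitrary full-rank matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])                  # columns span a 2-dim subspace Y of R^3

P = A @ np.linalg.inv(A.T @ A) @ A.T        # projector onto Y = range(A)

assert np.allclose(P @ P, P)                # idempotent
assert np.allclose(P, P.T)                  # hermitian (real symmetric)

# Consequences: x - Px is orthogonal to Y, and ||P|| = 1.
x = np.array([3.0, -1.0, 2.0])
assert np.allclose(A.T @ (x - P @ x), 0.0)  # residual is orthogonal to range(A)
print(np.linalg.norm(P, 2))
```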
The next result is a direct consequence of Theorem 7.5.14.

7.5.15. Corollary. Let Y be a closed linear subspace of X, and let P be the orthogonal projection onto Y. Then P(Y⊥) = {0}.

7.5.16. Exercise. Prove Corollary 7.5.15.
The next result yields the representation of an orthogonal projection onto a finite-dimensional subspace of X.

7.5.17. Theorem. Let {x₁, ..., xₙ} be a finite orthonormal set in X, and let Y be the linear subspace of X generated by {x₁, ..., xₙ}. Then the orthogonal projection of X onto Y is given by

Px = Σ_{i=1}^n (x, xᵢ)xᵢ   for all x ∈ X.
Proof. We first note that Y is a closed linear subspace of X by Theorem 6.6.6. We now show that P is a projection by proving that P² = P. For any j = 1, ..., n we have

Pxⱼ = Σ_{i=1}^n (xⱼ, xᵢ)xᵢ = xⱼ.   (7.5.18)

Hence, for any x ∈ X we have

P²x = P(Σ_{i=1}^n (x, xᵢ)xᵢ) = Σ_{i=1}^n (x, xᵢ)Pxᵢ = Σ_{i=1}^n (x, xᵢ)xᵢ = Px.

Next, we show that R(P) = Y. It is clear that R(P) ⊂ Y. To show that Y ⊂ R(P), let y ∈ Y. Then

y = η₁x₁ + ... + ηₙxₙ

for some {η₁, ..., ηₙ}. It follows from Eq. (7.5.18) that Py = y, and so y ∈ R(P). Finally, to show that P is an orthogonal projection, we must show that R(P) ⊥ N(P). To do so, let x ∈ N(P) and let y ∈ R(P). Then

(x, y) = (x, Py) = (x, Σ_{i=1}^n (y, xᵢ)xᵢ) = Σ_{i=1}^n (xᵢ, y)(x, xᵢ) = (Σ_{i=1}^n (x, xᵢ)xᵢ, y) = (Px, y) = (0, y) = 0.

This completes the proof. ∎
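For a subspace of Rⁿ spanned by an orthonormal set, the formula of Theorem 7.5.17 is a sum of rank-one terms (x, xᵢ)xᵢ; equivalently P = QQ*, where the columns of Q are the orthonormal vectors. A sketch (the spanning vectors are arbitrary illustrative choices, orthonormalized by QR):

```python
import numpy as np

# Orthonormalize two illustrative vectors in R^4.
B = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 2.0]])
Q, _ = np.linalg.qr(B)            # columns of Q: an orthonormal set {x_1, x_2}

x = np.array([1.0, 2.0, 3.0, 4.0])

# Px = sum_i (x, x_i) x_i  -- real inner products here.
Px = sum((x @ Q[:, i]) * Q[:, i] for i in range(Q.shape[1]))

P = Q @ Q.T                       # the same projection written as a matrix
assert np.allclose(Px, P @ x)
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
print(Px)
```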
Referring to Definition 3.7.12, we recall that if Y and Z are linear subspaces of (a linear space) X such that X = Y ⊕ Z, and if T ∈ L(X, X) is such that both Y and Z are invariant under T, then T is said to be reduced by Y and Z. When X is a Hilbert space, we make the following definition.

7.5.19. Definition. Let Y be a closed linear subspace of X, and let T ∈ L(X, X). Then Y is said to reduce T if Y and Y⊥ are invariant under T.

Note that in view of Theorem 6.12.16, Definitions 3.7.12 and 7.5.19 are consistent. The proof of the next theorem is straightforward.

7.5.20. Theorem. Let Y be a closed linear subspace of X, and let T ∈ B(X, X). Then
(i) Y is invariant under T if and only if Y⊥ is invariant under T*; and
(ii) Y reduces T if and only if Y is invariant under T and T*.

7.5.21. Exercise. Prove Theorem 7.5.20.
7.5.22. Theorem. Let Y be a closed linear subspace of X, let P be the orthogonal projection onto Y, let T ∈ B(X, X), and let I denote the identity operator on X. Then
(i) Y is invariant under T if and only if TP = PTP;
(ii) Y reduces T if and only if TP = PT; and
(iii) (I − P) is the orthogonal projection onto Y⊥.

Proof. To prove (i), assume that TP = PTP. Then for any x ∈ Y we have Tx = T(Px) = P(TPx) ∈ Y, since P applied to any vector of X is in Y. Conversely, if Y is invariant under T, then for any vector x ∈ X we have T(Px) ∈ Y, because Px ∈ Y. Thus, P(TPx) = TPx for every x ∈ X.

To prove (ii), assume that PT = TP. Then PTP = TPP = TP² = TP. Therefore, PTP = TP, and it follows from (i) that Y is invariant under T. To prove that Y reduces T, we must show that Y is invariant under T*. Since P is hermitian, we have T*P = (PT)* = (TP)* = P*T* = PT*; i.e., T*P = PT*. But above we showed that PTP = TP. Applying this to T*, we obtain T*P = PT*P. In view of (i), Y is now invariant under T*. Therefore, the closed linear subspace Y reduces the linear operator T. Conversely, assume that Y reduces T. By part (i), TP = PTP and T*P = PT*P. Thus, PT = (T*P)* = (PT*P)* = PTP = TP; i.e., TP = PT.

To prove (iii), we first show that (I − P) is hermitian. We note that (I − P)* = I* − P* = I − P. Next, we show that (I − P) is idempotent. We observe that (I − P)² = (I − 2P + P²) = (I − 2P + P) = (I − P). Finally, we note that (I − P)x = x if and only if Px = 0, which implies that x ∈ Y⊥. Thus,

Y⊥ = {x ∈ X: (I − P)x = x}.

It follows from Theorem 7.5.14 that (I − P) is the orthogonal projection onto Y⊥. ∎

The next result follows immediately from part (iii) of the preceding theorem.
7.5.23. Theorem. Let Y be a closed linear subspace of X, and let P be the orthogonal projection on Y. If ||Px|| = ||x||, then Px = x, and consequently x ∈ Y.

7.5.24. Exercise. Prove Theorem 7.5.23.
We leave the proof of the following result as an exercise.

7.5.25. Theorem. Let Y and Z be closed linear subspaces of X, and let P and Q be the orthogonal projections on Y and Z, respectively. Let 0 denote the zero transformation in B(X, X). The following are equivalent:
(i) Y ⊥ Z;
(ii) PQ = 0;
(iii) QP = 0;
(iv) P(Z) = {0}; and
(v) Q(Y) = {0}.

7.5.26. Exercise. Prove Theorem 7.5.25.
For the product of two orthogonal projections we have the following result.

7.5.27. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The product transformation P₁P₂ is an orthogonal projection if and only if P₁ commutes with P₂. In this case the range of P₁P₂ is Y₁ ∩ Y₂.

Proof. Assume that P₁P₂ = P₂P₁. Then (P₁P₂)* = P₂*P₁* = P₂P₁ = P₁P₂; i.e., if P₁P₂ = P₂P₁, then (P₁P₂)* = P₁P₂. Also, (P₁P₂)² = P₁P₂P₁P₂ = P₁P₁P₂P₂ = P₁P₂; i.e., if P₁P₂ = P₂P₁, then P₁P₂ is idempotent. Therefore, P₁P₂ is an orthogonal projection. Conversely, assume that P₁P₂ is an orthogonal projection. Then (P₁P₂)* = P₂*P₁* = P₂P₁ and also (P₁P₂)* = P₁P₂. Hence, P₁P₂ = P₂P₁. Finally, we must show that the range of P₁P₂ is equal to Y₁ ∩ Y₂. Assume that x ∈ R(P₁P₂). Then P₁P₂x = x, because P₁P₂ is an orthogonal projection. Also, P₁P₂x = P₁(P₂x) ∈ Y₁, because any vector operated on by P₁ is in Y₁. Similarly, P₂P₁x = P₂(P₁x) ∈ Y₂. Now, by hypothesis, P₁P₂ = P₂P₁, and therefore P₁P₂x = P₂P₁x = x ∈ Y₁ ∩ Y₂. Thus, whenever x ∈ R(P₁P₂), then x ∈ Y₁ ∩ Y₂. This implies that R(P₁P₂) ⊂ Y₁ ∩ Y₂. To show that R(P₁P₂) ⊃ Y₁ ∩ Y₂, assume that x ∈ Y₁ ∩ Y₂. Then P₁P₂x = P₁(P₂x) = P₁x = x ∈ R(P₁P₂). Thus, Y₁ ∩ Y₂ ⊂ R(P₁P₂). Therefore, R(P₁P₂) = Y₁ ∩ Y₂. ∎
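Theorem 7.5.27 can be seen with coordinate projections in R³, which commute: P₁ onto span{e₁, e₂} and P₂ onto span{e₂, e₃} multiply to the orthogonal projection onto the intersection span{e₂}. A sketch:

```python
import numpy as np

P1 = np.diag([1.0, 1.0, 0.0])   # orthogonal projection onto span{e1, e2}
P2 = np.diag([0.0, 1.0, 1.0])   # orthogonal projection onto span{e2, e3}

assert np.allclose(P1 @ P2, P2 @ P1)              # the projections commute

P = P1 @ P2                                       # product projection
assert np.allclose(P, np.diag([0.0, 1.0, 0.0]))   # projects onto span{e2}
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
```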
7.5.28. Theorem. Let Y and Z be closed linear subspaces of X, and let P and Q be the orthogonal projections onto Y and Z, respectively. The following are equivalent:
(i) P ≤ Q;
(ii) ||Px|| ≤ ||Qx|| for all x ∈ X;
(iii) Y ⊂ Z;
(iv) QP = P; and
(v) PQ = P.

Proof. Assume that P ≤ Q. Since P and Q are orthogonal projections, they are hermitian. For a hermitian operator, P ≥ 0 means (Px, x) ≥ 0 for all x ∈ X. If P ≤ Q, then (Px, x) ≤ (Qx, x) for all x ∈ X, or (P²x, x) ≤ (Q²x, x), or (Px, Px) ≤ (Qx, Qx), or ||Px||² ≤ ||Qx||², and hence ||Px|| ≤ ||Qx|| for all x ∈ X.

Next, assume that ||Px|| ≤ ||Qx|| for all x ∈ X. If x ∈ Y, then Px = x and

(x, x) = (Px, Px) = ||Px||² ≤ ||Qx||² ≤ ||Q||²||x||² = ||x||² = (x, x),

and therefore ||Qx|| = ||x||. From Theorem 7.5.23 it now follows that Qx = x, and hence x ∈ Z. Thus, whenever x ∈ Y, then x ∈ Z and Z ⊃ Y.

Now assume that Z ⊃ Y, and let y = Px, where x is any vector in X. Then QPx = Qy = y = Px for all x ∈ X, and QP = P. Suppose now that QP = P. Then (QP)* = P*, or P*Q* = PQ = P* = P; i.e., PQ = P. Finally, assume that PQ = P. For any x ∈ X we have (Px, x) = ||Px||² = ||PQx||² ≤ ||P||²||Qx||² = ||Qx||² = (Qx, Qx) = (Q²x, x) = (Qx, x); i.e., (Px, x) ≤ (Qx, x), from which we have P ≤ Q. ∎

We leave the proof of the next result as an exercise.

7.5.29. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The difference transformation P = P₁ − P₂ is an orthogonal projection if and only if P₂ ≤ P₁. The range of P is Y₁ ∩ Y₂⊥.

7.5.30. Exercise. Prove Theorem 7.5.29.
We close this section by considering some specific cases.

7.5.31. Example. Let R denote the transformation from E² into E² given in Example 4.10.48. That transformation is represented by the matrix

R_θ = [ cos θ   −sin θ ]
      [ sin θ    cos θ ]

with respect to an orthonormal basis {e₁, e₂}. By direct computation we obtain

R_θ* = [  cos θ   sin θ ]
       [ −sin θ   cos θ ].

It readily follows that R*R = RR* = I. Therefore, R is a linear transformation which is isometric, unitary, and normal. ∎
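Example 7.5.31 is easy to reproduce numerically for any angle θ (θ = 0.7 below is an arbitrary choice):

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

I = np.eye(2)
assert np.allclose(R.T @ R, I)      # isometric: R*R = I
assert np.allclose(R @ R.T, I)      # unitary:   RR* = I  (hence also normal)

# Isometry in action: lengths are preserved.
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x))
```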
7.5.32. Exercise. Let X = L₂[0, ∞) and define the truncation operator P_T by y = P_T x, where

y(t) = x(t)   for all 0 < t ≤ T,
y(t) = 0      for all t > T.

Show that P_T is an orthogonal projection with range

R(P_T) = {x ∈ X: x(t) = 0 for t > T}

and null space

N(P_T) = {x ∈ X: x(t) = 0 for all t ≤ T}.

Additional examples of different types of operators are considered in Section 7.10.
7.6. THE SPECTRUM OF AN OPERATOR

In Chapter 4 we introduced and discussed eigenvalues and eigenvectors of linear transformations defined on finite-dimensional vector spaces. In the present section we continue this discussion in the setting of infinite-dimensional spaces. Unless otherwise stated, X will denote a complex Banach space and I will denote the identity operator on X. However, in our first definition, X may be an arbitrary vector space over a field F.

7.6.1. Definition. Let T ∈ L(X, X). A scalar λ ∈ F is called an eigenvalue of T if there exists an x ∈ X such that x ≠ 0 and such that Tx = λx. Any vector x ≠ 0 satisfying the equation Tx = λx is called an eigenvector of T corresponding to the eigenvalue λ.

7.6.2. Definition. Let X be a complex Banach space, and let T: X → X. The set of all λ ∈ F = C such that
(i) R(T − λI) is dense in X;
(ii) (T − λI)⁻¹ exists; and
(iii) (T − λI)⁻¹ is continuous (i.e., bounded)
is called the resolvent set of T and is denoted by ρ(T). The complement of ρ(T) is called the spectrum of T and is denoted by σ(T).

The preceding definitions require some comments. First, note that if λ is an eigenvalue of T, there is an x ≠ 0 such that (T − λI)x = 0. From Theorem 3.4.32 this is true if and only if (T − λI) does not have an inverse. Hence, if λ is an eigenvalue of T, then λ ∈ σ(T). Note, however, that there
are other ways that a complex number λ may fail to be in ρ(T). These possibilities are enumerated in the following definition.

7.6.3. Definition. The set of all eigenvalues of T is called the point spectrum of T. The set of all λ such that (T − λI)⁻¹ exists but R(T − λI) is not dense in X is called the residual spectrum of T. The set of all λ such that (T − λI)⁻¹ exists and such that R(T − λI) is dense in X but (T − λI)⁻¹ is not continuous is called the continuous spectrum. We denote these sets by Pσ(T), Rσ(T), and Cσ(T), respectively. Clearly, σ(T) = Pσ(T) ∪ Cσ(T) ∪ Rσ(T). Furthermore, when X is finite dimensional, then σ(T) = Pσ(T).

We summarize the preceding definition in the following table.

                                          R(T − λI) dense in X   R(T − λI) not dense in X
(T − λI)⁻¹ exists and is continuous       λ ∈ ρ(T)               λ ∈ Rσ(T)
(T − λI)⁻¹ exists but is not continuous   λ ∈ Cσ(T)              λ ∈ Rσ(T)
(T − λI)⁻¹ does not exist                 λ ∈ Pσ(T)              λ ∈ Pσ(T)

7.6.4. Table A. Characterization of the resolvent set and the spectrum of an operator
7.6.5. Example. Let X = l₂ be the Hilbert space of Example 6.11.9, let x = (ξ₁, ξ₂, ...) ∈ X, and define T ∈ B(X, X) by

Tx = (ξ₁, ½ξ₂, ⅓ξ₃, ...).

For each λ ∈ C we want to determine (a) whether (T − λI)⁻¹ exists; (b) if so, whether (T − λI)⁻¹ is continuous; and (c) whether R(T − λI) is dense in X.

First we consider the point spectrum of T. If Tx = λx, then (1/k − λ)ξₖ = 0, k = 1, 2, .... This holds for non-trivial x if and only if λ = 1/k for some k. Hence,

Pσ(T) = {1/k: k = 1, 2, ...}.

Next, assume that λ ∉ Pσ(T), so that (T − λI)⁻¹ exists, and let us investigate the continuity of (T − λI)⁻¹. We see that if y = (η₁, η₂, ...) ∈ R(T − λI), then x = (T − λI)⁻¹y is given by

ξₖ = ηₖ/(1/k − λ) = kηₖ/(1 − λk).

Now if λ = 0, then

||(T − λI)⁻¹y||² = Σ_{k=1}^∞ k²|ηₖ|²,

and (T − λI)⁻¹ is not bounded and hence not continuous. On the other hand, if λ ≠ 0, then (T − λI)⁻¹ is continuous, since |ξₖ| ≤ γ|ηₖ| for all k, where γ = sup_k |1/k − λ|⁻¹ < ∞. It follows that 0 ∈ Cσ(T) and ρ(T) = [Pσ(T) ∪ Cσ(T)]ᶜ. ∎
7.6.6. Exercise. Let X = l₂, the Hilbert space of Example 6.11.9, let x = (ξ₁, ξ₂, ξ₃, ...), and define the right shift operator T_r: X → X and the left shift operator T_l: X → X by

T_r x = (0, ξ₁, ξ₂, ...)   and   T_l x = (ξ₂, ξ₃, ξ₄, ...),

respectively. Show that

ρ(T_r) = ρ(T_l) = {λ ∈ C: |λ| > 1},
Pσ(T_r) = ∅,                    Cσ(T_r) = {λ ∈ C: |λ| = 1},   Rσ(T_r) = {λ ∈ C: |λ| < 1},
Pσ(T_l) = {λ ∈ C: |λ| < 1},   Cσ(T_l) = {λ ∈ C: |λ| = 1},   Rσ(T_l) = ∅.
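The point spectrum of the left shift in Exercise 7.6.6 comes from the geometric sequences x = (1, λ, λ², ...): T_l x = λx whenever |λ| < 1, so that x ∈ l₂. A truncated check (the truncation length and the value of λ are illustrative choices; the eigenvalue identity holds exactly coordinate-by-coordinate on the truncated vector):

```python
import numpy as np

lam = 0.5 + 0.3j                  # |lam| < 1
n = 30
x = lam ** np.arange(n)           # (1, lam, lam^2, ...), square-summable as n -> inf

left_shift = x[1:]                # (T_l x)_k = x_{k+1}
assert np.allclose(left_shift, lam * x[:-1])   # T_l x = lam * x on the overlap
print(abs(lam))
```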
We now examine some of the properties of the resolvent set and the spectrum.

7.6.7. Theorem. Let T ∈ B(X, X). If |λ| > ||T||, then λ ∈ ρ(T); or, equivalently, if λ ∈ σ(T), then |λ| ≤ ||T||.

7.6.8. Exercise. Prove Theorem 7.6.7 (use Theorem 7.2.2).
7.6.9. Theorem. Let T ∈ B(X, X). Then ρ(T) is open and σ(T) is closed.

Proof. Since σ(T) is the complement of ρ(T), it is closed if and only if ρ(T) is open. Let λ₀ ∈ ρ(T). Then (T − λ₀I) has a continuous inverse. For arbitrary λ we now have

||I − (T − λ₀I)⁻¹(T − λI)|| = ||(T − λ₀I)⁻¹(T − λ₀I) − (T − λ₀I)⁻¹(T − λI)||
= ||(T − λ₀I)⁻¹[(T − λ₀I) − (T − λI)]||
= ||(λ − λ₀)(T − λ₀I)⁻¹||
= |λ − λ₀| ||(T − λ₀I)⁻¹||.

Now for |λ − λ₀| sufficiently small, we have

||I − (T − λ₀I)⁻¹(T − λI)|| = |λ − λ₀| ||(T − λ₀I)⁻¹|| < 1.

Now in Theorem 7.2.2 we showed that if T ∈ B(X, X), then T has a continuous inverse if ||I − T|| < 1. In our case it now follows that (T − λ₀I)⁻¹(T − λI) has a continuous inverse, and therefore (T − λI) has a continuous inverse whenever |λ − λ₀| is sufficiently small. This implies that λ ∈ ρ(T) and ρ(T) is open. Hence, σ(T) is closed. ∎

For normal, hermitian, and isometric operators we have the following result.

7.6.10. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), let λ be an eigenvalue of T, and let Tx = λx. Then
(i) if T is hermitian, then λ is real;
(ii) if T is isometric, then |λ| = 1;
(iii) if T is normal, then λ̄ is an eigenvalue of T* and T*x = λ̄x; and
(iv) if T is normal, if μ is an eigenvalue of T such that μ ≠ λ, and if Ty = μy, then x ⊥ y.
Proof. Without loss of generality, assume that x is a unit vector. To prove (i), note that λ = λ||x||² = λ(x, x) = (λx, x) = (Tx, x), which is real by Theorem 7.4.5. Hence, λ = λ̄ and λ is real.

To verify (ii), note that if T is isometric, then ||Tx|| = ||x|| = 1 by Corollary 7.5.9. Since Tx = λx, it follows that ||λx|| = 1, or |λ| ||x|| = 1, and hence |λ| = 1.

To prove (iii), assume that T is normal; i.e., T*T = TT*. Then

(T − λI)(T − λI)* = (T − λI)(T* − λ̄I) = TT* − λ̄T − λT* + λλ̄I = T*T − λ̄T − λT* + λλ̄I = (T* − λ̄I)(T − λI) = (T − λI)*(T − λI);

i.e., (T − λI) is normal. Also, we can readily verify that ||(T − λI)x|| = ||(T − λI)*x||. Since (T − λI)x = 0, it follows that (T − λI)*x = 0, or (T* − λ̄I)x = 0, or T*x = λ̄x. Therefore, λ̄ is an eigenvalue of T* with eigenvector x.

To prove the last part, assume that λ ≠ μ and that T is normal. Then

(λ − μ)(x, y) = λ(x, y) − μ(x, y) = (λx, y) − (x, μ̄y) = (Tx, y) − (x, T*y) = (Tx, y) − (Tx, y) = 0;

i.e., (λ − μ)(x, y) = 0. Since λ ≠ μ, we have x ⊥ y. ∎
The next two results indicate what happens to the spectrum of an operator T when it is subjected to various elementary transformations.

7.6.11. Theorem. Let T ∈ B(X, X), and let p(T) denote a polynomial in T. Then

σ(p(T)) = p(σ(T)) = {p(λ): λ ∈ σ(T)}.

7.6.12. Exercise. Prove Theorem 7.6.11.
7.6.13. Theorem. Let T ∈ B(X, X) be a bijective mapping. Then

σ(T⁻¹) = [σ(T)]⁻¹ = {1/λ: λ ∈ σ(T)}.

Proof. Since T⁻¹ exists, 0 ∉ σ(T), and so the definition of [σ(T)]⁻¹ makes sense. Now for any λ ≠ 0, consider the identity

(T⁻¹ − λI) = λT⁻¹(λ⁻¹I − T).

It follows that if 1/λ ∉ σ(T), then (T⁻¹ − λI) has a continuous inverse; i.e., λ ∉ σ(T⁻¹). In other words, λ ∈ σ(T⁻¹) implies that 1/λ ∈ σ(T); i.e., σ(T⁻¹) ⊂ [σ(T)]⁻¹. To prove that [σ(T)]⁻¹ ⊂ σ(T⁻¹), we proceed similarly, interchanging the roles of T and T⁻¹. ∎
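For matrices, Theorem 7.6.13 reduces to the familiar fact that the eigenvalues of T⁻¹ are the reciprocals of those of T. A sketch with an illustrative invertible matrix:

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 4.0]])                  # invertible; eigenvalues 2 and 4

eig_T = np.linalg.eigvals(T)
eig_Tinv = np.linalg.eigvals(np.linalg.inv(T))

# sigma(T^{-1}) = {1/lambda : lambda in sigma(T)}
assert np.allclose(sorted(eig_Tinv), sorted(1.0 / eig_T))
print(sorted(eig_Tinv))
```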
We now introduce the concept of the approximate point spectrum of an operator.

7.6.14. Definition. Let T ∈ B(X, X). Then λ ∈ C is said to belong to the approximate point spectrum of T if for every ε > 0 there exists a non-zero vector x ∈ X such that ||Tx − λx|| < ε||x||. We denote the approximate point spectrum by π(T). If λ ∈ π(T), then λ is called an approximate eigenvalue of T.

Clearly, Pσ(T) ⊂ π(T). Other properties of π(T) are as follows.

7.6.15. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then π(T) ⊂ σ(T).

Proof. Assume that λ ∉ σ(T). Then (T − λI)⁻¹ exists and is continuous, and for any x ∈ X we have

||x|| = ||(T − λI)⁻¹(T − λI)x|| ≤ ||(T − λI)⁻¹|| ||(T − λI)x||.

Now let ε = 1/||(T − λI)⁻¹||. Then we have, from above, ||Tx − λx|| ≥ ε||x|| for every x ∈ X, and so λ ∉ π(T). Therefore, σ(T) ⊃ π(T). ∎

We leave the proof of the next result as an exercise.

7.6.16. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be a normal operator. Then π(T) = σ(T).

7.6.17. Exercise. Prove Theorem 7.6.16.

We can use the approximate point spectrum to establish some of the properties of the spectrum of hermitian operators.

7.6.18. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be hermitian. Then
(i) σ(T) is a subset of the real line;
(ii) ||T|| = sup {|λ|: λ ∈ σ(T)}; and
(iii) σ(T) is not empty, and either +||T|| or −||T|| belongs to σ(T).

Proof. To prove (i), note that if T is hermitian it is normal, and σ(T) = π(T). Let λ ∈ π(T), and assume that λ ≠ λ̄ (i.e., that λ is complex). Then for any x ≠ 0 we have

0 < |λ − λ̄| ||x||² = |((T − λI)x, x) − ((T − λ̄I)x, x)| ≤ 2|((T − λI)x, x)| ≤ 2||(T − λI)x|| ||x||,

where we have used the fact that the two inner products are complex conjugates of one another, since (Tx, x) is real; i.e.,

0 < |λ − λ̄| ||x|| ≤ 2||(T − λI)x||

for all x ≠ 0. But this implies that λ ∉ π(T), contrary to the original assumption. Hence, it must follow that λ = λ̄, which implies that λ is real.

To prove (ii), first note that ||T|| ≥ sup {|λ|: λ ∈ σ(T)} for any T ∈ B(X, X) (see Theorem 7.6.7). To show that equality holds when T is hermitian, we first show that ||T||² ∈ π(T²) = σ(T²). For all real λ and all x ∈ X we can write

||T²x − λ²x||² = (T²x − λ²x, T²x − λ²x) = (T²x, T²x) − (T²x, λ²x) − (λ²x, T²x) + λ⁴(x, x).

Since (T²x, x) = (Tx, T*x) = (Tx, Tx), we now have

||T²x − λ²x||² = ||T²x||² − 2λ²||Tx||² + λ⁴||x||².   (7.6.19)

Now let {xₙ} be a sequence of unit vectors such that ||Txₙ|| → ||T||. If λ = ||T||, then we have, from Eq. (7.6.19),

||T²xₙ − λ²xₙ||² = ||T²xₙ||² − 2λ²||Txₙ||² + λ⁴ ≤ (||T|| ||Txₙ||)² − 2λ²||Txₙ||² + λ⁴ = λ⁴ − λ²||Txₙ||² → 0

as n → ∞; i.e., ||T²xₙ − λ²xₙ|| → 0 as n → ∞, and thus λ² ∈ π(T²) = σ(T²). Using Theorems 7.6.11 and 7.6.15 and the fact that λ² ∈ σ(T²), it now follows that ||T|| = sup {|λ|: λ ∈ σ(T)}.

The proof of (iii) is left as an exercise. ∎

7.6.20. Exercise. Prove part (iii) of Theorem 7.6.18.

7.6.21. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then π(T) is closed.

7.6.22. Exercise. Prove Theorem 7.6.21.

In the following we let T ∈ B(X, X) and λ ∈ C, and we let N_λ(T) denote the null space of T − λI; i.e.,

N_λ(T) = {x ∈ X: (T − λI)x = 0} = N(T − λI).   (7.6.23)

It follows from Theorem 7.1.26 that N_λ(T) is a closed linear subspace of X. For the next result, recall Definition 3.7.9 for the meaning of an invariant subspace.
7.6.24. Theorem. Let X be a Hilbert space, let λ ∈ C, and let S, T ∈ B(X, X). If ST = TS, then N_λ(T) is invariant under S.

Proof. Let x ∈ N_λ(T). We want to show that Sx ∈ N_λ(T); i.e., TSx = λSx. Since x ∈ N_λ(T), we have Tx = λx. Thus, STx = λSx. Since ST = TS, we have TSx = λSx. ∎

7.6.25. Corollary. N_λ(T) is invariant under T.

Proof. Since TT = TT, the result follows from Theorem 7.6.24. ∎
For the next result, recall Definition 7.5.19.

7.6.26. Theorem. Let X be a Hilbert space, let λ, μ ∈ C, and let T ∈ B(X, X). If T is normal, then
(i) N_λ(T) = N_λ̄(T*);
(ii) N_λ(T) ⊥ N_μ(T) if λ ≠ μ; and
(iii) N_λ(T) reduces T.

Proof. The proofs of parts (i) and (ii) are left as an exercise. To prove (iii), we see that N_λ(T) is invariant under T from Corollary 7.6.25. To prove that N_λ(T)⊥ is invariant under T, let y ∈ N_λ(T)⊥. We want to show that (x, Ty) = 0 for all x ∈ N_λ(T). If x ∈ N_λ(T), we have Tx = λx, and so, by part (i), T*x = λ̄x. Now (x, Ty) = (T*x, y) = (λ̄x, y) = λ̄(x, y) = 0. This implies that Ty ∈ N_λ(T)⊥, and so N_λ(T)⊥ is invariant under T. This completes the proof of part (iii). ∎

7.6.27. Exercise. Prove parts (i) and (ii) of Theorem 7.6.26.
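Part (ii) of Theorem 7.6.26 is the familiar orthogonality of eigenvectors of a normal matrix belonging to distinct eigenvalues. A circulant (hence normal) illustration, whose eigenvalues are the three cube roots of unity:

```python
import numpy as np

# The cyclic permutation matrix is normal: it commutes with its adjoint.
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
assert np.allclose(C @ C.conj().T, C.conj().T @ C)

w, V = np.linalg.eig(C)           # three distinct eigenvalues on the unit circle

# Eigenvectors for distinct eigenvalues are mutually orthogonal.
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(np.vdot(V[:, i], V[:, j])) < 1e-8
print(np.round(w, 6))
```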
Before considering the last result of this section, we make the following definition.

7.6.28. Definition. A family of closed linear subspaces in a Hilbert space X is said to be total if the only vector y ∈ X orthogonal to each member of the family is y = 0.
7.6.29. Theorem. Let X be a Hilbert space and let S, T ∈ B(X, X). If the family of closed linear subspaces of X given by {N_λ(T): λ ∈ C} is total, then TS = ST if and only if N_λ(T) is invariant under S for all λ ∈ C.

Proof. The necessity follows from Theorem 7.6.24. To prove sufficiency, assume that N_λ(T) is invariant under S for all λ ∈ C. Let E denote the null space of TS − ST; i.e., E = N(TS − ST). If x ∈ N_λ(T), then Sx ∈ N_λ(T) by hypothesis. Hence, TSx = T(Sx) = λ(Sx) = S(λx) = S(Tx) = STx for all x ∈ N_λ(T). Thus, (TS − ST)x = 0 for any x ∈ N_λ(T), and so N_λ(T) ⊂ E. If there is a vector y ⊥ E, then it follows that y ⊥ N_λ(T) for all λ ∈ C. By hypothesis, the family {N_λ(T): λ ∈ C} is total, and thus y = 0. It follows that E⊥ = {0} and E⊥⊥ = X, and E⊥⊥ = E because E is a closed linear subspace of X. Therefore, E = X; i.e., (TS − ST)x = 0 for all x ∈ X. Hence, TS = ST. ∎

7.7. COMPLETELY CONTINUOUS OPERATORS
Throughout this section X is a normed linear space over the field of complex numbers C. Recall that a set Y ⊂ X is bounded if there is a constant k such that for all x ∈ Y we have ||x|| ≤ k. Also, recall that a set Y is relatively compact if each sequence {xₙ} of elements chosen from Y contains a convergent subsequence (see Definition 5.6.30 and Theorem 5.6.31). When Y contains only a finite number of elements, then any sequence constructed from Y must include some elements infinitely many times, and thus Y contains a convergent subsequence. From this it follows that any set containing a finite number of elements is relatively compact. Every relatively compact set is contained in a compact set and hence is bounded. For the finite-dimensional case it is also true that every bounded set is relatively compact (e.g., in Rⁿ the Bolzano-Weierstrass theorem guarantees this). However, in the infinite-dimensional case it does not follow that every bounded set is also relatively compact. In analysis and in applications, linear operators which transform bounded sets into relatively compact sets are of great importance. Such operators are called completely continuous operators or compact operators. We give the following formal definition.
7.7.1. Definition. Let X and Y be normed linear spaces, and let T be a linear transformation with domain X and range in Y. Then T is said to be completely continuous or compact if for each bounded sequence {x_n} in X, the sequence {Tx_n} contains a subsequence converging to some element y ∈ Y.
We have the following equivalent characterization of a completely continuous operator.

7.7.2. Theorem. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). Then T is completely continuous if and only if the sequence {Tx_n} contains a subsequence convergent to some y ∈ Y for all sequences {x_n} such that ‖x_n‖ ≤ 1 for all n.

7.7.3. Exercise. Prove Theorem 7.7.2.
Clearly, if an operator T is completely continuous, then it is continuous. On the other hand, the fact that T may be continuous does not ensure that it is completely continuous. We now cite some examples.
Chapter 7 / Linear Operators
7.7.4. Example. Let T: X → X be the zero operator; i.e., Tx = 0 for all x ∈ X. Then T is clearly completely continuous. ∎
7.7.5. Example. Let X = C[a, b], and let ‖·‖_∞ be the norm on C[a, b] as defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Defining T: X → X by
Tx(s) = ∫ₐᵇ k(s, t)x(t) dt
for all x ∈ X, we saw in Example 7.1.20 that T is a bounded linear operator. We now show that T is completely continuous. Let {x_n} be a bounded sequence in X; i.e., there is a K > 0 such that ‖x_n‖_∞ ≤ K for all n. It readily follows that if y_n = Tx_n, then ‖y_n‖ ≤ γ₀‖x_n‖, where
γ₀ = sup_{a ≤ s ≤ b} ∫ₐᵇ |k(s, t)| dt
(see Example 7.1.20). We now show that {y_n} is an equicontinuous set of functions on [a, b] (see Definition 5.8.11). Let ε > 0. Then, because of the uniform continuity of k on [a, b] × [a, b], there is a δ > 0 such that
|k(s₁, t) − k(s₂, t)| < ε/[K(b − a)]  if  |s₁ − s₂| < δ
for every t ∈ [a, b]. Thus,
|y_n(s₁) − y_n(s₂)| ≤ ∫ₐᵇ |k(s₁, t) − k(s₂, t)| |x_n(t)| dt < ε
for all n and all s₁, s₂ such that |s₁ − s₂| < δ. This implies the set {y_n} is equicontinuous, and so by the Arzelà–Ascoli theorem (Theorem 5.8.12), the set {y_n} is relatively compact in C[a, b]; i.e., it has a convergent subsequence. This implies that T is completely continuous. It can be shown that if X = L₂[a, b] and if T is the Fredholm operator defined in Example 7.3.11, then T is also a completely continuous operator. ∎
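The compactness in the preceding example can be illustrated numerically. The following sketch is not part of the text's development: the particular kernel k(s, t) = e^{−|s−t|} on [0, 1] and the midpoint discretization are our own illustrative choices. After discretization, complete continuity shows up as rapidly decaying singular values, so the operator is well approximated in norm by finite-rank truncations.

```python
import numpy as np

# Discretize (Tx)(s) = integral of k(s,t)x(t) dt on [0,1] with a
# continuous kernel of our own choosing (midpoint rule, weight 1/n).
n = 200
s = (np.arange(n) + 0.5) / n
K = np.exp(-np.abs(s[:, None] - s[None, :]))   # k(s,t) = e^{-|s-t|}
T = K / n

# Singular values of the discretized operator decay quickly; the
# (k+1)-st singular value is the operator-norm error of the best
# rank-k approximation.
sv = np.linalg.svd(T, compute_uv=False)
rank10_error = sv[10]
print(sv[0], rank10_error)
```

The rapid decay of `sv` is the finite-dimensional shadow of the fact that T maps the unit ball into a relatively compact set.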
The next result provides us with an example of a continuous linear transformation which is not completely continuous.

7.7.6. Theorem. Let I ∈ B(X, X) denote the identity operator on X. Then I is completely continuous if and only if X is finite dimensional.

Proof. The proof is an immediate consequence of Theorem 6.6.10. ∎

We now consider some of the general properties of completely continuous operators.

7.7.7. Theorem. Let X and Y be normed linear spaces, let S, T ∈ B(X, Y) be completely continuous operators, and let α, β ∈ C. Then the operator (αS + βT) is completely continuous.
Proof. Given a sequence {x_n} with ‖x_n‖ ≤ 1, there is a subsequence {x_{n_k}} such that the sequence {Sx_{n_k}} has a limit u; i.e., Sx_{n_k} → u. From the sequence {x_{n_k}} we pick another subsequence {x_{n_{k_j}}} such that Tx_{n_{k_j}} → v. Then
(αS + βT)x_{n_{k_j}} = αSx_{n_{k_j}} + βTx_{n_{k_j}} → αu + βv
as n_{k_j} → ∞. ∎
We leave the proofs of the next results as an exercise.

7.7.8. Theorem. Let T ∈ B(X, X) be completely continuous. Let Y be a closed linear subspace of X which is invariant under T. Let T₁ be the restriction of T to Y. Then T₁ ∈ B(Y, Y) and T₁ is completely continuous.

7.7.9. Exercise. Prove Theorem 7.7.8.

7.7.10. Theorem. Let T ∈ B(X, X) be a completely continuous operator, and let S ∈ B(X, X) be any bounded linear operator. Then ST and TS are completely continuous.

7.7.11. Exercise. Prove Theorem 7.7.10.
7.7.12. Corollary. Let X and Y be normed linear spaces, and let T ∈ B(X, Y) and S ∈ B(Y, X). If T is completely continuous, then ST is completely continuous.

7.7.13. Exercise. Prove Corollary 7.7.12.
7.7.14. Example. A consequence of the above corollary is that if T ∈ B(X, X) is completely continuous and X is infinite dimensional, then T cannot be a bijective mapping of X onto X. For, suppose T were bijective. Then we would have T⁻¹T = I. By the Banach inverse theorem (see Theorem 7.2.6), T⁻¹ would then be continuous, and by the preceding corollary the identity mapping would be completely continuous. However, according to Theorem 7.7.6, this is possible only when X is finite dimensional.

Pursuing this example further, let X = C[a, b] with ‖·‖_∞ as defined in Example 6.1.9. Let T: X → X be defined by
Tx(t) = ∫ₐᵗ x(τ) dτ
for a ≤ t ≤ b and x ∈ X. It is easily shown that T is a completely continuous operator on X. It is, however, not bijective, since ℜ(T), the range of T, is a family of continuously differentiable functions, and thus ℜ(T) is clearly a proper subset of X. The operator T is injective, since Tx = 0 implies x = 0. The inverse T⁻¹ is given by T⁻¹y(t) = dy(t)/dt for y ∈ ℜ(T) and a ≤ t ≤ b. We saw in Example 5.7.4 that T⁻¹ is not continuous. ∎

In our next result we require the following definition.
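The unboundedness of T⁻¹ in Example 7.7.14 can also be seen in a finite-dimensional sketch. The discretization below is our own illustrative assumption (a lower-triangular quadrature matrix on a uniform grid of [0, 1], not part of the text): the norm of the discretized integration operator stays bounded as the grid is refined, while the norm of its inverse, the discrete differentiation operator, grows without bound.

```python
import numpy as np

def volterra(n):
    # (Tx)(t_i) is approximated by h * sum_{j <= i} x(t_j), h = 1/n
    h = 1.0 / n
    return h * np.tril(np.ones((n, n)))

T_norms, inv_norms = {}, {}
for n in (50, 200):
    T = volterra(n)
    T_norms[n] = np.linalg.norm(T, 2)              # stays bounded
    inv_norms[n] = np.linalg.norm(np.linalg.inv(T), 2)  # grows like n
print(T_norms, inv_norms)
```

The blow-up of the inverse's norm under refinement mirrors the fact that a completely continuous operator on an infinite-dimensional space cannot have a bounded inverse.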
7.7.15. Definition. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). The operator T is said to be finite dimensional if T(X) is finite dimensional; i.e., the range of T is finite dimensional.

7.7.16. Theorem. Let X and Y be normed linear spaces, and let T ∈ B(X, Y). If T is a finite-dimensional operator, then it is a completely continuous operator.

Proof. Let {x_n} be a sequence in X such that ‖x_n‖ ≤ 1 for all n. Then {Tx_n} is a bounded sequence in T(X). It follows from Theorem 6.6.10 that the set {Tx_n} is relatively compact, and as such this set has a convergent subsequence in T(X). It follows from Theorem 7.7.2 that T is completely continuous. ∎
The proof of the next result utilizes what is called the diagonalization process.

7.7.17. Theorem. Let X and Y be Banach spaces, and let {T_n} be a sequence of completely continuous operators mapping X into Y. If the sequence {T_n} converges in norm to an operator T, then T is completely continuous.

Proof. Let {x_n} be an arbitrary sequence in X with ‖x_n‖ ≤ 1. We must show that the sequence {Tx_n} contains a convergent subsequence. By assumption, T₁ is a completely continuous operator, and thus we can select a convergent subsequence from the sequence {T₁x_n}. Let
x₁₁, x₁₂, x₁₃, ..., x₁ₙ, ...
denote the inverse images of the members of this convergent subsequence. Next, let us apply T₂ to each member of the above subsequence. Since T₂ is completely continuous, we can again select a convergent subsequence from the sequence {T₂x₁ₙ}. The inverse images of the terms of this sequence are
x₂₁, x₂₂, x₂₃, ..., x₂ₙ, ....
Continuing this process we can generate the array
x₁₁, x₁₂, x₁₃, ..., x₁ₙ, ...
x₂₁, x₂₂, x₂₃, ..., x₂ₙ, ...
x₃₁, x₃₂, x₃₃, ..., x₃ₙ, ...
..................................
Using this array, let us now form the diagonal sequence
x₁₁, x₂₂, x₃₃, ..., xₙₙ, ....
Now each of the operators T₁, T₂, T₃, ..., T_k, ... transforms this sequence into a convergent sequence. To show that T is completely continuous we must
show that T also transforms this sequence into a convergent sequence. Now
‖Tx_nn − Tx_mm‖ ≤ ‖Tx_nn − T_k x_nn‖ + ‖T_k x_nn − T_k x_mm‖ + ‖T_k x_mm − Tx_mm‖
≤ ‖T − T_k‖(‖x_nn‖ + ‖x_mm‖) + ‖T_k x_nn − T_k x_mm‖;
i.e.,
‖Tx_nn − Tx_mm‖ ≤ ‖T − T_k‖(‖x_nn‖ + ‖x_mm‖) + ‖T_k x_nn − T_k x_mm‖.
We can choose k so that ‖T − T_k‖ < ε/4, and since the sequence {T_k x_nn} converges, we can choose N such that ‖T_k x_nn − T_k x_mm‖ < ε/2 for m, n > N. We now have
‖Tx_nn − Tx_mm‖ < ε
whenever m, n > N, and {Tx_nn} is a Cauchy sequence. Since Y is a complete space it follows that this sequence converges in Y, and by Theorem 7.7.2 the desired result follows. ∎

Theorem 7.7.7 implies that the family of completely continuous operators forms a linear subspace of B(X, Y). The preceding theorem states that if Y is complete, then this linear subspace is closed.

7.7.18. Theorem.
Let X be a Hilbert space, and let T ∈ B(X, X). Then
(i) T is completely continuous if and only if T*T is completely continuous; and
(ii) T is completely continuous if and only if T* is completely continuous.

Proof. We prove (i) and leave the proof of (ii) as an exercise. Assume that T is completely continuous. It then follows from Theorem 7.7.10 that T*T is completely continuous. Conversely, assume that T*T is completely continuous, and let {x_n} be a sequence in X such that ‖x_n‖ ≤ 1. It follows that there is a subsequence {x_{n_k}} such that T*Tx_{n_k} → x ∈ X as n_k → ∞. Now
‖Tx_{n_j} − Tx_{n_k}‖² = ‖T(x_{n_j} − x_{n_k})‖² = (T(x_{n_j} − x_{n_k}), T(x_{n_j} − x_{n_k}))
= (T*T(x_{n_j} − x_{n_k}), x_{n_j} − x_{n_k}) ≤ ‖T*T(x_{n_j} − x_{n_k})‖ · ‖x_{n_j} − x_{n_k}‖
≤ 2‖T*Tx_{n_j} − T*Tx_{n_k}‖ → 0
as n_j, n_k → ∞. Thus, {Tx_{n_k}} is a Cauchy sequence and so it is convergent. It follows from Theorem 7.7.2 that T is completely continuous. ∎

7.7.19. Exercise. Prove part (ii) of Theorem 7.7.18.
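Theorems 7.7.16 and 7.7.17 together say that norm limits of finite-dimensional operators are completely continuous. The following numerical sketch is our own illustration (the particular matrix with decaying singular values stands in for a compact operator; it is not taken from the text): truncated singular value decompositions are finite-rank operators converging to T in the operator norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Build a matrix with prescribed, summably decaying singular values,
# standing in for a completely continuous operator T.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / (1 + np.arange(n)) ** 2
T = U @ np.diag(sigma) @ V.T

def truncate(T, k):
    # Best rank-k approximation: keep the k largest singular triples.
    u, s, vt = np.linalg.svd(T)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Operator-norm error of the rank-k truncation equals sigma_{k+1}.
errors = [np.linalg.norm(T - truncate(T, k), 2) for k in (1, 5, 20)]
print(errors)
```

The decreasing errors trace the sequence of finite-dimensional operators T_k converging in norm to T, as in the diagonalization theorem.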
In the remainder of this section we turn our attention to the properties of eigenvalues of completely continuous operators.
7.7.20. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous and if λ ≠ 0, then
𝔑_λ(T) = {x: Tx = λx}
is finite dimensional.

Proof. The proof is by contradiction. Assume that 𝔑_λ(T) is not finite dimensional. Then there is an orthonormal infinite sequence x₁, x₂, ..., x_n, ... in 𝔑_λ(T), and
‖Tx_n − Tx_m‖² = ‖λx_n − λx_m‖² = |λ|²‖x_n − x_m‖² = 2|λ|²;
i.e., ‖Tx_n − Tx_m‖ = √2 |λ| ≠ 0 for all m ≠ n. Therefore, no subsequence of {Tx_n} can be a Cauchy sequence, and hence no subsequence of {Tx_n} can converge. This completes the proof. ∎

In the next result π(T) denotes the approximate point spectrum of T.
7.7.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous, if λ ≠ 0, and if λ ∈ π(T), then λ is an eigenvalue.

Proof. Since λ ∈ π(T), for each positive integer n there is an x_n ∈ X such that
‖Tx_n − λx_n‖ < (1/n)‖x_n‖.
We may assume that ‖x_n‖ = 1. Since T is completely continuous, there is a subsequence of {x_n}, say {x_{n_k}}, such that {Tx_{n_k}} is convergent. Let lim Tx_{n_k} = y ∈ X. It now follows that ‖y − λx_{n_k}‖ → 0 as n_k → ∞; i.e., λx_{n_k} → y. Now ‖y‖ ≠ 0, because
‖y‖ = lim ‖λx_{n_k}‖ = |λ| lim ‖x_{n_k}‖ = |λ| ≠ 0.
By the continuity of T, we now have
Ty = T(lim λx_{n_k}) = lim T(λx_{n_k}) = λ lim Tx_{n_k} = λy.
Hence, Ty = λy, y ≠ 0. Thus, λ is an eigenvalue of T and y is a corresponding eigenvector. ∎

The proof of the next result is an immediate consequence of Theorems 7.6.16 and 7.7.21.
7.7.22. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be completely continuous and normal. If λ ∈ σ(T) and λ ≠ 0, then λ is an eigenvalue of T.

7.7.23. Exercise. Prove Theorem 7.7.22.

The above theorem states that, with the possible exception of λ = 0, the spectrum of a completely continuous normal operator consists entirely of eigenvalues; i.e., if λ ≠ 0, either λ ∈ Pσ(T) or λ ∈ ρ(T).
7.7.24. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is completely continuous and hermitian, then T has an eigenvalue λ with |λ| = ‖T‖.
Proof. The proof follows directly from part (iii) of Theorem 7.6.18 and Theorem 7.7.22. ∎

7.7.25. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has at least one eigenvalue.

Proof. If T = 0, then λ = 0 clearly satisfies the conclusion of the theorem. So let us assume that T ≠ 0. Also, if T = T*, the conclusion of the theorem follows from Theorem 7.7.24. So let us assume that T ≠ T*. Let U = ½(T + T*) and V = (1/2i)(T − T*). It follows from Theorem 7.4.15 that U and V are hermitian. Furthermore, by Theorem 7.5.4 we have UV = VU. From Theorems 7.7.7 and 7.7.18, U and V are completely continuous. Since T ≠ T*, we have V ≠ 0. By the preceding theorem, V has a non-zero eigenvalue which we shall call ρ. It follows from Theorem 7.1.26 that 𝔑_ρ(V) = 𝔑(V − ρI) ≜ N is a closed linear subspace of X. Since UV = VU, Theorem 7.6.24 implies that N is invariant under U. Now let U₁ be the restriction of U to the linear subspace N. It follows that U₁ is completely continuous by Theorem 7.7.8. It is readily verified that U₁ is a hermitian operator on the inner product subspace N (see Eq. (3.6.21)). Hence, U₁ is completely continuous and hermitian. This implies that there is an α ∈ C and an x ∈ N such that x ≠ 0 and U₁x = αx. This means Ux = αx. Now since x ∈ N, we must have Vx = ρx. It follows that λ = α + iρ is an eigenvalue of T with corresponding eigenvector x, since
Tx = [U + iV]x = αx + iρx = (α + iρ)x = λx.
This completes the proof. ∎
We now state and prove the last result of this section.

7.7.26. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has an eigenvalue λ such that |λ| = ‖T‖.

Proof. Let S = T*T. Then S is hermitian and completely continuous by Theorem 7.7.18. Also, S ≥ 0 because
(Sx, x) = (T*Tx, x) = (Tx, Tx) = ‖Tx‖² ≥ 0.
This last condition implies that S has no negative eigenvalues. Specifically, if λ is an eigenvalue of S, then there is an x ≠ 0 in X such that Sx = λx. Now
0 ≤ (Sx, x) = (λx, x) = λ(x, x) = λ‖x‖²,
and since ‖x‖ ≠ 0, we have λ ≥ 0. By Theorem 7.7.24, S has an eigenvalue μ with |μ| = ‖S‖ = ‖T*T‖ = ‖T‖²; since S has no negative eigenvalues, μ = ‖T‖². Now let N ≜ 𝔑(S − μI) = 𝔑_μ(S), and note that N contains a non-zero vector. Since T is normal, TS = T(T*T) = (TT*)T = ST. Similarly, we have T*S = ST*. By Theorem 7.6.24, N is invariant under T and under T*. By Theorem 7.5.6 this means T remains normal when its domain of definition is restricted to N. By Theorem 7.7.25, there is a λ ∈ C and a vector x ≠ 0 in N such that Tx = λx, and thus T*x = λ̄x. Now since
Sx = T*Tx = T*(λx) = λT*x = λλ̄x = |λ|²x
for this x ≠ 0, and since Sx = μx for all x ∈ N, it follows that |λ|² = μ = ‖S‖ = ‖T*T‖ = ‖T‖². Therefore, |λ| = ‖T‖ and λ is an eigenvalue of T. ∎
7.8. THE SPECTRAL THEOREM FOR COMPLETELY CONTINUOUS NORMAL OPERATORS

The main result of this section is referred to as the spectral theorem (for completely continuous operators). Some of the direct consequences of this theorem provide an insight into the geometric properties of normal operators. Results such as the spectral theorem play a central role in applications. In Section 7.10 we will apply this theorem to integral equations.

Throughout this section, X is a complex Hilbert space. We require some preliminary results.
7.8.1. Theorem. Let T ∈ B(X, X) be completely continuous and normal. For each ε > 0, let A_ε be the annulus in the complex plane defined by
A_ε = {λ ∈ C: ε < |λ| ≤ ‖T‖}.
Then the number of eigenvalues of T contained in A_ε is finite.

Proof. To the contrary, let us assume that for some ε > 0 the annulus A_ε contains an infinite number of eigenvalues. By the Bolzano–Weierstrass theorem, there is a point of accumulation λ₀ of the eigenvalues in the annulus A_ε. Let {λ_n} be a sequence of distinct eigenvalues such that λ_n → λ₀ as n → ∞, and let Tx_n = λ_n x_n, ‖x_n‖ = 1. Since T is a completely continuous operator, there is a subsequence {x_{n_k}} of {x_n} for which the sequence {Tx_{n_k}} converges to an element u ∈ X; i.e., Tx_{n_k} → u as n_k → ∞. Thus, since Tx_{n_k} = λ_{n_k} x_{n_k}, we have λ_{n_k} x_{n_k} → u. But 1/λ_{n_k} → 1/λ₀ because λ_n ≠ 0. Therefore x_{n_k} → (1/λ₀)u. But the x_{n_k} are distinct eigenvectors corresponding to distinct eigenvalues. By part (iv) of Theorem 7.6.10, {x_{n_k}} is an orthonormal sequence and x_{n_k} → (1/λ₀)u. But ‖x_{n_k} − x_{n_j}‖² = 2, and thus {x_{n_k}} cannot be a Cauchy sequence. Yet, it is convergent by assumption; i.e., we have arrived at a contradiction. Therefore, our initial assumption is false and the theorem is proved. ∎
Our next result is a direct consequence of the preceding theorem.
7.8.2. Theorem. Let T ∈ B(X, X) be completely continuous and normal. Then the number of eigenvalues of T is at most denumerable. If the set of eigenvalues is denumerable, then we have a point of accumulation at zero and only at zero (in the complex plane). The non-zero eigenvalues can be ordered so that
|λ₁| ≥ |λ₂| ≥ |λ₃| ≥ ⋯.

7.8.3. Exercise. Prove Theorem 7.8.2.
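Theorem 7.8.2 can be illustrated numerically. In the sketch below, the symmetric kernel k(s, t) = min(s, t) on [0, 1] and its midpoint discretization are our own illustrative assumptions, not part of the text: the discretized operator is normal (indeed hermitian), and its eigenvalues, ordered by magnitude, decrease toward zero, the only possible accumulation point.

```python
import numpy as np

n = 300
s = (np.arange(n) + 0.5) / n
K = np.minimum(s[:, None], s[None, :])   # symmetric kernel min(s,t)
T = K / n                                # midpoint-rule discretization

lam = np.linalg.eigvalsh(T)              # real eigenvalues (hermitian)
lam = lam[np.argsort(-np.abs(lam))]      # order so |lam_1| >= |lam_2| >= ...
print(np.abs(lam[:5]), np.abs(lam[50]))
```

For this kernel the eigenvalues decay roughly like 1/k², so the magnitudes fall quickly toward zero while remaining ordered.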
The next result is known as the spectral theorem. Here we let λ₀ = 0, and we let {λ₁, λ₂, ...} be the non-zero eigenvalues of a completely continuous operator T ∈ B(X, X). Note that λ₀ may or may not be an eigenvalue of T. If λ₀ is an eigenvalue, then 𝔑₀(T) need not be finite dimensional. However, by Theorem 7.7.20, 𝔑(T − λᵢI) is finite dimensional for i = 1, 2, ....

7.8.4. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, ...} be the non-zero distinct eigenvalues of T (this collection may be finite). Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, .... Then the family of closed linear subspaces {𝔑ᵢ}ᵢ₌₀ of X is total.

Proof. The fact that each 𝔑ᵢ is a closed linear subspace of X follows from Theorem 7.1.26. Now let Y = ∪ₙ 𝔑ₙ, and let N = Y⊥. We wish to show that N = {0}. By Theorem 6.12.6, N is a closed linear subspace of X. We will show first that Y is invariant under T*. Let x ∈ Y. Then x ∈ 𝔑ₙ for some n and Tx = λₙx. Now λₙ(T*x) = T*(λₙx) = T*Tx = T(T*x); i.e., T(T*x) = λₙ(T*x), and so T*x ∈ 𝔑ₙ, which implies T*x ∈ Y. Therefore, Y is invariant under T*. From Theorem 7.3.15 it follows that Y⊥ is invariant under T. Hence, N is an invariant closed linear subspace under T. It follows from Theorems 7.7.8 and 7.5.6 that if T₁ is the restriction of T to N, then T₁ ∈ B(N, N) and T₁ is completely continuous and normal. Now let us suppose that N ≠ {0}. By Theorem 7.7.25 there is a non-zero x ∈ N and a λ ∈ C such that T₁x = λx. But if this is so, λ is an eigenvalue of T and it follows that x ∈ 𝔑ₙ for some n. Hence, x ∈ N ∩ Y, which is impossible unless x = 0. This completes the proof. ∎

In proving an alternate form of the spectral theorem, we require the following result.
7.8.5. Theorem. Let {N_k} be a sequence of orthogonal closed linear subspaces of X; i.e., N_k ⊥ N_j for all j ≠ k. Then the following statements are equivalent:
(i) {N_k} is a total family;
(ii) X is the smallest closed linear subspace which contains every N_k; and
(iii) for every x ∈ X there is a unique sequence {x_k} such that
(a) x_k ∈ N_k for every k, and
(b) x = Σ_{k=1}^∞ x_k.

Proof. We first prove the equivalence of statements (i) and (ii). Let Y = ∪ₙ Nₙ. Then Y ⊂ Y⊥⊥ by Theorem 6.12.8. Furthermore, Y⊥⊥ is the smallest closed linear subspace which contains Y by Theorem 6.12.8. Now suppose {N_k} is a total family. Then Y⊥ = {0}. Hence, Y⊥⊥ = X, and so X is the smallest closed linear subspace which contains every N_k. On the other hand, suppose X is the smallest closed linear subspace which contains every N_k. Then X = Y⊥⊥ and Y⊥⊥⊥ = {0}. But Y⊥⊥⊥ = Y⊥. Thus, Y⊥ = {0}, and so {N_k} is a total family.

We now prove the equivalence of statements (i) and (iii). Let {N_k} be a total family, and let x ∈ X. For every k = 1, 2, ..., there is an x_k ∈ N_k and a y_k ∈ N_k⊥ such that x = x_k + y_k. If x_k = 0, then (x, x_k) = 0. If x_k ≠ 0, then
(x, x_k/‖x_k‖) = (x_k + y_k, x_k/‖x_k‖) = ‖x_k‖.
Thus, it follows from Bessel's inequality that
Σ_{k=1}^∞ ‖x_k‖² < ∞.
Hence, x₀ = Σ_{k=1}^∞ x_k is a well-defined element of X. Next, for fixed j, let y ∈ N_j. Then
(x − x₀, y) = (x_j + y_j − x₀, y) = (x_j, y) + (y_j, y) − (x₀, y) = (x_j, y) − Σ_{k=1}^∞ (x_k, y) = (x_j, y) − (x_j, y) = 0.
Thus, (x − x₀) is orthogonal to every element of N_j for every j. Since {N_k} is a total family, we have x = x₀. To prove uniqueness, suppose that x = Σ_{k=1}^∞ x_k = Σ_{k=1}^∞ x_k′, where x_k, x_k′ ∈ N_k for all k. Then Σ_{k=1}^∞ (x_k − x_k′) = 0. Since (x_j − x_j′) ⊥ (x_k − x_k′) for j ≠ k, we have
‖Σ_{k=1}^∞ (x_k − x_k′)‖² = Σ_{k=1}^∞ ‖x_k − x_k′‖² = 0.
Thus, ‖x_k − x_k′‖ = 0 for all k, and x_k is unique for each k.

To prove that (iii) implies (i), assume that x ∈ N_k⊥ for every k. By hypothesis, x = Σ_{k=1}^∞ x_k, where x_k ∈ N_k for all k. Hence, for any j we have
‖x_j‖² = (x, x_j) = 0,
and so x_j = 0 for all j. This means x = 0, and so {N_k} is a total family. This
completes the proof. ∎

In Definition 3.2.13 we introduced the direct sum of a finite number of linear subspaces. The preceding theorem permits us to extend this definition in a meaningful way to a countable number of linear subspaces.

7.8.6. Definition. Let {Y_k} be a sequence of mutually orthogonal closed linear subspaces of X, and let V({Y_k}) be the closed linear subspace generated by {Y_k}. If every x ∈ V({Y_k}) is uniquely representable as x = Σ_{k=1}^∞ x_k, where x_k ∈ Y_k for every k, then we say V({Y_k}) is the direct sum of {Y_k}. In this case we write
V({Y_k}) = Y₁ ⊕ Y₂ ⊕ ⋯ ⊕ Y_k ⊕ ⋯.
We are now in a position to present another version of the spectral theorem.

7.8.7. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, ..., λₙ, ...} be the non-zero distinct eigenvalues of T. Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, ..., and let Pᵢ be the projection on 𝔑ᵢ along 𝔑ᵢ⊥. Then
(i) Pᵢ is an orthogonal projection for each i;
(ii) PᵢP_j = 0 for all i, j such that i ≠ j;
(iii) Σ_{j=0}^∞ P_j = I; and
(iv) T = Σ_{j=1}^∞ λ_j P_j.

Proof. The proof of each part follows readily from results already obtained. We simply indicate the principal results needed and leave the details as an exercise. Part (i) follows from the definition of orthogonal projection. Part (ii) follows from part (ii) of Theorem 7.6.26. Parts (iii) and (iv) follow from Theorems 7.1.27 and 7.8.5. ∎
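The decomposition T = Σ λ_j P_j of Theorem 7.8.7 can be checked directly in finite dimensions, where every operator is completely continuous. The sketch below uses a randomly generated symmetric (hence normal) matrix of our own choosing; with probability one its eigenvalues are distinct, so each spectral projection is rank one.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
T = A + A.T                    # symmetric, hence normal

lam, Q = np.linalg.eigh(T)     # orthonormal eigenvectors in columns of Q
# P_j = q_j q_j^T is the orthogonal projection onto the j-th eigenspace.
P = [np.outer(Q[:, j], Q[:, j]) for j in range(6)]

recon = sum(l * p for l, p in zip(lam, P))   # sum_j lam_j P_j
print(np.allclose(recon, T), np.allclose(sum(P), np.eye(6)))
```

The projections satisfy PᵢP_j = 0 for i ≠ j and Σ_j P_j = I, and the eigenvalue-weighted sum reconstructs T, exactly as parts (ii)-(iv) assert.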
7.8.8. Exercise.
Prove Theorem 7.8.7.
In Chapter 4 we defined the resolution of the identity operator for Euclidean spaces. We conclude this section with a more general definition.

7.8.9. Definition. Let {Pₙ} be a sequence of linear transformations on X such that Pₙ ∈ B(X, X) for each n. If conditions (i), (ii), and (iii) of Theorem 7.8.7 are satisfied, then {Pₙ} is said to be a resolution of the identity.
7.9. DIFFERENTIATION OF OPERATORS

In this section we consider differentiation of operators on normed linear spaces. Such operators need not be linear. Throughout this section, X and Y are normed linear spaces over a field F, where F may be either R, the real numbers, or C, the complex numbers. We will identify mappings which are, in general, not linear by f: X → Y. As usual, L(X, Y) will denote the class of all linear operators from X into Y, while B(X, Y) will denote the class of all bounded linear operators from X into Y.

7.9.1. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a function δf(x₀, ·): X → Y such that
lim_{t→0} ‖(1/t)[f(x₀ + th) − f(x₀)] − δf(x₀, h)‖ = 0    (7.9.2)
(where t ∈ F) for all h ∈ X, then f is said to be Gateaux differentiable at x₀, and δf(x₀, h) is called the Gateaux differential of f at x₀ with increment h.

The Gateaux differential of f is sometimes also called the weak differential of f or the G-differential of f. If f is Gateaux differentiable at x₀, then δf(x₀, h) need not be linear nor continuous as a function of h ∈ X. However, we shall primarily be concerned with functions f: X → Y which have these properties. This gives rise to the following concept.

7.9.3. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a bounded linear operator F(x₀) ∈ B(X, Y) such that
lim_{‖h‖→0} (1/‖h‖)‖f(x₀ + h) − f(x₀) − F(x₀)h‖ = 0
(where h ∈ X), then f is said to be Frechet differentiable at x₀, and F(x₀) is called the Frechet derivative of f at x₀. We define
f′(x₀) = F(x₀).
If f is Frechet differentiable for each x ∈ D, where D ⊂ X, then f is said to be Frechet differentiable on D.

We now show that Frechet differentiability implies Gateaux differentiability.

7.9.4. Theorem. Let f: X → Y, and let x₀ ∈ X be a fixed element. If f is Frechet differentiable at x₀, then f is Gateaux differentiable, and furthermore the Gateaux differential is given by
δf(x₀, h) = f′(x₀)h  for all h ∈ X.

Proof. Let F(x₀) = f′(x₀), let ε > 0, and let h ∈ X with h ≠ 0. Then there is a δ > 0 such that
‖f(x₀ + th) − f(x₀) − F(x₀)th‖ ≤ (ε/‖h‖)‖th‖ = ε|t|
provided that ‖th‖ < δ and th ≠ 0. This implies that
‖(1/t)[f(x₀ + th) − f(x₀)] − F(x₀)h‖ ≤ ε
provided that 0 < |t| < δ/‖h‖. Hence, f is Gateaux differentiable at x₀ and δf(x₀, h) = F(x₀)h. ∎

Because of the preceding theorem, if f: X → Y is Frechet differentiable at x₀ ∈ X, the Gateaux differential δf(x₀, h) = f′(x₀)h is also called the Frechet differential of f at x₀ with increment h.

Let us now consider some examples.
7.9.5. Example. Let X be a Hilbert space, and let f be a functional defined on X; i.e., f: X → F. If f has a Frechet derivative at some x₀ ∈ X, then that derivative must be a bounded linear functional on X; i.e., f′(x₀) ∈ X*. In view of Theorem 6.14.2, there is an element y₀ ∈ X such that f′(x₀)h = (h, y₀) for each h ∈ X. Although f′(x₀) ∈ X* and y₀ ∈ X, we know by Exercise 6.14.4 that X and X* are congruent and thus isometric. It is customary to view the corresponding elements of isometric spaces as being one and the same element. With this in mind, we say f′(x₀) = y₀ and we call f′(x₀) the gradient of f at x₀. ∎

As a special case of the preceding example we consider the following specific case.

7.9.6. Example. Let X = Rⁿ and let ‖·‖ be any norm on X. By Theorem 6.6.5, X is a Banach space. Now let f be a functional defined on X; i.e., f: X → R. Let x = (ξ₁, ..., ξₙ) ∈ X and h = (h₁, ..., hₙ) ∈ X. If f has continuous partial derivatives with respect to ξᵢ, i = 1, ..., n, then the Frechet differential of f is given by
δf(x, h) = (∂f(x)/∂ξ₁)h₁ + ⋯ + (∂f(x)/∂ξₙ)hₙ.
For fixed x₀ ∈ X, we define the bounded linear functional F(x₀) on X by
F(x₀)h = Σᵢ₌₁ⁿ (∂f(x)/∂ξᵢ)|_{x=x₀} hᵢ  for h ∈ X.
Then F(x₀) is the Frechet derivative of f at x₀. As in the preceding example, we do not distinguish between X and X*. The gradient of f at x is given by
f′(x) = (∂f(x)/∂ξ₁, ..., ∂f(x)/∂ξₙ).    (7.9.7)
∎
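The formula of Example 7.9.6 can be checked against a difference quotient. In the sketch below, the particular functional f and the point and increment are our own illustrative choices, not taken from the text.

```python
import numpy as np

# A sample functional on R^3 with continuous partials (our choice).
def f(x):
    return x[0] ** 2 + np.sin(x[1]) + x[0] * x[2]

def grad_f(x):
    # f'(x) = (df/dxi_1, df/dxi_2, df/dxi_3) as in (7.9.7)
    return np.array([2 * x[0] + x[2], np.cos(x[1]), x[0]])

x0 = np.array([1.0, 0.5, -2.0])
h = np.array([0.3, -0.1, 0.2])
t = 1e-6
quotient = (f(x0 + t * h) - f(x0)) / t      # (1/t)[f(x0 + th) - f(x0)]
differential = grad_f(x0) @ h               # delta_f(x0, h)
print(quotient, differential)
```

The difference quotient agrees with the Frechet differential up to a remainder of order t, as Definition 7.9.1 requires.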
In the following, we consider another example of the gradient of a functional.

7.9.8. Example. Let X be a real Hilbert space, let L: X → X be a bounded linear operator, and let f: X → R be given by f(x) = (x, Lx). Then f has a Frechet derivative which is given by f′(x) = (L + L*)x. To verify this, we let h be an arbitrary element in X and we let F(x) = (L + L*)x. Then
f(x + h) − f(x) − F(x)h = (x + h, Lx + Lh) − (x, Lx) − (h, Lx) − (h, L*x) = (h, Lh).
From this it follows that
lim_{‖h‖→0} |f(x + h) − f(x) − F(x)h| / ‖h‖ = 0. ∎

In the next example we consider a functional which frequently arises in optimization problems.

7.9.9. Example. Let X and Y be real Hilbert spaces, and let L be a bounded linear operator from X into Y; i.e., L ∈ B(X, Y). Let L* be the adjoint of L. Let v be a fixed element in Y, and let f be a real-valued functional defined on X by
f(x) = ‖v − Lx‖²  for all x ∈ X.
Then f has a Frechet derivative which is given by
f′(x) = −2L*v + 2L*Lx.
To verify this, observe that
f(x) = (v − Lx, v − Lx) = (v, v) − 2(L*v, x) + (x, L*Lx).
The conclusion now follows from Examples 7.9.5 and 7.9.8. ∎
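The gradients obtained in Examples 7.9.8 and 7.9.9 can be checked numerically in Rⁿ. In the sketch below, the matrices L, A and the vectors are randomly generated illustrative choices of our own: the first part compares (L + L*)x against a difference quotient, and the second confirms that −2A*v + 2A*Ax vanishes at the least-squares minimizer of ‖v − Ax‖².

```python
import numpy as np

rng = np.random.default_rng(0)

# Example 7.9.8: f(x) = (x, Lx) has gradient (L + L*)x.
L = rng.standard_normal((5, 5))
x = rng.standard_normal(5)
h = rng.standard_normal(5)
grad = (L + L.T) @ x
t = 1e-6
fd = (np.dot(x + t * h, L @ (x + t * h)) - np.dot(x, L @ x)) / t
gap = abs(fd - grad @ h)      # remainder is t*(h, Lh), of order t

# Example 7.9.9: f(x) = ||v - Ax||^2 has gradient -2A*v + 2A*Ax,
# which must vanish at the least-squares solution.
A = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
x_star, *_ = np.linalg.lstsq(A, v, rcond=None)
grad_at_min = -2 * A.T @ v + 2 * A.T @ (A @ x_star)
print(gap, np.linalg.norm(grad_at_min))
```

Setting the gradient of Example 7.9.9 to zero recovers the normal equations A*Ax = A*v, which is exactly what the least-squares routine solves.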
In the next example we introduce the Jacobian matrix of a function f: Rⁿ → Rᵐ.

7.9.10. Example. Let X = Rⁿ, and let Y = Rᵐ. Since X and Y are finite dimensional, we may assume arbitrary norms on each of these spaces, and they will both be Banach spaces. Let f: X → Y. For x = (ξ₁, ..., ξₙ) ∈ X, let us write
f(x) = [f₁(x), ..., f_m(x)]ᵀ = [f₁(ξ₁, ..., ξₙ), ..., f_m(ξ₁, ..., ξₙ)]ᵀ.
For x₀ ∈ X, assume that the partial derivatives ∂fᵢ(x₀)/∂ξ_j exist and are continuous for i = 1, ..., m and j = 1, ..., n. The Frechet differential of f at x₀ with increment h = (h₁, ..., hₙ) ∈ X is given by
δf(x₀, h) =
[∂f₁(x₀)/∂ξ₁  ⋯  ∂f₁(x₀)/∂ξₙ] [h₁]
[     ⋮                ⋮     ] [ ⋮]
[∂f_m(x₀)/∂ξ₁ ⋯  ∂f_m(x₀)/∂ξₙ] [hₙ].
The Frechet derivative of f at x₀ is the m × n matrix
f′(x₀) = [∂fᵢ(x₀)/∂ξ_j],
which is also called the Jacobian matrix of f at x₀. We sometimes write f′(x) = ∂f(x)/∂x. ∎

7.9.11. Example. Let X = C[a, b], the family of real-valued continuous functions defined on [a, b], and let {X; ‖·‖_∞} be the Banach space given in Example 6.1.9. Let k(s, t) be a real-valued function defined and continuous on [a, b] × [a, b], and let g(t, x) be a real-valued function which, together with ∂g(t, x)/∂x, is defined and continuous for t ∈ [a, b] and x ∈ R. Let f: X → X be defined by
[f(x)](s) = ∫ₐᵇ k(s, t)g(t, x(t)) dt,  x ∈ X.
For fixed x₀ ∈ X, the Frechet differential of f at x₀ with increment h ∈ X is given by
[δf(x₀, h)](s) = ∫ₐᵇ k(s, t) [∂g(t, x₀(t))/∂x] h(t) dt. ∎
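The Jacobian of Example 7.9.10 can be verified by forward differences. The sample map f: R³ → R³ below is our own illustrative choice, not taken from the text.

```python
import numpy as np

# A sample map R^3 -> R^3 with continuous partials (our choice).
def f(x):
    return np.array([x[0] * x[1], np.exp(x[2]), x[0] + np.sin(x[1])])

def jacobian(x):
    # Analytic Jacobian [df_i/dxi_j] of the sample map above.
    return np.array([
        [x[1], x[0], 0.0],
        [0.0, 0.0, np.exp(x[2])],
        [1.0, np.cos(x[1]), 0.0],
    ])

x0 = np.array([0.7, -0.2, 0.4])
t = 1e-6
J_fd = np.empty((3, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = 1.0
    J_fd[:, j] = (f(x0 + t * e) - f(x0)) / t   # column j by forward difference
max_gap = np.max(np.abs(J_fd - jacobian(x0)))
print(max_gap)
```

Column by column, the difference quotients reproduce the matrix of partial derivatives to within a remainder of order t.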
7.9.12. Exercise. Verify the assertions made in Examples 7.9.5 to 7.9.11.
We now establish some of the properties of Frechet differentials.

7.9.13. Theorem. Let f, g: X → Y be Frechet differentiable at x₀ ∈ X. Then
(i) f is continuous at x₀ ∈ X; and
(ii) for all α, β ∈ F, αf + βg is Frechet differentiable at x₀ and (αf + βg)′(x₀) = αf′(x₀) + βg′(x₀).

Proof. To prove (i), let f be Frechet differentiable at x₀, and let F(x₀) be the Frechet derivative of f at x₀. Then
f(x₀ + h) − f(x₀) = f(x₀ + h) − f(x₀) − F(x₀)h + F(x₀)h,
and
‖f(x₀ + h) − f(x₀)‖ ≤ ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ + ‖F(x₀)h‖.
Since F(x₀) is bounded, there is an M > 0 such that ‖F(x₀)h‖ ≤ M‖h‖. Furthermore, for given ε > 0 there is a δ > 0 such that ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ < ε‖h‖ provided that ‖h‖ < δ. Hence, ‖f(x₀ + h) − f(x₀)‖ < (M + ε)‖h‖ whenever ‖h‖ < δ. This implies that f is continuous at x₀. The proof of part (ii) is straightforward and is left as an exercise. ∎

7.9.14. Exercise. Prove part (ii) of Theorem 7.9.13.
We now show that the chain rule encountered in calculus applies to Fréchet derivatives as well.

7.9.15. Theorem. Let X, Y, and Z be normed linear spaces. Let g: X → Y, f: Y → Z, and let φ: X → Z be the composite function φ = f ∘ g. Let g be Fréchet differentiable on an open set D ⊂ X, and let f be Fréchet differentiable on an open set E ⊂ g(D). If x ∈ D is such that g(x) ∈ E, then φ is Fréchet differentiable at x and φ'(x) = f'(g(x))g'(x).

Proof. Let y = g(x) and d = g(x + h) − g(x), where h ∈ X is such that x + h ∈ D and y + d ∈ E. Then

    φ(x + h) − φ(x) − f'(y)g'(x)h = f(y + d) − f(y) − f'(y)g'(x)h
        = f(y + d) − f(y) − f'(y)d + f'(y)[d − g'(x)h].

Thus, given ε > 0 there is a δ > 0 such that ‖d‖ < δ and ‖h‖ < δ imply

    ‖φ(x + h) − φ(x) − f'(y)g'(x)h‖ ≤ ε‖d‖ + ε‖f'(y)‖·‖h‖.

7.9. Differentiation of Operators

By the continuity of g (see the proof of part (i) of Theorem 7.9.13), it follows that ‖d‖ ≤ M·‖h‖ for some constant M. Hence, there is a constant k such that

    ‖φ(x + h) − φ(x) − f'(y)g'(x)h‖ ≤ kε‖h‖.

This implies that φ'(x) exists and φ'(x) = f'(g(x))g'(x). ∎
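In finite dimensions the Fréchet derivative is represented by the Jacobian matrix, so the chain rule of Theorem 7.9.15 can be checked numerically. The following sketch (assuming NumPy is available; the particular maps f and g are illustrative, not from the text) compares a finite-difference estimate of (f ∘ g)'(x) with the product f'(g(x))g'(x):

```python
import numpy as np

# g: R^2 -> R^2 and f: R^2 -> R^2 with known Jacobians
def g(x):
    return np.array([np.sin(x[0]) + x[1], x[0] * x[1]])

def g_jac(x):
    return np.array([[np.cos(x[0]), 1.0],
                     [x[1], x[0]]])

def f(y):
    return np.array([y[0]**2, y[0] + np.exp(y[1])])

def f_jac(y):
    return np.array([[2 * y[0], 0.0],
                     [1.0, np.exp(y[1])]])

def numeric_jac(F, x, eps=1e-6):
    # central finite-difference estimate of the Frechet derivative (Jacobian)
    cols = []
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = eps
        cols.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.column_stack(cols)

x = np.array([0.3, -1.2])
phi = lambda x: f(g(x))
chain = f_jac(g(x)) @ g_jac(x)     # f'(g(x)) g'(x)
estimate = numeric_jac(phi, x)     # finite-difference estimate of phi'(x)
err = np.max(np.abs(chain - estimate))
```

The two matrices agree up to the finite-difference truncation error.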
We next consider the Fréchet derivative of bounded linear operators.

7.9.16. Theorem. Let T be a linear operator from X into Y. If f(x) = Tx for all x ∈ X, then f is Fréchet differentiable on X if and only if T is a bounded linear operator. In this case, f'(x) = T for all x ∈ X.

Proof. Let T be a bounded linear operator. Then ‖f(x + h) − f(x) − Th‖ = ‖T(x + h) − Tx − Th‖ = 0 for all x, h ∈ X. From this it follows that f'(x) = T. Conversely, suppose T is unbounded. Then, by Theorem 7.9.13, f cannot be Fréchet differentiable. ∎
Let us consider a specific case.

7.9.17. Example. Let X = Rⁿ and Y = Rᵐ, and let us assume that the natural basis for each of these spaces is being used (see Example 4.1.15). If A ∈ L(X, Y), then Ax is given in matrix representation by

    Ax = [a₁₁ ⋯ a₁ₙ]   [ξ₁]
         [ ⋮      ⋮ ]   [⋮ ]
         [aₘ₁ ⋯ aₘₙ]   [ξₙ].

Hence, if f(x) = Ax, then f'(x) = A, and the matrix representation of df(x)/dx is A. ∎
The next result is useful in obtaining bounds on Fréchet differentiable functions.
7.9.18. Theorem. Let f: X → Y, let D be an open set in X, and let f be Fréchet differentiable on D. Let x₀ ∈ D, and let h ∈ X be such that x₀ + th ∈ D for all t when 0 ≤ t ≤ 1. Let N = sup{‖f'(x₀ + th)‖ : 0 < t < 1}. Then

    ‖f(x₀ + h) − f(x₀)‖ ≤ N·‖h‖.

Proof. Let y = f(x₀ + h) − f(x₀), and let φ be a bounded linear functional defined on Y (i.e., φ ∈ Y*) such that φ(y) = ‖φ‖·‖y‖ (see Corollary 6.8.6). Define g: [0, 1] → R by g(t) = φ(f(x₀ + th)) for 0 ≤ t ≤ 1. By Theorems 7.9.15 and 7.9.16, g'(t) = φ(f'(x₀ + th)h). By the mean value theorem of calculus, there is a t₀ such that 0 < t₀ < 1 and g(1) − g(0) = g'(t₀). Thus,

    |φ(f(x₀ + h)) − φ(f(x₀))| ≤ ‖φ‖ · sup{‖f'(x₀ + th)‖ : 0 < t < 1}·‖h‖.

Chapter 7 / Linear Operators

Since

    |φ(f(x₀ + h)) − φ(f(x₀))| = |φ(y)| = ‖φ‖·‖f(x₀ + h) − f(x₀)‖,

it follows that

    ‖f(x₀ + h) − f(x₀)‖ ≤ sup{‖f'(x₀ + th)‖ : 0 < t < 1}·‖h‖. ∎
If a function f: X → Y is Fréchet differentiable on an open set D ⊂ X, and if f'(x) is Fréchet differentiable at x ∈ D, then f is said to be twice Fréchet differentiable at x, and we call the Fréchet derivative of f'(x) the second derivative of f. We denote the second derivative of f by f″. Note that f″ is a bounded linear operator defined on X with range in the normed linear space B(X, Y). We leave the proof of the next result as an exercise.
7.9.19. Theorem. Let f: X → Y be twice Fréchet differentiable on an open set D ⊂ X. Let x₀ ∈ D, and h ∈ X be such that x₀ + th ∈ D for all t when 0 ≤ t ≤ 1. Let N = sup{‖f″(x₀ + th)‖ : 0 < t < 1}. Then

    ‖f(x₀ + h) − f(x₀) − f'(x₀)h‖ ≤ ½N·‖h‖².

7.9.20. Exercise. Prove Theorem 7.9.19.
We conclude the present section by showing that the Gateaux and Fréchet differentials play a role in maximizing and minimizing functionals which is similar to that of the ordinary derivative of functions of real variables. Let F = R, and let f be a functional on X; i.e., f: X → R. Clearly, for fixed x₀, h ∈ X, we may define a function g: R → R by the relation g(t) = f(x₀ + th) for all t ∈ R. In this case, if f is Gateaux differentiable at x₀, we see that δf(x₀, h) = g'(t)|ₜ₌₀, where g'(t) is the usual derivative of g(t). We will need this property in proving our next result, Theorem 7.9.22. First, however, we require the following important concept.
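The relation δf(x₀, h) = g'(t)|ₜ₌₀ can be checked numerically for a simple quadratic functional. The sketch below (assuming NumPy; the matrix A and the points x₀ and h are illustrative choices, not from the text) compares a central-difference estimate of g'(0) with the closed-form differential 2(Ax₀, h) for f(x) = (x, Ax) with A symmetric:

```python
import numpy as np

# Gateaux differential of f(x) = (x, Ax) on R^3, computed as g'(0) with
# g(t) = f(x0 + t*h); for symmetric A the analytic value is 2*(A x0, h).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
f = lambda x: x @ A @ x

x0 = np.array([1.0, -1.0, 0.5])
h = np.array([0.2, 0.4, -0.3])

t = 1e-6
numeric = (f(x0 + t * h) - f(x0 - t * h)) / (2 * t)  # g'(0) by central difference
analytic = 2 * (A @ x0) @ h
gap = abs(numeric - analytic)
```

For a quadratic functional the central difference is exact up to roundoff, so the two values agree essentially to machine precision.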
7.9.21. Definition. Let f be a real-valued functional defined on a domain 𝔇 ⊂ X; i.e., f: 𝔇 → R. Let x₀ ∈ 𝔇. Then f is said to have a relative minimum (relative maximum) at x₀ if there exists an open sphere S(x₀; r) ⊂ X such that for all x ∈ S(x₀; r) ∩ 𝔇 the relation f(x₀) ≤ f(x) (f(x₀) ≥ f(x)) holds. If f has either a relative minimum or a relative maximum at x₀, then f is said to have a relative extremum at x₀.

For relative extrema, we have the following result.
7.9.22. Theorem. Let f: X → R be Gateaux differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then δf(x₀, h) = 0 for all h ∈ X.

Proof. As pointed out in the remark preceding Definition 7.9.21, the real-valued function g(t) = f(x₀ + th) must have an extremum at t = 0. From the ordinary calculus we must have g'(t)|ₜ₌₀ = 0. Hence, δf(x₀, h) = 0 for all h ∈ X. ∎

We leave the proof of the next result as an exercise.

7.9.23. Corollary. Let f: X → R be Fréchet differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then f'(x₀) = 0.

7.9.24. Exercise. Prove Corollary 7.9.23.
We conclude this section with the following example.

7.9.25. Example. Consider the real-valued functional f defined in Example 7.9.9; i.e., f(x) = ‖v − Lx‖². For a given v ∈ Y, a necessary condition for f to have a minimum at x₀ ∈ X is that

    0 = L*Lx₀ − L*v. ∎
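In matrix terms, Example 7.9.25 is the familiar least-squares problem: with L a real matrix and L* = Lᵀ, the condition 0 = L*Lx₀ − L*v is the normal equation. A minimal numerical sketch (assuming NumPy; the data are randomly generated purely for illustration):

```python
import numpy as np

# Minimizing ||v - Lx||^2 over R^n leads to the normal equation
# L^T L x0 = L^T v; compare its solution with NumPy's least-squares solver.
rng = np.random.default_rng(0)
L = rng.standard_normal((6, 3))   # full column rank almost surely
v = rng.standard_normal(6)

x0 = np.linalg.solve(L.T @ L, L.T @ v)        # solve L*L x = L*v
x_ls = np.linalg.lstsq(L, v, rcond=None)[0]   # direct least-squares solution

residual_gradient = L.T @ (L @ x0 - v)        # vanishes at the minimum
```

Both routes give the same minimizer, and the gradient condition of the example holds at it.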
7.10. SOME APPLICATIONS

In this section we consider selected applications of the material of the present chapter. The section consists of three parts. In the first part we consider integral equations, in the second part we give an example in optimal control, while in the third part we address the problem of minimizing functionals by the method of steepest descent.

A. Applications to Integral Equations
Throughout this part, X is a complex Hilbert space while T denotes a completely continuous normal operator defined on X. We recall that if, e.g., X = L₂[a, b] and T is defined by (see Example 7.3.11 and the comment at the end of Example 7.7.5)

    Tx(s) = ∫ₐᵇ k(s, t)x(t)dt,    (7.10.1)

then T is a completely continuous operator defined on X. Furthermore, if k(s, t) = k̄(t, s) for all s, t ∈ [a, b] (the bar denoting complex conjugation), then T is hermitian (see Exercise 7.4.20) and, hence, normal.

In the following, we shall focus our attention on equations of the form

    Tx − λx = y,    (7.10.2)
where λ ∈ C and x, y ∈ X. If, in particular, T is defined by Eq. (7.10.1), then Eq. (7.10.2) includes a large class of integral equations. Indeed, it was the study of such equations which gave rise to much of the development of functional analysis.

We now prove the following existence and uniqueness result.

7.10.3. Theorem. If λ ≠ 0 and if λ is not an eigenvalue of T, then Eq. (7.10.2) has a unique solution, which is given by

    x = −(1/λ)P₀y + Σₙ₌₁^∞ [1/(λₙ − λ)]Pₙy,    (7.10.4)

where {λₙ} are the non-zero distinct eigenvalues of T, Pₙ is the projection of X onto 𝔑ₙ = 𝔑(T − λₙI) along 𝔑ₙ^⊥ for n = 1, 2, ..., and P₀x is the projection of x onto 𝔑(T).

Proof. We first prove that the infinite series on the right-hand side of Eq. (7.10.4) is convergent. Since λ ≠ 0, it cannot be an accumulation point of {λₙ}. Thus, we can find a d > 0 such that |λ| > d and |λ − λₖ| > d for k = 1, 2, .... We note from Theorem 7.8.7 that PᵢPⱼ = 0 for i ≠ j. Now for N < ∞, we have by the Pythagorean theorem,

    ‖−(1/λ)P₀y + Σₖ₌₁^N [1/(λₖ − λ)]Pₖy‖²
        = (1/|λ|²)‖P₀y‖² + Σₖ₌₁^N [1/|λₖ − λ|²]‖Pₖy‖²
        ≤ (1/d²)‖P₀y‖² + (1/d²)Σₖ₌₁^N ‖Pₖy‖²
        = (1/d²)‖P₀y + Σₖ₌₁^N Pₖy‖²
        ≤ (1/d²)‖y‖².

This implies that Σₖ₌₁^∞ [1/|λₖ − λ|²]‖Pₖy‖² is convergent, and so it follows from Theorem 6.13.3 that

    Σₙ₌₁^∞ [1/(λₙ − λ)]Pₙy

is convergent to an element in X.

Let j be a positive integer. By Theorem 7.5.12, Pⱼ is continuous, and so by Theorem 7.1.27, Pⱼ(Σₙ₌₁^∞ [1/(λₙ − λ)]Pₙy) = [1/(λⱼ − λ)]Pⱼy. Now let x be given by Eq. (7.10.4) for arbitrary y ∈ X. We want to show that Tx − λx = y. From Eq. (7.10.4) we have

    P₀x = −(1/λ)P₀y

and

    Pⱼx = [1/(λⱼ − λ)]Pⱼy for j = 1, 2, ....

Thus,

    P₀y = −λP₀x and Pⱼy = λⱼPⱼx − λPⱼx.

Now from the spectral theorem (Theorem 7.8.7), we have

    y = P₀y + Σⱼ₌₁^∞ Pⱼy,  Tx = Σⱼ₌₁^∞ λⱼPⱼx,

and

    λx = λP₀x + Σⱼ₌₁^∞ λPⱼx.

Hence, y = Tx − λx.

Finally, to show that x given by Eq. (7.10.4) is unique, let x and z be such that Tx − λx = Tz − λz = y. Then it follows that T(x − z) − λ(x − z) = y − y = 0. Hence, T(x − z) = λ(x − z). Since λ is by assumption not an eigenvalue of T, we must have x − z = 0. This completes the proof. ∎
In the next result we consider the case where λ is a non-zero eigenvalue of T.

7.10.5. Theorem. Let {λₙ} denote the non-zero distinct eigenvalues of T, and let λ = λⱼ for some positive integer j. Then there is a (non-unique) x ∈ X satisfying Eq. (7.10.2) if and only if Pⱼy = 0, where Pⱼ is the orthogonal projection of X onto 𝔑ⱼ = {x : (T − λI)x = 0}. If Pⱼy = 0, then a solution to Eq. (7.10.2) is given by

    x = x₀ − (1/λ)P₀y + Σₖ₌₁,ₖ≠ⱼ^∞ [1/(λₖ − λ)]Pₖy,    (7.10.6)

where P₀ is the orthogonal projection of X onto 𝔑(T) and x₀ is any element in 𝔑ⱼ.

Proof. We first observe that 𝔑ⱼ reduces T by part (iii) of Theorem 7.6.26. It therefore follows from part (ii) of Theorem 7.5.22 that TPⱼ = PⱼT. Now suppose that y is such that Eq. (7.10.2) is satisfied for some x ∈ X. Then it follows that

    Pⱼy = Pⱼ(Tx − λⱼx) = TPⱼx − λⱼPⱼx = λⱼPⱼx − λⱼPⱼx = 0.

In the preceding, we used the fact that Tx = λⱼx for x ∈ 𝔑ⱼ and Pⱼx ∈ 𝔑ⱼ for all x ∈ X. Hence, Pⱼy = 0.

Conversely, suppose that Pⱼy = 0, and let x be given by Eq. (7.10.6). The proof that x satisfies Eq. (7.10.2) follows along the same lines as the proof of Theorem 7.10.3, and the details are left as an exercise. The non-uniqueness of the solution is apparent, since (T − λI)x₀ = 0 for any x₀ ∈ 𝔑ⱼ. ∎

7.10.7. Exercise. Complete the proof of Theorem 7.10.5.
B. An Example from Optimal Control
In this example we consider systems which can appropriately be described by the system of first-order ordinary differential equations

    ẋ(t) = Ax(t) + Bu(t),    (7.10.8)

where x(0) = x₀ is given. Here x(t) ∈ Rⁿ and u(t) ∈ Rᵐ for every t such that 0 ≤ t ≤ T for some T > 0, A is an n × n matrix, and B is an n × m matrix. As we saw in part (vi) of Theorem 4.11.45, if each element of the vector u(t) is a continuous function of t, then the unique solution to Eq. (7.10.8) at time t is given by

    x(t) = Φ(t, 0)x(0) + ∫₀ᵗ Φ(t, τ)Bu(τ)dτ,    (7.10.9)

where Φ(t, τ) is the state transition matrix for the system of equations given in Eq. (7.10.8).

Let us now define the class of vector-valued functions L₂ᵐ[0, T] by

    L₂ᵐ[0, T] = {u : uᵀ = (u₁, ..., uₘ), where uᵢ ∈ L₂[0, T], i = 1, ..., m}.

If we define the inner product by

    (u, v) = ∫₀ᵀ uᵀ(t)v(t)dt

for u, v ∈ L₂ᵐ[0, T], then it follows that L₂ᵐ[0, T] is a Hilbert space (see Example 6.11.11). Next, let us define the linear operator L: L₂ᵐ[0, T] → L₂ⁿ[0, T] by

    [Lu](t) = ∫₀ᵗ Φ(t, τ)Bu(τ)dτ    (7.10.10)

for all u ∈ L₂ᵐ[0, T]. Since the elements of Φ(t, τ) are continuous functions on [0, T] × [0, T], it follows that L is completely continuous.

Now recall from Exercise 5.10.59 that Eq. (7.10.9) is the unique solution to Eq. (7.10.8) when the elements of the vector u(t) are continuous functions of t. It can be shown that the solution of Eq. (7.10.8) exists in an extended sense if we permit u ∈ L₂ᵐ[0, T]. Allowing for this generalization, we can now consider the following optimal control problem.

Let γ ∈ R be such that γ > 0, and let f be the real-valued functional defined on L₂ᵐ[0, T] given by

    f(u) = ∫₀ᵀ xᵀ(t)x(t)dt + γ ∫₀ᵀ uᵀ(t)u(t)dt,    (7.10.11)

where x(t) is given by Eq. (7.10.9) for u ∈ L₂ᵐ[0, T]. The linear quadratic cost control problem is to find u ∈ L₂ᵐ[0, T] such that f(u) in Eq. (7.10.11) is minimum, where x(t) is the solution to the set of ordinary differential equations (7.10.8). This problem can be cast into a minimization problem in a Hilbert space as follows.
Let v(t) = −Φ(t, 0)x₀ for 0 ≤ t ≤ T. Then we can rewrite Eq. (7.10.9) as

    x = Lu − v,

and Eq. (7.10.11) assumes the form

    f(u) = ‖Lu − v‖² + γ‖u‖².

We can find the desired minimizing u in the more general context of arbitrary real Hilbert spaces by means of the following result.
7.10.12. Theorem. Let X and Y be real Hilbert spaces, let L: X → Y be a completely continuous operator, and let L* denote the adjoint of L. Let v be a given fixed element in Y, let γ ∈ R, and define the functional f: X → R by

    f(u) = ‖Lu − v‖² + γ‖u‖²    (7.10.13)

for u ∈ X. (In Eq. (7.10.13) we use the norm induced by the inner product and note that ‖u‖ is the norm of u ∈ X, while ‖Lu − v‖ is the norm of (Lu − v) ∈ Y.) If in Eq. (7.10.13), γ > 0, then there exists a unique u₀ ∈ X such that f(u₀) ≤ f(u) for all u ∈ X. Furthermore, u₀ is the solution to the equation

    L*Lu₀ + γu₀ = L*v.    (7.10.14)

Proof. Let us first examine Eq. (7.10.14). Since L is a completely continuous operator, by Corollary 7.7.12, so is L*L. Furthermore, the eigenvalues of L*L cannot be negative, and so −γ cannot be an eigenvalue of L*L. Making the association T = L*L, λ = −γ, and y = L*v in Eq. (7.10.2), it is clear that T is normal, and it follows from Theorem 7.10.3 that Eq. (7.10.14) has a unique solution. In fact, this solution is given by Eq. (7.10.4), using the above definitions of symbols.

Next, let us assume that u₀ is the unique element in X satisfying Eq. (7.10.14), and let h ∈ X be arbitrary. It follows from Eq. (7.10.13) that

    f(u₀ + h) = (Lu₀ + Lh − v, Lu₀ + Lh − v) + γ(u₀ + h, u₀ + h)
        = (Lu₀ − v, Lu₀ − v) + 2(Lh, Lu₀ − v) + (Lh, Lh) + γ(u₀, u₀) + 2γ(u₀, h) + γ(h, h)
        = ‖Lu₀ − v‖² + γ‖u₀‖² + 2(h, L*Lu₀ + γu₀ − L*v) + ‖Lh‖² + γ‖h‖²
        = f(u₀) + ‖Lh‖² + γ‖h‖².

Therefore, f(u₀ + h) is minimum if and only if h = 0. ∎
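In finite dimensions Theorem 7.10.12 reduces to a Tikhonov-regularized least-squares problem. The sketch below (assuming NumPy; L, v, and γ are illustrative data) solves Eq. (7.10.14) and checks numerically that perturbing u₀ only increases f, as in the identity f(u₀ + h) = f(u₀) + ‖Lh‖² + γ‖h‖²:

```python
import numpy as np

# For gamma > 0, f(u) = ||Lu - v||^2 + gamma*||u||^2 has the unique
# minimizer solving (L^T L + gamma*I) u0 = L^T v.
rng = np.random.default_rng(1)
L = rng.standard_normal((8, 5))
v = rng.standard_normal(8)
gamma = 0.1

u0 = np.linalg.solve(L.T @ L + gamma * np.eye(5), L.T @ v)

f = lambda u: np.sum((L @ u - v)**2) + gamma * np.sum(u**2)

# Perturbing u0 in random directions can only increase f.
worst = min(f(u0 + 0.01 * rng.standard_normal(5)) - f(u0) for _ in range(20))
```

Every trial perturbation increases the cost, consistent with u₀ being the unique minimizer.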
The solution to Eq. (7.10.14) can be obtained from Eq. (7.10.4); however, a more convenient method is available for finding the solution when L is given by Eq. (7.10.10). This is summarized in the following result.
7.10.15. Theorem. Let γ > 0, and let f(u) be defined by Eq. (7.10.11), where x(t) is the solution to Eq. (7.10.8). If

    u(t) = −(1/γ)BᵀP(t)x(t)

for all t such that 0 ≤ t ≤ T, where P(t) is the solution to the matrix differential equation

    Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I    (7.10.16)

with P(T) = 0, then u minimizes f(u).

Proof. We want to show that u satisfies Eq. (7.10.14), where Lu is given by Eq. (7.10.10). We note that if u satisfies Eq. (7.10.14), then

    u = −(1/γ)L*(Lu − v) = −(1/γ)L*x.

We now find the expression for evaluating L*w for arbitrary w ∈ L₂ⁿ[0, T]. We compute

    (w, Lu) = ∫₀ᵀ [∫₀ˢ Φ(s, t)Bu(t)dt]ᵀ w(s)ds
        = ∫₀ᵀ ∫₀ˢ uᵀ(t)BᵀΦᵀ(s, t)w(s)dt ds
        = ∫₀ᵀ uᵀ(t)[∫ₜᵀ BᵀΦᵀ(s, t)w(s)ds]dt.

In order for this last expression to equal (L*w, u), we must have

    [L*w](t) = ∫ₜᵀ BᵀΦᵀ(s, t)w(s)ds.

Thus, u must satisfy

    u(t) = −(1/γ)Bᵀ ∫ₜᵀ Φᵀ(s, t)x(s)ds

for all t such that 0 ≤ t ≤ T. Now assume there exists a matrix P(t) such that

    P(t)x(t) = ∫ₜᵀ Φᵀ(s, t)x(s)ds.    (7.10.17)

We now find conditions for such a matrix P(t) to exist. First, we see that P(T) = 0. Next, differentiating both sides of Eq. (7.10.17) with respect to t, and noting that (∂/∂t)Φᵀ(s, t) = −AᵀΦᵀ(s, t), we have

    Ṗ(t)x(t) + P(t)ẋ(t) = −x(t) − Aᵀ ∫ₜᵀ Φᵀ(s, t)x(s)ds
        = −x(t) − AᵀP(t)x(t).

Therefore,

    Ṗ(t)x(t) + P(t)[Ax(t) + Bu(t)] = −x(t) − AᵀP(t)x(t).

But u(t) = −(1/γ)BᵀP(t)x(t), so that

    Ṗ(t)x(t) + P(t)Ax(t) − (1/γ)P(t)BBᵀP(t)x(t) = −x(t) − AᵀP(t)x(t).

Hence, P(t) must satisfy

    Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I

with P(T) = 0. If it does, then u satisfies

    L*Lu + γu = L*v,

where v is given by v(t) = −Φ(t, 0)x₀, and so, by Theorem 7.10.12, u minimizes Eq. (7.10.11). This completes the proof of the theorem. ∎
The differential equation for P(t) in Eq. (7.10.16) is called a matrix Riccati equation and can be shown to have a unique solution for all t < T.
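The Riccati equation (7.10.16) is straightforward to integrate numerically backward from P(T) = 0. The sketch below (assuming NumPy; the matrices A and B, the horizon, and the step size are illustrative, and simple Euler stepping stands in for a higher-order integrator) computes P(t) and checks that the feedback control u = −(1/γ)BᵀP(t)x(t) yields a smaller cost (7.10.11) than u = 0:

```python
import numpy as np

# Backward integration of the matrix Riccati equation (7.10.16) from
# P(T) = 0, then cost comparison for the resulting feedback law.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # lightly damped oscillator
B = np.array([[0.0], [1.0]])
gamma, T, dt = 1.0, 5.0, 1e-3
steps = int(T / dt)

P = np.zeros((2, 2))
Ps = [None] * (steps + 1)
Ps[steps] = P.copy()                       # terminal condition P(T) = 0
for k in range(steps, 0, -1):
    Pdot = -A.T @ P - P @ A + (1/gamma) * P @ B @ B.T @ P - np.eye(2)
    P = P - dt * Pdot                      # Euler step backward in time
    Ps[k-1] = P.copy()

def cost(use_feedback):
    # forward-Euler simulation of (7.10.8) and the cost (7.10.11)
    x = np.array([1.0, 0.0])
    J = 0.0
    for k in range(steps):
        u = -(1/gamma) * (B.T @ Ps[k] @ x) if use_feedback else np.zeros(1)
        J += (x @ x + gamma * u @ u) * dt
        x = x + dt * (A @ x + B @ u)
    return J

J_lqr, J_zero = cost(True), cost(False)
```

Since u = 0 is itself an admissible control, the optimal feedback cost must come out strictly smaller here.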
C. Minimization of Functionals: Method of Steepest Descent
The problem of finding the minimum (or maximum) of functionals arises frequently in many diverse areas in applications. In this part we turn our attention to an iterative method of obtaining the minimum of a functional f defined on a real Hilbert space X.

Consider a functional f: X → R of the form

    f(x) = (x, Mx) − 2(w, x) + β,    (7.10.18)

where w is a fixed vector in X, where β ∈ R, and where M is a linear self-adjoint operator having the property

    c₁‖x‖² ≤ (x, Mx) ≤ c₂‖x‖²    (7.10.19)

for all x ∈ X and some constants c₂ > c₁ > 0. The reader can readily verify that the functional given in Eq. (7.10.13) is a special case of f given in Eq. (7.10.18), where we make the association M = L*L + γI (provided γ > 0), w = L*v, and β = (v, v).

Under the above conditions, the equation

    Mx = w    (7.10.20)
has a unique solution, say x₀, and x₀ minimizes f(x). Iterative methods are based on beginning with an initial guess to the solution of Eq. (7.10.20) and then successively attempting to improve the estimate according to a recursive relationship of the form

    xₙ₊₁ = xₙ + αₙrₙ,    (7.10.21)

where αₙ ∈ R and rₙ ∈ X. Different methods of selecting αₙ and rₙ give rise to various algorithms for minimizing f(x) given in Eq. (7.10.18) or, equivalently, finding the solution to Eq. (7.10.20). In this part we shall in particular consider the method of steepest descent. In doing so we let

    rₙ = w − Mxₙ,  n = 1, 2, ....    (7.10.22)

The term rₙ defined by Eq. (7.10.22) is called the residual of the approximation xₙ. If, in particular, xₙ satisfies Eq. (7.10.20), we see that the residual is zero. For f(x) given in Eq. (7.10.18), we see that

    f'(xₙ) = −2rₙ,

where f'(xₙ) denotes the gradient of f(xₙ). That is, the residual, rₙ, is "pointing" into the direction of the negative of the gradient, or in the direction of steepest descent. Equation (7.10.21) indicates that the correction term αₙrₙ is to be a scalar multiple of the gradient, and thus the steepest descent method constitutes an example of one of the so-called "gradient methods." With rₙ given by Eq. (7.10.22), αₙ is chosen so that f(xₙ + αₙrₙ) is minimum. Substituting xₙ + αₙrₙ into Eq. (7.10.18), it is readily shown that

    αₙ = (rₙ, rₙ)/(rₙ, Mrₙ)

is the minimizing value. This method is illustrated pictorially in Figure B.

7.10.23. Figure B. Illustration of the method of steepest descent.
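The iteration (7.10.21)-(7.10.22) can be sketched numerically in R³ as follows (assuming NumPy; the operator M and the vector w are illustrative choices satisfying the hypotheses, with M symmetric positive definite):

```python
import numpy as np

# Steepest descent for f(x) = (x, Mx) - 2(w, x) + beta:
#   r_n = w - M x_n,  alpha_n = (r_n, r_n)/(r_n, M r_n),
#   x_{n+1} = x_n + alpha_n r_n,
# which converges to the solution of Mx = w.
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
w = np.array([1.0, -2.0, 0.5])

x = np.zeros(3)                      # arbitrary starting point
for n in range(200):
    r = w - M @ x                    # residual: half the negative gradient
    if np.linalg.norm(r) < 1e-12:
        break
    alpha = (r @ r) / (r @ (M @ r))  # exact line search along r
    x = x + alpha * r

x_exact = np.linalg.solve(M, w)
error = np.linalg.norm(x - x_exact)
```

The geometric decrease of F(xₙ) guaranteed by Theorem 7.10.24 drives the residual below the stopping tolerance well within the iteration budget.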
In the following result we show that under appropriate conditions the sequence {xₙ} generated in the heuristic discussion above converges to the unique minimizing element x₀ satisfying Eq. (7.10.20).

7.10.24. Theorem. Let M ∈ B(X, X) be a self-adjoint operator such that for some pair of positive real numbers η and μ we have η‖x‖² ≤ (x, Mx) ≤ μ‖x‖² for all x ∈ X. Let x₁ ∈ X be arbitrary, let w ∈ X, and let rₙ = w − Mxₙ, where xₙ₊₁ = xₙ + αₙrₙ for n = 1, 2, ..., and αₙ = (rₙ, rₙ)/(rₙ, Mrₙ). Then the sequence {xₙ} converges to x₀, where x₀ is the unique solution to Eq. (7.10.20).

Proof. In view of the Schwarz inequality we have (x, Mx) ≤ ‖Mx‖·‖x‖. This implies that η‖x‖ ≤ ‖Mx‖ for all x ∈ X, and so M is a bijective mapping by Theorem 7.4.21, with M⁻¹ ∈ B(X, X) and ‖M⁻¹‖ ≤ 1/η. By Theorem 7.4.10, M⁻¹ is also self-adjoint. Let x₀ be the unique solution to Eq. (7.10.20), and define F: X → R by

    F(x) = (x − x₀, M(x − x₀)) for x ∈ X.

We see that F is minimized uniquely by x = x₀, and furthermore F(x₀) = 0. We now show that limₙ F(xₙ) = 0. If for some n, F(xₙ) = 0, the process terminates and we are done. So assume in the following that F(xₙ) ≠ 0. Note also that since M is positive, we have F(x) > 0 for all x ≠ x₀.

We begin with the fact that

    F(xₙ₊₁) = F(xₙ) − 2αₙ(rₙ, Myₙ) + αₙ²(rₙ, Mrₙ),

where we have let yₙ = x₀ − xₙ. Noting that rₙ = Myₙ, so that F(xₙ) = (yₙ, Myₙ) = (M⁻¹rₙ, rₙ), we have

    F(xₙ₊₁) = F(xₙ) − (rₙ, rₙ)²/(rₙ, Mrₙ)
        = [1 − (rₙ, rₙ)²/((rₙ, Mrₙ)(M⁻¹rₙ, rₙ))]F(xₙ).

Since

    (rₙ, rₙ)²/[(rₙ, Mrₙ)(M⁻¹rₙ, rₙ)] ≥ η/μ,

it follows that

    F(xₙ₊₁) ≤ (1 − η/μ)F(xₙ) ≤ (1 − η/μ)ⁿF(x₁).

Thus, limₙ F(xₙ) = 0 and so xₙ → x₀, which was to be proven. ∎

7.11. REFERENCES AND NOTES
Many of the excellent sources dealing with linear operators on Banach and Hilbert spaces include Balakrishnan [7.2], Dunford and Schwarz [7.5], Kantorovich and Akilov [7.6], Kolmogorov and Fomin [7.7], Liusternik and Sobolev [7.8], Naylor and Sell [7.11], and Taylor [7.12]. The exposition by Naylor and Sell is especially well suited from the viewpoint of applications in science and engineering.

For applications of the type considered in Section 7.10, as well as additional applications, refer to Antosiewicz and Rheinboldt [7.1], Balakrishnan [7.2], Byron and Fuller [7.3], Curtain and Pritchard [7.4], Kantorovich and Akilov [7.6], Lovitt [7.9], and Luenberger [7.10]. Applications to integral equations (see Section 7.10A) are treated in [7.3] and [7.9]. Optimal control problems (see Section 7.10B) in a Banach and Hilbert space setting are presented in [7.2], [7.4], and [7.10]. Methods for minimization of functionals (see Section 7.10C) are developed in [7.1], [7.6], and [7.10].
REFERENCES

[7.1] H. A. ANTOSIEWICZ and W. C. RHEINBOLDT, "Numerical Analysis and Functional Analysis," Chapter 14 in Survey of Numerical Analysis, ed. by J. TODD. New York: McGraw-Hill Book Company, 1962.
[7.2] A. V. BALAKRISHNAN, Applied Functional Analysis. New York: Springer-Verlag, 1976.
[7.3] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.*
[7.4] R. F. CURTAIN and A. J. PRITCHARD, Functional Analysis in Modern Applied Mathematics. London: Academic Press, Inc., 1977.
[7.5] N. DUNFORD and J. SCHWARZ, Linear Operators, Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[7.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[7.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[7.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[7.9] W. V. LOVITT, Linear Integral Equations. New York: Dover Publications, Inc., 1950.
[7.10] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[7.11] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[7.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.
INDEX
Abelian group, 40 abstract algebra, 33 additive group, 46 adherent point, 275 adjoint system of ordinary differential equations, 261 adjoint transformation, 219, 220,422 affine linear subspace, 85 algebra, 30,56,57,104 algebraically closed field, 165 algebraic conjugate, 110 algebraic multiplicity, 167,223
algebraic structure, 31 algebraic system, 30 algebra with identity, 57, 105 aligned, 379 almost everywhere, 295 approximate eigenvalue, 444 approximate point spectrum, 444 approximation, 395 Arzela-Ascoli theorem, 316 Ascoli's lemma, 317 associative algebra, 56, 105 associative operation, 28 automorphism, 64, 68 autonomous system of differential equations, 241 Axioms of norm, 207
B
Banach inverse theorem, 416 Banach space, 31, 345 basis, 61,89 Bessel inequality, 213, 380 bicompact, 302 bijection 14 bijective, 14, 100 bilinear form, 114 bilinear functional, 114-115 binary operation, 26 block diagonal matrix, 175 Bolzano-Weierstrass property, 302 Bolzano-Weierstrass theorem, 298 boundary, 279 bounded linear functional, 356 bounded linear operator, 407 bounded metric space, 265 bounded sequence, 286 B(X,Y), 409
c C[a,b],80 cancellation laws, 34 canonical mapping, 372 cardinal number, 24 cartesian product, 10 Cauchy-Peano existence theorem, 332 Cauchy sequence, 290 Cayley-Hamilton theorem, 167 Cayley's theorem, 66 characteristic equation, 166,259 characteristic polynomial, 166 characteristic value, 164 characteristic vector, 164 0 > 79 classical adjoint of a matrix, 162 closed interval, 283 closed relative to an operation, 28
closed set, 279 closed sphere, 283 closure, 275 C n ,78 cofactor, 158 colinear, 379 collection of subsets, 8 column matrix, 132 column of a matrix, 132 column rank of a matrix, 152 column vector, 125 commutative algebra, 57,105 commutative group, 40 commutative operation, 28 commutative ring, 47 compact, 302 compact operator, 447 companion form, 256 comparable matrices, 137 complement of a subset, 4 completely continuous operator, 447 complete metric space, 290 complete ortghonormal set of vectors, 213,389 completion, 295 complex vector space, 76 composite function, 16 composite mathematical system, 30, 54 conformal matrices, 137 congruent matrices, 198 conjugate functional, 114 conjugate operator, 421 constant coefficients, 241 contact point, 275 continuation of a solution, 336 continuous function, 307,408 continuous spectrum, 440 contraction mapping, 314 converge, 286,350 convex, 351-355 coordinate representation of a vector, 125 coordinates of a vector with respect to a basis, 92, 124 countable set, 23 countably infinite set, 23
covering, 299 cyclic group, 43,44
D degree of a polynomial, 70 DeMorgan's laws, 7, 12 dense-in-itself, 284 denumerable set, 23 derived set, 277-278 determinant of a linear transformation, 163 determinant of a matrix, 157 diagonalization of a matrix, 172 diagonalization process, 450 diagonal matrix, 155 diameter of a set, 267 difference of sets, 7 differentiation: of matrices, 247 of vectors, 241 dimension, 78, 92, 392 direct product, 10 direct sum of linear subspaces, 83, 457 discrete metric, 265 disjoint sets, 5 disjoint vector spaces, 83 distance, 264 between a point and a set, 267 between sets, 267 between vectors, 208 distribution function, 397 distributive, 28 diverge, 286, 350 division algorithm, 71 division (of polynomials), 72 division ring, 46, 50 divisor, 49 divisors of zero, 48 domain of a function, 12 domain of a relation, 25 dot product, 114
dual, 358 dual basis, 112
E ε-approximate solution, 329 ε-dense set, 299 ε-net, 299 eigenvalue, 164, 439 eigenvector, 164, 439 element, 2 element of ordered set, 10 empty set, 3 endomorphism, 64, 68 equal by definition, 10 equality of functions, 14 equality of matrices, 132 equality of sets, 3 equals relation, 26 equicontinuous, 316 equivalence relation, 26 equivalent matrices, 151 equivalent metrics, 318 equivalent sets, 23 error vector, 395 estimate, 398 Euclidean metric, 271 Euclidean norm, 207 Euclidean space, 30, 124, 205 even permutation, 156 events, 397 everywhere dense, 284 expected value, 398 extended real line, 266 extended real numbers, 266 extension of a function, 20 extension of an operation, 29 exterior, 279 extremum, 464
F factor, 72 family of disjoint sets, 12 family of subsets, 8
field, 30, 46, 50 field of complex numbers, 51 field of real numbers, 51 finite covering, 299 finite-dimensional operator, 450 finite-dimensional vector space, 92,124 finite group, 40 finite intersection property, 305 finite linear combination of vectors, 85 finite set, 8 fixed point, 315 flat, 85 F n , 78 Fourier coefficients, 380,389 Frechet derivative, 458 Fredholm equation, 97,326 Fredholm operator, 425 function, 12 functional, 109,355 functional analysis, 343 function space, 80 fundamental matrix, 246 fundamental sequence, 290 fundamental set, 246 fundamental theorem of algebra, 74 fundamental theorem of linear equations, 99
G Gateaux differential, 458 generalized associative law, 36 generated subspace, 383 generators of a set, 60 Gram matrix, 395 Gram-Schmidt process, 213,391 graph of a function, 14 greatest common divisor, 73 Gronwall inequality, 332 group, 30, 39
group component, 46 group operation, 46
H Hahn-Banach theorem, 367-370 half space, 366 Hamel basis, 89 Hausdorff spaces, 323 Heine-Borel property, 302 Heine-Borel theorem, 299 hermitian operator, 427 Hilbert space, 31, 377 homeomorphism, 320 homogeneous property of a norm, 208,344 homogeneous system, 241-242 homomorphic image, 62,68 homomorphic rings, 67 homomorphic semigroups, 63 homomorphism, 30, 62 hyperplane, 364
I idempotent operator, 121 identity: element, 35 function, 19 matrix, 139 permutation, 19,44 relation, 26 transformation, 105,409 image of a set under f, 21 indeterminate of a polynomial ring, 70 index: of a nilpotent operator, 185 of a symmetric bilinear functional, 202 set, 10 indexed family of sets, 10 indexed set, 11 induced: mapping, 20
induced (cont.): metric, 267 norm, 349, 412 operation, 29 inequalities, 268-271 infinite-dimensional vector space, 92 infinite series, 350 infinite set, 8 initial value problem, 238-261, 328-: injection, 14 injective, 14, 100 inner product, 117, 205, 375 inner product space, 31, 118, 205 inner product subspace, 118 integral domain, 46, 49 integration: of matrices, 249 of vectors, 249 interior, 278 intersection of sets, 5 invariant linear subspace, 122 inverse: image, 21 of a function, 15, 100 of a matrix, 140 of an element, 38 relation, 25 invertible element, 37 invertible linear transformation, 100 invertible matrix, 140 irreducible polynomial, 74 irreflexive, 372 isolated point, 275 isometric operator, 431 isometry, 321 isomorphic, 108 isomorphic semigroups, 64 isomorphism, 30, 63, 68, 108
J Jacobian matrix, 461 Jacobi identity, 57 Jordan canonical form, 175,191
K Kalman's theorem, 401-402 kernel of a homomorphism, 65 Kronecker delta, 111
L Laplace transform, 96 latent value, 164 leading coefficient of a polynomial, 70 Lebesgue integral, 296 Lebesgue measurable function, 296 Lebesgue measurable sets, 295 Lebesgue measure, 295 left cancellation property, 34 left distributive, 28 left identity, 35 left inverse, 36 left invertible element, 37 left R-module, 54 left solution, 40 Lie algebra, 57 limit, 286 limit point, 277,288 line segment, 351 linear: algebra, 33 functional, 109,355-360 manifold, 81 operator, 31,95 quadratic cost control, 468 space, 30,55,76 subspace, 59,81,348 subspace generated by a set, 86 transformation, 30, 95,100 variety, 85 linearly dependent, 87 linearly independent, 87 Lipschitz condition, 324, 328 Lipschitz constant, 324, 328
lower triangular matrix, 176 L 297 L(X,Y), 104
M map, 13 mapping, 13 mathematical system, 30 matrices, 30 matrix, 132 matrix of: a bilinear functional, 195 a linear transformation, 131 one basis with respect to a second basis, 149 maximal linear subspace, 363 metric, 31,209,264 metric space, 31,209, 263-342 metric subspace, 267 minimal polynomial, 179,181 minor of a matrix, 158 modal matrix, 172 modern algebra, 33 module, 30, 54 monic polynomial, 70 monoid, 37 multiplication of a linear transformation by a scalar, 104 multiplication of vectors by scalars, 76,409 multiplicative semigroup, 46 multiplicity of an eigenvalue, 164 multivalued function, 25
N natural basis, 126 natural coordinates, 127 n-dimensional complex coordinate space, 78 n-dimensional real coordinate space, 78
n-dimensional vector space, 92 negative definite matrix, 222 nested sequence of sets, 298 Neumann expansion theorem, 415 nilpotent operator, 185 non-abelian group, 40 non-commutative group, 40 non-empty set, 3 non-homogeneous system, 241-242 non-linear transformation, 95 non-singular linear transformation, 100 non-singular matrix, 140 non-void set, 3 norm, 206, 344 normal: equations, 395 linear transformation, 237 operator, 431 topological space, 323 normalizing a vector, 209 normed conjugate space, 358 normed dual space, 358 normed linear space, 31, 208,344 norm of a bounded linear transformation, 409 norm preserving, 367 nowhere dense, 284 null: matrix, 139 set, 3 space, 98,224 vector, 76, 77 nullity of a linear transformation, 100 n-vector, 132
O object, 2 observations, 398 odd permutation, 156
one-to-one and onto mapping, 14, 100 one-to-one mapping, 14, 100 onto mapping, 14, 100 open: ball, 275 covering, 299 interval, 282 set, 279 sphere, 275 operation table, 27 operator, 13 optimal control problem, 468 ordered sets, 9 order of a group, 40 order of a polynomial, 70 order of a set, 8 ordinary differential equations, 238-261 origin, 76, 77 orthogonal: basis, 210 complement, 215, 382 linear transformation, 217, 231-: matrix, 216, 226 projection, 123, 433 set of vectors, 379 vectors, 118, 209 orthogonality principle, 399 orthonormal set of vectors, 379 outcomes, 397
P
parallel, 364
parallelogram law, 208, 379
Parseval's formula, 390
Parseval's identity, 212
partial sums, 350
partitioned matrix, 147
permutation group, 44, 45
permutation on a set, 19
piecewise continuous derivatives, 329
point of accumulation, 277
points, 264
point spectrum, 440
polarization, 116
polynomial, 69
positive definite matrix, 222
positive operator, 429
power class, 9
power set, 9
precompact, 299
predecessor of an operation, 29
pre-Hilbert space, 377
primary decomposition theorem, 183
principal minor of a matrix, 158
principle of superposition, 96
probability space, 397
product metric spaces, 274
product of:
  a matrix by a scalar, 138
  linear transformations, 105, 409
  two elements, 46, 104
  two matrices, 138
projection, 119, 226, 387
projection theorem, 387, 400
proper:
  subset, 3
  subspace, 81, 164
  value, 164
  vector, 164
Pythagorean theorem, 209, 379

Q
quadratic form, 115, 226
quotient, 72
R
radius, 275
random variable, 397
range of a function, 12
range of a relation, 25
range space, 98
rank of a linear transformation, 100
rank of a matrix, 136
rank of a symmetric bilinear functional, 202
real inner product space, 205
real line, 265
real vector space, 76
reduce, 435
reduced characteristic function, 179
reduced linear transformation, 122
reflection, 218
reflexive, 372
reflexive relation, 25
regular topological space, 323
relation, 25
relatively compact, 307
relatively prime, 73
remainder, 72
repeated eigenvalues, 173
residual, 472
residual spectrum, 440
resolution of the identity, 226, 457
resolvent set, 439
restriction of a mapping, 20
R-homomorphism, 68
Riccati equation, 471
Riemann integrable, 296
Riesz representation theorem, 393
right:
  cancellation property, 34
  distributive, 28
  identity, 34
  inverse, 35
  invertible element, 37
  R-module, 54
  solution, 40
R∞, 78
ring, 30, 46
ring of integers, 51
ring of polynomials, 70
ring with identity, 47
R-module, 54
Rn, 78
rotation, 218, 230
row of a matrix, 131
row rank of a matrix, 152
row vector, 125, 132
R*, 266
R-submodule, 58
R-submodule generated by a set, 60
S
scalar, 75
scalar multiplication, 76
Schwarz inequality, 207, 376
second dual space, 371
secular value, 164
self-adjoint linear transformation, 221, 224-225
self-adjoint operators, 428
semigroup, 30, 36
semigroup component, 46
semigroup of transformations, 44
semigroup operation, 46
separable, 284, 300
separates, 366
sequence, 11, 286
sequence of disjoint sets, 12
sequence of sets, 11
sequentially compact, 301-305
set, 1
set of order zero, 8
shift operator, 441
σ-algebra, 397
σ-field, 397
signature of a symmetric bilinear functional, 202
similarity transformation, 153
similar matrices, 153
simple eigenvalues, 164
singleton set, 8
singular linear transformation, 101
singular matrix, 140
skew-adjoint linear transformation, 221, 237
skew symmetric bilinear functional, 196
skew symmetric matrix, 196
skew symmetric part of a linear functional, 196
solution of a differential equation, 239
solution of an initial value problem, 239
space of:
  bounded complex sequences, 79
  bounded real sequences, 79
  finitely non-zero sequences, 79
  linear transformations, 104
  real-valued continuous functions, 80
span, 86
spectral theorem, 226, 455, 457
spectrum, 164, 439
sphere, 275
spherical neighborhood, 275
square matrix, 132
state transition matrix, 247-255
steepest descent, 472
strictly positive, 429
strong convergence, 373
subalgebra, 105
subcovering, 299
subdomain, 52
subfield, 52
subgroup, 41
subgroup generated by a set, 43
submatrix, 147
subring, 52
subring generated by a set, 53
subsemigroup, 40
subsemigroup generated by a set, 41
subsequence, 287
subset, 3
subsystem, 40, 46
successive approximations, 315, 324-328
sum of:
  elements, 46
  linear operators, 409
  linear transformations, 104
  matrices, 138
  sets, 82
  vectors, 76
surjective, 14, 100
Sylvester's theorem, 199
symmetric difference of sets, 7
symmetric matrix, 196, 226
symmetric part of a linear functional, 196
symmetric relation, 26
system of differential equations, 240, 255-260
T
ternary operation, 26
Ti-spaces, 323
topological space, 31
topological structure, 31
topology, 280, 318, 322-323
totally bounded, 299
T′, 421
trace of a matrix, 169
transformation, 13
transformation group, 45
transitive relation, 26
transpose of a linear transformation, 113, 420
transpose of a matrix, 133
transpose of a vector, 125
triangle inequality, 208, 264, 344
triangular matrix, 176
trivial ring, 48
trivial solution, 245
trivial subring, 53
truncation operator, 439
T*, 422
Tᵀ, 113
U
unbounded linear functional, 356
unbounded metric space, 265
uncountable set, 23
uniform convergence, 313
uniformly continuous, 308
union of sets, 5
unit, 37
unitary operator, 431
unitary space, 205
unit of a ring, 47
unit vector, 209
unordered pair of elements, 9
upper triangular matrix, 176
usual metric for R*, 266, 320
usual metric on R, 265
usual metric on Rn, 271
V
vacuous set, 3
Vandermonde matrix, 260
variance, 398
vector, 75
vector addition, 75
vector space, 30, 55, 76
vector space of n-tuples over F, 56
vector space over a field, 76
vector subspace, 59
Venn diagram, 8
void set, 3
Volterra equation, 327
Volterra integral equation, 97
W
weak convergence, 373
weakly continuous, 375
weak* compact, 375
weak-star convergence, 373
Weierstrass approximation theorem, 285
Wronskian, 256-259

XYZ
Xf, 357
X*, 357-358
zero:
  polynomial, 70
  transformation, 104, 409
  vector, 76, 77
Zorn's lemma, 390